Conversation

@lolipopshock
Collaborator

No description provided.

@lolipopshock
Collaborator Author

lolipopshock commented Jul 13, 2021

After some reconsideration, I think it would be better for the load_image function to live in Document. When doc.images is accessed, it checks whether the document images are already loaded and, if not, runs the loading step. The reason is efficiency: loading is an expensive operation, and we only want images loaded when they are needed.

There are some drawbacks, though. To implement this, the Document class needs some modifications, for example storing the page size and the original file path.
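
A minimal sketch of the lazy-loading idea described above, for illustration only. `LazyDocument`, `load_fn`, and `fake_render` are hypothetical names, not part of the actual Document class; the sketch assumes the original file path is stored on the document and that some function can turn that path into a list of page images.

```python
from typing import Callable, List, Optional


class LazyDocument:
    """Hypothetical stand-in for Document; not the real class."""

    def __init__(self, source_path: str, load_fn: Callable[[str], List[bytes]]):
        # The original file path is kept so images can be rendered later.
        self.source_path = source_path
        self._load_fn = load_fn
        self._images: Optional[List[bytes]] = None  # None means "not loaded yet"

    @property
    def images(self) -> List[bytes]:
        # Run the expensive load only on first access, then reuse the result.
        if self._images is None:
            self._images = self._load_fn(self.source_path)
        return self._images


calls: List[str] = []


def fake_render(path: str) -> List[bytes]:
    # Placeholder for an expensive PDF-to-image step (assumption).
    calls.append(path)
    return [b"page-0", b"page-1"]


doc = LazyDocument("paper.pdf", fake_render)
```

Nothing is rendered when `doc` is constructed; the first read of `doc.images` triggers the load, and later reads reuse the cached list.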

@lolipopshock
Collaborator Author

  1. The images are loaded by default, and can be disabled via parser.parse(..., load_images=False).
  2. Will add readme and tests later today.
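
For illustration, the default-on behavior with an opt-out flag might look roughly like the sketch below. Only the `load_images` keyword comes from the comment above; `SketchParser`, `render_fn`, and the dictionary used as a document are hypothetical stand-ins, not the actual parser API.

```python
from typing import Callable, Dict, List


class SketchParser:
    """Hypothetical parser; only the load_images flag mirrors the comment."""

    def __init__(self, render_fn: Callable[[str], List[str]]):
        # render_fn stands in for the expensive page-rendering step (assumption).
        self._render_fn = render_fn

    def parse(self, input_pdf_path: str, load_images: bool = True) -> Dict[str, object]:
        doc: Dict[str, object] = {"path": input_pdf_path, "images": []}
        if load_images:
            # Images are loaded by default; callers opt out with load_images=False.
            doc["images"] = self._render_fn(input_pdf_path)
        return doc


parser = SketchParser(lambda path: [path + "#page0"])
with_images = parser.parse("paper.pdf")
without_images = parser.parse("paper.pdf", load_images=False)
```

Callers that only need text can pass `load_images=False` and skip the rendering cost entirely.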

@lolipopshock lolipopshock requested a review from kyleclo July 14, 2021 16:36
Collaborator

@kyleclo kyleclo left a comment

lgtm

def tobase64(self):
    # Ref: https://stackoverflow.com/a/31826470
    buffered = BytesIO()
    self.save(buffered, format="JPEG")
Collaborator

why not png?


# Monkey patch the PIL.Image methods to add base64 conversion

def tobase64(self):
Collaborator

do we want to support 2 forms: base64 and a proper image file that one can download & view?

    img = Image.open(buffered)
    return img

Image.Image.tobase64 = tobase64 # This is the method applied to individual Image classes
Collaborator

prefer we define our own Image class, inherit from PIL's Image class, and override the methods.

from mmda.types.image import Image


class BaseParser:
Collaborator

why the rename?

Sent: [],
Block: []
Block: [],
DocImage: [],
Collaborator

add inline comment saying this is loaded into doc in parse()

@kyleclo kyleclo merged commit 41325ee into main Jul 14, 2021
@kyleclo kyleclo deleted the add-image-extractor branch July 14, 2021 22:10
@geli-gel geli-gel mentioned this pull request Jul 29, 2022