Conversation

@lolipopshock
Collaborator

@lolipopshock lolipopshock commented Apr 11, 2022

The fix tries to improve the robustness of the VILA library for "large" PDFs: pages whose width or height is more than 1000, and which contain tokens with bounding box dimensions larger than 1000. Such input breaks the 2D position encoding used in the base Transformer models, which is fundamentally a lookup table (bbox dimension value -> some embedding values) that only accepts inputs in the range 0~1000.
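To illustrate the failure mode, here is a minimal sketch of a position lookup table in plain Python. It is not the code used by VILA or the underlying Transformer models: `MAX_COORD`, `EMBED_DIM`, `position_table`, and `lookup_position` are hypothetical names, and the table size simply follows the 0~1000 range described above.

```python
# Hypothetical stand-in for a 2D position embedding table.
# Assumption: one row per integer coordinate in 0..1000, as described above.
MAX_COORD = 1000
EMBED_DIM = 4  # tiny dimension, for illustration only

# Row i holds a dummy "embedding" for coordinate value i.
position_table = [[float(i)] * EMBED_DIM for i in range(MAX_COORD + 1)]


def lookup_position(coord):
    """Return the embedding row for an integer bbox coordinate.

    Raises IndexError when the coordinate falls outside the table,
    which mimics what happens with tokens from a "large" page.
    """
    if not 0 <= coord <= MAX_COORD:
        raise IndexError(f"coordinate {coord} is outside 0..{MAX_COORD}")
    return position_table[coord]
```

With this sketch, `lookup_position(1000)` succeeds, while a coordinate taken from a 2304-wide page, such as `lookup_position(2304)`, raises `IndexError`.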

I added a normalize function to solve this issue. When the input PDF page is "large" (i.e., either page_width>1000 or page_height>1000), all the tokens on that page are normalized using the normalize_bbox function, which converts the dimensions to the range 0~1000.
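A rough sketch of this kind of normalization is shown below. It is not the actual `normalize_bbox` from this PR: the function names, the signatures, the per-axis scaling, and the integer truncation are all assumptions made for illustration.

```python
def normalize_bbox_sketch(bbox, page_width, page_height, target=1000):
    """Rescale an (x1, y1, x2, y2) box from page units into 0..target.

    Hypothetical helper: each axis is scaled independently and the
    result is truncated to integers and clamped to the valid range.
    """
    x1, y1, x2, y2 = bbox
    scaled = (
        x1 * target / page_width,
        y1 * target / page_height,
        x2 * target / page_width,
        y2 * target / page_height,
    )
    return tuple(min(target, max(0, int(v))) for v in scaled)


def maybe_normalize(bboxes, page_size, target=1000):
    """Normalize boxes only when the page is "large" in either dimension."""
    page_width, page_height = page_size
    if page_width <= target and page_height <= target:
        # Small pages are left untouched, matching the behavior described above.
        return [tuple(b) for b in bboxes]
    return [
        normalize_bbox_sketch(b, page_width, page_height, target)
        for b in bboxes
    ]
```

For example, on a page of size (2304.0, 2448.0), a box covering the bottom-right quadrant, `(1152.0, 1224.0, 2304.0, 2448.0)`, would map to `(500, 500, 1000, 1000)`, while boxes on a 612 x 792 page pass through unchanged.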

However, this solution is not perfect: our models haven't been appropriately tuned for such large PDFs. Ideally, we should retrain the models with normalized inputs.

It will lead to one API change:

import layoutparser as lp # For visualization 

from vila.pdftools.pdf_extractor import PDFExtractor
from vila.predictors import HierarchicalPDFPredictor
# Choose from SimplePDFPredictor,
# LayoutIndicatorPDFPredictor, 
# and HierarchicalPDFPredictor

pdf_extractor = PDFExtractor("pdfplumber")
page_tokens, page_images = pdf_extractor.load_tokens_and_image("path-to-your.pdf")

vision_model = lp.EfficientDetLayoutModel("lp://PubLayNet") 
pdf_predictor = HierarchicalPDFPredictor.from_pretrained("allenai/hvila-row-layoutlm-finetuned-docbank")

for idx, page_token in enumerate(page_tokens):
    blocks = vision_model.detect(page_images[idx])
    page_token.annotate(blocks=blocks)
    pdf_data = page_token.to_pagedata().to_dict()
    predicted_tokens = pdf_predictor.predict(pdf_data, page_token.page_size)  # <-- you now need to pass the page size to predict()
    lp.draw_box(page_images[idx], predicted_tokens, box_width=0, box_alpha=0.25)

@lolipopshock
Collaborator Author

lolipopshock commented Apr 11, 2022

Now VILA works for large PDFs like posters: page_size is (2304.0, 2448.0) for the example below.

image

Comment on lines +50 to +51
# Right now only execute this for only "large" PDFs
# TODO: Change it for all PDFs

you're doing this because there isn't a retrained model at this time, correct?

Collaborator Author

@lolipopshock lolipopshock Apr 11, 2022

Yeah, that's correct!

@lolipopshock
Collaborator Author

For future reference, when we merge this PR, we'll also release v0.3.0 of vila because of the API change.

@yoganandc yoganandc left a comment

@lolipopshock let's merge. I verified that master crashes with a big PDF and that this branch doesn't.


3 participants