Open
Description
Hi team, I was trying to get the project running, but it failed at the template prediction step. Here are the logs:
$ python3 structured_extraction.py
--- Processing ./ExtractFromPDF.pdf ---
2025-05-04 09:56:07,079 - INFO - Starting TWIX processing for: ./ExtractFromPDF.pdf
2025-05-04 09:56:07,079 - INFO - Running twix.transform...
Phrase extraction starts...
Phrase extraction for the merged file starts...
Phrase extraction for individual files starts...
Field prediction starts...
2025-05-04 09:56:17,536 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
perfect match starts...
cluster pruning starts...
2025-05-04 09:56:22,187 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
re-clustering starts...
2025-05-04 09:56:41,467 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
Template prediction starts...
There does not exist an input such that every predict key exists at least twice.
2025-05-04 09:56:41,476 - ERROR - An error occurred during transaction extraction: list index out of range
Traceback (most recent call last):
  File "/Users/devkrishna/Desktop/Playground/TWIX/twix-ui/backend/structured_extraction.py", line 267, in extract_credit_card_transactions
    fields, template, extraction_objects, cost = twix.transform(
                                                 ^^^^^^^^^^^^^^^
  File "/Users/devkrishna/Desktop/Playground/TWIX/twix/transform.py", line 11, in transform
    template, cost = pattern.predict_template(pdf_paths, result_folder_path, LLM_model_name=LLM_model_name)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/devkrishna/Desktop/Playground/TWIX/twix/pattern.py", line 1583, in predict_template
    template = predict_template_docs(phrases_bb, keywords, phrases, metadata)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/devkrishna/Desktop/Playground/TWIX/twix/pattern.py", line 513, in predict_template_docs
    row_mp = seperate_rows(sample_phrases_bb)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/devkrishna/Desktop/Playground/TWIX/twix/pattern.py", line 1165, in seperate_rows
    p_pre = pv[0][0]
            ~~^^^
IndexError: list index out of range
--- Extraction Results ---
{
"Transactions": []
}
--- Estimated Cost ---
Total estimated cost: $0.000000
Intermediate files saved in: ./twix_output/ExtractFromPDF
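The `~~^^^` marker in the traceback covers `pv[0]`, so `pv` appears to be empty when `seperate_rows` reaches that line, presumably as a consequence of the "There does not exist an input..." message printed just before the error. I haven't read the rest of `pattern.py`, so the following is only a sketch of the kind of guard that would turn the crash into an explicit "nothing to process" result. `first_entry_or_none` is a hypothetical helper of mine, not a function in TWIX:

```python
def first_entry_or_none(pv):
    """Return pv[0][0] when it exists, otherwise None.

    Hypothetical helper: mirrors the failing expression `pv[0][0]`
    but tolerates an empty outer list or an empty first element.
    """
    if not pv or not pv[0]:
        return None
    return pv[0][0]


# An empty list is the case that raised IndexError in the log above.
print(first_entry_or_none([]))            # None
print(first_entry_or_none([["Date", 3]])) # Date
```

With a check like this, the caller could stop early and report that no template could be predicted instead of failing with a bare `IndexError`.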
Another thing I noticed: although the script reports a total estimated cost of $0.000000, the run did use the OpenAI API. The log above shows three successful chat completion requests, and I confirmed the usage on my dashboard.
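My guess, which I have not verified against the code, is that the cost is lost because `twix.transform` raised before it could return its `cost` value, so the caller fell back to zero. A minimal sketch of one way to keep the partial cost when a later step fails is below. `CostTracker`, `run_steps`, and the step functions are all hypothetical names, and the per-step amounts are placeholders rather than real API prices:

```python
class CostTracker:
    """Accumulates cost outside the call that may raise (hypothetical)."""

    def __init__(self):
        self.total = 0.0

    def add(self, amount):
        self.total += amount


def run_steps(steps, tracker):
    """Run each step in order, recording its cost as soon as it finishes."""
    for step in steps:
        tracker.add(step())


def paid_step():
    # Placeholder amount standing in for one LLM call's cost.
    return 0.25


def failing_step():
    # Stands in for the template prediction step that crashed.
    raise IndexError("list index out of range")


tracker = CostTracker()
try:
    run_steps([paid_step, paid_step, failing_step], tracker)
except IndexError:
    pass  # the cost recorded before the failure is still available

print(f"Total estimated cost: ${tracker.total:.6f}")
```

Because the tracker lives outside the failing call, the amount accumulated before the exception is still reported rather than being reset to zero.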