More Data From the PDF #488
# Conflicts:
#   server/api/lib/mavat/index.js
#   server/api/model/plan.js
#   server/package-lock.json
#   server/package.json
#   server/tests/tables_structs/plan_struct.js
// this function look for the correct columns for the given headers, and returns a factory embedded with these findings
const rowAbstractFactoryChart1_7 = (firstPageOfTable, headersStartIndex) => {
  // clean the headers
  firstPageOfTable = firstPageOfTable.map(row => row.map(cell => cell.replace(/\n/g, ' ')
What happens if firstPageOfTable is undefined or not an array?
It's always an array. We can add a check beforehand, but firstPageOfTable comes from the PDF extension and it's consistent.
Adding a check here would require adding the same check in at least 10 places.
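For reference, one way to avoid repeating the check in ten places is to normalize the input once, at the point where the PDF output enters the parsing code. This is only a sketch under assumptions: `asTablePage` and `cleanHeaders` are hypothetical names that do not appear in the PR, and the fallback values (an empty array, an empty string) are an illustrative choice.

```javascript
// Hypothetical helper (not part of the PR): normalizes one page of a table
// into an array of rows, where every row is an array of strings.
const asTablePage = (page) => {
  if (!Array.isArray(page)) {
    // undefined, null, or any non-array input becomes an empty page
    return [];
  }
  return page
    .filter(row => Array.isArray(row))
    .map(row => row.map(cell => (typeof cell === 'string' ? cell : '')));
};

// The header cleanup from the diff can then assume a well-formed input.
const cleanHeaders = (firstPageOfTable) =>
  asTablePage(firstPageOfTable).map(row => row.map(cell => cell.replace(/\n/g, ' ')));
```

With this shape the individual factories stay unchanged, and only the single entry point pays for the check.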
contains: getFromArr(row, containsIdx),
scale: getFromArr(row, scaleIdx),
number_of_pages: getFromArr(row, numberOfPagesIdx),
edit_date: getFromArr(row, editDateIdx),
Snake casing on purpose?
Yup, for the db. With snake-cased keys we don't need to rename the map's keys when inserting into the db.
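To illustrate the point about keys, here is a minimal sketch of how an object keyed like the db columns can be inserted without a renaming step. Everything in it is an assumption made for illustration: the body of `getFromArr` is not shown in the diff, and `toRow`, `buildInsert`, and the table name are hypothetical.

```javascript
// Assumed behavior of getFromArr (its definition is not shown in the diff):
// return the cell at idx, or undefined when the column was not found.
const getFromArr = (arr, idx) => (idx === -1 ? undefined : arr[idx]);

// Row object keyed exactly like the (assumed) db columns.
const toRow = (row, { scaleIdx, numberOfPagesIdx, editDateIdx }) => ({
  scale: getFromArr(row, scaleIdx),
  number_of_pages: getFromArr(row, numberOfPagesIdx),
  edit_date: getFromArr(row, editDateIdx)
});

// Hypothetical helper: builds a parameterized INSERT straight from the
// object keys, so no camelCase to snake_case mapping is needed.
const buildInsert = (table, rowObj) => {
  const columns = Object.keys(rowObj);
  const placeholders = columns.map(() => '?').join(', ');
  return {
    sql: `INSERT INTO ${table} (${columns.join(', ')}) VALUES (${placeholders})`,
    values: columns.map(column => rowObj[column])
  };
};
```

Had the keys been camelCased, `buildInsert` would need an extra key-to-column lookup for every table.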
if (headersStartIndex === -1) {
  // it's an error
  log.error('didn\'t find headers for some char 1.6');
Can you please add planId to the log so we can assign it to a process for investigation?
Passing planId here requires threading it through at least 2 more software layers and changing at least 25 lines in at least 15 different files. It will be a bit complicated.
The crawler that runs in production doesn't log the planId when it fails. Can you check the logs to see whether we got a lot of "didn't find headers for *" errors, so we can decide if it is worth investing in?
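A possible alternative to threading planId down through every layer is to let the low-level parser raise a typed error and attach the plan at a layer that already knows it. This is only a sketch: `HeadersNotFoundError`, `parseChart`, and `parseChartForPlan` are hypothetical names, and it assumes some caller higher up holds the planId, which the thread above does not confirm.

```javascript
// Hypothetical error type; the name is illustrative and not from the PR.
class HeadersNotFoundError extends Error {
  constructor(chartName) {
    super(`didn't find headers for chart ${chartName}`);
    this.name = 'HeadersNotFoundError';
    this.chartName = chartName;
  }
}

// Low-level parser: knows which chart it handles, does not know the plan.
const parseChart = (chartName, headersStartIndex) => {
  if (headersStartIndex === -1) {
    throw new HeadersNotFoundError(chartName);
  }
  return { chartName, headersStartIndex };
};

// Caller (assumed to already hold planId): adds it once when logging.
const parseChartForPlan = (planId, chartName, headersStartIndex, log = console) => {
  try {
    return parseChart(chartName, headersStartIndex);
  } catch (err) {
    if (err instanceof HeadersNotFoundError) {
      log.error(`plan ${planId}: ${err.message}`);
      return null;
    }
    throw err; // unrelated failures keep propagating
  }
};
```

The trade-off is that the parser now throws instead of logging and continuing, so any caller that relied on a partial result would have to handle the `null` return.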
if (headersStartIndex === -1) {
  // it's an error
  log.error('didn\'t find headers for some chart 3.1_without_change');
plan id
Same as my reply to the comment on the log.error for 1.6.
This PR brings us tables:
This PR also improves runtime, since it now cuts the appendixes from heavy PDFs.