migrate code from googleapis/python-documentai #8450
Closed
227 commits
2547a0a
feat: add v1beta3 (#34)
yoshi-automation 958d735
Chore: Add requirements.txt and noxfile.py for new samples (#45)
aribray d1e4a53
docs(samples): new Doc AI samples for v1beta3 (#44)
aribray 71987a6
chores: fixed small issue with start index problem (#56)
munkhuushmgl 59d811d
chore: update samples noxfile
yoshi-automation 898aee1
fix: removes C-style semicolons and slash comments (#59)
telpirion 5b5558e
chore(deps): update dependency google-cloud-storage to v1.33.0 (#61)
renovate-bot c669546
fix: added if statement to filter out dir blob files (#63)
munkhuushmgl 86646b7
samples(fix): change comments to match function signature (#68)
telpirion 108f47b
fix: moves import statment inside region tags (#71)
telpirion 7d0b369
samples: added test that covers the wrong file type case (#69)
munkhuushmgl 984627e
chore: update templates (#74)
yoshi-automation deeba8e
samples: migrate v1beta2 doc AI samples (#79)
munkhuushmgl 4cf805b
chore(deps): update dependency google-cloud-storage to v1.35.0 (#78)
renovate-bot 16016fa
chore: added increased timeout on flaky batch request (#84)
munkhuushmgl 8ee5e99
chore: exclude `.nox` directories from linting (#87)
yoshi-automation d07e423
chore(deps): update dependency google-cloud-storage to v1.36.0 (#91)
renovate-bot a71eb69
chore(deps): update dependency google-cloud-storage to v1.36.1 (#92)
renovate-bot a8959c7
fix(samples): swaps 'continue' for 'return' (#93)
telpirion 5b15d2d
fix: adds comment with explicit hostname change (#94)
telpirion a762210
chore(deps): update dependency google-cloud-storage to v1.36.2 (#95)
renovate-bot 722b954
chore: update templates (#97)
yoshi-automation b6065a1
chore(deps): update dependency google-cloud-storage to v1.37.0 (#104)
renovate-bot 6eef6ae
chore(deps): update dependency google-cloud-documentai to v0.4.0 (#103)
renovate-bot c152dc7
samples: updates Document AI samples to v1 version of service (#108)
telpirion a205555
samples: more updates for v1 (#121)
telpirion 972ca4e
chore: template updates (#120)
yoshi-automation a8f0e1e
chore(deps): update dependency google-cloud-storage to v1.37.1 (#114)
renovate-bot 9df81d5
chore: migrate to owl bot (#130)
parthea 83feb0c
chore(deps): update dependency pytest to v6.2.4 (#124)
renovate-bot 587e8bd
chore: new owl bot post processor docker image (#152)
gcf-owl-bot[bot] 47a81e2
fix: Parsing pages, but should be paragraphs (#147)
dgallegos 6dd239d
chore(deps): update dependency google-cloud-documentai to v0.5.0 (#155)
renovate-bot a43e901
chore(deps): update dependency google-cloud-storage to v1.38.0 (#133)
renovate-bot 0890cd4
chore(deps): update dependency google-cloud-storage to v1.39.0 (#169)
renovate-bot b034e29
chore(deps): update dependency google-cloud-storage to v1.40.0 (#173)
renovate-bot 4e326a0
chore(deps): update dependency google-cloud-storage to v1.41.0 (#177)
renovate-bot 4998f05
feat: add Samples section to CONTRIBUTING.rst (#181)
gcf-owl-bot[bot] efe62f9
chore(deps): update dependency google-cloud-storage to v1.41.1 (#182)
renovate-bot 503978f
chore(deps): update dependency google-cloud-documentai to v1 (#185)
renovate-bot 9d1bba9
samples: moves region tag to include import statement (#186)
telpirion 3b074d0
chore: fix INSTALL_LIBRARY_FROM_SOURCE in noxfile.py (#192)
gcf-owl-bot[bot] 377ba35
chore(deps): update dependency google-cloud-storage to v1.42.0 (#194)
renovate-bot 45acae0
chore: drop mention of Python 2.7 from templates (#197)
gcf-owl-bot[bot] 8c83118
samples: moves import statement within region tags (#190)
telpirion 551e358
chore(deps): update dependency pytest to v6.2.5 (#204)
renovate-bot a5aca52
chore(deps): update dependency google-cloud-storage to v1.42.1 (#209)
renovate-bot d6a98e4
chore: blacken samples noxfile template (#212)
gcf-owl-bot[bot] 6d8146a
chore(deps): update dependency google-cloud-storage to v1.42.2 (#213)
renovate-bot 8b44e06
chore: fail samples nox session if python version is missing (#218)
gcf-owl-bot[bot] 7f0985a
chore(deps): update dependency google-cloud-storage to v1.42.3 (#219)
renovate-bot 0df9c4a
chore(python): Add kokoro configs for python 3.10 samples testing (#225)
gcf-owl-bot[bot] 07f6465
chore(deps): update dependency google-cloud-documentai to v1.1.0 (#227)
renovate-bot 2ec503b
chore(deps): update dependency google-cloud-documentai to v1.2.0 (#232)
renovate-bot 1d60a72
docs(samples): add OCR, form, quality, splitter and specialized proce…
0c6dd35
chore(python): run blacken session for all directories with a noxfile…
gcf-owl-bot[bot] 7fb9192
chore(samples): Add check for tests in directory (#257)
gcf-owl-bot[bot] 0025bee
chore(deps): update dependency google-cloud-storage to v2 (#247)
renovate-bot d816662
chore(python): Noxfile recognizes that tests can live in a folder (#262)
gcf-owl-bot[bot] cdf1d20
chore(deps): update dependency google-cloud-documentai to v1.2.1 (#263)
renovate-bot c0e07a6
test: strip quotes and newlines from output (#279)
busunkim96 eb4c2ff
chore(deps): update dependency google-cloud-storage to v2.1.0 (#264)
renovate-bot d5d6d9d
chore: Adding support for pytest-xdist and pytest-parallel (#286)
gcf-owl-bot[bot] 3d2d595
chore(deps): update all dependencies (#281)
renovate-bot e695b3e
chore(deps): update dependency google-cloud-documentai to v1.3.0 (#290)
renovate-bot 9a3e17d
chore(deps): update dependency pytest to v7.1.0 (#291)
renovate-bot 86c3740
chore(deps): update dependency google-cloud-storage to v2.2.0 (#292)
renovate-bot 22b275f
chore(deps): update dependency google-cloud-storage to v2.2.1 (#293)
renovate-bot cf055ea
chore(deps): update dependency pytest to v7.1.1 (#296)
renovate-bot 938fe7c
chore(deps): update dependency google-cloud-documentai to v1.4.0 (#297)
renovate-bot 3f6153b
chore(python): use black==22.3.0 (#301)
gcf-owl-bot[bot] 89db439
chore(deps): update dependency google-cloud-storage to v2.3.0 (#310)
renovate-bot 8d86326
chore(python): add nox session to sort python imports (#312)
gcf-owl-bot[bot] 12423ce
chore(deps): update dependency pytest to v7.1.2 (#316)
renovate-bot 8cb7c21
chore: removed v1beta2 samples (#315)
galz10 e06fe31
chore(deps): update dependency google-cloud-documentai to v1.4.1 (#319)
renovate-bot 667edcd
fix: require python 3.7+ (#348)
gcf-owl-bot[bot] 35b59e6
chore(deps): update all dependencies (#338)
renovate-bot bfe4ffc
refactor: Updates to Document AI Python Samples (#323)
holtskinner 1cbbbe9
chore(deps): update all dependencies (#355)
renovate-bot 8695324
chore(deps): update dependency google-cloud-documentai to v1.5.1 (#362)
renovate-bot 3fc6791
docs(samples): Added Human Review Request Sample (#357)
holtskinner be8a692
chore(deps): update dependency google-cloud-documentai to v2 (#364)
renovate-bot ee80dee
chore(deps): update dependency pytest to v7.1.3 (#374)
renovate-bot bb71e29
docs(samples): Updated Samples for v2.0.0 Client Library (#365)
holtskinner f1ce969
chore(main): release 2.0.1 (#378)
release-please[bot] d548678
chore: detect samples tests in nested directories (#379)
gcf-owl-bot[bot] ce57573
chore(deps): update dependency google-cloud-documentai to v2.0.1 (#380)
renovate-bot 8edec37
docs(samples): Added Processor Version Samples (#382)
holtskinner 24a0627
chore(deps): update dependency google-cloud-documentai to v2.0.2 (#386)
renovate-bot 2bdd856
chore(deps): update dependency google-cloud-documentai to v2.0.3 (#390)
renovate-bot 347b2d4
chore(deps): update dependency pytest to v7.2.0 (#392)
renovate-bot 44f7f92
docs(samples): Added extra exception handling to operation samples (#…
holtskinner da12685
chore:Remove Sample Inputs/Outputs from Repo (#391)
holtskinner de63d3c
Merge remote-tracking branch 'migration/main' into python-documentai-…
nicain 16a38f5
Remove unused previous document ai samples folder. Naming is inconsis…
nicain a7914ac
Update documentai folder
nicain 21b58d9
update blunderbuss.yml
nicain 5d7df35
Merge branch 'main' into python-documentai-migration
nicain 2da4812
Merge branch 'main' into python-documentai-migration
nicain 3fe2e1d
Update documentai/AUTHORING_GUIDE.md
nicain 9916963
Update CODEOWNERS
nicain 7e8e694
Update .github/CODEOWNERS
dandhlee 249c62e
Update .github/blunderbuss.yml
dandhlee f86c95f
Update .github/blunderbuss.yml
dandhlee da19172
Update documentai/CONTRIBUTING.md
dandhlee f9bf8cd
Merge branch 'main' into python-documentai-migration
holtskinner 1557474
fix(samples): Fixed import issues in tests
holtskinner 02921e7
Merge branch 'main' into python-documentai-migration
holtskinner fa30eb5
fix(samples): Changes snippets import to include documentai module
holtskinner 4b3f800
Merge branch 'python-documentai-migration' of https://github.com/Goog…
holtskinner 15adcfa
Merge branch 'main' into python-documentai-migration
dandhlee b21e18c
Chore: Add requirements.txt and noxfile.py for new samples (#45)
aribray 5f94ec5
docs(samples): new Doc AI samples for v1beta3 (#44)
aribray d9428e6
chores: fixed small issue with start index problem (#56)
munkhuushmgl a702ed2
chore: update samples noxfile
yoshi-automation b4d03a8
fix: removes C-style semicolons and slash comments (#59)
telpirion 7cd4615
chore(deps): update dependency google-cloud-storage to v1.33.0 (#61)
renovate-bot 666a7ff
fix: added if statement to filter out dir blob files (#63)
munkhuushmgl c755ff9
samples(fix): change comments to match function signature (#68)
telpirion 85ecf86
fix: moves import statment inside region tags (#71)
telpirion b1b0f92
samples: added test that covers the wrong file type case (#69)
munkhuushmgl efb2acc
chore: update templates (#74)
yoshi-automation 7b2f8c9
samples: migrate v1beta2 doc AI samples (#79)
munkhuushmgl 2774619
chore(deps): update dependency google-cloud-storage to v1.35.0 (#78)
renovate-bot 87b9220
chore: added increased timeout on flaky batch request (#84)
munkhuushmgl 70d51cf
chore: exclude `.nox` directories from linting (#87)
yoshi-automation ab86c32
chore(deps): update dependency google-cloud-storage to v1.36.0 (#91)
renovate-bot f533e42
chore(deps): update dependency google-cloud-storage to v1.36.1 (#92)
renovate-bot 48b2add
fix(samples): swaps 'continue' for 'return' (#93)
telpirion 5f5b76c
fix: adds comment with explicit hostname change (#94)
telpirion 28051db
chore(deps): update dependency google-cloud-storage to v1.36.2 (#95)
renovate-bot 37ea57c
chore: update templates (#97)
yoshi-automation ecd0e99
chore(deps): update dependency google-cloud-storage to v1.37.0 (#104)
renovate-bot d174bf2
chore(deps): update dependency google-cloud-documentai to v0.4.0 (#103)
renovate-bot 513026f
samples: updates Document AI samples to v1 version of service (#108)
telpirion 43f5dd9
samples: more updates for v1 (#121)
telpirion 03536a1
chore: template updates (#120)
yoshi-automation 97a834c
chore(deps): update dependency google-cloud-storage to v1.37.1 (#114)
renovate-bot 43c22f5
chore: migrate to owl bot (#130)
parthea 0ffdf66
chore(deps): update dependency pytest to v6.2.4 (#124)
renovate-bot 6629f9c
chore: new owl bot post processor docker image (#152)
gcf-owl-bot[bot] 321d6fc
fix: Parsing pages, but should be paragraphs (#147)
dgallegos 728e01f
chore(deps): update dependency google-cloud-documentai to v0.5.0 (#155)
renovate-bot 89214d5
chore(deps): update dependency google-cloud-storage to v1.38.0 (#133)
renovate-bot ad51a30
chore(deps): update dependency google-cloud-storage to v1.39.0 (#169)
renovate-bot fe11474
chore(deps): update dependency google-cloud-storage to v1.40.0 (#173)
renovate-bot 167cb1d
chore(deps): update dependency google-cloud-storage to v1.41.0 (#177)
renovate-bot fe23cdd
feat: add Samples section to CONTRIBUTING.rst (#181)
gcf-owl-bot[bot] ebc5d5d
chore(deps): update dependency google-cloud-storage to v1.41.1 (#182)
renovate-bot 3879484
chore(deps): update dependency google-cloud-documentai to v1 (#185)
renovate-bot 4bafbc2
samples: moves region tag to include import statement (#186)
telpirion aa63362
chore: fix INSTALL_LIBRARY_FROM_SOURCE in noxfile.py (#192)
gcf-owl-bot[bot] efefb26
chore(deps): update dependency google-cloud-storage to v1.42.0 (#194)
renovate-bot 6abc37f
chore: drop mention of Python 2.7 from templates (#197)
gcf-owl-bot[bot] f9098a5
samples: moves import statement within region tags (#190)
telpirion 9035553
chore(deps): update dependency pytest to v6.2.5 (#204)
renovate-bot dc6fb2c
chore(deps): update dependency google-cloud-storage to v1.42.1 (#209)
renovate-bot a6171a9
chore: blacken samples noxfile template (#212)
gcf-owl-bot[bot] d7bbf09
chore(deps): update dependency google-cloud-storage to v1.42.2 (#213)
renovate-bot 2ed85dd
chore: fail samples nox session if python version is missing (#218)
gcf-owl-bot[bot] 9b6e2fa
chore(deps): update dependency google-cloud-storage to v1.42.3 (#219)
renovate-bot e8710d3
chore(python): Add kokoro configs for python 3.10 samples testing (#225)
gcf-owl-bot[bot] dcbccf3
chore(deps): update dependency google-cloud-documentai to v1.1.0 (#227)
renovate-bot c6feba3
chore(deps): update dependency google-cloud-documentai to v1.2.0 (#232)
renovate-bot 5134855
docs(samples): add OCR, form, quality, splitter and specialized proce…
0852e35
chore(python): run blacken session for all directories with a noxfile…
gcf-owl-bot[bot] a07ba8e
chore(samples): Add check for tests in directory (#257)
gcf-owl-bot[bot] 00deacb
chore(deps): update dependency google-cloud-storage to v2 (#247)
renovate-bot ba7a494
chore(python): Noxfile recognizes that tests can live in a folder (#262)
gcf-owl-bot[bot] b4e80b4
chore(deps): update dependency google-cloud-documentai to v1.2.1 (#263)
renovate-bot cdbaadc
test: strip quotes and newlines from output (#279)
busunkim96 2353f68
chore(deps): update dependency google-cloud-storage to v2.1.0 (#264)
renovate-bot 6755a2e
chore: Adding support for pytest-xdist and pytest-parallel (#286)
gcf-owl-bot[bot] f00d657
chore(deps): update all dependencies (#281)
renovate-bot 4e513c8
chore(deps): update dependency google-cloud-documentai to v1.3.0 (#290)
renovate-bot 3fab8ee
chore(deps): update dependency pytest to v7.1.0 (#291)
renovate-bot 5a293f1
chore(deps): update dependency google-cloud-storage to v2.2.0 (#292)
renovate-bot 334bb42
chore(deps): update dependency google-cloud-storage to v2.2.1 (#293)
renovate-bot ed173ea
chore(deps): update dependency pytest to v7.1.1 (#296)
renovate-bot a8a45ee
chore(deps): update dependency google-cloud-documentai to v1.4.0 (#297)
renovate-bot ac38098
chore(python): use black==22.3.0 (#301)
gcf-owl-bot[bot] 638a923
chore(deps): update dependency google-cloud-storage to v2.3.0 (#310)
renovate-bot 38ecbbf
chore(python): add nox session to sort python imports (#312)
gcf-owl-bot[bot] a0a729d
chore(deps): update dependency pytest to v7.1.2 (#316)
renovate-bot f1339b7
chore: removed v1beta2 samples (#315)
galz10 82d5bb0
chore(deps): update dependency google-cloud-documentai to v1.4.1 (#319)
renovate-bot cc59c78
fix: require python 3.7+ (#348)
gcf-owl-bot[bot] 851c4e0
chore(deps): update all dependencies (#338)
renovate-bot 2f7da58
refactor: Updates to Document AI Python Samples (#323)
holtskinner 6119ada
chore(deps): update all dependencies (#355)
renovate-bot c00fd29
chore(deps): update dependency google-cloud-documentai to v1.5.1 (#362)
renovate-bot 89e7ce1
docs(samples): Added Human Review Request Sample (#357)
holtskinner dd4bc19
chore(deps): update dependency google-cloud-documentai to v2 (#364)
renovate-bot 7e52ab1
chore(deps): update dependency pytest to v7.1.3 (#374)
renovate-bot 7f4d82d
docs(samples): Updated Samples for v2.0.0 Client Library (#365)
holtskinner c35078e
chore(main): release 2.0.1 (#378)
release-please[bot] aca1634
chore: detect samples tests in nested directories (#379)
gcf-owl-bot[bot] 9384f5a
chore(deps): update dependency google-cloud-documentai to v2.0.1 (#380)
renovate-bot f1f3c37
docs(samples): Added Processor Version Samples (#382)
holtskinner 21fe303
chore(deps): update dependency google-cloud-documentai to v2.0.2 (#386)
renovate-bot 74b3a44
chore(deps): update dependency google-cloud-documentai to v2.0.3 (#390)
renovate-bot 6dc4b94
chore(deps): update dependency pytest to v7.2.0 (#392)
renovate-bot 34c0e3f
docs(samples): Added extra exception handling to operation samples (#…
holtskinner fbdcfe1
chore:Remove Sample Inputs/Outputs from Repo (#391)
holtskinner 4356d33
chore(deps): update dependency google-cloud-storage to v2.6.0 (#399)
renovate-bot 3d21322
chore(deps): update dependency google-cloud-documentai to v2.1.0 (#407)
renovate-bot 1e68334
docs(samples): Updated code samples for 2.1.0 release (#406)
holtskinner 7600e28
chore(deps): update dependency google-cloud-documentai to v2.2.0 (#411)
renovate-bot 702f709
chore(deps): update dependency google-cloud-documentai to v2.3.0 (#414)
renovate-bot 7593fb2
chore(python): drop flake8-import-order in samples noxfile (#421)
gcf-owl-bot[bot] 42deddf
fix(samples): Fix Typos in Batch process & get processor Samples (#420)
holtskinner 5a2459c
chore(deps): update dependency google-cloud-documentai to v2.4.0 (#423)
renovate-bot 822488b
chore(deps): update dependency google-cloud-storage to v2.7.0 (#426)
holtskinner 58d487b
chore(deps): update dependency google-cloud-documentai to v2.4.1 (#428)
renovate-bot 804dddc
chore(deps): update dependency google-cloud-documentai to v2.5.0 (#432)
renovate-bot 3043731
chore(deps): update dependency google-cloud-documentai to v2.6.0 (#435)
renovate-bot a798a17
Moved Python Files to new_directory
holtskinner 32f2b63
Pulled in updates from python-documentai repository
holtskinner 1889479
Deleted temporary directory
holtskinner 1328a19
Merge branch 'main' into python-documentai-migration
holtskinner e3d58e2
Addressed Test Failures
holtskinner c3f455a
Merge branch 'python-documentai-migration' of https://github.com/Goog…
holtskinner 77dd5d9
Merge branch 'main' into python-documentai-migration
holtskinner 781c196
Merge branch 'main' into python-documentai-migration
kweinmeister de30aa2
Updated Document AI/Storage Client Library Versions in requirements.txt
holtskinner f453358
Addressed flake8 import linter errors
holtskinner
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
See https://github.com/GoogleCloudPlatform/python-docs-samples/blob/main/AUTHORING_GUIDE.md |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
See https://github.com/GoogleCloudPlatform/python-docs-samples/blob/main/CONTRIBUTING.md |
Empty file.
Empty file.
163 changes: 163 additions & 0 deletions
documentai/snippets/batch_process_documents_processor_version_sample.py
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,163 @@ | ||
# Copyright 2020 Google LLC | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
|
||
|
||
# [START documentai_batch_process_documents_processor_version] | ||
import re | ||
|
||
from google.api_core.client_options import ClientOptions | ||
from google.api_core.exceptions import InternalServerError | ||
from google.api_core.exceptions import RetryError | ||
from google.cloud import documentai | ||
from google.cloud import storage | ||
|
||
# TODO(developer): Uncomment these variables before running the sample. | ||
# project_id = 'YOUR_PROJECT_ID' | ||
# location = 'YOUR_PROCESSOR_LOCATION' # Format is 'us' or 'eu' | ||
# processor_id = 'YOUR_PROCESSOR_ID' # Example: aeb8cea219b7c272 | ||
# processor_version_id = "YOUR_PROCESSOR_VERSION_ID" # Example: pretrained-ocr-v1.0-2020-09-23 | ||
# gcs_input_uri = "YOUR_INPUT_URI" # Format: gs://bucket/directory/file.pdf | ||
# input_mime_type = "application/pdf" | ||
# gcs_output_bucket = "YOUR_OUTPUT_BUCKET_NAME" # Format: gs://bucket | ||
# gcs_output_uri_prefix = "YOUR_OUTPUT_URI_PREFIX" # Format: directory/subdirectory/ | ||
# field_mask = "text,entities,pages.pageNumber" # Optional. The fields to return in the Document object. | ||
|
||
|
||
def batch_process_documents_processor_version( | ||
project_id: str, | ||
location: str, | ||
processor_id: str, | ||
processor_version_id: str, | ||
gcs_input_uri: str, | ||
input_mime_type: str, | ||
gcs_output_bucket: str, | ||
gcs_output_uri_prefix: str, | ||
field_mask: str = None, | ||
timeout: int = 400, | ||
): | ||
|
||
# You must set the api_endpoint if you use a location other than 'us'. | ||
opts = ClientOptions(api_endpoint=f"{location}-documentai.googleapis.com") | ||
|
||
client = documentai.DocumentProcessorServiceClient(client_options=opts) | ||
|
||
gcs_document = documentai.GcsDocument( | ||
gcs_uri=gcs_input_uri, mime_type=input_mime_type | ||
) | ||
|
||
# Load GCS Input URI into a List of document files | ||
gcs_documents = documentai.GcsDocuments(documents=[gcs_document]) | ||
input_config = documentai.BatchDocumentsInputConfig(gcs_documents=gcs_documents) | ||
|
||
# NOTE: Alternatively, specify a GCS URI Prefix to process an entire directory | ||
# | ||
# gcs_input_uri = "gs://bucket/directory/" | ||
# gcs_prefix = documentai.GcsPrefix(gcs_uri_prefix=gcs_input_uri) | ||
# input_config = documentai.BatchDocumentsInputConfig(gcs_prefix=gcs_prefix) | ||
# | ||
|
||
# Cloud Storage URI for the Output Directory | ||
# This must end with a trailing forward slash `/` | ||
destination_uri = f"{gcs_output_bucket}/{gcs_output_uri_prefix}" | ||
|
||
gcs_output_config = documentai.DocumentOutputConfig.GcsOutputConfig( | ||
gcs_uri=destination_uri, field_mask=field_mask | ||
) | ||
|
||
# Where to write results | ||
output_config = documentai.DocumentOutputConfig(gcs_output_config=gcs_output_config) | ||
|
||
# The full resource name of the processor version | ||
# e.g. projects/{project_id}/locations/{location}/processors/{processor_id}/processorVersions/{processor_version_id} | ||
name = client.processor_version_path( | ||
project_id, location, processor_id, processor_version_id | ||
) | ||
|
||
request = documentai.BatchProcessRequest( | ||
name=name, | ||
input_documents=input_config, | ||
document_output_config=output_config, | ||
) | ||
|
||
# BatchProcess returns a Long Running Operation (LRO) | ||
operation = client.batch_process_documents(request) | ||
|
||
# Block until the operation completes or the timeout elapses. | ||
# This could take some time for larger files. | ||
# Operation name format: projects/PROJECT_NUMBER/locations/LOCATION/operations/OPERATION_ID | ||
try: | ||
print(f"Waiting for operation {operation.operation.name} to complete...") | ||
operation.result(timeout=timeout) | ||
# Catch exception when operation doesn't finish before timeout | ||
except (RetryError, InternalServerError) as e: | ||
print(e.message) | ||
|
||
# NOTE: Can also use callbacks for asynchronous processing | ||
# | ||
# def my_callback(future): | ||
# result = future.result() | ||
# | ||
# operation.add_done_callback(my_callback) | ||
|
||
# Once the operation is complete, | ||
# get output document information from operation metadata | ||
metadata = documentai.BatchProcessMetadata(operation.metadata) | ||
|
||
if metadata.state != documentai.BatchProcessMetadata.State.SUCCEEDED: | ||
raise ValueError(f"Batch Process Failed: {metadata.state_message}") | ||
|
||
storage_client = storage.Client() | ||
|
||
print("Output files:") | ||
# One process per Input Document | ||
for process in metadata.individual_process_statuses: | ||
# output_gcs_destination format: gs://BUCKET/PREFIX/OPERATION_NUMBER/INPUT_FILE_NUMBER/ | ||
# The Cloud Storage API requires the bucket name and URI prefix separately | ||
matches = re.match(r"gs://(.*?)/(.*)", process.output_gcs_destination) | ||
if not matches: | ||
print( | ||
"Could not parse output GCS destination:", | ||
process.output_gcs_destination, | ||
) | ||
continue | ||
|
||
output_bucket, output_prefix = matches.groups() | ||
|
||
# Get List of Document Objects from the Output Bucket | ||
output_blobs = storage_client.list_blobs(output_bucket, prefix=output_prefix) | ||
|
||
# Document AI may output multiple JSON files per source file | ||
for blob in output_blobs: | ||
# Document AI should only output JSON files to GCS | ||
if ".json" not in blob.name: | ||
print( | ||
f"Skipping non-supported file: {blob.name} - Mimetype: {blob.content_type}" | ||
) | ||
continue | ||
|
||
# Download JSON File as bytes object and convert to Document Object | ||
print(f"Fetching {blob.name}") | ||
document = documentai.Document.from_json( | ||
blob.download_as_bytes(), ignore_unknown_fields=True | ||
) | ||
|
||
# For a full list of Document object attributes, please reference this page: | ||
# https://cloud.google.com/python/docs/reference/documentai/latest/google.cloud.documentai_v1.types.Document | ||
|
||
# Read the text recognition output from the processor | ||
print("The document contains the following text:") | ||
print(document.text) | ||
|
||
|
||
# [END documentai_batch_process_documents_processor_version] |
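The output-handling loop in this sample splits each `output_gcs_destination` into a bucket and a prefix, then skips blobs that do not look like JSON. A minimal offline sketch of those two steps follows, using only the standard library so it can run without Cloud credentials. The helper names `split_gcs_uri` and `is_json_output` are hypothetical and do not appear in the PR. The `endswith` check is a stricter variant offered as an assumption, not the sample's own `".json" not in blob.name` test.

```python
import re
from typing import Optional, Tuple

# Same pattern as the sample: a lazy bucket match, then everything after
# the first slash that follows the bucket name.
_GCS_URI = re.compile(r"gs://(.*?)/(.*)")


def split_gcs_uri(uri: str) -> Optional[Tuple[str, str]]:
    """Return (bucket, prefix) for a gs:// URI, or None if it does not parse.

    Hypothetical helper; the sample performs this inline with re.match.
    """
    match = _GCS_URI.match(uri)
    if match is None:
        return None
    bucket, prefix = match.groups()
    return bucket, prefix


def is_json_output(blob_name: str) -> bool:
    """Hypothetical stricter filter: the blob name must end in .json."""
    return blob_name.endswith(".json")


if __name__ == "__main__":
    print(split_gcs_uri("gs://my-bucket/out/123/0/"))
    print(split_gcs_uri("not-a-gcs-uri"))
    print(is_json_output("out/123/0/invoice-0.json"))
```

Note that a URI with no slash after the bucket name (for example `gs://bucket-only`) does not match this pattern, which is why the sample prints a message and continues when `re.match` returns `None`.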
49 changes: 49 additions & 0 deletions
documentai/snippets/batch_process_documents_processor_version_sample_test.py
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,49 @@ | ||
# Copyright 2020 Google LLC | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
# | ||
|
||
import os | ||
from uuid import uuid4 | ||
|
||
from documentai.snippets import \ | ||
batch_process_documents_processor_version_sample | ||
|
||
location = "us" | ||
project_id = os.environ["GOOGLE_CLOUD_PROJECT"] | ||
processor_id = "90484cfdedb024f6" | ||
processor_version_id = "pretrained-form-parser-v1.0-2020-09-23" | ||
gcs_input_uri = "gs://cloud-samples-data/documentai/invoice.pdf" | ||
input_mime_type = "application/pdf" | ||
gcs_output_bucket = "gs://document-ai-python" | ||
gcs_output_uri_prefix = f"{uuid4()}/" | ||
field_mask = "text,pages.pageNumber" | ||
|
||
|
||
def test_batch_process_documents_processor_version(capsys): | ||
batch_process_documents_processor_version_sample.batch_process_documents_processor_version( | ||
project_id=project_id, | ||
location=location, | ||
processor_id=processor_id, | ||
processor_version_id=processor_version_id, | ||
gcs_input_uri=gcs_input_uri, | ||
input_mime_type=input_mime_type, | ||
gcs_output_bucket=gcs_output_bucket, | ||
gcs_output_uri_prefix=gcs_output_uri_prefix, | ||
field_mask=field_mask, | ||
) | ||
out, _ = capsys.readouterr() | ||
|
||
assert "operation" in out | ||
assert "Fetching" in out | ||
assert "text:" in out |
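The test passes a bucket URI (`gs://document-ai-python`) and a per-run prefix (`f"{uuid4()}/"`), and the sample joins them into a destination that must end with a trailing slash. The sketch below shows one way that join could be made defensive. `build_destination_uri` and `unique_prefix` are hypothetical helpers introduced only for illustration, and the normalization rules (strip extra slashes, require a `gs://` scheme) are assumptions rather than behavior taken from the PR.

```python
from uuid import uuid4


def build_destination_uri(bucket_uri: str, prefix: str) -> str:
    """Join a gs:// bucket URI and a prefix into a URI that ends with '/'.

    Hypothetical helper; the sample uses a plain f-string instead.
    """
    if not bucket_uri.startswith("gs://") or bucket_uri == "gs://":
        raise ValueError(f"Expected a gs:// bucket URI, got: {bucket_uri!r}")
    bucket = bucket_uri.rstrip("/")
    cleaned = prefix.strip("/")
    # An empty prefix writes to the bucket root.
    return f"{bucket}/{cleaned}/" if cleaned else f"{bucket}/"


def unique_prefix() -> str:
    """Return a fresh prefix per run, as the test does with uuid4()."""
    return f"{uuid4()}/"


if __name__ == "__main__":
    print(build_destination_uri("gs://document-ai-python", unique_prefix()))
```

Using a fresh prefix per run keeps the output of concurrent or repeated test runs from mixing in the same directory of the shared bucket.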