❗ This is a read-only mirror of the CRAN R package repository. daiR: Interface with Google Cloud Document AI API. Homepage: https://github.com/Hegghammer/daiR, https://dair.info. Report bugs for this package: https://github.com/Hegghammer/daiR/issues

daiR: OCR with Google Document AI in R

daiR is an R package for Google Document AI, a powerful server-based OCR processor with support for over 60 languages. The package provides an interface to the Document AI API and comes with additional tools for parsing output files and reconstructing text. See the daiR website (https://dair.info) for more details.

Use

Quickly OCR short documents:

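## NOT RUN
library(daiR)
# Send the document for synchronous (immediate) processing
response <- dai_sync("file.pdf")
# Extract the recognized text from the response and print it
text <- text_from_dai_response(response)
cat(text)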

Batch process asynchronously via Google Storage:

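## NOT RUN
library(daiR)
library(googleCloudStorageR)
library(purrr)
my_files <- c("file1.pdf", "file2.pdf", "file3.pdf")
# Upload the documents to the storage bucket
map(my_files, gcs_upload)
# Submit them for asynchronous processing
dai_async(my_files)
# Once processing has finished, list the bucket contents and keep the JSON output files
contents <- gcs_list_objects()
output_files <- grep("json$", contents$name, value = TRUE)
# Download the output files to a temporary directory
map(output_files, ~ gcs_get_object(.x, saveToDisk = file.path(tempdir(), .x)))
# Extract and print the text from the first output file
sample_text <- text_from_dai_file(file.path(tempdir(), output_files[1]))
cat(sample_text)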
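
The file-selection step in the batch example can be tried offline. The sketch below uses only base R and needs no Google Cloud access. The helper `local_json_paths()` and the object names are hypothetical and not part of daiR. The folder-style prefixes are an assumption about how output objects might be named in a bucket. The helper keeps the JSON outputs and maps each one to a flat path under a local directory.

```r
# Hypothetical helper (not part of daiR): pick the JSON outputs from a
# bucket listing and build flat local paths for them.
local_json_paths <- function(object_names, dir = tempdir()) {
  # Keep only names ending in ".json"
  json_names <- grep("\\.json$", object_names, value = TRUE)
  # Drop any folder prefix so every file lands directly in `dir`
  file.path(dir, basename(json_names))
}

# Made-up listing, standing in for gcs_list_objects()$name
object_names <- c(
  "file1.pdf",
  "1234567890/0/file1-0.json",
  "1234567890/1/file2-0.json",
  "notes.txt"
)

local_paths <- local_json_paths(object_names)
print(basename(local_paths))
```

If the object names in your bucket carry no folder prefix, `basename()` leaves them unchanged, so the helper behaves the same either way.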

Turn images of tables into R dataframes:

## NOT RUN
response <- dai_sync_tab("tables.pdf")
dfs <- tables_from_dai_response(response)

Requirements

Google Document AI is a paid service that requires a Google Cloud account and a Google Storage bucket. I recommend using Mark Edmondson's googleCloudStorageR package in combination with daiR.
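
Before making any API calls, it can help to confirm that credentials and a default bucket are configured. The sketch below is a base-R check with a hypothetical helper, `check_gcs_env()`, which is not part of daiR. It assumes the setup relies on the `GCS_AUTH_FILE` and `GCS_DEFAULT_BUCKET` environment variables used by googleCloudStorageR. Adjust the variable names if your configuration differs.

```r
# Hypothetical helper (not part of daiR): report which of the expected
# environment variables are set to a non-empty value.
# The default variable names are an assumption about your setup.
check_gcs_env <- function(vars = c("GCS_AUTH_FILE", "GCS_DEFAULT_BUCKET")) {
  values <- Sys.getenv(vars, unset = "")
  # TRUE where the variable is set and non-empty, FALSE otherwise
  stats::setNames(nzchar(unname(values)), vars)
}

status <- check_gcs_env()
print(status)
```

A `FALSE` entry means the variable is unset or empty in the current session. One common remedy is to define it in your `.Renviron` file and restart R.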

Installation

Install the latest development version from GitHub:

devtools::install_github("hegghammer/daiR")
