Thanks to visit codestin.com
Credit goes to github.com

Skip to content

vincentpyg/vghadoop-io

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 
 
 

Repository files navigation

#vghadoop-io

Custom InputFormats for MapReduce. Also useful in Spark.

##MultipleRawFileInputFormat Reads the contents of multiple files, each stored into a ByteArrayInputStream. Useful for reading PDFs, Images, XML files, etc.

##MultipleTextFileInputFormat Reads the contents of multiple files, each stored as a String. Useful for operations (i.e. Sentiment Analysis, Event Relation Extraction, Document Classification, etc.) that need to read the entire contents of a file.

##VgZipInputFormat Reads multiple files inside a ZIP archive.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages