
DATASHEET

BLUEPRINT FOR BIG DATA SUCCESS: A BEST PRACTICE SOLUTION PATTERN

Filling the Data Lake

Simplify and Accelerate Hadoop Data Ingestion with a Scalable Approach

What is it?
As organizations scale up data onboarding from just a few sources going into Hadoop to hundreds or more, IT time and resources can be monopolized creating and maintaining hundreds of hard-coded data movement procedures, and the process is often highly manual and error-prone. The Pentaho Filling the Data Lake blueprint provides a template-based approach to solving these challenges and comprises:

• A flexible, scalable, and repeatable process to onboard a growing number of data sources into Hadoop data lakes
• Streamlined data ingestion from hundreds or thousands of disparate CSV files or database tables into Hadoop
• An automated, template-based approach to data workflow creation (sketched below)
• Simplified, regular data movement at scale into Hadoop in the Avro format
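The datasheet itself contains no code, but the template-plus-metadata idea can be illustrated with a minimal sketch: one generic ingestion routine whose behavior is driven entirely by per-source metadata, instead of one hard-coded job per source. The sketch below is in Python using the fastavro package; the catalog entries, field names, and file paths are hypothetical, and Pentaho Data Integration realizes this pattern through its own graphical template transformations rather than hand-written code like this.

import csv
from fastavro import parse_schema, writer

def ingest(source_meta):
    # Build an Avro schema from the field list carried in the metadata.
    field_names = source_meta["fields"]
    schema = parse_schema({
        "type": "record",
        "name": source_meta["record_name"],
        "fields": [{"name": f, "type": ["null", "string"]} for f in field_names],
    })
    # Read the delimited source; columns missing from a row become null.
    with open(source_meta["csv_path"], newline="") as src:
        reader = csv.DictReader(src, delimiter=source_meta.get("delimiter", ","))
        records = [{f: row.get(f) for f in field_names} for row in reader]
    # Write the records out in Avro format.
    with open(source_meta["avro_path"], "wb") as out:
        writer(out, schema, records)

# One catalog of metadata drives every source through the same template.
catalog = [
    {"record_name": "trades", "fields": ["trade_id", "symbol", "amount"],
     "csv_path": "trades.csv", "avro_path": "trades.avro"},
    {"record_name": "accounts", "fields": ["account_id", "branch"],
     "csv_path": "accounts.csv", "avro_path": "accounts.avro"},
]
for meta in catalog:
    ingest(meta)

Adding a new source then means adding one metadata entry, not writing a new ingestion job, which is the core of the metadata injection approach described above.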

Why do it?
• Reduce IT time and cost spent building and maintaining repetitive big data ingestion jobs, allowing valuable staff to dedicate time to more strategic projects
• Minimize risk of manual errors by decreasing dependence on hard-coded data ingestion procedures
• Automate business processes for efficiency and speed, while maintaining data governance
• Enable more sophisticated analysis by business users with new and emerging data sources

Value of Pentaho
• Unique metadata injection capability accelerates time-to-value by automating many onboarding jobs with just a few templates
• Intuitive graphical user interface for big data integration means existing ETL developers can create repeatable data movement flows without coding, in minutes rather than hours
• Ability to architect a governed process that is highly reusable
• Robust integration with the broader Hadoop ecosystem and semi-structured data
Example of how a Filling the Data Lake blueprint implementation may look in a financial organization
This company uses metadata injection to move thousands of data sources into Hadoop in a streamlined, dynamic integration process.

• Large financial services organization with thousands of input sources
• Reduces the number of ingest processes through metadata injection
• Delivers transformed data directly into Hadoop in the Avro format (see the sketch below)

[Diagram: disparate data sources (RDBMS tables and CSV files) flow through dynamic data integration processes and dynamic transformations, replacing per-source ingest procedures, and land in Hadoop in the Avro format.]
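As a companion to the sketch above, the snippet below shows one way the Avro output could be landed directly in Hadoop over WebHDFS. This is an illustration only, not Pentaho's actual delivery mechanism (PDI writes to Hadoop through its own output steps); it assumes the Python fastavro and hdfs packages, and the namenode URL, user, schema, and target path are placeholders.

from fastavro import parse_schema, writer
from hdfs import InsecureClient

# Placeholder connection details for a WebHDFS endpoint.
client = InsecureClient("http://namenode.example.com:9870", user="etl")

schema = parse_schema({
    "type": "record",
    "name": "trades",
    "fields": [{"name": "trade_id", "type": ["null", "string"]},
               {"name": "symbol", "type": ["null", "string"]}],
})
records = [{"trade_id": "T1", "symbol": "ACME"}]

# client.write() yields a writable file-like object, so fastavro can stream
# the Avro container straight into the data lake path.
with client.write("/data/lake/trades/part-0001.avro", overwrite=True) as out:
    writer(out, schema, records)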


Copyright ©2016 Pentaho Corporation. All rights reserved. Worldwide +1 (866) 660-7555 | pentaho.com/contact
