Thanks to visit codestin.com
Credit goes to www.scribd.com

0% found this document useful (0 votes)
142 views11 pages

Tableau Integration With Hadoop

Tableau is a visualization tool that allows users to easily analyze large datasets through drag-and-drop features. It generates interactive charts and dashboards that can be shared. Tableau connects to various data sources like databases, cubes, and spreadsheets. It offers different products for desktop, server, and online use. Tableau uses an in-memory engine to quickly query data and visualize results. It also supports connecting to Hadoop via Cloudera's Hive ODBC driver to perform analysis on big data.

Uploaded by

sunita
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
142 views11 pages

Tableau Integration With Hadoop

Tableau is a visualization tool that allows users to easily analyze large datasets through drag-and-drop features. It generates interactive charts and dashboards that can be shared. Tableau connects to various data sources like databases, cubes, and spreadsheets. It offers different products for desktop, server, and online use. Tableau uses an in-memory engine to quickly query data and visualize results. It also supports connecting to Hadoop via Cloudera's Hive ODBC driver to perform analysis on big data.

Uploaded by

sunita
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 11

Tableau Integration with Hadoop

Tableau Overview
Tableau is a visualization tool based on breakthrough technology that provides drag & drop features to
analyze data on large amounts of data very easily and quickly.
The Dashboard of Tableau is very interactive and gives dynamic results. Tableau supports strong
interactive capabilities and provides rich set of graphic charts. Tableau can query relational databases,
cubes, cloud database, and spreadsheets and then generates a number of graph types that can be
combined into dashboards and shared over a computer network or the internet.

Tableau offers 4 different products that can be used for various purposes:
 Tableau desktop – desktop version and can be used by anyone for 15 days without any license.
 Tableau server – Tableau Server is a business intelligence application that provides browser-
based analytics.
 Tableau public – Public version of Tableau software.
 Tableau online – Tableau Online is a hosted version of Tableau Server.

Tableau supports Visual Query Language (VizQL). VizQL translates drag-and-drop actions into data
queries and then expresses that data visually.
T
ableau data engine is based on advanced In-Memory Technology to speed up ad-hoc analysis of massive
data in a few seconds. Query results are cached in memory, to provide the results very quickly on even
large amounts of data sets.

Tableau is a commercial product but it provides 15 days of trial period on Tableau Desktop, so that we
can try this for some sample metrics. As of now Tableau supports only Windows and Mac Os. There is no
support for Linux operating systems.

Tableau Features:
 Ease of Use – No programming or scripts needed. We can connect to data in a few clicks, via
its drag and drop features and then visualize and create interactive dashboards with a few more.
 Speed processing – With its advance in-memory technology its processing speed is fast.
 Rich Set of GUI and Chart based dashboards.
 Easy publishing and sharing of dashboards on web or emails.
 Its drawback is that, no predictive analytical capabilities are supported by Tableau.
Tableau Desktop Installation & Tableau Integration with Hadoop:
 Tableau Desktop for 15 days of trial period can be downloaded
at http://www.tableausoftware.com/products/trial. Based on the windows operating system
choose either 32 bit version or 64 bit version of installer files for Tableau Desktop. Download the
windows installer file (TableauPublicDesktop-32bit.msi) and install it on the machine.
 For connecting with MySQL relational database, we need to download its ODBC (Open database
connectivity) drivers and install on the machine. ODBC drivers for MySQL can be downloaded
(mysql-connector-odbc-5.3.4-win32.msi) at http://dev.mysql.com/downloads/connector/odbc/
 Tableau supports visualization on Hadoop data by connecting to Hiveserver or Hiveserver2.
There are no ODBC drivers available as of now for connecting to Apache Hive directly from
Tableau but Cloudera provides ODBC drivers for Tableau integration with Hadoop via cloudera
hiveserver.
 Cloudera hive ODBC drivers can be downloaded
at http://www.cloudera.com/content/cloudera/en/downloads/connectors.html . Make sure to
download ODBC drivers (ClouderaHiveODBC32.msi) but not JDBC drivers.
 Install the above .msi files in windows machine and restart it to correctly integrate the drivers
with each other.
Connect to Tableau and Perform Data Visualization on Hadoop Data:
Start tableau desktop software and connect to Cloudera Hive (Data –> Connect) and provide the
details as shown in below screen. Connect to Hiveserver at port 10000 on IP address of the machine
on which hiveserver is running.

Once we have successfully connected to Hiveserver machine, we need to select the schema (database
name) and need to either search tables or select any listed tables.
For example in the above screen default the schema name and we need to select the default schema
here in the above screen. After selecting default as schema and clicking on search button against tables,
then all the tables present in the this schema will be listed as shown in the below screen.
Now it is very easy to perform analysis/statistics/reports/joins on tables from the above list. We just
need to drag and drop the tables in to worksheet as shown in above screen for table t1. If we need to
join two tables then we need to drag two tables from the list and drop into worksheet, if there are any
common columns between two tables they will be automatically used to join perform join other wise,
we need to give the column names on which, the join should be performed. Also Tableau will ask by
highlighting to choose join type (Inner join, left outer join, right outer join or full outer joins).
Once the tables are dropped into worksheet and if we click on “Go to Worksheet” Tableau will fire the
query in the background on hiveserver and pulls the records from into its advanced in-memory. By
default Tableau retrieves only 10000 rows into its memory but we can configure this by entering our
required value into “Rows” field on right side of the above page and clicking on either of the options
“Update now” or “Automatically Update”
Once the data from the table is loaded into memory, we are ready to perform any queries on
visualizations on top of the data. Here we can drag any column from the tables and drop into “Columns”
tab in the below screen and Preferably measures into Rows for better visualizations but this rule is not
mandatory. We can explore the best possible ways to visualize the data just by playing around the drag
and drop of various columns into different areas.
Here in the below screen we are trying to list out the number of records per date from a sample table.
So, we have selected Dt into “Columns” and No of Records into “Rows” and below is the automatic bar
chart visualization.
But also we can select any of the highlighted chart formats (Pie charts, bubble chart, etc…) to visualize
data.
In the background, tableau will fire queries against hiveserver to pull data from hive table into it’s in-
memory. For example, below is the screen shot of the query fired for data pulling the above sample
table from hiveserver. We can observe the Tez job running in the below screen.
Tableau connectivity with MySQL:
Below is the procedure for connecting Tableau with MySQL RDBMS. After successful installation of
ODBC drivers for MySQL database, (Data –> connect –> Select MySQL) and enter the details as shown in
below screen.
Provide the MySQL server IP address and default port number 3306 for MySQL and provide username
and password details to connect the database.
And select the database first and drag and drop the tables which we want to include in analysis.
Now as shown in above (cloudera hive) section, we can perform visualizations on the tables.
Data Blending in Tableau:
We can perform data blending in Tableau, by connecting two data sources at a time. As shown in above
two sections we can connect to two data sources (Cloudera Hive and MySQL) at a time and select the
tables into Worksheet. We can blend (mix/merging of data) these tables by assigning relationships
between them. This can be done by Selecting a table from MySQL (say user table) and another table
from Cloudera Hive Server (say t1) and we can perform data blending by selecting data –> edit
relationships and selecting the common columns in the tables to blend the data among the tables.
Below is a sample screen shot for the same. If both the tables have at least one column in common then
only we can blend the tables.

Building Dashboards in Tableau:


We can build dashboards in Tableau with the visualization charts prepared on various charts into single
dash board by selecting any layout. Below is the 4 sheet layout containing four types of visualization
charts in a single dashboard.

You might also like