Basic Rules and What is Data Analysis
STEPS IN TABULATION
The following steps are followed in tabulation: transfer the information, according to the classification, from the questionnaires to work-sheets to facilitate handling; after the information is summarized in the work-sheets, prepare a draft table; and finally, prepare the final table containing the results of the draft table.
Forms of Table: Tables may be single, double, triple, or manifold, according to the number of characteristics covered by the table. A practical illustration will make the idea clearer. A simple table shows only one characteristic; the data are presented in terms of only one of their characteristics. In a two-fold table, two characteristics are included.
Specific Types of Tables Analysis of Variance (ANOVA) Tables: The conventional format for an ANOVA table is to list the
source in the stub column, then the degrees of freedom (df) and the F ratios. Give the between-subject variables and error
first, then within-subject and any error. Mean square errors must be enclosed in parentheses. Provide a general note to the
table to explain what those values mean (see example). Use asterisks to identify statistically significant F ratios, and provide
a probability footnote.
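The quantities that fill an ANOVA table's columns (df, mean squares, the F ratio) can be computed directly. The sketch below works through a one-way design by hand; the three groups of scores are invented for illustration.

```python
# Sketch: computing the entries of a one-way ANOVA table by hand.
# The three groups of scores are invented for illustration.
groups = [
    [4.0, 5.0, 6.0],   # condition A
    [7.0, 8.0, 9.0],   # condition B
    [1.0, 2.0, 3.0],   # condition C
]

n_total = sum(len(g) for g in groups)
grand_mean = sum(sum(g) for g in groups) / n_total

# Between-groups sum of squares and degrees of freedom (the "Between" row)
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
df_between = len(groups) - 1

# Within-groups (error) sum of squares and degrees of freedom
ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
df_within = n_total - len(groups)

ms_between = ss_between / df_between   # mean square between
ms_within = ss_within / df_within      # mean square error (shown in parentheses)
f_ratio = ms_between / ms_within       # the F ratio reported in the table
```

In a manuscript these numbers would appear as one row per source, with the mean square error in parentheses and asterisks marking significant F ratios.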
Regression Tables: Conventional reporting of regression analysis follows two formats. If the study is purely applied, list only
the raw or un-standardized coefficients (B). If the study is purely theoretical, list only the standardized coefficients (beta). If
the study was neither purely applied nor theoretical, then list both standardized and un-standardized coefficients. Specify
the type of analysis, either hierarchical or simultaneous, and provide the increments of change if you used hierarchical
regression.
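For a single predictor, the link between the two kinds of coefficients is mechanical: the standardized beta is the raw B rescaled by the ratio of the predictor's and outcome's standard deviations. A minimal sketch with invented data:

```python
# Sketch: unstandardized (B) vs. standardized (beta) coefficients for a
# simple one-predictor regression; the data are invented for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]   # y = 2x, so B should be 2 and beta 1

n = len(x)
mx, my = sum(x) / n, sum(y) / n
cov_xy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
var_x = sum((a - mx) ** 2 for a in x) / (n - 1)
var_y = sum((b - my) ** 2 for b in y) / (n - 1)

B = cov_xy / var_x                           # raw (unstandardized) slope
beta = B * (var_x ** 0.5) / (var_y ** 0.5)   # standardized coefficient
```

A regression table for an applied study would report B; a theoretical one, beta; a mixed study, both.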
Path and LISREL (Linear Structural Relations) Tables: Present the means, standard deviations, and intercorrelations of the
entire set of variables you used as input to the path and LISREL analyses. These data are essential for the reader to replicate or confirm your analyses and are necessary for archival purposes if your study is included in a meta-analysis. To help the reader
interpret your table, give short descriptions instead of just a list of symbols of the x and y variables used in the models. If
you need to use acronyms, be sure to define each one.
Word Tables: Unlike most tables, which present quantitative data, some tables consist mainly of words. Word tables
present qualitative comparisons or descriptive information. For example, a word table can enable the reader to compare
characteristics of studies in an article that reviews many studies, or it can present questions and responses from a survey or
show an outline of the elements of a theory. Word tables illustrate the discussion in the text; they should not repeat the
discussion. Word tables include the same elements of format as do other types of tables - table number and title, headings,
rules, and possibly notes. Keep column entries brief and simple. Indent any runover lines in entries. Double-space all parts
of a word table.
Notes in Tables
There are three types of notes for tables - general, specific, and probability notes. All of them must be placed below the
table in that order. General notes explain, qualify or provide information about the table as a whole. Put explanations of
abbreviations, symbols, etc. here. Example: Note. The racial categories used by the US Census (African-American, Asian
American, Native-American, and Pacific Islander) have been collapsed into the category ‘non-White’. E = excludes
respondents who self-identified as ‘White’ and at least one other ‘non-White’ race. Specific notes explain, qualify or
provide information about a particular column, row, or individual entry. To indicate specific notes, use superscript
lowercase letters (e.g., a, b, c), and order the superscripts from left to right, top to bottom. Each table's first footnote must be superscript a. Example: a n = 823. b One participant in this group was diagnosed with schizophrenia during the survey.
Probability notes provide the reader with the results of the tests for statistical significance. Asterisks indicate the values for which the null hypothesis is rejected, with the probability (p-value) specified in the probability note. Such notes are required only when relevant to the data in the table; consistently use the same number of asterisks for a given alpha level throughout your paper (e.g., *p < .05).
BASIC STRUCTURE OF A TABLE
The following parts are the basic structure of tables. Numbers: Number all tables with Arabic numerals sequentially. Do not
use suffix letters (e.g. Table 3a, 3b, 3c); instead, combine the related tables. If the manuscript includes an appendix with
tables, identify them with capital letters and Arabic numerals (e.g. Table A1, Table B2). Titles: Like the title of the paper
itself, each table must have a clear and concise title. When appropriate, you may use the title to explain an abbreviation
parenthetically. Example: Comparison of Median Income of Adopted Children (AC) v. Foster Children (FC) Headings: Keep
headings clear and brief. The heading should not be much wider than the widest entry in the column. Use of standard
abbreviations can aid in achieving that goal. All columns must have headings, even the stub column, which customarily lists
the major independent variables. Body: In reporting the data, consistency is key. Numerals should be expressed to a
consistent number of decimal places that is determined by the precision of measurement. Never change the unit of
measurement or the number of decimal places in the same column. More specifically, the different parts of a table are: Title: Each table has a title describing its contents. Sub Head: Describes the characteristics of the stub entries. Stub Entries: The classifications of the actual data. Caption Head: Explains the data placed in each column. Body: Contains the data in classified form. Foot Note: May be used to explain any individual entry. Source: Discloses the source of the information.
Table Checklist Is the table necessary? Is the entire table single- or double-spaced (including the title, headings, and
notes)? Are all comparable tables presented consistently? Is the title brief but explanatory? Does every column have a
column heading? Are all abbreviations; special use of italics, parentheses, and dashes; and special symbols explained?
Are all probability level values correctly identified, and are asterisks attached to the appropriate table entries? Is a
probability level assigned the same number of asterisks in all the tables in the same document? Are the notes organized
according to the convention of general, specific, probability? Are all vertical rules eliminated? If the table or its data are
from another source, is the source properly cited? Is the table referred to in the text?
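One way to satisfy the checklist items about consistent asterisks is to derive the stars from a single mapping rather than typing them by hand. A small, hypothetical helper (the thresholds follow the common *p < .05, **p < .01, ***p < .001 convention):

```python
# Sketch: one asterisk-to-alpha mapping, applied to every table in a paper,
# keeps the probability notes consistent throughout the document.
def p_to_stars(p):
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""   # not significant at any listed alpha level
```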
DECIDING TO USE FIGURE
In APA journals, any type of illustration other than a table is called a figure. Because tables are
typeset, rather than photographed from art-work supplied by the author, they are not considered figures. A figure may be a
chart, graph, photograph, drawing, or other depiction. Consider carefully whether to use a figure. Tables are often
preferred for the presentation of quantitative data in archival journals because they provide exact information; figures
typically require the reader to estimate values. On the other hand, figures convey at a quick glance an overall pattern of
results. They are especially useful in describing an interaction – or lack thereof- and nonlinear relations. A well-prepared
figure can also convey structural or pictorial concepts more efficiently than can text. During the process of drafting a
manuscript, and in deciding whether to use a figure, ask yourself these questions – What idea do you need to convey?
Is the figure necessary? If it duplicates text, it is not necessary. If it complements text or eliminates lengthy discussion, it
may be the most efficient way to present the information. What type of figure (e.g., graph, chart, diagram, drawing, map,
or photograph) is most suited to your purpose? Will a simple, relatively inexpensive figure (e.g., line art) convey the point as
well as an elaborate, expensive figure (e.g., photographs combined with line art, figures that are in color instead of in black
and white)?
Standards for Figures
The standards for good figures are simplicity, clarity, and continuity. A good figure augments rather than duplicates the text; conveys only essential facts; omits visually distracting detail; is easy to read
– its elements (type, lines, labels, symbols, etc.) are large enough to be read with ease in the printed form; is easy to
understand – its purpose is readily apparent; is consistent with and is prepared in the same style as similar figures in the
same article; that is, the lettering is of the same size and typeface, lines are of the same weight, and so forth; and is
carefully planned and prepared.
12.7 TYPES OF FIGURE
Several types of figures can be used to present data to the reader. Sometimes the choice of which type to use will be obvious, but at other times it will not.
Graphs are good at quickly conveying relationships such as comparison and distribution. The most common forms of graphs are scatter plots, line graphs, bar graphs, pictorial graphs, and pie graphs. A graph shows relations (comparison and distribution) in a set of data and may show, for example, absolute values, percentages, or index numbers.
Scatter plots are composed of individual dots that represent the value of a specific event on the scale established by the
two variables plotted on the x- and y-axes. When the dots cluster together, a correlation is implied. On the other hand,
when the dots are scattered randomly, no correlation is seen. For example, a cluster of dots along a diagonal implies a
linear relationship, and if all the dots fall on a diagonal line, the coefficient of correlation is 1.00.
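The claim about dots on a diagonal can be checked numerically: points that fall exactly on one line yield a correlation coefficient of 1.00, as a short sketch with invented points shows.

```python
# Sketch: the correlation coefficient for scatter-plot data, computed by hand.
def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs)
           * sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den

diagonal = list(range(10))
# Every dot lies exactly on the line y = 2x + 1, so r is 1.00.
r_line = pearson_r(diagonal, [2 * v + 1 for v in diagonal])
```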
Line graphs depict the relationship between quantitative variables. Customarily, the independent variable is plotted
along the x-axis (horizontally) and the dependent variable is plotted along the y-axis (vertically).
Bar graphs come in three main types: (1) solid vertical or horizontal bars, (2) multiple bar graphs, and (3) sliding bars. In solid bar graphs, the independent variable is categorical, and each bar represents one kind of datum, e. g.
a bar graph of monthly expenditures. A multiple bar graph can show more complex information than a simple bar graph, e.
g. monthly expenditures divided into categories (housing, food, transportation, etc.). In sliding bar graphs, the bars are
divided by a horizontal line which serves as the baseline, enabling the representation of data above and below a specific
reference point, e. g. high and low temperatures v. average temperature. Pictorial graphs can be used to show
quantitative differences between groups. Pictorial graphs can be very deceptive: if the height of an image is doubled and its width scales with it, its area is quadrupled. Therefore, great care should be taken that images representing the same values are the same size.
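The distortion is simple arithmetic: scaling both dimensions of an image by a factor k multiplies its area by k squared. A tiny illustration (the pictogram dimensions are invented):

```python
# Sketch: doubling both dimensions of a pictogram quadruples the area the eye sees.
def scaled_area(width, height, k):
    return (width * k) * (height * k)

original = scaled_area(3.0, 2.0, 1)   # the base pictogram
doubled = scaled_area(3.0, 2.0, 2)    # "twice as tall" - but four times the area
```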
Circle (or pie) graphs, or 100% graphs are used to represent percentages and proportions. For the sake of readability, no
more than five variables should be compared in a single pie graph. The segments should be ordered very strictly: beginning
at twelve o’clock, order them from the largest to the smallest, and shade the segments from dark to light (i.e., the largest
segment should be the darkest). Lines and dots can be used for shading in black and white documents. Charts can
describe the relations between parts of a group or object or the sequence of operations in a process; charts are usually
boxes connected with lines. For example, organizational charts show the hierarchy in a group, flowcharts show the
sequence of steps in a process, and schematics show components in a system. Dot maps can show population density,
and shaded maps can show averages or percentages. In these cases, plotted data are superimposed on a map. Maps should
always be prepared by a professional artist, who should clearly indicate the compass orientation (e.g., north-south) of the
map, fully identify the map’s location, and provide the scale to which the map is drawn. Use arrows to help readers focus
on reference points. Drawings and photographs can be used to communicate very specific information about a subject.
Thanks to software, both are now highly manipulable. For the sake of readability and simplicity, line drawings should be
used, and photographs should have the highest possible contrast between the background and focal point. Cropping,
cutting out extraneous detail, can be very beneficial for a photograph. Use software like GraphicConverter or Photoshop to
convert color photographs to black and white before printing on a laser printer. Otherwise most printers will produce an
image with poor contrast.
12.8 PREPARATION OF FIGURE
In preparing figures, communication and readability must be the
ultimate criteria. Avoid the temptation to use the special effects available in most advanced software packages. While
three-dimensional effects, shading, and layered text may look interesting to the author, overuse, inconsistent use, and
misuse may distort the data, and distract or even annoy readers. Design properly done is inconspicuous, almost invisible,
because it supports communication. Design improperly, or amateurishly, done draws the reader’s attention from the data,
and makes him or her question the author’s credibility. The APA has determined specifications for the size of figures and
the fonts used in them. Figures of one column must be between 2 and 3.25 inches wide (5 to 8.45 cm). Two-column figures
must be between 4.25 and 6.875 inches wide (10.6 to 17.5 cm). The height of figures should not exceed the top and bottom
margins. The text in a figure should be in a sans serif font (such as Helvetica, Arial, or Futura). The font size must be between
eight and fourteen point. Use circles and squares to distinguish curves on a line graph (at the same font size as the other
labels).
12.9 CREATING GRAPH
Follow these guidelines when creating a graph mechanically or with a computer. Computer
software that generates graphs will often handle most of these steps automatically. Use bright white paper. Use
medium lines for the vertical and horizontal axes. The best aspect ratio of the graph may depend on the data. Choose the
appropriate grid scale. Consider the range and scale separation to be used on both axes and the overall dimensions of the
figure so that plotted curves span the entire illustration. In line graphs, a change in the proportionate sizes of the x units
to the y units changes the slant of the line. Indicate units of measurement by placing tick marks on each axis at the
appropriate intervals. Use equal increments of space between tick marks on linear scales. If the units of measurement on
the axes do not begin at zero, break the axes with a double slash. Clearly label each axis with both the quantity measured
and the units in which the quantity is measured. Carry numerical labels for axis intervals to the same number of decimal
places. Position the axis label parallel to its axis. Do not stack letters so that the label reads vertically; do not place a label
perpendicular to the vertical (y) axis unless it is very short (i.e., two words or a maximum of 10 characters). The numbering
and lettering of grid points should be horizontal on both axes. Use legibility as a guide in determining the number of
curves to place on a figure – usually no more than four curves per graph. Allow adequate space between and within curves,
remembering that the figure may need to be reduced. Use distinct, simple geometric forms for plot points; good choices
are open and solid circles and triangles. Combinations of squares and circles or squares and diamonds are not
recommended because they can be difficult to differentiate if the art is reduced as can open symbols with dots inside.
12.10 FIGURE LEGENDS AND CAPTIONS
In APA journals, a legend explains the symbols used in the figure; it is placed within
and photographed as part of the figure. A caption is a concise explanation of the figure; it is typeset and placed below the
figure. For figures, make sure to include the figure number and a title with a legend and caption. These elements appear
below the visual display. For the figure number, type Figure X. Then type the title of the figure in sentence case. Follow the
title with a legend that explains the symbols in the figure and a caption that explains the figure. For example - Figure 1. How
to create figures in APA style. This figure illustrates effective elements in APA style figures. Captions serve as a brief, but
complete, explanation and as a title. For example, ‘Figure 4. Population’ is insufficient, whereas ‘Figure 4. Population of
Grand Rapids, MI by race (1980)’ is better. If the figure has a title in the image, crop it. Graphs should always include a
legend that explains the symbols, abbreviations, and terminology used in the figure. These terms must be consistent with
those used in the text and in other figures. The lettering in the legend should be of the same type and size as that used in
the figure.
FIGURE CHECKLIST
APA includes the following within the definition of figures - ● Graphs, ● Charts, ● Maps, ● Drawings, and ● Photographs.
Figure Checklist Is the figure necessary? Is the figure simple, clean, and free of extraneous detail? Are the data plotted
accurately? Is the grid scale correctly proportioned? Is the lettering large and dark enough to read? Is the lettering
compatible in size with the rest of the figure? Are parallel figures or equally important figures prepared according to the
same scale? Are terms spelled correctly? Are all abbreviations and symbols explained in a figure legend or figure
caption? Are the symbols, abbreviations, and terminology in the figure consistent with those in the figure caption? In other
figures? In the text? Are the figures numbered consecutively with Arabic numerals? Are all figures mentioned in the
text? Guidelines for Figures within Assignments These guidelines have been adapted from the Publication manual of the
American Psychological Association. Be selective in which figures, as well as how many figures, you include within your
text. Figures should supplement rather than duplicate your text. Ensure for all figures included that lines are smooth
and sharp, the typeface is legible, any units of measure are included, axes are clearly identified and elements within the
figure are labeled and explained. If required, include a legend explaining symbols used within a figure. Refer to every
figure within your text by the figure number (e.g., Figure 1), highlighting only the point you want to emphasize. Figures
must be numbered consecutively in the order in which they appear within the text, in italics. That is, the first figure is labeled "Figure 1", the second "Figure 2", and so on. Include the figure number directly below the figure itself, followed
by a full-stop then a brief title. Capitalize only the first word of the title and any proper nouns. The title should be
descriptive of the contents of the figure. Include any additional notes for the figure directly underneath the figure. Any
changes to a figure from the original must be identified and included as a Note.
1. Qualitative Analysis
This approach mainly answers questions such as 'why,' 'what,' or 'how.' Each of these questions is addressed via techniques such as questionnaires, attitude scaling, standard outcomes, and
more. Such kind of analysis is usually in the form of texts and narratives, which might also include audio
and video representations.
2. Quantitative Analysis
Generally, this analysis is measured in terms of numbers. The data here present themselves in terms of
measurement scales and extend themselves for more statistical manipulation.
The other techniques include:
3. Text analysis
Text analysis is a technique to analyze texts to extract machine-readable facts. It aims to create
structured data out of free and unstructured content. The process consists of slicing and dicing heaps of
unstructured, heterogeneous files into easy-to-read, manage and interpret data pieces. It is also known
as text mining, text analytics, and information extraction.
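A minimal taste of text analysis, using only the standard library: tokenize free text and count word frequencies, turning unstructured content into structured, machine-readable data. The sentence below is invented for illustration.

```python
# Sketch: slicing unstructured text into structured counts.
import re
from collections import Counter

text = "Red Sox tames bull. The Red Sox win again."
tokens = re.findall(r"[a-z]+", text.lower())  # slice the text into words
freq = Counter(tokens)                        # structured, machine-readable counts
```

Real text-analysis pipelines go far beyond counting, of course, but every one of them starts by imposing structure like this on free text.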
The ambiguity of human languages is the biggest challenge of text analysis. For example, humans know that "Red Sox Tames Bull" refers to a baseball match, but if this text is fed to a computer without
background knowledge, then it would generate several linguistically valid interpretations, and sometimes
people not interested in baseball might have trouble understanding it too.
4. Statistical analysis
Statistics involves data collection, interpretation, and validation. Statistical analysis is the technique of performing several statistical operations to quantify the data. Quantitative data here involve descriptive data, such as surveys and observational data, so this is also called descriptive analysis. Various tools are available for statistical data analysis, such as SAS (Statistical Analysis System), SPSS (Statistical Package for the Social Sciences), StatSoft, and more.
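The descriptive side of statistical analysis can be sketched with the standard library alone (dedicated packages like SAS or SPSS provide the same summaries at far larger scale); the survey scores below are invented.

```python
# Sketch: basic descriptive statistics over invented survey data.
import statistics

survey_scores = [3, 4, 4, 5, 2, 4, 3, 5]

mean = statistics.mean(survey_scores)      # central tendency
median = statistics.median(survey_scores)  # robust central tendency
spread = statistics.stdev(survey_scores)   # sample standard deviation
```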
5. Diagnostic analysis
Diagnostic analysis goes a step further than statistical analysis, providing more in-depth analysis to answer your questions. It is also referred to as root cause analysis, as it includes processes like data discovery, mining, and drill-down and drill-through.
The functions of diagnostic analytics fall into three categories:
Identify anomalies: After performing statistical analysis, analysts must identify areas requiring further study, since such data raise questions that cannot be answered simply by looking at the data.
Drill into the Analytics (discovery): Identification of the data sources helps analysts explain the
anomalies. This step often requires analysts to look for patterns outside the existing data sets and
requires pulling in data from external sources, thus identifying correlations and determining if any of them
are causal in nature.
Determine Causal Relationships: Hidden relationships are uncovered by looking at events that might
have resulted in the identified anomalies. Probability theory, regression analysis, filtering, and time-series
data analytics can all be useful for uncovering hidden stories in the data.
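The "identify anomalies" step can be sketched as a z-score screen: flag any reading that sits more than a chosen number of standard deviations from the mean. The readings and the 2-sigma threshold below are illustrative choices, not fixed rules.

```python
# Sketch: flagging anomalies as values far from the mean (z-score screen).
import statistics

readings = [10.1, 9.8, 10.0, 10.2, 9.9, 25.0, 10.1]  # one suspicious value

mu = statistics.mean(readings)
sigma = statistics.stdev(readings)

# Flag readings more than 2 standard deviations from the mean.
anomalies = [x for x in readings if abs(x - mu) / sigma > 2]
```

Each flagged value is then a candidate for the discovery and causal-analysis steps that follow.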
6. Predictive analysis
Predictive analysis uses historical data and feeds it into a machine learning model to find critical
patterns and trends. The model is applied to the current data to predict what would happen next. Many
organizations prefer it because of its various advantages like volume and type of data, faster and
cheaper computers, easy-to-use software, tighter economic conditions, and a need for competitive
differentiation.
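Predictive analysis in miniature: fit a least-squares trend to historical data, then apply the model beyond it. The monthly sales figures below are invented for illustration.

```python
# Sketch: fit a trend on historical (month, sales) pairs, then forecast ahead.
history = [(1, 100.0), (2, 110.0), (3, 120.0), (4, 130.0)]  # invented data

n = len(history)
mx = sum(m for m, _ in history) / n
my = sum(s for _, s in history) / n

# Ordinary least-squares slope and intercept.
slope = (sum((m - mx) * (s - my) for m, s in history)
         / sum((m - mx) ** 2 for m, _ in history))
intercept = my - slope * mx

# Apply the model to a future point.
forecast_month_5 = slope * 5 + intercept
```

Production predictive models add many variables and validation steps, but the shape is the same: learn from history, then extrapolate.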
The following are the common uses of predictive analysis:
Fraud Detection: Multiple analytics methods improve pattern detection and prevent criminal behavior.
Optimizing Marketing Campaigns: Predictive models help businesses attract, retain, and grow their
most profitable customers. It also helps in determining customer responses or purchases, promoting
cross-sell opportunities.
Improving Operations: The use of predictive models also involves forecasting inventory and managing
resources. For example, airlines use predictive models to set ticket prices.
Reducing Risk: Credit score that is used to assess a buyer’s likelihood of default for purchases is
generated by a predictive model that incorporates all data relevant to a person’s creditworthiness. Other
risk-related uses include insurance claims and collections.
7. Prescriptive Analysis
Prescriptive analytics suggests various courses of action and outlines the potential implications of each, building on the results of predictive analysis. For prescriptive analysis to generate automated decisions or recommendations, it requires specific and unique algorithms and clear direction from those utilizing the analytical techniques.
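A prescriptive layer can be as simple as explicit rules that map a predicted score to a recommended action. The thresholds and actions below are purely illustrative.

```python
# Sketch: prescriptive rules on top of a predicted default-risk score (0..1).
def recommend(default_risk):
    if default_risk > 0.7:
        return "decline"
    if default_risk > 0.3:
        return "approve with deposit"
    return "approve"
```

Real prescriptive systems optimize over many constraints, but the principle is the same: a prediction alone describes what will happen; the prescriptive step says what to do about it.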
2. Data Collection
o Gather your data based on your measurement parameters.
o Collect data from databases, websites, and many other sources. This data may not be structured
or uniform, which takes us to the next step.
3. Data Processing
o Organize your data and make sure to add side notes, if any.
o Cross-check data with reliable sources.
o Convert the data as per the scale of measurement you have defined earlier.
o Exclude irrelevant data.
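The processing steps above (convert to one scale of measurement, exclude irrelevant data) can be sketched on a toy record set; the records and height fields are invented.

```python
# Sketch: normalize units and drop incomplete records before analysis.
raw = [
    {"name": "A", "height_cm": 180},
    {"name": "B", "height_in": 70},   # different unit -> convert to cm
    {"name": "C"},                    # missing measurement -> exclude
]

clean = []
for row in raw:
    if "height_cm" in row:
        clean.append({"name": row["name"], "height_cm": row["height_cm"]})
    elif "height_in" in row:
        # Convert inches to the scale defined earlier (centimetres).
        clean.append({"name": row["name"],
                      "height_cm": round(row["height_in"] * 2.54, 1)})
    # rows with no height are irrelevant to this analysis and are dropped
```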
4. Data Analysis
o Once you have collected your data, perform sorting, plotting, and identifying correlations.
o As you manipulate and organize your data, you may need to traverse your steps again from the
beginning, where you may need to modify your question, redefine parameters, and reorganize your
data.
o Make use of the different tools available for data analysis.
Once you have an inference, always remember it is only a hypothesis; real-life scenarios may always interfere with your results. In the process of Data Analysis, a few related terms identify different phases of the process.
1. Data Mining
This process involves methods in finding patterns in the data sample.
2. Data Modelling
This refers to how an organization organizes and manages its data.
1. Excel
It has a variety of compelling features, and with additional plugins installed, it can handle a massive
amount of data. So, if your data does not approach big-data scale, Excel
can be a very versatile tool for data analysis.
2. Tableau
It falls under the BI tool category, made for the sole purpose of data analysis. The essence of Tableau lies in its pivot tables and pivot charts, and it works towards representing data in the most user-friendly way. It
additionally has a data cleaning feature along with brilliant analytical functions.
If you want to learn Tableau, udemy's online course Hands-On Tableau Training for Data Science can
be a great asset for you.
3. Power BI
It initially started as a plugin for Excel but was later separated from it to develop into one of the most popular data analytics tools. It comes in three versions: Free, Pro, and Premium. Its Power Pivot and DAX language
can implement sophisticated advanced analytics similar to writing Excel formulas.
4. Fine Report
Fine Report comes with a straightforward drag-and-drop operation, which helps to design various
styles of reports and build a data decision analysis system. It can directly connect to all kinds of
databases, and its format is similar to that of Excel. Additionally, it also provides a variety of dashboard
templates and several self-developed visual plug-in libraries.
5. R & Python
These are programming languages which are very powerful and flexible. R is best at statistical analysis,
such as normal distribution, cluster classification algorithms, and regression analysis. It also supports individual-level predictive analysis, such as a customer's behavior, spending, and preferred items based on browsing history. It also involves concepts of machine learning and artificial intelligence.
6. SAS
It is a programming language for data analytics and data manipulation, which can easily access data
from any source. SAS has introduced a broad set of customer profiling products for web, social media,
and marketing analytics. These can predict customer behavior and manage and optimize communications.
Conclusion
This is a complete beginner's guide to data analysis. Data analysis is key to any business, whether it be starting up a new venture, making marketing decisions, continuing with a particular course of action, or going for a complete shutdown. The inferences and the statistical probabilities calculated from data analysis help ground the most critical decisions by ruling out human bias. Different analytical tools have overlapping functions and different limitations, but they are
also complementary tools. Before choosing a data analytical tool, it is essential to take into account the
scope of work, infrastructure limitations, economic feasibility, and the final report to be prepared.
3. Programming Skills
The responsibilities of Data Analysts are more inclined towards data crafting and presentation than coding. However, without knowing programming languages, a data analyst cannot put this knowledge into practice. The programming languages in this context are R, Python, Matlab, and SAS. Knowledge of these dominant languages helps a data analyst perform advanced analytics on large datasets without depending on an additional programming expert. It is also a plus for recruiters.
4. Databases
In addition to programming languages and as a data analyst, you must have a sound knowledge of
databases. You need to understand the concepts of data storage, data warehouses, and data lakes.
SQL is the standard query language on current big data platforms. To work on data, you need to extract it from databases, which calls for expertise in SQL. It enables you to handle structured data, query it from databases, and perform data wrangling and data preparation. A Data Analyst, hence, must possess sound knowledge of RDBMS, SQL queries, indexes, keys, and tables. Some of the leading platforms to consider are Apache Hadoop, Oracle, MySQL, HiveQL, and Microsoft SQL Server.
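The SQL skills described above can be practiced against an in-memory SQLite database from Python's standard library; the same GROUP BY query carries over to MySQL, Oracle, and the rest. The table and rows below are invented.

```python
# Sketch: extracting and aggregating data with SQL via the stdlib sqlite3 module.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("north", 120.0), ("south", 80.0), ("north", 60.0)])

# A typical wrangling query: aggregate and order, straight from the database.
rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
```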
5. Communication Skills & Team Work
It is a non-technical skill but of utmost importance. Data Visualization and Storytelling are essential
responsibilities of a Data Analyst. A Data Analyst needs to work with several members of the
organization, such as business analysts, software teams, marketing teams, and more. You need to
communicate with your audience. Before that, you must be able to acquire all the relevant information
required to perform your analysis. Without a fluent ability to communicate, it is also challenging to work
as a team.
6. Research:
In order to infer accurate trends and patterns, a Data Analyst needs the ability to research the right area to the right level of detail. A Data Analyst should have the ability to select all
relevant and necessary data in order to reach an accurate insight and present a strong argument for the
company’s next big decision.
7. Problem-Solving Skills:
Despite all the attention to detail, there are instances when problems arise in algorithms. In such
situations, data analysts are expected to use their problem-solving skills, work with the team,
troubleshoot what went wrong and provide a solution via data analysis.
Career Prospects
The career path of a data analyst mainly depends upon the employer. Employers can be major investment firms, the healthcare industry, retail, hospitality, marketing, insurance, or technology firms. At times, Data Analysts are labeled as Information Scientists, which involves working with the organization's core database infrastructure and thus acquiring additional technical expertise. Government sectors, insurance, and healthcare are domains that rely heavily on information scientists for their deep data infrastructures. Job opportunities for Data Analysts are plentiful, with a multilane career path.
Data Analysts can have the following career prospects:
Imagine you wake up thinking of your dream holiday destination and jump online to explore more about it. You search for the place and enjoy reading about it. Then you log into Facebook. What do you see? Advertisements for your dream destination pop up in every corner of the screen. This implies that
smart digital assistants track your search and load you with additional information that might help you
make your dream come true.
That is where Big Data and Data Analytics tools and techniques help unfold the world of hidden, yet
targeted information.
One prediction for 2021 said that each user would create 1.7 megabytes of new data every second, and
that within a year 44 trillion gigabytes of data would have accumulated in the world. This raw data
needs to be analyzed for business decision-making, optimizing business performance, studying
customer trends, and delivering better products and services.
There are many tools to assist this data-driven decision-making process, and choosing the right one is
a challenge for data scientists and data analysts. Common questions that may run through your mind
are: how many users does a tool have, how easy is it to learn, and how is it placed in the market? If you
are a business owner, you may also be concerned about the cost of owning such a tool. The seven
tools covered here are:
1. Python
2. R
3. SAS
4. Excel
5. Power BI
6. Tableau
7. Apache Spark
1. Python
Python was initially designed as an object-oriented programming language for software and web
development and was later enhanced for data science. Python is one of the fastest-growing
programming languages today.
It is a powerful Data Analysis tool and has a great set of friendly libraries for any aspect of scientific
computing.
Python is free, open-source software, and it is easy to learn.
Python’s data analysis library Pandas was built on top of NumPy, one of the earliest Python libraries for
data science.
With Pandas, you can do almost anything! You can perform advanced data manipulation and numeric
analysis using data frames.
Pandas supports multiple file formats; for example, you can import data from Excel spreadsheets and
process it for time-series analysis. (By definition, time-series analysis is a statistical technique that
analyzes time-series data, i.e., data collected at regular intervals of time.)
Pandas is a powerful tool for data visualization, data masking, merging, indexing and grouping data,
data cleaning, and much more.
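As a minimal, hypothetical sketch of the kind of time-series work Pandas makes easy (the sales figures and the 3-day bucket size below are invented for illustration; in practice the data might come from an Excel import):

```python
import pandas as pd

# Invented daily sales figures; real data might arrive via pd.read_excel().
dates = pd.date_range("2021-01-01", periods=6, freq="D")
sales = pd.DataFrame({"units": [10, 12, 9, 14, 11, 13]}, index=dates)

# Group the daily series into 3-day buckets and total each bucket --
# a minimal form of time-series analysis with resample().
three_day_totals = sales["units"].resample("3D").sum()
print(three_day_totals.tolist())  # [31, 38]
```

The same pattern extends to other frequencies ("W" for weekly, "M" for monthly) and other aggregations such as mean() or max().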
To know more about Pandas, check out the Python Pandas tutorials.
Other libraries, such as SciPy, scikit-learn, and StatsModels, are used for statistical modeling,
mathematical algorithms, machine learning, and data mining.
Matplotlib, Seaborn, and VisPy are packages for data visualization and graphical analysis.
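As one small example of what these statistics libraries offer, SciPy’s stats module can fit a least-squares line to a pair of variables, a basic piece of statistical modeling (the data points below are invented purely for illustration):

```python
from scipy import stats

# Invented paired observations, e.g. ad spend (x) vs. revenue (y).
x = [1, 2, 3, 4, 5]
y = [2.1, 4.0, 6.2, 7.9, 10.1]

# Ordinary least-squares fit; the result carries slope, intercept,
# correlation coefficient, p-value, and standard error.
result = stats.linregress(x, y)
print(round(result.slope, 2), round(result.intercept, 2))  # 1.99 0.09
```

The fitted slope and intercept can then feed directly into a Matplotlib plot of the regression line over the scatter of points.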
Python has an extensive developer community for support and is among the most widely used
languages. Top companies that use Python for data analysis include Spotify, Netflix, NASA, Google,
CERN, and many more.
2. R
R is a leading programming language for statistical modeling, visualization, and data analysis. It is
used mainly by statisticians for statistical analysis, Big Data, and machine learning.
R is a free, open-source programming language and has many enhancements in the form of
user-written packages.
R has a steep learning curve and needs some amount of working knowledge of coding. However, it is a
great language when it comes to syntax and consistency.
R is a winner when it comes to EDA. (By definition, in statistics, exploratory data analysis (EDA) is an
approach to analyzing data sets to summarize their main characteristics, often with visual methods.)
Data manipulation in R is easy with packages such as plyr, dplyr, and tidyr.
R is excellent for data visualization and analysis, with packages such as ggplot2, lattice, ggvis, etc.
R has a huge community of developers for support.
R is used by
o Facebook - For behavior analysis related to status updates and profile pictures.
o Google - For advertising effectiveness and economic forecasting.
o Twitter - For data visualization and semantic clustering
o Uber - For statistical analysis
3. SAS
SAS is a statistical software suite widely used for BI (Business Intelligence), data management, and
predictive analysis.
SAS is proprietary software, and companies need to pay to use it. A free university edition has been
introduced for students to learn and use SAS.
SAS has a simple GUI and hence is easy to learn; however, good knowledge of SAS programming is an
added advantage when using the tool.
SAS’s DATA step (where data is created, imported, modified, merged, or calculated) helps in efficient
data handling and manipulation.
SAS’s Visual Analytics software is a powerful tool for interactive dashboards, reports, BI, self-service
analytics, text analytics, and smart visualizations.
SAS is widely used in the pharmaceutical industry, BI, and weather forecasting.
Since SAS is a paid service, it offers 24x7 customer support to help with your doubts.
Google, Facebook, Netflix, and Twitter are a few of the companies that use SAS. SAS is used for clinical
research reporting at Novartis and Covance, while Citibank, Apple, Deloitte, and many others use SAS
for predictive analysis.
4. Excel
Excel is a spreadsheet and a simple yet powerful tool for data collection and analysis.
Excel is not free; it is a part of the Microsoft Office “suite” of programs.
Excel needs no setup before you enter data; you can start right away.
It is readily available, widely used, and easy to learn, making it an easy way to start on data analysis.
The Data Analysis Toolpak in Excel offers a variety of options to perform statistical analysis of your data.
The charts and graphs in Excel give a clear interpretation and visualization of your data, which helps in
decision making as they are easy to understand.
The Analysis Toolpak add-in needs to be enabled and configured in Excel first (under File > Options >
Add-ins).
Once the Toolpak has been set up, you will see the list of tools. You can choose the tool based on your
goals and the information that you want to analyze.
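For instance, the Toolpak’s Descriptive Statistics tool reports summary figures such as the mean, median, and standard deviation for a selected range of cells. A rough Python equivalent of that kind of output, using invented sample values standing in for a spreadsheet range:

```python
import statistics

# Invented sample, standing in for a selected range of spreadsheet cells.
values = [23, 29, 20, 32, 27, 25]

print(statistics.mean(values))             # 26
print(statistics.median(values))           # 26.0
print(round(statistics.stdev(values), 2))  # 4.29  (sample standard deviation)
```

In Excel itself, the same figures come from AVERAGE, MEDIAN, and STDEV.S, or appear together in the Descriptive Statistics report.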
Excel is used by more than 750 million users across the world.
5. Power BI
Power BI is Microsoft’s business analytics and data visualization service. Gartner names Microsoft a
Leader in its Magic Quadrant for analytics and business intelligence platforms.
Top companies using Power BI are Nestle, Tenneco, Ecolab, and more.
6. Tableau
Tableau is a BI(Business Intelligence) tool developed for data analysts where one can visualize, analyze,
and understand their data.
Tableau is not free software, and its pricing varies with different data needs.
It is easy to learn and deploy Tableau.
7. Apache Spark
Spark is a unified analytics engine for Big Data processing, designed for developers, researchers, and
data scientists.
It is free and open source, and a wide range of developers contribute to its development.
It is a high-performance tool and works well for batch and streaming data.
Learning Spark is easy, and you can use it interactively from the Scala, Python, R, and SQL shells too.
Spark can run on any platform such as Hadoop, Apache Mesos, standalone, or in the cloud. It can access
diverse data sources.
Spark includes libraries such as
o Spark SQL - for SQL and structured data
o MLlib - for machine learning
o Spark Streaming - for live data-stream processing
o GraphX - for graph analytics
Uber, Slack, Shopify, and many other companies use Apache Spark for data analytics.
Summary
I am sure that by now you have a fair understanding of data analytics tools. To move ahead in your data
analytics journey and find the right tool, you need to invest quite a bit of time in understanding your
and/or your organization’s data needs, then scout around, analyze the various tools available in the
market, and decide.