Basic Rules and What is Data Analysis


12.1 Definition of Tabulation
12.2 Purpose of Tabulation
12.3 Steps in Tabulation
12.4 Basic Structure of a Table
12.5 Considerations in the Construction of a Table
12.6 Deciding to Use a Figure
12.7 Types of Figure
12.8 Preparation of Figure
12.9 Creating Graphs
12.10 Figure Legends and Captions
12.11 Figure Checklist

STEPS IN TABULATION

The following steps are followed in tabulation:
● Transform the information, according to the classification, from the questionnaire to worksheets to facilitate handling.
● After the information has been summarized in the worksheet, prepare a draft table.
● Prepare the final table containing the results of the draft table.

Forms of Table

Tables may be single, double, triple, or manifold, according to the number of characteristics covered by the table. A practical illustration makes the idea clearer. A simple table shows only one characteristic: the data are presented in terms of only one of their characteristics. In a two-fold table, two characteristics are included.

Specific Types of Tables

Analysis of Variance (ANOVA) Tables: The conventional format for an ANOVA table is to list the source in the stub column, then the degrees of freedom (df) and the F ratios. Give the between-subjects variables and error first, then the within-subjects variables and any error. Mean square errors must be enclosed in parentheses. Provide a general note to the table to explain what those values mean (see example). Use asterisks to identify statistically significant F ratios, and provide a probability footnote.

Regression Tables: Conventional reporting of regression analysis follows two formats. If the study is purely applied, list only the raw or unstandardized coefficients (B). If the study is purely theoretical, list only the standardized coefficients (beta). If the study is neither purely applied nor purely theoretical, list both standardized and unstandardized coefficients. Specify the type of analysis, either hierarchical or simultaneous, and provide the increments of change if you used hierarchical regression.
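A short sketch of how both kinds of coefficients can be obtained, assuming statsmodels; the variables and synthetic data are illustrative, not from any study described here.

```python
# Sketch: unstandardized (B) and standardized (beta) coefficients for a table.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
df = pd.DataFrame(rng.normal(size=(200, 3)), columns=["income", "education", "age"])
df["satisfaction"] = 0.4 * df["income"] + 0.2 * df["education"] + rng.normal(size=200)

X = sm.add_constant(df[["income", "education", "age"]])
raw = sm.OLS(df["satisfaction"], X).fit()       # unstandardized B

z = (df - df.mean()) / df.std()                 # z-score every variable
std = sm.OLS(z["satisfaction"], z[["income", "education", "age"]]).fit()  # beta

report = pd.DataFrame({"B": raw.params.drop("const"), "beta": std.params})
print(report.round(3))
```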

Path and LISREL (Linear Structural Relations) Tables: Present the means, standard deviations, and intercorrelations of the entire set of variables you use as input to path and LISREL analyses. These data are essential for the reader to replicate or confirm your analyses and are necessary for archival purposes if your study is included in a meta-analysis. To help the reader interpret your table, give short descriptions of the x and y variables used in the models instead of just a list of symbols. If you need to use acronyms, be sure to define each one.
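A minimal sketch of assembling that descriptive table with pandas; the variable names are illustrative placeholders.

```python
# Sketch: means, SDs, and intercorrelations of the variables fed into a model.
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
data = pd.DataFrame(rng.normal(size=(150, 4)),
                    columns=["parental_support", "self_esteem", "gpa", "absences"])

summary = pd.DataFrame({"M": data.mean(), "SD": data.std()})
correlations = data.corr().round(2)

print(summary.round(2))
print(correlations)   # report these so readers can replicate or reuse the analysis
```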

Word Tables: Unlike most tables, which present quantitative data, some tables consist mainly of words. Word tables
present qualitative comparisons or descriptive information. For example, a word table can enable the reader to compare
characteristics of studies in an article that reviews many studies, or it can present questions and responses from a survey or
show an outline of the elements of a theory. Word tables illustrate the discussion in the text; they should not repeat the
discussion. Word tables include the same elements of format as do other types of tables - table number and title, headings,
rules, and possibly notes. Keep column entries brief and simple. Indent any runover lines in entries. Double-space all parts
of a word table.

Notes in Tables

There are three types of notes for tables - general, specific, and probability notes. All of them must be placed below the
table in that order. General notes explain, qualify or provide information about the table as a whole. Put explanations of
abbreviations, symbols, etc. here. Example: Note. The racial categories used by the US Census (African-American, Asian
American, Native-American, and Pacific Islander) have been collapsed into the category ‘non-White’. E = excludes
respondents who self-identified as ‘White’ and at least one other ‘non-White’ race. Specific notes explain, qualify or
provide information about a particular column, row, or individual entry. To indicate specific notes, use superscript
lowercase letters (e.g. a , b , c ), and order the superscripts from left to right, top to bottom. Each table’s first footnote must
be superscript a . Example: a n = 823. b One participant in this group was diagnosed with schizophrenia during the survey.
Probability notes provide the reader with the results of the tests for statistical significance. Asterisks indicate the values for which the null hypothesis is rejected, with the probability (p-value) specified in the probability note. Such notes are required only when relevant to the data in the table. Consistently use the same number of asterisks for a given alpha level throughout your paper. Example: *p < .05.

BASIC STRUCTURE OF A TABLE

The following parts make up the basic structure of a table.

Numbers: Number all tables sequentially with Arabic numerals. Do not use suffix letters (e.g., Table 3a, 3b, 3c); instead, combine the related tables. If the manuscript includes an appendix with tables, identify them with capital letters and Arabic numerals (e.g., Table A1, Table B2).

Titles: Like the title of the paper itself, each table must have a clear and concise title. When appropriate, you may use the title to explain an abbreviation parenthetically. Example: Comparison of Median Income of Adopted Children (AC) v. Foster Children (FC).

Headings: Keep headings clear and brief. A heading should not be much wider than the widest entry in its column; standard abbreviations can help achieve that goal. All columns must have headings, even the stub column, which customarily lists the major independent variables.

Body: In reporting the data, consistency is key. Numerals should be expressed to a consistent number of decimal places, determined by the precision of measurement. Never change the unit of measurement or the number of decimal places within the same column.

More specifically, the different parts of a table are:
● Title: Each table has a title describing its contents.
● Stub Head: It describes the characteristic of the stub entries.
● Stub Entries: These are the classifications of the actual data.
● Caption Head: This explains the data placed in each column under the caption head.
● Body: It contains the data in classified form.
● Foot Note: It may be used to describe anything about the entries.
● Source: It discloses the source of the information.
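A quick sketch of enforcing a consistent number of decimal places across a column, using pandas; the table contents and column names are illustrative.

```python
# Sketch: keep every entry in a numeric column to the same number of decimals.
import pandas as pd

table = pd.DataFrame({
    "Group": ["Control", "Treatment"],
    "M": [12.3456, 15.2],
    "SD": [2.5, 3.14159],
})

formatted = table.copy()
for col in ["M", "SD"]:
    # two decimals applied column-wide, not cell by cell
    formatted[col] = table[col].map("{:.2f}".format)
print(formatted.to_string(index=False))
```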

Table Checklist
● Is the table necessary?
● Is the entire table single- or double-spaced (including the title, headings, and notes)?
● Are all comparable tables presented consistently?
● Is the title brief but explanatory?
● Does every column have a column heading?
● Are all abbreviations; special use of italics, parentheses, and dashes; and special symbols explained?
● Are all probability level values correctly identified, and are asterisks attached to the appropriate table entries? Is a probability level assigned the same number of asterisks in all the tables in the same document?
● Are the notes organized according to the convention of general, specific, probability?
● Are all vertical rules eliminated?
● If the table or its data are from another source, is the source properly cited?
● Is the table referred to in the text?

DECIDING TO USE FIGURE

In APA journals, any type of illustration other than a table is called a figure. Because tables are typeset, rather than photographed from artwork supplied by the author, they are not considered figures. A figure may be a chart, graph, photograph, drawing, or other depiction. Consider carefully whether to use a figure. Tables are often preferred for the presentation of quantitative data in archival journals because they provide exact information; figures typically require the reader to estimate values. On the other hand, figures convey at a quick glance an overall pattern of results. They are especially useful in describing an interaction - or lack thereof - and nonlinear relations. A well-prepared figure can also convey structural or pictorial concepts more efficiently than text can. During the process of drafting a manuscript, and in deciding whether to use a figure, ask yourself these questions:
● What idea do you need to convey?
● Is the figure necessary? If it duplicates text, it is not necessary. If it complements text or eliminates lengthy discussion, it may be the most efficient way to present the information.
● What type of figure (e.g., graph, chart, diagram, drawing, map, or photograph) is most suited to your purpose? Will a simple, relatively inexpensive figure (e.g., line art) convey the point as well as an elaborate, expensive figure (e.g., photographs combined with line art, or figures in color instead of black and white)?

Standards for Figures

The standards for good figures are simplicity, clarity, and continuity. A good figure:
● augments rather than duplicates the text;
● conveys only essential facts;
● omits visually distracting detail;
● is easy to read - its elements (type, lines, labels, symbols, etc.) are large enough to be read with ease in the printed form;
● is easy to understand - its purpose is readily apparent;
● is consistent with and is prepared in the same style as similar figures in the same article; that is, the lettering is of the same size and typeface, lines are of the same weight, and so forth; and
● is carefully planned and prepared.

12.7 TYPES OF FIGURE

Several types of figures can be used to present data to the reader. Sometimes the choice of which type to use will be obvious, but at other times it will not.
● Graphs are good at quickly conveying relationships such as comparison and distribution. The most common forms of graphs are scatter plots, line graphs, bar graphs, pictorial graphs, and pie graphs. A graph shows relations - comparison and distribution - in a set of data and may show, for example, absolute values, percentages, or index numbers.

● Scatter plots are composed of individual dots that represent the value of a specific event on the scale established by the two variables plotted on the x- and y-axes. When the dots cluster together, a correlation is implied. On the other hand, when the dots are scattered randomly, no correlation is seen. For example, a cluster of dots along a diagonal implies a linear relationship, and if all the dots fall on a diagonal line, the coefficient of correlation is 1.00.

● Line graphs depict the relationship between quantitative variables. Customarily, the independent variable is plotted along the x-axis (horizontally) and the dependent variable is plotted along the y-axis (vertically).

● Bar graphs come in three main types: (1) solid vertical or horizontal bars, (2) multiple bar graphs, and (3) sliding bars. In solid bar graphs, the independent variable is categorical, and each bar represents one kind of datum, e.g., a bar graph of monthly expenditures. A multiple bar graph can show more complex information than a simple bar graph, e.g., monthly expenditures divided into categories (housing, food, transportation, etc.). In sliding bar graphs, the bars are divided by a horizontal line which serves as the baseline, enabling the representation of data above and below a specific reference point, e.g., high and low temperatures v. average temperature.
● Pictorial graphs can be used to show quantitative differences between groups. Pictorial graphs can be very deceptive: if the height of an image is doubled, its area is quadrupled. Therefore, take great care that images representing the same values are the same size.
● Circle (or pie) graphs, or 100% graphs, are used to represent percentages and proportions. For the sake of readability, no more than five variables should be compared in a single pie graph. The segments should be ordered very strictly: beginning at twelve o'clock, order them from the largest to the smallest, and shade the segments from dark to light (i.e., the largest segment should be the darkest). Lines and dots can be used for shading in black-and-white documents.
● Charts can describe the relations between parts of a group or object or the sequence of operations in a process; charts are usually boxes connected with lines. For example, organizational charts show the hierarchy in a group, flowcharts show the sequence of steps in a process, and schematics show components in a system.
● Dot maps can show population density, and shaded maps can show averages or percentages. In these cases, plotted data are superimposed on a map. Maps should always be prepared by a professional artist, who should clearly indicate the compass orientation (e.g., north-south) of the map, fully identify the map's location, and provide the scale to which the map is drawn. Use arrows to help readers focus on reference points.
● Drawings and photographs can be used to communicate very specific information about a subject. Thanks to software, both are now highly manipulable. For the sake of readability and simplicity, line drawings should be used, and photographs should have the highest possible contrast between the background and focal point. Cropping, cutting out extraneous detail, can be very beneficial for a photograph. Use software like GraphicConverter or Photoshop to convert color photographs to black and white before printing on a laser printer; otherwise most printers will produce an image with poor contrast.

12.8 PREPARATION OF FIGURE
In preparing figures, communication and readability must be the ultimate criteria. Avoid the temptation to use the special effects available in most advanced software packages. While three-dimensional effects, shading, and layered text may look interesting to the author, overuse, inconsistent use, and misuse may distort the data and distract or even annoy readers. Design properly done is inconspicuous, almost invisible, because it supports communication; design improperly or amateurishly done draws the reader's attention from the data and makes him or her question the author's credibility. The APA has determined specifications for the size of figures and the fonts used in them. One-column figures must be between 2 and 3.25 inches wide (5 to 8.45 cm). Two-column figures must be between 4.25 and 6.875 inches wide (10.6 to 17.5 cm). The height of figures should not exceed the top and bottom margins. The text in a figure should be in a sans serif font (such as Helvetica, Arial, or Futura). The font size must be between eight and fourteen points. Use circles and squares to distinguish curves on a line graph (at the same font size as the other labels).

12.9 CREATING GRAPHS

Follow these guidelines when creating a graph mechanically or with a computer.
Computer software that generates graphs will often handle most of these steps automatically (a sketch applying these guidelines follows the list).
● Use bright white paper.
● Use medium lines for the vertical and horizontal axes. The best aspect ratio of the graph may depend on the data.
● Choose the appropriate grid scale. Consider the range and scale separation to be used on both axes and the overall dimensions of the figure so that plotted curves span the entire illustration.
● In line graphs, a change in the proportionate sizes of the x units to the y units changes the slant of the line.
● Indicate units of measurement by placing tick marks on each axis at the appropriate intervals. Use equal increments of space between tick marks on linear scales.
● If the units of measurement on the axes do not begin at zero, break the axes with a double slash.
● Clearly label each axis with both the quantity measured and the units in which the quantity is measured. Carry numerical labels for axis intervals to the same number of decimal places.
● Position the axis label parallel to its axis. Do not stack letters so that the label reads vertically; do not place a label perpendicular to the vertical (y) axis unless it is very short (i.e., two words or a maximum of 10 characters). The numbering and lettering of grid points should be horizontal on both axes.
● Use legibility as a guide in determining the number of curves to place on a figure - usually no more than four curves per graph. Allow adequate space between and within curves, remembering that the figure may need to be reduced.
● Use distinct, simple geometric forms for plot points; good choices are open and solid circles and triangles. Combinations of squares and circles or squares and diamonds are not recommended because they can be difficult to differentiate if the art is reduced, as can open symbols with dots inside.
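A minimal matplotlib sketch applying several of these guidelines (labeled axes with units, evenly spaced tick marks, at most a few curves, distinct open and solid symbols); the data and group names are illustrative.

```python
# Sketch: a simple line graph prepared along the lines of the checklist above.
import numpy as np
import matplotlib.pyplot as plt

trials = np.arange(1, 11)                               # independent variable (x-axis)
recall_a = 40 + 4 * trials + np.random.default_rng(3).normal(0, 2, 10)
recall_b = 35 + 3 * trials + np.random.default_rng(4).normal(0, 2, 10)

fig, ax = plt.subplots(figsize=(3.25, 2.5))             # roughly one-column width, in inches
ax.plot(trials, recall_a, marker="o", color="black", label="Group A")                # solid circles
ax.plot(trials, recall_b, marker="^", mfc="white", color="black", label="Group B")   # open triangles
ax.set_xlabel("Trial number")
ax.set_ylabel("Words recalled (count)")
ax.set_xticks(trials)                                   # equal increments between tick marks
ax.legend(frameon=False)
fig.tight_layout()
fig.savefig("figure1.png", dpi=300)
```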
12.10 FIGURE LEGENDS AND CAPTIONS

In APA journals, a legend explains the symbols used in the figure; it is placed within and photographed as part of the figure. A caption is a concise explanation of the figure; it is typeset and placed below the figure. For figures, make sure to include the figure number and a title with a legend and caption. These elements appear below the visual display. For the figure number, type Figure X. Then type the title of the figure in sentence case. Follow the title with a legend that explains the symbols in the figure and a caption that explains the figure. For example: Figure 1. How to create figures in APA style. This figure illustrates effective elements in APA style figures. Captions serve as a brief, but complete, explanation and as a title. For example, 'Figure 4. Population' is insufficient, whereas 'Figure 4. Population of Grand Rapids, MI by race (1980)' is better. If the figure has a title in the image, crop it. Graphs should always include a legend that explains the symbols, abbreviations, and terminology used in the figure. These terms must be consistent with those used in the text and in other figures. The lettering in the legend should be of the same type and size as that used in the figure.

FIGURE CHECKLIST

APA includes the following within the definition of figures: graphs, charts, maps, drawings, and photographs.

Figure Checklist
● Is the figure necessary?
● Is the figure simple, clean, and free of extraneous detail?
● Are the data plotted accurately?
● Is the grid scale correctly proportioned?
● Is the lettering large and dark enough to read? Is the lettering compatible in size with the rest of the figure?
● Are parallel figures or equally important figures prepared according to the same scale?
● Are terms spelled correctly?
● Are all abbreviations and symbols explained in a figure legend or figure caption? Are the symbols, abbreviations, and terminology in the figure consistent with those in the figure caption? In other figures? In the text?
● Are the figures numbered consecutively with Arabic numerals?
● Are all figures mentioned in the text?

Guidelines for Figures within Assignments

These guidelines have been adapted from the Publication Manual of the American Psychological Association.
● Be selective in which figures, and how many figures, you include within your text.
● Figures should supplement rather than duplicate your text.
● Ensure for all figures included that lines are smooth and sharp, the typeface is legible, any units of measure are included, axes are clearly identified, and elements within the figure are labeled and explained.
● If required, include a legend explaining the symbols used within a figure.
● Refer to every figure within your text by the figure number (e.g., Figure 1), highlighting only the point you want to emphasize.
● Figures must be numbered consecutively in the order in which they appear within the text, in italics. That is, the first figure is labeled "Figure 1", the second "Figure 2", and so on.
● Include the figure number directly below the figure itself, followed by a full stop and then a brief title.
● Capitalize only the first word of the title and any proper nouns. The title should be descriptive of the contents of the figure.
● Include any additional notes for the figure directly underneath the figure.
● Any changes to a figure from the original must be identified and included as a Note.

What is Data Analysis? Definition & Example


The systematic application of statistical and logical techniques to describe the scope of the data, modularize the data structure, condense the data representation, illustrate it via images, tables, and graphs, and evaluate statistical inclinations and probabilities in order to derive meaningful conclusions is known as Data Analysis. These analytical procedures enable us to induce the underlying inference from data by eliminating the unnecessary chaos created by the rest of it. The generation of data is a continual process; this makes data analysis a continuous, iterative process in which data collection and data analysis happen simultaneously. Ensuring data integrity is one of the essential components of data analysis.
There are various examples where data analysis is used, ranging from transportation, risk and fraud detection, customer interaction, city planning, and healthcare to web search, digital advertisement, and more. Consider the example of healthcare: with the outbreak of the Coronavirus pandemic, hospitals face the challenge of coping with the pressure of treating as many patients as possible, and data analysis allows them to monitor machine and data usage in such scenarios to achieve efficiency gains.

● Ensure the availability of the necessary analytical skills.
● Ensure appropriate implementation of data collection methods and analysis.
● Determine the statistical significance.
● Check for inappropriate analysis.
● Ensure the presence of legitimate and unbiased inference.
● Ensure the reliability and validity of data, data sources, data analysis methods, and the inferences derived.
● Account for the extent of analysis.

Data Analysis Methods


There are two main methods of Data Analysis:

1. Qualitative Analysis
This approach mainly answers questions such as 'why,' 'what,' or 'how.' Each of these questions is addressed via techniques such as questionnaires, attitude scaling, standard outcomes, and more. Such analysis is usually in the form of texts and narratives, which might also include audio and video representations.

2. Quantitative Analysis
Generally, this analysis is measured in terms of numbers. The data here present themselves in terms of measurement scales and lend themselves to further statistical manipulation.

The other techniques include:

3. Text analysis
Text analysis is a technique to analyze texts to extract machine-readable facts. It aims to create structured data out of free and unstructured content. The process consists of slicing and dicing heaps of unstructured, heterogeneous files into easy-to-read, manageable, and interpretable data pieces. It is also known as text mining, text analytics, and information extraction.
The ambiguity of human languages is the biggest challenge of text analysis. For example, humans know that "Red Sox Tames Bull" refers to a baseball match, but if this text is fed to a computer without background knowledge, it could generate several linguistically valid interpretations; even people not interested in baseball might have trouble understanding it.
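A small sketch of turning free text into structured, machine-readable counts, assuming a recent scikit-learn is installed; the documents are illustrative.

```python
# Sketch: bag-of-words extraction - structured data from unstructured text.
from sklearn.feature_extraction.text import CountVectorizer
import pandas as pd

docs = [
    "Red Sox tame Bulls in extra innings",
    "Bulls rally past Red Sox",
    "Stock market rallies on earnings",
]

vectorizer = CountVectorizer(lowercase=True, stop_words="english")
matrix = vectorizer.fit_transform(docs)          # documents x terms sparse matrix

terms = pd.DataFrame(matrix.toarray(), columns=vectorizer.get_feature_names_out())
print(terms)   # each row is now a structured, machine-readable record of a document
```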

4. Statistical analysis
Statistics involves data collection, interpretation, and validation. Statistical analysis is the technique of performing several statistical operations to quantify the data. The quantitative data involved are typically descriptive data, such as survey and observational data, so this is also called descriptive analysis. Various tools can perform statistical data analysis, such as SAS (Statistical Analysis System), SPSS (Statistical Package for the Social Sciences), StatSoft, and more.
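A minimal sketch of the same kind of descriptive summary in Python; the survey-style data are synthetic and illustrative.

```python
# Sketch: basic descriptive statistics, analogous to what SAS or SPSS would report.
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)
survey = pd.DataFrame({
    "age": rng.integers(18, 65, 200),
    "satisfaction": rng.integers(1, 6, 200),     # 1-5 Likert-style scale
    "region": rng.choice(["north", "south", "east", "west"], 200),
})

print(survey.describe())                                        # count, mean, std, quartiles
print(survey.groupby("region")["satisfaction"].mean().round(2)) # group-wise means
```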

5. Diagnostic analysis
Diagnostic analysis is a step beyond statistical analysis; it provides a more in-depth analysis to answer the questions raised. It is also referred to as root cause analysis, as it includes processes like data discovery, mining, and drill-down and drill-through.
The functions of diagnostic analytics fall into three categories:

● Identify anomalies: After performing statistical analysis, analysts must identify areas requiring further study, because such data raise questions that cannot be answered simply by looking at the data (a sketch follows this list).
● Drill into the analytics (discovery): Identifying the data sources helps analysts explain the anomalies. This step often requires analysts to look for patterns outside the existing data sets, and it requires pulling in data from external sources, thus identifying correlations and determining whether any of them are causal in nature.
● Determine causal relationships: Hidden relationships are uncovered by looking at events that might have resulted in the identified anomalies. Probability theory, regression analysis, filtering, and time-series data analytics can all be useful for uncovering hidden stories in the data.
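A minimal sketch of the "identify anomalies" step using a simple z-score rule; the threshold and the synthetic sales series are illustrative, not a prescribed method.

```python
# Sketch: flag unusual observations that warrant drill-down analysis.
import numpy as np
import pandas as pd

rng = np.random.default_rng(6)
daily_sales = pd.Series(rng.normal(1000, 50, 90))
daily_sales.iloc[[20, 55]] = [1400, 620]          # inject two unusual days

z = (daily_sales - daily_sales.mean()) / daily_sales.std()
anomalies = daily_sales[z.abs() > 3]              # points more than 3 standard deviations out
print(anomalies)                                  # these days are the candidates for root cause analysis
```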

6. Predictive analysis
Predictive analysis uses historical data and feeds it into a machine learning model to find critical patterns and trends. The model is then applied to current data to predict what will happen next. Many organizations prefer it because of drivers such as growing volumes and types of data, faster and cheaper computers, easier-to-use software, tighter economic conditions, and the need for competitive differentiation.
The following are the common uses of predictive analysis:

● Fraud Detection: Combining multiple analytics methods improves pattern detection and helps prevent criminal behavior (a sketch follows this list).
● Optimizing Marketing Campaigns: Predictive models help businesses attract, retain, and grow their most profitable customers. They also help in determining customer responses or purchases and in promoting cross-sell opportunities.
● Improving Operations: Predictive models are also used for forecasting inventory and managing resources. For example, airlines use predictive models to set ticket prices.
● Reducing Risk: The credit score used to assess a buyer's likelihood of default is generated by a predictive model that incorporates all data relevant to a person's creditworthiness. Other risk-related uses include insurance claims and collections.

7. Prescriptive Analysis
Prescriptive analytics suggests various courses of action and outlines the potential implications of each, building on the results of predictive analysis. Generating automated decisions or recommendations with prescriptive analysis requires specific and unique algorithms and clear direction from those utilizing the analytical techniques.

Data Analysis Process


Once you set out to collect data for analysis, you can be overwhelmed by the amount of information you find before reaching a clear, concise decision. With so much data to handle, you need to identify the data relevant to your analysis in order to derive accurate conclusions and make informed decisions. The following simple steps help you identify and sort out your data for analysis.

1. Data Requirement Specification - define your scope:


o Define short and straightforward questions, the answers to which you ultimately need in order to make a decision.
o Define measurement parameters.
o Define which parameters you will take into account and which ones you are willing to negotiate.
o Define your unit of measurement, e.g., time, currency, salary, and more.

2. Data Collection
o Gather your data based on your measurement parameters.
o Collect data from databases, websites, and many other sources. This data may not be structured
or uniform, which takes us to the next step.

3. Data Processing
o Organize your data and make sure to add side notes, if any.
o Cross-check data with reliable sources.
o Convert the data as per the scale of measurement you have defined earlier.
o Exclude irrelevant data.

4. Data Analysis
o Once you have collected your data, sort it, plot it, and identify correlations.
o As you manipulate and organize your data, you may need to retrace your steps from the beginning: you may need to modify your question, redefine parameters, and reorganize your data.
o Make use of the different tools available for data analysis.

5. Infer and Interpret Results


o Review if the result answers your initial questions
o Review if you have considered all parameters for making the decision
o Review if there is any hindering factor for implementing the decision.
o Choose data visualization techniques to communicate the message better. These visualization
techniques may be charts, graphs, color coding, and more.
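A condensed sketch of these steps as a small pandas pipeline; the CSV file name and column names are hypothetical and only stand in for whatever sources and units your own project uses.

```python
# Sketch of the process above: collect, process, analyze, and communicate.
import pandas as pd

# 2. Data Collection - load raw, possibly non-uniform data (hypothetical file).
raw = pd.read_csv("sales_raw.csv")

# 3. Data Processing - clean, convert to the defined unit, drop irrelevant rows.
df = (raw
      .dropna(subset=["order_date", "amount"])
      .assign(order_date=lambda d: pd.to_datetime(d["order_date"]),
              amount_usd=lambda d: d["amount"] / 100.0)   # cents -> dollars
      .query("amount_usd > 0"))

# 4. Data Analysis - aggregate and look for patterns.
monthly = df.set_index("order_date")["amount_usd"].resample("MS").sum()
print(monthly.describe())

# 5. Infer and interpret - visualize the trend to communicate the message.
monthly.plot(title="Monthly revenue (USD)")
```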

Once you have an inference, always remember that it is only a hypothesis; real-life scenarios may always interfere with your results. In the process of data analysis, a few related terms identify different phases of the process.

1. Data Mining
This process involves methods for finding patterns in a data sample.

2. Data Modelling
This refers to how an organization organizes and manages its data.

Data Analysis Techniques


There are different techniques for data analysis depending upon the question at hand, the type of data, and the amount of data gathered. Each focuses on taking in new data, mining insights, and drilling down into the information to transform facts and figures into decision-making parameters. Accordingly, the different techniques of data analysis can be categorized as follows:

1. Techniques based on Mathematics and Statistics


● Descriptive Analysis: Descriptive analysis takes into account historical data and Key Performance Indicators and describes performance relative to a chosen benchmark. It takes into account past trends and how they might influence future performance.
● Dispersion Analysis: Dispersion is the area over which a data set is spread. This technique allows data analysts to determine the variability of the factors under study (a sketch follows this list).
● Regression Analysis: This technique works by modeling the relationship between a dependent variable and one or more independent variables. A regression model can be linear, multiple, logistic, ridge, non-linear, life-data, and more.
● Factor Analysis: This technique helps determine whether there exists any relationship between a set of variables. In the process, it reveals other factors or variables that describe the patterns in the relationships among the original variables. Factor analysis leads naturally into useful clustering and classification procedures.
● Discriminant Analysis: This is a classification technique in data mining. It identifies the points that distinguish different groups based on variable measurements. In simple terms, it identifies what makes two groups different from one another, which helps to classify new items.
● Time Series Analysis: In this kind of analysis, measurements are made across time, which gives us a collection of organized data known as a time series.
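A quick dispersion-analysis sketch with numpy; the synthetic delivery-time data and the particular spread measures chosen are illustrative.

```python
# Sketch: gauge variability of a factor under study with common spread measures.
import numpy as np

rng = np.random.default_rng(8)
delivery_days = rng.gamma(shape=2.0, scale=1.5, size=500)   # skewed, illustrative data

spread = {
    "range": delivery_days.max() - delivery_days.min(),
    "variance": delivery_days.var(ddof=1),
    "std_dev": delivery_days.std(ddof=1),
    "IQR": np.percentile(delivery_days, 75) - np.percentile(delivery_days, 25),
}
print({k: round(v, 2) for k, v in spread.items()})
```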

2. Techniques based on Artificial Intelligence and Machine Learning


● Artificial Neural Networks: A neural network is a biologically inspired programming paradigm that presents a brain metaphor for processing information. An Artificial Neural Network is a system that changes its structure based on the information that flows through the network. ANNs can accept noisy data and can be highly accurate; they are considered highly dependable in business classification and forecasting applications.
● Decision Trees: As the name suggests, this is a tree-shaped model that represents a classification or regression model. It divides a data set into smaller and smaller subsets while simultaneously developing a related decision tree (a sketch follows this list).
● Evolutionary Programming: This technique combines different types of data analysis using evolutionary algorithms. It is a domain-independent technique that can explore an ample search space and manage attribute interaction very efficiently.
● Fuzzy Logic: This is a data analysis technique based on degrees of truth rather than strict true/false values, which helps in handling the uncertainties in data mining techniques.

3. Techniques based on Visualization and Graphs


● Column Chart, Bar Chart: Both of these charts are used to present numerical differences between categories. A column chart uses the height of the columns to reflect the differences; the axes are interchanged in a bar chart.
● Line Chart: This chart is used to represent the change of data over a continuous interval of time.
● Area Chart: This concept is based on the line chart. It additionally fills the area between the polyline and the axis with color, thus representing trend information better.
● Pie Chart: It is used to represent the proportions of different classifications. It is suitable for only one series of data; however, it can be made multi-layered to represent the proportions of data in different categories.
● Funnel Chart: This chart represents the proportion of each stage and reflects the size of each module. It helps in comparing rankings.
● Word Cloud Chart: It is a visual representation of text data. It requires a large amount of data, and the degree of discrimination needs to be high for users to perceive the most prominent terms. It is not a very accurate analytical technique.
● Gantt Chart: It shows the actual timing and the progress of an activity in comparison to the requirements.
● Radar Chart: It is used to compare multiple quantized charts. It shows which variables in the data have higher values and which have lower values. A radar chart is used for comparing classifications and series along with proportional representation.
● Scatter Plot: It shows the distribution of variables in the form of points over a rectangular coordinate system. The distribution of the data points can reveal the correlation between the variables.
● Bubble Chart: It is a variation of the scatter plot. Here, in addition to the x and y coordinates, the area of the bubble represents a third value (a sketch follows this list).
● Gauge: It is a kind of materialized chart. Here the scale represents the metric, and the pointer represents the dimension. It is a suitable technique for representing interval comparisons.
● Frame Diagram: It is a visual representation of a hierarchy in the form of an inverted tree structure.
● Rectangular Tree Diagram: This technique is used to represent hierarchical relationships at the same level. It makes efficient use of space and shows the proportion represented by each rectangular area.
● Map
  o Regional Map: It uses color to represent value distribution over a map partition.
  o Point Map: It represents the geographical distribution of data in the form of points on a geographical background. When the points are all the same size, the map says little about individual values, but if the points are drawn as bubbles, it additionally represents the size of the data in each region.
  o Flow Map: It represents the relationship between an inflow area and an outflow area, drawn as a line connecting the geometric centers of gravity of the spatial elements. The use of dynamic flow lines helps reduce visual clutter.
  o Heat Map: This represents the weight of each point in a geographic area, with color representing the density.
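A short matplotlib sketch of the bubble chart described above, where marker area encodes a third value; the data and axis labels are illustrative.

```python
# Sketch: a bubble chart - a scatter plot whose marker area carries a third variable.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(9)
revenue = rng.uniform(10, 100, 15)        # x value
profit_margin = rng.uniform(2, 25, 15)    # y value
headcount = rng.uniform(50, 2000, 15)     # third value, shown as bubble area

plt.scatter(revenue, profit_margin, s=headcount / 5, alpha=0.5, edgecolors="black")
plt.xlabel("Revenue (USD millions)")
plt.ylabel("Profit margin (%)")
plt.title("Bubble area = headcount")
plt.show()
```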

Data Analysis Tools


There are several data analysis tools available in the market, each with its own set of functions. The selection of tools should always be based on the type of analysis performed and the type of data being worked with. Here is a list of a few compelling tools for Data Analysis.

1. Excel
It has a variety of compelling features, and with additional plugins installed, it can handle a massive amount of data. So, if you have data that does not approach big-data scale, Excel can be a very versatile tool for data analysis.

2. Tableau
It falls under the BI tool category, made for the sole purpose of data analysis. The essence of Tableau is its pivot tables and pivot charts, and it works towards representing data in the most user-friendly way. It additionally has a data cleaning feature along with brilliant analytical functions.
If you want to learn Tableau, Udemy's online course Hands-On Tableau Training for Data Science can be a great asset for you.
3. Power BI
It initially started as a plugin for Excel but was later detached from it and developed into one of the most widely used data analytics tools. It comes in three versions: Free, Pro, and Premium. Its Power Pivot and DAX language can implement sophisticated advanced analytics, similar to writing Excel formulas.

4. Fine Report
Fine Report comes with a straightforward drag-and-drop operation, which helps in designing various styles of reports and building a data decision analysis system. It can directly connect to all kinds of databases, and its format is similar to that of Excel. Additionally, it provides a variety of dashboard templates and several self-developed visual plug-in libraries.

5. R & Python
These are programming languages that are very powerful and flexible. R is best at statistical analysis, such as normal distributions, cluster classification algorithms, and regression analysis. It also supports individual-level predictive analysis, such as a customer's behavior, spend, and preferred items based on browsing history. It also involves concepts of machine learning and artificial intelligence.

6. SAS
It is a programming language for data analytics and data manipulation which can easily access data from any source. SAS has introduced a broad set of customer profiling products for web, social media, and marketing analytics. It can predict customer behavior and manage and optimize communications.

Conclusion
This has been a complete beginner's guide to what data analysis is. Data analysis is key to any business, whether it is starting up a new venture, making marketing decisions, continuing with a particular course of action, or going for a complete shut-down. The inferences and statistical probabilities calculated from data analysis help ground the most critical decisions while reducing human bias. Different analytical tools have overlapping functions and different limitations, but they are also complementary. Before choosing a data analytical tool, it is essential to take into account the scope of work, infrastructure limitations, economic feasibility, and the final report to be prepared.

Who is a Data Analyst?


Analytics is one of the most in-demand jobs today in this age of digitization. With data as the most powerful tool for business transformation, organizations are now on the lookout for people who have an understanding of data mechanics and the ability to interpret the hidden trends in them, which can influence business decisions.
The domain of analytics has three components: business context, technology, and data science. Data Science involves techniques for statistics and operations research, machine learning, and deep learning algorithms. When we drill down into Data Science, we see that it calls for both Data Scientists and Data Analysts. At the most advanced level, there is little difference between a Data Scientist and a Data Analyst: both have to handle large datasets, resolve complex problems, and define new machine learning algorithms.
A Data Scientist is responsible for identifying a business problem that has substantial business value once solved. A Data Analyst addresses those business problems, finds answers to the questions put forward by the data scientist, and presents different perspectives on the problem at hand.
Data Analysts are the people who can identify customer requirements, forecast using predictive analysis, and present and visualize data with exceptional clarity to prescribe business decisions.

Required Primary Data Analyst skill sets


1. Analytical Thinking
Specified business requirements need analysis: the main business questions must be defined, and their answers extracted from datasets. Analytical thinking involves figuring out the parameters that define the range of datasets, analyzing them from different perspectives, determining variable dependencies, and deriving meaningful information from the results. Analytical thinking is the ability to break down a complex problem into simple components and resolve those components one by one.

2. Basic Mathematics & Statistics


A grasp of mathematical concepts is very important in data analysis. Mathematical concepts help in thinking logically, identifying patterns, and designing algorithms; they include linear algebra, calculus, optimization theory, and discrete math. A proper grasp of statistical methods gives you the ability to collect relevant data, perform the correct analyses, and present the results in the most useful form. This entire process helps analysts make predictions based on data. Key concepts required in this field are probability theory, data transformation, regression, classification, statistical computation, and graphics.

3. Programming Skills
The responsibilities of Data Analysts are more inclined towards crafting and presenting data than towards heavy coding. However, without knowing programming languages, a data analyst is not able to put this knowledge into practice. The programming languages in this context are R, Python, Matlab, and SAS. Knowledge of these dominant languages helps a data analyst perform advanced analytics on large datasets without depending on an additional programming expert. It is also a preferable factor for recruiters.

4. Databases
In addition to programming languages, as a data analyst you must have a sound knowledge of databases. You need to understand the concepts of data storage, data warehouses, and data lakes. SQL is the standard on current big data platforms. To work on data, you need to extract it from databases, which calls for expertise in SQL. It enables you to handle structured data, query it from databases, and perform data wrangling and data preparation. A Data Analyst, hence, must possess sound knowledge of RDBMS, SQL queries, indexes, keys, and tables. Some of the leading platforms to consider are Apache Hadoop, Oracle, MySQL, HiveQL, and Microsoft SQL.
5. Communication Skills & Team Work
It is a non-technical skill but of utmost importance. Data Visualization and Storytelling are essential
responsibilities of a Data Analyst. A Data Analyst needs to work with several members of the
organization, such as business analysts, software teams, marketing teams, and more. You need to
communicate with your audience. Before that, you must be able to acquire all the relevant information
required to perform your analysis. Without a fluent ability to communicate, it is also challenging to work
as a team.

6. Research:
In order to infer accurate trends and patterns, a Data Analyst needs the ability to perform research in the right areas and to the right level of detail. A Data Analyst should be able to select all relevant and necessary data in order to reach an accurate insight and present a strong argument for the company's next big decision.

7. Problem-Solving Skills:
Despite all the attention to detail, there are instances when problems arise in algorithms. In such
situations, data analysts are expected to use their problem-solving skills, work with the team,
troubleshoot what went wrong and provide a solution via data analysis.

Career Prospects
The career path of a data analyst mainly depends upon the employer. Employers can be significant investment firms, healthcare, retail, hospitality, marketing, insurance, or technology firms. Sometimes Data Analysts are labeled Information Scientists; this involves working with the organization's core database infrastructure and thus acquiring additional technical expertise. Government sectors, insurance, and healthcare are domains that rely heavily on information scientists for their deep data infrastructures. Job opportunities for Data Analysts are plentiful, with multiple career paths.
Data Analysts can have the following career prospects:

● Market Research Analyst
● Actuary
● Business Intelligence Developer
● Business Analyst
● Budget Analyst
● Credit Analyst
● Data Warehousing
● Data Administrator
● Financial Analyst
● Fraud Analyst
● HR Analyst
● Machine Learning Analyst
● Systems Engineer
● Systems Analyst
● Strategy Analyst
● Sales Analyst
● Social Media Data Analyst
● Web Analyst

Apart from data analysis expertise in the relevant domain, the basic skill sets required for all of the above careers are the same. They are:

● Mathematical and Statistical Acumen
● Analytical and Problem-Solving Skills
● An Understanding of Statistical Modelling Software
● Communication Fluency
● Attention to Detail

Imagine you wake up thinking of your dream holiday destination and jump online to explore more about it. You search for the place and enjoy reading the information. Then you log into Facebook. What do you see? Advertisements for your dream destination pop up in every corner of the screen. It implies that smart digital assistants track your searches and load you with additional information that might help you make your dream come true.
That is where Big Data and Data Analytics tools and techniques help unfold a world of hidden, yet targeted, information.
A 2021 prediction says that each user will create 1.7 megabytes of new data every second, and that within a year there will be 44 trillion gigabytes of data accumulated in the world. This raw data needs to be analyzed for business decision-making, optimizing business performance, studying customer trends, and delivering better products and services.
There are many tools to assist this data-driven decision-making process, and choosing the right one is a challenge for data scientists and data analysts. Common questions that run through your mind are: how many users use the tool, how easy is it to learn, and how is it placed in the market; and if you are a business owner, you may be concerned about the cost of ownership of such tools.

Top Data Analytics Tools


Here are the top 7 data analytics tools in vogue today:

1. Python
2. R
3. SAS
4. Excel
5. Power BI
6. Tableau
7. Apache Spark

Let us walk through each of these tools.


1. Python

● Python was initially designed as an object-oriented programming language for software and web development and was later enhanced for data science. Python is one of the fastest-growing programming languages today.
● It is a powerful data analysis tool and has a great set of friendly libraries for every aspect of scientific computing.
● Python is free, open-source software, and it is easy to learn.
● Python's data analysis library Pandas was built over NumPy, which is one of the earliest libraries in Python for data science.

With Pandas, you can do just about anything! You can perform advanced data manipulation and numeric analysis using data frames.
Pandas supports multiple file formats; for example, you can import data from Excel spreadsheets and process it for time-series analysis. (By definition, time-series analysis is a statistical technique that analyses time-series data, i.e., data collected at certain intervals of time.)
Pandas is a powerful tool for data visualization, data masking, merging, indexing and grouping data, data cleaning, and much more.
To know more about Pandas, check out Python Pandas tutorials.
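A minimal sketch of the time-series workflow mentioned above, using a synthetic daily series so it runs without an external spreadsheet (pd.read_excel would work the same way on a real file).

```python
# Sketch: resampling and smoothing a daily series with Pandas.
import numpy as np
import pandas as pd

rng = np.random.default_rng(10)
dates = pd.date_range("2023-01-01", periods=365, freq="D")
sales = pd.Series(200 + 20 * np.sin(np.arange(365) / 30) + rng.normal(0, 5, 365),
                  index=dates, name="sales")

monthly = sales.resample("MS").mean()      # downsample daily data to monthly averages
rolling = sales.rolling(window=7).mean()   # 7-day moving average to smooth noise
print(monthly.round(1).head())
print(rolling.dropna().round(1).head())
```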

● Other libraries, such as SciPy, scikit-learn, and StatsModels, are used for statistical modeling, mathematical algorithms, machine learning, and data mining.
● Matplotlib, seaborn, and VisPy are packages for data visualization and graphical analysis.
● Python has an extensive developer community for support and is one of the most widely used languages.
● Top companies that use Python for data analysis include Spotify, Netflix, NASA, Google, CERN, and many more.

2. R
● R is a leading programming language for statistical modeling, visualization, and data analysis. It is mainly used by statisticians for statistical analysis, Big Data, and machine learning.
● R is a free, open-source programming language and has a lot of enhancements to it in the form of user-written packages.
● R has a steep learning curve and needs some working knowledge of coding. However, it is a great language when it comes to syntax and consistency.
● R is a winner when it comes to EDA. (By definition, in statistics, exploratory data analysis (EDA) is an approach to analyzing data sets to summarize their main characteristics, often with visual methods.)
● Data manipulation in R is easy with packages such as plyr, dplyr, and tidyr.
● R is excellent when it comes to data visualization and analysis, with packages such as ggplot, lattice, ggvis, etc.
● R has a huge community of developers for support.
● R is used by:
  o Facebook - for behavior analysis related to status updates and profile pictures.
  o Google - for advertising effectiveness and economic forecasting.
  o Twitter - for data visualization and semantic clustering.
  o Uber - for statistical analysis.

To know more about R you can visit here:

3. SAS

● SAS is a statistical software suite widely used for BI (Business Intelligence), data management, and predictive analysis.
● SAS is proprietary software, and companies need to pay to use it. A free University Edition has been introduced for students to learn and use SAS.
● SAS has a simple GUI, so it is easy to learn; however, a good knowledge of SAS programming is an added advantage when using the tool.
● SAS's DATA step (the DATA step is where data is created, imported, modified, merged, or calculated) helps in efficient data handling and manipulation.
● SAS's Visual Analytics software is a powerful tool for interactive dashboards, reports, BI, self-service analytics, text analytics, and smart visualizations.
● SAS is widely used in the pharmaceutical industry, BI, and weather forecasting.
● Since SAS is a paid-for service, it has 24x7 customer support to help with your doubts.
● Google, Facebook, Netflix, and Twitter are a few companies that use SAS.
● SAS is used for clinical research reporting at Novartis and Covance; Citibank, Apple, Deloitte, and many others use SAS for predictive analysis.

To know more about SAS you could visit here.

4. Excel

● Excel is a spreadsheet and a simple yet powerful tool for data collection and analysis.
● Excel is not free; it is a part of the Microsoft Office suite of programs.
● Excel does not require you to build a UI to enter data; you can start right away.
● It is readily available, widely used, and easy to learn and start doing data analysis with.
● The Analysis ToolPak in Excel offers a variety of options for performing statistical analysis of your data. The charts and graphs in Excel give a clear interpretation and visualization of your data, which helps in decision-making, as they are easy to understand.
● The Analysis ToolPak feature needs to be enabled and configured in Excel. Once the ToolPak has been set up, you will see the list of tools and can choose the tool based on your goals and the information that you want to analyze.
● Excel is used by more than 750 million users across the world.

5. Power BI

● Power BI is yet another powerful business analytics solution, from Microsoft.
● Power BI comes in three versions - Desktop, Pro, and Premium. The Desktop version is free for users; Pro and Premium are paid versions.
● You can visualize your data, connect to many data sources, and share the outcomes across your organization.
● With Power BI, you can bring your data to life with live dashboards and reports.
● Power BI integrates with other tools, including Microsoft Excel, so you can get up to speed quickly and work seamlessly with your existing solutions.
● Gartner names Microsoft a Magic Quadrant Leader among analytics and business intelligence platforms.
● Top companies using Power BI include Nestle, Tenneco, Ecolab, and more.

To know more about Power BI, you can click on the link.

6. Tableau

● Tableau is a BI (Business Intelligence) tool developed for data analysts, with which one can visualize, analyze, and understand data.
● Tableau is not free software, and the pricing varies for different data needs.
● It is easy to learn and deploy Tableau. To know and learn Tableau, you can visit the link.
● Tableau provides fast analytics; it can explore any type of data - spreadsheets, databases, and data on Hadoop and cloud services.
● It is easy to use, as it has powerful drag-and-drop features that anyone with an intuitive mind can handle.
● Data visualizations with smart dashboards can be shared within seconds.
● Top companies that use Tableau are Amazon, Citibank, Barclays, LinkedIn, and many more.

7. Apache Spark

● Spark is an integrated analytics engine for Big Data processing, designed for developers, researchers, and data scientists.
● It is free and open-source, and a wide range of developers contribute to its development.
● It is a high-performance tool and works well for both batch and streaming data.
● Learning Spark is easy, and you can use it interactively from the Scala, Python, R, and SQL shells too.
● Spark can run on platforms such as Hadoop or Apache Mesos, standalone, or in the cloud, and it can access diverse data sources.
● Spark includes libraries such as:
  o Spark SQL - for SQL and structured data
  o MLlib - for machine learning
  o Spark Streaming - for live data stream processing
  o GraphX - for graph analytics
● Uber, Slack, Shopify, and many other companies use Apache Spark for data analytics.

To know and learn Apache Spark, you can visit the link.

Summary
I am sure that by now you have a fair understanding of data analytics tools. To move ahead in your data analytics journey and find the right tool, you need to invest quite a bit of time in understanding your and/or your organization's data needs, then scout around and analyze the various tools available in the market, and then decide.
