Data Analysis

data analysis class notes 1

Uploaded by

rohit972012

Data Analysis

Data Analysis is a process of inspecting, cleaning, transforming and modeling data with the goal of
discovering useful information, suggesting conclusions and supporting decision-making.

Application/Uses of Data Analytics

Data analytics in finance –

Financial data analytics interprets time-series data to understand the risks involved in
monetary operations. Possible future financial scenarios can be generated by
analyzing past data trends.

Data analytics in logistics –

Data analytics is also applied in logistics and delivery, where it enables logistics companies
to work out the best route to take for a delivery.

Data analytics in transportation –

Data analysis in this domain supports segment-wise analysis across road, air and rail transport.
This includes road safety, traffic management, route monitoring for waterway transport, etc.

Data analytics in manufacturing –


Using big data analytics in manufacturing improves the efficiency of the supply chain in vehicle
manufacturing. Product customization in the manufacturing industries is also
made easier with the introduction of data analytics.

Data analytics in healthcare –


Data analytics helps in discovering which treatment to choose by referring to the trends in a
patient’s medical history. Beyond the medical perspective, healthcare data analytics also plays a
key role in management.

Fraud detection

Many organizations in different industries use data analytics to detect fraudulent activities. These
industries include pharmaceutical, banking, finance, tax, retail, etc.


Security

Security personnel use data analytics (especially predictive analytics) to anticipate future
crimes or security breaches. They can also investigate past or ongoing attacks. Analytics makes it
possible to analyze how IT systems were breached during an attack.

Marketing and digital advertising

Marketers use data analytics to understand the audience and achieve high conversion rates. Each of
these two goals involves different activities carried out with data analytics. To
understand the audience, digital ad experts use analytics to learn the intended audience’s likes,
dislikes, age, race, gender, and other features.

Need for data analysis

Informed Decision-Making

Improved Understanding

Competitive Advantage

Risk Mitigation

Efficient Resource Allocation

Continuous Improvement

What is data, and what are the different types of data?

Data is a collection of raw information that consists of facts and figures. It can come in the form
of text, observations, figures, images, numbers, graphs, or symbols.
There are 3 types:

Structured data –
Structured data is data whose elements are addressable for effective analysis. It has been organized
into a formatted repository, typically a database.

It covers all data that can be stored in an SQL database, in tables with rows and columns.

Semi-Structured data –
Semi-structured data is information that does not reside in a relational database but has some
organizational properties that make it easier to analyze. With some processing, it can be stored in
a relational database.

Unstructured data –
Unstructured data is data that is not organized in a predefined manner or does not have a
predefined data model, so it is not a good fit for a mainstream relational database. For
unstructured data, there are alternative platforms for storing and managing it.

Differences between Structured, Semi-structured and Unstructured data:

| Properties | Structured data | Semi-structured data | Unstructured data |
|---|---|---|---|
| Technology | It is based on relational database tables | It is based on XML/RDF (Resource Description Framework) | It is based on character and binary data |
| Transaction management | Matured transaction and various concurrency techniques | Transaction is adapted from DBMS, not matured | No transaction management and no concurrency |
| Flexibility | It is schema dependent and less flexible | It is more flexible than structured data but less flexible than unstructured data | It is more flexible and there is absence of schema |
| Scalability | It is very difficult to scale DB schema | Its scaling is simpler than structured data | It is more scalable |

Types of Data Analytics


Descriptive data analytics

This type of data analytics examines past data to explain what happened. It is the most
straightforward data analytics technique.

Diagnostic data analytics

Diagnostic data analytics examines past data to explain the cause of an anomaly. This type of
analytics aims to answer “why did this happen?” from a descriptive analytics result.
Predictive data analytics

Predictive data analytics involves using current or historical data to predict future outcomes.
Individuals and companies conduct predictive analysis by combining historical data with
machine learning.

Prescriptive data analytics

Prescriptive data analytics involves selecting the best solution for a problem from available
options. This type of data analytics examines results from other analytics and gives guidance on
how to reach a specific answer.

Real-time data analytics

Real-time data analytics involves using data as soon as it is entered into the database. Unlike
other types of data analytics that use data from past events (historical data), this type analyses
new data from customers or external sources on the go.

Augmented data analytics

Augmented analytics uses machine learning (ML) and natural language processing (NLP) to
analyze data.
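
As a rough illustration of the difference between descriptive and predictive analytics, the sketch below uses R's built-in mtcars dataset. The choice of dataset, the use of a simple linear regression as a stand-in for "machine learning", and the names avg_mpg_by_cyl, fit and new_car are all assumptions made for this example.

```r
# Descriptive analytics: summarise what already happened.
data(mtcars)                                   # dataset shipped with R
avg_mpg_by_cyl <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
print(avg_mpg_by_cyl)

# Predictive analytics: fit a model on historical data, then predict.
fit <- lm(mpg ~ wt, data = mtcars)             # simple linear regression
new_car <- data.frame(wt = 3.0)                # hypothetical car (weight in 1000 lbs)
predicted_mpg <- predict(fit, newdata = new_car)
print(predicted_mpg)
```

The first part only describes past observations; the second part uses the same history to estimate a value that has not been observed.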

Data Analysis Process consists of the following phases


Data Requirements Specification

The data required for analysis is based on a question or an experiment.

Data Collection

Data Collection is the process of gathering information on targeted
variables identified as data requirements. Data collection should ensure
that the data gathered is accurate so that the related decisions are
valid.

Data Processing

The data that is collected must be processed or organized for
analysis. This includes structuring the data as required for the
relevant analysis tools.

Data Cleaning

The processed and organized data may be incomplete, contain
duplicates, or contain errors. Data Cleaning is the process of
preventing and correcting these errors.
Data Analysis

Data that is processed, organized and cleaned is ready for
analysis. Various data analysis techniques are available to
understand, interpret, and derive conclusions based on the
requirements.

Communication

The results of the data analysis are to be reported in the format
required by the users to support their decisions and further action.
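
The processing, cleaning, analysis and communication phases above can be sketched in a few lines of R. The data frame raw and its columns id and sales are invented for this example, and dropping incomplete rows is only one of several possible cleaning choices.

```r
# Hypothetical raw data with one duplicate row and one missing value.
raw <- data.frame(
  id    = c(1, 2, 2, 3, 4),
  sales = c(100, 250, 250, NA, 80)
)

# Data processing: enforce the structure the analysis expects.
raw$id <- as.integer(raw$id)

# Data cleaning: drop exact duplicate rows, then rows with missing values.
clean <- raw[!duplicated(raw), ]
clean <- clean[complete.cases(clean), ]

# Data analysis: a simple descriptive summary.
total_sales <- sum(clean$sales)
mean_sales  <- mean(clean$sales)

# Communication: report the result in a readable form.
cat("Rows kept:", nrow(clean), "\n")
cat("Total sales:", total_sales, "\n")
```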

How data analytics works


1. Data collection
2. Adjusting data quality
3. Building an analytical model
4. Presentation

What is R Programming
• R is a leading tool for machine learning, statistics, and data analysis. Objects,
functions, and packages can easily be created in R.
• It’s a platform-independent language.
• It’s a free, open-source language.
• R is not only a statistical package; it can also be integrated with other
languages.
Features of R
R Packages:
Distributed Computing:
Data analysis:

R – Array
In R, an array holds elements that are all of the same data type. Arrays are essential data storage
structures defined by a fixed number of dimensions.
In the R programming language, one-dimensional arrays are called vectors. Two-dimensional arrays are
called matrices, consisting of fixed numbers of rows and columns.
Creating an Array
An R array can be created with the array() function. A vector of elements is passed to
array() along with the required dimensions.

Syntax:
array(data, dim = c(nrow, ncol, nmat), dimnames = names)
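
A minimal sketch of the array() call, assuming a 2 x 3 x 2 shape; the names arr, r1, c1 and m1 are chosen only for illustration.

```r
# Build a 2 x 3 x 2 array: 2 rows, 3 columns, 2 matrices (layers).
values <- 1:12
arr <- array(
  values,
  dim = c(2, 3, 2),
  dimnames = list(c("r1", "r2"), c("c1", "c2", "c3"), c("m1", "m2"))
)

print(dim(arr))               # the three dimensions
print(arr[2, 3, 1])           # row 2, column 3 of the first matrix
print(arr["r1", "c1", "m2"])  # the same kind of lookup, by name
```

Elements are filled column by column within each matrix, so arr[2, 3, 1] is the sixth value and arr["r1", "c1", "m2"] is the seventh.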

R – Matrices
In R programming, matrices are two-dimensional, homogeneous data structures. In a matrix, rows
run horizontally and columns run vertically.
Creating a Matrix in R
To create a matrix in R, use the matrix() function.
Its main argument is the vector of elements that fills the matrix; the other arguments control its shape and names.
Syntax to Create R-Matrix
matrix(data, nrow, ncol, byrow, dimnames)
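
As a small sketch of how the byrow argument changes the fill order (the variable names by_col and by_row are illustrative):

```r
# Default: the data fills the matrix column by column (byrow = FALSE).
by_col <- matrix(1:6, nrow = 2, ncol = 3)

# With byrow = TRUE the same data fills the matrix row by row.
by_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)

print(by_col)
print(by_row)
```

The element in row 1, column 2 is 3 in by_col but 2 in by_row.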

R Vectors

R vectors are one-dimensional arrays that hold multiple data values of the same type.
One key point is that in the R programming language vector indexing starts from ‘1’ and not from
‘0’.
Types of R vectors
Numeric vectors
Numeric vectors contain numeric values such as integers and doubles (floating-point numbers).
Character vectors
Character vectors in R contain strings, which may include letters, digits and special characters.
Logical vectors
Logical vectors in R contain the Boolean values TRUE and FALSE, and NA for missing values.
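
The three vector types and 1-based indexing can be illustrated as follows; the variable names are arbitrary.

```r
num <- c(1.5, 2, 3.25)        # numeric (double) vector
chr <- c("a", "b2", "#c")     # character vector
lgl <- c(TRUE, FALSE, NA)     # logical vector; NA marks a missing value

print(num[1])                 # first element: indexing starts at 1
print(class(chr))
print(is.na(lgl))

# Mixing types coerces every element to one common type.
mixed <- c(1, "two", TRUE)
print(class(mixed))
```

Because a vector holds a single type, the mixed vector ends up as a character vector.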
R Factors

Factors in the R programming language are data structures used to categorize data, that is, to
represent categorical data and store it as a set of levels.
Arguments of factor() in R
• x: The vector that needs to be converted into a factor.
• levels: The set of distinct values that x may take.
• labels: A character vector of labels for the levels, one per level.
• exclude: The values to be excluded when forming the levels.
• ordered: A logical value that decides whether the levels are ordered.
Creating a Factor in R Programming Language
The command used to create or modify a factor in R is factor(), with a vector as input.
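
A short sketch of factor() using the levels, labels and ordered arguments; the clothing-size data is made up for the example.

```r
sizes <- c("medium", "small", "large", "small", "medium")

size_factor <- factor(
  sizes,
  levels  = c("small", "medium", "large"),  # allowed values, in order
  labels  = c("S", "M", "L"),               # display label for each level
  ordered = TRUE                            # levels have a meaningful order
)

print(size_factor)
print(levels(size_factor))
print(table(size_factor))                 # count per level
print(size_factor[1] > size_factor[2])    # comparisons work on ordered factors
```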
Functions in R Programming

A function accepts input arguments and produces output by executing the valid R commands inside the
function.
Functions are useful when you want to perform a certain task multiple times.
In the R programming language, the name of a function and the name of the file in which it is
created need not be the same.
Creating a Function in R Programming
Functions are created in R with the function keyword, for example name <- function(arguments) { body }.
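
As an illustration, here is a small user-defined function; rescale01 is a hypothetical helper written for this example, not a built-in.

```r
# Hypothetical helper: rescale a numeric vector to the range [0, 1].
rescale01 <- function(x, na.rm = TRUE) {
  rng <- range(x, na.rm = na.rm)
  (x - rng[1]) / (rng[2] - rng[1])   # the last expression is the return value
}

scaled <- rescale01(c(10, 20, 30))
print(scaled)
```

Once defined, the function can be reused on any numeric vector instead of repeating the same arithmetic.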
Packages in R
A package in the R programming language is a collection of R functions, compiled code, and sample data.
Packages are stored under a directory called “library” within the R environment. By default, R installs a
group of packages during installation. When the R console starts, only the default packages are
available. Other packages that are already installed need to be loaded explicitly before an R
program can use them.
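
The sketch below shows explicit loading with library(), using the tools package as an example of a package that ships with R but is not attached at start-up. The file name notes.csv is hypothetical and does not need to exist.

```r
# Packages attached in the current session.
print(search())

# "tools" is installed with R but must be loaded explicitly.
library(tools)
print("package:tools" %in% search())

# A function can also be called without attaching its package, using ::
ext <- tools::file_ext("notes.csv")
print(ext)
```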
10 different R commands
• print(): Displays an R object on the R console.
• read.table(): Reads a file in table format into a data frame; with header = TRUE, the first row supplies the column labels.
• help(): Obtains documentation for a given R command.
• mtext(): Writes text in the margin of a plot.
• matplot(): Plots the columns of a matrix.
• ls(): Lists the objects in the current environment.
• rm(): Removes an object from the environment.
• boxplot(): Produces a boxplot.
• data(): Loads a built-in dataset.
• example(): Runs the examples from a command's help page.

Environments in R Programming

An environment is a virtual space that is set up when the interpreter of a programming language is launched.
In R, an environment can be thought of as a top-level object that contains a set of names/variables, each
associated with a value.
Create a New Environment
An environment in R programming can be created using the new.env() function. Its variables can then
be accessed using the $ or [[ ]] operator. Each variable is stored in its own memory location.
Syntax: new.env(hash = TRUE)
Or, to create an R environment in Navigator:
1. Start Navigator
2. Go to the Environments page
3. Click Create
4. Enter a descriptive name for your environment
5. Next to Packages, select version 3.7
6. Check the box next to R and select the version of R you want to use
7. Click Create
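
Returning to environment objects created with new.env(), the following is a minimal sketch of storing and reading variables; the names env, count, label, flag and bump are chosen for this example.

```r
env <- new.env(hash = TRUE)

# Assign variables with $, [[ ]] or assign().
env$count <- 10
env[["label"]] <- "demo"
assign("flag", TRUE, envir = env)

print(sort(ls(env)))               # names stored in the environment
print(env$count)
print(get("label", envir = env))

# Environments are modified in place when passed to a function.
bump <- function(e) e$count <- e$count + 1
bump(env)
print(env$count)
```

Unlike vectors or lists, an environment is not copied when it is passed to a function, so the change made inside bump() is visible afterwards.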

Control Statements in R Programming



Control statements are expressions used to control the execution and flow of the program based on the conditions
provided in the statements. These structures are used to make a decision after assessing a variable.
Types of control statements
if condition
This control structure checks whether the expression in parentheses is true. If it is, the
statements in braces {} are executed.

Syntax:
if(expression){
  statements
  ....
  ....
}
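if-else condition
It is similar to the if condition, but when the test expression in the if condition is false, the statements
in the else block are executed. The else keyword must follow the closing brace of the if block on the same
line; otherwise R treats the if statement as complete when the code is run at the top level.
Syntax:
if(expression){
  statements
  ....
  ....
} else {
  statements
  ....
  ....
}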
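
A small sketch of if and else in use; classify is a hypothetical function written for this example.

```r
# Hypothetical helper: label a number by its sign.
classify <- function(x) {
  if (x > 0) {
    "positive"
  } else if (x < 0) {
    "negative"
  } else {
    "zero"
  }
}

results <- c(classify(5), classify(-2), classify(0))
print(results)
```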
for loop
It is a loop that executes a sequence of statements once for each element of a vector, and exits when the
elements are exhausted.
Syntax:
for(value in vector){
  statements
  ....
  ....
}
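
For example, a for loop can visit each element of a vector directly or by index; the vector values below is made up.

```r
values <- c(4, 8, 15)

# Loop over the elements themselves.
total <- 0
for (v in values) {
  total <- total + v
}
print(total)   # 27

# Loop over the positions when the index is needed.
for (i in seq_along(values)) {
  cat("Element", i, "is", values[i], "\n")
}
```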
Nested loops
Nested loops are similar to simple loops; nested means a loop inside another loop. Nested loops are
often used to work through the rows and columns of a matrix.
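
As a sketch, two nested for loops can fill a matrix cell by cell; the 3 x 3 size and the rule i * j are arbitrary choices for the example.

```r
m <- matrix(0, nrow = 3, ncol = 3)

for (i in seq_len(nrow(m))) {      # outer loop: rows
  for (j in seq_len(ncol(m))) {    # inner loop: columns
    m[i, j] <- i * j
  }
}

print(m)
```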
while loop
A while loop is another kind of loop, iterated as long as a condition remains true. The test expression is
checked before each execution of the loop body.
Syntax:
while(expression){
  statement
  ....
  ....
}
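
A minimal while loop, assuming we want to count how many times 100 must be halved before the value is no longer greater than 1:

```r
n <- 100
steps <- 0

while (n > 1) {      # checked before every iteration
  n <- n / 2
  steps <- steps + 1
}

print(steps)
print(n)
```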
repeat loop and break statement
repeat is a loop that iterates indefinitely because it has no exit condition of its own.
A break statement is therefore used to exit from the loop. break can be used in any
type of loop to exit from it.
Syntax:
repeat {
  statements
  ....
  ....
  if(expression) {
    break
  }
}
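
For instance, repeat with break can search for the first integer whose square exceeds 50; the threshold is arbitrary.

```r
i <- 0

repeat {
  i <- i + 1
  if (i^2 > 50) {
    break          # the only way out of a repeat loop
  }
}

print(i)   # 8
```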
return statement
The return statement returns the result of an executed function and passes control back to the calling
function.
Syntax:
return(expression)
next statement
The next statement skips the rest of the current iteration and
continues with the next iteration without terminating the loop.
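
The sketch below combines next and return in one function; sum_evens_until and its limit argument are hypothetical names for this example.

```r
# Hypothetical helper: add up even values, stopping once a limit is reached.
sum_evens_until <- function(x, limit) {
  total <- 0
  for (v in x) {
    if (v %% 2 != 0) {
      next               # skip odd values, continue with the next element
    }
    total <- total + v
    if (total >= limit) {
      return(total)      # leave the function early with the result
    }
  }
  total                  # reached only if the limit was never hit
}

early <- sum_evens_until(1:10, 10)
full  <- sum_evens_until(1:10, 100)
print(early)
print(full)
```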
Here is an example of a program that works with a matrix in R:

MATRIX PROGRAM
# Defining the column and row names.
row_names <- c("row1", "row2", "row3", "row4")
col_names <- c("col1", "col2", "col3")

# Creating a 4 x 3 matrix filled row by row with the values 5 to 16.
R <- matrix(5:16, nrow = 4, byrow = TRUE,
            dimnames = list(row_names, col_names))

# Accessing the element in the 3rd row and 2nd column.
print(R[3, 2])

# Adding a column to the matrix with the cbind() function.
new_col <- c(17, 18, 19, 20)
R <- cbind(R, new_col)

# Printing the updated matrix.
print(R)

Here is the output of the program:

[1] 12
     col1 col2 col3 new_col
row1    5    6    7      17
row2    8    9   10      18
row3   11   12   13      19
row4   14   15   16      20
This program first defines the row and column names for the matrix. Then it creates the matrix using
the matrix() function, specifying the data elements, the number of rows, row-wise filling, and the row and
column names. Finally, it reads one element by its row and column index and appends a new column with cbind().
