EX.NO: 1 DPLYR PACKAGE
DATE:
Do the data manipulation operations for the iris and mtcars datasets using the dplyr package and obtain
the results for the following functions:
i) filter
ii) select
iii) arrange
iv) mutate
v) summarise
AIM:
To do the data manipulation operations for the iris and mtcars datasets using the dplyr package and obtain
the results for the following functions.
Procedure and code:
i) Filter:
To install dplyr, use the below command
install.packages("dplyr")
To load dplyr, use the below command
library(dplyr)
Loading a data set
data("mtcars")
data("iris")
mydata <- mtcars
mydata
Creating a local data frame (tibble). Local data frames are easier to read. Note that tbl_df() is deprecated
in recent versions of dplyr, where as_tibble() is the recommended replacement.
mynewdata <- tbl_df(mydata)
mynewdata
myirisdata <- tbl_df(iris)
myirisdata
Use filter() to filter data with the required condition:
filter(mynewdata, cyl > 4 & gear > 4)
filter(mynewdata, cyl > 4)
filter(myirisdata, Species %in% c('setosa', 'virginica'))
ii) select:
When you are working with large datasets that have many columns but you are interested in only a few,
select() allows you to rapidly zoom in on a useful subset using operations that usually only work
on numeric variable positions.
select(mynewdata, cyl, mpg, hp)
Hide a range of columns:
select(mynewdata, -c(mpg, cyl, disp))
iii) arrange:
arrange() reorders the rows by the given columns; wrapping a column in desc() sorts it in descending order.
mynewdata %>%
  select(cyl, wt, gear) %>%
  arrange(desc(wt))
iv) mutate:
The mutate() function adds new variables while preserving the existing ones: it creates new columns that
are functions of existing columns.
mynewdata %>%
  select(mpg, cyl) %>%
  mutate(newvariable = mpg * cyl)
v) summarise:
The summarise() function collapses a data frame to a single row (or to one row per group when the data
are grouped with group_by()).
myirisdata %>%
  group_by(Species) %>%
  summarise(average = mean(Sepal.Length, na.rm = TRUE))
RESULT:
The dplyr package program has been executed successfully.
EX.NO:2 TIDYR PACKAGE
DATE:
Create a data frame and do the following operations using the tidyr package:
i) gather
ii) spread
iii) separate
iv) unite
AIM:
To create a data frame and do the following operations using the tidyr package.
Procedure and Code:
Installing and loading the tidyr package:
install.packages('tidyr')
library(tidyr)
Creating a dummy data set.
name <- c('Akanash', 'Bhanu','Vinay', 'Varun', 'Prashanth')
weight <- c(35,45,55,65,75)
age <- c(20,21,22,23,24)
class <- c('maths','physics','chemistry','biology','science')
Create a data frame
tdata <- data.frame( name, weight, age, class)
tdata
i) gather():
gather() takes multiple columns and converts them into key: value pairs, transforming data from wide form
to long form. It can be used as an alternative to melt() in the reshape package. (In recent versions of
tidyr, gather() is superseded by pivot_longer().)
longt <- tdata %>% gather(key, value, weight:class)
longt
ii) spread():
spread() does the reverse of gather(): it accepts a key: value pair and spreads it back into separate
columns. (In recent tidyr it is superseded by pivot_wider().)
wide <- longt %>% spread(key, value)
wide
iii) separate():
separate() splits a single column into multiple columns. It is useful when a column holds several pieces
of information at once, such as a date-time variable; splitting it lets you use those values individually.
The following code shows the usage of the separate function.
Create a data frame:
Humidity <- c(37.79,42.34,52.16,44.57,48.83,44.59)
Rain <- c(0.971360441,1.1096716,1.06475853,0.953183435,0.98878849,0.9887643)
Time <- c("13/03/2018 23:24","09/01/2019 15.44","25/12/2018 19:15","02/01/2019 07:46","14/03/2018 01:55","20/10/2018 20:52")
dset <- data.frame(Humidity, Rain, Time)
dset
Using the separate function we can split Time into date, month and year (by default separate() splits on
any non-alphanumeric character, so the extra time pieces are dropped with a warning).
separate_d <- dset %>% separate(Time, c('Date', 'Month', 'Year'))
separate_d
iv) unite():
unite() does the reverse of separate(): it combines multiple columns into a single column.
unite_d <- separate_d %>% unite(Time, c(Date, Month, Year), sep = "/")
unite_d
RESULT:
The tidyr package program has been executed successfully.
EX.NO: 3 DATA.TABLE PACKAGE
DATE:
Do the following operations for a dataset using the data.table package:
i) Select a subset of rows
ii) Select a column with particular values
iii) Select columns with multiple values
AIM:
To do the data manipulation operations for the built-in airquality and iris datasets using the data.table package.
Procedure and Code:
Loading air quality data:
data("airquality")
mydata <- airquality
mydata
Loading iris data:
data("iris")
myiris <- iris
myiris
Converting into table format:
install.packages("data.table")
library(data.table)
myirisdata <-data.table(myiris)
myirisdata
i) Select a subset of rows:
mydata[2:4, ]
ii) Select a column with particular values
myirisdata[Species == 'setosa']
iii) Select columns with multiple values
myirisdata[Species %in% c('setosa','virginica')]
RESULT:
Thus, the data manipulation using the data.table package has been executed successfully.
EX.NO: 4 GGPLOT2 PACKAGE
DATE:
AIM:
To do the different types of visualization for built-in datasets (iris, mpg, mtcars) using the ggplot2
package in R.
a. Line graph
b. Bar graph
c. Histogram
d. Scatter plot
e. Pie chart
PROCEDURE AND CODE:
ggplot2 is a plotting package that provides helpful commands to create plots from data in a data frame.
It provides a more programmatic interface for specifying what variables to plot and how they are displayed.
Installation and Loading:
install.packages("ggplot2")
library(ggplot2)
The 'iris' data comprises 150 observations of 5 variables.
i) Line graph (density curves of Sepal.Length, drawn as one line per species):
ggplot(iris, aes(x = Sepal.Length, color = Species)) + geom_density()
OUTPUT:
ii) Bar graph:
ggplot(mpg, aes(x = class)) + geom_bar()
ggplot() -> plotting function from the ggplot2 library
geom -> geometry (plot type)
fill -> denotes colour
OUTPUT:
iii) Histogram:
ggplot(data = iris, aes(x = Sepal.Length)) + geom_histogram()
OUTPUT:
iv) Scatter plot:
library(ggplot2)
ggplot(mtcars, aes(x = drat, y = mpg)) +
geom_point()
OUTPUT:
RESULT:
Thus, the data visualisation using the ggplot2 package has been executed successfully.
EX.NO: 5 DATA.TABLE PACKAGE
DATE:
AIM:
To do the data manipulation operations for the iris and airquality datasets
using the data.table package and obtain the results for the following
functions.
a. Select a subset row
b. Select a column with particular values
c. Select columns with multiple values
d. Select a column to return a vector
e. Select multiple columns
f. Returns the sum and standard deviation
g. Sum of selected columns
PROCEDURE AND CODE:
The data.table package is an enhanced version of data.frames, which
are the standard data structure for storing data in base R.
To install data.table, use the command below:
install.packages("data.table")
To load data.table, use the command below:
library(data.table)
Converting the data sets into data.tables:
data <- as.data.table(iris)
air <- as.data.table(airquality)
head(air)
head(data)
i) Select a subset of rows:
head(data[2:4, ])
OUTPUT:
ii) Select a column with particular values:
data[Species == 'setosa']
OUTPUT:
iii) Select columns with multiple values:
data[Species %in% c('setosa', 'virginica')]
OUTPUT:
iv) Select a column to return a vector:
air[, Temp]
OUTPUT:
v) Select multiple columns:
air[, .(Temp, Month)]
OUTPUT:
vi) Return the sum and standard deviation:
air[, .(sum(Ozone, na.rm = TRUE), sd(Ozone, na.rm = TRUE))]
OUTPUT:
vii) Sum of a selected column:
air[, sum(Ozone, na.rm = TRUE)]
OUTPUT:
RESULT:
Thus, the data.table package program has been executed
successfully.
EX.NO:6 CREATE THE VISUALIZATION GRAPHS
DATE:
AIM:
To create different types of graphs for user inputs using matplotlib: 1) Line graph 2) Line
graph with style 3) Bar graph (horizontal and vertical) 4) Histogram
5) Scatter plot.
PROCEDURE AND CODE:
1. LINE GRAPH:
import matplotlib.pyplot as plt
x=[5,6,8,10,15]
y=[20,30,40,50,55]
plt.plot(x,y)
plt.title("STUDENT DATA-LINE GRAPH")
plt.ylabel('Present %')
plt.xlabel('Roll.no')
plt.show()
OUTPUT PLOT:
2. LINE GRAPH WITH STYLE:
import matplotlib.pyplot as plt
import matplotlib.style
x=[5,6,8,10,15]
y=[20,30,40,50,55]
x2=[2,13,16,20,18]
y2=[25,35,16,23.5,40]
plt.plot(x,y,'c',label='A',linewidth=6)
plt.plot(x2,y2,'purple',label='B',linewidth=6)
plt.title('STUDENT DATA-LINE GRAPH WITH STYLE')
plt.ylabel('Present %')
plt.xlabel('Roll.no')
plt.legend()
plt.show()
OUTPUT PLOT:
3. BAR GRAPH:
A - VERTICAL:
import matplotlib.pyplot as plt
studentnames = ['Adeline','Jane','Roo','Bluewhale','Rossey']
marks = [85,55,90,45,60]
plt.bar(studentnames,marks,color='purple')
plt.title('STUDENT DATA-BAR GRAPH VERTICAL')
plt.xlabel('NAMES')
plt.ylabel('MARKS')
plt.show()
OUTPUT PLOT:
B - HORIZONTAL:
import matplotlib.pyplot as plt
studentnames = ['Adeline','Jane','Roo','Bluewhale','Rossey']
marks = [85,55,90,45,60]
plt.barh(studentnames,marks,color='c')
plt.title('STUDENT DATA-BAR GRAPH HORIZONTAL')
plt.xlabel('MARKS')
plt.ylabel('NAMES')
plt.show()
OUTPUT PLOT:
4. HISTOGRAM:
import matplotlib.pyplot as plt
student_marks=[45,12,13,26,15,55,100,98,95,54,58,56,52,24,71,66,66.5,12,23,55,78,10,9,5,10,22,35,65,45]
bins=[0,10,20,30,40,50,60,70,80,90,100]
plt.hist(student_marks,bins,histtype='bar',rwidth=0.8,color='purple')
plt.xlabel('MARKS')
plt.ylabel('NUMBER OF STUDENTS')
plt.title('STUDENT DATA-HISTOGRAM')
plt.show()
OUTPUT PLOT:
5. SCATTER PLOT:
import matplotlib.pyplot as plt
import matplotlib.style
x=[5,6,8,10,15]
y=[20,30,40,50,55]
x2=[2,13,16,20,18]
y2=[25,35,16,23.5,40]
plt.scatter(x,y,color='purple')
plt.scatter(x2,y2,color='c')
plt.title('STUDENT DATA-SCATTER PLOT')
plt.ylabel('Present %')
plt.xlabel('Roll.no')
plt.show()
OUTPUT PLOT:
RESULT:
The visualization graphs program has been executed successfully.
EX.NO: 7 EXPLORATORY DATA ANALYSIS (EDA)
DATE:
AIM:
To write a Python program to implement Exploratory Data Analysis (EDA) on a dataset and prepare it for
data visualization.
PROCEDURE AND CODE:
The EDA approach can be used to gather knowledge about the following aspects of data:
Main characteristics or features of the data.
The important variables that can be used in our problem.
EDA is performed under two broad classifications (a short sketch of both appears after this list):
Descriptive statistics, which include mean, median, mode, inter-quartile range, and so on.
Graphical methods, which include histograms, density estimation, box plots, and so on.
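A minimal, self-contained sketch of the descriptive-statistics side (the column name and values below are
hypothetical, chosen only for illustration and not taken from train.csv):
import pandas as pd
# Hypothetical numeric column used to illustrate the measures listed above
df = pd.DataFrame({"sample_values": [100, 120, 120, 150, 200, 128, 66]})
print(df["sample_values"].describe())            # count, mean, std, min, quartiles, max
print("median:", df["sample_values"].median())   # middle value
print("mode:", df["sample_values"].mode()[0])    # most frequent value
iqr = df["sample_values"].quantile(0.75) - df["sample_values"].quantile(0.25)
print("inter-quartile range:", iqr)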
Reading dataset:
import pandas as pd
import numpy as np
data=pd.read_csv(r"D:\Dataset\train.csv")
print(data)
OUTPUT:
1) Getting first few rows of the dataset:
print(data.head())
OUTPUT:
2) Getting shape of the data:
print(data.shape)
OUTPUT:
3) Checking missing values in the data:
print(data.isnull().sum())
OUTPUT:
4) Checking Data Types of the data:
print(data.dtypes)
OUTPUT:
5) Filling missing values of categorical variables with the mode:
data["Gender"].fillna(data["Gender"].mode()[0], inplace=True)
data["Married"].fillna(data["Married"].mode()[0], inplace=True)
data["Dependents"].fillna(data["Dependents"].mode()[0], inplace=True)
data["Self_Employed"].fillna(data["Self_Employed"].mode()[0], inplace=True)
data["Loan_Amount_Term"].fillna(data["Loan_Amount_Term"].mode()[0], inplace=True)
data["Credit_History"].fillna(data["Credit_History"].mode()[0], inplace=True)
6) Filling missing values of a continuous variable with the mean:
data["LoanAmount"].fillna(data["LoanAmount"].mean(), inplace=True)
7) Checking missing values:
print(data.isnull().sum())
OUTPUT:
8) Converting categorical variables into numerical:
# replace() with inplace=True modifies the column in place and returns None,
# so there is nothing useful to print for these calls.
data['Gender'].replace(['Male', 'Female'], [0, 1], inplace=True)
data['Married'].replace(['No', 'Yes'], [0, 1], inplace=True)
data['Dependents'].replace(['0', '1', '2', '3+'], [0, 1, 2, 3], inplace=True)
data['Education'].replace(['Not Graduate', 'Graduate'], [0, 1], inplace=True)
data['Self_Employed'].replace(['No', 'Yes'], [0, 1], inplace=True)
data['Property_Area'].replace(['Rural', 'Semiurban', 'Urban'], [0, 1, 2], inplace=True)
data['Loan_Status'].replace(['N', 'Y'], [0, 1], inplace=True)
9) Checking data values:
print(data.head())
OUTPUT:
10) Saving the pre-processed data:
data.to_csv("new_data.csv", index=False)
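The graphical-methods side of EDA mentioned at the start of this exercise is not shown in the steps above;
a minimal sketch, assuming the preprocessed file saved in step 10 (LoanAmount is one of its columns), could
look like this:
import pandas as pd
import matplotlib.pyplot as plt
# Reload the pre-processed data saved in the previous step
data = pd.read_csv("new_data.csv")
# Histogram of the imputed LoanAmount column
data["LoanAmount"].plot(kind="hist", bins=20, title="LoanAmount distribution")
plt.xlabel("LoanAmount")
plt.show()
# Box plot of the same column, useful for spotting outliers
data.boxplot(column="LoanAmount")
plt.show()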
RESULT:
The Exploratory Data Analysis program has been executed successfully.
EX.NO: 8 IBM WATSON STUDIO-PROJECT
DATE:
AIM:
To create a new data visualization project in IBM Watson
Studio using an individual account.
PROCEDURE AND CODE:
1) Open IBM Watson studio and Login using your account. The
project and the catalog must be created by members of the same IBM
Cloud account.
2) Click New project on the home page or on your Projects page.
3) Choose whether to create an empty project or to create a
project based on an exported project file or a sample project.
4) On the New project screen, add a name and optional
description for the project.
If the project file that you select to import is encrypted, you must
enter the password that was used for encryption to enable decrypting
sensitive connection properties.
5) Create a new project in the Data analysis section.
6) Click Add to project.
7) Select Data for data manipulation.
8) Under Assets, browse for the data.
9) Refine the data.
10) Open Visualizations.
11) Select the columns to visualize.
12) Visualize the data.
RESULT:
Thus, the IBM Watson Studio project has been executed
successfully.
EX.NO: 9 DATA ANALYSIS – COVID-19 DATASET
DATE:
AIM:
To do the data analysis and visualization for the COVID-19
dataset.
PROCEDURE AND CODE:
1) Import the libraries:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
2) Read the Covid Analysis dataset:
data = pd.read_csv(r"D:\Dataset\Covid Analysis.csv")
print(data.head())
OUTPUT:
3) Getting statistical information from the data:
print(data.describe())
OUTPUT:
4) Checking whether the serial number is the index:
print(data.index.name == 'S_No')
OUTPUT:
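The check above only prints whether the index is already named S_No. If the goal is to actually make the
serial number the index, a minimal sketch (assuming the CSV really contains an S_No column, and kept in a
separate variable so the later steps that rely on the default integer index still work) could be:
indexed = data.set_index("S_No")  # hypothetical: use the S_No column as the row index
print(indexed.head())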
5) Removing the row with null values:
new_data = data.drop(0)
print(new_data)
OUTPUT:
6) Describing the new cleaned dataset:
print(new_data.describe())
OUTPUT:
7) Fetching the information of Tamilnadu:
Tn = new_data.loc[29]
print(Tn)
OUTPUT:
8) Plotting a bar graph of State vs Death:
plt.figure(figsize=(10,10))
plt.bar(new_data['Name of State / UT'],new_data['Death'])
plt.xticks(rotation=90)
plt.show()
OUTPUT:
9) Plotting all variables for each state:
plt.plot(new_data['Name of State / UT'], new_data['Date'], color='Blue')
plt.scatter(new_data['Name of State / UT'], new_data['Total Confirmed cases*'], color='Blue')
plt.plot(new_data['Name of State / UT'], new_data['Cured/Discharged/Migrated'], color='Red')
plt.scatter(new_data['Name of State / UT'], new_data['Death'], color='Red')
plt.plot(new_data['Name of State / UT'], new_data['Latitude'], color='Green')
plt.scatter(new_data['Name of State / UT'], new_data['Longitude'], color='Green')
plt.xticks(rotation=90)
plt.show()
OUTPUT:
RESULT:
The data analysis program has been executed successfully.