TY.B.Sc[COMPUTER SCIENCE] Roll No.
1952034
Certificate
VIDYA PRASARAK MANDAL’S
R.Z. SHAH COLLEGE OF ARTS, SCIENCE &
COMMERCE
Mithagar Road,Mulund(E) - 400 081
This is to certify that Mr./Miss Ritesh Shukla Roll No.1952034 has
completed his journal in the subject of DATA SCIENCE under the program
of BSC.Computer Science Semester SEM-VI University of Mumbai during
the year 2020-2021
He has completed the prescribed practical satisfactorily.
Examination Seat No.1952014
Date: _________________
In charge Head of the Department
INDEX
Sub:DATA SCIENCE
TY.B.Sc[COMPUTER SCIENCE] Roll No. 1952034
Sr. No  Title                                                                                                     Date        Signature
1       Practical of Data collection, Data curation and management for Unstructured data (NoSQL)                  7/02/2020
2       Practical of Data collection, Data curation and management for Large-scale Data system (such as MongoDB)  31/01/2020
3       Practical of Principal Component Analysis                                                                 7/02/2020
4       Practical of Clustering                                                                                   24/01/2020
5       Practical of Time-series forecasting                                                                      17/01/2020
6       Practical of Simple/Multiple Linear Regression                                                            24/01/2020
7       Practical of Logistic Regression                                                                          7/02/2020
8       Practical of Hypothesis testing                                                                           14/02/2020
9       Practical of Analysis of Variance                                                                         21/02/2020
10      Practical of Decision Tree                                                                                28/02/2020
Practical No. 1 Date: 7/02/2020
Aim: Practical of Data collection, Data curation and management for Unstructured data
(NoSQL)
CouchDB database (R script)
Install CouchDB first.
R script code:
install.packages('sofa')
#devtools::install_github("ropensci/sofa")
library('sofa')
#create connection object
x<-Cushion$new()
#to check whether object created
x$ping()
#create database ty
db_create(x,dbname = 'ty')
db_list(x)
#create json doc
doc1<-'{"rollno":"01","name":"ABC","GRADE":"A"}'
doc_create(x,doc1,dbname = "ty",docid = "a_1")
doc2<-'{"rollno":"02","name":"PQR","GRADE":"A"}'
doc_create(x,doc2,dbname = "ty",docid = "a_2")
doc3<-'{"rollno":"03","name":"xyz","GRADE":"B","REMARK":"PASS"}'
doc_create(x,doc3,dbname = "ty",docid = "a_3")
#CHANGES FEED
db_changes(x,"ty")
#search for id > null so all docs will display
db_query(x,dbname = "ty",
selector = list('_id'=list('$gt'=NULL)))$docs
#search for students with grade is A
db_query(x,dbname = "ty",selector = list(GRADE="A"))$docs
#search for students with remark =pass
db_query(x,dbname = "ty",selector = list(REMARK="PASS"))$docs
#return only certain fields where rollno>2
db_query(x,dbname = "ty",selector = list(rollno=list('$gt'='02')),fields=c("name","GRADE"))$docs
#convert the result of a query into a data frame using jsonlite
library("jsonlite")
res<-db_query(x,dbname = "ty",selector = list('_id'=list('$gt'=NULL)),fields=c("name","rollno","GRADE","REMARK"),as="json")
#display json doc
fromJSON(res)$docs
#doc_delete(cushion,dbname,docid)
doc_delete(x,dbname = "ty",docid = "a_2")
#a_2 was deleted above, so this lookup should report that the document is not found
doc_get(x,dbname = "ty",docid = "a_2")
doc2<-'{"name":"Sdrink","beer":"TEST","note":"yummy","note2":"yay"}'
#rev must be the current revision id of the document, as reported by doc_get()
doc_update(x,dbname = "ty",doc=doc2,docid="a_3",rev = "3-b1fb56db955b142c6efd3b3c52fe9e1b")
doc3<-'{"rollno":"01","name":"UZMA","GRADE":"A"}'
doc_update(x,dbname = "ty",doc=doc3,docid = "a_1",rev = "1-be7c98bddf8ea7c46f4f401ff387593d")
Output:
Practical No. 2 Date: 31/01/2020
Aim: Practical of Data collection, Data curation and management for
Large-scale Data system (such as MongoDB)
MongoDB Create database
MongoDB Drop Database
MongoDB Create collection
MongoDB Drop collection
MongoDB Insert Document
MongoDB Query Document
MongoDB Update Document
Delete document in MongoDB
MongoDB Projection
limit() and skip() method in MongoDB
Sorting of Documents in MongoDB
MongoDB Indexing
Starting the shell with mongo (the server process is mongod)
C:\>mongo
>db
test
Create Database in MongoDB
Once you are in the MongoDB shell, create the database in MongoDB
by typing this command:
use database_name
For example: create a database “tycs”:
> use tycs
switched to db tycs
Create a collection user and insert a document into it:
> db.user.insert({name: "Asif", age: 20})
O/P: WriteResult({ "nInserted" : 1 })
>show dbs
admin 0.000GB
config 0.000GB
local 0.000GB
tycs 0.000GB
MongoDB Drop Database
The syntax to drop a Database is:
>db.dropDatabase()
O/P:
{ "dropped" : "Testdb", "ok" : 1 }
MongoDB Enterprise > show dbs
admin 0.000GB
config 0.000GB
local 0.000GB
tycs 0.000GB
Create Collection in MongoDB
Method 1: Creating the Collection in MongoDB on the fly
MongoDB Enterprise > use tycs
switched to db tycs
MongoDB Enterprise > db.tycs.insert({name:"Asif khan",age:21,website:"www.google.com"})
O/P:
WriteResult({ "nInserted" : 1 })
Syntax: db.collection_name.find()
MongoDB Enterprise > db.tycs.find()
o/p:
{ "_id" : ObjectId("5e410808e3755b1e06a63d1d"), "name" : "Asif khan",
"age" : 21, "website" : "www.google.com" }
show collections
MongoDB Enterprise > show collections
O/P:
tycs
user
Drop collection in MongoDB
SYNTAX:
db.collection_name.drop()
MongoDB Enterprise > use students
switched to db students
MongoDB Enterprise > show collections
students
teachers
tycs
user
MongoDB Enterprise > db.user.drop()
true
MongoDB Enterprise > show collections
students
teachers
tycs
MongoDB Insert Document
Syntax to insert a document into the collection:
db.collection_name.insert()
> db.tycs.insert(
... {
... name: "ASIF",
... age: 20,
... email: "[email protected]",
... course: [ { name: "MongoDB", duration: 7 }, { name: "Java",
duration: 30 } ]
... }
... )
O/P:
WriteResult({ "nInserted" : 1 })
Verification:
Syntax:
db.collection_name.find()
> db.tycs.find()
{ "_id" : ObjectId("5c2d37734fa204bd77e7fc1c"), "name" : "ASIF",
"age" : 20, "email" : "[email protected]", "course" : [ { "name" :
"MongoDB", "duration" : 7 }, { "name" : "Java", "duration" : 30 } ] }
MongoDB Example: Insert Multiple Documents in collection
MongoDB Enterprise > var beginners=
... [
... {
... "studentID":1001,
... "studentName":"Asif",
... "age":20
... },
... ]
MongoDB Enterprise > db.students.insert(beginners)
MongoDB Query Document using find() method
Querying all the documents in JSON format
MongoDB Enterprise > db.students.find().pretty()
{
"_id" : ObjectId("5e410f3fe3755b1e06a63d1e"),
"studentID" : 1001,
"studentName" : "Asif",
"age" : 20
}
Query Document based on the criteria
> db.students.find({StudentName : "Asif"}).pretty()
{
"_id" : ObjectId("5c281c90c23e08d1515fd9cc"),
"StudentId" : 1001,
"StudentName" : "Asif",
"age" : 20
}
Updating Document using update() method
Syntax:
db.collection_name.update(criteria, update_data)
> use tycs
switched to db tycs
> show collections
beginnersbook
students
tycs
> db.createCollection("got")
{ "ok" : 1 }
> var abc = [
... {
... "_id" : ObjectId("59bd2e73ce524b733f14dd65"),
... "name" : "Asif",
... "age" : 20
... },
... ];
> db.got.insert(abc)
> db.got.update({"name":"Asif"},{$set:{"name":"steve"}})
> db.got.find().pretty()
{
"_id" : ObjectId("59bd2e73ce524b733f14dd65"),
"name" : "steve",
"age" : 20
}
To update multiple documents with the update() method:
db.got.update({"name":"Jon Snow"},
{$set:{"name":"Kit Harington"}},{multi:true})
Updating Document using save() method
Syntax:
db.collection_name.save( {_id:ObjectId(), new_document} )
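To get the _id of a document, either list every document:
db.got.find().pretty()
or query for the one you need:
> db.got.find({"name": "Asif"}).pretty()
{
"_id" : ObjectId("59bd2e73ce524b733f14dd65"),
"name" : "Asif",
"age" : 20
}
Pass that _id to save() together with the new field values, then verify:
> db.got.save({"_id" : ObjectId("59bd2e73ce524b733f14dd65"), "name" : "Steve", "age" : 20})
> db.got.find().pretty()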
{
"_id" : ObjectId("59bd2e73ce524b733f14dd65"),
"name" : "Steve",
"age" : 20
}
MongoDB Delete Document from a Collection
Syntax of remove() method:
db.collection_name.remove(delete_criteria)
Delete Document using remove() method
> db.students.find().pretty()
{
"_id" : ObjectId("59bcecc7668dcce02aaa6fed"),
"StudentId" : 1001,
"StudentName" : "Steve",
"age" : 30
}
db.students.remove({"StudentId": 3333})
Output:
WriteResult({ "nRemoved" : 1 })
To verify that the document was actually deleted, type the following
command:
db.students.find().pretty()
It lists all the documents remaining in the students collection.
> use tycs
switched to db tycs
> db.students.find().pretty()
{
"_id" : ObjectId("5c281c90c23e08d1515fd9cc"),
"StudentId" : 1001,
"StudentName" : "Asif",
"age" : 20
}
{
"_id" : ObjectId("5c2d38934fa204bd77e7fc1d"),
"StudentId" : 1001,
"StudentName" : "Steve",
"age" : 30
}
Remove all Documents
db.collection_name.remove({})
MongoDB Projection
Syntax:
db.collection_name.find({},{field_key:1 or 0})
> db.students.find().pretty()
{
"_id" : ObjectId("5c281c90c23e08d1515fd9cc"),
"StudentId" : 1001,
"StudentName" : "Steve",
"age" : 20
}
> db.students.find({}, {"_id": 0, "StudentId" : 1})
{ "StudentId" : 1001 }
{ "StudentId" : 1002 }
> db.students.find({}, {"_id": 0, "StudentName" : 0, "age" : 0})
{ "StudentId" : 1001 }
{ "StudentId" : 1002 }
MongoDB – limit( ) and skip( ) method
The limit() method in MongoDB
Syntax:
db.collection_name.find().limit(number_of_documents)
db.studentdata.find({student_id : {$gt:2002}}).pretty()
db.studentdata.find({student_id : {$gt:2002}}).limit(1).pretty()
MongoDB Skip() Method
db.studentdata.find({student_id : {$gt:2002}}).limit(1).skip(1).pretty()
MongoDB sort() method
Sorting Documents using sort() method
Syntax of sort() method:
db.collection_name.find().sort({field_key:1 or -1})
1 is for ascending order and -1 is for descending order. The default
value is 1.
For example, the collection studentdata contains the following documents:
> db.studentdata.find().pretty()
{
"_id" : ObjectId("59bf63380be1d7770c3982af"),
"student_name" : "Steve",
"student_id" : 1001,
"student_age" :1002
}
Let’s display the student_id of all the documents in descending order:
> db.studentdata.find({}, {"student_id": 1, _id:0}).sort({"student_id": -1})
{ "student_id" : 1001 }
{ "student_id" : 1002 }
To display the student_id field of all the students in ascending order:
> db.studentdata.find({}, {"student_id": 1, _id:0}).sort({"student_id": 1})
{ "student_id" : 1001 }
{ "student_id" : 1002 }
MongoDB Indexing with Example
How to create index in MongoDB
db.collection_name.createIndex({field_name: 1 or -1})
The value 1 is for ascending order and -1 is for descending order.
Let’s create the index on student_name field in ascending order:
db.studentdata.createIndex({student_name: 1})
Output:
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 1,
"numIndexesAfter" : 2,
"ok" : 1
}
MongoDB – Finding the indexes in a collection
db.collection_name.getIndexes()
> db.studentdata.getIndexes()
[
{
"v" : 2,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "test.studentdata"
},
]
Practical No. 3 Date :7/02/2020
Aim: Practical of Principal Component Analysis
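#keep the four numeric measurement columns of iris
data_iris<-iris[1:4]
#eigen decomposition of the sample covariance matrix
Cov_data<-cov(data_iris)
Eigen_data<-eigen(Cov_data)
#PCA on the covariance matrix (cor must be the logical FALSE, not a string)
PCA_data<-princomp(data_iris,cor=FALSE)
#compare the eigenvalues with the component variances
Eigen_data$values
PCA_data$sdev^2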
PCA_data$loadings[,1:4]
Eigen_data$vectors
summary(PCA_data)
biplot(PCA_data)
screeplot(PCA_data,type="lines")
model2=PCA_data$loadings[,1]
model2_scores<-as.matrix(data_iris)%*%model2
library(class)
install.packages("e1071")
library(e1071)
mod1<-naiveBayes(iris[,1:4],iris[,5])
mod2<-naiveBayes(model2_scores,iris[,5])
table(predict(mod1,iris[,1:4]),iris[,5])
table(predict(mod2,model2_scores),iris[,5])
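The eigenvalues from eigen() and the variances from princomp() printed in this practical are close but not identical. A minimal sketch of why, assuming only base R: cov() divides by n - 1 while princomp() divides by n, so the two agree after rescaling. The names eig, pca, rescaled and prop_var are illustrative and not part of the original script.

```r
# Sketch: relate eigen() on the covariance matrix to the princomp() output.
data_iris <- iris[1:4]
n <- nrow(data_iris)

eig <- eigen(cov(data_iris))              # cov() uses the divisor n - 1
pca <- princomp(data_iris, cor = FALSE)   # princomp() uses the divisor n

# Rescale the eigenvalues to the divisor used by princomp().
rescaled <- eig$values * (n - 1) / n
variance_match <- isTRUE(all.equal(unname(pca$sdev^2), rescaled))

# Proportion of the total variance explained by each component.
prop_var <- eig$values / sum(eig$values)

print(variance_match)
print(round(prop_var, 3))
```

The proportions in prop_var are the same figures that summary(PCA_data) reports as "Proportion of Variance".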
Output:
Practical No. 4 Date :24/01/2020
Aim: Practical of Clustering
"K-means Clustering "
data(iris)
names(iris)
new_data<-subset(iris,select = c(-Species))
new_data
cl<-kmeans(new_data,3)
cl
data<-new_data
wss<-sapply(1:15,function(k){kmeans(data,k)$tot.withinss})
wss
plot(1:15,wss,type="b",pch=19,frame=FALSE,xlab ="Number of
clusters K",ylab = "Total within-clusters sums of squares")
library(cluster)
clusplot(new_data,cl$cluster,color=TRUE,shade=TRUE,labels=2,lines=
0)
cl$cluster
cl$centers
"agglomarative clustering "
clusters<-hclust(dist(iris[,3:4]))
plot(clusters)
clusterCut<-cutree(clusters,3)
table(clusterCut,iris$Species)
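kmeans() starts from random centres, so the cluster labels and the elbow curve can change from run to run. The sketch below is one possible way to make the run repeatable, assuming a fixed seed and several random restarts; the seed value 42, nstart = 25 and the range 1:10 are arbitrary choices and not part of the original practical.

```r
# Sketch: reproducible k-means on the iris measurements.
new_data <- subset(iris, select = -Species)

set.seed(42)                                   # arbitrary seed (assumption)
cl <- kmeans(new_data, centers = 3, nstart = 25)

# Total within-cluster sum of squares for k = 1..10, 25 restarts each.
wss <- sapply(1:10, function(k) {
  kmeans(new_data, centers = k, nstart = 25)$tot.withinss
})

# Cross-tabulate the clusters against the known species.
agreement <- table(cl$cluster, iris$Species)
print(agreement)
print(round(wss, 1))
```

The cross-tabulation plays the same role for k-means as table(clusterCut,iris$Species) does for the hierarchical clustering.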
Output:
Practical No. 5 Date: 17/01/2020
Aim: Practical of Time-series forecasting
#consider the inbuilt data set Air Passengers
data("AirPassengers")
class(AirPassengers)
start(AirPassengers)
#to know the end of time series
end(AirPassengers)
frequency(AirPassengers)
#to know the mean median etc of the dataset
summary(AirPassengers)
#to plot the time series model
plot(AirPassengers)
abline(reg=lm(AirPassengers~time(AirPassengers)))
cycle(AirPassengers)
plot(aggregate(AirPassengers,FUN=mean))
boxplot(AirPassengers~cycle(AirPassengers))
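The commands above explore the series but stop before producing a forecast. A minimal forecasting sketch using only the stats package follows. The seasonal ARIMA orders (0,1,1)(0,1,1) with period 12 and the 12-month horizon are assumptions chosen for illustration, not a model taken from this practical.

```r
# Sketch: seasonal ARIMA forecast of AirPassengers (model orders are assumed).
data("AirPassengers")
log_ap <- log(AirPassengers)      # the log steadies the growing seasonal swings

fit <- arima(log_ap, order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))

# Forecast the next 12 months and convert back to passenger counts.
fc <- predict(fit, n.ahead = 12)
forecast_values <- exp(fc$pred)

print(round(forecast_values))
# Observed series (solid) followed by the forecast (dashed).
ts.plot(AirPassengers, forecast_values, gpars = list(lty = c(1, 2)))
```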
Output:
Practical No. 6 Date : 24/01/2020
Aim: Practical of Simple/Multiple Linear Regression
#consider some data set
height<-c(102,117,105,141,135,115,138,144,137,100,131,119,115,121,113)
weight<-c(61,46,62,54,60,69,51,50,46,64,48,56,64,48,59)
#lm is for Linear Regression
student<-lm(weight~height)
student
#to predict use predict command
predict(student,data.frame(height=199),interval="confidence")
#to draw the diagnostic plots of the fitted model
plot(student)
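The script above fits a simple regression with one predictor. For the multiple regression part of the aim, a possible sketch is given below. It uses the built-in mtcars data as a stand-in because the journal has no second predictor for the height and weight data; the choice of mpg, wt and hp and the values in new_car are assumptions.

```r
# Sketch: multiple linear regression on the built-in mtcars data (assumed example).
fit <- lm(mpg ~ wt + hp, data = mtcars)
print(coef(fit))
print(summary(fit)$r.squared)

# Predict for a hypothetical car inside the observed range of wt and hp.
new_car <- data.frame(wt = 3.0, hp = 120)
pred <- predict(fit, new_car, interval = "confidence")
print(pred)
```

Keeping the new values inside the observed range avoids extrapolating the fitted line beyond the data.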
Output:
Practical No. 7 Date :7/02/2020
Aim: Practical of Logistic Regression
library(datasets)
ir_data<-iris
head(ir_data)
str(ir_data)
levels(ir_data$Species)
sum(is.na(ir_data))
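#keep the first 100 rows (setosa and versicolor) for a two-class problem
ir_data<-ir_data[1:100,]
set.seed(100)
samp<-sample(1:100,80)
#ir_test holds the 80 rows used to fit the model, ir_ctrl the 20 held-out rows
ir_test<-ir_data[samp,]
ir_ctrl<-ir_data[-samp,]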
install.packages("ggplot2")
library(ggplot2)
library(ggplot2)
install.packages("GGally")
ggpairs(ir_test)
y<-ir_test$Species;
x<-ir_test$Sepal.Length
glfit<-glm(y~x,family='binomial')
summary(glfit)
newdata<-data.frame(x=ir_ctrl$Sepal.Length)
predicted_val<-predict(glfit,newdata,type="response")
prediction<-data.frame(ir_ctrl$Sepal.Length,ir_ctrl$Species,predicted_val)
prediction
qplot(prediction[,1],round(prediction[,3]),col=prediction[,2],xlab='sepal.Length',ylab='prediction using logistic Reg')
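The plot shows the rounded predictions, but the script does not count how many held-out rows are classified correctly. A minimal base R sketch of that check follows. The 0.5 cut-off and the names two_class, train, held_out, confusion and accuracy are assumptions added for illustration.

```r
# Sketch: hold-out accuracy for a two-class logistic regression on iris.
two_class <- droplevels(iris[1:100, ])       # setosa and versicolor only
set.seed(100)
train_idx <- sample(1:100, 80)
train <- two_class[train_idx, ]
held_out <- two_class[-train_idx, ]

fit <- glm(Species ~ Sepal.Length, data = train, family = binomial)

# Predicted probability of the second factor level (versicolor).
prob <- predict(fit, newdata = held_out, type = "response")
pred_class <- ifelse(prob > 0.5, "versicolor", "setosa")   # assumed 0.5 cut-off

confusion <- table(predicted = pred_class, actual = held_out$Species)
accuracy <- mean(pred_class == held_out$Species)
print(confusion)
print(accuracy)
```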
Output:
Practical No. 8 Date :14/02/2020
Aim: Practical of Hypothesis testing
#Entering the data
>x=c(6.2,6.6,7.1,7.4,7.6,7.9,8,8.3,8.4,8.5,8.6,8.8,8.8,9.1,9.2,9.4,9.7,9.9,10.2,10.4,10.8,11.3,11.9)
#one-sample Hypothesis test
>t.test(x-9,alternative = "two.sided",conf.level = 0.95)
#Two-sample hypothesis test
>x=c(481,421,421,422,425,427,431,434,437,439,446,447,448,454,463,465)
>y=c(429,430,430,431,36,437,440,441,445,446,447)
>test2<-t.test(x,y,alternative = "two.sided",mu=0,var.equal = F,conf.level = 0.95)
>test2
Output:
Interpretation of the result:
The p-value (0.2998) is greater than the 5% significance level (1 - 0.95),
so we fail to reject the null hypothesis: the data are consistent with
equal population means.
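The comparison described in the interpretation can also be made in code rather than by reading the printed output. A minimal sketch, assuming the two samples exactly as typed in this practical; the names alpha, p_value and reject_null are illustrative.

```r
# Sketch: turn the two-sample t-test into an explicit decision at the 5% level.
x <- c(481,421,421,422,425,427,431,434,437,439,446,447,448,454,463,465)
y <- c(429,430,430,431,36,437,440,441,445,446,447)
alpha <- 1 - 0.95                       # significance level from conf.level

test2 <- t.test(x, y, alternative = "two.sided", mu = 0,
                var.equal = FALSE, conf.level = 0.95)
p_value <- test2$p.value
reject_null <- p_value < alpha

print(p_value)
if (reject_null) {
  print("Reject the null hypothesis of equal means")
} else {
  print("Fail to reject the null hypothesis of equal means")
}
```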
Practical No.9 Date : 21/02/2020
Aim: Practical of Analysis of Variance
#F test to compare the variances of two groups
ftest<-read.csv(file.choose(),sep=",",header = T)
var.test(ftest$time_g1,ftest$time_g2,alternative = "two.sided")
#one way anova
data1<-read.csv(file.choose(),sep=",",header=T)
names(data1)
summary(data1)
head(data1)
anv<-aov(formula=satindex~dept,data=data1)
summary(anv)
#two way anova
data2<-read.csv(file.choose(),sep=",",header=T)
names(data2)
summary(data2)
head(data2)
anv1<-aov(formula = satindex~dept+exp+dept*exp,data=data2)
summary(anv1)
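The script above depends on CSV files chosen interactively, so it cannot be rerun without them. The sketch below repeats the same one-way and two-way aov() calls on built-in datasets. PlantGrowth and ToothGrowth are assumed stand-ins and are not the data used in this practical.

```r
# Sketch: one-way and two-way ANOVA on built-in datasets (assumed stand-ins).
# One-way: does the mean plant weight differ between treatment groups?
one_way <- aov(weight ~ group, data = PlantGrowth)
print(summary(one_way))

# Two-way with interaction: tooth length by supplement type and dose.
tg <- ToothGrowth
tg$dose <- factor(tg$dose)              # treat dose as a categorical factor
two_way <- aov(len ~ supp * dose, data = tg)
print(summary(two_way))

# Extract the one-way p-value from the summary table.
p_one_way <- summary(one_way)[[1]][["Pr(>F)"]][1]
print(p_one_way)
```

The formula supp * dose expands to both main effects plus their interaction, the same structure as dept+exp+dept*exp.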
Output:
Practical No.10 Date : 28/02/2020
Aim: Practical of Decision Tree
mydata<-data.frame(iris)
attach(mydata)
install.packages("rpart")
library(rpart)
model<-rpart(Species~Sepal.Length+Sepal.Width+Petal.Length+Petal.Width,data=mydata,method="class")
plot(model)
text(model,use.n=TRUE,all=TRUE,cex=0.8)
install.packages("tree")
library(tree)
model1<-tree(Species~Sepal.Length+Sepal.Width+Petal.Length+Petal.Width,data=mydata,method="class",split="gini")
plot(model1)
#label the tree that was just plotted (model1, not the rpart model)
text(model1,all=TRUE,cex=0.6)
install.packages("party")
library(party)
model2<-ctree(Species~Sepal.Length+Sepal.Width+Petal.Length+Petal.Width,data=mydata)
plot(model2)
library(tree)
mydata<-data.frame(iris)
attach(mydata)
model<-tree(Species~Sepal.Length+Sepal.Width+Petal.Length+Petal.Width,data=mydata,method="class",control=tree.control(nobs=150,mincut=10))
#plot and label the tree fitted with tree.control
plot(model)
text(model,all=TRUE,cex=0.6)
predict(model,iris)
model2<-ctree(Species~Sepal.Length+Sepal.Width+Petal.Length+Petal.Width,data=mydata,controls=ctree_control(maxdepth=2))
plot(model2)
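The trees are plotted above but their predictions are not compared with the actual species. A minimal sketch of that check for the rpart model, assuming evaluation on the same 150 rows used for fitting (so the accuracy is optimistic); the names pred, confusion and accuracy are illustrative.

```r
# Sketch: training-set confusion matrix for an rpart classification tree on iris.
library(rpart)

model <- rpart(Species ~ ., data = iris, method = "class")
pred <- predict(model, iris, type = "class")

# Rows are predicted classes, columns are the actual species.
confusion <- table(predicted = pred, actual = iris$Species)
accuracy <- sum(diag(confusion)) / sum(confusion)
print(confusion)
print(accuracy)
```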
Output: