MODULE-3
1. Train a regularized logistic regression classifier on the iris dataset (https://archive.ics.uci.edu/ml/machine-learning-databases/iris/ or the inbuilt iris dataset) using sklearn. Train the model with the hyperparameter C = 1e4 and report the best classification accuracy.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a pipeline with StandardScaler and LogisticRegression with regularization
pipeline = make_pipeline(StandardScaler(), LogisticRegression(C=1e4, max_iter=1000))

# Train the model
pipeline.fit(X_train, y_train)

# Calculate the accuracy on the testing set
accuracy = pipeline.score(X_test, y_test)
print("Classification accuracy:", accuracy)
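To report the best accuracy over a range of regularization strengths rather than a single fixed C, a cross-validated grid search can be wrapped around the same pipeline. The sketch below is an optional extension, not part of the required answer; the C grid and the 5-fold CV setting are illustrative assumptions.

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42)

pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Illustrative C grid (an assumption, not from the question);
# 'logisticregression__C' addresses the LogisticRegression step of the pipeline.
param_grid = {'logisticregression__C': [1e-2, 1e0, 1e2, 1e4]}
search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X_train, y_train)

print("Best C:", search.best_params_)
print("Best cross-validation accuracy:", search.best_score_)
print("Test accuracy with best C:", search.score(X_test, y_test))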
2. Train an SVM classifier on the iris dataset using sklearn. Try different kernels and the associated hyperparameters. Train the model with the following set of hyperparameters: RBF kernel, gamma = 0.5, one-vs-rest classifier, no feature normalization. Also try C = 0.01, 1, 10. For the above set of hyperparameters, find the best classification accuracy along with the total number of support vectors on the test data.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Set of hyperparameters to try
hyperparameters = [
    {'kernel': 'rbf', 'gamma': 0.5, 'C': 0.01},
    {'kernel': 'rbf', 'gamma': 0.5, 'C': 1},
    {'kernel': 'rbf', 'gamma': 0.5, 'C': 10},
]

best_accuracy = 0
best_model = None
best_support_vectors = None

# Train SVM models with different hyperparameters and find the best accuracy
for params in hyperparameters:
    model = SVC(kernel=params['kernel'], gamma=params['gamma'], C=params['C'],
                decision_function_shape='ovr')
    model.fit(X_train, y_train)
    accuracy = model.score(X_test, y_test)
    support_vectors = model.n_support_.sum()
    print(f"For hyperparameters: {params}, Accuracy: {accuracy}, "
          f"Total Support Vectors: {support_vectors}")
    if accuracy > best_accuracy:
        best_accuracy = accuracy
        best_model = model
        best_support_vectors = support_vectors

print("\nBest accuracy:", best_accuracy)
print("Total support vectors on test data:", best_support_vectors)
MODULE-4
Consider the following dataset. Write a program to demonstrate the working of the decision tree based ID3 algorithm.
from sklearn.tree import DecisionTreeClassifier, export_graphviz
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import pandas as pd
from io import StringIO
from IPython.display import Image
import pydotplus

# Define the dataset
data = {
    'Price': ['Low', 'Low', 'Low', 'Low', 'Low', 'Med', 'Med', 'Med', 'Med',
              'High', 'High', 'High', 'High'],
    'Maintenance': ['Low', 'Med', 'Low', 'Med', 'High', 'Med', 'Med', 'High',
                    'High', 'Med', 'Med', 'High', 'High'],
    'Capacity': ['2', '4', '4', '4', '4', '4', '4', '2', '5', '4', '2', '2', '5'],
    'Airbag': ['No', 'Yes', 'No', 'No', 'No', 'No', 'Yes', 'Yes', 'No', 'Yes',
               'Yes', 'Yes', 'Yes'],
    'Profitable': [1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 1]
}
df = pd.DataFrame(data)

# Convert categorical variables into numerical ones
df = pd.get_dummies(df, columns=['Price', 'Maintenance', 'Airbag'])

# Separate features and target variable
X = df.drop('Profitable', axis=1)
y = df['Profitable']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a decision tree classifier (entropy criterion, as used by ID3)
clf = DecisionTreeClassifier(criterion='entropy')

# Train the classifier on the training data
clf.fit(X_train, y_train)

# Predict on the testing data
y_pred = clf.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

# Visualize the decision tree
dot_data = StringIO()
export_graphviz(clf, out_file=dot_data, filled=True, rounded=True,
                special_characters=True, feature_names=X.columns)
graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
Image(graph.create_png())
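If graphviz and pydotplus are not available, the same fitted tree can be inspected as indented text with sklearn's export_text. A minimal sketch, assuming clf and X from the block above:

from sklearn.tree import export_text

# Print the fitted tree as plain text (no graphviz dependency needed).
print(export_text(clf, feature_names=list(X.columns)))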
Consider the dataset spiral.txt (https://bit.ly/2Lm75Ly). The first two columns in the dataset correspond to the co-ordinates of each data point. The third column corresponds to the actual cluster label. Compute the Rand index for the following methods:
K-means clustering
Single-link hierarchical clustering
Complete-link hierarchical clustering
Also visualize the dataset and determine which algorithm will be able to recover the true clusters.
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.metrics import adjusted_rand_score
import matplotlib.pyplot as plt

# Load the dataset
data = np.loadtxt("Spiral.txt", delimiter=",", skiprows=1)
X = data[:, :2]      # Features (co-ordinates)
y_true = data[:, 2]  # Actual cluster labels

# Visualize the dataset
plt.figure(figsize=(8, 6))
plt.scatter(X[:, 0], X[:, 1], c=y_true, cmap='viridis')
plt.title('True Clusters')
plt.xlabel('X1')
plt.ylabel('X2')
plt.show()

# K-means clustering
kmeans = KMeans(n_clusters=3, random_state=42, n_init=10)
kmeans_clusters = kmeans.fit_predict(X)

# Single-link Hierarchical Clustering
single_link = AgglomerativeClustering(n_clusters=3, linkage='single')
single_link_clusters = single_link.fit_predict(X)

# Complete-link Hierarchical Clustering
complete_link = AgglomerativeClustering(n_clusters=3, linkage='complete')
complete_link_clusters = complete_link.fit_predict(X)

# Compute the (adjusted) Rand index
rand_index_kmeans = adjusted_rand_score(y_true, kmeans_clusters)
rand_index_single_link = adjusted_rand_score(y_true, single_link_clusters)
rand_index_complete_link = adjusted_rand_score(y_true, complete_link_clusters)

print("Rand Index for K-means Clustering:", rand_index_kmeans)
print("Rand Index for Single-link Hierarchical Clustering:", rand_index_single_link)
print("Rand Index for Complete-link Hierarchical Clustering:", rand_index_complete_link)

# This code computes the adjusted Rand index for each clustering method and
# visualizes the true clusters. The adjusted Rand index ranges from -1 to 1,
# where 1 indicates perfect agreement with the true clusters and values near 0
# indicate random labelling. The method with a higher index is better at
# recovering the true clusters; on the spiral data, single-link clustering
# recovers the spirals because it chains nearby points, while K-means and
# complete-link favour compact, roughly spherical clusters.
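The question asks for the Rand index, while the code above reports the adjusted variant; scikit-learn (0.24 and later) also exposes the unadjusted measure. A minimal sketch, assuming the cluster labels computed in the block above:

from sklearn.metrics import rand_score

# Unadjusted Rand index (ranges from 0 to 1) for the same three clusterings.
print("Rand Index (K-means):", rand_score(y_true, kmeans_clusters))
print("Rand Index (single-link):", rand_score(y_true, single_link_clusters))
print("Rand Index (complete-link):", rand_score(y_true, complete_link_clusters))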
MODULE-5
Mini Project – Simple web scraping in social media
import requests
from bs4 import BeautifulSoup

# URL of the Instagram profile you want to scrape
url = 'https://www.instagram.com/openai/'

# Send a GET request to the URL
response = requests.get(url)
print(response.status_code)

# Check if the request was successful (status code 200)
if response.status_code == 200:
    # Parse the HTML content of the page
    soup = BeautifulSoup(response.text, 'html.parser')

    # Find all post elements
    posts = soup.find_all('div', class_='v1Nh3')

    # Extract data from each post
    for post in posts:
        # Extract post link
        post_link = post.find('a')['href']
        # Extract post image URL
        image_url = post.find('img')['src']
        print(f"Post Link: {post_link}")
        print(f"Image URL: {image_url}")
        print("------")
else:
    print("Failed to retrieve data from Instagram")