Python Project

This document loads news article text and labels from a CSV file and preprocesses the text by lowercasing it, removing special characters, digits, and stopwords, and lemmatizing the remaining words. It then splits the data into training and test sets (80/20), applies TF-IDF vectorization, trains a PassiveAggressiveClassifier on the training set, predicts labels for the test set, and reports the model's accuracy and confusion matrix.

Uploaded by bebshnnsjs

# Import necessary libraries

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer
import re

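# Download NLTK resources (packages already present are not downloaded again)
nltk.download('punkt')
nltk.download('punkt_tab')  # tokenizer data that recent NLTK releases look for instead of 'punkt'
nltk.download('stopwords')
nltk.download('wordnet')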

# Load the dataset (assuming it's in CSV format)
data = pd.read_csv('/news.csv')  # Replace '/news.csv' with the path to your file

# Explore the dataset
print(data.head())  # Check the first few rows
data.info()  # Print column types and non-null counts (info() prints directly and returns None)

# Data preprocessing
lemmatizer = WordNetLemmatizer()
stop_words = set(stopwords.words('english'))

def preprocess_text(text):
    # Convert text to lowercase
    text = text.lower()

    # Remove special characters and digits
    text = re.sub(r'\W', ' ', text)
    text = re.sub(r'\d', ' ', text)

    # Tokenize the text
    words = word_tokenize(text)

    # Remove stop words and lemmatize tokens
    words = [lemmatizer.lemmatize(word) for word in words if word not in stop_words]

    # Join words back into text
    processed_text = ' '.join(words)
    return processed_text

data['text'] = data['text'].fillna('').apply(preprocess_text)  # fillna('') guards against rows with missing text

# Feature extraction
X = data['text']
y = data['label']

# Split the dataset into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# TF-IDF Vectorization
tfidf_vectorizer = TfidfVectorizer(max_features=5000, stop_words='english')
tfidf_train = tfidf_vectorizer.fit_transform(X_train)
tfidf_test = tfidf_vectorizer.transform(X_test)

# Model building - using Passive Aggressive Classifier
model = PassiveAggressiveClassifier(max_iter=50, random_state=42)  # fixed seed so repeated runs give the same result
model.fit(tfidf_train, y_train)

# Prediction
y_pred = model.predict(tfidf_test)

# Model evaluation
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)

print(f"Accuracy: {accuracy * 100:.2f}%")

print(f"Confusion Matrix:\n{conf_matrix}")
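
The raw array printed for the confusion matrix does not say which row or column belongs to which class. The following is a minimal sketch of one way to label it. It assumes the label column holds the two strings 'FAKE' and 'REAL' (an assumption; the source does not show the label values) and uses a handful of hand-written toy labels in place of y_test and y_pred.

```python
import pandas as pd
from sklearn.metrics import confusion_matrix

# Hypothetical toy labels standing in for y_test and y_pred from the script above.
y_true = ["FAKE", "REAL", "REAL", "FAKE", "REAL", "FAKE"]
y_hat = ["FAKE", "REAL", "FAKE", "FAKE", "REAL", "REAL"]

# Assumed class names; passing them explicitly fixes the row and column order.
labels = ["FAKE", "REAL"]
cm = confusion_matrix(y_true, y_hat, labels=labels)

# Rows are the actual classes, columns are the predicted classes.
cm_table = pd.DataFrame(
    cm,
    index=[f"actual {name}" for name in labels],
    columns=[f"predicted {name}" for name in labels],
)
print(cm_table)
```

With the real data, y_test and y_pred would replace the toy lists, and labels would need to match the values actually present in data['label'].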
