
Nagamani

The internship report details Nagamani K's work on 'Machine Learning Using Python' at Varcons Technologies Pvt Ltd as part of the Bachelor of Engineering program in Information Science and Engineering. The report covers various machine learning techniques, particularly sentiment analysis, and outlines the objectives, tasks performed, and methodologies used during the internship. It emphasizes hands-on experience with Python libraries and the development of predictive models for chronic disease risk assessment.


VISVESVARAYA TECHNOLOGICAL UNIVERSITY

JNANA SANGAMA, BELAGAVI-590018

Internship Report

on

“Machine Learning Using Python”


Submitted in the partial fulfillment of the requirement for the award of
Bachelor of Engineering
in
Information Science and Engineering
Submitted By

NAGAMANI K 1DT21IS094

Under the guidance of


Faculty name
Mrs. SUPRIYA R K
Dept. of Information Science and Engineering,
DSATM, Bangalore.

DEPARTMENT OF INFORMATION SCIENCE AND ENGINEERING


Accredited by NBA, New Delhi
DAYANANDA SAGAR ACADEMY OF TECHNOLOGY & MANAGEMENT
Udayapura, Kanakapura Main Road, Opp. Art of Living, Bangalore-82

2023-2024
DAYANANDA SAGAR ACADEMY OF TECHNOLOGY &
MANAGEMENT
(Affiliated to Visvesvaraya Technological University, Belagavi & Approved by AICTE, New Delhi)
(Accredited by NBA, New Delhi)
Opp. Art of Living, Udayapura, Kanakapura Road, Bangalore- 560082

DEPARTMENT OF INFORMATION SCIENCE & ENGINEERING

This is to certify that the internship work on “Machine Learning algorithms for predicting
the risks of chronic diseases” was carried out at “Varcons Technologies Pvt Ltd, Bangalore”
by NAGAMANI K (1DT21IS094) in partial fulfillment of the requirements of the V semester
internship report in Information Science and Engineering of the Visvesvaraya
Technological University, Belagavi, during the year 2023-2024. It is certified that all the
corrections/suggestions indicated for the given assessment have been incorporated in the
report. This report has been approved as it satisfies the academic requirements with respect to
the internship work.

Mrs. SUPRIYA R K
Dept. of ISE
DSATM, Bangalore.

Dr. Nandini Prasad K S
Dean-Foreign Affairs & HOD
Department of ISE
DSATM, Bangalore.
ACKNOWLEDGEMENT

I would like to take this opportunity to express my sincere thanks and gratitude to all
those who have been kind enough to guide me when needed, which has led to the successful
completion of the internship.

I would like to express my special thanks and gratitude to the Management of
Dayananda Sagar Academy of Technology and Management for providing all the
required facilities.

I would like to convey my immense gratitude to Dr. M. Ravishankar, Principal,
Dayananda Sagar Academy of Technology and Management.

I would like to express my profound gratitude to Dr. Nandini Prasad K S, Dean-Foreign
Affairs & HOD, Department of Information Science and Engineering, Dayananda Sagar
Academy of Technology and Management, Bangalore, for her continuous support and
encouragement, which has enabled me to complete my internship.

I would like to express my appreciation and gratitude to my Internship Guide,
Mrs. SUPRIYA R K, Department of Information Science and Engineering, Dayananda
Sagar Academy of Technology and Management, Bangalore.

I extend my sincere thanks to the internship coordinators, Department of Information
Science and Engineering, Dayananda Sagar Academy of Technology and Management,
Bangalore, for their suggestions towards the successful completion of my internship.

I would like to thank my parents and friends, who have helped me directly or indirectly in
completing this internship within the stipulated time frame.

NAGAMANI K 1DT21IS094
ABSTRACT

Many businesses use social media networks to deliver services, connect with clients, and
collect information about the thoughts and views of individuals. Sentiment analysis is a
machine learning technique that detects polarity, such as positive or negative opinion, within
text at the level of full documents, paragraphs, lines, or subsections. Machine Learning (ML) is a
multidisciplinary field, a mixture of statistics and computer science algorithms, that is commonly
used in predictive and classification analyses.

This report presents the common techniques for analyzing sentiment from a machine learning
perspective. It explores and discusses the idea of sentiment analysis through a systematic review
and assessment of corporate and community white papers, scientific research articles, journals,
and reports. The primary objective is to categorize and analyze the prevalent research techniques
and implementations of machine learning for sentiment analysis across various applications.

The limitation of this study is that the main emphasis is on the application side alone; the
hardware and theoretical aspects related to the subject are excluded. Finally, the report includes
a research proposal for applying machine learning algorithms to sentiment analysis in an
e-commerce environment.
TABLE OF CONTENTS

CHAPTER No.   TITLE                            PAGE No.
1             INTRODUCTION                     1
2             TASK PERFORMED AND OBJECTIVES    2-9
3             PROBLEM STATEMENT                10
4             METHODOLOGY                      11-12
5             SNAPSHOTS                        13
6             CONCLUSION                       14
7             REFERENCE                        15
LIST OF FIGURES

SL. No.   FIGURE No.   FIGURE TITLE                   PAGE No.
1         1            DECISION TREE                  12
2         2            COUNT OF DISEASE OCCURRENCE    12
Machine Learning using Python

CHAPTER 1

INTRODUCTION
Introduction to ML
Machine learning (ML) is a branch of artificial intelligence (AI) that enables computers to
“self-learn” from training data and improve over time, without being explicitly programmed.
Machine learning algorithms detect patterns in data and learn from them in order to
make their own predictions. In short, machine learning algorithms and models learn through
experience. In traditional programming, a computer engineer writes a series of instructions that
tell a computer how to transform input data into a desired output. These instructions are mostly
based on an IF-THEN structure: when certain conditions are met, the program executes a
specific action. Machine learning, on the other hand, is an automated process that enables
machines to solve problems with little or no human input, and to take actions based on past
observations.
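
The contrast between a hand-written IF-THEN rule and a learned rule can be sketched as follows. This is a minimal illustration using scikit-learn, the library named in the project introduction; the readings, the labels, and the threshold of 5.0 in the hand-written rule are made-up values, not data from the internship project.

```python
from sklearn.tree import DecisionTreeClassifier

# Traditional programming: a person writes the IF-THEN rule by hand.
def rule_based_flag(reading):
    # The threshold 5.0 is a hypothetical value chosen by the programmer.
    return 1 if reading >= 5.0 else 0

# Machine learning: the rule is inferred from labelled examples.
# Toy data (hypothetical): one feature per sample, label 1 = "at risk".
X = [[1.0], [2.0], [3.0], [7.0], [8.0], [9.0]]
y = [0, 0, 0, 1, 1, 1]

model = DecisionTreeClassifier(max_depth=1, random_state=0)
model.fit(X, y)

learned_predictions = model.predict([[2.5], [7.5]]).tolist()
rule_predictions = [rule_based_flag(2.5), rule_based_flag(7.5)]
print(learned_predictions, rule_predictions)
```

Here both approaches agree on the two new readings, but only the first required someone to decide the threshold in advance.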

Project Introduction:
Our project employs Python and Machine Learning algorithms to predict the risks of chronic
diseases. Leveraging datasets encompassing patient information, we aim to build predictive
models using popular Python libraries like scikit-learn. Our focus is on simplicity, efficiency,
and accuracy, contributing to proactive healthcare strategies and personalized interventions.

Dept. of ISE, DSATM 2023-24 Page No. 1



CHAPTER 2

TASK PERFORMED AND OBJECTIVES

Day 1 Introduction to python

Day 2 Variables and data types, operations in python

Day 3 Conditional statements and loops in python

Day 4 Functions in python

Day 5 Classes and objects in python

Day 6 Modules in python

Day 7 Packages in python

Day 8 Introduction to machine learning

Day 9 Project setup and Environment Configuration

Day 10 Overview of Python Libraries

Day 11 Understanding Jupyter notebooks

Day 12 Basic operation in NumPy with Pandas

Day 13 Handling Missing data in Python

Day 14 Dealing with Outliers in Data

Day 15 Exploratory Data Analysis

Day 16 EDA Advanced Techniques

Day 17 Introduction to Classification Algorithms

Day 18 Implementing Logistic Regression

Day 19 Decision Trees and Random Forests


Day 20 SVM and Model Evaluation Metrics for Classification

Day 21 Introduction to Regression, Linear Regression, Polynomial Regression

Day 22 Decision Tree and Regression, Model Evaluation Metrics for Regression

Day 23 Introduction to Unsupervised Learning

Day 24 K means Clustering, Hierarchical Clustering, Principal Component Analysis

Day 25 Project

Day 1: INTRODUCTION TO PYTHON


In the first session of the internship program we were introduced to Python: what it is,
why it is needed, how it is useful, and where it is applied. The schedule and objectives of the
internship program were also discussed.

Day 2: VARIABLES AND DATA TYPES IN PYTHON


On the second day we learnt about the variables and data types available in Python. Understanding
data types and how to work with variables is fundamental to programming in Python, because it
allows us to manipulate and process different kinds of data efficiently in our programs.
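
As a small illustration of the built-in data types, the snippet below assigns one value of each common type to a variable. All names and values are hypothetical and chosen only for the example.

```python
# Each variable takes its type from the value assigned to it.
patient_name = "Test Patient"                  # str
age = 54                                       # int
bmi = 27.4                                     # float
is_smoker = False                              # bool
conditions = ["hypertension"]                  # list
record = {"name": patient_name, "age": age}    # dict

values = (patient_name, age, bmi, is_smoker, conditions, record)
type_names = [type(v).__name__ for v in values]
print(type_names)
```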

Day 3: OPERATIONS IN PYTHON


On the third day we were taught about operations. Python supports a wide range of operators,
which can be grouped into categories such as arithmetic, comparison, logical, and assignment
operators. We worked through the arithmetic operations with examples and saw that Python also
supports many other operations for different data types and objects. Understanding these
operations is fundamental to writing effective and expressive Python code.
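
A short sketch of the arithmetic operators, plus one chained comparison, on two arbitrary integers picked for illustration:

```python
a, b = 17, 5

results = {
    "sum": a + b,             # 22
    "difference": a - b,      # 12
    "product": a * b,         # 85
    "true_division": a / b,   # 3.4 (always a float)
    "floor_division": a // b, # 3
    "remainder": a % b,       # 2
    "power": a ** 2,          # 289
}

# Comparison operators can be chained; logical operators combine conditions.
is_between = 0 < b < a              # True
both_odd = a % 2 == 1 and b % 2 == 1  # True
print(results, is_between, both_odd)
```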

Day 4: CONDITIONAL STATEMENTS AND LOOPS IN PYTHON


On the fourth day, we learnt that understanding conditional statements and loops is crucial for
writing flexible and powerful programs in Python. They enable us to control the flow of our code
based on conditions and to repeat certain operations as needed.
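
The sketch below combines an if/elif/else chain with a for loop and a while loop. The function name `risk_band` and its cut-off values are hypothetical and serve only as an example.

```python
def risk_band(score):
    # Conditional statements pick exactly one branch per call.
    if score < 0.3:
        return "low"
    elif score < 0.7:
        return "medium"
    else:
        return "high"

# A for loop repeats the same operation for every item in a sequence.
scores = [0.12, 0.45, 0.91]
bands = []
for s in scores:
    bands.append(risk_band(s))

# A while loop repeats until its condition becomes false.
n, halvings = 10, 0
while n > 1:
    n //= 2
    halvings += 1

print(bands, halvings)
```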

Day 5: Classes and objects in python


On the fifth day, we learnt that classes and objects are crucial for building modular, scalable,
and maintainable code in Python. They are fundamental to the principles of object-oriented
programming and play a key role in designing complex software systems.
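
A minimal class, sketched with a hypothetical `Patient` type that bundles data (attributes) with behaviour (a method):

```python
class Patient:
    """Hypothetical example class: stores measurements and derives BMI."""

    def __init__(self, name, weight_kg, height_m):
        self.name = name
        self.weight_kg = weight_kg
        self.height_m = height_m

    def bmi(self):
        # Body mass index: weight divided by height squared.
        return self.weight_kg / self.height_m ** 2


# An object is one instance of the class with its own attribute values.
p = Patient("Test Patient", 81.0, 1.80)
bmi_rounded = round(p.bmi(), 1)
print(p.name, bmi_rounded)
```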

Day 6: Modules in python


On the sixth day, we learnt that understanding modules is essential for structuring and
organizing code effectively. Modules facilitate code reuse, maintainability, and collaboration in
larger projects.

Day 7: Packages in python


On the seventh day, we learnt about packages. Packages provide a way to structure Python
projects, making them more modular, maintainable, and easier to navigate. They are commonly used
in larger projects to organize code and avoid naming conflicts between modules.

Day 8: Introduction to machine learning


On the eighth day, we were introduced to machine learning. The field continues to evolve, with
advancements in deep learning, reinforcement learning, and other subfields pushing the boundaries
of what is possible. It plays a crucial role in the development of AI systems and has become an
integral part of various industries and everyday technologies.

Day 9: Project setup and Environment Configuration


On the ninth day, we learnt that setting up a machine learning project involves creating a clean
and organized structure, managing dependencies, and choosing the right tools for development. A
well-structured project enhances collaboration, reproducibility, and overall project management.
The structure and tools should be adjusted to the specific requirements and scale of the machine
learning project.

Day 10: Overview of Python Libraries


On the tenth day, we surveyed the main Python libraries. The libraries covered represent just a
fraction of the extensive Python ecosystem. Depending on the specific needs and domain, we may
explore additional libraries that cater to specialized tasks or emerging technologies. The Python
community is active, and new libraries are continually being developed to address evolving
requirements in various fields.

Day 11: Understanding Jupyter notebooks


Jupyter Notebooks have become a popular tool for data science, machine learning, education,
and general interactive computing due to their versatility and ease of use. They support an
open and interactive workflow, making them well-suited for various tasks in the Python
programming ecosystem.

Day 12: Basic operation in NumPy with Pandas


NumPy and Pandas offer a wide range of functionalities for more complex data
manipulation, analysis, and statistical operations. NumPy is essential for numerical
operations and working with arrays, while Pandas is particularly powerful for handling
structured data in tabular form. They are often used together in data science and machine
learning workflows.
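
A brief sketch of the two libraries working together: a NumPy array for vectorized arithmetic and a Pandas DataFrame for tabular filtering and grouping. The ages and smoker labels are invented for the example.

```python
import numpy as np
import pandas as pd

# NumPy: operations apply to the whole array at once, without a loop.
ages = np.array([34, 51, 47, 62])
ages_in_ten_years = ages + 10
mean_age = ages.mean()

# Pandas: the same data as a labelled table.
df = pd.DataFrame({"age": ages, "smoker": ["no", "yes", "no", "yes"]})
smokers = df[df["smoker"] == "yes"]                  # row filtering
mean_by_smoker = df.groupby("smoker")["age"].mean()  # grouped aggregate

print(ages_in_ten_years, mean_age)
print(mean_by_smoker)
```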

Day 13: Handling Missing data in Python


The choice of imputation strategy depends on the nature of the data and the analysis goals.
It is essential to understand the implications of each method and consider its impact on the
integrity of the data. The reasons for the missing data should be assessed first, and an
appropriate method chosen accordingly. Additionally, imputation introduces uncertainty, so it is
essential to document and report any imputation choices made during data preprocessing.
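
Two common strategies, dropping incomplete rows and filling gaps with a summary statistic, can be sketched with Pandas as below. The table is a made-up three-row example; mean and mode imputation are shown only as illustrations, not as the method used in the project.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age": [40.0, np.nan, 60.0],
    "city": ["Pune", None, "Pune"],
})

# Step 1: count the missing values per column.
missing_counts = df.isna().sum()

# Strategy A: drop every row that contains a missing value.
df_dropped = df.dropna()

# Strategy B: impute. Numeric column with its mean, categorical with its mode.
df_filled = df.copy()
df_filled["age"] = df_filled["age"].fillna(df_filled["age"].mean())
df_filled["city"] = df_filled["city"].fillna(df_filled["city"].mode()[0])

print(missing_counts)
print(df_filled)
```

Dropping rows loses one third of this toy table, while imputation keeps every row at the cost of inventing values, which is the trade-off described above.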


Day 14: Dealing with Outliers in Data


Outliers are data points that deviate significantly from the rest of the data, potentially affecting
the results of statistical analysis and machine learning models. Dealing with outliers is an
important step in data preprocessing.
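
One widely used way to flag outliers is the interquartile range (IQR) rule, sketched below on invented values. The report does not state which method was used during the internship, so the IQR rule and the 1.5 multiplier are assumptions made for this example.

```python
import numpy as np

values = np.array([10, 12, 11, 13, 12, 11, 95])

# Quartiles and interquartile range.
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1

# Points further than 1.5 * IQR outside the quartiles are flagged.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
outliers = values[(values < lower) | (values > upper)]
cleaned = values[(values >= lower) & (values <= upper)]

print(lower, upper, outliers)
```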

Day 15: Exploratory Data Analysis


Exploratory Data Analysis (EDA) is a crucial step in the data analysis process that involves
examining and visualizing the dataset to understand its structure, patterns, and relationships.
EDA helps in uncovering insights, identifying patterns, and formulating hypotheses for further
analysis.
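
A few typical first steps of EDA with Pandas are sketched below: summary statistics, category counts, and a correlation. The column names and values are hypothetical and are not taken from the project dataset.

```python
import pandas as pd

df = pd.DataFrame({
    "age": [30, 45, 50, 62, 70],
    "glucose": [85, 100, 110, 130, 150],
    "disease": ["none", "none", "diabetes", "diabetes", "diabetes"],
})

# Summary statistics for the numeric columns (count, mean, std, quartiles).
summary = df.describe()

# How often each category occurs.
disease_counts = df["disease"].value_counts()

# Pearson correlation between two numeric columns.
age_glucose_corr = df["age"].corr(df["glucose"])

print(summary)
print(disease_counts)
print(round(age_glucose_corr, 3))
```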

Day 16: EDA Advanced Techniques


Advanced exploratory data analysis (EDA) techniques involve more sophisticated methods and
visualizations to gain deeper insights into the dataset.

Day 17: Introduction to Classification Algorithms


Classification is a type of supervised machine learning where the goal is to predict the
categorical class labels of new instances based on past observations. Classification algorithms
learn from labeled training data and then make predictions on unseen or future data.

Day 18: Implementing Logistic Regression


Implementing logistic regression involves training a model that can predict binary outcomes (0
or 1) based on input features. Logistic regression uses a logistic function to model the
probability of the positive class.
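
The logistic function maps a linear score z to a probability, sigma(z) = 1 / (1 + exp(-z)). A minimal sketch with scikit-learn follows; it uses synthetic data from `make_classification` as a stand-in, since the project dataset is not included in the report.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (stand-in for real patient data).
X, y = make_classification(n_samples=300, n_features=5, n_informative=3,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Probability of the positive class (label 1) for each test sample.
proba = model.predict_proba(X_test)[:, 1]
# Hard 0/1 predictions; these correspond to a 0.5 probability cut-off.
pred = model.predict(X_test)
accuracy = model.score(X_test, y_test)

print(proba[:3], pred[:3], accuracy)
```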

Day 19: Decision Trees and Random Forests


Decision Trees and Random Forests are popular machine learning models, particularly in
classification and regression tasks. A Decision Tree predicts by applying a sequence of splits on
feature values. A Random Forest is an ensemble learning method that constructs a multitude of
decision trees and combines their predictions.
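
A minimal comparison of a single tree and a forest, sketched on the Iris dataset bundled with scikit-learn. The dataset and the hyperparameters (`max_depth=3`, `n_estimators=100`) are assumptions for illustration, not the project's settings.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

# A single, shallow decision tree.
tree = DecisionTreeClassifier(max_depth=3, random_state=42)
tree.fit(X_train, y_train)

# An ensemble of 100 trees, each trained on a bootstrap sample.
forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X_train, y_train)

tree_accuracy = tree.score(X_test, y_test)
forest_accuracy = forest.score(X_test, y_test)
n_trees = len(forest.estimators_)

print(tree_accuracy, forest_accuracy, n_trees)
```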


Day 20: SVM and Model Evaluation Metrics for Classification


Support Vector Machines (SVM) is a versatile and powerful machine learning algorithm
used for both classification and regression tasks. Model evaluation metrics for classification
tasks are essential for assessing the performance of a machine learning model.
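
The sketch below trains an SVM classifier and computes the usual classification metrics. It assumes the breast cancer dataset bundled with scikit-learn as a stand-in for the project data, and an RBF kernel with default-style settings.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# SVMs are sensitive to feature scale, so standardize before fitting.
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
svm.fit(X_train, y_train)
y_pred = svm.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)    # share of correct predictions
precision = precision_score(y_test, y_pred)  # TP / (TP + FP)
recall = recall_score(y_test, y_pred)        # TP / (TP + FN)
f1 = f1_score(y_test, y_pred)                # harmonic mean of the two
cm = confusion_matrix(y_test, y_pred)        # rows: true, columns: predicted

print(accuracy, precision, recall, f1)
print(cm)
```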

Day 21: Introduction to Regression, Linear Regression, Polynomial Regression


Regression is a type of supervised machine learning task where the goal is to predict a
continuous output variable based on one or more input features. Linear Regression is a
simple and widely used statistical technique in machine learning for modeling the
relationship between a dependent variable and one or more independent variables.
Polynomial Regression extends it by adding polynomial terms of the input features, so that
curved relationships can also be modeled.
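
A small sketch of both techniques on noise-free toy data: a straight line y = 2x + 1 and a parabola y = x^2 + 1. The data and the degree-2 setting are chosen for the example only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

x = np.arange(-3, 4, dtype=float).reshape(-1, 1)  # -3, -2, ..., 3
y_linear = 2 * x.ravel() + 1
y_quadratic = x.ravel() ** 2 + 1

# Linear regression recovers the slope and intercept of the line.
lin = LinearRegression().fit(x, y_linear)
slope, intercept = lin.coef_[0], lin.intercept_

# A straight line cannot follow the parabola ...
r2_line_on_curve = LinearRegression().fit(x, y_quadratic).score(x, y_quadratic)

# ... but linear regression on polynomial features (1, x, x^2) can.
poly = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
poly.fit(x, y_quadratic)
r2_poly_on_curve = poly.score(x, y_quadratic)

print(slope, intercept, r2_line_on_curve, r2_poly_on_curve)
```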

Day 22: Decision Tree and Regression, Model Evaluation Metrics for
Regression
Decision Trees can be used for both classification and regression tasks. In evaluating
regression models, there are several metrics to assess their performance in predicting
continuous values.
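
The sketch below fits a one-split decision tree regressor to toy data (y = 2x for x = 1..8) and scores it with three common regression metrics. The data and `max_depth=1` are assumptions chosen so the numbers are easy to check by hand.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.tree import DecisionTreeRegressor

X = np.arange(1, 9, dtype=float).reshape(-1, 1)  # 1, 2, ..., 8
y = 2 * X.ravel()                                # 2, 4, ..., 16

# With a single split, the tree predicts the mean of each half.
tree = DecisionTreeRegressor(max_depth=1, random_state=0).fit(X, y)
y_pred = tree.predict(X)

mae = mean_absolute_error(y, y_pred)  # average absolute error
mse = mean_squared_error(y, y_pred)   # average squared error
rmse = np.sqrt(mse)                   # same units as the target
r2 = r2_score(y, y_pred)              # share of variance explained

print(y_pred, mae, mse, rmse, r2)
```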

Day 23: Introduction to Unsupervised Learning


Unsupervised learning is a category of machine learning where the algorithm is trained on
data without labeled responses. It plays a crucial role in various domains, contributing to the
understanding of data structures and facilitating tasks such as clustering, dimensionality
reduction, and anomaly detection.


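Day 24: K means Clustering, Hierarchical Clustering and Principal Component Analysis


K-Means clustering is a popular unsupervised machine learning algorithm used for
partitioning a dataset into K distinct, non-overlapping subgroups or clusters. Hierarchical
Clustering is an unsupervised machine learning algorithm that builds a hierarchy of clusters,
either by successively merging smaller clusters (agglomerative) or by recursively splitting the
dataset into subsets (divisive). Principal Component Analysis (PCA) is a dimensionality
reduction technique that projects the data onto the directions of greatest variance.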
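
All three techniques can be sketched on the same synthetic data: two well-separated Gaussian blobs in three dimensions. The blob locations, the choice of two clusters, and the two retained components are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.decomposition import PCA

# Two synthetic groups of 50 points each, centred at 0 and at 5.
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=0.0, scale=0.5, size=(50, 3))
blob_b = rng.normal(loc=5.0, scale=0.5, size=(50, 3))
X = np.vstack([blob_a, blob_b])

# K-Means: partition into K=2 clusters around learned centroids.
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Hierarchical (agglomerative) clustering, cut at 2 clusters.
hier_labels = AgglomerativeClustering(n_clusters=2).fit_predict(X)

# PCA: project the 3-D points onto the 2 directions of greatest variance.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

print(kmeans_labels[:3], hier_labels[:3], pca.explained_variance_ratio_)
```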

Day 25: Project


We formed teams of 4-5 members to build a project based on ML. Our team decided to work
on predicting the risk of chronic diseases using ML algorithms, and we submitted the
project by the due date.


OBJECTIVES

The main objectives of the internship are:

1. Hands-on Machine Learning Experience: Gain practical, hands-on experience in
implementing machine learning algorithms and models using Python libraries such as
scikit-learn, TensorFlow, or PyTorch.
2. Python Programming Proficiency: Enhance proficiency in Python programming, with a
focus on its applications in machine learning and data analysis.
3. Data Preprocessing and Exploration: Learn techniques for preprocessing and exploring
datasets to prepare them for machine learning tasks. This includes handling missing data,
scaling features, and visualizing data.
4. Model Development: Understand the process of model development, including selecting
appropriate algorithms, tuning hyperparameters, and evaluating model performance.
5. Feature Engineering: Gain skills in feature engineering, which involves transforming and
creating new features from existing data to improve model performance.
6. Model Evaluation and Metrics: Learn how to evaluate machine learning models using
various metrics such as accuracy, precision, recall, F1-score, and ROC curves.
7. Supervised and Unsupervised Learning: Understand the concepts and applications of
both supervised learning (classification, regression) and unsupervised learning (clustering,
dimensionality reduction).
8. Deep Learning Concepts: Gain exposure to deep learning concepts and frameworks, such
as neural networks, convolutional neural networks (CNNs), and recurrent neural networks
(RNNs).
9. Natural Language Processing (NLP) and Computer Vision: Explore applications of
machine learning in natural language processing and computer vision, depending on the
specific focus of the internship.
10. Project Work: Work on real-world projects that involve solving business problems or
addressing specific challenges using machine learning techniques.



 

CHAPTER 3

PROBLEM STATEMENT

Artificial Intelligence personal assistants have become plentiful over the last few years.
Applications such as Siri, Bixby, Ok Google and Cortana make mobile device users’ daily
routines that much easier. You may be asking yourself how these assistants function. They
receive external data (such as movement, voice, light, GPS readings, visually
defined markers, etc.) via the hardware’s sensors, process it, and act accordingly.

Not too long ago, building an AI assistant was within the reach of only a small share of
developers; however, nowadays, it is quite a realistic objective even for novice programmers. To
create a simple personal AI assistant, one simply needs dedicated software and around an hour of
working time. It would take much more time, though, to create something more advanced and
conceptually innovative. Nonetheless, well thought-out concepts can result in a great base for
a profitable startup. Let us consider the six most renowned applications based on artificial
intelligence concepts that can help create your virtual AI assistant app.

OBJECTIVES:
1. Early Detection: Identify individuals at risk of developing chronic diseases before the
onset of symptoms. Enable early intervention and preventive measures to improve patient
outcomes.

2. Risk Stratification: Classify individuals into different risk categories based on their
likelihood of developing specific chronic diseases. Tailor interventions based on the identified
risk levels.

3. Feature Selection: Determine the most relevant features and risk factors contributing to
the prediction of chronic diseases. Enhance model interpretability and guide targeted
healthcare interventions.


CHAPTER 4

METHODOLOGY
• Data Collection and Preprocessing: Gather relevant datasets, including electronic health
records, genetic information, lifestyle data, and environmental factors. Clean and preprocess the
data to handle missing values, outliers, and standardize formats.

• Feature Engineering: Identify and select features that are most relevant to the prediction of
chronic diseases. Use domain knowledge and statistical methods to create new informative features.

• Model Selection: Explore various machine learning algorithms such as logistic regression,
decision trees, random forests, support vector machines, or deep learning models. Evaluate and
compare the performance of different models using appropriate metrics.

• Cross-Validation: Implement cross-validation techniques to assess the generalizability of the
models. Mitigate overfitting and ensure the robustness of the selected algorithms.

• Hyperparameter Tuning: Optimize the hyperparameters of selected models to improve
predictive accuracy. Fine-tune the models for optimal performance on the specific dataset.

• Model Interpretability: Utilize techniques like SHAP (SHapley Additive exPlanations)
values or LIME (Local Interpretable Model-agnostic Explanations) to interpret and explain model
predictions.

• Validation and Testing: Validate the predictive models on independent datasets to assess
their real-world performance.
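
The preprocessing, cross-validation, hyperparameter tuning, and held-out testing steps of the methodology can be sketched as one scikit-learn pipeline. This is an illustration under stated assumptions, not the project's actual code: the bundled breast cancer dataset stands in for the patient data, logistic regression stands in for the selected model, and the grid of `C` values is arbitrary.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (GridSearchCV, cross_val_score,
                                     train_test_split)
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Hold out an independent test set for the final validation step.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

# Preprocessing and model in one pipeline, so each CV fold is
# imputed and scaled using only its own training portion.
pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),
])

# Cross-validation: 5-fold accuracy estimate on the training data.
cv_scores = cross_val_score(pipeline, X_train, y_train, cv=5)

# Hyperparameter tuning: search over the regularization strength C.
param_grid = {"model__C": [0.01, 0.1, 1.0, 10.0]}
search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X_train, y_train)

# Validation and testing: score the tuned model on unseen data.
test_accuracy = search.score(X_test, y_test)

print(cv_scores.mean(), search.best_params_, test_accuracy)
```

Keeping the imputer and scaler inside the pipeline is a design choice that avoids leaking information from validation folds into preprocessing.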

• Testing: Conduct unit testing for backend API endpoints and frontend
components. Perform integration testing to ensure proper communication between
the frontend and backend. Address and fix any bugs or issues identified during
testing.

• Deployment: Choose a hosting solution for both the backend (Node.js server) and
frontend (React.js app). Deploy the Movies Tracker application to a cloud platform
or server. Set up any necessary configurations for a production environment.

• Continuous Improvement: Gather feedback from users and stakeholders for
future enhancements. Consider adding new features or improving existing ones
based on user feedback.











CHAPTER 5
SNAPSHOTS

Fig 1: Decision Tree

Fig 2: Count of disease occurrences


CHAPTER 6

CONCLUSION & FUTURE WORK

The package was designed in such a way that future modifications can be made easily.
Automation of the entire system improves efficiency. It provides a friendly graphical user
interface, which proves to be better when compared to the existing system. It gives appropriate
access to authorized users depending on their permissions. It effectively overcomes the
delay in communications. Updating information becomes much easier. System security, data
security, and reliability are its striking features. The system has adequate scope for
modification in the future if necessary.

Future work for this project includes:

1. Integration of Multi-Modal Data:
• Incorporate diverse data sources such as genomics, proteomics, metabolomics, and
wearable device data to create a more comprehensive patient profile.
• Combine structured and unstructured data, including electronic health records (EHRs),
social determinants of health, and patient-generated data.
2. Deep Learning Architectures:
• Explore and develop more sophisticated deep learning architectures, such as attention
mechanisms, graph neural networks, and transformers, to better capture complex
relationships within healthcare data.
• Investigate the use of transfer learning techniques to leverage pre-trained models on large
healthcare datasets.
3. Real-Time and Continuous Monitoring:
• Develop models that can provide real-time risk predictions by continuously monitoring
patient data through wearable devices and other IoT (Internet of Things) technologies.
• Implement anomaly detection algorithms to identify sudden changes in health parameters.


CHAPTER 7

REFERENCES

• J. Bollen and H. Mao. Twitter mood as a stock market predictor. IEEE Computer.
• C.-C. Chang and C.-J. Lin. LIBSVM: A library for support vector machines. ACM
Transactions on Intelligent Systems and Technology.
• G. P. Gang Leng and T. M. McGinnity. An on-line algorithm for creating self-organizing
fuzzy neural networks. Neural Networks.
• A. Lapedes and R. Farber. Nonlinear signal processing using neural network:
Prediction and system modeling. Los Alamos National Lab Technical Report.
• A. E. Stefano Baccianella and F. Sebastiani. SentiWordNet 3.0: An enhanced lexical
resource for sentiment analysis and opinion mining. In LREC.
