Machine Learning using Python
PRESENTED BY:
N.BHARATH REDDY
(4511-21-733-011)
CONTENTS
• INTRODUCTION
• APPLICATIONS
• MACHINE LEARNING
• STATISTICS
• TASK: FLIGHT TICKET PRICE PREDICTION
• PROBLEM STATEMENT
• SYSTEM ARCHITECTURE
• MODULES
• CONCLUSION
INTRODUCTION
• Python is a high-level programming language.
• It was created by Guido van Rossum, first released in 1991, and is
further developed under the Python Software Foundation.
• Python is a dynamic, interpreted language.
• Python is platform independent.
APPLICATIONS
• Web Development
• Game Development
• Audio and Video applications
• Machine Learning and AI
• Data analysis and visualization
• Image processing applications
LIBRARIES IN PYTHON
• A library is a collection of pre-written code and functions
that can be used to perform specific tasks or operations.
• The Python Standard Library ships with Python. It contains built-in
modules that provide access to basic system functionality such as I/O,
along with other core modules.
• Pandas: It supports data analysis, data manipulation, and data
cleaning. Its main structure, the DataFrame, holds tabular data such
as the contents of CSV files.
• NumPy: The name NumPy stands for Numerical Python. It
provides built-in mathematical functions and array operations
for efficient computation.
• Matplotlib: This library is used for plotting numerical
data and is widely used in data analysis.
• Scikit-learn: It is a popular open-source Python library for
machine learning. It provides tools for tasks such as
regression, classification, and clustering.
MACHINE LEARNING
• Machine learning is a subset of artificial intelligence (AI) that
focuses on the development of algorithms and statistical models
that enable computers to learn and improve their performance on a
specific task without being explicitly programmed.
• Machine learning algorithms use data to identify patterns, make
predictions, and adapt their behavior over time.
• There are four main types of machine learning:
Supervised ML: This type of ML involves supervision, where
machines are trained on labeled datasets and enabled to predict
outputs based on the provided training.
Unsupervised ML: The machine is trained on an unlabeled
dataset and finds patterns or groupings in the data without any
supervision.
Semi-supervised learning: Semi-supervised learning combines
characteristics of both supervised and unsupervised machine
learning. It uses a combination of labeled and unlabeled
data to train its algorithms.
Reinforcement learning: Reinforcement learning is a feedback-based
process. Unlike supervised learning, it does not use labeled data.
An agent learns from experience, guided by rewards and penalties
for its actions.
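As a minimal sketch of the first two types, assuming scikit-learn and NumPy are installed, the example below fits a supervised model on inputs with known labels and an unsupervised model on inputs alone. The toy numbers are made up for illustration and are not taken from the project.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

# Supervised: every input row in X has a known label in y (toy data, y = 2x).
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.0, 4.0, 6.0, 8.0])
regressor = LinearRegression().fit(X, y)
predicted = regressor.predict(np.array([[5.0]]))[0]

# Unsupervised: only inputs are given; KMeans groups them into two clusters.
points = np.array([[1.0], [1.2], [0.8], [9.0], [9.3], [8.9]])
clusterer = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
cluster_ids = clusterer.labels_

print(f"supervised prediction for x=5: {predicted:.2f}")
print(f"unsupervised cluster ids: {cluster_ids.tolist()}")
```

The regressor can be checked against the known labels, while the cluster ids are arbitrary group numbers that carry no predefined meaning.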
STATISTICS
• Statistics is a core component of machine learning. It helps you
analyze and visualize data to find unseen patterns.
• It deals with collecting, analyzing, interpreting, and visualizing data.
Descriptive statistics and inferential statistics are the two major areas
of statistics.
• Population: In statistics, the population comprises all observations of interest.
• Sample: A sample is a subset of the population.
• Measures of Central Tendency: These measures describe the center of
a data distribution using a single value. The most common are the
mean, median, and mode.
• Variance and Standard Deviation: Variance measures how far the
data values spread from the mean, as the average of the squared
deviations.
Standard deviation is the square root of the variance.
• Skewness and kurtosis: Skewness measures the asymmetry of a
distribution. Kurtosis indicates how heavy the tails of a
distribution are, that is, how prone it is to extreme values.
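The measures above can be computed with Python's built-in statistics module, as in the sketch below. The price list is made up for illustration. Skewness and excess kurtosis are computed here from population central moments, which is one common convention among several.

```python
import statistics

# Illustrative sample of ticket prices (made-up numbers, not project data).
prices = [3000, 3200, 3200, 3500, 4000, 4100, 9000]

# Measures of central tendency.
mean = statistics.mean(prices)
median = statistics.median(prices)
mode = statistics.mode(prices)

# Sample variance and standard deviation (both divide by n - 1).
variance = statistics.variance(prices)
std_dev = statistics.stdev(prices)

# Population skewness and excess kurtosis from central moments.
n = len(prices)
m2 = sum((x - mean) ** 2 for x in prices) / n
m3 = sum((x - mean) ** 3 for x in prices) / n
m4 = sum((x - mean) ** 4 for x in prices) / n
skewness = m3 / m2 ** 1.5
excess_kurtosis = m4 / m2 ** 2 - 3

print(f"mean={mean:.2f} median={median} mode={mode}")
print(f"variance={variance:.2f} std_dev={std_dev:.2f}")
print(f"skewness={skewness:.2f} excess_kurtosis={excess_kurtosis:.2f}")
```

The single high price pulls the mean above the median and gives a positive skewness, which is the kind of pattern these measures are meant to reveal.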
FLIGHT TICKET PRICE PREDICTION
PROBLEM STATEMENT:
The aim of this project is to develop a machine learning model
that can accurately predict the ticket prices of flights. The
prediction model will take into account various factors such as
the origin and destination cities, date and time of travel,
airline, and other relevant features to provide an estimate of
the flight price.
SYSTEM ARCHITECTURE
MODULES
• Data Collection
• Data Preprocessing
• Data Visualization
• Model Building
• Model Evaluation
Data collection
Data set
Reading the dataset from the website (www.kaggle.com)
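A minimal sketch of loading such a dataset with pandas is shown below. The slides do not show the actual file, so the column names and rows are hypothetical, and an in-memory string stands in for the downloaded CSV.

```python
import io

import pandas as pd

# Stand-in for the CSV downloaded from Kaggle; the column names and values
# here are hypothetical, since the slides do not show the real file.
csv_text = """airline,source_city,destination_city,class,price
AirA,Delhi,Mumbai,Economy,5953
AirB,Delhi,Mumbai,Business,25612
AirA,Mumbai,Chennai,Economy,4780
"""

# With a real download, pass the file path instead of the StringIO buffer.
flights = pd.read_csv(io.StringIO(csv_text))

print(flights.shape)
print(flights.dtypes)
```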
Data preprocessing
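The slides do not list the preprocessing steps that were used. As an illustration only, the sketch below shows three steps that are common for tabular data: removing duplicate rows, dropping rows with a missing target, and one-hot encoding text columns. The DataFrame and its column names are hypothetical.

```python
import pandas as pd

# Hypothetical raw rows with a duplicate, a missing price, and text columns.
raw = pd.DataFrame(
    {
        "airline": ["AirA", "AirB", "AirA", "AirA"],
        "class": ["Economy", "Business", "Economy", "Economy"],
        "duration": [2.0, 2.5, 2.0, 3.0],
        "price": [5000.0, 25000.0, 5000.0, None],
    }
)

# Drop exact duplicate rows, then rows whose target value is missing.
clean = raw.drop_duplicates().dropna(subset=["price"])

# One-hot encode the text columns so a regression model can use them.
encoded = pd.get_dummies(clean, columns=["airline", "class"])

print(encoded.head())
```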
Data visualization
Total count and names of airlines:
Total count and names of source cities:
Total count by departure time:
Total count by arrival time:
Total count by class:
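Counts like these can be produced with pandas and drawn with Matplotlib. The sketch below does this for the airline column only. The rows and the column name are hypothetical, since the slides do not show the real data.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical rows; the real dataset's columns may be named differently.
flights = pd.DataFrame(
    {"airline": ["AirA", "AirB", "AirA", "AirC", "AirA", "AirB"]}
)

# Number of flights per airline, most frequent first.
airline_counts = flights["airline"].value_counts()

# Bar chart of the counts.
fig, ax = plt.subplots()
ax.bar(airline_counts.index.tolist(), airline_counts.values)
ax.set_xlabel("Airline")
ax.set_ylabel("Number of flights")
ax.set_title("Total count of flights per airline")

print(airline_counts)
```

Calling plt.show() afterwards displays the chart. The same pattern applies to the other categorical columns by changing the column name.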
Model Building
Linear Regression
Random Forest Regression
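A minimal sketch of fitting both models with scikit-learn follows. It uses synthetic data in place of the flight dataset, and the features, target formula, and hyperparameters are assumptions made for illustration, not the settings used in the project.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: two numeric features and a price-like target.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 2))
y = 3000 + 500 * X[:, 0] + 200 * X[:, 1] ** 2 + rng.normal(0, 100, size=200)

# Hold out 20% of the rows for later evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit both regressors on the same training rows.
linear_model = LinearRegression().fit(X_train, y_train)
forest_model = RandomForestRegressor(n_estimators=100, random_state=0).fit(
    X_train, y_train
)

# Predict prices for the held-out rows.
linear_predictions = linear_model.predict(X_test)
forest_predictions = forest_model.predict(X_test)
print(linear_predictions[:3], forest_predictions[:3])
```

Training both models on the same split keeps any later comparison between them fair.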
Model Evaluation
• Linear Regression
• Random Forest Regression
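The slides do not state which metrics were used to compare the two models. As an illustration, the sketch below computes three common regression metrics (MAE, RMSE, and R²) in plain Python. The function name regression_metrics and all of the numbers are hypothetical.

```python
import math


def regression_metrics(actual, predicted):
    """Return (MAE, RMSE, R^2) for two equal-length sequences."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mean_actual = sum(actual) / n
    total_ss = sum((a - mean_actual) ** 2 for a in actual)
    residual_ss = sum(e * e for e in errors)
    r2 = 1 - residual_ss / total_ss
    return mae, rmse, r2


# Made-up prices and predictions, only to exercise the function.
actual_prices = [4000, 5000, 6000, 7000]
model_a = [4100, 4900, 6200, 6800]
model_b = [4500, 4500, 6500, 6500]

metrics_a = regression_metrics(actual_prices, model_a)
metrics_b = regression_metrics(actual_prices, model_b)
print("model A (MAE, RMSE, R^2):", metrics_a)
print("model B (MAE, RMSE, R^2):", metrics_b)
```

Lower MAE and RMSE and a higher R² indicate a closer fit, so under these metrics the model with the better values on held-out data would be preferred.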
CONCLUSION
I conclude that, of the two models built, Random Forest Regression
is the better model for predicting flight ticket prices. Data
visualization was also performed on the dataset.
THANK YOU