Chapter one: introduction
Over the past two decades Machine Learning has become one
of the mainstays of information technology and with that, a
rather central, albeit usually hidden, part of our life. With the
ever increasing amounts of data becoming available there is
good reason to believe that smart data analysis will become
even more pervasive as a necessary ingredient for technological
progress. The purpose of this chapter is to provide the reader
with an overview over the vast range of applications which have
at their heart a machine learning problem and to bring some
degree of order to the zoo of problems. After that, we will
discuss some basic tools from statistics and probability theory,
since they form the language in which many machine learning
problems must be phrased to become amenable to solving.
Finally, we will outline a set of fairly basic yet effective
algorithms to solve an important problem, namely that of
classification.
Machine learning, a subset of artificial intelligence (AI), has
transformed the way we approach problem-solving and
decision-making.
In this comprehensive guide, we will explore the evolution of
machine learning, its key components, pivotal milestones in its
history, and emerging trends. From supervised and unsupervised
learning to creative AI advancements and ethical challenges, this
chapter provides a holistic view of the machine learning landscape.
The Evolution of Machine Learning
Understanding the roots of machine learning is crucial to grasp
its significance today. The evolution can be traced through three
main paradigms:
1. Supervised Learning
Supervised learning, where the algorithm is trained on a labeled
dataset, has been a cornerstone of machine learning. From
linear regression to complex neural networks, this approach
powers various applications, such as image recognition and
natural language processing.
2. Unsupervised Learning
Unsupervised learning takes a different path, relying on
unlabeled data to identify patterns and relationships. Clustering
and dimensionality reduction are common techniques, opening
doors to insights in fields like customer segmentation and
anomaly detection.
3. Reinforcement Learning
Inspired by behavioral psychology, reinforcement learning
introduces a system that learns by interacting with its
environment. This has been pivotal in training algorithms to
make sequential decisions, powering advancements in robotics
and game-playing AI.
Key Components of Machine Learning
To grasp the intricacies of machine learning, understanding its
key components is essential. These include:
• Algorithms: The heart of machine learning, algorithms
process data to make predictions or decisions.
• Data: High-quality, diverse datasets fuel the training of
machine learning models.
• Models: Trained algorithms form models, representing the
acquired knowledge.
• Features: Variables within the data that influence the
model's output.
Milestones in the History of Machine Learning
The journey of machine learning is marked by significant
milestones:
o 1950s-1960s: The birth of machine learning, focusing
on pattern recognition.
o 1980s: The emergence of symbolic reasoning and
expert systems.
o 1990s: The shift towards statistical and probabilistic
methods.
o 2000s: The rise of big data and the resurgence of
neural networks.
o 2010s: Deep learning redefines the possibilities,
leading to breakthroughs in image and speech
recognition.
The Future of Machine Learning
As we peer into the future, several trends shape the trajectory
of machine learning:
o Explainable AI: The demand for transparency in AI
decision-making.
o Edge Computing: Bringing machine learning
capabilities closer to the data source.
o AutoML: Automated machine learning workflows for
non-experts.
o Federated Learning: Collaborative training models
without centralized data.
Advancements in Creative AI
Beyond traditional applications, machine learning has taken
strides in creative fields:
o Generative Adversarial Networks (GANs): Creating
realistic content, from images to music.
o Natural Language Processing (NLP): Generating
coherent and context-aware text.
o Teaching machines to see: The power of computer
vision.
Ethical Challenges and Responsible AI
With great power comes great responsibility. The rise of
machine learning also brings ethical considerations:
o Bias and Fairness: Addressing algorithmic biases in
decision-making.
o Privacy Concerns: Safeguarding sensitive information
in the age of data-driven insights.
o Building trustworthy AI: Creating guidelines for ethical
development and use.
Introduction to Machine learning
II. Choose the letter which contains the correct answer.
1) What is machine learning ?
Machine learning is the science of getting computers to act
without being explicitly programmed.
Machine Learning is a Form of AI that Enables a System to Learn
from Data.
Both A and B
None of the above
2) Machine learning is an application of ___________.
Blockchain
Artificial Intelligence
Both A and B
None of the above
3) Which one is an application of machine learning?
email filtering
sentiment analysis
face recognition
All of the above
4) The term machine learning was coined in which year?
1958
1959
1980
1961
5) Machine learning approaches can be traditionally categorized
into ______ categories.
3
4
8
9
6) The categories in which Machine learning approaches can be
traditionally categorized are ______ .
Supervised learning
Unsupervised learning
Reinforcement learning
All of the above
1.2.1 Comparison between AI and machine learning
Artificial intelligence and machine learning are very closely
related and connected. Because of this relationship, when you
look into AI vs. machine learning, you’re really looking into their
interconnection.
What is artificial intelligence (AI)?
Artificial intelligence is the capability of a computer system to
mimic human cognitive functions such as learning and problem-
solving. Through AI, a computer system uses math and logic to
simulate the reasoning that people use to learn from new
information and make decisions.
Are AI and machine learning the same?
While AI and machine learning are very closely connected,
they’re not the same. Machine learning is considered a subset
of AI.
What is machine learning?
Machine learning is an application of AI. It’s the process of using
mathematical models of data to help a computer learn without
direct instruction. This enables a computer system to continue
learning and improving on its own, based on experience.
One way to train a computer to mimic human reasoning is to
use a neural network, which is a series of algorithms that are
modeled after the human brain. The neural network helps the
computer system achieve AI through deep learning. This close
connection is why the idea of AI vs. machine learning is really
about the ways that AI and machine learning work together.
How AI and machine learning work together
When you’re looking into the difference between artificial
intelligence and machine learning, it’s helpful to see how they
interact through their close connection. This is how AI and
machine learning work together:
Step 1
An AI system is built using machine learning and other
techniques.
Step 2
Machine learning models are created by studying patterns in
the data.
Step 3
Data scientists optimize the machine learning models based on
patterns in the data.
Step 4
The process repeats and is refined until the models’ accuracy is
high enough for the tasks that need to be done.
Capabilities of AI and machine learning
Companies in almost every industry are discovering new
opportunities through the connection between AI and machine
learning. These are just a few capabilities that have become
valuable in helping companies transform their processes and
products:
Predictive analytics
This capability helps companies predict trends and behavioral
patterns by discovering cause-and-effect relationships in data.
Recommendation engines
With recommendation engines, companies use data analysis to
recommend products that someone might be interested in.
Speech recognition and natural language understanding
Speech recognition enables a computer system to identify
words in spoken language, and natural language understanding
recognizes meaning in written or spoken language.
Image and video processing
These capabilities make it possible to recognize faces, objects,
and actions in images and videos, and implement functionalities
such as visual search.
Sentiment analysis
A computer system uses sentiment analysis to identify and
categorize positive, neutral, and negative attitudes that are
expressed in text.
Benefits of AI and machine learning
The connection between artificial intelligence and machine
learning offers powerful benefits for companies in almost every
industry—with new possibilities emerging constantly. These are
just a few of the top benefits that companies have already seen:
More sources of data input
AI and machine learning enable companies to discover valuable
insights in a wider range of structured and unstructured data
sources.
Better, faster decision-making
Companies use machine learning to improve data integrity and
use AI to reduce human error—a combination that leads to
better decisions based on better data.
Increased operational efficiency
With AI and machine learning, companies become more
efficient through process automation, which reduces costs and
frees up time and resources for other priorities.
Applications of AI and machine learning
Companies in several industries are building applications that
take advantage of the connection between artificial intelligence
and machine learning. These are just a few ways that AI and
machine learning are helping companies transform their
processes and products:
Retail
Retailers use AI and machine learning to optimize their
inventories, build recommendation engines, and enhance the
customer experience with visual search.
Healthcare
Health organizations put AI and machine learning to use in
applications such as image processing for improved cancer
detection and predictive analytics for genomics research.
Banking and finance
In financial contexts, AI and machine learning are valuable tools
for purposes such as detecting fraud, predicting risk, and
providing more proactive financial advice.
Sales and marketing
Sales and marketing teams use AI and machine learning for
personalized offers, campaign optimization, sales forecasting,
sentiment analysis, and prediction of customer churn.
Cybersecurity
AI and machine learning are powerful weapons for
cybersecurity, helping organizations protect themselves and
their customers by detecting anomalies.
Customer service
Companies in a wide range of industries use chatbots and
cognitive search to answer questions, gauge customer intent,
and provide virtual assistance.
Transportation
AI and machine learning are valuable in transportation
applications, where they help companies improve the efficiency
of their routes and use predictive analytics for purposes such as
traffic forecasting.
Manufacturing
Manufacturing companies use AI and machine learning for
predictive maintenance and to make their operations more
efficient than ever.
Machine learning vs. artificial intelligence vs. deep learning
The terms AI, machine learning, and deep learning are often
confused. Artificial intelligence is the broader umbrella under
which machine learning and deep learning fall, and deep learning
is itself a subset of machine learning. Keeping this nesting in mind
makes it easier to see how the three relate to and differ from one
another.
How are AI and machine learning connected?
An “intelligent” computer uses AI to think like a human and
perform tasks on its own, while machine learning is how the
computer system develops that intelligence from data.
Data in Machine learning
Data is a crucial component in the field of Machine Learning. It
refers to the set of observations or measurements that can be
used to train a machine-learning model. The quality and
quantity of data available for training and testing play a
significant role in determining the performance of a machine-
learning model. Data can be in various forms such as numerical,
categorical, or time-series data, and can come from various
sources such as databases, spreadsheets, or APIs. Machine
learning algorithms use data to learn patterns and relationships
between input variables and target outputs, which can then be
used for prediction or classification tasks.
Data is typically divided into two types:
1. Labeled data
2. Unlabeled data
Labeled data includes a label or target variable that the model is
trying to predict, whereas unlabeled data does not include a
label or target variable. The data used in machine learning is
typically numerical or categorical. Numerical data includes
values that can be ordered and measured, such as age or
income. Categorical data includes values that represent
categories, such as gender or type of fruit.
Data can be divided into training and testing sets. The training
set is used to train the model, and the testing set is used to
evaluate the performance of the model. It is important to
ensure that the data is split in a random and representative
way.
Data preprocessing is an important step in the machine learning
pipeline. This step can include cleaning and normalizing the
data, handling missing values, and feature selection or
engineering.
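As a minimal sketch of what such a preprocessing step might look like with scikit-learn (the column names and values below are invented purely for illustration):
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# Hypothetical toy dataset with a missing value
df = pd.DataFrame({
    "age": [25, 32, None, 51],
    "income": [40000, 52000, 61000, 75000],
})

# Handle missing values by filling them with the column mean
imputer = SimpleImputer(strategy="mean")
X = imputer.fit_transform(df)

# Normalize features to zero mean and unit variance
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
print(X_scaled)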
DATA: Any unprocessed fact, value, text, sound, or picture that
has not yet been interpreted and analyzed. Data is the most
important part of Data Analytics, Machine Learning, and Artificial
Intelligence; without data we cannot train any model, and all
modern research and automation would be in vain. Large
enterprises spend a great deal of money just to gather as much
relevant data as possible.
Example: Why did Facebook acquire WhatsApp by paying a
huge price of $19 billion?
The answer is simple and logical: to gain access to information
about users that WhatsApp has and Facebook may not. This
information is of paramount importance to Facebook because it
helps the company improve its services.
INFORMATION: Data that has been interpreted and
manipulated and has now some meaningful inference for the
users.
KNOWLEDGE: Combination of inferred information,
experiences, learning, and insights. Results in awareness or
concept building for an individual or organization.
How do we split data in Machine Learning?
• Training Data: The part of data we use to train our model.
This is the data that your model actually sees(both input
and output) and learns from.
• Validation Data: The part of the data used for frequent
evaluation of the model while it is being fit on the training
dataset, and for tuning hyperparameters (parameters set before
the model begins learning). This data plays its part while the
model is actually training.
• Testing Data: Once our model is completely trained, testing
data provides an unbiased evaluation. We feed in the inputs from
the testing data, and the model predicts values without seeing
the actual outputs. We then evaluate the model by comparing its
predictions with the actual outputs present in the testing data.
This is how we measure how much the model has learned from
the experience fed in as training data.
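A minimal sketch of the training/validation/testing split described above, using scikit-learn's train_test_split; the 80/10/10 proportions and the synthetic data are assumptions made only for illustration:
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data: 100 samples, 3 features, 1 continuous target
X = np.random.rand(100, 3)
y = np.random.rand(100)

# Hold out 20% of the data, then split that portion into validation and test halves
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.2, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=42)

print(len(X_train), len(X_val), len(X_test))  # 80 10 10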
Consider an example:
A shopping-mart owner conducted a survey and now has a long
list of questions and answers that he asked his customers; this
list of questions and answers is DATA. Whenever he wants to
infer anything, he cannot simply go through every question from
thousands of customers to find something relevant, as that
would be time-consuming and unhelpful. To reduce this overhead
and wasted time, and to make the work easier, the data is
manipulated through software, calculations, graphs, and so on;
the inference drawn from this manipulated data is Information.
So, data is a must for information.
Knowledge then has its role in differentiating between two
individuals having the same information. Knowledge is actually
not technical content but is linked to the human thought process.
Different Forms of Data
• Numeric Data: If a feature represents a characteristic
measured in numbers, it is called a numeric feature.
• Categorical Data: A categorical feature is an attribute that
can take on one of a limited, and usually fixed, number of
possible values on the basis of some qualitative property. A
categorical feature is also called a nominal feature.
• Ordinal Data: This denotes a nominal variable with categories
falling in an ordered list. Examples include clothing sizes such
as small, medium, and large, or a measurement of customer
satisfaction on a scale from “not at all happy” to “very happy”.
Properties of Data –
1. Volume: The scale of data. With the growing world population
and widespread use of technology, huge amounts of data are
generated every millisecond.
2. Variety: Different forms of data – healthcare, images,
videos, audio clippings.
3. Velocity: Rate of data streaming and generation.
4. Value: Meaningfulness of data in terms of information that
researchers can infer from it.
5. Veracity: Certainty and correctness in data we are working
on.
6. Viability: The ability of data to be used and integrated into
different systems and processes.
7. Security: The measures taken to protect data from
unauthorized access or manipulation.
8. Accessibility: The ease of obtaining and utilizing data for
decision-making purposes.
9. Integrity: The accuracy and completeness of data over its
entire lifecycle.
10. Usability: The ease of use and interpretability of data
for end-users.
Some facts about Data:
• Compared with 2005, about 300 times as much data, roughly
40 zettabytes (1 ZB = 10^21 bytes), was projected to be
generated by 2020.
• By 2011, the healthcare sector already held about 161 billion
gigabytes of data.
• Around 400 million tweets are sent per day by about 200
million active users.
• Each month, users stream more than 4 billion hours of video.
• About 30 billion pieces of content are shared every month by
users.
• It is reported that about 27% of data is inaccurate, so roughly
1 in 3 business leaders do not trust the information on which
they base their decisions.
The facts above are just a glimpse of the huge volume of data
that actually exists. In real-world terms, the amount of data
currently present, and being generated at every moment, is
almost beyond imagination.
Example:
Imagine you’re working for a car manufacturing company and
you want to build a model that can predict the fuel efficiency of
a car based on the weight and the engine size. In this case, the
target variable (or label) is the fuel efficiency, and the features
(or input variables) are the weight and engine size. You will
collect data from different car models, with corresponding
weight and engine size, and their fuel efficiency. This data is
labeled and it’s in the form of (weight,engine size,fuel
efficiency) for each car. After having your data ready, you will
then split it into two sets: training set and testing set, the
training set will be used to train the model and the testing set
will be used to evaluate the performance of the model.
Preprocessing may be needed, for example to fill in missing
values or handle outliers that might affect your model's accuracy.
Implementation:
Example 1 (Python):
# Example input data
from sklearn.linear_model import LogisticRegression

X = [[1, 2], [2, 3], [3, 4], [4, 5], [5, 6]]
y = [0, 0, 1, 1, 1]

# Train a model
model = LogisticRegression()
model.fit(X, y)

# Make a prediction
prediction = model.predict([[6, 7]])[0]
print(prediction)
Output:
1
If you run the code above, the output will be the prediction
made by the model. In this case, the prediction will be either 0
or 1, depending on the specific parameters learned by the model
during training.
For example, if the model learned that input data with a high
second element is more likely to have a label of 1, then the
prediction for [6, 7] would be 1.
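For the fuel-efficiency example described earlier, a regression version of the same workflow could look like the sketch below; the car weights, engine sizes, and efficiency values are made up for illustration only:
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical data: (weight in kg, engine size in litres) -> fuel efficiency (km/l)
X = [[1200, 1.2], [1500, 1.6], [1800, 2.0], [2100, 2.5], [1000, 1.0], [1650, 1.8]]
y = [18.0, 15.5, 13.0, 11.0, 20.0, 14.0]

# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)

# Train a simple linear regression model on the training set
model = LinearRegression()
model.fit(X_train, y_train)

# Predict the fuel efficiency of an unseen car
print(model.predict([[1400, 1.4]]))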
Advantages and Disadvantages:
Advantages of using data in Machine Learning:
1. Improved accuracy: With large amounts of data, machine
learning algorithms can learn more complex relationships
between inputs and outputs, leading to improved accuracy
in predictions and classifications.
2. Automation: Machine learning models can automate
decision-making processes and can perform repetitive
tasks more efficiently and accurately than humans.
3. Personalization: With the use of data, machine learning
algorithms can personalize experiences for individual
users, leading to increased user satisfaction.
4. Cost savings: Automation through machine learning can
result in cost savings for businesses by reducing the need
for manual labor and increasing efficiency.
Disadvantages of using data in Machine Learning:
1. Bias: Data used for training machine learning models can
be biased, leading to biased predictions and classifications.
2. Privacy: Collection and storage of data for machine
learning can raise privacy concerns and can lead to security
risks if the data is not properly secured.
3. Quality of data: The quality of data used for training
machine learning models is critical to the performance of
the model. Poor quality data can lead to inaccurate
predictions and classifications.
4. Lack of interpretability: Some machine learning models can
be complex and difficult to interpret, making it challenging
to understand how they are making decisions.
Use of Machine Learning :
Machine learning is a powerful tool that can be used in a wide
range of applications. Here are some of the most common uses
of machine learning:
• Predictive modeling: Machine learning can be used to
build predictive models that can predict future outcomes
based on historical data. This can be used in many
applications, such as stock market prediction, fraud
detection, weather forecasting, and customer behavior
prediction.
• Image recognition: Machine learning can be used to train
models that can recognize objects, faces, and other
patterns in images. This is used in many applications, such
as self-driving cars, facial recognition systems, and medical
image analysis.
• Natural language processing: Machine learning can be
used to analyze and understand natural language, which is
used in many applications, such as chatbots, voice
assistants, and sentiment analysis.
• Recommendation systems: Machine learning can be used
to build recommendation systems that can suggest
products, services, or content to users based on their past
behavior or preferences.
• Data analysis: Machine learning can be used to analyze
large datasets and identify patterns and insights that
would be difficult or impossible for humans to detect.
• Robotics: Machine learning can be used to train robots to
perform tasks autonomously, such as navigating through a
space or manipulating objects.
Issues of using data in Machine Learning:
• Data quality: One of the biggest issues with using data in
machine learning is ensuring that the data is accurate,
complete, and representative of the problem domain. Low-
quality data can result in inaccurate or biased models.
• Data quantity: In some cases, there may not be enough
data available to train an accurate machine learning
model. This is especially true for complex problems that
require a large amount of data to accurately capture all the
relevant patterns and relationships.
• Bias and fairness: Machine learning models can
sometimes perpetuate bias and discrimination if the
training data is biased or unrepresentative. This can lead to
unfair outcomes for certain groups of people, such as
minorities or women.
• Overfitting and underfitting: Overfitting occurs when a
model is too complex and fits the training data too closely,
resulting in poor generalization to new data. Underfitting
occurs when a model is too simple and does not capture
all the relevant patterns in the data.
• Privacy and security: Machine learning models can
sometimes be used to infer sensitive information about
individuals or organizations, raising concerns about privacy
and security.
• Interpretability: Some machine learning models, such as
deep neural networks, can be difficult to interpret and
understand, making it challenging to explain the reasoning
behind their predictions and decisions.
Python in Machine learning
What is Machine Learning?
Machine Learning is the field of study that gives computers the
capability to learn without being explicitly programmed. ML is
one of the most exciting technologies that one has ever come
across. As it is evident from the name, it gives the computer
something that makes it more similar to humans: The ability to
learn. Machine learning is actively being used today, perhaps in
many more places than one would expect.
What is Python?
Python is a widely used, high-level, interpreted programming
language. It was developed by Guido van Rossum and first
released on February 20, 1991. Python is known for its
readability and clear syntax, and it provides various libraries and
frameworks that simplify machine learning development. Its
versatility and active community make it an ideal language for
machine-learning projects. Python supports object-oriented
programming and is most commonly used for general-purpose
programming. It is used in several domains such as Data Science,
Machine Learning, Deep Learning, Artificial Intelligence,
Networking, Game Development, Web Development, Web
Scraping, and many others.
Python’s Role in Machine Learning
Python has a crucial role in machine learning because Python
provides libraries like NumPy, Pandas, Scikit-learn, TensorFlow,
and Keras. These libraries offer tools and functions essential for
data manipulation, analysis, and building machine learning
models. Python is also well known for its readability and offers
platform independence, all of which make it a natural language
of choice for machine learning.
Setting Up Python for Machine Learning
Follow these steps:
Step 1: Install Python and Required Libraries
Begin by installing Python on your system. You can download
the latest version from the official Python website. Additionally,
you’ll need to install the required libraries for machine learning,
like NumPy, Pandas, Matplotlib, and Scikit-learn.
Step 2: Choose an Integrated Development Environment (IDE)
Select an IDE for writing and executing your Python code. Some
popular options include Jupyter Notebook, PyCharm, and Visual
Studio Code.
Step 3: Load Datasets
For machine learning projects, you’ll often work with datasets.
Python’s Pandas library allows you to load and manipulate data
efficiently.
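For instance, a CSV file could be loaded and inspected as in the sketch below (the file name data.csv is only a placeholder):
import pandas as pd

# Load a dataset from a CSV file (placeholder file name)
df = pd.read_csv("data.csv")

# Inspect the first rows, the column types, and basic statistics
print(df.head())
df.info()
print(df.describe())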
Why Python is Preferred for Machine Learning?
Python is preferred for machine learning for several key
reasons, which collectively contribute to its popularity and
widespread adoption in the field:
• Python is known for its readability and simplicity , making
it easy for beginners to grasp and valuable for experts due
to its clear and intuitive syntax.
• Its simplicity accelerates the development process,
allowing developers to write fewer lines of code compared
to languages like Java or C++.
• Python offers a rich ecosystem of libraries and frameworks
tailored for machine learning and data analysis, such as
Scikit-learn, TensorFlow, PyTorch, Keras, and Pandas.
• These libraries provide pre-built functions and utilities for
mathematical operations, data manipulation, and machine
learning tasks, reducing the need to write code from
scratch.
• Python has a large and active community, providing ample
tutorials, forums, and documentation for support,
troubleshooting, and collaboration.
• The community ensures regular updates and optimization
of libraries, keeping them up-to-date with the latest
features and performance improvements.
• Python’s flexibility makes it suitable for projects of any
scale, from small experiments to large, complex systems,
and across various stages of software development and
machine learning workflows.
Essential Python Libraries for Machine Learning
1. NumPy : This library is fundamental for scientific
computing with Python. It provides support for large,
multi-dimensional arrays and matrices, along with a
collection of high-level mathematical functions to operate
on these arrays.
2. Pandas : Essential for data manipulation and analysis,
Pandas provides data structures and operations for
manipulating numerical tables and time series. It is ideal
for data cleaning, transformation, and analysis.
3. Matplotlib : It is great for creating static, interactive, and
animated visualizations in Python. Matplotlib is highly
customizable and can produce graphs and charts that are
publication quality.
4. Scikit-learn : Perhaps the most well-known Python library
for machine learning, Scikit-learn provides a range of
supervised and unsupervised learning algorithms via a
consistent interface. It includes methods for classification,
regression, clustering, and dimensionality reduction, as
well as tools for model selection and evaluation.
5. SciPy : Built on NumPy, SciPy extends its capabilities by
adding more sophisticated routines for optimization,
regression, interpolation, and eigenvector decomposition,
making it useful for scientific and technical computing.
6. TensorFlow : Developed by Google, TensorFlow is primarily
used for deep learning applications. It allows developers to
create large-scale neural networks with many layers,
primarily focusing on training and inference of deep neural
networks.
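The short sketch below shows several of these libraries working together on a small synthetic problem; it is only meant to illustrate the typical division of labour between NumPy, Pandas, Matplotlib, and Scikit-learn:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# NumPy: generate synthetic data around a straight line
x = np.linspace(0, 10, 50)
y = 2.5 * x + np.random.normal(0, 1, 50)

# Pandas: organize the data in a DataFrame
df = pd.DataFrame({"x": x, "y": y})

# Scikit-learn: fit a linear regression model
model = LinearRegression()
model.fit(df[["x"]], df["y"])

# Matplotlib: visualize the data and the fitted line
plt.scatter(df["x"], df["y"], label="data")
plt.plot(df["x"], model.predict(df[["x"]]), color="red", label="fitted line")
plt.legend()
plt.show()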
Getting Started with Python Programming
Python is a high-level, interpreted, and general-purpose
programming language. It was created by Guido van Rossum
and first released in 1991. Python emphasizes code readability
and simplicity, making it an excellent language for beginners
and experienced developers. Getting Started with Python is
easy due to its straightforward syntax and extensive
documentation, which provides ample resources for learning
and development.
Python Features
1. Simple and Easy to Learn: Python has a simple syntax,
which makes it easy to learn and read. It’s a great language
for beginners who are new to programming.
2. Interpreted: Python is an interpreted language, which
means that the Python code is executed line by line. This
makes it easy to test and debug code.
3. High-Level: Python is a high-level language, which means
that it abstracts away low-level details like memory
management and hardware interaction. This makes it
easier to write and understand code.
4. Dynamic Typing: Python is dynamically typed, which
means that you don’t need to declare the data type of a
variable explicitly. Python will automatically infer the data
type based on the value assigned to the variable.
5. Strong Typing: Python is strongly typed, which means that
the data type of a variable is enforced at runtime. This
helps prevent errors and makes the code more robust.
6. Extensive Standard Library: Python comes with a large
standard library that provides tools and modules for
various tasks, such as file I/O, networking, and more. This
makes it easy to build complex applications without having
to write everything from scratch.
7. Cross-Platform: Python is a cross-platform language, which
means that Python code can run on different operating
systems without modification. This makes it easy to
develop and deploy Python applications on different
platforms.
8. Community and Ecosystem: Python has a large and active
community, which contributes to its ecosystem. There are
many third-party libraries and frameworks available for
various purposes, making Python a versatile language for
many applications.
9. Versatile: Python is a versatile language that can be used
for various purposes, including web development, data
science, artificial intelligence, game development, and
more.
Table of Content
• Install Python
• Setting up a Python Development Environment
• Create and Run your First Python Program
• Python Basic Guide
• Beginner Tips for Learning Python Programming
Here’s a basic guide to get you started with Python:
Install Python
Before starting this Python course first, you need to install
Python on your computer. To install Python on your computer,
follow these steps:
1. Download Python: Go to the official Python website at
https://www.python.org/. On the homepage, you will see
a “Downloads” section. Click on the “Download Python”
button.
2. Choose the Version: You will be directed to a page where
you can choose the version of Python you want to
download. Python 3 is the current, recommended version.
Click on the appropriate installer for your operating system
(Windows, macOS, or Linux).
3. Add Python to PATH (Optional): On Windows, you may be
given the option to add Python to your system’s PATH
environment variable. This makes it easier to run Python
from the command line. If you’re not sure, it’s usually safe
to select this option.
4. Install Python: Click the “Install Now” button to begin the
installation. The installer will copy the necessary files to
your computer.
5. Verify the Installation: After the installation is complete,
you can verify that Python was installed correctly by
opening a command prompt (on Windows) or a terminal
(on macOS or Linux) and typing python --version. This
should display the version of Python you installed.
That’s it! Python should now be installed on your computer, and
you’re ready to start using Python.
Setting up a Python Development Environment
An IDE makes coding easier. Popular choices
include PyCharm, Visual Studio Code, and Jupyter Notebook.
Install one and set it up for Python development. Or you can
also use an online Python IDE.
Create and Run your First Python Program
For the first program, we will try to print a very simple
message “Hello World” in Python, the code for which is given
below:
Once you have Python installed, you can run the program by
following these steps:
1. Open a text editor (e.g., Notepad on Windows, TextEdit on
macOS, or any code editor like VS Code, PyCharm, etc.).
2. Copy the code below and paste it into the text editor.
3. Save the file with a .py extension (e.g., hello_world.py).
4. Open a terminal or command prompt.
5. Navigate to the directory where you saved the file using
the cd command (e.g., cd path/to/your/directory).
6. Run the program by typing python hello_world.py and
pressing Enter.
You should see the output “Hello, World!” printed in the
terminal.
Python
print("Hello, World!")
Output
Hello, World!
Python Basic Guide
Python has a simple and readable syntax, making it an excellent
language for beginners. Here are some basics of Python syntax:
Comments
Comments in Python start with the # symbol and are used to
explain code or make notes. Comments are ignored by the
Python interpreter.
# This is a comment
print("Hello, World!") # This is another comment
Variables
Variables are used to store data. In Python, you don’t need to
declare the data type of a variable explicitly. Python will
automatically infer the data type based on the value assigned to
the variable.
x = 10 # Integer
y = 3.14 # Float
name = "John" # String
Data Types
Python supports various data types, including integers, floats,
strings, lists, tuples, dictionaries, and more.
• Integers: Whole numbers without decimals.
• Floats: Numbers with decimals.
• Strings: Text enclosed in single or double quotes.
• Lists: Ordered collections of items.
• Tuples: Immutable collections of items.
• Dictionaries: Key-value pairs.
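A few quick examples of these data types (the values are arbitrary):
count = 42                                  # Integer
pi = 3.14159                                # Float
language = "Python"                         # String
colors = ["red", "green", "blue"]           # List (ordered, mutable)
point = (3, 4)                              # Tuple (immutable)
ages = {"Alice": 30, "Bob": 25}             # Dictionary (key-value pairs)

print(type(count), type(colors), type(ages))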
Indentation
Python uses indentation to define blocks of code, such as loops
and functions. Use four spaces for indentation. Incorrect
indentation can lead to syntax errors.
if x > 10:
    print("x is greater than 10")
else:
    print("x is less than or equal to 10")
Operators
Python supports various operators, including arithmetic,
comparison, logical, and assignment operators.
• Arithmetic operators: +, -, *, /, %, ** (exponentiation), // (floor division).
• Comparison operators: ==, !=, <, >, <=, >=.
• Logical operators: and, or, not.
• Assignment operators: =, +=, -=, *=, /=, %=, **=, //=.
• Bitwise operators: &, |, ^, ~, <<, >>.
• Strings: Strings can be enclosed in single or double quotes.
You can use the + operator to concatenate strings.
greeting = "Hello"
name = "John"
message = greeting + ", " + name + "!"
print(message) # Output: Hello, John!
Control Flow
Python supports various control flow structures, such as if-else
statements, loops, and more.
• If-else statement
if x > 10:
    print("x is greater than 10")
else:
    print("x is less than or equal to 10")
• For Loops
for var in iterable:
    # statements
• While Loop
while expression:
    statement(s)
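For example, a concrete for loop and while loop might look like this:
# For loop: iterate over a list of fruits
fruits = ["apple", "banana", "cherry"]
for fruit in fruits:
    print(fruit)

# While loop: count down from 3 to 1
count = 3
while count > 0:
    print(count)
    count -= 1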
Functions
Functions are blocks of code that perform a specific task. You
can define your own functions using the def keyword.
def greet(name):
    print(f"Hello, {name}!")

greet("GeeksforGeeks")  # Output: Hello, GeeksforGeeks!
Remember, Python is a dynamically typed language, so you
don’t need to declare the data type of a variable explicitly.
Python also uses whitespace (indentation) to define blocks of
code, which is different from many other programming
languages.
Beginner Tips for Learning Python Programming
Python is a versatile and widely-used programming language
with a vast ecosystem. Here are some areas where Python is
commonly used:
1. Web Development: Python is used to build web
applications using frameworks like Django, Flask, and
Pyramid. These frameworks provide tools and libraries for
handling web requests, managing databases, and more.
2. Machine Learning: Python is popular in data science and
machine learning due to libraries like NumPy, pandas,
Matplotlib, and scikit-learn. These libraries provide tools
for data manipulation, analysis, visualization, and machine
learning algorithms.
3. Natural Language Processing: Python is widely used in AI
and NLP applications. Libraries like TensorFlow, Keras,
PyTorch, and NLTK provide tools for building and training
neural networks, processing natural language, and more.
4. Game Development: Python can be used for game
development using libraries like Pygame and Panda3D.
These libraries provide tools for creating 2D and 3D games,
handling graphics, and more.
5. Desktop Applications: Python can be used to build
desktop applications using libraries like Tkinter, PyQt, and
wxPython. These libraries provide tools for creating
graphical user interfaces (GUIs), handling user input, and
more.
6. Scripting and Automation: Python is commonly used for
scripting and automation tasks due to its simplicity and
readability. It can be used to automate repetitive tasks,
manage files and directories, and more.
7. Web Scraping and Crawling: Python is widely used for web
scraping and crawling using libraries like BeautifulSoup and
Scrapy. These libraries provide tools for extracting data
from websites, parsing HTML and XML, and more.
Overall, Python is a powerful and flexible programming
language that is widely used in various fields. Whether you’re a
beginner or an experienced developer, Python has something to
offer for everyone.
Getting Started with Python Programming – FAQs
Can I learn Python on my own?
Yes, you can learn Python on your own. There are numerous
online resources, tutorials, courses, and books available for self-
study.
Is Python easier than Java?
Generally, Python is considered easier to learn and use than
Java due to its simpler syntax and dynamic typing.
Is Python coding hard?
Python coding is generally not considered hard, especially for
beginners. Its syntax is clear and readable, making it an
excellent choice for new programmers.
Is Python enough to get a job?
Yes, Python is enough to get a job. Python is widely used in web
development, data science, machine learning, automation, and
many other fields. Having strong Python skills can lead to
various job opportunities.
Is Python better than C++?
It depends on the use case. Python is better for rapid
development, ease of learning, and use in fields like data science
and web development. C++ is better for performance-critical
applications, such as game development, systems
programming, and applications requiring high-speed
computations.
Who earns more, Java or Python?
Salaries can vary based on location, experience, and specific job
roles. Generally, both Java and Python developers can earn high
salaries. Python developers often have an edge in fields like data
science and machine learning, which can command higher
salaries.
Chapter two: Definition of Regression
Regression in machine learning
Regression, a statistical approach, dissects the relationship
between dependent and independent variables, enabling
predictions through various regression models.
This chapter delves into regression in machine learning,
explaining regression models, terminologies, types, and
practical applications.
What is Regression?
Regression is a statistical approach used to analyze the
relationship between a dependent variable (target variable) and
one or more independent variables (predictor variables). The
objective is to determine the most suitable function that
characterizes the connection between these variables.
It seeks to find the best-fitting model, which can be utilized to
make predictions or draw conclusions.
Regression in Machine Learning
It is a supervised machine learning technique, used to predict
the value of the dependent variable for new, unseen data. It
models the relationship between the input features and the
target variable, allowing for the estimation or prediction of
numerical values.
A regression analysis problem arises when the output variable is
a real or continuous value, such as “salary” or “weight”. Many
different models can be used; the simplest is linear regression,
which tries to fit the data with the best hyperplane that passes
through the points.
Terminologies Related to Regression Analysis in Machine Learning
• Response Variable: The primary factor to predict or
understand in regression, also known as the dependent
variable or target variable.
• Predictor Variable: Factors influencing the response
variable, used to predict its values; also called independent
variables.
• Outliers: Observations with significantly low or high values
compared to others, potentially impacting results and best
avoided.
• Multicollinearity: High correlation among independent
variables, which can complicate the ranking of influential
variables.
• Underfitting and Overfitting: Overfitting occurs when an
algorithm performs well on training but poorly on testing,
while underfitting indicates poor performance on both
datasets.
Regression Types
The main types of regression are:
• Simple Regression
o Used to predict a continuous dependent variable
based on a single independent variable.
o Simple linear regression should be used when there is
only a single independent variable.
• Multiple Regression
o Used to predict a continuous dependent variable
based on multiple independent variables.
o Multiple linear regression should be used when there
are multiple independent variables.
• NonLinear Regression
o Relationship between the dependent variable and
independent variable(s) follows a nonlinear pattern.
o Provides flexibility in modeling a wide range of
functional forms.
Regression Algorithms
There are many different types of regression algorithms, but
some of the most common include:
• Linear Regression
o Linear regression is one of the simplest and most
widely used statistical models. This assumes that
there is a linear relationship between the
independent and dependent variables. This means
that the change in the dependent variable is
proportional to the change in the independent
variables.
• Polynomial Regression
o Polynomial regression is used to model nonlinear
relationships between the dependent variable and
the independent variables. It adds polynomial terms
to the linear regression model to capture more
complex relationships.
• Support Vector Regression (SVR)
o Support vector regression (SVR) is a type of regression
algorithm that is based on the support vector
machine (SVM) algorithm. SVM is a type of algorithm
that is used for classification tasks, but it can also be
used for regression tasks. SVR works by finding a
hyperplane that minimizes the sum of the squared
residuals between the predicted and actual values.
• Decision Tree Regression
o Decision tree regression is a type of regression
algorithm that builds a decision tree to predict the
target value. A decision tree is a tree-like structure
that consists of nodes and branches. Each node
represents a decision, and each branch represents the
outcome of that decision. The goal of decision tree
regression is to build a tree that can accurately predict
the target value for new data points.
• Random Forest Regression
o Random forest regression is an ensemble method that
combines multiple decision trees to predict the target
value. Ensemble methods are a type of machine
learning algorithm that combines multiple models to
improve the performance of the overall model.
Random forest regression works by building a large
number of decision trees, each of which is trained on
a different subset of the training data. The final
prediction is made by averaging the predictions of all
of the trees.
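As a brief sketch of how a few of these algorithms can be tried side by side in scikit-learn (the toy data and the choice of hyperparameters below are assumptions made purely for illustration):
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor

# Tiny toy dataset: one feature, continuous target
X = [[1], [2], [3], [4], [5], [6], [7], [8]]
y = [1.2, 1.9, 3.1, 3.9, 5.2, 6.1, 6.8, 8.1]

models = {
    "Linear Regression": LinearRegression(),
    "Support Vector Regression": SVR(),
    "Decision Tree": DecisionTreeRegressor(random_state=0),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

# Fit each model and predict the target for an unseen input
for name, model in models.items():
    model.fit(X, y)
    print(name, model.predict([[9]]))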
Regularized Linear Regression Techniques
• Ridge Regression
o Ridge regression is a type of linear regression that is
used to prevent overfitting. It adds a penalty term (the
squared L2 norm of the coefficients) to the loss
function, shrinking the weights toward zero. Overfitting
occurs when the model learns the training data too well
and is unable to generalize to new data.
• Lasso Regression
o Lasso regression is another type of linear regression
that is used to prevent overfitting. It adds a penalty
term (the L1 norm of the coefficients) to the loss
function, which shrinks some weights and can drive
others exactly to zero.
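A minimal sketch comparing Ridge and Lasso in scikit-learn; the alpha values and the synthetic data are arbitrary and only illustrate how the two penalties affect the learned coefficients:
import numpy as np
from sklearn.linear_model import Ridge, Lasso

# Synthetic data: 5 features, but only the first two actually matter
rng = np.random.RandomState(0)
X = rng.rand(100, 5)
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(0, 0.1, 100)

ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty: shrinks all coefficients
lasso = Lasso(alpha=0.1).fit(X, y)   # L1 penalty: can set some coefficients exactly to zero

print("Ridge coefficients:", ridge.coef_)
print("Lasso coefficients:", lasso.coef_)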
Characteristics of Regression
Here are the characteristics of the regression:
• Continuous Target Variable: Regression deals with
predicting continuous target variables that represent
numerical values. Examples include predicting house
prices, forecasting sales figures, or estimating patient
recovery times.
• Error Measurement: Regression models are evaluated
based on their ability to minimize the error between the
predicted and actual values of the target variable.
Common error metrics include mean absolute error (MAE),
mean squared error (MSE), and root mean squared error
(RMSE).
• Model Complexity: Regression models range from simple
linear models to more complex nonlinear models. The
choice of model complexity depends on the complexity of
the relationship between the input features and the target
variable.
• Overfitting and Underfitting: Regression models are
susceptible to overfitting and underfitting.
• Interpretability: The interpretability of regression models
varies depending on the algorithm used. Simple linear
models are highly interpretable, while more complex
models may be more difficult to interpret.
Examples
Which of the following is a regression task?
• Predicting age of a person
• Predicting nationality of a person
• Predicting whether stock price of a company will increase
tomorrow
• Predicting whether a document is related to sighting of
UFOs?
Solution : Predicting age of a person (because it is a real value,
predicting nationality is categorical, whether stock price will
increase is discrete-yes/no answer, predicting whether a
document is related to UFO is again discrete- a yes/no answer).
Regression Model in Machine Learning
Let’s take an example of linear regression. We have a Housing
data set and we want to predict the price of the house.
Following is the python code for it.
# Python code to illustrate
# regression using data set
import matplotlib
# matplotlib.use('GTKAgg')  # optional: select a GUI backend if needed; the default usually works
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets, linear_model
import pandas as pd
# Load CSV and columns
df = pd.read_csv("Housing.csv")
Y = df['price']
X = df['lotsize']
X=X.values.reshape(len(X),1)
Y=Y.values.reshape(len(Y),1)
# Split the data into training/testing sets
X_train = X[:-250]
X_test = X[-250:]
# Split the targets into training/testing sets
Y_train = Y[:-250]
Y_test = Y[-250:]
# Plot outputs
plt.scatter(X_test, Y_test, color='black')
plt.title('Test Data')
plt.xlabel('Size')
plt.ylabel('Price')
plt.xticks(())
plt.yticks(())
# Create linear regression object
regr = linear_model.LinearRegression()
# Train the model using the training sets
regr.fit(X_train, Y_train)
# Plot outputs
plt.plot(X_test, regr.predict(X_test), color='red',linewidth=3)
plt.show()
Output
Here in this graph, we plot the test data. The red line indicates
the best fit line for predicting the price.
To make an individual prediction using the linear regression
model:
print(round(float(regr.predict([[5000]])[0][0])))
Regression Evaluation Metrics
Here are some most popular evaluation metrics for regression:
• Mean Absolute Error (MAE): The average absolute
difference between the predicted and actual values of the
target variable.
• Mean Squared Error (MSE): The average squared
difference between the predicted and actual values of the
target variable.
• Root Mean Squared Error (RMSE): The square root of the
mean squared error.
• Huber Loss: A hybrid loss function that transitions from
MAE to MSE for larger errors, providing balance between
robustness and MSE’s sensitivity to outliers.
• Root Mean Squared Logarithmic Error (RMSLE): The RMSE
computed on log-transformed predicted and actual values,
useful when the target spans several orders of magnitude.
• R2 Score: The proportion of variance in the target explained
by the model; higher values indicate a better fit, typically
ranging from 0 to 1.
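All of these metrics are available in scikit-learn; the short sketch below uses made-up actual and predicted values just to show how they are computed:
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Made-up actual and predicted values
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.8, 5.4, 2.9, 6.4]

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_true, y_pred)

print("MAE:", mae)
print("MSE:", mse)
print("RMSE:", rmse)
print("R2:", r2)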
Applications of Regression
• Predicting prices: For example, a regression model could
be used to predict the price of a house based on its size,
location, and other features.
• Forecasting trends: For example, a regression model could
be used to forecast the sales of a product based on
historical sales data and economic indicators.
• Identifying risk factors: For example, a regression model
could be used to identify risk factors for heart disease
based on patient data.
• Making decisions: For example, a regression model could
be used to recommend which investment to buy based on
market data.
Advantages of Regression
• Easy to understand and interpret (especially simple linear models)
• Robust variants (such as Huber or tree-based regression) can handle outliers
• Can handle both linear and nonlinear relationships with an appropriate model choice.
Disadvantages of Regression
• Assumes linearity
• Sensitive to multicollinearity
• May not be suitable for highly complex relationships
Conclusion
Regression, a vital facet of supervised machine learning,
navigates the realm of continuous predictions. Its diverse
algorithms, from linear to ensemble methods, cater to a
spectrum of real-world applications, underscoring its
significance in data-driven decision-making.
Frequently Asked Question(FAQ’s)
What is regression and classification?
Regression is used to predict continuous values, while
classification assigns data to discrete categories. Both are
supervised learning tasks in machine learning.
What is simple regression in machine learning?
Simple regression predicts a dependent variable based on one
independent variable, forming a linear relationship.
What are the different regression algorithms?
Regression algorithms include linear regression, polynomial
regression, support vector regression, and decision tree
regression.
Introduction to Regression.
Multiple Choice
I. Choose the letter which contains the correct answer.
1) In which subject is regression mostly used?
Machine learning
Statistics
History
all except history
2) Which method is used to find the best-fit line for data in linear regression?
Least Square Error
Maximum Likelihood
Logarithmic Loss
Both A and B
3) Which of the following metrics can be used to evaluate a
model with a continuous output variable?
Precision-Recall Curve
Accuracy
Logloss
Mean-Squared-Error
4) For given N independent input variables (X1, X2, … Xn) and a
dependent (target) variable Y, a linear regression is fitted for the
best-fit line using least-squares error on this data. The
correlation coefficient for one of its variables (say X1) with Y is
-0.97. Which of the following is true for X1?
Relation between the X1 and Y is weak
Relation between the X1 and Y is strong
Relation between the X1 and Y is neutral
Correlation does not imply relationship
5) You are given two variables, V1 and V2, with the following
two characteristics: 1. If V1 increases then V2 also increases.
2. If V1 decreases then V2's behavior is unknown. Which of the
following options is correct for the Pearson correlation between
V1 and V2?
Pearson correlation will be close to 1
Pearson correlation will be close to -1
Pearson correlation will be close to 0
None of these
6) Suppose Pearson correlation between V1 and V2 is zero. In
such case, is it right to conclude that V1 and V2 do not have any
relation between them?
True
False
7) Which statement is true about outliers in linear regression?
Linear regression model is not sensitive to outliers
Linear regression model is sensitive to outliers
Can’t say
None of these
8) The correlation coefficient is used to determine:
A specific value of the y-variable given a specific value of the x-
variable
A specific value of the x-variable given a specific value of the y-
variable
The strength of the relationship between the x and y variables
None of these
Use of regression in machine learning
Regression can be applied to various types of data. The input features can come from many kinds of data, including numerical, categorical, image, text, and video sources. The goal of regression is to predict a continuous numerical target variable based on these input features.
Let’s explore how regression can be applied to different types of data with examples (a short code sketch follows the list):
Numerical Data:
• Example: Predicting house prices based on features like
the number of bedrooms, square footage, and distance to
the nearest amenities.
• Features: Number of bedrooms (numerical), Square
footage (numerical), Distance to amenities (numerical)
• Target: House price (continuous numeric)
Categorical Data with Numeric Values (Ordinal):
• Example: Predicting customer satisfaction scores based on
categorical feedback.
• Features: Feedback type (categorical: good, neutral, bad),
Response time (numerical)
• Target: Customer satisfaction score (continuous numeric)
Text Data (Natural Language Processing Regression):
• Example: Predicting the sentiment score of customer
reviews.
• Features: Textual content (text data), Keywords or
sentiment scores extracted from text
• Target: Sentiment score (continuous numeric)
Image Data:
• Example: Estimating the age of a person in an image.
• Features: Pixel values from the image, Features extracted
from pre-trained convolutional neural networks (CNNs)
• Target: Age of the person (continuous numeric)
Audio Data:
• Example: Predicting the duration of a sound event.
• Features: Spectrogram or other audio representations,
Features extracted from audio signals
• Target: Duration of the sound event (continuous numeric)
Time Series Data:
• Example: Predicting future stock prices.
• Features: Historical stock prices (numeric), Economic
indicators (numeric)
• Target: Future stock price (continuous numeric)
Spatial Data:
• Example: Predicting property values based on geographic
features.
• Features: Geographic coordinates (numerical), Area-
specific characteristics (numeric)
• Target: Property value (continuous numeric)
Video Data:
• Example: Predicting the viewer engagement duration of an
online advertisement video.
• Features: Frame-level features (visual elements, colors, and composition extracted from each frame of the video), audio features (audio intensity over time, sentiment analysis of background music or voiceover), and content-specific features (scene changes, object recognition such as the presence of a product, and facial expressions of people in the video).
• Target: Duration of viewer engagement with the
advertisement (continuous numeric).
Graph Data:
• Example: Predicting the traffic flow on a road network
based on various graph features.
• Features: Nodes representing intersections or locations,
Edges representing road segments, Traffic-related features
(e.g., historical traffic flow, time of day)
• Target: Traffic flow (continuous numeric)
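To make the tabular cases above concrete, here is a minimal sketch of a regression model trained on a mix of numerical and categorical inputs, in the spirit of the house-price example. pandas and scikit-learn are assumed libraries, every column name and value below is invented for illustration, and the categorical "neighborhood" column is a hypothetical addition used only to show how non-numeric features can be encoded.

# Sketch: regression on mixed numerical and categorical features (invented data).
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline

data = pd.DataFrame({
    "bedrooms": [2, 3, 4, 3, 5],
    "square_footage": [850, 1200, 1800, 1400, 2400],
    "distance_to_amenities": [1.2, 0.8, 2.5, 1.0, 3.1],        # numeric features
    "neighborhood": ["east", "east", "west", "north", "west"], # hypothetical categorical feature
    "price": [180000, 240000, 320000, 260000, 410000],         # continuous target
})

preprocess = ColumnTransformer(
    [("categorical", OneHotEncoder(handle_unknown="ignore"), ["neighborhood"])],
    remainder="passthrough",   # numeric columns pass through unchanged
)

model = Pipeline([("preprocess", preprocess), ("regressor", LinearRegression())])

X = data.drop(columns="price")
y = data["price"]
model.fit(X, y)
print(model.predict(X.iloc[:2]))   # continuous numeric predictions for the first two houses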
Classification is a supervised machine learning process that predicts the class of input data based on the algorithm’s training data. Here’s what you need to know.
Classification is the process of predicting the class of given data points. Classes are sometimes called targets, labels or categories. Classification predictive modeling is the task of approximating a mapping function (f) from input variables (X) to discrete output variables (y).
What Is Classification in Machine Learning?
Classification is a supervised machine learning process that
involves predicting the class of given data points. Those classes
can be targets, labels or categories. For example, a spam
detection machine learning algorithm would aim to classify
emails as either “spam” or “not spam.” Common classification algorithms include k-nearest neighbors, decision trees, naive Bayes and artificial neural networks.
For example, spam detection in email service providers can be
identified as a classification problem. This is a binary
classification since there are only two classes marked as “spam”
and “not spam.” A classifier utilizes some training data to
understand how given input variables relate to the class. In this
case, known spam and non-spam emails have to be used as the
training data. Once the classifier is trained accurately, it can be used to classify an unseen email as spam or not spam.
Classification belongs to the category of supervised
learning where the targets are also provided with the input
data. Classification can be applied to a wide variety of tasks, including credit approval, medical diagnosis and target marketing.
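As a concrete illustration of the spam example, the sketch below trains a binary text classifier. scikit-learn is an assumed library choice, and the handful of training emails is invented and far too small for real use; it only shows the shape of the workflow.

# Sketch: binary "spam" vs. "not spam" classification with a naive Bayes classifier.
# scikit-learn is an assumed library; the example emails are made up.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

emails = [
    "win a free prize now",               # spam
    "limited offer click here",           # spam
    "meeting rescheduled to monday",      # not spam
    "please review the attached report",  # not spam
]
labels = ["spam", "spam", "not spam", "not spam"]

# Bag-of-words features feed a multinomial naive Bayes classifier.
classifier = make_pipeline(CountVectorizer(), MultinomialNB())
classifier.fit(emails, labels)

print(classifier.predict(["claim your free prize"]))   # likely ['spam']
print(classifier.predict(["see you at the meeting"]))  # likely ['not spam']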
Types of Classification in Machine Learning
There are two types of learners in classification — lazy learners
and eager learners.
1. Lazy Learners
Lazy learners store the training data and wait until testing data
appears. When it does, classification is conducted based on the
most related stored training data. Compared to eager learners,
lazy learners spend less training time but more time in
predicting.
Examples: K-nearest neighbor and case-based reasoning.
2. Eager Learners
Eager learners construct a classification model from the given training data before receiving any data to classify. An eager learner must commit to a single hypothesis that covers the entire instance space. Because of this, eager learners take a long time for training and less time for predicting.
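The difference is easy to see in code. In the sketch below (scikit-learn assumed, two-feature toy data invented), the k-nearest-neighbor "fit" step essentially just stores the training set and defers the work to prediction time, while the decision tree builds its model up front.

# Sketch: a lazy learner (k-nearest neighbors) vs. an eager learner (decision tree).
# scikit-learn is an assumed library; the toy dataset is invented.
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X_train = [[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]]
y_train = ["A", "A", "A", "B", "B", "B"]
X_test = [[2, 2], [9, 9]]

# Lazy learner: "training" only stores the data; distances are computed at prediction time.
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
print("KNN:", knn.predict(X_test))   # the classification work happens here

# Eager learner: a full model (the tree) is built before any test data is seen.
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X_train, y_train)
print("Tree:", tree.predict(X_test))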
Chapter three: Classification-based algorithms
• A supervised scenario is characterized by the concept of a teacher or supervisor, whose main task is to provide the agent with a precise measure of its error (directly comparable with the output values).
• The goal is to infer a function or mapping from training data that is labeled.
• The training data consist of an input vector X and an output vector Y of labels or tags.
• Based on the training set, the algorithm generalizes to respond correctly to all possible inputs; this is called learning from examples.
Dataset Example:
• Weather information for the last 14 days.
• Whether a match was played or not on each particular day.
• Then predict whether the game will be played if the weather condition is (Outlook=Rain, Humidity=High, Wind=Weak) using a supervised learning model (see the sketch at the end of this section).
• A data set is denoted in the form (x_i, y_i):
o Where the inputs are x_i, the outputs are y_i, and i = 1 to N, with N the number of observations.
• Generalization: the algorithm should produce sensible outputs for inputs that were not encountered during learning.
Supervised learning is categorized into two types:
o Classification: data is classified into one of two or more classes.
o Regression: the task of predicting a continuous quantity.
Classification:
o It is a systematic approach to building classification models from an input data set.
o It is the task of assigning a new object to one of several predefined categories.
• Examples of classification algorithms: decision tree classifiers, rule-based classifiers, neural networks, support vector machines, naive Bayes classifiers, etc.
• Each technique employs a learning algorithm to identify a model that best fits the relationship between the attribute set and the class label of the input data.
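Returning to the weather example above, here is a minimal sketch that trains a decision tree classifier and queries it for (Outlook=Rain, Humidity=High, Wind=Weak). scikit-learn and pandas are assumed libraries, and the small weather table is invented in the spirit of the classic play-tennis data rather than taken from the text.

# Sketch: predicting whether a match is played from weather conditions.
# scikit-learn and pandas are assumed; the weather table is invented.
import pandas as pd
from sklearn.preprocessing import OneHotEncoder
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

weather = pd.DataFrame({
    "Outlook":  ["Sunny", "Sunny", "Overcast", "Rain", "Rain",   "Rain",   "Overcast", "Sunny"],
    "Humidity": ["High",  "High",  "High",     "High", "Normal", "Normal", "Normal",   "Normal"],
    "Wind":     ["Weak",  "Strong","Weak",     "Weak", "Weak",   "Strong", "Strong",   "Weak"],
    "Played":   ["No",    "No",    "Yes",      "Yes",  "Yes",    "No",     "Yes",      "Yes"],
})

X = weather[["Outlook", "Humidity", "Wind"]]
y = weather["Played"]

# One-hot encode the categorical attributes, then fit an eager decision-tree model.
model = make_pipeline(OneHotEncoder(handle_unknown="ignore"),
                      DecisionTreeClassifier(random_state=0))
model.fit(X, y)

# Query from the text: Outlook=Rain, Humidity=High, Wind=Weak.
query = pd.DataFrame({"Outlook": ["Rain"], "Humidity": ["High"], "Wind": ["Weak"]})
print(model.predict(query))   # the model's prediction for this day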