CSC 445 Merged
Philosophy of AI
Goals of AI
The goals of AI are to create expert systems that exhibit intelligent behaviour, learn, demonstrate, explain, and advise their users, and to implement human intelligence in machines by creating systems that understand, think, learn, and behave like humans.
A computer program without AI can answer only the specific questions it is meant to solve, whereas a computer program with AI can answer the generic questions it is meant to solve. Modifying a program without AI means changing its structure; AI programs, by contrast, can absorb new modifications by putting highly independent pieces of information together, so even a minute piece of information can be modified without affecting the program's structure.
Applications of AI
1. Gaming
AI plays a crucial role in strategic games such as chess, poker, and tic-tac-toe, where the machine can think of a large number of possible positions based on heuristic knowledge.
2. Natural Language Processing
It makes it possible to interact with a computer that understands natural language spoken by humans.
3. Expert Systems
There are some applications which integrate machine, software, and special information to impart reasoning and advising. They provide explanation and advice to the users.
4. Vision Systems
These systems understand, interpret, and comprehend visual input on the computer.
5. Speech Recognition
6. Handwriting Recognition
7. Intelligent Robots
Robots are able to perform the tasks given by a human. They have sensors
to detect physical data from the real world such as light, heat, temperature,
movement, sound, bump, and pressure. They have efficient processors,
multiple sensors and huge memory, to exhibit intelligence. In addition, they
are capable of learning from their mistakes and they can adapt to the new
environment.
Types of AI
a) Narrow AI (Weak AI)
This type of AI is designed to perform a specific task or a narrow set of tasks. It is specialized
in one domain and operates within pre-defined parameters. Narrow AI can outperform humans
in specific tasks but does not have general reasoning or understanding beyond its scope.
Examples are voice assistants (e.g., Siri, Alexa), image recognition systems (e.g., facial recognition), autonomous vehicles, chatbots, etc.
b) General AI (Strong AI)
General AI refers to a machine with the ability to understand, learn, and apply intelligence across a broad range of tasks, similar to human cognitive abilities. It can perform any intellectual task that a human can do, and it has the capacity for general reasoning and adaptation. As of now, General AI remains a theoretical concept. No machine has yet achieved the level of versatility and general intelligence found in humans. An example is a hypothetical AI that can perform any human task, such as reasoning, emotional intelligence, and complex problem-solving across all domains.
c) Super-intelligent AI
This type of AI would surpass human intelligence in all aspects, including creativity, problem-
solving, and emotional intelligence. Super-intelligent AI is the subject of many philosophical
and ethical discussions, particularly regarding its potential impact on society. Super-intelligent
AI is a concept still in the realm of speculation and future possibilities. It is not yet developed.
An example is a hypothetical AI that exceeds human intelligence in every field, potentially leading to self-improvement and creating innovations beyond human comprehension.
Approaches to AI
a) Symbolic Approach
The Symbolic approach, also known as Classical AI or GOFAI, is based on the idea of using
symbols to represent knowledge about the world and logical rules to manipulate these symbols
to draw inferences or make decisions. This paradigm was the foundation of early AI research
and focuses on structured, rule-based reasoning. Symbolic AI systems use algorithms like
search techniques (e.g., depth-first search, breadth-first search) to explore possible solutions to
a problem by reasoning through symbolic representations.
Strengths:
Limitations:
b) Connectionist Approach
The Connectionist approach, exemplified by artificial neural networks and deep learning, represents knowledge as weighted connections between simple processing units and learns from data rather than from hand-coded rules.
Strengths:
1. Learning from Data: Excellent at handling large and complex datasets, especially
unstructured data like images, video, and speech.
2. Adaptability: Connectionist systems can adapt and improve over time as they are
exposed to more data, making them highly flexible in dynamic environments.
3. Generalization: They can generalize from examples and make predictions about new,
unseen data.
Limitations:
Lack of Transparency: Deep learning models are often considered "black boxes"
because it is difficult to interpret how they arrive at a particular decision.
Data Dependency: Connectionist approaches require large amounts of labeled data to
train effectively. Without sufficient data, the performance may be poor.
Computational Cost: Training deep neural networks can be computationally
expensive and may require specialized hardware (e.g., GPUs).
2. Intelligent Systems
What is Intelligence?
Intelligence can be defined as the capacity to learn from experience, adapt to new situations,
understand and handle abstract concepts, and use knowledge to manipulate one’s environment.
It encompasses a variety of cognitive functions such as reasoning, problem-solving, decision-
making, creativity, and comprehension. Intelligence allows an individual or a system to analyze
information, make judgments, and take actions that lead to successful outcomes.
In the context of Artificial Intelligence (AI), intelligence involves the ability of a machine or
system to simulate cognitive processes to perform tasks that would typically require human
intellect. This includes learning from data, reasoning to make decisions, understanding
language, and perceiving the world through sensory input.
1. Reasoning: It is the set of processes that enables us to provide basis for judgement, making
decisions, and prediction. There are broadly two types:
(a) Inductive reasoning starts with specific observations and generalizes from them to reach a broader conclusion; the conclusion is probable rather than certain.
For example,
“Nita is a teacher.
All teachers are studious.
Therefore, Nita is studious.”
(b) Deductive reasoning starts with a general statement or hypothesis and examines the
possibilities to reach a specific, logical conclusion. It moves from general premises to specific
conclusions. It begins with a theory or rule and applies it to specific cases to draw a certain
conclusion. If something is true of a class of things in general, it is also true for all members
of that class. Deductive reasoning provides certainty.
For example,
"All women of age above 60 years aregrandmothers.
Shalini is 65 years.
Therefore, Shalini is a grandmother."
3. Problem solving: It is the process in which one perceives and tries to arrive at a desired
solution from a present situation by taking some path, which is blocked by known or unknown
hurdles. Problem solving also includes decision making. Decision making is the process of
selecting the most suitable alternative out of the multiple alternatives available to reach the desired goal.
7. Creativity: It refers to the ability to produce new, original, and valuable ideas, solutions, or
artistic expressions. It involves combining existing concepts in novel ways, thinking beyond
conventional boundaries, and generating unique approaches to problems. Creativity is a
hallmark of higher cognitive functioning and plays a significant role in innovation, problem-
solving, and adaptation to complex challenges.
Difference between Human and Machine Intelligence
1. Human Intelligence is biological in nature and driven by neural processes in the human
brain. Machine Intelligence is artificial, based on computational models, and operates within
the limits of predefined data, algorithms, and hardware.
2. Human Intelligence is capable of learning from a wide range of experiences and applying
knowledge to new and unfamiliar situations, through a combination of observation, experience,
practice, and adaptation. Machine intelligence relies on training data and pre-programmed
algorithms, and excels in specific, narrow tasks and needs retraining or new algorithms to adapt
to different domains or complex changes outside its initial programming.
3. Human Intelligence has the ability to use intuition and common sense, applying knowledge
to situations with incomplete information. Machine Intelligence struggles with applying
common sense unless explicitly programmed. While machine learning models can identify
patterns in vast datasets, they may produce illogical results when faced with scenarios not
included in their training.
The Turing Test is a famous concept in artificial intelligence (AI) introduced by the British
mathematician and computer scientist Alan Turing in 1950. It is a method for assessing
whether a machine exhibits behavior that is indistinguishable from human intelligence.
Turing proposed that instead of asking whether machines can "think" (a complex philosophical question), we should ask whether a machine can behave intelligently enough to convince a human observer that it is also human. In essence, the test measures a machine's ability to exhibit intelligent behavior that is indistinguishable from human behavior. The test involves three participants:
1. The Human Interrogator: A person who asks questions to both the machine and the human
without knowing which is which.
2. The Human: A person who answers the same set of questions as the machine, without any
bias.
3. The Machine: The AI system or machine being tested, which also responds to the same set of
questions.
The interrogator's task is to determine which of the two is the human and which is the machine
based on their responses. The machine is considered to have passed the Turing Test if the
interrogator is unable to reliably distinguish between the machine and the human, or if the
machine’s responses are mistaken for those of the human.
3. Agents and Environments
What are Agents?
An agent in AI refers to an autonomous entity or system capable of perceiving its environment through sensors
and acting upon that environment through actuators to achieve specific goals. The actions are chosen based on the
agent’s internal algorithms or decision-making processes.
Sensors: These are the agent's input mechanisms, allowing it to perceive information
from its environment. For example, a robot's sensors might include cameras,
microphones, and touch sensors.
Effectors: These are the agent's output mechanisms, enabling it to interact with the
environment. A robot's effectors could be motors, grippers, or speakers.
Actuators: These are the physical components of the agent that carry out actions. For
example, a robot's actuators might include its arms, legs, and wheels.
Characteristics of an AI Agent:
Autonomy: It operates without human intervention, making decisions based on its own
experiences or programming.
Reactivity: It responds to changes in the environment in a timely manner.
Proactiveness: It can take initiative to reach a goal, not just respond passively.
Adaptability: It can learn from interactions and modify its behaviour based on
experiences.
Types of AI Agents:
Simple Reflex Agents: They make decisions based solely on current percepts without
considering history (e.g., thermostats).
Model-based Agents: They maintain an internal model of the world to handle partial
observations and make informed decisions.
Goal-based Agents: They make decisions with the objective of achieving specified goals
and can plan future actions.
Utility-based Agents: They choose actions based on maximizing a utility function to reach
the most favourable outcome.
Learning Agents: They can improve their performance over time through learning from
past experiences.
An environment in AI refers to the external context or world within which an agent operates and interacts. The
environment provides the conditions and information that the agent perceives and acts upon.
Types of Environments:
Deterministic vs. Stochastic: In a deterministic environment, the next state of the
environment is entirely determined by the current state and the agent's action. In a
stochastic environment, there is randomness or uncertainty in the outcomes of actions.
Static vs. Dynamic: In a static environment, the environment remains unchanged while
the agent is deliberating. In a dynamic environment, the environment can change
independently of the agent's actions.
Discrete vs. Continuous: In a discrete environment, the number of possible states and
actions is finite. In a continuous environment, the states and actions can take on any
value within a given range.
Agent-Environment Interaction:
The interaction between an agent and its environment forms the basis of how an AI system
functions:
Perception: The agent gathers data from the environment using sensors (e.g., cameras, microphones,
or software inputs).
Decision-Making: The agent processes this data and decides on an action based on its programming or
learning algorithms.
Action: The agent executes an action using its actuators or output mechanism (e.g., motors in a robot,
commands in software).
Feedback Loop: The environment reacts to the action, changing its state, and the agent perceives the
result, completing the loop.
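To make this feedback loop concrete, here is a minimal Python sketch of a simple reflex agent interacting with its environment; the ThermostatAgent and Room classes and their rules are made-up placeholders, not part of any standard library.

class ThermostatAgent:
    def decide(self, percept):
        # Simple reflex rule: act only on the current percept (the temperature reading).
        return "heat_on" if percept < 20 else "heat_off"

class Room:
    def __init__(self, temperature=15.0):
        self.temperature = temperature
    def perceive(self):
        return self.temperature              # what the agent's sensor reads
    def apply(self, action):                 # the environment reacts to the action
        self.temperature += 1.0 if action == "heat_on" else -0.5

agent, room = ThermostatAgent(), Room()
for _ in range(5):
    percept = room.perceive()                # Perception
    action = agent.decide(percept)           # Decision-Making
    room.apply(action)                       # Action: the environment changes state
    print(percept, action)                   # Feedback loop: the agent perceives the new state next time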
Example: A self-driving car
Agent: The autonomous car's AI driving system.
Environment: The road, traffic signals, pedestrians, weather conditions, and other vehicles.
Interaction: The car’s sensors (cameras, radar, LIDAR) perceive the environment, the AI processes
this information to make driving decisions, and actuators execute steering, acceleration, or braking
actions. The environment then changes in response (e.g., the car moves forward), which the agent
perceives again to adjust its behavior.
What is Natural Language Processing?
Natural Language Processing (NLP) stands as a pivotal technology in the realm of
artificial intelligence, bridging the gap between human communication and computer
understanding. It is a multidisciplinary domain that empowers computers to interpret,
analyze, and generate human language, enabling seamless interaction between humans
and machines. The significance of NLP is evident in its widespread applications, ranging
from automated customer support to real-time language translation.
NLP is used in a wide variety of everyday products and services. Some of the most
common ways NLP is used are through voice-activated digital assistants on
smartphones, email-scanning programs used to identify spam, and translation apps that
decipher foreign languages.
NLP involves enabling machines to understand, interpret, and produce human language
in a way that is both valuable and meaningful. OpenAI, known for developing advanced
language models like ChatGPT, highlights the importance of NLP in creating intelligent
systems that can understand, respond to, and generate text, making technology more
user-friendly and accessible.
Advantages of NLP
o NLP helps users to ask questions about any subject and get a direct response
within seconds.
o NLP offers exact answers to a question, meaning it does not return unnecessary or unwanted information.
Disadvantages of NLP
o An NLP system is unable to adapt to a new domain and has limited functionality, which is why it is usually built for a single, specific task.
Natural language techniques
NLP encompasses a wide range of techniques to analyze human language. Some of the
most common techniques you will likely encounter in the field include:
• Keyword extraction: An NLP technique that analyzes a text to identify the most
important keywords or phrases. Keyword extraction is commonly used for search
engine optimization (SEO), social media monitoring, and business intelligence
purposes.
Components of NLP
Natural Language Processing is not a monolithic, singular approach, but rather, it is
composed of several components, each contributing to the overall understanding of
language. The main components that NLP strives to understand are Syntax, Semantics,
Pragmatics, and Discourse.
Syntax
• Definition: Syntax refers to the arrangement of words in a sentence so that they make grammatical sense.
• Example: Consider the sentence "The cat sat on the mat." Syntax involves analyzing the grammatical structure of this sentence, ensuring that it adheres to the grammatical rules of English, such as subject-verb agreement and proper word order.
Semantics
• Definition: Semantics is concerned with understanding the meaning of words and
how they create meaning when combined in sentences.
• Example: In the sentence "The panda eats shoots and leaves," semantics helps
distinguish whether the panda eats plants (shoots and leaves) or is involved in a
violent act (shoots) and then departs (leaves), based on the meaning of the words
and the context.
Pragmatics
• Definition: Pragmatics deals with understanding language in various contexts,
ensuring that the intended meaning is derived based on the situation, speaker’s
intent, and shared knowledge.
• Example: If someone says, "Can you pass the salt?" Pragmatics involves
understanding that this is a request rather than a question about one's ability to
pass the salt, interpreting the speaker’s intent based on the dining context.
Discourse
• Definition: Discourse is concerned with how sentences and utterances relate to one another in larger units of text or conversation, so that the whole remains coherent.
• Example: In a conversation where one person says, "I’m freezing," and another responds, "I’ll close the window," discourse involves understanding the coherence between the two statements, recognizing that the second statement is a response to the implied request in the first.
Understanding these components is crucial for anyone delving into NLP, as they form
the backbone of how NLP models interpret and generate human language.
NLP techniques and methods
To analyze and understand human language, NLP employs a variety of techniques and
methods. Here are some fundamental techniques used in NLP:
• Tokenization. This is the process of breaking text into words, phrases, symbols,
or other meaningful elements, known as tokens. Tokenization breaks text into smaller
parts for easier machine analysis, helping machines understand human language.
Tokenization, in the realm of Natural Language Processing (NLP) and machine
learning, refers to the process of converting a sequence of text into smaller parts,
known as tokens. These tokens can be as small as characters or as long as words.
The primary reason this process matters is that it helps machines understand
human language by breaking it down into bite-sized pieces, which are easier to
analyze.
To delve deeper into the mechanics, consider the sentence, "Chatbots are helpful."
When we tokenize this sentence by words, it transforms into an array of individual words:
["Chatbots", "are", "helpful."]
If we instead tokenize by characters, the same sentence becomes:
["C", "h", "a", "t", "b", "o", "t", "s", " ", "a", "r", "e", " ", "h", "e", "l", "p", "f", "u", "l"].
This character-level breakdown is more granular and can be especially useful for certain languages or specific NLP tasks.
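As an illustration, a minimal Python sketch of word-level and character-level tokenization is shown below, using plain string operations on the example sentence; production tokenizers (such as those in NLTK, spaCy, or the BERT tokenizer discussed later) handle punctuation and sub-words more carefully.

# Word-level tokenization: split on whitespace (real tokenizers also handle punctuation).
sentence = "Chatbots are helpful."
word_tokens = sentence.split()
print(word_tokens)        # ['Chatbots', 'are', 'helpful.']

# Character-level tokenization: every character becomes a token.
char_tokens = list(sentence)
print(char_tokens)        # ['C', 'h', 'a', 't', 'b', 'o', 't', 's', ' ', 'a', ...]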
It's worth noting that while our discussion centers on tokenization in the context
of language processing, the term "tokenization" is also used in the realms of
security and privacy, particularly in data protection practices like credit card
tokenization. In such scenarios, sensitive data elements are replaced with non-
sensitive equivalents, called tokens. This distinction is crucial to prevent any
confusion between the two contexts.
Types of Tokenization
Tokenization methods vary based on the granularity of the text breakdown and
the specific requirements of the task at hand. These methods can range from
dissecting text into individual words to breaking them down into characters or
even smaller units. Here's a closer look at the different types:
• Word tokenization. This method breaks text down into individual words. It's the
most common approach and is particularly effective for languages with clear word
boundaries like English.
The landscape of Natural Language Processing offers many tools, each tailored to
specific needs and complexities. Here's a guide to some of the most prominent
tools and methodologies available for tokenization:
• BERT tokenizer. Emerging from the BERT pre-trained model, this tokenizer
excels in context-aware tokenization. It's adept at handling the nuances and
ambiguities of language, making it a top choice for advanced NLP projects.
• Lemmatization. This technique reduces words to their base or root form, allowing
for the grouping of different forms of the same word.
Each of these techniques plays a vital role in enabling computers to process and
understand human language, forming the building blocks of more advanced NLP
applications.
What is NLP Used For?
Now that we have some of the basic concepts defined, let’s take a look at how natural
language processing is used in the modern world.
Industry applications
Natural Language Processing has found extensive applications across various industries,
revolutionizing the way businesses operate and interact with users. Here are some of the
key industry applications of NLP.
Healthcare
NLP assists in transcribing and organizing clinical notes, ensuring accurate and efficient
documentation of patient information. For instance, a physician might dictate their notes,
which NLP systems transcribe into text. Advanced NLP models can further categorize
the information, identifying symptoms, diagnoses, and prescribed treatments, thereby
streamlining the documentation process, minimizing manual data entry, and enhancing
the accuracy of electronic health records.
Finance
Financial institutions leverage NLP to perform sentiment analysis on various text data
like news articles, financial reports, and social media posts to gauge market sentiment
regarding specific stocks or the market in general. Algorithms analyze the frequency of
positive or negative words, and through machine learning models, predict potential
impacts on stock prices or market movements, aiding traders and investors in making
informed decisions.
Customer Service
• Virtual assistants. Siri, Alexa, and Google Assistant are examples of virtual
assistants that use NLP to understand and respond to user commands.
• Translation services. Services like Google Translate employ NLP to provide real-
time language translation, breaking down language barriers and fostering
communication.
• Email filtering. NLP is used in email services to filter out spam and categorize
emails, helping users manage their inboxes more effectively.
• Social media monitoring. NLP enables the analysis of social media content to
gauge public opinion, track trends, and manage online reputation.
The applications of NLP are diverse and pervasive, impacting various industries and our
daily interactions with technology. Understanding these applications provides a glimpse
into the transformative potential of NLP in shaping the future of technology and human
interaction.
Challenges and The Future of NLP
Although natural language processing is an incredibly useful tool, it’s not without its
flaws. Here, we look at some of the challenges we need to overcome, as well as what the
future holds for NLP.
Overcoming NLP challenges
Natural Language Processing, despite its advancements, faces several challenges due to
the inherent complexities and nuances of human language. Here are some of the
challenges in NLP:
• Context. Understanding the context in which words are used is crucial for accurate
interpretation, and it remains a significant challenge for NLP.
• Ethical and responsible AI. The focus on ethical considerations and responsible
AI will shape the development of NLP models, ensuring fairness, transparency,
and accountability.
The exploration of challenges provides insights into the complexities of NLP, while the
glimpse into the future highlights the potential advancements and the evolving
landscape of Natural Language Processing.
Text Classification & Sentiment Analysis
Text classification holds immense significance in NLP due to its wide range of
applications across different fields. It serves as the backbone for various downstream
NLP tasks, including sentiment analysis, spam detection, topic categorization, and
document organization. By automatically categorizing textual data, text classification
algorithms enable efficient information retrieval, content filtering, and knowledge
extraction from large corpora.
The text classification pipeline involves several steps: the raw text is first preprocessed, and feature extraction techniques are then applied to represent the text data in a numerical format. Once the data is preprocessed and represented, machine learning models are trained on labeled training data to learn patterns and relationships between features and labels. Common preprocessing steps include:
• Handling stopwords.
Tokenization
Tokenization involves breaking down text into smaller units, such as words, phrases, or
characters. These units, known as tokens, serve as the basic building blocks for NLP tasks.
Common tokenization techniques include:
• Word tokenization.
• Character tokenization.
Normalization
Normalization involves transforming text into a standardized format to reduce
redundancy and variation. This step helps ensure consistency in the representation of
text data and improves the effectiveness of NLP algorithms. Common normalization
techniques include:
• Stemming.
• Lemmatization.
Feature Extraction and Text Representation
Feature extraction and text representation are critical steps in Natural Language
Processing (NLP) that involve converting raw text data into numerical vectors or
matrices. These representations capture the semantic and syntactic information of the
text, enabling machine learning algorithms to operate effectively. Here are some common
techniques for feature extraction and representation in NLP:
Bag-of-Words (BoW) Model:
The Bag-of-Words (BoW) model is a simple yet effective technique for representing text
data. It involves creating a vocabulary of unique words from the entire corpus of
documents and representing each document as a fixed-length vector, where each
dimension corresponds to the frequency of a word in the document. The BoW model
disregards the order of words and only considers their frequency, making it suitable for
tasks like sentiment analysis and document classification.
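The following is a minimal sketch of the BoW model using scikit-learn's CountVectorizer; the two example documents are made up purely for illustration.

from sklearn.feature_extraction.text import CountVectorizer

# Two made-up documents forming a tiny corpus.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
]

vectorizer = CountVectorizer()
bow = vectorizer.fit_transform(docs)         # sparse document-term matrix

print(vectorizer.get_feature_names_out())    # the vocabulary learned from the corpus
print(bow.toarray())                         # word counts per document (word order is ignored)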
Word Embeddings
Word2Vec: Word2Vec is a shallow neural network model that learns continuous word
embeddings by predicting the context of words in a large corpus of text. It provides dense
vector representations for words based on their distributional semantics.
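Below is a minimal sketch of training word embeddings with the Word2Vec implementation in the gensim library (assumed to be installed); the tiny corpus is made up, so the resulting vectors are only illustrative.

from gensim.models import Word2Vec

# Tiny made-up corpus: a list of tokenized sentences.
sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "log"],
]

model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, epochs=50)
print(model.wv["cat"][:5])                    # first 5 dimensions of the "cat" vector
print(model.wv.most_similar("cat", topn=2))   # nearest neighbours in embedding space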
Machine Learning Algorithms for Text Classification
Naive Bayes: Naive Bayes is a probabilistic classifier based on Bayes’ theorem with the
assumption of independence between features. It is simple, efficient, and works well with
high-dimensional data such as text.
Support Vector Machines (SVM): SVM is a supervised learning algorithm that separates
data points by maximizing the margin between classes in a high-dimensional space.
SVMs are effective for text classification tasks with linear or non-linear decision
boundaries and can handle large feature spaces efficiently.
Random Forest: Random Forest is an ensemble learning method that builds multiple
decision trees and combines their predictions through voting or averaging. It is robust,
scalable, and less prone to overfitting compared to individual decision trees. Random
Forests perform well for text classification tasks with complex feature interactions and
large datasets.
Recurrent Neural Networks (RNNs): RNNs are a class of neural networks designed to
handle sequential data, making them well-suited for text processing tasks. They have
recurrent connections that allow them to capture temporal dependencies in text
sequences.
Size of the Dataset:
• Deep learning models like CNNs, RNNs, and Transformers tend to excel with
large datasets due to their capacity to learn complex representations.
Complexity of the Task:
• Deep learning models, particularly Transformers, are suitable for complex text
classification tasks requiring semantic understanding, contextual reasoning, and
handling of long-range dependencies.
• For simpler tasks with straightforward feature interactions, traditional machine
learning algorithms may suffice.
Computational Resources:
Figure 3: Customer report classification
Spam Detection and Email Filtering:
Text classification plays a crucial role in email filtering systems by distinguishing
between legitimate emails and spam messages. By classifying incoming emails into spam
and non-spam categories, email providers can protect users from unsolicited and
potentially harmful messages, ensuring a clutter-free inbox.
Figure 4: Mail classifier into inbox or spam
Sentiment Analysis:
In social media platforms like Twitter and Facebook, text classification is employed for
sentiment analysis, which involves categorizing social media posts or comments into
positive, negative, or neutral sentiment categories. This enables businesses to understand
public opinion, monitor brand perception, and respond to customer feedback in real-
time.
Understanding of Sentiment Analysis
Sentiment analysis, also known as opinion mining, is a natural language processing
(NLP) technique that involves the identification, extraction, and analysis of subjective
information from textual data. It aims to determine the sentiment or emotional tone
expressed in a piece of text, whether it’s positive, negative, or neutral.
Figure 5: Sentiment analysis classifier
Importance of Sentiment Analysis
Business and Marketing: Sentiment analysis helps businesses understand public opinion about their products and brand, monitor brand perception, and respond to customer feedback.
Approaches to Sentiment Analysis
Rule-based Approaches: Rule-based systems classify sentiment using manually crafted rules. Example rules might identify keywords associated with positive sentiment (like "happy") or negative sentiment (like "sad").
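A minimal Python sketch of such a rule-based approach is shown below; the keyword lists and scoring rule are made up for illustration.

# Made-up keyword lists for a toy rule-based sentiment checker.
POSITIVE = {"happy", "great", "love", "excellent"}
NEGATIVE = {"sad", "terrible", "hate", "awful"}

def rule_based_sentiment(text):
    words = set(text.lower().split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)   # count keyword hits
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(rule_based_sentiment("I am so happy with this"))   # positive
print(rule_based_sentiment("This makes me sad"))         # negative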
Machine Learning Algorithms:
Machine learning (ML) algorithms are trained on labeled data to automatically learn
patterns and relationships between features and sentiment labels. ML algorithms require
feature engineering, where relevant features (e.g., word frequency, n-grams) are
extracted from text data before training.
Challenges of Sentiment Analysis
Dealing with Sarcasm, Irony, and Ambiguity in Text:
Sarcasm, irony, and ambiguity are prevalent in natural language and can lead to
misinterpretation by sentiment analysis systems. For example, a sarcastic statement
might contain positive words but convey negative sentiments.
Addressing Bias and Ethical Concerns in Sentiment Analysis:
Sentiment analysis systems may inadvertently perpetuate biases present in the training
data, leading to unfair or discriminatory outcomes. Biases can arise due to skewed
datasets, societal stereotypes, or cultural biases.
Handling Multilingual and Cross-cultural Sentiment Analysis:
Sentiment analysis models trained on one language or cultural context may not
generalize well to other languages or cultures. Differences in language structure,
sentiment expression, and cultural norms pose challenges for cross-cultural sentiment
analysis.
Code Implementation of a Sentiment Classifier Using Naive Bayes
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
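Building on these imports, the following is a minimal end-to-end sketch (imports repeated so it runs on its own); the tiny labelled dataset is made up purely for illustration.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny made-up labelled dataset of (text, sentiment) pairs.
train_texts = [
    "I love this product, it is fantastic",
    "Absolutely wonderful experience",
    "This is terrible, I hate it",
    "Worst purchase ever, very disappointing",
]
train_labels = ["positive", "positive", "negative", "negative"]

# Pipeline: Bag-of-Words features followed by a Multinomial Naive Bayes classifier.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

print(model.predict(["I really love it", "what a disappointing product"]))
# Expected output: ['positive' 'negative']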
ChatGPT is making waves worldwide, attracting over 1 million users in record time. GPT
(Generative Pre-trained Transformer) is a type of language model that has gained
significant attention in recent years due to its ability to perform various natural language processing tasks, such as text generation, summarization, and question-answering.
What is a language model?
A language model is a machine learning model that aims to predict and generate
plausible language. Autocomplete is a language model, for example.
Consider a sentence with a blank to fill in, such as: "When I hear rain on my roof, I _______ in my kitchen." If you assume that a token is a word, then a language model determines the probabilities of different words or sequences of words that could replace that blank. For example, a language model might determine the following probabilities:
cower 3.6%
nap 2.5%
relax 2.2%
...
Estimating the probability of what comes next in a sequence is useful for all kinds of
things: generating text, translating languages, and answering questions, to name a few.
Large Language Models (LLMs) are trained on massive amounts of text data. As a result,
they can generate coherent and fluent text. LLMs perform well on various natural language processing tasks, such as language translation, text summarization, and
conversational agents. LLMs perform so well because they are pre-trained on a large
corpus of text data and can be fine-tuned for specific tasks. GPT is an example of a Large
Language Model. These models are called “large” because they have billions of
parameters that shape their responses. For instance, GPT-3, the largest version of GPT,
has 175 billion parameters and was trained on a massive corpus of text data.
The basic premise of a language model is its ability to predict the next word or sub-word (called a token) based on the text it has observed so far. The model predicts one token at a time by assigning probabilities to candidate tokens based on its training. Typically, the token with the highest probability is used as the next part of the input. This process is repeated continuously until a special <stop> token is selected.
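The token-by-token generation loop described above can be sketched in Python as follows; predict_next_token_probs is a hypothetical stand-in for a trained language model, not a real API.

def generate(prompt_tokens, predict_next_token_probs, stop_token="<stop>", max_len=50):
    tokens = list(prompt_tokens)
    for _ in range(max_len):
        probs = predict_next_token_probs(tokens)   # dict mapping candidate token -> probability
        next_token = max(probs, key=probs.get)     # greedy choice: take the most likely token
        if next_token == stop_token:               # stop when the special token is selected
            break
        tokens.append(next_token)                  # feed the chosen token back in as new input
    return tokens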
The deep learning architecture that has made this process more human-like is the
Transformer architecture. So let us now briefly understand the Transformer
architecture.
The Transformer Architecture: The Building Block
The transformer architecture is the fundamental building block of all transformer-based Large Language Models (LLMs). It was introduced in the paper "Attention Is All You Need," published in 2017.
There are seven important components in the transformer architecture. Let’s go through each of these components and understand what they do in a simplified manner:
1. Inputs and Input Embeddings: The tokens entered by the user are considered
inputs for the machine learning models. However, models only understand
numbers, not text, so these inputs need to be converted into a numerical format
called “input embeddings.” Input embeddings represent words as numbers,
which machine learning models can then process. These embeddings are like a
dictionary that helps the model understand the meaning of words by placing
them in a mathematical space where similar words are located near each other.
During training, the model learns how to create these embeddings so that similar
vectors represent words with similar meanings.
2. Positional Encoding: Positional encoding adds information about each token’s position in the sequence to its embedding. Because the model processes all tokens simultaneously rather than one after another, it needs this positional signal to understand the order of words in a sentence.
3. Encoder: The encoder is part of the neural network that processes the input text
and generates a series of hidden states that capture the meaning and context of
the text. The encoder in GPT first tokenizes the input text into a sequence of
tokens, such as individual words or sub-words. It then applies a series of self-attention layers to generate a series of hidden states that represent the input text at different levels of abstraction. Multiple layers of
the encoder are used in the transformer.
4. Outputs (shifted right): During training, the decoder learns how to guess the
next word by looking at the words before it. To do this, we move the output
sequence over one spot to the right. That way, the decoder can only use the
previous words. With GPT, we train it on a ton of text data, which helps it make
sense when it writes. The biggest version, GPT-3, has 175 billion parameters and
was trained on a massive amount of text data. Some text corpora we used to train
GPT include the Common Crawl web corpus, the BooksCorpus dataset, and the
English Wikipedia. These corpora have billions of words and sentences, so GPT
has a lot of language data to learn from.
5. Output Embeddings: As with input embeddings, models can only understand numbers, not text, so the output must be converted to a numerical format, known as
“output embeddings.” Output embeddings are similar to input embeddings and
go through positional encoding, which helps the model understand the order of
words in a sentence. A loss function is used in machine learning, which measures
the difference between a model’s predictions and the actual target values. The
loss function is particularly important for complex models like GPT language
models. The loss function adjusts some parts of the model to improve accuracy
by reducing the difference between predictions and targets. The adjustment
ultimately improves the model’s overall performance, which is great! Output
embeddings are used during both training and inference in GPT. During
training, they compute the loss function and update the model parameters.
During inference, they generate the output text by mapping the model’s
predicted probabilities of each token to the corresponding token in the
vocabulary.
6. Decoder: The positionally encoded input representation and the positionally
encoded output embeddings go through the decoder. The decoder is part of the
model that generates the output sequence based on the encoded input sequence.
During training, the decoder learns how to guess the next word by looking at the
words before it. The decoder in GPT generates natural language text based on
the input sequence and the context learned by the encoder. Like an encoder,
multiple layers of decoders are used in the transformer.
7. Linear Layer and Softmax: After the decoder produces the output embeddings,
the linear layer maps them to a higher-dimensional space. This step is necessary
to transform the output embeddings into the original input space. Then, we use
the softmax function to generate a probability distribution for each output token
in the vocabulary, enabling us to generate output tokens with probabilities.
The Concept of Attention Mechanism
Attention is all you need.
The transformer architecture outperforms earlier architectures such as Recurrent Neural Networks (RNNs) and Long Short-Term Memory networks (LSTMs) for natural language processing. The
reason for the superior performance is mainly because of the “attention mechanism”
concept that the transformer uses. The attention mechanism lets the model focus on
different parts of the input sequence when making each output token.
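To make the idea concrete, here is a minimal NumPy sketch of scaled dot-product attention, the core operation behind the attention mechanism; the query, key, and value matrices are tiny random toy data.

import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # how strongly each query attends to each key
    weights = softmax(scores, axis=-1)   # attention weights sum to 1 for each query
    return weights @ V                   # weighted combination of the values

rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(3, 4))      # 3 tokens, embedding dimension 4 (toy data)
print(scaled_dot_product_attention(Q, K, V).shape)   # (3, 4)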
• The RNNs don’t bother with an attention mechanism. Instead, they just plow
through the input one word at a time. On the other hand, Transformers can
handle the whole input simultaneously. Handling the entire input sequence, all
at once, means Transformers do the job faster and can handle more complicated
connections between words in the input sequence.
• LSTMs use a hidden state to remember what happened in the past. Still, they can
struggle to learn when there are too many layers (a.k.a. the vanishing gradient
problem). Meanwhile, Transformers perform better because they can look at all
the input and output words simultaneously and figure out how they’re related
(thanks to their fancy attention mechanism). Thanks to the attention mechanism,
they’re really good at understanding long-term connections between words.
Let’s summarize the advantages of the attention mechanism:
• It lets the model selectively focus on different parts of the input sequence instead
of treating everything the same way.
• It can capture relationships between inputs far away from each other in the
sequence, which is helpful for natural language tasks.
• It needs fewer parameters to model long-term dependencies since it only has to
pay attention to the inputs that matter.
• It’s really good at handling inputs of different lengths since it can adjust its
attention based on the sequence length.
Introduction to Machine Learning
Definition
Machine Learning (ML) is a branch of Artificial Intelligence (AI) that focuses on building
systems capable of learning and improving from experience without being explicitly
programmed. It uses algorithms to identify patterns in data and make predictions or decisions.
The key objectives of supervised learning (discussed below) are to minimize the difference between the predicted output and the actual output, i.e., to reduce error, and to have the model generalize well to unseen data.
Machine Learning is important for several reasons:
1. Data Explosion: With the massive growth in data, manual analysis is no longer
feasible. ML helps make sense of this data.
2. Real-World Applications: ML is at the heart of technologies like facial recognition,
self-driving cars, personalized recommendations, and medical diagnosis.
3. Continuous Improvement: ML models improve over time with more data, enabling
better predictions and insights.
Supervised Learning is a type of machine learning where the algorithm is trained on a labeled
dataset. The "supervision" comes from the availability of input-output pairs, where the desired
output (label) is known. The model learns to map inputs to the correct outputs and generalize
this mapping to unseen data.
Supervised learning revolves around teaching a machine to learn from labeled data. Each key
concept plays a crucial role in understanding and applying supervised learning effectively.
1. Labeled Data
Labeled data consists of input-output pairs, where each input is mapped to a known output.
The model learns this mapping during training. For example, in predicting the price of a house, the input features are attributes of the house such as size, location, and number of bedrooms, while the output label is the price of the house.
The quality and quantity of labeled data directly impact the performance of the supervised
model. High-quality labels reduce noise and improve the model's accuracy.
2. Features and Labels
Features (X):
These are independent variables or inputs that provide information to the model. In predicting crop yield from weather conditions, the input features X could be measurements such as rainfall, temperature, and humidity, while the output label Y is the predicted crop yield (e.g., in tons per hectare).
Labels (Y):
These are the dependent variables or targets the model predicts. Examples are the predicted price of a house and the predicted crop yield. For image classification, the label could be "dog,"
"cat," or "bird."
Feature Engineering
This is the crafting and selection of the right features, which is critical. Poor features may lead to suboptimal performance, even with powerful algorithms.
Training Dataset:
A subset of the data used to train the model. The model learns patterns from this dataset.
Testing Dataset:
A separate subset of data used to evaluate how well the model generalizes to unseen examples.
Train-Test Split:
The common ratio used is 70% training, 30% testing (or 80%-20%). This ensures that
the model is evaluated on data it hasn't seen during training.
Validation Set:
An additional split for hyper-parameter tuning, ensuring the testing set remains untouched until
the final evaluation. To give room for hyper-parameter tuning, one can make use of 70-20-10
split ratio where 70% is used for training, 20% for validation and 10% for testing.
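A minimal sketch of producing such a 70-20-10 split with scikit-learn's train_test_split is shown below; the feature matrix X and labels y are random toy data.

import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(100, 5)                 # 100 samples, 5 features (toy data)
y = np.random.randint(0, 2, size=100)      # binary labels

# First hold out 10% of the data as the final test set.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.10, random_state=42)
# Then carve 20% of the original data (2/9 of the remainder) out for validation.
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=2/9, random_state=42)

print(len(X_train), len(X_val), len(X_test))   # roughly 70 / 20 / 10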
Types of Supervised learning tasks
Supervised learning tasks are broadly categorized into regression and classification, based on the type
of output the model is predicting.
1. Regression Tasks
Regression involves predicting a continuous numerical value based on input features. The goal
is to model the relationship between the inputs and the output to predict a value as accurately
as possible. The output variable is continuous (e.g., height, temperature, price), and the
evaluation metrics focus on measuring prediction error. Common evaluation metrics include:
1. Mean Squared Error (MSE): Average squared difference between actual and predicted
values.
2. Root Mean Squared Error (RMSE): Square root of the MSE, interpretable in the same units
as the target variable.
3. Mean Absolute Error (MAE): Average absolute difference between predictions and true
values.
4. R² Score (Coefficient of Determination): Measures how well the model explains variance in
the data.
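These regression metrics can be computed with scikit-learn as in the following minimal sketch; the actual and predicted values are made up.

import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

y_true = [3.0, 5.0, 2.5, 7.0]    # made-up actual values
y_pred = [2.5, 5.0, 3.0, 8.0]    # made-up predicted values

mse = mean_squared_error(y_true, y_pred)
print("MSE :", mse)
print("RMSE:", np.sqrt(mse))
print("MAE :", mean_absolute_error(y_true, y_pred))
print("R2  :", r2_score(y_true, y_pred))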
2. Classification Tasks
Classification involves predicting a discrete category or class label based on input features. The
model assigns each input to one of several predefined classes. The output variable is categorical
(e.g., yes/no, spam/not spam, dog/cat). The evaluation metrics focus on the accuracy of class
predictions.
Types of Classification: Binary classification assigns each input to one of two classes (e.g., spam/not spam), while multi-class classification assigns each input to one of several classes (e.g., dog/cat/bird).
Common evaluation metrics include:
1. Accuracy: The proportion of correctly predicted observations out of all observations.
2. Precision: The proportion of correctly predicted positive observations out of all predicted positive observations.
3. Recall: The proportion of correctly predicted positive observations out of all actual positive observations.
4. F1 Score: The harmonic mean of precision and recall, providing a balance between the two.
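The classification metrics listed above can be computed with scikit-learn as in this minimal sketch; the true and predicted labels are made up.

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # made-up actual labels
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # made-up predicted labels

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1 score :", f1_score(y_true, y_pred))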
Common Supervised Learning Algorithms
Supervised learning algorithms are designed to learn from labeled datasets to make predictions.
These algorithms adjust their internal parameters by analysing input-output pairs and then
generalize this knowledge to make accurate predictions on unseen data. Below is a brief
description of some common supervised learning algorithms:
1. Linear Regression
Linear regression models the relationship between input features and a continuous output by
fitting a linear equation to the data. It minimizes the difference between predicted and actual
values using the least squares method. It is used for regression tasks. Use cases include the prediction of house prices, stock prices, or temperature.
2. Logistic Regression
Logistic regression is used for binary classification problems. It estimates the probability of an
outcome belonging to a particular class using the logistic function (sigmoid). The output is a
probability between 0 and 1, which can be thresholded to classify into two categories. It is used for classification tasks. Use cases include spam detection and medical diagnosis (e.g., disease/no disease).
3. Decision Trees
A decision tree splits the data based on feature values into branches, with each branch representing a
decision. It continues splitting until a decision (class label or predicted value) is reached at the leaf
nodes. Decision trees are easy to understand and visualize. They are used for classification and regression tasks. Use cases include customer segmentation, loan approval, and disease prediction.
4. k-Nearest Neighbors (k-NN)
k-NN classifies data points based on the majority class among the nearest 'k' neighbors. When
given a new data point (like you picking a movie), it finds the k closest neighbors based on
feature similarity by calculating the distance (often Euclidean) between data points. The new
data point is classified or predicted based on the majority label or value of its nearest neighbors.
This makes KNN ideal for tasks like recommendation systems, pattern recognition, or even
medical diagnoses, where similarity between data points plays a key role.
The algorithm is suitable for classification and regression tasks. Use cases include image classification, recommendation systems, etc.
5. Support Vector Machines (SVM)
SVM finds the hyperplane that best separates classes by maximizing the margin between the closest
points (support vectors) of each class. It can handle non-linear classification by using kernel functions.
It is best suited for classification tasks. Use Cases include text classification, image classification, etc
6. Random Forest
Random Forest is an ensemble learning method that combines multiple decision trees. Each tree is
trained on a random subset of the data, and predictions are made by aggregating the results of individual
trees, typically by voting for classification or averaging for regression, which makes it very suitable for
both classification and regression tasks. Use Cases include fraud detection, stock market prediction,
medical diagnosis etc.
7. Naive Bayes
Naive Bayes is based on Bayes' Theorem and assumes that features are independent. Despite this
simplistic assumption, it performs surprisingly well in many real-world applications, and suited for
classification tasks. Use cases include Spam filtering, sentiment analysis, document classification etc.
8. Neural Networks
Neural networks consist of layers of interconnected nodes (neurons). Each neuron processes inputs with
weights, applies an activation function, and passes the result to the next layer. Neural networks are
highly flexible and can model complex relationships in data. Neural Network algorithm is suited for
most classification and regression tasks. Use cases include image recognition, natural language
processing, autonomous driving.
Each of these algorithms has its strengths and weaknesses, and the choice of which one to use
depends on the problem at hand, the dataset size, and the required model interpretability.
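As a minimal illustration of trying out two of these algorithms, the sketch below trains logistic regression and a random forest on scikit-learn's built-in iris dataset and compares their test accuracy; it is a toy example, not a recommended evaluation protocol.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

for model in (LogisticRegression(max_iter=1000), RandomForestClassifier(random_state=0)):
    model.fit(X_train, y_train)                          # learn from the labelled training data
    acc = accuracy_score(y_test, model.predict(X_test))  # evaluate on unseen data
    print(type(model).__name__, acc)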
Unsupervised machine learning involves training models on data that does not have labelled
outputs, and focuses on finding hidden patterns, structures, and groupings in unlabelled data.
They are essential in areas where labelled data is scarce or unavailable, providing insights into
the data's underlying structure. The goal is to uncover patterns, groupings, or structures within
the data. These algorithms are commonly used for clustering, dimensionality reduction, and
anomaly detection.
Hierarchical Clustering
Hierarchical clustering builds a hierarchy of nested clusters, typically visualized as a tree-like diagram called a dendrogram, by iteratively merging the closest data points or clusters.
One of the key advantages of hierarchical clustering is its flexibility, as it does not require the
user to specify the number of clusters. Instead, the user can choose the desired level of
granularity by cutting the dendrogram at a certain point, which determines how many clusters
will be formed. However, hierarchical clustering can be computationally expensive,
particularly with large datasets, as the algorithm requires calculating the distance between all
pairs of data points. It also assumes that clusters are of roughly similar sizes and shapes, which
may not always be true in complex datasets. Despite these limitations, hierarchical clustering
is useful for data exploration, particularly when the number of clusters is unknown, and when
understanding the relationships between clusters is important.
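A minimal sketch of agglomerative (hierarchical) clustering with scikit-learn is shown below, cutting the hierarchy at two clusters; the 2-D data points are made up.

import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Two obvious groups of made-up 2-D points.
X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])

# Cut the hierarchy at the level that yields two clusters.
labels = AgglomerativeClustering(n_clusters=2).fit_predict(X)
print(labels)   # e.g., [0 0 0 1 1 1] (cluster ids may be swapped)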
Principal Component Analysis (PCA)
PCA is a dimensionality reduction technique that transforms the original features into a smaller set of uncorrelated components that capture as much of the variance in the data as possible.
PCA is widely applied in tasks like visualization of high-dimensional data, noise reduction,
and preprocessing for machine learning algorithms. For instance, in image compression,
PCA reduces the dimensionality of pixel data while keeping the image quality intact. However,
PCA assumes linear relationships between features and is sensitive to scaling, so preprocessing
steps like standardization are often necessary. Despite these limitations, PCA remains a
powerful tool for uncovering underlying patterns, speeding up computations, and reducing
overfitting in models by eliminating redundant features.
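The following minimal sketch applies standardization followed by PCA with scikit-learn, keeping two components; the data is random and purely illustrative.

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X = np.random.rand(200, 10)                     # 200 samples, 10 features (made-up data)
X_scaled = StandardScaler().fit_transform(X)    # standardize first: PCA is sensitive to scaling

pca = PCA(n_components=2)                       # keep the 2 directions of greatest variance
X_reduced = pca.fit_transform(X_scaled)
print(X_reduced.shape)                          # (200, 2)
print(pca.explained_variance_ratio_)            # share of variance captured by each component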
Deep Learning
Deep Learning is a specialized subfield of machine learning that focuses on using neural
networks with many layers to model complex patterns and representations in data. These neural
networks, often referred to as deep neural networks (DNNs), consist of multiple layers of
interconnected nodes, where each node performs simple mathematical computations. As data
flows through the network, each layer extracts increasingly complex features. For example, in
image processing, the first layers might detect edges, while deeper layers can recognize more
complex structures like faces or objects. This deep architecture allows the model to
automatically learn hierarchical representations of data, eliminating the need for manual feature
extraction.
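As a small, self-contained illustration of a multi-layer neural network, the sketch below trains scikit-learn's MLPClassifier with three hidden layers on the built-in digits dataset; real deep learning work would typically use a framework such as TensorFlow or PyTorch, so this is only a stand-in.

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Three hidden layers: earlier layers learn simpler features, deeper layers combine them.
net = MLPClassifier(hidden_layer_sizes=(128, 64, 32), max_iter=300, random_state=0)
net.fit(X_train, y_train)
print("Test accuracy:", net.score(X_test, y_test))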
One of the major strengths of deep learning is its ability to handle large amounts of high-
dimensional data, making it especially effective for tasks such as image recognition, speech
recognition, natural language processing (NLP), and autonomous systems. Popular deep
learning architectures include Convolutional Neural Networks (CNNs), which are used for
analyzing image data, Recurrent Neural Networks (RNNs), which are suited for time-series
or sequential data like speech or text, and Transformers, which are commonly used in NLP
tasks like language translation and text generation. These models are able to learn from vast
amounts of labeled data, often achieving superior performance compared to traditional machine
learning methods.
However, deep learning models have some challenges. They require significant computational
resources to train, particularly for large datasets. Specialized hardware such as Graphics
Processing Units (GPUs) is commonly used to accelerate training. Additionally, deep learning
models often require large amounts of labeled data to effectively learn patterns and generalize
well to new data. Despite these demands, deep learning has led to breakthrough advancements
in fields like autonomous driving, medical imaging, and AI-driven content generation. With
continuous advancements in computational power and algorithms, deep learning remains a
leading force in the development of artificial intelligence.