
NLP: Decoding Human Language with AI

Bridging Words and Machines with Next-Gen Linguistics

Mauricio Henriquez Schott, Ph.D.


October 31, 2023
Index

1 Introduction to Natural Language Processing (NLP)

2 Large Language Models (LLM)

3 LLM - Prompt Engineering

4 Conclusions

5 Additional Material

6 Glossary of Terms and Acronyms

7 References

List of Figures
1  Phases of an NLP system.
2  (a) In the encoder-decoder architecture, the input sequence is first encoded into a state vector, which is then used to decode the output sequence. (b) A transformer layer; encoder and decoder modules are built from stacks of transformer layers [Hol23].
3  NLP Models Size [Dat23].
4  Embeddings: Word-Association Vector Representation [Dev23].
5  Transformers Architecture [Vas+17].
6  LLM and API Interaction, Query-Chain Example.
7  LLM and API Interaction, Agents [Gre23].
8  LLM Advanced Features, Retrieval-Augmented Generation (RAG) Example [Nam23].
9  LM Studio [Stu23].
10 KoboldCpp [Kob23].
11 Oobabooga Text Generation Web UI [Web23].

List of Tables
1 Current LLM Models.

Listings
1 Python summary code for a LSTM-LLM Model Implementation (lstm-llm01.py)
2 Python summary code for a Transformers-LLM Model Implementation (transformers-llm02.py)
3 Python summary code for OpenAI LLM Query-Chain (chatest01.py)
4 Python summary code for Langchain Agents (autoagents-openai-01.py)
5 Python summary code for Autogen Agents (autoagents-local llm-02.py)
6 Python summary code for Ollama+RAG (rag.py)


1 Introduction to Natural Language Processing (NLP)

For many years, the primary means of interacting with computers was through physical
devices such as keyboards and mice. These interfaces, though reliable and precise, posed a
stark contrast to the natural ways humans communicate with each other. With the advent of
advanced Natural Language Processing (NLP) and Artificial Intelligence (AI) technologies,
we now envision a future where interacting with computers could be as seamless as having
a conversation with a fellow human.
NLP, a field at the intersection of computer science, artificial intelligence, and linguistics,
strives to enable computers to understand, interpret, and generate human language in a
valuable manner. The core of recent advancements in NLP is attributed to Large Language
Models (LLMs), which are deep learning models with a vast number of parameters that are
trained on extensive corpora of text data. These models learn to predict the probability of a
word or phrase given its context in a sentence, thereby gaining an understanding of syntax
and semantics.
LLMs, such as GPT-3 and BERT, have shown remarkable abilities in understanding and
generating human-like text, opening doors to numerous applications including but not limited
to text summarization, translation, and question-answering systems. The paradigm shift
toward more natural human-machine interaction is becoming apparent with the proliferation
of voice assistants like Amazon’s Alexa and Apple’s Siri, which utilize NLP and LLM
technologies to understand and respond to user queries in natural language.
However, we are on the cusp of further advancements as researchers and practitioners
aim to push the boundaries of what LLMs can achieve. The recent unveiling of models
like GPT-4, with its enhanced reasoning capabilities, and BLOOM, with a staggering 176
billion parameters, indicates a trajectory towards more sophisticated language understanding
and generation. These models not only promise a future of more intuitive human-machine
interactions but also a vast landscape of applications that could revolutionize industries from
healthcare to education and beyond.
While the silver screen has often romanticized the idea of seamless human-machine
communication, from Star Trek's universal translator to Tony Stark's J.A.R.V.I.S. and
films like "Her" and "Blade Runner," the reality is that we are inching closer
to such futuristic scenarios. The continuous efforts in the field of NLP and the evolution of
LLMs are bridging the gap between the natural ease of human conversation and the digital
dialogue with machines, marking a significant stride towards a future where the keyboard
and mouse may become relics of a bygone era.
Pop Culture References:
• Star Trek's Translator: Depicts real-time language translation.
• J.A.R.V.I.S. in Iron Man: An AI that understands and assists Tony Stark.
• C-3PO in Star Wars: A droid fluent in over six million forms of communication, showcasing the potential of advanced linguistic processing.


Decoding NLP: More than Just a Buzzword


Defining NLP
Natural Language Processing (NLP) is an AI domain that bridges human language with
machine understanding. It’s not just about translating words, but grasping meaning, context,
and sentiment.
Interdisciplinary Roots
NLP isn’t just computer science. It’s a fusion of linguistics, cognitive psychology, and tech,
working together to decipher human language complexities.
LLMs: The Pinnacle of Modern NLP
Language Models, especially the newer Large Language Models (LLMs) like GPT and BERT,
represent the forefront of Natural Language Processing (NLP) advancements. Trained on
vast datasets, these models excel in understanding, generating, and interacting with human
language. As cutting-edge embodiments of NLP, LLMs are pushing the boundaries of what
machines can achieve in terms of linguistic capabilities.
Applications in Daily Life:
• Search Engines: Google uses NLP to understand and rank web pages.
• Virtual Assistants: Siri and Alexa process user commands using NLP.

Why NLP is the Talk of the Town


Ubiquity of NLP
From auto-correcting texts to voice-activated smart homes, NLP is everywhere. It’s revolutionizing
how we interact with machines and access information.
Real-World Impact
NLP analyzes public sentiment during global events, such as elections, providing insights that
can shape strategies and decisions.
Challenges and Limitations
While NLP has made significant strides, it’s not without challenges. Factors like sarcasm,
cultural nuances, and multilingual contexts can complicate language processing. Continued
research is critical to refining and enhancing NLP capabilities.
Modern Use Cases:
• Sentiment Analysis: Companies gauge product reception through online reviews.
• Chatbots: Businesses use NLP-driven bots for customer service.
• Machine Translation: NLP powers real-time language translation tools, breaking down communication barriers across the globe.


NLP: Under the Hood


Software Foundations
At the heart of NLP systems lie sophisticated algorithms often built on frameworks like
TensorFlow, PyTorch, and Hugging Face’s Transformers. These software libraries provide
the tools to design, train, and implement large-scale language models.

Hardware Backbone
NLP models, especially LLMs, demand immense computational power. Companies leverage
GPUs and TPUs for training and inference. Specialized hardware like NVIDIA’s A100
Tensor Core GPUs play a pivotal role in accelerating NLP tasks.

Optimization Techniques
To deploy NLP efficiently, techniques like model pruning, quantization, and distillation
are used. These reduce model size without sacrificing much accuracy, ensuring smoother
deployment on edge devices.
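As a hedged illustration of one of these techniques, the sketch below applies PyTorch dynamic quantization to a toy model; the layer sizes and file names are placeholders, not part of any production recipe.

import os
import torch
import torch.nn as nn

# A toy model standing in for a trained NLP model.
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 2))

# Dynamic quantization: Linear weights are stored as 8-bit integers and dequantized on the fly at inference.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

# Rough size comparison via serialized checkpoints; the int8 checkpoint is roughly 4x smaller.
torch.save(model.state_dict(), "fp32.pt")
torch.save(quantized.state_dict(), "int8.pt")
print(os.path.getsize("fp32.pt"), os.path.getsize("int8.pt"))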
Steps to Create an NLP Model (a minimal fine-tuning sketch follows the list):

• Data Collection: Gather and preprocess a vast dataset or corpus, usually textual data, from diverse sources.

• Model Architecture: Choose an appropriate neural network architecture, such as RNNs, CNNs, or Transformers.

• Training: Feed the data into the model, adjusting weights using backpropagation and optimization algorithms.

• Evaluation: Test the model's performance on unseen data, assessing its accuracy and understanding.

• Fine-tuning: Adjust the model using smaller, domain-specific datasets for specialized tasks.

• Deployment: Integrate the trained model into applications, ensuring it can handle real-world inputs efficiently.
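A minimal sketch of these steps using the Hugging Face Transformers and Datasets libraries; the model, dataset, and hyperparameters below are illustrative choices, not a prescription.

from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

# Data Collection: a small public sentiment corpus stands in for a domain-specific dataset.
dataset = load_dataset("imdb")

# Model Architecture: a pre-trained Transformer with a classification head.
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=2)

# Preprocessing: tokenize the raw text.
def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length")
dataset = dataset.map(tokenize, batched=True)

# Training / Fine-tuning: adjust the pre-trained weights on (a slice of) the task data.
args = TrainingArguments(output_dir="out", num_train_epochs=1, per_device_train_batch_size=8)
trainer = Trainer(model=model, args=args,
                  train_dataset=dataset["train"].shuffle(seed=42).select(range(1000)),
                  eval_dataset=dataset["test"].select(range(500)))
trainer.train()

# Evaluation: measure performance on unseen data.
print(trainer.evaluate())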

From Lexemes to Intent: Decoding Language’s DNA


Understanding NLP Steps
NLP unravels human language through layered processes, from dissecting text to discerning
intent. These steps—lexical to pragmatic analysis—ensure machines grasp our language’s
depth, enabling more natural human-computer interactions.
Phases of an NLP System (a brief spaCy illustration follows Figure 1):


• Lexical analysis (morphological): Lexical refers to the collection of words and phrases in a language. Lexical analysis breaks whole chunks of text into words, paragraphs, and sentences, and includes identifying and analyzing the structure of words. For example, a word like 'dishonest' can be broken into 'dis-honest.'

• Syntactic analysis: Syntactic analysis checks the grammar of a sentence and arranges words in a manner that represents the relationships among them. For example, 'Car you a have' has no correct meaning and is not grammatically correct, so it is rejected by syntactic analysis.

• Semantic analysis: Semantic analysis checks the text's meaningfulness and extracts the correct meaning from the text. Examples: cold coffee, iced tea, etc.

• Discourse integration: The meaning of a sentence is interpreted in the context of the sentences that come before and after it.

• Pragmatic analysis: Pragmatic analysis deals with the overall communication and social context, reinterpreting what was said to derive its actual meaning. For example, 'Open the door' is interpreted as a request rather than an order.

Figure 1: Phases of an NLP system.
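To make the first phases concrete, here is a small sketch using spaCy (it assumes the en_core_web_sm pipeline is installed via python -m spacy download en_core_web_sm). Each token's lemma illustrates the lexical/morphological level, while the dependency relation illustrates the syntactic level.

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The brave knight rode a fast horse.")

for token in doc:
    # Lexical/morphological level: the token, its lemma, and its part of speech.
    # Syntactic level: the dependency relation linking the token to its head word.
    print(f"{token.text:10} lemma={token.lemma_:10} pos={token.pos_:6} dep={token.dep_:10} head={token.head.text}")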


2 Large Language Models (LLM)

The odyssey of Large Language Models (LLMs) commenced in the 1960s with the inception
of the first-ever chatbot, Eliza, engineered by MIT savant Joseph Weizenbaum. Despite its
elementary pattern recognition ability, Eliza kindled the flames of what would later burgeon
into the sophisticated realm of Natural Language Processing (NLP) we are familiar with
today. Over the ensuing decades, a slew of significant innovations drove the field of LLMs
forward. Notable among these milestones was the unveiling of Long Short-Term Memory
(LSTM) networks in 1997, which heralded the creation of deeper and more intricate neural
networks capable of wrangling vast amounts of data. A further leap was witnessed with the
introduction of Stanford’s CoreNLP suite in 2010, offering a toolkit to tackle complex NLP
tasks such as sentiment analysis and named entity recognition.
The plot thickened in 2011 with the launch of Google Brain, a venture that equipped
researchers with formidable computing resources and datasets, alongside avant-garde features
like word embeddings, enabling NLP systems to better grasp the context of words. This
initiative paved the way for monumental advancements like the introduction of Transformer
models in 2017, which, in turn, birthed more sophisticated LLMs such as OpenAI’s GPT-
3, serving as the bedrock for ChatGPT and a plethora of other awe-inspiring AI-driven
applications.
In the contemporary scene, LLMs have demonstrated remarkable prowess in a multitude
of NLP tasks, evolving conversational AI, and showcasing impressive results where they
can generate contextually relevant and coherent responses, thus propelling the widespread
adoption of chatbots and virtual assistants. The narrative of LLMs took a cinematic turn
with the advent of GPT-3 by OpenAI in July 2020, a behemoth in the LLM arena at the
time, trained to predict the ensuing word in a sentence akin to a text message autocomplete
feature, but on a grandiloquent scale.
Fast forward to 2023, the realm of LLMs is abuzz with next-gen models like GPT-4, which
has showcased astounding capabilities with complex reasoning, advanced coding proficiency,
and human-level performance in multiple academic exams. The landscape is now dotted with
illustrious models like GPT-3 from OpenAI, PaLM or LaMDA from Google, Galactica or
OPT from Meta, Megatron-Turing from Nvidia/Microsoft, and Jurassic-1 from AI21 Labs,
each vying for the crown in a burgeoning kingdom of linguistic prowess.
However, amidst this effulgence of advancements, the domain of LLMs is not devoid
of ethical quandaries. The very essence of LLMs, their ability to generate text based on
colossal datasets, beckons a plethora of ethical and moral concerns. These range from the
propensity of LLMs to perpetuate existing biases present in the training data, to serious
contemplations regarding the responsibility for outputs generated by LLMs. As LLMs
continue to proliferate and permeate various facets of society, the dialogue around their
ethical and moral implications is burgeoning, with scholars and practitioners alike delving
into topics like the capacity for moral self-correction in LLMs, the knowledge of cultural moral
norms, and the practical and ethical challenges posed by LLMs, especially in education.


The trajectory of LLMs is a testament to human ingenuity and the relentless pursuit
of knowledge, mirrored in the ceaseless advancement of these linguistic behemoths. As we
stand on the cusp of further groundbreaking discoveries in this domain, the tale of LLMs is
far from over; it’s a riveting saga that continues to unfold, with each chapter promising a
blend of awe, enlightenment, and a cadre of ethical deliberations awaiting resolution.
LLMs - Delving into the Architecture
Architecture (Transformers)
LLMs typically employ a Transformer architecture, which consists of an encoder and a
decoder. However, models like GPT-3 only use the decoder part. This architecture handles
sequential data efficiently, making it apt for language processing.

Embedding Layer
The first step in processing text is converting words into numerical vectors using an embedding
layer. This transformation captures semantic relationships between words.

Self-Attention Mechanism
The Transformer utilizes a self-attention mechanism, allowing each input sequence element
to focus on different parts, capturing dependencies regardless of sequence positions.

Deciphering Tokens
Tokens represent chunks of text, like words or characters, that the model processes. For
instance, in GPT-3, a model might have a token limit of 2048 tokens, while GPT-4 could
potentially handle even more. This means that for a chat interaction, GPT-3 can consider
up to 2048 tokens in a single input-output sequence, determining its response based on that
context.
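A quick way to see tokens in practice is OpenAI's tiktoken tokenizer; the short sketch below counts the tokens a chat model would consume for a given string, which is what gets compared against the context limit.

import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")  # the BPE tokenizer used by OpenAI's chat models
text = "NLP bridges words and machines."
token_ids = enc.encode(text)

print(token_ids)              # a short list of integer token ids
print(len(token_ids))         # the count that is measured against the model's token limit
print(enc.decode(token_ids))  # decoding recovers the original text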


Figure 2: (a) In the encoder-decoder architecture, the input sequence is first encoded into a state vector, which is then used to decode the output sequence. (b) A transformer layer; encoder and decoder modules are built from stacks of transformer layers [Hol23].

Beyond Words: Navigating the Landscape of Modern LLMs


Current LLM Models
In the current landscape of 2023, Large Language Models (LLMs) have witnessed significant
advancements in size and capabilities. OpenAI's GPT-4 stands as one of the largest models,
with an undisclosed parameter count widely believed to be among the highest of any existing
model. It is multimodal, capable of handling both text and image inputs and producing text
outputs. Other notable LLMs include BLOOM, with 176 billion parameters, and models like
ChatGPT, LaMDA, and Jurassic-1, each specializing in various tasks and domains.
The core technical facets of LLMs revolve around parameters and tokens. Parameters
are the elements within the model that are fine-tuned during the training process. They
are essentially the weights in the neural networks that help the model learn and generalize
from the training data to unseen data. The number of parameters is a key metric that often
correlates with the model’s capability; for instance, crossing the threshold of 100 billion
parameters has been associated with enhanced reasoning abilities in LLMs.
Tokens are the basic units of text that LLMs process. In essence, a token can be as short
as one character or as long as one word, and in some cases, it could encapsulate even more
information. The tokenization process breaks down input text into these manageable units
which the model then processes to understand the text, generate responses, or perform other
specified tasks.
Furthermore, the field has witnessed a dichotomy between proprietary and open-source
models. While proprietary models like GPT-4 often come with advanced capabilities, there
is a significant push towards the democratization of LLMs. Recent efforts aim at advancing
open-source smaller models by distilling knowledge from larger, often proprietary, models.
This endeavor seeks to reduce computational requirements and to allow broader access to
and utilization of LLM capabilities.

Model   | N° Parameters | Context (tokens) | Open Source | VRAM Requirements  | Company    | Main Purpose
GPT-3   | 175B          | 2k               | No          | High               | OpenAI     | Conversational AI
GPT-4   | 1.76T         | 32k              | No          | Very High          | OpenAI     | Conversational AI, Text Generation
BERT    | 1.5B          | 512              | Yes         | Depends on Task    | Google     | Text Classification, NER, etc.
Llama2  | 7B, 13B, 70B  | 4k               | Yes         | 16 GB, 32 GB, ...  | Meta       | Conversational AI, Code Interpreter, etc.
Vicuna  | 7B            | 16k              | Yes         | 16 GB              | LMSYS      | Conversational AI
Mistral | 7B            | 8k               | Yes         | 16 GB              | Mistral AI | Conversational AI
Orca    | 3B, 7B, 13B   | 8k               | Yes         | 8 GB, 16 GB, 32 GB | Microsoft  | Progressive Learning
Table 1: Current LLM Models.

The GPT-4 model, in particular, has exhibited a broad spectrum of capabilities, including
complex reasoning, advanced coding ability, and proficiency in multiple academic domains,
showcasing a trajectory of rapid advancement and a promise of near human-level performance
in certain tasks [tab.1].

Parameter Proliferation: A Glimpse into LLMs’ Expanding Complexity

Figure 3: NLP Models Size [Dat23].


NLP/LLM Models Size Evolution

• The size of Large Language Models (LLMs) has seen a substantial increase, following a trend of roughly 10x growth in parameters per year for several consecutive years, akin to a new form of Moore's Law.

• This evolution reflects the models' pursuit of better semantic understanding and general-purpose language processing capabilities.

• For instance, GPT-2, which was finalized in 2019, had 1.5 billion parameters and was noted for its ability to produce convincing prose.

• Following this, GPT-3, with 175 billion parameters, made a significant leap in model size.

• Furthermore, a model named PaLM showcased almost 3 times the parameter count of GPT-3, tallying at 540 billion parameters.

• GPT-4 (2023), with a speculated parameter count ranging from over 1 trillion to 170 trillion, exhibits advanced capabilities like enhanced text generation, image handling, interactive chatting, and better business decision-making, representing a significant advancement in Large Language Models.

• This trajectory, however, raises concerns regarding computational resources, as LLMs require massive amounts of data and computational power during training and operation, which may lead to diminishing returns, increased costs, and added complexity.

Mastering LLM Customization


Specialized Training
Specialized training tailors LLMs to specific tasks or domains, optimizing their performance.
Fine-tuning, a prevalent method, adjusts the LLM’s weights based on a custom dataset,
enhancing the model’s capabilities for the intended applications.
Dataset Utilization
Datasets aligned with the target domain or task are crucial. They should be sufficiently
large and well-structured, representing the target domain effectively. The data can be raw
text or structured, and preprocessing like cleaning and tokenization is vital before training.
Innovative Techniques
Techniques like SteerLM have emerged to customize LLMs, aligning their responses with user
preferences. Such innovations continue to push the boundaries of what LLMs can achieve,
making them more user-centric and application-specific.


Sampling and Scaling:

• Tools: Tools like Hugging Face provide diverse datasets and pre-trained models, enabling customization for various applications (see the sketch after this list).
• Scaling: As models scale, the data requirements grow exponentially. For instance, GPT-3's training data was vast, but a values-targeted dataset with just 80 text samples was used to refine its behavior.
• Code Samples: Practical examples and code samples are crucial for understanding the customization process, aiding in fine-tuning models efficiently for specific tasks.
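A minimal sketch of pulling a pre-trained model and a dataset from the Hugging Face Hub; the model and dataset names are illustrative.

from transformers import pipeline
from datasets import load_dataset

# A small pre-trained model, downloaded from the Hub and used for text generation.
generator = pipeline("text-generation", model="gpt2")
print(generator("Large Language Models are", max_new_tokens=20)[0]["generated_text"])

# Datasets for fine-tuning can be pulled the same way.
reviews = load_dataset("imdb", split="train[:1%]")
print(reviews[0]["text"][:100], reviews[0]["label"])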

LLMs - Further Dive and Application


Feed-forward Neural Networks
After the attention mechanism, the model applies feed-forward neural networks at each
sequence position in parallel.

Layer Normalization and Residual Connections


These elements aid in training deep networks by preventing the vanishing gradient problem
and facilitating the re-use of previously learned features.

Parameter Sharing and Training


Some LLM architectures share parameters across layers, reducing the total parameter count
and aiding generalization across tasks. Training involves large datasets and supervised,
unsupervised, or semi-supervised learning.

Understanding Input Parameters


When we say a model has "7 billion input parameters," it refers to the number of tunable
weights in the model. These weights are adjusted during training to help the model recognize
and learn patterns in data. Essentially, the number of parameters gives us an idea of the
model’s complexity and its capacity to learn from vast amounts of data.


LLMs - Embeddings, Translating to a Lower-Dimensional Space


The discourse surrounding Large Language Models (LLMs) is significantly underpinned by
the concept of embeddings, a critical element in the domain of language processing. The
understanding of embeddings commences with an introduction to vector spaces, where words
or phrases are mapped to vectors of real numbers. This mapping facilitates the application of
mathematical operations on linguistic entities, thereby enabling the computation of semantic
similarities and discernment of relationships among words or phrases.
The concept of a vector database is central to the discussion of embeddings. A vector
database is a structured repository where each word or phrase is associated with a vector
in a high-dimensional space. These dimensions encapsulate various linguistic or semantic
properties. For instance, consider a simplistic 2-dimensional vector space, where one dimension
might represent a word’s tense, and the other its plurality. In practical scenarios, vector
spaces are multi-dimensional, often encompassing hundreds or thousands of dimensions, each
capturing a nuanced linguistic or semantic trait.
The historical evolution of embeddings in LLMs can be traced back to seminal techniques
like Word2Vec and GloVe. These techniques were instrumental in mapping words to vectors
in a manner that facilitates the capture of semantic nuances in a machine-understandable
format. The advent of these techniques marked the dawn of modern Natural Language
Processing (NLP), paving the way for the development of sophisticated LLMs like BERT
and GPT-3.
In the domain of LLMs, embeddings serve as the underlying mechanism that powers the
ability of these models to comprehend and generate text. They are the critical components
that enable LLMs to perceive the subtleties of language, thereby enhancing the naturalness
and intuitiveness of interactions with machines.
However, the advancement in embeddings presents a set of challenges. The increasing
size of embeddings, driven by ever-expanding datasets and the quest for higher accuracy,
necessitates substantial computational power. The larger the model, the more the computational
resources required, and the more intricate the embeddings, the longer the training duration.
The narrative is further complicated by the emergence of multimodal models, which aim
to understand not merely text, but a variety of data types, necessitating a combination
of embeddings. The computational demands, along with the need for efficient storage and
access to large embedding matrices, pose significant challenges.
Furthermore, as the pursuit for richer representations in higher dimensions continues, the
curse of dimensionality becomes a pertinent concern. This phenomenon results in certain
computations becoming exponentially more challenging as the dimensionality increases.
The challenges associated with embeddings necessitate the development of optimized
algorithms, efficient hardware, and novel methodologies to manage, store, and compute
embeddings. These challenges present a fertile ground for innovation and mark the pathway
for the next generation of advancements in LLMs.
The domain of embeddings is crucial for the advancement of LLMs, and a thorough
understanding of this domain is akin to possessing a critical lexicon that aids in deciphering
the complex language of machines. While the challenges are substantial, the potential for
groundbreaking advancements in LLMs through a mastery of embeddings is boundless.


In the realm of Large Language Models (LLMs), understanding the concept of word embeddings
and vector spaces is pivotal. This understanding begins with the transformation of words
into vectors in a manner that can be comprehended and manipulated mathematically. Here,
an example is provided to elucidate this concept further.
Consider three words: King, Queen, and Man. The objective is to encapsulate the
semantic relationship between these words in a mathematical format. In a simplified model,
let’s assign vectors to these words such that the vector operations reveal semantic relationships.
Let:
Vector(King) = [3, 1]
Vector(Queen) = [2, 2]
Vector(Man) = [3, 0]
Now, we aim to capture the gender relationship between these words. Intuitively, we
could say that a King is to Queen as a Man is to Woman. Mathematically, this relationship
can be represented using vector arithmetic as follows:
Vector(King) − Vector(Man) = Vector(Queen) − Vector(Woman)
Substituting the known vectors, we get:
[3, 1] − [3, 0] = [2, 2] − Vector(Woman)
This simplifies to:
[0, 1] = [2, 2] − Vector(Woman)
Now, solving for Vector(Woman), we obtain Vector(Woman) = [2, 2] − [0, 1] = [2, 1], which
satisfies the relationship. This operation provides a simplistic representation of how gender
relationships can be captured using vector arithmetic.
In practical scenarios, the vector space is multi-dimensional, often comprising hundreds
or thousands of dimensions, and the vectors are obtained through training on large datasets.
For instance, models like Word2Vec or GloVe are trained on vast corpora of text to learn
vectors for words such that the vector arithmetic reveals semantic relationships akin to the
one demonstrated above.
This example provides a glimpse into the powerful capability of embeddings in capturing
semantic relationships, which is central to the operation of Large Language Models.
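The same toy arithmetic can be reproduced directly in NumPy, a minimal sketch using the vectors assigned above:

import numpy as np

king, queen, man = np.array([3, 1]), np.array([2, 2]), np.array([3, 0])

# Solve Vector(King) - Vector(Man) = Vector(Queen) - Vector(Woman) for Vector(Woman).
woman = queen - (king - man)
print(woman)  # [2 1]

# In real embedding spaces, "closeness" is usually measured with cosine similarity.
def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine(king - man, queen - woman))  # 1.0: the two difference vectors point the same way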

Even a small multi-dimensional space provides the freedom to group semantically similar
items together and keep dissimilar items far apart. Position (distance and direction) in
the vector space can encode semantics in a good embedding. For example, the following
visualizations of real embeddings show geometrical relationships that capture semantic relations
like the relation between a country and its capital [Dev23].


Embeddings: Word-Association Vector Representation

Figure 4: Embeddings: Word-Association Vector Representation [Dev23].

LLM Implementation using Torch and a LSTM Model


LSTM using Torch
The listing below offers a simplified, minimal representation of a language-model training
process using PyTorch and the LSTM architecture. In real-life applications, such as GPT-3,
the datasets are in the terabyte range, with models comprising billions of parameters.
While LSTMs can capture sequential information, state-of-the-art models like GPT-3 use
Transformer architectures, which excel at capturing long-range dependencies and context.
Training these mammoth models requires extensive computational resources and can take
weeks on specialized hardware [list.1].


Python & Torch Library LSTM-LLM Example:


import torch
import torch.nn as nn

# Simulated dataset for demonstration. Real LLMs use terabytes of diverse texts.
data = "Once upon a time in a land far away, there was a brave knight..."

# Tokenization: convert text into tokens. Real scenarios often use subword or character tokens.
tokens = data.lower().split()
vocab = sorted(set(tokens))
word_to_idx = {w: i for i, w in enumerate(vocab)}
idx_to_word = {i: w for w, i in word_to_idx.items()}

# Create sequences of SEQUENCE_LENGTH words used to predict the next word.
SEQUENCE_LENGTH = 4
sequences = [(tokens[i:i + SEQUENCE_LENGTH], tokens[i + SEQUENCE_LENGTH])
             for i in range(len(tokens) - SEQUENCE_LENGTH)]

# Define the simple language model.
class SimpleLM(nn.Module):
    def __init__(self, vocab_size):
        super().__init__()
        # Embedding: converts word indices to vectors. In GPT-3 this layer alone has millions of parameters.
        self.embedding = nn.Embedding(vocab_size, 50)
        # LSTM: learns patterns in sequences. GPT-3 uses Transformers for better long-range dependencies.
        self.lstm = nn.LSTM(50, 100, batch_first=True)
        # Fully connected layer: produces next-word scores. Real LLMs can have billions of parameters.
        self.fc = nn.Linear(100, vocab_size)

    def forward(self, x):
        x = self.embedding(x)
        lstm_out, _ = self.lstm(x)
        return self.fc(lstm_out[:, -1, :])  # predict from the last time step

# Training setup
model = SimpleLM(len(vocab))
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

# Training loop: real LLM training can take weeks on powerful hardware.
for epoch in range(100):
    for seq, next_token in sequences:
        optimizer.zero_grad()
        input_seq = torch.tensor([[word_to_idx[w] for w in seq]])
        target = torch.tensor([word_to_idx[next_token]])
        loss = criterion(model(input_seq), target)
        loss.backward()
        optimizer.step()

# Text generation: repeatedly predict the next word and append it to the prompt.
def generate_text(model, start_text, length=10):
    words = start_text.lower().split()
    for _ in range(length):
        context = [w for w in words[-SEQUENCE_LENGTH:] if w in word_to_idx]
        input_seq = torch.tensor([[word_to_idx[w] for w in context]])
        with torch.no_grad():
            next_idx = model(input_seq).argmax(dim=1).item()
        words.append(idx_to_word[next_idx])
    return " ".join(words)

print(generate_text(model, "Once upon a time"))


Listing 1: Python summary code for a LSTM-LLM Model Implementation
(lstm-llm01.py)


Main Steps in the Code:

1. Tokenize the dataset and establish a vocabulary.
2. Construct the LSTM-based language model.
3. Train the model on next-word sequence predictions.
4. Generate text from the trained model.

Transformers: More than Meets the Eye


Attention is all you need
A new generation of powerful language models began with a breakthrough in 2017, when the
landmark paper "Attention Is All You Need" [Vas+17] introduced a revolutionary architecture
called the Transformer. This encoder-decoder architecture, composed of stacks of transformer
layers and depicted below, quickly became popular for Natural Language Processing (NLP)
problems.

Key Characteristics:

• Its innovative use of attention mechanisms and parallel processing set this model apart from traditional Convolutional Neural Networks (CNNs) and recurrent Long Short-Term Memory (LSTM) networks. The network processes data sequences in parallel and uses attention layers to simulate the focus of attention in the human brain.

• This mechanism connects relationships between words in the text, making it much more efficient to process large sequences. As a result, the parallel nature of this architecture takes full advantage of graphics processors, and the attention layer eliminates the forgetting problem that plagues recurrent networks.

• An attention layer can contain many attention heads; their activations represent the significant associations learned by the model during training.


1.- Inputs and Input Embeddings


The tokens entered by the user are considered inputs for machine learning models. However,
models understand numbers, not text, necessitating the conversion of these inputs into a
numerical format called “input embeddings.” Input embeddings place words in a mathematical
space where similar words are nearby. During training, the model learns to create these
embeddings, ensuring vectors represent words with similar meanings.
2.- Positional Encoding
The word order in a sentence is vital for its meaning. Traditional machine learning models
don’t inherently understand input order. Positional encoding encodes each word’s position in
the input sequence. This inclusion in the Transformer architecture allows GPT to understand
word order, producing grammatically and semantically correct outputs.
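The original Transformer uses fixed sinusoidal positional encodings. A minimal NumPy sketch of that scheme (the function name and toy sizes are ours):

import numpy as np

def positional_encoding(seq_len, d_model):
    # PE(pos, 2i) = sin(pos / 10000^(2i/d_model)); PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))
    positions = np.arange(seq_len)[:, None]                            # (seq_len, 1)
    dims = np.arange(d_model)[None, :]                                 # (1, d_model)
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])                              # even dimensions use sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])                              # odd dimensions use cosine
    return pe

print(positional_encoding(seq_len=6, d_model=8).round(2))              # one distinct pattern per position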
3.- Encoder
The encoder tokenizes the input text into tokens, like words or sub-words. It then applies self-
attention layers to generate hidden states representing the input text at various abstraction
levels. The transformer uses multiple encoder layers.

Figure 5: Transformers Architecture [Vas+17].

4.- Outputs (shifted right)


During training, the decoder predicts the next word by referencing preceding words. GPT
is trained on vast text data, with GPT-3 boasting 175 billion parameters. Training data
includes the Common Crawl web corpus, BooksCorpus dataset, and English Wikipedia.


5.- Output Embeddings


Like input embeddings, the model outputs must be translated to a numerical format, “output
embeddings.” These undergo positional encoding. A loss function measures the model’s
prediction accuracy, adjusting parts of the model to enhance accuracy. Output embeddings
are used in both training and inference in GPT.

6.- Decoder
Positionally encoded input representation and output embeddings are processed by the
decoder. It generates the output sequence based on the encoded input sequence. Like
the encoder, the transformer uses multiple decoder layers.

7.- Linear Layer and Softmax


After the decoder, the linear layer maps the output embeddings to a higher-dimensional
space. Then, the softmax function generates a probability distribution for each output
token in the vocabulary.
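A tiny PyTorch sketch of this final step, assuming a toy vocabulary and a random decoder state, just to make the linear-projection-plus-softmax idea concrete:

import torch
import torch.nn as nn

vocab_size, embed_size = 10000, 512
decoder_output = torch.randn(1, embed_size)                # decoder state for the last position (toy values)
to_logits = nn.Linear(embed_size, vocab_size)              # the "linear layer": one score per vocabulary token
probs = torch.softmax(to_logits(decoder_output), dim=-1)   # probability distribution over the vocabulary
next_token_id = torch.argmax(probs, dim=-1)                # greedy decoding picks the most likely token
print(next_token_id.item(), probs.sum().item())            # the probabilities sum to 1.0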


LLM Implementation using Torch and a Transformers Model


Python & Torch Library Transformers-LLM Example:
import torch  # Popular library for deep learning.
import torch.nn as nn  # Neural network modules from PyTorch.

class SelfAttention(nn.Module):
    def __init__(self, embed_size, heads):
        super().__init__()
        # Multi-head attention computes how each token attends to the others (Q, K, V projections inside).
        self.attention = nn.MultiheadAttention(embed_size, heads, batch_first=True)

    def forward(self, values, keys, query, mask):
        # Compute attention and return the result.
        out, _ = self.attention(query, keys, values, key_padding_mask=mask)
        return out

class TransformerBlock(nn.Module):
    def __init__(self, embed_size, heads, dropout, expansion):
        super().__init__()
        self.attention = SelfAttention(embed_size, heads)
        # Normalization and feed-forward layers surrounding the attention sub-layer.
        self.norm1 = nn.LayerNorm(embed_size)
        self.norm2 = nn.LayerNorm(embed_size)
        self.feed_forward = nn.Sequential(
            nn.Linear(embed_size, expansion * embed_size),
            nn.ReLU(),
            nn.Linear(expansion * embed_size, embed_size),
        )
        self.dropout = nn.Dropout(dropout)

    def forward(self, value, key, query, mask):
        # Process through attention, then residual connections and layer normalization.
        attention = self.attention(value, key, query, mask)
        x = self.norm1(attention + query)
        forward = self.feed_forward(x)
        return self.dropout(self.norm2(forward + x))

class Transformer(nn.Module):
    def __init__(self, vocab_size, embed_size, num_layers, heads, device, expansion, dropout):
        super().__init__()
        # Define embeddings and a stack of transformer blocks.
        self.embedding = nn.Embedding(vocab_size, embed_size)
        self.transformer_blocks = nn.ModuleList(
            [TransformerBlock(embed_size, heads, dropout, expansion) for _ in range(num_layers)]
        )
        self.fc_out = nn.Linear(embed_size, vocab_size)
        self.device = device

    def forward(self, x, mask):
        # Process the token sequence through the embeddings and transformer blocks.
        x = self.embedding(x)
        for block in self.transformer_blocks:
            x = block(x, x, x, mask)
        return self.fc_out(x)

# Instantiate the model (vocab is assumed to be defined as in Listing 1).
model = Transformer(len(vocab), 512, 6, 8, torch.device("cuda"), 4, 0.1)
Listing 2: Python summary code for a Transformers-LLM Model Implementation
(transformers-llm02.py)


Main Steps in the Code:

1. Tokenize the dataset and establish a vocabulary.
2. Construct the Transformer-based architecture:
   a) Define the self-attention mechanism to weigh word relationships.
   b) Build Transformer blocks with attention and feed-forward layers.
   c) Incorporate positional embeddings to capture word order.
3. Train the model on sequence predictions.
4. Generate text from the trained model.

Transformers using Torch


This code defines a simplified Transformer model using PyTorch. The SelfAttention class
computes how words in a sentence relate to each other. The TransformerBlock integrates
attention with other neural layers for processing sequences. The main Transformer class
combines these blocks to model relationships in input data over multiple processing stages.
When instantiated, model represents this Transformer architecture, ready to be trained on
data. [list.2].

Attention vs. Traditional Methods


The transformer architecture surpasses others like Recurrent Neural Networks (RNNs) or
Long Short-Term Memory (LSTM) networks in natural language processing due to its "attention
mechanism". While RNNs process input sequentially, Transformers tackle the entire input
simultaneously, leading to faster processing and the recognition of intricate connections
between words.
Why Not LSTMs?
LSTMs utilize a hidden state for recalling past events, but they falter with too many layers
due to the vanishing gradient problem. Transformers, however, discern how input and output
words correlate by observing them simultaneously, excelling in recognizing long-term word
connections.
Summing It Up (a scaled dot-product attention sketch follows this list):
• The attention mechanism enables models to focus selectively on various parts of the input sequence.
• It captures distant input relationships, beneficial for natural language tasks.
• It requires fewer parameters for modeling long-term dependencies, emphasizing only relevant inputs.
• It manages varying input lengths well by adjusting its attention based on sequence length.
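As a concrete illustration of the mechanism summarized above, here is a minimal scaled dot-product self-attention sketch in PyTorch (the function name and toy shapes are ours, not part of any library):

import math
import torch

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)
    weights = torch.softmax(scores, dim=-1)  # how strongly each position attends to every other position
    return weights @ V, weights

x = torch.randn(1, 4, 8)                           # a toy sequence: 4 tokens, 8-dimensional vectors
out, attn = scaled_dot_product_attention(x, x, x)  # self-attention: Q, K and V come from the same sequence
print(out.shape, attn.shape)                       # (1, 4, 8) and (1, 4, 4): one weight per pair of positions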


Unleashing LLM Power with APIs


LLM API Overview
LangChain and similar frameworks for accessing LLMs from programming languages such as
Python provide tools and APIs to facilitate the integration of LLMs into projects, harnessing
advanced language processing capabilities [Lan23].
Ease of Application Development
These APIs simplify the process of creating applications powered by LLMs, providing
components and interfaces for managing interactions with remote and local language models.
Open-Source Framework
LangChain and similar libraries are open-source frameworks that serve as an abstraction layer
for applications utilizing LLMs, promoting innovation and ease of customization.
Key Features:
• Model Interface: Generic interface to a variety of foundational models.
• Prompt Management: Framework for managing prompts efficiently.
• Central Interface: Interface to long-term memory, external data, and other LLMs.

Figure 6: LLM and API Interaction, Query-Chain Example.


Langchain and OpenAI API Example


Langchain & OpenAI Query-Chain Example:
# Extracting text from the PDF
for page in pdf_reader.pages:
    text += page.extract_text()

# Initializing the text splitter with the specified chunk size and overlap
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200,
                                               length_function=len)
# Splitting the text into chunks for further processing
chunks = text_splitter.split_text(text=text)

# Checking whether the vector store already exists; if not, creating a new one
if os.path.exists(f"{store_name}.pkl"):
    # Loading the vector store from disk...
    VectorStore = pickle.load(f)
else:
    # Creating embeddings for the text chunks
    embeddings = OpenAIEmbeddings()
    # Creating a new vector store from the text chunks
    VectorStore = FAISS.from_texts(chunks, embedding=embeddings)
    # Saving the vector store to disk for future use...
    pickle.dump(VectorStore, f)

# Accepting user questions/queries
query = st.text_input("Ask questions about your PDF file:")

# Performing a similarity search to find the most relevant text chunks
docs = VectorStore.similarity_search(query=query, k=3)

# Initializing the OpenAI LLM
llm = OpenAI()
# Loading a QA chain for processing the query
chain = load_qa_chain(llm=llm, chain_type="stuff")
with get_openai_callback() as cb:
    # Running the QA chain to get a response
    response = chain.run(input_documents=docs, question=query)

st.write(response)  # Displaying the response to the user


Listing 3: Python summary code for OpenAI LLM Query-Chain (chatest01.py)

LLM API Access


The code creates a Streamlit web application where users can upload a PDF, extract its
text, and break it into chunks. These chunks are then transformed into embeddings using
OpenAI and stored efficiently using FAISS. Users can then input questions related to the
PDF content, and the application employs an OpenAI Large Language Model to search for
the most relevant chunks and provide corresponding answers. [list.3].


Agent Alchemy: Transmuting Text to Tasks


LangChain Agents
LangChain provides a foundation for working with agents that use LLMs to choose sequences
of actions. It employs a language model as a reasoning engine, determining actions based
on user inputs [Lan23].

AutoGen: Microsoft’s LLM Agent Creator


AutoGen, a Microsoft initiative, simplifies the development of LLM-based applications with
a multi-agent framework, supporting complex workflows and conversations among agents to
collaboratively solve tasks [Aut23].

DeepPavlov Agent: Multi-Skill Conversational Maestro


DeepPavlov, an open-source library, enables the creation of multi-skill conversational agents
using LLMs for sophisticated, context-aware dialog systems handling diverse tasks and
queries. [Tea23].
Supplementary Insights:
• LLM Utilization: All three platforms harness the power of Large Language Models (LLMs) to enhance their functionalities.
• Agent Framework: They provide frameworks for developing and managing agents that interact with users or other systems.
• Task Automation: Agents across these platforms can be programmed to automate tasks and provide intelligent responses.
• Modular Design: They support modular designs, allowing for the integration of various skills, tools, and external APIs.

Figure 7: LLM and API Interaction, Agents [Gre23].


Langchain Agent Example


Langchain & Agents Example:
llm = ChatOllama(model="llama2")  # Creating an instance of ChatOllama (or ChatOpenAI, etc.)

@tool  # Decorating the function so it is recognized as a tool within the LangChain environment
def get_word_length(word: str) -> int:
    """Returns the length of a word."""
    return len(word)  # Returns the length of the given word

tools = [get_word_length]

# Creating a chat prompt template
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are very powerful assistant, but bad at calculating lengths of words."),
    ("user", "{input}"),
    MessagesPlaceholder(variable_name="agent_scratchpad"),
])

# Binding the formatted tools to the language model instance
llm_with_tools = llm.bind(
    functions=[format_tool_to_openai_function(t) for t in tools]  # Formatting the tools with LangChain's render helper
)

# Invoking the agent with an input question
agent.invoke({
    "input": "how many letters in the word education?",
    "intermediate_steps": []
})

# Using the AgentFinish class to identify when the agent has completed its task
intermediate_steps = []
while True:  # Loop until a finish signal is received
    output = agent.invoke({
        "input": "how many letters in the word education?",
        "intermediate_steps": intermediate_steps
    })
    if isinstance(output, AgentFinish):  # If the output is an AgentFinish instance, stop
        final_result = output.return_values["output"]
        break
    else:
        print(output.tool, output.tool_input)
        tool = {"get_word_length": get_word_length}[output.tool]
        observation = tool.run(output.tool_input)             # Run the selected tool
        intermediate_steps.append((output, observation))      # Record the step for the agent's scratchpad

print(final_result)  # Output the final result

# Creating an instance of AgentExecutor with the defined agent and tools
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

# Invoking the agent executor with an input question
agent_executor.invoke({"input": "how many letters in the word education?"})
Listing 4: Python summary code for Langchain Agents (autoagents-openai-01.py)

LLM Agent APIs

The code showcases how agent APIs such as LangChain and AutoGen initialize models, define
tools, create prompt templates, bind tools to models, and orchestrate the agent pipeline
to process input and produce output. Agent APIs in general facilitate the creation,
management, and execution of agents, offering a structured approach to building and running
interactive AI systems [list.4].


Autogen Agent Example


Autogen & Agents Example:
import autogen

# Configuration for initializing a language model with API details.
# Example remote configuration: specifying GPT-4 with an OpenAI API key.
# config_list = [
#     {
#         'model': 'gpt-4',
#         'api_key': 'sk-AOeQyDCyLIb8k-----'
#     }
# ]

# Local configuration: an OpenAI-compatible server (e.g., LM Studio) running on localhost.
config_list = [
    {
        "api_type": "open_ai",
        "api_base": "http://localhost:1234/v1",
        "api_key": "NULL"
    }
]

# Settings for the LLM, including timeout and randomness control.
llm_config = {
    "request_timeout": 600,
    "seed": 42,
    "config_list": config_list,
    "temperature": 0
}

# Instantiate the AssistantAgent and UserProxyAgent for interactions.
assistant = autogen.AssistantAgent(name="CTO", llm_config=llm_config,
                                   system_message="Chief technical officer")
user_proxy = autogen.UserProxyAgent(name="user_proxy", llm_config=llm_config,
                                    system_message="Reply TERMINATE if task is solved, else reply CONTINUE.")

# Define tasks and initiate chats for them.
task1 = "Write python code to output numbers 1 to 100 and store in a file"
user_proxy.initiate_chat(assistant, message=task1)

# Initiate another chat with the assistant to handle the second task.
# The task message is passed to the assistant through the user_proxy agent.
task2 = "Modify the code to output numbers 1 to 200"
user_proxy.initiate_chat(assistant, message=task2)
Listing 5: Python summary code for Autogen Agents (autoagents-local llm-02.py)

LLM Agent APIs


The code integrates the autogen framework to create a conversational interface between a
user and an AI "assistant" agent. This agent, named "CTO", is tasked with generating
Python code based on the user’s instructions. The interface also employs a UserProxyAgent
to mediate interactions and handle specific commands. Notably, with the integration of LM
Studio [Stu23], the system can seamlessly switch between using local models and those hosted
by OpenAI or other remote servers. This flexibility ensures that developers can maintain a
consistent codebase without having to adjust for the model’s location, be it local or remote.
[list.5].

Enhanced Access to LLM Capabilities


Cloud Enablers for LLM Exploration
Hugging Face, Google Colab, and Weights & Biases are key cloud platforms facilitating easy
access to Large Language Models (LLMs). Hugging Face provides a repository of pre-trained
models, while Google Colab offers a collaborative notebook environment with computational
resources, making LLM experimentation and deployment accessible to a wider audience
[Hig23; Col23; Bia23].

Local and Remote LLM Interactions


Ollama, PrivateGPT, GPT4All, and other frameworks simplify running LLMs locally or
remotely, with GPU acceleration when available, providing access APIs and a CLI for
interaction and helping developers integrate LLM capabilities (chat, query-chains and
retrieval, RAG, etc.) into their applications [Oll23; Pri23; GPT23].

User-Friendly LLM Interfacing: Diverse Paths to Text Generation


LM Studio, KoboldCpp, and Oobabooga’s Text Generation WebUI embody the essence of
making LLMs accessible to a broader audience. They provide diverse avenues for both
novices and seasoned users to explore, interact with, and leverage the power of LLMs for
various text generation tasks [Stu23; Kob23; Web23].

Figure 8: LLM Advanced Features, Retrieval-Augmented Generation (RAG) Example [Nam23].


Langchain & Ollama RAG Example


Ollama+RAG Example:
# RAG prompt
from langchain import hub  # Import the hub module from langchain
QA_CHAIN_PROMPT = hub.pull("rlm/rag-prompt-llama")  # Pull a ready-made RAG prompt from the LangChain hub

# LLM
llm = Ollama(model="llama2",  # Specify the language model to use
             verbose=True,    # Set verbose to True for more detailed output
             callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]))  # Stream tokens to stdout
print(f"Loaded LLM model {llm.model}")  # Output the loaded language model

# QA chain
from langchain.chains import RetrievalQA  # Import the RetrievalQA class
qa_chain = RetrievalQA.from_chain_type(
    llm,  # Specify the language model to use
    retriever=vectorstore.as_retriever(),  # Set up a retriever backed by the vector store
    chain_type_kwargs={"prompt": QA_CHAIN_PROMPT},  # Pass the RAG prompt to the chain
)

# Ask a question
question = f"What are the latest headlines on {url}?"  # Formulate a question ...
result = qa_chain({"query": question})  # Pass the question to the QA chain and store the result
Listing 6: Python summary code for Ollama+RAG (rag.py)

RAG
The RAG (Retrieval-Augmented Generation), refers to augmenting the generative capabilities
of LLMs with external knowledge retrieval. This is achieved by integrating a retrieval
mechanism that fetches relevant information from external sources, which is then used by the
LLM to generate more informed and contextually relevant responses. It’s a way to expand
the knowledge base of LLMs beyond what they were trained on, enhancing their ability to
provide accurate and up-to-date responses [list.6].


Figure 9: LM Studio [Stu23]

Figure 10: KoboldCpp [Kob23].

Easy Access to LLM Capabilities with UI


LM Studio
LM Studio is an easy-to-use desktop app for experimenting with local and open-source Large
Language Models (LLMs). This cross-platform app lets you use models through the in-app
Chat UI or via an OpenAI-compatible local server, with any model from Hugging Face.

KoboldCpp
A simple one-file way to run various GGML and GGUF models with KoboldAI’s UI

Oobabooga
A Gradio web UI for Large Language Models.

Figure 11: Oobabooga Text Generation Web UI [Web23].


3 LLM - Prompt Engineering


Prompt Engineering: The LLM Whisperer
System and User Messages
System messages are predefined instructions guiding the model’s behavior, while user messages
are input queries from the user. Example: a system message sets a polite tone, a user message
asks for information.

Chain-of-Thought (CoT) Prompting


CoT prompting asks the model to work through intermediate reasoning steps before producing
a final answer, maintaining a coherent line of reasoning across the exchange; this is crucial
for tasks requiring a logical flow of ideas.

Few-Shot Prompting
Few-shot prompting presents a model with a task alongside a few examples of the desired
output, aiding the model in understanding the task at hand.
Examples and Applications
• Prompt Tuning: Fine-tuning LLMs using specific prompts to enhance performance on desired tasks.
• Contextual Prompts: Utilizing previous conversation turns as prompts to maintain context.
• Task Descriptors: Employing task descriptions within prompts to guide LLMs in task execution.
System and User Messages Example
System message: "Respond in a formal tone." User message: "Tell me about the weather."
Model response: "The weather today is quite pleasant with a high of 75°F."
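As a sketch of how system and user messages are passed in practice, assuming the 0.x-era openai Python SDK available when this was written and an API key in the environment:

import openai  # assumes OPENAI_API_KEY is set in the environment

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "Respond in a formal tone."},  # system message: guides behavior
        {"role": "user", "content": "Tell me about the weather."},   # user message: the actual query
    ],
)
print(response["choices"][0]["message"]["content"])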

Chain-of-Thought Example
User Message: Which is a faster way to get home? Option 1: Take a 10-minute bus, then a
40-minute bus, and finally a 10-minute train. Option 2: Take a 90-minute train, then a
45-minute bike ride, and finally a 10-minute bus. Since Option 1 takes 60 minutes and
Option 2 takes 145 minutes, Option 1 is faster.
Which is a faster way to get to work? Option 1: Take a 1000-minute bus, then a half-hour
train, and finally a 10-minute bike ride. Option 2: Take an 800-minute bus, then an
hour-long train, and finally a 30-minute bike ride.
Model response: Option 1 will take 1000+30+10 = 1040 minutes. Option 2 will take
800+60+30 = 890 minutes. Since Option 2 takes 890 minutes and Option 1 takes 1040
minutes, Option 2 is faster.


n-shot Prompting
n-shot prompting, which includes zero-shot and few-shot prompting, refers to techniques where
the model is provided with zero or a few examples to learn and generalize from.

Generated Knowledge Prompting


This technique leverages the model’s ability to generate knowledge based on the provided
prompts, allowing for more informed and contextually relevant responses.

Optimizing Prompts
Optimizing prompts involves developing and refining prompts to efficiently utilize LLMs
across various applications and research topics, enhancing the model’s capabilities and
understanding its limitations.
Advanced Techniques
• Tree of Thoughts: An extension of CoT prompting that explores and evaluates several reasoning branches instead of a single chain.
• Prompt Efficiency: Ensuring prompts are concise and effective to reduce computational resources.
• Prompt Variability: Experimenting with varying prompt structures to explore the model's response diversity.
Few-Shot Example
User message:
This is awesome! // Negative
This is bad! // Positive
Wow, that movie was rad! // Positive
What a horrible show! //
Model response: Negative. // Correct
(The demonstration labels above are deliberately shuffled; the model still picks up the task format and labels the last sentence correctly.)
User message:
The odd numbers in this group add up to an even number: 4, 8, 9, 15, 12, 2, 1.
A: The answer is False.
The odd numbers in this group add up to an even number: 17, 10, 19, 4, 8, 12, 24.
A: The answer is True.
The odd numbers in this group add up to an even number: 16, 11, 14, 4, 8, 13, 24.
A: The answer is True.
The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1.
A:

Model response: The answer is True. // Incorrect: the odd numbers (15, 5, 13, 7, 1) sum to 41, which is odd. Few-shot examples alone are often not enough for reasoning tasks of this kind, which is where CoT prompting helps.
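For reference, a sketch of how such few-shot demonstrations can be templated programmatically with LangChain's prompt classes [Lan23] (example texts and labels are illustrative, and here the labels are the correct ones):

# fewshot_sketch.py -- few-shot sentiment prompt built with LangChain's template classes.
from langchain.prompts import FewShotPromptTemplate, PromptTemplate

examples = [
    {"text": "This is awesome!", "label": "Positive"},
    {"text": "What a horrible show!", "label": "Negative"},
]

# How each demonstration is rendered.
example_prompt = PromptTemplate(
    input_variables=["text", "label"],
    template="Text: {text}\nSentiment: {label}",
)

few_shot = FewShotPromptTemplate(
    examples=examples,
    example_prompt=example_prompt,
    suffix="Text: {input}\nSentiment:",     # the unanswered query the model must complete
    input_variables=["input"],
)

print(few_shot.format(input="Wow, that movie was rad!"))
# The formatted string can be sent to any LLM backend (OpenAI, a local model, etc.).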




Practical Applications
Prompt engineering finds applications in building better AI-powered services, customer-facing chatbots, and industry-specific document generation, among other areas.

Challenges and Considerations


Striking the right balance between prompt simplicity and effectiveness, handling out-of-distribution queries, and ensuring model robustness are key considerations in prompt engineering.

Future Directions
Future work includes exploring advanced techniques, learning from in-context prompts, and evolving prompt-engineering practices to keep pace with advancements in LLMs.
Future Horizons
• Automated Prompt Engineering: Leveraging algorithms for automatic prompt generation and optimization.
• Cross-Model Prompting: Designing prompts compatible across different LLM architectures.
• Community-Driven Prompt Repositories: Establishing shared repositories for effective prompts, aiding in the democratization of prompt engineering practices.




4 Conclusions

In reflecting upon the domain of Natural Language Processing (NLP), Large Language
Models (LLMs), and their associated APIs, several pivotal conclusions can be drawn. Firstly,
NLP has undoubtedly revolutionized the realm of human-computer interactions, facilitating
seamless and intuitive dialogues. LLMs, with their expansive architecture, particularly the
Transformer framework, have furthered this frontier by capturing intricate linguistic nuances
and dependencies.
The advent of the attention mechanism in LLMs has propelled their efficacy, allowing
them to dynamically prioritize information in vast textual data. The integration of modern
APIs, exemplified by the likes of Langchain, has democratized access to these models,
enabling developers and businesses to harness the power of LLMs with ease.
While the current trajectory of NLP and LLMs suggests a future replete with advancements,
it is crucial to navigate this domain with a keen understanding of the underlying mechanics,
especially as they become more integrated into daily life and critical business operations. The
blend of theoretical knowledge and practical implementation is paramount for the sustainable
and ethical growth of this field.




5 Additional Material

Web Sites - Online Videos

1. Tech and Futurism: LLM API Videos


2. Exploring ML and AI: LLM and Agent Videos
3. CluodYeti: LLM and AI Videos




6 Glossary of Terms and Acronyms

• NLP (Natural Language Processing): AI's subfield for machine understanding of human language.
• LLM (Large Language Model): Deep models like GPT-3 and BERT for NLP.
• Transformer Architecture: Neural network known for its self-attention mechanism.
• Attention Mechanism: Weighs input sequence importance in Transformers.
• Token: Text units, such as words or sub-words, for NLP processing.
• Embedding: Maps words to vectors based on semantic similarity.
• Encoder-Decoder: Transformer's components for processing inputs and generating outputs.
• Positional Encoding: Gives Transformers token position information.
• BERT (Bidirectional Encoder Representations from Transformers): Pre-trained model for contextual word understanding.
• GPT (Generative Pre-trained Transformer): LLM for text generation, translation, etc.
• RAG (Retrieval-Augmented Generation): Combines LLMs with knowledge retrieval.
• Vector Storage: Saves and retrieves high-dimensional vectors, often for embeddings.
• LLM Agents: Software entities using LLMs for tasks and interactions.



7 References
[Vas+17] Ashish Vaswani et al. "Attention is all you need". In: Advances in Neural Information Processing Systems 30 (2017).
[Aut23] AutoGen. AutoGen: A Framework for Developing LLM Applications using Multi-Agent Conversations. https://microsoft.github.io/autogen. Accessed: 2023-10-22. 2023.
[Bia23] Weights & Biases. Weights & Biases: The AI Developer Platform. Platform for managing machine learning workflows, tracking experiments, and versioning datasets. 2023. url: https://wandb.ai/site.
[Col23] Google Colab. Google Colaboratory. https://colab.research.google.com/. Accessed: 2023-10-22. 2023.
[Dat23] Harish Datalab. Unveiling the Power of Large Language Models (LLMs). Accessed: yyyy-mm-dd. 2023. url: https://medium.com/@harishdatalab/unveiling-the-power-of-large-language-models-llms-e235c4eba8a9.
[Dev23] Google Developers. Translating to a Lower Dimensional Space. Machine Learning Crash Course. Google. 2023. url: https://developers.google.com/machine-learning/crash-course/embeddings/translating-to-a-lower-dimensional-space.
[GPT23] GPT4All. GPT4All: Open-Source Ecosystem for Training and Deploying Large Language Models. https://docs.gpt4all.io. Accessed: 2023-10-22. 2023.
[Gre23] Cobus Greyling. "Autonomous LLM Agents". In: Medium (2023). Accessed: 2023-10-22. url: https://cobusgreyling.medium.com/autonomous-llm-agents-f05eec35b6fb.
[Hig23] Hugging Face. Hugging Face: The AI Community Building the Future. https://huggingface.co/. Accessed: 2023-10-22. 2023.
[Hol23] HolisticAI. From Transformer Architecture to Prompt Engineering. https://www.holisticai.com/blog/from-transformer-architecture-to-prompt-engineering. Accessed: 2023-10-22. 2023.
[Kob23] KoboldCpp. KoboldCpp: Easy-to-Use AI Text-Generation Software. https://llamasking.github.io/Kobold.cpp/. Accessed: 2023-10-22. 2023.
[Lan23] Langchain. LangChain: A Framework for Developing Applications Powered by Language Models. https://docs.langchain.com. Accessed: 2023-10-22. 2023.
[Nam23] Author Name. "Implementing RAG with LangChain and Hugging Face". In: Medium: International School of AI & Data Science (2023). Accessed: 2023-10-22. url: https://medium.com/international-school-of-ai-data-science/implementing-rag-with-langchain-and-hugging-face-28e3ea66c5f7.
[Oll23] Ollama. Ollama: Running Large Language Models Locally. https://ollama.ai. Accessed: 2023-10-22. 2023.
[Pri23] PrivateGPT. PrivateGPT: Privacy Layer for Large Language Models. https://github.com/imartinez/privateGPT. Accessed: 2023-10-22. 2023.
[Stu23] LM Studio. LM Studio: Cross-Platform Desktop Application for LLMs. https://github.com/curiousexplorations/lm_studio. Accessed: 2023-10-22. 2023.
[Tea23] DeepPavlov Team. DeepPavlov Agent: An Open-Source Framework for Building Multi-Skill Conversational Agents. https://deeppavlov.ai. Accessed: 2023-10-22. 2023.


[Web23] oobabooga WebUI. Oobabooga’s Text Generation WebUI: Gradio-Based Interface for LLMs.
https://github.com/oobabooga/Text-Generation-WebUI. Accessed: 2023-10-22. 2023.

