03 NLP Document
4 Conclusions
5 Additional Material
7 References
List of Figures
1 Phases of NLP.
2 (a) In the encoder-decoder architecture, the input sequence is first encoded into a state vector, which is then used to decode the output sequence; (b) a transformer layer; encoder and decoder modules are built from stacks of transformer layers [Hol23].
3 NLP Model Sizes [Dat23].
4 Embeddings: Word-Association Vector Representation [Dev23].
5 Transformers Architecture [Vas+17].
6 LLM and API Interaction, Query-Chain Example.
7 LLM and API Interaction, Agents [Gre23].
8 LLM Advanced Features, Retrieval-Augmented Generation (RAG) Example [Nam23].
9 LM Studio [Stu23].
10 KoboldCpp [Kob23].
11 Oobabooga Text Generation Web UI [Web23].
List of Tables
1 Current LLM Models.
Listings
1 Python summary code for an LSTM-LLM Model Implementation (lstm-llm01.py)
2 Python summary code for a Transformers-LLM Model Implementation (transformers-llm02.py)
3 Python summary code for an OpenAI LLM Query-Chain (chatest01.py)
4 Python summary code for Langchain Agents (autoagents-openai-01.py)
5 Python summary code for Autogen Agents (autoagents-local llm-02.py)
6 Python summary code for Ollama+RAG (rag.py)
For many years, the primary means of interacting with computers was through physical
devices such as keyboards and mice. These interfaces, though reliable and precise, posed a
stark contrast to the natural ways humans communicate with each other. With the advent of
advanced Natural Language Processing (NLP) and Artificial Intelligence (AI) technologies,
we now envision a future where interacting with computers could be as seamless as having
a conversation with a fellow human.
NLP, a field at the intersection of computer science, artificial intelligence, and linguistics,
strives to enable computers to understand, interpret, and generate human language in a
valuable manner. The core of recent advancements in NLP is attributed to Large Language
Models (LLMs), which are deep learning models with a vast number of parameters that are
trained on extensive corpora of text data. These models learn to predict the probability of a
word or phrase given its context in a sentence, thereby gaining an understanding of syntax
and semantics.
LLMs, such as GPT-3 and BERT, have shown remarkable abilities in understanding and
generating human-like text, opening doors to numerous applications including but not limited
to text summarization, translation, and question-answering systems. The paradigm shift
toward more natural human-machine interaction is becoming apparent with the proliferation
of voice assistants like Amazon’s Alexa and Apple’s Siri, which utilize NLP and LLM
technologies to understand and respond to user queries in natural language.
However, we are on the cusp of further advancements as researchers and practitioners
aim to push the boundaries of what LLMs can achieve. The recent unveiling of models
like GPT-4, with its enhanced reasoning capabilities, and BLOOM, with a staggering 176
billion parameters, indicates a trajectory towards more sophisticated language understanding
and generation. These models not only promise a future of more intuitive human-machine
interactions but also a vast landscape of applications that could revolutionize industries from
healthcare to education and beyond.
While the silver screen has often romanticized the idea of seamless human-machine
communication, from Star Trek's universal translator to Tony Stark's J.A.R.V.I.S. and
films like "Her" and "Blade Runner," the reality is that we are inching closer
to such futuristic scenarios. The continuous efforts in the field of NLP and the evolution of
LLMs are bridging the gap between the natural ease of human conversation and the digital
dialogue with machines, marking a significant stride towards a future where the keyboard
and mouse may become relics of a bygone era.
Pop Culture References:
Star Trek’s Translator: Depicts real-time language translation.
J.A.R.V.I.S. in Iron Man: An AI that understands and assists Tony Stark.
C-3PO in Star Wars: A droid fluent in over six million forms of communication,
showcasing the potential of advanced linguistic processing.
Hardware Backbone
NLP models, especially LLMs, demand immense computational power. Companies leverage
GPUs and TPUs for training and inference. Specialized hardware like NVIDIA’s A100
Tensor Core GPUs play a pivotal role in accelerating NLP tasks.
Optimization Techniques
To deploy NLP efficiently, techniques like model pruning, quantization, and distillation
are used. These reduce model size without sacrificing much accuracy, ensuring smoother
deployment on edge devices.
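As a concrete illustration, the short sketch below applies two of these techniques, pruning and dynamic quantization, to a toy PyTorch model; the layer sizes and pruning amount are illustrative assumptions, not values from a production LLM.

import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# A toy model standing in for a much larger network
model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10))

# Pruning: zero out the 30% smallest-magnitude weights of the first layer
prune.l1_unstructured(model[0], name="weight", amount=0.3)
prune.remove(model[0], "weight")  # make the pruning permanent

# Dynamic quantization: store Linear weights as int8 for a smaller, faster model
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)
print(quantized)

Distillation, the third technique, instead trains a smaller student model to mimic a larger teacher and is typically carried out as a separate training run.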
Steps to Create an NLP Model:
Data Collection: Gather and preprocess a vast dataset or Corpus, usually textual
data, from diverse sources.
Training: Feed the data into the model, adjusting weights using backpropagation and
optimization algorithms.
Evaluation: Test the model’s performance on unseen data, assessing its accuracy and
understanding.
Fine-tuning: Adjust the model using smaller, domain-specific datasets for specialized
tasks (see the sketch after this list).
Deployment: Integrate the trained model into applications, ensuring it can handle
real-world inputs efficiently.
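The following is a minimal, hedged sketch of the training, evaluation, and fine-tuning steps using the Hugging Face Trainer API; the dataset, model checkpoint, and hyperparameters are illustrative assumptions rather than recommended settings.

from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

dataset = load_dataset("imdb")  # Data Collection: a public text corpus
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length")

tokenized = dataset.map(tokenize, batched=True)  # Preprocessing

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)

args = TrainingArguments(output_dir="out", num_train_epochs=1,
                         per_device_train_batch_size=8)

trainer = Trainer(model=model, args=args,
                  train_dataset=tokenized["train"].shuffle(seed=42).select(range(2000)),
                  eval_dataset=tokenized["test"].select(range(500)))

trainer.train()            # Training / Fine-tuning on a small subset
print(trainer.evaluate())  # Evaluation on held-out data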
Pragmatic analysis: In this phase, the analysis considers the overall communicative and social
context and reinterprets what was said to recover its actual meaning.
For example, 'Pass me the water' is interpreted as a request rather than an order.
The odyssey of Large Language Models (LLMs) commenced in the 1960s with the inception
of the first-ever chatbot, Eliza, engineered by MIT savant Joseph Weizenbaum. Despite its
elementary pattern recognition ability, Eliza kindled the flames of what would later burgeon
into the sophisticated realm of Natural Language Processing (NLP) we are familiar with
today. Over the ensuing decades, a slew of significant innovations drove the field of LLMs
forward. Notable among these milestones was the unveiling of Long Short-Term Memory
(LSTM) networks in 1997, which heralded the creation of deeper and more intricate neural
networks capable of wrangling vast amounts of data. A further leap was witnessed with the
introduction of Stanford’s CoreNLP suite in 2010, offering a toolkit to tackle complex NLP
tasks such as sentiment analysis and named entity recognition.
The plot thickened in 2011 with the launch of Google Brain, a venture that equipped
researchers with formidable computing resources and datasets, alongside avant-garde features
like word embeddings, enabling NLP systems to better grasp the context of words. This
initiative paved the way for monumental advancements like the introduction of Transformer
models in 2017, which, in turn, birthed more sophisticated LLMs such as OpenAI’s GPT-
3, serving as the bedrock for ChatGPT and a plethora of other awe-inspiring AI-driven
applications.
In the contemporary scene, LLMs have demonstrated remarkable prowess in a multitude
of NLP tasks, evolving conversational AI, and showcasing impressive results where they
can generate contextually relevant and coherent responses, thus propelling the widespread
adoption of chatbots and virtual assistants. The narrative of LLMs took a cinematic turn
with the advent of GPT-3 by OpenAI in mid-2020, a behemoth in the LLM arena at the
time, trained to predict the next word in a sentence much like a text-message autocomplete
feature, but on a far grander scale.
Fast forward to 2023, the realm of LLMs is abuzz with next-gen models like GPT-4, which
has showcased astounding capabilities with complex reasoning, advanced coding proficiency,
and human-level performance in multiple academic exams. The landscape is now dotted with
illustrious models like GPT-3 from OpenAI, PaLM or LaMDA from Google, Galactica or
OPT from Meta, Megatron-Turing from Nvidia/Microsoft, and Jurassic-1 from AI21 Labs,
each vying for the crown in a burgeoning kingdom of linguistic prowess.
However, amidst this effulgence of advancements, the domain of LLMs is not devoid
of ethical quandaries. The very essence of LLMs, their ability to generate text based on
colossal datasets, beckons a plethora of ethical and moral concerns. These range from the
propensity of LLMs to perpetuate existing biases present in the training data, to serious
contemplations regarding the responsibility for outputs generated by LLMs. As LLMs
continue to proliferate and permeate various facets of society, the dialogue around their
ethical and moral implications is burgeoning, with scholars and practitioners alike delving
into topics like the capacity for moral self-correction in LLMs, the knowledge of cultural moral
norms, and the practical and ethical challenges posed by LLMs, especially in education.
The trajectory of LLMs is a testament to human ingenuity and the relentless pursuit
of knowledge, mirrored in the ceaseless advancement of these linguistic behemoths. As we
stand on the cusp of further groundbreaking discoveries in this domain, the tale of LLMs is
far from over; it’s a riveting saga that continues to unfold, with each chapter promising a
blend of awe, enlightenment, and a cadre of ethical deliberations awaiting resolution.
LLMs - Delving into the Architecture
Architecture (Transformers)
LLMs typically employ a Transformer architecture, which consists of an encoder and a
decoder. However, models like GPT-3 only use the decoder part. This architecture handles
sequential data efficiently, making it apt for language processing.
Embedding Layer
The first step in processing text is converting words into numerical vectors using an embedding
layer. This transformation captures semantic relationships between words.
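A minimal PyTorch sketch of an embedding layer follows; the vocabulary size, embedding dimension, and token ids are illustrative assumptions.

import torch
import torch.nn as nn

# Map integer token indices to dense vectors the rest of the network can process
embedding = nn.Embedding(num_embeddings=10000, embedding_dim=64)
token_ids = torch.tensor([[12, 7, 345, 2]])  # one tokenized sentence (hypothetical ids)
vectors = embedding(token_ids)
print(vectors.shape)  # torch.Size([1, 4, 64])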
Self-Attention Mechanism
The Transformer utilizes a self-attention mechanism, allowing each input sequence element
to focus on different parts, capturing dependencies regardless of sequence positions.
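The core computation is scaled dot-product attention. The sketch below is a simplified single-head version in PyTorch; the tensor shapes are illustrative, and multi-head attention, masking, and learned projections are omitted.

import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)  # pairwise relevance between positions
    weights = F.softmax(scores, dim=-1)                # attention weights for each query position
    return weights @ V                                 # weighted sum of value vectors

# Example: a batch of one sequence with 4 tokens and 8-dimensional representations
Q = K = V = torch.randn(1, 4, 8)
print(scaled_dot_product_attention(Q, K, V).shape)  # torch.Size([1, 4, 8])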
Deciphering Tokens
Tokens represent chunks of text, such as words, subwords, or characters, that the model processes.
For instance, GPT-3 has a context limit of 2048 tokens, while GPT-4 can handle considerably
more. This means that for a chat interaction, GPT-3 can consider up to 2048 tokens in a single
input-output sequence, determining its response based on that context.
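The following sketch shows how text maps to tokens, assuming the tiktoken library; the exact token count depends on the encoding used, so the numbers are illustrative.

import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
text = "Large Language Models process text as tokens, not words."
tokens = enc.encode(text)

print(len(tokens))         # number of tokens this sentence consumes from the context budget
print(tokens[:8])          # the first few integer token ids
print(enc.decode(tokens))  # decoding recovers the original text

A model with a 2048-token limit must fit both the prompt and the generated response within that budget.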
Figure 2: (a) In the encoder-decoder architecture, the input sequence is first encoded
into a state vector, which is then used to decode the output sequence; (b) a transformer
layer; encoder and decoder modules are built from stacks of transformer layers [Hol23].
Model   | N° Parameters | Tokens | Open Source | VRAM Requirements | Company    | Main Purpose
GPT-3   | 175B          | 2k     | No          | High              | OpenAI     | Conversational AI
GPT-4   | 1.76T         | 32k    | No          | Very High         | OpenAI     | Conversational AI, Text Generation
BERT    | 340M          | 512    | Yes         | Depends on Task   | Google     | Text Classification, NER, etc.
Llama2  | 7B, 13B, 70B  | 4k     | Yes         | 16Gb, 32Gb, ...   | Meta       | Conversational AI, Code Interpreter, etc.
Vicuna  | 7B            | 16k    | Yes         | 16Gb              | LMSYS      | Conversational AI
Mistral | 7B            | 8k     | Yes         | 16Gb              | Mistral AI | Conversational AI
Orca    | 3B, 7B, 13B   | 8k     | Yes         | 8Gb, 16Gb, 32Gb   | Microsoft  | Progressive Learning

Table 1: Current LLM Models.
There is a significant push towards the democratization of LLMs. Recent efforts aim at advancing
open-source smaller models by distilling knowledge from larger, often proprietary, models.
This endeavor seeks to reduce the computational requirements and to allow for broader access
to and utilization of LLM capabilities.
The GPT-4 model, in particular, has exhibited a broad spectrum of capabilities, including
complex reasoning, advanced coding ability, and proficiency in multiple academic domains,
showcasing a trajectory of rapid advancement and a promise of near human-level performance
on certain tasks [tab.1].
The size of Large Language Models (LLMs) has seen a substantial increase, aligning with
a trend of 10x growth in parameters every year for a few consecutive years, akin to a new
form of Moore’s Law.
This evolution reflects the models’ pursuit of better semantic understanding and general-
purpose language processing capabilities.
For instance, GPT-2, which was finalized in 2019, had 1.5 billion parameters and was noted
for its ability to produce convincing prose.
Following this, GPT-3, with 175 billion parameters, made a significant leap in model size.
Furthermore, a model named PaLM showcased almost 3 times the parameter count of GPT-
3, tallying at 540 billion parameters.
GPT-4 (2023), with a speculated parameter count ranging from over 1 trillion to 170 trillion,
exhibits advanced capabilities such as enhanced text generation, image handling, interactive
chatting, and better support for business decision-making, representing a significant advancement in
Large Language Models.
This trajectory, however, raises concerns regarding computational resources, as LLMs require
massive amounts of data and computational power during training and operation, which may
lead to diminishing returns, increased costs, and added complexity.
Tools: Tools like Hugging Face provide diverse datasets and pre-trained models,
enabling customization for various applications.
Scaling: As models scale, the data requirements grow exponentially. For instance,
GPT-3’s training data was vast, but a values-targeted dataset with just 80 text samples
was used to refine its behavior.
Code Samples: Practical examples and code samples are crucial for understanding
the customization process, aiding in fine-tuning models efficiently for specific tasks (a brief sketch follows this list).
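As a brief illustration of using a pre-trained model from the Hugging Face Hub, the sketch below runs a summarization pipeline; the model name and input text are illustrative assumptions.

from transformers import pipeline

# Load a pre-trained summarization model from the Hub
summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

text = ("Large Language Models are deep learning models trained on extensive corpora "
        "of text. They learn to predict the next token given its context, which gives "
        "them a working grasp of syntax and semantics.")

print(summarizer(text, max_length=40, min_length=10)[0]["summary_text"])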
In the realm of Large Language Models (LLMs), understanding the concept of word embeddings
and vector spaces is pivotal. This understanding begins with the transformation of words
into vectors in a manner that can be comprehended and manipulated mathematically. Here,
an example is provided to elucidate this concept further.
Consider three words: King, Queen, and Man. The objective is to encapsulate the
semantic relationship between these words in a mathematical format. In a simplified model,
let’s assign vectors to these words such that the vector operations reveal semantic relationships.
Let:
Vector(King) = [3, 1]
Vector(Queen) = [2, 2]
Vector(Man) = [3, 0]
Now, we aim to capture the gender relationship between these words. Intuitively, we
could say that a King is to Queen as a Man is to Woman. Mathematically, this relationship
can be represented using vector arithmetic as follows:
Vector(King) − Vector(Man) = Vector(Queen) − Vector(Woman)
Substituting the known vectors, we get:
[3, 1] − [3, 0] = [2, 2] − Vector(Woman)
This simplifies to:
[0, 1] = [2, 2] − Vector(Woman)
Now, solving for Vector(Woman), we may obtain a vector such as [2, 1] that satisfies
the relationship. This operation has provided a simplistic representation of how gender
relationships can be captured using vector arithmetic.
In practical scenarios, the vector space is multi-dimensional, often comprising hundreds
or thousands of dimensions, and the vectors are obtained through training on large datasets.
For instance, models like Word2Vec or GloVe are trained on vast corpora of text to learn
vectors for words such that the vector arithmetic reveals semantic relationships akin to the
one demonstrated above.
This example provides a glimpse into the powerful capability of embeddings in capturing
semantic relationships, which is central to the operation of Large Language Models.
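The toy vectors above can be checked directly with a few lines of numpy; these 2-D vectors are the simplified example from the text, not real learned embeddings.

import numpy as np

king = np.array([3, 1])
queen = np.array([2, 2])
man = np.array([3, 0])

# King - Man = Queen - Woman  =>  Woman = Queen - (King - Man)
woman = queen - (king - man)
print(woman)  # [2 1], matching the vector derived above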
Even a small multi-dimensional space provides the freedom to group semantically similar
items together and keep dissimilar items far apart. Position (distance and direction) in
the vector space can encode semantics in a good embedding. For example, the following
visualizations of real embeddings show geometrical relationships that capture semantic relations
like the relation between a country and its capital [Dev23].
import torch
import torch.nn as nn

# Tokenization: Convert text into tokens. Real scenarios may use subwords or characters.
tokens = tokenize(data); vocab = createVocabulary(tokens)
word_to_idx, idx_to_word = createMappings(vocab)

class SimpleLM(nn.Module):
    def __init__(self, vocab_size):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, 50)
        # LSTM: Learns patterns in sequences. GPT-3 uses Transformers for better
        # long-term dependencies.
        self.lstm = nn.LSTM(50, 100, batch_first=True)
        # Fully connected layer: Produces predictions. Real LLMs can have billions of parameters.
        self.fc = nn.Linear(100, vocab_size)

    def forward(self, x):
        x = self.embed(x)
        lstm_out, _ = self.lstm(x)
        return self.fc(lstm_out)

# Training setup
model = SimpleLM(len(vocab)); optimizer, criterion = setupTraining()

# Training loop: Real LLM training can take weeks on powerful hardware.
for epoch in range(100):
    for seq, next_token in sequences:
        optimizer.zero_grad()
        input_seq = convertToIndices(seq)
        output = model(input_seq)

# Text generation
def generate_text(model, start_text, length=10):
    ...
    return concatenateWords(words)
Key Characteristics:
Its innovative use of attention mechanisms and parallel processing sets this model apart
from traditional Convolutional Neural Networks (CNNs) and recurrent Long Short-Term
Memory (LSTM) networks. The network processes data sequences in parallel and uses
attention layers to simulate the focus of attention in the human brain.
This mechanism connects relationships between words in the text, making it much more
efficient to process large sequences. As a result, the parallel nature of this architecture took
full advantage of graphics processors, and the attention layer eliminated the problem of
forgetting that plagues recurrent networks.
In the left diagram, you can see the activations of an attention layer in action. An attention
layer can contain many attention heads. These activations represent the significant associations
learned by the model during training.
6.- Decoder
The decoder processes the positionally encoded input representation together with the output
embeddings and generates the output sequence based on the encoded input sequence. Like
the encoder, the transformer stacks multiple decoder layers.
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    def __init__(self, embed_size, heads):
        super(SelfAttention, self).__init__()
        # Initialize layers for Q, K, V, and output.
        self.layers = self._init_layers(embed_size, heads)

class TransformerBlock(nn.Module):
    def __init__(self, embed_size, heads, dropout, expansion):
        super(TransformerBlock, self).__init__()
        self.attention = SelfAttention(embed_size, heads)
        # Other layers like normalization and feed-forward are initialized here.
        self.layers = self._init_layers(embed_size, dropout, expansion)

class Transformer(nn.Module):
    def __init__(self, vocab_size, embed_size, num_layers, heads, device, expansion, dropout):
        super(Transformer, self).__init__()
        # Define embeddings and transformer blocks.
        self.embedding = nn.Embedding(vocab_size, embed_size)
        self.transformer_blocks = self._init_blocks(embed_size, heads, dropout,
                                                    expansion, num_layers)
        self.fc_out = nn.Linear(embed_size, vocab_size)
Central Interface: Interface to long-term memory, external data, and other LLMs.
import os, pickle
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import FAISS

# Checking if the vector store already exists, if not creating a new one
if os.path.exists(f"{store_name}.pkl"):
    # Loading the vector store from disk
    with open(f"{store_name}.pkl", "rb") as f:
        VectorStore = pickle.load(f)
else:
    # Creating embeddings for the text chunks
    embeddings = OpenAIEmbeddings()
    # Creating a new vector store from the text chunks
    VectorStore = FAISS.from_texts(chunks, embedding=embeddings)
    # Saving the vector store to disk for future use
    with open(f"{store_name}.pkl", "wb") as f:
        pickle.dump(VectorStore, f)
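Once the vector store exists, it can be queried for the chunks most similar to a question. A minimal usage sketch follows, assuming the VectorStore object built above; the query string and k value are illustrative.

query = "What does the document say about embeddings?"  # hypothetical question
docs = VectorStore.similarity_search(query, k=3)         # k most similar text chunks
for doc in docs:
    print(doc.page_content[:200])                        # preview each retrieved chunk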
# Importing the AgentFinish class to identify when the agent has completed its task
from langchain.schema.agent import AgentFinish

intermediate_steps = []  # (action, observation) pairs from previous iterations
while True:  # Infinite loop to continue processing until a finish signal is received
    output = agent.invoke({
        "input": "how many letters in the word education?",
        "intermediate_steps": intermediate_steps
    })
    if isinstance(output, AgentFinish):  # if the output is an instance of AgentFinish, finish
        final_result = output.return_values["output"]
        break
    else:
        print(output.tool, output.tool_input)
        tool = {"get_word_length": get_word_length}[output.tool]
        observation = tool.run(output.tool_input)         # execute the selected tool
        intermediate_steps.append((output, observation))  # feed the result back to the agent
config_list = [
    {
        "api_type": "open_ai",
        "api_base": "http://localhost:1234/v1",
        "api_key": "NULL"
    }
]
# Initiate another chat with the assistant to handle the second task. The task
# message is passed to the assistant through the user_proxy agent.
user_proxy.initiate_chat(assistant, message=task2)
Listing 5: Python summary code for Autogen Agents (autoagents-local llm-02.py)
These agents can be backed by a locally hosted model server (the configuration above points
at http://localhost:1234/v1) or by models served by OpenAI or other remote servers. This
flexibility ensures that developers can maintain a consistent codebase without having to
adjust for the model's location, be it local or remote [list.5].
# LLM
llm = Ollama(
    model="llama2",  # Specify the language model to use
    verbose=True,    # Set verbose to True for more detailed output
    # Set up a callback manager with a specified callback handler
    callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]),
)
print(f"Loaded LLM model {llm.model}")  # Output the loaded language model

# QA chain
from langchain.chains import RetrievalQA  # Import the RetrievalQA class from the langchain.chains module
qa_chain = RetrievalQA.from_chain_type(
    llm,                                            # Specify the language model to use
    retriever=vectorstore.as_retriever(),           # Set up a retriever using the vector store
    chain_type_kwargs={"prompt": QA_CHAIN_PROMPT},  # Specify additional arguments for the chain type
)

# Ask a question
question = f"What are the latest headlines on {url}?"  # Formulate a question ....
result = qa_chain({"query": question})  # Pass the question to the QA chain and store the result
Listing 6: Python summary code for Ollama+RAG (rag.py)
RAG
RAG (Retrieval-Augmented Generation) refers to augmenting the generative capabilities
of LLMs with external knowledge retrieval. This is achieved by integrating a retrieval
mechanism that fetches relevant information from external sources, which the LLM then
uses to generate more informed and contextually relevant responses. It is a way to expand
the knowledge base of LLMs beyond what they were trained on, enhancing their ability to
provide accurate and up-to-date responses [list.6].
KoboldCpp
A simple one-file way to run various GGML and GGUF models with KoboldAI's UI.
Oobabooga
A Gradio web UI for Large Language Models.
Few-Shot Prompting
Few-shot prompting presents a model with a task alongside a few examples of the desired
output, aiding the model in understanding the task at hand.
Examples and Applications
Prompt Tuning: Fine-tuning LLMs using specific prompts to enhance performance
on desired tasks.
Contextual Prompts: Utilizing previous conversation turns as prompts to maintain
context.
Task Descriptors: Employing task descriptions within prompts to guide LLMs in
task execution.
System and User Messages Example
System message: ”Respond in a formal tone.” User message: ”Tell me about the
weather.” Model response: ”The weather today is quite pleasant with a high of 75°F.”
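A minimal sketch of sending these messages through the OpenAI chat API follows (pre-1.0 openai-python interface); the model name and API key handling are assumptions.

import openai

openai.api_key = "YOUR_API_KEY"  # placeholder; in practice load from an environment variable

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "Respond in a formal tone."},
        {"role": "user", "content": "Tell me about the weather."},
    ],
)
print(response["choices"][0]["message"]["content"])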
Chain-of-Thought Example
User Message: Which is a faster way to get home? Option 1: Take a 10-minute bus,
then a 40-minute bus, and finally a 10-minute train. Option 2: Take a 90-minute train,
then a 45-minute bike ride, and finally a 10-minute bus. Since Option 1 takes 60 minutes
and Option 2 takes 145 minutes, Option 1 is faster.
Which is a faster way to get to work? Option 1: Take a 1000-minute bus, then a half-hour
train, and finally a 10-minute bike ride. Option 2: Take an 800-minute bus, then an
hour-long train, and finally a 30-minute bike ride.
Model response: Option 1 will take 1000 + 30 + 10 = 1040 minutes. Option 2 will take
800 + 60 + 30 = 890 minutes. Since Option 2 takes 890 minutes and Option 1 takes 1040
minutes, Option 2 is faster.
n-shot Prompting
n-shot prompting, which covers zero-shot and few-shot prompting, refers to techniques where
the model is given zero or a few examples to learn and generalize from.
Optimizing Prompts
Optimizing prompts involves developing and refining prompts to efficiently utilize LLMs
across various applications and research topics, enhancing the model’s capabilities and
understanding its limitations.
Advanced Techniques
Tree of Thoughts: An extension of CoT prompting that explores multiple reasoning
paths while maintaining a logical flow of ideas.
Prompt Efficiency: Ensuring prompts are concise and effective to reduce
computational resources.
Prompt Variability: Experimenting with varying prompt structures to explore the
model’s response diversity.
Few-Shot Example
User Message:
This is awesome! // Negative
This is bad! // Positive
Wow that movie was rad! // Positive
What a horrible show! //
Model response: Negative. // Correct
Note that the demonstration labels above are deliberately randomized; the model still returns
the correct sentiment for the final line, showing that the format of the exemplars matters as
much as the labels themselves.
User Message:
The odd numbers in this group add up to an even number: 4, 8, 9, 15, 12, 2, 1.
A: The answer is False.
The odd numbers in this group add up to an even number: 17, 10, 19, 4, 8, 12, 24.
A: The answer is True.
The odd numbers in this group add up to an even number: 16, 11, 14, 4, 8, 13, 24.
A: The answer is True.
Practical Applications
Prompt engineering finds its applications in creating better AI-powered services, customer-
facing chatbots, and industry-specific document generation among others.
Future Directions
Exploration of advanced techniques, learning from in-context prompts, and evolving prompt
engineering practices to keep pace with the advancements in LLMs.
Future Horizons
Automated Prompt Engineering: Leveraging algorithms for automatic prompt
generation and optimization.
Cross-Model Prompting: Designing prompts compatible across different LLM
architectures.
Community-Driven Prompt Repositories: Establishing shared repositories for
effective prompts, aiding in the democratization of prompt engineering practices.
4 Conclusions
In reflecting upon the domain of Natural Language Processing (NLP), Large Language
Models (LLMs), and their associated APIs, several pivotal conclusions can be drawn. Firstly,
NLP has undoubtedly revolutionized the realm of human-computer interactions, facilitating
seamless and intuitive dialogues. LLMs, with their expansive architecture, particularly the
Transformer framework, have furthered this frontier by capturing intricate linguistic nuances
and dependencies.
The advent of the attention mechanism in LLMs has propelled their efficacy, allowing
them to dynamically prioritize information in vast textual data. The integration of modern
APIs, exemplified by the likes of Langchain, has democratized access to these models,
enabling developers and businesses to harness the power of LLMs with ease.
While the current trajectory of NLP and LLMs suggests a future replete with advancements,
it is crucial to navigate this domain with a keen understanding of the underlying mechanics,
especially as they become more integrated into daily life and critical business operations. The
blend of theoretical knowledge and practical implementation is paramount for the sustainable
and ethical growth of this field.
5 Additional Material
NLP (Natural Language Processing): AI’s subfield for machine understanding of human
language.
LLM (Large Language Model): Deep models like GPT-3 and BERT for NLP.
GPT (Generative Pre-trained Transformer): LLM for text generation, translation, etc.
Vector Storage: Saves and retrieves high-dimensional vectors, often for embeddings.
LLM Agents: Software entities using LLMs for tasks and interactions.
7 References
References
[Vas+17] Ashish Vaswani et al. “Attention is all you need”. In: Advances in neural information processing
systems 30 (2017).
[Aut23] AutoGen. AutoGen: A Framework for Developing LLM Applications using Multi-Agent Conversations.
https://microsoft.github.io/autogen. Accessed: 2023-10-22. 2023.
[Bia23] Weights & Biases. Weights & Biases: The AI Developer Platform. Platform for managing machine
learning workflows, tracking experiments, and versioning datasets. 2023. url: https://wandb.
ai/site.
[Col23] Google Colab. Google Colaboratory. https://colab.research.google.com/. Accessed: 2023-
10-22. 2023.
[Dat23] Harish Datalab. Unveiling the Power of Large Language Models (LLMs). Accessed: yyyy-mm-dd.
2023. url: https://medium.com/@harishdatalab/unveiling-the-power-of-large-language-models-llms-e235c4eba8a9.
[Dev23] Google Developers. Translating to a Lower Dimensional Space. Machine Learning Crash Course.
Google. 2023. url: https://developers.google.com/machine-learning/crash-course/embeddings/translating-to-a-lower-dimensional-space.
[GPT23] GPT4All. GPT4All: Open-Source Ecosystem for Training and Deploying Large Language Models.
https://docs.gpt4all.io. Accessed: 2023-10-22. 2023.
[Gre23] Cobus Greyling. “Autonomous LLM Agents”. In: Medium (2023). Accessed: 2023-10-22. url:
https://cobusgreyling.medium.com/autonomous-llm-agents-f05eec35b6fb.
[Hig23] Hugging Face. Hugging Face: The AI Community Building the Future. https://huggingface.co/.
Accessed: 2023-10-22. 2023.
[Hol23] HolisticAI. From Transformer Architecture to Prompt Engineering. https://www.holisticai.
com/blog/from-transformer-architecture-to-prompt-engineering. Accessed: 2023-10-22.
2023.
[Kob23] KoboldCpp. KoboldCpp: Easy-to-Use AI Text-Generation Software. https://llamasking.github.io/Kobold.cpp/.
Accessed: 2023-10-22. 2023.
[Lan23] Langchain. LangChain: A Framework for Developing Applications Powered by Language Models.
https://docs.langchain.com. Accessed: 2023-10-22. 2023.
[Nam23] Author Name. “Implementing RAG with LangChain and Hugging Face”. In: Medium: International
School of AI & Data Science (2023). Accessed: 2023-10-22. url: https://medium.com/international-school-of-ai-data-science/implementing-rag-with-langchain-and-hugging-face-28e3ea66c5f7.
[Oll23] Ollama. Ollama: Running Large Language Models Locally. https://ollama.ai. Accessed: 2023-
10-22. 2023.
[Pri23] PrivateGPT. PrivateGPT: Privacy Layer for Large Language Models. https://github.com/
imartinez/privateGPT. Accessed: 2023-10-22. 2023.
[Stu23] LM Studio. LM Studio: Cross-Platform Desktop Application for LLMs. https://github.com/
curiousexplorations/lm_studio. Accessed: 2023-10-22. 2023.
[Tea23] DeepPavlov Team. DeepPavlov Agent: An Open-Source Framework for Building Multi-Skill Conversational
Agents. https://deeppavlov.ai. Accessed: 2023-10-22. 2023.
[Web23] oobabooga WebUI. Oobabooga’s Text Generation WebUI: Gradio-Based Interface for LLMs.
https://github.com/oobabooga/Text-Generation-WebUI. Accessed: 2023-10-22. 2023.