03 NLP Document
4 Conclusions
5 Additional Material
7 References
List of Figures
1 Phases of NLP.
2 (a) In the encoder-decoder architecture, the input sequence is first encoded into a state vector, which is then used to decode the output sequence; (b) a transformer layer; encoder and decoder modules are built from stacks of transformer layers [Hol23].
3 NLP Model Sizes [Dat23].
4 Embeddings: Word-Association Vector Representation [Dev23].
5 Transformers Architecture [Vas+17].
6 LLM and API Interaction, Query-Chain Example.
7 LLM and API Interaction, Agents [Gre23].
8 LLM Advanced Features, Retrieval-Augmented Generation (RAG) Example [Nam23].
9 LM Studio [Stu23].
10 KoboldCpp [Kob23].
11 Oobabooga Text Generation Web UI [Web23].
List of Tables
1 Current LLM Models.
Listings
1 Python summary code for an LSTM-LLM Model Implementation (lstm-llm01.py)
2 Python summary code for a Transformers-LLM Model Implementation (transformers-llm02.py)
3 Python summary code for an OpenAI LLM Query-Chain (chatest01.py)
4 Python summary code for Langchain Agents (autoagents-openai-01.py)
5 Python summary code for Autogen Agents (autoagents-local llm-02.py)
6 Python summary code for Ollama+RAG (rag.py)
For many years, the primary means of interacting with computers was through physical
devices such as keyboards and mice. These interfaces, though reliable and precise, posed a
stark contrast to the natural ways humans communicate with each other. With the advent of
advanced Natural Language Processing (NLP) and Artificial Intelligence (AI) technologies,
we now envision a future where interacting with computers could be as seamless as having
a conversation with a fellow human.
NLP, a field at the intersection of computer science, artificial intelligence, and linguistics,
strives to enable computers to understand, interpret, and generate human language in a
valuable manner. The core of recent advancements in NLP is attributed to Large Language
Models (LLMs), which are deep learning models with a vast number of parameters that are
trained on extensive corpora of text data. These models learn to predict the probability of a
word or phrase given its context in a sentence, thereby gaining an understanding of syntax
and semantics.
LLMs, such as GPT-3 and BERT, have shown remarkable abilities in understanding and
generating human-like text, opening doors to numerous applications including but not limited
to text summarization, translation, and question-answering systems. The paradigm shift
toward more natural human-machine interaction is becoming apparent with the proliferation
of voice assistants like Amazon’s Alexa and Apple’s Siri, which utilize NLP and LLM
technologies to understand and respond to user queries in natural language.
However, we are on the cusp of further advancements as researchers and practitioners
aim to push the boundaries of what LLMs can achieve. The recent unveiling of models
like GPT-4, with its enhanced reasoning capabilities, and BLOOM, with a staggering 176
billion parameters, indicates a trajectory towards more sophisticated language understanding
and generation. These models not only promise a future of more intuitive human-machine
interactions but also a vast landscape of applications that could revolutionize industries from
healthcare to education and beyond.
While the silver screen has often romanticized the idea of seamless human-machine
communication, from Star Trek's universal translator to Tony Stark's J.A.R.V.I.S. and
films like "Her" and "Blade Runner," the reality is that we are inching closer
to such futuristic scenarios. The continuous efforts in the field of NLP and the evolution of
LLMs are bridging the gap between the natural ease of human conversation and the digital
dialogue with machines, marking a significant stride towards a future where the keyboard
and mouse may become relics of a bygone era.
Pop Culture References:
Star Trek’s Translator: Depicts real-time language translation.
J.A.R.V.I.S. in Iron Man: An AI that understands and assists Tony Stark.
C-3PO in Star Wars: A droid fluent in over six million forms of communication,
showcasing the potential of advanced linguistic processing.
Hardware Backbone
NLP models, especially LLMs, demand immense computational power. Companies leverage
GPUs and TPUs for training and inference. Specialized hardware like NVIDIA’s A100
Tensor Core GPUs play a pivotal role in accelerating NLP tasks.
Optimization Techniques
To deploy NLP efficiently, techniques like model pruning, quantization, and distillation
are used. These reduce model size without sacrificing much accuracy, ensuring smoother
deployment on edge devices.
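As a concrete illustration, the short sketch below applies two of these techniques, pruning and dynamic quantization, to a toy PyTorch model; the layer sizes and pruning amount are illustrative assumptions, not values from a production LLM.

import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# A toy model standing in for a much larger network
model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10))

# Pruning: zero out the 30% smallest-magnitude weights of the first layer
prune.l1_unstructured(model[0], name="weight", amount=0.3)
prune.remove(model[0], "weight")  # make the pruning permanent

# Dynamic quantization: store Linear weights as int8 for a smaller, faster model
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)
print(quantized)

Distillation, the third technique, instead trains a smaller student model to mimic a larger teacher and is typically carried out as a separate training run.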
Steps to Create an NLP Model:
Data Collection: Gather and preprocess a vast dataset or Corpus, usually textual
data, from diverse sources.
Training: Feed the data into the model, adjusting weights using backpropagation and
optimization algorithms.
Evaluation: Test the model’s performance on unseen data, assessing its accuracy and
understanding.
Fine-tuning: Adjust the model using smaller, domain-specific datasets for specialized
tasks (see the sketch after this list).
Deployment: Integrate the trained model into applications, ensuring it can handle
real-world inputs efficiently.
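The following is a minimal, hedged sketch of the training, evaluation, and fine-tuning steps using the Hugging Face Trainer API; the dataset, model checkpoint, and hyperparameters are illustrative assumptions rather than recommended settings.

from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

dataset = load_dataset("imdb")  # Data Collection: a public text corpus
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length")

tokenized = dataset.map(tokenize, batched=True)  # Preprocessing

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)

args = TrainingArguments(output_dir="out", num_train_epochs=1,
                         per_device_train_batch_size=8)

trainer = Trainer(model=model, args=args,
                  train_dataset=tokenized["train"].shuffle(seed=42).select(range(2000)),
                  eval_dataset=tokenized["test"].select(range(500)))

trainer.train()            # Training / Fine-tuning on a small subset
print(trainer.evaluate())  # Evaluation on held-out data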
Pragmatic analysis: In this phase, the analysis considers the overall communicative and social
context and reinterprets what was said to recover its actual meaning.
For example, 'Pass me the water' is interpreted as a request rather than an order.
The odyssey of Large Language Models (LLMs) commenced in the 1960s with the inception
of the first-ever chatbot, Eliza, engineered by MIT savant Joseph Weizenbaum. Despite its
elementary pattern recognition ability, Eliza kindled the flames of what would later burgeon
into the sophisticated realm of Natural Language Processing (NLP) we are familiar with
today. Over the ensuing decades, a slew of significant innovations drove the field of LLMs
forward. Notable among these milestones was the unveiling of Long Short-Term Memory
(LSTM) networks in 1997, which heralded the creation of deeper and more intricate neural
networks capable of wrangling vast amounts of data. A further leap was witnessed with the
introduction of Stanford’s CoreNLP suite in 2010, offering a toolkit to tackle complex NLP
tasks such as sentiment analysis and named entity recognition.
The plot thickened in 2011 with the launch of Google Brain, a venture that equipped
researchers with formidable computing resources and datasets, alongside avant-garde features
like word embeddings, enabling NLP systems to better grasp the context of words. This
initiative paved the way for monumental advancements like the introduction of Transformer
models in 2017, which, in turn, birthed more sophisticated LLMs such as OpenAI’s GPT-
3, serving as the bedrock for ChatGPT and a plethora of other awe-inspiring AI-driven
applications.
In the contemporary scene, LLMs have demonstrated remarkable prowess in a multitude
of NLP tasks, evolving conversational AI, and showcasing impressive results where they
can generate contextually relevant and coherent responses, thus propelling the widespread
adoption of chatbots and virtual assistants. The narrative of LLMs took a cinematic turn
with the advent of GPT-3 by OpenAI in mid-2020, a behemoth in the LLM arena at the
time, trained to predict the next word in a sentence much like a text-message autocomplete
feature, but on a far grander scale.
Fast forward to 2023, the realm of LLMs is abuzz with next-gen models like GPT-4, which
has showcased astounding capabilities with complex reasoning, advanced coding proficiency,
and human-level performance in multiple academic exams. The landscape is now dotted with
illustrious models like GPT-3 from OpenAI, PaLM or LaMDA from Google, Galactica or
OPT from Meta, Megatron-Turing from Nvidia/Microsoft, and Jurassic-1 from AI21 Labs,
each vying for the crown in a burgeoning kingdom of linguistic prowess.
However, amidst this effulgence of advancements, the domain of LLMs is not devoid
of ethical quandaries. The very essence of LLMs, their ability to generate text based on
colossal datasets, beckons a plethora of ethical and moral concerns. These range from the
propensity of LLMs to perpetuate existing biases present in the training data, to serious
contemplations regarding the responsibility for outputs generated by LLMs. As LLMs
continue to proliferate and permeate various facets of society, the dialogue around their
ethical and moral implications is burgeoning, with scholars and practitioners alike delving
into topics like the capacity for moral self-correction in LLMs, the knowledge of cultural moral
norms, and the practical and ethical challenges posed by LLMs, especially in education.
The trajectory of LLMs is a testament to human ingenuity and the relentless pursuit
of knowledge, mirrored in the ceaseless advancement of these linguistic behemoths. As we
stand on the cusp of further groundbreaking discoveries in this domain, the tale of LLMs is
far from over; it’s a riveting saga that continues to unfold, with each chapter promising a
blend of awe, enlightenment, and a cadre of ethical deliberations awaiting resolution.
LLMs - Delving into the Architecture
Architecture (Transformers)
LLMs typically employ a Transformer architecture, which consists of an encoder and a
decoder. However, models like GPT-3 only use the decoder part. This architecture handles
sequential data efficiently, making it apt for language processing.
Embedding Layer
The first step in processing text is converting words into numerical vectors using an embedding
layer. This transformation captures semantic relationships between words.
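A minimal PyTorch sketch of an embedding layer follows; the vocabulary size, embedding dimension, and token ids are illustrative assumptions.

import torch
import torch.nn as nn

# Map integer token indices to dense vectors the rest of the network can process
embedding = nn.Embedding(num_embeddings=10000, embedding_dim=64)
token_ids = torch.tensor([[12, 7, 345, 2]])  # one tokenized sentence (hypothetical ids)
vectors = embedding(token_ids)
print(vectors.shape)  # torch.Size([1, 4, 64])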
Self-Attention Mechanism
The Transformer utilizes a self-attention mechanism, allowing each input sequence element
to focus on different parts, capturing dependencies regardless of sequence positions.
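The core computation is scaled dot-product attention. The sketch below is a simplified single-head version in PyTorch; the tensor shapes are illustrative, and multi-head attention, masking, and learned projections are omitted.

import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)  # pairwise relevance between positions
    weights = F.softmax(scores, dim=-1)                # attention weights for each query position
    return weights @ V                                 # weighted sum of value vectors

# Example: a batch of one sequence with 4 tokens and 8-dimensional representations
Q = K = V = torch.randn(1, 4, 8)
print(scaled_dot_product_attention(Q, K, V).shape)  # torch.Size([1, 4, 8])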
Deciphering Tokens
Tokens represent chunks of text, such as words, subwords, or characters, that the model processes.
For instance, GPT-3 has a context limit of 2048 tokens, while GPT-4 can handle considerably
more. This means that for a chat interaction, GPT-3 can consider up to 2048 tokens in a single
input-output sequence, determining its response based on that context.
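The following sketch shows how text maps to tokens, assuming the tiktoken library; the exact token count depends on the encoding used, so the numbers are illustrative.

import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
text = "Large Language Models process text as tokens, not words."
tokens = enc.encode(text)

print(len(tokens))         # number of tokens this sentence consumes from the context budget
print(tokens[:8])          # the first few integer token ids
print(enc.decode(tokens))  # decoding recovers the original text

A model with a 2048-token limit must fit both the prompt and the generated response within that budget.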
Figure 2: (a) In the encoder-decoder architecture, the input sequence is first encoded
into a state vector, which is then used to decode the output sequence; (b) a transformer
layer; encoder and decoder modules are built from stacks of transformer layers [Hol23].
Model   | N° Parameters | Tokens | Open Source | VRAM Requirements | Company    | Main Purpose
GPT-3   | 175B          | 2k     | No          | High              | OpenAI     | Conversational AI
GPT-4   | 1.76T         | 32k    | No          | Very High         | OpenAI     | Conversational AI, Text Generation
BERT    | 340M          | 512    | Yes         | Depends on Task   | Google     | Text Classification, NER, etc.
Llama2  | 7B, 13B, 70B  | 4k     | Yes         | 16Gb, 32Gb, ...   | Meta       | Conversational AI, Code Interpreter, etc.
Vicuna  | 7B            | 16k    | Yes         | 16Gb              | LMSYS      | Conversational AI
Mistral | 7B            | 8k     | Yes         | 16Gb              | Mistral AI | Conversational AI
Orca    | 3B, 7B, 13B   | 8k     | Yes         | 8Gb, 16Gb, 32Gb   | Microsoft  | Progressive Learning

Table 1: Current LLM Models.
There is a significant push towards the democratization of LLMs. Recent efforts aim at advancing
open-source smaller models by distilling knowledge from larger, often proprietary, models.
This endeavor seeks to reduce the computational requirements and to allow for broader access
to and utilization of LLM capabilities.
The GPT-4 model, in particular, has exhibited a broad spectrum of capabilities, including
complex reasoning, advanced coding ability, and proficiency in multiple academic domains,
showcasing a trajectory of rapid advancement and a promise of near human-level performance
on certain tasks [tab.1].
The size of Large Language Models (LLMs) has seen a substantial increase, aligning with
a trend of 10x growth in parameters every year for a few consecutive years, akin to a new
form of Moore’s Law.
This evolution reflects the models’ pursuit of better semantic understanding and general-
purpose language processing capabilities.
For instance, GPT-2, which was finalized in 2019, had 1.5 billion parameters and was noted
for its ability to produce convincing prose.
Following this, GPT-3, with 175 billion parameters, made a significant leap in model size.
Furthermore, a model named PaLM showcased almost 3 times the parameter count of GPT-
3, tallying at 540 billion parameters.
GPT-4 (2023), with a speculated parameter count ranging from over 1 trillion to 170 trillion,
exhibits advanced capabilities such as enhanced text generation, image handling, interactive
chatting, and better support for business decision-making, representing a significant advancement in
Large Language Models.
This trajectory, however, raises concerns regarding computational resources, as LLMs require
massive amounts of data and computational power during training and operation, which may
lead to diminishing returns, increased costs, and added complexity.
Tools: Tools like Hugging Face provide diverse datasets and pre-trained models,
enabling customization for various applications.
Scaling: As models scale, the data requirements grow exponentially. For instance,
GPT-3’s training data was vast, but a values-targeted dataset with just 80 text samples
was used to refine its behavior.
Code Samples: Practical examples and code samples are crucial for understanding
the customization process, aiding in fine-tuning models efficiently for specific tasks (a brief sketch follows this list).
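As a brief illustration of using a pre-trained model from the Hugging Face Hub, the sketch below runs a summarization pipeline; the model name and input text are illustrative assumptions.

from transformers import pipeline

# Load a pre-trained summarization model from the Hub
summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

text = ("Large Language Models are deep learning models trained on extensive corpora "
        "of text. They learn to predict the next token given its context, which gives "
        "them a working grasp of syntax and semantics.")

print(summarizer(text, max_length=40, min_length=10)[0]["summary_text"])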
In the realm of Large Language Models (LLMs), understanding the concept of word embeddings
and vector spaces is pivotal. This understanding begins with the transformation of words
into vectors in a manner that can be comprehended and manipulated mathematically. Here,
an example is provided to elucidate this concept further.
Consider three words: King, Queen, and Man. The objective is to encapsulate the
semantic relationship between these words in a mathematical format. In a simplified model,
let’s assign vectors to these words such that the vector operations reveal semantic relationships.
Let:
Vector(King) = [3, 1]
Vector(Queen) = [2, 2]
Vector(Man) = [3, 0]
Now, we aim to capture the gender relationship between these words. Intuitively, we
could say that a King is to Queen as a Man is to Woman. Mathematically, this relationship
can be represented using vector arithmetic as follows:
Vector(King) − Vector(Man) = Vector(Queen) − Vector(Woman)
Substituting the known vectors, we get:
[3, 1] − [3, 0] = [2, 2] − Vector(Woman)
This simplifies to:
[0, 1] = [2, 2] − Vector(Woman)
Now, solving for Vector(Woman), we may obtain a vector such as [2, 1] that satisfies
the relationship. This operation has provided a simplistic representation of how gender
relationships can be captured using vector arithmetic.
In practical scenarios, the vector space is multi-dimensional, often comprising hundreds
or thousands of dimensions, and the vectors are obtained through training on large datasets.
For instance, models like Word2Vec or GloVe are trained on vast corpora of text to learn
vectors for words such that the vector arithmetic reveals semantic relationships akin to the
one demonstrated above.
This example provides a glimpse into the powerful capability of embeddings in capturing
semantic relationships, which is central to the operation of Large Language Models.
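The toy vectors above can be checked directly with a few lines of numpy; these 2-D vectors are the simplified example from the text, not real learned embeddings.

import numpy as np

king = np.array([3, 1])
queen = np.array([2, 2])
man = np.array([3, 0])

# King - Man = Queen - Woman  =>  Woman = Queen - (King - Man)
woman = queen - (king - man)
print(woman)  # [2 1], matching the vector derived above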
Even a small multi-dimensional space provides the freedom to group semantically similar
items together and keep dissimilar items far apart. Position (distance and direction) in
the vector space can encode semantics in a good embedding. For example, the following
visualizations of real embeddings show geometrical relationships that capture semantic relations
like the relation between a country and its capital [Dev23].
import torch
import torch.nn as nn

# Tokenization: Convert text into tokens. Real scenarios may use subwords or characters.
tokens = tokenize(data); vocab = createVocabulary(tokens)
word_to_idx, idx_to_word = createMappings(vocab)

class SimpleLM(nn.Module):
    def __init__(self, vocab_size):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, 50)
        # LSTM: Learns patterns in sequences. GPT-3 uses Transformers for better
        # long-term dependencies.
        self.lstm = nn.LSTM(50, 100, batch_first=True)
        # Fully connected layer: Produces predictions. Real LLMs can have billions of parameters.
        self.fc = nn.Linear(100, vocab_size)

    def forward(self, x):
        x = self.embed(x)
        lstm_out, _ = self.lstm(x)
        return self.fc(lstm_out)

# Training setup
model = SimpleLM(len(vocab)); optimizer, criterion = setupTraining()

# Training loop: Real LLM training can take weeks on powerful hardware.
for epoch in range(100):
    for seq, next_token in sequences:
        optimizer.zero_grad()
        input_seq = convertToIndices(seq)
        output = model(input_seq)

# Text generation
def generate_text(model, start_text, length=10):
    ...
    return concatenateWords(words)
Key Characteristics:
Its innovative use of attention mechanisms and parallel processing sets this model apart
from traditional Convolutional Neural Networks (CNNs) and recurrent Long Short-Term
Memory (LSTM) networks. The network processes data sequences in parallel and uses
attention layers to simulate the focus of attention in the human brain.
This mechanism connects relationships between words in the text, making it much more
efficient to process large sequences. As a result, the parallel nature of this architecture took
full advantage of graphics processors, and the attention layer eliminated the problem of
forgetting that plagues recurrent networks.
In the left diagram, you can see the activations of an attention layer in action. An attention
layer can contain many attention heads. These activations represent the significant associations
learned by the model during training.
6.- Decoder
The decoder processes the positionally encoded input representation together with the output
embeddings and generates the output sequence based on the encoded input sequence. Like
the encoder, the transformer stacks multiple decoder layers.
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    def __init__(self, embed_size, heads):
        super(SelfAttention, self).__init__()
        # Initialize layers for Q, K, V, and output.
        self.layers = self._init_layers(embed_size, heads)

class TransformerBlock(nn.Module):
    def __init__(self, embed_size, heads, dropout, expansion):
        super(TransformerBlock, self).__init__()
        self.attention = SelfAttention(embed_size, heads)
        # Other layers like normalization and feed-forward are initialized here.
        self.layers = self._init_layers(embed_size, dropout, expansion)

class Transformer(nn.Module):
    def __init__(self, vocab_size, embed_size, num_layers, heads, device, expansion, dropout):
        super(Transformer, self).__init__()
        # Define embeddings and transformer blocks.
        self.embedding = nn.Embedding(vocab_size, embed_size)
        self.transformer_blocks = self._init_blocks(embed_size, heads, dropout,
                                                    expansion, num_layers)
        self.fc_out = nn.Linear(embed_size, vocab_size)
Central Interface: Interface to long-term memory, external data, and other LLMs.
import os, pickle
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import FAISS

# Checking if the vector store already exists, if not creating a new one
if os.path.exists(f"{store_name}.pkl"):
    # Loading the vector store from disk
    with open(f"{store_name}.pkl", "rb") as f:
        VectorStore = pickle.load(f)
else:
    # Creating embeddings for the text chunks
    embeddings = OpenAIEmbeddings()
    # Creating a new vector store from the text chunks
    VectorStore = FAISS.from_texts(chunks, embedding=embeddings)
    # Saving the vector store to disk for future use
    with open(f"{store_name}.pkl", "wb") as f:
        pickle.dump(VectorStore, f)
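Once the vector store exists, it can be queried for the chunks most similar to a question. A minimal usage sketch follows, assuming the VectorStore object built above; the query string and k value are illustrative.

query = "What does the document say about embeddings?"  # hypothetical question
docs = VectorStore.similarity_search(query, k=3)         # k most similar text chunks
for doc in docs:
    print(doc.page_content[:200])                        # preview each retrieved chunk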
# Importing the AgentFinish class to identify when the agent has completed its task
from langchain.schema.agent import AgentFinish

intermediate_steps = []  # (action, observation) pairs from previous iterations
while True:  # Infinite loop to continue processing until a finish signal is received
    output = agent.invoke({
        "input": "how many letters in the word education?",
        "intermediate_steps": intermediate_steps
    })
    if isinstance(output, AgentFinish):  # if the output is an instance of AgentFinish, finish
        final_result = output.return_values["output"]
        break
    else:
        print(output.tool, output.tool_input)
        tool = {"get_word_length": get_word_length}[output.tool]
        observation = tool.run(output.tool_input)         # execute the selected tool
        intermediate_steps.append((output, observation))  # feed the result back to the agent
config_list = [
    {
        "api_type": "open_ai",
        "api_base": "http://localhost:1234/v1",
        "api_key": "NULL"
    }
]
# Initiate another chat with the assistant to handle the second task. The task
# message is passed to the assistant through the user_proxy agent.
user_proxy.initiate_chat(assistant, message=task2)
Listing 5: Python summary code for Autogen Agents (autoagents-local llm-02.py)
These agents can be backed by a locally hosted model server (the configuration above points
at http://localhost:1234/v1) or by models served by OpenAI or other remote servers. This
flexibility ensures that developers can maintain a consistent codebase without having to
adjust for the model's location, be it local or remote [list.5].
# LLM
llm = Ollama(
    model="llama2",  # Specify the language model to use
    verbose=True,    # Set verbose to True for more detailed output
    # Set up a callback manager with a specified callback handler
    callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]),
)
print(f"Loaded LLM model {llm.model}")  # Output the loaded language model

# QA chain
from langchain.chains import RetrievalQA  # Import the RetrievalQA class from the langchain.chains module
qa_chain = RetrievalQA.from_chain_type(
    llm,                                            # Specify the language model to use
    retriever=vectorstore.as_retriever(),           # Set up a retriever using the vector store
    chain_type_kwargs={"prompt": QA_CHAIN_PROMPT},  # Specify additional arguments for the chain type
)

# Ask a question
question = f"What are the latest headlines on {url}?"  # Formulate a question ....
result = qa_chain({"query": question})  # Pass the question to the QA chain and store the result
Listing 6: Python summary code for Ollama+RAG (rag.py)
RAG
RAG (Retrieval-Augmented Generation) refers to augmenting the generative capabilities
of LLMs with external knowledge retrieval. This is achieved by integrating a retrieval
mechanism that fetches relevant information from external sources, which the LLM then
uses to generate more informed and contextually relevant responses. It is a way to expand
the knowledge base of LLMs beyond what they were trained on, enhancing their ability to
provide accurate and up-to-date responses [list.6].
KoboldCpp
A simple one-file way to run various GGML and GGUF models with KoboldAI's UI.
Oobabooga
A Gradio web UI for Large Language Models.
Few-Shot Prompting
Few-shot prompting presents a model with a task alongside a few examples of the desired
output, aiding the model in understanding the task at hand.
Examples and Applications
Prompt Tuning: Fine-tuning LLMs using specific prompts to enhance performance
on desired tasks.
Contextual Prompts: Utilizing previous conversation turns as prompts to maintain
context.
Task Descriptors: Employing task descriptions within prompts to guide LLMs in
task execution.
System and User Messages Example
System message: ”Respond in a formal tone.” User message: ”Tell me about the
weather.” Model response: ”The weather today is quite pleasant with a high of 75°F.”
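A minimal sketch of sending these messages through the OpenAI chat API follows (pre-1.0 openai-python interface); the model name and API key handling are assumptions.

import openai

openai.api_key = "YOUR_API_KEY"  # placeholder; in practice load from an environment variable

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "Respond in a formal tone."},
        {"role": "user", "content": "Tell me about the weather."},
    ],
)
print(response["choices"][0]["message"]["content"])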
Chain-of-Thought Example
User Message: Which is a faster way to get home? Option 1: Take a 10-minute bus,
then a 40-minute bus, and finally a 10-minute train. Option 2: Take a 90-minute train,
then a 45-minute bike ride, and finally a 10-minute bus. Since Option 1 takes 60 minutes
and Option 2 takes 145 minutes, Option 1 is faster.
Which is a faster way to get to work? Option 1: Take a 1000-minute bus, then a half-hour
train, and finally a 10-minute bike ride. Option 2: Take an 800-minute bus, then an
hour-long train, and finally a 30-minute bike ride.
Model response: Option 1 will take 1000 + 30 + 10 = 1040 minutes. Option 2 will take
800 + 60 + 30 = 890 minutes. Since Option 2 takes 890 minutes and Option 1 takes 1040
minutes, Option 2 is faster.
n-shot Prompting
n-shot prompting, which covers zero-shot and few-shot prompting, refers to techniques where
the model is given zero or a few examples to learn and generalize from.
Optimizing Prompts
Optimizing prompts involves developing and refining prompts to efficiently utilize LLMs
across various applications and research topics, enhancing the model’s capabilities and
understanding its limitations.
Advanced Techniques
Tree of Thoughts: An extension of CoT prompting that explores multiple reasoning
paths while maintaining a logical flow of ideas.
Prompt Efficiency: Ensuring prompts are concise and effective to reduce
computational resources.
Prompt Variability: Experimenting with varying prompt structures to explore the
model’s response diversity.
Few-Shot Example
User Message:
This is awesome! // Negative
This is bad! // Positive
Wow that movie was rad! // Positive
What a horrible show! //
Model response: Negative. // Correct
Note that the demonstration labels above are deliberately randomized; the model still returns
the correct sentiment for the final line, showing that the format of the exemplars matters as
much as the labels themselves.
User Message:
The odd numbers in this group add up to an even number: 4, 8, 9, 15, 12, 2, 1.
A: The answer is False.
The odd numbers in this group add up to an even number: 17, 10, 19, 4, 8, 12, 24.
A: The answer is True.
The odd numbers in this group add up to an even number: 16, 11, 14, 4, 8, 13, 24.
A: The answer is True.
Practical Applications
Prompt engineering finds its applications in creating better AI-powered services, customer-
facing chatbots, and industry-specific document generation among others.
Future Directions
Exploration of advanced techniques, learning from in-context prompts, and evolving prompt
engineering practices to keep pace with the advancements in LLMs.
Future Horizons
Automated Prompt Engineering: Leveraging algorithms for automatic prompt
generation and optimization.
Cross-Model Prompting: Designing prompts compatible across different LLM
architectures.
Community-Driven Prompt Repositories: Establishing shared repositories for
effective prompts, aiding in the democratization of prompt engineering practices.
4 Conclusions
In reflecting upon the domain of Natural Language Processing (NLP), Large Language
Models (LLMs), and their associated APIs, several pivotal conclusions can be drawn. Firstly,
NLP has undoubtedly revolutionized the realm of human-computer interactions, facilitating
seamless and intuitive dialogues. LLMs, with their expansive architecture, particularly the
Transformer framework, have furthered this frontier by capturing intricate linguistic nuances
and dependencies.
The advent of the attention mechanism in LLMs has propelled their efficacy, allowing
them to dynamically prioritize information in vast textual data. The integration of modern
APIs, exemplified by the likes of Langchain, has democratized access to these models,
enabling developers and businesses to harness the power of LLMs with ease.
While the current trajectory of NLP and LLMs suggests a future replete with advancements,
it is crucial to navigate this domain with a keen understanding of the underlying mechanics,
especially as they become more integrated into daily life and critical business operations. The
blend of theoretical knowledge and practical implementation is paramount for the sustainable
and ethical growth of this field.
5 Additional Material
NLP (Natural Language Processing): AI’s subfield for machine understanding of human
language.
LLM (Large Language Model): Deep models like GPT-3 and BERT for NLP.
GPT (Generative Pre-trained Transformer): LLM for text generation, translation, etc.
Vector Storage: Saves and retrieves high-dimensional vectors, often for embeddings.
LLM Agents: Software entities using LLMs for tasks and interactions.
7 References
References
[Vas+17] Ashish Vaswani et al. “Attention is all you need”. In: Advances in neural information processing
systems 30 (2017).
[Aut23] AutoGen. AutoGen: A Framework for Developing LLM Applications using Multi-Agent Conversations.
https://microsoft.github.io/autogen. Accessed: 2023-10-22. 2023.
[Bia23] Weights & Biases. Weights & Biases: The AI Developer Platform. Platform for managing machine
learning workflows, tracking experiments, and versioning datasets. 2023. url: https://wandb.
ai/site.
[Col23] Google Colab. Google Colaboratory. https://colab.research.google.com/. Accessed: 2023-
10-22. 2023.
[Dat23] Harish Datalab. Unveiling the Power of Large Language Models (LLMs). Accessed: yyyy-mm-dd.
2023. url: https://medium.com/@harishdatalab/unveiling-the-power-of-large-language-models-llms-e235c4eba8a9.
[Dev23] Google Developers. Translating to a Lower Dimensional Space. Machine Learning Crash Course.
Google. 2023. url: https://developers.google.com/machine-learning/crash-course/embeddings/translating-to-a-lower-dimensional-space.
[GPT23] GPT4All. GPT4All: Open-Source Ecosystem for Training and Deploying Large Language Models.
https://docs.gpt4all.io. Accessed: 2023-10-22. 2023.
[Gre23] Cobus Greyling. “Autonomous LLM Agents”. In: Medium (2023). Accessed: 2023-10-22. url:
https://cobusgreyling.medium.com/autonomous-llm-agents-f05eec35b6fb.
[Hig23] Hugging Face. Hugging Face: The AI Community Building the Future. https://huggingface.co/.
Accessed: 2023-10-22. 2023.
[Hol23] HolisticAI. From Transformer Architecture to Prompt Engineering. https://www.holisticai.
com/blog/from-transformer-architecture-to-prompt-engineering. Accessed: 2023-10-22.
2023.
[Kob23] KoboldCpp. KoboldCpp: Easy-to-Use AI Text-Generation Software. https://llamasking.github.io/Kobold.cpp/.
Accessed: 2023-10-22. 2023.
[Lan23] Langchain. LangChain: A Framework for Developing Applications Powered by Language Models.
https://docs.langchain.com. Accessed: 2023-10-22. 2023.
[Nam23] Author Name. “Implementing RAG with LangChain and Hugging Face”. In: Medium: International
School of AI & Data Science (2023). Accessed: 2023-10-22. url: https://medium.com/international-school-of-ai-data-science/implementing-rag-with-langchain-and-hugging-face-28e3ea66c5f7.
[Oll23] Ollama. Ollama: Running Large Language Models Locally. https://ollama.ai. Accessed: 2023-
10-22. 2023.
[Pri23] PrivateGPT. PrivateGPT: Privacy Layer for Large Language Models. https://github.com/
imartinez/privateGPT. Accessed: 2023-10-22. 2023.
[Stu23] LM Studio. LM Studio: Cross-Platform Desktop Application for LLMs. https://github.com/
curiousexplorations/lm_studio. Accessed: 2023-10-22. 2023.
[Tea23] DeepPavlov Team. DeepPavlov Agent: An Open-Source Framework for Building Multi-Skill Conversational
Agents. https://deeppavlov.ai. Accessed: 2023-10-22. 2023.
[Web23] oobabooga WebUI. Oobabooga’s Text Generation WebUI: Gradio-Based Interface for LLMs.
https://github.com/oobabooga/Text-Generation-WebUI. Accessed: 2023-10-22. 2023.