What is a large language model (LLM)?
A large language model (LLM) is a type of machine learning model designed for
natural language processing tasks such as language generation. LLMs are language
models with many parameters, and are trained with self-supervised learning on a
vast amount of text.
The largest and most capable LLMs are generative pretrained transformers (GPTs).
Modern models can be fine-tuned for specific tasks or guided by prompt
engineering.[1] These models acquire predictive power regarding syntax, semantics,
and ontologies[2] inherent in human language corpora, but they also inherit
inaccuracies and biases present in the data they are trained on.[3]
History
[Figure: Training compute of notable large models in FLOPs vs. publication date, 2010-2024, shown for notable models overall (top left), frontier models (top right), top language models (bottom left), and top models within leading companies (bottom right). The majority of these models are language models.]
[Figure: Training compute of notable large AI models in FLOPs vs. publication date, 2017-2024. The majority of large models are language models or multimodal models with language capacity.]
Before 2017, there were a few language models that were large relative to the
capacities then available. In the 1990s, the IBM alignment models pioneered
statistical language modelling. In 2001, a smoothed n-gram model trained on 0.3
billion words achieved state-of-the-art perplexity.[4] In the 2000s, as Internet use
became prevalent, some researchers constructed Internet-scale language datasets
("web as corpus"[5]), upon which they trained statistical language models.[6][7] By
2009, statistical language models dominated over symbolic language models in most
language-processing tasks, because they could usefully ingest large datasets.[8]
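To make the terms "smoothed n-gram model" and "perplexity" concrete, the following minimal Python sketch trains a bigram model with add-one (Laplace) smoothing and scores text by perplexity. It is an illustration only: the 2001 model cited above used a far larger corpus and more sophisticated smoothing techniques.

```python
import math
from collections import Counter

def train_bigram(tokens):
    """Count unigrams and bigrams from a token sequence."""
    return Counter(tokens), Counter(zip(tokens, tokens[1:]))

def bigram_prob(w_prev, w, unigrams, bigrams, vocab_size):
    """Add-one (Laplace) smoothed estimate of P(w | w_prev)."""
    return (bigrams[(w_prev, w)] + 1) / (unigrams[w_prev] + vocab_size)

def perplexity(tokens, unigrams, bigrams, vocab_size):
    """Perplexity = exp(mean negative log-probability per bigram)."""
    logp = sum(
        math.log(bigram_prob(a, b, unigrams, bigrams, vocab_size))
        for a, b in zip(tokens, tokens[1:])
    )
    return math.exp(-logp / (len(tokens) - 1))

corpus = "the cat sat on the mat and the dog sat on the rug".split()
unigrams, bigrams = train_bigram(corpus)
held_out = "the cat sat on the rug".split()
print(perplexity(held_out, unigrams, bigrams, len(unigrams)))  # lower is better
```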
After neural networks became dominant in image processing around 2012,[9] they
were applied to language modelling as well. Google converted its translation service
to Neural Machine Translation in 2016. Because this predated the transformer, it was
done with seq2seq deep LSTM networks.
[Figure: Main components of the transformer model from the original paper, in which layers were normalized after (instead of before) multi-headed attention.]
At the 2017 NeurIPS conference, Google researchers introduced the transformer
architecture in their landmark paper "Attention Is All You Need". The paper's goal
was to improve upon 2014 seq2seq technology,[10] and it was based mainly on the
attention mechanism developed by Bahdanau et al. in 2014.[11] The following year,
in 2018, BERT was introduced and quickly became "ubiquitous".[12] Though the
original transformer has both encoder and decoder blocks, BERT is an encoder-only
model. Academic and research usage of BERT began to decline in 2023, following
rapid improvements in the ability of decoder-only models (such as GPT) to solve
tasks via prompting.[13]
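At the heart of the transformer is scaled dot-product attention, defined in the paper as Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. The NumPy sketch below implements that single formula, omitting the learned projections and multi-head splitting of the full architecture; the causal mask shown is how decoder-only models such as GPT restrict each position to attend only to itself and earlier positions.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V, mask=None):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)  # query-key similarities
    if mask is not None:
        scores = np.where(mask, scores, -1e9)       # blocked positions get ~zero weight
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the key axis
    return weights @ V                              # weighted average of value vectors

# Toy input: 4 token positions, dimension 8; self-attention uses x as Q, K, and V.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
causal = np.tril(np.ones((4, 4), dtype=bool))  # lower-triangular causal mask
out = scaled_dot_product_attention(x, x, x, mask=causal)
print(out.shape)  # (4, 8)
```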
Although the decoder-only GPT-1 was introduced in 2018, it was GPT-2 in 2019 that
caught widespread attention, because OpenAI at first deemed it too powerful to
release publicly, out of fear of malicious use.[14] GPT-3 in 2020 went a step further
and, as of 2024, is available only via API, with no option to download the model for
local execution. But it was the consumer-facing, browser-based ChatGPT, released in
2022, that captured the imagination of the general population and generated
considerable media hype and online buzz.[15] The 2023 GPT-4 was praised for its
increased accuracy and as a "holy grail" for its multimodal capabilities.[16] OpenAI
did not reveal GPT-4's high-level architecture or number of parameters. The release
of ChatGPT led to an uptick in LLM usage across several research subfields of
computer science, including robotics, software engineering, and societal impact
work.[17] In 2024, OpenAI released the reasoning model OpenAI o1, which generates
long chains of thought before returning a final answer.
Competing language models have for the most part been attempting to equal the
GPT series, at least in terms of number of parameters.[18]
Since 2022, source-available models have been gaining popularity, especially at first
with BLOOM and LLaMA, though both have restrictions on the field of use. Mistral
AI's models Mistral 7B and Mixtral 8x7B are released under the more permissive
Apache License.
In January 2025, DeepSeek released DeepSeek R1, a 671-billion-parameter open-
weight model that performs comparably to OpenAI o1 but at a much lower cost.[19]
Since 2023, many LLMs have been trained to be multimodal, able to process or
generate other types of data as well, such as images or audio. Such LLMs are also
called large multimodal models (LMMs).[20]
As of 2024, the largest and most capable models are all based on the transformer
architecture. Some recent implementations are based on other architectures, such
as recurrent neural network variants and Mamba (a state space model).[21][22][23]