
Commit b44096a

Implement Llama 3.2 (rasbt#383)
1 parent a5405c2 commit b44096a

5 files changed: +8874 -6 lines changed


ch05/07_gpt_to_llama/README.md

Lines changed: 6 additions & 2 deletions

```diff
@@ -2,6 +2,10 @@
 
 
 
-This folder contains code for converting the GPT implementation from chapter 4 and 5 to Meta AI's Llama architecture:
+This folder contains code for converting the GPT implementation from chapter 4 and 5 to Meta AI's Llama architecture in the following recommended reading order:
 
-- [converting-gpt-to-llama2.ipynb](converting-gpt-to-llama2.ipynb): contains code to convert GPT to Llama 2 7B step by step and loads pretrained weights from Meta AI
+- [converting-gpt-to-llama2.ipynb](converting-gpt-to-llama2.ipynb): contains code to convert GPT to Llama 2 7B step by step and loads pretrained weights from Meta AI
+- [converting-llama2-to-llama3.ipynb](converting-llama2-to-llama3.ipynb): contains code to convert the Llama 2 model to Llama 3, Llama 3.1, and Llama 3.2
+- [standalone-llama32.ipynb](standalone-llama32.ipynb): a standalone notebook implementing Llama 3.2
+
+<img src="https://sebastianraschka.com/images/LLMs-from-scratch-images/bonus/gpt-to-llama/gpt-and-all-llamas.webp">
```

ch05/07_gpt_to_llama/converting-gpt-to-llama2.ipynb

Lines changed: 40 additions & 2 deletions

```diff
@@ -108,6 +108,7 @@
     "id": "UJJneXpTEg4W"
    },
    "source": [
+    "&nbsp;\n",
     "# 1. Convert the GPT model implementation step by step"
    ]
   },
```
```diff
@@ -129,6 +130,7 @@
     "id": "979c7b6d-1370-4da1-8bfb-a2b27537bf2f"
    },
    "source": [
+    "&nbsp;\n",
     "## 1.1 Replace LayerNorm with RMSNorm layer"
    ]
   },
```
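The step labeled above swaps LayerNorm for RMSNorm. For orientation, here is a minimal PyTorch sketch (not the notebook's exact code; the class name, variable names, and the `eps` value are assumptions): unlike LayerNorm, RMSNorm only rescales by the root mean square of the features and uses neither mean subtraction nor a bias term.

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    # Root-mean-square normalization: rescale by the RMS of the features;
    # no mean subtraction and no bias, in contrast to LayerNorm.
    def __init__(self, emb_dim, eps=1e-5):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(emb_dim))  # learnable scale only

    def forward(self, x):
        means = x.pow(2).mean(dim=-1, keepdim=True)
        return x * torch.rsqrt(means + self.eps) * self.weight

x = torch.randn(2, 4, 8)
print(RMSNorm(emb_dim=8)(x).shape)  # torch.Size([2, 4, 8])
```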
```diff
@@ -228,6 +230,7 @@
     "id": "5eb81f83-c38c-46a4-b763-aa630a32e357"
    },
    "source": [
+    "&nbsp;\n",
     "## 1.2 Replace GELU with SiLU activation"
    ]
   },
```
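The GELU-to-SiLU swap labeled above is a one-liner: SiLU(x) = x · sigmoid(x). A minimal sketch (the class name is illustrative; `torch.nn.functional.silu` provides the same function out of the box):

```python
import torch
import torch.nn as nn

class SiLU(nn.Module):
    # SiLU (a.k.a. swish): x * sigmoid(x), a simpler stand-in for GELU
    def forward(self, x):
        return x * torch.sigmoid(x)

x = torch.randn(5)
print(torch.allclose(SiLU()(x), torch.nn.functional.silu(x)))  # True
```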
```diff
@@ -300,6 +303,7 @@
     "id": "4f9b5167-1da9-46c8-9964-8036b3b1deb9"
    },
    "source": [
+    "&nbsp;\n",
     "## 1.3 Update the FeedForward module"
    ]
   },
```
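For the updated feed-forward module, Llama uses a gated, SwiGLU-style formulation with three bias-free linear layers instead of GPT's single expand-and-contract MLP. A minimal sketch under assumed names (the demo dimensions are arbitrary):

```python
import torch
import torch.nn as nn

class FeedForward(nn.Module):
    # Gated (SwiGLU-style) feed-forward: silu(fc1(x)) * fc2(x), projected back by fc3.
    # All projections are bias-free, unlike the GPT feed-forward block.
    def __init__(self, emb_dim, hidden_dim):
        super().__init__()
        self.fc1 = nn.Linear(emb_dim, hidden_dim, bias=False)
        self.fc2 = nn.Linear(emb_dim, hidden_dim, bias=False)
        self.fc3 = nn.Linear(hidden_dim, emb_dim, bias=False)

    def forward(self, x):
        return self.fc3(nn.functional.silu(self.fc1(x)) * self.fc2(x))

x = torch.randn(2, 6, 512)
print(FeedForward(emb_dim=512, hidden_dim=1376)(x).shape)  # torch.Size([2, 6, 512])
```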
```diff
@@ -388,6 +392,7 @@
     "id": "f6b7bf4f-99d0-42c1-807c-5074d2cc1949"
    },
    "source": [
+    "&nbsp;\n",
     "## 1.4 Implement RoPE"
    ]
   },
```
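RoPE (rotary position embedding) encodes positions by rotating pairs of query/key dimensions by position-dependent angles rather than adding learned positional vectors. Below is one common formulation as a hedged sketch: the 10,000 base matches Llama 2's published value, but the function names and the rotate-halves convention are assumptions that may differ from the notebook's exact code.

```python
import torch

def precompute_rope_params(head_dim, context_length, theta_base=10_000):
    # One inverse frequency per pair of dimensions
    inv_freq = 1.0 / (theta_base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    positions = torch.arange(context_length)
    angles = positions[:, None] * inv_freq[None, :]   # (context_length, head_dim // 2)
    angles = torch.cat([angles, angles], dim=-1)       # (context_length, head_dim)
    return torch.cos(angles), torch.sin(angles)

def apply_rope(x, cos, sin):
    # x: (batch, n_heads, seq_len, head_dim); rotate the two halves of each head
    head_dim = x.shape[-1]
    seq_len = x.shape[2]
    x1, x2 = x[..., : head_dim // 2], x[..., head_dim // 2 :]
    rotated = torch.cat((-x2, x1), dim=-1)
    return x * cos[:seq_len] + rotated * sin[:seq_len]

cos, sin = precompute_rope_params(head_dim=16, context_length=128)
q = torch.randn(1, 2, 10, 16)
print(apply_rope(q, cos, sin).shape)  # torch.Size([1, 2, 10, 16])
```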
```diff
@@ -503,6 +508,7 @@
     "id": "f78127b0-dda2-4c5a-98dd-bae8f5fe8297"
    },
    "source": [
+    "&nbsp;\n",
     "## 1.5 Add RoPE to MultiHeadAttention module"
    ]
   },
```
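Inside multi-head attention, RoPE is applied to the queries and keys after they are split into heads; the rest is ordinary causal attention. A hedged sketch that reuses the `precompute_rope_params`/`apply_rope` helpers from the previous sketch (parameter names are illustrative; note the bias-free projections):

```python
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    # Causal multi-head attention with RoPE applied to queries and keys.
    # Assumes precompute_rope_params/apply_rope from the sketch above are in scope.
    def __init__(self, d_in, d_out, context_length, num_heads):
        super().__init__()
        assert d_out % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = d_out // num_heads
        self.W_query = nn.Linear(d_in, d_out, bias=False)
        self.W_key = nn.Linear(d_in, d_out, bias=False)
        self.W_value = nn.Linear(d_in, d_out, bias=False)
        self.out_proj = nn.Linear(d_out, d_out, bias=False)
        cos, sin = precompute_rope_params(self.head_dim, context_length)
        self.register_buffer("cos", cos)
        self.register_buffer("sin", sin)
        self.register_buffer(
            "mask", torch.triu(torch.ones(context_length, context_length), diagonal=1).bool()
        )

    def forward(self, x):
        b, num_tokens, _ = x.shape
        # Shape: (b, num_heads, num_tokens, head_dim)
        q = self.W_query(x).view(b, num_tokens, self.num_heads, self.head_dim).transpose(1, 2)
        k = self.W_key(x).view(b, num_tokens, self.num_heads, self.head_dim).transpose(1, 2)
        v = self.W_value(x).view(b, num_tokens, self.num_heads, self.head_dim).transpose(1, 2)
        q = apply_rope(q, self.cos, self.sin)  # positions enter via rotation,
        k = apply_rope(k, self.cos, self.sin)  # not via learned positional embeddings
        scores = q @ k.transpose(2, 3) / self.head_dim**0.5
        scores = scores.masked_fill(self.mask[:num_tokens, :num_tokens], float("-inf"))
        context = torch.softmax(scores, dim=-1) @ v
        context = context.transpose(1, 2).reshape(b, num_tokens, -1)
        return self.out_proj(context)

mha = MultiHeadAttention(d_in=64, d_out=64, context_length=128, num_heads=4)
print(mha(torch.randn(1, 10, 64)).shape)  # torch.Size([1, 10, 64])
```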
```diff
@@ -652,6 +658,7 @@
     "id": "e5a1a272-a038-4b8f-aaaa-f4b241e7f23f"
    },
    "source": [
+    "&nbsp;\n",
     "## 1.6 Update the TransformerBlock module"
    ]
   },
```
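The updated transformer block keeps the pre-norm layout but swaps in RMSNorm and the gated feed-forward module. A minimal sketch building on the earlier sketches (the `cfg` keys are assumptions):

```python
import torch.nn as nn

class TransformerBlock(nn.Module):
    # Pre-norm block: RMSNorm -> attention -> residual, then RMSNorm -> feed-forward -> residual.
    # Assumes the RMSNorm, FeedForward, and MultiHeadAttention sketches above are in scope.
    def __init__(self, cfg):
        super().__init__()
        self.att = MultiHeadAttention(
            d_in=cfg["emb_dim"], d_out=cfg["emb_dim"],
            context_length=cfg["context_length"], num_heads=cfg["n_heads"],
        )
        self.ff = FeedForward(cfg["emb_dim"], cfg["hidden_dim"])
        self.norm1 = RMSNorm(cfg["emb_dim"])
        self.norm2 = RMSNorm(cfg["emb_dim"])

    def forward(self, x):
        x = x + self.att(self.norm1(x))   # attention sub-block with shortcut
        x = x + self.ff(self.norm2(x))    # feed-forward sub-block with shortcut
        return x
```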
```diff
@@ -727,6 +734,7 @@
     "id": "ada953bc-e2c0-4432-a32d-3f7efa3f6e0f"
    },
    "source": [
+    "&nbsp;\n",
     "## 1.7 Update the model class"
    ]
   },
```
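At the model level, the notable differences from the GPT class are that absolute positional embeddings and dropout are no longer needed (positions enter through RoPE inside attention) and the output head is bias-free. A sketch using the components from the previous sketches (class and attribute names are assumptions):

```python
import torch.nn as nn

class Llama2Model(nn.Module):
    # Token embedding -> stack of TransformerBlocks -> final RMSNorm -> linear LM head.
    def __init__(self, cfg):
        super().__init__()
        self.tok_emb = nn.Embedding(cfg["vocab_size"], cfg["emb_dim"])
        self.trf_blocks = nn.Sequential(
            *[TransformerBlock(cfg) for _ in range(cfg["n_layers"])]
        )
        self.final_norm = RMSNorm(cfg["emb_dim"])
        self.out_head = nn.Linear(cfg["emb_dim"], cfg["vocab_size"], bias=False)

    def forward(self, in_idx):
        x = self.tok_emb(in_idx)       # no absolute positional embedding added here
        x = self.trf_blocks(x)
        x = self.final_norm(x)
        return self.out_head(x)        # logits over the vocabulary
```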
```diff
@@ -791,6 +799,7 @@
     "id": "4bc94940-aaeb-45b9-9399-3a69b8043e60"
    },
    "source": [
+    "&nbsp;\n",
     "## 2. Initialize model"
    ]
   },
```
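To initialize the model, the Llama 2 7B hyperparameters can be collected in a config dictionary. A sketch assuming the model class from the previous sketch (the key names are illustrative; the numeric values are Llama 2 7B's published sizes):

```python
LLAMA2_CONFIG_7B = {
    "vocab_size": 32_000,     # SentencePiece vocabulary size
    "context_length": 4096,   # maximum sequence length
    "emb_dim": 4096,          # embedding / hidden size
    "n_heads": 32,            # attention heads
    "n_layers": 32,           # transformer blocks
    "hidden_dim": 11_008,     # feed-forward intermediate size
}

# Note: instantiating the full 7B model in float32 needs roughly 26 GB of RAM
model = Llama2Model(LLAMA2_CONFIG_7B)
total_params = sum(p.numel() for p in model.parameters())
print(f"Total number of parameters: {total_params:,}")  # roughly 6.7 billion
```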
```diff
@@ -1029,6 +1038,7 @@
     "id": "5dc64a06-27dc-46ec-9e6d-1700a8227d34"
    },
    "source": [
+    "&nbsp;\n",
     "## 3. Load tokenizer"
    ]
   },
```
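Llama 2 ships a SentencePiece tokenizer (`tokenizer.model`) rather than GPT-2's tiktoken BPE vocabulary. A hedged sketch, assuming the tokenizer file has already been downloaded from the gated `meta-llama/Llama-2-7b` repository on the Hugging Face Hub and that license access has been granted:

```python
import sentencepiece as spm

# Assumes tokenizer.model was downloaded beforehand, e.g. via
# huggingface_hub.hf_hub_download(repo_id="meta-llama/Llama-2-7b", filename="tokenizer.model")
sp = spm.SentencePieceProcessor()
sp.load("tokenizer.model")

ids = sp.encode("Hello, world!", out_type=int)
print(ids)             # token IDs
print(sp.decode(ids))  # "Hello, world!"
```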
```diff
@@ -1288,6 +1298,7 @@
     "id": "f63cc248-1d27-4eb6-aa50-173b436652f8"
    },
    "source": [
+    "&nbsp;\n",
     "## 4. Load pretrained weights"
    ]
   },
```
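Loading the pretrained weights means downloading Meta's checkpoint and copying each tensor into the matching parameter of the from-scratch model. The sketch below is illustrative only: the checkpoint file name, the weight keys, and the `assign` helper are assumptions, and a complete loader has to map every layer, not just the three shown.

```python
import torch
from huggingface_hub import hf_hub_download

# Gated repo: requires accepting Meta's license and logging in via `huggingface-cli login`
weights_file = hf_hub_download(
    repo_id="meta-llama/Llama-2-7b",
    filename="consolidated.00.pth",  # assumed single-file 7B checkpoint name
)
weights = torch.load(weights_file, weights_only=True)

def assign(param, tensor):
    # Illustrative helper: wrap a checkpoint tensor as a new nn.Parameter after a shape check
    if param.shape != tensor.shape:
        raise ValueError(f"Shape mismatch: {param.shape} vs {tensor.shape}")
    return torch.nn.Parameter(tensor.clone().detach())

# A few example assignments (key names assumed); a full loader maps every block analogously
model.tok_emb.weight = assign(model.tok_emb.weight, weights["tok_embeddings.weight"])
model.final_norm.weight = assign(model.final_norm.weight, weights["norm.weight"])
model.out_head.weight = assign(model.out_head.weight, weights["output.weight"])
```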
```diff
@@ -1544,14 +1555,23 @@
     "print(\"Output text:\\n\", token_ids_to_text(token_ids, tokenizer))"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "d72ed949-b6c0-4966-922f-eb0da732c404",
+   "metadata": {},
+   "source": [
+    "&nbsp;\n",
+    "## 5. Using the instruction-finetuned model"
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "akyo7WNyF_YL",
    "metadata": {
     "id": "akyo7WNyF_YL"
    },
    "source": [
-    "- Tip: as mentioned earlier, this is the pretrained base model; if you want to use a model capable of following instructions, use the `\"meta-llama/Llama-2-7b-chat\"` model instead"
+    "- As mentioned earlier, above we used the pretrained base model; if you want to use a model capable of following instructions, use the `\"meta-llama/Llama-2-7b-chat\"` model instead, as shown below"
    ]
   },
   {
```
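For the instruction-finetuned variant, the loading procedure stays the same and only the repository changes. A hedged sketch (the checkpoint file name is assumed to match the base model's layout):

```python
import torch
from huggingface_hub import hf_hub_download

# Same gated-access requirements as for the base model
chat_weights_file = hf_hub_download(
    repo_id="meta-llama/Llama-2-7b-chat",
    filename="consolidated.00.pth",  # assumed file name
)
chat_weights = torch.load(chat_weights_file, weights_only=True)
# Load into the model exactly as in section 4, then prompt it with an
# instruction such as "What do llamas eat?"
```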
```diff
@@ -1630,6 +1650,24 @@
     "\n",
     "print(\"Output text:\\n\", token_ids_to_text(token_ids, tokenizer))"
    ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0f693da1-a07c-4e1d-af5a-c3923525f1e2",
+   "metadata": {},
+   "source": [
+    "&nbsp;\n",
+    "# What's next?"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "fae93739-ca12-46ba-8ca7-7c07c59f669b",
+   "metadata": {},
+   "source": [
+    "- This notebook converted the original GPT-2 architecture into a Llama 2 model\n",
+    "- If you are interested in how to convert Llama 2 into Llama 3, Llama 3.1, and Llama 3.2, check out the [converting-llama2-to-llama3.ipynb](converting-llama2-to-llama3.ipynb) notebook"
+   ]
   }
  ],
  "metadata": {
```
```diff
@@ -1653,7 +1691,7 @@
     "name": "python",
     "nbconvert_exporter": "python",
     "pygments_lexer": "ipython3",
-    "version": "3.10.6"
+    "version": "3.11.4"
    },
    "widgets": {
     "application/vnd.jupyter.widget-state+json": {
```
