The course on Large Language Models (LLMs) aims to equip students with a comprehensive understanding of the principles, architectures, and applications of state-of-the-art LLMs like GPT-4. This course is designed for graduate students in computer science, data science, and related fields who have a foundational knowledge of machine learning and artificial intelligence. By taking this course, students will gain valuable skills in developing, fine-tuning, and deploying LLMs, which are increasingly integral to advancements in natural language processing, automated content creation, and AI-driven decision-making. This expertise will not only enhance their academic and research capabilities but also significantly boost their employability in tech industries, research institutions, and innovative startups focused on AI and machine learning technologies.
Optional Textbooks
- Deep Learning by Goodfellow, Bengio, and Courville (free online)
- Machine Learning: A Probabilistic Perspective by Kevin Murphy (online)
- Natural Language Processing by Jacob Eisenstein (free online)
- Speech and Language Processing by Dan Jurafsky and James H. Martin (3rd ed. draft)
Optional Papers
- On the Opportunities and Risks of Foundation Models
- Multimodal Foundation Models: From Specialists to General-Purpose Assistants
- Large Multimodal Models: Notes on CVPR 2023 Tutorial
- A Comprehensive Survey on Pretrained Foundation Models: A History from BERT to ChatGPT
- Interactive Natural Language Processing
- Towards Reasoning in Large Language Models: A Survey
By the end of this course, you should be able to:
- Analyze the underlying architectures and mechanisms of large language models.
- Implement and fine-tune large language models for specific applications.
- Evaluate the performance of large language models in various contexts.
- Design novel applications leveraging large language models to solve real-world problems.
- Critically assess the limitations and potential improvements of current large language models.
Assignments (individually graded)
- There will be two (2) assignments contributing to 2 * 25% = 50% of the total assessment.
- Students will be graded individually on the assignments. They may discuss the homework with each other, but each student must submit their own write-up and coding exercises.
Final Project (Group work but individually graded)
- There will be a final project contributing to the remaining 50% of the total coursework assessment.
- 3–6 people per group
- Presentation: 20%, report: 30%
- The project is group work, but students will be graded individually. The final project presentation will be used to verify each student's understanding of the project.
Prerequisites
- Proficiency in Deep Learning models
- Proficiency in Python (using NumPy and PyTorch)
- Deep Learning and NLP basics
Instructor
Teaching Assistants
Nguyen Tran Cong Duy
- Logistics of the course
- Introduction to deep learning
- Types of deep learning
- Introduction to large language models
- Programming in Python
- Jupyter Notebook and Google Colab
- Introduction to Python
- Deep Learning Frameworks
- Why PyTorch?
- Deep learning with PyTorch
- [Supplementary]
- Numerical programming with NumPy/SciPy - NumPy intro
- Numerical programming with PyTorch - PyTorch intro
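To complement the NumPy and PyTorch intros above, here is a minimal sketch (not part of the official lab materials) contrasting a plain NumPy computation with the equivalent PyTorch version, where autograd provides the gradient for free; the numbers are illustrative.

```python
# Minimal NumPy vs. PyTorch comparison: same computation, with autograd in PyTorch.
import numpy as np
import torch

# NumPy: plain array math, no gradients.
x_np = np.array([1.0, 2.0, 3.0])
w_np = np.array([0.5, -0.5, 2.0])
loss_np = ((x_np * w_np).sum() - 1.0) ** 2
print("NumPy loss:", loss_np)

# PyTorch: identical math, but requires_grad=True lets autograd compute d(loss)/d(w).
x = torch.tensor([1.0, 2.0, 3.0])
w = torch.tensor([0.5, -0.5, 2.0], requires_grad=True)
loss = ((x * w).sum() - 1.0) ** 2
loss.backward()
print("PyTorch loss:", loss.item())
print("Gradient w.r.t. w:", w.grad)
```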
- From Logistic Regression to Feed-forward NN
- Activation functions
- SGD with Backpropagation
- Adaptive SGD (adagrad, adam, RMSProp)
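As a concrete companion to the feed-forward network and optimizer topics above, a minimal sketch of a small FFN trained with SGD with backpropagation on synthetic data; the data and hyperparameters are illustrative, and switching to an adaptive optimizer (Adam, RMSProp) is a one-line change.

```python
# Minimal feed-forward network trained with SGD on synthetic data (illustrative only).
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(256, 10)                      # 256 examples, 10 features
y = (X.sum(dim=1) > 0).float().unsqueeze(1)   # toy binary labels

model = nn.Sequential(
    nn.Linear(10, 32),
    nn.ReLU(),            # activation function; try nn.Tanh() or nn.GELU()
    nn.Linear(32, 1),
)
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# Adaptive variants are drop-in replacements:
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(100):
    optimizer.zero_grad()
    loss = criterion(model(X), y)
    loss.backward()       # backpropagation
    optimizer.step()      # SGD update
```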
- Word Embeddings
- CNN
- RNN
- RNN variants
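For the word-embedding and RNN topics above, a minimal sketch (toy vocabulary and dimensions chosen purely for illustration) of an embedding layer feeding an LSTM encoder that produces one vector per sentence.

```python
# Minimal embedding + LSTM encoder over a batch of token-id sequences (illustrative).
import torch
import torch.nn as nn

vocab_size, embed_dim, hidden_dim = 1000, 64, 128
embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

token_ids = torch.randint(1, vocab_size, (4, 12))   # batch of 4 sequences, length 12
embedded = embedding(token_ids)                     # (4, 12, 64)
outputs, (h_n, c_n) = encoder(embedded)             # outputs: (4, 12, 128)
sentence_repr = h_n[-1]                             # (4, 128) final hidden state per sequence
```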
- Information bottleneck issue with vanilla Seq2Seq
- Attention to the rescue
- Details of attention mechanism
- Transformer architecture
- Self-attention
- Positional encoding
- Multi-head attention
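The attention items above reduce to a few lines of code: a minimal sketch of scaled dot-product attention, which is the core of both the Seq2Seq attention fix and the Transformer's self-attention (shapes are illustrative; multi-head attention runs several such maps in parallel on projected subspaces).

```python
# Minimal scaled dot-product attention: Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V.
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)    # (..., L_q, L_k)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = F.softmax(scores, dim=-1)                   # attention distribution
    return weights @ v, weights

# Toy self-attention: queries, keys, and values all come from the same sequence.
x = torch.randn(2, 5, 16)                 # (batch, seq_len, d_model)
out, attn = scaled_dot_product_attention(x, x, x)
print(out.shape, attn.shape)              # torch.Size([2, 5, 16]) torch.Size([2, 5, 5])
```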
- Deep learning with PyTorch
- Linear Regression
- Logistic Regression
- NumPy notebook / PyTorch notebook
- Backpropagation
- Dropout
- Batch normalization
- Initialization
- Gradient clipping
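The lab items above (dropout, batch normalization, initialization, gradient clipping) fit into one short sketch of a training step; the values shown are illustrative, not recommended settings.

```python
# Illustrative training step combining dropout, batch norm, init, and gradient clipping.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(20, 64),
    nn.BatchNorm1d(64),     # batch normalization
    nn.ReLU(),
    nn.Dropout(p=0.5),      # dropout
    nn.Linear(64, 2),
)

# Explicit (Xavier) initialization of the linear layers.
for module in model.modules():
    if isinstance(module, nn.Linear):
        nn.init.xavier_uniform_(module.weight)
        nn.init.zeros_(module.bias)

optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
criterion = nn.CrossEntropyLoss()

X = torch.randn(32, 20)
y = torch.randint(0, 2, (32,))

optimizer.zero_grad()
loss = criterion(model(X), y)
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # gradient clipping
optimizer.step()
```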
- Word2Vec Tutorial - The Skip-Gram Model, blog
- Convolutional Neural Networks for Sentence Classification
- Fine-grained Opinion Mining with Recurrent Neural Networks and Word Embeddings
- Sequence to Sequence Learning with Neural Networks (original seq2seq NMT paper)
- Effective Approaches to Attention-based Neural Machine Translation
- Neural Machine Translation by Jointly Learning to Align and Translate (original seq2seq+attention paper)
- Attention Is All You Need
- The Illustrated Transformer
- Language model
- N-gram based LM
- Window-based Language Model
- Neural Language Models
- Encoder-decoder
- Seq2Seq
- Sampling algorithms
- Beam search
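To ground the language-modeling and decoding topics above, a small self-contained sketch: a count-based bigram LM over a toy corpus, with greedy decoding and temperature sampling. Beam search generalizes greedy decoding by keeping the k best partial sequences instead of a single one.

```python
# Toy count-based bigram language model with greedy decoding and temperature sampling.
import random
from collections import Counter, defaultdict

corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count bigrams and normalize to conditional probabilities P(next | prev).
bigram_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigram_counts[prev][nxt] += 1

def next_distribution(prev):
    counts = bigram_counts[prev]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def generate(start="the", steps=6, temperature=None):
    word, out = start, [start]
    for _ in range(steps):
        dist = next_distribution(word)
        if not dist:
            break
        if temperature is None:                      # greedy decoding
            word = max(dist, key=dist.get)
        else:                                        # temperature sampling
            words = list(dist)
            weights = [dist[w] ** (1.0 / temperature) for w in words]
            word = random.choices(words, weights=weights, k=1)[0]
        out.append(word)
    return " ".join(out)

print(generate())                     # greedy
print(generate(temperature=1.5))      # more diverse samples
```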
- Sequence to Sequence Learning with Neural Networks (original seq2seq NMT paper)
- N-gram Language Models
- Karpathy’s nice blog on Recurrent Neural Networks
- Building an Efficient Neural Language Model
Instructions for choosing the final project topic
- FFN
- Mixture of Experts
- Attention
- Layer Norm
- Positional Encoding
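Of the components above, Mixture of Experts is the least standard; here is a minimal top-2 token-routing sketch in PyTorch. It uses dense dispatch for clarity only; production systems such as Mixtral use sparse, capacity-limited routing.

```python
# Minimal top-2 Mixture-of-Experts feed-forward layer (dense dispatch, for clarity only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=64, d_ff=256, n_experts=4, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                      # x: (batch, seq, d_model)
        logits = self.gate(x)                  # (batch, seq, n_experts)
        topk_vals, topk_idx = logits.topk(self.k, dim=-1)
        weights = F.softmax(topk_vals, dim=-1) # renormalize over the selected experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            idx = topk_idx[..., slot]          # which expert each token uses in this slot
            w = weights[..., slot].unsqueeze(-1)
            for e, expert in enumerate(self.experts):
                mask = (idx == e).unsqueeze(-1)
                out = out + mask * w * expert(x)
        return out

moe = TopKMoE()
print(moe(torch.randn(2, 10, 64)).shape)       # torch.Size([2, 10, 64])
```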
- Attention Is All You Need
- Hendrycks and Gimpel. 2016. Gaussian Error Linear Units.
- Ramachandran et al. 2017. Searching for Activation Functions.
- Shazeer. 2020. GLU Variants Improve Transformer.
- Ainslie et al. 2023. GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
- Noam Shazeer. 2019. Fast transformer decoding: One write-head is all you need.
- DeepSeek-AI. 2024. DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model.
- About pre-training
- Why we need pre-training
- Does pre-training indeed help?
- Pre-trained Language models
- Large Language Models
- Chang, Y., Wang, X., Wang, J., Wu, Y., Zhu, K., Chen, H., Yang, L., Yi, X., Wang, C., Wang, Y., Ye, W., Zhang, Y., Chang, Y., Yu, P.S., Yang, Q., & Xie, X. (2023). A Survey on Evaluation of Large Language Models. ArXiv, abs/2307.03109
- Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., ... & Liu, P. J. (2020). Exploring the limits of transfer learning with a unified text-to-text transformer. The Journal of Machine Learning Research, 21(1), 5485-5551.
- A. Vaswani et al., “Attention is All you Need,” in Advances in Neural Information Processing Systems (NeurIPS), 2017.
- Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., ... & Amodei, D. (2020). Language models are few-shot learners. Advances in neural information processing systems, 33, 1877-1901.
- Chowdhery, A., Narang, S., Devlin, J., Bosma, M., Mishra, G., Roberts, A., ... & Fiedel, N. (2023). Palm: Scaling language modeling with pathways. Journal of Machine Learning Research, 24(240), 1-113.
- Chen, Mark, et al. Evaluating Large Language Models Trained on Code. arXiv:2107.03374, arXiv, 14 July 2021. arXiv.org, https://doi.org/10.48550/arXiv.2107.03374.
- Touvron, Hugo, et al. Llama 2: Open Foundation and Fine-Tuned Chat Models. arXiv:2307.09288, arXiv, 19 July 2023. arXiv.org, https://doi.org/10.48550/arXiv.2307.09288.
- Jiang, Albert Q., et al. Mixtral of Experts. arXiv:2401.04088, arXiv, 8 Jan. 2024. arXiv.org, https://doi.org/10.48550/arXiv.2401.04088.
- Using a pretrained language model for classification: https://colab.research.google.com/github/huggingface/notebooks/blob/main/transformers_doc/en/pytorch/sequence_classification.ipynb
- LLM prompting: https://huggingface.co/docs/transformers/main/en/tasks/prompting
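The two resources linked above boil down to a few lines with the Hugging Face transformers pipeline API; a minimal sketch (the model names are illustrative defaults, and downloading weights requires internet access).

```python
# Minimal Hugging Face pipeline usage: classification with a pretrained LM, then prompting.
from transformers import pipeline

# 1) Sequence classification with a pretrained (already fine-tuned) model.
classifier = pipeline("text-classification")          # downloads a default sentiment model
print(classifier("This course on large language models is excellent."))

# 2) Prompting a causal LM for generation (gpt2 is a small illustrative choice).
generator = pipeline("text-generation", model="gpt2")
prompt = "Q: What is a language model?\nA:"
print(generator(prompt, max_new_tokens=40)[0]["generated_text"])
```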
- LLM full finetuning
- In-context learning
- Parameter-efficient finetuning
- Instruction finetuning
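As a concrete view of the parameter-efficient finetuning topic above, a minimal hand-rolled LoRA-style adapter: the pretrained weight is frozen and only a low-rank update is trained. This is a sketch of the idea, not the peft library's implementation, and the rank/scaling values are illustrative.

```python
# Minimal LoRA-style adapter: freeze the pretrained weight, train a low-rank update.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)          # frozen pretrained weight
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x):
        # y = frozen base output + scaled low-rank update (x A^T B^T)
        return self.base(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scaling

layer = LoRALinear(nn.Linear(512, 512))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(f"trainable params: {trainable}")                 # only the low-rank factors
```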
Assignment 1 is out here. Deadline: 13 Oct 2025.
- Instruction tuning
- Multitask Prompted Training Enables Zero-shot Task Generalization (T0)
- LIMA: Less Is More for Alignment
- InstructGPT
- Reinforcement learning from human feedback (RLHF)
- Direct preference optimization (DPO)
- Frontiers, pitfalls, and open problems of RLHF
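For the DPO item above, the core loss fits in a few lines; a minimal sketch assuming the per-sequence log-probabilities of the chosen and rejected responses under the policy and the frozen reference model have already been computed.

```python
# Minimal DPO loss: push the policy to prefer chosen over rejected responses,
# measured relative to a frozen reference model.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Log-ratio of policy vs. reference for each response.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # -log sigmoid(margin): minimized when chosen outscores rejected.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy batch of 4 preference pairs (the log-probabilities are illustrative numbers).
loss = dpo_loss(torch.tensor([-12.0, -9.5, -20.1, -7.3]),
                torch.tensor([-14.2, -9.0, -25.4, -8.8]),
                torch.tensor([-13.0, -10.0, -21.0, -7.9]),
                torch.tensor([-13.5, -9.2, -24.0, -8.5]))
print(loss)
```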
- Chain-of-Thought Prompting
- Self-Consistency Improves Chain of Thought Reasoning in Language Models
- Tree of Thoughts Prompting
- Program of Thoughts Prompting
- Least-to-Most Prompting Enables Complex Reasoning in Large Language Models
- Measuring and Narrowing the Compositionality Gap in Language Models
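To connect the prompting papers above, a small sketch of few-shot chain-of-thought prompting combined with self-consistency (majority voting over sampled answers); `generate` is a hypothetical stand-in for any LLM sampling call.

```python
# Chain-of-thought prompt + self-consistency by majority vote over sampled answers.
# `generate` is a hypothetical stand-in for an LLM API call that returns sampled text.
from collections import Counter

FEW_SHOT_COT = (
    "Q: Roger has 5 balls and buys 2 cans of 3 balls each. How many balls now?\n"
    "A: He buys 2 * 3 = 6 balls. 5 + 6 = 11. The answer is 11.\n\n"
)

def self_consistent_answer(question, generate, n_samples=5, temperature=0.7):
    prompt = FEW_SHOT_COT + f"Q: {question}\nA:"
    answers = []
    for _ in range(n_samples):
        completion = generate(prompt, temperature=temperature)   # sampled reasoning chain
        if "The answer is" in completion:
            answers.append(completion.split("The answer is")[-1].strip(" ."))
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]                 # majority vote
```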
Instructions for the final project report are here
- Limitations of parametric LLMs
- What are retrieval-augmented LMs?
- Benefits of retrieval-augmented LMs
- Past: Architecture and training of retrieval-augmented LMs for downstream tasks
- Present: Retrieval-augmented generation with LLMs
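The retrieval-augmented generation topics above follow a retrieve-then-read pattern; a minimal sketch using TF-IDF retrieval from scikit-learn, with a hypothetical `generate` call standing in for the reader LLM and a toy document collection.

```python
# Minimal retrieval-augmented generation: TF-IDF retrieval, then prompt an LLM with the hits.
# `generate` is a hypothetical stand-in for any LLM call.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "The course has two assignments worth 25% each.",
    "The final project counts for 50%, split into presentation and report.",
    "Office hours are held weekly by the teaching assistants.",
]

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(documents)

def retrieve(query, k=2):
    query_vec = vectorizer.transform([query])
    scores = cosine_similarity(query_vec, doc_vectors)[0]
    top = scores.argsort()[::-1][:k]
    return [documents[i] for i in top]

def rag_answer(query, generate):
    context = "\n".join(retrieve(query))
    prompt = f"Answer using only the context.\nContext:\n{context}\n\nQuestion: {query}\nAnswer:"
    return generate(prompt)
```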
Assignment 2 is out here. Deadline: 17 Nov 2025.
- General concepts of efficient inference methods for LLM serving
- Speculative decoding systems
- Model-based efficiency
- Paged attention
- Flash attention
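For the model-based efficiency items above, recent PyTorch releases expose fused, FlashAttention-style kernels behind a single call; a minimal sketch, assuming PyTorch 2.x (on unsupported hardware it silently falls back to the standard math implementation).

```python
# Fused attention in PyTorch 2.x: scaled_dot_product_attention can dispatch to
# FlashAttention-style kernels on supported hardware (memory-efficient attention).
import torch
import torch.nn.functional as F

batch, heads, seq_len, head_dim = 2, 8, 1024, 64
q = torch.randn(batch, heads, seq_len, head_dim)
k = torch.randn(batch, heads, seq_len, head_dim)
v = torch.randn(batch, heads, seq_len, head_dim)

out = F.scaled_dot_product_attention(q, k, v, is_causal=True)  # causal LM-style attention
print(out.shape)   # torch.Size([2, 8, 1024, 64])
```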
(Moved to Week 13)