AI6130: Large Language Models

Course Objectives

The course on Large Language Models (LLMs) aims to equip students with a comprehensive understanding of the principles, architectures, and applications of state-of-the-art LLMs like GPT-4. This course is designed for graduate students in computer science, data science, and related fields who have a foundational knowledge of machine learning and artificial intelligence. By taking this course, students will gain valuable skills in developing, fine-tuning, and deploying LLMs, which are increasingly integral to advancements in natural language processing, automated content creation, and AI-driven decision-making. This expertise will not only enhance their academic and research capabilities but also significantly boost their employability in tech industries, research institutions, and innovative startups focused on AI and machine learning technologies.

Optional Textbooks

  • Deep Learning by Goodfellow, Bengio, and Courville free online
  • Machine Learning — A Probabilistic Perspective by Kevin Murphy online
  • Natural Language Processing by Jacob Eisenstein free online
  • Speech and Language Processing by Dan Jurafsky and James H. Martin (3rd ed. draft)

Optional Papers

  • On the Opportunities and Risks of Foundation Models
  • Multimodal Foundation Models: From Specialists to General-Purpose Assistants
  • Large Multimodal Models: Notes on CVPR 2023 Tutorial
  • A Comprehensive Survey on Pretrained Foundation Models: A History from BERT to ChatGPT
  • Interactive Natural Language Processing
  • Towards Reasoning in Large Language Models: A Survey

Intended Learning Outcomes

By the end of this course, you should be able to:

  • Analyze the underlying architectures and mechanisms of large language models.

  • Implement and fine-tune large language models for specific applications.

  • Evaluate the performance of large language models in various contexts.

  • Design novel applications leveraging large language models to solve real-world problems.

  • Critically assess the limitations and potential improvements of current large language models.

Assessment Approach

Assignments (individually graded)

  • There will be two (2) assignments contributing to 2 * 25% = 50% of the total assessment.
  • Students are graded individually on the assignments. They may discuss the homework with one another, but each student must submit their own write-ups and coding exercises.

Final Project (Group work but individually graded)

  • There will be a final project contributing to the remaining 50% of the total coursework assessment.
    • 3–6 people per group
    • Presentation: 20%, report: 30%
  • The project is group work, but students are graded individually. The final project presentation will assess each student's understanding of the project.

Course Prerequisites

  • Proficiency in Deep Learning models
  • Proficiency in Python (using Numpy and PyTorch)
  • Deep Learning and NLP basics

Teaching

Instructor

Luu Anh Tuan

[email protected]

Teaching Assistants

Nguyen Tran Cong Duy

[email protected]

Schedule & Course Content

Week 1: Introduction

Lecture Slide

Lecture Content

  • Logistics of the course
  • Introduction to deep learning
  • Types of deep learning
  • Introduction to large language models

Python & PyTorch Basics
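
As a small companion to this exercise, here is a minimal sketch of the PyTorch basics it covers (tensor creation, autograd, and a single manual gradient step); the specific tensors and learning rate are illustrative, not taken from the course notebook.

```python
import torch

# A parameter tensor that tracks gradients, and a toy scalar loss.
w = torch.randn(3, requires_grad=True)
x = torch.tensor([1.0, 2.0, 3.0])
loss = (w * x).sum() ** 2

# Backpropagate, then take one manual SGD step.
loss.backward()
with torch.no_grad():
    w -= 0.01 * w.grad
    w.grad.zero_()

print(w)
```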

Week 2: Neural Networks & Optimization Basics

Lecture Slide

Lecture Content

  • From Logistic Regression to Feed-forward NN
  • Activation functions
  • SGD with Backpropagation
  • Adaptive SGD (AdaGrad, Adam, RMSProp)
  • Word Embeddings
  • CNN
  • RNN
  • RNN variants
  • Information bottleneck issue with vanilla Seq2Seq
  • Attention to the rescue
  • Details of attention mechanism
  • Transformer architecture
    • Self-attention
    • Positional encoding
    • Multi-head attention

Practical exercise with PyTorch
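
To accompany the attention material above, here is a minimal scaled dot-product self-attention sketch in PyTorch (single head, no masking); the dimensions and variable names are illustrative and not taken from the lecture code.

```python
import math
import torch

def scaled_dot_product_attention(q, k, v):
    # q, k, v: (batch, seq_len, d_model)
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))  # (batch, seq, seq)
    weights = torch.softmax(scores, dim=-1)                    # attention weights
    return weights @ v                                         # weighted sum of values

x = torch.randn(2, 5, 16)          # a toy batch of two 5-token sequences
w_q = torch.nn.Linear(16, 16)
w_k = torch.nn.Linear(16, 16)
w_v = torch.nn.Linear(16, 16)
out = scaled_dot_product_attention(w_q(x), w_k(x), w_v(x))
print(out.shape)  # torch.Size([2, 5, 16])
```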

Suggested Readings

Week 3: Language Models

Lecture Slide

Lecture Content

  • Language model
  • N-gram based LM
  • Window-based Language Model
  • Neural Language Models
  • Encoder-decoder
  • Seq2Seq
  • Sampling algorithms
  • Beam search
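
As an illustration of the decoding algorithms listed above, a minimal beam search sketch over log-probabilities; `next_token_logprobs` is a hypothetical callable standing in for any language model, and the beam size and maximum length are arbitrary.

```python
import math

def beam_search(next_token_logprobs, bos, eos, beam_size=3, max_len=20):
    # Each hypothesis is (tokens, cumulative log-probability).
    beams = [([bos], 0.0)]
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            if tokens[-1] == eos:                 # finished hypotheses are kept as-is
                candidates.append((tokens, score))
                continue
            for tok, logp in next_token_logprobs(tokens).items():
                candidates.append((tokens + [tok], score + logp))
        # Keep the top `beam_size` hypotheses by cumulative log-probability.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
        if all(t[-1] == eos for t, _ in beams):
            break
    return beams[0]

# Toy distribution: token 1 is most likely, token 2 is the end-of-sequence symbol.
def toy_model(tokens):
    return {1: math.log(0.6), 2: math.log(0.3), 3: math.log(0.1)}

print(beam_search(toy_model, bos=0, eos=2))
```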

Suggested Readings

Week 4: Effective Transformers

Lecture Slide

Instructions for choosing the final project topic

Lecture Content

  • FFN
  • Mixture of Experts
  • Attention
  • Layer Norm
  • Positional Encoding
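
A minimal sketch of one efficient-FFN variant covered here, a SwiGLU-style gated feed-forward block in the spirit of the Shazeer GLU-variants reading below; the hidden size and module names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUFFN(nn.Module):
    """Transformer feed-forward block with a SwiGLU gate instead of a plain ReLU MLP."""
    def __init__(self, d_model, d_hidden):
        super().__init__()
        self.w_gate = nn.Linear(d_model, d_hidden, bias=False)
        self.w_up = nn.Linear(d_model, d_hidden, bias=False)
        self.w_down = nn.Linear(d_hidden, d_model, bias=False)

    def forward(self, x):
        # SiLU(x W_gate) elementwise-gates (x W_up), then project back to d_model.
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))

ffn = SwiGLUFFN(d_model=64, d_hidden=256)
print(ffn(torch.randn(2, 10, 64)).shape)  # torch.Size([2, 10, 64])
```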

Suggested Readings

  • Attention Is All You Need
  • Hendrycks and Gimpel. 2016. Gaussian Error Linear Units.
  • Ramachandran et al. 2017. Searching for Activation Functions.
  • Shazeer 2017. GLU Variants Improve Transformer
  • Ainslie et al. 2023. GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
  • Noam Shazeer. 2019. Fast transformer decoding: One write-head is all you need.
  • DeepSeek team. DeepSeek-V2

Week 5: Pretrained Language Models and Large Language Models

Lecture Slide

Lecture Content

  • About pre-training
  • Why we need pre-training
  • Does pre-training indeed help?
  • Pre-trained Language models
  • Large Language Models
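
To make the pre-training objective concrete, a minimal sketch of the causal (next-token) language-modeling loss in PyTorch; the vocabulary size and batch shapes are illustrative, and the random `logits` stand in for the outputs of a Transformer decoder.

```python
import torch
import torch.nn.functional as F

vocab_size, batch, seq_len = 100, 2, 8
tokens = torch.randint(0, vocab_size, (batch, seq_len))   # a toy token batch
logits = torch.randn(batch, seq_len, vocab_size)          # stand-in for model outputs

# Next-token prediction: position t is trained to predict the token at t + 1.
shift_logits = logits[:, :-1, :].reshape(-1, vocab_size)
shift_labels = tokens[:, 1:].reshape(-1)
loss = F.cross_entropy(shift_logits, shift_labels)
print(loss.item())
```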

Suggested Readings

  • Chang, Y., Wang, X., Wang, J., Wu, Y., Zhu, K., Chen, H., Yang, L., Yi, X., Wang, C., Wang, Y., Ye, W., Zhang, Y., Chang, Y., Yu, P.S., Yang, Q., & Xie, X. (2023). A Survey on Evaluation of Large Language Models. ArXiv, abs/2307.03109
  • Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., ... & Liu, P. J. (2020). Exploring the limits of transfer learning with a unified text-to-text transformer. The Journal of Machine Learning Research, 21(1), 5485-5551.
  • A. Vaswani et al., “Attention is All you Need,” in Advances in Neural Information Processing Systems (NeurIPS), 2017.
  • Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., ... & Amodei, D. (2020). Language models are few-shot learners. Advances in neural information processing systems, 33, 1877-1901.
  • Chowdhery, A., Narang, S., Devlin, J., Bosma, M., Mishra, G., Roberts, A., ... & Fiedel, N. (2023). PaLM: Scaling language modeling with pathways. Journal of Machine Learning Research, 24(240), 1-113.
  • Chen, Mark, et al. Evaluating Large Language Models Trained on Code. arXiv:2107.03374, arXiv, 14 July 2021. arXiv.org, https://doi.org/10.48550/arXiv.2107.03374.
  • Touvron, Hugo, et al. Llama 2: Open Foundation and Fine-Tuned Chat Models. arXiv:2307.09288, arXiv, 19 July 2023. arXiv.org, https://doi.org/10.48550/arXiv.2307.09288.
  • Jiang, Albert Q., et al. Mixtral of Experts. arXiv:2401.04088, arXiv, 8 Jan. 2024. arXiv.org, https://doi.org/10.48550/arXiv.2401.04088.

Practical

Week 6: LLM finetuning

Lecture Slide

Lecture Content

  • LLM full finetuning
  • In-context learning
  • Parameter-efficient finetuning
  • Instruction finetuning
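
A minimal sketch of one parameter-efficient finetuning idea, a LoRA-style low-rank adapter wrapped around a frozen linear layer; the rank, scaling factor, and class name are illustrative and not tied to any particular library.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pretrained linear layer plus a trainable low-rank update A @ B."""
    def __init__(self, base: nn.Linear, rank=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False              # freeze the pretrained weights
        self.lora_a = nn.Parameter(torch.randn(base.in_features, rank) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(rank, base.out_features))
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + (x @ self.lora_a @ self.lora_b) * self.scale

layer = LoRALinear(nn.Linear(64, 64))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # only the low-rank matrices are trainable
```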

Week 7: Instruction tuning & RLHF

Assignment 1 is out here. Deadline: 13 Oct 2025.

Lecture Slide

Lecture Content

  • Instruction tuning
  • Multitask Prompted Training Enables Zero-shot Task Generalization (T0)
  • LIMA: Less Is More for Alignment
  • InstructGPT
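
A minimal sketch of how instruction-tuning examples are commonly formatted and loss-masked, so that only the response tokens contribute to the loss; the toy token ids, template, and use of -100 as the ignore index follow common practice rather than a specific lecture implementation.

```python
import torch
import torch.nn.functional as F

prompt_ids = [5, 6, 7]          # tokenized instruction/prompt (toy ids)
response_ids = [8, 9, 10, 2]    # tokenized response, ending in EOS (toy ids)

input_ids = torch.tensor([prompt_ids + response_ids])
labels = torch.tensor([[-100] * len(prompt_ids) + response_ids])  # mask the prompt

logits = torch.randn(1, input_ids.size(1), 50)   # stand-in for model outputs
loss = F.cross_entropy(
    logits[:, :-1].reshape(-1, 50),
    labels[:, 1:].reshape(-1),
    ignore_index=-100,          # prompt positions do not contribute to the loss
)
print(loss.item())
```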

Week 8: RLHF recap & DPO

Lecture Slide

Lecture Content

  • Reinforcement learning from human feedback (RLHF)
  • Direct preference optimization (DPO)
  • Frontier, pitfalls and open problems of RLHF
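
A minimal sketch of the DPO objective on a single preference pair, assuming the sequence log-probabilities have already been computed under the policy and the frozen reference model; variable names and the beta value are illustrative.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    # Implicit reward of each response: beta * (log pi_theta - log pi_ref).
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    # Maximize the margin between chosen and rejected via a logistic loss.
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()

loss = dpo_loss(torch.tensor([-12.3]), torch.tensor([-15.7]),
                torch.tensor([-13.0]), torch.tensor([-14.9]))
print(loss.item())
```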

Week 9: LLM Prompting

Lecture Slide

Lecture Content

  • Chain-of-Thought Prompting
  • Self-Consistency Improves Chain of Thought Reasoning in Language Models
  • Tree of Thoughts Prompting
  • Program of Thoughts Prompting
  • Least-to-Most Prompting Enables Complex Reasoning in Large Language Models
  • Measuring and Narrowing the Compositionality Gap in Language Models
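
A minimal sketch of self-consistency decoding from the list above: sample several chain-of-thought completions and take a majority vote over the final answers. `sample_chain_of_thought` and `extract_answer` are hypothetical stand-ins for a sampled LLM call and an answer parser.

```python
import random
from collections import Counter

def sample_chain_of_thought(question: str) -> str:
    # Hypothetical stand-in for a sampled LLM completion (temperature > 0).
    answer = random.choice(["42", "42", "41"])
    return f"Step-by-step reasoning... The answer is {answer}."

def extract_answer(completion: str) -> str:
    # Hypothetical parser that pulls the final answer out of the reasoning chain.
    return completion.rsplit("The answer is ", 1)[-1].rstrip(".")

def self_consistency(question: str, num_samples: int = 10) -> str:
    answers = [extract_answer(sample_chain_of_thought(question))
               for _ in range(num_samples)]
    return Counter(answers).most_common(1)[0][0]   # majority vote over sampled answers

print(self_consistency("What is 6 * 7?"))
```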

Week 10: Retrieval-augmented LMs

Lecture Slide

Instructions for the final project report here

Lecture Content

  • Limitations of parametric LLMs
  • What are retrieval-augmented LMs?
  • Benefits of retrieval-augmented LMs
  • Past: Architecture and training of retrieval-augmented LMs for downstream tasks
  • Present: Retrieval-augmented generation with LLMs
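
A minimal retrieval-augmented generation sketch: embed a small document collection, retrieve the passages most similar to the query by cosine similarity, and prepend them to the prompt. The `embed` function is a hypothetical stand-in for an embedding model, and the final LLM call is only indicated in a comment.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Hypothetical embedding model; here a fixed random projection of byte counts.
    rng = np.random.default_rng(0)
    proj = rng.standard_normal((256, 64))
    counts = np.bincount(np.frombuffer(text.encode(), dtype=np.uint8), minlength=256)
    return counts @ proj

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    q = embed(query)
    scores = []
    for d in docs:
        v = embed(d)
        scores.append(q @ v / (np.linalg.norm(q) * np.linalg.norm(v) + 1e-8))  # cosine
    top = np.argsort(scores)[::-1][:k]
    return [docs[i] for i in top]

docs = ["AI6130 covers large language models.",
        "Retrieval augments an LLM with external documents.",
        "Paris is the capital of France."]
context = "\n".join(retrieve("What does the course cover?", docs))
prompt = f"Context:\n{context}\n\nQuestion: What does the course cover?\nAnswer:"
# An LLM call (not shown) would then be given `prompt` to produce a grounded answer.
print(prompt)
```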

Week 11: Efficient Inference Methods

Assignment 2 is out here. Deadline: 17 Nov 2025.

Lecture Slide

Lecture Content

  • General concepts of efficient inference methods for LLM serving
  • Speculative decoding systems
  • Model-based efficiency
  • Paged attention
  • Flash attention
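
A minimal sketch of the greedy speculative-decoding verification step: a small draft model proposes a block of tokens and the target model checks the whole block in one pass, accepting the longest matching prefix. `draft_next_token` and `target_greedy_tokens` are hypothetical callables standing in for the two models.

```python
def speculative_step(prefix, draft_next_token, target_greedy_tokens, k=4):
    # 1) The draft model proposes k tokens autoregressively (cheap).
    draft = []
    for _ in range(k):
        draft.append(draft_next_token(prefix + draft))
    # 2) The target model scores the whole proposed block in a single forward pass,
    #    abstracted here as the greedy token it would emit at each position.
    target = target_greedy_tokens(prefix, draft)   # length k + 1
    # 3) Accept draft tokens until the first disagreement, then take the target's token.
    accepted = []
    for i, tok in enumerate(draft):
        if tok == target[i]:
            accepted.append(tok)
        else:
            accepted.append(target[i])
            break
    else:
        accepted.append(target[k])                 # bonus token when all drafts match
    return prefix + accepted

# Toy models: the draft always guesses 1, the target wants 1, 1, 2, ...
draft_model = lambda seq: 1
target_model = lambda prefix, draft: [1, 1, 2, 2, 2]
print(speculative_step([0], draft_model, target_model))  # [0, 1, 1, 2]
```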

Week 12: LLM agents and Agentic AI

Moved to Week 13.

Lecture Slide
