A collection of up to 10 of the most impressive AI papers from each year. Actively updated.
- [2025] Qwen3 Technical Report (Qwen3). [paper]
- [2025] DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning (DeepSeek-R1). [paper]
- [2025] Kimi K1.5: Scaling Reinforcement Learning with LLMs (Kimi K1.5). [paper]
- [2025] Scaling In-the-Wild Training for Diffusion-based Illumination Harmonization and Editing (IC-Light, ICLR 2025). [paper]
- [2025] Learning to (Learn at Test Time): RNNs with Expressive Hidden States (TTT). [paper]
- [2024] The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery (2024). [paper]
- [2024] Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters (2024). [paper]
- [2024] The Llama 3 Herd of Models (Llama 3). [paper]
- [2024] Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context (Gemini 1.5). [paper]
- [2024] Mixtral of Experts (SMoE). [paper]
- [2024] Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone (Phi-3). [paper]
- [2024] A Survey on Evaluation of Large Language Models (Survey). [paper]
- [2024] Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Models (Vim). [paper]
- [2024] DeepSeek-V3 Technical Report (DeepSeek-V3). [paper]
- [2023] GPT-4 Technical Report (GPT-4). [paper]
- [2023] Llama 2: Open Foundation and Fine-Tuned Chat Models (Llama 2). [paper]
- [2023] Mamba: Linear-time sequence modeling with selective state spaces (Mamba). [paper]
- [2023] LLaMA: Open and Efficient Foundation Language Models (LLaMA). [paper]
- [2023] QLoRA: Efficient Finetuning of Quantized LLMs (QLoRA). [paper]
- [2023] Gemini: A Family of Highly Capable Multimodal Models (Gemini). [paper]
- [2023] Qwen Technical Report (Qwen). [paper]
- [2023] PaLM: Scaling Language Modeling with Pathways (PaLM, JMLR 2023). [paper]
- [2023] Visual Instruction Tuning (LLaVA, NeurIPS 2023). [paper]
- [2022] Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (CoT). [paper]
- [2022] Training language models to follow instructions with human feedback (InstructGPT, GPT-3.5). [paper]
- [2022] Masked Autoencoders Are Scalable Vision Learners (MAE, CVPR 2022). [paper]
- [2022] High-Resolution Image Synthesis with Latent Diffusion Models (StableDiffusion, CVPR 2022). [paper]
- [2022] LoRA: Low-Rank Adaptation of Large Language Models (ICLR 2022). [paper]
- [2022] Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding (Imagen, NeurIPS 2022). [paper]
- [2021] Emerging Properties in Self-Supervised Vision Transformers (DINO). [paper]
- [2021] Highly accurate protein structure prediction with AlphaFold (Nature 2021). [paper]
- [2021] Swin Transformer: Hierarchical Vision Transformer using Shifted Windows (Swin, ICCV 2021). [paper]
- [2021] An image is worth 16x16 words: Transformers for image recognition at scale (ViT, ICLR 2021). [paper]
- [2021] Learning Transferable Visual Models From Natural Language Supervision (CLIP, ICML 2021). [paper]
- [2021] Zero-Shot Text-to-Image Generation (DALL-E, ICML 2021). [paper]
- [2021] Evaluating Large Language Models Trained on Code (Codex). [paper]
- [2020] End-to-End Object Detection with Transformers (DETR, ECCV 2020). [paper]
- [2020] Language Models are Few-Shot Learners (GPT-3, NeurIPS 2020). [paper]
- [2020] Denoising Diffusion Probabilistic Models (Diffusion, NeurIPS 2020). [paper]
- [2020] YOLOv4: Optimal Speed and Accuracy of Object Detection (YOLOv4). [paper]
- [2020] Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (T5). [paper]
- [2020] EfficientDet: Scalable and Efficient Object Detection (EfficientDet, CVPR 2020). [paper]
- [2020] A Simple Framework for Contrastive Learning of Visual Representations (ICML 2020). [paper]
- [2020] ALBERT: A Lite BERT for Self-supervised Learning of Language Representations (ALBERT, ICLR 2020). [paper]
- [2019] Language Models are Unsupervised Multitask Learners (GPT-2). [paper]
- [2019] Decoupled Weight Decay Regularization (AdamW, ICLR 2019). [paper]
- [2019] BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension (BART). [paper]
- [2018] Improving language understanding by generative pre-training (GPT-1). [paper]
- [2018] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (BERT). [paper]
- [2017] Mastering the game of Go without human knowledge (AlphaGo Zero, Nature 2017). [paper]
- [2017] Attention Is All You Need (Transformer, NeurIPS 2017). [paper]
- [2017] PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation (PointNet, CVPR 2017). [paper]
- [2017] Mask R-CNN (ICCV 2017). [paper]
- [2016] Neural Architecture Search with Reinforcement Learning (NAS). [paper]
- [2016] Mastering the game of Go with deep neural networks and tree search (AlphaGo, Nature 2016). [paper]
- [2016] Deep Residual Learning for Image Recognition (ResNet, CVPR 2016). [paper]
- [2016] You only look once: Unified, real-time object detection (YOLO, CVPR 2016). [paper]
- [2015] Deep learning (Nature 2015). [paper]
- [2015] Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift (BN, ICML 2015). [paper]
- [2015] Adam: A Method for Stochastic Optimization (Adam). [paper]
- [2015] U-Net: Convolutional Networks for Biomedical Image Segmentation (U-Net). [paper]
- [2015] Very deep convolutional networks for large-scale image recognition (VGG, ICLR 2015). [paper]
- [2014] Generative Adversarial Nets (GAN, NeurIPS 2014). [paper]
- [2014] Neural Machine Translation by Jointly Learning to Align and Translate (Attention). [paper]
- [2014] Dropout: a simple way to prevent neural networks from overfitting (Dropout). [paper]
- [2014] Sequence to Sequence Learning with Neural Networks (Seq2seq, NeurIPS 2014). [paper]
- [2014] Distilling the Knowledge in a Neural Network (Knowledge Distillation). [paper]
- [2013] Distributed Representations of Words and Phrases and their Compositionality (word2vec, NeurIPS 2013). [paper]
- [2013] Playing Atari with Deep Reinforcement Learning (DQN). [paper]
- [2013] Auto-Encoding Variational Bayes (VAE). [paper]
- [2012] ImageNet Classification with Deep Convolutional Neural Networks (AlexNet, NeurIPS 2012). [paper]
- [2011] Deep Sparse Rectifier Neural Networks (ReLU). [paper]
- [2009] Imagenet: A large-scale hierarchical image database (CVPR 2009). [paper]
- [1998] Gradient-based learning applied to document recognition (CNN, Proceedings of the IEEE 1998). [paper]
- [1997] Long Short-Term Memory (LSTM, Neural Computation 1997). [paper]
- [1990] Finding Structure in Time (RNN). [paper]
- [1986] Learning internal representations by error propagation (BP, Parallel Distributed Processing 1986). [paper]
- This collection is somewhat subjective and limited by the curator's knowledge. Apologies for any omissions.
- This list was preceded by [another awesome deep learning list].