A curated list of papers, datasets, and toolkits for Code-Switching & Code-Mixing in Natural Language Processing.
Click on any link to jump to the corresponding section on this page.
- Survey Papers
- 1. NLP Tasks
- 2. Datasets & Resources
- 3. Model Training & Adaptation
- 4. Evaluation & Benchmarking
- 5. Multi- & Cross-Modal Applications
- Workshops & Shared Tasks
- Contributing
Comprehensive reviews of the code-switching research landscape. A great place to start.
- A Survey of Current Datasets for Code-Switching Research - Jose, N., et al. (2020).
- A Survey of Code-switched Speech and Language Processing - Sitaram, S., et al. (2020).
- A Survey of Code-switching: Linguistic and Social Perspectives for Language Technologies - Doğruöz, A. S., et al. (2021).
- The Decades Progress on Code-Switching Research in NLP: A Systematic Survey on Trends and Challenges - Winata, G. I., et al. (2023).
- A Survey of Code-switched Arabic NLP: Progress, Challenges, and Future Directions - Hamed, I., et al. (2025).
- Position Paper
  - Building Educational Technologies for Code-Switching: Current Practices, Difficulties and Future Directions - Li Nguyen, et al. (2022).
Tasks focused on understanding, parsing, and extracting meaning from code-mixed text.
- Word Level Language Identification in English Telugu Code Mixed Data - Gundapu, S. & Mamidi, R. (2018).
- A fast, compact, accurate model for language identification of codemixed text - Zhang, Y., et al. (2018).
- Language identification framework in code-mixed social media text based on quantum LSTM - Shekhar, S., et al. (2020).
- A Pre-trained Transformer and CNN model with Joint Language ID and Part-of-Speech Tagging for Code-Mixed Social-Media Text - Dowlagar, S., et al. (2021).
- Much Gracias: Semi-supervised Code-switch Detection for Spanish-English: How far can we get? - Iliescu, D.-M., et al. (2021).
- IRNLP_DAIICT@LT-EDI-EACL2021: Hope Speech detection in Code Mixed text using TF-IDF Char N-grams and MuRIL - Dave, B., et al. (2021).
- Transformer-based Model for Word Level Language Identification in Code-mixed Kannada-English Texts - Lambebo Tonja, A., et al. (2022).
- TongueSwitcher: Fine-Grained Identification of German-English Code-Switching - Sterner, I., & Teufel, S. (2023).
- Representativeness as a Forgotten Lesson for Multilingual and Code-switched Data Collection and Preparation - Doğruöz, A. S., et al. (2023).
- Multilingual Large Language Models Are Not (Yet) Code-Switchers - Zhang, R., et al. (2023).
- Code-Switched Language Identification is Harder Than You Think - Burchell, L., et al. (2024).
- Multilingual Identification of English Code-Switching - Sterner, I. (2024).
- MaskLID: Code-Switching Language Identification through Iterative Masking - Kargaran, A. H., et al. (2024).
- Offensive Language Identification
  - MUCS@DravidianLangTech-EACL2021:COOLI-Code-Mixing Offensive Language Identification - Balouchzahi, F., et al. (2021).
  - SJ_AJ@DravidianLangTech-EACL2021: Task-Adaptive Pre-Training of Multilingual BERT models for Offensive Language Identification - Jayanthi, S. M., et al. (2021).
  - DravidianCodeMix: Sentiment Analysis and Offensive Language Identification Dataset for Dravidian Languages in Code-Mixed Text - Chakravarthi, B. R., et al. (2022).
  - Offensive Content Detection Via Synthetic Code-Switched Text - Salaam, C., et al. (2022).
  - Offensive Language Identification in Transliterated and Code-Mixed Bangla - Raihan, M. N., et al. (2023).
  - OffMix-3L: A Novel Code-Mixed Test Dataset in Bangla-English-Hindi for Offensive Language Identification - Goswami, D., et al. (2023).
  - Towards Safer Communities: Detecting Aggression and Offensive Language in Code-Mixed Tweets to Combat Cyberbullying - Nafis, N., et al. (2023).
  - SetFit: A Robust Approach for Offensive Content Detection in Tamil-English Code-Mixed Conversations Using Sentence Transfer Fine-tuning - Kathiravan Pannerselvam, et al. (2024).
  - LLMsAgainstHate@NLU of Devanagari Script Languages 2025: Hate Speech Detection and Target Identification in Devanagari Languages via Parameter Efficient Fine-Tuning of LLMs - Rushendra Sidibomma, et al. (2025).
- Hope Speech Detection
  - SJ_AJ@DravidianLangTech-EACL2021: Task-Adaptive Pre-Training of Multilingual BERT models for Offensive Language Identification - Jayanthi, S. M., et al. (2021).
  - IRNLP_DAIICT@LT-EDI-EACL2021: Hope Speech detection in Code Mixed text using TF-IDF Char N-grams and MuRIL - Dave, B., et al. (2021).
  - Hope Speech Detection in Code-Mixed Text: Exploring Deep Learning Models and Language Effects - Bhat, S., et al. (2021).
  - Hope Speech Detection in code-mixed Roman Urdu tweets - Ahmad, M., et al. (2025).
- POS Tagging of English-Hindi Code-Mixed Social Media Content - Vyas, Y., et al. (2014).
- POS Tagging of Hindi-English Code Mixed Text from Social Media: Some Machine Learning Experiments - Sequiera, R., et al. (2015).
- Development of POS tagger for English-Bengali Code-Mixed data - Raha, T., et al. (2019).
- Creation of Corpus and Analysis in Code-Mixed Kannada-English Social Media Data for POS Tagging - Appidi, A. R., et al. (2020).
- A Pre-trained Transformer and CNN Model with Joint Language ID and Part-of-Speech Tagging for Code-Mixed Social-Media Text - Dowlagar, S. & Mamidi, R. (2021).
- Are Multilingual Models Effective in Code-Switching? - Winata, G. I., et al. (2021).
- On Utilizing Constituent Language Resources to Improve Downstream Tasks in Hinglish - Kumar, V., et al. (2022).
- PRO-CS : An Instance-Based Prompt Composition Technique for Code-Switched Tasks - Bansal, S., et al. (2022).
- PACMAN: PArallel CodeMixed dAta generatioN for POS tagging - Chatterjee, A., et al. (2022).
- CoMix: Guide Transformers to Code-Mix using POS structure and Phonetics - Arora, G., et al. (2023).
- Improving Sentiment Analysis for Ukrainian Social Media Code-Switching Data - Shynkarov, Y., et al. (2025).
- Fine-Tuning Cross-Lingual LLMs for POS Tagging in Code-Switched Contexts - Absar, S. (2025).
- Tackling Code-Switched NER: Participation of CMU - Geetha, P., et al. (2018).
- Cross Script Hindi English NER Corpus from Wikipedia - Ansari, M. Z., et al. (2019).
- Character level neural architectures for boosting named entity recognition in code mixed tweets - Narayanan, A., et al. (2020).
- CoSDA-ML: Multi-Lingual Code-Switching Data Augmentation for Zero-Shot Cross-Lingual NLP - Qin, L., et al. (2020).
- Contextual Embeddings for Arabic-English Code-Switched Data - Sabty, C., et al. (2020).
- Named Entity Recognition for Code Mixed Social Media Sentences - Sharma, Y., et al. (2021).
- Switch Point biased Self-Training: Re-purposing Pretrained Models for Code-Switching - Chopra, P., et al. (2021).
- Performance analysis of named entity recognition approaches on code-mixed data - Gaddamidi, S. & Prasath, R. R. (2021).
- Are Multilingual Models Effective in Code-Switching? - Winata, G. I., et al. (2021).
- CMNEROne at SemEval-2022 Task 11: Code-Mixed Named Entity Recognition by leveraging multilingual data - Dowlagar, S. & Mamidi, R. (2022).
- UM6P-CS at SemEval-2022 Task 11: Enhancing Multilingual and Code-Mixed Complex Named Entity Recognition via Pseudo Labels using Multilingual Transformer - El Mekki, A., et al. (2022).
- "Kanglish alli names!" Named Entity Recognition for Kannada-English Code-Mixed Social Media Data - S, Sumukh & Shrivastava, M. (2022).
- MELM: Data Augmentation with Masked Entity Language Modeling for Low-Resource NER - Zhou, R., et al. (2022).
- CMB AI Lab at SemEval-2022 Task 11: A Two-Stage Approach for Complex Named Entity Recognition via Span Boundary Detection and Span Classification - Pu, K., et al. (2022).
- On Utilizing Constituent Language Resources to Improve Downstream Tasks in Hinglish - Kumar, V., et al. (2022).
- CoCoa: An Encoder-Decoder Model for Controllable Code-switched Generation - Mondal, S., et al. (2022).
- Sebastian, Basti, Wastl?! Recognizing Named Entities in Bavarian Dialectal Data - Peng, S., et al. (2024).
- GPT-NER: Named Entity Recognition via Large Language Models - Wang, S., et al. (2025).
- Code-Mixing in Social Media Text: The Last Language Identification Frontier? - Mave, D., et al. (2018).
- Sentiment Analysis of Code-Mixed Hinglish - Saha, R., et al. (2020).
- Sentiment Analysis of Code-Mixed Indian Languages: An Overview of SAIL_Code-Mixed Shared Task @ICON-2017 - Patra, B. G., et al. (2020).
- Overview of the Mixed Script Identification @ ICON-2020 - Sequiera, R., et al. (2020).
- SemEval-2020 Task 9: Overview of Sentiment Analysis of Code-Mixed Tweets - Patwa, P., et al. (2020).
- Sentiment Analysis for Hinglish Code-mixed Tweets by means of Cross-lingual Word Embeddings - Tiwari, P., et al. (2020).
- Sentiment Analysis in Code-Mixed Telugu-English Text with Multilingual Embeddings - Yasaswini, K., et al. (2020).
- Data Augmentation for Low-Resource Code-Switching Speech Recognition - Gonen, H., et al. (2020).
- CoSDA-ML: Multi-Lingual Code-Switching Data Augmentation for Zero-Shot Cross-Lingual NLP - Qin, L., et al. (2020).
- Evaluating Input Representation for Language Identification in Hindi-English Code Mixed Text - Singh, K., et al. (2020).
- BERT-based Language Identification in Code-Mixed Social Media Text - Dowlagar, S., et al. (2021).
- Multitask Learning for Emotionally Analyzing Code-Mixed Social Media Text - Dowlagar, S., et al. (2021).
- Offensive Language Detection in Code-Mixed Social Media Text - Suryawanshi, S., et al. (2021).
- From Machine Translation to Code-Switching: Generating High-Quality Code-Switched Text - Tarunesh, I., et al. (2021).
- Sentiment Analysis For Code-Mixed Indian Social Media Text With Code-Mix Embedding - Suryawanshi, S., et al. (2021).
- Hope Speech Detection in Code-Mixed Dravidian Languages - Chakravarthi, B. R., et al. (2021).
- DravidianCodeMix: Sentiment Analysis and Offensive Language Identification Dataset for Dravidian Languages in Code-Mixed Text - Chakravarthi, B. R., et al. (2022).
- Code-Switching Patterns in Multilingual Dialogue Systems - Sitaram, S., et al. (2022).
- Code-Switching Text Generation for Multilingual Dialogue - Sitaram, S., et al. (2022).
- PRO-CS : An Instance-Based Prompt Composition Technique for Code-Switched Tasks - Bansal, S., et al. (2022).
- Multi-Label Emotion Classification on Code-Mixed Text: Data and Methods - Ameer, I., et al. (2022).
- Code-Mixed Sentiment Analysis with Pretrained Language Models - Sitaram, S., et al. (2022).
- Code-Mixed Sentiment Analysis with Data Augmentation - Saha, R., et al. (2022).
- Sentiment Analysis in Code-Mixed Low-Resource Dravidian Languages - Chakravarthi, B. R., et al. (2023).
- Multitask Learning for Code-Mixed Sentiment and Emotion Analysis - Dowlagar, S., et al. (2023).
- Sentiment Analysis for Code-Mixed Indian Language Texts - Sitaram, S., et al. (2023).
- Emotion Analysis in Code-Mixed WhatsApp Messages - Suryawanshi, S., et al. (2023).
- Offensive Language Identification in Code-Mixed Dravidian Languages - Chakravarthi, B. R., et al. (2023).
- Emotion Detection in Code-Mixed Roman Urdu - English Text - Suryawanshi, S., et al. (2023).
- Sarcasm Detection in Dravidian Code-Mixed Text Using Transformer-Based Models - Bhaumik, A. B. & Das, M. (2023).
- Hate Speech Detection in Code-Mixed Hinglish Text - Saha, R., et al. (2023).
- Findings of the WILDRE Shared Task on Code-mixed Less-resourced Sentiment Analysis for Indo-Aryan Languages - Mishra, A., et al. (2024).
- WILDRE Shared Task: Sentiment Analysis in Code-Mixed Telugu-English Text - Chakravarthi, B. R., et al. (2024).
- Code-Mixed Sentiment Analysis with Multimodal Data - Sitaram, S., et al. (2024).
- SemEval-2024 Task 9: Sentiment Analysis in Code-Mixed Text - Patra, B. G., et al. (2024).
- Emotion Analysis in Code-Mixed Social Media Text - Suryawanshi, S., et al. (2024).
- Improving Sentiment Analysis for Ukrainian Social Media Code-Switching Data - Shynkarov, Y., et al. (2025).
- Code-Mixed Sentiment Analysis with Low-Resource Settings - Sitaram, S., et al. (2025).
- Cross-Lingual Transfer for Code-Mixed Sentiment Analysis - Saha, R., et al. (2025).
- Code-Switching Language Modeling using Syntax-Aware Multi-Task Learning - Winata, G. I., et al. (2018).
- Language Modeling for Code-Mixing: The Role of Linguistic Theory based Synthetic Data - Pratapa, A., et al. (2018).
- Dependency Parser for Bengali-English Code-Mixed Data enhanced with a Synthetic Treebank - Ghosh, U., et al. (2019).
- A Semi-supervised Approach to Generate the Code-Mixed Text using Pre-trained Encoder and Transfer Learning - Gupta, D., et al. (2020).
- From Machine Translation to Code-Switching: Generating High-Quality Code-Switched Text - Tarunesh, I., et al. (2021).
- PreCogIIITH at HinglishEval : Leveraging Code-Mixing Metrics & Language Model Embeddings To Estimate Code-Mix Quality - Kodali, P., et al. (2022).
- SyMCoM - Syntactic Measure of Code Mixing A Study Of English-Hindi Code-Mixing - Kodali, P., et al. (2022).
- Improving Code-Switching Dependency Parsing with Semi-Supervised Auxiliary Tasks - Özateş, Ş. B., et al. (2022).
- CoMix: Guide Transformers to Code-Mix using POS structure and Phonetics - Arora, G., et al. (2023).
- CST5: Data Augmentation for Code-Switched Semantic Parsing - Agarwal, A., et al. (2023).
- Sebastian, Basti, Wastl?! Recognizing Named Entities in Bavarian Dialectal Data - Peng, S., et al. (2024).
- Towards Safer Communities: Detecting Aggression and Offensive Language in Code-Mixed Tweets to Combat Cyberbullying - Nafis, N., et al. (2023).
- Fine-Tuning Cross-Lingual LLMs for POS Tagging in Code-Switched Contexts - Absar, S. (2025).
- Representativeness as a Forgotten Lesson for Multilingual and Code-switched Data Collection and Preparation - Doğruöz, A. S., et al. (2023).
- A Survey of Code-switched Arabic NLP: Progress, Challenges, and Future Directions - Hamed, I., et al. (2025).
- From Human Judgements to Predictive Models: Unravelling Acceptability in Code-Mixed Sentences - Kodali, P., et al. (2025).
- IIT Gandhinagar at SemEval-2020 Task 9: Code-Mixed Sentiment Classification Using Candidate Sentence Generation and Selection - Srivastava, V. & Singh, M. (2020).
- Multilingual Code-Switching for Zero-Shot Cross-Lingual Intent Prediction and Slot Filling - Krishnan, J., et al. (2021).
- Regional language code-switching for natural language understanding and intelligent digital assistants - Rajeshwari, S. & Kallimani, J. S. (2021).
- Cost-Performance Optimization for Processing Low-Resource Language Tasks Using Commercial LLMs - Nag, A., et al. (2024).
- Uncovering Code-Mixed Challenges: A Framework for Linguistically Driven Question Generation and Neural based Question Answering - Gupta, D., et al. (2018).
- Code-Mixed Question Answering Challenge using Deep Learning Methods - Thara, S., et al. (2020).
- MLQA: Evaluating Cross-lingual Extractive Question Answering - Lewis, P., et al. (2020).
- The Effectiveness of Intermediate-Task Training for Code-Switched Natural Language Understanding - Prasad, A., et al. (2021).
- To Ask LLMs about English Grammaticality, Prompt Them in a Different Language - Behzad, S., et al. (2024).
- COMMIT: Code-Mixing English-Centric Large Language Model for Multilingual Instruction Tuning - Lee, J., et al. (2024).
- MEGAVERSE: Benchmarking Large Language Models Across Languages, Modalities, Models and Tasks - Ahuja, S., et al. (2024).
- Controlling Language Confusion in Multilingual LLMs - Lee, N., et al. (2025).
- Qorǵau: Evaluating Safety in Kazakh-Russian Bilingual Contexts - Goloburda, M., et al. (2025).
- Code-Switching Curriculum Learning for Multilingual Transfer in LLMs - Yoo, H., et al. (2025).
- Detecting entailment in code-mixed Hindi-English conversations - Chakravarthy, S., et al. (2020).
- A New Dataset for Natural Language Inference from Code-mixed Conversations - Khanuja, S., et al. (2020).
- CoSDA-ML: Multi-Lingual Code-Switching Data Augmentation for Zero-Shot Cross-Lingual NLP - Qin, L., et al. (2020).
- The Effectiveness of Intermediate-Task Training for Code-Switched Natural Language Understanding - Prasad, A., et al. (2021).
- On Utilizing Constituent Language Resources to Improve Downstream Tasks in Hinglish - Kumar, V., et al. (2022).
- Toward the Limitation of Code-Switching in Cross-Lingual Transfer - Feng, Y., et al. (2022).
- Aligning Multilingual Embeddings for Improved Code-switched Natural Language Understanding - Fazili, B., et al. (2022).
- Incontext Mixing (ICM): Codemixed Prompts for Multilingual LLMs - Shankar, B., et al. (2024).
- Using Contextually Aligned Online Reviews to Measure LLMs’ Performance Disparities Across Language Varieties - Tang, Z., et al. (2025).
Tasks focused on generating fluent and coherent code-mixed text.
- A Deep Generative Model for Code Switched Text - Samanta, B., et al. (2019).
- A Semi-supervised Approach to Generate the Code-Mixed Text using Pre-trained Encoder and Transfer Learning - Gupta, D., et al. (2020).
- Towards Code-Mixed Hinglish Dialogue Generation - Agarwal, V., et al. (2021).
- HinGE: A Dataset for Generation and Evaluation of Code-Mixed Hinglish Text - Srivastava, V., et al. (2021).
- From Machine Translation to Code-Switching: Generating High-Quality Code-Switched Text - Tarunesh, I., et al. (2021).
- PACMAN: PArallel CodeMixed dAta generatioN for POS tagging - Chatterjee, A., et al. (2022).
- MulZDG: Multilingual Code-Switching Framework for Zero-shot Dialogue Generation - Liu, Y., et al. (2022).
- Proceedings of the 15th International Conference on Natural Language Generation: Generation Challenges - Shaikh, S., et al. (2022).
- CoCoa: An Encoder-Decoder Model for Controllable Code-switched Generation - Mondal, S., et al. (2022).
- Prompting Multilingual Large Language Models to Generate Code-Mixed Texts: The Case of South East Asian Languages - Yong, Z. X., et al. (2023).
- Enhancing Code-mixed Text Generation Using Synthetic Data Filtering in Neural Machine Translation - Sravani, D., et al. (2023).
- Code-Switched Text Synthesis in Unseen Language Pairs - Hsu, I.-H., et al. (2023).
- Linguistics Theory Meets LLM: Code-Switched Text Generation via Equivalence Constrained Large Language Models - Kuwanto, G., et al. (2024).
- Leveraging Large Language Models for Code-Mixed Data Augmentation in Sentiment Analysis - Zeng, L. (2024).
- Synthetic Data Generation and Joint Learning for Robust Code-Mixed Translation - Kartik, K., et al. (2024).
- LLM-based Code-Switched Text Generation for Grammatical Error Correction - Potter, T., et al. (2024).
- Understanding and Mitigating Language Confusion in LLMs - Marchisio, K., et al. (2024).
- Pun Generation
  - Bridging Laughter Across Languages: Generation of Hindi-English Code-mixed Puns - Asapu, L., et al. (2025).
  - Homophonic Pun Generation in Code Mixed Hindi English - Sarrof, Y. R. (2025).
- Code-Switching for Enhancing NMT with Pre-Specified Translation - Song, K., et al. (2019).
- PhraseOut: A Code Mixed Data Augmentation Method for Multilingual Neural Machine Translation - Jasim, B., et al. (2020).
- CoMeT: Towards Code-Mixed Translation Using Parallel Monolingual Sentences - Gautam, D., et al. (2021).
- Training Data Augmentation for Code-Mixed Translation - Gupta, A., et al. (2021).
- Translate and Classify: Improving Sequence Level Classification for English-Hindi Code-Mixed Data - Gautam, D., et al. (2021).
- Gated Convolutional Sequence to Sequence Based Learning for English-Hinglish Code-Switched Machine Translation - Dowlagar, S., et al. (2021).
- IITP-MT at CALCS2021: English to Hinglish Neural Machine Translation using Unsupervised Synthetic Code-Mixed Parallel Corpus - Appicharla, R., et al. (2021).
- Exploring Text-to-Text Transformers for English to Hinglish Machine Translation with Synthetic Code-Mixing - Jawahar, G., et al. (2021).
- Investigating Code-Mixed Modern Standard Arabic-Egyptian to English Machine Translation - Nagoudi, E. M. B., et al. (2021).
- Hinglish to English Machine Translation using Multilingual Transformers - Agarwal, V., et al. (2021).
- Neural Machine Translation for Sinhala-English Code-Mixed Text - Kugathasan, A., et al. (2021).
- From Machine Translation to Code-Switching: Generating High-Quality Code-Switched Text - Tarunesh, I., et al. (2021).
- Adapting Multilingual Models for Code-Mixed Translation - Vavre, A., et al. (2022).
- MUCS@MixMT: indicTrans-based Machine Translation for Hinglish Text - Hegde, A., et al. (2022).
- SIT at MixMT 2022: Fluent Translation Built on Giant Pre-trained Models - Khan, A. R., et al. (2022).
- Gui at MixMT 2022 : English-Hinglish : An MT approach for translation of code mixed data - Gahoi, A., et al. (2022).
- CNLP-NITS-PP at MixMT 2022: Hinglish–English Code-Mixed Machine Translation - Laskar, S. R., et al. (2022).
- End-to-End Speech Translation for Code Switched Speech - Weller, O., et al. (2022).
- MALM: Mixing Augmented Language Modeling for Zero-Shot Machine Translation - Gupta, K. (2022).
- Can You Translate for Me? Code-Switched Machine Translation with Large Language Models - Khatri, J., et al. (2023).
- Lost in Translation No More : Fine-tuned transformer-based models for CodeMix to English Machine Translation - Chatterjee, A., et al. (2023).
- Enhancing Code-mixed Text Generation Using Synthetic Data Filtering in Neural Machine Translation - Sravani, D., et al. (2023).
- Towards Real-World Streaming Speech Translation for Code-Switched Speech - Alastruey, B., et al. (2023).
- Exploring Segmentation Approaches for Neural Machine Translation of Code-Switched Egyptian Arabic-English Text - Gaser, M., et al. (2023).
- Exploring Enhanced Code-Switched Noising for Pretraining in Neural Machine Translation - Iyer, V., et al. (2023).
- Evaluating Code-Switching Translation with Large Language Models - Huzaifah, M., et al. (2024).
- Are Large Language Model-based Evaluators the Solution to Scaling Up Multilingual Evaluation? - Hada, R., et al. (2024).
- ContrastiveMix: Overcoming Code-Mixing Dilemma in Cross-Lingual Transfer for Information Retrieval - Do, J., et al. (2024).
- Synthetic Data Generation and Joint Learning for Robust Code-Mixed Translation - Kartik, et al. (2024).
- CoVoSwitch: Machine Translation of Synthetic Code-Switched Text Based on Intonation Units - Kang, Y. (2024).
- MIGRATE: Cross-Lingual Adaptation of Domain-Specific LLMs through Code-Switching and Embedding Transfer - Hong, S., et al. (2025).
- Next-Level Cantonese-to-Mandarin Translation: Fine-Tuning and Post-Processing with LLMs - Dai, Y., et al. (2025).
- Investigating and Scaling up Code-Switching for Multilingual Language Model Pre-Training - Wang, Z., et al. (2025).
- From English to Second Language Mastery: Enhancing LLMs with Cross-Lingual Continued Instruction Tuning - Wu, L., et al. (2025).
- The Impact of Code-switched Synthetic Data Quality is Task Dependent: Insights from MT and ASR - Hamed, I., et al. (2025).
- Tongue-Tied: Breaking LLMs Safety Through New Language Learning - Upadhayay, B., et al. (2025).
- Low-resource Machine Translation for Code-switched Kazakh-Russian Language Pair - Borisov, M., et al. (2025).
- XLP at SemEval-2020 Task 9: Cross-lingual Models with Focal Loss for Sentiment Analysis of Code-Mixing Language - Ma, Y., et al. (2020).
- CoSDA-ML: Multi-Lingual Code-Switching Data Augmentation for Zero-Shot Cross-Lingual NLP - Qin, L., et al. (2020).
- Multilingual Code-Switching for Zero-Shot Cross-Lingual Intent Prediction and Slot Filling - Krishnan, J., et al. (2021).
- Saliency-based Multi-View Mixed Language Training for Zero-shot Cross-lingual Classification - Lai, S., et al. (2021).
- Scopa: Soft code-switching and pairwise alignment for zero-shot cross-lingual transfer - Lee, D., et al. (2021).
- Toward the Limitation of Code-Switching in Cross-Lingual Transfer - Feng, Y., et al. (2022).
- ENTITYCS: Improving Zero-Shot Cross-lingual Transfer with Entity-Centric Code Switching - Whitehouse, C., et al. (2022).
- Improving Zero-Shot Cross-Lingual Transfer via Progressive Code-Switching - Li, Z., et al. (2024).
- Test-Time Code-Switching for Cross-lingual Aspect Sentiment Triplet Extraction - Sheng, D., et al. (2025).
- GupShup: Summarizing Open-Domain Code-Switched Conversations - Mehnaz, L., et al. (2021).
- Multilingual Large Language Models Are Not (Yet) Code-Switchers - Zhang, R., et al. (2023).
- CoMix: Guide Transformers to Code-Mix using POS structure and Phonetics - Arora, G., et al. (2023).
- Are Large Language Model-based Evaluators the Solution to Scaling Up Multilingual Evaluation? - Hada, R., et al. (2024).
- CroCoSum: A Benchmark Dataset for Cross-Lingual Code-Switched Summarization - Zhang, R. & Eickhoff, C. (2024).
- Code-Switching Curriculum Learning for Multilingual Transfer in LLMs - Yoo, H., et al. (2025).
- An Adapted Few-Shot Prompting Technique Using ChatGPT to Advance Low-Resource Languages Understanding - Sarrof, Y. R., et al. (2025).
- Detecting Entailment in Code-Mixed Hindi-English Conversations - Sharanya Chakravarthy, et al. (2020).
- A New Dataset for Natural Language Inference from Code-mixed Conversations - Simran Khanuja, et al. (2020).
- Do Multilingual Users Prefer Chat-bots that Code-mix? Let's Nudge and Find Out! - Anshul Bawa, et al. (2020).
- CoSDA-ML: Multi-Lingual Code-Switching Data Augmentation for Zero-Shot Cross-Lingual NLP - Libo Qin, et al. (2020).
- Multilingual Code-Switching for Zero-Shot Cross-Lingual Intent Prediction and Slot Filling - Jitin Krishnan, et al. (2021).
- Towards Code-Mixed Hinglish Dialogue Generation - Vibhav Agarwal, et al. (2021).
- GupShup: Summarizing Open-Domain Code-Switched Conversations - Laiba Mehnaz, et al. (2021).
- Code-switched inspired losses for generic spoken dialog representations - Emile Chapuis, et al. (2021).
- MulZDG: Multilingual Code-Switching Framework for Zero-shot Dialogue Generation - Yongkang Liu, et al. (2022).
- X-RiSAWOZ: High-Quality End-to-End Multilingual Dialogue Datasets and Few-shot Agents - Mehrad Moradshahi, et al. (2023).
- CST5: Data Augmentation for Code-Switched Semantic Parsing - Agarwal, A., et al. (2023).
- Does a code-switching dialogue system help users learn conversational fluency in Choctaw? - Jacqueline Brixey, et al. (2025).
- Performance Analysis of Effective Retrieval of Kannada Translations in Code-Mixed Sentences using BERT and MPnet - H. P. Rohith, et al. (2025).
- Towards an Efficient Code-Mixed Grapheme-to-Phoneme Conversion in an Agglutinative Language: A Case Study on To-Korean Transliteration - Won Ik Cho, et al. (2020).
- Detecting Entailment in Code-Mixed Hindi-English Conversations - Sharanya Chakravarthy, et al. (2020).
- Normalization and Back-Transliteration for Code-Switched Data - Parikh, D. & Solorio, T. (2021).
- Abusive content detection in transliterated Bengali-English social media corpus - Salim Sazzed (2021).
- MUCS@MixMT: indicTrans-based Machine Translation for Hinglish Text - Asha Hegde, et al. (2022).
- CodeSwitching and BackTransliteration Using a Bilingual Model - Daniel Weisberg Mitelman, et al. (2024).
- Cost-Performance Optimization for Processing Low-Resource Language Tasks Using Commercial LLMs - Arijit Nag, et al. (2024).
- Homophonic Pun Generation in Code Mixed Hindi English - Yash Raj Sarrof (2025).
Corpora, toolkits, and frameworks to support your research.
- Language Modeling for Code-Mixing: The Role of Linguistic Theory based Synthetic Data - Adithya Pratapa, et al. (2018).
- Uncovering Code-Mixed Challenges: A Framework for Linguistically Driven Question Generation and Neural Based Question Answering - Deepak Gupta, et al. (2018).
- Dependency Parser for Bengali-English Code-Mixed Data enhanced with a Synthetic Treebank - Urmi Ghosh, et al. (2019).
- Dependency Parsing for English–Malayalam Code-mixed Text - Sanket Sonu, et al. (2019).
- A New Dataset for Natural Language Inference from Code-mixed Conversations - Simran Khanuja, et al. (2020).
- Detecting Entailment in Code-Mixed Hindi-English Conversations - Sharanya Chakravarthy, et al. (2020).
- GupShup: Summarizing Open-Domain Code-Switched Conversations - Laiba Mehnaz, et al. (2021).
- CoMeT: Towards Code-Mixed Translation Using Parallel Monolingual Sentences - Devansh Gautam, et al. (2021).
- Exploring Language Identification from Short Multilingual Code-Switched Texts - Pei-Chi Lo, et al. (2022).
- A Comparison of Architectures and Pretraining Methods for Contextualized Multilingual Word Embeddings - Milana Karaica, et al. (2022).
- Code-MixPro: A Framework for Code-Mixed Data Augmentation via Prompt Tuning - Rohit Kundu, et al. (2023).
- OffMix-3L: A Novel Code-Mixed Test Dataset in Bangla-English-Hindi for Offensive Language Identification - Goswami, D., et al. (2023).
- My Boli: A Comprehensive Suite of Corpora and Pre-trained Models for Marathi-English Code-Mixing - Joshi, A., et al. (2023).
- Sentiment Analysis in Code-Mixed Telugu-English Text with Multi-task Learning - Siva Sai, et al. (2024).
- Multilingual Harmful Meme Detection Using Large Language Models - Sanchit Ahuja, et al. (2024).
- Aligning Speech to Languages to Enhance Code-switching Speech Recognition - Hexin Liu, et al. (2024).
- HiACC: Hinglish adult & children code-switched corpus - Singh, S., et al. (2025).
- AfroCS-xs: Creating a Compact, High-Quality, Human-Validated Code-Switched Dataset for African Languages - Olaleye, K., et al. (2025).
- CoSSAT: Code-Switched Speech Annotation Tool - Shah, S., et al. (2019).
- A Unified Framework for Multilingual and Code-Mixed Visual Question Answering - Deepak Gupta, et al. (2020).
- CodemixedNLP: An Extensible and Open NLP Toolkit for Code-Mixing - Jayanthi, S. M., et al. (2021).
- GCM: A Toolkit for Generating Synthetic Code-mixed Text - Rizvi, M. S. Z., et al. (2021).
- Commentator: A Code-mixed Multilingual Text Annotation Framework - Sheth, R., et al. (2024).
- ToxVidLM: A Multimodal Framework for Toxicity Detection in Code-Mixed Videos - Krishanu Maity, et al. (2024).
- CHAI for LLMs: Improving Code-Mixed Translation in Large Language Models through Reinforcement Learning with AI Feedback - Wenbo Zhang (2024).
Techniques for building and adapting models to understand and generate code-mixed language.
- Modeling Code-Switch Languages Using Bilingual Parallel Corpus - Grandee Lee, et al. (2020).
- SJ_AJ@DravidianLangTech-EACL2021: Task-Adaptive Pre-Training of Multilingual BERT models for Offensive Language Identification - Sai Muralidhar Jayanthi, et al. (2021).
- Switch Point biased Self-Training: Re-purposing Pretrained Models for Code-Switching - Parul Chopra, et al. (2021).
- Unsupervised Self-Training for Sentiment Analysis of Code-Switched Data - Akshat Gupta, et al. (2021).
- Task-Specific Pre-Training and Cross Lingual Transfer for Code-Switched Data - Akshat Gupta, et al. (2021).
- BERTologiCoMix: How does Code-Mixing interact with Multilingual BERT? - Santy, S., et al. (2021).
- L3Cube-HingCorpus and HingBERT: A Code Mixed Hindi-English Dataset and BERT Language Models - Nayak, R. & Joshi, R. (2022).
- MALM: Mixing Augmented Language Modeling for Zero-Shot Machine Translation - Kshitij Gupta (2022).
- Transfer Learning for Code-Mixed Data: Do Pretraining Languages Matter? - Kushal Tatariya, et al. (2023).
- Improving Pretraining Techniques for Code-Switched NLP - Richeek Das, et al. (2023).
- Exploring Enhanced Code-Switched Noising for Pretraining in Neural Machine Translation - Vivek Iyer, et al. (2023).
- Investigating and Scaling up Code-Switching for Multilingual Language Model Pre-Training - Zhijun Wang, et al. (2025).
- Breaking the Language Barrier: Can One Language Model Understand All Languages? - Sanchit Ahuja, et al. (2025).
- From English to Code-Switching: Transfer Learning with Strong Morphological Clues - Gustavo Aguilar, et al. (2020).
- FiSSA at SemEval-2020 Task 9: Fine-tuned for Feelings - Bertelt Braaksma, et al. (2020).
- A Semi-supervised Approach to Generate the Code-Mixed Text using Pre-trained Encoder and Transfer Learning - Deepak Gupta, et al. (2020).
- A Pre-trained Transformer and CNN model with Joint Language ID and Part-of-Speech Tagging for Code-Mixed Social-Media Text - Suman Dowlagar, et al. (2021).
- The Effectiveness of Intermediate-Task Training for Code-Switched Natural Language Understanding - Archiki Prasad, et al. (2021).
- Saliency-based Multi-View Mixed Language Training for Zero-shot Cross-lingual Classification - Siyu Lai, et al. (2021).
- On Utilizing Constituent Language Resources to Improve Downstream Tasks in Hinglish - Vishwajeet Kumar, et al. (2022).
- Adapting Multilingual Models for Code-Mixed Translation - Aditya Vavre, et al. (2022).
- PRO-CS : An Instance-Based Prompt Composition Technique for Code-Switched Tasks - Srijan Bansal, et al. (2022).
- Progressive Sentiment Analysis for Code-Switched Text Data - Sudhanshu Ranjan, et al. (2022).
- ENTITYCS: Improving Zero-Shot Cross-lingual Transfer with Entity-Centric Code Switching - Chenxi Whitehouse, et al. (2022).
- COCOA: An Encoder-Decoder Model for Controllable Code-switched Generation - Sneha Mondal, et al. (2022).
- Transfer Learning for Code-Mixed Data: Do Pretraining Languages Matter? - Kushal Tatariya, et al. (2023).
- From Translation to Generative LLMs: Classification of Code-Mixed Affective Tasks - Anjali Yadav, et al. (2024).
- SetFit: A Robust Approach for Offensive Content Detection in Tamil-English Code-Mixed Conversations Using Sentence Transfer Fine-tuning - Kathiravan Pannerselvam, et al. (2024).
- Synthetic Data Generation and Joint Learning for Robust Code-Mixed Translation - Kartik, et al. (2024).
- COMMIT: Code-Mixing English-Centric Large Language Model for Multilingual Instruction Tuning - Lee, J., et al. (2024).
- Demystifying Instruction Mixing for Fine-tuning Large Language Models - Wang, R., et al. (2024).
- CHAI for LLMs: Improving Code-Mixed Translation in Large Language Models through Reinforcement Learning with AI Feedback - Zhang, W., et al. (2025).
- LLMsAgainstHate@NLU of Devanagari Script Languages 2025: Hate Speech Detection and Target Identification in Devanagari Languages via Parameter Efficient Fine-Tuning of LLMs - Rushendra Sidibomma, et al. (2025).
- Controlling Language Confusion in Multilingual LLMs - Nahyun Lee, et al. (2025).
- Fine-Tuning Cross-Lingual LLMs for POS Tagging in Code-Switched Contexts - Shayaan Absar (2025).
- Code-Switching Curriculum Learning for Multilingual Transfer in LLMs - Haneul Yoo, et al. (2025).
- MIGRATE: Cross-Lingual Adaptation of Domain-Specific LLMs through Code-Switching and Embedding Transfer - Seongtae Hong, et al. (2025).
- Next-Level Cantonese-to-Mandarin Translation: Fine-Tuning and Post-Processing with LLMs - Yuqian Dai, et al. (2025).
- Investigating and Scaling up Code-Switching for Multilingual Language Model Pre-Training - Zhijun Wang, et al. (2025).
- Beyond Monolingual Limits: Fine-Tuning Monolingual ASR for Yoruba-English Code-Switching - Oreoluwa Babatunde, et al. (2025).
- Tongue-Tied: Breaking LLMs Safety Through New Language Learning - Bibek Upadhayay, et al. (2025).
- Identifying Aggression and Offensive Language in Code-Mixed Tweets: A Multi-Task Transfer Learning Approach - Bharath Kancharla, et al. (2025).
- Multi-task detection of harmful content in code-mixed meme captions using large language models with zero-shot, few-shot, and fine-tuning approaches - Bharath Kancharla, et al. (2025).
- Saliency-based Multi-View Mixed Language Training for Zero-shot Cross-lingual Classification - Siyu Lai, et al. (2021).
- Multilingual Code-Switching for Zero-Shot Cross-Lingual Intent Prediction and Slot Filling - Jitin Krishnan, et al. (2021).
- PRO-CS : An Instance-Based Prompt Composition Technique for Code-Switched Tasks - Bansal, S., et al. (2022).
- ENTITYCS: Improving Zero-Shot Cross-lingual Transfer with Entity-Centric Code Switching - Chenxi Whitehouse, et al. (2022).
- MulZDG: Multilingual Code-Switching Framework for Zero-shot Dialogue Generation - Yongkang Liu, et al. (2022).
- MALM: Mixing Augmented Language Modeling for Zero-Shot Machine Translation - Kshitij Gupta (2022).
- Multilingual Large Language Models Are Not (Yet) Code-Switchers - Ruochen Zhang, et al. (2023).
- Transfer Learning for Code-Mixed Data: Do Pretraining Languages Matter? - Kushal Tatariya, et al. (2023).
- Prompting Multilingual Large Language Models to Generate Code-Mixed Texts: The Case of South East Asian Languages - Zheng-Xin Yong, et al. (2023).
- OffMix-3L: A Novel Code-Mixed Test Dataset in Bangla-English-Hindi for Offensive Language Identification - Dhiman Goswami, et al. (2023).
- Leveraging Large Language Models for Code-Mixed Data Augmentation in Sentiment Analysis - Zeng, L. (2024).
- In-context Mixing (ICM): Code-mixed Prompts for Multilingual LLMs - Shankar, B., et al. (2024).
- From Translation to Generative LLMs: Classification of Code-Mixed Affective Tasks - Anjali Yadav, et al. (2024).
- COMI-LINGUA: Expert Annotated Large-Scale Dataset for Multitask NLP in Hindi-English Code-Mixing - Rajvee Sheth, et al. (2025).
- DweshVaani: An LLM for Detecting Religious Hate Speech in Code-Mixed Hindi-English - Varad Srivastava (2025).
- Multi-task detection of harmful content in code-mixed meme captions using large language models with zero-shot, few-shot, and fine-tuning approaches - Bharath Kancharla, et al. (2025).
- An Adapted Few-Shot Prompting Technique Using ChatGPT to Advance Low-Resource Languages Understanding - Yash Raj Sarrof, et al. (2025).
Resources for evaluating model performance on code-switching tasks.
- LinCE: A centralized benchmark for linguistic code-switching evaluation - Aguilar, G., et al. (2020).
- GLUECoS: An Evaluation Benchmark for Code-Switched NLP - Khanuja, S., et al. (2020).
- Detecting Entailment in Code-Mixed Hindi-English Conversations - Sharanya Chakravarthy, et al. (2020).
- PACMAN: PArallel CodeMixed dAta generatioN for POS tagging - Arindam Chatterjee, et al. (2022).
- HinglishEval Generation Challenge on Quality Estimation of Synthetic Code-Mixed Text: Overview and Results - Vivek Srivastava, et al. (2022).
- MultiCoNER: A Large-scale Multilingual Dataset for Complex Named Entity Recognition - Shervin Malmasi, et al. (2022).
- X-RiSAWOZ: High-Quality End-to-End Multilingual Dialogue Datasets and Few-shot Agents - Mehrad Moradshahi, et al. (2023).
- CroCoSum: A Benchmark Dataset for Cross-Lingual Code-Switched Summarization - Ruochen Zhang, et al. (2024).
- MEGAVERSE: Benchmarking Large Language Models Across Languages, Modalities, Models and Tasks - Sanchit Ahuja, et al. (2024).
- COMI-LINGUA: Expert Annotated Large-Scale Dataset for Multitask NLP in Hindi-English Code-Mixing - Rajvee Sheth, et al. (2025).
- CodeMixBench: Evaluating Large Language Models on Code Generation with Code-Mixed Prompts - Parth Sawant (2025).
- SwitchLingua: The First Large-Scale Multilingual and Multi-Ethnic Code-Switching Dataset - Peng Xie, et al. (2025).
- Bleu: a Method for Automatic Evaluation of Machine Translation - Papineni, K., et al. (2002).
- chrF: character n-gram F-score for automatic MT evaluation - Popović, M. (2015).
- Code-Mixing in Social Media Text - Amitava Das, et al. (2013).
- Comparing the Level of Code-Switching in Corpora - Björn Gambäck, et al. (2016).
- Automatic Detection of Code-switching Style from Acoustics - SaiKrishna Rallabandi, et al. (2018).
- Detecting de minimis Code-Switching in Historical German Books - Shijia Liu, et al. (2020).
- Challenges and Limitations with the Metrics Measuring the Complexity of Code-Mixed Text - Vivek Srivastava, et al. (2021).
- SyMCoM - Syntactic Measure of Code Mixing: A Study of English-Hindi Code-Mixing - Prashant Kodali, et al. (2022).
- PreCogIIITH at HinglishEval: Leveraging Code-Mixing Metrics & Language Model Embeddings To Estimate Code-Mix Quality - Prashant Kodali, et al. (2022).
- Code-Switching Metrics Using Intonation Units - Rebecca Pattichis, et al. (2023).
- Minimal Pair-Based Evaluation of Code-Switching - Sterner, I. & Teufel, S. (2025).
- PIER: A Novel Metric for Evaluating What Matters in Code-Switching - Ugan, E. Y., et al. (2025).
Applying code-switching NLP to speech, vision, and other modalities.
-
ASR
- Dependency Parsing for English–Malayalam Code-mixed Text - Sanket Sonu, et al. (2019).
- Semi-supervised Acoustic and Language Model Training for English-isiZulu Code-Switched Speech Recognition - Astik Biswas, et al. (2020).
- Improving code-switched ASR with linguistic information - Jie Chi, et al. (2022).
- End-to-End Speech Translation for Code Switched Speech - Orion Weller, et al. (2022).
- Representativeness as a Forgotten Lesson for Multilingual and Code-switched Data Collection and Preparation - A. Seza Doğruöz, et al. (2023).
- New Datasets and Controllable Iterative Data Augmentation Method for Code-switching ASR Error Correction - Zhaohong Wan, et al. (2023).
- Code-Mixed Text Augmentation for Latvian ASR - Martins Kronis, et al. (2024).
- The Impact of Code-switched Synthetic Data Quality is Task Dependent: Insights from MT and ASR - Injy Hamed, et al. (2025).
- Beyond Monolingual Limits: Fine-Tuning Monolingual ASR for Yoruba-English Code-Switching - Oreoluwa Babatunde, et al. (2025).
- Development of a code-switched Hindi-Marathi dataset and transformer-based architecture for enhanced speech recognition using dynamic switching algorithms - Palash Jain, et al. (2025).
- Enhancing ASR Accuracy and Coherence Across Indian Languages with Wav2Vec2 and GPT-2 - R. Geetha Rajakumari, et al. (2025).
- Boosting Code-Switching ASR with Mixture of Experts Enhanced Speech-Conditioned LLM - Yu Xi, et al. (2024).
-
Speech Translation
- Towards Developing a Multilingual and Code-Mixed Visual Question Answering System by Knowledge Distillation - Humair Raj Khan, et al. (2021).
- End-to-End Speech Translation for Code Switched Speech - Weller, O., et al. (2022).
- CoVoSwitch: Machine Translation of Synthetic Code-Switched Text Based on Intonation Units - Kang, Y. (2024).
- Boosting Code-Switching ASR with Mixture of Experts Enhanced Speech-Conditioned LLM - Yu Xi, et al. (2024).
- Beyond Monolingual Limits: Fine-Tuning Monolingual ASR for Yoruba-English Code-Switching - Oreoluwa Babatunde, et al. (2025).
- The Impact of Code-switched Synthetic Data Quality is Task Dependent: Insights from MT and ASR - Injy Hamed, et al. (2025).
- Code-Switching and Syntax: A Large-Scale Experiment - Igor Sterner, et al. (2025).
- Development of a code-switched Hindi-Marathi dataset and transformer-based architecture for enhanced speech recognition using dynamic switching algorithms - P. Hemant, et al. (2025).
- Enhancing ASR Accuracy and Coherence Across Indian Languages with Wav2Vec2 and GPT-2 - R. Geetha Rajakumari, et al. (2025).
- A Unified Framework for Multilingual and Code-Mixed Visual Question Answering - Deepak Gupta, et al. (2020).
- Towards Developing a Multilingual and Code-Mixed Visual Question Answering System by Knowledge Distillation - Raj Khan, H., et al. (2021).
- "To Have the 'Million' Readers Yet": Building a Digitally Enhanced Edition of the Bilingual Irish-English Newspaper - Dereza, O., et al. (2024).
- MEGAVERSE: Benchmarking Large Language Models Across Languages, Modalities, Models and Tasks - Sanchit Ahuja, et al. (2024).
- ToxVidLM: A Multimodal Framework for Toxicity Detection in Code-Mixed Videos - Krishanu Maity, et al. (2024).
- Multi-task detection of harmful content in code-mixed meme captions using large language models with zero-shot, few-shot, and fine-tuning approaches - Bharath Kancharla, et al. (2025).
- BanglAssist: A Bengali-English Generative AI Chatbot for Code-Switching and Dialect-Handling in Customer Service - Francesco Kruk (2025).
- Qorǵau: Evaluating Safety in Kazakh-Russian Bilingual Contexts - Maiya Goloburda, et al. (2025).
- Enhancing Participatory Development Research in South Asia through LLM Agents System: An Empirically-Grounded Methodological Initiative from Field Evidence in Sri Lankan - Xinjie Zhao, et al. (2025).
- Code-Switched Language Models Using Neural Based Synthetic Data from Parallel Sentences - Genta Indra Winata, et al. (2019).
- Translate and Classify: Improving Sequence Level Classification for English-Hindi Code-Mixed Data - Devansh Gautam, et al. (2021).
- Data Augmentation to Address Out of Vocabulary Problem in Low Resource Sinhala English Neural Machine Translation - Aloka Fernando, et al. (2021).
- CI-AVSR: A Cantonese Audio-Visual Speech Dataset for In-car Command Recognition - Dai, W., et al. (2022).
- Typo-Robust Representation Learning for Dense Retrieval - Panuthep Tasawong, et al. (2023).
- Advancing Multi-Criteria Chinese Word Segmentation Through Criterion Classification and Denoising - Tzu Hsuan Chou, et al. (2023).
- ToxVidLM: A Multimodal Framework for Toxicity Detection in Code-Mixed Videos - Maity, K., et al. (2024).
- Machine Translation and Transliteration for Indo-Aryan Languages: A Systematic Review - Sandun Sameera Perera, et al. (2025).
A list of academic workshops and community shared tasks dedicated to code-switching.
- CALCS 2018: 3rd Workshop on Computational Approaches to Linguistic Code-Switching.
- CALCS 2020: 4th Workshop on Computational Approaches to Linguistic Code-Switching.
- CALCS 2021: 5th Workshop on Computational Approaches to Linguistic Code-Switching.
- WILDRE-6 2022: Workshop within the 13th Language Resources and Evaluation Conference.
- ICON 2022: 19th International Conference on Natural Language Processing.
- CALCS 2023: 6th Workshop on Computational Approaches to Linguistic Code-Switching.
- CALCS 2025: 7th Workshop on Computational Approaches to Linguistic Code-Switching.
Your contributions are always welcome and make this community resource better!
If you have a paper, dataset, or tool you'd like to add:
- Fork the repository.
- Add your resource to the relevant section.
- Please try to follow the existing format and include a direct link.
- Submit a pull request!
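As a rough aid for following the existing format, the sketch below builds and checks an entry in the `- Title - Authors (Year).` shape that most entries on this page use. The helper names `format_entry` and `is_well_formed` and the pattern `ENTRY_RE` are hypothetical and inferred from the entries above; they are not an official rule or tool of this list. The sketch leaves the link out, so add it by hand.

```python
import re

# Hypothetical pattern for "- Title - Authors (Year)." entries.
# Assumption: inferred from existing entries, not an official rule of this list.
ENTRY_RE = re.compile(
    r"^- (?P<title>.+) - (?P<authors>[^()]+) \((?P<year>\d{4})\)\.$"
)


def format_entry(title: str, first_author: str, year: int,
                 multiple_authors: bool = True) -> str:
    """Build one list line; appends ', et al.' when there are co-authors."""
    authors = f"{first_author}, et al." if multiple_authors else first_author
    return f"- {title.strip()} - {authors} ({year})."


def is_well_formed(line: str) -> bool:
    """Return True if the line matches the assumed entry shape."""
    return ENTRY_RE.match(line) is not None


if __name__ == "__main__":
    entry = format_entry(
        "GLUECoS: An Evaluation Benchmark for Code-Switched NLP",
        "Khanuja, S.",
        2020,
    )
    print(entry)
    print(is_well_formed(entry))
```

Running `is_well_formed` over a new line before opening a pull request can catch stray markup or a missing year, though it will not catch a wrong title or author.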