Samir-Guenchi/README.md

Samir Guenchi

Building NLP systems for the 400M+ Arabic speakers who deserve better language tools.

Currently: RAG architecture for Arabic Q&A • Training competitive programmers • CS at ENSIA


What I'm Building

Arabic NLP that actually works
Most language models treat Arabic as an afterthought. I'm building RAG systems, tokenizers, and Q&A pipelines specifically designed for RTL languages and Arabic morphology.

Production ML, not just notebooks
From medical diagnosis pipelines to security-focused mobile apps, I focus on systems that ship—with proper evaluation, reproducible research, and real impact.

Algorithms education
Coach for Algeria's national programming olympiad. Turns out teaching is the best way to truly understand complexity theory.


Tech Stack

Core: Python • C++ • TensorFlow • scikit-learn • Hugging Face
NLP: LangChain • RAG • Tokenization • Arabic morphology
Mobile: Flutter • Dart
Tools: Linux • Git • LaTeX • Jupyter


Selected Work

RAG-powered Q&A system for Arabic government documents. Natural language search instead of Ctrl+F through 500-page PDFs.
Python • LangChain • Arabic NLP

Flutter app with ML-based phishing detection. Analyzes QR codes before you scan them, not after you're compromised.
Flutter • ML • Security

End-to-end ML pipeline for medical diagnosis. Focused on reproducibility and proper cross-validation—because healthcare predictions need more than 0.95 accuracy on a test set.
Python • scikit-learn • Medical ML

Interactive demos of pathfinding algorithms. Built this for my olympiad students who learn better by seeing, not memorizing pseudocode.
Python • Algorithms • Teaching

More at github.com/Samir-Guenchi


Why This Matters

There are 400 million Arabic speakers online. Most NLP tools were built for English, then poorly adapted. I'm working on infrastructure that treats Arabic as a first-class citizen: tokenization that respects Arabic's root-and-pattern morphology and attached clitics, embeddings that capture context in RTL text, and RAG systems that handle diacritics.
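
As a rough illustration of what "handling diacritics" can mean in a retrieval pipeline, here is a minimal sketch, not code from any of the projects listed here. It assumes the goal is to make vocalized and unvocalized spellings of a word match at indexing and query time. The function name `strip_diacritics` and the exact set of removed code points (the tashkeel range U+064B to U+0652, the superscript alef U+0670, and the tatweel U+0640) are illustrative choices; a real system may need a different set.

```python
import re

# Assumption: "diacritics" here means the common tashkeel marks
# (U+064B fathatan through U+0652 sukun) plus superscript alef (U+0670).
_TASHKEEL = re.compile("[\u064B-\u0652\u0670]")

# Tatweel (kashida) is a purely typographic elongation character.
_TATWEEL = "\u0640"


def strip_diacritics(text: str) -> str:
    """Remove tashkeel and tatweel; base letters (including hamza forms) are kept."""
    return _TASHKEEL.sub("", text).replace(_TATWEEL, "")


# "kitab" (book) written with vowel marks and without them.
vocalized = "\u0643\u0650\u062A\u064E\u0627\u0628"
plain = "\u0643\u062A\u0627\u0628"

# After normalization both spellings compare equal, so a query typed
# without vowel marks can match a fully vocalized document.
print(strip_diacritics(vocalized) == strip_diacritics(plain))
```

Applying the same normalization to both the indexed text and the query keeps the two sides consistent. Stripping marks also discards information that sometimes disambiguates words, so whether to do it is a design trade-off.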

Also: competitive programming teaches you to think in constraints. That mindset carries over when you're optimizing transformer inference or designing algorithms that scale.


Currently

  • Building Arabic RAG systems at ENSIA (Algeria's National School of AI)
  • Training national olympiad candidates in algorithms
  • Contributing to Arabic NLP tooling
  • Looking for: Research collaborations, internships in ML/NLP, open-source opportunities

Connect

LinkedIn • Email • Kaggle • Codeforces


Pinned

  1. Search_Algo Public

    A collection of AI search algorithms with interactive visualizations to demonstrate their behavior and performance.

    Python

  2. portfolio Public

    A fully custom, no-framework portfolio built from scratch with vanilla HTML, CSS, and JavaScript. No Bootstrap, no React, no dependencies—just clean, performant code.

    HTML 2

  3. Amal Public

    A Cross-Platform Multilingual LLM & Retrieval-Augmented Generation System for Drug Addiction Awareness, Prevention, and Recovery Support

    TypeScript 1

  4. Tokenized-Reward-Credential-System Public

    A blockchain-based system for issuing, managing, and verifying tokenized rewards and digital credentials using secure smart contracts and decentralized identity.

    TypeScript