cceyda

Organizations

@croquiscom @Hugging-Face-Supporter @Hugging-Face-Helping-Hand

✔️ Annotation

Data annotation tools
20 repositories

🖼️ Visualizations

Repos for visualizing, graphing, model interpretability tools... all that good stuff
46 repositories

Adversarial

5 repositories

🗳️ Archived

Projects no longer active

🎨 Color

39 repositories

🛢 Data

Repos of datasets & data wrangling libs
25 repositories

🎁 DBs

27 repositories

🎨 Design

3 repositories

Starred repositories

3394 results for source starred repositories

Code and data for "Explaining and Mitigating Crosslingual Tokenizer Inequities", published at NeurIPS 2025

Jupyter Notebook 3 1 Updated Oct 24, 2025

🎒 Token-Oriented Object Notation (TOON) – JSON for LLM prompts at half the tokens. Spec, benchmarks & TypeScript implementation.

TypeScript 9,873 318 Updated Nov 4, 2025

Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Jupyter Notebook 15,937 1,254 Updated Oct 27, 2025

This is a package to fine-tune language models in order to create clustering-friendly embeddings.

Python 6 1 Updated Aug 12, 2025
Python 114 15 Updated Sep 23, 2025

Detection and automatic updating of Korean datasets uploaded to Hugging Face

Python 12 1 Updated Oct 28, 2025

Sample base images for Databricks Container Services

Jupyter Notebook 182 124 Updated Oct 15, 2025

A very fast SIMD-first image comparison library (with nodejs API)

Zig 2,641 97 Updated Oct 22, 2025

Unofficial implementation of Tiny Recursive Model (TRM), improvement to HRM from Sapient AI, by Alexia Jolicoeur-Martineau

Python 124 15 Updated Oct 14, 2025

HSEB: Hybrid Search Engine Benchmark

Python 13 1 Updated Oct 5, 2025

Efficient RWKV inference engine. RWKV7 7.2B fp16 decoding 10250 tps @ single 5090.

Python 54 17 Updated Nov 3, 2025

Vocabulary Trimming (VT) is a model compression technique, which reduces a multilingual LM vocabulary to a target language by deleting irrelevant tokens from its vocabulary. This repository contain…

Python 58 5 Updated Oct 25, 2024

Python project to ship N8N execution data to Langfuse using OTEL API.

Python 3 1 Updated Nov 4, 2025

Ko-SyllaBERT: A Syllable-Based Efficient and Robust Korean Language Model for Real-World Noise and Typographical Errors

Python 3 1 Updated Jun 5, 2025

biasing the universal tokenizer and an attempt to optimize compression rates in multilingual compression

Python 5 1 Updated Aug 28, 2025

Code for Zero-Shot Tokenizer Transfer

Python 140 11 Updated Jan 14, 2025

A simple tool for adapting a pretrained Huggingface model to a new vocabulary with (almost) no training.

Python 12 1 Updated Aug 12, 2025

Datamodels for hugging face tokenizers

Python 85 4 Updated Nov 4, 2025

Run any GUI app in the terminal❗

TypeScript 6,773 153 Updated Oct 26, 2025

A massively multilingual modern encoder language model

Python 105 9 Updated Oct 13, 2025

▁▅▆▃▅ Git quick statistics is a simple and efficient way to access various statistics in git repository.

Shell 6,834 275 Updated Sep 2, 2025

Official code repo for the O'Reilly Book - "Hands-On Large Language Models"

Jupyter Notebook 17,365 4,058 Updated Jul 21, 2025

Zero-Config Code Flow for Claude code & Codex

TypeScript 3,449 265 Updated Nov 3, 2025

A collection of public APIs available for Korean services

Python 951 72 Updated Nov 4, 2025

An n8n community node that brings Langfuse observability to your OpenAI chat workflows.

TypeScript 27 1 Updated Sep 24, 2025

AI Product Design Agent - Open Source

TypeScript 5,167 573 Updated Oct 31, 2025

Train embedding and reranker models for retrieval tasks on Apple Silicon with MLX

Python 164 8 Updated Sep 18, 2025

GraphER: A Structure-aware Text-to-Graph Model for Entity and Relation Extraction

Python 80 6 Updated Jul 31, 2024

[EMNLP 2021] SimCSE: Simple Contrastive Learning of Sentence Embeddings https://arxiv.org/abs/2104.08821

Python 3,609 531 Updated Oct 16, 2024

Code for 'LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders'

Python 1,606 132 Updated Jan 24, 2025