  • Bytedance
  • Hangzhou

Modern columnar data format for ML and LLMs implemented in Rust. Convert from parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, Du…

Rust 5,590 461 Updated Oct 28, 2025

Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activelo…

Python 8,875 692 Updated Oct 24, 2025

A lightweight data processing framework built on DuckDB and 3FS.

Python 4,813 428 Updated Mar 5, 2025

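Chinese and English sensitive words, language detection, Chinese and international mobile/landline number location and carrier lookup, gender inference from names, phone number extraction, ID card number extraction, email extraction, Chinese and Japanese name databases, Chinese abbreviation database, character decomposition dictionary, word sentiment values, stop words, reactionary word list, violence and terrorism word list, traditional/simplified Chinese conversion, English imitation of Chinese pronunciation, Wang Feng lyrics generator, occupation name lexicon, synonym lexicon, antonym lexicon, negation word lexicon, car brand lexicon, car parts lexicon, continuous English segmentation, various Chinese word vectors, company name collection, classical poetry database, IT lexicon, finance lexicon, idiom lexicon, place name lexicon, …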

Python 76,860 15,043 Updated May 10, 2024

A Data Streaming Library for Efficient Neural Network Training

Python 1,404 174 Updated Oct 27, 2025

Pythonic file-system interface for TOS (Tinder Object Storage). https://tosfs.readthedocs.io/en/latest/

Python 16 1 Updated Sep 8, 2025

Distributed query engine providing simple and reliable data processing for any modality and scale

Rust 4,647 324 Updated Oct 28, 2025

All-in-one text de-duplication

Python 725 73 Updated Aug 31, 2025

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

Python 39,554 6,835 Updated Oct 28, 2025
Java 5 3 Updated May 28, 2025

Apache Celeborn is an elastic and high-performance service for shuffle and spilled data.

Java 1 Updated Oct 8, 2024

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

Python 1 Updated Jul 2, 2025

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

Python 151,728 30,967 Updated Oct 27, 2025

Get up and running with OpenAI gpt-oss, DeepSeek-R1, Gemma 3 and other models.

Go 154,866 13,477 Updated Oct 28, 2025

Coral is a translation, analysis, and query rewrite engine for SQL and other relational languages.

Java 863 200 Updated Oct 20, 2025
C++ 4 Updated Jun 20, 2023

StarRocks is a next-gen sub-second MPP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics and ad-hoc query.

Java 1 Updated Sep 26, 2024

The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance …

Java 10,806 2,170 Updated Oct 28, 2025

Smart Storage Management for Big Data, a comprehensive hot/cold data optimized solution

Java 141 72 Updated Jan 3, 2023

Apache Spark - A unified analytics engine for large-scale data processing

Scala 42,177 28,896 Updated Oct 28, 2025

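A production-grade, high-performance, modular, and extensible Chinese NLP toolkit. (Chinese word segmentation, averaged perceptron, fastText, pinyin, new word discovery, segmentation error correction, BM25, person name recognition, named entity recognition, custom dictionaries)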

Java 686 90 Updated Sep 14, 2025

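Research and application of natural language processing, knowledge graphs, dialogue systems, large models, and related technologies.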

1,728 370 Updated Feb 8, 2025

The open source developer platform to build AI/LLM applications and models with confidence. Enhance your AI applications with end-to-end tracking, observability, and evaluations, all in one integra…

Python 22,682 4,931 Updated Oct 28, 2025