Stars
Modern columnar data format for ML and LLMs implemented in Rust. Convert from parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, Du…
Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activelo…
A lightweight data processing framework built on DuckDB and 3FS.
中英文敏感词、语言检测、中外手机/电话归属地/运营商查询、名字推断性别、手机号抽取、身份证抽取、邮箱抽取、中日文人名库、中文缩写库、拆字词典、词汇情感值、停用词、反动词表、暴恐词表、繁简体转换、英文模拟中文发音、汪峰歌词生成器、职业名称词库、同义词库、反义词库、否定词库、汽车品牌词库、汽车零件词库、连续英文切割、各种中文词向量、公司名字大全、古诗词库、IT词库、财经词库、成语词库、地名词库、…
A Data Streaming Library for Efficient Neural Network Training
Pythonic file-system interface for TOS(Tinder Object Storage)https://tosfs.readthedocs.io/en/latest/
Distributed query engine providing simple and reliable data processing for any modality and scale
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
Jay-ju / celeborn
Forked from apache/celebornApache Celeborn is an elastic and high-performance service for shuffle and spilled data.
Jay-ju / ray
Forked from ray-project/rayRay is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
Get up and running with OpenAI gpt-oss, DeepSeek-R1, Gemma 3 and other models.
Coral is a translation, analysis, and query rewrite engine for SQL and other relational languages.
Jay-ju / starrocks
Forked from StarRocks/starrocksStarRocks is a next-gen sub-second MPP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics and ad-hoc query.
The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance …
Smart Storage Management for Big Data, a comprehensive hot/cold data optimized solution
Apache Spark - A unified analytics engine for large-scale data processing
一个生产级、高性能、模块化、可扩展的中文NLP工具包。(中文分词、平均感知机、fastText、拼音、新词发现、分词纠错、BM25、人名识别、命名实体、自定义词典)
The open source developer platform to build AI/LLM applications and models with confidence. Enhance your AI applications with end-to-end tracking, observability, and evaluations, all in one integra…