Starred repositories
Databricks Toolkit for Coding Agents provided by Field Engineering
Apache Polaris, the interoperable, open source catalog for Apache Iceberg
verl: Volcano Engine Reinforcement Learning for LLMs
An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & TIS & vLLM & Ray & Async RL)
Crawl a site to generate knowledge files to create your own custom GPT from a URL
21 Lessons, Get Started Building with Generative AI
12 weeks, 26 lessons, 52 quizzes, classic Machine Learning for all
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
SparkCube is an open-source project for extremely fast OLAP data analysis. SparkCube is an extension of Apache Spark.
Vector-free L-BFGS implementation for Spark MLlib
Emacs minor mode to highlight each source code identifier uniquely based on its name
GraphFrames is a package for Apache Spark which provides DataFrame-based Graphs
Labeled records for Scala based on structural refinement types and macros.
Notes on the design and implementation of Apache Spark
A fork of Cliff Click's High Scale Library. Improved with bug fixes and a real build system.
Alluxio, data orchestration for analytics and machine learning in the cloud
Extended datasource support for Spark/Hadoop on Aliyun E-MapReduce.