Stars
Deep Agents is an agent harness built on langchain and langgraph. Deep Agents are equipped with a planning tool, a filesystem backend, and the ability to spawn subagents - making them well-equipped…
Feature engineering and selection open-source Python library compatible with sklearn.
catch22: CAnonical Time-series CHaracteristics
A genetic programming algorithm used for generating alpha factors in the multi-factor investment strategy
Qlib is an AI-oriented Quant investment platform that aims to use AI tech to empower Quant Research, from exploring ideas to implementing productions. Qlib supports diverse ML modeling paradigms, i…
Self-Reflection in LLM Agents: Effects on Problem-Solving Performance
A synthetic data generator for text recognition
This list of writing prompts covers a range of topics and tasks, including brainstorming research ideas, improving language and style, conducting literature reviews, and developing research plans.
Inspired by Google's C4, this is a series of colossal clean data cleaning scripts focused on CommonCrawl data processing, including the Chinese data processing and cleaning methods in MassiveText.
ExtremeBERT is a toolkit that accelerates the pretraining of customized language models on customized datasets, described in the paper “ExtremeBERT: A Toolkit for Accelerating Pretraining of Custom…
Links to newer technologies that improve on the tech I used in old posts
Chat with your database or your datalake (SQL, CSV, parquet). PandasAI makes data analysis conversational using LLMs and RAG.
Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4bit quantization, LoRA and LLaMA-Adapter fine-tuning, pre-training. Apache 2.0-licensed.
Chinese LLaMA & Alpaca large language models, with local CPU/GPU training and deployment
GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.
A curated, open, and ever-evolving learning path focused on practices of software development, principles of software design, and software architecture.
A list of semi to fully remote-friendly companies (jobs) in tech.
GitHub Classroom autograding example repo with C++ and Catch.
Class lecture materials for C++ Object-oriented Programming course taught at New York University in Spring 2021.
OOP in C++ and some Python
Manipulate audio with a simple and easy high level interface
System output for the NAACL 2021 SRW paper "A Sliding-Window Approach to Automatic Creation of Meeting Minutes"
📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.
Machine learning portfolio project