allenai/genesys

🧬 Genesys: Language Modeling by Language Models

Genesys (Genetic discovery system) is a repository of code and utilities for a distributed evolutionary system that uses LLM agents to discover better LLMs. It covers the full workflow: ideation, implementation, checking, training, and evaluation. You can try the demo at https://genesys.allen.ai (it may take some time to load). Our experiment data are presented on these pages:

  • Evolution statistics: Evolve - Evolution Statistics
  • Discovered Designs: Viewer - Design Artifacts (you can download them here)
  • Design Leaderboard: Viewer - Design Leaderboard

The GUI has many other features that you can explore. Here is a short demo video that briefly shows some of them: https://drive.google.com/file/d/1JG0hNAJuaPZWUKfwrwoF_ufh0GJuLO7z/view?usp=sharing

Installation

  1. Clone the repo. The steps below assume it is under your home directory, ~.

  2. Create a virtual environment with PyTorch, move into the repo, and install the genesys CLI:

conda create -n genesys python=3.12 -y && \
conda activate genesys && \
cd ~/genesys && \
conda install pytorch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 pytorch-cuda=11.8 -c pytorch -c nvidia -y && \
pip install -e .
  3. Set up the LLM API keys and the related environment variables:
export MY_OPENAI_KEY=YOURKEY
export TOGETHER_API_KEY=YOURKEY
export ANTHROPIC_API_KEY=YOURKEY
export HF_KEY=YOURKEY
export WANDB_API_KEY=YOURKEY
export S2_API_KEY=YOURKEY
export DATA_DIR=~/model_discovery/data # change it to a directory you like
export CKPT_DIR=~/model_discovery/ckpt # change it to a directory you like
export DB_KEY_PATH=~/model_discovery/secrets/db_key.json # provide yours, see item 4 below
export HF_DATASETS_TRUST_REMOTE_CODE=1
export PINECONE_API_KEY=YOURKEY
export COHERE_API_KEY=YOURKEY
export PERPLEXITY_API_KEY=YOURKEY
export MATHPIX_API_ID=YOURKEY # optional; provides a PDF-to-text service, useful for example when fetching a paper from an arXiv URL. It is not used in the paper, but you may try it yourself.
  4. Set up a Firebase backend and store the secret JSON at DB_KEY_PATH. This is required for distributed evolution.

  5. Set up a Pinecone vector store (optional, needed only if you want to use vector search over paper chunks). You need to store the chunks in your vector store yourself; refer to the code in search_utils.py.

  6. Set up the requirements:

genesys setup && \
pip install -r requirements_optional.txt # optional
  7. Test your setup by launching a node:
genesys node
  8. Launch the GUI:
genesys gui
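
Before launching a node, it can help to confirm that the variables from the export step are actually set in the current shell. The following is a minimal sketch, not part of the genesys CLI: `check_genesys_env` is a hypothetical helper name, and the list of names is copied from the exports above. MATHPIX_API_ID is omitted because it is marked optional; you may also drop PINECONE_API_KEY if you skip the optional vector store.

```shell
# Sketch: report which of the exported variables are unset or empty.
# check_genesys_env is a hypothetical helper, not a genesys command.
# Written for POSIX sh so it can be pasted into bash, zsh, or dash.
check_genesys_env() {
  _missing=0
  for _name in MY_OPENAI_KEY TOGETHER_API_KEY ANTHROPIC_API_KEY HF_KEY \
               WANDB_API_KEY S2_API_KEY DATA_DIR CKPT_DIR DB_KEY_PATH \
               HF_DATASETS_TRUST_REMOTE_CODE PINECONE_API_KEY \
               COHERE_API_KEY PERPLEXITY_API_KEY; do
    # The names are hard-coded above, so this eval only reads known variables.
    eval "_value=\${$_name:-}"
    if [ -z "$_value" ]; then
      echo "missing: $_name" >&2
      _missing=1
    fi
  done
  # Exit status 0 when everything is set, 1 when at least one name is missing.
  return "$_missing"
}
```

After pasting the function into your shell, running `check_genesys_env && genesys node` would launch the node only when every listed variable is non-empty.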

About Eval Environment

The evaluation environment should already be set up if you followed the installation instructions. If not, here is how to set it up separately.

Install the customized lm_eval: https://github.com/chengjunyan1/lm-evaluation-harness/tree/main

You must export DATA_DIR first, then download the evaluation data into DATA_DIR, e.g.:

{DATA_DIR}/blimp_filtered/adjunct_island.jsonl

The download link for babyLM evaluation data: https://files.osf.io/v1/resources/ad7qg/providers/osfstorage/66358ec34664da20a0ed6acc/?zip=evaluation_data
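
To check that the data landed where the example path above expects it, a small test like the following may be useful. It is a sketch under stated assumptions: `check_eval_data` is a hypothetical helper name, and it looks only for the one example file named in this section, not for the full set of evaluation files.

```shell
# Sketch: confirm the example BabyLM file from the text exists under DATA_DIR.
# check_eval_data is a hypothetical helper, not part of genesys or lm_eval.
check_eval_data() {
  if [ -z "${DATA_DIR:-}" ]; then
    echo "DATA_DIR is not set" >&2
    return 1
  fi
  _file="$DATA_DIR/blimp_filtered/adjunct_island.jsonl"
  if [ ! -f "$_file" ]; then
    echo "not found: $_file" >&2
    return 1
  fi
  echo "found: $_file"
}
```

If the function reports the file as not found, the archive was probably extracted to a different directory level than {DATA_DIR}/blimp_filtered.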

Note that every time you change DATA_DIR, you may need to reinstall the customized lm_eval. Do NOT install peft, which may cause errors. Supported tasks are listed at https://github.com/EleutherAI/lm-evaluation-harness/tree/main/lm_eval/tasks; the BabyLM tasks specifically are "blimp_filtered" and "blimp_supplement".

Hints for running evolution

It is better to separate the design nodes from the verification nodes: design checkers need GPUs, so sharing a machine may cause conflicts. We recommend deploying a few design nodes and many verification nodes, since design nodes are mostly bounded by CPU and API rate limits.

About

Source code and utilities for the Genesys distributed language model architecture discovery system.
