Was able to get things nominally up and running based on the following:
To install: clone the repo, then cd cortex/ec3 (ec = explore-compress, the intellectual lineage of DreamCoder, which was ec2 in their repo).
./install-deps.sh
This should build the ocaml executable. Run it via
./run.sh -b 512 -g -p
-b : batch size (change based on your gpu memory)
-g : (optional) debug logging
-p : parallel. Defaults to assuming there are ~16 cores; I should make that a parameter. Turn it off when debugging.

Training: in a separate terminal,
cd cortex/ec3
python ec33.py -b 512
This will start training. Batch size needs to be the same.

Dreaming: once it writes out a model, you can start dreaming in yet another terminal:
python ec33.py -b 512 -d
where -d : dreaming
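Since the trainer and the dreamer only communicate through the model file written to disk, the dreaming process has to wait until a checkpoint exists before it can load one. A minimal sketch of that handoff, assuming a file-based checkpoint (the path, the poll interval, and the `wait_for_checkpoint` helper are all hypothetical, not names from this repo):

```python
import os
import time

def wait_for_checkpoint(path, poll_s=2.0, timeout_s=None):
    """Block until a file exists at `path`, polling every `poll_s` seconds.

    Raises TimeoutError if `timeout_s` elapses first (None = wait forever).
    """
    start = time.monotonic()
    while not os.path.exists(path):
        if timeout_s is not None and time.monotonic() - start > timeout_s:
            raise TimeoutError(f"no checkpoint appeared at {path}")
        time.sleep(poll_s)
    return path
```

The dreamer side would then call something like `wait_for_checkpoint("model.pt")` before loading the model; polling the file's mtime instead of bare existence would additionally let it pick up refreshed checkpoints as training progresses.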
You can monitor training progress in yet another terminal:
python plot_losslog.py -b 512
(Window output: assumes you're running locally.)

At present, the dreams don't directly feed back into the training; that's what I'm working on now. But this is enough for you to poke around!
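For the part that isn't wired up yet, one common shape for feeding dreams back into training is a bounded replay buffer that the dreaming process appends to and the trainer samples from when building batches. A rough sketch of that idea only (the `DreamBuffer` name, capacity, and mixing fraction are all hypothetical, not anything from ec33.py):

```python
import random
from collections import deque

class DreamBuffer:
    """Hypothetical buffer of dreamed samples; oldest entries are evicted."""

    def __init__(self, capacity=10000):
        # deque with maxlen drops the oldest sample once capacity is hit
        self.dreams = deque(maxlen=capacity)

    def add(self, sample):
        self.dreams.append(sample)

    def mix_batch(self, real_batch, dream_frac=0.25, rng=random):
        """Return a batch where up to `dream_frac` of the real samples
        are replaced by randomly chosen dreamed samples."""
        if not self.dreams:
            return list(real_batch)
        n_dream = min(int(len(real_batch) * dream_frac), len(self.dreams))
        kept = list(real_batch)[: len(real_batch) - n_dream]
        return kept + rng.sample(list(self.dreams), n_dream)
```

With something like this, the trainer's batch-construction step could call `mix_batch` on each real batch, keeping the two processes decoupled: the dreamer only ever appends, the trainer only ever samples.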
Probably going to have some high-level questions about how data is getting passed around between the processes here, but I want to poke at it a little first...
Two quick ones to help orient me:
- What kind of hardware setup have you been using to train on this so far? I see a comment in `run.sh` that reads `# use the first 4090 (Second one for python)`. Does this imply two GPUs, with one running the ocaml stuff and the other doing pytorch?
- Anything to think about in terms of setting up the python environment? I'm working from an image with Ubuntu 22.04 + CUDA 12.0 (had to go to datacrunch to get a GPU instance, as AWS is being weirdly stingy with my personal account)... I was able to get `ec33.py` running just by doing a naked pip install of `torch` and `matplotlib`, though that's not best practice, obviously. Mainly asking because ocaml is a black box to me for now, and I don't really understand what (if any) dependencies might be getting shared between it and a pytorch installation.
- Once I get my bearings a bit, I'd be down to maybe try to dockerize the setup procedure here if you think that makes sense.