An attempt at replicating Lifelong Neural Developmental Programs (LNDP)[^1] in PyTorch.
Visualization of a rollout in CartPole (preceded by a spontaneous-activity phase):
visualization.mp4
... not quite there yet, performance-wise.[^2]
What I tried to fix it, my suspicions, and how I would debug it further
- tried a bunch of hyperparameter configs and ablations of config values that were unclear from the paper & original implementation (which partly conflict)
- logged a lot of intermediate values (but found nothing too suspicious, except an unusually high number of edges with some settings)
- vibe debugging

As for why it doesn't learn properly, I have some guesses:
- RNG goofed up somewhere
- misinterpretation of hyperparameters from the original implementation & paper
- subtle bugs around masking, timing (esp. add/prune), indexing, initialization, etc.
- goofed up something specific to the cartpole env?
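The RNG guess is cheap to rule out by routing every seed through one helper and checking that repeated runs are bitwise identical. A minimal sketch; `seed_everything` is a hypothetical helper, not the repo's API:

```python
import random

import numpy as np
import torch


def seed_everything(seed: int) -> None:
    """Seed every RNG source a torch project typically touches.

    Hypothetical helper: the repo may centralize seeding differently.
    """
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)  # also seeds CUDA generators if available


seed_everything(0)
a = torch.randn(3)
seed_everything(0)
b = torch.randn(3)
assert torch.equal(a, b)  # identical draws => the seed path is actually hit
```

If two seeded runs still diverge, some code path draws from an unseeded generator (or from a nondeterministic op), which narrows the search considerably.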
How I would debug it further:
- actually think a bit harder about the behavior of the model (visualization, statistics)
- train in a different env from the paper (e.g. with discrete actions)
- but first, I would rewrite it in JAX, since my understanding has come a long way since I wrote this (ecosystem, parallelism, explicit RNG handling, speed, etc.; harder to goof up)
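On the statistics front, a per-generation summary of the connectivity graph would make anomalies like the unusually high edge count mentioned above easy to spot in logs. A sketch; the function name, mask layout, and dict keys are my assumptions, not the repo's API:

```python
import torch


def graph_stats(adj: torch.Tensor) -> dict:
    """Summarize a boolean adjacency mask (adj[i, j] = edge i -> j).

    Hypothetical helper for logging; log these per generation and watch
    for runaway edge counts after add/prune steps.
    """
    n = adj.shape[0]
    a = adj.float()
    edges = int(a.sum())
    out_deg = a.sum(dim=1)
    return {
        "edges": edges,
        "density": edges / (n * n),  # self-loops counted; adjust if masked out
        "max_out_degree": int(out_deg.max()),
        "mean_out_degree": float(out_deg.mean()),
    }


# Tiny 3-node graph with edges 0->1, 0->2, 1->2:
adj = torch.tensor([[0, 1, 1], [0, 0, 1], [0, 0, 0]], dtype=torch.bool)
stats = graph_stats(adj)
assert stats["edges"] == 3 and stats["max_out_degree"] == 2
```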
Mean fitness is around 50 while max fitness reaches 350 on CartPole. That gap seems surprisingly large, and may point to a bug in the optimization rather than in the architecture implementation.
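One way to test that hypothesis is to log a fitness summary each generation: if the mean stays far below the max across generations, good genomes exist but selection/mutation isn't propagating them, which is an optimization-side symptom. A sketch with assumed names and shapes:

```python
import numpy as np


def fitness_report(fitnesses: np.ndarray) -> dict:
    """Per-generation fitness summary (hypothetical helper, assumed shapes).

    A persistently large mean-to-max gap suggests the best individuals
    are found but not exploited by the evolutionary loop.
    """
    best = float(fitnesses.max())
    return {
        "mean": float(fitnesses.mean()),
        "max": best,
        "std": float(fitnesses.std()),
        # fraction of the population within 10% of the best individual
        "frac_near_max": float((fitnesses >= 0.9 * best).mean()),
    }


pop = np.array([50.0, 40.0, 60.0, 350.0])
report = fitness_report(pop)
assert report["max"] == 350.0 and report["frac_near_max"] == 0.25
```

If `frac_near_max` never grows over generations, I'd look at selection pressure and elite handling before touching the architecture code.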
Some lessons learned:
- no premature optimization: first make the code run, then make it pretty/efficient/modular/thoroughly typed
- don't try to be smart: add your own ideas after you have a working baseline
- put your (own) mind to it, or let it be: even the best LLMs [2025] fail hard at vibe-debugging nontrivial ML code (hallucinations, getting lost in dead ends, etc.)
Setup:

```shell
git clone [email protected]:MaxWolf-01/LNDP.git
cd LNDP
uv sync --all-extras
source .venv/bin/activate
```

Run cartpole with default hparams and wandb logging:

```shell
python experiments/cartpole_paper.py --wandb.enabled true
```

Footnotes
[^1]: In case you don't know what it is, here's a TLDR. For the conceptual picture, I recommend reading the paper, as my note mostly focuses on implementation details.
[^2]: Ultimately, I left it here since I had obtained a deeper understanding by getting my hands dirty, and I've since been busy collecting stepping stones towards successors that break with some assumptions of LNDP; getting this to work wouldn't be worth the effort. What's more, I worked on this during conscription, with limited time/attention/patience.