ddpg

Deep Deterministic Policy Gradients

I'm following the original DDPG paper (Lillicrap et al., "Continuous control with deep reinforcement learning") as closely as possible, and I use its "low-dimensional" state observations rather than the pixel-based ones.
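Two pieces of the paper that are easy to get subtly wrong are the Ornstein-Uhlenbeck exploration noise (the paper uses theta = 0.15, sigma = 0.2) and the "soft" target-network update theta' <- tau * theta + (1 - tau) * theta' (the paper uses tau = 0.001). The sketch below is a minimal, framework-free illustration of both; it is not taken from this repo's main.py, the names `OUNoise` and `soft_update` are hypothetical, and flat lists of floats stand in for real network weights.

```python
import random


class OUNoise:
    """Ornstein-Uhlenbeck exploration noise (hypothetical helper, not from main.py)."""

    def __init__(self, size, mu=0.0, theta=0.15, sigma=0.2, dt=1.0):
        self.size = size
        self.mu = mu
        self.theta = theta
        self.sigma = sigma
        self.dt = dt
        self.reset()

    def reset(self):
        # Start each episode at the long-run mean mu.
        self.state = [self.mu] * self.size

    def sample(self):
        # One Euler step of dx = theta * (mu - x) dt + sigma * dW per dimension.
        self.state = [
            x
            + self.theta * (self.mu - x) * self.dt
            + self.sigma * (self.dt ** 0.5) * random.gauss(0.0, 1.0)
            for x in self.state
        ]
        return list(self.state)


def soft_update(target, source, tau=0.001):
    """Return tau * source + (1 - tau) * target, element-wise.

    Assumption: parameters are flat lists of floats standing in for network weights.
    """
    return [tau * s + (1.0 - tau) * t for t, s in zip(target, source)]


if __name__ == "__main__":
    noise = OUNoise(size=1)  # Pendulum-v0 has a single action dimension
    print([round(noise.sample()[0], 3) for _ in range(5)])
    print(soft_update([0.0, 0.0], [1.0, 2.0], tau=0.5))
```

Because the noise is temporally correlated, consecutive samples drift together instead of jittering independently, which the paper motivates as better exploration for physical control problems with inertia. A small tau keeps the target networks slowly moving, which stabilizes the critic's bootstrapped targets.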

Action space: a single continuous torque in [-2, 2].
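The paper bounds the actor's output with a final tanh layer, so for a [-2, 2] action range one common approach is to rescale tanh's [-1, 1] output and to clip again after adding exploration noise. The sketch below assumes that approach; `scale_action` and `explore` are hypothetical names, not functions from this repo.

```python
import math

ACTION_LOW, ACTION_HIGH = -2.0, 2.0  # Pendulum-v0 torque bounds


def scale_action(pre_activation):
    """Map an unbounded actor output into [ACTION_LOW, ACTION_HIGH] via tanh."""
    half_range = (ACTION_HIGH - ACTION_LOW) / 2.0
    midpoint = (ACTION_HIGH + ACTION_LOW) / 2.0
    return midpoint + half_range * math.tanh(pre_activation)


def explore(action, noise):
    """Add exploration noise, then clip back into the valid action range."""
    return min(ACTION_HIGH, max(ACTION_LOW, action + noise))


if __name__ == "__main__":
    a = scale_action(0.3)
    print(a, explore(a, 0.5))
```

Clipping after the noise matters because a deterministic action near the bound plus noise can easily leave [-2, 2].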

python main.py Pendulum-v0

Status: not yet working. I think the implementation is complete, but alas there is a bug somewhere. Ugh.

(These might be useful to supplement the original paper.)