LatentLinter is a plug-and-play PyTorch module that adds statistical regularization to gradient-based planners in model-based RL. It detects when planned trajectories leave the training manifold and penalizes them, acting as a safety check for optimizer stability.
In model-based RL, your planner optimizes actions against a learned world model (e.g., DreamerV3, TD-MPC). If the model has imperfections, the optimizer can find adversarial inputs, i.e., trajectories that exploit model errors to achieve impossible rewards, causing latent state divergence and physically invalid plans.
LatentLinter fits a Principal Component Analysis (PCA) subspace to your training latents and adds a reconstruction penalty to the planner's loss. If a planned state is far from the manifold, it's penalized, forcing the trajectory back to in-distribution regions.
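Conceptually, the penalty for a planned state is its squared reconstruction error after projection onto the fitted PCA subspace. The following is a minimal, illustrative sketch of that idea only; PCAPenalty is a hypothetical stand-in, not the library's ManifoldConstraint implementation:

import torch

class PCAPenalty:  # hypothetical stand-in, not the ManifoldConstraint internals
    def __init__(self, latents, pca_dim=16):
        self.mean = latents.mean(dim=0)
        centered = latents - self.mean
        # top principal directions of the training latents via SVD
        _, _, Vt = torch.linalg.svd(centered, full_matrices=False)
        self.components = Vt[:pca_dim]              # (pca_dim, D)

    def __call__(self, states):
        centered = states - self.mean
        recon = centered @ self.components.T @ self.components
        # squared distance of each state to the PCA subspace
        return ((centered - recon) ** 2).sum(dim=-1)

Because the penalty is built from plain tensor ops, it stays differentiable with respect to the planned states, which is what lets it plug into a gradient-based planner.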
pip install latent-linter
# quick start
import torch
from latent_linter import ManifoldConstraint

# 1 - fit the guard to valid training latents
latents = torch.load("replay_buffer.pt")  # shape (N, D)
guard = ManifoldConstraint(latents, pca_dim=16)

# 2 - in your planning loop
# `actions`, `world_model`, `task_loss`, and `iterations` come from your own setup
optimizer = torch.optim.Adam([actions], lr=0.1)
for i in range(iterations):
    optimizer.zero_grad()
    states = world_model(actions)  # roll out your world model
    # add the manifold penalty to the task loss
    ood_penalty = guard(states).mean()
    loss = task_loss + 10.0 * ood_penalty
    loss.backward()
    optimizer.step()
We tested LatentLinter on a synthetic "glitch" environment where actions > 0.8 trigger a physics-breaking shortcut.
# see demo_honeypot.py
# baseline: 53% cheat rate
# guarded: 6% cheat rate
# prevention: 33.3%
This proves the mechanism works, but not that it solves natural hallucinations (see Limitations).
- Debugging planners - If your gradient-based planner diverges, guard(states) gives you a diagnostic number.
- CI/CD - Fail the build if guard(states).max() > 1.0 (latent collapse detected); see the sketch after this list.
- Hyperparameter sweeps - Plot guard(states) vs. lr to find stable optimizer ranges.
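A minimal sketch of the CI/CD check referenced above, assuming you have saved planned latent states from an evaluation run; the file names and the 1.0 threshold are illustrative assumptions, not part of latent-linter:

import sys
import torch
from latent_linter import ManifoldConstraint

# fit the guard on trusted training latents, then score a saved eval rollout
latents = torch.load("replay_buffer.pt")        # (N, D) training latents
planned = torch.load("eval_planned_states.pt")  # (T, D) states from the planner
guard = ManifoldConstraint(latents, pca_dim=16)

max_penalty = guard(planned).max().item()
print(f"max OOD penalty: {max_penalty:.3f}")
if max_penalty > 1.0:  # threshold from the note above; tune it for your model
    sys.exit("latent collapse detected - failing the build")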
- Stochastic models - Effect is weaker (ensemble methods like MOPO are better for uncertainty).
- Deterministic models - Effect is marginal (CartPole tests show 2-10% improvement).
- Production safety - This is a diagnostic tool, not a certified safety layer.
- Not tested on high-dimensional models. DreamerV3, V-JEPA, TWM remain untested. Open an issue if you try this.
- PCA is a coarse approximation. For complex manifolds, normalizing flows or VAEs may work better.
- Effect size depends on model flaws. If your world model is near-perfect, the guard adds little value.
- No theoretical guarantees. Unlike CBFs, this is empirical regularization, not certified safety.
The long-term goal is to build "Representation Engineering (RepEng) for World Models": a comprehensive toolkit to inspect, interpret, and intervene in the latent dynamics of learned world models.
Future versions will target:
- Inspection - Visualizing which latent neurons control specific physical properties (e.g., gravity, friction).
- Steering - Actively modifying latent states to guide agent behavior without retraining.
- Interpretation - Decoding the "black box" of learned dynamics.
If you are interested in building the matplotlib of latent space, let's talk.
We welcome contributions and value simplicity over complexity!
How to contribute:
- New Constraints - Have a better idea than PCA (e.g., normalizing flows, VAEs)? Open a PR!
- Planner Integration - Write a wrapper for your favorite planner (e.g., MPPI, CEM); see the sketch after this list.
- Demos - Did you stabilize a different environment (Walker, Hopper, Quadruped)? Submit a demo script.
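For sampling-based planners, a wrapper can fold the guard into the candidate cost without any backward pass. The sketch below is a rough starting point, not an existing integration: world_model, reward_fn, and the tensor shapes are placeholders for your own code, and we assume the guard scores a (num_states, D) batch as in the quick start.

import torch

def penalized_cost(action_seqs, world_model, reward_fn, guard, weight=10.0):
    # action_seqs: (num_candidates, horizon, action_dim) sampled by CEM/MPPI
    states = world_model(action_seqs)             # (num_candidates, horizon, D)
    task_cost = -reward_fn(states).sum(-1)        # minimize negative return
    flat = states.reshape(-1, states.shape[-1])   # guard expects (num_states, D)
    ood = guard(flat).reshape(states.shape[:-1])  # (num_candidates, horizon)
    # sampling-based planners only need a scalar cost per candidate, no gradients
    return task_cost + weight * ood.sum(-1)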
@misc{latentlinter2025,
  title={A Differentiable Constraint for Stable Planning},
  author={Yash Thube},
  year={2025},
  url={https://github.com/thubZ09/latent-linter}
}