Pinned
-
agentic-misalignment-anthropic
Public
Forked from anthropic-experimental/agentic-misalignment
My fork of Anthropic's Agentic Misalignment
Python
-
empathy-probes
Public
Detecting empathy as a linear direction in transformer activation space. A weekend research project extending Virtue Probes to the EIA benchmark.
Jupyter Notebook 2