Safety Research
Popular repositories
- persona_vectors Public
Persona Vectors: Monitoring and Controlling Character Traits in Language Models
- safety-tooling Public
Inference API for many LLMs and other useful tools for empirical research
- open-source-alignment-faking Public
Open Source Replication of Anthropic's Alignment Faking Paper
Repositories
Showing 10 of 33 repositories
- circuit-tracer Public
- impossiblebench Public
Official Inspect Implementation for "ImpossibleBench: Measuring LLMs' Propensity of Exploiting Test Cases"
- SCONE-bench Public
- unsupervised-truth-probes Public