@safety-research

Safety Research

Popular repositories

  1. circuit-tracer (Public)

    Python · 2.5k stars · 275 forks

  2. petri (Public)

    An alignment auditing agent capable of quickly exploring alignment hypotheses

    Python · 724 stars · 91 forks

  3. persona_vectors (Public)

    Persona Vectors: Monitoring and Controlling Character Traits in Language Models

    Python · 308 stars · 72 forks

  4. SCONE-bench (Public)

    136 stars · 23 forks

  5. safety-tooling (Public)

    Inference API for many LLMs and other useful tools for empirical research

    Python · 85 stars · 22 forks

  6. open-source-alignment-faking (Public)

    Open Source Replication of Anthropic's Alignment Faking Paper

    Jinja · 52 stars · 7 forks

Repositories

Showing 10 of 33 repositories
