SLEEPERAGENT

Agent Safety in an evolving AI landscape

“In this AI-renaissance, what if we could find the latest vulnerabilities?”

⚡ R E S E A R C H   A G E N T
Finds the most recent research on adversarial prompts from Google, GitHub repositories, and arXiv research papers.
Feeds this up-to-date research to the second agent.
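As a rough illustration of the research step, the sketch below merges results from several sources, drops duplicate URLs, and keeps the newest entries. It is not taken from this repository: the `Finding` record, the `merge_findings` helper, and the example URLs are all hypothetical, and the actual search code (for example in `search_google.py`) may work quite differently.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Finding:
    """Hypothetical record for one search result; not the repo's own type."""
    source: str      # e.g. "google", "github", or "arxiv"
    title: str
    url: str
    published: date


def merge_findings(*batches, limit=5):
    """Deduplicate findings by URL and return the most recent ones first."""
    by_url = {}
    for batch in batches:
        for finding in batch:
            current = by_url.get(finding.url)
            # When the same URL appears twice, keep the newer record.
            if current is None or finding.published > current.published:
                by_url[finding.url] = finding
    newest_first = sorted(by_url.values(), key=lambda f: f.published, reverse=True)
    return newest_first[:limit]
```

Each source would contribute one batch, for example `merge_findings(google_results, github_results, arxiv_results)`, and the merged list is what would be handed to the second agent.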

🥷 A D V E R S A R I A L   A G E N T
Synthesizes probabilistic prompts based on online jailbreaking research.
Improves the jailbreaking prompt as the session goes on by learning from the responses of the target AI system.
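One way to picture "probabilistic" selection that improves over a session is a weighted sampling loop. The sketch below is an assumption about the general shape of such a loop, not the project's implementation: `refine_loop`, the abstract strategy labels, and the `query_target` and `is_refusal` callbacks are hypothetical names, and the multiplicative weight update is a simple stand-in for whatever learning rule the agent actually uses.

```python
import random


def refine_loop(strategies, query_target, is_refusal, rounds=10, seed=0):
    """Sample a strategy by weight each round and reweight it from the target's response.

    strategies   -- abstract strategy labels (placeholders, not real prompts)
    query_target -- callable that sends the chosen strategy to the system under test
    is_refusal   -- callable that returns True if the response was a refusal
    """
    rng = random.Random(seed)
    weights = {name: 1.0 for name in strategies}
    history = []
    for _ in range(rounds):
        names = list(weights)
        # Weighted random choice: strategies with higher weight are picked more often.
        choice = rng.choices(names, weights=[weights[n] for n in names], k=1)[0]
        refused = is_refusal(query_target(choice))
        # Shrink the weight after a refusal, grow it otherwise.
        weights[choice] *= 0.5 if refused else 1.5
        history.append((choice, refused))
    return weights, history
```

In an evaluation harness, `query_target` would wrap the model being tested and `history` would be logged for review, so that the strategies that got through can be reported and patched.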

Full Presentation: https://docs.google.com/presentation/d/1jLnWusQSHZmmS3mWvcX9xqn3kx3AySmMnITh3A4na68/edit?usp=sharing

Repository contents (108 commits):

jailbreak-detector
prompts
README.md
eval.py
eval.sh
requirements.txt
search_google.py
setup.sh
test.py
test.sh

taterowney/AI-Agents-Hackathon
