
Hi there 👋 I am Tiansheng Huang

  • I’m currently a fourth-year PhD candidate at Georgia Tech.
  • I work on safety alignment for large language models. In particular, I am interested in red-teaming attacks and defenses for LLMs.

Selected Publications

  • [2025/3/01] Safety Tax: Safety Alignment Makes Your Large Reasoning Models Less Reasonable arXiv [paper] [code]
  • [2025/1/30] Virus: Harmful Fine-tuning Attack for Large Language Models bypassing Guardrail Moderation arXiv [paper] [code]
  • [2024/9/26] Harmful Fine-tuning Attacks and Defenses for Large Language Models: A Survey arXiv [paper] [repo]
  • [2024/9/3] Booster: Tackling harmful fine-tuning for large language models via attenuating harmful perturbation ICLR2025 [paper] [code] [Openreview]
  • [2024/8/18] Antidote: Post-fine-tuning safety alignment for large language models against harmful fine-tuning ICML2025 [paper] [code]
  • [2024/5/28] Lazy safety alignment for large language models against harmful fine-tuning NeurIPS2024 [paper] [code]
  • [2024/2/2] Vaccine: Perturbation-aware alignment for large language model against harmful fine-tuning NeurIPS2024 [paper] [code]
  • [2023/12/01] Lockdown: Backdoor Defense for Federated Learning with Isolated Subspace Training NeurIPS2023 [paper] [code]

Pinned Repositories

  1. git-disl/awesome_LLM-harmful-fine-tuning-papers

     A survey on harmful fine-tuning attacks for large language models.

  2. git-disl/Virus

     Official code for the paper "Virus: Harmful Fine-tuning Attack for Large Language Models Bypassing Guardrail Moderation".

  3. git-disl/Booster

     Official code for the paper "Booster: Tackling Harmful Fine-tuning for Large Language Models via Attenuating Harmful Perturbation" (ICLR2025 Oral).

  4. git-disl/Antidote

     Unofficial re-implementation of "Antidote: Post-fine-tuning Safety Alignment for Large Language Models against Harmful Fine-tuning Attack" (ICML2025).

  5. git-disl/Lisa

     Official code for the paper "Lazy Safety Alignment for Large Language Models against Harmful Fine-tuning" (NeurIPS2024).

  6. git-disl/Vaccine

     Official code for the paper "Vaccine: Perturbation-aware Alignment for Large Language Models" (NeurIPS2024).