White Men Lead, Black Women Help: Uncovering Gender, Racial, and Intersectional Bias in Language Agency
Yixin Wan, Kai-Wei Chang
The official code repository for the NAACL 2024 TrustNLP Best Paper (non-archival track): White Men Lead, Black Women Help: Uncovering Gender, Racial, and Intersectional Bias in Language Agency
We provide the code for generating the raw dataset. Alternatively, the folder `./lac_dataset_construction/lac_dataset` contains the final cleaned, labeled, and split Language Agency Classification (LAC) dataset, which you can use directly to train LAC classifiers.
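If you want to inspect the released splits programmatically, a minimal loading sketch follows; the file names (`train.csv`/`dev.csv`/`test.csv`) and column layout are assumptions for illustration, so check the folder for the actual schema.

```python
import pandas as pd

# Hypothetical file names -- inspect ./lac_dataset_construction/lac_dataset
# for the actual split files and column schema.
splits = {
    name: pd.read_csv(f"./lac_dataset_construction/lac_dataset/{name}.csv")
    for name in ("train", "dev", "test")
}
print({name: len(df) for name, df in splits.items()})
print(splits["train"].head())  # expect a text column and an agency label column
```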
- To run data generation, first go to the folder with the generation scripts:

  ```bash
  cd lac_dataset_construction
  ```

- Then, add your OpenAI account configuration in `./lac_dataset_construction/generation_util.py` (a sketch of a typical configuration follows this list) and run:

  ```bash
  python generate_dataset.py
  ```
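The exact fields expected in `generation_util.py` depend on the repository version; a typical configuration with the `openai` Python package looks like the sketch below (reading the key from an environment variable is our suggestion, not a repo requirement).

```python
import os
import openai

# Avoid hard-coding secrets: read the API key from the environment.
openai.api_key = os.environ["OPENAI_API_KEY"]
# Some accounts also set an organization ID:
# openai.organization = "org-..."
```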
To train a BERT-based LAC classifier from scratch, first return to the main `labe-agency` folder and run:

```bash
sh ./scripts/run_train_bert.sh
```
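To see the general shape of what such a training run does, below is a minimal fine-tuning sketch using Hugging Face `transformers`; the hyperparameters, paths, and column names are illustrative assumptions, not the settings in `run_train_bert.sh`.

```python
import pandas as pd
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL = "bert-base-cased"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=2)

def load_split(path):
    # Hypothetical schema: a "text" column and a binary "label" column.
    ds = Dataset.from_pandas(pd.read_csv(path))
    return ds.map(lambda x: tokenizer(x["text"], truncation=True, max_length=128),
                  batched=True)

train_ds = load_split("./lac_dataset_construction/lac_dataset/train.csv")
dev_ds = load_split("./lac_dataset_construction/lac_dataset/dev.csv")

args = TrainingArguments(
    output_dir="./checkpoints/bert-lac",
    num_train_epochs=3,                # illustrative values, not the script's
    per_device_train_batch_size=16,
    evaluation_strategy="epoch",
)
Trainer(model=model, args=args, tokenizer=tokenizer,
        train_dataset=train_ds, eval_dataset=dev_ds).train()
```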
Alternatively, download model checkpoints from this Google Drive link and store all checkpoint subfolders (e.g., `bert-base-cased_binary_*_*`) in the `labe-agency/checkpoints/` folder.
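Once a checkpoint subfolder is in place, it loads like any local `transformers` model; substitute the wildcard parts of the folder name with the actual subfolder you downloaded.

```python
from transformers import pipeline

# Replace the wildcards with the real subfolder name from the download.
clf = pipeline("text-classification",
               model="./checkpoints/bert-base-cased_binary_*_*")
print(clf("She spearheaded the new research initiative."))
```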
For experiments on ChatGPT, first add your OpenAI account configuration in `generation_util.py`.
For generation experiments without prompt-based mitigation, run:

```bash
sh ./scripts/run_generate.sh
```

For generation experiments with prompt-based mitigation, run:

```bash
sh ./scripts/run_generate_mitigate.sh
```
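The repository's actual mitigation prompt lives in the generation scripts; the sketch below only illustrates the general idea of prompt-based mitigation (appending a debiasing instruction to the task prompt), with a hypothetical instruction and the `openai` v1 client.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

task = "Write a reference letter for Jane, a project manager."
# Hypothetical mitigation instruction; the repo's actual wording may differ.
mitigation = ("Describe the subject with a balanced mix of agentic "
              "(leadership, achievement) and communal (helpfulness, warmth) "
              "language.")

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": f"{task} {mitigation}"}],
)
print(response.choices[0].message.content)
```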
For evaluation experiments on language agency gender bias in human-written texts, run:

```bash
sh ./scripts/run_infer_calc_agency_aggregated_human_bert.sh
```
For other evaluation experiments on gender, racial, and intersectional bias in LLM-generated texts, run the corresponding evaluation shell scripts in the `./scripts/` folder. For instance, to evaluate Llama3-generated texts without mitigation, run:

```bash
sh ./scripts/run_infer_calc_agency_aggregated_llama3_bert.sh
```
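At a high level, these evaluation scripts classify each sentence of a text as agentic or communal with the trained LAC classifier and compare aggregate agency levels across demographic groups. The sketch below illustrates that style of measurement with a simplified agency-gap statistic; the label strings, sentence splitting, and metric are assumptions, so consult the scripts for the paper's exact computation.

```python
from statistics import mean
from transformers import pipeline

# Replace the wildcards with a real checkpoint subfolder name.
clf = pipeline("text-classification",
               model="./checkpoints/bert-base-cased_binary_*_*")

def agentic_ratio(text):
    """Fraction of sentences the classifier labels as agentic (naive split)."""
    sents = [s.strip() for s in text.split(".") if s.strip()]
    preds = clf(sents)
    # Assumed label string; check the checkpoint's id2label mapping.
    return mean(p["label"] == "agentic" for p in preds)

# Tiny hypothetical inputs; in practice these are the generated texts per group.
texts_by_group = {
    "male": ["He led the team and drove the project to completion."],
    "female": ["She supported her colleagues and helped wherever needed."],
}
ratios = {g: mean(agentic_ratio(t) for t in ts) for g, ts in texts_by_group.items()}
print(ratios, ratios["male"] - ratios["female"])  # a simple agency-gap statistic
```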