Sparsemax-Supervised Attention for Explainable Hate Speech Detection

This repository is a fork of the original HateXplain project. The main addition is a BERT-based classifier whose attention weights are normalised with sparsemax instead of the usual softmax. Sparsemax produces sparser attention distributions, in which many tokens receive exactly zero weight, and this can make rationales easier to interpret.
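To make the difference concrete, here is a minimal pure-Python sketch of sparsemax as defined by Martins and Astudillo (2016). Sparsemax is the Euclidean projection of a score vector onto the probability simplex, so low-scoring tokens can get exactly zero weight. The sketch was written for this README and is not the repository's sparsemax_tensor. The function names are our own.

```python
import math


def softmax(z):
    """Standard softmax: every entry is strictly positive."""
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def sparsemax(z):
    """Project scores z onto the probability simplex (can return exact zeros)."""
    z_sorted = sorted(z, reverse=True)
    cumsum = 0.0
    support_size = 0
    support_sum = 0.0
    for k, v in enumerate(z_sorted, start=1):
        cumsum += v
        # Sorted entry k stays in the support while 1 + k * z_(k) > sum_{j<=k} z_(j).
        if 1 + k * v > cumsum:
            support_size = k
            support_sum = cumsum
    tau = (support_sum - 1) / support_size  # threshold subtracted from every score
    return [max(v - tau, 0.0) for v in z]


scores = [1.0, 0.8, 0.1]
print(softmax(scores))    # all three tokens get some mass
print(sparsemax(scores))  # approximately [0.6, 0.4, 0.0]
```

Softmax gives every token some weight. Sparsemax zeroes out the third score entirely, and those exact zeros are what make the resulting attention maps sparse.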

The dataset and preprocessing utilities remain unchanged from the upstream repository. Below we highlight the files specific to the sparsemax variant and how to train or evaluate the model.

Environment (Docker or conda)

Once you submit an interactive job on Run:ai with the Docker image built from ./ee559_docker_env, install the following, plus anything else the training script reports as missing when you launch it:

bash download_glove.sh

python -m spacy download en_core_web_sm

pip install wandb

pip install lime # only for testing with lime

We also provide a standalone conda environment file, environment-conda-only.yml, if you prefer conda:

conda env create -f environment-conda-only.yml

Branches and key files

Master branch (master)

Models/bertModels.py – implements SC_weighted_BERT with a sparsemax option (sparsemax_tensor and sparsemax_true_loss).
best_model_json/bestModel_bert_sparsemax.json – example hyperparameters for training the sparsemax model.
test_inference.py – supports --variant softmax|sparsemax for quick testing.
run_lime.sh – computes explanation metrics on both softmax and sparsemax models using LIME.
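The name sparsemax_true_loss suggests the sparsemax loss of Martins and Astudillo (2016). That loss compares scores z with a target distribution q (for instance, a normalised human rationale) without taking the log of zero probabilities. We have not checked this sketch against Models/bertModels.py. It is a standalone, hypothetical version in pure Python:

```python
def sparsemax_with_tau(z):
    """Return (sparsemax(z), tau), where tau is the simplex threshold."""
    z_sorted = sorted(z, reverse=True)
    cumsum = 0.0
    support_size = 0
    support_sum = 0.0
    for k, v in enumerate(z_sorted, start=1):
        cumsum += v
        if 1 + k * v > cumsum:  # sorted entry k is still in the support
            support_size = k
            support_sum = cumsum
    tau = (support_sum - 1) / support_size
    return [max(v - tau, 0.0) for v in z], tau


def sparsemax_loss(z, q):
    """Sparsemax loss L(z; q) for a target distribution q that sums to 1.

    L = -q.z + 1/2 * sum_{j in S(z)} (z_j^2 - tau^2) + 1/2 * ||q||^2,
    where S(z) is the set of entries with non-zero sparsemax probability.
    """
    if len(z) != len(q):
        raise ValueError("z and q must have the same length")
    p, tau = sparsemax_with_tau(z)
    linear = -sum(qi * zi for qi, zi in zip(q, z))
    quadratic = 0.5 * sum(zi * zi - tau * tau for zi, pi in zip(z, p) if pi > 0)
    return linear + quadratic + 0.5 * sum(qi * qi for qi in q)


scores = [1.0, 0.8, 0.1]
print(sparsemax_loss(scores, [0.6, 0.4, 0.0]))  # ~0: target equals sparsemax(scores)
print(sparsemax_loss(scores, [0.0, 0.0, 1.0]))  # ~1.06: all target mass on a zeroed token
```

The loss is zero exactly when sparsemax(z) equals q. Its gradient with respect to z is sparsemax(z) - q.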

Refer to the original HateXplain README, reproduced below, for a complete overview of the dataset and the other directories.

Softmax branch (softmax_branch)

This branch keeps the original softmax attention, so results can be compared using the same parameters we ran.

Multi-sparsemax branches (multi_sparsemax_branch, multi_sparsemax_branch2)

These branches contain attempts to apply the sparsemax loss at every attention layer.

Training

Use the provided parameter file to reproduce our configuration. The third argument, attention_lambda (0.001 below), controls the weight of the supervised attention loss.

python manual_training_inference.py ./best_model_json/bestModel_bert_sparsemax.json false 0.001

This will train SC_weighted_BERT with sparsemax attention on a GPU if available.
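The sketch below assumes the two losses are simply added, with attention_lambda scaling the supervised attention term. We have not verified the exact combination in the training code. The attention loss is passed in as a precomputed number, for example a sparsemax loss against the human rationale. The names total_loss and cross_entropy_from_logits are hypothetical.

```python
import math


def cross_entropy_from_logits(logits, label):
    """Negative log-probability of the true label under softmax(logits)."""
    m = max(logits)  # shift by the max for numerical stability
    log_norm = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_norm - logits[label]


def total_loss(logits, label, attention_loss, attention_lambda):
    """Classification loss plus the supervised attention loss scaled by attention_lambda."""
    return cross_entropy_from_logits(logits, label) + attention_lambda * attention_loss


# Three classes (hate, offensive, normal). The attention loss value is illustrative.
print(total_loss([2.0, 0.5, -1.0], label=0, attention_loss=1.06, attention_lambda=0.001))
```

With attention_lambda = 0.001, as in the command above, the attention term contributes one thousandth of its raw value to the total.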

Inference

After training, you can run inference and visualise the most attended tokens:

python test_inference.py
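The following hypothetical helper is not taken from test_inference.py. It shows one way to list the most attended tokens. Because sparsemax can give tokens exactly zero weight, it may return fewer than k tokens:

```python
def most_attended(tokens, weights, k=5):
    """Return up to k (token, weight) pairs with the largest non-zero attention weight."""
    if len(tokens) != len(weights):
        raise ValueError("tokens and weights must have the same length")
    nonzero = [(tok, w) for tok, w in zip(tokens, weights) if w > 0]
    nonzero.sort(key=lambda pair: pair[1], reverse=True)
    return nonzero[:k]


print(most_attended(["the", "movie", "was", "great"], [0.0, 0.3, 0.0, 0.7], k=3))
# [('great', 0.7), ('movie', 0.3)]
```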

You need the output folder produced by training, which contains model.safetensors. For example:

./bert-base-uncased_11_6_3_0.001/
    config.json
    model.safetensors
    special_tokens_map.json
    tokenizer_config.json
    vocab.txt

Set the path to this folder at the top of test_inference.py.
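Before running inference, you can check that the folder contains the files listed above and avoid a confusing error later. This is a hypothetical helper, and its name is our own:

```python
from pathlib import Path

# Files listed above for a trained checkpoint folder.
REQUIRED_FILES = (
    "config.json",
    "model.safetensors",
    "special_tokens_map.json",
    "tokenizer_config.json",
    "vocab.txt",
)


def missing_checkpoint_files(folder):
    """Return the names of required files that are absent from folder."""
    folder = Path(folder)
    return [name for name in REQUIRED_FILES if not (folder / name).is_file()]


missing = missing_checkpoint_files("./bert-base-uncased_11_6_3_0.001")
if missing:
    print("Checkpoint folder is incomplete, missing:", ", ".join(missing))
```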

LIME Evaluation

run_lime.sh evaluates explanation quality with the LIME toolkit for both the softmax and sparsemax models. The script expects trained checkpoints named bert_supervised and bert_sparsemax in the working directory.

bash run_lime.sh

Note that the number of samples is set at line 454 of testing_with_lime.py. The results in our report used 1000 samples. The current value is 300.

The results will be stored in lime_results/ and can be compared using analyze_lime_ttest.py.
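This README does not say which t-test analyze_lime_ttest.py performs. As an illustration only, the standard library can compute a paired t statistic over per-post metric values from the two models, assuming both lists cover the same posts in the same order:

```python
import math
import statistics


def paired_t_statistic(a, b):
    """Paired t statistic for per-sample scores a and b from two models."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 entries")
    diffs = [x - y for x, y in zip(a, b)]
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd == 0:
        raise ValueError("differences are constant; t is undefined")
    return statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))


# Illustrative numbers only, not results from our runs.
sparsemax_scores = [0.5, 0.6, 0.7, 0.8]
softmax_scores = [0.4, 0.55, 0.6, 0.75]
print(paired_t_statistic(sparsemax_scores, softmax_scores))  # about 5.196
```

To get a p-value, compare t against a t distribution with n - 1 degrees of freedom, or use scipy.stats.ttest_rel, which performs the same paired test.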

Original README:

🔎 HateXplain: A Benchmark Dataset for Explainable Hate Speech Detection [Accepted at AAAI 2021]

🎉 🎉 BERT for detecting abusive language (hate speech + offensive) and predicting rationales is uploaded here. Be sure to check it out 🎉 🎉.

For more details about our paper

Binny Mathew, Punyajoy Saha, Seid Muhie Yimam, Chris Biemann, Pawan Goyal, and Animesh Mukherjee "HateXplain: A Benchmark Dataset for Explainable Hate Speech Detection". Accepted at AAAI 2021.

Arxiv paper link

Abstract

Hate speech is a challenging issue plaguing the online social media. While better models for hate speech detection are continuously being developed, there is little research on the bias and interpretability aspects of hate speech. In this work, we introduce HateXplain, the first benchmark hate speech dataset covering multiple aspects of the issue. Each post in our dataset is annotated from three different perspectives: the basic, commonly used 3-class classification (i.e., hate, offensive or normal), the target community (i.e., the community that has been the victim of hate speech/offensive speech in the post), and the rationales, i.e., the portions of the post on which their labelling decision (as hate, offensive or normal) is based. We utilize existing state-of-the-art models and observe that even models that perform very well in classification do not score high on explainability metrics like model plausibility and faithfulness. We also observe that models, which utilize the human rationales for training, perform better in reducing unintended bias towards target communities.

WARNING: The repository contains content that is offensive and/or hateful in nature.

Please cite our paper in any published work that uses any of these resources.

@inproceedings{mathew2021hatexplain,
  title={HateXplain: A Benchmark Dataset for Explainable Hate Speech Detection},
  author={Mathew, Binny and Saha, Punyajoy and Yimam, Seid Muhie and Biemann, Chris and Goyal, Pawan and Mukherjee, Animesh},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={35},
  number={17},
  pages={14867--14875},
  year={2021}
}

Folder Description 📂


./Data                --> Contains the dataset-related files.
./Models              --> Contains the code for all the classifiers used.
./Preprocess          --> Contains the code for preprocessing the dataset.
./best_model_json     --> Contains the parameter values for the best models.

Table of contents 📑

🔖 Dataset :- This describes the dataset format and setup for the dataset pipeline.

🔖 Parameters :- This describes all the parameters used in this code

Usage instructions

Please set up the dataset first (this matters most if you are using a non-BERT model). Install the libraries with the following command, preferably inside a virtual environment:

pip install -r requirements.txt

Training

To train the model use the following command.

usage: manual_training_inference.py [-h]
                                    --path_to_json --use_from_file
                                    --attention_lambda

Train a deep-learning model with the given data

positional arguments:
  --path_to_json      The path to the JSON file containing the parameters
  --use_from_file     Whether to use the parameters defined in the script or
                      load them directly from the file
  --attention_lambda  Sets the contribution of the attention loss
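The training command in the fork section above passes the three values positionally. The argparse sketch below mirrors that interface. It is not the script's actual parser. We chose the case-insensitive str_to_bool converter ourselves so that both false and False are accepted.

```python
import argparse


def str_to_bool(value):
    """Accept true/false in any letter case (e.g. 'false' or 'True')."""
    lowered = value.strip().lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected true/false, got {value!r}")


def build_parser():
    parser = argparse.ArgumentParser(description="Train a deep-learning model with the given data")
    parser.add_argument("path_to_json", help="Path to the JSON file containing the parameters")
    parser.add_argument("use_from_file", type=str_to_bool, help="use_from_file flag (true/false)")
    parser.add_argument("attention_lambda", type=float, help="Weight of the attention loss")
    return parser


args = build_parser().parse_args(
    ["./best_model_json/bestModel_bert_sparsemax.json", "false", "0.001"]
)
print(args)
```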

In practice these are positional arguments: pass the three values in order, as in the training command above. You can set the parameters directly in the Python file (use_from_file set to True). To change the parameters, see the Parameters section for details. The code runs on CPU by default. The recommended approach is to copy one of the dictionaries in best_model_json and modify it.

For transformer models :- The repository currently supports models that use the same tokenization as BERT. In the params, set bert_tokens to True and path_files to any BERT-based model on Hugging Face.
For non-transformer models :- The repository currently supports the LSTM, LSTM-attention and CNN-GRU models. In the params, set bert_tokens to False and set the model name as described in the Parameters section (one of birnn, birnnatt, birnnscrat, cnn_gru).
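The two cases above can be handled by a hypothetical helper that modifies a params dict loaded (for example with json.load) from one of the files in best_model_json. The keys bert_tokens and path_files come from the description above. The model_name key is an assumption, so check the Parameters section for the actual key.

```python
import copy

# Non-transformer model names listed above.
NON_TRANSFORMER_MODELS = {"birnn", "birnnatt", "birnnscrat", "cnn_gru"}


def configure_params(base, model):
    """Return a modified copy of a params dict; base itself is left untouched."""
    params = copy.deepcopy(base)
    if model in NON_TRANSFORMER_MODELS:
        params["bert_tokens"] = False
        params["model_name"] = model  # assumed key name, check the Parameters section
    else:
        # Treat anything else as a BERT-style Hugging Face model name or path.
        params["bert_tokens"] = True
        params["path_files"] = model
    return params


print(configure_params({"bert_tokens": False}, "bert-base-uncased"))
print(configure_params({"bert_tokens": True}, "cnn_gru"))
```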

For more details about the end-to-end pipeline, see our demo.

Blogs and GitHub repos we used for reference 👼

For fine-tuning BERT, we used this blog by Chris McCormick and also referred to the Transformers GitHub repo.
For the CNN-GRU model, we used the original repo for reference.
For evaluation with the explanation metrics, we used the ERASER benchmark repo. Please see their repo and paper for more details.
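ERASER measures faithfulness with two scores. Comprehensiveness is how much the predicted probability drops when the rationale is removed. Sufficiency is how much it drops when only the rationale is kept. Below is a minimal sketch for a single rationale mask, with a hypothetical predict_proba callable standing in for a trained model. This is not the eraserbenchmark code, and ERASER additionally aggregates these scores over several rationale lengths.

```python
def comprehensiveness(predict_proba, tokens, rationale_mask, label):
    """Drop in p(label) when the rationale tokens are removed (higher = more faithful)."""
    full = predict_proba(tokens)[label]
    without_rationale = [t for t, keep in zip(tokens, rationale_mask) if not keep]
    return full - predict_proba(without_rationale)[label]


def sufficiency(predict_proba, tokens, rationale_mask, label):
    """Drop in p(label) when only the rationale tokens are kept (lower = more faithful)."""
    full = predict_proba(tokens)[label]
    only_rationale = [t for t, keep in zip(tokens, rationale_mask) if keep]
    return full - predict_proba(only_rationale)[label]


def toy_predict_proba(tokens):
    """Hypothetical two-class model: class 1 is likely whenever 'bad' appears."""
    p1 = 0.9 if "bad" in tokens else 0.2
    return [1.0 - p1, p1]


tokens = ["this", "is", "bad"]
mask = [False, False, True]
print(comprehensiveness(toy_predict_proba, tokens, mask, label=1))  # about 0.7
print(sufficiency(toy_predict_proba, tokens, mask, label=1))        # 0.0
```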

Todos

Add arxiv paper link and description.
Release better documentation for Models and Preprocess sections.
Add other Transformers models to the pipeline.
Upload our models to the Transformers community to make them public.
Create an interface for social scientists where they can use our models easily with their data

👍 The repo is still in active development. Feel free to create an issue! 👍

MicheleSmaldone/HateXplain
