🌐 Website | 📃 Paper | 𝕏 X (Twitter)
This is the official implementation for the paper "Unraveling Misinformation Propagation in LLM Reasoning", which explores how misinformation propagates in LLM reasoning.
We develop a pipeline to systematically evaluate misinformation in LLM reasoning, covering both its impact and its mitigation.
Takeaways:
- LLMs follow misinformation by default.
- LLMs fail to correct misinformation even when instructed to do so.
- Fine-tuning with early factual corrections is the most effective mitigation of misinformation propagation, but it still does not fully recover performance.
We recommend creating a new conda environment for running the code in this repository. Replace {your_path} in err_prop_usr_misinfo.yml with your actual conda installation path, then run the following commands:
conda env create -f err_prop_usr_misinfo.yml
conda activate err_prop_usr_misinfo
In this project, we use the OpenAI API, the Together AI API, and a Hugging Face User Access Token to access different language models. Follow their tutorials to create the API keys and put them in api_key/config.json. See 3.2 for the gpt-4o-mini-sft entry.
{
"openai_api_key": "...",
"togetherai_api_key": "...",
"huggingface_api_key": "...",
"gpt-4o-mini-sft": "..."
}
We start from the raw testing data saved in raw_data/{dataset_name}/test.jsonl. Preprocessing details are given in Appendix D, and the procedure is implemented in the get_init_df function in scripts/test_data_generation/main.py.
After preprocessing, we retain only questions where the equation generation model (gpt-4-0613) produces correct answers to ensure the reliability of ground-truth equations. Additionally, to exclude overly simple questions, we filter out those with fewer than 5 CoT steps in their solutions.
Then we simulate misinformation by generating correct and relevant equations and then perturbing them using common human error patterns. Details are in Section 3.2 and Appendix B.
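For intuition, the snippet below shows one way such a perturbation could look, corrupting the result of a correct intermediate equation. This is an illustrative sketch only; the actual error patterns used by the pipeline are those described in Section 3.2 and Appendix B.

```python
# Illustrative sketch only: the real perturbation logic lives in
# test_data_generation/main.py and follows the error patterns in Section 3.2 / Appendix B.
import random

def perturb_equation(equation: str, rng: random.Random) -> str:
    """Inject a simple calculation slip by shifting the right-hand side."""
    lhs, rhs = equation.rsplit("=", 1)
    value = float(rhs.strip())
    wrong = value + rng.choice([-2, -1, 1, 2])  # small offset mimicking a human slip
    wrong_str = str(int(wrong)) if wrong.is_integer() else str(wrong)
    return f"{lhs.strip()} = {wrong_str}"

print(perturb_equation("48 / 2 = 24", random.Random(0)))  # e.g. "48 / 2 = 26"
```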
To generate the processed testing data, run the following command:
python test_data_generation/main.py --dataset_names gsm8k math mathqa metamath --api_config_file_path api_key/config.json --sample_size 100 --temperature 0.7 --top_p 0.7 --top_k 50 --number_of_outputs 1
(The sample size is 100 per dataset, so the total number of questions is 100 * 4 datasets = 400.)
The processed data is saved in pcd_data/mix/test_400_perturbed.jsonl.
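A quick way to sanity-check the output is to load the JSONL file and count the records. The sketch below assumes nothing beyond the path given above; the exact schema is defined by the generation script.

```python
# Minimal sketch for inspecting the processed test set.
import json

path = "pcd_data/mix/test_400_perturbed.jsonl"
with open(path) as f:
    records = [json.loads(line) for line in f]

print(len(records))               # expected: 400 (100 questions x 4 datasets)
print(sorted(records[0].keys()))  # inspect the available fields
```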
First, we analyze the impact of misinformation on final answers and reasoning behaviors of LLMs, including:
- Default behaviors under misinformation (Section 5.1):
- We analyze LLMs' default behavior under misinformation, and we also explicitly instruct the models to follow the misinformation.
- By default, LLMs treat misinformation as instructions and follow them, leading to incorrect answers.
- Instructing LLMs to correct misinformation (Section 5.2):
- We explicitly instruct LLMs to correct misinformation.
- LLMs still struggle to correct misinformation, even with explicit instructions.
Then, we explore how to mitigate misinformation via correction, including:
- Factors of effective correction (Section 6.1):
- We analyze the factors that affect the effectiveness of correction, including correction behaviors and positions of reasoning steps.
- We find that early factual corrections are effective in mitigating misinformation propagation.
- Fine-tuning with effective correction (Section 6.2):
- We fine-tune LLMs with effective correction behaviors to mitigate misinformation propagation.
- We find that fine-tuning with such correction behaviors significantly mitigates misinformation propagation.
PART I. Generate Reasoning Steps in Original and Misinformed Settings (Section 5)
Performance is evaluated under two conditions, without misinformation (original) and with misinformation (misinformed), to assess whether strong reasoning models can be misled. In each condition, we either provide no explicit instructions or instruct the models to follow or to correct the misinformation.
To collect models' reasoning results, run the following command:
python run_experiments.py --model_name {model_name} --dataset_name mix --sample_size 400 --temperature 0.7 --top_p 0.7 --top_k 50 --number_of_outputs 5 --api_config_file_path api_key/config.json
The {model_name} can be Llama-3.2-1B-Instruct, Llama-3.2-3B-Instruct, Llama-3.2-11B-Vision-Instruct, Llama-3.2-90B-Vision-Instruct, Mixtral-8x7B-Instruct-v0.1, Mixtral-8x22B-Instruct-v0.1, Qwen2-72B-Instruct, or gpt-4o-mini. The prediction includes prfx_q (original performance, without misinformation), prfx_q_prfx_pert (misinformed performance, with misinformation), and prfx_q_prfx_pert_both (misinformed performance with explicit correction instruction). Results are saved in pcd_data/mix/test_400_perturbed_premise_{model_name}.jsonl.
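To run all of the models listed above in sequence, a small driver like the following can be used. It is a convenience sketch, not part of the repository; it simply shells out to run_experiments.py with the flags documented here.

```python
# Convenience sketch: run run_experiments.py for every model listed above,
# using only the flags documented in this README.
import subprocess

MODEL_NAMES = [
    "Llama-3.2-1B-Instruct",
    "Llama-3.2-3B-Instruct",
    "Llama-3.2-11B-Vision-Instruct",
    "Llama-3.2-90B-Vision-Instruct",
    "Mixtral-8x7B-Instruct-v0.1",
    "Mixtral-8x22B-Instruct-v0.1",
    "Qwen2-72B-Instruct",
    "gpt-4o-mini",
]

for model_name in MODEL_NAMES:
    subprocess.run(
        [
            "python", "run_experiments.py",
            "--model_name", model_name,
            "--dataset_name", "mix",
            "--sample_size", "400",
            "--temperature", "0.7",
            "--top_p", "0.7",
            "--top_k", "50",
            "--number_of_outputs", "5",
            "--api_config_file_path", "api_key/config.json",
        ],
        check=True,
    )
```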
PART II. Generate Reasoning Steps in the Controlled Analysis (Section 6.1)
To perform the controlled analysis, run the following command:
python run_experiments_intervention.py --model_name {model_name} --dataset_name mix --sample_size 400 --temperature 0.7 --top_p 0.7 --top_k 50 --number_of_outputs 5 --api_config_file_path api_key/config.json
The {model_name} can be Llama-3.2-1B-Instruct, Llama-3.2-3B-Instruct, or Llama-3.2-11B-Vision-Instruct. The prediction includes prfx_q, prfx_q_prfx_pert, and prfx_q_prfx_pert_both, the same as in run_experiments.py, plus prfx_q_prfx_pert_pos (misinformed performance with factual corrections at different positions), prfx_q_prfx_pert_corr_bad (misinformed performance with non-factual corrections), and prfx_q_prfx_pert_point_out_only_bad (misinformed performance with no corrections). Results are saved in pcd_data/mix/test_400_perturbed_premise_{model_name}_correction.jsonl.
PART III. Evaluate Reasoning Steps with Verifiers
We implement several automatic verifiers to analyze the reasoning behaviors of different models. To evaluate the correction and misinformation-following behaviors, run the following command:
python scripts/run_evaluation.py --model_names "meta-llama/Llama-3.2-90B-Vision-Instruct-Turbo" "Qwen/Qwen2-72B-Instruct" "mistralai/Mixtral-8x7B-Instruct-v0.1" "mistralai/Mixtral-8x22B-Instruct-v0.1" "gpt-4o-mini" "meta-llama/Llama-3.2-1B-Instruct" "meta-llama/Llama-3.2-3B-Instruct" "meta-llama/Llama-3.2-11B-Vision-Instruct" --output_path "./exp_results/eval/test_400_perturbed_premise_evaluated.pkl"
To evaluate the correction behaviors for the small model (Llama-3.2-1B) only (Appendix G), run the following command:
python scripts/run_evaluation.py --model_names "meta-llama/Llama-3.2-1B-Instruct" --output_path "./exp_results/eval/test_400_perturbed_premise_evaluated_1b.pkl"
We also ask three annotators to manually evaluate the correction behaviors to validate the effectiveness of the automatic verifiers. Follow demonstration/annotation.ipynb; the gathered results are in exp_results/eval/test_400_perturbed_premise_evaluated_annotated_final.pkl.
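The verifier outputs are pickle files. A minimal way to inspect them is sketched below, assuming (this is not guaranteed by the repository) that the stored object is pandas-readable.

```python
# Minimal sketch for inspecting the verifier output; the exact object type
# stored in the .pkl is not documented here, so check what comes back first.
import pandas as pd

eval_results = pd.read_pickle("./exp_results/eval/test_400_perturbed_premise_evaluated.pkl")
print(type(eval_results))
if isinstance(eval_results, pd.DataFrame):
    print(eval_results.columns.tolist())
    print(eval_results.head())
```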
PART IV. Visualize the Results
First precompute the bootstrap results:
python scripts/precompute_bootstrap_results.py
Results are in exp_results/.
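For background, precomputing bootstrap statistics for an accuracy metric typically amounts to a percentile bootstrap over per-question correctness, roughly as sketched below; the repository's scripts/precompute_bootstrap_results.py may differ in its details.

```python
# Illustrative percentile-bootstrap sketch for an accuracy confidence interval;
# the repository's own implementation may differ.
import numpy as np

def bootstrap_accuracy_ci(correct_flags, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for accuracy over 0/1 per-question correctness flags."""
    rng = np.random.default_rng(seed)
    flags = np.asarray(correct_flags)
    resample_means = np.array([
        rng.choice(flags, size=flags.size, replace=True).mean()
        for _ in range(n_resamples)
    ])
    low, high = np.percentile(resample_means, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return flags.mean(), (low, high)

acc, (low, high) = bootstrap_accuracy_ci([1, 0, 1, 1, 0, 1, 1, 0])
print(f"accuracy={acc:.2f}, 95% CI=({low:.2f}, {high:.2f})")
```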
Then follow demonstration/result_visualization.ipynb to plot figures, which are saved in figures/.
PART I. Fine-tune the Model
To evaluate model performance corresponding to Section 6.2, first run the Fine-tuning part of demonstration/finetune.ipynb. This fine-tunes a GPT-4o-mini model and returns a model id; save the id under the gpt-4o-mini-sft entry in api_key/config.json.
PART II. Generate Reasoning Steps in the Fine-tuning Analysis
With the saved id, run the following command:
python run_experiments_finetune.py --model_name gpt-4o-mini-sft --dataset_name mix --sample_size 400 --temperature 0.7 --top_p 0.7 --top_k 50 --number_of_outputs 5 --api_config_file_path api_key/config.json
The prediction includes base_original (original performance), base_misinformed (misinformed performance), inst_original (original performance with explicit instructions), inst_misinformed (misinformed performance with explicit instructions), ft_original (original performance with fine-tuning), ft_misinformed (misinformed performance with fine-tuning), inst_ft_original (original performance with both explicit instructions and fine-tuning), and inst_ft_misinformed (misinformed performance with both explicit instructions and fine-tuning). Results are saved in pcd_data/mix/test_400_perturbed_premise_{model_name}.jsonl. Note that the correction-specific training data is saved in pcd_data/finetune/correction_training_set.jsonl. We will release the data collection code in the future.
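For reference, the fine-tuned model is addressed through the OpenAI API via the model id stored under gpt-4o-mini-sft in api_key/config.json. The sketch below shows what such a call looks like in isolation; it is not the repository's actual prompting code, and the example question is made up.

```python
# Minimal sketch of calling the fine-tuned model by the id stored in the config;
# run_experiments_finetune.py handles prompting and sampling itself.
import json
from openai import OpenAI

with open("api_key/config.json") as f:
    config = json.load(f)

client = OpenAI(api_key=config["openai_api_key"])
response = client.chat.completions.create(
    model=config["gpt-4o-mini-sft"],  # fine-tuned model id saved in PART I above
    messages=[{"role": "user", "content": "Solve: 48 / 2 = ?"}],  # made-up example prompt
)
print(response.choices[0].message.content)
```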
PART III. Visualize the Results
Follow the Plot Sankey Graph part in demonstration/finetune.ipynb.
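If you want to prototype the figure outside the notebook, a toy plotly Sankey with made-up node labels and flow counts looks like the sketch below; the actual graph is built in demonstration/finetune.ipynb from the experiment results.

```python
# Toy Sankey sketch with hypothetical labels and values, purely for illustration.
import plotly.graph_objects as go

labels = ["misinformed input", "followed", "corrected", "wrong answer", "correct answer"]
fig = go.Figure(go.Sankey(
    node=dict(label=labels, pad=20, thickness=16),
    link=dict(
        source=[0, 0, 1, 1, 2, 2],      # indices into `labels`
        target=[1, 2, 3, 4, 3, 4],
        value=[60, 40, 50, 10, 5, 35],  # hypothetical counts
    ),
))
fig.write_html("figures/sankey_sketch.html")
```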
If you find our work useful in your research, please consider citing:
@inproceedings{feng-etal-2025-unraveling,
title = "Unraveling Misinformation Propagation in {LLM} Reasoning",
author = "Feng, Yiyang and
Wang, Yichen and
Cui, Shaobo and
Faltings, Boi and
Lee, Mina and
Zhou, Jiawei",
editor = "Christodoulopoulos, Christos and
Chakraborty, Tanmoy and
Rose, Carolyn and
Peng, Violet",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2025",
month = nov,
year = "2025",
address = "Suzhou, China",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.findings-emnlp.627/",
pages = "11683--11707",
ISBN = "979-8-89176-335-7",
abstract = "Large Language Models (LLMs) have demonstrated impressive capabilities in reasoning, positioning them as promising tools for supporting human problem-solving. However, what happens when their performance is affected by *misinformation*, i.e., incorrect inputs introduced by users due to oversights or gaps in knowledge? Such misinformation is prevalent in real-world interactions with LLMs, yet how it propagates within LLMs' reasoning process remains underexplored. Focusing on mathematical reasoning, we present a comprehensive analysis of how misinformation affects intermediate reasoning steps and final answers. We also examine how effectively LLMs can correct misinformation when explicitly instructed to do so. Even with explicit instructions, LLMs succeed less than half the time in rectifyingmisinformation, despite possessing correct internal knowledge, leading to significant accuracy drops (10.02{\%} {--} 72.20{\%}), and the degradation holds with thinking models (4.30{\%} {--} 19.97{\%}). Further analysis shows that applying factual corrections early in the reasoning process most effectively reduces misinformation propagation, and fine-tuning on synthesized data with early-stage corrections significantly improves reasoning factuality. Our work offers a practical approach to mitigating misinformation propagation."
}