⭐️ We are honored that Re2 has been added to optillm, a repository for optimizing LLM inference.

⭐️ This is the official code for the EMNLP 2024 paper "Re-Reading Improves Reasoning in Large Language Models".

Idea: simply re-reading the question in the prompt gives the model a "bidirectional" understanding of the input, which improves reasoning.
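As a minimal sketch of the idea (the exact prompt template lives in this repo; the wording below is an assumption, not necessarily what `main.py` uses):

```python
def re2_prompt(question: str, read_times: int = 2) -> str:
    """Build a Re2-style prompt that presents the question `read_times` times."""
    parts = [f"Q: {question}"]
    # Each extra pass re-reads the question, so later reasoning tokens can
    # attend back to a second copy of the input.
    for _ in range(read_times - 1):
        parts.append(f"Read the question again: {question}")
    parts.append("A: Let's think step by step.")
    return "\n".join(parts)

print(re2_prompt("Tom has 3 apples and buys 2 more. How many does he have now?"))
```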
Install the dependencies:

```bash
pip install -r requirements.txt
```

Set your OpenAI key:

```bash
export OPENAI_API_KEY=your_openai_key
```
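To confirm the key is visible to Python before launching experiments, a quick sanity check (not part of this repo) is:

```python
import os

# Fail fast if the key was not exported in the current shell.
assert os.environ.get("OPENAI_API_KEY"), "OPENAI_API_KEY is not set"
```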
Run `gpt-3.5-turbo-0613` on a single dataset to test the code:

```bash
sh run_single_dataset.sh
```

Run `gpt-3.5-turbo-0613` on all datasets:

```bash
sh run.sh
```
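If you prefer to drive the sweep from Python rather than the shell script, a rough equivalent that loops `main.py` over datasets (the dataset subset here is illustrative; see the full list below) would be:

```python
import subprocess

# Illustrative subset of the supported --dataset values.
DATASETS = ["gsm", "svamp", "multiarith"]

for ds in DATASETS:
    # Mirrors the flags used throughout this README.
    subprocess.run(
        ["python", "main.py",
         "--dataset", ds,
         "--temperature", "0.0",
         "--acts", "vanilla", "cot",
         "--model", "gpt-3.5-turbo-0613",
         "--read_times", "2"],
        check=True,
    )
```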
Adding the `--multithread` flag runs requests in parallel, which speeds up the process considerably:

```bash
python main.py --dataset gsm --temperature 0.0 --acts vanilla cot --model gpt-3.5-turbo-0613 --read_times 2 --multithread
```
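A minimal sketch of the kind of parallelism `--multithread` enables (illustrative only; `query_model` and the examples are hypothetical):

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical worker: one API call per dataset example.
def query_model(example: str) -> str:
    # In the repo this would call the OpenAI API; here we just echo.
    return f"answer for: {example}"

examples = ["question 1", "question 2", "question 3"]

# API calls are I/O-bound, so a thread pool gives a large wall-clock speedup;
# --num_threads plays the role of max_workers here.
with ThreadPoolExecutor(max_workers=40) as pool:
    results = list(pool.map(query_model, examples))
```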
Run self-consistency experiments (sample at temperature 0.7 and take a majority vote over 10 samples):

```bash
python main.py --dataset gsm --temperature 0.7 --multithread --acts vanilla --num_threads 40 --majority_at 10
python main.py --dataset gsm --temperature 0.7 --multithread --acts vanilla --num_threads 40 --majority_at 10 --read_times 2
```
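For context, the majority vote behind `--majority_at` amounts to the following (a toy sketch, not the repo's exact answer-extraction logic):

```python
from collections import Counter

# Sample several answers at temperature > 0, then keep the most frequent one.
def majority_vote(answers: list[str]) -> str:
    return Counter(answers).most_common(1)[0][0]

samples = ["42", "42", "41", "42", "40"]  # e.g. 5 sampled completions
print(majority_vote(samples))  # -> "42"
```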
Run Re2 on the PAL and Plan-and-Solve (`ps`) prompting methods:

```bash
python main.py --dataset gsm --temperature 0.0 --multithread --acts ps pal --num_threads 40
python main.py --dataset gsm --temperature 0.0 --verbose --debug --acts ps pal --num_threads 40 --read_times 2
```
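For context, PAL has the model emit an executable Python program instead of free-form reasoning, and the program's output is taken as the answer. A toy illustration (the generated program below is made up):

```python
# The string stands in for model output; the repo's PAL pipeline would
# receive something like this from the LLM.
generated = '''
def solution():
    apples = 23
    eaten = 5
    return apples - eaten
'''

namespace = {}
exec(generated, namespace)      # run the model-generated program
print(namespace["solution"]())  # -> 18
```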
Evaluation runs automatically after generation, but you can also run it manually:

```bash
python eval.py --dataset dataset_name --acts vanilla cot --eval_file generated_file
```

For example, to evaluate a generated file on the GSM dataset:

```bash
python eval.py --dataset gsm --acts vanilla cot --eval_file results/gsm_gpt-4o-mini-2024-07-18_topp1.0_temp0.0_majority1_readtimes1_20240926_221724.jsonl
```
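A rough sketch of the kind of accuracy computation such an evaluation performs over a `.jsonl` results file (the field names `prediction` and `answer` are assumptions, not necessarily the keys `eval.py` uses):

```python
import json

def accuracy(path: str) -> float:
    """Exact-match accuracy over a .jsonl file of generation records."""
    correct = total = 0
    with open(path) as f:
        for line in f:
            record = json.loads(line)
            correct += record["prediction"] == record["answer"]
            total += 1
    return correct / total

# Usage: accuracy("results/your_generated_file.jsonl")
```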
You can run the code on other datasets by changing the `--dataset` parameter. Supported values are:

- `gsm`: GSM dataset
- `svamp`: SVAMP dataset
- `asdiv`: ASDiv dataset
- `aqua`: AQuA dataset
- `multiarith`: MultiArith dataset
- `mawpssingleeq`: MAWPS Single Equation dataset
- `mawpsaddsub`: MAWPS AddSub dataset
- `commonsenseqa`: CommonsenseQA dataset
- `strategyqa`: StrategyQA dataset
- `arc_easy`: ARC Easy dataset
- `arc_challenge`: ARC Challenge dataset
- `date_understanding`: Date Understanding dataset
- `coin_flip`: Coin Flip dataset
Sample result logs are provided in the `results` folder.
We also ran experiments on GPT-4o-mini: change the `--model` parameter to `gpt-4o-mini-2024-07-18` to reproduce them.
If you find this work helpful, please consider citing our paper:

```bibtex
@inproceedings{xu-etal-2024-reading,
title = "Re-Reading Improves Reasoning in Large Language Models",
author = "Xu, Xiaohan and
Tao, Chongyang and
Shen, Tao and
Xu, Can and
Xu, Hongbo and
Long, Guodong and
Lou, Jian-Guang and
Ma, Shuai",
editor = "Al-Onaizan, Yaser and
Bansal, Mohit and
Chen, Yun-Nung",
booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing",
month = nov,
year = "2024",
address = "Miami, Florida, USA",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.emnlp-main.871",
pages = "15549--15575"
}
```
Our implementation builds on the code of PAL; thanks to its authors for their contributions.
- Modification: Implemented the Re2 method based on the PAL codebase.
- Date: March 2024