NoiseFilter_IB

This repository provides the code for our ACL 2023 paper "An Information Bottleneck Perspective for Effective Noise Filtering on Retrieval-Augmented Generation". If you have any questions, please contact me at [email protected].

Data Construction

Our approach does not aim to find the optimal solution at the initial stage; instead, it gradually approaches the optimal filter model through iterative training. We therefore choose the two simplest methods, exact search and greedy search, which greatly reduce the computational cost of constructing the training data.

Exact search aims to find the paragraphs or sentences that contain the ground-truth answers. Greedy search is one of the most widely used heuristics for building oracle labels in extractive summarization: it extracts the sentences with the highest ROUGE score against the human-annotated abstract, adding sentences one at a time for as long as each addition improves the score.
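A minimal sketch of exact search, assuming simple lowercase/punctuation normalization and a naive regex sentence splitter. We further assume that the `exact` and `exact_para` oracle modes correspond to sentence-level and paragraph-level matching, as their names suggest; the README does not state this, and the function names below are hypothetical rather than taken from `pre_cands.py`.

```python
import re


def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def normalize(text):
    # Simple normalization (assumption): lowercase, drop punctuation, collapse whitespace.
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def contains_answer(text, answers):
    # Whole-token match: pad with spaces so "paris" does not match inside "parisian".
    padded = f" {normalize(text)} "
    return any(f" {normalize(a)} " in padded for a in answers if normalize(a))


def exact_search(paragraphs, answers, level="sentence"):
    # Keep every paragraph (level="paragraph") or sentence that contains a gold answer.
    if level == "paragraph":
        return [p for p in paragraphs if contains_answer(p, answers)]
    return [s for p in paragraphs for s in split_sentences(p) if contains_answer(s, answers)]
```

For example, with the retrieved passages `["Paris is the capital of France. It lies on the Seine.", "Berlin is in Germany."]` and the answer `"Paris"`, sentence-level search keeps only `"Paris is the capital of France."`, while paragraph-level search keeps the whole first passage.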

We consider two silver summaries: one concatenates the query and the answer, and the other consists of the answer alone. The former covers more information, while the latter stays focused on the answer. For multi-hop questions, we additionally incorporate the supporting facts, which serve as intermediate-state answers.
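Greedy search against a silver summary can be sketched as follows. This is an illustration, not the code in `pre_cands.py`: it scores a selection by the sum of ROUGE-1 and ROUGE-2 F1 (a common choice for greedy oracles, assumed here), caps the selection at a hypothetical `max_sents`, appends supporting facts to the reference for multi-hop questions, and assumes that `greedy` uses the query-plus-answer summary while `greedy_ans` uses the answer-only one. All function names are hypothetical.

```python
import re
from collections import Counter


def tokens(text):
    return re.findall(r"\w+", text.lower())


def ngrams(toks, n):
    return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))


def rouge_n_f1(cand_toks, ref_toks, n):
    # ROUGE-N F1 from clipped n-gram overlap counts.
    cand, ref = ngrams(cand_toks, n), ngrams(ref_toks, n)
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    p = overlap / sum(cand.values())
    r = overlap / sum(ref.values())
    return 2 * p * r / (p + r)


def build_reference(query, answer, supporting_facts=None, with_query=True):
    # Silver summary: query + answer, or the answer alone; supporting facts
    # (intermediate answers) are appended for multi-hop questions (assumption).
    parts = [query, answer] if with_query else [answer]
    if supporting_facts:
        parts.extend(supporting_facts)
    return " ".join(parts)


def greedy_search(sentences, reference, max_sents=3):
    ref = tokens(reference)
    sent_toks = [tokens(s) for s in sentences]
    selected, best = [], 0.0
    while len(selected) < max_sents:
        best_idx = None
        for i in range(len(sentences)):
            if i in selected:
                continue
            # Score the selection in document order with candidate i added.
            cand = [t for j in sorted(selected + [i]) for t in sent_toks[j]]
            score = rouge_n_f1(cand, ref, 1) + rouge_n_f1(cand, ref, 2)
            if score > best:
                best, best_idx = score, i
        if best_idx is None:  # no remaining sentence improves the score
            break
        selected.append(best_idx)
    return [sentences[i] for i in sorted(selected)]
```

Greedy selection is not guaranteed to find the highest-scoring subset, but for n sentences and k selected ones it needs only O(k·n) ROUGE evaluations instead of enumerating every subset, which is what keeps training-data construction cheap.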

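# Set the dataset and scoring parameters first: the loop below uses ${dataset_name}.
dataset_name=nq_dev
batch_size=2
max_example=5

# Step 1: build compressed candidates from the source data with each oracle mode
# (exact search and greedy search variants).
for oracle_mode in exact exact_para greedy_ans greedy
do
mkdir -p ./data/compressed/${oracle_mode}
python pre_cands.py \
--oracle_mode ${oracle_mode} \
--source_path ./data/source/${dataset_name}.jsonl \
--compressed_path ./data/compressed/${oracle_mode}/${dataset_name}.jsonl
done

# Step 2: score the candidates with the model at --model_path (Llama-2-13B-chat here)
# and save the results to ./data/combine/${dataset_name}_loss.jsonl.
mkdir -p ./data/combine
python cal_ib.py \
--source_path ./data/source/ \
--compressed_path ./data/compressed/ \
--save_path ./data/combine/ \
--data_name ${dataset_name}.jsonl \
--save_name ${dataset_name}_loss.jsonl \
--model_path ./models/llama2_13b_chat_hf \
--batch_size ${batch_size} \
--max_example ${max_example}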
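Since `cal_ib.py` is pointed at the `./data/compressed/` root rather than at a single file, we assume it expects one candidate file per oracle mode there, in the layout written by the `pre_cands.py` loop. A small helper (hypothetical, not part of the repository) can confirm that all four files exist before starting the comparatively expensive 13B-model step:

```python
from pathlib import Path

# Oracle modes used by the pre_cands.py loop in the commands above.
ORACLE_MODES = ["exact", "exact_para", "greedy_ans", "greedy"]


def compressed_paths(dataset_name, root="./data/compressed"):
    # Expected layout: <root>/<oracle_mode>/<dataset_name>.jsonl
    return {m: Path(root) / m / f"{dataset_name}.jsonl" for m in ORACLE_MODES}


def missing_candidates(dataset_name, root="./data/compressed"):
    # Return the oracle modes whose candidate file has not been written yet.
    return [m for m, p in compressed_paths(dataset_name, root).items() if not p.is_file()]
```

For example, calling `missing_candidates("nq_dev")` from the repository root returns the modes that still need a `pre_cands.py` run, or an empty list when everything is in place.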

zhukun1020/NoiseFilter_IB

Folders and files

Latest commit

History

Repository files navigation

NoiseFilter_IB

Data Construction

Über uns

Ressourcen

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages