Improving Self-supervised Pre-training using Accent-Specific Codebooks

Interspeech 2024

About The Repository
Getting Started
- Prerequisites and Installation
- Training
Roadmap
Dataset Statistics
Contributing
Contributors
Citation
License

About The Repository

This repository hosts the artefacts for our paper, Improving Self-supervised Pre-training using Accent-Specific Codebooks, accepted to the main conference of Interspeech 2024. It extends our previous work, Accented Speech Recognition With Accent-specific Codebooks, which was accepted at the main conference of EMNLP 2023.

The main contribution of this paper 🔎 is to extend the accent adaptation technique that uses a set of learnable codebooks and a modified beam-search decoding algorithm to both self-supervised pre-training and ASR finetuning.
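As a rough illustration of what "a set of learnable codebooks" per accent can look like, the toy sketch below lets each speech frame attend to the codebook of its accent and adds the result back to the frame. This is not the code in `fairseq_code` or `espnet_code`: the scaled dot-product attention, the residual connection, the function name `attend_to_codebook`, and the 2-dimensional vectors are all assumptions made for illustration, and real codebooks would be learned during training.

```python
import math


def attend_to_codebook(frames, codebook):
    """Scaled dot-product attention from each frame to one codebook, plus a residual.

    Illustrative assumption only; not the repository's implementation.
    """
    dim = len(codebook[0])
    outputs = []
    for frame in frames:
        # Similarity between the frame and every codebook entry.
        scores = [
            sum(f * c for f, c in zip(frame, entry)) / math.sqrt(dim)
            for entry in codebook
        ]
        # Numerically stable softmax over the codebook entries.
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        # Weighted average of the codebook entries.
        context = [
            sum(w / total * entry[i] for w, entry in zip(weights, codebook))
            for i in range(dim)
        ]
        # Residual connection: frame plus accent-specific context.
        outputs.append([f + c for f, c in zip(frame, context)])
    return outputs


# Hypothetical accent labels with toy, hand-written codebooks.
codebooks = {
    "england": [[1.0, 0.0], [0.0, 1.0]],
    "us": [[0.5, 0.5]],
}
frames = [[2.0, 0.0], [0.0, 2.0]]
adapted = attend_to_codebook(frames, codebooks["us"])
```

Selecting the codebook by accent label, as in `codebooks["us"]`, is the simplest possible choice for this sketch; it says nothing about how the accent is chosen at inference time in the paper.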

Getting Started

The repository contains two code folders:

fairseq_code 📁 - Contains code to run our SSL experiments with the Fairseq toolkit. Detailed instructions on how to run our experiments can be found here.
espnet_code 📁 - Contains code to run our experiments with the ESPnet toolkit. Detailed instructions on how to run the ASR experiments can be found here.

Prerequisites and Installation

For fairseq-related installation, follow the instructions here.
For ESPnet-related installation, follow the instructions here.
Finally, clone the repository containing our code and dataset, and check out the accented-pretraining branch.

git clone https://github.com/csalt-research/accented-codebooks-asr.git

git -C accented-codebooks-asr checkout accented-pretraining

Additionally, to run the dataset creation script, run the following:

pip install -r accented-codebooks-asr/data/requirements.txt

Training

Extract the CSVs from the tar file in the data folder:

tar -xvzf accented-codebooks-asr/data/dataset.tar.gz
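If you prefer to do the extraction from Python, for example inside a data preparation script, the following minimal sketch does the same thing with the standard library and lists the CSV files it finds. The helper name `extract_csvs` and the destination directory `csvs` are assumptions for illustration; the layout inside the archive is not specified here, so the function searches recursively. Only use it on archives you trust.

```python
import tarfile
from pathlib import Path


def extract_csvs(archive_path, dest_dir):
    """Extract a .tar.gz archive into dest_dir and return the CSV files found there."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(dest)
    # Search recursively, since the archive may contain sub-folders.
    return sorted(dest.rglob("*.csv"))


if __name__ == "__main__":
    # "csvs" is a hypothetical output directory.
    for csv_file in extract_csvs("accented-codebooks-asr/data/dataset.tar.gz", "csvs"):
        print(csv_file)
```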

For experiments related to Fairseq, please refer to these instructions.
Instructions related to ESPnet training:
- Copy the files from espnet_code into ESPnet egs
```
cp -r accented-codebooks-asr/espnet_code/* <espnet_root_folder>/egs/commonvoice/asr1
```
- Enter the path to the directory hosting our splits in run.sh
```
csvdir=  # Path to the directory hosting all our csvs.
```
- Run the script
```
./run.sh
```
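Editing `run.sh` by hand is usually enough. If you want to script that step, a minimal sketch is shown below; it assumes only that `run.sh` contains a line beginning with `csvdir=`, as in the snippet above. The helper name `set_csvdir` and the example path are hypothetical.

```python
import re


def set_csvdir(script_text, csv_path):
    """Return script_text with its first `csvdir=` line pointing at csv_path."""
    pattern = re.compile(r"^csvdir=.*$", flags=re.MULTILINE)
    replacement = f'csvdir="{csv_path}"  # Path to the directory hosting all our csvs.'
    # A callable replacement avoids interpreting backslashes in the path.
    new_text, count = pattern.subn(lambda _match: replacement, script_text, count=1)
    if count == 0:
        raise ValueError("no line starting with 'csvdir=' found")
    return new_text


example = "#!/bin/bash\ncsvdir=  # Path to the directory hosting all our csvs.\nstage=0\n"
updated = set_csvdir(example, "/data/accented/csvs")  # hypothetical path
```

To apply it to the real file, read `run.sh`, pass its text through `set_csvdir`, and write the result back.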

Dataset Statistics

The statistics of train, dev and test splits used in our experiments are as follows:

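| Accent | Train 100h (hours) | Train (hours) | Dev (hours) | Test (hours) |
| --- | --- | --- | --- | --- |
| Australia | 6.95 | 45.36 | 4.33 | 0.46 |
| Canada | 6.79 | 41.13 | 1.16 | 1.21 |
| England | 19.51 | 119.9 | 3.22 | 1.65 |
| Scotland | 2.69 | 16.21 | 0.23 | 0.16 |
| US | 64.12 | 400.1 | 8.32 | 4.87 |
| Africa | - | - | - | 1.71 |
| Hongkong | - | - | - | 0.52 |
| India | - | - | - | 0.58 |
| Ireland | - | - | - | 1.94 |
| Malaysia | - | - | - | 0.39 |
| Newzealand | - | - | - | 2.11 |
| Philippines | - | - | - | 0.90 |
| Singapore | - | - | - | 0.64 |
| Wales | - | - | - | 0.27 |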

Roadmap

See the open issues for a list of proposed features (and known issues) relevant to this work. For ESPnet-related features and issues, check out their GitHub repository.

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

If you have suggestions for adding or removing projects, feel free to open an issue to discuss it, or directly create a pull request after you edit the README.md file with necessary changes.
Please open an individual PR for each suggestion.

Creating A Pull Request

Fork the Project
Create your Feature Branch (git checkout -b feature/NewFeature)
Commit your Changes (git commit -m 'Add appropriate commit message'). The correct way to write your commit message can be found here
Push to the Branch (git push origin feature/NewFeature)
Open a Pull Request

Contributors

Darshan Prabhu - M.Tech, CSE, IIT Bombay
Abhishek Kumar Gupta - M.Tech, CSE, IIT Bombay
Omkar Nitsure - B.Tech, EE, IIT Bombay
Preethi Jyothi - Associate Professor, CSE, IIT Bombay
Sriram Ganapathy - Associate Professor, EE, IISc Bangalore
Vinit Unni - Ph.D, CSE, IIT Bombay

Citation

If you use this code for your research, please consider citing our works.

@misc{prabhu2023accented,
      title={Accented Speech Recognition With Accent-specific Codebooks}, 
      author={Darshan Prabhu and Preethi Jyothi and Sriram Ganapathy and Vinit Unni},
      year={2023},
      eprint={2310.15970},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

@misc{prabhu2024improvingselfsupervisedpretrainingusing,
      title={Improving Self-supervised Pre-training using Accent-Specific Codebooks}, 
      author={Darshan Prabhu and Abhishek Gupta and Omkar Nitsure and Preethi Jyothi and Sriram Ganapathy},
      year={2024},
      eprint={2407.03734},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2407.03734}, 
}

License

Distributed under the MIT License. See LICENSE for more information.
