CSALT @ IITB

Improving Self-supervised Pre-training using Accent-Specific Codebooks

Interspeech 2024

About The Repository

This repository hosts the artefacts for our paper "Improving Self-supervised Pre-training using Accent-Specific Codebooks", accepted to the main conference of Interspeech 2024. It extends our earlier work, "Accented Speech Recognition With Accent-specific Codebooks", which was accepted to the main conference of EMNLP 2023.

The main contribution of this paper 🔎 is to extend the accent adaptation technique that uses a set of learnable codebooks and a modified beam-search decoding algorithm to both self-supervised pre-training and ASR finetuning.

Getting Started

The repository contains two folders:

  • fairseq code 📁 - Contains code to run our SSL experiments with the Fairseq toolkit. Detailed instructions on how to run our experiments can be found here.
  • espnet_code 📁 - Contains code to run our experiments with the ESPnet toolkit. Detailed instructions on how to run the ASR experiments can be found here.

Prerequisites and Installation

  • For fairseq related installation, follow the instructions here.
  • For ESPnet related installation, follow the instructions here.
  • Finally, clone the repository containing our code and dataset, and switch to the accented-pretraining branch.
git clone https://github.com/csalt-research/accented-codebooks-asr.git
git -C accented-codebooks-asr checkout accented-pretraining
  • Additionally, to run the dataset creation script, run the following:
pip install -r accented-codebooks-asr/data/requirements.txt
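As a quick sanity check after cloning, the sketch below confirms that the paths this README refers to are present in the checkout. `check_checkout` is a hypothetical helper written for illustration, not part of the repository; it only assumes the relative paths mentioned in this README (`data/requirements.txt`, `data/dataset.tar.gz`, `espnet_code`).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): report which of the
# paths named in this README are missing from a local checkout.
check_checkout() {
    local repo="${1:-accented-codebooks-asr}" missing=0 path
    for path in data/requirements.txt data/dataset.tar.gz espnet_code; do
        if [ ! -e "$repo/$path" ]; then
            echo "missing: $repo/$path" >&2
            missing=1
        fi
    done
    # Exit status 0 when every path exists, 1 otherwise.
    return "$missing"
}

# Usage (run from the directory that contains the clone):
#   check_checkout accented-codebooks-asr && echo "checkout looks complete"
```

If a path is reported missing, one possible cause is that the clone is still on the default branch rather than accented-pretraining.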

Training

  1. Extract the CSVs from the tar file in the data folder:
tar -xvzf accented-codebooks-asr/data/dataset.tar.gz
  2. For experiments related to Fairseq, please refer to these instructions.

  3. Instructions related to ESPnet training:

    • Copy the files from espnet_code into ESPnet egs
    cp -r accented-codebooks-asr/espnet_code/* <espnet_root_folder>/egs/commonvoice/asr1
    • Enter the path to the directory hosting our splits in run.sh
    csvdir=  # Path to the directory hosting all our csvs.
    • Run the script
    ./run.sh
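The csvdir step above can also be scripted instead of edited by hand. The following is a minimal sketch under stated assumptions: `set_csvdir` is a hypothetical helper, not part of this repository; it assumes run.sh contains a line beginning with `csvdir=` as shown above, and that the path contains no `|`, `&`, or backslash characters (these are special in the sed expression).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): fill in the csvdir=
# line of run.sh without opening an editor.
set_csvdir() {
    local script="$1" dir="$2" tmp
    # Stop early if the expected assignment is not in the script.
    if ! grep -q '^[[:space:]]*csvdir=' "$script"; then
        echo "no csvdir= line found in $script" >&2
        return 1
    fi
    tmp="$(mktemp)"
    # Replace everything after "csvdir=" (including the trailing comment)
    # with the quoted path. Writing back with cat keeps the file's mode,
    # so run.sh stays executable.
    sed "s|^\([[:space:]]*csvdir=\).*|\1\"$dir\"|" "$script" > "$tmp" \
        && cat "$tmp" > "$script"
    rm -f "$tmp"
}

# Usage (both paths are placeholders):
#   set_csvdir <espnet_root_folder>/egs/commonvoice/asr1/run.sh /path/to/csvs
```

Rewriting the file through a temporary copy avoids `sed -i`, whose syntax differs between GNU and BSD sed.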

Dataset Statistics

The statistics of train, dev and test splits used in our experiments are as follows:

| Accent | Train 100h (hours) | Train (hours) | Dev (hours) | Test (hours) |
|---|---|---|---|---|
| Australia | 6.95 | 45.36 | 4.33 | 0.46 |
| Canada | 6.79 | 41.13 | 1.16 | 1.21 |
| England | 19.51 | 119.9 | 3.22 | 1.65 |
| Scotland | 2.69 | 16.21 | 0.23 | 0.16 |
| US | 64.12 | 400.1 | 8.32 | 4.87 |
| Africa | - | - | - | 1.71 |
| Hongkong | - | - | - | 0.52 |
| India | - | - | - | 0.58 |
| Ireland | - | - | - | 1.94 |
| Malaysia | - | - | - | 0.39 |
| Newzealand | - | - | - | 2.11 |
| Philippines | - | - | - | 0.90 |
| Singapore | - | - | - | 0.64 |
| Wales | - | - | - | 0.27 |
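
To see the overall size of each split, the per-accent hours in the table above can be summed. The sketch below is purely illustrative: the numbers are copied by hand from the table, and `split_totals` is a hypothetical helper, not a script shipped with the repository.

```shell
#!/usr/bin/env bash
# Columns: accent, Train 100h, Train, Dev, Test (hours); "-" means no data.
stats='Australia 6.95 45.36 4.33 0.46
Canada 6.79 41.13 1.16 1.21
England 19.51 119.9 3.22 1.65
Scotland 2.69 16.21 0.23 0.16
US 64.12 400.1 8.32 4.87
Africa - - - 1.71
Hongkong - - - 0.52
India - - - 0.58
Ireland - - - 1.94
Malaysia - - - 0.39
Newzealand - - - 2.11
Philippines - - - 0.90
Singapore - - - 0.64
Wales - - - 0.27'

# Hypothetical helper: sum columns 2-5 of the rows read from stdin,
# skipping "-" entries.
split_totals() {
    awk '{ for (i = 2; i <= 5; i++) if ($i != "-") sum[i] += $i }
         END { printf "train100h=%.2f train=%.2f dev=%.2f test=%.2f\n",
                      sum[2], sum[3], sum[4], sum[5] }'
}

printf '%s\n' "$stats" | split_totals
```

This prints `train100h=100.06 train=622.70 dev=17.26 test=17.41`. Nine of the fourteen accents have no train or dev hours and appear only in the test split.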

Roadmap

See the open issues for a list of proposed features (and known issues) relevant to this work. For ESPnet related features/issues, check out their GitHub repository.

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

  • If you have suggestions for adding or removing projects, feel free to open an issue to discuss it, or directly create a pull request after you edit the README.md file with necessary changes.
  • Please open an individual PR for each suggestion.

Creating A Pull Request

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/NewFeature)
  3. Commit your Changes (git commit -m 'Add appropriate commit message'). Guidance on writing a good commit message can be found here.
  4. Push to the Branch (git push origin feature/NewFeature)
  5. Open a Pull Request

Contributors

  • Darshan Prabhu - M.Tech, CSE, IIT Bombay
  • Abhishek Kumar Gupta - M.Tech, CSE, IIT Bombay
  • Omkar Nitsure - B.Tech, EE, IIT Bombay
  • Preethi Jyothi - Associate Professor, CSE, IIT Bombay
  • Sriram Ganapathy - Associate Professor, EE, IISc Bangalore
  • Vinit Unni - Ph.D, CSE, IIT Bombay

Citation

If you use this code for your research, please consider citing our works.

@misc{prabhu2023accented,
      title={Accented Speech Recognition With Accent-specific Codebooks}, 
      author={Darshan Prabhu and Preethi Jyothi and Sriram Ganapathy and Vinit Unni},
      year={2023},
      eprint={2310.15970},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
@misc{prabhu2024improvingselfsupervisedpretrainingusing,
      title={Improving Self-supervised Pre-training using Accent-Specific Codebooks}, 
      author={Darshan Prabhu and Abhishek Gupta and Omkar Nitsure and Preethi Jyothi and Sriram Ganapathy},
      year={2024},
      eprint={2407.03734},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2407.03734}, 
}

License

Distributed under the MIT License. See LICENSE for more information.