Skip to content

SEACrowd/globalutility

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

33 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Measuring the Global Utility of Language Technologies

This is supplementary data and code to help reproduce the results of a paper of the same name. It is currently available in somewhat unpolished form, but will be fully documented in the near future.

File Structure

  • populations: contains .tsv files with language-population data and other population-relevant data

  • task_results: contains .tsv files with the aggregated results for each NLP task.

  • economic_indicators_data: contains files from WITS (under wits_en_trade_summary_allcountries_allyears) and converted to map to languages instead of countries. Important files are:

    • languages_to_gdp.tsv for monolingual mapping of languages to associated GDP estimations.
    • bilingual_indicators.tsv for bilingual mapping of languages to associated bilingual indicators (Imports, Exports) estimations. Also includes the triangulated BLEU scores for the language pair.
  • figs: contains correlation figures, created with the plot_*_correlations.py scripts.

  • area-classifier: contains data and code for a classifier of areas

Other files:

  • counterfactuals.py: computes the counterfactual scenarios presented in the paper
  • constants.py contains functions to read in all necessary data, which are used in other files to run the metrics estimations and produce the plots
  • economic_indicators.py: contains function to read in economic indicators (called by constants.py)

TO-DO

  1. Add general metric calculation script
  2. Data paths are all absolute, need to correct this

Über uns

No description, website, or topics provided.

Ressourcen

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • TeX 99.6%
  • Other 0.4%