Merged Remove files via hdfs dfs when done with mediawiki-content-dump.
repos / data-engineering / Airflow DAGs !506 · created by Xcollazo
Bumps to pick up https://gitlab.wikimedia.org/repos/data-engineering/dumps/mediawiki-content-dump/-/merge_requests/16. This fixes an issue in which `DROP TABLE ... PURGE` was not being honored. It als...
Merged Add LOCATION to intermediate table creation.
repos / data-engineering / dumps / mediawiki-content-dump !16 · created by Xcollazo
Bug: T346281
Merged Bump mediawiki-content-dump artifact to pickup *second* deduplication fix.
repos / data-engineering / Airflow DAGs !505 · created by Xcollazo
See https://gitlab.wikimedia.org/repos/data-engineering/dumps/mediawiki-content-dump/-/merge_requests/15 for details. Bug: T346281
Merged Return valid predicate when duplicate list is empty.
repos / data-engineering / dumps / mediawiki-content-dump !15 · created by Xcollazo
Fixes silly bug introduced in https://gitlab.wikimedia.org/repos/data-engineering/dumps/mediawiki-content-dump/-/merge_requests/14 where, if no duplicates were found, the query would fail with a synta...
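A minimal sketch of this kind of failure, not the project's actual code: if a filter is rendered by joining a list of duplicate `(wiki_db, revision_id)` tuples into an `IN (...)` clause, an empty list produces `IN ()`, which is not valid SQL. The helper name `duplicates_predicate` and the choice of `FALSE` as the fallback are assumptions made for illustration, and the sketch assumes the query engine accepts row-value `IN` comparisons.

```python
from typing import Iterable, Tuple


def duplicates_predicate(duplicates: Iterable[Tuple[str, int]]) -> str:
    """Render a SQL predicate matching the given (wiki_db, revision_id) tuples.

    Hypothetical helper for illustration only.
    """
    # Deduplicate and sort so the rendered SQL is deterministic.
    pairs = sorted(set(duplicates))
    if not pairs:
        # Without this guard the result would end in "IN ()", a syntax error.
        # FALSE is a valid predicate that matches no rows.
        return "FALSE"
    rendered = ", ".join(
        # Escape single quotes in the string column; force the id to an int.
        "('{}', {})".format(wiki_db.replace("'", "''"), int(revision_id))
        for wiki_db, revision_id in pairs
    )
    return f"(wiki_db, revision_id) IN ({rendered})"
```

A caller that wants to exclude the duplicates can wrap the result as `NOT (<predicate>)`; with an empty list this becomes `NOT (FALSE)`, which keeps every row and still parses.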
Merged Bump mediawiki-content-dump artifact to pickup deduplication fix.
repos / data-engineering / Airflow DAGs !503 · created by Xcollazo
See https://gitlab.wikimedia.org/repos/data-engineering/dumps/mediawiki-content-dump/-/merge_requests/14 for details. Bug: T346281
Merged Deduplicate intermediate table (wiki_db, revision_id) tuples on read.
repos / data-engineering / dumps / mediawiki-content-dump !14 · created by Xcollazo
In [T346281#9195431](https://phabricator.wikimedia.org/T346281#9195431), we figured out that the current dumps process can inadvertently publish duplicate `(wiki_db, revision_id)` tuples. Presumably t...
Merged [JS] Add MetricsClient#submitInteraction()
repos / data-engineering / Metrics Platform !5 · created by Phuedx
... and `MetricsClient#submitClick()`. Depends-On: https://gerrit.wikimedia.org/r/c/schemas/event/secondary/+/952252 Bug: T346287
Merged Use an intermediate table when backfilling wmf_dumps.wikitext_raw_rc1.
repos / data-engineering / Airflow DAGs !497 · created by Xcollazo
(Depends on https://gitlab.wikimedia.org/repos/data-engineering/dumps/mediawiki-content-dump/-/merge_requests/13) This MR replaces the 4 DAGs `dumps_merge_backfill_to_wikitext_raw_*` with a single on...
Merged Use an intermediate table when backfilling.
repos / data-engineering / dumps / mediawiki-content-dump !13 · created by Xcollazo
Bug: T346281