
Search indices image suggestion tags differ from the dataset used to update
Closed, Resolved | Public | 3 Estimated Story Points

Description

As part of T345141: No ALIS for 2023-08-14 snapshot, we triggered a manual update of the search indices, taking as input the analytics_platform_eng.image_suggestions_search_index_full/snapshot=2023-08-21 dataset.

The data in the indices doesn't seem to match the dataset. For instance, for article-level suggestions:

  • eswiki returns 79 results vs. 90045 in the dataset
  • frwiki returns 102 results vs. 123718 in the dataset

The same holds for section-level image suggestions, e.g.:

  • frwiki returns 27148 results vs. 54569 in the dataset
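
To make the size of the gap easier to see, here is a minimal sketch that tabulates the counts quoted above and computes what fraction of the dataset rows actually shows up in the index. The `Count` type and the `coverage` and `mismatches` helpers are hypothetical names introduced for illustration only. The numbers are copied by hand from this description, not fetched from the search API or the dataset.

```python
from typing import NamedTuple


class Count(NamedTuple):
    """One index-vs-dataset comparison (hypothetical helper type)."""
    wiki: str
    kind: str          # "article" or "section" level suggestions
    index_hits: int    # results returned by the search index
    dataset_rows: int  # rows in the snapshot=2023-08-21 dataset


# Figures copied by hand from the task description.
COUNTS = [
    Count("eswiki", "article", 79, 90045),
    Count("frwiki", "article", 102, 123718),
    Count("frwiki", "section", 27148, 54569),
]


def coverage(c: Count) -> float:
    """Fraction of dataset rows that are visible in the index."""
    return c.index_hits / c.dataset_rows


def mismatches(counts, tolerance=0.0):
    """Return entries whose index count deviates from the dataset count
    by more than `tolerance` (a fraction of the dataset count)."""
    return [
        c for c in counts
        if abs(c.index_hits - c.dataset_rows) > tolerance * c.dataset_rows
    ]


for c in COUNTS:
    print(f"{c.wiki:8} {c.kind:8} {c.index_hits:>7} / {c.dataset_rows:>7}"
          f" = {coverage(c):.2%}")
```

With these figures, both article-level indices hold well under 1% of the expected suggestions, while the frwiki section-level index holds roughly half.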

Details

Title | Reference | Author | Source Branch | Dest Branch
search: generalize image_suggestions_manual | repos/data-engineering/airflow-dags!485 | dcausse | image_suggestion_fixup_T345545 | main

Event Timeline

Restricted Application added a subscriber: Aklapper.
Gehel set the point value for this task to 3.
Gehel moved this task from Incoming to In Progress on the Discovery-Search (Current work) board.

Mentioned in SAL (#wikimedia-operations) [2023-09-05T17:47:15Z] <dcausse@deploy1002> Started deploy [airflow-dags/search@b3d43bb]: T345545: search: generalize image_suggestions_manual

Mentioned in SAL (#wikimedia-operations) [2023-09-05T17:47:42Z] <dcausse@deploy1002> Finished deploy [airflow-dags/search@b3d43bb]: T345545: search: generalize image_suggestions_manual (duration: 00m 26s)

Mentioned in SAL (#wikimedia-operations) [2023-09-05T17:57:24Z] <dcausse> T345545: triggered a manual dag run to import analytics_platform_eng.image_suggestions_search_index_full/snapshot=2023-08-21

Checked the search queries mentioned in the task description, and they now seem to return the expected number of results. Moving this to our board's "Needs Reporting" column; please consider the parent task unblocked.