
No ALIS for 2023-08-14 snapshot
Closed, Resolved, Public

Description

The last image suggestions pipeline run resulted in no ALIS data.

Context

  • The 2023-07 snapshot of wmf_raw.mediawiki_revision (ALIS's upstream dependency) is missing, see thread 1 and thread 2
  • 2023-07_backup snapshot available
  • 2 pipeline runs, i.e., 2023-07-31 and 2023-08-07, timed out waiting for the 2023-07 snapshot and were skipped
  • forced the execution of 2023-08-14 with the 2023-07_backup snapshot
  • the pipeline succeeded
  • no ALIS data

Tasks

  • rename all 2023-08-14 partitions to no_alis
  • set the previous_weekly_snapshot DAG property to no_alis
  • let the wmf_raw.mediawiki_revision sensor point to the 2023-06 snapshot
  • clear the last execution
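As a rough sketch of the partition rename step above, assuming the pipeline output lives in Hive tables partitioned by a column named snapshot (the task does not confirm this), the statements could be generated as follows. The table names and the rename_partition_statements helper are hypothetical and only serve the illustration.

```python
def rename_partition_statements(tables, old_snapshot, new_snapshot,
                                partition_column="snapshot"):
    """Build one HiveQL partition rename statement per table.

    Assumption: every table is partitioned by `partition_column`.
    """
    return [
        f"ALTER TABLE {table} PARTITION ({partition_column}='{old_snapshot}') "
        f"RENAME TO PARTITION ({partition_column}='{new_snapshot}');"
        for table in tables
    ]


# Hypothetical table names, not taken from the task.
tables = ["example_db.suggestions", "example_db.search_index_delta"]

for statement in rename_partition_statements(tables, "2023-08-14", "no_alis"):
    print(statement)
```

The generated statements would still have to be reviewed and run by hand against the actual tables.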

Still no ALIS.

  • reproduce the same execution in an Airflow test instance
  • add a breakpoint where the wmf_raw.mediawiki_revision is read
  • debug the execution

Outcome

  • The upstream dependency issue doesn’t seem to be the cause
    • no ALIS even with the older snapshot
    • a debug session showed that the data joined with that dependency is non-empty
  • manually ran the single ALIS task and suggestions are there!

Recovery plan

  • recompute a proper full index by manually running the pipeline in the Airflow test instance
  • copy the output to the production DB
  • let 2023-08-21's production run compute the proper delta

NOTE: the reason why the production Airflow instance behaved differently from the test one is yet to be investigated.

Event Timeline

mfossati changed the task status from Open to In Progress. Aug 29 2023, 10:07 AM
mfossati claimed this task.
mfossati moved this task from Incoming to Doing on the Structured-Data-Backlog (Current Work) board.

Latest search indices update running now.

ruwiki still has only 68 recommendations

@mfossati could you update the recovery plan to reflect what is currently happening? As far as I understand, the "let 2023-08-21's production run compute the proper delta" step did not run as expected, since you are requesting to import the full dataset in T345545. Could you also elaborate on what went wrong that could explain why the delta is not correct?

as far as I understand the let 2023-08-21's production run compute the proper delta step did not run as expected

That run did generate the expected delta, but see below for what went wrong.

since you are requesting to import the full dataset in T345545, could you also elaborate on what went wrong that could explain why the delta is not correct?

Search’s index update DAG sensor succeeded as soon as that delta was available, leading to a missing delta between the no-ALIS state and that one.
My mistake, I should have requested a manual update with the in-between delta before unpausing our DAG.
Here’s the sequence of what the search indices received:

  • no ALIS (broken 2023-08-14)
  • good 2023-08-21

As a result, the 2023-08-21 full state would cover the in-between delta:

  • the 2023-08-21 delta has already been applied, so all deletions (i.e., __DELETE_GROUPING__) have already happened
  • no such rows appear in the full index
  • the subset of 108k rows that have already been added is also in the full index, but Elastic is expected to skip those identical updates
  • the remainder covers the in-between delta, which can be considered as a fix for the broken 2023-08-14 delta
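The reasoning above can be checked on a toy model. This is a minimal sketch, assuming the search index behaves like a page-to-suggestion mapping and that the no-ALIS state is an empty index. The page names, values, and the helper functions apply_updates and compute_delta are made up for the illustration and are not the pipeline's code.

```python
DELETE = "__DELETE_GROUPING__"


def compute_delta(previous, current):
    """Rows that changed or were added, plus DELETE markers for removed pages."""
    changes = {page: value for page, value in current.items()
               if previous.get(page) != value}
    changes.update({page: DELETE for page in previous if page not in current})
    return changes


def apply_updates(index, updates):
    """Return a copy of the index with the updates applied; DELETE removes a page."""
    result = dict(index)
    for page, value in updates.items():
        if value == DELETE:
            result.pop(page, None)
        else:
            result[page] = value
    return result


# Toy full states (hypothetical data).
full_0814 = {"A": "img1", "B": "img2", "C": "img3"}  # recomputed full index
full_0821 = {"A": "img1", "B": "img9", "D": "img4"}  # good 2023-08-21 full state

# The search indices went from the no-ALIS state straight to the 2023-08-21 delta.
no_alis = {}
delta_0821 = compute_delta(full_0814, full_0821)
index_after_delta = apply_updates(no_alis, delta_0821)

# Page "A" is unchanged between the two full states, so the delta never carries it.
assert index_after_delta != full_0821

# Rows of the full state that differ from the index: the in-between delta.
remainder = {page: value for page, value in full_0821.items()
             if index_after_delta.get(page) != value}

# Applying the whole full state: identical rows are no-ops, the remainder fills the gap.
index_after_full = apply_updates(index_after_delta, full_0821)
assert index_after_full == full_0821
```

In this toy case the full state carries no deletion rows, the rows already added by the delta are rewritten with identical values, and only the remainder changes the index.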