EBernhardson (EBernhardson)
User

User Details

User Since
Oct 7 2014, 4:49 PM (510 w, 5 d)
Availability
Available
LDAP User
EBernhardson
MediaWiki User
EBernhardson (WMF) [ Global Accounts ]

Recent Activity

Fri, Jul 19

EBernhardson added a comment to T355267: Add extension NetworkSession to all wmf wikis.

The code has been deployed (but not loaded) to the beta cluster since Idea47a43d9fb.

Fri, Jul 19, 7:34 PM · Patch-For-Review, Discovery-Search (Current work), Wikimedia-extension-review-queue, Wikimedia-Extension-setup

Wed, Jul 17

EBernhardson updated the task description for T346046: [Search Update Pipeline] Source streams for private wikis.
Wed, Jul 17, 8:25 PM · MW-1.43-notes (1.43.0-wmf.15; 2024-07-23), Patch-For-Review, Discovery-Search (Current work), Data-Engineering, CirrusSearch
EBernhardson added a comment to T346046: [Search Update Pipeline] Source streams for private wikis.

I've been looking over the related code and pondering what all could potentially go wrong with the practical short term solution.

Wed, Jul 17, 7:32 PM · MW-1.43-notes (1.43.0-wmf.15; 2024-07-23), Patch-For-Review, Discovery-Search (Current work), Data-Engineering, CirrusSearch
EBernhardson created T370290: Split abandonment and zero-results in the search superset dashboard.
Wed, Jul 17, 1:56 PM · Discovery-Search

Tue, Jul 16

EBernhardson claimed T346046: [Search Update Pipeline] Source streams for private wikis.
Tue, Jul 16, 8:33 PM · MW-1.43-notes (1.43.0-wmf.15; 2024-07-23), Patch-For-Review, Discovery-Search (Current work), Data-Engineering, CirrusSearch

Mon, Jul 15

EBernhardson moved T369729: RDF Updater Consumer: Allow "empty" patches from Needs review to To Be Deployed on the Discovery-Search (Current work) board.
Mon, Jul 15, 2:40 PM · Discovery-Search (Current work), Wikidata
EBernhardson moved T286814: '.event.pageViewId' should be string, '.event.subTest' should be string, '.event.searchSessionId' should be string from Needs review to To Be Deployed on the Discovery-Search (Current work) board.
Mon, Jul 15, 2:16 PM · MW-1.43-notes (1.43.0-wmf.14; 2024-07-16), Discovery-Search (Current work), Wikimedia-production-error, Data-Engineering

Fri, Jul 12

EBernhardson moved T367691: SUP: Retry 429 (rate limit) at HTTP client level from To Be Deployed to Needs Reporting on the Discovery-Search (Current work) board.

AFAICT this has been deployed. The currently deployed version contains both patches above, and our helmfile configuration sets the new http-rate-limit-per-second to 600.

Fri, Jul 12, 6:29 PM · Discovery-Search (Current work), CirrusSearch
EBernhardson moved T369495: Make `haswbstatement:` work for the EntitySchema property from Ready for Dev -- SWE to To Be Deployed on the Discovery-Search (Current work) board.
Fri, Jul 12, 6:22 PM · Wikidata Dev Team (Wikidata.org Slice), Discovery-Search (Current work), Wikidata, EntitySchema
EBernhardson moved T366589: PHP Deprecated: Implicit conversion from float 75000.00000000001 to int loses precision from To Be Deployed to Needs Reporting on the Discovery-Search (Current work) board.
Fri, Jul 12, 4:46 PM · MW-1.43-notes (1.43.0-wmf.12; 2024-07-02), Discovery-Search (Current work), affects-translatewiki.net, PHP 8.1 support, CirrusSearch
EBernhardson moved T369148: Replace usage of StatsdDataFactory with StatsFactory from Ready for Dev -- SWE to Blocked/Waiting on the Discovery-Search (Current work) board.

If I understand this correctly, it needs to be delayed by a few weeks. Metrics started getting recorded into Prometheus with this week's train deploy; we likely want to wait a couple (2?) of weeks before transitioning the graphs so that the graphs still contain the information they contain today. The metrics from StatsdDataFactory shouldn't be removed until after the transition.

Fri, Jul 12, 4:36 PM · Observability-Metrics, Discovery-Search (Current work), CirrusSearch

Mon, Jul 8

EBernhardson added a comment to T346885: BUG Partially persistent authentication on WCQS after revoking permissions.

At a technical level what's happening is:

Mon, Jul 8, 8:57 PM · Wikidata, Fiwiki-Wikidata-Commons, Wikidata-Query-Service, StructuredDataOnCommons
EBernhardson added a comment to T365831: Increased delay in indexing of new Items on Wikidata .

A few options we might consider:

Mon, Jul 8, 8:33 PM · Discovery-Search (Current work), Wikidata, Wikidata.org, CirrusSearch
EBernhardson added a comment to T368894: Cirrus search does not prioritise master pages on their subpages.

Maybe there is a way to add to the algorithm a paragraph "if you show subsubpage in the line X, but its master page exists and wasn't shown above, then show the master page in the line X+1?"

Mon, Jul 8, 2:35 PM · Discovery-Search, CirrusSearch

Jun 14 2024

EBernhardson added a comment to T367435: Determine why elastic2088 and elastic2099 did not alert when unresponsive and fix.

For the alerts, my best guess would be we are not setting the contactgroups hieradata variable.

Jun 14 2024, 9:28 PM · Data-Platform-SRE (2024.06.17 - 2024.07.07)
EBernhardson added a comment to T367435: Determine why elastic2088 and elastic2099 did not alert when unresponsive and fix.

Elastic2099 is still down, refusing to be started from DRAC. From the logs on elastic2088:

Jun 14 2024, 7:24 PM · Data-Platform-SRE (2024.06.17 - 2024.07.07)

Jun 13 2024

EBernhardson added a comment to T361950: Ensure that WDQS query throttling does not interfere with federation.

We now have a flexible way to define whether throttling should be enabled based on the presence or absence of various HTTP headers, and need to define which headers will be used. One concern with an X-Disable-Throttling header is that we need some way to ensure the header cannot be provided by arbitrary users, or we need to start getting more complicated with passing secrets around to ensure only requests with the secret token can disable throttling.
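
The header-based decision described above could be sketched roughly as follows. This is a minimal illustration, not the WDQS implementation: the function name `should_throttle` and its parameters are hypothetical, and the optional shared-secret check is only one possible way to stop arbitrary users from supplying the bypass header.

```python
import hmac
from typing import Mapping, Optional


def should_throttle(
    headers: Mapping[str, str],
    bypass_header: str,
    secret: Optional[str] = None,
) -> bool:
    """Return True when the request should be throttled.

    Hypothetical rule: throttling is skipped when `bypass_header` is
    present and, if a secret is configured, its value matches the secret.
    """
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    value = normalized.get(bypass_header.lower())
    if value is None:
        return True  # header absent: throttle as usual
    if secret is None:
        return False  # presence alone disables throttling
    # Constant-time comparison so the secret is not leaked through timing.
    return not hmac.compare_digest(value.encode(), secret.encode())
```

With no secret configured, any client can send the header, which is the concern raised in the comment; passing a `secret` narrows the bypass to callers that know the token.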

Jun 13 2024, 6:20 PM · wmde-wikidata-tech, Discovery-Search (Current work), Wikidata

Jun 12 2024

EBernhardson created P64743 (An Untitled Masterwork).
Jun 12 2024, 7:06 PM

Jun 11 2024

EBernhardson claimed T361950: Ensure that WDQS query throttling does not interfere with federation.
Jun 11 2024, 11:50 PM · wmde-wikidata-tech, Discovery-Search (Current work), Wikidata
EBernhardson added a comment to T361950: Ensure that WDQS query throttling does not interfere with federation.

The second option, making throttling conditional on X-BIGDATA-READ-ONLY makes sense to me. It's perhaps a little awkward to make generic and document, but shouldn't be too bad.

Jun 11 2024, 8:20 PM · wmde-wikidata-tech, Discovery-Search (Current work), Wikidata
EBernhardson added a comment to T259883: Analytics about usage of search - Updated data for dashboard.

The derived dataset the fulltext abandonment comes from is discovery.search_satisfaction_metrics. This should be readable by anyone in the privatedata group. The related code definition for fulltext abandonment is found in search_satisfaction_metrics.py. Essentially the recorded events indicate on a per-session basis how many result pages they saw, and how many pages they visited. If they saw a search result page and visited no pages we considered that abandonment.
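
As a rough illustration of the abandonment definition above, the per-session rule might look like the following. This is a sketch only, not the code in search_satisfaction_metrics.py; the `SessionSummary` type and its field names are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class SessionSummary:
    """Hypothetical per-session rollup; field names are assumptions."""
    session_id: str
    result_pages_seen: int  # search result pages shown in the session
    pages_visited: int      # pages visited from those results


def is_abandoned(session: SessionSummary) -> bool:
    # Abandonment: saw at least one result page and visited no pages.
    return session.result_pages_seen > 0 and session.pages_visited == 0


def abandonment_rate(sessions: Iterable[SessionSummary]) -> float:
    # Only sessions that actually saw a result page count in the denominator.
    searched = [s for s in sessions if s.result_pages_seen > 0]
    if not searched:
        return 0.0
    return sum(is_abandoned(s) for s in searched) / len(searched)
```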

Jun 11 2024, 7:46 PM · Wikipedia-Android-App-Backlog (Android Release - FY2023-24), Discovery-Search (Current work)
EBernhardson moved T366363: Use specific user agent in mediawiki api requests from Needs review to Needs Reporting on the Discovery-Search (Current work) board.
Jun 11 2024, 7:28 PM · serviceops-radar, Patch-For-Review, Discovery-Search (Current work)
EBernhardson added a comment to T366363: Use specific user agent in mediawiki api requests.

Verified on the Wikikube Apache 2 accesslog dashboard that our requests have transitioned from the apache user agent to an application-specific user agent. For the moment the user agents are as follows. These may change in the near future, as the user agent will also be used for rate limiting and we have to decide the appropriate granularity.

Jun 11 2024, 7:27 PM · serviceops-radar, Patch-For-Review, Discovery-Search (Current work)

Jun 5 2024

EBernhardson added a comment to T365044: SUP: fail over unknown config properties.

One awkward point: I kinda abused SUP not failing on unknown config properties to put the list of indexes being backfilled into the chart. It's used so that if the orchestration crashes it can fetch the configmap on startup and understand the backfill from only the deployed charts.

Jun 5 2024, 5:36 PM · Discovery-Search, CirrusSearch

Jun 4 2024

EBernhardson moved T253642: Offer "linkedfrom:" in Advanced Search (inverted "linksto:") from Needs review to Blocked/Waiting on the Discovery-Search (Current work) board.
Jun 4 2024, 8:23 PM · Discovery-Search, User-notice, CirrusSearch, Patch-For-Review, Advanced-Search
EBernhardson claimed T363734: Reindex all wikis to enable dotted I fix, Yiddish ligatures, maybe Arabic normalization.
Jun 4 2024, 7:32 PM · Discovery-Search (Current work)
EBernhardson moved T72899: Search box needs some normalization for Arabic Family languages from To Be Deployed to Needs Reporting on the Discovery-Search (Current work) board.
Jun 4 2024, 7:32 PM · MW-1.43-notes (1.43.0-wmf.4; 2024-05-07), Discovery-Search (Current work), CirrusSearch, Discovery-ARCHIVED, I18n, MediaWiki-Search
EBernhardson moved T363734: Reindex all wikis to enable dotted I fix, Yiddish ligatures, maybe Arabic normalization from To Be Deployed to Needs Reporting on the Discovery-Search (Current work) board.
Jun 4 2024, 7:32 PM · Discovery-Search (Current work)
EBernhardson moved T358495: Enable dotted_I_fix (almost?) everywhere from To Be Deployed to Needs Reporting on the Discovery-Search (Current work) board.
Jun 4 2024, 7:32 PM · Patch-For-Review, Discovery-Search (Current work)
EBernhardson moved T361377: Refactor CirrusSearch AnalysisConfigBuilder Tests & Fixtures from To Be Deployed to Needs Reporting on the Discovery-Search (Current work) board.
Jun 4 2024, 7:32 PM · MW-1.43-notes (1.43.0-wmf.1; 2024-04-16), Discovery-Search (Current work)
EBernhardson moved T180387: 𝖤̶𝗇̶𝖺̶𝖻̶𝗅̶𝖾̶ Disable hiragana/katakana mapping from To Be Deployed to Needs Reporting on the Discovery-Search (Current work) board.
Jun 4 2024, 7:32 PM · MW-1.43-notes (1.43.0-wmf.2; 2024-04-23), Discovery-Search (Current work), CirrusSearch
EBernhardson moved T362501: וי (U+05D5 vav, U+05D9 yod) doesn't find ױ (U+05F1 Yiddish vav yod) from To Be Deployed to Needs Reporting on the Discovery-Search (Current work) board.
Jun 4 2024, 7:32 PM · Discovery-Search (Current work), CirrusSearch
EBernhardson moved T363475: SUP: Shift Writes from Cirrus to SUP from In Progress to Needs Reporting on the Discovery-Search (Current work) board.
Jun 4 2024, 7:31 PM · Discovery-Search (Current work), CirrusSearch
EBernhardson moved T364599: Automate search metrics notebooks and integrate with Airflow from In Progress to Needs Reporting on the Discovery-Search (Current work) board.
Jun 4 2024, 7:31 PM · Discovery-Search (Current work)

Jun 3 2024

EBernhardson added a comment to T363734: Reindex all wikis to enable dotted I fix, Yiddish ligatures, maybe Arabic normalization.

Process has completed for eqiad and codfw. Total runtime was 3.5 days in codfw, 5.5 days in eqiad. The difference in time is mostly accounted for by commonswiki failing after more than a day and retrying. Based on review of this run I'm going to update the repository to work on a per-index basis instead of a per-wiki basis. This should reduce the effect of retries on large wikis, and also avoid a problem we see in the current process where commonswiki_content finishes reindexing, but then doesn't get backfilled for a day waiting for commonswiki_file to reindex.

Jun 3 2024, 3:00 PM · Discovery-Search (Current work)
EBernhardson claimed T366363: Use specific user agent in mediawiki api requests.
Jun 3 2024, 2:53 PM · serviceops-radar, Patch-For-Review, Discovery-Search (Current work)
EBernhardson moved T366363: Use specific user agent in mediawiki api requests from Incoming to Needs review on the Discovery-Search (Current work) board.
Jun 3 2024, 2:52 PM · serviceops-radar, Patch-For-Review, Discovery-Search (Current work)

May 31 2024

EBernhardson created T366363: Use specific user agent in mediawiki api requests.
May 31 2024, 3:36 PM · serviceops-radar, Patch-For-Review, Discovery-Search (Current work)

May 30 2024

EBernhardson updated the task description for T366297: Only reindex indexes that have some sort of change.
May 30 2024, 3:47 PM · Discovery-Search, CirrusSearch
EBernhardson created T366297: Only reindex indexes that have some sort of change.
May 30 2024, 3:46 PM · Discovery-Search, CirrusSearch

May 28 2024

EBernhardson added a comment to T363734: Reindex all wikis to enable dotted I fix, Yiddish ligatures, maybe Arabic normalization.

I've been working out a new reindexing orchestration for this, found in https://gitlab.wikimedia.org/repos/search-platform/cirrus-reindex-orchestrator/. It has run to completion now on cloudelastic, finishing in under a week (compared to ~3 weeks last time). A review of the logs and the set of live indices in cloudelastic suggests this has been successful. Making a few more cleanups to the codebase, and then will start reindexing eqiad and codfw.

May 28 2024, 6:26 PM · Discovery-Search (Current work)

May 21 2024

EBernhardson moved T350974: search/glent fails on Java 11 from Needs review to Needs Reporting on the Discovery-Search (Current work) board.
May 21 2024, 8:54 PM · Discovery-Search (Current work), ci-test-error
EBernhardson moved T358472: Search dag image_suggestions_weekly failed with: Empty dataframe provided from Needs review to Needs Reporting on the Discovery-Search (Current work) board.
May 21 2024, 8:52 PM · Discovery-Search (Current work), Structured-Data-Backlog, Image-Suggestions
EBernhardson moved T358350: Search Metrics - Successful searches from Needs review to Needs Reporting on the Discovery-Search (Current work) board.
May 21 2024, 8:52 PM · Patch-For-Review, Discovery-Search (Current work)
EBernhardson moved T358351: Search Metrics - Read traffic generated by Search from Needs review to Needs Reporting on the Discovery-Search (Current work) board.
May 21 2024, 8:52 PM · MW-1.43-notes (1.43.0-wmf.1; 2024-04-16), Discovery-Search (Current work)
EBernhardson moved T358352: Search Metrics - Number of user sessions using search from Needs review to Needs Reporting on the Discovery-Search (Current work) board.
May 21 2024, 8:52 PM · Patch-For-Review, Discovery-Search (Current work)
EBernhardson moved T358349: Search Metrics - Number of Searches from Needs review to Needs Reporting on the Discovery-Search (Current work) board.
May 21 2024, 8:52 PM · Discovery-Search (Current work)
EBernhardson moved T364600: Create Superset dashboard for search metrics from In Progress to Needs Reporting on the Discovery-Search (Current work) board.

@EBernhardson I was showing this today and we realized one more thing - would you please adjust the permissions for the underlying HDFS path to the data so that the analytics-privatedata-users group has read permissions? This should then make the dashboard viewable to all users on the cluster instead of just the analytics-search-users group members.

May 21 2024, 5:32 PM · Discovery-Search (Current work)

May 16 2024

EBernhardson added a comment to T364600: Create Superset dashboard for search metrics.

Went through and made some test charts in superset against my test tables I generated with the live data. It looks like we have everything we need, but I'm going to make one change to the collection scripts to simplify things.

May 16 2024, 8:34 PM · Discovery-Search (Current work)
EBernhardson moved T365190: Cannot provide empty array to wikis as $wgCirrusSearchWriteClusters from Incoming to Needs review on the Discovery-Search (Current work) board.
May 16 2024, 7:14 PM · MW-1.43-notes (1.43.0-wmf.6; 2024-05-21), Discovery-Search (Current work), CirrusSearch
EBernhardson created T365190: Cannot provide empty array to wikis as $wgCirrusSearchWriteClusters.
May 16 2024, 6:35 PM · MW-1.43-notes (1.43.0-wmf.6; 2024-05-21), Discovery-Search (Current work), CirrusSearch

May 14 2024

EBernhardson created P62389 script to check cirrus-streaming-updater container logs based on release and environment.
May 14 2024, 8:42 PM · Data-Platform-SRE, Discovery-Search
EBernhardson updated the task description for T364888: Autocomplete on exact matches is overly case sensitive.
May 14 2024, 4:58 PM · Discovery-Search
EBernhardson created T364888: Autocomplete on exact matches is overly case sensitive.
May 14 2024, 4:56 PM · Discovery-Search

May 13 2024

EBernhardson added a comment to T362789: Elastic-to-Opensearch migration: explore Opensearch-exclusive features.

Security would be interested in us investigating the access control mechanisms in opensearch, having access more limited than "anyone with a network connection in the cluster".

May 13 2024, 3:37 PM · Data-Platform-SRE, Discovery-Search

May 8 2024

EBernhardson added a comment to T357353: Application Security Review Request : NetworkSession MediaWiki extension .

@EBernhardson, according to @JMeybohm there is no way to limit the IP ranges of pod/service/namespace to associated them closely with an application (SUP).

Ok. I know it's hackier, but could that instead be managed via extension config?

May 8 2024, 8:52 PM · NetworkSession, Discovery-Search (Current work), secscrum, Security, Application Security Reviews

May 6 2024

EBernhardson added a comment to T358345: [Epic] Search metrics 2024.

The four sub-tickets were combined into a single gitlab MR with two calculations, found in: https://gitlab.wikimedia.org/repos/search-platform/notebooks/-/merge_requests/3. These currently populate two daily partitioned Hive tables, and I've filled them with data for the months of March and April. I expect that going forward we will want to move the metrics calculation to airflow, and decide on which metrics are worth dashboarding.

May 6 2024, 5:26 PM · Discovery-Search (Current work), Epic
EBernhardson claimed T358349: Search Metrics - Number of Searches.

Four tickets were combined into a single ticket, two calculations, and found in the patch above:

  • T358349 - number of searches
  • T358350 - successful searches
  • T358351 - read traffic generated by search
  • T358352 - number of user sessions using search
May 6 2024, 5:22 PM · Discovery-Search (Current work)
EBernhardson moved T358352: Search Metrics - Number of user sessions using search from Ready for Dev -- SWE to Needs review on the Discovery-Search (Current work) board.
May 6 2024, 5:22 PM · Patch-For-Review, Discovery-Search (Current work)
EBernhardson claimed T358350: Search Metrics - Successful searches.
May 6 2024, 5:21 PM · Patch-For-Review, Discovery-Search (Current work)
EBernhardson claimed T358351: Search Metrics - Read traffic generated by Search.
May 6 2024, 5:21 PM · MW-1.43-notes (1.43.0-wmf.1; 2024-04-16), Discovery-Search (Current work)

May 3 2024

EBernhardson added a comment to T358350: Search Metrics - Successful searches.

I've worked through most of this and have it calculating up the last two months of metrics now. They will be found, for now, in ebernhardson.T358350 in hive.

May 3 2024, 5:35 PM · Patch-For-Review, Discovery-Search (Current work)

May 1 2024

EBernhardson moved T363516: Many search suggestions missing when connecting to eqiad, but not when connecting to codfw from Blocked/Waiting to Needs Reporting on the Discovery-Search (Current work) board.

The general issue here, missing search suggestions, is resolved and the temporary mitigations put in place have been rolled back. I'm calling this issue done. One of the root causes, network connectivity, has been resolved. The other root cause, promoting a bad index, is tracked in T363521. Some changes have already been put in place to make this code more resilient to network failures, but more might still be done.

May 1 2024, 6:00 PM · CirrusSearch, Discovery-Search (Current work), Patch-For-Review
EBernhardson created T363922: Single letter tokens suffixed to article text in search.
May 1 2024, 4:42 PM · Discovery-Search
EBernhardson created P61631 Single letter tokens suffixed to article text in search.
May 1 2024, 4:41 PM · Discovery-Search

Apr 30 2024

EBernhardson added a comment to T363521: Completion suggester can promote a bad build.

https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CirrusSearch/+/1024698 switches from using a scroll to a search_after approach which should be more robust by handling retries and errors properly.
Question is whether we should do more by adding more checks or not? Unfortunately not all wikis are building a new index and promoting it, to optimize cluster operations most of the wikis recycle the same index where we don't have a chance to do such sanity checks prior to promoting.
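
To illustrate why a search_after style iteration is easier to retry than a scroll, here is a minimal self-contained sketch. It is not the code from the Gerrit change above: `iterate_with_search_after` and `make_flaky_source` are hypothetical names, and the in-memory source merely stands in for a search backend. The key property is that the cursor is just the last sort key seen by the client, so a failed page request can be repeated without any server-side context.

```python
from typing import Callable, Iterable, Iterator, List, Optional, Set, Tuple

Doc = Tuple[int, str]  # (sort key, payload)
FetchPage = Callable[[Optional[int], int], List[Doc]]


def iterate_with_search_after(
    fetch_page: FetchPage, page_size: int = 2, max_retries: int = 3
) -> Iterator[Doc]:
    """Yield all docs in sort order, retrying failed page requests."""
    last_sort_key: Optional[int] = None
    while True:
        for attempt in range(max_retries + 1):
            try:
                # The request carries only the last sort key, so repeating
                # it after an error is safe and returns the same page.
                page = fetch_page(last_sort_key, page_size)
                break
            except ConnectionError:
                if attempt == max_retries:
                    raise  # give up rather than silently stop early
        if not page:
            return  # an empty page means the result set is exhausted
        yield from page
        last_sort_key = page[-1][0]


def make_flaky_source(docs: Iterable[Doc], fail_on_calls: Set[int]) -> FetchPage:
    """In-memory stand-in for a backend that fails on the given call numbers."""
    ordered = sorted(docs)
    calls = {"n": 0}

    def fetch_page(after: Optional[int], size: int) -> List[Doc]:
        calls["n"] += 1
        if calls["n"] in fail_on_calls:
            raise ConnectionError("simulated network error")
        remaining = [d for d in ordered if after is None or d[0] > after]
        return remaining[:size]

    return fetch_page
```

A persistent failure raises instead of ending the iteration early, which matters here because a truncated iteration is what would let a partial index look complete.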

Apr 30 2024, 4:18 PM · MW-1.43-notes (1.43.0-wmf.12; 2024-07-02), Discovery-Search (Current work), serviceops-radar, CirrusSearch

Apr 26 2024

EBernhardson moved T361870: Stabilize "consumer-cloudelastic" Search Update Pipeline job from Incoming to Needs Reporting on the Discovery-Search (Current work) board.

The consumer seems generally stable. It involved changes to both the application for better error handling, and an increase in the taskmanager memory above. The pods had been running for a week uninterrupted until we brought them down yesterday to verify some new alerting.

Apr 26 2024, 8:06 PM · Discovery-Search (Current work), Data-Platform-SRE (2024.04.15 - 2024.05.05)
EBernhardson added a comment to T359215: mediawiki_cirrussearch_request data is regularly late.

Poked at the data-engineering-alerts archive, it looks like these were firing daily and then stopped on Apr 10. I think we can optimistically call this fixed?

Apr 26 2024, 7:54 PM · Performance Issue, Data-Platform
EBernhardson moved T359580: CirrusSearch should not send outdated cirrussearch-request events from Blocked/Waiting to Needs Reporting on the Discovery-Search (Current work) board.

Per the data-engineering-alerts list archive, these were triggering daily alerts for the two weeks prior to 2024-04-10 and haven't been emitted since. This is two days after the fix was applied, which is slightly curious. But I remember something about event refining operating over a window of hours, so maybe it took some time to pass. I'm willing to call this complete with the errors stopping.

Apr 26 2024, 7:53 PM · Discovery-Search (Current work), CirrusSearch
EBernhardson added a comment to T363521: Completion suggester can promote a bad build.

Root cause of the network issue has been tracked down in T363516#9748908: a layer-2 issue with LVS and new racks. With that fixed this error should be triggered less frequently, but we should still apply some resiliency updates to the related code.

Apr 26 2024, 7:04 PM · MW-1.43-notes (1.43.0-wmf.12; 2024-07-02), Discovery-Search (Current work), serviceops-radar, CirrusSearch
EBernhardson added a comment to T363516: Many search suggestions missing when connecting to eqiad, but not when connecting to codfw.

Decided to delay bringing traffic back to eqiad until Monday. To be confident in the daily indices we would probably want to rebuild them all, but that takes many hours and it would finish only a few hours before I'm heading out for the weekend. Didn't seem like a great time to bring traffic back. The daily rebuilds will run; we can look at them on Monday and bring traffic back if everything is back to normal.

Apr 26 2024, 6:51 PM · CirrusSearch, Discovery-Search (Current work), Patch-For-Review
EBernhardson moved T359580: CirrusSearch should not send outdated cirrussearch-request events from Needs review to Blocked/Waiting on the Discovery-Search (Current work) board.
Apr 26 2024, 6:49 PM · Discovery-Search (Current work), CirrusSearch
EBernhardson added a comment to T359215: mediawiki_cirrussearch_request data is regularly late.

I poked around a little, but I'm not sure how to check if that fix solved the issue or not. I submitted a join request to the data-engineering-alerts mailing list, and can check the archives for current frequency after being accepted. I assume these alerts are also recorded by whatever sends them, but I wasn't sure where that is.

Apr 26 2024, 6:48 PM · Performance Issue, Data-Platform
EBernhardson moved T357066: CirrusSearch\BuildDocument\BuildDocumentException: ParserOutput cannot be obtained. from Needs review to Needs Reporting on the Discovery-Search (Current work) board.

These look to have subsided, now 12 in the last 4 days.

Apr 26 2024, 5:57 PM · MW-1.43-notes (1.43.0-wmf.2; 2024-04-23), Discovery-Search (Current work), User-brennen, CirrusSearch, Wikimedia-production-error
EBernhardson edited P61254 (An Untitled Masterwork).
Apr 26 2024, 4:18 PM
EBernhardson created P61254 (An Untitled Masterwork).
Apr 26 2024, 4:01 PM

Apr 25 2024

EBernhardson added a comment to T363521: Completion suggester can promote a bad build.

One thing we do have in logstash, although not specifically from the script running eqiad, is a surprising (to me) number of general network errors talking to the elasticsearch cluster. Looking at the Host overview dashboard for mwmaint1002 for today, I can see that there were intermittent network errors from 03:00 until 06:50. Our completion indices build ran from 02:30 to 06:45. Looking at the last 7 days there are consistently network errors during this time period. I'm assuming we are causing those, but we could try running it at a different time of day.

Apr 25 2024, 9:13 PM · MW-1.43-notes (1.43.0-wmf.12; 2024-07-02), Discovery-Search (Current work), serviceops-radar, CirrusSearch
EBernhardson added a comment to T358350: Search Metrics - Successful searches.

Started looking over this the other day. Some data we have available:

Apr 25 2024, 8:29 PM · Patch-For-Review, Discovery-Search (Current work)
EBernhardson added a comment to T363521: Completion suggester can promote a bad build.

Wrote a terrible bash script to compare titlesuggest doc counts between the two clusters. This suggests the problem isn't limited to enwiki.
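
The comparison itself can be sketched in a few lines. This is not the bash script mentioned above: the function name `find_mismatched`, the input shape (index name mapped to doc count, collected separately from each cluster), and the 10% default tolerance are all assumptions for illustration.

```python
from typing import Dict, Mapping, Tuple


def find_mismatched(
    counts_a: Mapping[str, int],
    counts_b: Mapping[str, int],
    max_relative_diff: float = 0.1,
) -> Dict[str, Tuple[int, int]]:
    """Return indices whose doc counts differ by more than the tolerance.

    The difference is measured relative to the larger of the two counts;
    an index missing from one cluster is treated as having zero docs.
    """
    mismatched: Dict[str, Tuple[int, int]] = {}
    for index in sorted(set(counts_a) | set(counts_b)):
        a = counts_a.get(index, 0)
        b = counts_b.get(index, 0)
        larger = max(a, b)
        if larger == 0:
            continue  # empty in both clusters, nothing to compare
        if abs(a - b) / larger > max_relative_diff:
            mismatched[index] = (a, b)
    return mismatched
```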

Apr 25 2024, 8:27 PM · MW-1.43-notes (1.43.0-wmf.12; 2024-07-02), Discovery-Search (Current work), serviceops-radar, CirrusSearch
EBernhardson added a comment to T363516: Many search suggestions missing when connecting to eqiad, but not when connecting to codfw.

Decided against shuffling traffic, rebuild is almost complete already for enwiki. I can see in the logs where the enwiki eqiad build jumped from 44% to complete, but no reason why. Nothing in logstash for that period either. I've created T363521 to put something in place to prevent this in the future.

Apr 25 2024, 8:20 PM · CirrusSearch, Discovery-Search (Current work), Patch-For-Review
EBernhardson created T363521: Completion suggester can promote a bad build.
Apr 25 2024, 7:48 PM · MW-1.43-notes (1.43.0-wmf.12; 2024-07-02), Discovery-Search (Current work), serviceops-radar, CirrusSearch
EBernhardson created P61225 mwmaint1002:mediawiki_job_cirrus_build_completion_indices_eqiad syslog for enwiki.
Apr 25 2024, 7:47 PM
EBernhardson added a comment to T363516: Many search suggestions missing when connecting to eqiad, but not when connecting to codfw.

Hmm, I can confirm this is happening. The completion index is built new every day in each datacenter. Usually they are the same, but somehow the eqiad index is about half the size of the codfw index (6.7g vs 14.5g). Autocomplete is fairly high traffic; we should probably shift the autocomplete traffic to codfw until it can be fixed, which probably requires a rebuild and a couple hours.

Apr 25 2024, 7:25 PM · CirrusSearch, Discovery-Search (Current work), Patch-For-Review

Apr 24 2024

EBernhardson created P61184 (An Untitled Masterwork).
Apr 24 2024, 9:31 PM

Apr 19 2024

EBernhardson added a comment to T358345: [Epic] Search metrics 2024.

For those following along, have a look at the comment in T358349#9727873 to identify the notebook helping to fill a table in @EBernhardson's namespace and an example Superset.

Erik, nice work so far!

I'm interested to see migration of the coarse grained session ratios in the subtasks, which are expressed in the previous notebooks such as T358352-user-sessions-using-search.ipynb, brought into the Superset dashboard (the Python-deduced number of actors, as well as unique_devices_per_domain_daily divisors are helpful in particular for the AC).

Apr 19 2024, 11:25 PM · Discovery-Search (Current work), Epic

Apr 18 2024

EBernhardson added a comment to T358349: Search Metrics - Number of Searches.

This chart should (eventually) contain the same data as gehel posted above. As of this moment only 5 days are calculated but the aggregate % have already settled in. I only spent a couple minutes to make the chart, this probably isn't the best way to present the data. But an example: https://superset.wikimedia.org/explore/?slice_id=3368

Apr 18 2024, 8:21 PM · Discovery-Search (Current work)
EBernhardson added a comment to T358349: Search Metrics - Number of Searches.

@EBernhardson should we close this as a duplicate and move "(full text search, go bar, ...)" as a dimension aspect in T358352: Search Metrics - Number of user sessions using search?

Apr 18 2024, 6:43 PM · Discovery-Search (Current work)

Apr 17 2024

EBernhardson added a comment to T358599: Integrate Saneitizer with SUP.

One potential improvement we talked about, the initial method of configuring the saneitizer adds new pieces to the flink execution graph. This means you have to play around with some dangerous options to pause saneitization, losing the current saneitization state in the process. We should update the operation of the flag to toggle saneitization so that it still connects to the graph, but never emits any events or state changes when disabled. The general idea is that the shape of the graph should not change due to configuration changes, as graph shape changes require careful deployments.
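
The idea of keeping the graph shape fixed while toggling behavior can be illustrated with a toy pipeline. This is a plain-Python sketch and not Flink API code; `SaneitizerStage`, `build_graph` and `graph_shape` are hypothetical names introduced only for this example.

```python
from typing import List


class SaneitizerStage:
    """Toy stage that is always part of the graph, enabled or not."""

    def __init__(self, enabled: bool) -> None:
        self.enabled = enabled
        self.pages_checked = 0  # stands in for the stage's internal state

    def process(self, page_id: int) -> List[str]:
        if not self.enabled:
            # Disabled: emit no events and leave the state untouched.
            return []
        self.pages_checked += 1
        return [f"recheck:{page_id}"]


def build_graph(saneitizer_enabled: bool) -> List[object]:
    # The stage is wired in unconditionally; the flag only changes behavior.
    return ["source", SaneitizerStage(saneitizer_enabled), "sink"]


def graph_shape(graph: List[object]) -> List[str]:
    # Describe the graph by stage names so two configurations can be compared.
    return [s if isinstance(s, str) else type(s).__name__ for s in graph]
```

Because both configurations produce the same shape, flipping the flag in this sketch never adds or removes a stage, which is the property the comment asks for.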

Apr 17 2024, 7:49 PM · MW-1.42-notes (1.42.0-wmf.26; 2024-04-09), Discovery-Search (Current work)
EBernhardson added a comment to T358599: Integrate Saneitizer with SUP.

Initial deployment has been a bit rocky; in particular saneitizer is visiting pages with error states we haven't seen in normal operation yet. Overall this is probably good, we would have run into pages with these error states eventually. Saneitizer is simply speeding that process up. The pipeline has been running for a couple hours now without issues. If it's still running without restarts by tomorrow we can probably consider the initial deployment complete.

Apr 17 2024, 7:38 PM · MW-1.42-notes (1.42.0-wmf.26; 2024-04-09), Discovery-Search (Current work)
EBernhardson moved T358518: Deploy streaming updater for 100% of writes to cloudelastic from To Be Deployed to Needs Reporting on the Discovery-Search (Current work) board.
Apr 17 2024, 7:36 PM · Data-Platform-SRE (2024.04.15 - 2024.05.05), Discovery-Search (Current work)

Apr 16 2024

EBernhardson added a comment to T362367: [wmf.26 - eswiki] Homepage: task counter issues - "No suggestions found" incorrectly displayed .

This looks to be all caught back up from our side

Apr 16 2024, 3:24 PM · Growth-Team (Sprint 12 (Growth Team)), Discovery-Search (Current work), CirrusSearch, Regression, GrowthExperiments
EBernhardson edited P60544 (An Untitled Masterwork).
Apr 16 2024, 12:23 AM
EBernhardson created P60544 (An Untitled Masterwork).
Apr 16 2024, 12:20 AM

Apr 15 2024

EBernhardson added a comment to T342444: Reindex all wikis to enable apostrophe normalization, camelCase handling, acronym handling, word_break_helper, and icu_tokenizer/_repair.

All indices on cloudelastic look to be recreated now as well. It hasn't been running this whole time, it just took me awhile to get around to verifying the operation and finishing the couple wikis that failed the first two times through.

Apr 15 2024, 5:59 PM · Discovery-Search (Current work)
EBernhardson moved T342444: Reindex all wikis to enable apostrophe normalization, camelCase handling, acronym handling, word_break_helper, and icu_tokenizer/_repair from In Progress to Needs Reporting on the Discovery-Search (Current work) board.
Apr 15 2024, 5:59 PM · Discovery-Search (Current work)
EBernhardson moved T359580: CirrusSearch should not send outdated cirrussearch-request events from Blocked/Waiting to Needs review on the Discovery-Search (Current work) board.
Apr 15 2024, 3:15 PM · Discovery-Search (Current work), CirrusSearch
EBernhardson added a comment to T362367: [wmf.26 - eswiki] Homepage: task counter issues - "No suggestions found" incorrectly displayed .

It was backfilling over the weekend but got stuck around Feb 6th. It's back to processing hourlies; I expect they will keep decreasing for at least 12 more hours of processing based on the current rates, as long as it doesn't get stuck again. Basically what happened is there is a daily cleanup for old data, and because this is backfilling old data the bits it calculated were deleted in the middle of it working, and it stopped. I've paused the cleanup process for now until it completes.

Apr 15 2024, 2:07 PM · Growth-Team (Sprint 12 (Growth Team)), Discovery-Search (Current work), CirrusSearch, Regression, GrowthExperiments

Apr 12 2024

EBernhardson added a comment to T362367: [wmf.26 - eswiki] Homepage: task counter issues - "No suggestions found" incorrectly displayed .

They are stored and processing through now at a rate of something like one hour per minute. It should catch up soon enough.

Apr 12 2024, 11:18 PM · Growth-Team (Sprint 12 (Growth Team)), Discovery-Search (Current work), CirrusSearch, Regression, GrowthExperiments
EBernhardson added a comment to T362367: [wmf.26 - eswiki] Homepage: task counter issues - "No suggestions found" incorrectly displayed .

Hmm, indeed it looks like hourly transfers have been stuck for quite some time. Somehow airflow thinks there are two hours running and it never failed them. It is still waiting for them to complete even though nothing is running. It looks like we never set an SLA value on this dag, so its failures probably don't get properly recognized. I've reset the two tasks that were stuck and will see how I can get these all moving again, along with adding an SLA so it properly alerts.

Apr 12 2024, 9:02 PM · Growth-Team (Sprint 12 (Growth Team)), Discovery-Search (Current work), CirrusSearch, Regression, GrowthExperiments
EBernhardson created P60469 (An Untitled Masterwork).
Apr 12 2024, 4:57 PM