Page MenuHomePhabricator

PHP Notice: Undefined index: watchlistdays-local-exception
Closed, ResolvedPublicPRODUCTION ERROR

Description

Error
normalized_message
[{reqId}] {exception_url}   PHP Notice: Undefined index: watchlistdays-local-exception
exception.trace
from /srv/mediawiki/php-1.38.0-wmf.23/includes/htmlform/HTMLFormField.php(165)
#0 /srv/mediawiki/php-1.38.0-wmf.23/includes/htmlform/HTMLFormField.php(165): MWExceptionHandler::handleError(integer, string, string, integer, array)
#1 /srv/mediawiki/php-1.38.0-wmf.23/includes/htmlform/HTMLFormField.php(277): HTMLFormField->getNearestFieldValue(array, string, boolean, boolean)
#2 /srv/mediawiki/php-1.38.0-wmf.23/includes/htmlform/HTMLFormField.php(367): HTMLFormField->checkStateRecurse(array, array)
#3 /srv/mediawiki/php-1.38.0-wmf.23/includes/htmlform/HTMLFormField.php(398): HTMLFormField->isDisabled(array)
#4 /srv/mediawiki/php-1.38.0-wmf.23/includes/htmlform/fields/HTMLFloatField.php(14): HTMLFormField->validate(string, array)
#5 /srv/mediawiki/php-1.38.0-wmf.23/includes/api/ApiOptions.php(128): HTMLFloatField->validate(string, array)
#6 /srv/mediawiki/php-1.38.0-wmf.23/includes/api/ApiMain.php(1891): ApiOptions->execute()
#7 /srv/mediawiki/php-1.38.0-wmf.23/includes/api/ApiMain.php(869): ApiMain->executeAction()
#8 /srv/mediawiki/php-1.38.0-wmf.23/includes/api/ApiMain.php(840): ApiMain->executeActionWithErrorHandling()
#9 /srv/mediawiki/php-1.38.0-wmf.23/api.php(90): ApiMain->execute()
#10 /srv/mediawiki/php-1.38.0-wmf.23/api.php(45): wfApiMain()
#11 /srv/mediawiki/w/api.php(3): require(string)
#12 {main}
Impact

A moderate rate (~20 messages per hour) of log spam.

Notes

Seemingly introduced by HTMLFormField: Make the behavior of cond-state consistent between client and server. Note the message is not normalized and there are a variety of similar error messages mentioning different assoc array keys, which makes this more spammy than normal as it fills up the top hits of our logstash dashboards.

Event Timeline

dduvall triaged this task as High priority.Feb 24 2022, 5:19 PM
dduvall created this task.

I'm not listing this as a blocker for now but it would be great to have this fixed ASAP. If the rates increase to an untenable number upon group2 it may become an official blocker.

Change 765670 had a related patch set uploaded (by Func; author: Func):

[mediawiki/core@master] ApiOptions: Workaround for preferences that have no default values

https://gerrit.wikimedia.org/r/765670

I'm not listing this as a blocker for now but it would be great to have this fixed ASAP. If the rates increase to an untenable number upon group2 it may become an official blocker.

The rate may be low, but afaik that just means the category of write actions in question has a relatively low rate in general. We don't have thousands of sign ups, edits, watchlist changes etc every minute in the first place. I don't know for sure, but it seems plausible at this rate that this it is affecting most or all calls to the Options API.

Unless there is confirmation or other evidence to know that this is not corrupting user preferences tables, I think this should have been a blocker.

Any new error messages that occur frequently enough to be noticed where deployers watch for breakage, will block the train.

If we start applying a judgement based on rate, it becomes very difficult to reason about the state of production given that most contributors and code reviewers are not watching Logstash 24/7 themselves. We rely on this policy to be followed in order to ensure the integrity of the production environment. Don't be afraid to block and rollback, and wait for someone to understand the issue and confirm that it is not causing corruption or loss of functionality.

Change 766788 had a related patch set uploaded (by Func; author: Func):

[mediawiki/core@master] Revert \"htmlform: Replace some uses of isHidden to isDisabled\"

https://gerrit.wikimedia.org/r/766788

I'm not listing this as a blocker for now but it would be great to have this fixed ASAP. If the rates increase to an untenable number upon group2 it may become an official blocker.

The rate may be low, but afaik that just means the category of write actions in question has a relatively low rate in general. We don't have thousands of sign ups, edits, watchlist changes etc every minute in the first place. I don't know for sure, but it seems plausible at this rate that this it is affecting most or all calls to the Options API.

Unless there is confirmation or other evidence to know that this is not corrupting user preferences tables, I think this should have been a blocker.

Noted for next time, though it sounds like you're not totally sure of that.

Any new error messages that occur frequently enough to be noticed where deployers watch for breakage, will block the train.

If we start applying a judgement based on rate, it becomes very difficult to reason about the state of production given that most contributors and code reviewers are not watching Logstash 24/7 themselves. We rely on this policy to be followed in order to ensure the integrity of the production environment. Don't be afraid to block and rollback, and wait for someone to understand the issue and confirm that it is not causing corruption or loss of functionality.

This policy is rarely followed to a tee, and we're always applying judgement. That's unavoidable with any process that involves a lot of pattern analysis by a human being. Otherwise, we would have implemented automated blocker reporting by now.

I think what's key to improving this process would be having the right people (those with more intimate knowledge of the software) involved directly in the deployment. IMO, relying on train blockage as the primary motivator for a quick response to production errors is an antipattern and completely untenable. We need developers to triage and investigate production errors more quickly than is typically done, and we need them involved directly in deployments. We'll get there.

Change 766801 had a related patch set uploaded (by Bartosz Dziewoński; author: Func):

[mediawiki/core@wmf/1.38.0-wmf.23] Revert \"htmlform: Replace some uses of isHidden to isDisabled\"

https://gerrit.wikimedia.org/r/766801

Change 766788 merged by jenkins-bot:

[mediawiki/core@master] Revert \"htmlform: Replace some uses of isHidden to isDisabled\"

https://gerrit.wikimedia.org/r/766788

Change 766801 merged by jenkins-bot:

[mediawiki/core@wmf/1.38.0-wmf.23] Revert \"htmlform: Replace some uses of isHidden to isDisabled\"

https://gerrit.wikimedia.org/r/766801

Mentioned in SAL (#wikimedia-operations) [2022-02-28T21:30:34Z] <urbanecm@deploy1002> Synchronized php-1.38.0-wmf.23/includes/htmlform/: rMW67831a3894fe: Revert "htmlform: Replace some uses of isHidden to isDisabled" (T302512) (duration: 00m 48s)

matmarex closed this task as Resolved.EditedMar 1 2022, 12:11 AM
matmarex assigned this task to Func.

Confirmed fixed, the warnings stopped being logged after the deployment:

image.png (470×2 px, 49 KB)