Page MenuHomePhabricator

Wikimedia\Assert\PostconditionException: Postcondition failed: Revision had no page
Closed, ResolvedPublicPRODUCTION ERROR

Description

Error
normalized_message
[{reqId}] {exception_url}   Wikimedia\Assert\PostconditionException: Postcondition failed: Revision had no page
exception.trace
from /srv/mediawiki/php-1.37.0-wmf.20/vendor/wikimedia/assert/src/Assert.php(202)
#0 /srv/mediawiki/php-1.37.0-wmf.20/extensions/DiscussionTools/includes/Notifications/EventDispatcher.php(44): Wikimedia\Assert\Assert::postcondition(boolean, string)
#1 /srv/mediawiki/php-1.37.0-wmf.20/extensions/DiscussionTools/includes/Notifications/EventDispatcher.php(105): MediaWiki\Extension\DiscussionTools\Notifications\EventDispatcher::getParsedRevision(MediaWiki\Revision\RevisionStoreRecord)
#2 /srv/mediawiki/php-1.37.0-wmf.20/extensions/DiscussionTools/includes/Hooks/EchoHooks.php(91): MediaWiki\Extension\DiscussionTools\Notifications\EventDispatcher::generateEventsForRevision(array, MediaWiki\Revision\RevisionStoreRecord)
#3 /srv/mediawiki/php-1.37.0-wmf.20/includes/HookContainer/HookContainer.php(338): MediaWiki\Extension\DiscussionTools\Hooks\EchoHooks::onEchoGetEventsForRevision(array, MediaWiki\Revision\RevisionStoreRecord, boolean)
#4 /srv/mediawiki/php-1.37.0-wmf.20/includes/HookContainer/HookContainer.php(137): MediaWiki\HookContainer\HookContainer->callLegacyHook(string, array, array, array)
#5 /srv/mediawiki/php-1.37.0-wmf.20/includes/Hooks.php(137): MediaWiki\HookContainer\HookContainer->run(string, array, array)
#6 /srv/mediawiki/php-1.37.0-wmf.20/extensions/Echo/includes/DiscussionParser.php(165): Hooks::run(string, array)
#7 /srv/mediawiki/php-1.37.0-wmf.20/extensions/Echo/includes/EchoHooks.php(494): EchoDiscussionParser::generateEventsForRevision(MediaWiki\Revision\RevisionStoreRecord, boolean)
#8 /srv/mediawiki/php-1.37.0-wmf.20/includes/deferred/MWCallableUpdate.php(38): EchoHooks::{closure}()
#9 /srv/mediawiki/php-1.37.0-wmf.20/includes/deferred/DeferredUpdates.php(515): MWCallableUpdate->doUpdate()
#10 /srv/mediawiki/php-1.37.0-wmf.20/includes/deferred/DeferredUpdates.php(391): DeferredUpdates::attemptUpdate(MWCallableUpdate, Wikimedia\Rdbms\LBFactoryMulti)
#11 /srv/mediawiki/php-1.37.0-wmf.20/includes/deferred/DeferredUpdates.php(234): DeferredUpdates::run(MWCallableUpdate, Wikimedia\Rdbms\LBFactoryMulti, Monolog\Logger, BufferingStatsdDataFactory, string)
#12 /srv/mediawiki/php-1.37.0-wmf.20/includes/deferred/DeferredUpdatesScope.php(267): DeferredUpdates::{closure}(MWCallableUpdate, integer)
#13 /srv/mediawiki/php-1.37.0-wmf.20/includes/deferred/DeferredUpdatesScope.php(196): DeferredUpdatesScope->processStageQueue(integer, integer, Closure)
#14 /srv/mediawiki/php-1.37.0-wmf.20/includes/deferred/DeferredUpdates.php(237): DeferredUpdatesScope->processUpdates(integer, Closure)
#15 /srv/mediawiki/php-1.37.0-wmf.20/includes/deferred/DeferredUpdatesScope.php(267): DeferredUpdates::{closure}(AtomicSectionUpdate, integer)
#16 /srv/mediawiki/php-1.37.0-wmf.20/includes/deferred/DeferredUpdatesScope.php(196): DeferredUpdatesScope->processStageQueue(integer, integer, Closure)
#17 /srv/mediawiki/php-1.37.0-wmf.20/includes/deferred/DeferredUpdates.php(242): DeferredUpdatesScope->processUpdates(integer, Closure)
#18 /srv/mediawiki/php-1.37.0-wmf.20/extensions/EventBus/includes/JobExecutor.php(105): DeferredUpdates::doUpdates()
#19 /srv/mediawiki/rpc/RunSingleJob.php(76): MediaWiki\Extension\EventBus\JobExecutor->execute(array)
#20 {main}
Impact
Notes

Event Timeline

Not yet blocking 1.37.0-wmf.20 (T281161); please comment if we should.

This is weirdly reminiscent of T285895… Did something change somewhere causing us to read from the replica databases again?

Also, our error message would be much more useful if it would say which revision had no page.

These have started showing up more rapidly after rolling the train forward to group1.

brennen triaged this task as Unbreak Now! priority.Aug 25 2021, 7:26 PM

This looks like a blocker.

Also, our error message would be much more useful if it would say which revision had no page.

Keep in mind this sort of thing can sometimes make error filtering / logspam assessment more difficult.

This looks like a blocker.

Also, our error message would be much more useful if it would say which revision had no page.

Keep in mind this sort of thing can sometimes make error filtering / logspam assessment more difficult.

.. but that's what supplemental diagnostic fields and channels are for. In the case of the sample linked in the task description of reqId 780f2e48-9d1e-4789-964c-ee6694c5e462, we currently do have amount of information already:

  • Translate TranslateRenderJob [Help:Reset password/fa]: Starting TranslateRenderJob.
  • Translate TranslateRenderJob [Help:Reset password/fa]: Finished TranslateRenderJob.
  • Translate TranslateRenderJob [Help:Reset password/fa]: Finished page edit operation.
  • exception [780f2e48-9d1e-4789-964c-ee6694c5e462] /rpc/RunSingleJob.php Wikimedia\Assert\PostconditionException: Postcondition failed: Revision had no page
    • exception.trace: DeferredUpdates::doUpdates, MWCallableUpdate, EchoDiscussionParser::generateEventsForRevision, DiscussionTools\Hooks\EchoHooks::onEchoGetEventsForRevision, DiscussionTools\Notifications\EventDispatcher::getParsedRevision.
  • Deferred update 'MWCallableUpdate_EchoHooks::onPageSaveComplete' failed to run.

Change 714823 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/core@master] PageStore: Pass query flags to getPageByName()

https://gerrit.wikimedia.org/r/714823

Change 714824 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/extensions/DiscussionTools@master] EventDispatcher: Try really, really hard to read from master

https://gerrit.wikimedia.org/r/714824

I'm not sure if these patches will fix the problem, but I think they make sense, based on the comments by @Pchelolo that I linked to. If they don't fix it, we should try to add more debugging.

@Krinkle If that's telling you anything, please share. (Although I admit I didn't think to look at other errors in the same requests, that's good to know.)

Also, here's a better logstash search for the error, which also shows these errors have been occurring for a while (but at a much lower rate): https://logstash.wikimedia.org/goto/5d9b01512daa8e998bd7d1354f99f8bf

image.png (533×2 px, 76 KB)
I don't know what triggered the change, but the commits in DiscussionTools don't seem related. My guess is something in core, or some config change.

Copying IRC conversation about the patches, for future reference:

[21:42]	<Krinkle> MatmaRex: It looks suspicious to me to read from the primary in that code path. If I understand the CR comments from the other patches, this is for fetching the before-last revision, not one inserted in the current request. If that is indeed the case, how do you know about its existence? It seems whatever data source tells you the rev id of what you're looking for would naturally ensure you have it on the same replica connection.
[21:43]	<Krinkle> We've worked pretty hard to eliminate these kinds of fallbacks to master as they tend to be just a noisey trap making not-found errors more expensive.
[21:45]	<MatmaRex> Krinkle: it's looking for the page_id given a rev_id, and the rev is either the latest or the previous
[21:45]	<Krinkle> where does it get the rev_Id, and this is always a GET or a POST, or is the same code used both ways?
[21:45]	<MatmaRex> so i think querying master there is correct
[21:45]	<MatmaRex> it's a DeferredUpdate from a POST request
[21:46]	<MatmaRex> it gets the rev id from some hook
[21:47]	<Krinkle> right, so the knowledge of the rev id comes from the current request, likely just-inserted. that seems like a case of reading direclty from master unconditionally, not try/fallback.
[21:48]	<Krinkle> my only concern is that you'd want to avoid generic getter code (not obvsiouly master-conn inducing) from falling back to the master when there is no reason to from a non-post/update context.
[21:48]	<Krinkle> change 714824 appears to do that, but I'm not 100% sure
[21:50]	<MatmaRex> is there some way for me to reassure you, or do you just need to think about it?
[21:50]	<MatmaRex> ;)
[21:51]	<Krinkle> MatmaRex: you mean getParsedRevision() is only used for db write contexts?
[21:52]	<Krinkle> And you think it's likey that a majority of calls to it will be for things older/already replicated so want to try both.
[21:52]	<MatmaRex> yes (it's used for generating notifications and revision tags)
[21:53]	<MatmaRex> i don't really know if a majority of calls will be for replicated things… but i guess it was working with no errors for weeks, despite querying replicas, so yes i guess
[21:53]	<MatmaRex> with almost no errors*
[21:53]	<Krinkle> But yeah, if it's write-only  then I guess this is fine. Might be worth a comment somewhere to indicate as much, dunno. I suppose you'd know better how obvious it would be or not for this particular class. Dispatcher does seem clear in retrospect in terms of where it is used.
[21:53]	<Krinkle> ok - do we know why it changed (I'm just low-prio curious at this point)
[21:54]	<MatmaRex> i have no idea
[21:56]	<MatmaRex> Krinkle: thanks for the +2, there's a dependency in core https://gerrit.wikimedia.org/r/c/mediawiki/core/+/714823
[21:56]	<MatmaRex> …because those methods just ignored that parameter
[22:00]	<Krinkle> I see no obvious reason to not pass that indeed, also from looking at the commit that introduced it 
[22:01]	<MatmaRex> it was also pointed out in https://gerrit.wikimedia.org/r/c/mediawiki/extensions/DiscussionTools/+/702457/comment/ce9d1428_49262081/ last month
[22:01]	<MatmaRex> (when i tried to add that READ_LATEST for the first time… just saying ;) )
[22:01]	<MatmaRex> thanks
[22:07]	<MatmaRex> Krinkle: also, i'll copy this conversation into phab, if that's okay with you? in case i end up debugging the same bug again a month from now
[22:07]	<Krinkle> sure

Change 714823 merged by jenkins-bot:

[mediawiki/core@master] PageStore: Pass query flags to getPageByName()

https://gerrit.wikimedia.org/r/714823

Change 714824 merged by jenkins-bot:

[mediawiki/extensions/DiscussionTools@master] EventDispatcher: Try really, really hard to read from master

https://gerrit.wikimedia.org/r/714824

Change 714684 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/core@wmf/1.37.0-wmf.20] PageStore: Pass query flags to getPageByName()

https://gerrit.wikimedia.org/r/714684

Change 714685 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/extensions/DiscussionTools@wmf/1.37.0-wmf.20] EventDispatcher: Try really, really hard to read from master

https://gerrit.wikimedia.org/r/714685

Change 714684 merged by jenkins-bot:

[mediawiki/core@wmf/1.37.0-wmf.20] PageStore: Pass query flags to getPageByName()

https://gerrit.wikimedia.org/r/714684

Change 714685 merged by jenkins-bot:

[mediawiki/extensions/DiscussionTools@wmf/1.37.0-wmf.20] EventDispatcher: Try really, really hard to read from master

https://gerrit.wikimedia.org/r/714685

Mentioned in SAL (#wikimedia-operations) [2021-08-25T21:32:48Z] <urbanecm@deploy1002> Synchronized php-1.37.0-wmf.20/includes/page/PageStore.php: rMW34fb2b99104d: PageStore: Pass query flags to getPageByName() (T289717; T195069) (duration: 01m 06s)

Mentioned in SAL (#wikimedia-operations) [2021-08-25T21:35:09Z] <urbanecm@deploy1002> Synchronized php-1.37.0-wmf.20/extensions/DiscussionTools/includes/Notifications/EventDispatcher.php: cc04b33dec6b9aed1d7621957c4de527266600d1: EventDispatcher: Try really, really hard to read from master (T289717) (duration: 01m 04s)

Urbanecm assigned this task to matmarex.
Urbanecm subscribed.

Tentatively closing, as fix was deployed. Reopen if it happens again.

I still this error in recent log entries for wmf.20.

Change 715051 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/core@master] PageStore: Pass query flags to getPageById() too

https://gerrit.wikimedia.org/r/715051

Hopefully that is just it, there's another level of indirection there (that I missed at first) that is ignoring our request to read from master.

Change 714864 had a related patch set uploaded (by Ahmon Dancy; author: Bartosz Dziewoński):

[mediawiki/core@wmf/1.37.0-wmf.20] PageStore: Pass query flags to getPageById() too

https://gerrit.wikimedia.org/r/714864

Change 715051 merged by jenkins-bot:

[mediawiki/core@master] PageStore: Pass query flags to getPageById() too

https://gerrit.wikimedia.org/r/715051

Change 714864 merged by jenkins-bot:

[mediawiki/core@wmf/1.37.0-wmf.20] PageStore: Pass query flags to getPageById() too

https://gerrit.wikimedia.org/r/714864

Mentioned in SAL (#wikimedia-operations) [2021-08-26T17:26:26Z] <dancy@deploy1002> Synchronized php-1.37.0-wmf.20/includes/page/PageStore.php: Backport: [[gerrit:714864|PageStore: Pass query flags to getPageById() too (T289717 T195069)]] (duration: 01m 05s)

The last error has been recorded at 17:07 UTC, there are no more errors since the patch was deployed (17:26 UTC), and wmf.20 has been rolled forward to all wikis almost an hour ago too (~19:02 UTC). Looks fixed for real now.

https://logstash.wikimedia.org/goto/28457edce20856d551ae5f9e4d21d525

image.png (485×2 px, 59 KB)