
RevisionBasedEntityLookup.php: Revision 363395998 belongs to M77688146 instead of expected M81625979
Closed, ResolvedPublicPRODUCTION ERROR

Description

Error

Request ID: XWTrCgpAICEAABcyyq4AAAHR
Request URL: commons.wikimedia.org/wiki/File:Bolsonaro_with_Israeli_PM_Benjamin_Netanyahu,_Tel_Aviv,_31_March_2019.jpg

message
[XWTrCgpAICEAABcyyq4AAAHR] /wiki/File:Bolsonaro_with_Israeli_PM_Benjamin_Netanyahu,_Tel_Aviv,_31_March_2019.jpg   Wikibase\DataModel\Services\Lookup\EntityLookupException from line 44 of /srv/mediawiki/php-1.34.0-wmf.19/extensions/Wikibase/lib/includes/Store/RevisionBasedEntityLookup.php: Revision 363395998 belongs to M77688146 instead of expected M81625979
trace
#0 /srv/mediawiki/php-1.34.0-wmf.19/vendor/wikibase/data-model-services/src/Lookup/RedirectResolvingEntityLookup.php(51): Wikibase\Lib\Store\RevisionBasedEntityLookup->getEntity(Wikibase\MediaInfo\DataModel\MediaInfoId)
#1 /srv/mediawiki/php-1.34.0-wmf.19/extensions/WikibaseMediaInfo/src/WikibaseMediaInfoHooks.php(287): Wikibase\DataModel\Services\Lookup\RedirectResolvingEntityLookup->getEntity(Wikibase\MediaInfo\DataModel\MediaInfoId)
#2 /srv/mediawiki/php-1.34.0-wmf.19/extensions/WikibaseMediaInfo/src/WikibaseMediaInfoHooks.php(250): Wikibase\MediaInfo\WikibaseMediaInfoHooks->doBeforePageDisplay(OutputPage, boolean, array, Wikibase\Repo\BabelUserLanguageLookup, Wikibase\Repo\ParserOutput\DispatchingEntityViewFactory, array)
#3 /srv/mediawiki/php-1.34.0-wmf.19/includes/Hooks.php(174): Wikibase\MediaInfo\WikibaseMediaInfoHooks::onBeforePageDisplay(OutputPage, SkinVector)
#4 /srv/mediawiki/php-1.34.0-wmf.19/includes/Hooks.php(234): Hooks::callHook(string, array, array, NULL, string)
#5 /srv/mediawiki/php-1.34.0-wmf.19/includes/OutputPage.php(2578): Hooks::runWithoutAbort(string, array)
#6 /srv/mediawiki/php-1.34.0-wmf.19/includes/MediaWiki.php(899): OutputPage->output(boolean)
#7 /srv/mediawiki/php-1.34.0-wmf.19/includes/MediaWiki.php(911): Closure$MediaWiki::main()
#8 /srv/mediawiki/php-1.34.0-wmf.19/includes/MediaWiki.php(523): MediaWiki->main()
#9 /srv/mediawiki/php-1.34.0-wmf.19/index.php(42): MediaWiki->run()
#10 /srv/mediawiki/w/index.php(3): include(string)
#11 {main}
Impact

186 hits in the last hour, 4359 hits in the last 24 hours. Not seen before that.

Notes

...

Related Objects

Event Timeline

hashar lowered the priority of this task from Unbreak Now! to High.Aug 27 2019, 2:10 PM
mmodell changed the subtype of this task from "Task" to "Production Error".Aug 28 2019, 11:06 PM

What should happen here? Should the entity always retain its original ID even after a page move, or should the entity ID always reflect the current page ID of the page with which it's associated?

The assumption that the entity ID will always reflect the (current) page ID seems pretty well baked in at this point, so maybe the entity IDs should just be updated as discrepancies are found. On the other hand, that risks masking any deeper issues that arise.

Better, if we want to preserve the correspondence between mediainfo IDs and page IDs, would be to hook into ArticleUndelete and update the mediainfo ID if the page ID has changed. (Despite what its name might suggest, that hook isn't restricted to NS_MAIN, but appears to fire when any page from any namespace is restored.)

Not my decision to make but those are presumably stable identifiers that reusers of the data can rely on just like Wikidata Q-IDs for Items. In this case you'll probably want to make the data available under both IDs. Wikidata does this with redirects on Items and Lexemes.

Not really - the MediaInfo id refers to a slot rather than a page, and isn't accessible via a url. I think the [[ https://www.mediawiki.org/wiki/Manual:Hooks/ArticleUndelete | ArticleUndelete ]] solution will work

edit: Hmm actually it is accessible via Special:EntityData, so perhaps we will need a redirect

Urbanecm raised the priority of this task from High to Unbreak Now!.Sep 1 2019, 7:57 PM

Hello everyone, this just got re-reported in T231757. I closed that as a duplicate, and I would like to ask anyone with enough skills to focus on this task. I feel it is UBN, given that it blocks users from doing important work; https://commons.wikimedia.org/w/index.php?title=User_talk:Martin_Urbanec&oldid=364071280#Copyright_status:_File:%E0%A4%B6%E0%A4%BF%E0%A4%B2%E0%A5%8D%E0%A4%AA%E0%A4%95%E0%A4%BE%E0%A4%B0_%E0%A4%9A%E0%A4%B0%E0%A4%BF%E0%A4%A4%E0%A5%8D%E0%A4%B0%E0%A4%95%E0%A5%8B%E0%A4%B6_%E0%A4%96%E0%A4%82%E0%A4%A1_%E0%A5%A8_-_%E0%A4%B8%E0%A4%BE%E0%A4%B9%E0%A4%BF%E0%A4%A4%E0%A5%8D%E0%A4%AF.pdf is just one example.

The errors shown begin around 2019-08-26 09:40:00 UTC (there are a few before that). Might just have been an edit on one of the pages that introduced a fault that is not properly handled.

The scope is fairly thin; it's just a few items on commonswiki, apparently. Hence it does not seem worth blocking the train.

I'm not sure I would agree; I really do feel this should be UBN. Agreed it isn't worth blocking the train (it is already moving quickly, and is at the group1 station for now), but I think this is important/urgent enough that it would warrant blocking if the train were still at group0.

image.png (261×1 px, 22 KB)

This is an up-to-date screenshot from https://logstash.wikimedia.org/goto/8b9e794e58be8125f1fc15b881154f0c (searching for "instead of expected"; not ideal, but this error makes up the vast majority of hits, if not all of them). It shows that this happened almost 12k times "today", by Logstash's definition, averaging 300 hits per 30 minutes (10 hits per minute). Expanding to "this week" doesn't make much difference though (still 12k hits), so I'm being bold and re-bumping to UBN. Not adding T220745; I'm not sure it would be wise to block a running train. Anyway, I wouldn't call "10 hits per minute" "fairly thin": while it might be an edge case generally speaking, an edge case affecting 1% of users means a high absolute number at Wikimedia. As such, I'm sure this is UBN rather than High, and I'm setting the priority accordingly, with this reasoning.

If you disagree (you being @hashar or anyone reading this), I'd appreciate it if you could comment before, or while, reverting, to explain why this isn't right. I admit I didn't read the full task, and may or may not have missed information mentioned here. Thanks for reading.

Quick fix for the error: just remove the check that triggers the error. Could be converted to a warning, or just a debug message.

However, this doesn't solve the conceptual problem of breaking all existing references to the old entity ID, internal and external.

@Mholloway said:

What should happen here? Should the entity always retain its original ID even after a page move, or should the entity ID always reflect the current page ID of the page with which it's associated?

Ideally, undeletion would not change the page ID. Core already tries to retain the original page ID, but it's not always possible. Fixing this is a complicated project, see T193690: RFC: How should we fix the undeletion system?. I wonder though why the page ID wasn't restored in this case - I don't really see a reason for that, may be worth investigating.

From the Wikibase model perspective, entity IDs should never change. So the original ID needs to be retained. The question is how that could be made to work...

The assumption that the entity ID will always reflect the (current) page ID seems pretty well baked in at this point, so maybe the entity IDs should just be updated as discrepancies are found. On the other hand, that risks masking any deeper issues that arise.

Indeed - From the perspective of EntityTitleLookup (as far as I remember it from a couple of years ago), the mapping of an ID to a page Title needs to be programmatic - for Items, the ID is the title text, and for MediaInfo, the title is the page ID. If the page ID changes, the page can no longer be found given the MediaInfo ID. To make this work, the mapping from MediaInfo ID to Title would have to be maintained in the database.
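The two mapping styles described above can be sketched as follows. This is a hypothetical illustration, not the actual EntityTitleLookup code: for Items the serialized ID doubles as the page title, while for MediaInfo the page is found via the page ID encoded in the entity ID, which is exactly the link that breaks when the page ID changes.

```php
<?php
// Hypothetical sketch of the two programmatic ID-to-page mappings.
// Neither function name exists in Wikibase; they only illustrate the idea.

function itemIdToTitleText( string $itemId ): string {
    // "Q42" -> "Q42": for Items, the serialized ID is the title text itself.
    return $itemId;
}

function mediaInfoIdToPageId( string $mediaInfoId ): int {
    // "M1234" -> 1234: for MediaInfo, the numeric part of the ID is assumed
    // to be the page ID. If the page ID changes, this lookup silently points
    // at the wrong (or a nonexistent) page.
    return (int)substr( $mediaInfoId, 1 );
}
```

Because the MediaInfo mapping is purely computed, there is no stored record to consult when it goes stale, which is why a database-backed mapping is suggested above.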

What could be done is fixing the self-reference in the EntityContent at load time. This way, the JSON emitted from Special:EntityData and in dumps would contain the new ID, even though the database doesn't.
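A minimal sketch of that load-time repair, with hypothetical names (this is not the actual patch): after deserializing, the stale embedded ID is overwritten with the ID derived from the current page, so anything serialized from the loaded entity carries the new ID even though the stored JSON does not.

```php
<?php
// Hypothetical sketch: mend a stale self-reference in entity JSON at load
// time. $currentId is the ID computed from the page as it exists now.

function mendEntityId( array $entityJson, string $currentId ): array {
    if ( ( $entityJson['id'] ?? null ) !== $currentId ) {
        // Repair the self-reference instead of failing the load.
        $entityJson['id'] = $currentId;
    }
    return $entityJson;
}
```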

Better, if we want to preserve the correspondence between mediainfo IDs and page IDs, would be to hook into ArticleUndelete and update the mediainfo ID if the page ID has changed. (Despite what its name might suggest, that hook isn't restricted to NS_MAIN, but appears to fire when any page from any namespace is restored.)

That would mean changing the content of the revision, which changes the hash of the revision. While this is not impossible, MediaWiki core doesn't offer any way to update a revision's content after it was saved, and doing so may break expectations.

@Lydia said:

Not my decision to make but those are presumably stable identifiers that reusers of the data can rely on just like Wikidata Q-IDs for Items. In this case you'll probably want to make the data available under both IDs. Wikidata does this with redirects on Items and Lexemes.

Keeping a redirect from the old ID to the new ID (plus rewriting the internal self-reference upon load) is probably the cleanest solution. It does, however, require that mapping to be maintained somewhere: some kind of Wikibase-internal alias mechanism, stored as a table in the database. This fits the model; the problem is mainly maintaining the mapping. Perhaps the ArticleUndelete hook could be used for that.

Perhaps also the code that currently does that check and throws the error could instead check whether the mapping is already recorded, and if not, store it in the DB. And of course fix the ID contained in the Entity object itself.

Note that keeping a table with aliases is similar to but not the same as maintaining the mapping from MediaInfoId to page ID in the database. The alias table would be much smaller. And would only be consulted when the "normal" lookup fails. On the other hand, it introduces a special case, somewhat increasing code complexity.
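The alias-table fallback described above could look roughly like this. All names are hypothetical, not actual Wikibase code; the point is the control flow: the normal computed lookup runs first, and only when it fails is the (much smaller) alias table consulted.

```php
<?php
// Hypothetical sketch of the alias-table fallback. $pagesById stands in for
// the normal page lookup; $aliases maps old MediaInfo IDs to current page IDs.

function lookupPageId( string $mediaInfoId, array $pagesById, array $aliases ): ?int {
    $pageId = (int)substr( $mediaInfoId, 1 );

    // Normal case: the ID still matches the page it was minted for.
    if ( isset( $pagesById[$pageId] ) ) {
        return $pageId;
    }

    // Special case: the page was undeleted with a new ID; consult the aliases.
    return $aliases[$mediaInfoId] ?? null;
}
```

This keeps the common path cheap and confines the extra complexity to the failure branch, at the cost of the special case noted above.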

(...)I wonder though why the page ID wasn't restored in this case - I don't really see a reason for that, may be worth investigating.

The file is a big file (708.4 MB) from https://archive.org/details/20190830_20190830_1349
This may partly explain that (time out?).

Change 533907 had a related patch set uploaded (by Urbanecm; owner: Urbanecm):
[mediawiki/extensions/Wikibase@master] [stopgap] Don't throw an exception on unexpected difference between M* ids

https://gerrit.wikimedia.org/r/533907

Quick fix for the error: just remove the check that triggers the error. Could be converted to a warning, or just a debug message.

Good idea, uploaded a patch.

However, this doesn't solve the conceptual problem of breaking all existing references to the old entity ID, internal and external.

Agreed, but that's a longer-term problem. Still high priority, but not user-facing, at least not to viewers.

@Mholloway said:

What should happen here? Should the entity always retain its original ID even after a page move, or should the entity ID always reflect the current page ID of the page with which it's associated?

[...]
From the Wikibase model perspective, entity IDs should never change. So the original ID needs to be retained. The question is how that could be made to work...

Agreed, but topic for "after at least a stopgap is done", I guess.

(...)I wonder though why the page ID wasn't restored in this case - I don't really see a reason for that, may be worth investigating.

The file is a big file (708.4 MB) from https://archive.org/details/20190830_20190830_1349
This may partly explain that (time out?).

Seems like a good way to go, hopefully, the patch above will work as a stopgap solution, until this is fully investigated.

@Urbanecm the patch prevents the fatal error, but there are other errors if, for example, a user tries to edit a caption.

However, this doesn't solve the conceptual problem of breaking all existing references to the old entity ID, internal and external.

Agreed, but that's a longer-term problem. Still high priority, but not user-facing, at least not to viewers.

I'd count breaking links as user facing... but not as urgent, sure.

The file is a big file (708.4 MB) from https://archive.org/details/20190830_20190830_1349
This may partly explain that (time out?).

That seems unlikely. Looking at the code in the PageArchive class, files are restored first, page revisions are handled after that. A timeout would just abort the process, not lead to a new ID being assigned.

The relevant line of code in PageArchive is this one:

$newid = $article->insertOn( $dbw, $latestRestorableRow->ar_page_id );

The code for restoring the page ID is from 2016, see T28123: Special:Undelete doesn't use ar_page_id. This should "usually" work. But I haven't checked that it does. Maybe it's broken somehow, and every undelete changes the page ID again? Needs investigation, because if that was true, it would greatly aggravate the effect of the MediaInfoID problem.

@Urbanecm the patch prevents the fatal error, but there are other errors if, for example, a user tries to edit a caption.

The new "actual" ID needs to be forced into the Entity data, I commented on the patch to that effect.

Maybe it's broken somehow, and every undelete changes the page ID again? Needs investigation, because if that was true, it would greatly aggravate the effect of the MediaInfoID problem.

I don't think so - at least locally a delete/restore doesn't change the page ID. Obvs it does happen sometimes on production, but I don't know how

The new "actual" ID needs to be forced into the Entity data, I commented on the patch to that effect.

On @Urbanecm 's patch https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Wikibase/+/533907 ? I don't see a comment (scratches head)

The new "actual" ID needs to be forced into the Entity data, I commented on the patch to that effect.

On @Urbanecm 's patch https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Wikibase/+/533907 ? I don't see a comment (scratches head)

Of course, I forgot to hit the big blue button...

Thanks @Cparle and @daniel. Fixed my patch. I felt something like that might be necessary, but I don't have enough Wikibase knowledge to fully judge. Since I had about 10 spare minutes when I thought about writing something to start limiting this bug's scope (speaking of user visibility), I relied on jenkins/reviewers to help me in that area. Thanks!

Patch works per @Cparle. I can't test locally due to a probably unrelated error:

[dd239c1581da0431f6fd1e17] /wiki/Special:Version Error from line 35 of /mnt/c/Users/urban/unsynced/gerrit/mediawiki/extensions/Wikibase/lib/WikibaseLib.entitytypes.php: Class 'Wikibase\DataModel\Entity\ItemId' not found

Backtrace:

#0 /mnt/c/Users/urban/unsynced/gerrit/mediawiki/extensions/Wikibase/repo/includes/WikibaseRepo.php(791): require()
#1 /mnt/c/Users/urban/unsynced/gerrit/mediawiki/extensions/Wikibase/repo/includes/WikibaseRepo.php(406): Wikibase\Repo\WikibaseRepo::getDefaultEntityTypes()
#2 /mnt/c/Users/urban/unsynced/gerrit/mediawiki/extensions/Wikibase/repo/includes/WikibaseRepo.php(563): Wikibase\Repo\WikibaseRepo::newInstance()
#3 /mnt/c/Users/urban/unsynced/gerrit/mediawiki/extensions/Wikibase/repo/RepoHooks.php(104): Wikibase\Repo\WikibaseRepo::getDefaultInstance()
#4 /mnt/c/Users/urban/unsynced/gerrit/mediawiki/core/includes/Hooks.php(174): Wikibase\RepoHooks::onSetupAfterCache()
#5 /mnt/c/Users/urban/unsynced/gerrit/mediawiki/core/includes/Hooks.php(202): Hooks::callHook(string, array, array, NULL)
#6 /mnt/c/Users/urban/unsynced/gerrit/mediawiki/core/includes/Setup.php(794): Hooks::run(string)
#7 /mnt/c/Users/urban/unsynced/gerrit/mediawiki/core/includes/WebStart.php(81): require_once(string)
#8 /mnt/c/Users/urban/unsynced/gerrit/mediawiki/core/index.php(39): require(string)
#9 {main}

Config is at https://github.com/urbanecm/wikifarm-config, fwiw

@Urbanecm ItemId is a class implemented in the dedicated data-model repository - make sure you have wikibase installed correctly (composer, submodules).

Thanks, running submodule update and composer install.

It appears the call to EntityLookup::getEntity() was introduced relatively recently (2019-08-05), in https://gerrit.wikimedia.org/r/527560 for T229572: Write a new js config var containing the MediaInfo entity on the File page
Interestingly enough, this was done to gain access to "wbEntity" on mw.config, which is a deprecated approach in Wikibase that was recently converted to the hook approach there; the hook approach was removed from MediaInfo in this same patch.

...digging...

The javascript hook approach doesn't work for MediaInfo, because the hook doesn't fire if a MediaInfo item doesn't exist for a File page, which is the case whenever no structured data has been added to the page.

AFAICS the introduction of EntityLookup::getEntity() has revealed a problem rather than causing one. The problem is that MediaInfo ids for a slot on a page are based on the id of the page, and if the page id changes so does the MediaInfo id

IMO @Urbanecm's patch is fine for now (provided I can fix the unit tests), but we need to think about the hard-coding of the relationship between page id and MediaInfo id

@Pablo-WMDE this bit of code in WikibaseMediaInfoHooks.php is what triggers the error:

$entityId = $this->entityIdFromPageId( $pageId );

$wbRepo = WikibaseRepo::getDefaultInstance();
$entityLookup = $wbRepo->getEntityLookup();
$entity = $entityLookup->getEntity( $entityId );

If, for example, I have a page with id 1234, then data for the MediaInfo slot on that page is stored as a JSON blob, and that JSON includes the field "id": "M1234". If the page id subsequently changes to 1235, the JSON blob isn't updated. The MediaInfo *data* will still be found by the getEntity() call (because the SQL used to find it is generated with the condition 'page_id' => $entityId->getNumericId(); see MediaInfoEntityQuery.php::selectRows()), but the ids don't match, hence the exception
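The mismatch that produces the fatal can be reproduced in miniature like this. The function name and shape are illustrative, not the exact RevisionBasedEntityLookup code: the lookup is keyed by the ID computed from the *new* page ID, while the stored JSON still carries the ID minted from the *old* page ID, so the consistency check throws.

```php
<?php
// Hypothetical reduction of the consistency check that triggers this error.
// $expectedId is computed from the current page ID; $entityJson is the stored blob.

function checkEntityId( string $expectedId, array $entityJson ): void {
    if ( $entityJson['id'] !== $expectedId ) {
        throw new RuntimeException(
            "Revision belongs to {$entityJson['id']} instead of expected $expectedId"
        );
    }
}
```

With a page whose ID changed from 1234 to 1235, checkEntityId( 'M1235', [ 'id' => 'M1234' ] ) throws, mirroring the "Revision ... belongs to M77688146 instead of expected M81625979" message in the task description.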

zeljkofilipin lowered the priority of this task from Unbreak Now! to High.Sep 3 2019, 1:29 PM
zeljkofilipin updated the task description. (Show Details)

Lowered priority since it's not blocking the train any more.

@Cparle

AFAICS the introduction of EntityLookup::getEntity() has revealed a problem rather than causing one. The problem is that MediaInfo ids for a slot on a page are based on the id of the page,

Well put, no discussion on that.

IMO @Urbanecm 's patch is fine [...] The MediaInfo *data* will still be found by getEntity() call

The patch is one to Wikibase, affects all entity types, and removes an exception that prevented the storage layer from handing out an EntityRevision for an entity that does not match the one it was asked for. We find the consequences of that change quite profound compared to the expected gain.
We would prefer a patch, and are also looking into the question, that enables users to "at least view the page" while keeping the stopgap closer to the culprit.

@Pablo-WMDE if you change your mind let me know, and I'll fix up the broken unit tests

Just realized this affects the API, when searching via title, as well. e.g.

https://commons.wikimedia.org/w/api.php?action=wbgetentities&titles=File:Bolsonaro_with_Israeli_PM_Benjamin_Netanyahu,_Tel_Aviv,_31_March_2019.jpg&sites=commonswiki

{
    "error": {
        "code": "internal_api_error_Wikibase\\Lib\\Store\\BadRevisionException",
        "info": "[XW6FvQpAMEYAACeJrDQAAABG] Caught exception of type Wikibase\\Lib\\Store\\BadRevisionException",
        "errorclass": "Wikibase\\Lib\\Store\\BadRevisionException"
    },
    "servedby": "mw1235"
}

Lowered priority since it's not blocking the train any more.

I raised the priority intentionally above, see history.

@Urbanecm the patch prevents the fatal error, but there are other errors if, for example, a user tries to edit a caption.

That's the intention. I want to have it deployed soon-ish, to avoid fatals on viewing, which are more noticeable than exceptions on editing. Then we/you can work on fixing the core issue later.

zeljkofilipin raised the priority of this task from High to Unbreak Now!.Sep 3 2019, 4:09 PM

I raised the priority intentionally above, see history.

Oops, sorry, I thought the priority had remained UBN after I initially set it.

Change 534218 had a related patch set uploaded (by Pablo Grass (WMDE); owner: Pablo Grass (WMDE)):
[mediawiki/extensions/WikibaseMediaInfo@master] Entity lookup: mend old ids when loading

https://gerrit.wikimedia.org/r/534218

Change 534216 had a related patch set uploaded (by Pablo Grass (WMDE); owner: Pablo Grass (WMDE)):
[mediawiki/extensions/Wikibase@master] EntityRevisionLookup: specific exception for entity id mismatch

https://gerrit.wikimedia.org/r/534216

@Mholloway @Cparle I created two patches which show a less invasive alternative to https://gerrit.wikimedia.org/r/533907.
If you still feel like this would be the right direction for the product please feel free to take over the MediaInfo patch.
The reasoning for my -1 on it is that I think it is the equivalent of sticking a band aid on a wound that was not cleaned yet.
The breach of the assumption inside MediaInfo caused by the diverging page id must be fixed, or the consequences of "temporarily" not making this a fatal are largely unpredictable to me. Also, "temporarily" usually turns out to be a very long time if pressure is not kept on a topic, and fiddling around the edges will take the pressure off.
But by all means, use it if you think it can help the product for the moment; I have too little idea about MediaInfo to take responsibility for it.

Change 534216 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] EntityRevisionLookup: specific exception for entity id mismatch

https://gerrit.wikimedia.org/r/534216

Change 534218 merged by jenkins-bot:
[mediawiki/extensions/WikibaseMediaInfo@master] Entity lookup: mend old ids when loading

https://gerrit.wikimedia.org/r/534218

Change 534853 had a related patch set uploaded (by Jforrester; owner: Pablo Grass (WMDE)):
[mediawiki/extensions/Wikibase@wmf/1.34.0-wmf.21] EntityRevisionLookup: specific exception for entity id mismatch

https://gerrit.wikimedia.org/r/534853

Change 534854 had a related patch set uploaded (by Jforrester; owner: Pablo Grass (WMDE)):
[mediawiki/extensions/WikibaseMediaInfo@wmf/1.34.0-wmf.21] Entity lookup: mend old ids when loading

https://gerrit.wikimedia.org/r/534854

Cherry-picked because this is 97% of all fatals in production. I appreciate that this is a bandage and not a fix.

Change 533907 abandoned by Urbanecm:
[stopgap] Don't throw an exception on unexpected difference between M* ids

Reason:
other patch was proposed

https://gerrit.wikimedia.org/r/533907

Change 534854 merged by Urbanecm:
[mediawiki/extensions/WikibaseMediaInfo@wmf/1.34.0-wmf.21] Entity lookup: mend old ids when loading

https://gerrit.wikimedia.org/r/534854

Change 534853 merged by Urbanecm:
[mediawiki/extensions/Wikibase@wmf/1.34.0-wmf.21] EntityRevisionLookup: specific exception for entity id mismatch

https://gerrit.wikimedia.org/r/534853

Mentioned in SAL (#wikimedia-operations) [2019-09-09T12:51:33Z] <urbanecm@deploy1001> Synchronized php-1.34.0-wmf.21/extensions/Wikibase: ubn patch T231276 (duration: 01m 03s)

Mentioned in SAL (#wikimedia-operations) [2019-09-09T12:55:19Z] <urbanecm@deploy1001> Synchronized php-1.34.0-wmf.21/extensions/WikibaseMediaInfo/: ubn patch T231276 (duration: 00m 58s)

If you think the problem is fixed, feel free to resolve the task.

Cparle claimed this task.

Ok great, closing

It is definitely gone. Thank you very much!

DannyS712 subscribed.

[batch] remove patch for review tag from resolved tasks