Page MenuHomePhabricator

Ensure content of ParserOutput is safe to serialize
Closed, ResolvedPublic

Description

ParserOutput provides two methods to attach arbitrary data: setProperty() and setExtensionData(). We need to ensure the data passed into these methods is safe to serialize, for two reasons:

  • ParserOutput should be kept small. Serializing arbitrary objects can easily drag in large amounts of data, causing issues with cache capacity.
  • We want to start using JSON for serializing the ParserOutput. To achieve this, only plain data can be supported here. That is, the data must not contain objects (beyond anonymous objects).

To avoid production errors such as T264363, attempts to attach complex objects to ParserOutput should be logged as errors (or deprecation warnings?), but should not throw an exception for a while.

Found in logstash:

  • $.ExtensionData.pageImages.13.frame.link-title (T266251)
  • $.ExtensionData.GeoDataCoordsOutput (T266248)
  • $.ExtensionData.TemplateDataStatus (T266252)
  • $.ExtensionData.kartographer (T266260)
  • $.ExtensionData.referenced-entities.0 (T266263)
  • $.JsConfigVars.entityTitle (T267377)
  • $.ExtensionData.translate-translation-page.sourcepagetitle (T266268)

Also noted in Semantic MediaWiki: issue 4868

Related Objects

Event Timeline

daniel triaged this task as High priority.Oct 2 2020, 8:14 AM
daniel created this task.

We have been serializing things into JSON EventBus for awhile, we have a battle-proven piece of code for this kind of validation: see EventBus::validateJSONSerializable. It might not cover all cases since we have good control over what we sent in there, but should be an ok start. Do you think we could port it up to FormarJson and reuse in EventBus?

Change 631801 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[mediawiki/core@master] Check if non-JSON-serializable data passed to ParserOutput

https://gerrit.wikimedia.org/r/631801

Change 631801 merged by jenkins-bot:
[mediawiki/core@master] Check if non-JSON-serializable data passed to ParserOutput

https://gerrit.wikimedia.org/r/631801

The patch above added some debug logging into setProperty and setExtensionData methods. After it's deployed, we need to look if any non-serializable data is passed in there.

I went through all the ParserOutput member variables and it seems like unless setProperty or setExtensionData are doing something brutally bad, we should be fine.

Change 635071 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[operations/mediawiki-config@master] Enable warn+ logging for ParserCache channel

https://gerrit.wikimedia.org/r/635071

Change 635071 merged by jenkins-bot:
[operations/mediawiki-config@master] Enable warn+ logging for ParserCache channel

https://gerrit.wikimedia.org/r/635071

After deploying to beta and to testwiki revealed a bunch of non-JSON-serializable data pithing ExtensionData: https://logstash-beta.wmflabs.org/goto/de000f31b6d6abad639c7ce039917826 and https://logstash.wikimedia.org/goto/e0806c65d45b32bc0d6ca307bf9c76bc

reveals the following properties until now:

  • $.ExtensionData.pageImages.13.frame.link-title
  • $.ExtensionData.GeoDataCoordsOutput
  • $.ExtensionData.referenced-entities.0
  • $.ExtensionData.TemplateDataStatus
  • $.JsConfigVars.entityTitle
  • $.ExtensionData.kartographer

These all indicate different extensions placing instances of classes into extension data. These can not be JSON-serialied. Extensions need to be fixed to only put arrays into it's data.

Reedy renamed this task from Ensure content of ParserOutput is safe to serialize. to Ensure content of ParserOutput is safe to serialize.Oct 22 2020, 11:33 PM

Change 635906 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[mediawiki/core@master] Bikesheeding demo: safe ParserOutput extension data

https://gerrit.wikimedia.org/r/635906

Change 635906 merged by jenkins-bot:
[mediawiki/core@master] Safe ParserOutput extension data and JsonUnserializable helper.

https://gerrit.wikimedia.org/r/635906

Pchelolo claimed this task.

After deploying to group0 I see no logs about JSON serialization. https://logstash.wikimedia.org/goto/acb6acc897690baf1c2ebe23fe6b48e8