Page MenuHomePhabricator

Media html read view considerations
Open, MediumPublic

Description

While T270150 exists for the performance of the media css, the html itself should be examined for bloat. Since this is going to be rolled out to all wikis before Parsoid's general html, any considerations we have regarding competing editing vs viewing needs should be battled out beforehand.

See https://gerrit.wikimedia.org/r/c/mediawiki/core/+/747219/3#message-ddcbc47f90556355c0da8001cb1c7383823f0bc7

@Krinkle wrote:

This seems quite a significant amount of data to parse/allocate in DOM attributes on large articles (transfer size should be minimal given gzip and nearby anchor in the common case), but overall DOM memory may be non-trivial. Worth measuring emperically on various devices/browsers/articles, I think.

Event Timeline

Arlolra triaged this task as Medium priority.Dec 17 2021, 9:54 PM
Arlolra updated the task description. (Show Details)

Change 763861 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/core@master] Revert \"Add \"resource\" attribute to img tags\"

https://gerrit.wikimedia.org/r/763861

Change 763861 merged by jenkins-bot:

[mediawiki/core@master] Revert \"Add \"resource\" attribute to img tags\"

https://gerrit.wikimedia.org/r/763861

Krinkle updated the task description. (Show Details)
Krinkle moved this task from Limbo to Watching on the Performance-Team (Radar) board.
Krinkle subscribed.

https://gitlab.wikimedia.org/jgiannelos/parsoid-readview-benchmark are the scripts @Jgiannelos is using for T272331, that can maybe be repurposed for the analysis here.

@ihurbain suggested maybe using a patchdemo https://patchdemo.wmflabs.org/ with wgParserEnableLegacyMediaDOM set vs production to do the comparison. Either that, or we need to dig up the vms used in T266149

Change 828056 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/core@master] [DNM] Disable wgParserEnableLegacyMediaDOM

https://gerrit.wikimedia.org/r/828056

@ihurbain suggested maybe using a patchdemo https://patchdemo.wmflabs.org/ with wgParserEnableLegacyMediaDOM set vs production to do the comparison.

Alas, the proxying that patchdemo offers is based on MobileFrontendContentProvider, which makes parse requests to the production action api,
https://github.com/wikimedia/mediawiki-extensions-MobileFrontendContentProvider/blob/master/includes/MwApiContentProvider.php#L92

so, the patch doesn't get used when rendering the content.

Test wiki on Patch demo by Arlolra using patch(es) linked to this task was deleted:

https://patchdemo.wmflabs.org/wikis/607b0f0fb3/w/

Change 828056 abandoned by Arlolra:

[mediawiki/core@master] [DNM] Disable wgParserEnableLegacyMediaDOM

Reason:

https://gerrit.wikimedia.org/r/828056

@ihurbain suggested maybe using a patchdemo https://patchdemo.wmflabs.org/ with wgParserEnableLegacyMediaDOM set vs production to do the comparison.

Alas, the proxying that patchdemo offers is based on MobileFrontendContentProvider, which makes parse requests to the production action api,
https://github.com/wikimedia/mediawiki-extensions-MobileFrontendContentProvider/blob/master/includes/MwApiContentProvider.php#L92

so, the patch doesn't get used when rendering the content.

Sad! But, thanks for testing this - we'll know for next time!

On Oct 7, the Content-Transform-Team met with the Performance-Team to discuss the performance impact of switching to Parsoid. Metrics that were identified as important to measure for read views were time to first render and, as here, overall DOM memory. Since, in T314318, we've already deployed the media structure changes to a number of wikis, we should be able to use the existing dashboards to determine the impact that's had compared to the baseline legacy output.

Since, in T314318, we've already deployed the media structure changes to a number of wikis

The description of that task has the timeline of when the changes were deployed, for the analysis here.

The existing dashboards seem like they can be found from https://grafana.wikimedia.org/d/cZgMg49Wz/performance-summary

The only one that looks like it breaks down by domain is https://grafana.wikimedia.org/d/000000282/webpagereplay-drilldown but doesn't contain itwiki or any of the newly enabled wikis.

Hi @Arlolra you are right, we are only testing a couple of pages on a couple of wikis. Sync with me upcoming wikis where you will do the switch and a couple of pages that you thing is representable, then I can add those ASAP so we have before/after.

FYI: The correct dashboard for being able to drilldown all synthetic tests is https://grafana.wikimedia.org/d/IvAfnmLMk/page-drilldown - there's better description there to make it a little but easier to understand. I'll update that other link.

The plan in T314318 is to move to group1 wikis next so cawiki and hewiki are probably fine to use. A representative page should have a lot of media on it. I see Barack Obama as an example already on that drilldown dashboard. That seems like it would be ok to use, lots of pictures there. The page for Dog would also work or anything else you can think of in that vein. Thanks

The plan in T314318 is to move to group1 wikis next so cawiki and hewiki are probably fine to use. A representative page should have a lot of media on it. I see Barack Obama as an example already on that drilldown dashboard. That seems like it would be ok to use, lots of pictures there. The page for Dog would also work or anything else you can think of in that vein. Thanks

I think you might be looking in the pages for enwiki, not cawiki or hewiki. We currently monitor the following three URLs on ca.wikipedia.org, from the Page drilldown dashboard in Grafana:

Screenshot 2022-12-09 at 19.52.33.png (366×926 px, 67 KB)

I think you might be looking in the pages for enwiki, not cawiki or hewiki

Sorry, no, it's worse than that. What I was suggesting was adding pages for Barack Obama and Dog on cawiki and hewiki because I failed to note that you were already monitoring any pages at all on cawiki.

We currently monitor the following three URLs on ca.wikipedia.org

I think those pages are sufficient for our purposes here. I'll get the media changes deployed to cawiki as soon as possible.

Change 866653 had a related patch set uploaded (by Arlolra; author: Arlolra):

[operations/mediawiki-config@master] Disable wgParserEnableLegacyMediaDOM on cawiki

https://gerrit.wikimedia.org/r/866653

Change 866653 merged by jenkins-bot:

[operations/mediawiki-config@master] Disable wgParserEnableLegacyMediaDOM on cawiki

https://gerrit.wikimedia.org/r/866653

Mentioned in SAL (#wikimedia-operations) [2022-12-15T14:18:31Z] <lucaswerkmeister-wmde@deploy1002> Started scap: Backport for [[gerrit:866653|Disable wgParserEnableLegacyMediaDOM on cawiki (T297984 T314318)]]

Mentioned in SAL (#wikimedia-operations) [2022-12-15T14:20:15Z] <lucaswerkmeister-wmde@deploy1002> lucaswerkmeister-wmde and arlolra: Backport for [[gerrit:866653|Disable wgParserEnableLegacyMediaDOM on cawiki (T297984 T314318)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet

Mentioned in SAL (#wikimedia-operations) [2022-12-15T14:27:49Z] <lucaswerkmeister-wmde@deploy1002> Finished scap: Backport for [[gerrit:866653|Disable wgParserEnableLegacyMediaDOM on cawiki (T297984 T314318)]] (duration: 09m 18s)

We currently monitor the following three URLs on ca.wikipedia.org, from the Page drilldown dashboard in Grafana:

Screenshot 2022-12-09 at 19.52.33.png (366×926 px, 67 KB)

Post-deploy, I've ?action=purged these pages so that they are now rendered with the new structure and we can proceed with analysis.

Centering around the Dec 15 deployment, I started to take a look at what may have changed as a result,

Screenshot 2023-01-13 at 7.41.53 PM.png (1×2 px, 888 KB)

Screenshot 2023-01-13 at 7.38.58 PM.png (1×2 px, 371 KB)

In the above, the number of dom elements increased. Some notes around that are,

  • Additional wrappers around inline media (ex. a > img becomes span > a > img)
  • For some block media, caption are now present in the html but not displayed (suppressed by css display: none), whereas previously they would have been there altogether
  • figures always have a figcaption element, even if there's no content to be rendered in it (though the same was probably true before for the thumbcaption div)
  • The new structure eliminates the thumbinner wrapping div
  • Uses css for the magnify link (a rendered div is eliminated but ::before and ::after sections are added)

There are only document.querySelectorAll( '[typeof*="mw:File"]' ).length = 13 files on the page so a jump of 80 elements would be surprising. Looking at the two other cawiki pages, that increase doesn't hold.

The change might be contributing to the 7% increase in CSS size.

Change 888127 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/core@master] [WIP] Change the default of wgParserEnableLegacyMediaDOM to false

https://gerrit.wikimedia.org/r/888127

Change 888127 merged by jenkins-bot:

[mediawiki/core@master] Change the default of wgParserEnableLegacyMediaDOM to false

https://gerrit.wikimedia.org/r/888127

Change 901294 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/core@REL1_40] Change the default of wgParserEnableLegacyMediaDOM to false

https://gerrit.wikimedia.org/r/901294

Change 901294 merged by jenkins-bot:

[mediawiki/core@REL1_40] Change the default of wgParserEnableLegacyMediaDOM to false

https://gerrit.wikimedia.org/r/901294