
Determine storage requirements for stashing parsoid output for VE edits
Closed, Resolved, Public

Description

The storage backend for stashing Parsoid output for VisualEditor (VE) edits in the page/html endpoint needs to be configurable. The requirements for persistence and latency are still unclear, though.
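A configurable backend generally means the stashing code talks to an abstract key/value interface, and a configuration value picks the implementation. The Python sketch below illustrates that idea only. `StashBackend`, `InMemoryBackend`, `HtmlStash`, `BACKENDS` and the `stash_type`/`stash_duration` config keys are hypothetical names, not MediaWiki's actual classes or settings. The 24-hour default TTL matches the figure used in the outcome below.

```python
import time
from abc import ABC, abstractmethod
from typing import Callable, Dict, Optional, Tuple


class StashBackend(ABC):
    """Hypothetical key/value interface; real backends would wrap Memcached, MySQL, etc."""

    @abstractmethod
    def set(self, key: str, value: bytes, ttl_seconds: int) -> None: ...

    @abstractmethod
    def get(self, key: str) -> Optional[bytes]: ...


class InMemoryBackend(StashBackend):
    """Toy backend that keeps entries in a dict with a per-entry expiry time."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._data: Dict[str, Tuple[bytes, float]] = {}
        self._clock = clock

    def set(self, key: str, value: bytes, ttl_seconds: int) -> None:
        self._data[key] = (value, self._clock() + ttl_seconds)

    def get(self, key: str) -> Optional[bytes]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: evict lazily on read
            return None
        return value


# Hypothetical registry: the config string selects which backend factory to use.
BACKENDS: Dict[str, Callable[[], StashBackend]] = {"memory": InMemoryBackend}


class HtmlStash:
    """Hypothetical stash for Parsoid HTML; the backend is chosen from config."""

    def __init__(self, config: dict) -> None:
        self._backend = BACKENDS[config["stash_type"]]()
        self._ttl = config.get("stash_duration", 24 * 60 * 60)  # default: 24h

    def stash(self, key: str, html: bytes) -> None:
        self._backend.set(key, html, self._ttl)

    def fetch(self, key: str) -> Optional[bytes]:
        return self._backend.get(key)
```

With this shape, switching storage means registering another factory (for example, one wrapping a MySQL table) and changing `stash_type`. The calling code stays the same.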

Outcome

  • On the Cassandra keyspace used by RESTBase for stashing edits, we are seeing about 100 writes per second across all wikis (but only about 10 reads per second, suggesting that roughly 90% of stashed edits are abandoned).
  • At a TTL of 24h, this amounts to about 7 million entries at any given time.
  • Assuming an average of 20 KB per HTML blob, this works out to about 140 GB.
  • Since this is essentially a key/value store, not much extra space is needed for indexes.
  • The storage requirement will be multiplied by the replication factor.
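The arithmetic behind these bullets can be checked in a few lines of Python. This is a back-of-the-envelope sketch. It assumes decimal units (1 KB = 1,000 bytes), ignores index and storage-engine overhead, and uses a replication factor of 3 purely as a hypothetical example, since the task does not state one. The steady-state entry count is write rate × TTL. At exactly 100 writes/s that gives about 8.6 million entries, so the figure of about 7 million corresponds to a sustained rate of roughly 80 writes/s. Both numbers should be read as rough estimates.

```python
SECONDS_PER_DAY = 24 * 60 * 60
KB = 1000  # decimal units assumed, so 20 KB = 20,000 bytes
GB = 1000 ** 3


def live_entries(writes_per_second: float, ttl_seconds: int) -> float:
    """Steady-state entry count: every write stays live for one TTL."""
    return writes_per_second * ttl_seconds


def storage_bytes(entries: float, avg_blob_bytes: float, replication_factor: int) -> float:
    """Payload size only; index overhead is ignored, as for a plain key/value store."""
    return entries * avg_blob_bytes * replication_factor


if __name__ == "__main__":
    print(live_entries(100, SECONDS_PER_DAY))         # 8640000
    print(storage_bytes(7_000_000, 20 * KB, 1) / GB)  # 140.0
    print(storage_bytes(7_000_000, 20 * KB, 3) / GB)  # 420.0 (RF 3 is hypothetical)
```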

Backend tech choice:

  • Replication requirement: we need the stashed data to be available across data centers (DCs). Candidate tech: Memcached via mcrouter, Cassandra, MySQL (Redis as well, but it is being phased out).
  • Retention requirement: if stashed data vanishes, this directly impacts users by causing their edits to fail. We don't want that. Candidate tech: Cassandra, MySQL.
  • Performance requirement: high write rate. Candidate tech: Memcached via mcrouter, Cassandra.
  • Space requirement: we need hundreds of GB with no unexpected eviction. Candidate tech: Cassandra, MySQL.
  • Ease of deployment/maintenance: use what we have. Candidate tech: Memcached via mcrouter, MySQL.

Given the requirements above, the choice is between Cassandra and MySQL. Cassandra would require significant effort (bundling and deploying a driver, implementing an adapter, setting up and running the Cassandra cluster), while using the ParserCache MySQL cluster only requires a small config change. So we should try MySQL first, and pivot to Cassandra if needed.
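The selection argument can be made explicit as set filtering over the candidate lists above. Which requirements count as hard constraints is an assumption of this sketch (replication, retention and space here). Redis is left out because the task notes it is being phased out. The intersection over all five requirements is empty, so some trade-off is unavoidable. Filtering on the hard constraints alone leaves Cassandra and MySQL, and each of them meets one of the two remaining criteria.

```python
from typing import Dict, List, Set

# Candidate technologies per requirement, as listed in the task.
CANDIDATES: Dict[str, Set[str]] = {
    "replication": {"Memcached", "Cassandra", "MySQL"},
    "retention": {"Cassandra", "MySQL"},
    "performance": {"Memcached", "Cassandra"},
    "space": {"Cassandra", "MySQL"},
    "ease": {"Memcached", "MySQL"},
}
HARD: List[str] = ["replication", "retention", "space"]  # assumed must-haves
SOFT: List[str] = ["performance", "ease"]                # assumed nice-to-haves


def shortlist(candidates: Dict[str, Set[str]], required: List[str]) -> Set[str]:
    """Technologies that satisfy every requirement in `required`."""
    return set.intersection(*(candidates[r] for r in required))


def soft_score(tech: str, candidates: Dict[str, Set[str]], soft: List[str]) -> int:
    """Number of soft requirements the technology satisfies."""
    return sum(tech in candidates[r] for r in soft)


if __name__ == "__main__":
    print(sorted(shortlist(CANDIDATES, list(CANDIDATES))))  # []: no option meets everything
    finalists = shortlist(CANDIDATES, HARD)
    print({t: soft_score(t, CANDIDATES, SOFT) for t in sorted(finalists)})
    # {'Cassandra': 1, 'MySQL': 1}
```

The tie on the soft criteria is what the paragraph above resolves by weighing implementation effort, which favors trying MySQL first.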

See T308511: [SPIKE] Determine necessity of edit session continuity during data center switchovers


Event Timeline

Change 802584 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[mediawiki/core@master] ParsoidOutputStash: make storage backend configurable.

https://gerrit.wikimedia.org/r/802584

Change 802584 merged by jenkins-bot:

[mediawiki/core@master] ParsoidOutputStash: make storage backend configurable.

https://gerrit.wikimedia.org/r/802584

daniel claimed this task.
daniel updated the task description.

See the summary in the task description.