Links in comments are a major source of DB queries and presumably contribute to the slowness of the history page and other large lists of changes or events (e.g. T284274). The obvious solution is to format all the comments in a batch, so that page existence checks can be done in a batch.
The complications are:
- Data flow
- Helper classes that intervene between result wrappers and HTML: CommentStore, RevisionRecord, RecentChange, etc.
- DELETED_COMMENT flags
The case studies below are the interesting cases pulled out of a caller survey of core and some extensions.
IndexPager
Direct Linker::formatComment() callers in pagers are mostly pretty easy:
class SomePager extends IndexPager { ... public function preprocessResults( $result ) { $this->formattedComments = $this->commentFormatter->formatRows( 'some_comment', $result ); } public function formatValue( $name, $value ) { switch ( $name ) { case 'some_comment': $formatted = $this->formattedComments[$this->getResultOffset()]; break; } } }
CommentFormatter calls CommentStore on the rows with "some_comment" being passed through as $key. IndexPager::getResultOffset() is trivial. So far, CommentFormatter only depends on CommentStore.
HistoryPager
revComment() is more difficult because it gets the comment text from a RevisionRecord. CommentFormatter in this case would need a RevisionStore, and all the RevisionRecord objects would be created twice, potentially with performance consequences.
class HistoryPager extends ReverseChronologicalPager { ... protected function doBatchLookups() { ... $this->formattedComments = $this->commentFormatter->formatRevisionRows( $this->mResult, false, true, false ); } ... private function historyLine( $row, $next, $notificationtimestamp, $resultOffset ) { ... $s2 = $this->formattedComments[$resultOffset]; ... } }
The parameters to formatRevisionRows() as proposed above are the same as Linker::revComment() except with the RevisionRecord replaced with a ResultWrapper. Linker::revComment() knows about deleted flags, whereas Linker::commentBlock() callers do their own deleted flag handling -- an inconsistency which is more obvious in the new interface.
To avoid unnecessary dependencies, we could have RevisionCommentFormatter which depends on RevisionStore. If you just want to format some strings, there would be another formatter class that is constructible without a RevisionStore.
RevisionList
The RevisionItem/RevisionList/RevDel* hierarchy is somewhat complicated. If we spray some around some @internal annotations, and break b/c, then I think it's feasible. There are some ways to do it in a backwards-compatible way, but I don't think extensions actually use these classes.
(Note: These classes do not provide a generic revision array backend. RevisionItemBase::getHTML() is abstract, firmly tying these classes to the index.php UI.)
Currently RevisionItemBase subclasses receive a RevisionListBase and a row as parameters. Most of the logic is implemented in the item subclasses -- lists don't really know how to get comments out of rows. So we'd need a list method that preprocesses results, and the comment extraction logic would move to there. Then items would receive a formatted comment as a constructor parameter.
A backwards-compatible way to do it would be to mutate the rows before passing them to the item classes, effectively adding a fake formatted_comment column to the result set.
ChangesList
The poorly-named ChangesList::insertComment() takes a RecentChange object and returns an HTML string. It's overridden by AbuseFilter. There is a place to put batch queries: ChangesList::initChangesListRows(). We could do the comment batch there, and store the formatted comments indexed by rc_id or something. ChangesList::insertComment() would call an accessor method which gets formatted comments out of the batch.