Generic tag for current issues that are causing slow server responses or slow/costly client-side payloads.
This is a form of Technical-Debt (use that workboard).
(For the workboard of the Performance Team, see Performance-Team.)
I believe Bernard is investigating. Please unassign if not!
For completeness, another option is the Varnish "x-key" system, which would require two pieces of research. The first is that the implementation of x-key in Varnish appears to be incomplete; the second is that the assignment of appropriate x-keys to URLs is non-trivial as well. There are too many templates used on a page like [[Barack Obama]] to naively assign one x-key to every recursively-included template, so we still need a mechanism to determine which of the templates deserve an x-key, likely based on purge statistics.
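To make that last step concrete, here is a minimal TypeScript sketch of the selection idea; the threshold, names, and statistics source are all hypothetical, and this is not an actual MediaWiki or Varnish implementation:

```
// Hypothetical sketch: assign x-keys only to templates whose purge rate
// justifies tagging, so a template-heavy page doesn't get thousands of keys.

const PURGES_PER_DAY_THRESHOLD = 10; // hypothetical cut-off

// Given the templates recursively included by a page and observed purges
// per day per template, return the x-keys to attach to the page's cached
// response (e.g. via an "xkey" response header).
function xkeysForPage(
  includedTemplates: string[],
  purgesPerDay: Map<string, number>,
): string[] {
  return includedTemplates
    .filter((t) => (purgesPerDay.get(t) ?? 0) >= PURGES_PER_DAY_THRESHOLD)
    .map((t) => `template:${t}`);
}

// A page like [[Barack Obama]] includes many templates, but only the
// frequently purged ones end up as keys:
const keys = xkeysForPage(
  ["Infobox_officeholder", "Cite_web", "Reflist"],
  new Map([
    ["Infobox_officeholder", 40],
    ["Cite_web", 2],
  ]),
);
console.log(keys); // ["template:Infobox_officeholder"]
```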
The number of resource_change and resource_purge events can get extremely high, at times spiking to 10k req/sec.
Another option is to subdivide pages into two categories: "high traffic pages" and "long tail low traffic pages". The latter would effectively be put into a no-cache state: the cache lifetime would be very short, and we would never emit purges for them, relying on natural expiration to deal with vandalism. We would only emit purges for the high-traffic pages.
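A minimal sketch of what that two-tier policy could look like, assuming hypothetical thresholds and TTLs (this is an illustration of the idea, not a concrete proposal):

```
// Hypothetical sketch of the two-tier cache policy described above.

const HIGH_TRAFFIC_VIEWS_PER_DAY = 1000; // hypothetical threshold

interface CachePolicy {
  ttlSeconds: number;  // cache lifetime
  emitPurges: boolean; // whether edits send explicit purge requests
}

function cachePolicy(viewsPerDay: number): CachePolicy {
  if (viewsPerDay >= HIGH_TRAFFIC_VIEWS_PER_DAY) {
    // High-traffic pages: cache long, and purge actively on change.
    return { ttlSeconds: 86400, emitPurges: true };
  }
  // Long-tail pages: effectively no-cache. A very short lifetime means
  // vandalism ages out naturally, so no purge traffic is ever emitted.
  return { ttlSeconds: 300, emitPurges: false };
}
```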
See also:
2013: Upstream jQuery 2.0 changes $.globalEval from eval.call(window) to domEval (i.e. an inline script) for strict mode, and to indirect = eval; indirect() for non-strict mode. Note that MediaWiki never deployed jQuery 2.
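For context, a minimal sketch of the three strategies mentioned above (the shape of each approach, not jQuery's actual source):

```
// eval.call(window, ...): invoke eval with window as the receiver.
function globalEvalLegacy(code: string): void {
  eval.call(window, code);
}

// "domEval": inject an inline <script> element, which the browser
// always executes in the global scope, strict or not.
function domEval(code: string): void {
  const script = document.createElement("script");
  script.text = code;
  document.head.appendChild(script);
  document.head.removeChild(script);
}

// indirect = eval; indirect(...): an indirect eval call evaluates the
// code in the global scope regardless of where it is called from.
const indirect = eval;
function globalEvalIndirect(code: string): void {
  indirect(code);
}
```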
In T368405#9945822, @Sfaci wrote: I have been exploring the changes around the 18th of May a bit more, and there is no change that we can correlate to these events. The service code hasn't been changed this year, and the only changes we have made are related to the Kubernetes configuration, as I mentioned before. The closest change, time-wise, is https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1033405, where some network policies were changed on the 24th of May (I guess it was deployed after that date), but I can't say whether that change could be related to this. @BTullis, any idea here?
Taking a look at the Grafana dashboard for edit-analytics for the last two months, it seems that the latency is pretty stable (there are a couple of peaks, but the rest of the chart is fine). Considering that, I don't know what is happening, but I would say that these events are not related to the service itself. Regarding the timeouts you mentioned, I am just wondering if there is something preventing your app from reaching the service.
I think this can be closed.
Change #1054290 merged by jenkins-bot:
[mediawiki/extensions/MultimediaViewer@master] Remove dead code from 2014 related to fullscreen pre-loading
Change #1054290 had a related patch set uploaded (by Thiemo Kreuz (WMDE); author: Thiemo Kreuz (WMDE)):
[mediawiki/extensions/MultimediaViewer@master] Remove dead code from 2014 related to fullscreen pre-loading
"if the template is unprotected (protected templates are unlikely to be vandalized)."
Change #1053907 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):
[mediawiki/core@master] RefreshLinksJob: collect stats on redundant parses
The nature of this alert:
I've been looking through the graphs and I noticed an increase in the CSS payload on July 5, from 25.2kB to 25.9kB.
I think this could be related to the baseline increase in the fully loaded metric, from 2.28s on July 4 to 2.30s on July 5 (a 20ms increase).
The above error comes from:
```
[email protected][wikidatawiki]> explain SELECT
    rev_id, rev_page, rev_actor,
    actor_rev_user.actor_user AS `rev_user`,
    actor_rev_user.actor_name AS `rev_user_text`,
    rev_timestamp, rev_minor_edit, rev_deleted, rev_len, rev_parent_id, rev_sha1,
    comment_rev_comment.comment_text AS `rev_comment_text`,
    comment_rev_comment.comment_data AS `rev_comment_data`,
    comment_rev_comment.comment_id AS `rev_comment_cid`,
    page_namespace, page_title, page_id, page_latest, page_is_redirect, page_len,
    user_name, page_is_new,
    (SELECT GROUP_CONCAT(ctd_name SEPARATOR ',')
       FROM `change_tag`
       JOIN `change_tag_def` ON ((ct_tag_id=ctd_id))
      WHERE (ct_rev_id=rev_id)) AS `ts_tags`,
    ores_damaging_cls.oresc_probability AS `ores_damaging_score`,
    0.385 AS `ores_damaging_threshold`
FROM `revision` FORCE INDEX (rev_actor_timestamp)
JOIN `actor` `actor_rev_user` ON ((actor_rev_user.actor_id = rev_actor))
JOIN `comment` `comment_rev_comment` ON ((comment_rev_comment.comment_id = rev_comment_id))
JOIN `page` ON ((page_id = rev_page))
LEFT JOIN `user` ON ((actor_rev_user.actor_user != 0) AND (user_id = actor_rev_user.actor_user))
LEFT JOIN `ores_classification` `ores_damaging_cls`
    ON (ores_damaging_cls.oresc_model = 11
        AND (ores_damaging_cls.oresc_rev=rev_id)
        AND ores_damaging_cls.oresc_class = 1)
WHERE actor_name = 'Jarekt'
  AND (page_namespace = 1)
  AND ((rev_deleted & 4) = 0)
ORDER BY rev_timestamp DESC, rev_id DESC
LIMIT 51;
+------+--------------------+---------------------+--------+----------------------------+-----------------------+---------+------------------------------------------+---------+-------------+
| id   | select_type        | table               | type   | possible_keys              | key                   | key_len | ref                                      | rows    | Extra       |
+------+--------------------+---------------------+--------+----------------------------+-----------------------+---------+------------------------------------------+---------+-------------+
|    1 | PRIMARY            | actor_rev_user      | const  | PRIMARY,actor_name         | actor_name            | 257     | const                                    |       1 |             |
|    1 | PRIMARY            | user                | const  | PRIMARY                    | PRIMARY               | 4       | const                                    |       1 |             |
|    1 | PRIMARY            | revision            | ref    | rev_actor_timestamp        | rev_actor_timestamp   | 8       | const                                    | 2300078 | Using where |
|    1 | PRIMARY            | page                | eq_ref | PRIMARY,page_name_title    | PRIMARY               | 4       | wikidatawiki.revision.rev_page           |       1 | Using where |
|    1 | PRIMARY            | ores_damaging_cls   | eq_ref | oresc_rev_model_class      | oresc_rev_model_class | 7       | wikidatawiki.revision.rev_id,const,const |       1 |             |
|    1 | PRIMARY            | comment_rev_comment | eq_ref | PRIMARY                    | PRIMARY               | 8       | wikidatawiki.revision.rev_comment_id     |       1 |             |
|    2 | DEPENDENT SUBQUERY | change_tag          | ref    | ct_rev_tag_id,ct_tag_id_id | ct_rev_tag_id         | 5       | wikidatawiki.revision.rev_id             |       1 | Using index |
|    2 | DEPENDENT SUBQUERY | change_tag_def      | eq_ref | PRIMARY                    | PRIMARY               | 4       | wikidatawiki.change_tag.ct_tag_id        |       1 |             |
+------+--------------------+---------------------+--------+----------------------------+-----------------------+---------+------------------------------------------+---------+-------------+
8 rows in set (0.002 sec)
```
I just had the same issue:
Moving it to "Triaged" for now as Special:Homepage and its overall performance is not our main focus, but also tagging Technical-Debt so that we have a chance to find it when we want to improve GE in that regard.
Thanks @VirginiaPoundstone and @Sfaci for looking into this!
Regarding what @VirginiaPoundstone mentioned, I wanted to add that the delay was confirmed to occur only when the AQS service starts.
@Michael, there was a change on AQS, explained in T366851: gocql startup times increased between v1.2.0 and v1.6.0. After upgrading the gocql library, it seems that startup times for all the services increased.
By now I'm pretty confident that this issue happens in Phab's "Javelin" JavaScript stack (which may allow setting some breakpoints in the browser).
So tackling the "structured tasks mobile preview" fallback seems like the main priority for fixing this now.
Pinging @KStoller-WMF because that requires a product decision for how and when exactly to do that.