Page MenuHomePhabricator

Track how many files receive the {{Original Upload Log}} template per day
Closed, DeclinedPublic5 Estimated Story Points

Description

Motivation
We want to find out, if the FileImporter has influence on the number of files that are moved to Commons. Therefore we need a benchmark how many files are usually moved.

Task

  • Track how many files are receiving the {{Original Upload log}} template on Commons on at least a per day granularity
  • In the comments of this ticket, link to either a metrics.md file or the schema that explains which variables were introduced and what they mean

Notes
We could write a script to query the template link table every day, and compare this with the last day's results. It could live on labs, grafite....

Event Timeline

Lea_WMDE created this task.
Lea_WMDE renamed this task from Track how many files receive the {{NowCommons}} template per day to Track how many files receive the {{Original Upload Log} template per day.Feb 16 2018, 1:27 PM
Lea_WMDE updated the task description. (Show Details)
Lea_WMDE updated the task description. (Show Details)
Charlie_WMDE renamed this task from Track how many files receive the {{Original Upload Log} template per day to Track how many files receive the {{Original Upload Log}} template per day.Feb 27 2018, 3:13 PM
Lea_WMDE set the point value for this task to 5.

@Magnus @TTO are you tracking how many files are moved with CommonsHelper and For the common good? If yes, is this public somewhere?

I have a terrible hack, which has to be run in a browser JavaScript console:

// Run the following line of code, and wait for the script to load:
mw.loader.load("https://en.wikipedia.org/w/index.php?action=raw&ctype=text/javascript&title=MediaWiki:Gadget-morebits.js");

// Then run the following, and wait for it to complete:
var query = {
	action:'query',
	list:'iwbacklinks',
	iwbltitle:'WP:FTCG',
	iwblprefix:'en',
	iwbllimit:'500',
	rawcontinue:''
};
var count = 0;
var iterations = 0;
var recurse = function(apiobj) {
	iterations++;
	count += $(apiobj.getXML()).find("iw").length;
	console.log("iteration " + iterations + ", count = " + count);
	var cont = $(apiobj.getXML()).find("query-continue iwbacklinks").attr("iwblcontinue");
	if (cont) {
		query.iwblcontinue = cont;
		mwapi = new Morebits.wiki.api("doing it", query, recurse);
		mwapi.post();
	} else {
		console.log("https://commons.wikimedia.org/w/api.php?" + Morebits.queryString.create(query));
	}
};
var mwapi = new Morebits.wiki.api("doing it", query, recurse);
mwapi.post();

The output, which counts how many pages link to [[en:WP:FTCG]] (as present in the {{transferred from|...|...|ftcg}} template), gets logged to your browser console. For FtCG that count is 43814 at present. (Having kept track of this number for some time, it seems that only a tiny handful of users are using FtCG these days.)

So the secret is to make the tool transclude {{transferred from}} in all uploaded files, and inside that template, include some kind of link (either local, interwiki or external) which can then be counted using the relevant backlinks module in the API.

I have altered CommonsHelper to add the #commonshelper tag to the upload comment, like so. This is active as of now.

Of course, if WMF were to track somewhere which edit was doe using which OAuth consumer, that would give much better data...

Also, here are all files on Commons that were created in January and contain {{Original upload log}}. (Note: this query might run for a minute or so)

For now we are ok with the metrics given by petscan (see Magnus' comment above)

We needed to dump some of these upload counts, so I thought I would leave breadcrumbs for posterity.

Commonshelper:

select
    count(*) as `num`,
    left(rev_timestamp, 6) as `year_month`
from revision
join revision_comment_temp on revcomment_rev=rev_id
join comment on comment_id=revcomment_comment_id
join change_tag on ct_rev_id = rev_id
join change_tag_def on ct_tag_id = ctd_id
where
    ctd_name = 'OAuth CID: 67' -- OAuth Uploader
    and comment_text like '%#commonshelper%'
group by
    left(rev_timestamp, 6);

FTCG:

select
    count(*) as `num`,
    left(rev_timestamp, 6) as `year_month`
from iwlinks
join page on iwl_from = page_id
join revision on page_id = rev_page
where
iwl_prefix='en'
and iwl_title='WP:FTCG'
and (rev_parent_id is null or rev_parent_id = 0)
group by
    left(rev_timestamp, 6);

MTC!:

select
    count(*) as `num`,
    left(rev_timestamp, 6) as `year_month`
from categorylinks
join page on cl_from = page_id
join revision on page_id = rev_page
where
cl_to = 'Uploaded_with_MTC!'
and (rev_parent_id is null or rev_parent_id = 0)
group by
    left(rev_timestamp, 6);

... and here's a query which addresses the original question, summing all files using the {{Original upload log}} template:

select
    count(*) as `num`,
    left(rev_timestamp, 6) as `year_month`
from templatelinks
join page on tl_from = page_id
join revision on page_id = rev_page
where
tl_from_namespace = 6 -- NS_FILE
and tl_namespace = 10 -- NS_TEMPLATE
and tl_title = 'Original_upload_log'
and (rev_parent_id is null or rev_parent_id = 0)
group by
    left(rev_timestamp, 6);

... and here's a query which addresses the original question, summing all files using the {{Original upload log}} template:

Here's a snapshot of the query and results: https://quarry.wmflabs.org/query/45866