
New Pages Feed: scoring the 'attack' category on draftquality model
Closed, Resolved (Public)

Description

In T195796, we discovered that the ORES extension currently stores scores for only three of the four draftquality classes ('spam', 'vandalism', and 'ok', but not 'attack'). Deriving the 'attack' score from the other three at query time would be prohibitively slow. This task is about making the ORES extension store the 'attack' score in the MediaWiki database as well.

Here is the conversation from T195796:

Filtering on draftquality may not be as straightforward as we'd like, especially for attacks (class 0).

The ORES extension stores 3 rows for every revision scored by draftquality, because it explicitly ignores class 0. The class 0 probability can be deduced from the other rows by subtracting the sum of their probabilities from 1: 1 - (p(OK) + p(SPAM) + p(VANDALISM)). If none of the other classes has oresc_is_predicted === 1, then class 0 is logically the predicted class.

Here is an example of a revision that is probably an attack.

mysql:research@s3-analytics-slave [enwiki]> select * from ores_classification where oresc_model=33 and oresc_rev=845715298;
+-----------+-----------+-------------+-------------+-------------------+--------------------+
| oresc_id  | oresc_rev | oresc_model | oresc_class | oresc_probability | oresc_is_predicted |
+-----------+-----------+-------------+-------------+-------------------+--------------------+
| 232150135 | 845715298 |          33 |           1 |             0.060 |                  0 |
| 232150136 | 845715298 |          33 |           2 |             0.258 |                  0 |
| 232150137 | 845715298 |          33 |           3 |             0.312 |                  0 |
+-----------+-----------+-------------+-------------+-------------------+--------------------+
3 rows in set (0.00 sec)
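Here is a minimal Python sketch that applies the formula above to these three rows to derive the missing class 0 (attack) row. The function name derive_class0 and its input shape are hypothetical and not part of the ORES extension. The stored probabilities have three decimals, so the derived value is only approximate.

```python
from typing import Dict, Tuple


# Hypothetical helper for illustration; not part of the ORES extension.
# rows maps oresc_class -> (oresc_probability, oresc_is_predicted) for classes 1-3.
def derive_class0(rows: Dict[int, Tuple[float, int]]) -> Tuple[float, bool]:
    # p(class 0) = 1 - (sum of the other classes' probabilities)
    p0 = 1.0 - sum(prob for prob, _ in rows.values())
    # Class 0 is predicted iff none of the stored classes is predicted.
    predicted0 = not any(flag == 1 for _, flag in rows.values())
    # Stored probabilities have 3 decimals, so round the result to match.
    return round(p0, 3), predicted0


# The three stored rows for revision 845715298 (model 33 on enwiki).
rev_845715298 = {1: (0.060, 0), 2: (0.258, 0), 3: (0.312, 0)}
print(derive_class0(rev_845715298))  # (0.37, True)
```

For this revision, 1 - (0.060 + 0.258 + 0.312) = 0.370. That is higher than any of the three stored probabilities, which is consistent with the revision probably being an attack.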

Revisions that are attacks can be found with a query like this:

mysql:research@s3-analytics-slave [enwiki]> select oresc_rev, count(oresc_is_predicted) as c, sum(oresc_is_predicted) as s from ores_classification where oresc_model=33 group by oresc_rev having c=3 and s=0 limit 4;
+-----------+---+------+
| oresc_rev | c | s    |
+-----------+---+------+
| 819044721 | 3 |    0 |
| 819086300 | 3 |    0 |
| 819201556 | 3 |    0 |
| 819241003 | 3 |    0 |
+-----------+---+------+
4 rows in set (3.58 sec)
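The sketch below runs the same GROUP BY logic against a small in-memory SQLite table, which shows what the query above selects. The rows are made up for illustration, and SQLite stands in for MySQL. The aggregate expressions are repeated in HAVING instead of relying on column aliases.

```python
import sqlite3

# In-memory stand-in for enwiki's ores_classification table (made-up rows).
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE ores_classification (
        oresc_rev INTEGER, oresc_model INTEGER, oresc_class INTEGER,
        oresc_probability REAL, oresc_is_predicted INTEGER)"""
)
conn.executemany(
    "INSERT INTO ores_classification VALUES (?, ?, ?, ?, ?)",
    [
        # rev 100: class 2 is predicted, so it is not an attack
        (100, 33, 1, 0.100, 0), (100, 33, 2, 0.800, 1), (100, 33, 3, 0.050, 0),
        # rev 200: no stored class is predicted, so class 0 (attack) must be
        (200, 33, 1, 0.060, 0), (200, 33, 2, 0.258, 0), (200, 33, 3, 0.312, 0),
    ],
)

# Revisions with exactly 3 stored rows and no predicted class among them.
attacks = conn.execute(
    """SELECT oresc_rev, COUNT(oresc_is_predicted), SUM(oresc_is_predicted)
       FROM ores_classification
       WHERE oresc_model = 33
       GROUP BY oresc_rev
       HAVING COUNT(oresc_is_predicted) = 3 AND SUM(oresc_is_predicted) = 0"""
).fetchall()
print(attacks)  # [(200, 3, 0)]
```

Every scored revision has to be grouped and aggregated before it can be filtered. That cost is what makes this approach slow on a large table.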

Storing class 0 for draftquality would take up more space but make filtering much easier.
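If class 0 were stored, the attack filter would reduce to a plain predicate on one row per revision, with no grouping. Here is a minimal sketch using an in-memory SQLite table with made-up rows. It assumes that class 0 rows are stored the same way as the other classes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE ores_classification (
        oresc_rev INTEGER, oresc_model INTEGER, oresc_class INTEGER,
        oresc_probability REAL, oresc_is_predicted INTEGER)"""
)
conn.executemany(
    "INSERT INTO ores_classification VALUES (?, ?, ?, ?, ?)",
    [
        # rev 200: all four classes stored, class 0 (attack) predicted
        (200, 33, 0, 0.370, 1), (200, 33, 1, 0.060, 0),
        (200, 33, 2, 0.258, 0), (200, 33, 3, 0.312, 0),
        # rev 300: all four classes stored, class 1 predicted
        (300, 33, 0, 0.020, 0), (300, 33, 1, 0.900, 1),
        (300, 33, 2, 0.050, 0), (300, 33, 3, 0.030, 0),
    ],
)

# A single-row predicate per revision; no GROUP BY or SUM needed.
attacks = conn.execute(
    """SELECT oresc_rev FROM ores_classification
       WHERE oresc_model = 33 AND oresc_class = 0 AND oresc_is_predicted = 1"""
).fetchall()
print(attacks)  # [(200,)]
```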

That SUM query probably won't perform well. You could also do this with something like LEFT JOIN ores_classification ON oresc_model=33 AND oresc_rev=rev_id AND oresc_is_predicted=1 WHERE oresc_probability IS NULL. That might perform better, but it would also flag every unscored revision as an attack.
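A toy revision table that includes one unscored revision shows the trade-off of the LEFT JOIN approach. This sketch uses made-up data in SQLite, and its revision table has only a rev_id column. The second query adds an EXISTS check. This is one possible way to exclude unscored revisions, and it costs an extra lookup per revision.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE revision (rev_id INTEGER PRIMARY KEY)")
conn.execute(
    """CREATE TABLE ores_classification (
        oresc_rev INTEGER, oresc_model INTEGER, oresc_class INTEGER,
        oresc_probability REAL, oresc_is_predicted INTEGER)"""
)
# rev 100: scored, class 2 predicted; rev 200: scored, nothing predicted (attack);
# rev 300: not scored at all.
conn.executemany("INSERT INTO revision VALUES (?)", [(100,), (200,), (300,)])
conn.executemany(
    "INSERT INTO ores_classification VALUES (?, ?, ?, ?, ?)",
    [
        (100, 33, 1, 0.100, 0), (100, 33, 2, 0.800, 1), (100, 33, 3, 0.050, 0),
        (200, 33, 1, 0.060, 0), (200, 33, 2, 0.258, 0), (200, 33, 3, 0.312, 0),
    ],
)

# Anti-join: keep revisions that have no predicted row for model 33.
naive = conn.execute(
    """SELECT rev_id FROM revision
       LEFT JOIN ores_classification
         ON oresc_model = 33 AND oresc_rev = rev_id AND oresc_is_predicted = 1
       WHERE oresc_probability IS NULL
       ORDER BY rev_id"""
).fetchall()
print(naive)  # [(200,), (300,)]; the unscored rev 300 is flagged too

# Additionally require at least one stored score for model 33.
scored_only = conn.execute(
    """SELECT rev_id FROM revision
       LEFT JOIN ores_classification
         ON oresc_model = 33 AND oresc_rev = rev_id AND oresc_is_predicted = 1
       WHERE oresc_probability IS NULL
         AND EXISTS (SELECT 1 FROM ores_classification AS s
                     WHERE s.oresc_model = 33 AND s.oresc_rev = rev_id)
       ORDER BY rev_id"""
).fetchall()
print(scored_only)  # [(200,)]
```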

I agree that we should ask the ORES team to store class 0 for draftquality.

Event Timeline

@SBisson: FYI, this task has been created and is in the "To Do" column on the sprint board.

This is the code that explicitly discards class 0.

There are a number of ways to keep class 0 for draftquality. Here are a few:

  1. Only discard class 0 when the class is named 'false'
  2. Only discard class 0 when the model has only 2 classes
  3. Explicitly configure 'draftquality' with storeAllClasses = true

The first two options probably make more sense. Querying models with binary classes is always easy, but it gets hard once a model has more than two classes. The trade-off is storage space vs. ease of use.
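Here is a minimal sketch of the decision logic for option 2, which matches the title of the patch that follows. It includes option 3's storeAllClasses override as a flag. This is hypothetical Python for illustration only. The ORES extension is written in PHP, and its actual code is not reproduced here.

```python
from typing import List


def classes_to_store(class_names: List[str], store_all_classes: bool = False) -> List[int]:
    """Return the class ids (positions in class_names) whose scores get stored.

    Hypothetical sketch: option 2 (drop class 0 only for binary models), plus
    option 3's storeAllClasses override. Not the ORES extension's real code.
    """
    class_ids = list(range(len(class_names)))
    if store_all_classes or len(class_names) > 2:
        # Multi-class model: keep every class so each one can be filtered directly.
        return class_ids
    # Binary model: p(class 0) = 1 - p(class 1), so class 0 is redundant.
    return class_ids[1:]


print(classes_to_store(["false", "true"]))  # [1]
# Only "class 0 = attack" comes from this task; the order of the rest is assumed.
print(classes_to_store(["attack", "OK", "spam", "vandalism"]))  # [0, 1, 2, 3]
```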

I'm fine with any of these solutions. I'm curious what the scoring team thinks (@awight @Ladsgroup).

Change 445402 had a related patch set uploaded (by Sbisson; owner: Sbisson):
[mediawiki/extensions/ORES@master] Store class 0 of models with more than 2 classes

https://gerrit.wikimedia.org/r/445402

The patch above proposes a simple solution. Feel free to comment on the patch or the task.

Change 445402 merged by jenkins-bot:
[mediawiki/extensions/ORES@master] Store class 0 of models with more than 2 classes

https://gerrit.wikimedia.org/r/445402

draftquality should start including class 0 (attack) next week when this code is deployed with the train.

When we backfill the scores in T198982, we should consider rescoring the revisions that were scored without this code.
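A NOT EXISTS check is one way to find revisions that were scored before this change and so lack a class 0 row. This sketch uses made-up rows in SQLite. Model id 33 comes from the production queries above and may differ per wiki (betalabs uses 24).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE ores_classification (
        oresc_rev INTEGER, oresc_model INTEGER, oresc_class INTEGER,
        oresc_probability REAL, oresc_is_predicted INTEGER)"""
)
conn.executemany(
    "INSERT INTO ores_classification VALUES (?, ?, ?, ?, ?)",
    [
        # rev 200: scored before the change, so only classes 1-3 are stored
        (200, 33, 1, 0.060, 0), (200, 33, 2, 0.258, 0), (200, 33, 3, 0.312, 0),
        # rev 400: scored after the change, so all four classes are stored
        (400, 33, 0, 0.020, 0), (400, 33, 1, 0.900, 1),
        (400, 33, 2, 0.050, 0), (400, 33, 3, 0.030, 0),
    ],
)

# Scored revisions that have no class 0 row: candidates for rescoring.
needs_rescore = conn.execute(
    """SELECT DISTINCT c.oresc_rev FROM ores_classification AS c
       WHERE c.oresc_model = 33
         AND NOT EXISTS (SELECT 1 FROM ores_classification AS z
                         WHERE z.oresc_model = 33 AND z.oresc_class = 0
                           AND z.oresc_rev = c.oresc_rev)
       ORDER BY c.oresc_rev"""
).fetchall()
print(needs_rescore)  # [(200,)]
```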

Checked in betalabs: records with draftquality class 0 are now stored in the ores_classification table on enwiki. I'll monitor the number of such records and check production once the change is deployed:

wikiadmin@[enwiki]> select count(*) from ores_classification where oresc_model=24 and oresc_class=0;
+----------+
| count(*) |
+----------+
|        8 |
+----------+
1 row in set (0.00 sec)

and the earliest timestamp of those revisions in recentchanges:

[enwiki]> select min(rc_timestamp)  from recentchanges where rc_this_oldid in (select oresc_rev from ores_classification where oresc_model=24 and oresc_class=0);
+-------------------+
| min(rc_timestamp) |
+-------------------+
| 20180713162353    |
+-------------------+
1 row in set (0.01 sec)

enwiki on betalabs now stores draftquality scores for all four classes:

[enwiki]> select oresc_model, oresc_class, count(*) from ores_classification where oresc_model=24 group by oresc_class;
+-------------+-------------+----------+
| oresc_model | oresc_class | count(*) |
+-------------+-------------+----------+
|          24 |           0 |       99 |
|          24 |           1 |      568 |
|          24 |           2 |      548 |
|          24 |           3 |      218 |
+-------------+-------------+----------+
4 rows in set (0.01 sec)

Checked in production - the records get stored there too.

I do not think I can effectively check this any better than @Etonkovidova has.