Wikipedia:WikiProject Articles for creation/AfC Process Improvement May 2018

{{tracked|T193782}}
{{shortcut|WP:AFCIMPROVE}}
{{notice|'''This page documents the development process for a completed project undertaken by the WMF Growth team from April to October 2018.'''}}
 
The Wikimedia Foundation's [[metawiki:Community_Tech|Community Tech]] and [[mw:Growth|Growth]] teams are extending the [[Special:NewPagesFeed|New Pages Feed]] interface to allow both [[Wikipedia:WikiProject Articles for creation|Articles for Creation]] (AfC) reviewers and [[Wikipedia:New pages patrol|New Page Patrol]] (NPP) reviewers to prioritize pages for review using quality and copyright violation scores. This work will take place during May and June 2018.
 
== Summary ==
 
<gallery heights=200px widths=200px>
File:AfC list 2018-05-17.png|none|thumb|AfC list
File:AfC menus concept A 2018-05-17.png|none|thumb|AfC menus: concept A
File:AfC menus concept B 2018-05-17.png|none|thumb|AfC menus: concept B
File:NPP menus concept A 2018-05-17.png|none|thumb|NPP menus: concept A
</gallery>
 
 
And with respect to the third part of the project -- copyvio -- we are taking some time as the engineers work on the beginning parts of this project to think about the right way to implement this capability in the New Pages Feed, taking into account all the thoughts on the [[Wikipedia talk:WikiProject Articles for creation/AfC Process Improvement May 2018|Talk page]].
 
=== Update 2018-06-14: initial screenshot from work in progress ===
The Community Tech team has made progress over the past week on adding draft pages to the New Pages Feed. We have been working on the same Phabricator tasks as mentioned in last week's update ([[phab:T195545]] and [[phab:T195924]]), and we added one additional task to the to-do list: [[phab:T197054|adding a feature flag]]. This will allow us to wait until a cohesive set of changes is developed for the New Pages Feed before exposing any of them to reviewers -- as opposed to the feed changing in little, incomplete ways over time.
 
At this point, we do have a glimpse of how the user interface changes are shaping up -- though it is still a work in progress. The image below is from a developer's local environment, meaning these changes are not yet available on the actual wikis for anyone to see.
 
'''The screenshot shows a couple of important points of progress:'''
* The new toggle in which a reviewer will select whether they are doing "New Page Patrol" or "Articles for Creation".
* With "Articles for Creation" selected, the feed is restricted to drafts. The four statuses of where a draft can be in its lifecycle are available as filters ("Unsubmitted", "Awaiting review", "Under review", "Declined") as well as an option to view "All" drafts.
 
'''It also shows a few things that are incomplete:'''
* Next to the word "Showing", the filters selected in the menu will be listed, as opposed to saying "reviewed, unreviewed".
* The new sorting options for submission and declined dates have not yet been implemented.
* The developer's environment only has three drafts for testing, as opposed to the thousands that exist in reality.
* Most drafts would not have "No categories".
* The ORES and copyvio work has not yet been undertaken.
 
Please take a glance at the screenshot and add any of your reactions to this [[Wikipedia talk:WikiProject Articles for creation/AfC Process Improvement May 2018|project's talk page]]. Sometimes seeing something take shape can inspire thoughts that wouldn't have occurred before, and we definitely want to get a sense for whether this feels like it's on the right track.
 
[[File:New Pages Feed in progress 2018-06-14.png|center|Screenshot from developer environment on New Pages Feed work in progress]]
 
=== Update 2018-06-21: now able to filter drafts by state in testing environment ===
Over the past week, the Community Tech team has mostly been working on [[phab:T195924]], which is about making it possible to filter drafts by their state ("Unsubmitted", "Awaiting review", "Under review", "Declined", "All"). The team has stood up the software in a testing environment as they develop it, and we have been noting issues in Phabricator as we test out the changing capabilities.
 
The next major item the team will be working on is [[phab:T195547|making it possible to sort drafts by their submitted and declined dates]].
 
=== Update 2018-06-28: team update and beginning work with ORES ===
There are four main topics in this update:
# Collaboration team involvement
# Current work on sorting options
# Upcoming testing
# Beginning work with ORES
 
==== Collaboration team involvement ====
We wanted to let everyone know about a team assignment change that will hopefully help this project be completed sooner. So far, the engineering on this project has been done by the [[metawiki:Community_Tech|Community Tech team]]. That team has been working on the first major part of the project, which is to add drafts to the New Pages Feed, and make the feed filterable by state and sortable by date. Starting next week, a different Wikimedia Foundation team, the [[mw:Collaboration|Collaboration team]], will be completing the second and third parts of the project, which are adding ORES scores and adding copyvio detection. The Collaboration team was the team that [[mw:Edit_Review_Improvements|added ORES scores to the Recent Changes feed]], giving them good experience using ORES scores and working with the various feeds in MediaWiki. I ({{u|MMiller (WMF)}}) will continue to be the product manager for this work. Community Tech has been doing great work so far, and we're being careful to transfer their knowledge to Collaboration so that the project continues smoothly.
 
==== Current work on sorting options ====
The last item that the Community Tech team is working on with this project, before the Collaboration team begins their work, is [[phab:T195547]], which will make it possible to sort AfC drafts by their most recent date of submission or most recent date they were declined, in addition to the original date they were created. That is the main work item currently underway this week and next.
 
==== Upcoming testing ====
Now that the initial work to add drafts to the New Pages Feed is largely complete, we are setting up our ability to rigorously test the new functionality. We are working to surface the new features in the [[testwiki:Main_Page|Test Wiki]] next week. Once the new features are there, we will post another update asking reviewers to try them out and reply with thoughts and bugs. At that point, reviewers who are testing might determine that the simple addition of drafts to the New Pages Feed, even without ORES and copyvio, is enough of an improvement that it could be put into production on English Wikipedia.
 
==== Beginning work with ORES ====
As mentioned above, next week the Collaboration team will begin [[phab:T196178|the work to integrate ORES models]] into the New Pages Feed. Now that the ORES work will be beginning, I wanted to resurface a previous conversation and a decision we've made about how to proceed. In the [[Wikipedia:WikiProject Articles for creation/AfC Process Improvement May 2018#Front-end|project update from 2018-05-17]], we posted wireframes for what we called "Concept A" and "Concept B".
* Concept A: reviewers would be able to choose specific "Predicted class" categories (Stub, Start, C-class, etc.) and specific "Predicted issues" categories (Spam, Attack, etc.)
* Concept B: instead of choosing specific categories, reviewers would choose from a smaller set of structured recommended options, e.g. "Likely high quality" or "Likely low quality".
In the discussion [[Wikipedia talk:WikiProject Articles for creation/AfC Process Improvement May 2018#Asking for thoughts on the update from 2018-05-17|on the talk page]], some reviewers preferred Concept A because it gives reviewers more control, and some preferred Concept B because it provides clearer recommendations and less opportunity for mis-using the model scores. We have decided to implement Concept A because we believe it will be a good stepping stone to help reviewers figure out whether Concept B is better, and if so, what rules should be used for the structured options of Concept B. In other words, by implementing Concept A, reviewers will have the opportunity to experiment with the ORES scores, decide whether Concept B is preferred, and then develop the rules for it. From the engineering perspective, having implemented Concept A, it will be relatively easy to subsequently implement Concept B.
 
Please do post on the talk page with reactions, questions, or any other thoughts.
 
=== Update 2018-07-05: revisiting copyvio and setting up testing environment ===
 
Now that there are two teams working on this project, Community Tech and Growth, there are a handful of interesting updates:
 
* The Community Tech team has [[phab:T195547|continued to work on making drafts sortable]] by their submission and declined dates.
* The Growth team has started work on surfacing ORES scores for "predicted class" and "predicted issues". That work is happening in two Phabricator tasks: [[phab:T198748|one for the back-end]], and [[phab:T198747|one for the front-end]].
* Both teams are working to set up a useful testing environment so that community members can test out this functionality before it becomes part of English Wikipedia. Although my previous update predicted that would be available this week, it has taken longer than expected to do it right, and the teams are still working on it.
* We talked with the community developers behind CopyPatrol and Earwig's Copyvio Detector to get their perspectives on our best path forward for adding copyvio detection to the New Pages Feed, while staying inside our technical limitations and licensing limits with outside vendors. The notes from those conversations are on [[phab:T193809|this Phabricator task]]. Next week, the Growth team will incorporate that information as they plan the copyvio phase of this project.
* Some of the code refactoring work that the teams did for the New Pages Feed [[Wikipedia:Village pump (technical)#Issue with auto-patrol?|accidentally caused a bug]] in which pages created by autopatrolled editors were present in the New Pages Feed, instead of skipping the feed. That bug was reported on Friday, June 29, and was fixed on Monday, July 2.
 
=== Update 2018-07-12: ORES work and copyvio planning ===
 
If any AfC or NPP reviewers will be at Wikimania next week, please let me know! I'm hoping to meet some members of this community in person.
 
We're now formally testing the components of this project in our testing environments. As I've said in previous updates, as soon as we're technically able to do so, I'll ask community members to take some time to test things out as well.
 
Over the past week, the Community Tech team has continued the work to make drafts sortable by their submission and declined dates. And the Growth team has been writing the code to incorporate ORES scores into the New Pages Feed, and most of that code is now under review before it makes its way to the testing environment.
 
The Growth team has also learned a lot about using Google and Turnitin for copyvio detection, and has had multiple architectural conversations this week to narrow in on an approach.
 
=== Update 2018-07-20: testing, ORES work, and copyvio benchmarking ===
 
Over the past week, the Growth team has finished writing most of the components necessary for applying ORES scores to pages in the New Pages Feed, and along with the filters for the state of drafts, those components are now in our internal testing environments where QA staff are ironing out bugs.
 
We also conducted a comparison of the two main services that English Wikipedia uses for copyvio detection: Google search (used by [[toolforge:copyvios/|Earwig's Copyvio Detector]]) and Turnitin (used by [[toolforge:copypatrol/en|CopyPatrol]]). The objective was to help us understand how different the two services are in terms of their results. I'll be assembling the results and posting that in a coming update. We will use that information to help decide which service to use for New Pages Feed, in addition to considerations around the usage limits for those services.
 
=== Update 2018-08-06: ready for community testing ===
 
Starting today, everyone is welcome to test out the Growth team's progress on the [[testwiki:Special:NewPagesFeed|'''New Pages Feed using Test Wiki''']]! This has been a long time coming, and our team is excited that you'll be able to get your hands on the work so far. Please make sure to read the "'''[[Wikipedia:WikiProject Articles for creation/AfC Process Improvement May 2018#How to test|How to test]]'''" section below to configure your account. The idea here is that we want to get the reactions and thoughts of AfC and NPP reviewers on an ongoing basis to make sure that we continue to build something useful. Going forward, we'll continue to push updates of the software to Test Wiki for everyone to try out as soon as possible. I'll post here when there is something new to try.
 
==== Important notes about testing ====
 
* '''Rough edges''': since this is a testing environment, we'll sometimes be pushing code even before the team's own QA engineer has had a chance to thoroughly test. That means the work will have rough edges, and sometimes bugs will slip through. Please point them out to us so we can fix them! We're hoping that by putting this work in a place where the community can see it earlier, we'll save everyone time in the long run.
* '''Right now, you'll be able to try out these new capabilities:'''
** All drafts are in the New Pages Feed under the "Articles for Creation" toggle at the top.
** It is possible to filter to just those drafts of a given state in the AfC process (Unsubmitted, Awaiting review, Under review, Declined).
** It is possible to sort drafts by their "submitted date" and "declined date", in addition to their "created date".
** The New Pages Feed for NPP should behave exactly as usual, with no changes.
* '''These capabilities are not yet part of the testing environment, but will be in coming weeks:'''
** Listing the draft's AfC state and dates with its entry in the feed.
** ORES scores for "predicted issues" and "predicted class". This code is mostly written and in the process of being merged and reviewed.
** Copyvio detection. The code here is starting to be written. I will post a separate update about our plans and decisions on this part of the project.
 
==== How to test ====
 
* Look at, sort, and filter the [[testwiki:Special:NewPagesFeed|New Pages Feed]]. This can be done even without logging in.
* Try your AfC reviewing workflow with the AFCH gadget. To do this, you will need to do two things:
*# Log in to Test Wiki as an autoconfirmed user. I expect many of you who will want to do testing are already autoconfirmed in Test Wiki. If you find that you're not, or you are unable to use the AFCH gadget, let me know on my [[:en:User_talk:MMiller_(WMF)|User talk page]], and I will change your user group.
*# Turn on the "Yet Another AFC Helper Script" gadget under [[testwiki:Special:Preferences#mw-prefsection-gadgets|Preferences --> Gadgets]].
* The Test Wiki is currently populated with a couple thousand articles and a few dozen drafts that you'll be able to see in the feed. You are also welcome to create new drafts via the [[testwiki:Wikipedia:Article_wizard/CreateDraft|Article Wizard]]. Note that in Test Wiki, the Article Wizard [[testwiki:Wikipedia:Article_wizard/CreateDraft|is found at a different URL]] than in English Wikipedia.
* Since this is just a testing environment, the content or references in a draft are not important.
 
==== Giving feedback ====
 
* You can post any thoughts, ideas, or comments on [[:en:Wikipedia_talk:WikiProject_Articles_for_creation/AfC_Process_Improvement_May_2018#Testing_feedback|this project's talk page]].
* You can also create [[phab:|Phabricator]] tasks. If you use Phabricator, you can either tag me or tag the Growth Team so that we see the ticket you create.
* Here are some of the questions we're hoping to learn about from this testing:
** Is this a better way to find and prioritize AfC drafts for review than the current method?
** Are there important parts of the AfC workflow that are not being captured?
** Have there been any undesired changes to the workflow for New Page Patrol?
** Would it be useful to add this capability to the New Pages Feed in English Wikipedia even ''before'' the ORES and copyvio elements are ready?
 
=== Update 2018-08-08: new date features in test environment ===
 
We've deployed a few changes to the [[testwiki:Special:NewPagesFeed|testing environment]]. Please check them out and [[:en:Wikipedia_talk:WikiProject_Articles_for_creation/AfC_Process_Improvement_May_2018#Testing_feedback|let us know what you think]]!
 
* Each draft in the feed now lists its state ("Awaiting review", "Declined", etc.)
* Each draft in the feed now lists its "Submitted date" if it is of states "Awaiting review" or "Under review", or its "Declined date" if it is of state "Declined".
* The "Sort by" menu allows sorting by "Submitted date" only when states "Awaiting review" or "Under review" are selected, and allows sorting by "Declined date" only when state "Declined" is selected.
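The sorting rule in the last bullet can be sketched as a small function. This is an illustration only, not the New Pages Feed code: the function name <code>available_sort_options</code> and the constants are hypothetical, and it assumes one reading of the rule, namely that a date sort is offered only when every selected state carries that date.

```python
# Hypothetical sketch; names and the exact rule interpretation are assumptions.
SUBMITTED_STATES = {"Awaiting review", "Under review"}
DECLINED_STATES = {"Declined"}


def available_sort_options(selected_states):
    """Return the sort options offered for a set of selected draft states."""
    selected = set(selected_states)
    options = ["Created date"]  # always available
    # "Submitted date" only makes sense when every selected state has one.
    if selected and selected <= SUBMITTED_STATES:
        options.append("Submitted date")
    # "Declined date" only makes sense when only declined drafts are shown.
    if selected and selected <= DECLINED_STATES:
        options.append("Declined date")
    return options


for states in ({"Awaiting review"}, {"Declined"}, {"Unsubmitted"}):
    print(sorted(states), "->", available_sort_options(states))
```

Under this reading, selecting "Unsubmitted" or a mix of declined and submitted states leaves only "Created date" available.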
 
We have not yet done any work on the formatting of the data presented with each draft in its listing in the feed. Because it's a lot of dense information, we would like to hear any suggestions to make it more readable.
 
=== Update 2018-08-16: ORES categories now available for testing ===
 
The team has now deployed a major set of work to the '''[[testwiki:Special:NewPagesFeed|testing environment]]'''. Please check it out and [[:en:Wikipedia_talk:WikiProject_Articles_for_creation/AfC_Process_Improvement_May_2018#Testing_feedback|let us know what you think]]:
 
* All pages in the New Pages Feed, whether on the "New Page Patrol" or the "Articles for Creation" side, are being given "Predicted class" and "Predicted issues" categories [[mw:ORES#Article_quality|using ORES models]].
** '''Predicted class''': Stub, Start, C-class, B-class, Good, Featured
** '''Predicted issues''': spam, attack, vandalism, no issues
* Those categories are listed in the feed with each page.
* The feed is also filterable by the categories, so, for instance, a reviewer could look only at pages predicted to be spam, or pages predicted to be C-class or better.
* As new edits are saved, pages are re-scored by the models in real time, so new edits should be reflected immediately.
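As a rough illustration of the two filter examples above ("predicted to be spam" and "C-class or better"), here is a minimal sketch. The class and issue labels come from the list above; the page dictionaries, the function names, and the choice to represent "no issues" as an empty list are assumptions made for the example, not how the feed stores its data.

```python
# Hypothetical sketch; the data layout and function names are assumptions.
# Predicted classes in ascending order of quality, as listed above.
CLASS_ORDER = ["Stub", "Start", "C-class", "B-class", "Good", "Featured"]


def at_least(pages, minimum_class):
    """Keep pages whose predicted class is minimum_class or better."""
    threshold = CLASS_ORDER.index(minimum_class)
    return [p for p in pages
            if CLASS_ORDER.index(p["predicted_class"]) >= threshold]


def with_issue(pages, issue):
    """Keep pages that carry the given predicted issue."""
    return [p for p in pages if issue in p["predicted_issues"]]


pages = [
    {"title": "Draft:A", "predicted_class": "Stub", "predicted_issues": ["spam"]},
    {"title": "Draft:B", "predicted_class": "C-class", "predicted_issues": []},
    {"title": "Draft:C", "predicted_class": "Good", "predicted_issues": []},
]

print([p["title"] for p in at_least(pages, "C-class")])
print([p["title"] for p in with_issue(pages, "spam")])
```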
 
In order to see how the models change with different content, it can be helpful to paste wikitext from other articles in the Test Wiki, noting in the edit summary which article it came from. Feel free to create new drafts with the [[testwiki:Wikipedia:Article_wizard/CreateDraft|Article Wizard]], and refer to the "[[Wikipedia:WikiProject Articles for creation/AfC Process Improvement May 2018#How to test|How to test]]" section above for more details (or ask on the talk page).
 
A few notes on outstanding work that we're still doing on this front:
 
* [[phab:T200582|Capitalizing the categories in the feed]]
* [[phab:T199277|More clearly displaying the filters selected]]
* We are also thinking about better ways to make the model categories more scannable and readable in the feed
 
=== Update 2018-08-22: Decisions on copyvio ===
As the Growth team has been working on adding AfC drafts and ORES to the New Pages Feed ([[testwiki:Special:NewPagesFeed|now testable in Test Wiki]]), we have also been planning how to add the first copyvio detection tool to the New Pages Feed. This post is about our plan to use [http://tools.wmflabs.org/copypatrol/en CopyPatrol] (and the [https://www.turnitin.com/ Turnitin] service) to accomplish this. Read below for the plan and background, and please speak up on the [[Wikipedia talk:WikiProject Articles for creation/AfC Process Improvement May 2018#Copyvio plan feedback|talk page]] with your thoughts and reactions – the point, after all, is to build something that helps reviewers get their work done. We've also posted the [[:en:Wikipedia:WikiProject_Articles_for_creation/AfC_Process_Improvement_May_2018/Copyvio_solutions_comparison_report|brief statistical analysis]] our team did as a part of this planning process.
 
==== Current plan ====
 
* Every page in the New Pages Feed, including both the NPP and AfC pages, will automatically be checked for copyvio using [[toolforge:copypatrol/en|CopyPatrol]], a tool built by WMF's Community Tech team, which in turn uses the [[:en:Turnitin|Turnitin]] service.
* If potential violations are found, the page’s listing in the New Pages Feed will be flagged with red text that says “Potential issues: copyvio”.
* That text will link out to the CopyPatrol interface, which shows the text of the page side-by-side with the text of the source from which it was potentially copied. Reviewers can use that interface to look at the text in detail and decide whether there really is a violation.
* If the page is edited with a revision over a certain number of bytes, the revision will be checked and its indicator in the feed will be updated.  Essentially, that indicator will mean “one of this page’s revisions had a potential violation”.
* You can still use [[toolforge:copyvios/|Earwig's Copyvio Detector]] for additional checks, in exactly the same way that reviewers have used it for years.
 
Below is a quick mockup of what this might look like.
 
[[File:New Pages Feed copyvio mockup 2018-08-21.png|frameless|426x426px|This is a mockup (not actual software) of a potential configuration of the New Pages Feed in English Wikipedia.]]
 
You can see that in the third draft in the list, next to "Possible issues", "copyvio" is listed in red. This word is a link to the CopyPatrol interface, where reviewers can investigate potential violations. Below is a screenshot from CopyPatrol showing its existing interface.
 
[[File:Copy Patrol example 2018-08-21.png|frameless|426x426px|This is a screenshot from the CopyPatrol tool on 2018-08-21.]]
 
==== Background ====
Back when we were planning this effort in May, reviewers participating in the discussion seemed to agree that pre-checking pages for copyvio would help increase reviewing efficiency. The idea is that reviewers could quickly find those pages that are most likely to have copyvio problems, and would save time by not needing to wait as a copyvio tool runs for each page that a reviewer works on.
 
As the Growth team has been working on the other two major parts of this New Pages Feed upgrade (adding AfC drafts, and adding ORES scores), we have simultaneously been debating the right way to approach the copyvio part. This has been difficult, because unlike with ORES, we rely on third-party services for copyvio detection, like Google (via [[toolforge:copyvios/|Earwig's Copyvio Detector]]) and Turnitin (via [[toolforge:copypatrol/en|CopyPatrol]]). Integrating third parties into the MediaWiki software adds technical complexity and risk to our software, since we won’t be able to completely control the services that we’ll be relying on.
 
We have put a lot of thought into this, and we’ve decided to add copyvio detection in the New Pages Feed using CopyPatrol / Turnitin.  The main alternative we considered is Earwig's Copyvio Detector / Google. There are three main reasons we have decided to build with CopyPatrol / Turnitin.
 
#'''Performance''': our team analyzed the performance of the two services, and we did not find evidence that one service is better at detecting copyvio than the other (though deeper analysis would likely shed more light on the question). The pages they flag are somewhat, but not highly, correlated, suggesting that in the long run, the two services may be ''complementary'' for finding copyvio – though integrating with both is out of the scope of this project. To read our analysis in depth, please [[:en:Wikipedia:WikiProject_Articles_for_creation/AfC_Process_Improvement_May_2018/Copyvio_solutions_comparison_report|see this page]].
#'''Technical''': Turnitin’s API is built specifically for checking for copyright issues, making it straightforward to work with.  In other words, all one needs to do with Turnitin's API is send it the text on a page, and it tells you what percent of that page has been found elsewhere on the internet or in its databases. We will also be able to easily integrate with the existing bots and interfaces underlying CopyPatrol.  These things together mean that we’ll be able to deploy something useful in a much shorter timeframe than if we were working with Earwig's Copyvio Detector / Google.
#'''Resources''': Because CopyPatrol already checks every substantial edit in English Wikipedia, including new page creations, integrating with the New Pages Feed will add no additional load to our Turnitin credits.  We’ll just be surfacing in the New Pages Feed those instances that CopyPatrol is ''already'' finding. This would not be the case with the Earwig / Google tool, in which we would be substantially taxing our Google credits and limits.
 
==== Additional details ====
 
* When our team was first learning about Turnitin, we thought that Turnitin only compared pages to academic journals and things of that nature. We've learned that it actually does compare pages to websites, and even to archived websites that no longer exist. This contributes to its high coverage.
* Because CopyPatrol checks new revisions above a certain size, the New Pages Feed will flag a page if ''any'' of its revisions have had copyvio. That means that if the violating text is removed from the page, it will still be flagged in the feed. We are hoping this is not an issue because such a page, having had its violating text removed, would also be patrolled in the same session and therefore no longer be in the feed.
* We are building the underlying architecture here so that other copyvio services could be plugged into it in the future (such as Earwig's / Google). Though using more than one service is out of scope for this project, the technical components will be in place to make it possible at some other point.
 
=== Update 2018-08-30: target date for first upgrade to production is September 17 ===
Now that reviewers have had a few weeks to test out the changing [[Wikipedia:WikiProject Articles for creation/AfC Process Improvement May 2018#Update 2018-08-06: ready for community testing|New Pages Feed in Test Wiki]], we want to get some of the improvements out into the real world so they can help reviewers. Specifically, we're planning on deploying the first of the three parts of this project to English Wikipedia on September 17: adding the "Articles for Creation" side to the feed. AfC reviewers would then be able to browse drafts in the feed, filter on their states, and sort by submitted and declined dates. This would leave the classic NPP workflow unchanged, except for the toggle button for "AfC".
 
As community reviewers tested the feed in Test Wiki, a couple of bugs and ideas were surfaced that our team has largely addressed, and it does not seem like there are major blocking issues. That said, we know that there is more to making a new feature successful than simply flipping it on. These are some of the things that I think would be good to address, and I'm looking for thoughts from reviewers about how best to address them:
 
* Training and notifying AfC reviewers on using the "AfC" side of the New Pages Feed.
* Altering the text at the top of [[Special:NewPagesFeed]] to accurately reflect that it is used for multiple purposes now.
* Updating help documentation and screenshots in the NPP and AfC projects.
 
I am happy to help with any of the documentation or screenshots.
 
And then following September 17, here are some tentative dates for rolling out the second and third parts of this project (these may change, but give a sense of the pace of our work):
 
* October 1: adding ORES models
* October 15: adding copyvio
 
Let's [[Wikipedia talk:WikiProject Articles for creation/AfC Process Improvement May 2018#Production rollout discussion for adding drafts to the feed|discuss on the talk page]] if there are any concerns, and what the correct order of operations is here so that we can start getting the useful new features into the hands of reviewers!
 
=== Update 2018-09-06: copyvio detection ready for testing ===
A couple weeks ago, [[Wikipedia:WikiProject Articles for creation/AfC Process Improvement May 2018#Update 2018-08-22: Decisions on copyvio|we posted above]] on our plans for integrating copyvio detection to the New Pages Feed. It seemed like the plan made sense to reviewers who read it, and so we've '''[[testwiki:Special:NewPagesFeed|implemented it in Test Wiki so that reviewers can try it out]]'''. Please check it out and [[:en:Wikipedia_talk:WikiProject_Articles_for_creation/AfC_Process_Improvement_May_2018#Copyvio_testing_feedback|let us know what you think]].
 
Here's how it works:
 
# CopyPatrol checks all diffs in the Article and Draft spaces (including the first revision of a page) that have 500 bytes or more (excluding wikitext markup). 500 bytes is about three sentences.
# For any pages that have a diff that got flagged by CopyPatrol, that page will say "Potential issues: Copyvio" in the New Pages Feed, which will include a link to CopyPatrol. The feed can be filtered to just those pages with the flag.
# Reviewers can click on that link to inspect the potential violation, and to see whether it has already been resolved by someone else doing CopyPatrol work.
 
Some notes:
 
* CopyPatrol does not scan pages in Test Wiki. So to simulate how this will work in English Wikipedia, the six pages flagged as "Potential issues: Copyvio" in Test Wiki are actually linking to CopyPatrol for pages in English Wikipedia that have the same names. Unfortunately, although it was possible to do so with ORES models, it will not be possible for reviewers to tweak the content in Test Wiki pages in order to get a sense of how CopyPatrol flags pages. But you can look at the six English Wikipedia pages to see the actual diffs that were flagged.
* When a page has the "Potential issues: Copyvio" flag in the feed, it means that at least one of the substantial (over 500 bytes) revisions to the page has been flagged by CopyPatrol at some point.
* When issues have been resolved in CopyPatrol, the indicator will not disappear from the feed. Once a page has been flagged for potential copyvio in the feed, it will stay that way.
* If a page has very little content (under 500 bytes), it will not get scanned by CopyPatrol. If this seems to be problematic, we can discuss (along with the CopyPatrol community) altering the threshold.
* CopyPatrol does not scan User space pages, so no User space pages will have the "Potential issues: Copyvio" in the feed.
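The flagging rules described in this update can be summarized in a short sketch. It is not the actual CopyPatrol or New Pages Feed code: the function names, the tuple layout for revisions, and the namespace labels are assumptions made for illustration. Only the 500-byte threshold, the Article and Draft namespaces, and the fact that a flag stays after resolution come from the text above.

```python
# Hypothetical sketch; names and data layout are assumptions.
COPYPATROL_MIN_BYTES = 500               # threshold stated in this update
SCANNED_NAMESPACES = {"Article", "Draft"}  # User space is not scanned


def is_scanned(namespace, added_bytes):
    """A diff is checked only in scanned namespaces and at 500+ bytes."""
    return namespace in SCANNED_NAMESPACES and added_bytes >= COPYPATROL_MIN_BYTES


def has_copyvio_flag(namespace, revisions):
    """revisions: iterable of (added_bytes, flagged, resolved) tuples.

    The flag is sticky: a flagged revision keeps the page flagged even
    after the issue has been resolved in CopyPatrol.
    """
    return any(
        is_scanned(namespace, added_bytes) and flagged
        for added_bytes, flagged, _resolved in revisions
    )


# A draft whose only flagged revision was later resolved is still flagged.
print(has_copyvio_flag("Draft", [(1200, True, True), (300, False, False)]))
# A short page is never scanned, so it cannot be flagged.
print(has_copyvio_flag("Draft", [(200, True, False)]))
# User space pages are not scanned at all.
print(has_copyvio_flag("User", [(5000, True, False)]))
```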
 
=== Update 2018-09-27: deployment schedule update ===
A [[Wikipedia:WikiProject Articles for creation/AfC Process Improvement May 2018#Update 2018-08-30: target date for first upgrade to production is September 17|previous update]] laid out our team's schedule for deploying the three parts of the new feature set to the New Pages Feed in English Wikipedia. This is an update on how deployment has gone so far and what the schedule holds going forward.
 
As planned, we did deploy AfC to the New Pages Feed on September 17. The reason we've not announced that the feed is ready to be used by AfC reviewers is that we discovered a set of bugs and issues that we've been fixing since that date. Since AfC reviewers already have a functioning workflow, we would prefer that they try out a well-functioning new workflow rather than a buggy new workflow, and so I have not yet declared victory at the AfC discussion page. I expect that the feed will be in shape for that at the beginning of next week, and at that time, I'll include brief instructions for how to use the feed for AfC review. However, for those of you who have been following along on this project, you can see that the [[Special:NewPagesFeed|New Pages Feed]] now has its "Articles for Creation" side. We are still fixing a few UI bugs having to do with the sorting and filtering menus, but you are welcome to try it out.
 
Here's what's coming up:
 
* '''October 1 or 2''': final UI bugs fixed in New Pages Feed having to do with adding AfC. AfC community can then start using the feed. I will post here and at [[Wikipedia talk:WikiProject Articles for creation|AfC talk]] when this is ready.
* '''October 4 or the beginning of the following week''': ORES scores added to both the NPP and AfC sides of the feed.
* '''October 15 or that week''': copyvio detection added to both the NPP and AfC sides of the feed.
 
For those interested, here are the details on the first deployment and its challenges:
 
After deploying the feature itself, we needed to spend a few days actually populating the feed with the 40,000+ drafts in English Wikipedia, along with their states ("Awaiting review", "Declined", etc.) and their submitted and declined dates. Those dates have proved to be difficult, because the MediaWiki database does not retain a record of when templates and categories were applied to pages. We've approximated the submitted and declined dates using the most recent edit date, but [[phab:T204889|we have a task open]] to make them more accurate if reviewers find that the dates that are currently in the feed are not close enough for their work. We are still in the process of fixing a couple of UI bugs having to do with the default and sticky values for sorting and filtering selections: [[phab:T205168|T205168]] and [[phab:T205324|T205324]].
 
=== Update 2018-10-01: AfC reviewers can now use the New Pages Feed! ===
We fixed the bugs that I mentioned in the previous update, and so now the New Pages Feed is ready to be used for AfC review! [[Wikipedia talk:WikiProject Articles for creation#New Pages Feed ready for use by AfC reviewers|Here is the announcement]] on the AfC discussion page. This is a great milestone for this project -- it's our first of three releases (ORES and copyvio are the next two) and will hopefully give AfC reviewers a tool that helps them prioritize their work, and ultimately get high-quality drafts into the article space faster. Thank you all for weighing in, following along, and helping us get to this point!
 
Our team is quickly turning our full attention to adding ORES scores to the feed (for both AfC and NPP) this Thursday, October 4, or in the days that follow. I will post additional updates as that initiative unfolds. As always, please [[Wikipedia talk:WikiProject Articles for creation/AfC Process Improvement May 2018|comment on the talk page]] with any thoughts! -- [[User:MMiller (WMF)|MMiller (WMF)]] ([[User talk:MMiller (WMF)|talk]]) 20:52, 1 October 2018 (UTC)
 
=== Update 2018-10-05: ORES now added to the New Pages Feed ===
Our deployment to add two sets of ORES scores to the New Pages Feed went smoothly yesterday. All pages have scores for both models, and they seem to be the scores that we expect them to have. We've posted [[Wikipedia talk:WikiProject Articles for creation#"Predicted class" and "Potential issues" added to New Pages Feed|here at AfC talk]] and [[Wikipedia talk:New pages patrol/Reviewers#Deployment complete -- next steps|here at NPP talk]] to announce and explain the changes. In terms of the objectives of this project, we're excited about this deployment because AfC reviewers will now be able to do things like filter the New Pages Feed to just those drafts that are predicted to be "B-class" and above, thereby accelerating the rate at which high-quality content makes it to the article namespace, and potentially accelerating how quickly a good-faith newbie gets some positive feedback.
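
As a rough illustration of the "B-class and above" filter described above, here is a minimal Python sketch. It assumes the usual ordering of the ORES article quality labels (Stub, Start, C, B, GA, FA); the draft titles and function names are hypothetical, and the feed's own menus may word the classes differently.

```python
from typing import Iterable, List, Tuple

# Quality labels ordered from lowest to highest.
# Assumption: the same ordering that the ORES article quality model uses.
QUALITY_ORDER = ["Stub", "Start", "C", "B", "GA", "FA"]


def at_least(predicted: str, minimum: str = "B") -> bool:
    """Return True if the predicted class ranks at or above the minimum class."""
    return QUALITY_ORDER.index(predicted) >= QUALITY_ORDER.index(minimum)


def filter_drafts(drafts: Iterable[Tuple[str, str]], minimum: str = "B") -> List[str]:
    """Keep the titles of drafts whose predicted class meets the minimum."""
    return [title for title, predicted in drafts if at_least(predicted, minimum)]


# Hypothetical feed entries: (title, predicted class).
example_feed = [
    ("Draft:Example one", "Stub"),
    ("Draft:Example two", "B"),
    ("Draft:Example three", "GA"),
    ("Draft:Example four", "C"),
]
print(filter_drafts(example_feed))
```

A reviewer working from such a list would see only the drafts that the model predicts are closest to being ready for the article namespace.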
 
To see interesting counts of how many pages in the feed ended up with which predictions, feel free to [[phab:T203286|check out this Phabricator task]].
 
Our team will now turn our attention to the third, and final, part of this project: adding copyvio detection to the feed. I will be back with more updates as we plan for that deployment the week of October 15 or October 22.
 
=== Update 2018-10-17: Copyvio now available for testing ===
Over the last two weeks, the team has been working to deploy copyvio detection, the third and final component of this project, to English Wikipedia. This has involved the [[Wikipedia:Bots/Requests for approval/EranBot 3|community-driven bot approval process]] for a modification to EranBot 3, which backs CopyPatrol and which now serves information to the New Pages Feed. This process precipitated some changes to the software, and [https://en.wikipedia.org/wiki/Special:Log?type=pagetriage-copyvio&user=&page=&wpdate=&tagfilter= the creation of a new log], which shows exactly which pages and revisions are being flagged as potentially violating.
 
The new feature is now deployed for a trial period in which NPP and AfC reviewers can try out the feature at [https://en.wikipedia.org/wiki/Special:NewPagesFeed?copyvio=1 this special URL] (as opposed to the [[Special:NewPagesFeed|regular URL]] for the New Pages Feed). This trial period will last into next week, at which point the team will decide when to release the feature at the regular URL. The ability to test has been announced [[Wikipedia talk:WikiProject Articles for creation#Copyvio detection ready for testing in New Pages Feed|here at AfC talk]] and [[Wikipedia talk:New pages patrol/Reviewers#Copyvio detection ready for testing|here at NPP talk]]. We expect to receive feedback at those talk pages, or on [[Wikipedia talk:WikiProject Articles for creation/AfC Process Improvement May 2018#Copyvio detection ready for testing in English Wikipedia|this project's talk page]]. When the feature is released, we'll post recommendations for how to use it.
 
=== Update 2018-10-30: Copyvio detection now added to the New Pages Feed ===
Yesterday, the team deployed copyvio detection via CopyPatrol to the New Pages Feed. This is the third and final component of this project (along with adding drafts to the feed, and adding ORES scores to the feed). The trial period for the bot that backs this feature went well, and the results of that bot trial are [[Wikipedia:Bots/Requests for approval/EranBot 3|archived here]]. Only [[phab:T207345|one minor issue]] was discovered during the trial period.
 
Our testing shows that all pages in the feed that have been flagged by CopyPatrol are also flagged in the New Pages Feed, with links that go between the two. Reviewers will now be able to use this information to further prioritize and triage pages waiting for NPP and AfC review in the feed.
 
We've posted [[Wikipedia talk:WikiProject Articles for creation#Copyvio detection now in New Pages Feed|here at AfC talk]] and [[Wikipedia talk:New pages patrol/Reviewers#Copyvio detection now in New Pages Feed|here at NPP talk]] to announce and explain the changes.
 
Since this is the final component of this project, we're going to keep an eye on it for the next week to make sure everything continues to work as expected. Then we will post to wrap up the project.
 
=== Update 2018-12-06: project wrap-up and final post ===
Now that all the components of this project have been in production for a month without any unsolved issues, it is time to wrap this project up. The effort to improve the efficiency of the Articles for Creation process began in April 2018 and the work was completed in October 2018, a span of seven calendar months. During the design process, the project expanded from only relating to Articles for Creation to also involving the New Page Review process through work on the New Pages Feed. We were in close contact with both communities throughout the process, and had great discussions where important consensus was built. We're grateful for the community members who tested the software at every step of the development process so that we could be confident that we were building something valuable.
 
Along the way to adding AfC, ORES, and copyvio detection to the New Pages Feed, we also fixed many bugs in the feed and improved existing components of the feed to make more sense for the contemporary reviewing processes. [[mw:Page_Curation#Upgrade_2018:_adding_AfC,_ORES,_and_copyvio_detection|This mediawiki.org page]] has been updated to document the 2018 improvements.
 
Thank you to the AfC and NPP community members who spent their volunteer time thinking about this project and helping us build it.