
Proposal for Policy on overuse of bots in Wikipedias

From Meta, a Wikimedia project coordination wiki
This is an archived version of this page, as edited by Yekrats (talk | contribs) at 17:31, 8 January 2008 (New proposal). It may differ significantly from the current version.

I would like to make the following proposal for a Policy on overuse of bots in Wikipedias, in light of recent perceived problems caused by overuse of bots. The policy should reflect that limited use of bots is supported. However, it must define what counts as overuse of bots, and give solutions to that problem.

Currently, the scope of this proposal only extends to the Wikipedia projects. However, if it is adopted and seems to be successful, it may be used as a model for other Wikimedia projects as well.

History of the perceived problem

  • During 2007, there was rapid growth in the Volapuk and Lombard Wikipedias, supported by a massive flood of additions (more than 100'000 articles each) from bots. Some pointed out that some of these new articles had errors. Both communities were relatively small, Volapuk having only one active user at the time of the bot-uploads.
  • In October there was a request to close the Volapük Wikipedia, which resulted in a Keep decision, with reasons cited such as the historical impact of Volapük.
  • The Lombard Wikipedia had, among other issues, a similar problem with bot-generated articles (and therefore also had a request to close it). Although that discussion is still ongoing, a faction of the Lombard Wikipedians voted for a moratorium on further bot use, and deleted most of their empty bot additions, going from about 117'000 articles down to about 14'000.
  • On December 25, 2007, a new request was made to cut the Volapuk Wikipedia's bot-created articles and move the project to the Incubator. As of this writing, that discussion is ongoing. However, during the discussion, Jimbo Wales (acting as a nonvoting advisor) suggested that all the bot additions be removed from VO:WP, and that VO:WP be kept open.

Is a Bot-heavy Wikipedia ("Botopedia") a problem?

First I would like to say that bots can be useful tools, and I am not against their use. They can make tedious jobs easier, update information and leverage the work a Wikipedia does. Pretty much every Wikipedia uses bots to some extent. However, I propose that overuse of bots is harmful to the Wikipedia itself, as well as the wider body of Wikipedias.

In the following analysis, I do not mean to pick on the Volapuk and Lombard Wikipedias, but they are the most current obvious examples we have. However, there are probably others out there that fit the model of a Wikipedia that has relied too heavily on bots.

  1. Harm to the image of Wikipedia. Like it or not, and whether or not it is fair or accurate, people look at the number of articles as a chief indicator of the health and activity of a Wikipedia. When they see a large number of articles, and then actually explore it and find the numbers to be gimmicked, that hurts the reputation of all Wikipedias.
  2. Bot expansion is being used for advertising and political purposes. Volapuk admin Smeira himself said that he uploaded a large number of bot-articles on the Volapuk Wikipedia to advertise it. Smeira said, "I thought I could try to get some new people interested in learning the language and contributing by doing something a little crazy -- like increasing the size of the Volapük wikipedia as fast as I could..." At the Lombard Wikipedia, it seemed that a similar ploy was used to lend legitimacy to the Lombard language-rights movement. I contend that Wikimedia should not be used for political or advertising reasons. It is an encyclopedia, not an advertising service or political platform.
  3. Overuse of bots is antithetical to the goals of the WMF. According to the WMF, "The mission of the Wikimedia Foundation is to empower and engage people around the world to collect and develop educational content under a free content license or in the public domain, and to disseminate it effectively and globally." (emphasis mine) Although bots may be used as a tool, the focus should be primarily on people doing the work. When a community comes together to build an article, the community gets a sense of accomplishment from achieving that goal. Overuse of bots robs a community of that accomplishment.
  4. Put the brakes on oneupmanship. Some Wikipedians might see the list of Wikipedias as a numbers game, a competition which must be won. That may be the wrong attitude, but it's human nature, especially when dealing with nationalistic things like languages. Thus, bots are used to rise in the ranks faster, sacrificing quality for quantity and harming the reputation of all Wikipedias.
  5. Jimbo Wales himself supports limiting bot-heavy Wikipedias. He gave advice during the "radical cleanup of the Volapuk Wikipedia" discussion. In fact, he proposed to delete all bot additions from the Volapuk Wikipedia. I must respectfully disagree, thinking that it would be wrong to single out the Volapuk Wikipedia for using a tool that almost every Wikipedia uses. However, I do agree that the Volapuk Wikipedia has gone too far.
  6. Cookie cutter problem. Overuse of bots causes articles to look alike, and to cover only a small set of subjects. For example, most of the content in the VO:WP right now consists of two- or three-sentence stubs about small geographic locations: towns and communities. Furthermore, although bots can create articles far faster than a human, they can also make errors far faster than a human. Several of the robot stubs have errors that the bot had no way of noticing. Problems have also occurred with vestiges of old copied templates: for example, both the Lombard and the Volapuk Wikipedias had large amounts of English text, due to bot-copying errors. An egregious example of this is in the Volapuk Wikipedia, in which a search for a relatively common English word yielded 438 articles composed mostly of English text (as of this writing).
  7. Overuse of bots can extend a Wikipedia beyond the community's ability to maintain it. A human is likely to feel a sense of responsibility for articles they create on Wikipedia. Humans also generally care about what the final product looks like. A robot feels no such urge, so articles can be ugly, sparse or totally messed up, and nobody will care. Because the article-to-editor ratio is so high, it's possible that some articles may never be looked at or touched by a human. For example, it is estimated there are 20-30 Volapuk speakers in the world, fewer than 10 of them active in VO:WP, who must tend over 110'000 articles. Wikipedias which grow more organically (in proportion to the size of the community) are able to control and support their articles better.

What limits should be set?

I am defining a "human article" as an article which has been initiated by a human, and a "bot article" as an article initiated by a bot. Although this may be generalizing a bit, I've chosen this definition for a few reasons. First, it's simple and straightforward, thus it is a parameter which could be easily evaluated.

Additionally, I am trying to avoid a "microedit brigade" scenario. I can imagine a bot creating a bunch of articles, and a human or a team of humans then making a small inconsequential edit on each of them so that they count as "touched by a human". By "microedit" I mean just a little piece of information -- a link, template, category, etc. -- which could just as easily have been added by the initial bot. It would be difficult to determine whether an edit is "inconsequential" or not, so that is a gray area which I would like to avoid.

I realize it's not a perfect system. There will be some terrific articles initiated by bots, and some lousy articles initiated by humans. There will be some articles initiated by humans, and completely overwritten by bots, and some bot articles completely overwritten by humans. However, I would consider most of those scenarios to be outliers. I am speaking about generalities here, and such details are not that important to the end-result.

Jimbo Wales suggested that ALL of the robot articles be cut out of the Volapuk Wikipedia. I would suggest that this would be too harsh a limit. Besides, limiting one Wikipedia's bot use and not others' seems unfair.

I am proposing a 3:1 ratio of BOT ARTICLES to HUMAN ARTICLES. Thus you are allowed a MAXIMUM of 75% of your articles to be bot-initiated. If your Wikipedia has 1000 articles initiated by humans, you are entitled to 3000 additional articles coming from bots. I think even this may be too many bot articles, but at least it is a limit. And if it seems too lax, we can always change the ratio later.

Please do not misunderstand this ratio. I am not suggesting that Wikipedias should have a 3:1 ratio of bot articles to human articles; it is a ceiling, not a target. If you are in a Wikipedia with a small bot-to-human ratio, great! Remember, a human-created article is almost always superior to a bot-created one.
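To make the arithmetic of the proposed limit concrete, here is a minimal Python sketch. The function names are hypothetical and exist only for illustration; the only figures taken from the proposal are the 3:1 ratio and the 1000/3000 example.

```python
def max_bot_articles(human_articles: int, ratio: int = 3) -> int:
    """Largest number of bot-initiated articles allowed for a given
    number of human-initiated articles (proposed ratio is 3:1)."""
    return human_articles * ratio


def is_compliant(human_articles: int, bot_articles: int, ratio: int = 3) -> bool:
    """True if the bot article count is within the proposed ceiling."""
    return bot_articles <= max_bot_articles(human_articles, ratio)


def bot_share(human_articles: int, bot_articles: int) -> float:
    """Fraction of all articles that were initiated by bots."""
    total = human_articles + bot_articles
    return bot_articles / total if total else 0.0
```

With 1000 human articles, `max_bot_articles(1000)` gives 3000, and a Wikipedia sitting exactly at that ceiling has a bot share of 0.75, which is the 75% maximum mentioned above.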

Remedies for noncompliance

The remedies which have been proposed so far are closure of the Wikipedia, cutting all of the bot articles, and moving the affected Wikipedia to the Incubator. All of these remedies seem excessively harsh and punitive to me. I don't think punishment is the answer. Our best bet is to warn of a violation, allow a reasonable amount of time to self-correct, and only then take outside corrective action. Penalties against a Wikipedia should be punitive only in extreme cases of ignoring the established policy.

  1. Warning that a Wikipedia is violating policy.
  2. Allow a reasonable length of time to initiate action to correct it. (30 or 60 days?)
  3. If refused, stewards will delete bot articles starting with the most recent ones.
  4. Punitive measures in extreme cases.

Once a Wikipedia is brought back into compliance, it is allowed to undelete articles in the same 3:1 ratio, as long as it continues to stay in compliance.
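Step 3 above (deleting bot articles, most recent first, until the Wikipedia is back within the limit) could be sketched as follows. This is only an illustration under stated assumptions: the function name and the (title, creation date) representation are hypothetical, and nothing here reflects an actual steward tool.

```python
from datetime import date
from typing import List, Tuple


def bot_articles_to_delete(
    human_count: int,
    bot_articles: List[Tuple[str, date]],
    ratio: int = 3,
) -> List[str]:
    """Return titles of the bot articles that exceed the allowed ceiling,
    newest first. `bot_articles` holds (title, creation date) pairs."""
    allowed = human_count * ratio
    excess = max(0, len(bot_articles) - allowed)
    # Sort by creation date, most recent first, and take only the excess.
    newest_first = sorted(bot_articles, key=lambda a: a[1], reverse=True)
    return [title for title, _ in newest_first[:excess]]
```

A Wikipedia that is already within the ratio gets an empty list back, so nothing would be deleted.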

Other issues

How do we know which edits are bots and which are humans?

How are we to determine which editors are bots? Are all editors that don't have a bot flag, don't have 'bot' in the user name and don't state on the user page that they are a bot to be treated as non-bots? Or are we going to, for instance, set a limit on edits per day (or hour, week, month), and treat editors who make more edits than that as bots?
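One way to picture the criteria raised in this question is the following Python sketch. It is not a proposed rule: the function and parameter names are hypothetical, and the edit-rate threshold is deliberately left as a required argument because this proposal does not fix a value for it.

```python
def looks_like_bot(
    has_bot_flag: bool,
    username: str,
    declares_bot_on_user_page: bool,
    edits_per_day: float,
    max_human_edits_per_day: float,
) -> bool:
    """Combine the criteria mentioned above into a single yes/no guess."""
    if has_bot_flag or declares_bot_on_user_page:
        return True
    # Crude name check; it would also match human names that merely
    # contain the letters "bot", so it is only a hint.
    if "bot" in username.lower():
        return True
    # Assumed rate-based rule: more edits than the chosen limit counts as a bot.
    return edits_per_day > max_human_edits_per_day
```

Each check has obvious false positives and false negatives, which is exactly why the question above remains open.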

Procedural questions

How should a Wikipedia be warned, and who should do the policing? Is this a subject that should be brought up here or in the individual Wikipedia, or both? What should we do about Wikipedias that are currently not in compliance (e.g., Volapuk)?

Votes

Because of the complex nature of this proposal, I have broken it down into three essential parts:

  1. Should there be limits to bot-heavy Wikipedias?
  2. Is the above suggested 3:1 ratio fair?
  3. Is the above suggested remedy -- deleting bot-additions beyond the allowed limit -- fair?

I have divided the voting into three options: full support, partial support, and opposition. If you only partially support the proposal, could you please say which parts you are against, and why? We might be able to adjust the proposal to address your concerns. This vote may or may not be official. Someone with more wiki-wisdom than me would have to say whether it is or not.

Support Fully

  1. Support. Proposer. Yekrats 17:31, 8 January 2008 (UTC)

Support Partially

Please indicate which sections of the proposal you agree or disagree with.

Oppose

If you oppose this proposal completely, please indicate why.


Comments and additional suggestions

I welcome your comments and suggestions on this difficult issue. I have tried to be fair in constructing this proposal, since it is not fair to single out one Wikipedia when making this kind of proposal. I would be curious to know if any other Wikipedias besides the Volapuk violate this proposal as it stands now.

I would like to end by saying that I feel no ill-will toward the Volapuk or any other Wikipedia. In fact, I am an admin at the Esperanto Wikipedia, so I have a fondness for conlangs. However, I think the measures that I outline above are necessary to make the Volapuk Wikipedia more robust, and to help protect the image of all Wikipedias. Additionally, they will help to keep the rampant oneupmanship under control. I ask that you try to keep a level head in the ensuing discussion. Thank you for your attention. -- Yekrats 17:31, 8 January 2008 (UTC)