Wikidata:Requests for permissions/Bot/MatmaBot 2
- The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.
- Approved Legoktm (talk) 22:17, 15 September 2013 (UTC)
MatmaBot
Operator: Matma Rex
Task/s: Importing Polish-language (pl) descriptions for biographies, based on the Polish Wikipedia index of biographies that some crazy people have maintained by hand since 2004. This will affect approximately 100k pages.
Function details:
- The bot will parse the pages from that category (e.g. pl:Noty biograficzne - Ob) and upload the descriptions here.
- If a description already exists, it will be overwritten (in my experience almost no articles have descriptions in Polish, and those that do have pretty crappy ones; the ones in that index are basically perfect).
- If an article doesn't have a Wikidata entry, it will be created.
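The three rules above can be sketched as a small decision function. This is only an illustration in Python, not the bot's actual code (which is written in Ruby); the helper name `plan_edit`, the outcome labels, the placeholder item ids, and the sample rows are all assumptions made up for this example.

```python
from collections import Counter
from typing import Optional


def plan_edit(item_id: Optional[str], current_pl: Optional[str], new_pl: str) -> str:
    """Decide what to do for one biography (hypothetical helper, not from the bot)."""
    if item_id is None:
        return "create"      # no Wikidata item yet: create one with the description
    if not current_pl:
        return "add"         # item exists but has no Polish description
    if current_pl == new_pl:
        return "skip"        # identical description already present: no edit needed
    return "overwrite"       # a different description exists: replace it


# Made-up sample rows: (item id or None, existing pl description or None,
# description from the index). The ids are placeholders, not real items.
rows = [
    (None, None, "polski matematyk"),
    ("Qx1", None, "polski malarz"),
    ("Qx2", "polski poeta", "polski poeta"),
    ("Qx3", "pisarz", "polski pisarz i dramaturg"),
]

# Tally the planned actions without editing anything, as a dry run would.
tally = Counter(plan_edit(*row) for row in rows)
print(dict(tally))
```

Counting the outcomes before making any edit gives the kind of estimate (items created, descriptions overwritten, descriptions already identical) that reviewers can check in advance.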
The list also includes birth and death dates – I am not going to upload these. It might be possible to translate the descriptions with relatively little manual work (they are rather nicely structured), or to extract information that could be turned into claims, but I'm not going to explore that now (but I'd be happy to help anybody who would).
This is a one-time task; I'm planning to later replace the lists with autogenerated ones (based on the information I'm going to upload here as well).
Matma Rex (talk) 18:06, 10 September 2013 (UTC)
- Can you do a few test edits? ~100-200 should be good. Do you have an estimate of how many descriptions you might be overwriting? Just curious. Thanks! Legoktm (talk) 19:14, 10 September 2013 (UTC)
- Sure. I uploaded descriptions for people with last names starting with Ob and Qu, 127 pages total[1]. Of these, two already had a description: one was identical to the one I tried to upload, so no changes were made (Q920053); the other was Obama (Q76). I don't have a better estimate than 2/127 :). Matma Rex (talk) 12:34, 11 September 2013 (UTC)
I'm not very comfortable with overwriting descriptions, but my greatest concern is about item creation: most recently bot-created items have been merged and deleted (even automatically). So I would oppose this task if no attempt is made to link Polish Wikipedia pages to already-existing items. --Ricordisamoa 17:12, 11 September 2013 (UTC)
- Trying to "translate" names between two languages that have different transliteration and transcription rules for just about every other language is a perfect recipe for a bad time, and I'm not going to do that. It seems like almost all of the relevant pages have WD items already anyway. Matma Rex (talk) 17:36, 11 September 2013 (UTC)
- At this point, it doesn't seem that too many items will be created, and if it does become a problem later on, we can revisit this issue. Legoktm (talk) 22:12, 13 September 2013 (UTC)
I have a larger project on Polish Wikipedia depending on this; if nobody else comments, I'm going to just go ahead and upload these next week. I guess I could leave existing descriptions alone instead of overwriting them, if you insist. Matma Rex (talk) 20:13, 13 September 2013 (UTC)
- You mentioned that there would be very few anyway, so maybe just overwrite and log somewhere in userspace which ones were overwritten? Legoktm (talk) 22:12, 13 September 2013 (UTC)
- Also, could you provide an estimate of how many items the bot is going to create? --Ricordisamoa 22:40, 13 September 2013 (UTC)
Update: in addition to the ones already handled in the test run, this task will affect up to 66562 items. I am doing a dry run right now to determine how many items will be created and how many descriptions will be overwritten, and will post the results in the evening (UTC). The code for this task is at https://github.com/MatmaRex/bio-index/blob/master/upload-index.rb (and other files in that repo, but this is the primary one); do tell me if something seems amiss at a glance. Matma Rex (talk) 11:25, 14 September 2013 (UTC)
- 424 descriptions will be overwritten, 660 new items will be created, and 455 descriptions are already the same here as in the source. Full log: [2]. Matma Rex (talk) 13:08, 14 September 2013 (UTC)
- Thanks! Legoktm (talk) 22:17, 15 September 2013 (UTC)