
Wikipedia:Bots/Noticeboard


This is an old revision of this page, as edited by Doncram (talk | contribs) at 05:46, 9 July 2023 (→‎change whichever bot applies WikiProject United States banner: new section). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

    Bots noticeboard

    Here we coordinate and discuss Wikipedia issues related to bots and other programs interacting with the MediaWiki software. Bot operators are the main users of this noticeboard, but even if you are not one, your comments will be welcome. Just make sure you are aware of our bot policy and know where to post your issue.


    Internet Archive bot cruft

    I'm not exactly sure where to bring this up, but the Internet Archive bot recently made an edit to the article on the Roman Republic which did seemingly nothing but add archive URLs for live sites, expanding the size of the article by over six thousand characters. Is this actually okay? This cruft negatively affects editing (I use syntax highlighting regardless) and does not seem to offer any meaningful improvement to the articles given all the links are still live. Ifly6 (talk) 14:51, 2 July 2023 (UTC)[reply]

    This is not a bot issue. It's an edit manually made by Billjones94. Take it up with them. * Pppery * it has begun... 14:53, 2 July 2023 (UTC)[reply]
    Essentially, what Pppery said. Headbomb {t · c · p · b} 14:53, 2 July 2023 (UTC)[reply]
    Would not that edit text be generated by the bot (responding to a human request)? Or am I misunderstanding how this IA bot management console works? Ifly6 (talk) 14:56, 2 July 2023 (UTC)[reply]
    You are understanding how the tool works correctly. The current consensus is that that feature is only done when operated manually by a human on one page at a time, and that it's sometimes useful and sometimes not. You evidently think that it isn't in that case, which is a content dispute with the human who chose to operate it there, not the bot. * Pppery * it has begun... 15:20, 2 July 2023 (UTC)[reply]
    I went to the linked article, saw that someone had undone your reversion, which I agree with, although I usually don't bother unless a lot of cruft (>10k) has been added, and I restored your reversion. Dhtwiki (talk) 03:06, 3 July 2023 (UTC)[reply]
    Thanks. If you could copy your edit remarks onto Talk:Roman Republic I would appreciate it, since it would centralise any further discussion (if there really has to be any). Ifly6 (talk) 03:15, 3 July 2023 (UTC)[reply]
    How aggressively archive links should be added to citation templates is an open question, I think. I've seen people edit war over it before and I get the impression that folks are divided on the issue. Strictly speaking, they are not necessary for live links, because Wikipedia:Link rot#Automatic archiving says that "Links added by editors to the English Wikipedia mainspace are automatically saved to Wayback Machine within about 24 hours." So the actual archiving is done automatically, and the archive links can be added at any time after. Pressing the button in the IA Bot Management Console doesn't actually do any new archiving. On the other hand, Visual Editor users can't even see the extra code generated by these archive links, probably leading some folks to think that adding these archive URLs is harmless. When in doubt, follow WP:BRD I guess. cc UndercoverClassicist. –Novem Linguae (talk) 04:19, 3 July 2023 (UTC)[reply]
    Re the argument that sometimes the text changes underneath the link, it's not a wholly baseless argument. However, I think it laughably weak when it comes to Jstor, Google Books, and other like services: these are services that (almost certainly) aren't going anywhere and, even if archived, are of little value due to pages and paywalls. Ifly6 (talk) 04:28, 3 July 2023 (UTC)[reply]
    Jstor URLs in a {{cite journal}} should definitely not have archives added by semi-automatic tools: we usually cite an article, and archiving JSTOR's page about that article is pointless. —Kusma (talk) 07:18, 3 July 2023 (UTC)[reply]
    Yeah I'd be in favour of guidance against adding archives to journal or book links. I think the larger issue here is actually that it's too common to define the entire citation in the running prose, rather than using list-defined references inside {{reflist}} or using short citations combined with refbegin and refend in a ==Sources== subheading. If the citations are defined in their own area, no amount of cruft makes editing the prose any more difficult. Folly Mox (talk) 20:12, 3 July 2023 (UTC)[reply]
    I also would be very much in favour of guidance trending against archival links added to Jstor and Google Books. The arguments for archival are largely non-existent: the chance of link rot is extremely small for these services, you cannot verify against any changes, there should be no changes anyway. At that point it is just cruft. Ifly6 (talk) 03:08, 4 July 2023 (UTC)[reply]
    Link rot on Google Books is significant. It's an unstable service, for Wikipedia purposes. Wayback links are not the answer either, as they often don't correctly capture the pages. This is a really big problem that has been building for years: there are over a million GB links on Enwiki alone, and no one maintaining them. It requires special code for GB peculiarities. -- GreenC 04:01, 4 July 2023 (UTC)[reply]
    I wasn't aware of that, but shouldn't the citation information be sufficient even if the gbooks link rots? Folly Mox (talk) 04:04, 4 July 2023 (UTC)[reply]
    Yes. URLs to books are entirely a convenience function and not generally necessary if we have a full citation. Izno (talk) 17:59, 4 July 2023 (UTC)[reply]
    I also wasn't aware of this. It doesn't seem to me on first glance as if there would be any way at all of solving this issue though. The material that we would want to use (ie material that does not fail WP:AGEMATTERS) is almost certainly in copyright so Google Books won't have too much of it anyway. What sort of special code are you discussing? Are the Google Books links changing in an unpredictable manner rather than just disappearing? Ifly6 (talk) 04:21, 4 July 2023 (UTC)[reply]
    Well a number of things can happen: it can simply vanish, change IDs with no redirect, the ID works but goes to a different book, the book still works but the page number link no longer works. There are some other things I forget. Scraping GB is hard because it's not always consistent, easy to make mistakes. Like you scrape one day and it does one thing, the next day it does something else. -- GreenC 04:35, 4 July 2023 (UTC)[reply]
    Hmm, those are some major problems. Is there any solution? Or is it just Snafu? Do you think URLs should just be removed from book citations if they are so susceptible to not working and regardless un-archivable? Ifly6 (talk) 04:39, 4 July 2023 (UTC)[reply]
    Well, there are two main scenarios: 1) a link to a book's default home page; or 2) a link to a book page number/search result. In case 1), if the link is "hard dead", i.e. 404 or otherwise non-existent, something should be done. In case 2), if there is no usable content for verification purposes and the book has no ability to search inside, then something should be done. What to do? Either delete the URL entirely, or replace it with a different service provider. -- GreenC 04:46, 4 July 2023 (UTC)[reply]
    Just to make it clear, I've replied on the talk page for the article, but I'm basically happy with where we've ended up. UndercoverClassicist (talk) 06:30, 3 July 2023 (UTC)[reply]

    Further steps?

    I'm making this section to possibly discuss further steps? It feels like there are a number of different solutions etc put forward here:

    1. Stop putting in archive links for printed sources that shouldn't change under them or are otherwise paywalled (making the archives useless)
    2. Have some bot or something change the template for Jstor to extract the stable ID and put it in the {{cite journal}} |jstor= field
    3. Remove Google Books URLs because they are not stable

    Do tell me if you think I've mischaracterised some positions so far. Do people here think some, any, or all of these ought to be done?

    I think we ought to do all of them. The first two seem like clear improvements that a bot could do (clean up and shorten templates with no loss of amenity); the last seems like a loss of amenity which is minimal when searching by ISBN should still pull up the right book. Ifly6 (talk) 17:07, 4 July 2023 (UTC)[reply]
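
Option 2 above can be illustrated with a minimal sketch, assuming JSTOR article links take the form https://www.jstor.org/stable/<id>. The function name and the regular expression are hypothetical and are not taken from Citation bot or any other existing tool; the sketch only extracts the ID that would go into |jstor= and leaves open whether the URL itself is kept.

```python
import re

# Assumed URL shape: https://www.jstor.org/stable/<id>, optionally followed
# by a query string or fragment. Other JSTOR URL forms are not handled.
JSTOR_STABLE = re.compile(
    r"https?://(?:www\.)?jstor\.org/stable/([^\s|}?#]+)", re.IGNORECASE
)


def jstor_id_from_url(url):
    """Return the stable ID from a JSTOR URL, or None if the URL does not match."""
    match = JSTOR_STABLE.search(url)
    return match.group(1) if match else None
```

A bot could then write the returned value into the |jstor= field of {{cite journal}} whenever the function returns something other than None.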

    • Support 1 and 2, oppose 3. The convenience of usually being able to click on a Google Books URL and read the cited source outweighs, in my mind, the inconvenience of occasionally finding that it doesn't work. However, when those links no longer work, they should of course be tagged as dead links just like any other URL. I'm not sure if this is something a bot could do; I'd suspect probably not at this stage? UndercoverClassicist (talk) 17:13, 4 July 2023 (UTC)[reply]
    • Support 2, Oppose 1 and 3. I don't see the point of having the JSTOR link in the URL when |jstor= exists; I feel the same about Worldcat links when |oclc= exists. However, if I remember correctly, the last thread about Worldcat links roughly sided with keeping them. Although archived links to paywalled sources might seem redundant, they sometimes contain useful information on replacing the non-archived URL if it becomes dead. The fact that Google Book links sometimes change is a reason to replace them with a current URL rather than removing them. They can be far too helpful to consider removing them en masse. -- LCU ActivelyDisinterested transmissions °co-ords° 18:25, 4 July 2023 (UTC)[reply]
    • Support 2, oppose 3 like Undercover Classicist, unsure about 1. (How do we identify the sources that should not have archive links?) —Kusma (talk) 18:34, 4 July 2023 (UTC)[reply]
    • I don't think this is the right place to be having this discussion. It is not a critical issue to discuss the changes of interest. Either the talk page of IABot or VPM would be better. Izno (talk) 18:35, 4 July 2023 (UTC)[reply]
    • Some suggestions: For #1, a Phab ticket request for IABot not to archive Google Books. I think it already has this feature, but you can also request such links be removed from the IABot database. For #2, a feature request for Citation bot, which is the right bot for cite journal - it might already do it. For #3, the situation is not so bad as to remove them all. -- GreenC 20:37, 4 July 2023 (UTC)[reply]
      I'm pretty sure anything reliant on Citoid (which I think includes Citation bot) will produce a |jstor= (or PMID, OCLC, SCID, DOI or whatever) if one is available. The trick is getting IABot to ignore archiving for any template containing one of those link rot resistant stable identifier parameters. Folly Mox (talk) 03:10, 5 July 2023 (UTC)[reply]
      I'm wrong. Citation bot calls Zotero translators directly without going through Citoid. It also starts with the stable identifier, but I haven't checked to see if it generates others if there are multiple for the same resource. Folly Mox (talk) 03:22, 5 July 2023 (UTC)[reply]
      Going down memory lane, I've found two cases where it seems Citation bot does and doesn't do Jstor URL to parameter exclusion. In this 2020 edit, Citation bot removed URLs and put in only links. But in this 2021 edit it kept both. It feels like the 2020 behaviour would be preferred. Ifly6 (talk) 04:11, 5 July 2023 (UTC)[reply]
    You are looking for this RFC to explain the difference in behavior. Izno (talk) 04:37, 5 July 2023 (UTC)[reply]
    Confirmed that "keep the Jstor URL and add Jstor parameter" behaviour is current. Ifly6 (talk) 04:21, 5 July 2023 (UTC)[reply]
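
The idea raised above, skipping archive links for any citation template that already carries a stable identifier, might look roughly like the following. This is only a sketch under stated assumptions: the parameter list is an illustrative guess based on the identifiers named in this thread, the function name is hypothetical, and nothing here reflects how IABot actually decides what to archive.

```python
import re

# Assumed list of stable-identifier parameters; the thread names
# jstor, PMID, OCLC and DOI among others.
STABLE_ID_PARAMS = ("jstor", "doi", "pmid", "oclc")


def has_stable_identifier(template_text):
    """True if the template sets any listed parameter to a non-empty value."""
    for name in STABLE_ID_PARAMS:
        # Match "|name=" followed by at least one character of actual value.
        pattern = r"\|\s*" + name + r"\s*=\s*[^\s|}]"
        if re.search(pattern, template_text, re.IGNORECASE):
            return True
    return False
```

An empty parameter such as "|jstor= " is deliberately treated as absent, so a template with a blank identifier would still be eligible for an archive link.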
    • 1 & 2 sound good. I'll add to Undercover Classicist's objection to 3 with an example. I recently found an article citing multiple books without page numbers. When I clicked the Google Books link, I found that the page numbers were preserved in the URL and was able to add the pages into the visible citation text. If a bot had scrubbed those links, it would have been impractical to verify the claims against entire books. Rjjiii (talk) 02:11, 5 July 2023 (UTC)[reply]
      This happens frighteningly often. Folly Mox (talk) 03:06, 5 July 2023 (UTC)[reply]
      To expound, copypasting a direct page gbooks link into an automated reference generator (like the Visual Editor, ReFill, etc.) will never produce a page number parameter, and editors seem to assume that the direct link suffices as a substitute. Folly Mox (talk) 03:12, 5 July 2023 (UTC)[reply]
      This seems like a rather reasonable reason to keep the Google Books links around. Edit re Folly Mox: it seems extremely dubious for editors to be using tiny snippets of a book they cannot see; it would make it very likely that something is taken wildly out of context while also making it difficult to verify. This isn't the forum for that topic but it seems a bad thing. Ifly6 (talk) 04:05, 5 July 2023 (UTC)[reply]
      You should see the number of gbooks cites where the direct page link is followed by the exact search query the editor used to find the information. There are definitely cases of people not reading the full context before using a source to support a claim. Not best practice. Folly Mox (talk) 04:31, 5 July 2023 (UTC)[reply]
    • Since the addition of archive links is not limited to a particular source and is performed by people making massive additions without evidence of their otherwise curating an article, I would be looking to do things such as changing the WP:LINKROT article, which is often used as justification by such editors, to discourage such massive additions or encouraging, even forcing, editors using bots to limit the addition of links to only those citations where the original links have died. Dhtwiki (talk) 05:36, 5 July 2023 (UTC)[reply]
      Would something like "Do not use automated tools to populate archive links for live websites. Only add archive URLs for print or paywalled sources when a compelling need can be demonstrated." be worthwhile to add to that document? Ifly6 (talk) 06:35, 5 July 2023 (UTC)[reply]
      You could change the docs, but this has proven so controversial that making this change without consensus will only lead to conflict at some future date. And getting IABot to change this feature will also prove extremely difficult, without the threat of removing the bot's permissions on Enwiki, which would probably take a Village Pump RfC, and every time this has come up in the past there are too many who want this feature. -- GreenC 14:28, 5 July 2023 (UTC)[reply]
      I would expect that an RfC would be necessary, and I wouldn't do much without one. I don't remember seeing one that expressly addresses this issue ever since I've had most of the Village Pump pages on my watchlist, only discussions here. Can you point to one in particular? Dhtwiki (talk) 22:42, 5 July 2023 (UTC)[reply]
    Tagging Billjones94, who should be notified of this conversation. Ifly6 (talk) 06:29, 5 July 2023 (UTC)[reply]

    nobots template

    I came across the article Graham Barrow because my bot skipped editing it, as there is a {{nobots}} template in the "playing career" section. I am not sure by whom or why this template was placed on the article. Any ideas? —usernamekiran (talk) 14:42, 7 July 2023 (UTC)[reply]

    {{nobots}} is automatically placed by {{Copyvio}}, which appeared on the article in question in this edit and got substituted here. Aidan9382 (talk) 14:44, 7 July 2023 (UTC)[reply]
    There's the massive copyvio notice... Not sure how you've missed that. Headbomb {t · c · p · b} 14:55, 7 July 2023 (UTC)[reply]
    Was probably reading only the history and not the actual article. Sometimes I too forget to look at the actual article and only check the history. Jo-Jo Eumerus (talk) 16:54, 7 July 2023 (UTC)[reply]
    Yeah, that would make sense. Headbomb {t · c · p · b} 12:36, 8 July 2023 (UTC)[reply]
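
For readers wondering how a bot ends up skipping such a page, here is a simplified sketch of an exclusion check. It assumes only two forms, {{nobots}} and {{bots|deny=...}}, and ignores the other options the exclusion templates support; the function name and the bot name in the usage note are hypothetical.

```python
import re


def bot_may_edit(wikitext, bot_name):
    """Simplified exclusion check: handles {{nobots}} and {{bots|deny=...}} only."""
    # A bare {{nobots}} anywhere on the page excludes every bot.
    if re.search(r"\{\{\s*nobots\s*\}\}", wikitext, re.IGNORECASE):
        return False
    # {{bots|deny=A, B}} excludes the listed bots, or all bots for "all".
    deny = re.search(
        r"\{\{\s*bots\s*\|\s*deny\s*=\s*([^}|]*)", wikitext, re.IGNORECASE
    )
    if deny:
        denied = [name.strip().lower() for name in deny.group(1).split(",")]
        if "all" in denied or bot_name.lower() in denied:
            return False
    return True
```

With this check, a page whose "playing career" section contains {{nobots}} returns False for any bot name, for example bot_may_edit(text, "ExampleBot"), which matches the skip described above.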

    Depiping bot

    Is there any bot that can be used to plug in and modify a given example of a piped internal link across pages, such as those that might have changed or need redirecting due to a page move, split or other change that affects page navigation? Iskandar323 (talk) 08:48, 8 July 2023 (UTC)[reply]

    I'm not sure what exactly you have in mind, but such a bot seems like it would be relatively easy to code. That said, WP:NOTBROKEN also applies. Headbomb {t · c · p · b} 12:35, 8 July 2023 (UTC)[reply]
    Some page/subject splits can result in many misdirected page links, but it might be a tool thing. Iskandar323 (talk) 16:35, 8 July 2023 (UTC)[reply]
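
As a rough illustration of how simple the core of such a tool could be, the sketch below retargets piped links in a page's wikitext while keeping the displayed label. The function name is hypothetical, and the sketch assumes an exact title match: it does not handle first-letter case differences, underscores, or section anchors, and WP:NOTBROKEN still applies to deciding whether a link should be changed at all.

```python
import re


def retarget_piped_links(wikitext, old_target, new_target):
    """Point [[old_target|label]] at new_target, keeping the displayed label."""
    pattern = re.compile(
        r"\[\[\s*" + re.escape(old_target) + r"\s*\|([^\]]*)\]\]"
    )
    # A function replacement avoids backslash escapes in new_target being
    # interpreted by the regex engine.
    return pattern.sub(
        lambda m: "[[" + new_target + "|" + m.group(1) + "]]", wikitext
    )
```

Unpiped links such as [[Old page]] are left untouched, since changing them would alter the visible text.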

    Automatically tagging articles with Template:No significant coverage (sports)

    There are 78,874 sports biographies, listed at User:BilledMammal/Sports articles probably lacking SIGCOV, that a Quarry query suggests lack significant coverage as required by WP:SPORTSCRIT #5. I want to use the add_text script to add the template Template:No significant coverage (sports) to these articles in line with a recent suggestion at VPR, but given the scale of change I believed it best to seek approval here first.

    I am also not certain if this would require approval through WP:BRFA? BilledMammal (talk) 17:48, 8 July 2023 (UTC)[reply]

    I would support the principle, but think it should go through BRFA. I also looked at the first entry on the list, A. F. S. Talyarkhan, and it seems like at least one of the three books cited should provide SIGCOV. * Pppery * it has begun... 17:56, 8 July 2023 (UTC)[reply]
    I agree that this task should go through BRFA due to the scale. Izno (talk) 18:50, 8 July 2023 (UTC)[reply]
    Same with A. J. Christoff - ref 6 looks like SIGCOV to me. The idea is valid but the listmaking process needs some improvement. * Pppery * it has begun... 18:01, 8 July 2023 (UTC)[reply]
    Because of the way that the refs at A. F. S. Talyarkhan are formatted the query I constructed won't be able to exclude them - although since I am parsing the text as part of placing the tag I should be able to identify and exclude them. I'll consider how to do that as I write the bot, before I apply at WP:BRFA.
    A. J. Christoff should have been excluded; I see where my error was and I've fixed the query now. I should have an updated list in a couple of hours. BilledMammal (talk) 18:07, 8 July 2023 (UTC)[reply]
    Another thing to consider is whether the article has another tag that amounts to the same thing. For example Carlos Fumo has {{BLP sources}} reporting the exact same lack of SIGCOV sources so {{no significant coverage (sports)}} would be redundant. * Pppery * it has begun... 18:14, 8 July 2023 (UTC)[reply]
    Come to think of it, I didn't even check if it was already tagged with this template. I don't think there are many other templates beyond the BLP ones that this would be redundant with, because SPORTSCRIT creates an absolute requirement for sourcing that other templates don't address, but I'll keep that in mind. BilledMammal (talk) 18:17, 8 July 2023 (UTC)[reply]
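
The two checks discussed above (already tagged, or tagged with something redundant) could be sketched as follows before any text is added. The list of redundant templates is an assumption drawn from this thread, the function name is hypothetical, and prepending the tag to the top of the page is an illustrative choice rather than a description of what any existing script does.

```python
import re

TAG = "{{No significant coverage (sports)}}"
# Assumed set of tags treated as redundant; the thread mentions {{BLP sources}}.
REDUNDANT = ("no significant coverage (sports)", "blp sources")


def add_tag_if_needed(wikitext):
    """Prepend the tag unless the page already carries it or a redundant one."""
    for name in REDUNDANT:
        # Match the template name followed by "|" (parameters) or "}" (none).
        pattern = r"\{\{\s*" + re.escape(name) + r"\s*[|}]"
        if re.search(pattern, wikitext, re.IGNORECASE):
            return wikitext
    return TAG + "\n" + wikitext
```

Excluding articles whose references already show significant coverage, as with the two examples raised above, is a separate and harder problem that this sketch does not attempt.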

    change whichever bot applies WikiProject United States banner

    There's a proposal at Wikipedia talk:WikiProject United States#Undo hijacking of WikiProjects Louisiana and New Orleans to restore previous Talk page banners, so as not to use WikiProject US's banner. Currently, if either banner {{WikiProject Louisiana}} or {{WikiProject New Orleans}} is put on an article Talk page, a bot soon replaces it by the United States banner. The proposal is also to stop that bot. I opened the discussion there, and hope to achieve consensus there for the proposal. Will that suffice to get approval (here?) for the change to the bot? --Doncram (talk,contribs) 05:46, 9 July 2023 (UTC)[reply]