
Evaluate whether WMDE can take over our essential community run constraints bot
Closed, Declined · Public

Description

"WMDE should take over the operation of KrBot (that generates Constraint Violation pages) because it now takes huge resources ("Now the bot requires 106 GB of memory to load and process all data") and Ivan Krestinin cannot cope. See discussion at User_talk:Ivan_A._Krestinin and at Telegram"
source https://phabricator.wikimedia.org/T201150#7341220

Event Timeline

Maybe someone in the community can adopt it from Ivan Krestinin? It should be able to run fine in a WMF cloud VPS I assume.

Where is the code for this bot?
And what exactly is the data in and out of this code?

The source code for the bot has never been published, see T189747.

I also see that the description says WMF should take over, and the title says "WMF operations".
I'm guessing the intent here is for WMDE to consider taking this over; this is certainly not something that the WMF would get involved in for this project.

(Yes, we mean WMDE.)

This is basically a reopen of T189747 which was declined because the source is not available.
@Ivan_A_Krestinin Can you comment about opening the source?

The bot generates the constraint violation pages, e.g. see https://www.wikidata.org/wiki/Wikidata:Database_reports/Constraint_violations/P2088.

https://phabricator.wikimedia.org/T189747#4058205 describes how the bot operates.

If WMDE won't take over the operation of KrBot, another option would be to write a WMDE bot to do it.
I believe that when violations are fully exposed for querying (T201150), this should be possible to do with SPARQL (or with SQL if they are also available in the underlying RDBMS).

Addshore renamed this task from Evaluate whether WMF operations can take over our essential community run constraints bot to Evaluate whether WMDE can take over our essential community run constraints bot. (Sep 9 2021, 10:07 AM)
Addshore updated the task description.

So I certainly don't think we want to operate the bot.
But I'd imagine something else might be possible, cc @Lydia_Pintscher @Manuel

As I understand it, the bot makes a page per property that contains a broken-down list of the current violations of the constraints defined on that property, with some slightly useful extra data / formatting (such as the values at play)?

@Addshore I think that's a fair description. To add:

  • Database reports/Complex constraint violations/P245: not sure who generates

It’s DeltaBot. The source seems to be here, although that one is queryable in SPARQL (as it’s specified with raw SPARQL), so rewriting the bot wouldn’t be hard.

As Adam said, we can't take over this bot. Instead I want us to focus our energy on the underlying issue, which is that constraint violations are not accessible in a meaningful way, whether via an API or any other way to query them. We are working on fixing this by persistently storing the constraint violations. Once that is in place, a much simpler and less resource-heavy bot can be written to handle these tasks, if still desired. The work on persistently storing constraint violations is ongoing in T214362. I am declining this ticket in favor of that one, as that is the more scalable long-term solution.
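For illustration only, here is a rough sketch of what the reporting step of such a simpler bot might look like, assuming the stored violations can be fetched as plain records. The `Violation` record shape and the `render_reports` function are hypothetical names invented for this sketch; they are not part of KrBot, DeltaBot, or any Wikibase API, and the page layout is a guess at the "one page per property, broken down by constraint" format described earlier in the thread.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class Violation(NamedTuple):
    """One stored constraint violation (hypothetical record shape)."""
    property_id: str      # e.g. "P2088"
    constraint_type: str  # e.g. "Single value"
    item_id: str          # e.g. "Q42"
    value: str            # the offending value, as text


def render_reports(violations: Iterable[Violation]) -> Dict[str, str]:
    """Return {page title: wikitext}, with one report page per property."""
    # Group records as property -> constraint type -> list of violations.
    grouped = defaultdict(lambda: defaultdict(list))
    for v in violations:
        grouped[v.property_id][v.constraint_type].append(v)

    pages = {}
    for prop, by_type in sorted(grouped.items()):
        lines = []
        for ctype, rows in sorted(by_type.items()):
            # One section per constraint type, with a violation count.
            lines.append(f"== {ctype} ({len(rows)}) ==")
            for row in sorted(rows, key=lambda r: (r.item_id, r.value)):
                lines.append(f"* [[{row.item_id}]]: {row.value}")
        title = f"Wikidata:Database reports/Constraint violations/{prop}"
        pages[title] = "\n".join(lines)
    return pages
```

Because this step only groups and formats records that have already been computed, it would not need to load the whole dataset into memory the way the current bot reportedly does. Fetching the records and writing the pages back to the wiki are deliberately left out.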