I'm trying to get the largest cities by population (filtering for Ukrainian ones, so I can check that everything is correct) like this:
# Cities by population
SELECT ?item ?cityLabel ?countryLabel ?population
WHERE {
  ?city wdt:P31 wd:Q515 .        # is a city
  ?city wdt:P17 ?country .       # show me country
  ?city wdt:P17 wd:Q212 .        # let it be Ukraine
  ?city wdt:P1082 ?population .  # and get me population
  FILTER(?population >= 1000000) # remove cities with population less than one million
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
ORDER BY DESC(?population)
Is there a tool that can extract this data over time, so that it can be plotted? Or perhaps someone has a bot that could do that? Perhaps the figures could be added to each property page on, say, a monthly or quarterly basis? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 17:13, 20 May 2017 (UTC)
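One way to approach the "over time" part (my own sketch, not something proposed in the thread): population (P1082) statements often carry a "point in time" (P585) qualifier, so the historical series can be queried directly from the Wikidata Query Service. Only the query-URL construction is shown; fetching the JSON and plotting it are left out.

```python
import urllib.parse

# Statement-level query: p:/ps:/pq: prefixes expose the qualifier,
# unlike the wdt: "truthy" form used in the query above.
QUERY = """
SELECT ?cityLabel ?population ?pointInTime WHERE {
  ?city wdt:P31 wd:Q515 ;
        wdt:P17 wd:Q212 ;
        p:P1082 ?statement .
  ?statement ps:P1082 ?population .
  OPTIONAL { ?statement pq:P585 ?pointInTime . }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
ORDER BY ?cityLabel ?pointInTime
"""

def wdqs_url(query: str) -> str:
    """Build a GET URL for the public query service, asking for JSON results."""
    return ("https://query.wikidata.org/sparql?format=json&query="
            + urllib.parse.quote(query))

url = wdqs_url(QUERY)
```

The resulting URL can be fetched with any HTTP client; each row then pairs a population value with the date it was valid for.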
The number used on the constraint pages isn't accurate. It doesn't get updated if there are no changes to the contents of the page, for example. Every property talk page shows {{Property uses}}, though; you can use the revisions to get an overview. Sjoerd de Bruin (talk) 21:48, 21 May 2017 (UTC)
@JakobVoss, Jneubert: Thank you. Funnily enough, I learned of that tool only last week, in another context. I see thumbnail plots on that page, but no links to larger versions, and not the raw data as plotted in your image here. What am I missing? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:51, 31 May 2017 (UTC)
end time (P582) is the generic property for things that have ceased to exist. It is generic enough that most SPARQL writers know to use it when they want only items that still exist. Syced (talk) 07:37, 1 June 2017 (UTC)
Enable Wiktionary sitelinks in Wikidata
After enabling the extension Cognate that provides the interlanguage links for the main pages of all the Wiktionaries, we will now move forward to the next step of supporting Wiktionary with Wikidata. From June 20th, we are going to store the Wiktionary interwiki links (all the namespaces but main, user and talk) in Wikidata.
Just like Wikipedia a few years ago, a “Wiktionary” links section will be created for the items, the links will be migrated to Wikidata, new items will be created for this purpose, and Wiktionary editors will be able to add and modify Wiktionary links.
How can you help?
First of all, you can help us translate this documentation page into the languages you know.
If you know of tools, scripts, or bots that could be useful for the migration process and for removing the manual sitelinks, please share your information on the page and offer help to people who would need to use them.
From June 20th, you may want to pay special attention to the newly created items and all the recent changes that will result from this new feature being available for Wiktionaries.
Be friendly and welcoming with the Wiktionary editors :) Help them if necessary, make them feel part of the great Wikidata community.
Also, in the WikiProject Video Games, we use regions to separate game releases, such as Japan, North America, Australia, and Europe. What item corresponds to these regions? SharkD (talk) 12:23, 1 June 2017 (UTC)
Join my Reddit AMA about Wikipedia and ethical, transparent AI
Hey folks, I'm doing an experimental Reddit AMA ("ask me anything") in r/IAmA on June 1st at 21:00 UTC. For those who don't know, I create artificial intelligences that support the volunteers who edit Wikidata. I've been studying the ways that crowds of volunteers build massive, high quality information resources like Wikipedia and Wikidata for over ten years.
This AMA will allow me to channel that for new audiences in a different (for us) way. I'll be talking about the work I'm doing with the ethics and transparency of the design of AI, how we think about artificial intelligence on Wikimedia projects, and ways we’re working to counteract vandalism. I'd love to have your feedback, comments, and questions—preferably when the AMA begins, but also through the ORES talkpage on MediaWiki.
Treaties - differentiating original signatories from later signatories
Regarding the signatory (P1891) property on the item treaty (Q131569), how do I distinguish later signatories of a treaty from its original participants? Should I add a "point in time" reference for each country I list as a signatory? Also, if I know that a group of countries joined a treaty after its creation, but don't know the dates, how do I indicate that? EU explained (talk) 22:20, 1 June 2017 (UTC)
Hello everybody. Can anybody please check if I've gotten this regex right?
For the new property BBF ID (P1650), whose values look like e.g. bcaec648-5c7d-46d8-8a80-3d4b38f7f1b1, I've identified the string pattern as (8 hexadecimal digits)-(4 hexadecimal digits)-(4 hexadecimal digits)-(4 hexadecimal digits)-(12 hexadecimal digits). Hence, I think the regex should look like this:
Nope, it should be [0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}, which you can simplify to [0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12} (and even further but that wouldn't be readable). Matěj Suchánek (talk) 06:46, 2 June 2017 (UTC)
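As a quick sanity check (my own sketch, not from the thread), both forms of the pattern can be verified against the example identifier; the factored form is written with a non-capturing group here, which doesn't change what it matches:

```python
import re

# The two equivalent patterns from the answer above.
long_form = r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
short_form = r"[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}"

sample = "bcaec648-5c7d-46d8-8a80-3d4b38f7f1b1"

# Both patterns accept the example value...
assert re.fullmatch(long_form, sample)
assert re.fullmatch(short_form, sample)

# ...and both reject uppercase hex digits, since only [0-9a-f] is allowed.
assert re.fullmatch(long_form, sample.upper()) is None
assert re.fullmatch(short_form, sample.upper()) is None
```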
Today is the 66th birthday of Gilbert Baker (Q4081194), and therefore he has his own Doodle on Google. I seem to remember there was a property for that, but I can't remember what it was. For now I fixed it like this. Is there a better way to describe a Google Doodle event? Q.Zanden questions? 12:09, 2 June 2017 (UTC)
The urban population (city population).
Hello! There is a website https://www.citypopulation.de/ which displays data on the population of all countries, their regions, and large settlements (data from several recent censuses or estimates). The website's information is kept up to date. It would be nice if all these data were moved to Wikidata. (Original discussion, in Russian: https://www.wikidata.org/wiki/Wikidata:%D0%A4%D0%BE%D1%80%D1%83%D0%BC — "Население городов" ["Population of cities"]) 14:11, 2 June 2017 (UTC) And S Yu (talk)
Since I am not very experienced with translations and template programming, I need someone to help me with {{Property documentation/help page template}} (the help page template) so that it fulfills the categorization scheme shown above automatically. It should be almost fine right now, but I'd appreciate it if someone with more experience could verify (or optimize) this.
I was thinking about this. I really like and endorse the idea that the headers and footers should always (and only) be used, so that we can let users focus on the help itself. Although you introduce syntax with <noinclude>, I would consider wrapping the help text in <onlyinclude>. Then we can be 100% sure the text is the only thing that gets transcluded, and we wouldn't have to make sure the rest isn't visible. Matěj Suchánek (talk) 12:14, 7 June 2017 (UTC)
The previous policy proposal was rejected less than two months ago. This new proposal has none of the provisions that would make a dent in the scale of the problem as we have it, and consequently it is just as problematic. It considers several aspects in isolation, but it forgets how controversial nationality is while ethnicity is included. No thank you, not at this time. Thanks GerardM (talk) 04:59, 30 May 2017 (UTC)
That is only one of two arguments, and imho the least relevant. I oppose this proposal because it does not include a way for us to concentrate on the issues that are likely problematic. This is too heavy-handed. Thanks, GerardM (talk) 08:35, 30 May 2017 (UTC)
I see the same aversive negative feedback. Would you care to rework this as a WikiProject and quality-circle approach? Do you have any history justifying "repeated or egregious incidents by a user may lead to blocks"? If not, then you are bringing a solution where the facts are not in evidence. Slowking4 (talk) 16:35, 30 May 2017 (UTC)
@Slowking4: The WMF resolution is quite clear that every project in the Wikimedia universe should have its own explicit policy, and having an explicit policy is good for the relationship with the individual Wikipedias. If you feel there are changes to the draft that would improve the policy and reduce the chances for harm, I invite you to make your changes, but I think sooner or later we will have an adopted policy. ChristianKl (talk) 11:36, 3 June 2017 (UTC)
@ChristianKl: If we are to do better regarding living people, fine. The big issue is that the way this proposal is put forward is like an edict, with no indication of how it is to be implemented. If we are to improve our data, that is best done by concentrating on the known issues. These are not a subset of what is considered "problematic"; they are based on identifying the errors and having a path to improving the data.
I feel extremely uncomfortable with imposed policies that are only words and have no plan behind them. When the plan is to nuke everything that does not comply, I am dead set against it, because our data set is limited as it is. By comparing what exists in Wikipedias and other sources, finding the differences, and NOT accepting them but curating them, you have a way out of this mess. Thanks, GerardM (talk) 12:25, 3 June 2017 (UTC)
Protection against vandalism
Does Wikidata have any protection against vandalism? I just found this obvious case: [1]. If Wikidata wants to be a useful, reliable resource then it must have some kind of protection. --178.9.86.238 08:37, 1 June 2017 (UTC)
Wikidata has 30 times more "articles" than the English Wikipedia. The latter struggles to watch even every page created.
If we applied semi-protection flags manually before any "vandalism" happens, we would spend a lot of time.
Who is able to speak that many languages in order to "lock" them all? If we lock languages independently, it would result in up to (30*10^6)*(number of languages) records.
The simplest technical solution with reasonable involvement of human resources is to correct mistakes when necessary. d1g (talk) 08:59, 3 June 2017 (UTC)
I know nothing about coding or OAuth, but I have figured out how to send requests to URLs. However, fatameh needs OAuth, so I have no idea how to even append an OAuth token to a URL. Could someone help me? PokestarFan • Drink some tea and talk with me • Stalk my edits • I'm not shouting, I just like this font! 20:07, 2 June 2017 (UTC)
This sounds like you want help with implementing bot functionality without actually using a bot. That is not how it's supposed to be done. ChristianKl (talk) 16:57, 3 June 2017 (UTC)
We are starting a project to save historical Finnish place names in a Wikibase. In the project, we seek to come up with modelling solutions that can inform the same activities in Wikidata or make importing items from the repository straightforward.
The key design issue is whether or not to model the place names as items, i.e. whether or not an individual place name will have its own URI. Here I present thoughts that are applicable to both Wikidata and the project, but I will write them from the perspective of the project.
Yes, the names should have their own items/URIs in the project:
Related national resources have unique identifiers for names; it would be natural to follow that in the historical place name repository project.
Modelling with individual items is more flexible than with properties only. There are not enough levels for complex definitions: Hiidenniemi (Q28031790) > name (P2561) > "Hiisi" (fi) > IPA transcription (P898) > "hi:si" > language of work or name (P407) > Q28031759. Most of the data can be expressed with qualifiers on the name property, but there is no room for a definition that itself requires a qualifier (qualifiers cannot be qualified).
The solution in this external project can differ from Wikidata's: there can be URIs in the historical names project and not in Wikidata. When importing to Wikidata, only the most relevant information will be imported.
No, the names should only be expressed as properties in the project:
If Wikidata were to follow the same principle, there would be too many meaningless name items. Similar names would have to be kept separate, not merged, and that could confuse users.
In Wikidata most of the existing names are defined as textual properties already.
Both:
There should be ways to declare names both as textual properties and as individual items. Currently there is a guideline to use the property P794 (P794) to link to a possible place name item. Perhaps a new property is needed?
Using lexemes:
Lexemes are not in use yet
It is too early and possibly technically complex to start using them in an external project
If used in Wikidata, the data from the external project can be mapped to lexemes?
I don't see the problem with qualifiers in your example. Any time you have a definition, it's the definition of a concept that has a name, and that concept warrants its own item.
What's your core motivation for having this project in your own Wikibase installation instead of having it in Wikidata? ChristianKl (talk) 08:27, 30 May 2017 (UTC)
Thanks for the answers!
I have found native label (P1705) so ambiguous that I have reverted to name (P2561). And these items are not official name (P1448); "vernacular" might be the best adjective. I will study the name properties, and the practices around them, more carefully. Where does the latest discussion live?
I will study your reply about the qualifier problem before I can reply!
This project will act as the reference database. There are 3 million items, and the site will be set up to invite users to enrich and enhance the data. There will be duplicates etc. that are better cleaned up before being introduced to Wikidata. Also, the number of items is such that it's better to proceed step by step. / Susanna Ånäs (Susannaanas) (talk) 09:20, 30 May 2017 (UTC)
I think a name with external identifiers would be OK as its own item, rather than just embedded in a statement with qualifiers. We do have items for family names, etc. I don't think this is the same as a lexeme, where there's language-specificity. ArthurPSmith (talk) 14:17, 30 May 2017 (UTC)
Thanks for your thoughts! I think one thing to avoid with unique identifiers would be that Pyhäjärvi (name) for "place A" would be the same item as Pyhäjärvi (name) for "place B". The reason to have the unique identifier would be to allow rich descriptions of this specific name for this specific place. The practice with last names would encourage thinking of these as items to be treated as one.
For the lexeme issue: I have a hunch it would be useful to model place names as lexemes, but it's definitely too early to apply the idea. Place names are language-specific par excellence; for example, their conjugation is an art form of its own! – Susanna Ånäs (Susannaanas) (talk) 06:39, 4 June 2017 (UTC)
Hepburn romanization
For Japanese labels, which writing system should we use? There are several to choose from, including hiragana, katakana and Hepburn romanization. Can I use more than one? SharkD (talk) 10:26, 1 June 2017 (UTC)
That works. But what language code do I use for Japanese: JA, JPN or JP? Here is a list. (I think it's the correct list.) The documentation for labelLister doesn't say. SharkD (talk) 20:22, 1 June 2017 (UTC)
Is there a tool that can convert Wikipedia references to Wikidata references? It is the single most time consuming activity I perform on Wikidata, and if the references use citation templates they are already structured. Thanks. SharkD (talk) 05:09, 4 June 2017 (UTC)
I think it would be difficult to create a tool like this, because there are many items that must be searched for, such as the book, the edition of the book, and the authors. For each item, it's hard to search for the item because of minor variations in spelling, abbreviation, or completeness. For example, is the title The Oxford Companion to the Year: An exploration of calendar customs and time-reckoning or just Oxford Companion to the Year? Is the first author "Blackburn, B." or "Bonnie Blackburn"? These searches are best performed by humans, not tools. Jc3s5h (talk) 12:13, 4 June 2017 (UTC)
Yeah, I didn't consider that. In that case, something to at least make copying/pasting easier would be great. I can resolve discrepancies manually. SharkD (talk) 13:01, 4 June 2017 (UTC)
You might be interested in the Drag'n'drop gadget at Special:Preferences#mw-prefsection-gadgets; it adds links next to the sitelinks section in the web interface which make Wikipedia article references accessible. This is likely the best you could have right now. In future (time frame ~2–5 years) there will be WikiCite, I have some hope that this will drastically improve the reference management. —MisterSynergy (talk) 18:47, 4 June 2017 (UTC)
Activate the gadget. You'll then see extra links labelled "[ref]" after each connected sitelink; each opens an overlay in the same browser window containing the connected article. Some elements of it can indeed be dragged and dropped into the item, such as (online) references and wikilinks. I don't know of any documentation. —MisterSynergy (talk) 16:37, 5 June 2017 (UTC)
Merge tool doesn't merge dates as expected
When merging two items with the same publication date (P577), what I expect is that if both entities have the same date, the resulting entity will have the date only once, with all the references combined; instead it keeps both dates separately. I would expect this when the precision of the dates differs, but I don't get why they are not merged together in this example. -- Agabi10 (talk) 12:53, 4 June 2017 (UTC)
Yes, the tool could be error-prone without any manual action.
Such conflicts are resolved manually (remove the duplicate/wrong claim from one item, then perform the merge). d1g (talk) 13:13, 4 June 2017 (UTC)
@Matěj Suchánek: Is there any feasible way of fixing the internal representation of the year and month precision dates? That would fix the problem and it would decrease the number of duplicate dates that are added... -- Agabi10 (talk) 13:36, 4 June 2017 (UTC)
I believe KrBot was doing this somewhere in the past. It's unclear to me, however, which representation is correct; maybe that's why it's no longer done by the bot. Note that the first step should be to prevent one of the representations from being added. Matěj Suchánek (talk) 14:13, 4 June 2017 (UTC)
@Matěj Suchánek: Based on this diff I would say that the correct one is the one with the format 1975-00-00 rather than the other. I created that one using the GUI. If anything is creating entities in the other format, it is probably a bot. -- Agabi10 (talk) 14:47, 4 June 2017 (UTC)
The format depends on how you type it in. This diff was also created using the GUI, by typing "1.1.2017" and then changing the precision to year. --Pasleim (talk) 14:53, 4 June 2017 (UTC)
Yes, but the one I did used the automatic detection for the precision; the one in your diff is probably a bug that should be fixed... maybe... -- Agabi10 (talk) 15:30, 4 June 2017 (UTC)
The data model description mediawikiwiki:Wikibase/DataModel/JSON states 'That is, 1988-07-13T00:00:00 with precision 8 (decade) will be interpreted as 198?-??-?? and rendered as "1980s". 1981-01-21T00:00:00 with precision 8 would have the exact same interpretation. Thus the two dates are equivalent, since year, month, and days are treated as insignificant.'
Thus, a tool that treats insignificant characters as significant is faulty, and the merge software should be fixed. Jc3s5h (talk) 11:59, 5 June 2017 (UTC)
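The data-model rule quoted above can be sketched in code. The helper below is hypothetical (not part of any Wikibase library) and handles only a simplified date string and the precisions discussed here; it shows why the two decade-precision dates from the documentation compare equal once insignificant fields are zeroed out.

```python
def normalize(timestamp: str, precision: int) -> str:
    """Zero out the insignificant fields of a simplified date string.

    Handles precision 8 (decade), 9 (year), 10 (month), 11 (day) only;
    `timestamp` is assumed to look like "1988-07-13".
    """
    year, month, day = timestamp[:10].split("-")
    if precision <= 8:   # decade or coarser: last digit of the year is insignificant
        year = year[:3] + "0"
    if precision <= 9:   # year or coarser: month is insignificant
        month = "00"
    if precision <= 10:  # month or coarser: day is insignificant
        day = "00"
    return f"{year}-{month}-{day}"

# Per the documentation quoted above, these two precision-8 values are equivalent.
assert normalize("1988-07-13", 8) == normalize("1981-01-21", 8)

# The same logic explains the merge case: at year precision,
# "1975-01-01" and "1975-00-00" denote the same value.
assert normalize("1975-01-01", 9) == normalize("1975-00-00", 9)
```

A merge tool comparing values this way, rather than byte-for-byte, would combine the duplicate dates as expected.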
Request for bureaucrat
Dear all,
there is an ongoing request for bureaucrat. Please express your opinion about the candidate at the request page. The most recent request for bureaucrat, in January, failed due to a lack of quorum: although there was 80% support, the total number of support votes was too low for it to pass. The quorum for a bureaucrat request is nearly double that of an admin request. The current candidate also has a decent percentage of support, but there has been no new input for over a day. As such, I would like to ask everyone to express their opinion about the candidate at the request page.
In the case of a creative work, if there is no standard English translation available, I would use the non-English title for the label, as you have done here. - PKM (talk) 22:16, 4 June 2017 (UTC)
Hi. Looking at an item like Petronas Towers (Q83063) (a twin tower), could someone help guide me with regards to:
How to add individual information for each tower (like height, date of official opening, floors, etc.)
Whether it is possible to add more than just two towers, such as 3 or more identical towers
I'm editing articles relating to buildings, and I often come across such multi-tower complexes, which have slightly varying information for each tower. Thanks in advance! Rehman 23:11, 4 June 2017 (UTC)
I see. Thanks! So that means a separate item should be created for each tower, with the needed specifics? As opposed to things like height/floors/etc. all being displayed on one page for all towers... Rehman 04:09, 5 June 2017 (UTC)
Breaking change: "wb_entity_per_page" table will not be updated and replicated on ToolLabs anymore
Hello all,
This is an important message to all the people running external tools.
On July 12th, we are going to stop updating the wb_entity_per_page table from the Wikibase database and stop its replication on ToolLabs. At a later point we will remove it completely.
wb_entity_per_page was a secondary database table, mapping Wikibase entity IDs (e.g. "Q42") to MediaWiki page IDs (e.g. 138, which can be seen at https://www.wikidata.org/wiki/Q42?action=info). wb_entity_per_page stored entity IDs as numbers, while page titles are always full entity IDs.
This mapping existed because Wikibase was designed with the possibility of having entity pages where the ID does not match the title. This possibility was never used, and it was finally removed in 2015 (documented here). We decided to get rid of the table because it contains outdated information that could mislead users, it costs resources, and it could conflict in the future with our new entity types for lexicographical data.
Please check if you are maintaining any code that accesses the wb_entity_per_page table, and replace it with lookups to MediaWiki's page and redirect tables.
We will drop the replica of the table on ToolLabs for test.wikidata.org on June 28th. We will do the same for wikidata.org on July 12th.
@Lea Lacroix (WMDE): This seems to break 89 queries in my query directory on toollabs. Not very thrilled about that. So how am I supposed to join the page table with wb_items_per_site and wb_terms? Multichill (talk) 19:49, 1 June 2017 (UTC)
term_full_entity_id is not yet deployed. We have decided to push back the removal of wb_entity_per_page until after term_full_entity_id has become available, so people only have to change their code once.
We currently have no plan to change wb_property_info or wb_items_per_site to use full entity IDs. They will keep using numeric IDs for now. If you need to JOIN against them, you will have to use CONCAT or SUBSTR (which will not be good for performance, depending on the query). Please let us know your concrete use cases, so we can try to find a solution. Please also consider using the MediaWiki API or the Query Service as an alternative.
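To illustrate the CONCAT approach mentioned above, here is a sketch of the kind of JOIN a tool author might write. The table and column names (wb_items_per_site, ips_item_id, page_title) follow the public Wikibase schema as I understand it, but treat the exact query as an unofficial example, not a tested replica; the Python wrapper only builds the SQL string.

```python
# wb_items_per_site stores bare numeric IDs (ips_item_id = 42), while the
# page table stores full entity IDs as titles (page_title = "Q42"), so the
# join key has to be built with CONCAT. Note the caveat above: this defeats
# index use on page_title and may be slow on large result sets.
JOIN_QUERY = """
SELECT page.page_id, ips_site_id, ips_site_page
FROM wb_items_per_site
JOIN page
  ON page.page_title = CONCAT('Q', ips_item_id)
WHERE page.page_namespace = 0
"""
```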
Any code that currently uses wb_entity_per_page to find the page title for a given entity can simply skip this step now. Wikibase now guarantees that the entity's page title can always be computed from the entity ID; For now, the title will always be the ID itself. You will have to know which entity type corresponds to which namespace, though.
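The "title computed from the entity ID" guarantee can be sketched as a trivial lookup. The namespace prefixes below are my assumption of the mapping on wikidata.org (items in the main namespace, properties under "Property:", lexemes under "Lexeme:"); verify them for your wiki before relying on this.

```python
# Assumed entity-type-to-namespace mapping for wikidata.org; the entity ID
# itself is the page title within that namespace.
NAMESPACE_PREFIX = {"Q": "", "P": "Property:", "L": "Lexeme:"}

def entity_page_title(entity_id: str) -> str:
    """Compute the full page title for an entity ID, per the guarantee above."""
    return NAMESPACE_PREFIX[entity_id[0]] + entity_id

assert entity_page_title("Q42") == "Q42"
assert entity_page_title("P31") == "Property:P31"
```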
In general, using the database directly is always a tradeoff: you gain querying power, but you lose stability. The internal DB schema is not designed or intended to be as stable as an external API. We are aware of the cost of such changes to tool authors (and DBAs), so we are trying to keep them to a minimum. But in the end, the DB schema is an internal structure, and not designed to provide backwards compatibility.
The move away from numeric IDs is driven by the need for supporting entities that do not have simple numeric IDs: The new Lexeme entity type will contain sub-entities, Forms and Senses, which are stored on the same page as the Lexeme, and use structured IDs (e.g. L762343-F6) for addressing. We could not fit these into the existing database tables.
In general: please tell us in detail what you are using wb_entity_per_page for, and why you decided to use it over some other method of getting the information you need. This will help us to know how to best support the migration, and minimize breakage. -- Daniel Kinzler (WMDE) (talk) 15:04, 6 June 2017 (UTC)
There is probably a better way to do these items, but I wonder if there isn't some confusion (or at least an odd translation) involved with has use (P366) ("use" in English) and uses (P2283) ("uses" in English). Tools for Wikidata generally have uses (P2283)="Wikidata", as do other applications, but their has use (P366) is more or less limited to maintaining Wikidata, or at least includes it. One can't just replace "tool for Wikidata" with "uses=Wikidata". --- Jura 14:31, 5 June 2017 (UTC)
P31 isn't the best place to store information about has use (P366).
We need to create items for every aspect of the data model and use them in P366.
Could a few more editors add pages to their watchlists from topviews? It's very surprising to me how blatantly obvious vandalism (e.g. changing labels to "caca" or other nonsense) manages to stay on important/high-profile items for days, and there should ideally be more editors dealing with it. (It's also useful to get lists of pages from the large Wikipedias and convert them to Wikidata item IDs with QuickStatements v1.) Thanks, Jc86035 (talk) 09:24, 4 June 2017 (UTC)
@d1gggg: To be honest I think it's probably a higher priority to improve recent changes tools, since so many edits even on the more important items manage to go unnoticed, but protecting pages with somewhere around 10 views or more per day should work. (Topviews wouldn't work for this, since for some reason it only goes up to 696 for Wikidata.) Jc86035 (talk) 13:37, 4 June 2017 (UTC)
Registration is possible on every modern site. The current CAPTCHA is readable.
The biggest question is whether we want to allow semi-anonymous edits (using an IP). Other projects, e.g. Wikipedia, might want this, but why Wikidata? d1g (talk) 13:27, 5 June 2017 (UTC)
You need more than just registration to be able to post on a semi-protected page. You also need to be autoconfirmed. ChristianKl (talk) 14:32, 5 June 2017 (UTC)
@ChristianKl: The main problem is that vandalism stays unnoticed for ages. I have no idea how many people are patrolling recent changes, but I've had to revert really obvious four-month-old vandalism (on labels etc.) which just doesn't get noticed by anyone and stays there. Granted, it only happens on a small minority of items, and many logged-out editors contribute constructively, but it's not very good and the flood of blatant vandalism has to be dealt with at some point. Semiprotection is just one option. Jc86035 (talk) 16:28, 5 June 2017 (UTC)
I don't think you understand the point. Semi-protecting not only prevents logged-out users from editing; it also prevents new users with registered accounts from editing. Preventing new users from contributing isn't a valid solution to the problem of vandalism. ChristianKl (talk) 16:58, 5 June 2017 (UTC)
@ChristianKl: To be clear, I think mass semiprotection isn't the best way of dealing with vandalism, and from a more Wikipedia-centric perspective it would go against allowing anyone to edit and would prevent page infoboxes from being modified by logged-out users and non-autoconfirmed users for no reason. Jc86035 (talk) 04:22, 6 June 2017 (UTC)
Would it be okay to create a separate type of genre for video games? You have the literary genres such as "mystery" or "tragedy" already. But games are usually categorized based on their mechanics. For instance, the difference between a "strategy" or a "shooter" game. (A game of course could be both a "strategy" and a "shooter" game at the same time.) We could call it a "gameplay genre". Thanks. SharkD (talk) 18:43, 4 June 2017 (UTC)
Thanks for the correction. I guess I'll need to find a different way to filter the different types of genre. SharkD (talk) 20:49, 5 June 2017 (UTC)
Property for league of player
Is there a property which designates the sport league a player played in? I am aware of league (P118), but this is for clubs and teams. It would be nice to have a property like this for generating lists of, e.g., players who played in the Premier League etc. Steak (talk) 10:56, 5 June 2017 (UTC)
You are wrong; this conclusion is not necessarily true. Suppose a club plays in the domestic league and a (national or international) cup in one season. If a player is a member of that club in that season, there is no information on whether this player played in the league, in the cup, or in both. Steak (talk) 13:06, 6 June 2017 (UTC)
Contemporary fiction
There are tags for science fiction, historical fiction, fantasy, etc. Is there one for fiction set in the modern day as well? Thanks. SharkD (talk) 15:39, 5 June 2017 (UTC)
How to prevent users from adding values that should not be there
The coat of arms of the City of Malmö, but NOT the urban area of Malmö.
Yesterday, I had a discussion on Talk:Q54339 about the removal of claims such as coat of arms and official website from items like Trollhättan (Q54339). The same could be said about such things as sister city and head of government. If I simply remove such claims, automatic tools tend to add them again. If I add "novalue" here, that happens less often, but it still happens sometimes. What is the best way to add a source or qualifier indicating that this "novalue" should not be removed? -- Innocent bystander (talk) 07:09, 6 June 2017 (UTC)
We should be able to chat with bot writers and get changes made, though I have found that they can be indignant, indifferent, or silent in response to pings on items. It certainly is an issue when bad information is propagated by bot (re)addition. I agree with your approach of using "novalue" where the claim should be left empty. — billinghurst sDrewth 08:19, 6 June 2017 (UTC)
Keep "no value" claims.
Wikipedias can be slow to update; wrong values would be added again until every wiki is fixed.
This is not a problem of bot owners (e.g. infobox-importing tools could copy wrong values again). d1g (talk) 08:29, 6 June 2017 (UTC)
Yes, infobox-importing tools are a larger problem than bot owners. The use of "novalue" often prevents data being added that way.
One problem here is: how do I source or note that "novalue" is the only correct statement? It would be helpful to indicate that such information is more valid on items like "Municipality of Gothenburg" and/or "City of Gothenburg" than on the urban area of Gothenburg. -- Innocent bystander (talk) 08:45, 6 June 2017 (UTC)
@Matěj Suchánek: What rank and what source? If the Italian Wikipedia in some page version says that Tokyo was ruined in 1954 by Godzilla, that is not of value even for a claim with deprecated rank. Why should I even look for sources that say otherwise? Some Wikipedia versions and Wikidata are the only places where you can read that the Malmö urban area has a coat of arms. Statistics Sweden is the only authority on Swedish urban areas; they say nothing about coats of arms and webpages. How do I prove that? -- Innocent bystander (talk) 10:11, 6 June 2017 (UTC)
Well, if some source tells you "Malmö" has File:Malmö fulla vapen.svg as its COA, that is a correct claim as such, but which item at Wikidata is (s)he then talking about? That claim is valid for Q10576166 and Q503361, but not for Q2211. Part of the urban area of Malmö is located in Burlöv Municipality, which has File:Burlöv vapen.svg as its COA. In the same way, if somebody tells you there has been a terror attack in Paris, you have to check which item you should add that to. One option is Paris (Q13107162), but that would most likely not be correct. -- Innocent bystander (talk) 13:22, 6 June 2017 (UTC)
Maybe we can set constraints to signal that an urban area has no coat of arms? Then the tools shouldn't add constraint-violating claims. ChristianKl (talk) 10:42, 6 June 2017 (UTC)
Property talk:P775 already has "conflicts with sister city" as a constraint! P6, COA and "official webpage" can be added too. The reason I have only added sister city here is that I intended to move such claims to the proper item. It was a very hard job, since I found it very hard to find good sources for such claims. -- Innocent bystander (talk) 13:22, 6 June 2017 (UTC)
Adding new interwikis is way too difficult to figure out
I created an article "Arnel Pineda" in the Finnish Wikipedia. Then I wanted to add interwiki links to the sidebar (I knew there was a corresponding article in the English Wikipedia). Adding the interwikis failed when clicking the link in the sidebar of the Finnish article; that was a way to create a new Wikidata item. Who would figure that out? Please put some red box there that says "you are creating a new item. Are you sure? Please try to find out whether the item already exists before you proceed." Then I went to the English Wikipedia and clicked the link there in the corresponding article's sidebar, and that was the way to go, but the mechanism to add an entry was well hidden. It should be much more visible and not just a line at the end of the list. You could add a colored box around the last line with the text "ADD NEW ENTRY". The box could be green. Just common sense IMO; these things are too user-unfriendly because everything is white with small icons etc. Please add some red and green boxes to help people figure it out. --Hartz (talk) 07:38, 6 June 2017 (UTC)
Wikidata weekly summary #263
Here's your quick overview of what has been happening around Wikidata over the last week.
MySociety (Q10851773) are publishing a "five part series examining how to use Wikidata to answer the question: 'What is the gender breakdown of heads of government across the world?'".
Data donation: following WikiCite, our friends at DBLP (Q1224715) have begun to donate data, with >4,800 values in the first batch, including >1,300 DBLP ID (P2456) plus assorted aliases, and values for VIAF ID (P214), GND ID (P227), ORCID iD (P496), ACM Digital Library author ID (P864), zbMATH author ID (P1556), & Google Scholar ID (P1960).
Exclude redirects and deleted items from query service results
It happens occasionally that deleted items or merged redirects appear in query service results weeks or months after the deletion/merge was performed. What can I do to remove those items from result sets? Would purging potentially help? —MisterSynergy (talk) 15:03, 6 June 2017 (UTC)
I now left a message on Stas’ talk page. The MINUS hack might help for redirects, but theoretically they should not appear at all in the results. It would not help for deleted items at all. —MisterSynergy (talk) 17:46, 6 June 2017 (UTC)
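For reference, the "MINUS hack" mentioned above can be sketched roughly like this. This is only an illustrative sketch: it relies on the fact that items merged into another item carry an owl:sameAs triple pointing at the merge target in the query service's RDF, so filtering on that triple hides redirects (it cannot help with deleted items, which leave no marker behind).

```sparql
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q5 .               # stand-in for whatever your actual selection is
  # Redirected (merged) items carry an owl:sameAs link to their target,
  # so this MINUS hides them from the result set:
  MINUS { ?item owl:sameAs [] . }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 10
```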
"point in time" vs. last update (P585)
Hey folks, currently the property point in time (P585) is used for two purposes at the same time: on the one hand, it defines a point of time when an event took place (see the examples on the talk page, e.g. 2012 United States presidential election (Q4226) → November 6, 2012), and on the other hand it is used as a qualifier to determine when a statement was true or last updated (for example the population of a city or the number of goals a soccer player has scored – these data are subject to frequent changes, so it is important to state when the given information was true or last updated, respectively). The property is used more and more for the first purpose, while its original intention was the latter (see the property proposal: here). Especially for finding an appropriate label for the property in the various languages, the specific use of the property is important. In English, for example, the original label was as of, to determine when a statement was true or last updated, but with the usage to determine when an event took place, it was changed to point in time. We're facing the same issue with the German label, and probably with all other languages too. Therefore, I think it might be reasonable to create another property to determine when a statement was true (e.g. as of, last update or something like this) and thus use two properties for those two purposes. As this would be quite a big change (the property is used extremely often), I thought a prior discussion here would be reasonable before requesting the creation of the new property. What do you think? Yellowcard (talk) 10:08, 25 May 2017 (UTC)
There are really three cases: "as of" implies that the truth of a statement was checked at a particular time. "Date of an arbitrary event" could be applied to most anything that doesn't have a devoted property for that kind of event, such as date of birth (P569). And the range of time when a status is known to be true (in the sense that it became true on a known date, and became false on a later known date) could be indicated with start time (P580) and end time (P582).
@Innocent bystander: I would propose "per" as a Swedish translation of "as of". Example: "The Sweden population as of December 31, 2016 was..." -> "Sveriges befolkning per 31 december 2016 var..." --Larske (talk) 16:14, 26 May 2017 (UTC)
When I see a case like this, I think it might be worth to have a more formal process for changing the label and description of a property. ChristianKl (talk) 16:59, 25 May 2017 (UTC)
Is there a consensus that we change the "as of" uses of P585 to retrieved (P813)? Does it have to be discussed at a broader level somewhere, as it might cause some applications/modules to break? Yellowcard (talk) 10:57, 28 May 2017 (UTC)
No. "As of" means a certain status was true on the date stated. "Retrieved" means the date the information was looked up in a source. One could write that as of May 20, Donald Trump was in Saudi Arabia, and that the information was retrieved from https://www.whitehouse.gov/potus-abroad on May 30. Putting it another way, the "as of" date is based on what the source says, and the "retrieved" date is based on the calendar of the editor who obtained the information from the source. Typically, but not always, the editor would add it to Wikidata right away, but if there is a delay in adding the information, the retrieved date would be earlier than the date of the edit that adds the information. Jc3s5h (talk) 11:57, 30 May 2017 (UTC)
Hi, we are a new start-up; our core product is a soccer stats center: scorum.co.uk
We want to make a bot for Wikidata which will update soccer stats. We can start with the Japanese J1 League: 2017 J1 League, updating the league table and top scorers. You can check whether the stats are correct on our website at https://scorum.co.uk/football/tourneys/1103-2/j-league. It would be much easier for you, as you won't need to update them manually and all the data will be up to date. MaybeVlad (talk) 08:03, 27 May 2017 (UTC)
@MaybeVlad: Hi, first I think that we should create Wikidata properties to link your website to Wikidata (matches, teams, persons, tourneys) so that you can use our data.
Then you have to create items for each match (with the property Scorum match ID) and import your data. For example, you can see this item: France v Romania (Q24201656) (Group A match of UEFA Euro 2016).
You can find information about bots here and how to create one here.
@MaybeVlad: No, you have to request permission here and explain exactly what you want to do (which properties you want to use, etc.). For the properties, the rule is to wait one week after the request before creating them. It's here to request the four properties.
This discussion tends to be too quiet. If this task is approved, it will affect a lot of items, with multiple statements changed and a lot of edits per item, so more input is needed. Maybe it would be worth pinging the en.wiki Footy project? XXN, 15:27, 28 May 2017 (UTC)
"lot of edits per each item" – not always; depending on the bot, it's possible to add a lot of statements, labels, references etc. with one edit. --ValterVB (talk) 18:10, 28 May 2017 (UTC)
Yes, but items will be edited frequently, generally on a weekly basis (a common span between league rounds in almost all countries), and everything should work fine and be agreed upon. --XXN, 20:38, 28 May 2017 (UTC)
I'd like to see the list of all entities on an item page which are planned to be updated. Maybe some property is missing and a new one should be created. XXN, 20:38, 28 May 2017 (UTC)
@MaybeVlad: Are you thinking about updating the data of soccer players, adding items for matches, or what exact purpose do you have in mind? Potentially, this could be a great help. Yellowcard (talk) 17:29, 6 June 2017 (UTC)
@Yellowcard: yes, we want to update all data for soccer. Add items not only for matches but for players, teams etc. Now we are looking for a developer, to do all of this. I think we will start in one week. MaybeVlad (talk) 06:48, 7 June 2017 (UTC)
Not sure every match from every league is notable enough to create items for, but at least already existing items certainly could be updated. Even 'popular' items like this could be greatly improved (number of matches played, goals scored, participating teams, etc.). XXN, 13:36, 7 June 2017 (UTC)
@XXN: I think that all matches of notable leagues are notable. "It refers to an instance of a clearly identifiable conceptual or material entity" and "it can be described using serious and publicly available references" (Wikidata:Notability). It could certainly be very interesting if done correctly. Tubezlob (🙋) 13:57, 7 June 2017 (UTC)
Database properties and their corresponding items should contain some information about the maintainers. You might find information on their websites, but this has to be looked up case by case.
You can then try to establish a relationship to them (via email), as you apparently already did in some cases. Some will respond, while others won’t.
I’d suggest to leave a note on the talk page of the property, containing the following details: who wrote to them, when did that happen, what was the outcome, …. We should try to bundle our conversation with external database maintainers as much as possible, and the individual property talk pages seem to be much more suitable for that than a separate WikiProject.
How to deal with multiple IDs: no current ID should have deprecated rank; if useless (but valid) IDs are among useful ones, mark the useful ones with preferred rank; if an ID has been fixed in the external database, we might want to remove it from Wikidata or apply deprecated rank, maybe with a qualifier that indicates the reason.
@Pigsonthewing: Fine! Then en:Wikipedia:VIAF/errors should be linked from the top section of Property talk:P214 probably. I've also sent links to the constraint violation reports to external maintainers, especially to the single value constraint section where duplicates are collected. Two problems: 1) External errors should be added to the exceptions. But ok, those could be checked by the maintainers then. 2) The ratio of actual duplicates varies and the maintainers might not have enough resources or interest to sort them out. Thanks for the reply, --Marsupium (talk) 15:10, 30 May 2017 (UTC)
Should statements like ⟨some property⟩ maintained by (P126) ⟨some maintainer⟩, qualified with email address (P968) and issue tracker URL (P1401) if available, be used on the property pages to add contact information? In fact the website look-up is not the best way, and sometimes a conversation might indicate a better way than the one given on a website. If the way given on a website is the best, the website can be referenced on the property page. Probably this should show up somewhere in Template:Property documentation?
"I’d suggest to leave a note on the talk page of the property, containing the following details: who wrote to them, when did that happen, what was the outcome, …. We should try to bundle our conversation with external database maintainers as much as possible, and the individual property talk pages seem to be much more suitable for that than a separate WikiProject." – Support completely. I've already represented that information in this draft; tables aren't the fastest way to do that, though. Probably a template could help.
"all current IDs should not have deprecated rank; if useless (valid) IDs are among useful ones, prefer the latter ones with preferred rank" – OK. The question is about cases like Q26822078#P836: the formerly obviously valid ID E05005938 now throws a 404. For Union List of Artist Names ID (P245), the content of records silently disappears when the Getty Vocabulary Program merges records. The point of keeping invalid IDs is obviously not any further information; but for IDs whose issuers don't use redirects or otherwise indicate the former use of an ID, like GSS code (2011) (P836) and Union List of Artist Names ID (P245), that might be interesting to third parties, who can then find that information here. Thus we could deprecate the statement and set the qualifier reason for deprecated rank (P2241) → withdrawn identifier value (Q21441764). (Interestingly, Q26822078#P836 uses the qualifier but hasn't deprecated the statement.)
Although I am not very experienced with template and module coding, I guess it would be easily possible to make a template that pulls maintainer information from the database item and displays it on the property talk page. This would make maintainer information available where you would typically expect it to show up. —MisterSynergy (talk) 16:27, 30 May 2017 (UTC)
Comment: I will say that a number of issues in the VIAF database are caused by Wikimedians' incorrect additions. @Pigsonthewing: do you know the turnaround time (or time range) for our deletions of incorrect assignations to be reflected in the VIAF database? — billinghurstsDrewth23:25, 30 May 2017 (UTC)
I have created Template:External reference error reports and Template:Error report row. Their quality and functionality is very basic and should definitely be improved if it turns out they are useful in principle. Unfortunately I didn't manage to hide the reports table in the case it is empty.
@Marsupium: I like it and left a couple suggestions for improvement.
I think having a table row for each occurrence will not scale; for some properties, especially before post-import clean-up, we have hundreds, if not thousands. And the manual work is too much, when constraint reports automate much of it. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits10:51, 31 May 2017 (UTC)
@Pigsonthewing: I think this template is for deeper discussions on difficult cases. It's not necessary to fill it for all cases. Also, it's the first time we're tracking reactions from source database maintainers, so let's see how it goes. (It's not a substitute for a proper issue tracking system, so if such collaborations really take off, we'll need a better system) --Vladimir Alexiev (talk) 11:58, 7 June 2017 (UTC)
Deprecated? I must have a different understanding of the word. It is wrong to have our item assigned to an amalgam of people at VIAF. If it continues to exist, I see little hope of resolution. I would like to see a stronger argument put forward as to why 'deprecation'/retention is a path to resolution. — billinghurstsDrewth11:09, 31 May 2017 (UTC)
I hardly think that applies to this situation, where two items point to the same incorrect VIAF identifier: 1) it sits outside our violation process; 2) we regularly remove removed VIAF data; 3) the ranking examples relate to factual components, e.g. dates of birth, theories, etc., not some dynamic authority control series. — billinghurstsDrewth13:01, 31 May 2017 (UTC)
The examples are just that; they are neither definitive nor restrictive. The scenario at hand fits precisely into the definition at the linked page. As for "we regularly remove removed VIAF data" this is wrong and should stop. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits18:53, 31 May 2017 (UTC)
For duplicate IDs what do you think about tagging them like this:
I have been using publication date (P577) for video games that have been published. Can anyone recommend how to indicate in this field that a project was never published because 1) it is a future product that will be published in the future, or 2) the development of the product was cancelled before it could be published? Or an alternate solution? Thanks. SharkD (talk) 23:09, 1 June 2017 (UTC)
You can set the value for P577 to "no value" for unpublished items by clicking on the small blue blocks, and to "unknown value" for items that will be released in the future. Q.Zanden questions? 00:52, 2 June 2017 (UTC)
We discovered here that there is a third possibility, where an item's publication date simply has not been defined yet, and that catching all three cases using a query is very, very slow. The query that we defined timed out when trying to do this. Does anyone think a new "publication status" property would be a good idea? Values could include "unreleased", "released" and "cancelled". SharkD Talk 03:09, 7 June 2017 (UTC)
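For what it's worth, the three cases can at least be told apart structurally in one pass. This is only a sketch (Q7889 = video game), and as noted above a query over all video games may still time out; the structure is the point. It relies on how Wikibase represents special values in RDF: a "no value" statement has no ps: triple, and an "unknown value" is a blank node.

```sparql
SELECT ?game ?status WHERE {
  ?game wdt:P31 wd:Q7889 .
  OPTIONAL {
    ?game p:P577 ?stmt .                      # any publication-date statement
    OPTIONAL { ?stmt ps:P577 ?value . }       # its value, if one exists
  }
  BIND(
    IF(!BOUND(?stmt),  "undefined",           # no P577 statement at all
    IF(!BOUND(?value), "novalue",             # statement marked "no value" (e.g. cancelled)
    IF(isBlank(?value),"somevalue",           # "unknown value" (blank node)
                       "dated")))             # a real publication date
    AS ?status)
}
LIMIT 100
```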
I wanted to post here first: in the process of trying to write an article on an institution that gathers the ministers of education (or equivalent) from all 28 EU countries, I discovered that many of these posts don't have a Wikidata item, or if they do, their Wikidata item corresponds to a Wikipedia list of people who've held that position rather than an instance of the position itself. I have also discovered that many pages don't differentiate between the government (the executive and administrative authority of a country, responsible for day-to-day governance) and the cabinet of a country (the collective decision-making body of senior ministers), having a page combining them both. Essentially, I don't think many of the ground rules have been set or clearly defined in this area, and a lot of the data being imported from Wikipedia is messed up.
It would be great to have pages on all ministerial positions within EU countries at the sovereign-state level, with a record of their officeholders and the times they were in office (present and past), as well as which positions they replaced and the time they came into being (e.g. if a minister of education becomes minister of human capacities – yes, Hungary really does have a position by that name). This would be great as you could, for example, work out who was at a meeting of EU ministers and their official title, based simply on the date.
Let me know if anyone is doing something similar. I am biased towards starting with EU countries first – partly because I'm working on that, and also for the good reason that ministers from the governments of the EU's 28 countries form its main legislature, so having a record of who was in office at any time is quite useful for writing articles on it. EU explained (talk) 10:35, 4 June 2017 (UTC)
It would definitely be good to clean those up - I've seen some of those "list" items but I haven't done anything about them myself at this point. ArthurPSmith (talk) 15:14, 5 June 2017 (UTC)
I've been doing some work in this area over the last year, and it's actually something that I'd been hoping to get around to in earnest within the next couple of weeks. (Currently I'm trying to sort out lots of issues with Heads of Government, which have a lot of the same problems, particularly around the "Prime Minister of X" vs "List of Prime Ministers of X" conflation. That is particularly time-consuming to untangle, as often the "equivalent" pages in different Wikipedias are actually about one or the other and all need splitting up. And of course that one has a whole bunch of other issues due to there being at least three different ways to say who the head of government of a country actually is… come help out over at Wikidata:WikiProject Heads of state and government if you're interested.)
The situation with ministerial positions is actually even worse than many people realise, in that the position held (P39) information in lots of cases isn't even just the "simple" wrong version of pointing at a "List of" page, but often says that the person was the "Ministry of Education" rather than the Minister (as lots of these were imported automatically from Wikipedia infoboxes without checking that the link was to the post rather than the department). I've been tidying a lot of those up as I come across them, and linking them together with organization directed by the office or position (P2389) and office held by head of the organization (P2388), etc., but this has been very ad hoc so far. The worst is when the item page is a mishmash of information on the minister, the ministry, and the list of people who've held the post.
I also raised the issue about Cabinet vs Government a while back, though the response was a little inconclusive. I think it's mainly just a matter of going through and making sure that entries are created and linked correctly for each. I'd be very happy to be involved in working out the best way to do all that, putting together queries to track which ones need tidying up, with daily reports to spot any future problems etc., if there's general interest in turning it into a WikiProject.
In terms of tracking historic names, it's never been quite clear to me when we want a new "Minister of Education and Science of Placeistan" post as a replacement for the previous "Minister of Education of Placeistan", or when we want to just make that a change of name on the existing item. Either approach brings problems both for data maintenance, and for being able to write queries against it, but it's something that happens frequently enough that it would be good to resolve. Another thing to discuss on a WikiProject page, perhaps…
Yeah, that name change issue is definitely another one I've run into a lot. I think in general if it is simply a name change, and not a reorganization of responsibilities (for example pulling some pieces from other departments or merging several departments) then it should probably be a single item with the historical name identified with start and end dates. But if the name change is a consequence of an actual structural change of some sort, then it probably should be a new item. ArthurPSmith (talk) 13:39, 6 June 2017 (UTC)
It could be very complicated sometimes. Here in Sweden (Q34) we often have two ministers of education, one for education of children and one for higher education and research. The latter normally having a higher rank than the first. As far as I know, we have never had any known as "minister of space". When the EU ministers of space meet, it is either the "minister of research" or the "minister of industry" who participate depending on how the current government has selected to organise their work. -- Innocent bystander (talk) 15:31, 6 June 2017 (UTC)
@ArthurPSmith: — that sounds like a sensible approach, though in practice I'm not sure how easy it is to know that, especially for historic renamings. To pick a fairly random example, The Australian health minister has been renamed 12 times in the last 30 years: https://en.wikipedia.org/wiki/Minister_for_Health_and_Aged_Care — adding all those names with dates to a single article is relatively easy (though tedious, unless we script it), and can be done by anyone vaguely interested, but working out which ones were actually structural changes that deserve a separate article is a lot more work, and requires a much deeper understanding of Australian political affairs. --Oravrattas (talk) 07:42, 7 June 2017 (UTC)
Other than the constituent elements which make up "bad thing" and "good thing" what could be a practical application of these? —Justin (koavf)❤T☮C☺M☯16:21, 5 June 2017 (UTC)
We can remove duplicates, but such items could be used to store opinions: what is a bad thing or a good thing.
We cannot have any definition for these items, only to cite opinions IMO. d1g (talk) 08:41, 6 June 2017 (UTC)
@Infovarius: Did you know about evil (Q15292)/good (Q15290) (created in 2012)? How about "Should we now classify some (all?) items into one of these 2 groups? :)"? No preliminary discussion with the author of the item – is that a bad or a good thing? For whom? Discussion behind the author's back, without pinging – is that a bad or a good thing? For whom? Items Q30126951/Q30127019 had ([2]/[3]) a pointer to whom they apply (the subject). Without such items, the user will not be able to ask such a question of Wikidata's model of the world and get an answer: Q30126951/Q30127019 – for whom? --Fractaler (talk) 13:07, 6 June 2017 (UTC)
Wikidata is not making statements about goodness of something, so these items are meaningless - they cannot be used in Wikidata. About pinging: I am sorry, but you are known for controversial things (notions, disambiguations, categories and others), so your participation in the discussion is often meaningless. Sorry, but you are walking on the edge. --Infovarius (talk) 11:17, 7 June 2017 (UTC)
I kindly ask all of you not to continue this discussion here, since it is happening too much on a personal level. The issue itself is discussed at WD:RfD (already linked and backlinked). Thanks, MisterSynergy (talk) 11:24, 7 June 2017 (UTC)
Cannot merge Q6849468 into Q1458946
Please try to resolve this. There is no need to separate East Asian and Western languages in this topic.--Jusjih (talk) 00:12, 7 June 2017 (UTC)
Quite happy to collaborate particularly when it is about former countries. I only use "position held" so far. Thanks, GerardM (talk) 11:48, 7 June 2017 (UTC)
I've been toying with 雨 (Q3595028). I don't know of other efforts to work on kanji/hanzi on Wikidata, so I added some relevant statements to this one. Certainly we will need more properties in the future, especially with the incoming Wiktionary integration. In any case, commentary, criticism and help are appreciated. ~★ nmaiad16:45, 7 June 2017 (UTC)
Space missing
In Module:Wikidata a table of examples is shown. In line 12 a mistake appears in the German interface:
I think it is likely that there will be many items that will be in both catalogues; is this going to cause an issue?
I think it would be really nice to have these databases imported in time for the Celtic Knot languages conference on the 6th of July. Does anyone have any suggestions of how to encourage people to do some mixing and matching? There are about 1,500 items needing matching in the UNESCO dataset and 15,000 in the Glottolog dataset, so it's quite a lot of matching that needs doing. I will keep chipping away at them, but I feel a bit like I'm digging a hole with a teaspoon.
No, that's the point; Wikidata acts as the "hub" for the entries in each of the two (or more) external systems. Obviously, there should be one Wikidata item, not two!
Use the UK mailing list, and ask WMUK to promote this. Get them to mail the booked attendees.
If I am understanding the UNESCO data correctly, when a language population extends into two adjacent countries, UNESCO assigns an AWLD ID for the language in each country. Wikidata will generally have a single item for the language (correctly, in my view), and I believe we should put both AWLD IDs on that item with qualifiers to indicate which country each ID is associated with. My question is, what location property should be used in this qualifier? In Cofán (Q2669254) I used located in the administrative territorial entity (P131), but I know that can't be right. @John Cummings:, FYI.
Also, adding two values causes a constraint violation. Is there any way to write the constraint such that multiple values are allowed if they have a specific qualifier? - PKM (talk) 19:39, 8 June 2017 (UTC)
Thanks for that link. UNESCO page says "In the case of outlying communities, the editors had the possibility to create separate entries, indicating respective levels of endangerment and numbers of speakers." In some cases, WP editors have created two articles and thus two wikidata items. Where they have not, I think both UNESCO numbers should be attached to the single wikidata item with location qualifiers. And since the alternate label for valid in place (P3005) is "applies to location", that is the qualifier I'll use. Mix'n'Match makes it really easy to find these, so I'll try to keep them cleaned up as more are matched. - PKM (talk) 00:40, 9 June 2017 (UTC)
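Once both IDs sit on one item with valid in place (P3005) qualifiers, they remain distinguishable in queries. A sketch of how a consumer would read them back out (illustrative only; "Pxxxx" is a hypothetical placeholder for the AWLD ID property, whatever its actual number is):

```sparql
SELECT ?language ?awldId ?placeLabel WHERE {
  ?language p:Pxxxx ?statement .      # Pxxxx: placeholder for the AWLD ID property
  ?statement ps:Pxxxx ?awldId ;
             pq:P3005 ?place .        # valid in place: which country this ID covers
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 20
```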
Date qualifiers
I have a question around which combinations of date qualifiers to use which I'm hoping someone here might help out with.
I want to describe the water quality of a lake with WFD Ecological status (P4002), where the given value is the one decided for "2016" but relies on measurements and assessments made between 2008 and 2013. I know I could fall back on using only point in time (P585) → "2016", but I would ideally want to include the data collection period as well. Any suggestions are welcome. /André Costa (WMSE) (talk) 14:34, 2 June 2017 (UTC)
@André Costa (WMSE): No, we don't tell consumers whether the order matters. But in this case my example of the order would get confusing, because many interpretations are possible. The order I gave later is to make clear that this situation is for 2016, but was measured between 2008 and 2013. I hope this explains a bit? Q.Zanden questions? 16:15, 2 June 2017 (UTC)
I don't think that it makes sense to depend on the order in any way. There are no guarantees anywhere that the order is stable. SPARQL queries don't know the order. I think it's very easy for the proposed solution to lead to problems.
@QZanden, ChristianKl: I agree with ChristianKl that relying on the order is risky. Using significant event (P793) is problematic, however, since we are only talking about measurement data used in the determination of WFD Ecological status (P4002); other properties might rely on other dates. Or were you thinking of something like
@ChristianKl: My only worry with this is that it becomes complicated if other statements have data collection ranges. There is also the question whether data collection for a particular statement really is a significant event for the whole item. I have no better solution merely wondering if this one is preferred over excluding the data collection dates completely./André Costa (WMSE) (talk) 00:01, 8 June 2017 (UTC)
You can have any discussion about the GUI at this place. If you have concrete suggestions for changes, you can also file a Phabricator ticket. ChristianKl (talk) 08:45, 7 June 2017 (UTC)
My main issue is that the dropdown lists don't disappear when you tab between form elements using the keyboard. I prefer using the keyboard more than the mouse, and with all the lists visible, I can't see what I'm typing. SharkD Talk 09:00, 7 June 2017 (UTC)
Hard to say. I think the dropdown list stays open whenever you don't actually select something from it. You have to select something from the list to make it go away. But the form changes focus sometimes without your intervention. SharkD Talk 06:29, 8 June 2017 (UTC)
This is pretty normal to my experience. The covi page diff you’ve provided indicates that the evaluation of the covi report was performed at 2017-06-07T11:58:59Z, which was before you applied the fix in Subhash Khot (Q7631228).
AFAIR @Ivan A. Krestinin uses an offline copy of the database for evaluation. This might cause further delays, although I am not sure how long these could be.
Breaking change: improving the schema of wb_terms table
Hello,
This is important information regarding our database, for people running tools and scripts.
The change
In the wb_terms table, a new column term_full_entity_id containing strings with the full ID including the letter (i.e. “Q42”) will be created, and will be used instead of the current column term_entity_id, which only stores the numeric part of the ID (i.e. “42”).
This change is made, among other things, to support the new entity types to come, and then to store terms of entities that have non-numeric IDs, for example Forms (“L42-F1”).
Implications
This change only affects tools that directly access the database. Other tools, for example those getting data through the API, pywikibot, gadgets, etc. will not be affected and will work as before with no action required from the maintainers.
In order to adapt tools to the new database structure, database queries using wb_terms should be changed so that they use the term_full_entity_id column instead of term_entity_id. This might even simplify queries. For example, instead of having a condition WHERE term_entity_type='item' AND term_entity_id=42, one could now do: WHERE term_full_entity_id='Q42'.
term_entity_type is not affected by this change, and will still be available as before.
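To make the switch concrete, here is a before/after sketch of such a lookup. Note: term_text, term_type and term_language are column names from the current wb_terms schema as I understand it; double-check them against the live schema before deploying.

```sql
-- Before the change: numeric ID plus entity type.
SELECT term_text
FROM wb_terms
WHERE term_entity_type = 'item'
  AND term_entity_id = 42
  AND term_type = 'label'
  AND term_language = 'en';

-- After the change: the full ID in a single column.
SELECT term_text
FROM wb_terms
WHERE term_full_entity_id = 'Q42'
  AND term_type = 'label'
  AND term_language = 'en';
```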
The process
Starting now, we will work through several steps to achieve this change. Note that the dates indicated below may not be exact. We will announce each of the steps separately. Here is a rough timeline:
June 22nd: the new column, term_full_entity_id, becomes visible on Labs. It will be fully populated in the testwikidatawiki database replica on Labs. Note that this column will remain incomplete in the wikidatawiki database until later! Tools that use the wb_terms table can test new code that uses the new term_full_entity_id column with the testwikidatawiki database, but must keep using the old code with the wikidatawiki database.
Some time later: the new column should be fully populated, and become usable on the wikidatawiki database as well. Tools that use the wb_terms table can then switch to new code that no longer uses the old term_entity_id column on all databases.
Eventually (not before July 6th): the old term_entity_id column is removed from the testwikidatawiki and wikidatawiki database replicas on labs. Any code that still uses the old term_entity_id will break.
I'm not sure if new concepts (L42) will be accessible along with Q42 (old).
Everything is fine as long as we have at least one way to access core concepts language-independently (currently with Q/P and numbers). d1g (talk) 14:06, 8 June 2017 (UTC)
Does it make sense to use unknown value or no value in properties such as image (P18) or place of birth (P19)? See the example at Bivin of Gorze (Q605836). I understand a no value for an invariable situation, like spouse (P26) for a single dead person, because it can't change. However, an image may not be known now, but someone could find one in the future. Should we fill all empty properties with unknown/no value? Obviously not. So, what is the meaning of doing it in some items, as in the example? Thanks,--Amadalvarez (talk) 11:39, 8 June 2017 (UTC)
If no known depiction of someone exists, or if the place of death is not known, then I can see using unknown value or no value to indicate that, hopefully with references. Maybe we need a more specific item like "no known depiction exists" to make it clearer. --Jarekt (talk) 12:37, 8 June 2017 (UTC)
All people were born somewhere, so all of them should have place of birth (P19), even somevalue (novalue would always be wrong). However, since we are still building our database (I am afraid we will never reach the finish), adding place of birth (P19): somevalue to all items about people would make this process harder. Thus, special values usually require sources. Matěj Suchánek (talk) 16:14, 8 June 2017 (UTC)
I'd use no value in cases where we specifically want to express the absence of a value - e.g. see George Washington (Q23) property position held (P39): the value for President of the United States (Q11696) has "predecessor" as "no value", because George Washington (Q23) was the first President. In the same manner, someone who never married can have "no value" as spouse, but I would add it only if this fact is specifically notable, since a lot of people are unmarried and by itself it's not that important. However, for Isaac Newton (Q935) it might be notable to mention he was a lifelong bachelor, and as such spouse (P26) has "no value". For image (P18) I can't imagine many situations where "no value" would be appropriate - maybe for the description of something that cannot have any image for some notable reason? "Unknown/somevalue" is used when we know the value is there, but we don't know what it is - e.g. if we know somebody was married, but the identity of the spouse is unknown. Mass-adding "somevalue" for place of birth (P19), though, wouldn't make sense IMO because it's not interesting data by itself. However, when we have somebody who we know died, but don't know when exactly, "unknown" may be appropriate, since knowing whether somebody is alive is usually important. Laboramus (talk) 20:28, 8 June 2017 (UTC)
@Jarekt, Matěj Suchánek, Laboramus: If I understood correctly, we all share the opinion that these values should be used when the value is significant and unchangeable. We can't write down "all the things we don't know", because the list is infinite. We can only talk about what we have or what we know; for instance, we know that George Washington (Q23) has "predecessor" = "no value", and it will not change because it happened in the past. In the case of image (P18), nobody knows when we'll find an image (or drawing) of the person or place; so "no value" or "unknown" are circumstantial values there. We also agree that filling all the theoretical properties with "no info by now" is not a good idea. Then, if you don't mind, I'll delete these values where they don't make sense. Thanks for your cooperation. --Amadalvarez (talk) 21:49, 8 June 2017 (UTC)
Kommentar I do get concerned with things like date of death where "unknown value" is used, or a really rough approximation is used, and it is because the person adding the data themselves does not know. Our guidance on the use of these criteria of "no value", "unknown value" and approximations needs to be stronger. I would also think that there should be guidance given on the property pages to demonstrate 1) whether it is acceptable, and 2) the cases where it would be used. Even better would be constraint monitoring. I know the VIAF property has a use example. — billinghurstsDrewth03:56, 9 June 2017 (UTC)
I do think that we can do more, as usage is still poor. A couple of redirects to there are probably good, and I have just done a couple. I think that we need some level of over-arching statement that, for data fields, the use of these tags is based on the sum of human knowledge, not an individual's knowledge of the subject. For look-up fields, e.g. authority identifiers, absence of data allows or disallows the use.
To be specific, I would like to see that guidance provided: so if there is no date of death, we say that it is not okay to use "no value", and we test for it; there is limited scope for adding "unknown value", that is supported by examples of appropriate use, and we state that it needs to be referenced to be a valid use. Similarly, that guidance should cover the application of vague date ranges: when is it appropriate to give a date of death of "20th century"? Or maybe it is never appropriate, as vague dates are problematic for data recall and for testing whether the field is being used or not (and that is a problem that we see at the Wikisources). — billinghurstsDrewth12:50, 9 June 2017 (UTC)
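For editors auditing these special values, both are queryable in WDQS. This is a sketch only: the wdno:/isBlank patterns below reflect my understanding of how the query service represents "no value" and "unknown value"; verify them on a known item (e.g. George Washington's P39 example above) before relying on them.

```sparql
# People explicitly marked as never married ("no value" for spouse, P26)
SELECT ?person WHERE {
  ?person wdt:P31 wd:Q5 .
  ?person rdf:type wdno:P26 .        # best-rank "no value" statement
}

# People whose spouse is "unknown value" (represented as a blank node)
SELECT ?person WHERE {
  ?person wdt:P26 ?spouse .
  FILTER( isBlank(?spouse) )
}
```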
How do I look up what city a person was born or died in?
I am writing c:Module:Creator that creates infoboxes on Commons with basic biographical info about people, including place of birth and place of death. With places of birth/death the smallest unit is traditionally a city, and if unknown then a country. Wikidata's place of birth (P19) and place of death (P20) properties are similar but significantly different. The description asks that the location should be "the most specific known (e.g. city instead of country, or hospital instead of city)", so now infoboxes show names of specific hospitals, neighborhoods or even houses as places of birth/death. For example, for Pyotr Ilyich Tchaikovsky (Q7315) place of death (P20) is Malaya Morskaya Street (Q3449016), but traditionally his place of death is listed as Saint Petersburg (Q656) (see for example here). Infoboxes that rely on Wikidata incorrectly show Malaya Morskaya Street (Q3449016) (see for example here). I am trying to figure out some logic to get Lua code to consistently round up all the places of birth/death to show the city, not a specific building or street. Any idea how? I guess I can follow located in the administrative territorial entity (P131) properties until I get to an item whose instance of (P31) is a subclass of (P279) a city (Q515). But that seems awfully complicated just to look up a place of birth or death, not to mention that I might have to load a bunch of large items to look it up. Any solutions I am missing? Any Lua code out there to do that? --Jarekt (talk) 12:33, 8 June 2017 (UTC)
We don't need to show an unspecific place of death where the exact coordinates or area are known.
@Jarekt: the exact fragment from the Russian page (translated): "The illness took a severe course, and Tchaikovsky died at 3 o'clock after midnight on 25 October (6 November) of cholera, 'unexpectedly and untimely', in the apartment of his brother Modest, in house No. 13 on Malaya Morskaya Street." The claimed source is Правительственный Вѣстникъ 1893, № 235, p. 2, 26 October (7 November). d1g (talk) 15:19, 8 June 2017 (UTC)
Regarding your request to get the city for the place of death: your logic is roughly based on the assumption that humans only die inside cities, which is incorrect :-)
In the majority of cases people die in some human settlement like a city, town or village. If not, then people usually list a nearby place (and state that it is "near"), or list a region, country, ocean, etc. My concern is with places described with precision finer than a city. By the way, the current place of death on w:es:Piotr Ilich Chaikovski is listed as "sin etiquetar, Unión Soviética" ("unlabelled, Soviet Union"), which seems even worse. It seems like that page should override the Wikidata location to get the expected location. --Jarekt (talk) 17:14, 8 June 2017 (UTC)
Description of Wikidata item can be used to disambiguate individual cases.
that creates infoboxes on Commons about basic biographical info
I think that such information is less expected at Commons (not expected at all).
Sometimes both values (the most correct, sourced one and the "client-friendly" one) are listed (even with different ranks), sometimes qualifiers are used, but either way it is actually wrong. The tree approach should definitely be used. I understand coding this up is going to be very hard. If it turns out to be impossible or unsustainable, though, we can sit down with the developers. Matěj Suchánek (talk) 16:37, 8 June 2017 (UTC)
Matěj Suchánek this almost calls for a property derived from place of birth (P19) or place of death (P20) which is precalculated by a query and stored and which is not manually edited. "Country of death" is even more problematic since it could be either Soviet Union (Q15180) or Russia (Q159). By the way a query to find "place of death" (rounded to a city level) would be:
SELECT ?item ?itemLabel {
  wd:Q7315 wdt:P20 ?podItem .     # "place of death" item
  ?podItem wdt:P131+ ?item .      # the item we are looking for is a parent of the "place of death" item
  ?item wdt:P31 ?cItem .          # item is an instance of a "city item"
  ?cItem wdt:P279* wd:Q515 .      # "city item" is city, or a subclass of city
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }
}
Very good question, I would be interested in an answer as well. As long as it is difficult to retrieve information from a different item than the one which the sitelink is connected to (particularly if this includes extensive searches in the knowledge tree), users tend to model data in a way that somehow fits their needs, but does not make use of Wikidata’s full potential. A “flat Wikidata” would be the outcome, in which all relevant information is collected within the item by itself, even if it is utterly redundant or misplaced there. Example: place of birth (P19) with qualifiers located in the administrative territorial entity (P131) and country (P17) to indicate which district and country the place of birth was located in when the person was born (plenty of examples); there are a couple of similar other situations. It would be useful if SPARQL queries were available in Lua modules. Is this actually the case (I am totally inexperienced with modules), and if so how would it work? A fairly simple query identifies the city in which Pyotr Ilyich Tchaikovsky (Q7315) died, and I guess using the query service would be the most effective way to browse the knowledge tree. —MisterSynergy (talk) 16:53, 8 June 2017 (UTC)
As far as I know, modules using Lua cannot run database queries, and there are no plans to allow that. However, modules can pick an item (an expensive operation), look up some properties, and based on the findings repeat the process until the proper item is found. But that would be ugly code that might have to traverse a lot of items, which is very expensive resource-wise. Also, the Commons Creator templates that would be using it are "templates" which are transcluded in millions of pages, and all those pages would have to go through the same process to look up the place of birth/death. --Jarekt (talk) 17:29, 8 June 2017 (UTC)
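For illustration, the traversal described above might look roughly like this in a Scribunto module. This is a sketch under assumptions: mw.wikibase.getBestStatements is the Wikibase client Lua API as I understand it, and CITY_CLASSES is a hypothetical hand-maintained whitelist, precisely because Lua cannot cheaply walk the full subclass of (P279) tree.

```lua
-- Hypothetical helper: walk up P131 from a place until we hit an item
-- whose P31 value is in a hand-maintained list of city-like classes.
local CITY_CLASSES = { Q515 = true, Q1549591 = true }  -- city, big city (assumed list)

local function isCity( id )
	for _, st in ipairs( mw.wikibase.getBestStatements( id, 'P31' ) ) do
		local dv = st.mainsnak.datavalue
		if dv and CITY_CLASSES[ dv.value.id ] then
			return true
		end
	end
	return false
end

local function cityOf( placeId, maxDepth )
	local id = placeId
	for _ = 1, maxDepth or 5 do        -- bound the walk; P131 chains can be long
		if isCity( id ) then
			return id
		end
		local parents = mw.wikibase.getBestStatements( id, 'P131' )
		local dv = parents[1] and parents[1].mainsnak.datavalue
		if not dv then
			return nil                 -- ran out of P131 parents
		end
		id = dv.value.id
	end
	return nil
end
```

As Jarekt notes, each lookup loads entity data, so even this bounded walk is expensive when transcluded on millions of pages.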
Do you know of any discussions regarding SPARQL queries within Lua modules? I am interested to read more about it… Thanks, MisterSynergy (talk) 18:47, 8 June 2017 (UTC)
Thanks. I can’t find an explicit request to integrate SPARQL into Lua in phabricator. Can anyone else? We should consider to open such a request otherwise. —MisterSynergy (talk) 20:20, 8 June 2017 (UTC)
@Jarekt: I apologize if I don't respond with exactly what you asked for. In the Catalan WP we are changing all infoboxes to get as many properties as possible from WD. Recently we built ca:template:infotaula geografia política, which manages all kinds of administrative divisions of territory. To resolve the upper administrative division a specified city or region belongs to, we implemented in our ca:module:wikidata a function called getParentValues that recovers, recursively, all the located in the administrative territorial entity (P131) and instance of (P31) values from the item you are on, up to "n" levels or until it finds the country (P17) one. It even includes an optional list of instance of (P31) values to exclude from recovery if you consider them not significant. If you are familiar with Lua (not my case, sorry) you can get some idea of the solution we applied. To see the results of this feature in the mentioned infobox, you can see ca:Spiere (without parameters) or ca:Londres (with just 2 parameters of additional info). I hope it could be useful to you. --Amadalvarez (talk) 21:33, 8 June 2017 (UTC)
Are there possibilities, and if so which, to get 'What links here' like information for an item that not only shows the other items linking to it but also for which property they are the value? And/or an overview ONLY of the properties? - Andre Engels (talk) 06:35, 9 June 2017 (UTC)
+1 I would include respective properties and sort/group "relevant" items by properties.
When using Wikipedia as a source, do I tag each claim using "imported from" and "English Wikipedia"? Or is there a preferred method? SharkD (talk) 22:18, 2 June 2017 (UTC)
In general it's better to use sources that are external to Wikipedia but if you take information directly from Wikipedia "imported from" is the way to go. ChristianKl (talk) 22:21, 2 June 2017 (UTC)
I just saw the previous topic on this page. Should I use "retrieved" in conjunction with "imported from"? I don't fully understand the other discussion. But it seems people want to also distinguish between bot versus user edits. SharkD (talk) 22:46, 2 June 2017 (UTC)
When it comes to bot edits, it's relatively little effort to tell a bot to use "retrieved" every time it makes an edit. On the other hand, with human editing it unfortunately takes additional time. I think the quality is a bit better if you do use retrieved, but it might not be worth the effort for manual editing. ChristianKl (talk) 09:25, 3 June 2017 (UTC)
Mac OS operating systems (Q43627) links to different concepts in various languages: "Macintosh operating systems" (classic+macOS+etc) in English, "Classic Mac OS" in Italian, and even "MacOS x" in Ukrainian. Anyone willing to clean this mess? :-) Syced (talk) 06:49, 9 June 2017 (UTC)
And you made a mistake: you removed a sitelink, but if there is an error you should move the valid sitelink to another item, or to a new item with the relevant properties. If the situation is messy you can use this page --ValterVB (talk) 07:15, 10 June 2017 (UTC)
In my opinion, the English label for macOS (Q14116) should be "OS X". The term "macOS" (as written) is almost never used in English discussion about computers and "OS X" is Apple's official brand name for it.--Jasper Deng (talk) 16:53, 9 June 2017 (UTC)
When it comes to the example of the German CDU, I think it makes sense to model Angela Merkel as chairperson (P488) and Peter Tauber as general secretary (P3975). The CDU doesn't have a "party leader" title; Angela Merkel's title is "VORSITZENDE DER CDU DEUTSCHLANDS", and "Vorsitzender" is the German word for chairperson. ChristianKl (talk) 22:05, 9 June 2017 (UTC)
The issue is that some parties have both a chairperson and a party leader, some have one or the other. So, it seems to me that we need a new property for party leader. Danrok (talk) 12:51, 10 June 2017 (UTC)
Are you really sure you need a place of publication (P291) for that kind of item? The most correct value would be Earth (Q2), but as we don't have any example of a creative work published or created outside of Earth, I think this is not a critical question. Snipre (talk) 01:03, 10 June 2017 (UTC)
I think you are mixing up release and publication. A book can have an international release, but it will always have one place of publication. A place of publication for a video game should be more related to the place where the final version was edited. I think we have reached a problem of property definition here. Snipre (talk) 03:39, 10 June 2017 (UTC)
Wow! I looked for this but missed it! Thanks. Unfortunately, "place of publication" is not showing up in my Contributions list, so I can't go back and fix them. SharkD Talk 10:48, 10 June 2017 (UTC)
There are different phrasings when categorizing court case law by year on English Wikipedia and English Wikisource. Renaming the categories on either wiki requires major work, so we have not tried this. Please reconsider it, and also from Q15625194 to Q8202761? Thanks.--Jusjih (talk) 17:38, 7 June 2017 (UTC)
There are plenty of people not eligible for admin rights - not to mention those who should have them but would not be able to due to the petty enmities of others in this community - who would still like to request page protection from time to time. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits12:27, 11 June 2017 (UTC)
Death categories
A series of en.wiki categories "Category:Death in xxx" are incorrectly interconnected in Wikidata with categories from other languages which are equivalent to "Category:Deaths in xxx", while the corresponding enwiki "Category:Deaths in xxx" categories sit in other items, connected (or not) with sitelinks from other wikis.
For clarity, categories "Category:Death in X" are for ~anything about death in the place X (e.g. including sub-categories for cemeteries, burials, funerals, mausoleums, etc; example en:Category:Death in the United States), while "Category:Deaths in X" are for people deceased in the place X. Many wikis currently have ~only categories for "Category:Deaths in ...", so, from these two types of items this one should be the "most popular" one, with the most sitelinks, but the current situation shows that we have mostly mixed concepts in items related to this subject.
Other items have no enwiki sitelink, only a wrong English label "Death" instead of "Deaths", while having correct sitelinks and labels in other languages, for example Q6240650, Q17388715, Q5760689, Q6433900. It seems that in these cases the bad label was added by bot(s), following the most widespread pattern (which is wrong).
Most frequently (almost always or even always), sitelinks for "Category:Deaths in xxx" in Russian (Категория:Умершие в xxx), Ukrainian (Категорія:Померли в xxx), Belarusian (Катэгорыя:Памерлі ў xxx) are correctly interconnected between them, but they may be in the same item with a wrong enwiki sitelink. XXN, 15:43, 11 June 2017 (UTC)
Please don't fix these examples yet; let other people see the problem.
This could probably be corrected en masse using a bot, if the correct patterns for interconnecting pages from the wikis are known with certainty. XXN, 15:54, 11 June 2017 (UTC)
Power stations
Hi. I've been working with power station articles for some time, and thought of helping to move data on individual power stations to Wikidata, but I'm facing a problem... Each type of power station (i.e. Geothermal, Nuclear, Pumped-storage, Solar, Thermal, Tidal, Wave, Wind) has different specifics. For example a wind farm will have a number of wind turbines/turbine manufacturer/turbine type/etc, or a nuclear plant will have a number of nuclear reactors/reactor type/etc (see here for a full list of such specifics).
My problem is, I am not able to figure out a way of adding those entries to Wikidata. Are they purposely not included, or just not included yet? I'd love to add them myself, but I don't know how. Can someone guide me on how to go about this please? Thanks! Rehman10:59, 8 June 2017 (UTC)
Each station should probably have an instance of (P31) value that is the appropriate type (Geothermal, Nuclear, etc). If there are specific properties needed to support aspects of a given plant, you could suggest them as new property proposals - see the Wikidata:Property proposal process. But most likely you can use existing properties - for example has part(s) (P527) with qualifier quantity (P1114) - see day (Q573) for an example where a "day" is indicated as having 24 hours. ArthurPSmith (talk) 13:48, 8 June 2017 (UTC)
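As a sanity check on that modelling, the has part(s)/quantity pattern is already queryable. A sketch, assuming Q194356 is the wind farm item and that turbine counts are stored as quantity (P1114) qualifiers on has part(s) (P527) statements:

```sparql
# Wind farms with a stated number of component turbines
SELECT ?station ?stationLabel ?count WHERE {
  ?station wdt:P31/wdt:P279* wd:Q194356 .   # instance of (a subclass of) wind farm
  ?station p:P527 ?partStmt .               # has part(s) statement
  ?partStmt pq:P1114 ?count .               # quantity qualifier
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
ORDER BY DESC(?count)
```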
I personally think new properties would help in this case, as there are literally hundreds of thousands of power station articles that need proper syncing with Wikidata, in order to easily integrate with infoboxes, etc. I'll look at the property proposals page. If anyone has time, please do feel free to lend a hand :) Rehman23:36, 8 June 2017 (UTC)
is meant to solve this kind of case. It allows you to enter all the information you want, I think. author TomT0m / talk page06:15, 9 June 2017 (UTC)
I'm totally stuck on how to proceed from there, as this is blocking the possibility of using Wikidata to auto-fill the power station infoboxes at the Wikipedias. Rehman13:42, 12 June 2017 (UTC)
Izno, I've added some details to the above power station and the generation unit. Could you assist to check if that's the correct way to do it? I'll then see if I can figure out how to extract those details on infoboxes at Wikipedia... Rehman14:23, 12 June 2017 (UTC)
That actually looks pretty reasonable. The one issue I see is powerplant: You've added a powerplant which is the company, not a system designed/built by that company. Perhaps qualify it with "designer/constructor/whatever we have for such a property" property rather than powerplant, since I would guess the specific models of turbines are basically one-off designs. If they're not one-offs, then it might be worthwhile to create items for those and then the use of the powerplant property becomes fine. --Izno (talk) 14:29, 12 June 2017 (UTC)
@Rehman: Create an item for each turbine. Each will then have their own claims. Link them to the power plant by « has part ». author TomT0m / talk page17:58, 12 June 2017 (UTC)
Looks like the correct way to go. Feel free to add more :-) I guess famous lawyers have their "occupation" set to "lawyer" but few editors have cared about their exact position, so far. Syced (talk) 06:56, 9 June 2017 (UTC)
@Oravrattas: - Thanks for that. You're right that position held (P39) is not the right property, though my gut tells me that "position held" is being misused all over the place in Wikidata, given the ambiguity of the property label. The employer (P108) with position held (P39) qualifier is better, but could also be seen as non-ideal, since a partner is not just an "employer/employee" status. We also don't have a clean breakdown of legal partner vs financial/business partner. Therefore, a query to find all people who are partners in a legal firm is somewhat complex, as it would have to drill down into the subclass property of the firms in which someone is a partner. Unfortunately, a quick spot check of several law firms shows that they are quite bare, and don't even have any statements that indicate they are law firms (Milbank LLP (Q6850819), Rose Law Firm (Q7367828), Debevoise & Plimpton (Q5248071)). We have a long way to go to properly model this, it seems. -- Fuzheado (talk) 12:07, 11 June 2017 (UTC)
A related issue is directors of firms - these are often not "employees" as such, and are an interesting relationship to model. I don't have a great solution at the moment. Andrew Gray (talk) 12:28, 11 June 2017 (UTC)
@Andrew Gray: - Good point. This is another example of our general systemic bias – lack of interest in making "corporation" entries detailed and complete. -- Fuzheado (talk) 15:32, 11 June 2017 (UTC)
@Fuzheado: Hoi, it is more a sign of immaturity than of systemic bias. Systemic bias lies in the lack of data to do with the global south, for instance, or any other diversity issue. Thanks, GerardM (talk) 15:48, 11 June 2017 (UTC)
@GerardM: - True, but systemic bias is a much larger issue than just global south vs Western European notions of knowledge. At least on English Wikipedia, WikiProject:Corporations is pretty much a dormant project, with few people finding the motivation, or the sympathy, to improve articles about Fortune 500 companies in substance or quality. I suppose Wikidata has the chance to balance this out somewhat. It will be interesting to see. -- Fuzheado (talk) 17:54, 11 June 2017 (UTC)
@Fuzheado: There is already bias in concentrating on the Fortune 500 companies. They are American in scope. There is a world out there. Thanks, GerardM (talk) 18:02, 11 June 2017 (UTC)
Spotty data is imho immaturity; the lack of data for countries like Brazil, Angola or Mozambique is bias. This absolute lack of data is really problematic because none of our tools grasps how much is lacking. With the Fortune 500 we know about them, and consequently tools can indicate what is missing. Quite a difference. Thanks, GerardM (talk) 12:29, 12 June 2017 (UTC)
Changing the type of “relation”
As we are preparing for migrating constraints to statements on properties, Jonas noticed that the data type of relation (P2309) is not ideal: it would make more sense for it to have the data type “property”, and link directly to instance of (P31) and subclass of (P279), instead of having the type “item” and linking to instance of (Q21503252) and subclass of (Q21514624). (It would even be possible to link to other “relation” properties, though I can’t think of a good example for that right now.)
The ideal time to make that change would be pretty much right now – once we start using constraint statements, such a change would have to be synchronized between the change on the Wikidata entities and the change to the code using those entities (and the time when that code is deployed).
This makes sense to me. The property has only been used 31 times (and some of those uses look incorrect) so it should be easy to change when we can. However, changing datatype is I think not a simple thing in itself - it may have to be deleted and a new property created. ArthurPSmith (talk) 13:54, 8 June 2017 (UTC)
You'd need two or more properties to do the same. Please see Property:P247#P2302 for how it's meant to be used. A problem we currently have with type constraints is that it's either "P31" or "P279", but not either of the two. Still, some are clearly for instances (P31) only, others not. --- Jura17:52, 8 June 2017 (UTC)
@Jura1, Ivan A. Krestinin: we already use instance of (P31) and subclass of (P279) to check the type – the question is just: why should we have those two items that are really just item versions of the properties? (Practically speaking, it means two extra configuration variables in the Wikibase Quality Constraints extension – two for P31 and P279, plus two more for Q21503252 and Q21514624; the item-based ones are redundant, in my opinion.) --Lucas Werkmeister (WMDE) (talk) 10:24, 9 June 2017 (UTC)
@Jura1: The QualityConstraints extension doesn’t support it either – you can’t just create the item and have it work magically, we’ll have to implement support for it if intended :)
And I’d say that two “relation” parameters, one with “instance” and one with “subclass”, would actually be the better way to model this. That’s still possible with a property datatype. --Lucas Werkmeister (WMDE) (talk) 08:43, 12 June 2017 (UTC)
this is an aspect of a very general problem with chemical compounds and related items - it's not clear whether P31 or P279 is the correct relationship, so people use both (in the case of genes, apparently quite methodically). Obviously when one talks about for example BRCA1 (Q227339) the subject is not a specific collection of atoms at one physical location, it is a general arrangement of atoms into a DNA sequence, or perhaps even more abstractly as an information object. So it represents a subset (subclass) of all the things that are "genes" in the sense of specific arrangements of atoms into parts of DNA molecules. Yet it seems obvious BRCA1 "is a" gene as we would understand it - an instance of the concept "gene". So what should the relationship be? Similar discussions have run for a while quite inconclusively under the chemistry wikiproject for example here. ArthurPSmith (talk) 18:08, 9 June 2017 (UTC)
My world view is very primitive: I use P31 only for the most isolated physical entity (or an abstract one, if physical manifestations aren't possible), constrained or limited in space and time.
One reason for this is to get information back from ... P31 ... statements easily.
If we use P31 without prioritization, then we have a hard time saying whether instance of (P31) BRCA1 (Q227339) means a physical object or something still abstract.
With my view, (... P31 ...) is always physical, without the need to walk the full P279 tree.
Ideally anything physical should use P279 and physical object (Q223557), but there are dark corners on Wikidata where this doesn't yet work.
Abstract concepts (gene) should be even higher than the physics abstract entity (Q7184903), because they are abstract.
With genes we have everything linked to one item, gene (Q7187), which is ambiguous: the abstract gene can be different from subclasses of physical genes.
This is not entirely correct. Metaclasses exist as real concepts in human discussion; there are some examples on Help:Basic Membership Properties. See also Wikidata:WikiProject Ontology/Modelling and references there, particularly on the experience with Cyc's higher-level classes. All that said, in general you are right that instance of (P31) should be used sparingly for abstract things - metaclasses and higher-order or ambiguous classes (as gene perhaps is) should be rare. ArthurPSmith (talk) 20:15, 9 June 2017 (UTC)
That Help:Basic Membership Properties page is rather nice, and somehow I hadn't managed to see it before. Do you know if there's another similar page anywhere that is even more example driven — ideally with lots and lots of examples, and from many different fields, rather than just one or two for each concept? Mostly I've been muddling through pages like Wikidata:Item classification and User_talk:TomT0m/Classification, but they tend to get very technical and very abstract very quickly, so in practice I've largely just been trying to follow the examples already in place, and have sort-of built up a mental model over time of how to use them. But occasionally someone will revert a change I make, and it can sometimes be quite difficult to know where (or whether) my understanding is actually wrong, or when the examples I'm copying might themselves be faulty. (It was a major revelation to me, for example, when Infovarius pointed out that position held (P39) being a subproperty of instance of (P31) essentially means that anyone holding a position that way is also an instance of it.) Or is there a different part of Wikidata where questions about these sorts of things tend to be raised, other than here? --Oravrattas (talk) 20:30, 9 June 2017 (UTC)
@Oravrattas: WD is a wiki, meaning this is collaborative work: discuss your topic with other contributors in a dedicated WikiProject where a common view can be defined. I think one of the big problems we have in WD is a very weak tendency to work with other contributors, so it is very difficult to create coherent modelling. The worst are people with the ability to run a bot: they can apply a specific structure to a large number of items without any preliminary discussion about how the items should be defined and which properties should be used with which values. Snipre (talk) 00:53, 10 June 2017 (UTC)
By the way, people with more expertise in medical areas than I have should review recent edits by Special:Contributions/EricSadou - this user has changed a lot of "subclass of" statements to "instance of" in a way that I think is not correct. I've fixed one or two of them but it probably should be more systematically reviewed. ArthurPSmith (talk) 20:19, 9 June 2017 (UTC)
@D1gggg, ArthurPSmith: Please don't forget the granularity characteristic of an ontology: we are free to decide down to which level of detail we want to describe a concept.
Something like "use P31 only for the most isolated physical entity" can be true, but it is not always the truth. Take the example of a molecule of ethanol which was in my glass of wine yesterday evening: I can in theory create an item for that molecule, because I can describe its position at a certain time and thus isolate it, but is there any interest in doing so? Do we want, and are we able, to identify each molecule in the universe, or at least some of them? The answer is no: there are no famous or historical molecules as there are for humans or animals. So even if we are able to distinguish one molecule among billions of similar ones, there is no point in creating an item for each unique molecule. Defining ethanol (Q153) as an instance of chemical compound (Q11173) is therefore not wrong if we decide not to go deeper in the level of detail, i.e. if we choose to set the lowest granularity at the group of identical molecules and not at the isolated molecule.
Let me continue the explanation with another example. Currently we accept creating one item for each person (respecting some criteria; see the notability rule), so in the classification of humans the lowest entity is the individual. But this is a choice and not a rule: why couldn't we create different items for the different aspects of one unique person? One item for "Albert Einstein as scientist", another for "Albert Einstein as student": in that organization, the existing item Albert Einstein (Q937) would not be an instance of human (Q5) but a subclass of human (Q5), as one person can be considered through different aspects (profession, life stages, ...).
So for me the Help:Basic Membership Properties page is not a good classification rule for WD, because it tries to create one unique way of defining what is an instance and what is a subclass without taking into account how we really model the concepts in WD. It doesn't respect the first question we have to answer when speaking about ontology: what do we want to model and classify? First we have to answer that question in each field of WD, and only then can we create the classification rules to distinguish between instance and subclass.
My proposal is a more pragmatic rule to distinguish an instance from a subclass: the instance is the most detailed concept in a classification. I can distinguish the class of methanol molecules from the class of ethanol molecules using different values for a set of common properties in WD, but currently I can't describe two different molecules of ethanol using the existing properties, e.g. the precise location of each of them in my glass of wine yesterday.
So the question of instance vs. subclass has no universal answer; it is rather a dynamic question whose answer differs depending on the concepts we are modelling. Snipre (talk) 00:20, 10 June 2017 (UTC)
@Snipre: definitions like "more detailed concept" aren't self-explanatory.
@Snipre, D1gggg: naturally I disagree somewhat with both of you here (yes there is a third intermediate option :) - Snipre, "ethanol" is not actually the "lowest granularity" in several respects even without talking about individual molecules isolated to a particular physical location etc. Does "ethanol" refer to the abstract arrangement of atoms in an individual molecule, or to the general concept of the liquid substance (or possibly solid or gas state) composed purely of those molecules? Is "ethanol" in different contexts (mixed with water for example, or with various other chemical compounds) the same concept? If some of the hydrogen atoms are actually deuterium, if a C is C-13 or the O O-17 or O-18, is that a different concept? For scientific purposes those different substance states, contexts, or isotopic arrangements imply differences in properties, so that "ethanol" is not really a uniform consistent thing under all conditions. We might (at least hypothetically) want different wikidata items for each of those conditions, in order to specify precisely relevant properties. I think those different items, were they to be created, would all be "subclasses" of "ethanol", because they are describing a subset of the conditions covered by the general concept "ethanol", so their specific instances would also be instances of the general concept. But is a specific instance of a given type of molecule actually an instance of the general concept "chemical compound"? I'm not so sure, and our definitions aren't so clear. There are similar issues in the area of "products" - for example blue cheese (Q746471) is an "instance of" type of cheese (Q3546121) while a "subclass of" cheese (Q10943) which seems right to me - so is "chemical compound" a "type" in that sense? ArthurPSmith (talk) 13:43, 12 June 2017 (UTC)
@ArthurPSmith: yes, it seems best to use a separate item to make P31 claims over "types" of cheese or chemical compounds.
I also think that ethanol can be "physical" i.e. we can have at least one claim
Apparently the «gene» term is polysemous. The cleanest way I can imagine to handle this is to create an item for each sense, and to make the polysemous item a superclass of all of them. And to look at existing ontologies and definitions to check that all of this matches.
My guess would be: at the concrete level, a gene instance is a (part of a) molecule. Then a gene like «LacY» is the class of all these molecules, usually defined by their common sequence. If «LacY» has alleles, they are all subclasses of «LacY» (say «LacY1» and «LacY2»). As a consequence, «LacY1» and «LacY2» are both classes of molecules, just as «LacY» is; they are defined by a more specific sequence.
I think there are two ways to see the «gene» concept. One is as a class of molecules, a superclass of «LacY», «LacY1», «LacY2» and all the similar classes, like «lacZ»; all the instances of «gene» would share features like their opening and ending sequences, so it's no problem to define the class clearly. But there may be other subclasses of it that are neither gene classes nor alleles, so it's probably useful to have a distinct «gene» item that is the metaclass of all the «LacY», «lacZ», … classes, and an «allele» item, a metaclass for all the «LacY1», «LacY2», … classes, which would not make sense as a class of molecules. author TomT0m / talk page
Inconsistent parsing of dates
I've just noticed this when working with start time (P580): entering a date with different punctuation can change the way it's interpreted.
9 7 2017 → 9 July 2017
9/7/2017 → 7 September 2017
9.7.2017 → 9 July 2017
9-7-2017 → 9 July 2017
7 9 2017 → 7 September 2017
7/9/2017 → 9 July 2017
7.9.2017 → 7 September 2017
7-9-2017 → 7 September 2017
I understand that this sort of date is ambiguous and we shouldn't rely on it being interpreted consistently, but why does one set of punctuation give a different result to another? It can't be that using slashes is unique to MDY notation - I've been writing DMY dates this way all my life. Andrew Gray (talk) 21:37, 9 June 2017 (UTC)
I think Wikidata has to go by common international standards, not necessarily what you've been doing all your life. SharkD Talk 22:25, 9 June 2017 (UTC)
Well, yes, I agree entirely :-). The international standard is YYYY-MM-DD; we support the "traditional" xx-xx-YYYY for convenience, but there's no standard saying "dashes mean D-M-Y but slashes mean M/D/Y" - my comment was meant as a demonstration that punctuation between numbers isn't at all linked to a local style of date notation. Apologies if it was unclear... Andrew Gray (talk) 22:33, 9 June 2017 (UTC)
It's not consistent for any given punctuation either, e.g. "9/7/2017" gives "7 September 2017" but "19/7/2017" gives "19 July 2017", while "7.9.2017" gives "7 September 2017" but "7.19.2017" gives "19 July 2017". I would like to see it give the user a list of plausible interpretations when a date format is ambiguous and ask them to pick which one is right, instead of acting like there is only one way to interpret it. - Nikki (talk) 00:00, 10 June 2017 (UTC)
Unclear, but on behalf of us Americans who are perhaps the most prominent users of MM/DD/YYYY, I apologize. :) -- Fuzheado (talk) 10:26, 12 June 2017 (UTC)
Video game remakes
Sometimes a video game is remade a few years after release, with improved graphics, additional dialogue between different characters, etc. What property should I use to indicate when this has been done? Is based on (P144) the property I should use? Thanks. SharkD Talk 22:23, 9 June 2017 (UTC)
In this case exact-match is redundant with Entrez Gene ID. However, the exact-match URL (http://identifiers.org/ncbigene/) looks more canonical than the Entrez Gene URL. I would recommend changing the formatter URL of Entrez Gene to use this more canonical URL, if the scope of Entrez Gene is subsumed by the scope of identifiers.org/ncbigene (I know nothing of life sciences). In fact it may be useful to create an external-id for identifiers.org and track that too: this ID would be "ncbigene/18054". The value of such a redundant ID is that it gives access to all kinds of life-science entities. --Vladimir Alexiev (talk) 07:31, 12 June 2017 (UTC)
Idea for new editing tool
Here's a query for people newly elected as United Kingdom MPs, rendered as a table with - at the time of writing - gaps where data is missing for things like "website", "Twitter name" and "Facebook profile".
We could turn that into an editing interface, where people could enter text where gaps exist, and either have that saved directly into Wikidata if the user is signed in, or into a Wikidata-game-like interface for checking by a signed-in editor.
The columns requiring Q values would not be editable (or we could have an autosuggest-style method of adding them), neither would the fields already holding data.
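A sketch of the kind of query that could drive such an interface, with OPTIONAL clauses producing the editable gaps (the parliamentary-term item Q30524710 and the choice of contact properties here are assumptions for illustration):

```sparql
SELECT ?mp ?mpLabel ?website ?twitter WHERE {
  ?mp p:P39/ps:P39 wd:Q30524710 .        # position held: MP in the 57th UK Parliament (assumed item)
  OPTIONAL { ?mp wdt:P856 ?website . }   # official website; an unbound value is an editable gap
  OPTIONAL { ?mp wdt:P2002 ?twitter . }  # Twitter username
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
ORDER BY ?mpLabel
```

Each empty ?website or ?twitter cell in the result table would become an input field in the proposed tool.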
I have always thought that this would be a wonderful idea, indeed. Many people would take on the challenge to find a picture for each type of cat, an address for every museum, a batting average for every baseball player. Right now, people entering data are Wikidata experts. With such an interface, people entering the data would be cat/museum/baseball experts, not Wikidata experts. I really hope someone makes it a reality. I often write such queries, but for now I only post them to the relevant WikiProject, where they aren't exposed enough to domain experts. Syced (talk) 03:38, 11 June 2017 (UTC)
I think it would be good if the SPARQL tool would have the option to provide data entry. Otherwise there's work on automatically generated lists for Wikipedia. A good UI for lists might also solve this use-case. ChristianKl (talk) 12:11, 13 June 2017 (UTC)
User script help
Could somebody please help me with this user script I am working on? It is based on the currentDate script. It is supposed to automatically insert "English Wikipedia" when I add the "imported from" property to a reference. It is adding the text correctly I think, but the "save" link stays gray instead of turning blue. Thank you! SharkD Talk 18:56, 11 June 2017 (UTC)
@Sjoerddebruin: Can you tell me where such a decision was made? I agree that when it comes to manually adding facts it's a lot better to add them from a source outside of Wikipedia but if someone adds a fact that he knows from a Wikipedia source, "imported from" is the best way to provide provenance. ChristianKl (talk) 12:31, 13 June 2017 (UTC)
Are there any plans to notify external database owners in some kind of "official" way about errors in their databases? Here on Wikidata there are many constraint violation reports, sometimes with dozens of entries stemming from double entries in other databases. I know that there are ongoing efforts with VIAF, GND and partially also IMDb to fix their databases by removing double entries, but what about other databases (e.g. those of big sport organizations (FIFA etc.), federal databases, or databases of science organizations)? Of course users can do this, but at least I got no reaction to such efforts (which is not really surprising, I think; such organizations probably get tons of emails every day). I think this should be made more coordinated and done in a more "official" way by Wikimedia, or at least some representative Wikidata people, to have an impact. Steak (talk) 20:08, 11 June 2017 (UTC)
Interesting idea. I imagine a "report card" could be generated that could publicize to database owners what inconsistencies we have found with their data. I discussed something similar at the recent Wikicite conference about how we might report back to web sites how good/bad their metadata is for Citoid/Zotero. It's hard to find a direct channel back to an organization's "responsible person," but a periodic public report might do the trick. -- Fuzheado (talk) 07:10, 12 June 2017 (UTC)
Not all data that triggers a constraint report is "corrupt"; indeed, most is not. Several of us are working with external data managers to inform them of relevant constraint reports and other methods of cooperation; case studies are also being collected; see past issues of the 'Wikidata weekly summary'. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits21:21, 12 June 2017 (UTC)
I have emailed some photo libraries about metadata corrections on an ad hoc basis, but we should have a contact process for error correction as part of the upload process. They should have an email address as a fallback. Slowking4 (talk) 21:25, 12 June 2017 (UTC)
Our focus is sharing your knowledge and experience within the community. We suggest a lot of different formats and topics, and we also have room for unusual formats and new ideas.
Administrative territorial entities: How to add links to geoshapes (that are not located on Commons)?
Since February 2017, Swisstopo has been publishing the geoshapes of administrative territorial entities in Switzerland as linked data: [4].
They have approached me to find out if and how links to these geoshapes could be added to the corresponding items on Wikidata (i.e. cantons, districts, and municipalities in Switzerland).
With a quick search I was able to dig up: geoshape (P3896). But despite its more general label, this property seems to be reserved for geoshapes that are hosted on Wikimedia Commons. – What is the general idea behind having such a property, specifically targeted at Wikimedia Commons? Don't we want to link to geoshapes (available as open data) elsewhere?
So, would you advise in favor of ingesting them all into Wikimedia Commons? If yes, is there a way to automate the process? Given that the data is published as linked open data and will be updated in the future when changes to the municipality structure are made, this would be helpful.
Wouldn't it also be helpful in this case to have a statement for each Wikidata item that points to the official shapefile - if only to facilitate the automated ingest of the shapefiles into Commons? – For experimental purposes I have for now added a second formatter URI for RDF resource (P1921) statement to Swiss municipality code (P771), which actually provides the link to the item on the Swisstopo site. It's rather well hidden from the user; so I'm not entirely sure whether that's the way to go.
And there is yet another issue that would need to be resolved on the Wikidata side: on Wikipedia and Wikidata, Swiss municipality codes are "normalized" to four digits (thus, "351" becomes "0351"), while this is not the case in the official reference database. Of course, you can deal with this specifically when writing SPARQL queries pulling data from both services (see https://tinyurl.com/jqkkwrv for an example), but this requires some extra code to be written by people who are aware of the issue to begin with. – I guess it would make sense to correct the Swiss municipality codes on Wikidata.
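One way to bridge the two conventions at query time, without touching the data, is to strip the padding on the fly; a sketch (the cast is safe because the codes are purely numeric):

```sparql
SELECT ?item ?paddedCode ?officialCode WHERE {
  ?item wdt:P771 ?paddedCode .                          # Wikidata form, e.g. "0351"
  BIND(STR(xsd:integer(?paddedCode)) AS ?officialCode)  # official form, "351"
}
```

The derived ?officialCode can then be joined against the Swisstopo endpoint in a federated query, as in the tinyurl example above.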
MySociety (Q10851773) have now completed publishing a "five part series examining how to use Wikidata to answer the question: 'What is the gender breakdown of heads of government across the world?'". Here is the full set:
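For reference, the core of the question the series examines can be sketched in a single query (real-world caveats such as acting heads of government, disputed states, or missing P6 values are ignored here):

```sparql
SELECT ?genderLabel (COUNT(DISTINCT ?person) AS ?count) WHERE {
  ?country wdt:P31 wd:Q6256 ;   # country
           wdt:P6 ?person .     # head of government
  ?person wdt:P21 ?gender .     # sex or gender
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
GROUP BY ?gender ?genderLabel
```

The series itself walks through exactly these caveats, which is why it runs to five parts.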
I am not sure how to indicate that a building is under construction. I think we need such information, for example for queries like "tallest skyscrapers".
How can I query this (filtering out incomplete buildings, noting that the majority of completed buildings do not have significant event (P793) with construction (Q385378) filled in)? And how can I record other states: proposed, approved, under construction, on hold, architecturally topped out, structurally topped out, demolished, etc.? --Jklamo (talk) 15:06, 12 June 2017 (UTC)
Exclude items with the event Q331483 regardless of qualifiers, then exclude those with Q385378 that have a start time (P580) but no end time (P582).
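Put into SPARQL, that advice might look roughly like this (an untested sketch; Q41176 is used here as the building class, and the item IDs are those mentioned above):

```sparql
SELECT ?building ?height WHERE {
  ?building wdt:P31/wdt:P279* wd:Q41176 ;   # building
            wdt:P2048 ?height .
  # exclude items carrying the event mentioned above, regardless of qualifiers
  FILTER NOT EXISTS { ?building wdt:P793 wd:Q331483 . }
  # exclude construction (Q385378) events that have started but not ended
  FILTER NOT EXISTS {
    ?building p:P793 ?evt .
    ?evt ps:P793 wd:Q385378 ;
         pq:P580 ?start .
    FILTER NOT EXISTS { ?evt pq:P582 ?end . }
  }
}
ORDER BY DESC(?height)
```

Note that this only filters out buildings explicitly marked as incomplete; as Jklamo points out, most completed buildings carry no construction event at all.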
Sweet! In comparison, the English Wikipedia is currently at around 780 million edits, so I don't think it'll take too long for us to pass them. Someone may even be able to chart the number of edits over time and make a graph (I'd do it myself, but I'd spend way too much time researching how to properly do it to bother). Jon Harald Søby (talk) 21:39, 12 June 2017 (UTC)
[Chart: Timeline of Wikidata by number of edits; axes: edits (hundreds of millions) vs. point in time.]
However, we should recognize that this number is heavily inflated: we could have acquired all the current content of WD with fewer than 100 million edits. optimization (Q24476018) is a foreign word for Wikidata. XXN, 14:23, 13 June 2017 (UTC)
A two-dimensional print (Q11060274) (or in general instances of artwork superclasses) has physical dimensions height (P2048) and width (P2049). There are cases where different physical dimensions can be found in sources, e.g. for image size and paper size.
Which ones are typically used for physical dimension properties?
One could add multiple values and qualify them with qualifiers. Good idea? Which qualifiers fit best (property/value pairs)? (barely done for instances of image (Q478798) and subclasses thereof until now)
Should ranks be used to prefer one type of dimensions over another?
Which WikiProject or users have experience in this field?
Well in the paintings project we avoid these issues by ignoring all framing sizes altogether (and most images of paintings have their frames cut off as "sculpture" on Commons anyway), so I would go with image size for consistency. If you have both measurements per print and aren't afraid of the trouble to model them in such a way that others can follow what you are doing, go ahead. Modelling prints on Wikidata is still a bit up in the air, since no one wants to tackle them. Potentially it could be a nightmare if every library in the world wants to upload their print-of-a-print, but then some prints of old originals might be notable if the original prints are lost or quite rare, etc. Someone I spoke with recently felt that we should upload the entire Hollstein catalog (Gah...) and then each library can link their version to the iconic item for that print. I can't decide, but good luck to you! Jane023 (talk) 17:37, 12 June 2017 (UTC)
Thanks for your answer.
The task I work on is in fact a repair job which came to me via #Unitless claims (section above). There was a large set (~4000) of print items identified which are part of (P361)Welsh Landscape Collection (Q21542493), but they have a problem of missing units for their physical dimensions (P2048 and P2049). Since all data is equipped with an external identifier and it looked consistent, I decided to invest some time and “fix it” (the missing unit problem). However, I was made aware of the fact that all (?) of them have bad data for width and height, mixing up image height (assigned as width) and paper width (assigned as height). This was imported by another user from Commons, where it is still wrong as well.
Fortunately, uniform sources are provided and they are even accessible in JSON format, so I now have information of both image size and paper size for all ~4000 items in a local database (I crawled all of them). They “just” need to be prepared for the correction job, which should include the addition of a source as well.
One more question: the physical dimensions of a 2D artwork are typically given as “width × height”, aren't they? I just want to make sure that it isn't the other way round ;-) —MisterSynergy (talk) 18:30, 12 June 2017 (UTC)
Sorry, no idea about prints, but with paintings, it is generally H x W and not the other way around (see e.g. here). Having said that, just do a spot check on the data looking at the prints to find out which way your dataset does it - most prints are pretty obvious, since they tend towards the page size and rectangular shape. 4000 is a lot - thanks for your work on that! Jane023 (talk) 21:53, 13 June 2017 (UTC)
Mediawiki API Service for WDQS
We have deployed the first iteration of MediaWiki API support for the Wikidata Query Service. Please see the manual for the full documentation; the main highlights are outlined below.
The service allows calling out to some MediaWiki APIs from SPARQL in order to obtain information not contained in the RDF data and the WDQS database. See the list of the supported APIs in the manual.
Currently a small subset of existing APIs is supported, and we expect the community to nominate more services and contribute service templates to extend the API. Please see the manual for a description of service templates. Note that we do not plan to support any APIs that modify data, edit wikis, etc. - only read-only querying APIs, and only APIs that do not require any authorization, can be supported.
Currently supported hosts are: *.wikipedia.org, commons.wikimedia.org, www.mediawiki.org, www.wikidata.org, test.wikidata.org. If any other wikis need to be supported, please leave a comment to the developers and we will enable them.
If there are any problems or questions, please contact the developers on the wikidata list, #wikidata on IRC, or on wiki, or submit a Phabricator issue.
TODOs:
Add more services (nominations welcome)
Support services that accept multiple titles as input in one query
One use case (probably already possible via a generator): confirm that all transclusions of "Template:Infobox park" are marked as parks on WD. --Edgars2007 (talk) 11:20, 13 June 2017 (UTC)
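Per the MWAPI manual, that check might look roughly as follows (the generator parameter names are assumptions in the style of the service templates; consult the manual before relying on them):

```sparql
SELECT ?title ?item WHERE {
  SERVICE wikibase:mwapi {
    bd:serviceParam wikibase:api "Generator" ;
                    wikibase:endpoint "en.wikipedia.org" ;
                    mwapi:generator "embeddedin" ;          # pages transcluding the template
                    mwapi:geititle "Template:Infobox park" ;
                    mwapi:geinamespace "0" .
    ?title wikibase:apiOutput mwapi:title .
    ?item wikibase:apiOutputItem mwapi:item .
  }
  # keep only pages whose item is NOT a park (Q22698) or a subclass thereof
  FILTER NOT EXISTS { ?item wdt:P31/wdt:P279* wd:Q22698 . }
}
```

The surviving rows would be the pages needing attention on the Wikidata side.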
I have created a userscript which simplifies the process of marking two pages as duplicates. This is necessary when two pages about the same topic exist in the same wiki and are connected to two duplicate items (example).
You will see a "Mark as duplicate" link in the sidebar. When you click on it and type the ID of the second item, the script will mark the current item as a duplicate (using Wikimedia duplicated page (Q17362920)), identify which wikis have duplicate articles, and try to request merging them (using templates from Template:Merge (Q6919004)).
Please be careful when using it: you may make a mess in many wikis within a short while. Feel free to report any problems you find. Good hunting! Matěj Suchánek (talk) 15:17, 9 June 2017 (UTC)
Excellent! Thanks for writing this. With all the new upcoming redirects, we might need it even more. --- Jura15:43, 12 June 2017 (UTC)
described by source (P1343) has a constraint that the value should be an item that is an <instance of>work (Q386724). Since the contents of dictionaries and encyclopedias vary by edition, shouldn't we reference a specific edition rather than the work? (With a page number and specific entry to boot, to my way of thinking). - PKM (talk) 21:32, 13 June 2017 (UTC)
Have a look at, for instance, "Sultan of the Ottoman Empire" and ask yourself if this is for real. This structure is so baroque that I feel it has nothing to do with the things I am working on. When you compare it with other sultans/monarchs, these have different classes, so the application of classes is inconsistent as well. When you look at the upper classes, it cannot be explained with a straight face without a straitjacket. How did we get here? Thanks, GerardM (talk) 07:35, 14 June 2017 (UTC)
Reasonator seems to be picking out a particularly long thread of class relationships there - if you follow the subclass hierarchy through "activity" for example you get a much shorter list. In any case, yes, wikidata class relationships are rather messy, though I think they are slowly improving as people see problems and address them. Please participate in Wikidata:WikiProject Ontology (see in particular the Problems page) to help. ArthurPSmith (talk) 13:40, 14 June 2017 (UTC)
Next IRC office hour on June 28th
Hello,
Our next Wikidata IRC office hour will take place on June 28th at 18:00 UTC (20:00 in Berlin), in the channel #wikimedia-office (connect).
For one hour, you'll be able to chat with the development team about past, current and future projects, and ask any questions you want.
Does anyone know if there is a step-by-step guide for making QuickStatements 2 batch edits? I've been writing commands up using the version 1 format, then importing them into the new version, but I can't seem to figure out how these people are making batches that show up in the batch history. – Pizza1016 (talk | contribs) 03:18, 14 June 2017 (UTC)
I'm pretty sure the option 'Run in background' gives you the option to describe the set of commands you wish to run, after which the set is queued as a single batch. Mahir256 (talk) 04:47, 14 June 2017 (UTC)
I would also love to see a guide for QuickStatements 2. I am one of "these people" who make batch runs, but I am still clueless about what that means. I use version 1 syntax, and at the bottom you have two buttons: "Run" and "Run in the background". I usually click "Run", but once or twice I clicked the other button instead; the result was the same but it took a bit longer. --Jarekt (talk) 02:11, 17 June 2017 (UTC)
Facto Post – Issue 1 – 14 June 2017
Editorial
This newsletter starts with the motto "common endeavour for 21st century content". To unpack that slogan somewhat, we are particularly interested in the new, post-Wikidata collection of techniques that are flourishing under the Wikimedia collaborative umbrella. To linked data, SPARQL queries and WikiCite, add gamified participation, text mining and new holding areas, with bots, tech and humans working harmoniously.
Scientists, librarians and Wikimedians are coming together and providing a more unified view of an emerging area. Further integration of both its community and its technical aspects can be anticipated.
While Wikipedia will remain the discursive heart of Wikimedia, data-rich and semantic content will support it. We'll aim to be both broad and selective in our coverage. This publication Facto Post (the very opposite of retroactive) and call to action are brought to you monthly by ContentMine.
If you wish to receive issues of Facto Post on English Wikipedia, please add your name to our mailing list. You can always remove it. Newsletter delivered there by MediaWiki message delivery
What do you mean exactly by "post-Wikidata"? Good luck with the Facto Post, looking forward to the next issues! Syced (talk) 05:42, 15 June 2017 (UTC)
Ah. I have tried thinking about Wikimedia integration around Wikidata. I think this is happening, but it is hard to explain to anyone who is not already a Wikimedian working on several of the sister projects. The WMF is doing a review of prospects for 2030. I have contributed to both stages so far; in the first stage I said Wikidata should by then have one billion statements. Why not? But this also doesn't say enough.
So here I'm also talking about integration, but in a more encyclopedic way. In its 15th edition, the Encyclopædia Britannica innovated (Micropædia, Macropædia, and Propædia); but this was not really a success. Wikipedia has innovated also, but we need to look at both technologies and groups of people, to understand the potential for successfully building a new kind of reference work.
The input-output issues around Wikidata seem like a good way to understand things. Wikidata inputs (automated, semi-automated, and via the fact mining which I'm working on at WikiFactMine). Holding areas such as mix'n'match, potentially LibraryBase. Wikidata outputs, not just to infoboxes but via SPARQL, and some form of WikiCite export.
@Charles Matthews: Thanks a lot for starting this initiative. I think the movement needs people as courageous as you for advancing in its mission. Are you coming to the Wikidata conference? Did you apply for a grant already? If not, please do so, as this is the kind of submissions that really need the attention of the community.--Micru (talk) 04:48, 16 June 2017 (UTC)
There is a list of allowed protocols for properties of the URL data type. Right now irc is allowed but ircs is not. Can you file a ticket on phabricator.wikimedia.org? Then we can add it. --Lydia Pintscher (WMDE) (talk) 11:49, 16 June 2017 (UTC)
Surfacing more of our data in Wikipedia templates
For values such as population, Wikipedia templates often only list the most recent value. We could display an icon like 📊 after the number; if the user clicks on it, we could show a popup chart of the historical population data.
What do you think about such a feature? Would it be well-received by the Wikipedians? ChristianKl (talk) 15:29, 16 June 2017 (UTC)
I'm not entirely sure that this is something that makes sense to model directly like this, any more than it would make sense to model what electoral district a lake is in. Any given geographic point will usually be within multiple electoral districts (i.e. at various levels of government), which will often have geographies that intersect in wildly different ways. Depending on the level in question, a town will often either contain many electoral districts, or overlap with multiple others. As there isn't usually a direct mapping like this, this seems like something that should really be queried a different way — e.g. by having shapefiles for the various geographic concepts you want to compare. --Oravrattas (talk) 22:47, 16 June 2017 (UTC)
Districts are not fixed - that is what gerrymandering is about. So the best thing is to have maps associated with electoral districts. Also, a town does not need to be fully in one district; just consider the electoral district of Kensington as an example. Thanks, GerardM (talk) 07:18, 17 June 2017 (UTC)
pinyin (Q42222) isn't a work and thus not in the domain of title (P1476) and thus there's no reason to use it in this context. In general I would recommend you (PokestarFan) to stick to cases where the right edit is more obvious. ChristianKl (talk) 17:22, 17 June 2017 (UTC)
The question is how to indicate the contents of each column in the property / item, including:
See for example c:Data:Wikidata/St.Petersburg.tab. If a new "population table" property is created for linking to these types of tables, I think new properties (similar to Wikidata property (P1687)) are needed to link, as statements in the "population table" property, to the point in time (P585) and population (P1082) properties, with a qualifier to indicate the column ID (i.e. "year" and "population") and the unit used ("human").
The JSON table in Commons allows adding an optional "sources" field for referencing the source of the material, but in many cases each row of the table will have a different source (e.g. the census of that year), so probably, by convention, in those cases a specific column (or columns) will be needed to store the specific source (and a specific property to indicate the format, e.g. reference URL (P854) and/or determination method (P459)).
The same table can contain different data values (e.g. temperature and precipitation in c:Data:Ncei.noaa.gov/weather/New York City.tab), but maybe each property should be allowed to describe only a single one (e.g. separate "temperature data" and "precipitation data" properties instead of a single "weather data" property), ignoring the other columns even if they are useful for data visualization purposes. But I think this should also be discussed from a technical point of view.
These description properties for simple tables are meant to be specified in each property definition, as each table file will refer to just a single Wikidata item; but for complex tables containing data for multiple items (e.g. c:Data:Bea.gov/GDP_by_state.tab or c:Data:Dolmens_of_the_Preseli_Hills.tab) another qualifier will be needed in the item to indicate which column contains the actual data for that item (e.g. in the Alabama (Q173) item, a qualifier would be used on the "GDP table" property to indicate that the "AL" column contains the applicable data). It is also possible to forbid the use of such complex tables, forcing by convention that each table contains data for only a single item, but if tables imported from recognized sources usually contain data for multiple items this may be impractical, requiring conversions.
I just want to discuss some alternatives first, with more experienced users, before creating a new RfC with a more mature approach. Thanks for your ideas!! —surueña07:33, 12 June 2017 (UTC)
@Suruena: great questions and ideas there! One additional thing that I think would be really useful would be some way to tie the data tables into SPARQL queries, so that the data from the table could actually be queryable and/or returnable. Maybe we can get the SPARQL endpoint folks to weigh in? In response to your first three questions/statements, it seems we ought to encourage the use of the Wikidata property IDs as column headings in the data tables, or otherwise have some way to tie them directly to the files, rather than using qualifiers within the Wikidata item (I'm not sure how you'd even do that, as a qualifier can only have one value, but you're trying to link two values together?). Another option is to use file format (P2701) as a qualifier to point to very specific file formats that list the individual properties used in those files. Your fourth issue is also something that could be important; I think there is a good case for one or two new properties there – 'key field' and 'key value' perhaps, to be qualifiers on the table statement? ArthurPSmith (talk) 14:16, 12 June 2017 (UTC)
Tabular data is data that can't be easily queried with SPARQL. As a result, I don't support moving properties like population (P1082) to the new data type. The motivation for using the new datatype is that we think keeping the relevant data inside of Wikidata takes up too much space and the ability to query the data with SPARQL isn't useful enough to pay that price. ChristianKl (talk) 15:21, 12 June 2017 (UTC)
That's a very good point; I agree that properties meant to replace current facts like population shouldn't be created unless table files can be queried with SPARQL. RDF support for tabular data is currently being implemented (T163921) – is this enough to query this data from SPARQL? In any case, my proposal is to replicate the data model, so table files are just a more convenient way to store a long statement group, i.e. one column is the value of the property (e.g. population count), and any additional column is a qualifier (e.g. point in time) or reference (e.g. reference URL) for that value. This should make it as simple as possible to support them in queries, while being more convenient and allowing interactive graphs and filtered lists without the need to duplicate information. @ChristianKl: It's also true that we can start with tabular properties for data that takes a lot of space but is not very relevant for queries – do you have examples, please?
@ArthurPSmith: I've been elaborating on the proposal in my mind; I was thinking of something like this: Suppose a new property "population evolution" that links to .tab files like c:Data:Wikidata/St.Petersburg.tab. We need to describe, in the statements of any property with the Tabular data type, the format of the files that it can be linked with, i.e. we need to indicate that one column will hold the population value and another will be the qualifier with the point in time. Then, something like the following statements and qualifiers would be in the "population evolution" property:
Statement 'Tabular data column value' → "population" (new String property, name of the column in the linked file)
Then, in the Saint Petersburg (Q656) item, the statement "population evolution" would be just a link to c:Data:Wikidata/St.Petersburg.tab. It's important to describe the allowed format so that constraint violations can be detected (e.g. a mandatory column is not present, or the values of the column do not meet the constraints required by the property, for example population (P1082)). Other qualifiers will probably be needed, e.g. whether the .tab column must have string/number/boolean/etc. format, or the allowed units; and probably also a convention for how to write a "no value" / "unknown value" and ranks in the .tab files. A more complex approach would be needed to support complex tables, so it is probably better to allow only simple tables (all columns referring to just a single item) and have scripts that take complex tables and split them into simple tables.
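For illustration, a linked .tab file under this proposal might look roughly like the following. This is only a sketch of the Commons tabular-data JSON format; the description, column names, and figures are made up for the example, not copied from the real St.Petersburg file:

```json
{
  "license": "CC0-1.0",
  "description": { "en": "Population of Saint Petersburg over time" },
  "schema": {
    "fields": [
      { "name": "year", "type": "number" },
      { "name": "population", "type": "number" }
    ]
  },
  "data": [
    [ 1897, 1264900 ],
    [ 2010, 4879566 ]
  ]
}
```

The "population evolution" property's own statements would then declare that the "population" column carries the value and the "year" column the point in time (P585) qualifier.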
I think that the use of Wikidata property IDs as column headings is a possibility, but this would be more difficult for casual editors to modify. Could you give an example of file format (P2701) usage? Would it be something like the description of each column in statements, but as a separate item? Thank you very much to both of you for your great feedback! —surueña06:57, 13 June 2017 (UTC)
Yes, each column in the file described by statements in a separate item for the file format is what I was thinking. I think that might be a better option than creating a distinct property for each data format; on the other hand, doing it via the property as you suggest would mean the relationships were very solidly defined from the start, constraints could be easily checked, etc. So either way could work. As to ChristianKl's specific domain comment – rather than population we have an old property proposal for temperature data here, which I think would be ideal for trying to get something like this working. On the other hand, I'm not sure what existing Commons tabular files for temperature exist that would be a good starting point there.... ArthurPSmith (talk) 15:34, 13 June 2017 (UTC)
I don't think that it's important to seek use-cases for the new feature. It's much better to focus on how to model specific domains well. But if you want a use-case: if we had the goal of integrating most of the historic weather data that's in the public domain at the highest resolution available, that's likely too much data to store directly in Wikidata. ChristianKl (talk) 12:01, 13 June 2017 (UTC)
@ChristianKl: I agree that historic weather data is a good candidate for tabular data, even if it cannot be queried from SPARQL. In any case, I've asked the developers to see what the current plan about this is (see Wikidata:Contact the development team#Tabular data in SPARQL queries?). @ArthurPSmith: Thanks for pointing out that property discussion. The data file proposed seems representable by the approach I proposed (c:Data:Ncei.noaa.gov/weather/New York City.tab, used to generate the Wikipedia table en:User:Yurik/WeatherDemo through Lua scripts), either by defining a different Wikidata property per column ("highest temperature evolution", "average high temperature evolution", "lowest temperature evolution", "precipitation evolution"...) or a single "weather table" property (probably with optional columns when data is not available for some locations). I prefer one property per column, and the scripts for printing the Wikipedia tables can probably work fine with multiple properties, but that is another point to be clarified. —surueña06:07, 19 June 2017 (UTC)
I can't use Harvest Templates as it wants me to specify a parameter.
By specifying a widely-used parameter such as "name" I can trick Harvest Templates into giving me a rather complete CSV list of pages, but then I can't feed that list to QuickStatements, because QuickStatements does not have the capability to detect whether each item has a P31 or not already (so I would end up adding park (Q22698) also to items that already have a more specific P31 such as national park (Q46169), which would be bad).
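One possible workaround is to pre-filter the candidate list with a SPARQL query that keeps only items with no instance of (P31) at all. This is only a sketch: the VALUES clause (here with the sandbox item Q4115189 as a placeholder) stands in for whatever set of items the CSV from Harvest Templates gives you:

```sparql
# Keep only candidate items that have no P31 statement whatsoever,
# so adding park (Q22698) cannot overwrite a more specific class.
SELECT ?item WHERE {
  VALUES ?item { wd:Q4115189 }   # placeholder: paste the candidate items here
  FILTER NOT EXISTS { ?item wdt:P31 ?anyClass . }
}
```

The surviving items could then be fed to QuickStatements safely.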
@Multichill: When editing that page I am told: "This filter was triggered because you tried editing other user's userpage. Make sure you're now editing your userpage with your account. If you surely believe you should edit the page, you can request sysop to edit the page (Add new section). Add {{AllowEdit}} to your userpage if you want others to edit yours." Could you please add that, or point to my page? Thanks a lot! Where is the source code of your bot? In particular, I would like to know how it reacts when the item already has a P31 (or the given property). If an article is about both a radio broadcast and a TV broadcast, thus having both infoboxes, will the tool add both? Will the bot add "TV broadcast" if the item already is an instance of "NHK TV broadcast"? Thanks! Syced (talk) 02:20, 13 June 2017 (UTC)
Right, the Jawp went completely overboard with abusefilter. You should be able to edit the page now. As for source and other documentation: All on User:NoclaimsBot.
I'd be very wary of that. The en.Wikipedia version of that template, for instance, is also used on things that are not parks, including nature reserves, botanical gardens, arboretums, and even a biography in which a park is discussed. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits16:52, 12 June 2017 (UTC)
Would this item only refer to Germans as a collective group, or does it also refer to individual Germans of that group? More specifically, would the singular noun "German" have the same referent as this item? CodeCat (talk) 20:22, 14 June 2017 (UTC)
I want to link a sense of the word "German" (on Wiktionary) to a Wikidata item: "A member of the Germanic ethnic group which is the most populous ethnic group in Germany; a person of German descent." If the item is only for the group itself (for which Wiktionary has no entry since it's just the plural), then I don't know what to do. CodeCat (talk) 21:41, 14 June 2017 (UTC)
Firstly, being German by citizenship and German by ethnicity are separate things, even if they aren't usually distinguished. Secondly, the point is that there is a noun in English which refers to: 1. a person of the Germans (Q42884) ethnic group and 2. a person who is a citizen of Germany (Q183). Currently, Wikidata does not seem to have an item for either of these referents, i.e. there is no "German citizen" item nor is there a "person of the German ethnic group" item. This works fine when you want to express these things with a property, but I'm looking to connect the referents of these two senses of the noun "German" directly to Wikidata items. I have already done so with other words, such as wikt:Paris where in the English section the sense "The capital city of France." is tagged with {{senseid|en|Q90}} to indicate that it refers to the same thing as Paris (Q90). CodeCat (talk) 13:00, 15 June 2017 (UTC)
The core difference is that "country of citizenship" "Germany" is not an item but a statement. The project of interlinking with Wiktionary means that we need a variety of new items for concepts for which we currently don't have items but for which Wiktionary has words. @CodeCat: If you want an item for the concept of an individual that's of German ethnicity, the principle of creating a new item is the same. ChristianKl (talk) 13:58, 15 June 2017 (UTC)
Why include Wiktionary terms? It is probably more relevant doing it the other way around. All Wikidata items have labels that can be conjugated. This is seeking a problem for a solution. Thanks, GerardM (talk) 15:05, 15 June 2017 (UTC)
For applications of text mining it's very useful to be able to go from Wiktionary terms to the concept behind those terms. The concept of a German citizen is also a "clearly identifiable conceptual entity" and thus notable. ChristianKl (talk) 11:30, 16 June 2017 (UTC)
I don't think this is the right thing to do. Wikidata does not model things by creating new subclasses for every possible intersection of things, it primarily describes things using properties. If the way Wiktionary links to Wikidata can't handle that, it sounds like something which needs fixing in Wiktionary. If we want to have an item "citizen of Germany", the argument for it should be more than just "Wiktionary wants to say that "German" means "citizen of Germany" instead of linking "citizen" and "Germany" separately" because that reasoning can be applied to lots of words (e.g. "female French citizen" could be added for wikt:en:Französin, "multiple female French citizens" for wikt:en:Französinnen...) and Wikidata is not a dictionary. Work is being done to add support for lexical information, but that will use new prefixes, not the existing Q and P ones. - Nikki (talk) 09:08, 16 June 2017 (UTC)
Adjectives seem to lend themselves well to being mapped to properties. The adjective "German" in one of its senses, would be ethnic group (P172) = Germans (Q42884), i.e. the adjective sense applies when an entity has that property with that value. In a roundabout way, the noun "German" can then be defined as any entity for which that property holds, i.e. a German person. I have no idea how that would be expressed on Wiktionary's side, but it's an idea. It could also help to distinguish the noun "green" green (Q3133) from the adjective "green" color (P462) = green (Q3133). CodeCat (talk) 21:02, 16 June 2017 (UTC)
Wikidata holds information about concepts in the Q-namespace. The noun and adjective green both refer to the same concept. Wikidata will soon have additional datatypes like the lexeme datatype and then we have separate items for noun and adjective. ChristianKl (talk) 18:33, 18 June 2017 (UTC)
@CodeCat: plural forms are not so useful to link when the singular form is linked to a Wikidata item.
At least Russian "немец" "немка" (colloquial "германец") are aliases of Germans (Q42884); "немцы" is used here.
Yes, many concepts are missing a separate item and subclasses of class (Q17519152), because the one/many distinction is implemented using properties (e.g. collection or exhibition size (P1436)) most of the time. d1g (talk) 18:59, 15 June 2017 (UTC)
In Wiktionary, both the singular "German" and the plural "Germans" are combined under one lemma, so these are a single unit from Wiktionary's point of view. Thus, the distinction one German versus many Germans is not made, they are the same sense. The Wikidata item, however, seems to refer specifically to all the Germans, the entire set of people whose ethnicity is German. This is a nuance that Wiktionary doesn't have, although perhaps it should. CodeCat (talk) 19:24, 15 June 2017 (UTC)
Yes, I'm aware of that. It doesn't work for Wiktionary though, since we're dealing with a specific concept, which requires its own item. I see three possibilities: 1. we link the Wiktionary sense straight to Germans (Q42884), understanding them to be the same, 2. we create a new item for a person of German ethnicity and link the Wiktionary sense to that, and also do this for every other ethnicity, or 3. we don't link the Wiktionary sense at all and just leave it as it is. CodeCat (talk) 20:22, 15 June 2017 (UTC)
1 or 3 now; maybe 2 if we really need these items for other reasons (as values in claims)
I have been thinking: suppose someone is of Turkish ancestry and a third-generation German. Would you call him a German or a Turk? If you make a choice, is that not racist? Thanks, GerardM (talk) 11:08, 16 June 2017 (UTC)
We should use whichever one (or both!) we have a reliable source for, and reference the source(s). - PKM (talk) 20:15, 16 June 2017 (UTC)
SELECT (IRI(concat("https://commons.wikimedia.org/wiki/", ?template)) as ?templateLink) ?templateName ?creatorItem ?creatorItemLabel {
  SERVICE wikibase:mwapi {
    bd:serviceParam wikibase:api "Generator" .
    bd:serviceParam wikibase:endpoint "commons.wikimedia.org" .
    bd:serviceParam mwapi:gcmtitle "Category:Creator templates without Wikidata link" .
    bd:serviceParam mwapi:generator "categorymembers" .
    bd:serviceParam mwapi:gcmlimit "max" .
    ?template wikibase:apiOutput mwapi:title .
  }
  BIND(substr(?template, 9) as ?templateName) .
  OPTIONAL {
    ?creatorItem wdt:P1472 ?templateName .
    SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }
  }
  FILTER(BOUND(?creatorItem)) .
}
However, it may not return all wanted entries, due to the limit on MWAPI call results. This limit really doesn't make sense in the SPARQL context in my opinion, especially for generators. @Smalyshev (WMF): any chance the engine can iterate by using the continue parameter? -- Nono314 (talk) 19:39, 16 June 2017 (UTC)
Theoretically, it is possible; practically, different APIs seem to do continuations differently, so it may be hard to implement in a generic way. E.g. for this API the continuation is in gcmcontinue, but for search it's sroffset, and for querypage it's qpoffset. If I figure out a way to generalize it, I can implement it. Though one needs to be careful, as the result may be too big and lead to timeouts. --Smalyshev (WMF) (talk) 22:54, 16 June 2017 (UTC)
Sure, as for any SPARQL query! @Smalyshev (WMF): Thanks for having a look at it. I think it really makes sense for generators. So we could pass the right parameter in the query, and you would just iterate until you get batchcomplete in result? And maybe a way to specify a maximum number of iterations? -- Nono314 (talk) 23:38, 16 June 2017 (UTC)
Nono314 this is great, that is exactly what I needed. Thanks a lot! I think I fixed all that were showing up, so the query does not return anything now. --Jarekt (talk) 03:13, 17 June 2017 (UTC)
Hopefully in the near future this will deliver complete result sets. In the meantime, Jarekt you can fiddle with gcmsort/gcmdir/gcmstart/gcmend parameters to get some additional results, like this. By the way, it seems like a number of the missing wikidata ids have actually been recently removed by your "reset linkback" bot run (eg [5]), so you may want to check that you are not undoing with one hand what you do with the other. -- Nono314 (talk) 10:46, 17 June 2017 (UTC)
I know that that bot job removed a few Wikidata links (it affected only pages where the Wikidata field was on the same line as the "Linkback" field, in a template where each field is supposed to be on a separate line), so there is a new urgency to using the query to fix things I have broken. --Jarekt (talk) 14:46, 17 June 2017 (UTC)
I think the second query I offered should have fixed them, as it was listing recently modified templates. Additionally, you may be interested by the following query listing creator templates that do not have wikidata id, but for which there is an item with Commons category (P373) pointing to their home category.
SELECT (IRI(concat("https://commons.wikimedia.org/wiki/", ?template)) as ?templateLink) ?templateName ?categoryName ?commonsCatItem ?commonsCatItemLabel {
  SERVICE wikibase:mwapi {
    bd:serviceParam wikibase:api "Generator" .
    bd:serviceParam wikibase:endpoint "commons.wikimedia.org" .
    bd:serviceParam mwapi:gcmtitle "Category:Creator templates without Wikidata link" .
    bd:serviceParam mwapi:generator "categorymembers" .
    bd:serviceParam mwapi:gcmtype "page" .
    bd:serviceParam mwapi:gcmlimit "max" .
    bd:serviceParam mwapi:gcmsort "timestamp" .
    bd:serviceParam mwapi:gcmdir "descending" .
    ?template wikibase:apiOutput mwapi:title .
  }
  hint:Prior hint:runFirst 1 .
  SERVICE wikibase:mwapi {
    bd:serviceParam wikibase:api "Categories" .
    bd:serviceParam wikibase:endpoint "commons.wikimedia.org" .
    bd:serviceParam mwapi:titles ?template .
    bd:serviceParam mwapi:clshow "!hidden" .
    ?category wikibase:apiOutput mwapi:category .
  }
  BIND(substr(?template, 9) as ?templateName) .
  BIND(substr(?category, 10) as ?categoryName) .
  OPTIONAL {
    ?commonsCatItem wdt:P373 ?categoryName .
    SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }
  }
  FILTER(BOUND(?commonsCatItem)) .
}
Nono314 that is a good one too and I already added the connections. I have a way of finding creator templates that do not have wikidata id, but for which there is an item with sitelink pointing to their home category, but this one is better and can catch more connections. Another good query might be using c:Category:Creator template home categories without Wikidata link, as the creator template names often do not match category names. Thanks again. --Jarekt (talk) 00:32, 19 June 2017 (UTC)
3 year old bad bot edits
I was just looking at the strange case of Q4747853 and w:en:Amos_Doolittle. Three years ago the blocked bot User:GPUBot, run by User:GautierPoupeau, "imported" dates from English Wikipedia. However, the dates were somehow corrupted in transfer. Then the French article was written citing the wrong dates. I corrected the dates on Wikidata, but I do not speak French, so I cannot correct them there. Shall we run some bot jobs searching for cases where source-wiki dates no longer match Wikidata? I wonder how many more cases like that were created by that bot. --Jarekt (talk) 04:45, 18 June 2017 (UTC)
Looks like enwiki was wrong when the bot ran - the import was in February 2014, and the enwiki article was corrected in April 2014. So it's not a corruption issue, thankfully :-)
Andrew Gray, thank you for correcting me. I could not figure out where the issue came from, but now it makes sense, and I am less concerned about other edits by the bot.
This is the only way we curate both the Wikipedias and Wikidata. It is not only needed for one project; it is needed for any project. Thanks, GerardM (talk) 10:38, 18 June 2017 (UTC)
Missing years
The future years are a bit of a mess. Every once in a while a tiny Wikipedia creates a bunch of them. These get imported as items, duplicates occur, stuff gets deleted, etc etc. We're wasting time here. I propose to properly create and link all years from now to 3017. Why 3017? If you look at my notepad (pl) you'll notice the highest year is 3001 (Q29976877) so we might as well do now + 1000 to not run into this for a very very long time. Taking 2017 (Q25290) as an example. Each item would have:
Label in all the languages that have "2017" as label on 2017 (Q25290)
Description in all the languages that have a description on 2017 (Q25290)
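For the bulk creation itself, the commands could look something like this in QuickStatements v1 syntax. This is only a sketch: the label, description, and especially the P31 target are illustrative assumptions, and the actual statements should mirror whatever an existing year item such as 2017 (Q25290) carries:

```
CREATE
LAST	Len	"2984"
LAST	Den	"year"
LAST	P31	Q577
```

Here Q577 ("year") is an assumed class; copy the real instance of (P31) value from 2017 (Q25290) before running anything.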
Sounds good - I like the idea of creating these now so that we have a solid framework. Will it also be worth going backwards as well to cover historic years and make sure they all have the basic metadata? Andrew Gray (talk) 11:51, 18 June 2017 (UTC)
It looks like early years are quite a mess. A few problems I noticed with a cursory look:
@Jc3s5h: The Swedish article starts by telling that the year never existed. That is not a big deal. This is an item about a Swedish municipality that never has existed. This is one of at least two Moons of the planet Saturn that never has existed. -- Innocent bystander (talk) 18:34, 18 June 2017 (UTC)
We rarely use these items ourselves because our date/time properties don't use items. The main reason these items exist is because of the sitelinks, so I don't think there's much point going far beyond the range of years which have sitelinks. Filling in the gaps makes sense to me, but going all the way to 10,000 (and beyond) seems like overkill. - Nikki (talk) 16:37, 18 June 2017 (UTC)
The Wikipedias that have a lot of articles for individual years frequently store information in the respective articles about what a year is called in different calendars. Maybe we can store the same information to make the items more useful? ChristianKl (talk) 18:02, 18 June 2017 (UTC)
Paste a tab/newline-delimited command sequence from the original QuickStatements here.
You can pass such commands as URL parameters by appending "#v1=COMMANDS" to the URL. For convenience, you can replace tabs by "|" and newlines by "||".
Note: MERGE command does NOT work yet.
You can remove specific statements by prefixing a line with "-"
Quantities with error can be entered as 1.2~0.3 (for 1.2±0.3)
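As a minimal illustration of the syntax above (run against the sandbox item Q4115189, so nothing real is touched), the following pair of lines would add and then remove a mass (P2067) of 1.2±0.3:

```
Q4115189	P2067	1.2~0.3
-Q4115189	P2067	1.2~0.3
```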
So, just to be sure, we use value in daltons as "molar mass" and use it in mass (P2067)? I guess since they are defined as numerically equal, I can live with that, though it feels a bit iffy. But ok, I'll assign dalton-based masses to the appropriate items then, thanks. Laboramus (talk) 01:06, 19 June 2017 (UTC)
It is no longer the molar mass, it is an atomic or molecular mass (depends on subject item). As you said, they are numerically equal due to the definitions of a mole and the atomic mass unit. Maybe we should discuss the alias molar mass for mass (P2067), but I don’t see that much of a problem with it. —MisterSynergy (talk) 04:54, 19 June 2017 (UTC)
Constraint checks updated, please test!
Hi everyone! We recently made a lot of updates to the constraints extension,
to hopefully make the messages much easier to understand (“This property must only be used on items that are in the relation to the item (or a subclass of the item) defined in the parameters” – what? :D ),
and also to improve the look of the report that this user script displays on item pages.
Since we’d like to turn that user script into a gadget soon, we’d appreciate it a lot if some more people tested the current state and gave us feedback on it :)
There are a few known issues that also came up when the user script was first announced:
We currently only check the main statement, not qualifiers or references. One consequence of this is that the “value only” constraint is not implemented, since in the cases we check it would always be satisfied.
We currently don’t distinguish between different constraint states (e. g. “regular” and “mandatory”). This is tracked in T164254.
Constraints are only checked when the page is loaded. Edited or new statements aren’t immediately checked, and editing a statement without changing the main value will probably still make the constraint report icon disappear.
If you notice any other issues, or think some of the messages could be improved further, then please let us know (e. g. by replying here)!
We chose to provide this as a user script first, because we wanted to have a collaborative process, and we wanted the community to try it and provide feedback before enabling it as a gadget. That's what happened (thanks again to the people who provided ideas and helped us fix bugs). After this round of tests, we will indeed suggest creating a gadget, so more people can use it :) Lea Lacroix (WMDE) (talk) 13:30, 12 June 2017 (UTC)
Also, this saves time walking from item page to report page.
The only things I miss are the ability to support complex constraints, and a custom help page for more complex parts of the Wikidata model. d1g (talk) 16:11, 12 June 2017 (UTC)
Hm, complex constraints are a bit different from other constraints because it’s a single SPARQL query that returns all violating items at once, not something that you check on a single statement of a single item… I’m not sure if there’s a good way to apply the same idea to per-statement constraints. How much do people use complex constraints?
Also, once we move to constraint statements, the length limit on the string datatype (500, I believe) will probably limit the usefulness of this constraint type – how long are typical queries for that constraint?
I get the impression that an edit to constraints that I made on May 27 wasn't yet integrated into the constraint table. Is this expected behavior? If so, I would advocate running the script that updates the constraint table on a weekly basis. ChristianKl (talk) 17:38, 17 June 2017 (UTC)
There are quite a lot of template subpages which cannot clearly be classified as "/doc" or anything else mentioned in WD:Notability, and which have two sitelinks, so they are notable according to the criteria. However, I don't think that pages like Q21845026 or Q23758062 should have an item. Furthermore, /XML, /Meta and /preload subpages are also not covered by the current criteria. I would suggest changing the criteria so that subpages of templates are not notable in general. By subpage I mean pages which are not themselves a template used in the article main space. This does of course not include pages like Q13413893, where the German template is formally a subpage ("Vorlage:Navigationsleiste Flughäfen nach Staat/Europa") but actually a normal template. Steak (talk) 07:15, 14 June 2017 (UTC)
In this case the item provides interwiki links. The policy seems to be written with the assumption that the use-case of interwiki links warrants the item. Having the item has little cost. Can you argue why you think the interwiki links are useless? ChristianKl (talk) 10:10, 14 June 2017 (UTC)
Fully agree with Steak on this matter. 1) They are not templates; 2) They are not individually notable, see Wikidata:Notability; 3) They are components of their parents, and cannot sensibly exist in isolation. The addition of subpages should not be automatic; it should depend on the wiki's thoughts and the usage of the page. — billinghurstsDrewth10:36, 14 June 2017 (UTC)
I am going to slightly change the notability wording to make clear that XML and other types of subpages are also not notable. Steak (talk) 06:56, 19 June 2017 (UTC)
WikidataCon: registration open and new deadline for scholarships
Important: due to the necessary time for people to get visas (about 3 months), we changed the deadline for the scholarship applications. You can apply for a scholarship before July 16th. We will then make sure that the applicants receive a response on July 25th.
It seems to me like forcing people to add data about emergency contacts isn't in the spirit of the WMF position around privacy. The same goes for the birth date. I can understand why you might need to know the year of birth but I don't see how the exact birth date is within the need to know. ChristianKl (talk) 12:58, 19 June 2017 (UTC)
If you can generate the list of items from a category (eg "Every entry in Categoría:Grandes cruces de la Real Orden del Mérito Deportivo should have P166:Q30278709") then you can use PetScan. After logging in through WIDAR (link at the top of the results), enter the WP language and category ([8]), then on "Other sources" select "Use wiki > Wikidata", and you will get a list of all the wikidata items representing pages in the category ([9]). Be careful with category depth - if you set this too high it can get very unexpected results. It's a good idea to quickly skim down the list and check that the items all look right and they're not, eg, lists of winners, or articles on the award. If there are any, uncheck them.
Then you can fill in the "process commands" box, which is a very simple syntax - P166:Q30278709 means "add this property:item pair to every item". Then hit "Process commands", and it will do them all for you. Any which already have this property:value pair will be skipped.
If you have a manually developed list of items that doesn't match a category, you can use "Other sources: Manual list" in PetScan, or else you can use QuickStatements (v1, v2). This takes a tab-separated list of item-property-value entries, which you can generate using a spreadsheet. QS is a bit more complicated to use, but a) it's harder to accidentally edit items you didn't intend, and b) you can add qualifiers and sources, which you can't do with PetScan. Andrew Gray (talk) 11:28, 24 June 2017 (UTC)
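As a sketch of that tab-separated format: the line below would add award received (P166) = Q30278709 with a reference URL. The item ID (Q4115189 is the sandbox item) and the URL are placeholders for illustration only:

```
Q4115189	P166	Q30278709	S854	"https://example.org/award-winners"
```

The S854 column is the reference form of reference URL (P854); further tab-separated property/value pairs on the same line become additional qualifiers or sources.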
Dear @Andrew Gray, thanks for your detailed reply. I've built http://petscan.wmflabs.org/?psid=1131753 but after generating that list (I did the query myself to learn and checked with the links you already provided to me to see if I did it okay) I cannot find the "process commands" box in PetScan. Is it from a different tool? Regards, —MarcoAurelio (talk) 14:21, 24 June 2017 (UTC)
HackZurich is one of the largest hackathons in Europe and it will take place in Zurich, during the Digital Festival 2017 in September 2017 (15.09 - 17.09).
HackZurich is a great opportunity to make open data fans and hackers in general aware of Wikidata, and also to hack for Wikidata. We invite Wikidata volunteers to apply for HackZurich and code for the city.
One day before HackZurich, we will run a Wikidata workshop organised by and at the University of Zurich. You can participate in our workshop independently of whether you join HackZurich or not. We will have talks, small hands-on sessions to learn how to code with Wikidata and a brainstorming session to discuss what we could hack during HackZurich. You can register for our workshop here: Wikidata Zurich Workshop
If you are a Wikidata volunteer and would be willing to help run a mini hands-on session during the workshop (14.09.2017), please contact us; we would be happy to have you join us!
I would like to start a discussion - or rather, renew one, as I think I talked about it here before - about claims having no units for properties that require units. We have quite a number of such claims, and something like "height" or "area" without a unit is mostly useless. Granted, in some cases the unit can be derived from context, but that requires human intervention, and the whole point of Wikidata is that this shouldn't be necessary.
I am mainly talking about such properties as (the number is how many unitless claims there are):
I think such claims should either be mass-deleted or assigned some kind of default unit (e.g. kilogram (Q11570) for mass (P2067)), but I would like to hear opinions on this. Maybe there's a better way of handling it. It's not hard to change them (I actually have code that can find them, and deleting or changing them would be very easy), but I'd like to see what the consensus is. I think we should do something, because having useless data in the database is bad, and something recorded as "mass = 42" cannot be used for any work where it is important to know the mass.
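As a sketch of how such claims can be found: in the Wikibase RDF export, a quantity recorded without a unit is (to my understanding) represented with the unit "1", which maps to wd:Q199. A query like the following, shown here for mass (P2067), should list them:

```sparql
# Mass (P2067) statements recorded without a unit
SELECT ?item ?amount WHERE {
  ?item p:P2067 ?statement .
  ?statement psv:P2067 ?valueNode .
  ?valueNode wikibase:quantityAmount ?amount ;
             wikibase:quantityUnit wd:Q199 .   # Q199 = "1", used when no unit is given
}
LIMIT 100
```

Swapping P2067 for the other affected properties gives the per-property counts discussed above.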
I am not sure how the process of consensus gathering should work in this case - should I make an RFC, or discussion here is enough? Laboramus (talk) 05:10, 11 June 2017 (UTC)
It is better to have data than not to have data. Just deleting based on an opinion is imho the last thing we should do. With data, we can flag the problematic claims and seek help. With no data there is not even a chance of that. Thanks, GerardM (talk) 07:54, 11 June 2017 (UTC)
To be more frank than I usually am: "It is better to have data than not having data." is bullshit! We could add "42" as a value to every question here, but it would not make any sense at all. It would be a mockery of Wikidata. I just removed this before I saw this new thread. The claim had no source, so I could not fix it. I sometimes remove such fully useless claims when they have no source. -- Innocent bystander (talk) 08:56, 11 June 2017 (UTC)
I think the « assume good faith » approach implies that if it's not obvious vandalism, there is at least something true in it. Finding the pattern and then doing a mass correction is a possible approach. You should not care too much about the laughter. We all know why we are here, don't we? Better to laugh with them; a little bit of self-mockery is often a good thing.
First they ignore you, then they laugh at you, then they fight you, then you win.
Well, I guess you should avoid calling other people's opinions "bullshit", then :) Unitless claims appear in the constraint reports, so they are also very visible. author TomT0m / talk page14:04, 11 June 2017 (UTC)
@Innocent bystander: Such strong opinions need the backing of strong arguments. You fail to provide them. Please explain why you think you understand the issue, what the ramifications are of both our opinions. So far you insult but fail to convince. Thanks, GerardM (talk) 15:45, 11 June 2017 (UTC)
Constraint reports give us limited support in many cases. The report for "area" is on my watchlist, and I have seen very little progress there. And the attitude "the more data the merrier" has not made Wikidata more useful; it has only improved the numbers in the statistics reports. That attitude has also given us Wikipedia as the main source of our data. If Wikidata is to be useful at all, for anybody except ourselves, we have to do much better than that. The class I am working with at the moment is filled with bad coordinates imported from nlwp. I have asked bot owners to replace those claims with better-sourced data, but it is not until I start to remove such bad claims that they are replaced with anything at all. -- Innocent bystander (talk) 19:00, 11 June 2017 (UTC)
I disagree that it is better to have bad data than none. Bad data is not useful in any application, but it makes it look as if there is data, instead of showing the real situation: that we have no data. If for specific cases we can add units, I'm all for it, but if it's not possible, I'd rather make it explicit that we don't have good data than keep meaningless numbers which cannot be used for anything. -- Laboramus (talk) 20:44, 11 June 2017 (UTC)
@Innocent bystander: OK, but in your example, Seav just forgot the unit and the source. However, it's obvious that it is km². Instead of deleting it, you can add the unit. There is no source, but it's not difficult to check on Wikipedia. And yes, "It is better to have data than not having data.". Tubezlob (🙋) 09:40, 11 June 2017 (UTC)
@Tubezlob: I definitely agree that a fix is better than a deletion. But how should we fix it when there is no source provided? And as you see below, looking up who added the data is not always simple. Do you yourself know what you used as a source in a QuickStatements run two years ago? I can tell you that I don't! -- Innocent bystander (talk) 07:58, 12 June 2017 (UTC)
In one specific case it may be easy (sometimes it's not). In 4000 cases, it is not possible to go through all of them and manually reconsider each one, and nobody is going to do it (unless somebody volunteers? please feel free to :). So I am looking for a better solution. -- Laboramus (talk) 20:44, 11 June 2017 (UTC)
In case of height (P2048) and width (P2049) more than 3500 of each are repairable. However, data is in fact partially wrong, so besides an addition of the unit, the values need to be fixed as well. Takes some time, but still: doable. I am working to fix it in the next days. —MisterSynergy (talk) 20:50, 11 June 2017 (UTC)
Here you have another very poor example. "area=1,74±0,01". I changed it now to 331 hectares as of 2015-12-31. This example had a poor source (enwp) but was still not correct. -- Innocent bystander (talk) 10:51, 11 June 2017 (UTC)
Unitless data should simply be removed. If height = 120, this could be metres, feet, centimetres or whatever obscure unit. Having such statements is even dangerous, because some user may change (in good faith) the unit to metres when it's actually feet. Then there is a real problem, because the data then looks good but is in fact bullshit. Steak (talk) 07:21, 20 June 2017 (UTC)
Some structured data is better than none
First a few assertions:
There is no data store without problems. This includes Wikipedia and Wikidata.
The data we hold is best understood by applying set theory. The data in Wikidata consists of many subsets; probably the most valuable subset for the WMF are the interwiki links.
The error rate in each subset can be assessed and is by definition different from the overall Wikidata error rate.
The absence of data often indicates a bias in what Wikidata holds. A good example is the lack of data relevant to the global south.
Given the huge influx of data from Wikipedia, the biggest imports have been from English Wikipedia and it is one reason for the existing biases in Wikidata.
An absence of data prevents the application of tools. Tools may suggest writing a Wikipedia article, tools may compare data with other sources.
Concentrating on the differences between Wikidata and any other data source is the best way of improving the quality of existing data.
Having an application for the data in Wikidata is the best way for improving the usefulness for a subset of data.
Each contributor to Wikidata works on the data set(s) of his/her own choice; these data sets interact in the whole of Wikidata. This may raise issues, and that cannot always be avoided.
Examples of problematic data must be seen in the light of the total of the dataset they are part of. Statistically they may be irrelevant.
No matter how "bad" an external data source is, when they are willing to cooperate on the identification and curation of mutual differences, they are worthy of collaboration.
Wikidata improves continually and as such it is purrfect but it will never be perfect.
Consequences
When people assert that all data in Wikidata should have a source or must be complete, they have a point. Once this point is made, it is then followed up with a proposed action.
This typical approach is often seen as problematic by others because it violates the assertions above. When problematic data exists inside or outside of Wikidata and we have a healthy collaboration going with an external party, it follows that after curation either our data or their data is improved.
When sources are lacking both inside and outside Wikidata, it is not really problematic when the data is the same. Sources are, however, of particular importance when there is a need for reconciliation and curation.
One of the Wikimedia values is "be bold".
Much of our data is incomplete. When a constraint insists on a combination, the constraint is a warning and not really a constraint. A genuine constraint is a "must have", but that can be taken in two ways. For instance: an award must be given to a person or an organisation. Even when this statement is lacking, the recipient must not be a fictional person or fictional organisation. The second approach removes many constraints but is imho more relevant.
There are several conventions where a bot could help out. When multiple initials exist in a label for a person, is there a space between them or not? – The preceding unsigned comment was added byGerardM (talk • contribs).
Discussion
If there's a healthy collaboration with the third party, then data gets improved. On the other hand, simply importing all GeoNames data doesn't provide for such a healthy collaboration, and the same goes for other ways of importing a lot of low-quality data. ChristianKl (talk) 21:28, 13 June 2017 (UTC)
So you insist on maintaining the current bias. You insist on preventing tools built to support our movement from working, because of a lack of data. Why do you think we cannot collaborate? How do you define "low quality data" when our own record in this subset is abysmal? Thanks, GerardM (talk) 00:24, 14 June 2017 (UTC)
Basically, GeoNames is a dataset that combines everything they could get their hands on. It is optimized for quantity, not quality. It also doesn't have the manpower to collaborate on removing errors. If we add a lot of low-quality data to our store of geodata, it becomes less useful for actual tools.
The way to solve our problem of having no good data on Ethiopia isn't to import a lot of low-quality data about the country, but to be welcoming to Amharic speakers and give them the freedom to contribute data in the way they want. If they have some use case for an existing data set, it can be useful to help them import that data set, but I don't think the way to get rid of Western bias is for Western people to create a lot of items about non-Western countries. ChristianKl (talk) 11:37, 14 June 2017 (UTC)
Much of the GeoNames data should be in Wikidata already, warts and all, because of its inclusion in Wikipedias. As we link to GeoNames already and include its source ID, we can compare the data that we share and collaborate on the curation of differences. When you insist on not having data about countries like Somalia, Ethiopia, Eritrea et al., you insist on maintaining our existing bias. No data is 100% incorrect; we cannot do worse. Our tools cannot operate on no data; we cannot suggest to people to write an article. No data is the epitome of failure in what we aim to achieve: sharing in the sum of all knowledge. Thanks, GerardM (talk) 12:30, 14 June 2017 (UTC)
The last thing I want is to deal with awfully fake GeoNames data, especially the freemium-licensed part of GeoNames.
You provide the exact arguments that make your opinion irrelevant. First, you want to work only on sourced information, and second, you make an absolute statement based on emotion, not on facts. I have asserted that when cooperating with external sources, the sources are not crucial but the differences are. I have also asserted that everywhere you will find vandalism/errors (choose your label); it is however the percentage that counts. I have also asserted that when we collaborate with another community, even this is largely irrelevant. So you do not want to work on this? ... Fine. Thanks, GerardM (talk) 07:59, 15 June 2017 (UTC)
I honestly don't think it is possible to evaluate every dataset using the same set of rules, especially geodata.
The best part of Wikidata is that I can get a slice of the dataset where all data has great or acceptable sources; then I don't care how bad the rest of Wikidata is. d1g (talk) 02:16, 14 June 2017 (UTC)
It is positive that you think, but it is sad that you do not share your arguments. Wikidata includes an enormous amount of geodata of the highest quality that you cannot use; most of its labels are in Chinese. Is that what you mean? Does it help us when we share data, compare, evaluate, curate and improve at both ends by concentrating on differences? ABSOLUTELY. Is there any proof that throws out my assertions on geodata? Really, and how?
When you only care about the data with "great and acceptable sources", fine, go and do your thing. Especially for you I have added an additional assertion. Thanks, GerardM (talk) 07:47, 14 June 2017 (UTC)
Well, this postulates that the data we have is well structured. But in many cases it isn't. Thousands of articles have now been deleted on svwiki since the quality of GeoNames proved too unreliable. No data there proved better than the unstructured mess of GeoNames. The whole basis for some of our structures is shipwrecked in some cases. On a daily basis I have to deal with the postulate that administrative divisions follow a hierarchy from the top to the bottom. In a subset of the present world, that may be true. But we cannot model reality based on a small, simple subset of reality. -- Innocent bystander (talk) 08:07, 14 June 2017 (UTC)
The experiences with importing data directly into Wikipedia are not part of this discussion. We would be talking about cooperation between Wikidata and GeoNames.
I am on record that data to be imported into a Wikipedia is best imported into Wikidata first, with the texts for articles then generated and cached. This has the benefit that all curation will affect the data presented to readers. Only when an editor decides to change the cached text does it become an article (and it is out of our hands).
Administrative divisions do follow a hierarchy from the top down. I have spent some time on such hierarchies in the Ottoman Empire. They change. So far we have incomplete datasets in Wikidata because of incorrect assumptions inherited from one or more Wikipedias. All the more reason to reconsider these and do better as a result. Thanks, GerardM (talk) 08:40, 14 June 2017 (UTC)
@GerardM: Are you telling me that "administrative divisions do follow a hierarchy" because they did so in the Ottoman Empire? Maybe so, but the system should also work outside of the Ottoman Empire, and in all times. Some areas close to the Swedish/Norwegian border were co-administered by the two nations for some decades, so the hierarchy here is not even simple at the nation level. And if we go even further back in history, we find even more complex relations between Sweden, Norway and Novgorod. -- Innocent bystander (talk) 10:23, 20 June 2017 (UTC)
welcome to the maintenance of rotting urls. how many folks calling for references will pitch in? do we need an automated internet archive bot as a stop gap? Slowking4 (talk) 15:23, 19 June 2017 (UTC)
I vote for a new property - we should be able to map the property values from the "bad" url and then delete the URLs. But in general - an automatic archive bot isn't a bad idea either. - PKM (talk) 22:17, 19 June 2017 (UTC)
I created my first property proposal here and attempted to transclude it into Wikidata:Property_proposal/Creative_work with this edit. However, the property proposal page is still showing the following message near the top of the page:
You have not transcluded your proposal on Wikidata:Property proposal/Creative work yet. Please do it.
In such cases, it is always worth trying a null edit on the page. (Press the edit button, and save again without changing anything.) Whatever caused the problem, it looks like it is solved now. -- Innocent bystander (talk) 10:21, 24 June 2017 (UTC)
The name of Las Estrellas (Q80478) was Canal de las Estrellas until 22 August 2016, when the station changed its name to Las Estrellas. In the infoboxes of TV shows that started before that date the name of the station should be displayed as Canal de las Estrellas, and for the ones that started after that date it should be displayed as Las Estrellas. When loading the data from Wikidata we use the label, but that way it is impossible to know when each of the names should be displayed. How should the name change be recorded as Wikidata statements so that we can automatically pick the correct one? I've been thinking of using official name (P1448), but I don't know if there is any property that would be more appropriate for this use case. Thanks. -- Agabi10 (talk) 11:39, 27 June 2017 (UTC)
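One possible approach, assuming the names are recorded as official name (P1448) statements with start time (P580) and end time (P582) qualifiers on Las Estrellas (Q80478), is to query the full statements with their validity intervals and pick the name whose interval contains the show's start date:

```sparql
# Official names of Las Estrellas (Q80478) with their validity intervals
SELECT ?name ?start ?end WHERE {
  wd:Q80478 p:P1448 ?statement .
  ?statement ps:P1448 ?name .
  OPTIONAL { ?statement pq:P580 ?start . }  # start time of this name
  OPTIONAL { ?statement pq:P582 ?end . }    # end time of this name
}
```

This only works, of course, if someone adds the historical name as a second P1448 statement with the appropriate qualifiers.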
I exchanged e-mails with Marc Wick, the founder of GeoNames. It turns out that he is looking into connecting GeoNames to Wikidata. He finds problems that need to be resolved because of errors on our end. Errors are to be expected.
I asked permission, and this is where you can find the e-mails so far. As I have asserted before, when there is cooperation, any complementary data is welcome, because it improves a 100% error rate to something more manageable. The whole history of Wikipedias importing data from GeoNames is imho our own problem. When we welcome the data they want to import and provide the platform to generate texts, texts that are cached and not saved, our efforts to curate data will have the biggest effect.
What is quite clear to me is that only someone who makes no mistakes should be the first to throw a stone at GeoNames. As we all make mistakes, collaborating is the only sane way forward. Thanks, GerardM (talk) 08:31, 17 June 2017 (UTC)
Since I have already tried to work with this dataset, I would like to start by pinpointing some weak points. The links to Wikipedia in GeoNames can a little too often not be trusted. Unfortunately, some years ago we started to add GeoNames ID (P1566) here based on those WP links. A little too often, the only thing the WP articles and the GeoNames items have in common is the label. A second problem is that the coordinates are rounded to minutes. That is not very useful when it is used to pinpoint a specific summit (Q207326) among others on a mountain. It also makes it difficult to find where a specific reef is IRL. A large problem is how the dataset in GeoNames has been collected. It looks like any name on a map has been harvested, but what those names describe has not always been identified in a good way. Beyne-Heusay (Q681039) is a commune in Belgium; there is no populated area with this name. I live fairly close to this point. I can tell you that there is no populated place with the name "Ytterlännäs". Ytterlännäs is the name of several existing and former administrative units, but there has never existed a populated place with this name. The namesake of these administrative units is Näs, located next to the old church of Ytterlännäs.
The problems on svwiki have not only been about the quality of this source. How the project was implemented and what complementary sources were added, together with the algorithms used by the bot, have also affected the project. I do not in any way oppose cooperation between GeoNames and Wikidata. In fact, I would have preferred if Lsjbot had started here instead of on Wikipedia. That would have included more experienced users in the discussions about how the project should have been implemented. -- Innocent bystander (talk) 10:25, 17 June 2017 (UTC)
Thank you for your thoughtful reply. Yes, there are issues and these issues are found at both Wikidata and GeoNames (see blogpost). What we really need is a way to communicate issues in both directions. We are good at the things we do but we communicate badly. I am not a Wikipedian and I find many issues with awards that are fine in Wikidata but still problematic at the Wikipedias I frequent.
If there is one thing I wish for from the strategic process, it is that the communication of issues gets better facilitation. Thanks, GerardM (talk) 11:15, 17 June 2017 (UTC)
Yes, the definition of "city" is one of the largest problems we have here. The Swedish article Stad mentions several different meanings of the word in Swedish alone. It could for example mean either an administrative entity or a larger settlement that is identified by one single name. The original meaning is "place".
Another problem worth talking about is the difficulty of identifying what a "university" or "school" is. Both Wikidata and GeoNames have this problem. Universities and schools are something other than buildings. The Lsjbot articles about universities became something to laugh about: University A is located in B, it has climate C and is surrounded by D. I worked for some time at Linköping University (Q782600). Saying that it is located at a specific place and describing the nature around it does not make sense; it has activity in several places in Sweden. The same could be said about many things. Is a municipality an organisation or a geographic feature? I work with a lot of people in the local municipality, and we mainly discuss such things as social care, economy and health care. We never discuss geography. -- Innocent bystander (talk) 15:04, 17 June 2017 (UTC)
In practice I think we should subclass the concept of city more frequently. When in German a city is a human settlement with at least X people and the French concept is one with at least Y people it makes sense to have a concept of "German city" and one of "French city". ChristianKl (talk) 17:31, 17 June 2017 (UTC)
Subclassing based on a local understanding? It should suffice to have the number of inhabitants for places. It is the user who understands the concept of city in a particular way. Having a subclass of French city only means city in France to me. Thanks, GerardM (talk) 18:40, 17 June 2017 (UTC)
Well, it is not as simple as counting bodies. If a place has a population of 200,000 but still has the infrastructure of a village, it is today still called a village ("by") in Swedish. A place that does not have a liquor store and a pharmacy can hardly be called a city here. -- Innocent bystander (talk) 10:05, 18 June 2017 (UTC)
While working on the Thai localities, I also noticed several wrong entries in GeoNames when it comes to the populated-place category; luckily I could stop the ceb-Wikipedia bot from importing the whole of GeoNames in Thailand before it was too late. However, it seems most of these were imported from yet another database, which we have as GNS Unique Feature ID (P2326), so the blame for the wrong entries goes one step further. Also, the above-mentioned "Ytterlännäs" originates from GNS-UFI -2536857. Sadly, it seems the original ID from GNS isn't found in the GeoNames data anymore. That said, it is still a good idea to link GeoNames more closely with Wikidata, but don't blindly import from GeoNames. Ahoerstemeier (talk) 21:03, 17 June 2017 (UTC)
When we do not have data, we have a 100% failure. When we are not to blindly import, what scenario do you propose? Thanks, GerardM (talk) 10:36, 18 June 2017 (UTC)
One common mistake in GeoNames is that they mix up items with the same label. This item has the name "Viksäter", and GeoNames says it has a population of 309. But this is a village with only a handful of houses; unless they are hiding 300 aliens in the barns, that data is completely wrong. The Viksäter that had a population of 309 is this one, as of 2005. Note that the data was removed in the history of the item, but the removal was rolled back. There are no official records of how many people live in villages like this; at least we have not had any for 100 years. Wrong data is only useful as fiction, and that is not what I am interested in here! -- Innocent bystander (talk) 11:59, 18 June 2017 (UTC)
We don't have "no data"; Wikidata has some data. If we take data about cities, there are use cases where it's more important that the data we have is trustworthy than that there's a lot of data. False positives can be more harmful than false negatives, and it's usually more work to remove false positives, as that takes more human analysis. ChristianKl (talk) 18:40, 18 June 2017 (UTC)
I have no idea about the quality of the GeoNames database, but it seems that the GeoNames ID (P1566) claims we already have in Wikidata are also to blame. Following up on User:ArthurPSmith's import of GRID, I have set out to add headquarters location (P159) to the company items he created, using GRID's data. GRID provides GeoNames IDs, but it turns out that I get more accurate reconciliation results by ignoring them, because of the poor quality of the claims on Wikidata. This is a shame, because in many cases GeoNames IDs bring real value by successfully disambiguating between cities. For instance, we have
but GRID uses 1862415, which seems to be the correct value. I observe a disagreement in 20% of the reconciled cities.
It seems that many disagreements are of this kind (the confusion between a big city and an administrative territorial entity that contains it and has the same name). That could probably be fixed automatically, but I do not have the time to look into it. − Pintoch (talk) 17:48, 18 June 2017 (UTC)
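As a rough sketch of how such disagreements could be surfaced, one could query for pairs of distinct items that share the same GeoNames ID (P1566); such collisions often point at exactly the city versus containing-administrative-entity confusion described above:

```sparql
# Pairs of distinct items that share a GeoNames ID (P1566)
SELECT ?id ?item1 ?item2 WHERE {
  ?item1 wdt:P1566 ?id .
  ?item2 wdt:P1566 ?id .
  FILTER(STR(?item1) < STR(?item2))  # avoid listing each pair twice
}
LIMIT 100
```

The resulting list would still need human review before any automated fixing, for the reasons discussed in this thread.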
The mass creation of bot-generated local Wikipedia articles based only on GeoNames (Q830106) is similar to the ones based only on Catalogue of Life (Q38840) by Lsjbot (Q17430942). In both cases we are forced to have a (sometimes dubiously) labeled item and to get in (by bots) more questionable property values or relations. Obviously the creation was done without sanity checks. Collaborating with external data providers is a good thing, but first of all I'm missing a way to give straight feedback to the Wikimedia communities that are responsible. --Succu (talk) 21:55, 18 June 2017 (UTC)
It is now June 2017. This happened, and as I said before, I prefer to have the data in Wikidata first and use caching to provide the texts to the Wikipedias. At GeoNames they have noticed that our data is wrong in places; they can collaborate with us. My assertion is that when there is collaboration, we can do better for both our projects.
This is not about feedback to the Wikipedias involved. This is about collaborating with GeoNames. Thanks, GerardM (talk) 01:08, 19 June 2017 (UTC)
@Succu: The articles are not only based on GeoNames. They are based on data from NASA and some other sources, together with some algorithms that interpret in what range a mountain is located, the size of lakes, etc. A deep review of how it handled Finland showed that the algorithms had severe flaws, as did the complementary sources (which claim that Finland has savanna). Add to that that much of the data is based on coordinates that are sometimes wrong. I stopped supporting it when we reached the Faroe Islands (Q4628) and I discovered that big parts of the poor nation were located on the bottom of the Atlantic Ocean.
I would love to see a collaboration with GeoNames. But if it means that much of the data from that database is imported here without a review of the quality of each part, I recommend against it. The flaws are not only at our end. Au contraire, they are much worse at GeoNames! -- Innocent bystander (talk) 19:19, 20 June 2017 (UTC)
Release date of computers
Hi. I can't find a common criterion for a property to mark the release date of computers:
Hi, I'm planning to buy a new tablet and decided to use the query.wikidata.org service to find the best option. Unfortunately I found that there is not much data about tablets, even for the popular ones. For example, look at Samsung Galaxy Tab S3: at the time I looked at it, it had zero statements, while the Wikipedia infobox has all the data I need, but not queryable. Then I decided to fill in the missing data manually for a few devices I'm interested in. But unfortunately there are no properties for that kind of thing either. So my question is: is it possible to create and use the needed properties for phones, tablets and laptops without a long approval period? I don't know how long it takes to get a proposed property accepted, and I need a bunch of properties. Also, I have good Python skills and could write a script or a bot to import all this data from Wikipedia infoboxes if someone gives me directions on how to do that. But of course, without the properties it will not be possible anyway. Sirexo (talk) 18:51, 20 June 2017 (UTC)
Properties are given a minimum of one week for review. If you have a good case, good examples, and fill out the proposal template correctly (see some of the existing proposals at WD:PP) then it should be straightforward. Do search the existing properties first to see if there's already something that meets your needs though. ArthurPSmith (talk) 20:54, 20 June 2017 (UTC)
Sirexo - you might also want to check out meta:WikiObject proposed by Qupro. It has a list of useful wikidata properties for specific products - however we have so far not had a lot of detailed product information added to wikidata in general. ArthurPSmith (talk) 21:01, 20 June 2017 (UTC)
As ArthurPSmith said, accepting a property takes at least a week.
When Wikipedia creates a template, it's only important to keep in mind the specific items for which the template is created. On the other hand, when we create properties for Wikidata, it's important to also keep in mind how the property will be used in other items. It's also important to find names that make it unlikely that the property will get misused.
Hey everyone,
i've made a tool that allows you to query Wikidata in a visual way without using SPARQL. It's called VizQuery.
The possibilities of using Wikidata to do interesting queries are endless, and the current query service allows for very powerful queries indeed. However, i feel that for the general public, especially those who are not that technical, it might be a bit overwhelming and difficult for them to learn a complex language such as SPARQL. To make people familiar with the concept of queries i believe a somewhat less intimidating approach might be useful, hence this tool.
VizQuery is only capable of doing a subset of possible queries. It's basically simple triples, variables (prefixed with '?') and literals (between "quotes"). You can do pretty powerful queries with only those things though. For example, here's a query with vegetarians who are married to a vegetarian.
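For comparison, the SPARQL that such a visual query maps to would look roughly like this (assuming the data is modelled with lifestyle (P1576) set to vegetarianism (Q83364) and spouse (P26)):

```sparql
# Vegetarians who are married to a vegetarian
SELECT ?person ?spouse WHERE {
  ?person wdt:P1576 wd:Q83364 ;  # lifestyle: vegetarianism
          wdt:P26 ?spouse .      # spouse
  ?spouse wdt:P1576 wd:Q83364 .  # the spouse is also a vegetarian
}
```

The same simple triple patterns are what VizQuery exposes visually, which is why it covers only a subset of full SPARQL.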
Under the hood VizQuery uses Ruben Verborgh's SPARQL.js library to convert between JSON and SPARQL, so theoretically every SPARQL query you could do in the regular query service can be done in VizQuery. However, many queries won't work because the visual interface only supports a subset of options: it's pretty hard to create user-friendly GUI representations of many of the complex SPARQL features. :)
Anyway, i'd like to hear what you think. Bugs, feature requests and pull requests are also welcome on my Github page.
Nice tool! Consider adding support for translating the interface, and making it work in more browsers (it does not work for me in Internet Explorer (Q1575) or Firefox (Q698)).
An idea - user statistics based on Wikidata
Hello,
I'm thinking about the possibility of using the query service for statistics related to users of Wikimedia & partner projects. For example, it would be interesting to view the activity of users based on their projects, activity type (edits), locations, etc. I know some statistical tools already exist, but with limited abilities. I think that involving Wikidata in this task could bring encouraging results to present to the different wiki communities. My question is: has this idea been discussed before? And if yes, what is the actual state of this issue? If not, I'd like to hear your opinions. Greetings! --Sky xe (talk) 01:36, 21 June 2017 (UTC)
Wikimedia cares strongly about privacy. If you propose a tool to analyze people's locations that's unlikely to get very far. Fleshing out a proposal that produces an added benefit, doesn't take too many technical resources and that doesn't violate privacy is non-trivial. ChristianKl (talk) 10:36, 21 June 2017 (UTC)
I guess Sky xe means the locations of the subjects edited. Things like the coordinate location (P625) of articles/items edited by a specific user. A user like Salgo60, who edits a lot about people buried at a burial place in Stockholm, will then be very sharply located. I, who in my early years here at the WMF wikis edited a lot in the Asteroid belt and in the realms of the (ex)planet Pluto, will have a very interesting pattern.
I agree with ChristianKl here. There are already plenty of tools around which allow to stalk users to an uncomfortable extent. While those tool results base on a lot of publicly available information, the accumulation and evaluation of this information often goes too far to my opinion. For me this is in fact the main reason not to reveal my real life identity, and not to attend real life community gatherings such as the WikidataCon. —MisterSynergy (talk) 10:46, 21 June 2017 (UTC)
Agreeing with ChristianKl and MisterSynergy if this is related to contributors' identities, IPs, editing patterns etc. And still waiting for the arrival of structured data on Commons. Let's see how that stuff handles dates, places, coordinates and lets stalkers draw a "chronomap" of a contributor with a handy app. Strakhov (talk) 14:27, 21 June 2017 (UTC)
Thank you all for the answers. I agree that privacy is indeed a critical point. But imagine we could view and evaluate statistics - without violating the privacy of any user identities - such as:
Which wiki project has been most active in the last period (week, month, year) relative to the number of active users and which languages are basically involved.
Which fields have been edited most, e.g. in the English Wikipedia.
What is the relation between users and their activity on the different wiki projects (to bypass the privacy challenge, e.g. identification only by country instead of geographic coordinates derived from IP addresses)
etc.. a lot more could be possible.
I think the idea deserves to be discussed, and I'm sure more aspects will come to light thereafter. Greetings! --Sky xe (talk) 16:47, 21 June 2017 (UTC)
Given current practice, I think the country in which a user resides is seen as private. Once you allow queries, they can also be used for things besides your examples. ChristianKl (talk) 10:15, 22 June 2017 (UTC)
Identifier for FamilySearch
I may have asked this before, but I can't find the results of the discussion. Should we have an identifier for FamilySearch even if there is no landing page for the link unless you register (for free)? For biographies it would provide the data and actual image of the record for people in the census, and other primary records for confirming birth, marriage, and death. Here is the page for Elizabeth Coleman White at https://familysearch.org/tree/person/29WD-R5W/details To see it you would need to register, but the site is a cornucopia of biographical information based on primary documents. We link to Geni.com, but that does not have the original documents for free. --Richard Arthur Norton (1958- ) (talk) 23:41, 21 June 2017 (UTC)
I noticed today that the watchlist has a feature where hovering over an item or property link gives the label and description in the user's default language - so, for example, Cédric Villani (Q334065) has a little label saying "Cédric Villani | French mathematician". RecentChanges and Contributions both have the same feature.
Elsewhere on Wikidata (such as here, or on an item page), hovering over an item/property link just gives the Q-number or P-number. It feels like it would be useful to turn the detailed version on everywhere, particularly when something links to an ambiguously titled item. Any thoughts? If there's interest I'll file a bug request. Andrew Gray (talk) 15:58, 21 June 2017 (UTC)
I would prefer disambiguation without hovers, but (in curly braces).
I don't think your description is accurate. Sometimes on the item page I see the description displayed. I think there's a script that tries to load the description the moment you hover over it. If the loading takes too long, the ID gets displayed. At least that's my best hypothesis for its behavior. ChristianKl (talk) 20:02, 21 June 2017 (UTC)
Good and bad news about cross-wiki search results in Wikipedia
Good news: The cross-wiki search results from other projects are now live in Wikipedia. Bad news: The search results from Wikidata are not part of the plan, meaning users won't see those results in any Wikipedia site. Feel free to share your thoughts here. --George Ho (talk) 20:36, 22 June 2017 (UTC)
George Ho does the "Discovery Team" know that Wikidata exists? But I think this is somewhat moot - or at least up to the language Wikipedias to deal with. In, for example, svwiki, if your search turns up few or no results it also does a Wikidata search - for example [10]. ArthurPSmith (talk) 13:18, 23 June 2017 (UTC)
They have enabled a gadget on the Swedish Wikipedia to show results from Wikidata. And it seems the Discovery team is aware of Wikidata, per this. Stryn (talk) 14:55, 23 June 2017 (UTC)
Nutritional Information
We currently don't have any nutritional information in Wikidata. The USDA Food Composition Databases seem to be a great resource from the US government. US government data is in the public domain, so we could import it.
Do we currently have the necessary properties? ChristianKl (talk) 12:59, 22 June 2017 (UTC)
Pierre from Wikidata & Open Food Facts, reporting for duty :-)
- We recently imported the whole USDA db into Open Food Facts. We're looking for contributors who can scan their food and add pictures to augment the data - millions of pieces of photographic evidence to take. There are also some generic profiles (without barcodes) in that database, which we haven't imported into OFF, but which I've put up for matching on Mix N'Match. More info at https://www.wikidata.org/wiki/Wikidata:WikiProject_Food/Properties
- Also, feel free to help Open Food Facts: we need contributors, translators, coders...
Unfortunately, Open Food Facts seems to have a license that's not compatible with Wikidata. I think having data under CC-0 so that everybody can use it is a more worthy project. ChristianKl (talk) 17:42, 23 June 2017 (UTC)
The USDA db is public domain, as are all works by US civil servants, although (for the part with barcodes) it is derived from producer data. So you can import it into Wikidata. For Open Food Facts, I do get your point: we have chosen the ODbL to ensure the db keeps growing, and that commercial apps play the game of sending pictures back. --Teolemon (talk) 18:45, 23 June 2017 (UTC)
Yes, importing from USDA would work for Wikidata.
I think you are wrong about the effects of licensing. Any company that uses a data set has an interest in that data being high quality. We likely wouldn't be able to exchange data with Songkick if we were ODbL-licensed.
In many cases, I would estimate that a serious company will build its own database instead of using Open Food Facts. They will use BSD/Apache-licensed libraries, but they won't use GPL-licensed ones. ChristianKl (talk) 21:20, 23 June 2017 (UTC)
date of death of Honoré de Balzac
I was just looking at date of death (P570) of Honoré de Balzac, and we seem to have two values: 18 August, which all Wikipedia articles state, and 19 August, "stated in" "Integrated Authority File" 3 years ago. I assume "Integrated Authority File" means this link, but I could not find any mention of a date of death there. Is it OK to delete that date if the source no longer states it? If there is a controversy over his date of death, then we need to keep both dates, but otherwise this looks like a typo that someone corrected a long time ago. What is the proper procedure for removing the data? --Jarekt (talk) 12:50, 23 June 2017 (UTC)
Ok, so that date is still returned by d-nb.info. I wonder if it is a typo or if there is a real controversy about the date of his death. --Jarekt (talk) 18:37, 23 June 2017 (UTC)
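For anyone auditing this later: statement-level data (both values, their ranks, and any "stated in" references) can be inspected directly at the query service with something along these lines. I'm assuming Q9711 is Balzac's item; double-check the ID before relying on the result.

```sparql
# All date-of-death (P570) statements for Honoré de Balzac
# (Q9711, assumed), with rank and any "stated in" (P248) source.
SELECT ?date ?rank ?source WHERE {
  wd:Q9711 p:P570 ?st .
  ?st ps:P570 ?date ;
      wikibase:rank ?rank .
  OPTIONAL {
    ?st prov:wasDerivedFrom ?ref .
    ?ref pr:P248 ?source .
  }
}
```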
Best practice etiquette related to choosing image (P18) property
I am working on importing some of the data stored in Commons Creator templates to Wikidata, and I am looking now at images used in creator infoboxes. I imported all the images that were missing on Wikidata, but now I am looking at cases where both the Creator template and the Wikidata item have an image, and the two differ. You can see the Creator templates in question here and the images at c:User:Jarekt/f. I think I have a good idea of what makes a superior image to depict a person, but I would like to double-check if others feel the same way. My current thinking is to replace the Wikidata image if:
new image depicts person and current image depicts work created by the person
new image is a better quality version of the old image
new image shows a single person and old image multiple people
new image shows the face, and the old image does not.
Any other suggestions, to help pick "better" image (P18)?
Shall we favor "portrait" vs. "landscape" aspect ratio?
In the case of painters and photographers I prefer realistic self-portraits to depict the person. How do others feel about that?
Is it a good idea to have multiple good images stored in P18? I think not, but just checking.
I completely support your first four points. Additionally, I think a color portrait is better than an engraving of a portrait in most cases. And if we have a mediocre "head shot" and a good full-length portrait, we should consider using a head-and-shoulders crop from the full-length portrait. I also prefer "portrait" rather than "landscape" orientation. On self-portraits, I generally agree; but I'd make an exception for Dante Gabriel Rossetti and use the Watts portrait rather than the self-portrait of DGR as a very young man. - PKM (talk) 01:29, 22 June 2017 (UTC)
I agree with all four suggestions. Also portrait ratio should be preferred, because of wide use image (P18) in infoboxes. No preference about self-portraits. Single image is preferred.
Watch out for pictures not representing the person like images of signature or grave (without picture of bust of person).--Jklamo (talk) 11:25, 22 June 2017 (UTC)
Thank you both for confirming my rules of thumb. I am replacing hundreds of images with (hopefully) better ones. I thought I would share a little shortcut I find very useful, which I use in c:User:Jarekt/f. That page was created by an Excel spreadsheet pairing up images from Commons with images from Wikidata for the same person. A little (+) can be used to perform a 2-click replacement of the Wikidata image with the Creator one (I am not suggesting the actual replacement in the case of those 2 images, so only do one click). That technique can probably be used for other mass tasks with a human in the loop. --Jarekt (talk) 15:33, 22 June 2017 (UTC)
For Classical Greek and Roman authors, a contemporary bust (Q241045) (sculpture) is usually the best option over a mosaic, painting, etching, or later form of depiction. The best photographs of busts have the face pointed towards the viewer, and the best choice of bust will have the nose intact. When a bust is not available, a depiction on a coin may be useful. Care has to be exercised, though, because some sculptures have been incorrectly identified and only later found to be so. The data accompanying some of these sculptures is not always present yet in Commons or Wikidata. --EncycloPetey (talk) 21:02, 22 June 2017 (UTC)
To start with, do we all agree that there should be only one file in P18 for every item? I can easily see that if not, Wikidata could become a trash can similar to Commons; on the other hand, I see added value in, say, having both an exterior and an interior of a building. Has this actually ever been discussed?--Ymblanter (talk) 08:49, 23 June 2017 (UTC)
Well, I would prefer to see both a male and a female file in lion (Q140) like it already has.
Replacing an image which already has a caption in a language you cannot interpret and change could be very troublesome.
A related topic: I would like to allow things such as depicts (P180) as a qualifier to P18. Often we have pictures of churches to illustrate a village. But how to tell which church? The discussions so far have rejected such usage, but I think it could be very useful, since you cannot add links in the caption qualifier. And what a picture illustrates depends on which item/article you use the file in. File:Carlito syrichta on the shoulder of a human.jpg has for example been used to illustrate everything from Lsjbot to Primates and everything in between. -- Innocent bystander (talk) 11:49, 23 June 2017 (UTC)
I agree that male and female images (if different) for living organisms are a great idea. I guess that is an example of a case where multiple images stored in image (P18) are a good idea. In the context of Commons Creator templates the images will be mostly of people, and for those I find multiple images troublesome. I did not pay much attention to captions; so few images have them that I did not notice you could add them. To me a new, better image outweighs the loss of a caption, but it should be considered in the case of minor improvements. One thing I do not like in P18 images of people is images depicting their artworks instead of depicting the artists. When the artwork shows someone else, that is hard to spot. --Jarekt (talk) 12:11, 23 June 2017 (UTC)
Agree: if it isn't a good self-portrait, an artwork made by the subject is not a good file for P18 at all. I guess they have been imported from WP articles and the like. No big problem in WP if the caption says it isn't the subject, but such things go unnoticed by these rough tools. -- Innocent bystander (talk) 12:22, 23 June 2017 (UTC)
I like your rules of thumb, and I agree that "image of the artist's work" should never be in P18. However, would it be worth creating a new property for "image of an example of this person's work", to go alongside images of plaques, graves, signatures, etc.? This would give us somewhere to put these, which might be useful in cases where we have no other appropriate image but still want some kind of illustration (e.g. for an infobox). Andrew Gray (talk) 12:19, 23 June 2017 (UTC)
Well, don't we already have a property for "notable work"? I guess P18 could be used as a qualifier to those. Or even better, be put inside those work-items. -- Innocent bystander 12:25, 23 June 2017 (UTC)
{{Comment}} @Jarekt: I don't see that we are exactly limited to a single image, though where adding more than one it is important that one is marked as preferred. — billinghurst sDrewth 13:00, 24 June 2017 (UTC)
Annoyed with the amount of merges that can be done
I really hate it when people request an item to be deleted when all they need to do is merge it. A waste of bytes for the system, a waste of my time, and just a waste of time for the person submitting, because they have to find the duplicate and write it in the reason. PokestarFan • Drink some tea and talk with me • Stalk my edits • I'm not shouting, I just like this font! 12:59, 24 June 2017 (UTC)
We could do with either changing our "single value" constraint so that deprecated values are (optionally) ignored, or having an alternative constraint that could be applied to that end (without the need to write complex constraint code). Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits16:43, 12 June 2017 (UTC)
Not just for deprecated, please. E.g. for Thailand central administrative unit code (P1067), there is one current single value, but there may be several older ones - which then have normal rank, not deprecated, as they were correct in the past, whereas the current value has preferred rank. So "single value" should just check for a single value at the highest used rank. Ahoerstemeier (talk) 21:43, 12 June 2017 (UTC)
I'd also suggest that it pay attention to whether there's a start/end date on entries. There could be two valid historic values for something, both marked 'normal', and with suitable start/end dates, but no current version to mark as preferred. --Oravrattas (talk) 06:52, 13 June 2017 (UTC)
In that case, why not add a "no value" with preferred rank to mark that there is no current value? Then it'd work without problems by checking the highest used rank. Ahoerstemeier (talk) 07:52, 13 June 2017 (UTC)
That is fine in the case where it's definitively known that there is no valid value at the moment, but it's possible to have a case where there are historic versions that are known to no longer be valid, and where people have correctly added the information for those. Forcing them to then either find a new value, or explicitly choose between 'no value' and 'unknown value', when they may not have that knowledge, simply to satisfy a constraint check, violates the "missing is not broken" principle. --Oravrattas (talk) 13:51, 13 June 2017 (UTC)
I think we can go with this: "single value" now means "single best value", ie. the highest rank in a statement group should only occur once. Matěj Suchánek (talk) 14:16, 16 June 2017 (UTC)
When it comes to authority control, where there should be only one correct value, I don't think 2 statements with normal rank and 1 statement with preferred rank should pass the constraint.
I think that the default "single value" constraint should allow deprecated values but limit the number of other values to one. We could have an additional "single best value" constraint that also allows multiple statements with normal rank alongside one statement with preferred rank. ChristianKl (talk) 13:32, 18 June 2017 (UTC)
We have phabricator:T167653 already, it seems. My current thinking is that "single value" means there should only be one best-ranked statement. Are there cases where this leads to data we do want to flag not being flagged? --Lydia Pintscher (WMDE) (talk) 09:23, 26 June 2017 (UTC)
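As a concrete reading of "single best value": the query service marks the highest-rank statement(s) of each statement group with wikibase:BestRank, so violations can be sketched as items carrying more than one best-rank statement. Using Ahoerstemeier's P1067 example from above:

```sparql
# Sketch: items with more than one best-rank value for
# Thailand central administrative unit code (P1067) - i.e.
# "single best value" violations. wikibase:BestRank marks the
# highest-rank statement(s) in each statement group.
SELECT ?item (COUNT(?st) AS ?bestValues) WHERE {
  ?item p:P1067 ?st .
  ?st a wikibase:BestRank .
}
GROUP BY ?item
HAVING (COUNT(?st) > 1)
```

Under this reading, one preferred value alongside several normal-rank historic values passes, while two normal-rank values with no preferred one does not.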
Wiktionary sitelinks enabled in Wikidata
Hello all,
As mentioned previously, we are now able to store the interwiki links of all the Wiktionaries namespaces (except main, citations, user and talk) in Wikidata.
Important: even if it is technically possible, you should not link Wiktionary main namespace pages from Wikidata. The interwiki links for them are already provided by Cognate, and in the future, Wikidata will also have special entity types for lexemes (see Wikidata:Wiktionary and mw:Extension:WikibaseLexeme/Data Model).
How can you help?
First of all, you can help us translate this documentation page into the languages you know.
If you know of tools, scripts, or bots that could be useful for the migration process and for removing the manual sitelinks, please share the information on the page and offer help to people who would need to use them.
You may want to pay special attention to the newly created items and all the recent changes that will result from this new feature becoming available to the Wiktionaries.
Be friendly and welcoming with the Wiktionary editors :) Help them if necessary, make them feel part of the great Wikidata community.
Are you aware of any bot that could transfer all the interwiki links to Wikidata automatically? I guess the bots which did the job for other projects can be adapted for this task. Pamputt (talk) 00:59, 22 June 2017 (UTC)
I haven't looked into European Parliament information in Wikidata in any great depth yet, but my impression is that we're quite far away from being able to answer something like this. Most of the position held (P39) entries that I've seen for Member of the European Parliament (Q27169) are completely bare — i.e. not even qualifiers that will tell us when this was true, or which constituencies they were elected from — so we have no way to even produce a list of who we think the current MEPs are to see how complete it looks. And committee memberships is something that we don't really have in much depth for any legislature yet (and not at all for the vast majority of them). My impression is that some people have done work tidying up data on the MEPs from individual countries, but no-one is actively working on the European Parliament as a whole. --Oravrattas (talk) 06:30, 24 June 2017 (UTC)
We even have a few hundred with no country, as well. There was a push to get them all in based on matching to a database produced by the EP - so we do have them all, give or take some errors or duplication, but beyond that the data is pretty light. Andrew Gray (talk) 10:32, 24 June 2017 (UTC)
The first two showed up on my watchlist; they sounded wrong to me so I reverted. But the user re-reverted again without an explanation. So I started a discussion on their talk page (Topic:Tt6ir4v8yjzxodgf), but made no progress, it seems we're talking past each other. Can anyone chime in with a third opinion please? Intgr (talk) 15:23, 26 June 2017 (UTC)
client A program that establishes connections for the purpose of sending requests.
server An application program that accepts connections in order to service requests by sending back responses. Any given program may be capable of being both a client and a server; our use of these refers only to the role being performed by the program for a particular connection, rather than to the program's capabilities in general. Likewise, any server may act as an origin server, proxy, gateway, or tunnel, switching behavior based on the nature of each request.
When one says "web framework" it can mean anything, depending on the perspective of the speaker and their grasp of technology. There is no standard for "web framework". 17:54, 26 June 2017 (UTC)
Thanks a lot, this is much clearer! But I do not really understand why you keep bringing up the HTTP protocol. Not every web-related concept is described by the standard for the HTTP protocol. (As a random pick, I would take Gangnam Style (Q890): not sure whether it is more of a server or a client. But it is definitely a very important component of the web!) − Pintoch (talk) 18:32, 26 June 2017 (UTC)
Comment I have not changed my opinion that "web ..." items aren't useful, because of how complicated "web standards" are now.
d1g the world wide web was invented by Tim Berners-Lee in 1989 or thereabouts (en:World Wide Web). Servers have been around since the internet began, at least 20 years earlier - see en:Server (computing) ("In computing, "server" dates at least to RFC 5 (1969)"). An email server, for example, responds to SMTP requests, not to HTTP, so it is not a web server, and certainly not a web application, but it is server software. Your use of "said to be the same as" here is clearly wrong, and your arguments are not making any sense at all. ArthurPSmith (talk) 18:18, 26 June 2017 (UTC)
That's my point: "server"/"client" have been there since day 1 or so; you can't define and source "web framework" consistently.
"web framework" is meaningless: commonly it means "server", but sometimes it is used for "client" too. d1g (talk) 18:28, 26 June 2017 (UTC)
so why are you trying to claim something that you assert is "meaningless" is "the same as" something that is clearly well defined and "was there since day 1"? In fact, while "web framework" may be somewhat buzzwordish, in my experience it does have a tangible meaning distinguishing that kind of software environment from the other components of an http(s)-based internet service. Which is in fact why we have a whole enwiki page on it: en:Web framework. enwiki doesn't usually have entire long pages about things that are "meaningless". ArthurPSmith (talk) 19:12, 26 June 2017 (UTC)
QuickStatements is a great tool and I use it extensively however it is lacking documentation (as we already discussed at Project chat) so the main way of figuring things out is try and error. Reading User:Magnus Manske/quick statements2 page suggests that the tool might be supporting Qualifiers. Did anybody figured out how to do it? I would like to be adding statements like date of birth is "circa 500 BC" (as in Pythagoras (Q10261)). --Jarekt (talk) 18:46, 26 June 2017 (UTC)
Matěj Suchánek & ChristianKl, thank you. I misunderstood the documentation in the old version. I thought one could add multiple properties with a single statement, but it is one property and multiple qualifiers. That makes much more sense. Thanks again. --Jarekt (talk) 19:37, 26 June 2017 (UTC)
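For anyone finding this thread later, a "circa 500 BC" date of birth would look something like the line below in the tab-separated command syntax: one property-value pair, with qualifier pairs appended to the same line. I'm assuming P1480 = sourcing circumstances, Q5727902 = circa, and the /9 suffix meaning year precision; please verify these IDs and the exact date format against the tool's documentation before any bulk run.

```
Q10261	P569	-0500-00-00T00:00:00Z/9	P1480	Q5727902
```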
Integrate the merge function into the default UI
Currently, a user has to activate a gadget to be shown the merge function. This means that new users are discouraged from merging items. Given that we have a huge merge backlog, it might be worthwhile to integrate the merge function by default. What do you think? ChristianKl (talk) 07:30, 18 June 2017 (UTC)
I agree this would be very useful. We can enable the gadget by default, that's by far the easiest way to do it. Opinions please! :-) Multichill (talk) 10:55, 18 June 2017 (UTC)
I think the gadget lacks some quickly accessible explanations how to use it, what it does and some warnings when not to use it. This is necessary to prevent accidental misuse. Matěj Suchánek (talk) 15:28, 18 June 2017 (UTC)
(edit conflict) Just a question (unclear to me): do you propose enabling it for all logged-in users, or for everybody? 
There will always be some misuse since mistakes always happen. But the new audience to get this gadget enabled (ie. newbies) is more likely to make mistakes. This is my concern, hence my proposal. Matěj Suchánek (talk) 20:57, 18 June 2017 (UTC)
My preference would be to enable it for everybody. Sometimes this will indeed lead to mistakes but I think that the amount of increased engagement is more valuable than the mistakes that will be made. I would also be okay with enabling it for logged in users. ChristianKl (talk) 22:44, 18 June 2017 (UTC)
ChristianKl - while I think this would ultimately be the right thing to do, there are several features that I believe are not now present in the merge gadget that I think should be addressed before it receives more widespread use:
I don't believe it recognizes different from (P1889) which should prevent merges between the respective items if it is present on either of them. There are perhaps some other properties that should also prevent or at least raise a question about merges (for example if country (P17) is set for both and differs, or if coordinate location (P625) differs by more than, say 10 km (?)).
the description in a given language on the merged item is completely lost if the item it is being merged into already has any sort of description in that language (and many old items have bare-bones descriptions auto-generated from P31's). There should be a way to pick the better description or merge descriptions in some way if appropriate.
merging should sensibly handle the case where one item has a wikilink to a redirect and the other has the direct link (i.e. allow the merge in this case even though both pages have links in the same language).
otherwise I think merge really needs to be limited to more experienced wikidata users who can recognize and know how to deal with these issues. ArthurPSmith (talk) 23:04, 18 June 2017 (UTC)
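The different from (P1889) check mentioned above is cheap to express; a gadget (or a bot auditing past merges) could run something like this before allowing a merge. Q1 and Q2 here are placeholders for the two items being merged:

```sparql
# True if either item marks the other as "different from" (P1889),
# in which case the merge should be blocked or at least flagged.
ASK {
  { wd:Q1 wdt:P1889 wd:Q2 . }
  UNION
  { wd:Q2 wdt:P1889 wd:Q1 . }
}
```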
Comment ... "we have a huge merge backlog ..." - citation needed! Or maybe it is language-specific. If we think it is true, can we demonstrate that?
+1 to Matěj Suchánek's and ArthurPSmith's statements. I would prefer that a new right be developed that could be auto-assigned after reaching an edit-count criterion, rather than giving this to all users at any time. There definitely needs to be some better supporting information. A number of us more experienced users have fallen over with category-to-main, taxonomy, and settlement-to-municipality merges where there were errors in the existing links, or other errors that made them look the same.
The fact that you see books being merged with version, edition or translation (Q3331189) doesn't necessarily mean that someone used the merge tool to do it. Magnus' merge game suggests items for merging when they share labels or aliases. When a book and an edition share a name, it will suggest merging them. ChristianKl (talk) 07:53, 19 June 2017 (UTC)
Comment @ChristianKl: then please have those instructions updated or removed, as it is simply inappropriate, and we need a stronger means to prevent such merges. Further, I would suggest that version, edition or translation (Q3331189) should basically not be presented with the suggestion to be merged; it is very unlikely we will need to merge editions — billinghurst sDrewth 13:05, 24 June 2017 (UTC)
I think we have a merge backlog because, in the SPARQL query shared on the German Project chat that's supposed to look for items to be deleted, items that should instead be merged turn up more frequently than items that should actually be deleted. In the recent RFC about allowing redirects there was also a concern that it would increase an already existing merge backlog. ChristianKl (talk) 07:53, 19 June 2017 (UTC)
Please provide a link to the query so we can all see the evidence. PLbot produces lists of works to merge, and from my observations of English language components it would appear to be in a managed state. — billinghurstsDrewth13:12, 24 June 2017 (UTC)
PLbot can only find items that actually have the same name. When working through tinyurl.com/ycr58tmo I found a bunch of items where merging made more sense. At the moment, for some reason, that query seems to time out. ChristianKl (talk) 11:32, 27 June 2017 (UTC)
If we want to couple merging to a user right, it might make sense to use the already existing auto-patrol flag. It could work for both the merge tool and for the merging game. ChristianKl (talk) 13:03, 19 June 2017 (UTC)
Yes you could merge it if you thought that they should be co-assigned. That said, that is an applied right either by application or grant of administrator, so it will not necessarily meet your needs unless administrators are pro-actively assigning the right. And to note that you would need to do a little magic through your gadgets as people should still be able to turn it off if they do not wish to see/have the merge functionality. As it is already a gadget, I don't think that you will get best value to co-assign it that way. — billinghurstsDrewth22:53, 26 June 2017 (UTC)
Another thought. If we have some editing criteria, eg. 1000 edits, as a point when we think that someone knows enough to merge, we could just run a bot through finding users who pass a milestone value during the past <name your time period> and leave a note about the gadget. — billinghurstsDrewth22:58, 26 June 2017 (UTC)
Given that the user has to click on "More" before they see the possibility to merge, I can't imagine why someone wouldn't want to have the merge function enabled. ChristianKl (talk) 11:32, 27 June 2017 (UTC)
Bulk create items with only a single sitelink and no statements?
In the past, bots have occasionally created thousands of items with sitelinks only and no further statements.
The situation has improved. Items without any statements for enwiki are at 4.6% (see Wikidata:Database_reports/without_claims_by_site). Still, we have items created years ago without any statements.
The question is whether we should continue to create items without any statements (and wait until some day someone adds statements to them). Alternatively, we could decide that bulk-created items should include at least one statement, even if a sitelink is present. --- Jura22:01, 25 June 2017 (UTC)
For Wikipedia, the answer may be "yes". For Wikisource, the answer is a resounding "no". Wikisource items without any statements are valueless, and many, many, many items at Wikisource sites are chapters of books, or acts within plays, and really should not have data items. --EncycloPetey (talk) 02:50, 26 June 2017 (UTC)
Clearly the tool you mentioned, Wikidata:Database reports/without claims by site, does not work for unlinked pages. In fact, only a few tools (e.g. Duplicity) work for unlinked pages, but it's a huge backlog (multiplied by hundreds of wikis), and it's not practical unless someone periodically checks them (like nlwiki). Also, another way to kill unlinked items is QuickStatements, which requires that both the subject and the object are connected.--GZWDer (talk) 04:25, 26 June 2017 (UTC)
@Jura1: I think that every item in Wikidata without any statement is useless (even when it has multiple sitelinks, though the more sitelinks it has, the easier it is to add a statement), and can therefore be deleted. Maybe a way to decrease the number of un-statemented items is to make a list with 50 items on it and give it a week to be improved. If the week has expired and there are still items with zero statements, they can be deleted without going through RfD. If anyone has any other idea to clean up the list of empty items, feel free to add it!
"I think that every item in Wikidata without any statement is useless (even when it has multiple sitelinks...), and therefore can be deleted" So you'd be happy to break the interwiki linking between two Wikipedias, or other sister projects? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits10:33, 26 June 2017 (UTC)
@Pigsonthewing: No, not at all, I see how much Wikidata can do to connect all the different languages. But if there are no statements, I think that the items are quite useless. And as I also said above, the more sitelinks an item has, the easier it should be to create some statements about it because there are multiple articles about that topic. So I think we should put more work in adding statements to (almost) empty items. Q.Zandenquestions?10:42, 26 June 2017 (UTC)
@QZanden: Is your position that you see what Wikidata can do to connect different languages but you still want to remove items that provide sitelinks between different articles in different languages?
Given that descriptions of items can be displayed in the Android App, items with a single sitelink and no statements can provide benefits to users. ChristianKl (talk) 11:06, 26 June 2017 (UTC)
The number of items without any sitelinks is growing steadily. It is impossible to guess what articles exist in any and all of the 285+ Wikipedias; when there are items, they are found when you search for them. We should more aggressively add items for articles that are not linked to an item. Note that the English Wikipedia is not of special relevance here; the German Wikipedia, for instance, is more complete in many fields. Thanks, GerardM (talk) 11:20, 26 June 2017 (UTC)
@Q.Zanden: Because for any given chapter of a book, there must be (a) a general data item for the chapter as a work, and then (b, c, ...) separate data items for each edition of that chapter. So, if we have a 1910 UK edition of The Time Machine, with 12 chapters, then we need to create 12 data items for the 12 chapters as works, then another 12 data items for the actual chapters of the 1910 UK edition. The 12 work chapters will be part of the book as a work, but the 12 edition data items for chapters will be part of the specific edition, and also editions of the work chapters. This runs into further problems when you realize that there is more than one way that The Time Machine is divided into chapters, namely editions with 12, 14, or 16 chapters, which are not analogous to each other. So we need one data item for the novel as a whole, one data item for the serialized version as a work, one data item for the Heinemann text as a work, one for the Holt text as a work, one for the Atlantic text as a work; then each individual edition can be added. But then that whole discussion about chapters as works and editions? That gets multiplied by four because of the three novel texts and one serialized version, and each set of chapter data items needs to clearly identify to which edition of the text it belongs. And that just deals with English editions. For translations into other languages, there's no guarantee that they will fit any of those editions. And because translations are separate data items, and editions are separate data items, there will never be any interwiki links to reduce the number of data items through merging. Create chapter data items? The creation of those for a single novel can run into the hundreds. And if a text is misidentified and all its items have to be located and edited? The work and maintenance are prohibitive beyond belief, with no real benefit.
What is the advantage of having a data item for chapter 3 from the 1910 UK publication of the Heinemann text of H. G. Wells' The Time Machine? It will have no interwiki links, because only the English Wikisource will have English copies of chapter 3 of the 1910 edition, etc., and neither will it have external links to any data items, because libraries don't do that sort of thing. --EncycloPetey (talk) 11:19, 26 June 2017 (UTC)
I think the situation for Wikisource is somewhat different and not my primary concern when I started this thread. Numbers for enwiki are actually going down: (333789 / 4.6%) yesterday compared to (411047 / 6.2%) a year ago. Still, I don't think the current level is necessarily a good base level. There are benefits to having single-sitelink items, but I'm less convinced that they outweigh the disadvantages. --- Jura13:41, 26 June 2017 (UTC)
I find even items with nothing but a label and a single sitelink to be useful in Wikidata in themselves. Yes, they require some human attention to become more useful, but they ensure that (if they are complete) everything that any Wikipedia has written about has some record in Wikidata, and can therefore be found and enhanced. If one instead had to search through every language Wikipedia to see whether there is an entity matching the external identifier one is looking at, I would find that nearly hopeless. So I would strongly oppose any effort to stop adding these items. ArthurPSmith (talk) 14:36, 26 June 2017 (UTC)
I find these useful as well. As long as our policy is that every article in every Wikipedia should link to one and only one Wikidata item, we should create these as we go. - PKM (talk) 19:06, 26 June 2017 (UTC)
For the Dutch Wikipedia we have an approach:
Monitor new article creations and try to link them to an existing item, or create a new item with some statements. This is all done by humans with tools like Duplicity.
When an article is at least 28 days old and hasn't been edited for 21 days, a bot comes along and creates the missing item. In practice this rarely happens, but it prevents a backlog.
User:NoclaimsBot also works on the Dutch Wikipedia, but rarely has any hits these days. It would probably be good to run the newitem bot for other language Wikipedias too. This way the new items come in a steady flow instead of a huge pile of items every once in a while. Multichill (talk) 19:36, 26 June 2017 (UTC)
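The nlwiki rule described above (create an item only once an article is at least 28 days old and has been unedited for 21 days) can be sketched as a small eligibility check. This is an illustrative sketch only, assuming the thresholds stated in the discussion; the function name and structure are mine, not the actual bot code:

```python
from datetime import datetime, timedelta

def eligible_for_item_creation(created, last_edited, now=None):
    """Return True when an unconnected article qualifies for automatic
    item creation under the nlwiki rule: the article must be at least
    28 days old and unedited for at least 21 days."""
    now = now or datetime.utcnow()
    old_enough = now - created >= timedelta(days=28)
    stable = now - last_edited >= timedelta(days=21)
    return old_enough and stable

now = datetime(2017, 6, 26)
# Created 30 days ago, last touched 25 days ago: the bot may create the item.
print(eligible_for_item_creation(now - timedelta(days=30), now - timedelta(days=25), now))  # True
# Edited a week ago: humans are still working on it, so the bot waits.
print(eligible_for_item_creation(now - timedelta(days=30), now - timedelta(days=7), now))   # False
```

The waiting period gives human editors (and tools like Duplicity) time to connect the article to an existing item first, so the bot only mops up the leftovers.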
Comment @QZanden: Chapters in a novel don't have items, as they are predominantly not individually notable. How many "Chapter 1" labels would you like to see throughout Wikidata? Where an individual component of a work is notable, e.g. a biographical entry, a paper, a poem, then the Wikisources do look to list them individually. — billinghurstsDrewth21:38, 26 June 2017 (UTC)
Maybe we can try to adopt some of the exemplary practices of nlwiki for other Wikipedias. Obviously, it works best if contributors from a given Wikipedia regularly check Duplicity. On the Wikidata side, maybe for some categories items could be created regularly by bot after a fairly short time (including a P31 value). For others, the bot could create them after a wait time (even without P31/P279). --- Jura11:47, 27 June 2017 (UTC)
Once in a while, constraint definitions get re-loaded from property talk pages. It should be ok next month. --- Jura19:52, 27 June 2017 (UTC)
Named time and style periods
There is a tiny problem with named time periods (read: HUGE problems). Such periods not only have different names in different countries, but they also have different start and end times. They can even appear in a different order in different countries. Very confusing. Very, very confusing. In Norway we had the Merovingian period between the younger and older Iron Age. In Sweden they had the Vendel period. In Denmark they had the Younger Germanic Iron Age. The Iron Age in Norway runs from 500 BC to approx. 1000 AD, unless we are talking about the Sami people, who had an Iron Age from 0 AD to 1500 AD. It is a complete mess.
So what to do? It seems like Iron Age (Q11764) has a sort of solution, but incoming references might be wrong if they don't use an additional specifier to identify which period is meant and where it applies. And to whom, as for the Sami people. Use applies to part (P518) with an ethnic group?
Another strange example. Assume a building described as belonging to a specific architectural period like Swiss chalet style (Q2256729). This architectural period starts in a different year in a neighboring country. Take Norway and Sweden as an example. In Norway the period started about 1840 and in Sweden about 1850. By 1855 it had already started to fall out of popularity in Norway, being replaced by dragestil (Q4562834). In Norway there were factories mass-producing such houses, exporting them to Sweden, where they were sold and built as Swiss chalet style. So, does a building built in Norway (or Sweden) in the late 1880s belong to the Swiss chalet style when it is described on the Swedish vs. the Norwegian Wikipedia? What style period does Villa Fridheim (Q14942980) belong to on the Swedish vs. the Norwegian Wikipedia? Can we say that it is the same style period anyhow?
So, the description for a style period should probably have a reference both to where the object is and to where the object originated. There are probably some problems associated with the observer too, but this is more than complex enough for now.
As to styles, a style may originate in a time period, but as you note, styles migrate, and things can be created in an outdated or old-fashioned style. I think we need to think about styles and historical periods separately. We haven't yet built out a complete hierarchy of styles. We could do that, possibly using the Getty Art & Architecture Thesaurus, and then associate the styles with a period using <inception>. - PKM (talk) 20:52, 27 June 2017 (UTC)
Yeah, historical periods and style periods are different, but both relate to time, and it is confusing. There are probably more such periods. Are style periods from architecture, visual art, and literature the same? I suspect they're not, even if architecture is an art. Perhaps this is related to model vs. instantiation. Jeblad (talk) 21:26, 27 June 2017 (UTC)
However, no such constraint appears among the item's statements. Given that my premises listed above are true, how is it that my conclusion was false? (In other words, where is the constraint specified, if not at the page Property:P1196?) More importantly, why is the constraint not defined among the statements at the page Property:P1196?
N.B. I see that Property_talk:P1196 contains the template {{Constraint:One of|values={{Q|3739104}}, {{Q|171558}}, {{Q|10737}}, {{Q|149086}}, {{Q|15729048}}, {{Q|8454}}, somevalue }}. Is this where the constraint is formally specified? If so, why is it done in this totally non-obvious manner, on the property's talk page, and not using property constraint (P2302) on the item's page?
Constraints are currently defined in templates (like {{Constraint:One of}}) on property talk pages, but will soon (within a few weeks, hopefully) be migrated to constraint statements. Some properties already have constraint statements (e. g. COSPAR ID (P247)), but even in those cases the constraints defined in statements and on the talk pages may differ. This will hopefully be sorted out once we (the WikibaseQualityConstraints extension as well as Ivan A. Krestinin’s report generation tool) use constraint statements.
The documentation on the property constraints portal (and the subpage) is independent of those constraint definitions – I copied over the manner of death (P1196) constraint since it was a nice, short example, but decided to leave out the unknown value for now, since it wasn’t yet clear how the constraint would deal with that value. Now that this has been cleared up, I should probably update the documentation to remove this confusing mismatch.
@Lucas Werkmeister (WMDE): many thanks indeed for this reply, which makes things much clearer to me. Given that Help:Property_constraints_portal currently has no mention whatsoever of {{Constraint:<whatever>}} templates, it is likely to come as a surprise to other users (as it did to me, before you replied) to find (a) that they exist at all, and (b) that they are the mechanism by which some (all?) of the examples given at Help:Property_constraints_portal are implemented. While your reply has definitely helped me, this documentation/implementation mismatch should definitely be addressed, so that this issue will be clearer for other new users, too :) Zazpot (talk) 17:11, 28 June 2017 (UTC)
@Zazpot: well, the documentation is only a few weeks old, and I didn’t want to spend much time on documenting a soon to be obsolete system… constraint statements really shouldn’t take more than two weeks by now, I think (perhaps three depending on deployment schedule). --Lucas Werkmeister (WMDE) (talk) 17:18, 28 June 2017 (UTC)
@Jura1: when ChristianKl said "only the constraints on the talk page get interpreted by the constraint tools", it was not clear to me at the time what that meant, because I did not know what "constraints on the talk page" were. Now it makes sense :) Thanks for the link to that helpful revision of the Constraints Portal page, which spared me from wading through many dozens of intermediate revisions! Zazpot (talk) 18:11, 28 June 2017 (UTC)
It's not a matter of converting. The existing item is fine. There's the option of creating a corresponding property, but that wouldn't mean we would get rid of the existing item that's linked to the interwiki links. If you want a new property, the process is to write a property proposal. ChristianKl (talk) 11:56, 26 June 2017 (UTC)
It has been previously requested of Emaus and GZWDer to watch their category creations as they have had a tendency to create duplicates. In the past few days EmausBot and GZWDer (flood) have created 840 duplicate categories for Commons <-> EnWiki alone [11], and at a guess many others. This is unsustainable and an abuse of volunteer time and patience to have to resolve that sort of rubbish. There needs to be a more mindful approach to the creation of categories than the current apparent "wham bam thank you maam" approach that is taking place. — billinghurstsDrewth04:18, 27 June 2017 (UTC)
Given the number of created category items, that seems to be an error rate of less than 1%. It would be good if it were lower, but it's not that bad.
Maybe @Magnus Manske:'s merge game can be changed so that lists of merge candidates like the one Pasleim created go to the top? Given that the game currently has a high false-positive rate, merging might get more productive with presorted lists. ChristianKl (talk) 08:59, 27 June 2017 (UTC)
Umm, how about we ensure that there is a checking mechanism and that it is both updated and current? While your percentage of errors may appear reasonable, the specific number is not. Both users need to update their code and practice. The creation of 800+ duplicated category items is not a good use of the volunteer time needed to laboriously merge them. — billinghurstsDrewth04:10, 28 June 2017 (UTC)
BTW, by the end of the job of creating items for all bot-created articles in ceb.wiki (it is not me who works on this now - post updated on 14:58, 29 June 2017 (UTC)), there will be tens of thousands of duplicated items, and a big part of them will not appear on any page of User:Pasleim/projectmerge. XXN, 22:26, 28 June 2017 (UTC)
I have found that many of the cebWP pages have aligned with svWP, so it hasn't been the most horrendous, though if that is continuing unregulated, then you just need to have PLbot generate reports. All that said: fix the problem, don't propagate the problem. We need to require bot operators to look to match, not just blithely keep creating because it is easier. — billinghurstsDrewth00:18, 29 June 2017 (UTC)
You’re closing your eyes. New patrollers won’t pop up just because you say we need more patrolling. It’s usually a mistake to say "this is the answer" to a complex problem. A complex problem has multiple faces, hence multiple entry points from which it can be solved. We should take advantage of each of them to arrive at efficient vandalism fighting. author TomT0m / talk page11:07, 28 June 2017 (UTC)
thank you. we need to welcome good faith newbies, and train them, rather than labeling them as vandals and whinging at project chat. after all, where will we get the editors to work the backlogs, if we do not recruit them? Slowking4 (talk) 03:12, 29 June 2017 (UTC)
Bot reversion of vandalism
On the English Wikipedia and other large wikis, there are a multitude of bots such as ClueBot NG which revert obvious vandalism (e.g. nonsense, rude words, page blanking) quickly after its insertion into articles. Is there a bot currently doing this on Wikidata; and if not, is it possible for this to be done (for statements, as well as labels and descriptions) given the obvious differences in data format? Jc86035 (talk) 14:37, 24 June 2017 (UTC)
That's a very interesting topic! I guess a machine learning approach could go a long way. There are many interesting features to take into account: the constraint violations (not just format), the editor's experience, the tags, the edit type, size… − Pintoch (talk) 18:18, 24 June 2017 (UTC)
ORES already does the machine learning work and is supposed to have an interface that communicates the likelihood that an edit is vandalism. ChristianKl (talk) 09:04, 25 June 2017 (UTC)
At the moment there isn't a bot, but if I understand the ORES mission the right way, they see it as their role to provide the necessary information for a potential bot to do a task like this. ChristianKl (talk) 10:03, 25 June 2017 (UTC)
I've found ORES marking a large number of my recent merge edits in red - I think it needs to be adjusted for how Wikidata works, because it is badly wrong about vandalism in a lot of cases here. ArthurPSmith (talk) 14:21, 26 June 2017 (UTC)
The trouble with vandalism has always been that overt vandalism is easy to detect but covert vandalism is hard. I suspect that there is less overt vandalism on Wikidata. That's not to say I am not interested in the problem. All the best: RichFarmbrough, 12:31, 26 June 2017 (UTC).
I started the bot with a threshold of 98.7%, which will catch around 10% of vandalism. I can change it to 97% and it will catch more than half of it, but then one in ten reverts will be wrong, and we don't want that. Amir (talk) 13:17, 29 June 2017 (UTC)
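The trade-off Amir describes boils down to a threshold check on the ORES "damaging" probability: a higher cut-off means fewer false reverts but lower recall. A minimal sketch of that decision logic (the helper name is hypothetical, and a real bot would additionally fetch scores from the ORES service and perform the revert):

```python
def should_revert(damaging_probability, threshold=0.987):
    """Apply the threshold trade-off described above: the default of
    98.7% reverts only edits ORES is most certain about, trading recall
    (~10% of vandalism caught) for very few false positives. Lowering
    the threshold to 0.97 would catch more than half of the vandalism,
    but roughly one in ten reverts would then be wrong."""
    return damaging_probability >= threshold

print(should_revert(0.99))          # True  - confidently damaging, revert
print(should_revert(0.975))         # False - below the conservative default
print(should_revert(0.975, 0.97))   # True  - caught by the looser threshold
```

The threshold is the single knob here: the precision/recall figures quoted in the discussion come from evaluating ORES scores against labeled edits, not from the check itself.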
Thanks for pointing that out! I really need to write a tutorial to explain how to use OpenRefine for Wikidata imports. This is exactly what this tool is good for, but people are just not aware it exists. − Pintoch (talk) 18:10, 27 June 2017 (UTC)
It's not entirely clear whether the hub is for people trying to figure out how to do it (without asking for help on specific points) or people trying to find someone else to do it for them. Presumably, the second group would do better to ask at Wikidata:Bot requests. --- Jura18:19, 27 June 2017 (UTC)
Some imports are large enough that we normally want a bot approval for the import to happen. It seems like currently people think that if they announce the import at the Data hub, everything is fine. ChristianKl (talk) 12:37, 29 June 2017 (UTC)
@Zazpot: In our data model, items don't have constraints; adding constraints to items does nothing. Even for properties, constraint statements are currently ignored and only the constraints on the talk page get interpreted by the constraint tools. Constraints on Wikidata also don't prevent anybody from adding new statements; they only show that a constraint was broken. ChristianKl (talk) 21:04, 27 June 2017 (UTC)
I think as creator of the item you get notifications whenever someone adds a link. There's no mechanism to tag an item in a way that leads to it not being a valid value as instance of (P31) for other items. ChristianKl (talk) 21:18, 27 June 2017 (UTC)
ChristianKl (or anyone else following this thread): as you say, it is not possible in Wikidata at the moment for items to have constraints. That being so, is it possible for me to add a property constraint to instance of (P31) such that statements giving the object of that property as open license (Q30939938) will be constrained to taking only certain items as their subject? Zazpot (talk) 15:35, 28 June 2017 (UTC)
I am trying to understand the Wikidata model of time properties. We have a lot of documentation, like Help:Wikidata_datamodel, Help:Dates or Help:Modelling/general/time, and it all seems to be outdated and possibly written before we had the current time model. All the help pages (and some phabricator pages like phabricator:T73867) talk about "before" and "after" fields used for specifying a range of time. That task is currently done with start time (P580) / end time (P582) and earliest date (P1319) / latest date (P1326) qualifiers. So is all the talk about before/after fields some kind of evolutionary dead-end, or is it some kind of parallel system? If it is an evolutionary dead-end, then we should move descriptions of it from Help namespace pages to Help_talk namespace pages. --Jarekt (talk) 14:45, 28 June 2017 (UTC)
Thanks Jura, I did not find that page. Ok, so the "before" and "after" (and "timezone") fields are stored, just not used. That is a bit confusing, but the other help pages are not flat-out wrong, as I had thought. --Jarekt (talk) 12:08, 29 June 2017 (UTC)
I think it would be great if we used the field. Many times we know that a person was born either in year 19xx or in 19xx+1. ChristianKl (talk) 12:41, 29 June 2017 (UTC)
The current method is to make the date more generic, for example specifying only the decade or century, and then use earliest date (P1319) / latest date (P1326) qualifiers for a more precise description. Use of qualifiers is more flexible, but I wish the unused before/after fields were removed to avoid confusion. --Jarekt (talk) 15:36, 29 June 2017 (UTC)
I find the user experience of that solution quite bad. It takes a lot more effort than writing 1981-1982. It's also harder to learn for new people. ChristianKl (talk) 16:36, 29 June 2017 (UTC)
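For reference, the fields being discussed are visible in the JSON serialization of any time value. A sketch of that shape, with illustrative example values (my own), showing where the stored-but-currently-unused fields sit:

```python
# Shape of a Wikibase time value as it appears in the JSON output.
# The values here are illustrative examples, not taken from a real item.
time_value = {
    "time": "+1981-00-00T00:00:00Z",  # year-precision timestamp
    "timezone": 0,                     # stored, but not interpreted by the UI
    "before": 0,                       # stored, but not interpreted by the UI
    "after": 1,                        # would mean "1981 or up to one year after"
    "precision": 9,                    # 9 = year precision
    "calendarmodel": "http://www.wikidata.org/entity/Q1985727",  # proleptic Gregorian
}

# If "after" were interpreted, this single value could express
# "born in 1981 or 1982" directly, instead of using a more generic
# date plus earliest/latest date qualifiers.
print(time_value["time"], "precision:", time_value["precision"])
```

This makes the disagreement above concrete: the data model already carries the before/after range fields, but tools and the UI currently rely on qualifiers instead.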
Taking Berthold (Q221328) as an example, the edit that created the date of birth property states that the timestamp is +1000-00-00T00:00:00Z, the calendar is Julian, and the precision is 100 years. As I read the specifications for the JSON and RDF model, this precision means that only the hundred-years digit, and more significant digits, are significant. In other words, the timestamp could be rewritten +10dd-dd-ddTdd:dd:ddZ, where "d" means "don't care". Thus, the range of years would be 1000 up to and including 1099. But many authorities consider the 10th century to comprise the years 901 up to and including 1000. Thus it is controversial for you to interpret this time value as the 10th century.
You might argue that's what the user interface does, but the user interface has a long list of flaws that have been sitting around for years without action, so the user interface cannot serve as an example of what is correct. Jc3s5h (talk) 18:12, 29 June 2017 (UTC)
Jc3s5h, I think you are referring to my edits on Help:Dates. I think you have it a bit backward. When I add a property stating that someone was born in the 19th century (meaning years "1801-1900" or "1800-1899"), the software saves the string +1900-00-00T00:00:00Z with precision 7. I expected it to save +1800-00-00T00:00:00Z, but that is not what happened. Similarly, when I typed "2nd millennium" (meaning years "1001-2000"), the software saved +0003-00-00T00:00:00Z (with precision 6), not +1000-00-00T00:00:00Z as one might expect. My guess is that the JSON documentation is only correct for precision higher than a "year". (I tried it some time ago and kept notes, but now I could not reproduce it. Either something changed or I did something wrong while testing.) If you think that is wrong, then you can file a bug report, but my edits to Help:Dates were trying to capture what is currently there, not what it would be logical to be there. --Jarekt (talk) 20:10, 29 June 2017 (UTC)
I think the documentation should focus on what is stored in the database (which we can't actually see, but we can see an approximation in the JSON or RDF output). Any discrepancy between what is typed into the user interface and what is stored should be addressed separately. Bear in mind that the user interface is only one way to enter data into the database. Also, what was entered into the user interface is fleeting; later editors can't know whether the user interface was used at all, or if so, what the editor typed. Jc3s5h (talk) 21:01, 29 June 2017 (UTC)
Jc3s5h, I did some more experiments on a test item and dates, and we are both right; I just had not done enough experiments to fully understand the pattern. Let me use the quick_statements notation of timestamp/precision. +1700-01-01T00:00:00Z/7 shows as 17th century and +2000-00-00T00:00:00Z/6 shows as second millennium. Also, +0001-00-00T00:00:00Z/6 shows as first millennium. But +1701-01-01T00:00:00Z/7 shows as 18th century and +2001-00-00T00:00:00Z/6 shows as third millennium. So you are right that the first segment of the timestamp just shows the year, and the precision shows how to round it up. --Jarekt (talk) 04:13, 30 June 2017 (UTC)
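The pattern found in these experiments amounts to rounding the year up to the enclosing century or millennium boundary. A small sketch (my own illustrative helpers, not Wikibase code) reproduces the observations from the test item:

```python
import math

def century(year):
    # +1700-.../precision 7 displays as the 17th century, while
    # +1701-.../7 displays as the 18th: the century number is the
    # year divided by 100, rounded up.
    return math.ceil(year / 100)

def millennium(year):
    # Same rule at precision 6: 2000 -> 2nd millennium, 2001 -> 3rd,
    # and year 1 -> 1st.
    return math.ceil(year / 1000)

# Reproduce the observations reported in the experiments above:
print(century(1700))     # 17
print(century(1701))     # 18
print(millennium(2000))  # 2
print(millennium(2001))  # 3
print(millennium(1))     # 1
```

Under this reading, the displayed century/millennium follows the "years 1701-1800 are the 18th century" convention, which is exactly why interpreting a digit-significance range like 1700-1799 as a named century is contentious.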
Lua code for Wikidata dates
By the way, I am working on c:Module:Wikidata date Lua code to parse Wikidata dates. Currently I can recognize:
dates using any precision and using any calendar (Julian or Gregorian)
See c:Module talk:Wikidata date/testcases for some examples. Am I missing any other ways people are using to specify dates? If so please provide me with item / property IDs. I found several cases where dates are hard to interpret. See for example:
How would I indicate that Nick Licata was a member of the Seattle City Council? Property:P463 says it's not for this purpose, and Property:P39 doesn't seem to have a provision for being a member of a legislative body, only for holding a unique office. - Jmabel (talk) 00:04, 29 June 2017 (UTC)
Between 1 July and 31 July, Wikimedia Sverige and UNESCO co-arrange the second writing challenge of the Connected Open Heritage project, the COH Challenge.
As part of the Connected Open Heritage project a large number of images under a free license have been uploaded, e.g. of world heritage sites and of important archaeological and built heritage sites in Syria, Mexico, Cyprus and Sweden (the images can be found here). The purpose of this challenge is to get as many of these images as possible used in Wikipedia articles (however, at most five images – with caption – per article). But a lot of points are also to be gained from adding those images to Wikidata items, and from adding Wikidata item numbers to the images' file pages.
If you'd like to participate, you can find the participation page here, where you also register your points. Participate in any language you’d like! The winner receives both the honor and great prizes.
Hello everyone,
a simple query for scientific articles with dates of publication but without a description works well for some languages and not for others. I need the output for "Arabic", but for some reason it is not correct and the result includes items with descriptions (try it). When I change the filtered language to "it" or "de" it works fine. What is the reason and how can I fix that? Note: it would work correctly if the date variable were taken out, but it's needed. --Sky xe (talk) 12:22, 29 June 2017 (UTC)
We don't document the structure of categories on Wikidata; there were discussions about this in the past. Every Wikipedia has its own structure of categories. Sjoerd de Bruin(talk)13:57, 29 June 2017 (UTC)
When you hit Run in background you will be asked if you want to set an edit summary. It will then display, beside the #quickstatement label, the batch number with a link to that batch where the summary is shown. See for example the latest edits from QuickStatementsBot, where I ran a batch of three test edits. The summary links to the batch number where you can find the summary. It is not possible to add a summary when running QS by hand. Q.Zandenquestions?22:46, 30 June 2017 (UTC)