Topic on User talk:Magnus Manske

GND id normalization 20231013 authorities-gnd_umlenk_ohneTu.jsonld.gz

Asterix2023 (talk | contribs)

https://data.dnb.de/opendata/ authorities-gnd_umlenk_ohneTu_20231013.jsonld.gz

2023-11-01 11:08 6.7M 530418 records (Datensaetze) / GND, redirect records (Umlenksaetze) without Tu, format RDF (JSON-LD) (as of 13.10.2023 12:20 UTC)

If the file contains IDs as redirect sources that Wikidata (WD) still has as regular values, and the corresponding redirect target ID is not in WD, then perhaps the target IDs should be added as new values, or they should replace the old ones.
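A minimal sketch of the rule proposed above, assuming the redirect file has already been reduced to a mapping from redirect-source GND ID to redirect-target GND ID, and that the GND IDs currently used in Wikidata are available as a set. The function name `propose_gnd_updates` and both input shapes are hypothetical; the sketch only lists candidates and makes no edits.

```python
from typing import Dict, List, Set, Tuple


def propose_gnd_updates(
    redirects: Dict[str, str], wd_gnd_ids: Set[str]
) -> List[Tuple[str, str]]:
    """Return (old_id, new_id) pairs where Wikidata still uses a redirect
    source ID but does not yet have the redirect target ID.

    Hypothetical helper: `redirects` maps redirect-source GND IDs to
    redirect-target GND IDs; `wd_gnd_ids` holds GND IDs used in Wikidata.
    """
    proposals: List[Tuple[str, str]] = []
    for old_id, new_id in sorted(redirects.items()):
        # Only outdated IDs that Wikidata actually uses are relevant,
        # and only if the target ID is not already present.
        if old_id in wd_gnd_ids and new_id not in wd_gnd_ids:
            # Whether new_id is added next to old_id or replaces it
            # is left open here, as in the discussion.
            proposals.append((old_id, new_id))
    return proposals
```

With made-up IDs, `propose_gnd_updates({"a": "x", "b": "y"}, {"a", "b", "y"})` yields only `("a", "x")`, because the target of `"b"` is already in the set.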

Probably the majority are for humans. There could be more than one source ID per target. Does "Datensatz" mean a single redirect or a group of IDs? How many targets does the file contain, how many of them are humans, and how many of those humans are in WD?
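Because several source IDs may redirect to the same target, counting targets means grouping by target first. A minimal sketch, assuming the file has been reduced to hypothetical `(source_id, target_id)` pairs (the JSON-LD parsing step is not shown and its structure is not assumed here):

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_sources_by_target(
    pairs: Iterable[Tuple[str, str]]
) -> Dict[str, List[str]]:
    """Group redirect-source IDs under their redirect-target ID.

    `pairs` is a hypothetical iterable of (source_id, target_id) tuples.
    len() of the result is the number of distinct targets.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for source_id, target_id in pairs:
        groups[target_id].append(source_id)
    return dict(groups)
```

Targets with more than one source are then simply the groups whose list has more than one entry.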

Magnus Manske (talk | contribs)

Interesting. I retrieved all GND IDs used on items via SPARQL, then matched them against the "old" (redirect source) GND IDs from the DNB file. I didn't find a single match. Maybe I'm doing something wrong, or maybe some bot is already doing the cleanup?
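The thread does not show the actual query, so the following is only a sketch under assumptions: it uses P227 (the Wikidata "GND ID" property) and reads the standard SPARQL 1.1 JSON results format that the Wikidata Query Service returns. Note that `wdt:` yields only best-rank values, so deprecated GND IDs would be missed, and a single query over ~1.8 million values may run into service limits. `GND_QUERY` and `gnd_ids_from_sparql_json` are hypothetical names.

```python
from typing import Any, Dict, Set

# P227 = GND ID. wdt: returns best-rank statements only.
GND_QUERY = """
SELECT DISTINCT ?gnd WHERE {
  ?item wdt:P227 ?gnd .
}
"""


def gnd_ids_from_sparql_json(result: Dict[str, Any]) -> Set[str]:
    """Collect ?gnd values from a SPARQL 1.1 JSON results document.

    Values are stripped of surrounding whitespace so that they compare
    cleanly against IDs read from the DNB file.
    """
    return {
        binding["gnd"]["value"].strip()
        for binding in result["results"]["bindings"]
        if "gnd" in binding
    }
```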

Magnus Manske (talk | contribs)

It was a silly space-instead-of-tab separator issue. After fixing it, I found ~600 GND IDs to be replaced. The replacement is running now.
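The exact file and code involved are not shown, so this sketch assumes a hypothetical two-column text file of `old_id<separator>new_id` lines. Splitting with `"a\tb".split(" ")` leaves the whole line as one field, which silently produces zero matches; `str.split()` with no argument splits on any run of whitespace and so accepts tabs and spaces alike. GND IDs contain no whitespace, so this is safe for them.

```python
from typing import Dict, Iterable


def parse_redirect_lines(lines: Iterable[str]) -> Dict[str, str]:
    """Parse hypothetical 'old_id<sep>new_id' lines into a dict.

    str.split() without arguments splits on runs of any whitespace,
    so tab- and space-separated lines parse identically. Blank lines
    and lines without exactly two fields are skipped.
    """
    redirects: Dict[str, str] = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue
        old_id, new_id = fields
        redirects[old_id] = new_id
    return redirects
```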

Asterix2023 (talk | contribs)

Can you say how many targets are in the DNB file and how many of these are in WD? Perhaps all humans from that DNB file should be in WD? They could be of higher importance (each was created at least twice) and of higher quality (curated during the merge) than many other humans in the GND.

Magnus Manske (talk | contribs)

I have a total of 1,788,762 unique GND IDs in Wikidata, and 438,641 unique (target) IDs from the GND file you gave me. The overlap is 188,987, or 43% of the GND file. I will try to make a Mix'n'match catalog for the remaining 249,654. I should probably download the people-only data from GND and create the catalog locally, rather than query GND a quarter of a million times...
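The counts above follow from plain set operations on the two ID sets, and they are internally consistent: 438,641 - 188,987 = 249,654 targets not yet in Wikidata, and 188,987 / 438,641 is about 43.1%. A minimal sketch of that computation, with a hypothetical function name:

```python
from typing import Set, Tuple


def overlap_stats(wd_ids: Set[str], target_ids: Set[str]) -> Tuple[int, int, float]:
    """Return (overlap, remaining, overlap as % of target_ids).

    `overlap` counts targets already used in Wikidata; `remaining`
    counts targets not yet in Wikidata (Mix'n'match candidates).
    """
    overlap = len(wd_ids & target_ids)
    remaining = len(target_ids - wd_ids)
    share = 100.0 * overlap / len(target_ids) if target_ids else 0.0
    return overlap, remaining, share
```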
