Software developer on the Wikidata team at Wikimedia Germany (he/him, Berlin timezone). Private account: @LucasWerkmeister.
User Details
- User Since
- Apr 3 2017, 2:45 PM (379 w, 3 d)
- Availability
- Available
- IRC Nick
- Lucas_WMDE
- LDAP User
- Lucas Werkmeister (WMDE)
- MediaWiki User
- Lucas Werkmeister (WMDE) [ Global Accounts ]
Yesterday
That error doesn’t sound like it would be related to any particular test… I think we just need to wait for MediaWiki to be ready?
Wed, Jul 10
Is “palladium” still a thing?
I am also not at all sure right now that the test release can easily be folded in like that; we'll have to see if the service mesh is able to support >1 release being exposed like that.
My TLS-fu isn’t the strongest, but to me it looks like the certificate configured for the service (kube_env termbox staging; kubectl get configmap config-test -o json | jq -r '.data["puppetca.crt.pem"]') and the ones sent by mw-api-int-ro (openssl s_client -connect mw-api-int-ro.discovery.wmnet:4446 -showcerts < /dev/null) are unrelated…? The former is for “CN = Puppet CA: palladium.eqiad.wmnet”, whereas the latter are both for “O = "Wikimedia Foundation, Inc"” (one “OU = SRE Foundations, CN = discovery”, the other “OU = Cloud Services, CN = Wikimedia_Internal_Root_CA”). I don’t see anything linking them.
(And as seen in T368523#9969866, curl rejects the cert from mw-api-int-ro.discovery.wmnet:4446, so my feeling would be that Node 20 is right to reject it and Node 18 had a bug. But that doesn’t help us much if we want to get away from Node 18 :D)
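To illustrate why two unrelated CAs produce exactly this error, here's a minimal, self-contained sketch (all names and files are made up, nothing here is from the actual deployment): a leaf certificate signed by one CA fails verification against a different CA with OpenSSL's error 20, "unable to get local issuer certificate".

```shell
cd "$(mktemp -d)"
# A CA, and a second, completely unrelated CA
openssl req -x509 -newkey rsa:2048 -nodes -keyout ca.key -out ca.pem \
  -subj '/CN=Real CA' -days 1 2>/dev/null
openssl req -x509 -newkey rsa:2048 -nodes -keyout other.key -out other.pem \
  -subj '/CN=Other CA' -days 1 2>/dev/null
# A leaf certificate signed by the real CA
openssl req -new -newkey rsa:2048 -nodes -keyout leaf.key -out leaf.csr \
  -subj '/CN=leaf' 2>/dev/null
openssl x509 -req -in leaf.csr -CA ca.pem -CAkey ca.key -CAcreateserial \
  -out leaf.pem -days 1 2>/dev/null
# Verifying against the issuing CA succeeds…
openssl verify -CAfile ca.pem leaf.pem
# …but against the unrelated CA it fails with error 20,
# "unable to get local issuer certificate":
openssl verify -CAfile other.pem leaf.pem || true
```

This is the same check a TLS client performs, so if the configured CA and the served chain really are unrelated, rejecting the connection is the expected behavior.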
Okay, I found out how to emulate curl --connect-to.
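For reference, a hedged sketch of what `--connect-to` does: the URL keeps its original hostname (used for the Host header and, over HTTPS, for TLS SNI), but the TCP connection is opened to a different target, with no DNS lookup for the original name. The local stand-in server, `example.invalid`, and port 8765 here are placeholders I picked for the demo:

```shell
# Start a throwaway local server to stand in for the "real" backend.
python3 -m http.server 8765 --bind 127.0.0.1 >/dev/null 2>&1 &
server=$!
sleep 1
# No DNS lookup for example.invalid happens: curl connects straight to
# 127.0.0.1:8765 while still sending "Host: example.invalid".
curl -s --connect-to example.invalid:80:127.0.0.1:8765 \
  http://example.invalid/ >/dev/null && echo connected
kill "$server"
```

(`--resolve example.invalid:80:127.0.0.1` would achieve something similar by pinning the DNS answer instead of the connect target.)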
Okay, the behavior seems pretty bizarre even without NODE_EXTRA_CA_CERTS configured. With the new (node20) image, I get the “unable to get local issuer certificate” error:
Okay, I can reproduce something somewhat close to the error message, I think.
I wonder if Node 20 requires 4096-bit RSA keys?
Hm, this seems a bit suspicious… if I connect to the endpoint that the Test Wikidata Termbox is configured to connect to:
Alright, the image revert fixed the termbox on Test Wikidata. So it’s indeed an issue somewhere in the node20 version of the image, and unrelated to envoy. (But node20 is still deployed in eqiad/codfw for non-Test Wikidata and working fine there.)
Hm, I can see NODE_EXTRA_CA_CERTS in both deployments at least (termbox-production and termbox-test)… and it’s still documented in Node 20 so I doubt it went away – but maybe it works differently now.
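For the record, this is the kind of jq filter one could point at `kubectl get deployment <name> -o json` to check the variable in each deployment; the JSON below is a minimal made-up stand-in for real kubectl output, and the path value is an assumption:

```shell
# Filter: pull the NODE_EXTRA_CA_CERTS value out of a Deployment manifest.
filter='.spec.template.spec.containers[].env[]? | select(.name == "NODE_EXTRA_CA_CERTS") | .value'
# Demo on a canned, minimal manifest fragment (not real cluster output):
echo '{"spec":{"template":{"spec":{"containers":[{"env":[{"name":"NODE_EXTRA_CA_CERTS","value":"/etc/ssl/certs/puppetca.crt.pem"}]}]}}}}' \
  | jq -r "$filter"
# → /etc/ssl/certs/puppetca.crt.pem
```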
In T368523 we’re seeing an “unable to get local issuer certificate” error that may or may not be related to the new Envoy version; it’s not very urgent (only affects a test wiki) but I’d be very thankful if someone could take a look :)
as far as I can tell the updated version is working on Wikidata, but not on Test Wikidata
But the production termbox doesn’t seem to have this error, so I don’t think it’s a red herring.
Aha, “unable to get local issuer certificate”:
Hm, as far as I can tell the updated version is working on Wikidata, but not on Test Wikidata – there’s no SSR termbox there. But I don’t see any errors in logstash that would explain this.
(Almost this entire answer applies to Lexemes, Items and Properties equally, so I’ll mostly just say “Entities” to cover them all.)
Nothing left to do here at the moment; the backports will have to wait until the next l10n backport updates the translation on the release branches, and then they should be merged. Until then, let’s leave the task open.
because the translation was updated again
Tue, Jul 9
Okay, apparently those warnings come from a long-running eval.php running on mwmaint1002, which is presumably seeing an old version of CommonSettings.php. @Catrope will hopefully know what to do with that (I don’t think killing it is warranted yet).
Apparently the error happened on two lines (L3719 and L3728), and the above change fixed the latter occurrence but not the former:
This is pretty noisy in logstash, so the above fix will hopefully avoid the warnings. It might need further improvement later by someone who knows what the data looks like. (But it’s hopefully still better than reverting the whole tracking.)
Mon, Jul 8
AFAICT the relevant change is Make syntax highlighting readable in night mode (T356956), possibly with some further changes after that.
If needed, we could identify all entities having an EntitySchema property and reindex those manually, but this requires some work and instrumentation on our side.
Note that Special:NewLexeme already uses a partial Vue 3 build of Wikit, not the Vue 2 version.
Thanks, added that to the task description :)
Or is nodejs20-devel not needed anymore because nodejs20-slim already includes npm?
(FTR – if only so I remember it myself ^^ – I was motivated to work on this because a new Envoy image is being rolled out to all Kubernetes services for T368366, and so deploying the node20 update would have also deployed that update and killed two birds with one stone.)
Is there a reason why there’s no nodejs20-devel image in the docker registry yet? (We use both -slim and -devel images in Termbox, so without a node20 version of -devel it’s not clear to me how to migrate T368523.) Is it expected to be added at some point, or should we maybe stick with node18 for development purposes?
Blocked on T364779#9960934, I think.
Oops, disregard the above DNM change – I forgot to remove the Bug line there ^^
/me shakes fist at Phorge for not letting me award this task another token
Moving out of our peer review column, since it’s unclear if the issue still happens at all. (I also haven’t been able to reproduce the giant “lexicographical data” menu heading on my end, FWIW.)
T355292: Port videoscaling to kubernetes should probably be a subtask of this (or maybe a subtask of T321899)? At least I’ve been told that videoscalers are blockers for the k8s migration being considered complete, and T355292 seems to be the currently active task in that area.
I think the above change should work, but I’d love for someone from Discovery-Search to take a look and see if it makes sense. I can only partially test it locally – I can see that EntitySchema IDs start to show up in the statement_keywords of action=query&prop=cirrusbuilddoc API output, but I don’t actually have ElasticSearch installed locally, so I don’t know if the search works.
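A sketch of the check described above: extracting statement_keywords from `action=query&prop=cirrusbuilddoc` output. The JSON below is a made-up minimal example of the response shape (the page ID and the `P12861=E123` value are invented for the demo), not real API output:

```shell
# In practice the input would come from something like:
#   curl -s 'https://…/w/api.php?action=query&prop=cirrusbuilddoc&titles=Q5&format=json'
echo '{"query":{"pages":{"123":{"cirrusbuilddoc":{"statement_keywords":["P12861=E123"]}}}}}' \
  | jq -r '.query.pages[].cirrusbuilddoc.statement_keywords[]'
# → P12861=E123
```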
Ask Lucas for advice on how to set up and reproduce the errors locally
That looks like the same error as T369149 (see T369149#9949518); is it still happening since Friday?
Fri, Jul 5
The undeserializable value looks like this:
Alright, thanks! Searching for the property seems to work now \o/
The fix should be deployed now; @dcausse do we need to manually trigger a re-indexing of the affected pages (only this property, really) or is it going to happen automatically?
While testing the deployment ^, I noticed that this doesn’t actually affect entities with EntitySchema-type statements (e.g. Q5); it only affects entities with other statements (e.g. Item-type statements, though probably some other types too) that had an EntitySchema qualifier. Which in practice was probably only P12861 (until I reproduced the situation on the sandbox item to verify the fix).
Looks like there was a change in mb_strlen() at least: https://3v4l.org/c0FZ6
var_dump(mb_strlen("\xF4\x8F\xBF\xC0", 'UTF-8'));
- Output for 8.3.0 - 8.3.9: int(2)
- Output for 8.2.17 - 8.2.21: int(1)
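As an aside, for comparison (my own check, not from the task): Python's UTF-8 decoder handles the same bytes with "maximal subpart" substitution, replacing the truncated sequence F4 8F BF with one U+FFFD and the stray byte C0 with another, i.e. it also counts 2, matching the newer PHP 8.3 behavior:

```shell
# len() of the replacement-decoded string for the same invalid byte sequence.
python3 -c 'print(len(b"\xF4\x8F\xBF\xC0".decode("utf-8", "replace")))'
# → 2
```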
I think this is the same issue as T338115: Numerals for years are not converted in date statements?
Doesn’t seem to be specific to Wikidata – https://logstash.wikimedia.org/goto/2e673c3e690194a2f98280393e7975c4 shows >100k occurrences of the error, but mainly on Wikipedias. (It’s also very “spike-y”.)
(Note: The above logstash link is based on the mediawiki-errors backlog, but these messages don’t actually end up on an error channel – I removed that filter.)
Thu, Jul 4
Wed, Jul 3
As there were a lot of changes to deploy, I didn’t investigate yet, but just ran the script on mwmaint1002 instead.
Okay, I think we need two things:
Alright, if I load the CirrusSearch-related extensions and set all of the following settings, I can reproduce the issue on an item or property with EntitySchema statements:
Hm, are we just missing a search-index-data-formatter-callback data type definition?
I don’t see any P12861-related errors in Logstash that could explain this.
Sounds like our responsibility to fix, at least. Thanks for looking into it!
For future reference, the current cirrusDump contents are:
As for performance, if we backport this then I think we can look at ResourceLoader Module builds and see if there’s any visible effect from the deployment. (If this rolls out with the train, then the signal will probably be buried under noise from other changes in the train.)
Okay, I think I can reproduce the issue locally (looking whether entity-schema appears in mw.loader.using('wikibase.experts.modules').then(require => console.log(require('wikibase.experts.modules')))), and the above patch seems to fix it (no longer lags behind changes to $wgEntitySchemaEnableDatatype).
Could we force a re-index of this page?
Are you already deploying the backport now? Otherwise it shouldn’t have been merged yet IIUC.