User Details
- User Since
- Jan 6 2022, 7:27 PM (132 w, 3 d)
- Availability
- Available
- LDAP User
- Marco Fossati
- MediaWiki User
- MFossati (WMF)
Thu, Jul 18
Unblocked: directly load the logo detection model within the same code.
Wed, Jul 17
Tue, Jul 16
Blocked by T364551#9977031.
Mon, Jul 15
Thanks @kevinbazira for the prompt action. @klausman @isarantopoulos, as we're now entering hypothesis work under time constraints, could you please give us an estimate of when you could tackle this request?
The LiftWing endpoint being accessible is a hard requirement for T368624: [XL] Post-upload job to detect logos. CC @AUgolnikova-WMF .
Fri, Jul 12
@matthiasmullie wrote:
Hi @kevinbazira; we finally have an API in production that is supposed to send data to the logo detection service at https://inference-staging.svc.codfw.wmnet:30443/v1/models/logo-detection:predict
It doesn’t seem to fully work, though - it looks like some servers are able to access that URI, while others are not.
E.g.:
curl -H 'X-Wikimedia-Debug: backend=mwdebug1001.eqiad.wmnet' https://commons.wikimedia.org/w/api.php\?action\=mediadetection\&format\=json\&formatversion\=2\&filekey\=1b2a8gxjr6m0.xox7qj.6750701.jpg
{"predictions":[{"filename":"1b2a8gxjr6m0.xox7qj.6750701.jpg","target":"logo","prediction":0.0035,"out_of_domain":0.9965}]}
but:
curl -H 'X-Wikimedia-Debug: backend=k8s-mwdebug' https://commons.wikimedia.org/w/api.php\?action\=mediadetection\&format\=json\&formatversion\=2\&filekey\=1b2a8gxjr6m0.xox7qj.6750701.jpg
{"error":{"code":"http-timed-out","info":"HTTP request timed out.","docref":"See https://commons.wikimedia.org/w/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at <https://lists.wikimedia.org/postorius/lists/mediawiki-api-announce.lists.wikimedia.org/> for notice of API deprecations and breaking changes."},"servedby":"mw-debug.eqiad.pinkunicorn-b74f6b749-d6pc6"}
We suspect that this internal endpoint is not reachable from the mw k8s nodes - do you know how to make it accessible to them?
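To compare backends quickly, the two response shapes quoted above can be told apart with a small helper. This is only a sketch: the function name is made up for illustration, and it assumes the response bodies keep the `predictions` / `error` / `servedby` keys seen in the two curl outputs.

```python
import json


def summarize_mediadetection_response(body: str) -> str:
    """Return a one-line summary of a mediadetection API response body.

    Hypothetical helper: it only relies on the keys visible in the two
    responses quoted in this thread.
    """
    data = json.loads(body)
    if "error" in data:
        # Error responses carry an error code and the host that served them.
        code = data["error"].get("code", "unknown")
        served_by = data.get("servedby", "unknown host")
        return f"error {code} (served by {served_by})"
    # Successful responses carry a list of per-file predictions.
    parts = [
        f"{p['filename']}: {p['target']}={p['prediction']}"
        for p in data.get("predictions", [])
    ]
    return "ok " + "; ".join(parts)


# Trimmed copies of the two bodies quoted above.
ok_body = (
    '{"predictions":[{"filename":"1b2a8gxjr6m0.xox7qj.6750701.jpg",'
    '"target":"logo","prediction":0.0035,"out_of_domain":0.9965}]}'
)
err_body = (
    '{"error":{"code":"http-timed-out","info":"HTTP request timed out."},'
    '"servedby":"mw-debug.eqiad.pinkunicorn-b74f6b749-d6pc6"}'
)

print(summarize_mediadetection_response(ok_body))
print(summarize_mediadetection_response(err_body))
```

Running it against the output of each debug backend makes it easy to see which hosts time out.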
Thu, Jul 11
Also done with https://gitlab.wikimedia.org/repos/structured-data/image-suggestions/-/merge_requests/39 and https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/583.
@matthiasmullie the floor is back to you.
Wed, Jul 10
Tue, Jul 9
- change section topics DAG's default data quality scripts output
A note that we should tackle this.
Mon, Jul 8
Actually partially addressed, see T369053#9962162.
BTW @matthiasmullie I've just downloaded the patch, and this message doesn't show up anymore.
@Sneha I think this will be addressed by https://gerrit.wikimedia.org/r/c/mediawiki/extensions/UploadWizard/+/1051758, is that right @matthiasmullie ?
Pasting the conclusion here for convenience.
Other edit tags
PAWS
No deletions due to logo except a negligible 0.002% in Feb 2024.
Fri, Jul 5
Results
Upload Wizard
year | month | total_uploads | with_logo_dr | deleted_without_dr | total
2024 | 06 | 259 | | |
2024 | 05 | 393670 | 0.013209032946376407 | 0.037848959788655476 | 0.05105799273503188 |
2024 | 04 | 399437 | 0.03655144616047086 | 0.04381166491837261 | 0.08036311107884347 |
2024 | 03 | 406777 | 0.08235470540369784 | 0.062196240200404644 | 0.1445509456041025 |
2024 | 02 | 353249 | 0.05152173113016597 | 0.08067963391262253 | 0.1322013650427885 |
2024 | 01 | 352467 | 0.12398323814711731 | 0.06894262441590276 | 0.19292586256302008 |
2023 | 12 | 310798 | 0.05984594495460074 | 0.07882933609611387 | 0.1386752810507146 |
2023 | 11 | 337588 | 0.052134554545777693 | 0.05361564984537365 | 0.10575020439115135 |
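A quick sanity check on the table: the `total` column appears to be the sum of `with_logo_dr` and `deleted_without_dr`. The sketch below verifies that for the complete rows; the variable and function names are mine, and the June 2024 row is left out because its values are missing.

```python
import math

# (year-month, with_logo_dr, deleted_without_dr, total), copied from the table.
rows = [
    ("2024-05", 0.013209032946376407, 0.037848959788655476, 0.05105799273503188),
    ("2024-04", 0.03655144616047086, 0.04381166491837261, 0.08036311107884347),
    ("2024-03", 0.08235470540369784, 0.062196240200404644, 0.1445509456041025),
    ("2024-02", 0.05152173113016597, 0.08067963391262253, 0.1322013650427885),
    ("2024-01", 0.12398323814711731, 0.06894262441590276, 0.19292586256302008),
    ("2023-12", 0.05984594495460074, 0.07882933609611387, 0.1386752810507146),
    ("2023-11", 0.052134554545777693, 0.05361564984537365, 0.10575020439115135),
]


def total_matches(with_logo_dr: float, deleted_without_dr: float, total: float) -> bool:
    """Check that total equals the sum of the two columns, within float tolerance."""
    return math.isclose(with_logo_dr + deleted_without_dr, total, rel_tol=1e-9)


for month, logo, no_dr, total in rows:
    print(month, total_matches(logo, no_dr, total))
```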
Wed, Jul 3
References:
Tue, Jul 2
Mon, Jul 1
Wed, Jun 26
https://gitlab.wikimedia.org/repos/structured-data/section-topics/-/merge_requests/29 reviewed, looks great to me.
Mon, Jun 24
Jun 21 2024
Jun 19 2024
Jun 6 2024
Jun 4 2024
Jun 3 2024
isu = spark.read.table('analytics_platform_eng.image_suggestions_suggestions')
alis = isu.where('section_index is null')
slis = isu.where('section_index is not null')
May 31 2024
@Etonkovidova @Sneha FYI as of now the patch is reverted, so we won't see the change on beta until we re-merge it.
@Etonkovidova @Sneha, I haven't added that horizontal line because another one would show up in case of multiple uploads, so I've left it out.
May 30 2024
May 29 2024
Hey @KStoller-WMF , chiming in while @AUgolnikova-WMF is out of office: yes, I'll pick up this ticket next week. Stay tuned!
May 28 2024
May 27 2024
May 22 2024
May 15 2024
May 14 2024
See T364551: [SPIKE] Send an image thumbnail to the logo detection service within Upload Wizard
I'd suggest we proceed with a base64 encoded image for now.
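For reference, a minimal sketch of what a base64 payload could look like. The JSON field names (`instances`, `filename`, `image`, `target`) and the function name are assumptions made for illustration, not the confirmed request schema of the logo detection service.

```python
import base64
import json


def build_logo_detection_payload(image_bytes: bytes, filename: str) -> str:
    """Wrap raw image bytes in a JSON body using base64 text encoding.

    NOTE: the field names below are hypothetical; check the service's
    actual schema before relying on them.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    body = {
        "instances": [
            {"filename": filename, "image": encoded, "target": "logo"}
        ]
    }
    return json.dumps(body)


# Usage with a few placeholder bytes standing in for a real thumbnail.
payload = build_logo_detection_payload(b"\x89PNG\r\n", "example.png")
print(payload)
```

Base64 turns every 3 bytes into 4 characters, so the body is roughly a third larger than the binary image. That overhead is the trade-off against sending raw binary.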
With binary being the preferred format, right?
May 13 2024
I think that the logo detection service can be exposed through an internal endpoint, so it will be inside WMF’s infrastructure.
Moreover, when an image is sent to the upload stash, there’s a set of already implemented checks including existing duplicates and previously deleted duplicates.
May 10 2024
I agree and have dug deeper into the current request made to the Upload API: the CSRF token may be what we're looking for. See upload_file_in_chunks in the example request code. I can confirm that the Upload Wizard sends a token parameter in the request.
Chiming in: this will be done in T361061: [M] Update the 'other information' field in upload wizard.
May 9 2024
I've opened T364551: [SPIKE] Send an image thumbnail to the logo detection service within Upload Wizard to investigate the feasibility of this solution.
@isarantopoulos @kevinbazira, I think I found how to get a thumbnail from a stashed image. There you go: https://commons.wikimedia.org/wiki/Special:UploadStash/thumb/1awuam969hko.2tkfbz.10893556.png/224px-1awuam969hko.2tkfbz.10893556.png, where 1awuam969hko.2tkfbz.10893556.png is the stash file key. The 224px- prefix sets the thumbnail width in pixels.
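The URL pattern can be sketched as a small builder. The function and constant names are hypothetical; the pattern itself is taken from the example URL above, and the 224 default simply mirrors that example.

```python
from urllib.parse import quote

# Base path taken from the example URL in this comment.
STASH_THUMB_BASE = "https://commons.wikimedia.org/wiki/Special:UploadStash/thumb"


def stash_thumb_url(file_key: str, width: int = 224) -> str:
    """Build the thumbnail URL for a stashed file key at the given pixel width."""
    if width < 1:
        raise ValueError("width must be a positive number of pixels")
    # Percent-encode the key so unusual characters cannot break the path.
    key = quote(file_key, safe="")
    return f"{STASH_THUMB_BASE}/{key}/{width}px-{key}"


print(stash_thumb_url("1awuam969hko.2tkfbz.10893556.png"))
```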
Of course, I feel there's a caveat, as it seems that the thumbnail is generated on the fly at request time. Still not optimal, but sounds like a workable solution.
I can imagine we can tackle that from within the Upload Wizard with some JavaScript library. I can create a ticket to look into that if you think this would be the best solution.
Thinking out loud: what about sending multiple requests if the limit is reached? I speculate that 50 uploads are an edge case: if this happens, we could dispatch different requests.
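A minimal sketch of that idea: split the file keys into batches no larger than the limit and send one request per batch. The default of 50 is taken from the number mentioned above and is an assumption about the actual limit; the helper name is illustrative.

```python
from typing import Iterator, List, Sequence


def chunked(items: Sequence[str], limit: int = 50) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `limit` items."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    for start in range(0, len(items), limit):
        yield list(items[start:start + limit])


# Usage: 120 placeholder file keys end up in three batches.
file_keys = [f"key-{i}" for i in range(120)]
for batch in chunked(file_keys):
    # One request per batch would be dispatched here.
    print(len(batch))
```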
May 8 2024
Hmm, I've just given it a try and I think it won't work for stashed images, which is a hard requirement for us.
@isarantopoulos , totally agree, makes a lot of sense.