
[Spike] Token expiration handling
Closed, Resolved · Public · Spike

Description

Background information

Follow-up from T246716#6324971

Questions we want to answer

How should we handle cleaning up bad tokens in the database that are not removed by subscribers?

How will we go about answering the questions

Review the approaches to this issue taken by pushd and other open-source push notification server projects.

Results

Projects reviewed: pushd, pushkin, uniqush, Signal push server, AirNotifier, web-push-subscriber, push.js

It turns out that there is a strong consensus on this point: they simply delete subscriptions for which a bad-subscription response is received, as soon as it is received. The only exceptions are pushkin, which marks them as unregistered and deletes them later (apparently when the number of subscriptions grows too large); Signal push server, which pushes them onto an unregistered device queue, but doesn't appear to do anything with that queue; and push.js, which has no provision for cleaning up bad subscriptions.
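The delete-on-bad-response pattern is simple enough to sketch. Here is a minimal TypeScript illustration, assuming a hypothetical in-memory subscription store and a simplified set of provider error strings (the real service would query its database and the real APNS/FCM error payloads are richer):

```typescript
// Hypothetical in-memory subscription store; a real service would back
// this with a database table keyed by provider token.
const subscriptions = new Map<string, { provider: string }>();

// Provider responses indicating the token is permanently invalid.
// (APNS uses reasons like "Unregistered"/"BadDeviceToken"; FCM reports
// "UNREGISTERED". This set is illustrative, not exhaustive.)
const BAD_TOKEN_ERRORS = new Set(['Unregistered', 'BadDeviceToken', 'UNREGISTERED']);

// Delete the subscription as soon as a bad-token response arrives,
// mirroring the consensus approach of pushd, uniqush, AirNotifier, etc.
// Returns true if a subscription was deleted.
function handleProviderResponse(token: string, error?: string): boolean {
  if (error !== undefined && BAD_TOKEN_ERRORS.has(error)) {
    return subscriptions.delete(token);
  }
  return false;
}
```

The key property is that cleanup is driven entirely by provider responses, so no separate garbage-collection pass (as in pushkin) is needed.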

The pushd documentation states that apps are expected to ping the push service each time they are launched to ensure they are subscribed, and (re-)subscribe if not. (Its README file describes this as "letting pushd know the subscriber still exists," but that doesn't quite accurately describe what the code is actually doing.) I think that's a good approach for our apps to take as well.
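The app-side half of that approach is a launch-time check. A sketch in TypeScript, assuming a hypothetical `PushClient` interface (the method names here are illustrative, not an existing API):

```typescript
// Hypothetical client-side view of the push service's subscription API.
interface PushClient {
  isSubscribed(token: string): Promise<boolean>;
  subscribe(token: string): Promise<void>;
}

// On each app launch, verify the device token is still registered with
// the push service and (re-)subscribe if it is not, per the pushd model.
async function ensureSubscribedOnLaunch(client: PushClient, token: string): Promise<void> {
  if (!(await client.isSubscribed(token))) {
    await client.subscribe(token);
  }
}
```

This makes a deleted subscription self-healing: if the service dropped the token after a bad-token response, the app quietly re-registers the next time it runs.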

The one somewhat tricky bit we have to deal with, which these existing projects don't, is that we don't have direct access to the subscriber database; instead we have to work across a service boundary and ask MediaWiki to delete subscriptions on the service's behalf. I think we can deal with this in the following way:

  • In Echo, create a new user right (something like manage-all-push-subscriptions) and a new group (e.g., push-subscription-managers) with that right
  • Update ApiEchoPushSubscriptionsDelete to allow users with the manage-all-push-subscriptions right to delete any token from the push subscriptions table
  • Choose a wiki through which the push-notifications service will communicate back with Wikimedia's MediaWiki installation (suggestion: metawiki)
  • On that wiki, create a bot account for the push-notifications service and add it to the newly created push-subscription-managers user group
  • Update the push service to be able to authenticate to MediaWiki and send push subscription delete requests
  • Update the APNS and FCM response handlers to identify bad subscriptions/tokens and send subscription delete requests to the designated wiki for those tokens
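To make the last two steps concrete, here is a sketch of how the service might build the delete request it sends to the designated wiki. The Action API module name and parameter names below are assumptions based on ApiEchoPushSubscriptionsDelete; check the extension's generated API documentation before relying on them:

```typescript
// Build the Action API POST the service would send to the designated
// wiki (e.g. metawiki) to delete bad tokens. Module and parameter names
// ('echopushsubscriptionsdelete', 'providertoken') are assumptions.
function buildDeleteRequest(
  apiUrl: string,
  csrfToken: string,
  providerTokens: string[]
): { url: string; body: URLSearchParams } {
  const body = new URLSearchParams({
    action: 'echopushsubscriptionsdelete',
    format: 'json',
    providertoken: providerTokens.join('|'), // Action API multi-value convention
    token: csrfToken,                        // CSRF token from meta=tokens
  });
  return { url: apiUrl, body };
}
```

The service would first log in as the bot account and fetch a CSRF token, then POST this body; batching multiple bad tokens into one pipe-separated request keeps the API traffic low.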

Event Timeline

Restricted Application changed the subtype of this task from "Task" to "Spike". · Jul 29 2020, 4:28 PM
Restricted Application added a subscriber: Aklapper.

Change 619570 had a related patch set uploaded (by Mholloway; owner: Michael Holloway):
[mediawiki/extensions/Echo@master] Create push subscription manager group/right to clean up dead subs

https://gerrit.wikimedia.org/r/619570

This is about 2/3 implemented (see also the service patches associated with T260247), but on further reflection I think it would be better to take an event-driven approach rather than issuing subscription-deletion HTTP requests directly from the service to the MediaWiki API. Instead, in principle, we could define a SubscriptionDeleteJob in MediaWiki and have it triggered by cpjobqueue as needed. I believe this would also remove the need for a bot account and authentication.

@Pchelolo Can changeprop react to events by scheduling a job on the job queue? Or, alternatively, would it be acceptable for a non-MediaWiki service to send jobs directly to the Kafka job queue? Also, is it acceptable to have private information (in this case, subscriber tokens issued by Apple and Google, albeit invalid ones) traverse the job queue?

(cc @MSantos, @bearND, @Jgiannelos for discussion)

Mini brain dump:

Job events produced by MediaWiki are signed with a keyed hash of the serialized event, using $wgSecretKey as the key. This signature is then validated before a job from the Kafka job queue is run. We cannot simply produce job events to the Kafka job queue from arbitrary sources and expect MediaWiki to run them.
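A minimal sketch of that keyed-hash scheme, assuming HMAC over an illustrative serialization (the actual signing lives in MediaWiki's EventBus/JobQueue code and its serialization details differ):

```typescript
import { createHmac } from 'crypto';

// Sign a serialized job event with the wiki's secret key, roughly as
// described above. The HMAC-SHA256 choice here is illustrative.
function signJobEvent(serializedEvent: string, secretKey: string): string {
  return createHmac('sha256', secretKey).update(serializedEvent).digest('hex');
}

// Validation before running a job from the Kafka queue: recompute the
// signature and compare. A producer without the wiki's secret key
// cannot forge a valid signature, which is why arbitrary external
// services cannot simply enqueue runnable jobs.
function isValidJobEvent(serializedEvent: string, signature: string, secretKey: string): boolean {
  return signJobEvent(serializedEvent, secretKey) === signature;
}
```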

I can imagine creating a MediaWiki REST endpoint like CreateSingleJob, similar to the existing RunSingleJob, for changeprop to hit in order to react to events by having MediaWiki create and submit a job.

I don't know whether we're better off pursuing this approach or sticking to the original plan, which is more naive but would probably perform satisfactorily.

Submitting a job from a service doesn't seem like a good idea, specifically for the reasons you describe.

Creating an API in MW to submit a job would force us into elaborate validation of that job, which is no different from a specific deletion API.

JobQueue is an MW-internal mechanism; we shouldn't expose it externally.

Thanks, @Pchelolo. Sounds like we might be better off just sticking to plan, then.

Change 619570 merged by jenkins-bot:
[mediawiki/extensions/Echo@master] Create push subscription manager group/right to clean up dead subs

https://gerrit.wikimedia.org/r/619570