Bayes Opens an Envelope

Introduction

Access to high-quality primary data is one of the key challenges for many sustainability initiatives, such as ESG reporting, carbon accounting, or product passports. One way to incentivise such data access could be to implement a simple commit-and-open protocol as part of a data-sharing solution. In the following, we will explain the underlying mechanism, its components, and why it can lead to more data access.

This article builds upon the seminal work of Emir Kamenica and Matthew Gentzkow introducing Bayesian Persuasion. For an introduction, I recommend their original paper or this summary on information design.

A simple example: Loan Granting

Let us assume you want to get a loan. In order to be eligible, you need to have a certain amount of money in your bank account. To make the loan decision, the lender needs to see your bank account balance. We assume that you are very sensitive to data privacy; i.e., giving the lender access to your bank account balance implies significant costs for you.

Let’s assume that the overall likelihood of receiving a loan is 60%, i.e., without knowing anything else about you, the chance that your account balance is high enough is 60%. However, the lender wants to be 80% certain that your account balance is high enough. Both numbers are common knowledge to you and the lender. So without additional information, that is, without you revealing your bank account balance, the lender would reject your loan application, as the desired certainty of 80% is higher than the a-priori likelihood of 60%.

Now, what is your optimal strategy? When you are sure that your bank account balance is high enough (and the benefits of the loan are higher than your costs for sharing this sensitive data), you should reveal your bank account information. The opposite is true when you are certain that your account balance is not high enough.

But what should you do if you have never received a loan before and therefore don’t know whether your account balance is high enough? In game theory, this is a case of not knowing your own type. As always, it depends (on your risk preferences). However, the lender could offer you an interesting incentive to share your data by means of a cryptographic device. This device is granted access to your bank account and, without ever revealing your account balance, does the following:

If your account balance is high enough, the cryptographic device sends a recommendation advising the lender to grant you the loan.
If your account balance is not high enough, with a probability of 38%, the device sends a recommendation advising the lender to grant you the loan and with a probability of 62%, the device sends a recommendation advising not to grant you the loan.

Why should both parties participate in this process? Let us first consider the current process: lenders have implemented formal and informal verification institutions and processes. Typically, verification comes at a substantial cost and may not provide accurate or timely information (especially compared to the information derived from your actual bank account).

With the process described above, you will get the loan with a probability of roughly 75%:

= 60% probability because your account balance is high enough

+ 40% × 38% ≈ 15% probability even though your account balance is not high enough.

The lender knows that the 75% of positive ratings consist of 60 percentage points that are correct and 15 percentage points that are false positives. A positive rating is therefore correct in 60/75 = 80% of cases, which is exactly the certainty the lender asked for. The false positives can be considered the implicit cost of getting a rating based on primary data (your bank account balance), a cost the lender might carry quite willingly, especially when compared to the high verification costs of the current process. And for you, the borrower, the likelihood of receiving the loan has increased significantly, which might be the necessary incentive to securely share your data via the cryptographic device.
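The arithmetic behind these figures can be checked in a few lines. The following is a minimal Python sketch, not part of the mechanism itself; all variable names are illustrative assumptions, and it uses a lie-probability of 37.5%, which is the unrounded value behind the 38% quoted above.

```python
# Illustrative check of the loan example; all names are hypothetical.
prior_good = 0.60        # a-priori probability that the balance is high enough
lie_probability = 0.375  # unrounded value of the 38% used in the text (assumption)

# GOOD types are always recommended by the device.
true_positives = prior_good
# BAD types are recommended only with the lie-probability.
false_positives = (1 - prior_good) * lie_probability
approval_probability = true_positives + false_positives

# Share of positive recommendations that are actually correct.
precision = true_positives / approval_probability

print(f"approval probability: {approval_probability:.3f}")
print(f"false positives:      {false_positives:.3f}")
print(f"precision:            {precision:.3f}")
```

The precision printed at the end is the lender's certainty after a positive recommendation, which should match the 80% threshold from the example.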

In the following, we will explain the different components of the mechanism and how to derive the numbers used in the example.

Commit-and-Open

Put simply, commit-and-open means that a sender commits to a communication strategy before learning their actual type. In the example above, this means that you (the sender) allow the cryptographic device to communicate the loan recommendation (depending on your type, i.e. your bank account balance) to the lender (the receiver) without yet knowing whether you are eligible to receive the loan or not. More generally, you program the device in the following way:

If my type is “good,” tell the receiver the truth.
If my type is “bad,” tell the receiver that I am “good” (i.e., lie) with a probability of p (let us call p the lie-probability) and that I am “bad” (i.e., tell the truth) with a probability of 1 - p.
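The two rules above can be sketched as a small function. This is only an illustration of the messaging strategy under stated assumptions: the function name `device_message`, the string labels, and the use of Python's `random` module in place of a real cryptographic device are all hypothetical.

```python
import random


def device_message(sender_type: str, lie_probability: float, rng: random.Random) -> str:
    """Return the message ("good" or "bad") the device sends for a given type."""
    if sender_type not in ("GOOD", "BAD"):
        raise ValueError("sender_type must be 'GOOD' or 'BAD'")
    if not 0.0 <= lie_probability <= 1.0:
        raise ValueError("lie_probability must lie in [0, 1]")
    if sender_type == "GOOD":
        return "good"  # a GOOD type always reports the truth
    # A BAD type reports "good" (a lie) with probability p, else "bad".
    return "good" if rng.random() < lie_probability else "bad"


# Usage: how often does a BAD type end up with a "good" message?
rng = random.Random(42)  # fixed seed so the run is repeatable
messages = [device_message("BAD", 0.375, rng) for _ in range(10_000)]
share_good = messages.count("good") / len(messages)
print(f"share of 'good' messages for BAD types: {share_good:.3f}")
```

The key point is that the strategy is fixed before the type is known; the random draw only happens inside the device.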

Combined with some basic Bayesian math, we can now derive which value should be assigned to the lie-probability in order to persuade the lender to grant you the loan after receiving a “good” message.

Bayesian Persuasion

Let’s take the perspective of the lender: the lender should grant you the loan if the probability that your type is “good” reaches their personal threshold. There are cases where it is fine if the “good” message is true with 50% certainty, and other cases where a lender (the receiver) wants 99% certainty. In our example above, the threshold is 80%.

If you prefer to skip the math part, here’s the TL;DR: Many economic theories assume that, after receiving your message, the receiver updates their beliefs according to Bayes' theorem. This means you can calculate the lie-probability p from the a-priori probability of the types and the threshold (i.e., the preferences) of the receiver, such that the receiver is still persuaded that you are “good”.

We will now introduce some very basic notation and a short summary of Bayes' theorem. There are two types (GOOD, BAD) and two possible messages (good, bad). The probability that the sender is GOOD is denoted by P(GOOD). P(GOOD|good) denotes the conditional probability (from the receiver’s perspective) that the actual type is GOOD after the receiver has received the message good. The famous Bayes' theorem

$$P(\text{GOOD} \mid \text{good}) = \frac{P(\text{good} \mid \text{GOOD}) \, P(\text{GOOD})}{P(\text{good})}$$

enables us to calculate the optimal messaging strategy for the sender. We have P(good) = P(GOOD)P(good|GOOD) + P(BAD)P(good|BAD), where the second summand describes the probability that the message is good even though the actual type is BAD. Please note that P(good|GOOD) = 1, as the message will always be good if the type is GOOD, and that P(BAD) = 1 - P(GOOD). Therefore, we have

$$P(\text{GOOD} \mid \text{good}) = \frac{P(\text{GOOD})}{P(\text{GOOD}) + (1 - P(\text{GOOD})) \, P(\text{good} \mid \text{BAD})}$$

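The receiver will believe the message if P(GOOD|good) reaches their personal threshold (e.g. 50%, 99%, or 80% as in our example above), denoted by T, i.e., P(GOOD|good) ≥ T. As is common in the Bayesian persuasion literature, we assume that the receiver follows the message when exactly at the threshold. Now we can solve for the lie-probability p = P(good|BAD) that we are looking for:

$$p = P(\text{good} \mid \text{BAD}) \le \frac{P(\text{GOOD}) \, (1 - T)}{T \, (1 - P(\text{GOOD}))}$$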

This means the lie-probability P(good|BAD) depends on the individual preference of the receiver (the threshold T) as well as on the a-priori probability of being a GOOD type, i.e. P(GOOD).
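To make this dependence concrete, here is a minimal Python sketch of the bound p ≤ P(GOOD)(1 - T) / (T(1 - P(GOOD))), which follows from requiring P(GOOD|good) ≥ T. The function names are hypothetical, the cap at 1 is an added assumption for the case where the threshold is already met without any message, and the receiver is assumed to follow the message when exactly at the threshold.

```python
def max_lie_probability(prior_good: float, threshold: float) -> float:
    """Largest p = P(good|BAD) for which P(GOOD|good) still reaches the threshold."""
    if not 0.0 < prior_good < 1.0:
        raise ValueError("prior_good must lie strictly between 0 and 1")
    if not 0.0 < threshold <= 1.0:
        raise ValueError("threshold must lie in (0, 1]")
    bound = prior_good * (1 - threshold) / (threshold * (1 - prior_good))
    return min(1.0, bound)  # a probability cannot exceed 1 (assumed cap)


def posterior_good(prior_good: float, lie_probability: float) -> float:
    """P(GOOD|good) by Bayes' theorem, using P(good|GOOD) = 1."""
    return prior_good / (prior_good + (1 - prior_good) * lie_probability)


# Usage with the numbers of the loan example: P(GOOD) = 60%, T = 80%.
p = max_lie_probability(prior_good=0.6, threshold=0.8)
print(f"lie-probability: {p:.3f}")
print(f"posterior after a good message: {posterior_good(0.6, p):.3f}")
```

A stricter receiver (higher T) or a weaker prior (lower P(GOOD)) both shrink the lie-probability the sender can afford; at T = 1 it drops to zero and the device must always tell the truth.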

To sum it up, the process of the commit-and-open protocol based on the mechanism above looks like this:

The receiver communicates the probability threshold T to believe a good message.
The sender calculates the highest lie-probability P(good|BAD) and enters it into the commit-and-open scheme.
The sender learns her/his actual type.
The commit-and-open protocol automatically sends a message to the receiver according to protocol and the lie-probability.
The receiver follows the message.
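The five steps above can be sketched as a small simulation over many hypothetical applicants. This is an illustration under stated assumptions, not an implementation of a real commit-and-open scheme: the function `simulate`, its parameters, and the pseudo-random draws standing in for the cryptographic device are all made up for this example.

```python
import random


def simulate(prior_good: float, threshold: float, n_applicants: int, seed: int = 0):
    """Run the protocol n_applicants times; return (approval rate, precision)."""
    rng = random.Random(seed)
    # Steps 1-2: the receiver announces T, the sender commits to the highest
    # lie-probability that still satisfies P(GOOD|good) >= T (capped at 1).
    p = min(1.0, prior_good * (1 - threshold) / (threshold * (1 - prior_good)))
    approved = 0
    approved_and_good = 0
    for _ in range(n_applicants):
        # Step 3: the sender's actual type is realised.
        is_good = rng.random() < prior_good
        # Step 4: the device sends "good" for GOOD types, and for BAD types
        # with probability p.
        message_is_good = is_good or rng.random() < p
        # Step 5: the receiver follows the message.
        if message_is_good:
            approved += 1
            approved_and_good += is_good
    approval_rate = approved / n_applicants
    precision = approved_and_good / approved if approved else float("nan")
    return approval_rate, precision


approval_rate, precision = simulate(prior_good=0.6, threshold=0.8, n_applicants=100_000)
print(f"approval rate: {approval_rate:.3f}")
print(f"share of approvals that are GOOD: {precision:.3f}")
```

With the numbers of the loan example, the approval rate should come out near 75% and the share of correct approvals near the 80% threshold, up to sampling noise.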

A simple example: Loan Granting (ctd.)

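In the example above, the probability threshold is T = 80% and the a-priori likelihood of being a good type is P(GOOD) = 60%. Inserting these values into the bound on the lie-probability, we get

$$p = P(\text{good} \mid \text{BAD}) = \frac{P(\text{GOOD}) \, (1 - T)}{T \, (1 - P(\text{GOOD}))} = \frac{0.6 \times 0.2}{0.8 \times 0.4} = 0.375 \approx 38\%$$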

leading us to the numbers shown above:

If your bank account balance is high enough, you get the loan.
If your bank account balance is not high enough, you get the loan with a probability of 38%, and you do not receive the loan with a probability of 62%.

The overall probability for you getting the loan is 75% (vs. the a-priori probability of 60%).

Conclusion

The recent scandals of Wirecard (financial information fraud) and Volkswagen (pollution information fraud) may indicate that the current verification mechanisms are not always effective. As in our loan granting example above, the current process leaves the lender uncertain whether the verification provided is correct or not. Access to primary data (i.e., the bank account balance) allows for much better decision-making by the lender. With the commit-and-open process, the lender can be certain of a fixed share of false positives (15% of all applicants in our example above), making it a manageable and, in particular, insurable risk. The borrower increases her/his probability of getting the loan, which might be a meaningful incentive to actually provide data access, i.e. to reveal the bank account balance to a cryptographic device.

Aurel Stenzel

Solving Data Sharing Dilemmas
