-
Stability and Multigroup Fairness in Ranking with Uncertain Predictions
Authors:
Siddartha Devic,
Aleksandra Korolova,
David Kempe,
Vatsal Sharan
Abstract:
Rankings are ubiquitous across many applications, from search engines to hiring committees. In practice, many rankings are derived from the output of predictors. However, when predictors trained for classification tasks have intrinsic uncertainty, it is not obvious how this uncertainty should be represented in the derived rankings. Our work considers ranking functions: maps from individual predictions for a classification task to distributions over rankings. We focus on two aspects of ranking functions: stability to perturbations in predictions and fairness towards both individuals and subgroups. Not only is stability an important requirement for its own sake, but -- as we show -- it composes harmoniously with individual fairness in the sense of Dwork et al. (2012). While deterministic ranking functions cannot be stable aside from trivial scenarios, we show that the recently proposed uncertainty aware (UA) ranking functions of Singh et al. (2021) are stable. Our main result is that UA rankings also achieve multigroup fairness through successful composition with multiaccurate or multicalibrated predictors. Our work demonstrates that UA rankings naturally interpolate between group and individual level fairness guarantees, while simultaneously satisfying stability guarantees important whenever machine-learned predictions are used.
Submitted 14 February, 2024;
originally announced February 2024.
-
Proportional Representation in Metric Spaces and Low-Distortion Committee Selection
Authors:
Yusuf Hakan Kalayci,
David Kempe,
Vikram Kher
Abstract:
We introduce a novel definition for a small set R of k points being "representative" of a larger set in a metric space. Given a set V (e.g., documents or voters) to represent, and a set C of possible representatives, our criterion requires that for any subset S comprising a theta fraction of V, the average distance of S to their best theta*k points in R should not be more than a factor gamma compared to their average distance to the best theta*k points among all of C. This definition is a strengthening of proportional fairness and core fairness, but - different from those notions - requires that large cohesive clusters be represented proportionally to their size.
Since there are instances for which - unless gamma is polynomially large - no solutions exist, we study this notion in a resource augmentation framework, implicitly stating the constraints for a set R of size k as though its size were only k/alpha, for alpha > 1. Furthermore, motivated by the application to elections, we mostly focus on the "ordinal" model, where the algorithm does not learn the actual distances; instead, it learns only, for each point v in V and each pair of candidates c, c', which of c, c' is closer to v. Our main result is that the Expanding Approvals Rule (EAR) of Aziz and Lee is (alpha, gamma)-representative with gamma <= 1 + 6.71 * (alpha)/(alpha-1).
Our results lead to three notable byproducts. First, we show that the EAR achieves constant proportional fairness in the ordinal model, giving the first positive result on metric proportional fairness with ordinal information. Second, we show that for the core fairness objective, the EAR achieves the same asymptotic tradeoff between resource augmentation and approximation as the recent results of Li et al., which used full knowledge of the metric. Finally, our results imply a very simple single-winner voting rule with metric distortion at most 44.
Submitted 23 January, 2024; v1 submitted 16 December, 2023;
originally announced December 2023.
-
Generalized Veto Core and a Practical Voting Rule with Optimal Metric Distortion
Authors:
Fatih Erdem Kizilkaya,
David Kempe
Abstract:
We revisit the recent breakthrough result of Gkatzelis et al. on (single-winner) metric voting, which showed that the optimal distortion of 3 can be achieved by a mechanism called Plurality Matching. The rule picks an arbitrary candidate for whom a certain candidate-specific bipartite graph contains a perfect matching, and thus, it is not neutral (i.e., symmetric with respect to candidates). Subsequently, a much simpler rule called Plurality Veto was shown to achieve distortion 3 as well. This rule only constructs such a matching implicitly, but the winner depends on the order in which voters are processed, and thus, it is not anonymous (i.e., symmetric with respect to voters).
We provide an intuitive interpretation of this matching by generalizing the classical notion of the (proportional) veto core in social choice theory. This interpretation opens up a number of immediate consequences. Previous methods for electing a candidate from the veto core can be interpreted simply as matching algorithms. Different election methods realize different matchings, in turn leading to different sets of candidates as winners. For a broad generalization of the veto core, we show that the generalized veto core is equal to the set of candidates who can emerge as winners under a natural class of matching algorithms reminiscent of Serial Dictatorship.
Extending these matching algorithms into continuous time, we obtain a highly practical voting rule with optimal distortion 3, which is also intuitive and easy to explain: Each candidate starts off with public support equal to his plurality score. From time 0 to 1, every voter continuously brings down, at rate 1, the support of her bottom choice among not-yet-eliminated candidates. A candidate is eliminated if he is opposed by a voter after his support reaches 0. On top of being anonymous and neutral, this rule satisfies many other axioms desirable in practice.
Submitted 31 May, 2023;
originally announced May 2023.
-
A System-Level Analysis of Conference Peer Review
Authors:
Yichi Zhang,
Fang-Yi Yu,
Grant Schoenebeck,
David Kempe
Abstract:
The conference peer review process involves three constituencies with different objectives: authors want their papers accepted at prestigious venues (and quickly), conferences want to present a program with many high-quality and few low-quality papers, and reviewers want to avoid being overburdened by reviews. These objectives are far from aligned, primarily because the evaluation of a submission is inherently noisy. Over the years, conferences have experimented with numerous policies to navigate the tradeoffs. These experiments include setting various bars for acceptance, varying the number of reviews per submission, requiring prior reviews to be included with resubmissions, and others. In this work, we investigate, both analytically and empirically, how well various policies work, and more importantly, why they do or do not work.
We model the conference-author interactions as a Stackelberg game in which a prestigious conference commits to an acceptance policy; the authors best-respond by (re)submitting or not (re)submitting to the conference in each round of review, the alternative being a "sure accept" (such as a lightly refereed venue). Our main results include the following observations: 1) the conference should typically set a higher acceptance threshold than the actual desired quality; we call this the "resubmission gap". 2) the reviewing load is heavily driven by resubmissions of borderline papers - therefore, a judicious choice of acceptance threshold may lead to fewer reviews while incurring an acceptable loss in conference quality. 3) conference prestige, reviewer inaccuracy, and author patience increase the resubmission gap, and thus increase the review load for a fixed level of conference quality. For robustness, we further consider different models of paper quality and compare our theoretical results to simulations based on plausible parameters estimated from real data.
Submitted 15 March, 2023;
originally announced March 2023.
-
Binary Search with Distance-Dependent Costs
Authors:
Calvin Leng,
David Kempe
Abstract:
We introduce a search problem generalizing the typical setting of Binary Search on the line. Similar to the setting for Binary Search, a target is chosen adversarially on the line, and in response to a query, the algorithm learns whether the query was correct, too high, or too low. Different from the Binary Search setting, the cost of a query is a monotone non-decreasing function of the distance between the query and the correct answer; different functions can be used for queries that are too high vs. those that are too low. The algorithm's goal is to identify an adversarially chosen target with minimum total cost. Note that the algorithm does not even know the cost it incurred until the end, when the target is revealed. This abstraction captures many natural settings in which a principal experiments by setting a quantity (such as an item price, bandwidth, tax rate, medicine dosage, etc.) where the cost or regret increases the further the chosen setting is from the optimal one.
First, we show that for arbitrary symmetric cost functions (i.e., overshooting vs. undershooting by the same amount leads to the same cost), the standard Binary Search algorithm is a 4-approximation.
We then show that when the cost functions are bounded-degree polynomials of the distance, the problem can be solved optimally using Dynamic Programming; this relies on a careful encoding of the combined cost of past queries (which, recall, will only be revealed in the future). We then generalize the setting to finding a node on a tree; here, the response to a query is the direction on the tree in which the target is located, and the cost is increasing in the distance on the tree from the query to the target. Using the k-cut search tree framework of Berendsohn and Kozma and the ideas we developed for the case of the line, we give a PTAS when the cost function is a bounded-degree polynomial.
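To make the cost model concrete, here is a minimal Python sketch. It is not the authors' algorithm or analysis: it simply runs standard Binary Search on an integer interval and sums the distance-dependent cost of every query. The function name `binary_search_cost`, the restriction to an integer line, and the use of a single symmetric cost function are assumptions made for this illustration only.

```python
def binary_search_cost(lo, hi, target, cost):
    """Run standard Binary Search for `target` on the integers lo..hi.

    `cost` maps the distance |query - target| to the cost of that query
    (symmetric: overshooting and undershooting are charged the same).
    Returns the total cost incurred and the list of queries made.
    """
    total = 0
    queries = []
    while lo <= hi:
        mid = (lo + hi) // 2
        queries.append(mid)
        # The cost depends on the distance to the target, which the
        # searcher only learns in hindsight.
        total += cost(abs(mid - target))
        if mid == target:
            break
        if mid < target:
            lo = mid + 1   # response: query was too low
        else:
            hi = mid - 1   # response: query was too high
    return total, queries


if __name__ == "__main__":
    # Hypothetical example: linear cost, target at the left end of 0..15.
    print(binary_search_cost(0, 15, 0, lambda d: d))
```

With a linear cost and the target at 0, the queries are 7, 3, 1, 0, so the total cost is 7 + 3 + 1 + 0 = 11.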
Submitted 11 March, 2023;
originally announced March 2023.
-
Fairness in Matching under Uncertainty
Authors:
Siddartha Devic,
David Kempe,
Vatsal Sharan,
Aleksandra Korolova
Abstract:
The prevalence and importance of algorithmic two-sided marketplaces has drawn attention to the issue of fairness in such settings. Algorithmic decisions are used in assigning students to schools, users to advertisers, and applicants to job interviews. These decisions should heed the preferences of individuals, and simultaneously be fair with respect to their merits (synonymous with fit, future performance, or need). Merits conditioned on observable features are always uncertain, a fact that is exacerbated by the widespread use of machine learning algorithms to infer merit from the observables. As our key contribution, we carefully axiomatize a notion of individual fairness in the two-sided marketplace setting which respects the uncertainty in the merits; indeed, it simultaneously recognizes uncertainty as the primary potential cause of unfairness and an approach to address it. We design a linear programming framework to find fair utility-maximizing distributions over allocations, and we show that the linear program is robust to perturbations in the estimated parameters of the uncertain merit distributions, a key property in combining the approach with machine learning techniques.
Submitted 16 June, 2023; v1 submitted 7 February, 2023;
originally announced February 2023.
-
Online Team Formation under Different Synergies
Authors:
Matthew Eichhorn,
Siddhartha Banerjee,
David Kempe
Abstract:
Team formation is ubiquitous in many sectors: education, labor markets, sports, etc. A team's success depends on its members' latent types, which are not directly observable but can be (partially) inferred from past performances. From the viewpoint of a principal trying to select teams, this leads to a natural exploration-exploitation trade-off: retain successful teams that are discovered early, or reassign agents to learn more about their types? We study a natural model for online team formation, where a principal repeatedly partitions a group of agents into teams. Agents have binary latent types, each team comprises two members, and a team's performance is a symmetric function of its members' types. Over multiple rounds, the principal selects matchings over agents and incurs regret equal to the deficit in the number of successful teams versus the optimal matching for the given function. Our work provides a complete characterization of the regret landscape for all symmetric functions of two binary inputs. In particular, we develop team-selection policies that, despite being agnostic of model parameters, achieve optimal or near-optimal regret against an adaptive adversary.
Submitted 14 October, 2022; v1 submitted 11 October, 2022;
originally announced October 2022.
-
Active Learning for Non-Parametric Choice Models
Authors:
Fransisca Susan,
Negin Golrezaei,
Ehsan Emamjomeh-Zadeh,
David Kempe
Abstract:
We study the problem of actively learning a non-parametric choice model based on consumers' decisions. We present a negative result showing that such choice models may not be identifiable. To overcome the identifiability problem, we introduce a directed acyclic graph (DAG) representation of the choice model. This representation provably encodes all the information about the choice model which can be inferred from the available data, in the sense that it permits computing all choice probabilities.
We establish that given exact choice probabilities for a collection of item sets, one can reconstruct the DAG. However, attempting to extend this methodology to estimate the DAG from noisy choice frequency data obtained during an active learning process leads to inaccuracies. To address this challenge, we present an inclusion-exclusion approach that effectively manages error propagation across DAG levels, leading to a more accurate estimate of the DAG. Utilizing this technique, our algorithm estimates the DAG representation of an underlying non-parametric choice model. The algorithm operates efficiently (in polynomial time) when the set of frequent rankings is drawn uniformly at random. It learns the distribution over the most popular items among frequent preference types by actively and repeatedly offering assortments of items and observing the chosen item. We demonstrate that our algorithm more effectively recovers a set of frequent preferences on both synthetic and publicly available datasets on consumers' preferences, compared to corresponding non-active learning estimation algorithms. These findings underscore the value of our algorithm and the broader applicability of active-learning approaches in modeling consumer behavior.
Submitted 25 April, 2024; v1 submitted 5 August, 2022;
originally announced August 2022.
-
Plurality Veto: A Simple Voting Rule Achieving Optimal Metric Distortion
Authors:
Fatih Erdem Kizilkaya,
David Kempe
Abstract:
The metric distortion framework posits that n voters and m candidates are jointly embedded in a metric space such that voters rank candidates that are closer to them higher. A voting rule's purpose is to pick a candidate with minimum total distance to the voters, given only the rankings, but not the actual distances. As a result, in the worst case, each deterministic rule picks a candidate whose total distance is at least three times larger than that of an optimal one, i.e., has distortion at least 3. A recent breakthrough result showed that achieving this bound of 3 is possible; however, the proof is non-constructive, and the voting rule itself is a complicated exhaustive search.
Our main result is an extremely simple voting rule, called Plurality Veto, which achieves the same optimal distortion of 3. Each candidate starts with a score equal to his number of first-place votes. These scores are then gradually decreased via an n-round veto process in which a candidate drops out when his score reaches zero. One after the other, voters decrement the score of their bottom choice among the standing candidates, and the last standing candidate wins. We give a one-paragraph proof that this voting rule achieves distortion 3. This rule is also immensely practical, and it only makes two queries to each voter, so it has low communication overhead.
We also generalize Plurality Veto into a class of randomized voting rules in the following way: Plurality Veto is run only for k < n rounds; then, a candidate is chosen with probability proportional to his residual score. This general rule interpolates between Random Dictatorship (for k=0) and Plurality Veto (for k=n-1), and k controls the variance of the output. We show that for all k, this rule has distortion at most 3.
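The deterministic rule described in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' reference implementation. It assumes that each voter submits a complete ranking (best first), that voters are processed in list order, and that candidates with no first-place votes are treated as not standing from the outset; the function name `plurality_veto` is hypothetical.

```python
def plurality_veto(rankings):
    """Return the Plurality Veto winner.

    `rankings` is a list with one entry per voter; each entry is a list of
    all candidates ordered from most to least preferred.
    """
    candidates = set(rankings[0])
    # Initial score = number of first-place votes (plurality score).
    score = {c: 0 for c in candidates}
    for ranking in rankings:
        score[ranking[0]] += 1
    # Assumption: only candidates with positive score are "standing".
    standing = {c for c in candidates if score[c] > 0}

    winner = None
    for ranking in rankings:  # one veto per voter, n rounds in total
        # The voter's bottom choice among the standing candidates.
        bottom = next(c for c in reversed(ranking) if c in standing)
        score[bottom] -= 1
        if score[bottom] == 0:
            standing.discard(bottom)
            winner = bottom  # the last candidate to hit zero wins
    return winner


if __name__ == "__main__":
    votes = [["a", "b", "c"], ["a", "c", "b"], ["b", "a", "c"]]
    print(plurality_veto(votes))
```

Because the scores sum to n and each of the n voters removes exactly one point from a standing candidate, some candidate is always standing when a voter is processed, and the candidate whose score reaches zero in the final round is returned as the winner. In the example, candidate a wins.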
Submitted 29 June, 2023; v1 submitted 14 June, 2022;
originally announced June 2022.
-
Allocating with Priorities and Quotas: Algorithms, Complexity, and Dynamics
Authors:
Siddhartha Banerjee,
Matthew Eichhorn,
David Kempe
Abstract:
In many applications such as rationing medical care and supplies, university admissions, and the assignment of public housing, the decision of who receives an allocation can be justified by various normative criteria. Such settings have motivated the following priority-respecting allocation problem: several categories, each with a quota of interchangeable items, wish to allocate the items among a set of agents. Each category has a list of eligible agents and a priority ordering over these agents; agents may be eligible in multiple categories. The goal is to select a valid allocation: one that respects quotas, eligibility, and priorities and ensures Pareto efficiency. We provide an algorithmic characterization of all valid allocations, exhibiting a bijection between sets of agents who can be allocated and maximum-weight matchings under carefully chosen rank-based weights. While prior work provides a polynomial-time algorithm to locate a valid allocation, our characterization admits a simpler algorithm that enables two wide-reaching extensions: 1. Selecting valid allocations that satisfy additional criteria: Via three examples -- inclusion/exclusion of some chosen agent; agent-side Pareto efficiency vs. welfare maximization; and fairness from the perspective of allocated vs. unallocated agents -- we show that finding priority-respecting allocations subject to some secondary constraint straddles a complexity knife-edge; in each example, one problem variant can be solved efficiently, while its variant is NP-hard. 2. Efficiency-envy tradeoffs in dynamic allocation: In settings where allocations must be made to T agents arriving sequentially via some stochastic process, we show that while insisting on zero priority violations leads to an Omega(T) loss in efficiency, one can design allocation policies ensuring that the sum of the efficiency loss and priority violations in hindsight is O(1).
Submitted 26 May, 2023; v1 submitted 27 April, 2022;
originally announced April 2022.
-
Networked Restless Multi-Armed Bandits for Mobile Interventions
Authors:
Han-Ching Ou,
Christoph Siebenbrunner,
Jackson Killian,
Meredith B Brooks,
David Kempe,
Yevgeniy Vorobeychik,
Milind Tambe
Abstract:
Motivated by a broad class of mobile intervention problems, we propose and study restless multi-armed bandits (RMABs) with network effects. In our model, arms are partially recharging and connected through a graph, so that pulling one arm also improves the state of neighboring arms, significantly extending the previously studied setting of fully recharging bandits with no network effects. In mobile interventions, network effects may arise due to regular population movements (such as commuting between home and work). We show that network effects in RMABs induce strong reward coupling that is not accounted for by existing solution methods. We propose a new solution approach for networked RMABs, exploiting concavity properties which arise under natural assumptions on the structure of intervention effects. We provide sufficient conditions for optimality of our approach in idealized settings and demonstrate that it empirically outperforms state-of-the-art baselines in three mobile intervention domains using real-world graphs.
Submitted 28 January, 2022;
originally announced January 2022.
-
On the benefits of being constrained when receiving signals
Authors:
Shih-Tang Su,
David Kempe,
Vijay G. Subramanian
Abstract:
We study a Bayesian persuasion setting in which the receiver is trying to match the (binary) state of the world. The sender's utility is partially aligned with the receiver's, in that conditioned on the receiver's action, the sender derives higher utility when the state of the world matches the action.
Our focus is on whether, in such a setting, being constrained helps a receiver. Intuitively, if the receiver can only take the sender's preferred action with a smaller probability, the sender might have to reveal more information, so that the receiver can take the action more specifically when the sender prefers it. We show that with a binary state of the world, this intuition indeed carries through: under very mild non-degeneracy conditions, a more constrained receiver will always obtain (weakly) higher utility than a less constrained one. Unfortunately, without additional assumptions, the result does not hold when there are more than two states in the world, which we show with an explicit example.
Submitted 25 October, 2021; v1 submitted 21 October, 2021;
originally announced October 2021.
-
Threshold Tests as Quality Signals: Optimal Strategies, Equilibria, and Price of Anarchy
Authors:
Siddhartha Banerjee,
David Kempe,
Robert Kleinberg
Abstract:
We study a signaling game between two firms competing to have their product chosen by a principal. The products have qualities drawn i.i.d. from a common prior. The principal aims to choose the better product, but the quality of a product can only be estimated via a coarse-grained threshold test: for chosen $θ$, the principal learns whether a product's quality exceeds $θ$ or not.
We study this problem under two types of interactions. In the first, the principal does the testing herself, and can choose tests from a class of allowable tests. We show that the optimum strategy for the principal is to administer different tests to the two products: one which is passed with probability $\frac{1}{3}$ and the other with probability $\frac{2}{3}$. If, however, the principal is required to choose the tests in a symmetric manner (i.e., via an i.i.d. distribution), then the optimal strategy is to choose tests whose probability of passing is drawn uniformly from $[\frac{1}{4}, \frac{3}{4}]$.
In our second model, test difficulties are selected endogenously by the firms. This corresponds to a setting in which the firms must commit to their testing procedures before knowing the quality of their products. This interaction naturally gives rise to a signaling game; we characterize the unique Bayes-Nash Equilibrium of this game, which happens to be symmetric. We then calculate its Price of Anarchy in terms of the principal's probability of choosing the worse product. Finally, we show that by restricting both firms' set of available thresholds to choose from, the principal can lower the Price of Anarchy of the resulting equilibrium; however, there is a limit, in that for every (common) restricted set of tests, the equilibrium failure probability is strictly larger than under the optimal i.i.d. distribution.
Submitted 21 October, 2021;
originally announced October 2021.
-
Structural Stability of a Family of Group Formation Games
Authors:
Chenlan Wang,
Mehrdad Moharrami,
Kun Jin,
David Kempe,
P. Jeffrey Brantingham,
Mingyan Liu
Abstract:
We introduce and study a group formation game in which individuals/agents, driven by self-interest, team up in disjoint groups so as to be in groups of high collective strength. This strength could be group identity, reputation, or protection, and is equally shared by all group members. The group's access to resources, obtained from its members, is traded off against the geographic dispersion of the group: spread-out groups are more costly to maintain. We seek to understand the stability and structure of such partitions. We define two types of equilibria: Acceptance Equilibria (AE), in which no agent will unilaterally change group affiliation, either because the agent cannot increase her utility by switching, or because the intended receiving group is unwilling to accept her (i.e., the utility of existing members would decrease if she joined); and Strong Acceptance Equilibria (SAE), in which no subset of any group will change group affiliations (move together) for the same reasons given above. We show that under natural assumptions on the group utility functions, both an AE and SAE always exist, and that any sequence of improving deviations by agents (resp., subsets of agents in the same group) converges to an AE (resp., SAE). We then characterize the properties of the AEs. We show that an "encroachment" relationship - which groups have members in the territory of other groups - always gives rise to a directed acyclic graph (DAG); conversely, given any DAG, we can construct a game with suitable conditions on the utility function that has an AE with the encroachment structure specified by the given graph.
Submitted 26 September, 2021;
originally announced September 2021.
-
Fairness in Ranking under Uncertainty
Authors:
Ashudeep Singh,
David Kempe,
Thorsten Joachims
Abstract:
Fairness has emerged as an important consideration in algorithmic decision-making. Unfairness occurs when an agent with higher merit obtains a worse outcome than an agent with lower merit. Our central point is that a primary cause of unfairness is uncertainty. A principal or algorithm making decisions never has access to the agents' true merit, and instead uses proxy features that only imperfectly predict merit (e.g., GPA, star ratings, recommendation letters). None of these ever fully capture an agent's merit; yet existing approaches have mostly been defining fairness notions directly based on observed features and outcomes.
Our primary point is that it is more principled to acknowledge and model the uncertainty explicitly. The role of observed features is to give rise to a posterior distribution of the agents' merits. We use this viewpoint to define a notion of approximate fairness in ranking. We call an algorithm $φ$-fair (for $φ\in [0,1]$) if it has the following property for all agents $x$ and all $k$: if agent $x$ is among the top $k$ agents with respect to merit with probability at least $ρ$ (according to the posterior merit distribution), then the algorithm places the agent among the top $k$ agents in its ranking with probability at least $φρ$.
We show how to compute rankings that optimally trade off approximate fairness against utility to the principal. In addition to the theoretical characterization, we present an empirical analysis of the potential impact of the approach in simulation studies. For real-world validation, we applied the approach in the context of a paper recommendation system that we built and fielded at the KDD 2020 conference.
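As a rough illustration of the $φ$-fairness definition above, the following Python sketch computes the largest $φ$ for which a given randomized ranking satisfies the condition. It assumes that the top-$k$ probabilities under the posterior and under the ranking algorithm are already available as tables; the function name `max_fair_phi` and the table layout are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def max_fair_phi(
    merit_topk: Sequence[Sequence[float]],
    rank_topk: Sequence[Sequence[float]],
) -> float:
    """Largest phi in [0, 1] with rank_topk[x][k] >= phi * merit_topk[x][k] for all x, k.

    merit_topk[x][k]: posterior probability that agent x is among the top k+1 by merit.
    rank_topk[x][k]:  probability that the algorithm ranks agent x among the top k+1.
    (Both tables are assumed inputs for this sketch.)
    """
    phi = 1.0
    for merit_row, rank_row in zip(merit_topk, rank_topk):
        for rho, q in zip(merit_row, rank_row):
            if rho > 0:  # entries with rho == 0 impose no constraint
                phi = min(phi, q / rho)
    return phi


# Two agents: agent 0 has the higher merit with posterior probability 0.75.
merit = [[0.75, 1.0], [0.25, 1.0]]

# Always ranking agent 0 first gives agent 1 no chance at the top spot.
deterministic = [[1.0, 1.0], [0.0, 1.0]]
print(max_fair_phi(merit, deterministic))  # 0.0

# Ranking agent 1 first with probability 0.125 is 0.5-fair.
mixed = [[0.875, 1.0], [0.125, 1.0]]
print(max_fair_phi(merit, mixed))  # 0.5
```

In this toy instance, the deterministic ranking is only $0$-fair, while randomizing moves the ranking toward the posterior and raises $φ$.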
Submitted 10 November, 2021; v1 submitted 14 July, 2021;
originally announced July 2021.
-
Altruism Design in Networked Public Goods Games
Authors:
Sixie Yu,
David Kempe,
Yevgeniy Vorobeychik
Abstract:
Many collective decision-making settings feature a strategic tension between agents acting out of individual self-interest and promoting a common good. These include wearing face masks during a pandemic, voting, and vaccination. Networked public goods games capture this tension, with networks encoding strategic interdependence among agents. Conventional models of public goods games posit solely individual self-interest as a motivation, even though altruistic motivations have long been known to play a significant role in agents' decisions. We introduce a novel extension of public goods games to account for altruistic motivations by adding a term in the utility function that incorporates the perceived benefits an agent obtains from the welfare of others, mediated by an altruism graph. Most importantly, we view altruism not as immutable, but rather as a lever for promoting the common good. Our central algorithmic question then revolves around the computational complexity of modifying the altruism network to achieve desired public goods game investment profiles. We first show that the problem can be solved using linear programming when a principal can fractionally modify the altruism network. While the problem becomes in general intractable if the principal's actions are all-or-nothing, we exhibit several tractable special cases.
Submitted 2 May, 2021;
originally announced May 2021.
-
Adversarial Online Learning with Changing Action Sets: Efficient Algorithms with Approximate Regret Bounds
Authors:
Ehsan Emamjomeh-Zadeh,
Chen-Yu Wei,
Haipeng Luo,
David Kempe
Abstract:
We revisit the problem of online learning with sleeping experts/bandits: in each time step, only a subset of the actions are available for the algorithm to choose from (and learn about). The work of Kleinberg et al. (2010) showed that there exist no-regret algorithms which perform no worse than the best ranking of actions asymptotically. Unfortunately, achieving this regret bound appears computationally hard: Kanade and Steinke (2014) showed that achieving this no-regret performance is at least as hard as PAC-learning DNFs, a notoriously difficult problem. In the present work, we relax the original problem and study computationally efficient no-approximate-regret algorithms: such algorithms may exceed the optimal cost by a multiplicative constant in addition to the additive regret. We give an algorithm that provides a no-approximate-regret guarantee for the general sleeping expert/bandit problems. For several canonical special cases of the problem, we give algorithms with significantly better approximation ratios; these algorithms also illustrate different techniques for achieving no-approximate-regret guarantees.
Submitted 26 April, 2021; v1 submitted 6 March, 2020;
originally announced March 2020.
-
Inducing Equilibria in Networked Public Goods Games through Network Structure Modification
Authors:
David Kempe,
Sixie Yu,
Yevgeniy Vorobeychik
Abstract:
Networked public goods games model scenarios in which self-interested agents decide whether or how much to invest in an action that benefits not only themselves, but also their network neighbors. Examples include vaccination, security investment, and crime reporting. While every agent's utility is increasing in their neighbors' joint investment, the specific form can vary widely depending on the scenario. A principal, such as a policymaker, may wish to induce large investment from the agents. Besides direct incentives, an important lever here is the network structure itself: by adding and removing edges, for example, through community meetings, the principal can change the nature of the utility functions, resulting in different, and perhaps socially preferable, equilibrium outcomes. We initiate an algorithmic study of targeted network modifications with the goal of inducing equilibria of a particular form. We study this question for a variety of equilibrium forms (induce all agents to invest, at least a given set $S$, exactly a given set $S$, at least $k$ agents), and for a variety of utility functions. While we show that the problem is NP-complete for a number of these scenarios, we exhibit a broad array of scenarios in which the problem can be solved in polynomial time by non-trivial reductions to (minimum-cost) matching problems.
Submitted 2 September, 2021; v1 submitted 24 February, 2020;
originally announced February 2020.
-
The Complexity of Interactively Learning a Stable Matching by Trial and Error
Authors:
Ehsan Emamjomeh-Zadeh,
Yannai A. Gonczarowski,
David Kempe
Abstract:
In a stable matching setting, we consider a query model that allows for an interactive learning algorithm to make precisely one type of query: proposing a matching, the response to which is either that the proposed matching is stable, or a blocking pair (chosen adversarially) indicating that this matching is unstable. For one-to-one matching markets, our main result is an essentially tight upper bound of $O(n^2\log n)$ on the deterministic query complexity of interactively learning a stable matching in this coarse query model, along with an efficient randomized algorithm that achieves this query complexity with high probability. For many-to-many matching markets in which participants have responsive preferences, we first give an interactive learning algorithm whose query complexity and running time are polynomial in the size of the market if the maximum quota of each agent is bounded; our main result for many-to-many markets is that the deterministic query complexity can be made polynomial (more specifically, $O(n^3 \log n)$) in the size of the market even for arbitrary (e.g., linear in the market size) quotas.
Submitted 19 September, 2020; v1 submitted 17 February, 2020;
originally announced February 2020.
-
Communication, Distortion, and Randomness in Metric Voting
Authors:
David Kempe
Abstract:
In distortion-based analysis of social choice rules over metric spaces, one assumes that all voters and candidates are jointly embedded in a common metric space. Voters rank candidates by non-decreasing distance. The mechanism, receiving only this ordinal (comparison) information, should select a candidate approximately minimizing the sum of distances from all voters. It is known that while the Copeland rule and related rules guarantee distortion at most 5, many other standard voting rules, such as Plurality, Veto, or $k$-approval, have distortion growing unboundedly in the number $n$ of candidates.
Plurality, Veto, or $k$-approval with small $k$ require less communication from the voters than all deterministic social choice rules known to achieve constant distortion. This motivates our study of the tradeoff between the distortion and the amount of communication in deterministic social choice rules.
We show that any one-round deterministic voting mechanism in which each voter communicates only the candidates she ranks in a given set of $k$ positions must have distortion at least $\frac{2n-k}{k}$; we give a mechanism achieving an upper bound of $O(n/k)$, which matches the lower bound up to a constant. For more general communication-bounded voting mechanisms, in which each voter communicates $b$ bits of information about her ranking, we show a slightly weaker lower bound of $Ω(n/b)$ on the distortion.
For randomized mechanisms, it is known that Random Dictatorship achieves expected distortion strictly smaller than 3, almost matching a lower bound of $3-\frac{2}{n}$ for any randomized mechanism that only receives each voter's top choice. We close this gap, by giving a simple randomized social choice rule which only uses each voter's first choice, and achieves expected distortion $3-\frac{2}{n}$.
Submitted 20 November, 2019; v1 submitted 19 November, 2019;
originally announced November 2019.
-
An Analysis Framework for Metric Voting based on LP Duality
Authors:
David Kempe
Abstract:
Distortion-based analysis has established itself as a fruitful framework for comparing voting mechanisms. m voters and n candidates are jointly embedded in an (unknown) metric space, and the voters submit rankings of candidates by non-decreasing distance from themselves. Based on the submitted rankings, the social choice rule chooses a winning candidate; the quality of the winner is the sum of the (unknown) distances to the voters. The rule's choice will in general be suboptimal, and the worst-case ratio between the cost of its chosen candidate and the optimal candidate is called the rule's distortion. It was shown in prior work that every deterministic rule has distortion at least 3, while the Copeland rule and related rules guarantee worst-case distortion at most 5; a very recent result gave a rule with distortion $2+\sqrt{5} \approx 4.236$.
We provide a framework based on LP-duality and flow interpretations of the dual which provides a simpler and more unified way for proving upper bounds on the distortion of social choice rules. We illustrate the utility of this approach with three examples. First, we give a fairly simple proof of a strong generalization of the upper bound of 5 on the distortion of Copeland, to social choice rules with short paths from the winning candidate to the optimal candidate in generalized weak preference graphs. A special case of this result recovers the recent $2+\sqrt{5}$ guarantee. Second, using this generalized bound, we show that the Ranked Pairs and Schulze rules have distortion $Θ(\sqrt{n})$. Finally, our framework naturally suggests a combinatorial rule that is a strong candidate for achieving distortion 3, which had also been proposed in recent work. We prove that the distortion bound of 3 would follow from any of three combinatorial conjectures we formulate.
Submitted 14 December, 2019; v1 submitted 17 November, 2019;
originally announced November 2019.
-
Approximation Algorithms for Coordinating Ad Campaigns on Social Networks
Authors:
Kartik Lakhotia,
David Kempe
Abstract:
We study a natural model of coordinated social ad campaigns over a social network, based on models of Datta et al. and Aslay et al. Multiple advertisers are willing to pay the host - up to a known budget - per user exposure, whether the exposure is sponsored or organic (i.e., shared by a friend). Campaigns are seeded with sponsored ads to some users, but no user must be exposed to too many sponsored ads. Thus, while ad campaigns proceed independently over the network, they need to be carefully coordinated with respect to their seed sets.
We study the objective of maximizing the host's total ad revenue. Our main result is to show that under a broad class of influence models, the problem can be reduced to maximizing a submodular function subject to two matroid constraints; it can therefore be approximated within a factor of essentially 1/2 in polynomial time. When there is no bound on the individual seed set sizes of advertisers, the constraints correspond only to a single matroid, and the guarantee can be improved to 1-1/e; in that case, a factor 1/2 is achieved by a practical greedy algorithm. The 1-1/e approximation algorithm for the matroid-constrained problem is far from practical; however, we show that specifically under the Independent Cascade model, LP rounding and Reverse Reachability techniques can be combined to obtain a 1-1/e approximation algorithm.
Our theoretical results are complemented by experiments evaluating the extent to which the coordination of multiple ad campaigns inhibits the revenue obtained from each individual campaign, as a function of the similarity of the influence networks and strength of ties in the networks. Our experiments suggest that as networks for different advertisers become less similar, the harmful effect of competition decreases. With respect to tie strengths, we show that the most harm is done in an intermediate range.
Submitted 26 December, 2019; v1 submitted 24 August, 2019;
originally announced August 2019.
-
Alea Iacta Est: Auctions, Persuasion, Interim Rules, and Dice
Authors:
Shaddin Dughmi,
David Kempe,
Ruixin Qiang
Abstract:
To select a subset of samples or "winners" from a population of candidates, order sampling [Rosen 1997] and the k-unit Myerson auction [Myerson 1981] share a common scheme: assign a (random) score to each candidate, then select the k candidates with the highest scores. We study a generalization of both order sampling and Myerson's allocation rule, called winner-selecting dice. The setting for winner-selecting dice is similar to auctions with feasibility constraints: candidates have random types drawn from independent prior distributions, and the winner set must be feasible subject to certain constraints. Dice (distributions over scores) are assigned to each type, and winners are selected to maximize the sum of the dice rolls, subject to the feasibility constraints. We examine the existence of winner-selecting dice that implement prescribed probabilities of winning (i.e., an interim rule) for all types.
Our first result shows that when the feasibility constraint is a matroid, then for any feasible interim rule, there always exist winner-selecting dice that implement it. Unfortunately, our proof does not yield an efficient algorithm for constructing the dice. In the special case of a 1-uniform matroid, i.e., only one winner can be selected, we give an efficient algorithm that constructs winner-selecting dice for any feasible interim rule. Furthermore, when the types of the candidates are drawn in an i.i.d. manner and the interim rule is symmetric across candidates, unsurprisingly, an algorithm can efficiently construct symmetric dice that only depend on the type but not the identity of the candidate.
Submitted 28 November, 2018;
originally announced November 2018.
-
A Class of Weighted TSPs with Applications
Authors:
David Kempe,
Mark Klein
Abstract:
Motivated by applications to poaching and burglary prevention, we define a class of weighted Traveling Salesman Problems on metric spaces. The goal is to output an infinite (though typically periodic) tour that visits the n points repeatedly, such that no point goes unvisited for "too long." More specifically, we consider two objective functions for each point x. The maximum objective is simply the maximum duration of any absence from x, while the quadratic objective is the normalized sum of squares of absence lengths from x. For periodic tours, the quadratic objective captures the expected duration of absence from x at a uniformly random point in time during the tour. The overall objective is then the weighted maximum of the individual points' objectives. When a point has weight w_x, the absences under an optimal tour should be roughly a 1/w_x fraction of the absences from points of weight 1. Thus, the objective naturally encourages visiting high-weight points more frequently, and at roughly evenly spaced intervals.
We give a polynomial-time combinatorial algorithm whose output is simultaneously an O(log n) approximation under both objectives. We prove that up to constant factors, approximation guarantees for the quadratic objective directly imply the same guarantees for a natural security patrol game defined in recent work.
Submitted 1 August, 2018;
originally announced August 2018.
-
On the Distortion of Voting with Multiple Representative Candidates
Authors:
Yu Cheng,
Shaddin Dughmi,
David Kempe
Abstract:
We study positional voting rules when candidates and voters are embedded in a common metric space, and cardinal preferences are naturally given by distances in the metric space. In a positional voting rule, each candidate receives a score from each ballot based on the ballot's rank order; the candidate with the highest total score wins the election. The cost of a candidate is his sum of distances to all voters, and the distortion of an election is the ratio between the cost of the elected candidate and the cost of the optimum candidate. We consider the case when candidates are representative of the population, in the sense that they are drawn i.i.d. from the population of the voters, and analyze the expected distortion of positional voting rules.
Our main result is a clean and tight characterization of positional voting rules that have constant expected distortion (independent of the number of candidates and the metric space). Our characterization result immediately implies constant expected distortion for Borda Count and elections in which each voter approves a constant fraction of all candidates. On the other hand, we obtain super-constant expected distortion for Plurality, Veto, and approving a constant number of candidates. These results contrast with previous results on voting with metric preferences: When the candidates are chosen adversarially, all of the preceding voting rules have distortion linear in the number of candidates or voters. Thus, the model of representative candidates allows us to distinguish voting rules which seem equally bad in the worst case.
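To make the notions of positional voting rule and distortion concrete, here is a minimal Python sketch for voters and candidates on a line. It is only an illustration under stated assumptions: the helper names `positional_winner` and `distortion_on_line`, the lowest-index tie-breaking, and the toy instance are hypothetical, and the candidate positions are fixed by hand rather than drawn i.i.d. from the voters as in the paper's model.

```python
from typing import List, Sequence


def positional_winner(ballots: Sequence[Sequence[int]], scores: Sequence[float]) -> int:
    """Candidate with the highest total score; scores[p] is awarded for rank position p.

    Ties are broken in favor of the lowest candidate index (an assumption of this sketch).
    """
    totals = [0.0] * len(scores)
    for ballot in ballots:
        for position, candidate in enumerate(ballot):
            totals[candidate] += scores[position]
    return max(range(len(scores)), key=lambda c: totals[c])


def distortion_on_line(
    voters: Sequence[float], candidates: Sequence[float], scores: Sequence[float]
) -> float:
    """Cost of the elected candidate divided by the cost of the optimal candidate."""
    # Each voter ranks candidates by non-decreasing distance.
    ballots: List[List[int]] = [
        sorted(range(len(candidates)), key=lambda c: abs(v - candidates[c]))
        for v in voters
    ]
    # The cost of a candidate is the sum of distances to all voters.
    costs = [sum(abs(v - c) for v in voters) for c in candidates]
    winner = positional_winner(ballots, scores)
    return costs[winner] / min(costs)  # assumes the optimal cost is positive


voters = [0, 0, 0, 4, 4, 6, 6]
candidates = [0, 4, 6]
print(distortion_on_line(voters, candidates, [1, 0, 0]))  # Plurality: 1.25
print(distortion_on_line(voters, candidates, [2, 1, 0]))  # Borda Count: 1.0
```

In this toy instance, Plurality elects the candidate at 0 (cost 20, versus the optimal cost 16 for the candidate at 4), while Borda Count elects the optimal candidate.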
Submitted 20 November, 2017;
originally announced November 2017.
-
A General Framework for Robust Interactive Learning
Authors:
Ehsan Emamjomeh-Zadeh,
David Kempe
Abstract:
We propose a general framework for interactively learning models, such as (binary or non-binary) classifiers, orderings/rankings of items, or clusterings of data points. Our framework is based on a generalization of Angluin's equivalence query model and Littlestone's online learning model: in each iteration, the algorithm proposes a model, and the user either accepts it or reveals a specific mistake in the proposal. The feedback is correct only with probability $p > 1/2$ (and adversarially incorrect with probability $1 - p$), i.e., the algorithm must be able to learn in the presence of arbitrary noise. The algorithm's goal is to learn the ground truth model using few iterations.
Our general framework is based on a graph representation of the models and user feedback. To be able to learn efficiently, it is sufficient that there be a graph $G$ whose nodes are the models and (weighted) edges capture the user feedback, with the property that if $s, s^*$ are the proposed and target models, respectively, then any (correct) user feedback $s'$ must lie on a shortest $s$-$s^*$ path in $G$. Under this one assumption, there is a natural algorithm reminiscent of the Multiplicative Weights Update algorithm, which will efficiently learn $s^*$ even in the presence of noise in the user's feedback.
From this general result, we rederive with barely any extra effort classic results on learning of classifiers and a recent result on interactive clustering; in addition, we easily obtain new interactive learning algorithms for ordering/ranking.
Submitted 15 October, 2017;
originally announced October 2017.
-
Adaptive Hierarchical Clustering Using Ordinal Queries
Authors:
Ehsan Emamjomeh-Zadeh,
David Kempe
Abstract:
In many applications of clustering (for example, ontologies or clusterings of animal or plant species), hierarchical clusterings are more descriptive than a flat clustering. A hierarchical clustering over $n$ elements is represented by a rooted binary tree with $n$ leaves, each corresponding to one element. The subtrees rooted at interior nodes capture the clusters. In this paper, we study active learning of a hierarchical clustering using only ordinal queries. An ordinal query consists of a set of three elements, and the response to a query reveals the two elements (among the three elements in the query) which are "closer" to each other than to the third one. We say that elements $x$ and $x'$ are closer to each other than $x''$ if there exists a cluster containing $x$ and $x'$, but not $x''$.
When all the query responses are correct, there is a deterministic algorithm that learns the underlying hierarchical clustering using at most $n \log_2 n$ adaptive ordinal queries. We generalize this algorithm to be robust in a model in which each query response is correct independently with probability $p > \frac{1}{2}$, and adversarially incorrect with probability $1 - p$. We show that in the presence of noise, our algorithm outputs the correct hierarchical clustering with probability at least $1 - δ$, using $O(n \log n + n \log(1/δ))$ adaptive ordinal queries. For our results, adaptivity is crucial: we prove that even in the absence of noise, every non-adaptive algorithm requires $Ω(n^3)$ ordinal queries in the worst case.
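The noiseless ordinal query described above can be sketched in Python as follows. This only illustrates how a correct response is determined by the underlying tree; it is not the learning algorithm from the paper. The nested-tuple tree encoding and the function names `leaves` and `ordinal_query` are assumptions made for this example, and leaves are assumed to be non-tuple values such as strings.

```python
from typing import Any, FrozenSet, Set


def leaves(tree: Any) -> Set[Any]:
    """All elements below a node; interior nodes are (left, right) tuples."""
    if not isinstance(tree, tuple):
        return {tree}
    return leaves(tree[0]) | leaves(tree[1])


def ordinal_query(tree: Any, a: Any, b: Any, c: Any) -> FrozenSet[Any]:
    """Return the two of {a, b, c} that share a cluster excluding the third."""
    triple = {a, b, c}
    while isinstance(tree, tuple):
        left, right = tree
        in_left = triple & leaves(left)
        if len(in_left) == 3:
            tree = left           # all three are still together: descend left
        elif len(in_left) == 0:
            tree = right          # all three are still together: descend right
        elif len(in_left) == 2:
            return frozenset(in_left)           # the left subtree is the cluster
        else:
            return frozenset(triple - in_left)  # the right subtree is the cluster
    raise ValueError("the three elements must be distinct leaves of the tree")


# Hypothetical hierarchy: ((cat, dog), bird) are animals, oak is a plant.
hierarchy = ((("cat", "dog"), "bird"), "oak")
print(sorted(ordinal_query(hierarchy, "cat", "bird", "dog")))  # ['cat', 'dog']
print(sorted(ordinal_query(hierarchy, "cat", "bird", "oak")))  # ['bird', 'cat']
```

The response is fixed at the first node where the three elements split two against one, since the subtree holding the pair is a cluster that excludes the third element.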
Submitted 16 April, 2018; v1 submitted 31 July, 2017;
originally announced August 2017.
-
Of the People: Voting Is More Effective with Representative Candidates
Authors:
Yu Cheng,
Shaddin Dughmi,
David Kempe
Abstract:
In light of the classic impossibility results of Arrow and Gibbard and Satterthwaite regarding voting with ordinal rules, there has been recent interest in characterizing how well common voting rules approximate the social optimum. In order to quantify the quality of approximation, it is natural to consider the candidates and voters as embedded within a common metric space, and to ask how much further the chosen candidate is from the population as compared to the socially optimal one. We use this metric preference model to explore a fundamental and timely question: does the social welfare of a population improve when candidates are representative of the population? If so, then by how much, and how does the answer depend on the complexity of the metric space?
We restrict attention to the most fundamental and common social choice setting: a population of voters, two independently drawn candidates, and a majority rule election. When candidates are not representative of the population, it is known that the candidate selected by the majority rule can be thrice as far from the population as the socially optimal one. We examine how this ratio improves when candidates are drawn independently from the population of voters. Our results are two-fold: When the metric is a line, the ratio improves from $3$ to $4-2\sqrt{2}$, roughly $1.1716$; this bound is tight. When the metric is arbitrary, we show a lower bound of $1.5$ and a constant upper bound strictly better than $2$ on the approximation ratio of the majority rule.
The positive result depends in part on the assumption that candidates are independent and identically distributed. However, we show that independence alone is not enough to achieve the upper bound: even when candidates are drawn independently, if the population of candidates can be different from the voters, then an upper bound of $2$ on the approximation is tight.
Submitted 26 August, 2017; v1 submitted 4 May, 2017;
originally announced May 2017.
-
Quasi-regular sequences and optimal schedules for security games
Authors:
David Kempe,
Leonard J. Schulman,
Omer Tamuz
Abstract:
We study security games in which a defender commits to a mixed strategy for protecting a finite set of targets of different values. An attacker, knowing the defender's strategy, chooses which target to attack and for how long. If the attacker spends time $t$ at a target $i$ of value $α_i$, and if he leaves before the defender visits the target, his utility is $t \cdot α_i $; if the defender visits before he leaves, his utility is 0. The defender's goal is to minimize the attacker's utility. The defender's strategy consists of a schedule for visiting the targets; it takes her unit time to switch between targets. Such games are a simplified model of a number of real-world scenarios such as protecting computer networks from intruders, crops from thieves, etc.
We show that optimal defender play for these continuous-time security games reduces to the solution of a combinatorial question regarding the existence of infinite sequences over a finite alphabet, with the following properties for each symbol $i$: (1) $i$ constitutes a prescribed fraction $p_i$ of the sequence. (2) The occurrences of $i$ are spread apart close to evenly, in that the ratio of the longest to shortest interval between consecutive occurrences is bounded by a parameter $K$. We call such sequences $K$-quasi-regular.
We show that, surprisingly, $2$-quasi-regular sequences suffice for optimal defender play. What is more, even randomized $2$-quasi-regular sequences suffice for optimality. We show that such sequences always exist, and can be calculated efficiently.
The question of the least $K$ for which deterministic $K$-quasi-regular sequences exist is fascinating. Using an ergodic theoretical approach, we show that deterministic $3$-quasi-regular sequences always exist. For $2 \leq K < 3$ we do not know whether deterministic $K$-quasi-regular sequences always exist.
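Property (2) of the $K$-quasi-regularity definition above can be checked mechanically for periodic sequences. The Python sketch below is a hedged illustration rather than the construction from the paper: it assumes the infinite sequence is given by one period, measures intervals as index differences, and treats "bounded by $K$" as a non-strict inequality. The names `gap_ratios` and `is_quasi_regular` are hypothetical, and property (1), the prescribed fractions $p_i$, is not checked here.

```python
from typing import Dict, Hashable, List, Sequence


def gap_ratios(period: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Per-symbol ratio of the longest to the shortest gap between consecutive
    occurrences, for the infinite sequence that repeats `period` forever."""
    n = len(period)
    positions: Dict[Hashable, List[int]] = {}
    for index, symbol in enumerate(period):
        positions.setdefault(symbol, []).append(index)

    ratios: Dict[Hashable, float] = {}
    for symbol, pos in positions.items():
        gaps = []
        for j in range(len(pos)):
            nxt = pos[(j + 1) % len(pos)]  # wraps around to the next period
            gap = (nxt - pos[j]) % n
            gaps.append(gap if gap > 0 else n)  # a single occurrence has gap n
        ratios[symbol] = max(gaps) / min(gaps)
    return ratios


def is_quasi_regular(period: Sequence[Hashable], K: float) -> bool:
    """True if every symbol's gap ratio is at most K (non-strict bound assumed)."""
    return all(ratio <= K for ratio in gap_ratios(period).values())


print(gap_ratios("aab"))             # {'a': 2.0, 'b': 1.0}
print(is_quasi_regular("aab", 2))    # True
print(is_quasi_regular("aab", 1.5))  # False
print(is_quasi_regular("abac", 1))   # True
```

In the period `aab`, the symbol `a` recurs after gaps of 1 and 2, so this sequence is $2$-quasi-regular but not $1.5$-quasi-regular under the assumptions of the sketch.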
Submitted 28 October, 2017; v1 submitted 22 November, 2016;
originally announced November 2016.
-
Learning Influence Functions from Incomplete Observations
Authors:
Xinran He,
Ke Xu,
David Kempe,
Yan Liu
Abstract:
We study the problem of learning influence functions under incomplete observations of node activations. Incomplete observations are a major concern as most (online and real-world) social networks are not fully observable. We establish both proper and improper PAC learnability of influence functions under randomly missing observations. Proper PAC learnability under the Discrete-Time Linear Threshold (DLT) and Discrete-Time Independent Cascade (DIC) models is established by reducing incomplete observations to complete observations in a modified graph. Our improper PAC learnability result applies for the DLT and DIC models as well as the Continuous-Time Independent Cascade (CIC) model. It is based on a parametrization in terms of reachability features, and also gives rise to an efficient and practical heuristic. Experiments on synthetic and real-world datasets demonstrate the ability of our method to compensate even for a fairly large fraction of missing observations.
Submitted 7 November, 2016;
originally announced November 2016.
-
Persuasion with Limited Communication
Authors:
Shaddin Dughmi,
David Kempe,
Ruixin Qiang
Abstract:
We examine information structure design, also called "persuasion" or "signaling", in the presence of a constraint on the amount of communication. We focus on the fundamental setting of bilateral trade, which in its simplest form involves a seller with a single item to price, a buyer whose value for the item is drawn from a common prior distribution over $n$ different possible values, and a take-it-or-leave-it-offer protocol. A mediator with access to the buyer's type may partially reveal such information to the seller in order to further some objective such as the social welfare or the seller's revenue.
In the setting of maximizing welfare under bilateral trade, we show that $O(\log(n) \log \frac{1}{ε})$ signals suffice for a $1-ε$ approximation to the optimal welfare, and this bound is tight. As our main result, we exhibit an efficient algorithm for computing a $\frac{M-1}{M} \cdot (1-1/e)$-approximation to the welfare-maximizing scheme with at most $M$ signals.
For the revenue objective, we show that $Ω(n)$ signals are needed for a constant factor approximation to the revenue of a fully informed seller. From a computational perspective, however, the problem gets easier: we show that a simple dynamic program computes the signaling scheme with $M$ signals maximizing the seller's revenue.
Observing that the signaling problem in bilateral trade is a special case of the fundamental Bayesian Persuasion model of Kamenica and Gentzkow, we also examine the question of communication-constrained signaling more generally. In this model there is a sender (the mediator), a receiver (the seller) looking to take an action (setting the price), and a state of nature (the buyer's type) drawn from a common prior. We show that it is NP-hard to approximate the optimal sender's utility to within any constant factor in the presence of communication constraints.
Submitted 5 March, 2020; v1 submitted 24 June, 2016;
originally announced June 2016.
-
Robust Influence Maximization
Authors:
Xinran He,
David Kempe
Abstract:
Uncertainty about models and data is ubiquitous in the computational social sciences, and it creates a need for robust social network algorithms, which can simultaneously provide guarantees across a spectrum of models and parameter settings. We begin an investigation into this broad domain by studying robust algorithms for the Influence Maximization problem, in which the goal is to identify a set of k nodes in a social network whose joint influence on the network is maximized.
We define a Robust Influence Maximization framework wherein an algorithm is presented with a set of influence functions, typically derived from different influence models or different parameter settings for the same model. The different parameter settings could be derived from observed cascades on different topics, under different conditions, or at different times. The algorithm's goal is to identify a set of k nodes who are simultaneously influential for all influence functions, compared to the (function-specific) optimum solutions.
We show strong approximation hardness results for this problem unless the algorithm gets to select at least a logarithmic factor more seeds than the optimum solution. However, when enough extra seeds may be selected, we show that techniques of Krause et al. can be used to approximate the optimum robust influence to within a factor of 1 - 1/e. We evaluate this bicriteria approximation algorithm against natural heuristics on several real-world data sets. Our experiments indicate that the worst-case hardness does not necessarily translate into bad performance on real-world data sets; all algorithms perform fairly well.
Submitted 10 June, 2016; v1 submitted 16 February, 2016;
originally announced February 2016.
-
Incentivizing Exploration with Heterogeneous Value of Money
Authors:
Li Han,
David Kempe,
Ruixin Qiang
Abstract:
Recently, Frazier et al. proposed a natural model for crowdsourced exploration of different a priori unknown options: a principal is interested in the long-term welfare of a population of agents who arrive one by one in a multi-armed bandit setting. However, each agent is myopic, so in order to incentivize him to explore options with better long-term prospects, the principal must offer the agent money. Frazier et al. showed that a simple class of policies called time-expanded are optimal in the worst case, and characterized their budget-reward tradeoff.
The previous work assumed that all agents are equally and uniformly susceptible to financial incentives. In reality, agents may have different utility for money. We therefore extend the model of Frazier et al. to allow agents that have heterogeneous and non-linear utilities for money. The principal is informed of the agent's tradeoff via a signal that could be more or less informative.
Our main result is to show that a convex program can be used to derive a signal-dependent time-expanded policy which achieves the best possible Lagrangian reward in the worst case. The worst-case guarantee is matched by so-called "Diamonds in the Rough" instances; the proof that the guarantees match is based on showing that two different convex programs have the same optimal solution for these specific instances. These results also extend to the budgeted case as in Frazier et al. We also show that the optimal policy is monotone with respect to information, i.e., the approximation ratio of the optimal policy improves as the signals become more informative.
Submitted 28 December, 2015;
originally announced December 2015.
-
Occurrence Typing Modulo Theories
Authors:
Andrew M. Kent,
David Kempe,
Sam Tobin-Hochstadt
Abstract:
We present a new type system combining occurrence typing, previously used to type check programs in dynamically-typed languages such as Racket, JavaScript, and Ruby, with dependent refinement types. We demonstrate that the addition of refinement types allows the integration of arbitrary solver-backed reasoning about logical propositions from external theories. By building on occurrence typing, we can add our enriched type system as an extension of Typed Racket---adding dependency and refinement reuses the existing formalism while increasing its expressiveness.
Dependent refinement types allow Typed Racket programmers to express rich type relationships, ranging from data structure invariants such as red-black tree balance to preconditions such as vector bounds. Refinements allow programmers to embed the propositions that occurrence typing in Typed Racket already reasons about into their types. Further, extending occurrence typing to refinements allows us to make the underlying formalism simpler and more powerful.
In addition to presenting the design of our system, we present a formal model of the system, show how to integrate it with theories over both linear arithmetic and bitvectors, and evaluate the system in the context of the full Typed Racket implementation. Specifically, we take safe vector access as a case study, and examine all vector accesses in a 56,000 line corpus of Typed Racket programs. Our system is able to prove that 50% of these are safe with no new annotation, and with a few annotations and modifications, we can capture close to 80%.
Submitted 4 October, 2016; v1 submitted 22 November, 2015;
originally announced November 2015.
-
Deterministic and Probabilistic Binary Search in Graphs
Authors:
Ehsan Emamjomeh-Zadeh,
David Kempe,
Vikrant Singhal
Abstract:
We consider the following natural generalization of Binary Search: in a given undirected, positively weighted graph, one vertex is a target. The algorithm's task is to identify the target by adaptively querying vertices. In response to querying a node $q$, the algorithm learns either that $q$ is the target, or is given an edge out of $q$ that lies on a shortest path from $q$ to the target. We study this problem in a general noisy model in which each query independently receives a correct answer with probability $p > \frac{1}{2}$ (a known constant), and an (adversarial) incorrect one with probability $1-p$.
Our main positive result is that when $p = 1$ (i.e., all answers are correct), $\log_2 n$ queries are always sufficient. For general $p$, we give an (almost information-theoretically optimal) algorithm that uses, in expectation, no more than $(1 - δ)\frac{\log_2 n}{1 - H(p)} + o(\log n) + O(\log^2 (1/δ))$ queries, and identifies the target correctly with probability at least $1-δ$. Here, $H(p) = -(p \log p + (1-p) \log(1-p))$ denotes the entropy. The first bound is achieved by the algorithm that iteratively queries a 1-median of the nodes not ruled out yet; the second bound by careful repeated invocations of a multiplicative weights algorithm.
Even for $p = 1$, we show several hardness results for the problem of determining whether a target can be found using $K$ queries. Our upper bound of $\log_2 n$ implies a quasipolynomial-time algorithm for undirected connected graphs; we show that this is best-possible under the Strong Exponential Time Hypothesis (SETH). Furthermore, for directed graphs, or for undirected graphs with non-uniform node querying costs, the problem is PSPACE-complete. For a semi-adaptive version, in which one may query $r$ nodes each in $k$ rounds, we show membership in $Σ_{2k-1}$ in the polynomial hierarchy, and hardness for $Σ_{2k-5}$.
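For the noiseless case $p = 1$, the median-query strategy mentioned in the abstract can be sketched as follows. This is an illustrative reading of the abstract rather than the authors' code: it assumes a connected undirected graph given as a dictionary of dictionaries `adj[u][v] = weight`, takes "1-median" to mean the vertex minimizing the sum of distances to the remaining candidates, and simulates the query responses with a hypothetical `make_oracle` helper.

```python
import heapq


def all_pairs_shortest_paths(adj):
    """Run Dijkstra from every vertex; adj[u] maps neighbor -> positive weight."""
    dist = {}
    for s in adj:
        d = {s: 0}
        heap = [(0, s)]
        while heap:
            du, u = heapq.heappop(heap)
            if du > d.get(u, float("inf")):
                continue  # stale heap entry
            for v, w in adj[u].items():
                if du + w < d.get(v, float("inf")):
                    d[v] = du + w
                    heapq.heappush(heap, (du + w, v))
        dist[s] = d
    return dist


def make_oracle(adj, dist, target):
    """Simulated noiseless responder: returns None if q is the target,
    otherwise a neighbor of q on some shortest path from q to the target."""
    def oracle(q):
        if q == target:
            return None
        for u, w in adj[q].items():
            if abs(w + dist[u][target] - dist[q][target]) < 1e-9:
                return u
    return oracle


def find_target(adj, oracle):
    """Repeatedly query a vertex minimizing total distance to the candidates,
    then keep only candidates consistent with the returned edge."""
    dist = all_pairs_shortest_paths(adj)
    candidates = set(adj)
    queries = 0
    while len(candidates) > 1:
        q = min(adj, key=lambda u: sum(dist[u][v] for v in candidates))
        queries += 1
        u = oracle(q)
        if u is None:
            return q, queries
        w = adj[q][u]
        # Keep v only if edge (q, u) lies on a shortest path from q to v.
        candidates = {
            v for v in candidates
            if abs(dist[q][v] - (w + dist[u][v])) < 1e-9
        }
    return next(iter(candidates)), queries


if __name__ == "__main__":
    # Path graph 0 - 1 - ... - 7 with unit weights.
    adj = {i: {} for i in range(8)}
    for i in range(7):
        adj[i][i + 1] = 1
        adj[i + 1][i] = 1
    dist = all_pairs_shortest_paths(adj)
    print(find_target(adj, make_oracle(adj, dist, 6)))
```

On a path graph the procedure behaves like ordinary binary search; the sketch precomputes all shortest paths for clarity, which is not tuned for large graphs.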
Submitted 28 July, 2017; v1 submitted 2 March, 2015;
originally announced March 2015.
-
Stability of Influence Maximization
Authors:
Xinran He,
David Kempe
Abstract:
The present article serves as an erratum to our paper of the same title, which was presented and published in the KDD 2014 conference. In that article, we claimed falsely that the objective function defined in Section 1.4 is non-monotone submodular. We are deeply indebted to Debmalya Mandal, Jean Pouget-Abadie and Yaron Singer for bringing to our attention a counter-example to that claim.
Subsequent to becoming aware of the counter-example, we have shown that the objective function is in fact NP-hard to approximate to within a factor of $O(n^{1-ε})$ for any $ε> 0$.
In an attempt to fix the record, the present article combines the problem motivation, models, and experimental results sections from the original incorrect article with the new hardness result. We would like readers to only cite and use this version (which will remain an unpublished note) instead of the incorrect conference version.
Submitted 15 April, 2015; v1 submitted 19 January, 2015;
originally announced January 2015.
-
User Satisfaction in Competitive Sponsored Search
Authors:
David Kempe,
Brendan Lucier
Abstract:
We present a model of competition between web search algorithms, and study the impact of such competition on user welfare. In our model, search providers compete for customers by strategically selecting which search results to display in response to user queries. Customers, in turn, have private preferences over search results and will tend to use search engines that are more likely to display pages satisfying their demands.
Our main question is whether competition between search engines increases the overall welfare of the users (i.e., the likelihood that a user finds a page of interest). When search engines derive utility only from customers to whom they show relevant results, we show that they differentiate their results, and every equilibrium of the resulting game achieves at least half of the welfare that could be obtained by a social planner. This bound also applies whenever the likelihood of selecting a given engine is a convex function of the probability that a user's demand will be satisfied, which includes natural Markovian models of user behavior.
On the other hand, when search engines derive utility from all customers (independent of search result relevance) and the customer demand functions are not convex, there are instances in which the (unique) equilibrium involves no differentiation between engines and a high degree of randomness in search results. This can degrade social welfare by a factor of the square root of N relative to the social optimum, where N is the number of webpages. These bad equilibria persist even when search engines can extract only small (but non-zero) expected revenue from dissatisfied users, and much higher revenue from satisfied ones.
Submitted 15 October, 2013;
originally announced October 2013.
-
Pricing Public Goods for Private Sale
Authors:
Michal Feldman,
David Kempe,
Brendan Lucier,
Renato Paes Leme
Abstract:
We consider the pricing problem faced by a seller who assigns a price to a good that confers its benefits not only to its buyers, but also to other individuals around them. For example, a snow-blower is potentially useful not only to the household that buys it, but also to others on the same street. Given that the seller is constrained to selling such a (locally) public good via individual private sales, how should he set his prices given the distribution of values held by the agents?
We study this problem as a two-stage game. In the first stage, the seller chooses and announces a price for the product. In the second stage, the agents (each having a private value for the good) decide simultaneously whether or not they will buy the product. In the resulting game, which can exhibit a multiplicity of equilibria, agents must strategize about whether they will themselves purchase the good to receive its benefits.
In the case of a fully public good (where all agents benefit whenever any agent purchases), we describe a pricing mechanism that is approximately revenue-optimal (up to a constant factor) when values are drawn from a regular distribution. We then study settings in which the good is only "locally" public: agents are arranged in a network and share benefits only with their neighbors. We describe a pricing method that approximately maximizes revenue, in the worst case over equilibria of agent behavior, for any $d$-regular network. Finally, we show that approximately optimal prices can be found for general networks in the special case that private values are drawn from a uniform distribution. We also discuss some barriers to extending these results to general networks and regular distributions.
Submitted 1 May, 2013;
originally announced May 2013.
-
Selection and Influence in Cultural Dynamics
Authors:
David Kempe,
Jon Kleinberg,
Sigal Oren,
Aleksandrs Slivkins
Abstract:
One of the fundamental principles driving diversity or homogeneity in domains such as cultural differentiation, political affiliation, and product adoption is the tension between two forces: influence (the tendency of people to become similar to others they interact with) and selection (the tendency to be affected most by the behavior of others who are already similar). Influence tends to promote homogeneity within a society, while selection frequently causes fragmentation. When both forces act simultaneously, it becomes an interesting question to analyze which societal outcomes should be expected.
To study this issue more formally, we analyze a natural stylized model built upon active lines of work in political opinion formation, cultural diversity, and language evolution. We assume that the population is partitioned into "types" according to some traits (such as language spoken or political affiliation). While all types of people interact with one another, only people with sufficiently similar types can possibly influence one another. The "similarity" is captured by a graph on types in which individuals of the same or adjacent types can influence one another. We achieve an essentially complete characterization of (stable) equilibrium outcomes and prove convergence from all starting states. We also consider generalizations of this model.
Submitted 27 October, 2015; v1 submitted 28 April, 2013;
originally announced April 2013.
-
Bayesian Auctions with Friends and Foes
Authors:
Po-An Chen,
David Kempe
Abstract:
We study auctions whose bidders are embedded in a social or economic network. As a result, even bidders who do not win the auction themselves might derive utility from the auction, namely, when a friend wins. On the other hand, when an enemy or competitor wins, a bidder might derive negative utility. Such spite and altruism will alter the bidding strategies. A simple and natural model for bidders' utilities in these settings posits that the utility of a losing bidder i as a result of bidder j winning is a constant (positive or negative) fraction of bidder j's utility.
Submitted 11 April, 2012; v1 submitted 27 March, 2012;
originally announced March 2012.
-
Low-distortion Inference of Latent Similarities from a Multiplex Social Network
Authors:
Ittai Abraham,
Shiri Chechik,
David Kempe,
Aleksandrs Slivkins
Abstract:
Much of social network analysis is - implicitly or explicitly - predicated on the assumption that individuals tend to be more similar to their friends than to strangers. Thus, an observed social network provides a noisy signal about the latent underlying "social space:" the way in which individuals are similar or dissimilar. Many research questions frequently addressed via social network analysis are in reality questions about this social space, raising the question of inverting the process: Given a social network, how accurately can we reconstruct the social structure of similarities and dissimilarities?
We begin to address this problem formally. Observed social networks are usually multiplex, in the sense that they reflect (dis)similarities in several different "categories," such as geographical proximity, kinship, or similarity of professions/hobbies. We assume that each such category is characterized by a latent metric capturing (dis)similarities in this category. Each category gives rise to a separate social network: a random graph parameterized by this metric. For a concrete model, we consider Kleinberg's small world model and some variations thereof. The observed social network is the unlabeled union of these graphs, i.e., the presence or absence of edges can be observed, but not their origins. Our main result is an algorithm which reconstructs each metric with provably low distortion.
Submitted 14 August, 2014; v1 submitted 4 February, 2012;
originally announced February 2012.
-
The Robust Price of Anarchy of Altruistic Games
Authors:
Po-An Chen,
Bart de Keijzer,
David Kempe,
Guido Schaefer
Abstract:
We study the inefficiency of equilibria for various classes of games when players are (partially) altruistic. We model altruistic behavior by assuming that player i's perceived cost is a convex combination of 1-α_i times his direct cost and α_i times the social cost. Tuning the parameters α_i allows smooth interpolation between purely selfish and purely altruistic behavior. Within this framework, we study altruistic extensions of linear congestion games, fair cost-sharing games and valid utility games.
We derive (tight) bounds on the price of anarchy of these games for several solution concepts. Thereto, we suitably adapt the smoothness notion introduced by Roughgarden and show that it captures the essential properties to determine the robust price of anarchy of these games. Our bounds show that for congestion games and cost-sharing games, the worst-case robust price of anarchy increases with increasing altruism, while for valid utility games, it remains constant and is not affected by altruism. However, the increase in the price of anarchy is not a universal phenomenon: for symmetric singleton linear congestion games, we derive a bound on the pure price of anarchy that decreases as the level of altruism increases. Since the bound is also strictly lower than the robust price of anarchy, it exhibits a natural example in which Nash equilibria are more efficient than more permissive notions of equilibrium.
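The altruism model in this abstract is a one-line formula, and a small sketch makes the interpolation concrete. The function name `perceived_cost` and the numeric values are illustrative assumptions, not taken from the paper.

```python
def perceived_cost(direct_cost, social_cost, alpha):
    """Perceived cost of a player with altruism level alpha in [0, 1]:
    (1 - alpha) times the direct cost plus alpha times the social cost."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1] for a convex combination")
    return (1 - alpha) * direct_cost + alpha * social_cost


if __name__ == "__main__":
    # Hypothetical numbers: direct cost 4, social cost 10.
    for alpha in (0.0, 0.5, 1.0):
        print(alpha, perceived_cost(4.0, 10.0, alpha))
```

Setting `alpha = 0` recovers a purely selfish player and `alpha = 1` a purely altruistic one, matching the interpolation described in the abstract.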
Submitted 20 February, 2013; v1 submitted 15 December, 2011;
originally announced December 2011.
-
You Share, I Share: Network Effects and Economic Incentives in P2P File-Sharing Systems
Authors:
Mahyar Salek,
Shahin Shayandeh,
David Kempe
Abstract:
We study the interaction between network effects and external incentives on file sharing behavior in Peer-to-Peer (P2P) networks. Many current or envisioned P2P networks reward individuals for sharing files, via financial incentives or social recognition. Peers weigh this reward against the cost of sharing incurred when others download the shared file. As a result, if other nearby nodes share files as well, the cost to an individual node decreases. Such positive network sharing effects can be expected to increase the rate of peers who share files.
In this paper, we formulate a natural model for the network effects of sharing behavior, which we term the "demand model." We prove that the model has desirable diminishing returns properties, meaning that the network benefit of increasing payments decreases when the payments are already high. This result holds quite generally, for submodular objective functions on the part of the network operator.
In fact, we show a stronger result: the demand model leads to a "coverage process," meaning that there is a distribution over graphs such that reachability under this distribution exactly captures the joint distribution of nodes which end up sharing. The existence of such distributions has advantages in simulating and estimating the performance of the system. We establish this result via a general theorem characterizing which types of models lead to coverage processes, and also show that all coverage processes possess the desirable submodular properties. We complement our theoretical results with experiments on several real-world P2P topologies. We compare our model quantitatively against more naïve models ignoring network effects. A main outcome of the experiments is that a good incentive scheme should make the reward dependent on a node's degree in the network.
Submitted 3 August, 2011; v1 submitted 27 July, 2011;
originally announced July 2011.
-
False-name-proof Mechanisms for Hiring a Team
Authors:
Atsushi Iwasaki,
David Kempe,
Mahyar Salek,
Makoto Yokoo
Abstract:
We study the problem of hiring a team of selfish agents to perform a task. Each agent is assumed to own one or more elements of a set system, and the auctioneer is trying to purchase a feasible solution by conducting an auction. Our goal is to design auctions that are truthful and false-name-proof, meaning that it is in the agents' best interest to reveal ownership of all elements (which may not be known to the auctioneer a priori) as well as their true incurred costs.
We first propose and analyze a false-name-proof mechanism for the special case where each agent owns only one element in reality, but may pretend that this element is in fact a set of multiple elements. We prove that its frugality ratio is bounded by $2^n$, which, up to constants, matches a lower bound of $Ω(2^n)$ for all false-name-proof mechanisms in this scenario. We then propose a second mechanism for the general case in which agents may own multiple elements. It requires the auctioneer to choose a reserve cost a priori, and thus does not always purchase a solution. In return, it is false-name-proof even when agents own multiple elements. We experimentally evaluate the payment (as well as social surplus) of the second mechanism through simulation.
Submitted 12 June, 2011;
originally announced June 2011.
-
Submodular meets Spectral: Greedy Algorithms for Subset Selection, Sparse Approximation and Dictionary Selection
Authors:
Abhimanyu Das,
David Kempe
Abstract:
We study the problem of selecting a subset of k random variables from a large set, in order to obtain the best linear prediction of another variable of interest. This problem can be viewed in the context of both feature selection and sparse approximation. We analyze the performance of widely used greedy heuristics, using insights from the maximization of submodular functions and spectral analysis. We introduce the submodularity ratio as a key quantity to help understand why greedy algorithms perform well even when the variables are highly correlated. Using our techniques, we obtain the strongest known approximation guarantees for this problem, both in terms of the submodularity ratio and the smallest k-sparse eigenvalue of the covariance matrix. We further demonstrate the wide applicability of our techniques by analyzing greedy algorithms for the dictionary selection problem, and significantly improve the previously known guarantees. Our theoretical analysis is complemented by experiments on real-world and synthetic data sets; the experiments show that the submodularity ratio is a stronger predictor of the performance of greedy algorithms than other spectral parameters.
Submitted 24 February, 2011; v1 submitted 19 February, 2011;
originally announced February 2011.
-
Estimating the Average of a Lipschitz-Continuous Function from One Sample
Authors:
Abhimanyu Das,
David Kempe
Abstract:
We study the problem of estimating the average of a Lipschitz continuous function $f$ defined over a metric space, by querying $f$ at only a single point. More specifically, we explore the role of randomness in drawing this sample. Our goal is to find a distribution minimizing the expected estimation error against an adversarially chosen Lipschitz continuous function. Our work falls into the broad class of estimating aggregate statistics of a function from a small number of carefully chosen samples. The general problem has a wide range of practical applications in areas as diverse as sensor networks, social sciences and numerical analysis. However, traditional work in numerical analysis has focused on asymptotic bounds, whereas we are interested in the best algorithm. For arbitrary discrete metric spaces of bounded doubling dimension, we obtain a PTAS for this problem. In the special case when the points lie on a line, the running time improves to an FPTAS. Both algorithms are based on approximately solving a linear program with an infinite set of constraints, by using an approximate separation oracle. For Lipschitz-continuous functions over $[0,1]$, we calculate the precise achievable error as $1-\frac{\sqrt{3}}{2} \approx 0.134$, which improves upon the $\frac{1}{4}$ that is best possible for deterministic algorithms.
Submitted 19 January, 2011;
originally announced January 2011.
-
Frugal and Truthful Auctions for Vertex Covers, Flows, and Cuts
Authors:
David Kempe,
Mahyar Salek,
Cristopher Moore
Abstract:
We study truthful mechanisms for hiring a team of agents in three classes of set systems: Vertex Cover auctions, k-flow auctions, and cut auctions. For Vertex Cover auctions, the vertices are owned by selfish and rational agents, and the auctioneer wants to purchase a vertex cover from them. For k-flow auctions, the edges are owned by the agents, and the auctioneer wants to purchase k edge-disjoint s-t paths, for given s and t. In the same setting, for cut auctions, the auctioneer wants to purchase an s-t cut. Only the agents know their costs, and the auctioneer needs to select a feasible set and payments based on bids made by the agents.
We present constant-competitive truthful mechanisms for all three set systems. That is, the maximum overpayment of the mechanism is within a constant factor of the maximum overpayment of any truthful mechanism, for every set system in the class. The mechanism for Vertex Cover is based on scaling each bid by a multiplier derived from the dominant eigenvector of a certain matrix. The mechanism for k-flows prunes the graph to be minimally (k+1)-connected, and then applies the Vertex Cover mechanism. Similarly, the mechanism for cuts contracts the graph until all s-t paths have length exactly 2, and then applies the Vertex Cover mechanism.
Submitted 13 June, 2011; v1 submitted 16 December, 2009;
originally announced December 2009.
-
On the Bias of Traceroute Sampling; or, Power-law Degree Distributions in Regular Graphs
Authors:
Dimitris Achlioptas,
Aaron Clauset,
David Kempe,
Cristopher Moore
Abstract:
Understanding the structure of the Internet graph is a crucial step for building accurate network models and designing efficient algorithms for Internet applications. Yet, obtaining its graph structure is a surprisingly difficult task, as edges cannot be explicitly queried. Instead, empirical studies rely on traceroutes to build what are essentially single-source, all-destinations, shortest-path trees. These trees only sample a fraction of the network's edges, and a recent paper by Lakhina et al. found empirically that the resulting sample is intrinsically biased. For instance, the observed degree distribution under traceroute sampling exhibits a power law even when the underlying degree distribution is Poisson.
In this paper, we study the bias of traceroute sampling systematically, and, for a very general class of underlying degree distributions, calculate the likely observed distributions explicitly. To do this, we use a continuous-time realization of the process of exposing the BFS tree of a random graph with a given degree distribution, calculate the expected degree distribution of the tree, and show that it is sharply concentrated. As example applications of our machinery, we show how traceroute sampling finds power-law degree distributions in both delta-regular and Poisson-distributed random graphs. Thus, our work puts the observations of Lakhina et al. on a rigorous footing, and extends them to nearly arbitrary degree distributions.
Submitted 29 March, 2006; v1 submitted 3 March, 2005;
originally announced March 2005.