-
Unsupervised Mapping of Arguments of Deverbal Nouns to Their Corresponding Verbal Labels
Authors:
Aviv Weinstein,
Yoav Goldberg
Abstract:
Deverbal nouns are nominal forms of verbs commonly used in written English texts to describe events or actions, as well as their arguments. However, many NLP systems, and in particular pattern-based ones, neglect to handle such nominalized constructions. The solutions that do exist for handling arguments of nominalized constructions are based on semantic annotation and require semantic ontologies, making their applications restricted to a small set of nouns. We propose to adopt instead a more syntactic approach, which maps the arguments of deverbal nouns to the universal-dependency relations of the corresponding verbal construction. We present an unsupervised mechanism, based on contextualized word representations, that enriches universal-dependency trees with dependency arcs denoting arguments of deverbal nouns, using the same labels as the corresponding verbal cases. Because the label set is shared with the verbal case, patterns that were developed for verbs can be applied to the nominal constructions without modification and with high accuracy.
Submitted 24 June, 2023;
originally announced June 2023.
-
HCMD-zero: Learning Value Aligned Mechanisms from Data
Authors:
Jan Balaguer,
Raphael Koster,
Ari Weinstein,
Lucy Campbell-Gillingham,
Christopher Summerfield,
Matthew Botvinick,
Andrea Tacchetti
Abstract:
Artificial learning agents are mediating a larger and larger number of interactions among humans, firms, and organizations, and the intersection between mechanism design and machine learning has been heavily investigated in recent years. However, mechanism design methods often make strong assumptions on how participants behave (e.g. rationality), on the kind of knowledge designers have access to a priori (e.g. access to strong baseline mechanisms), or on what the goal of the mechanism should be (e.g. total welfare). Here we introduce HCMD-zero, a general purpose method to construct mechanisms making none of these three assumptions. HCMD-zero learns to mediate interactions among participants and adjusts the mechanism parameters to make itself more likely to be preferred by participants. It does so by remaining engaged in an electoral contest with copies of itself, thereby accessing direct feedback from participants. We test our method on a stylized resource allocation game that highlights the tension between productivity, equality and the temptation to free ride. HCMD-zero produces a mechanism that is preferred by human participants over a strong baseline, it does so automatically, without requiring prior knowledge, and using human behavioral trajectories sparingly and effectively. Our analysis shows HCMD-zero consistently makes the mechanism policy more and more likely to be preferred by human participants over the course of training, and that it results in a mechanism with an interpretable and intuitive policy.
Submitted 20 May, 2022; v1 submitted 21 February, 2022;
originally announced February 2022.
-
Human-centered mechanism design with Democratic AI
Authors:
Raphael Koster,
Jan Balaguer,
Andrea Tacchetti,
Ari Weinstein,
Tina Zhu,
Oliver Hauser,
Duncan Williams,
Lucy Campbell-Gillingham,
Phoebe Thacker,
Matthew Botvinick,
Christopher Summerfield
Abstract:
Building artificial intelligence (AI) that aligns with human values is an unsolved problem. Here, we developed a human-in-the-loop research pipeline called Democratic AI, in which reinforcement learning is used to design a social mechanism that humans prefer by majority. A large group of humans played an online investment game that involved deciding whether to keep a monetary endowment or to share it with others for collective benefit. Shared revenue was returned to players under two different redistribution mechanisms, one designed by the AI and the other by humans. The AI discovered a mechanism that redressed initial wealth imbalance, sanctioned free riders, and successfully won the majority vote. By optimizing for human preferences, Democratic AI may be a promising method for value-aligned policy innovation.
Submitted 27 January, 2022;
originally announced January 2022.
-
Probing Physics Knowledge Using Tools from Developmental Psychology
Authors:
Luis Piloto,
Ari Weinstein,
Dhruva TB,
Arun Ahuja,
Mehdi Mirza,
Greg Wayne,
David Amos,
Chia-chun Hung,
Matt Botvinick
Abstract:
In order to build agents with a rich understanding of their environment, one key objective is to endow them with a grasp of intuitive physics; an ability to reason about three-dimensional objects, their dynamic interactions, and responses to forces. While some work on this problem has taken the approach of building in components such as ready-made physics engines, other research aims to extract general physical concepts directly from sensory data. In the latter case, one challenge that arises is evaluating the learning system. Research on intuitive physics knowledge in children has long employed a violation of expectations (VOE) method to assess children's mastery of specific physical concepts. We take the novel step of applying this method to artificial learning systems. In addition to introducing the VOE technique, we describe a set of probe datasets inspired by classic test stimuli from developmental psychology. We test a baseline deep learning system on this battery, as well as on a physics learning dataset ("IntPhys") recently posed by another research group. Our results show how the VOE technique may provide a useful tool for tracking physics knowledge in future research.
Submitted 3 April, 2018;
originally announced April 2018.
-
Structure Learning in Motor Control: A Deep Reinforcement Learning Model
Authors:
Ari Weinstein,
Matthew M. Botvinick
Abstract:
Motor adaptation displays a structure-learning effect: adaptation to a new perturbation occurs more quickly when the subject has prior exposure to perturbations with related structure. Although this 'learning-to-learn' effect is well documented, its underlying computational mechanisms are poorly understood. We present a new model of motor structure learning, approaching it from the point of view of deep reinforcement learning. Previous work outside of motor control has shown how recurrent neural networks can account for learning-to-learn effects. We leverage this insight to address motor learning, by importing it into the setting of model-based reinforcement learning. We apply the resulting processing architecture to empirical findings from a landmark study of structure learning in target-directed reaching (Braun et al., 2009), and discuss its implications for a wider range of learning-to-learn phenomena.
Submitted 13 July, 2017; v1 submitted 21 June, 2017;
originally announced June 2017.
-
Inequalities for the Bayes Risk
Authors:
Asaf Weinstein,
Ehud Weinstein
Abstract:
Several inequalities are presented which, in part, generalize inequalities by Weinstein and Weiss, giving rise to new lower bounds for the Bayes risk under squared error loss.
Submitted 21 January, 2014;
originally announced January 2014.
-
On active and passive testing
Authors:
Noga Alon,
Rani Hod,
Amit Weinstein
Abstract:
Given a property of Boolean functions, what is the minimum number of queries required to determine with high probability if an input function satisfies this property or is "far" from satisfying it? This is a fundamental question in Property Testing, where traditionally the testing algorithm is allowed to pick its queries among the entire set of inputs. Balcan, Blais, Blum and Yang have recently suggested to restrict the tester to take its queries from a smaller random subset of polynomial size of the inputs. This model is called active testing, and in the extreme case when the size of the set we can query from is exactly the number of queries performed it is known as passive testing.
We prove that passive or active testing of k-linear functions (that is, sums of k variables among n over Z_2) requires Theta(k*log n) queries, assuming k is not too large. This extends the case k=1 (that is, dictator functions), analyzed by Balcan et al.
We also consider other classes of functions including low degree polynomials, juntas, and partially symmetric functions. Our methods combine algebraic, combinatorial, and probabilistic techniques, including the Talagrand concentration inequality and the Erdős-Rado theorem on Delta-systems.
Submitted 15 November, 2015; v1 submitted 28 July, 2013;
originally announced July 2013.
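As a small illustration of the objects named in the abstract above, the sketch below builds a k-linear function (a sum of k chosen variables over Z_2) and labels uniformly random inputs with it, which is the kind of data a passive tester sees. The names `k_linear` and `passive_sample` are hypothetical and chosen only for this example; the code is not the paper's tester and says nothing about the Theta(k*log n) bound.

```python
import random


def k_linear(indices):
    """Return f(x) = sum of x[i] for i in indices, taken modulo 2.

    With a single index this is a dictator function (the case k = 1).
    """
    def f(x):
        return sum(x[i] for i in indices) % 2
    return f


def passive_sample(f, n, num_samples, rng):
    """Label uniformly random n-bit inputs with f.

    The tester does not choose the inputs; it only receives the pairs.
    """
    samples = []
    for _ in range(num_samples):
        x = tuple(rng.randint(0, 1) for _ in range(n))
        samples.append((x, f(x)))
    return samples


# Example: a 2-linear function on 4 variables, x_0 + x_2 (mod 2).
f = k_linear([0, 2])
samples = passive_sample(f, n=4, num_samples=10, rng=random.Random(0))
```

In the active model described in the abstract, the tester would instead pick its queries from a random subset of the inputs, rather than receiving every labeled pair.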
-
Local Correction with Constant Error Rate
Authors:
Noga Alon,
Amit Weinstein
Abstract:
A Boolean function f of n variables is said to be q-locally correctable if, given a black-box access to a function g which is "close" to an isomorphism f_sigma(x)=f_sigma(x_1, ..., x_n) = f(x_sigma(1), ..., x_sigma(n)) of f, we can compute f_sigma(x) for any x in {0,1}^n with good probability using q queries to g. It is known that degree d polynomials are O(2^d)-locally correctable, and that most k-juntas are O(k log k)-locally correctable, where the closeness parameter, or more precisely the distance between g and f_sigma, is required to be exponentially small (in d and k respectively).
In this work we relax the requirement for the closeness parameter by allowing the distance between the functions to be a constant. We first investigate the family of juntas, and show that almost every k-junta is O(k log^2 k)-locally correctable for any distance epsilon < 0.001. A similar result is shown for the family of partially symmetric functions, that is functions which are indifferent to any reordering of all but a constant number of their variables. For both families, the algorithms provided here use non-adaptive queries and are applicable to most but not all functions of each family (as it is shown to be impossible to locally correct all of them).
Our approach utilizes the measure of symmetric influence introduced in the recent analysis of testing partial symmetry of functions.
Submitted 20 October, 2012;
originally announced October 2012.
-
Partially Symmetric Functions are Efficiently Isomorphism-Testable
Authors:
Eric Blais,
Amit Weinstein,
Yuichi Yoshida
Abstract:
Given a function f: {0,1}^n \to {0,1}, the f-isomorphism testing problem requires a randomized algorithm to distinguish functions that are identical to f up to relabeling of the input variables from functions that are far from being so. An important open question in property testing is to determine for which functions f we can test f-isomorphism with a constant number of queries. Despite much recent attention to this question, essentially only two classes of functions were known to be efficiently isomorphism testable: symmetric functions and juntas.
We unify and extend these results by showing that all partially symmetric functions (functions invariant to the reordering of all but a constant number of their variables) are efficiently isomorphism-testable. This class of functions, first introduced by Shannon, includes symmetric functions, juntas, and many other functions as well. We conjecture that these functions are essentially the only functions efficiently isomorphism-testable.
To prove our main result, we also show that partial symmetry is efficiently testable. In turn, to prove this result we had to revisit the junta testing problem. We provide a new proof of correctness of the nearly-optimal junta tester. Our new proof replaces the Fourier machinery of the original proof with a purely combinatorial argument that exploits the connection between sets of variables with low influence and intersecting families.
Another important ingredient in our proofs is a new notion of symmetric influence. We use this measure of influence to prove that partial symmetry is efficiently testable and also to construct an efficient sample extractor for partially symmetric functions. We then combine the sample extractor with the testing-by-implicit-learning approach to complete the proof that partially symmetric functions are efficiently isomorphism-testable.
Submitted 24 December, 2011;
originally announced December 2011.
-
Recovering a Clipped Signal in Sparseland
Authors:
Alejandro J. Weinstein,
Michael B. Wakin
Abstract:
In many data acquisition systems it is common to observe signals whose amplitudes have been clipped. We present two new algorithms for recovering a clipped signal by leveraging the model assumption that the underlying signal is sparse in the frequency domain. Both algorithms employ ideas commonly used in the field of Compressive Sensing; the first is a modified version of Reweighted $\ell_1$ minimization, and the second is a modification of a simple greedy algorithm known as Trivial Pursuit. An empirical investigation shows that both approaches can recover signals with significant levels of clipping.
Submitted 23 October, 2011;
originally announced October 2011.
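To make the setting of the abstract above concrete, the following sketch generates a signal that is sparse in the frequency domain, clips it, and finds the samples that survived unclipped. It illustrates only the observation model; it does not implement either of the two recovery algorithms. The function names, the signal length, and the clipping level are assumptions chosen for this example.

```python
import math


def sparse_signal(n, components):
    """Sum of a few cosines; components is a list of (frequency_bin, amplitude)."""
    return [
        sum(a * math.cos(2 * math.pi * k * t / n) for k, a in components)
        for t in range(n)
    ]


def clip(signal, level):
    """Saturate every sample to the range [-level, level]."""
    return [max(-level, min(level, s)) for s in signal]


def unclipped_indices(clipped, level):
    """Indices whose magnitude is strictly below the clipping level."""
    return [t for t, s in enumerate(clipped) if abs(s) < level]


# Two active frequency bins out of 64 samples, clipped at 0.8.
x = sparse_signal(64, [(3, 1.0), (7, 0.5)])
y = clip(x, 0.8)
reliable = unclipped_indices(y, 0.8)
```

A recovery method would treat the samples in `reliable` as exact measurements of the sparse signal and the remaining samples as known only to lie at or beyond the clipping level.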
-
Local Correction of Juntas
Authors:
Noga Alon,
Amit Weinstein
Abstract:
A Boolean function f over n variables is said to be q-locally correctable if, given a black-box access to a function g which is "close" to an isomorphism f_sigma of f, we can compute f_sigma(x) for any x in Z_2^n with good probability using q queries to g.
We observe that any k-junta, that is, any function which depends only on k of its input variables, is O(2^k)-locally correctable. Moreover, we show that there are examples where this is essentially best possible, and locally correcting some k-juntas requires a number of queries which is exponential in k. These examples, however, are far from being typical, and indeed we prove that for almost every k-junta, O(k log k) queries suffice.
Submitted 24 December, 2011; v1 submitted 16 September, 2011;
originally announced September 2011.
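The two definitions used in the abstract above, a k-junta and an isomorphism f_sigma of f, can be sketched in a few lines. The code assumes the convention f_sigma(x) = f(x_sigma(1), ..., x_sigma(n)) with 0-based indices, and it finds the relevant variables by brute force over all 2^n inputs, so it is only usable for tiny n. The helper names are hypothetical, and this is not the local correction algorithm itself.

```python
from itertools import product


def permute(f, sigma):
    """Return f_sigma, where f_sigma(x) = f(x[sigma[0]], ..., x[sigma[n-1]])."""
    def f_sigma(x):
        return f(tuple(x[s] for s in sigma))
    return f_sigma


def relevant_variables(f, n):
    """Set of indices i such that flipping x[i] changes f(x) for some input x."""
    relevant = set()
    for x in product((0, 1), repeat=n):
        for i in range(n):
            flipped = x[:i] + (1 - x[i],) + x[i + 1:]
            if f(x) != f(flipped):
                relevant.add(i)
    return relevant


# A 2-junta on 4 variables: the AND of the first two inputs.
f = lambda x: x[0] & x[1]
f_sigma = permute(f, (2, 3, 0, 1))  # now depends on the last two inputs
```

Here `relevant_variables(f, 4)` has size 2, so f is a 2-junta, and relabeling the variables moves the relevant set without changing its size.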
-
Simultaneous communication in noisy channels
Authors:
Amit Weinstein
Abstract:
A sender wishes to broadcast a message of length $n$ over an alphabet to $r$ users, where each user $i$, $1 \leq i \leq r$, should be able to receive one of $m_i$ possible messages. The broadcast channel has noise for each of the users (possibly different noise for different users), who cannot distinguish between some pairs of letters. The vector $(m_1, m_2, \ldots, m_r)_{(n)}$ is said to be feasible if length $n$ encoding and decoding schemes exist enabling every user to decode his message. A rate vector $(R_1, R_2, \ldots, R_r)$ is feasible if there exists a sequence of feasible vectors $(m_1, m_2, \ldots, m_r)_{(n)}$ such that $R_i = \lim_{n \to \infty} \frac{\log_2 m_i}{n}$ for all $i$. We determine the feasible rate vectors for several different scenarios and investigate some of their properties. An interesting case discussed is when one user can only distinguish between all the letters in a subset of the alphabet. Tight restrictions on the feasible rate vectors for some specific noise types for the other users are provided. The simplest non-trivial cases of two users and alphabet of size three are fully characterized. To this end a more general previously known result, for which we sketch an alternative proof, is used. This problem generalizes the study of the Shannon capacity of a graph, by considering more than a single user.
Submitted 24 December, 2011; v1 submitted 9 May, 2010;
originally announced May 2010.
-
Broadcasting with side information
Authors:
Noga Alon,
Avinatan Hasidim,
Eyal Lubetzky,
Uri Stav,
Amit Weinstein
Abstract:
A sender holds a word x consisting of n blocks x_i, each of t bits, and wishes to broadcast a codeword to m receivers, R_1,...,R_m. Each receiver R_i is interested in one block, and has prior side information consisting of some subset of the other blocks. Let β_t be the minimum number of bits that has to be transmitted when each block is of length t, and let β be the limit β = \lim_{t \to \infty} β_t/t. In words, β is the average communication cost per bit in each block (for long blocks). Finding the coding rate β, for such an informed broadcast setting, generalizes several coding theoretic parameters related to Informed Source Coding on Demand, Index Coding and Network Coding.
In this work we show that usage of large data blocks may strictly improve upon the trivial encoding which treats each bit in the block independently. To this end, we provide general bounds on β_t, and prove that for any constant C there is an explicit broadcast setting in which β = 2 but β_1 > C. One of these examples answers a question of Lubetzky and Stav.
In addition, we provide examples with the following counterintuitive direct-sum phenomena. Consider a union of several mutually independent broadcast settings. The optimal code for the combined setting may yield a significant saving in communication over concatenating optimal encodings for the individual settings. This result also provides new non-linear coding schemes which improve upon the largest known gap between linear and non-linear Network Coding, thus improving the results of Dougherty, Freiling, and Zeger.
The proofs use ideas related to Witsenhausen's rate, OR graph products, colorings of Cayley graphs and the chromatic numbers of Kneser graphs.
Submitted 19 June, 2008;
originally announced June 2008.
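The broadcast setting in the abstract above can be illustrated with the standard two-receiver textbook case, which is not one of the paper's constructions: R_1 wants x_1 and already knows x_2, while R_2 wants x_2 and already knows x_1. Sending the bitwise XOR of the two blocks costs t bits instead of the 2t bits needed to send both blocks. Blocks are modeled as Python integers, and the function names are hypothetical.

```python
def encode(x1, x2):
    """Broadcast a single t-bit codeword: the bitwise XOR of the two blocks."""
    return x1 ^ x2


def decode(codeword, side_information):
    """Recover the wanted block from the codeword and the block already known."""
    return codeword ^ side_information


# Two 8-bit blocks (t = 8).
x1, x2 = 0b10110010, 0b01101100
c = encode(x1, x2)
block_for_r1 = decode(c, x2)  # R_1 knows x2 and recovers x1
block_for_r2 = decode(c, x1)  # R_2 knows x1 and recovers x2
```

The abstract's results concern settings where such savings depend on the block length t, which this fixed-size toy case does not capture.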