-
Minimax Optimal Submodular Optimization with Bandit Feedback
Authors:
Artin Tajdini,
Lalit Jain,
Kevin Jamieson
Abstract:
We consider maximizing a monotonic, submodular set function $f: 2^{[n]} \rightarrow [0,1]$ under stochastic bandit feedback. Specifically, $f$ is unknown to the learner, but at each time $t=1,\dots,T$ the learner chooses a set $S_t \subset [n]$ with $|S_t| \leq k$ and receives reward $f(S_t) + \eta_t$, where $\eta_t$ is mean-zero sub-Gaussian noise. The objective is to minimize the learner's regret over $T$ rounds with respect to a $(1-e^{-1})$-approximation of the maximum $f(S_*)$ with $|S_*| = k$, obtained through greedy maximization of $f$. To date, the best regret bound in the literature scales as $k n^{1/3} T^{2/3}$, and by trivially treating every set as a unique arm one deduces that $\sqrt{ {n \choose k} T }$ is also achievable. In this work, we establish the first minimax lower bound for this setting that scales like $\mathcal{O}(\min_{i \le k}(in^{1/3}T^{2/3} + \sqrt{n^{k-i}T}))$. Moreover, we propose an algorithm that is capable of matching the lower bound regret.
Submitted 27 October, 2023;
originally announced October 2023.
-
Studying Large Language Model Generalization with Influence Functions
Authors:
Roger Grosse,
Juhan Bae,
Cem Anil,
Nelson Elhage,
Alex Tamkin,
Amirhossein Tajdini,
Benoit Steiner,
Dustin Li,
Esin Durmus,
Ethan Perez,
Evan Hubinger,
Kamilė Lukošiūtė,
Karina Nguyen,
Nicholas Joseph,
Sam McCandlish,
Jared Kaplan,
Samuel R. Bowman
Abstract:
When trying to gain better visibility into a machine learning model in order to understand and mitigate the associated risks, a potentially valuable source of evidence is: which training examples most contribute to a given behavior? Influence functions aim to answer a counterfactual: how would the model's parameters (and hence its outputs) change if a given sequence were added to the training set? While influence functions have produced insights for small models, they are difficult to scale to large language models (LLMs) due to the difficulty of computing an inverse-Hessian-vector product (IHVP). We use the Eigenvalue-corrected Kronecker-Factored Approximate Curvature (EK-FAC) approximation to scale influence functions up to LLMs with up to 52 billion parameters. In our experiments, EK-FAC achieves similar accuracy to traditional influence function estimators despite the IHVP computation being orders of magnitude faster. We investigate two algorithmic techniques to reduce the cost of computing gradients of candidate training sequences: TF-IDF filtering and query batching. We use influence functions to investigate the generalization patterns of LLMs, including the sparsity of the influence patterns, increasing abstraction with scale, math and programming abilities, cross-lingual generalization, and role-playing behavior. Despite many apparently sophisticated forms of generalization, we identify a surprising limitation: influences decay to near-zero when the order of key phrases is flipped. Overall, influence functions give us a powerful new tool for studying the generalization properties of LLMs.
Submitted 7 August, 2023;
originally announced August 2023.
-
Time and Query Optimal Quantum Algorithms Based on Decision Trees
Authors:
Salman Beigi,
Leila Taghavi,
Artin Tajdini
Abstract:
It has recently been shown that starting with a classical query algorithm (decision tree) and a guessing algorithm that tries to predict the query answers, we can design a quantum algorithm with query complexity $O(\sqrt{GT})$ where $T$ is the query complexity of the classical algorithm (depth of the decision tree) and $G$ is the maximum number of wrong answers by the guessing algorithm [arXiv:1410.0932, arXiv:1905.13095]. In this paper we show that, given some constraints on the classical algorithms, this quantum algorithm can be implemented in time $\tilde O(\sqrt{GT})$. Our algorithm is based on non-binary span programs and their efficient implementation. We conclude that various graph theoretic problems including bipartiteness, cycle detection and topological sort can be solved in time $O(n^{3/2}\log n)$ and with $O(n^{3/2})$ quantum queries. Moreover, finding a maximal matching can be solved with $O(n^{3/2})$ quantum queries in time $O(n^{3/2}\log n)$, and maximum bipartite matching can be solved in time $O(n^2\log n)$.
Submitted 16 October, 2022; v1 submitted 18 May, 2021;
originally announced May 2021.
-
On a question of Haemers regarding vectors in the nullspace of Seidel matrices
Authors:
Saieed Akbari,
Sebastian M. Cioabă,
Samira Goudarzi,
Aidin Niaparast,
Artin Tajdini
Abstract:
In 2011, Haemers asked the following question: If $S$ is the Seidel matrix of a graph of order $n$ and $S$ is singular, does there exist an eigenvector of $S$ corresponding to $0$ which has only $\pm 1$ elements?
In this paper, we construct infinite families of graphs which give a negative answer to this question. One of our constructions implies that for every natural number $N$, there exists a graph whose Seidel matrix $S$ is singular such that for any integer vector in the nullspace of $S$, the absolute value of any entry in this vector is more than $N$. We also derive some characteristics of vectors in the nullspace of Seidel matrices, which lead to some necessary conditions for the singularity of Seidel matrices. Finally, we obtain some properties of the graphs which affirm the above question.
Submitted 21 January, 2021; v1 submitted 12 November, 2020;
originally announced November 2020.