-
Towards Responsible Development of Generative AI for Education: An Evaluation-Driven Approach
Authors:
Irina Jurenka,
Markus Kunesch,
Kevin R. McKee,
Daniel Gillick,
Shaojian Zhu,
Sara Wiltberger,
Shubham Milind Phal,
Katherine Hermann,
Daniel Kasenberg,
Avishkar Bhoopchand,
Ankit Anand,
Miruna Pîslar,
Stephanie Chan,
Lisa Wang,
Jennifer She,
Parsa Mahmoudieh,
Aliya Rysbek,
Wei-Jen Ko,
Andrea Huber,
Brett Wiltshire,
Gal Elidan,
Roni Rabin,
Jasmin Rubinovitz,
Amit Pitaru,
Mac McAllister
, et al. (49 additional authors not shown)
Abstract:
A major challenge facing the world is the provision of equitable and universal access to quality education. Recent advances in generative AI (gen AI) have created excitement about the potential of new technologies to offer a personal tutor for every learner and a teaching assistant for every teacher. The full extent of this dream, however, has not yet materialised. We argue that this is primarily…
▽ More
A major challenge facing the world is the provision of equitable and universal access to quality education. Recent advances in generative AI (gen AI) have created excitement about the potential of new technologies to offer a personal tutor for every learner and a teaching assistant for every teacher. The full extent of this dream, however, has not yet materialised. We argue that this is primarily due to the difficulties with verbalising pedagogical intuitions into gen AI prompts and the lack of good evaluation practices, reinforced by the challenges in defining excellent pedagogy. Here we present our work collaborating with learners and educators to translate high level principles from learning science into a pragmatic set of seven diverse educational benchmarks, spanning quantitative, qualitative, automatic and human evaluations; and to develop a new set of fine-tuning datasets to improve the pedagogical capabilities of Gemini, introducing LearnLM-Tutor. Our evaluations show that LearnLM-Tutor is consistently preferred over a prompt tuned Gemini by educators and learners on a number of pedagogical dimensions. We hope that this work can serve as a first step towards developing a comprehensive educational evaluation framework, and that this can enable rapid progress within the AI and EdTech communities towards maximising the positive impact of gen AI in education.
△ Less
Submitted 19 July, 2024; v1 submitted 21 May, 2024;
originally announced July 2024.
-
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Authors:
Gemini Team,
Petko Georgiev,
Ving Ian Lei,
Ryan Burnell,
Libin Bai,
Anmol Gulati,
Garrett Tanzer,
Damien Vincent,
Zhufeng Pan,
Shibo Wang,
Soroosh Mariooryad,
Yifan Ding,
Xinyang Geng,
Fred Alcober,
Roy Frostig,
Mark Omernick,
Lexi Walker,
Cosmin Paduraru,
Christina Sorokin,
Andrea Tacchetti,
Colin Gaffney,
Samira Daruki,
Olcan Sercinoglu,
Zach Gleicher,
Juliette Love
, et al. (1110 additional authors not shown)
Abstract:
In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…
▽ More
In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.
△ Less
Submitted 8 August, 2024; v1 submitted 8 March, 2024;
originally announced March 2024.
-
Gemini: A Family of Highly Capable Multimodal Models
Authors:
Gemini Team,
Rohan Anil,
Sebastian Borgeaud,
Jean-Baptiste Alayrac,
Jiahui Yu,
Radu Soricut,
Johan Schalkwyk,
Andrew M. Dai,
Anja Hauth,
Katie Millican,
David Silver,
Melvin Johnson,
Ioannis Antonoglou,
Julian Schrittwieser,
Amelia Glaese,
Jilin Chen,
Emily Pitler,
Timothy Lillicrap,
Angeliki Lazaridou,
Orhan Firat,
James Molloy,
Michael Isard,
Paul R. Barham,
Tom Hennigan,
Benjamin Lee
, et al. (1325 additional authors not shown)
Abstract:
This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…
▽ More
This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.
△ Less
Submitted 17 June, 2024; v1 submitted 18 December, 2023;
originally announced December 2023.
-
Task-agnostic Continual Learning with Hybrid Probabilistic Models
Authors:
Polina Kirichenko,
Mehrdad Farajtabar,
Dushyant Rao,
Balaji Lakshminarayanan,
Nir Levine,
Ang Li,
Huiyi Hu,
Andrew Gordon Wilson,
Razvan Pascanu
Abstract:
Learning new tasks continuously without forgetting on a constantly changing data distribution is essential for real-world problems but extremely challenging for modern deep learning. In this work we propose HCL, a Hybrid generative-discriminative approach to Continual Learning for classification. We model the distribution of each task and each class with a normalizing flow. The flow is used to lea…
▽ More
Learning new tasks continuously without forgetting on a constantly changing data distribution is essential for real-world problems but extremely challenging for modern deep learning. In this work we propose HCL, a Hybrid generative-discriminative approach to Continual Learning for classification. We model the distribution of each task and each class with a normalizing flow. The flow is used to learn the data distribution, perform classification, identify task changes, and avoid forgetting, all leveraging the invertibility and exact likelihood which are uniquely enabled by the normalizing flow model. We use the generative capabilities of the flow to avoid catastrophic forgetting through generative replay and a novel functional regularization technique. For task identification, we use state-of-the-art anomaly detection techniques based on measuring the typicality of the model's statistics. We demonstrate the strong performance of HCL on a range of continual learning benchmarks such as split-MNIST, split-CIFAR, and SVHN-MNIST.
△ Less
Submitted 24 June, 2021;
originally announced June 2021.
-
Neural Rate Control for Video Encoding using Imitation Learning
Authors:
Hongzi Mao,
Chenjie Gu,
Miaosen Wang,
Angie Chen,
Nevena Lazic,
Nir Levine,
Derek Pang,
Rene Claus,
Marisabel Hechtman,
Ching-Han Chiang,
Cheng Chen,
Jingning Han
Abstract:
In modern video encoders, rate control is a critical component and has been heavily engineered. It decides how many bits to spend to encode each frame, in order to optimize the rate-distortion trade-off over all video frames. This is a challenging constrained planning problem because of the complex dependency among decisions for different video frames and the bitrate constraint defined at the end…
▽ More
In modern video encoders, rate control is a critical component and has been heavily engineered. It decides how many bits to spend to encode each frame, in order to optimize the rate-distortion trade-off over all video frames. This is a challenging constrained planning problem because of the complex dependency among decisions for different video frames and the bitrate constraint defined at the end of the episode.
We formulate the rate control problem as a Partially Observable Markov Decision Process (POMDP), and apply imitation learning to learn a neural rate control policy. We demonstrate that by learning from optimal video encoding trajectories obtained through evolution strategies, our learned policy achieves better encoding efficiency and has minimal constraint violation. In addition to imitating the optimal actions, we find that additional auxiliary losses, data augmentation/refinement and inference-time policy improvements are critical for learning a good rate control policy. We evaluate the learned policy against the rate control policy in libvpx, a widely adopted open source VP9 codec library, in the two-pass variable bitrate (VBR) mode. We show that over a diverse set of real-world videos, our learned policy achieves 8.5% median bitrate reduction without sacrificing video quality.
△ Less
Submitted 9 December, 2020;
originally announced December 2020.
-
Pricing Security in Proof-of-Work Systems
Authors:
George Bissias,
Rainer Böhme,
David Thibodeau,
Brian N. Levine
Abstract:
A key component of security in decentralized blockchains is proof of opportunity cost among block producers. In the case of proof-of-work (PoW), currently used by the most prominent systems, the cost is due to spent computation. In this paper, we characterize the security investment of miners in terms of its cost in fiat money. This enables comparison of security allocations across PoW blockchains…
▽ More
A key component of security in decentralized blockchains is proof of opportunity cost among block producers. In the case of proof-of-work (PoW), currently used by the most prominent systems, the cost is due to spent computation. In this paper, we characterize the security investment of miners in terms of its cost in fiat money. This enables comparison of security allocations across PoW blockchains that generally use different PoW algorithms and reward miners in different cryptocurrency units. We prove that there exists a unique allocation equilibrium, depending on market prices only, that is achieved by both strategic miners (who contemplate the actions of others) and by miners seeking only short-term profit. In fact, the latter will unknowingly compensate for any attempt to deliberately shift security allocation away from equilibrium.
Our conclusions are supported analytically through the development of a Markov decision process, game theoretical analysis, and derivation of no arbitrage conditions. We corroborate those results with empirical evidence from more than two years of blockchain and price data. Overall agreement is strong. We show that between January 1, 2018 and August 1, 2020, market prices predicted security allocation between Bitcoin and Bitcoin Cash with error less than 0.6%. And from the beginning of October 2019, until August 1, 2020, market prices predicted security allocation between Bitcoin and Litecoin with error of 0.45%. These results are further corroborated by our establishment of Granger-causality between change in market prices and change in security allocation.
To demonstrate the practicality of our results, we describe a trustless oracle that leverages the equilibrium to estimate the price ratios of PoW cryptocurrencies from on-chain information only.
△ Less
Submitted 7 December, 2020;
originally announced December 2020.
-
Balancing Constraints and Rewards with Meta-Gradient D4PG
Authors:
Dan A. Calian,
Daniel J. Mankowitz,
Tom Zahavy,
Zhongwen Xu,
Junhyuk Oh,
Nir Levine,
Timothy Mann
Abstract:
Deploying Reinforcement Learning (RL) agents to solve real-world applications often requires satisfying complex system constraints. Often the constraint thresholds are incorrectly set due to the complex nature of a system or the inability to verify the thresholds offline (e.g, no simulator or reasonable offline evaluation procedure exists). This results in solutions where a task cannot be solved w…
▽ More
Deploying Reinforcement Learning (RL) agents to solve real-world applications often requires satisfying complex system constraints. Often the constraint thresholds are incorrectly set due to the complex nature of a system or the inability to verify the thresholds offline (e.g, no simulator or reasonable offline evaluation procedure exists). This results in solutions where a task cannot be solved without violating the constraints. However, in many real-world cases, constraint violations are undesirable yet they are not catastrophic, motivating the need for soft-constrained RL approaches. We present a soft-constrained RL approach that utilizes meta-gradients to find a good trade-off between expected return and minimizing constraint violations. We demonstrate the effectiveness of this approach by showing that it consistently outperforms the baselines across four different MuJoCo domains.
△ Less
Submitted 27 November, 2020; v1 submitted 13 October, 2020;
originally announced October 2020.
-
A maximum-entropy approach to off-policy evaluation in average-reward MDPs
Authors:
Nevena Lazic,
Dong Yin,
Mehrdad Farajtabar,
Nir Levine,
Dilan Gorur,
Chris Harris,
Dale Schuurmans
Abstract:
This work focuses on off-policy evaluation (OPE) with function approximation in infinite-horizon undiscounted Markov decision processes (MDPs). For MDPs that are ergodic and linear (i.e. where rewards and dynamics are linear in some known features), we provide the first finite-sample OPE error bound, extending existing results beyond the episodic and discounted cases. In a more general setting, wh…
▽ More
This work focuses on off-policy evaluation (OPE) with function approximation in infinite-horizon undiscounted Markov decision processes (MDPs). For MDPs that are ergodic and linear (i.e. where rewards and dynamics are linear in some known features), we provide the first finite-sample OPE error bound, extending existing results beyond the episodic and discounted cases. In a more general setting, when the feature dynamics are approximately linear and for arbitrary rewards, we propose a new approach for estimating stationary distributions with function approximation. We formulate this problem as finding the maximum-entropy distribution subject to matching feature expectations under empirical dynamics. We show that this results in an exponential-family distribution whose sufficient statistics are the features, paralleling maximum-entropy approaches in supervised learning. We demonstrate the effectiveness of the proposed OPE approaches in multiple environments.
△ Less
Submitted 17 June, 2020;
originally announced June 2020.
-
Optimization and Generalization of Regularization-Based Continual Learning: a Loss Approximation Viewpoint
Authors:
Dong Yin,
Mehrdad Farajtabar,
Ang Li,
Nir Levine,
Alex Mott
Abstract:
Neural networks have achieved remarkable success in many cognitive tasks. However, when they are trained sequentially on multiple tasks without access to old data, their performance on early tasks tend to drop significantly. This problem is often referred to as catastrophic forgetting, a key challenge in continual learning of neural networks. The regularization-based approach is one of the primary…
▽ More
Neural networks have achieved remarkable success in many cognitive tasks. However, when they are trained sequentially on multiple tasks without access to old data, their performance on early tasks tend to drop significantly. This problem is often referred to as catastrophic forgetting, a key challenge in continual learning of neural networks. The regularization-based approach is one of the primary classes of methods to alleviate catastrophic forgetting. In this paper, we provide a novel viewpoint of regularization-based continual learning by formulating it as a second-order Taylor approximation of the loss function of each task. This viewpoint leads to a unified framework that can be instantiated to derive many existing algorithms such as Elastic Weight Consolidation and Kronecker factored Laplace approximation. Based on this viewpoint, we study the optimization aspects (i.e., convergence) as well as generalization properties (i.e., finite-sample guarantees) of regularization-based continual learning. Our theoretical results indicate the importance of accurate approximation of the Hessian matrix. The experimental results on several benchmarks provide empirical validation of our theoretical findings.
△ Less
Submitted 8 February, 2021; v1 submitted 19 June, 2020;
originally announced June 2020.
-
An empirical investigation of the challenges of real-world reinforcement learning
Authors:
Gabriel Dulac-Arnold,
Nir Levine,
Daniel J. Mankowitz,
Jerry Li,
Cosmin Paduraru,
Sven Gowal,
Todd Hester
Abstract:
Reinforcement learning (RL) has proven its worth in a series of artificial domains, and is beginning to show some successes in real-world scenarios. However, much of the research advances in RL are hard to leverage in real-world systems due to a series of assumptions that are rarely satisfied in practice. In this work, we identify and formalize a series of independent challenges that embody the di…
▽ More
Reinforcement learning (RL) has proven its worth in a series of artificial domains, and is beginning to show some successes in real-world scenarios. However, much of the research advances in RL are hard to leverage in real-world systems due to a series of assumptions that are rarely satisfied in practice. In this work, we identify and formalize a series of independent challenges that embody the difficulties that must be addressed for RL to be commonly deployed in real-world systems. For each challenge, we define it formally in the context of a Markov Decision Process, analyze the effects of the challenge on state-of-the-art learning algorithms, and present some existing attempts at tackling it. We believe that an approach that addresses our set of proposed challenges would be readily deployable in a large number of real world problems. Our proposed challenges are implemented in a suite of continuous control environments called the realworldrl-suite which we propose an as an open-source benchmark.
△ Less
Submitted 4 March, 2021; v1 submitted 24 March, 2020;
originally announced March 2020.
-
Prediction, Consistency, Curvature: Representation Learning for Locally-Linear Control
Authors:
Nir Levine,
Yinlam Chow,
Rui Shu,
Ang Li,
Mohammad Ghavamzadeh,
Hung Bui
Abstract:
Many real-world sequential decision-making problems can be formulated as optimal control with high-dimensional observations and unknown dynamics. A promising approach is to embed the high-dimensional observations into a lower-dimensional latent representation space, estimate the latent dynamics model, then utilize this model for control in the latent space. An important open question is how to lea…
▽ More
Many real-world sequential decision-making problems can be formulated as optimal control with high-dimensional observations and unknown dynamics. A promising approach is to embed the high-dimensional observations into a lower-dimensional latent representation space, estimate the latent dynamics model, then utilize this model for control in the latent space. An important open question is how to learn a representation that is amenable to existing control algorithms? In this paper, we focus on learning representations for locally-linear control algorithms, such as iterative LQR (iLQR). By formulating and analyzing the representation learning problem from an optimal control perspective, we establish three underlying principles that the learned representation should comprise: 1) accurate prediction in the observation space, 2) consistency between latent and observation space dynamics, and 3) low curvature in the latent space transitions. These principles naturally correspond to a loss function that consists of three terms: prediction, consistency, and curvature (PCC). Crucially, to make PCC tractable, we derive an amortized variational bound for the PCC loss function. Extensive experiments on benchmark domains demonstrate that the new variational-PCC learning algorithm benefits from significantly more stable and reproducible training, and leads to superior control performance. Further ablation studies give support to the importance of all three PCC components for learning a good latent space for control.
△ Less
Submitted 10 February, 2020; v1 submitted 3 September, 2019;
originally announced September 2019.
-
Greedy but Cautious: Conditions for Miner Convergence to Resource Allocation Equilibrium
Authors:
George Bissias,
Brian N. Levine,
David Thibodeau
Abstract:
All public blockchains are secured by a proof of opportunity cost among block producers. For example, the security offered by proof-of-work (PoW) systems, like Bitcoin, is due to spent computation; it is work precisely because it cannot be performed for free. In general, more resources provably lost in producing blocks yields more security for the blockchain. When two blockchains share the same me…
▽ More
All public blockchains are secured by a proof of opportunity cost among block producers. For example, the security offered by proof-of-work (PoW) systems, like Bitcoin, is due to spent computation; it is work precisely because it cannot be performed for free. In general, more resources provably lost in producing blocks yields more security for the blockchain. When two blockchains share the same mechanism for providing opportunity cost, as is the case when they share the same PoW algorithm, the two chains compete for resources from block producers. Indeed, if there exists a liquid market between resource types, then theoretically all blockchains will compete for resources. In this paper, we show that there exists a resource allocation equilibrium between any two blockchains, which is essentially driven by the fiat value of reward that each chain offers in return for providing security. We go on to prove that this equilibrium is singular and always achieved provided that block producers behave in a greedy, but cautious fashion. The opposite is true when they are overly greedy: resource allocation oscillates in extremes between the two chains. We show that these results hold both in practice and in a block generation simulation. Finally, we demonstrate several applications of this theory including a trustless price-ratio oracle, increased security for blockchains whose coins have lower fiat value, and a quantification of cost to allocating resources away from the equilibrium.
△ Less
Submitted 26 August, 2019; v1 submitted 19 July, 2019;
originally announced July 2019.
-
Bonded Mining: Difficulty Adjustment by Miner Commitment
Authors:
George Bissias,
David Thibodeau,
Brian N. Levine
Abstract:
Proof-of-work blockchains must implement a difficulty adjustment algorithm (DAA) in order to maintain a consistent inter-arrival time between blocks. Conventional DAAs are essentially feedback controllers, and as such, they are inherently reactive. This approach leaves them susceptible to manipulation and often causes them to either under- or over-correct. We present Bonded Mining, a proactive DAA…
▽ More
Proof-of-work blockchains must implement a difficulty adjustment algorithm (DAA) in order to maintain a consistent inter-arrival time between blocks. Conventional DAAs are essentially feedback controllers, and as such, they are inherently reactive. This approach leaves them susceptible to manipulation and often causes them to either under- or over-correct. We present Bonded Mining, a proactive DAA that works by collecting hash rate commitments secured by bond from miners. The difficulty is set directly from the commitments and the bond is used to penalize miners who deviate from their commitment. We devise a statistical test that is capable of detecting hash rate deviations by utilizing only on-blockchain data. The test is sensitive enough to detect a variety of deviations from commitments, while almost never misclassifying honest miners. We demonstrate in simulation that, under reasonable assumptions, Bonded Mining is more effective at maintaining a target block time than the Bitcoin Cash DAA, one of the newest and most dynamic DAAs currently deployed. In this preliminary work, the lowest hash rate miner our approach supports is 1% of the total and we directly consider only two types of fundamental attacks. Future work will address these limitations.
△ Less
Submitted 5 August, 2019; v1 submitted 29 June, 2019;
originally announced July 2019.
-
Robust Reinforcement Learning for Continuous Control with Model Misspecification
Authors:
Daniel J. Mankowitz,
Nir Levine,
Rae Jeong,
Yuanyuan Shi,
Jackie Kay,
Abbas Abdolmaleki,
Jost Tobias Springenberg,
Timothy Mann,
Todd Hester,
Martin Riedmiller
Abstract:
We provide a framework for incorporating robustness -- to perturbations in the transition dynamics which we refer to as model misspecification -- into continuous control Reinforcement Learning (RL) algorithms. We specifically focus on incorporating robustness into a state-of-the-art continuous control RL algorithm called Maximum a-posteriori Policy Optimization (MPO). We achieve this by learning a…
▽ More
We provide a framework for incorporating robustness -- to perturbations in the transition dynamics which we refer to as model misspecification -- into continuous control Reinforcement Learning (RL) algorithms. We specifically focus on incorporating robustness into a state-of-the-art continuous control RL algorithm called Maximum a-posteriori Policy Optimization (MPO). We achieve this by learning a policy that optimizes for a worst case expected return objective and derive a corresponding robust entropy-regularized Bellman contraction operator. In addition, we introduce a less conservative, soft-robust, entropy-regularized objective with a corresponding Bellman operator. We show that both, robust and soft-robust policies, outperform their non-robust counterparts in nine Mujoco domains with environment perturbations. In addition, we show improved robust performance on a high-dimensional, simulated, dexterous robotic hand. Finally, we present multiple investigative experiments that provide a deeper insight into the robustness framework. This includes an adaptation to another continuous control RL algorithm as well as learning the uncertainty set from offline data. Performance videos can be found online at https://sites.google.com/view/robust-rl.
△ Less
Submitted 11 February, 2020; v1 submitted 18 June, 2019;
originally announced June 2019.
-
Improved Knowledge Distillation via Teacher Assistant
Authors:
Seyed-Iman Mirzadeh,
Mehrdad Farajtabar,
Ang Li,
Nir Levine,
Akihiro Matsukawa,
Hassan Ghasemzadeh
Abstract:
Despite the fact that deep neural networks are powerful models and achieve appealing results on many tasks, they are too large to be deployed on edge devices like smartphones or embedded sensor nodes. There have been efforts to compress these networks, and a popular method is knowledge distillation, where a large (teacher) pre-trained network is used to train a smaller (student) network. However,…
▽ More
Despite the fact that deep neural networks are powerful models and achieve appealing results on many tasks, they are too large to be deployed on edge devices like smartphones or embedded sensor nodes. There have been efforts to compress these networks, and a popular method is knowledge distillation, where a large (teacher) pre-trained network is used to train a smaller (student) network. However, in this paper, we show that the student network performance degrades when the gap between student and teacher is large. Given a fixed student network, one cannot employ an arbitrarily large teacher, or in other words, a teacher can effectively transfer its knowledge to students up to a certain size, not smaller. To alleviate this shortcoming, we introduce multi-step knowledge distillation, which employs an intermediate-sized network (teacher assistant) to bridge the gap between the student and the teacher. Moreover, we study the effect of teacher assistant size and extend the framework to multi-step distillation. Theoretical analysis and extensive experiments on CIFAR-10,100 and ImageNet datasets and on CNN and ResNet architectures substantiate the effectiveness of our proposed approach.
△ Less
Submitted 16 December, 2019; v1 submitted 9 February, 2019;
originally announced February 2019.
-
Using Economic Risk to Model Miner Hash Rate Allocation in Cryptocurrencies
Authors:
George Bissias,
Brian N. Levine,
David Thibodeau
Abstract:
Abrupt changes in the miner hash rate applied to a proof-of-work (PoW) blockchain can adversely affect user experience and security. Because different PoW blockchains often share hashing algorithms, miners face a complex choice in deciding how to allocate their hash power among chains. We present an economic model that leverages Modern Portfolio Theory to predict a miner's allocation over time usi…
▽ More
Abrupt changes in the miner hash rate applied to a proof-of-work (PoW) blockchain can adversely affect user experience and security. Because different PoW blockchains often share hashing algorithms, miners face a complex choice in deciding how to allocate their hash power among chains. We present an economic model that leverages Modern Portfolio Theory to predict a miner's allocation over time using price data and inferred risk tolerance. The model matches actual allocations with mean absolute error within 20% for four out of the top five miners active on both Bitcoin (BTC) and Bitcoin Cash (BCH) blockchains. A model of aggregate allocation across those four miners shows excellent agreement in magnitude with the actual aggregate as well a correlation coefficient of 0.649. The accuracy of the aggregate allocation model is also sufficient to explain major historical changes in inter-block time (IBT) for BCH. Because estimates of miner risk are not time-dependent and our model is otherwise price-driven, we are able to use it to anticipate the effect of a major price shock on hash allocation and IBT in the BCH blockchain. Using a Monte Carlo simulation, we show that, despite mitigation by the new difficulty adjustment algorithm, a price drop of 50% could increase the IBT by 50% for at least a day, with a peak delay of 100%.
△ Less
Submitted 19 June, 2018;
originally announced June 2018.
-
Bobtail: A Proof-of-Work Target that Minimizes Blockchain Mining Variance (Draft)
Authors:
George Bissias,
Brian Neil Levine
Abstract:
Blockchain systems are designed to produce blocks at a constant average rate. The most popular systems currently employ a Proof of Work (PoW) algorithm as a means of creating these blocks. Bitcoin produces, on average, one block every 10 minutes. An unfortunate limitation of all deployed PoW blockchain systems is that the time between blocks has high variance. For example, 5% of the time, Bitcoin'…
▽ More
Blockchain systems are designed to produce blocks at a constant average rate. The most popular systems currently employ a Proof of Work (PoW) algorithm as a means of creating these blocks. Bitcoin produces, on average, one block every 10 minutes. An unfortunate limitation of all deployed PoW blockchain systems is that the time between blocks has high variance. For example, 5% of the time, Bitcoin's inter-block time is at least 40 minutes. This variance impedes the consistent flow of validated transactions through the system. We propose an alternative process for PoW-based block discovery that results in an inter-block time with significantly lower variance. Our algorithm, called Bobtail, generalizes the current algorithm by comparing the mean of the k lowest order statistics to a target. We show that the variance of inter-block times decreases as k increases. If our approach were applied to Bitcoin, about 80% of blocks would be found within 7 to 12 minutes, and nearly every block would be found within 5 to 18 minutes; the average inter-block time would remain at 10 minutes. Further, we show that low-variance mining significantly thwarts doublespend and selfish mining attacks. For Bitcoin and Ethereum currently (k=1), an attacker with 40% of the mining power will succeed with 30% probability when the merchant sets up an embargo of 8 blocks; however, when k>=20, the probability of success falls to less than 1%. Similarly, for Bitcoin and Ethereum currently, a selfish miner with 40% of the mining power will claim about 66% of blocks; however, when k>=5, the same miner will find that selfish mining is less successful than honest mining. The cost of our approach is a larger block header.
△ Less
Submitted 13 August, 2019; v1 submitted 25 September, 2017;
originally announced September 2017.
-
An Extended Relevance Model for Session Search
Authors:
Nir Levine,
Haggai Roitman,
Doron Cohen
Abstract:
The session search task aims at best serving the user's information need given her previous search behavior during the session. We propose an extended relevance model that captures the user's dynamic information need in the session. Our relevance modelling approach is directly driven by the user's query reformulation (change) decisions and the estimate of how much the user's search behavior affect…
▽ More
The session search task aims at best serving the user's information need given her previous search behavior during the session. We propose an extended relevance model that captures the user's dynamic information need in the session. Our relevance modelling approach is directly driven by the user's query reformulation (change) decisions and the estimate of how much the user's search behavior affects such decisions. Overall, we demonstrate that, the proposed approach significantly boosts session search performance.
△ Less
Submitted 7 June, 2017;
originally announced June 2017.
-
Shallow Updates for Deep Reinforcement Learning
Authors:
Nir Levine,
Tom Zahavy,
Daniel J. Mankowitz,
Aviv Tamar,
Shie Mannor
Abstract:
Deep reinforcement learning (DRL) methods such as the Deep Q-Network (DQN) have achieved state-of-the-art results in a variety of challenging, high-dimensional domains. This success is mainly attributed to the power of deep neural networks to learn rich domain representations for approximating the value function or policy. Batch reinforcement learning methods with linear representations, on the ot…
▽ More
Deep reinforcement learning (DRL) methods such as the Deep Q-Network (DQN) have achieved state-of-the-art results in a variety of challenging, high-dimensional domains. This success is mainly attributed to the power of deep neural networks to learn rich domain representations for approximating the value function or policy. Batch reinforcement learning methods with linear representations, on the other hand, are more stable and require less hyper parameter tuning. Yet, substantial feature engineering is necessary to achieve good results. In this work we propose a hybrid approach -- the Least Squares Deep Q-Network (LS-DQN), which combines rich feature representations learned by a DRL algorithm with the stability of a linear least squares method. We do this by periodically re-training the last hidden layer of a DRL network with a batch least squares update. Key to our approach is a Bayesian regularization term for the least squares update, which prevents over-fitting to the more recent data. We tested LS-DQN on five Atari games and demonstrate significant improvement over vanilla DQN and Double-DQN. We also investigated the reasons for the superior performance of our method. Interestingly, we found that the performance improvement can be attributed to the large batch size used by the LS method when optimizing the last layer.
△ Less
Submitted 2 November, 2017; v1 submitted 21 May, 2017;
originally announced May 2017.
-
Rotting Bandits
Authors:
Nir Levine,
Koby Crammer,
Shie Mannor
Abstract:
The Multi-Armed Bandits (MAB) framework highlights the tension between acquiring new knowledge (Exploration) and leveraging available knowledge (Exploitation). In the classical MAB problem, a decision maker must choose an arm at each time step, upon which she receives a reward. The decision maker's objective is to maximize her cumulative expected reward over the time horizon. The MAB problem has b…
▽ More
The Multi-Armed Bandits (MAB) framework highlights the tension between acquiring new knowledge (Exploration) and leveraging available knowledge (Exploitation). In the classical MAB problem, a decision maker must choose an arm at each time step, upon which she receives a reward. The decision maker's objective is to maximize her cumulative expected reward over the time horizon. The MAB problem has been studied extensively, specifically under the assumption of the arms' rewards distributions being stationary, or quasi-stationary, over time. We consider a variant of the MAB framework, which we termed Rotting Bandits, where each arm's expected reward decays as a function of the number of times it has been pulled. We are motivated by many real-world scenarios such as online advertising, content recommendation, crowdsourcing, and more. We present algorithms, accompanied by simulations, and derive theoretical guarantees.
△ Less
Submitted 2 November, 2017; v1 submitted 23 February, 2017;
originally announced February 2017.
-
An Explanation of Nakamoto's Analysis of Double-spend Attacks
Authors:
A. Pinar Ozisik,
Brian Neil Levine
Abstract:
The fundamental attack against blockchain systems is the double-spend attack. In this tutorial, we provide a very detailed explanation of just one section of Satoshi Nakamoto's original paper where the attack's probability of success is stated. We show the derivation of the mathematics relied upon by Nakamoto to create a model of the attack. We also validate the model with a Monte Carlo simulation…
▽ More
The fundamental attack against blockchain systems is the double-spend attack. In this tutorial, we provide a very detailed explanation of just one section of Satoshi Nakamoto's original paper where the attack's probability of success is stated. We show the derivation of the mathematics relied upon by Nakamoto to create a model of the attack. We also validate the model with a Monte Carlo simulation, and we determine which model component is not perfect.
△ Less
Submitted 14 January, 2017;
originally announced January 2017.
-
An Analysis of Attacks on Blockchain Consensus
Authors:
George Bissias,
Brian Neil Levine,
A. Pinar Ozisik,
Gavin Andresen
Abstract:
We present and validate a novel mathematical model of the blockchain mining process and use it to conduct an economic evaluation of the double-spend attack, which is fundamental to all blockchain systems. Our analysis focuses on the value of transactions that can be secured under a conventional double-spend attack, both with and without a concurrent eclipse attack. Our model quantifies the importa…
▽ More
We present and validate a novel mathematical model of the blockchain mining process and use it to conduct an economic evaluation of the double-spend attack, which is fundamental to all blockchain systems. Our analysis focuses on the value of transactions that can be secured under a conventional double-spend attack, both with and without a concurrent eclipse attack. Our model quantifies the importance of several factors that determine the attack's success, including confirmation depth, attacker mining power, and any confirmation deadline set by the merchant. In general, the security of a transaction against a double-spend attack increases roughly logarithmically with the depth of the block, made easier by the increasing sum of coin turned-over (between individuals) in the blocks, but more difficult by the increasing proof of work required. In recent blockchain data, we observed a median block turnover value of 6 BTC. Based on this value, a merchant requiring a single confirmation is protected against only attackers that can increase the current mining power by 1% or less. However, similar analysis shows that a merchant that requires a much longer 72 confirmations (~12 hours) will eliminate all potential profit for any double-spend attacker adding mining power less than 40% of the current mining power.
△ Less
Submitted 20 November, 2016; v1 submitted 25 October, 2016;
originally announced October 2016.
-
Actively Learning to Attract Followers on Twitter
Authors:
Nir Levine,
Timothy A. Mann,
Shie Mannor
Abstract:
Twitter, a popular social network, presents great opportunities for on-line machine learning research. However, previous research has focused almost entirely on learning from passively collected data. We study the problem of learning to acquire followers through normative user behavior, as opposed to the mass following policies applied by many bots. We formalize the problem as a contextual bandit…
▽ More
Twitter, a popular social network, presents great opportunities for on-line machine learning research. However, previous research has focused almost entirely on learning from passively collected data. We study the problem of learning to acquire followers through normative user behavior, as opposed to the mass following policies applied by many bots. We formalize the problem as a contextual bandit problem, in which we consider retweeting content to be the action chosen and each tweet (content) is accompanied by context. We design reward signals based on the change in followers. The result of our month long experiment with 60 agents suggests that (1) aggregating experience across agents can adversely impact prediction accuracy and (2) the Twitter community's response to different actions is non-stationary. Our findings suggest that actively learning on-line can provide deeper insights about how to attract followers than machine learning over passively collected data alone.
△ Less
Submitted 16 April, 2015;
originally announced April 2015.