Zum Hauptinhalt springen

Showing 1–11 of 11 results for author: Sidor, S

Searching in archive cs. Search in all archives.
.
  1. arXiv:2303.08774  [pdf, other

    cs.CL cs.AI

    GPT-4 Technical Report

    Authors: OpenAI, Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, Red Avila, Igor Babuschkin, Suchir Balaji, Valerie Balcom, Paul Baltescu, Haiming Bao, Mohammad Bavarian, Jeff Belgum, Irwan Bello, Jake Berdine, Gabriel Bernadett-Shapiro, Christopher Berner, Lenny Bogdonoff, Oleg Boiko , et al. (256 additional authors not shown)

    Abstract: We report the development of GPT-4, a large-scale, multimodal model which can accept image and text inputs and produce text outputs. While less capable than humans in many real-world scenarios, GPT-4 exhibits human-level performance on various professional and academic benchmarks, including passing a simulated bar exam with a score around the top 10% of test takers. GPT-4 is a Transformer-based mo… ▽ More

    Submitted 4 March, 2024; v1 submitted 15 March, 2023; originally announced March 2023.

    Comments: 100 pages; updated authors list; fixed author names and added citation

  2. arXiv:2203.03466  [pdf, other

    cs.LG cond-mat.dis-nn cs.NE

    Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer

    Authors: Greg Yang, Edward J. Hu, Igor Babuschkin, Szymon Sidor, Xiaodong Liu, David Farhi, Nick Ryder, Jakub Pachocki, Weizhu Chen, Jianfeng Gao

    Abstract: Hyperparameter (HP) tuning in deep learning is an expensive process, prohibitively so for neural networks (NNs) with billions of parameters. We show that, in the recently discovered Maximal Update Parametrization (muP), many optimal HPs remain stable even as model size changes. This leads to a new HP tuning paradigm we call muTransfer: parametrize the target model in muP, tune the HP indirectly on… ▽ More

    Submitted 28 March, 2022; v1 submitted 7 March, 2022; originally announced March 2022.

    Comments: NeurIPS 2021

  3. arXiv:1912.06680  [pdf, other

    cs.LG stat.ML

    Dota 2 with Large Scale Deep Reinforcement Learning

    Authors: OpenAI, :, Christopher Berner, Greg Brockman, Brooke Chan, Vicki Cheung, Przemysław Dębiak, Christy Dennison, David Farhi, Quirin Fischer, Shariq Hashme, Chris Hesse, Rafal Józefowicz, Scott Gray, Catherine Olsson, Jakub Pachocki, Michael Petrov, Henrique P. d. O. Pinto, Jonathan Raiman, Tim Salimans, Jeremy Schlatter, Jonas Schneider, Szymon Sidor, Ilya Sutskever, Jie Tang , et al. (2 additional authors not shown)

    Abstract: On April 13th, 2019, OpenAI Five became the first AI system to defeat the world champions at an esports game. The game of Dota 2 presents novel challenges for AI systems such as long time horizons, imperfect information, and complex, continuous state-action spaces, all challenges which will become increasingly central to more capable AI systems. OpenAI Five leveraged existing reinforcement learnin… ▽ More

    Submitted 13 December, 2019; originally announced December 2019.

  4. arXiv:1808.00177  [pdf, other

    cs.LG cs.AI cs.RO stat.ML

    Learning Dexterous In-Hand Manipulation

    Authors: OpenAI, Marcin Andrychowicz, Bowen Baker, Maciek Chociej, Rafal Jozefowicz, Bob McGrew, Jakub Pachocki, Arthur Petron, Matthias Plappert, Glenn Powell, Alex Ray, Jonas Schneider, Szymon Sidor, Josh Tobin, Peter Welinder, Lilian Weng, Wojciech Zaremba

    Abstract: We use reinforcement learning (RL) to learn dexterous in-hand manipulation policies which can perform vision-based object reorientation on a physical Shadow Dexterous Hand. The training is performed in a simulated environment in which we randomize many of the physical properties of the system like friction coefficients and an object's appearance. Our policies transfer to the physical robot despite… ▽ More

    Submitted 18 January, 2019; v1 submitted 1 August, 2018; originally announced August 2018.

    Comments: Making OpenAI the first author. We wish this paper to be cited as "Learning Dexterous In-Hand Manipulation" by OpenAI et al. We are replicating the approach from the physics community: arXiv:1812.06489

  5. arXiv:1710.03748  [pdf, other

    cs.AI

    Emergent Complexity via Multi-Agent Competition

    Authors: Trapit Bansal, Jakub Pachocki, Szymon Sidor, Ilya Sutskever, Igor Mordatch

    Abstract: Reinforcement learning algorithms can train agents that solve problems in complex, interesting environments. Normally, the complexity of the trained agent is closely related to the complexity of the environment. This suggests that a highly capable agent requires a complex environment for training. In this paper, we point out that a competitive multi-agent environment trained with self-play can pro… ▽ More

    Submitted 14 March, 2018; v1 submitted 10 October, 2017; originally announced October 2017.

    Comments: Published as a conference paper at ICLR 2018

  6. arXiv:1706.04317  [pdf, other

    cs.AI

    Schema Networks: Zero-shot Transfer with a Generative Causal Model of Intuitive Physics

    Authors: Ken Kansky, Tom Silver, David A. Mély, Mohamed Eldawy, Miguel Lázaro-Gredilla, Xinghua Lou, Nimrod Dorfman, Szymon Sidor, Scott Phoenix, Dileep George

    Abstract: The recent adaptation of deep neural network-based methods to reinforcement learning and planning domains has yielded remarkable progress on individual tasks. Nonetheless, progress on task-to-task transfer remains limited. In pursuit of efficient and robust generalization, we introduce the Schema Network, an object-oriented generative physics simulator capable of disentangling multiple causes of e… ▽ More

    Submitted 17 August, 2017; v1 submitted 14 June, 2017; originally announced June 2017.

  7. arXiv:1706.01905  [pdf, other

    cs.LG cs.AI cs.NE cs.RO stat.ML

    Parameter Space Noise for Exploration

    Authors: Matthias Plappert, Rein Houthooft, Prafulla Dhariwal, Szymon Sidor, Richard Y. Chen, Xi Chen, Tamim Asfour, Pieter Abbeel, Marcin Andrychowicz

    Abstract: Deep reinforcement learning (RL) methods generally engage in exploratory behavior through noise injection in the action space. An alternative is to add noise directly to the agent's parameters, which can lead to more consistent exploration and a richer set of behaviors. Methods such as evolutionary strategies use parameter perturbations, but discard all temporal structure in the process and requir… ▽ More

    Submitted 31 January, 2018; v1 submitted 6 June, 2017; originally announced June 2017.

    Comments: Updated to camera-ready ICLR submission

  8. arXiv:1706.01502  [pdf, ps, other

    cs.LG stat.ML

    UCB Exploration via Q-Ensembles

    Authors: Richard Y. Chen, Szymon Sidor, Pieter Abbeel, John Schulman

    Abstract: We show how an ensemble of $Q^*$-functions can be leveraged for more effective exploration in deep reinforcement learning. We build on well established algorithms from the bandit setting, and adapt them to the $Q$-learning setting. We propose an exploration strategy based on upper-confidence bounds (UCB). Our experiments show significant gains on the Atari benchmark.

    Submitted 7 November, 2017; v1 submitted 5 June, 2017; originally announced June 2017.

  9. arXiv:1703.03864  [pdf, other

    stat.ML cs.AI cs.LG cs.NE

    Evolution Strategies as a Scalable Alternative to Reinforcement Learning

    Authors: Tim Salimans, Jonathan Ho, Xi Chen, Szymon Sidor, Ilya Sutskever

    Abstract: We explore the use of Evolution Strategies (ES), a class of black box optimization algorithms, as an alternative to popular MDP-based RL techniques such as Q-learning and Policy Gradients. Experiments on MuJoCo and Atari show that ES is a viable solution strategy that scales extremely well with the number of CPUs available: By using a novel communication strategy based on common random numbers, ou… ▽ More

    Submitted 7 September, 2017; v1 submitted 10 March, 2017; originally announced March 2017.

  10. arXiv:1602.03203  [pdf, other

    cs.AI

    Time Resource Networks

    Authors: Szymon Sidor, Peng Yu, Cheng Fang, Brian Williams

    Abstract: The problem of scheduling under resource constraints is widely applicable. One prominent example is power management, in which we have a limited continuous supply of power but must schedule a number of power-consuming tasks. Such problems feature tightly coupled continuous resource constraints and continuous temporal constraints. We address such problems by introducing the Time Resource Network… ▽ More

    Submitted 9 February, 2016; originally announced February 2016.

    Comments: 7 pages, submitted for review to IJCAI16

  11. arXiv:1506.08251  [pdf, other

    cs.LG

    Occam's Gates

    Authors: Jonathan Raiman, Szymon Sidor

    Abstract: We present a complimentary objective for training recurrent neural networks (RNN) with gating units that helps with regularization and interpretability of the trained model. Attention-based RNN models have shown success in many difficult sequence to sequence classification problems with long and short term dependencies, however these models are prone to overfitting. In this paper, we describe how… ▽ More

    Submitted 26 June, 2015; originally announced June 2015.

    Comments: In review at NIPS