Zum Hauptinhalt springen

Showing 1–15 of 15 results for author: Cox, D D

Searching in archive cs. Search in all archives.
.
  1. arXiv:2408.13359  [pdf, other

    cs.CL cs.AI cs.LG

    Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler

    Authors: Yikang Shen, Matthew Stallone, Mayank Mishra, Gaoyuan Zhang, Shawn Tan, Aditya Prasad, Adriana Meza Soria, David D. Cox, Rameswar Panda

    Abstract: Finding the optimal learning rate for language model pretraining is a challenging task. This is not only because there is a complicated correlation between learning rate, batch size, number of training tokens, model size, and other hyperparameters but also because it is prohibitively expensive to perform a hyperparameter search for large language models with Billions or Trillions of parameters. Re… ▽ More

    Submitted 23 August, 2024; originally announced August 2024.

  2. arXiv:2407.13739  [pdf, other

    cs.AI cs.CL cs.SE

    Scaling Granite Code Models to 128K Context

    Authors: Matt Stallone, Vaibhav Saxena, Leonid Karlinsky, Bridget McGinn, Tim Bula, Mayank Mishra, Adriana Meza Soria, Gaoyuan Zhang, Aditya Prasad, Yikang Shen, Saptha Surendran, Shanmukha Guttula, Hima Patel, Parameswaran Selvam, Xuan-Hong Dang, Yan Koyfman, Atin Sood, Rogerio Feris, Nirmit Desai, David D. Cox, Ruchir Puri, Rameswar Panda

    Abstract: This paper introduces long-context Granite code models that support effective context windows of up to 128K tokens. Our solution for scaling context length of Granite 3B/8B code models from 2K/4K to 128K consists of a light-weight continual pretraining by gradually increasing its RoPE base frequency with repository-level file packing and length-upsampled long-context data. Additionally, we also re… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

  3. arXiv:2405.04324  [pdf, other

    cs.AI cs.CL cs.SE

    Granite Code Models: A Family of Open Foundation Models for Code Intelligence

    Authors: Mayank Mishra, Matt Stallone, Gaoyuan Zhang, Yikang Shen, Aditya Prasad, Adriana Meza Soria, Michele Merler, Parameswaran Selvam, Saptha Surendran, Shivdeep Singh, Manish Sethi, Xuan-Hong Dang, Pengyuan Li, Kun-Lung Wu, Syed Zawad, Andrew Coleman, Matthew White, Mark Lewis, Raju Pavuluri, Yan Koyfman, Boris Lublinsky, Maximilien de Bayser, Ibrahim Abdelaziz, Kinjal Basu, Mayank Agarwal , et al. (21 additional authors not shown)

    Abstract: Large Language Models (LLMs) trained on code are revolutionizing the software development process. Increasingly, code LLMs are being integrated into software development environments to improve the productivity of human programmers, and LLM-based agents are beginning to show promise for handling complex tasks autonomously. Realizing the full potential of code LLMs requires a wide range of capabili… ▽ More

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: Corresponding Authors: Rameswar Panda, Ruchir Puri; Equal Contributors: Mayank Mishra, Matt Stallone, Gaoyuan Zhang

  4. arXiv:2403.01081  [pdf, other

    cs.CL cs.LG

    LAB: Large-Scale Alignment for ChatBots

    Authors: Shivchander Sudalairaj, Abhishek Bhandwaldar, Aldo Pareja, Kai Xu, David D. Cox, Akash Srivastava

    Abstract: This work introduces LAB (Large-scale Alignment for chatBots), a novel methodology designed to overcome the scalability challenges in the instruction-tuning phase of large language model (LLM) training. Leveraging a taxonomy-guided synthetic data generation process and a multi-phase tuning framework, LAB significantly reduces reliance on expensive human annotations and proprietary models like GPT-… ▽ More

    Submitted 29 April, 2024; v1 submitted 1 March, 2024; originally announced March 2024.

    Comments: Corresponding Author: Akash Srivastava. Equal Contribution: Shivchander Sudalairaj, Abhishek Bhandwaldar, Aldo Pareja, Akash Srivastava, Code: https://github.com/instructlab

  5. arXiv:2304.03767  [pdf, other

    cs.CV

    Embodied Concept Learner: Self-supervised Learning of Concepts and Mapping through Instruction Following

    Authors: Mingyu Ding, Yan Xu, Zhenfang Chen, David Daniel Cox, Ping Luo, Joshua B. Tenenbaum, Chuang Gan

    Abstract: Humans, even at a very early age, can learn visual concepts and understand geometry and layout through active interaction with the environment, and generalize their compositions to complete tasks described by natural languages in novel scenes. To mimic such capability, we propose Embodied Concept Learner (ECL) in an interactive 3D environment. Specifically, a robot agent can ground visual concepts… ▽ More

    Submitted 7 April, 2023; originally announced April 2023.

    Comments: CoRL 2022

  6. arXiv:2303.00980  [pdf, other

    cs.LG

    Learning to Grow Pretrained Models for Efficient Transformer Training

    Authors: Peihao Wang, Rameswar Panda, Lucas Torroba Hennigen, Philip Greengard, Leonid Karlinsky, Rogerio Feris, David Daniel Cox, Zhangyang Wang, Yoon Kim

    Abstract: Scaling transformers has led to significant breakthroughs in many domains, leading to a paradigm in which larger versions of existing models are trained and released on a periodic basis. New instances of such models are typically trained completely from scratch, despite the fact that they are often just scaled-up versions of their smaller counterparts. How can we use the implicit knowledge in the… ▽ More

    Submitted 2 March, 2023; originally announced March 2023.

    Comments: International Conference on Learning Representations (ICLR), 2023

  7. arXiv:2111.06979  [pdf, other

    q-bio.NC cs.LG cs.NE

    Neural Population Geometry Reveals the Role of Stochasticity in Robust Perception

    Authors: Joel Dapello, Jenelle Feather, Hang Le, Tiago Marques, David D. Cox, Josh H. McDermott, James J. DiCarlo, SueYeon Chung

    Abstract: Adversarial examples are often cited by neuroscientists and machine learning researchers as an example of how computational models diverge from biological sensory systems. Recent work has proposed adding biologically-inspired components to visual neural networks as a way to improve their adversarial robustness. One surprisingly effective component for reducing adversarial vulnerability is response… ▽ More

    Submitted 12 November, 2021; originally announced November 2021.

    Comments: 35th Conference on Neural Information Processing Systems (NeurIPS 2021)

  8. arXiv:2012.11587  [pdf, other

    cs.CV cs.AI cs.CL cs.LG

    Object-Centric Diagnosis of Visual Reasoning

    Authors: Jianwei Yang, Jiayuan Mao, Jiajun Wu, Devi Parikh, David D. Cox, Joshua B. Tenenbaum, Chuang Gan

    Abstract: When answering questions about an image, it not only needs knowing what -- understanding the fine-grained contents (e.g., objects, relationships) in the image, but also telling why -- reasoning over grounding visual cues to derive the answer for a question. Over the last few years, we have seen significant progress on visual question answering. Though impressive as the accuracy grows, it still lag… ▽ More

    Submitted 21 December, 2020; originally announced December 2020.

  9. arXiv:2010.13187  [pdf, other

    stat.ML cs.CV cs.LG

    Improving the Reconstruction of Disentangled Representation Learners via Multi-Stage Modeling

    Authors: Akash Srivastava, Yamini Bansal, Yukun Ding, Cole Lincoln Hurwitz, Kai Xu, Bernhard Egger, Prasanna Sattigeri, Joshua B. Tenenbaum, Phuong Le, Arun Prakash R, Nengfeng Zhou, Joel Vaughan, Yaquan Wang, Anwesha Bhattacharyya, Kristjan Greenewald, David D. Cox, Dan Gutfreund

    Abstract: Current autoencoder-based disentangled representation learning methods achieve disentanglement by penalizing the (aggregate) posterior to encourage statistical independence of the latent factors. This approach introduces a trade-off between disentangled representation learning and reconstruction quality since the model does not have enough capacity to learn correlated latent variables that capture… ▽ More

    Submitted 3 April, 2024; v1 submitted 25 October, 2020; originally announced October 2020.

  10. arXiv:2009.04433  [pdf, other

    eess.IV cs.CV stat.ML

    not-so-BigGAN: Generating High-Fidelity Images on Small Compute with Wavelet-based Super-Resolution

    Authors: Seungwook Han, Akash Srivastava, Cole Hurwitz, Prasanna Sattigeri, David D. Cox

    Abstract: State-of-the-art models for high-resolution image generation, such as BigGAN and VQVAE-2, require an incredible amount of compute resources and/or time (512 TPU-v3 cores) to train, putting them out of reach for the larger research community. On the other hand, GAN-based image super-resolution models, such as ESRGAN, can not only upscale images to high dimensions, but also are efficient to train. I… ▽ More

    Submitted 25 October, 2020; v1 submitted 9 September, 2020; originally announced September 2020.

  11. arXiv:1911.08051  [pdf, other

    stat.ML cs.LG

    SimVAE: Simulator-Assisted Training forInterpretable Generative Models

    Authors: Akash Srivastava, Jessie Rosenberg, Dan Gutfreund, David D. Cox

    Abstract: This paper presents a simulator-assisted training method (SimVAE) for variational autoencoders (VAE) that leads to a disentangled and interpretable latent space. Training SimVAE is a two-step process in which first a deep generator network(decoder) is trained to approximate the simulator. During this step, the simulator acts as the data source or as a teacher network. Then an inference network (en… ▽ More

    Submitted 18 November, 2019; originally announced November 2019.

  12. arXiv:1806.00730  [pdf, other

    stat.ML cs.LG cs.NE

    Minnorm training: an algorithm for training over-parameterized deep neural networks

    Authors: Yamini Bansal, Madhu Advani, David D Cox, Andrew M Saxe

    Abstract: In this work, we propose a new training method for finding minimum weight norm solutions in over-parameterized neural networks (NNs). This method seeks to improve training speed and generalization performance by framing NN training as a constrained optimization problem wherein the sum of the norm of the weights in each layer of the network is minimized, under the constraint of exactly fitting trai… ▽ More

    Submitted 21 June, 2018; v1 submitted 2 June, 2018; originally announced June 2018.

  13. arXiv:1502.04972  [pdf, other

    cs.NE q-bio.NC

    Measuring and Understanding Sensory Representations within Deep Networks Using a Numerical Optimization Framework

    Authors: Chuan-Yung Tsai, David D. Cox

    Abstract: A central challenge in sensory neuroscience is describing how the activity of populations of neurons can represent useful features of the external environment. However, while neurophysiologists have long been able to record the responses of neurons in awake, behaving animals, it is another matter entirely to say what a given neuron does. A key problem is that in many sensory domains, the space of… ▽ More

    Submitted 17 February, 2015; originally announced February 2015.

  14. arXiv:1306.3476  [pdf, other

    cs.CV cs.LG stat.ML

    Hyperparameter Optimization and Boosting for Classifying Facial Expressions: How good can a "Null" Model be?

    Authors: James Bergstra, David D. Cox

    Abstract: One of the goals of the ICML workshop on representation and learning is to establish benchmark scores for a new data set of labeled facial expressions. This paper presents the performance of a "Null" model consisting of convolutions with random weights, PCA, pooling, normalization, and a linear readout. Our approach focused on hyperparameter optimization rather than novel model components. On the… ▽ More

    Submitted 14 June, 2013; originally announced June 2013.

    Comments: Presented at the Workshop on Representation and Learning, ICML 2013

  15. arXiv:1209.5111  [pdf, other

    cs.CV cs.NE

    Making a Science of Model Search

    Authors: J. Bergstra, D. Yamins, D. D. Cox

    Abstract: Many computer vision algorithms depend on a variety of parameter choices and settings that are typically hand-tuned in the course of evaluating the algorithm. While such parameter tuning is often presented as being incidental to the algorithm, correctly setting these parameter choices is frequently critical to evaluating a method's full potential. Compounding matters, these parameters often must b… ▽ More

    Submitted 23 September, 2012; originally announced September 2012.