Showing 1–42 of 42 results for author: Chiu, J

Searching in archive cs.
  1. arXiv:2406.07524  [pdf, other]

    cs.CL cs.AI cs.LG

    Simple and Effective Masked Diffusion Language Models

    Authors: Subham Sekhar Sahoo, Marianne Arriola, Yair Schiff, Aaron Gokaslan, Edgar Marroquin, Justin T Chiu, Alexander Rush, Volodymyr Kuleshov

    Abstract: While diffusion models excel at generating high-quality images, prior work reports a significant performance gap between diffusion and autoregressive (AR) methods in language modeling. In this work, we show that simple masked discrete diffusion is more performant than previously thought. We apply an effective training recipe that improves the performance of masked diffusion models and derive a sim…

    Submitted 11 June, 2024; originally announced June 2024.

    Report number: cr07

  2. arXiv:2406.00104  [pdf, other]

    cs.LG stat.ML

    Scalable Bayesian Learning with posteriors

    Authors: Samuel Duffield, Kaelan Donatella, Johnathan Chiu, Phoebe Klett, Daniel Simpson

    Abstract: Although theoretically compelling, Bayesian learning with modern machine learning models is computationally challenging since it requires approximating a high dimensional posterior distribution. In this work, we (i) introduce posteriors, an easily extensible PyTorch library hosting general-purpose implementations making Bayesian learning accessible and scalable to large data and parameter regimes;…

    Submitted 31 May, 2024; originally announced June 2024.

  3. arXiv:2405.20550  [pdf]

    cs.LG stat.ML

    Uncertainty Quantification for Deep Learning

    Authors: Peter Jan van Leeuwen, J. Christine Chiu, C. Kevin Yang

    Abstract: A complete and statistically consistent uncertainty quantification for deep learning is provided, including the sources of uncertainty arising from (1) the new input data, (2) the training and testing data, (3) the weight vectors of the neural network, and (4) the neural network because it is not a perfect predictor. Using Bayes Theorem and conditional probability densities, we demonstrate how each…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 25 pages, 4 figures, submitted to Environmental Data Science

    MSC Class: 62D99 ACM Class: G.3

  4. arXiv:2405.19660  [pdf, other]

    cs.CL

    PATIENT-Ψ: Using Large Language Models to Simulate Patients for Training Mental Health Professionals

    Authors: Ruiyi Wang, Stephanie Milani, Jamie C. Chiu, Jiayin Zhi, Shaun M. Eack, Travis Labrum, Samuel M. Murphy, Nev Jones, Kate Hardy, Hong Shen, Fei Fang, Zhiyu Zoey Chen

    Abstract: Mental illness remains one of the most critical public health issues. Despite its importance, many mental health professionals highlight a disconnect between their training and actual real-world patient practice. To help bridge this gap, we propose PATIENT-Ψ, a novel patient simulation framework for cognitive behavior therapy (CBT) training. To build PATIENT-Ψ, we construct diverse patient cogniti…

    Submitted 18 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

    Comments: 9 pages, 5 figures

  5. arXiv:2404.10166  [pdf, other]

    cs.CV cs.LG

    Self-Supervised Learning Featuring Small-Scale Image Dataset for Treatable Retinal Diseases Classification

    Authors: Luffina C. Huang, Darren J. Chiu, Manish Mehta

    Abstract: Automated medical diagnosis through image-based neural networks has increased in popularity and matured over the years. Nevertheless, it is confined by the scarcity of medical images and the expensive labor annotation costs. Self-Supervised Learning (SSL) is a good alternative to Transfer Learning (TL) and is suitable for imbalanced image datasets. In this study, we assess four pretrained SSL models…

    Submitted 15 April, 2024; originally announced April 2024.

  6. arXiv:2403.15484  [pdf, other]

    cs.CL cs.LG

    RakutenAI-7B: Extending Large Language Models for Japanese

    Authors: Rakuten Group, Aaron Levine, Connie Huang, Chenguang Wang, Eduardo Batista, Ewa Szymanska, Hongyi Ding, Hou Wei Chou, Jean-François Pessiot, Johanes Effendi, Justin Chiu, Kai Torben Ohlhus, Karan Chopra, Keiji Shinzato, Koji Murakami, Lee Xiong, Lei Chen, Maki Kubota, Maksim Tkachenko, Miroku Lee, Naoki Takahashi, Prathyusha Jwalapuram, Ryutaro Tatsushima, Saurabh Jain, Sunil Kumar Yadav, et al. (5 additional authors not shown)

    Abstract: We introduce RakutenAI-7B, a suite of Japanese-oriented large language models that achieve the best performance on the Japanese LM Harness benchmarks among the open 7B models. Along with the foundation model, we release instruction- and chat-tuned models, RakutenAI-7B-instruct and RakutenAI-7B-chat respectively, under the Apache 2.0 license.

    Submitted 21 March, 2024; originally announced March 2024.

  7. arXiv:2403.08295  [pdf, other]

    cs.CL cs.AI

    Gemma: Open Models Based on Gemini Research and Technology

    Authors: Gemma Team, Thomas Mesnard, Cassidy Hardin, Robert Dadashi, Surya Bhupatiraju, Shreya Pathak, Laurent Sifre, Morgane Rivière, Mihir Sanjay Kale, Juliette Love, Pouya Tafti, Léonard Hussenot, Pier Giuseppe Sessa, Aakanksha Chowdhery, Adam Roberts, Aditya Barua, Alex Botev, Alex Castro-Ros, Ambrose Slone, Amélie Héliou, Andrea Tacchetti, Anna Bulanova, Antonia Paterson, Beth Tsai, Bobak Shahriari, et al. (83 additional authors not shown)

    Abstract: This work introduces Gemma, a family of lightweight, state-of-the-art open models built from the research and technology used to create Gemini models. Gemma models demonstrate strong performance across academic benchmarks for language understanding, reasoning, and safety. We release two sizes of models (2 billion and 7 billion parameters), and provide both pretrained and fine-tuned checkpoints. Ge…

    Submitted 16 April, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

  8. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1110 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 8 August, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  9. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  10. arXiv:2312.07395  [pdf, other]

    cs.CV cs.CL

    A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames

    Authors: Pinelopi Papalampidi, Skanda Koppula, Shreya Pathak, Justin Chiu, Joe Heyward, Viorica Patraucean, Jiajun Shen, Antoine Miech, Andrew Zisserman, Aida Nematzdeh

    Abstract: Understanding long, real-world videos requires modeling of long-range visual dependencies. To this end, we explore video-first architectures, building on the common paradigm of transferring large-scale, image–text models to video via shallow temporal fusion. However, we expose two limitations to the approach: (1) decreased spatial capabilities, likely due to poor video–language alignment in stan…

    Submitted 12 December, 2023; originally announced December 2023.

  11. arXiv:2311.13647  [pdf, other]

    cs.CL cs.LG

    Language Model Inversion

    Authors: John X. Morris, Wenting Zhao, Justin T. Chiu, Vitaly Shmatikov, Alexander M. Rush

    Abstract: Language models produce a distribution over the next token; can we use this information to recover the prompt tokens? We consider the problem of language model inversion and show that next-token probabilities contain a surprising amount of information about the preceding text. Often we can recover the text in cases where it is hidden from the user, motivating a method for recovering unknown prompt…

    Submitted 22 November, 2023; originally announced November 2023.

  12. arXiv:2311.11822  [pdf, other]

    cs.LG cs.CC cs.CR cs.DC

    Zero redundancy distributed learning with differential privacy

    Authors: Zhiqi Bu, Justin Chiu, Ruixuan Liu, Sheng Zha, George Karypis

    Abstract: Deep learning using large models has achieved great success in a wide range of domains. However, training these models on billions of parameters is very challenging in terms of the training speed, memory cost, and communication efficiency, especially under the privacy-preserving regime with differential privacy (DP). On the one hand, DP optimization has comparable efficiency to the standard non-p…

    Submitted 20 November, 2023; originally announced November 2023.

  13. arXiv:2311.08584  [pdf, other]

    cs.CL

    Asking More Informative Questions for Grounded Retrieval

    Authors: Sedrick Keh, Justin T. Chiu, Daniel Fried

    Abstract: When a model is trying to gather information in an interactive setting, it benefits from asking informative questions. However, in the case of a grounded multi-turn image identification task, previous studies have been constrained to polar yes/no questions, limiting how much information the model can gain in a single turn. We present an approach that formulates more informative, open-ended questio…

    Submitted 14 November, 2023; originally announced November 2023.

  14. arXiv:2311.08469  [pdf, other]

    cs.CL

    UNcommonsense Reasoning: Abductive Reasoning about Uncommon Situations

    Authors: Wenting Zhao, Justin T Chiu, Jena D. Hwang, Faeze Brahman, Jack Hessel, Sanjiban Choudhury, Yejin Choi, Xiang Lorraine Li, Alane Suhr

    Abstract: Language technologies that accurately model the dynamics of events must perform commonsense reasoning. Existing work evaluating commonsense reasoning focuses on making inferences about common, everyday situations. To instead investigate the ability to model unusual, unexpected, and unlikely situations, we explore the task of uncommonsense abductive reasoning. Given a piece of context with an unexp…

    Submitted 1 May, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

    Comments: accepted at NAACL'24

  15. arXiv:2311.08390  [pdf, other]

    cs.CL

    Predicting Text Preference Via Structured Comparative Reasoning

    Authors: Jing Nathan Yan, Tianqi Liu, Justin T Chiu, Jiaming Shen, Zhen Qin, Yue Yu, Yao Zhao, Charu Lakshmanan, Yair Kurzion, Alexander M. Rush, Jialu Liu, Michael Bendersky

    Abstract: Comparative reasoning plays a crucial role in text preference prediction; however, large language models (LLMs) often demonstrate inconsistencies in their reasoning. While approaches like Chain-of-Thought improve accuracy in many other settings, they struggle to consistently distinguish the similarities and differences of complex texts. We introduce SC, a prompting approach that predicts text pref…

    Submitted 1 July, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

  16. arXiv:2311.04986  [pdf, other]

    cs.CV

    Exploiting Inductive Biases in Video Modeling through Neural CDEs

    Authors: Johnathan Chiu, Samuel Duffield, Max Hunter-Gordon, Kaelan Donatella, Max Aifer, Andi Gu

    Abstract: We introduce a novel approach to video modeling that leverages controlled differential equations (CDEs) to address key challenges in video tasks, notably video interpolation and mask propagation. We apply CDEs at varying resolutions leading to a continuous-time U-Net architecture. Unlike traditional methods, our approach does not require explicit optical flow learning, and instead makes use of the…

    Submitted 8 November, 2023; originally announced November 2023.

  17. arXiv:2310.17140  [pdf, other]

    cs.CL cs.AI

    Symbolic Planning and Code Generation for Grounded Dialogue

    Authors: Justin T. Chiu, Wenting Zhao, Derek Chen, Saujas Vaduguru, Alexander M. Rush, Daniel Fried

    Abstract: Large language models (LLMs) excel at processing and generating both text and code. However, LLMs have had limited applicability in grounded task-oriented dialogue as they are difficult to steer toward task objectives and fail to handle novel grounding. We present a modular and interpretable grounded dialogue system that addresses these shortcomings by composing LLMs with a symbolic planner and gr…

    Submitted 26 October, 2023; originally announced October 2023.

    Comments: Accepted to EMNLP 2023

  18. arXiv:2310.07582  [pdf, other]

    cs.LG cs.AI

    Linear Latent World Models in Simple Transformers: A Case Study on Othello-GPT

    Authors: Dean S. Hazineh, Zechen Zhang, Jeffery Chiu

    Abstract: Foundation models exhibit significant capabilities in decision-making and logical deductions. Nonetheless, a continuing discourse persists regarding their genuine understanding of the world as opposed to mere stochastic mimicry. This paper meticulously examines a simple transformer trained for Othello, extending prior research to enhance comprehension of the emergent world model of Othello-GPT. Th…

    Submitted 12 October, 2023; v1 submitted 11 October, 2023; originally announced October 2023.

  19. arXiv:2307.15858  [pdf, other]

    cs.LG

    Multi-output Headed Ensembles for Product Item Classification

    Authors: Hotaka Shiokawa, Pradipto Das, Arthur Toth, Justin Chiu

    Abstract: In this paper, we revisit the problem of product item classification for large-scale e-commerce catalogs. The taxonomy of e-commerce catalogs consists of thousands of genres to which are assigned items that are uploaded by merchants on a continuous basis. The genre assignments by merchants are often wrong but treated as ground truth labels in automatically generated training sets, thus creating a…

    Submitted 28 July, 2023; originally announced July 2023.

  20. arXiv:2305.14618  [pdf, other]

    cs.CL cs.AI

    Abductive Commonsense Reasoning Exploiting Mutually Exclusive Explanations

    Authors: Wenting Zhao, Justin T. Chiu, Claire Cardie, Alexander M. Rush

    Abstract: Abductive reasoning aims to find plausible explanations for an event. This style of reasoning is critical for commonsense tasks where there are often multiple plausible explanations. Existing approaches for abductive reasoning in natural language processing (NLP) often rely on manually generated annotations for supervision; however, such annotations can be subjective and biased. Instead of using d…

    Submitted 23 May, 2023; originally announced May 2023.

    Comments: accepted at ACL'23

  21. arXiv:2305.14237  [pdf, ps, other]

    cs.CL cs.AI

    HOP, UNION, GENERATE: Explainable Multi-hop Reasoning without Rationale Supervision

    Authors: Wenting Zhao, Justin T. Chiu, Claire Cardie, Alexander M. Rush

    Abstract: Explainable multi-hop question answering (QA) not only predicts answers but also identifies rationales, i.e. subsets of input sentences used to derive the answers. This problem has been extensively studied under the supervised setting, where both answer and rationale annotations are given. Because rationale annotations are expensive to collect and not always available, recent efforts have been de…

    Submitted 23 May, 2023; originally announced May 2023.

  22. arXiv:2305.09967  [pdf, other]

    cs.CV cs.LG

    Variable Length Embeddings

    Authors: Johnathan Chiu, Andi Gu, Matt Zhou

    Abstract: In this work, we introduce a novel deep learning architecture, Variable Length Embeddings (VLEs), an autoregressive model that can produce a latent representation composed of an arbitrary number of tokens. As a proof of concept, we demonstrate the capabilities of VLEs on tasks that involve reconstruction and image decomposition. We evaluate our experiments on a mix of the iNaturalist and ImageNet…

    Submitted 17 May, 2023; originally announced May 2023.

  23. arXiv:2302.03011  [pdf, other]

    cs.CV

    Structure and Content-Guided Video Synthesis with Diffusion Models

    Authors: Patrick Esser, Johnathan Chiu, Parmida Atighehchian, Jonathan Granskog, Anastasis Germanidis

    Abstract: Text-guided generative diffusion models unlock powerful image creation and editing tools. While these have been extended to video generation, current approaches that edit the content of existing footage while retaining structure require expensive re-training for every input or rely on error-prone propagation of image edits across frames. In this work, we present a structure and content-guided vide…

    Submitted 6 February, 2023; originally announced February 2023.

    Comments: Project page at https://research.runwayml.com/gen1

  24. arXiv:2210.13763  [pdf, other]

    cs.NI cs.LG

    Teal: Learning-Accelerated Optimization of WAN Traffic Engineering

    Authors: Zhiying Xu, Francis Y. Yan, Rachee Singh, Justin T. Chiu, Alexander M. Rush, Minlan Yu

    Abstract: The rapid expansion of global cloud wide-area networks (WANs) has posed a challenge for commercial optimization engines to efficiently solve network traffic engineering (TE) problems at scale. Existing acceleration strategies decompose TE optimization into concurrent subproblems but realize limited parallelism due to an inherent tradeoff between run time and allocation performance. We present Te…

    Submitted 19 May, 2024; v1 submitted 25 October, 2022; originally announced October 2022.

  25. arXiv:2210.11528  [pdf, other]

    cs.CL

    Unsupervised Text Deidentification

    Authors: John X. Morris, Justin T. Chiu, Ramin Zabih, Alexander M. Rush

    Abstract: Deidentification seeks to anonymize textual data prior to distribution. Automatic deidentification primarily uses supervised named entity recognition from human-labeled data points. We propose an unsupervised deidentification method that masks words that leak personally-identifying information. The approach utilizes a specially trained reidentification model to identify individuals from redacted p…

    Submitted 20 October, 2022; originally announced October 2022.

    Comments: Findings of EMNLP 2022

  26. arXiv:2205.04799  [pdf, other]

    cs.RO cs.LG

    Designing a Recurrent Neural Network to Learn a Motion Planner for High-Dimensional Inputs

    Authors: Johnathan Chiu

    Abstract: The use of machine learning in the self-driving industry has boosted a number of recent advancements. In particular, the usage of large deep learning models in the perception and prediction stack has proved quite successful, but there is still a lack of significant literature on the use of machine learning in the planning stack. The current state of the art in the planning stack often relies on fast con…

    Submitted 10 May, 2022; originally announced May 2022.

  27. arXiv:2205.00119  [pdf, other]

    cs.DC

    MiCS: Near-linear Scaling for Training Gigantic Model on Public Cloud

    Authors: Zhen Zhang, Shuai Zheng, Yida Wang, Justin Chiu, George Karypis, Trishul Chilimbi, Mu Li, Xin Jin

    Abstract: Existing general purpose frameworks for gigantic model training, i.e., dense models with billions of parameters, cannot scale efficiently in cloud environments with various networking conditions due to large communication overheads. In this paper, we propose MiCS, which Minimizes the Communication Scale to bring down communication overhead. Specifically, by decreasing the number of participants in…

    Submitted 28 October, 2022; v1 submitted 29 April, 2022; originally announced May 2022.

  28. arXiv:2204.03208  [pdf, other]

    cs.IR cs.CL cs.LG stat.ML

    A Joint Learning Approach for Semi-supervised Neural Topic Modeling

    Authors: Jeffrey Chiu, Rajat Mittal, Neehal Tumma, Abhishek Sharma, Finale Doshi-Velez

    Abstract: Topic models are some of the most popular ways to represent textual data in an interpretable manner. Recently, advances in deep generative models, specifically auto-encoding variational Bayes (AEVB), have led to the introduction of unsupervised neural topic models, which leverage deep generative models as opposed to traditional statistics-based topic models. We extend upon these neural topic mode…

    Submitted 7 April, 2022; originally announced April 2022.

    Comments: To appear in the 6th ACL Workshop on Structured Prediction for NLP (SPNLP)

  29. arXiv:2202.12385  [pdf, other]

    cs.RO

    A Collision-Free MPC for Whole-Body Dynamic Locomotion and Manipulation

    Authors: Jia-Ruei Chiu, Jean-Pierre Sleiman, Mayank Mittal, Farbod Farshidian, Marco Hutter

    Abstract: In this paper, we present a real-time whole-body planner for collision-free legged mobile manipulation. We enforce both self-collision and environment-collision avoidance as soft constraints within a Model Predictive Control (MPC) scheme that solves a multi-contact optimal control problem. By penalizing the signed distances among a set of representative primitive collision bodies, the robot is abl…

    Submitted 24 February, 2022; originally announced February 2022.

    Comments: Accepted in IEEE International Conference on Robotics and Automation (ICRA) 2022 in Philadelphia (PA), USA

  30. arXiv:2201.02715  [pdf, other]

    cs.CL

    Low-Rank Constraints for Fast Inference in Structured Models

    Authors: Justin T. Chiu, Yuntian Deng, Alexander M. Rush

    Abstract: Structured distributions, i.e. distributions over combinatorial spaces, are commonly used to learn latent probabilistic representations from observed data. However, scaling these models is bottlenecked by the high computational and memory complexity with respect to the size of the latent representations. Common models such as Hidden Markov Models (HMMs) and Probabilistic Context-Free Grammars (PCF…

    Submitted 7 January, 2022; originally announced January 2022.

    Comments: 22 pages. Published at NeurIPS 2021

  31. arXiv:2112.07207  [pdf, other]

    cs.IT cs.CV cs.LG

    Modeling Image Quantization Tradeoffs for Optimal Compression

    Authors: Johnathan Chiu

    Abstract: All lossy compression algorithms employ similar compression schemes: frequency domain transform followed by quantization and lossless encoding schemes. They target tradeoffs by quantizing high frequency data to increase compression rates which come at the cost of higher image distortion. We propose a new method of optimizing quantization tables using Deep Learning and a minimax loss function t…

    Submitted 14 December, 2021; originally announced December 2021.

  32. arXiv:2112.01537  [pdf, other]

    cs.HC cs.AI cs.LG

    Improving mathematical questioning in teacher training

    Authors: Debajyoti Datta, Maria Phillips, James P Bywater, Jennifer Chiu, Ginger S. Watson, Laura E. Barnes, Donald E Brown

    Abstract: High-fidelity, AI-based simulated classroom systems enable teachers to rehearse effective teaching strategies. However, dialogue-oriented open-ended conversations such as teaching a student about scale factors can be difficult to model. This paper builds a text-based interactive conversational agent to help teachers practice mathematical questioning skills based on the well-known Instructional Qua…

    Submitted 6 December, 2021; v1 submitted 2 December, 2021; originally announced December 2021.

    Comments: Accepted to appear at the NeurIPS 2021 Human Centered AI Workshop (HCAI). The data collection process is described in arXiv:2112.00985

  33. arXiv:2112.00985  [pdf, other]

    cs.AI cs.HC cs.LG

    Evaluation of mathematical questioning strategies using data collected through weak supervision

    Authors: Debajyoti Datta, Maria Phillips, James P Bywater, Jennifer Chiu, Ginger S. Watson, Laura E. Barnes, Donald E Brown

    Abstract: A large body of research demonstrates how teachers' questioning strategies can improve student learning outcomes. However, developing new scenarios is challenging because of the lack of training data for a specific scenario and the costs associated with labeling. This paper presents a high-fidelity, AI-based classroom simulator to help teachers rehearse research-based mathematical questioning skil…

    Submitted 2 December, 2021; originally announced December 2021.

    Comments: Accepted to appear at the NeurIPS 2021 Workshop on Math AI for Education (MATHAI4ED)

  34. arXiv:2109.05042  [pdf, other]

    cs.CL

    Reference-Centric Models for Grounded Collaborative Dialogue

    Authors: Daniel Fried, Justin T. Chiu, Dan Klein

    Abstract: We present a grounded neural dialogue model that successfully collaborates with people in a partially-observable reference game. We focus on a setting where two agents each observe an overlapping part of a world context and need to identify and agree on some object they share. Therefore, the agents should pool their information and communicate pragmatically to solve the task. Our dialogue agent ac…

    Submitted 10 September, 2021; originally announced September 2021.

    Comments: EMNLP 2021

  35. arXiv:2011.04640  [pdf, other]

    cs.CL cs.LG

    Scaling Hidden Markov Language Models

    Authors: Justin T. Chiu, Alexander M. Rush

    Abstract: The hidden Markov model (HMM) is a fundamental tool for sequence modeling that cleanly separates the hidden state from the emission structure. However, this separation makes it difficult to fit HMMs to large datasets in modern NLP, and they have fallen out of use due to very poor performance compared to fully observed models. This work revisits the challenge of scaling HMMs to language modeling da…

    Submitted 9 November, 2020; originally announced November 2020.

    Comments: 9 pages, accepted as a short paper at EMNLP 2020

    Journal ref: EMNLP 2020

  36. arXiv:2010.12710  [pdf, other]

    cs.CL cs.CY cs.LG

    Improving Classification through Weak Supervision in Context-specific Conversational Agent Development for Teacher Education

    Authors: Debajyoti Datta, Maria Phillips, Jennifer Chiu, Ginger S. Watson, James P. Bywater, Laura Barnes, Donald Brown

    Abstract: Machine learning techniques applied to the Natural Language Processing (NLP) component of conversational agent development show promising results for improved accuracy and quality of feedback that a conversational agent can provide. The effort required to develop an educational scenario specific conversational agent is time consuming as it requires domain experts to label and annotate noisy data s…

    Submitted 23 October, 2020; originally announced October 2020.

    Comments: Preprint: Under Review

    ACM Class: I.2.7

  37. arXiv:2005.07173  [pdf, other]

    cs.LG cs.PL eess.SY stat.ML

    Formal Analysis and Redesign of a Neural Network-Based Aircraft Taxiing System with VerifAI

    Authors: Daniel J. Fremont, Johnathan Chiu, Dragos D. Margineantu, Denis Osipychev, Sanjit A. Seshia

    Abstract: We demonstrate a unified approach to rigorous design of safety-critical autonomous systems using the VerifAI toolkit for formal analysis of AI-based systems. VerifAI provides an integrated toolchain for tasks spanning the design process, including modeling, falsification, debugging, and ML component retraining. We evaluate all of these applications in an industrial case study on an experimental au…

    Submitted 14 May, 2020; originally announced May 2020.

    Comments: Full version of a CAV 2020 paper

  38. arXiv:1902.03210  [pdf, other]

    stat.ML cs.LG

    Tensor Variable Elimination for Plated Factor Graphs

    Authors: Fritz Obermeyer, Eli Bingham, Martin Jankowiak, Justin Chiu, Neeraj Pradhan, Alexander Rush, Noah Goodman

    Abstract: A wide class of machine learning algorithms can be reduced to variable elimination on factor graphs. While factor graphs provide a unifying notation for these algorithms, they do not provide a compact way to express repeated structure when compared to plate diagrams for directed graphical models. To exploit efficient tensor algebra in graphs with plates of variables, we generalize undirected facto…

    Submitted 16 May, 2019; v1 submitted 8 February, 2019; originally announced February 2019.

    Comments: To appear at ICML; 17 pages

  39. arXiv:1807.03756  [pdf, other]

    stat.ML cs.CL cs.LG

    Latent Alignment and Variational Attention

    Authors: Yuntian Deng, Yoon Kim, Justin Chiu, Demi Guo, Alexander M. Rush

    Abstract: Neural attention has become central to many state-of-the-art models in natural language processing and related domains. Attention networks are an easy-to-train and effective method for softly simulating alignment; however, the approach does not marginalize over latent alignments in a probabilistic sense. This property makes it difficult to compare attention to other alignment approaches, to compos…

    Submitted 7 November, 2018; v1 submitted 10 July, 2018; originally announced July 2018.

    Comments: accepted by NIPS 2018

  40. arXiv:1705.01187  [pdf, other]

    cs.RO cs.AI

    Towards Full Automated Drive in Urban Environments: A Demonstration in GoMentum Station, California

    Authors: Akansel Cosgun, Lichao Ma, Jimmy Chiu, Jiawei Huang, Mahmut Demir, Alexandre Miranda Anon, Thang Lian, Hasan Tafish, Samir Al-Stouhi

    Abstract: Each year, millions of motor vehicle traffic accidents all over the world cause a large number of fatalities, injuries and significant material loss. Automated Driving (AD) has the potential to drastically reduce such accidents. In this work, we focus on the technical challenges that arise from AD in urban environments. We present the overall architecture of an AD system and describe in detail the per…

    Submitted 2 May, 2017; originally announced May 2017.

    Comments: Accepted to Intelligent Vehicles Conference (IV 2017)

  41. arXiv:1602.02739  [pdf, other]

    q-bio.PE cs.DS

    On Determining if Tree-based Networks Contain Fixed Trees

    Authors: Maria Anaya, Olga Anipchenko-Ulaj, Aisha Ashfaq, Joyce Chiu, Mahedi Kaiser, Max Shoji Ohsawa, Megan Owen, Ella Pavlechko, Katherine St. John, Shivam Suleria, Keith Thompson, Corrine Yap

    Abstract: We address an open question of Francis and Steel about phylogenetic networks and trees. They give a polynomial time algorithm to decide if a phylogenetic network, N, is tree-based and pose the problem: given a fixed tree T and network N, is N based on T? We show that it is NP-hard to decide, by reduction from 3-Dimensional Matching (3DM), and further, that the problem is fixed parameter tractable.

    Submitted 8 February, 2016; originally announced February 2016.

    Comments: 7 pages, 4 figures

  42. arXiv:1511.08308  [pdf, other]

    cs.CL cs.LG cs.NE

    Named Entity Recognition with Bidirectional LSTM-CNNs

    Authors: Jason P. C. Chiu, Eric Nichols

    Abstract: Named entity recognition is a challenging task that has traditionally required large amounts of knowledge in the form of feature engineering and lexicons to achieve high performance. In this paper, we present a novel neural network architecture that automatically detects word- and character-level features using a hybrid bidirectional LSTM and CNN architecture, eliminating the need for most feature…

    Submitted 19 July, 2016; v1 submitted 26 November, 2015; originally announced November 2015.

    Comments: To appear in Transactions of the Association for Computational Linguistics

    MSC Class: 68T50 ACM Class: I.2.7