Showing 1–22 of 22 results for author: Ie, E

  1. arXiv:2406.11776  [pdf, other]

    cs.CL

    Improving Multi-Agent Debate with Sparse Communication Topology

    Authors: Yunxuan Li, Yibing Du, Jiageng Zhang, Le Hou, Peter Grabowski, Yeqing Li, Eugene Ie

    Abstract: Multi-agent debate has proven effective in improving large language models quality for reasoning and factuality tasks. While various role-playing strategies in multi-agent debates have been explored, in terms of the communication among agents, existing approaches adopt a brute force algorithm -- each agent can communicate with all other agents. In this paper, we systematically investigate the effe…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 13 pages, 9 figures

  2. arXiv:2306.01075  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO

    Pedestrian Crossing Action Recognition and Trajectory Prediction with 3D Human Keypoints

    Authors: Jiachen Li, Xinwei Shi, Feiyu Chen, Jonathan Stroud, Zhishuai Zhang, Tian Lan, Junhua Mao, Jeonhyung Kang, Khaled S. Refaat, Weilong Yang, Eugene Ie, Congcong Li

    Abstract: Accurate understanding and prediction of human behaviors are critical prerequisites for autonomous vehicles, especially in highly dynamic and interactive scenarios such as intersections in dense urban areas. In this work, we aim at identifying crossing pedestrians and predicting their future trajectories. To achieve these goals, we not only need the context information of road geometry and other t…

    Submitted 1 June, 2023; originally announced June 2023.

    Comments: ICRA 2023

  3. arXiv:2103.08057  [pdf, other]

    cs.LG cs.AI cs.IR

    RecSim NG: Toward Principled Uncertainty Modeling for Recommender Ecosystems

    Authors: Martin Mladenov, Chih-Wei Hsu, Vihan Jain, Eugene Ie, Christopher Colby, Nicolas Mayoraz, Hubert Pham, Dustin Tran, Ivan Vendrov, Craig Boutilier

    Abstract: The development of recommender systems that optimize multi-turn interaction with users, and model the interactions of different agents (e.g., users, content providers, vendors) in the recommender ecosystem have drawn increasing attention in recent years. Developing and training models and algorithms for such recommenders can be especially difficult using static datasets, which often fail to offer…

    Submitted 14 March, 2021; originally announced March 2021.

  4. arXiv:2101.10504  [pdf, other]

    cs.AI cs.CL cs.CV

    On the Evaluation of Vision-and-Language Navigation Instructions

    Authors: Ming Zhao, Peter Anderson, Vihan Jain, Su Wang, Alexander Ku, Jason Baldridge, Eugene Ie

    Abstract: Vision-and-Language Navigation wayfinding agents can be enhanced by exploiting automatically generated navigation instructions. However, existing instruction generators have not been comprehensively evaluated, and the automatic evaluation metrics used to develop them have not been validated. Using human wayfinders, we show that these generators perform on par with or only slightly better than a te…

    Submitted 25 January, 2021; originally announced January 2021.

    Comments: Accepted to EACL 2021

  5. arXiv:2011.09046  [pdf, other]

    cs.CV cs.CL

    A Hierarchical Multi-Modal Encoder for Moment Localization in Video Corpus

    Authors: Bowen Zhang, Hexiang Hu, Joonseok Lee, Ming Zhao, Sheide Chammas, Vihan Jain, Eugene Ie, Fei Sha

    Abstract: Identifying a short segment in a long video that semantically matches a text query is a challenging task that has important application potentials in language-based video search, browsing, and navigation. Typical retrieval systems respond to a query with either a whole video or a pre-defined video segment, but it is challenging to localize undefined segments in untrimmed and unsegmented videos whe…

    Submitted 23 November, 2020; v1 submitted 17 November, 2020; originally announced November 2020.

  6. arXiv:2010.12694  [pdf, other]

    cs.CL

    AQuaMuSe: Automatically Generating Datasets for Query-Based Multi-Document Summarization

    Authors: Sayali Kulkarni, Sheide Chammas, Wan Zhu, Fei Sha, Eugene Ie

    Abstract: Summarization is the task of compressing source document(s) into coherent and succinct passages. This is a valuable tool to present users with concise and accurate sketch of the top ranked documents related to their queries. Query-based multi-document summarization (qMDS) addresses this pervasive need, but the research is severely limited due to lack of training and evaluation datasets as existing…

    Submitted 23 October, 2020; originally announced October 2020.

  7. arXiv:2010.07954  [pdf, other]

    cs.CV cs.AI cs.CL

    Room-Across-Room: Multilingual Vision-and-Language Navigation with Dense Spatiotemporal Grounding

    Authors: Alexander Ku, Peter Anderson, Roma Patel, Eugene Ie, Jason Baldridge

    Abstract: We introduce Room-Across-Room (RxR), a new Vision-and-Language Navigation (VLN) dataset. RxR is multilingual (English, Hindi, and Telugu) and larger (more paths and instructions) than other VLN datasets. It emphasizes the role of language in VLN by addressing known biases in paths and eliciting more references to visible entities. Furthermore, each word in an instruction is time-aligned to the vir…

    Submitted 15 October, 2020; originally announced October 2020.

    Comments: EMNLP 2020

  8. arXiv:2010.02949  [pdf, other]

    cs.CV cs.CL cs.LG

    Learning to Represent Image and Text with Denotation Graph

    Authors: Bowen Zhang, Hexiang Hu, Vihan Jain, Eugene Ie, Fei Sha

    Abstract: Learning to fuse vision and language information and representing them is an important research problem with many applications. Recent progresses have leveraged the ideas of pre-training (from language modeling) and attention layers in Transformers to learn representation from datasets containing images aligned with linguistic expressions that describe the images. In this paper, we propose learnin…

    Submitted 6 October, 2020; originally announced October 2020.

    Comments: to appear at EMNLP 2020

  9. arXiv:2008.09236  [pdf, other]

    cs.CL

    Spatial Language Representation with Multi-Level Geocoding

    Authors: Sayali Kulkarni, Shailee Jain, Mohammad Javad Hosseini, Jason Baldridge, Eugene Ie, Li Zhang

    Abstract: We present a multi-level geocoding model (MLG) that learns to associate texts to geographic locations. The Earth's surface is represented using space-filling curves that decompose the sphere into a hierarchy of similarly sized, non-overlapping cells. MLG balances generalization and accuracy by combining losses across multiple levels and predicting cells at each level simultaneously. Without using…

    Submitted 20 August, 2020; originally announced August 2020.

  10. arXiv:2006.07584  [pdf, other]

    cs.LG stat.ML

    Mean-Field Approximation to Gaussian-Softmax Integral with Application to Uncertainty Estimation

    Authors: Zhiyun Lu, Eugene Ie, Fei Sha

    Abstract: Many methods have been proposed to quantify the predictive uncertainty associated with the outputs of deep neural networks. Among them, ensemble methods often lead to state-of-the-art results, though they require modifications to the training procedures and are computationally costly for both training and inference. In this paper, we propose a new single-model based approach. The main idea is insp…

    Submitted 9 May, 2021; v1 submitted 13 June, 2020; originally announced June 2020.

  11. arXiv:2005.04625  [pdf, other]

    cs.AI cs.CL cs.CV

    BabyWalk: Going Farther in Vision-and-Language Navigation by Taking Baby Steps

    Authors: Wang Zhu, Hexiang Hu, Jiacheng Chen, Zhiwei Deng, Vihan Jain, Eugene Ie, Fei Sha

    Abstract: Learning to follow instructions is of fundamental importance to autonomous agents for vision-and-language navigation (VLN). In this paper, we study how an agent can navigate long paths when learning from a corpus that consists of shorter ones. We show that existing state-of-the-art agents do not generalize well. To this end, we propose BabyWalk, a new VLN agent that is learned to navigate by decom…

    Submitted 14 June, 2020; v1 submitted 10 May, 2020; originally announced May 2020.

    Comments: Accepted by ACL 2020

  12. arXiv:2003.00443  [pdf, other]

    cs.AI cs.CL cs.CV cs.RO

    Environment-agnostic Multitask Learning for Natural Language Grounded Navigation

    Authors: Xin Eric Wang, Vihan Jain, Eugene Ie, William Yang Wang, Zornitsa Kozareva, Sujith Ravi

    Abstract: Recent research efforts enable study for natural language grounded navigation in photo-realistic environments, e.g., following natural language instructions or dialog. However, existing methods tend to overfit training data in seen environments and fail to generalize well in previously unseen environments. To close the gap between seen and unseen environments, we aim at learning a generalized navi…

    Submitted 20 July, 2020; v1 submitted 1 March, 2020; originally announced March 2020.

    Comments: ECCV 2020

  13. arXiv:2001.03671  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Retouchdown: Adding Touchdown to StreetLearn as a Shareable Resource for Language Grounding Tasks in Street View

    Authors: Harsh Mehta, Yoav Artzi, Jason Baldridge, Eugene Ie, Piotr Mirowski

    Abstract: The Touchdown dataset (Chen et al., 2019) provides instructions by human annotators for navigation through New York City streets and for resolving spatial descriptions at a given location. To enable the wider research community to work effectively with the Touchdown tasks, we are publicly releasing the 29k raw Street View panoramas needed for Touchdown. We follow the process used for the StreetLea…

    Submitted 10 January, 2020; originally announced January 2020.

  14. arXiv:1912.03241  [pdf, other]

    cs.LG stat.ML

    VALAN: Vision and Language Agent Navigation

    Authors: Larry Lansing, Vihan Jain, Harsh Mehta, Haoshuo Huang, Eugene Ie

    Abstract: VALAN is a lightweight and scalable software framework for deep reinforcement learning based on the SEED RL architecture. The framework facilitates the development and evaluation of embodied agents for solving grounded language understanding tasks, such as Vision-and-Language Navigation and Vision-and-Dialog Navigation, in photo-realistic environments, such as Matterport3D and Google StreetView. W…

    Submitted 6 December, 2019; originally announced December 2019.

  15. arXiv:1909.10506  [pdf, other]

    cs.CL cs.IR cs.LG

    Learning Dense Representations for Entity Retrieval

    Authors: Daniel Gillick, Sayali Kulkarni, Larry Lansing, Alessandro Presta, Jason Baldridge, Eugene Ie, Diego Garcia-Olano

    Abstract: We show that it is feasible to perform entity linking by training a dual encoder (two-tower) model that encodes mentions and entities in the same dense vector space, where candidate entities are retrieved by approximate nearest neighbor search. Unlike prior work, this setup does not rely on an alias table followed by a re-ranker, and is thus the first fully learned entity retrieval model. We show…

    Submitted 23 September, 2019; originally announced September 2019.

    Comments: CoNLL 2019

  16. arXiv:1909.04847  [pdf, other]

    cs.LG cs.HC cs.IR stat.ML

    RecSim: A Configurable Simulation Platform for Recommender Systems

    Authors: Eugene Ie, Chih-wei Hsu, Martin Mladenov, Vihan Jain, Sanmit Narvekar, Jing Wang, Rui Wu, Craig Boutilier

    Abstract: We propose RecSim, a configurable platform for authoring simulation environments for recommender systems (RSs) that naturally supports sequential interaction with users. RecSim allows the creation of new environments that reflect particular aspects of user behavior and item structure at a level of abstraction well-suited to pushing the limits of current reinforcement learning (RL) and RS technique…

    Submitted 26 September, 2019; v1 submitted 11 September, 2019; originally announced September 2019.

  17. arXiv:1908.03409  [pdf, other]

    cs.CV cs.CL cs.LG cs.RO

    Transferable Representation Learning in Vision-and-Language Navigation

    Authors: Haoshuo Huang, Vihan Jain, Harsh Mehta, Alexander Ku, Gabriel Magalhaes, Jason Baldridge, Eugene Ie

    Abstract: Vision-and-Language Navigation (VLN) tasks such as Room-to-Room (R2R) require machine agents to interpret natural language instructions and learn to act in visually realistic environments to achieve navigation goals. The overall task requires competence in several perception problems: successful agents combine spatio-temporal, vision and language understanding to produce appropriate action sequenc…

    Submitted 12 August, 2019; v1 submitted 9 August, 2019; originally announced August 2019.

    Comments: To appear in ICCV 2019

  18. arXiv:1907.05446  [pdf, other]

    cs.RO cs.AI cs.CL

    General Evaluation for Instruction Conditioned Navigation using Dynamic Time Warping

    Authors: Gabriel Ilharco, Vihan Jain, Alexander Ku, Eugene Ie, Jason Baldridge

    Abstract: In instruction conditioned navigation, agents interpret natural language and their surroundings to navigate through an environment. Datasets for studying this task typically contain pairs of these instructions and reference trajectories. Yet, most evaluation metrics used thus far fail to properly account for the latter, relying instead on insufficient similarity comparisons. We address fundamental…

    Submitted 28 November, 2019; v1 submitted 11 July, 2019; originally announced July 2019.

    Journal ref: Thirty-third Conference on Neural Information Processing Systems (NeurIPS 2019)

  19. arXiv:1905.13358  [pdf, other]

    cs.CL cs.CV

    Multi-modal Discriminative Model for Vision-and-Language Navigation

    Authors: Haoshuo Huang, Vihan Jain, Harsh Mehta, Jason Baldridge, Eugene Ie

    Abstract: Vision-and-Language Navigation (VLN) is a natural language grounding task where agents have to interpret natural language instructions in the context of visual scenes in a dynamic environment to achieve prescribed navigation goals. Successful agents must have the ability to parse natural language of varying linguistic styles, ground them in potentially unfamiliar scenes, plan and react with ambigu…

    Submitted 30 May, 2019; originally announced May 2019.

    Comments: Accepted at SpLU-RoboNLP 2019 (workshop at NAACL)

  20. arXiv:1905.12767  [pdf, other]

    cs.LG cs.AI cs.IR stat.ML

    Reinforcement Learning for Slate-based Recommender Systems: A Tractable Decomposition and Practical Methodology

    Authors: Eugene Ie, Vihan Jain, Jing Wang, Sanmit Narvekar, Ritesh Agarwal, Rui Wu, Heng-Tze Cheng, Morgane Lustman, Vince Gatto, Paul Covington, Jim McFadden, Tushar Chandra, Craig Boutilier

    Abstract: Most practical recommender systems focus on estimating immediate user engagement without considering the long-term effects of recommendations on user behavior. Reinforcement learning (RL) methods offer the potential to optimize recommendations for long-term user engagement. However, since users are often presented with slates of multiple items - which may have interacting effects on user choice -…

    Submitted 31 May, 2019; v1 submitted 29 May, 2019; originally announced May 2019.

    Comments: Short version to appear at IJCAI-2019

  21. arXiv:1905.12255  [pdf, other]

    cs.AI cs.CL

    Stay on the Path: Instruction Fidelity in Vision-and-Language Navigation

    Authors: Vihan Jain, Gabriel Magalhaes, Alexander Ku, Ashish Vaswani, Eugene Ie, Jason Baldridge

    Abstract: Advances in learning and representations have reinvigorated work that connects language to other modalities. A particularly exciting direction is Vision-and-Language Navigation (VLN), in which agents interpret natural language instructions and visual scenes to move through environments and reach goals. Despite recent progress, current research leaves unclear how much of a role language understandin…

    Submitted 21 June, 2019; v1 submitted 29 May, 2019; originally announced May 2019.

    Comments: Accepted at ACL 2019 as long paper

  22. arXiv:1312.5697  [pdf, other]

    cs.CV cs.LG

    Using Web Co-occurrence Statistics for Improving Image Categorization

    Authors: Samy Bengio, Jeff Dean, Dumitru Erhan, Eugene Ie, Quoc Le, Andrew Rabinovich, Jonathon Shlens, Yoram Singer

    Abstract: Object recognition and localization are important tasks in computer vision. The focus of this work is the incorporation of contextual information in order to improve object recognition and localization. For instance, it is natural to expect not to see an elephant to appear in the middle of an ocean. We consider a simple approach to encapsulate such common sense knowledge using co-occurrence statis…

    Submitted 20 December, 2013; v1 submitted 19 December, 2013; originally announced December 2013.