
Showing 1–24 of 24 results for author: Burns, A

Searching in archive cs.
  1. arXiv:2406.07822

    cs.CV cs.CL

    Tell Me What's Next: Textual Foresight for Generic UI Representations

    Authors: Andrea Burns, Kate Saenko, Bryan A. Plummer

    Abstract: Mobile app user interfaces (UIs) are rich with action, text, structure, and image content that can be utilized to learn generic UI representations for tasks like automating user commands, summarizing content, and evaluating the accessibility of user interfaces. Prior work has learned strong visual representations with local or global captioning losses, but fails to retain both granularities. To co…

    Submitted 7 August, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024 Findings. Data and code to be released at https://github.com/aburns4/textualforesight

  2. arXiv:2405.02793

    cs.CV cs.CL

    ImageInWords: Unlocking Hyper-Detailed Image Descriptions

    Authors: Roopal Garg, Andrea Burns, Burcu Karagol Ayan, Yonatan Bitton, Ceslee Montgomery, Yasumasa Onoe, Andrew Bunner, Ranjay Krishna, Jason Baldridge, Radu Soricut

    Abstract: Despite the longstanding adage "an image is worth a thousand words," creating accurate and hyper-detailed image descriptions for training Vision-Language models remains challenging. Current datasets typically have web-scraped descriptions that are short, low-granularity, and often contain details unrelated to the visual content. As a result, models trained on such data generate descriptions replet…

    Submitted 4 May, 2024; originally announced May 2024.

    Comments: Webpage (https://google.github.io/imageinwords), GitHub (https://github.com/google/imageinwords), HuggingFace (https://huggingface.co/datasets/google/imageinwords)

  3. arXiv:2403.05530

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1110 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 8 August, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  4. Extending Rely-Guarantee thinking to handle Real-Time Scheduling

    Authors: Cliff B. Jones, Alan Burns

    Abstract: The reference point for developing any artefact is its specification; to develop software formally, a formal specification is required. For sequential programs, pre and post conditions (together with abstract objects) suffice; rely and guarantee conditions extend the scope of formal development approaches to tackle concurrency. In addition, real-time systems need ways of both requiring progress an…

    Submitted 30 November, 2023; originally announced December 2023.

    Comments: Published on-line (2023-11-30) in "Formal Methods in System Design"

    ACM Class: D.2; F.3

  5. arXiv:2305.05432

    cs.CL cs.CV

    WikiWeb2M: A Page-Level Multimodal Wikipedia Dataset

    Authors: Andrea Burns, Krishna Srinivasan, Joshua Ainslie, Geoff Brown, Bryan A. Plummer, Kate Saenko, Jianmo Ni, Mandy Guo

    Abstract: Webpages have been a rich resource for language and vision-language tasks. Yet only pieces of webpages are kept: image-caption pairs, long text articles, or raw HTML, never all in one place. Webpage tasks have resultingly received little attention and structured image-text data underused. To study multimodal webpage understanding, we introduce the Wikipedia Webpage 2M (WikiWeb2M) suite; the first…

    Submitted 9 May, 2023; originally announced May 2023.

    Comments: Accepted at the WikiWorkshop 2023. Data is readily available at https://github.com/google-research-datasets/wit/blob/main/wikiweb2m.md. arXiv admin note: text overlap with arXiv:2305.03668

  6. arXiv:2305.03668

    cs.CL cs.CV

    A Suite of Generative Tasks for Multi-Level Multimodal Webpage Understanding

    Authors: Andrea Burns, Krishna Srinivasan, Joshua Ainslie, Geoff Brown, Bryan A. Plummer, Kate Saenko, Jianmo Ni, Mandy Guo

    Abstract: Webpages have been a rich, scalable resource for vision-language and language only tasks. Yet only pieces of webpages are kept in existing datasets: image-caption pairs, long text articles, or raw HTML, never all in one place. Webpage tasks have resultingly received little attention and structured image-text data left underused. To study multimodal webpage understanding, we introduce the Wikipedia…

    Submitted 20 October, 2023; v1 submitted 5 May, 2023; originally announced May 2023.

    Comments: Accepted in EMNLP 2023, revision contains camera ready edits. Data can be downloaded at https://github.com/google-research-datasets/wit/blob/main/wikiweb2m.md

  7. arXiv:2303.16342

    cs.CV cs.AI cs.CL

    Language-Guided Audio-Visual Source Separation via Trimodal Consistency

    Authors: Reuben Tan, Arijit Ray, Andrea Burns, Bryan A. Plummer, Justin Salamon, Oriol Nieto, Bryan Russell, Kate Saenko

    Abstract: We propose a self-supervised approach for learning to perform audio source separation in videos based on natural language queries, using only unlabeled video and audio pairs as training data. A key challenge in this task is learning to associate the linguistic description of a sound-emitting object to its visual features and the corresponding components of the audio waveform, all without access to…

    Submitted 23 September, 2023; v1 submitted 28 March, 2023; originally announced March 2023.

    Comments: Accepted at CVPR 2023

  8. arXiv:2209.10430

    cs.PF cs.AR cs.NI

    Real-Time Guarantees in Routerless Networks-on-Chip

    Authors: Leandro Soares Indrusiak, Alan Burns

    Abstract: This paper considers the use of routerless networks-on-chip as an alternative on-chip interconnect for multiprocessor systems requiring hard real-time guarantees for inter-processor communication. It presents a novel analytical framework that can provide latency upper bounds to real-time packet flows sent over routerless networks-on-chip, and it uses that framework to evaluate the ability of such…

    Submitted 21 September, 2022; originally announced September 2022.

  9. arXiv:2206.05994

    eess.SY cs.RO math.OC

    Discretization and Stabilization of Energy-Based Controller for Period Switching Control and Flexible Scheduling

    Authors: Seyed Amir Tafrishi, Xiaotian Dai, Yasuhisa Hirata, Alan Burns

    Abstract: Emerging advanced control applications, with increased complexity in software but limited computing resources, suggest that real-time controllers should have adaptable designs. These control strategies also should be designed with consideration of the run-time behavior of the system. One of such research attempts is to design the controller along with the task scheduler, known as control-schedulin…

    Submitted 13 June, 2022; originally announced June 2022.

    Comments: Accepted to 2022 American Control Conference (ACC), 6 pages, 8 figures

  10. arXiv:2202.02312

    cs.CL cs.CV cs.HC

    A Dataset for Interactive Vision-Language Navigation with Unknown Command Feasibility

    Authors: Andrea Burns, Deniz Arsan, Sanjna Agrawal, Ranjitha Kumar, Kate Saenko, Bryan A. Plummer

    Abstract: Vision-language navigation (VLN), in which an agent follows language instruction in a visual environment, has been studied under the premise that the input command is fully feasible in the environment. Yet in practice, a request may not be possible due to language ambiguity or environment changes. To study VLN with unknown command feasibility, we introduce a new dataset Mobile app Tasks with Itera…

    Submitted 14 August, 2022; v1 submitted 4 February, 2022; originally announced February 2022.

    Comments: Accepted at the European Conference on Computer Vision (ECCV) 2022. This is a new version of the paper with additional experimental results and a few prior implementation bugs fixed

  11. arXiv:2108.13270

    cs.HC

    Making the Invisible Visible: Risks and Benefits of Disclosing Metadata in Visualization

    Authors: Alyxander Burns, Thai On, Christiana Lee, Rachel Shapiro, Cindy Xiong, Narges Mahyar

    Abstract: Accompanying a data visualization with metadata may benefit readers by facilitating content understanding, strengthening trust, and providing accountability. However, providing this kind of information may also have negative, unintended consequences, such as biasing readers' interpretations, a loss of trust as a result of too much transparency, and the possibility of opening visualization creators…

    Submitted 30 August, 2021; originally announced August 2021.

    Comments: To appear in the Visualization for Social Good Workshop at VIS 2021

  12. arXiv:2108.06613

    cs.CV cs.LG

    Unsupervised Disentanglement without Autoencoding: Pitfalls and Future Directions

    Authors: Andrea Burns, Aaron Sarna, Dilip Krishnan, Aaron Maschinot

    Abstract: Disentangled visual representations have largely been studied with generative models such as Variational AutoEncoders (VAEs). While prior work has focused on generative methods for disentangled representation learning, these approaches do not scale to large datasets due to current limitations of generative models. Instead, we explore regularization methods with contrastive learning, which could re…

    Submitted 14 August, 2021; originally announced August 2021.

    Comments: Accepted at the ICML 2021 Self-Supervised Learning for Reasoning and Perception Workshop

  13. arXiv:2104.08560

    cs.CL cs.CV

    Mobile App Tasks with Iterative Feedback (MoTIF): Addressing Task Feasibility in Interactive Visual Environments

    Authors: Andrea Burns, Deniz Arsan, Sanjna Agrawal, Ranjitha Kumar, Kate Saenko, Bryan A. Plummer

    Abstract: In recent years, vision-language research has shifted to study tasks which require more complex reasoning, such as interactive question answering, visual common sense reasoning, and question-answer plausibility prediction. However, the datasets used for these problems fail to capture the complexity of real inputs and multimodal environments, such as ambiguous natural language requests and diverse…

    Submitted 17 April, 2021; originally announced April 2021.

    Comments: Accepted at the workshop on Visually Grounded Interaction and Language (ViGIL) at NAACL 2021

  14. arXiv:2103.06997

    cs.CV eess.IV

    The Location of Optimal Object Colors with More Than Two Transitions (Preprint)

    Authors: Scott A. Burns

    Abstract: The chromaticity diagram associated with the CIE 1931 color matching functions is shown to be slightly non-convex. While having no impact on practical colorimetric computations, the non-convexity does have a significant impact on the shape of some optimal object color reflectance distributions associated with the outer surface of the object color solid. Instead of the usual two-transition Schrodin…

    Submitted 14 May, 2021; v1 submitted 11 March, 2021; originally announced March 2021.

    Comments: 5/14/21 version adds notice of acceptance for publication and changes made in final version

  15. arXiv:2012.01493

    cs.SE

    A Rely-Guarantee Specification of Mixed-Criticality Scheduling

    Authors: Cliff B Jones, Alan Burns

    Abstract: The application considered is mixed-criticality scheduling. The core formal approaches used are Rely-Guarantee conditions and the Timeband framework; these are applied to give a layered description of job scheduling which includes resilience to jobs overrunning their expected execution time. A novel formal modelling idea is proposed to handle the relationship between actual time and its approximat…

    Submitted 21 August, 2021; v1 submitted 2 December, 2020; originally announced December 2020.

    Comments: This paper will appear in a Festschrift - on publication we will insert a pointer to the book

    Journal ref: Mathematical Foundations of Software Engineering, College Publication, 2022, Chap 6

  16. arXiv:2009.01747

    cs.HC

    How to evaluate data visualizations across different levels of understanding

    Authors: Alyxander Burns, Cindy Xiong, Steven Franconeri, Alberto Cairo, Narges Mahyar

    Abstract: Understanding a visualization is a multi-level process. A reader must extract and extrapolate from numeric facts, understand how those facts apply to both the context of the data and other potential contexts, and draw or evaluate conclusions from the data. A well-designed visualization should support each of these levels of understanding. We diagnose levels of understanding of visualized data by a…

    Submitted 3 September, 2020; originally announced September 2020.

    Comments: 8 pages, 3 figures, accepted for presentation at BELIV 2020

  17. arXiv:2004.04312

    cs.CV cs.CL

    Learning to Scale Multilingual Representations for Vision-Language Tasks

    Authors: Andrea Burns, Donghyun Kim, Derry Wijaya, Kate Saenko, Bryan A. Plummer

    Abstract: Current multilingual vision-language models either require a large number of additional parameters for each supported language, or suffer performance degradation as languages are added. In this paper, we propose a Scalable Multilingual Aligned Language Representation (SMALR) that supports many languages with few model parameters without sacrificing downstream task performance. SMALR learns a fixed…

    Submitted 27 August, 2020; v1 submitted 8 April, 2020; originally announced April 2020.

    Comments: ECCV 2020 accepted spotlight paper

  18. arXiv:1908.06327

    cs.CV cs.CL

    Language Features Matter: Effective Language Representations for Vision-Language Tasks

    Authors: Andrea Burns, Reuben Tan, Kate Saenko, Stan Sclaroff, Bryan A. Plummer

    Abstract: Shouldn't language and vision features be treated equally in vision-language (VL) tasks? Many VL approaches treat the language component as an afterthought, using simple language models that are either built upon fixed word embeddings trained on text-only data or are learned from scratch. We believe that language features deserve more attention, and conduct experiments which compare different word…

    Submitted 17 August, 2019; originally announced August 2019.

    Comments: ICCV 2019 accepted paper

  19. Chromatic Adaptation Transform by Spectral Reconstruction (Preprint)

    Authors: Scott A Burns

    Abstract: A color appearance model (CAM) is an advanced colorimetric tool used to predict color appearance under a wide variety of viewing conditions. A chromatic adaptation transform (CAT) is an integral part of a CAM. Its role is to predict "corresponding colors," that is, a pair of colors that have the same color appearance when viewed under different illuminants, after partial or full adaptation to each…

    Submitted 28 September, 2019; v1 submitted 26 February, 2019; originally announced February 2019.

    Comments: Ver 2 adds the abstract. Ver 3 gives attribution to Eq 1. Ver 4 adds publication notice. Ver 5 corrects Table 4. Ver 6 adds email address, date, and updates publication notice. Ver 7 adds link to full text of the final published version at Color Res Appl. Ver 8 adds citation of final publication

    Journal ref: Color Res Appl. 2019;44(5):682-693

  20. arXiv:1710.06364

    cs.GR

    Subtractive Color Mixture Computation

    Authors: Scott Allen Burns

    Abstract: Modeling subtractive color mixture (e.g., the way that paints mix) is difficult when working with colors described only by three-dimensional color space values, such as RGB. Although RGB values are sufficient to describe a specific color sensation, they do not contain enough information to predict the RGB color that would result from a subtractive mixture of two specified RGB colors. Methods do ex…

    Submitted 17 October, 2017; originally announced October 2017.

    ACM Class: I.3.7

  21. arXiv:1710.05732

    cs.CV q-bio.QM

    Generating Reflectance Curves from sRGB Triplets

    Authors: Scott Allen Burns

    Abstract: The color sensation evoked by an object depends on both the spectral power distribution of the illumination and the reflectance properties of the object being illuminated. The color sensation can be characterized by three color-space values, such as XYZ, RGB, HSV, L*a*b*, etc. It is straightforward to compute the three values given the illuminant and reflectance curves. The converse process of com…

    Submitted 9 January, 2020; v1 submitted 11 October, 2017; originally announced October 2017.

    Comments: v3 minor editing to clarify some points, and some webpage link updates, v4 adds the LHTSS method, v5 indicates LHTSS should be preferred to ILLSS generally

    ACM Class: I.2.10; I.3.7; I.4.8

  22. arXiv:1702.05398

    cs.CL

    Experiment Segmentation in Scientific Discourse as Clause-level Structured Prediction using Recurrent Neural Networks

    Authors: Pradeep Dasigi, Gully A. P. C. Burns, Eduard Hovy, Anita de Waard

    Abstract: We propose a deep learning model for identifying structure within experiment narratives in scientific literature. We take a sequence labeling approach to this problem, and label clauses within experiment narratives to identify the different parts of the experiment. Our dataset consists of paragraphs taken from open access PubMed papers labeled with rhetorical information as a result of our pilot a…

    Submitted 17 February, 2017; originally announced February 2017.

  23. arXiv:1606.02942

    cs.NI cs.DC cs.PF

    Analysis of buffering effects on hard real-time priority-preemptive wormhole networks

    Authors: Leandro Soares Indrusiak, Alan Burns, Borislav Nikolic

    Abstract: There are several approaches to analyse the worst-case response times of sporadic packets transmitted over priority-preemptive wormhole networks. In this paper, we provide an overview of the different approaches, discuss their strengths and weaknesses, and propose an approach that captures all effects considered by previous approaches while providing tight yet safe upper bounds for packet response…

    Submitted 9 June, 2016; originally announced June 2016.

  24. arXiv:1209.5922

    cs.DB q-bio.NC

    Towards structured sharing of raw and derived neuroimaging data across existing resources

    Authors: D. B. Keator, K. Helmer, J. Steffener, J. A. Turner, T. G. M. Van Erp, S. Gadde, N. Ashish, G. A. Burns, B. N. Nichols, S. S. Ghosh

    Abstract: Data sharing efforts increasingly contribute to the acceleration of scientific discovery. Neuroimaging data is accumulating in distributed domain-specific databases and there is currently no integrated access mechanism nor an accepted format for the critically important meta-data that is necessary for making use of the combined, available neuroimaging data. In this manuscript, we present work from…

    Submitted 6 March, 2013; v1 submitted 26 September, 2012; originally announced September 2012.