Showing 1–4 of 4 results for author: Rimchala, J

Searching in archive cs.
  1. arXiv:2407.03604  [pdf, other]

    cs.CL cs.CV

    Lateralization LoRA: Interleaved Instruction Tuning with Modality-Specialized Adaptations

    Authors: Zhiyang Xu, Minqian Liu, Ying Shen, Joy Rimchala, Jiaxin Zhang, Qifan Wang, Yu Cheng, Lifu Huang

    Abstract: Recent advancements in Vision-Language Models (VLMs) have led to the development of Vision-Language Generalists (VLGs) capable of understanding and generating interleaved images and text. Despite these advances, VLGs still struggle to follow user instructions for interleaved text and image generation. To address this issue, we introduce LeafInstruct, the first open-sourced interleaved instruction…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: 8 Pages, visual instruction tuning, parameter-efficient tuning

  2. arXiv:2406.14643  [pdf, other]

    cs.CV cs.AI cs.CL

    Holistic Evaluation for Interleaved Text-and-Image Generation

    Authors: Minqian Liu, Zhiyang Xu, Zihao Lin, Trevor Ashby, Joy Rimchala, Jiaxin Zhang, Lifu Huang

    Abstract: Interleaved text-and-image generation has been an intriguing research direction, where the models are required to generate both images and text pieces in an arbitrary order. Despite the emerging advancements in interleaved generation, the progress in its evaluation still significantly lags behind. Existing evaluation benchmarks do not support arbitrarily interleaved images and text for both inputs…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Work in progress. 13 pages, 5 figures, 6 tables

  3. arXiv:2311.09625  [pdf, other]

    cs.CV

    DECDM: Document Enhancement using Cycle-Consistent Diffusion Models

    Authors: Jiaxin Zhang, Joy Rimchala, Lalla Mouatadid, Kamalika Das, Sricharan Kumar

    Abstract: The performance of optical character recognition (OCR) heavily relies on document image quality, which is crucial for automatic document processing and document intelligence. However, most existing document enhancement methods require supervised data pairs, which raises concerns about data separation and privacy protection, and makes it challenging to adapt these methods to new domain pairs. To ad…

    Submitted 16 November, 2023; originally announced November 2023.

    Comments: Accepted by IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2024)

  4. arXiv:2305.14590  [pdf, other]

    cs.CL cs.AI

    RE$^2$: Region-Aware Relation Extraction from Visually Rich Documents

    Authors: Pritika Ramu, Sijia Wang, Lalla Mouatadid, Joy Rimchala, Lifu Huang

    Abstract: Current research in form understanding predominantly relies on large pre-trained language models, necessitating extensive data for pre-training. However, the importance of layout structure (i.e., the spatial relationship between the entity blocks in the visually rich document) to relation extraction has been overlooked. In this paper, we propose REgion-Aware Relation Extraction (RE$^2$) that lever…

    Submitted 3 June, 2024; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: NAACL 2024