Showing 1–50 of 303 results for author: Patel, M

Searching in archive cs.
  1. arXiv:2408.06447  [pdf, other]

    cs.CV

    S-SAM: SVD-based Fine-Tuning of Segment Anything Model for Medical Image Segmentation

    Authors: Jay N. Paranjape, Shameema Sikder, S. Swaroop Vedula, Vishal M. Patel

    Abstract: Medical image segmentation has been traditionally approached by training or fine-tuning the entire model to cater to any new modality or dataset. However, this approach often requires tuning a large number of parameters during training. With the introduction of the Segment Anything Model (SAM) for prompted segmentation of natural images, many efforts have been made towards adapting it efficiently…

    Submitted 12 August, 2024; originally announced August 2024.

    Comments: Accepted in MICCAI 2024

  2. arXiv:2407.21783  [pdf, other]

    cs.AI cs.CL cs.CV

    The Llama 3 Herd of Models

    Authors: Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, Anirudh Goyal, Anthony Hartshorn, Aobo Yang, Archi Mitra, Archie Sravankumar, Artem Korenev, Arthur Hinsvark, Arun Rao, Aston Zhang, Aurelien Rodriguez, Austen Gregerson, Ava Spataru, Baptiste Roziere, Bethany Biron, Binh Tang, et al. (510 additional authors not shown)

    Abstract: Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical…

    Submitted 15 August, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

  3. arXiv:2407.15373  [pdf, other]

    cs.HC

    avaTTAR: Table Tennis Stroke Training with On-body and Detached Visualization in Augmented Reality

    Authors: Dizhi Ma, Xiyun Hu, Jingyu Shi, Mayank Patel, Rahul Jain, Ziyi Liu, Zhengzhe Zhu, Karthik Ramani

    Abstract: Table tennis stroke training is a critical aspect of player development. We designed a new augmented reality (AR) system, avaTTAR, for table tennis stroke training. The system provides both "on-body" (first-person view) and "detached" (third-person view) visual cues, enabling users to visualize target strokes and correct their attempts effectively with this dual-perspective setup. By employing a…

    Submitted 26 July, 2024; v1 submitted 22 July, 2024; originally announced July 2024.

  4. arXiv:2407.09781  [pdf, other]

    cs.CV

    Dense Multimodal Alignment for Open-Vocabulary 3D Scene Understanding

    Authors: Ruihuang Li, Zhengqiang Zhang, Chenhang He, Zhiyuan Ma, Vishal M. Patel, Lei Zhang

    Abstract: Recent vision-language pre-training models have exhibited remarkable generalization ability in zero-shot recognition tasks. Previous open-vocabulary 3D scene understanding methods mostly focus on training 3D models using either image or text supervision while neglecting the collective strength of all modalities. In this work, we propose a Dense Multimodal Alignment (DMA) framework to densely co-em…

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  5. arXiv:2407.07220  [pdf, other]

    cs.CV cs.GR

    Reference-based Controllable Scene Stylization with Gaussian Splatting

    Authors: Yiqun Mei, Jiacong Xu, Vishal M. Patel

    Abstract: Reference-based scene stylization that edits the appearance based on a content-aligned reference image is an emerging research area. Starting with a pretrained neural radiance field (NeRF), existing methods typically learn a novel appearance that matches the given style. Despite their effectiveness, they inherently suffer from time-consuming volume rendering, and thus are impractical for many rea…

    Submitted 9 July, 2024; originally announced July 2024.

  6. arXiv:2407.06839  [pdf, other]

    cs.CV

    A Mamba-based Siamese Network for Remote Sensing Change Detection

    Authors: Jay N. Paranjape, Celso de Melo, Vishal M. Patel

    Abstract: Change detection in remote sensing images is an essential tool for analyzing a region at different times. It finds varied applications in monitoring environmental changes, man-made changes as well as corresponding decision-making and prediction of future trends. Deep learning methods like Convolutional Neural Networks (CNNs) and Transformers have achieved remarkable success in detecting significan…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: 11 pages, 7 figures

  7. arXiv:2407.06187  [pdf, other]

    cs.CV cs.GR

    JeDi: Joint-Image Diffusion Models for Finetuning-Free Personalized Text-to-Image Generation

    Authors: Yu Zeng, Vishal M. Patel, Haochen Wang, Xun Huang, Ting-Chun Wang, Ming-Yu Liu, Yogesh Balaji

    Abstract: Personalized text-to-image generation models enable users to create images that depict their individual possessions in diverse scenes, finding applications in various domains. To achieve the personalization capability, existing methods rely on finetuning a text-to-image foundation model on a user's custom dataset, which can be non-trivial for general users, resource-intensive, and time-consuming.…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: CVPR 24

  8. arXiv:2406.17396  [pdf, other]

    cs.CV

    SyncNoise: Geometrically Consistent Noise Prediction for Text-based 3D Scene Editing

    Authors: Ruihuang Li, Liyi Chen, Zhengqiang Zhang, Varun Jampani, Vishal M. Patel, Lei Zhang

    Abstract: Text-based 2D diffusion models have demonstrated impressive capabilities in image generation and editing. Meanwhile, the 2D diffusion models also exhibit substantial potential for 3D editing tasks. However, how to achieve consistent edits across multiple viewpoints remains a challenge. While the iterative dataset update method is capable of achieving global consistency, it suffers from slow conve…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 16 pages, 13 figures

  9. arXiv:2406.13237  [pdf, other]

    cs.CV

    ModelMix: A New Model-Mixup Strategy to Minimize Vicinal Risk across Tasks for Few-scribble based Cardiac Segmentation

    Authors: Ke Zhang, Vishal M. Patel

    Abstract: Pixel-level dense labeling is both resource-intensive and time-consuming, whereas weak labels such as scribble present a more feasible alternative to full annotations. However, training segmentation networks with weak supervision from scribbles remains challenging. Inspired by the fact that different segmentation tasks can be correlated with each other, we introduce a new approach to few-scribble…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 10 pages, 3 figures

  10. arXiv:2406.10373  [pdf, other]

    cs.CV cs.GR

    Wild-GS: Real-Time Novel View Synthesis from Unconstrained Photo Collections

    Authors: Jiacong Xu, Yiqun Mei, Vishal M. Patel

    Abstract: Photographs captured in unstructured tourist environments frequently exhibit variable appearances and transient occlusions, challenging accurate scene reconstruction and inducing artifacts in novel view synthesis. Although prior approaches have integrated the Neural Radiance Field (NeRF) with additional learnable modules to handle the dynamic appearances and eliminate transient objects, their exte…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 15 pages, 7 figures

  11. arXiv:2406.02549  [pdf, other]

    cs.CV

    Dreamguider: Improved Training free Diffusion-based Conditional Generation

    Authors: Nithin Gopalakrishnan Nair, Vishal M Patel

    Abstract: Diffusion models have emerged as a formidable tool for training-free conditional generation. However, a key hurdle in inference-time guidance techniques is the need for compute-heavy backpropagation through the diffusion network for estimating the guidance direction. Moreover, these techniques often require handcrafted parameter tuning on a case-by-case basis. Although some recent works have introd…

    Submitted 4 June, 2024; originally announced June 2024.

  12. arXiv:2405.11708  [pdf, other]

    cs.LG cs.CV

    Adaptive Batch Normalization Networks for Adversarial Robustness

    Authors: Shao-Yuan Lo, Vishal M. Patel

    Abstract: Deep networks are vulnerable to adversarial examples. Adversarial Training (AT) has been a standard foundation of modern adversarial defense approaches due to its remarkable effectiveness. However, AT is extremely time-consuming, preventing its wide deployment in practical applications. In this paper, we aim at a non-AT defense: How to design a defense method that gets rid of AT but is still r…

    Submitted 26 May, 2024; v1 submitted 19 May, 2024; originally announced May 2024.

    Comments: Accepted at IEEE International Conference on Advanced Video and Signal-based Surveillance (AVSS) 2024

  13. arXiv:2405.11056  [pdf, other]

    cs.GR cs.LG

    A Comparative Study of Garment Draping Techniques

    Authors: Prerana Achar, Mayank Patel, Anushka Mulik, Neha Katre, Stevina Dias, Chirag Raman

    Abstract: We present a comparison review that evaluates popular techniques for garment draping for 3D fashion design, virtual try-ons, and animations. A comparative study is performed between various methods for garment draping of clothing over the human body. These include numerous models, such as physics and machine learning based techniques, collision handling, and more. Performance evaluations and trade…

    Submitted 17 May, 2024; originally announced May 2024.

  14. arXiv:2405.10913  [pdf, other]

    cs.CV

    Blackbox Adaptation for Medical Image Segmentation

    Authors: Jay N. Paranjape, Shameema Sikder, S. Swaroop Vedula, Vishal M. Patel

    Abstract: In recent years, various large foundation models have been proposed for image segmentation. These models are often trained on large amounts of data corresponding to general computer vision tasks. Hence, these models do not perform well on medical data. There have been some attempts in the literature to perform parameter-efficient finetuning of such foundation models for medical image segmentation.…

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: Accepted early at MICCAI 2024

  15. arXiv:2405.10456  [pdf, other]

    cs.CV

    Region-level labels in ice charts can produce pixel-level segmentation for Sea Ice types

    Authors: Muhammed Patel, Xinwei Chen, Linlin Xu, Yuhao Chen, K Andrea Scott, David A. Clausi

    Abstract: Fully supervised deep learning approaches have demonstrated impressive accuracy in sea ice classification, but their dependence on high-resolution labels presents a significant challenge due to the difficulty of obtaining such data. In response, our weakly supervised learning method provides a compelling alternative by utilizing lower-resolution regional labels from expert-annotated ice charts. Th…

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: Published at ICLR 2024 Machine Learning for Remote Sensing (ML4RS) Workshop

  16. arXiv:2404.14406  [pdf, other]

    cs.CV

    Hyp-OC: Hyperbolic One Class Classification for Face Anti-Spoofing

    Authors: Kartik Narayan, Vishal M. Patel

    Abstract: Face recognition technology has become an integral part of modern security systems and user authentication processes. However, these systems are vulnerable to spoofing attacks and can easily be circumvented. Most prior research in face anti-spoofing (FAS) approaches it as a two-class classification task where models are trained on real samples and known spoof attacks and tested for detection perfo…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: Accepted in FG2024, Project Page - https://kartik-3004.github.io/hyp-oc/

  17. arXiv:2404.12450  [pdf, other]

    cs.CV cs.AI cs.LG

    Enhancing AI Diagnostics: Autonomous Lesion Masking via Semi-Supervised Deep Learning

    Authors: Ting-Ruen Wei, Michele Hell, Dang Bich Thuy Le, Aren Vierra, Ran Pang, Mahesh Patel, Young Kang, Yuling Yan

    Abstract: This study presents an unsupervised domain adaptation method aimed at autonomously generating image masks outlining regions of interest (ROIs) for differentiating breast lesions in breast ultrasound (US) imaging. Our semi-supervised learning approach utilizes a primitive model trained on a small public breast US dataset with true annotations. This model is then iteratively refined for the domain a…

    Submitted 18 April, 2024; originally announced April 2024.

  18. arXiv:2404.12368  [pdf, other]

    cs.CV cs.LG

    Gradient-Regularized Out-of-Distribution Detection

    Authors: Sina Sharifi, Taha Entesari, Bardia Safaei, Vishal M. Patel, Mahyar Fazlyab

    Abstract: One of the challenges for neural networks in real-life applications is the overconfident errors these models make when the data is not from the original training distribution. Addressing this issue is known as Out-of-Distribution (OOD) detection. Many state-of-the-art OOD methods employ an auxiliary dataset as a surrogate for OOD data during training to achieve improved performance. However,…

    Submitted 23 July, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

    Comments: Accepted to ECCV 2024

  19. arXiv:2404.11764  [pdf, other]

    cs.CV

    Multimodal 3D Object Detection on Unseen Domains

    Authors: Deepti Hegde, Suhas Lohit, Kuan-Chuan Peng, Michael J. Jones, Vishal M. Patel

    Abstract: LiDAR datasets for autonomous driving exhibit biases in properties such as point cloud density, range, and object dimensions. As a result, object detection networks trained and evaluated in different environments often experience performance degradation. Domain adaptation approaches assume access to unannotated samples from the test distribution to address this problem. However, in the real world,…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: technical report

  20. arXiv:2404.11737  [pdf, other]

    cs.CV

    Equivariant Spatio-Temporal Self-Supervision for LiDAR Object Detection

    Authors: Deepti Hegde, Suhas Lohit, Kuan-Chuan Peng, Michael J. Jones, Vishal M. Patel

    Abstract: Popular representation learning methods encourage feature invariance under transformations applied at the input. However, in 3D perception tasks like object localization and segmentation, outputs are naturally equivariant to some transformations, such as rotation. Using pre-training loss functions that encourage equivariance of features under certain transformations provides a strong self-supervis…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: technical report

  21. arXiv:2404.09977  [pdf, other]

    cs.CV

    MaxFusion: Plug&Play Multi-Modal Generation in Text-to-Image Diffusion Models

    Authors: Nithin Gopalakrishnan Nair, Jeya Maria Jose Valanarasu, Vishal M Patel

    Abstract: Large diffusion-based Text-to-Image (T2I) models have shown impressive generative powers for text-to-image generation as well as spatially conditioned image generation. For most applications, we can train the model end-to-end with paired data to obtain photorealistic generation quality. However, to add an additional task, one often needs to retrain the model from scratch using paired data across al…

    Submitted 15 April, 2024; originally announced April 2024.

  22. arXiv:2404.09976  [pdf, other]

    cs.CV

    Diffscaler: Enhancing the Generative Prowess of Diffusion Transformers

    Authors: Nithin Gopalakrishnan Nair, Jeya Maria Jose Valanarasu, Vishal M. Patel

    Abstract: Recently, diffusion transformers have gained wide attention for their excellent performance in text-to-image and text-to-video models, emphasizing the need for transformers as backbone for diffusion models. Transformer-based models have shown better generalization capability compared to CNN-based models for general vision tasks. However, much less has been explored in the existing literature regard…

    Submitted 15 April, 2024; originally announced April 2024.

  23. arXiv:2404.01367  [pdf, other]

    cs.CV cs.LG

    Bigger is not Always Better: Scaling Properties of Latent Diffusion Models

    Authors: Kangfu Mei, Zhengzhong Tu, Mauricio Delbracio, Hossein Talebi, Vishal M. Patel, Peyman Milanfar

    Abstract: We study the scaling properties of latent diffusion models (LDMs) with an emphasis on their sampling efficiency. While improved network architecture and inference algorithms have been shown to effectively boost sampling efficiency of diffusion models, the role of model size -- a critical determinant of sampling efficiency -- has not been thoroughly examined. Through empirical analysis of established te…

    Submitted 1 April, 2024; originally announced April 2024.

  24. arXiv:2403.19593  [pdf, other]

    cs.CV

    Frame by Familiar Frame: Understanding Replication in Video Diffusion Models

    Authors: Aimon Rahman, Malsha V. Perera, Vishal M. Patel

    Abstract: Building on the momentum of image generation diffusion models, there is an increasing interest in video-based diffusion models. However, video generation poses greater challenges due to its higher-dimensional nature, the scarcity of training data, and the complex spatiotemporal relationships involved. Image generation models, due to their extensive data requirements, have already strained computat…

    Submitted 28 March, 2024; originally announced March 2024.

  25. arXiv:2403.19549  [pdf, other]

    cs.CV cs.RO

    GlORIE-SLAM: Globally Optimized RGB-only Implicit Encoding Point Cloud SLAM

    Authors: Ganlin Zhang, Erik Sandström, Youmin Zhang, Manthan Patel, Luc Van Gool, Martin R. Oswald

    Abstract: Recent advancements in RGB-only dense Simultaneous Localization and Mapping (SLAM) have predominantly utilized grid-based neural implicit encodings and/or struggle to efficiently realize global map and pose consistency. To this end, we propose an efficient RGB-only dense SLAM system using a flexible neural point cloud scene representation that adapts to keyframe poses and depth updates, without ne…

    Submitted 27 May, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

  26. arXiv:2403.14513  [pdf, other]

    cs.CV

    View-decoupled Transformer for Person Re-identification under Aerial-ground Camera Network

    Authors: Quan Zhang, Lei Wang, Vishal M. Patel, Xiaohua Xie, Jianhuang Lai

    Abstract: Existing person re-identification methods have achieved remarkable advances in appearance-based identity association across homogeneous cameras, such as ground-ground matching. However, as a more practical scenario, aerial-ground person re-identification (AGPReID) among heterogeneous cameras has received minimal attention. To alleviate the disruption of discriminative identity representation by dr…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: CVPR 2024

  27. arXiv:2403.14053  [pdf, other]

    cs.CV cs.GR

    Leveraging Thermal Modality to Enhance Reconstruction in Low-Light Conditions

    Authors: Jiacong Xu, Mingqian Liao, K Ram Prabhakar, Vishal M. Patel

    Abstract: Neural Radiance Fields (NeRF) accomplishes photo-realistic novel view synthesis by learning the implicit volumetric representation of a scene from multi-view images, which faithfully convey the colorimetric information. However, sensor noise will contaminate low-value pixel signals, and the lossy camera image signal processor will further remove near-zero intensities in extremely dark situations,…

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: 25 pages, 13 figures

  28. arXiv:2403.12960  [pdf, other]

    cs.CV

    FaceXFormer: A Unified Transformer for Facial Analysis

    Authors: Kartik Narayan, Vibashan VS, Rama Chellappa, Vishal M. Patel

    Abstract: In this work, we introduce FaceXformer, an end-to-end unified transformer model for a comprehensive range of facial analysis tasks such as face parsing, landmark detection, head pose estimation, attributes recognition, and estimation of age, gender, race, and landmarks visibility. Conventional methods in face analysis have often relied on task-specific designs and preprocessing techniques, which l…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: Project page: https://kartik-3004.github.io/facexformer_web/

  29. arXiv:2403.09632  [pdf, other]

    cs.CV

    Holo-Relighting: Controllable Volumetric Portrait Relighting from a Single Image

    Authors: Yiqun Mei, Yu Zeng, He Zhang, Zhixin Shu, Xuaner Zhang, Sai Bi, Jianming Zhang, HyunJoon Jung, Vishal M. Patel

    Abstract: At the core of portrait photography is the search for ideal lighting and viewpoint. The process often requires advanced knowledge in photography and an elaborate studio setup. In this work, we propose Holo-Relighting, a volumetric relighting method that is capable of synthesizing novel viewpoints, and novel lighting from a single image. Holo-Relighting leverages the pretrained 3D GAN (EG3D) to rec…

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: CVPR2024

  30. arXiv:2403.06978  [pdf, other]

    cs.CV

    Attention Prompt Tuning: Parameter-efficient Adaptation of Pre-trained Models for Spatiotemporal Modeling

    Authors: Wele Gedara Chaminda Bandara, Vishal M. Patel

    Abstract: In this paper, we introduce Attention Prompt Tuning (APT) - a computationally efficient variant of prompt tuning for video-based applications such as action recognition. Prompt tuning approaches involve injecting a set of learnable prompts along with data tokens during fine-tuning while keeping the backbone frozen. This approach greatly reduces the number of learnable parameters compared to full t…

    Submitted 11 March, 2024; originally announced March 2024.

    Comments: Accepted at the 18th IEEE International Conference on Automatic Face and Gesture Recognition (FG'24); code available at https://github.com/wgcban/apt; 12 pages, 8 figures, 6 tables

  31. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1110 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 8 August, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  32. arXiv:2402.19341  [pdf, other]

    cs.RO cs.CV

    RoadRunner -- Learning Traversability Estimation for Autonomous Off-road Driving

    Authors: Jonas Frey, Manthan Patel, Deegan Atha, Julian Nubert, David Fan, Ali Agha, Curtis Padgett, Patrick Spieler, Marco Hutter, Shehryar Khattak

    Abstract: Autonomous navigation at high speeds in off-road environments necessitates robots to comprehensively understand their surroundings using onboard sensing only. The extreme conditions posed by the off-road setting can cause degraded camera image quality due to poor lighting and motion blur, as well as limited sparse geometric information available from LiDAR sensing when driving at high speeds. In t…

    Submitted 30 August, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

    Comments: accepted for IEEE Transactions on Field Robotics (T-FR)

  33. arXiv:2402.17207  [pdf, other]

    cs.CV

    Deployment Prior Injection for Run-time Calibratable Object Detection

    Authors: Mo Zhou, Yiding Yang, Haoxiang Li, Vishal M. Patel, Gang Hua

    Abstract: With a strong alignment between the training and test distributions, object relation as a context prior facilitates object detection. Yet, it turns into a harmful but inevitable training set bias upon test distributions that shift differently across space and time. Nevertheless, the existing detectors cannot incorporate deployment context prior during the test phase without parameter update. Such…

    Submitted 26 February, 2024; originally announced February 2024.

  34. arXiv:2402.08697  [pdf, other]

    eess.IV cs.CV

    Weakly Supervised Detection of Pheochromocytomas and Paragangliomas in CT

    Authors: David C. Oluigboa, Bikash Santra, Tejas Sudharshan Mathai, Pritam Mukherjee, Jianfei Liu, Abhishek Jha, Mayank Patel, Karel Pacak, Ronald M. Summers

    Abstract: Pheochromocytomas and Paragangliomas (PPGLs) are rare adrenal and extra-adrenal tumors which have the potential to metastasize. For the management of patients with PPGLs, CT is the modality of choice for precise localization and estimation of their progression. However, due to the myriad variations in size, morphology, and appearance of the tumors in different anatomical regions, radiolo…

    Submitted 12 February, 2024; originally announced February 2024.

    Comments: Accepted at SPIE 2024. arXiv admin note: text overlap with arXiv:2402.00175

  35. arXiv:2402.05195  [pdf, other]

    cs.CV cs.CL

    $λ$-ECLIPSE: Multi-Concept Personalized Text-to-Image Diffusion Models by Leveraging CLIP Latent Space

    Authors: Maitreya Patel, Sangmin Jung, Chitta Baral, Yezhou Yang

    Abstract: Despite the recent advances in personalized text-to-image (P-T2I) generative models, it remains challenging to perform finetuning-free multi-subject-driven T2I in a resource-efficient manner. Predominantly, contemporary approaches, involving the training of Hypernetworks and Multimodal Large Language Models (MLLMs), require heavy computing resources that range from 600 to 12300 GPU hours of traini…

    Submitted 9 April, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

    Comments: Project page: https://eclipse-t2i.github.io/Lambda-ECLIPSE/

  36. arXiv:2402.02263  [pdf, other]

    cs.LG cs.AI cs.CV

    MixedNUTS: Training-Free Accuracy-Robustness Balance via Nonlinearly Mixed Classifiers

    Authors: Yatong Bai, Mo Zhou, Vishal M. Patel, Somayeh Sojoudi

    Abstract: Adversarial robustness often comes at the cost of degraded accuracy, impeding the real-life application of robust classification models. Training-based solutions for better trade-offs are limited by incompatibilities with already-trained high-performance large models, necessitating the exploration of training-free ensemble approaches. Observing that robust models are more confident in correct pred…

    Submitted 12 April, 2024; v1 submitted 3 February, 2024; originally announced February 2024.

    MSC Class: 68T07

  37. arXiv:2401.16279  [pdf, other]

    cs.AR

    Rethinking the Producer-Consumer Relationship in Modern DRAM-Based Systems

    Authors: Minesh Patel, Taha Shahroodi, Aditya Manglik, Abdullah Giray Yağlıkçı, Ataberk Olgun, Haocong Luo, Onur Mutlu

    Abstract: Generational improvements to commodity DRAM throughout half a century have long solidified its prevalence as main memory across the computing industry. However, overcoming today's DRAM technology scaling challenges requires new solutions driven by both DRAM producers and consumers. In this paper, we observe that the separation of concerns between producers and consumers specified by industry-wide…

    Submitted 29 January, 2024; originally announced January 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2204.10378

  38. arXiv:2401.08497  [pdf, other]

    cs.RO

    Battery-Swapping Multi-Agent System for Sustained Operation of Large Planetary Fleets

    Authors: Ethan Holand, Jarrod Homer, Alex Storrer, Musheeera Khandeker, Ethan F. Muhlon, Maulik Patel, Ben-oni Vainqueur, David Antaki, Naomi Cooke, Chloe Wilson, Bahram Shafai, Nathaniel Hanson, Taşkın Padır

    Abstract: We propose a novel, heterogeneous multi-agent architecture that miniaturizes rovers by outsourcing power generation to a central hub. By delegating power generation and distribution functions to this hub, the size, weight, power, and cost (SWAP-C) per rover are reduced, enabling efficient fleet scaling. As these rovers conduct mission tasks around the terrain, the hub charges an array of replaceme…

    Submitted 16 January, 2024; originally announced January 2024.

    Comments: 15 pages, 12 figures. To be published in IEEE Aerospace Conference 2024

  39. arXiv:2401.00972  [pdf]

    cs.LG cs.CY stat.AP

    Robust Meta-Model for Predicting the Need for Blood Transfusion in Non-traumatic ICU Patients

    Authors: Alireza Rafiei, Ronald Moore, Tilendra Choudhary, Curtis Marshall, Geoffrey Smith, John D. Roback, Ravi M. Patel, Cassandra D. Josephson, Rishikesan Kamaleswaran

    Abstract: Objective: Blood transfusions, crucial in managing anemia and coagulopathy in ICU settings, require accurate prediction for effective resource allocation and patient risk assessment. However, existing clinical decision support systems have primarily targeted a particular patient demographic with unique medical conditions and focused on a single type of blood transfusion. This study aims to develop…

    Submitted 1 January, 2024; originally announced January 2024.

  40. arXiv:2312.14952  [pdf, other]

    cs.CV eess.IV

    A Cascaded Neural Network System For Rating Student Performance In Surgical Knot Tying Simulation

    Authors: Yunzhe Xue, Olanrewaju Eletta, Justin W. Ady, Nell M. Patel, Advaith Bongu, Usman Roshan

    Abstract: As part of their training all medical students and residents have to pass basic surgical tasks such as knot tying, needle-passing, and suturing. Their assessment is typically performed in the operating room by surgical faculty where mistakes and failure by the student increases the operation time and cost. This evaluation is quantitative and has a low margin of error. Simulation has emerged as a c…

    Submitted 9 December, 2023; originally announced December 2023.

    Comments: To appear in proceedings of 11th IEEE International Conference on Healthcare Informatics (ICHI) 2023

  41. arXiv:2312.14126  [pdf, other]

    cs.CV

    Entropic Open-set Active Learning

    Authors: Bardia Safaei, Vibashan VS, Celso M. de Melo, Vishal M. Patel

    Abstract: Active Learning (AL) aims to enhance the performance of deep models by selecting the most informative samples for annotation from a pool of unlabeled data. Despite impressive performance in closed-set settings, most AL methods fail in real-world scenarios where the unlabeled data contains unknown categories. Recently, a few studies have attempted to tackle the AL problem for the open-set setting.…

    Submitted 21 December, 2023; originally announced December 2023.

    Comments: Accepted in AAAI 2024

  42. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee , et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  43. arXiv:2312.11414  [pdf, other]

    cs.AI

    Animal-AI 3: What's New & Why You Should Care

    Authors: Konstantinos Voudouris, Ibrahim Alhas, Wout Schellaert, Matthew Crosby, Joel Holmes, John Burden, Niharika Chaubey, Niall Donnelly, Matishalin Patel, Marta Halina, José Hernández-Orallo, Lucy G. Cheke

    Abstract: The Animal-AI Environment is a unique game-based research platform designed to serve both the artificial intelligence and cognitive science research communities. In this paper, we present Animal-AI 3, the latest version of the environment, outlining several major new features that make the game more engaging for humans and more complex for AI systems. New features include interactive buttons, rewa…

    Submitted 18 December, 2023; originally announced December 2023.

  44. arXiv:2312.04655  [pdf, other]

    cs.CV

    ECLIPSE: A Resource-Efficient Text-to-Image Prior for Image Generations

    Authors: Maitreya Patel, Changhoon Kim, Sheng Cheng, Chitta Baral, Yezhou Yang

    Abstract: Text-to-image (T2I) diffusion models, notably the unCLIP models (e.g., DALL-E-2), achieve state-of-the-art (SOTA) performance on various compositional T2I benchmarks, at the cost of significant computational resources. The unCLIP stack comprises T2I prior and diffusion image decoder. The T2I prior model alone adds a billion parameters compared to the Latent Diffusion Models, which increases the co…

    Submitted 7 December, 2023; originally announced December 2023.

    Comments: Project Page: https://eclipse-t2i.vercel.app/

  45. arXiv:2312.02156  [pdf, other]

    cs.CV cs.AI

    Latent Feature-Guided Diffusion Models for Shadow Removal

    Authors: Kangfu Mei, Luis Figueroa, Zhe Lin, Zhihong Ding, Scott Cohen, Vishal M. Patel

    Abstract: Recovering textures under shadows has remained a challenging problem due to the difficulty of inferring shadow-free scenes from shadow images. In this paper, we propose the use of diffusion models as they offer a promising approach to gradually refine the details of shadow regions during the diffusion process. Our method improves this process by conditioning on a learned latent feature space that…

    Submitted 4 December, 2023; originally announced December 2023.

    Comments: Project page: https://kfmei.page/shadow-diffusion/index.html

  46. arXiv:2312.02151  [pdf, other]

    cs.CV cs.AI cs.LG

    Guarding Barlow Twins Against Overfitting with Mixed Samples

    Authors: Wele Gedara Chaminda Bandara, Celso M. De Melo, Vishal M. Patel

    Abstract: Self-supervised Learning (SSL) aims to learn transferable feature representations for downstream applications without relying on labeled data. The Barlow Twins algorithm, renowned for its widespread adoption and straightforward implementation compared to its counterparts like contrastive learning methods, minimizes feature redundancy while maximizing invariance to common corruptions. Optimizing fo…

    Submitted 4 December, 2023; originally announced December 2023.

    Comments: Code and checkpoints are available at: https://github.com/wgcban/mix-bt.git

  47. arXiv:2312.00909  [pdf, other]

    cs.IR cs.AI

    LLM-TAKE: Theme Aware Keyword Extraction Using Large Language Models

    Authors: Reza Yousefi Maragheh, Chenhao Fang, Charan Chand Irugu, Parth Parikh, Jason Cho, Jianpeng Xu, Saranyan Sukumar, Malay Patel, Evren Korpeoglu, Sushant Kumar, Kannan Achan

    Abstract: Keyword extraction is one of the core tasks in natural language processing. Classic extraction models are notorious for having a short attention span, which makes it hard for them to conclude relational connections among the words and sentences that are far from each other. This, in turn, makes their usage prohibitive for generating keywords that are inferred from the context of the whole text. In t…

    Submitted 1 December, 2023; originally announced December 2023.

  48. arXiv:2310.16825  [pdf, other]

    cs.CV cs.CY

    CommonCanvas: An Open Diffusion Model Trained with Creative-Commons Images

    Authors: Aaron Gokaslan, A. Feder Cooper, Jasmine Collins, Landan Seguin, Austin Jacobson, Mihir Patel, Jonathan Frankle, Cory Stephenson, Volodymyr Kuleshov

    Abstract: We assemble a dataset of Creative-Commons-licensed (CC) images, which we use to train a set of open diffusion models that are qualitatively competitive with Stable Diffusion 2 (SD2). This task presents two challenges: (1) high-resolution CC images lack the captions necessary to train text-to-image generative models; (2) CC images are relatively scarce. In turn, to address these challenges, we use…

    Submitted 25 October, 2023; originally announced October 2023.

  49. arXiv:2310.01407  [pdf, other]

    cs.CV cs.AI cs.LG

    CoDi: Conditional Diffusion Distillation for Higher-Fidelity and Faster Image Generation

    Authors: Kangfu Mei, Mauricio Delbracio, Hossein Talebi, Zhengzhong Tu, Vishal M. Patel, Peyman Milanfar

    Abstract: Large generative diffusion models have revolutionized text-to-image generation and offer immense potential for conditional generation tasks such as image enhancement, restoration, editing, and compositing. However, their widespread adoption is hindered by the high computational cost, which limits their real-time application. To address this challenge, we introduce a novel method dubbed CoDi, that…

    Submitted 17 February, 2024; v1 submitted 2 October, 2023; originally announced October 2023.

  50. arXiv:2310.00815  [pdf]

    cs.DB

    ReAcTable: Enhancing ReAct for Table Question Answering

    Authors: Yunjia Zhang, Jordan Henkel, Avrilia Floratou, Joyce Cahoon, Shaleen Deep, Jignesh M. Patel

    Abstract: Table Question Answering (TQA) presents a substantial challenge at the intersection of natural language processing and data analytics. This task involves answering natural language (NL) questions on top of tabular data, demanding proficiency in logical reasoning, understanding of data semantics, and fundamental analytical capabilities. Due to its significance, a substantial volume of research has…

    Submitted 1 October, 2023; originally announced October 2023.