
Showing 1–24 of 24 results for author: Mittal, G

Searching in archive cs.
  1. arXiv:2404.01282

    cs.CV

    LoSA: Long-Short-range Adapter for Scaling End-to-End Temporal Action Localization

    Authors: Akshita Gupta, Gaurav Mittal, Ahmed Magooda, Ye Yu, Graham W. Taylor, Mei Chen

    Abstract: Temporal Action Localization (TAL) involves localizing and classifying action snippets in an untrimmed video. The emergence of large video foundation models has led RGB-only video backbones to outperform previous methods needing both RGB and optical flow modalities. Leveraging these large models is often limited to training only the TAL head due to the prohibitively large GPU memory required to ad…

    Submitted 6 August, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

    Comments: Under submission

  2. arXiv:2402.18085

    cs.SD cs.CR eess.AS

    AI-assisted Tagging of Deepfake Audio Calls using Challenge-Response

    Authors: Govind Mittal, Arthur Jakobsson, Kelly O. Marshall, Chinmay Hegde, Nasir Memon

    Abstract: Scammers are aggressively leveraging AI voice-cloning technology for social engineering attacks, a situation significantly worsened by the advent of audio Real-time Deepfakes (RTDFs). RTDFs can clone a target's voice in real-time over phone calls, making these interactions highly interactive and thus far more convincing. Our research confidently addresses the gap in the existing literature on deep…

    Submitted 28 February, 2024; originally announced February 2024.

    Comments: Dataset will be made public by end of March 2024

  3. arXiv:2308.01508

    cs.LG cs.CR cs.CV

    Circumventing Concept Erasure Methods For Text-to-Image Generative Models

    Authors: Minh Pham, Kelly O. Marshall, Niv Cohen, Govind Mittal, Chinmay Hegde

    Abstract: Text-to-image generative models can produce photo-realistic images for an extremely broad range of concepts, and their usage has proliferated widely among the general public. On the flip side, these models have numerous drawbacks, including their potential to generate images featuring sexually explicit content, mirror artistic styles without permission, or even hallucinate (or deepfake) the likene…

    Submitted 8 October, 2023; v1 submitted 2 August, 2023; originally announced August 2023.

  4. arXiv:2307.12935

    cs.CL cs.AI

    Rule By Example: Harnessing Logical Rules for Explainable Hate Speech Detection

    Authors: Christopher Clarke, Matthew Hall, Gaurav Mittal, Ye Yu, Sandra Sajeev, Jason Mars, Mei Chen

    Abstract: Classic approaches to content moderation typically apply a rule-based heuristic approach to flag content. While rules are easily customizable and intuitive for humans to interpret, they are inherently fragile and lack the flexibility or robustness needed to moderate the vast amount of undesirable content found online today. Recent advances in deep learning have demonstrated the promise of using hi…

    Submitted 24 July, 2023; originally announced July 2023.

    Comments: ACL 2023 Main Conference

  5. arXiv:2307.08585

    cs.CV

    Identity-Preserving Aging of Face Images via Latent Diffusion Models

    Authors: Sudipta Banerjee, Govind Mittal, Ameya Joshi, Chinmay Hegde, Nasir Memon

    Abstract: The performance of automated face recognition systems is inevitably impacted by the facial aging process. However, high quality datasets of individuals collected over several years are typically small in scale. In this work, we propose, train, and validate the use of latent text-to-image diffusion models for synthetically aging and de-aging face images. Our models succeed with few-shot training, a…

    Submitted 17 July, 2023; originally announced July 2023.

    Comments: Accepted to appear in International Joint Conference in Biometrics (IJCB) 2023

  6. arXiv:2306.16410

    cs.CL cs.CV

    Towards Language Models That Can See: Computer Vision Through the LENS of Natural Language

    Authors: William Berrios, Gautam Mittal, Tristan Thrush, Douwe Kiela, Amanpreet Singh

    Abstract: We propose LENS, a modular approach for tackling computer vision problems by leveraging the power of large language models (LLMs). Our system uses a language model to reason over outputs from a set of independent and highly descriptive vision modules that provide exhaustive information about an image. We evaluate the approach on pure computer vision settings such as zero- and few-shot object recog…

    Submitted 28 June, 2023; originally announced June 2023.

  7. arXiv:2305.10547

    cs.CV cs.CY

    Rethinking Multimodal Content Moderation from an Asymmetric Angle with Mixed-modality

    Authors: Jialin Yuan, Ye Yu, Gaurav Mittal, Matthew Hall, Sandra Sajeev, Mei Chen

    Abstract: There is a rapidly growing need for multimodal content moderation (CM) as more and more content on social media is multimodal in nature. Existing unimodal CM systems may fail to catch harmful content that crosses modalities (e.g., memes or videos), which may lead to severe consequences. In this paper, we present a novel CM model, Asymmetric Mixed-Modal Moderation (AM3), to target multimodal and un…

    Submitted 13 December, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

    Comments: Accepted at WACV 2024

  8. arXiv:2210.06186

    cs.CR cs.AI cs.CV

    GOTCHA: Real-Time Video Deepfake Detection via Challenge-Response

    Authors: Govind Mittal, Chinmay Hegde, Nasir Memon

    Abstract: With the rise of AI-enabled Real-Time Deepfakes (RTDFs), the integrity of online video interactions has become a growing concern. RTDFs have now made it feasible to replace an imposter's face with their victim in live video interactions. Such advancement in deepfakes also coaxes detection to rise to the same standard. However, existing deepfake detection techniques are asynchronous and hence ill-s…

    Submitted 23 May, 2024; v1 submitted 12 October, 2022; originally announced October 2022.

    Comments: Accepted to IEEE Euro S&P 2024

  9. arXiv:2208.01159

    cs.CV

    BATMAN: Bilateral Attention Transformer in Motion-Appearance Neighboring Space for Video Object Segmentation

    Authors: Ye Yu, Jialin Yuan, Gaurav Mittal, Li Fuxin, Mei Chen

    Abstract: Video Object Segmentation (VOS) is fundamental to video understanding. Transformer-based methods show significant performance improvement on semi-supervised VOS. However, existing work faces challenges segmenting visually similar objects in close proximity of each other. In this paper, we propose a novel Bilateral Attention Transformer in Motion-Appearance Neighboring space (BATMAN) for semi-super…

    Submitted 7 August, 2022; v1 submitted 1 August, 2022; originally announced August 2022.

    Comments: Accepted by ECCV 2022 (Oral)

  10. arXiv:2206.04668

    cs.CV

    GateHUB: Gated History Unit with Background Suppression for Online Action Detection

    Authors: Junwen Chen, Gaurav Mittal, Ye Yu, Yu Kong, Mei Chen

    Abstract: Online action detection is the task of predicting the action as soon as it happens in a streaming video. A major challenge is that the model does not have access to the future and has to solely rely on the history, i.e., the frames observed so far, to make predictions. It is therefore important to accentuate parts of the history that are more informative to the prediction of the current frame. We…

    Submitted 9 June, 2022; originally announced June 2022.

    Comments: CVPR 2022

  11. Balsa: Learning a Query Optimizer Without Expert Demonstrations

    Authors: Zongheng Yang, Wei-Lin Chiang, Sifei Luan, Gautam Mittal, Michael Luo, Ion Stoica

    Abstract: Query optimizers are a performance-critical component in every database system. Due to their complexity, optimizers take experts months to write and years to refine. In this work, we demonstrate for the first time that learning to optimize queries without learning from an expert optimizer is both possible and efficient. We present Balsa, a query optimizer built by deep reinforcement learning. Bals…

    Submitted 3 May, 2022; v1 submitted 4 January, 2022; originally announced January 2022.

    Comments: SIGMOD 2022; code released at: https://github.com/balsa-project/balsa/

  12. arXiv:2110.12606

    cs.CV

    MUSE: Feature Self-Distillation with Mutual Information and Self-Information

    Authors: Yu Gong, Ye Yu, Gaurav Mittal, Greg Mori, Mei Chen

    Abstract: We present a novel information-theoretic approach to introduce dependency among features of a deep convolutional neural network (CNN). The core idea of our proposed method, called MUSE, is to combine MUtual information and SElf-information to jointly improve the expressivity of all features extracted from different layers in a CNN. We present two variants of the realization of MUSE -- Additive Inf…

    Submitted 24 October, 2021; originally announced October 2021.

    Comments: The 32nd British Machine Vision Conference (BMVC 2021)

  13. arXiv:2109.15317

    cs.CV cs.AI cs.LG

    Unsupervised Few-Shot Action Recognition via Action-Appearance Aligned Meta-Adaptation

    Authors: Jay Patravali, Gaurav Mittal, Ye Yu, Fuxin Li, Mei Chen

    Abstract: We present MetaUVFS as the first Unsupervised Meta-learning algorithm for Video Few-Shot action recognition. MetaUVFS leverages over 550K unlabeled videos to train a two-stream 2D and 3D CNN architecture via contrastive learning to capture the appearance-specific spatial and action-specific spatio-temporal video features respectively. MetaUVFS comprises a novel Action-Appearance Aligned Meta-adapt…

    Submitted 11 October, 2021; v1 submitted 30 September, 2021; originally announced September 2021.

    Comments: ICCV 2021 (Oral)

  14. arXiv:2103.16091

    cs.SD cs.LG eess.AS stat.ML

    Symbolic Music Generation with Diffusion Models

    Authors: Gautam Mittal, Jesse Engel, Curtis Hawthorne, Ian Simon

    Abstract: Score-based generative models and diffusion probabilistic models have been successful at generating high-quality samples in continuous domains such as images and audio. However, due to their Langevin-inspired sampling mechanisms, their application to discrete and sequential data has been limited. In this work, we present a technique for training diffusion models on sequential data by parameterizin…

    Submitted 25 November, 2021; v1 submitted 30 March, 2021; originally announced March 2021.

    Comments: ISMIR 2021

  15. arXiv:2007.08428

    cs.LG cs.CR cs.CV stat.ML

    On Adversarial Robustness: A Neural Architecture Search perspective

    Authors: Chaitanya Devaguptapu, Devansh Agarwal, Gaurav Mittal, Pulkit Gopalani, Vineeth N Balasubramanian

    Abstract: Adversarial robustness of deep learning models has gained much traction in the last few years. Various attacks and defenses are proposed to improve the adversarial robustness of modern-day deep learning architectures. While all these approaches help improve the robustness, one promising direction for improving adversarial robustness is unexplored, i.e., the complex topology of the neural network a…

    Submitted 26 August, 2021; v1 submitted 16 July, 2020; originally announced July 2020.

    Comments: Accepted at the Workshop on Adversarial Robustness in Real-World, ICCV-2021 (previous version accepted at four ICLR-21 Workshops)

  16. arXiv:2005.10524

    cs.CV cs.LG stat.ML

    HyperSTAR: Task-Aware Hyperparameters for Deep Networks

    Authors: Gaurav Mittal, Chang Liu, Nikolaos Karianakis, Victor Fragoso, Mei Chen, Yun Fu

    Abstract: While deep neural networks excel in solving visual recognition tasks, they require significant effort to find hyperparameters that make them work optimally. Hyperparameter Optimization (HPO) approaches have automated the process of finding good hyperparameters but they do not adapt to a given task (task-agnostic), making them computationally inefficient. To reduce HPO time, we present HyperSTAR (S…

    Submitted 21 May, 2020; originally announced May 2020.

    Comments: Published at CVPR 2020 (Oral)

  17. arXiv:1910.04269

    cs.CL cs.LG

    Spoken Language Identification using ConvNets

    Authors: Sarthak, Shikhar Shukla, Govind Mittal

    Abstract: Language Identification (LI) is an important first step in several speech processing systems. With a growing number of voice-based assistants, speech LI has emerged as a widely researched field. To approach the problem of identifying languages, we can either adopt an implicit approach where only the speech for a language is present or an explicit one where text is available with its corresponding…

    Submitted 9 October, 2019; originally announced October 2019.

    Comments: 2019 European Conference on Ambient Intelligence

  18. arXiv:1910.00726

    cs.CV cs.LG eess.AS

    Animating Face using Disentangled Audio Representations

    Authors: Gaurav Mittal, Baoyuan Wang

    Abstract: All previous methods for audio-driven talking head generation assume the input audio to be clean with a neutral tone. As we show empirically, one can easily break these systems by simply adding certain background noise to the utterance or changing its emotional tone (to such as sad). To make talking head generation robust to such variations, we propose an explicit audio representation learning fra…

    Submitted 1 October, 2019; originally announced October 2019.

    Comments: Accepted at WACV 2020 (Winter conference on Applications of Computer Vision)

  19. arXiv:1909.06781

    cs.CR

    A Vector Space Approach to Generate Dynamic Keys for Hill Cipher

    Authors: Sunil Kumar, Sandeep Kumar, Gaurav Mittal, Shiv Narain

    Abstract: In this paper, a variant of the Hill cipher is proposed. In the classical Hill cipher, an invertible matrix is used for encryption but the scheme is vulnerable to the known-plaintext attack which can reveal the matrix. In our proposed cryptosystem, each plaintext block is encrypted by a new invertible key matrix that thwarts the known-plaintext attack. To generate the invertible matrices which ser…

    Submitted 10 May, 2021; v1 submitted 15 September, 2019; originally announced September 2019.

    MSC Class: 11T71; 94A60

  20. arXiv:1908.06148

    cs.CR cs.MM

    FiFTy: Large-scale File Fragment Type Identification using Neural Networks

    Authors: Govind Mittal, Pawel Korus, Nasir Memon

    Abstract: We present FiFTy, a modern file type identification tool for memory forensics and data carving. In contrast to previous approaches based on hand-crafted features, we design a compact neural network architecture, which uses a trainable embedding space, akin to successful natural language processing models. Our approach dispenses with explicit feature extraction which is a bottleneck in legacy syste…

    Submitted 7 June, 2020; v1 submitted 16 August, 2019; originally announced August 2019.

    Comments: Paper accepted for publication in the IEEE Transactions on Information Forensics and Security

  21. arXiv:1905.03743

    cs.CV cs.AI cs.LG

    Interactive Image Generation Using Scene Graphs

    Authors: Gaurav Mittal, Shubham Agrawal, Anuva Agarwal, Sushant Mehta, Tanya Marwah

    Abstract: Recent years have witnessed some exciting developments in the domain of generating images from scene-based text descriptions. These approaches have primarily focused on generating images from a static text description and are limited to generating images in a single pass. They are unable to generate an image interactively based on an incrementally additive text description (something that is more…

    Submitted 9 May, 2019; originally announced May 2019.

    Comments: Published at ICLR 2019 Deep Generative Models for Highly Structured Data Workshop

  22. arXiv:1708.05980

    cs.CV

    Attentive Semantic Video Generation using Captions

    Authors: Tanya Marwah, Gaurav Mittal, Vineeth N. Balasubramanian

    Abstract: This paper proposes a network architecture to perform variable length semantic video generation using captions. We adopt a new perspective towards video generation where we allow the captions to be combined with the long-term and short-term dependencies between video frames and thus generate a video in an incremental manner. Our experiments demonstrate our network architecture's ability to disting…

    Submitted 21 October, 2017; v1 submitted 20 August, 2017; originally announced August 2017.

    Journal ref: Presented at ICCV 2017 (International Conference on Computer Vision)

  23. Sync-DRAW: Automatic Video Generation using Deep Recurrent Attentive Architectures

    Authors: Gaurav Mittal, Tanya Marwah, Vineeth N. Balasubramanian

    Abstract: This paper introduces a novel approach for generating videos called Synchronized Deep Recurrent Attentive Writer (Sync-DRAW). Sync-DRAW can also perform text-to-video generation which, to the best of our knowledge, makes it the first approach of its kind. It combines a Variational Autoencoder (VAE) with a Recurrent Attention Mechanism in a novel manner to create a temporally dependent sequence of…

    Submitted 21 October, 2017; v1 submitted 30 November, 2016; originally announced November 2016.

  24. arXiv:1308.1806

    cs.DC

    A Survey of Current Trends in Distributed, Grid and Cloud Computing

    Authors: Gaurav Mittal, Dr. Nishtha Kesswani, Kuldeep Goswami

    Abstract: Through the 1990s to 2012 the internet changed the world of computing drastically. It started its journey with parallel computing after it advanced to distributed computing and further to grid computing. And in present scenario it creates a new world which is pronounced as a Cloud Computing [1]. These all three terms have different meanings. Cloud computing is based on backward computing schemes l…

    Submitted 8 August, 2013; originally announced August 2013.

    Comments: 6 pages

    Journal ref: International Journal of Advanced Studies in Computers, Science & Engineering (IJASCSE Vol 2, Issue 3, 2013)