Showing 1–50 of 126 results for author: Ranjan, R

Searching in archive cs.
  1. arXiv:2407.12801  [pdf]

    cs.CY cs.HC

    Evaluation of LLMs Biases Towards Elite Universities: A Persona-Based Exploration

    Authors: Shailja Gupta, Rajesh Ranjan

    Abstract: Elite universities are a dream destination for not just students but also top employers who get a supply of amazing talents. When we hear about top universities, the first thing that comes to mind is their academic rigor, prestigious reputation, and highly successful alumni. However, society at large is not just represented by a few elite universities, but several others. We have seen several exam…

    Submitted 24 June, 2024; originally announced July 2024.

    Comments: 14 pages, 4 Figures

  2. arXiv:2407.07254  [pdf, other]

    eess.IV cs.CV

    HAMIL-QA: Hierarchical Approach to Multiple Instance Learning for Atrial LGE MRI Quality Assessment

    Authors: K M Arefeen Sultan, Md Hasibul Husain Hisham, Benjamin Orkild, Alan Morris, Eugene Kholmovski, Erik Bieging, Eugene Kwan, Ravi Ranjan, Ed DiBella, Shireen Elhabian

    Abstract: The accurate evaluation of left atrial fibrosis via high-quality 3D Late Gadolinium Enhancement (LGE) MRI is crucial for atrial fibrillation management but is hindered by factors like patient movement and imaging variability. The pursuit of automated LGE MRI quality assessment is critical for enhancing diagnostic accuracy, standardizing evaluations, and improving patient outcomes. The deep learnin…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: Accepted to MICCAI2024, 10 pages, 2 figures

  3. arXiv:2407.04190  [pdf, other]

    cs.CV

    Computer Vision for Clinical Gait Analysis: A Gait Abnormality Video Dataset

    Authors: Rahm Ranjan, David Ahmedt-Aristizabal, Mohammad Ali Armin, Juno Kim

    Abstract: Clinical gait analysis (CGA) using computer vision is an emerging field in artificial intelligence that faces barriers of accessible, real-world data, and clear task objectives. This paper lays the foundation for current developments in CGA as well as vision-based methods and datasets suitable for gait analysis. We introduce The Gait Abnormality in Video Dataset (GAVD) in response to our review of…

    Submitted 4 July, 2024; originally announced July 2024.

    ACM Class: I.2.10

  4. arXiv:2406.02535  [pdf, other]

    cs.CV

    Enhancing 2D Representation Learning with a 3D Prior

    Authors: Mehmet Aygün, Prithviraj Dhar, Zhicheng Yan, Oisin Mac Aodha, Rakesh Ranjan

    Abstract: Learning robust and effective representations of visual data is a fundamental task in computer vision. Traditionally, this is achieved by training models with labeled data which can be expensive to obtain. Self-supervised learning attempts to circumvent the requirement for labeled data by learning representations from raw unlabeled visual data alone. However, unlike humans who obtain rich 3D infor…

    Submitted 4 June, 2024; originally announced June 2024.

  5. arXiv:2405.20008  [pdf, other]

    cs.CV

    Sharing Key Semantics in Transformer Makes Efficient Image Restoration

    Authors: Bin Ren, Yawei Li, Jingyun Liang, Rakesh Ranjan, Mengyuan Liu, Rita Cucchiara, Luc Van Gool, Ming-Hsuan Yang, Nicu Sebe

    Abstract: Image Restoration (IR), a classic low-level vision task, has witnessed significant advancements through deep models that effectively model global information. Notably, the Vision Transformers (ViTs) emergence has further propelled these advancements. When computing, the self-attention mechanism, a cornerstone of ViTs, tends to encompass all global cues, even those from semantically unrelated objec…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 9 pages

  6. Wearable-based behaviour interpolation for semi-supervised human activity recognition

    Authors: Haoran Duan, Shidong Wang, Varun Ojha, Shizheng Wang, Yawen Huang, Yang Long, Rajiv Ranjan, Yefeng Zheng

    Abstract: While traditional feature engineering for Human Activity Recognition (HAR) involves a trial-and-error process, deep learning has emerged as a preferred method for high-level representations of sensor-based human activities. However, most deep learning-based HAR requires a large amount of labelled data and extracting HAR features from unlabelled data for effective deep learning training remains chal…

    Submitted 24 May, 2024; originally announced May 2024.

  7. arXiv:2405.15914  [pdf, other]

    cs.CV

    ExactDreamer: High-Fidelity Text-to-3D Content Creation via Exact Score Matching

    Authors: Yumin Zhang, Xingyu Miao, Haoran Duan, Bo Wei, Tejal Shah, Yang Long, Rajiv Ranjan

    Abstract: Text-to-3D content creation is a rapidly evolving research area. Given the scarcity of 3D data, current approaches often adapt pre-trained 2D diffusion models for 3D synthesis. Among these approaches, Score Distillation Sampling (SDS) has been widely adopted. However, the issue of over-smoothing poses a significant limitation on the high-fidelity generation of 3D models. To address this challenge,…

    Submitted 24 May, 2024; originally announced May 2024.

  8. arXiv:2405.13900  [pdf, other]

    cs.LG cs.CV

    Rehearsal-free Federated Domain-incremental Learning

    Authors: Rui Sun, Haoran Duan, Jiahua Dong, Varun Ojha, Tejal Shah, Rajiv Ranjan

    Abstract: We introduce a rehearsal-free federated domain incremental learning framework, RefFiL, based on a global prompt-sharing paradigm to alleviate catastrophic forgetting challenges in federated domain-incremental learning, where unseen domains are continually learned. Typical methods for mitigating forgetting, such as the use of additional datasets and the retention of private data from earlier tasks,…

    Submitted 22 May, 2024; originally announced May 2024.

  9. arXiv:2405.13370  [pdf, other]

    eess.IV cs.CV cs.LG

    Low-Resolution Chest X-ray Classification via Knowledge Distillation and Multi-task Learning

    Authors: Yasmeena Akhter, Rishabh Ranjan, Richa Singh, Mayank Vatsa

    Abstract: This research addresses the challenges of diagnosing chest X-rays (CXRs) at low resolutions, a common limitation in resource-constrained healthcare settings. High-resolution CXR imaging is crucial for identifying small but critical anomalies, such as nodules or opacities. However, when images are downsized for processing in Computer-Aided Diagnosis (CAD) systems, vital spatial details and receptiv…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: IEEE ISBI 2024

  10. arXiv:2405.11252  [pdf, other]

    cs.CV

    Dreamer XL: Towards High-Resolution Text-to-3D Generation via Trajectory Score Matching

    Authors: Xingyu Miao, Haoran Duan, Varun Ojha, Jun Song, Tejal Shah, Yang Long, Rajiv Ranjan

    Abstract: In this work, we propose a novel Trajectory Score Matching (TSM) method that aims to solve the pseudo ground truth inconsistency problem caused by the accumulated error in Interval Score Matching (ISM) when using the Denoising Diffusion Implicit Models (DDIM) inversion process. Unlike ISM which adopts the inversion process of DDIM to calculate on a single path, our TSM method leverages the inversi…

    Submitted 18 May, 2024; originally announced May 2024.

  11. arXiv:2405.10674  [pdf, other]

    cs.CV cs.AI

    From Sora What We Can See: A Survey of Text-to-Video Generation

    Authors: Rui Sun, Yumin Zhang, Tejal Shah, Jiahao Sun, Shuoying Zhang, Wenqi Li, Haoran Duan, Bo Wei, Rajiv Ranjan

    Abstract: With impressive achievements made, artificial intelligence is on the path forward to artificial general intelligence. Sora, developed by OpenAI, which is capable of minute-level world-simulative abilities can be considered as a milestone on this developmental path. However, despite its notable successes, Sora still encounters various obstacles that need to be resolved. In this survey, we embark fr…

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: A comprehensive list of text-to-video generation studies in this survey is available at https://github.com/soraw-ai/Awesome-Text-to-Video-Generation

  12. arXiv:2405.02608  [pdf, other]

    cs.CV cs.AI cs.RO

    UnSAMFlow: Unsupervised Optical Flow Guided by Segment Anything Model

    Authors: Shuai Yuan, Lei Luo, Zhuo Hui, Can Pu, Xiaoyu Xiang, Rakesh Ranjan, Denis Demandolx

    Abstract: Traditional unsupervised optical flow methods are vulnerable to occlusions and motion boundaries due to lack of object-level information. Therefore, we propose UnSAMFlow, an unsupervised flow network that also leverages object information from the latest foundation model Segment Anything Model (SAM). We first include a self-supervised semantic augmentation module tailored to SAM masks. We also ana…

    Submitted 4 May, 2024; originally announced May 2024.

    Comments: Accepted by CVPR 2024. Code is available at https://github.com/facebookresearch/UnSAMFlow

  13. IoTSim-Osmosis-RES: Towards autonomic renewable energy-aware osmotic computing

    Authors: Tomasz Szydlo, Amadeusz Szabala, Nazar Kordiumov, Konrad Siuzdak, Lukasz Wolski, Khaled Alwasel, Fawzy Habeeb, Rajiv Ranjan

    Abstract: Internet of Things systems exist in various areas of our everyday life. For example, sensors installed in smart cities and homes are processed in edge and cloud computing centres providing several benefits that improve our lives. The place of data processing is related to the required system response times -- processing data closer to its source results in a shorter system response time. The Osmo…

    Submitted 17 April, 2024; originally announced April 2024.

  14. arXiv:2404.07815  [pdf, other]

    cs.LG cs.AI stat.ML

    Post-Hoc Reversal: Are We Selecting Models Prematurely?

    Authors: Rishabh Ranjan, Saurabh Garg, Mrigank Raman, Carlos Guestrin, Zachary Chase Lipton

    Abstract: Trained models are often composed with post-hoc transforms such as temperature scaling (TS), ensembling and stochastic weight averaging (SWA) to improve performance, robustness, uncertainty estimation, etc. However, such transforms are typically applied only after the base models have already been finalized by standard means. In this paper, we challenge this practice with an extensive empirical st…

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: 9 pages + references + appendix, 7 figures

  15. arXiv:2404.00013  [pdf, other]

    cs.LG cs.AI q-fin.ST stat.AP

    Missing Data Imputation With Granular Semantics and AI-driven Pipeline for Bankruptcy Prediction

    Authors: Debarati Chakraborty, Ravi Ranjan

    Abstract: This work focuses on designing a pipeline for the prediction of bankruptcy. The presence of missing values, high dimensional data, and highly class-imbalance databases are the major challenges in the said task. A new method for missing data imputation with granular semantics has been introduced here. The merits of granular computing have been explored here to define this method. The missing values…

    Submitted 15 March, 2024; originally announced April 2024.

    Comments: 15 pages

  16. arXiv:2403.19495  [pdf, other]

    cs.CV cs.GR

    CoherentGS: Sparse Novel View Synthesis with Coherent 3D Gaussians

    Authors: Avinash Paliwal, Wei Ye, Jinhui Xiong, Dmytro Kotovenko, Rakesh Ranjan, Vikas Chandra, Nima Khademi Kalantari

    Abstract: The field of 3D reconstruction from images has rapidly evolved in the past few years, first with the introduction of Neural Radiance Field (NeRF) and more recently with 3D Gaussian Splatting (3DGS). The latter provides a significant edge over NeRF in terms of the training and inference speed, as well as the reconstruction quality. Although 3DGS works well for dense input images, the unstructured p…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: Project page: https://people.engr.tamu.edu/nimak/Papers/CoherentGS

  17. arXiv:2403.18816  [pdf, other]

    cs.CV

    Garment3DGen: 3D Garment Stylization and Texture Generation

    Authors: Nikolaos Sarafianos, Tuur Stuyck, Xiaoyu Xiang, Yilei Li, Jovan Popovic, Rakesh Ranjan

    Abstract: We introduce Garment3DGen, a new method to synthesize 3D garment assets from a base mesh given a single input image as guidance. Our proposed approach allows users to generate 3D textured clothes based on both real and synthetic images, such as those generated by text prompts. The generated assets can be directly draped and simulated on human bodies. First, we leverage the recent progress of image…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: Project Page: https://nsarafianos.github.io/garment3dgen

  18. arXiv:2403.18730  [pdf, other]

    cs.CV

    Towards Image Ambient Lighting Normalization

    Authors: Florin-Alexandru Vasluianu, Tim Seizinger, Zongwei Wu, Rakesh Ranjan, Radu Timofte

    Abstract: Lighting normalization is a crucial but underexplored restoration task with broad applications. However, existing works often simplify this task within the context of shadow removal, limiting the light sources to one and oversimplifying the scene, thus excluding complex self-shadows and restricting surface classes to smooth ones. Although promising, such simplifications hinder generalizability to…

    Submitted 27 March, 2024; originally announced March 2024.

  19. BFT-DSN: A Byzantine Fault Tolerant Decentralized Storage Network

    Authors: Hechuan Guo, Minghui Xu, Jiahao Zhang, Chunchi Liu, Rajiv Ranjan, Dongxiao Yu, Xiuzhen Cheng

    Abstract: With the rapid development of blockchain and its applications, the amount of data stored on decentralized storage networks (DSNs) has grown exponentially. DSNs bring together affordable storage resources from around the world to provide robust, decentralized storage services for tens of thousands of decentralized applications (dApps). However, existing DSNs do not offer verifiability when implemen…

    Submitted 20 February, 2024; originally announced February 2024.

    Comments: 11 pages, 8 figures

  20. arXiv:2402.12712  [pdf, other]

    cs.CV

    MVDiffusion++: A Dense High-resolution Multi-view Diffusion Model for Single or Sparse-view 3D Object Reconstruction

    Authors: Shitao Tang, Jiacheng Chen, Dilin Wang, Chengzhou Tang, Fuyang Zhang, Yuchen Fan, Vikas Chandra, Yasutaka Furukawa, Rakesh Ranjan

    Abstract: This paper presents a neural architecture MVDiffusion++ for 3D object reconstruction that synthesizes dense and high-resolution views of an object given one or a few images without camera poses. MVDiffusion++ achieves superior flexibility and scalability with two surprisingly simple ideas: 1) A "pose-free architecture" where standard self-attention among 2D latent features learns 3D consistency…

    Submitted 30 April, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

    Comments: 3D generation, project page: https://mvdiffusion-plusplus.github.io/

  21. arXiv:2402.09233  [pdf, other]

    cs.RO cs.AI cs.MA eess.SY math.OC

    Design and Realization of a Benchmarking Testbed for Evaluating Autonomous Platooning Algorithms

    Authors: Michael Shaham, Risha Ranjan, Engin Kirda, Taskin Padir

    Abstract: Autonomous vehicle platoons present near- and long-term opportunities to enhance operational efficiencies and save lives. The past 30 years have seen rapid development in the autonomous driving space, enabling new technologies that will alleviate the strain placed on human drivers and reduce vehicle emissions. This paper introduces a testbed for evaluating and benchmarking platooning algorithms on…

    Submitted 14 February, 2024; originally announced February 2024.

    Comments: To be published in International Symposium on Experimental Robotics, 2023

  22. arXiv:2402.02634  [pdf, other]

    cs.CV cs.LG eess.IV

    Key-Graph Transformer for Image Restoration

    Authors: Bin Ren, Yawei Li, Jingyun Liang, Rakesh Ranjan, Mengyuan Liu, Rita Cucchiara, Luc Van Gool, Nicu Sebe

    Abstract: While it is crucial to capture global information for effective image restoration (IR), integrating such cues into transformer-based methods becomes computationally expensive, especially with high input resolution. Furthermore, the self-attention mechanism in transformers is prone to considering unnecessary global cues from unrelated objects or regions, introducing computational inefficiencies. In…

    Submitted 4 February, 2024; originally announced February 2024.

    Comments: 9 pages, 6 figures

  23. arXiv:2402.00863  [pdf, other]

    cs.CV

    Geometry Transfer for Stylizing Radiance Fields

    Authors: Hyunyoung Jung, Seonghyeon Nam, Nikolaos Sarafianos, Sungjoo Yoo, Alexander Sorkine-Hornung, Rakesh Ranjan

    Abstract: Shape and geometric patterns are essential in defining stylistic identity. However, current 3D style transfer methods predominantly focus on transferring colors and textures, often overlooking geometric aspects. In this paper, we introduce Geometry Transfer, a novel method that leverages geometric deformation for 3D style transfer. This technique employs depth maps to extract a style guide, subseq…

    Submitted 6 April, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

    Comments: CVPR 2024. Project page: https://hyblue.github.io/geo-srf/

  24. arXiv:2401.00909  [pdf, other]

    cs.CV cs.LG

    Taming Mode Collapse in Score Distillation for Text-to-3D Generation

    Authors: Peihao Wang, Dejia Xu, Zhiwen Fan, Dilin Wang, Sreyas Mohan, Forrest Iandola, Rakesh Ranjan, Yilei Li, Qiang Liu, Zhangyang Wang, Vikas Chandra

    Abstract: Despite the remarkable performance of score distillation in text-to-3D generation, such techniques notoriously suffer from view inconsistency issues, also known as "Janus" artifact, where the generated objects fake each view with multiple front faces. Although empirically effective methods have approached this problem via score debiasing or prompt engineering, a more rigorous perspective to explai…

    Submitted 29 March, 2024; v1 submitted 31 December, 2023; originally announced January 2024.

    Comments: Project page: https://vita-group.github.io/3D-Mode-Collapse/

  25. arXiv:2401.00604  [pdf, other]

    cs.CV

    SteinDreamer: Variance Reduction for Text-to-3D Score Distillation via Stein Identity

    Authors: Peihao Wang, Zhiwen Fan, Dejia Xu, Dilin Wang, Sreyas Mohan, Forrest Iandola, Rakesh Ranjan, Yilei Li, Qiang Liu, Zhangyang Wang, Vikas Chandra

    Abstract: Score distillation has emerged as one of the most prevalent approaches for text-to-3D asset synthesis. Essentially, score distillation updates 3D parameters by lifting and back-propagating scores averaged over different views. In this paper, we reveal that the gradient estimation in score distillation is inherent to high variance. Through the lens of variance reduction, the effectiveness of SDS an…

    Submitted 29 March, 2024; v1 submitted 31 December, 2023; originally announced January 2024.

    Comments: Project page: https://vita-group.github.io/SteinDreamer/

  26. arXiv:2312.14239  [pdf, other]

    cs.CV eess.IV

    PlatoNeRF: 3D Reconstruction in Plato's Cave via Single-View Two-Bounce Lidar

    Authors: Tzofi Klinghoffer, Xiaoyu Xiang, Siddharth Somasundaram, Yuchen Fan, Christian Richardt, Ramesh Raskar, Rakesh Ranjan

    Abstract: 3D reconstruction from a single-view is challenging because of the ambiguity from monocular cues and lack of information about occluded regions. Neural radiance fields (NeRF), while popular for view synthesis and 3D reconstruction, are typically reliant on multi-view images. Existing methods for single-view 3D reconstruction with NeRF rely on either data priors to hallucinate views of occluded reg…

    Submitted 5 April, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

    Comments: CVPR 2024. Project Page: https://platonerf.github.io/

  27. arXiv:2312.04615  [pdf, other]

    cs.LG cs.DB

    Relational Deep Learning: Graph Representation Learning on Relational Databases

    Authors: Matthias Fey, Weihua Hu, Kexin Huang, Jan Eric Lenssen, Rishabh Ranjan, Joshua Robinson, Rex Ying, Jiaxuan You, Jure Leskovec

    Abstract: Much of the world's most valued data is stored in relational databases and data warehouses, where the data is organized into many tables connected by primary-foreign key relations. However, building machine learning models using this data is both challenging and time consuming. The core problem is that no machine learning method is capable of learning on multiple tables interconnected by primary-f…

    Submitted 7 December, 2023; originally announced December 2023.

    Comments: https://relbench.stanford.edu

  28. arXiv:2312.03640  [pdf, other]

    eess.IV cs.CV

    Training Neural Networks on RAW and HDR Images for Restoration Tasks

    Authors: Lei Luo, Alexandre Chapiro, Xiaoyu Xiang, Yuchen Fan, Rakesh Ranjan, Rafal Mantiuk

    Abstract: The vast majority of standard image and video content available online is represented in display-encoded color spaces, in which pixel values are conveniently scaled to a limited range (0-1) and the color distribution is approximately perceptually uniform. In contrast, both camera RAW and high dynamic range (HDR) images are often represented in linear color spaces, in which color values are linearl…

    Submitted 6 December, 2023; originally announced December 2023.

  29. arXiv:2311.11325  [pdf, other]

    cs.CV eess.IV

    MoVideo: Motion-Aware Video Generation with Diffusion Models

    Authors: Jingyun Liang, Yuchen Fan, Kai Zhang, Radu Timofte, Luc Van Gool, Rakesh Ranjan

    Abstract: While recent years have witnessed great progress on using diffusion models for video generation, most of them are simple extensions of image generation frameworks, which fail to explicitly consider one of the key differences between videos and images, i.e., motion. In this paper, we propose a novel motion-aware video generation (MoVideo) framework that takes motion into consideration from two aspe…

    Submitted 19 November, 2023; originally announced November 2023.

    Comments: project homepage: https://jingyunliang.github.io/MoVideo

  30. arXiv:2310.09653  [pdf, other]

    cs.SD cs.AI eess.AS

    SelfVC: Voice Conversion With Iterative Refinement using Self Transformations

    Authors: Paarth Neekhara, Shehzeen Hussain, Rafael Valle, Boris Ginsburg, Rishabh Ranjan, Shlomo Dubnov, Farinaz Koushanfar, Julian McAuley

    Abstract: We propose SelfVC, a training strategy to iteratively improve a voice conversion model with self-synthesized examples. Previous efforts on voice conversion focus on factorizing speech into explicitly disentangled representations that separately encode speaker characteristics and linguistic content. However, disentangling speech representations to capture such attributes using task-specific loss te…

    Submitted 3 May, 2024; v1 submitted 14 October, 2023; originally announced October 2023.

    Comments: Accepted at ICML 2024

  31. Two-Stage Deep Learning Framework for Quality Assessment of Left Atrial Late Gadolinium Enhanced MRI Images

    Authors: K M Arefeen Sultan, Benjamin Orkild, Alan Morris, Eugene Kholmovski, Erik Bieging, Eugene Kwan, Ravi Ranjan, Ed DiBella, Shireen Elhabian

    Abstract: Accurate assessment of left atrial fibrosis in patients with atrial fibrillation relies on high-quality 3D late gadolinium enhancement (LGE) MRI images. However, obtaining such images is challenging due to patient motion, changing breathing patterns, or sub-optimal choice of pulse sequence parameters. Automated assessment of LGE-MRI image diagnostic quality is clinically significant as it would en…

    Submitted 12 October, 2023; originally announced October 2023.

    Comments: Accepted to STACOM 2023. 11 pages, 3 figures

  32. arXiv:2308.15266  [pdf, other]

    cs.CV

    NOVIS: A Case for End-to-End Near-Online Video Instance Segmentation

    Authors: Tim Meinhardt, Matt Feiszli, Yuchen Fan, Laura Leal-Taixe, Rakesh Ranjan

    Abstract: Until recently, the Video Instance Segmentation (VIS) community operated under the common belief that offline methods are generally superior to a frame by frame online processing. However, the recent success of online methods questions this belief, in particular, for challenging and long video sequences. We understand this work as a rebuttal of those recent observations and an appeal to the commun…

    Submitted 18 September, 2023; v1 submitted 29 August, 2023; originally announced August 2023.

  33. arXiv:2307.06669  [pdf, other]

    cs.SD cs.CR eess.AS

    Uncovering the Deceptions: An Analysis on Audio Spoofing Detection and Future Prospects

    Authors: Rishabh Ranjan, Mayank Vatsa, Richa Singh

    Abstract: Audio has become an increasingly crucial biometric modality due to its ability to provide an intuitive way for humans to interact with machines. It is currently being used for a range of applications, including person authentication to banking to virtual assistants. Research has shown that these systems are also susceptible to spoofing and attacks. Therefore, protecting audio processing systems ag…

    Submitted 13 July, 2023; originally announced July 2023.

    Comments: Accepted in IJCAI 2023

  34. arXiv:2305.07214  [pdf, other]

    cs.CV cs.AI

    MMG-Ego4D: Multi-Modal Generalization in Egocentric Action Recognition

    Authors: Xinyu Gong, Sreyas Mohan, Naina Dhingra, Jean-Charles Bazin, Yilei Li, Zhangyang Wang, Rakesh Ranjan

    Abstract: In this paper, we study a novel problem in egocentric action recognition, which we term as "Multimodal Generalization" (MMG). MMG aims to study how systems can generalize when data from certain modalities is limited or even completely missing. We thoroughly investigate MMG in the context of standard supervised action recognition and the more challenging few-shot setting for learning new action cat…

    Submitted 11 May, 2023; originally announced May 2023.

    Comments: Accepted to CVPR 2023

  35. arXiv:2304.10537  [pdf, other]

    cs.CV cs.GR

    Learning Neural Duplex Radiance Fields for Real-Time View Synthesis

    Authors: Ziyu Wan, Christian Richardt, Aljaž Božič, Chao Li, Vijay Rengarajan, Seonghyeon Nam, Xiaoyu Xiang, Tuotuo Li, Bo Zhu, Rakesh Ranjan, Jing Liao

    Abstract: Neural radiance fields (NeRFs) enable novel view synthesis with unprecedented visual quality. However, to render photorealistic images, NeRFs require hundreds of deep multilayer perceptron (MLP) evaluations - for each pixel. This is prohibitively expensive and makes real-time rendering infeasible, even on powerful modern GPUs. In this paper, we propose a novel approach to distill and bake NeRFs in…

    Submitted 20 April, 2023; originally announced April 2023.

    Comments: CVPR 2023. Project page: http://raywzy.com/NDRF

  36. arXiv:2303.16493  [pdf, other]

    cs.CV

    AnyFlow: Arbitrary Scale Optical Flow with Implicit Neural Representation

    Authors: Hyunyoung Jung, Zhuo Hui, Lei Luo, Haitao Yang, Feng Liu, Sungjoo Yoo, Rakesh Ranjan, Denis Demandolx

    Abstract: To apply optical flow in practice, it is often necessary to resize the input to smaller dimensions in order to reduce computational costs. However, downsizing inputs makes the estimation more challenging because objects and motion ranges become smaller. Even though recent approaches have demonstrated high-quality flow estimation, they tend to fail to accurately model small objects and precise boun…

    Submitted 29 March, 2023; originally announced March 2023.

    Comments: CVPR 2023 (Highlight)

  37. arXiv:2303.00748  [pdf, other]

    cs.CV

    Efficient and Explicit Modelling of Image Hierarchies for Image Restoration

    Authors: Yawei Li, Yuchen Fan, Xiaoyu Xiang, Denis Demandolx, Rakesh Ranjan, Radu Timofte, Luc Van Gool

    Abstract: The aim of this paper is to propose a mechanism to efficiently and explicitly model image hierarchies in the global, regional, and local range for image restoration. To achieve that, we start by analyzing two important properties of natural images including cross-scale similarity and anisotropic image features. Inspired by that, we propose the anchored stripe self-attention which achieves a good b…

    Submitted 25 May, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

    Comments: Accepted by CVPR 2023. 12 pages, 7 figures, 11 tables

  38. arXiv:2212.07265  [pdf, other]

    cs.CR

    Cross-Channel: Scalable Off-Chain Channels Supporting Fair and Atomic Cross-Chain Operations

    Authors: Yihao Guo, Minghui Xu, Dongxiao Yu, Yong Yu, Rajiv Ranjan, Xiuzhen Cheng

    Abstract: Cross-chain technology facilitates the interoperability among isolated blockchains on which users can freely communicate and transfer values. Existing cross-chain protocols suffer from the scalability problem when processing on-chain transactions. Off-chain channels, as a promising blockchain scaling technique, can enable micro-payment transactions without involving on-chain transaction settlement…

    Submitted 14 December, 2022; originally announced December 2022.

  39. arXiv:2212.03961  [pdf, other]

    cs.CV

    FSID: Fully Synthetic Image Denoising via Procedural Scene Generation

    Authors: Gyeongmin Choe, Beibei Du, Seonghyeon Nam, Xiaoyu Xiang, Bo Zhu, Rakesh Ranjan

    Abstract: For low-level computer vision and image processing ML tasks, training on large datasets is critical for generalization. However, the standard practice of relying on real-world images primarily from the Internet comes with image quality, scalability, and privacy issues, especially in commercial contexts. To address this, we have developed a procedural synthetic data generation pipeline and dataset…

    Submitted 7 December, 2022; originally announced December 2022.

  40. arXiv:2212.01747  [pdf, other]

    cs.CV

    Fast Point Cloud Generation with Straight Flows

    Authors: Lemeng Wu, Dilin Wang, Chengyue Gong, Xingchao Liu, Yunyang Xiong, Rakesh Ranjan, Raghuraman Krishnamoorthi, Vikas Chandra, Qiang Liu

    Abstract: Diffusion models have emerged as a powerful tool for point cloud generation. A key component that drives the impressive performance for generating high-quality samples from noise is iteratively denoise for thousands of steps. While beneficial, the complexity of learning steps has limited its applications to many 3D real-world. To address this limitation, we propose Point Straight Flow (PSF), a mod…

    Submitted 4 December, 2022; originally announced December 2022.

  41. arXiv:2211.08658  [pdf, other]

    eess.IV cs.CV

    Consistent Direct Time-of-Flight Video Depth Super-Resolution

    Authors: Zhanghao Sun, Wei Ye, Jinhui Xiong, Gyeongmin Choe, Jialiang Wang, Shuochen Su, Rakesh Ranjan

    Abstract: Direct time-of-flight (dToF) sensors are promising for next-generation on-device 3D sensing. However, limited by manufacturing capabilities in a compact module, the dToF data has a low spatial resolution (e.g., $\sim 20\times30$ for iPhone dToF), and it requires a super-resolution step before being passed to downstream tasks. In this paper, we solve this super-resolution problem by fusing the low-…

    Submitted 3 May, 2023; v1 submitted 15 November, 2022; originally announced November 2022.

  42. arXiv:2210.13994  [pdf, other]

    cs.CV

    Minutiae-Guided Fingerprint Embeddings via Vision Transformers

    Authors: Steven A. Grosz, Joshua J. Engelsma, Rajeev Ranjan, Naveen Ramakrishnan, Manoj Aggarwal, Gerard G. Medioni, Anil K. Jain

    Abstract: Minutiae matching has long dominated the field of fingerprint recognition. However, deep networks can be used to extract fixed-length embeddings from fingerprints. To date, the few studies that have explored the use of CNN architectures to extract such embeddings have shown extreme promise. Inspired by these early works, we propose the first use of a Vision Transformer (ViT) to learn a discriminat…

    Submitted 25 October, 2022; v1 submitted 25 October, 2022; originally announced October 2022.

  43. arXiv:2210.09082  [pdf, other]

    cs.LG

    A Solver-Free Framework for Scalable Learning in Neural ILP Architectures

    Authors: Yatin Nandwani, Rishabh Ranjan, Mausam, Parag Singla

    Abstract: There is a recent focus on designing architectures that have an Integer Linear Programming (ILP) layer within a neural model (referred to as Neural ILP in this paper). Neural ILP architectures are suitable for pure reasoning tasks that require data-driven constraint learning or for tasks requiring both perception (neural) and reasoning (ILP). A recent SOTA approach for end-to-end training of Neura…

    Submitted 13 January, 2023; v1 submitted 17 October, 2022; originally announced October 2022.

    Comments: Accepted at NeurIPS 2022

  44. [Re] Differentiable Spatial Planning using Transformers

    Authors: Rohit Ranjan, Himadri Bhakta, Animesh Jha, Parv Maheshwari, Debashish Chakravarty

    Abstract: This report covers our reproduction effort of the paper 'Differentiable Spatial Planning using Transformers' by Chaplot et al. In this paper, the problem of spatial path planning in a differentiable way is considered. They show that their proposed method of using Spatial Planning Transformers outperforms prior data-driven models and leverages differentiable structures to learn mapping without a…

    Submitted 19 August, 2022; originally announced August 2022.

    Journal ref: ReScience C 8.2 (#34) 2022

  45. arXiv:2206.02146  [pdf, other]

    cs.CV eess.IV

    Recurrent Video Restoration Transformer with Guided Deformable Attention

    Authors: Jingyun Liang, Yuchen Fan, Xiaoyu Xiang, Rakesh Ranjan, Eddy Ilg, Simon Green, Jiezhang Cao, Kai Zhang, Radu Timofte, Luc Van Gool

    Abstract: Video restoration aims at restoring multiple high-quality frames from multiple low-quality frames. Existing video restoration methods generally fall into two extreme cases, i.e., they either restore all frames in parallel or restore the video frame by frame in a recurrent way, which would result in different merits and drawbacks. Typically, the former has the advantage of temporal information fusi…

    Submitted 12 November, 2022; v1 submitted 5 June, 2022; originally announced June 2022.

    Comments: Accepted by NeurIPS 2022. Code: https://github.com/JingyunLiang/RVRT

  46. arXiv:2204.10183  [pdf, other]

    cs.LG cs.DC

    Multi-Component Optimization and Efficient Deployment of Neural-Networks on Resource-Constrained IoT Hardware

    Authors: Bharath Sudharsan, Dineshkumar Sundaram, Pankesh Patel, John G. Breslin, Muhammad Intizar Ali, Schahram Dustdar, Albert Zomaya, Rajiv Ranjan

    Abstract: The majority of IoT devices like smartwatches, smart plugs, HVAC controllers, etc., are powered by hardware with a constrained specification (low memory, clock speed and processor) which is insufficient to accommodate and execute large, high-quality models. On such resource-constrained devices, manufacturers still manage to provide attractive functionalities (to boost sales) by following the tradi…

    Submitted 20 April, 2022; originally announced April 2022.

  47. arXiv:2203.14863  [pdf, other]

    cs.CV cs.MM

    HIME: Efficient Headshot Image Super-Resolution with Multiple Exemplars

    Authors: Xiaoyu Xiang, Jon Morton, Fitsum A Reda, Lucas Young, Federico Perazzi, Rakesh Ranjan, Amit Kumar, Andrea Colaco, Jan Allebach

    Abstract: A promising direction for recovering the lost information in low-resolution headshot images is utilizing a set of high-resolution exemplars from the same identity. Complementary images in the reference set can improve the generated headshot quality across many different views and poses. However, it is challenging to make the best use of multiple exemplars: the quality and alignment of each exempla…

    Submitted 28 March, 2022; originally announced March 2022.

    Comments: Technical Report

  48. arXiv:2203.08140  [pdf, other]

    cs.CV cs.AI cs.MM

    Learning Spatio-Temporal Downsampling for Effective Video Upscaling

    Authors: Xiaoyu Xiang, Yapeng Tian, Vijay Rengarajan, Lucas Young, Bo Zhu, Rakesh Ranjan

    Abstract: Downsampling is one of the most basic image processing operations. Improper spatio-temporal downsampling applied on videos can cause aliasing issues such as moiré patterns in space and the wagon-wheel effect in time. Consequently, the inverse task of upscaling a low-resolution, low frame-rate video in space and time becomes a challenging ill-posed problem due to information loss and aliasing artif…

    Submitted 15 March, 2022; originally announced March 2022.

    Comments: Main paper: 13 pages, 8 figures; appendix: 8 pages, 10 figures

    ACM Class: I.2; I.4.3; I.4.4

  49. arXiv:2201.12288  [pdf, other]

    cs.CV eess.IV

    VRT: A Video Restoration Transformer

    Authors: Jingyun Liang, Jiezhang Cao, Yuchen Fan, Kai Zhang, Rakesh Ranjan, Yawei Li, Radu Timofte, Luc Van Gool

    Abstract: Video restoration (e.g., video super-resolution) aims to restore high-quality frames from low-quality frames. Different from single image restoration, video restoration generally requires to utilize temporal information from multiple adjacent but usually misaligned video frames. Existing deep methods generally tackle with this by exploiting a sliding window strategy or a recurrent architecture, wh…

    Submitted 15 June, 2022; v1 submitted 28 January, 2022; originally announced January 2022.

    Comments: add results on VFI and STVSR; SOTA results (+up to 2.16dB) on video SR, video deblurring, video denoising, video frame interpolation and space-time video super-resolution. Code: https://github.com/JingyunLiang/VRT

  50. Just Enough, Just in Time, Just for "Me": Fundamental Principles for Engineering IoT-native Software Systems

    Authors: Zheng Li, Rajiv Ranjan

    Abstract: By seamlessly integrating everyday objects and by changing the way we interact with our surroundings, Internet of Things (IoT) is drastically improving the life quality of households and enhancing the productivity of businesses. Given the unique IoT characteristics, IoT applications have emerged distinctively from the mainstream application types. Inspired by the outlook of a programmable world, w…

    Submitted 24 January, 2022; originally announced January 2022.

    Comments: 5 pages

    Journal ref: Proceedings of the 44th International Conference on Software Engineering (ICSE 2022)