Search | arXiv e-print repository

Unveiling Glitches: A Deep Dive into Image Encoding Bugs within CLIP

Authors: Ayush Ranjan, Daniel Wen, Karthik Bhat

Abstract: Understanding the limitations and weaknesses of state-of-the-art models in artificial intelligence is crucial for their improvement and responsible application. In this research, we focus on CLIP, a model renowned for its integration of vision and language processing. Our objective is to uncover recurring problems and blind spots in CLIP's image comprehension. By delving into both the commonalities and disparities between CLIP and human image understanding, we augment our comprehension of these models' capabilities. Through our analysis, we reveal significant discrepancies in CLIP's interpretation of images compared to human perception, shedding light on areas requiring improvement. Our methodologies, the Discrepancy Analysis Framework (DAF) and the Transformative Caption Analysis for CLIP (TCAC), enable a comprehensive evaluation of CLIP's performance. We identify 14 systemic faults, including Action vs. Stillness confusion, Failure to identify the direction of movement or positioning of objects in the image, Hallucination of Water-like Features, Misattribution of Geographic Context, among others. By addressing these limitations, we lay the groundwork for the development of more accurate and nuanced image embedding models, contributing to advancements in artificial intelligence.

Submitted 30 June, 2024; originally announced July 2024.

ACM Class: F.2.2; I.2.7

arXiv:2405.10183 [pdf, other]

A Guide to Tracking Phylogenies in Parallel and Distributed Agent-based Evolution Models

Authors: Matthew Andres Moreno, Anika Ranjan, Emily Dolson, Luis Zaman

Abstract: Computer simulations are an important tool for studying the mechanics of biological evolution. In particular, in silico work with agent-based models provides an opportunity to collect high-quality records of ancestry relationships among simulated agents. Such phylogenies can provide insight into evolutionary dynamics within these simulations. Existing work generally tracks lineages directly, yielding an exact phylogenetic record of evolutionary history. However, direct tracking can be inefficient for large-scale, many-processor evolutionary simulations. An alternate approach to extracting phylogenetic information from simulation that scales more favorably is post hoc estimation, akin to how bioinformaticians build phylogenies by assessing genetic similarities between organisms. Recently introduced "hereditary stratigraphy" algorithms provide means for efficient inference of phylogenetic history from non-coding annotations on simulated organisms' genomes. A number of options exist in configuring hereditary stratigraphy methodology, but no work has yet tested how they impact reconstruction quality. To address this question, we surveyed reconstruction accuracy under alternate configurations across a matrix of evolutionary conditions varying in selection pressure, spatial structure, and ecological dynamics. We synthesize results from these experiments to suggest a prescriptive system of best practices for work with hereditary stratigraphy, ultimately guiding researchers in choosing appropriate instrumentation for large-scale simulation studies.

Submitted 16 May, 2024; originally announced May 2024.

arXiv:2403.16247 [pdf, other]

Improving Sequence-to-Sequence Models for Abstractive Text Summarization Using Meta Heuristic Approaches

Authors: Aditya Saxena, Ashutosh Ranjan

Abstract: As human society transitions into the information age, reduction in our attention span is a contingency, and people who spend time reading lengthy news articles are decreasing rapidly and the need for succinct information is higher than ever before. Therefore, it is essential to provide a quick overview of important news by concisely summarizing the top news article and the most intuitive headline. When humans try to make summaries, they extract the essential information from the source and add useful phrases and grammatical annotations from the original extract. Humans have a unique ability to create abstractions. However, automatic summarization is a complicated problem to solve. The use of sequence-to-sequence (seq2seq) models for neural abstractive text summarization has been ascending as far as prevalence. Numerous innovative strategies have been proposed to develop the current seq2seq models further, permitting them to handle different issues like saliency, familiarity, and human lucidness and create excellent synopses. In this article, we aimed toward enhancing the present architectures and models for abstractive text summarization. The modifications have been aimed at fine-tuning hyper-parameters, attempting specific encoder-decoder combinations. We examined many experiments on an extensively used CNN/DailyMail dataset to check the effectiveness of various models.

Submitted 24 March, 2024; originally announced March 2024.

arXiv:2403.01410 [pdf, other]

Barrier Functions Inspired Reward Shaping for Reinforcement Learning

Authors: Nilaksh Nilaksh, Abhishek Ranjan, Shreenabh Agrawal, Aayush Jain, Pushpak Jagtap, Shishir Kolathaya

Abstract: Reinforcement Learning (RL) has progressed from simple control tasks to complex real-world challenges with large state spaces. While RL excels in these tasks, training time remains a limitation. Reward shaping is a popular solution, but existing methods often rely on value functions, which face scalability issues. This paper presents a novel safety-oriented reward-shaping framework inspired by barrier functions, offering simplicity and ease of implementation across various environments and tasks. To evaluate the effectiveness of the proposed reward formulations, we conduct simulation experiments on CartPole, Ant, and Humanoid environments, along with real-world deployment on the Unitree Go1 quadruped robot. Our results demonstrate that our method leads to 1.4-2.8 times faster convergence and as low as 50-60% actuation effort compared to the vanilla reward. In a sim-to-real experiment with the Go1 robot, we demonstrated better control and dynamics of the bot with our reward framework.

Submitted 1 April, 2024; v1 submitted 3 March, 2024; originally announced March 2024.

Comments: 7 pages, 10 figures, Accepted as contributed paper at ICRA 2024

ACM Class: I.2.9

arXiv:2312.11537 [pdf, other]

FastSR-NeRF: Improving NeRF Efficiency on Consumer Devices with A Simple Super-Resolution Pipeline

Authors: Chien-Yu Lin, Qichen Fu, Thomas Merth, Karren Yang, Anurag Ranjan

Abstract: Super-resolution (SR) techniques have recently been proposed to upscale the outputs of neural radiance fields (NeRF) and generate high-quality images with enhanced inference speeds. However, existing NeRF+SR methods increase training overhead by using extra input features, loss functions, and/or expensive training procedures such as knowledge distillation. In this paper, we aim to leverage SR for efficiency gains without costly training or architectural changes. Specifically, we build a simple NeRF+SR pipeline that directly combines existing modules, and we propose a lightweight augmentation technique, random patch sampling, for training. Compared to existing NeRF+SR methods, our pipeline mitigates the SR computing overhead and can be trained up to 23x faster, making it feasible to run on consumer devices such as the Apple MacBook. Experiments show our pipeline can upscale NeRF outputs by 2-4x while maintaining high quality, increasing inference speeds by up to 18x on an NVIDIA V100 GPU and 12.8x on an M1 Pro chip. We conclude that SR can be a simple but effective technique for improving the efficiency of NeRF models for consumer devices.

Submitted 20 December, 2023; v1 submitted 15 December, 2023; originally announced December 2023.

Comments: WACV 2024 (Oral)

arXiv:2311.18168 [pdf, other]

Probabilistic Speech-Driven 3D Facial Motion Synthesis: New Benchmarks, Methods, and Applications

Authors: Karren D. Yang, Anurag Ranjan, Jen-Hao Rick Chang, Raviteja Vemulapalli, Oncel Tuzel

Abstract: We consider the task of animating 3D facial geometry from speech signal. Existing works are primarily deterministic, focusing on learning a one-to-one mapping from speech signal to 3D face meshes on small datasets with limited speakers. While these models can achieve high-quality lip articulation for speakers in the training set, they are unable to capture the full and diverse distribution of 3D facial motions that accompany speech in the real world. Importantly, the relationship between speech and facial motion is one-to-many, containing both inter-speaker and intra-speaker variations and necessitating a probabilistic approach. In this paper, we identify and address key challenges that have so far limited the development of probabilistic models: lack of datasets and metrics that are suitable for training and evaluating them, as well as the difficulty of designing a model that generates diverse results while remaining faithful to a strong conditioning signal as speech. We first propose large-scale benchmark datasets and metrics suitable for probabilistic modeling. Then, we demonstrate a probabilistic model that achieves both diversity and fidelity to speech, outperforming other methods across the proposed benchmarks. Finally, we showcase useful applications of probabilistic models trained on these large-scale datasets: we can generate diverse speech-driven 3D facial motion that matches unseen speaker styles extracted from reference clips; and our synthetic meshes can be used to improve the performance of downstream audio-visual models.

Submitted 29 November, 2023; originally announced November 2023.

arXiv:2311.17910 [pdf, other]

HUGS: Human Gaussian Splats

Authors: Muhammed Kocabas, Jen-Hao Rick Chang, James Gabriel, Oncel Tuzel, Anurag Ranjan

Abstract: Recent advances in neural rendering have improved both training and rendering times by orders of magnitude. While these methods demonstrate state-of-the-art quality and speed, they are designed for photogrammetry of static scenes and do not generalize well to freely moving humans in the environment. In this work, we introduce Human Gaussian Splats (HUGS) that represents an animatable human together with the scene using 3D Gaussian Splatting (3DGS). Our method takes only a monocular video with a small number of (50-100) frames, and it automatically learns to disentangle the static scene and a fully animatable human avatar within 30 minutes. We utilize the SMPL body model to initialize the human Gaussians. To capture details that are not modeled by SMPL (e.g. cloth, hairs), we allow the 3D Gaussians to deviate from the human body model. Utilizing 3D Gaussians for animated humans brings new challenges, including the artifacts created when articulating the Gaussians. We propose to jointly optimize the linear blend skinning weights to coordinate the movements of individual Gaussians during animation. Our approach enables novel-pose synthesis of human and novel view synthesis of both the human and the scene. We achieve state-of-the-art rendering quality with a rendering speed of 60 FPS while being ~100x faster to train over previous work. Our code will be announced here: https://github.com/apple/ml-hugs

Submitted 29 November, 2023; originally announced November 2023.

arXiv:2310.15130 [pdf, other]

Novel-View Acoustic Synthesis from 3D Reconstructed Rooms

Authors: Byeongjoo Ahn, Karren Yang, Brian Hamilton, Jonathan Sheaffer, Anurag Ranjan, Miguel Sarabia, Oncel Tuzel, Jen-Hao Rick Chang

Abstract: We investigate the benefit of combining blind audio recordings with 3D scene information for novel-view acoustic synthesis. Given audio recordings from 2-4 microphones and the 3D geometry and material of a scene containing multiple unknown sound sources, we estimate the sound anywhere in the scene. We identify the main challenges of novel-view acoustic synthesis as sound source localization, separation, and dereverberation. While naively training an end-to-end network fails to produce high-quality results, we show that incorporating room impulse responses (RIRs) derived from 3D reconstructed rooms enables the same network to jointly tackle these tasks. Our method outperforms existing methods designed for the individual tasks, demonstrating its effectiveness at utilizing 3D visual information. In a simulated study on the Matterport3D-NVAS dataset, our model achieves near-perfect accuracy on source localization, a PSNR of 26.44dB and a SDR of 14.23dB for source separation and dereverberation, resulting in a PSNR of 25.55 dB and a SDR of 14.20 dB on novel-view acoustic synthesis. We release our code and model on our project website at https://github.com/apple/ml-nvas3d. Please wear headphones when listening to the results.

Submitted 15 August, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

Comments: Interspeech 2024

arXiv:2310.00831 [pdf, other]

Action Recognition Utilizing YGAR Dataset

Authors: Shuo Wang, Amiya Ranjan, Lawrence Jiang

Abstract: The scarcity of high quality actions video data is a bottleneck in the research and application of action recognition. Although significant effort has been made in this area, there still exist gaps in the range of available data types a more flexible and comprehensive data set could help bridge. In this paper, we present a new 3D actions data simulation engine and generate 3 sets of sample data to demonstrate its current functionalities. With the new data generation process, we demonstrate its applications to image classifications, action recognitions and potential to evolve into a system that would allow the exploration of much more complex action recognition tasks. In order to show off these capabilities, we also train and test a list of commonly used models for image recognition to demonstrate the potential applications and capabilities of the data sets and their generation process.

Submitted 1 October, 2023; originally announced October 2023.

Comments: 10 pages, 18 figures

arXiv:2309.15259 [pdf, other]

doi 10.1609/aaai.v37i8.26175

SLIQ: Quantum Image Similarity Networks on Noisy Quantum Computers

Authors: Daniel Silver, Tirthak Patel, Aditya Ranjan, Harshitta Gandhi, William Cutler, Devesh Tiwari

Abstract: Exploration into quantum machine learning has grown tremendously in recent years due to the ability of quantum computers to speed up classical programs. However, these efforts have yet to solve unsupervised similarity detection tasks due to the challenge of porting them to run on quantum computers. To overcome this challenge, we propose SLIQ, the first open-sourced work for resource-efficient quantum similarity detection networks, built with practical and effective quantum learning and variance-reducing algorithms.

Submitted 26 September, 2023; originally announced September 2023.

Journal ref: Vol. 37 No. 8: AAAI-2023 Technical Tracks 8

arXiv:2309.07164 [pdf, other]

Hybrid ASR for Resource-Constrained Robots: HMM - Deep Learning Fusion

Authors: Anshul Ranjan, Kaushik Jegadeesan

Abstract: This paper presents a novel hybrid Automatic Speech Recognition (ASR) system designed specifically for resource-constrained robots. The proposed approach combines Hidden Markov Models (HMMs) with deep learning models and leverages socket programming to distribute processing tasks effectively. In this architecture, the HMM-based processing takes place within the robot, while a separate PC handles the deep learning model. This synergy between HMMs and deep learning enhances speech recognition accuracy significantly. We conducted experiments across various robotic platforms, demonstrating real-time and precise speech recognition capabilities. Notably, the system exhibits adaptability to changing acoustic conditions and compatibility with low-power hardware, making it highly effective in environments with limited computational resources. This hybrid ASR paradigm opens up promising possibilities for seamless human-robot interaction. In conclusion, our research introduces a pioneering dimension to ASR techniques tailored for robotics. By employing socket programming to distribute processing tasks across distinct devices and strategically combining HMMs with deep learning models, our hybrid ASR system showcases its potential to enable robots to comprehend and respond to spoken language adeptly, even in environments with restricted computational resources. This paradigm sets an innovative course for enhancing human-robot interaction across a wide range of real-world scenarios.

Submitted 11 September, 2023; originally announced September 2023.

Comments: To be published in IEEE Access, 9 pages, 14 figures, Received valuable support from CCBD PESU, for associated code, see https://github.com/AnshulRanjan2004/PyHMM

MSC Class: 62M09 (Primary) 62F10; 62F12 (Secondary) ACM Class: I.2.7; I.2.9

arXiv:2308.11096 [pdf, other]

MosaiQ: Quantum Generative Adversarial Networks for Image Generation on NISQ Computers

Authors: Daniel Silver, Tirthak Patel, William Cutler, Aditya Ranjan, Harshitta Gandhi, Devesh Tiwari

Abstract: Quantum machine learning and vision have come to the fore recently, with hardware advances enabling rapid advancement in the capabilities of quantum machines. Recently, quantum image generation has been explored with many potential advantages over non-quantum techniques; however, previous techniques have suffered from poor quality and robustness. To address these problems, we introduce MosaiQ, a high-quality quantum image generation GAN framework that can be executed on today's Near-term Intermediate Scale Quantum (NISQ) computers.

Submitted 21 August, 2023; originally announced August 2023.

Comments: Accepted to appear at ICCV'23

arXiv:2307.16799 [pdf, other]

Toward Privacy in Quantum Program Execution On Untrusted Quantum Cloud Computing Machines for Business-sensitive Quantum Needs

Authors: Tirthak Patel, Daniel Silver, Aditya Ranjan, Harshitta Gandhi, William Cutler, Devesh Tiwari

Abstract: Quantum computing is an emerging paradigm that has shown great promise in accelerating large-scale scientific, optimization, and machine-learning workloads. With most quantum computing solutions being offered over the cloud, it has become imperative to protect confidential and proprietary quantum code from being accessed by untrusted and/or adversarial agents. In response to this challenge, we propose SPYCE, which is the first known solution to obfuscate quantum code and output to prevent the leaking of any confidential information over the cloud. SPYCE implements a lightweight, scalable, and effective solution based on the unique principles of quantum computing to achieve this task.

Submitted 31 July, 2023; originally announced July 2023.

arXiv:2306.11177 [pdf, other]

Pipit: Scripting the analysis of parallel execution traces

Authors: Abhinav Bhatele, Rakrish Dhakal, Alexander Movsesyan, Aditya K. Ranjan, Onur Cankur

Abstract: Performance analysis is a critical step in the oft-repeated, iterative process of performance tuning of parallel programs. Per-process, per-thread traces (detailed logs of events with timestamps) enable in-depth analysis of parallel program execution to identify different kinds of performance issues. Oftentimes, trace collection tools provide a graphical tool to analyze the trace output. However, these GUI-based tools only support specific file formats, are challenging to scale to large trace sizes, limit data exploration to the implemented graphical views, and do not support automated comparisons of two or more datasets. In this paper, we present a programmatic approach to analyzing parallel execution traces by leveraging pandas, a powerful Python-based data analysis library. We have developed a Python library, Pipit, on top of pandas that can read traces in different file formats (OTF2, HPCToolkit, Projections, Nsight Systems, etc.) and provides a uniform data structure in the form of a pandas DataFrame. Pipit provides operations to aggregate, filter, and transform the events in a trace to present the data in different ways. We also provide several functions to quickly and easily identify performance issues in parallel executions. More importantly, the API is easily extensible to support custom analyses by different end users.

Submitted 14 May, 2024; v1 submitted 19 June, 2023; originally announced June 2023.

arXiv:2305.13525 [pdf, other]

A 4D Hybrid Algorithm to Scale Parallel Training to Thousands of GPUs

Authors: Siddharth Singh, Prajwal Singhania, Aditya K. Ranjan, Zack Sating, Abhinav Bhatele

Abstract: Heavy communication, in particular, collective operations, can become a critical performance bottleneck in scaling the training of billion-parameter neural networks to large-scale parallel systems. This paper introduces a four-dimensional (4D) approach to optimize communication in parallel training. This 4D approach is a hybrid of 3D tensor and data parallelism, and is implemented in the AxoNN framework. In addition, we employ two key strategies to further minimize communication overheads. First, we aggressively overlap expensive collective operations (reduce-scatter, all-gather, and all-reduce) with computation. Second, we develop an analytical model to identify high-performing configurations within the large search space defined by our 4D algorithm. This model empowers practitioners by simplifying the tuning process for their specific training workloads. When training an 80-billion parameter GPT on 1024 GPUs of Perlmutter, AxoNN surpasses Megatron-LM, a state-of-the-art framework, by a significant 26%. Additionally, it achieves a significantly high 57% of the theoretical peak FLOP/s or 182 PFLOP/s in total.

Submitted 14 May, 2024; v1 submitted 22 May, 2023; originally announced May 2023.
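The throughput figures quoted in the AxoNN abstract can be cross-checked with a little arithmetic. The sketch below assumes that the 182 PFLOP/s total refers to the same 1024-GPU run (the abstract does not say so explicitly) and compares the implied per-GPU peak with 312 TFLOP/s, the published dense half-precision peak of an NVIDIA A100; the GPU model is an assumption, since the listing names only the Perlmutter system. The function name is hypothetical.

```python
def implied_per_gpu_rates(total_pflops, num_gpus, fraction_of_peak):
    """Return (achieved TFLOP/s per GPU, implied peak TFLOP/s per GPU)."""
    # 1 PFLOP/s = 1000 TFLOP/s
    achieved_tflops = total_pflops * 1000.0 / num_gpus
    # If the achieved rate is a given fraction of peak, peak = achieved / fraction.
    implied_peak_tflops = achieved_tflops / fraction_of_peak
    return achieved_tflops, implied_peak_tflops


# Figures from the abstract: 182 PFLOP/s total, 57% of peak.
# Assumption: both figures come from the 1024-GPU run.
achieved, implied_peak = implied_per_gpu_rates(182.0, 1024, 0.57)

# Assumption: A100 dense half-precision peak, used only as a sanity check.
ASSUMED_A100_PEAK_TFLOPS = 312.0

print(f"achieved per GPU: {achieved:.1f} TFLOP/s")
print(f"implied peak per GPU: {implied_peak:.1f} TFLOP/s "
      f"(assumed hardware peak: {ASSUMED_A100_PEAK_TFLOPS:.0f})")
```

Under these assumptions the implied peak lands within about one TFLOP/s of the assumed hardware figure, so the two numbers in the abstract are mutually consistent.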

arXiv:2304.12390 [pdf, other]

Pointersect: Neural Rendering with Cloud-Ray Intersection

Authors: Jen-Hao Rick Chang, Wei-Yu Chen, Anurag Ranjan, Kwang Moo Yi, Oncel Tuzel

Abstract: We propose a novel method that renders point clouds as if they are surfaces. The proposed method is differentiable and requires no scene-specific optimization. This unique capability enables, out-of-the-box, surface normal estimation, rendering room-scale point clouds, inverse rendering, and ray tracing with global illumination. Unlike existing work that focuses on converting point clouds to other representations--e.g., surfaces or implicit functions--our key idea is to directly infer the intersection of a light ray with the underlying surface represented by the given point cloud. Specifically, we train a set transformer that, given a small number of local neighbor points along a light ray, provides the intersection point, the surface normal, and the material blending weights, which are used to render the outcome of this light ray. Localizing the problem into small neighborhoods enables us to train a model with only 48 meshes and apply it to unseen point clouds. Our model achieves higher estimation accuracy than state-of-the-art surface reconstruction and point-cloud rendering methods on three test sets. When applied to room-scale point clouds, without any scene-specific optimization, the model achieves competitive quality with the state-of-the-art novel-view rendering methods. Moreover, we demonstrate ability to render and manipulate Lidar-scanned point clouds such as lighting control and object insertion.

Submitted 24 April, 2023; originally announced April 2023.

Comments: CVPR 2023

arXiv:2304.01480 [pdf, other]

FineRecon: Depth-aware Feed-forward Network for Detailed 3D Reconstruction

Authors: Noah Stier, Anurag Ranjan, Alex Colburn, Yajie Yan, Liang Yang, Fangchang Ma, Baptiste Angles

Abstract: Recent works on 3D reconstruction from posed images have demonstrated that direct inference of scene-level 3D geometry without test-time optimization is feasible using deep neural networks, showing remarkable promise and high efficiency. However, the reconstructed geometry, typically represented as a 3D truncated signed distance function (TSDF), is often coarse without fine geometric details. To address this problem, we propose three effective solutions for improving the fidelity of inference-based 3D reconstructions. We first present a resolution-agnostic TSDF supervision strategy to provide the network with a more accurate learning signal during training, avoiding the pitfalls of TSDF interpolation seen in previous work. We then introduce a depth guidance strategy using multi-view depth estimates to enhance the scene representation and recover more accurate surfaces. Finally, we develop a novel architecture for the final layers of the network, conditioning the output TSDF prediction on high-resolution image features in addition to coarse voxel features, enabling sharper reconstruction of fine details. Our method, FineRecon, produces smooth and highly accurate reconstructions, showing significant improvements across multiple depth and 3D reconstruction metrics.

Submitted 18 August, 2023; v1 submitted 3 April, 2023; originally announced April 2023.

Comments: ICCV 2023

arXiv:2303.15437 [pdf, other]

FaceLit: Neural 3D Relightable Faces

Authors: Anurag Ranjan, Kwang Moo Yi, Jen-Hao Rick Chang, Oncel Tuzel

Abstract: We propose a generative framework, FaceLit, capable of generating a 3D face that can be rendered at various user-defined lighting conditions and views, learned purely from 2D images in-the-wild without any manual annotation. Unlike existing works that require careful capture setup or human labor, we rely on off-the-shelf pose and illumination estimators. With these estimates, we incorporate the Phong reflectance model in the neural volume rendering framework. Our model learns to generate shape and material properties of a face such that, when rendered according to the natural statistics of pose and illumination, produces photorealistic face images with multiview 3D and illumination consistency. Our method enables photorealistic generation of faces with explicit illumination and view controls on multiple datasets - FFHQ, MetFaces and CelebA-HQ. We show state-of-the-art photorealism among 3D aware GANs on FFHQ dataset achieving an FID score of 3.5.

Submitted 27 March, 2023; originally announced March 2023.

Comments: CVPR 2023

arXiv:2303.14189 [pdf, other]

FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization

Authors: Pavan Kumar Anasosalu Vasu, James Gabriel, Jeff Zhu, Oncel Tuzel, Anurag Ranjan

Abstract: The recent amalgamation of transformer and convolutional designs has led to steady improvements in accuracy and efficiency of the models. In this work, we introduce FastViT, a hybrid vision transformer architecture that obtains the state-of-the-art latency-accuracy trade-off. To this end, we introduce a novel token mixing operator, RepMixer, a building block of FastViT, that uses structural reparameterization to lower the memory access cost by removing skip-connections in the network. We further apply train-time overparametrization and large kernel convolutions to boost accuracy and empirically show that these choices have minimal effect on latency. We show that - our model is 3.5x faster than CMT, a recent state-of-the-art hybrid transformer architecture, 4.9x faster than EfficientNet, and 1.9x faster than ConvNeXt on a mobile device for the same accuracy on the ImageNet dataset. At similar latency, our model obtains 4.2% better Top-1 accuracy on ImageNet than MobileOne. Our model consistently outperforms competing architectures across several tasks -- image classification, detection, segmentation and 3D mesh regression with significant improvement in latency on both a mobile device and a desktop GPU. Furthermore, our model is highly robust to out-of-distribution samples and corruptions, improving over competing robust models. Code and models are available at https://github.com/apple/ml-fastvit.

Submitted 17 August, 2023; v1 submitted 24 March, 2023; originally announced March 2023.

Comments: ICCV 2023

arXiv:2210.14800 [pdf, other]

Naturalistic Head Motion Generation from Speech

Authors: Trisha Mittal, Zakaria Aldeneh, Masha Fedzechkina, Anurag Ranjan, Barry-John Theobald

Abstract: Synthesizing natural head motion to accompany speech for an embodied conversational agent is necessary for providing a rich interactive experience. Most prior works assess the quality of generated head motion by comparing them against a single ground-truth using an objective metric. Yet there are many plausible head motion sequences to accompany a speech utterance. In this work, we study the variation in the perceptual quality of head motions sampled from a generative model. We show that, despite providing more diverse head motions, the generative model produces motions with varying degrees of perceptual quality. We finally show that objective metrics commonly used in previous research do not accurately reflect the perceptual quality of generated head motions. These results open an interesting avenue for future work to investigate better objective metrics that correlate with human perception of quality.

Submitted 26 October, 2022; originally announced October 2022.

Comments: Submitted to ICASSP 2023

arXiv:2207.10237 [pdf, other]

SPIN: An Empirical Evaluation on Sharing Parameters of Isotropic Networks

Authors: Chien-Yu Lin, Anish Prabhu, Thomas Merth, Sachin Mehta, Anurag Ranjan, Maxwell Horton, Mohammad Rastegari

Abstract: Recent isotropic networks, such as ConvMixer and vision transformers, have found significant success across visual recognition tasks, matching or outperforming non-isotropic convolutional neural networks (CNNs). Isotropic architectures are particularly well-suited to cross-layer weight sharing, an effective neural network compression technique. In this paper, we perform an empirical evaluation on methods for sharing parameters in isotropic networks (SPIN). We present a framework to formalize major weight sharing design decisions and perform a comprehensive empirical evaluation of this design space. Guided by our experimental results, we propose a weight sharing strategy to generate a family of models with better overall efficiency, in terms of FLOPs and parameters versus accuracy, compared to traditional scaling methods alone, for example compressing ConvMixer by 1.9x while improving accuracy on ImageNet. Finally, we perform a qualitative study to further understand the behavior of weight sharing in isotropic architectures. The code is available at https://github.com/apple/ml-spin.

Submitted 20 July, 2022; originally announced July 2022.

Comments: Accepted at ECCV 2022

arXiv:2206.04040 [pdf, other]

MobileOne: An Improved One millisecond Mobile Backbone

Authors: Pavan Kumar Anasosalu Vasu, James Gabriel, Jeff Zhu, Oncel Tuzel, Anurag Ranjan

Abstract: Efficient neural network backbones for mobile devices are often optimized for metrics such as FLOPs or parameter count. However, these metrics may not correlate well with latency of the network when deployed on a mobile device. Therefore, we perform extensive analysis of different metrics by deploying several mobile-friendly networks on a mobile device. We identify and analyze architectural and optimization bottlenecks in recent efficient neural networks and provide ways to mitigate these bottlenecks. To this end, we design an efficient backbone MobileOne, with variants achieving an inference time under 1 ms on an iPhone12 with 75.9% top-1 accuracy on ImageNet. We show that MobileOne achieves state-of-the-art performance within the efficient architectures while being many times faster on mobile. Our best model obtains similar performance on ImageNet as MobileFormer while being 38x faster. Our model obtains 2.3% better top-1 accuracy on ImageNet than EfficientNet at similar latency. Furthermore, we show that our model generalizes to multiple tasks - image classification, object detection, and semantic segmentation with significant improvements in latency and accuracy as compared to existing efficient architectures when deployed on a mobile device. Code and models are available at https://github.com/apple/ml-mobileone

Submitted 28 March, 2023; v1 submitted 8 June, 2022; originally announced June 2022.

Comments: Accepted at CVPR 2023

arXiv:2204.03618 [pdf]

Pneumonia Detection in Chest X-Rays using Neural Networks

Authors: Narayana Darapaneni, Ashish Ranjan, Dany Bright, Devendra Trivedi, Ketul Kumar, Vivek Kumar, Anwesh Reddy Paduri

Abstract: With the advancement in AI, deep learning techniques are widely used to design robust classification models in several areas such as medical diagnosis tasks, in which they achieve good performance. In this paper, we have proposed the CNN model (Convolutional Neural Network) for the classification of Chest X-ray images for Radiological Society of North America Pneumonia (RSNA) datasets. The study also tries to achieve the same RSNA benchmark results using the limited computational resources by trying out various approaches to the methodologies that have been implemented in recent years. The proposed method is based on a non-complex CNN and the use of transfer learning algorithms like Xception, InceptionV3/V4, EfficientNetB7. The RSNA benchmark MAP score is 0.25, but using the Mask RCNN model on a stratified sample of 3017 along with image augmentation gave a MAP score of 0.15. Meanwhile, the YoloV3 without any hyperparameter tuning gave the MAP score of 0.32 but still, the loss keeps decreasing. Running the model for a greater number of iterations can give better results.

Submitted 7 April, 2022; originally announced April 2022.

arXiv:2203.12575 [pdf, other]

NeuMan: Neural Human Radiance Field from a Single Video

Authors: Wei Jiang, Kwang Moo Yi, Golnoosh Samei, Oncel Tuzel, Anurag Ranjan

Abstract: Photorealistic rendering and reposing of humans is important for enabling augmented reality experiences. We propose a novel framework to reconstruct the human and the scene that can be rendered with novel human poses and views from just a single in-the-wild video. Given a video captured by a moving camera, we train two NeRF models: a human NeRF model and a scene NeRF model. To train these models, we rely on existing methods to estimate the rough geometry of the human and the scene. Those rough geometry estimates allow us to create a warping field from the observation space to the canonical pose-independent space, in which we train the human model. Our method is able to learn subject-specific details, including cloth wrinkles and accessories, from just a 10-second video clip, and to provide high quality renderings of the human under novel poses, from novel views, together with the background.

Submitted 21 September, 2022; v1 submitted 23 March, 2022; originally announced March 2022.

arXiv:2201.02912 [pdf, other]

doi 10.1016/j.ins.2022.07.127

λ-Scaled-Attention: A Novel Fast Attention Mechanism for Efficient Modeling of Protein Sequences

Authors: Ashish Ranjan, Md Shah Fahad, Akshay Deepak

Abstract: Attention-based deep networks have been successfully applied on textual data in the field of NLP. However, their application on protein sequences poses additional challenges due to the weak semantics of the protein words, unlike the plain text words. These unexplored challenges faced by the standard attention technique include (i) vanishing attention score problem and (ii) high variations in the attention distribution. In this regard, we introduce a novel λ-scaled attention technique for fast and efficient modeling of the protein sequences that addresses both the above problems. This is used to develop the λ-scaled attention network and is evaluated for the task of protein function prediction implemented at the protein sub-sequence level. Experiments on the datasets for biological process (BP) and molecular function (MF) showed significant improvements in the F1 score values for the proposed λ-scaled attention technique over its counterpart approach based on the standard attention technique (+2.01% for BP and +4.67% for MF) and state-of-the-art ProtVecGen-Plus approach (+2.61% for BP and +4.20% for MF). Further, fast convergence (converging in half the number of epochs) and efficient learning (in terms of very low difference between the training and validation losses) were also observed during the training process.

Submitted 8 January, 2022; originally announced January 2022.

Journal ref: Information Sciences, 2022

arXiv:2110.04252 [pdf, other]

LCS: Learning Compressible Subspaces for Adaptive Network Compression at Inference Time

Authors: Elvis Nunez, Maxwell Horton, Anish Prabhu, Anurag Ranjan, Ali Farhadi, Mohammad Rastegari

Abstract: When deploying deep learning models to a device, it is traditionally assumed that available computational resources (compute, memory, and power) remain static. However, real-world computing systems do not always provide stable resource guarantees. Computational resources need to be conserved when load from other processes is high or battery power is low. Inspired by recent works on neural network subspaces, we propose a method for training a "compressible subspace" of neural networks that contains a fine-grained spectrum of models that range from highly efficient to highly accurate. Our models require no retraining, thus our subspace of models can be deployed entirely on-device to allow adaptive network compression at inference time. We present results for achieving arbitrarily fine-grained accuracy-efficiency trade-offs at inference time for structured and unstructured sparsity. We achieve accuracies on-par with standard models when testing our uncompressed models, and maintain high accuracy for sparsity rates above 90% when testing our compressed models. We also demonstrate that our algorithm extends to quantization at variable bit widths, achieving accuracy on par with individually trained networks.

Submitted 8 October, 2021; originally announced October 2021.

arXiv:2110.03860 [pdf, other]

Token Pooling in Vision Transformers

Authors: Dmitrii Marin, Jen-Hao Rick Chang, Anurag Ranjan, Anish Prabhu, Mohammad Rastegari, Oncel Tuzel

Abstract: Despite the recent success in many applications, the high computational requirements of vision transformers limit their use in resource-constrained settings. While many existing methods improve the quadratic complexity of attention, in most vision transformers, self-attention is not the major computation bottleneck, e.g., more than 80% of the computation is spent on fully-connected layers. To improve the computational complexity of all layers, we propose a novel token downsampling method, called Token Pooling, efficiently exploiting redundancies in the images and intermediate token representations. We show that, under mild assumptions, softmax-attention acts as a high-dimensional low-pass (smoothing) filter. Thus, its output contains redundancy that can be pruned to achieve a better trade-off between the computational cost and accuracy. Our new technique accurately approximates a set of tokens by minimizing the reconstruction error caused by downsampling. We solve this optimization problem via cost-efficient clustering. We rigorously analyze and compare to prior downsampling methods. Our experiments show that Token Pooling significantly improves the cost-accuracy trade-off over the state-of-the-art downsampling. Token Pooling is a simple and effective operator that can benefit many architectures. Applied to DeiT, it achieves the same ImageNet top-1 accuracy using 42% fewer computations.

Submitted 11 October, 2021; v1 submitted 7 October, 2021; originally announced October 2021.

Journal ref: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 2023

arXiv:2012.05225 [pdf, other]

MorphGAN: One-Shot Face Synthesis GAN for Detecting Recognition Bias

Authors: Nataniel Ruiz, Barry-John Theobald, Anurag Ranjan, Ahmed Hussein Abdelaziz, Nicholas Apostoloff

Abstract: To detect bias in face recognition networks, it can be useful to probe a network under test using samples in which only specific attributes vary in some controlled way. However, capturing a sufficiently large dataset with specific control over the attributes of interest is difficult. In this work, we describe a simulator that applies specific head pose and facial expression adjustments to images of previously unseen people. The simulator first fits a 3D morphable model to a provided image, applies the desired head pose and facial expression controls, then renders the model into an image. Next, a conditional Generative Adversarial Network (GAN) conditioned on the original image and the rendered morphable model is used to produce the image of the original person with the new facial expression and head pose. We call this conditional GAN -- MorphGAN. Images generated using MorphGAN conserve the identity of the person in the original image, and the provided control over head pose and facial expression allows test sets to be created to identify robustness issues of a facial recognition deep network with respect to pose and expression. Images generated by MorphGAN can also serve as data augmentation when training data are scarce. We show that augmenting small datasets of faces with new poses and expressions improves the recognition performance by up to 9% depending on the augmentation and data scarcity.

Submitted 10 December, 2020; v1 submitted 9 December, 2020; originally announced December 2020.

arXiv:2011.02523 [pdf, other]

Hypersim: A Photorealistic Synthetic Dataset for Holistic Indoor Scene Understanding

Authors: Mike Roberts, Jason Ramapuram, Anurag Ranjan, Atulit Kumar, Miguel Angel Bautista, Nathan Paczan, Russ Webb, Joshua M. Susskind

Abstract: For many fundamental scene understanding tasks, it is difficult or impossible to obtain per-pixel ground truth labels from real images. We address this challenge by introducing Hypersim, a photorealistic synthetic dataset for holistic indoor scene understanding. To create our dataset, we leverage a large repository of synthetic scenes created by professional artists, and we generate 77,400 images of 461 indoor scenes with detailed per-pixel labels and corresponding ground truth geometry. Our dataset: (1) relies exclusively on publicly available 3D assets; (2) includes complete scene geometry, material information, and lighting information for every scene; (3) includes dense per-pixel semantic instance segmentations and complete camera information for every image; and (4) factors every image into diffuse reflectance, diffuse illumination, and a non-diffuse residual term that captures view-dependent lighting effects. We analyze our dataset at the level of scenes, objects, and pixels, and we analyze costs in terms of money, computation time, and annotation effort. Remarkably, we find that it is possible to generate our entire dataset from scratch, for roughly half the cost of training a popular open-source natural language processing model. We also evaluate sim-to-real transfer performance on two real-world scene understanding tasks - semantic segmentation and 3D shape prediction - where we find that pre-training on our dataset significantly improves performance on both tasks, and achieves state-of-the-art performance on the most challenging Pix3D test set. All of our rendered image data, as well as all the code we used to generate our dataset and perform our experiments, is available online.

Submitted 17 August, 2021; v1 submitted 4 November, 2020; originally announced November 2020.

Comments: Accepted for publication at the International Conference on Computer Vision (ICCV) 2021

arXiv:2011.00773 [pdf, other]

Using a Bi-directional LSTM Model with Attention Mechanism trained on MIDI Data for Generating Unique Music

Authors: Ashish Ranjan, Varun Nagesh Jolly Behera, Motahar Reza

Abstract: Generating music is an interesting and challenging problem in the field of machine learning. Mimicking human creativity has been popular in recent years, especially in the field of computer vision and image processing. With the advent of GANs, it is possible to generate new similar images, based on trained data. But this cannot be done for music similarly, as music has an extra temporal dimension. So it is necessary to understand how music is represented in digital form. When building models that perform this generative task, the learning and generation part is done in some high-level representation such as MIDI (Musical Instrument Digital Interface) or scores. This paper proposes a bi-directional LSTM (Long short-term memory) model with attention mechanism capable of generating similar type of music based on MIDI data. The music generated by the model follows the theme/style of the music the model is trained on. Also, due to the nature of MIDI, the tempo, instrument, and other parameters can be defined, and changed, post generation.

Submitted 2 November, 2020; originally announced November 2020.

arXiv:2011.00443 [pdf, other]

A Parallel Approach for Real-Time Face Recognition from a Large Database

Authors: Ashish Ranjan, Varun Nagesh Jolly Behera, Motahar Reza

Abstract: We present a new facial recognition system, capable of identifying a person, provided their likeness has been previously stored in the system, in real time. The system is based on storing and comparing facial embeddings of the subject, and identifying them later within a live video feed. This system is highly accurate, and is able to tag people with their ID in real time. It is able to do so, even when using a database containing thousands of facial embeddings, by using a parallelized searching technique. This makes the system quite fast and allows it to be highly scalable.

Submitted 1 November, 2020; originally announced November 2020.

arXiv:2011.00414 [pdf, other]

Graph based Clustering Algorithm for Social Community Transmission Prediction of COVID-19

Authors: Varun Nagesh Jolly Behera, Ashish Ranjan, Motahar Reza

Abstract: A system to model the spread of COVID-19 cases after lockdown has been proposed, to define new preventive measures based on hotspots, using the graph clustering algorithm. This method allows for more lenient measures in areas less prone to the virus spread. There exist methods to model the spread of the virus, by predicting the number of confirmed cases. But the proposed system focuses more on the preventive side of the solution from a geographical point of view, by predicting the areas or regions that may become hotspots for the virus in the near future. The fact that the virus can only be transmitted by being in close proximity to an already infected person suggests that the regions that can easily be reached from an existing hotspot have a higher chance of becoming a new hotspot. Moreover, in smaller regions, even after strict provisions, positive cases have been found. To consider this fact, the geographic distance between the nearest hotspots can be used as a measure of likelihood of the region also becoming a hotspot. In this paper, a weighted graph of regions is constructed, with the regions themselves as weighted nodes, the number of active cases as the node weights, and the distance as edge weights. The graph can be completely connected or connected based on a distance threshold. The nodes are the administrative, and the distance measure tells the possible transmission between separate communities. Using this data, the potential regions that can become hotspots can be predicted, and preventive measures can be devised.

Submitted 31 October, 2020; originally announced November 2020.

arXiv:2009.00149 [pdf, other]

GIF: Generative Interpretable Faces

Authors: Partha Ghosh, Pravir Singh Gupta, Roy Uziel, Anurag Ranjan, Michael Black, Timo Bolkart

Abstract: Photo-realistic visualization and animation of expressive human faces have been a long standing challenge. 3D face modeling methods provide parametric control but generate unrealistic images; on the other hand, generative 2D models like GANs (Generative Adversarial Networks) output photo-realistic face images, but lack explicit control. Recent methods gain partial control, either by attempting to disentangle different factors in an unsupervised manner, or by adding control post hoc to a pre-trained model. Unconditional GANs, however, may entangle factors that are hard to undo later. We condition our generative model on pre-defined control parameters to encourage disentanglement in the generation process. Specifically, we condition StyleGAN2 on FLAME, a generative 3D face model. While conditioning on FLAME parameters yields unsatisfactory results, we find that conditioning on rendered FLAME geometry and photometric details works well. This gives us a generative 2D face model named GIF (Generative Interpretable Faces) that offers FLAME's parametric control. Here, interpretable refers to the semantic meaning of different parameters. Given FLAME parameters for shape, pose, expressions, parameters for appearance, lighting, and an additional style vector, GIF outputs photo-realistic face images. We perform an AMT based perceptual study to quantitatively and qualitatively evaluate how well GIF follows its conditioning. The code, data, and trained model are publicly available for research purposes at http://gif.is.tue.mpg.de.

Submitted 25 November, 2020; v1 submitted 31 August, 2020; originally announced September 2020.

Comments: International Conference on 3D Vision (3DV) 2020

arXiv:2006.01897 [pdf, other]

Automatic Differentiation for All Photons Imaging to See Inside Volumetric Scattering Media

Authors: Tomohiro Maeda, Ankit Ranjan, Ramesh Raskar

Abstract: Imaging through dense scattering media - such as biological tissue, fog, and smoke - has applications in the medical and robotics fields. We propose a new framework using automatic differentiation for All Photons Imaging through homogeneous scattering media with unknown optical properties for non-invasive sensing and diagnostics. We overcome the need for the imaging target to be visible to the illumination source in All Photons Imaging, enabling practical and non-invasive imaging through turbid media with a simple optical setup. Our method does not require calibration to acquire the sensor position or optical properties of the media.

Submitted 2 June, 2020; originally announced June 2020.

arXiv:1910.11667 [pdf, other]

doi 10.1007/s11263-019-01279-w

Learning Multi-Human Optical Flow

Authors: Anurag Ranjan, David T. Hoffmann, Dimitrios Tzionas, Siyu Tang, Javier Romero, Michael J. Black

Abstract: The optical flow of humans is well known to be useful for the analysis of human action. Recent optical flow methods focus on training deep networks to approach the problem. However, the training data they use does not cover the domain of human motion. Therefore, we develop a dataset of multi-human optical flow and train optical flow networks on this dataset. We use a 3D model of the human body and motion capture data to synthesize realistic flow fields in both single- and multi-person images. We then train optical flow networks to estimate human flow fields from pairs of images. We demonstrate that our trained networks are more accurate than a wide range of top methods on held-out test data and that they can generalize well to real image sequences. The code, trained models and the dataset are available for research.

Submitted 4 December, 2019; v1 submitted 24 October, 2019; originally announced October 2019.

Comments: arXiv admin note: text overlap with arXiv:1806.05666

Report number: 2019

Journal ref: International Journal of Computer Vision (IJCV) 2019

arXiv:1910.10053 [pdf, other]

Attacking Optical Flow

Authors: Anurag Ranjan, Joel Janai, Andreas Geiger, Michael J. Black

Abstract: Deep neural nets achieve state-of-the-art performance on the problem of optical flow estimation. Since optical flow is used in several safety-critical applications like self-driving cars, it is important to gain insights into the robustness of those techniques. Recently, it has been shown that adversarial attacks easily fool deep neural networks into misclassifying objects. The robustness of optical flow networks to adversarial attacks, however, has not been studied so far. In this paper, we extend adversarial patch attacks to optical flow networks and show that such attacks can compromise their performance. We show that corrupting a small patch of less than 1% of the image size can significantly affect optical flow estimates. Our attacks lead to noisy flow estimates that extend significantly beyond the region of the attack, in many cases even completely erasing the motion of objects in the scene. While networks using an encoder-decoder architecture are very sensitive to these attacks, we found that networks using a spatial pyramid architecture are less affected. We analyse the success and failure of attacking both architectures by visualizing their feature maps and comparing them to classical optical flow techniques, which are robust to these attacks. We also demonstrate that such attacks are practical by placing a printed pattern into real scenes.

Submitted 22 October, 2019; originally announced October 2019.

Comments: ICCV 2019

arXiv:1907.13615 [pdf, other]

Learning to Dress 3D People in Generative Clothing

Authors: Qianli Ma, Jinlong Yang, Anurag Ranjan, Sergi Pujades, Gerard Pons-Moll, Siyu Tang, Michael J. Black

Abstract: Three-dimensional human body models are widely used in the analysis of human pose and motion. Existing models, however, are learned from minimally-clothed 3D scans and thus do not generalize to the complexity of dressed people in common images and videos. Additionally, current models lack the expressive power needed to represent the complex non-linear geometry of pose-dependent clothing shapes. To address this, we learn a generative 3D mesh model of clothed people from 3D scans with varying pose and clothing. Specifically, we train a conditional Mesh-VAE-GAN to learn the clothing deformation from the SMPL body model, making clothing an additional term in SMPL. Our model is conditioned on both pose and clothing type, giving the ability to draw samples of clothing to dress different body shapes in a variety of styles and poses. To preserve wrinkle detail, our Mesh-VAE-GAN extends patchwise discriminators to 3D meshes. Our model, named CAPE, represents global shape and fine local structure, effectively extending the SMPL body model to clothing. To our knowledge, this is the first generative model that directly dresses 3D human body meshes and generalizes to different poses. The model, code and data are available for research purposes at https://cape.is.tue.mpg.de.

Submitted 22 May, 2020; v1 submitted 31 July, 2019; originally announced July 2019.

Comments: CVPR-2020 camera ready. Code and data are available at https://cape.is.tue.mpg.de

arXiv:1905.03079 [pdf, other]

Capture, Learning, and Synthesis of 3D Speaking Styles

Authors: Daniel Cudeiro, Timo Bolkart, Cassidy Laidlaw, Anurag Ranjan, Michael J. Black

Abstract: Audio-driven 3D facial animation has been widely explored, but achieving realistic, human-like performance is still unsolved. This is due to the lack of available 3D datasets, models, and standard evaluation metrics. To address this, we introduce a unique 4D face dataset with about 29 minutes of 4D scans captured at 60 fps and synchronized audio from 12 speakers. We then train a neural network on our dataset that factors identity from facial motion. The learned model, VOCA (Voice Operated Character Animation), takes any speech signal as input - even speech in languages other than English - and realistically animates a wide range of adult faces. Conditioning on subject labels during training allows the model to learn a variety of realistic speaking styles. VOCA also provides animator controls to alter speaking style, identity-dependent facial shape, and pose (i.e. head, jaw, and eyeball rotations) during animation. To our knowledge, VOCA is the only realistic 3D facial animation model that is readily applicable to unseen subjects without retargeting. This makes VOCA suitable for tasks like in-game video, virtual reality avatars, or any scenario in which the speaker, speech, or language is not known in advance. We make the dataset and model available for research purposes at http://voca.is.tue.mpg.de.

Submitted 8 May, 2019; originally announced May 2019.

Comments: To appear in CVPR 2019

arXiv:1811.01338 [pdf, other]

doi 10.1109/TCBB.2019.2911609

Deep Robust Framework for Protein Function Prediction using Variable-Length Protein Sequences

Authors: Ashish Ranjan, Md Shah Fahad, David Fernandez-Baca, Akshay Deepak, Sudhakar Tripathi

Abstract: The amino acid sequence is the most intrinsic form of a protein and expresses its primary structure. The order of amino acids in a sequence enables a protein to acquire a particular stable conformation that is responsible for the functions of the protein. This relationship between a sequence and its function motivates the need to analyse sequences for predicting protein functions. Early-generation computational methods using BLAST, FASTA, etc. perform function transfer based on sequence similarity with existing databases and are computationally slow. Although machine learning based approaches are fast, they fail to perform well for long protein sequences (i.e., protein sequences with more than 300 amino acid residues). In this paper, we introduce a novel method for constructing two separate feature sets for protein sequences based on analysis of 1) single fixed-sized segments and 2) multi-sized segments, using a bi-directional long short-term memory network. Further, the model based on the proposed feature set is combined with the state-of-the-art Multi-label Linear Discriminant Analysis (MLDA) features based model to improve accuracy. Extensive evaluations using separate datasets for biological processes and molecular functions demonstrate promising results for both single-sized and multi-sized segment based feature sets. While the former showed an improvement of +3.37% and +5.48%, the latter produced an improvement of +5.38% and +8.00% respectively for the two datasets over the state-of-the-art MLDA based classifier. After combining the two models, there is a significant improvement of +7.41% and +9.21% respectively for the two datasets compared to the MLDA based classifier. Specifically, the proposed approach performed well for long protein sequences and showed superior overall performance.

Submitted 19 June, 2019; v1 submitted 4 November, 2018; originally announced November 2018.

Journal ref: IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2019

arXiv:1807.10267 [pdf, other]

Generating 3D faces using Convolutional Mesh Autoencoders

Authors: Anurag Ranjan, Timo Bolkart, Soubhik Sanyal, Michael J. Black

Abstract: Learned 3D representations of human faces are useful for computer vision problems such as 3D face tracking and reconstruction from images, as well as graphics applications such as character generation and animation. Traditional models learn a latent representation of a face using linear subspaces or higher-order tensor generalizations. Due to this linearity, they cannot capture extreme deformations and non-linear expressions. To address this, we introduce a versatile model that learns a non-linear representation of a face using spectral convolutions on a mesh surface. We introduce mesh sampling operations that enable a hierarchical mesh representation that captures non-linear variations in shape and expression at multiple scales within the model. In a variational setting, our model samples diverse realistic 3D faces from a multivariate Gaussian distribution. Our training data consists of 20,466 meshes of extreme expressions captured over 12 different subjects. Despite limited training data, our trained model outperforms state-of-the-art face models with 50% lower reconstruction error, while using 75% fewer parameters. We also show that replacing the expression space of an existing state-of-the-art face model with our autoencoder achieves a lower reconstruction error. Our data, model and code are available at http://github.com/anuragranj/coma

Submitted 31 July, 2018; v1 submitted 26 July, 2018; originally announced July 2018.

Journal ref: European Conference on Computer Vision 2018

arXiv:1806.05666 [pdf, other]

Learning Human Optical Flow

Authors: Anurag Ranjan, Javier Romero, Michael J. Black

Abstract: The optical flow of humans is well known to be useful for the analysis of human action. Given this, we devise an optical flow algorithm specifically for human motion and show that it is superior to generic flow methods. Designing a method by hand is impractical, so we develop a new training database of image sequences with ground truth optical flow. For this we use a 3D model of the human body and motion capture data to synthesize realistic flow fields. We then train a convolutional neural network to estimate human flow fields from pairs of images. Since many applications in human motion analysis depend on speed, and we anticipate mobile applications, we base our method on SpyNet with several modifications. We demonstrate that our trained network is more accurate than a wide range of top methods on held-out test data and that it generalizes well to real image sequences. When combined with a person detector/tracker, the approach provides a full solution to the problem of 2D human flow estimation. Both the code and the dataset are available for research.

Submitted 22 July, 2018; v1 submitted 14 June, 2018; originally announced June 2018.

Comments: British Machine Vision Conference 2018 (Oral)

arXiv:1805.09806 [pdf, other]

Competitive Collaboration: Joint Unsupervised Learning of Depth, Camera Motion, Optical Flow and Motion Segmentation

Authors: Anurag Ranjan, Varun Jampani, Lukas Balles, Kihwan Kim, Deqing Sun, Jonas Wulff, Michael J. Black

Abstract: We address the unsupervised learning of several interconnected problems in low-level vision: single view depth prediction, camera motion estimation, optical flow, and segmentation of a video into the static scene and moving regions. Our key insight is that these four fundamental vision problems are coupled through geometric constraints. Consequently, learning to solve them together simplifies the problem because the solutions can reinforce each other. We go beyond previous work by exploiting geometry more explicitly and segmenting the scene into static and moving regions. To that end, we introduce Competitive Collaboration, a framework that facilitates the coordinated training of multiple specialized neural networks to solve complex problems. Competitive Collaboration works much like expectation-maximization, but with neural networks that act as both competitors to explain pixels that correspond to static or moving regions, and as collaborators through a moderator that assigns pixels to be either static or independently moving. Our novel method integrates all these problems in a common framework and simultaneously reasons about the segmentation of the scene into moving objects and the static background, the camera motion, depth of the static scene structure, and the optical flow of moving objects. Our model is trained without any supervision and achieves state-of-the-art performance among joint unsupervised methods on all sub-problems.

Submitted 11 March, 2019; v1 submitted 24 May, 2018; originally announced May 2018.

Comments: CVPR 2019

arXiv:1703.02118 [pdf, other]

Computing in Memory with Spin-Transfer Torque Magnetic RAM

Authors: Shubham Jain, Ashish Ranjan, Kaushik Roy, Anand Raghunathan

Abstract: In-memory computing is a promising approach to addressing the processor-memory data transfer bottleneck in computing systems. We propose Spin-Transfer Torque Compute-in-Memory (STT-CiM), a design for in-memory computing with Spin-Transfer Torque Magnetic RAM (STT-MRAM). The unique properties of spintronic memory allow multiple wordlines within an array to be simultaneously enabled, opening up the possibility of directly sensing functions of the values stored in multiple rows using a single access. We propose modifications to STT-MRAM peripheral circuits that leverage this principle to perform logic, arithmetic, and complex vector operations. We address the challenge of reliable in-memory computing under process variations by extending ECC schemes to detect and correct errors that occur during CiM operations. We also address the question of how STT-CiM should be integrated within a general-purpose computing system. To this end, we propose architectural enhancements to processor instruction sets and on-chip buses that enable STT-CiM to be utilized as a scratchpad memory. Finally, we present data mapping techniques to increase the effectiveness of STT-CiM. We evaluate STT-CiM using a device-to-architecture modeling framework, and integrate cycle-accurate models of STT-CiM with a commercial processor and on-chip bus (Nios II and Avalon from Intel). Our system-level evaluation shows that STT-CiM provides system-level performance improvements of 3.93x on average (up to 10.4x), and concurrently reduces memory system energy by 3.83x on average (up to 12.4x).

Submitted 20 November, 2017; v1 submitted 6 March, 2017; originally announced March 2017.

arXiv:1611.00850 [pdf, other]

Optical Flow Estimation using a Spatial Pyramid Network

Authors: Anurag Ranjan, Michael J. Black

Abstract: We learn to compute optical flow by combining a classical spatial-pyramid formulation with deep learning. This estimates large motions in a coarse-to-fine approach by warping one image of a pair at each pyramid level by the current flow estimate and computing an update to the flow. Instead of the standard minimization of an objective function at each pyramid level, we train one deep network per level to compute the flow update. Unlike the recent FlowNet approach, the networks do not need to deal with large motions; these are dealt with by the pyramid. This has several advantages. First, our Spatial Pyramid Network (SPyNet) is much simpler and 96% smaller than FlowNet in terms of model parameters. This makes it more efficient and appropriate for embedded applications. Second, since the flow at each pyramid level is small (< 1 pixel), a convolutional approach applied to pairs of warped images is appropriate. Third, unlike FlowNet, the learned convolution filters appear similar to classical spatio-temporal filters, giving insight into the method and how to improve it. Our results are more accurate than FlowNet on most standard benchmarks, suggesting a new direction of combining classical flow methods with deep learning.

Submitted 21 November, 2016; v1 submitted 2 November, 2016; originally announced November 2016.

Comments: 10 pages

arXiv:1607.01254 [pdf, ps, other]

An extended MABAC for multi-attribute decision making using trapezoidal interval type-2 fuzzy numbers

Authors: Jagannath Roy, Ananta Ranjan, Animesh Debnath, Samarjit Kar

Abstract: In this paper, we attempt to extend the Multi Attributive Border Approximation area Comparison (MABAC) approach to multi-attribute decision making (MADM) problems based on interval type-2 fuzzy sets (IT2FSs). As a special case of IT2FSs, interval type-2 trapezoidal fuzzy numbers (IT2TrFNs) are adopted here to deal with the uncertainties present in many practical evaluation and selection problems. A systematic description of MABAC based on IT2TrFNs is presented in the current study. The validity and feasibility of the proposed method are illustrated by a practical example of selecting the most suitable candidate for a software company that is looking to hire a system analysis engineer based on a few attributes. Finally, a comparison with two other existing MADM methods is described.

Submitted 2 December, 2016; v1 submitted 5 July, 2016; originally announced July 2016.

Comments: 14 pages

arXiv:1505.05269 [pdf, other]

A Survey Report on Operating Systems for Tiny Networked Sensors

Authors: Alok Ranjan, H. B. Sahu, Prasant Misra

Abstract: Wireless sensor networks (WSNs) have attracted researchers worldwide to explore research opportunities, with applications mainly in health monitoring, industry automation, battlefields, home automation and environmental monitoring. A WSN is highly resource constrained in terms of energy, computation and memory. WSN deployments range from normal working environments to hostile and hazardous environments such as volcano monitoring and underground mines. These characteristics of WSNs pose an additional set of challenges for the operating system designer. The objective of this survey is to highlight the features and weaknesses of the operating systems available for WSNs, with a focus on current application demands. The paper also discusses operating system design issues in terms of architecture, programming model, scheduling and memory management, and support for real-time applications.

Submitted 20 May, 2015; originally announced May 2015.

Comments: 12 pages, Submitted to Journal

Journal ref: Journal of Advanced Research in Networking and Communication Engineering, Vol(1) issue 1, 2014

arXiv:1310.0519 [pdf]

Evidence that Cross-Domain Re-interpretations of Creative Ideas are Recognizable

Authors: Apara Ranjan, Liane Gabora, Brian O'Connor

Abstract: The goal of this study was to investigate the translate-ability of creative works into other domains. We tested whether people were able to recognize which works of art were inspired by which pieces of music. Three expert painters created four paintings, each of which was the artist's interpretation of one of four different pieces of instrumental music. Participants were able to identify which paintings were inspired by which pieces of music at statistically significant above-chance levels. The findings support the hypothesis that creative ideas can exist in an at least somewhat domain-independent state of potentiality and become more well-defined as they are actualized in accordance with the constraints of a particular domain.

Submitted 9 July, 2019; v1 submitted 1 October, 2013; originally announced October 2013.

Comments: 6 pages. arXiv admin note: substantial text overlap with arXiv:1308.4706

Journal ref: In G. Stojanov & B. Indurkhya (Co-Chairs), Creativity and (early) cognitive development. Symposium conducted at the meeting of Association for the Advancement of Artificial Intelligence (AAAI), Palo Alto, CA. (2013)

arXiv:1106.3600 [pdf]

doi 10.7551/mitpress/9780262019583.003.0002

How Insight Emerges in a Distributed, Content-addressable Memory

Authors: Liane Gabora, Apara Ranjan

Abstract: We begin this chapter with the bold claim that it provides a neuroscientific explanation of the magic of creativity. Creativity presents a formidable challenge for neuroscience. Neuroscience generally involves studying what happens in the brain when someone engages in a task that involves responding to a stimulus, or retrieving information from memory and using it the right way, or at the right time. If the relevant information is not already encoded in memory, the task generally requires that the individual make systematic use of information that is encoded in memory. But creativity is different. It paradoxically involves studying how someone pulls out of their brain something that was never put into it! Moreover, it must be something both new and useful, or appropriate to the task at hand. The ability to pull out of memory something new and appropriate that was never stored there in the first place is what we refer to as the magic of creativity. Even if we are so fortunate as to determine which areas of the brain are active and how these areas interact during creative thought, we will not have an answer to the question of how the brain comes up with solutions and artworks that are new and appropriate. On the other hand, since the representational capacity of neurons emerges at a level that is higher than that of the individual neurons themselves, the inner workings of neurons are too low a level to explain the magic of creativity. Thus we look to a level that is midway between gross brain regions and neurons. Since creativity generally involves combining concepts from different domains, or seeing old ideas from new perspectives, we focus our efforts on the neural mechanisms underlying the representation of concepts and ideas. Thus we ask questions about the brain at the level that accounts for its representational capacity, i.e. at the level of distributed aggregates of neurons.

Submitted 5 July, 2019; v1 submitted 17 June, 2011; originally announced June 2011.

Comments: 17 pages; 2 figures

Journal ref: In A. Bristol, O. Vartanian, & J. Kaufman (Eds.), The neuroscience of creativity (pp. 19-43). Cambridge, MA: MIT Press (2013)

arXiv:1003.1814 [pdf]

An Analytical Approach to Document Clustering Based on Internal Criterion Function

Authors: Alok Ranjan, Harish Verma, Eatesh Kandpal, Joydip Dhar

Abstract: Fast and high-quality document clustering is an important task in organizing information, presenting search engine results obtained from user queries, and enhancing web crawling and information retrieval. With the large amount of data available and with the goal of creating good-quality clusters, a variety of algorithms have been developed with different quality-complexity trade-offs. Among these, some algorithms seek to minimize the computational complexity using certain criterion functions that are defined for the whole clustering solution. In this paper, we propose a novel document clustering algorithm based on an internal criterion function. The most commonly used partitioning clustering algorithms (e.g. k-means) have some drawbacks, as they suffer from local optimum solutions and the creation of empty clusters as a clustering solution. The proposed algorithm usually does not suffer from these problems and converges to a global optimum; its performance improves as the number of clusters increases. We have checked our algorithm against three different datasets for four different values of k (the required number of clusters).

Submitted 9 March, 2010; originally announced March 2010.

Comments: Pages IEEE format, International Journal of Computer Science and Information Security, IJCSIS, Vol. 7 No. 2, February 2010, USA. ISSN 1947 5500, http://sites.google.com/site/ijcsis/

Showing 1–49 of 49 results for author: Ranjan, A