Search | arXiv e-print repository

PreciseControl: Enhancing Text-To-Image Diffusion Models with Fine-Grained Attribute Control

Authors: Rishubh Parihar, Sachidanand VS, Sabariswaran Mani, Tejan Karmali, R. Venkatesh Babu

Abstract: Recently, we have seen a surge of personalization methods for text-to-image (T2I) diffusion models to learn a concept using a few images. Existing approaches, when used for face personalization, struggle to achieve convincing inversion with identity preservation and rely on semantic text-based editing of the generated face. However, a more fine-grained control is desired for facial attribute editing, which is challenging to achieve solely with text prompts. In contrast, StyleGAN models learn a rich face prior and enable smooth control towards fine-grained attribute editing by latent manipulation. This work uses the disentangled $\mathcal{W+}$ space of StyleGANs to condition the T2I model. This approach allows us to precisely manipulate facial attributes, such as smoothly introducing a smile, while preserving the existing coarse text-based control inherent in T2I models. To enable conditioning of the T2I model on the $\mathcal{W+}$ space, we train a latent mapper to translate latent codes from $\mathcal{W+}$ to the token embedding space of the T2I model. The proposed approach excels in the precise inversion of face images with attribute preservation and facilitates continuous control for fine-grained attribute editing. Furthermore, our approach can be readily extended to generate compositions involving multiple individuals. We perform extensive experiments to validate our method for face personalization and fine-grained attribute editing.

Submitted 24 July, 2024; originally announced August 2024.

Comments: ECCV 2024, Project page: https://rishubhpar.github.io/PreciseControl.home/

arXiv:2401.09243 [pdf, other]

DiffClone: Enhanced Behaviour Cloning in Robotics with Diffusion-Driven Policy Learning

Authors: Sabariswaran Mani, Sreyas Venkataraman, Abhranil Chandra, Adyan Rizvi, Yash Sirvi, Soumojit Bhattacharya, Aritra Hazra

Abstract: Robot learning tasks are extremely compute-intensive and hardware-specific. Thus, tackling these challenges with a diverse dataset of offline demonstrations that can be used to train robot manipulation agents is very appealing. The Train-Offline-Test-Online (TOTO) Benchmark provides a well-curated open-source dataset for offline training, comprised mostly of expert data, as well as benchmark scores of the common offline-RL and behaviour cloning agents. In this paper, we introduce DiffClone, an offline algorithm of enhanced behaviour cloning agent with diffusion-based policy learning, and measure the efficacy of our method on real online physical robots at test time. This is also our official submission to the Train-Offline-Test-Online (TOTO) Benchmark Challenge organized at NeurIPS 2023. We experimented with both pre-trained visual representation and agent policies. In our experiments, we find that MOCO finetuned ResNet50 performs the best in comparison to other finetuned representations. Goal state conditioning and mapping to transitions resulted in a minute increase in the success rate and mean-reward. As for the agent policy, we developed DiffClone, a behaviour cloning agent improved using conditional diffusion.

Submitted 23 May, 2024; v1 submitted 17 January, 2024; originally announced January 2024.

Comments: NeurIPS 2023 Train Offline Test Online Workshop and Competition (Best Paper Oral Presentation / Winning Competition Submission)

arXiv:2308.06261 [pdf, other]

Enhancing Network Management Using Code Generated by Large Language Models

Authors: Sathiya Kumaran Mani, Yajie Zhou, Kevin Hsieh, Santiago Segarra, Ranveer Chandra, Srikanth Kandula

Abstract: Analyzing network topologies and communication graphs plays a crucial role in contemporary network management. However, the absence of a cohesive approach leads to a challenging learning curve, heightened errors, and inefficiencies. In this paper, we introduce a novel approach to facilitate a natural-language-based network management experience, utilizing large language models (LLMs) to generate task-specific code from natural language queries. This method tackles the challenges of explainability, scalability, and privacy by allowing network operators to inspect the generated code, eliminating the need to share network data with LLMs, and concentrating on application-specific requests combined with general program synthesis techniques. We design and evaluate a prototype system using benchmark applications, showcasing high accuracy, cost-effectiveness, and the potential for further enhancements using complementary program synthesis techniques.

Submitted 11 August, 2023; originally announced August 2023.

arXiv:2103.04811 [pdf]

A Framework for Enabling Safe and Resilient Food Factories for the Public Feeding Programs

Authors: Nataraj Kuntagod, Sanjay Podder, Satya Sai Srinivas Abbabathula, Venkatesh Subramanian, Giju Mathew, Suresh Kumar Mani

Abstract: Public feeding programs continue to be a major source of nutrition to a large part of the population across the world. Any disruption to these activities, like the one during the Covid-19 pandemic, can lead to adverse health outcomes, especially among children. Policymakers and other stakeholders must balance the need for continuing the feeding programs while ensuring the health and safety of workers engaged in the operations. This has led to several innovations that leverage advanced technologies like AI and IOT to monitor the health and safety of workers and ensure hygienic operations. However, there are practical challenges in its implementation on a large scale. This paper presents an implementation framework to build resilient public feeding programs using a combination of intelligent technologies. The framework is a result of piloting the technology solution at a facility run as part of a large mid-day meal feeding program in India. Using existing resources like CCTV cameras and new technologies like AI and IOT, hygiene and safety compliance anomalies can be detected and reported in a resource-efficient manner. It will guide stakeholders running public feeding programs as they seek to restart suspended operations and build systems that better adapt to future crises.

Submitted 8 March, 2021; originally announced March 2021.

Comments: 4 pages, 3 figures. To appear ICSE Workshop on Software Engineering for Healthcare, June 3, 2021, virtual

arXiv:2101.03967 [pdf, other]

doi 10.1109/ICOSC.2019.8665639

Real-Time Optimized N-gram For Mobile Devices

Authors: Sharmila Mani, Sourabh Vasant Gothe, Sourav Ghosh, Ajay Kumar Mishra, Prakhar Kulshreshtha, Bhargavi M, Muthu Kumaran

Abstract: With the increasing number of mobile devices, there has been continuous research on generating optimized Language Models (LMs) for soft keyboard. In spite of advances in this domain, building a single LM for low-end feature phones as well as high-end smartphones is still a pressing need. Hence, we propose a novel technique, Optimized N-gram (Op-Ngram), an end-to-end N-gram pipeline that utilises mobile resources efficiently for faster Word Completion (WC) and Next Word Prediction (NWP). Op-Ngram applies Stupid Backoff and pruning strategies to generate a light-weight model. The LM loading time on mobile is linear with respect to model size. We observed that Op-Ngram gives 37% improvement in Language Model (LM)-ROM size, 76% in LM-RAM size, 88% in loading time and 89% in average suggestion time as compared to SORTED array variant of BerkeleyLM. Moreover, our method shows significant performance improvement over KenLM as well.

Submitted 7 January, 2021; originally announced January 2021.

Comments: 2019 IEEE 13th International Conference on Semantic Computing (ICSC). Accessible at https://ieeexplore.ieee.org/document/8665639

Journal ref: 2019 IEEE 13th International Conference on Semantic Computing (ICSC), Newport Beach, CA, USA, 2019, pp. 87-92
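
The abstract above names Stupid Backoff as the scoring scheme behind Op-Ngram. As a rough illustration only, the general technique can be sketched as follows. This is not the authors' implementation: the function names, the toy corpus, and the backoff factor of 0.4 (the value commonly quoted for Stupid Backoff) are assumptions made for this sketch, and no pruning is shown.

```python
from collections import Counter

ALPHA = 0.4  # assumed backoff factor; the abstract does not state the value used


def count_ngrams(tokens, max_n):
    """Count every n-gram of length 1..max_n, keyed by tuple of tokens."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def stupid_backoff(word, context, counts, total_tokens, alpha=ALPHA):
    """Score `word` given `context`; scores are relative, not probabilities."""
    context = tuple(context)
    if not context:
        # Base case: unigram relative frequency.
        return counts[(word,)] / total_tokens
    full = context + (word,)
    if counts[full] > 0:
        # The n-gram was seen: use its relative frequency directly.
        return counts[full] / counts[context]
    # Unseen n-gram: drop the oldest context word and apply the penalty.
    return alpha * stupid_backoff(word, context[1:], counts, total_tokens, alpha)


tokens = "the cat sat on the mat".split()
counts = count_ngrams(tokens, max_n=3)
score_seen = stupid_backoff("cat", ["the"], counts, len(tokens))
score_backed_off = stupid_backoff("sat", ["the"], counts, len(tokens))
```

Here `score_seen` comes from the bigram counts, while `score_backed_off` falls back to the unigram frequency of "sat" scaled by the backoff factor, because "the sat" never occurs in the toy corpus.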

arXiv:2101.03963 [pdf, other]

doi 10.1109/ICSC.2020.00057

Language Detection Engine for Multilingual Texting on Mobile Devices

Authors: Sourabh Vasant Gothe, Sourav Ghosh, Sharmila Mani, Guggilla Bhanodai, Ankur Agarwal, Chandramouli Sanchi

Abstract: More than 2 billion mobile users worldwide type in multiple languages in the soft keyboard. On a monolingual keyboard, 38% of falsely auto-corrected words are valid in another language. This can be easily avoided by detecting the language of typed words and then validating it in its respective language. Language detection is a well-known problem in natural language processing. In this paper, we present a fast, light-weight and accurate Language Detection Engine (LDE) for multilingual typing that dynamically adapts to user intended language in real-time. We propose a novel approach where the fusion of character N-gram model and logistic regression based selector model is used to identify the language. Additionally, we present a unique method of reducing the inference time significantly by parameter reduction technique. We also discuss various optimizations fabricated across LDE to resolve ambiguity in input text among the languages with the same character pattern. Our method demonstrates an average accuracy of 94.5% for Indian languages in Latin script and that of 98% for European languages on the code-switched data. This model outperforms fastText by 60.39% and ML-Kit by 23.67% in F1 score for European languages. LDE is faster on mobile device with an average inference time of 25.91 microseconds.

Submitted 7 January, 2021; originally announced January 2021.

Comments: 2020 IEEE 14th International Conference on Semantic Computing (ICSC). Accessible at https://ieeexplore.ieee.org/document/9031474

Journal ref: 2020 IEEE 14th International Conference on Semantic Computing (ICSC), San Diego, CA, USA, 2020, pp. 279-286

arXiv:2011.06713 [pdf, other]

iHorology: Lowering the Barrier to Microsecond-level Internet Time

Authors: Sathiya Kumaran Mani, Yi Cao, Paul Barford, Darryl Veitch

Abstract: High precision, synchronized clocks are essential to a growing number of Internet applications. Standard protocols and their associated server infrastructure have been shown to typically enable client clocks to synchronize on the order of tens of milliseconds. We address one of the key challenges to high precision Internet timekeeping - the intrinsic contribution to clock error of path asymmetry between client and time server, a fundamental barrier to microsecond level accuracy. We first exploit results of a measurement study to quantify asymmetry and its effect on timing. We then describe three approaches to addressing the path asymmetry problem: LBBE, SBBE and K-SBBE, each based on timestamp exchange with multiple servers, with the goal of tightening bounds on asymmetry for each client. We explore their capabilities and limitations through simulation and argument. We show that substantial improvements are possible, and discuss whether, and how, the goal of microsecond accuracy might be attained.

Submitted 12 November, 2020; originally announced November 2020.

arXiv:2011.03901 [pdf, other]

Adversarial Black-Box Attacks On Text Classifiers Using Multi-Objective Genetic Optimization Guided By Deep Networks

Authors: Alex Mathai, Shreya Khare, Srikanth Tamilselvam, Senthil Mani

Abstract: We propose a novel genetic-algorithm technique that generates black-box adversarial examples which successfully fool neural network based text classifiers. We perform a genetic search with multi-objective optimization guided by deep learning based inferences and Seq2Seq mutation to generate semantically similar but imperceptible adversaries. We compare our approach with DeepWordBug (DWB) on SST and IMDB sentiment datasets by attacking three trained models viz. char-LSTM, word-LSTM and elmo-LSTM. On an average, we achieve an attack success rate of 65.67% for SST and 36.45% for IMDB across the three models showing an improvement of 49.48% and 101% respectively. Furthermore, our qualitative study indicates that 94% of the time, the users were not able to distinguish between an original and adversarial sample.

Submitted 9 November, 2020; v1 submitted 7 November, 2020; originally announced November 2020.

arXiv:2011.01043 [pdf, other]

Evaluation of Siamese Networks for Semantic Code Search

Authors: Raunak Sinha, Utkarsh Desai, Srikanth Tamilselvam, Senthil Mani

Abstract: With the increase in the number of open repositories and discussion forums, the use of natural language for semantic code search has become increasingly common. The accuracy of the results returned by such systems, however, can be low due to 1) limited shared vocabulary between code and user query and 2) inadequate semantic understanding of user query and its relation to code syntax. Siamese networks are well suited to learning such joint relations between data, but have not been explored in the context of code search. In this work, we evaluate Siamese networks for this task by exploring multiple extraction network architectures. These networks independently process code and text descriptions before passing them to a Siamese network to learn embeddings in a common space. We experiment on two different datasets and discover that Siamese networks can act as strong regularizers on networks that extract rich information from code and text, which in turn helps achieve impressive performance on code search beating previous baselines on $2$ programming languages. We also analyze the embedding space of these networks and provide directions to fully leverage the power of Siamese networks for semantic code search.

Submitted 12 October, 2020; originally announced November 2020.

arXiv:2005.10220 [pdf, other]

Reducing Overlearning through Disentangled Representations by Suppressing Unknown Tasks

Authors: Naveen Panwar, Tarun Tater, Anush Sankaran, Senthil Mani

Abstract: Existing deep learning approaches for learning visual features tend to overlearn and extract more information than what is required for the task at hand. From a privacy preservation perspective, the input visual information is not protected from the model, enabling the model to become more intelligent than it is trained to be. Current approaches for suppressing additional task learning assume the presence of ground truth labels for the tasks to be suppressed during training time. In this research, we propose a three-fold novel contribution: (i) a model-agnostic solution for reducing model overlearning by suppressing all the unknown tasks, (ii) a novel metric to measure the trust score of a trained deep learning model, and (iii) a simulated benchmark dataset, PreserveTask, having five different fundamental image classification tasks to study the generalization nature of models. In the first set of experiments, we learn disentangled representations and suppress overlearning of five popular deep learning models: VGG16, VGG19, Inception-v1, MobileNet, and DenseNet on the PreserveTask dataset. Additionally, we show results of our framework on the color-MNIST dataset and practical applications of face attribute preservation in the Diversity in Faces (DiF) and IMDB-Wiki datasets.

Submitted 20 May, 2020; originally announced May 2020.

Comments: Added appendix with additional results

arXiv:2002.00754 [pdf, other]

Benchmarking Popular Classification Models' Robustness to Random and Targeted Corruptions

Authors: Utkarsh Desai, Srikanth Tamilselvam, Jassimran Kaur, Senthil Mani, Shreya Khare

Abstract: Text classification models, especially neural network based models, have reached very high accuracy on many popular benchmark datasets. Yet such models, when deployed in real world applications, tend to perform badly. The primary reason is that these models are not tested against sufficient real world natural data. Based on the application users, the vocabulary and the style of the model's input may greatly vary. This emphasizes the need for a model agnostic test dataset, which consists of various corruptions that are natural to appear in the wild. Models trained and tested on such benchmark datasets will be more robust against real world data. However, such data sets are not easily available. In this work, we address this problem by extending the benchmark datasets along naturally occurring corruptions such as Spelling Errors, Text Noise and Synonyms and making them publicly available. Through extensive experiments, we compare random and targeted corruption strategies using Local Interpretable Model-Agnostic Explanations (LIME). We report the vulnerabilities in two popular text classification models along these corruptions and also find that targeted corruptions can expose vulnerabilities of a model better than random choices in most cases.

Submitted 31 January, 2020; originally announced February 2020.

arXiv:1912.03984 [pdf, other]

Expert-guided Regularization via Distance Metric Learning

Authors: Shouvik Mani, Mehdi Maasoumy, Sina Pakazad, Henrik Ohlsson

Abstract: High-dimensional prediction is a challenging problem setting for traditional statistical models. Although regularization improves model performance in high dimensions, it does not sufficiently leverage knowledge on feature importances held by domain experts. As an alternative to standard regularization techniques, we propose Distance Metric Learning Regularization (DMLreg), an approach for eliciting prior knowledge from domain experts and integrating that knowledge into a regularized linear model. First, we learn a Mahalanobis distance metric between observations from pairwise similarity comparisons provided by an expert. Then, we use the learned distance metric to place prior distributions on coefficients in a linear model. Through experimental results on a simulated high-dimensional prediction problem, we show that DMLreg leads to improvements in model performance when the domain expert is knowledgeable.

Submitted 9 December, 2019; originally announced December 2019.

Journal ref: Learning with Rich Experience (LIRE) Workshop, NeurIPS 2019
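
The abstract above relies on a Mahalanobis distance metric between observations. For readers unfamiliar with the term, a minimal sketch of how such a distance is evaluated once a metric matrix is available might look like this. The function name and the example matrices are assumptions for illustration; how DMLreg learns the matrix from expert comparisons and turns it into priors is not described in the abstract and is not shown here.

```python
import math


def mahalanobis(x, y, M):
    """Return sqrt((x - y)^T M (x - y)) for a positive semidefinite matrix M."""
    d = [a - b for a, b in zip(x, y)]
    # Matrix-vector product M @ d, written out for plain Python lists.
    Md = [sum(M[i][j] * d[j] for j in range(len(d))) for i in range(len(d))]
    return math.sqrt(sum(d[i] * Md[i] for i in range(len(d))))


identity = [[1.0, 0.0], [0.0, 1.0]]   # reduces to Euclidean distance
weighted = [[4.0, 0.0], [0.0, 1.0]]   # hypothetical metric stressing feature 0

dist_euclid = mahalanobis([0.0, 0.0], [3.0, 4.0], identity)
dist_weighted = mahalanobis([0.0, 0.0], [1.0, 0.0], weighted)
```

With the identity matrix the distance is the ordinary Euclidean one, while the hypothetical `weighted` matrix stretches differences along the first feature, which is the sense in which a learned metric can encode feature importance.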

arXiv:1911.11433 [pdf, other]

"You might also like this model": Data Driven Approach for Recommending Deep Learning Models for Unknown Image Datasets

Authors: Ameya Prabhu, Riddhiman Dasgupta, Anush Sankaran, Srikanth Tamilselvam, Senthil Mani

Abstract: For an unknown (new) classification dataset, choosing an appropriate deep learning architecture is often a recursive, time-consuming, and laborious process. In this research, we propose a novel technique to recommend a suitable architecture from a repository of known models. Further, we predict the performance accuracy of the recommended architecture on the given unknown dataset, without the need for training the model. We propose a model encoder approach to learn a fixed length representation of deep learning architectures along with their hyperparameters, in an unsupervised fashion. We manually curate a repository of image datasets with corresponding known deep learning models and show that the predicted accuracy is a good estimator of the actual accuracy. We discuss the implications of the proposed approach for three benchmark image datasets and also the challenges in using the approach for text modality. To further increase the reproducibility of the proposed approach, the entire implementation is made publicly available along with the trained models.

Submitted 20 May, 2020; v1 submitted 26 November, 2019; originally announced November 2019.

Comments: NeurIPS 2019, New in ML Group

arXiv:1911.07309 [pdf, other]

Coverage Testing of Deep Learning Models using Dataset Characterization

Authors: Senthil Mani, Anush Sankaran, Srikanth Tamilselvam, Akshay Sethi

Abstract: Deep Neural Networks (DNNs), with their promising performance, are being increasingly used in safety critical applications such as autonomous driving, cancer detection, and secure authentication. With the growing importance of deep learning, there is a requirement for a more standardized framework to evaluate and test deep learning models. The primary challenges involved in automated generation of extensive test cases are: (i) neural networks are difficult to interpret and debug and (ii) availability of human annotators to generate specialized test points. In this research, we explain the necessity to measure the quality of a dataset and propose a test case generation system guided by the dataset properties. From a testing perspective, four different dataset quality dimensions are proposed: (i) equivalence partitioning, (ii) centroid positioning, (iii) boundary conditioning, and (iv) pair-wise boundary conditioning. The proposed system is evaluated on well known image classification datasets such as MNIST, Fashion-MNIST, CIFAR10, CIFAR100, and SVHN against popular deep learning models such as LeNet, ResNet-20, VGG-19. Further, we conduct various experiments to demonstrate the effectiveness of a systematic test case generation system for evaluating deep learning models.

Submitted 17 November, 2019; originally announced November 2019.

arXiv:1906.07981 [pdf, other]

Artistic Enhancement and Style Transfer of Image Edges using Directional Pseudo-coloring

Authors: Shouvik Mani

Abstract: Computing the gradient of an image is a common step in computer vision pipelines. The image gradient quantifies the magnitude and direction of edges in an image and is used in creating features for downstream machine learning tasks. Typically, the image gradient is represented as a grayscale image. This paper introduces directional pseudo-coloring, an approach to color the image gradient in a deliberate and coherent manner. By pseudo-coloring the image gradient magnitude with the image gradient direction, we can enhance the visual quality of image edges and achieve an artistic transformation of the original image. Additionally, we present a simple style transfer pipeline which learns a color map from a style image and then applies that color map to color the edges of a content image through the directional pseudo-coloring technique. Code for the algorithms and experiments is available at https://github.com/shouvikmani/edge-colorizer.

Submitted 19 June, 2019; originally announced June 2019.

Journal ref: 2nd Workshop on Humanizing AI (HAI), IJCAI 2019
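
The abstract above describes coloring the gradient magnitude by the gradient direction. One way to picture the idea is the sketch below. It is not taken from the paper or its repository: it assumes central-difference gradients, maps direction to HSV hue and normalized magnitude to HSV value, and uses a made-up function name. The color map the author actually uses may differ.

```python
import colorsys
import math


def directional_pseudo_color(gray):
    """gray: 2-D list of floats in [0, 1]. Returns a 2-D list of (r, g, b) tuples."""
    h, w = len(gray), len(gray[0])
    gx = [[0.0] * w for _ in range(h)]
    gy = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Central differences, with indices clamped at the image border.
            gx[y][x] = (gray[y][min(x + 1, w - 1)] - gray[y][max(x - 1, 0)]) / 2.0
            gy[y][x] = (gray[min(y + 1, h - 1)][x] - gray[max(y - 1, 0)][x]) / 2.0
    # Normalize magnitudes by the strongest edge; avoid dividing by zero.
    max_mag = max(math.hypot(gx[y][x], gy[y][x])
                  for y in range(h) for x in range(w)) or 1.0
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            mag = math.hypot(gx[y][x], gy[y][x]) / max_mag
            # Map the direction angle from [-pi, pi] to a hue in [0, 1).
            hue = ((math.atan2(gy[y][x], gx[y][x]) + math.pi) / (2 * math.pi)) % 1.0
            row.append(colorsys.hsv_to_rgb(hue, 1.0, mag))
        out.append(row)
    return out


# Toy image with a single vertical edge between columns 1 and 2.
image = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
colored = directional_pseudo_color(image)
```

In this toy image the flat regions stay black because their gradient magnitude is zero, and the edge pixels all receive the same hue because they share one gradient direction.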

arXiv:1905.02486 [pdf, other]

A Visual Programming Paradigm for Abstract Deep Learning Model Development

Authors: Srikanth Tamilselvam, Naveen Panwar, Shreya Khare, Rahul Aralikatte, Anush Sankaran, Senthil Mani

Abstract: Deep learning is one of the fastest growing technologies in computer science with a plethora of applications. But this unprecedented growth has so far been limited to the consumption of deep learning experts. The primary challenges are a steep learning curve for learning the programming libraries and the lack of intuitive systems enabling non-experts to consume deep learning. To address these challenges, we study the effectiveness of a no-code paradigm for designing deep learning models. Particularly, a visual drag-and-drop interface is found more efficient when compared with the traditional programming and alternative visual programming paradigms. We conduct user studies of different expertise levels to measure the entry level barrier and the developer load across different programming paradigms. We obtain a System Usability Scale (SUS) of 90 and a NASA Task Load index (TLX) score of 21 for the proposed visual programming compared to 68 and 52, respectively, for the traditional programming methods.

Submitted 19 August, 2019; v1 submitted 7 May, 2019; originally announced May 2019.

arXiv:1811.04376 [pdf, other]

Explaining Deep Learning Models using Causal Inference

Authors: Tanmayee Narendra, Anush Sankaran, Deepak Vijaykeerthy, Senthil Mani

Abstract: Although deep learning models have been successfully applied to a variety of tasks, due to the millions of parameters, they are becoming increasingly opaque and complex. In order to establish trust for their widespread commercial use, it is important to formalize a principled framework to reason over these models. In this work, we use ideas from causal inference to describe a general framework to reason over CNN models. Specifically, we build a Structural Causal Model (SCM) as an abstraction over a specific aspect of the CNN. We also formulate a method to quantitatively rank the filters of a convolution layer according to their counterfactual importance. We illustrate our approach with popular CNN architectures such as LeNet5, VGG19, and ResNet32.

Submitted 11 November, 2018; originally announced November 2018.

arXiv:1811.01312 [pdf, other]

Adversarial Black-Box Attacks on Automatic Speech Recognition Systems using Multi-Objective Evolutionary Optimization

Authors: Shreya Khare, Rahul Aralikatte, Senthil Mani

Abstract: Fooling deep neural networks with adversarial input has exposed a significant vulnerability in the current state-of-the-art systems in multiple domains. Both black-box and white-box approaches have been used to either replicate the model itself or to craft examples which cause the model to fail. In this work, we propose a framework which uses multi-objective evolutionary optimization to perform both targeted and un-targeted black-box attacks on Automatic Speech Recognition (ASR) systems. We apply this framework to two ASR systems, Deepspeech and Kaldi-ASR, which increases the Word Error Rates (WER) of these systems by up to 980%, indicating the potency of our approach. During both un-targeted and targeted attacks, the adversarial samples maintain a high acoustic similarity of 0.98 and 0.97 with the original audio.

Submitted 3 July, 2019; v1 submitted 3 November, 2018; originally announced November 2018.

Comments: Published in Interspeech 2019

arXiv:1810.04412 [pdf, ps, other]

doi 10.1007/978-3-319-99429-1_6

Synthesis for Vesicle Traffic Systems

Authors: Ashutosh Gupta, Somya Mani, Ankit Shukla

Abstract: Vesicle Traffic Systems (VTSs) are the material transport mechanisms among the compartments inside biological cells. The compartments are viewed as nodes that are labeled with the chemicals they contain, and the transport channels are similarly viewed as labeled edges between the nodes. Understanding VTSs is an ongoing area of research, and for many cells they are only partially known. For example, there may be undiscovered edges, nodes, or their labels in a VTS of a cell. It has been speculated that there are properties that the VTSs must satisfy. For example, stability, i.e., every chemical that leaves a compartment comes back. Many synthesis questions may arise in this scenario, where we want to complete a partially known VTS under a given property. In the paper, we present novel encodings of the above questions into QBF (quantified Boolean formula) satisfiability problems. We have implemented the encodings in a highly configurable tool and applied it to a couple of found-in-nature VTSs and several synthetic graphs. Our results demonstrate that our method can scale up to the graphs of interest.

Submitted 10 October, 2018; originally announced October 2018.

Comments: 18 pages, 2 figures, 1 table

arXiv:1806.02474 [pdf, ps, other]

A System for Clock Synchronization in an Internet of Things

Authors: Sathiya Kumaran Mani, Ramakrishnan Durairajan, Paul Barford, Joel Sommers

Abstract: Synchronizing clocks on Internet of Things (IoT) devices is important for applications such as monitoring and real-time control. In this paper, we describe a system for clock synchronization in IoT devices that is designed to be scalable, flexibly accommodate diverse hardware, and maintain tight synchronization over a range of operating conditions. We begin by examining clock drift on two standard IoT prototyping platforms. We observe clock drift on the order of seconds over relatively short time periods, as well as poor clock rate stability, each of which makes standard synchronization protocols ineffective. To address this problem, we develop a synchronization system, which includes a lightweight client, a new packet exchange protocol called SPoT, and a scalable reference server. We evaluate the efficacy of our system over a range of configurations, operating conditions, and target platforms. We find that SPoT performs synchronization 22x and 17x more accurately than MQTT and SNTP, respectively, at high noise levels, and maintains a clock accuracy of within ~15ms at various noise levels. Finally, we report on the scalability of our server implementation through microbenchmark and wide area experiments, which show that our system can scale to support large numbers of clients efficiently.

Submitted 6 June, 2018; originally announced June 2018.

arXiv:1801.02123 [pdf, ps, other]

TimeWeaver: Opportunistic One Way Delay Measurement via NTP

Authors: Ramakrishnan Durairajan, Sathiya Kumaran Mani, Paul Barford, Rob Nowak, Joel Sommers

Abstract: One-way delay (OWD) between end hosts has important implications for Internet applications, protocols, and measurement-based analyses. We describe a new approach for identifying OWDs via passive measurement of Network Time Protocol (NTP) traffic. NTP traffic offers the opportunity to measure OWDs accurately and continuously from hosts throughout the Internet. Based on detailed examination of NTP implementations and in-situ behavior, we develop an analysis tool that we call TimeWeaver, which enables assessment of precision and accuracy of OWD measurements from NTP. We apply TimeWeaver to a ~1TB corpus of NTP traffic collected from 19 servers located in the US and report on the characteristics of hosts and their associated OWDs, which we classify in a precision/accuracy hierarchy. To demonstrate the utility of these measurements, we apply iterative hard-threshold singular value decomposition to estimate OWDs between arbitrary hosts from the highest tier in the hierarchy. We show that this approach results in highly accurate estimates of OWDs, with average error rates on the order of less than 2%. Finally, we outline a number of applications (in particular, IP geolocation, network operations, and management) for hosts in lower tiers of the precision hierarchy that can benefit from TimeWeaver, offering directions for future work.

Submitted 6 January, 2018; originally announced January 2018.

Comments: 14 pages

arXiv:1801.01275 [pdf, other]

DeepTriage: Exploring the Effectiveness of Deep Learning for Bug Triaging

Authors: Senthil Mani, Anush Sankaran, Rahul Aralikatte

Abstract: For a given software bug report, identifying an appropriate developer who could potentially fix the bug is the primary task of a bug triaging process. A bug title (summary) and a detailed description are present in most bug tracking systems. An automatic bug triaging algorithm can be formulated as a classification problem, with the bug title and description as the input, mapping it to one of the available developers (classes). The major challenge is that the bug description usually contains a combination of free unstructured text, code snippets, and stack traces, making the input data noisy. The existing bag-of-words (BOW) feature models do not consider the syntactical and sequential word information available in the unstructured text. We propose a novel bug report representation algorithm using an attention-based deep bidirectional recurrent neural network (DBRNN-A) model that learns syntactic and semantic features from long word sequences in an unsupervised manner. Instead of BOW features, the DBRNN-A based bug representation is then used for training the classifier. Using an attention mechanism enables the model to learn the context representation over a long word sequence, as in a bug report. To provide a large amount of data for the feature learning model, the unfixed bug reports (~70% of bugs in an open source bug tracking system) are leveraged, which were completely ignored in previous studies.
Another contribution is to make this research reproducible by making the source code available and creating a public benchmark dataset of bug reports from three open source bug tracking systems: Google Chromium (383,104 bug reports), Mozilla Core (314,388 bug reports), and Mozilla Firefox (162,307 bug reports). Experimentally, we compare our approach with the BOW model and machine learning approaches and observe that DBRNN-A provides a higher rank-10 average accuracy.

Submitted 4 January, 2018; originally announced January 2018.

arXiv:1801.00428 [pdf, other]

Sanskrit Sandhi Splitting using seq2(seq)^2

Authors: Rahul Aralikatte, Neelamadhav Gantayat, Naveen Panwar, Anush Sankaran, Senthil Mani

Abstract: In Sanskrit, small words (morphemes) are combined to form compound words through a process known as Sandhi. Sandhi splitting is the process of splitting a given compound word into its constituent morphemes. Although rules governing word splitting exist in the language, it is highly challenging to identify the location of the splits in a compound word. Though existing Sandhi splitting systems incorporate these pre-defined splitting rules, they have low accuracy, as the same compound word might be broken down in multiple ways to provide syntactically correct splits. In this research, we propose a novel deep learning architecture called Double Decoder RNN (DD-RNN), which (i) predicts the location of the split(s) with 95% accuracy, and (ii) predicts the constituent words (learning the Sandhi splitting rules) with 79.5% accuracy, outperforming the state of the art by 20%. Additionally, we show the generalization capability of our deep learning model by showing competitive results on the problem of Chinese word segmentation as well.

Submitted 15 July, 2019; v1 submitted 1 January, 2018; originally announced January 2018.

Comments: Accepted in EMNLP 2018

arXiv:1711.03543 [pdf, other]

DLPaper2Code: Auto-generation of Code from Deep Learning Research Papers

Authors: Akshay Sethi, Anush Sankaran, Naveen Panwar, Shreya Khare, Senthil Mani

Abstract: With an abundance of research papers in deep learning, reproducibility or adoption of the existing works becomes a challenge. This is due to the lack of open source implementations provided by the authors. Further, re-implementing research papers in a different library is a daunting task. To address these challenges, we propose a novel extensible approach, DLPaper2Code, to extract and understand deep learning design flow diagrams and tables available in a research paper and convert them to an abstract computational graph. The extracted computational graph is then converted into execution-ready source code in both Keras and Caffe, in real-time. An arXiv-like website is created where the automatically generated designs are made publicly available for 5,000 research papers. The generated designs can be rated and edited using an intuitive drag-and-drop UI framework in a crowdsourced manner. To evaluate our approach, we create a simulated dataset with over 216,000 valid design visualizations using a manually defined grammar. Experiments on the simulated dataset show that the proposed framework provides more than $93\%$ accuracy in flow diagram content extraction.

Submitted 9 November, 2017; originally announced November 2017.

Comments: AAAI2018

arXiv:1711.02012 [pdf, other]

Hi, how can I help you?: Automating enterprise IT support help desks

Authors: Senthil Mani, Neelamadhav Gantayat, Rahul Aralikatte, Monika Gupta, Sampath Dechu, Anush Sankaran, Shreya Khare, Barry Mitchell, Hemamalini Subramanian, Hema Venkatarangan

Abstract: Question answering is one of the primary challenges of natural language understanding. In realizing such a system, providing complex long answers to questions is a challenging task as opposed to factoid answering, as the former needs context disambiguation. The different methods explored in the literature can be broadly classified into three categories, namely: 1) classification based, 2) knowledge graph based and 3) retrieval based. Individually, none of them addresses the need of an enterprise-wide assistance system for an IT support and maintenance domain. In this domain, the variance of answers is large, ranging from factoid to structured operating procedures; the knowledge is present across heterogeneous data sources like application-specific documentation and ticket management systems; and any single technique for general-purpose assistance is unable to scale for such a landscape. To address this, we have built a cognitive platform with capabilities adopted for this domain. Further, we have built a general-purpose question answering system leveraging the platform that can be instantiated for multiple products and technologies in the support domain. The system uses a novel hybrid answering model that orchestrates across a deep learning classifier, a knowledge graph based context disambiguation module, and a sophisticated bag-of-words search system. This orchestration performs context switching for a provided question and also does a smooth hand-off of the question to a human expert if none of the automated techniques can provide a confident answer.
This system has been deployed across 675 internal enterprise IT support and maintenance projects.

Submitted 2 November, 2017; originally announced November 2017.

Comments: To appear in IAAI 2018

arXiv:1710.02595 [pdf, other]

Intelligent Pothole Detection and Road Condition Assessment

Authors: Umang Bhatt, Shouvik Mani, Edgar Xi, J. Zico Kolter

Abstract: Poor road conditions are a public nuisance, causing passenger discomfort, damage to vehicles, and accidents. In the U.S., road-related conditions are a factor in 22,000 of the 42,000 traffic fatalities each year. Although we often complain about bad roads, we have no way to detect or report them at scale. To address this issue, we developed a system to detect potholes and assess road conditions in real-time. Our solution is a mobile application that captures data on a car's movement from gyroscope and accelerometer sensors in the phone. To assess roads using this sensor data, we trained SVM models to classify road conditions with 93% accuracy and potholes with 92% accuracy, beating the base rate for both problems. As the user drives, the models use the sensor data to classify whether the road is good or bad, and whether it contains potholes. Then, the classification results are used to create data-rich maps that illustrate road conditions across the city. Our system will empower civic officials to identify and repair damaged roads which inconvenience passengers and cause accidents. This paper details our data science process for collecting training data on real roads, transforming noisy sensor data into useful signals, training and evaluating machine learning models, and deploying those models to production through a real-time classification app. It also highlights how cities can use our system to crowdsource data and deliver road repair resources to areas in need.

Submitted 10 October, 2017; v1 submitted 6 October, 2017; originally announced October 2017.

Comments: Presented at the Data For Good Exchange 2017

arXiv:1708.04968 [pdf, other]

doi 10.1145/3152494.3152500

Fault in your stars: An Analysis of Android App Reviews

Authors: Rahul Aralikatte, Giriprasad Sridhara, Neelamadhav Gantayat, Senthil Mani

Abstract: Mobile app distribution platforms such as Google Play Store allow users to share their feedback about downloaded apps in the form of a review comment and a corresponding star rating. Typically, the star rating ranges from one to five stars, with one star denoting a high sense of dissatisfaction with the app and five stars denoting a high sense of satisfaction. Unfortunately, due to a variety of reasons, the star rating provided by a user is often inconsistent with the opinion expressed in the review. For example, consider the following review for the Facebook App on Android: "Awesome App". One would reasonably expect the rating for this review to be five stars, but the actual rating is one star! Such inconsistent ratings can lead to a deflated (or inflated) overall average rating of an app, which can affect user downloads, as users typically look at the average star ratings while making a decision on downloading an app. Also, the app developers receive biased feedback about the application that does not represent ground reality. This is especially significant for small apps with a few thousand downloads, as even a small number of mismatched reviews can bring down the average rating drastically. In this paper, we conducted a study on this review-rating mismatch problem. We manually examined 8600 reviews from 10 popular Android apps and found that 20% of the ratings in our dataset were inconsistent with the review.
Further, we developed three systems, two of which were based on traditional machine learning and one on deep learning, to automatically identify reviews whose rating did not match the opinion expressed in the review. Our deep learning system performed the best and had an accuracy of 92% in identifying the correct star rating to be associated with a given review.

Submitted 11 August, 2018; v1 submitted 16 August, 2017; originally announced August 2017.

Comments: Accepted in CoDS-COMAD 2018. Preprint

arXiv:1708.04923 [pdf, other]

mAnI: Movie Amalgamation using Neural Imitation

Authors: Naveen Panwar, Shreya Khare, Neelamadhav Gantayat, Rahul Aralikatte, Senthil Mani, Anush Sankaran

Abstract: Cross-modal data retrieval has been the basis of various creative tasks performed by Artificial Intelligence (AI). One such highly challenging task for AI is to convert a book into its corresponding movie, which most of the creative film makers do as of today. In this research, we take the first step towards it by visualizing the content of a book using its corresponding movie visuals. Given a set of sentences from a book or even a fan-fiction written in the same universe, we employ deep learning models to visualize the input by stitching together relevant frames from the movie. We studied and compared three different settings to match the book with the movie content: (i) Dialog model: using only the dialog from the movie, (ii) Visual model: using only the visual content from the movie, and (iii) Hybrid model: using the dialog and the visual content from the movie. Experiments on the publicly available MovieBook dataset show the effectiveness of the proposed models.

Submitted 16 August, 2017; originally announced August 2017.

Comments: Accepted in ML4Creativity workshop in KDD 2017. Preprint

arXiv:1708.04915 [pdf, other]

doi 10.1109/ICSE-NIER.2017.13

DARVIZ: Deep Abstract Representation, Visualization, and Verification of Deep Learning Models

Authors: Anush Sankaran, Rahul Aralikatte, Senthil Mani, Shreya Khare, Naveen Panwar, Neelamadhav Gantayat

Abstract: Traditional software engineering programming paradigms are mostly object or procedure oriented, driven by deterministic algorithms. With the advent of deep learning and cognitive sciences there is an emerging trend for data-driven programming, creating a shift in the programming paradigm among the software engineering communities. Visualizing and interpreting the execution of a current large scale data-driven software development is challenging. Further, for deep learning development there are many libraries in multiple programming languages such as TensorFlow (Python), CAFFE (C++), Theano (Python), Torch (Lua), and Deeplearning4j (Java), driving a huge need for interoperability across libraries.

Submitted 16 August, 2017; originally announced August 2017.

Comments: Accepted in ICSE NIER 2017. Preprint

arXiv:1206.6853 [pdf]

A theoretical study of Y structures for causal discovery

Authors: Subramani Mani, Peter L. Spirtes, Gregory F. Cooper

Abstract: There are several existing algorithms that under appropriate assumptions can reliably identify a subset of the underlying causal relationships from observational data. This paper introduces the first computationally feasible score-based algorithm that can reliably identify causal relationships in the large sample limit for discrete models, while allowing for the possibility that there are unobserved common causes. In doing so, the algorithm does not ever need to assign scores to causal structures with unobserved common causes. The algorithm is based on the identification of so-called Y substructures within Bayesian network structures that can be learned from observational data. An example of a Y substructure is A -> C, B -> C, C -> D. After providing background on causal discovery, the paper proves the conditions under which the algorithm is reliable in the large sample limit.

Submitted 27 June, 2012; originally announced June 2012.

Comments: Appears in Proceedings of the Twenty-Second Conference on Uncertainty in Artificial Intelligence (UAI2006)

Report number: UAI-P-2006-PG-314-323

arXiv:1006.2702 [pdf]

SPIM Architecture for MVC based Web Applications

Authors: R. Sridaran, G. Padmavathi, K. Iyakutti, M. N. S. Mani

Abstract: The Model / View / Controller design pattern divides an application environment into three components to handle the user-interactions, computations and output respectively. This separation greatly favors architectural reusability. The pattern works well in the case of a single address space but has not proven to be efficient for web applications involving multiple address spaces. Web applications force the designers to decide which of the components of the pattern are to be partitioned between the server and client(s) before the design phase commences. For any rapidly growing web application, it is very difficult to incorporate future changes in policies related to partitioning. One solution to this problem is to duplicate the Model and Controller components at both server and client(s). However, this may add further problems like delayed data fetch, security and scalability issues. In order to overcome this, a new architecture, SPIM, has been proposed that deals with the partitioning problem in an alternative way. SPIM shows tremendous improvements in performance when compared with a similar architecture.

Submitted 14 June, 2010; originally announced June 2010.

Showing 1–31 of 31 results for author: Mani, S