-
Recovering a Message from an Incomplete Set of Noisy Fragments
Authors:
Aditya Narayan Ravi,
Alireza Vahid,
Ilan Shomorony
Abstract:
We consider the problem of communicating over a channel that breaks the message block into fragments of random lengths, shuffles them out of order, and deletes a random fraction of the fragments. Such a channel is motivated by applications in molecular data storage and forensics, and we refer to it as the torn-paper channel. We characterize the capacity of this channel under arbitrary fragment length distributions and deletion probabilities. Precisely, we show that the capacity is given by a closed-form expression that can be interpreted as F - A, where F is the coverage fraction, i.e., the fraction of the input codeword that is covered by output fragments, and A is an alignment cost incurred due to the lack of ordering in the output fragments. We then consider a noisy version of the problem, where the fragments are corrupted by binary symmetric noise. We derive upper and lower bounds on the capacity, both of which can be seen as F - A expressions. These bounds match for specific choices of fragment length distributions, and they are approximately tight in cases where there are not too many short fragments.
Submitted 7 July, 2024;
originally announced July 2024.
-
Diffusion-based learning of contact plans for agile locomotion
Authors:
Victor Dhédin,
Adithya Kumar Chinnakkonda Ravi,
Armand Jordana,
Huaijiang Zhu,
Avadesh Meduri,
Ludovic Righetti,
Bernhard Schölkopf,
Majid Khadiv
Abstract:
Legged robots have become capable of performing highly dynamic maneuvers in the past few years. However, agile locomotion in highly constrained environments such as stepping stones is still a challenge. In this paper, we propose a combination of model-based control, search, and learning to design efficient control policies for agile locomotion on stepping stones. In our framework, we use nonlinear model predictive control (NMPC) to generate whole-body motions for a given contact plan. To efficiently search for an optimal contact plan, we propose to use Monte Carlo tree search (MCTS). While the combination of MCTS and NMPC can quickly find a feasible plan for a given environment (a few seconds), it is not yet suitable to be used as a reactive policy. Hence, we generate a dataset for optimal goal-conditioned policy for a given scene and learn it through supervised learning. In particular, we leverage the power of diffusion models in handling multi-modality in the dataset. We test our proposed framework on a scenario where our quadruped robot Solo12 successfully jumps to different goals in a highly constrained environment.
Submitted 16 July, 2024; v1 submitted 6 March, 2024;
originally announced March 2024.
-
Utilizing Free Clients in Federated Learning for Focused Model Enhancement
Authors:
Aditya Narayan Ravi,
Ilan Shomorony
Abstract:
Federated Learning (FL) is a distributed machine learning approach to learn models on decentralized heterogeneous data, without the need for clients to share their data. Many existing FL approaches assume that all clients have equal importance and construct a global objective based on all clients. We consider a version of FL we call Prioritized FL, where the goal is to learn a weighted mean objective of a subset of clients, designated as priority clients. An important question arises: How do we choose and incentivize well-aligned non-priority clients to participate in the federation, while discarding misaligned clients? We present FedALIGN (Federated Adaptive Learning with Inclusion of Global Needs) to address this challenge. The algorithm employs a matching strategy that chooses non-priority clients based on how similar the model's loss is on their data compared to the global data, thereby ensuring the use of non-priority client gradients only when it is beneficial for priority clients. This approach ensures mutual benefits, as non-priority clients are motivated to join when the model performs satisfactorily on their data, and priority clients can utilize their updates and computational resources when their goals align. We present a convergence analysis that quantifies the trade-off between client selection and speed of convergence. Our algorithm shows faster convergence and higher test accuracy than baselines for various synthetic and benchmark datasets.
Submitted 6 October, 2023;
originally announced October 2023.
-
Visual-Inertial and Leg Odometry Fusion for Dynamic Locomotion
Authors:
Victor Dhédin,
Haolong Li,
Shahram Khorshidi,
Lukas Mack,
Adithya Kumar Chinnakkonda Ravi,
Avadesh Meduri,
Paarth Shah,
Felix Grimminger,
Ludovic Righetti,
Majid Khadiv,
Joerg Stueckler
Abstract:
Implementing dynamic locomotion behaviors on legged robots requires a high-quality state estimation module. Especially when the motion includes flight phases, state-of-the-art approaches fail to produce reliable estimation of the robot posture, in particular base height. In this paper, we propose a novel approach for combining visual-inertial odometry (VIO) with leg odometry in an extended Kalman filter (EKF) based state estimator. The VIO module uses a stereo camera and IMU to yield low-drift 3D position and yaw orientation and drift-free pitch and roll orientation of the robot base link in the inertial frame. However, these values have a considerable amount of latency due to image processing and optimization, while the rate of update is quite low which is not suitable for low-level control. To reduce the latency, we predict the VIO state estimate at the rate of the IMU measurements of the VIO sensor. The EKF module uses the base pose and linear velocity predicted by VIO, fuses them further with a second high-rate IMU and leg odometry measurements, and produces robot state estimates with a high frequency and small latency suitable for control. We integrate this lightweight estimation framework with a nonlinear model predictive controller and show successful implementation of a set of agile locomotion behaviors, including trotting and jumping at varying horizontal speeds, on a torque-controlled quadruped robot.
Submitted 10 October, 2022; v1 submitted 5 October, 2022;
originally announced October 2022.
-
Classification of Microscopy Images of Breast Tissue: Region Duplication based Self-Supervision vs. Off-the Shelf Deep Representations
Authors:
Aravind Ravi
Abstract:
Breast cancer is one of the leading causes of female mortality in the world. This can be reduced when diagnoses are performed at the early stages of progression. Further, the efficiency of the process can be significantly improved with computer aided diagnosis. Deep learning based approaches have been successfully applied to achieve this. One of the limiting factors for training deep networks in a supervised manner is the dependency on large amounts of expert annotated data. In reality, large amounts of unlabelled data and only small amounts of expert annotated data are available. In such scenarios, transfer learning approaches and self-supervised learning (SSL) based approaches can be leveraged. In this study, we propose a novel self-supervision pretext task to train a convolutional neural network (CNN) and extract domain specific features. This method was compared with deep features extracted using pre-trained CNNs such as DenseNet-121 and ResNet-50 trained on ImageNet. Additionally, two types of patch-combination methods were introduced and compared with majority voting. The methods were validated on the BACH microscopy images dataset. Results indicated that the best performance of 99% sensitivity was achieved for the deep features extracted using ResNet50 with concatenation of patch-level embedding. Preliminary results of SSL to extract domain specific features indicated that with just 15% of unlabelled data a high sensitivity of 94% can be achieved for a four class classification of microscopy images.
Submitted 12 February, 2022;
originally announced February 2022.
-
A Tale of Color Variants: Representation and Self-Supervised Learning in Fashion E-Commerce
Authors:
Ujjal Kr Dutta,
Sandeep Repakula,
Maulik Parmar,
Abhinav Ravi
Abstract:
In this paper, we address a crucial problem in fashion e-commerce (with respect to customer experience as well as revenue): color variants identification, i.e., identifying fashion products that match exactly in their design (or style) but differ only in their color. We propose a generic framework that leverages deep visual Representation Learning at its heart to address this problem for our fashion e-commerce platform. Our framework could be trained with supervisory signals in the form of triplets that are obtained manually. However, it is infeasible to obtain manual annotations for the entire huge collection of data usually present in fashion e-commerce platforms such as ours while capturing all the difficult corner cases. Interestingly, we observed that this crucial problem could also be solved by simple color-jitter-based image augmentation, which recently became widely popular in the contrastive Self-Supervised Learning (SSL) literature, which seeks to learn visual representations without using manual labels. This naturally led to a question: could we leverage SSL in our use case and still obtain performance comparable to our supervised framework? The answer is yes, because color variant fashion objects are nothing but manifestations of a style in different colors, and a model trained to be invariant to color (with or without supervision) should be able to recognize this. The paper demonstrates this both qualitatively and quantitatively, while evaluating a couple of state-of-the-art SSL techniques and also proposing a novel method.
Submitted 6 December, 2021;
originally announced December 2021.
-
"If we didn't solve small data in the past, how can we solve Big Data today?"
Authors:
Akash Ravi
Abstract:
Data is a critical aspect of the world we live in. With systems producing and consuming vast amounts of data, it is essential for businesses to digitally transform and be equipped to derive the most value out of data. Data analytics techniques can be used to augment strategic decision-making. While this overall objective of data analytics remains fairly constant, the data itself can be available in numerous forms and can be categorized under various contexts. In this paper, we aim to research terms such as 'small' and 'big' data, understand their attributes, and look at ways in which they can add value. Specifically, the paper probes into the question "If we didn't solve small data in the past, how can we solve Big Data today?". Based on the research, it can be inferred that, regardless of how small data might have been used, organizations can still leverage big data with the right technology and business vision.
Submitted 8 November, 2021;
originally announced November 2021.
-
Coded Shotgun Sequencing
Authors:
Aditya Narayan Ravi,
Alireza Vahid,
Ilan Shomorony
Abstract:
Most DNA sequencing technologies are based on the shotgun paradigm: many short reads are obtained from random unknown locations in the DNA sequence. A fundamental question, studied in arXiv:1203.6233, is what read length and coverage depth (i.e., the total number of reads) are needed to guarantee reliable sequence reconstruction. Motivated by DNA-based storage, we study the coded version of this problem; i.e., the scenario where the DNA molecule being sequenced is a codeword from a predefined codebook. Our main result is an exact characterization of the capacity of the resulting shotgun sequencing channel as a function of the read length and coverage depth. In particular, our results imply that, while in the uncoded case, $O(n)$ reads of length greater than $2\log{n}$ are needed for reliable reconstruction of a length-$n$ binary sequence, in the coded case, only $O(n/\log{n})$ reads of length greater than $\log{n}$ are needed for the capacity to be arbitrarily close to $1$.
Submitted 7 February, 2022; v1 submitted 6 October, 2021;
originally announced October 2021.
-
SentEmojiBot: Empathising Conversations Generation with Emojis
Authors:
Akhilesh Ravi,
Amit Yadav,
Jainish Chauhan,
Jatin Dholakia,
Naman Jain,
Mayank Singh
Abstract:
The increasing use of dialogue agents makes it extremely desirable for them to understand and acknowledge implied emotions so that they can respond with empathy, as humans do. Chatbots using traditional techniques analyze emotions based on the context and meaning of the text and lack an understanding of emotions expressed through the face. Emojis representing facial expressions present a promising way to express emotions. However, none of the AI systems utilizes emojis for empathetic conversation generation. We propose SentEmojiBot, based on the SentEmoji dataset, to generate empathetic conversations with a combination of emojis and text. Evaluation metrics show that the BERT-based model outperforms the vanilla transformer model. A user study indicates that the dialogues generated by our model were understandable and that adding emojis improved empathetic traits in conversations by 9.8%.
Submitted 26 May, 2021;
originally announced May 2021.
-
Using Game Theory to maximize the chance of victory in two-player sports
Authors:
Ambareesh Ravi,
Atharva Gokhale,
Anchit Nagwekar
Abstract:
Game Theory concepts have been successfully applied in a wide variety of domains over the past decade. Sports and games are one of the popular areas of game theory application owing to its merits and benefits in solving complex scenarios. With recent advancements in technology, the technical and analytical assistance available to players before the match, during game-play and after the match in the form of post-match analysis for any kind of sport has improved to a great extent. In this paper, we propose three novel approaches towards the development of a tool that can assist the players by providing detailed analysis of optimal decisions so that the player is well prepared with the most appropriate strategy which would produce a favourable result for a given opponent's strategy. We also describe how the system changes when we consider real-time game-play wherein the history of the opponent's strategies in the current rally is also taken into consideration while suggesting.
Submitted 24 May, 2021;
originally announced May 2021.
-
A multimodal deep learning framework for scalable content based visual media retrieval
Authors:
Ambareesh Ravi,
Amith Nandakumar
Abstract:
We propose a novel, efficient, modular and scalable framework for content based visual media retrieval systems by leveraging the power of Deep Learning which is flexible to work both for images and videos conjointly and we also introduce an efficient comparison and filtering metric for retrieval. We put forward our findings from critical performance tests comparing our method to the predominant conventional approach to demonstrate the feasibility and efficiency of the proposed solution with best practices, possible improvements that may further augment the ability of retrieval architectures.
Submitted 18 May, 2021;
originally announced May 2021.
-
Color Variants Identification in Fashion e-commerce via Contrastive Self-Supervised Representation Learning
Authors:
Ujjal Kr Dutta,
Sandeep Repakula,
Maulik Parmar,
Abhinav Ravi
Abstract:
In this paper, we utilize deep visual Representation Learning to address an important problem in fashion e-commerce: color variants identification, i.e., identifying fashion products that match exactly in their design (or style) but differ only in their color. At first, we attempt to tackle the problem by obtaining manual annotations (depicting whether two products are color variants) and training a supervised triplet-loss-based neural network model to learn representations of fashion products. However, for large-scale real-world industrial datasets such as the one addressed in our paper, it is infeasible to obtain annotations for the entire dataset while capturing all the difficult corner cases. Interestingly, we observed that color variants are essentially manifestations of color-jitter-based augmentations. Thus, we instead explore Self-Supervised Learning (SSL) to solve this problem. We observed that existing state-of-the-art SSL methods perform poorly on our problem. To address this, we propose a novel SSL-based color variants model that simultaneously focuses on different parts of an apparel. Quantitative and qualitative evaluation shows that our method outperforms existing SSL methods and, at times, the supervised model.
Submitted 30 June, 2021; v1 submitted 17 April, 2021;
originally announced April 2021.
-
A Comprehensive Survey of Machine Learning Based Localization with Wireless Signals
Authors:
Daoud Burghal,
Ashwin T. Ravi,
Varun Rao,
Abdullah A. Alghafis,
Andreas F. Molisch
Abstract:
The last few decades have witnessed a growing interest in location-based services. Using localization systems based on Radio Frequency (RF) signals has proven its efficacy for both indoor and outdoor applications. However, challenges remain with respect to both complexity and accuracy of such systems. Machine Learning (ML) is one of the most promising methods for mitigating these problems, as ML (especially deep learning) offers powerful practical data-driven tools that can be integrated into localization systems. In this paper, we provide a comprehensive survey of ML-based localization solutions that use RF signals. The survey spans different aspects, ranging from the system architectures, to the input features, the ML methods, and the datasets.
A main point of the paper is the interaction between the domain knowledge arising from the physics of localization systems, and the various ML approaches. Besides the ML methods, the utilized input features play a major role in shaping the localization solution; we present a detailed discussion of the different features and what could influence them, be it the underlying wireless technology or standards or the preprocessing techniques. A detailed discussion is dedicated to the different ML methods that have been applied to localization problems, discussing the underlying problem and the solution structure. Furthermore, we summarize the different ways the datasets were acquired, and then list the publicly available ones. Overall, the survey categorizes and partly summarizes insights from almost 400 papers in this field.
This survey is self-contained, as we provide a concise review of the main ML and wireless propagation concepts, which shall help the researchers in either field navigate through the surveyed solutions, and suggested open problems.
Submitted 21 December, 2020;
originally announced December 2020.
-
Stacked Generalization for Human Activity Recognition
Authors:
Ambareesh Ravi
Abstract:
This short paper aims to discuss the effectiveness and performance of classical machine learning approaches for Human Activity Recognition (HAR). It proposes two important models, Extra Trees and Stacked Classifier, with an emphasis on the best practices, heuristics and measures that are required to maximize the performance of those models.
Submitted 22 September, 2020;
originally announced September 2020.
-
On the Capacity Enlargement of Gaussian Broadcast Channels with Passive Noisy Feedback
Authors:
Aditya Narayan Ravi,
Sibi Raj B. Pillai,
Vinod Prabhakaran,
Michèle Wigger
Abstract:
It is well known that the capacity region of an average transmit power constrained Gaussian Broadcast Channel (GBC) with independent noise realizations at the receivers is enlarged by the presence of causal noiseless feedback. Capacity region enlargement is also known to be possible by using only passive noisy feedback, when the GBC has identical noise variances at the receivers. The last fact remains true even when the feedback noise variance is very high, and available only from one of the receivers. While such capacity enlargements are feasible for several other feedback models in the Gaussian BC setting, it is also known that feedback does not change the capacity region for physically degraded broadcast channels. In this paper, we consider a two user GBC with independent noise realizations at the receivers, where the feedback links from the receivers are corrupted by independent additive Gaussian noise processes. We investigate the set of four noise variances, two forward and two feedback, for which no capacity enlargement is possible. A sharp characterization of this region is derived, i.e., any quadruple outside the presented region will lead to a capacity enlargement, whereas quadruples inside will leave the capacity region unchanged. Our results lead to the conclusion that when the forward noise variances are different, too noisy a feedback from one of the receivers alone is not always beneficial for enlarging the capacity region, be it from the stronger user or the weaker one, in sharp contrast to the case of equal forward noise variances.
Submitted 18 September, 2020;
originally announced September 2020.
-
Attr2Style: A Transfer Learning Approach for Inferring Fashion Styles via Apparel Attributes
Authors:
Rajdeep Hazra Banerjee,
Abhinav Ravi,
Ujjal Kr Dutta
Abstract:
Popular fashion e-commerce platforms mostly provide details about low-level attributes of an apparel (e.g., neck type, dress length, collar type) on their product detail pages. However, customers usually prefer to buy apparel based on their style information, or simply put, occasion (e.g., party/sports/casual wear). Application of a supervised image-captioning model to generate style-based image captions is limited because obtaining ground-truth annotations in the form of style-based captions is difficult. This is because annotating style-based captions requires a certain amount of fashion domain expertise, and also adds to the costs and manual effort. In contrast, low-level attribute-based annotations are much more easily available. To address this issue, we propose a transfer-learning-based image captioning model that is trained on a source dataset with sufficient attribute-based ground-truth captions, and used to predict style-based captions on a target dataset. The target dataset has only a limited number of images with style-based ground-truth captions. The main motivation of our approach comes from the fact that most often there are correlations among the low-level attributes and the higher-level styles for an apparel. We leverage this fact and train our model in an encoder-decoder-based framework using an attention mechanism. In particular, the encoder of the model is first trained on the source dataset to obtain latent representations capturing the low-level attributes. The trained model is fine-tuned to generate style-based captions for the target dataset. To highlight the effectiveness of our method, we qualitatively and quantitatively demonstrate that the captions generated by our approach are close to the actual style information for the evaluated apparel. A Proof Of Concept for our model is under pilot at Myntra where it is exposed to some internal users for feedback.
Submitted 11 December, 2020; v1 submitted 26 August, 2020;
originally announced August 2020.
-
Buy Me That Look: An Approach for Recommending Similar Fashion Products
Authors:
Abhinav Ravi,
Sandeep Repakula,
Ujjal Kr Dutta,
Maulik Parmar
Abstract:
Have you ever looked at an Instagram model, or a model in a fashion e-commerce web-page, and thought "Wish I could get a list of fashion items similar to the ones worn by the model!"? This is what we address in this paper, where we propose a novel computer vision based technique called ShopLook to address the challenging problem of recommending similar fashion products. The proposed method has been evaluated at Myntra (www.myntra.com), a leading online fashion e-commerce platform. In particular, given a user query and the corresponding Product Display Page (PDP) against the query, the goal of our method is to recommend similar fashion products corresponding to the entire set of fashion articles worn by a model in the PDP full-shot image (the one showing the entire model from head to toe). The novelty and strength of our method lie in its capability to recommend similar articles for all the fashion items worn by the model, in addition to the primary article corresponding to the query. This is important not only to promote cross-sells for boosting revenue, but also for improving customer experience and engagement. In addition, our approach is capable of recommending similar products for User Generated Content (UGC), e.g., fashion article images uploaded by users. Formally, our proposed method consists of the following components (in the same order): i) Human keypoint detection, ii) Pose classification, iii) Article localisation and object detection, along with active learning feedback, and iv) Triplet network based image embedding model.
Submitted 6 April, 2021; v1 submitted 26 August, 2020;
originally announced August 2020.
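The last component named in the abstract above, a triplet network based embedding model, is usually trained with a triplet margin loss. The following is a minimal sketch of that generic loss only; the function names, the Euclidean distance, and the default margin are assumptions for illustration and are not taken from the ShopLook paper.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Generic triplet margin loss (hypothetical helper, not the paper's code).

    The loss is zero once the anchor is closer to the positive than to the
    negative by at least `margin`; otherwise it grows linearly.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)
```

In a retrieval setting, the anchor and positive would be images of visually similar articles and the negative a dissimilar one, so that nearest neighbours in the embedding space can serve as recommendations.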
-
Holopix50k: A Large-Scale In-the-wild Stereo Image Dataset
Authors:
Yiwen Hua,
Puneet Kohli,
Pritish Uplavikar,
Anand Ravi,
Saravana Gunaseelan,
Jason Orozco,
Edward Li
Abstract:
With the mass-market adoption of dual-camera mobile phones, leveraging stereo information in computer vision has become increasingly important. Current state-of-the-art methods utilize learning-based algorithms, where the amount and quality of training samples heavily influence results. Existing stereo image datasets are limited either in size or subject variety. Hence, algorithms trained on such datasets do not generalize well to scenarios encountered in mobile photography. We present Holopix50k, a novel in-the-wild stereo image dataset, comprising 49,368 image pairs contributed by users of the Holopix mobile social platform. In this work, we describe our data collection process and statistically compare our dataset to other popular stereo datasets. We experimentally show that using our dataset significantly improves results for tasks such as stereo super-resolution and self-supervised monocular depth estimation. Finally, we showcase practical applications of our dataset to motivate novel works and use cases. The Holopix50k dataset is available at http://github.com/leiainc/holopix50k
Submitted 24 March, 2020;
originally announced March 2020.
-
Unreliable Multi-Armed Bandits: A Novel Approach to Recommendation Systems
Authors:
Aditya Narayan Ravi,
Pranav Poduval,
Dr. Sharayu Moharir
Abstract:
We use a novel modification of Multi-Armed Bandits to create a new model for recommendation systems. We model the recommendation system as a bandit seeking to maximize reward by pulling on arms with unknown rewards. The catch, however, is that this bandit can only access these arms through an unreliable intermediate that has some level of autonomy while choosing its arms. For example, on a streaming website the user has a lot of autonomy in choosing the content they want to watch. The streaming site can use targeted advertising as a means to bias the opinions of these users. Here the streaming site is the bandit aiming to maximize reward, and the user is the unreliable intermediate. We model the intermediate as accessing states via a Markov chain. The bandit is allowed to perturb this Markov chain. We prove fundamental theorems for this setting, after which we show a close-to-optimal Explore-Commit algorithm.
Submitted 14 November, 2019;
originally announced November 2019.
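For orientation, the Explore-Commit idea mentioned in the abstract above can be sketched for a plain Bernoulli bandit. This is the textbook explore-then-commit strategy only: it does not model the unreliable intermediate or the Markov chain perturbation studied in the paper, and all names and parameters here are assumptions for illustration.

```python
import random


def explore_then_commit(arm_means, pulls_per_arm, horizon, seed=0):
    """Textbook explore-then-commit on a simulated Bernoulli bandit.

    arm_means     : true success probability of each arm (simulation only)
    pulls_per_arm : number of exploration pulls given to every arm
    horizon       : total number of pulls allowed
    Returns (committed_arm, total_reward).
    """
    rng = random.Random(seed)
    totals = [0.0] * len(arm_means)
    total_reward = 0.0
    steps = 0

    def pull(arm):
        # Bernoulli reward: 1 with probability arm_means[arm], else 0.
        return 1.0 if rng.random() < arm_means[arm] else 0.0

    # Exploration phase: sample every arm a fixed number of times.
    for arm in range(len(arm_means)):
        for _ in range(pulls_per_arm):
            if steps >= horizon:
                break
            reward = pull(arm)
            totals[arm] += reward
            total_reward += reward
            steps += 1

    # Commit phase: play the empirically best arm for the remaining pulls.
    best_arm = max(range(len(arm_means)), key=lambda a: totals[a])
    while steps < horizon:
        total_reward += pull(best_arm)
        steps += 1

    return best_arm, total_reward
```

The trade-off is governed by `pulls_per_arm`: too few exploration pulls risks committing to a suboptimal arm, while too many wastes pulls on arms already known to be poor.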
-
Teaching DNNs to design fast fashion
Authors:
Abhinav Ravi,
Arun Patro,
Vikram Garg,
Anoop Kolar Rajagopal,
Aruna Rajan,
Rajdeep Hazra Banerjee
Abstract:
"Fast Fashion" spearheads the biggest disruption in fashion, enabling resilient supply chains that respond quickly to changing fashion trends. The conventional design process in commercial manufacturing is often fed through "trends" or prevailing modes of dressing around the world that indicate sudden interest in a new form of expression, cyclic patterns, and popular modes of expression for a given time frame. In this work, we propose a fully automated system to explore, detect, and finally synthesize trends in fashion into design elements by designing representative prototypes of apparel given time series signals generated from social media feeds. Our system is envisioned to be the first step in the design of Fast Fashion, where the production cycle for clothes from design inception to manufacturing is meant to be rapid and responsive to current "trends". It also works to reduce wastage in fashion production by taking in customer feedback on sellability at the time of design generation. We also provide an interface wherein designers can play with multiple trending styles in fashion and visualize designs as interpolations of elements of these styles. We aim to aid the creative process by generating interesting and inspiring combinations for a designer to mull over by running them through her key customers.
Submitted 3 July, 2019; v1 submitted 27 June, 2019;
originally announced June 2019.
-
Simultaneous induction of SSMVEP and SMR Using a Gaiting video stimulus: a novel hybrid brain-computer interface
Authors:
Xin Zhang,
Guanghua Xu,
Aravind Ravi,
Sarah Pearce,
Ning Jiang
Abstract:
We proposed a novel visual stimulus for brain-computer interfaces. The stimulus is in the form of a gaiting sequence of a human. The hypothesis is that observing such a visual stimulus would simultaneously induce 1) steady-state motion visual evoked potential (SSMVEP) in the occipital area, similarly to an SSVEP stimulus; and 2) sensorimotor rhythm (SMR) in the primary sensorimotor area, because such action observation (AO) could activate the mirror neuron system. Canonical correlation analysis (CCA) was used to detect SSMVEP from occipital EEG, and event-related spectral perturbations (ERSP) were used to identify SMR in the EEG from the sensorimotor area. The results showed that the proposed visual gaiting stimulus induced SSMVEP, with classification accuracies of 88.9 ± 12.0% in a four-class scenario. More importantly, it induced clear and sustained event-related desynchronization/synchronization (ERD/ERS) in the EEG from the sensorimotor area, while no ERD/ERS in the sensorimotor area could be observed when the other two SSVEP stimuli were used. Further, for participants with a sufficiently clear SSMVEP pattern (classification accuracy > 85%), the ERD index values in the mu-beta band induced by the proposed gaiting stimulus were statistically different from those of the other two types of stimulus. Therefore, a novel BCI based on the proposed stimulus has potential in neurorehabilitation applications because it simultaneously has the high accuracy of an SSMVEP (~90% accuracy in a four-class setup) and the ability to activate the sensorimotor cortex. This potential will be further explored in future studies.
Submitted 30 May, 2019;
originally announced May 2019.
-
Pre-Trained Convolutional Neural Network Features for Facial Expression Recognition
Authors:
Aravind Ravi
Abstract:
Facial expression recognition has been an active area in computer vision with application areas including animation, social robots, personalized banking, etc. In this study, we explore the problem of image classification for detecting facial expressions based on features extracted from pre-trained convolutional neural networks trained on the ImageNet database. Features are extracted and transferred to a Linear Support Vector Machine for classification. All experiments are performed on two publicly available datasets, JAFFE and CK+. The results show that representations learned from pre-trained networks for a task such as object recognition can be transferred and used for facial expression recognition. Furthermore, for a small dataset, using features from earlier layers of the VGG19 network provides better classification accuracy. Accuracies of 92.26% and 92.86% were achieved for the CK+ and JAFFE datasets, respectively.
Submitted 15 December, 2018;
originally announced December 2018.
-
Improving Hospital Mortality Prediction with Medical Named Entities and Multimodal Learning
Authors:
Mengqi Jin,
Mohammad Taha Bahadori,
Aaron Colak,
Parminder Bhatia,
Busra Celikkaya,
Ram Bhakta,
Selvan Senthivel,
Mohammed Khalilia,
Daniel Navarro,
Borui Zhang,
Tiberiu Doman,
Arun Ravi,
Matthieu Liger,
Taha Kass-hout
Abstract:
Clinical text provides essential information to estimate the acuity of a patient during hospital stays in addition to structured clinical data. In this study, we explore how clinical text can complement a clinical predictive learning task. We leverage an internal medical natural language processing service to perform named entity extraction and negation detection on clinical notes and compose selected entities into a new text corpus to train document representations. We then propose a multimodal neural network to jointly train time series signals and unstructured clinical text representations to predict the in-hospital mortality risk for ICU patients. Our model outperforms the benchmark by 2% AUC.
Submitted 3 December, 2018; v1 submitted 29 November, 2018;
originally announced November 2018.
-
A Dataset and Preliminary Results for Umpire Pose Detection Using SVM Classification of Deep Features
Authors:
Aravind Ravi,
Harshwin Venugopal,
Sruthy Paul,
Hamid R. Tizhoosh
Abstract:
In recent years, there has been increased interest in video summarization and automatic sports highlights generation. In this work, we introduce a new dataset, called SNOW, for umpire pose detection in the game of cricket. The proposed dataset is evaluated as a preliminary aid for developing systems to automatically generate cricket highlights. In cricket, the umpire has the authority to make important decisions about events on the field. The umpire signals important events using unique hand signals and gestures. We identify four such events for classification, namely SIX, NO BALL, OUT and WIDE, based on detecting the pose of the umpire from the frames of a cricket video. Pre-trained convolutional neural networks such as Inception V3 and VGG19 networks are selected as primary candidates for feature extraction. The results are obtained using a linear SVM classifier. The highest classification performance was achieved for the SVM trained on features extracted from the VGG19 network. The preliminary results suggest that the proposed system is an effective solution for the application of cricket highlights generation.
Submitted 11 September, 2018;
originally announced September 2018.
-
Play Duration based User-Entity Affinity Modeling in Spoken Dialog System
Authors:
Bo Xiao,
Nicholas Monath,
Shankar Ananthakrishnan,
Abishek Ravi
Abstract:
Multimedia streaming services over spoken dialog systems have become ubiquitous. User-entity affinity modeling is critical for the system to understand and disambiguate user intents and personalize user experiences. However, fully voice-based interaction demands quantification of novel behavioral cues to determine user affinities. In this work, we propose using play duration cues to learn a matrix factorization based collaborative filtering model. We first binarize play durations to obtain implicit positive and negative affinity labels. The Bayesian Personalized Ranking objective and learning algorithm are employed in our low-rank matrix factorization approach. To cope with uncertainties in the implicit affinity labels, we propose to apply a weighting function that emphasizes the importance of high-confidence samples. Based on a large-scale database of Alexa music service records, we evaluate the affinity models by computing the Spearman correlation between play durations and predicted affinities. Comparing different data utilizations and weighting functions, we find that employing both positive and negative affinity samples with a convex weighting function yields the best performance. Further analysis demonstrates the model's effectiveness at the individual entity level and provides insights into the temporal dynamics of observed affinities.
Submitted 29 June, 2018;
originally announced June 2018.
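The abstract above combines the Bayesian Personalized Ranking (BPR) objective with a confidence-based sample weight. A minimal sketch of a weighted BPR loss for a single (user, preferred entity, less preferred entity) triple might look as follows. The squared-confidence weight is only one possible convex weighting function and is an assumption here, as are all function and parameter names; the abstract does not specify the exact form the authors use.

```python
import math


def dot(u, v):
    """Inner product of two equal-length latent factor vectors."""
    return sum(a * b for a, b in zip(u, v))


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def weighted_bpr_loss(user_vec, pos_vec, neg_vec, confidence):
    """Weighted BPR loss for one triple (hypothetical helper).

    confidence : value in [0, 1] expressing trust in the implicit label;
                 squaring it is an assumed convex weighting that
                 emphasizes high-confidence samples.
    """
    weight = confidence ** 2
    # Score margin between the preferred and the less preferred entity.
    margin = dot(user_vec, pos_vec) - dot(user_vec, neg_vec)
    return -weight * log_sigmoid(margin)
```

With this form, a triple whose label confidence is 0.5 contributes a quarter of the loss of an otherwise identical triple with confidence 1.0, so uncertain labels near the binarization threshold influence the factorization less.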