-
Atlas: A Dataset and Benchmark for E-commerce Clothing Product Categorization
Authors:
Venkatesh Umaashankar,
Girish Shanmugam S,
Aditi Prakash
Abstract:
In E-commerce, it is a common practice to organize the product catalog using product taxonomy. This enables the buyer to easily locate the item they are looking for and also to explore various items available under a category. Product taxonomy is a tree structure with 3 or more levels of depth and several leaf nodes. Product categorization is a large scale classification task that assigns a catego…
▽ More
In E-commerce, it is a common practice to organize the product catalog using product taxonomy. This enables the buyer to easily locate the item they are looking for and also to explore various items available under a category. Product taxonomy is a tree structure with 3 or more levels of depth and several leaf nodes. Product categorization is a large scale classification task that assigns a category path to a particular product. Research in this area is restricted by the unavailability of good real-world datasets and the variations in taxonomy due to the absence of a standard across the different e-commerce stores. In this paper, we introduce a high-quality product taxonomy dataset focusing on clothing products which contain 186,150 images under clothing category with 3 levels and 52 leaf nodes in the taxonomy. We explain the methodology used to collect and label this dataset. Further, we establish the benchmark by comparing image classification and Attention based Sequence models for predicting the category path. Our benchmark model reaches a micro f-score of 0.92 on the test set. The dataset, code and pre-trained models are publicly available at \url{https://github.com/vumaasha/atlas}. We invite the community to improve upon these baselines.
△ Less
Submitted 12 August, 2019;
originally announced August 2019.
-
ActiveHARNet: Towards On-Device Deep Bayesian Active Learning for Human Activity Recognition
Authors:
Gautham Krishna Gudur,
Prahalathan Sundaramoorthy,
Venkatesh Umaashankar
Abstract:
Various health-care applications such as assisted living, fall detection etc., require modeling of user behavior through Human Activity Recognition (HAR). HAR using mobile- and wearable-based deep learning algorithms have been on the rise owing to the advancements in pervasive computing. However, there are two other challenges that need to be addressed: first, the deep learning model should suppor…
▽ More
Various health-care applications such as assisted living, fall detection etc., require modeling of user behavior through Human Activity Recognition (HAR). HAR using mobile- and wearable-based deep learning algorithms have been on the rise owing to the advancements in pervasive computing. However, there are two other challenges that need to be addressed: first, the deep learning model should support on-device incremental training (model updation) from real-time incoming data points to learn user behavior over time, while also being resource-friendly; second, a suitable ground truthing technique (like Active Learning) should help establish labels on-the-fly while also selecting only the most informative data points to query from an oracle. Hence, in this paper, we propose ActiveHARNet, a resource-efficient deep ensembled model which supports on-device Incremental Learning and inference, with capabilities to represent model uncertainties through approximations in Bayesian Neural Networks using dropout. This is combined with suitable acquisition functions for active learning. Empirical results on two publicly available wrist-worn HAR and fall detection datasets indicate that ActiveHARNet achieves considerable efficiency boost during inference across different users, with a substantially low number of acquired pool points (at least 60% reduction) during incremental learning on both datasets experimented with various acquisition functions, thus demonstrating deployment and Incremental Learning feasibility.
△ Less
Submitted 31 May, 2019;
originally announced June 2019.
-
Ask less - Scale Market Research without Annoying Your Customers
Authors:
Venkatesh Umaashankar,
Girish Shanmugam S
Abstract:
Market research is generally performed by surveying a representative sample of customers with questions that includes contexts such as psycho-graphics, demographics, attitude and product preferences. Survey responses are used to segment the customers into various groups that are useful for targeted marketing and communication. Reducing the number of questions asked to the customer has utility for…
▽ More
Market research is generally performed by surveying a representative sample of customers with questions that includes contexts such as psycho-graphics, demographics, attitude and product preferences. Survey responses are used to segment the customers into various groups that are useful for targeted marketing and communication. Reducing the number of questions asked to the customer has utility for businesses to scale the market research to a large number of customers. In this work, we model this task using Bayesian networks. We demonstrate the effectiveness of our approach using an example market segmentation of broadband customers.
△ Less
Submitted 25 January, 2019;
originally announced January 2019.
-
Effectiveness of Hierarchical Softmax in Large Scale Classification Tasks
Authors:
Abdul Arfat Mohammed,
Venkatesh Umaashankar
Abstract:
Typically, Softmax is used in the final layer of a neural network to get a probability distribution for output classes. But the main problem with Softmax is that it is computationally expensive for large scale data sets with large number of possible outputs. To approximate class probability efficiently on such large scale data sets we can use Hierarchical Softmax. LSHTC datasets were used to study…
▽ More
Typically, Softmax is used in the final layer of a neural network to get a probability distribution for output classes. But the main problem with Softmax is that it is computationally expensive for large scale data sets with large number of possible outputs. To approximate class probability efficiently on such large scale data sets we can use Hierarchical Softmax. LSHTC datasets were used to study the performance of the Hierarchical Softmax. LSHTC datasets have large number of categories. In this paper we evaluate and report the performance of normal Softmax Vs Hierarchical Softmax on LSHTC datasets. This evaluation used macro f1 score as a performance measure. The observation was that the performance of Hierarchical Softmax degrades as the number of classes increase.
△ Less
Submitted 13 December, 2018;
originally announced December 2018.