-
Real-time Recognition of Yoga Poses using computer Vision for Smart Health Care
Authors:
Abhishek Sharma,
Yash Shah,
Yash Agrawal,
Prateek Jain
Abstract:
Nowadays, yoga has become a part of life for many people. Exercises and sports technological assistance is implemented in yoga pose identification. In this work, a self-assistance based yoga posture identification technique is developed, which helps users to perform Yoga with the correction feature in Real-time. The work also presents Yoga-hand mudra (hand gestures) identification. The YOGI datase…
▽ More
Nowadays, yoga has become a part of life for many people. Exercises and sports technological assistance is implemented in yoga pose identification. In this work, a self-assistance based yoga posture identification technique is developed, which helps users to perform Yoga with the correction feature in Real-time. The work also presents Yoga-hand mudra (hand gestures) identification. The YOGI dataset has been developed which include 10 Yoga postures with around 400-900 images of each pose and also contain 5 mudras for identification of mudras postures. It contains around 500 images of each mudra. The feature has been extracted by making a skeleton on the body for yoga poses and hand for mudra poses. Two different algorithms have been used for creating a skeleton one for yoga poses and the second for hand mudras. Angles of the joints have been extracted as a features for different machine learning and deep learning models. among all the models XGBoost with RandomSearch CV is most accurate and gives 99.2\% accuracy. The complete design framework is described in the present paper.
△ Less
Submitted 19 January, 2022;
originally announced January 2022.
-
Multimodal Fusion Based Attentive Networks for Sequential Music Recommendation
Authors:
Kunal Vaswani,
Yudhik Agrawal,
Vinoo Alluri
Abstract:
Music has the power to evoke intense emotional experiences and regulate the mood of an individual. With the advent of online streaming services, research in music recommendation services has seen tremendous progress. Modern methods leveraging the listening histories of users for session-based song recommendations have overlooked the significance of features extracted from lyrics and acoustic conte…
▽ More
Music has the power to evoke intense emotional experiences and regulate the mood of an individual. With the advent of online streaming services, research in music recommendation services has seen tremendous progress. Modern methods leveraging the listening histories of users for session-based song recommendations have overlooked the significance of features extracted from lyrics and acoustic content. We address the task of song prediction through multiple modalities, including tags, lyrics, and acoustic content. In this paper, we propose a novel deep learning approach by refining Attentive Neural Networks using representations derived via a Transformer model for lyrics and Variational Autoencoder for acoustic features. Our model achieves significant improvement in performance over existing state-of-the-art models using lyrical and acoustic features alone. Furthermore, we conduct a study to investigate the impact of users' psychological health on our model's performance.
△ Less
Submitted 3 October, 2021;
originally announced October 2021.
-
Transformer-based approach towards music emotion recognition from lyrics
Authors:
Yudhik Agrawal,
Ramaguru Guru Ravi Shanker,
Vinoo Alluri
Abstract:
The task of identifying emotions from a given music track has been an active pursuit in the Music Information Retrieval (MIR) community for years. Music emotion recognition has typically relied on acoustic features, social tags, and other metadata to identify and classify music emotions. The role of lyrics in music emotion recognition remains under-appreciated in spite of several studies reporting…
▽ More
The task of identifying emotions from a given music track has been an active pursuit in the Music Information Retrieval (MIR) community for years. Music emotion recognition has typically relied on acoustic features, social tags, and other metadata to identify and classify music emotions. The role of lyrics in music emotion recognition remains under-appreciated in spite of several studies reporting superior performance of music emotion classifiers based on features extracted from lyrics. In this study, we use the transformer-based approach model using XLNet as the base architecture which, till date, has not been used to identify emotional connotations of music based on lyrics. Our proposed approach outperforms existing methods for multiple datasets. We used a robust methodology to enhance web-crawlers' accuracy for extracting lyrics. This study has important implications in improving applications involved in playlist generation of music based on emotions in addition to improving music recommendation systems.
△ Less
Submitted 6 January, 2021;
originally announced January 2021.
-
GlocalNet: Class-aware Long-term Human Motion Synthesis
Authors:
Neeraj Battan,
Yudhik Agrawal,
Veeravalli Saisooryarao,
Aman Goel,
Avinash Sharma
Abstract:
Synthesis of long-term human motion skeleton sequences is essential to aid human-centric video generation with potential applications in Augmented Reality, 3D character animations, pedestrian trajectory prediction, etc. Long-term human motion synthesis is a challenging task due to multiple factors like, long-term temporal dependencies among poses, cyclic repetition across poses, bi-directional and…
▽ More
Synthesis of long-term human motion skeleton sequences is essential to aid human-centric video generation with potential applications in Augmented Reality, 3D character animations, pedestrian trajectory prediction, etc. Long-term human motion synthesis is a challenging task due to multiple factors like, long-term temporal dependencies among poses, cyclic repetition across poses, bi-directional and multi-scale dependencies among poses, variable speed of actions, and a large as well as partially overlapping space of temporal pose variations across multiple class/types of human activities. This paper aims to address these challenges to synthesize a long-term (> 6000 ms) human motion trajectory across a large variety of human activity classes (>50). We propose a two-stage activity generation method to achieve this goal, where the first stage deals with learning the long-term global pose dependencies in activity sequences by learning to synthesize a sparse motion trajectory while the second stage addresses the generation of dense motion trajectories taking the output of the first stage. We demonstrate the superiority of the proposed method over SOTA methods using various quantitative evaluation metrics on publicly available datasets.
△ Less
Submitted 19 December, 2020;
originally announced December 2020.
-
Towards Multimodal MIR: Predicting individual differences from music-induced movement
Authors:
Yudhik Agrawal,
Samyak Jain,
Emily Carlson,
Petri Toiviainen,
Vinoo Alluri
Abstract:
As the field of Music Information Retrieval grows, it is important to take into consideration the multi-modality of music and how aspects of musical engagement such as movement and gesture might be taken into account. Bodily movement is universally associated with music and reflective of important individual features related to music preference such as personality, mood, and empathy. Future multim…
▽ More
As the field of Music Information Retrieval grows, it is important to take into consideration the multi-modality of music and how aspects of musical engagement such as movement and gesture might be taken into account. Bodily movement is universally associated with music and reflective of important individual features related to music preference such as personality, mood, and empathy. Future multimodal MIR systems may benefit from taking these aspects into account. The current study addresses this by identifying individual differences, specifically Big Five personality traits, and scores on the Empathy and Systemizing Quotients (EQ/SQ) from participants' free dance movements. Our model successfully explored the unseen space for personality as well as EQ, SQ, which has not previously been accomplished for the latter. R2 scores for personality, EQ, and SQ were 76.3%, 77.1%, and 86.7% respectively. As a follow-up, we investigated which bodily joints were most important in defining these traits. We discuss how further research may explore how the mapping of these traits to movement patterns can be used to build a more personalized, multi-modal recommendation system, as well as potential therapeutic applications.
△ Less
Submitted 21 July, 2020;
originally announced July 2020.
-
A Machine Learning Application for Raising WASH Awareness in the Times of COVID-19 Pandemic
Authors:
Rohan Pandey,
Vaibhav Gautam,
Ridam Pal,
Harsh Bandhey,
Lovedeep Singh Dhingra,
Himanshu Sharma,
Chirag Jain,
Kanav Bhagat,
Arushi,
Lajjaben Patel,
Mudit Agarwal,
Samprati Agrawal,
Rishabh Jalan,
Akshat Wadhwa,
Ayush Garg,
Vihaan Misra,
Yashwin Agrawal,
Bhavika Rana,
Ponnurangam Kumaraguru,
Tavpritesh Sethi
Abstract:
Background: The COVID-19 pandemic has uncovered the potential of digital misinformation in shaping the health of nations. The deluge of unverified information that spreads faster than the epidemic itself is an unprecedented phenomenon that has put millions of lives in danger. Mitigating this Infodemic requires strong health messaging systems that are engaging, vernacular, scalable, effective and c…
▽ More
Background: The COVID-19 pandemic has uncovered the potential of digital misinformation in shaping the health of nations. The deluge of unverified information that spreads faster than the epidemic itself is an unprecedented phenomenon that has put millions of lives in danger. Mitigating this Infodemic requires strong health messaging systems that are engaging, vernacular, scalable, effective and continuously learn the new patterns of misinformation.
Objective: We created WashKaro, a multi-pronged intervention for mitigating misinformation through conversational AI, machine translation and natural language processing. WashKaro provides the right information matched against WHO guidelines through AI, and delivers it in the right format in local languages.
Methods: We theorize (i) an NLP based AI engine that could continuously incorporate user feedback to improve relevance of information, (ii) bite sized audio in the local language to improve penetrance in a country with skewed gender literacy ratios, and (iii) conversational but interactive AI engagement with users towards an increased health awareness in the community. Results: A total of 5026 people who downloaded the app during the study window, among those 1545 were active users. Our study shows that 3.4 times more females engaged with the App in Hindi as compared to males, the relevance of AI-filtered news content doubled within 45 days of continuous machine learning, and the prudence of integrated AI chatbot Satya increased thus proving the usefulness of an mHealth platform to mitigate health misinformation.
Conclusion: We conclude that a multi-pronged machine learning application delivering vernacular bite-sized audios and conversational AI is an effective approach to mitigate health misinformation.
△ Less
Submitted 30 October, 2020; v1 submitted 16 March, 2020;
originally announced March 2020.
-
HumanMeshNet: Polygonal Mesh Recovery of Humans
Authors:
Abbhinav Venkat,
Chaitanya Patel,
Yudhik Agrawal,
Avinash Sharma
Abstract:
3D Human Body Reconstruction from a monocular image is an important problem in computer vision with applications in virtual and augmented reality platforms, animation industry, en-commerce domain, etc. While several of the existing works formulate it as a volumetric or parametric learning with complex and indirect reliance on re-projections of the mesh, we would like to focus on implicitly learnin…
▽ More
3D Human Body Reconstruction from a monocular image is an important problem in computer vision with applications in virtual and augmented reality platforms, animation industry, en-commerce domain, etc. While several of the existing works formulate it as a volumetric or parametric learning with complex and indirect reliance on re-projections of the mesh, we would like to focus on implicitly learning the mesh representation. To that end, we propose a novel model, HumanMeshNet, that regresses a template mesh's vertices, as well as receives a regularization by the 3D skeletal locations in a multi-branch, multi-task setup. The image to mesh vertex regression is further regularized by the neighborhood constraint imposed by mesh topology ensuring smooth surface reconstruction. The proposed paradigm can theoretically learn local surface deformations induced by body shape variations and can therefore learn high-resolution meshes going ahead. We show comparable performance with SoA (in terms of surface and joint error) with far lesser computational complexity, modeling cost and therefore real-time reconstructions on three publicly available datasets. We also show the generalizability of the proposed paradigm for a similar task of predicting hand mesh models. Given these initial results, we would like to exploit the mesh topology in an explicit manner going ahead.
△ Less
Submitted 18 August, 2019;
originally announced August 2019.
-
Online Self-supervised Scene Segmentation for Micro Aerial Vehicles
Authors:
Shreyansh Daftry,
Yashasvi Agrawal,
Larry Matthies
Abstract:
Recently, there have been numerous advances in the development of payload and power constrained lightweight Micro Aerial Vehicles (MAVs). As these robots aspire for high-speed autonomous flights in complex dynamic environments, robust scene understanding at long-range becomes critical. The problem is heavily characterized by either the limitations imposed by sensor capabilities for geometry-based…
▽ More
Recently, there have been numerous advances in the development of payload and power constrained lightweight Micro Aerial Vehicles (MAVs). As these robots aspire for high-speed autonomous flights in complex dynamic environments, robust scene understanding at long-range becomes critical. The problem is heavily characterized by either the limitations imposed by sensor capabilities for geometry-based methods, or the need for large-amounts of manually annotated training data required by data-driven methods. This motivates the need to build systems that have the capability to alleviate these problems by exploiting the complimentary strengths of both geometry and data-driven methods. In this paper, we take a step in this direction and propose a generic framework for adaptive scene segmentation using self-supervised online learning. We present this in the context of vision-based autonomous MAV flight, and demonstrate the efficacy of our proposed system through extensive experiments on benchmark datasets and real-world field tests.
△ Less
Submitted 13 June, 2018;
originally announced June 2018.