-
On the Predictive Capability of Dynamic Mode Decomposition for Nonlinear Periodic Systems with Focus on Orbital Mechanics
Authors:
Sriram Narayanan,
Mohamed Naveed Gul Mohamed,
Indranil Nayak,
Suman Chakravorty,
Mrinal Kumar
Abstract:
This paper discusses the predictive capability of Dynamic Mode Decomposition (DMD) in the context of orbital mechanics. The focus is specifically on the Hankel variant of DMD which uses a stacked set of time-delayed observations for system identification and subsequent prediction. A theory on the minimum number of time delays required for accurate reconstruction of periodic trajectories of nonlinear systems is presented and corroborated using experimental analysis. In addition, the window size for training and prediction regions, respectively, is presented. The need for a meticulous approach while using DMD is emphasized by drawing comparisons between its performance on two candidate satellites, the ISS and MOLNIYA-3-50.
Submitted 24 January, 2024;
originally announced January 2024.
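As a rough illustration of the Hankel variant of DMD mentioned in the abstract above, the sketch below stacks time-delayed copies of a scalar observation, fits a linear one-step map by least squares, and rolls it forward. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `hankel_matrix` and `hankel_dmd_predict`, the test signal, and the choice of two delays are all hypothetical. Two delays suffice here only because a single sinusoid obeys a two-term linear recurrence; the paper's own result on the minimum number of delays is not reproduced.

```python
import numpy as np

def hankel_matrix(series, num_delays):
    """Stack num_delays time-shifted copies of a 1-D series as rows."""
    n_cols = len(series) - num_delays + 1
    return np.stack([series[i:i + n_cols] for i in range(num_delays)])

def hankel_dmd_predict(series, num_delays, num_steps):
    """Fit a linear map on delay-embedded data and predict num_steps ahead."""
    H = hankel_matrix(np.asarray(series, dtype=float), num_delays)
    X, Y = H[:, :-1], H[:, 1:]          # consecutive delay-embedded snapshots
    A = Y @ np.linalg.pinv(X)           # least-squares one-step operator
    state = H[:, -1]                    # last known delay-embedded state
    preds = []
    for _ in range(num_steps):
        state = A @ state
        preds.append(state[-1])         # newest entry is the next sample
    return np.array(preds)

# Hypothetical test signal: one sinusoid sampled at dt = 0.1.
dt = 0.1
t = np.arange(0, 200) * dt
signal = np.sin(2.0 * t)
train, test = signal[:100], signal[100:120]

pred = hankel_dmd_predict(train, num_delays=2, num_steps=20)
max_err = np.max(np.abs(pred - test))
print("max prediction error:", max_err)
```

For signals with more frequency content, `num_delays` would need to grow accordingly; the training window length (100 samples here) is likewise an arbitrary choice for the example.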
-
Information Geometry for the Working Information Theorist
Authors:
Kumar Vijay Mishra,
M. Ashok Kumar,
Ting-Kam Leonard Wong
Abstract:
Information geometry is a study of statistical manifolds, that is, spaces of probability distributions from a geometric perspective. Its classical information-theoretic applications relate to statistical concepts such as Fisher information, sufficient statistics, and efficient estimators. Today, information geometry has emerged as an interdisciplinary field that finds applications in diverse areas such as radar sensing, array signal processing, quantum physics, deep learning, and optimal transport. This article presents an overview of essential information geometry to initiate an information theorist, who may be unfamiliar with this exciting area of research. We explain the concepts of divergences on statistical manifolds, generalized notions of distances, orthogonality, and geodesics, thereby paving the way for concrete applications and novel theoretical investigations. We also highlight some recent information-geometric developments, which are of interest to the broader information theory community.
Submitted 5 October, 2023;
originally announced October 2023.
-
A framework of windowed octonion linear canonical transform
Authors:
Manish Kumar,
Bhawna
Abstract:
The uncertainty principle is a fundamental principle in theoretical physics, such as quantum mechanics and classical mechanics. It plays a prime role in signal processing, including optics, where a signal is to be analyzed simultaneously in both domains; for instance, in harmonic analysis, both time and frequency domains, and in quantum mechanics, both time and momentum. On the other hand, many mathematicians, physicists, and other related domain researchers have paid more attention to the octonion-related integral transforms in recent years. In this paper, we define important properties of the windowed octonion linear canonical transform (WOCLCT), such as inversion, linearity, parity, shifting, and the relationship between OCLCT and WOCLCT. Further, we derived sharp Pitt's and sharp Young-Hausdorff inequalities for 3D WOCLCT. We obtain the logarithmic uncertainty principle for the 3D WOCLCT. Furthermore, Heisenberg's and Donoho-Stark's uncertainty principles are derived for WOCLCT, and the potential applications of WOCLCT are also discussed.
Submitted 8 June, 2023;
originally announced June 2023.
-
Non-Linear Signal Processing methods for UAV detections from a Multi-function X-band Radar
Authors:
Mohit Kumar,
Keith Kelly
Abstract:
This article develops the applicability of non-linear processing techniques such as Compressed Sensing (CS), Principal Component Analysis (PCA), Iterative Adaptive Approach (IAA) and Multiple-input-multiple-output (MIMO) for the purpose of enhanced UAV detections using portable radar systems. The combined scheme has many advantages and the potential for better detection and classification accuracy. Some of the benefits are discussed here with a phased array platform in mind, the novel portable phased array Radar (PWR) by Agile RF Systems (ARS), which offers quadrant outputs. CS and IAA both show promising results when applied to micro-Doppler processing of radar returns owing to the sparse nature of the target Doppler frequencies. This shows promise in reducing the dwell time and increasing the rate at which a volume can be interrogated. Real-time processing of target information with iterative and non-linear solutions is now possible with the advent of GPU-based graphics processing hardware. Simulations show promising results.
Submitted 12 March, 2023;
originally announced March 2023.
-
DeepCPG Policies for Robot Locomotion
Authors:
Aditya M. Deshpande,
Eric Hurd,
Ali A. Minai,
Manish Kumar
Abstract:
Central Pattern Generators (CPGs) form the neural basis of the observed rhythmic behaviors for locomotion in legged animals. The CPG dynamics organized into networks allow the emergence of complex locomotor behaviors. In this work, we take this inspiration for developing walking behaviors in multi-legged robots. We present novel DeepCPG policies that embed CPGs as a layer in a larger neural network and facilitate end-to-end learning of locomotion behaviors in deep reinforcement learning (DRL) setup. We demonstrate the effectiveness of this approach on physics engine-based insectoid robots. We show that, compared to traditional approaches, DeepCPG policies allow sample-efficient end-to-end learning of effective locomotion strategies even in the case of high-dimensional sensor spaces (vision). We scale the DeepCPG policies using a modular robot configuration and multi-agent DRL. Our results suggest that gradual complexification with embedded priors of these policies in a modular fashion could achieve non-trivial sensor and motor integration on a robot platform. These results also indicate the efficacy of bootstrapping more complex intelligent systems from simpler ones based on biological principles. Finally, we present the experimental results for a proof-of-concept insectoid robot system for which DeepCPG learned policies initially using the simulation engine and these were afterwards transferred to real-world robots without any additional fine-tuning.
Submitted 25 February, 2023;
originally announced February 2023.
-
Design of generalized fuzzy multiple deferred state (GFMDS) sampling plan for attributes
Authors:
Julia Thampy Thomas,
Mahesh Kumar
Abstract:
A sampling plan is a pilot tool for a supply and demand chain quality check strategy. These plans proved to be economically viable for the quality inspection processes but the uncertainty in the plan parameters challenged the reliability of the application of traditional acceptance sampling plans. This study proposes a generalized fuzzy multiple deferred state (GFMDS) sampling plan for attributes that considers the ambiguity in determining the exact value of the percentage of defectives in a batch. The performance measures have been derived and the plan is designed in terms of a minimum average sample number. A comparison study is done over the existing fuzzy acceptance sampling plans for attributes and a pertinent observation is made regarding the efficiency of the GFMDS scheme. The effect of inspection errors on the sampling procedure is analyzed and the drop in the acceptance criteria of the plan is observed corresponding to the intensified inspection errors. Several numerical examples are presented to validate the results.
Submitted 3 February, 2023;
originally announced February 2023.
-
Modelling Controllers for Cyber Physical Systems Using Neural Networks
Authors:
Aravindakumar Vijayasri Mohan Kumar
Abstract:
Model Predictive Controllers (MPC) are widely used for controlling cyber-physical systems. It is an iterative process of optimizing the prediction of the future states of a robot over a fixed time horizon. MPCs are effective in practice, but because they are computationally expensive and slow, they are not well suited for use in real-time applications. Overcoming the flaw can be accomplished by approximating an MPC's functionality. Neural networks are very good function approximators and are faster compared to an MPC. It can be challenging to apply neural networks to control-based applications since the data does not match the i.i.d assumption. This study investigates various imitation learning methods for using a neural network in a control-based environment and evaluates their benefits and shortcomings.
Submitted 21 December, 2022;
originally announced December 2022.
-
Hierarchical Diffusion Models for Singing Voice Neural Vocoder
Authors:
Naoya Takahashi,
Mayank Kumar Singh,
Yuki Mitsufuji
Abstract:
Recent progress in deep generative models has improved the quality of neural vocoders in speech domain. However, generating a high-quality singing voice remains challenging due to a wider variety of musical expressions in pitch, loudness, and pronunciations. In this work, we propose a hierarchical diffusion model for singing voice neural vocoders. The proposed method consists of multiple diffusion models operating in different sampling rates; the model at the lowest sampling rate focuses on generating accurate low-frequency components such as pitch, and other models progressively generate the waveform at higher sampling rates on the basis of the data at the lower sampling rate and acoustic features. Experimental results show that the proposed method produces high-quality singing voices for multiple singers, outperforming state-of-the-art neural vocoders with a similar range of computational costs.
Submitted 17 October, 2022; v1 submitted 14 October, 2022;
originally announced October 2022.
-
Efficacy of Asynchronous GPS Spoofing Against High Volume Consumer GNSS Receivers
Authors:
M. Surendra Kumar,
Gaurav S. Kasbekar,
Arnab Maity
Abstract:
The vulnerability of the Global Positioning System (GPS) to spoofing has been known for quite some time. Also, the positioning and navigation of most semi-autonomous and autonomous drones are dependent on Global Navigation Satellite System (GNSS) signals. In prior work, simplistic or asynchronous GPS spoofing was found to be a simple, efficient, and effective cyber attack against L1 GPS or GNSS dependent commercial drones. In this paper, we first make some important observations on asynchronous GPS spoofing attacks on drones presented in prior research literature. Then, we design an asynchronous GPS spoofing attack plan. Next, we test the effectiveness of this attack against GNSS receivers (high volume consumer devices based on Android mobile phones) of different capabilities and a commercial drone (DJI Mavic 2 Pro) under various conditions. Finally, we present several novel insights based on the results of the tests.
Submitted 18 June, 2022;
originally announced June 2022.
-
Semi-supervised transfer learning for language expansion of end-to-end speech recognition models to low-resource languages
Authors:
Jiyeon Kim,
Mehul Kumar,
Dhananjaya Gowda,
Abhinav Garg,
Chanwoo Kim
Abstract:
In this paper, we propose a three-stage training methodology to improve the speech recognition accuracy of low-resource languages. We explore and propose an effective combination of techniques such as transfer learning, encoder freezing, data augmentation using Text-To-Speech (TTS), and Semi-Supervised Learning (SSL). To improve the accuracy of a low-resource Italian ASR, we leverage a well-trained English model, unlabeled text corpus, and unlabeled audio corpus using transfer learning, TTS augmentation, and SSL respectively. In the first stage, we use transfer learning from a well-trained English model. This primarily helps in learning the acoustic information from a resource-rich language. This stage achieves around 24% relative Word Error Rate (WER) reduction over the baseline. In stage two, we utilize unlabeled text data via TTS data augmentation to incorporate language information into the model. We also explore freezing the acoustic encoder at this stage. TTS data augmentation helps us further reduce the WER by ~21% relative. Finally, in stage three, we reduce the WER by another 4% relative by using SSL from unlabeled audio data. Overall, our two-pass speech recognition system with a Monotonic Chunkwise Attention (MoChA) in the first pass and a full-attention in the second pass achieves a WER reduction of ~42% relative to the baseline.
Submitted 19 November, 2021;
originally announced November 2021.
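The three relative WER reductions quoted in the abstract above (about 24%, 21%, and 4%) compound multiplicatively rather than add, which is how they arrive at the stated overall figure of roughly 42%. The short check below works through that arithmetic. It assumes each stage's reduction is measured relative to the previous stage's WER, which is a reading of the abstract rather than something it states explicitly; the variable names are illustrative.

```python
# Relative WER reductions per stage, as quoted in the abstract (approximate).
stage_reductions = [0.24, 0.21, 0.04]

# Fraction of the baseline WER that remains after each stage is applied in turn.
remaining = 1.0
for r in stage_reductions:
    remaining *= (1.0 - r)

overall_reduction = 1.0 - remaining
print(f"remaining fraction of baseline WER: {remaining:.4f}")
print(f"overall relative reduction: {overall_reduction:.1%}")
```

Simply summing the three figures would give 49%, which overstates the gain; the compounded value is consistent with the ~42% reported.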
-
A comparison of streaming models and data augmentation methods for robust speech recognition
Authors:
Jiyeon Kim,
Mehul Kumar,
Dhananjaya Gowda,
Abhinav Garg,
Chanwoo Kim
Abstract:
In this paper, we present a comparative study on the robustness of two different online streaming speech recognition models: Monotonic Chunkwise Attention (MoChA) and Recurrent Neural Network-Transducer (RNN-T). We explore three recently proposed data augmentation techniques, namely, multi-conditioned training using an acoustic simulator, Vocal Tract Length Perturbation (VTLP) for speaker variability, and SpecAugment. Experimental results show that unidirectional models are in general more sensitive to noisy examples in the training set. It is observed that the final performance of the model depends on the proportion of training examples processed by data augmentation techniques. MoChA models generally perform better than RNN-T models. However, we observe that training of MoChA models seems to be more sensitive to various factors such as the characteristics of training sets and the incorporation of additional augmentation techniques. On the other hand, RNN-T models perform better than MoChA models in terms of latency, inference time, and the stability of training. Additionally, RNN-T models are generally more robust against noise and reverberation. All these advantages make RNN-T models a better choice for streaming on-device speech recognition compared to MoChA models.
Submitted 18 November, 2021;
originally announced November 2021.
-
Robust Deep Reinforcement Learning for Quadcopter Control
Authors:
Aditya M. Deshpande,
Ali A. Minai,
Manish Kumar
Abstract:
Deep reinforcement learning (RL) has made it possible to solve complex robotics problems using neural networks as function approximators. However, the policies trained on stationary environments suffer in terms of generalization when transferred from one environment to another. In this work, we use Robust Markov Decision Processes (RMDP) to train the drone control policy, which combines ideas from Robust Control and RL. It opts for pessimistic optimization to handle potential gaps between policy transfer from one environment to another. The trained control policy is tested on the task of quadcopter positional control. RL agents were trained in a MuJoCo simulator. During testing, different environment parameters (unseen during the training) were used to validate the robustness of the trained policy for transfer from one environment to another. The robust policy outperformed the standard agents in these environments, suggesting that the added robustness increases generality and can adapt to non-stationary environments.
Codes: https://github.com/adipandas/gym_multirotor
Submitted 6 November, 2021;
originally announced November 2021.
-
Medical Image Segmentation with 3D Convolutional Neural Networks: A Survey
Authors:
S Niyas,
S J Pawan,
M Anand Kumar,
Jeny Rajan
Abstract:
Computer-aided medical image analysis plays a significant role in assisting medical practitioners for expert clinical diagnosis and deciding the optimal treatment plan. At present, convolutional neural networks (CNN) are the preferred choice for medical image analysis. In addition, with the rapid advancements in three-dimensional (3D) imaging systems and the availability of excellent hardware and software support to process large volumes of data, 3D deep learning methods are gaining popularity in medical image analysis. Here, we present an extensive review of the recently evolved 3D deep learning methods in medical image segmentation. Furthermore, the research gaps and future directions in 3D medical image segmentation are discussed.
Submitted 28 April, 2022; v1 submitted 18 August, 2021;
originally announced August 2021.
-
Few-shot calibration of low-cost air pollution (PM2.5) sensors using meta-learning
Authors:
Kalpit Yadav,
Vipul Arora,
Sonu Kumar Jha,
Mohit Kumar,
Sachchida Nand Tripathi
Abstract:
Low-cost particulate matter sensors are transforming air quality monitoring because they have lower costs and greater mobility as compared to reference monitors. Calibration of these low-cost sensors requires training data from co-deployed reference monitors. Machine Learning based calibration gives better performance than conventional techniques, but requires a large amount of training data from the sensor to be calibrated, co-deployed with a reference monitor. In this work, we propose novel transfer learning methods for quick calibration of sensors with minimal co-deployment with reference monitors. Transfer learning utilizes a large amount of data from other sensors along with a limited amount of data from the target sensor. Our extensive experimentation finds the proposed Model-Agnostic Meta-Learning (MAML) based transfer learning method to be the most effective, outperforming other competitive baselines.
Submitted 2 August, 2021;
originally announced August 2021.
-
Fragility curves for power transmission towers in Odisha, India, based on observed damage during 2019 Cyclone Fani
Authors:
Surender V Raj,
Manish Kumar,
Udit Bhatia
Abstract:
Lifeline infrastructure systems such as a power transmission network in coastal regions are vulnerable to strong winds generated during tropical cyclones. Understanding the fragility of individual towers is helpful in improving the resilience of such systems. Fragility curves have been developed in the past for some regions, but without considering relevant epistemic uncertainties. Further, risk and resilience studies are best performed using the fragility curves specific to a region. Such studies become particularly important if the region is exposed to cyclones rather frequently. This paper presents the development of fragility curves for high-voltage power transmission towers in the state of Odisha, India, based on macro-level damage data from 2019 cyclone Fani, which was obtained through concerned government offices. Two types of damages were identified, namely, collapse and partial damage. Accordingly, fragility curves for collapse and functionality disruption damage states were developed considering relevant aleatory and epistemic uncertainties. The latter class of uncertainties included that associated with wind speed estimation at a location and the finite sample uncertainty. The most significant contribution in the epistemic uncertainty was due to the wind speed estimation at a location. The median and logarithmic standard deviation for the 50th percentile fragility curve associated with collapse was close to that for the functionality disruption damage state. These curves also compared reasonably well with those reported for similar structures in other parts of the world.
Submitted 26 June, 2021;
originally announced July 2021.
-
Dual Script E2E framework for Multilingual and Code-Switching ASR
Authors:
Mari Ganesh Kumar,
Jom Kuriakose,
Anand Thyagachandran,
Arun Kumar A,
Ashish Seth,
Lodagala Durga Prasad,
Saish Jaiswal,
Anusha Prakash,
Hema Murthy
Abstract:
India is home to multiple languages, and training automatic speech recognition (ASR) systems for languages is challenging. Over time, each language has adopted words from other languages, such as English, leading to code-mixing. Most Indian languages also have their own unique scripts, which poses a major limitation in training multilingual and code-switching ASR systems.
Inspired by results in text-to-speech synthesis, in this work, we use an in-house rule-based phoneme-level common label set (CLS) representation to train multilingual and code-switching ASR for Indian languages. We propose two end-to-end (E2E) ASR systems. In the first system, the E2E model is trained on the CLS representation, and we use a novel data-driven back-end to recover the native language script. In the second system, we propose a modification to the E2E model, wherein the CLS representation and the native language characters are used simultaneously for training. We show our results on the multilingual and code-switching tasks of the Indic ASR Challenge 2021. Our best results achieve 6% and 5% improvement (approx) in word error rate over the baseline system for the multilingual and code-switching tasks, respectively, on the challenge development data.
Submitted 2 June, 2021;
originally announced June 2021.
-
Reduction in Circulating Current with Improved Secondary Side Modulation in Isolated Current-Fed Half Bridge AC-DC Converter
Authors:
Manish Kumar,
Sumit Pramanick,
B K Panigrahi
Abstract:
Current-fed half bridge converter with bidirectional switches on ac side and a full bridge converter on dc side of a high frequency transformer is an optimal topology for single stage galvanically isolated ac-dc converter for onboard vehicle charging application. AC side switches are actively commutated to achieve zero current switching (ZCS) using single phase shift modulation (SPSM) and discontinuous current phase shift modulation (DCPSM). Furthermore, zero voltage turn-on (ZVS) is achieved for dc side switches. Compared to SPSM, DCPSM maintains a constant peak current in the converter throughout the grid cycle of ac mains voltage. However, constant peak current contributes to a high circulating current near the zero crossings of ac mains voltage and also at light load conditions. This paper proposes an improved discontinuous current phase shift modulation (IDCPSM) to increase the efficiency of the converter across different loading conditions. A dual control variable is adopted to actively reduce the circulating current while maintaining soft switching of both ac and dc side switches across the grid cycle of ac mains voltage. A 1.5 kW laboratory prototype has been developed to experimentally validate the analysis, design and improvement in performance for different loading conditions.
Submitted 22 May, 2021;
originally announced May 2021.
-
A MIMO approach for Weather Radars
Authors:
Mohit Kumar,
V Chandrasekar,
P Keith Kelly
Abstract:
This article develops the multiple-input multiple-output (MIMO) technology for weather radar sensing. There are ample advantages of MIMO that have been highlighted that can improve the spatial resolution of the observations and also the accuracy of the radar variables. These concepts have been introduced here pertaining to weather radar observations with supporting simulations demonstrating improvements to existing phased array technology. Already MIMO is being used in a big way for hard target detection and tracking and also in the automotive radar industry and it offers similar improvements for weather radar observations. Some of the benefits are discussed here with a phased array platform in mind which offers quadrant outputs.
Submitted 2 April, 2021;
originally announced April 2021.
-
Integrated Framework of Vehicle Dynamics, Instabilities, Energy Models, and Sparse Flow Smoothing Controllers
Authors:
Jonathan W. Lee,
George Gunter,
Rabie Ramadan,
Sulaiman Almatrudi,
Paige Arnold,
John Aquino,
William Barbour,
Rahul Bhadani,
Joy Carpio,
Fang-Chieh Chou,
Marsalis Gibson,
Xiaoqian Gong,
Amaury Hayat,
Nour Khoudari,
Abdul Rahman Kreidieh,
Maya Kumar,
Nathan Lichtlé,
Sean McQuade,
Brian Nguyen,
Megan Ross,
Sydney Truong,
Eugene Vinitsky,
Yibo Zhao,
Jonathan Sprinkle,
Benedetto Piccoli
, et al. (3 additional authors not shown)
Abstract:
This work presents an integrated framework of: vehicle dynamics models, with a particular attention to instabilities and traffic waves; vehicle energy models, with particular attention to accurate energy values for strongly unsteady driving profiles; and sparse Lagrangian controls via automated vehicles, with a focus on controls that can be executed via existing technology such as adaptive cruise control systems. This framework serves as a key building block in developing control strategies for human-in-the-loop traffic flow smoothing on real highways. In this contribution, we outline the fundamental merits of integrating vehicle dynamics and energy modeling into a single framework, and we demonstrate the energy impact of sparse flow smoothing controllers via simulation results.
Submitted 22 April, 2021;
originally announced April 2021.
-
Information Geometry and Classical Cramér-Rao Type Inequalities
Authors:
Kumar Vijay Mishra,
M. Ashok Kumar
Abstract:
We examine the role of information geometry in the context of classical Cramér-Rao (CR) type inequalities. In particular, we focus on Eguchi's theory of obtaining dualistic geometric structures from a divergence function and then applying Amari-Nagaoka's theory to obtain a CR type inequality. The classical deterministic CR inequality is derived from the Kullback-Leibler (KL) divergence. We show that this framework can be generalized to other CR type inequalities through four examples: the $α$-version of the CR inequality, the generalized CR inequality, the Bayesian CR inequality, and the Bayesian $α$-CR inequality. These are obtained from, respectively, the $I_α$-divergence (or relative $α$-entropy), the generalized Csiszár divergence, the Bayesian KL divergence, and the Bayesian $I_α$-divergence.
Submitted 21 August, 2021; v1 submitted 2 April, 2021;
originally announced April 2021.
-
A Comprehensive Survey on Real-Time Voltage Stability Assessment for Power Systems
Authors:
Gourav Wadhwa,
Amandeep Kharb,
Satyam Mishra,
Mohit Kumar,
Shreyansh Srivastav
Abstract:
Accurate real-time assessment of power system voltage stability has been an active area of research over the past few decades. In the past decade, following the development of phasor measurement units (PMUs), phasor measurement techniques for real-time voltage stability assessment have been widely discussed. The fundamental idea behind these methods is to find the Thevenin equivalent of the system and then determine the voltage stability margin from the equivalent circuit. Some approaches also use Artificial Neural Networks (ANNs) for online monitoring of voltage stability margins. These methods are fast compared to the other methods. It has been shown that if the phase angles and voltage magnitudes can be obtained in real time from the PMUs, then the voltage stability margins can be obtained in real time and voltage stability control methods can be initiated. We discuss Thevenin equivalent methods and artificial intelligence methods in detail in this paper. We also introduce the traditional methods that were earlier used for power system stability assessment, such as time-domain methods, static methods, and sensitivity methods. Finally, we compare these methods and give general guidance on choosing a power system stability method.
Submitted 21 November, 2020;
originally announced November 2020.
-
Multi-Scale Speaker Diarization With Neural Affinity Score Fusion
Authors:
Tae Jin Park,
Manoj Kumar,
Shrikanth Narayanan
Abstract:
Determining the identity of the speaker of short segments in human dialogue has been considered one of the most challenging problems in speech signal processing. Speaker representations of short speech segments tend to be unreliable, resulting in poor fidelity of speaker representations in tasks requiring speaker recognition. In this paper, we propose an unconventional method that tackles the trade-off between temporal resolution and the quality of the speaker representations. To find a set of weights that balance the scores from multiple temporal scales of segments, a neural affinity score fusion model is presented. Using the CALLHOME dataset, we show that our proposed multi-scale segmentation and integration approach can achieve state-of-the-art diarization performance.
Submitted 20 November, 2020;
originally announced November 2020.
-
Exploration of End-to-end Synthesisers for Zero Resource Speech Challenge 2020
Authors:
Karthik Pandia D S,
Anusha Prakash,
Mano Ranjith Kumar,
Hema A Murthy
Abstract:
A spoken dialogue system for an unseen language is referred to as zero resource speech. It is especially beneficial for developing applications for languages that have low digital resources. Zero resource speech synthesis is the task of building text-to-speech (TTS) models in the absence of transcriptions. In this work, speech is modelled as a sequence of transient and steady-state acoustic units, and a unique set of acoustic units is discovered by iterative training. Using the acoustic unit sequence, TTS models are trained. The main goal of this work is to improve the synthesis quality of zero resource TTS systems. Four different systems are proposed. All the systems consist of three stages: unit discovery, followed by unit sequence to spectrogram mapping, and finally spectrogram to speech inversion. Modifications are proposed to the spectrogram mapping stage. These modifications include training the mapping on voice data, using x-vectors to improve the mapping, two-stage learning, and gender-specific modelling. Evaluation of the proposed systems in the Zerospeech 2020 challenge shows that good-quality synthesis can be achieved.
Submitted 10 September, 2020;
originally announced September 2020.
-
Use of adaptive filtering techniques and deconvolution to obtain low range sidelobe samples
Authors:
Mohit Kumar,
V. Chandrasekar
Abstract:
In this paper, the use of adaptive filtering techniques to obtain better peak sidelobe suppression and integrated sidelobe energy, and thereby better sensitivity, is discussed with regard to weather radars. The performance of the new coefficient sets obtained with an adaptive filter (using RLS optimization) is presented. These coefficient sets are also compared with existing techniques and their peak sidelobe levels.
Submitted 9 August, 2020;
originally announced August 2020.
-
Designing Neural Speaker Embeddings with Meta Learning
Authors:
Manoj Kumar,
Tae Jin-Park,
Somer Bishop,
Shrikanth Narayanan
Abstract:
Neural speaker embeddings trained using classification objectives have demonstrated state-of-the-art performance in multiple applications. Typically, such embeddings are trained on an out-of-domain corpus on a single task e.g., speaker classification, albeit with a large number of classes (speakers). In this work, we reformulate embedding training under the meta-learning paradigm. We redistribute the training corpus as an ensemble of multiple related speaker classification tasks, and learn a representation that generalizes better to unseen speakers. First, we develop an open source toolkit to train x-vectors that is matched in performance with pre-trained Kaldi models for speaker diarization and speaker verification applications. We find that different bottleneck layers in the architecture variedly favor different applications. Next, we use two meta-learning strategies, namely prototypical networks and relation networks, to improve over the x-vector embeddings. Our best performing model achieves a relative improvement of 12.37% and 7.11% in speaker error on the DIHARD II development corpus and the AMI meeting corpus, respectively. We analyze improvements across different domains in the DIHARD corpus. Notably, on the challenging child speech domain, we study the relation between child age and the diarization performance. Further, we show reductions in equal error rate for speaker verification on the SITW corpus (7.68%) and the VOiCES challenge corpus (8.78%). We observe that meta-learning particularly offers benefits in challenging acoustic conditions and recording setups encountered in these corpora. Our experiments illustrate the applicability of meta-learning as a generalized learning paradigm for training deep neural speaker embeddings.
Submitted 31 July, 2020;
originally announced July 2020.
-
Evidence of Task-Independent Person-Specific Signatures in EEG using Subspace Techniques
Authors:
Mari Ganesh Kumar,
Shrikanth Narayanan,
Mriganka Sur,
Hema A Murthy
Abstract:
Electroencephalography (EEG) signals are promising as alternatives to other biometrics owing to their protection against spoofing. Previous studies have focused on capturing individual variability by analyzing task/condition-specific EEG. This work attempts to model biometric signatures independent of task/condition by normalizing the associated variance. Toward this goal, the paper extends ideas from subspace-based text-independent speaker recognition and proposes novel modifications for modeling multi-channel EEG data. The proposed techniques assume that biometric information is present in the entire EEG signal and accumulate statistics across time in a high dimensional space. These high dimensional statistics are then projected to a lower dimensional space where the biometric information is preserved. The lower dimensional embeddings obtained using the proposed approach are shown to be task-independent. The best subspace system identifies individuals with accuracies of 86.4% and 35.9% on datasets with 30 and 920 subjects, respectively, using just nine EEG channels. The paper also provides insights into the subspace model's scalability to unseen tasks and individuals during training and the number of channels needed for subspace modeling.
Submitted 25 March, 2021; v1 submitted 27 July, 2020;
originally announced July 2020.
-
Meta-learning with Latent Space Clustering in Generative Adversarial Network for Speaker Diarization
Authors:
Monisankha Pal,
Manoj Kumar,
Raghuveer Peri,
Tae Jin Park,
So Hyun Kim,
Catherine Lord,
Somer Bishop,
Shrikanth Narayanan
Abstract:
The performance of most speaker diarization systems with x-vector embeddings is both vulnerable to noisy environments and lacks domain robustness. Earlier work on speaker diarization using generative adversarial network (GAN) with an encoder network (ClusterGAN) to project input x-vectors into a latent space has shown promising performance on meeting data. In this paper, we extend the ClusterGAN network to improve diarization robustness and enable rapid generalization across various challenging domains. To this end, we fetch the pre-trained encoder from the ClusterGAN and fine-tune it by using prototypical loss (meta-ClusterGAN or MCGAN) under the meta-learning paradigm. Experiments are conducted on CALLHOME telephonic conversations, AMI meeting data, DIHARD II (dev set) which includes challenging multi-domain corpus, and two child-clinician interaction corpora (ADOS, BOSCC) related to the autism spectrum disorder domain. Extensive analyses of the experimental data are done to investigate the effectiveness of the proposed ClusterGAN and MCGAN embeddings over x-vectors. The results show that the proposed embeddings with normalized maximum eigengap spectral clustering (NME-SC) back-end consistently outperform Kaldi state-of-the-art x-vector diarization system. Finally, we employ embedding fusion with x-vectors to provide further improvement in diarization performance. We achieve a relative diarization error rate (DER) improvement of 6.67% to 53.93% on the aforementioned datasets using the proposed fused embeddings over x-vectors. Besides, the MCGAN embeddings provide better performance in the number of speakers estimation and short speech segment diarization as compared to x-vectors and ClusterGAN in telephonic data.
Submitted 19 July, 2020;
originally announced July 2020.
-
Developmental Reinforcement Learning of Control Policy of a Quadcopter UAV with Thrust Vectoring Rotors
Authors:
Aditya M. Deshpande,
Rumit Kumar,
Ali A. Minai,
Manish Kumar
Abstract:
In this paper, we present a novel developmental reinforcement learning-based controller for a quadcopter with thrust vectoring capabilities. This multirotor UAV design has tilt-enabled rotors. It utilizes the rotor force magnitude and direction to achieve the desired state during flight. The control policy of this robot is learned using policy transfer from the learned controller of the quadcopter (a comparatively simple UAV design without thrust vectoring). This approach allows learning a control policy for systems with multiple inputs and multiple outputs. The performance of the learned policy is evaluated by physics-based simulations for the tasks of hovering and way-point navigation. The flight simulations utilize a flight controller based on reinforcement learning without any additional PID components. The results show faster learning with the presented approach as opposed to learning the control policy from scratch for this new UAV design created by modifications to a conventional quadcopter, i.e., the addition of more degrees of freedom (from 4 actuators in the conventional quadcopter to 8 actuators in the tilt-rotor quadcopter). We demonstrate the robustness of our learned policy by showing the recovery of the tilt-rotor platform in simulation from various non-static initial conditions in order to reach a desired state. The developmental policy for the tilt-rotor UAV also showed superior fault tolerance when compared with the policy learned from scratch. The results show the ability of the presented approach to bootstrap the learned behavior from a simpler system (lower-dimensional action-space) to a more complex robot (comparatively higher-dimensional action-space) and reach better performance faster.
Submitted 15 July, 2020;
originally announced July 2020.
-
Quaternion Feedback Based Autonomous Control of a Quadcopter UAV with Thrust Vectoring Rotors
Authors:
Rumit Kumar,
Mahathi Bhargavapuri,
Aditya M. Deshpande,
Siddharth Sridhar,
Kelly Cohen,
Manish Kumar
Abstract:
In this paper, we present an autonomous flight controller for a quadcopter with thrust vectoring capabilities. This UAV falls in the category of multirotors with tilt-motion enabled rotors. Since the vehicle considered is over-actuated in nature, the dynamics and control allocation have to be analysed carefully. Moreover, the possibility of hovering at large attitude maneuvers of this novel vehicle requires singularity-free attitude control. Hence, quaternion state feedback is utilized to compute the control commands for the UAV motors while avoiding the gimbal lock condition experienced by Euler angle based controllers. The quaternion implementation also reduces the overall complexity of state estimation due to the absence of trigonometric parameters. The quadcopter dynamic model and state space are utilized to design the attitude controller and control allocation for the UAV. The control allocation, in particular, is derived by linearizing the system about the hover condition. This mathematical method renders the control allocation more accurate than existing approaches. Lyapunov stability analysis of the attitude controller is shown to prove global stability. The quaternion feedback attitude controller is commanded by an outer position controller loop which generates rotor-tilt and desired quaternion commands for the system. The performance of the UAV is evaluated by numerical simulations for tracking attitude step commands and for following a way-point navigation mission.
Submitted 28 June, 2020;
originally announced June 2020.
-
Computer Vision Toolkit for Non-invasive Monitoring of Factory Floor Artifacts
Authors:
Aditya M. Deshpande,
Anil Kumar Telikicherla,
Vinay Jakkali,
David A. Wickelhaus,
Manish Kumar,
Sam Anand
Abstract:
Digitization has led to smart, connected technologies becoming an integral part of businesses, governments and communities. For manufacturing digitization, there has been active research and development with a focus on Cloud Manufacturing (CM) and the Industrial Internet of Things (IIoT). This work presents a computer vision toolkit (CV Toolkit) for non-invasive digitization of the factory floor in line with Industry 4.0 requirements for factory data collection. Currently, technical challenges persist in the digitization of legacy systems because their design and sensors allow only limited changes. This novel toolkit is developed to facilitate easy integration of legacy production machinery and factory floor artifacts with the digital and smart manufacturing environment, without requiring any physical changes to the machines. The system developed is modular and allows real-time monitoring of production machinery. This modularity allows new software applications to be incorporated into the current framework of the CV Toolkit. To allow connectivity of this toolkit with manufacturing floors in a simple, deployable and cost-effective manner, the toolkit is integrated with a known manufacturing data standard, MTConnect, to "translate" the digital inputs into data streams that can be read by commercial status tracking and reporting software solutions. The proposed toolkit is demonstrated using a mock-panel environment developed in house at the University of Cincinnati to highlight its usability.
Submitted 12 May, 2020;
originally announced May 2020.
-
One-Shot Recognition of Manufacturing Defects in Steel Surfaces
Authors:
Aditya M. Deshpande,
Ali A. Minai,
Manish Kumar
Abstract:
Quality control is an essential process in manufacturing to make the product defect-free as well as to meet customer needs. The automation of this process is important to maintain high quality along with the high manufacturing throughput. With recent developments in deep learning and computer vision technologies, it has become possible to detect various features from the images with near-human accuracy. However, many of these approaches are data intensive. Training and deployment of such a system on manufacturing floors may become expensive and time-consuming. The need for large amounts of training data is one of the limitations of the applicability of these approaches in real-world manufacturing systems. In this work, we propose the application of a Siamese convolutional neural network to do one-shot recognition for such a task. Our results demonstrate how one-shot learning can be used in quality control of steel by identification of defects on the steel surface. This method can significantly reduce the requirements of training data and can also be run in real-time.
Submitted 12 May, 2020;
originally announced May 2020.
-
Flight Control of Sliding Arm Quadcopter with Dynamic Structural Parameters
Authors:
Rumit Kumar,
Aditya M. Deshpande,
James Z. Wells,
Manish Kumar
Abstract:
The conceptual design and flight controller of a novel kind of quadcopter are presented. This design is capable of morphing the shape of the UAV during flight to achieve position and attitude control. We consider a dynamic center of gravity (CoG), which causes continuous variation in the moment of inertia (MoI) parameters of the UAV in this design. These dynamic structural parameters play a vital role in the stability and control of the system. The length of the quadcopter arms is a variable parameter, and it is actuated using an attitude feedback-based control law. The MoI parameters are computed in real time and incorporated in the equations of motion of the system. The UAV utilizes the angular motion of the propellers and variable quadcopter arm lengths for position and navigation control. The movement space of the CoG is a design parameter, and it is bounded by actuator limitations and stability requirements of the system. Detailed information on the equations of motion, flight controller design and possible applications of this system is provided. Further, the proposed shape-changing UAV system is evaluated by comparative numerical simulations for a way-point navigation mission and complex trajectory tracking.
Submitted 27 April, 2020;
originally announced April 2020.
-
Intermediate frequency Upgrade design features of NASA D3R Weather Radar System
Authors:
Mohit Kumar,
Shashank Joshil,
Manuel Vega,
Robert Beauchamp,
V Chandrasekar
Abstract:
The NASA dual-frequency, dual-polarization, Doppler radar (D3R) is an important ground validation tool for the global precipitation measurement (GPM) mission dual-frequency precipitation radar (DPR). The D3R has undergone extensive field trials starting in 2011 and continues to provide observations that enhance our scientific knowledge. To further enhance its capabilities, the Intermediate frequency (IF) electronics, digital receiver and waveform generation subsystems have been upgraded. Due to the new, more flexible architecture, this upgrade enables more research frontiers to be explored with better performance. One of the primary motivations for this upgrade is to enable enhanced radar sensitivity and increase range resolution to 30 meters. In this work, the D3R system upgrade will be discussed with a focus on the key upgrade design features to obtain better sensitivity and a flexible waveform capability.
Submitted 25 June, 2020; v1 submitted 29 February, 2020;
originally announced March 2020.
-
Inter Pulse Frequency Diversity System for Second Trip Suppression and Retrieval in a Weather Radar
Authors:
V Chandrasekar,
Mohit Kumar
Abstract:
This paper develops the use of an inter-pulse frequency diversity scheme for a weather radar system. It establishes the performance of the frequency diversity technique by comparing it with other inter-pulse schemes for weather radar systems. Inter-pulse coding is widely used for second trip suppression or cross-polarization isolation. Here, a new inter-pulse scheme is discussed that takes advantage of frequency diverse waveforms. The simulations and performance tests are evaluated with the NASA dual-frequency, dual-polarization, Doppler radar (D3R) in mind. A new method is described to recover velocity and spectral width, which is needed because the pulse-to-pulse change of frequency makes the samples incoherent. Compared with the popular systematic phase code for second trip suppression and retrieval, this technique can recover the weather radar moments over a much higher dynamic range of the other-trip contamination.
Submitted 22 February, 2020;
originally announced March 2020.
-
Coding schemes and Applications for Weather Radars
Authors:
Mohit Kumar,
V. Chandrasekar,
Shashank Joshil
Abstract:
In this paper, we describe the evolution of a pair of polyphase coded waveforms for use in second trip suppression in weather radar. The polyphase codes were designed and tested on a NASA weather radar. The NASA dual-frequency, dual-polarization Doppler radar (D3R) was developed primarily as a ground validation tool for the GPM satellite dual-frequency radar. Recently, the D3R radar was upgraded with new versions of digital receiver hardware and firmware, which support larger filter lengths and multiple phase coded waveforms, and also with newer IF sub-systems. This has enhanced the capabilities of the radar manifold.
Submitted 20 February, 2020;
originally announced March 2020.
-
Auto-Tuning Spectral Clustering for Speaker Diarization Using Normalized Maximum Eigengap
Authors:
Tae Jin Park,
Kyu J. Han,
Manoj Kumar,
Shrikanth Narayanan
Abstract:
In this study, we propose a new spectral clustering framework that can auto-tune the parameters of the clustering algorithm in the context of speaker diarization. The proposed framework uses normalized maximum eigengap (NME) values to estimate the number of clusters and the parameters for the threshold of the elements of each row in an affinity matrix during spectral clustering, without the use of parameter tuning on the development set. Even with this hands-off approach, we achieve comparable or better performance across various evaluation sets than the results found using traditional clustering methods that apply careful parameter tuning and development data. A relative improvement of 17% in the speaker error rate on the well-known CALLHOME evaluation set shows the effectiveness of our proposed spectral clustering with auto-tuning.
Submitted 4 March, 2020;
originally announced March 2020.
-
Generalized Bayesian Cramér-Rao Inequality via Information Geometry of Relative $α$-Entropy
Authors:
Kumar Vijay Mishra,
M. Ashok Kumar
Abstract:
The relative $α$-entropy is the Rényi analog of relative entropy and arises prominently in information-theoretic problems. Recent information geometric investigations on this quantity have enabled the generalization of the Cramér-Rao inequality, which provides a lower bound for the variance of an estimator of an escort of the underlying parametric probability distribution. However, this framework remains unexamined in the Bayesian setting. In this paper, we propose a general Riemannian metric based on relative $α$-entropy to obtain a generalized Bayesian Cramér-Rao inequality. This establishes a lower bound for the variance of an unbiased estimator for the $α$-escort distribution starting from an unbiased estimator for the underlying distribution. We show that in the limiting case when the entropy order approaches unity, this framework reduces to the conventional Bayesian Cramér-Rao inequality. Further, in the absence of priors, the same framework yields the deterministic Cramér-Rao inequality.
△ Less
Submitted 11 February, 2020;
originally announced February 2020.
-
Cramér-Rao Lower Bounds Arising from Generalized Csiszár Divergences
Authors:
M. Ashok Kumar,
Kumar Vijay Mishra
Abstract:
We study the geometry of probability distributions with respect to a generalized family of Csiszár $f$-divergences. A member of this family is the relative $α$-entropy, which is also a Rényi analog of relative entropy in information theory and is known as logarithmic or projective power divergence in statistics. We apply Eguchi's theory to derive the Fisher information metric and the dual affine connections arising from these generalized divergence functions. This enables us to arrive at a more widely applicable version of the Cramér-Rao inequality, which provides a lower bound for the variance of an estimator for an escort of the underlying parametric probability distribution. We then extend Amari-Nagaoka's dually flat structure of the exponential and mixture models to other distributions with respect to the aforementioned generalized metric. We show that these formulations lead us to find unbiased and efficient estimators for the escort model. Finally, we compare our work with prior results on generalized Cramér-Rao inequalities that were derived from non-information-geometric frameworks.
Submitted 24 May, 2020; v1 submitted 14 January, 2020;
originally announced January 2020.
-
Improved Multi-Stage Training of Online Attention-based Encoder-Decoder Models
Authors:
Abhinav Garg,
Dhananjaya Gowda,
Ankur Kumar,
Kwangyoun Kim,
Mehul Kumar,
Chanwoo Kim
Abstract:
In this paper, we propose a refined multi-stage multi-task training strategy to improve the performance of online attention-based encoder-decoder (AED) models. A three-stage training based on three levels of architectural granularity, namely character encoder, byte pair encoding (BPE) based encoder, and attention decoder, is proposed. Also, multi-task learning based on two levels of linguistic granularity, namely character and BPE, is used. We explore different pre-training strategies for the encoders, including transfer learning from a bidirectional encoder. Our encoder-decoder models with online attention show 35% and 10% relative improvement over their baselines for smaller and bigger models, respectively. Our models achieve word error rates (WER) of 5.04% and 4.48% on the LibriSpeech test-clean data for the smaller and bigger models, respectively, after fusion with a long short-term memory (LSTM) based external language model (LM).
Submitted 27 December, 2019;
originally announced December 2019.
-
power-law nonlinearity with maximally uniform distribution criterion for improved neural network training in automatic speech recognition
Authors:
Chanwoo Kim,
Mehul Kumar,
Kwangyoun Kim,
Dhananjaya Gowda
Abstract:
In this paper, we describe the Maximum Uniformity of Distribution (MUD) algorithm with the power-law nonlinearity. In this approach, we hypothesize that neural network training will become more stable if the feature distribution is not too skewed. We propose two different types of MUD approaches: power function-based MUD and histogram-based MUD. In these approaches, we first obtain the mel filterbank coefficients and apply nonlinearity functions for each filterbank channel. With the power function-based MUD, we apply a power-function based nonlinearity where power function coefficients are chosen to maximize the likelihood assuming that nonlinearity outputs follow the uniform distribution. With the histogram-based MUD, the empirical Cumulative Distribution Function (CDF) from the training database is employed to transform the original distribution into a uniform distribution. In MUD processing, we do not use any prior knowledge (e.g. logarithmic relation) about the energy of the incoming signal and the intensity perceived by a human. Experimental results using an end-to-end speech recognition system demonstrate that power-function based MUD yields better results than the conventional Mel-Frequency Cepstral Coefficients (MFCCs). On the LibriSpeech database, we achieve 4.02% WER on test-clean and 13.34% WER on test-other without using any Language Models (LMs). The major contribution of this work is that we developed a new algorithm for designing the compressive nonlinearity in a data-driven way, which is much more flexible than the previous approaches and may be extended to other domains as well.
Submitted 21 December, 2019;
originally announced December 2019.
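The histogram-based variant described in the abstract above maps each filterbank channel through the empirical CDF of the training data, so that outputs are approximately uniform on (0, 1]. A minimal sketch of that transformation follows. The function names and the synthetic exponential "features" are assumptions for illustration, and the paper's actual histogram construction may differ (for example, it may use binned histograms rather than exact ranks).

```python
import numpy as np


def fit_empirical_cdf(train_features):
    """Sort each channel's training values.

    train_features has shape (frames, channels). The sorted columns are
    all that is needed to evaluate the empirical CDF later.
    """
    return np.sort(np.asarray(train_features, dtype=float), axis=0)


def apply_empirical_cdf(sorted_train, features):
    """Map features of shape (frames, channels) through the per-channel CDF."""
    features = np.asarray(features, dtype=float)
    n_train, n_channels = sorted_train.shape
    out = np.empty_like(features)
    for c in range(n_channels):
        # Number of training values <= x, divided by the training count.
        ranks = np.searchsorted(sorted_train[:, c], features[:, c], side="right")
        out[:, c] = ranks / n_train
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Skewed synthetic data standing in for mel filterbank energies.
    train = rng.exponential(size=(5000, 3))
    cdf_table = fit_empirical_cdf(train)
    uniformized = apply_empirical_cdf(cdf_table, rng.exponential(size=(5000, 3)))
    print(uniformized.mean(axis=0))
```

Unlike a fixed logarithm, this mapping is learned entirely from the training data, which is the data-driven property the abstract emphasizes.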
-
end-to-end training of a large vocabulary end-to-end speech recognition system
Authors:
Chanwoo Kim,
Sungsoo Kim,
Kwangyoun Kim,
Mehul Kumar,
Jiyeon Kim,
Kyungmin Lee,
Changwoo Han,
Abhinav Garg,
Eunhyang Kim,
Minkyoo Shin,
Shatrughan Singh,
Larry Heck,
Dhananjaya Gowda
Abstract:
In this paper, we present an end-to-end training framework for building state-of-the-art end-to-end speech recognition systems. Our training system utilizes a cluster of Central Processing Units (CPUs) and Graphics Processing Units (GPUs). Data reading, large-scale data augmentation, and neural network parameter updates are all performed "on-the-fly". We use vocal tract length perturbation [1] and an acoustic simulator [2] for data augmentation. The processed features and labels are sent to the GPU cluster. The Horovod allreduce approach is employed to train neural network parameters. We evaluated the effectiveness of our system on the standard LibriSpeech corpus [3] and the 10,000-hr anonymized Bixby English dataset. Our end-to-end speech recognition system built using this training infrastructure showed a 2.44% WER on test-clean of the LibriSpeech test set after applying shallow fusion with a Transformer language model (LM). For the proprietary English Bixby open domain test set, we obtained a WER of 7.92% using a Bidirectional Full Attention (BFA) end-to-end model after applying shallow fusion with an RNN-LM. When the monotonic chunkwise attention (MoCha) based approach is employed for streaming speech recognition, we obtained a WER of 9.95% on the same Bixby open domain test set.
Submitted 21 December, 2019;
originally announced December 2019.
-
Finite State Markov Modeling of Fading Channels Towards Decoding of LDPC Codes
Authors:
Mohit Kumar
Abstract:
We propose two decoding strategies for low-density parity-check (LDPC) codes over Markov noise channels with bit-flipping noise. The sum-product algorithm used for decoding LDPC codes over memoryless channels is extended to include channel estimation, and the resulting gain is evaluated and verified through simulation. LDPC codes have been studied for years over memoryless channels and are known to have excellent performance. However, the behavior of these codes over channels with memory is a topic of current research. Here, channels with memory are characterized by Markov modeling, which is a useful bursty channel model. With a sufficient number of states, such models are able to capture the relevant noise characteristics. We chose a two-state system, as it offers a good compromise between complexity and performance.
Submitted 19 October, 2019;
originally announced December 2019.
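A two-state Markov bit-flipping channel of the kind the abstract above describes can be simulated in a few lines. The sketch below is only a channel simulator, not the paper's decoder or channel estimator. The "good"/"bad" state naming, the function name, and every probability value are assumptions chosen for illustration.

```python
import numpy as np


def markov_bit_flip_channel(bits, p_good_to_bad, p_bad_to_good,
                            flip_good, flip_bad, rng):
    """Pass bits through a two-state Markov channel with bit-flipping noise.

    The hidden state (0 = good, 1 = bad) evolves as a Markov chain, and
    each bit is flipped with a probability that depends on the state.
    Returns (received_bits, states).
    """
    bits = np.asarray(bits, dtype=np.uint8)
    states = np.empty(bits.size, dtype=np.uint8)
    state = 0  # assumed to start in the good state
    for i in range(bits.size):
        states[i] = state
        # Draw the state that will apply to the next bit.
        if state == 0:
            state = 1 if rng.random() < p_good_to_bad else 0
        else:
            state = 0 if rng.random() < p_bad_to_good else 1
    flip_prob = np.where(states == 1, flip_bad, flip_good)
    flips = (rng.random(bits.size) < flip_prob).astype(np.uint8)
    return bits ^ flips, states


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    sent = rng.integers(0, 2, size=20).astype(np.uint8)
    received, states = markov_bit_flip_channel(sent, 0.05, 0.2, 0.01, 0.3, rng)
    print("states:", states)
    print("errors:", sent ^ received)
```

Because the bad state tends to persist for several bits, errors arrive in bursts, which is what a memoryless sum-product decoder does not account for.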
-
Receive signal path design for Active phased array radars
Authors:
Mohit Kumar,
Dileep,
K. Sreenivasulu,
D. Seshagiri,
Durga Srinivas,
S. Narasimhan
Abstract:
In modern active phased array radar systems, the large number of T/R modules, multi-channel receiver down converters, and distributed power distribution networks make the design and analysis of the receive signal path more complex. In this paper, the receive signal path design of a typical fully distributed active phased array radar based on 1000 T/R modules is discussed in detail, including the gain and Spurious Free Dynamic Range (SFDR) requirements at different levels. Techniques for optimizing SFDR and system performance are also described, and a SystemVue model for receive path calculations is presented.
Submitted 19 October, 2019;
originally announced December 2019.
-
Intra-Pulse Polyphase Coding System for Second Trip Suppression in a Weather Radar
Authors:
Mohit Kumar,
V Chandrasekar
Abstract:
This paper describes the design and implementation of intra-pulse polyphase codes for a weather radar system. Algorithms to generate codes with good correlation properties are discussed. Thereafter, a new design framework is described, which optimizes the polyphase code and corresponding mismatched filter, using a cost/error function, especially for weather radars. It establishes the performance of these intra-pulse techniques with specific application towards second trip removal. The developed code is implemented on NASA D3R, which is a dual-frequency, dual-polarization, Doppler weather radar system.
Submitted 29 November, 2019;
originally announced December 2019.
-
Learning Domain Invariant Representations for Child-Adult Classification from Speech
Authors:
Rimita Lahiri,
Manoj Kumar,
Somer Bishop,
Shrikanth Narayanan
Abstract:
Diagnostic procedures for ASD (autism spectrum disorder) involve semi-naturalistic interactions between the child and a clinician. Computational methods to analyze these sessions require an end-to-end speech and language processing pipeline that goes from raw audio to clinically meaningful behavioral features. An important component of this pipeline is the ability to automatically detect who is speaking when, i.e., to perform child-adult speaker classification. This binary classification task is often confounded by variability associated with the participants' speech and background conditions. Further, scarcity of training data often restricts direct application of conventional deep learning methods. In this work, we address two major sources of variability - age of the child and data source collection location - using domain adversarial learning, which does not require labeled target domain data. We use two methods, generative adversarial training with inverted label loss and a gradient reversal layer, to learn speaker embeddings invariant to the above sources of variability, and analyze different conditions under which the proposed techniques improve over conventional learning methods. Using a large corpus of ADOS-2 (autism diagnostic observation schedule, 2nd edition) sessions, we demonstrate up to 13.45% and 6.44% relative improvements over conventional learning methods.
Submitted 24 October, 2019;
originally announced October 2019.
-
A study of semi-supervised speaker diarization system using gan mixture model
Authors:
Monisankha Pal,
Manoj Kumar,
Raghuveer Peri,
Shrikanth Narayanan
Abstract:
We propose a new speaker diarization system based on a recently introduced unsupervised clustering technique, namely the generative adversarial network mixture model (GANMM). The proposed system uses x-vectors as the front-end representation. Spectral embedding is used for dimensionality reduction, followed by k-means initialization during GANMM pre-training. GANMM performs unsupervised speaker clustering by efficiently capturing complex data distributions. Experimental results on the AMI meeting corpus show that the proposed semi-supervised diarization system matches or exceeds the performance of competitive baselines. On an evaluation set containing fifty sessions with varying durations, the best achieved average diarization error rate (DER) is 17.11%, a relative improvement of 33% over the information bottleneck baseline and comparable to the x-vector baseline.
Submitted 24 October, 2019;
originally announced October 2019.
-
Meta-learning for robust child-adult classification from speech
Authors:
Nithin Rao Koluguri,
Manoj Kumar,
So Hyun Kim,
Catherine Lord,
Shrikanth Narayanan
Abstract:
Computational modeling of naturalistic conversations in clinical applications has seen growing interest in the past decade. An important use-case involves child-adult interactions within the autism diagnosis and intervention domain. In this paper, we address a specific sub-problem of speaker diarization, namely child-adult speaker classification in such dyadic conversations with specified roles. Training a speaker classification system robust to speaker and channel conditions is challenging due to inherent variability in the speech within children and the adult interlocutors. In this work, we propose the use of meta-learning, in particular, prototypical networks which optimize a metric space across multiple tasks. By modeling every child-adult pair in the training set as a separate task during meta-training, we learn a representation with improved generalizability compared to conventional supervised learning. We demonstrate improvements over state-of-the-art speaker embeddings (x-vectors) under two evaluation settings: weakly supervised classification (up to 14.53% relative improvement in F1-scores) and clustering (up to 9.66% relative improvement in cluster purity). Our results show that protonets can potentially extract robust speaker embeddings for child-adult classification from speech.
Submitted 28 October, 2019; v1 submitted 24 October, 2019;
originally announced October 2019.
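The classification rule at the heart of prototypical networks, as used in the abstract above, assigns each query to the class whose prototype (the mean of that class's support embeddings) is nearest. The sketch below shows only that rule. The embedding network and its episodic meta-training are omitted, and the function names and the 2-D toy embeddings are assumptions for illustration.

```python
import numpy as np


def class_prototypes(support_embeddings, support_labels):
    """Return (classes, prototypes), one mean embedding per class."""
    embeddings = np.asarray(support_embeddings, dtype=float)
    labels = np.asarray(support_labels)
    classes = np.unique(labels)
    prototypes = np.stack([embeddings[labels == c].mean(axis=0) for c in classes])
    return classes, prototypes


def classify_queries(query_embeddings, classes, prototypes):
    """Assign each query to the class with the nearest prototype."""
    queries = np.asarray(query_embeddings, dtype=float)
    # Squared Euclidean distance from every query to every prototype.
    diffs = queries[:, None, :] - prototypes[None, :, :]
    sq_dists = (diffs ** 2).sum(axis=-1)
    return classes[np.argmin(sq_dists, axis=1)]


if __name__ == "__main__":
    # Toy 2-D embeddings for one child-adult session (made-up values).
    support = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
    labels = ["adult", "adult", "child", "child"]
    classes, protos = class_prototypes(support, labels)
    print(classify_queries([[0.2, 0.3], [5.0, 6.0]], classes, protos))
```

Treating each child-adult pair as its own task means the prototypes are recomputed per session from a few labeled examples.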
-
Speaker diarization using latent space clustering in generative adversarial network
Authors:
Monisankha Pal,
Manoj Kumar,
Raghuveer Peri,
Tae Jin Park,
So Hyun Kim,
Catherine Lord,
Somer Bishop,
Shrikanth Narayanan
Abstract:
In this work, we propose deep latent space clustering for speaker diarization using generative adversarial network (GAN) backprojection with the help of an encoder network. The proposed diarization system is trained jointly with GAN loss, latent variable recovery loss, and a clustering-specific loss. It uses x-vector speaker embeddings at the input, while the latent variables are sampled from a combination of continuous random variables and discrete one-hot encoded variables using the original speaker labels. We benchmark our proposed system on the AMI meeting corpus and two child-clinician interaction corpora (ADOS and BOSCC) from the autism diagnosis domain. ADOS and BOSCC contain diagnostic and treatment outcome sessions, respectively, obtained in clinical settings for verbal children and adolescents with autism. Experimental results show that our proposed system significantly outperforms the state-of-the-art x-vector based diarization system on these databases. Further, we perform embedding fusion with x-vectors to achieve relative DER improvements of 31%, 36% and 49% on the AMI eval, ADOS and BOSCC corpora, respectively, when compared to the x-vector baseline using oracle speech segmentation.
Submitted 24 October, 2019;
originally announced October 2019.
-
A Novel Scheme of Digital Instantaneous Automatic Gain Control (DIAGC) for Pulse Radars
Authors:
Sumanta Pal,
Nirmala Shanmugam,
Mohit Kumar,
P Radhakrishna
Abstract:
Several schemes for gain control are used to prevent saturation of the receiver and overloading of the data processor, tracker or display in pulse radars. The use of digital processing techniques opens the door to a variety of digital automatic gain control schemes for analyzing digitized return signals and controlling receiver gain only at saturating clutter zones without affecting detection at other zones. In this paper, we present a novel scheme of Digital Instantaneous Automatic Gain Control (DIAGC), which is based on digitally storing the dwell-based clutter returns and deriving the gain control from them. The returns corresponding to the first two PRTs in a dwell are used to analyze the presence of saturating clutter zones and the depth of saturation. From the third PRT onwards, appropriate gain control is applied at the IF stage to prevent saturation of the following stages. An FPGA-based scheme is used for digital data processing, storing, threshold calculation and gain control generation. The effect of DIAGC on pulse compression is also addressed in this paper.
Submitted 19 October, 2019;
originally announced October 2019.
-
Distributed High Speed Optical Network for Digital Radar Systems
Authors:
Vishal Maheshwari,
K. Sreenivasulu,
Mohit Kumar,
Dr. Vengada Rajan,
Sumant Pal,
Mohana Kumari
Abstract:
Modern digital radar systems with multiple digital beamforming capability are built from a large number of receivers and require high-speed data interface links for transmission of receiver baseband data to processor units. High data throughput (>250 Mbyte/sec) from typical eight-channel receivers is transmitted to the digital beamformer over high-speed serial interface links over an optical channel. Currently, for digital radar systems with sub-array-level beamformers, receiver data is distributed through point-to-point optical interface links. For modern element-level digital beamforming radars, the distribution of baseband data increases the design complexity. In this paper, a novel scheme based on a distributed optical interface network is discussed, which uses high-speed optical transport networks, FPGAs, and distributed techniques to overcome the above problem. Recent advances in optical communication and the features of FPGA devices are utilized in the implementation of optical distribution networks, and these schemes are covered in this paper.
Submitted 19 October, 2019;
originally announced October 2019.