Search | arXiv e-print repository

Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model

Authors: Chunting Zhou, Lili Yu, Arun Babu, Kushal Tirumala, Michihiro Yasunaga, Leonid Shamis, Jacob Kahn, Xuezhe Ma, Luke Zettlemoyer, Omer Levy

Abstract: We introduce Transfusion, a recipe for training a multi-modal model over discrete and continuous data. Transfusion combines the language modeling loss function (next token prediction) with diffusion to train a single transformer over mixed-modality sequences. We pretrain multiple Transfusion models up to 7B parameters from scratch on a mixture of text and image data, establishing scaling laws with… ▽ More We introduce Transfusion, a recipe for training a multi-modal model over discrete and continuous data. Transfusion combines the language modeling loss function (next token prediction) with diffusion to train a single transformer over mixed-modality sequences. We pretrain multiple Transfusion models up to 7B parameters from scratch on a mixture of text and image data, establishing scaling laws with respect to a variety of uni- and cross-modal benchmarks. Our experiments show that Transfusion scales significantly better than quantizing images and training a language model over discrete image tokens. By introducing modality-specific encoding and decoding layers, we can further improve the performance of Transfusion models, and even compress each image to just 16 patches. We further demonstrate that scaling our Transfusion recipe to 7B parameters and 2T multi-modal tokens produces a model that can generate images and text on a par with similar scale diffusion models and language models, reaping the benefits of both worlds. △ Less

Submitted 20 August, 2024; originally announced August 2024.

Comments: 23 pages

arXiv:2408.07841 [pdf]

SustainDC -- Benchmarking for Sustainable Data Center Control

Authors: Avisek Naug, Antonio Guillen, Ricardo Luna, Vineet Gundecha, Desik Rengarajan, Sahand Ghorbanpour, Sajad Mousavi, Ashwin Ramesh Babu, Dejan Markovikj, Lekhapriya D Kashyap, Soumyendu Sarkar

Abstract: Machine learning has driven an exponential increase in computational demand, leading to massive data centers that consume significant amounts of energy and contribute to climate change. This makes sustainable data center control a priority. In this paper, we introduce SustainDC, a set of Python environments for benchmarking multi-agent reinforcement learning (MARL) algorithms for data centers (DC)… ▽ More Machine learning has driven an exponential increase in computational demand, leading to massive data centers that consume significant amounts of energy and contribute to climate change. This makes sustainable data center control a priority. In this paper, we introduce SustainDC, a set of Python environments for benchmarking multi-agent reinforcement learning (MARL) algorithms for data centers (DC). SustainDC supports custom DC configurations and tasks such as workload scheduling, cooling optimization, and auxiliary battery management, with multiple agents managing these operations while accounting for the effects of each other. We evaluate various MARL algorithms on SustainDC, showing their performance across diverse DC designs, locations, weather conditions, grid carbon intensity, and workload requirements. Our results highlight significant opportunities for improvement of data center operations using MARL algorithms. Given the increasing use of DC due to AI, SustainDC provides a crucial platform for the development and benchmarking of advanced algorithms essential for achieving sustainable computing and addressing other heterogeneous real-world challenges. △ Less

Submitted 14 August, 2024; originally announced August 2024.

Comments: Under review at Advances in Neural Information Processing Systems 2024 (NeurIPS 2024)

arXiv:2407.21674 [pdf, other]

Synthetic Simplicity: Unveiling Bias in Medical Data Augmentation

Authors: Krishan Agyakari Raja Babu, Rachana Sathish, Mrunal Pattanaik, Rahul Venkataramani

Abstract: Synthetic data is becoming increasingly integral in data-scarce fields such as medical imaging, serving as a substitute for real data. However, its inherent statistical characteristics can significantly impact downstream tasks, potentially compromising deployment performance. In this study, we empirically investigate this issue and uncover a critical phenomenon: downstream neural networks often ex… ▽ More Synthetic data is becoming increasingly integral in data-scarce fields such as medical imaging, serving as a substitute for real data. However, its inherent statistical characteristics can significantly impact downstream tasks, potentially compromising deployment performance. In this study, we empirically investigate this issue and uncover a critical phenomenon: downstream neural networks often exploit spurious distinctions between real and synthetic data when there is a strong correlation between the data source and the task label. This exploitation manifests as \textit{simplicity bias}, where models overly rely on superficial features rather than genuine task-related complexities. Through principled experiments, we demonstrate that the source of data (real vs.\ synthetic) can introduce spurious correlating factors leading to poor performance during deployment when the correlation is absent. We first demonstrate this vulnerability on a digit classification task, where the model spuriously utilizes the source of data instead of the digit to provide an inference. We provide further evidence of this phenomenon in a medical imaging problem related to cardiac view classification in echocardiograms, particularly distinguishing between 2-chamber and 4-chamber views. Given the increasing role of utilizing synthetic datasets, we hope that our experiments serve as effective guidelines for the utilization of synthetic datasets in model training. △ Less

Submitted 31 July, 2024; originally announced July 2024.

arXiv:2407.08270 [pdf, other]

SciQu: Accelerating Materials Properties Prediction with Automated Literature Mining for Self-Driving Laboratories

Authors: Anand Babu

Abstract: Assessing different material properties to predict specific attributes, such as band gap, resistivity, young modulus, work function, and refractive index, is a fundamental requirement for materials science-based applications. However, the process is time-consuming and often requires extensive literature reviews and numerous experiments. Our study addresses these challenges by leveraging machine le… ▽ More Assessing different material properties to predict specific attributes, such as band gap, resistivity, young modulus, work function, and refractive index, is a fundamental requirement for materials science-based applications. However, the process is time-consuming and often requires extensive literature reviews and numerous experiments. Our study addresses these challenges by leveraging machine learning to analyze material properties with greater precision and efficiency. By automating the data extraction process and using the extracted information to train machine learning models, our developed model, SciQu, optimizes material properties. As a proof of concept, we predicted the refractive index of materials using data extracted from numerous research articles with SciQu, considering input descriptors such as space group, volume, and bandgap with Root Mean Square Error (RMSE) 0.068 and R2 0.94. Thus, SciQu not only predicts the properties of materials but also plays a key role in self-driving laboratories by optimizing the synthesis parameters to achieve precise shape, size, and phase of the materials subjected to the input parameters. △ Less

Submitted 11 July, 2024; originally announced July 2024.

arXiv:2404.12498 [pdf]

A Configurable Pythonic Data Center Model for Sustainable Cooling and ML Integration

Authors: Avisek Naug, Antonio Guillen, Ricardo Luna Gutierrez, Vineet Gundecha, Sahand Ghorbanpour, Sajad Mousavi, Ashwin Ramesh Babu, Soumyendu Sarkar

Abstract: There have been growing discussions on estimating and subsequently reducing the operational carbon footprint of enterprise data centers. The design and intelligent control for data centers have an important impact on data center carbon footprint. In this paper, we showcase PyDCM, a Python library that enables extremely fast prototyping of data center design and applies reinforcement learning-enabl… ▽ More There have been growing discussions on estimating and subsequently reducing the operational carbon footprint of enterprise data centers. The design and intelligent control for data centers have an important impact on data center carbon footprint. In this paper, we showcase PyDCM, a Python library that enables extremely fast prototyping of data center design and applies reinforcement learning-enabled control with the purpose of evaluating key sustainability metrics including carbon footprint, energy consumption, and observing temperature hotspots. We demonstrate these capabilities of PyDCM and compare them to existing works in EnergyPlus for modeling data centers. PyDCM can also be used as a standalone Gymnasium environment for demonstrating sustainability-focused data center control. △ Less

Submitted 18 April, 2024; originally announced April 2024.

Comments: NeurIPS 2023 Workshop on Tackling Climate Change with Machine Learning https://www.climatechange.ai/papers/neurips2023/15. arXiv admin note: substantial text overlap with arXiv:2310.03906

arXiv:2404.10991 [pdf]

doi 10.24963/ijcai.2023/688

Function Approximation for Reinforcement Learning Controller for Energy from Spread Waves

Authors: Soumyendu Sarkar, Vineet Gundecha, Sahand Ghorbanpour, Alexander Shmakov, Ashwin Ramesh Babu, Avisek Naug, Alexandre Pichard, Mathieu Cocho

Abstract: The industrial multi-generator Wave Energy Converters (WEC) must handle multiple simultaneous waves coming from different directions called spread waves. These complex devices in challenging circumstances need controllers with multiple objectives of energy capture efficiency, reduction of structural stress to limit maintenance, and proactive protection against high waves. The Multi-Agent Reinforce… ▽ More The industrial multi-generator Wave Energy Converters (WEC) must handle multiple simultaneous waves coming from different directions called spread waves. These complex devices in challenging circumstances need controllers with multiple objectives of energy capture efficiency, reduction of structural stress to limit maintenance, and proactive protection against high waves. The Multi-Agent Reinforcement Learning (MARL) controller trained with the Proximal Policy Optimization (PPO) algorithm can handle these complexities. In this paper, we explore different function approximations for the policy and critic networks in modeling the sequential nature of the system dynamics and find that they are key to better performance. We investigated the performance of a fully connected neural network (FCN), LSTM, and Transformer model variants with varying depths and gated residual connections. Our results show that the transformer model of moderate depth with gated residual connections around the multi-head attention, multi-layer perceptron, and the transformer block (STrXL) proposed in this paper is optimal and boosts energy efficiency by an average of 22.1% for these complex spread waves over the existing spring damper (SD) controller. Furthermore, unlike the default SD controller, the transformer controller almost eliminated the mechanical stress from the rotational yaw motion for angled waves. Demo: https://tinyurl.com/yueda3jh △ Less

Submitted 16 April, 2024; originally announced April 2024.

Comments: IJCAI 2023, Proceedings of the Thirty-Second International Joint Conference on Artificial IntelligenceAugust 2023

Journal ref: IJCAI 2023, Proceedings of the Thirty-Second International Joint Conference on Artificial IntelligenceAugust 2023, Article No 688, Pages 6201 to 6209

arXiv:2404.10786 [pdf]

doi 10.1609/aaai.v38i20.30238

Sustainability of Data Center Digital Twins with Reinforcement Learning

Authors: Soumyendu Sarkar, Avisek Naug, Antonio Guillen, Ricardo Luna, Vineet Gundecha, Ashwin Ramesh Babu, Sajad Mousavi

Abstract: The rapid growth of machine learning (ML) has led to an increased demand for computational power, resulting in larger data centers (DCs) and higher energy consumption. To address this issue and reduce carbon emissions, intelligent design and control of DC components such as IT servers, cabinets, HVAC cooling, flexible load shifting, and battery energy storage are essential. However, the complexity… ▽ More The rapid growth of machine learning (ML) has led to an increased demand for computational power, resulting in larger data centers (DCs) and higher energy consumption. To address this issue and reduce carbon emissions, intelligent design and control of DC components such as IT servers, cabinets, HVAC cooling, flexible load shifting, and battery energy storage are essential. However, the complexity of designing and controlling them in tandem presents a significant challenge. While some individual components like CFD-based design and Reinforcement Learning (RL) based HVAC control have been researched, there's a gap in the holistic design and optimization covering all elements simultaneously. To tackle this, we've developed DCRL-Green, a multi-agent RL environment that empowers the ML community to design data centers and research, develop, and refine RL controllers for carbon footprint reduction in DCs. It is a flexible, modular, scalable, and configurable platform that can handle large High Performance Computing (HPC) clusters. Furthermore, in its default setup, DCRL-Green provides a benchmark for evaluating single as well as multi-agent RL algorithms. It easily allows users to subclass the default implementations and design their own control approaches, encouraging community development for sustainable data centers. Open Source Link: https://github.com/HewlettPackard/dc-rl △ Less

Submitted 16 April, 2024; originally announced April 2024.

Comments: 2024 Proceedings of the AAAI Conference on Artificial Intelligence

Journal ref: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 20, pp. 22322-22330, Mar. 2024

arXiv:2403.18985 [pdf]

doi 10.1609/aaai.v38i21.30579

Robustness and Visual Explanation for Black Box Image, Video, and ECG Signal Classification with Reinforcement Learning

Authors: Soumyendu Sarkar, Ashwin Ramesh Babu, Sajad Mousavi, Vineet Gundecha, Avisek Naug, Sahand Ghorbanpour

Abstract: We present a generic Reinforcement Learning (RL) framework optimized for crafting adversarial attacks on different model types spanning from ECG signal analysis (1D), image classification (2D), and video classification (3D). The framework focuses on identifying sensitive regions and inducing misclassifications with minimal distortions and various distortion types. The novel RL method outperforms s… ▽ More We present a generic Reinforcement Learning (RL) framework optimized for crafting adversarial attacks on different model types spanning from ECG signal analysis (1D), image classification (2D), and video classification (3D). The framework focuses on identifying sensitive regions and inducing misclassifications with minimal distortions and various distortion types. The novel RL method outperforms state-of-the-art methods for all three applications, proving its efficiency. Our RL approach produces superior localization masks, enhancing interpretability for image classification and ECG analysis models. For applications such as ECG analysis, our platform highlights critical ECG segments for clinicians while ensuring resilience against prevalent distortions. This comprehensive tool aims to bolster both resilience with adversarial training and transparency across varied applications and data types. △ Less

Submitted 22 April, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

Comments: AAAI Proceedings reference: https://ojs.aaai.org/index.php/AAAI/article/view/30579

Journal ref: 2024 Proceedings of the AAAI Conference on Artificial Intelligence

arXiv:2403.14092 [pdf]

Carbon Footprint Reduction for Sustainable Data Centers in Real-Time

Authors: Soumyendu Sarkar, Avisek Naug, Ricardo Luna, Antonio Guillen, Vineet Gundecha, Sahand Ghorbanpour, Sajad Mousavi, Dejan Markovikj, Ashwin Ramesh Babu

Abstract: As machine learning workloads significantly increase energy consumption, sustainable data centers with low carbon emissions are becoming a top priority for governments and corporations worldwide. This requires a paradigm shift in optimizing power consumption in cooling and IT loads, shifting flexible loads based on the availability of renewable energy in the power grid, and leveraging battery stor… ▽ More As machine learning workloads significantly increase energy consumption, sustainable data centers with low carbon emissions are becoming a top priority for governments and corporations worldwide. This requires a paradigm shift in optimizing power consumption in cooling and IT loads, shifting flexible loads based on the availability of renewable energy in the power grid, and leveraging battery storage from the uninterrupted power supply in data centers, using collaborative agents. The complex association between these optimization strategies and their dependencies on variable external factors like weather and the power grid carbon intensity makes this a hard problem. Currently, a real-time controller to optimize all these goals simultaneously in a dynamic real-world setting is lacking. We propose a Data Center Carbon Footprint Reduction (DC-CFR) multi-agent Reinforcement Learning (MARL) framework that optimizes data centers for the multiple objectives of carbon footprint reduction, energy consumption, and energy cost. The results show that the DC-CFR MARL agents effectively resolved the complex interdependencies in optimizing cooling, load shifting, and energy storage in real-time for various locations under real-world dynamic weather and grid carbon intensity conditions. DC-CFR significantly outperformed the industry standard ASHRAE controller with a considerable reduction in carbon emissions (14.5%), energy usage (14.4%), and energy cost (13.7%) when evaluated over one year across multiple geographical regions. △ Less

Submitted 25 March, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

Journal ref: 2024 Proceedings of the AAAI Conference on Artificial Intelligence

arXiv:2310.18679 [pdf]

N-Critics: Self-Refinement of Large Language Models with Ensemble of Critics

Authors: Sajad Mousavi, Ricardo Luna Gutiérrez, Desik Rengarajan, Vineet Gundecha, Ashwin Ramesh Babu, Avisek Naug, Antonio Guillen, Soumyendu Sarkar

Abstract: We propose a self-correction mechanism for Large Language Models (LLMs) to mitigate issues such as toxicity and fact hallucination. This method involves refining model outputs through an ensemble of critics and the model's own feedback. Drawing inspiration from human behavior, we explore whether LLMs can emulate the self-correction process observed in humans who often engage in self-reflection and… ▽ More We propose a self-correction mechanism for Large Language Models (LLMs) to mitigate issues such as toxicity and fact hallucination. This method involves refining model outputs through an ensemble of critics and the model's own feedback. Drawing inspiration from human behavior, we explore whether LLMs can emulate the self-correction process observed in humans who often engage in self-reflection and seek input from others to refine their understanding of complex topics. Our approach is model-agnostic and can be applied across various domains to enhance trustworthiness by addressing fairness, bias, and robustness concerns. We consistently observe performance improvements in LLMs for reducing toxicity and correcting factual errors. △ Less

Submitted 8 November, 2023; v1 submitted 28 October, 2023; originally announced October 2023.

Journal ref: NeurIPS 2023 Workshop on Robustness of Few-shot and Zero-shot Learning in Foundation Models 2023(NeurIPS 2023)

arXiv:2310.18626 [pdf]

Benchmark Generation Framework with Customizable Distortions for Image Classifier Robustness

Authors: Soumyendu Sarkar, Ashwin Ramesh Babu, Sajad Mousavi, Zachariah Carmichael, Vineet Gundecha, Sahand Ghorbanpour, Ricardo Luna, Gutierrez Antonio Guillen, Avisek Naug

Abstract: We present a novel framework for generating adversarial benchmarks to evaluate the robustness of image classification models. Our framework allows users to customize the types of distortions to be optimally applied to images, which helps address the specific distortions relevant to their deployment. The benchmark can generate datasets at various distortion levels to assess the robustness of differ… ▽ More We present a novel framework for generating adversarial benchmarks to evaluate the robustness of image classification models. Our framework allows users to customize the types of distortions to be optimally applied to images, which helps address the specific distortions relevant to their deployment. The benchmark can generate datasets at various distortion levels to assess the robustness of different image classifiers. Our results show that the adversarial samples generated by our framework with any of the image classification models, like ResNet-50, Inception-V3, and VGG-16, are effective and transferable to other models causing them to fail. These failures happen even when these models are adversarially retrained using state-of-the-art techniques, demonstrating the generalizability of our adversarial samples. We achieve competitive performance in terms of net $L_2$ distortion compared to state-of-the-art benchmark techniques on CIFAR-10 and ImageNet; however, we demonstrate our framework achieves such results with simple distortions like Gaussian noise without introducing unnatural artifacts or color bleeds. This is made possible by a model-based reinforcement learning (RL) agent and a technique that reduces a deep tree search of the image for model sensitivity to perturbations, to a one-level analysis and action. The flexibility of choosing distortions and setting classification probability thresholds for multiple classes makes our framework suitable for algorithmic audits. △ Less

Submitted 8 November, 2023; v1 submitted 28 October, 2023; originally announced October 2023.

Comments: 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)

arXiv:2310.08715 [pdf, other]

Toward Joint Language Modeling for Speech Units and Text

Authors: Ju-Chieh Chou, Chung-Ming Chien, Wei-Ning Hsu, Karen Livescu, Arun Babu, Alexis Conneau, Alexei Baevski, Michael Auli

Abstract: Speech and text are two major forms of human language. The research community has been focusing on mapping speech to text or vice versa for many years. However, in the field of language modeling, very little effort has been made to model them jointly. In light of this, we explore joint language modeling for speech units and text. Specifically, we compare different speech tokenizers to transform co… ▽ More Speech and text are two major forms of human language. The research community has been focusing on mapping speech to text or vice versa for many years. However, in the field of language modeling, very little effort has been made to model them jointly. In light of this, we explore joint language modeling for speech units and text. Specifically, we compare different speech tokenizers to transform continuous speech signals into discrete units and use different methods to construct mixed speech-text data. We introduce automatic metrics to evaluate how well the joint LM mixes speech and text. We also fine-tune the LM on downstream spoken language understanding (SLU) tasks with different modalities (speech or text) and test its performance to assess the model's learning of shared representations. Our results show that by mixing speech units and text with our proposed mixing techniques, the joint LM improves over a speech-only baseline on SLU tasks and shows zero-shot cross-modal transferability. △ Less

Submitted 12 October, 2023; originally announced October 2023.

Comments: EMNLP findings 2023

arXiv:2310.03912 [pdf]

doi 10.1109/CASE56687.2023.10260520

RTDK-BO: High Dimensional Bayesian Optimization with Reinforced Transformer Deep kernels

Authors: Alexander Shmakov, Avisek Naug, Vineet Gundecha, Sahand Ghorbanpour, Ricardo Luna Gutierrez, Ashwin Ramesh Babu, Antonio Guillen, Soumyendu Sarkar

Abstract: Bayesian Optimization (BO), guided by Gaussian process (GP) surrogates, has proven to be an invaluable technique for efficient, high-dimensional, black-box optimization, a critical problem inherent to many applications such as industrial design and scientific computing. Recent contributions have introduced reinforcement learning (RL) to improve the optimization performance on both single function… ▽ More Bayesian Optimization (BO), guided by Gaussian process (GP) surrogates, has proven to be an invaluable technique for efficient, high-dimensional, black-box optimization, a critical problem inherent to many applications such as industrial design and scientific computing. Recent contributions have introduced reinforcement learning (RL) to improve the optimization performance on both single function optimization and \textit{few-shot} multi-objective optimization. However, even few-shot techniques fail to exploit similarities shared between closely related objectives. In this paper, we combine recent developments in Deep Kernel Learning (DKL) and attention-based Transformer models to improve the modeling powers of GP surrogates with meta-learning. We propose a novel method for improving meta-learning BO surrogates by incorporating attention mechanisms into DKL, empowering the surrogates to adapt to contextual information gathered during the BO process. We combine this Transformer Deep Kernel with a learned acquisition function trained with continuous Soft Actor-Critic Reinforcement Learning to aid in exploration. This Reinforced Transformer Deep Kernel (RTDK-BO) approach yields state-of-the-art results in continuous high-dimensional optimization problems. △ Less

Submitted 8 November, 2023; v1 submitted 5 October, 2023; originally announced October 2023.

Comments: 2023 IEEE 19th International Conference on Automation Science and Engineering (CASE)

arXiv:2310.03906 [pdf]

doi 10.1145/3600100.3623732

PyDCM: Custom Data Center Models with Reinforcement Learning for Sustainability

Authors: Avisek Naug, Antonio Guillen, Ricardo Luna Gutiérrez, Vineet Gundecha, Dejan Markovikj, Lekhapriya Dheeraj Kashyap, Lorenz Krause, Sahand Ghorbanpour, Sajad Mousavi, Ashwin Ramesh Babu, Soumyendu Sarkar

Abstract: The increasing global emphasis on sustainability and reducing carbon emissions is pushing governments and corporations to rethink their approach to data center design and operation. Given their high energy consumption and exponentially large computational workloads, data centers are prime candidates for optimizing power consumption, especially in areas such as cooling and IT energy usage. A signif… ▽ More The increasing global emphasis on sustainability and reducing carbon emissions is pushing governments and corporations to rethink their approach to data center design and operation. Given their high energy consumption and exponentially large computational workloads, data centers are prime candidates for optimizing power consumption, especially in areas such as cooling and IT energy usage. A significant challenge in this pursuit is the lack of a configurable and scalable thermal data center model that offers an end-to-end pipeline. Data centers consist of multiple IT components whose geometric configuration and heat dissipation make thermal modeling difficult. This paper presents PyDCM, a customizable Data Center Model implemented in Python, that allows users to create unique configurations of IT equipment with custom server specifications and geometric arrangements of IT cabinets. The use of vectorized thermal calculations makes PyDCM orders of magnitude faster (30 times) than current Energy Plus modeling implementations and scales sublinearly with the number of CPUs. Also, PyDCM enables the use of Deep Reinforcement Learning via the Gymnasium wrapper to optimize data center cooling and offers a user-friendly platform for testing various data center design prototypes. △ Less

Submitted 26 March, 2024; v1 submitted 5 October, 2023; originally announced October 2023.

Comments: The 10th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation (BuildSys '23), November 15-16, 2023, Istanbul, Turkey

Journal ref: 2023 BuildSys '23: Proceedings of the 10th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation

arXiv:2309.02591 [pdf, other]

Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning

Authors: Lili Yu, Bowen Shi, Ramakanth Pasunuru, Benjamin Muller, Olga Golovneva, Tianlu Wang, Arun Babu, Binh Tang, Brian Karrer, Shelly Sheynin, Candace Ross, Adam Polyak, Russell Howes, Vasu Sharma, Puxin Xu, Hovhannes Tamoyan, Oron Ashual, Uriel Singer, Shang-Wen Li, Susan Zhang, Richard James, Gargi Ghosh, Yaniv Taigman, Maryam Fazel-Zarandi, Asli Celikyilmaz , et al. (2 additional authors not shown)

Abstract: We present CM3Leon (pronounced "Chameleon"), a retrieval-augmented, token-based, decoder-only multi-modal language model capable of generating and infilling both text and images. CM3Leon uses the CM3 multi-modal architecture but additionally shows the extreme benefits of scaling up and tuning on more diverse instruction-style data. It is the first multi-modal model trained with a recipe adapted fr… ▽ More We present CM3Leon (pronounced "Chameleon"), a retrieval-augmented, token-based, decoder-only multi-modal language model capable of generating and infilling both text and images. CM3Leon uses the CM3 multi-modal architecture but additionally shows the extreme benefits of scaling up and tuning on more diverse instruction-style data. It is the first multi-modal model trained with a recipe adapted from text-only language models, including a large-scale retrieval-augmented pre-training stage and a second multi-task supervised fine-tuning (SFT) stage. It is also a general-purpose model that can do both text-to-image and image-to-text generation, allowing us to introduce self-contained contrastive decoding methods that produce high-quality outputs. Extensive experiments demonstrate that this recipe is highly effective for multi-modal models. CM3Leon achieves state-of-the-art performance in text-to-image generation with 5x less training compute than comparable methods (zero-shot MS-COCO FID of 4.88). After SFT, CM3Leon can also demonstrate unprecedented levels of controllability in tasks ranging from language-guided image editing to image-controlled generation and segmentation. △ Less

Submitted 5 September, 2023; originally announced September 2023.

arXiv:2308.11814 [pdf, other]

Evaluation of Deep Neural Operator Models toward Ocean Forecasting

Authors: Ellery Rajagopal, Anantha N. S. Babu, Tony Ryu, Patrick J. Haley Jr., Chris Mirabito, Pierre F. J. Lermusiaux

Abstract: Data-driven, deep-learning modeling frameworks have been recently developed for forecasting time series data. Such machine learning models may be useful in multiple domains including the atmospheric and oceanic ones, and in general, the larger fluids community. The present work investigates the possible effectiveness of such deep neural operator models for reproducing and predicting classic fluid… ▽ More Data-driven, deep-learning modeling frameworks have been recently developed for forecasting time series data. Such machine learning models may be useful in multiple domains including the atmospheric and oceanic ones, and in general, the larger fluids community. The present work investigates the possible effectiveness of such deep neural operator models for reproducing and predicting classic fluid flows and simulations of realistic ocean dynamics. We first briefly evaluate the capabilities of such deep neural operator models when trained on a simulated two-dimensional fluid flow past a cylinder. We then investigate their application to forecasting ocean surface circulation in the Middle Atlantic Bight and Massachusetts Bay, learning from high-resolution data-assimilative simulations employed for real sea experiments. We confirm that trained deep neural operator models are capable of predicting idealized periodic eddy shedding. For realistic ocean surface flows and our preliminary study, they can predict several of the features and show some skill, providing potential for future research and applications. △ Less

Submitted 22 August, 2023; originally announced August 2023.

Comments: Rajagopal, E., A.N.S. Babu, T. Ryu, P.J. Haley, Jr., C. Mirabito, and P.F.J. Lermusiaux, 2023. Evaluation of Deep Neural Operator Models toward Ocean Forecasting. In OCEANS' 23 IEEE/MTS Gulf Coast, 25-28 September 2023, in press

MSC Class: 76U60; 86A05; 86-08; 86A10; 86A08; 68T01; 68T07; 68T37 ACM Class: J.2; I.2; I.6

arXiv:2305.13516 [pdf, other]

Scaling Speech Technology to 1,000+ Languages

Authors: Vineel Pratap, Andros Tjandra, Bowen Shi, Paden Tomasello, Arun Babu, Sayani Kundu, Ali Elkahky, Zhaoheng Ni, Apoorv Vyas, Maryam Fazel-Zarandi, Alexei Baevski, Yossi Adi, Xiaohui Zhang, Wei-Ning Hsu, Alexis Conneau, Michael Auli

Abstract: Expanding the language coverage of speech technology has the potential to improve access to information for many more people. However, current speech technology is restricted to about one hundred languages which is a small fraction of the over 7,000 languages spoken around the world. The Massively Multilingual Speech (MMS) project increases the number of supported languages by 10-40x, depending on… ▽ More Expanding the language coverage of speech technology has the potential to improve access to information for many more people. However, current speech technology is restricted to about one hundred languages which is a small fraction of the over 7,000 languages spoken around the world. The Massively Multilingual Speech (MMS) project increases the number of supported languages by 10-40x, depending on the task. The main ingredients are a new dataset based on readings of publicly available religious texts and effectively leveraging self-supervised learning. We built pre-trained wav2vec 2.0 models covering 1,406 languages, a single multilingual automatic speech recognition model for 1,107 languages, speech synthesis models for the same number of languages, as well as a language identification model for 4,017 languages. Experiments show that our multilingual speech recognition model more than halves the word error rate of Whisper on 54 languages of the FLEURS benchmark while being trained on a small fraction of the labeled data. △ Less

Submitted 22 May, 2023; originally announced May 2023.

arXiv:2304.04297 [pdf, other]

AI-assisted Automated Workflow for Real-time X-ray Ptychography Data Analysis via Federated Resources

Authors: Anakha V Babu, Tekin Bicer, Saugat Kandel, Tao Zhou, Daniel J. Ching, Steven Henke, Siniša Veseli, Ryan Chard, Antonino Miceli, Mathew Joseph Cherukara

Abstract: We present an end-to-end automated workflow that uses large-scale remote compute resources and an embedded GPU platform at the edge to enable AI/ML-accelerated real-time analysis of data collected for x-ray ptychography. Ptychography is a lensless method that is being used to image samples through a simultaneous numerical inversion of a large number of diffraction patterns from adjacent overlappin… ▽ More We present an end-to-end automated workflow that uses large-scale remote compute resources and an embedded GPU platform at the edge to enable AI/ML-accelerated real-time analysis of data collected for x-ray ptychography. Ptychography is a lensless method that is being used to image samples through a simultaneous numerical inversion of a large number of diffraction patterns from adjacent overlapping scan positions. This acquisition method can enable nanoscale imaging with x-rays and electrons, but this often requires very large experimental datasets and commensurately high turnaround times, which can limit experimental capabilities such as real-time experimental steering and low-latency monitoring. In this work, we introduce a software system that can automate ptychography data analysis tasks. We accelerate the data analysis pipeline by using a modified version of PtychoNN -- an ML-based approach to solve phase retrieval problem that shows two orders of magnitude speedup compared to traditional iterative methods. Further, our system coordinates and overlaps different data analysis tasks to minimize synchronization overhead between different stages of the workflow. We evaluate our workflow system with real-world experimental workloads from the 26ID beamline at Advanced Photon Source and ThetaGPU cluster at Argonne Leadership Computing Resources. △ Less

Submitted 9 April, 2023; originally announced April 2023.

Comments: 7 pages, 1 figure, to be published in High Performance Computing for Imaging Conference, Electronic Imaging (HPCI 2023)

arXiv:2212.07525 [pdf, other]

Efficient Self-supervised Learning with Contextualized Target Representations for Vision, Speech and Language

Authors: Alexei Baevski, Arun Babu, Wei-Ning Hsu, Michael Auli

Abstract: Current self-supervised learning algorithms are often modality-specific and require large amounts of computational resources. To address these issues, we increase the training efficiency of data2vec, a learning objective that generalizes across several modalities. We do not encode masked tokens, use a fast convolutional decoder and amortize the effort to build teacher representations. data2vec 2.0… ▽ More Current self-supervised learning algorithms are often modality-specific and require large amounts of computational resources. To address these issues, we increase the training efficiency of data2vec, a learning objective that generalizes across several modalities. We do not encode masked tokens, use a fast convolutional decoder and amortize the effort to build teacher representations. data2vec 2.0 benefits from the rich contextualized target representations introduced in data2vec which enable a fast self-supervised learner. Experiments on ImageNet-1K image classification show that data2vec 2.0 matches the accuracy of Masked Autoencoders in 16.4x lower pre-training time, on Librispeech speech recognition it performs as well as wav2vec 2.0 in 10.6x less time, and on GLUE natural language understanding it matches a retrained RoBERTa model in half the time. Trading some speed for accuracy results in ImageNet-1K top-1 accuracy of 86.8\% with a ViT-L model trained for 150 epochs. △ Less

Submitted 15 June, 2023; v1 submitted 14 December, 2022; originally announced December 2022.

arXiv:2210.11546 [pdf, other]

Proof of Backhaul: Trustfree Measurement of Broadband Bandwidth

Authors: Peiyao Sheng, Nikita Yadav, Vishal Sevani, Arun Babu, SVR Anand, Himanshu Tyagi, Pramod Viswanath

Abstract: Recent years have seen the emergence of decentralized wireless networks consisting of nodes hosted by many individuals and small enterprises, reawakening the decades-old dream of open networking. These networks have been deployed in an organic, distributed manner and are driven by new economic models resting on tokenized incentives. A critical requirement for the incentives to scale is the ability… ▽ More Recent years have seen the emergence of decentralized wireless networks consisting of nodes hosted by many individuals and small enterprises, reawakening the decades-old dream of open networking. These networks have been deployed in an organic, distributed manner and are driven by new economic models resting on tokenized incentives. A critical requirement for the incentives to scale is the ability to prove network performance in a decentralized trustfree manner, i.e., a Byzantine fault tolerant network telemetry system. In this paper, we present a Proof of Backhaul (PoB) protocol which measures the bandwidth of the (broadband) backhaul link of a wireless access point, termed prover, in a decentralized and trustfree manner. In particular, our proposed protocol is the first one to satisfy the following two properties: (1) Trustfree. Bandwidth measurement is secure against Byzantine attacks by collaborations of challenge servers and the prover. (2) Open. The barrier-to-entry for being a challenge server is low; there is no requirement of having a low latency and high throughput path to the measured link. At a high-level, our protocol aggregates the challenge traffic from multiple challenge servers and uses cryptographic primitives to ensure that a subset of challengers or, even challengers and provers, cannot maliciously modify results in their favor. A formal security model allows us to establish guarantees of accurate bandwidth measurement as a function of the fraction of malicious actors. Our evaluation shows that our PoB protocol can verify backhaul bandwidth of up to 1000 Mbps with less than 8% error using measurements lasting only 100 ms. The measurement accuracy is not affected in the presence of corrupted challengers. Importantly, the basic verification protocol lends itself to a minor modification that can measure available bandwidth even in the presence of cross-traffic. △ Less

Submitted 20 October, 2022; originally announced October 2022.

arXiv:2209.09408 [pdf, other]

Deep learning at the edge enables real-time streaming ptychographic imaging

Authors: Anakha V Babu, Tao Zhou, Saugat Kandel, Tekin Bicer, Zhengchun Liu, William Judge, Daniel J. Ching, Yi Jiang, Sinisa Veseli, Steven Henke, Ryan Chard, Yudong Yao, Ekaterina Sirazitdinova, Geetika Gupta, Martin V. Holt, Ian T. Foster, Antonino Miceli, Mathew J. Cherukara

Abstract: Coherent microscopy techniques provide an unparalleled multi-scale view of materials across scientific and technological fields, from structural materials to quantum devices, from integrated circuits to biological cells. Driven by the construction of brighter sources and high-rate detectors, coherent X-ray microscopy methods like ptychography are poised to revolutionize nanoscale materials charact… ▽ More Coherent microscopy techniques provide an unparalleled multi-scale view of materials across scientific and technological fields, from structural materials to quantum devices, from integrated circuits to biological cells. Driven by the construction of brighter sources and high-rate detectors, coherent X-ray microscopy methods like ptychography are poised to revolutionize nanoscale materials characterization. However, associated significant increases in data and compute needs mean that conventional approaches no longer suffice for recovering sample images in real-time from high-speed coherent imaging experiments. Here, we demonstrate a workflow that leverages artificial intelligence at the edge and high-performance computing to enable real-time inversion on X-ray ptychography data streamed directly from a detector at up to 2 kHz. The proposed AI-enabled workflow eliminates the sampling constraints imposed by traditional ptychography, allowing low dose imaging using orders of magnitude less data than required by traditional methods. △ Less

Submitted 19 September, 2022; originally announced September 2022.

arXiv:2209.05656 [pdf]

doi 10.1109/CASE49997.2022.9926561

Skip Training for Multi-Agent Reinforcement Learning Controller for Industrial Wave Energy Converters

Authors: Soumyendu Sarkar, Vineet Gundecha, Sahand Ghorbanpour, Alexander Shmakov, Ashwin Ramesh Babu, Alexandre Pichard, Mathieu Cocho

Abstract: Recent Wave Energy Converters (WEC) are equipped with multiple legs and generators to maximize energy generation. Traditional controllers have shown limitations to capture complex wave patterns and the controllers must efficiently maximize the energy capture. This paper introduces a Multi-Agent Reinforcement Learning controller (MARL), which outperforms the traditionally used spring damper control… ▽ More Recent Wave Energy Converters (WEC) are equipped with multiple legs and generators to maximize energy generation. Traditional controllers have shown limitations to capture complex wave patterns and the controllers must efficiently maximize the energy capture. This paper introduces a Multi-Agent Reinforcement Learning controller (MARL), which outperforms the traditionally used spring damper controller. Our initial studies show that the complex nature of problems makes it hard for training to converge. Hence, we propose a novel skip training approach which enables the MARL training to overcome performance saturation and converge to more optimum controllers compared to default MARL training, boosting power generation. We also present another novel hybrid training initialization (STHTI) approach, where the individual agents of the MARL controllers can be initially trained against the baseline Spring Damper (SD) controller individually and then be trained one agent at a time or all together in future iterations to accelerate convergence. We achieved double-digit gains in energy efficiency over the baseline Spring Damper controller with the proposed MARL controllers using the Asynchronous Advantage Actor-Critic (A3C) algorithm. △ Less

Submitted 12 September, 2022; originally announced September 2022.

Comments: 2022 IEEE 18th International Conference on Automation Science and Engineering (CASE) August 20-24, 2022

Report number: 02

Journal ref: 2022 IEEE 18th International Conference on Automation Science and Engineering (CASE)

arXiv:2202.03555 [pdf, other]

data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language

Authors: Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli

Abstract: While the general idea of self-supervised learning is identical across modalities, the actual algorithms and objectives differ widely because they were developed with a single modality in mind. To get us closer to general self-supervised learning, we present data2vec, a framework that uses the same learning method for either speech, NLP or computer vision. The core idea is to predict latent repres… ▽ More While the general idea of self-supervised learning is identical across modalities, the actual algorithms and objectives differ widely because they were developed with a single modality in mind. To get us closer to general self-supervised learning, we present data2vec, a framework that uses the same learning method for either speech, NLP or computer vision. The core idea is to predict latent representations of the full input data based on a masked view of the input in a self-distillation setup using a standard Transformer architecture. Instead of predicting modality-specific targets such as words, visual tokens or units of human speech which are local in nature, data2vec predicts contextualized latent representations that contain information from the entire input. Experiments on the major benchmarks of speech recognition, image classification, and natural language understanding demonstrate a new state of the art or competitive performance to predominant approaches. △ Less

Submitted 25 October, 2022; v1 submitted 7 February, 2022; originally announced February 2022.

arXiv:2111.09296 [pdf, other]

XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale

Authors: Arun Babu, Changhan Wang, Andros Tjandra, Kushal Lakhotia, Qiantong Xu, Naman Goyal, Kritika Singh, Patrick von Platen, Yatharth Saraf, Juan Pino, Alexei Baevski, Alexis Conneau, Michael Auli

Abstract: This paper presents XLS-R, a large-scale model for cross-lingual speech representation learning based on wav2vec 2.0. We train models with up to 2B parameters on nearly half a million hours of publicly available speech audio in 128 languages, an order of magnitude more public data than the largest known prior work. Our evaluation covers a wide range of tasks, domains, data regimes and languages, b… ▽ More This paper presents XLS-R, a large-scale model for cross-lingual speech representation learning based on wav2vec 2.0. We train models with up to 2B parameters on nearly half a million hours of publicly available speech audio in 128 languages, an order of magnitude more public data than the largest known prior work. Our evaluation covers a wide range of tasks, domains, data regimes and languages, both high and low-resource. On the CoVoST-2 speech translation benchmark, we improve the previous state of the art by an average of 7.4 BLEU over 21 translation directions into English. For speech recognition, XLS-R improves over the best known prior work on BABEL, MLS, CommonVoice as well as VoxPopuli, lowering error rates by 14-34% relative on average. XLS-R also sets a new state of the art on VoxLingua107 language identification. Moreover, we show that with sufficient model size, cross-lingual pretraining can outperform English-only pretraining when translating English speech into other languages, a setting which favors monolingual pretraining. We hope XLS-R can help to improve speech processing tasks for many more languages of the world. △ Less

Submitted 16 December, 2021; v1 submitted 17 November, 2021; originally announced November 2021.

arXiv:2109.06262 [pdf, other]

Evaluating Multiway Multilingual NMT in the Turkic Languages

Authors: Jamshidbek Mirzakhalov, Anoop Babu, Aigiz Kunafin, Ahsan Wahab, Behzod Moydinboyev, Sardana Ivanova, Mokhiyakhon Uzokova, Shaxnoza Pulatova, Duygu Ataman, Julia Kreutzer, Francis Tyers, Orhan Firat, John Licato, Sriram Chellappan

Abstract: Despite the increasing number of large and comprehensive machine translation (MT) systems, evaluation of these methods in various languages has been restrained by the lack of high-quality parallel corpora as well as engagement with the people that speak these languages. In this study, we present an evaluation of state-of-the-art approaches to training and evaluating MT systems in 22 languages from… ▽ More Despite the increasing number of large and comprehensive machine translation (MT) systems, evaluation of these methods in various languages has been restrained by the lack of high-quality parallel corpora as well as engagement with the people that speak these languages. In this study, we present an evaluation of state-of-the-art approaches to training and evaluating MT systems in 22 languages from the Turkic language family, most of which being extremely under-explored. First, we adopt the TIL Corpus with a few key improvements to the training and the evaluation sets. Then, we train 26 bilingual baselines as well as a multi-way neural MT (MNMT) model using the corpus and perform an extensive analysis using automatic metrics as well as human evaluations. We find that the MNMT model outperforms almost all bilingual baselines in the out-of-domain test sets and finetuning the model on a downstream task of a single pair also results in a huge performance boost in both low- and high-resource scenarios. Our attentive analysis of evaluation criteria for MT models in Turkic languages also points to the necessity for further research in this direction. We release the corpus splits, test sets as well as models to the public. △ Less

Submitted 13 September, 2021; originally announced September 2021.

Comments: 9 pages, 3 figures, 7 tables. To be presented at WMT 2021

arXiv:2109.04593 [pdf, other]

A Large-Scale Study of Machine Translation in the Turkic Languages

Authors: Jamshidbek Mirzakhalov, Anoop Babu, Duygu Ataman, Sherzod Kariev, Francis Tyers, Otabek Abduraufov, Mammad Hajili, Sardana Ivanova, Abror Khaytbaev, Antonio Laverghetta Jr., Behzodbek Moydinboyev, Esra Onal, Shaxnoza Pulatova, Ahsan Wahab, Orhan Firat, Sriram Chellappan

Abstract: Recent advances in neural machine translation (NMT) have pushed the quality of machine translation systems to the point where they are becoming widely adopted to build competitive systems. However, there is still a large number of languages that are yet to reap the benefits of NMT. In this paper, we provide the first large-scale case study of the practical application of MT in the Turkic language… ▽ More Recent advances in neural machine translation (NMT) have pushed the quality of machine translation systems to the point where they are becoming widely adopted to build competitive systems. However, there is still a large number of languages that are yet to reap the benefits of NMT. In this paper, we provide the first large-scale case study of the practical application of MT in the Turkic language family in order to realize the gains of NMT for Turkic languages under high-resource to extremely low-resource scenarios. In addition to presenting an extensive analysis that identifies the bottlenecks towards building competitive systems to ameliorate data scarcity, our study has several key contributions, including, i) a large parallel corpus covering 22 Turkic languages consisting of common public datasets in combination with new datasets of approximately 2 million parallel sentences, ii) bilingual baselines for 26 language pairs, iii) novel high-quality test sets in three different translation domains and iv) human evaluation scores. All models, scripts, and data will be released to the public. △ Less

Submitted 9 September, 2021; originally announced September 2021.

Comments: 9 pages, 1 figure, 8 tables. Main proceedings of EMNLP 2021

arXiv:2106.15009 [pdf, other]

Understanding Cognitive Fatigue from fMRI Scans with Self-supervised Learning

Authors: Ashish Jaiswal, Ashwin Ramesh Babu, Mohammad Zaki Zadeh, Fillia Makedon, Glenn Wylie

Abstract: Functional magnetic resonance imaging (fMRI) is a neuroimaging technique that records neural activations in the brain by capturing the blood oxygen level in different regions based on the task performed by a subject. Given fMRI data, the problem of predicting the state of cognitive fatigue in a person has not been investigated to its full extent. This paper proposes tackling this issue as a multi-… ▽ More Functional magnetic resonance imaging (fMRI) is a neuroimaging technique that records neural activations in the brain by capturing the blood oxygen level in different regions based on the task performed by a subject. Given fMRI data, the problem of predicting the state of cognitive fatigue in a person has not been investigated to its full extent. This paper proposes tackling this issue as a multi-class classification problem by dividing the state of cognitive fatigue into six different levels, ranging from no-fatigue to extreme fatigue conditions. We built a spatio-temporal model that uses convolutional neural networks (CNN) for spatial feature extraction and a long short-term memory (LSTM) network for temporal modeling of 4D fMRI scans. We also applied a self-supervised method called MoCo (Momentum Contrast) to pre-train our model on a public dataset BOLD5000 and fine-tuned it on our labeled dataset to predict cognitive fatigue. Our novel dataset contains fMRI scans from Traumatic Brain Injury (TBI) patients and healthy controls (HCs) while performing a series of N-back cognitive tasks. This method establishes a state-of-the-art technique to analyze cognitive fatigue from fMRI data and beats previous approaches to solve this problem. △ Less

Submitted 19 September, 2021; v1 submitted 28 June, 2021; originally announced June 2021.

Comments: 8 pages, 5 figures, 2 tables

arXiv:2106.11890 [pdf, other]

Latency-Aware Neural Architecture Search with Multi-Objective Bayesian Optimization

Authors: David Eriksson, Pierce I-Jen Chuang, Samuel Daulton, Peng Xia, Akshat Shrivastava, Arun Babu, Shicong Zhao, Ahmed Aly, Ganesh Venkatesh, Maximilian Balandat

Abstract: When tuning the architecture and hyperparameters of large machine learning models for on-device deployment, it is desirable to understand the optimal trade-offs between on-device latency and model accuracy. In this work, we leverage recent methodological advances in Bayesian optimization over high-dimensional search spaces and multi-objective Bayesian optimization to efficiently explore these trad… ▽ More When tuning the architecture and hyperparameters of large machine learning models for on-device deployment, it is desirable to understand the optimal trade-offs between on-device latency and model accuracy. In this work, we leverage recent methodological advances in Bayesian optimization over high-dimensional search spaces and multi-objective Bayesian optimization to efficiently explore these trade-offs for a production-scale on-device natural language understanding model at Facebook. △ Less

Submitted 25 June, 2021; v1 submitted 22 June, 2021; originally announced June 2021.

Comments: To Appear at the 8th ICML Workshop on Automated Machine Learning, ICML 2021

arXiv:2104.07275 [pdf, other]

Span Pointer Networks for Non-Autoregressive Task-Oriented Semantic Parsing

Authors: Akshat Shrivastava, Pierce Chuang, Arun Babu, Shrey Desai, Abhinav Arora, Alexander Zotov, Ahmed Aly

Abstract: An effective recipe for building seq2seq, non-autoregressive, task-oriented parsers to map utterances to semantic frames proceeds in three steps: encoding an utterance $x$, predicting a frame's length |y|, and decoding a |y|-sized frame with utterance and ontology tokens. Though empirically strong, these models are typically bottlenecked by length prediction, as even small inaccuracies change the… ▽ More An effective recipe for building seq2seq, non-autoregressive, task-oriented parsers to map utterances to semantic frames proceeds in three steps: encoding an utterance $x$, predicting a frame's length |y|, and decoding a |y|-sized frame with utterance and ontology tokens. Though empirically strong, these models are typically bottlenecked by length prediction, as even small inaccuracies change the syntactic and semantic characteristics of resulting frames. In our work, we propose span pointer networks, non-autoregressive parsers which shift the decoding task from text generation to span prediction; that is, when imputing utterance spans into frame slots, our model produces endpoints (e.g., [i, j]) as opposed to text (e.g., "6pm"). This natural quantization of the output space reduces the variability of gold frames, therefore improving length prediction and, ultimately, exact match. Furthermore, length prediction is now responsible for frame syntax and the decoder is responsible for frame semantics, resulting in a coarse-to-fine model. We evaluate our approach on several task-oriented semantic parsing datasets. Notably, we bridge the quality gap between non-autogressive and autoregressive parsers, achieving 87 EM on TOPv2 (Chen et al. 2020). Furthermore, due to our more consistent gold frames, we show strong improvements in model generalization in both cross-domain and cross-lingual transfer in low-resource settings. Finally, due to our diminished output vocabulary, we observe 70% reduction in latency and 83% reduction in memory at beam size 5 compared to prior non-autoregressive parsers. △ Less

Submitted 14 September, 2021; v1 submitted 15 April, 2021; originally announced April 2021.

arXiv:2104.04923 [pdf, other]

Non-Autoregressive Semantic Parsing for Compositional Task-Oriented Dialog

Authors: Arun Babu, Akshat Shrivastava, Armen Aghajanyan, Ahmed Aly, Angela Fan, Marjan Ghazvininejad

Abstract: Semantic parsing using sequence-to-sequence models allows parsing of deeper representations compared to traditional word tagging based models. In spite of these advantages, widespread adoption of these models for real-time conversational use cases has been stymied by higher compute requirements and thus higher latency. In this work, we propose a non-autoregressive approach to predict semantic pars… ▽ More Semantic parsing using sequence-to-sequence models allows parsing of deeper representations compared to traditional word tagging based models. In spite of these advantages, widespread adoption of these models for real-time conversational use cases has been stymied by higher compute requirements and thus higher latency. In this work, we propose a non-autoregressive approach to predict semantic parse trees with an efficient seq2seq model architecture. By combining non-autoregressive prediction with convolutional neural networks, we achieve significant latency gains and parameter size reduction compared to traditional RNN models. Our novel architecture achieves up to an 81% reduction in latency on TOP dataset and retains competitive performance to non-pretrained models on three different semantic parsing datasets. Our code is available at https://github.com/facebookresearch/pytext △ Less

Submitted 11 April, 2021; originally announced April 2021.

arXiv:2012.08662 [pdf, other]

doi 10.1145/3453892.3453999

Automated system to measure Tandem Gait to assess executive functions in children

Authors: Mohammad Zaki Zadeh, Ashwin Ramesh Babu, Ashish Jaiswal, Maria Kyrarini, Morris Bell, Fillia Makedon

Abstract: As mobile technologies have become ubiquitous in recent years, computer-based cognitive tests have become more popular and efficient. In this work, we focus on assessing motor function in children by analyzing their gait movements. Although there has been a lot of research on designing automated assessment systems for gait analysis, most of these efforts use obtrusive wearable sensors for measurin… ▽ More As mobile technologies have become ubiquitous in recent years, computer-based cognitive tests have become more popular and efficient. In this work, we focus on assessing motor function in children by analyzing their gait movements. Although there has been a lot of research on designing automated assessment systems for gait analysis, most of these efforts use obtrusive wearable sensors for measuring body movements. We have devised a computer vision-based assessment system that only requires a camera which makes it easier to employ in school or home environments. A dataset has been created with 27 children performing the test. Furthermore in order to improve the accuracy of the system, a deep learning based model was pre-trained on NTU-RGB+D 120 dataset and then it was fine-tuned on our gait dataset. The results highlight the efficacy of proposed work for automating the assessment of children's performances by achieving 76.61% classification accuracy. △ Less

Submitted 28 December, 2020; v1 submitted 15 December, 2020; originally announced December 2020.

arXiv:2011.00362 [pdf, other]

A Survey on Contrastive Self-supervised Learning

Authors: Ashish Jaiswal, Ashwin Ramesh Babu, Mohammad Zaki Zadeh, Debapriya Banerjee, Fillia Makedon

Abstract: Self-supervised learning has gained popularity because of its ability to avoid the cost of annotating large-scale datasets. It is capable of adopting self-defined pseudo labels as supervision and use the learned representations for several downstream tasks. Specifically, contrastive learning has recently become a dominant component in self-supervised learning methods for computer vision, natural l… ▽ More Self-supervised learning has gained popularity because of its ability to avoid the cost of annotating large-scale datasets. It is capable of adopting self-defined pseudo labels as supervision and use the learned representations for several downstream tasks. Specifically, contrastive learning has recently become a dominant component in self-supervised learning methods for computer vision, natural language processing (NLP), and other domains. It aims at embedding augmented versions of the same sample close to each other while trying to push away embeddings from different samples. This paper provides an extensive review of self-supervised methods that follow the contrastive approach. The work explains commonly used pretext tasks in a contrastive learning setup, followed by different architectures that have been proposed so far. Next, we have a performance comparison of different methods for multiple downstream tasks such as image classification, object detection, and action recognition. Finally, we conclude with the limitations of the current methods and the need for further techniques and future directions to make substantial progress. △ Less

Submitted 7 February, 2021; v1 submitted 31 October, 2020; originally announced November 2020.

Comments: 20 pages, 18 figures, 6 tables

arXiv:2008.11755 [pdf, other]

doi 10.1145/3453892.3453893

Self-Supervised Human Activity Recognition by Augmenting Generative Adversarial Networks

Authors: Mohammad Zaki Zadeh, Ashwin Ramesh Babu, Ashish Jaiswal, Fillia Makedon

Abstract: This article proposes a novel approach for augmenting generative adversarial network (GAN) with a self-supervised task in order to improve its ability for encoding video representations that are useful in downstream tasks such as human activity recognition. In the proposed method, input video frames are randomly transformed by different spatial transformations, such as rotation, translation and sh… ▽ More This article proposes a novel approach for augmenting generative adversarial network (GAN) with a self-supervised task in order to improve its ability for encoding video representations that are useful in downstream tasks such as human activity recognition. In the proposed method, input video frames are randomly transformed by different spatial transformations, such as rotation, translation and shearing or temporal transformations such as shuffling temporal order of frames. Then discriminator is encouraged to predict the applied transformation by introducing an auxiliary loss. Subsequently, results prove superiority of the proposed method over baseline methods for providing a useful representation of videos used in human activity recognition performed on datasets such as KTH, UCF101 and Ball-Drop. Ball-Drop dataset is a specifically designed dataset for measuring executive functions in children through physically and cognitively demanding tasks. Using features from proposed method instead of baseline methods caused the top-1 classification accuracy to increase by more then 4%. Moreover, ablation study was performed to investigate the contribution of different transformations on downstream task. △ Less

Submitted 28 December, 2020; v1 submitted 26 August, 2020; originally announced August 2020.

arXiv:2008.02189 [pdf, other]

SpinAPS: A High-Performance Spintronic Accelerator for Probabilistic Spiking Neural Networks

Authors: Anakha V Babu, Osvaldo Simeone, Bipin Rajendran

Abstract: We discuss a high-performance and high-throughput hardware accelerator for probabilistic Spiking Neural Networks (SNNs) based on Generalized Linear Model (GLM) neurons, that uses binary STT-RAM devices as synapses and digital CMOS logic for neurons. The inference accelerator, termed "SpinAPS" for Spintronic Accelerator for Probabilistic SNNs, implements a principled direct learning rule for first-… ▽ More We discuss a high-performance and high-throughput hardware accelerator for probabilistic Spiking Neural Networks (SNNs) based on Generalized Linear Model (GLM) neurons, that uses binary STT-RAM devices as synapses and digital CMOS logic for neurons. The inference accelerator, termed "SpinAPS" for Spintronic Accelerator for Probabilistic SNNs, implements a principled direct learning rule for first-to-spike decoding without the need for conversion from pre-trained ANNs. The proposed solution is shown to achieve comparable performance with an equivalent ANN on handwritten digit and human activity recognition benchmarks. The inference engine, SpinAPS, is shown through software emulation tools to achieve 4x performance improvement in terms of GSOPS/W/mm2 when compared to an equivalent SRAM-based design. The architecture leverages probabilistic spiking neural networks that employ first-to-spike decoding rule to make inference decisions at low latencies, achieving 75% of the test performance in as few as 4 algorithmic time steps on the handwritten digit benchmark. The accelerator also exhibits competitive performance with other memristor-based DNN/SNN accelerators and state-of-the-art GPUs. △ Less

Submitted 5 August, 2020; originally announced August 2020.

Comments: 25 pages, 10 figures, Submitted to Elsevier Neural Networks for review

arXiv:2004.10746 [pdf, other]

Chip Placement with Deep Reinforcement Learning

Authors: Azalia Mirhoseini, Anna Goldie, Mustafa Yazgan, Joe Jiang, Ebrahim Songhori, Shen Wang, Young-Joon Lee, Eric Johnson, Omkar Pathak, Sungmin Bae, Azade Nazi, Jiwoo Pak, Andy Tong, Kavya Srinivasa, William Hang, Emre Tuncer, Anand Babu, Quoc V. Le, James Laudon, Richard Ho, Roger Carpenter, Jeff Dean

Abstract: In this work, we present a learning-based approach to chip placement, one of the most complex and time-consuming stages of the chip design process. Unlike prior methods, our approach has the ability to learn from past experience and improve over time. In particular, as we train over a greater number of chip blocks, our method becomes better at rapidly generating optimized placements for previously… ▽ More In this work, we present a learning-based approach to chip placement, one of the most complex and time-consuming stages of the chip design process. Unlike prior methods, our approach has the ability to learn from past experience and improve over time. In particular, as we train over a greater number of chip blocks, our method becomes better at rapidly generating optimized placements for previously unseen chip blocks. To achieve these results, we pose placement as a Reinforcement Learning (RL) problem and train an agent to place the nodes of a chip netlist onto a chip canvas. To enable our RL policy to generalize to unseen blocks, we ground representation learning in the supervised task of predicting placement quality. By designing a neural architecture that can accurately predict reward across a wide variety of netlists and their placements, we are able to generate rich feature embeddings of the input netlists. We then use this architecture as the encoder of our policy and value networks to enable transfer learning. Our objective is to minimize PPA (power, performance, and area), and we show that, in under 6 hours, our method can generate placements that are superhuman or comparable on modern accelerator netlists, whereas existing baselines require human experts in the loop and take several weeks. △ Less

Submitted 22 April, 2020; originally announced April 2020.

arXiv:2004.02810 [pdf]

Computer Vision and Abnormal Patient Gait Assessment a Comparison of Machine Learning Models

Authors: Jasmin Hundall, Benson A. Babu

Abstract: Abnormal gait, its associated falls and complications have high patient morbidity, mortality. Computer vision detects, predicts patient gait abnormalities, assesses fall risk and serves as clinical decision support tool for physicians. This paper performs a systematic review of how computer vision, machine learning models perform an abnormal patient's gait assessment. Computer vision is beneficial… ▽ More Abnormal gait, its associated falls and complications have high patient morbidity, mortality. Computer vision detects, predicts patient gait abnormalities, assesses fall risk and serves as clinical decision support tool for physicians. This paper performs a systematic review of how computer vision, machine learning models perform an abnormal patient's gait assessment. Computer vision is beneficial in gait analysis, it helps capture the patient posture. Several literature suggests the use of different machine learning algorithms such as SVM, ANN, K-Star, Random Forest, KNN, among others to perform the classification on the features extracted to study patient gait abnormalities. △ Less

Submitted 21 March, 2020; originally announced April 2020.

Comments: 2 tables

arXiv:2002.01535 [pdf, ps, other]

Lightweight Convolutional Representations for On-Device Natural Language Processing

Authors: Shrey Desai, Geoffrey Goh, Arun Babu, Ahmed Aly

Abstract: The increasing computational and memory complexities of deep neural networks have made it difficult to deploy them on low-resource electronic devices (e.g., mobile phones, tablets, wearables). Practitioners have developed numerous model compression methods to address these concerns, but few have condensed input representations themselves. In this work, we propose a fast, accurate, and lightweight… ▽ More The increasing computational and memory complexities of deep neural networks have made it difficult to deploy them on low-resource electronic devices (e.g., mobile phones, tablets, wearables). Practitioners have developed numerous model compression methods to address these concerns, but few have condensed input representations themselves. In this work, we propose a fast, accurate, and lightweight convolutional representation that can be swapped into any neural model and compressed significantly (up to 32x) with a negligible reduction in performance. In addition, we show gains over recurrent representations when considering resource-centric metrics (e.g., model file size, latency, memory usage) on a Samsung Galaxy S9. △ Less

Submitted 4 February, 2020; originally announced February 2020.

Comments: Accepted to MLSys 2020

arXiv:1905.10979 [pdf, other]

Scalable K-Medoids via True Error Bound and Familywise Bandits

Authors: Aravindakshan Babu, Saurabh Agarwal, Sudarshan Babu, Hariharan Chandrasekaran

Abstract: K-Medoids(KM) is a standard clustering method, used extensively on semi-metric data.Error analyses of KM have traditionally used an in-sample notion of error,which can be far from the true error and suffer from generalization gap. We formalize the true K-Medoid error based on the underlying data distribution.We decompose the true error into fundamental statistical problems of: minimum estimation (… ▽ More K-Medoids(KM) is a standard clustering method, used extensively on semi-metric data.Error analyses of KM have traditionally used an in-sample notion of error,which can be far from the true error and suffer from generalization gap. We formalize the true K-Medoid error based on the underlying data distribution.We decompose the true error into fundamental statistical problems of: minimum estimation (ME) and minimum mean estimation (MME). We provide a convergence result for MME. We show $\errMME$ decreases no slower than $Θ(\frac{1}{n^{\frac{2}{3}}})$, where $n$ is a measure of sample size. Inspired by this bound, we propose a computationally efficient, distributed KM algorithm namely MCPAM. MCPAM has expected runtime $\mathcal{O}(km)$,where $k$ is the number of medoids and $m$ is number of samples. MCPAM provides massive computational savings for a small tradeoff in accuracy. We verify the quality and scaling properties of MCPAM on various datasets. And achieve the hitherto unachieved feat of calculating the KM of 1 billion points on semi-metric spaces. △ Less

Submitted 29 October, 2019; v1 submitted 27 May, 2019; originally announced May 2019.

arXiv:1804.01174 [pdf, other]

Towards Deep Learning based Hand Keypoints Detection for Rapid Sequential Movements from RGB Images

Authors: Srujana Gattupalli, Ashwin Ramesh Babu, James Robert Brady, Fillia Makedon, Vassilis Athitsos

Abstract: Hand keypoints detection and pose estimation has numerous applications in computer vision, but it is still an unsolved problem in many aspects. An application of hand keypoints detection is in performing cognitive assessments of a subject by observing the performance of that subject in physical tasks involving rapid finger motion. As a part of this work, we introduce a novel hand key-points benchm… ▽ More Hand keypoints detection and pose estimation has numerous applications in computer vision, but it is still an unsolved problem in many aspects. An application of hand keypoints detection is in performing cognitive assessments of a subject by observing the performance of that subject in physical tasks involving rapid finger motion. As a part of this work, we introduce a novel hand key-points benchmark dataset that consists of hand gestures recorded specifically for cognitive behavior monitoring. We explore the state of the art methods in hand keypoint detection and we provide quantitative evaluations for the performance of these methods on our dataset. In future, these results and our dataset can serve as a useful benchmark for hand keypoint recognition for rapid finger movements. △ Less

Submitted 3 April, 2018; originally announced April 2018.

arXiv:1802.03201 [pdf, other]

Freestyle, a randomized version of ChaCha for resisting offline brute-force and dictionary attacks

Authors: P. Arun Babu, Jithin Jose Thomas

Abstract: This paper introduces Freestyle, a randomized and variable round version of the ChaCha cipher. Freestyle uses the concept of hash based halting condition where a decryption attempt with an incorrect key is likely to take longer time to halt. This makes Freestyle resistant to key-guessing attacks i.e. brute-force and dictionary based attacks. Freestyle demonstrates a novel approach for ciphertext r… ▽ More This paper introduces Freestyle, a randomized and variable round version of the ChaCha cipher. Freestyle uses the concept of hash based halting condition where a decryption attempt with an incorrect key is likely to take longer time to halt. This makes Freestyle resistant to key-guessing attacks i.e. brute-force and dictionary based attacks. Freestyle demonstrates a novel approach for ciphertext randomization by using random number of rounds for each block, where the exact number of rounds are unknown to the receiver in advance. Freestyle provides the possibility of generating $2^{128}$ different ciphertexts for a given key, nonce, and message; thus resisting key and nonce reuse attacks. Due to its inherent random behavior, Freestyle makes cryptanalysis through known-plaintext, chosen-plaintext, and chosen-ciphertext attacks difficult in practice. On the other hand, Freestyle has costlier cipher initialization process, typically generates 3.125% larger ciphertext, and was found to be 1.6 to 3.2 times slower than ChaCha20. Freestyle is suitable for applications that favor ciphertext randomization and resistance to key-guessing and key reuse attacks over performance and ciphertext size. Freestyle is ideal for applications where ciphertext can be assumed to be in full control of an adversary, and an offline key-guessing attack can be carried out. △ Less

Submitted 19 February, 2018; v1 submitted 9 February, 2018; originally announced February 2018.

arXiv:1711.03640 [pdf, other]

Stochastic Deep Learning in Memristive Networks

Authors: Anakha V Babu, Bipin Rajendran

Abstract: We study the performance of stochastically trained deep neural networks (DNNs) whose synaptic weights are implemented using emerging memristive devices that exhibit limited dynamic range, resolution, and variability in their programming characteristics. We show that a key device parameter to optimize the learning efficiency of DNNs is the variability in its programming characteristics. DNNs with s… ▽ More We study the performance of stochastically trained deep neural networks (DNNs) whose synaptic weights are implemented using emerging memristive devices that exhibit limited dynamic range, resolution, and variability in their programming characteristics. We show that a key device parameter to optimize the learning efficiency of DNNs is the variability in its programming characteristics. DNNs with such memristive synapses, even with dynamic range as low as $15$ and only $32$ discrete levels, when trained based on stochastic updates suffer less than $3\%$ loss in accuracy compared to floating point software baseline. We also study the performance of stochastic memristive DNNs when used as inference engines with noise corrupted data and find that if the device variability can be minimized, the relative degradation in performance for the Stochastic DNN is better than that of the software baseline. Hence, our study presents a new optimization corner for memristive devices for building large noise-immune deep learning systems. △ Less

Submitted 9 November, 2017; originally announced November 2017.

Comments: 4 pages, 5 figures, accepted at ICECS 2017

arXiv:1701.02925 [pdf]

doi 10.5121/ijnlc.2016.5603

Question Analysis for Arabic Question Answering Systems

Authors: Waheeb Ahmed, Dr. Anto P Babu

Abstract: The first step of processing a question in Question Answering(QA) Systems is to carry out a detailed analysis of the question for the purpose of determining what it is asking for and how to perfectly approach answering it. Our Question analysis uses several techniques to analyze any question given in natural language: a Stanford POS Tagger & parser for Arabic language, a named entity recognizer, t… ▽ More The first step of processing a question in Question Answering(QA) Systems is to carry out a detailed analysis of the question for the purpose of determining what it is asking for and how to perfectly approach answering it. Our Question analysis uses several techniques to analyze any question given in natural language: a Stanford POS Tagger & parser for Arabic language, a named entity recognizer, tokenizer,Stop-word removal, Question expansion, Question classification and Question focus extraction components. We employ numerous detection rules and trained classifier using features from this analysis to detect important elements of the question, including: 1) the portion of the question that is a referring to the answer (the focus); 2) different terms in the question that identify what type of entity is being asked for (the lexical answer types); 3) Question expansion ; 4) a process of classifying the question into one or more of several and different types; and We describe how these elements are identified and evaluate the effect of accurate detection on our question-answering system using the Mean Reciprocal Rank(MRR) accuracy measure. △ Less

Submitted 11 January, 2017; originally announced January 2017.

Comments: 10 pages, 3 figures, published article in IJNLC

arXiv:1509.00693 [pdf]

A Fuzzy Clustering Based Approach for Mining Usage Profiles from Web Log Data

Authors: Zahid Ansari, Mohammad Fazle Azeem, A. Vinaya Babu, Waseem Ahmed

Abstract: The World Wide Web continues to grow at an amazing rate in both the size and complexity of Web sites and is well on its way to being the main reservoir of information and data. Due to this increase in growth and complexity of WWW, web site publishers are facing increasing difficulty in attracting and retaining users. To design popular and attractive websites publishers must understand their users… ▽ More The World Wide Web continues to grow at an amazing rate in both the size and complexity of Web sites and is well on its way to being the main reservoir of information and data. Due to this increase in growth and complexity of WWW, web site publishers are facing increasing difficulty in attracting and retaining users. To design popular and attractive websites publishers must understand their users needs. Therefore analyzing users behaviour is an important part of web page design. Web Usage Mining (WUM) is the application of data mining techniques to web usage log repositories in order to discover the usage patterns that can be used to analyze the users navigational behavior. WUM contains three main steps: preprocessing, knowledge extraction and results analysis. The goal of the preprocessing stage in Web usage mining is to transform the raw web log data into a set of user profiles. Each such profile captures a sequence or a set of URLs representing a user session. △ Less

Submitted 1 September, 2015; originally announced September 2015.

Journal ref: International Journal of Computer Science and Information Security, pp. 70-79 Vol. 9, No. 6, June 2011. (ISSN 1947-5500, IJCSIS Publications, United State)

arXiv:1509.00692 [pdf]

Discovery of Web Usage Profiles Using Various Clustering Techniques

Authors: Zahid Ansari, Waseem Ahmed, M. F. Azeem, A. Vinaya Babu

Abstract: The explosive growth of World Wide Web (WWW) has necessitated the development of Web personalization systems in order to understand the user preferences to dynamically serve customized content to individual users. To reveal information about user preferences from Web usage data, Web Usage Mining (WUM) techniques are extensively being applied to the Web log data. Clustering techniques are widely us… ▽ More The explosive growth of World Wide Web (WWW) has necessitated the development of Web personalization systems in order to understand the user preferences to dynamically serve customized content to individual users. To reveal information about user preferences from Web usage data, Web Usage Mining (WUM) techniques are extensively being applied to the Web log data. Clustering techniques are widely used in WUM to capture similar interests and trends among users accessing a Web site. Clustering aims to divide a data set into groups or clusters where inter-cluster similarities are minimized while the intra cluster similarities are maximized. This paper reviews four of the popularly used clustering techniques: k-Means, k-Medoids, Leader and DBSCAN. These techniques are implemented and tested against the Web user navigational data. Performance and validity results of each technique are presented and compared. △ Less

Submitted 1 September, 2015; originally announced September 2015.

Comments: arXiv admin note: substantial text overlap with arXiv:1507.03340

Journal ref: International Journal of Computer Information Systems, pp. 18-27 Vol. 1, No. 3, July 2011. (ISSN 2229-5208, Silicon Valley Publishers, United Kingdom)

arXiv:1509.00690 [pdf]

A Fuzzy Approach for Feature Evaluation and Dimensionality Reduction to Improve the Quality of Web Usage Mining Results

Authors: Zahid Ansari, M. F. Azeem, A. Vinaya Babu, Waseem Ahmed

Abstract: Web Usage Mining is the application of data mining techniques to web usage log repositories in order to discover the usage patterns that can be used to analyze the users navigational behavior. During the preprocessing stage, raw web log data is transformed into a set of user profiles. Each user profile captures a set of URLs representing a user session. Clustering can be applied to this sessionize… ▽ More Web Usage Mining is the application of data mining techniques to web usage log repositories in order to discover the usage patterns that can be used to analyze the users navigational behavior. During the preprocessing stage, raw web log data is transformed into a set of user profiles. Each user profile captures a set of URLs representing a user session. Clustering can be applied to this sessionized data in order to capture similar interests and trends among users navigational patterns. Since the sessionized data may contain thousands of user sessions and each user session may consist of hundreds of URL accesses, dimensionality reduction is achieved by eliminating the low support URLs. Very small sessions are also removed in order to filter out the noise from the data. But direct elimination of low support URLs and small sized sessions may results in loss of a significant amount of information especially when the count of low support URLs and small sessions is large. We propose a fuzzy solution to deal with this problem by assigning weights to URLs and user sessions based on a fuzzy membership function. After assigning the weights we apply a Fuzzy c-Mean Clustering algorithm to discover the clusters of user profiles. In this paper, we describe our fuzzy set theoretic approach to perform feature selection (or dimensionality reduction) and session weight assignment. Finally we compare our soft computing based approach of dimensionality reduction with the traditional approach of direct elimination of small sessions and low support count URLs. Our results show that fuzzy feature evaluation and dimensionality reduction results in better performance and validity indices for the discovered clusters. △ Less

Submitted 1 September, 2015; originally announced September 2015.

Journal ref: International Journal on Advanced Science Engineering and Information Technology, pp. 67-73 Vol. 2 No. 6, 2012. (ISSN: 2088-5334, INSIGHT Publishers, Indonesia)

arXiv:1507.03340 [pdf]

Quantitative Evaluation of Performance and Validity Indices for Clustering the Web Navigational Sessions

Authors: Zahid Ansari, M. F. Azeem, Waseem Ahmed, A. Vinaya Babu

Abstract: Clustering techniques are widely used in Web Usage Mining to capture similar interests and trends among users accessing a Web site. For this purpose, web access logs generated at a particular web site are preprocessed to discover the user navigational sessions. Clustering techniques are then applied to group the user session data into user session clusters, where intercluster similarities are mini… ▽ More Clustering techniques are widely used in Web Usage Mining to capture similar interests and trends among users accessing a Web site. For this purpose, web access logs generated at a particular web site are preprocessed to discover the user navigational sessions. Clustering techniques are then applied to group the user session data into user session clusters, where intercluster similarities are minimized while the intra cluster similarities are maximized. Since the application of different clustering algorithms generally results in different sets of cluster formation, it is important to evaluate the performance of these methods in terms of accuracy and validity of the clusters, and also the time required to generate them, using appropriate performance measures. This paper describes various validity and accuracy measures including Dunn's Index, Davies Bouldin Index, C Index, Rand Index, Jaccard Index, Silhouette Index, Fowlkes Mallows and Sum of the Squared Error (SSE). We conducted the performance evaluation of the following clustering techniques: k-Means, k-Medoids, Leader, Single Link Agglomerative Hierarchical and DBSCAN. These techniques are implemented and tested against the Web user navigational data. Finally their performance results are presented and compared. △ Less

Submitted 13 July, 2015; originally announced July 2015.

Journal ref: World of Computer Science and Information Technology Journal pp. 217-226, Vol. 1, No. 5, June 2011. (ISSN: 2221- 0741, WCSIT Publisher, Unites States)

arXiv:1104.0848 [pdf, ps, other]

Streaming algorithms for language recognition problems

Authors: Ajesh Babu, Nutan Limaye, Jaikumar Radhakrishnan, Girish Varma

Abstract: We study the complexity of the following problems in the streaming model. Membership testing for \DLIN We show that every language in \DLIN\ can be recognised by a randomized one-pass $O(\log n)$ space algorithm with inverse polynomial one-sided error, and by a deterministic p-pass $O(n/p)$ space algorithm. We show that these algorithms are optimal. Membership testing for \LL$(k)$ For language… ▽ More We study the complexity of the following problems in the streaming model. Membership testing for \DLIN We show that every language in \DLIN\ can be recognised by a randomized one-pass $O(\log n)$ space algorithm with inverse polynomial one-sided error, and by a deterministic p-pass $O(n/p)$ space algorithm. We show that these algorithms are optimal. Membership testing for \LL$(k)$ For languages generated by \LL$(k)$ grammars with a bound of $r$ on the number of nonterminals at any stage in the left-most derivation, we show that membership can be tested by a randomized one-pass $O(r\log n)$ space algorithm with inverse polynomial (in $n$) one-sided error. Membership testing for \DCFL We show that randomized algorithms as efficient as the ones described above for \DLIN\ and $\LL(k)$ (which are subclasses of \DCFL) cannot exist for all of \DCFL: there is a language in \VPL\ (a subclass of \DCFL) for which any randomized p-pass algorithm with error bounded by $ε< 1/2$ must use $Ω(n/p)$ space. Degree sequence problem We study the problem of determining, given a sequence $d_1, d_2,..., d_n$ and a graph $G$, whether the degree sequence of $G$ is precisely $d_1, d_2,..., d_n$. We give a randomized one-pass $O(\log n)$ space algorithm with inverse polynomial one-sided error probability. We show that our algorithms are optimal. Our randomized algorithms are based on the recent work of Magniez et al. \cite{MMN09}; our lower bounds are obtained by considering related communication complexity problems. △ Less

Submitted 5 April, 2011; originally announced April 2011.

arXiv:1011.1058 [pdf, ps, other]

An entropy based proof of the Moore bound for irregular graphs

Authors: S. Ajesh Babu, Jaikumar Radhakrishnan

Abstract: We provide proofs of the following theorems by considering the entropy of random walks: Theorem 1.(Alon, Hoory and Linial) Let G be an undirected simple graph with n vertices, girth g, minimum degree at least 2 and average degree d: Odd girth: If g=2r+1,then n \geq 1 + d*(\Sum_{i=0}^{r-1}(d-1)^i) Even girth: If g=2r,then n \geq 2*(\Sum_{i=0}^{r-1} (d-1)^i) Theorem 2.(Hoory) Let G = (V_L,V_R,E) be… ▽ More We provide proofs of the following theorems by considering the entropy of random walks: Theorem 1.(Alon, Hoory and Linial) Let G be an undirected simple graph with n vertices, girth g, minimum degree at least 2 and average degree d: Odd girth: If g=2r+1,then n \geq 1 + d*(\Sum_{i=0}^{r-1}(d-1)^i) Even girth: If g=2r,then n \geq 2*(\Sum_{i=0}^{r-1} (d-1)^i) Theorem 2.(Hoory) Let G = (V_L,V_R,E) be a bipartite graph of girth g = 2r, with n_L = |V_L| and n_R = |V_R|, minimum degree at least 2 and the left and right average degrees d_L and d_R. Then, n_L \geq \Sum_{i=0}^{r-1}(d_R-1)^{i/2}(d_L-1)^{i/2} n_R \geq \Sum_{i=0}^{r-1}(d_L-1)^{i/2}(d_R-1)^{i/2} △ Less

Submitted 3 November, 2010; originally announced November 2010.

Comments: 6 pages

arXiv:1010.3862 [pdf]

A New Non Linear, Time Stamped & Feed Back Model Based Encryption Mechanism with Acknowledgement Support

Authors: A. V. N. Krishna, A. Vinaya Babu

Abstract: In this work a model is going to be used which develops data distributed over a identified value which is used as nonce (IV). The model considers an equilibrium equation which is a function of non linear relationships, time variant and nonce variant values and takes the feed back of earlier round as input to the present round. The process is repeated for different timings which are used as time st… ▽ More In this work a model is going to be used which develops data distributed over a identified value which is used as nonce (IV). The model considers an equilibrium equation which is a function of non linear relationships, time variant and nonce variant values and takes the feed back of earlier round as input to the present round. The process is repeated for different timings which are used as time stamps in the encryption mechanism. Thus this model generates a distributed sequence which is used as sub key. This model supports very important parameters in symmetric data encryption schemes like non linear relationships between different values used in the model, variable key length, timeliness of encryption mechanism and also acknowledgement between the participating parties. It also supports feed back mode which provides necessary strength against crypto analysis. △ Less

Submitted 19 October, 2010; originally announced October 2010.

Comments: 6 pages

Journal ref: IJoAT Vol 1, No 2 (October 2010)

arXiv:1007.0411 [pdf]

Role of Statistical tests in Estimation of the Security of a New Encryption Algorithm

Authors: Addepalli V. N Krishna, A Vinay Babu

Abstract: Encryption study basically deals with three levels of algorithms. The first algorithm deals with encryption mechanism, second deals with decryption Mechanism and the third discusses about the generation of keys and sub keys used in the encryption study. In the given study, a new algorithm is discussed. The algorithm executes a series of steps and generates a sequence. This sequence is being used a… ▽ More Encryption study basically deals with three levels of algorithms. The first algorithm deals with encryption mechanism, second deals with decryption Mechanism and the third discusses about the generation of keys and sub keys used in the encryption study. In the given study, a new algorithm is discussed. The algorithm executes a series of steps and generates a sequence. This sequence is being used as sub key to be mapped to plain text to generate cipher text. The strength of the encryption & Decryption process depends on the strength of sequence generated against crypto analysis.. In this part of work some statistical tests like Uniformity tests, Universal tests & Repetition tests are tried on the sequence generated to test the strength of it. △ Less

Submitted 1 July, 2010; originally announced July 2010.

Comments: http://ijict.org/index.php/ijoat/article/view/statistical-tests-for-encryption-algorithm

Journal ref: International Journal of Advancements in Technology, Vol 1, No 1 (2010)

Showing 1–50 of 52 results for author: Babu, A