Search | arXiv e-print repository

The infrastructure powering IBM's Gen AI model development

Authors: Talia Gershon, Seetharami Seelam, Brian Belgodere, Milton Bonilla, Lan Hoang, Danny Barnett, I-Hsin Chung, Apoorve Mohan, Ming-Hung Chen, Lixiang Luo, Robert Walkup, Constantinos Evangelinos, Shweta Salaria, Marc Dombrowa, Yoonho Park, Apo Kayi, Liran Schour, Alim Alim, Ali Sydney, Pavlos Maniotis, Laurent Schares, Bernard Metzler, Bengi Karacali-Akyamac, Sophia Wen, Tatsuhiro Chiba , et al. (121 additional authors not shown)

Abstract: AI Infrastructure plays a key role in the speed and cost-competitiveness of developing and deploying advanced AI models. The current demand for powerful AI infrastructure for model training is driven by the emergence of generative AI and foundational models, where on occasion thousands of GPUs must cooperate on a single training job for the model to be trained in a reasonable time. Delivering effi… ▽ More AI Infrastructure plays a key role in the speed and cost-competitiveness of developing and deploying advanced AI models. The current demand for powerful AI infrastructure for model training is driven by the emergence of generative AI and foundational models, where on occasion thousands of GPUs must cooperate on a single training job for the model to be trained in a reasonable time. Delivering efficient and high-performing AI training requires an end-to-end solution that combines hardware, software and holistic telemetry to cater for multiple types of AI workloads. In this report, we describe IBM's hybrid cloud infrastructure that powers our generative AI model development. This infrastructure includes (1) Vela: an AI-optimized supercomputing capability directly integrated into the IBM Cloud, delivering scalable, dynamic, multi-tenant and geographically distributed infrastructure for large-scale model training and other AI workflow steps and (2) Blue Vela: a large-scale, purpose-built, on-premises hosting environment that is optimized to support our largest and most ambitious AI model training tasks. Vela provides IBM with the dual benefit of high performance for internal use along with the flexibility to adapt to an evolving commercial landscape. Blue Vela provides us with the benefits of rapid development of our largest and most ambitious models, as well as future-proofing against the evolving model landscape in the industry. Taken together, they provide IBM with the ability to rapidly innovate in the development of both AI models and commercial offerings. △ Less

Submitted 7 July, 2024; originally announced July 2024.

Comments: Corresponding Authors: Talia Gershon, Seetharami Seelam,Brian Belgodere, Milton Bonilla

arXiv:2407.01544 [pdf, other]

Decentralized Multi-Party Multi-Network AI for Global Deployment of 6G Wireless Systems

Authors: Merim Dzaferagic, Marco Ruffini, Nina Slamnik-Krijestorac, Joao F. Santos, Johann Marquez-Barja, Christos Tranoris, Spyros Denazis, Thomas Kyriakakis, Panagiotis Karafotis, Luiz DaSilva, Shashi Raj Pandey, Junya Shiraishi, Petar Popovski, Soren Kejser Jensen, Christian Thomsen, Torben Bach Pedersen, Holger Claussen, Jinfeng Du, Gil Zussman, Tingjun Chen, Yiran Chen, Seshu Tirupathi, Ivan Seskar, Daniel Kilper

Abstract: Multiple visions of 6G networks elicit Artificial Intelligence (AI) as a central, native element. When 6G systems are deployed at a large scale, end-to-end AI-based solutions will necessarily have to encompass both the radio and the fiber-optical domain. This paper introduces the Decentralized Multi-Party, Multi-Network AI (DMMAI) framework for integrating AI into 6G networks deployed at scale. DM… ▽ More Multiple visions of 6G networks elicit Artificial Intelligence (AI) as a central, native element. When 6G systems are deployed at a large scale, end-to-end AI-based solutions will necessarily have to encompass both the radio and the fiber-optical domain. This paper introduces the Decentralized Multi-Party, Multi-Network AI (DMMAI) framework for integrating AI into 6G networks deployed at scale. DMMAI harmonizes AI-driven controls across diverse network platforms and thus facilitates networks that autonomously configure, monitor, and repair themselves. This is particularly crucial at the network edge, where advanced applications meet heightened functionality and security demands. The radio/optical integration is vital due to the current compartmentalization of AI research within these domains, which lacks a comprehensive understanding of their interaction. Our approach explores multi-network orchestration and AI control integration, filling a critical gap in standardized frameworks for AI-driven coordination in 6G networks. The DMMAI framework is a step towards a global standard for AI in 6G, aiming to establish reference use cases, data and model management methods, and benchmarking platforms for future AI/ML solutions. △ Less

Submitted 15 April, 2024; originally announced July 2024.

arXiv:2406.17910 [pdf]

Transforming Software Development: Evaluating the Efficiency and Challenges of GitHub Copilot in Real-World Projects

Authors: Ruchika Pandey, Prabhat Singh, Raymond Wei, Shaila Shankar

Abstract: Generative AI technologies promise to transform the product development lifecycle. This study evaluates the efficiency gains, areas for improvement, and emerging challenges of using GitHub Copilot, an AI-powered coding assistant. We identified 15 software development tasks and assessed Copilot's benefits through real-world projects on large proprietary code bases. Our findings indicate significant… ▽ More Generative AI technologies promise to transform the product development lifecycle. This study evaluates the efficiency gains, areas for improvement, and emerging challenges of using GitHub Copilot, an AI-powered coding assistant. We identified 15 software development tasks and assessed Copilot's benefits through real-world projects on large proprietary code bases. Our findings indicate significant reductions in developer toil, with up to 50% time saved in code documentation and autocompletion, and 30-40% in repetitive coding tasks, unit test generation, debugging, and pair programming. However, Copilot struggles with complex tasks, large functions, multiple files, and proprietary contexts, particularly with C/C++ code. We project a 33-36% time reduction for coding-related tasks in a cloud-first software development lifecycle. This study aims to quantify productivity improvements, identify underperforming scenarios, examine practical benefits and challenges, investigate performance variations across programming languages, and discuss emerging issues related to code quality, security, and developer experience. △ Less

Submitted 25 June, 2024; originally announced June 2024.

Comments: 13 pages, 8 figures

arXiv:2406.11356 [pdf, other]

DIDChain: Advancing Supply Chain Data Management with Decentralized Identifiers and Blockchain

Authors: Patrick Herbke, Sid Lamichhane, Kaustabh Barman, Sanjeet Raj Pandey, Axel Küpper, Andreas Abraham, Markus Sabadello

Abstract: Supply chain data management faces challenges in traceability, transparency, and trust. These issues stem from data silos and communication barriers. This research introduces DIDChain, a framework leveraging blockchain technology, Decentralized Identifiers, and the InterPlanetary File System. DIDChain improves supply chain data management. To address privacy concerns, DIDChain employs a hybrid blo… ▽ More Supply chain data management faces challenges in traceability, transparency, and trust. These issues stem from data silos and communication barriers. This research introduces DIDChain, a framework leveraging blockchain technology, Decentralized Identifiers, and the InterPlanetary File System. DIDChain improves supply chain data management. To address privacy concerns, DIDChain employs a hybrid blockchain architecture that combines public blockchain transparency with the control of private systems. Our hybrid approach preserves the authenticity and reliability of supply chain events. It also respects the data privacy requirements of the participants in the supply chain. Central to DIDChain is the cheqd infrastructure. The cheqd infrastructure enables digital tracing of asset events, such as an asset moving from the milk-producing dairy farm to the cheese manufacturer. In this research, assets are raw materials and products. The cheqd infrastructure ensures the traceability and reliability of assets in the management of supply chain data. Our contribution to blockchain-enabled supply chain systems demonstrates the robustness of DIDChain. Integrating blockchain technology through DIDChain offers a solution to data silos and communication barriers. With DIDChain, we propose a framework to transform the supply chain infrastructure across industries. △ Less

Submitted 17 June, 2024; originally announced June 2024.

Comments: Accepted to be published at the 18th IEEE International Conference on Service-Oriented System Engineering 2024

arXiv:2405.16684 [pdf, other]

gzip Predicts Data-dependent Scaling Laws

Authors: Rohan Pandey

Abstract: Past work has established scaling laws that predict the performance of a neural language model (LM) as a function of its parameter count and the number of tokens it's trained on, enabling optimal allocation of a fixed compute budget. Are these scaling laws agnostic to training data as some prior work suggests? We generate training datasets of varying complexities by modulating the syntactic proper… ▽ More Past work has established scaling laws that predict the performance of a neural language model (LM) as a function of its parameter count and the number of tokens it's trained on, enabling optimal allocation of a fixed compute budget. Are these scaling laws agnostic to training data as some prior work suggests? We generate training datasets of varying complexities by modulating the syntactic properties of a PCFG, finding that 1) scaling laws are sensitive to differences in data complexity and that 2) gzip, a compression algorithm, is an effective predictor of how data complexity impacts scaling properties. We propose a new data-dependent scaling law for LM's that accounts for the training data's gzip-compressibility; its compute-optimal frontier increases in dataset size preference (over parameter count preference) as training data becomes harder to compress. △ Less

Submitted 26 May, 2024; originally announced May 2024.

Comments: 9 pages, 9 figures

arXiv:2404.12816 [pdf, other]

Coexistence of Push Wireless Access with Pull Communication for Content-based Wake-up Radios

Authors: Junya Shiraishi, Sara Cavallero, Shashi Raj Pandey, Fabio Saggese, Petar Popovski

Abstract: This paper considers energy-efficient connectivity for Internet of Things (IoT) devices in a coexistence scenario between two distinctive communication models: pull- and push-based. In pull-based, the base station (BS) decides when to retrieve a specific type of data from the IoT devices, while in push-based, the IoT device decides when and which data to transmit. To this end, this paper advocates… ▽ More This paper considers energy-efficient connectivity for Internet of Things (IoT) devices in a coexistence scenario between two distinctive communication models: pull- and push-based. In pull-based, the base station (BS) decides when to retrieve a specific type of data from the IoT devices, while in push-based, the IoT device decides when and which data to transmit. To this end, this paper advocates introducing the content-based wake-up (CoWu), which enables the BS to remotely activate only a subset of pull-based nodes equipped with wake-up receivers, observing the relevant data. In this setup, a BS pulls data with CoWu at a specific time instance to fulfill its tasks while collecting data from the nodes operating with a push-based communication model. The resource allocation plays an important role: longer data collection duration for pull-based nodes can lead to high retrieval accuracy while decreasing the probability of data transmission success for push-based nodes, and vice versa. Numerical results show that CoWu can manage communication requirements for both pull-based and push-based nodes while realizing the high energy efficiency (up to 38%) of IoT devices, compared to the baseline scheduling method. △ Less

Submitted 19 April, 2024; originally announced April 2024.

Comments: Paper submitted to Globecom 2024. Copyright may be transferred without further notice

arXiv:2404.01786 [pdf]

Generative AI-Based Text Generation Methods Using Pre-Trained GPT-2 Model

Authors: Rohit Pandey, Hetvi Waghela, Sneha Rakshit, Aparna Rangari, Anjali Singh, Rahul Kumar, Ratnadeep Ghosal, Jaydip Sen

Abstract: This work delved into the realm of automatic text generation, exploring a variety of techniques ranging from traditional deterministic approaches to more modern stochastic methods. Through analysis of greedy search, beam search, top-k sampling, top-p sampling, contrastive searching, and locally typical searching, this work has provided valuable insights into the strengths, weaknesses, and potentia… ▽ More This work delved into the realm of automatic text generation, exploring a variety of techniques ranging from traditional deterministic approaches to more modern stochastic methods. Through analysis of greedy search, beam search, top-k sampling, top-p sampling, contrastive searching, and locally typical searching, this work has provided valuable insights into the strengths, weaknesses, and potential applications of each method. Each text-generating method is evaluated using several standard metrics and a comparative study has been made on the performance of the approaches. Finally, some future directions of research in the field of automatic text generation are also identified. △ Less

Submitted 2 April, 2024; originally announced April 2024.

Comments: This report pertains to the Capstone Project done by Group 5 of the Fall batch of 2023 students at Praxis Tech School, Kolkata, India. The reports consists of 57 pages and it includes 17 figures and 8 tables. This is the preprint which will be submitted to IEEE CONIT 2024 for review

arXiv:2404.01543 [pdf, other]

Efficient 3D Implicit Head Avatar with Mesh-anchored Hash Table Blendshapes

Authors: Ziqian Bai, Feitong Tan, Sean Fanello, Rohit Pandey, Mingsong Dou, Shichen Liu, Ping Tan, Yinda Zhang

Abstract: 3D head avatars built with neural implicit volumetric representations have achieved unprecedented levels of photorealism. However, the computational cost of these methods remains a significant barrier to their widespread adoption, particularly in real-time applications such as virtual reality and teleconferencing. While attempts have been made to develop fast neural rendering approaches for static… ▽ More 3D head avatars built with neural implicit volumetric representations have achieved unprecedented levels of photorealism. However, the computational cost of these methods remains a significant barrier to their widespread adoption, particularly in real-time applications such as virtual reality and teleconferencing. While attempts have been made to develop fast neural rendering approaches for static scenes, these methods cannot be simply employed to support realistic facial expressions, such as in the case of a dynamic facial performance. To address these challenges, we propose a novel fast 3D neural implicit head avatar model that achieves real-time rendering while maintaining fine-grained controllability and high rendering quality. Our key idea lies in the introduction of local hash table blendshapes, which are learned and attached to the vertices of an underlying face parametric model. These per-vertex hash-tables are linearly merged with weights predicted via a CNN, resulting in expression dependent embeddings. Our novel representation enables efficient density and color predictions using a lightweight MLP, which is further accelerated by a hierarchical nearest neighbor search method. Extensive experiments show that our approach runs in real-time while achieving comparable rendering quality to state-of-the-arts and decent results on challenging expressions. △ Less

Submitted 1 April, 2024; originally announced April 2024.

Comments: In CVPR2024. Project page: https://augmentedperception.github.io/monoavatar-plus

arXiv:2402.11909 [pdf, other]

One2Avatar: Generative Implicit Head Avatar For Few-shot User Adaptation

Authors: Zhixuan Yu, Ziqian Bai, Abhimitra Meka, Feitong Tan, Qiangeng Xu, Rohit Pandey, Sean Fanello, Hyun Soo Park, Yinda Zhang

Abstract: Traditional methods for constructing high-quality, personalized head avatars from monocular videos demand extensive face captures and training time, posing a significant challenge for scalability. This paper introduces a novel approach to create high quality head avatar utilizing only a single or a few images per user. We learn a generative model for 3D animatable photo-realistic head avatar from… ▽ More Traditional methods for constructing high-quality, personalized head avatars from monocular videos demand extensive face captures and training time, posing a significant challenge for scalability. This paper introduces a novel approach to create high quality head avatar utilizing only a single or a few images per user. We learn a generative model for 3D animatable photo-realistic head avatar from a multi-view dataset of expressions from 2407 subjects, and leverage it as a prior for creating personalized avatar from few-shot images. Different from previous 3D-aware face generative models, our prior is built with a 3DMM-anchored neural radiance field backbone, which we show to be more effective for avatar creation through auto-decoding based on few-shot inputs. We also handle unstable 3DMM fitting by jointly optimizing the 3DMM fitting and camera calibration that leads to better few-shot adaptation. Our method demonstrates compelling results and outperforms existing state-of-the-art methods for few-shot avatar adaptation, paving the way for more efficient and personalized avatar creation. △ Less

Submitted 19 February, 2024; originally announced February 2024.

arXiv:2402.10051 [pdf, other]

SwissNYF: Tool Grounded LLM Agents for Black Box Setting

Authors: Somnath Sendhil Kumar, Dhruv Jain, Eshaan Agarwal, Raunak Pandey

Abstract: While Large Language Models (LLMs) have demonstrated enhanced capabilities in function-calling, these advancements primarily rely on accessing the functions' responses. This methodology is practical for simpler APIs but faces scalability issues with irreversible APIs that significantly impact the system, such as a database deletion API. Similarly, processes requiring extensive time for each API ca… ▽ More While Large Language Models (LLMs) have demonstrated enhanced capabilities in function-calling, these advancements primarily rely on accessing the functions' responses. This methodology is practical for simpler APIs but faces scalability issues with irreversible APIs that significantly impact the system, such as a database deletion API. Similarly, processes requiring extensive time for each API call and those necessitating forward planning, like automated action pipelines, present complex challenges. Furthermore, scenarios often arise where a generalized approach is needed because algorithms lack direct access to the specific implementations of these functions or secrets to use them. Traditional tool planning methods are inadequate in these cases, compelling the need to operate within black-box environments. Unlike their performance in tool manipulation, LLMs excel in black-box tasks, such as program synthesis. Therefore, we harness the program synthesis capabilities of LLMs to strategize tool usage in black-box settings, ensuring solutions are verified prior to implementation. We introduce TOPGUN, an ingeniously crafted approach leveraging program synthesis for black box tool planning. Accompanied by SwissNYF, a comprehensive suite that integrates black-box algorithms for planning and verification tasks, addressing the aforementioned challenges and enhancing the versatility and effectiveness of LLMs in complex API interactions. The public code for SwissNYF is available at https://github.com/iclr-dummy-user/SwissNYF. △ Less

Submitted 15 February, 2024; originally announced February 2024.

arXiv:2312.15024 [pdf, other]

Coded Caching for Hierarchical Two-Layer Networks with Coded Placement

Authors: Rajlaxmi Pandey, Charul Rajput, B. Sundar Rajan

Abstract: We consider two layered hierarchical coded caching problem introduced in Hierarchical coded caching, IEEE Trans. Inf. Theory, 2016, in which a server is connected to $K_1$ mirrors, and each mirror is connected to $K_2$ users. The mirrors and the users are equipped with the cache of size $M_1$ and $M_2$, respectively. We propose a hierarchical coded caching scheme with coded placements that perform… ▽ More We consider two layered hierarchical coded caching problem introduced in Hierarchical coded caching, IEEE Trans. Inf. Theory, 2016, in which a server is connected to $K_1$ mirrors, and each mirror is connected to $K_2$ users. The mirrors and the users are equipped with the cache of size $M_1$ and $M_2$, respectively. We propose a hierarchical coded caching scheme with coded placements that perform better than the existing schemes. In order to ensure a fair comparison with existing schemes, we introduce the notion of composite rate, defined as $\overline{R}=R_1+K_1R_2$, which consists of the rate from server to mirrors $R_1$ and the rate from mirror to users $R_2$. The composite rate has not been discussed before in literature and it represents the total consumed bandwidth in the system. Therefore, it is more appropriate to consider the composite rate along with $R_1$ and $R_2$. For the proposed scheme, we show a trade-off between the global memory $\overline{M}=K_1M_1+K_1K_2M_2$ of the system and the composite rate. We compare the proposed scheme with the existing hierarchical coded caching schemes using the proposed parameter composite rate. Finally, we propose another scheme for the specific case of a single mirror, which performs better for this case. △ Less

Submitted 22 December, 2023; originally announced December 2023.

Comments: 16 pages and 5 figures

arXiv:2312.09108 [pdf, ps, other]

doi 10.1109/LNET.2024.3363620

Greedy Shapley Client Selection for Communication-Efficient Federated Learning

Authors: Pranava Singhal, Shashi Raj Pandey, Petar Popovski

Abstract: The standard client selection algorithms for Federated Learning (FL) are often unbiased and involve uniform random sampling of clients. This has been proven sub-optimal for fast convergence under practical settings characterized by significant heterogeneity in data distribution, computing, and communication resources across clients. For applications having timing constraints due to limited communi… ▽ More The standard client selection algorithms for Federated Learning (FL) are often unbiased and involve uniform random sampling of clients. This has been proven sub-optimal for fast convergence under practical settings characterized by significant heterogeneity in data distribution, computing, and communication resources across clients. For applications having timing constraints due to limited communication opportunities with the parameter server (PS), the client selection strategy is critical to complete model training within the fixed budget of communication rounds. To address this, we develop a biased client selection strategy, GreedyFed, that identifies and greedily selects the most contributing clients in each communication round. This method builds on a fast approximation algorithm for the Shapley Value at the PS, making the computation tractable for real-world applications with many clients. Compared to various client selection strategies on several real-world datasets, GreedyFed demonstrates fast and stable convergence with high accuracy under timing constraints and when imposing a higher degree of heterogeneity in data distribution, systems constraints, and privacy requirements. △ Less

Submitted 7 February, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

Comments: Accepted for publication in IEEE Networking Letters

arXiv:2312.04875 [pdf, other]

MVDD: Multi-View Depth Diffusion Models

Authors: Zhen Wang, Qiangeng Xu, Feitong Tan, Menglei Chai, Shichen Liu, Rohit Pandey, Sean Fanello, Achuta Kadambi, Yinda Zhang

Abstract: Denoising diffusion models have demonstrated outstanding results in 2D image generation, yet it remains a challenge to replicate its success in 3D shape generation. In this paper, we propose leveraging multi-view depth, which represents complex 3D shapes in a 2D data format that is easy to denoise. We pair this representation with a diffusion model, MVDD, that is capable of generating high-quality… ▽ More Denoising diffusion models have demonstrated outstanding results in 2D image generation, yet it remains a challenge to replicate its success in 3D shape generation. In this paper, we propose leveraging multi-view depth, which represents complex 3D shapes in a 2D data format that is easy to denoise. We pair this representation with a diffusion model, MVDD, that is capable of generating high-quality dense point clouds with 20K+ points with fine-grained details. To enforce 3D consistency in multi-view depth, we introduce an epipolar line segment attention that conditions the denoising step for a view on its neighboring views. Additionally, a depth fusion module is incorporated into diffusion steps to further ensure the alignment of depth maps. When augmented with surface reconstruction, MVDD can also produce high-quality 3D meshes. Furthermore, MVDD stands out in other tasks such as depth completion, and can serve as a 3D prior, significantly boosting many downstream tasks, such as GAN inversion. State-of-the-art results from extensive experiments demonstrate MVDD's excellent ability in 3D shape generation, depth completion, and its potential as a 3D prior for downstream tasks. △ Less

Submitted 19 December, 2023; v1 submitted 8 December, 2023; originally announced December 2023.

arXiv:2312.03763 [pdf, other]

Gaussian3Diff: 3D Gaussian Diffusion for 3D Full Head Synthesis and Editing

Authors: Yushi Lan, Feitong Tan, Di Qiu, Qiangeng Xu, Kyle Genova, Zeng Huang, Sean Fanello, Rohit Pandey, Thomas Funkhouser, Chen Change Loy, Yinda Zhang

Abstract: We present a novel framework for generating photorealistic 3D human head and subsequently manipulating and reposing them with remarkable flexibility. The proposed approach leverages an implicit function representation of 3D human heads, employing 3D Gaussians anchored on a parametric face model. To enhance representational capabilities and encode spatial information, we embed a lightweight tri-pla… ▽ More We present a novel framework for generating photorealistic 3D human head and subsequently manipulating and reposing them with remarkable flexibility. The proposed approach leverages an implicit function representation of 3D human heads, employing 3D Gaussians anchored on a parametric face model. To enhance representational capabilities and encode spatial information, we embed a lightweight tri-plane payload within each Gaussian rather than directly storing color and opacity. Additionally, we parameterize the Gaussians in a 2D UV space via a 3DMM, enabling effective utilization of the diffusion model for 3D head avatar generation. Our method facilitates the creation of diverse and realistic 3D human heads with fine-grained editing over facial features and expressions. Extensive experiments demonstrate the effectiveness of our method. △ Less

Submitted 19 December, 2023; v1 submitted 5 December, 2023; originally announced December 2023.

Comments: project webpage: https://nirvanalan.github.io/projects/gaussian3diff/

arXiv:2312.02611 [pdf, other]

Privacy-Aware Data Acquisition under Data Similarity in Regression Markets

Authors: Shashi Raj Pandey, Pierre Pinson, Petar Popovski

Abstract: Data markets facilitate decentralized data exchange for applications such as prediction, learning, or inference. The design of these markets is challenged by varying privacy preferences as well as data similarity among data owners. Related works have often overlooked how data similarity impacts pricing and data value through statistical information leakage. We demonstrate that data similarity and… ▽ More Data markets facilitate decentralized data exchange for applications such as prediction, learning, or inference. The design of these markets is challenged by varying privacy preferences as well as data similarity among data owners. Related works have often overlooked how data similarity impacts pricing and data value through statistical information leakage. We demonstrate that data similarity and privacy preferences are integral to market design and propose a query-response protocol using local differential privacy for a two-party data acquisition mechanism. In our regression data market model, we analyze strategic interactions between privacy-aware owners and the learner as a Stackelberg game over the asked price and privacy factor. Finally, we numerically evaluate how data similarity affects market participation and traded data value. △ Less

Submitted 5 December, 2023; originally announced December 2023.

Comments: Submitted to IEEE Transactions on Neural Networks and Learning Systems (submission version)

arXiv:2311.08053 [pdf, other]

Batch Selection and Communication for Active Learning with Edge Labeling

Authors: Victor Croisfelt, Shashi Raj Pandey, Osvaldo Simeone, Petar Popovski

Abstract: Conventional retransmission (ARQ) protocols are designed with the goal of ensuring the correct reception of all the individual transmitter's packets at the receiver. When the transmitter is a learner communicating with a teacher, this goal is at odds with the actual aim of the learner, which is that of eliciting the most relevant label information from the teacher. Taking an active learning perspe… ▽ More Conventional retransmission (ARQ) protocols are designed with the goal of ensuring the correct reception of all the individual transmitter's packets at the receiver. When the transmitter is a learner communicating with a teacher, this goal is at odds with the actual aim of the learner, which is that of eliciting the most relevant label information from the teacher. Taking an active learning perspective, this paper addresses the following key protocol design questions: (i) Active batch selection: Which batch of inputs should be sent to the teacher to acquire the most useful information and thus reduce the number of required communication rounds? (ii) Batch encoding: Can batches of data points be combined to reduce the communication resources required at each communication round? Specifically, this work introduces Communication-Constrained Bayesian Active Knowledge Distillation (CC-BAKD), a novel protocol that integrates Bayesian active learning with compression via a linear mix-up mechanism. Comparisons with existing active learning protocols demonstrate the advantages of the proposed approach. △ Less

Submitted 22 May, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

Comments: 6 pages, 4 figures, conference version, accepted in IEEE ICC 2024, Workshop on Task-Oriented and Generative Communications For 6G

arXiv:2311.04788 [pdf, other]

TinyAirNet: TinyML Model Transmission for Energy-efficient Image Retrieval from IoT Devices

Authors: Junya Shiraishi, Mathias Thorsager, Shashi Raj Pandey, Petar Popovski

Abstract: This letter introduces an energy-efficient pull-based data collection framework for Internet of Things (IoT) devices that use Tiny Machine Learning (TinyML) to interpret data queries. A TinyML model is transmitted from the edge server to the IoT devices. The devices employ the model to facilitate the subsequent semantic queries. This reduces the transmission of irrelevant data, but receiving the M… ▽ More This letter introduces an energy-efficient pull-based data collection framework for Internet of Things (IoT) devices that use Tiny Machine Learning (TinyML) to interpret data queries. A TinyML model is transmitted from the edge server to the IoT devices. The devices employ the model to facilitate the subsequent semantic queries. This reduces the transmission of irrelevant data, but receiving the ML model and its processing at the IoT devices consume additional energy. We consider the specific instance of image retrieval in a single device scenario and investigate the gain brought by the proposed scheme in terms of energy efficiency and retrieval accuracy, while considering the cost of computation and communication, as well as memory constraints. Numerical evaluation shows that, compared to a baseline scheme, the proposed scheme reaches up to 67% energy reduction under the accuracy constraint when many images are stored. Although focused on image retrieval, our analysis is indicative of a broader set of communication scenarios in which the preemptive transmission of an ML model can increase communication efficiency. △ Less

Submitted 17 June, 2024; v1 submitted 8 November, 2023; originally announced November 2023.

Comments: 5 pages, 3 figures, Submitted for possible publication

arXiv:2311.00991 [pdf, other]

IR-UWB Radar-based Situational Awareness System for Smartphone-Distracted Pedestrians

Authors: Jamsheed Manja Ppallan, Ruchi Pandey, Yellappa Damam, Vijay Narayan Tiwari, Karthikeyan Arunachalam, Antariksha Ray

Abstract: With the widespread adoption of smartphones, ensuring pedestrian safety on roads has become a critical concern due to smartphone distraction. This paper proposes a novel and real-time assistance system called UWB-assisted Safe Walk (UASW) for obstacle detection and warns users about real-time situations. The proposed method leverages Impulse Radio Ultra-Wideband (IR-UWB) radar embedded in the smar… ▽ More With the widespread adoption of smartphones, ensuring pedestrian safety on roads has become a critical concern due to smartphone distraction. This paper proposes a novel and real-time assistance system called UWB-assisted Safe Walk (UASW) for obstacle detection and warns users about real-time situations. The proposed method leverages Impulse Radio Ultra-Wideband (IR-UWB) radar embedded in the smartphone, which provides excellent range resolution and high noise resilience using short pulses. We implemented UASW specifically for Android smartphones with IR-UWB connectivity. The framework uses complex Channel Impulse Response (CIR) data to integrate rule-based obstacle detection with artificial neural network (ANN) based obstacle classification. The performance of the proposed UASW system is analyzed using real-time collected data. The results show that the proposed system achieves an obstacle detection accuracy of up to 97% and obstacle classification accuracy of up to 95% with an inference delay of 26.8 ms. The results highlight the effectiveness of UASW in assisting smartphone-distracted pedestrians and improving their situational awareness. △ Less

Submitted 2 November, 2023; originally announced November 2023.

arXiv:2308.14179 [pdf, other]

Towards Vision-Language Mechanistic Interpretability: A Causal Tracing Tool for BLIP

Authors: Vedant Palit, Rohan Pandey, Aryaman Arora, Paul Pu Liang

Abstract: Mechanistic interpretability seeks to understand the neural mechanisms that enable specific behaviors in Large Language Models (LLMs) by leveraging causality-based methods. While these approaches have identified neural circuits that copy spans of text, capture factual knowledge, and more, they remain unusable for multimodal models since adapting these tools to the vision-language domain requires c… ▽ More Mechanistic interpretability seeks to understand the neural mechanisms that enable specific behaviors in Large Language Models (LLMs) by leveraging causality-based methods. While these approaches have identified neural circuits that copy spans of text, capture factual knowledge, and more, they remain unusable for multimodal models since adapting these tools to the vision-language domain requires considerable architectural changes. In this work, we adapt a unimodal causal tracing tool to BLIP to enable the study of the neural mechanisms underlying image-conditioned text generation. We demonstrate our approach on a visual question answering dataset, highlighting the causal relevance of later layer representations for all tokens. Furthermore, we release our BLIP causal tracing tool as open source to enable further experimentation in vision-language mechanistic interpretability by the community. Our code is available at https://github.com/vedantpalit/Towards-Vision-Language-Mechanistic-Interpretability. △ Less

Submitted 27 August, 2023; originally announced August 2023.

Comments: Final version for 5th Workshop on Closing the Loop Between Vision and Language (CLVL) @ ICCV 2023. 4 pages, 5 figures

arXiv:2308.07265 [pdf, other]

Localization of DOA trajectories -- Beyond the grid

Authors: Ruchi Pandey, Santosh Nannuru

Abstract: The direction of arrival (DOA) estimation algorithms are crucial in localizing acoustic sources. Traditional localization methods rely on block-level processing to extract the directional information from multiple measurements processed together. However, these methods assume that DOA remains constant throughout the block, which may not be true in practical scenarios. Also, the performance of loca… ▽ More The direction of arrival (DOA) estimation algorithms are crucial in localizing acoustic sources. Traditional localization methods rely on block-level processing to extract the directional information from multiple measurements processed together. However, these methods assume that DOA remains constant throughout the block, which may not be true in practical scenarios. Also, the performance of localization methods is limited when the true parameters do not lie on the parameter search grid. In this paper we propose two trajectory models, namely the polynomial and bandlimited trajectory models, to capture the DOA dynamics. To estimate trajectory parameters, we adopt two gridless algorithms: i) Sliding Frank-Wolfe (SFW), which solves the Beurling LASSO problem and ii) Newtonized Orthogonal Matching Pursuit (NOMP), which improves over OMP using cyclic refinement. Furthermore, we extend our analysis to include wideband processing. The simulation results indicate that the proposed trajectory localization algorithms exhibit improved performance compared to grid-based methods in terms of resolution, robustness to noise, and computational efficiency. △ Less

Submitted 14 August, 2023; originally announced August 2023.

arXiv:2308.03580 [pdf, other]

Revealing the Underlying Patterns: Investigating Dataset Similarity, Performance, and Generalization

Authors: Akshit Achara, Ram Krishna Pandey

Abstract: Supervised deep learning models require significant amount of labeled data to achieve an acceptable performance on a specific task. However, when tested on unseen data, the models may not perform well. Therefore, the models need to be trained with additional and varying labeled data to improve the generalization. In this work, our goal is to understand the models, their performance and generalizat… ▽ More Supervised deep learning models require significant amount of labeled data to achieve an acceptable performance on a specific task. However, when tested on unseen data, the models may not perform well. Therefore, the models need to be trained with additional and varying labeled data to improve the generalization. In this work, our goal is to understand the models, their performance and generalization. We establish image-image, dataset-dataset, and image-dataset distances to gain insights into the model's behavior. Our proposed distance metric when combined with model performance can help in selecting an appropriate model/architecture from a pool of candidate architectures. We have shown that the generalization of these models can be improved by only adding a small number of unseen images (say 1, 3 or 7) into the training set. Our proposed approach reduces training and annotation costs while providing an estimate of model performance on unseen data in dynamic environments. △ Less

Submitted 29 December, 2023; v1 submitted 7 August, 2023; originally announced August 2023.

arXiv:2308.01887 [pdf, other]

Athena 2.0: Discourse and User Modeling in Open Domain Dialogue

Authors: Omkar Patil, Lena Reed, Kevin K. Bowden, Juraj Juraska, Wen Cui, Vrindavan Harrison, Rishi Rajasekaran, Angela Ramirez, Cecilia Li, Eduardo Zamora, Phillip Lee, Jeshwanth Bheemanpally, Rohan Pandey, Adwait Ratnaparkhi, Marilyn Walker

Abstract: Conversational agents are consistently growing in popularity and many people interact with them every day. While many conversational agents act as personal assistants, they can have many different goals. Some are task-oriented, such as providing customer support for a bank or making a reservation. Others are designed to be empathetic and to form emotional connections with the user. The Alexa Prize… ▽ More Conversational agents are consistently growing in popularity and many people interact with them every day. While many conversational agents act as personal assistants, they can have many different goals. Some are task-oriented, such as providing customer support for a bank or making a reservation. Others are designed to be empathetic and to form emotional connections with the user. The Alexa Prize Challenge aims to create a socialbot, which allows the user to engage in coherent conversations, on a range of popular topics that will interest the user. Here we describe Athena 2.0, UCSC's conversational agent for Amazon's Socialbot Grand Challenge 4. Athena 2.0 utilizes a novel knowledge-grounded discourse model that tracks the entity links that Athena introduces into the dialogue, and uses them to constrain named-entity recognition and linking, and coreference resolution. Athena 2.0 also relies on a user model to personalize topic selection and other aspects of the conversation to individual users. △ Less

Submitted 3 August, 2023; originally announced August 2023.

Comments: Alexa Prize Proceedings, 2021. Socialbot Grand Challenge 4

arXiv:2306.04539 [pdf, other]

Multimodal Learning Without Labeled Multimodal Data: Guarantees and Applications

Authors: Paul Pu Liang, Chun Kai Ling, Yun Cheng, Alex Obolenskiy, Yudong Liu, Rohan Pandey, Alex Wilf, Louis-Philippe Morency, Ruslan Salakhutdinov

Abstract: In many machine learning systems that jointly learn from multiple modalities, a core research question is to understand the nature of multimodal interactions: how modalities combine to provide new task-relevant information that was not present in either alone. We study this challenge of interaction quantification in a semi-supervised setting with only labeled unimodal data and naturally co-occurri… ▽ More In many machine learning systems that jointly learn from multiple modalities, a core research question is to understand the nature of multimodal interactions: how modalities combine to provide new task-relevant information that was not present in either alone. We study this challenge of interaction quantification in a semi-supervised setting with only labeled unimodal data and naturally co-occurring multimodal data (e.g., unlabeled images and captions, video and corresponding audio) but when labeling them is time-consuming. Using a precise information-theoretic definition of interactions, our key contribution is the derivation of lower and upper bounds to quantify the amount of multimodal interactions in this semi-supervised setting. We propose two lower bounds: one based on the shared information between modalities and the other based on disagreement between separately trained unimodal classifiers, and derive an upper bound through connections to approximate algorithms for min-entropy couplings. We validate these estimated bounds and show how they accurately track true interactions. Finally, we show how these theoretical results can be used to estimate multimodal model performance, guide data collection, and select appropriate multimodal models for various tasks. △ Less

Submitted 13 June, 2024; v1 submitted 7 June, 2023; originally announced June 2023.

Comments: ICLR 2024, Code available at: https://github.com/pliang279/PID

arXiv:2305.19088 [pdf, other]

TrueDeep: A systematic approach of crack detection with less data

Authors: Ram Krishna Pandey, Akshit Achara

Abstract: Supervised and semi-supervised semantic segmentation algorithms require significant amount of annotated data to achieve a good performance. In many situations, the data is either not available or the annotation is expensive. The objective of this work is to show that by incorporating domain knowledge along with deep learning architectures, we can achieve similar performance with less data. We have… ▽ More Supervised and semi-supervised semantic segmentation algorithms require significant amount of annotated data to achieve a good performance. In many situations, the data is either not available or the annotation is expensive. The objective of this work is to show that by incorporating domain knowledge along with deep learning architectures, we can achieve similar performance with less data. We have used publicly available crack segmentation datasets and shown that selecting the input images using knowledge can significantly boost the performance of deep-learning based architectures. Our proposed approaches have many fold advantages such as low annotation and training cost, and less energy consumption. We have measured the performance of our algorithm quantitatively in terms of mean intersection over union (mIoU) and F score. Our algorithms, developed with 23% of the overall data; have a similar performance on the test data and significantly better performance on multiple blind datasets. △ Less

Submitted 30 May, 2023; originally announced May 2023.

arXiv:2305.16328 [pdf, other]

Semantic Composition in Visually Grounded Language Models

Authors: Rohan Pandey

Abstract: What is sentence meaning and its ideal representation? Much of the expressive power of human language derives from semantic composition, the mind's ability to represent meaning hierarchically & relationally over constituents. At the same time, much sentential meaning is outside the text and requires grounding in sensory, motor, and experiential modalities to be adequately learned. Although large l… ▽ More What is sentence meaning and its ideal representation? Much of the expressive power of human language derives from semantic composition, the mind's ability to represent meaning hierarchically & relationally over constituents. At the same time, much sentential meaning is outside the text and requires grounding in sensory, motor, and experiential modalities to be adequately learned. Although large language models display considerable compositional ability, recent work shows that visually-grounded language models drastically fail to represent compositional structure. In this thesis, we explore whether & how models compose visually grounded semantics, and how we might improve their ability to do so. Specifically, we introduce 1) WinogroundVQA, a new compositional visual question answering benchmark, 2) Syntactic Neural Module Distillation, a measure of compositional ability in sentence embedding models, 3) Causal Tracing for Image Captioning Models to locate neural representations vital for vision-language composition, 4) Syntactic MeanPool to inject a compositional inductive bias into sentence embeddings, and 5) Cross-modal Attention Congruence Regularization, a self-supervised objective function for vision-language relation alignment. We close by discussing connections of our work to neuroscience, psycholinguistics, formal semantics, and philosophy. △ Less

Submitted 14 May, 2023; originally announced May 2023.

Comments: Carnegie Mellon University Senior Thesis. arXiv admin note: substantial text overlap with arXiv:2212.10549

arXiv:2305.11633 [pdf, other]

Goal-Oriented Communications in Federated Learning via Feedback on Risk-Averse Participation

Authors: Shashi Raj Pandey, Van Phuc Bui, Petar Popovski

Abstract: We treat the problem of client selection in a Federated Learning (FL) setup, where the learning objective and the local incentives of the participants are used to formulate a goal-oriented communication problem. Specifically, we incorporate the risk-averse nature of participants and obtain a communication-efficient on-device performance, while relying on feedback from the Parameter Server (\texttt… ▽ More We treat the problem of client selection in a Federated Learning (FL) setup, where the learning objective and the local incentives of the participants are used to formulate a goal-oriented communication problem. Specifically, we incorporate the risk-averse nature of participants and obtain a communication-efficient on-device performance, while relying on feedback from the Parameter Server (\texttt{PS}). A client has to decide its transmission plan on when not to participate in FL. This is based on its intrinsic incentive, which is the value of the trained global model upon participation by this client. Poor updates not only plunge the performance of the global model with added communication cost but also propagate the loss in performance on other participating devices. We cast the relevance of local updates as \emph{semantic information} for developing local transmission strategies, i.e., making a decision on when to ``not transmit". The devices use feedback about the state of the PS and evaluate their contributions in training the learning model in each aggregation period, which eventually lowers the number of occupied connections. Simulation results validate the efficacy of our proposed approach, with up to $1.4\times$ gain in communication links utilization as compared with the baselines. △ Less

Submitted 19 May, 2023; originally announced May 2023.

Journal ref: PIMRC 2023, WS NAISC

arXiv:2305.04745 [pdf, other]

Controllable Light Diffusion for Portraits

Authors: David Futschik, Kelvin Ritland, James Vecore, Sean Fanello, Sergio Orts-Escolano, Brian Curless, Daniel Sýkora, Rohit Pandey

Abstract: We introduce light diffusion, a novel method to improve lighting in portraits, softening harsh shadows and specular highlights while preserving overall scene illumination. Inspired by professional photographers' diffusers and scrims, our method softens lighting given only a single portrait photo. Previous portrait relighting approaches focus on changing the entire lighting environment, removing sh… ▽ More We introduce light diffusion, a novel method to improve lighting in portraits, softening harsh shadows and specular highlights while preserving overall scene illumination. Inspired by professional photographers' diffusers and scrims, our method softens lighting given only a single portrait photo. Previous portrait relighting approaches focus on changing the entire lighting environment, removing shadows (ignoring strong specular highlights), or removing shading entirely. In contrast, we propose a learning based method that allows us to control the amount of light diffusion and apply it on in-the-wild portraits. Additionally, we design a method to synthetically generate plausible external shadows with sub-surface scattering effects while conforming to the shape of the subject's face. Finally, we show how our approach can increase the robustness of higher level vision applications, such as albedo estimation, geometry estimation and semantic segmentation. △ Less

Submitted 8 May, 2023; originally announced May 2023.

Comments: CVPR 2023

ACM Class: I.4.3

arXiv:2304.01436 [pdf, other]

Learning Personalized High Quality Volumetric Head Avatars from Monocular RGB Videos

Authors: Ziqian Bai, Feitong Tan, Zeng Huang, Kripasindhu Sarkar, Danhang Tang, Di Qiu, Abhimitra Meka, Ruofei Du, Mingsong Dou, Sergio Orts-Escolano, Rohit Pandey, Ping Tan, Thabo Beeler, Sean Fanello, Yinda Zhang

Abstract: We propose a method to learn a high-quality implicit 3D head avatar from a monocular RGB video captured in the wild. The learnt avatar is driven by a parametric face model to achieve user-controlled facial expressions and head poses. Our hybrid pipeline combines the geometry prior and dynamic tracking of a 3DMM with a neural radiance field to achieve fine-grained control and photorealism. To reduc… ▽ More We propose a method to learn a high-quality implicit 3D head avatar from a monocular RGB video captured in the wild. The learnt avatar is driven by a parametric face model to achieve user-controlled facial expressions and head poses. Our hybrid pipeline combines the geometry prior and dynamic tracking of a 3DMM with a neural radiance field to achieve fine-grained control and photorealism. To reduce over-smoothing and improve out-of-model expressions synthesis, we propose to predict local features anchored on the 3DMM geometry. These learnt features are driven by 3DMM deformation and interpolated in 3D space to yield the volumetric radiance at a designated query point. We further show that using a Convolutional Neural Network in the UV space is critical in incorporating spatial context and producing representative local features. Extensive experiments show that we are able to reconstruct high-quality avatars, with more accurate expression-dependent details, good generalization to out-of-training expressions, and quantitatively superior renderings compared to other state-of-the-art approaches. △ Less

Submitted 3 April, 2023; originally announced April 2023.

Comments: In CVPR2023. Project page: https://augmentedperception.github.io/monoavatar/

arXiv:2303.17131 [pdf, other]

doi 10.1109/ICASSP49357.2023.10096062

PROCTER: PROnunciation-aware ConTextual adaptER for personalized speech recognition in neural transducers

Authors: Rahul Pandey, Roger Ren, Qi Luo, Jing Liu, Ariya Rastrow, Ankur Gandhe, Denis Filimonov, Grant Strimel, Andreas Stolcke, Ivan Bulyko

Abstract: End-to-End (E2E) automatic speech recognition (ASR) systems used in voice assistants often have difficulties recognizing infrequent words personalized to the user, such as names and places. Rare words often have non-trivial pronunciations, and in such cases, human knowledge in the form of a pronunciation lexicon can be useful. We propose a PROnunCiation-aware conTextual adaptER (PROCTER) that dyna… ▽ More End-to-End (E2E) automatic speech recognition (ASR) systems used in voice assistants often have difficulties recognizing infrequent words personalized to the user, such as names and places. Rare words often have non-trivial pronunciations, and in such cases, human knowledge in the form of a pronunciation lexicon can be useful. We propose a PROnunCiation-aware conTextual adaptER (PROCTER) that dynamically injects lexicon knowledge into an RNN-T model by adding a phonemic embedding along with a textual embedding. The experimental results show that the proposed PROCTER architecture outperforms the baseline RNN-T model by improving the word error rate (WER) by 44% and 57% when measured on personalized entities and personalized rare entities, respectively, while increasing the model size (number of trainable parameters) by only 1%. Furthermore, when evaluated in a zero-shot setting to recognize personalized device names, we observe 7% WER improvement with PROCTER, as compared to only 1% WER improvement with text-only contextual attention △ Less

Submitted 29 March, 2023; originally announced March 2023.

Comments: To appear in Proc. IEEE ICASSP

Journal ref: Proc. IEEE ICASSP, June 2023

arXiv:2302.01672 [pdf, other]

The Role of Game Networking in the Fusion of Physical and Digital Worlds through 6G Wireless Networks

Authors: Van-Phuc Bui, Shashi Raj Pandey, Andreas Casparsen, Federico Chiariotti, Petar Popovski

Abstract: The sixth generation (6G) of wireless technology is seen as one of the enablers of real-time fusion of the physical and digital realms, as in Digital Twin, eXtended reality, or the Metaverse. This would allow people to interact, work, and entertain themselves in an immersive social network of online 3D~virtual environments. From the viewpoint of communication and networking, this will represent an… ▽ More The sixth generation (6G) of wireless technology is seen as one of the enablers of real-time fusion of the physical and digital realms, as in Digital Twin, eXtended reality, or the Metaverse. This would allow people to interact, work, and entertain themselves in an immersive social network of online 3D~virtual environments. From the viewpoint of communication and networking, this will represent an evolution of the \emph{game networking} technology, designed to interconnect massive users in real-time online gaming environments. This article presents the basic principles of game networking and discusses their evolution towards meeting the requirements of the Metaverse and similar applications. Several open research challenges are discussed, along with possible solutions through experimental case studies. △ Less

Submitted 28 August, 2024; v1 submitted 3 February, 2023; originally announced February 2023.

arXiv:2301.08998 [pdf, other]

Syntax-guided Neural Module Distillation to Probe Compositionality in Sentence Embeddings

Authors: Rohan Pandey

Abstract: Past work probing compositionality in sentence embedding models faces issues determining the causal impact of implicit syntax representations. Given a sentence, we construct a neural module net based on its syntax parse and train it end-to-end to approximate the sentence's embedding generated by a transformer model. The distillability of a transformer to a Syntactic NeurAl Module Net (SynNaMoN) th… ▽ More Past work probing compositionality in sentence embedding models faces issues determining the causal impact of implicit syntax representations. Given a sentence, we construct a neural module net based on its syntax parse and train it end-to-end to approximate the sentence's embedding generated by a transformer model. The distillability of a transformer to a Syntactic NeurAl Module Net (SynNaMoN) then captures whether syntax is a strong causal model of its compositional ability. Furthermore, we address questions about the geometry of semantic composition by specifying individual SynNaMoN modules' internal architecture & linearity. We find differences in the distillability of various sentence embedding models that broadly correlate with their performance, but observe that distillability doesn't considerably vary by model size. We also present preliminary evidence that much syntax-guided composition in sentence embedding models is linear, and that non-linearities may serve primarily to handle non-compositional phrases. △ Less

Submitted 8 February, 2023; v1 submitted 21 January, 2023; originally announced January 2023.

Comments: EACL 2023 (camera-ready)

arXiv:2212.10549 [pdf, other]

Cross-modal Attention Congruence Regularization for Vision-Language Relation Alignment

Authors: Rohan Pandey, Rulin Shao, Paul Pu Liang, Ruslan Salakhutdinov, Louis-Philippe Morency

Abstract: Despite recent progress towards scaling up multimodal vision-language models, these models are still known to struggle on compositional generalization benchmarks such as Winoground. We find that a critical component lacking from current vision-language models is relation-level alignment: the ability to match directional semantic relations in text (e.g., "mug in grass") with spatial relationships i… ▽ More Despite recent progress towards scaling up multimodal vision-language models, these models are still known to struggle on compositional generalization benchmarks such as Winoground. We find that a critical component lacking from current vision-language models is relation-level alignment: the ability to match directional semantic relations in text (e.g., "mug in grass") with spatial relationships in the image (e.g., the position of the mug relative to the grass). To tackle this problem, we show that relation alignment can be enforced by encouraging the directed language attention from 'mug' to 'grass' (capturing the semantic relation 'in') to match the directed visual attention from the mug to the grass. Tokens and their corresponding objects are softly identified using the cross-modal attention. We prove that this notion of soft relation alignment is equivalent to enforcing congruence between vision and language attention matrices under a 'change of basis' provided by the cross-modal attention matrix. Intuitively, our approach projects visual attention into the language attention space to calculate its divergence from the actual language attention, and vice versa. We apply our Cross-modal Attention Congruence Regularization (CACR) loss to UNITER and improve on the state-of-the-art approach to Winoground. △ Less

Submitted 4 July, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

Comments: ACL 2023

arXiv:2212.03470 [pdf, other]

Improving trajectory localization accuracy via direction-of-arrival derivative estimation

Authors: Ruchi Pandey, Shreyas Jaiswal, Huy Phan, Santosh Nannuru

Abstract: Sound source localization is crucial in acoustic sensing and monitoring-related applications. In this paper, we do a comprehensive analysis of improvement in sound source localization by combining the direction of arrivals (DOAs) with their derivatives which quantify the changes in the positions of sources over time. This study uses the SALSA-Lite feature with a convolutional recurrent neural netw… ▽ More Sound source localization is crucial in acoustic sensing and monitoring-related applications. In this paper, we do a comprehensive analysis of improvement in sound source localization by combining the direction of arrivals (DOAs) with their derivatives which quantify the changes in the positions of sources over time. This study uses the SALSA-Lite feature with a convolutional recurrent neural network (CRNN) model for predicting DOAs and their first-order derivatives. An update rule is introduced to combine the predicted DOAs with the estimated derivatives to obtain the final DOAs. The experimental validation is done using TAU-NIGENS Spatial Sound Events (TNSSE) 2021 dataset. We compare the performance of the networks predicting DOAs with derivative vs. the one predicting only the DOAs at low SNR levels. The results show that combining the derivatives with the DOAs improves the localization accuracy of moving sources. △ Less

Submitted 10 December, 2022; v1 submitted 7 December, 2022; originally announced December 2022.

arXiv:2209.09775 [pdf, other]

FedToken: Tokenized Incentives for Data Contribution in Federated Learning

Authors: Shashi Raj Pandey, Lam Duc Nguyen, Petar Popovski

Abstract: Incentives that compensate for the involved costs in the decentralized training of a Federated Learning (FL) model act as a key stimulus for clients' long-term participation. However, it is challenging to convince clients for quality participation in FL due to the absence of: (i) full information on the client's data quality and properties; (ii) the value of client's data contributions; and (iii)… ▽ More Incentives that compensate for the involved costs in the decentralized training of a Federated Learning (FL) model act as a key stimulus for clients' long-term participation. However, it is challenging to convince clients for quality participation in FL due to the absence of: (i) full information on the client's data quality and properties; (ii) the value of client's data contributions; and (iii) the trusted mechanism for monetary incentive offers. This often leads to poor efficiency in training and communication. While several works focus on strategic incentive designs and client selection to overcome this problem, there is a major knowledge gap in terms of an overall design tailored to the foreseen digital economy, including Web 3.0, while simultaneously meeting the learning objectives. To address this gap, we propose a contribution-based tokenized incentive scheme, namely \texttt{FedToken}, backed by blockchain technology that ensures fair allocation of tokens amongst the clients that corresponds to the valuation of their data during model training. Leveraging the engineered Shapley-based scheme, we first approximate the contribution of local models during model aggregation, then strategically schedule clients lowering the communication rounds for convergence and anchor ways to allocate \emph{affordable} tokens under a constrained monetary budget. Extensive simulations demonstrate the efficacy of our proposed method. △ Less

Submitted 3 November, 2022; v1 submitted 20 September, 2022; originally announced September 2022.

Comments: Accepted at Workshop on Federated Learning: Recent Advances and New Challenges, in Conjunction with NeurIPS 2022 (FL-NeurIPS'22). 9 Pages, 5 Figures

arXiv:2209.04648 [pdf, other]

CoreDeep: Improving Crack Detection Algorithms Using Width Stochasticity

Authors: Ram Krishna Pandey, Akshit Achara

Abstract: Automatically detecting or segmenting cracks in images can help in reducing the cost of maintenance or operations. Detecting, measuring and quantifying cracks for distress analysis in challenging background scenarios is a difficult task as there is no clear boundary that separates cracks from the background. Developed algorithms should handle the inherent challenges associated with data. Some of t… ▽ More Automatically detecting or segmenting cracks in images can help in reducing the cost of maintenance or operations. Detecting, measuring and quantifying cracks for distress analysis in challenging background scenarios is a difficult task as there is no clear boundary that separates cracks from the background. Developed algorithms should handle the inherent challenges associated with data. Some of the perceptually noted challenges are color, intensity, depth, blur, motion-blur, orientation, different region of interest (ROI) for the defect, scale, illumination, complex and challenging background, etc. These variations occur across (crack inter class) and within images (crack intra-class variabilities). Overall, there is significant background (inter) and foreground (intra-class) variability. In this work, we have attempted to reduce the effect of these variations in challenging background scenarios. We have proposed a stochastic width (SW) approach to reduce the effect of these variations. Our proposed approach improves detectability and significantly reduces false positives and negatives. We have measured the performance of our algorithm objectively in terms of mean IoU, false positives and negatives and subjectively in terms of perceptual quality. △ Less

Submitted 29 December, 2023; v1 submitted 10 September, 2022; originally announced September 2022.

arXiv:2206.07785 [pdf, other]

doi 10.1109/JIOT.2023.3310660

Strategic Coalition for Data Pricing in IoT Data Markets

Authors: Shashi Raj Pandey, Pierre Pinson, Petar Popovski

Abstract: This paper considers a market for trading Internet of Things (IoT) data that is used to train machine learning models. The data, either raw or processed, is supplied to the market platform through a network and the price of such data is controlled based on the value it brings to the machine learning model. We explore the correlation property of data in a game-theoretical setting to eventually deri… ▽ More This paper considers a market for trading Internet of Things (IoT) data that is used to train machine learning models. The data, either raw or processed, is supplied to the market platform through a network and the price of such data is controlled based on the value it brings to the machine learning model. We explore the correlation property of data in a game-theoretical setting to eventually derive a simplified distributed solution for a data trading mechanism that emphasizes the mutual benefit of devices and the market. The key proposal is an efficient algorithm for markets that jointly addresses the challenges of availability and heterogeneity in participation, as well as the transfer of trust and the economic value of data exchange in IoT networks. The proposed approach establishes the data market by reinforcing collaboration opportunities between device with correlated data to avoid information leakage. Therein, we develop a network-wide optimization problem that maximizes the social value of coalition among the IoT devices of similar data types; at the same time, it minimizes the cost due to network externalities, i.e., the impact of information leakage due to data correlation, as well as the opportunity costs. Finally, we reveal the structure of the formulated problem as a distributed coalition game and solve it following the simplified split-and-merge algorithm. Simulation results show the efficacy of our proposed mechanism design toward a trusted IoT data market, with up to 32.72% gain in the average payoff for each seller. △ Less

Submitted 29 August, 2023; v1 submitted 15 June, 2022; originally announced June 2022.

Comments: 15 pages. 12 figures. This paper has been accepted for publication in IEEE Internet of Things Journal. Copyright may change without notice

arXiv:2206.06650 [pdf, other]

doi 10.1109/TIFS.2023.3259879

Semi-Private Computation of Data Similarity with Applications to Data Valuation and Pricing

Authors: René Bødker Christensen, Shashi Raj Pandey, Petar Popovski

Abstract: Consider two data providers that want to contribute data to a certain learning model. Recent works have shown that the value of the data of one of the providers is dependent on the similarity with the data owned by the other provider. It would thus be beneficial if the two providers can calculate the similarity of their data, while keeping the actual data private. In this work, we devise multipart… ▽ More Consider two data providers that want to contribute data to a certain learning model. Recent works have shown that the value of the data of one of the providers is dependent on the similarity with the data owned by the other provider. It would thus be beneficial if the two providers can calculate the similarity of their data, while keeping the actual data private. In this work, we devise multiparty computation-protocols to compute similarity of two data sets based on correlation, while offering controllable privacy guarantees. We consider a simple model with two participating providers and develop methods to compute exact and approximate correlation, respectively, with controlled information leakage. Both protocols have computational and communication complexities that are linear in the number of data samples. We also provide general bounds on the maximal error in the approximation case, and analyse the resulting errors for practical parameter choices. △ Less

Submitted 11 April, 2023; v1 submitted 14 June, 2022; originally announced June 2022.

Comments: 11 pages

MSC Class: 94A60; 68P27

Journal ref: IEEE Transactions on Information Forensics and Security (2023). Vol 18, pp. 1978-1988

arXiv:2204.01542 [pdf, other]

CDKT-FL: Cross-Device Knowledge Transfer using Proxy Dataset in Federated Learning

Authors: Huy Q. Le, Minh N. H. Nguyen, Shashi Raj Pandey, Chaoning Zhang, Choong Seon Hong

Abstract: In a practical setting, how to enable robust Federated Learning (FL) systems, both in terms of generalization and personalization abilities, is one important research question. It is a challenging issue due to the consequences of non-i.i.d. properties of client's data, often referred to as statistical heterogeneity, and small local data samples from the various data distributions. Therefore, to de… ▽ More In a practical setting, how to enable robust Federated Learning (FL) systems, both in terms of generalization and personalization abilities, is one important research question. It is a challenging issue due to the consequences of non-i.i.d. properties of client's data, often referred to as statistical heterogeneity, and small local data samples from the various data distributions. Therefore, to develop robust generalized global and personalized models, conventional FL methods need to redesign the knowledge aggregation from biased local models while considering huge divergence of learning parameters due to skewed client data. In this work, we demonstrate that the knowledge transfer mechanism achieves these objectives and develop a novel knowledge distillation-based approach to study the extent of knowledge transfer between the global model and local models. Henceforth, our method considers the suitability of transferring the outcome distribution and (or) the embedding vector of representation from trained models during cross-device knowledge transfer using a small proxy dataset in heterogeneous FL. In doing so, we alternatively perform cross-device knowledge transfer following general formulations as 1) global knowledge transfer and 2) on-device knowledge transfer. Through simulations on three federated datasets, we show the proposed method achieves significant speedups and high personalized performance of local models. Furthermore, the proposed approach offers a more stable algorithm than other baselines during the training, with minimal communication data load when exchanging the trained model's outcomes and representation. △ Less

Submitted 8 June, 2024; v1 submitted 4 April, 2022; originally announced April 2022.

Comments: Accepted to Engineering Applications of Artificial Intelligence (EAAI)

arXiv:2203.05369 [pdf, ps, other]

doi 10.1109/LCOMM.2022.3181678

A Contribution-based Device Selection Scheme in Federated Learning

Authors: Shashi Raj Pandey, Lam D. Nguyen, Petar Popovski

Abstract: In a Federated Learning (FL) setup, a number of devices contribute to the training of a common model. We present a method for selecting the devices that provide updates in order to achieve improved generalization, fast convergence, and better device-level performance. We formulate a min-max optimization problem and decompose it into a primal-dual setup, where the duality gap is used to quantify th… ▽ More In a Federated Learning (FL) setup, a number of devices contribute to the training of a common model. We present a method for selecting the devices that provide updates in order to achieve improved generalization, fast convergence, and better device-level performance. We formulate a min-max optimization problem and decompose it into a primal-dual setup, where the duality gap is used to quantify the device-level performance. Our strategy combines \emph{exploration} of data freshness through a random device selection with \emph{exploitation} through simplified estimates of device contributions. This improves the performance of the trained model both in terms of generalization and personalization. A modified Truncated Monte-Carlo (TMC) method is applied during the exploitation phase to estimate the device's contribution and lower the communication overhead. The experimental results show that the proposed approach has a competitive performance, with lower communication overhead and competitive personalization performance against the baseline schemes. △ Less

Submitted 7 June, 2022; v1 submitted 10 March, 2022; originally announced March 2022.

Comments: This work has been accepted for publication in IEEE Communications Letters

arXiv:2201.04873 [pdf, other]

VoLux-GAN: A Generative Model for 3D Face Synthesis with HDRI Relighting

Authors: Feitong Tan, Sean Fanello, Abhimitra Meka, Sergio Orts-Escolano, Danhang Tang, Rohit Pandey, Jonathan Taylor, Ping Tan, Yinda Zhang

Abstract: We propose VoLux-GAN, a generative framework to synthesize 3D-aware faces with convincing relighting. Our main contribution is a volumetric HDRI relighting method that can efficiently accumulate albedo, diffuse and specular lighting contributions along each 3D ray for any desired HDR environmental map. Additionally, we show the importance of supervising the image decomposition process using multip… ▽ More We propose VoLux-GAN, a generative framework to synthesize 3D-aware faces with convincing relighting. Our main contribution is a volumetric HDRI relighting method that can efficiently accumulate albedo, diffuse and specular lighting contributions along each 3D ray for any desired HDR environmental map. Additionally, we show the importance of supervising the image decomposition process using multiple discriminators. In particular, we propose a data augmentation technique that leverages recent advances in single image portrait relighting to enforce consistent geometry, albedo, diffuse and specular components. Multiple experiments and comparisons with other generative frameworks show how our model is a step forward towards photorealistic relightable 3D generative models. △ Less

Submitted 13 January, 2022; originally announced January 2022.

arXiv:2112.02870 [pdf, other]

A Marketplace for Trading AI Models based on Blockchain and Incentives for IoT Data

Authors: Lam Duc Nguyen, Shashi Raj Pandey, Soret Beatriz, Arne Broering, Petar Popovski

Abstract: As Machine Learning (ML) models are becoming increasingly complex, one of the central challenges is their deployment at scale, such that companies and organizations can create value through Artificial Intelligence (AI). An emerging paradigm in ML is a federated approach where the learning model is delivered to a group of heterogeneous agents partially, allowing agents to train the model locally wi… ▽ More As Machine Learning (ML) models are becoming increasingly complex, one of the central challenges is their deployment at scale, such that companies and organizations can create value through Artificial Intelligence (AI). An emerging paradigm in ML is a federated approach where the learning model is delivered to a group of heterogeneous agents partially, allowing agents to train the model locally with their own data. However, the problem of valuation of models, as well the questions of incentives for collaborative training and trading of data/models, have received limited treatment in the literature. In this paper, a new ecosystem of ML model trading over a trusted Blockchain-based network is proposed. The buyer can acquire the model of interest from the ML market, and interested sellers spend local computations on their data to enhance that model's quality. In doing so, the proportional relation between the local data and the quality of trained models is considered, and the valuations of seller's data in training the models are estimated through the distributed Data Shapley Value (DSV). At the same time, the trustworthiness of the entire trading process is provided by the distributed Ledger Technology (DLT). Extensive experimental evaluation of the proposed approach shows a competitive run-time performance, with a 15\% drop in the cost of execution, and fairness in terms of incentives for the participants. △ Less

Submitted 6 December, 2021; originally announced December 2021.

Comments: 14 pages, 9 figures, submitted for publication

arXiv:2111.05811 [pdf, other]

Internet of Things (IoT) Connectivity in 6G: An Interplay of Time, Space, Intelligence, and Value

Authors: Petar Popovski, Federico Chiariotti, Victor Croisfelt, Anders E. Kalør, Israel Leyva-Mayorga, Letizia Marchegiani, Shashi Raj Pandey, Beatriz Soret

Abstract: Internet of Things (IoT) connectivity has a prominent presence in the 5G wireless communication systems. As these systems are being deployed, there is a surge of research efforts and visions towards 6G wireless systems. In order to position the evolution of IoT within the 6G systems, this paper first takes a critical view on the way IoT connectivity is supported within 5G. Following that, the wire… ▽ More Internet of Things (IoT) connectivity has a prominent presence in the 5G wireless communication systems. As these systems are being deployed, there is a surge of research efforts and visions towards 6G wireless systems. In order to position the evolution of IoT within the 6G systems, this paper first takes a critical view on the way IoT connectivity is supported within 5G. Following that, the wireless IoT evolution is discussed through multiple dimensions: time, space, intelligence, and value. We also conjecture that the focus will broaden from IoT devices and their connections towards the emergence of complex IoT environments, seen as building blocks of the overall IoT ecosystem. △ Less

Submitted 11 November, 2021; v1 submitted 10 November, 2021; originally announced November 2021.

Comments: Submitted for publication

arXiv:2111.02519 [pdf, other]

Athena 2.0: Contextualized Dialogue Management for an Alexa Prize SocialBot

Authors: Juraj Juraska, Kevin K. Bowden, Lena Reed, Vrindavan Harrison, Wen Cui, Omkar Patil, Rishi Rajasekaran, Angela Ramirez, Cecilia Li, Eduardo Zamora, Phillip Lee, Jeshwanth Bheemanpally, Rohan Pandey, Adwait Ratnaparkhi, Marilyn Walker

Abstract: Athena 2.0 is an Alexa Prize SocialBot that has been a finalist in the last two Alexa Prize Grand Challenges. One reason for Athena's success is its novel dialogue management strategy, which allows it to dynamically construct dialogues and responses from component modules, leading to novel conversations with every interaction. Here we describe Athena's system design and performance in the Alexa Pr… ▽ More Athena 2.0 is an Alexa Prize SocialBot that has been a finalist in the last two Alexa Prize Grand Challenges. One reason for Athena's success is its novel dialogue management strategy, which allows it to dynamically construct dialogues and responses from component modules, leading to novel conversations with every interaction. Here we describe Athena's system design and performance in the Alexa Prize during the 20/21 competition. A live demo of Athena as well as video recordings will provoke discussion on the state of the art in conversational AI. △ Less

Submitted 3 November, 2021; originally announced November 2021.

Comments: Accepted to EMNLP 2021 System Demonstrations

arXiv:2108.10316 [pdf, ps, other]

A Generalization of the ASR Search Algorithm to 2-Generator Quasi-Twisted Codes

Authors: Dev Akre, Nuh Aydin, Matthew J. Harrington, Saurav R. Pandey

Abstract: One of the main goals of coding theory is to construct codes with best possible parameters and properties. A special class of codes called quasi-twisted (QT) codes is well-known to produce codes with good parameters. Most of the work on QT codes has been over the 1-generator case. In this work, we focus on 2-generator QT codes and generalize the ASR algorithm that has been very effective to produc… ▽ More One of the main goals of coding theory is to construct codes with best possible parameters and properties. A special class of codes called quasi-twisted (QT) codes is well-known to produce codes with good parameters. Most of the work on QT codes has been over the 1-generator case. In this work, we focus on 2-generator QT codes and generalize the ASR algorithm that has been very effective to produce new linear codes from 1-generator QT codes. Moreover, we also generalize a recent algorithm to test equivalence of cyclic codes to constacyclic codes. This algorithm makes the ASR search even more effective. As a result of implementing our algorithm, we have found 103 QT codes that are new among the class of QT codes. Additionally, most of these codes possess the following additional properties: a) they have the same parameters as best known linear codes, and b) many of the have additional desired properties such as being LCD and dual-containing. Further, we have also found a binary 2-generator QT code that is new (record breaking) among all binary linear codes and its extension yields another record breaking binary linear code. △ Less

Submitted 20 August, 2021; originally announced August 2021.

arXiv:2108.06752 [pdf, ps, other]

New Binary and Ternary Quasi-Cyclic Codes with Good Properties

Authors: Dev Akre, Nuh Aydin, Matthew J. Harrington, Saurav R. Pandey

Abstract: One of the most important and challenging problems in coding theory is to construct codes with best possible parameters and properties. The class of quasi-cyclic (QC) codes is known to be fertile to produce such codes. Focusing on QC codes over the binary field, we have found 113 binary QC codes that are new among the class of QC codes using an implementation of a fast cyclic partitioning algorith… ▽ More One of the most important and challenging problems in coding theory is to construct codes with best possible parameters and properties. The class of quasi-cyclic (QC) codes is known to be fertile to produce such codes. Focusing on QC codes over the binary field, we have found 113 binary QC codes that are new among the class of QC codes using an implementation of a fast cyclic partitioning algorithm and the highly effective ASR algorithm. Moreover, these codes have the following additional properties: a) they have the same parameters as best known linear codes, and b) many of the have additional desired properties such as being reversible, LCD, self-orthogonal or dual-containing. Additionally, we present an algorithm for the generation of new codes from QC codes using ConstructionX, and introduce 35 new record breaking linear codes produced from this method. △ Less

Submitted 15 August, 2021; originally announced August 2021.

MSC Class: 94B05; 94B15; 94B65

arXiv:2108.04208 [pdf, other]

Experiences with the Introduction of AI-based Tools for Moderation Automation of Voice-based Participatory Media Forums

Authors: Aman Khullar, Paramita Panjal, Rachit Pandey, Abhishek Burnwal, Prashit Raj, Ankit Akash Jha, Priyadarshi Hitesh, R Jayanth Reddy, Himanshu, Aaditeshwar Seth

Abstract: Voice-based discussion forums where users can record audio messages which are then published for other users to listen and comment, are often moderated to ensure that the published audios are of good quality, relevant, and adhere to editorial guidelines of the forum. There is room for the introduction of AI-based tools in the moderation process, such as to identify and filter out blank or noisy au… ▽ More Voice-based discussion forums where users can record audio messages which are then published for other users to listen and comment, are often moderated to ensure that the published audios are of good quality, relevant, and adhere to editorial guidelines of the forum. There is room for the introduction of AI-based tools in the moderation process, such as to identify and filter out blank or noisy audios, use speech recognition to transcribe the voice messages in text, and use natural language processing techniques to extract relevant metadata from the audio transcripts. We design such tools and deploy them within a social enterprise working in India that runs several voice-based discussion forums. We present our findings in terms of the time and cost-savings made through the introduction of these tools, and describe the feedback of the moderators towards the acceptability of AI-based automation in their workflow. Our work forms a case-study in the use of AI for automation of several routine tasks, and can be especially relevant for other researchers and practitioners involved with the use of voice-based technologies in developing regions of the world. △ Less

Submitted 9 August, 2021; originally announced August 2021.

Journal ref: The 3rd KDD Workshop on Data Science for Social Good, 2021

arXiv:2103.15573 [pdf, other]

HumanGPS: Geodesic PreServing Feature for Dense Human Correspondences

Authors: Feitong Tan, Danhang Tang, Mingsong Dou, Kaiwen Guo, Rohit Pandey, Cem Keskin, Ruofei Du, Deqing Sun, Sofien Bouaziz, Sean Fanello, Ping Tan, Yinda Zhang

Abstract: In this paper, we address the problem of building dense correspondences between human images under arbitrary camera viewpoints and body poses. Prior art either assumes small motion between frames or relies on local descriptors, which cannot handle large motion or visually ambiguous body parts, e.g., left vs. right hand. In contrast, we propose a deep learning framework that maps each pixel to a fe… ▽ More In this paper, we address the problem of building dense correspondences between human images under arbitrary camera viewpoints and body poses. Prior art either assumes small motion between frames or relies on local descriptors, which cannot handle large motion or visually ambiguous body parts, e.g., left vs. right hand. In contrast, we propose a deep learning framework that maps each pixel to a feature space, where the feature distances reflect the geodesic distances among pixels as if they were projected onto the surface of a 3D human scan. To this end, we introduce novel loss functions to push features apart according to their geodesic distances on the surface. Without any semantic annotation, the proposed embeddings automatically learn to differentiate visually similar parts and align different subjects into an unified feature space. Extensive experiments show that the learned embeddings can produce accurate correspondences between images with remarkable generalization capabilities on both intra and inter subjects. △ Less

Submitted 29 March, 2021; originally announced March 2021.

arXiv:2103.13293 [pdf, other]

Energy-aware Resource Management for Federated Learning in Multi-access Edge Computing Systems

Authors: Chit Wutyee Zaw, Shashi Raj Pandey, Kitae Kim, Choong Seon Hong

Abstract: In Federated Learning (FL), a global statistical model is developed by encouraging mobile users to perform the model training on their local data and aggregating the output local model parameters in an iterative manner. However, due to limited energy and computation capability at the mobile devices, the performance of the model training is always at stake to meet the objective of local energy mini… ▽ More In Federated Learning (FL), a global statistical model is developed by encouraging mobile users to perform the model training on their local data and aggregating the output local model parameters in an iterative manner. However, due to limited energy and computation capability at the mobile devices, the performance of the model training is always at stake to meet the objective of local energy minimization. In this regard, Multi-access Edge Computing (MEC)-enabled FL addresses the tradeoff between the model performance and the energy consumption of the mobile devices by allowing users to offload a portion of their local dataset to an edge server for the model training. Since the edge server has high computation capability, the time consumption of the model training at the edge server is insignificant. However, the time consumption for dataset offloading from mobile users to the edge server has a significant impact on the total time consumption. Thus, resource management in MEC-enabled FL is challenging, where the objective is to reduce the total time consumption while saving the energy consumption of the mobile devices. In this paper, we formulate an energy-aware resource management for MEC-enabled FL in which the model training loss and the total time consumption are jointly minimized, while considering the energy limitation of mobile devices. In addition, we recast the formulated problem as a Generalized Nash Equilibrium Problem (GNEP) to capture the coupling constraints between the radio resource management and dataset offloading. We then analyze the impact of the dataset offloading and computing resource allocation on the model training loss, time, and the energy consumption. △ Less

Submitted 11 January, 2021; originally announced March 2021.

arXiv:2103.06049 [pdf]

Search Disaster Victims using Sound Source Localization

Authors: Abhish Khanal, Deepak Chand, Prakash Chaudhary, Subash Timilsina, Sanjeeb Prasad Panday, Aman Shakya, Rom Kant Pandey

Abstract: Sound Source Localization (SSL) are used to estimate the position of sound sources. Various methods have been used for detecting sound and its localization. This paper presents a system for stationary sound source localization by cubical microphone array consisting of eight microphones placed on four vertical adjacent faces which is mounted on three wheel omni-directional drive for the inspection… ▽ More Sound Source Localization (SSL) are used to estimate the position of sound sources. Various methods have been used for detecting sound and its localization. This paper presents a system for stationary sound source localization by cubical microphone array consisting of eight microphones placed on four vertical adjacent faces which is mounted on three wheel omni-directional drive for the inspection and monitoring of the disaster victims in disaster areas. The proposed method localizes sound source on a 3D space by grid search method using Generalized Cross Correlation Phase Transform (GCC-PHAT) which is robust when operating in real life scenario where there is lack of visibility. The computed azimuth and elevation angle of victimized human voice are fed to embedded omni-directional drive system which navigates the vehicle automatically towards the stationary sound source. △ Less

Submitted 10 March, 2021; originally announced March 2021.

Comments: 9 pages, 17 figures, 17th ISCRAM Conference Blacksburg, VA, USA

Journal ref: Iscram 2020 1022-1030

arXiv:2012.00425 [pdf, other]

doi 10.1109/JIOT.2021.3085429

Edge-assisted Democratized Learning Towards Federated Analytics

Authors: Shashi Raj Pandey, Minh N. H. Nguyen, Tri Nguyen Dang, Nguyen H. Tran, Kyi Thar, Zhu Han, Choong Seon Hong

Abstract: A recent take towards Federated Analytics (FA), which allows analytical insights of distributed datasets, reuses the Federated Learning (FL) infrastructure to evaluate the summary of model performances across the training devices. However, the current realization of FL adopts single server-multiple client architecture with limited scope for FA, which often results in learning models with poor gene… ▽ More A recent take towards Federated Analytics (FA), which allows analytical insights of distributed datasets, reuses the Federated Learning (FL) infrastructure to evaluate the summary of model performances across the training devices. However, the current realization of FL adopts single server-multiple client architecture with limited scope for FA, which often results in learning models with poor generalization, i.e., an ability to handle new/unseen data, for real-world applications. Moreover, a hierarchical FL structure with distributed computing platforms demonstrates incoherent model performances at different aggregation levels. Therefore, we need to design a robust learning mechanism than the FL that (i) unleashes a viable infrastructure for FA and (ii) trains learning models with better generalization capability. In this work, we adopt the novel democratized learning (Dem-AI) principles and designs to meet these objectives. Firstly, we show the hierarchical learning structure of the proposed edge-assisted democratized learning mechanism, namely Edge-DemLearn, as a practical framework to empower generalization capability in support of FA. Secondly, we validate Edge-DemLearn as a flexible model training mechanism to build a distributed control and aggregation methodology in regions by leveraging the distributed computing infrastructure. The distributed edge computing servers construct regional models, minimize the communication loads, and ensure distributed data analytic application's scalability. To that end, we adhere to a near-optimal two-sided many-to-one matching approach to handle the combinatorial constraints in Edge-DemLearn and solve it for fast knowledge acquisition with optimization of resource allocation and associations between multiple servers and devices. Extensive simulation results on real datasets demonstrate the effectiveness of the proposed methods. △ Less

Submitted 31 May, 2021; v1 submitted 1 December, 2020; originally announced December 2020.

Comments: Accepted for publication in IEEE Internet of Things Journal

Showing 1–50 of 103 results for author: Pandey, R