Search | arXiv e-print repository

Multimodal Methods for Analyzing Learning and Training Environments: A Systematic Literature Review

Authors: Clayton Cohn, Eduardo Davalos, Caleb Vatral, Joyce Horn Fonteles, Hanchen David Wang, Meiyi Ma, Gautam Biswas

Abstract: Recent technological advancements have enhanced our ability to collect and analyze rich multimodal data (e.g., speech, video, and eye gaze) to better inform learning and training experiences. While previous reviews have focused on parts of the multimodal pipeline (e.g., conceptual models and data fusion), a comprehensive literature review on the methods informing multimodal learning and training environments has not been conducted. This literature review provides an in-depth analysis of research methods in these environments, proposing a taxonomy and framework that encapsulates recent methodological advances in this field and characterizes the multimodal domain in terms of five modality groups: Natural Language, Video, Sensors, Human-Centered, and Environment Logs. We introduce a novel data fusion category (mid fusion) and a graph-based technique for refining literature reviews, termed citation graph pruning. Our analysis reveals that leveraging multiple modalities offers a more holistic understanding of the behaviors and outcomes of learners and trainees. Even when multimodality does not enhance predictive accuracy, it often uncovers patterns that contextualize and elucidate unimodal data, revealing subtleties that a single modality may miss. However, there remains a need for further research to bridge the divide between multimodal learning and training studies and foundational AI research.

Submitted 22 August, 2024; originally announced August 2024.

Comments: Submitted to ACM Computing Surveys. Currently under review

arXiv:2407.08021 [pdf, other]

Field Deployment of Multi-Agent Reinforcement Learning Based Variable Speed Limit Controllers

Authors: Yuhang Zhang, Zhiyao Zhang, Marcos Quiñones-Grueiro, William Barbour, Clay Weston, Gautam Biswas, Daniel Work

Abstract: This article presents the first field deployment of a multi-agent reinforcement learning (MARL)-based variable speed limit (VSL) control system on the I-24 freeway near Nashville, Tennessee. We describe how we train MARL agents in a traffic simulator and directly deploy the simulation-based policy on a 17-mile stretch of Interstate 24 with 67 VSL controllers. We use invalid action masking and several safety guards to ensure the posted speed limits satisfy the real-world constraints from the traffic management center and the Tennessee Department of Transportation. From its launch through April 2024, the system has made approximately 10,000,000 decisions on 8,000,000 trips. The analysis of the controller shows that the MARL policy is in control up to 98% of the time without intervention from the safety guards. The time-space diagrams of traffic speed and control commands illustrate how the algorithm behaves during rush hour. Finally, we quantify the domain mismatch between the simulation and real-world data and demonstrate the robustness of the MARL policy to this mismatch.

Submitted 10 July, 2024; originally announced July 2024.
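The abstract mentions invalid action masking without detailing it. A minimal sketch of the general technique, assuming a discrete set of candidate speed limits: actions that violate a constraint have their logits replaced by negative infinity, so they can never be chosen greedily and receive zero probability under a softmax. The option list `SPEED_OPTIONS` and the two rules in `valid_mask` (a maximum step change and a maximum jump above the downstream limit) are hypothetical placeholders, not the constraints actually used on I-24.

```python
import math
from typing import List, Sequence

# Hypothetical discrete action set: candidate posted speed limits (mph).
SPEED_OPTIONS = [30, 40, 50, 60, 70]


def valid_mask(prev_limit: int, downstream_limit: int, max_step: int = 10) -> List[bool]:
    """Flag each option as valid under two hypothetical rules:
    change by at most `max_step` mph from this controller's previous limit,
    and exceed the downstream controller's limit by at most `max_step` mph."""
    return [
        abs(v - prev_limit) <= max_step and v - downstream_limit <= max_step
        for v in SPEED_OPTIONS
    ]


def masked_softmax(logits: Sequence[float], mask: Sequence[bool]) -> List[float]:
    """Softmax over valid actions only; invalid actions get probability 0."""
    valid = [x for x, ok in zip(logits, mask) if ok]
    if not valid:
        raise ValueError("no valid action under the current constraints")
    top = max(valid)  # subtract the max for numerical stability
    exps = [math.exp(x - top) if ok else 0.0 for x, ok in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def masked_greedy_action(logits: Sequence[float], mask: Sequence[bool]) -> int:
    """Greedy choice after masking: invalid logits are replaced by -inf."""
    masked = [x if ok else -math.inf for x, ok in zip(logits, mask)]
    if all(m == -math.inf for m in masked):
        raise ValueError("no valid action under the current constraints")
    return max(range(len(masked)), key=lambda i: masked[i])


# Usage: the raw policy logits favour 60 mph, but the constraints rule it out.
logits = [0.1, 0.2, 0.3, 2.0, 1.5]
mask = valid_mask(prev_limit=50, downstream_limit=40)
action = masked_greedy_action(logits, mask)
print(SPEED_OPTIONS[action])  # 50
```

With a previous limit of 50 mph and a downstream limit of 40 mph, only 40 and 50 mph remain valid, so the masked policy posts 50 mph even though the unmasked argmax would have picked 60 mph. Masking at the logit level keeps the policy's output a proper distribution over the remaining actions.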

arXiv:2406.15283 [pdf, other]

FT-AED: Benchmark Dataset for Early Freeway Traffic Anomalous Event Detection

Authors: Austin Coursey, Junyi Ji, Marcos Quinones-Grueiro, William Barbour, Yuhang Zhang, Tyler Derr, Gautam Biswas, Daniel B. Work

Abstract: Early and accurate detection of anomalous events on the freeway, such as accidents, can improve emergency response and clearance. However, existing delays and errors in event identification and reporting make it a difficult problem to solve. Current large-scale freeway traffic datasets are not designed for anomaly detection and ignore these challenges. In this paper, we introduce the first large-scale lane-level freeway traffic dataset for anomaly detection. Our dataset consists of a month of weekday radar detection sensor data collected in 4 lanes along an 18-mile stretch of Interstate 24 heading toward Nashville, TN, comprising over 3.7 million sensor measurements. We also collect official crash reports from the Nashville Traffic Management Center and manually label all other potential anomalies in the dataset. To show the potential for our dataset to be used in future machine learning and traffic research, we benchmark numerous deep learning anomaly detection models on our dataset. We find that unsupervised graph neural network autoencoders are a promising solution for this problem and that ignoring spatial relationships leads to decreased performance. We demonstrate that our methods can reduce reporting delays by over 10 minutes on average while detecting 75% of crashes. Our dataset and all preprocessing code needed to get started are publicly released at https://vu.edu/ft-aed/ to facilitate future research.

Submitted 24 June, 2024; v1 submitted 21 June, 2024; originally announced June 2024.

arXiv:2406.11003 [pdf, other]

3D Gaze Tracking for Studying Collaborative Interactions in Mixed-Reality Environments

Authors: Eduardo Davalos, Yike Zhang, Ashwin T. S., Joyce H. Fonteles, Umesh Timalsina, Gautam Biswas

Abstract: This study presents a novel framework for 3D gaze tracking tailored for mixed-reality settings, aimed at enhancing joint attention and collaborative efforts in team-based scenarios. Conventional gaze tracking, often limited by monocular cameras and traditional eye-tracking apparatus, struggles with simultaneous data synchronization and analysis from multiple participants in group contexts. Our proposed framework leverages state-of-the-art computer vision and machine learning techniques to overcome these obstacles, enabling precise 3D gaze estimation without dependence on specialized hardware or complex data fusion. Utilizing facial recognition and deep learning, the framework achieves real-time tracking of gaze patterns across several individuals, addresses common depth estimation errors, and ensures spatial and identity consistency within the dataset. Empirical results demonstrate the accuracy and reliability of our method in group environments. This provides mechanisms for significant advances in behavior and interaction analysis in educational and professional training applications in dynamic and unstructured environments.

Submitted 16 June, 2024; originally announced June 2024.

Comments: 9 pages, 8 figures, conference, submitted to ICMI 2024

arXiv:2405.06203 [pdf, other]

A First Step in Using Machine Learning Methods to Enhance Interaction Analysis for Embodied Learning Environments

Authors: Joyce Fonteles, Eduardo Davalos, Ashwin T. S., Yike Zhang, Mengxi Zhou, Efrat Ayalon, Alicia Lane, Selena Steinberg, Gabriella Anton, Joshua Danish, Noel Enyedy, Gautam Biswas

Abstract: Investigating children's embodied learning in mixed-reality environments, where they collaboratively simulate scientific processes, requires analyzing complex multimodal data to interpret their learning and coordination behaviors. Learning scientists have developed Interaction Analysis (IA) methodologies for analyzing such data, but this requires researchers to watch hours of videos to extract and interpret students' learning patterns. Our study aims to simplify researchers' tasks, using Machine Learning and Multimodal Learning Analytics to support the IA processes. Our study combines machine learning algorithms and multimodal analyses to support and streamline researcher efforts in developing a comprehensive understanding of students' scientific engagement through their movements, gaze, and affective responses in a simulated scenario. To facilitate an effective researcher-AI partnership, we present an initial case study to determine the feasibility of visually representing students' states, actions, gaze, affect, and movement on a timeline. Our case study focuses on a specific science scenario where students learn about photosynthesis. The timeline allows us to investigate the alignment of critical learning moments identified by multimodal and interaction analysis, and uncover insights into students' temporal learning progressions.

Submitted 9 May, 2024; originally announced May 2024.

arXiv:2405.03677 [pdf, other]

doi 10.1007/978-3-031-64312-5_2

Towards A Human-in-the-Loop LLM Approach to Collaborative Discourse Analysis

Authors: Clayton Cohn, Caitlin Snyder, Justin Montenegro, Gautam Biswas

Abstract: LLMs have demonstrated proficiency in contextualizing their outputs using human input, often matching or beating human-level performance on a variety of tasks. However, LLMs have not yet been used to characterize synergistic learning in students' collaborative discourse. In this exploratory work, we take a first step towards adopting a human-in-the-loop prompt engineering approach with GPT-4-Turbo to summarize and categorize students' synergistic learning during collaborative discourse. Our preliminary findings suggest GPT-4-Turbo may be able to characterize students' synergistic learning in a manner comparable to humans and that our approach warrants further investigation.

Submitted 6 May, 2024; originally announced May 2024.

Comments: In press at the 25th International Conference on Artificial Intelligence in Education (AIED) Late-Breaking Results (LBR) track

arXiv:2403.14565 [pdf, other]

doi 10.1609/aaai.v38i21.30364

A Chain-of-Thought Prompting Approach with LLMs for Evaluating Students' Formative Assessment Responses in Science

Authors: Clayton Cohn, Nicole Hutchins, Tuan Le, Gautam Biswas

Abstract: This paper explores the use of large language models (LLMs) to score and explain short-answer assessments in K-12 science. While existing methods can score more structured math and computer science assessments, they often do not provide explanations for the scores. Our study focuses on employing GPT-4 for automated assessment in middle school Earth Science, combining few-shot and active learning with chain-of-thought reasoning. Using a human-in-the-loop approach, we successfully score and provide meaningful explanations for formative assessment responses. A systematic analysis of our method's pros and cons sheds light on the potential for human-in-the-loop techniques to enhance automated grading for open-ended science assessments.

Submitted 21 March, 2024; originally announced March 2024.

Comments: In press at EAAI-24: The 14th Symposium on Educational Advances in Artificial Intelligence

arXiv:2310.12359 [pdf, other]

MARVEL: Multi-Agent Reinforcement-Learning for Large-Scale Variable Speed Limits

Authors: Yuhang Zhang, Marcos Quinones-Grueiro, Zhiyao Zhang, Yanbing Wang, William Barbour, Gautam Biswas, Daniel Work

Abstract: Variable Speed Limit (VSL) control is a promising highway traffic management strategy with worldwide deployment; it can enhance traffic safety by dynamically adjusting speed limits according to real-time traffic conditions. Most of the VSL control algorithms deployed so far are rule-based and lack generalizability under varying and complex traffic scenarios. In this work, we propose MARVEL (Multi-Agent Reinforcement-learning for large-scale Variable spEed Limits), a novel framework for large-scale VSL control on highway corridors with real-world deployment settings. MARVEL uses only sensing information observable in the real world as state input and learns through a reward structure that incorporates adaptability to traffic conditions, safety, and mobility, thereby enabling multi-agent coordination. With parameter sharing among all VSL agents, the proposed framework scales to corridors with many agents. The policies are trained in a microscopic traffic simulation environment, focusing on a short freeway stretch with 8 VSL agents spanning 7 miles. For testing, these policies are applied to a more extensive network with 34 VSL agents spanning 17 miles of I-24 near Nashville, TN, USA. The MARVEL-based method improves traffic safety by 63.4% compared to the no-control scenario and enhances traffic mobility by 58.6% compared to a state-of-the-practice algorithm that has been deployed on I-24.
In addition, we conduct an explainability analysis to examine the decision-making process of the agents and explore the learned policy under different traffic conditions. Finally, we test the response of the policy learned from the simulation-based experiments with real-world data collected from I-24 and illustrate its deployment capability.

Submitted 17 March, 2024; v1 submitted 18 October, 2023; originally announced October 2023.

arXiv:2305.12543 [pdf, other]

A Reinforcement Learning Approach for Robust Supervisory Control of UAVs Under Disturbances

Authors: Ibrahim Ahmed, Marcos Quinones-Grueiro, Gautam Biswas

Abstract: In this work, we present an approach to supervisory reinforcement learning control for unmanned aerial vehicles (UAVs). UAVs are dynamic systems where control decisions in response to disturbances in the environment have to be made on the order of milliseconds. We formulate a supervisory control architecture that interleaves with extant embedded control and demonstrates robustness to environmental disturbances in the form of adverse wind conditions. We run case studies with a Tarot T-18 Octorotor to demonstrate the effectiveness of our approach and compare it against a classic cascade control architecture used in most vehicles. While the results show the performance difference is marginal for nominal operations, substantial performance improvement is obtained with the supervisory RL approach under unseen wind conditions.

Submitted 21 May, 2023; originally announced May 2023.

Comments: In review (2023-05-16)

arXiv:2305.12158 [pdf, other]

Model-based adaptation for sample efficient transfer in reinforcement learning control of parameter-varying systems

Authors: Ibrahim Ahmed, Marcos Quinones-Grueiro, Gautam Biswas

Abstract: In this paper, we leverage ideas from model-based control to address the sample efficiency problem of reinforcement learning (RL) algorithms. Accelerating learning is an active area of RL research that is highly relevant to time-varying systems. Traditional transfer learning methods propose to use prior knowledge of the system behavior to devise a gradual or immediate data-driven transformation of the control policy obtained through RL. Such a transformation is usually computed by estimating the performance of previous control policies based on measurements recently collected from the system. However, such retrospective measures have debatable utility, with no guarantee of positive transfer in most cases. Instead, we propose a model-based transformation such that, when actions from a control policy are applied to the target system, a positive transfer is achieved. The transformation can be used as an initialization for the reinforcement learning process to converge to a new optimum. We validate the performance of our approach through four benchmark examples. We demonstrate that our approach is more sample-efficient than fine-tuning with reinforcement learning alone and achieves comparable performance to linear-quadratic regulators and model-predictive control when an accurate linear model is known in the three cases. If an accurate model is not known, we empirically show that the proposed approach still guarantees positive transfer with jump-start improvement.

Submitted 20 May, 2023; originally announced May 2023.

Comments: Published in IEEE CoDiT 2023
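The abstract does not spell out the model-based transformation. As one illustration of the idea, here is a sketch assuming both the source and target systems are known linear models x' = A x + B u: choose the target action so that the target system's next state matches, in the least-squares sense, the state the source policy would have produced on the source system. The helper `transform_action` and all variable names are hypothetical.

```python
import numpy as np


def transform_action(x, u_src, A_src, B_src, A_tgt, B_tgt):
    """Map a source-policy action onto the target system (hypothetical helper).

    Chooses u_tgt minimizing ||(A_tgt x + B_tgt u_tgt) - (A_src x + B_src u_src)||,
    so the target's next state tracks what the source policy intended.
    """
    desired_next = A_src @ x + B_src @ u_src  # next state under the source model
    residual = desired_next - A_tgt @ x       # part the target input must supply
    u_tgt, *_ = np.linalg.lstsq(B_tgt, residual, rcond=None)
    return u_tgt


# Usage: a discretized double integrator whose actuator lost half its gain.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B_src = np.array([[0.0], [0.1]])
B_tgt = 0.5 * B_src
x = np.array([1.0, 0.0])
u_src = np.array([0.5])
u_tgt = transform_action(x, u_src, A, B_src, A, B_tgt)
print(u_tgt)
```

Because the target actuator has half the source gain, the transformed action is twice the source action (1.0 instead of 0.5). Least squares via `numpy.linalg.lstsq` is used so the sketch still returns an action when `B_tgt` is not square or invertible.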

arXiv:2205.09836 [pdf, other]

Concurrent Policy Blending and System Identification for Generalized Assistive Control

Authors: Luke Bhan, Marcos Quinones-Grueiro, Gautam Biswas

Abstract: In this work, we address the problem of solving complex collaborative robotic tasks subject to multiple varying parameters. Our approach combines simultaneous policy blending with system identification to create generalized policies that are robust to changes in system parameters. We employ a blending network whose state space relies solely on parameter estimates from a system identification technique. As a result, this blending network learns how to handle parameter changes instead of trying to learn how to solve the task for a generalized parameter set simultaneously. We demonstrate our scheme's ability on a collaborative robot and human itching task in which the human has motor impairments. We then showcase our approach's efficiency with a variety of system identification techniques when compared to standard domain randomization.

Submitted 19 May, 2022; originally announced May 2022.

Comments: Accepted to ICRA 2022
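A toy sketch of blending driven by system identification, under loud assumptions: the system-identification step is a least-squares estimate of a single scalar gain, and the learned blending network described in the abstract is replaced by a fixed linear interpolation between two expert policies trained at known parameter values. Every name here (`estimate_gain`, `blended_action`, the expert gains) is hypothetical.

```python
from typing import Callable, List, Sequence

Policy = Callable[[Sequence[float]], List[float]]


def estimate_gain(inputs: Sequence[float], outputs: Sequence[float]) -> float:
    """Stand-in system identification: least-squares fit of y = theta * u."""
    num = sum(u * y for u, y in zip(inputs, outputs))
    den = sum(u * u for u in inputs)
    if den == 0:
        raise ValueError("inputs carry no excitation; theta is unidentifiable")
    return num / den


def blended_action(obs: Sequence[float], theta_hat: float,
                   expert_lo: Policy, theta_lo: float,
                   expert_hi: Policy, theta_hi: float) -> List[float]:
    """Blend two experts with a weight computed only from the parameter estimate.

    Hypothetical rule: linear interpolation between the experts' training
    parameters, clipped to [0, 1]. A learned blending network would replace this.
    """
    w = (theta_hat - theta_lo) / (theta_hi - theta_lo)
    w = min(max(w, 0.0), 1.0)
    return [(1.0 - w) * a + w * b for a, b in zip(expert_lo(obs), expert_hi(obs))]


def expert_lo(obs: Sequence[float]) -> List[float]:
    """Hypothetical expert tuned for gain 1.0."""
    return [-1.0 * obs[0]]


def expert_hi(obs: Sequence[float]) -> List[float]:
    """Hypothetical expert tuned for gain 2.0 (needs less control effort)."""
    return [-0.5 * obs[0]]


# Usage: observed input/output data imply a gain of 1.5.
theta_hat = estimate_gain([1.0, 2.0, 3.0], [1.5, 3.0, 4.5])
action = blended_action([2.0], theta_hat, expert_lo, 1.0, expert_hi, 2.0)
print(theta_hat, action)  # 1.5 [-1.5]
```

The estimated gain of 1.5 sits halfway between the experts' training values, so the blended action is the average of the two experts' actions. The blending rule sees only the parameter estimate, never the task state, which mirrors the separation the abstract describes.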

arXiv:2202.09698 [pdf]

Analyzing Adaptive Scaffolds that Help Students Develop Self-Regulated Learning Behaviors

Authors: Anabil Munshi, Gautam Biswas, Ryan Baker, Jaclyn Ocumpaugh, Stephen Hutt, Luc Paquette

Abstract: Providing adaptive scaffolds to help learners develop self-regulated learning (SRL) processes has been an important goal for intelligent learning environments. Adaptive scaffolding is especially important in open-ended learning environments (OELE), where novice learners often face difficulties in completing their learning tasks. This paper presents a systematic framework for adaptive scaffolding in Betty's Brain, a learning-by-teaching OELE for middle school science, where students construct a causal model to teach a virtual agent, generically named Betty. We evaluate the adaptive scaffolding framework and discuss its implications for the development of more effective scaffolds for SRL in OELEs. We detect key cognitive/metacognitive inflection points, i.e., instances where students' behaviors and performance change as they work on their learning tasks. At such inflection points, Mr. Davis (a mentor agent) or Betty (the teachable agent) provides conversational feedback focused on strategies to help students become productive learners. We conduct a classroom study with 98 middle schoolers to analyze the impact of adaptive scaffolds on students' learning behaviors and performance. Adaptive scaffolding produced mixed results, with some scaffolds (viz., strategic hints that supported debugging and assessment of causal models) being generally more useful to students than others (viz., encouragement prompts). We also note differences in the learning behaviors of High and Low performers after receiving scaffolds.
Overall, our findings suggest how adaptive scaffolding in OELEs like Betty's Brain can be further improved to narrow the gap between High and Low performers.

Submitted 1 June, 2022; v1 submitted 19 February, 2022; originally announced February 2022.

arXiv:2012.06016 [pdf, other]

Performance-Weighed Policy Sampling for Meta-Reinforcement Learning

Authors: Ibrahim Ahmed, Marcos Quinones-Grueiro, Gautam Biswas

Abstract: This paper discusses an Enhanced Model-Agnostic Meta-Learning (E-MAML) algorithm that achieves fast convergence of the policy function from a small number of training examples when applied to new learning tasks. Built on top of Model-Agnostic Meta-Learning (MAML), E-MAML maintains a set of policy parameters learned in the environment for previous tasks. We apply E-MAML to develop reinforcement learning (RL)-based online fault-tolerant control schemes for dynamic systems. The enhancement is applied when a new fault occurs, to re-initialize the parameters of a new RL policy that achieves faster adaptation with a small number of samples of system behavior with the new fault. This replaces the random task sampling step in MAML; instead, it exploits the controller's extant, previously generated experiences. The enhancement is sampled to maximally span the parameter space to facilitate adaptation to the new fault. We demonstrate the performance of our approach combining E-MAML with proximal policy optimization (PPO) on the well-known cart pole example, and then on the fuel transfer system of an aircraft.

Submitted 10 December, 2020; originally announced December 2020.

arXiv:2009.12634 [pdf, other]

doi 10.36001/phmconf.2020.v12i1.1289

Complementary Meta-Reinforcement Learning for Fault-Adaptive Control

Authors: Ibrahim Ahmed, Marcos Quinones-Grueiro, Gautam Biswas

Abstract: Faults are endemic to all systems. Adaptive fault-tolerant control maintains degraded performance when faults occur, rather than allowing unsafe conditions or catastrophic events. In systems with abrupt faults and strict time constraints, it is imperative for control to adapt quickly to system changes to maintain system operations. We present a meta-reinforcement learning approach that quickly adapts its control policy to changing conditions. The approach builds upon model-agnostic meta-learning (MAML). The controller maintains a complement of prior policies learned under system faults. This "library" is evaluated on a system after a new fault to initialize the new policy. This contrasts with MAML, where the controller derives intermediate policies anew, sampled from a distribution of similar systems, to initialize a new policy. Our approach improves the sample efficiency of the reinforcement learning process. We evaluate our approach on an aircraft fuel transfer system under abrupt faults.

Submitted 26 September, 2020; originally announced September 2020.

Comments: Accepted to PHM Conference 2020

Journal ref: Annual Conference of the PHM Society. Vol. 12. No. 1. 2020
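The step of evaluating a library of prior policies after a new fault can be sketched on a toy problem. This sketch assumes a scalar linear plant x[t+1] = a*x[t] + b*u[t], linear feedback policies u = -k*x, and a quadratic cost; the fault is modeled as a drop in actuator gain b. The library entries, cost weights, and helper names are hypothetical, and the RL fine-tuning that would follow initialization is not shown.

```python
from typing import Dict, Tuple


def rollout_return(k: float, a: float, b: float, x0: float = 1.0, steps: int = 20) -> float:
    """Return (negative quadratic cost) of policy u = -k*x on x' = a*x + b*u."""
    x, total = x0, 0.0
    for _ in range(steps):
        u = -k * x
        total -= x * x + 0.01 * u * u
        x = a * x + b * u
    return total


def init_from_library(library: Dict[str, float], a: float, b: float) -> Tuple[str, float]:
    """Evaluate every stored policy on the faulty plant and return the best one
    as the initialization for further RL fine-tuning (not shown)."""
    scores = {name: rollout_return(k, a, b) for name, k in library.items()}
    best = max(scores, key=scores.get)
    return best, library[best]


# Hypothetical library: feedback gains previously learned under different faults.
library = {"nominal": 1.0, "fault_A": 2.0, "fault_B": 4.0}
print(init_from_library(library, a=1.0, b=0.5))  # ('fault_A', 2.0)
```

With the actuator gain halved to 0.5, the gain stored under "fault_A" drives the state to zero in one step and scores best, so it would seed the new policy; the gain tuned for the nominal plant converges more slowly, and the "fault_B" gain makes the state oscillate without decaying.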

arXiv:2008.04407 [pdf, other]

Fault-Tolerant Control of Degrading Systems with On-Policy Reinforcement Learning

Authors: Ibrahim Ahmed, Marcos Quiñones-Grueiro, Gautam Biswas

Abstract: We propose a novel adaptive reinforcement learning control approach for fault-tolerant control of degrading systems that is not preceded by a fault detection and diagnosis step. Therefore, a priori knowledge of faults that may occur in the system is not required. The adaptive scheme combines online and offline learning of the on-policy control method to improve exploration and sample efficiency, while guaranteeing stable learning. The offline learning phase is performed using a data-driven model of the system, which is frequently updated to track the system's operating conditions. We conduct experiments on an aircraft fuel transfer system to demonstrate the effectiveness of our approach.

Submitted 10 August, 2020; originally announced August 2020.

Comments: Published in IFAC World Congress 2020

arXiv:2008.04403 [pdf, other]

Comparison of Model Predictive and Reinforcement Learning Methods for Fault Tolerant Control

Authors: Ibrahim Ahmed, Hamed Khorasgani, Gautam Biswas

Abstract: A desirable property of fault-tolerant controllers is adaptability to system changes as they evolve during system operation. An adaptive controller does not require optimal control policies to be enumerated for possible faults; instead, it can approximate one in real time. We present two adaptive fault-tolerant control schemes for a discrete-time system based on hierarchical reinforcement learning. We compare their performance against a model predictive controller in the presence of sensor noise and persistent faults. The controllers are tested on a fuel tank model of a C-130 plane. Our experiments demonstrate that reinforcement learning-based controllers perform more robustly than model predictive controllers under faults, partially observable system models, and varying sensor noise levels.

Submitted 10 August, 2020; originally announced August 2020.

Comments: Published in IFAC SAFEPROCESS 2018

arXiv:2008.01879 [pdf, other]

A Relearning Approach to Reinforcement Learning for Control of Smart Buildings

Authors: Avisek Naug, Marcos Quiñones-Grueiro, Gautam Biswas

Abstract: This paper demonstrates that continual relearning of control policies using incremental deep reinforcement learning (RL) can improve policy learning for non-stationary processes. We demonstrate this approach for a data-driven 'smart building environment' that we use as a test-bed for developing HVAC controllers for reducing energy consumption of large buildings on our university campus. The non-stationarity in building operations and weather patterns makes it imperative to develop control strategies that are adaptive to changing conditions. On-policy RL algorithms, such as Proximal Policy Optimization (PPO), represent an approach for addressing this non-stationarity, but exploration on the actual system is not an option for safety-critical systems. As an alternative, we develop an incremental RL technique that reduces building energy consumption without sacrificing overall comfort. We compare the performance of our incremental RL controller to that of a static RL controller that does not implement the relearning function. The performance of the static controller diminishes significantly over time, but the relearning controller adjusts to changing conditions while ensuring comfort and optimal energy performance.

Submitted 4 August, 2020; originally announced August 2020.

arXiv:1304.2721 [pdf]

Using the Dempster-Shafer Scheme in a Diagnostic Expert System Shell

Authors: Gautam Biswas, Teywansh S. Anand

Abstract: This paper discusses an expert system shell that integrates rule-based reasoning and the Dempster-Shafer evidence combination scheme. Domain knowledge is stored as rules with associated belief functions. The reasoning component uses a combination of forward and backward inferencing mechanisms to allow interaction with users in a mixed-initiative format.

Submitted 27 March, 2013; originally announced April 2013.

Comments: Appears in Proceedings of the Third Conference on Uncertainty in Artificial Intelligence (UAI1987)

Report number: UAI-P-1987-PG-98-105
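The abstract does not describe how the shell encodes its rules, but Dempster's rule of combination itself is standard: multiply the masses of every pair of focal sets, assign each product to the pair's intersection, and renormalize by one minus the mass that fell on the empty set (the conflict). A minimal sketch, with a hypothetical frame of diagnostic hypotheses and hypothetical rule masses:

```python
from itertools import product
from typing import Dict, FrozenSet

Mass = Dict[FrozenSet[str], float]


def combine(m1: Mass, m2: Mass) -> Mass:
    """Dempster's rule of combination for two mass functions over one frame."""
    combined: Mass = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb  # mass that would land on the empty set
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; the rule is undefined")
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


def belief(m: Mass, hypothesis: FrozenSet[str]) -> float:
    """Bel(H): total mass committed to subsets of H."""
    return sum(w for s, w in m.items() if s <= hypothesis)


# Hypothetical frame of discernment for a diagnostic rule base.
THETA = frozenset({"pump_fault", "valve_fault", "sensor_fault"})
m_rule1 = {frozenset({"pump_fault"}): 0.6, THETA: 0.4}
m_rule2 = {frozenset({"valve_fault"}): 0.5, THETA: 0.5}
m = combine(m_rule1, m_rule2)
print(round(belief(m, frozenset({"pump_fault"})), 3))  # 0.429
```

The two sources conflict on 0.3 of their joint mass (pump versus valve), so the remaining products are divided by 0.7, and the belief in `pump_fault` after combination is 0.3 / 0.7, about 0.43. Mass left on the whole frame `THETA` represents ignorance rather than support for any single fault.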

arXiv:1111.7051 [pdf]

Design of Image Cryptosystem by Simultaneous VQ-Compression and Shuffling of Codebook and Index Matrix

Authors: Arup Kumar Pal, G. P. Biswas, S. Mukhopadhyay

Abstract: Although Internet usage is increasing exponentially, the Internet by itself does not provide security for the exchange of confidential data between users. As a result, several cryptosystems for encrypting data and images have been developed for secure transmission over the Internet. In this work, we propose an image encryption/decryption scheme based on Vector Quantization (VQ) that concurrently encodes images for compression and shuffles the codebook and the index matrix using pseudorandom sequences for encryption. The processing time of the proposed scheme is much lower than that of other cryptosystems because it does not use any traditional cryptographic operations; instead, it swaps the contents of the codebook according to a random sequence, which results in an indirect shuffling of the contents of the index matrix. Note that the security of the proposed cryptosystem depends on the generation and exchange of the random sequences used. Since generating truly random sequences is not practically feasible, we simulate the proposed scheme in MATLAB, using functions such as rand(method, seed) and randperm(n) to generate pseudorandom sequences, and observe that the proposed cryptosystem shows the expected performance.

Submitted 30 November, 2011; originally announced November 2011.

Journal ref: The International journal of Multimedia & Its Applications (IJMA), Vol.1, No.1, November 2009
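The core mechanism described in this abstract (permute the codebook with a keyed pseudorandom sequence and leave the index matrix untouched, so the indices no longer point at the right codewords) can be sketched in a few lines. This is an illustration only, not the authors' MATLAB implementation: `random.Random(seed).shuffle` stands in for `randperm`, it is not a cryptographically secure generator, and all helper names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def vq_encode(blocks: Sequence[Vector], codebook: Sequence[Vector]) -> List[int]:
    """Flattened index matrix: nearest codeword (squared distance) per block."""
    def sq_dist(u: Vector, v: Vector) -> float:
        return sum((p - q) ** 2 for p, q in zip(u, v))
    return [min(range(len(codebook)), key=lambda i: sq_dist(blk, codebook[i]))
            for blk in blocks]


def vq_decode(indices: Sequence[int], codebook: Sequence[Vector]) -> List[Vector]:
    return [codebook[i] for i in indices]


def keyed_permutation(n: int, seed: int) -> List[int]:
    """Pseudorandom permutation of range(n) derived from a shared seed (the key)."""
    perm = list(range(n))
    random.Random(seed).shuffle(perm)
    return perm


def shuffle_codebook(codebook: Sequence[Vector], seed: int) -> List[Vector]:
    """Encrypt: move codeword perm[j] into slot j. Indices stay unchanged,
    so they now point at the wrong codewords (indirect index shuffling)."""
    perm = keyed_permutation(len(codebook), seed)
    return [codebook[p] for p in perm]


def unshuffle_codebook(shuffled: Sequence[Vector], seed: int) -> List[Vector]:
    """Decrypt: regenerate the same permutation and invert it."""
    perm = keyed_permutation(len(shuffled), seed)
    restored: List[Vector] = [()] * len(shuffled)
    for j, p in enumerate(perm):
        restored[p] = shuffled[j]
    return restored


# Usage with a toy 2-D codebook and four image blocks.
codebook = [(0.0, 0.0), (10.0, 10.0), (20.0, 20.0), (30.0, 30.0)]
blocks = [(1.0, 1.0), (19.0, 21.0), (29.0, 30.0), (0.0, 2.0)]
indices = vq_encode(blocks, codebook)  # [0, 2, 3, 0]
sent_codebook = shuffle_codebook(codebook, seed=1234)
recovered = vq_decode(indices, unshuffle_codebook(sent_codebook, seed=1234))
print(recovered == vq_decode(indices, codebook))  # True
```

Decoding the unchanged indices against the shuffled codebook generally yields scrambled blocks; a receiver holding the seed restores the original codebook first and then decodes normally. As the abstract notes, all of the security rests on how the seed is generated and exchanged.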

Showing 1–19 of 19 results for author: Biswas, G