Search | arXiv e-print repository

FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs

Authors: Keyu An, Qian Chen, Chong Deng, Zhihao Du, Changfeng Gao, Zhifu Gao, Yue Gu, Ting He, Hangrui Hu, Kai Hu, Shengpeng Ji, Yabin Li, Zerui Li, Heng Lu, Haoneng Luo, Xiang Lv, Bin Ma, Ziyang Ma, Chongjia Ni, Changhe Song, Jiaqi Shi, Xian Shi, Hao Wang, Wen Wang, Yuxuan Wang, et al. (8 additional authors not shown)

Abstract: This report introduces FunAudioLLM, a model family designed to enhance natural voice interactions between humans and large language models (LLMs). At its core are two innovative models: SenseVoice, which handles multilingual speech recognition, emotion recognition, and audio event detection; and CosyVoice, which facilitates natural speech generation with control over multiple languages, timbre, speaking style, and speaker identity. SenseVoice-Small delivers exceptionally low-latency ASR for 5 languages, and SenseVoice-Large supports high-precision ASR for over 50 languages, while CosyVoice excels in multi-lingual voice generation, zero-shot in-context learning, cross-lingual voice cloning, and instruction-following capabilities. The models related to SenseVoice and CosyVoice have been open-sourced on Modelscope and Huggingface, along with the corresponding training, inference, and fine-tuning codes released on GitHub. By integrating these models with LLMs, FunAudioLLM enables applications such as speech-to-speech translation, emotional voice chat, interactive podcasts, and expressive audiobook narration, thereby pushing the boundaries of voice interaction technology. Demos are available at https://fun-audio-llm.github.io, and the code can be accessed at https://github.com/FunAudioLLM.

Submitted 10 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

Comments: Work in progress. Authors are listed in alphabetical order by family name

arXiv:2406.08858 [pdf, other]

OmniH2O: Universal and Dexterous Human-to-Humanoid Whole-Body Teleoperation and Learning

Authors: Tairan He, Zhengyi Luo, Xialin He, Wenli Xiao, Chong Zhang, Weinan Zhang, Kris Kitani, Changliu Liu, Guanya Shi

Abstract: We present OmniH2O (Omni Human-to-Humanoid), a learning-based system for whole-body humanoid teleoperation and autonomy. Using kinematic pose as a universal control interface, OmniH2O enables various ways for a human to control a full-sized humanoid with dexterous hands, including using real-time teleoperation through VR headset, verbal instruction, and RGB camera. OmniH2O also enables full autonomy by learning from teleoperated demonstrations or integrating with frontier models such as GPT-4. OmniH2O demonstrates versatility and dexterity in various real-world whole-body tasks through teleoperation or autonomy, such as playing multiple sports, moving and manipulating objects, and interacting with humans. We develop an RL-based sim-to-real pipeline, which involves large-scale retargeting and augmentation of human motion datasets, learning a real-world deployable policy with sparse sensor input by imitating a privileged teacher policy, and reward designs to enhance robustness and stability. We release the first humanoid whole-body control dataset, OmniH2O-6, containing six everyday tasks, and demonstrate humanoid whole-body skill learning from teleoperated datasets.

Submitted 13 June, 2024; originally announced June 2024.

Comments: Project page: https://omni.human2humanoid.com/

arXiv:2406.06005 [pdf, other]

WoCoCo: Learning Whole-Body Humanoid Control with Sequential Contacts

Authors: Chong Zhang, Wenli Xiao, Tairan He, Guanya Shi

Abstract: Humanoid activities involving sequential contacts are crucial for complex robotic interactions and operations in the real world and are traditionally solved by model-based motion planning, which is time-consuming and often relies on simplified dynamics models. Although model-free reinforcement learning (RL) has become a powerful tool for versatile and robust whole-body humanoid control, it still requires tedious task-specific tuning and state machine design and suffers from long-horizon exploration issues in tasks involving contact sequences. In this work, we propose WoCoCo (Whole-Body Control with Sequential Contacts), a unified framework to learn whole-body humanoid control with sequential contacts by naturally decomposing the tasks into separate contact stages. Such decomposition facilitates simple and general policy learning pipelines through task-agnostic reward and sim-to-real designs, requiring only one or two task-related terms to be specified for each task. We demonstrated that end-to-end RL-based controllers trained with WoCoCo enable four challenging whole-body humanoid tasks involving diverse contact sequences in the real world without any motion priors: 1) versatile parkour jumping, 2) box loco-manipulation, 3) dynamic clap-and-tap dancing, and 4) cliffside climbing. We further show that WoCoCo is a general framework beyond humanoid by applying it in 22-DoF dinosaur robot loco-manipulation tasks.

Submitted 10 June, 2024; originally announced June 2024.

Comments: Website and Videos: https://lecar-lab.github.io/wococo/

arXiv:2403.04436 [pdf, other]

Learning Human-to-Humanoid Real-Time Whole-Body Teleoperation

Authors: Tairan He, Zhengyi Luo, Wenli Xiao, Chong Zhang, Kris Kitani, Changliu Liu, Guanya Shi

Abstract: We present Human to Humanoid (H2O), a reinforcement learning (RL) based framework that enables real-time whole-body teleoperation of a full-sized humanoid robot with only an RGB camera. To create a large-scale retargeted motion dataset of human movements for humanoid robots, we propose a scalable "sim-to-data" process to filter and pick feasible motions using a privileged motion imitator. Afterwards, we train a robust real-time humanoid motion imitator in simulation using these refined motions and transfer it to the real humanoid robot in a zero-shot manner. We successfully achieve teleoperation of dynamic whole-body motions in real-world scenarios, including walking, back jumping, kicking, turning, waving, pushing, boxing, etc. To the best of our knowledge, this is the first demonstration to achieve learning-based real-time whole-body humanoid teleoperation.

Submitted 7 March, 2024; originally announced March 2024.

Comments: Project website: https://human2humanoid.com/

arXiv:2401.17583 [pdf, other]

Agile But Safe: Learning Collision-Free High-Speed Legged Locomotion

Authors: Tairan He, Chong Zhang, Wenli Xiao, Guanqi He, Changliu Liu, Guanya Shi

Abstract: Legged robots navigating cluttered environments must be jointly agile for efficient task execution and safe to avoid collisions with obstacles or humans. Existing studies either develop conservative controllers (< 1.0 m/s) to ensure safety, or focus on agility without considering potentially fatal collisions. This paper introduces Agile But Safe (ABS), a learning-based control framework that enables agile and collision-free locomotion for quadrupedal robots. ABS involves an agile policy to execute agile motor skills amidst obstacles and a recovery policy to prevent failures, collaboratively achieving high-speed and collision-free navigation. The policy switch in ABS is governed by a learned control-theoretic reach-avoid value network, which also guides the recovery policy as an objective function, thereby safeguarding the robot in a closed loop. The training process involves the learning of the agile policy, the reach-avoid value network, the recovery policy, and an exteroception representation network, all in simulation. These trained modules can be directly deployed in the real world with onboard sensing and computation, leading to high-speed and collision-free navigation in confined indoor and outdoor spaces with both static and dynamic obstacles.

Submitted 21 May, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

Comments: Published at RSS 2024, Project website: https://agile-but-safe.github.io/

arXiv:2401.08536 [pdf, other]

Dual-Loop Robust Control of Biased Koopman Operator Model by Noisy Data of Nonlinear Systems

Authors: Tianyi He, Anuj Pal

Abstract: The Koopman operator approach for data-driven control design of a nonlinear system is on the rise because of its capability to capture the behaviours of global dynamics. However, measurement noise in the inputs and outputs biases the Koopman model identification and causes model mismatch from the actual nonlinear dynamics. The current work evaluates the bounds of the noise-induced model bias of the Koopman operator model and proposes a data-driven robust dual-loop control framework (Koopman based robust control-KROC) for the biased model. First, the model mismatch is shown to be bounded under radial basis functions (RBF) and bounded noise, and the bound of the model mismatch is assessed. Second, the pitfalls of linear quadratic Gaussian (LQG) control based on the biased Koopman model of the Van Der Pol oscillator are shown. Motivated by these pitfalls, the dual-loop control is proposed, which consists of an observer-based state-feedback control based on the nominal Koopman model and an additional robust loop to compensate for model mismatch. A linear matrix inequality (LMI) is derived, which can guarantee robust stability and performance under bounded noise for the finite-dimensional Koopman operator model. Finally, the proposed framework is applied to a nonlinear Van Der Pol oscillator to demonstrate enhanced control performance by the dual-loop robust control.

Submitted 2 March, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

arXiv:2310.17401 [pdf, other]

Energy Efficient Robust Beamforming for Vehicular ISAC with Imperfect Channel Estimation

Authors: Hanwen Zhang, Haijian Sun, Tianyi He, Weiming Xiang, Rose Qingyang Hu

Abstract: This paper investigates robust beamforming for system-centric energy efficiency (EE) optimization in the vehicular integrated sensing and communication (ISAC) system, where the mobility of vehicles poses significant challenges to channel estimation. To obtain the optimal beamforming under channel uncertainty, we first formulate an optimization problem for maximizing the system EE under bounded channel estimation errors. Next, fractional programming and semidefinite relaxation (SDR) are utilized to relax the rank-1 constraints. We further use Schur complement and S-Procedure to transform Cramer-Rao bound (CRB) and channel estimation error constraints into convex forms, respectively. Based on the Lagrangian dual function and Karush-Kuhn-Tucker (KKT) conditions, it is proved that the optimal beamforming solution is rank-1. Finally, we present comprehensive simulation results to demonstrate two key findings: 1) the proposed algorithm exhibits a favorable convergence rate, and 2) the approach effectively mitigates the impact of channel estimation errors.

Submitted 26 October, 2023; originally announced October 2023.

Comments: Submitted to IEEE for future publication

arXiv:2310.11954 [pdf, other]

MusicAgent: An AI Agent for Music Understanding and Generation with Large Language Models

Authors: Dingyao Yu, Kaitao Song, Peiling Lu, Tianyu He, Xu Tan, Wei Ye, Shikun Zhang, Jiang Bian

Abstract: AI-empowered music processing is a diverse field that encompasses dozens of tasks, ranging from generation tasks (e.g., timbre synthesis) to comprehension tasks (e.g., music classification). For developers and amateurs, it is very difficult to grasp all of these tasks to satisfy their requirements in music processing, especially considering the huge differences in the representations of music data and the model applicability across platforms among various tasks. Consequently, it is necessary to build a system to organize and integrate these tasks, and thus help practitioners to automatically analyze their demand and call suitable tools as solutions to fulfill their requirements. Inspired by the recent success of large language models (LLMs) in task automation, we develop a system, named MusicAgent, which integrates numerous music-related tools and an autonomous workflow to address user requirements. More specifically, we build 1) a toolset that collects tools from diverse sources, including Hugging Face, GitHub, and Web APIs, and 2) an autonomous workflow empowered by LLMs (e.g., ChatGPT) to organize these tools and automatically decompose user requests into multiple sub-tasks and invoke corresponding music tools. The primary goal of this system is to free users from the intricacies of AI-music tools, enabling them to concentrate on the creative aspect. By granting users the freedom to effortlessly combine tools, the system offers a seamless and enriching music experience.

Submitted 25 October, 2023; v1 submitted 18 October, 2023; originally announced October 2023.

arXiv:2307.13960 [pdf, other]

Data-Driven Reduced-Order Aeroelastic Modeling of Highly Flexible Aircraft by Parametric Dynamic Mode Decomposition

Authors: Tianyi He, Weihua Su

Abstract: This paper presents a method of data-driven parametric Dynamic Mode Decomposition (p-DMD) to derive a linear parameter-varying reduced-order model (LPV-ROM) for the nonlinear aeroelasticity of highly flexible aircraft. It directly uses the data snapshots obtained at varying flight conditions, and encodes the physical understanding of the nonlinear model's polynomial dependency on flight conditions to produce a polynomial-dependent LPV-ROM. Therefore, this method can handle not only the equilibrium flight conditions but also the cases of continuously-varying flight conditions. In the numerical studies, a highly flexible cantilever wing and a slender vehicle built based on it are first studied with fixed angles of attack as the scheduling parameter. The comparisons between traditional linearization-based parametric modeling and the data-driven p-DMD modeling are performed to verify the modeling accuracy. The results demonstrate that the current p-DMD modeling method can capture the aeroelastic and flight dynamic responses of highly flexible aircraft in both time and frequency domains. In addition, the proposed p-DMD method is applied to the highly flexible aircraft in a perturbed longitudinal flight with varying angles of attack as the scheduling parameter. The nonlinear aeroelastic and flight dynamic data are compared with the simulation results of the data-driven p-DMD model. The comparison results demonstrate that it can accurately capture the non-equilibrium (or transient) aeroelastic and flight dynamic behaviors of such slender vehicles.

Submitted 26 July, 2023; originally announced July 2023.

arXiv:2306.04466 [pdf, other]

Point Cloud Video Anomaly Detection Based on Point Spatio-Temporal Auto-Encoder

Authors: Tengjiao He, Wenguang Wang

Abstract: Video anomaly detection has great potential in enhancing safety in the production and monitoring of crucial areas. Currently, most video anomaly detection methods are based on RGB modality, but its redundant semantic information may breach the privacy of residents or patients. The 3D data obtained by depth camera and LiDAR can accurately locate anomalous events in 3D space while preserving human posture and motion information. Identifying individuals through the point cloud is difficult due to its sparsity, which protects personal privacy. In this study, we propose Point Spatio-Temporal Auto-Encoder (PSTAE), an autoencoder framework that uses point cloud videos as input to detect anomalies in point cloud videos. We introduce PSTOp and PSTTransOp to maintain spatial geometric and temporal motion information in point cloud videos. To measure the reconstruction loss of the proposed autoencoder framework, we propose a reconstruction loss measurement strategy based on a shallow feature extractor. Experimental results on the TIMo dataset show that our method outperforms currently representative depth modality-based methods in terms of AUROC and has superior performance in detecting Medical Issue anomalies. These results suggest the potential of point cloud modality in video anomaly detection. Our method sets a new state-of-the-art (SOTA) on the TIMo dataset.

Submitted 4 June, 2023; originally announced June 2023.

arXiv:2210.08780 [pdf, other]

Sample-efficient Model Predictive Control Design of Soft Robotics by Bayesian Optimization

Authors: Anuj Pal, Tianyi He, Wenpeng Wei

Abstract: This paper presents a sample-efficient data-driven method to design model predictive control (MPC) for cable-actuated soft robotics using Bayesian optimization. Instead of modeling the complex dynamics of the soft robots, the proposed approach uses Bayesian optimization to search for the best-guessed low-dimensional prediction model and its associated controller to minimize the objective function of closed-loop responses. The prediction model is updated by Bayesian optimization from the closed-loop input-output data in each iteration. A linear MPC is then designed based on the updated prediction model, and evaluated based on the closed-loop responses. Unlike directly searching controller parameters, this approach allows closed-loop system stability and input/output constraints to be easily handled in the MPC design. After a few iterations, a convergent solution of a (sub-)optimal controller can be obtained, which minimizes the user-defined closed-loop performance index. The proposed method is validated in a high-fidelity simulation of a cable-actuated soft robot. The simulation results demonstrate that the proposed approach can achieve a desired tracking controller for the soft robot without a prior-known model.

Submitted 17 October, 2022; originally announced October 2022.

Comments: Submitted to ACC 2023

arXiv:2205.00103 [pdf, other]

An Approach for Fast Cascading Failure Simulation in Dynamic Models of Power Systems

Authors: Sina Gharebaghi, Nilanjan Ray Chaudhuri, Ting He, Thomas La Porta

Abstract: The ground truth for cascading failure in power systems can only be obtained through a detailed dynamic model involving nonlinear differential and algebraic equations whose solution process is computationally expensive. This has prohibited adoption of such models for cascading failure simulation. To solve this, we propose a fast cascading failure simulation approach based on the implicit Backward Euler method (BEM) with stiff decay property. Unfortunately, BEM suffers from a hyperstability issue in case of oscillatory instability and converges to the unstable equilibrium. We propose a predictor-corrector approach to fully address the hyperstability issue in BEM. The predictor identifies oscillatory instability based on eigendecomposition of the system matrix at the post-disturbance unstable equilibrium obtained as a byproduct of BEM. The corrector uses right eigenvectors to identify the group of machines participating in the unstable mode. This helps in applying appropriate protection schemes as in ground truth. We use Trapezoidal method (TM)-based simulation as the benchmark to validate the results of the proposed approach on the IEEE 118-bus network, 2,383-bus Polish grid, and IEEE 68-bus system. The proposed approach is able to track the cascade path and replicate the end results of TM-based simulation with very high accuracy while reducing the average simulation time by approximately 10-35 fold. The proposed approach was also compared with the partitioned method, which led to similar conclusions.

Submitted 29 April, 2022; originally announced May 2022.

arXiv:2203.08944 [pdf, other]

Autonomous Wheel Loader Trajectory Tracking Control Using LPV-MPC

Authors: Ruitao Song, Zhixian Ye, Liyang Wang, Tianyi He, Liangjun Zhang

Abstract: In this paper, we present a systematic approach for high-performance and efficient trajectory tracking control of autonomous wheel loaders. With the nonlinear dynamic model of a wheel loader, nonlinear model predictive control (MPC) is used in offline trajectory planning to obtain a high-performance state-control trajectory while satisfying the state and control constraints. In tracking control, the nonlinear model is embedded into a Linear Parameter Varying (LPV) model and the LPV-MPC strategy is used to achieve fast online computation and good tracking performance. To demonstrate the effectiveness and the advantages of the LPV-MPC, we test and compare three model predictive control strategies in the high-fidelity simulation environment. With the planned trajectory, three tracking control strategies (LPV-MPC, nonlinear MPC, and LTI-MPC) are simulated and compared in terms of computational burden and tracking performance. The LPV-MPC can achieve better performance than conventional LTI-MPC because more accurate nominal system dynamics are captured in the LPV model. In addition, LPV-MPC achieves slightly worse tracking performance than nonlinear MPC but tremendously improved computational efficiency. A video with loading cycles completed by our autonomous wheel loader in the simulation environment can be found here: https://youtu.be/QbNfS_wZKKA.

Submitted 7 April, 2022; v1 submitted 16 March, 2022; originally announced March 2022.

arXiv:2110.12981 [pdf]

doi 10.1109/TPWRS.2022.3194570

Feasibility Study of Neural ODE and DAE Modules for Power System Dynamic Component Modeling

Authors: Tannan Xiao, Ying Chen, Shaowei Huang, Tirui He, Huizhe Guan

Abstract: In the context of high penetration of renewables, the need to build dynamic models of power system components based on accessible measurement data has become urgent. To address this challenge, firstly, a neural ordinary differential equations (ODE) module and a neural differential-algebraic equations (DAE) module are proposed to form a data-driven modeling framework that accurately captures components' dynamic characteristics and flexibly adapts to various interface settings. Secondly, analytical models and data-driven models learned by the neural ODE and DAE modules are integrated together and simulated simultaneously using unified transient stability simulation methods. Finally, the neural ODE and DAE modules are implemented with Python and made public on GitHub. Using the portal measurements, three simple but representative cases of excitation controller modeling, photovoltaic power plant modeling, and equivalent load modeling of a regional power network are carried out in the IEEE-39 system and 2383wp system. Neural dynamic model-integrated simulations are compared with the original model-based ones to verify the feasibility and potentiality of the proposed neural ODE and DAE modules.

Submitted 4 July, 2022; v1 submitted 25 October, 2021; originally announced October 2021.

Comments: 14 pages, 8 figures, 3 tables. Under review by IEEE Transactions on Power Systems

arXiv:2110.01161 [pdf, other]

Enhance Images as You Like with Unpaired Learning

Authors: Xiaopeng Sun, Muxingzi Li, Tianyu He, Lubin Fan

Abstract: Low-light image enhancement exhibits an ill-posed nature, as a given image may have many enhanced versions, yet recent studies focus on building a deterministic mapping from input to an enhanced version. In contrast, we propose a lightweight one-path conditional generative adversarial network (cGAN) to learn a one-to-many relation from low-light to normal-light image space, given only sets of low- and normal-light training images without any correspondence. By formulating this ill-posed problem as a modulation code learning task, our network learns to generate a collection of enhanced images from a given input conditioned on various reference images. Therefore our inference model easily adapts to various user preferences, provided with a few favorable photos from each user. Our model achieves competitive visual and quantitative results on par with fully supervised methods on both noisy and clean datasets, while being 6 to 10 times lighter than state-of-the-art generative adversarial networks (GANs) approaches.

Submitted 3 October, 2021; originally announced October 2021.

Comments: 7 pages; IJCAI 2021

arXiv:2110.00931 [pdf]

doi 10.35833/MPCE.2022.000099

Exploration of Artificial Intelligence-oriented Power System Dynamic Simulators

Authors: Tannan Xiao, Ying Chen, Jianquan Wang, Shaowei Huang, Weilin Tong, Tirui He

Abstract: With the rapid development of artificial intelligence (AI), it is foreseeable that the accuracy and efficiency of dynamic analysis for future power system will be greatly improved by the integration of dynamic simulators and AI. To explore the interaction mechanism of power system dynamic simulations and AI, a general design of an AI-oriented power system dynamic simulator is proposed, which consists of a high-performance simulator with neural network supportability and flexible external and internal application programming interfaces (APIs). With the support of APIs, simulation-assisted AI and AI-assisted simulation form a comprehensive interaction mechanism between power system dynamic simulations and AI. A prototype of this design is implemented and made public based on a highly efficient electromechanical simulator. Tests of this prototype are carried out under four scenarios including sample generation, AI-based stability prediction, data-driven dynamic component modeling, and AI-aided stability control, which prove the validity, flexibility, and efficiency of the design and implementation of the AI-oriented power system dynamic simulator.

Submitted 6 July, 2022; v1 submitted 3 October, 2021; originally announced October 2021.

Comments: 10 pages, 8 figures, 1 table. Accepted by Journal of Modern Power System and Clean Energy

arXiv:2109.03488 [pdf, ps, other]

Partial Symbol Recovery for Interference Resilience in Low-Power Wide Area Networks

Authors: Kai Sun, Zhimeng Yin, Weiwei Chen, Shuai Wang, Zeyu Zhang, Tian He

Abstract: Recent years have witnessed the proliferation of Low-power Wide Area Networks (LPWANs) in the unlicensed band for various Internet-of-Things (IoT) applications. Due to the ultra-low transmission power and long transmission duration, LPWAN devices inevitably suffer from high power Cross Technology Interference (CTI), such as interference from Wi-Fi, coexisting in the same spectrum. To alleviate this issue, this paper introduces the Partial Symbol Recovery (PSR) scheme for improving the CTI resilience of LPWAN. We verify our idea on LoRa, a widely adopted LPWAN technique, as a proof of concept. At the PHY layer, although CTI has much higher power, its duration is relatively shorter compared with LoRa symbols, leaving part of a LoRa symbol uncorrupted. Moreover, due to its high redundancy, LoRa chips within a symbol are highly correlated. This opens the possibility of detecting a LoRa symbol with only part of the chips. By examining the unique frequency patterns in LoRa symbols with time-frequency analysis, our design effectively detects the clean LoRa chips that are free of CTI. This enables PSR to only rely on clean LoRa chips for successfully recovering from communication failures. We evaluate our PSR design with real-world testbeds, including SX1280 LoRa chips and USRP B210, under Wi-Fi interference in various scenarios. Extensive experiments demonstrate that our design offers reliable packet recovery performance, successfully boosting the LoRa packet reception ratio from 45.2% to 82.2% with a performance gain of 1.8 times.

Submitted 8 September, 2021; originally announced September 2021.

arXiv:2108.09591 [pdf, other]

doi 10.1109/BHI50953.2021.9508604

Multimodal Breast Lesion Classification Using Cross-Attention Deep Networks

Authors: Hung Q. Vo, Pengyu Yuan, Tiancheng He, Stephen T. C. Wong, Hien V. Nguyen

Abstract: Accurate breast lesion risk estimation can significantly reduce unnecessary biopsies and help doctors decide optimal treatment plans. Most existing computer-aided systems rely solely on mammogram features to classify breast lesions. While this approach is convenient, it does not fully exploit useful information in clinical reports to achieve the optimal performance. Would clinical features significantly improve breast lesion classification compared to using mammograms alone? How to handle missing clinical information caused by variation in medical practice? What is the best way to combine mammograms and clinical features? There is a compelling need for a systematic study to address these fundamental questions. This paper investigates several multimodal deep networks based on feature concatenation, cross-attention, and co-attention to combine mammograms and categorical clinical variables. We show that the proposed architectures significantly increase the lesion classification performance (average area under ROC curves from 0.89 to 0.94). We also evaluate the model when clinical variables are missing.

Submitted 21 August, 2021; originally announced August 2021.

arXiv:2108.02656 [pdf]

A Computer-Aided Diagnosis System for Breast Pathology: A Deep Learning Approach with Model Interpretability from Pathological Perspective

Authors: Wei-Wen Hsu, Yongfang Wu, Chang Hao, Yu-Ling Hou, Xiang Gao, Yun Shao, Xueli Zhang, Tao He, Yanhong Tai

Abstract: Objective: We develop a computer-aided diagnosis (CAD) system using deep learning approaches for lesion detection and classification on whole-slide images (WSIs) with breast cancer. The deep features from the convolutional neural networks (CNN) that are distinguishing in classification are demonstrated in this study to provide comprehensive interpretability for the proposed CAD system using pathological knowledge. Methods: In the experiment, a total of 186 slides of WSIs were collected and classified into three categories: Non-Carcinoma, Ductal Carcinoma in Situ (DCIS), and Invasive Ductal Carcinoma (IDC). Instead of conducting pixel-wise classification into three classes directly, we designed a hierarchical framework with the multi-view scheme that performs lesion detection for region proposal at higher magnification first and then conducts lesion classification at lower magnification for each detected lesion. Results: The slide-level accuracy rate for three-category classification reaches 90.8% (99/109) through 5-fold cross-validation and achieves 94.8% (73/77) on the testing set. The experimental results show that the morphological characteristics and co-occurrence properties learned by the deep learning models for lesion classification are in accordance with the clinical rules in diagnosis. Conclusion: The pathological interpretability of the deep features not only enhances the reliability of the proposed CAD system to gain acceptance from medical specialists, but also facilitates the development of deep learning frameworks for various tasks in pathology.
Significance: This paper presents a CAD system for pathological image analysis, which fulfills the clinical requirements and can be accepted by medical specialists by providing interpretability from the pathological perspective.

Submitted 5 August, 2021; originally announced August 2021.

arXiv:2107.09889 [pdf, other]

Fine-Grained Music Plagiarism Detection: Revealing Plagiarists through Bipartite Graph Matching and a Comprehensive Large-Scale Dataset

Authors: Wenxuan Liu, Tianyao He, Chen Gong, Ning Zhang, Hua Yang, Junchi Yan

Abstract: Music plagiarism detection is gaining more and more attention due to the popularity of music production and society's emphasis on intellectual property. We aim to find fine-grained plagiarism in music pairs since conventional methods are coarse-grained and cannot match real-life scenarios. Considering that there is no sizeable dataset designed for the music plagiarism task, we establish a large-scale simulated dataset, named Music Plagiarism Detection Dataset (MPD-Set), under the guidance and expertise of renowned researchers from national-level professional institutions in the field of music. MPD-Set considers diverse music plagiarism cases found in real life from the melodic, rhythmic, and tonal levels respectively. Further, we establish a Real-life Dataset for evaluation, where all plagiarism pairs are real cases. To detect the fine-grained plagiarism pairs effectively, we propose a graph-based method called Bipartite Melody Matching Detector (BMM-Det), which formulates the problem as a max matching problem in the bipartite graph. Experimental results on both the simulated and Real-life Datasets demonstrate that BMM-Det outperforms the existing plagiarism detection methods, and is robust to common plagiarism cases like transpositions, pitch shifts, duration variance, and melody change. Datasets and source code are open-sourced at https://github.com/xuan301/BMMDet_MPDSet.
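Illustrative note: the abstract above frames detection as a max matching problem in a bipartite graph. The sketch below is not the authors' BMM-Det. It is a generic augmenting-path (Kuhn's) algorithm for maximum-cardinality bipartite matching, shown only to make the formulation concrete. The function name `max_bipartite_matching`, the toy `candidate_pairs` input, and the reading of left and right nodes as melody segments of two pieces are all assumptions made for illustration.

```python
from typing import Dict, List, Optional


def max_bipartite_matching(adj: Dict[int, List[int]], n_right: int) -> Dict[int, int]:
    """Return a maximum-cardinality matching as {left_node: right_node}.

    adj maps each left node to the right nodes it may be paired with.
    Right nodes are assumed to be numbered 0 .. n_right - 1.
    """
    # match_right[v] holds the left node currently matched to right node v.
    match_right: List[Optional[int]] = [None] * n_right

    def try_augment(u: int, seen: List[bool]) -> bool:
        # Depth-first search for an augmenting path that starts at left node u.
        for v in adj.get(u, []):
            if seen[v]:
                continue
            seen[v] = True
            # Take v if it is free, or if its current partner can be re-routed.
            if match_right[v] is None or try_augment(match_right[v], seen):
                match_right[v] = u
                return True
        return False

    for u in adj:
        try_augment(u, [False] * n_right)

    return {u: v for v, u in enumerate(match_right) if u is not None}


# Hypothetical toy input: left node i could correspond to the listed right nodes.
candidate_pairs = {0: [0, 1], 1: [0], 2: [1, 2]}
matching = max_bipartite_matching(candidate_pairs, n_right=3)
print(sorted(matching.items()))
```

A practical detector would presumably weight edges by a similarity score and solve a weighted matching instead; that choice is not specified in the abstract and is left out here.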

Submitted 2 July, 2023; v1 submitted 21 July, 2021; originally announced July 2021.

arXiv:2104.06473 [pdf, other]

Topology Estimation Following Islanding and its Impact on Preventive Control of Cascading Failure

Authors: Sai Gopal Vennelaganti, Nilanjan Ray Chaudhuri, Ting He, Thomas La Porta

Abstract: Knowledge of the power grid's topology during cascading failure is an essential element of centralized blackout prevention control, given that multiple islands are typically formed as a cascade progresses. Moreover, academic research on the interdependency between cyber and physical layers of the grid indicates that power failure during a cascade may lead to outages in communication networks, which progressively reduce the observable areas. These challenge the current literature on line outage detection, which assumes that the grid remains as a single connected component. We propose a new approach to eliminate that assumption. Following an islanding event, first the buses forming the connected components are identified and then further line outages within the individual islands are detected. In addition to the power system measurements, observable breaker statuses are integrated as constraints in our topology identification algorithm. The impact of error propagation on the estimation process, as reliance on previous estimates keeps growing during the cascade, is also studied. Finally, the estimated admittance matrix is used in preventive control of cascading failure, creating a closed-loop system. The impact of such an interlinked estimation and control on the total load served is studied for the first time. Simulations in the IEEE-118 bus system and the 2,383-bus Polish network demonstrate the effectiveness of our approach.

Submitted 13 April, 2021; originally announced April 2021.

arXiv:2006.13305 [pdf]

doi 10.1016/j.jhydrol.2021.126626

Deep Learning of Dynamic Subsurface Flow via Theory-guided Generative Adversarial Network

Authors: Tianhao He, Dongxiao Zhang

Abstract: Generative adversarial network (GAN) has been shown to be useful in various applications, such as image recognition, text processing and scientific computing, due to its strong ability to learn complex data distributions. In this study, a theory-guided generative adversarial network (TgGAN) is proposed to solve dynamic partial differential equations (PDEs). Different from standard GANs, the training term is no longer the true data and the generated data, but rather their residuals. In addition, such theories as governing equations, other physical constraints and engineering controls are encoded into the loss function of the generator to ensure that the prediction not only honors the training data, but also obeys these theories. TgGAN is proposed for dynamic subsurface flow with heterogeneous model parameters, and the data at each time step are treated as a two-dimensional image. In this study, several numerical cases are introduced to test the performance of the TgGAN. Predicting the future response, label-free learning and learning from noisy data can be realized easily by the TgGAN model. The effects of the number of training data and the collocation points are also discussed. In order to improve the efficiency of TgGAN, the transfer learning algorithm is also employed. Numerical results demonstrate that the TgGAN model is robust and reliable for deep learning of dynamic PDEs.

Submitted 1 June, 2020; originally announced June 2020.

Comments: 35 pages, 19 figures, and 7 tables

Journal ref: Journal of Hydrology, 601 (2021), 126626

arXiv:2006.07585 [pdf, other]

Learning from the Scene and Borrowing from the Rich: Tackling the Long Tail in Scene Graph Generation

Authors: Tao He, Lianli Gao, Jingkuan Song, Jianfei Cai, Yuan-Fang Li

Abstract: Despite the huge progress in scene graph generation in recent years, its long-tail distribution in object relationships remains a challenging and pestering issue. Existing methods largely rely on either external knowledge or statistical bias information to alleviate this problem. In this paper, we tackle this issue from another two aspects: (1) scene-object interaction aiming at learning specific knowledge from a scene via an additive attention mechanism; and (2) long-tail knowledge transfer which tries to transfer the rich knowledge learned from the head into the tail. Extensive experiments on the benchmark dataset Visual Genome on three tasks demonstrate that our method outperforms current state-of-the-art competitors.

Submitted 13 June, 2020; originally announced June 2020.

arXiv:2004.13316 [pdf, other]

SCRDet++: Detecting Small, Cluttered and Rotated Objects via Instance-Level Feature Denoising and Rotation Loss Smoothing

Authors: Xue Yang, Junchi Yan, Wenlong Liao, Xiaokang Yang, Jin Tang, Tao He

Abstract: Small and cluttered objects are common in the real world and are challenging for detection. The difficulty is further pronounced when the objects are rotated, as traditional detectors routinely locate the objects in horizontal bounding boxes such that the region of interest is contaminated with background or nearby interleaved objects. In this paper, we first innovatively introduce the idea of denoising to object detection. Instance-level denoising on the feature map is performed to enhance the detection of small and cluttered objects. To handle the rotation variation, we also add a novel IoU constant factor to the smooth L1 loss to address the long-standing boundary problem, which, according to our analysis, is mainly caused by the periodicity of angular (PoA) and exchangeability of edges (EoE). By combining these two features, our proposed detector is termed SCRDet++. Extensive experiments are performed on the large public aerial image datasets DOTA, DIOR, and UCAS-AOD, as well as the natural image dataset COCO, the scene text dataset ICDAR2015, the small traffic light dataset BSTLD, and S$^2$TLD, which is released with this paper. The results show the effectiveness of our approach. The released dataset S$^2$TLD is made publicly available and contains 5,786 images with 14,130 traffic light instances across five categories.

Submitted 28 April, 2022; v1 submitted 28 April, 2020; originally announced April 2020.

Comments: 15 pages, 12 figures, 11 tables, accepted by TPAMI

arXiv:1912.12847 [pdf, other]

Generative Memorize-Then-Recall framework for low bit-rate Surveillance Video Compression

Authors: Yaojun Wu, Tianyu He, Zhibo Chen

Abstract: Applications of surveillance video have developed rapidly in recent years to protect public safety and daily life, which often detect and recognize objects in video sequences. Traditional coding frameworks remove temporal redundancy in surveillance video by block-wise motion compensation, lacking the extraction and utilization of inherent structure information. In this paper, we address this issue by disentangling surveillance video into the structure of a global spatio-temporal feature (memory) for each Group of Picture (GoP) and a skeleton for each frame (clue). The memory is obtained by sequentially feeding the frames inside a GoP into a recurrent neural network, describing the appearance of objects that appeared inside the GoP. The skeleton is calculated by a pose estimator and is regarded as a clue to recall memory. Furthermore, an attention mechanism is introduced to obtain the relation between appearance and skeletons. Finally, we employ a generative adversarial network to reconstruct each frame. Experimental results indicate that our method effectively generates realistic reconstruction based on appearance and skeleton, which shows much higher compression performance on surveillance video compared with the latest video compression standard H.265.

Submitted 6 May, 2020; v1 submitted 30 December, 2019; originally announced December 2019.

Comments: 11 pages, 8 figures

arXiv:1908.05612 [pdf, other]

R3Det: Refined Single-Stage Detector with Feature Refinement for Rotating Object

Authors: Xue Yang, Junchi Yan, Ziming Feng, Tao He

Abstract: Rotation detection is a challenging task due to the difficulties of locating the multi-angle objects and separating them effectively from the background. Though considerable progress has been made, for practical settings, there still exist challenges for rotating objects with large aspect ratio, dense distribution and extreme category imbalance. In this paper, we propose an end-to-end refined single-stage rotation detector for fast and accurate object detection by using a progressive regression approach from coarse to fine granularity. Considering the shortcoming of feature misalignment in existing refined single-stage detectors, we design a feature refinement module to improve detection performance by getting more accurate features. The key idea of the feature refinement module is to re-encode the position information of the current refined bounding box to the corresponding feature points through pixel-wise feature interpolation to realize feature reconstruction and alignment. For more accurate rotation estimation, an approximate SkewIoU loss is proposed to solve the problem that the calculation of SkewIoU is not differentiable. Experiments on three popular public remote sensing datasets, DOTA, HRSC2016, and UCAS-AOD, as well as one scene text dataset, ICDAR2015, show the effectiveness of our approach.
Tensorflow and Pytorch version codes are available at https://github.com/Thinklab-SJTU/R3Det_Tensorflow and https://github.com/SJTU-Thinklab-Det/r3det-on-mmdetection, and R3Det is also integrated in our open source rotation detection benchmark: https://github.com/yangxue0827/RotationDetection.

Submitted 8 December, 2020; v1 submitted 15 August, 2019; originally announced August 2019.

Comments: 13 pages, 12 figures, 9 tables

Journal ref: Thirty-Five AAAI Conference on Artificial Intelligence (AAAI2021)

arXiv:1907.03221 [pdf, other]

FC$^2$N: Fully Channel-Concatenated Network for Single Image Super-Resolution

Authors: Xiaole Zhao, Ying Liao, Tian He, Yulun Zhang, Yadong Wu, Tao Zhang

Abstract: Most current image super-resolution (SR) methods based on convolutional neural networks (CNNs) use residual learning in network structural design, which favors effective back propagation and hence improves SR performance by increasing model scale. However, residual networks suffer from representational redundancy by introducing identity paths that impede the full exploitation of model capacity. Besides, blindly enlarging network scale can cause more problems in model training, even with residual learning. In this paper, a novel fully channel-concatenated network (FC$^2$N) is presented to further mine the representational capacity of deep models, in which all interlayer skips are implemented by a simple and straightforward operation, i.e., weighted channel concatenation (WCC), followed by a 1$\times$1 conv layer. Based on the WCC, the model can achieve the joint attention mechanism of linear and nonlinear features in the network, and presents better performance than other state-of-the-art SR models with fewer model parameters. To the best of our knowledge, FC$^2$N is the first CNN model that does not use residual learning and reaches network depth over 400 layers. Moreover, it shows excellent performance in both large-scale and lightweight implementations, which illustrates the full exploitation of the representational capacity of the model.

Submitted 5 May, 2021; v1 submitted 7 July, 2019; originally announced July 2019.

Comments: 17 pages, 8 figures and 4 tables

arXiv:1804.09869 [pdf, other]

doi 10.1109/TCSVT.2019.2892608

Learning for Video Compression

Authors: Zhibo Chen, Tianyu He, Xin Jin, Feng Wu

Abstract: One key challenge to learning-based video compression is that motion predictive coding, a very effective tool for video compression, can hardly be trained into a neural network. In this paper we propose the concept of PixelMotionCNN (PMCNN) which includes motion extension and hybrid prediction networks. PMCNN can model spatiotemporal coherence to effectively perform predictive coding inside the learning network. On the basis of PMCNN, we further explore a learning-based framework for video compression with additional components of iterative analysis/synthesis, binarization, etc. Experimental results demonstrate the effectiveness of the proposed scheme. Although entropy coding and complex configurations are not employed in this paper, we still demonstrate superior performance compared with MPEG-2 and achieve comparable results with H.264 codec. The proposed learning-based scheme provides a possible new direction to further improve compression efficiency and functionalities of future video coding.

Submitted 9 January, 2019; v1 submitted 25 April, 2018; originally announced April 2018.

Comments: Accepted by IEEE Transactions on Circuits and Systems for Video Technology (TCSVT)

arXiv:1603.06263 [pdf, other]

Data-Driven Robust Taxi Dispatch under Demand Uncertainties

Authors: Fei Miao, Shuo Han, Shan Lin, Qian Wang, John Stankovic, Abdeltawab Hendawi, Desheng Zhang, Tian He, George J. Pappas

Abstract: In modern taxi networks, large amounts of taxi occupancy status and location data are collected from networked in-vehicle sensors in real-time. They provide knowledge of system models on passenger demand and mobility patterns for efficient taxi dispatch and coordination strategies. Such approaches face new challenges: how to deal with uncertainties of predicted customer demand while fulfilling the system's performance requirements, including minimizing taxis' total idle mileage and maintaining service fairness across the whole city; how to formulate a computationally tractable problem. To address this problem, we develop a data-driven robust taxi dispatch framework to consider spatial-temporally correlated demand uncertainties. The robust vehicle dispatch problem we formulate is concave in the uncertain demand and convex in the decision variables. Uncertainty sets of random demand vectors are constructed from data based on theories in hypothesis testing, and provide a desired probabilistic guarantee level for the performance of robust taxi dispatch solutions. We prove equivalent computationally tractable forms of the robust dispatch problem using the minimax theorem and strong duality. Evaluations on four years of taxi trip data for New York City show that by selecting a probabilistic guarantee level at 75%, the average demand-supply ratio error is reduced by 31.7%, and the average total idle driving distance is reduced by 10.13% or about 20 million miles annually, compared with non-robust dispatch solutions.

Submitted 20 October, 2017; v1 submitted 20 March, 2016; originally announced March 2016.

Comments: Accepted as a regular paper, IEEE Transactions on Control Systems Technology; 15 pages. This version updated as of Oct 2017

arXiv:1603.04418 [pdf, other]

doi 10.1145/2735960.2735961

Taxi Dispatch with Real-Time Sensing Data in Metropolitan Areas: A Receding Horizon Control Approach

Authors: Fei Miao, Shuo Han, Shan Lin, John A. Stankovic, Hua Huang, Desheng Zhang, Sirajum Munir, Tian He, George J. Pappas

Abstract: Traditional taxi systems in metropolitan areas often suffer from inefficiencies due to uncoordinated actions as system capacity and customer demand change. With the pervasive deployment of networked sensors in modern vehicles, large amounts of information regarding customer demand and system status can be collected in real time. This information provides opportunities to perform various types of control and coordination for large-scale intelligent transportation systems. In this paper, we present a receding horizon control (RHC) framework to dispatch taxis, which incorporates highly spatiotemporally correlated demand/supply models and real-time GPS location and occupancy information. The objectives include matching spatiotemporal ratio between demand and supply for service quality with minimum current and anticipated future taxi idle driving distance. Extensive trace-driven analysis with a data set containing taxi operational records in San Francisco shows that our solution reduces the average total idle distance by 52%, and reduces the supply demand ratio error across the city during one experimental time slot by 45%. Moreover, our RHC framework is compatible with a wide variety of predictive models and optimization problem formulations. This compatibility property allows us to solve robust optimization problems with corresponding demand uncertainty models that provide disruptive event information.

Submitted 14 March, 2016; originally announced March 2016.

Comments: Accepted. Key words--Intelligent Transportation System, Real-Time Taxi Dispatch, Receding Horizon Control, Mobility Pattern

Journal ref: IEEE Transactions on Automation Science and Engineering (TASE), 2016

arXiv:1603.02764 [pdf, other]

doi 10.1109/TPDS.2016.2533614

Distributed Control for Charging Multiple Electric Vehicles with Overload Limitation

Authors: Bo Yang, Jingwei Li, Qiaoni Han, Tian He, Cailian Chen, Xinping Guan

Abstract: Severe pollution induced by traditional fossil fuels has drawn great attention to the usage of plug-in electric vehicles (PEVs) and renewable energy. However, large-scale penetration of PEVs combined with other kinds of appliances tends to cause excessive or even disastrous burden on the power grid, especially during peak hours. This paper focuses on the scheduling of the PEV charging process among different charging stations, where each station can be supplied by both renewable energy generators and a distribution network. The distribution network also powers some uncontrollable loads. In order to minimize the on-grid energy cost with local renewable energy and non-ideal storage while avoiding the overload risk of the distribution network, an online algorithm consisting of scheduling the charging of PEVs and energy management of charging stations is developed based on Lyapunov optimization and Lagrange dual decomposition techniques. The algorithm can satisfy the random charging requests from PEVs with provable performance. Simulation results with real data demonstrate that the proposed algorithm can decrease the time-average cost of stations while avoiding overload in the distribution network in the presence of random uncontrollable loads.

Submitted 8 March, 2016; originally announced March 2016.

Comments: 30 pages, 13 figures

Showing 1–31 of 31 results for author: He, T