MobileMEF: Fast and Efficient Method for Multi-Exposure Fusion

Authors: Lucas Nedel Kirsten, Zhicheng Fu, Nikhil Ambha Madhusudhana

Abstract: Recent advances in camera design and imaging technology have enabled the capture of high-quality images using smartphones. However, due to the limited dynamic range of digital cameras, photographs captured in environments with highly imbalanced lighting are often of poor quality. To address this issue, most devices capture multi-exposure frames and then use some multi-exposure fusion method to merge those frames into a final fused image. Nevertheless, most traditional and current deep learning approaches are unsuitable for real-time applications on mobile devices due to their heavy computational and memory requirements. We propose a new method for multi-exposure fusion based on an encoder-decoder deep learning architecture with efficient building blocks tailored for mobile devices. This efficient design makes our model capable of processing 4K resolution images in less than 2 seconds on mid-range smartphones. Our method outperforms state-of-the-art techniques regarding full-reference quality measures and computational efficiency (runtime and memory usage), making it ideal for real-time applications on hardware-constrained devices. Our code is available at: https://github.com/LucasKirsten/MobileMEF.

Submitted 15 August, 2024; originally announced August 2024.

arXiv:2406.13292

An interpretable generative multimodal neuroimaging-genomics framework for decoding Alzheimer's disease

Authors: Giorgio Dolci, Federica Cruciani, Md Abdur Rahaman, Anees Abrol, Jiayu Chen, Zening Fu, Ilaria Boscolo Galazzo, Gloria Menegaz, Vince D. Calhoun

Abstract: Alzheimer's disease (AD) is the most prevalent form of dementia with a progressive decline in cognitive abilities. The AD continuum encompasses a prodromal stage known as Mild Cognitive Impairment (MCI), where patients may either progress to AD or remain stable. In this study, we leveraged structural and functional MRI to investigate the disease-induced grey matter and functional network connectivity changes. Moreover, considering AD's strong genetic component, we introduce SNPs as a third channel. Given such diverse inputs, missing one or more modalities is a typical concern of multimodal methods. We hence propose a novel deep learning-based classification framework where a generative module employing Cycle GANs was adopted to impute missing data within the latent space. Additionally, we adopted an Explainable AI method, Integrated Gradients, to extract input feature relevance, enhancing our understanding of the learned representations. Two critical tasks were addressed: AD detection and MCI conversion prediction. Experimental results showed that our model was able to reach the SOA in the classification of CN/AD, reaching an average test accuracy of $0.926\pm0.02$. For the MCI task, we achieved an average prediction accuracy of $0.711\pm0.01$ using the pre-trained model for CN/AD. The interpretability analysis revealed significant grey matter modulations in cortical and subcortical brain areas well known for their association with AD. Moreover, impairments in sensory-motor and visual resting state network connectivity along the disease continuum, as well as mutations in SNPs defining biological processes linked to amyloid-beta and cholesterol formation, clearance, and regulation, were identified as contributors to the achieved performance. Overall, our integrative deep learning approach shows promise for AD detection and MCI prediction, while shedding light on important biological insights.

Submitted 19 June, 2024; originally announced June 2024.

Comments: 27 pages, 7 figures, submitted to a journal

arXiv:2406.10454

HumanPlus: Humanoid Shadowing and Imitation from Humans

Authors: Zipeng Fu, Qingqing Zhao, Qi Wu, Gordon Wetzstein, Chelsea Finn

Abstract: One of the key arguments for building robots that have similar form factors to human beings is that we can leverage the massive human data for training. Yet, doing so has remained challenging in practice due to the complexities in humanoid perception and control, lingering physical gaps between humanoids and humans in morphologies and actuation, and lack of a data pipeline for humanoids to learn autonomous skills from egocentric vision. In this paper, we introduce a full-stack system for humanoids to learn motion and autonomous skills from human data. We first train a low-level policy in simulation via reinforcement learning using existing 40-hour human motion datasets. This policy transfers to the real world and allows humanoid robots to follow human body and hand motion in real time using only an RGB camera, i.e. shadowing. Through shadowing, human operators can teleoperate humanoids to collect whole-body data for learning different tasks in the real world. Using the data collected, we then perform supervised behavior cloning to train skill policies using egocentric vision, allowing humanoids to complete different tasks autonomously by imitating human skills. We demonstrate the system on our customized 33-DoF 180cm humanoid, autonomously completing tasks such as wearing a shoe to stand up and walk, unloading objects from warehouse racks, folding a sweatshirt, rearranging objects, typing, and greeting another robot with 60-100% success rates using up to 40 demonstrations. Project website: https://humanoid-ai.github.io/

Submitted 14 June, 2024; originally announced June 2024.

Comments: project website: https://humanoid-ai.github.io/

arXiv:2405.01119

Towards Understanding Worldwide Cross-cultural Differences in Implicit Driving Cues: Review, Comparative Analysis, and Research Roadmap

Authors: Yongqi Dong, Chang Liu, Yiyun Wang, Zhe Fu

Abstract: Recognizing and understanding implicit driving cues across diverse cultures is imperative for fostering safe and efficient global transportation systems, particularly when training new immigrants holding driving licenses from culturally disparate countries. Additionally, it is essential to consider cross-cultural differences in the development of Automated Driving features tailored to different countries. Previous piloting studies have compared and analyzed cross-cultural differences in selected implicit driving cues, but they typically examine only limited countries. However, a comprehensive worldwide comparison and analysis are lacking. This study conducts a thorough review of existing literature, online blogs, and expert insights from diverse countries to investigate cross-cultural disparities in driving behaviors, specifically focusing on implicit cues such as non-verbal communication (e.g., hand gestures, signal lighting, honking), norms, and social expectations. Through comparative analysis, variations in driving cues are illuminated across different cultural contexts. Based on the findings and identified gaps, a research roadmap is proposed for future research to further explore and address these differences, aiming to enhance intercultural communication, improve road safety, and increase transportation efficiency on a global scale. This paper presents the pioneering work towards a comprehensive understanding of the implicit driving cues across cultures. Moreover, this understanding will inform the development of automated driving systems tailored to different countries considering cross-cultural differences.

Submitted 2 May, 2024; originally announced May 2024.

Comments: 7 pages, 1 figure, under review by the 27th IEEE International Conference on Intelligent Transportation Systems (IEEE ITSC 2024)

arXiv:2405.00472

DmADs-Net: Dense multiscale attention and depth-supervised network for medical image segmentation

Authors: Zhaojin Fu, Zheng Chen, Jinjiang Li, Lu Ren

Abstract: Deep learning has made important contributions to the development of medical image segmentation. Convolutional neural networks, as a crucial branch, have attracted strong attention from researchers. Through the tireless efforts of numerous researchers, convolutional neural networks have yielded numerous outstanding algorithms for processing medical images. The ideas and architectures of these algorithms have also provided important inspiration for the development of later technologies. Through extensive experimentation, we have found that currently mainstream deep learning algorithms are not always able to achieve ideal results when processing complex datasets and different types of datasets. These networks still have room for improvement in lesion localization and feature extraction. Therefore, we have created the Dense Multiscale Attention and Depth-Supervised Network (DmADs-Net). We use ResNet for feature extraction at different depths and create a Multi-scale Convolutional Feature Attention Block to improve the network's attention to weak feature information. The Local Feature Attention Block is created to enable enhanced local feature attention for high-level semantic information. In addition, in the feature fusion phase, a Feature Refinement and Fusion Block is created to enhance the fusion of different semantic information. We validated the performance of the network using five datasets of varying sizes and types. Results from comparative experiments show that DmADs-Net outperformed mainstream networks. Ablation experiments further demonstrated the effectiveness of the created modules and the rationality of the network architecture.

Submitted 1 May, 2024; originally announced May 2024.

arXiv:2404.00144

An Interpretable Cross-Attentive Multi-modal MRI Fusion Framework for Schizophrenia Diagnosis

Authors: Ziyu Zhou, Anton Orlichenko, Gang Qu, Zening Fu, Vince D Calhoun, Zhengming Ding, Yu-Ping Wang

Abstract: Both functional and structural magnetic resonance imaging (fMRI and sMRI) are widely used for the diagnosis of mental disorders. However, combining complementary information from these two modalities is challenging due to their heterogeneity. Many existing methods fall short of capturing the interaction between these modalities, frequently defaulting to a simple combination of latent features. In this paper, we propose a novel Cross-Attentive Multi-modal Fusion framework (CAMF), which aims to capture both intra-modal and inter-modal relationships between fMRI and sMRI, enhancing multi-modal data representation. Specifically, our CAMF framework employs self-attention modules to identify interactions within each modality while cross-attention modules identify interactions between modalities. Subsequently, our approach optimizes the integration of latent features from both modalities. This approach significantly improves classification accuracy, as demonstrated by our evaluations on two extensive multi-modal brain imaging datasets, where CAMF consistently outperforms existing methods. Furthermore, the gradient-guided Score-CAM is applied to interpret critical functional networks and brain regions involved in schizophrenia. The bio-markers identified by CAMF align with established research, potentially offering new insights into the diagnosis and pathological endophenotypes of schizophrenia.

Submitted 29 March, 2024; originally announced April 2024.

arXiv:2402.17043

Traffic Control via Connected and Automated Vehicles: An Open-Road Field Experiment with 100 CAVs

Authors: Jonathan W. Lee, Han Wang, Kathy Jang, Amaury Hayat, Matthew Bunting, Arwa Alanqary, William Barbour, Zhe Fu, Xiaoqian Gong, George Gunter, Sharon Hornstein, Abdul Rahman Kreidieh, Nathan Lichtlé, Matthew W. Nice, William A. Richardson, Adit Shah, Eugene Vinitsky, Fangyu Wu, Shengquan Xiang, Sulaiman Almatrudi, Fahd Althukair, Rahul Bhadani, Joy Carpio, Raphael Chekroun, Eric Cheng , et al. (39 additional authors not shown)

Abstract: The CIRCLES project aims to reduce instabilities in traffic flow, which are naturally occurring phenomena due to human driving behavior. These "phantom jams" or "stop-and-go waves," are a significant source of wasted energy. Toward this goal, the CIRCLES project designed a control system, referred to as the MegaController by the CIRCLES team, that could be deployed in real traffic. Our field experiment leveraged a heterogeneous fleet of 100 longitudinally-controlled vehicles as Lagrangian traffic actuators, each of which ran a controller with the architecture described in this paper. The MegaController is a hierarchical control architecture, which consists of two main layers. The upper layer is called Speed Planner, and is a centralized optimal control algorithm. It assigns speed targets to the vehicles, conveyed through the LTE cellular network. The lower layer is a control layer, running on each vehicle. It performs local actuation by overriding the stock adaptive cruise controller, using the stock on-board sensors. The Speed Planner ingests live data feeds provided by third parties, as well as data from our own control vehicles, and uses both to perform the speed assignment. The architecture of the speed planner allows for modular use of standard control techniques, such as optimal control, model predictive control, kernel methods and others, including Deep RL, model predictive control and explicit controllers. Depending on the vehicle architecture, all onboard sensing data can be accessed by the local controllers, or only some. Control inputs vary across different automakers, with inputs ranging from torque or acceleration requests for some cars to electronic selection of ACC set points in others. The proposed architecture allows for the combination of all possible settings proposed above. Most configurations were tested throughout the ramp up to the MegaVandertest.

Submitted 26 February, 2024; originally announced February 2024.

arXiv:2402.16993

Hierarchical Speed Planner for Automated Vehicles: A Framework for Lagrangian Variable Speed Limit in Mixed Autonomy Traffic

Authors: Han Wang, Zhe Fu, Jonathan Lee, Hossein Nick Zinat Matin, Arwa Alanqary, Daniel Urieli, Sharon Hornstein, Abdul Rahman Kreidieh, Raphael Chekroun, William Barbour, William A. Richardson, Dan Work, Benedetto Piccoli, Benjamin Seibold, Jonathan Sprinkle, Alexandre M. Bayen, Maria Laura Delle Monache

Abstract: This paper introduces a novel control framework for Lagrangian variable speed limits in hybrid traffic flow environments utilizing automated vehicles (AVs). The framework was validated using a fleet of 100 connected automated vehicles as part of the largest coordinated open-road test designed to smooth traffic flow. The framework includes two main components: a high-level controller deployed on the server side, named Speed Planner, and low-level controllers called vehicle controllers deployed on the vehicle side. The Speed Planner designs and updates target speeds for the vehicle controllers based on real-time Traffic State Estimation (TSE) [1]. The Speed Planner comprises two modules: a TSE enhancement module and a target speed design module. The TSE enhancement module is designed to minimize the effects of inherent latency in the received traffic information and to improve the spatial and temporal resolution of the input traffic data. The target speed design module generates target speed profiles with the goal of improving traffic flow. The vehicle controllers are designed to track the target speed while responding to the surrounding situation. The numerical simulation indicates the performance of the proposed method: the bottleneck throughput has increased by 5.01%, and the speed standard deviation has been reduced by a significant 34.36%. We further showcase an operational study with a description of how the controller was implemented on a field-test with 100 AVs and its comprehensive effects on the traffic flow.

Submitted 26 February, 2024; originally announced February 2024.

arXiv:2401.02117

Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation

Authors: Zipeng Fu, Tony Z. Zhao, Chelsea Finn

Abstract: Imitation learning from human demonstrations has shown impressive performance in robotics. However, most results focus on table-top manipulation, lacking the mobility and dexterity necessary for generally useful tasks. In this work, we develop a system for imitating mobile manipulation tasks that are bimanual and require whole-body control. We first present Mobile ALOHA, a low-cost and whole-body teleoperation system for data collection. It augments the ALOHA system with a mobile base, and a whole-body teleoperation interface. Using data collected with Mobile ALOHA, we then perform supervised behavior cloning and find that co-training with existing static ALOHA datasets boosts performance on mobile manipulation tasks. With 50 demonstrations for each task, co-training can increase success rates by up to 90%, allowing Mobile ALOHA to autonomously complete complex mobile manipulation tasks such as sauteing and serving a piece of shrimp, opening a two-door wall cabinet to store heavy cooking pots, calling and entering an elevator, and lightly rinsing a used pan using a kitchen faucet. Project website: https://mobile-aloha.github.io

Submitted 4 January, 2024; originally announced January 2024.

Comments: Project website: https://mobile-aloha.github.io (Zipeng Fu and Tony Z. Zhao are project co-leads, Chelsea Finn is the advisor)

arXiv:2309.01112

Swing Leg Motion Strategy for Heavy-load Legged Robot Based on Force Sensing

Authors: Ze Fu, Yinghui Li, Weizhong Guo

Abstract: The heavy-load legged robot has strong load carrying capacity and can adapt to various unstructured terrains. But the large weight results in higher requirements for motion stability and environmental perception ability. In order to utilize force sensing information to improve its motion performance, in this paper, we propose a finite state machine model for the swing leg in the static gait by imitating the movement of the elephant. Based on the presence or absence of additional terrain information, different trajectory planning strategies are provided for the swing leg to enhance the success rate of stepping and save energy. The experimental results on a novel quadruped robot show that our method has strong robustness and can enable heavy-load legged robots to pass through various complex terrains autonomously and smoothly.

Submitted 3 September, 2023; originally announced September 2023.

arXiv:2308.12797

TrafficMCTS: A Closed-Loop Traffic Flow Generation Framework with Group-Based Monte Carlo Tree Search

Authors: Licheng Wen, Ze Fu, Pinlong Cai, Daocheng Fu, Song Mao, Botian Shi

Abstract: Digital twins for intelligent transportation systems are currently attracting great interest, in which generating realistic, diverse, and human-like traffic flow in simulations is a formidable challenge. Current approaches often hinge on predefined driver models, objective optimization, or reliance on pre-recorded driving datasets, imposing limitations on their scalability, versatility, and adaptability. In this paper, we introduce TrafficMCTS, an innovative framework that harnesses the synergy of group-based Monte Carlo tree search (MCTS) and Social Value Orientation (SVO) to engender a multifaceted traffic flow replete with varying driving styles and cooperative tendencies. Anchored by a closed-loop architecture, our framework enables vehicles to dynamically adapt to their environment in real time, and ensure feasible collision-free trajectories. Through comprehensive comparisons with state-of-the-art methods, we illuminate the advantages of our approach in terms of computational efficiency, planning success rate, intent completion time, and diversity metrics. Besides, we simulate highway and roundabout scenarios to illustrate the effectiveness of the proposed framework and highlight its ability to induce diverse social behaviors within the traffic flow. Finally, we validate the scalability of TrafficMCTS by showcasing its prowess in simultaneously mass vehicles within a sprawling road network, cultivating a landscape of traffic flow that mirrors the intricacies of human behavior.

Submitted 31 August, 2023; v1 submitted 24 August, 2023; originally announced August 2023.

arXiv:2302.07453

Cooperative Driving for Speed Harmonization in Mixed-Traffic Environments

Authors: Zhe Fu, Abdul Rahman Kreidieh, Han Wang, Jonathan W. Lee, Maria Laura Delle Monache, Alexandre M. Bayen

Abstract: Autonomous driving systems present promising methods for congestion mitigation in mixed autonomy traffic control settings. In particular, when coupled with even modest traffic state estimates, such systems can plan and coordinate the behaviors of automated vehicles (AVs) in response to observed downstream events, thereby inhibiting the continued propagation of congestion. In this paper, we present a two-layer control strategy in which the upper layer proposes the desired speeds that predictively react to the downstream state of traffic, and the lower layer maintains safe and reasonable headways with leading vehicles. This method is demonstrated to achieve an average of over 15% energy savings within simulations of congested events observed in Interstate 24 with only 4% AV penetration, while restricting negative externalities imposed on traveling times and mobility. The proposed strategy that served as part of the "speed planner" was deployed on 100 AVs in a massive traffic experiment conducted on Nashville's I-24 in November 2022.

Submitted 3 June, 2023; v1 submitted 14 February, 2023; originally announced February 2023.

Comments: Accepted by IEEE IV 2023

arXiv:2211.16398

Self-Supervised Mental Disorder Classifiers via Time Reversal

Authors: Zafar Iqbal, Usman Mahmood, Zening Fu, Sergey Plis

Abstract: Data scarcity is a notable problem, especially in the medical domain, due to patient data laws. Therefore, efficient Pre-Training techniques could help in combating this problem. In this paper, we demonstrate that a model trained on the time direction of functional neuro-imaging data could help in any downstream task, for example, classifying diseases from healthy controls in fMRI data. We train a Deep Neural Network on Independent components derived from fMRI data using the Independent component analysis (ICA) technique. It learns time direction in the ICA-based data. This pre-trained model is further trained to classify brain disorders in different datasets. Through various experiments, we have shown that learning time direction helps a model learn some causal relation in fMRI data that helps in faster convergence, and consequently, the model generalizes well in downstream classification tasks even with fewer data records.

Submitted 30 November, 2022; v1 submitted 29 November, 2022; originally announced November 2022.

Comments: 10 pages, 7 figures

arXiv:2210.10044

Deep Whole-Body Control: Learning a Unified Policy for Manipulation and Locomotion

Authors: Zipeng Fu, Xuxin Cheng, Deepak Pathak

Abstract: An attached arm can significantly increase the applicability of legged robots to several mobile manipulation tasks that are not possible for the wheeled or tracked counterparts. The standard hierarchical control pipeline for such legged manipulators is to decouple the controller into that of manipulation and locomotion. However, this is ineffective. It requires immense engineering to support coordination between the arm and legs, and error can propagate across modules causing non-smooth unnatural motions. It is also biologically implausible given evidence for strong motor synergies across limbs. In this work, we propose to learn a unified policy for whole-body control of a legged manipulator using reinforcement learning. We propose Regularized Online Adaptation to bridge the Sim2Real gap for high-DoF control, and Advantage Mixing exploiting the causal dependency in the action space to overcome local minima during training the whole-body system. We also present a simple design for a low-cost legged manipulator, and find that our unified policy can demonstrate dynamic and agile behaviors across several task setups. Videos are at https://maniploco.github.io

Submitted 18 October, 2022; originally announced October 2022.

Comments: CoRL 2022 (Oral). Project website at https://maniploco.github.io

arXiv:2209.07654

Cerberus: Low-Drift Visual-Inertial-Leg Odometry For Agile Locomotion

Authors: Shuo Yang, Zixin Zhang, Zhengyu Fu, Zachary Manchester

Abstract: We present an open-source Visual-Inertial-Leg Odometry (VILO) state estimation solution, Cerberus, for legged robots that estimates position precisely on various terrains in real time using a set of standard sensors, including stereo cameras, IMU, joint encoders, and contact sensors. In addition to estimating robot states, we also perform online kinematic parameter calibration and contact outlier rejection to substantially reduce position drift. Hardware experiments in various indoor and outdoor environments validate that calibrating kinematic parameters within the Cerberus can reduce estimation drift to lower than 1% during long distance high speed locomotion. Our drift results are better than any other state estimation method using the same set of sensors reported in the literature. Moreover, our state estimator performs well even when the robot is experiencing large impacts and camera occlusion. The implementation of the state estimator, along with the datasets used to compute our results, are available at https://github.com/ShuoYangRobotics/Cerberus.

Submitted 15 September, 2022; originally announced September 2022.

Comments: 7 pages, 6 figures, submitted to IEEE ICRA 2023

arXiv:2209.07590

Prediction of Gender from Longitudinal MRI data via Deep Learning on Adolescent Data Reveals Unique Patterns Associated with Brain Structure and Change over a Two-year Period

Authors: Yuda Bi, Anees Abrol, Zening Fu, Jiayu Chen, Jingyu Liu, Vince Calhoun

Abstract: Deep learning algorithms for predicting neuroimaging data have shown considerable promise in various applications. Prior work has demonstrated that deep learning models that take advantage of the data's 3D structure can outperform standard machine learning on several learning tasks. However, most prior research in this area has focused on neuroimaging data from adults. Within the Adolescent Brain and Cognitive Development (ABCD) dataset, a large longitudinal development study, we examine structural MRI data to predict gender and identify gender-related changes in brain structure. Results demonstrate that gender prediction accuracy is exceptionally high (>97%) with training epochs >200 and that this accuracy increases with age. Brain regions identified as the most discriminative in the task under study include predominantly frontal areas and the temporal lobe. When evaluating gender predictive changes specific to a two-year increase in age, a broader set of visual, cingulate, and insular regions are revealed. Our findings show a robust gender-related structural brain change pattern, even over a small age range. This suggests that it might be possible to study how the brain changes during adolescence by looking at how these changes are related to different behavioral and environmental factors.

Submitted 5 March, 2023; v1 submitted 15 September, 2022; originally announced September 2022.

Comments: I submitted the wrong paper

arXiv:2208.12534 [pdf, other]

Learning energy-efficient driving behaviors by imitating experts

Authors: Abdul Rahman Kreidieh, Zhe Fu, Alexandre M. Bayen

Abstract: The rise of vehicle automation has generated significant interest in the potential role of future automated vehicles (AVs). In particular, in highly dense traffic settings, AVs are expected to serve as congestion-dampeners, mitigating the presence of instabilities that arise from various sources. However, in many applications, such maneuvers rely heavily on non-local sensing or coordination by interacting AVs, thereby rendering their adaptation to real-world settings a particularly difficult challenge. To address this challenge, this paper examines the role of imitation learning in bridging the gap between such control strategies and realistic limitations in communication and sensing. Treating one such controller as an "expert", we demonstrate that imitation learning can succeed in deriving policies that, if adopted by 5% of vehicles, may boost the energy-efficiency of networks with varying traffic conditions by 15% using only local observations. Results and code are available online at https://sites.google.com/view/il-traffic/home.

Submitted 28 June, 2022; originally announced August 2022.

arXiv:2208.10642 [pdf, other]

Anatomy-Aware Contrastive Representation Learning for Fetal Ultrasound

Authors: Zeyu Fu, Jianbo Jiao, Robail Yasrab, Lior Drukker, Aris T. Papageorghiou, J. Alison Noble

Abstract: Self-supervised contrastive representation learning offers the advantage of learning meaningful visual representations from unlabeled medical datasets for transfer learning. However, applying current contrastive learning approaches to medical data without considering its domain-specific anatomical characteristics may lead to visual representations that are inconsistent in appearance and semantics. In this paper, we propose to improve visual representations of medical images via anatomy-aware contrastive learning (AWCL), which incorporates anatomy information to augment the positive/negative pair sampling in a contrastive learning manner. The proposed approach is demonstrated for automated fetal ultrasound imaging tasks, enabling the positive pairs from the same or different ultrasound scans that are anatomically similar to be pulled together and thus improving the representation learning. We empirically investigate the effect of inclusion of anatomy information with coarse- and fine-grained granularity for contrastive learning, and find that learning with fine-grained anatomy information which preserves intra-class difference is more effective than its counterpart. We also analyze the impact of anatomy ratio on our AWCL framework and find that using more distinct but anatomically similar samples to compose positive pairs results in better quality representations. Experiments on a large-scale fetal ultrasound dataset demonstrate that our approach is effective for learning representations that transfer well to three clinical downstream tasks, and achieves superior performance compared to ImageNet supervised and the current state-of-the-art contrastive learning methods. In particular, AWCL outperforms ImageNet supervised method by 13.8% and state-of-the-art contrastive-based method by 7.1% on a cross-domain segmentation task.

Submitted 22 August, 2022; originally announced August 2022.

Comments: ECCV-MCV 2022

arXiv:2208.04556 [pdf, ps, other]

doi 10.1109/TVT.2022.3195529

A Codebook Design for FD-MIMO Systems with Multi-Panel Array

Authors: Zhilin Fu, Sangwon Hwang, Jihwan Moon, Haibao Ren, Inkyu Lee

Abstract: In this work, we study codebook designs for full-dimension multiple-input multiple-output (FD-MIMO) systems with a multi-panel array (MPA). We propose novel codebooks which allow precise beam structures for MPA FD-MIMO systems by investigating the physical properties and alignments of the panels. We specifically exploit the characteristic that a group of antennas in a vertical direction exhibit more correlation than those in a horizontal direction. This enables an economical use of feedback bits while constructing finer beams compared to conventional codebooks. The codebook is further improved by dynamically allocating the feedback bits on multiple parts such as beam amplitude and co-phasing coefficients using reinforcement learning. The numerical results confirm the effectiveness of the proposed approach in terms of both performance and computational complexity.

Submitted 9 August, 2022; originally announced August 2022.

arXiv:2205.13109 [pdf]

Learning to segment with limited annotations: Self-supervised pretraining with regression and contrastive loss in MRI

Authors: Lavanya Umapathy, Zhiyang Fu, Rohit Philip, Diego Martin, Maria Altbach, Ali Bilgin

Abstract: Obtaining manual annotations for large datasets for supervised training of deep learning (DL) models is challenging. The availability of large unlabeled datasets compared to labeled ones motivates the use of self-supervised pretraining to initialize DL models for subsequent segmentation tasks. In this work, we consider two pretraining approaches for driving a DL model to learn different representations using: a) regression loss that exploits spatial dependencies within an image and b) contrastive loss that exploits semantic similarity between pairs of images. The effect of pretraining techniques is evaluated in two downstream segmentation applications using Magnetic Resonance (MR) images: a) liver segmentation in abdominal T2-weighted MR images and b) prostate segmentation in T2-weighted MR images of the prostate. We observed that DL models pretrained using self-supervision can be finetuned for comparable performance with fewer labeled datasets. Additionally, we observed that initializing the DL model using contrastive loss based pretraining performed better than the regression loss.

Submitted 25 May, 2022; originally announced May 2022.

Comments: Presented at the Annual Conference of International Society for Magnetic Resonance in Medicine, London, UK. May 2022

arXiv:2204.08114 [pdf, ps, other]

A Distributed control framework for the optimal operation of DC microgrids

Authors: Zao Fu, Michele Cucuzzella, Carlo Cenedese, Wenwu Yu, Jacquelien M. A. Scherpen

Abstract: In this paper we propose an original distributed control framework for DC microgrids. We first formulate the (optimal) control objectives as an aggregative game suitable for the energy trading market. Then, based on the dual theory, we analyze the equivalent distributed optimal condition for the proposed aggregative game and design a distributed control scheme to solve it. By interconnecting the DC microgrid and the designed distributed control system in a power preserving way, we steer the DC microgrid's state to the desired optimal equilibrium, satisfying a predefined set of local and coupling constraints. Finally, based on the singular perturbation system theory, we analyze the convergence of the closed-loop system. The simulation results show excellent performance of the proposed control framework.

Submitted 11 January, 2023; v1 submitted 17 April, 2022; originally announced April 2022.

arXiv:2203.09487 [pdf, other]

Defending Against Adversarial Attack in ECG Classification with Adversarial Distillation Training

Authors: Jiahao Shao, Shijia Geng, Zhaoji Fu, Weilun Xu, Tong Liu, Shenda Hong

Abstract: In clinics, doctors rely on electrocardiograms (ECGs) to assess severe cardiac disorders. Owing to the development of technology and the increase in health awareness, ECG signals are currently obtained by using medical and commercial devices. Deep neural networks (DNNs) can be used to analyze these signals because of their high accuracy rate. However, researchers have found that adversarial attacks can significantly reduce the accuracy of DNNs. Studies have been conducted to defend ECG-based DNNs against traditional adversarial attacks, such as projected gradient descent (PGD), and smooth adversarial perturbation (SAP) which targets ECG classification; however, to the best of our knowledge, no study has completely explored the defense against adversarial attacks targeting ECG classification. Thus, we ran different experiments to explore the effects of defense methods against white-box and black-box adversarial attacks targeting ECG classification, and we found that some common defense methods performed well against these attacks. In addition, we proposed a new defense method called Adversarial Distillation Training (ADT) which comes from defensive distillation and can effectively improve the generalization performance of DNNs. The results show that our method performed more effectively against adversarial attacks targeting ECG classification than the other baseline methods, namely, adversarial training, defensive distillation, Jacob regularization, and noise-to-signal ratio regularization. Furthermore, we found that our method performed better against PGD attacks with low noise levels, which means that our method has stronger robustness.

Submitted 14 March, 2022; originally announced March 2022.

arXiv:2203.00512 [pdf, other]

A Deep Bayesian Neural Network for Cardiac Arrhythmia Classification with Rejection from ECG Recordings

Authors: Wenrui Zhang, Xinxin Di, Guodong Wei, Shijia Geng, Zhaoji Fu, Shenda Hong

Abstract: With the development of deep learning-based methods, automated classification of electrocardiograms (ECGs) has recently gained much attention. Although the effectiveness of deep neural networks has been encouraging, the lack of information given by the outputs restricts clinicians' reexamination. If the uncertainty estimation comes along with the classification results, cardiologists can pay more attention to "uncertain" cases. Our study aims to classify ECGs with rejection based on data uncertainty and model uncertainty. We perform experiments on a real-world 12-lead ECG dataset. First, we estimate uncertainties using the Monte Carlo dropout for each classification prediction, based on our Bayesian neural network. Then, we accept predictions with uncertainty under a given threshold and provide "uncertain" cases for clinicians. Furthermore, we perform a simulation experiment using varying thresholds. Finally, with the help of a clinician, we conduct case studies to explain the results of large uncertainties and incorrect predictions with small uncertainties. The results show that correct predictions are more likely to have smaller uncertainties, and the performance on accepted predictions improves as the accepting ratio decreases (i.e. more rejections). Case studies also help explain why rejection can improve the performance. Our study helps neural networks produce more accurate results and provide information on uncertainties to better assist clinicians in the diagnosis process. It can also enable deep-learning-based ECG interpretation in clinical implementation.

Submitted 25 February, 2022; originally announced March 2022.

arXiv:2112.01697 [pdf, other]

LMR-CBT: Learning Modality-fused Representations with CB-Transformer for Multimodal Emotion Recognition from Unaligned Multimodal Sequences

Authors: Ziwang Fu, Feng Liu, Hanyang Wang, Siyuan Shen, Jiahao Zhang, Jiayin Qi, Xiangling Fu, Aimin Zhou

Abstract: Learning modality-fused representations and processing unaligned multimodal sequences are meaningful and challenging in multimodal emotion recognition. Existing approaches use directional pairwise attention or a message hub to fuse language, visual, and audio modalities. However, those approaches introduce information redundancy when fusing features and are inefficient without considering the complementarity of modalities. In this paper, we propose an efficient neural network to learn modality-fused representations with CB-Transformer (LMR-CBT) for multimodal emotion recognition from unaligned multimodal sequences. Specifically, we first perform feature extraction for the three modalities respectively to obtain the local structure of the sequences. Then, we design a novel transformer with cross-modal blocks (CB-Transformer) that enables complementary learning of different modalities, mainly divided into local temporal learning, cross-modal feature fusion and global self-attention representations. In addition, we splice the fused features with the original features to classify the emotions of the sequences. Finally, we conduct word-aligned and unaligned experiments on three challenging datasets, IEMOCAP, CMU-MOSI, and CMU-MOSEI. The experimental results show the superiority and efficiency of our proposed method in both settings. Compared with the mainstream methods, our approach reaches the state-of-the-art with a minimum number of parameters.

Submitted 2 December, 2021; originally announced December 2021.

Comments: 9 pages, Figure 2, Table 5

arXiv:2111.09103 [pdf, other]

Fast and Light-Weight Network for Single Frame Structured Illumination Microscopy Super-Resolution

Authors: Xi Cheng, Jun Li, Qiang Dai, Zhenyong Fu, Jian Yang

Abstract: Structured illumination microscopy (SIM) is an important super-resolution based microscopy technique that breaks the diffraction limit and enhances optical microscopy systems. With the development of biology and medical engineering, there is a high demand for real-time and robust SIM imaging under extreme low light and short exposure environments. Existing SIM techniques typically require multiple structured illumination frames to produce a high-resolution image. In this paper, we propose a single-frame structured illumination microscopy (SF-SIM) based on deep learning. Our SF-SIM only needs one shot of a structured illumination frame and generates similar results compared with the traditional SIM systems that typically require 15 shots. In our SF-SIM, we propose a noise estimator which can effectively suppress the noise in the image and enable our method to work under the low light and short exposure environment, without the need for stacking multiple frames for non-local denoising. We also design a bandpass attention module that makes our deep network more sensitive to the change of frequency and enhances the imaging quality. Our proposed SF-SIM is almost 14 times faster than traditional SIM methods when achieving similar results. Therefore, our method is significantly valuable for the development of microbiology and medicine.

Submitted 17 November, 2021; originally announced November 2021.

Comments: 9 pages

arXiv:2109.15262 [pdf]

doi 10.1007/978-3-030-68222-4_7

Non-Hermitian physics and engineering in silicon photonics

Authors: Changqing Wang, Zhoutian Fu, Lan Yang

Abstract: Silicon photonics has been studied as an integratable optical platform where numerous applicable devices and systems are created based on modern physics and state-of-the-art nanotechnologies. The implementation of quantum mechanics has been the driving force of the most intriguing design of photonic structures, since the optical systems are found of great capability and potential in realizing the analogues of quantum concepts and phenomena. Non-Hermitian physics, which breaks the conventional scope of quantum mechanics based on Hermitian Hamiltonian, has been widely explored in the platform of silicon photonics, with promising design of optical refractive index, modal coupling and gain-loss distribution. As we will discuss in this chapter, the unconventional properties of exceptional points and parity-time symmetry realized in silicon photonics have created new opportunities for ultrasensitive sensors, laser engineering, control of light propagation, topological mode conversion, etc. The marriage between the quantum non-Hermiticity and classical silicon platforms not only spurs numerous studies on the fundamental physics, but also enriches the potential functionalities of the integrated photonic systems.

Submitted 30 September, 2021; originally announced September 2021.

Comments: 30 pages, 12 figures, 225 references. Link to the published version: https://link.springer.com/chapter/10.1007%2F978-3-030-68222-4_7

Journal ref: Wang C., Fu Z., Yang L. (2021) Non-Hermitian Physics and Engineering in Silicon Photonics. In: Lockwood D.J., Pavesi L. (eds) Silicon Photonics IV. Topics in Applied Physics, vol 139. Springer, Cham

arXiv:2109.14671 [pdf, other]

Segmentation of Roads in Satellite Images using specially modified U-Net CNNs

Authors: Jonas Bokstaller, Yihang She, Zhehan Fu, Tommaso Macrì

Abstract: The image classification problem has been deeply investigated by the research community, with computer vision algorithms and with the help of Neural Networks. The aim of this paper is to build an image classifier for satellite images of urban scenes that identifies the portions of the images in which a road is located, separating these portions from the rest. Unlike conventional computer vision algorithms, convolutional neural networks (CNNs) provide accurate and reliable results on this task. Our novel approach uses a sliding window to extract patches out of the whole image, data augmentation for generating more training/testing data and lastly a series of specially modified U-Net CNNs. This proposed technique outperforms all other baselines tested in terms of mean F-score metric.

Submitted 29 September, 2021; originally announced September 2021.

Comments: 4 pages, 4 figures

arXiv:2109.05485 [pdf, other]

Facial Anatomical Landmark Detection using Regularized Transfer Learning with Application to Fetal Alcohol Syndrome Recognition

Authors: Zeyu Fu, Jianbo Jiao, Michael Suttie, J. Alison Noble

Abstract: Fetal alcohol syndrome (FAS) caused by prenatal alcohol exposure can result in a series of cranio-facial anomalies, and behavioral and neurocognitive problems. Current diagnosis of FAS is typically done by identifying a set of facial characteristics, which are often obtained by manual examination. Anatomical landmark detection, which provides rich geometric information, is important to detect the presence of FAS associated facial anomalies. This imaging application is characterized by large variations in data appearance and limited availability of labeled data. Current deep learning-based heatmap regression methods designed for facial landmark detection in natural images assume availability of large datasets and are therefore not well-suited for this application. To address this restriction, we develop a new regularized transfer learning approach that exploits the knowledge of a network learned on large facial recognition datasets. In contrast to standard transfer learning which focuses on adjusting the pre-trained weights, the proposed learning approach regularizes the model behavior. It explicitly reuses the rich visual semantics of a domain-similar source model on the target task data as an additional supervisory signal for regularizing landmark detection optimization. Specifically, we develop four regularization constraints for the proposed transfer learning, including constraining the feature outputs from classification and intermediate layers, as well as matching activation attention maps in both spatial and channel levels. Experimental evaluation on a collected clinical imaging dataset demonstrates that the proposed approach can effectively improve model generalizability under limited training samples, and is advantageous to other approaches in the literature.

Submitted 12 September, 2021; originally announced September 2021.

Comments: To appear in IEEE journal of Biomedical and Health Informatics 2021

arXiv:2108.06652 [pdf, other]

Force-feedback based Whole-body Stabilizer for Position-Controlled Humanoid Robots

Authors: Shunpeng Yang, Hua Chen, Zhen Fu, Wei Zhang

Abstract: This paper studies stabilizer design for position-controlled humanoid robots. Stabilizers are an essential part for position-controlled humanoids, whose primary objective is to adjust the control input sent to the robot to assist the tracking controller to better follow the planned reference trajectory. To achieve this goal, this paper develops a novel force-feedback based whole-body stabilizer that fully exploits the six-dimensional force measurement information and the whole-body dynamics to improve tracking performance. Relying on rigorous analysis of whole-body dynamics of position-controlled humanoids under unknown contact, the developed stabilizer leverages quadratic-programming based technique that allows cooperative consideration of both the center-of-mass tracking and contact force tracking. The effectiveness of the proposed stabilizer is demonstrated on the UBTECH Walker robot in the MuJoCo simulator. Simulation validations show a significant improvement in various scenarios as compared to commonly adopted stabilizers based on the zero-moment-point feedback and the linear inverted pendulum model.

Submitted 14 August, 2021; originally announced August 2021.

Comments: IROS 2021, 8 pages

arXiv:2105.08629 [pdf, other]

Fast Camera Image Denoising on Mobile GPUs with Deep Learning, Mobile AI 2021 Challenge: Report

Authors: Andrey Ignatov, Kim Byeoung-su, Radu Timofte, Angeline Pouget, Fenglong Song, Cheng Li, Shuai Xiao, Zhongqian Fu, Matteo Maggioni, Yibin Huang, Shen Cheng, Xin Lu, Yifeng Zhou, Liangyu Chen, Donghao Liu, Xiangyu Zhang, Haoqiang Fan, Jian Sun, Shuaicheng Liu, Minsu Kwon, Myungje Lee, Jaeyoon Yoo, Changbeom Kang, Shinjo Wang, Bin Huang, et al. (7 additional authors not shown)

Abstract: Image denoising is one of the most critical problems in mobile photo processing. While many solutions have been proposed for this task, they are usually working with synthetic data and are too computationally expensive to run on mobile devices. To address this problem, we introduce the first Mobile AI challenge, where the target is to develop an end-to-end deep learning-based image denoising solution that can demonstrate high efficiency on smartphone GPUs. For this, the participants were provided with a novel large-scale dataset consisting of noisy-clean image pairs captured in the wild. The runtime of all models was evaluated on the Samsung Exynos 2100 chipset with a powerful Mali GPU capable of accelerating floating-point and quantized neural networks. The proposed solutions are fully compatible with any mobile GPU and are capable of processing 480p resolution images under 40-80 ms while achieving high fidelity results. A detailed description of all models developed in the challenge is provided in this paper.

Submitted 17 May, 2021; originally announced May 2021.

Comments: Mobile AI 2021 Workshop and Challenges: https://ai-benchmark.com/workshops/mai/2021/. arXiv admin note: substantial text overlap with arXiv:2105.07809, arXiv:2105.07825

arXiv:2105.01128 [pdf, other]

Fusing multimodal neuroimaging data with a variational autoencoder

Authors: Eloy Geenjaar, Noah Lewis, Zening Fu, Rohan Venkatdas, Sergey Plis, Vince Calhoun

Abstract: Neuroimaging studies often involve the collection of multiple data modalities. These modalities contain both shared and mutually exclusive information about the brain. This work aims at finding a scalable and interpretable method to fuse the information of multiple neuroimaging modalities using a variational autoencoder (VAE). To provide an initial assessment, this work evaluates the representations that are learned using a schizophrenia classification task. A support vector machine trained on the representations achieves an area under the curve for the classifier's receiver operating characteristic (ROC-AUC) of 0.8610.

Submitted 3 May, 2021; originally announced May 2021.

arXiv:2104.10781 [pdf, other]

NTIRE 2021 Challenge on Quality Enhancement of Compressed Video: Methods and Results

Authors: Ren Yang, Radu Timofte, Jing Liu, Yi Xu, Xinjian Zhang, Minyi Zhao, Shuigeng Zhou, Kelvin C. K. Chan, Shangchen Zhou, Xiangyu Xu, Chen Change Loy, Xin Li, Fanglong Liu, He Zheng, Lielin Jiang, Qi Zhang, Dongliang He, Fu Li, Qingqing Dang, Yibin Huang, Matteo Maggioni, Zhongqian Fu, Shuai Xiao, Cheng li, Thomas Tanay, et al. (47 additional authors not shown)

Abstract: This paper reviews the first NTIRE challenge on quality enhancement of compressed video, with a focus on the proposed methods and results. In this challenge, the new Large-scale Diverse Video (LDV) dataset is employed. The challenge has three tracks. Tracks 1 and 2 aim at enhancing the videos compressed by HEVC at a fixed QP, while Track 3 is designed for enhancing the videos compressed by x265 at a fixed bit-rate. Besides, the quality enhancement of Tracks 1 and 3 targets at improving the fidelity (PSNR), and Track 2 targets at enhancing the perceptual quality. The three tracks totally attract 482 registrations. In the test phase, 12 teams, 8 teams and 11 teams submitted the final results of Tracks 1, 2 and 3, respectively. The proposed methods and solutions gauge the state-of-the-art of video quality enhancement. The homepage of the challenge: https://github.com/RenYang-home/NTIRE21_VEnh

Submitted 31 August, 2022; v1 submitted 21 April, 2021; originally announced April 2021.

Comments: Corrected the MOS values in Table 2, and corrected some minor typos

arXiv:2103.05407 [pdf, ps, other]

doi 10.1109/CVPR46437.2021.00347

Efficient Multi-Stage Video Denoising with Recurrent Spatio-Temporal Fusion

Authors: Matteo Maggioni, Yibin Huang, Cheng Li, Shuai Xiao, Zhongqian Fu, Fenglong Song

Abstract: In recent years, denoising methods based on deep learning have achieved unparalleled performance at the cost of large computational complexity. In this work, we propose an Efficient Multi-stage Video Denoising algorithm, called EMVD, to drastically reduce the complexity while maintaining or even improving the performance. First, a fusion stage reduces the noise through a recursive combination of all past frames in the video. Then, a denoising stage removes the noise in the fused frame. Finally, a refinement stage restores the missing high frequency in the denoised frame. All stages operate on a transform-domain representation obtained by learnable and invertible linear operators which simultaneously increase accuracy and decrease complexity of the model. A single loss on the final output is sufficient for successful convergence, hence making EMVD easy to train. Experiments on real raw data demonstrate that EMVD outperforms the state of the art when complexity is constrained, and even remains competitive against methods whose complexities are several orders of magnitude higher. Further, the low complexity and memory requirements of EMVD enable real-time video denoising on commercial SoC in mobile devices.

Submitted 30 March, 2023; v1 submitted 9 March, 2021; originally announced March 2021.

Journal ref: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 3465-3474

arXiv:2103.02186 [pdf]

Eye-gaze Estimation with HEOG and Neck EMG using Deep Neural Networks

Authors: Zhen Fu, Bo Wang, Fei Chen, Xihong Wu, Jing Chen

Abstract: Hearing-impaired listeners usually have troubles attending target talker in multi-talker scenes, even with hearing aids (HAs). The problem can be solved with eye-gaze steering HAs, which requires listeners eye-gazing on the target. In a situation where head rotates, eye-gaze is subject to both behaviors of saccade and head rotation. However, existing methods of eye-gaze estimation did not work reliably, since the listener's strategy of eye-gaze varies and measurements of the two behaviors were not properly combined. Besides, existing methods were based on hand-craft features, which could overlook some important information. In this paper, a head-fixed and a head-free experiments were conducted. We used horizontal electrooculography (HEOG) and neck electromyography (NEMG), which separately measured saccade and head rotation to commonly estimate eye-gaze. Besides traditional classifier and hand-craft features, deep neural networks (DNN) were introduced to automatically extract features from intact waveforms. Evaluation results showed that when the input was HEOG with inertial measurement unit, the best performance of our proposed DNN classifiers achieved 93.3%; and when HEOG was with NEMG together, the accuracy reached 72.6%, higher than that with HEOG (about 71.0%) or NEMG (about 35.7%) alone. These results indicated the feasibility to estimate eye-gaze with HEOG and NEMG.

Submitted 3 March, 2021; originally announced March 2021.

Comments: 5 pages, 5 figures, submitted to EUSIPCO 2021

arXiv:2103.02183 [pdf]

Auditory Attention Decoding from EEG using Convolutional Recurrent Neural Network

Authors: Zhen Fu, Bo Wang, Xihong Wu, Jing Chen

Abstract: The auditory attention decoding (AAD) approach was proposed to determine the identity of the attended talker in a multi-talker scenario by analyzing electroencephalography (EEG) data. Although the linear model-based method has been widely used in AAD, the linear assumption was considered oversimplified and the decoding accuracy remained lower for shorter decoding windows. Recently, nonlinear models based on deep neural networks (DNN) have been proposed to solve this problem. However, these models did not fully utilize both the spatial and temporal features of EEG, and the interpretability of DNN models was rarely investigated. In this paper, we proposed novel convolutional recurrent neural network (CRNN) based regression model and classification model, and compared them with both the linear model and the state-of-the-art DNN models. Results showed that, our proposed CRNN-based classification model outperformed others for shorter decoding windows (around 90% for 2 s and 5 s). Although worse than classification models, the decoding accuracy of the proposed CRNN-based regression model was about 5% greater than other regression models. The interpretability of DNN models was also investigated by visualizing layers' weight.

Submitted 3 March, 2021; originally announced March 2021.

Comments: 5 pages, 4 figures, submitted to EUSIPCO 2021

arXiv:2102.00676 [pdf, other]

Underwater Image Enhancement via Learning Water Type Desensitized Representations

Authors: Zhenqi Fu, Xiaopeng Lin, Wu Wang, Yue Huang, Xinghao Ding

Abstract: We present a novel underwater image enhancement method termed SCNet to improve the image quality meanwhile cope with the degradation diversity caused by the water. SCNet is based on normalization schemes across both spatial and channel dimensions with the key idea of learning water type desensitized features. Specifically, we apply whitening to de-correlate activations across spatial dimensions for each instance in a mini-batch. We also eliminate channel-wise correlation by standardizing and re-injecting the first two moments of the activations across channels. The normalization schemes of spatial and channel dimensions are performed at each scale of the U-Net to obtain multi-scale representations. With such water type irrelevant encodings, the decoder can easily reconstruct the clean signal and be unaffected by the distortion types. Experimental results on two real-world underwater image datasets show that our approach can successfully enhance images with diverse water types, and achieves competitive performance in visual quality improvement.

Submitted 14 March, 2022; v1 submitted 1 February, 2021; originally announced February 2021.

arXiv:2012.03673 [pdf, other]

Efficient Medical Image Segmentation with Intermediate Supervision Mechanism

Authors: Di Yuan, Junyang Chen, Zhenghua Xu, Thomas Lukasiewicz, Zhigang Fu, Guizhi Xu

Abstract: Because the expansion path of U-Net may ignore the characteristics of small targets, intermediate supervision mechanism is proposed. The original mask is also entered into the network as a label for intermediate output. However, U-Net is mainly engaged in segmentation, and the extracted features are also targeted at segmentation location information, and the input and output are different. The label we need is that the input and output are both original masks, which is more similar to the refactoring process, so we propose another intermediate supervision mechanism. However, the features extracted by the contraction path of this intermediate monitoring mechanism are not necessarily consistent. For example, U-Net's contraction path extracts transverse features, while auto-encoder extracts longitudinal features, which may cause the output of the expansion path to be inconsistent with the label. Therefore, we put forward the intermediate supervision mechanism of shared-weight decoder module. Although the intermediate supervision mechanism improves the segmentation accuracy, the training time is too long due to the extra input and multiple loss functions. For one of these problems, we have introduced tied-weight decoder. To reduce the redundancy of the model, we combine shared-weight decoder module with tied-weight decoder module.

Submitted 15 November, 2020; originally announced December 2020.

arXiv:2011.00940 [pdf, other]

Deep Learning in Computer-Aided Diagnosis and Treatment of Tumors: A Survey

Authors: Dan Zhao, Guizhi Xu, Zhenghua XU, Thomas Lukasiewicz, Minmin Xue, Zhigang Fu

Abstract: Computer-Aided Diagnosis and Treatment of Tumors is a hot topic of deep learning in recent years, which constitutes a series of medical tasks, such as detection of tumor markers, the outline of tumor leisures, subtypes and stages of tumors, prediction of therapeutic effect, and drug development. Meanwhile, there are some deep learning models with precise positioning and excellent performance produced in mainstream task scenarios. Thus follow to introduce deep learning methods from task-orient, mainly focus on the improvements for medical tasks. Then to summarize the recent progress in four stages of tumor diagnosis and treatment, which named In-Vitro Diagnosis (IVD), Imaging Diagnosis (ID), Pathological Diagnosis (PD), and Treatment Planning (TP). According to the specific data types and medical tasks of each stage, we present the applications of deep learning in the Computer-Aided Diagnosis and Treatment of Tumors and analyzing the excellent works therein. This survey concludes by discussing research issues and suggesting challenges for future improvement.

Submitted 2 November, 2020; originally announced November 2020.

arXiv:2010.09776 [pdf, other]

SMARTS: Scalable Multi-Agent Reinforcement Learning Training School for Autonomous Driving

Authors: Ming Zhou, Jun Luo, Julian Villella, Yaodong Yang, David Rusu, Jiayu Miao, Weinan Zhang, Montgomery Alban, Iman Fadakar, Zheng Chen, Aurora Chongxi Huang, Ying Wen, Kimia Hassanzadeh, Daniel Graves, Dong Chen, Zhengbang Zhu, Nhat Nguyen, Mohamed Elsayed, Kun Shao, Sanjeevan Ahilan, Baokuan Zhang, Jiannan Wu, Zhengang Fu, Kasra Rezaee, Peyman Yadmellat, et al. (12 additional authors not shown)

Abstract: Multi-agent interaction is a fundamental aspect of autonomous driving in the real world. Despite more than a decade of research and development, the problem of how to competently interact with diverse road users in diverse scenarios remains largely unsolved. Learning methods have much to offer towards solving this problem. But they require a realistic multi-agent simulator that generates diverse and competent driving interactions. To meet this need, we develop a dedicated simulation platform called SMARTS (Scalable Multi-Agent RL Training School). SMARTS supports the training, accumulation, and use of diverse behavior models of road users. These are in turn used to create increasingly more realistic and diverse interactions that enable deeper and broader research on multi-agent interaction. In this paper, we describe the design goals of SMARTS, explain its basic architecture and its key features, and illustrate its use through concrete multi-agent experiments on interactive scenarios. We open-source the SMARTS platform and the associated benchmark tasks and evaluation metrics to encourage and empower research on multi-agent learning for autonomous driving. Our code is available at https://github.com/huawei-noah/SMARTS.

Submitted 31 October, 2020; v1 submitted 19 October, 2020; originally announced October 2020.

Comments: 20 pages, 11 figures. Paper accepted to CoRL 2020

arXiv:2010.08942 [pdf, other]

doi 10.1109/LSP.2021.3050712

Distortion-aware Monocular Depth Estimation for Omnidirectional Images

Authors: Hong-Xiang Chen, Kunhong Li, Zhiheng Fu, Mengyi Liu, Zonghao Chen, Yulan Guo

Abstract: A main challenge for tasks on panorama lies in the distortion of objects among images. In this work, we propose a Distortion-Aware Monocular Omnidirectional (DAMO) dense depth estimation network to address this challenge on indoor panoramas with two steps. First, we introduce a distortion-aware module to extract calibrated semantic features from omnidirectional images. Specifically, we exploit deformable convolution to adjust its sampling grids to geometric variations of distorted objects on panoramas and then utilize a strip pooling module to sample against horizontal distortion introduced by inverse gnomonic projection. Second, we further introduce a plug-and-play spherical-aware weight matrix for our objective function to handle the uneven distribution of areas projected from a sphere. Experiments on the 360D dataset show that the proposed method can effectively extract semantic features from distorted panoramas and alleviate the supervision bias caused by distortion. It achieves state-of-the-art performance on the 360D dataset with high efficiency.

Submitted 29 November, 2020; v1 submitted 18 October, 2020; originally announced October 2020.

Comments: Preprint

arXiv:2009.13635 [pdf, other]

Cross-Task Representation Learning for Anatomical Landmark Detection

Authors: Zeyu Fu, Jianbo Jiao, Michael Suttie, J. Alison Noble

Abstract: Recently, there is an increasing demand for automatically detecting anatomical landmarks which provide rich structural information to facilitate subsequent medical image analysis. Current methods related to this task often leverage the power of deep neural networks, while a major challenge in fine tuning such models in medical applications arises from insufficient number of labeled samples. To address this, we propose to regularize the knowledge transfer across source and target tasks through cross-task representation learning. The proposed method is demonstrated for extracting facial anatomical landmarks which facilitate the diagnosis of fetal alcohol syndrome. The source and target tasks in this work are face recognition and landmark detection, respectively. The main idea of the proposed method is to retain the feature representations of the source model on the target task data, and to leverage them as an additional source of supervisory signals for regularizing the target model learning, thereby improving its performance under limited training samples. Concretely, we present two approaches for the proposed representation learning by constraining either final or intermediate model features on the target model. Experimental results on a clinical face image dataset demonstrate that the proposed approach works well with few labeled data, and outperforms other compared approaches.

Submitted 28 September, 2020; originally announced September 2020.

Comments: MICCAI-MLMI 2020

arXiv:2009.13634 [pdf, other]

MPG-Net: Multi-Prediction Guided Network for Segmentation of Retinal Layers in OCT Images

Authors: Zeyu Fu, Yang Sun, Xiangyu Zhang, Scott Stainton, Shaun Barney, Jeffry Hogg, William Innes, Satnam Dlay

Abstract: Optical coherence tomography (OCT) is a commonly-used method of extracting high resolution retinal information. Moreover there is an increasing demand for the automated retinal layer segmentation which facilitates the retinal disease diagnosis. In this paper, we propose a novel multiprediction guided attention network (MPG-Net) for automated retinal layer segmentation in OCT images. The proposed method consists of two major steps to strengthen the discriminative power of a U-shape Fully convolutional network (FCN) for reliable automated segmentation. Firstly, the feature refinement module which adaptively re-weights the feature channels is exploited in the encoder to capture more informative features and discard information in irrelevant regions. Furthermore, we propose a multi-prediction guided attention mechanism which provides pixel-wise semantic prediction guidance to better recover the segmentation mask at each scale. This mechanism which transforms the deep supervision to supervised attention is able to guide feature aggregation with more semantic information between intermediate layers. Experiments on the publicly available Duke OCT dataset confirm the effectiveness of the proposed method as well as an improved performance over other state-of-the-art approaches.

Submitted 28 September, 2020; originally announced September 2020.

Comments: EUSIPCO2020

arXiv:2007.13135 [pdf, other]

Contrastive Visual-Linguistic Pretraining

Authors: Lei Shi, Kai Shuang, Shijie Geng, Peng Su, Zhengkai Jiang, Peng Gao, Zuohui Fu, Gerard de Melo, Sen Su

Abstract: Several multi-modality representation learning approaches such as LXMERT and ViLBERT have been proposed recently. Such approaches can achieve superior performance due to the high-level semantic information captured during large-scale multimodal pretraining. However, as ViLBERT and LXMERT adopt visual region regression and classification loss, they often suffer from domain gap and noisy label problems, based on the visual features having been pretrained on the Visual Genome dataset. To overcome these issues, we propose unbiased Contrastive Visual-Linguistic Pretraining (CVLP), which constructs a visual self-supervised loss built upon contrastive learning. We evaluate CVLP on several down-stream tasks, including VQA, GQA and NLVR2 to validate the superiority of contrastive learning on multi-modality representation learning. Our code is available at: https://github.com/ArcherYunDong/CVLP-.

Submitted 26 July, 2020; originally announced July 2020.

arXiv:2007.02165 [pdf, other]

CardioLearn: A Cloud Deep Learning Service for Cardiac Disease Detection from Electrocardiogram

Authors: Shenda Hong, Zhaoji Fu, Rongbo Zhou, Jie Yu, Yongkui Li, Kai Wang, Guanlin Cheng

Abstract: Electrocardiogram (ECG) is one of the most convenient and non-invasive tools for monitoring peoples' heart condition, which can use for diagnosing a wide range of heart diseases, including Cardiac Arrhythmia, Acute Coronary Syndrome, et al. However, traditional ECG disease detection models show substantial rates of misdiagnosis due to the limitations of the abilities of extracted features. Recent deep learning methods have shown significant advantages, but they do not provide publicly available services for those who have no training data or computational resources. In this paper, we demonstrate our work on building, training, and serving such out-of-the-box cloud deep learning service for cardiac disease detection from ECG named CardioLearn. The analytic ability of any other ECG recording devices can be enhanced by connecting to the Internet and invoke our open API. As a practical example, we also design a portable smart hardware device along with an interactive mobile program, which can collect ECG and detect potential cardiac diseases anytime and anywhere.

Submitted 4 July, 2020; originally announced July 2020.

Comments: WWW 2020 Demo

arXiv:2006.08939 [pdf, other]

Learning the Redundancy-free Features for Generalized Zero-Shot Object Recognition

Authors: Zongyan Han, Zhenyong Fu, Jian Yang

Abstract: Zero-shot object recognition or zero-shot learning aims to transfer the object recognition ability among the semantically related categories, such as fine-grained animal or bird species. However, the images of different fine-grained objects tend to merely exhibit subtle differences in appearance, which will severely deteriorate zero-shot object recognition. To reduce the superfluous information in the fine-grained objects, in this paper, we propose to learn the redundancy-free features for generalized zero-shot learning. We achieve our motivation by projecting the original visual features into a new (redundancy-free) feature space and then restricting the statistical dependence between these two feature spaces. Furthermore, we require the projected features to keep and even strengthen the category relationship in the redundancy-free feature space. In this way, we can remove the redundant information from the visual features without losing the discriminative information. We extensively evaluate the performance on four benchmark datasets. The results show that our redundancy-free feature based generalized zero-shot learning (RFF-GZSL) approach can achieve competitive results compared with the state-of-the-arts.

Submitted 23 May, 2021; v1 submitted 16 June, 2020; originally announced June 2020.

Comments: Some researchers and we have found KNN results in 1st version are incorrect, due to a careless mistake in the code. Concretely, the parameters for accuracy function of KNN were organized in the wrong order by mistake. The softmax results are correct. We have removed all KNN results and remove the SOTA claims. According to the Program Chairs' suggestion, we have made errata request to CVF and IEEE

arXiv:2005.08646 [pdf, other]

Character Matters: Video Story Understanding with Character-Aware Relations

Authors: Shijie Geng, Ji Zhang, Zuohui Fu, Peng Gao, Hang Zhang, Gerard de Melo

Abstract: Different from short videos and GIFs, video stories contain clear plots and lists of principal characters. Without identifying the connection between appearing people and character names, a model is not able to obtain a genuine understanding of the plots. Video Story Question Answering (VSQA) offers an effective way to benchmark higher-level comprehension abilities of a model. However, current VSQA methods merely extract generic visual features from a scene. With such an approach, they remain prone to learning just superficial correlations. In order to attain a genuine understanding of who did what to whom, we propose a novel model that continuously refines character-aware relations. This model specifically considers the characters in a video story, as well as the relations connecting different characters and objects. Based on these signals, our framework enables weakly-supervised face naming through multi-instance co-occurrence matching and supports high-level reasoning utilizing Transformer structures. We train and test our model on the six diverse TV shows in the TVQA dataset, which is by far the largest and only publicly available dataset for VSQA. We validate our proposed approach over TVQA dataset through extensive ablation study.

Submitted 9 May, 2020; originally announced May 2020.

arXiv:1911.06813 [pdf, ps, other]

Transfer Learning of fMRI Dynamics

Authors: Usman Mahmood, Md Mahfuzur Rahman, Alex Fedorov, Zening Fu, Sergey Plis

Abstract: As a mental disorder progresses, it may affect brain structure, but brain function expressed in brain dynamics is affected much earlier. Capturing the moment when brain dynamics express the disorder is crucial for early diagnosis. The traditional approach to this problem via training classifiers either proceeds from handcrafted features or requires large datasets to combat the $m>>n$ problem when a high dimensional fMRI volume only has a single label that carries learning signal. Large datasets may not be available for a study of each disorder, or rare disorder types or sub-populations may not warrant for them. In this paper, we demonstrate a self-supervised pre-training method that enables us to pre-train directly on fMRI dynamics of healthy control subjects and transfer the learning to much smaller datasets of schizophrenia. Not only we enable classification of disorder directly based on fMRI dynamics in small data but also significantly speed up the learning when possible. This is encouraging evidence of informative transfer learning across datasets and diagnostic categories.

Submitted 16 November, 2019; originally announced November 2019.

Comments: Machine Learning for Health (ML4H) at NeurIPS 2019 - Extended Abstract

arXiv:1911.03461 [pdf, other]

AIM 2019 Challenge on Image Demoireing: Methods and Results

Authors: Shanxin Yuan, Radu Timofte, Gregory Slabaugh, Ales Leonardis, Bolun Zheng, Xin Ye, Xiang Tian, Yaowu Chen, Xi Cheng, Zhenyong Fu, Jian Yang, Ming Hong, Wenying Lin, Wenjin Yang, Yanyun Qu, Hong-Kyu Shin, Joon-Yeon Kim, Sung-Jea Ko, Hang Dong, Yu Guo, Jie Wang, Xuan Ding, Zongyan Han, Sourya Dipta Das, Kuldeep Purohit, et al. (3 additional authors not shown)

Abstract: This paper reviews the first-ever image demoireing challenge that was part of the Advances in Image Manipulation (AIM) workshop, held in conjunction with ICCV 2019. This paper describes the challenge, and focuses on the proposed solutions and their results. Demoireing is a difficult task of removing moire patterns from an image to reveal an underlying clean image. A new dataset, called LCDMoire, was created for this challenge, and consists of 10,200 synthetically generated image pairs (moire and clean ground truth). The challenge was divided into 2 tracks. Track 1 targeted fidelity, measuring the ability of demoire methods to obtain a moire-free image compared with the ground truth, while Track 2 examined the perceptual quality of demoire methods. The tracks had 60 and 39 registered participants, respectively. A total of eight teams competed in the final testing phase. The entries span the current state-of-the-art in the image demoireing problem.

Submitted 8 November, 2019; originally announced November 2019.

Comments: arXiv admin note: text overlap with arXiv:1911.02498

arXiv:1910.06761 [pdf, other]

Causal Mechanism Transfer Network for Time Series Domain Adaptation in Mechanical Systems

Authors: Zijian Li, Ruichu Cai, Kok Soon Chai, Hong Wei Ng, Hoang Dung Vu, Marianne Winslett, Tom Z. J. Fu, Boyan Xu, Xiaoyan Yang, Zhenjie Zhang

Abstract: Data-driven models are becoming essential parts in modern mechanical systems, commonly used to capture the behavior of various equipment and varying environmental characteristics. Despite the advantages of these data-driven models on excellent adaptivity to high dynamics and aging equipment, they are usually hungry for massive labels over historical data, mostly contributed by human engineers at an extremely high cost. The label demand is now the major limiting factor to modeling accuracy, hindering the fulfillment of visions for applications. Fortunately, domain adaptation enhances the model generalization by utilizing the labelled source data as well as the unlabelled target data and then we can reuse the model on different domains. However, the mainstream domain adaptation methods cannot achieve ideal performance on time series data, because most of them focus on static samples and even the existing time series domain adaptation methods ignore the properties of time series data, such as temporal causal mechanism. In this paper, we assume that causal mechanism is invariant and present our Causal Mechanism Transfer Network (CMTN) for time series domain adaptation. By capturing and transferring the dynamic and temporal causal mechanism of multivariate time series data and alleviating the time lags and different value ranges among different machines, CMTN allows the data-driven models to exploit existing data and labels from similar systems, such that the resulting model on a new system is highly reliable even with very limited data. We report our empirical results and lessons learned from two real-world case studies, on chiller plant energy optimization and boiler fault detection, which outperforms the existing state-of-the-art method.

Submitted 13 October, 2019; originally announced October 2019.

arXiv:1811.08064 [pdf, other]

doi 10.1109/ICCAD.2017.8203885

Model and Integrate Medical Resource Availability into Verifiably Correct Executable Medical Guidelines - Technical Report

Authors: Chunhui Guo, Zhicheng Fu, Zhenyu Zhang, Shangping Ren, Lui Sha

Abstract: Improving effectiveness and safety of patient care is an ultimate objective for medical cyber-physical systems. A recent study shows that the patients' death rate can be reduced by computerizing medical guidelines. Most existing medical guideline models are validated and/or verified based on the assumption that all necessary medical resources needed for a patient care are always available. However, the reality is that some medical resources, such as special medical equipment or medical specialists, can be temporarily unavailable for an individual patient. In such cases, safety properties validated and/or verified in existing medical guideline models without considering medical resource availability may not hold any more. The paper argues that considering medical resource availability is essential in building verifiably correct executable medical guidelines. We present an approach to explicitly and separately model medical resource availability and automatically integrate resource availability models into an existing statechart-based computerized medical guideline model. This approach requires minimal change in existing medical guideline models to take medical resource availability into consideration in validating and verifying medical guideline models. A simplified stroke scenario is used as a case study to investigate the effectiveness and validity of our approach.

Submitted 19 November, 2018; originally announced November 2018.

Comments: full version, 8 pages. arXiv admin note: substantial text overlap with arXiv:1811.08061

Journal ref: IEEE/ACM 36th International Conference on Computer-Aided Design (ICCAD), 2017

Showing 1–50 of 57 results for author: Fu, Z