Search | arXiv e-print repository

Traffic Control via Connected and Automated Vehicles: An Open-Road Field Experiment with 100 CAVs

Authors: Jonathan W. Lee, Han Wang, Kathy Jang, Amaury Hayat, Matthew Bunting, Arwa Alanqary, William Barbour, Zhe Fu, Xiaoqian Gong, George Gunter, Sharon Hornstein, Abdul Rahman Kreidieh, Nathan Lichtlé, Matthew W. Nice, William A. Richardson, Adit Shah, Eugene Vinitsky, Fangyu Wu, Shengquan Xiang, Sulaiman Almatrudi, Fahd Althukair, Rahul Bhadani, Joy Carpio, Raphael Chekroun, Eric Cheng, et al. (39 additional authors not shown)

Abstract: The CIRCLES project aims to reduce instabilities in traffic flow, which are naturally occurring phenomena due to human driving behavior. These "phantom jams" or "stop-and-go waves," are a significant source of wasted energy. Toward this goal, the CIRCLES project designed a control system, referred to as the MegaController by the CIRCLES team, that could be deployed in real traffic. Our field experiment leveraged a heterogeneous fleet of 100 longitudinally-controlled vehicles as Lagrangian traffic actuators, each of which ran a controller with the architecture described in this paper. The MegaController is a hierarchical control architecture, which consists of two main layers. The upper layer is called Speed Planner, and is a centralized optimal control algorithm. It assigns speed targets to the vehicles, conveyed through the LTE cellular network. The lower layer is a control layer, running on each vehicle. It performs local actuation by overriding the stock adaptive cruise controller, using the stock on-board sensors. The Speed Planner ingests live data feeds provided by third parties, as well as data from our own control vehicles, and uses both to perform the speed assignment. The architecture of the speed planner allows for modular use of standard control techniques, such as optimal control, model predictive control, kernel methods and others, including Deep RL, model predictive control and explicit controllers. Depending on the vehicle architecture, all onboard sensing data can be accessed by the local controllers, or only some.
Control inputs vary across different automakers, with inputs ranging from torque or acceleration requests for some cars, and electronic selection of ACC set points in others. The proposed architecture allows for the combination of all possible settings proposed above. Most configurations were tested throughout the ramp up to the MegaVanderTest.

Submitted 26 February, 2024; originally announced February 2024.

arXiv:2402.02491 [pdf, other]

VM-UNet: Vision Mamba UNet for Medical Image Segmentation

Authors: Jiacheng Ruan, Suncheng Xiang

Abstract: In the realm of medical image segmentation, both CNN-based and Transformer-based models have been extensively explored. However, CNNs exhibit limitations in long-range modeling capabilities, whereas Transformers are hampered by their quadratic computational complexity. Recently, State Space Models (SSMs), exemplified by Mamba, have emerged as a promising approach. They not only excel in modeling long-range interactions but also maintain a linear computational complexity. In this paper, leveraging state space models, we propose a U-shape architecture model for medical image segmentation, named Vision Mamba UNet (VM-UNet). Specifically, the Visual State Space (VSS) block is introduced as the foundation block to capture extensive contextual information, and an asymmetrical encoder-decoder structure is constructed. We conduct comprehensive experiments on the ISIC17, ISIC18, and Synapse datasets, and the results indicate that VM-UNet performs competitively in medical image segmentation tasks. To our best knowledge, this is the first medical image segmentation model constructed based on the pure SSM-based model. We aim to establish a baseline and provide valuable insights for the future development of more efficient and effective SSM-based segmentation systems. Our code is available at https://github.com/JCruan519/VM-UNet.

Submitted 4 February, 2024; originally announced February 2024.

Comments: 12 pages, 2 figures, 3 tables. Work in progress

arXiv:2401.02099 [pdf]

Oceanship: A Large-Scale Dataset for Underwater Audio Target Recognition

Authors: Zeyu Li, Suncheng Xiang, Tong Yu, Jingsheng Gao, Jiacheng Ruan, Yanping Hu, Ting Liu, Yuzhuo Fu

Abstract: The recognition of underwater audio plays a significant role in identifying a vessel while it is in motion. Underwater target recognition tasks have a wide range of applications in areas such as marine environmental protection, detection of ship radiated noise, underwater noise control, and coastal vessel dispatch. The traditional UATR task involves training a network to extract features from audio data and predict the vessel type. The current UATR dataset exhibits shortcomings in both duration and sample quantity. In this paper, we propose Oceanship, a large-scale and diverse underwater audio dataset. This dataset comprises 15 categories, spans a total duration of 121 hours, and includes comprehensive annotation information such as coordinates, velocity, vessel types, and timestamps. We compiled the dataset by crawling and organizing original communication data from the Ocean Communication Network (ONC) database between 2021 and 2022. While audio retrieval tasks are well-established in general audio classification, they have not been explored in the context of underwater audio recognition. Leveraging the Oceanship dataset, we introduce a baseline model named Oceannet for underwater audio retrieval. This model achieves a recall at 1 (R@1) accuracy of 67.11% and a recall at 5 (R@5) accuracy of 99.13% on the Deepship dataset.

Submitted 10 June, 2024; v1 submitted 4 January, 2024; originally announced January 2024.

Comments: Accepted by ICIC 2024
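
The R@1 and R@5 figures quoted in the abstract above are retrieval metrics. As a rough illustration only, the sketch below computes recall at k under one common definition (a query counts as a hit when an item with the matching label appears among its top-k retrieved items). The function name, the label lists, and the definition itself are assumptions made for this example; the abstract does not state how the authors compute the metric.

```python
def recall_at_k(ranked_labels, query_labels, k):
    """Fraction of queries whose true label appears in the top-k retrieved labels.

    ranked_labels: one list per query, holding the labels of retrieved items,
                   best match first (hypothetical input format).
    query_labels:  the true label of each query.
    """
    hits = sum(
        1
        for ranked, truth in zip(ranked_labels, query_labels)
        if truth in ranked[:k]
    )
    return hits / len(query_labels)


# Toy data: three queries, each with three retrieved vessel-type labels.
ranked = [
    ["cargo", "tug", "tanker"],
    ["tug", "cargo", "ferry"],
    ["ferry", "tanker", "tug"],
]
truth = ["cargo", "cargo", "tug"]

r_at_1 = recall_at_k(ranked, truth, k=1)
r_at_3 = recall_at_k(ranked, truth, k=3)
```

Because the top-k list grows with k, recall at k can only stay the same or increase as k increases, which is consistent with R@5 exceeding R@1 in the reported numbers.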

arXiv:2312.17030 [pdf, other]

Learning Multi-axis Representation in Frequency Domain for Medical Image Segmentation

Authors: Jiacheng Ruan, Jingsheng Gao, Mingye Xie, Suncheng Xiang

Abstract: Recently, Visual Transformer (ViT) has been extensively used in medical image segmentation (MIS) due to applying self-attention mechanism in the spatial domain to modeling global knowledge. However, many studies have focused on improving models in the spatial domain while neglecting the importance of frequency domain information. Therefore, we propose Multi-axis External Weights UNet (MEW-UNet) based on the U-shape architecture by replacing self-attention in ViT with our Multi-axis External Weights block. Specifically, our block performs a Fourier transform on the three axes of the input features and assigns the external weight in the frequency domain, which is generated by our External Weights Generator. Then, an inverse Fourier transform is performed to change the features back to the spatial domain. We evaluate our model on four datasets, including Synapse, ACDC, ISIC17 and ISIC18 datasets, and our approach demonstrates competitive performance, owing to its effective utilization of frequency domain information.

Submitted 28 December, 2023; originally announced December 2023.

Comments: arXiv admin note: text overlap with arXiv:2210.14007

arXiv:2312.06462 [pdf, other]

Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation

Authors: Qi Yang, Xing Nie, Tong Li, Pengfei Gao, Ying Guo, Cheng Zhen, Pengfei Yan, Shiming Xiang

Abstract: Recently, an audio-visual segmentation (AVS) task has been introduced, aiming to group pixels with sounding objects within a given video. This task necessitates a first-ever audio-driven pixel-level understanding of the scene, posing significant challenges. In this paper, we propose an innovative audio-visual transformer framework, termed COMBO, an acronym for COoperation of Multi-order Bilateral relatiOns. For the first time, our framework explores three types of bilateral entanglements within AVS: pixel entanglement, modality entanglement, and temporal entanglement. Regarding pixel entanglement, we employ a Siam-Encoder Module (SEM) that leverages prior knowledge to generate more precise visual features from the foundational model. For modality entanglement, we design a Bilateral-Fusion Module (BFM), enabling COMBO to align corresponding visual and auditory signals bi-directionally. As for temporal entanglement, we introduce an innovative adaptive inter-frame consistency loss according to the inherent rules of temporal. Comprehensive experiments and ablation studies on AVSBench-object (84.7 mIoU on S4, 59.2 mIoU on MS3) and AVSBench-semantic (42.1 mIoU on AVSS) datasets demonstrate that COMBO surpasses previous state-of-the-art methods. Code and more results will be publicly available at https://yannqi.github.io/AVS-COMBO/.

Submitted 7 April, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

Comments: CVPR 2024 Highlight. 13 pages, 10 figures

arXiv:2310.18151 [pdf, other]

Traffic smoothing using explicit local controllers

Authors: Amaury Hayat, Arwa Alanqary, Rahul Bhadani, Christopher Denaro, Ryan J. Weightman, Shengquan Xiang, Jonathan W. Lee, Matthew Bunting, Anish Gollakota, Matthew W. Nice, Derek Gloudemans, Gergely Zachar, Jon F. Davis, Maria Laura Delle Monache, Benjamin Seibold, Alexandre M. Bayen, Jonathan Sprinkle, Daniel B. Work, Benedetto Piccoli

Abstract: The dissipation of stop-and-go waves attracted recent attention as a traffic management problem, which can be efficiently addressed by automated driving. As part of the 100 automated vehicles experiment named MegaVanderTest, feedback controls were used to induce strong dissipation via velocity smoothing. More precisely, a single vehicle driving differently in one of the four lanes of I-24 in the Nashville area was able to regularize the velocity profile by reducing oscillations in time and velocity differences among vehicles. Quantitative measures of this effect were possible due to the innovative I-24 MOTION system capable of monitoring the traffic conditions for all vehicles on the roadway. This paper presents the control design, the technological aspects involved in its deployment, and, finally, the results achieved by the experiment.

Submitted 27 October, 2023; originally announced October 2023.

Comments: 21 pages, 1 table, 9 figures

MSC Class: 93D15; 93D21; 93-05; 34H05; ACM Class: H.2.2

arXiv:2305.05813 [pdf, other]

Change Detection Methods for Remote Sensing in the Last Decade: A Comprehensive Review

Authors: Guangliang Cheng, Yunmeng Huang, Xiangtai Li, Shuchang Lyu, Zhaoyang Xu, Qi Zhao, Shiming Xiang

Abstract: Change detection is an essential and widely utilized task in remote sensing that aims to detect and analyze changes occurring in the same geographical area over time, which has broad applications in urban development, agricultural surveys, and land cover monitoring. Detecting changes in remote sensing images is a complex challenge due to various factors, including variations in image quality, noise, registration errors, illumination changes, complex landscapes, and spatial heterogeneity. In recent years, deep learning has emerged as a powerful tool for feature extraction and addressing these challenges. Its versatility has resulted in its widespread adoption for numerous image-processing tasks. This paper presents a comprehensive survey of significant advancements in change detection for remote sensing images over the past decade. We first introduce some preliminary knowledge for the change detection task, such as problem definition, datasets, evaluation metrics, and transformer basics, as well as provide a detailed taxonomy of existing algorithms from three different perspectives: algorithm granularity, supervision modes, and learning frameworks in the methodology section. This survey enables readers to gain systematic knowledge of change detection tasks from various angles. We then summarize the state-of-the-art performance on several dominant change detection datasets, providing insights into the strengths and limitations of existing algorithms.
Based on our survey, some future research directions for change detection in remote sensing are well identified. This survey paper will shed some light on the community and inspire further research efforts in the change detection task.

Submitted 9 May, 2023; originally announced May 2023.

Comments: 21 pages, 4 figures, 10 tables

arXiv:2211.01784 [pdf, other]

MALUNet: A Multi-Attention and Light-weight UNet for Skin Lesion Segmentation

Authors: Jiacheng Ruan, Suncheng Xiang, Mingye Xie, Ting Liu, Yuzhuo Fu

Abstract: Recently, some pioneering works have preferred applying more complex modules to improve segmentation performances. However, it is not friendly for actual clinical environments due to limited computing resources. To address this challenge, we propose a light-weight model to achieve competitive performances for skin lesion segmentation at the lowest cost of parameters and computational complexity so far. Briefly, we propose four modules: (1) DGA consists of dilated convolution and gated attention mechanisms to extract global and local feature information; (2) IEA, which is based on external attention to characterize the overall datasets and enhance the connection between samples; (3) CAB is composed of 1D convolution and fully connected layers to perform a global and local fusion of multi-stage features to generate attention maps at channel axis; (4) SAB, which operates on multi-stage features by a shared 2D convolution to generate attention maps at spatial axis. We combine four modules with our U-shape architecture and obtain a light-weight medical image segmentation model dubbed as MALUNet. Compared with UNet, our model improves the mIoU and DSC metrics by 2.39% and 1.49%, respectively, with a 44x and 166x reduction in the number of parameters and computational complexity. In addition, we conduct comparison experiments on two skin lesion segmentation datasets (ISIC2017 and ISIC2018). Experimental results show that our model achieves state-of-the-art in balancing the number of parameters, computational complexity and segmentation performances.
Code is available at https://github.com/JCruan519/MALUNet.

Submitted 3 November, 2022; originally announced November 2022.

Comments: 7 pages, 7 figures, 5 tables; This work has been accepted as a regular paper in IEEE BIBM 2022

arXiv:2210.14007 [pdf, other]

MEW-UNet: Multi-axis representation learning in frequency domain for medical image segmentation

Authors: Jiacheng Ruan, Mingye Xie, Suncheng Xiang, Ting Liu, Yuzhuo Fu

Abstract: Recently, Visual Transformer (ViT) has been widely used in various fields of computer vision due to applying self-attention mechanism in the spatial domain to modeling global knowledge. Especially in medical image segmentation (MIS), many works are devoted to combining ViT and CNN, and even some works directly utilize pure ViT-based models. However, recent works improved models in the aspect of spatial domain while ignoring the importance of frequency domain information. Therefore, we propose Multi-axis External Weights UNet (MEW-UNet) for MIS based on the U-shape architecture by replacing self-attention in ViT with our Multi-axis External Weights block. Specifically, our block performs a Fourier transform on the three axes of the input feature and assigns the external weight in the frequency domain, which is generated by our Weights Generator. Then, an inverse Fourier transform is performed to change the features back to the spatial domain. We evaluate our model on four datasets and achieve state-of-the-art performances. In particular, on the Synapse dataset, our method outperforms MT-UNet by 10.15mm in terms of HD95. Code is available at https://github.com/JCruan519/MEW-UNet.

Submitted 25 October, 2022; originally announced October 2022.

Comments: 5 pages, 3 figures, 4 tables
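
The abstract above describes a three-step pattern: Fourier transform along an axis, multiply by a weight in the frequency domain, then inverse transform back to the spatial domain. The NumPy sketch below illustrates only that general pattern for a single axis. It is not the MEW-UNet block: the function name, the real-valued fixed weights, and the (channels, height, width) layout are assumptions for this example, and the abstract does not say how the weights are generated or how the three axis branches are combined.

```python
import numpy as np


def frequency_axis_filter(features, weights, axis):
    """Weight `features` in the frequency domain along one axis.

    features: real-valued array, e.g. shaped (channels, height, width).
    weights:  1-D array with one weight per frequency bin on `axis`
              (a fixed stand-in for learned or generated weights).
    """
    spectrum = np.fft.fft(features, axis=axis)        # to frequency domain
    shape = [1] * features.ndim
    shape[axis] = features.shape[axis]
    weighted = spectrum * weights.reshape(shape)      # per-frequency weighting
    return np.fft.ifft(weighted, axis=axis).real      # back to spatial domain


rng = np.random.default_rng(0)
features = rng.standard_normal((4, 8, 8))             # (channels, height, width)

# All-ones weights leave the features unchanged (up to rounding error).
restored = frequency_axis_filter(features, np.ones(8), axis=1)

# Keeping only the zero-frequency bin replaces each column by its mean.
dc_only = np.zeros(8)
dc_only[0] = 1.0
smoothed = frequency_axis_filter(features, dc_only, axis=1)
```

The two toy weight vectors show why weighting in the frequency domain acts globally: a single multiplication per frequency bin affects every position along the chosen axis at once.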

arXiv:2112.05082 [pdf, other]

doi 10.1109/TAP.2022.3215230

Fast Electromagnetic Validations of Large-Scale Digital Coding Metasurfaces Accelerated by Recurrence Rebuild and Retrieval Method

Authors: Yu Zhao, Shang Xiang, Long Li

Abstract: The recurrence rebuild and retrieval method (R3M) is proposed in this paper to accelerate the electromagnetic (EM) validations of large-scale digital coding metasurfaces (DCMs). R3M aims to accelerate the EM validations of DCMs with varied codebooks, which involves the analysis of a group of similar but not identical structures. The method transforms general DCMs to rigorously periodic arrays by replacing each coding unit with the macro unit, which comprises all possible coding states. The system matrix corresponding to the rigorously periodic array is globally shared for DCMs with arbitrary codebooks via implicit retrieval. The discrepancy of the interactions for edge and corner units is precluded by the basis extension of periodic boundaries. Moreover, the hierarchical pattern exploitation (HPE) algorithm is leveraged to efficiently assemble the system matrix for further acceleration. Due to the full utilization of the rigid periodicity, the computational complexity of R3M-HPE is theoretically lower than that of $\mathcal{H}$-matrix within the same paradigm. Numerical results for two types of DCMs indicate that R3M-HPE is accurate in comparison with commercial software. Besides, R3M-HPE is also compatible with the preconditioning for efficient iterative solutions. The efficiency of R3M-HPE for DCMs outperforms the conventional fast algorithms in both the storage and CPU time cost.

Submitted 20 July, 2022; v1 submitted 4 December, 2021; originally announced December 2021.

Comments: under review

arXiv:2007.10309 [pdf]

Convolutional Image Edge Detection Using Ultrafast Photonic Spiking VCSEL Neurons

Authors: Joshua Robertson, Yahui Zhang, Matej Hejda, Andrew Adair, Julian Bueno, Shuiying Xiang, Antonio Hurtado

Abstract: We report experimentally and in theory on the detection of edge information in digital images using ultrafast spiking optical artificial neurons towards convolutional neural networks (CNNs). In tandem with traditional convolution techniques, a photonic neuron model based on a Vertical-Cavity Surface Emitting Laser (VCSEL) is implemented experimentally to threshold and activate fast spiking responses upon the detection of target edge features in digital images. Edges of different directionalities are detected using individual kernel operators and complete image edge detection is achieved using gradient magnitude. Importantly, the neuromorphic (brain-like) image edge detection system of this work uses commercially sourced VCSELs exhibiting spiking responses at sub-nanosecond rates (many orders of magnitude faster than biological neurons) and operating at the telecom wavelength of 1300 nm; hence making our approach compatible with optical communication and data-center technologies. These results therefore have exciting prospects for ultrafast photonic implementations of neural networks towards computer vision and decision making systems for future artificial intelligence applications.

Submitted 2 July, 2020; originally announced July 2020.

Comments: 8 pages, 6 figures, submitted to IEEE Journal of Selected Topics in Quantum Electronics - Special Issue: Optical Signal Processing
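
The abstract above mentions directional kernel operators, gradient magnitude, and a thresholding step. The purely software sketch below shows that general recipe on a toy image. The Sobel kernels, the threshold value, and all function names are assumptions chosen for illustration; the abstract does not specify which kernels were used, and the simple comparison at the end only stands in for the spiking VCSEL neuron, which this code does not model.

```python
import numpy as np

# Directional kernels (Sobel operators, assumed here for illustration).
SOBEL_X = np.array([[-1.0, 0.0, 1.0],
                    [-2.0, 0.0, 2.0],
                    [-1.0, 0.0, 1.0]])   # responds to vertical edges
SOBEL_Y = SOBEL_X.T                      # responds to horizontal edges


def correlate2d_valid(image, kernel):
    """Slide `kernel` over `image` without padding and sum the products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * image[i:i + out_h, j:j + out_w]
    return out


def gradient_magnitude(image):
    """Combine the two directional responses into one edge-strength map."""
    gx = correlate2d_valid(image, SOBEL_X)
    gy = correlate2d_valid(image, SOBEL_Y)
    return np.hypot(gx, gy)


# Toy image: dark left half, bright right half (one vertical edge).
image = np.zeros((6, 6))
image[:, 3:] = 1.0

edges = gradient_magnitude(image)
edge_mask = edges > 1.0                  # plain threshold, stand-in for spiking
```

On this toy image only the output columns whose 3x3 window straddles the brightness step respond, so the mask marks the vertical edge and nothing else.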

arXiv:2003.14034 [pdf, other]

SPARE3D: A Dataset for SPAtial REasoning on Three-View Line Drawings

Authors: Wenyu Han, Siyuan Xiang, Chenhui Liu, Ruoyu Wang, Chen Feng

Abstract: Spatial reasoning is an important component of human intelligence. We can imagine the shapes of 3D objects and reason about their spatial relations by merely looking at their three-view line drawings in 2D, with different levels of competence. Can deep networks be trained to perform spatial reasoning tasks? How can we measure their "spatial intelligence"? To answer these questions, we present the SPARE3D dataset. Based on cognitive science and psychometrics, SPARE3D contains three types of 2D-3D reasoning tasks on view consistency, camera pose, and shape generation, with increasing difficulty. We then design a method to automatically generate a large number of challenging questions with ground truth answers for each task. They are used to provide supervision for training our baseline models using state-of-the-art architectures like ResNet. Our experiments show that although convolutional networks have achieved superhuman performance in many visual learning tasks, their spatial reasoning performance on SPARE3D tasks is either lower than average human performance or even close to random guesses. We hope SPARE3D can stimulate new problem formulations and network designs for spatial reasoning to empower intelligent robots to operate effectively in the 3D world via 2D sensors. The dataset and code are available at https://ai4ce.github.io/SPARE3D.

Submitted 2 September, 2020; v1 submitted 31 March, 2020; originally announced March 2020.

Comments: This paper has been accepted in CVPR'20. The first two authors contributed equally. Chen Feng is the corresponding author

arXiv:1912.09425 [pdf, other]

Precipitation Forecasting via Multi-Scale Deconstructed ConvLSTM

Authors: Xinyu Xiao, Qiuming Kuang, Shiming Xiang, Junnan Hu, Chunhong Pan

Abstract: Numerical Weather Prediction (NWP), which is widely used in precipitation forecasting, is based on complex equations of atmospheric motion and requires supercomputers to infer the state of the atmosphere. Due to the complexity of the task and the heavy computation involved, this methodology is inefficient and uneconomical. With the rapid development of meteorological technology, the collection of plentiful numerical meteorological data offers opportunities to develop data-driven models for the NWP task. In this paper, we consider combining NWP with deep learning. Firstly, to improve the spatiotemporal modeling of meteorological elements, a deconstruction mechanism and multi-scale filters are combined to form a multi-scale deconstructed ConvLSTM (MSD-ConvLSTM). The MSD-ConvLSTM captures and fuses the contextual information by multi-scale filters with low parameter consumption. Furthermore, an encoder-decoder is constructed to encode the features of multiple meteorological elements by deep CNN and decode the spatiotemporal information from different elements by the MSD-ConvLSTM. Our method demonstrates that the data-driven approach is significant for weather prediction, as confirmed by the experimental results of precipitation forecasting on the European Centre Weather Forecasts (EC) and China Meteorological Forecasts (CM) datasets.

Submitted 9 January, 2020; v1 submitted 14 December, 2019; originally announced December 2019.

Showing 1–13 of 13 results for author: Xiang, S