Search | arXiv e-print repository

Performance analysis for a rotary compressor at high speed: experimental study and mathematical modeling

Authors: Chuntai Zheng, Wei Zhao, Benshuai Lyu, Keke Gao, Hongjun Cao, Lei Zhong, Yi Gao, Ren Liao

Abstract: This paper conducted a comprehensive study on the performance of a rotary compressor over a rotational speed range of 80 Hz to 200 Hz through experimental tests and mathematical modeling. A compressor performance test rig was designed to conduct the performance tests, with fast-response pressure sensors and displacement sensors capturing the P-V diagram and dynamic motion of the moving components. Results show that the compressor efficiency degrades at high speeds due to the dominant loss factors of leakage and discharge power loss. Supercharging effects become significant at speeds above 160 Hz, and their net effect reduces the compressor efficiency, especially at high speeds. This study identifies and analyzes the loss factors on the mass flow rate and power consumption based on experimental data, and hypothesizes possible mechanisms for each loss factor, which can aid in the design of a high-speed rotary compressor with higher efficiency.

Submitted 13 July, 2024; originally announced July 2024.

arXiv:2407.04675 [pdf, other]

Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition

Authors: Ye Bai, Jingping Chen, Jitong Chen, Wei Chen, Zhuo Chen, Chuang Ding, Linhao Dong, Qianqian Dong, Yujiao Du, Kepan Gao, Lu Gao, Yi Guo, Minglun Han, Ting Han, Wenchao Hu, Xinying Hu, Yuxiang Hu, Deyu Hua, Lu Huang, Mingkun Huang, Youjia Huang, Jishuo Jin, Fanliu Kong, Zongwei Lan, Tianyu Li , et al. (30 additional authors not shown)

Abstract: Modern automatic speech recognition (ASR) models are required to accurately transcribe diverse speech signals (from different domains, languages, accents, etc.) given the specific contextual information in various application scenarios. Classic end-to-end models fused with extra language models perform well, but mainly in data matching scenarios and are gradually approaching a bottleneck. In this work, we introduce Seed-ASR, a large language model (LLM) based speech recognition model. Seed-ASR is developed based on the framework of audio conditioned LLM (AcLLM), leveraging the capabilities of LLMs by inputting continuous speech representations together with contextual information into the LLM. Through stage-wise large-scale training and the elicitation of context-aware capabilities in LLM, Seed-ASR demonstrates significant improvement over end-to-end models on comprehensive evaluation sets, including multiple domains, accents/dialects and languages. Additionally, Seed-ASR can be further deployed to support specific needs in various scenarios without requiring extra language models. Compared to recently released large ASR models, Seed-ASR achieves a 10%-40% reduction in word (or character, for Chinese) error rates on Chinese and English public test sets, further demonstrating its powerful performance.

Submitted 10 July, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

arXiv:2404.19387 [pdf, other]

Online Electricity Purchase for Data Center with Dynamic Virtual Battery from Flexibility Aggregation

Authors: Kekun Gao, Yuejun Yan, Yixuan Liu, Endong Liu, Pengcheng You

Abstract: As a critical component of modern infrastructure, data centers account for a huge amount of power consumption and greenhouse gas emission. This paper studies the electricity purchase strategy for a data center to lower its energy cost while integrating local renewable generation under uncertainty. To facilitate efficient and scalable decision-making, we propose a two-layer hierarchy where the lower layer consists of the operation of all electrical equipment in the data center and the upper layer determines the procurement and dispatch of electricity. At the lower layer, instead of device-level scheduling in real time, we propose to exploit the inherent flexibility in demand, such as thermostatically controlled loads and flexible computing tasks, and aggregate them into virtual batteries. By this means, the upper-layer decision only needs to take into account these virtual batteries, the size of which is generally small and independent of the data center scale. We further propose an online algorithm based on Lyapunov optimization to purchase electricity from the grid with a manageable energy cost, even though the prices, renewable availability, and battery specifications are uncertain and dynamic. In particular, we show that, under mild conditions, our algorithm can achieve bounded loss compared with the offline optimal cost, while strictly respecting battery operational constraints. Extensive simulation studies validate the theoretical analysis and illustrate the tradeoff between optimality and conservativeness.

Submitted 30 April, 2024; originally announced April 2024.

arXiv:2404.11070 [pdf, other]

Sky-GVIO: an enhanced GNSS/INS/Vision navigation with FCN-based sky-segmentation in urban canyon

Authors: Jingrong Wang, Bo Xu, Ronghe Jin, Shoujian Zhang, Kefu Gao, Jingnan Liu

Abstract: Accurate, continuous, and reliable positioning is a critical component of achieving autonomous driving. However, in complex urban canyon environments, the vulnerability of a stand-alone sensor and non-line-of-sight (NLOS) caused by high buildings, trees, and elevated structures seriously affect positioning results. To address these challenges, a sky-view image segmentation algorithm based on Fully Convolutional Network (FCN) is proposed for GNSS NLOS detection. Building upon this, a novel NLOS detection and mitigation algorithm (named S-NDM) is extended to the tightly coupled Global Navigation Satellite Systems (GNSS), Inertial Measurement Units (IMU), and visual feature system which is called Sky-GVIO, with the aim of achieving continuous and accurate positioning in urban canyon environments. Furthermore, the system harmonizes Single Point Positioning (SPP) with Real-Time Kinematic (RTK) methodologies to bolster its operational versatility and resilience. In urban canyon environments, the positioning performance of the S-NDM algorithm proposed in this paper is evaluated under different tightly coupled SPP-related and RTK-related models. The results show that the Sky-GVIO system achieves meter-level accuracy under SPP mode and sub-decimeter precision with RTK, surpassing the performance of GNSS/INS/Vision frameworks devoid of S-NDM. Additionally, the sky-view image dataset, inclusive of training and evaluation subsets, has been made publicly accessible for scholarly exploration at https://github.com/whuwangjr/sky-view-images.

Submitted 5 August, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

arXiv:2310.09937 [pdf, other]

Joint Sparse Representations and Coupled Dictionary Learning in Multi-Source Heterogeneous Image Pseudo-color Fusion

Authors: Long Bai, Shilong Yao, Kun Gao, Yanjun Huang, Ruijie Tang, Hong Yan, Max Q. -H. Meng, Hongliang Ren

Abstract: Considering that the Coupled Dictionary Learning (CDL) method can obtain a reasonable linear mathematical relationship between resource images, we propose a novel CDL-based Synthetic Aperture Radar (SAR) and multispectral pseudo-color fusion method. Firstly, the traditional Brovey transform is employed as a pre-processing method on the paired SAR and multispectral images. Then, CDL is used to capture the correlation between the pre-processed image pairs based on the dictionaries generated from the source images via enforced joint sparse coding. Afterward, the joint sparse representation in the pair of dictionaries is utilized to construct an image mask via calculating the reconstruction errors, and therefore generate the final fusion image. The experimental verification results of the SAR images from the Sentinel-1 satellite and the multispectral images from the Landsat-8 satellite show that the proposed method can achieve superior visual effects, and excellent quantitative performance in terms of spectral distortion, correlation coefficient, MSE, NIQE, BRISQUE, and PIQE.

Submitted 15 October, 2023; originally announced October 2023.

Comments: To appear in IEEE Sensors Journal

arXiv:2307.05499 [pdf, ps, other]

A Natural Lane Changing Decision Model For Mixed Traffic Flow Based On Extreme Value Theory

Authors: Jiali Peng, Wei Shangguan, Linguo Chai, Rui Luo, Ke Gao

Abstract: With the high frequency of highway accidents, studying how to use connected automated vehicles (CAVs) to improve traffic efficiency and safety will become an important issue. In order to investigate how CAVs can use the connected information for decision making, this study proposed a natural lane changing decision model for CAVs to adapt to mixed traffic flow based on extreme value theory. Firstly, on the basis of the mixed vehicle behavior analysis, the acceleration, deceleration, and randomization rules of the cellular automata model of mixed traffic flow in two lanes are developed. Secondly, the maximum value of a CAV's lane change probability at each distance is modeled by an extreme value distribution. Finally, a numerical simulation is conducted to analyze the trajectory-velocity diagram of mixed traffic flow, average travel time and average speed under different penetration rates of CAVs. The result shows that our model can avoid the traffic risk well and significantly improve traffic efficiency and safety.

Submitted 3 August, 2023; v1 submitted 25 June, 2023; originally announced July 2023.

arXiv:2212.11772 [pdf, other]

A Self-Adjusting Fusion Representation Learning Model for Unaligned Text-Audio Sequences

Authors: Kaicheng Yang, Ruxuan Zhang, Hua Xu, Kai Gao

Abstract: Inter-modal interaction plays an indispensable role in multimodal sentiment analysis. Because sequences from different modalities are usually unaligned, how to integrate relevant information of each modality to learn fusion representations has been one of the central challenges in multimodal learning. In this paper, a Self-Adjusting Fusion Representation Learning Model (SA-FRLM) is proposed to learn robust crossmodal fusion representations directly from the unaligned text and audio sequences. Different from previous works, our model not only makes full use of the interaction between different modalities but also maximizes the protection of the unimodal characteristics. Specifically, we first employ a crossmodal alignment module to project the features of different modalities to the same dimension. The crossmodal collaboration attention is then adopted to model the inter-modal interaction between text and audio sequences and initialize the fusion representations. After that, as the core unit of the SA-FRLM, the crossmodal adjustment transformer is proposed to protect original unimodal characteristics. It can dynamically adapt the fusion representations by using single modal streams. We evaluate our approach on the public multimodal sentiment analysis datasets CMU-MOSI and CMU-MOSEI. The experiment results show that our model has significantly improved the performance of all the metrics on the unaligned text-audio sequences.

Submitted 12 November, 2022; originally announced December 2022.

Comments: 8 pages

arXiv:2209.02604 [pdf, other]

Make Acoustic and Visual Cues Matter: CH-SIMS v2.0 Dataset and AV-Mixup Consistent Module

Authors: Yihe Liu, Ziqi Yuan, Huisheng Mao, Zhiyun Liang, Wanqiuyue Yang, Yuanzhe Qiu, Tie Cheng, Xiaoteng Li, Hua Xu, Kai Gao

Abstract: Multimodal sentiment analysis (MSA), which aims to improve text-based sentiment analysis with associated acoustic and visual modalities, is an emerging research area due to its potential applications in Human-Computer Interaction (HCI). However, existing research observes that the acoustic and visual modalities contribute much less than the textual modality, termed as text-predominant. Under such circumstances, in this work, we emphasize making non-verbal cues matter for the MSA task. Firstly, from the resource perspective, we present the CH-SIMS v2.0 dataset, an extension and enhancement of the CH-SIMS. Compared with the original dataset, the CH-SIMS v2.0 doubles its size with another 2121 refined video segments with both unimodal and multimodal annotations and collects 10161 unlabelled raw video segments with rich acoustic and visual emotion-bearing context to highlight non-verbal cues for sentiment prediction. Secondly, from the model perspective, benefiting from the unimodal annotations and the unsupervised data in the CH-SIMS v2.0, the Acoustic Visual Mixup Consistent (AV-MC) framework is proposed. The designed modality mixup module can be regarded as an augmentation, which mixes the acoustic and visual modalities from different videos. Through drawing unobserved multimodal context along with the text, the model can learn to be aware of different non-verbal contexts for sentiment prediction. Our evaluations demonstrate that both CH-SIMS v2.0 and the AV-MC framework enable further research for discovering emotion-bearing acoustic and visual cues and pave the path to interpretable end-to-end HCI applications for real-world scenarios.

Submitted 21 August, 2022; originally announced September 2022.

Comments: 16 pages, 7 figures, accepted by ICMI 2022

arXiv:2202.09457 [pdf]

doi 10.1108/JICV-02-2022-0005

Merging Control Strategies of Connected and Autonomous Vehicles at Freeway On-Ramps: A Comprehensive Review

Authors: Jie Zhu, Said Easa, Kun Gao

Abstract: On-ramp merging areas are typical bottlenecks in the freeway network, since merging on-ramp vehicles may cause intensive disturbances on the mainline traffic flow and lead to various negative impacts on traffic efficiency and safety. The connected and autonomous vehicles (CAVs), with their capabilities of real-time communication and precise motion control, hold a great potential to facilitate ramp merging operation through enhanced coordination strategies. This paper presents a comprehensive review of the existing ramp merging strategies leveraging CAVs, focusing on the latest trends and developments in the research field. The review comprehensively covers 44 papers recently published in leading transportation journals. Based on the application context, control strategies are categorized into three categories: merging into single-lane freeways with total CAVs, merging into single-lane freeways with mixed traffic flows, and merging into multilane freeways. Relevant literature is reviewed regarding the required technologies, control decision level, applied methods, and impacts on traffic performance. More importantly, we identify the existing research gaps and provide insightful discussions on the potential and promising directions for future research based on the review, which facilitates further advancement in this research topic.

Submitted 22 March, 2022; v1 submitted 18 February, 2022; originally announced February 2022.

arXiv:2104.10326 [pdf, other]

A Structure-Aware Relation Network for Thoracic Diseases Detection and Segmentation

Authors: Jie Lian, Jingyu Liu, Shu Zhang, Kai Gao, Xiaoqing Liu, Dingwen Zhang, Yizhou Yu

Abstract: Instance level detection and segmentation of thoracic diseases or abnormalities are crucial for automatic diagnosis in chest X-ray images. Leveraging on constant structure and disease relations extracted from domain knowledge, we propose a structure-aware relation network (SAR-Net) extending Mask R-CNN. The SAR-Net consists of three relation modules: 1. the anatomical structure relation module encoding spatial relations between diseases and anatomical parts. 2. the contextual relation module aggregating clues based on query-key pair of disease RoI and lung fields. 3. the disease relation module propagating co-occurrence and causal relations into disease proposals. Towards making a practical system, we also provide ChestX-Det, a chest X-Ray dataset with instance-level annotations (boxes and masks). ChestX-Det is a subset of the public dataset NIH ChestX-ray14. It contains ~3500 images of 13 common disease categories labeled by three board-certified radiologists. We evaluate our SAR-Net on it and another dataset DR-Private. Experimental results show that it can enhance the strong baseline of Mask R-CNN with significant improvements. The ChestX-Det is released at https://github.com/Deepwise-AILab/ChestX-Det-Dataset.

Submitted 20 April, 2021; originally announced April 2021.

Comments: This paper has been accepted by IEEE Transactions on Medical Imaging

arXiv:2103.09300 [pdf]

The impact of data volume on performance of deep learning based building rooftop extraction using very high spatial resolution aerial images

Authors: Hongjie He, Ke Yang, Yuwei Cai, Zijian Jiang, Qiutong Yu, Kun Zhao, Junbo Wang, Sarah Narges Fatholahi, Yan Liu, Hasti Andon Petrosians, Bingxu Hu, Liyuan Qing, Zhehan Zhang, Hongzhang Xu, Siyu Li, Kyle Gao, Linlin Xu, Jonathan Li

Abstract: Building rooftop data are of importance in several urban applications and in natural disaster management. In contrast to traditional surveying and mapping, by using high spatial resolution aerial images, deep learning-based building rooftops extraction methods are efficient and accurate. Although more training data is preferred in deep learning-based tasks, the effect of data volume on building extraction models is underexplored. Therefore, the paper explores the impact of data volume on the performance of building rooftop extraction from very-high-spatial-resolution (VHSR) images using deep learning-based methods. To do so, we manually labelled 0.12 m spatial resolution aerial images and performed a comparative analysis of models trained on datasets of different sizes using popular deep learning architectures for segmentation tasks, including Fully Convolutional Networks (FCN)-8s, U-Net and DeepLabv3+. The experiments showed that with more training data, algorithms converged faster and achieved higher accuracy, while better algorithms were able to better mitigate the lack of training data.

Submitted 4 October, 2021; v1 submitted 16 March, 2021; originally announced March 2021.

arXiv:2007.02096 [pdf]

doi 10.1109/TMI.2021.3055428

Multi-Site Infant Brain Segmentation Algorithms: The iSeg-2019 Challenge

Authors: Yue Sun, Kun Gao, Zhengwang Wu, Zhihao Lei, Ying Wei, Jun Ma, Xiaoping Yang, Xue Feng, Li Zhao, Trung Le Phan, Jitae Shin, Tao Zhong, Yu Zhang, Lequan Yu, Caizi Li, Ramesh Basnet, M. Omair Ahmad, M. N. S. Swamy, Wenao Ma, Qi Dou, Toan Duc Bui, Camilo Bermudez Noguera, Bennett Landman, Ian H. Gotlib, Kathryn L. Humphreys , et al. (8 additional authors not shown)

Abstract: To better understand early brain growth patterns in health and disorder, it is critical to accurately segment infant brain magnetic resonance (MR) images into white matter (WM), gray matter (GM), and cerebrospinal fluid (CSF). Deep learning-based methods have achieved state-of-the-art performance; however, one of the major limitations is that the learning-based methods may suffer from the multi-site issue, that is, the models trained on a dataset from one site may not be applicable to the datasets acquired from other sites with different imaging protocols/scanners. To promote methodological development in the community, the iSeg-2019 challenge (http://iseg2019.web.unc.edu) provides a set of 6-month infant subjects from multiple sites with different protocols/scanners for the participating methods. Training/validation subjects are from UNC (MAP) and testing subjects are from UNC/UMN (BCP), Stanford University, and Emory University. By the time of writing, there are 30 automatic segmentation methods participating in iSeg-2019. We review the 8 top-ranked teams by detailing their pipelines/implementations, presenting experimental results and evaluating performance in terms of the whole brain, regions of interest, and gyral landmark curves. We also discuss their limitations and possible future directions for the multi-site issue. We hope that the multi-site dataset in iSeg-2019 and this review article will attract more researchers on the multi-site issue.

Submitted 11 July, 2020; v1 submitted 4 July, 2020; originally announced July 2020.

Journal ref: IEEE Transactions on Medical Imaging, 40(5), 1363-1376, 2021

Showing 1–12 of 12 results for author: Gao, K