Search | arXiv e-print repository

Multi-faceted Sensory Substitution for Curb Alerting: A Pilot Investigation in Persons with Blindness and Low Vision

Authors: Ligao Ruan, Giles Hamilton-Fletcher, Mahya Beheshti, Todd E Hudson, Maurizio Porfiri, JR Rizzo

Abstract: Curbs -- the edge of a raised sidewalk at the point where it meets a street -- are crucial in urban environments, where they help delineate safe pedestrian zones from dangerous vehicular lanes. However, curbs themselves are significant navigation hazards, particularly for people who are blind or have low vision (pBLV). The challenges faced by pBLV in detecting and properly orientating themselves for these abrupt elevation changes can lead to falls and serious injuries. Despite recent advancements in assistive technologies, the detection and early warning of curbs remains a largely unsolved challenge. This paper aims to tackle this gap by introducing a novel, multi-faceted sensory substitution approach hosted on a smart wearable; the platform leverages an RGB camera and an embedded system to capture and segment curbs in real time and provide early warning and orientation information. The system utilizes a YOLO (You Only Look Once) v8 segmentation model, trained on our custom curb dataset, for the camera input. The output of the system consists of adaptive auditory beeps, abstract sonification, and speech, conveying information about the relative distance and orientation of curbs. Through human-subjects experimentation, we demonstrate the effectiveness of the system as compared to the white cane. Results show that our system can provide advance warning through a larger safety window than the cane, while offering nearly identical curb orientation information.

Submitted 28 August, 2024; v1 submitted 26 August, 2024; originally announced August 2024.

arXiv:2408.10533 [pdf, other]

FAGStyle: Feature Augmentation on Geodesic Surface for Zero-shot Text-guided Diffusion Image Style Transfer

Authors: Yuexing Han, Liheng Ruan, Bing Wang

Abstract: The goal of image style transfer is to render an image guided by a style reference while maintaining the original content. Existing image-guided methods rely on specific style reference images, restricting their wider application and potentially compromising result quality. As a flexible alternative, text-guided methods allow users to describe the desired style using text prompts. Despite their versatility, these methods often struggle with maintaining style consistency, reflecting the described style accurately, and preserving the content of the target image. To address these challenges, we introduce FAGStyle, a zero-shot text-guided diffusion image style transfer method. Our approach enhances inter-patch information interaction by incorporating the Sliding Window Crop technique and Feature Augmentation on Geodesic Surface into our style control loss. Furthermore, we integrate a Pre-Shape self-correlation consistency loss to ensure content consistency. FAGStyle demonstrates superior performance over existing methods, consistently achieving stylization that retains the semantic content of the source image. Experimental results confirm the efficacy of FAGStyle across a diverse range of source contents and styles, both imagined and common.

Submitted 20 August, 2024; v1 submitted 20 August, 2024; originally announced August 2024.

arXiv:2408.09413 [pdf, ps, other]

Error minimization for fidelity estimation of GHZ states with arbitrary noise

Authors: Liangzhong Ruan

Abstract: Fidelity estimation is a crucial component for the quality control of entanglement distribution networks. This work studies a scenario in which multiple nodes share noisy Greenberger-Horne-Zeilinger (GHZ) states. Due to the collapsing nature of quantum measurements, the nodes randomly sample a subset of noisy GHZ states for measurement and then estimate the average fidelity of the unsampled states conditioned on the measurement outcome. By developing a fidelity-preserving diagonalization operation, analyzing the Bloch representation of GHZ states, and maximizing the Fisher information, the proposed estimation protocol achieves the minimum mean squared estimation error in a challenging scenario characterized by arbitrary noise and the absence of prior information. Additionally, this protocol is implementation-friendly as it only uses local Pauli operators according to a predefined sequence. Numerical studies demonstrate that, compared to existing fidelity estimation protocols, the proposed protocol reduces estimation errors in scenarios involving both independent and identically distributed (i.i.d.) noise and correlated noise.

Submitted 18 August, 2024; originally announced August 2024.

arXiv:2406.12324 [pdf, other]

AutoDSL: Automated domain-specific language design for structural representation of procedures with constraints

Authors: Yu-Zhe Shi, Haofei Hou, Zhangqian Bi, Fanxu Meng, Xiang Wei, Lecheng Ruan, Qining Wang

Abstract: Accurate representation of procedures in restricted scenarios, such as non-standardized scientific experiments, requires precise depiction of constraints. Unfortunately, Domain-specific Language (DSL), as an effective tool to express constraints structurally, often requires case-by-case hand-crafting, necessitating customized, labor-intensive efforts. To overcome this challenge, we introduce the AutoDSL framework to automate DSL-based constraint design across various domains. Utilizing domain specified experimental protocol corpora, AutoDSL optimizes syntactic constraints and abstracts semantic constraints. Quantitative and qualitative analyses of the DSLs designed by AutoDSL across five distinct domains highlight its potential as an auxiliary module for language models, aiming to improve procedural planning and execution.

Submitted 18 June, 2024; originally announced June 2024.

Comments: In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (ACL'24)

arXiv:2403.01694 [pdf, other]

Tac-Man: Tactile-Informed Prior-Free Manipulation of Articulated Objects

Authors: Zihang Zhao, Yuyang Li, Wanlin Li, Zhenghao Qi, Lecheng Ruan, Yixin Zhu, Kaspar Althoefer

Abstract: Integrating robotics into human-centric environments such as homes necessitates advanced manipulation skills, as robotic devices will need to engage with articulated objects like doors and drawers. Key challenges in robotic manipulation are the unpredictability and diversity of these objects' internal structures, which render models based on priors, both explicit and implicit, inadequate. Their reliability is significantly diminished by pre-interaction ambiguities, imperfect structural parameters, encounters with unknown objects, and unforeseen disturbances. Here, we present a prior-free strategy, Tac-Man, focusing on maintaining stable robot-object contact during manipulation. Utilizing tactile feedback, but independent of object priors, Tac-Man enables robots to proficiently handle a variety of articulated objects, including those with complex joints, even when influenced by unexpected disturbances. Demonstrated in both real-world experiments and extensive simulations, it consistently achieves near-perfect success in dynamic and varied settings, outperforming existing methods. Our results indicate that tactile sensing alone suffices for managing diverse articulated objects, offering greater robustness and generalization than prior-based approaches. This underscores the importance of detailed contact modeling in complex manipulation tasks, especially with articulated objects. Advancements in tactile sensors significantly expand the scope of robotic applications in human-centric environments, particularly where accurate models are difficult to obtain.

Submitted 7 July, 2024; v1 submitted 3 March, 2024; originally announced March 2024.

Comments: 20 pages, 16 figures, 3 tables

arXiv:2401.09084 [pdf, other]

UniVG: Towards UNIfied-modal Video Generation

Authors: Ludan Ruan, Lei Tian, Chuanwei Huang, Xu Zhang, Xinyan Xiao

Abstract: Diffusion-based video generation has received extensive attention and achieved considerable success within both the academic and industrial communities. However, current efforts are mainly concentrated on single-objective or single-task video generation, such as generation driven by text, by image, or by a combination of text and image. This cannot fully meet the needs of real-world application scenarios, as users are likely to input images and text conditions in a flexible manner, either individually or in combination. To address this, we propose a Unified-modal Video Generation system that is capable of handling multiple video generation tasks across text and image modalities. To this end, we revisit the various video generation tasks within our system from the perspective of generative freedom, and classify them into high-freedom and low-freedom video generation categories. For high-freedom video generation, we employ Multi-condition Cross Attention to generate videos that align with the semantics of the input images or text. For low-freedom video generation, we introduce Biased Gaussian Noise to replace the pure random Gaussian Noise, which helps to better preserve the content of the input conditions. Our method achieves the lowest Fréchet Video Distance (FVD) on the public academic benchmark MSR-VTT, surpasses the current open-source methods in human evaluations, and is on par with the current closed-source method Gen2. For more samples, visit https://univg-baidu.github.io.

Submitted 17 January, 2024; originally announced January 2024.

arXiv:2401.01749 [pdf, other]

Few-shot Image Generation via Information Transfer from the Built Geodesic Surface

Authors: Yuexing Han, Liheng Ruan, Bing Wang

Abstract: Images generated by most generative models trained with limited data often exhibit deficiencies in either fidelity, diversity, or both. One effective solution to address this limitation is few-shot generative model adaption. However, this type of approach typically relies on a large-scale pre-trained model, serving as a source domain, to facilitate information transfer to the target domain. In this paper, we propose a method called Information Transfer from the Built Geodesic Surface (ITBGS), which contains two modules: Feature Augmentation on Geodesic Surface (FAGS) and Interpolation and Regularization (I&R). With the FAGS module, a pseudo-source domain is created by projecting image features from the training dataset into the Pre-Shape Space, subsequently generating new features on the Geodesic surface. Thus, no pre-trained model is needed for the adaption process during the training of generative models with FAGS. The I&R module is introduced for supervising the interpolated images and regularizing their relative distances, respectively, to further enhance the quality of generated images. Through qualitative and quantitative experiments, we demonstrate that the proposed method consistently achieves optimal or comparable results across a diverse range of semantically distinct datasets, even in extremely few-shot scenarios.

Submitted 2 March, 2024; v1 submitted 3 January, 2024; originally announced January 2024.

arXiv:2309.13626 [pdf]

Crack-Net: Prediction of Crack Propagation in Composites

Authors: Hao Xu, Wei Fan, Ambrose C. Taylor, Dongxiao Zhang, Lecheng Ruan, Rundong Shi

Abstract: Computational solid mechanics has become an indispensable approach in engineering, and numerical investigation of fracture in composites is essential as composites are widely used in structural applications. Crack evolution in composites is the bridge to elucidate the relationship between the microstructure and fracture performance, but crack-based finite element methods are computationally expensive and time-consuming, limiting their application in computation-intensive scenarios. Here we propose a deep learning framework called Crack-Net, which incorporates the relationship between crack evolution and stress response to predict the fracture process in composites. Trained on a high-precision fracture development dataset generated using the phase field method, Crack-Net demonstrates a remarkable capability to accurately forecast the long-term evolution of crack growth patterns and the stress-strain curve for a given composite design. The Crack-Net captures the essential principle of crack growth, which enables it to handle more complex microstructures such as binary co-continuous structures. Moreover, transfer learning is adopted to further improve the generalization ability of Crack-Net for composite materials with reinforcements of different strengths. The proposed Crack-Net holds great promise for practical applications in engineering and materials science, in which accurate and efficient fracture prediction is crucial for optimizing material performance and microstructural design.

Submitted 24 September, 2023; originally announced September 2023.

arXiv:2305.09145 [pdf, other]

Deep ReLU Networks Have Surprisingly Simple Polytopes

Authors: Feng-Lei Fan, Wei Huang, Xiangru Zhong, Lecheng Ruan, Tieyong Zeng, Huan Xiong, Fei Wang

Abstract: A ReLU network is a piecewise linear function over polytopes. Figuring out the properties of such polytopes is of fundamental importance for the research and development of neural networks. So far, both theoretical and empirical studies on polytopes have stayed at the level of counting their number, which is far from a complete characterization of polytopes. To upgrade the characterization to a new level, here we propose to study the shapes of polytopes via the number of simplices obtained by triangulating the polytope. Then, by computing and analyzing the histogram of simplices across polytopes, we find that a ReLU network has relatively simple polytopes under both initialization and gradient descent, although these polytopes theoretically can be rather diverse and complicated. This finding can be appreciated as a novel implicit bias. Next, we use nontrivial combinatorial derivation to theoretically explain why adding depth does not create a more complicated polytope by bounding the average number of faces of polytopes with a function of the dimensionality. Our results concretely reveal what kind of simple functions a network learns and its space partition property. Also, by characterizing the shape of polytopes, the number of simplices can be a leverage for other problems, e.g., serving as a generic functional complexity measure to explain the power of popular shortcut networks such as ResNet and analyzing the impact of different regularization strategies on a network's space partition.

Submitted 15 May, 2023; originally announced May 2023.

arXiv:2305.07814 [pdf, other]

Cloud-RAIN: Point Cloud Analysis with Reflectional Invariance

Authors: Yiming Cui, Lecheng Ruan, Hang-Cheng Dong, Qiang Li, Zhongming Wu, Tieyong Zeng, Feng-Lei Fan

Abstract: Networks for point cloud tasks are expected to be invariant when the point clouds undergo affine transformations such as rotation and reflection. So far, relative to rotational invariance, which has attracted major research attention in the past years, reflection invariance has been little addressed. Nevertheless, reflection symmetry arises in very common and important scenarios, e.g., static reflection symmetry of structured streets, dynamic reflection symmetry from bidirectional motion of moving objects (such as pedestrians), and left- and right-hand traffic practices in different countries. To the best of our knowledge, unfortunately, no reflection-invariant network has been reported in point cloud analysis till now. To fill this gap, we propose a framework using quadratic neurons and PCA canonical representation, referred to as Cloud-RAIN, to endow point Cloud models with ReflectionAl INvariance. We prove a theorem to explain why Cloud-RAIN can enjoy reflection symmetry. Furthermore, extensive experiments also corroborate the reflection property of the proposed Cloud-RAIN and show that Cloud-RAIN is superior to data augmentation. Our code is available at https://github.com/YimingCuiCuiCui/Cloud-RAIN.

Submitted 12 May, 2023; originally announced May 2023.

arXiv:2304.11837 [pdf, other]

Fault-tolerant Control of an Over-actuated UAV Platform Built on Quadcopters and Passive Hinges

Authors: Yao Su, Pengkang Yu, Matthew J. Gerber, Lecheng Ruan, Tsu-Chin Tsao

Abstract: Propeller failure is a major cause of multirotor Unmanned Aerial Vehicle (UAV) crashes. While conventional multirotor systems struggle to address this issue due to underactuation, over-actuated platforms can continue flying with appropriate fault-tolerant control (FTC). This paper presents a robust FTC controller for an over-actuated UAV platform composed of quadcopters mounted on passive joints, offering input redundancy at both the high-level vehicle control and the low-level quadcopter control of vectored thrusts. To maximize the benefits of input redundancy during propeller failure, the proposed FTC controller features a hierarchical control architecture with three key components: (i) a low-level adjustment strategy to prevent propeller-level thrust saturation; (ii) a compensation loop for mitigating introduced disturbances; (iii) a nullspace-based control allocation framework to avoid quadcopter-level thrust saturation. Through reallocating actuator inputs in both the low-level and high-level control loops, the low-level quadcopter control can be maintained with up to two failed propellers, ensuring that the whole platform remains stable and avoids crashing. The proposed controller's superior performance is thoroughly examined through simulations and real-world experiments.

Submitted 14 June, 2023; v1 submitted 24 April, 2023; originally announced April 2023.

arXiv:2303.06591 [pdf, other]

Accommodating Audio Modality in CLIP for Multimodal Processing

Authors: Ludan Ruan, Anwen Hu, Yuqing Song, Liang Zhang, Sipeng Zheng, Qin Jin

Abstract: Multimodal processing has attracted much attention lately, especially with the success of pre-training. However, the exploration has mainly focused on vision-language pre-training, as introducing more modalities can greatly complicate model design and optimization. In this paper, we extend the state-of-the-art Vision-Language model CLIP to accommodate the audio modality for Vision-Language-Audio multimodal processing. Specifically, we apply inter-modal and intra-modal contrastive learning to explore the correlation between audio and other modalities in addition to the inner characteristics of the audio modality. Moreover, we further design an audio type token to dynamically learn different audio information types for different scenarios, as both verbal and nonverbal heterogeneous information is conveyed in general audio. Our proposed CLIP4VLA model is validated in different downstream tasks including video retrieval and video captioning, and achieves state-of-the-art performance on the benchmark datasets of MSR-VTT, VATEX, and Audiocaps.

Submitted 12 March, 2023; originally announced March 2023.

Comments: Accepted by AAAI2023

arXiv:2303.06316 [pdf, other]

One Neuron Saved Is One Neuron Earned: On Parametric Efficiency of Quadratic Networks

Authors: Feng-Lei Fan, Hang-Cheng Dong, Zhongming Wu, Lecheng Ruan, Tieyong Zeng, Yiming Cui, Jing-Xiao Liao

Abstract: Inspired by neuronal diversity in the biological neural system, a plethora of studies proposed to design novel types of artificial neurons and introduce neuronal diversity into artificial neural networks. The recently proposed quadratic neuron, which replaces the inner-product operation in conventional neurons with a quadratic one, has achieved great success in many essential tasks. Despite the promising results of quadratic neurons, there is still an unresolved issue: Is the superior performance of quadratic networks simply due to the increased parameters or due to the intrinsic expressive capability? Without clarifying this issue, the performance of quadratic networks is always suspicious. Additionally, resolving this issue is reduced to finding killer applications of quadratic networks. In this paper, with theoretical and empirical studies, we show that quadratic networks enjoy parametric efficiency, thereby confirming that the superior performance of quadratic networks is due to the intrinsic expressive capability. This intrinsic expressive ability comes from the fact that quadratic neurons can easily represent nonlinear interaction, which is hard for conventional neurons. Theoretically, we derive the approximation efficiency of the quadratic network over conventional ones in terms of real space and manifolds. Moreover, from the perspective of the Barron space, we demonstrate that there exists a functional space whose functions can be approximated by quadratic networks with a dimension-free error, but the approximation error of conventional networks is dependent on dimensions. Empirically, experimental results on synthetic data, classic benchmarks, and real-world applications show that quadratic models broadly enjoy parametric efficiency, and the gain of efficiency depends on the task.

Submitted 11 March, 2023; originally announced March 2023.

Comments: We have shared our code in https://github.com/asdvfghg/quadratic_efficiency

arXiv:2302.02234 [pdf, other]

Revisiting Image Deblurring with an Efficient ConvNet

Authors: Lingyan Ruan, Mojtaba Bemana, Hans-peter Seidel, Karol Myszkowski, Bin Chen

Abstract: Image deblurring aims to recover the latent sharp image from its blurry counterpart and has a wide range of applications in computer vision. Convolutional Neural Networks (CNNs) have performed well in this domain for many years, and only recently has an alternative network architecture, namely the Transformer, demonstrated even stronger performance. One can attribute its superiority to the multi-head self-attention (MHSA) mechanism, which offers a larger receptive field and better input content adaptability than CNNs. However, as MHSA demands high computational costs that grow quadratically with respect to the input resolution, it becomes impractical for high-resolution image deblurring tasks. In this work, we propose a unified lightweight CNN network that features a large effective receptive field (ERF) and demonstrates comparable or even better performance than Transformers while bearing lower computational costs. Our key design is an efficient CNN block dubbed LaKD, equipped with a large kernel depth-wise convolution and spatial-channel mixing structure, attaining comparable or larger ERF than Transformers but with a smaller parameter scale. Specifically, we achieve +0.17dB / +0.43dB PSNR over the state-of-the-art Restormer on defocus / motion deblurring benchmark datasets with 32% fewer parameters and 39% fewer MACs. Extensive experiments demonstrate the superior performance of our network and the effectiveness of each module. Furthermore, we propose a compact and intuitive ERFMeter metric that quantitatively characterizes ERF, and shows a high correlation to the network performance. We hope this work can inspire the research community to further explore the pros and cons of CNN and Transformer architectures beyond image deblurring tasks.

Submitted 4 February, 2023; originally announced February 2023.

Comments: 30 pages (12 pages for the main manuscript and 18 for the supplementary materials)

arXiv:2301.05880 [pdf, other]

doi 10.1145/3581783.3612425

TikTalk: A Video-Based Dialogue Dataset for Multi-Modal Chitchat in Real World

Authors: Hongpeng Lin, Ludan Ruan, Wenke Xia, Peiyu Liu, Jingyuan Wen, Yixin Xu, Di Hu, Ruihua Song, Wayne Xin Zhao, Qin Jin, Zhiwu Lu

Abstract: To facilitate the research on intelligent and human-like chatbots with multi-modal context, we introduce a new video-based multi-modal dialogue dataset, called TikTalk. We collect 38K videos from a popular video-sharing platform, along with 367K conversations posted by users beneath them. Users engage in spontaneous conversations based on their multi-modal experiences from watching videos, which helps recreate real-world chitchat context. Compared to previous multi-modal dialogue datasets, the richer context types in TikTalk lead to more diverse conversations, but also increase the difficulty in capturing human interests from intricate multi-modal information to generate personalized responses. Moreover, external knowledge is more frequently evoked in our dataset. These facts reveal new challenges for multi-modal dialogue models. We quantitatively demonstrate the characteristics of TikTalk, propose a video-based multi-modal chitchat task, and evaluate several dialogue baselines. Experimental results indicate that the models incorporating large language models (LLMs) can generate more diverse responses, while the model utilizing knowledge graphs to introduce external knowledge performs the best overall. Furthermore, no existing model can solve all the above challenges well. There is still much room for future improvements, even for LLMs with visual extensions. Our dataset is available at https://ruc-aimind.github.io/projects/TikTalk/.

Submitted 8 September, 2023; v1 submitted 14 January, 2023; originally announced January 2023.

Comments: Accepted to ACM Multimedia 2023

arXiv:2212.09478 [pdf, other]

MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation

Authors: Ludan Ruan, Yiyang Ma, Huan Yang, Huiguo He, Bei Liu, Jianlong Fu, Nicholas Jing Yuan, Qin Jin, Baining Guo

Abstract: We propose the first joint audio-video generation framework that brings engaging watching and listening experiences simultaneously, towards high-quality realistic videos. To generate joint audio-video pairs, we propose a novel Multi-Modal Diffusion model (i.e., MM-Diffusion), with two-coupled denoising autoencoders. In contrast to existing single-modal diffusion models, MM-Diffusion consists of a sequential multi-modal U-Net for a joint denoising process by design. Two subnets for audio and video learn to gradually generate aligned audio-video pairs from Gaussian noise. To ensure semantic consistency across modalities, we propose a novel random-shift based attention block bridging over the two subnets, which enables efficient cross-modal alignment, and thus reinforces the audio-video fidelity for each other. Extensive experiments show superior results in unconditional audio-video generation, and zero-shot conditional tasks (e.g., video-to-audio). In particular, we achieve the best FVD and FAD on Landscape and AIST++ dancing datasets. Turing tests with 10k votes further demonstrate dominant preferences for our model. The code and pre-trained models can be downloaded at https://github.com/researchmm/MM-Diffusion.

Submitted 24 March, 2023; v1 submitted 19 December, 2022; originally announced December 2022.

Comments: Accepted by CVPR 2023

arXiv:2204.00367 [pdf, other]

Learning to Deblur using Light Field Generated and Real Defocus Images

Authors: Lingyan Ruan, Bin Chen, Jizhou Li, Miuling Lam

Abstract: Defocus deblurring is a challenging task due to the spatially varying nature of defocus blur. While deep learning approaches show great promise in solving image restoration problems, defocus deblurring demands accurate training data that consists of all-in-focus and defocus image pairs, which is difficult to collect. Naive two-shot capturing cannot achieve pixel-wise correspondence between the defocused and all-in-focus image pairs. Synthetic aperture of light fields is suggested to be a more reliable way to generate accurate image pairs. However, the defocus blur generated from light field data is different from that of the images captured with a traditional digital camera. In this paper, we propose a novel deep defocus deblurring network that leverages the strength and overcomes the shortcoming of light fields. We first train the network on a light field-generated dataset for its highly accurate image correspondence. Then, we fine-tune the network using feature loss on another dataset collected by the two-shot method to alleviate the differences between the defocus blur that exists in the two domains. This strategy proves highly effective and achieves state-of-the-art performance both quantitatively and qualitatively on multiple test sets. Extensive ablation studies have been conducted to analyze the effect of each network module on the final performance.

Submitted 1 April, 2022; originally announced April 2022.

Comments: CVPR 2022 Oral

arXiv:2109.09920 [pdf, other]

Survey: Transformer based Video-Language Pre-training

Authors: Ludan Ruan, Qin Jin

Abstract: Inspired by the success of transformer-based pre-training methods on natural language tasks and further computer vision tasks, researchers have begun to apply transformers to video processing. This survey aims to give a comprehensive overview of transformer-based pre-training methods for Video-Language learning. We first briefly introduce the transformer structure as background knowledge, including the attention mechanism, position encoding, etc. We then describe the typical paradigm of pre-training & fine-tuning on Video-Language processing in terms of proxy tasks, downstream tasks and commonly used video datasets. Next, we categorize transformer models into Single-Stream and Multi-Stream structures, highlight their innovations and compare their performances. Finally, we analyze and discuss the current challenges and possible future research directions for Video-Language pre-training.

Submitted 20 September, 2021; originally announced September 2021.

arXiv:2108.10572 [pdf, other]

Optimal UAV Hitching on Ground Vehicles

Authors: Lihua Ruan, Lingjie Duan, Jianwei Huang

Abstract: Due to its mobility and agility, the unmanned aerial vehicle (UAV) has emerged as a promising technology for various tasks, such as sensing, inspection and delivery. However, a typical UAV has limited energy storage and cannot fly a long distance without being recharged. This motivates several existing proposals to use trucks and other ground vehicles to offer rides that help UAVs save energy and expand the operation radius. We present the first theoretical study regarding how UAVs should optimally hitch on ground vehicles, considering vehicles' different travelling patterns and supporting capabilities. For a single UAV, we derive a closed-form optimal vehicle selection and hitching strategy. When vehicles only support hitching, a UAV would prefer the vehicle that can carry it closest to its final destination. When vehicles can offer hitching plus charging, the UAV may hitch on a vehicle that carries it farther away from its destination and hitch a longer distance. The UAV may also prefer to hitch on a slower vehicle for the benefit of battery recharging. For multiple UAVs in need of hitching, we develop the max-saving algorithm (MSA) to optimally match UAV-vehicle collaboration. We prove that the MSA globally optimizes the total hitching benefits for the UAVs.

Submitted 24 August, 2021; originally announced August 2021.

arXiv:2107.13717 [pdf, other]

Maximize the Foot Clearance for a Hopping Robotic Leg Considering Motor Saturation

Authors: Juntong Su, Bingchen Jin, Shusheng Ye, Lecheng Ruan, Caiming Sun, Ning Ding, Yili Fu, Jianwen Luo

Abstract: A hopping leg, whether in legged animals or humans, usually behaves like a spring during periodic hopping. Hopping like a spring is efficient and does not require complicated control algorithms. Position and force control are the two main methods to realize such spring-like behaviour. Position control usually consumes torque resources to ensure position accuracy and compensate for tracking errors. In comparison, the force control strategy is able to maintain a high elasticity. Currently, position and force control both reduce the motor saturation ratio as well as the bandwidth of the control system, and thus attenuate the performance of the actuator. To augment the performance, this letter proposes a motor saturation strategy based on force control to maximize the output torque of the actuator and realize continuous hopping motion with natural dynamics. The proposed strategy is able to maximize the saturation ratio of the motor and thus maximize the foot clearance of the single leg. The dynamics of the two-mass model are utilized to increase the force bandwidth and the performance of the actuator. A single leg with two degrees of freedom is designed as the experiment platform. The actuator consists of a powerful electric motor, a harmonic gear, and an encoder. The effectiveness of this method is verified through simulations and experiments using a robotic leg actuated by powerful high reduction ratio actuators.

Submitted 29 July, 2021; v1 submitted 28 July, 2021; originally announced July 2021.

arXiv:2107.12479 [pdf]

doi 10.3389/frobt.2021.724138

Terrain-perception-free Quadrupedal Spinning Locomotion on Versatile Terrains: Modeling, Analysis, and Experimental Validation

Authors: Hongwu Zhu, Dong Wang, Nathan Boyd, Ziyi Zhou, Lecheng Ruan, Aidong Zhang, Ning Ding, Ye Zhao, Jianwen Luo

Abstract: Dynamic quadrupedal locomotion over rough terrains has shown remarkable progress over the last few decades. Small-scale quadruped robots are adequately flexible and adaptable to traverse uneven terrains along the sagittal direction, such as slopes and stairs. To accomplish autonomous locomotion navigation in complex environments, spinning is a fundamental yet indispensable functionality for legged robots. However, spinning behaviors of quadruped robots on uneven terrain often exhibit position drifts. Motivated by this problem, this study presents an algorithmic method to enable accurate spinning motions over uneven terrain and constrain the spinning radius of the Center of Mass (CoM) to be bounded within a small range to minimize the drift risks. A modified spherical foot kinematics representation is proposed to improve the foot kinematic model and rolling dynamics of the quadruped during locomotion. A CoM planner is proposed to generate stable spinning motion based on projected stability margins. Accurate motion tracking is accomplished with a Linear Quadratic Regulator (LQR) to bound the position drift during the spinning movement. Experiments are conducted on a small-scale quadruped robot and the effectiveness of the proposed method is verified on versatile terrains including flat ground, stairs and slopes.

Submitted 26 October, 2021; v1 submitted 26 July, 2021; originally announced July 2021.

Journal ref: Frontiers in Robotics and AI. 2021

arXiv:2106.06138 [pdf, other]

Team RUC_AIM3 Technical Report at ActivityNet 2021: Entities Object Localization

Authors: Ludan Ruan, Jieting Chen, Yuqing Song, Shizhe Chen, Qin Jin

Abstract: Entities Object Localization (EOL) aims to evaluate how grounded or faithful a description is, which consists of caption generation and object grounding. Previous works tackle this problem by jointly training the two modules in a framework, which limits the complexity of each module. Therefore, in this work, we propose to divide these two modules into two stages and improve them separately to boost the whole system performance. For the caption generation, we propose a Unified Multi-modal Pre-training Model (UMPM) to generate event descriptions with rich objects for better localization. For the object grounding, we fine-tune the state-of-the-art detection model MDETR and design a post-processing method to make the grounding results more faithful. Our overall system achieves state-of-the-art performance on both sub-tasks in the Entities Object Localization challenge at ActivityNet 2021, with 72.57 localization accuracy on the testing set of sub-task I and 0.2477 F1_all_per_sent on the hidden testing set of sub-task II.

Submitted 10 June, 2021; originally announced June 2021.

Comments: 6 pages, 4 figures

arXiv:2105.08471 [pdf, other]

doi 10.1145/3450626.3459862

Solid-Fluid Interaction with Surface-Tension-Dominant Contact

Authors: Liangwang Ruan, Jinyuan Liu, Bo Zhu, Shinjiro Sueda, Bin Wang, Baoquan Chen

Abstract: We propose a novel three-way coupling method to model the contact interaction between solid and fluid driven by strong surface tension. At the heart of our physical model is a thin liquid membrane that simultaneously couples to both the liquid volume and the rigid objects, facilitating accurate momentum transfer, collision processing, and surface tension calculation. This model is implemented numerically under a hybrid Eulerian-Lagrangian framework where the membrane is modelled as a simplicial mesh and the liquid volume is simulated on a background Cartesian grid. We devise a monolithic solver to solve the interactions among the three systems of liquid, solid, and membrane. We demonstrate the efficacy of our method through an array of rigid-fluid contact simulations dominated by strong surface tension, which enables the faithful modeling of a host of new surface-tension-dominant phenomena, including objects with higher density than water staying afloat on top of it; the 'Cheerios effect', in which floating objects that do not normally float attract one another; and the surface tension weakening effect caused by surface-active constituents.

Submitted 18 May, 2021; originally announced May 2021.

Journal ref: ACM Trans. Graph., Vol. 40, No. 4, Article 120. Publication date: August 2021

arXiv:2104.00273 [pdf, other]

Perspective, Survey and Trends: Public Driving Datasets and Toolsets for Autonomous Driving Virtual Test

Authors: Pengliang Ji, Li Ruan, Yunzhi Xue, Limin Xiao, Qian Dong

Abstract: Owing to the merits of early safety and reliability guarantee, autonomous driving virtual testing has recently gained increasing attention compared with closed-loop testing in real scenarios. Although the availability and quality of autonomous driving datasets and toolsets are the premise to diagnose the autonomous driving system bottlenecks and improve the system performance, due to the diversity and privacy of the datasets and toolsets, collecting them and characterizing their perspective and quality becomes not only time-consuming but also increasingly challenging. This paper first proposes a Systematic Literature review approach for Autonomous driving tests (SLA), then presents an overview of existing publicly available datasets and toolsets from 2000 to 2020. Quantitative findings with the scenarios concerned, perspectives and trend inferences and suggestions with 35 automated driving test tool sets and 70 test data sets are also presented. To the best of our knowledge, we are the first to perform such a recent empirical survey on both the datasets and toolsets using an SLA-based survey approach. Our multifaceted analyses and new findings not only reveal insights that we believe are useful for system designers, practitioners and users, but also can promote more research on systematic survey analysis of autonomous driving datasets and toolsets.

Submitted 30 June, 2021; v1 submitted 1 April, 2021; originally announced April 2021.

Comments: 6 pages, 4 figures. Accepted to 24th IEEE International Conference on Intelligent Transportation - ITSC2021

arXiv:2006.00812 [pdf]

Fog Computing for Smart Grids: Challenges and Solutions

Authors: Linna Ruan, Shaoyong Guo, Xuesong Qiu, Rajkumar Buyya

Abstract: Smart grids (SGs) enable integration of diverse power sources including renewable energy resources. They can contribute to the reduction of harmful gas emissions, and support two-way information flow to enhance energy efficiency, along with small-scale Microgrids, acting as a promising solution to cope with environmental problems. However, with the emergence of mission-critical and delay-sensitive applications, the traditional Cloud-based data processing mode becomes less satisfactory. The use of Fog computing to empower the edge-side processing capability of Smart grid systems is considered a potential solution to address the problem. In this chapter, we aim to offer a comprehensive analysis of the application of Fog computing in Smart grids. We begin with an overview of Smart grids and Fog computing. Then, by surveying the existing research, we summarize the main Fog computing enabled Smart grid applications, key problems and the possible methods. We conclude the chapter by discussing the research challenges and future directions.

Submitted 31 August, 2020; v1 submitted 1 June, 2020; originally announced June 2020.

Comments: 36 pages, 3 figures

arXiv:2004.05573 [pdf, other]

YouMakeup VQA Challenge: Towards Fine-grained Action Understanding in Domain-Specific Videos

Authors: Shizhe Chen, Weiying Wang, Ludan Ruan, Linli Yao, Qin Jin

Abstract: The goal of the YouMakeup VQA Challenge 2020 is to provide a common benchmark for fine-grained action understanding in domain-specific videos, e.g., makeup instructional videos. We propose two novel question-answering tasks to evaluate models' fine-grained action understanding abilities. The first task is Facial Image Ordering, which aims to understand visual effects of different actions expressed in natural language to the facial object. The second task is Step Ordering, which aims to measure cross-modal semantic alignments between untrimmed videos and multi-sentence texts. In this paper, we present the challenge guidelines, the dataset used, and performances of baseline models on the two proposed tasks. The baseline code and models are released at https://github.com/AIM3-RUC/YouMakeup_Baseline.

Submitted 12 April, 2020; originally announced April 2020.

Comments: CVPR LVVU Workshop 2020 YouMakeup VQA Challenge

arXiv:2002.09334 [pdf]

doi 10.1016/j.eng.2020.04.010

Deep Learning System to Screen Coronavirus Disease 2019 Pneumonia

Authors: Xiaowei Xu, Xiangao Jiang, Chunlian Ma, Peng Du, Xukun Li, Shuangzhi Lv, Liang Yu, Yanfei Chen, Junwei Su, Guanjing Lang, Yongtao Li, Hong Zhao, Kaijin Xu, Lingxiang Ruan, Wei Wu

Abstract: We found that the real-time reverse transcription-polymerase chain reaction (RT-PCR) detection of viral RNA from sputum or nasopharyngeal swab has a relatively low positive rate in the early stage to determine COVID-19 (named by the World Health Organization). The manifestations of computed tomography (CT) imaging of COVID-19 had their own characteristics, which are different from other types of viral pneumonia, such as Influenza-A viral pneumonia. Therefore, clinical doctors call for another early diagnostic criterion for this new type of pneumonia as soon as possible. This study aimed to establish an early screening model to distinguish COVID-19 pneumonia from Influenza-A viral pneumonia and healthy cases with pulmonary CT images using deep learning techniques. The candidate infection regions were first segmented out using a 3-dimensional deep learning model from the pulmonary CT image set. These separated images were then categorized into COVID-19, Influenza-A viral pneumonia and irrelevant-to-infection groups, together with the corresponding confidence scores, using a location-attention classification model. Finally, the infection type and total confidence score of this CT case were calculated with a Noisy-or Bayesian function. The experimental results on the benchmark dataset showed that the overall accuracy was 86.7% from the perspective of CT cases as a whole. The deep learning models established in this study were effective for the early screening of COVID-19 patients and proved to be a promising supplementary diagnostic method for frontline clinical doctors.

Submitted 21 February, 2020; originally announced February 2020.

Journal ref: Engineering, Volume 6, Issue 10, October 2020, Pages 1122-1129

arXiv:1812.00896 [pdf, other]

A Coalition-Based Communication Framework for Task-Driven Flying Ad-Hoc Networks

Authors: Dianxiong Liu, Jin Chen, Hong Li, Yang Yang, Lang Ruan, Yuli Zhang, Yuhua Xu

Abstract: In this paper, we develop a task-driven networking framework for Flying Ad-hoc Networks (FANETs), where a coalition-based model is outlined. Firstly, we present a brief survey to show the state-of-the-art studies on the intra-communication of unmanned aerial vehicle (UAV) swarms. The features and deficiencies of existing models are analyzed. To capture the task-driven requirement of the flying multi-agent system, a coalition-based framework is proposed. We discuss the composition, networking mode and the classification of data transmission. After that, the application scenario of UAV coalitions is given, where large-scale, distributed and highly dynamic characteristics greatly increase the difficulty of resource optimization for UAVs. To tackle the problem, we design an intelligence-based optimization architecture, which mainly includes the game model, machine learning and real-time decision. Under the guidance of game theory and machine learning, UAVs can make comprehensive decisions by combining the previous training results with their sensing, information interaction, and game strategies. Finally, a preliminary case and promising open issues of UAV coalitions are studied.

Submitted 28 June, 2020; v1 submitted 3 December, 2018; originally announced December 2018.

Comments: 7 pages, 5 figures

arXiv:1806.02326 [pdf, other]

Conditional Linear Regression

Authors: Diego Calderon, Brendan Juba, Sirui Li, Zongyi Li, Lisa Ruan

Abstract: Work in machine learning and statistics commonly focuses on building models that capture the vast majority of data, possibly ignoring a segment of the population as outliers. However, a good model often does not exist for the whole dataset, so we seek to find a small subset where there exists a useful model. We are interested in finding a linear rule capable of achieving more accurate predictions for just a segment of the population. We give an efficient algorithm with theoretical analysis for the conditional linear regression task, which is the joint task of identifying a significant segment of the population, described by a k-DNF, along with its linear regression fit.

Submitted 9 July, 2019; v1 submitted 6 June, 2018; originally announced June 2018.

ACM Class: I.2.6; F.2.1

arXiv:1803.00663 [pdf]

doi 10.1016/j.compmedimag.2018.09.004

SD-CNN: a Shallow-Deep CNN for Improved Breast Cancer Diagnosis

Authors: Fei Gao, Teresa Wu, Jing Li, Bin Zheng, Lingxiang Ruan, Desheng Shang, Bhavika Patel

Abstract: Breast cancer is the second leading cause of cancer death among women worldwide. Nevertheless, it is also one of the most treatable malignancies if detected early. Screening for breast cancer with digital mammography (DM) has been widely used. However, it demonstrates limited sensitivity for women with dense breasts. An emerging technology in the field is contrast-enhanced digital mammography (CEDM), which includes a low energy (LE) image similar to DM, and a recombined image leveraging tumor neoangiogenesis similar to breast magnetic resonance imaging (MRI). CEDM has shown better diagnostic accuracy than DM. While promising, CEDM is not yet widely available across medical centers. In this research, we propose a Shallow-Deep Convolutional Neural Network (SD-CNN) where a shallow CNN is developed to derive "virtual" recombined images from LE images, and a deep CNN is employed to extract novel features from LE, recombined or "virtual" recombined images for ensemble models to classify the cases as benign vs. cancer. To evaluate the validity of our approach, we first develop a deep-CNN using 49 CEDM cases collected from Mayo Clinic to prove the contributions from recombined images for improved breast cancer diagnosis (0.86 in accuracy using LE imaging vs. 0.90 in accuracy using both LE and recombined imaging). We then develop a shallow-CNN using the same 49 CEDM cases to learn the nonlinear mapping from LE to recombined images. Next, we use 69 DM cases collected from the hospital located at Zhejiang University, China to generate "virtual" recombined images. Using DM alone provides 0.91 in accuracy, whereas SD-CNN improves the diagnostic accuracy to 0.95.

Submitted 26 October, 2018; v1 submitted 1 March, 2018; originally announced March 2018.

Journal ref: Computerized Medical Imaging and Graphics (2018) 70 53-62

arXiv:1503.06361 [pdf, other]

doi 10.1109/TSP.2015.2474295

Generalized Interference Alignment --- Part II: Application to Wireless Secrecy

Authors: Liangzhong Ruan, Vincent K. N. Lau, Moe Z. Win

Abstract: In contrast to its wired counterpart, wireless communication is highly susceptible to eavesdropping due to the broadcast nature of the wireless propagation medium. Recent works have proposed the use of interference to reduce eavesdropping capabilities in wireless wiretap networks. However, the concurrent effect of interference on both eavesdropping receivers (ERs) and legitimate receivers (LRs) has not been thoroughly investigated, and carefully engineering the network interference is required to harness the full potential of interference for wireless secrecy. This two-part paper addresses this issue by proposing a generalized interference alignment (GIA) technique, which jointly designs the transceivers at the legitimate partners to impede the ERs without interfering with LRs. In Part I, we have established a theoretical framework for the GIA technique. In Part II, we will first propose an efficient GIA algorithm that is applicable to large-scale networks and then evaluate the performance of this algorithm in stochastic wireless wiretap networks via both analysis and simulation. These results reveal insights into when and how GIA contributes to wireless secrecy.

Submitted 21 March, 2015; originally announced March 2015.

Comments: minor revision at IEEE Transactions on Signal Processing

arXiv:1503.06358 [pdf, other]

doi 10.1109/TSP.2015.2474301

Generalized Interference Alignment --- Part I: Theoretical Framework

Authors: Liangzhong Ruan, Vincent K. N. Lau, Moe Z. Win

Abstract: Interference alignment (IA) has attracted enormous research interest as it achieves optimal capacity scaling with respect to signal to noise ratio on interference networks. IA has also recently emerged as an effective tool in engineering interference for secrecy protection on wireless wiretap networks. However, despite the numerous works dedicated to IA, two of its fundamental issues, i.e., feasibility conditions and transceiver design, are not completely addressed in the literature. In this two-part paper, a generalized interference alignment (GIA) technique is proposed to enhance IA's capability in secrecy protection. A theoretical framework is established to analyze the two fundamental issues of GIA in Part I and then the performance of GIA in large-scale stochastic networks is characterized to illustrate how GIA benefits secrecy protection in Part II. The theoretical framework for GIA adopts methodologies from algebraic geometry, determines the necessary and sufficient feasibility conditions of GIA, and generates a set of algorithms that can solve the GIA problem. This framework sets up a foundation for the development and implementation of GIA.

Submitted 21 March, 2015; originally announced March 2015.

Comments: Minor Revision at IEEE Transactions on Signal Processing

arXiv:1306.2015 [pdf, other]

doi 10.1109/TSP.2013.2269902

CSI Feedback Reduction for MIMO Interference Alignment

Authors: Xiongbin Rao, Liangzhong Ruan, Vincent K. N. Lau

Abstract: Interference alignment (IA) is a linear precoding strategy that can achieve optimal capacity scaling at high SNR in interference networks. Most of the existing IA designs require full channel state information (CSI) at the transmitters, which induces a huge CSI signaling cost. Hence it is desirable to improve the feedback efficiency for IA, and in this paper, we propose a novel IA scheme with a significantly reduced CSI feedback. To quantify the CSI feedback cost, we introduce a novel metric, namely the feedback dimension. This metric serves as a first-order measurement of CSI feedback overhead. Due to the partial CSI feedback constraint, conventional IA schemes cannot be applied and hence, we develop a novel IA precoder / decorrelator design and establish new IA feasibility conditions. Via dynamic feedback profile design, the proposed IA scheme can also achieve a flexible tradeoff between the degrees of freedom (DoF) requirements for data streams, the antenna resources and the CSI feedback cost. We show by analysis and simulations that the proposed scheme achieves substantial reductions of CSI feedback overhead under the same DoF requirement in MIMO interference networks.

Submitted 9 June, 2013; originally announced June 2013.

Comments: 30 pages, 7 figures, accepted for publication by IEEE transactions on signal processing in June, 2013

arXiv:1305.5884

doi 10.1109/TSP.2014.2302748

Hierarchical Radio Resource Optimization for Heterogeneous Networks with Enhanced Inter-cell Interference Coordination (eICIC)

Authors: An Liu, Vincent K. N. Lau, Liangzhong Ruan, Junting Chen, Dengkun Xiao

Abstract: Interference is a major performance bottleneck in Heterogeneous Network (HetNet) due to its multi-tier topological structure. We propose almost blank resource block (ABRB) for interference control in HetNet. When an ABRB is scheduled in a macro BS, a resource block (RB) with blank payload is transmitted and this eliminates the interference from this macro BS to the pico BSs. We study a two timescale hierarchical radio resource management (RRM) scheme for HetNet with dynamic ABRB control. The long term controls, such as dynamic ABRB, are adaptive to the large scale fading at a RRM server for co-Tier and cross-Tier interference control. The short term control (user scheduling) is adaptive to the local channel state information within each BS to exploit the multi-user diversity. The two timescale optimization problem is challenging due to the exponentially large solution space. We exploit the sparsity in the interference graph of the HetNet topology and derive structural properties for the optimal ABRB control. Based on that, we propose a two timescale alternative optimization solution for the user scheduling and ABRB control. The solution has low complexity and is asymptotically optimal at high SNR. Simulations show that the proposed solution has significant gain over various baselines.

Submitted 25 May, 2013; originally announced May 2013.

Comments: 14 pages, 8 figures

arXiv:1302.5978

doi 10.1109/TSP.2013.2252168

Limited Feedback Design for Interference Alignment on MIMO Interference Networks with Heterogeneous Path Loss and Spatial Correlations

Authors: Xiongbin Rao, Liangzhong Ruan, Vincent K. N. Lau

Abstract: Interference alignment is degree of freedom optimal in K-user MIMO interference channels and many previous works have studied the transceiver designs. However, these works predominantly focus on networks with perfect channel state information at the transmitters and symmetrical interference topology. In this paper, we consider a limited feedback system with heterogeneous path loss and spatial correlations, and investigate how the dynamics of the interference topology can be exploited to improve the feedback efficiency. We propose a novel spatial codebook design, and perform dynamic quantization via bit allocations to adapt to the asymmetry of the interference topology. We bound the system throughput under the proposed dynamic scheme in terms of the transmit SNR, feedback bits and the interference topology parameters. It is shown that when the number of feedback bits scales with SNR as $C_{s}\cdot\log\textrm{SNR}$, the sum degrees of freedom of the network are preserved. Moreover, the value of scaling coefficient $C_{s}$ can be significantly reduced in networks with asymmetric interference topology.

Submitted 25 February, 2013; v1 submitted 24 February, 2013; originally announced February 2013.

Comments: 30 pages, 6 figures, accepted by IEEE transactions on signal processing in Feb. 2013

arXiv:1211.3484

doi 10.1109/TSP.2013.2241056

The Feasibility Conditions for Interference Alignment in MIMO Networks

Authors: Liangzhong Ruan, Vincent K. N. Lau, Moe Z. Win

Abstract: Interference alignment (IA) has attracted great attention in the last few years for its breakthrough performance in interference networks. However, despite the numerous works dedicated to IA, the feasibility conditions of IA remains unclear for most network topologies. The IA feasibility analysis is challenging as the IA constraints are sets of high-degree polynomials, for which no systematic tool to analyze the solvability conditions exists. In this work, by developing a new mathematical framework that maps the solvability of sets of polynomial equations to the linear independence of their first-order terms, we propose a sufficient condition that applies to MIMO interference networks with general configurations. We have further proved that this sufficient condition matches with the necessary conditions under a wide range of configurations. These results further consolidate the theoretical basis of IA.

Submitted 14 November, 2012; originally announced November 2012.

Comments: accepted by IEEE Trans. Signal Process

arXiv:1202.6091

doi 10.1109/TSP.2012.2192432

Interference Alignment for Partially Connected MIMO Cellular Networks

Authors: Liangzhong Ruan, Vincent K. N. Lau

Abstract: In this paper, we propose an iterative interference alignment (IA) algorithm for MIMO cellular networks with partial connectivity, which is induced by heterogeneous path losses and spatial correlation. Such systems impose several key technical challenges in the IA algorithm design, namely the overlapping between the direct and interfering links due to the MIMO cellular topology as well as how to exploit the partial connectivity. We shall address these challenges and propose a three stage IA algorithm. As illustration, we analyze the achievable degree of freedom (DoF) of the proposed algorithm for a symmetric partially connected MIMO cellular network. We show that there is significant DoF gain compared with conventional IA algorithms due to partial connectivity. The derived DoF bound is also backward compatible with that achieved on fully connected K-pair MIMO interference channels.

Submitted 21 March, 2012; v1 submitted 27 February, 2012; originally announced February 2012.

Comments: Submitted to IEEE Transactions on Signal Processing, accepted

arXiv:1105.0286

doi 10.1109/TSP.2011.2153198

Dynamic Interference Mitigation for Generalized Partially Connected Quasi-static MIMO Interference Channel

Authors: Liangzhong Ruan, Vincent K. N. Lau

Abstract: Recent works on MIMO interference channels have shown that interference alignment can significantly increase the achievable degrees of freedom (DoF) of the network. However, most of these works have assumed a fully connected interference graph. In this paper, we investigate how the partial connectivity can be exploited to enhance system performance in MIMO interference networks. We propose a novel interference mitigation scheme which introduces constraints for the signal subspaces of the precoders and decorrelators to mitigate "many" interference nulling constraints at a cost of "little" freedoms in precoder and decorrelator design so as to extend the feasibility region of the interference alignment scheme. Our analysis shows that the proposed algorithm can significantly increase system DoF in symmetric partially connected MIMO interference networks. We also compare the performance of the proposed scheme with various baselines and show via simulations that the proposed algorithms could achieve significant gain in the system performance of randomly connected interference networks.

Submitted 2 May, 2011; originally announced May 2011.

Comments: 30 pages, 10 figures, accepted by IEEE Transaction on Signal Processing

arXiv:1007.4955

doi 10.1109/TWC.2010.081610.081319

Decentralized Dynamic Hop Selection and Power Control in Cognitive Multi-hop Relay Systems

Authors: Liangzhong Ruan, Vincent K. N. Lau

Abstract: In this paper, we consider a cognitive multi-hop relay secondary user (SU) system sharing the spectrum with some primary users (PU). The transmit power as well as the hop selection of the cognitive relays can be dynamically adapted according to the local (and causal) knowledge of the instantaneous channel state information (CSI) in the multi-hop SU system. We shall determine a low complexity, decentralized algorithm to maximize the average end-to-end throughput of the SU system with dynamic spatial reuse. The problem is challenging due to the decentralized requirement as well as the causality constraint on the knowledge of CSI. Furthermore, the problem belongs to the class of stochastic Network Utility Maximization (NUM) problems which is quite challenging. We exploit the time-scale difference between the PU activity and the CSI fluctuations and decompose the problem into a master problem and subproblems. We derive an asymptotically optimal low complexity solution using divide-and-conquer and illustrate that significant performance gain can be obtained through dynamic hop selection and power control. The worst case complexity and memory requirement of the proposed algorithm is $O(M^2)$ and $O(M^3)$ respectively, where $M$ is the number of SUs.

Submitted 28 July, 2010; originally announced July 2010.

Showing 1–39 of 39 results for author: Ruan, L