Search | arXiv e-print repository

NuSegDG: Integration of Heterogeneous Space and Gaussian Kernel for Domain-Generalized Nuclei Segmentation

Authors: Zhenye Lou, Qing Xu, Zekun Jiang, Xiangjian He, Zhen Chen, Yi Wang, Chenxin Li, Maggie M. He, Wenting Duan

Abstract: Domain-generalized nuclei segmentation refers to the generalizability of models to unseen domains based on knowledge learned from source domains and is challenged by various image conditions, cell types, and stain strategies. Recently, the Segment Anything Model (SAM) has made great success in universal image segmentation by interactive prompt modes (e.g., point and box). Despite its strengths, th… ▽ More Domain-generalized nuclei segmentation refers to the generalizability of models to unseen domains based on knowledge learned from source domains and is challenged by various image conditions, cell types, and stain strategies. Recently, the Segment Anything Model (SAM) has made great success in universal image segmentation by interactive prompt modes (e.g., point and box). Despite its strengths, the original SAM presents limited adaptation to medical images. Moreover, SAM requires providing manual bounding box prompts for each object to produce satisfactory segmentation masks, so it is laborious in nuclei segmentation scenarios. To address these limitations, we propose a domain-generalizable framework for nuclei image segmentation, abbreviated to NuSegDG. Specifically, we first devise a Heterogeneous Space Adapter (HS-Adapter) to learn multi-dimensional feature representations of different nuclei domains by injecting a small number of trainable parameters into the image encoder of SAM. To alleviate the labor-intensive requirement of manual prompts, we introduce a Gaussian-Kernel Prompt Encoder (GKP-Encoder) to generate density maps driven by a single point, which guides segmentation predictions by mixing position prompts and semantic prompts. Furthermore, we present a Two-Stage Mask Decoder (TSM-Decoder) to effectively convert semantic masks to instance maps without the manual demand for morphological shape refinement. Based on our experimental evaluations, the proposed NuSegDG demonstrates state-of-the-art performance in nuclei instance segmentation, exhibiting superior domain generalization capabilities. The source code is available at https://github.com/xq141839/NuSegDG. △ Less

Submitted 24 August, 2024; v1 submitted 21 August, 2024; originally announced August 2024.

Comments: Under Reivew

arXiv:2408.02929 [pdf, other]

Segmenting Small Stroke Lesions with Novel Labeling Strategies

Authors: Liang Shang, Zhengyang Lou, Andrew L. Alexander, Vivek Prabhakaran, William A. Sethares, Veena A. Nair, Nagesh Adluru

Abstract: Deep neural networks have demonstrated exceptional efficacy in stroke lesion segmentation. However, the delineation of small lesions, critical for stroke diagnosis, remains a challenge. In this study, we propose two straightforward yet powerful approaches that can be seamlessly integrated into a variety of networks: Multi-Size Labeling (MSL) and Distance-Based Labeling (DBL), with the aim of enhan… ▽ More Deep neural networks have demonstrated exceptional efficacy in stroke lesion segmentation. However, the delineation of small lesions, critical for stroke diagnosis, remains a challenge. In this study, we propose two straightforward yet powerful approaches that can be seamlessly integrated into a variety of networks: Multi-Size Labeling (MSL) and Distance-Based Labeling (DBL), with the aim of enhancing the segmentation accuracy of small lesions. MSL divides lesion masks into various categories based on lesion volume while DBL emphasizes the lesion boundaries. Experimental evaluations on the Anatomical Tracings of Lesions After Stroke (ATLAS) v2.0 dataset showcase that an ensemble of MSL and DBL achieves consistently better or equal performance on recall (3.6% and 3.7%), F1 (2.4% and 1.5%), and Dice scores (1.3% and 0.0%) compared to the top-1 winner of the 2022 MICCAI ATLAS Challenge on both the subset only containing small lesions and the entire dataset, respectively. Notably, on the mini-lesion subset, a single MSL model surpasses the previous best ensemble strategy, with enhancements of 1.0% and 0.3% on F1 and Dice scores, respectively. Our code is available at: https://github.com/nadluru/StrokeLesSeg. △ Less

Submitted 5 August, 2024; originally announced August 2024.

arXiv:2406.17628 [pdf, other]

Video Inpainting Localization with Contrastive Learning

Authors: Zijie Lou, Gang Cao, Man Lin

Abstract: Deep video inpainting is typically used as malicious manipulation to remove important objects for creating fake videos. It is significant to identify the inpainted regions blindly. This letter proposes a simple yet effective forensic scheme for Video Inpainting LOcalization with ContrAstive Learning (ViLocal). Specifically, a 3D Uniformer encoder is applied to the video noise residual for learning… ▽ More Deep video inpainting is typically used as malicious manipulation to remove important objects for creating fake videos. It is significant to identify the inpainted regions blindly. This letter proposes a simple yet effective forensic scheme for Video Inpainting LOcalization with ContrAstive Learning (ViLocal). Specifically, a 3D Uniformer encoder is applied to the video noise residual for learning effective spatiotemporal forensic features. To enhance the discriminative power, supervised contrastive learning is adopted to capture the local inconsistency of inpainted videos through attracting/repelling the positive/negative pristine and forged pixel pairs. A pixel-wise inpainting localization map is yielded by a lightweight convolution decoder with a specialized two-stage training strategy. To prepare enough training samples, we build a video object segmentation dataset of 2500 videos with pixel-level annotations per frame. Extensive experimental results validate the superiority of ViLocal over state-of-the-arts. Code and dataset will be available at https://github.com/multimediaFor/ViLocal. △ Less

Submitted 25 June, 2024; originally announced June 2024.

Comments: arXiv admin note: substantial text overlap with arXiv:2406.13576

arXiv:2406.13576 [pdf, other]

Trusted Video Inpainting Localization via Deep Attentive Noise Learning

Authors: Zijie Lou, Gang Cao, Man Lin

Abstract: Digital video inpainting techniques have been substantially improved with deep learning in recent years. Although inpainting is originally designed to repair damaged areas, it can also be used as malicious manipulation to remove important objects for creating false scenes and facts. As such it is significant to identify inpainted regions blindly. In this paper, we present a Trusted Video Inpaintin… ▽ More Digital video inpainting techniques have been substantially improved with deep learning in recent years. Although inpainting is originally designed to repair damaged areas, it can also be used as malicious manipulation to remove important objects for creating false scenes and facts. As such it is significant to identify inpainted regions blindly. In this paper, we present a Trusted Video Inpainting Localization network (TruVIL) with excellent robustness and generalization ability. Observing that high-frequency noise can effectively unveil the inpainted regions, we design deep attentive noise learning in multiple stages to capture the inpainting traces. Firstly, a multi-scale noise extraction module based on 3D High Pass (HP3D) layers is used to create the noise modality from input RGB frames. Then the correlation between such two complementary modalities are explored by a cross-modality attentive fusion module to facilitate mutual feature learning. Lastly, spatial details are selectively enhanced by an attentive noise decoding module to boost the localization performance of the network. To prepare enough training samples, we also build a frame-level video object segmentation dataset of 2500 videos with pixel-level annotation for all frames. Extensive experimental results validate the superiority of TruVIL compared with the state-of-the-arts. In particular, both quantitative and qualitative evaluations on various inpainted videos verify the remarkable robustness and generalization ability of our proposed TruVIL. Code and dataset will be available at https://github.com/multimediaFor/TruVIL. △ Less

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2406.13565 [pdf, other]

Exploring Multi-view Pixel Contrast for General and Robust Image Forgery Localization

Authors: Zijie Lou, Gang Cao, Kun Guo, Haochen Zhu, Lifang Yu

Abstract: Image forgery localization, which aims to segment tampered regions in an image, is a fundamental yet challenging digital forensic task. While some deep learning-based forensic methods have achieved impressive results, they directly learn pixel-to-label mappings without fully exploiting the relationship between pixels in the feature space. To address such deficiency, we propose a Multi-view Pixel-w… ▽ More Image forgery localization, which aims to segment tampered regions in an image, is a fundamental yet challenging digital forensic task. While some deep learning-based forensic methods have achieved impressive results, they directly learn pixel-to-label mappings without fully exploiting the relationship between pixels in the feature space. To address such deficiency, we propose a Multi-view Pixel-wise Contrastive algorithm (MPC) for image forgery localization. Specifically, we first pre-train the backbone network with the supervised contrastive loss to model pixel relationships from the perspectives of within-image, cross-scale and cross-modality. That is aimed at increasing intra-class compactness and inter-class separability. Then the localization head is fine-tuned using the cross-entropy loss, resulting in a better pixel localizer. The MPC is trained on three different scale training datasets to make a comprehensive and fair comparison with existing image forgery localization algorithms. Extensive experiments on the small, medium and large scale training datasets show that the proposed MPC achieves higher generalization performance and robustness against post-processing than the state-of-the-arts. Code will be available at https://github.com/multimediaFor/MPC. △ Less

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2405.02911 [pdf, other]

Multimodal Sense-Informed Prediction of 3D Human Motions

Authors: Zhenyu Lou, Qiongjie Cui, Haofan Wang, Xu Tang, Hong Zhou

Abstract: Predicting future human pose is a fundamental application for machine intelligence, which drives robots to plan their behavior and paths ahead of time to seamlessly accomplish human-robot collaboration in real-world 3D scenarios. Despite encouraging results, existing approaches rarely consider the effects of the external scene on the motion sequence, leading to pronounced artifacts and physical im… ▽ More Predicting future human pose is a fundamental application for machine intelligence, which drives robots to plan their behavior and paths ahead of time to seamlessly accomplish human-robot collaboration in real-world 3D scenarios. Despite encouraging results, existing approaches rarely consider the effects of the external scene on the motion sequence, leading to pronounced artifacts and physical implausibilities in the predictions. To address this limitation, this work introduces a novel multi-modal sense-informed motion prediction approach, which conditions high-fidelity generation on two modal information: external 3D scene, and internal human gaze, and is able to recognize their salience for future human activity. Furthermore, the gaze information is regarded as the human intention, and combined with both motion and scene features, we construct a ternary intention-aware attention to supervise the generation to match where the human wants to reach. Meanwhile, we introduce semantic coherence-aware attention to explicitly distinguish the salient point clouds and the underlying ones, to ensure a reasonable interaction of the generated sequence with the 3D scene. On two real-world benchmarks, the proposed method achieves state-of-the-art performance both in 3D human pose and trajectory prediction. △ Less

Submitted 5 May, 2024; originally announced May 2024.

arXiv:2404.04505 [pdf, other]

Exploring UAV Networking from the Terrain Information Completeness Perspective: A Tutorial

Authors: Zhengying Lou, Ruibo Wang, Baha Eddine Youcef Belmekki, Mustafa A. Kishk, Mohamed-Slim Alouini

Abstract: Terrain information is a crucial factor affecting the performance of unmanned aerial vehicle (UAV) networks. As a tutorial, this article provides a unique perspective on the completeness of terrain information, summarizing and enhancing the research on terrain-based UAV deployment. In the presence of complete terrain information, two highly discussed topics are UAV-aided map construction and dynam… ▽ More Terrain information is a crucial factor affecting the performance of unmanned aerial vehicle (UAV) networks. As a tutorial, this article provides a unique perspective on the completeness of terrain information, summarizing and enhancing the research on terrain-based UAV deployment. In the presence of complete terrain information, two highly discussed topics are UAV-aided map construction and dynamic trajectory design based on maps. We propose a case study illustrating the mutually reinforcing relationship between them. When terrain information is incomplete, and only terrain-related feature parameters are available, we discuss how existing models map terrain features to blockage probabilities. By introducing the application of this model with stochastic geometry, a case study is proposed to analyze the accuracy of the model. When no terrain information is available, UAVs gather terrain information during the real-time networking process and determine the next position by collected information. This real-time search method is currently limited to relay communication. In the case study, we extend it to a multi-user scenario and summarize three trade-offs of the method. Finally, we conduct a qualitative analysis to assess the impact of three factors that have been overlooked in terrain-based UAV deployment. △ Less

Submitted 9 April, 2024; v1 submitted 6 April, 2024; originally announced April 2024.

arXiv:2402.16121 [pdf, other]

Towards Accurate Post-training Quantization for Reparameterized Models

Authors: Luoming Zhang, Yefei He, Wen Fei, Zhenyu Lou, Weijia Wu, YangWei Ying, Hong Zhou

Abstract: Model reparameterization is a widely accepted technique for improving inference speed without compromising performance. However, current Post-training Quantization (PTQ) methods often lead to significant accuracy degradation when applied to reparameterized models. This is primarily caused by channel-specific and sample-specific outliers, which appear only at specific samples and channels and impac… ▽ More Model reparameterization is a widely accepted technique for improving inference speed without compromising performance. However, current Post-training Quantization (PTQ) methods often lead to significant accuracy degradation when applied to reparameterized models. This is primarily caused by channel-specific and sample-specific outliers, which appear only at specific samples and channels and impact on the selection of quantization parameters. To address this issue, we propose RepAPQ, a novel framework that preserves the accuracy of quantized reparameterization models. Different from previous frameworks using Mean Squared Error (MSE) as a measurement, we utilize Mean Absolute Error (MAE) to mitigate the influence of outliers on quantization parameters. Our framework comprises two main components: Quantization Protecting Reparameterization and Across-block Calibration. For effective calibration, Quantization Protecting Reparameterization combines multiple branches into a single convolution with an affine layer. During training, the affine layer accelerates convergence and amplifies the output of the convolution to better accommodate samples with outliers. Additionally, Across-block Calibration leverages the measurement of stage output as supervision to address the gradient problem introduced by MAE and enhance the interlayer correlation with quantization parameters. Comprehensive experiments demonstrate the effectiveness of RepAPQ across various models and tasks. Our framework outperforms previous methods by approximately 1\% for 8-bit PTQ and 2\% for 6-bit PTQ, showcasing its superior performance. The code is available at \url{https://github.com/ilur98/DLMC-QUANT}. △ Less

Submitted 25 February, 2024; originally announced February 2024.

arXiv:2402.10186 [pdf, other]

Self-consistent Validation for Machine Learning Electronic Structure

Authors: Gengyuan Hu, Gengchen Wei, Zekun Lou, Philip H. S. Torr, Wanli Ouyang, Han-sen Zhong, Chen Lin

Abstract: Machine learning has emerged as a significant approach to efficiently tackle electronic structure problems. Despite its potential, there is less guarantee for the model to generalize to unseen data that hinders its application in real-world scenarios. To address this issue, a technique has been proposed to estimate the accuracy of the predictions. This method integrates machine learning with self-… ▽ More Machine learning has emerged as a significant approach to efficiently tackle electronic structure problems. Despite its potential, there is less guarantee for the model to generalize to unseen data that hinders its application in real-world scenarios. To address this issue, a technique has been proposed to estimate the accuracy of the predictions. This method integrates machine learning with self-consistent field methods to achieve both low validation cost and interpret-ability. This, in turn, enables exploration of the model's ability with active learning and instills confidence in its integration into real-world studies. △ Less

Submitted 15 February, 2024; originally announced February 2024.

Comments: 6 pages, 4 figures

arXiv:2401.09346 [pdf, other]

High Confidence Level Inference is Almost Free using Parallel Stochastic Optimization

Authors: Wanrong Zhu, Zhipeng Lou, Ziyang Wei, Wei Biao Wu

Abstract: Uncertainty quantification for estimation through stochastic optimization solutions in an online setting has gained popularity recently. This paper introduces a novel inference method focused on constructing confidence intervals with efficient computation and fast convergence to the nominal level. Specifically, we propose to use a small number of independent multi-runs to acquire distribution info… ▽ More Uncertainty quantification for estimation through stochastic optimization solutions in an online setting has gained popularity recently. This paper introduces a novel inference method focused on constructing confidence intervals with efficient computation and fast convergence to the nominal level. Specifically, we propose to use a small number of independent multi-runs to acquire distribution information and construct a t-based confidence interval. Our method requires minimal additional computation and memory beyond the standard updating of estimates, making the inference process almost cost-free. We provide a rigorous theoretical guarantee for the confidence interval, demonstrating that the coverage is approximately exact with an explicit convergence rate and allowing for high confidence level inference. In particular, a new Gaussian approximation result is developed for the online estimators to characterize the coverage properties of our confidence intervals in terms of relative errors. Additionally, our method also allows for leveraging parallel computing to further accelerate calculations using multiple cores. It is easy to implement and can be integrated with existing stochastic algorithms without the need for complicated modifications. △ Less

Submitted 17 January, 2024; originally announced January 2024.

arXiv:2312.02468 [pdf, other]

Terrain-Based UAV Deployment: Providing Coverage for Outdoor Users

Authors: Zhengying Lou, Ruibo Wang, Baha Eddine Youcef Belmekki, Mustafa A. Kishk, Mohamed-Slim Alouini

Abstract: Deploying unmanned aerial vehicle (UAV) networks to provide coverage for outdoor users has attracted great attention during the last decade. However, outdoor coverage is challenging due to the high mobility of crowds and the diverse terrain configurations causing building blockage. Most studies use stochastic channel models to characterize the impact of building blockage on user performance and do… ▽ More Deploying unmanned aerial vehicle (UAV) networks to provide coverage for outdoor users has attracted great attention during the last decade. However, outdoor coverage is challenging due to the high mobility of crowds and the diverse terrain configurations causing building blockage. Most studies use stochastic channel models to characterize the impact of building blockage on user performance and do not take into account terrain information. On the other hand, real-time search methods use terrain information, but they are only practical when a single UAV serves a single user.In this paper, we put forward two methods to avoid building blockage in a multi-user system by collecting prior terrain information and using real-time search.We proposed four algorithms related to the combinations of the above methods and their performances are evaluated and compared in different scenarios.By adjusting the height of the UAV based on terrain information collected before networking, the performance is significantly enhanced compared to the one when no terrain information is available.The algorithm based on real-time search further improves the coverage performance by avoiding the shadow of buildings. During the execution of the real-time search algorithm, the search distance is reduced using the collected terrain information. △ Less

Submitted 6 December, 2023; v1 submitted 4 December, 2023; originally announced December 2023.

arXiv:2312.01938 [pdf, other]

DSText V2: A Comprehensive Video Text Spotting Dataset for Dense and Small Text

Authors: Weijia Wu, Yiming Zhang, Yefei He, Luoming Zhang, Zhenyu Lou, Hong Zhou, Xiang Bai

Abstract: Recently, video text detection, tracking, and recognition in natural scenes are becoming very popular in the computer vision community. However, most existing algorithms and benchmarks focus on common text cases (e.g., normal size, density) and single scenario, while ignoring extreme video text challenges, i.e., dense and small text in various scenarios. In this paper, we establish a video text re… ▽ More Recently, video text detection, tracking, and recognition in natural scenes are becoming very popular in the computer vision community. However, most existing algorithms and benchmarks focus on common text cases (e.g., normal size, density) and single scenario, while ignoring extreme video text challenges, i.e., dense and small text in various scenarios. In this paper, we establish a video text reading benchmark, named DSText V2, which focuses on Dense and Small text reading challenges in the video with various scenarios. Compared with the previous datasets, the proposed dataset mainly include three new challenges: 1) Dense video texts, a new challenge for video text spotters to track and read. 2) High-proportioned small texts, coupled with the blurriness and distortion in the video, will bring further challenges. 3) Various new scenarios, e.g., Game, Sports, etc. The proposed DSText V2 includes 140 video clips from 7 open scenarios, supporting three tasks, i.e., video text detection (Task 1), video text tracking (Task 2), and end-to-end video text spotting (Task 3). In this article, we describe detailed statistical information of the dataset, tasks, evaluation protocols, and the results summaries. Most importantly, a thorough investigation and analysis targeting three unique challenges derived from our dataset are provided, aiming to provide new insights. Moreover, we hope the benchmark will promise video text research in the community. DSText v2 is built upon DSText v1, which was previously introduced to organize the ICDAR 2023 competition for dense and small video text. △ Less

Submitted 29 November, 2023; originally announced December 2023.

Comments: arXiv admin note: text overlap with arXiv:2304.04376

Journal ref: Pattern Recognition 2023/2024

arXiv:2310.09659 [pdf, other]

HAPS in the Non-Terrestrial Network Nexus: Prospective Architectures and Performance Insights

Authors: Zhengying Lou, Baha Eddine Youcef Belmekki, Mohamed-Slim Alouini

Abstract: High altitude platform stations (HAPS) have recently emerged as a new key stratospheric player in non-terrestrial networks (NTN) alongside satellites and low-altitude platforms. In this paper, we present the main communication links between HAPS and other NTN platforms, their advantages, and their challenges. Then, prospective network architectures in which HAPS plays an indispensable role in the… ▽ More High altitude platform stations (HAPS) have recently emerged as a new key stratospheric player in non-terrestrial networks (NTN) alongside satellites and low-altitude platforms. In this paper, we present the main communication links between HAPS and other NTN platforms, their advantages, and their challenges. Then, prospective network architectures in which HAPS plays an indispensable role in the future NTNs are presented such as ad-hoc, cell-free, and integrated access and backhaul. To showcase the importance of HAPS in the NTN, we provide comprehensive performance insights when using HAPS in the prospective architectures with the most suitable communication link. The insights show the HAPS' ability to interconnect the NTN nexus as well as their versatility by incorporating different metrics into the analysis such as routing latency, energy efficiency, coverage probability, and channel capacity. Depending on the architecture, HAPS will play different roles in NTN, such as a UAV network center, satellite relay, and ground network extension. Finally, the performance gain provided by HAPS usage in NTN is further highlighted by comparing the results when no HAPS are used. △ Less

Submitted 14 October, 2023; originally announced October 2023.

arXiv:2310.04836 [pdf, other]

Dual Grained Quantization: Efficient Fine-Grained Quantization for LLM

Authors: Luoming Zhang, Wen Fei, Weijia Wu, Yefei He, Zhenyu Lou, Hong Zhou

Abstract: Large Language Models (LLMs) pose significant hardware challenges related to memory requirements and computational ability. There are two mainstream quantization schemes for LLMs: coarse-grained ($\textit{e.g.,}$ channel-wise) quantization and fine-grained ($\textit{e.g.,}$ group-wise) quantization. Fine-grained quantization has smaller quantization loss, consequently achieving superior performanc… ▽ More Large Language Models (LLMs) pose significant hardware challenges related to memory requirements and computational ability. There are two mainstream quantization schemes for LLMs: coarse-grained ($\textit{e.g.,}$ channel-wise) quantization and fine-grained ($\textit{e.g.,}$ group-wise) quantization. Fine-grained quantization has smaller quantization loss, consequently achieving superior performance. However, when applied to weight-activation quantization, it disrupts continuous integer matrix multiplication, leading to inefficient inference. In this paper, we introduce Dual Grained Quantization (DGQ), a novel A8W4 quantization for LLM that maintains superior performance while ensuring fast inference speed. DSQ dequantizes the fine-grained INT4 weight into coarse-grained INT8 representation and preform matrix multiplication using INT8 kernels. Besides, we develop a two-phase grid search algorithm to simplify the determination of fine-grained and coarse-grained quantization scales. We also devise a percentile clipping schema for smoothing the activation outliers without the need for complex optimization techniques. Experimental results demonstrate that DGQ consistently outperforms prior methods across various LLM architectures and a wide range of tasks. Remarkably, by our implemented efficient CUTLASS kernel, we achieve $\textbf{1.12}$ $\times$ memory reduction and $\textbf{3.24}$ $\times$ speed gains comparing A16W4 implementation. These advancements enable efficient deployment of A8W4 LLMs for real-world applications. △ Less

Submitted 7 October, 2023; originally announced October 2023.

Comments: 15 pages, 2 figures

arXiv:2309.10243 [pdf]

Transferable Adversarial Attack on Image Tampering Localization

Authors: Yuqi Wang, Gang Cao, Zijie Lou, Haochen Zhu

Abstract: It is significant to evaluate the security of existing digital image tampering localization algorithms in real-world applications. In this paper, we propose an adversarial attack scheme to reveal the reliability of such tampering localizers, which would be fooled and fail to predict altered regions correctly. Specifically, the adversarial examples based on optimization and gradient are implemented… ▽ More It is significant to evaluate the security of existing digital image tampering localization algorithms in real-world applications. In this paper, we propose an adversarial attack scheme to reveal the reliability of such tampering localizers, which would be fooled and fail to predict altered regions correctly. Specifically, the adversarial examples based on optimization and gradient are implemented for white/black-box attacks. Correspondingly, the adversarial example is optimized via reverse gradient propagation, and the perturbation is added adaptively in the direction of gradient rising. The black-box attack is achieved by relying on the transferability of such adversarial examples to different localizers. Extensive evaluations verify that the proposed attack sharply reduces the localization accuracy while preserving high visual quality of the attacked images. △ Less

Submitted 18 September, 2023; originally announced September 2023.

arXiv:2309.09482 [pdf, other]

Spatio-temporal Co-attention Fusion Network for Video Splicing Localization

Authors: Man Lin, Gang Cao, Zijie Lou

Abstract: Digital video splicing has become easy and ubiquitous. Malicious users copy some regions of a video and paste them to another video for creating realistic forgeries. It is significant to blindly detect such forgery regions in videos. In this paper, a spatio-temporal co-attention fusion network (SCFNet) is proposed for video splicing localization. Specifically, a three-stream network is used as an… ▽ More Digital video splicing has become easy and ubiquitous. Malicious users copy some regions of a video and paste them to another video for creating realistic forgeries. It is significant to blindly detect such forgery regions in videos. In this paper, a spatio-temporal co-attention fusion network (SCFNet) is proposed for video splicing localization. Specifically, a three-stream network is used as an encoder to capture manipulation traces across multiple frames. The deep interaction and fusion of spatio-temporal forensic features are achieved by the novel parallel and cross co-attention fusion modules. A lightweight multilayer perceptron (MLP) decoder is adopted to yield a pixel-level tampering localization map. A new large-scale video splicing dataset is created for training the SCFNet. Extensive tests on benchmark datasets show that the localization and generalization performances of our SCFNet outperform the state-of-the-art. Code and datasets will be available at https://github.com/multimediaFor/SCFNet. △ Less

Submitted 18 September, 2023; originally announced September 2023.

arXiv:2308.05872 [pdf, other]

Vision Backbone Enhancement via Multi-Stage Cross-Scale Attention

Authors: Liang Shang, Yanli Liu, Zhengyang Lou, Shuxue Quan, Nagesh Adluru, Bochen Guan, William A. Sethares

Abstract: Convolutional neural networks (CNNs) and vision transformers (ViTs) have achieved remarkable success in various vision tasks. However, many architectures do not consider interactions between feature maps from different stages and scales, which may limit their performance. In this work, we propose a simple add-on attention module to overcome these limitations via multi-stage and cross-scale interac… ▽ More Convolutional neural networks (CNNs) and vision transformers (ViTs) have achieved remarkable success in various vision tasks. However, many architectures do not consider interactions between feature maps from different stages and scales, which may limit their performance. In this work, we propose a simple add-on attention module to overcome these limitations via multi-stage and cross-scale interactions. Specifically, the proposed Multi-Stage Cross-Scale Attention (MSCSA) module takes feature maps from different stages to enable multi-stage interactions and achieves cross-scale interactions by computing self-attention at different scales based on the multi-stage feature maps. Our experiments on several downstream tasks show that MSCSA provides a significant performance boost with modest additional FLOPs and runtime. △ Less

Submitted 14 August, 2023; v1 submitted 10 August, 2023; originally announced August 2023.

arXiv:2308.02918 [pdf, other]

Spectral Ranking Inferences based on General Multiway Comparisons

Authors: Jianqing Fan, Zhipeng Lou, Weichen Wang, Mengxin Yu

Abstract: This paper studies the performance of the spectral method in the estimation and uncertainty quantification of the unobserved preference scores of compared entities in a general and more realistic setup. Specifically, the comparison graph consists of hyper-edges of possible heterogeneous sizes, and the number of comparisons can be as low as one for a given hyper-edge. Such a setting is pervasive in… ▽ More This paper studies the performance of the spectral method in the estimation and uncertainty quantification of the unobserved preference scores of compared entities in a general and more realistic setup. Specifically, the comparison graph consists of hyper-edges of possible heterogeneous sizes, and the number of comparisons can be as low as one for a given hyper-edge. Such a setting is pervasive in real applications, circumventing the need to specify the graph randomness and the restrictive homogeneous sampling assumption imposed in the commonly used Bradley-Terry-Luce (BTL) or Plackett-Luce (PL) models. Furthermore, in scenarios where the BTL or PL models are appropriate, we unravel the relationship between the spectral estimator and the Maximum Likelihood Estimator (MLE). We discover that a two-step spectral method, where we apply the optimal weighting estimated from the equal weighting vanilla spectral method, can achieve the same asymptotic efficiency as the MLE. Given the asymptotic distributions of the estimated preference scores, we also introduce a comprehensive framework to carry out both one-sample and two-sample ranking inferences, applicable to both fixed and random graph settings. It is noteworthy that this is the first time effective two-sample rank testing methods have been proposed. Finally, we substantiate our findings via comprehensive numerical simulations and subsequently apply our developed methodologies to perform statistical inferences for statistical journals and movie rankings. △ Less

Submitted 1 March, 2024; v1 submitted 5 August, 2023; originally announced August 2023.

Comments: 62 pages, 4 figures

arXiv:2305.16481 [pdf, other]

SimHaze: game engine simulated data for real-world dehazing

Authors: Zhengyang Lou, Huan Xu, Fangzhou Mu, Yanli Liu, Xiaoyu Zhang, Liang Shang, Jiang Li, Bochen Guan, Yin Li, Yu Hen Hu

Abstract: Deep models have demonstrated recent success in single-image dehazing. Most prior methods consider fully supervised training and learn from paired clean and hazy images, where a hazy image is synthesized based on a clean image and its estimated depth map. This paradigm, however, can produce low-quality hazy images due to inaccurate depth estimation, resulting in poor generalization of the trained… ▽ More Deep models have demonstrated recent success in single-image dehazing. Most prior methods consider fully supervised training and learn from paired clean and hazy images, where a hazy image is synthesized based on a clean image and its estimated depth map. This paradigm, however, can produce low-quality hazy images due to inaccurate depth estimation, resulting in poor generalization of the trained models. In this paper, we explore an alternative approach for generating paired clean-hazy images by leveraging computer graphics. Using a modern game engine, our approach renders crisp clean images and their precise depth maps, based on which high-quality hazy images can be synthesized for training dehazing models. To this end, we present SimHaze: a new synthetic haze dataset. More importantly, we show that training with SimHaze alone allows the latest dehazing models to achieve significantly better performance in comparison to previous dehazing datasets. Our dataset and code will be made publicly available. △ Less

Submitted 25 May, 2023; originally announced May 2023.

Comments: Submitted to ICIP 2023

arXiv:2303.10021 [pdf, other]

Coverage Analysis of Hybrid RF/THz Networks With Best Relay Selection

Authors: Zhengying Lou, Baha Eddine Youcef Belmekki, Mohamed-Slim Alouini

Abstract: Utilizing terahertz (THz) transmission to enhance coverage has proven various benefits compared to traditional radio frequency (RF) counterparts. This letter proposes a dual-hop decode-and-forward (DF) routing protocol in a hybrid RF and THz relay network named hybrid relay selection (HRS). The coverage probability of the HRS protocol is derived. The HRS protocol prioritizes THz relays for higher… ▽ More Utilizing terahertz (THz) transmission to enhance coverage has proven various benefits compared to traditional radio frequency (RF) counterparts. This letter proposes a dual-hop decode-and-forward (DF) routing protocol in a hybrid RF and THz relay network named hybrid relay selection (HRS). The coverage probability of the HRS protocol is derived. The HRS protocol prioritizes THz relays for higher data rates or short source-destination distances; and RF relays for lower data rates or large source-destination distances. The proposed HRS protocol offers nearly the same performance as the optimal selection protocol, which requires complete instantaneous channel state information (CSI) of all the nodes. △ Less

Submitted 17 March, 2023; originally announced March 2023.

arXiv:2212.06524 [pdf, other]

SST: Real-time End-to-end Monocular 3D Reconstruction via Sparse Spatial-Temporal Guidance

Authors: Chenyangguang Zhang, Zhiqiang Lou, Yan Di, Federico Tombari, Xiangyang Ji

Abstract: Real-time monocular 3D reconstruction is a challenging problem that remains unsolved. Although recent end-to-end methods have demonstrated promising results, tiny structures and geometric boundaries are hardly captured due to their insufficient supervision neglecting spatial details and oversimplified feature fusion ignoring temporal cues. To address the problems, we propose an end-to-end 3D recon… ▽ More Real-time monocular 3D reconstruction is a challenging problem that remains unsolved. Although recent end-to-end methods have demonstrated promising results, tiny structures and geometric boundaries are hardly captured due to their insufficient supervision neglecting spatial details and oversimplified feature fusion ignoring temporal cues. To address the problems, we propose an end-to-end 3D reconstruction network SST, which utilizes Sparse estimated points from visual SLAM system as additional Spatial guidance and fuses Temporal features via a novel cross-modal attention mechanism, achieving more detailed reconstruction results. We propose a Local Spatial-Temporal Fusion module to exploit more informative spatial-temporal cues from multi-view color information and sparse priors, as well a Global Spatial-Temporal Fusion module to refine the local TSDF volumes with the world-frame model from coarse to fine. Extensive experiments on ScanNet and 7-Scenes demonstrate that SST outperforms all state-of-the-art competitors, whilst keeping a high inference speed at 59 FPS, enabling real-world applications with real-time requirements. △ Less

Submitted 24 July, 2023; v1 submitted 13 December, 2022; originally announced December 2022.

Comments: ICME 2023 (oral)

Report number: camera ready for ICME 2023

arXiv:2211.11959 [pdf, ps, other]

Robust High-dimensional Tuning Free Multiple Testing

Authors: Jianqing Fan, Zhipeng Lou, Mengxin Yu

Abstract: A stylized feature of high-dimensional data is that many variables have heavy tails, and robust statistical inference is critical for valid large-scale statistical inference. Yet, the existing developments such as Winsorization, Huberization and median of means require the bounded second moments and involve variable-dependent tuning parameters, which hamper their fidelity in applications to large-… ▽ More A stylized feature of high-dimensional data is that many variables have heavy tails, and robust statistical inference is critical for valid large-scale statistical inference. Yet, the existing developments such as Winsorization, Huberization and median of means require the bounded second moments and involve variable-dependent tuning parameters, which hamper their fidelity in applications to large-scale problems. To liberate these constraints, this paper revisits the celebrated Hodges-Lehmann (HL) estimator for estimating location parameters in both the one- and two-sample problems, from a non-asymptotic perspective. Our study develops Berry-Esseen inequality and Cramér type moderate deviation for the HL estimator based on newly developed non-asymptotic Bahadur representation, and builds data-driven confidence intervals via a weighted bootstrap approach. These results allow us to extend the HL estimator to large-scale studies and propose \emph{tuning-free} and \emph{moment-free} high-dimensional inference procedures for testing global null and for large-scale multiple testing with false discovery proportion control. It is convincingly shown that the resulting tuning-free and moment-free methods control false discovery proportion at a prescribed level. The simulation studies lend further support to our developed theory. △ Less

Submitted 23 November, 2022; v1 submitted 21 November, 2022; originally announced November 2022.

Comments: In this paper, we develop tuning-free and moment-free high dimensional inference procedures;

arXiv:2211.11957 [pdf, other]

Ranking Inferences Based on the Top Choice of Multiway Comparisons

Authors: Jianqing Fan, Zhipeng Lou, Weichen Wang, Mengxin Yu

Abstract: This paper considers ranking inference of $n$ items based on the observed data on the top choice among $M$ randomly selected items at each trial. This is a useful modification of the Plackett-Luce model for $M$-way ranking with only the top choice observed and is an extension of the celebrated Bradley-Terry-Luce model that corresponds to $M=2$. Under a uniform sampling scheme in which any $M$ dist… ▽ More This paper considers ranking inference of $n$ items based on the observed data on the top choice among $M$ randomly selected items at each trial. This is a useful modification of the Plackett-Luce model for $M$-way ranking with only the top choice observed and is an extension of the celebrated Bradley-Terry-Luce model that corresponds to $M=2$. Under a uniform sampling scheme in which any $M$ distinguished items are selected for comparisons with probability $p$ and the selected $M$ items are compared $L$ times with multinomial outcomes, we establish the statistical rates of convergence for underlying $n$ preference scores using both $\ell_2$-norm and $\ell_\infty$-norm, with the minimum sampling complexity. In addition, we establish the asymptotic normality of the maximum likelihood estimator that allows us to construct confidence intervals for the underlying scores. Furthermore, we propose a novel inference framework for ranking items through a sophisticated maximum pairwise difference statistic whose distribution is estimated via a valid Gaussian multiplier bootstrap. The estimated distribution is then used to construct simultaneous confidence intervals for the differences in the preference scores and the ranks of individual items. They also enable us to address various inference questions on the ranks of these items. Extensive simulation studies lend further support to our theoretical results. A real data application illustrates the usefulness of the proposed methods convincingly. △ Less

Submitted 5 January, 2023; v1 submitted 21 November, 2022; originally announced November 2022.

Comments: In this paper, we build simultaneous confidence intervals for ranks through multiway comparisons

arXiv:2211.07091 [pdf, other]

BiViT: Extremely Compressed Binary Vision Transformer

Authors: Yefei He, Zhenyu Lou, Luoming Zhang, Jing Liu, Weijia Wu, Hong Zhou, Bohan Zhuang

Abstract: Model binarization can significantly compress model size, reduce energy consumption, and accelerate inference through efficient bit-wise operations. Although binarizing convolutional neural networks have been extensively studied, there is little work on exploring binarization of vision Transformers which underpin most recent breakthroughs in visual recognition. To this end, we propose to solve two… ▽ More Model binarization can significantly compress model size, reduce energy consumption, and accelerate inference through efficient bit-wise operations. Although binarizing convolutional neural networks have been extensively studied, there is little work on exploring binarization of vision Transformers which underpin most recent breakthroughs in visual recognition. To this end, we propose to solve two fundamental challenges to push the horizon of Binary Vision Transformers (BiViT). First, the traditional binary method does not take the long-tailed distribution of softmax attention into consideration, bringing large binarization errors in the attention module. To solve this, we propose Softmax-aware Binarization, which dynamically adapts to the data distribution and reduces the error caused by binarization. Second, to better preserve the information of the pretrained model and restore accuracy, we propose a Cross-layer Binarization scheme that decouples the binarization of self-attention and multi-layer perceptrons (MLPs), and Parameterized Weight Scales which introduce learnable scaling factors for weight binarization. Overall, our method performs favorably against state-of-the-arts by 19.8% on the TinyImageNet dataset. On ImageNet, our BiViT achieves a competitive 75.6% Top-1 accuracy over Swin-S model. Additionally, on COCO object detection, our method achieves an mAP of 40.8 with a Swin-T backbone over Cascade Mask R-CNN framework. △ Less

Submitted 5 October, 2023; v1 submitted 13 November, 2022; originally announced November 2022.

Comments: Accepted by ICCV 2023

arXiv:2211.03509 [pdf]

Black-Box Attack against GAN-Generated Image Detector with Contrastive Perturbation

Authors: Zijie Lou, Gang Cao, Man Lin

Abstract: Visually realistic GAN-generated facial images raise obvious concerns on potential misuse. Many effective forensic algorithms have been developed to detect such synthetic images in recent years. It is significant to assess the vulnerability of such forensic detectors against adversarial attacks. In this paper, we propose a new black-box attack method against GAN-generated image detectors. A novel… ▽ More Visually realistic GAN-generated facial images raise obvious concerns on potential misuse. Many effective forensic algorithms have been developed to detect such synthetic images in recent years. It is significant to assess the vulnerability of such forensic detectors against adversarial attacks. In this paper, we propose a new black-box attack method against GAN-generated image detectors. A novel contrastive learning strategy is adopted to train the encoder-decoder network based anti-forensic model under a contrastive loss function. GAN images and their simulated real counterparts are constructed as positive and negative samples, respectively. Leveraging on the trained attack model, imperceptible contrastive perturbation could be applied to input synthetic images for removing GAN fingerprint to some extent. As such, existing GAN-generated image detectors are expected to be deceived. Extensive experimental results verify that the proposed attack effectively reduces the accuracy of three state-of-the-art detectors on six popular GANs. High visual quality of the attacked images is also achieved. The source code will be available at https://github.com/ZXMMD/BAttGAND. △ Less

Submitted 7 November, 2022; originally announced November 2022.

arXiv:2208.00237 [pdf, other]

RBP-Pose: Residual Bounding Box Projection for Category-Level Pose Estimation

Authors: Ruida Zhang, Yan Di, Zhiqiang Lou, Fabian Manhardt, Federico Tombari, Xiangyang Ji

Abstract: Category-level object pose estimation aims to predict the 6D pose as well as the 3D metric size of arbitrary objects from a known set of categories. Recent methods harness shape prior adaptation to map the observed point cloud into the canonical space and apply Umeyama algorithm to recover the pose and size. However, their shape prior integration strategy boosts pose estimation indirectly, which l… ▽ More Category-level object pose estimation aims to predict the 6D pose as well as the 3D metric size of arbitrary objects from a known set of categories. Recent methods harness shape prior adaptation to map the observed point cloud into the canonical space and apply Umeyama algorithm to recover the pose and size. However, their shape prior integration strategy boosts pose estimation indirectly, which leads to insufficient pose-sensitive feature extraction and slow inference speed. To tackle this problem, in this paper, we propose a novel geometry-guided Residual Object Bounding Box Projection network RBP-Pose that jointly predicts object pose and residual vectors describing the displacements from the shape-prior-indicated object surface projections on the bounding box towards the real surface projections. Such definition of residual vectors is inherently zero-mean and relatively small, and explicitly encapsulates spatial cues of the 3D object for robust and accurate pose regression. We enforce geometry-aware consistency terms to align the predicted pose and residual vectors to further boost performance. △ Less

Submitted 28 September, 2022; v1 submitted 30 July, 2022; originally announced August 2022.

Comments: Accepted by ECCV 2022

arXiv:2203.07918 [pdf, other]

GPV-Pose: Category-level Object Pose Estimation via Geometry-guided Point-wise Voting

Authors: Yan Di, Ruida Zhang, Zhiqiang Lou, Fabian Manhardt, Xiangyang Ji, Nassir Navab, Federico Tombari

Abstract: While 6D object pose estimation has recently made a huge leap forward, most methods can still only handle a single or a handful of different objects, which limits their applications. To circumvent this problem, category-level object pose estimation has recently been revamped, which aims at predicting the 6D pose as well as the 3D metric size for previously unseen instances from a given set of obje… ▽ More While 6D object pose estimation has recently made a huge leap forward, most methods can still only handle a single or a handful of different objects, which limits their applications. To circumvent this problem, category-level object pose estimation has recently been revamped, which aims at predicting the 6D pose as well as the 3D metric size for previously unseen instances from a given set of object classes. This is, however, a much more challenging task due to severe intra-class shape variations. To address this issue, we propose GPV-Pose, a novel framework for robust category-level pose estimation, harnessing geometric insights to enhance the learning of category-level pose-sensitive features. First, we introduce a decoupled confidence-driven rotation representation, which allows geometry-aware recovery of the associated rotation matrix. Second, we propose a novel geometry-guided point-wise voting paradigm for robust retrieval of the 3D object bounding box. Finally, leveraging these different output streams, we can enforce several geometric consistency terms, further increasing performance, especially for non-symmetric categories. GPV-Pose produces superior results to state-of-the-art competitors on common public benchmarks, whilst almost achieving real-time inference speed at 20 FPS. △ Less

Submitted 17 March, 2022; v1 submitted 15 March, 2022; originally announced March 2022.

Comments: CVPR 2022

arXiv:2203.01219 [pdf, other]

Are Latent Factor Regression and Sparse Regression Adequate?

Authors: Jianqing Fan, Zhipeng Lou, Mengxin Yu

Abstract: We propose the Factor Augmented sparse linear Regression Model (FARM) that not only encompasses both the latent factor regression and sparse linear regression as special cases but also bridges dimension reduction and sparse regression together. We provide theoretical guarantees for the estimation of our model under the existence of sub-Gaussian and heavy-tailed noises (with bounded (1+x)-th moment… ▽ More We propose the Factor Augmented sparse linear Regression Model (FARM) that not only encompasses both the latent factor regression and sparse linear regression as special cases but also bridges dimension reduction and sparse regression together. We provide theoretical guarantees for the estimation of our model under the existence of sub-Gaussian and heavy-tailed noises (with bounded (1+x)-th moment, for all x>0), respectively. In addition, the existing works on supervised learning often assume the latent factor regression or the sparse linear regression is the true underlying model without justifying its adequacy. To fill in such an important gap, we also leverage our model as the alternative model to test the sufficiency of the latent factor regression and the sparse linear regression models. To accomplish these goals, we propose the Factor-Adjusted de-Biased Test (FabTest) and a two-stage ANOVA type test respectively. We also conduct large-scale numerical experiments including both synthetic and FRED macroeconomics data to corroborate the theoretical properties of our methods. Numerical results illustrate the robustness and effectiveness of our model against latent factor regression and sparse linear regression models. △ Less

Submitted 2 March, 2022; originally announced March 2022.

arXiv:2112.02058 [pdf, other]

doi 10.1016/j.compeleceng.2021.107619

I-WKNN: Fast-Speed and High-Accuracy WIFI Positioning for Intelligent Stadiums

Authors: Zhangzhi Zhao, Zhengying Lou, Ruibo Wang, Qingyao Li, Xing Xu

Abstract: Based on various existing wireless fingerprint location algorithms in intelligent sports venues, a high-precision and fast indoor location algorithm improved weighted k-nearest neighbor (I-WKNN) is proposed. In order to meet the complex environment of sports venues and the demand of high-speed sampling, this paper proposes an AP selection algorithm for offline and online stages. Based on the chara… ▽ More Based on various existing wireless fingerprint location algorithms in intelligent sports venues, a high-precision and fast indoor location algorithm improved weighted k-nearest neighbor (I-WKNN) is proposed. In order to meet the complex environment of sports venues and the demand of high-speed sampling, this paper proposes an AP selection algorithm for offline and online stages. Based on the characteristics of the signal intensity distribution in intelligent venues, an asymmetric Gaussian filter algorithm is proposed. This paper introduces the application of the positioning algorithm in the intelligent stadium system, and completes the data acquisition and real-time positioning of the stadium. Compared with traditional WKNN and KNN algorithms, the I-WKNN algorithm has advantages in fingerprint positioning database processing, environmental noise adaptability, real-time positioning accuracy and positioning speed, etc. The experimental results show that the I-WKNN algorithm has obvious advantages in positioning accuracy and positioning time in a complex noise environment and has obvious application potential in a smart stadium. △ Less

Submitted 3 December, 2021; originally announced December 2021.

Journal ref: Computers & Electrical Engineering, 98, 107619 (2022)

arXiv:2108.12981 [pdf, ps, other]

The ensmallen library for flexible numerical optimization

Authors: Ryan R. Curtin, Marcus Edel, Rahul Ganesh Prabhu, Suryoday Basak, Zhihao Lou, Conrad Sanderson

Abstract: We overview the ensmallen numerical optimization library, which provides a flexible C++ framework for mathematical optimization of user-supplied objective functions. Many types of objective functions are supported, including general, differentiable, separable, constrained, and categorical. A diverse set of pre-built optimizers is provided, including Quasi-Newton optimizers and many variants of Sto… ▽ More We overview the ensmallen numerical optimization library, which provides a flexible C++ framework for mathematical optimization of user-supplied objective functions. Many types of objective functions are supported, including general, differentiable, separable, constrained, and categorical. A diverse set of pre-built optimizers is provided, including Quasi-Newton optimizers and many variants of Stochastic Gradient Descent. The underlying framework facilitates the implementation of new optimizers. Optimization of an objective function typically requires supplying only one or two C++ functions. Custom behavior can be easily specified via callback functions. Empirical comparisons show that ensmallen outperforms other frameworks while providing more functionality. The library is available at https://ensmallen.org and is distributed under the permissive BSD license. △ Less

Submitted 9 February, 2024; v1 submitted 29 August, 2021; originally announced August 2021.

MSC Class: 65K10; 68N99 ACM Class: G.4; G.1.3; G.1.6

Journal ref: Journal of Machine Learning Research, Vol. 22, No. 166, 2021

arXiv:2105.14359 [pdf, ps, other]

Green Tethered UAVs for EMF-Aware Cellular Networks

Authors: Zhengying Lou, Ahmed Elzanaty, Mohamed-Slim Alouini

Abstract: A prevalent theory circulating among the non-scientific community is that the intensive deployment of base stations over the territory significantly increases the level of electromagnetic field (EMF) exposure and affects population health. To alleviate this concern, in this work, we propose a network architecture that introduces tethered unmanned aerial vehicles (TUAVs) carrying green antennas to… ▽ More A prevalent theory circulating among the non-scientific community is that the intensive deployment of base stations over the territory significantly increases the level of electromagnetic field (EMF) exposure and affects population health. To alleviate this concern, in this work, we propose a network architecture that introduces tethered unmanned aerial vehicles (TUAVs) carrying green antennas to minimize the EMF exposure while guaranteeing a high data rate for users. In particular, each TUAV can attach itself to one of the possible ground stations at the top of some buildings. The location of the TUAVs, transmit power of user equipment and association policy are optimized to minimize the EMF exposure. Unfortunately, the problem turns out to be mixed-integer non-linear programming (MINLP), which is non-deterministic polynomial-time (NP) hard. We propose an efficient low-complexity algorithm composed of three submodules. Firstly, we propose an algorithm based on the greedy principle to determine the optimal association matrix between the users and base stations. Then, we offer two approaches, a modified K-mean and shrink and realign (SR) process, to associate each TUAV with a ground station. Finally, we put forward two algorithms based on the golden search and SR process to adjust the TUAV's position within the hovering area over the building. After that, we consider the dual problem that maximizes the sum rate while keeping the exposure below a predefined value, such as the level enforced by the regulation. Next, we perform extensive simulations to show the effectiveness of the proposed TUAVs to reduce the exposure compared to various architectures. Eventually, we show that TUAVs with green antennas can effectively mitigate the EMF exposure by more than 20% compared to fixed green small cells while achieving a higher data rate. △ Less

Submitted 3 June, 2021; v1 submitted 29 May, 2021; originally announced May 2021.

Comments: 30 pages

arXiv:2007.03092 [pdf, other]

Neural Subgraph Matching

Authors: Rex, Ying, Zhaoyu Lou, Jiaxuan You, Chengtao Wen, Arquimedes Canedo, Jure Leskovec

Abstract: Subgraph matching is the problem of determining the presence and location(s) of a given query graph in a large target graph. Despite being an NP-complete problem, the subgraph matching problem is crucial in domains ranging from network science and database systems to biochemistry and cognitive science. However, existing techniques based on combinatorial matching and integer programming cannot hand… ▽ More Subgraph matching is the problem of determining the presence and location(s) of a given query graph in a large target graph. Despite being an NP-complete problem, the subgraph matching problem is crucial in domains ranging from network science and database systems to biochemistry and cognitive science. However, existing techniques based on combinatorial matching and integer programming cannot handle matching problems with both large target and query graphs. Here we propose NeuroMatch, an accurate, efficient, and robust neural approach to subgraph matching. NeuroMatch decomposes query and target graphs into small subgraphs and embeds them using graph neural networks. Trained to capture geometric constraints corresponding to subgraph relations, NeuroMatch then efficiently performs subgraph matching directly in the embedding space. Experiments demonstrate NeuroMatch is 100x faster than existing combinatorial approaches and 18% more accurate than existing approximate subgraph matching methods. △ Less

Submitted 27 October, 2020; v1 submitted 6 July, 2020; originally announced July 2020.

arXiv:2003.04103 [pdf, ps, other]

Flexible numerical optimization with ensmallen

Authors: Ryan R. Curtin, Marcus Edel, Rahul Ganesh Prabhu, Suryoday Basak, Zhihao Lou, Conrad Sanderson

Abstract: This report provides an introduction to the ensmallen numerical optimization library, as well as a deep dive into the technical details of how it works. The library provides a fast and flexible C++ framework for mathematical optimization of arbitrary user-supplied functions. A large set of pre-built optimizers is provided, including many variants of Stochastic Gradient Descent and Quasi-Newton opt… ▽ More This report provides an introduction to the ensmallen numerical optimization library, as well as a deep dive into the technical details of how it works. The library provides a fast and flexible C++ framework for mathematical optimization of arbitrary user-supplied functions. A large set of pre-built optimizers is provided, including many variants of Stochastic Gradient Descent and Quasi-Newton optimizers. Several types of objective functions are supported, including differentiable, separable, constrained, and categorical objective functions. Implementation of a new optimizer requires only one method, while a new objective function requires typically only one or two C++ methods. Through internal use of C++ template metaprogramming, ensmallen provides support for arbitrary user-supplied callbacks and automatic inference of unsupplied methods without any runtime overhead. Empirical comparisons show that ensmallen outperforms other optimization frameworks (such as Julia and SciPy), sometimes by large margins. The library is available at https://ensmallen.org and is distributed under the permissive BSD license. △ Less

Submitted 15 November, 2023; v1 submitted 9 March, 2020; originally announced March 2020.

Comments: https://ensmallen.org/

arXiv:1909.13055 [pdf, other]

DeepUSPS: Deep Robust Unsupervised Saliency Prediction With Self-Supervision

Authors: Duc Tam Nguyen, Maximilian Dax, Chaithanya Kumar Mummadi, Thi Phuong Nhung Ngo, Thi Hoai Phuong Nguyen, Zhongyu Lou, Thomas Brox

Abstract: Deep neural network (DNN) based salient object detection in images based on high-quality labels is expensive. Alternative unsupervised approaches rely on careful selection of multiple handcrafted saliency methods to generate noisy pseudo-ground-truth labels. In this work, we propose a two-stage mechanism for robust unsupervised object saliency prediction, where the first stage involves refinement… ▽ More Deep neural network (DNN) based salient object detection in images based on high-quality labels is expensive. Alternative unsupervised approaches rely on careful selection of multiple handcrafted saliency methods to generate noisy pseudo-ground-truth labels. In this work, we propose a two-stage mechanism for robust unsupervised object saliency prediction, where the first stage involves refinement of the noisy pseudo labels generated from different handcrafted methods. Each handcrafted method is substituted by a deep network that learns to generate the pseudo labels. These labels are refined incrementally in multiple iterations via our proposed self-supervision technique. In the second stage, the refined labels produced from multiple networks representing multiple saliency methods are used to train the actual saliency detection network. We show that this self-learning procedure outperforms all the existing unsupervised methods over different datasets. Results are even comparable to those of fully-supervised state-of-the-art approaches. The code is available at https://tinyurl.com/wtlhgo3 . △ Less

Submitted 15 March, 2021; v1 submitted 28 September, 2019; originally announced September 2019.

Comments: NeuRIPS-2019 (Vancouver, Canada): camera ready version

arXiv:1909.01584 [pdf, other]

Discovering Hypernymy in Text-Rich Heterogeneous Information Network by Exploiting Context Granularity

Authors: Yu Shi, Jiaming Shen, Yuchen Li, Naijing Zhang, Xinwei He, Zhengzhi Lou, Qi Zhu, Matthew Walker, Myunghwan Kim, Jiawei Han

Abstract: Text-rich heterogeneous information networks (text-rich HINs) are ubiquitous in real-world applications. Hypernymy, also known as is-a relation or subclass-of relation, lays in the core of many knowledge graphs and benefits many downstream applications. Existing methods of hypernymy discovery either leverage textual patterns to extract explicitly mentioned hypernym-hyponym pairs, or learn a distri… ▽ More Text-rich heterogeneous information networks (text-rich HINs) are ubiquitous in real-world applications. Hypernymy, also known as is-a relation or subclass-of relation, lays in the core of many knowledge graphs and benefits many downstream applications. Existing methods of hypernymy discovery either leverage textual patterns to extract explicitly mentioned hypernym-hyponym pairs, or learn a distributional representation for each term of interest based its context. These approaches rely on statistical signals from the textual corpus, and their effectiveness would therefore be hindered when the signals from the corpus are not sufficient for all terms of interest. In this work, we propose to discover hypernymy in text-rich HINs, which can introduce additional high-quality signals. We develop a new framework, named HyperMine, that exploits multi-granular contexts and combines signals from both text and network without human labeled data. HyperMine extends the definition of context to the scenario of text-rich HIN. For example, we can define typed nodes and communities as contexts. These contexts encode signals of different granularities and we feed them into a hypernymy inference model. HyperMine learns this model using weak supervision acquired based on high-precision textual patterns. Extensive experiments on two large real-world datasets demonstrate the effectiveness of HyperMine and the utility of modeling context granularity. We further show a case study that a high-quality taxonomy can be generated solely based on the hypernymy discovered by HyperMine. △ Less

Submitted 4 September, 2019; originally announced September 2019.

Comments: CIKM 2019

arXiv:1906.00216 [pdf, other]

Robust Learning Under Label Noise With Iterative Noise-Filtering

Authors: Duc Tam Nguyen, Thi-Phuong-Nhung Ngo, Zhongyu Lou, Michael Klar, Laura Beggel, Thomas Brox

Abstract: We consider the problem of training a model under the presence of label noise. Current approaches identify samples with potentially incorrect labels and reduce their influence on the learning process by either assigning lower weights to them or completely removing them from the training set. In the first case the model however still learns from noisy labels; in the latter approach, good training d… ▽ More We consider the problem of training a model under the presence of label noise. Current approaches identify samples with potentially incorrect labels and reduce their influence on the learning process by either assigning lower weights to them or completely removing them from the training set. In the first case the model however still learns from noisy labels; in the latter approach, good training data can be lost. In this paper, we propose an iterative semi-supervised mechanism for robust learning which excludes noisy labels but is still able to learn from the corresponding samples. To this end, we add an unsupervised loss term that also serves as a regularizer against the remaining label noise. We evaluate our approach on common classification tasks with different noise ratios. Our robust models outperform the state-of-the-art methods by a large margin. Especially for very large noise ratios, we achieve up to 20 % absolute improvement compared to the previous best model. △ Less

Submitted 1 June, 2019; originally announced June 2019.

arXiv:1810.13292 [pdf, other]

Anomaly Detection With Multiple-Hypotheses Predictions

Authors: Duc Tam Nguyen, Zhongyu Lou, Michael Klar, Thomas Brox

Abstract: In one-class-learning tasks, only the normal case (foreground) can be modeled with data, whereas the variation of all possible anomalies is too erratic to be described by samples. Thus, due to the lack of representative data, the wide-spread discriminative approaches cannot cover such learning tasks, and rather generative models, which attempt to learn the input density of the foreground, are used… ▽ More In one-class-learning tasks, only the normal case (foreground) can be modeled with data, whereas the variation of all possible anomalies is too erratic to be described by samples. Thus, due to the lack of representative data, the wide-spread discriminative approaches cannot cover such learning tasks, and rather generative models, which attempt to learn the input density of the foreground, are used. However, generative models suffer from a large input dimensionality (as in images) and are typically inefficient learners. We propose to learn the data distribution of the foreground more efficiently with a multi-hypotheses autoencoder. Moreover, the model is criticized by a discriminator, which prevents artificial data modes not supported by data, and enforces diversity across hypotheses. Our multiple-hypothesesbased anomaly detection framework allows the reliable identification of out-of-distribution samples. For anomaly detection on CIFAR-10, it yields up to 3.9% points improvement over previously reported results. On a real anomaly detection task, the approach reduces the error of the baseline models from 6.8% to 1.5%. △ Less

Submitted 31 May, 2019; v1 submitted 31 October, 2018; originally announced October 2018.

Comments: In proceedings of the 36th International Conference on Machine Learning (ICML), Long Beach, California, PMLR 97, 2019

arXiv:1605.02971 [pdf, other]

Structured Receptive Fields in CNNs

Authors: Jörn-Henrik Jacobsen, Jan van Gemert, Zhongyu Lou, Arnold W. M. Smeulders

Abstract: Learning powerful feature representations with CNNs is hard when training data are limited. Pre-training is one way to overcome this, but it requires large datasets sufficiently similar to the target domain. Another option is to design priors into the model, which can range from tuned hyperparameters to fully engineered representations like Scattering Networks. We combine these ideas into structur… ▽ More Learning powerful feature representations with CNNs is hard when training data are limited. Pre-training is one way to overcome this, but it requires large datasets sufficiently similar to the target domain. Another option is to design priors into the model, which can range from tuned hyperparameters to fully engineered representations like Scattering Networks. We combine these ideas into structured receptive field networks, a model which has a fixed filter basis and yet retains the flexibility of CNNs. This flexibility is achieved by expressing receptive fields in CNNs as a weighted sum over a fixed basis which is similar in spirit to Scattering Networks. The key difference is that we learn arbitrary effective filter sets from the basis rather than modeling the filters. This approach explicitly connects classical multiscale image analysis with general CNNs. With structured receptive field networks, we improve considerably over unstructured CNNs for small and medium dataset scenarios as well as over Scattering for large datasets. We validate our findings on ILSVRC2012, Cifar-10, Cifar-100 and MNIST. As a realistic small dataset example, we show state-of-the-art classification results on popular 3D MRI brain-disease datasets where pre-training is difficult due to a lack of large public datasets in a similar domain. △ Less

Submitted 13 May, 2016; v1 submitted 10 May, 2016; originally announced May 2016.

Comments: Reason for update: i) Fix Reference for "Deep roto-translation scattering for object classification" by Oyallon and Mallat. ii) Fixed two minor typos. iii) Removed implicit assumption in equation (4) where scale is represented with diffusion time and adapted to rest of paper where scale is represented with standard deviation, to avoid possible confusion

arXiv:1503.01820 [pdf, other]

Latent Hierarchical Model for Activity Recognition

Authors: Ninghang Hu, Gwenn Englebienne, Zhongyu Lou, Ben Kröse

Abstract: We present a novel hierarchical model for human activity recognition. In contrast to approaches that successively recognize actions and activities, our approach jointly models actions and activities in a unified framework, and their labels are simultaneously predicted. The model is embedded with a latent layer that is able to capture a richer class of contextual information in both state-state and… ▽ More We present a novel hierarchical model for human activity recognition. In contrast to approaches that successively recognize actions and activities, our approach jointly models actions and activities in a unified framework, and their labels are simultaneously predicted. The model is embedded with a latent layer that is able to capture a richer class of contextual information in both state-state and observation-state pairs. Although loops are present in the model, the model has an overall linear-chain structure, where the exact inference is tractable. Therefore, the model is very efficient in both inference and learning. The parameters of the graphical model are learned with a Structured Support Vector Machine (Structured-SVM). A data-driven approach is used to initialize the latent variables; therefore, no manual labeling for the latent states is required. The experimental results from using two benchmark datasets show that our model outperforms the state-of-the-art approach, and our model is computationally more efficient. △ Less

Submitted 5 March, 2015; originally announced March 2015.

Showing 1–39 of 39 results for author: Lou, Z