DisCo-CLIP: A Distributed Contrastive Loss for Memory Efficient CLIP Training

Authors: Yihao Chen, Xianbiao Qi, Jianan Wang, Lei Zhang

Abstract: We propose DisCo-CLIP, a distributed memory-efficient CLIP training approach, to reduce the memory consumption of contrastive loss when training contrastive learning models. Our approach decomposes the contrastive loss and its gradient computation into two parts, one to calculate the intra-GPU gradients and the other to compute the inter-GPU gradients. According to our decomposition, only the intra-GPU gradients are computed on the current GPU, while the inter-GPU gradients are collected via all_reduce from other GPUs instead of being repeatedly computed on every GPU. In this way, we can reduce the GPU memory consumption of contrastive loss computation from $\mathcal{O}(B^2)$ to $\mathcal{O}(\frac{B^2}{N})$, where $B$ and $N$ are the batch size and the number of GPUs used for training. Such a distributed solution is mathematically equivalent to the original non-distributed contrastive loss computation, without sacrificing any computational accuracy. It is particularly efficient for large-batch CLIP training. For instance, DisCo-CLIP can enable contrastive training of a ViT-B/32 model with a batch size of 32K or 196K using 8 or 64 A100 40GB GPUs, respectively, compared with the original CLIP solution, which requires 128 A100 40GB GPUs to train a ViT-B/32 model with a batch size of 32K. The code will be released at https://github.com/IDEA-Research/DisCo-CLIP

Submitted 17 April, 2023; originally announced April 2023.

Comments: To appear in CVPR 2023 as a highlight. Our code will be made public at https://github.com/IDEA-Research/DisCo-CLIP
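The memory argument in the DisCo-CLIP abstract above can be illustrated with a minimal NumPy sketch. It assumes the standard symmetric InfoNCE loss with a fixed temperature `tau` and simulates N devices inside one process: each simulated device builds only its (B/N) x B slice of the logit matrix against all embeddings, and the per-device contributions add up to the full B x B loss. It checks forward-loss equivalence only; the gradient decomposition and `all_reduce` described in the abstract are not reproduced, and all function names are hypothetical.

```python
import numpy as np


def logsumexp(x, axis):
    # Numerically stable log(sum(exp(x))) along one axis.
    m = x.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(x - m).sum(axis=axis, keepdims=True))).squeeze(axis)


def clip_loss_full(img, txt, tau=0.07):
    # Reference: symmetric InfoNCE over the full B x B logit matrix,
    # which needs O(B^2) memory on a single device.
    logits = img @ txt.T / tau
    diag = np.diag(logits)
    i2t = logsumexp(logits, axis=1) - diag   # image -> text (rows)
    t2i = logsumexp(logits, axis=0) - diag   # text -> image (columns)
    return (i2t.mean() + t2i.mean()) / 2


def clip_loss_sharded(img, txt, n_devices, tau=0.07):
    # Hypothetical single-process simulation of N devices. Device k owns
    # rows [lo, hi) of the batch and only materializes (B/N) x B logit
    # slices, i.e. O(B^2 / N) memory, against all (gathered) embeddings.
    B = img.shape[0]
    assert B % n_devices == 0, "batch size must divide evenly across devices"
    shard = B // n_devices
    total = 0.0
    for k in range(n_devices):
        lo, hi = k * shard, (k + 1) * shard
        rows = np.arange(hi - lo)
        pos = np.arange(lo, hi)                 # global index of each positive pair
        li = img[lo:hi] @ txt.T / tau           # local images vs all texts
        lt = txt[lo:hi] @ img.T / tau           # local texts vs all images
        i2t = logsumexp(li, axis=1) - li[rows, pos]
        t2i = logsumexp(lt, axis=1) - lt[rows, pos]
        # Each device contributes its share; the running sum stands in for
        # a cross-device reduction.
        total += (i2t.sum() + t2i.sum()) / (2 * B)
    return total
```

Because each row's log-sum-exp only needs that row's slice, splitting the rows across devices changes where the work happens, not the result.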

arXiv:2304.07051

The Second Monocular Depth Estimation Challenge

Authors: Jaime Spencer, C. Stella Qian, Michaela Trescakova, Chris Russell, Simon Hadfield, Erich W. Graf, Wendy J. Adams, Andrew J. Schofield, James Elder, Richard Bowden, Ali Anwar, Hao Chen, Xiaozhi Chen, Kai Cheng, Yuchao Dai, Huynh Thai Hoa, Sadat Hossain, Jianmian Huang, Mohan Jing, Bo Li, Chao Li, Baojun Li, Zhiwen Liu, Stefano Mattoccia, Siegfried Mercelis, et al. (18 additional authors not shown)

Abstract: This paper discusses the results for the second edition of the Monocular Depth Estimation Challenge (MDEC). This edition was open to methods using any form of supervision, including fully-supervised, self-supervised, multi-task or proxy depth. The challenge was based on the SYNS-Patches dataset, which features a wide diversity of environments with high-quality dense ground truth. This includes complex natural environments, e.g. forests or fields, which are greatly underrepresented in current benchmarks. The challenge received eight unique submissions that outperformed the provided SotA baseline on at least one of the pointcloud- or image-based metrics. The top supervised submission improved relative F-Score by 27.62%, while the top self-supervised submission improved it by 16.61%. Supervised submissions generally leveraged large collections of datasets to improve data diversity. Self-supervised submissions instead updated the network architecture and pretrained backbones. These results represent significant progress in the field, while highlighting avenues for future research, such as reducing interpolation artifacts at depth boundaries, improving self-supervised indoor performance and overall natural image accuracy.

Submitted 26 April, 2023; v1 submitted 14 April, 2023; originally announced April 2023.

Comments: Published at CVPRW2023

arXiv:2304.06219

Transport of intense ion beams in plasmas: collimation and energy-loss reduction

Authors: Yongtao Zhao, Benzheng Chen, Dong Wu, Rui Cheng, Xianming Zhou, Yu Lei, Yuyu Wang, Xin Qi, Guoqing Xiao, Jieru Ren, Xing Wang, Dieter H. H. Hoffmann, Fei Gao, Zhanghu Hu, Younian Wang, Wei Yu, Stephan Fritzsche, Xiantu He

Abstract: We compare the transport properties of a well-characterized hydrogen plasma for low- and high-current ion beams. The energy loss of low-current beams can be well understood within the framework of existing stopping-power models. However, for high-current proton beams, significant energy-loss reduction and collimation are observed in the experiment. We have developed a new particle-in-cell code, which includes both collective electromagnetic effects and collisional interactions. Our simulations indicate that resistive magnetic fields, induced by the transport of an intense proton beam, act to collimate the proton beam and simultaneously deplete the local plasma density along the beam path. This in turn causes the energy-loss reduction detected in the experiment.

Submitted 12 April, 2023; originally announced April 2023.

arXiv:2304.00962

RegionPLC: Regional Point-Language Contrastive Learning for Open-World 3D Scene Understanding

Authors: Jihan Yang, Runyu Ding, Weipeng Deng, Zhe Wang, Xiaojuan Qi

Abstract: We propose a lightweight and scalable Regional Point-Language Contrastive learning framework, namely RegionPLC, for open-world 3D scene understanding, aiming to identify and recognize open-set objects and categories. Specifically, based on our empirical studies, we introduce a 3D-aware SFusion strategy that fuses 3D vision-language pairs derived from multiple 2D foundation models, yielding high-quality, dense region-level language descriptions without human 3D annotations. Subsequently, we devise a region-aware point-discriminative contrastive learning objective to enable robust and effective 3D learning from dense regional language supervision. We carry out extensive experiments on the ScanNet, ScanNet200, and nuScenes datasets, and our model outperforms prior 3D open-world scene understanding approaches by an average of 17.2% and 9.1% for semantic and instance segmentation, respectively, while maintaining greater scalability and lower resource demands. Furthermore, our method can be readily integrated with language models to enable open-ended grounded 3D reasoning without extra task-specific training. Code is available at https://github.com/CVMI-Lab/PLA.

Submitted 5 May, 2024; v1 submitted 3 April, 2023; originally announced April 2023.

Comments: To appear in CVPR 2024. Project page: https://jihanyang.github.io/projects/RegionPLC

arXiv:2303.15713

Robust 3.7 V-Na$_{2/3}$[Cu$_{1/3}$Mn$_{2/3}$]O$_2$ Cathode for Na-ion Batteries

Authors: Xiaohui Rong, Xingguo Qi, Quan Zhou, Libin Kang, Dongdong Xiao, Ruijuan Xiao, Feixiang Ding, Yang Yang, Yuan Liu, Yun Su, Shiguang Zhang, Lunhua He, Yaxiang Lu, Liquan Chen, Yong-Sheng Hu

Abstract: Na-ion batteries (NIBs), which are recognized as a next-generation alternative technology for energy storage, still suffer from commercialization constraints due to the lack of low-cost, high-performance cathode materials. Since our first discovery of Cu$^{3+}$/Cu$^{2+}$ electrochemistry in 2014, numerous Cu-substituted/doped materials have been designed for NIBs. However, for almost ten years, the potential of Cu$^{3+}$/Cu$^{2+}$ electrochemistry has been grossly underappreciated, and it has usually been regarded as a semi-electrochemically active redox couple. Here, we re-synthesized P2-Na$_{2/3}$[Cu$_{1/3}$Mn$_{2/3}$]O$_2$ and reinterpreted it as a high-voltage, cost-efficient, air-stable, long-life, and high-rate cathode material for NIBs, which demonstrates a high operating voltage of 3.7 V and a completely active Cu$^{3+}$/Cu$^{2+}$ redox reaction. The 2.3 Ah cylindrical cells exhibit excellent cycling (93.1% capacity after 2000 cycles), high rate (97.2% capacity at 10C rate), good low-temperature performance (86.6% capacity at -30$^\circ$C), and high safety. On this basis, a 56 V/11.5 Ah battery pack for E-bikes was successfully constructed, exhibiting stable cycling (96.5% capacity at the 800th cycle) and a long driving distance (36 km, tester weight 65 kg). This work offers a commercially feasible cathode material for low-cost, high-voltage NIBs, paving the way for advanced NIBs in power and stationary energy storage applications.

Submitted 27 March, 2023; originally announced March 2023.

Comments: 15 pages, 3 figures, 1 table
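The pack-level figures in the Na-ion battery abstract above can be cross-checked with simple arithmetic. A minimal sketch, assuming that nominal pack energy equals nominal voltage times rated capacity, that the 36 km range used the full nominal energy, and that capacity fades linearly over cycling; none of these assumptions is stated in the abstract, and the function names are hypothetical. Under them, the 56 V, 11.5 Ah pack stores 644 Wh, or roughly 18 Wh per km over 36 km, and 93.1% retention after 2000 cycles corresponds to an average loss of about 0.0035% per cycle.

```python
def pack_energy_wh(voltage_v: float, capacity_ah: float) -> float:
    # Nominal energy (Wh) = nominal voltage (V) x rated capacity (Ah).
    return voltage_v * capacity_ah


def energy_per_km(energy_wh: float, distance_km: float) -> float:
    # Average consumption, assuming the whole nominal energy was used.
    return energy_wh / distance_km


def mean_fade_per_cycle(retention: float, cycles: int) -> float:
    # Average fractional capacity loss per cycle, assuming linear fade.
    return (1.0 - retention) / cycles


energy = pack_energy_wh(56.0, 11.5)          # 644.0 Wh
consumption = energy_per_km(energy, 36.0)    # Wh per km
fade = mean_fade_per_cycle(0.931, 2000)      # fraction per cycle
```

Real packs are rarely discharged to 0% state of charge, so the per-km figure is an upper bound on consumption under these assumptions.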

arXiv:2303.15181

DreamStone: Image as Stepping Stone for Text-Guided 3D Shape Generation

Authors: Zhengzhe Liu, Peng Dai, Ruihui Li, Xiaojuan Qi, Chi-Wing Fu

Abstract: In this paper, we present DreamStone, a new text-guided 3D shape generation approach that uses images as a stepping stone to bridge the gap between text and shape modalities, generating 3D shapes without requiring paired text and 3D data. The core of our approach is a two-stage feature-space alignment strategy that leverages a pre-trained single-view reconstruction (SVR) model to map CLIP features to shapes: first, the CLIP image feature is mapped to the detail-rich 3D shape space of the SVR model; then, the CLIP text feature is mapped to the 3D shape space by encouraging CLIP consistency between rendered images and the input text. In addition, to extend beyond the generative capability of the SVR model, we design a text-guided 3D shape stylization module that can enhance the output shapes with novel structures and textures. Further, we exploit pre-trained text-to-image diffusion models to enhance the generative diversity, fidelity, and stylization capability. Our approach is generic, flexible, and scalable, and it can be easily integrated with various SVR models to expand the generative space and improve the generative fidelity. Extensive experimental results demonstrate that our approach outperforms the state-of-the-art methods in terms of generative quality and consistency with the input text. Code and models are released at https://github.com/liuzhengzhe/DreamStone-ISS.

Submitted 23 September, 2023; v1 submitted 23 March, 2023; originally announced March 2023.

Comments: IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)

arXiv:2303.15169

doi 10.1088/1475-7516/2023/05/051

Inflation and Dark Matter in the $Z_5$ Model

Authors: XinXin Qi, Hao Sun

Abstract: We discuss the possibility of unifying dark matter physics and inflation in the $Z_5$ model of two-component dark matter. Inflation driven by the two-component dark matter fields can be divided into two cases, singlet dark matter inflation and mixed dark matter inflation; in the latter case, both components play the role of the inflaton. For dark matter, we focus on the mixed dark matter inflation case. We show a viable parameter space that satisfies the theoretical and dark matter relic density constraints in the case of successful inflation. It turns out that the dark matter density is dominated by the light component, which is consistent with the feature of the $Z_5$ model of two-component dark matter.

Submitted 28 April, 2023; v1 submitted 27 March, 2023; originally announced March 2023.

Journal ref: JCAP05(2023)051

arXiv:2303.14893

Context-Aware Transformer for 3D Point Cloud Automatic Annotation

Authors: Xiaoyan Qian, Chang Liu, Xiaojuan Qi, Siew-Chong Tan, Edmund Lam, Ngai Wong

Abstract: 3D automatic annotation has received increased attention since manually annotating 3D point clouds is laborious. However, existing methods are usually complicated, e.g., pipelined training for 3D foreground/background segmentation, cylindrical object proposals, and point completion. Furthermore, they often overlook the inter-object feature relation that is particularly informative for hard samples in 3D annotation. To this end, we propose a simple yet effective end-to-end Context-Aware Transformer (CAT) as an automated 3D-box labeler to generate precise 3D box annotations from 2D boxes, trained with a small number of human annotations. We adopt the general encoder-decoder architecture, where the CAT encoder consists of an intra-object encoder (local) and an inter-object encoder (global), performing self-attention along the sequence and batch dimensions, respectively. The former models intra-object interactions among points, and the latter extracts feature relations among different objects, thus boosting scene-level understanding. Through its local and global encoders, CAT can generate high-quality 3D box annotations with a streamlined workflow, allowing it to outperform the existing state of the art by up to 1.79% 3D AP on the hard task of the KITTI test set.

Submitted 26 March, 2023; originally announced March 2023.
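The two attention axes described in the CAT abstract above can be made concrete with a NumPy sketch. It assumes object features stacked as (objects, points, dim) and plain single-head dot-product self-attention with no learned projections, feed-forward layers, or masking. Attending over the point axis mixes points within each object (intra-object); swapping the first two axes makes attention mix the same point slot across objects (inter-object), which is a literal reading of "along the batch dimension", not necessarily CAT's exact layer. All names are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x):
    # Single-head dot-product self-attention over the second-to-last axis
    # of x with shape (..., L, D); queries = keys = values = x.
    d = x.shape[-1]
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(d)   # (..., L, L)
    return softmax(scores, axis=-1) @ x


def intra_object(feats):
    # feats: (num_objects, num_points, dim). Points attend to other points
    # of the same object only (sequence dimension).
    return self_attention(feats)


def inter_object(feats):
    # Swap axes so objects form the sequence: (num_points, num_objects, dim),
    # attend across objects, then swap back to the original layout.
    return np.swapaxes(self_attention(np.swapaxes(feats, 0, 1)), 0, 1)
```

With this layout, changing one object's features leaves every other object's intra-object output untouched, while the inter-object output of all objects can change, which is the local/global split the abstract describes.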

arXiv:2303.14727

You Only Need One Thing One Click: Self-Training for Weakly Supervised 3D Scene Understanding

Authors: Zhengzhe Liu, Xiaojuan Qi, Chi-Wing Fu

Abstract: 3D scene understanding, e.g., point cloud semantic and instance segmentation, often requires large-scale annotated training data, but point-wise labels are clearly too tedious to prepare. While some recent methods propose to train a 3D network with small percentages of point labels, we take the approach to an extreme and propose "One Thing One Click," meaning that the annotator only needs to label one point per object. To leverage these extremely sparse labels in network training, we design a novel self-training approach, in which we iteratively conduct the training and label propagation, facilitated by a graph propagation module. Also, we adopt a relation network to generate the per-category prototype to enhance the pseudo label quality and guide the iterative training. Besides, our model is compatible with 3D instance segmentation when equipped with a point-clustering strategy. Experimental results on both ScanNet-v2 and S3DIS show that our self-training approach, with extremely sparse annotations, outperforms all existing weakly supervised methods for 3D semantic and instance segmentation by a large margin, and our results are also comparable to those of the fully supervised counterparts. Code and models are available at https://github.com/liuzhengzhe/One-Thing-One-Click.

Submitted 9 September, 2023; v1 submitted 26 March, 2023; originally announced March 2023.

Comments: Extension of One Thing One Click (CVPR 2021) arXiv:2104.02246
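The self-training loop in the One Thing One Click abstract above alternates network training with label propagation from one annotated point per object. A minimal pure-Python sketch of the propagation idea, assuming classical iterative label propagation on a hand-built affinity graph with the clicked points clamped; it is not the paper's learned graph propagation module or relation network, and the names are hypothetical.

```python
def propagate_labels(adj, seeds, num_classes, iters=50):
    """Iterative label propagation with clamped seed labels.

    adj: dict node -> dict neighbor -> edge weight (symmetric affinities).
    seeds: dict node -> class index (the single "clicked" point per object).
    Returns dict node -> predicted class, or None if no label reached it.
    """
    # One-hot distributions at seeds, all-zero distributions elsewhere.
    dist = {n: [0.0] * num_classes for n in adj}
    for n, c in seeds.items():
        dist[n][c] = 1.0
    for _ in range(iters):
        new = {}
        for n, nbrs in adj.items():
            if n in seeds:
                new[n] = dist[n]          # annotated points never change
                continue
            total = sum(nbrs.values())
            acc = [0.0] * num_classes
            for m, w in nbrs.items():     # weighted average of neighbours
                for c in range(num_classes):
                    acc[c] += w * dist[m][c]
            new[n] = [a / total for a in acc] if total > 0 else acc
        dist = new
    return {
        n: max(range(num_classes), key=d.__getitem__) if sum(d) > 0 else None
        for n, d in dist.items()
    }
```

On a chain of points with a click at each end, the interior points split between the two labels according to graph distance, which is the kind of dense pseudo-label a self-training round can then learn from.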

arXiv:2303.13479

IST-Net: Prior-free Category-level Pose Estimation with Implicit Space Transformation

Authors: Jianhui Liu, Yukang Chen, Xiaoqing Ye, Xiaojuan Qi

Abstract: Category-level 6D pose estimation aims to predict the poses and sizes of unseen objects from a specific category. Thanks to prior deformation, which explicitly adapts a category-specific 3D prior (i.e., a 3D template) to a given object instance, prior-based methods have attained great success and become a major research stream. However, obtaining category-specific priors requires collecting a large number of 3D models, which is labor-intensive and often not feasible in practice. This motivates us to investigate whether priors are necessary to make prior-based methods effective. Our empirical study shows that the 3D prior itself is not what accounts for the high performance. The key is actually the explicit deformation process, which aligns camera and world coordinates supervised by world-space 3D models (also called canonical space). Inspired by these observations, we introduce a simple prior-free implicit space transformation network, namely IST-Net, to transform camera-space features to world-space counterparts and build correspondence between them in an implicit manner without relying on 3D priors. Besides, we design camera- and world-space enhancers to enrich the features with pose-sensitive information and geometrical constraints, respectively. Albeit simple, IST-Net achieves state-of-the-art performance with a prior-free design, with top inference speed on the REAL275 benchmark. Our code and models are available at https://github.com/CVMI-Lab/IST-Net.

Submitted 19 July, 2023; v1 submitted 23 March, 2023; originally announced March 2023.

Comments: Accepted by ICCV2023

arXiv:2303.11633

Learning Context-aware Classifier for Semantic Segmentation

Authors: Zhuotao Tian, Jiequan Cui, Li Jiang, Xiaojuan Qi, Xin Lai, Yixin Chen, Shu Liu, Jiaya Jia

Abstract: Semantic segmentation is still a challenging task for parsing diverse contexts in different scenes, so a fixed classifier might not be able to address varying feature distributions well during testing. Different from the mainstream literature, where the efficacy of strong backbones and effective decoder heads has been well studied, in this paper additional contextual hints are instead exploited by learning a context-aware classifier whose content is data-conditioned, adapting to different latent distributions. Since only the classifier is dynamically altered, our method is model-agnostic and can be easily applied to generic segmentation models. Notably, with only negligible additional parameters and +2% inference time, decent performance gains have been achieved on both small and large models on challenging benchmarks, demonstrating the substantial practical merits of our simple yet effective method. The implementation is available at https://github.com/tianzhuotao/CAC.

Submitted 21 March, 2023; originally announced March 2023.

Comments: AAAI 2023. Code and models are available at https://github.com/tianzhuotao/CAC

arXiv:2303.11301

VoxelNeXt: Fully Sparse VoxelNet for 3D Object Detection and Tracking

Authors: Yukang Chen, Jianhui Liu, Xiangyu Zhang, Xiaojuan Qi, Jiaya Jia

Abstract: 3D object detectors usually rely on hand-crafted proxies, e.g., anchors or centers, and translate well-studied 2D frameworks to 3D. Thus, sparse voxel features need to be densified and processed by dense prediction heads, which inevitably costs extra computation. In this paper, we instead propose VoxelNeXt for fully sparse 3D object detection. Our core insight is to predict objects directly based on sparse voxel features, without relying on hand-crafted proxies. Our strong sparse convolutional network VoxelNeXt detects and tracks 3D objects entirely through voxel features. It is an elegant and efficient framework, with no need for sparse-to-dense conversion or NMS post-processing. Our method achieves a better speed-accuracy trade-off than other mainstream detectors on the nuScenes dataset. For the first time, we show that a fully sparse voxel-based representation works decently for LiDAR 3D object detection and tracking. Extensive experiments on nuScenes, Waymo, and Argoverse2 benchmarks validate the effectiveness of our approach. Without bells and whistles, our model outperforms all existing LiDAR methods on the nuScenes tracking test benchmark.

Submitted 20 March, 2023; originally announced March 2023.

Comments: In CVPR 2023. Code and models are available at https://github.com/dvlab-research/VoxelNeXt

arXiv:2303.09152

Learning a Room with the Occ-SDF Hybrid: Signed Distance Function Mingled with Occupancy Aids Scene Representation

Authors: Xiaoyang Lyu, Peng Dai, Zizhang Li, Dongyu Yan, Yi Lin, Yifan Peng, Xiaojuan Qi

Abstract: Implicit neural rendering, which uses signed distance function (SDF) representation with geometric priors (such as depth or surface normal), has led to impressive progress in the surface reconstruction of large-scale scenes. However, applying this method to reconstruct a room-level scene from images may miss structures in low-intensity areas or small and thin objects. We conducted experiments on three datasets to identify limitations of the original color rendering loss and priors-embedded SDF scene representation. We found that the color rendering loss results in an optimization bias against low-intensity areas, causing vanishing gradients and leaving these areas unoptimized. To address this issue, we propose a feature-based color rendering loss that utilizes non-zero feature values to bring back optimization signals. Additionally, the SDF representation can be influenced by objects along a ray path, disrupting the monotonic change of SDF values when a single object is present. To counteract this, we explore using the occupancy representation, which encodes each point separately and is unaffected by objects along a querying ray. Our experimental results demonstrate that the combination of the feature-based rendering loss and the Occ-SDF hybrid representation scheme can provide high-quality reconstruction results, especially in challenging room-level scenarios. The code will be released.

Submitted 16 March, 2023; originally announced March 2023.
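The vanishing-gradient argument in the Occ-SDF abstract above can be seen on a single ray. A minimal sketch, assuming standard NeRF-style alpha compositing with a constant sample spacing (the paper's SDF-to-density conversion is not modeled): the rendered value is linear in the per-sample colors, so its derivative with respect to any density is scaled by those colors, and a ray through near-black samples sends almost no signal back to the geometry. This is the motivation the abstract gives for rendering non-zero feature values instead. Function names are hypothetical.

```python
import math


def render(sigmas, values, delta):
    # Alpha compositing along one ray with constant sample spacing delta:
    # C = sum_i T_i * alpha_i * v_i, alpha_i = 1 - exp(-sigma_i * delta),
    # where T_i is the transmittance accumulated before sample i.
    transmittance, out = 1.0, 0.0
    for sigma, v in zip(sigmas, values):
        alpha = 1.0 - math.exp(-sigma * delta)
        out += transmittance * alpha * v
        transmittance *= 1.0 - alpha
    return out


def density_grad(sigmas, values, delta, k, eps=1e-6):
    # Central finite difference of the rendered value w.r.t. sigma_k,
    # i.e. the signal a photometric loss could pass back to the geometry.
    up, down = list(sigmas), list(sigmas)
    up[k] += eps
    down[k] -= eps
    return (render(up, values, delta) - render(down, values, delta)) / (2 * eps)
```

Scaling every color on the ray by 0.01 scales this gradient by 0.01, and an all-black ray yields exactly zero, regardless of how wrong the densities are.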

arXiv:2303.07939

Measurement of hyperfine structure and the Zemach radius in $\rm^6Li^+$ using optical Ramsey technique

Authors: Wei Sun, Pei-Pei Zhang, Peng-peng Zhou, Shao-long Chen, Zhi-qiang Zhou, Yao Huang, Xiao-Qiu Qi, Zong-Chao Yan, Ting-Yun Shi, G. W. F. Drake, Zhen-Xiang Zhong, Hua Guan, Ke-lin Gao

Abstract: We investigate the $2\,^3\!S_1$--$2\,^3\!P_J$ ($J = 0, 1, 2$) transitions in $\rm^6Li^+$ using the optical Ramsey technique and achieve the most precise values of the hyperfine splittings of the $2\,^3\!S_1$ and $2\,^3\!P_J$ states, with the smallest uncertainty of about 10~kHz. The present results reduce the uncertainties of previous experiments by a factor of 5 for the $2\,^3\!S_1$ state and a factor of 50 for the $2\,^3\!P_J$ states, and are in better agreement with theoretical values. Combining our measured hyperfine intervals of the $2\,^3\!S_1$ state with the latest quantum electrodynamic (QED) calculations, we determine the improved Zemach radius of the $\rm^6Li$ nucleus to be 2.44(2)~fm, with the uncertainty entirely due to the uncalculated QED effects of order $m\alpha^7$. The result is in sharp disagreement with the value 3.71(16)~fm determined from simple models of the nuclear charge and magnetization distribution. We call for a more definitive nuclear physics value of the $\rm^6Li$ Zemach radius.

Submitted 18 March, 2023; v1 submitted 14 March, 2023; originally announced March 2023.

Comments: 6 pages, 6 figures
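For reference, the idealized Ramsey lineshape (two short $\pi/2$ interactions separated by a free-evolution time $T$, detuning $\delta$ in Hz; a textbook simplification, not the exact scheme of this experiment) shows why separated interactions narrow the resonance: the central fringe has a full width at half maximum of $1/(2T)$. The second line quantifies the "sharp disagreement" in the Zemach radius quoted in the abstract above, assuming the two uncertainties are independent and Gaussian.

$$P(\delta) = \cos^2(\pi\delta T) = \tfrac{1}{2}\left[1 + \cos(2\pi\delta T)\right], \qquad \Delta\nu_{\mathrm{FWHM}} = \frac{1}{2T}$$

$$\frac{|3.71 - 2.44|}{\sqrt{0.02^2 + 0.16^2}} = \frac{1.27}{0.161} \approx 7.9\,\sigma$$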

arXiv:2303.03910

JUNO sensitivity to $^7$Be, $pep$, and CNO solar neutrinos

Authors: Angel Abusleme, Thomas Adam, Shakeel Ahmad, Rizwan Ahmed, Sebastiano Aiello, Muhammad Akram, Abid Aleem, Tsagkarakis Alexandros, Fengpeng An, Qi An, Giuseppe Andronico, Nikolay Anfimov, Vito Antonelli, Tatiana Antoshkina, Burin Asavapibhop, João Pedro Athayde Marcondes de André, Didier Auguste, Weidong Bai, Nikita Balashov, Wander Baldini, Andrea Barresi, Davide Basilico, Eric Baussan, Marco Bellato, Marco Beretta, et al. (592 additional authors not shown)

Abstract: The Jiangmen Underground Neutrino Observatory (JUNO), the first multi-kton liquid scintillator detector, which is under construction in China, will have a unique potential to perform a real-time measurement of solar neutrinos well below the few-MeV threshold typical of water Cherenkov detectors. JUNO's large target mass and excellent energy resolution are prerequisites for reaching unprecedented levels of precision. In this paper, we provide an estimate of the JUNO sensitivity to $^7$Be, $pep$, and CNO solar neutrinos that can be obtained via a spectral analysis above the 0.45 MeV threshold. This study is performed assuming different scenarios of the liquid scintillator radiopurity, ranging from the most optimistic one, corresponding to the radiopurity levels obtained by the Borexino experiment, up to the minimum requirements needed to determine the neutrino mass ordering with reactor antineutrinos, the main goal of JUNO. Our study shows that in most scenarios, JUNO will be able to improve the current best measurements of the $^7$Be, $pep$, and CNO solar neutrino fluxes. We also study JUNO's capability to detect periodic time variations in the solar neutrino flux, such as the day-night modulation induced by neutrino flavor regeneration in Earth, and the modulations induced by temperature changes driven by helioseismic waves.

Submitted 7 March, 2023; originally announced March 2023.
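The day-night modulation mentioned in the JUNO abstract above is conventionally summarized by the asymmetry $A_{DN} = 2(N - D)/(N + D)$, with $D$ and $N$ the event rates during daytime and nighttime exposure. A minimal sketch, assuming rates are counts divided by live time and that counting errors are Poisson and propagated to first order; it ignores backgrounds and the spectral fit the paper relies on, and the function name is hypothetical.

```python
import math


def day_night_asymmetry(day_counts, day_time, night_counts, night_time):
    """Return (A_DN, sigma) with A_DN = 2 (N - D) / (N + D).

    D and N are event rates (counts / live time). The uncertainty assumes
    Poisson-distributed counts and first-order error propagation.
    """
    d = day_counts / day_time
    n = night_counts / night_time
    a = 2 * (n - d) / (n + d)
    # Partial derivatives of A_DN with respect to each rate.
    da_dd = -4 * n / (n + d) ** 2
    da_dn = 4 * d / (n + d) ** 2
    sigma_d = math.sqrt(day_counts) / day_time
    sigma_n = math.sqrt(night_counts) / night_time
    sigma = math.hypot(da_dd * sigma_d, da_dn * sigma_n)
    return a, sigma
```

Because the statistical error shrinks only as one over the square root of the counts, a small asymmetry needs a large exposure to be resolved, which is where a large target mass helps.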

arXiv:2303.01765

Diverse 3D Hand Gesture Prediction from Body Dynamics by Bilateral Hand Disentanglement

Authors: Xingqun Qi, Chen Liu, Muyi Sun, Lincheng Li, Changjie Fan, Xin Yu

Abstract: Predicting natural and diverse 3D hand gestures from upper-body dynamics is a practical yet challenging task in virtual avatar creation. Previous works usually overlook the asymmetric motions between the two hands and generate both hands in a holistic manner, leading to unnatural results. In this work, we introduce a novel two-stage 3D hand generation method based on bilateral hand disentanglement to achieve natural and diverse 3D hand prediction from body dynamics. In the first stage, we generate natural hand gestures with two hand-disentanglement branches. Considering the asymmetric gestures and motions of the two hands, we introduce a Spatial-Residual Memory (SRM) module to model the spatial interaction between the body and each hand through residual learning. To enhance the coordination of the two hands' motions with respect to body dynamics holistically, we then present a Temporal-Motion Memory (TMM) module. TMM can effectively model the temporal association between body dynamics and the motions of both hands. The second stage is built upon the insight that 3D hand predictions should be non-deterministic given the sequential body postures. Thus, we further diversify our 3D hand predictions based on the initial output from stage one. Concretely, we propose a Prototypical-Memory Sampling Strategy (PSS) to generate non-deterministic hand gestures by gradient-based Markov Chain Monte Carlo (MCMC) sampling. Extensive experiments demonstrate that our method outperforms the state-of-the-art models on the B2H dataset and our newly collected TED Hands dataset. The dataset and code are available at https://github.com/XingqunQi-lab/Diverse-3D-Hand-Gesture-Prediction.

Submitted 20 March, 2023; v1 submitted 3 March, 2023; originally announced March 2023.

Comments: Accepted at CVPR 2023

arXiv:2303.00369 [pdf, other]

Indescribable Multi-modal Spatial Evaluator

Authors: Lingke Kong, X. Sharon Qi, Qijin Shen, Jiacheng Wang, Jingyi Zhang, Yanle Hu, Qichao Zhou

Abstract: Multi-modal image registration spatially aligns two images with different distributions. One of its major challenges is that images acquired from different imaging machines have different imaging distributions, making it difficult to focus only on the spatial aspect of the images and ignore differences in distributions. In this study, we developed a self-supervised approach, Indescribable Multi-modal Spatial Evaluator (IMSE), to address multi-modal image registration. IMSE creates an accurate multi-modal spatial evaluator to measure spatial differences between two images, and then optimizes registration by minimizing the error predicted by the evaluator. To optimize IMSE performance, we also proposed a new style enhancement method called Shuffle Remap, which splits the image distribution into multiple segments and then randomly shuffles and remaps these segments, so that the distribution of the original image is changed. Shuffle Remap can help IMSE to predict the difference in spatial location from unseen target distributions. Our results show that IMSE outperformed the existing methods for registration using T1-T2 and CT-MRI datasets. IMSE can also be easily integrated into the traditional registration process, and can provide a convenient way to evaluate and visualize registration results. IMSE also has the potential to be used as a new paradigm for image-to-image translation. Our code is available at https://github.com/Kid-Liet/IMSE.

Submitted 1 March, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

Comments: Accepted by CVPR2023
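
The Shuffle Remap idea described above (split the intensity distribution into segments, shuffle them, remap) can be illustrated with a minimal sketch. This is a hypothetical reading of the abstract, not the IMSE code: it assumes intensities normalized to [0, 1], equal-width segments, and a flat list of pixel values; the function name `shuffle_remap` is illustrative only.

```python
import random

def shuffle_remap(pixels, num_segments=4, seed=None):
    """Hypothetical Shuffle-Remap-style augmentation (not the authors' code).

    Assumes intensities in [0, 1]. The range is split into `num_segments`
    equal-width bins, the bins are randomly permuted, and each pixel keeps
    its relative offset inside the bin it is moved to. Spatial layout is
    untouched; only the intensity distribution changes.
    """
    rng = random.Random(seed)
    order = list(range(num_segments))
    rng.shuffle(order)  # order[i] = destination bin for source bin i
    width = 1.0 / num_segments
    out = []
    for v in pixels:
        src = min(int(v / width), num_segments - 1)  # source bin (clamp v == 1.0)
        offset = v - src * width                      # position inside the bin
        out.append(order[src] * width + offset)
    return out
```

Because only whole segments move, the ordering of intensities inside each segment is preserved, which is one plausible way to change the "style" of an image while keeping its structure registrable.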

arXiv:2302.10050 [pdf, ps, other]

Pionic transitions from $Z_c(4020)$ to $D$ wave charmonia

Authors: Xiao-Yu Qi, Qi Wu, Dian-Yong Chen

Abstract: In the present work, we investigate the contributions of charmed meson loops to the pionic transitions from $Z_c(4020)^+$ to the $D$-wave charmonium triplet using an effective Lagrangian approach. Our estimates indicate that the predicted branching fractions of $Z_c(4020)^+ \to \pi^+ \psi(1^3D_J)$, $J=1,2,3$, are much smaller than that of $Z_c(4020)^+ \to \pi^+ h_c$. Thus, searching for $Z_c(4020)^\pm$ in the $\pi^\pm \psi(1^3D_J)$ invariant mass distributions is impossible. Consequently, the observed peak structures at 4.04 and 4.13 GeV in the $\pi^\pm \psi(3770)$ invariant mass distributions should not come from the contributions of $Z_c(4020)^\pm$, and further precise experimental measurements of the $e^+ e^- \to \pi^+ \pi^- \psi(3770)$ process are needed to decode the nature of these two peak structures.

Submitted 16 October, 2023; v1 submitted 20 February, 2023; originally announced February 2023.

Comments: 9 pages, 5 figures, accepted for publication in EPJC

arXiv:2302.09621 [pdf]

Augmenting endometriosis analysis from ultrasound data with deep learning

Authors: Adrian Balica, Jennifer Dai, Kayla Piiwaa, Xiao Qi, Ashlee N. Green, Nancy Phillips, Susan Egan, Ilker Hacihaliloglu

Abstract: Endometriosis is a non-malignant disorder that affects 176 million women globally. Diagnostic delays result in severe dysmenorrhea, dyspareunia, chronic pelvic pain, and infertility. Therefore, there is a significant need to diagnose patients at an early stage. Our objective in this work is to investigate the potential of deep learning methods to classify endometriosis from ultrasound data. Retrospective data from 100 subjects were collected at the Rutgers Robert Wood Johnson University Hospital (New Brunswick, NJ, USA). Endometriosis was diagnosed via laparoscopy or laparotomy. We designed and trained five different deep learning models (Xception, Inception-V4, ResNet50, DenseNet, and EfficientNetB2) for the classification of endometriosis from ultrasound data. Using a 5-fold cross-validation study, we achieved an average area under the receiver operating characteristic curve (AUC) of 0.85 and 0.90, respectively, for the two evaluation studies.

Submitted 19 February, 2023; originally announced February 2023.

Comments: Accepted to 2023 SPIE Medical Imaging Conference
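
For readers unfamiliar with the 5-fold protocol mentioned above, a minimal sketch of subject-level fold construction and score averaging follows. It is a generic illustration, not the study's pipeline: splitting by subject (so no subject appears in both train and test) is an assumption, and `evaluate` is a hypothetical callback standing in for "train a classifier, return its AUC on the held-out fold".

```python
import random

def k_fold_indices(n_subjects, k=5, seed=0):
    """Shuffle subject indices and deal them into k disjoint, near-equal folds."""
    idx = list(range(n_subjects))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(n_subjects, evaluate, k=5, seed=0):
    """Hold out each fold once and return the mean of the k fold scores.

    `evaluate(train_ids, test_ids)` is a hypothetical callback that trains on
    train_ids and returns a metric (e.g. AUC) computed on test_ids.
    """
    folds = k_fold_indices(n_subjects, k, seed)
    scores = []
    for i, test_ids in enumerate(folds):
        train_ids = [s for j, f in enumerate(folds) if j != i for s in f]
        scores.append(evaluate(train_ids, test_ids))
    return sum(scores) / k
```

With 100 subjects this yields five folds of 20, each used exactly once as the test set; the reported figure is the mean over folds.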

arXiv:2302.06107 [pdf, ps, other]

Representation type of blocks of cyclotomic Hecke algebras of type $G(r, 1, n)$

Authors: Yanbo Li, Xiangyu Qi

Abstract: Let $K$ be an algebraically closed field with $\operatorname{char} K \neq 2$ and $(s_1, s_2, \cdots, s_r)\in \mathbb{Z}^r$ a multicharge with $r>2$. Let $\mathcal{H}_n(q, Q)$ be a cyclotomic Hecke algebra of type $G(r, 1, n)$, where $q\neq 0, 1$ and $Q=(q^{s_1}, q^{s_2}, \cdots, q^{s_r})$. For each block $B$ of $\mathcal{H}_n(q, Q)$, we introduce a new invariant, called the block move vector, which can be considered a generalization of the weight $w(B)$. Using block move vectors, we prove that block $B$ has finite representation type if and only if $w(B)<2$, or $B$ is Morita equivalent to $K[x]/(x^{w(B)+1})$. Blocks of finite representation type with weight greater than one are determined completely by block move vectors. This result implies that some blocks of finite type are Brauer tree algebras whose Brauer trees have an exceptional vertex. We also determine the representation type of all blocks of cyclotomic $q$-Schur algebras. Moreover, using our result, we construct examples of blocks with the same weight that are not derived equivalent. Examples of derived equivalent blocks lying in different orbits under the adjoint action of the affine Weyl group are also given.

Submitted 13 February, 2023; originally announced February 2023.

Comments: 66 pages

arXiv:2301.13007 [pdf, other]

doi 10.54364/aaiml.2023.1152

EuclidNet: Deep Visual Reasoning for Constructible Problems in Geometry

Authors: Man Fai Wong, Xintong Qi, Chee Wei Tan

Abstract: In this paper, we present a deep learning-based framework for solving geometric construction problems through visual reasoning, which is useful for automated geometry theorem proving. Constructible problems in geometry often ask for the sequence of straightedge-and-compass constructions needed to reach a given goal from some initial setup. Our EuclidNet framework leverages the Mask R-CNN neural network architecture to extract visual features from the initial setup and goal configuration with extra points of intersection, and then generates possible construction steps as intermediary data models that are used as feedback in the training process for further refinement of the construction step sequence. This process is repeated recursively until either a solution is found, in which case we backtrack the path for a step-by-step construction guide, or the problem is identified as unsolvable. Our EuclidNet framework is validated on complex Japanese Sangaku geometry problems, demonstrating its capacity to leverage backtracking for deep visual reasoning of challenging problems.

Submitted 27 December, 2022; originally announced January 2023.

Comments: Accepted by 2nd MATH-AI Workshop at NeurIPS'22

Journal ref: Adv. Artif. Intell. Mach. Learn.(2023), 3(1):839-852

arXiv:2301.12576 [pdf, other]

Uncovering Adversarial Risks of Test-Time Adaptation

Authors: Tong Wu, Feiran Jia, Xiangyu Qi, Jiachen T. Wang, Vikash Sehwag, Saeed Mahloujifar, Prateek Mittal

Abstract: Recently, test-time adaptation (TTA) has been proposed as a promising solution for addressing distribution shifts. It allows a base model to adapt to an unforeseen distribution during inference by leveraging the information from the batch of (unlabeled) test data. However, we uncover a novel security vulnerability of TTA based on the insight that predictions on benign samples can be impacted by malicious samples in the same batch. To exploit this vulnerability, we propose Distribution Invading Attack (DIA), which injects a small fraction of malicious data into the test batch. DIA causes models using TTA to misclassify benign and unperturbed test data, providing an entirely new capability for adversaries that is infeasible in canonical machine learning pipelines. Through comprehensive evaluations, we demonstrate the high effectiveness of our attack on multiple benchmarks across six TTA methods. In response, we investigate two countermeasures to robustify the existing insecure TTA implementations, following the principle of "security by design". Together, we hope our findings can make the community aware of the utility-security tradeoffs in deploying TTA and provide valuable insights for developing robust TTA approaches.

Submitted 4 February, 2023; v1 submitted 29 January, 2023; originally announced January 2023.
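
The key insight above (predictions on benign samples depend on other samples in the same test batch) can be seen in a toy sketch. This is not DIA or any specific TTA method: it assumes a one-dimensional input, normalization with the current batch's mean and standard deviation (as test-time batch-norm adaptation does), and a sign classifier; `batchnorm_predict` is a hypothetical name.

```python
import statistics

def batchnorm_predict(batch, threshold=0.0):
    """Toy test-time-adapted classifier (illustrative assumption, not a real TTA method).

    Each input is normalized with the mean and population std of the *current*
    test batch, then labeled 1 if the normalized value exceeds `threshold`.
    """
    mu = statistics.mean(batch)
    sigma = statistics.pstdev(batch) or 1.0  # avoid division by zero for constant batches
    return [int((x - mu) / sigma > threshold) for x in batch]

benign = [1.0, 2.0, 3.0, 4.0]
print(batchnorm_predict(benign))            # clean batch
print(batchnorm_predict(benign + [100.0]))  # same benign inputs plus one injected sample
```

Adding a single extreme value shifts the batch mean from 2.5 to 22, so the unchanged benign inputs 3.0 and 4.0 flip from label 1 to label 0. A classifier that normalizes with fixed training statistics would not be affected this way.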

arXiv:2301.09544 [pdf, other]

Learning to View: Decision Transformers for Active Object Detection

Authors: Wenhao Ding, Nathalie Majcherczyk, Mohit Deshpande, Xuewei Qi, Ding Zhao, Rajasimman Madhivanan, Arnie Sen

Abstract: Active perception describes a broad class of techniques that couple planning and perception systems to move the robot so that it gains more information about the environment. In most robotic systems, perception is typically independent of motion planning. For example, traditional object detection is passive: it operates only on the images it receives. However, the results may improve if we allow planning to consume detection signals and move the robot to collect views that maximize the quality of the results. In this paper, we use reinforcement learning (RL) methods to control the robot in order to obtain images that maximize the detection quality. Specifically, we propose using a Decision Transformer with online fine-tuning, which first optimizes the policy with a pre-collected expert dataset and then improves the learned policy by exploring better solutions in the environment. We evaluate the performance of the proposed method on an interactive dataset collected from an indoor scenario simulator. Experimental results demonstrate that our method outperforms all baselines, including the expert policy and pure offline RL methods. We also provide exhaustive analyses of the reward distribution and observation space.

Submitted 23 January, 2023; originally announced January 2023.

Comments: Accepted to ICRA 2023

arXiv:2301.01100 [pdf, other]

Understanding Imbalanced Semantic Segmentation Through Neural Collapse

Authors: Zhisheng Zhong, Jiequan Cui, Yibo Yang, Xiaoyang Wu, Xiaojuan Qi, Xiangyu Zhang, Jiaya Jia

Abstract: A recent study has shown a phenomenon called neural collapse, in which the within-class means of features and the classifier weight vectors converge to the vertices of a simplex equiangular tight frame at the terminal phase of training for classification. In this paper, we explore the corresponding structures of the last-layer feature centers and classifiers in semantic segmentation. Based on our empirical and theoretical analysis, we point out that semantic segmentation naturally brings contextual correlation and imbalanced distribution among classes, which breaks the equiangular and maximally separated structure of neural collapse for both feature centers and classifiers. However, such a symmetric structure is beneficial to discrimination for the minor classes. To preserve these advantages, we introduce a regularizer on feature centers to encourage the network to learn features closer to the appealing structure in imbalanced semantic segmentation. Experimental results show that our method can bring significant improvements on both 2D and 3D semantic segmentation benchmarks. Moreover, our method ranks 1st and sets a new record (+6.8% mIoU) on the ScanNet200 test leaderboard. Code will be available at https://github.com/dvlab-research/Imbalanced-Learning.

Submitted 3 January, 2023; originally announced January 2023.

Comments: Technical Report
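
The "simplex equiangular tight frame" named above has a standard closed form, $M = \sqrt{K/(K-1)}\,(I_K - \tfrac{1}{K}\mathbf{1}\mathbf{1}^\top)$, whose $K$ vectors have unit norm, sum to zero, and share pairwise cosine $-1/(K-1)$. The sketch below builds it in $\mathbb{R}^K$ and checks those properties; it illustrates the geometry only and is not the paper's regularizer. In a real network the frame would additionally be rotated into the feature dimension by an orthonormal matrix, which is omitted here.

```python
import math

def simplex_etf(k):
    """Rows of sqrt(k/(k-1)) * (I - (1/k) * 11^T): k unit vectors in R^k
    with pairwise cosine -1/(k-1), i.e. a simplex equiangular tight frame."""
    scale = math.sqrt(k / (k - 1))
    return [[scale * ((1.0 if i == j else 0.0) - 1.0 / k) for j in range(k)]
            for i in range(k)]

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))
```

For $K = 4$ classes every pair of class vectors meets at cosine $-1/3$, the largest possible equal separation; imbalance between classes is what the abstract says breaks this symmetry.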

arXiv:2301.00145 [pdf, other]

Attentional Graph Convolutional Network for Structure-aware Audio-Visual Scene Classification

Authors: Liguang Zhou, Yuhongze Zhou, Xiaonan Qi, Junjie Hu, Tin Lun Lam, Yangsheng Xu

Abstract: Audio-visual scene understanding is a challenging problem due to the unstructured spatial-temporal relations that exist in the audio signals and the spatial layouts of different objects and various texture patterns in the visual images. Recently, many studies have focused on abstracting features from convolutional neural networks, while the learning of explicit semantically relevant frames of sound signals and visual images has been overlooked. To this end, we present an end-to-end framework, namely the attentional graph convolutional network (AGCN), for structure-aware audio-visual scene representation. First, the sound spectrogram and the input image are processed by a backbone network for feature extraction. Then, to build multi-scale hierarchical information of input features, we utilize an attention fusion mechanism to aggregate features from multiple layers of the backbone network. Notably, to represent the salient regions and contextual information of audio-visual inputs well, the salient acoustic graph (SAG), contextual acoustic graph (CAG), salient visual graph (SVG), and contextual visual graph (CVG) are constructed for the audio-visual scene representation. Finally, the constructed graphs pass through a graph convolutional network for structure-aware audio-visual scene recognition. Extensive experimental results on the audio, visual, and audio-visual scene recognition datasets show that the AGCN method achieves promising results. Visualizations of the graphs on the spectrograms and images are presented to show the effectiveness of the proposed CAG/SAG and CVG/SVG, which can focus on the salient and semantically relevant regions.

Submitted 31 December, 2022; originally announced January 2023.

arXiv:2212.13771 [pdf, other]

Exploring Vision Transformers as Diffusion Learners

Authors: He Cao, Jianan Wang, Tianhe Ren, Xianbiao Qi, Yihao Chen, Yuan Yao, Lei Zhang

Abstract: Score-based diffusion models have captured widespread attention and fueled fast progress in recent vision generative tasks. In this paper, we focus on the diffusion model backbone, which has been much neglected before. We systematically explore vision Transformers as diffusion learners for various generative tasks. With our improvements, the performance of the vanilla ViT-based backbone (IU-ViT) is boosted to be on par with traditional U-Net-based methods. We further provide a hypothesis on the implication of disentangling the generative backbone as an encoder-decoder structure and show proof-of-concept experiments verifying the effectiveness of a stronger encoder for generative tasks with ASymmetriC ENcoder Decoder (ASCEND). Our improvements achieve competitive results on CIFAR-10, CelebA, LSUN, CUB Bird and large-resolution text-to-image tasks. To the best of our knowledge, we are the first to successfully train a single diffusion model on the text-to-image task beyond 64x64 resolution. We hope this will motivate people to rethink the modeling choices and the training pipelines for diffusion-based generative models.

Submitted 28 December, 2022; originally announced December 2022.

arXiv:2212.13341 [pdf]

doi 10.1038/s41467-024-44765-7

Unconventionally Fast Transport through Sliding Dynamics of Rodlike Particles in Macromolecular Networks

Authors: Xuanyu Zhang, Xiaobin Dai, Md Ahsan Habib, Ziyang Xu, Lijuan Gao, Wenlong Chen, Wenjie Wei, Zhongqiu Tang, Xianyu Qi, Xiangjun Gong, Lingxiang Jiang, Li-Tang Yan

Abstract: Transport of rodlike particles in confinement environments of macromolecular networks plays crucial roles in many important biological processes and technological applications. The relevant understanding has been limited to thin rods with diameter much smaller than network mesh size, although the opposite case, of which the dynamical behaviors and underlying physical mechanisms remain unclear, is ubiquitous. Here, we solve this issue by combining experiments, simulations and theory. We find a nonmonotonic dependence of translational diffusion on rod length, characterized by length commensuration-governed unconventionally fast dynamics which is in striking contrast to the monotonic dependence for thin rods. Our results clarify that such a fast diffusion of thick rods with length of integral multiple of mesh size follows sliding dynamics and demonstrate it to be "anomalous yet Brownian". Moreover, good agreement between theoretical analysis and simulations corroborates that the sliding dynamics is an intermediate regime between hopping and Brownian dynamics, and provides a mechanistic interpretation based on the rod-length dependent entropic free energy barrier. The findings yield a principle, that is, length commensuration, for optimal design of rodlike particles with highly efficient transport in confined environments of macromolecular networks, and might enrich the physics of the diffusion dynamics in heterogeneous media.

Submitted 19 November, 2023; v1 submitted 26 December, 2022; originally announced December 2022.

arXiv:2212.05537 [pdf, other]

Technical Debt Management in OSS Projects: An Empirical Study on GitHub

Authors: Zengyang Li, Yilin Peng, Peng Liang, Apostolos Ampatzoglou, Ran Mo, Hui Liu, Xiaoxiao Qi

Abstract: Technical debt (TD) refers to delayed tasks and immature artifacts that may bring short-term benefits but incur extra costs of change during maintenance and evolution in the long term. TD has been extensively studied in the past decade, and numerous open source software (OSS) projects were used to explore specific aspects of TD and validate various approaches for TD management (TDM). However, there is still no comprehensive understanding of TDM practice in OSS development, that is, of how the OSS community perceives the TD concept and how TD is managed in OSS development. To this end, we conducted an empirical study across all of GitHub to explore the adoption and execution of TDM based on issues in OSS projects. We collected a total of 35,278 issues labeled as TD (TD issues), distributed over 3,598 repositories, from the issue tracking system of GitHub between 2009 and 2020. The findings are that: (1) the OSS community is embracing the TD concept; (2) the analysis of TD instances shows that TD may affect both internal and external quality of software systems; (3) only one TD issue was identified in 31.1% of the repositories, and all TD issues were identified by only one developer in 69.0% of the repositories; (4) TDM was ignored in 27.3% of the repositories after TD issues were identified; and (5) among the repositories with TD labels, 32.9% have abandoned TDM while only 8.2% adopt TDM as a consistent practice. These findings provide valuable insights for practitioners in TDM and promising research directions for further investigation.

Submitted 11 December, 2022; originally announced December 2022.

Comments: 15 pages, 8 images, 10 tables, Manuscript submitted to a Journal (2022)

arXiv:2212.05326 [pdf, other]

Vertical Layering of Quantized Neural Networks for Heterogeneous Inference

Authors: Hai Wu, Ruifei He, Haoru Tan, Xiaojuan Qi, Kaibin Huang

Abstract: Although considerable progress has been made in neural network quantization for efficient inference, existing methods do not scale to heterogeneous devices, as one dedicated model needs to be trained, transmitted, and stored for each specific hardware setting, incurring considerable costs in model training and maintenance. In this paper, we study a new vertical-layered representation of neural network weights that encapsulates all quantized models in a single one. With this representation, we can theoretically obtain a network of any precision for on-demand service while only needing to train and maintain one model. To this end, we propose a simple once quantization-aware training (QAT) scheme for obtaining high-performance vertical-layered models. Our design incorporates a cascade downsampling mechanism which allows us to obtain multiple quantized networks from one full precision source model by progressively mapping the higher precision weights to their adjacent lower precision counterparts. Then, with networks of different bit-widths from one source model, multi-objective optimization is employed to train the shared source model weights such that they can be updated simultaneously, considering the performance of all networks. By doing this, the shared weights will be optimized to balance the performance of different quantized models, thus making the weights transferable among different bit-widths. Experiments show that the proposed vertical-layered representation and the developed once QAT scheme are effective in embodying multiple quantized networks in a single one and allow one-time training, and they deliver performance comparable to that of quantized models tailored to any specific bit-width. Code will be available.

Submitted 10 December, 2022; originally announced December 2022.

Comments: Submitted to IEEE for possible publication
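
The cascade downsampling step described above (progressively mapping higher-precision weights to their adjacent lower-precision counterparts) can be pictured with a minimal integer sketch. The mapping used here, a rounding right shift with clamping on unsigned integer weights, is an assumption chosen for illustration; the abstract does not specify the paper's actual mapping, and `downsample_bits` and `cascade` are hypothetical names.

```python
def downsample_bits(q, src_bits, dst_bits):
    """Map an unsigned src_bits integer weight to dst_bits by a rounding right shift.

    Illustrative assumption only; the paper's exact mapping may differ.
    """
    shift = src_bits - dst_bits
    if shift <= 0:
        return q
    rounded = (q + (1 << (shift - 1))) >> shift   # round to nearest
    return min(rounded, (1 << dst_bits) - 1)      # clamp to the dst_bits range

def cascade(q_top, bit_widths=(8, 6, 4, 2)):
    """Derive each lower bit-width from the adjacent higher one, top-down."""
    out = {bit_widths[0]: list(q_top)}
    for hi, lo in zip(bit_widths, bit_widths[1:]):
        out[lo] = [downsample_bits(q, hi, lo) for q in out[hi]]
    return out

print(cascade([0, 127, 128, 255]))
```

Because every lower-precision model is a deterministic function of the single top-precision weights, only one set of weights has to be stored, which is the property the vertical-layered representation relies on.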

arXiv:2212.01749 [pdf, other]

Semantic Graph Neural Network with Multi-measure Learning for Semi-supervised Classification

Authors: Junchao Lin, Yuan Wan, Jingwen Xu, Xingchen Qi

Abstract: Graph Neural Networks (GNNs) have attracted increasing attention in recent years and have achieved excellent performance in semi-supervised node classification tasks. The success of most GNNs relies on one fundamental assumption, i.e., the original graph structure data is available. However, recent studies have shown that GNNs are vulnerable to the complex underlying structure of the graph, making it necessary to learn comprehensive and robust graph structures for downstream tasks, rather than relying only on the raw graph structure. In light of this, we seek to learn optimal graph structures for downstream tasks and propose a novel framework for semi-supervised classification. Specifically, based on the structural context information of graph and node representations, we encode the complex interactions in semantics and generate semantic graphs to preserve the global structure. Moreover, we develop a novel multi-measure attention layer to optimize the similarity rather than prescribing it a priori, so that the similarity can be adaptively evaluated by integrating measures. These graphs are fused and optimized together with the GNN toward the semi-supervised classification objective. Extensive experiments and ablation studies on six real-world datasets clearly demonstrate the effectiveness of our proposed model and the contribution of each component.

Submitted 4 December, 2022; originally announced December 2022.

arXiv:2211.16312 [pdf, other]

PLA: Language-Driven Open-Vocabulary 3D Scene Understanding

Authors: Runyu Ding, Jihan Yang, Chuhui Xue, Wenqing Zhang, Song Bai, Xiaojuan Qi

Abstract: Open-vocabulary scene understanding aims to localize and recognize unseen categories beyond the annotated label space. The recent breakthrough of 2D open-vocabulary perception is largely driven by Internet-scale paired image-text data with rich vocabulary concepts. However, this success cannot be directly transferred to 3D scenarios due to the inaccessibility of large-scale 3D-text pairs. To this end, we propose to distill knowledge encoded in pre-trained vision-language (VL) foundation models through captioning multi-view images from 3D, which allows explicitly associating 3D and semantic-rich captions. Further, to foster coarse-to-fine visual-semantic representation learning from captions, we design hierarchical 3D-caption pairs, leveraging geometric constraints between 3D scenes and multi-view images. Finally, by employing contrastive learning, the model learns language-aware embeddings that connect 3D and text for open-vocabulary tasks. Our method not only remarkably outperforms baseline methods by 25.8% $\sim$ 44.7% hIoU and 14.5% $\sim$ 50.4% hAP$_{50}$ in open-vocabulary semantic and instance segmentation, but also shows robust transferability on challenging zero-shot domain transfer tasks. See the project website at https://dingry.github.io/projects/PLA.

Submitted 22 March, 2023; v1 submitted 29 November, 2022; originally announced November 2022.

Comments: CVPR2023
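
The abstract states only that contrastive learning connects 3D and text embeddings; it does not name the loss. A common choice for such pairing is an InfoNCE-style objective, sketched below as an assumption: row i of the 3D embeddings is the positive match for row i of the caption embeddings, all other rows in the batch serve as negatives, embeddings are assumed L2-normalized, and the temperature value is illustrative.

```python
import math

def info_nce(point_emb, text_emb, temperature=0.07):
    """One-directional InfoNCE over a batch (generic sketch, not the PLA code).

    point_emb[i] and text_emb[i] form the positive pair; every text_emb[j]
    with j != i is a negative for point_emb[i]. Inputs are assumed L2-normalized.
    """
    n = len(point_emb)
    loss = 0.0
    for i in range(n):
        logits = [sum(a * b for a, b in zip(point_emb[i], text_emb[j])) / temperature
                  for j in range(n)]
        m = max(logits)  # log-sum-exp with max subtraction for numerical stability
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += log_z - logits[i]  # cross-entropy with the matching index as target
    return loss / n
```

Matched pairs give a loss near zero, swapped pairs give a large loss, and indistinguishable embeddings give $\log n$, the loss of a uniform guess over the batch.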

arXiv:2211.15098 [pdf, other]

MGFN: Magnitude-Contrastive Glance-and-Focus Network for Weakly-Supervised Video Anomaly Detection

Authors: Yingxian Chen, Zhengzhe Liu, Baoheng Zhang, Wilton Fok, Xiaojuan Qi, Yik-Chung Wu

Abstract: Weakly supervised detection of anomalies in surveillance videos is a challenging task. Going beyond existing works that have deficient capabilities to localize anomalies in long videos, we propose a novel glance and focus network to effectively integrate spatial-temporal information for accurate anomaly detection. In addition, we empirically found that existing approaches that use feature magnitudes to represent the degree of anomalies typically ignore the effects of scene variations, and hence result in sub-optimal performance due to the inconsistency of feature magnitudes across scenes. To address this issue, we propose the Feature Amplification Mechanism and a Magnitude Contrastive Loss to enhance the discriminativeness of feature magnitudes for detecting anomalies. Experimental results on two large-scale benchmarks UCF-Crime and XD-Violence manifest that our method outperforms state-of-the-art approaches.

Submitted 28 November, 2022; originally announced November 2022.

Report number: AAAI2023

arXiv:2211.11727 [pdf, other]

Parametric Classification for Generalized Category Discovery: A Baseline Study

Authors: Xin Wen, Bingchen Zhao, Xiaojuan Qi

Abstract: Generalized Category Discovery (GCD) aims to discover novel categories in unlabelled datasets using knowledge learned from labelled samples. Previous studies argued that parametric classifiers are prone to overfitting to seen categories, and endorsed using a non-parametric classifier formed with semi-supervised k-means. However, in this study, we investigate the failure of parametric classifiers, verify the effectiveness of previous design choices when high-quality supervision is available, and identify unreliable pseudo-labels as a key problem. We demonstrate that two prediction biases exist: the classifier tends to predict seen classes more often, and produces an imbalanced distribution across seen and novel categories. Based on these findings, we propose a simple yet effective parametric classification method that benefits from entropy regularisation, achieves state-of-the-art performance on multiple GCD benchmarks and shows strong robustness to unknown class numbers. We hope the investigation and proposed simple framework can serve as a strong baseline to facilitate future studies in this field. Our code is available at: https://github.com/CVMI-Lab/SimGCD.

Submitted 15 December, 2023; v1 submitted 21 November, 2022; originally announced November 2022.

Comments: v3: ICCV'23 version; v4: updated the dataset table
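The entropy regularisation mentioned above targets the classifier's bias towards seen classes. One common form, assumed here for illustration rather than taken from the paper, maximises the entropy of the batch-averaged prediction so the classifier does not collapse onto a subset of classes. The helper names and the temperature parameter are hypothetical.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    # Row-wise softmax with max-subtraction for numerical stability.
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def mean_entropy_regulariser(logits, temperature=1.0, eps=1e-12):
    """Negative entropy of the batch-averaged class distribution.

    Adding this term to a loss that is minimised pushes the average
    prediction towards uniform, discouraging the classifier from
    collapsing onto a few (e.g. seen) classes.
    """
    mean_prob = softmax(logits, temperature).mean(axis=0)
    entropy = -(mean_prob * np.log(mean_prob + eps)).sum()
    return -entropy
```

A batch whose predictions are spread evenly over $K$ classes reaches the minimum value $-\log K$, while a batch that predicts one class for every sample is penalised more.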

arXiv:2211.03266 [pdf, ps, other]

doi 10.1140/epjp/s13360-023-04700-z

A $(k+1)$-partite entanglement measure of $N$-partite quantum states

Authors: Yan Hong, Xianfei Qi, Ting Gao, Fengli Yan

Abstract: The concept of “the permutationally invariant part of a density matrix” constitutes an important tool for entanglement characterization of multiqubit systems. In this paper, we first present a $(k+1)$-partite entanglement measure of $N$-partite quantum systems, which possesses the desirable properties of an entanglement measure. Moreover, we give strong bounds on this measure by considering the permutationally invariant part of a multipartite state. We give two definitions of an efficiently measurable degree of $(k+1)$-partite entanglement. Finally, several concrete examples are given to illustrate the effectiveness of our results.

Submitted 6 November, 2022; originally announced November 2022.

Journal ref: Eur. Phys. J. Plus (2023) 138:1081
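For readers unfamiliar with the construction, the permutationally invariant part of an $N$-qubit density matrix is usually defined as the average $\rho_{\mathrm{PI}} = \frac{1}{N!}\sum_{\pi \in S_N} V_\pi \rho V_\pi^{\dagger}$, where $V_\pi$ permutes the qubits. The NumPy sketch below evaluates this average by brute force over all $N!$ permutations, so it is only practical for small $N$; the function name is hypothetical and nothing here reproduces the paper's measure or bounds.

```python
import itertools
import math

import numpy as np


def permutationally_invariant_part(rho, n_qubits):
    """Average rho over all qubit permutations: (1/N!) sum_pi V_pi rho V_pi^dagger."""
    dim = 2 ** n_qubits
    assert rho.shape == (dim, dim)
    # View rho as a tensor with one row axis and one column axis per qubit.
    tensor = rho.reshape([2] * (2 * n_qubits))
    acc = np.zeros_like(tensor)
    for perm in itertools.permutations(range(n_qubits)):
        # Conjugation by V_pi permutes row and column qubit axes identically.
        axes = list(perm) + [n_qubits + p for p in perm]
        acc = acc + np.transpose(tensor, axes)
    return (acc / math.factorial(n_qubits)).reshape(dim, dim)
```

For example, the two-qubit state $|01\rangle\langle 01|$ maps to $\tfrac{1}{2}(|01\rangle\langle 01| + |10\rangle\langle 10|)$, and applying the map twice gives the same result as applying it once.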

arXiv:2211.00899 [pdf, other]

LightVessel: Exploring Lightweight Coronary Artery Vessel Segmentation via Similarity Knowledge Distillation

Authors: Hao Dang, Yuekai Zhang, Xingqun Qi, Wanting Zhou, Muyi Sun

Abstract: In recent years, deep convolutional neural networks (DCNNs) have shown great promise in coronary artery vessel segmentation. However, it is difficult to deploy complicated models in clinical scenarios, since high-performance approaches have excessive parameters and high computational costs. To tackle this problem, we propose LightVessel, a Similarity Knowledge Distillation Framework for lightweight coronary artery vessel segmentation. First, we propose a Feature-wise Similarity Distillation (FSD) module for semantic-shift modeling. Specifically, we calculate the feature similarity between the symmetric layers of the encoder and decoder. This similarity is then transferred as knowledge from a cumbersome teacher network to a non-trained lightweight student network. Meanwhile, to encourage the student model to learn more pixel-wise semantic information, we introduce the Adversarial Similarity Distillation (ASD) module. Concretely, the ASD module aims to construct the spatial adversarial correlation between the annotation and the predictions of the teacher and student models, respectively. Through the ASD module, the student model obtains fine-grained segmentation results with subtle edges of the coronary artery vessel. Extensive experiments conducted on the Clinical Coronary Artery Vessel Dataset demonstrate that LightVessel outperforms various knowledge distillation counterparts.

Submitted 25 February, 2023; v1 submitted 2 November, 2022; originally announced November 2022.

Comments: 5 pages, 7 figures, conference

arXiv:2210.17002 [pdf, ps, other]

doi 10.1088/1361-648X/acb8f5

Two-band description of the strong `spin'-orbit coupled one-dimensional hole gas in a cylindrical Ge nanowire

Authors: Rui Li, Xin-Yu Qi

Abstract: The low-energy effective Hamiltonian of the strong `spin'-orbit coupled one-dimensional hole gas in a cylindrical Ge nanowire in the presence of a strong magnetic field is studied both numerically and analytically. Based on the Luttinger-Kohn Hamiltonian in the spherical approximation, we show that this strong `spin'-orbit coupled one-dimensional hole gas can be accurately described by an effective two-band Hamiltonian $H^{\rm ef}=\hbar^{2}k_{z}^{2}/(2m_{h}^{*})+\alpha\sigma^{x}k_{z}+g_{h}^{*}\mu_{B}B\sigma^{z}/2$, as long as the magnetic field is purely longitudinal or purely transverse. Explicit magnetic-field-dependent expressions for the `spin'-orbit coupling $\alpha\equiv\alpha(B)$ and the effective $g$-factor $g_{h}^{*}\equiv g_{h}^{*}(B)$ are given. When the magnetic field is applied in an arbitrary direction, the two-band Hamiltonian description is still a good approximation.

Submitted 10 February, 2023; v1 submitted 30 October, 2022; originally announced October 2022.

Comments: 8 pages, 7 figures

Journal ref: J. Phys.: Condens. Matter 35, 135302 (2023)
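The effective two-band Hamiltonian is a 2×2 matrix of the form $K\,\mathbb{1} + a\,\sigma^x + b\,\sigma^z$, so its eigenvalues are $E_\pm = \hbar^2 k_z^2/(2m_h^*) \pm \sqrt{\alpha^2 k_z^2 + (g_h^*\mu_B B/2)^2}$. The sketch below checks this numerically; it assumes dimensionless units ($\hbar = 1$ by default) and treats $\alpha$ and $g_h^*$ as numbers already evaluated at the chosen field $B$. Any parameter values passed in are arbitrary and are not the paper's field-dependent expressions.

```python
import numpy as np

SIGMA_X = np.array([[0.0, 1.0], [1.0, 0.0]])
SIGMA_Z = np.array([[1.0, 0.0], [0.0, -1.0]])


def h_eff(kz, m_h, alpha, g_h, mu_b_field, hbar=1.0):
    """H = hbar^2 kz^2/(2 m) + alpha sigma_x kz + g mu_B B sigma_z / 2.

    mu_b_field is the product mu_B * B; alpha and g_h are the B-dependent
    parameters evaluated at that field (supplied by the caller).
    """
    kinetic = hbar**2 * kz**2 / (2.0 * m_h) * np.eye(2)
    return kinetic + alpha * kz * SIGMA_X + 0.5 * g_h * mu_b_field * SIGMA_Z


def analytic_bands(kz, m_h, alpha, g_h, mu_b_field, hbar=1.0):
    """Closed-form eigenvalues, lower band first: E_-, E_+."""
    kinetic = hbar**2 * kz**2 / (2.0 * m_h)
    gap = np.sqrt((alpha * kz) ** 2 + (0.5 * g_h * mu_b_field) ** 2)
    return np.array([kinetic - gap, kinetic + gap])
```

`np.linalg.eigvalsh(h_eff(...))` returns the eigenvalues in ascending order, so it can be compared directly with `analytic_bands(...)` at each $k_z$; at $k_z = 0$ the two bands are split only by the Zeeman term $g_h^*\mu_B B$.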

arXiv:2210.16810 [pdf, other]

SL3D: Self-supervised-Self-labeled 3D Recognition

Authors: Fernando Julio Cendra, Lan Ma, Jiajun Shen, Xiaojuan Qi

Abstract: Deep learning has attained remarkable success in many 3D visual recognition tasks, including shape classification, object detection, and semantic segmentation. However, many of these results rely on manually collecting densely annotated real-world 3D data, which is highly time-consuming and expensive to obtain, limiting the scalability of 3D recognition tasks. Thus, we study unsupervised 3D recognition and propose a Self-supervised-Self-Labeled 3D Recognition (SL3D) framework. SL3D simultaneously solves two coupled objectives, i.e., clustering and learning feature representation to generate pseudo-labeled data for unsupervised 3D recognition. SL3D is a generic framework and can be applied to solve different 3D recognition tasks, including classification, object detection, and semantic segmentation. Extensive experiments demonstrate its effectiveness. Code is available at https://github.com/fcendra/sl3d.

Submitted 16 December, 2022; v1 submitted 30 October, 2022; originally announced October 2022.

Comments: Accepted at the Neural Information Processing Systems (NeurIPS 2022) Workshop on Self-Supervised Learning: Theory and Practice

arXiv:2210.12262 [pdf, other]

Group Distributionally Robust Reinforcement Learning with Hierarchical Latent Variables

Authors: Mengdi Xu, Peide Huang, Yaru Niu, Visak Kumar, Jielin Qiu, Chao Fang, Kuan-Hui Lee, Xuewei Qi, Henry Lam, Bo Li, Ding Zhao

Abstract: One key challenge for multi-task reinforcement learning (RL) in practice is the absence of task indicators. Robust RL has been applied to deal with task ambiguity, but may result in over-conservative policies. To balance the worst-case (robustness) and average performance, we propose Group Distributionally Robust Markov Decision Process (GDR-MDP), a flexible hierarchical MDP formulation that encodes task groups via a latent mixture model. GDR-MDP identifies the optimal policy that maximizes the expected return under the worst-possible qualified belief over task groups within an ambiguity set. We rigorously show that GDR-MDP's hierarchical structure improves distributional robustness by adding regularization to the worst possible outcomes. We then develop deep RL algorithms for GDR-MDP for both value-based and policy-based RL methods. Extensive experiments on Box2D control tasks, MuJoCo benchmarks, and the Google Football platform show that our algorithms outperform classic robust training algorithms across diverse environments in terms of robustness under belief uncertainties. Demos are available on our project page (https://sites.google.com/view/gdr-rl/home).

Submitted 21 October, 2022; originally announced October 2022.

Comments: 27 pages, 10 figures

arXiv:2210.09509 [pdf, other]

Deep Data Augmentation for Weed Recognition Enhancement: A Diffusion Probabilistic Model and Transfer Learning Based Approach

Authors: Dong Chen, Xinda Qi, Yu Zheng, Yuzhen Lu, Zhaojian Li

Abstract: Weed management plays an important role in many modern agricultural applications. Conventional weed control methods mainly rely on chemical herbicides or hand weeding, which are often cost-ineffective, environmentally unfriendly, or even pose a threat to food safety and human health. Recently, automated/robotic weeding using machine vision systems has received increased research attention owing to its potential for precise and individualized weed treatment. However, developing robust and effective weed identification systems requires dedicated, large-scale, labeled weed image datasets, which are often difficult and expensive to obtain. To address this issue, data augmentation approaches, such as generative adversarial networks (GANs), have been explored to generate highly realistic images for agricultural applications. Yet, despite some progress, those approaches are often complicated to train or have difficulty preserving fine details in images. In this paper, we present the first work applying diffusion probabilistic models (also known as diffusion models) to generate high-quality synthetic weed images based on transfer learning. Comprehensive experimental results show that the developed approach consistently outperforms several state-of-the-art GAN models, representing the best trade-off between sample fidelity and diversity and highest FID score on a common weed dataset, CottonWeedID15.
In addition, expanding the dataset with synthetic weed images can noticeably boost the performance of four deep learning (DL) models on weed classification tasks. Furthermore, the DL models trained on the CottonWeedID15 dataset with only 10% real images and 90% synthetic weed images achieve a testing accuracy of over 94%, showing the high quality of the generated weed samples. The code for this study is publicly available at https://github.com/DongChen06/DMWeeds.

Submitted 17 October, 2022; originally announced October 2022.

Comments: 15 pages, 9 figures
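The FID comparison above relies on the Fréchet Inception Distance, which fits a Gaussian to each set of network features and computes $\lVert\mu_1-\mu_2\rVert^2 + \mathrm{Tr}\big(\Sigma_1+\Sigma_2-2(\Sigma_1\Sigma_2)^{1/2}\big)$; a smaller distance means the two feature distributions are closer. A minimal NumPy sketch is below, assuming features have already been extracted (for example from an Inception network). It uses the identity $\mathrm{Tr}\,(\Sigma_1\Sigma_2)^{1/2} = \mathrm{Tr}\,(\Sigma_1^{1/2}\Sigma_2\Sigma_1^{1/2})^{1/2}$ so that only symmetric eigendecompositions are needed; it is not the evaluation code used in the paper, and the function names are hypothetical.

```python
import numpy as np


def _sqrtm_psd(mat):
    """Square root of a symmetric positive semi-definite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # remove tiny negative round-off
    return (vecs * np.sqrt(vals)) @ vecs.T


def frechet_distance(mu1, sigma1, mu2, sigma2):
    """||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 (sigma1 sigma2)^{1/2})."""
    diff = mu1 - mu2
    s1_half = _sqrtm_psd(sigma1)
    # Tr((S1 S2)^{1/2}) equals Tr((S1^{1/2} S2 S1^{1/2})^{1/2}); the inner matrix is symmetric PSD.
    inner = s1_half @ sigma2 @ s1_half
    inner = 0.5 * (inner + inner.T)  # symmetrise against round-off
    cross_trace = np.sqrt(np.clip(np.linalg.eigvalsh(inner), 0.0, None)).sum()
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * cross_trace)


def fid_from_features(feats_a, feats_b):
    """FID between two (num_samples, dim) feature arrays with dim >= 2."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    return frechet_distance(mu_a, cov_a, mu_b, cov_b)
```

As a one-dimensional sanity check, $\mathcal{N}(0, 1)$ versus $\mathcal{N}(3, 4)$ gives $9 + 1 + 4 - 2\cdot 2 = 10$.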

arXiv:2210.07574 [pdf, other]

Is synthetic data from generative models ready for image recognition?

Authors: Ruifei He, Shuyang Sun, Xin Yu, Chuhui Xue, Wenqing Zhang, Philip Torr, Song Bai, Xiaojuan Qi

Abstract: Recent text-to-image generation models have shown promising results in generating high-fidelity photo-realistic images. Though the results are astonishing to human eyes, how applicable these generated images are for recognition tasks remains under-explored. In this work, we extensively study whether and how synthetic images generated from state-of-the-art text-to-image generation models can be used for image recognition tasks, and focus on two perspectives: synthetic data for improving classification models in data-scarce settings (i.e., zero-shot and few-shot), and synthetic data for large-scale model pre-training for transfer learning. We showcase the strengths and shortcomings of synthetic data from existing generative models, and propose strategies for better applying synthetic data to recognition tasks. Code: https://github.com/CVMI-Lab/SyntheticData.

Submitted 15 February, 2023; v1 submitted 14 October, 2022; originally announced October 2022.

Comments: ICLR 2023, spotlight

arXiv:2210.05593 [pdf, other]

Prototypical VoteNet for Few-Shot 3D Point Cloud Object Detection

Authors: Shizhen Zhao, Xiaojuan Qi

Abstract: Most existing 3D point cloud object detection approaches rely heavily on large amounts of labeled training data. However, the labeling process is costly and time-consuming. This paper considers few-shot 3D point cloud object detection, where only a few annotated samples of novel classes are needed, alongside abundant samples of base classes. To this end, we propose Prototypical VoteNet to recognize and localize novel instances, which incorporates two new modules: Prototypical Vote Module (PVM) and Prototypical Head Module (PHM). Specifically, as the 3D basic geometric structures can be shared among categories, PVM is designed to leverage class-agnostic geometric prototypes, which are learned from base classes, to refine local features of novel categories. Then PHM is proposed to utilize class prototypes to enhance the global feature of each object, facilitating subsequent object localization and classification; PHM is trained with an episodic training strategy. To evaluate the model in this new setting, we contribute two new benchmark datasets, FS-ScanNet and FS-SUNRGBD. We conduct extensive experiments to demonstrate the effectiveness of Prototypical VoteNet, and our proposed method shows significant and consistent improvements compared to baselines on two benchmark datasets.

Submitted 21 December, 2022; v1 submitted 11 October, 2022; originally announced October 2022.

Comments: NeurIPS 2022
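Prototypical VoteNet builds on the prototype idea, in which a class is represented by the mean of its features. The sketch below shows only that generic building block (mean-feature prototypes and nearest-prototype assignment); it does not implement the paper's PVM or PHM modules, and the function names are hypothetical.

```python
import numpy as np


def class_prototypes(features, labels):
    """Return (classes, prototypes); each prototype is the mean feature of one class."""
    classes = np.unique(labels)
    protos = np.stack([features[labels == c].mean(axis=0) for c in classes])
    return classes, protos


def nearest_prototype(query, classes, protos):
    """Assign each query feature to the class whose prototype is closest (Euclidean)."""
    # Pairwise squared distances, shape (num_queries, num_classes).
    d2 = ((query[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    return classes[d2.argmin(axis=1)]
```

In a few-shot setting the prototypes of novel classes come from the handful of annotated samples, which is why a shared, well-structured feature space learned on base classes matters.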

arXiv:2210.03555 [pdf]

In-situ Model Downloading to Realize Versatile Edge AI in 6G Mobile Networks

Authors: Kaibin Huang, Hai Wu, Zhiyan Liu, Xiaojuan Qi

Abstract: The sixth-generation (6G) mobile networks are expected to feature the ubiquitous deployment of machine learning and AI algorithms at the network edge. With rapid advancements in edge AI, the time has come to realize intelligence downloading onto edge devices (e.g., smartphones and sensors). To materialize this vision, we propose a novel technology in this article, called in-situ model downloading, that aims to achieve transparent and real-time replacement of on-device AI models by downloading from an AI library in the network. Its distinctive feature is the adaptation of downloading to time-varying situations (e.g., application, location, and time), devices' heterogeneous storage-and-computing capacities, and channel states. A key component of the presented framework is a set of techniques that dynamically compress a downloaded model at the depth-level, parameter-level, or bit-level to support adaptive model downloading. We further propose a virtualized 6G network architecture customized for deploying in-situ model downloading with the key feature of a three-tier (edge, local, and central) AI library. Furthermore, experiments are conducted to quantify 6G connectivity requirements, and research opportunities pertaining to the proposed technology are discussed.

Submitted 2 April, 2023; v1 submitted 7 October, 2022; originally announced October 2022.

Comments: To appear in IEEE Wireless Communications

arXiv:2209.14201 [pdf, other]

Spatial Pruned Sparse Convolution for Efficient 3D Object Detection

Authors: Jianhui Liu, Yukang Chen, Xiaoqing Ye, Zhuotao Tian, Xiao Tan, Xiaojuan Qi

Abstract: 3D scenes are dominated by a large number of background points, which are redundant for the detection task that mainly needs to focus on foreground objects. In this paper, we analyze major components of existing sparse 3D CNNs and find that 3D CNNs ignore the redundancy of data and further amplify it in the down-sampling process, which brings a huge amount of extra and unnecessary computational overhead. Inspired by this, we propose a new convolution operator named spatial pruned sparse convolution (SPS-Conv), which includes two variants, spatial pruned submanifold sparse convolution (SPSS-Conv) and spatial pruned regular sparse convolution (SPRS-Conv), both of which are based on the idea of dynamically determining crucial areas for redundancy reduction. We validate that the magnitude can serve as an important cue to determine crucial areas, which avoids the extra computations of learning-based methods. The proposed modules can easily be incorporated into existing sparse 3D CNNs without extra architectural modifications. Extensive experiments on the KITTI, Waymo and nuScenes datasets demonstrate that our method can achieve more than 50% reduction in GFLOPs without compromising the performance.

Submitted 28 September, 2022; originally announced September 2022.

Comments: Accepted by NeurIPS 2022
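The abstract states that feature magnitude can serve as the cue for crucial areas. A minimal sketch of magnitude-based selection over the active voxels of a sparse tensor is below. Measuring magnitude as the mean absolute channel value and the `keep_ratio` parameter are assumptions, and how SPSS-Conv and SPRS-Conv use such a mask inside the convolution is not reproduced.

```python
import numpy as np


def magnitude_keep_mask(voxel_features, keep_ratio=0.5):
    """Boolean mask keeping the voxels with the largest feature magnitude.

    voxel_features: (num_active_voxels, channels) features of a sparse tensor.
    keep_ratio: hypothetical fraction of voxels treated as 'crucial'.
    """
    magnitude = np.abs(voxel_features).mean(axis=1)  # one scalar cue per voxel (assumed metric)
    num_keep = max(1, int(round(keep_ratio * len(magnitude))))
    keep_idx = np.argsort(-magnitude)[:num_keep]  # indices of the largest magnitudes
    mask = np.zeros(len(magnitude), dtype=bool)
    mask[keep_idx] = True
    return mask
```

Because the cue is computed directly from existing features, no extra learned scoring branch is needed, which matches the abstract's point about avoiding the computation of learning-based selection.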

arXiv:2209.13103 [pdf]

A Review: Random Walk in Graph Sampling

Authors: Xiao Qi

Abstract: Graph sampling is a technique to pick a subset of vertices and/or edges from an original graph. Among various graph sampling approaches, Traversal Based Sampling (TBS) is widely used due to its low cost and feasibility in many cases, and Simple Random Walk (SRW) and its variants account for a large proportion of TBS. We illustrate the foundations of SRW and present the problems of SRW. Based on these problems, we provide a taxonomy of different Random Walk (RW) based graph sampling methods and give insight into why and how they revise SRW. Our summary includes classical methods and state-of-the-art RW-based methods. There are three ways to propose new algorithms based on SRW: SRW and its combinations, modified selection mechanisms, and graph topology modification. We explain the ideas behind those algorithms and present detailed pseudocode. In addition, we add the mathematics behind random walk and the essence of random walk variants, which are not covered in detail in many research papers and literature reviews. Apart from RW-based methods, SRW is also related to non-RW and non-TBS methods; we discuss the relationships between SRW and non-RW methods, and between SRW and non-TBS methods. The relations between these approaches are formally argued, and a general framework to bridge theoretical analysis and practical implementation is provided.

Submitted 26 September, 2022; originally announced September 2022.
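The degree bias of SRW that motivates these variants follows from its stationary distribution: on a connected, non-bipartite undirected graph, a simple random walk visits node $v$ with long-run frequency $\deg(v)/2|E|$. The Python sketch below lets that bias be observed empirically; the adjacency-dict representation and the function name are illustrative choices, not taken from the review.

```python
import random
from collections import Counter


def simple_random_walk(adj, start, steps, rng):
    """Simple Random Walk: at each step move to a uniformly chosen neighbour.

    adj maps each node to a list of its neighbours (undirected graph).
    Returns a Counter of how often each node was visited.
    """
    node = start
    visits = Counter()
    for _ in range(steps):
        node = rng.choice(adj[node])
        visits[node] += 1
    return visits
```

On the graph `{0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}` (a triangle with one pendant node, so $2|E| = 8$), node 0 should be visited about 3/8 of the time and node 3 only about 1/8, even though a uniform sample would give each node 1/4.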

arXiv:2209.12804 [pdf]

Efficient Random Walk based Sampling with Inverse Degree

Authors: Xiao Qi

Abstract: Random walk sampling methods have been widely used in graph sampling in recent years, but they are biased towards higher-degree nodes in the sample. To overcome this deficiency, classical methods such as MHRW design weighted walking by repeating low-degree nodes while rejecting high-degree nodes, so that the long-term behavior of the Markov chain can achieve a uniform distribution. This modification, however, may make the sampler stay at the same node several times, leading to undersampling. To address this issue, we propose a sampling framework that only needs the current and candidate node degrees to improve the performance of graph sampling methods. We also extend our original idea to a more general framework. Our extended IDRW method finds a balance between the large deviation problem of SRW and the sample rejection problem of MHRW. We evaluate our technique in simulation by running extensive experiments on various real-world datasets, and the results show that our method improves the accuracy compared with state-of-the-art techniques. We also investigate the effect of the parameter and give a suggested range for better usage in applications.

Submitted 26 September, 2022; originally announced September 2022.
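For reference, the classical MHRW that the abstract contrasts with proposes a uniformly random neighbour $v$ of the current node $u$ and accepts the move with probability $\min(1, \deg(u)/\deg(v))$, otherwise staying at $u$. Since the resulting transition probability $\min(1/\deg(u), 1/\deg(v))$ is symmetric, the chain has a uniform stationary distribution, at the cost of the repeated samples described above. This sketch is the textbook MHRW, not the proposed IDRW, whose acceptance rule is not given here; the function name is illustrative.

```python
import random
from collections import Counter


def metropolis_hastings_random_walk(adj, start, steps, rng):
    """MHRW: propose a uniform neighbour v of u, accept with prob min(1, deg(u)/deg(v)).

    On rejection the walker stays at u and u is counted again, which is the
    repeated-sampling behaviour that leads to undersampling. Target: uniform.
    """
    node = start
    visits = Counter()
    for _ in range(steps):
        candidate = rng.choice(adj[node])
        if rng.random() < min(1.0, len(adj[node]) / len(adj[candidate])):
            node = candidate
        visits[node] += 1
    return visits
```

On the same triangle-plus-pendant graph used to illustrate SRW bias, every node should now be visited about 1/4 of the time; the pendant node reaches this share partly by being counted repeatedly after rejected moves.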

arXiv:2209.12797 [pdf, other]

Rethinking Resolution in the Context of Efficient Video Recognition

Authors: Chuofan Ma, Qiushan Guo, Yi Jiang, Zehuan Yuan, Ping Luo, Xiaojuan Qi

Abstract: In this paper, we empirically study how to make the most of low-resolution frames for efficient video recognition. Existing methods mainly focus on developing compact networks or alleviating temporal redundancy of video inputs to increase efficiency, whereas compressing frame resolution has rarely been considered a promising solution. A major concern is the poor recognition accuracy on low-resolution frames. We thus start by analyzing the underlying causes of performance degradation on low-resolution frames. Our key finding is that the major cause of degradation is not information loss in the down-sampling process, but rather the mismatch between network architecture and input scale. Motivated by the success of knowledge distillation (KD), we propose to bridge the gap between network and input size via cross-resolution KD (ResKD). Our work shows that ResKD is a simple but effective method to boost recognition accuracy on low-resolution frames. Without bells and whistles, ResKD considerably surpasses all competitive methods in terms of efficiency and accuracy on four large-scale benchmark datasets, i.e., ActivityNet, FCVID, Mini-Kinetics, Something-Something V2. In addition, we extensively demonstrate its effectiveness over state-of-the-art architectures, i.e., 3D-CNNs and Video Transformers, and scalability towards super low-resolution frames. The results suggest ResKD can serve as a general inference acceleration method for state-of-the-art video recognition. Our code will be available at https://github.com/CVMI-Lab/ResKD.

Submitted 26 September, 2022; originally announced September 2022.

Comments: Accepted by NIPS2022
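ResKD is described as cross-resolution knowledge distillation, i.e. a teacher that sees high-resolution frames guiding a student that sees low-resolution frames. The standard Hinton-style distillation loss, shown below as an assumption about the general form rather than ResKD's exact objective, compares temperature-softened class distributions with a KL divergence scaled by $T^2$. The default temperature of 4.0 and the function names are hypothetical.

```python
import numpy as np


def _log_softmax(x):
    # Numerically stable log-softmax over the last axis.
    x = x - x.max(axis=-1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))


def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """Hinton-style KD loss: T^2 * KL(p_teacher || p_student) on softened logits.

    In a cross-resolution setting, teacher_logits would come from high-resolution
    frames and student_logits from low-resolution frames (hypothetical usage).
    """
    log_p_t = _log_softmax(teacher_logits / temperature)
    log_p_s = _log_softmax(student_logits / temperature)
    kl = (np.exp(log_p_t) * (log_p_t - log_p_s)).sum(axis=-1)
    return float(temperature**2 * kl.mean())
```

The loss is zero when the student reproduces the teacher's logits and positive otherwise; in practice it is usually added to the ordinary cross-entropy on ground-truth labels.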

arXiv:2209.12767 [pdf]

Weighted Jump in Random Walk Graph Sampling

Authors: Xiao Qi

Abstract: Random walk based sampling methods have been widely used in graph sampling in recent years, but they are biased towards higher-degree nodes in the sample. To overcome this deficiency, classical methods such as GMD modify the topology of target graphs so that the long-term behavior of the Markov chain can achieve a uniform distribution. This modification, however, reduces the conductance of graphs and thus makes the sampler stay at the same node for a long time, resulting in undersampling. To address this issue, we propose a new way of modifying the target graph, yielding the Weighted Jump Random Walk (WJRW) with a parameter C to improve performance. We prove that WJRW can unify Simple Random Walk and the uniform distribution through C, and we also conduct extensive experiments on real-world data. The experimental results show that WJRW can improve the accuracy significantly under the same budget. We also investigate the effect of the parameter C and give a suggested range for better usage in applications.

Submitted 26 September, 2022; originally announced September 2022.

arXiv:2209.04145 [pdf, other]

ISS: Image as Stepping Stone for Text-Guided 3D Shape Generation

Authors: Zhengzhe Liu, Peng Dai, Ruihui Li, Xiaojuan Qi, Chi-Wing Fu

Abstract: Text-guided 3D shape generation remains challenging due to the absence of large paired text-shape data, the substantial semantic gap between these two modalities, and the structural complexity of 3D shapes. This paper presents a new framework called Image as Stepping Stone (ISS) for the task by introducing 2D image as a stepping stone to connect the two modalities and to eliminate the need for paired text-shape data. Our key contribution is a two-stage feature-space-alignment approach that maps CLIP features to shapes by harnessing a pre-trained single-view reconstruction (SVR) model with multi-view supervisions: first map the CLIP image feature to the detail-rich shape space in the SVR model, then map the CLIP text feature to the shape space and optimize the mapping by encouraging CLIP consistency between the input text and the rendered images. Further, we formulate a text-guided shape stylization module to dress up the output shapes with novel textures. Beyond existing works on 3D shape generation from text, our new approach is general for creating shapes in a broad range of categories, without requiring paired text-shape data. Experimental results show that our approach outperforms state-of-the-art methods and our baselines in terms of fidelity and consistency with text. Further, our approach can stylize the generated shapes with both realistic and fantasy structures and textures.

Submitted 23 February, 2023; v1 submitted 9 September, 2022; originally announced September 2022.

Comments: ICLR 2023 spotlight

arXiv:2209.02940 [pdf, other]

Emergent bulk gauge field in random tensor networks

Authors: Xiao-Liang Qi

Abstract: Random tensor network states are toy models for holographic duality, which have entanglement properties determined by graph geometry. In this paper, we propose a generalization of the random tensor network states which describe an ensemble of states preserving a given global symmetry. We show that the Rényi entropy for this family of states can be described by a quantum extremal surface formula, with corrections to the area law term determined by a bulk gauge theory wavefunction. This provides a toy model of the correspondence between boundary global symmetry and bulk gauge symmetry in holographic duality. We discuss the boundary physical consequences of the bulk deconfined and confined phases.

Submitted 7 September, 2022; originally announced September 2022.

Comments: A paper contributed to "A Festschrift in Honor of the C. N. Yang Centenary". 17 pages. 3 figures

arXiv:2208.13953 [pdf, other]

doi 10.1364/OL.475254

Imaginary coupling induced Dirac points and group velocity control in non-reciprocal Hermitian Lattice

Authors: Yuandan Wang, Junhao Yang, Yu Dang, Haohao Wang, Guoguo Xin, Xinyuan Qi

Abstract: We propose a mechanism to achieve the group velocity control of bifurcation light via an imaginary coupling effect in the non-reciprocal lattice. The physical model is composed of two-layer photonic lattices with non-reciprocal coupling in each unit cell, which can support a real energy spectrum with a pair of Dirac points in the first Brillouin zone due to the Hermiticity. Furthermore, we show that the system undergoes a topological phase transition at the Dirac points by tuning the coupling strength, allowing the existence of topological edge states on the left or right boundaries of respective lattice layers. By adjusting the imaginary coupling and the wave number, the group velocity of the light wave can be manipulated, and bifurcation light transmission can be achieved both at the Dirac points and under the condition without group velocity dispersion. Our work might guide the design of photonic directional couplers with group velocity control functions.

Submitted 2 September, 2022; v1 submitted 29 August, 2022; originally announced August 2022.

Showing 151–200 of 623 results for author: Qi, X