
FLY-TTS: Fast, Lightweight and High-Quality End-to-End Text-to-Speech Synthesis

Authors: Yinlin Guo, Yening Lv, Jinqiao Dou, Yan Zhang, Yuehai Wang

Abstract: While recent advances in Text-To-Speech synthesis have yielded remarkable improvements in generating high-quality speech, research on lightweight and fast models is limited. This paper introduces FLY-TTS, a new fast, lightweight and high-quality speech synthesis system based on VITS. Specifically, 1) we replace the decoder with ConvNeXt blocks that generate Fourier spectral coefficients, followed by the inverse short-time Fourier transform to synthesize waveforms; 2) to compress the model size, we introduce a grouped parameter-sharing mechanism into the text encoder and the flow-based model; 3) we further employ the large pre-trained WavLM model for adversarial training to improve synthesis quality. Experimental results show that our model achieves a real-time factor of 0.0139 on an Intel Core i9 CPU, 8.8x faster than the baseline (0.1221), with a 1.6x parameter compression. Objective and subjective evaluations indicate that FLY-TTS achieves speech quality comparable to the strong baseline.
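The 8.8x figure follows directly from the two reported real-time factors (RTF). A minimal sketch, assuming the common definition RTF = synthesis time / duration of the generated audio (lower is faster); the function names and the 0.139 s / 10 s timing are illustrative, not from the paper:

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = wall-clock synthesis time / duration of the generated audio (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return synthesis_seconds / audio_seconds


def speedup(baseline_rtf: float, model_rtf: float) -> float:
    """How many times faster the model is than the baseline, under the same RTF definition."""
    return baseline_rtf / model_rtf


# RTF values reported in the abstract (Intel Core i9 CPU).
fly_tts_rtf = 0.0139
baseline_rtf = 0.1221
print(f"{speedup(baseline_rtf, fly_tts_rtf):.1f}x")  # prints "8.8x"

# Hypothetical timing: 0.139 s of compute for 10 s of audio corresponds to RTF 0.0139.
print(real_time_factor(0.139, 10.0))
```

The unrounded ratio is about 8.78, which the abstract reports as 8.8x.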

Submitted 30 June, 2024; originally announced July 2024.

Comments: Accepted to Interspeech 2024. 5 pages, 1 figure

arXiv:2406.10098

ECGMamba: Towards Efficient ECG Classification with BiSSM

Authors: Yupeng Qiang, Xunde Dong, Xiuling Liu, Yang Yang, Yihai Fang, Jianhong Dou

Abstract: Electrocardiogram (ECG) signal analysis represents a pivotal technique in the diagnosis of cardiovascular diseases. Although transformer-based models have made significant progress in ECG classification, they exhibit inefficiencies in the inference phase. The issue is primarily attributable to the quadratic computational complexity of the Transformer's self-attention mechanism, particularly when processing lengthy sequences. To address this issue, we propose a novel model, ECGMamba, which employs a bidirectional state-space model (BiSSM) to enhance classification efficiency. ECGMamba is built on an innovative Mamba-based block, which incorporates a range of time series modeling techniques to enhance performance while maintaining the efficiency of inference. The experimental results on two publicly available ECG datasets demonstrate that ECGMamba effectively balances the effectiveness and efficiency of classification, achieving competitive performance. This study not only contributes to the body of knowledge in the field of ECG classification but also provides a new research path for efficient and accurate ECG signal analysis, which can guide the development of diagnostic models for cardiovascular diseases.
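The efficiency argument contrasts self-attention, which scores every pair of positions (L x L entries for a length-L sequence), with a state-space recurrence that makes one pass per direction. A minimal sketch of a plain diagonal linear SSM run forward and backward; this is an assumption for illustration only and omits Mamba's input-dependent (selective) parameters and everything else in the ECGMamba block:

```python
import numpy as np


def ssm_scan(x: np.ndarray, a: np.ndarray, b: np.ndarray, c: np.ndarray) -> np.ndarray:
    """Diagonal linear recurrence h_t = a * h_{t-1} + b * x_t, y_t = c . h_t.

    One pass over the sequence, so the cost grows linearly with its length L.
    """
    h = np.zeros_like(a, dtype=float)
    y = np.empty(len(x))
    for t, x_t in enumerate(x):
        h = a * h + b * x_t
        y[t] = c @ h
    return y


def bidirectional_ssm(x: np.ndarray, a: np.ndarray, b: np.ndarray, c: np.ndarray) -> np.ndarray:
    """Sum a forward scan and a backward scan (run on the reversed sequence, then re-reversed)."""
    forward = ssm_scan(x, a, b, c)
    backward = ssm_scan(x[::-1], a, b, c)[::-1]
    return forward + backward


def attention_score_count(length: int) -> int:
    """Full self-attention computes one score per pair of positions: L * L entries."""
    return length * length


# Toy single-channel signal with a 4-dimensional hidden state (all values hypothetical).
signal = np.sin(np.linspace(0.0, 6.0, 500))
a, b, c = np.full(4, 0.9), np.ones(4), np.full(4, 0.25)
features = bidirectional_ssm(signal, a, b, c)
print(features.shape, attention_score_count(len(signal)))  # (500,) 250000
```

Each output mixes information from both the past (forward scan) and the future (backward scan), which is the point of making the SSM bidirectional for offline classification.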

Submitted 14 June, 2024; originally announced June 2024.

Comments: 6 pages, 2 figures. arXiv admin note: text overlap with arXiv:2404.17858 by other authors

arXiv:2406.08009

OpenObj: Open-Vocabulary Object-Level Neural Radiance Fields with Fine-Grained Understanding

Authors: Yinan Deng, Jiahui Wang, Jingyu Zhao, Jianyu Dou, Yi Yang, Yufeng Yue

Abstract: In recent years, there has been a surge of interest in open-vocabulary 3D scene reconstruction facilitated by visual language models (VLMs), which showcase remarkable capabilities in open-set retrieval. However, existing methods face some limitations: they either focus on learning point-wise features, resulting in blurry semantic understanding, or solely tackle object-level reconstruction, thereby overlooking the intricate details of the object's interior. To address these challenges, we introduce OpenObj, an innovative approach to build open-vocabulary object-level Neural Radiance Fields (NeRF) with fine-grained understanding. In essence, OpenObj establishes a robust framework for efficient and watertight scene modeling and comprehension at the object level. Moreover, we incorporate part-level features into the neural fields, enabling a nuanced representation of object interiors. This approach captures object-level instances while maintaining a fine-grained understanding. The results on multiple datasets demonstrate that OpenObj achieves superior performance in zero-shot semantic segmentation and retrieval tasks. Additionally, OpenObj supports real-world robotics tasks at multiple scales, including global movement and local manipulation.

Submitted 12 June, 2024; originally announced June 2024.

Comments: 8 pages, 7 figures. Project Url: https://openobj.github.io/

arXiv:2403.10761

Scheduling Drone and Mobile Charger via Hybrid-Action Deep Reinforcement Learning

Authors: Jizhe Dou, Haotian Zhang, Guodong Sun

Abstract: Recently, there has been growing interest in industry and academia regarding the use of wireless chargers to prolong the operational longevity of unmanned aerial vehicles (commonly known as drones). In this paper we consider a charger-assisted drone application: a drone is deployed to observe a set of points of interest, while a charger can move to recharge the drone's battery. We focus on the route and charging schedule of the drone and the mobile charger, to obtain high observation utility in the shortest possible time, while ensuring the drone remains operational during task execution. Essentially, this proposed drone-charger scheduling problem is a multi-stage decision-making process, in which the drone and the mobile charger act as two agents who cooperate to finish a task. The discrete-continuous hybrid action space of the two agents poses a significant challenge in our problem. To address this issue, we present a hybrid-action deep reinforcement learning framework, called HaDMC, which uses a standard policy learning algorithm to generate latent continuous actions. Motivated by representation learning, we specifically design and train an action decoder. It involves two pipelines to convert the latent continuous actions into the original discrete and continuous actions, by which the drone and the charger can directly interact with the environment. We embed a mutual learning scheme in model training, emphasizing collaborative rather than individual actions. We conduct extensive numerical experiments to evaluate HaDMC and compare it with state-of-the-art deep reinforcement learning approaches. The experimental results show the effectiveness and efficiency of our solution.
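The action decoder maps one latent continuous vector to a hybrid (discrete, continuous) action through two pipelines. A minimal sketch of that interface, assuming a nearest-embedding lookup for the discrete part and a tanh rescaling for the continuous part; in HaDMC the decoder is trained, whereas here the embedding table, the latent split, and the function name `decode_hybrid_action` are all hypothetical:

```python
import numpy as np


def decode_hybrid_action(latent, discrete_embeddings, low, high):
    """Map a latent continuous action to (discrete index, continuous parameters).

    discrete_embeddings: (K, d) table, one row per discrete action (hypothetical, fixed here).
    The first d latent entries select the nearest embedding row (discrete pipeline);
    the remaining entries are squashed with tanh and rescaled into [low, high]
    (continuous pipeline).
    """
    latent = np.asarray(latent, dtype=float)
    d = discrete_embeddings.shape[1]
    z_disc, z_cont = latent[:d], latent[d:]
    distances = np.linalg.norm(discrete_embeddings - z_disc, axis=1)
    discrete_action = int(np.argmin(distances))
    continuous_action = low + (np.tanh(z_cont) + 1.0) * 0.5 * (high - low)
    return discrete_action, continuous_action


# Hypothetical: 3 discrete choices (e.g. which point of interest to visit next)
# plus one continuous parameter in [0, 10] (e.g. a charging duration).
embeddings = np.eye(3)
choice, params = decode_hybrid_action([0.1, 0.9, 0.0, 0.0], embeddings, 0.0, 10.0)
print(choice, params)  # 1 [5.]
```

Because the policy only ever outputs a continuous vector, a standard continuous-action algorithm can be used upstream; the decoder absorbs the discrete/continuous split.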

Submitted 15 March, 2024; originally announced March 2024.

arXiv:2308.00942

doi 10.1038/s41377-023-01340-x

On the use of deep learning for phase recovery

Authors: Kaiqiang Wang, Li Song, Chutian Wang, Zhenbo Ren, Guangyuan Zhao, Jiazhen Dou, Jianglei Di, George Barbastathis, Renjie Zhou, Jianlin Zhao, Edmund Y. Lam

Abstract: Phase recovery (PR) refers to calculating the phase of the light field from its intensity measurements. As exemplified by applications ranging from quantitative phase imaging and coherent diffraction imaging to adaptive optics, PR is essential for reconstructing the refractive index distribution or topography of an object and for correcting the aberration of an imaging system. In recent years, deep learning (DL), often implemented through deep neural networks, has provided unprecedented support for computational imaging, leading to more efficient solutions for various PR problems. In this review, we first briefly introduce conventional methods for PR. Then, we review how DL supports PR in three stages, namely pre-processing, in-processing, and post-processing. We also review how DL is used in phase image processing. Finally, we summarize the work in DL for PR and provide an outlook on how to better use DL to improve the reliability and efficiency of PR. Furthermore, we present a live-updating resource (https://github.com/kqwang/phase-recovery) for readers to learn more about PR.
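As a concrete instance of a conventional (non-DL) PR method, the sketch below implements the classic Gerchberg-Saxton / error-reduction iteration: it alternates between the object plane and the Fourier plane, keeping the current phase estimate and imposing each measured amplitude. The abstract does not name which conventional methods the review covers, so this particular algorithm, the synthetic 16x16 data, and the function name are assumptions:

```python
import numpy as np


def gerchberg_saxton(obj_amp, fourier_amp, iterations=100, seed=0):
    """Estimate a phase consistent with two measured amplitudes.

    obj_amp: amplitude of the field in the object plane.
    fourier_amp: amplitude of its 2-D Fourier transform (far field).
    Returns the estimated object-plane phase and the Fourier-domain error per iteration.
    """
    rng = np.random.default_rng(seed)
    field = obj_amp * np.exp(1j * rng.uniform(-np.pi, np.pi, obj_amp.shape))
    errors = []
    for _ in range(iterations):
        spectrum = np.fft.fft2(field)
        errors.append(float(np.linalg.norm(np.abs(spectrum) - fourier_amp)))
        # Keep the phase, impose the measured Fourier amplitude.
        spectrum = fourier_amp * np.exp(1j * np.angle(spectrum))
        field = np.fft.ifft2(spectrum)
        # Keep the phase, impose the measured object amplitude.
        field = obj_amp * np.exp(1j * np.angle(field))
    return np.angle(field), errors


# Synthetic measurements: unit amplitude with a random "true" phase.
rng = np.random.default_rng(1)
true_phase = rng.uniform(-np.pi, np.pi, (16, 16))
obj_amp = np.ones((16, 16))
fourier_amp = np.abs(np.fft.fft2(obj_amp * np.exp(1j * true_phase)))
phase, errors = gerchberg_saxton(obj_amp, fourier_amp)
print(errors[0], errors[-1])
```

Each projection can only move the estimate closer to the constraint set it enforces, so the recorded Fourier-domain error never increases; it can, however, stagnate, and the recovered phase is only defined up to the usual ambiguities (for example a global phase offset). These limitations are part of the motivation for the DL approaches the review surveys.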

Submitted 2 August, 2023; originally announced August 2023.

Comments: 82 pages, 32 figures

Journal ref: Light: Science & Applications 13, 4 (2024)

arXiv:2305.02854

Distributed Construction of Near-Optimal Compact Routing Schemes for Planar Graphs

Authors: Jinfeng Dou, Thorsten Götte, Henning Hillebrandt, Christian Scheideler, Julian Werthmann

Abstract: We consider the problem of computing compact routing tables for a (weighted) planar graph $G:= (V, E,w)$ in the PRAM, CONGEST, and the novel HYBRID communication model. We present algorithms with polylogarithmic work and communication that are almost optimal in all relevant parameters, i.e., computation time, table sizes, and stretch. All algorithms are heavily randomized, and all our bounds hold w.h.p. For a given parameter $ε>0$, our scheme computes labels of size $\widetilde{O}(ε^{-1})$ and is computed in $\widetilde{O}(ε^{-2})$ time and $\widetilde{O}(n)$ work in the PRAM and the HYBRID model, and in $\widetilde{O}(ε^{-2} \cdot HD)$ time in CONGEST, where $HD$ denotes the network's hop-diameter. The stretch of the resulting routing scheme is $1+ε$. To achieve these results, we extend the divide-and-conquer framework of Li and Parter [STOC '19] and combine it with state-of-the-art distributed distance approximation algorithms [STOC '22]. Furthermore, we provide a distributed decomposition scheme, which may be of independent interest.
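The stretch of a routing scheme is the worst-case ratio between the length of the route it actually takes and the true shortest-path distance; a stretch of $1+ε$ means no route is more than a factor $1+ε$ longer than optimal. A minimal sketch that measures stretch for a given set of routes, assuming an undirected weighted graph stored as nested dicts. It checks only the stretch guarantee; it does not build labels or routing tables, and the toy graph is hypothetical:

```python
import heapq


def dijkstra(graph, source):
    """Shortest-path distances from source in a graph {u: {v: weight}}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def path_weight(graph, path):
    """Total weight of consecutive edges along a path given as a node list."""
    return sum(graph[u][v] for u, v in zip(path, path[1:]))


def max_stretch(graph, routes):
    """Largest ratio (route length / shortest distance) over routes with distinct endpoints."""
    worst = 1.0
    for path in routes:
        source, target = path[0], path[-1]
        shortest = dijkstra(graph, source)[target]
        worst = max(worst, path_weight(graph, path) / shortest)
    return worst


# Hypothetical 4-cycle: a-b-c costs 2.0, the detour a-d-c costs 2.2.
graph = {
    "a": {"b": 1.0, "d": 1.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"b": 1.0, "d": 1.2},
    "d": {"a": 1.0, "c": 1.2},
}
stretch = max_stretch(graph, [["a", "d", "c"], ["a", "b"]])
print(round(stretch, 3))  # 1.1, i.e. within a 1 + ε bound for ε = 0.1
```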

Submitted 12 May, 2023; v1 submitted 4 May, 2023; originally announced May 2023.

arXiv:2304.06388

How Practical Phase-shift Errors Affect Beamforming of Reconfigurable Intelligent Surface?

Authors: Jun Yang, Yijian Chen, Yijun Cui, Qingqing Wu, Jianwu Dou, Yuxin Wang

Abstract: Reconfigurable intelligent surface (RIS) is a new technique that is able to manipulate the wireless environment smartly and has been exploited for assisting wireless communications, especially in high frequency bands. However, it suffers from hardware impairments (HWIs) in practical designs, which inevitably degrade its performance and thus limit its full potential. To address this practical issue, we first propose a new RIS reflection model involving phase-shift errors, which is then verified by measurement results from field trials. With this beamforming model, various phase-shift errors caused by different HWIs can be analyzed. The phase-shift errors are classified into three categories: (1) globally independent and identically distributed errors, (2) grouped independent and identically distributed errors and (3) grouped fixed errors. The impact of typical HWIs, including frequency mismatch, PIN diode failures and panel deformation, on RIS beamforming ability is studied with the theoretical model and compared with numerical results. The impact of frequency mismatch is discussed separately for narrow-band and wide-band beamforming. Finally, useful insights and guidelines on RIS design and deployment are highlighted for practical wireless systems.
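To see why phase-shift errors cost beamforming gain, consider a simplified textbook model (an assumption, not the paper's reflection model): N equal-amplitude elements whose ideal phases perfectly align at the receiver, each perturbed by an error $e_n$. The normalized gain is $|\frac{1}{N}\sum_n e^{j e_n}|^2$. For errors of category (1), i.i.d. uniform on $[-\delta, \delta]$, the average of $e^{j e_n}$ tends to $\sin\delta/\delta$, so the gain tends to $(\sin\delta/\delta)^2$:

```python
import numpy as np


def normalized_array_gain(phase_errors):
    """|(1/N) * sum_n exp(j * e_n)|^2: 1.0 means no loss versus ideal phase alignment."""
    return float(np.abs(np.mean(np.exp(1j * np.asarray(phase_errors)))) ** 2)


rng = np.random.default_rng(0)
n_elements = 100_000
delta = np.pi / 4

# i.i.d. uniform errors on every element (simplified version of category (1)).
iid_errors = rng.uniform(-delta, delta, n_elements)
print(normalized_array_gain(iid_errors))  # close to (sin(delta)/delta)**2, about 0.81
print((np.sin(delta) / delta) ** 2)

# An identical offset on every element is only a global phase rotation: no gain loss.
print(normalized_array_gain(np.full(n_elements, 0.3)))  # 1.0 up to rounding
```

The contrast between the two cases is why the classification matters: errors that are independent across elements reduce coherent combining, while errors shared by all elements do not, and grouped errors fall between these extremes.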

Submitted 13 April, 2023; originally announced April 2023.

arXiv:2304.03708

Efficient automatic segmentation for multi-level pulmonary arteries: The PARSE challenge

Authors: Gongning Luo, Kuanquan Wang, Jun Liu, Shuo Li, Xinjie Liang, Xiangyu Li, Shaowei Gan, Wei Wang, Suyu Dong, Wenyi Wang, Pengxin Yu, Enyou Liu, Hongrong Wei, Na Wang, Jia Guo, Huiqi Li, Zhao Zhang, Ziwei Zhao, Na Gao, Nan An, Ashkan Pakzad, Bojidar Rangelov, Jiaqi Dou, Song Tian, Zeyu Liu, et al. (5 additional authors not shown)

Abstract: Efficient automatic segmentation of multi-level (i.e., main and branch) pulmonary arteries (PA) in CTPA images plays a significant role in clinical applications. However, most existing methods concentrate only on main PA or branch PA segmentation separately and ignore segmentation efficiency. Moreover, there is no public large-scale dataset focused on PA segmentation, which makes it highly challenging to compare the different methods. To benchmark multi-level PA segmentation algorithms, we organized the first Pulmonary ARtery SEgmentation (PARSE) challenge. On the one hand, we focus on both the main PA and the branch PA segmentation. On the other hand, for better clinical application, we assign the same score weight to segmentation efficiency (mainly running time and GPU memory consumption during inference) while ensuring PA segmentation accuracy. We present a summary of the top algorithms and offer some suggestions for efficient and accurate multi-level PA automatic segmentation. We provide the PARSE challenge as open-access for the community to benchmark future algorithm developments at https://parse2022.grand-challenge.org/Parse2022/.

Submitted 9 August, 2024; v1 submitted 7 April, 2023; originally announced April 2023.

arXiv:2212.05024

Decomposable Sparse Tensor on Tensor Regression

Authors: Haiyi Mao, Jason Xiaotian Dou

Abstract: Most regularized tensor regression research focuses on tensor predictors with scalar responses, or vector predictors with tensor responses. We consider sparse low-rank tensor-on-tensor regression, where predictors $\mathcal{X}$ and responses $\mathcal{Y}$ are both high-dimensional tensors. By demonstrating that the general inner product or the contracted product on a unit-rank tensor can be decomposed into standard inner products and outer products, the problem can be transformed into a tensor-to-scalar regression followed by a tensor decomposition. We therefore propose a fast solution based on stagewise search, composed of a contraction part and a generation part that are optimized alternately. We demonstrate that our method can outperform current methods in terms of accuracy and predictor selection by effectively incorporating the structural information.
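The key identity can be checked numerically. If the coefficient tensor has unit rank, $\mathcal{B} = a \circ b \circ c \circ d$, with modes $(i, j)$ matching the predictor $\mathcal{X}$ and modes $(k, l)$ forming the response, then contracting $\mathcal{X}$ with $\mathcal{B}$ equals the scalar inner product $\langle \mathcal{X}, a \circ b \rangle$ times the outer product $c \circ d$. A minimal sketch with random matrices; the dimensions are arbitrary and this illustrates only the decomposition, not the stagewise search:

```python
import numpy as np

rng = np.random.default_rng(0)

# Rank-one coefficient tensor B = outer(a, b, c, d):
# modes (i, j) pair with the predictor X, modes (k, l) form the response.
a, b = rng.normal(size=3), rng.normal(size=4)
c, d = rng.normal(size=2), rng.normal(size=5)
B = np.einsum("i,j,k,l->ijkl", a, b, c, d)
X = rng.normal(size=(3, 4))

# Contracted product of X with B over the predictor modes: a (2, 5) response.
contracted = np.einsum("ij,ijkl->kl", X, B)

# Decomposed form: a standard inner product <X, outer(a, b)> = a^T X b (a scalar),
# multiplied by the outer product of c and d.
inner = a @ X @ b
decomposed = inner * np.outer(c, d)

print(np.allclose(contracted, decomposed))  # True
```

Since the scalar $a^\top X b$ is all that depends on the predictor, fitting $a, b$ is a tensor-to-scalar problem, and $c \circ d$ is then a decomposition of the response, which is the reduction the abstract describes.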

Submitted 14 December, 2022; v1 submitted 9 December, 2022; originally announced December 2022.

arXiv:2210.02284

Unsupervised Sentence Textual Similarity with Compositional Phrase Semantics

Authors: Zihao Wang, Jiaheng Dou, Yong Zhang

Abstract: Measuring Sentence Textual Similarity (STS) is a classic task that can be applied to many downstream NLP applications such as text generation and retrieval. In this paper, we focus on unsupervised STS that works on various domains but requires only minimal data and computational resources. Theoretically, we propose a lightweight Expectation-Correction (EC) formulation for STS computation. The EC formulation unifies unsupervised STS approaches including the cosine similarity of Additively Composed (AC) sentence embeddings, Optimal Transport (OT), and Tree Kernels (TK). Moreover, we propose the Recursive Optimal Transport Similarity (ROTS) algorithm to capture compositional phrase semantics by composing multiple recursive EC formulations. ROTS finishes in linear time and is faster than its predecessors. ROTS is empirically more effective and scalable than previous approaches. Extensive experiments on 29 STS tasks under various settings show the clear advantage of ROTS over existing approaches. Detailed ablation studies demonstrate the effectiveness of our approaches.
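The simplest of the unified baselines, cosine similarity of Additively Composed (AC) embeddings, represents a sentence as the sum of its word vectors. A minimal sketch with hypothetical 3-dimensional toy vectors (real systems use pretrained embeddings, often with word weighting); it is not ROTS, and it also shows the limitation that motivates compositional approaches: AC ignores word order entirely.

```python
import numpy as np


def additive_sentence_embedding(tokens, word_vectors):
    """Additively composed (AC) embedding: sum of the vectors of known tokens."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        raise ValueError("no token has a word vector")
    return np.sum(known, axis=0)


def cosine(u, v):
    """Cosine similarity of two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


# Hypothetical toy word vectors; "the" and "a" are treated as out-of-vocabulary.
word_vectors = {
    "cat": np.array([1.0, 0.0, 0.2]),
    "dog": np.array([0.9, 0.1, 0.3]),
    "sat": np.array([0.0, 1.0, 0.0]),
    "ran": np.array([0.1, 0.9, 0.1]),
}
s1 = additive_sentence_embedding("the cat sat".split(), word_vectors)
s2 = additive_sentence_embedding("sat the cat".split(), word_vectors)
s3 = additive_sentence_embedding("a dog ran".split(), word_vectors)
print(cosine(s1, s2))  # approximately 1.0: word order is invisible to AC
print(cosine(s1, s3))  # high, but below 1.0
```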

Submitted 5 October, 2022; originally announced October 2022.

Comments: COLING 2022; Github repository https://github.com/zihao-wang/rots ; Partially overlapped with arXiv:2002.00745 ; 20 pages, 5 figures, 17 tables

arXiv:2208.08056

doi 10.13140/RG.2.2.21905.92008

Sampling Through the Lens of Sequential Decision Making

Authors: Jason Xiaotian Dou, Alvin Qingkai Pan, Runxue Bao, Haiyi Harry Mao, Lei Luo, Zhi-Hong Mao

Abstract: Sampling is ubiquitous in machine learning methodologies. Due to the growth of large datasets and model complexity, we want to learn and adapt the sampling process while training a representation. Towards achieving this grand goal, a variety of sampling techniques have been proposed. However, most of them either use a fixed sampling scheme or adjust the sampling scheme based on simple heuristics. They cannot choose the best sample for model training in different stages. Inspired by "Thinking, Fast and Slow" (System 1 and System 2) in cognitive science, we propose a reward-guided sampling strategy called Adaptive Sample with Reward (ASR) to tackle this challenge. To the best of our knowledge, this is the first work utilizing reinforcement learning (RL) to address the sampling problem in representation learning. Our approach optimally adjusts the sampling process to achieve optimal performance. We explore geographical relationships among samples by distance-based sampling to maximize overall cumulative reward. We apply ASR to the long-standing sampling problems in similarity-based loss functions. Empirical results in information retrieval and clustering demonstrate ASR's superb performance across different datasets. We also discuss an engrossing phenomenon, which we name the "ASR gravity well", in experiments.

Submitted 13 December, 2022; v1 submitted 17 August, 2022; originally announced August 2022.

arXiv:2207.07734

COEM: Cross-Modal Embedding for MetaCell Identification

Authors: Haiyi Mao, Minxue Jia, Jason Xiaotian Dou, Haotian Zhang, Panayiotis V. Benos

Abstract: Metacells are disjoint and homogeneous groups of single-cell profiles, representing discrete and highly granular cell states. Existing metacell algorithms tend to use only one modality to infer metacells, even though single-cell multi-omics datasets profile multiple molecular modalities within the same cell. Here, we present Cross-MOdal Embedding for MetaCell Identification (COEM), which utilizes an embedded space leveraging the information of both scATAC-seq and scRNA-seq to perform aggregation, balancing the trade-off between fine resolution and sufficient sequencing coverage. COEM outperforms the state-of-the-art method SEACells by efficiently identifying accurate and well-separated metacells across datasets with continuous and discrete cell types. Furthermore, COEM significantly improves peak-to-gene association analyses, and facilitates complex gene regulatory inference tasks.
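The aggregation step behind metacells is simple once cells have been grouped: the sparse per-cell count profiles within each group are summed, trading single-cell resolution for higher coverage per profile. A minimal sketch, assuming the group assignments are already given (in COEM they would come from the cross-modal embedding, which is not reproduced here); the function name and toy counts are hypothetical:

```python
import numpy as np


def aggregate_metacells(counts, assignments):
    """Sum per-cell count profiles (cells x features) within each metacell label.

    Returns the sorted metacell labels and a (metacells x features) matrix of summed counts.
    """
    counts = np.asarray(counts)
    assignments = np.asarray(assignments)
    labels = np.unique(assignments)
    profiles = np.stack([counts[assignments == k].sum(axis=0) for k in labels])
    return labels, profiles


# Hypothetical: 4 cells x 2 features (e.g. gene counts), grouped into 2 metacells.
counts = np.array([[1, 0], [2, 1], [0, 3], [4, 4]])
assignments = np.array([0, 0, 1, 1])
labels, profiles = aggregate_metacells(counts, assignments)
print(profiles)  # [[3 1]
                 #  [4 7]]
```

Larger groups give denser profiles but blur distinct cell states together, which is the resolution-versus-coverage trade-off the abstract refers to.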

Submitted 24 July, 2022; v1 submitted 15 July, 2022; originally announced July 2022.

Comments: 5 pages, 2 figures, ICML workshop on computational biology

arXiv:2204.00298

Unitail: Detecting, Reading, and Matching in Retail Scene

Authors: Fangyi Chen, Han Zhang, Zaiwang Li, Jiachen Dou, Shentong Mo, Hao Chen, Yongxin Zhang, Uzair Ahmed, Chenchen Zhu, Marios Savvides

Abstract: To make full use of computer vision technology in stores, it is necessary to consider the actual needs that fit the characteristics of the retail scene. Pursuing this goal, we introduce the United Retail Datasets (Unitail), a large-scale benchmark of basic visual tasks on products that challenges algorithms for detecting, reading, and matching. With 1.8M quadrilateral-shaped instances annotated, Unitail offers a detection dataset to better align with product appearance. Furthermore, it provides a gallery-style OCR dataset containing 1454 product categories, 30k text regions, and 21k transcriptions to enable robust reading on products and motivate enhanced product matching. Besides benchmarking the datasets using various state-of-the-art methods, we customize a new detector for product detection and provide a simple OCR-based matching solution that verifies its effectiveness.
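Quadrilateral annotations fit tilted or perspective-distorted products more tightly than axis-aligned boxes. A minimal sketch that compares the two using the shoelace formula for polygon area; the corner coordinates are hypothetical, and the sketch assumes corners are listed in order around a simple (non-self-intersecting) polygon:

```python
def polygon_area(points):
    """Shoelace formula for a simple polygon given as [(x, y), ...] in boundary order."""
    n = len(points)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def bounding_box_area(points):
    """Area of the smallest axis-aligned box containing the points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))


# Hypothetical tilted product outline (four corners, in order).
quad = [(0, 0), (4, 1), (5, 4), (1, 3)]
print(polygon_area(quad), bounding_box_area(quad))  # 11.0 20
```

Here the axis-aligned box would include roughly as much background as product, which is the kind of misalignment quadrilateral labels avoid.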

Submitted 20 July, 2022; v1 submitted 1 April, 2022; originally announced April 2022.

Comments: ECCV 2022

arXiv:2112.11730

GUX-Analyzer: A Deep Multi-modal Analyzer Via Motivational Flow For Game User Experience

Authors: Zhitao Liu, Ning Xie, Guobiao Yang, Jiale Dou, Lanxiao Huang, Guang Yang, Lin Yuan

Abstract: Quantitative analysis of Game User eXperience (GUX) is important to the game industry. Unlike typical questionnaire analysis, this paper focuses on the computational analysis of GUX. We aim to analyze the relationship between games and players using multi-modal data, including physiological data and game process data. We theoretically extend the Flow model beyond the classic skill-and-challenge plane by adding a new motivation dimension, derived from multi-modal analysis of affect and physiological data. We call this 3D Flow model Motivational Flow (MovFlow). Meanwhile, we implement a quantitative GUX Analysis System (GUXAS), which can predict the player's in-game experience state using only game process data. It analyzes the correlation not only among in-game states but also with the player's psychological and physiological reactions throughout the interactive game-play process. The experiments demonstrated that our MovFlow model efficiently distinguished the users' in-game experience states from the perspective of GUX.

Submitted 22 December, 2021; originally announced December 2021.

arXiv:2106.10493

CenterAtt: Fast 2-stage Center Attention Network

Authors: Jianyun Xu, Xin Tang, Jian Dou, Xu Shu, Yushi Zhu

Abstract: In this technical report, we introduce the methods of HIKVISION_LiDAR_Det in the Waymo Open Dataset real-time 3D detection challenge. Our solution for the competition is built upon the CenterPoint 3D detection framework. Several variants of CenterPoint are explored, including a center attention head and a feature pyramid network neck. To achieve real-time detection, methods such as batchnorm merging, half-precision floating-point networks and GPU-accelerated voxelization are adopted. Using these methods, our team ranks 6th among all methods in the real-time 3D detection challenge of the Waymo Open Dataset.
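"Batchnorm merge" usually refers to folding an inference-mode BatchNorm layer into the preceding linear or convolution layer, so the normalization costs nothing at run time. A minimal NumPy sketch for a linear layer $y = Wx + b$, assuming the standard BatchNorm formula with running statistics; the report does not give its implementation, and the function name is illustrative:

```python
import numpy as np


def fold_batchnorm(weight, bias, gamma, beta, running_mean, running_var, eps=1e-5):
    """Fold inference-mode BatchNorm into the preceding linear layer y = W x + b.

    BN(y) = gamma * (y - mean) / sqrt(var + eps) + beta, applied per output channel.
    """
    scale = gamma / np.sqrt(running_var + eps)
    folded_weight = weight * scale[:, None]  # scale each output row
    folded_bias = (bias - running_mean) * scale + beta
    return folded_weight, folded_bias


rng = np.random.default_rng(0)
W, b = rng.normal(size=(4, 3)), rng.normal(size=4)
gamma, beta = rng.normal(size=4), rng.normal(size=4)
mean, var = rng.normal(size=4), rng.uniform(0.5, 2.0, size=4)
x = rng.normal(size=3)

reference = gamma * ((W @ x + b) - mean) / np.sqrt(var + 1e-5) + beta  # linear, then BN
Wf, bf = fold_batchnorm(W, b, gamma, beta, mean, var)
print(np.allclose(Wf @ x + bf, reference))  # True
```

For a convolution, the same per-output-channel scale applies, with `scale` broadcast over the kernel dimensions (for example `scale[:, None, None, None]` for a 4-D weight).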

Submitted 19 June, 2021; originally announced June 2021.

arXiv:2103.12978

RPVNet: A Deep and Efficient Range-Point-Voxel Fusion Network for LiDAR Point Cloud Segmentation

Authors: Jianyun Xu, Ruixiang Zhang, Jian Dou, Yushi Zhu, Jie Sun, Shiliang Pu

Abstract: Point clouds can be represented in many forms (views), typically point-based sets, voxel-based cells or range-based images (i.e., panoramic view). The point-based view is geometrically accurate, but it is disordered, which makes it difficult to find local neighbors efficiently. The voxel-based view is regular but sparse, and computation grows cubically as voxel resolution increases. The range-based view is regular and generally dense; however, spherical projection distorts physical dimensions. Both voxel- and range-based views suffer from quantization loss, especially voxels when facing large-scale scenes. To exploit the advantages of the different views and alleviate their individual shortcomings in fine-grained segmentation tasks, we propose a novel range-point-voxel fusion network, namely RPVNet. In this network, we devise a deep fusion framework with multiple and mutual information interactions among these three views and propose a gated fusion module (termed GFM), which can adaptively merge the three features based on concurrent inputs. Moreover, the proposed RPV interaction mechanism is highly efficient, and we summarize it into a more general formulation. By leveraging this efficient interaction and a relatively lower voxel resolution, our method is also shown to be more efficient. Finally, we evaluated the proposed model on two large-scale datasets, i.e., SemanticKITTI and nuScenes, and it shows state-of-the-art performance on both of them. Note that our method currently ranks 1st on the SemanticKITTI leaderboard without any extra tricks.
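The abstract says the gated fusion module (GFM) adaptively merges the three view features based on the inputs, but does not give its form. A minimal sketch of one plausible gating scheme, assuming the range, point, and voxel features have already been mapped onto the same N points: a projection (hypothetical, standing in for learned weights) produces one logit per view per point, a softmax turns these into weights, and the output is the weighted sum. This is an illustration, not RPVNet's actual GFM:

```python
import numpy as np


def gated_fusion(range_feat, point_feat, voxel_feat, gate_weights):
    """Per-point softmax gate over three view features (hypothetical GFM-style fusion).

    *_feat: (N, C) features already aligned to the same N points.
    gate_weights: (3*C, 3) projection producing one logit per view (stands in for learned weights).
    """
    stacked = np.stack([range_feat, point_feat, voxel_feat], axis=1)  # (N, 3, C)
    concat = np.concatenate([range_feat, point_feat, voxel_feat], axis=1)  # (N, 3*C)
    logits = concat @ gate_weights  # (N, 3)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability for softmax
    gates = np.exp(logits)
    gates /= gates.sum(axis=1, keepdims=True)  # each point's weights sum to 1
    fused = np.einsum("nv,nvc->nc", gates, stacked)  # weighted sum over views
    return fused, gates


rng = np.random.default_rng(0)
n_points, channels = 5, 4
r, p, v = (rng.normal(size=(n_points, channels)) for _ in range(3))
fused, gates = gated_fusion(r, p, v, rng.normal(size=(3 * channels, 3)))
print(fused.shape, gates.sum(axis=1))  # (5, 4) and five values of 1.0 (up to rounding)
```

Because the gate is computed per point from the concatenated inputs, each point can lean on whichever view is most reliable locally, for example the point view where range projection distorts geometry.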

Submitted 24 March, 2021; originally announced March 2021.

arXiv:1903.06405

BLVD: Building A Large-scale 5D Semantics Benchmark for Autonomous Driving

Authors: Jianru Xue, Jianwu Fang, Tao Li, Bohua Zhang, Pu Zhang, Zhen Ye, Jian Dou

Abstract: In the autonomous driving community, numerous benchmarks have been established to assist the tasks of 3D/2D object detection, stereo vision, and semantic/instance segmentation. However, the more meaningful dynamic evolution of the objects surrounding the ego-vehicle is rarely exploited and lacks a large-scale dataset platform. To address this, we introduce BLVD, a large-scale 5D semantics benchmark that does not concentrate on the static detection or semantic/instance segmentation tasks already tackled adequately. Instead, BLVD aims to provide a platform for the tasks of dynamic 4D (3D+temporal) tracking, 5D (4D+interactive) interactive event recognition and intention prediction. This benchmark will boost a deeper understanding of traffic scenes than ever before. In total, we yield 249,129 3D annotations, 4,902 independent individuals for tracking with an overall length of 214,922 points, 6,004 valid fragments for 5D interactive event recognition, and 4,900 individuals for 5D intention prediction. These tasks are contained in four kinds of scenarios depending on the object density (low and high) and light conditions (daytime and nighttime). The benchmark can be downloaded from our project site https://github.com/VCCIV/BLVD/.

Submitted 15 March, 2019; originally announced March 2019.

Comments: To appear in ICRA 2019

arXiv:1711.04618

Impartial redistricting: a Markov chain approach to the "Gerrymandering problem"

Authors: Jason Dou

Abstract: After every U.S. national census, a state legislature is required to redraw the boundaries of congressional districts to account for changes in population. At present this is done in a highly partisan way, with districts drawn to maximize the benefits to the party in power. This is a threat to U.S. democracy. There have been proposals to take redistricting out of the hands of political parties and give it to an "independent" commission. Independence is hard to come by, so in this thesis we explore the possibility of computer-generated districts that avoid partisan "gerrymandering" as far as possible. The idea is to treat every possible redistricting as a state in a Markov chain, where each state is obtained from its predecessor at random. Under some technical conditions, running the chain for a sufficiently long time (the mixing time) yields a nearly uniform sample from the set of states. Such a uniformly drawn districting can then be regarded as impartial. Using geographical and statistical data for Pennsylvania, I have implemented the Markov chain algorithm with several constraints and run optimization experiments; a web interface to display the results is planned.

Submitted 30 October, 2017; originally announced November 2017.

Comments: Bachelor's thesis, Beijing Univ (2014)

arXiv:1710.00273 [pdf]

What Words Do We Use to Lie?: Word Choice in Deceptive Messages

Authors: Jason Xiaotian Dou, Michelle Liu, Haaris Muneer, Adam Schlussel

Abstract: Text messaging is the most widely used form of computer-mediated communication (CMC). Previous findings have shown that linguistic factors can reliably identify messages as deceptive. For example, users take longer and use more words to craft deceptive messages than truthful ones. Existing research has also examined how factors such as student status and gender affect rates of deception and word choice in deceptive messages. However, this research has been limited by small sample sizes and has produced contradictory findings. This paper addresses these issues using a dataset of text messages collected from a large and varied set of participants through an Android messaging application. The results show significant differences in word choice and in the frequency of deceptive messages between male and female participants, as well as between students and non-students.

Submitted 1 August, 2022; v1 submitted 30 September, 2017; originally announced October 2017.

arXiv:1602.01428 [pdf]

"Draw My Topics": Find Desired Topics fast from large scale of Corpus

Authors: Jason Dou, Ni Sun, Xiaojun Zou

Abstract: We develop the "Draw My Topics" toolkit, which provides a fast way to incorporate social scientists' interests into standard topic modelling. Instead of using a raw corpus with primitive processing as input, the toolkit uses an algorithm based on the Vector Space Model and conditional entropy to connect social scientists' intentions with the output of unsupervised topic models. Users can also adjust the process for a specific corpus of interest. We demonstrate the toolkit's use on the Diachronic People's Daily Corpus in Chinese.

Submitted 3 February, 2016; originally announced February 2016.

arXiv:1510.03247

Impartial Redistricting: A Markov Chain Approach

Authors: Lucy Chenyun Wu, Jason Xiaotian Dou, Danny Sleator, Alan Frieze, David Miller

Abstract: Gerrymandering is a worldwide problem that poses a serious threat to democracy and justice in district-based elections. Because redistricting is often controlled by partisan commissions, district boundaries are frequently manipulated to benefit incumbents. Since an independent commission is hard to come by, this thesis explores the possibility of impartially generating districts with a computer. We have developed an algorithm to randomly produce legal redistricting schemes for Pennsylvania.

Submitted 13 October, 2015; v1 submitted 12 October, 2015; originally announced October 2015.

Comments: about authorship naming problem, will fix soon

Showing 1–21 of 21 results for author: Dou, J