Search | arXiv e-print repository

VQ-CTAP: Cross-Modal Fine-Grained Sequence Representation Learning for Speech Processing

Authors: Chunyu Qiang, Wang Geng, Yi Zhao, Ruibo Fu, Tao Wang, Cheng Gong, Tianrui Wang, Qiuyu Liu, Jiangyan Yi, Zhengqi Wen, Chen Zhang, Hao Che, Longbiao Wang, Jianwu Dang, Jianhua Tao

Abstract: Deep learning has brought significant improvements to the field of cross-modal representation learning. For tasks such as text-to-speech (TTS), voice conversion (VC), and automatic speech recognition (ASR), a cross-modal fine-grained (frame-level) sequence representation is desired, emphasizing the semantic content of the text modality while de-emphasizing the paralinguistic information of the speech modality. We propose a method called "Vector Quantized Contrastive Token-Acoustic Pre-training (VQ-CTAP)", which uses the cross-modal aligned sequence transcoder to bring text and speech into a joint multimodal space, learning how to connect text and speech at the frame level. The proposed VQ-CTAP is a paradigm for cross-modal sequence representation learning, offering a promising solution for fine-grained generation and recognition tasks in speech processing. The VQ-CTAP can be directly applied to VC and ASR tasks without fine-tuning or additional structures. We propose a sequence-aware semantic connector, which connects multiple frozen pre-trained modules for the TTS task, exhibiting a plug-and-play capability. We design a stepping optimization strategy to ensure effective model convergence by gradually injecting and adjusting the influence of various loss components. Furthermore, we propose a semantic-transfer-wise paralinguistic consistency loss to enhance representational capabilities, allowing the model to better generalize to unseen data and capture the nuances of paralinguistic information. In addition, VQ-CTAP achieves high-compression speech coding at a rate of 25Hz from 24kHz input waveforms, which is a 960-fold reduction in the sampling rate. The audio demo is available at https://qiangchunyu.github.io/VQCTAP/

Submitted 11 August, 2024; originally announced August 2024.
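
The 960-fold figure quoted in the VQ-CTAP abstract above can be checked with simple arithmetic. The snippet below is only a worked check of the two rates stated in the abstract (24 kHz input, 25 Hz coding rate); the variable names are illustrative and not taken from the paper.

```python
# Worked check of the compression figure quoted in the VQ-CTAP abstract.
# Both rates come from the abstract; the variable names are illustrative.
input_sample_rate_hz = 24_000  # 24 kHz input waveform
token_rate_hz = 25             # 25 Hz speech-coding rate

# Reduction in sampling rate: how many input samples each coded frame spans.
reduction_factor = input_sample_rate_hz // token_rate_hz

# Duration covered by one coded frame, in milliseconds.
frame_duration_ms = 1000 / token_rate_hz

print(reduction_factor)
print(frame_duration_ms)
```

Under these numbers, each coded frame spans 960 input samples, or 40 ms of audio.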

arXiv:2407.18902 [pdf, other]

Lessons from Learning to Spin "Pens"

Authors: Jun Wang, Ying Yuan, Haichuan Che, Haozhi Qi, Yi Ma, Jitendra Malik, Xiaolong Wang

Abstract: In-hand manipulation of pen-like objects is an important skill in our daily lives, as many tools such as hammers and screwdrivers are similarly shaped. However, current learning-based methods struggle with this task due to a lack of high-quality demonstrations and the significant gap between simulation and the real world. In this work, we push the boundaries of learning-based in-hand manipulation systems by demonstrating the capability to spin pen-like objects. We first use reinforcement learning to train an oracle policy with privileged information and generate a high-fidelity trajectory dataset in simulation. This serves two purposes: 1) pre-training a sensorimotor policy in simulation; 2) conducting open-loop trajectory replay in the real world. We then fine-tune the sensorimotor policy using these real-world trajectories to adapt it to the real world dynamics. With less than 50 trajectories, our policy learns to rotate more than ten pen-like objects with different physical properties for multiple revolutions. We present a comprehensive analysis of our design choices and share the lessons learned during development.

Submitted 26 July, 2024; originally announced July 2024.

Comments: Website: https://penspin.github.io/

arXiv:2404.04556 [pdf, other]

Rethinking Self-training for Semi-supervised Landmark Detection: A Selection-free Approach

Authors: Haibo Jin, Haoxuan Che, Hao Chen

Abstract: Self-training is a simple yet effective method for semi-supervised learning, during which pseudo-label selection plays an important role in handling confirmation bias. Despite its popularity, applying self-training to landmark detection faces three problems: 1) The selected confident pseudo-labels often contain data bias, which may hurt model performance; 2) It is not easy to decide a proper threshold for sample selection as the localization task can be sensitive to noisy pseudo-labels; 3) Coordinate regression does not output confidence, making selection-based self-training infeasible. To address the above issues, we propose Self-Training for Landmark Detection (STLD), a method that does not require explicit pseudo-label selection. Instead, STLD constructs a task curriculum to deal with confirmation bias, which progressively transitions from more confident to less confident tasks over the rounds of self-training. Pseudo pretraining and shrink regression are two essential components for such a curriculum, where the former is the first task of the curriculum for providing a better model initialization and the latter is further added in the later rounds to directly leverage the pseudo-labels in a coarse-to-fine manner. Experiments on three facial and one medical landmark detection benchmark show that STLD outperforms the existing methods consistently in both semi- and omni-supervised settings.

Submitted 6 April, 2024; originally announced April 2024.

Comments: Under review

arXiv:2404.02903 [pdf, other]

LidarDM: Generative LiDAR Simulation in a Generated World

Authors: Vlas Zyrianov, Henry Che, Zhijian Liu, Shenlong Wang

Abstract: We present LidarDM, a novel LiDAR generative model capable of producing realistic, layout-aware, physically plausible, and temporally coherent LiDAR videos. LidarDM stands out with two unprecedented capabilities in LiDAR generative modeling: (i) LiDAR generation guided by driving scenarios, offering significant potential for autonomous driving simulations, and (ii) 4D LiDAR point cloud generation, enabling the creation of realistic and temporally coherent sequences. At the heart of our model is a novel integrated 4D world generation framework. Specifically, we employ latent diffusion models to generate the 3D scene, combine it with dynamic actors to form the underlying 4D world, and subsequently produce realistic sensory observations within this virtual environment. Our experiments indicate that our approach outperforms competing algorithms in realism, temporal coherency, and layout consistency. We additionally show that LidarDM can be used as a generative world model simulator for training and testing perception models.

Submitted 3 April, 2024; originally announced April 2024.

arXiv:2403.16384 [pdf, other]

doi 10.1109/ICASSP48485.2024.10447712

Residual Dense Swin Transformer for Continuous Depth-Independent Ultrasound Imaging

Authors: Jintong Hu, Hui Che, Zishuo Li, Wenming Yang

Abstract: Ultrasound imaging is crucial for evaluating organ morphology and function, yet depth adjustment can degrade image quality and field-of-view, presenting a depth-dependent dilemma. Traditional interpolation-based zoom-in techniques often sacrifice detail and introduce artifacts. Motivated by the potential of arbitrary-scale super-resolution to naturally address these inherent challenges, we present the Residual Dense Swin Transformer Network (RDSTN), designed to capture the non-local characteristics and long-range dependencies intrinsic to ultrasound images. It comprises a linear embedding module for feature enhancement, an encoder with shifted-window attention for modeling non-locality, and an MLP decoder for continuous detail reconstruction. This strategy streamlines balancing image quality and field-of-view, which offers superior textures over traditional methods. Experimentally, RDSTN outperforms existing approaches while requiring fewer parameters. In conclusion, RDSTN shows promising potential for ultrasound image enhancement by overcoming the limitations of conventional interpolation-based methods and achieving depth-independent imaging.

Submitted 24 March, 2024; originally announced March 2024.

Comments: Accepted by ICASSP2024, https://ieeexplore.ieee.org/document/10447712

Journal ref: ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

arXiv:2312.01853 [pdf, other]

Robot Synesthesia: In-Hand Manipulation with Visuotactile Sensing

Authors: Ying Yuan, Haichuan Che, Yuzhe Qin, Binghao Huang, Zhao-Heng Yin, Kang-Won Lee, Yi Wu, Soo-Chul Lim, Xiaolong Wang

Abstract: Executing contact-rich manipulation tasks necessitates the fusion of tactile and visual feedback. However, the distinct nature of these modalities poses significant challenges. In this paper, we introduce a system that leverages visual and tactile sensory inputs to enable dexterous in-hand manipulation. Specifically, we propose Robot Synesthesia, a novel point cloud-based tactile representation inspired by human tactile-visual synesthesia. This approach allows for the simultaneous and seamless integration of both sensory inputs, offering richer spatial information and facilitating better reasoning about robot actions. The method, trained in a simulated environment and then deployed to a real robot, is applicable to various in-hand object rotation tasks. Comprehensive ablations are performed on how the integration of vision and touch can improve reinforcement learning and Sim2Real performance. Our project page is available at https://yingyuan0414.github.io/visuotactile/.

Submitted 31 July, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

Comments: Project page: https://yingyuan0414.github.io/visuotactile/

arXiv:2309.17269 [pdf, ps, other]

Unpaired Optical Coherence Tomography Angiography Image Super-Resolution via Frequency-Aware Inverse-Consistency GAN

Authors: Weiwen Zhang, Dawei Yang, Haoxuan Che, An Ran Ran, Carol Y. Cheung, Hao Chen

Abstract: For optical coherence tomography angiography (OCTA) images, a limited scanning rate leads to a trade-off between field-of-view (FOV) and imaging resolution. Although larger FOV images may reveal more parafoveal vascular lesions, their application is greatly hampered by their lower resolution. To increase the resolution, previous works only achieved satisfactory performance by using paired data for training, but real-world applications are limited by the challenge of collecting large-scale paired images. Thus, an unpaired approach is in high demand. Generative adversarial networks (GANs) have been commonly used in the unpaired setting, but they may struggle to accurately preserve fine-grained capillary details, which are critical biomarkers for OCTA. In this paper, our approach aims to preserve these details by leveraging the frequency information, which represents details as high-frequencies ($\textbf{hf}$) and coarse-grained backgrounds as low-frequencies ($\textbf{lf}$). In general, we propose a GAN-based unpaired super-resolution method for OCTA images and especially emphasize $\textbf{hf}$ fine capillaries through a dual-path generator. To facilitate a precise spectrum of the reconstructed image, we also propose a frequency-aware adversarial loss for the discriminator and introduce a frequency-aware focal consistency loss for end-to-end optimization. Experiments show that our method outperforms other state-of-the-art unpaired methods both quantitatively and visually.

Submitted 29 September, 2023; originally announced September 2023.

Comments: 10 pages, 9 figures

arXiv:2308.13286 [pdf, other]

Unsupervised Domain Adaptation for Anatomical Landmark Detection

Authors: Haibo Jin, Haoxuan Che, Hao Chen

Abstract: Recently, anatomical landmark detection has achieved great progress on single-domain data, which usually assumes training and test sets are from the same domain. However, such an assumption is not always true in practice, which can cause a significant performance drop due to domain shift. To tackle this problem, we propose a novel framework for anatomical landmark detection under the setting of unsupervised domain adaptation (UDA), which aims to transfer the knowledge from a labeled source domain to an unlabeled target domain. The framework leverages self-training and domain adversarial learning to address the domain gap during adaptation. Specifically, a self-training strategy is proposed to select reliable landmark-level pseudo-labels of target domain data with dynamic thresholds, which makes the adaptation more effective. Furthermore, a domain adversarial learning module is designed to handle the unaligned data distributions of two domains by learning domain-invariant features via adversarial training. Our experiments on cephalometric and lung landmark detection show the effectiveness of the method, which reduces the domain gap by a large margin and outperforms other UDA methods consistently. The code is available at https://github.com/jhb86253817/UDA_Med_Landmark.

Submitted 25 August, 2023; originally announced August 2023.

Comments: Accepted to MICCAI 2023

arXiv:2308.12604 [pdf, other]

PromptMRG: Diagnosis-Driven Prompts for Medical Report Generation

Authors: Haibo Jin, Haoxuan Che, Yi Lin, Hao Chen

Abstract: Automatic medical report generation (MRG) is of great research value as it has the potential to relieve radiologists from the heavy burden of report writing. Despite recent advancements, accurate MRG remains challenging due to the need for precise clinical understanding and disease identification. Moreover, the imbalanced distribution of diseases makes the challenge even more pronounced, as rare diseases are underrepresented in training data, making their diagnostic performance unreliable. To address these challenges, we propose diagnosis-driven prompts for medical report generation (PromptMRG), a novel framework that aims to improve the diagnostic accuracy of MRG with the guidance of diagnosis-aware prompts. Specifically, PromptMRG is based on an encoder-decoder architecture with an extra disease classification branch. When generating reports, the diagnostic results from the classification branch are converted into token prompts to explicitly guide the generation process. To further improve the diagnostic accuracy, we design cross-modal feature enhancement, which retrieves similar reports from the database to assist the diagnosis of a query image by leveraging the knowledge from a pre-trained CLIP. Moreover, the disease imbalance issue is addressed by applying an adaptive logit-adjusted loss to the classification branch based on the individual learning status of each disease, which overcomes the barrier of the text decoder's inability to manipulate disease distributions. Experiments on two MRG benchmarks show the effectiveness of the proposed method, where it obtains state-of-the-art clinical efficacy performance on both datasets. The code is available at https://github.com/jhb86253817/PromptMRG.

Submitted 12 January, 2024; v1 submitted 24 August, 2023; originally announced August 2023.

Comments: Accepted to AAAI 2024

arXiv:2308.11985 [pdf, other]

DSSP: A Distributed, SLO-aware, Sensing-domain-privacy-Preserving Architecture for Sensing-as-a-Service

Authors: Lin Sun, Todd Rosenkrantz, Prathyusha Enganti, Huiyang Li, Zhijun Wang, Hao Che, Hong Jiang, Xukai Zou

Abstract: In this paper, we propose DSSP, a Distributed, SLO-aware, Sensing-domain-privacy-Preserving architecture for Sensing-as-a-Service (SaS). DSSP addresses four major limitations of the current SaS architecture. First, to improve sensing quality and enhance geographic coverage, DSSP allows Independent sensing Administrative Domains (IADs) to participate in sensing services, while preserving the autonomy of control and privacy for individual domains. Second, DSSP enables a marketplace in which a sensing data seller (i.e., an IAD) can sell its sensing data to more than one buyer (i.e., cloud service provider (CSP)), rather than being locked in with just one CSP. Third, DSSP enables per-query tail-latency service-level-objective (SLO) guaranteed SaS. Fourth, DSSP enables distributed, rather than centralized, query scheduling, making SaS highly scalable. At the core of DSSP is the design of a budget decomposition technique that translates: (a) a query tail-latency SLO into exact task response time budgets for sensing tasks of the query dispatched to individual IADs; and (b) the task budget for a task arriving at an IAD into exact subtask queuing deadlines for subtasks of the task dispatched to individual edge nodes in each IAD. This enables IADs to allocate their internal resources independently and accurately to meet the task budgets and hence, query tail-latency SLO, based on a simple subtask-budget-aware earliest-deadline-first queuing (EDFQ) policy for all the subtasks. The performance and scalability of DSSP are evaluated and verified by both a small-scale on-campus testbed experiment and a large-scale simulation.

Submitted 23 August, 2023; originally announced August 2023.

Comments: 14 pages

ACM Class: C.2.4; C.4
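
The DSSP abstract above relies on an earliest-deadline-first queuing (EDFQ) policy for subtasks. As a rough illustration of generic earliest-deadline-first ordering only, and not the authors' budget decomposition or scheduler, the following minimal sketch keeps subtasks in a heap keyed by deadline. The `Subtask` and `EDFQueue` names, the millisecond unit, and the sample deadlines are all hypothetical.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Subtask:
    # Hypothetical record: only the deadline takes part in ordering.
    deadline_ms: float
    name: str = field(compare=False)


class EDFQueue:
    """Generic earliest-deadline-first queue backed by a binary heap."""

    def __init__(self):
        self._heap = []

    def push(self, subtask):
        heapq.heappush(self._heap, subtask)

    def pop(self):
        # Returns the queued subtask with the smallest deadline.
        return heapq.heappop(self._heap)

    def __len__(self):
        return len(self._heap)


queue = EDFQueue()
for name, deadline_ms in [("a", 30.0), ("b", 10.0), ("c", 20.0)]:
    queue.push(Subtask(deadline_ms, name))

# Serve subtasks in order of increasing deadline.
served = [queue.pop().name for _ in range(len(queue))]
print(served)
```

In the abstract's setting the deadlines would come from the budget decomposition step; here they are fixed sample values.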

arXiv:2307.11336 [pdf, other]

doi 10.1109/MAPR56351.2022.9924897

Character Time-series Matching For Robust License Plate Recognition

Authors: Quang Huy Che, Tung Do Thanh, Cuong Truong Van

Abstract: Automatic License Plate Recognition (ALPR) is becoming a popular study area and is applied in many fields such as transportation and smart cities. However, there are still several limitations when applying many current methods to practical problems due to the variation in real-world situations such as light changes, unclear License Plate (LP) characters, and image quality. Most recent ALPR algorithms process a single frame, which reduces accuracy when image quality is poor. This paper presents methods to improve license plate recognition accuracy by tracking the license plate across multiple frames. First, the Adaptive License Plate Rotation algorithm is applied to correctly align the detected license plate. Second, we propose a method called Character Time-series Matching to recognize license plate characters from many consecutive frames. The proposed method achieves high performance on the UFPR-ALPR dataset, with $96.7\%$ accuracy in real time on an RTX A5000 GPU. We also deploy the algorithm for the Vietnamese ALPR system. The accuracies for license plate detection and character recognition are 0.881 and 0.979 $mAP^{test}$@.5, respectively. The source code is available at https://github.com/chequanghuy/Character-Time-series-Matching.git

Submitted 12 September, 2023; v1 submitted 20 July, 2023; originally announced July 2023.

Journal ref: 2022 International Conference on Multimedia Analysis and Pattern Recognition (MAPR)

arXiv:2307.10705 [pdf, other]

doi 10.1109/MAPR59823.2023.10288710

TwinLiteNet: An Efficient and Lightweight Model for Driveable Area and Lane Segmentation in Self-Driving Cars

Authors: Quang Huy Che, Dinh Phuc Nguyen, Minh Quan Pham, Duc Khai Lam

Abstract: Semantic segmentation is a common task in autonomous driving to understand the surrounding environment. Driveable Area Segmentation and Lane Detection are particularly important for safe and efficient navigation on the road. However, original semantic segmentation models are computationally expensive and require high-end hardware, which is not feasible for embedded systems in autonomous vehicles. This paper proposes a lightweight model for the driveable area and lane line segmentation. TwinLiteNet has a low computational cost but achieves accurate and efficient segmentation results. We evaluate TwinLiteNet on the BDD100K dataset and compare it with modern models. Experimental results show that our TwinLiteNet performs similarly to existing approaches while requiring significantly fewer computational resources. Specifically, TwinLiteNet achieves an mIoU score of 91.3% for the Drivable Area task and 31.08% IoU for the Lane Detection task with only 0.4 million parameters and achieves 415 FPS on an RTX A5000 GPU. Furthermore, TwinLiteNet can run in real time on embedded devices with limited computing power, achieving 60 FPS on Jetson Xavier NX, making it an ideal solution for self-driving vehicles. Code is available at https://github.com/chequanghuy/TwinLiteNet.

Submitted 13 December, 2023; v1 submitted 20 July, 2023; originally announced July 2023.

Comments: Accepted by MAPR 2023
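
The TwinLiteNet abstract above reports its results as IoU and mIoU. For readers unfamiliar with these metrics, here is a minimal sketch of the standard definitions on flat label lists. It is not the authors' evaluation code: the function names and toy labels are illustrative, and returning 1.0 for an empty union is one convention among several.

```python
def iou(pred_mask, target_mask):
    """Intersection over union of two equal-length boolean masks."""
    intersection = sum(1 for p, t in zip(pred_mask, target_mask) if p and t)
    union = sum(1 for p, t in zip(pred_mask, target_mask) if p or t)
    # Assumed convention: a class absent from both masks scores 1.0.
    return intersection / union if union else 1.0


def mean_iou(pred_labels, target_labels, num_classes):
    """Average the per-class IoU over all classes."""
    scores = []
    for c in range(num_classes):
        pred_mask = [p == c for p in pred_labels]
        target_mask = [t == c for t in target_labels]
        scores.append(iou(pred_mask, target_mask))
    return sum(scores) / num_classes


# Toy example: four pixels, two classes (0 = background, 1 = drivable).
pred = [1, 1, 0, 0]
target = [1, 0, 0, 0]
print(mean_iou(pred, target, num_classes=2))
```

In the toy example, class 0 scores 2/3 and class 1 scores 1/2, so the mean is 7/12.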

arXiv:2307.04378 [pdf, other]

Towards Generalizable Diabetic Retinopathy Grading in Unseen Domains

Authors: Haoxuan Che, Yuhan Cheng, Haibo Jin, Hao Chen

Abstract: Diabetic Retinopathy (DR) is a common complication of diabetes and a leading cause of blindness worldwide. Early and accurate grading of its severity is crucial for disease management. Although deep learning has shown great potential for automated DR grading, its real-world deployment is still challenging due to distribution shifts among source and target domains, known as the domain generalization problem. Existing works have mainly attributed the performance degradation to limited domain shifts caused by simple visual discrepancies, which cannot handle complex real-world scenarios. Instead, we present preliminary evidence suggesting the existence of three-fold generalization issues: visual and degradation style shifts, diagnostic pattern diversity, and data imbalance. To tackle these issues, we propose a novel unified framework named Generalizable Diabetic Retinopathy Grading Network (GDRNet). GDRNet consists of three vital components: fundus visual-artifact augmentation (FundusAug), dynamic hybrid-supervised loss (DahLoss), and domain-class-aware re-balancing (DCR). FundusAug generates realistic augmented images via visual transformation and image degradation, while DahLoss jointly leverages pixel-level consistency and image-level semantics to capture the diverse diagnostic patterns and build generalizable feature representations. Moreover, DCR mitigates the data imbalance from a domain-class view and avoids undesired over-emphasis on rare domain-class pairs. Finally, we design a publicly available benchmark for fair evaluations. Extensive comparison experiments against advanced methods and exhaustive ablation studies demonstrate the effectiveness and generalization ability of GDRNet.

Submitted 21 July, 2023; v1 submitted 10 July, 2023; originally announced July 2023.

Comments: Early Accepted by MICCAI 2023, the 26th International Conference on Medical Image Computing and Computer Assisted Intervention

arXiv:2304.02389 [pdf, other]

DRAC: Diabetic Retinopathy Analysis Challenge with Ultra-Wide Optical Coherence Tomography Angiography Images

Authors: Bo Qian, Hao Chen, Xiangning Wang, Haoxuan Che, Gitaek Kwon, Jaeyoung Kim, Sungjin Choi, Seoyoung Shin, Felix Krause, Markus Unterdechler, Junlin Hou, Rui Feng, Yihao Li, Mostafa El Habib Daho, Qiang Wu, Ping Zhang, Xiaokang Yang, Yiyu Cai, Weiping Jia, Huating Li, Bin Sheng

Abstract: Computer-assisted automatic analysis of diabetic retinopathy (DR) is of great importance in reducing the risks of vision loss and even blindness. Ultra-wide optical coherence tomography angiography (UW-OCTA) is a non-invasive and safe imaging modality in DR diagnosis systems, but there is a lack of publicly available benchmarks for model development and evaluation. To promote further research and scientific benchmarking for diabetic retinopathy analysis using UW-OCTA images, we organized a challenge named "DRAC - Diabetic Retinopathy Analysis Challenge" in conjunction with the 25th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2022). The challenge consists of three tasks: segmentation of DR lesions, image quality assessment and DR grading. The scientific community responded positively to the challenge, with 11, 12, and 13 teams from geographically diverse institutes submitting different solutions in these three tasks, respectively. This paper presents a summary and analysis of the top-performing solutions and results for each task of the challenge. The obtained results from top algorithms indicate the importance of data augmentation, model architecture and ensemble of networks in improving the performance of deep learning models. These findings have the potential to enable new developments in diabetic retinopathy analysis. The challenge remains open for post-challenge registrations and submissions for benchmarking future methodology developments.

Submitted 5 April, 2023; originally announced April 2023.

arXiv:2303.15038 [pdf, other]

Image Quality-aware Diagnosis via Meta-knowledge Co-embedding

Authors: Haoxuan Che, Siyu Chen, Hao Chen

Abstract: Medical images usually suffer from image degradation in clinical practice, leading to decreased performance of deep learning-based models. To resolve this problem, most previous works have focused on filtering out degradation-causing low-quality images while ignoring their potential value for models. Through effectively learning and leveraging the knowledge of degradations, models can better resist their adverse effects and avoid misdiagnosis. In this paper, we raise the problem of image quality-aware diagnosis, which aims to take advantage of low-quality images and image quality labels to achieve a more accurate and robust diagnosis. However, the diversity of degradations and superficially unrelated targets between image quality assessment and disease diagnosis makes it still quite challenging to effectively leverage quality labels to assist diagnosis. Thus, to tackle these issues, we propose a novel meta-knowledge co-embedding network, consisting of two subnets: Task Net and Meta Learner. Task Net constructs an explicit quality information utilization mechanism to enhance diagnosis via knowledge co-embedding features, while Meta Learner ensures the effectiveness and constrains the semantics of these features via meta-learning and joint-encoding masking. Superior performance on five datasets with four widely-used medical imaging modalities demonstrates the effectiveness and generalizability of our method.

Submitted 14 April, 2023; v1 submitted 27 March, 2023; originally announced March 2023.

Comments: Accepted by CVPR 2023

arXiv:2303.07711 [pdf, other]

Improving Prosody for Cross-Speaker Style Transfer by Semi-Supervised Style Extractor and Hierarchical Modeling in Speech Synthesis

Authors: Chunyu Qiang, Peng Yang, Hao Che, Ying Zhang, Xiaorui Wang, Zhongyuan Wang

Abstract: Cross-speaker style transfer in speech synthesis aims at transferring a style from a source speaker to synthesized speech of a target speaker's timbre. In most previous methods, the synthesized fine-grained prosody features often represent the source speaker's average style, similar to the one-to-many problem (i.e., multiple prosody variations correspond to the same text). In response to this problem, a strength-controlled semi-supervised style extractor is proposed to disentangle the style from content and timbre, improving the representation and interpretability of the global style embedding, which can alleviate the one-to-many mapping and data imbalance problems in prosody prediction. A hierarchical prosody predictor is proposed to improve prosody modeling. We find that better style transfer can be achieved by using the source speaker's prosody features that are easily predicted. Additionally, a speaker-transfer-wise cycle consistency loss is proposed to assist the model in learning unseen style-timbre combinations during the training phase. Experimental results show that the method outperforms the baseline. We provide a website with audio samples.

Submitted 14 March, 2023; originally announced March 2023.

Comments: Accepted by ICASSP2023

arXiv:2212.06397 [pdf, other]

Style-Label-Free: Cross-Speaker Style Transfer by Quantized VAE and Speaker-wise Normalization in Speech Synthesis

Authors: Chunyu Qiang, Peng Yang, Hao Che, Xiaorui Wang, Zhongyuan Wang

Abstract: Cross-speaker style transfer in speech synthesis aims at transferring a style from source speaker to synthesised speech of a target speaker's timbre. Most previous approaches rely on data with style labels, but manually-annotated labels are expensive and not always reliable. In response to this problem, we propose Style-Label-Free, a cross-speaker style transfer method, which can realize the style transfer from source speaker to target speaker without style labels. Firstly, a reference encoder structure based on quantized variational autoencoder (Q-VAE) and style bottleneck is designed to extract discrete style representations. Secondly, a speaker-wise batch normalization layer is proposed to reduce the source speaker leakage. In order to improve the style extraction ability of the reference encoder, a style invariant and contrastive data augmentation method is proposed. Experimental results show that the method outperforms the baseline. We provide a website with audio samples.

Submitted 13 December, 2022; originally announced December 2022.

Comments: Published to ISCSLP 2022

arXiv:2211.09495 [pdf, other]

Back-Translation-Style Data Augmentation for Mandarin Chinese Polyphone Disambiguation

Authors: Chunyu Qiang, Peng Yang, Hao Che, Jinba Xiao, Xiaorui Wang, Zhongyuan Wang

Abstract: Conversion of Chinese Grapheme-to-Phoneme (G2P) plays an important role in Mandarin Chinese Text-To-Speech (TTS) systems, where one of the biggest challenges is the task of polyphone disambiguation. Most of the previous polyphone disambiguation models are trained on manually annotated datasets, and publicly available datasets for polyphone disambiguation are scarce. In this paper we propose a simple back-translation-style data augmentation method for Mandarin Chinese polyphone disambiguation, utilizing a large amount of unlabeled text data. Inspired by the back-translation technique proposed in the field of machine translation, we build a Grapheme-to-Phoneme (G2P) model to predict the pronunciation of polyphonic characters, and a Phoneme-to-Grapheme (P2G) model to convert pronunciation back into text. Meanwhile, a window-based matching strategy and a multi-model scoring strategy are proposed to judge the correctness of the pseudo-label. We design a data balance strategy to improve the accuracy of some typical polyphonic characters in the training set with imbalanced distribution or data scarcity. The experimental results show the effectiveness of the proposed back-translation-style data augmentation method.

Submitted 17 November, 2022; originally announced November 2022.

Comments: Published to APSIPA ASC 2022

arXiv:2209.05741 [pdf, other]

SkIn: Skimming-Intensive Long-Text Classification Using BERT for Medical Corpus

Authors: Yufeng Zhao, Haiying Che

Abstract: BERT is a widely used pre-trained model in natural language processing. However, since the cost of BERT is quadratic in the text length, the BERT model is difficult to use directly on long-text corpora. In some fields, such as health care, the collected text data may be quite long. Therefore, to apply the pre-trained language knowledge of BERT to long text, this paper proposes the Skimming-Intensive Model (SkIn), which imitates the skimming-intensive reading method used by humans when reading a long paragraph. It can dynamically select the critical information in the text so that the sentence input into the BERT-Base model is significantly shortened, which can effectively save the cost of the classification algorithm. Experiments show that the SkIn method achieves higher accuracy than the baselines on long-text classification datasets in the medical field, while its time and space requirements increase linearly with the text length, alleviating the time and space overflow problem of basic BERT on long-text data.

Submitted 24 September, 2022; v1 submitted 13 September, 2022; originally announced September 2022.

Comments: 14 pages, 4 figures

arXiv:2207.04183 [pdf, other]

doi 10.1007/978-3-031-16437-8_50

Learning Robust Representation for Joint Grading of Ophthalmic Diseases via Adaptive Curriculum and Feature Disentanglement

Authors: Haoxuan Che, Haibo Jin, Hao Chen

Abstract: Diabetic retinopathy (DR) and diabetic macular edema (DME) are leading causes of permanent blindness worldwide. Designing an automatic grading system with good generalization ability for DR and DME is vital in clinical practice. However, prior works either grade DR or DME independently, without considering internal correlations between them, or grade them jointly by shared feature representation, yet ignoring potential generalization issues caused by difficult samples and data bias. Aiming to address these problems, we propose a framework for joint grading with the dynamic difficulty-aware weighted loss (DAW) and the dual-stream disentangled learning architecture (DETACH). Inspired by curriculum learning, DAW learns from simple samples to difficult samples dynamically via measuring difficulty adaptively. DETACH separates features of grading tasks to avoid potential emphasis on the bias. With the addition of DAW and DETACH, the model learns robust disentangled feature representations to explore internal correlations between DR and DME and achieve better grading performance. Experiments on three benchmarks show the effectiveness and robustness of our framework under both the intra-dataset and cross-dataset tests.

Submitted 26 March, 2023; v1 submitted 8 July, 2022; originally announced July 2022.

Comments: Accepted by MICCAI22

arXiv:2110.10035 [pdf]

A Soft-Rigid Hybrid Gripper with Lateral Compliance and Dexterous In-hand Manipulation

Authors: Wenpei Zhu, Chenghua Lu, Qule Zheng, Zhonggui Fang, Haichuan Che, Kailuan Tang, Mingchao Zhu, Sicong Liu, Zheng Wang

Abstract: Soft grippers are receiving growing attention due to their compliance-based interactive safety and dexterity. The hybrid gripper (soft actuators enhanced by rigid constraints) is a new trend in soft gripper design. With the right structural components actuated by soft actuators, hybrid grippers could achieve excellent grasping adaptability and payload, while also being easy to model and control with conventional kinematics. However, existing works were mostly focused on achieving superior payload and perception with simple planar workspaces, resulting in far less dexterity compared with conventional grippers. In this work, we took inspiration from the human Metacarpophalangeal (MCP) joint and proposed a new hybrid gripper design with 8 independent muscles. It was shown that adding the MCP complexity was critical in enabling a range of novel features in the hybrid gripper, including in-hand manipulation, lateral passive compliance, as well as new control modes. A prototype gripper was fabricated and tested on our proprietary dual-arm robot platform with vision guided grasping. With very lightweight pneumatic bellows soft actuators, the gripper could grasp objects over 25 times its own weight with lateral compliance. Using the dual-arm platform, highly anthropomorphic dexterous manipulations were demonstrated using two hybrid grippers, from Tug-of-war on a rigid rod, to passing a soft towel between two grippers using in-hand manipulation.
Matching with the novel features and performance specifications of the proposed hybrid gripper, the underlying modeling, actuation, control, and experimental validation details were also presented, offering a promising approach to achieving enhanced dexterity, strength, and compliance in robotic grippers.

Submitted 19 October, 2021; originally announced October 2021.

arXiv:2108.02476 [pdf, other]

Colorectal Polyp Classification from White-light Colonoscopy Images via Domain Alignment

Authors: Qin Wang, Hui Che, Weizhen Ding, Li Xiang, Guanbin Li, Zhen Li, Shuguang Cui

Abstract: Differentiation of colorectal polyps is an important clinical examination. A computer-aided diagnosis system is required to assist accurate diagnosis from colonoscopy images. Most previous studies attempt to develop models for polyp differentiation using Narrow-Band Imaging (NBI) or other enhanced images. However, the wide range of these models' applications for clinical work has been limited by the lagging of imaging techniques. Thus, we propose a novel framework based on a teacher-student architecture for accurate colorectal polyp classification (CPC) by directly using white-light (WL) colonoscopy images in the examination. In practice, during training, the auxiliary NBI images are utilized to train a teacher network and guide the student network to acquire richer feature representation from WL images. The feature transfer is realized by domain alignment and contrastive learning. Eventually the final student network has the ability to extract aligned features from only WL images to facilitate the CPC task. Besides, we release the first publicly available paired CPC dataset containing WL-NBI pairs for the alignment training. Quantitative and qualitative evaluation indicates that the proposed method outperforms the previous methods in CPC, improving the accuracy by 5.6% with very fast speed.

Submitted 5 August, 2021; originally announced August 2021.

Comments: Accepted in MICCAI-21

arXiv:2107.12775 [pdf, other]

doi 10.1007/978-3-030-87583-1_18

Realistic Ultrasound Image Synthesis for Improved Classification of Liver Disease

Authors: Hui Che, Sumana Ramanathan, David Foran, John L Nosher, Vishal M Patel, Ilker Hacihaliloglu

Abstract: With the success of deep learning-based methods applied in medical image analysis, convolutional neural networks (CNNs) have been investigated for classifying liver disease from ultrasound (US) data. However, the scarcity of available large-scale labeled US data has hindered the success of CNNs for classifying liver disease from US data. In this work, we propose a novel generative adversarial network (GAN) architecture for realistic diseased and healthy liver US image synthesis. We adopt the concept of stacking to synthesize realistic liver US data. Quantitative and qualitative evaluation is performed on 550 in-vivo B-mode liver US images collected from 55 subjects. We also show that the synthesized images, together with real in vivo data, can be used to significantly improve the performance of traditional CNN architectures for Nonalcoholic fatty liver disease (NAFLD) classification.

Submitted 27 July, 2021; originally announced July 2021.

Comments: Accepted for presentation at the 2021 MICCAI-International Workshop of Advances in Simplifying Medical UltraSound (ASMUS2021)

arXiv:2106.11596 [pdf, other]

doi 10.1007/s13042-023-01841-6

Multi-layered Semantic Representation Network for Multi-label Image Classification

Authors: Xiwen Qu, Hao Che, Jun Huang, Linchuan Xu, Xiao Zheng

Abstract: Multi-label image classification (MLIC) is a fundamental and practical task, which aims to assign multiple possible labels to an image. In recent years, many deep convolutional neural network (CNN) based approaches have been proposed which model label correlations to discover semantics of labels and learn semantic representations of images. This paper advances this research direction by improving both the modeling of label correlations and the learning of semantic representations. On the one hand, besides the local semantics of each label, we propose to further explore global semantics shared by multiple labels. On the other hand, existing approaches mainly learn the semantic representations at the last convolutional layer of a CNN. But it has been noted that the image representations of different layers of CNN capture different levels or scales of features and have different discriminative abilities. We thus propose to learn semantic representations at multiple convolutional layers. To this end, this paper designs a Multi-layered Semantic Representation Network (MSRN) which discovers both local and global semantics of labels through modeling label correlations and utilizes the label semantics to guide the semantic representations learning at multiple layers through an attention mechanism. Extensive experiments on four benchmark datasets including VOC 2007, COCO, NUS-WIDE, and Apparel show a competitive performance of the proposed MSRN against state-of-the-art models.

Submitted 22 June, 2021; originally announced June 2021.

Journal ref: International Journal of Machine Learning and Cybernetics, 2023

arXiv:2105.06564 [pdf, other]

Physical Artificial Intelligence: The Concept Expansion of Next-Generation Artificial Intelligence

Authors: Yingbo Li, Yucong Duan, Anamaria-Beatrice Spulber, Haoyang Che, Zakaria Maamar, Zhao Li, Chen Yang, Yu lei

Abstract: Artificial Intelligence has been a growth catalyst to our society and is considered across all industries as a fundamental technology. However, its development has been limited to the signal processing domain that relies on the generated and collected data from other sensors. In recent research, concepts of Digital Artificial Intelligence and Physical Artificial Intelligence have emerged and this can be considered a big step in the theoretical development of Artificial Intelligence. In this paper we explore the concept of Physical Artificial Intelligence and propose two subdomains: Integrated Physical Artificial Intelligence and Distributed Physical Artificial Intelligence. The paper will also examine the trend and governance of Physical Artificial Intelligence.

Submitted 16 May, 2021; v1 submitted 13 May, 2021; originally announced May 2021.

arXiv:2105.04045 [pdf, other]

doi 10.1155/2021/6671628

Swarm Differential Privacy for Purpose Driven Data-Information-Knowledge-Wisdom Architecture

Authors: Yingbo Li, Yucong Duan, Zakaria Maama, Haoyang Che, Anamaria-Beatrice Spulber, Stelios Fuentes

Abstract: Privacy protection has recently been in the spotlight of attention of both academia and industry. Society protects individual data privacy through complex legal frameworks. The increasing number of applications of data science and artificial intelligence has resulted in a higher demand for the ubiquitous application of the data. The privacy protection of the broad Data-Information-Knowledge-Wisdom (DIKW) landscape, the next generation of information organization, has taken a secondary role. In this paper, we will explore DIKW architecture through the applications of the popular swarm intelligence and differential privacy. As differential privacy proved to be an effective data privacy approach, we will look at it from a DIKW domain perspective. Swarm Intelligence can effectively optimize and reduce the number of items in DIKW used in differential privacy, thus accelerating both the effectiveness and the efficiency of differential privacy for crossing multiple modals of conceptual DIKW. The proposed approach is demonstrated through the application of personalized data that is based on the open-source IRIS dataset. This experiment demonstrates the efficiency of Swarm Intelligence in reducing computing complexity.

Submitted 29 June, 2021; v1 submitted 9 May, 2021; originally announced May 2021.

Journal ref: Mobile Information Systems. Volume 2021, Article ID 6671628. 28 Jun 2021

arXiv:2005.01006 [pdf, other]

An Accurate Model for Predicting the (Graded) Effect of Context in Word Similarity Based on Bert

Authors: Wei Bao, Hongshu Che, Jiandong Zhang

Abstract: Natural Language Processing (NLP) has been widely used in semantic analysis in recent years. Our paper mainly discusses a methodology to analyze the effect that context has on human perception of similar words, which is the third task of SemEval 2020. We apply several methods for calculating the distance between two embedding vectors generated by Bidirectional Encoder Representations from Transformers (BERT). Our team, will_go, won first place in the Finnish language track of subtask1 and second place in the English track of subtask1.

Submitted 21 July, 2020; v1 submitted 3 May, 2020; originally announced May 2020.

Comments: ACL-SemEval 2020

arXiv:1910.13276 [pdf, other]

a novel cross-lingual voice cloning approach with a few text-free samples

Authors: Xinyong Zhou, Hao Che, Xiaorui Wang, Lei Xie

Abstract: In this paper, we present a cross-lingual voice cloning approach. BN features obtained by an SI-ASR model are used as a bridge across speakers and language boundaries. The relationships between text and BN features are modeled by the latent prosody model. The acoustic model learns the translation from BN features to acoustic features. The acoustic model is fine-tuned with a few samples of the target speaker to realize voice cloning. This system can generate speech of an arbitrary utterance of the target language in a cross-lingual speaker's voice. We verify that with a small amount of audio data, our proposed approach can handle cross-lingual tasks well. In intra-lingual tasks, our proposed approach also performs better than the baseline approach in naturalness and similarity.

Submitted 30 October, 2019; v1 submitted 29 October, 2019; originally announced October 2019.

Comments: Submitted to ICASSP 2020

arXiv:1903.00923 [pdf]

doi 10.1088/1361-6560/abfce3

Pancreas segmentation with probabilistic map guided bi-directional recurrent UNet

Authors: Jun Li, Xiaozhu Lin, Hui Che, Hao Li, Xiaohua Qian

Abstract: Pancreas segmentation in medical imaging data is of great significance for clinical pancreas diagnostics and treatment. However, the large population variations in the pancreas shape and volume cause enormous segmentation difficulties, even for state-of-the-art algorithms utilizing fully-convolutional neural networks (FCNs). Specifically, pancreas segmentation suffers from the loss of spatial information in 2D methods, and the high computational cost of 3D methods. To alleviate these problems, we propose a probabilistic-map-guided bi-directional recurrent UNet (PBR-UNet) architecture, which fuses intra-slice information and inter-slice probabilistic maps into a local 3D hybrid regularization scheme, which is followed by bi-directional recurrent network optimization. The PBR-UNet method consists of an initial estimation module for efficiently extracting pixel-level probabilistic maps and a primary segmentation module for propagating hybrid information through a 2.5D U-Net architecture. Specifically, local 3D information is inferred by combining an input image with the probabilistic maps of the adjacent slices into multichannel hybrid data, and then hierarchically aggregating the hybrid information of the entire segmentation network. Besides, a bi-directional recurrent optimization mechanism is developed to update the hybrid information in both the forward and the backward directions. This allows the proposed network to make full and optimal use of the local context information.
Quantitative and qualitative evaluation was performed on the NIH Pancreas-CT dataset, and our proposed PBR-UNet method achieved better segmentation results with less computational cost compared to other state-of-the-art methods.

Submitted 11 August, 2022; v1 submitted 3 March, 2019; originally announced March 2019.

Comments: accepted by Physics in Medicine & Biology

Journal ref: Physics in Medicine & Biology, 66(11), 115010 (2021)

arXiv:1705.05691 [pdf, other]

Cloudroid: A Cloud Framework for Transparent and QoS-aware Robotic Computation Outsourcing

Authors: Ben Hu, Huaimin Wang, Pengfei Zhang, Bo Ding, Huimin Che

Abstract: Many robotic tasks require heavy computation, which can easily exceed the robot's onboard computer capability. A promising solution to address this challenge is outsourcing the computation to the cloud. However, exploiting the potential of cloud resources in robotic software is difficult, because it involves complex code modification and extensive (re)configuration procedures. Moreover, quality of service (QoS) such as timeliness, which is critical to a robot's behavior, has to be considered. In this paper, we propose a transparent and QoS-aware software framework called Cloudroid for cloud robotic applications. This framework supports direct deployment of existing robotic software packages to the cloud, transparently transforming them into Internet-accessible cloud services. With the automatically generated service stubs, robotic applications can outsource their computation to the cloud without any code modification. Furthermore, the robot and the cloud can cooperate to maintain a specific QoS property such as request response time, even in a highly dynamic and resource-competitive environment. We evaluated Cloudroid based on a group of typical robotic scenarios and a set of software packages widely adopted in real-world robot practices. Results show that a robot's capability can be enhanced significantly without code modification and specific QoS objectives can be guaranteed. In certain tasks, the "cloud + robot" setup shows performance improved by orders of magnitude compared with the robot native setup.

Submitted 16 May, 2017; originally announced May 2017.

Comments: Accepted by 10th IEEE International Conference on Cloud Computing in 2017

arXiv:1304.6693 [pdf, other]

Reliable Deniable Communication: Hiding Messages in Noise

Authors: Pak Hou Che, Mayank Bakshi, Sidharth Jaggi

Abstract: A transmitter Alice may wish to reliably transmit a message to a receiver Bob over a binary symmetric channel (BSC), while simultaneously ensuring that her transmission is deniable from an eavesdropper Willie. That is, if Willie is listening to Alice's transmissions over a "significantly noisier" BSC than the one to Bob, he should be unable to estimate even whether Alice is transmitting. We consider two scenarios. In our first scenario, we assume that the channel transition probability from Alice to Bob and Willie is perfectly known to all parties. Here, even when Alice's (potential) communication scheme is publicly known to Willie (with no common randomness between Alice and Bob), we prove that over 'n' channel uses Alice can transmit a message of length O(sqrt{n}) bits to Bob, deniably from Willie. We also prove information-theoretic order-optimality of this result. In our second scenario, we allow uncertainty in the knowledge of the channel transition probability parameters. In particular, we assume that the channel transition probabilities for both Bob and Willie are uniformly drawn from a known interval. Here, we show that, in contrast to the previous setting, Alice can communicate O(n) bits of message reliably and deniably (again, with no common randomness). We give both an achievability result and a matching converse for this setting.
Our work builds upon the work of Bash et al on AWGN channels (but with common randomness) and differs from other recent works (by Wang et al and Bloch) in two important ways - firstly our deniability metric is variational distance (as opposed to Kullback-Leibler divergence), and secondly, our techniques are significantly different from these works.

Submitted 11 July, 2016; v1 submitted 24 April, 2013; originally announced April 2013.

arXiv:1303.0141 [pdf, other]

Routing for Security in Networks with Adversarial Nodes

Authors: Pak Hou Che, Minghua Chen, Tracey Ho, Sidharth Jaggi, Michael Langberg

Abstract: We consider the problem of secure unicast transmission between two nodes in a directed graph, where an adversary eavesdrops/jams a subset of nodes. This adversarial setting is in contrast to traditional ones where the adversary controls a subset of links. In particular, we study, in the main, the class of routing-only schemes (as opposed to those allowing coding inside the network). Routing-only schemes usually have low implementation complexity, yet a characterization of the rates achievable by such schemes was open prior to this work. We first propose an LP based solution for secure communication against eavesdropping, and show that it is information-theoretically rate-optimal among all routing-only schemes. The idea behind our design is to balance information flow in the network so that no subset of nodes observes "too much" information. Interestingly, we show that the rates achieved by our routing-only scheme are always at least as good as, and sometimes better than, those achieved by "naïve" network coding schemes (i.e. the rate-optimal scheme designed for the traditional scenario where the adversary controls links in a network rather than nodes). We also demonstrate non-trivial network coding schemes that achieve rates at least as high as (and again sometimes better than) those achieved by our routing schemes, but leave open the question of characterizing the optimal rate-region of the problem under all possible coding schemes. We then extend these routing-only schemes to the adversarial node-jamming scenarios and show similar results.
During the journey of our investigation, we also develop a new technique that has the potential to derive non-trivial bounds for general secure-communication schemes.

Submitted 1 March, 2013; originally announced March 2013.

arXiv:1107.4540

Non-adaptive probabilistic group testing with noisy measurements: Near-optimal bounds with efficient algorithms

Authors: Chun Lam Chan, Pak Hou Che, Sidharth Jaggi, Venkatesh Saligrama

Abstract: We consider the problem of detecting a small subset of defective items from a large set via non-adaptive "random pooling" group tests. We consider both the case when the measurements are noiseless, and the case when the measurements are noisy (the outcome of each group test may be independently faulty with probability q). Order-optimal results for these scenarios are known in the literature. We give information-theoretic lower bounds on the query complexity of these problems, and provide corresponding computationally efficient algorithms that match the lower bounds up to a constant factor. To the best of our knowledge this work is the first to explicitly estimate such a constant that characterizes the gap between the upper and lower bounds for these problems.

Submitted 22 July, 2011; originally announced July 2011.
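
The noiseless "random pooling" setting described in the abstract above can be illustrated with a short sketch. This is not the paper's algorithm: the Bernoulli pooling design, the pooling probability `p`, the function names, and the simple elimination decoder (rule out every item that appears in a negative test) are all assumptions chosen for illustration, and the noisy case with fault probability q is not handled.

```python
import random


def random_pooling_design(n_items, n_tests, p, rng):
    """Assumed design: each item joins each test independently with probability p."""
    return [[rng.random() < p for _ in range(n_items)] for _ in range(n_tests)]


def run_tests(design, defectives):
    """Noiseless outcomes: a test is positive iff its pool contains a defective item."""
    return [any(row[i] for i in defectives) for row in design]


def decode_by_elimination(design, outcomes):
    """Keep as candidates only the items that never appear in a negative test."""
    n_items = len(design[0])
    candidates = set(range(n_items))
    for row, positive in zip(design, outcomes):
        if not positive:
            candidates -= {i for i in range(n_items) if row[i]}
    return candidates


# Hypothetical example: 50 items, 2 defectives, 40 non-adaptive tests.
rng = random.Random(0)
design = random_pooling_design(n_items=50, n_tests=40, p=0.2, rng=rng)
true_defectives = {3, 17}
outcomes = run_tests(design, true_defectives)
estimate = decode_by_elimination(design, outcomes)
print(sorted(estimate))
```

With noiseless outcomes this decoder never discards a true defective, so the estimate is always a superset of the defective set; how many tests are needed for it to be exact is the kind of query-complexity question the abstract addresses.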

arXiv:0907.1228

doi 10.1142/S0129183110014999

Degree correlation effect of bipartite network on personalized recommendation

Authors: Jian-Guo Liu, Tao Zhou, Zhao-Guo Xuan, Hong-An Che, Bing-Hong Wang, Yi-Cheng Zhang

Abstract: In this paper, by introducing a new user similarity index based on the diffusion process, we propose a modified collaborative filtering (MCF) algorithm, which has remarkably higher accuracy than the standard collaborative filtering. In the proposed algorithm, the degree correlation between users and objects is taken into account and embedded into the similarity index by a tunable parameter. The numerical simulation on a benchmark data set shows that the algorithmic accuracy of the MCF, measured by the average ranking score, is further improved by 18.19% in the optimal case. In addition, two significant criteria of algorithmic performance, diversity and popularity, are also taken into account. Numerical results show that the presented algorithm can provide more diverse and less popular recommendations; for example, when the recommendation list contains 10 objects, the diversity, measured by the Hamming distance, is improved by 21.90%.

Submitted 7 July, 2009; originally announced July 2009.

Comments: 9 pages, 3 figures

Journal ref: IJMPC 21(01) 2010 137-147
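
The abstract above reports diversity "measured by the Hamming distance" between recommendation lists. A minimal sketch of one common way to compute such a measure follows. It assumes the definition H = 1 - Q/L for two lists of equal length L that share Q objects, averaged over all user pairs; the abstract does not state the exact formula, so this definition and the function names are assumptions.

```python
from itertools import combinations


def hamming_distance(list_a, list_b):
    """Assumed definition: 1 - (number of shared objects) / (list length)."""
    length = len(list_a)
    if length == 0 or len(list_b) != length:
        raise ValueError("lists must be non-empty and of equal length")
    overlap = len(set(list_a) & set(list_b))
    return 1 - overlap / length


def mean_diversity(rec_lists):
    """Average pairwise Hamming distance over all pairs of users' lists."""
    pairs = list(combinations(rec_lists, 2))
    if not pairs:
        raise ValueError("need at least two recommendation lists")
    return sum(hamming_distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical top-4 lists for three users (object IDs are made up).
lists = [[1, 2, 3, 4], [3, 4, 5, 6], [7, 8, 9, 10]]
print(mean_diversity(lists))
```

Under this definition, identical lists give 0 and lists with no objects in common give 1, so a larger average means users receive more distinct recommendations.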

Showing 1–34 of 34 results for author: Che, H