Showing 1–44 of 44 results for author: Ji, S

Searching in archive eess.
  1. arXiv:2408.16532  [pdf, other]

    eess.AS cs.LG cs.MM cs.SD eess.SP

    WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling

    Authors: Shengpeng Ji, Ziyue Jiang, Xize Cheng, Yifu Chen, Minghui Fang, Jialong Zuo, Qian Yang, Ruiqi Li, Ziang Zhang, Xiaoda Yang, Rongjie Huang, Yidi Jiang, Qian Chen, Siqi Zheng, Wen Wang, Zhou Zhao

    Abstract: Language models have been effectively applied to modeling natural signals, such as images, video, speech, and audio. A crucial component of these models is the codec tokenizer, which compresses high-dimensional natural signals into lower-dimensional discrete tokens. In this paper, we introduce WavTokenizer, which offers several advantages over previous SOTA acoustic codec models in the audio domai…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: Work in progress. arXiv admin note: text overlap with arXiv:2402.12208

  2. arXiv:2408.14423  [pdf, other]

    eess.AS cs.SD

    DualSpeech: Enhancing Speaker-Fidelity and Text-Intelligibility Through Dual Classifier-Free Guidance

    Authors: Jinhyeok Yang, Junhyeok Lee, Hyeong-Seok Choi, Seunghun Ji, Hyeongju Kim, Juheon Lee

    Abstract: Text-to-Speech (TTS) models have advanced significantly, aiming to accurately replicate human speech's diversity, including unique speaker identities and linguistic nuances. Despite these advancements, achieving an optimal balance between speaker-fidelity and text-intelligibility remains a challenge, particularly when diverse control demands are considered. Addressing this, we introduce DualSpeech…

    Submitted 27 August, 2024; v1 submitted 26 August, 2024; originally announced August 2024.

    Comments: Accepted to INTERSPEECH 2024

  3. arXiv:2407.06530  [pdf, ps, other]

    eess.SP

    RS-BNN: A Deep Learning Framework for the Optimal Beamforming Design of Rate-Splitting Multiple Access

    Authors: Yiwen Wang, Yijie Mao, Sijie Ji

    Abstract: Rate splitting multiple access (RSMA) relies on beamforming design for attaining spectral efficiency and energy efficiency gains over traditional multiple access schemes. While conventional optimization approaches such as weighted minimum mean square error (WMMSE) achieve suboptimal solutions for RSMA beamforming optimization, they are computationally demanding. A novel approach based on fractiona…

    Submitted 8 July, 2024; originally announced July 2024.

  4. arXiv:2407.04051  [pdf, other]

    cs.SD cs.AI eess.AS

    FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs

    Authors: Keyu An, Qian Chen, Chong Deng, Zhihao Du, Changfeng Gao, Zhifu Gao, Yue Gu, Ting He, Hangrui Hu, Kai Hu, Shengpeng Ji, Yabin Li, Zerui Li, Heng Lu, Haoneng Luo, Xiang Lv, Bin Ma, Ziyang Ma, Chongjia Ni, Changhe Song, Jiaqi Shi, Xian Shi, Hao Wang, Wen Wang, Yuxuan Wang, et al. (8 additional authors not shown)

    Abstract: This report introduces FunAudioLLM, a model family designed to enhance natural voice interactions between humans and large language models (LLMs). At its core are two innovative models: SenseVoice, which handles multilingual speech recognition, emotion recognition, and audio event detection; and CosyVoice, which facilitates natural speech generation with control over multiple languages, timbre, sp…

    Submitted 10 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

    Comments: Work in progress. Authors are listed in alphabetical order by family name

  5. arXiv:2406.01205  [pdf, other]

    eess.AS cs.LG cs.SD

    ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control With Decoupled Codec

    Authors: Shengpeng Ji, Jialong Zuo, Minghui Fang, Siqi Zheng, Qian Chen, Wen Wang, Ziyue Jiang, Hai Huang, Xize Cheng, Rongjie Huang, Zhou Zhao

    Abstract: In this paper, we present ControlSpeech, a text-to-speech (TTS) system capable of fully cloning the speaker's voice and enabling arbitrary control and adjustment of speaking style, merely based on a few seconds of audio prompt and a simple textual style description prompt. Prior zero-shot TTS models and controllable TTS models either could only mimic the speaker's voice without further control and…

    Submitted 3 June, 2024; originally announced June 2024.

  6. arXiv:2404.07577  [pdf, other]

    cs.LG eess.SP

    Generating Comprehensive Lithium Battery Charging Data with Generative AI

    Authors: Lidang Jiang, Changyan Hu, Sibei Ji, Hang Zhao, Junxiong Chen, Ge He

    Abstract: In optimizing performance and extending the lifespan of lithium batteries, accurate state prediction is pivotal. Traditional regression and classification methods have achieved some success in battery state prediction. However, the efficacy of these data-driven approaches heavily relies on the availability and quality of public datasets. Additionally, generating electrochemical data predominantly…

    Submitted 11 April, 2024; originally announced April 2024.

  7. arXiv:2402.12208  [pdf, other]

    eess.AS cs.SD

    Language-Codec: Reducing the Gaps Between Discrete Codec Representation and Speech Language Models

    Authors: Shengpeng Ji, Minghui Fang, Ziyue Jiang, Siqi Zheng, Qian Chen, Rongjie Huang, Jialung Zuo, Shulei Wang, Zhou Zhao

    Abstract: In recent years, large language models have achieved significant success in generative tasks (e.g., speech cloning and audio generation) related to speech, audio, music, and other signal domains. A crucial element of these models is the discrete acoustic codec, which serves as an intermediate representation replacing the mel-spectrogram. However, there exist several gaps between discrete codecs a…

    Submitted 27 April, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

    Comments: We release a more powerful checkpoint in Language-Codec v3

  8. arXiv:2402.09378  [pdf, other]

    eess.AS cs.SD

    MobileSpeech: A Fast and High-Fidelity Framework for Mobile Zero-Shot Text-to-Speech

    Authors: Shengpeng Ji, Ziyue Jiang, Hanting Wang, Jialong Zuo, Zhou Zhao

    Abstract: Zero-shot text-to-speech (TTS) has gained significant attention due to its powerful voice cloning capabilities, requiring only a few seconds of unseen speaker voice prompts. However, all previous work has been developed for cloud-based systems. Taking autoregressive models as an example, although these approaches achieve high-fidelity voice cloning, they fall short in terms of inference speed, mod…

    Submitted 2 June, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

    Comments: Accepted by ACL 2024 (Main Conference)

  9. arXiv:2401.03690  [pdf]

    physics.med-ph eess.IV q-bio.QM

    So You Want to Image Myelin Using MRI: Magnetic Susceptibility Source Separation for Myelin Imaging

    Authors: Jongho Lee, Sooyeon Ji, Se-Hong Oh

    Abstract: In MRI, researchers have long endeavored to effectively visualize myelin distribution in the brain, a pursuit with significant implications for both scientific research and clinical applications. Over time, various methods such as myelin water imaging, magnetization transfer imaging, and relaxometric imaging have been developed, each carrying distinct advantages and limitations. Recently, an innov…

    Submitted 28 March, 2024; v1 submitted 8 January, 2024; originally announced January 2024.

    Comments: Accepted to Magnetic Resonance in Medical Sciences

  10. Construct 3D Hand Skeleton with Commercial WiFi

    Authors: Sijie Ji, Xuanye Zhang, Yuanqing Zheng, Mo Li

    Abstract: This paper presents HandFi, which constructs hand skeletons with practical WiFi devices. Unlike previous WiFi hand sensing systems that primarily employ predefined gestures for pattern matching, by constructing the hand skeleton, HandFi can enable a variety of downstream WiFi-based hand sensing applications in gaming, healthcare, and smart homes. Deriving the skeleton from WiFi signals is challeng…

    Submitted 24 December, 2023; originally announced December 2023.

    Journal ref: ACM SenSys 2023

  11. arXiv:2312.10307  [pdf, other]

    cs.SD cs.AI cs.MM eess.AS

    MusER: Musical Element-Based Regularization for Generating Symbolic Music with Emotion

    Authors: Shulei Ji, Xinyu Yang

    Abstract: Generating music with emotion is an important task in automatic music generation, in which emotion is evoked through a variety of musical elements (such as pitch and duration) that change over time and collaborate with each other. However, prior research on deep learning-based emotional music generation has rarely explored the contribution of different musical elements to emotions, let alone the d…

    Submitted 1 January, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI 2024

  12. Robust Target Detection of Intelligent Integrated Optical Camera and mmWave Radar System

    Authors: Chen Zhu, Zhouxiang Zhao, Zejing Shan, Lijie Yang, Sijie Ji, Zhaohui Yang, Zhaoyang Zhang

    Abstract: Target detection is pivotal for modern urban computing applications. While image-based techniques are widely adopted, they falter under challenging environmental conditions such as adverse weather, poor lighting, and occlusion. To improve the target detection performance under complex real-world scenarios, this paper proposes an intelligent integrated optical camera and millimeter-wave (mmWave) ra…

    Submitted 12 December, 2023; originally announced December 2023.

  13. arXiv:2310.04722  [pdf, other]

    cs.SD cs.AI eess.AS

    A Holistic Evaluation of Piano Sound Quality

    Authors: Monan Zhou, Shangda Wu, Shaohua Ji, Zijin Li, Wei Li

    Abstract: This paper aims to develop a holistic evaluation method for piano sound quality to assist in purchasing decisions. Unlike previous studies that focused on the effect of piano performance techniques on sound quality, this study evaluates the inherent sound quality of different pianos. To derive quality evaluation systems, the study uses subjective questionnaires based on a piano sound quality datas…

    Submitted 7 October, 2023; originally announced October 2023.

  14. TextrolSpeech: A Text Style Control Speech Corpus With Codec Language Text-to-Speech Models

    Authors: Shengpeng Ji, Jialong Zuo, Minghui Fang, Ziyue Jiang, Feiyang Chen, Xinyu Duan, Baoxing Huai, Zhou Zhao

    Abstract: Recently, there has been a growing interest in the field of controllable Text-to-Speech (TTS). While previous studies have relied on users providing specific style factor values based on acoustic knowledge or selecting reference speeches that meet certain requirements, generating speech solely from natural text prompts has emerged as a new challenge for researchers. This challenge arises due to th…

    Submitted 28 August, 2023; originally announced August 2023.

    Journal ref: 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

  15. arXiv:2307.07218  [pdf, other]

    eess.AS cs.SD

    Mega-TTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech Synthesis

    Authors: Ziyue Jiang, Jinglin Liu, Yi Ren, Jinzheng He, Zhenhui Ye, Shengpeng Ji, Qian Yang, Chen Zhang, Pengfei Wei, Chunfeng Wang, Xiang Yin, Zejun Ma, Zhou Zhao

    Abstract: Zero-shot text-to-speech (TTS) aims to synthesize voices with unseen speech prompts, which significantly reduces the data and computation requirements for voice cloning by skipping the fine-tuning process. However, the prompting mechanisms of zero-shot TTS still face challenges in the following aspects: 1) previous works of zero-shot TTS are typically trained with single-sentence prompts, which si…

    Submitted 10 April, 2024; v1 submitted 14 July, 2023; originally announced July 2023.

    Comments: Accepted by ICLR 2024

  16. arXiv:2306.03718  [pdf, other]

    cs.SD cs.LG cs.MM eess.AS

    Emotion-Conditioned Melody Harmonization with Hierarchical Variational Autoencoder

    Authors: Shulei Ji, Xinyu Yang

    Abstract: Existing melody harmonization models have made great progress in improving the quality of generated harmonies, but most of them ignored the emotions beneath the music. Meanwhile, the variability of harmonies generated by previous methods is insufficient. To solve these problems, we propose a novel LSTM-based Hierarchical Variational Auto-Encoder (LHVAE) to investigate the influence of emotional co…

    Submitted 19 July, 2023; v1 submitted 6 June, 2023; originally announced June 2023.

    Comments: Accepted by IEEE SMC 2023

  17. arXiv:2306.03509  [pdf, other]

    eess.AS cs.AI cs.SD

    Mega-TTS: Zero-Shot Text-to-Speech at Scale with Intrinsic Inductive Bias

    Authors: Ziyue Jiang, Yi Ren, Zhenhui Ye, Jinglin Liu, Chen Zhang, Qian Yang, Shengpeng Ji, Rongjie Huang, Chunfeng Wang, Xiang Yin, Zejun Ma, Zhou Zhao

    Abstract: Scaling text-to-speech to a large and wild dataset has been proven to be highly effective in achieving timbre and speech style generalization, particularly in zero-shot TTS. However, previous works usually encode speech into latent using audio codec and use autoregressive language models or diffusion models to generate it, which ignores the intrinsic nature of speech and may lead to inferior or un…

    Submitted 6 June, 2023; originally announced June 2023.

  18. arXiv:2306.00303  [pdf, other]

    cs.CV eess.IV

    Sea Ice Extraction via Remote Sensed Imagery: Algorithms, Datasets, Applications and Challenges

    Authors: Anzhu Yu, Wenjun Huang, Qing Xu, Qun Sun, Wenyue Guo, Song Ji, Bowei Wen, Chunping Qiu

    Abstract: Deep learning, a dominant technique in artificial intelligence, has completely changed image understanding over the past decade. As a consequence, the sea ice extraction (SIE) problem has reached a new era. We present a comprehensive review of four important aspects of SIE, including algorithms, datasets, applications, and future trends. Our review focuses on research publ…

    Submitted 31 May, 2023; originally announced June 2023.

    Comments: 24 pages, 6 figures

  19. arXiv:2212.10103  [pdf, ps, other]

    cs.SD cs.AI cs.CR cs.LG eess.AS

    VSVC: Backdoor attack against Keyword Spotting based on Voiceprint Selection and Voice Conversion

    Authors: Hanbo Cai, Pengcheng Zhang, Hai Dong, Yan Xiao, Shunhui Ji

    Abstract: Keyword spotting (KWS) based on deep neural networks (DNNs) has achieved massive success in voice control scenarios. However, training of such DNN-based KWS systems often requires significant data and hardware resources. Manufacturers often entrust this process to a third-party platform. This makes the training process uncontrollable, where attackers can implant backdoors in the model by manipulat…

    Submitted 20 December, 2022; originally announced December 2022.

    Comments: 7 pages, 5 figures

  20. arXiv:2211.08697  [pdf, ps, other]

    cs.SD cs.AI cs.CR cs.LG eess.AS

    PBSM: Backdoor attack against Keyword spotting based on pitch boosting and sound masking

    Authors: Hanbo Cai, Pengcheng Zhang, Hai Dong, Yan Xiao, Shunhui Ji

    Abstract: Keyword spotting (KWS) has been widely used in various speech control scenarios. The training of KWS is usually based on deep neural networks and requires a large amount of data. Manufacturers often use third-party data to train KWS. However, deep neural networks are not sufficiently interpretable to manufacturers, and attackers can manipulate third-party training data to plant backdoors during th…

    Submitted 16 November, 2022; originally announced November 2022.

    Comments: 5 pages, 4 figures

  21. arXiv:2211.07429  [pdf, other]

    q-bio.NC cs.LG eess.IV stat.CO stat.ME

    Accounting for Temporal Variability in Functional Magnetic Resonance Imaging Improves Prediction of Intelligence

    Authors: Yang Li, Xin Ma, Raj Sunderraman, Shihao Ji, Suprateek Kundu

    Abstract: Neuroimaging-based prediction methods for intelligence and cognitive abilities have seen a rapid development in literature. Among different neuroimaging modalities, prediction based on functional connectivity (FC) has shown great promise. Most literature has focused on prediction using static FC, but there are limited investigations on the merits of such analysis compared to prediction based on dy…

    Submitted 14 December, 2022; v1 submitted 11 November, 2022; originally announced November 2022.

  22. arXiv:2210.06753  [pdf, other]

    physics.pop-ph eess.SY

    Using Physics Simulations to Find Targeting Strategies in Competitive Bowling

    Authors: Simon Ji, Shouzhuo Yang, Wilber Dominguez, Cacey Bester

    Abstract: This article demonstrates a new approach to finding ideal bowling targeting strategies through computer simulation. To model bowling ball behaviour, a system of five coupled differential equations is derived using Euler equations for rigid body rotations. We used a computer program to demonstrate the phases of ball motion and output a plot that displays the optimum initial conditions that can lead…

    Submitted 13 October, 2022; originally announced October 2022.

  23. arXiv:2208.11333  [pdf, other]

    cs.IT cs.AI eess.SP

    Enhancing Deep Learning Performance of Massive MIMO CSI Feedback

    Authors: Sijie Ji, Mo Li

    Abstract: CSI feedback is an important problem of Massive multiple-input multiple-output (MIMO) technology because the feedback overhead is proportional to the number of sub-channels and the number of antennas, both of which scale with the size of the Massive MIMO system. Deep learning-based CSI feedback methods have been widely adopted recently owing to their superior performance. Despite the success, curr…

    Submitted 3 February, 2023; v1 submitted 24 August, 2022; originally announced August 2022.

    Comments: This work has been accepted by IEEE ICC 2023. Copyright has been transferred to IEEE

    Journal ref: the IEEE International Conference on Communication, ICC 2023

  24. arXiv:2204.05649  [pdf, other]

    cs.SD eess.AS

    ADFF: Attention Based Deep Feature Fusion Approach for Music Emotion Recognition

    Authors: Zi Huang, Shulei Ji, Zhilan Hu, Chuangjian Cai, Jing Luo, Xinyu Yang

    Abstract: Music emotion recognition (MER), a sub-task of music information retrieval (MIR), has developed rapidly in recent years. However, the learning of affect-salient features remains a challenge. In this paper, we propose an end-to-end attention-based deep feature fusion (ADFF) approach for MER. Only taking log Mel-spectrogram as input, this method uses adapted VGGNet as spatial feature learning module…

    Submitted 30 June, 2022; v1 submitted 12 April, 2022; originally announced April 2022.

    Comments: It has been received by Interspeech 2022

  25. arXiv:2112.06443  [pdf, other]

    cs.CR cs.SD eess.AS

    Detecting Audio Adversarial Examples with Logit Noising

    Authors: Namgyu Park, Sangwoo Ji, Jong Kim

    Abstract: Automatic speech recognition (ASR) systems are vulnerable to audio adversarial examples that attempt to deceive ASR systems by adding perturbations to benign speech signals. Although an adversarial example and the original benign wave are indistinguishable to humans, the former is transcribed as a malicious target sentence by ASR systems. Several methods have been proposed to generate audio advers…

    Submitted 13 December, 2021; originally announced December 2021.

    Comments: 10 pages, 12 figures, In Proceedings of the 37th Annual Computer Security Applications Conference (ACSAC) 2021

  26. arXiv:2109.11121  [pdf, other]

    eess.IV cs.CV

    Rational Polynomial Camera Model Warping for Deep Learning Based Satellite Multi-View Stereo Matching

    Authors: Jian Gao, Jin Liu, Shunping Ji

    Abstract: Satellite multi-view stereo (MVS) imagery is particularly suited for large-scale Earth surface reconstruction. Differing from the perspective camera model (pin-hole model) that is commonly used for close-range and aerial cameras, the cubic rational polynomial camera (RPC) model is the mainstream model for push-broom linear-array satellite cameras. However, the homography warping used in the prevai…

    Submitted 22 September, 2021; originally announced September 2021.

    Comments: IEEE/CVF International Conference on Computer Vision (ICCV) 2021

  27. arXiv:2104.02331  [pdf, other]

    eess.IV cs.CV

    Brain Tumors Classification for MR images based on Attention Guided Deep Learning Model

    Authors: Yuhao Zhang, Shuhang Wang, Haoxiang Wu, Kejia Hu, Shufan Ji

    Abstract: In the clinical diagnosis and treatment of brain tumors, manual image reading consumes a lot of energy and time. In recent years, the automatic tumor classification technology based on deep learning has entered people's field of vision. Brain tumors can be divided into primary and secondary intracranial tumors according to their source. However, to our best knowledge, most existing research on bra…

    Submitted 6 April, 2021; originally announced April 2021.

  28. EfficientTDNN: Efficient Architecture Search for Speaker Recognition

    Authors: Rui Wang, Zhihua Wei, Haoran Duan, Shouling Ji, Yang Long, Zhen Hong

    Abstract: Convolutional neural networks (CNNs), such as the time-delay neural network (TDNN), have shown their remarkable capability in learning speaker embedding. However, they meanwhile bring a huge computational cost in storage size, processing, and memory. Discovering the specialized CNN that meets a specific constraint requires a substantial effort of human experts. Compared with hand-designed approach…

    Submitted 18 June, 2022; v1 submitted 24 March, 2021; originally announced March 2021.

    Comments: 13 pages, 12 figures, accepted to TASLP

  29. arXiv:2102.07507  [pdf, ps, other]

    cs.IT cs.AI eess.SP

    CLNet: Complex Input Lightweight Neural Network designed for Massive MIMO CSI Feedback

    Authors: Sijie Ji, Mo Li

    Abstract: Unleashing the full potential of massive MIMO in FDD mode by reducing the overhead of CSI feedback has recently garnered attention. Numerous deep learning for massive MIMO CSI feedback approaches have demonstrated their efficiency and potential. However, most existing methods improve accuracy at the cost of computational complexity and the accuracy decreases significantly as the CSI compression ra…

    Submitted 28 April, 2023; v1 submitted 15 February, 2021; originally announced February 2021.

    Journal ref: IEEE Wireless Communications Letters, 2021

  30. arXiv:2101.05410  [pdf, other]

    eess.IV cs.CV

    A Multi-Stage Attentive Transfer Learning Framework for Improving COVID-19 Diagnosis

    Authors: Yi Liu, Shuiwang Ji

    Abstract: Computed tomography (CT) imaging is a promising approach to diagnosing COVID-19. Machine learning methods can be employed to train models from labeled CT images and predict whether a case is positive or negative. However, there exists no publicly-available and large-scale CT data to train accurate models. In this work, we propose a multi-stage attentive transfer learning framework for improvin…

    Submitted 13 January, 2021; originally announced January 2021.

    Comments: 12 pages, 4 figures, 6 tables

  31. arXiv:2012.00909  [pdf, other]

    cs.CV eess.IV

    Visually Imperceptible Adversarial Patch Attacks on Digital Images

    Authors: Yaguan Qian, Jiamin Wang, Bin Wang, Shaoning Zeng, Zhaoquan Gu, Shouling Ji, Wassim Swaileh

    Abstract: The vulnerability of deep neural networks (DNNs) to adversarial examples has attracted more attention. Many algorithms have been proposed to craft powerful adversarial examples. However, most of these algorithms modified the global or local region of pixels without taking network explanations into account. Hence, the perturbations are redundant, which are easily detected by human eyes. In this pap…

    Submitted 27 April, 2021; v1 submitted 1 December, 2020; originally announced December 2020.

  32. arXiv:2011.06801  [pdf, other]

    cs.SD cs.LG eess.AS

    A Comprehensive Survey on Deep Music Generation: Multi-level Representations, Algorithms, Evaluations, and Future Directions

    Authors: Shulei Ji, Jing Luo, Xinyu Yang

    Abstract: The utilization of deep learning techniques in generating various contents (such as image, text, etc.) has become a trend. Especially music, the topic of this paper, has attracted widespread attention of countless researchers. The whole process of producing music can be divided into three stages, corresponding to the three levels of music generation: score generation produces scores, performance ge…

    Submitted 13 November, 2020; originally announced November 2020.

    Comments: 96 pages, this is a draft

  33. arXiv:2011.02028  [pdf]

    physics.ins-det eess.SY

    The upgrade of EAST Safety and Interlock system

    Authors: Z. C. Zhang, B. J. Xiao, Z. S. Ji, Y. Wang, F. Xia, Z. H. Xu

    Abstract: The Experimental Advanced Superconducting Tokamak (EAST), a nation-level large-scale scientific project of China, plays a key role for the research of peaceful utilizations of fusion energy. The safety and interlock system (SIS) is in charge of the supervision and control of all the EAST components involved in the protection of human and tokamak from potential accidents. With the development of ph…

    Submitted 30 October, 2020; originally announced November 2020.

  34. arXiv:2008.02340  [pdf, other]

    eess.IV cs.CV cs.LG

    Global Voxel Transformer Networks for Augmented Microscopy

    Authors: Zhengyang Wang, Yaochen Xie, Shuiwang Ji

    Abstract: Advances in deep learning have led to remarkable success in augmented microscopy, enabling us to obtain high-quality microscope images without using expensive microscopy hardware and sample preparation techniques. However, current deep learning models for augmented microscopy are mostly U-Net based neural networks, thus sharing certain drawbacks that limit the performance. In this work, we introdu…

    Submitted 23 November, 2020; v1 submitted 5 August, 2020; originally announced August 2020.

    Comments: Supplementary Material: https://documentcloud.adobe.com/link/track?uri=urn:aaid:scds:US:9fcf9e0d-6ea2-470b-8a89-ed09ac634ef8

  35. DeepResp: Deep learning solution for respiration-induced B0 fluctuation artifacts in multi-slice GRE

    Authors: Hongjun An, Hyeong-Geol Shin, Sooyoen Ji, Woojin Jung, Sehong Oh, Dongmyung Shin, Juhyung Park, Jongho Lee

    Abstract: Respiration-induced B$_0$ fluctuation corrupts MRI images by inducing phase errors in k-space. A few approaches such as navigator have been proposed to correct for the artifacts at the expense of sequence modification. In this study, a new deep learning method, which is referred to as DeepResp, is proposed for reducing the respiration-artifacts in multi-slice gradient echo (GRE) images. DeepResp i…

    Submitted 19 July, 2020; originally announced July 2020.

    Comments: 19 pages

  36. arXiv:2006.05245  [pdf]

    eess.IV cs.CV cs.LG

    A Review of Automated Diagnosis of COVID-19 Based on Scanning Images

    Authors: Delong Chen, Shunhui Ji, Fan Liu, Zewen Li, Xinyu Zhou

    Abstract: The pandemic of COVID-19 has caused millions of infections, which has led to a great loss all over the world, socially and economically. Due to the false-negative rate and the time-consuming nature of the conventional Reverse Transcription Polymerase Chain Reaction (RT-PCR) tests, diagnosing based on X-ray images and Computed Tomography (CT) images has been widely adopted. Therefore, researchers of the c…

    Submitted 24 July, 2021; v1 submitted 9 June, 2020; originally announced June 2020.

    Comments: In ICRAI 2020: 2020 6th International Conference on Robotics and Artificial Intelligence

  37. Exact artificial boundary conditions of 1D semi-discretized peridynamics

    Authors: Songsong Ji, Gang Pang, Jiwei Zhang, Yibo Yang, Paris Perdikaris

    Abstract: The peridynamic theory reformulates the equations of continuum mechanics in terms of integro-differential equations instead of partial differential equations. It is not trivial to directly apply naive approach in artificial boundary conditions for continua to peridynamics modeling, because it usually involves semi-discretization scheme. In this paper, we present a new way to construct exact bounda…

    Submitted 24 February, 2020; originally announced February 2020.

    Comments: 21 pages, 14 figures

  38. arXiv:1912.09015  [pdf]

    cs.LG cs.AI eess.IV eess.SP stat.ML

    Deep Reinforcement Learning Designed Shinnar-Le Roux RF Pulse using Root-Flipping: DeepRF_SLR

    Authors: Dongmyung Shin, Sooyeon Ji, Doohee Lee, Jieun Lee, Se-Hong Oh, Jongho Lee

    Abstract: A novel approach of applying deep reinforcement learning to an RF pulse design is introduced. This method, which is referred to as DeepRF_SLR, is designed to minimize the peak amplitude or, equivalently, minimize the pulse duration of a multiband refocusing pulse generated by the Shinnar-Le Roux (SLR) algorithm. In the method, the root pattern of SLR polynomial, which determines the RF pulse shape,…

    Submitted 1 September, 2020; v1 submitted 18 December, 2019; originally announced December 2019.

    Comments: Accepted at IEEE transactions on Medical Imaging (https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9174664)

  39. W-Net: Two-stage U-Net with misaligned data for raw-to-RGB mapping

    Authors: Kwang-Hyun Uhm, Seung-Wook Kim, Seo-Won Ji, Sung-Jin Cho, Jun-Pyo Hong, Sung-Jea Ko

    Abstract: Recent research on learning a mapping between raw Bayer images and RGB images has progressed with the development of deep convolutional neural networks. A challenging data set namely the Zurich Raw-to-RGB data set (ZRR) has been released in the AIM 2019 raw-to-RGB mapping challenge. In ZRR, input raw and target RGB images are captured by two different cameras and thus not perfectly aligned. Moreov…

    Submitted 21 November, 2019; v1 submitted 19 November, 2019; originally announced November 2019.

    Comments: Accepted by ICCVW 2019

  40. arXiv:1911.07424  [pdf, other]

    cs.CV cs.LG eess.IV

    Fast and Accurate 3D Hand Pose Estimation via Recurrent Neural Network for Capturing Hand Articulations

    Authors: Cheol-hwan Yoo, Seo-won Ji, Yong-goo Shin, Seung-wook Kim, Sung-jea Ko

    Abstract: 3D hand pose estimation from a single depth image plays an important role in computer vision and human-computer interaction. Although recent hand pose estimation methods using convolution neural network (CNN) have shown notable improvements in accuracy, most of them have a limitation that they rely on a complex network structure without fully exploiting the articulated structure of the hand. A han…

    Submitted 18 March, 2020; v1 submitted 17 November, 2019; originally announced November 2019.

    Journal ref: IEEE Access. 8 (2020) 114010-114019

  41. arXiv:1909.13287  [pdf, other]

    cs.MM cs.SD eess.AS

    MG-VAE: Deep Chinese Folk Songs Generation with Specific Regional Style

    Authors: Jing Luo, Xinyu Yang, Shulei Ji, Juan Li

    Abstract: Regional style in Chinese folk songs is a rich treasure that can be used for ethnic music creation and folk culture research. In this paper, we propose MG-VAE, a music generative model based on VAE (Variational Auto-Encoder) that is capable of capturing specific music style and generating novel tunes for Chinese folk songs (Min Ge) in a manipulatable way. Specifically, we disentangle the latent sp…

    Submitted 29 September, 2019; originally announced September 2019.

    Comments: Accepted by the 7th Conference on Sound and Music Technology, 2019, Harbin, China

  42. arXiv:1907.00941  [pdf, other]

    eess.IV cs.LG stat.ML

    Global Pixel Transformers for Virtual Staining of Microscopy Images

    Authors: Yi Liu, Hao Yuan, Zhengyang Wang, Shuiwang Ji

    Abstract: Visualizing the details of different cellular structures is of great importance to elucidate cellular functions. However, it is challenging to obtain high quality images of different structures directly due to complex cellular environments. Fluorescence staining is a popular technique to label different structures but has several drawbacks. In particular, label staining is time consuming and may a…

    Submitted 30 September, 2019; v1 submitted 1 July, 2019; originally announced July 2019.

    Comments: 10 pages, 6 figures, 5 tables

  43. arXiv:1806.09998  [pdf]

    eess.SP cs.SE eess.SY

    Real time state monitoring and fault diagnosis system for motor based on LabVIEW

    Authors: S. Q. Liu, Z. S. Ji, Y Wang, Z. C. Zhang

    Abstract: Motor is the most widely used production equipment in industrial field. In order to realize the real-time state monitoring and multi-fault pre-diagnosis of three-phase motor, this paper presents a design of three-phase motor state monitoring and fault diagnosis system based on LabVIEW. The multi-dimensional vibration acceleration, rotational speed, temperature, current and voltage signals of the…

    Submitted 24 June, 2018; originally announced June 2018.

    Comments: 4 pages, the IEEE NPSS Real Time Conference

    Report number: 547,poster session 1

    Journal ref: 2018 IEEE-NPSS Real Time Conference (RT)

  44. arXiv:1806.08530  [pdf]

    eess.SP

    Upgrade of the Analog Integrator for EAST Device

    Authors: Y. Wang, Z. S. Ji, Z. C. Zhang, S. Li, F. Wang, X. Y. Sun

    Abstract: Integrators are fundamental instruments to recover differential signals from magnetic probes in Experimental Advanced Superconducting Tokamak (EAST) experiments. A kind of difference integrator is introduced which has the same structure as the standard difference amplifier. The linear fitting method is used for determining the effective drift slope, then the plasma control system (PCS) uses the dri…

    Submitted 22 June, 2018; originally announced June 2018.

    Report number: 564