Showing 1–29 of 29 results for author: Mao, P

  1. arXiv:2406.18373  [pdf, other]

    cs.CL cs.SD eess.AS

    Dynamic Data Pruning for Automatic Speech Recognition

    Authors: Qiao Xiao, Pingchuan Ma, Adriana Fernandez-Lopez, Boqian Wu, Lu Yin, Stavros Petridis, Mykola Pechenizkiy, Maja Pantic, Decebal Constantin Mocanu, Shiwei Liu

    Abstract: The recent success of Automatic Speech Recognition (ASR) is largely attributed to the ever-growing amount of training data. However, this trend has made model training prohibitively costly and imposed computational demands. While data pruning has been proposed to mitigate this issue by identifying a small subset of relevant data, its application in ASR has been barely explored, and existing works…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech 2024

  2. arXiv:2406.10724  [pdf, other]

    eess.IV cs.CV cs.LG

    Beyond the Visible: Jointly Attending to Spectral and Spatial Dimensions with HSI-Diffusion for the FINCH Spacecraft

    Authors: Ian Vyse, Rishit Dagli, Dav Vrat Chadha, John P. Ma, Hector Chen, Isha Ruparelia, Prithvi Seran, Matthew Xie, Eesa Aamer, Aidan Armstrong, Naveen Black, Ben Borstein, Kevin Caldwell, Orrin Dahanaggamaarachchi, Joe Dai, Abeer Fatima, Stephanie Lu, Maxime Michet, Anoushka Paul, Carrie Ann Po, Shivesh Prakash, Noa Prosser, Riddhiman Roy, Mirai Shinjo, Iliya Shofman, et al. (4 additional authors not shown)

    Abstract: Satellite remote sensing missions have gained popularity over the past fifteen years due to their ability to cover large swaths of land at regular intervals, making them ideal for monitoring environmental trends. The FINCH mission, a 3U+ CubeSat equipped with a hyperspectral camera, aims to monitor crop residue cover in agricultural fields. Although hyperspectral imaging captures both spectral and…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: To appear in 38th Annual Small Satellite Conference

  3. arXiv:2405.02191  [pdf]

    cs.CV cs.LG eess.IV

    Non-Destructive Peat Analysis using Hyperspectral Imaging and Machine Learning

    Authors: Yijun Yan, Jinchang Ren, Barry Harrison, Oliver Lewis, Yinhe Li, Ping Ma

    Abstract: Peat, a crucial component in whisky production, imparts distinctive and irreplaceable flavours to the final product. However, the extraction of peat disrupts ancient ecosystems and releases significant amounts of carbon, contributing to climate change. This paper aims to address this issue by conducting a feasibility study on enhancing peat use efficiency in whisky manufacturing through non-destru…

    Submitted 3 May, 2024; originally announced May 2024.

    Comments: 4 pages, 4 figures

  4. arXiv:2405.01362  [pdf, other]

    eess.SP

    Wideband Penetration Loss through Building Materials and Partitions at 6.75 GHz in FR1(C) and 16.95 GHz in the FR3 Upper Mid-band spectrum

    Authors: Dipankar Shakya, Mingjun Ying, Theodore S. Rappaport, Hitesh Poddar, Peijie Ma, Yanbo Wang, Idris Al-Wazani

    Abstract: The 4–8 GHz FR1(C) and 7–24 GHz upper mid-band FR3 spectrum are promising new 6G spectrum allocations being considered by the International Telecommunications Union (ITU) and major governments around the world. There is an urgent need to understand the propagation behavior and radio coverage, outage, and material penetration for the global mobile wireless industry in both indoor and outdoor envi…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: 6 pages, 4 figures, 2 tables, IEEE GLOBECOM 2024

  5. arXiv:2405.01358  [pdf, other]

    eess.SP

    Propagation measurements and channel models in Indoor Environment at 6.75 GHz FR1(C) and 16.95 GHz FR3 Upper-mid band Spectrum for 5G and 6G

    Authors: Dipankar Shakya, Mingjun Ying, Theodore S. Rappaport, Hitesh Poddar, Peijie Ma, Yanbo Wang, Idris Al-Wazani

    Abstract: New spectrum allocations in the 4–8 GHz FR1(C) and 7–24 GHz FR3 mid-band frequency spectrum are being considered for 5G/6G cellular deployments. This paper presents results from the world's first comprehensive indoor hotspot (InH) propagation measurement campaign at 6.75 GHz and 16.95 GHz in the NYU WIRELESS Research Center using a 1 GHz wideband channel sounder system over distances from 11 to…

    Submitted 6 May, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

    Comments: 6 pages, 7 figures, 4 tables, IEEE GLOBECOM 2024

  6. arXiv:2404.09200  [pdf, other]

    cs.RO eess.SY

    Tube-RRT*: Efficient Homotopic Path Planning for Swarm Robotics Passing-Through Large-Scale Obstacle Environments

    Authors: Pengda Mao, Quan Quan

    Abstract: Recently, the concept of optimal virtual tube has emerged as a novel solution to the challenging task of navigating obstacle-dense environments for swarm robotics, offering a wide ranging of applications. However, it lacks an efficient homotopic path planning method in obstacle-dense environments. This paper introduces Tube-RRT*, an innovative homotopic path planning method that builds upon and im…

    Submitted 14 April, 2024; originally announced April 2024.

    Comments: 8 pages, 8 figures, submitted to RA-L

  7. arXiv:2404.06784  [pdf]

    quant-ph cond-mat.mes-hall cs.AR eess.SY

    Statistical evaluation of 571 GaAs quantum point contact transistors showing the 0.7 anomaly in quantized conductance using millikelvin cryogenic on-chip multiplexing

    Authors: Pengcheng Ma, Kaveh Delfanazari, Reuben K. Puddy, Jiahui Li, Moda Cao, Teng Yi, Jonathan P. Griffiths, Harvey E. Beere, David A. Ritchie, Michael J. Kelly, Charles G. Smith

    Abstract: The mass production and the practical number of cryogenic quantum devices producible in a single chip are limited to the number of electrical contact pads and wiring of the cryostat or dilution refrigerator. It is, therefore, beneficial to contrast the measurements of hundreds of devices fabricated in a single chip in one cooldown process to promote the scalability, integrability, reliability, and…

    Submitted 10 April, 2024; originally announced April 2024.

  8. arXiv:2401.17575  [pdf, other]

    eess.SP

    Can We Improve Channel Reciprocity via Loop-back Compensation for RIS-assisted Physical Layer Key Generation

    Authors: Ningya Xu, Guoshun Nan, Xiaofeng Tao, Na Li, Pengxuan Mao, Tianyuan Yang

    Abstract: Reconfigurable intelligent surface (RIS) facilitates the extraction of unpredictable channel features for physical layer key generation (PKG), securing communications among legitimate users with symmetric keys. Previous works have demonstrated that channel reciprocity plays a crucial role in generating symmetric keys in PKG systems, whereas, in reality, reciprocity is greatly affected by hardware…

    Submitted 13 August, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

    Comments: Accepted by ICC 2024

  9. arXiv:2310.17864  [pdf, other]

    eess.AS cs.SD

    TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch

    Authors: Jeff Hwang, Moto Hira, Caroline Chen, Xiaohui Zhang, Zhaoheng Ni, Guangzhi Sun, Pingchuan Ma, Ruizhe Huang, Vineel Pratap, Yuekai Zhang, Anurag Kumar, Chin-Yun Yu, Chuang Zhu, Chunxi Liu, Jacob Kahn, Mirco Ravanelli, Peng Sun, Shinji Watanabe, Yangyang Shi, Yumeng Tao, Robin Scheibler, Samuele Cornell, Sean Kim, Stavros Petridis

    Abstract: TorchAudio is an open-source audio and speech processing library built for PyTorch. It aims to accelerate the research and development of audio and speech technologies by providing well-designed, easy-to-use, and performant PyTorch components. Its contributors routinely engage with users to understand their needs and fulfill them by developing impactful features. Here, we survey TorchAudio's devel…

    Submitted 26 October, 2023; originally announced October 2023.

  10. arXiv:2303.17200  [pdf, other]

    cs.CV cs.AI cs.SD eess.AS

    SynthVSR: Scaling Up Visual Speech Recognition With Synthetic Supervision

    Authors: Xubo Liu, Egor Lakomkin, Konstantinos Vougioukas, Pingchuan Ma, Honglie Chen, Ruiming Xie, Morrie Doulaty, Niko Moritz, Jáchym Kolář, Stavros Petridis, Maja Pantic, Christian Fuegen

    Abstract: Recently reported state-of-the-art results in visual speech recognition (VSR) often rely on increasingly large amounts of video data, while the publicly available transcribed video datasets are limited in size. In this paper, for the first time, we study the potential of leveraging synthetic visual data for VSR. Our method, termed SynthVSR, substantially improves the performance of VSR systems wit…

    Submitted 3 April, 2023; v1 submitted 30 March, 2023; originally announced March 2023.

    Comments: IEEE/CVF CVPR 2023

  11. Auto-AVSR: Audio-Visual Speech Recognition with Automatic Labels

    Authors: Pingchuan Ma, Alexandros Haliassos, Adriana Fernandez-Lopez, Honglie Chen, Stavros Petridis, Maja Pantic

    Abstract: Audio-visual speech recognition has received a lot of attention due to its robustness against acoustic noise. Recently, the performance of automatic, visual, and audio-visual speech recognition (ASR, VSR, and AV-ASR, respectively) has been substantially improved, mainly due to the use of larger models and training sets. However, accurate labelling of datasets is time-consuming and expensive. Hence…

    Submitted 28 June, 2023; v1 submitted 24 March, 2023; originally announced March 2023.

    Comments: Accepted to ICASSP 2023

  12. arXiv:2303.09455  [pdf, other]

    cs.CL cs.CV cs.LG cs.SD eess.AS

    Learning Cross-lingual Visual Speech Representations

    Authors: Andreas Zinonos, Alexandros Haliassos, Pingchuan Ma, Stavros Petridis, Maja Pantic

    Abstract: Cross-lingual self-supervised learning has been a growing research topic in the last few years. However, current works only explored the use of audio signals to create representations. In this work, we study cross-lingual self-supervised visual representation learning. We use the recently-proposed Raw Audio-Visual Speech Encoders (RAVEn) framework to pre-train an audio-visual model with unlabelled…

    Submitted 14 March, 2023; originally announced March 2023.

  13. arXiv:2302.13854  [pdf, other]

    eess.SP astro-ph.IM cs.LG cs.SD eess.AS

    A Deep Neural Network Based Reverse Radio Spectrogram Search Algorithm

    Authors: Peter Xiangyuan Ma, Steve Croft, Chris Lintott, Andrew P. V. Siemion

    Abstract: Modern radio astronomy instruments generate vast amounts of data, and the increasingly challenging radio frequency interference (RFI) environment necessitates ever-more sophisticated RFI rejection algorithms. The "needle in a haystack" nature of searches for transients and technosignatures requires us to develop methods that can determine whether a signal of interest has unique properties, or is a…

    Submitted 18 January, 2024; v1 submitted 23 February, 2023; originally announced February 2023.

    Comments: 8 pages, 8 figures

    Journal ref: RAS Techniques and Instruments 2023

  14. arXiv:2211.02133  [pdf, other]

    eess.AS cs.CV cs.SD

    Streaming Audio-Visual Speech Recognition with Alignment Regularization

    Authors: Pingchuan Ma, Niko Moritz, Stavros Petridis, Christian Fuegen, Maja Pantic

    Abstract: In this work, we propose a streaming AV-ASR system based on a hybrid connectionist temporal classification (CTC)/attention neural network architecture. The audio and the visual encoder neural networks are both based on the conformer architecture, which is made streamable using chunk-wise self-attention (CSA) and causal convolution. Streaming recognition with a decoder neural network is realized by…

    Submitted 1 July, 2023; v1 submitted 3 November, 2022; originally announced November 2022.

    Comments: Accepted to Interspeech 2023

  15. arXiv:2207.14166  [pdf, ps, other]

    cs.CV cs.LG eess.IV

    RHA-Net: An Encoder-Decoder Network with Residual Blocks and Hybrid Attention Mechanisms for Pavement Crack Segmentation

    Authors: Guijie Zhu, Zhun Fan, Jiacheng Liu, Duan Yuan, Peili Ma, Meihua Wang, Weihua Sheng, Kelvin C. P. Wang

    Abstract: The acquisition and evaluation of pavement surface data play an essential role in pavement condition evaluation. In this paper, an efficient and effective end-to-end network for automatic pavement crack segmentation, called RHA-Net, is proposed to improve the pavement crack segmentation accuracy. The RHA-Net is built by integrating residual blocks (ResBlocks) and hybrid attention blocks into the e…

    Submitted 28 July, 2022; originally announced July 2022.

  16. arXiv:2202.13084  [pdf, other]

    cs.CV cs.SD eess.AS

    Visual Speech Recognition for Multiple Languages in the Wild

    Authors: Pingchuan Ma, Stavros Petridis, Maja Pantic

    Abstract: Visual speech recognition (VSR) aims to recognize the content of speech based on lip movements, without relying on the audio stream. Advances in deep learning and the availability of large audio-visual datasets have led to the development of much more accurate and robust VSR models than ever before. However, these advances are usually due to the larger training sets rather than the model design. H…

    Submitted 30 October, 2022; v1 submitted 26 February, 2022; originally announced February 2022.

    Comments: Published in Nature Machine Intelligence

  17. arXiv:2202.09020  [pdf, other]

    cs.CV eess.IV q-bio.QM

    A Comprehensive Survey with Quantitative Comparison of Image Analysis Methods for Microorganism Biovolume Measurements

    Authors: Jiawei Zhang, Chen Li, Md Mamunur Rahaman, Yudong Yao, Pingli Ma, Jinghua Zhang, Xin Zhao, Tao Jiang, Marcin Grzegorzek

    Abstract: With the acceleration of urbanization and living standards, microorganisms play increasingly important roles in industrial production, bio-technique, and food safety testing. Microorganism biovolume measurements are one of the essential parts of microbial analysis. However, traditional manual measurement methods are time-consuming and challenging to measure the characteristics precisely. With the…

    Submitted 2 May, 2022; v1 submitted 17 February, 2022; originally announced February 2022.

  18. arXiv:2202.07820  [pdf, other]

    eess.IV cs.CV

    A Survey of Semen Quality Evaluation in Microscopic Videos Using Computer Assisted Sperm Analysis

    Authors: Wenwei Zhao, Pingli Ma, Chen Li, Xiaoning Bu, Shuojia Zou, Tao Jiang, Marcin Grzegorzek

    Abstract: The Computer Assisted Sperm Analysis (CASA) plays a crucial role in male reproductive health diagnosis and Infertility treatment. With the development of the computer industry in recent years, a great of accurate algorithms are proposed. With the assistance of those novel algorithms, it is possible for CASA to achieve a faster and higher quality result. Since image processing is the technical basi…

    Submitted 17 February, 2022; v1 submitted 15 February, 2022; originally announced February 2022.

  19. arXiv:2106.09171  [pdf, other]

    cs.LG cs.CV cs.SD eess.AS

    LiRA: Learning Visual Speech Representations from Audio through Self-supervision

    Authors: Pingchuan Ma, Rodrigo Mira, Stavros Petridis, Björn W. Schuller, Maja Pantic

    Abstract: The large amount of audiovisual content being shared online today has drawn substantial attention to the prospect of audiovisual self-supervised learning. Recent works have focused on each of these modalities separately, while others have attempted to model both simultaneously in a cross-modal fashion. However, comparatively little attention has been given to leveraging one modality as a training…

    Submitted 16 June, 2021; originally announced June 2021.

    Comments: Accepted for publication at Interspeech 2021

  20. arXiv:2104.13332  [pdf, other]

    cs.LG cs.CV cs.SD eess.AS

    End-to-End Video-To-Speech Synthesis using Generative Adversarial Networks

    Authors: Rodrigo Mira, Konstantinos Vougioukas, Pingchuan Ma, Stavros Petridis, Björn W. Schuller, Maja Pantic

    Abstract: Video-to-speech is the process of reconstructing the audio speech from a video of a spoken utterance. Previous approaches to this task have relied on a two-step process where an intermediate representation is inferred from the video, and is then decoded into waveform audio using a vocoder or a waveform reconstruction algorithm. In this work, we propose a new end-to-end video-to-speech model based…

    Submitted 15 August, 2022; v1 submitted 27 April, 2021; originally announced April 2021.

    Comments: Published in IEEE Transactions on Cybernetics (April 2022)

  21. arXiv:2103.13625  [pdf, other]

    eess.IV q-bio.QM

    A Comprehensive Review of Image Analysis Methods for Microorganism Counting: From Classical Image Processing to Deep Learning Approaches

    Authors: Jiawei Zhang, Chen Li, Md Mamunur Rahaman, Yudong Yao, Pingli Ma, Jinghua Zhang, Xin Zhao, Tao Jiang, Marcin Grzegorzek

    Abstract: Microorganisms such as bacteria and fungi play essential roles in many application fields, like biotechnique, medical technique and industrial domain. Microorganism counting techniques are crucial in microorganism analysis, helping biologists and related researchers quantitatively analyze the microorganisms and calculate their characteristics, such as biomass concentration and biological activity.…

    Submitted 29 September, 2021; v1 submitted 25 March, 2021; originally announced March 2021.

  22. arXiv:2103.03447  [pdf, other]

    eess.SY

    User-Centric Cooperative MEC Service Offloading

    Authors: Ruoyun Chen, Hancheng Lu, Pengfei Ma

    Abstract: Mobile edge computing provides users with a cloud environment close to the edge of the wireless network, supporting the computing intensive applications that have low latency requirements. The combination of offloading with the wireless communication brings new challenges. This paper investigates the service caching problem during the long-term service offloading in the user-centric wireless netwo…

    Submitted 4 March, 2021; originally announced March 2021.

    Comments: 6 pages

  23. arXiv:2102.06657  [pdf, other]

    cs.CV eess.AS

    End-to-end Audio-visual Speech Recognition with Conformers

    Authors: Pingchuan Ma, Stavros Petridis, Maja Pantic

    Abstract: In this work, we present a hybrid CTC/Attention model based on a ResNet-18 and Convolution-augmented transformer (Conformer), that can be trained in an end-to-end manner. In particular, the audio and visual encoders learn to extract features directly from raw pixels and audio waveforms, respectively, which are then fed to conformers and then fusion takes place via a Multi-Layer Perceptron (MLP). T…

    Submitted 12 February, 2021; originally announced February 2021.

    Comments: Accepted to ICASSP 2021

  24. arXiv:2001.08702  [pdf, other]

    cs.CV cs.SD eess.AS

    Lipreading using Temporal Convolutional Networks

    Authors: Brais Martinez, Pingchuan Ma, Stavros Petridis, Maja Pantic

    Abstract: Lip-reading has attracted a lot of research attention lately thanks to advances in deep learning. The current state-of-the-art model for recognition of isolated words in-the-wild consists of a residual network and Bidirectional Gated Recurrent Unit (BGRU) layers. In this work, we address the limitations of this model and we propose changes which further improve its performance. Firstly, the BGRU l…

    Submitted 23 January, 2020; originally announced January 2020.

  25. arXiv:2001.04316  [pdf, other]

    eess.AS cs.CV cs.MM

    Visually Guided Self Supervised Learning of Speech Representations

    Authors: Abhinav Shukla, Konstantinos Vougioukas, Pingchuan Ma, Stavros Petridis, Maja Pantic

    Abstract: Self supervised representation learning has recently attracted a lot of research interest for both the audio and visual modalities. However, most works typically focus on a particular modality or feature alone and there has been very limited work that studies the interaction between the two modalities for learning self supervised representations. We propose a framework for learning audio represent…

    Submitted 20 February, 2020; v1 submitted 13 January, 2020; originally announced January 2020.

    Comments: Accepted at ICASSP 2020. v2: Updated to the ICASSP 2020 camera-ready version

  26. arXiv:1912.08639  [pdf, other]

    cs.CV cs.SD eess.AS

    Detecting Adversarial Attacks On Audiovisual Speech Recognition

    Authors: Pingchuan Ma, Stavros Petridis, Maja Pantic

    Abstract: Adversarial attacks pose a threat to deep learning models. However, research on adversarial detection methods, especially in the multi-modal domain, is very limited. In this work, we propose an efficient and straightforward detection method based on the temporal correlation between audio and video streams. The main idea is that the correlation between audio and video in adversarial examples will b…

    Submitted 12 February, 2021; v1 submitted 18 December, 2019; originally announced December 2019.

    Comments: Accepted to ICASSP 2021

  27. arXiv:1906.06301  [pdf, other]

    eess.AS cs.CV cs.SD

    Video-Driven Speech Reconstruction using Generative Adversarial Networks

    Authors: Konstantinos Vougioukas, Pingchuan Ma, Stavros Petridis, Maja Pantic

    Abstract: Speech is a means of communication which relies on both audio and visual information. The absence of one modality can often lead to confusion or misinterpretation of information. In this paper we present an end-to-end temporal model capable of directly synthesising audio from silent video, without needing to transform to-and-from intermediate features. Our proposed approach, based on GANs is capab…

    Submitted 14 June, 2019; originally announced June 2019.

  28. arXiv:1906.02112  [pdf, other]

    eess.AS cs.CV eess.IV

    Investigating the Lombard Effect Influence on End-to-End Audio-Visual Speech Recognition

    Authors: Pingchuan Ma, Stavros Petridis, Maja Pantic

    Abstract: Several audio-visual speech recognition models have been recently proposed which aim to improve the robustness over audio-only models in the presence of noise. However, almost all of them ignore the impact of the Lombard effect, i.e., the change in speaking style in noisy environments which aims to make speech more intelligible and affects both the acoustic characteristics of speech and the lip mo…

    Submitted 9 July, 2019; v1 submitted 5 June, 2019; originally announced June 2019.

    Comments: Accepted for publication at Interspeech 2019

  29. arXiv:1903.03474  [pdf, other]

    physics.app-ph eess.SP physics.optics

    Demonstration of multivariate photonics: blind dimensionality reduction with analog integrated photonics

    Authors: Alexander N. Tait, Philip Y. Ma, Thomas Ferreira de Lima, Eric C. Blow, Matthew P. Chang, Mitchell A. Nahmias, Bhavin J. Shastri, Paul R. Prucnal

    Abstract: Multi-antenna radio front-ends generate a multi-dimensional flood of information, most of which is partially redundant. Redundancy is eliminated by dimensionality reduction, but contemporary digital processing techniques face harsh fundamental tradeoffs when implementing this class of functions. These tradeoffs can be broken in the analog domain, in which the performance of optical technologies gr…

    Submitted 10 February, 2019; originally announced March 2019.

    Comments: 24 pages, 7 figures