Showing 1–50 of 100 results for author: Chiu, C

  1. arXiv:2408.02849  [pdf, other]

    cs.LG cs.NI

    Active Learning for WBAN-based Health Monitoring

    Authors: Cho-Chun Chiu, Tuan Nguyen, Ting He, Shiqiang Wang, Beom-Su Kim, Ki-Il Kim

    Abstract: We consider a novel active learning problem motivated by the need to learn machine learning models for health monitoring in a wireless body area network (WBAN). Due to the limited resources at body sensors, collecting each unlabeled sample in a WBAN incurs a nontrivial cost. Moreover, training health monitoring models typically requires labels indicating the patient's health state that need to be g…

    Submitted 5 August, 2024; originally announced August 2024.

  2. arXiv:2407.21075  [pdf, other]

    cs.AI cs.CL cs.LG

    Apple Intelligence Foundation Language Models

    Authors: Tom Gunter, Zirui Wang, Chong Wang, Ruoming Pang, Andy Narayanan, Aonan Zhang, Bowen Zhang, Chen Chen, Chung-Cheng Chiu, David Qiu, Deepak Gopinath, Dian Ang Yap, Dong Yin, Feng Nan, Floris Weers, Guoli Yin, Haoshuo Huang, Jianyu Wang, Jiarui Lu, John Peebles, Ke Ye, Mark Lee, Nan Du, Qibin Chen, Quentin Keunebroek, et al. (130 additional authors not shown)

    Abstract: We present foundation language models developed to power Apple Intelligence features, including a ~3 billion parameter model designed to run efficiently on devices and a large server-based language model designed for Private Cloud Compute. These models are designed to perform a wide range of tasks efficiently, accurately, and responsibly. This report describes the model architecture, the data used…

    Submitted 29 July, 2024; originally announced July 2024.

  3. arXiv:2407.07492  [pdf, other]

    cs.CV cs.LG

    Fine-Grained Classification for Poisonous Fungi Identification with Transfer Learning

    Authors: Christopher Chiu, Maximilian Heil, Teresa Kim, Anthony Miyaguchi

    Abstract: FungiCLEF 2024 addresses the fine-grained visual categorization (FGVC) of fungi species, with a focus on identifying poisonous species. This task is challenging due to the size and class imbalance of the dataset, subtle inter-class variations, and significant intra-class variability amongst samples. In this paper, we document our approach in tackling this challenge through the use of ensemble clas…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: Submitted and accepted into CLEF 2024 CEUR-WS proceedings

  4. arXiv:2405.20550  [pdf]

    cs.LG stat.ML

    Uncertainty Quantification for Deep Learning

    Authors: Peter Jan van Leeuwen, J. Christine Chiu, C. Kevin Yang

    Abstract: A complete and statistically consistent uncertainty quantification for deep learning is provided, including the sources of uncertainty arising from (1) the new input data, (2) the training and testing data, (3) the weight vectors of the neural network, and (4) the neural network because it is not a perfect predictor. Using Bayes' theorem and conditional probability densities, we demonstrate how each…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 25 pages, 4 figures, submitted to Environmental Data Science

    MSC Class: 62D99 ACM Class: G.3

  5. arXiv:2405.19660  [pdf, other]

    cs.CL

    PATIENT-Ψ: Using Large Language Models to Simulate Patients for Training Mental Health Professionals

    Authors: Ruiyi Wang, Stephanie Milani, Jamie C. Chiu, Jiayin Zhi, Shaun M. Eack, Travis Labrum, Samuel M. Murphy, Nev Jones, Kate Hardy, Hong Shen, Fei Fang, Zhiyu Zoey Chen

    Abstract: Mental illness remains one of the most critical public health issues. Despite its importance, many mental health professionals highlight a disconnect between their training and actual real-world patient practice. To help bridge this gap, we propose PATIENT-Ψ, a novel patient simulation framework for cognitive behavior therapy (CBT) training. To build PATIENT-Ψ, we construct diverse patient cogniti…

    Submitted 18 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

    Comments: 9 pages, 5 figures

  6. arXiv:2405.16495  [pdf, ps, other]

    cs.GT

    DePIN: A Framework for Token-Incentivized Participatory Sensing

    Authors: Michael T. C. Chiu, Sachit Mahajan, Mark C. Ballandies, Uroš V. Kalabić

    Abstract: There is always demand for integrating data into microeconomic decision making. Participatory sensing deals with how real-world data may be extracted with stakeholder participation and resolves a problem of Big Data, which is concerned with monetizing data extracted from individuals without their participation. We present how Decentralized Physical Infrastructure Networks (DePINs) extend participa…

    Submitted 26 May, 2024; originally announced May 2024.

  7. arXiv:2405.08681  [pdf, other]

    cs.CV cs.AI

    Achieving Fairness Through Channel Pruning for Dermatological Disease Diagnosis

    Authors: Qingpeng Kong, Ching-Hao Chiu, Dewen Zeng, Yu-Jen Chen, Tsung-Yi Ho, Jingtong Hu, Yiyu Shi

    Abstract: Numerous studies have revealed that deep learning-based medical image classification models may exhibit bias towards specific demographic attributes, such as race, gender, and age. Existing bias mitigation methods often achieve a high level of fairness at the cost of significant accuracy degradation. In response to this challenge, we propose an innovative and adaptable Soft Nearest Neighbor Loss-bas…

    Submitted 14 May, 2024; originally announced May 2024.

    Comments: 13 pages, 3 figures, early accepted by International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2024

  8. arXiv:2404.17716  [pdf]

    cs.AI

    Airlift Challenge: A Competition for Optimizing Cargo Delivery

    Authors: Adis Delanovic, Carmen Chiu, Andre Beckus

    Abstract: Airlift operations require the timely distribution of various cargo, much of which is time sensitive and valuable. However, these operations have to contend with sudden disruptions from weather and malfunctions, requiring immediate rescheduling. The Airlift Challenge competition seeks possible solutions via a simulator that provides a simplified abstraction of the airlift problem. The simulator us…

    Submitted 26 April, 2024; originally announced April 2024.

  9. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1110 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 8 August, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  10. arXiv:2402.13061  [pdf, other]

    cs.CV

    Toward Fairness via Maximum Mean Discrepancy Regularization on Logits Space

    Authors: Hao-Wei Chung, Ching-Hao Chiu, Yu-Jen Chen, Yiyu Shi, Tsung-Yi Ho

    Abstract: Fairness has become increasingly pivotal in machine learning for high-risk applications such as healthcare and facial recognition. However, we observe deficiencies in previous logits-space constraint methods. Therefore, we propose a novel framework, Logits-MMD, that achieves the fairness condition by imposing constraints on output logits with Maximum Mean Discrepancy. Moreove…

    Submitted 20 February, 2024; originally announced February 2024.

  11. arXiv:2402.12862  [pdf, other]

    cs.CL

    Handling Ambiguity in Emotion: From Out-of-Domain Detection to Distribution Estimation

    Authors: Wen Wu, Bo Li, Chao Zhang, Chung-Cheng Chiu, Qiujia Li, Junwen Bai, Tara N. Sainath, Philip C. Woodland

    Abstract: The subjective perception of emotion leads to inconsistent labels from human annotators. Typically, utterances lacking majority-agreed labels are excluded when training an emotion classifier, which causes problems when encountering ambiguous emotional expressions during testing. This paper investigates three methods to handle ambiguous emotion. First, we show that incorporating utterances without m…

    Submitted 20 February, 2024; originally announced February 2024.

  12. Achieve Fairness without Demographics for Dermatological Disease Diagnosis

    Authors: Ching-Hao Chiu, Yu-Jen Chen, Yawen Wu, Yiyu Shi, Tsung-Yi Ho

    Abstract: In medical image diagnosis, fairness has become increasingly crucial. Without bias mitigation, deploying unfair AI would harm the interests of the underprivileged population and potentially tear society apart. Recent research addresses prediction biases in deep learning models concerning demographic groups (e.g., gender, age, and race) by utilizing demographic (sensitive attribute) information dur…

    Submitted 15 January, 2024; originally announced January 2024.

  13. A metric for characterizing the arm nonuse workspace in poststroke individuals using a robot arm

    Authors: Nathaniel Dennler, Amelia Cain, Erica De Guzman, Claudia Chiu, Carolee J. Winstein, Stefanos Nikolaidis, Maja J. Matarić

    Abstract: An over-reliance on the less-affected limb for functional tasks at the expense of the paretic limb and in spite of recovered capacity is an often-observed phenomenon in survivors of hemispheric stroke. The difference between capacity for use and actual spontaneous use is referred to as arm nonuse. Obtaining an ecologically valid evaluation of arm nonuse is challenging because it requires the obser…

    Submitted 12 January, 2024; originally announced January 2024.

    Comments: Accepted to Science Robotics at https://www.science.org/doi/10.1126/scirobotics.adf7723 on November 15th, 2023

    Journal ref: Science Robotics 8, eadf7723 (2023)

  14. arXiv:2401.03083  [pdf, other]

    cs.LG cs.DC math.OC

    Energy-efficient Decentralized Learning via Graph Sparsification

    Authors: Xusheng Zhang, Cho-Chun Chiu, Ting He

    Abstract: This work aims at improving the energy efficiency of decentralized learning by optimizing the mixing matrix, which controls the communication demands during the learning process. Through rigorous analysis based on a state-of-the-art decentralized learning algorithm, the problem is formulated as a bi-level optimization, with the lower level solved by graph sparsification. A solution with guaranteed…

    Submitted 22 May, 2024; v1 submitted 5 January, 2024; originally announced January 2024.

    Comments: ICASSP 2024

  15. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  16. arXiv:2310.05418  [pdf, other]

    cs.CL cs.AI cs.HC

    Humanoid Agents: Platform for Simulating Human-like Generative Agents

    Authors: Zhilin Wang, Yu Ying Chiu, Yu Cheung Chiu

    Abstract: Just as computational simulations of atoms, molecules and cells have shaped the way we study the sciences, true-to-life simulations of human-like agents can be valuable tools for studying human behavior. We propose Humanoid Agents, a system that guides Generative Agents to behave more like humans by introducing three elements of System 1 processing: Basic needs (e.g. hunger, health and energy), Em…

    Submitted 9 October, 2023; originally announced October 2023.

    Comments: Accepted at EMNLP System Demonstrations 2023

  17. arXiv:2310.00230  [pdf, other]

    cs.CL cs.SD eess.AS

    SLM: Bridge the thin gap between speech and text foundation models

    Authors: Mingqiu Wang, Wei Han, Izhak Shafran, Zelin Wu, Chung-Cheng Chiu, Yuan Cao, Yongqiang Wang, Nanxin Chen, Yu Zhang, Hagen Soltau, Paul Rubenstein, Lukas Zilka, Dian Yu, Zhong Meng, Golan Pundak, Nikhil Siddhartha, Johan Schalkwyk, Yonghui Wu

    Abstract: We present a joint Speech and Language Model (SLM), a multitask, multilingual, and dual-modal model that takes advantage of pretrained foundational speech and language models. SLM freezes the pretrained foundation models to maximally preserve their capabilities, and only trains a simple adapter with just 1% (156M) of the foundation models' parameters. This adaptation not only leads SLM to achiev…

    Submitted 29 September, 2023; originally announced October 2023.

  18. arXiv:2308.14845  [pdf, other]

    cs.LG cs.DB

    SMOClust: Synthetic Minority Oversampling based on Stream Clustering for Evolving Data Streams

    Authors: Chun Wai Chiu, Leandro L. Minku

    Abstract: Many real-world data stream applications not only suffer from concept drift but also class imbalance. Yet, very few existing studies investigated this joint challenge. Data difficulty factors, which have been shown to be key challenges in class imbalanced data streams, are not taken into account by existing approaches when learning class imbalanced data streams. In this work, we propose a drift ad…

    Submitted 28 August, 2023; originally announced August 2023.

    Comments: 59 pages, 85 figures

  19. arXiv:2308.10355  [pdf, other]

    eess.AS cs.SD

    Local Periodicity-Based Beat Tracking for Expressive Classical Piano Music

    Authors: Ching-Yu Chiu, Meinard Müller, Matthew E. P. Davies, Alvin Wen-Yu Su, Yi-Hsuan Yang

    Abstract: To model the periodicity of beats, state-of-the-art beat tracking systems use "post-processing trackers" (PPTs) that rely on several empirically determined global assumptions for tempo transition, which work well for music with a steady tempo. For expressive classical music, however, these assumptions can be too rigid. With two large datasets of Western classical piano music, namely the Aligned Sc…

    Submitted 20 August, 2023; originally announced August 2023.

    Comments: Accepted to IEEE/ACM Transactions on Audio, Speech, and Language Processing (July 2023)

  20. arXiv:2308.07024  [pdf, other]

    cs.CV

    PGT-Net: Progressive Guided Multi-task Neural Network for Small-area Wet Fingerprint Denoising and Recognition

    Authors: Yu-Ting Li, Ching-Te Chiu, An-Ting Hsieh, Mao-Hsiu Hsu, Long Wenyong, Jui-Min Hsu

    Abstract: Fingerprint recognition on mobile devices is an important method for identity verification. However, real fingerprints usually contain sweat and moisture, which lead to poor recognition performance. In addition, to roll out slimmer and thinner phones, technology companies reduce the size of recognition sensors by embedding them in the power button. Therefore, the limited size of fingerprint…

    Submitted 14 August, 2023; originally announced August 2023.

  21. arXiv:2307.07650  [pdf, ps, other]

    cs.LG cs.AI eess.SP

    SALC: Skeleton-Assisted Learning-Based Clustering for Time-Varying Indoor Localization

    Authors: An-Hung Hsiao, Li-Hsiang Shen, Chen-Yi Chang, Chun-Jie Chiu, Kai-Ten Feng

    Abstract: Wireless indoor localization has attracted a significant amount of attention in recent years. Using received signal strength (RSS) obtained from WiFi access points (APs) to establish a fingerprinting database is a widely utilized method in indoor localization. However, the time-variant problem for indoor positioning systems is not well investigated in the existing literature. Compared to conventional…

    Submitted 14 July, 2023; originally announced July 2023.

  22. Toward Fairness Through Fair Multi-Exit Framework for Dermatological Disease Diagnosis

    Authors: Ching-Hao Chiu, Hao-Wei Chung, Yu-Jen Chen, Yiyu Shi, Tsung-Yi Ho

    Abstract: Fairness has become increasingly pivotal in medical image recognition. However, without mitigating bias, deploying unfair medical AI systems could harm the interests of underprivileged populations. In this paper, we observe that while features extracted from the deeper layers of neural networks generally offer higher accuracy, fairness conditions deteriorate as we extract features from deeper laye…

    Submitted 1 July, 2023; v1 submitted 26 June, 2023; originally announced June 2023.

    Comments: MICCAI2023

  23. arXiv:2306.08131  [pdf, other]

    eess.AS cs.SD

    Efficient Adapters for Giant Speech Models

    Authors: Nanxin Chen, Izhak Shafran, Yu Zhang, Chung-Cheng Chiu, Hagen Soltau, James Qin, Yonghui Wu

    Abstract: Large pre-trained speech models are widely used as the de-facto paradigm, especially in scenarios when there is a limited amount of labeled data available. However, finetuning all parameters from the self-supervised learned model can be computationally expensive, and becomes infeasible as the size of the model and the number of downstream tasks scale. In this paper, we propose a novel approach c…

    Submitted 13 June, 2023; originally announced June 2023.

  24. arXiv:2304.05483  [pdf, other]

    cs.RO

    Contingency Games for Multi-Agent Interaction

    Authors: Lasse Peters, Andrea Bajcsy, Chih-Yuan Chiu, David Fridovich-Keil, Forrest Laine, Laura Ferranti, Javier Alonso-Mora

    Abstract: Contingency planning, wherein an agent generates a set of possible plans conditioned on the outcome of an uncertain event, is an increasingly popular way for robots to act under uncertainty. In this work we take a game-theoretic perspective on contingency planning, tailored to multi-agent scenarios in which a robot's actions impact the decisions of other agents and vice versa. The resulting contin…

    Submitted 21 December, 2023; v1 submitted 11 April, 2023; originally announced April 2023.

  25. arXiv:2303.01037  [pdf, other]

    cs.CL cs.SD eess.AS

    Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages

    Authors: Yu Zhang, Wei Han, James Qin, Yongqiang Wang, Ankur Bapna, Zhehuai Chen, Nanxin Chen, Bo Li, Vera Axelrod, Gary Wang, Zhong Meng, Ke Hu, Andrew Rosenberg, Rohit Prabhavalkar, Daniel S. Park, Parisa Haghani, Jason Riesa, Ginger Perng, Hagen Soltau, Trevor Strohman, Bhuvana Ramabhadran, Tara Sainath, Pedro Moreno, Chung-Cheng Chiu, Johan Schalkwyk, et al. (2 additional authors not shown)

    Abstract: We introduce the Universal Speech Model (USM), a single large model that performs automatic speech recognition (ASR) across 100+ languages. This is achieved by pre-training the encoder of the model on a large unlabeled multilingual dataset of 12 million (M) hours spanning over 300 languages, and fine-tuning on a smaller labeled dataset. We use multilingual pre-training with random-projection quant…

    Submitted 24 September, 2023; v1 submitted 2 March, 2023; originally announced March 2023.

    Comments: 20 pages, 7 figures, 8 tables

  26. arXiv:2301.06869  [pdf, other]

    cs.CV

    SAT: Size-Aware Transformer for 3D Point Cloud Semantic Segmentation

    Authors: Junjie Zhou, Yongping Xiong, Chinwai Chiu, Fangyu Liu, Xiangyang Gong

    Abstract: Transformer models have achieved promising performance in point cloud segmentation. However, most existing attention schemes provide the same feature learning paradigm for all points equally and overlook the enormous difference in size among scene objects. In this paper, we propose the Size-Aware Transformer (SAT) that can tailor effective receptive fields for objects of different sizes. Our SAT…

    Submitted 17 January, 2023; originally announced January 2023.

  27. arXiv:2301.02989  [pdf, other]

    cs.CV cs.AI cs.LG

    Fair Multi-Exit Framework for Facial Attribute Classification

    Authors: Ching-Hao Chiu, Hao-Wei Chung, Yu-Jen Chen, Yiyu Shi, Tsung-Yi Ho

    Abstract: Fairness has become increasingly pivotal in facial recognition. Without bias mitigation, deploying unfair AI would harm the interests of the underprivileged population. In this paper, we observe that although features from the deeper layers of a neural network generally offer higher accuracy, fairness conditions deteriorate as we extract features from deeper layers. This phenomenon motivates…

    Submitted 8 January, 2023; originally announced January 2023.

  28. arXiv:2301.01398  [pdf, other]

    cs.MA cs.RO eess.SY

    Cost Inference for Feedback Dynamic Games from Noisy Partial State Observations and Incomplete Trajectories

    Authors: Jingqi Li, Chih-Yuan Chiu, Lasse Peters, Somayeh Sojoudi, Claire Tomlin, David Fridovich-Keil

    Abstract: In multi-agent dynamic games, the Nash equilibrium state trajectory of each agent is determined by its cost function and the information pattern of the game. However, the cost and trajectory of each agent may be unavailable to the other agents. Prior work on using partial observations to infer the costs in dynamic games assumes an open-loop information pattern. In this work, we demonstrate that th…

    Submitted 3 January, 2023; originally announced January 2023.

    Comments: Accepted by AAMAS 2023. This is a preprint version

  29. arXiv:2211.16596  [pdf, other]

    stat.ML cs.LG eess.SY

    Towards Dynamic Causal Discovery with Rare Events: A Nonparametric Conditional Independence Test

    Authors: Chih-Yuan Chiu, Kshitij Kulkarni, Shankar Sastry

    Abstract: Causal phenomena associated with rare events occur across a wide range of engineering problems, such as risk-sensitive safety analysis, accident analysis and prevention, and extreme value theory. However, current methods for causal discovery are often unable to uncover causal links between random variables in a dynamic setting that manifest only when the variables first experience low-probabilit…

    Submitted 17 July, 2023; v1 submitted 29 November, 2022; originally announced November 2022.

  30. arXiv:2211.00115  [pdf, other]

    cs.CL cs.SD eess.AS

    Textless Direct Speech-to-Speech Translation with Discrete Speech Representation

    Authors: Xinjian Li, Ye Jia, Chung-Cheng Chiu

    Abstract: Research on speech-to-speech translation (S2ST) has progressed rapidly in recent years. Many end-to-end systems have been proposed and show advantages over conventional cascade systems, which are often composed of recognition, translation and synthesis sub-systems. However, most of the end-to-end systems still rely on intermediate textual supervision during training, which makes it infeasible to w…

    Submitted 31 October, 2022; originally announced November 2022.

  31. An Analysis Method for Metric-Level Switching in Beat Tracking

    Authors: Ching-Yu Chiu, Meinard Müller, Matthew E. P. Davies, Alvin Wen-Yu Su, Yi-Hsuan Yang

    Abstract: For expressive music, the tempo may change over time, posing challenges to tracking the beats by an automatic model. The model may first tap to the correct tempo, but then may fail to adapt to a tempo change, or switch between several incorrect but perceptually plausible ones (e.g., half- or double-tempo). Existing evaluation metrics for beat tracking do not reflect such behaviors, as they typical…

    Submitted 13 October, 2022; originally announced October 2022.

    Comments: Accepted to IEEE Signal Processing Letters (Oct. 2022)

  32. arXiv:2210.06007  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    JukeDrummer: Conditional Beat-aware Audio-domain Drum Accompaniment Generation via Transformer VQ-VAE

    Authors: Yueh-Kao Wu, Ching-Yu Chiu, Yi-Hsuan Yang

    Abstract: This paper proposes a model that generates a drum track in the audio domain to play along to a user-provided drum-free recording. Specifically, using paired data of drumless tracks and the corresponding human-made drum tracks, we train a Transformer model to improvise the drum part of an unseen drumless recording. We combine two approaches to encode the input audio. First, we train a vector-quanti…

    Submitted 31 October, 2022; v1 submitted 12 October, 2022; originally announced October 2022.

    Comments: Accepted at ISMIR 2022

  33. arXiv:2209.01604  [pdf, other]

    cs.CV cs.LG

    Representative Image Feature Extraction via Contrastive Learning Pretraining for Chest X-ray Report Generation

    Authors: Yu-Jen Chen, Wei-Hsiang Shen, Hao-Wei Chung, Ching-Hao Chiu, Da-Cheng Juan, Tsung-Ying Ho, Chi-Tung Cheng, Meng-Lin Li, Tsung-Yi Ho

    Abstract: Medical report generation is a challenging task since it is time-consuming and requires expertise from experienced radiologists. The goal of medical report generation is to accurately capture and describe the image findings. Previous works pretrain their visual encoding neural networks with large datasets in different domains, which cannot learn general visual representation in the specific medica…

    Submitted 7 January, 2023; v1 submitted 4 September, 2022; originally announced September 2022.

  34. arXiv:2207.05043  [pdf, other]

    cs.RO eess.SY

    SLAM Backends with Objects in Motion: A Unifying Framework and Tutorial

    Authors: Chih-Yuan Chiu

    Abstract: Simultaneous Localization and Mapping (SLAM) algorithms are frequently deployed to support a wide range of robotics applications, such as autonomous navigation in unknown environments, and scene mapping in virtual reality. Many of these applications require autonomous agents to perform SLAM in highly dynamic scenes. To this end, this tutorial extends a recently introduced, unifying optimization-ba…

    Submitted 27 February, 2023; v1 submitted 11 July, 2022; originally announced July 2022.

  35. arXiv:2206.09120  [pdf, other]

    stat.ML cs.LG

    Pursuit of a Discriminative Representation for Multiple Subspaces via Sequential Games

    Authors: Druv Pai, Michael Psenka, Chih-Yuan Chiu, Manxi Wu, Edgar Dobriban, Yi Ma

    Abstract: We consider the problem of learning discriminative representations for data in a high-dimensional space with distribution supported on or around multiple low-dimensional linear subspaces. That is, we wish to compute a linear injective map of the data such that the features lie on multiple orthogonal subspaces. Instead of treating this learning problem using multiple PCAs, we cast it as a sequentia…

    Submitted 5 October, 2022; v1 submitted 18 June, 2022; originally announced June 2022.

    Comments: main body is 16 pages and has 5 figures; appendix is 17 pages and has 6 figures

  36. arXiv:2205.12501  [pdf, ps, other]

    eess.SP cs.IT

    Using Loaded N-port Structures to Achieve the Continuous-Space Electromagnetic Channel Capacity Bound

    Authors: Zixiang Han, Shanpu Shen, Yujie Zhang, Shiwen Tang, Chi-Yuk Chiu, Ross Murch

    Abstract: A method for achieving the continuous-space electromagnetic channel capacity bound using loaded N-port structures is described. It is relevant for the design of compact multiple-input multiple-output (MIMO) antennas that can achieve channel capacity bounds when constrained by size. The method is not restricted to a specific antenna configuration and a closed-form expression for the channel capacit…

    Submitted 25 May, 2022; originally announced May 2022.

  37. arXiv:2205.08014  [pdf, ps, other]

    eess.AS cs.SD

    Accented Speech Recognition: Benchmarking, Pre-training, and Diverse Data

    Authors: Alëna Aksënova, Zhehuai Chen, Chung-Cheng Chiu, Daan van Esch, Pavel Golik, Wei Han, Levi King, Bhuvana Ramabhadran, Andrew Rosenberg, Suzan Schwartz, Gary Wang

    Abstract: Building inclusive speech recognition systems is a crucial step towards developing technologies that speakers of all language varieties can use. Therefore, ASR systems must work for everybody independently of the way they speak. To accomplish this goal, there should be available data sets representing language varieties, and also an understanding of model configuration that is the most helpful in…

    Submitted 16 May, 2022; originally announced May 2022.

    Comments: 5 pages, 3 tables

  38. arXiv:2203.16690  [pdf, other]

    cs.RO eess.SY

    GTP-SLAM: Game-Theoretic Priors for Simultaneous Localization and Mapping in Multi-Agent Scenarios

    Authors: Chih-Yuan Chiu, David Fridovich-Keil

    Abstract: Robots operating in multi-player settings must simultaneously model the environment and the behavior of human or robotic agents who share that environment. This modeling is often approached using Simultaneous Localization and Mapping (SLAM); however, SLAM algorithms usually neglect multi-player interactions. In contrast, the motion planning literature often uses dynamic game theory to explicitly m…

    Submitted 8 August, 2022; v1 submitted 30 March, 2022; originally announced March 2022.

    Comments: 6 pages, 3 figures

  39. Mental Health Pandemic during the COVID-19 Outbreak: Social Media as a Window to Public Mental Health

    Authors: Michelle Bak, Chungyi Chiu, Jessie Chin

    Abstract: Intensified preventive measures during the COVID-19 pandemic, such as lockdown and social distancing, heavily increased the perception of social isolation (i.e., a discrepancy between one's social needs and the provisions of the social environment) among young adults. Social isolation is closely associated with situational loneliness (i.e., loneliness emerging from environmental change), a risk fa…

    Submitted 25 April, 2023; v1 submitted 1 March, 2022; originally announced March 2022.

  40. arXiv:2202.05267  [pdf, other]

    physics.med-ph cs.CV eess.IV

    On Real-time Image Reconstruction with Neural Networks for MRI-guided Radiotherapy

    Authors: David E. J. Waddington, Nicholas Hindley, Neha Koonjoo, Christopher Chiu, Tess Reynolds, Paul Z. Y. Liu, Bo Zhu, Danyal Bhutto, Chiara Paganelli, Paul J. Keall, Matthew S. Rosen

    Abstract: MRI-guidance techniques that dynamically adapt radiation beams to follow tumor motion in real time will lead to more accurate cancer treatments and reduced collateral healthy tissue damage. The gold standard for reconstruction of undersampled MR data is compressed sensing (CS), which is computationally slow and limits the rate at which images can be available for real-time adaptation. Here, we demonstr…

    Submitted 18 May, 2022; v1 submitted 9 February, 2022; originally announced February 2022.

    Comments: 12 pages, 6 figures, 1 table. v2 has a typo in eqn 1 corrected and references added to the discussion

  41. arXiv:2202.01855  [pdf, other]

    cs.CL cs.SD eess.AS

    Self-supervised Learning with Random-projection Quantizer for Speech Recognition

    Authors: Chung-Cheng Chiu, James Qin, Yu Zhang, Jiahui Yu, Yonghui Wu

    Abstract: We present a simple and effective self-supervised learning approach for speech recognition. The approach learns a model to predict the masked speech signals, in the form of discrete labels generated with a random-projection quantizer. In particular, the quantizer projects speech inputs with a randomly initialized matrix, and does a nearest-neighbor lookup in a randomly initialized codebook. Neither…

    Submitted 29 June, 2022; v1 submitted 3 February, 2022; originally announced February 2022.

    Comments: ICML 2022

  42. arXiv:2202.00717  [pdf, ps, other]

    cs.DC

    Pipeflow: An Efficient Task-Parallel Pipeline Programming Framework using Modern C++

    Authors: Cheng-Hsiang Chiu, Tsung-Wei Huang, Zizheng Guo, Yibo Lin

    Abstract: Pipeline is a fundamental parallel programming pattern. Mainstream pipeline programming frameworks count on data abstractions to perform pipeline scheduling. This design is convenient for data-centric pipeline applications but inefficient for algorithms that only exploit task parallelism in pipeline. As a result, we introduce a new task-parallel pipeline programming framework called Pipeflow. Pipe…

    Submitted 1 February, 2022; originally announced February 2022.

  43. arXiv:2111.00127  [pdf, other]

    eess.AS cs.SD

    Cross-attention conformer for context modeling in speech enhancement for ASR

    Authors: Arun Narayanan, Chung-Cheng Chiu, Tom O'Malley, Quan Wang, Yanzhang He

    Abstract: This work introduces the cross-attention conformer, an attention-based architecture for context modeling in speech enhancement. Given that the context information can often be sequential, and of a different length than the audio that is to be enhanced, we make use of cross-attention to summarize and merge contextual information with input features. Building upon the recently proposed conformer mode…

    Submitted 29 October, 2021; originally announced November 2021.

    Comments: Will appear in IEEE-ASRU 2021

  44. arXiv:2109.13226  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised Learning for Automatic Speech Recognition

    Authors: Yu Zhang, Daniel S. Park, Wei Han, James Qin, Anmol Gulati, Joel Shor, Aren Jansen, Yuanzhong Xu, Yanping Huang, Shibo Wang, Zongwei Zhou, Bo Li, Min Ma, William Chan, Jiahui Yu, Yongqiang Wang, Liangliang Cao, Khe Chai Sim, Bhuvana Ramabhadran, Tara N. Sainath, Françoise Beaufays, Zhifeng Chen, Quoc V. Le, Chung-Cheng Chiu, Ruoming Pang, et al. (1 additional author not shown)

    Abstract: We summarize the results of a host of efforts using giant automatic speech recognition (ASR) models pre-trained using large, diverse unlabeled datasets containing approximately a million hours of audio. We find that the combination of pre-training, self-training and scaling up model size greatly increases data efficiency, even for extremely large tasks with tens of thousands of hours of labeled da…

    Submitted 21 July, 2022; v1 submitted 27 September, 2021; originally announced September 2021.

    Comments: 14 pages, 7 figures, 13 tables; v2: minor corrections, reference baselines and bibliography updated; v3: corrections based on reviewer feedback, bibliography updated

  45. Two-Stage Mesh Deep Learning for Automated Tooth Segmentation and Landmark Localization on 3D Intraoral Scans

    Authors: Tai-Hsien Wu, Chunfeng Lian, Sanghee Lee, Matthew Pastewait, Christian Piers, Jie Liu, Fang Wang, Li Wang, Chiung-Ying Chiu, Wenchi Wang, Christina Jackson, Wei-Lun Chao, Dinggang Shen, Ching-Chang Ko

    Abstract: Accurately segmenting teeth and identifying the corresponding anatomical landmarks on dental mesh models are essential in computer-aided orthodontic treatment. Manually performing these two tasks is time-consuming, tedious, and, more importantly, highly dependent on orthodontists' experiences due to the abnormality and large-scale variance of patients' teeth. Some machine learning-based methods ha…

    Submitted 2 June, 2022; v1 submitted 24 September, 2021; originally announced September 2021.

    Comments: 9 pages, 8 figures, accepted by IEEE TMI

  46. arXiv:2108.06209  [pdf, other]

    cs.LG cs.SD eess.AS

    W2v-BERT: Combining Contrastive Learning and Masked Language Modeling for Self-Supervised Speech Pre-Training

    Authors: Yu-An Chung, Yu Zhang, Wei Han, Chung-Cheng Chiu, James Qin, Ruoming Pang, Yonghui Wu

    Abstract: Motivated by the success of masked language modeling (MLM) in pre-training natural language processing models, we propose w2v-BERT that explores MLM for self-supervised speech representation learning. w2v-BERT is a framework that combines contrastive learning and MLM, where the former trains the model to discretize input continuous speech signals into a finite set of discriminative speech tokens,…

    Submitted 13 September, 2021; v1 submitted 7 August, 2021; originally announced August 2021.

  47. arXiv:2107.00998  [pdf]

    cs.NI cs.LG cs.PF

    Gamers Private Network Performance Forecasting. From Raw Data to the Data Warehouse with Machine Learning and Neural Nets

    Authors: Albert Wong, Chun Yin Chiu, Gaétan Hains, Jack Humphrey, Hans Fuhrmann, Youry Khmelevsky, Chris Mazur

    Abstract: Gamers Private Network (GPN) is a client/server technology that guarantees a connection for online video games that is more reliable and has lower latency than a standard internet connection. Users of the GPN technology benefit from a stable and high-quality gaming experience for online games, which are hosted and played across the world. After transforming a massive volume of raw networking data coll…

    Submitted 25 May, 2021; originally announced July 2021.

    Comments: 8 pages, 12 figures

  48. arXiv:2106.09082  [pdf, other]

    math.OC cs.LG

    Zeroth-Order Methods for Convex-Concave Minmax Problems: Applications to Decision-Dependent Risk Minimization

    Authors: Chinmay Maheshwari, Chih-Yuan Chiu, Eric Mazumdar, S. Shankar Sastry, Lillian J. Ratliff

    Abstract: Min-max optimization is emerging as a key framework for analyzing problems of robustness to strategically and adversarially generated data. We propose a random reshuffling-based gradient-free Optimistic Gradient Descent-Ascent algorithm for solving convex-concave min-max problems with finite sum structure. We prove that the algorithm enjoys the same convergence rate as that of zeroth-order algor…

    Submitted 19 February, 2022; v1 submitted 16 June, 2021; originally announced June 2021.

    Comments: 38 pages, 6 figures

  49. arXiv:2106.08703  [pdf, other]

    cs.SD cs.LG eess.AS

    Source Separation-based Data Augmentation for Improved Joint Beat and Downbeat Tracking

    Authors: Ching-Yu Chiu, Joann Ching, Wen-Yi Hsiao, Yu-Hua Chen, Alvin Wen-Yu Su, Yi-Hsuan Yang

    Abstract: Due to advances in deep learning, the performance of automatic beat and downbeat tracking in musical audio signals has seen great improvement in recent years. In training such deep learning-based models, data augmentation has been found to be an important technique. However, existing data augmentation methods for this task mainly target balancing the distribution of the training data with respect to…

    Submitted 16 June, 2021; originally announced June 2021.

    Comments: Accepted to European Signal Processing Conference (EUSIPCO 2021)

  50. arXiv:2106.08685  [pdf, other]

    cs.SD cs.LG eess.AS

    Drum-Aware Ensemble Architecture for Improved Joint Musical Beat and Downbeat Tracking

    Authors: Ching-Yu Chiu, Alvin Wen-Yu Su, Yi-Hsuan Yang

    Abstract: This paper presents a novel system architecture that integrates blind source separation with joint beat and downbeat tracking in musical audio signals. The source separation module segregates the percussive and non-percussive components of the input signal, over which beat and downbeat tracking are performed separately and then the results are aggregated with a learnable fusion mechanism. This way…

    Submitted 16 June, 2021; originally announced June 2021.

    Comments: Accepted to IEEE Signal Processing Letters (May 2021)