Search | arXiv e-print repository

Periodic minimum in the count of binomial coefficients not divisible by a prime

Authors: Hsien-Kuei Hwang, Svante Janson, Tsung-Hsi Tsai

Abstract: The summatory function of the number of binomial coefficients not divisible by a prime is known to exhibit regular periodic oscillations, yet identifying the less regularly behaved minimum of the underlying periodic functions has been open for almost all cases. We propose an approach to identify such minimum in some generality, solving particularly a previous conjecture of B. Wilson [Asymptotic behavior of Pascal's triangle modulo a prime, Acta Arith. 83 (1998), pp. 105-116].

Submitted 13 August, 2024; originally announced August 2024.

MSC Class: 05A10; 11B65; 11B37; 39A23; 65Q30
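
Illustrative note (not part of the listing): the quantity this abstract refers to can be computed directly. The sketch below assumes the standard consequence of Lucas's theorem, namely that the number of k in 0..n with C(n, k) not divisible by a prime p equals the product of (d + 1) over the base-p digits d of n. The function names are hypothetical and the code is not taken from the paper.

```python
def base_p_digits(n, p):
    """Return the base-p digits of n, least significant first."""
    digits = []
    while n > 0:
        digits.append(n % p)
        n //= p
    return digits


def count_nondivisible_in_row(n, p):
    """Count k in 0..n with C(n, k) not divisible by the prime p.

    Uses the product formula that follows from Lucas's theorem:
    the count is the product of (d + 1) over the base-p digits d of n.
    """
    count = 1
    for d in base_p_digits(n, p):
        count *= d + 1
    return count


def summatory(n, p):
    """Sum the per-row counts over rows 0, 1, ..., n - 1."""
    return sum(count_nondivisible_in_row(m, p) for m in range(n))


if __name__ == "__main__":
    # Print the summatory function at powers of 2 for p = 2.
    for k in range(5):
        print(2 ** k, summatory(2 ** k, 2))
```

At powers of p the sum has a closed form (for p = 2, the first 2^k rows contain 3^k odd entries), which is the regular backbone around which the periodic oscillations mentioned in the abstract occur.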

arXiv:2403.12024 [pdf, other]

Enhancing Taiwanese Hokkien Dual Translation by Exploring and Standardizing of Four Writing Systems

Authors: Bo-Han Lu, Yi-Hsuan Lin, En-Shiun Annie Lee, Richard Tzong-Han Tsai

Abstract: Machine translation focuses mainly on high-resource languages (HRLs), while low-resource languages (LRLs) like Taiwanese Hokkien are relatively under-explored. The study aims to address this gap by developing a dual translation model between Taiwanese Hokkien and both Traditional Mandarin Chinese and English. We employ a pre-trained LLaMA 2-7B model specialized in Traditional Mandarin Chinese to leverage the orthographic similarities between Taiwanese Hokkien Han and Traditional Mandarin Chinese. Our comprehensive experiments involve translation tasks across various writing systems of Taiwanese Hokkien as well as between Taiwanese Hokkien and other HRLs. We find that the use of a limited monolingual corpus still further improves the model's Taiwanese Hokkien capabilities. We then utilize our translation model to standardize all Taiwanese Hokkien writing systems into Hokkien Han, resulting in further performance improvements. Additionally, we introduce an evaluation method incorporating back-translation and GPT-4 to ensure reliable translation quality assessment even for LRLs. The study contributes to narrowing the resource gap for Taiwanese Hokkien and empirically investigates the advantages and limitations of pre-training and fine-tuning based on LLaMA 2.

Submitted 14 May, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

Comments: Accepted by LREC-COLING 2024 as a long oral paper

arXiv:2403.05530 [pdf, other]

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1110 additional authors not shown)

Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.

Submitted 8 August, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

arXiv:2402.01685 [pdf, other]

SMUTF: Schema Matching Using Generative Tags and Hybrid Features

Authors: Yu Zhang, Mei Di, Haozheng Luo, Chenwei Xu, Richard Tzong-Han Tsai

Abstract: We introduce SMUTF, a unique approach for large-scale tabular data schema matching (SM), which assumes that supervised learning does not affect performance in open-domain tasks, thereby enabling effective cross-domain matching. This system uniquely combines rule-based feature engineering, pre-trained language models, and generative large language models. In an innovative adaptation inspired by the Humanitarian Exchange Language, we deploy 'generative tags' for each data column, enhancing the effectiveness of SM. SMUTF exhibits extensive versatility, working seamlessly with any pre-existing pre-trained embeddings, classification methods, and generative models. Recognizing the lack of extensive, publicly available datasets for SM, we have created and open-sourced the HDXSM dataset from the public humanitarian data. We believe this to be the most exhaustive SM dataset currently available. In evaluations across various public datasets and the novel HDXSM dataset, SMUTF demonstrated exceptional performance, surpassing existing state-of-the-art models in terms of accuracy and efficiency, and improving the F1 score by 11.84% and the AUC of ROC by 5.08%.

Submitted 6 February, 2024; v1 submitted 22 January, 2024; originally announced February 2024.

arXiv:2401.16803 [pdf, other]

PBSCR: The Piano Bootleg Score Composer Recognition Dataset

Authors: Arhan Jain, Alec Bunn, Austin Pham, TJ Tsai

Abstract: This article motivates, describes, and presents the PBSCR dataset for studying composer recognition of classical piano music. Our goal was to design a dataset that facilitates large-scale research on composer recognition that is suitable for modern architectures and training practices. To achieve this goal, we utilize the abundance of sheet music images and rich metadata on IMSLP, use a previously proposed feature representation called a bootleg score to encode the location of noteheads relative to staff lines, and present the data in an extremely simple format (2D binary images) to encourage rapid exploration and iteration. The dataset itself contains 40,000 62x64 bootleg score images for a 9-class recognition task, 100,000 62x64 bootleg score images for a 100-class recognition task, and 29,310 unlabeled variable-length bootleg score images for pretraining. The labeled data is presented in a form that mirrors MNIST images, in order to make it extremely easy to visualize, manipulate, and train models in an efficient manner. We include relevant information to connect each bootleg score image with its underlying raw sheet music image, and we scrape, organize, and compile metadata from IMSLP on all piano works to facilitate multimodal research and allow for convenient linking to other datasets. We release baseline results in a supervised and low-shot setting for future works to compare against, and we discuss open research questions that the PBSCR data is especially well suited to facilitate research on.

Submitted 5 August, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

Comments: 19 pages, 6 figures, to be published in Transactions of the International Society for Music Information Retrieval

arXiv:2401.15879 [pdf, other]

lil'HDoC: An Algorithm for Good Arm Identification under Small Threshold Gap

Authors: Tzu-Hsien Tsai, Yun-Da Tsai, Shou-De Lin

Abstract: Good arm identification (GAI) is a pure-exploration bandit problem in which a single learner outputs an arm as soon as it is identified as a good arm. A good arm is defined as an arm with an expected reward greater than or equal to a given threshold. This paper focuses on the GAI problem under a small threshold gap, which refers to the distance between the expected rewards of arms and the given threshold. We propose a new algorithm called lil'HDoC to significantly improve the total sample complexity of the HDoC algorithm. We demonstrate that the sample complexity of the first $λ$ output arm in lil'HDoC is bounded by the original HDoC algorithm, except for one negligible term, when the distance between the expected reward and threshold is small. Extensive experiments confirm that our algorithm outperforms the state-of-the-art algorithms in both synthetic and real-world datasets.

Submitted 12 March, 2024; v1 submitted 28 January, 2024; originally announced January 2024.

arXiv:2312.11805 [pdf, other]

Gemini: A Family of Highly Capable Multimodal Models

Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.

Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

arXiv:2311.05782 [pdf, other]

MPGemmFI: A Fault Injection Technique for Mixed Precision GEMM in ML Applications

Authors: Bo Fang, Xinyi Li, Harvey Dam, Cheng Tan, Siva Kumar Sastry Hari, Timothy Tsai, Ignacio Laguna, Dingwen Tao, Ganesh Gopalakrishnan, Prashant Nair, Kevin Barker, Ang Li

Abstract: Emerging deep learning workloads urgently need fast general matrix multiplication (GEMM). To meet such demand, one of the critical features of machine-learning-specific accelerators such as NVIDIA Tensor Cores, AMD Matrix Cores, and Google TPUs is the support of mixed-precision enabled GEMM. For DNN models, lower-precision FP data formats and computation offer acceptable correctness but significant performance, area, and memory footprint improvement. While promising, the mixed-precision computation on error resilience remains unexplored. To this end, we develop a fault injection framework that systematically injects fault into the mixed-precision computation results. We investigate how the faults affect the accuracy of machine learning applications. Based on the error resilience characteristics, we offer lightweight error detection and correction solutions that significantly improve the overall model accuracy if the models experience hardware faults. The solutions can be efficiently integrated into the accelerator's pipelines.

Submitted 9 November, 2023; originally announced November 2023.

arXiv:2310.04799 [pdf, other]

Chat Vector: A Simple Approach to Equip LLMs with Instruction Following and Model Alignment in New Languages

Authors: Shih-Cheng Huang, Pin-Zu Li, Yu-Chi Hsu, Kuang-Ming Chen, Yu Tung Lin, Shih-Kai Hsiao, Richard Tzong-Han Tsai, Hung-yi Lee

Abstract: Recently, the development of open-source large language models (LLMs) has advanced rapidly. Nevertheless, due to data constraints, the capabilities of most open-source LLMs are primarily focused on English. To address this issue, we introduce the concept of $\textit{chat vector}$ to equip pre-trained language models with instruction following and human value alignment via simple model arithmetic. The chat vector is derived by subtracting the weights of a pre-trained base model (e.g. LLaMA2) from those of its corresponding chat model (e.g. LLaMA2-chat). By simply adding the chat vector to a continual pre-trained model's weights, we can endow the model with chat capabilities in new languages without the need for further training. Our empirical studies demonstrate the superior efficacy of the chat vector from three different aspects: instruction following, toxicity mitigation, and multi-turn dialogue. Moreover, to showcase the adaptability of our approach, we extend our experiments to encompass various languages, base models, and chat vectors. The results underscore the chat vector's simplicity, effectiveness, and wide applicability, making it a compelling solution for efficiently enabling conversational capabilities in pre-trained language models. Our code is available at https://github.com/aqweteddy/ChatVector.

Submitted 7 June, 2024; v1 submitted 7 October, 2023; originally announced October 2023.

Comments: ACL 2024 camera-ready version
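
Illustrative note (not part of the listing): the weight arithmetic described in this abstract can be sketched on toy data. The example below assumes each model is a plain dictionary mapping parameter names to lists of floats; the function names and the toy values are hypothetical, and this is not the authors' released code.

```python
def chat_vector(base_weights, chat_weights):
    """Subtract base-model weights from chat-model weights, parameter by parameter."""
    return {
        name: [c - b for c, b in zip(chat_weights[name], base_weights[name])]
        for name in base_weights
    }


def apply_chat_vector(target_weights, vector):
    """Add the chat vector to a continually pre-trained model's weights."""
    return {
        name: [w + v for w, v in zip(target_weights[name], vector[name])]
        for name in target_weights
    }


if __name__ == "__main__":
    # Toy weights; real checkpoints would hold tensors with matching shapes.
    base = {"layer.weight": [1.0, 2.0]}
    chat = {"layer.weight": [1.5, 1.0]}
    continually_pretrained = {"layer.weight": [3.0, 4.0]}

    vector = chat_vector(base, chat)
    merged = apply_chat_vector(continually_pretrained, vector)
    print(merged)
```

The sketch requires all three models to share parameter names and shapes, which is why the abstract pairs a base model with its own chat variant.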

arXiv:2308.15118 [pdf, other]

Large Language Models on the Chessboard: A Study on ChatGPT's Formal Language Comprehension and Complex Reasoning Skills

Authors: Mu-Tien Kuo, Chih-Chung Hsueh, Richard Tzong-Han Tsai

Abstract: While large language models have made strides in natural language processing, their proficiency in complex reasoning tasks requiring formal language comprehension, such as chess, remains less investigated. This paper probes the performance of ChatGPT, a sophisticated language model by OpenAI in tackling such complex reasoning tasks, using chess as a case study. Through robust metrics examining both the legality and quality of moves, we assess ChatGPT's understanding of the chessboard, adherence to chess rules, and strategic decision-making abilities. Our evaluation identifies limitations within ChatGPT's attention mechanism that affect its formal language comprehension and uncovers the model's underdeveloped self-regulation abilities. Our study also reveals ChatGPT's propensity for a coherent strategy in its gameplay and a noticeable uptick in decision-making assertiveness when the model is presented with a greater volume of natural language or possesses a more lucid understanding of the state of the chessboard. These findings contribute to the growing exploration of language models' abilities beyond natural language processing, providing valuable information for future research towards models demonstrating human-like cognitive abilities.

Submitted 29 August, 2023; originally announced August 2023.

arXiv:2303.07154 [pdf, other]

Differential Good Arm Identification

Authors: Yun-Da Tsai, Tzu-Hsien Tsai, Shou-De Lin

Abstract: This paper targets a variant of the stochastic multi-armed bandit problem called good arm identification (GAI). GAI is a pure-exploration bandit problem with the goal to output as many good arms using as few samples as possible, where a good arm is defined as an arm whose expected reward is greater than a given threshold. In this work, we propose DGAI - a differentiable good arm identification algorithm to improve the sample complexity of the state-of-the-art HDoC algorithm in a data-driven fashion. We also showed that the DGAI can further boost the performance of a general multi-arm bandit (MAB) problem given a threshold as a prior knowledge to the arm set. Extensive experiments confirm that our algorithm outperforms the baseline algorithms significantly in both synthetic and real world datasets for both GAI and MAB tasks.

Submitted 15 February, 2024; v1 submitted 13 March, 2023; originally announced March 2023.

arXiv:2301.08937 [pdf, other]

Exploring Methods for Building Dialects-Mandarin Code-Mixing Corpora: A Case Study in Taiwanese Hokkien

Authors: Sin-En Lu, Bo-Han Lu, Chao-Yi Lu, Richard Tzong-Han Tsai

Abstract: In natural language processing (NLP), code-mixing (CM) is a challenging task, especially when the mixed languages include dialects. In Southeast Asian countries such as Singapore, Indonesia, and Malaysia, Hokkien-Mandarin is the most widespread code-mixed language pair among Chinese immigrants, and it is also common in Taiwan. However, dialects such as Hokkien often have a scarcity of resources and the lack of an official writing system, limiting the development of dialect CM research. In this paper, we propose a method to construct a Hokkien-Mandarin CM dataset to mitigate the limitation, overcome the morphological issue under the Sino-Tibetan language family, and offer an efficient Hokkien word segmentation method through a linguistics-based toolkit. Furthermore, we use our proposed dataset and employ transfer learning to train the XLM (cross-lingual language model) for translation tasks. To fit the code-mixing scenario, we adapt XLM slightly. We found that by using linguistic knowledge, rules, and language tags, the model produces good results on CM data translation while maintaining monolingual translation quality.

Submitted 21 January, 2023; originally announced January 2023.

Comments: The paper was accepted by EMNLP 2022 findings

arXiv:2210.10968 [pdf, other]

Identities and periodic oscillations of divide-and-conquer recurrences splitting at half

Authors: Hsien-Kuei Hwang, Svante Janson, Tsung-Hsi Tsai

Abstract: We study divide-and-conquer recurrences of the form \begin{equation*} f(n) = αf(\lfloor \tfrac n2\rfloor) + βf(\lceil \tfrac n2\rceil) + g(n) \qquad(n\ge2), \end{equation*} with $g(n)$ and $f(1)$ given, where $α,β\ge0$ with $α+β>0$; such recurrences appear often in analysis of computer algorithms, numeration systems, combinatorial sequences, and related areas. We show that the solution satisfies always the simple \emph{identity} \begin{equation*} f(n) = n^{\log_2(α+β)} P(\log_2n) - Q(n) \end{equation*} under an optimum (iff) condition on $g(n)$. This form is not only an identity but also an asymptotic expansion because $Q(n)$ is of a smaller order. Explicit forms for the \emph{continuity} of the periodic function $P$ are provided, together with a few other smoothness properties. We show how our results can be easily applied to many dozens of concrete examples collected from the literature, and how they can be extended in various directions. Our method of proof is surprisingly simple and elementary, but leads to the strongest types of results for all examples to which our theory applies.

Submitted 19 October, 2022; originally announced October 2022.

Comments: 69 pages, 13 figures, 13 tables

MSC Class: 68Q25; 39B12; 11B37; 11B83; 05A15; 05A16; 42A16 ACM Class: F.2.2; G.2.1; G.2.3
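
Illustrative note (not part of the listing): the recurrence stated in this abstract can be evaluated numerically with a few lines of memoized recursion. The helper name make_solver and the sample choice α = β = 1, g(n) = n, f(1) = 0 are assumptions made for illustration only; they are not taken from the paper.

```python
from functools import lru_cache


def make_solver(alpha, beta, g, f1):
    """Build f with f(1) = f1 and, for n >= 2,
    f(n) = alpha * f(floor(n/2)) + beta * f(ceil(n/2)) + g(n).
    """

    @lru_cache(maxsize=None)
    def f(n):
        if n == 1:
            return f1
        # For positive integers, ceil(n/2) equals (n + 1) // 2.
        return alpha * f(n // 2) + beta * f((n + 1) // 2) + g(n)

    return f


if __name__ == "__main__":
    # Assumed sample instance: alpha = beta = 1, g(n) = n, f(1) = 0.
    f = make_solver(1, 1, lambda n: n, 0)
    for n in (1, 2, 3, 4, 8):
        print(n, f(n))
```

Memoization keeps the cost low because each call only reaches values of the form floor(n / 2^j) or ceil(n / 2^j), so the recursion depth grows like log2(n).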

arXiv:2206.07860 [pdf, other]

doi 10.1109/LSP.2022.3184636

EPG2S: Speech Generation and Speech Enhancement based on Electropalatography and Audio Signals using Multimodal Learning

Authors: Li-Chin Chen, Po-Hsun Chen, Richard Tzong-Han Tsai, Yu Tsao

Abstract: Speech generation and enhancement based on articulatory movements facilitate communication when the scope of verbal communication is absent, e.g., in patients who have lost the ability to speak. Although various techniques have been proposed to this end, electropalatography (EPG), which is a monitoring technique that records contact between the tongue and hard palate during speech, has not been adequately explored. Herein, we propose a novel multimodal EPG-to-speech (EPG2S) system that utilizes EPG and speech signals for speech generation and enhancement. Different fusion strategies based on multiple combinations of EPG and noisy speech signals are examined, and the viability of the proposed method is investigated. Experimental results indicate that EPG2S achieves desirable speech generation outcomes based solely on EPG signals. Further, the addition of noisy speech signals is observed to improve quality and intelligibility. Additionally, EPG2S is observed to achieve high-quality speech enhancement based solely on audio signals, with the addition of EPG signals further improving the performance. The late fusion strategy is deemed to be the most effective approach for simultaneous speech generation and enhancement.

Submitted 15 June, 2022; originally announced June 2022.

Comments: Accepted by IEEE Signal Processing Letters

Journal ref: IEEE Signal Processing Letters, vol. 29, p. 2582-2586, 2022

arXiv:2205.03347 [pdf, other]

Zhuyi: Perception Processing Rate Estimation for Safety in Autonomous Vehicles

Authors: Yu-Shun Hsiao, Siva Kumar Sastry Hari, Michał Filipiuk, Timothy Tsai, Michael B. Sullivan, Vijay Janapa Reddi, Vasu Singh, Stephen W. Keckler

Abstract: The processing requirement of autonomous vehicles (AVs) for high-accuracy perception in complex scenarios can exceed the resources offered by the in-vehicle computer, degrading safety and comfort. This paper proposes a sensor frame processing rate (FPR) estimation model, Zhuyi, that quantifies the minimum safe FPR continuously in a driving scenario. Zhuyi can be employed post-deployment as an online safety check and to prioritize work. Experiments conducted using a multi-camera state-of-the-art industry AV system show that Zhuyi's estimated FPRs are conservative, yet the system can maintain safety by processing only 36% or fewer frames compared to a default 30-FPR system in the tested scenarios.

Submitted 6 May, 2022; originally announced May 2022.

Comments: 2022 Design Automation Conference (DAC), July 10-14, 2022, San Francisco

arXiv:2203.07474 [pdf, other]

Distributed On-Sensor Compute System for AR/VR Devices: A Semi-Analytical Simulation Framework for Power Estimation

Authors: Jorge Gomez, Saavan Patel, Syed Shakib Sarwar, Ziyun Li, Raffaele Capoccia, Zhao Wang, Reid Pinkham, Andrew Berkovich, Tsung-Hsun Tsai, Barbara De Salvo, Chiao Liu

Abstract: Augmented Reality/Virtual Reality (AR/VR) glasses are widely foreseen as the next generation computing platform. AR/VR glasses are a complex "system of systems" which must satisfy stringent form factor, computing-, power- and thermal- requirements. In this paper, we will show that a novel distributed on-sensor compute architecture, coupled with new semiconductor technologies (such as dense 3D-IC interconnects and Spin-Transfer Torque Magneto Random Access Memory, STT-MRAM) and, most importantly, a full hardware-software co-optimization are the solutions to achieve attractive and socially acceptable AR/VR glasses. To this end, we developed a semi-analytical simulation framework to estimate the power consumption of novel AR/VR distributed on-sensor computing architectures. The model allows the optimization of the main technological features of the system modules, as well as the computer-vision algorithm partition strategy across the distributed compute architecture. We show that, in the case of the compute-intensive machine learning based Hand Tracking algorithm, the distributed on-sensor compute architecture can reduce the system power consumption compared to a centralized system, with the additional benefits in terms of latency and privacy.

Submitted 14 March, 2022; originally announced March 2022.

Comments: 6 pages, 5 figures, TinyML Research Symposium

arXiv:2105.01899 [pdf, other]

MiCE: Mixture of Contrastive Experts for Unsupervised Image Clustering

Authors: Tsung Wei Tsai, Chongxuan Li, Jun Zhu

Abstract: We present Mixture of Contrastive Experts (MiCE), a unified probabilistic clustering framework that simultaneously exploits the discriminative representations learned by contrastive learning and the semantic structures captured by a latent mixture model. Motivated by the mixture of experts, MiCE employs a gating function to partition an unlabeled dataset into subsets according to the latent semantics and multiple experts to discriminate distinct subsets of instances assigned to them in a contrastive learning manner. To solve the nontrivial inference and learning problems caused by the latent variables, we further develop a scalable variant of the Expectation-Maximization (EM) algorithm for MiCE and provide proof of the convergence. Empirically, we evaluate the clustering performance of MiCE on four widely adopted natural image datasets. MiCE achieves significantly better results than various previous methods and a strong contrastive learning baseline.

Submitted 5 May, 2021; originally announced May 2021.

Comments: International Conference on Learning Representations (ICLR) 2021

arXiv:2103.07403 [pdf, other]

Generating and Characterizing Scenarios for Safety Testing of Autonomous Vehicles

Authors: Zahra Ghodsi, Siva Kumar Sastry Hari, Iuri Frosio, Timothy Tsai, Alejandro Troccoli, Stephen W. Keckler, Siddharth Garg, Anima Anandkumar

Abstract: Extracting interesting scenarios from real-world data as well as generating failure cases is important for the development and testing of autonomous systems. We propose efficient mechanisms to both characterize and generate testing scenarios using a state-of-the-art driving simulator. For any scenario, our method generates a set of possible driving paths and identifies all the possible safe driving trajectories that can be taken starting at different times, to compute metrics that quantify the complexity of the scenario. We use our method to characterize real driving data from the Next Generation Simulation (NGSIM) project, as well as adversarial scenarios generated in simulation. We rank the scenarios by defining metrics based on the complexity of avoiding accidents and provide insights into how the AV could have minimized the probability of incurring an accident. We demonstrate a strong correlation between the proposed metrics and human intuition.

Submitted 12 March, 2021; originally announced March 2021.

arXiv:2102.08235 [pdf, other]

doi 10.1145/3411764.3445200

Significant Otter: Understanding the Role of Biosignals in Communication

Authors: Fannie Liu, Chunjong Park, Yu Jiang Tham, Tsung-Yu Tsai, Laura Dabbish, Geoff Kaufman, Andrés Monroy-Hernández

Abstract: With the growing ubiquity of wearable devices, sensed physiological responses provide new means to connect with others. While recent research demonstrates the expressive potential for biosignals, the value of sharing these personal data remains unclear. To understand their role in communication, we created Significant Otter, an Apple Watch/iPhone app that enables romantic partners to share and respond to each other's biosignals in the form of animated otter avatars. In a one-month study with 20 couples, participants used Significant Otter with biosignals sensing OFF and ON. We found that while sensing OFF enabled couples to keep in touch, sensing ON enabled easier and more authentic communication that fostered social connection. However, the addition of biosignals introduced concerns about autonomy and agency over the messages they sent. We discuss design implications and future directions for communication systems that recommend messages based on biosignals.

Submitted 15 April, 2021; v1 submitted 16 February, 2021; originally announced February 2021.

Comments: CHI Conference on Human Factors in Computing Systems (CHI '21), May 8-13, 2021, Yokohama, Japan

arXiv:2010.12173 [pdf, other]

A Cross-Verification Approach for Protecting World Leaders from Fake and Tampered Audio

Authors: Mengyi Shan, TJ Tsai

Abstract: This paper tackles the problem of verifying the authenticity of speech recordings from world leaders. Whereas previous work on detecting deep fake or tampered audio focuses on scrutinizing an audio recording in isolation, we instead reframe the problem and focus on cross-verifying a questionable recording against trusted references. We present a method for cross-verifying a speech recording against a reference that consists of two steps: aligning the two recordings and then classifying each query frame as matching or non-matching. We propose a subsequence alignment method based on the Needleman-Wunsch algorithm and show that it significantly outperforms dynamic time warping in handling common tampering operations. We also explore several binary classification models based on LSTM and Transformer architectures to verify content at the frame level. Through extensive experiments on tampered speech recordings of Donald Trump, we show that our system can reliably detect audio tampering operations of different types and durations. Our best model achieves 99.7% accuracy for the alignment task at an error tolerance of 50 ms and a 0.43% equal error rate in classifying audio frames as matching or non-matching.

Submitted 23 October, 2020; originally announced October 2020.

Comments: 5 pages, 4 figures, 1 table

arXiv:2007.14587 [pdf, other]

Composer Style Classification of Piano Sheet Music Images Using Language Model Pretraining

Authors: TJ Tsai, Kevin Ji

Abstract: This paper studies composer style classification of piano sheet music images. Previous approaches to the composer classification task have been limited by a scarcity of data. We address this issue in two ways: (1) we recast the problem to be based on raw sheet music images rather than a symbolic music format, and (2) we propose an approach that can be trained on unlabeled data. Our approach first converts the sheet music image into a sequence of musical "words" based on the bootleg feature representation, and then feeds the sequence into a text classifier. We show that it is possible to significantly improve classifier performance by first training a language model on a set of unlabeled data, initializing the classifier with the pretrained language model weights, and then finetuning the classifier on a small amount of labeled data. We train AWD-LSTM, GPT-2, and RoBERTa language models on all piano sheet music images in IMSLP. We find that transformer-based architectures outperform CNN and LSTM models, and pretraining boosts classification accuracy for the GPT-2 model from 46% to 70% on a 9-way classification task. The trained model can also be used as a feature extractor that projects piano sheet music into a feature space that characterizes compositional style.

Submitted 29 July, 2020; originally announced July 2020.

Comments: 8 pages, 7 figures. Accepted paper at the International Society for Music Information Retrieval Conference (ISMIR) 2020

arXiv:2007.14580 [pdf, other]

Improved Handling of Repeats and Jumps in Audio-Sheet Image Synchronization

Authors: Mengyi Shan, TJ Tsai

Abstract: This paper studies the problem of automatically generating piano score following videos given an audio recording and raw sheet music images. Whereas previous works focus on synthetic sheet music where the data has been cleaned and preprocessed, we instead focus on developing a system that can cope with the messiness of raw, unprocessed sheet music PDFs from IMSLP. We investigate how well existing systems cope with real scanned sheet music, filler pages and unrelated pieces or movements, and discontinuities due to jumps and repeats. We find that a significant bottleneck in system performance is handling jumps and repeats correctly. In particular, we find that a previously proposed Jump DTW algorithm does not perform robustly when jump locations are unknown a priori. We propose a novel alignment algorithm called Hierarchical DTW that can handle jumps and repeats even when jump locations are not known. It first performs alignment at the feature level on each sheet music line, and then performs a second alignment at the segment level. By operating at the segment level, it is able to encode domain knowledge about how likely a particular jump is. Through carefully controlled experiments on unprocessed sheet music PDFs from IMSLP, we show that Hierarchical DTW significantly outperforms Jump DTW in handling various types of jumps.

Submitted 29 July, 2020; originally announced July 2020.

Comments: 8 pages, 5 figures. Accepted paper at the International Society for Music Information Retrieval Conference (ISMIR) 2020

arXiv:2007.14579 [pdf, other]

Camera-Based Piano Sheet Music Identification

Authors: Daniel Yang, TJ Tsai

Abstract: This paper presents a method for large-scale retrieval of piano sheet music images. Our work differs from previous studies on sheet music retrieval in two ways. First, we investigate the problem at a much larger scale than previous studies, using all solo piano sheet music images in the entire IMSLP dataset as a searchable database. Second, we use cell phone images of sheet music as our input queries, which lends itself to a practical, user-facing application. We show that a previously proposed fingerprinting method for sheet music retrieval is far too slow for a real-time application, and we diagnose its shortcomings. We propose a novel hashing scheme called dynamic n-gram fingerprinting that significantly reduces runtime while simultaneously boosting retrieval accuracy. In experiments on IMSLP data, our proposed method achieves a mean reciprocal rank of 0.85 and an average runtime of 0.98 seconds per query.

Submitted 28 July, 2020; originally announced July 2020.

Comments: 8 pages, 3 figures, 2 tables. Accepted paper at the International Society for Music Information Retrieval Conference (ISMIR) 2020

arXiv:2006.04984 [pdf, other]

Making Convolutions Resilient via Algorithm-Based Error Detection Techniques

Authors: Siva Kumar Sastry Hari, Michael B. Sullivan, Timothy Tsai, Stephen W. Keckler

Abstract: The ability of Convolutional Neural Networks (CNNs) to accurately process real-time telemetry has boosted their use in safety-critical and high-performance computing systems. As such systems require high levels of resilience to errors, CNNs must execute correctly in the presence of hardware faults. Full duplication provides the needed assurance but incurs a prohibitive 100% overhead. Algorithmic techniques are known to offer low-cost solutions, but the practical feasibility and performance of such techniques have never been studied for CNN deployment platforms (e.g., TensorFlow or TensorRT on GPUs). In this paper, we focus on algorithmically verifying Convolutions, which are the most resource-demanding operations in CNNs. We use checksums to verify convolutions, adding a small amount of redundancy, far less than full-duplication. We first identify the challenges that arise in employing Algorithm-Based Error Detection (ABED) for Convolutions in optimized inference platforms that fuse multiple network layers and use reduced-precision operations, and demonstrate how to overcome them. We propose and evaluate variations of ABED techniques that offer implementation complexity, runtime overhead, and coverage trade-offs. Results show that ABED can detect all transient hardware errors that might otherwise corrupt output and does so while incurring low runtime overheads (6-23%), offering at least 1.6X throughput to workloads compared to full duplication.

Submitted 8 June, 2020; originally announced June 2020.

arXiv:2005.01445 [pdf, other]

Estimating Silent Data Corruption Rates Using a Two-Level Model

Authors: Siva Kumar Sastry Hari, Paolo Rech, Timothy Tsai, Mark Stephenson, Arslan Zulfiqar, Michael Sullivan, Philip Shirvani, Paul Racunas, Joel Emer, Stephen W. Keckler

Abstract: High-performance and safety-critical system architects must accurately evaluate the application-level silent data corruption (SDC) rates of processors to soft errors. Such an evaluation requires error propagation all the way from particle strikes on low-level state up to the program output. Existing approaches that rely on low-level simulations with fault injection cannot evaluate full applications because of their slow speeds, while application-level accelerated fault testing in accelerated particle beams is often impractical. We present a new two-level methodology for application resilience evaluation that overcomes these challenges. The proposed approach decomposes application failure rate estimation into (1) identifying how particle strikes in low-level unprotected state manifest at the architecture-level, and (2) measuring how such architecture-level manifestations propagate to the program output. We demonstrate the effectiveness of this approach on GPU architectures. We also show that using just one of the two steps can overestimate SDC rates and produce different trends; the composition of the two is needed for accurate reliability modeling.

Submitted 27 April, 2020; originally announced May 2020.

arXiv:2004.13004 [pdf, other]

ML-driven Malware that Targets AV Safety

Authors: Saurabh Jha, Shengkun Cui, Subho S. Banerjee, Timothy Tsai, Zbigniew Kalbarczyk, Ravi Iyer

Abstract: Ensuring the safety of autonomous vehicles (AVs) is critical for their mass deployment and public adoption. However, security attacks that violate safety constraints and cause accidents are a significant deterrent to achieving public trust in AVs, and that hinders a vendor's ability to deploy AVs. Creating a security hazard that results in a severe safety compromise (for example, an accident) is compelling from an attacker's perspective. In this paper, we introduce an attack model, a method to deploy the attack in the form of smart malware, and an experimental evaluation of its impact on production-grade autonomous driving software. We find that determining the time interval during which to launch the attack is critically important for causing safety hazards (such as collisions) with a high degree of success. For example, the smart malware caused 33X more forced emergency braking than random attacks did, and accidents in 52.6% of the driving simulations.

Submitted 12 June, 2020; v1 submitted 24 April, 2020; originally announced April 2020.

Comments: Accepted for DSN 2020

Journal ref: 2020 50th Annual IEEE/IFIP International Conference on Dependable Systems and Networks

arXiv:2004.11724 [pdf, other]

doi 10.1109/TMM.2020.2973831

Using Cell Phone Pictures of Sheet Music To Retrieve MIDI Passages

Authors: TJ Tsai, Daniel Yang, Mengyi Shan, Thitaree Tanprasert, Teerapat Jenrungrot

Abstract: This article investigates a cross-modal retrieval problem in which a user would like to retrieve a passage of music from a MIDI file by taking a cell phone picture of several lines of sheet music. This problem is challenging for two reasons: it has a significant runtime constraint since it is a user-facing application, and there is very little relevant training data containing cell phone images of sheet music. To solve this problem, we introduce a novel feature representation called a bootleg score which encodes the position of noteheads relative to staff lines in sheet music. The MIDI representation can be converted into a bootleg score using deterministic rules of Western musical notation, and the sheet music image can be converted into a bootleg score using classical computer vision techniques for detecting simple geometrical shapes. Once the MIDI and cell phone image have been converted into bootleg scores, we can estimate the alignment using dynamic programming. The most notable characteristic of our system is that it has no trainable weights at all, only a set of about 40 hyperparameters. With a training set of just 400 images, we show that our system generalizes well to a much larger set of 1600 test images from 160 unseen musical scores. Our system achieves a test F measure score of 0.89, has an average runtime of 0.90 seconds, and outperforms baseline systems based on music object detection and sheet-audio alignment. We provide extensive experimental validation and analysis of our system.

Submitted 21 April, 2020; originally announced April 2020.

Comments: 13 pages, 8 figures, 3 tables. Accepted article in IEEE Transactions on Multimedia. arXiv admin note: text overlap with arXiv:2004.10347

arXiv:2004.10391 [pdf, other]

doi 10.1109/ICASSP40776.2020.9053815

Towards Linking the Lakh and IMSLP Datasets

Authors: TJ Tsai

Abstract: This paper investigates the problem of matching a MIDI file against a large database of piano sheet music images. Previous sheet-audio and sheet-MIDI alignment approaches have primarily focused on a 1-to-1 alignment task, which is not a scalable solution for retrieval from large databases. We propose a method for scalable cross-modal retrieval that might be used to link the Lakh MIDI dataset with IMSLP sheet music data. Our approach is to modify a previously proposed feature representation called a symbolic bootleg score to be suitable for hashing. On a database of 5,000 piano scores containing 55,000 individual sheet music images, our system achieves a mean reciprocal rank of 0.84 and an average retrieval time of 25.4 seconds.

Submitted 22 April, 2020; originally announced April 2020.

Comments: 5 pages, 4 figures, 1 table. Accepted paper at the International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2020

arXiv:2004.10347 [pdf, other]

MIDI Passage Retrieval Using Cell Phone Pictures of Sheet Music

Authors: Daniel Yang, Thitaree Tanprasert, Teerapat Jenrungrot, Mengyi Shan, TJ Tsai

Abstract: This paper investigates a cross-modal retrieval problem in which a user would like to retrieve a passage of music from a MIDI file by taking a cell phone picture of a physical page of sheet music. While audio-sheet music retrieval has been explored by a number of works, this scenario is novel in that the query is a cell phone picture rather than a digital scan. To solve this problem, we introduce a mid-level feature representation called a bootleg score which explicitly encodes the rules of Western musical notation. We convert both the MIDI and the sheet music into bootleg scores using deterministic rules of music and classical computer vision techniques for detecting simple geometric shapes. Once the MIDI and cell phone image have been converted into bootleg scores, we estimate the alignment using dynamic programming. The most notable characteristic of our system is that it does test-time adaptation and has no trainable weights at all, only a set of about 30 hyperparameters. On a dataset containing 1000 cell phone pictures taken of 100 scores of classical piano music, our system achieves an F measure score of 0.869 and outperforms baseline systems based on commercial optical music recognition software.

Submitted 21 April, 2020; originally announced April 2020.

Comments: 8 pages, 8 figures, 1 table. Accepted paper at the International Society for Music Information Retrieval Conference (ISMIR) 2019

arXiv:2004.10345 [pdf, other]

MIDI-Sheet Music Alignment Using Bootleg Score Synthesis

Authors: Thitaree Tanprasert, Teerapat Jenrungrot, Meinard Mueller, T. J. Tsai

Abstract: MIDI-sheet music alignment is the task of finding correspondences between a MIDI representation of a piece and its corresponding sheet music images. Rather than using optical music recognition to bridge the gap between sheet music and MIDI, we explore an alternative approach: projecting the MIDI data into pixel space and performing alignment in the image domain. Our method converts the MIDI data into a crude representation of the score that only contains rectangular floating notehead blobs, a process we call bootleg score synthesis. Furthermore, we project sheet music images into the same bootleg space by applying a deep watershed notehead detector and filling in the bounding boxes around each detected notehead. Finally, we align the bootleg representations using a simple variant of dynamic time warping. On a dataset of 68 real scanned piano scores from IMSLP and corresponding MIDI performances, our method achieves a 97.3% accuracy at an error tolerance of one second, outperforming several baseline systems that employ optical music recognition.

Submitted 21 April, 2020; originally announced April 2020.

Comments: 8 pages, 6 figures, 1 table. Accepted paper at the International Society for Music Information Retrieval Conference (ISMIR) 2019

arXiv:2002.09786 [pdf, other]

HarDNN: Feature Map Vulnerability Evaluation in CNNs

Authors: Abdulrahman Mahmoud, Siva Kumar Sastry Hari, Christopher W. Fletcher, Sarita V. Adve, Charbel Sakr, Naresh Shanbhag, Pavlo Molchanov, Michael B. Sullivan, Timothy Tsai, Stephen W. Keckler

Abstract: As Convolutional Neural Networks (CNNs) are increasingly being employed in safety-critical applications, it is important that they behave reliably in the face of hardware errors. Transient hardware errors may percolate undesirable state during execution, resulting in software-manifested errors which can adversely affect high-level decision making. This paper presents HarDNN, a software-directed approach to identify vulnerable computations during a CNN inference and selectively protect them based on their propensity towards corrupting the inference output in the presence of a hardware error. We show that HarDNN can accurately estimate relative vulnerability of a feature map (fmap) in CNNs using a statistical error injection campaign, and explore heuristics for fast vulnerability assessment. Based on these results, we analyze the tradeoff between error coverage and computational overhead that the system designers can use to employ selective protection. Results show that the improvement in resilience for the added computation is superlinear with HarDNN. For example, HarDNN improves SqueezeNet's resilience by 10x with just 30% additional computations.

Submitted 25 February, 2020; v1 submitted 22 February, 2020; originally announced February 2020.

Comments: 14 pages, 5 figures, a short version accepted for publication in First Workshop on Secure and Resilient Autonomy (SARA) co-located with MLSys2020

arXiv:1907.01051 [pdf, other]

ML-based Fault Injection for Autonomous Vehicles: A Case for Bayesian Fault Injection

Authors: Saurabh Jha, Subho S. Banerjee, Timothy Tsai, Siva K. S. Hari, Michael B. Sullivan, Zbigniew T. Kalbarczyk, Stephen W. Keckler, Ravishankar K. Iyer

Abstract: The safety and resilience of fully autonomous vehicles (AVs) are of significant concern, as exemplified by several headline-making accidents. While AV development today involves verification, validation, and testing, end-to-end assessment of AV systems under accidental faults in realistic driving scenarios has been largely unexplored. This paper presents DriveFI, a machine learning-based fault injection engine, which can mine situations and faults that maximally impact AV safety, as demonstrated on two industry-grade AV technology stacks (from NVIDIA and Baidu). For example, DriveFI found 561 safety-critical faults in less than 4 hours. In comparison, random injection experiments executed over several weeks could not find any safety-critical faults.

Submitted 1 July, 2019; originally announced July 2019.

Comments: Accepted at 2019 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks

arXiv:1907.01024 [pdf, other]

Kayotee: A Fault Injection-based System to Assess the Safety and Reliability of Autonomous Vehicles to Faults and Errors

Authors: Saurabh Jha, Timothy Tsai, Siva Hari, Michael Sullivan, Zbigniew Kalbarczyk, Stephen W. Keckler, Ravishankar K. Iyer

Abstract: Fully autonomous vehicles (AVs), i.e., AVs with autonomy level 5, are expected to dominate road transportation in the near future and contribute trillions of dollars to the global economy. The general public, government organizations, and manufacturers all have significant concern regarding resiliency and safety standards of the autonomous driving system (ADS) of AVs. In this work, we proposed and developed (a) 'Kayotee', a fault injection-based tool to systematically inject faults into software and hardware components of the ADS to assess the safety and reliability of AVs to faults and errors, and (b) an ontology model to characterize errors and safety violations impacting reliability and safety of AVs. Kayotee is capable of characterizing fault propagation and resiliency at different levels: (a) hardware, (b) software, (c) vehicle dynamics, and (d) traffic resilience. We used Kayotee to study a proprietary ADS technology built by Nvidia corporation and are currently applying Kayotee to other open-source ADS systems.

Submitted 1 July, 2019; originally announced July 2019.

Comments: Presented at Automotive Reliability and Testing (ART) 2018 colocated with International Testing Conference

arXiv:1905.13305 [pdf, other]

Countering Noisy Labels By Learning From Auxiliary Clean Labels

Authors: Tsung Wei Tsai, Chongxuan Li, Jun Zhu

Abstract: We consider the learning from noisy labels (NL) problem which emerges in many real-world applications. In addition to the widely-studied synthetic noise in the NL literature, we also consider the pseudo labels in semi-supervised learning (Semi-SL) as a special case of NL. For both types of noise, we argue that the generalization performance of existing methods is highly coupled with the quality of noisy labels. Therefore, we counter the problem from a novel and unified perspective: learning from the auxiliary clean labels. Specifically, we propose the Rotational-Decoupling Consistency Regularization (RDCR) framework that integrates the consistency-based methods with the self-supervised rotation task to learn noise-tolerant representations. The experiments show that RDCR achieves performance comparable or superior to the state-of-the-art methods under small noise, while significantly outperforming the existing methods when there is large noise.

Submitted 12 September, 2019; v1 submitted 23 May, 2019; originally announced May 2019.

arXiv:1901.10219 [pdf]

doi 10.1093/bib/bbaa054

Revised JNLPBA Corpus: A Revised Version of Biomedical NER Corpus for Relation Extraction Task

Authors: Ming-Siang Huang, Po-Ting Lai, Richard Tzong-Han Tsai, Wen-Lian Hsu

Abstract: The advancement of biomedical named entity recognition (BNER) and biomedical relation extraction (BRE) research promotes the development of text mining in biological domains. As a cornerstone of BRE, a robust BNER system is required to identify the mentioned NEs in plain texts for the further relation extraction stage. However, the current BNER corpora, which play important roles in these tasks, pay less attention to meeting the criteria for the BRE task. In this study, we present the Revised JNLPBA corpus, the revision of the JNLPBA corpus, to broaden the applicability of a NER corpus from BNER to the BRE task. We preserve the original entity types including protein, DNA, RNA, cell line and cell type, while all the abstracts in the JNLPBA corpus are manually curated by domain experts again based on the new annotation guideline focusing on the specific NEs instead of general terms. Simultaneously, several imperfection issues in JNLPBA are pointed out and made up in the new corpus. To compare the adaptability of different NER systems in the Revised JNLPBA and JNLPBA corpora, the F1-measure was measured in three open-source NER systems including BANNER, Gimli and NERSuite. In the same circumstance, all the systems perform on average 10% better in Revised JNLPBA than in JNLPBA. Moreover, a cross-validation test is carried out in which we train the NER systems on the JNLPBA/Revised JNLPBA corpora and assess the performance in both protein-protein interaction extraction (PPIE) and biomedical event extraction (BEE) corpora to confirm that the newly refined Revised JNLPBA is a competent NER corpus in biomedical relation application. The revised JNLPBA corpus is freely available at iasl-btm.iis.sinica.edu.tw/BNER/Content/Revised_JNLPBA.zip.

Submitted 29 January, 2019; originally announced January 2019.

Comments: 17 pages

Journal ref: Briefings in Bioinformatics, 2020, bbaa054

arXiv:1510.03021 [pdf]

doi 10.1145/2818869.2818912

Textual Analysis for Studying Chinese Historical Documents and Literary Novels

Authors: Chao-Lin Liu, Guan-Tao Jin, Hongsu Wang, Qing-Feng Liu, Wen-Huei Cheng, Wei-Yun Chiu, Richard Tzong-Han Tsai, Yu-Chun Wang

Abstract: We analyzed historical and literary documents in Chinese to gain insights into research issues, and overview our studies which utilized four different sources of text materials in this paper. We investigated the history of concepts and transliterated words in China with the Database for the Study of Modern China Thought and Literature, which contains historical documents about China between 1830 and 1930. We also attempted to disambiguate names that were shared by multiple government officers who served between 618 and 1912 and were recorded in Chinese local gazetteers. To showcase the potentials and challenges of computer-assisted analysis of Chinese literature, we explored some interesting yet non-trivial questions about two of the Four Great Classical Novels of China: (1) Which monsters attempted to consume the Buddhist monk Xuanzang in the Journey to the West (JTTW), which was published in the 16th century, (2) Which was the most powerful monster in JTTW, and (3) Which major role smiled the most in the Dream of the Red Chamber, which was published in the 18th century. Similar approaches can be applied to the analysis and study of modern documents, such as the newspaper articles published about the 228 incident that occurred in 1947 in Taiwan.

Submitted 11 October, 2015; originally announced October 2015.

Comments: 11 pages, 7 figures, 2 tables, The Fourth ASE International Conference on Social Informatics

arXiv:1502.06260 [pdf, other]

doi 10.1109/JSTSP.2015.2411575

Compressive Hyperspectral Imaging with Side Information

Authors: Xin Yuan, Tsung-Han Tsai, Ruoyu Zhu, Patrick Llull, David Brady, Lawrence Carin

Abstract: A blind compressive sensing algorithm is proposed to reconstruct hyperspectral images from spectrally-compressed measurements. The wavelength-dependent data are coded and then superposed, mapping the three-dimensional hyperspectral datacube to a two-dimensional image. The inversion algorithm learns a dictionary in situ from the measurements via global-local shrinkage priors. By using RGB images as side information of the compressive sensing system, the proposed approach is extended to learn a coupled dictionary from the joint dataset of the compressed measurements and the corresponding RGB images, to improve reconstruction quality. A prototype camera is built using a liquid-crystal-on-silicon modulator. Experimental reconstructions of hyperspectral datacubes from both simulated and real compressed measurements demonstrate the efficacy of the proposed inversion algorithm, the feasibility of the camera and the benefit of side information.

Submitted 22 February, 2015; originally announced February 2015.

Comments: 20 pages, 21 figures. To appear in the IEEE Journal of Selected Topics Signal Processing

arXiv:1409.4955 [pdf, other]

Probabilistic analysis of the (1+1)-evolutionary algorithm

Authors: Hsien-Kuei Hwang, Alois Panholzer, Nicolas Rolin, Tsung-Hsi Tsai, Wei-Mei Chen

Abstract: We give a detailed analysis of the cost used by the (1+1)-evolutionary algorithm. The problem has been approached in the evolutionary algorithm literature under various views, formulations and degrees of rigor. Our asymptotic approximations for the mean and the variance represent the strongest of their kind. The approach we develop is also applicable to characterize the limit laws and is based on asymptotic resolution of the underlying recurrence. While most approximations have their simple formal nature, we elaborate on the delicate error analysis required for rigorous justifications.

Submitted 17 September, 2014; originally announced September 2014.

Comments: 53 pages with 8 figures and 4 appendices

MSC Class: 60C05; 68W40 (Primary); 60F06; 65Q30 (Secondary)

arXiv:1111.6224 [pdf, ps, other]

Threshold phenomena in k-dominant skylines of random samples

Authors: Hsien-Kuei Hwang, Tsung-Hsi Tsai, Wei-Mei Chen

Abstract: Skylines emerged as a useful notion in database queries for selecting representative groups in multivariate data samples for further decision making, multi-objective optimization or data processing, and the $k$-dominant skylines were naturally introduced to resolve the abundance of skylines when the dimensionality grows or when the coordinates are negatively correlated. We prove in this paper that the expected number of $k$-dominant skylines is asymptotically zero for large samples when $1\le k\le d-1$ under two reasonable (continuous) probability assumptions of the input points, $d$ being the (finite) dimensionality, in contrast to the asymptotic unboundedness when $k=d$. In addition to such an asymptotic zero-infinity property, we also establish a sharp threshold phenomenon for the expected ($d-1$)-dominant skylines when the dimensionality is allowed to grow with $n$. Several related issues, such as the dominant cycle structures and numerical aspects, are also briefly studied.

Submitted 26 November, 2011; originally announced November 2011.

Comments: 38 pages, 4 figures

MSC Class: 60C05; 68P15; 60F20; 68Q25; 82B26

arXiv:0910.1392 [pdf, other]

Simple, efficient maxima-finding algorithms for multidimensional samples

Authors: Wei-Mei Chen, Hsien-Kuei Hwang, Tsung-Hsi Tsai

Abstract: New algorithms are devised for finding the maxima of multidimensional point samples, one of the very first problems studied in computational geometry. The algorithms are very simple and easily coded and modified for practical needs. The expected complexity of some measures related to the performance of the algorithms is analyzed. We also compare the efficiency of the algorithms with a few major ones used in practice, and apply our algorithms to find the maximal layers and the longest common subsequences of multiple sequences.

Submitted 7 October, 2009; originally announced October 2009.

arXiv:0805.0883 [pdf]

Portable Valve-less Peristaltic Micro-pump Design and Fabrication

Authors: H. Yang, T. -H. Tsai, C. -C. Hu

Abstract: This paper describes a design and fabrication method for a valve-less peristaltic micro-pump. The valve-less peristaltic micro-pump, with three membrane chambers in series, is actuated by three piezoelectric (PZT) actuators. With the fluidic flow design, liquid in the flow channel is pumped at a constant flow speed ranging from 0.4 to 0.48 mm/s. The maximum flow rate of the micro-pump is about 365 microliters/min when the applied voltage is 24 V and the frequency is 50 Hz. A photolithography process was used to fabricate the micro-pump mold. PDMS molding and PDMS bonding methods were used to fabricate the micro-channel and actuator chambers. A portable drive controller was designed to control the three PZT actuators in a proper sequence to drive the chamber membrane. Then, all parts were integrated into the portable valve-less peristaltic micro-pump system.

Submitted 7 May, 2008; originally announced May 2008.

Comments: Submitted on behalf of EDA Publishing Association (http://irevues.inist.fr/handle/2042/16838)

Journal ref: In Symposium on Design, Test, Integration and Packaging of MEMS/MOEMS - DTIP 2008, Nice, France (2008)

arXiv:math/0309285 [pdf, ps, other]

doi 10.1109/LSP.2001.838216

An Algorithm for Optimal Partitioning of Data on an Interval

Authors: Brad Jackson, Jeffrey D. Scargle, David Barnes, Sundararajan Arabhi, Alina Alt, Peter Gioumousis, Elyus Gwin, Paungkaew Sangtrakulcharoen, Linda Tan, Tun Tao Tsai

Abstract: Many signal processing problems can be solved by maximizing the fitness of a segmented model over all possible partitions of the data interval. This letter describes a simple but powerful algorithm that searches the exponentially large space of partitions of $N$ data points in time $O(N^2)$. The algorithm is guaranteed to find the exact global optimum, automatically determines the model order (the number of segments), has a convenient real-time mode, can be extended to higher dimensional data spaces, and solves a surprising variety of problems in signal detection and characterization, density estimation, cluster analysis and classification.

Submitted 9 April, 2004; v1 submitted 17 September, 2003; originally announced September 2003.

Comments: 3 pages, 1 figure, submitted to IEEE Signal Processing Letters, revised version with added references

MSC Class: 65C60

Showing 1–42 of 42 results for author: Tsai, T