
Showing 1–50 of 64 results for author: Kim, N S

Searching in archive cs.
  1. arXiv:2406.05965 [pdf, other]

    eess.AS cs.AI

    MakeSinger: A Semi-Supervised Training Method for Data-Efficient Singing Voice Synthesis via Classifier-free Diffusion Guidance

    Authors: Semin Kim, Myeonghun Jeong, Hyeonseung Lee, Minchan Kim, Byoung Jin Choi, Nam Soo Kim

    Abstract: In this paper, we propose MakeSinger, a semi-supervised training method for singing voice synthesis (SVS) via classifier-free diffusion guidance. The challenge in SVS lies in the costly process of gathering aligned sets of text, pitch, and audio data. MakeSinger enables the training of the diffusion-based SVS model from any speech and singing voice data regardless of its labeling, thereby enhancin…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech 2024

  2. arXiv:2405.04752 [pdf, other]

    eess.AS cs.SD

    HILCodec: High Fidelity and Lightweight Neural Audio Codec

    Authors: Sunghwan Ahn, Beom Jun Woo, Min Hyun Han, Chanyeong Moon, Nam Soo Kim

    Abstract: The recent advancement of end-to-end neural audio codecs enables compressing audio at very low bitrates while reconstructing the output audio with high fidelity. Nonetheless, such improvements often come at the cost of increased model complexity. In this paper, we identify and address the problems of existing neural audio codecs. We show that the performance of Wave-U-Net does not increase consist…

    Submitted 7 May, 2024; originally announced May 2024.

  3. arXiv:2405.02499 [pdf, other]

    cs.CR cs.AR

    DRAMScope: Uncovering DRAM Microarchitecture and Characteristics by Issuing Memory Commands

    Authors: Hwayong Nam, Seungmin Baek, Minbok Wi, Michael Jaemin Kim, Jaehyun Park, Chihun Song, Nam Sung Kim, Jung Ho Ahn

    Abstract: The demand for precise information on DRAM microarchitectures and error characteristics has surged, driven by the need to explore processing in memory, enhance reliability, and mitigate security vulnerabilities. Nonetheless, DRAM manufacturers have disclosed only a limited amount of information, making it difficult to find specific information on their DRAM microarchitectures. This paper addresses t…

    Submitted 3 May, 2024; originally announced May 2024.

    Comments: To appear at the 51st IEEE/ACM International Symposium on Computer Architecture (ISCA)

  4. arXiv:2401.01498 [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Utilizing Neural Transducers for Two-Stage Text-to-Speech via Semantic Token Prediction

    Authors: Minchan Kim, Myeonghun Jeong, Byoung Jin Choi, Semin Kim, Joun Yeop Lee, Nam Soo Kim

    Abstract: We propose a novel text-to-speech (TTS) framework centered around a neural transducer. Our approach divides the whole TTS pipeline into semantic-level sequence-to-sequence (seq2seq) modeling and fine-grained acoustic modeling stages, utilizing discrete semantic tokens obtained from wav2vec2.0 embeddings. For robust and efficient alignment modeling, we employ a neural transducer named token trans…

    Submitted 2 January, 2024; originally announced January 2024.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  5. arXiv:2401.01099 [pdf, other]

    eess.AS cs.AI cs.LG

    Efficient Parallel Audio Generation using Group Masked Language Modeling

    Authors: Myeonghun Jeong, Minchan Kim, Joun Yeop Lee, Nam Soo Kim

    Abstract: We present a fast and high-quality codec language model for parallel audio generation. While SoundStorm, a state-of-the-art parallel audio generation model, accelerates inference speed compared to autoregressive models, it still suffers from slow inference due to iterative sampling. To resolve this problem, we propose Group-Masked Language Modeling (G-MLM) and Group Iterative Parallel Decoding (G-…

    Submitted 2 January, 2024; originally announced January 2024.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  6. arXiv:2312.06065 [pdf, other]

    eess.AS cs.SD

    EEND-DEMUX: End-to-End Neural Speaker Diarization via Demultiplexed Speaker Embeddings

    Authors: Sung Hwan Mun, Min Hyun Han, Canyeong Moon, Nam Soo Kim

    Abstract: In recent years, there have been studies to further improve the end-to-end neural speaker diarization (EEND) systems. This letter proposes the EEND-DEMUX model, a novel framework utilizing demultiplexed speaker embeddings. In this work, we focus on disentangling speaker-relevant information in the latent space and then transform each separated latent variable into its corresponding speech activity…

    Submitted 10 December, 2023; originally announced December 2023.

    Comments: Submitted to IEEE Signal Processing Letters

  7. arXiv:2311.02898 [pdf, other]

    eess.AS cs.LG

    Transduce and Speak: Neural Transducer for Text-to-Speech with Semantic Token Prediction

    Authors: Minchan Kim, Myeonghun Jeong, Byoung Jin Choi, Dongjune Lee, Nam Soo Kim

    Abstract: We introduce a text-to-speech (TTS) framework based on a neural transducer. We use discretized semantic tokens acquired from wav2vec2.0 embeddings, which makes it easy to adopt a neural transducer for the TTS framework enjoying its monotonic alignment constraints. The proposed model first generates aligned semantic tokens using the neural transducer, then synthesizes a speech sample from the semant…

    Submitted 8 November, 2023; v1 submitted 6 November, 2023; originally announced November 2023.

    Comments: Accepted at ASRU 2023

  8. arXiv:2306.10058 [pdf, other]

    cs.LG cs.CL eess.AS

    EM-Network: Oracle Guided Self-distillation for Sequence Learning

    Authors: Ji Won Yoon, Sunghwan Ahn, Hyeonseung Lee, Minchan Kim, Seok Min Kim, Nam Soo Kim

    Abstract: We introduce EM-Network, a novel self-distillation approach that effectively leverages target information for supervised sequence-to-sequence (seq2seq) learning. In contrast to conventional methods, it is trained with oracle guidance, which is derived from the target sequence. Since the oracle guidance compactly represents the target-side context that can assist the sequence model in solving the t…

    Submitted 14 June, 2023; originally announced June 2023.

    Comments: ICML 2023

  9. X-ray: Discovering DRAM Internal Structure and Error Characteristics by Issuing Memory Commands

    Authors: Hwayong Nam, Seungmin Baek, Minbok Wi, Michael Jaemin Kim, Jaehyun Park, Chihun Song, Nam Sung Kim, Jung Ho Ahn

    Abstract: The demand for accurate information about the internal structure and characteristics of dynamic random-access memory (DRAM) has been on the rise. Recent studies have explored the structure and characteristics of DRAM to improve processing in memory, enhance reliability, and mitigate a vulnerability known as rowhammer. However, DRAM manufacturers only disclose limited information through official d…

    Submitted 12 August, 2023; v1 submitted 5 June, 2023; originally announced June 2023.

    Comments: 4 pages, 7 figures, accepted at IEEE Computer Architecture Letters

  10. arXiv:2305.19051 [pdf, other]

    eess.AS cs.AI cs.SD

    Towards single integrated spoofing-aware speaker verification embeddings

    Authors: Sung Hwan Mun, Hye-jin Shim, Hemlata Tak, Xin Wang, Xuechen Liu, Md Sahidullah, Myeonghun Jeong, Min Hyun Han, Massimiliano Todisco, Kong Aik Lee, Junichi Yamagishi, Nicholas Evans, Tomi Kinnunen, Nam Soo Kim, Jee-weon Jung

    Abstract: This study aims to develop single integrated spoofing-aware speaker verification (SASV) embeddings that satisfy two aspects. First, rejecting non-target speakers' input as well as target speakers' spoofed inputs should be addressed. Second, competitive performance should be demonstrated compared to the fusion of automatic speaker verification (ASV) and countermeasure (CM) embeddings, which outpe…

    Submitted 1 June, 2023; v1 submitted 30 May, 2023; originally announced May 2023.

    Comments: Accepted by INTERSPEECH 2023. Code and models are available at https://github.com/sasv-challenge/ASVSpoof5-SASVBaseline

  11. arXiv:2305.07522 [pdf, other]

    cs.AR cs.AI

    SPADE: Sparse Pillar-based 3D Object Detection Accelerator for Autonomous Driving

    Authors: Minjae Lee, Seongmin Park, Hyungmin Kim, Minyong Yoon, Janghwan Lee, Jun Won Choi, Nam Sung Kim, Mingu Kang, Jungwook Choi

    Abstract: 3D object detection using point cloud (PC) data is essential for perception pipelines of autonomous driving, where efficient encoding is key to meeting stringent resource and latency requirements. PointPillars, a widely adopted bird's-eye view (BEV) encoding, aggregates 3D point cloud data into 2D pillars for fast and accurate 3D object detection. However, the state-of-the-art methods employing Po…

    Submitted 13 January, 2024; v1 submitted 12 May, 2023; originally announced May 2023.

    Comments: 14 pages, 15 figures

  12. A Quantitative Analysis and Guidelines of Data Streaming Accelerator in Modern Intel Xeon Scalable Processors

    Authors: Reese Kuper, Ipoom Jeong, Yifan Yuan, Jiayu Hu, Ren Wang, Narayan Ranganathan, Nam Sung Kim

    Abstract: As semiconductor power density is no longer constant with the technology process scaling down, modern CPUs are integrating capable data accelerators on chip, aiming to improve performance and efficiency for a wide range of applications and usages. One such accelerator is the Intel Data Streaming Accelerator (DSA) introduced in Intel 4th Generation Xeon Scalable CPUs (Sapphire Rapids). DSA targets…

    Submitted 29 January, 2024; v1 submitted 3 May, 2023; originally announced May 2023.

    Comments: This paper has been accepted by ASPLOS'24. Please refer to the linked DOI for the official version of this paper

  13. arXiv:2304.00350 [pdf, other]

    cs.CL

    When Crowd Meets Persona: Creating a Large-Scale Open-Domain Persona Dialogue Corpus

    Authors: Won Ik Cho, Yoon Kyung Lee, Seoyeon Bae, Jihwan Kim, Sangah Park, Moosung Kim, Sowon Hahn, Nam Soo Kim

    Abstract: Building a natural language dataset requires caution since word semantics is vulnerable to subtle text change or the definition of the annotated concept. Such a tendency can be seen in generative tasks like question-answering and dialogue generation and also in tasks that create a categorization-based corpus, like topic classification or sentiment analysis. Open-domain conversations involve two or…

    Submitted 1 April, 2023; originally announced April 2023.

    Comments: Presented at HCOMP 2022 as Works-in-Progress

  14. Demystifying CXL Memory with Genuine CXL-Ready Systems and Devices

    Authors: Yan Sun, Yifan Yuan, Zeduo Yu, Reese Kuper, Chihun Song, Jinghan Huang, Houxiang Ji, Siddharth Agarwal, Jiaqi Lou, Ipoom Jeong, Ren Wang, Jung Ho Ahn, Tianyin Xu, Nam Sung Kim

    Abstract: The ever-growing demands for memory with larger capacity and higher bandwidth have driven recent innovations in memory expansion and disaggregation technologies based on Compute eXpress Link (CXL). In particular, CXL-based memory expansion technology has recently gained notable attention for its ability not only to economically expand memory capacity and bandwidth but also to decouple memory technolo…

    Submitted 4 October, 2023; v1 submitted 27 March, 2023; originally announced March 2023.

    Comments: This paper has been accepted by MICRO'23. Please refer to https://doi.org/10.1145/3613424.3614256 for the official version of this paper

    ACM Class: C.4; D.4; C.0

  15. arXiv:2302.13394 [pdf, other]

    cs.AR

    Asynchronous Persistence with ASAP

    Authors: Ahmed Abulila, Izzat El Hajj, Myoungsoo Jung, Nam Sung Kim

    Abstract: Supporting atomic durability of updates for persistent memories is typically achieved with Write-Ahead Logging (WAL). WAL flushes log entries to persistent memory before making the actual data persistent to ensure that a consistent state can be recovered if a crash occurs. Performing WAL in hardware is attractive because it makes most aspects of log management transparent to software, and it compl…

    Submitted 26 February, 2023; originally announced February 2023.

    Comments: 2 pages, 2 figures, 14th Annual Non-Volatile Memories Workshop

  16. arXiv:2302.01474 [pdf, other]

    cs.CR cs.AR cs.LG

    Defensive ML: Defending Architectural Side-channels with Adversarial Obfuscation

    Authors: Hyoungwook Nam, Raghavendra Pradyumna Pothukuchi, Bo Li, Nam Sung Kim, Josep Torrellas

    Abstract: Side-channel attacks that use machine learning (ML) for signal analysis have become prominent threats to computer security, as ML models easily find patterns in signals. To address this problem, this paper explores using Adversarial Machine Learning (AML) methods as a defense at the computer architecture layer to obfuscate side channels. We call this approach Defensive ML, and the generator to obf…

    Submitted 14 October, 2023; v1 submitted 2 February, 2023; originally announced February 2023.

    Comments: Preprint. Under review

  17. SNAC: Speaker-normalized affine coupling layer in flow-based architecture for zero-shot multi-speaker text-to-speech

    Authors: Byoung Jin Choi, Myeonghun Jeong, Joun Yeop Lee, Nam Soo Kim

    Abstract: Zero-shot multi-speaker text-to-speech (ZSM-TTS) models aim to generate a speech sample with the voice characteristic of an unseen speaker. The main challenge of ZSM-TTS is to increase the overall speaker similarity for unseen speakers. One of the most successful speaker conditioning methods for flow-based multi-speaker text-to-speech (TTS) models is to utilize the functions which predict the scal…

    Submitted 30 November, 2022; originally announced November 2022.

    Comments: Accepted to IEEE Signal Processing Letters

  18. arXiv:2211.15075 [pdf, other]

    eess.AS cs.SD

    Inter-KD: Intermediate Knowledge Distillation for CTC-Based Automatic Speech Recognition

    Authors: Ji Won Yoon, Beom Jun Woo, Sunghwan Ahn, Hyeonseung Lee, Nam Soo Kim

    Abstract: Recently, the advance in deep learning has brought a considerable improvement in the end-to-end speech recognition field, simplifying the traditional pipeline while producing promising results. Among the end-to-end models, the connectionist temporal classification (CTC)-based model has attracted research interest due to its non-autoregressive nature. However, such CTC models require a heavy comput…

    Submitted 28 November, 2022; originally announced November 2022.

    Comments: Accepted by 2022 SLT Workshop

  19. arXiv:2210.08974 [pdf]

    cs.CY

    Coordinated Science Laboratory 70th Anniversary Symposium: The Future of Computing

    Authors: Klara Nahrstedt, Naresh Shanbhag, Vikram Adve, Nancy Amato, Romit Roy Choudhury, Carl Gunter, Nam Sung Kim, Olgica Milenkovic, Sayan Mitra, Lav Varshney, Yurii Vlasov, Sarita Adve, Rashid Bashir, Andreas Cangellaris, James DiCarlo, Katie Driggs-Campbell, Nick Feamster, Mattia Gazzola, Karrie Karahalios, Sanmi Koyejo, Paul Kwiat, Bo Li, Negar Mehr, Ravish Mehra, Andrew Miller, et al. (3 additional authors not shown)

    Abstract: In 2021, the Coordinated Science Laboratory (CSL), an Interdisciplinary Research Unit at the University of Illinois Urbana-Champaign, hosted the Future of Computing Symposium to celebrate its 70th anniversary. CSL's research covers the full computing stack, computing's impact on society and the resulting need for social responsibility. In this white paper, we summarize the major technological points…

    Submitted 4 October, 2022; originally announced October 2022.

  20. arXiv:2210.05979 [pdf, other]

    eess.AS cs.SD

    Adversarial Speaker-Consistency Learning Using Untranscribed Speech Data for Zero-Shot Multi-Speaker Text-to-Speech

    Authors: Byoung Jin Choi, Myeonghun Jeong, Minchan Kim, Sung Hwan Mun, Nam Soo Kim

    Abstract: Several recently proposed text-to-speech (TTS) models have succeeded in generating speech samples with human-level quality in single-speaker and multi-speaker TTS scenarios with a set of pre-defined speakers. However, synthesizing a new speaker's voice with a single reference audio, commonly known as zero-shot multi-speaker text-to-speech (ZSM-TTS), is still a very challenging task. The main c…

    Submitted 22 November, 2022; v1 submitted 12 October, 2022; originally announced October 2022.

    Comments: APSIPA 2022

  21. arXiv:2208.08012 [pdf, other]

    eess.AS cs.SD

    Disentangled Speaker Representation Learning via Mutual Information Minimization

    Authors: Sung Hwan Mun, Min Hyun Han, Minchan Kim, Dongjune Lee, Nam Soo Kim

    Abstract: The domain mismatch problem caused by speaker-unrelated features has been a major topic in speaker recognition. In this paper, we propose an explicit disentanglement framework to unravel speaker-relevant features from speaker-unrelated features via mutual information (MI) minimization. To achieve our goal of minimizing MI between speaker-related and speaker-unrelated features, we adopt a contrastive lo…

    Submitted 12 October, 2022; v1 submitted 16 August, 2022; originally announced August 2022.

    Comments: Accepted by APSIPA ASC 2022. Camera-ready. 8 pages, 4 figures, and 1 table

  22. arXiv:2204.06328 [pdf, other]

    cs.CL cs.SD eess.AS

    HuBERT-EE: Early Exiting HuBERT for Efficient Speech Recognition

    Authors: Ji Won Yoon, Beom Jun Woo, Nam Soo Kim

    Abstract: Pre-training with self-supervised models, such as Hidden-unit BERT (HuBERT) and wav2vec 2.0, has brought significant improvements in automatic speech recognition (ASR). However, these models usually incur a high computational cost to achieve outstanding performance, slowing down inference. To improve the model efficiency, we introduce an early exit scheme for ASR, namely HuBERT-E…

    Submitted 19 June, 2024; v1 submitted 13 April, 2022; originally announced April 2022.

    Comments: Accepted by INTERSPEECH 2024

  23. arXiv:2204.01005 [pdf, other]

    eess.AS cs.AI

    Frequency and Multi-Scale Selective Kernel Attention for Speaker Verification

    Authors: Sung Hwan Mun, Jee-weon Jung, Min Hyun Han, Nam Soo Kim

    Abstract: The majority of recent state-of-the-art speaker verification architectures adopt multi-scale processing and frequency-channel attention mechanisms. Convolutional layers of these models typically have a fixed kernel size, e.g., 3 or 5. In this study, we further contribute to this line of research utilising a selective kernel attention (SKA) mechanism. The SKA mechanism allows each convolutional lay…

    Submitted 12 October, 2022; v1 submitted 3 April, 2022; originally announced April 2022.

    Comments: Accepted by IEEE SLT 2022. 7 pages, 4 figures, 1 table. Code is available at https://github.com/msh9184/ska-tdnn.git

  24. Transfer Learning Framework for Low-Resource Text-to-Speech using a Large-Scale Unlabeled Speech Corpus

    Authors: Minchan Kim, Myeonghun Jeong, Byoung Jin Choi, Sunghwan Ahn, Joun Yeop Lee, Nam Soo Kim

    Abstract: Training a text-to-speech (TTS) model requires a large-scale, text-labeled speech corpus, which is troublesome to collect. In this paper, we propose a transfer learning framework for TTS that utilizes a large unlabeled speech dataset for pre-training. By leveraging wav2vec2.0 representations, unlabeled speech can greatly improve performance, especially when labeled speech is scarce. We also…

    Submitted 6 October, 2022; v1 submitted 29 March, 2022; originally announced March 2022.

    Comments: Accepted by Interspeech 2022

  25. arXiv:2203.10983 [pdf, other]

    cs.LG cs.AI

    BNS-GCN: Efficient Full-Graph Training of Graph Convolutional Networks with Partition-Parallelism and Random Boundary Node Sampling

    Authors: Cheng Wan, Youjie Li, Ang Li, Nam Sung Kim, Yingyan Lin

    Abstract: Graph Convolutional Networks (GCNs) have emerged as the state-of-the-art method for graph-based learning tasks. However, training GCNs at scale is still challenging, hindering both the exploration of more sophisticated GCN architectures and their applications to real-world large graphs. While it might be natural to consider graph partition and distributed training for tackling this challenge, this…

    Submitted 26 March, 2022; v1 submitted 21 March, 2022; originally announced March 2022.

    Comments: MLSys 2022

  26. arXiv:2203.10428 [pdf, other]

    cs.LG cs.AI

    PipeGCN: Efficient Full-Graph Training of Graph Convolutional Networks with Pipelined Feature Communication

    Authors: Cheng Wan, Youjie Li, Cameron R. Wolfe, Anastasios Kyrillidis, Nam Sung Kim, Yingyan Lin

    Abstract: Graph Convolutional Networks (GCNs) are the state-of-the-art method for learning graph-structured data, and training large-scale GCNs requires distributed training across multiple accelerators such that each accelerator is able to hold a partitioned subgraph. However, distributed GCN training incurs prohibitive overhead of communicating node features and feature gradients among partitions for every…

    Submitted 19 March, 2022; originally announced March 2022.

    Comments: ICLR 2022

  27. arXiv:2203.08906 [pdf, other]

    cs.AR cs.DC cs.NI

    ORCA: A Network and Architecture Co-design for Offloading µs-scale Datacenter Applications

    Authors: Yifan Yuan, Jinghan Huang, Yan Sun, Tianchen Wang, Jacob Nelson, Dan R. K. Ports, Yipeng Wang, Ren Wang, Charlie Tai, Nam Sung Kim

    Abstract: Responding to the "datacenter tax" and "killer microseconds" problems for datacenter applications, diverse solutions including Smart NIC-based ones have been proposed. Nonetheless, they often suffer from high overhead of communications over network and/or PCIe links. To tackle the limitations of the current solutions, this paper proposes ORCA, a holistic network and architecture co-design solution…

    Submitted 17 October, 2022; v1 submitted 16 March, 2022; originally announced March 2022.

    Comments: This paper has been accepted by HPCA'23. This arxiv paper is not the final camera-ready version

  28. Harmony: Overcoming the Hurdles of GPU Memory Capacity to Train Massive DNN Models on Commodity Servers

    Authors: Youjie Li, Amar Phanishayee, Derek Murray, Jakub Tarnawski, Nam Sung Kim

    Abstract: Deep neural networks (DNNs) have grown exponentially in size over the past decade, leaving only those who have massive datacenter-based resources with the ability to develop and train such models. One of the main challenges for the long tail of researchers who might have only limited resources (e.g., a single multi-GPU server) is limited GPU memory capacity compared to model size. The problem is s…

    Submitted 1 August, 2022; v1 submitted 2 February, 2022; originally announced February 2022.

    Comments: Accepted at VLDB 2022

  29. arXiv:2112.08929 [pdf, other]

    eess.AS cs.AI cs.LG cs.SD

    Bootstrap Equilibrium and Probabilistic Speaker Representation Learning for Self-supervised Speaker Verification

    Authors: Sung Hwan Mun, Min Hyun Han, Dongjune Lee, Jihwan Kim, Nam Soo Kim

    Abstract: In this paper, we propose self-supervised speaker representation learning strategies, which comprise a bootstrap equilibrium speaker representation learning in the front-end and an uncertainty-aware probabilistic speaker embedding training in the back-end. In the front-end stage, we learn the speaker representations via the bootstrap training scheme with the uniformity regularization term. In t…

    Submitted 24 December, 2021; v1 submitted 16 December, 2021; originally announced December 2021.

    Comments: Accepted by IEEE Access

  30. arXiv:2112.06095 [pdf, other]

    cs.NI cs.DC

    Unlocking the Power of Inline Floating-Point Operations on Programmable Switches

    Authors: Yifan Yuan, Omar Alama, Amedeo Sapio, Jiawei Fei, Jacob Nelson, Dan R. K. Ports, Marco Canini, Nam Sung Kim

    Abstract: The advent of switches with programmable dataplanes has enabled the rapid development of new network functionality, as well as providing a platform for acceleration of a broad range of application-level functionality. However, existing switch hardware was not designed with application acceleration in mind, and thus applications requiring operations or datatypes not used in traditional network prot…

    Submitted 11 December, 2021; originally announced December 2021.

    Comments: This paper has been accepted by NSDI'22. This arxiv paper is not the final camera-ready version

  31. arXiv:2111.03664 [pdf, other]

    cs.LG eess.AS eess.IV

    Oracle Teacher: Leveraging Target Information for Better Knowledge Distillation of CTC Models

    Authors: Ji Won Yoon, Hyung Yong Kim, Hyeonseung Lee, Sunghwan Ahn, Nam Soo Kim

    Abstract: Knowledge distillation (KD), best known as an effective method for model compression, aims at transferring the knowledge of a bigger network (teacher) to a much smaller network (student). Conventional KD methods usually employ the teacher model trained in a supervised manner, where output labels are treated only as targets. Extending this supervised scheme further, we introduce a new type of teach…

    Submitted 11 August, 2023; v1 submitted 5 November, 2021; originally announced November 2021.

    Comments: Accepted by IEEE/ACM Transactions on Audio, Speech and Language Processing

  32. arXiv:2107.02875 [pdf, other]

    cs.CL

    Kosp2e: Korean Speech to English Translation Corpus

    Authors: Won Ik Cho, Seok Min Kim, Hyunchang Cho, Nam Soo Kim

    Abstract: Most speech-to-text (S2T) translation studies use English speech as a source, which makes it difficult for non-English speakers to take advantage of the S2T technologies. For some languages, this problem was tackled through corpus construction, but the farther a language is linguistically from English, or the more under-resourced it is, the more significant this deficiency and underrepresentation become. In this paper,…

    Submitted 6 July, 2021; originally announced July 2021.

    Comments: Interspeech 2021 Camera-ready

  33. Revamping Storage Class Memory With Hardware Automated Memory-Over-Storage Solution

    Authors: Jie Zhang, Miryeong Kwon, Donghyun Gouk, Sungjoon Koh, Nam Sung Kim, Mahmut Taylan Kandemir, Myoungsoo Jung

    Abstract: Large persistent memories such as NVDIMM have been perceived as a disruptive memory technology, because they can maintain the state of a system even after a power failure and allow the system to recover quickly. However, overheads incurred by a heavy software-stack intervention seriously negate the benefits of such memories. First, to significantly reduce the software stack overheads, we propose H…

    Submitted 27 June, 2021; originally announced June 2021.

  34. arXiv:2104.01409 [pdf, other]

    eess.AS cs.AI cs.SD

    Diff-TTS: A Denoising Diffusion Model for Text-to-Speech

    Authors: Myeonghun Jeong, Hyeongju Kim, Sung Jun Cheon, Byoung Jin Choi, Nam Soo Kim

    Abstract: Although neural text-to-speech (TTS) models have attracted a lot of attention and succeeded in generating human-like speech, there is still room for improvement in their naturalness and architectural efficiency. In this work, we propose a novel non-autoregressive TTS model, namely Diff-TTS, which achieves highly natural and efficient speech synthesis. Given the text, Diff-TTS exploits a denoising d…

    Submitted 3 April, 2021; originally announced April 2021.

    Comments: Submitted to INTERSPEECH 2021

  35. arXiv:2103.13439 [pdf, other]

    cs.CL

    StyleKQC: A Style-Variant Paraphrase Corpus for Korean Questions and Commands

    Authors: Won Ik Cho, Sangwhan Moon, Jong In Kim, Seok Min Kim, Nam Soo Kim

    Abstract: Paraphrasing is often performed with less concern for controlled style conversion. Especially for questions and commands, style-variant paraphrasing can be crucial in tone and manner, which also matters with industrial applications such as dialog systems. In this paper, we attack this issue with a corpus construction scheme that simultaneously considers the core content and style of directives, na…

    Submitted 27 April, 2022; v1 submitted 24 March, 2021; originally announced March 2021.

    Comments: LREC 2022 Camera-ready

  36. arXiv:2102.03542

    eess.SP cs.LG

    Continuous Monitoring of Blood Pressure with Evidential Regression

    Authors: Hyeongju Kim, Woo Hyun Kang, Hyeonseung Lee, Nam Soo Kim

    Abstract: Photoplethysmogram (PPG) signal-based blood pressure (BP) estimation is a promising candidate for modern BP measurements, as PPG signals can be easily obtained from wearable devices in a non-invasive manner, allowing quick BP measurement. However, the performance of existing machine learning-based BP measuring methods still falls short of some BP measurement guidelines and most of them provide only p…

    Submitted 25 February, 2021; v1 submitted 6 February, 2021; originally announced February 2021.

    Comments: We found some errors in the experimental configuration. We plan to revise the paper and republish it later

  37. arXiv:2010.11433 [pdf, other]

    eess.AS cs.SD

    Unsupervised Representation Learning for Speaker Recognition via Contrastive Equilibrium Learning

    Authors: Sung Hwan Mun, Woo Hyun Kang, Min Hyun Han, Nam Soo Kim

    Abstract: In this paper, we propose a simple but powerful unsupervised learning method for speaker recognition, namely Contrastive Equilibrium Learning (CEL), which increases the uncertainty on nuisance factors latent in the embeddings by employing the uniformity loss. To preserve speaker discriminability, a contrastive similarity loss function is also used. Experimental results showed that the pr…

    Submitted 22 October, 2020; originally announced October 2020.

    Comments: 5 pages, 1 figure, 4 tables

  38. arXiv:2010.11408 [pdf, ps, other]

    eess.AS cs.SD

    Robust Text-Dependent Speaker Verification via Character-Level Information Preservation for the SdSV Challenge 2020

    Authors: Sung Hwan Mun, Woo Hyun Kang, Min Hyun Han, Nam Soo Kim

    Abstract: This paper describes our submission to Task 1 of the Short-duration Speaker Verification (SdSV) challenge 2020. Task 1 is a text-dependent speaker verification task, where both the speaker and phrase are required to be verified. The submitted systems were composed of TDNN-based and ResNet-based front-end architectures, in which the frame-level features were aggregated with various pooling methods…

    Submitted 21 October, 2020; originally announced October 2020.

    Comments: Accepted in INTERSPEECH 2020

  39. Disentangled speaker and nuisance attribute embedding for robust speaker verification

    Authors: Woo Hyun Kang, Sung Hwan Mun, Min Hyun Han, Nam Soo Kim

    Abstract: In recent years, various deep learning-based embedding methods have been proposed and have shown impressive performance in speaker verification. However, as in most of the classical embedding techniques, the deep learning-based methods are known to suffer from severe performance degradation when dealing with speech samples with different conditions (e.g., recording devices, emotional states)…

    Submitted 7 August, 2020; originally announced August 2020.

    Comments: Accepted in IEEE Access

  40. Robust Front-End for Multi-Channel ASR using Flow-Based Density Estimation

    Authors: Hyeongju Kim, Hyeonseung Lee, Woo Hyun Kang, Hyung Yong Kim, Nam Soo Kim

    Abstract: For multi-channel speech recognition, speech enhancement techniques such as denoising or dereverberation are conventionally applied as a front-end processor. Deep learning-based front-ends using such techniques require aligned clean and noisy speech pairs which are generally obtained via data simulation. Recently, several joint optimization techniques have been proposed to train the front-end with…

    Submitted 25 July, 2020; originally announced July 2020.

    Comments: 7 pages, 3 figures

    Journal ref: Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, {IJCAI} 2020

  41. Gated Recurrent Context: Softmax-free Attention for Online Encoder-Decoder Speech Recognition

    Authors: Hyeonseung Lee, Woo Hyun Kang, Sung Jun Cheon, Hyeongju Kim, Nam Soo Kim

    Abstract: Recently, attention-based encoder-decoder (AED) models have shown state-of-the-art performance in automatic speech recognition (ASR). As the original AED models with global attentions are not capable of online inference, various online attention schemes have been developed to reduce ASR latency for better user experience. However, a common limitation of the conventional softmax-based online attent…

    Submitted 14 January, 2021; v1 submitted 10 July, 2020; originally announced July 2020.

  42. arXiv:2007.04552  [pdf, other]

    cs.AR cs.OS

    IOCA: High-Speed I/O-Aware LLC Management for Network-Centric Multi-Tenant Platform

    Authors: Yifan Yuan, Mohammad Alian, Yipeng Wang, Ilia Kurakin, Ren Wang, Charlie Tai, Nam Sung Kim

    Abstract: In modern server CPUs, the last-level cache (LLC) is a critical hardware resource that significantly influences workload performance, and how the LLC is managed is key to performance isolation and QoS in multi-tenant clouds. In this paper, we argue that besides CPU cores, high-speed network I/O is also important for LLC management. This is because of an Intel architectural…

    Submitted 4 March, 2021; v1 submitted 9 July, 2020; originally announced July 2020.

    Comments: Accepted by the 48th IEEE/ACM International Symposium on Computer Architecture (ISCA'21). The ISCA version is titled "Don't Forget the I/O When Allocating Your LLC"

  43. arXiv:2006.08966  [pdf, ps, other]

    cs.OS

    FastDrain: Removing Page Victimization Overheads in NVMe Storage Stack

    Authors: Jie Zhang, Miryeong Kwon, Sanghyun Han, Nam Sung Kim, Mahmut Kandemir, Myoungsoo Jung

    Abstract: Host-side page victimizations can easily overflow the SSD internal buffer, which interferes with the I/O services of diverse user applications, thereby degrading the user-level experience. To address this, we propose FastDrain, a co-design of OS kernel and flash firmware that avoids the buffer overflow caused by page victimizations. Specifically, FastDrain can detect a triggering point where a near-future page v…

    Submitted 22 June, 2020; v1 submitted 16 June, 2020; originally announced June 2020.

  44. arXiv:2006.04604  [pdf, other]

    cs.CV cs.LG

    SoftFlow: Probabilistic Framework for Normalizing Flow on Manifolds

    Authors: Hyeongju Kim, Hyeonseung Lee, Woo Hyun Kang, Joun Yeop Lee, Nam Soo Kim

    Abstract: Flow-based generative models are composed of invertible transformations between two random variables of the same dimension. Therefore, flow-based models cannot be adequately trained if the dimension of the data distribution does not match that of the underlying target distribution. In this paper, we propose SoftFlow, a probabilistic framework for training normalizing flows on manifolds. To sideste…

    Submitted 15 November, 2020; v1 submitted 8 June, 2020; originally announced June 2020.

    Comments: 17 pages, 15 figures

  45. arXiv:2006.04598  [pdf, other]

    cs.SD cs.CL cs.LG eess.AS

    WaveNODE: A Continuous Normalizing Flow for Speech Synthesis

    Authors: Hyeongju Kim, Hyeonseung Lee, Woo Hyun Kang, Sung Jun Cheon, Byoung Jin Choi, Nam Soo Kim

    Abstract: In recent years, various flow-based generative models have been proposed to generate high-fidelity waveforms in real time. However, these models require either a well-trained teacher network or many flow steps, making them memory-inefficient. In this paper, we propose a novel generative model called WaveNODE which exploits a continuous normalizing flow for speech synthesis. Unlike the conven…

    Submitted 2 July, 2020; v1 submitted 8 June, 2020; originally announced June 2020.

    Comments: 8 pages, 4 figures, Second workshop on Invertible Neural Networks, Normalizing Flows, and Explicit Likelihood Models (ICML 2020)

  46. arXiv:2005.08213  [pdf, other]

    cs.CL cs.SD eess.AS

    Speech to Text Adaptation: Towards an Efficient Cross-Modal Distillation

    Authors: Won Ik Cho, Donghyun Kwak, Ji Won Yoon, Nam Soo Kim

    Abstract: Speech is one of the most effective means of communication and is rich in information that helps convey the speaker's thoughts. However, mainly due to the cumbersome processing of acoustic features, phoneme or word posterior probabilities have frequently been discarded in natural language understanding. Thus, some recent spoken language understanding (SLU) modules have utilized end-to-en…

    Submitted 8 August, 2020; v1 submitted 17 May, 2020; originally announced May 2020.

    Comments: Interspeech 2020 Camera-ready

  47. arXiv:2004.05333  [pdf, other]

    cs.LG cs.PF

    Bit-Parallel Vector Composability for Neural Acceleration

    Authors: Soroush Ghodrati, Hardik Sharma, Cliff Young, Nam Sung Kim, Hadi Esmaeilzadeh

    Abstract: Conventional neural accelerators rely on isolated, self-sufficient functional units that perform an atomic operation while communicating the results through operand delivery-aggregation logic. Each unit processes all the bits of its operands atomically and produces all the bits of the results in isolation. This paper explores a different design style, where each unit is only responsible…

    Submitted 11 April, 2020; originally announced April 2020.

  48. arXiv:1912.00342  [pdf, other]

    cs.CL

    Machines Getting with the Program: Understanding Intent Arguments of Non-Canonical Directives

    Authors: Won Ik Cho, Young Ki Moon, Sangwhan Moon, Seok Min Kim, Nam Soo Kim

    Abstract: Modern dialog managers face the challenge of having to fulfill human-level conversational skills as part of common user expectations, including but not limited to discourse with no clear objective. Along with these requirements, agents are expected to extrapolate intent from the user's dialogue even when subjected to non-canonical forms of speech. This depends on the agent's comprehension of parap…

    Submitted 7 October, 2020; v1 submitted 1 December, 2019; originally announced December 2019.

    Comments: Findings of ACL: EMNLP 2020

  49. arXiv:1910.09275  [pdf, other]

    cs.CL eess.AS

    Text Matters but Speech Influences: A Computational Analysis of Syntactic Ambiguity Resolution

    Authors: Won Ik Cho, Jeonghwa Cho, Woo Hyun Kang, Nam Soo Kim

    Abstract: Analyzing how human beings resolve syntactic ambiguity has long been an issue of interest in the field of linguistics. It is also one of the most challenging issues for spoken language understanding (SLU) systems. As syntactic ambiguity is intertwined with issues regarding prosody and semantics, the computational approach toward speech intention identification is expected to…

    Submitted 21 May, 2020; v1 submitted 21 October, 2019; originally announced October 2019.

    Comments: CogSci 2020 Camera-ready

  50. arXiv:1906.11915  [pdf, other]

    cs.AR

    Mixed-Signal Charge-Domain Acceleration of Deep Neural Networks through Interleaved Bit-Partitioned Arithmetic

    Authors: Soroush Ghodrati, Hardik Sharma, Sean Kinzer, Amir Yazdanbakhsh, Kambiz Samadi, Nam Sung Kim, Doug Burger, Hadi Esmaeilzadeh

    Abstract: The low-power potential of mixed-signal design makes it an alluring option for accelerating Deep Neural Networks (DNNs). However, mixed-signal circuitry suffers from a limited range for information encoding, susceptibility to noise, and Analog-to-Digital (A/D) conversion overheads. This paper aims to address these challenges by offering and leveraging the insight that a vector dot-product (the basic operat…

    Submitted 12 July, 2019; v1 submitted 27 June, 2019; originally announced June 2019.