-
Crayon: Customized On-Device LLM via Instant Adapter Blending and Edge-Server Hybrid Inference
Authors:
Jihwan Bang,
Juntae Lee,
Kyuhong Shim,
Seunghan Yang,
Simyung Chang
Abstract:
The customization of large language models (LLMs) for user-specified tasks is becoming increasingly important. However, maintaining all the customized LLMs on cloud servers incurs substantial memory and computational overheads, and uploading user data can also lead to privacy concerns. On-device LLMs can offer a promising solution by mitigating these issues. Yet, the performance of on-device LLMs is inherently constrained by the limitations of small-scale models. To overcome these restrictions, we first propose Crayon, a novel approach for on-device LLM customization. Crayon begins by constructing a pool of diverse base adapters, and then we instantly blend them into a customized adapter without extra training. In addition, we develop a device-server hybrid inference strategy, which deftly allocates more demanding queries or non-customized tasks to a larger, more capable LLM on a server. This ensures optimal performance without sacrificing the benefits of on-device customization. We carefully craft a novel benchmark from multiple question-answer datasets, and show the efficacy of our method in LLM customization.
Submitted 11 June, 2024;
originally announced June 2024.
-
MP2D: An Automated Topic Shift Dialogue Generation Framework Leveraging Knowledge Graphs
Authors:
Yerin Hwang,
Yongil Kim,
Yunah Jang,
Jeesoo Bang,
Hyunkyung Bae,
Kyomin Jung
Abstract:
Despite advancements in on-topic dialogue systems, effectively managing topic shifts within dialogues remains a persistent challenge, largely attributed to the limited availability of training datasets. To address this issue, we propose Multi-Passage to Dialogue (MP2D), a data generation framework that automatically creates conversational question-answering datasets with natural topic transitions. By leveraging the relationships between entities in a knowledge graph, MP2D maps the flow of topics within a dialogue, effectively mirroring the dynamics of human conversation. It retrieves relevant passages corresponding to the topics and transforms them into dialogues through the passage-to-dialogue method. Through quantitative and qualitative experiments, we demonstrate MP2D's efficacy in generating dialogue with natural topic shifts. Furthermore, this study introduces a novel benchmark for topic shift dialogues, TS-WikiDialog. Utilizing the dataset, we demonstrate that even Large Language Models (LLMs) struggle to handle topic shifts in dialogue effectively, and we showcase the performance improvements of models trained on datasets generated by MP2D across diverse topic shift dialogue tasks.
Submitted 9 March, 2024;
originally announced March 2024.
-
vTrain: A Simulation Framework for Evaluating Cost-effective and Compute-optimal Large Language Model Training
Authors:
Jehyeon Bang,
Yujeong Choi,
Myeongwoo Kim,
Yongdeok Kim,
Minsoo Rhu
Abstract:
As large language models (LLMs) become widespread in various application domains, a critical challenge the AI community is facing is how to train these large AI models in a cost-effective manner. Existing LLM training plans typically employ a heuristic-based parallel training strategy that rests on empirical observations rather than on a thorough examination of the search space of LLM parallelization. This limitation causes existing systems to leave significant performance on the table, wasting millions of dollars' worth of training cost. This paper presents our profiling-driven simulator called vTrain, providing AI practitioners a fast yet accurate software framework to determine an efficient and cost-effective LLM training system configuration. We demonstrate vTrain's practicality through several case studies, e.g., effectively evaluating optimal training parallelization strategies that balance training time and its associated training cost, efficient multi-tenant GPU cluster schedulers targeting multiple LLM training jobs, and determining a compute-optimal LLM model architecture given a fixed compute budget.
Submitted 27 November, 2023;
originally announced December 2023.
-
Adaptive Shortcut Debiasing for Online Continual Learning
Authors:
Doyoung Kim,
Dongmin Park,
Yooju Shin,
Jihwan Bang,
Hwanjun Song,
Jae-Gil Lee
Abstract:
We propose a novel framework, DropTop, that suppresses the shortcut bias in online continual learning (OCL) while being adaptive to the varying degree of the shortcut bias incurred by a continuously changing environment. By the observed high-attention property of the shortcut bias, highly-activated features are considered candidates for debiasing. More importantly, resolving the limitation of the online environment where prior knowledge and auxiliary data are not ready, two novel techniques -- feature map fusion and adaptive intensity shifting -- enable us to automatically determine the appropriate level and proportion of the candidate shortcut features to be dropped. Extensive experiments on five benchmark datasets demonstrate that, when combined with various OCL algorithms, DropTop increases the average accuracy by up to 10.4% and decreases the forgetting by up to 63.2%.
Submitted 14 December, 2023;
originally announced December 2023.
-
One Size Fits All for Semantic Shifts: Adaptive Prompt Tuning for Continual Learning
Authors:
Doyoung Kim,
Susik Yoon,
Dongmin Park,
Youngjun Lee,
Hwanjun Song,
Jihwan Bang,
Jae-Gil Lee
Abstract:
In real-world continual learning (CL) scenarios, tasks often exhibit intricate and unpredictable semantic shifts, posing challenges for fixed prompt management strategies which are tailored to only handle semantic shifts of uniform degree (i.e., uniformly mild or uniformly abrupt). To address this limitation, we propose an adaptive prompting approach, AdaPromptCL, that effectively accommodates semantic shifts of varying degree where mild and abrupt shifts are mixed. AdaPromptCL employs the assign-and-refine semantic grouping mechanism that dynamically manages prompt groups in accordance with the semantic similarity between tasks, enhancing the quality of grouping through continuous refinement. Our experimental results demonstrate that AdaPromptCL outperforms existing prompting methods by up to 21.3%, especially in the benchmark datasets with diverse semantic shifts between tasks.
Submitted 22 July, 2024; v1 submitted 18 November, 2023;
originally announced November 2023.
-
Active Prompt Learning in Vision Language Models
Authors:
Jihwan Bang,
Sumyeong Ahn,
Jae-Gil Lee
Abstract:
Pre-trained Vision Language Models (VLMs) have demonstrated notable progress in various zero-shot tasks, such as classification and retrieval. Despite their performance, because improving performance on new tasks requires task-specific knowledge, their adaptation is essential. While labels are needed for the adaptation, acquiring them is typically expensive. To overcome this challenge, active learning, a method of achieving high performance by obtaining labels for a small number of samples from experts, has been studied. Active learning primarily focuses on selecting unlabeled samples for labeling and leveraging them to train models. In this study, we pose the question, "how can the pre-trained VLMs be adapted under the active learning framework?" In response to this inquiry, we observe that (1) simply applying a conventional active learning framework to pre-trained VLMs may even degrade performance compared to random selection because of the class imbalance in labeling candidates, and (2) the knowledge of VLMs can provide hints for achieving the balance before labeling. Based on these observations, we devise a novel active learning framework for VLMs, denoted as PCB. To assess the effectiveness of our approach, we conduct experiments on seven different real-world datasets, and the results demonstrate that PCB surpasses conventional active learning and random sampling methods. Code will be available at https://github.com/kaist-dmlab/pcb .
Submitted 21 March, 2024; v1 submitted 18 November, 2023;
originally announced November 2023.
-
Dialogizer: Context-aware Conversational-QA Dataset Generation from Textual Sources
Authors:
Yerin Hwang,
Yongil Kim,
Hyunkyung Bae,
Jeesoo Bang,
Hwanhee Lee,
Kyomin Jung
Abstract:
To address the data scarcity issue in Conversational question answering (ConvQA), a dialog inpainting method, which utilizes documents to generate ConvQA datasets, has been proposed. However, the original dialog inpainting model is trained solely on the dialog reconstruction task, resulting in the generation of questions with low contextual relevance due to insufficient learning of question-answer alignment. To overcome this limitation, we propose a novel framework called Dialogizer, which has the capability to automatically generate ConvQA datasets with high contextual relevance from textual sources. The framework incorporates two training tasks: question-answer matching (QAM) and topic-aware dialog generation (TDG). Moreover, re-ranking is conducted during the inference phase based on the contextual relevance of the generated questions. Using our framework, we produce four ConvQA datasets by utilizing documents from multiple domains as the primary source. Through automatic evaluation using diverse metrics, as well as human evaluation, we validate that our proposed framework exhibits the ability to generate datasets of higher quality compared to the baseline dialog inpainting model.
Submitted 9 November, 2023;
originally announced November 2023.
-
Adapting Text-based Dialogue State Tracker for Spoken Dialogues
Authors:
Jaeseok Yoon,
Seunghyun Hwang,
Ran Han,
Jeonguk Bang,
Kee-Eung Kim
Abstract:
Although there have been remarkable advances in dialogue systems through the dialogue systems technology competition (DSTC), building a robust task-oriented dialogue system with a speech interface remains one of the key challenges. Most of the progress has been made for text-based dialogue systems since there are abundant datasets with written corpora while those with spoken dialogues are very scarce. However, as can be seen from voice assistant systems such as Siri and Alexa, it is of practical importance to transfer the success to spoken dialogues. In this paper, we describe our engineering effort in building a highly successful model that participated in the speech-aware dialogue systems technology challenge track in DSTC11. Our model consists of three major modules: (1) automatic speech recognition error correction to bridge the gap between the spoken and the text utterances, (2) a text-based dialogue system (D3ST) for estimating the slots and values using slot descriptions, and (3) post-processing for recovering from errors in the estimated slot values. Our experiments show that it is important to use an explicit automatic speech recognition error correction module, post-processing, and data augmentation to adapt a text-based dialogue state tracker for spoken dialogue corpora.
Submitted 9 January, 2024; v1 submitted 29 August, 2023;
originally announced August 2023.
-
Prompt-Guided Transformers for End-to-End Open-Vocabulary Object Detection
Authors:
Hwanjun Song,
Jihwan Bang
Abstract:
Prompt-OVD is an efficient and effective framework for open-vocabulary object detection that utilizes class embeddings from CLIP as prompts, guiding the Transformer decoder to detect objects in both base and novel classes. Additionally, our novel RoI-based masked attention and RoI pruning techniques help leverage the zero-shot classification ability of the Vision Transformer-based CLIP, resulting in improved detection performance at minimal computational cost. Our experiments on the OV-COCO and OVLVIS datasets demonstrate that Prompt-OVD achieves an impressive 21.2 times faster inference speed than the first end-to-end open-vocabulary detection method (OV-DETR), while also achieving higher APs than four two-stage-based methods operating within similar inference time ranges. Code will be made available soon.
Submitted 25 March, 2023;
originally announced March 2023.
-
Template-Based Conjecturing for Automated Induction in Isabelle/HOL
Authors:
Yutaka Nagashima,
Zijin Xu,
Ningli Wang,
Daniel Sebastian Goc,
James Bang
Abstract:
Proof by induction plays a central role in formal verification. However, its automation remains a formidable challenge in Computer Science. To solve inductive problems, human engineers often have to provide auxiliary lemmas manually. We automate this laborious process with template-based conjecturing, a novel approach to generate auxiliary lemmas and use them to prove final goals. Our evaluation shows that our working prototype, TBC, achieved a 40 percentage point improvement in success rates for problems at an intermediate difficulty level.
Submitted 19 January, 2023; v1 submitted 20 November, 2022;
originally announced December 2022.
-
Meta-Query-Net: Resolving Purity-Informativeness Dilemma in Open-set Active Learning
Authors:
Dongmin Park,
Yooju Shin,
Jihwan Bang,
Youngjun Lee,
Hwanjun Song,
Jae-Gil Lee
Abstract:
Unlabeled data examples awaiting annotations inevitably contain open-set noise. A few active learning studies have attempted to deal with this open-set noise for sample selection by filtering out the noisy examples. However, because focusing on the purity of examples in a query set leads to overlooking the informativeness of the examples, the best balancing of purity and informativeness remains an important question. In this paper, to solve this purity-informativeness dilemma in open-set active learning, we propose a novel Meta-Query-Net (MQ-Net) that adaptively finds the best balancing between the two factors. Specifically, by leveraging the multi-round property of active learning, we train MQ-Net using a query set without an additional validation set. Furthermore, a clear dominance relationship between unlabeled examples is effectively captured by MQ-Net through a novel skyline regularization. Extensive experiments on multiple open-set active learning scenarios demonstrate that the proposed MQ-Net achieves 20.14% improvement in terms of accuracy, compared with the state-of-the-art methods.
Submitted 11 January, 2023; v1 submitted 13 October, 2022;
originally announced October 2022.
-
DSTEA: Improving Dialogue State Tracking via Entity Adaptive Pre-training
Authors:
Yukyung Lee,
Takyoung Kim,
Hoonsang Yoon,
Pilsung Kang,
Junseong Bang,
Misuk Kim
Abstract:
Dialogue State Tracking (DST) is critical for comprehensively interpreting user and system utterances, thereby forming the cornerstone of efficient dialogue systems. Although past research efforts have focused on enhancing DST performance through alterations to the model structure or integrating additional features like graph relations, they often require additional pre-training with external dialogue corpora. In this study, we propose DSTEA, improving Dialogue State Tracking via Entity Adaptive pre-training, which can enhance the encoder by intensively training key entities in dialogue utterances. DSTEA identifies these pivotal entities from input dialogues utilizing four different methods: ontology information, named-entity recognition, the spaCy library, and the flair library. Subsequently, it employs selective knowledge masking to train the model effectively. Remarkably, DSTEA only requires pre-training without the direct infusion of extra knowledge into the DST model. This approach resulted in substantial performance improvements of four robust DST models on MultiWOZ 2.0, 2.1, and 2.2, with joint goal accuracy increasing by up to 2.69 percentage points (from 52.41% to 55.10%). Further validation of DSTEA's efficacy was provided through comparative experiments considering various entity types and different entity adaptive pre-training configurations such as masking strategy and masking rate.
Submitted 23 July, 2023; v1 submitted 8 July, 2022;
originally announced July 2022.
-
Online Continual Learning on a Contaminated Data Stream with Blurry Task Boundaries
Authors:
Jihwan Bang,
Hyunseo Koh,
Seulki Park,
Hwanjun Song,
Jung-Woo Ha,
Jonghyun Choi
Abstract:
Learning under a continuously changing data distribution with incorrect labels is a desirable yet challenging real-world problem. A large body of continual learning (CL) methods, however, assumes data streams with clean labels, and online learning scenarios under noisy data streams remain underexplored. We consider a more practical CL task setup of online learning from a blurry data stream with corrupted labels, where existing CL methods struggle. To address the task, we first argue the importance of both diversity and purity of examples in the episodic memory of continual learning models. To balance diversity and purity in the episodic memory, we propose a novel strategy to manage and use the memory by a unified approach of label noise aware diverse sampling and robust learning with semi-supervised learning. Our empirical validations on four real-world or synthetic noise datasets (CIFAR10 and 100, mini-WebVision, and Food-101N) show that our method significantly outperforms prior arts in this realistic and challenging continual learning scenario. Code and data splits are available at https://github.com/clovaai/puridiver.
Submitted 30 March, 2022; v1 submitted 29 March, 2022;
originally announced March 2022.
-
Interpretable Convolutional Neural Networks for Subject-Independent Motor Imagery Classification
Authors:
Ji-Seon Bang,
Seong-Whan Lee
Abstract:
Deep learning frameworks have become increasingly popular in brain-computer interface (BCI) studies thanks to their outstanding performance. However, in terms of the classification model alone, they are treated as a black box, as they do not provide any information on what led them to reach a particular decision. In other words, we cannot be certain whether the high performance arose from neuro-physiological factors or simply from noise. Because of this disadvantage, it is difficult to ensure adequate reliability compared to their high performance. In this study, we propose an explainable deep learning model for BCI. Specifically, we aim to classify EEG signals obtained from the motor-imagery (MI) task. In addition, we applied layer-wise relevance propagation (LRP) to the model to interpret why the model produced a certain classification output. We visualized the LRP output as a heatmap in the form of a topography to verify neuro-physiological factors. Furthermore, we classified EEG in a subject-independent manner to learn robust and generalized EEG features by avoiding subject dependency. The methodology also provides the advantage of avoiding the expense of building training data for each subject. With our proposed model, we obtained generalized heatmap patterns for all subjects. As a result, we can conclude that our proposed model provides neuro-physiologically reliable interpretation.
Submitted 14 December, 2021;
originally announced December 2021.
-
A Reliable, Self-Adaptive Face Identification Framework via Lyapunov Optimization
Authors:
Dohyeon Kim,
Joongheon Kim,
Jae young Bang
Abstract:
Realtime face identification (FID) from a video feed is highly computation-intensive, and may exhaust computation resources if performed on a device with a limited amount of resources (e.g., a mobile device). In general, FID performs better when images are sampled at a higher rate, minimizing false negatives. However, performing it at an overwhelmingly high rate exposes the system to the risk of a queue overflow that hampers the system's reliability. This paper proposes a novel, queue-aware FID framework that adapts the sampling rate to maximize the FID performance while avoiding a queue overflow by implementing the Lyapunov optimization. A preliminary evaluation via a trace-based simulation confirms the effectiveness of the framework.
Submitted 2 September, 2021;
originally announced September 2021.
-
Oh My Mistake!: Toward Realistic Dialogue State Tracking including Turnback Utterances
Authors:
Takyoung Kim,
Yukyung Lee,
Hoonsang Yoon,
Pilsung Kang,
Junseong Bang,
Misuk Kim
Abstract:
The primary purpose of dialogue state tracking (DST), a critical component of an end-to-end conversational system, is to build a model that responds well to real-world situations. Although we often change our minds from time to time during ordinary conversations, current benchmark datasets do not adequately reflect such occurrences and instead consist of over-simplified conversations, in which no one changes their mind during a conversation. The main question inspiring the present study is: "Are current benchmark datasets sufficiently diverse to handle casual conversations in which one changes their mind after a certain topic is over?" We found that the answer is "No" because DST models cannot refer to previous user preferences when template-based turnback utterances are injected into the dataset. Even in the simplest mind-changing (turnback) scenario, the performance of DST models degraded significantly. However, we found that this performance degradation can be recovered when the turnback scenarios are explicitly designed in the training set, implying that the problem is not with the DST models but rather with the construction of the benchmark dataset.
Submitted 12 October, 2022; v1 submitted 28 August, 2021;
originally announced August 2021.
-
Motor Imagery Classification based on CNN-GRU Network with Spatio-Temporal Feature Representation
Authors:
Ji-Seon Bang,
Seong-Whan Lee
Abstract:
Recently, various deep neural networks have been applied to classify electroencephalogram (EEG) signals. EEG is a brain signal that can be acquired in a non-invasive way and has a high temporal resolution. It can be used to decode the intention of users. As the EEG signal has a high-dimensional feature space, appropriate feature extraction methods are needed to improve classification performance. In this study, we obtained a spatio-temporal feature representation and classified it with the combined convolutional neural network (CNN)-gated recurrent unit (GRU) model. To this end, we obtained covariance matrices in each different temporal band and then concatenated them on the temporal axis to obtain a final spatio-temporal feature representation. In the classification model, CNN is responsible for spatial feature extraction and GRU is responsible for temporal feature extraction. Classification performance was improved by distinguishing spatial data processing and temporal data processing. The average accuracy of the proposed model was 77.70% for the BCI competition IV_2a data set. The proposed method outperformed all of the baseline methods it was compared against.
Submitted 14 July, 2021;
originally announced July 2021.
-
Zeus: Efficiently Localizing Actions in Videos using Reinforcement Learning
Authors:
Pramod Chunduri,
Jaeho Bang,
Yao Lu,
Joy Arulraj
Abstract:
Detection and localization of actions in videos is an important problem in practice. State-of-the-art video analytics systems are unable to efficiently and effectively answer such action queries because actions often involve a complex interaction between objects and are spread across a sequence of frames; detecting and localizing them requires computationally expensive deep neural networks. It is also important to consider the entire sequence of frames to answer the query effectively.
In this paper, we present ZEUS, a video analytics system tailored for answering action queries. We present a novel technique for efficiently answering these queries using deep reinforcement learning. ZEUS trains a reinforcement learning agent that learns to adaptively modify the input video segments that are subsequently sent to an action classification network. The agent alters the input segments along three dimensions - sampling rate, segment length, and resolution. To meet the user-specified accuracy target, ZEUS's query optimizer trains the agent based on an accuracy-aware, aggregate reward function. Evaluation on three diverse video datasets shows that ZEUS outperforms state-of-the-art frame- and window-based filtering techniques by up to 22.1x and 4.7x, respectively. It also consistently meets the user-specified accuracy target across all queries.
Submitted 27 September, 2022; v1 submitted 6 April, 2021;
originally announced April 2021.
-
EKO: Adaptive Sampling of Compressed Video Data
Authors:
Jaeho Bang,
Pramod Chunduri,
Joy Arulraj
Abstract:
Researchers have presented systems for efficiently analysing video data at scale using sampling algorithms. While these systems effectively leverage the temporal redundancy present in videos, they suffer from three limitations. First, they use traditional video storage formats that are tailored for human consumption. Second, they load and decode the entire compressed video in memory before applying the sampling algorithm. Third, the sampling algorithms often require labeled training data obtained using a specific deep learning model. These limitations lead to lower accuracy, higher query execution time, and a larger memory footprint. In this paper, we present EKO, a storage engine for efficiently managing video data. EKO relies on two optimizations. First, it uses a novel unsupervised, adaptive sampling algorithm for identifying the key frames in a given video. Second, it stores the identified key frames in a compressed representation that is optimized for machine consumption. We show that EKO improves F1-score by up to 9% compared to the next best performing state-of-the-art unsupervised sampling algorithms by selecting more representative frames. It reduces query execution time by 3X and memory footprint by 10X in comparison to a widely-used, traditional video storage format.
Submitted 4 April, 2021;
originally announced April 2021.
-
Rainbow Memory: Continual Learning with a Memory of Diverse Samples
Authors:
Jihwan Bang,
Heesu Kim,
YoungJoon Yoo,
Jung-Woo Ha,
Jonghyun Choi
Abstract:
Continual learning is a realistic learning scenario for AI models. The prevalent continual learning scenario, however, assumes disjoint sets of classes as tasks, which is artificial rather than realistic. Instead, we focus on 'blurry' task boundaries, where tasks share classes, which is more realistic and practical. To address such tasks, we argue for the importance of sample diversity in an episodic memory. To enhance the sample diversity in the memory, we propose a novel memory management strategy based on per-sample classification uncertainty and data augmentation, named Rainbow Memory (RM). With extensive empirical validations on the MNIST, CIFAR10, CIFAR100, and ImageNet datasets, we show that the proposed method significantly improves the accuracy in blurry continual learning setups, outperforming the state of the art by large margins despite its simplicity. Code and data splits will be available at https://github.com/clovaai/rainbow-memory.
Submitted 31 March, 2021;
originally announced March 2021.
-
Boosting Active Learning for Speech Recognition with Noisy Pseudo-labeled Samples
Authors:
Jihwan Bang,
Heesu Kim,
YoungJoon Yoo,
Jung-Woo Ha
Abstract:
The cost of annotating transcriptions for large speech corpora has become a bottleneck to fully exploiting the capacity of deep neural network-based automatic speech recognition models. In this paper, we present a new training pipeline that boosts the conventional active learning approach to label-efficient learning in order to address this problem. Existing active learning methods focus only on selecting a set of informative samples under a labeling budget. Going one step further, we suggest that training efficiency can be further improved by also utilizing the unlabeled samples that exceed the labeling budget, through a carefully configured unsupervised loss that complements the supervised loss. We propose a new unsupervised loss based on consistency regularization, and we configure appropriate augmentation techniques for utterances so that consistency regularization can be adopted in the automatic speech recognition task. Through qualitative and quantitative experiments on a real-world dataset and under real-usage scenarios, we show that the proposed training pipeline can boost the efficacy of active learning approaches, thus reducing the human labeling cost.
Submitted 5 November, 2020; v1 submitted 19 June, 2020;
originally announced June 2020.
-
SINet: Extreme Lightweight Portrait Segmentation Networks with Spatial Squeeze Modules and Information Blocking Decoder
Authors:
Hyojin Park,
Lars Lowe Sjösund,
YoungJoon Yoo,
Nicolas Monet,
Jihwan Bang,
Nojun Kwak
Abstract:
Designing a lightweight and robust portrait segmentation algorithm is an important task for a wide range of face applications. However, the problem has been treated as a subset of the object segmentation problem and has received less attention in the semantic segmentation field. Portrait segmentation nevertheless has its own unique requirements. First, because portrait segmentation is performed in the middle of a larger pipeline in many real-world applications, it requires extremely lightweight models. Second, there have not been any public datasets in this domain that contain a sufficient number of images with unbiased statistics. To solve the first problem, we introduce the new extremely lightweight portrait segmentation model SINet, containing an information blocking decoder and spatial squeeze modules. The information blocking decoder uses confidence estimates to recover local spatial information without spoiling global consistency. The spatial squeeze module uses multiple receptive fields to cope with various sizes of consistency in the image. To tackle the second problem, we propose a simple method to create additional portrait segmentation data which can improve accuracy on the EG1800 dataset. In our qualitative and quantitative analysis on the EG1800 dataset, we show that our method outperforms various existing lightweight segmentation models. Our method reduces the number of parameters from 2.1M to 86.9K (around 95.9% reduction), while maintaining the accuracy within a 1% margin of the state-of-the-art portrait segmentation method. We also show that our model runs successfully on a real mobile device at 100.6 FPS. In addition, we demonstrate that our method can be used for general semantic segmentation on the Cityscapes dataset. The code and dataset are available at https://github.com/HYOJINPARK/ExtPortraitSeg .
Submitted 9 February, 2020; v1 submitted 20 November, 2019;
originally announced November 2019.
-
ExtremeC3Net: Extreme Lightweight Portrait Segmentation Networks using Advanced C3-modules
Authors:
Hyojin Park,
Lars Lowe Sjösund,
YoungJoon Yoo,
Jihwan Bang,
Nojun Kwak
Abstract:
Designing a lightweight and robust portrait segmentation algorithm is an important task for a wide range of face applications. However, the problem has been treated as a subset of the object segmentation problem. Portrait segmentation nevertheless has its own unique requirements. First, because portrait segmentation is performed in the middle of a larger pipeline in many real-world applications, it requires extremely lightweight models. Second, there have not been any public datasets in this domain that contain a sufficient number of images with unbiased statistics. To solve these problems, we introduce a new extremely lightweight portrait segmentation model consisting of a two-branched architecture based on the concentrated-comprehensive convolutions block. Our method reduces the number of parameters from 2.1M to 37.7K (around 98.2% reduction), while maintaining the accuracy within a 1% margin of the state-of-the-art portrait segmentation method. In our qualitative and quantitative analysis on the EG1800 dataset, we show that our method outperforms various existing lightweight segmentation models. In addition, we propose a simple method to create additional portrait segmentation data which can improve accuracy on the EG1800 dataset. We also analyze the bias in public datasets by additionally annotating race, gender, and age ourselves. The augmented dataset, the additional annotations, and the code are available at https://github.com/HYOJINPARK/ExtPortraitSeg .
Submitted 9 December, 2019; v1 submitted 8 August, 2019;
originally announced August 2019.
-
Classification-based Financial Markets Prediction using Deep Neural Networks
Authors:
Matthew Dixon,
Diego Klabjan,
Jin Hoon Bang
Abstract:
Deep neural networks (DNNs) are powerful types of artificial neural networks (ANNs) that use several hidden layers. They have recently gained considerable attention in the speech transcription and image recognition community (Krizhevsky et al., 2012) for their superior predictive properties, including robustness to overfitting. However, their application to algorithmic trading has not been previously researched, partly because of their computational complexity. This paper describes the application of DNNs to predicting financial market movement directions. In particular, we describe the configuration and training approach and then demonstrate their application to backtesting a simple trading strategy over 43 different Commodity and FX future mid-prices at 5-minute intervals. All results in this paper are generated using a C++ implementation on the Intel Xeon Phi co-processor, which is 11.4x faster than the serial version, and a Python strategy backtesting environment, both of which are available as open source code written by the authors.
Submitted 13 June, 2017; v1 submitted 28 March, 2016;
originally announced March 2016.
-
Multicell Zero-Forcing and User Scheduling on the Downlink of a Linear Cell Array
Authors:
H. J. Bang,
D. Gesbert
Abstract:
Coordinated base station (BS) transmission has attracted much interest for its potential to increase the capacity of wireless networks. Yet at the same time, the achievable sum-rate with single-cell processing (SCP) scales optimally with the number of users under Rayleigh fading conditions. One may therefore ask if the value of BS coordination is limited in the many-user regime from a sum-rate perspective. With this in mind, we consider multicell zero-forcing beamforming (ZFBF) on the downlink of a linear cell-array. We first identify the beamforming weights and the optimal scheduling policy under a per-base power constraint. We then compare the number of users m and n required per cell to achieve the same mean SINR, after optimal scheduling, with SCP and ZFBF, respectively. Specifically, we show that the ratio m/n grows logarithmically with n. Finally, we demonstrate that the gain in sum-rate between ZFBF and SCP is significant for all practical values of the number of users.
Submitted 6 November, 2009; v1 submitted 5 November, 2009;
originally announced November 2009.