Search | arXiv e-print repository

Audio Prompt Adapter: Unleashing Music Editing Abilities for Text-to-Music with Lightweight Finetuning

Authors: Fang-Duo Tsai, Shih-Lun Wu, Haven Kim, Bo-Yu Chen, Hao-Chung Cheng, Yi-Hsuan Yang

Abstract: Text-to-music models allow users to generate nearly realistic musical audio with textual commands. However, editing music audios remains challenging due to the conflicting desiderata of performing fine-grained alterations on the audio while maintaining a simple user interface. To address this challenge, we propose Audio Prompt Adapter (or AP-Adapter), a lightweight addition to pretrained text-to-m… ▽ More Text-to-music models allow users to generate nearly realistic musical audio with textual commands. However, editing music audios remains challenging due to the conflicting desiderata of performing fine-grained alterations on the audio while maintaining a simple user interface. To address this challenge, we propose Audio Prompt Adapter (or AP-Adapter), a lightweight addition to pretrained text-to-music models. We utilize AudioMAE to extract features from the input audio, and construct attention-based adapters to feedthese features into the internal layers of AudioLDM2, a diffusion-based text-to-music model. With 22M trainable parameters, AP-Adapter empowers users to harness both global (e.g., genre and timbre) and local (e.g., melody) aspects of music, using the original audio and a short text as inputs. Through objective and subjective studies, we evaluate AP-Adapter on three tasks: timbre transfer, genre transfer, and accompaniment generation. Additionally, we demonstrate its effectiveness on out-of-domain audios containing unseen instruments during training. △ Less

Submitted 24 July, 2024; v1 submitted 23 July, 2024; originally announced July 2024.

Comments: Accepted by the 25th International Society for Music Information Retrieval (ISMIR)

arXiv:2407.09059 [pdf, other]

Domain-adaptive Video Deblurring via Test-time Blurring

Authors: Jin-Ting He, Fu-Jen Tsai, Jia-Hao Wu, Yan-Tsung Peng, Chung-Chi Tsai, Chia-Wen Lin, Yen-Yu Lin

Abstract: Dynamic scene video deblurring aims to remove undesirable blurry artifacts captured during the exposure process. Although previous video deblurring methods have achieved impressive results, they suffer from significant performance drops due to the domain gap between training and testing videos, especially for those captured in real-world scenarios. To address this issue, we propose a domain adapta… ▽ More Dynamic scene video deblurring aims to remove undesirable blurry artifacts captured during the exposure process. Although previous video deblurring methods have achieved impressive results, they suffer from significant performance drops due to the domain gap between training and testing videos, especially for those captured in real-world scenarios. To address this issue, we propose a domain adaptation scheme based on a blurring model to achieve test-time fine-tuning for deblurring models in unseen domains. Since blurred and sharp pairs are unavailable for fine-tuning during inference, our scheme can generate domain-adaptive training pairs to calibrate a deblurring model for the target domain. First, a Relative Sharpness Detection Module is proposed to identify relatively sharp regions from the blurry input images and regard them as pseudo-sharp images. Next, we utilize a blurring model to produce blurred images based on the pseudo-sharp images extracted during testing. To synthesize blurred images in compliance with the target data distribution, we propose a Domain-adaptive Blur Condition Generation Module to create domain-specific blur conditions for the blurring model. Finally, the generated pseudo-sharp and blurred pairs are used to fine-tune a deblurring model for better performance. Extensive experimental results demonstrate that our approach can significantly improve state-of-the-art video deblurring methods, providing performance gains of up to 7.54dB on various real-world video deblurring datasets. The source code is available at https://github.com/Jin-Ting-He/DADeblur. △ Less

Submitted 12 July, 2024; originally announced July 2024.

Comments: ECCV 2024

arXiv:2404.09269 [pdf, other]

PANet: A Physics-guided Parametric Augmentation Net for Image Dehazing by Hazing

Authors: Chih-Ling Chang, Fu-Jen Tsai, Zi-Ling Huang, Lin Gu, Chia-Wen Lin

Abstract: Image dehazing faces challenges when dealing with hazy images in real-world scenarios. A huge domain gap between synthetic and real-world haze images degrades dehazing performance in practical settings. However, collecting real-world image datasets for training dehazing models is challenging since both hazy and clean pairs must be captured under the same conditions. In this paper, we propose a Phy… ▽ More Image dehazing faces challenges when dealing with hazy images in real-world scenarios. A huge domain gap between synthetic and real-world haze images degrades dehazing performance in practical settings. However, collecting real-world image datasets for training dehazing models is challenging since both hazy and clean pairs must be captured under the same conditions. In this paper, we propose a Physics-guided Parametric Augmentation Network (PANet) that generates photo-realistic hazy and clean training pairs to effectively enhance real-world dehazing performance. PANet comprises a Haze-to-Parameter Mapper (HPM) to project hazy images into a parameter space and a Parameter-to-Haze Mapper (PHM) to map the resampled haze parameters back to hazy images. In the parameter space, we can pixel-wisely resample individual haze parameter maps to generate diverse hazy images with physically-explainable haze conditions unseen in the training set. Our experimental results demonstrate that PANet can augment diverse realistic hazy images to enrich existing hazy image benchmarks so as to effectively boost the performances of state-of-the-art image dehazing models. △ Less

Submitted 14 April, 2024; originally announced April 2024.

arXiv:2312.14502 [pdf, other]

ViStripformer: A Token-Efficient Transformer for Versatile Video Restoration

Authors: Fu-Jen Tsai, Yan-Tsung Peng, Chen-Yu Chang, Chan-Yu Li, Yen-Yu Lin, Chung-Chi Tsai, Chia-Wen Lin

Abstract: Video restoration is a low-level vision task that seeks to restore clean, sharp videos from quality-degraded frames. One would use the temporal information from adjacent frames to make video restoration successful. Recently, the success of the Transformer has raised awareness in the computer-vision community. However, its self-attention mechanism requires much memory, which is unsuitable for high-… ▽ More Video restoration is a low-level vision task that seeks to restore clean, sharp videos from quality-degraded frames. One would use the temporal information from adjacent frames to make video restoration successful. Recently, the success of the Transformer has raised awareness in the computer-vision community. However, its self-attention mechanism requires much memory, which is unsuitable for high-resolution vision tasks like video restoration. In this paper, we propose ViStripformer (Video Stripformer), which utilizes spatio-temporal strip attention to catch long-range data correlations, consisting of intra-frame strip attention (Intra-SA) and inter-frame strip attention (Inter-SA) for extracting spatial and temporal information. It decomposes video frames into strip-shaped features in horizontal and vertical directions for Intra-SA and Inter-SA to address degradation patterns with various orientations and magnitudes. Besides, ViStripformer is an effective and efficient transformer architecture with much lower memory usage than the vanilla transformer. Extensive experiments show that the proposed model achieves superior results with fast inference time on video restoration tasks, including video deblurring, demoireing, and deraining. △ Less

Submitted 22 December, 2023; originally announced December 2023.

arXiv:2312.10998 [pdf, other]

ID-Blau: Image Deblurring by Implicit Diffusion-based reBLurring AUgmentation

Authors: Jia-Hao Wu, Fu-Jen Tsai, Yan-Tsung Peng, Chung-Chi Tsai, Chia-Wen Lin, Yen-Yu Lin

Abstract: Image deblurring aims to remove undesired blurs from an image captured in a dynamic scene. Much research has been dedicated to improving deblurring performance through model architectural designs. However, there is little work on data augmentation for image deblurring. Since continuous motion causes blurred artifacts during image exposure, we aspire to develop a groundbreaking blur augmentation me… ▽ More Image deblurring aims to remove undesired blurs from an image captured in a dynamic scene. Much research has been dedicated to improving deblurring performance through model architectural designs. However, there is little work on data augmentation for image deblurring. Since continuous motion causes blurred artifacts during image exposure, we aspire to develop a groundbreaking blur augmentation method to generate diverse blurred images by simulating motion trajectories in a continuous space. This paper proposes Implicit Diffusion-based reBLurring AUgmentation (ID-Blau), utilizing a sharp image paired with a controllable blur condition map to produce a corresponding blurred image. We parameterize the blur patterns of a blurred image with their orientations and magnitudes as a pixel-wise blur condition map to simulate motion trajectories and implicitly represent them in a continuous space. By sampling diverse blur conditions, ID-Blau can generate various blurred images unseen in the training set. Experimental results demonstrate that ID-Blau can produce realistic blurred images for training and thus significantly improve performance for state-of-the-art deblurring models. The source code is available at https://github.com/plusgood-steven/ID-Blau. △ Less

Submitted 20 May, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

Comments: CVPR 2024

arXiv:2304.02868 [pdf, other]

Can Large Language Models Play Text Games Well? Current State-of-the-Art and Open Questions

Authors: Chen Feng Tsai, Xiaochen Zhou, Sierra S. Liu, Jing Li, Mo Yu, Hongyuan Mei

Abstract: Large language models (LLMs) such as ChatGPT and GPT-4 have recently demonstrated their remarkable abilities of communicating with human users. In this technical report, we take an initiative to investigate their capacities of playing text games, in which a player has to understand the environment and respond to situations by having dialogues with the game world. Our experiments show that ChatGPT… ▽ More Large language models (LLMs) such as ChatGPT and GPT-4 have recently demonstrated their remarkable abilities of communicating with human users. In this technical report, we take an initiative to investigate their capacities of playing text games, in which a player has to understand the environment and respond to situations by having dialogues with the game world. Our experiments show that ChatGPT performs competitively compared to all the existing systems but still exhibits a low level of intelligence. Precisely, ChatGPT can not construct the world model by playing the game or even reading the game manual; it may fail to leverage the world knowledge that it already has; it cannot infer the goal of each step as the game progresses. Our results open up new research questions at the intersection of artificial intelligence, machine learning, and natural language processing. △ Less

Submitted 6 April, 2023; originally announced April 2023.

arXiv:2210.08036 [pdf, other]

Meta Transferring for Deblurring

Authors: Po-Sheng Liu, Fu-Jen Tsai, Yan-Tsung Peng, Chung-Chi Tsai, Chia-Wen Lin, Yen-Yu Lin

Abstract: Most previous deblurring methods were built with a generic model trained on blurred images and their sharp counterparts. However, these approaches might have sub-optimal deblurring results due to the domain gap between the training and test sets. This paper proposes a reblur-deblur meta-transferring scheme to realize test-time adaptation without using ground truth for dynamic scene deblurring. Sin… ▽ More Most previous deblurring methods were built with a generic model trained on blurred images and their sharp counterparts. However, these approaches might have sub-optimal deblurring results due to the domain gap between the training and test sets. This paper proposes a reblur-deblur meta-transferring scheme to realize test-time adaptation without using ground truth for dynamic scene deblurring. Since the ground truth is usually unavailable at inference time in a real-world scenario, we leverage the blurred input video to find and use relatively sharp patches as the pseudo ground truth. Furthermore, we propose a reblurring model to extract the homogenous blur from the blurred input and transfer it to the pseudo-sharps to obtain the corresponding pseudo-blurred patches for meta-learning and test-time adaptation with only a few gradient updates. Extensive experimental results show that our reblur-deblur meta-learning scheme can improve state-of-the-art deblurring models on the DVD, REDS, and RealBlur benchmark datasets. △ Less

Submitted 14 October, 2022; originally announced October 2022.

Comments: Accepted at BMVC 2022

arXiv:2204.04627 [pdf, other]

Stripformer: Strip Transformer for Fast Image Deblurring

Authors: Fu-Jen Tsai, Yan-Tsung Peng, Yen-Yu Lin, Chung-Chi Tsai, Chia-Wen Lin

Abstract: Images taken in dynamic scenes may contain unwanted motion blur, which significantly degrades visual quality. Such blur causes short- and long-range region-specific smoothing artifacts that are often directional and non-uniform, which is difficult to be removed. Inspired by the current success of transformers on computer vision and image processing tasks, we develop, Stripformer, a transformer-bas… ▽ More Images taken in dynamic scenes may contain unwanted motion blur, which significantly degrades visual quality. Such blur causes short- and long-range region-specific smoothing artifacts that are often directional and non-uniform, which is difficult to be removed. Inspired by the current success of transformers on computer vision and image processing tasks, we develop, Stripformer, a transformer-based architecture that constructs intra- and inter-strip tokens to reweight image features in the horizontal and vertical directions to catch blurred patterns with different orientations. It stacks interlaced intra-strip and inter-strip attention layers to reveal blur magnitudes. In addition to detecting region-specific blurred patterns of various orientations and magnitudes, Stripformer is also a token-efficient and parameter-efficient transformer model, demanding much less memory usage and computation cost than the vanilla transformer but works better without relying on tremendous training data. Experimental results show that Stripformer performs favorably against state-of-the-art models in dynamic scene deblurring. △ Less

Submitted 22 July, 2022; v1 submitted 10 April, 2022; originally announced April 2022.

Comments: ECCV 2022 Oral Presentation

arXiv:2101.07518 [pdf, other]

doi 10.1109/TIP.2022.3216216

BANet: Blur-aware Attention Networks for Dynamic Scene Deblurring

Authors: Fu-Jen Tsai, Yan-Tsung Peng, Yen-Yu Lin, Chung-Chi Tsai, Chia-Wen Lin

Abstract: Image motion blur results from a combination of object motions and camera shakes, and such blurring effect is generally directional and non-uniform. Previous research attempted to solve non-uniform blurs using self-recurrent multiscale, multi-patch, or multi-temporal architectures with self-attention to obtain decent results. However, using self-recurrent frameworks typically lead to a longer infe… ▽ More Image motion blur results from a combination of object motions and camera shakes, and such blurring effect is generally directional and non-uniform. Previous research attempted to solve non-uniform blurs using self-recurrent multiscale, multi-patch, or multi-temporal architectures with self-attention to obtain decent results. However, using self-recurrent frameworks typically lead to a longer inference time, while inter-pixel or inter-channel self-attention may cause excessive memory usage. This paper proposes a Blur-aware Attention Network (BANet), that accomplishes accurate and efficient deblurring via a single forward pass. Our BANet utilizes region-based self-attention with multi-kernel strip pooling to disentangle blur patterns of different magnitudes and orientations and cascaded parallel dilated convolution to aggregate multi-scale content features. Extensive experimental results on the GoPro and RealBlur benchmarks demonstrate that the proposed BANet performs favorably against the state-of-the-arts in blurred image restoration and can provide deblurred results in real-time. △ Less

Submitted 25 October, 2022; v1 submitted 19 January, 2021; originally announced January 2021.

Comments: TIP 2022, Code: https://github.com/pp00704831/BANet

arXiv:2009.13772 [pdf, ps, other]

doi 10.1109/DAC18074.2021.9586087

Trust-Region Method with Deep Reinforcement Learning in Analog Design Space Exploration

Authors: Kai-En Yang, Chia-Yu Tsai, Hung-Hao Shen, Chen-Feng Chiang, Feng-Ming Tsai, Chung-An Wang, Yiju Ting, Chia-Shun Yeh, Chin-Tang Lai

Abstract: This paper introduces new perspectives on analog design space search. To minimize the time-to-market, this endeavor better cast as constraint satisfaction problem than global optimization defined in prior arts. We incorporate model-based agents, contrasted with model-free learning, to implement a trust-region strategy. As such, simple feed-forward networks can be trained with supervised learning,… ▽ More This paper introduces new perspectives on analog design space search. To minimize the time-to-market, this endeavor better cast as constraint satisfaction problem than global optimization defined in prior arts. We incorporate model-based agents, contrasted with model-free learning, to implement a trust-region strategy. As such, simple feed-forward networks can be trained with supervised learning, where the convergence is relatively trivial. Experiment results demonstrate orders of magnitude improvement on search iterations. Additionally, the unprecedented consideration of PVT conditions are accommodated. On circuits with TSMC 5/6nm process, our method achieve performance surpassing human designers. Furthermore, this framework is in production in industrial settings. △ Less

Submitted 2 December, 2021; v1 submitted 29 September, 2020; originally announced September 2020.

Comments: 6 pages, 3 figures, 5 tables

arXiv:1111.0867 [pdf, ps, other]

The black-and-white coloring problem on distance hereditary graphs and strongly chordal graphs

Authors: Ton Kloks, Sheung-Hung Poon, Feng-Ren Tsai, Yue-Li Wang

Abstract: Given a graph G and integers b and w. The black-and-white coloring problem asks if there exist disjoint sets of vertices B and W with |B|=b and |W|=w such that no vertex in B is adjacent to any vertex in W. In this paper we show that the problem is polynomial when restricted to cographs, distance-hereditary graphs, interval graphs and strongly chordal graphs. We show that the problem is NP-complet… ▽ More Given a graph G and integers b and w. The black-and-white coloring problem asks if there exist disjoint sets of vertices B and W with |B|=b and |W|=w such that no vertex in B is adjacent to any vertex in W. In this paper we show that the problem is polynomial when restricted to cographs, distance-hereditary graphs, interval graphs and strongly chordal graphs. We show that the problem is NP-complete on splitgraphs. △ Less

Submitted 3 November, 2011; originally announced November 2011.

Showing 1–11 of 11 results for author: Tsai, F