Showing 1–50 of 343 results for author: Hsu, W

  1. arXiv:2408.17443  [pdf, other]

    cs.CV cs.AI cs.CL

    Bridging Episodes and Semantics: A Novel Framework for Long-Form Video Understanding

    Authors: Gueter Josmy Faure, Jia-Fong Yeh, Min-Hung Chen, Hung-Ting Su, Winston H. Hsu, Shang-Hong Lai

    Abstract: While existing research often treats long-form videos as extended short videos, we propose a novel approach that more accurately reflects human cognition. This paper introduces BREASE: BRidging Episodes And SEmantics for Long-Form Video Understanding, a model that simulates episodic memory accumulation to capture action sequences and reinforces them with semantic knowledge dispersed throughout the…

    Submitted 30 August, 2024; originally announced August 2024.

    Comments: Accepted to the EVAL-FoMo Workshop at ECCV'24. Project page: https://joslefaure.github.io/assets/html/hermes.html

  2. arXiv:2408.11038  [pdf, other]

    physics.chem-ph physics.comp-ph

    Multiple Topology Replica Exchange of Expanded Ensembles (MT-REXEE) for Multidimensional Alchemical Calculations

    Authors: Anika J. Friedman, Wei-Tse Hsu, Michael R. Shirts

    Abstract: Relative free energy calculations are now widely used in academia and industry, but the accuracy is often limited by poor sampling of the complex's conformational ensemble. To address this, we have developed a novel method termed Multi-Topology Replica Exchange of Expanded Ensembles (MT-REXEE). This method enables parallel expanded ensemble calculations, facilitating iterative relative free energy…

    Submitted 20 August, 2024; originally announced August 2024.

  3. arXiv:2408.09481  [pdf, other]

    cs.CL cs.AI

    PanoSent: A Panoptic Sextuple Extraction Benchmark for Multimodal Conversational Aspect-based Sentiment Analysis

    Authors: Meng Luo, Hao Fei, Bobo Li, Shengqiong Wu, Qian Liu, Soujanya Poria, Erik Cambria, Mong-Li Lee, Wynne Hsu

    Abstract: While existing Aspect-based Sentiment Analysis (ABSA) has received extensive effort and advancement, there are still gaps in defining a more holistic research target seamlessly integrating multimodality, conversation context, fine-granularity, and also covering the changing sentiment dynamics as well as cognitive causal rationales. This paper bridges the gaps by introducing a multimodal conversati…

    Submitted 18 August, 2024; originally announced August 2024.

    Comments: Accepted by ACM MM 2024 (Oral)

  4. arXiv:2407.15291  [pdf, other]

    cs.IR

    Evidence-Based Temporal Fact Verification

    Authors: Anab Maulana Barik, Wynne Hsu, Mong Li Lee

    Abstract: Automated fact verification plays an essential role in fostering trust in the digital space. Despite the growing interest, the verification of temporal facts has not received much attention in the community. Temporal fact verification brings new challenges where cues of the temporal information need to be extracted and temporal reasoning involving various temporal aspects of the text must be appli…

    Submitted 18 August, 2024; v1 submitted 21 July, 2024; originally announced July 2024.

  5. arXiv:2407.12867  [pdf, other]

    astro-ph.HE gr-qc

    Swift-BAT GUANO follow-up of gravitational-wave triggers in the third LIGO-Virgo-KAGRA observing run

    Authors: Gayathri Raman, Samuele Ronchini, James Delaunay, Aaron Tohuvavohu, Jamie A. Kennea, Tyler Parsotan, Elena Ambrosi, Maria Grazia Bernardini, Sergio Campana, Giancarlo Cusumano, Antonino D'Ai, Paolo D'Avanzo, Valerio D'Elia, Massimiliano De Pasquale, Simone Dichiara, Phil Evans, Dieter Hartmann, Paul Kuin, Andrea Melandri, Paul O'Brien, Julian P. Osborne, Kim Page, David M. Palmer, Boris Sbarufatti, Gianpiero Tagliaferri, et al. (1797 additional authors not shown)

    Abstract: We present results from a search for X-ray/gamma-ray counterparts of gravitational-wave (GW) candidates from the third observing run (O3) of the LIGO-Virgo-KAGRA (LVK) network using the Swift Burst Alert Telescope (Swift-BAT). The search includes 636 GW candidates received in low latency, 86 of which have been confirmed by the offline analysis and included in the third cumulative Gravitational-Wav…

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: 50 pages, 10 figures, 4 tables

  6. arXiv:2407.03648  [pdf, other]

    eess.AS cs.SD

    High Fidelity Text-Guided Music Generation and Editing via Single-Stage Flow Matching

    Authors: Gael Le Lan, Bowen Shi, Zhaoheng Ni, Sidd Srinivasan, Anurag Kumar, Brian Ellis, David Kant, Varun Nagaraja, Ernie Chang, Wei-Ning Hsu, Yangyang Shi, Vikas Chandra

    Abstract: We introduce a simple and efficient text-controllable high-fidelity music generation and editing model. It operates on sequences of continuous latent representations from a low frame rate 48 kHz stereo variational auto encoder codec that eliminates the information loss drawback of discrete representations. Based on a diffusion transformer architecture trained on a flow-matching objective the model…

    Submitted 4 July, 2024; originally announced July 2024.

  7. arXiv:2406.13578  [pdf, other]

    cs.CL

    Enhancing Distractor Generation for Multiple-Choice Questions with Retrieval Augmented Pretraining and Knowledge Graph Integration

    Authors: Han-Cheng Yu, Yu-An Shih, Kin-Man Law, Kai-Yu Hsieh, Yu-Chen Cheng, Hsin-Chih Ho, Zih-An Lin, Wen-Chuan Hsu, Yao-Chung Fan

    Abstract: In this paper, we tackle the task of distractor generation (DG) for multiple-choice questions. Our study introduces two key designs. First, we propose retrieval augmented pretraining, which involves refining the language model pretraining to align it more closely with the downstream task of DG. Second, we explore the integration of knowledge graphs to enhance the performance of DG. Throug…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: Findings at ACL 2024

  8. arXiv:2406.10923  [pdf, other]

    cs.CV cs.CL cs.LG

    Investigating Video Reasoning Capability of Large Language Models with Tropes in Movies

    Authors: Hung-Ting Su, Chun-Tong Chao, Ya-Ching Hsu, Xudong Lin, Yulei Niu, Hung-Yi Lee, Winston H. Hsu

    Abstract: Large Language Models (LLMs) have demonstrated effectiveness not only in language tasks but also in video reasoning. This paper introduces a novel dataset, Tropes in Movies (TiM), designed as a testbed for exploring two critical yet previously overlooked video reasoning skills: (1) Abstract Perception: understanding and tokenizing abstract concepts in videos, and (2) Long-range Compositional Reaso…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: Project page: https://ander1119.github.io/TiM

  9. arXiv:2406.09342  [pdf, other]

    physics.optics cond-mat.dis-nn physics.comp-ph

    Wavefront shaping simulations with augmented partial factorization

    Authors: Ho-Chun Lin, Zeyu Wang, Chia Wei Hsu

    Abstract: Wavefront shaping can tailor multipath interference to control multiple scattering of waves in complex optical systems. However, full-wave simulations that capture multiple scattering are computationally demanding given the large system size and the large number of input channels. Recently, an "augmented partial factorization" (APF) method was proposed to significantly speed up such full-wave simu…

    Submitted 13 June, 2024; originally announced June 2024.

  10. arXiv:2406.09272  [pdf, other]

    cs.CV cs.AI cs.SD eess.AS

    Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos

    Authors: Changan Chen, Puyuan Peng, Ami Baid, Zihui Xue, Wei-Ning Hsu, David Harwath, Kristen Grauman

    Abstract: Generating realistic audio for human actions is important for many applications, such as creating sound effects for films or virtual reality games. Existing approaches implicitly assume total correspondence between the video and audio during training, yet many sounds happen off-screen and have weak to no correspondence with the visuals -- resulting in uncontrolled ambient sounds or hallucinations…

    Submitted 25 July, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: Project page: https://vision.cs.utexas.edu/projects/action2sound. ECCV 2024 camera-ready version

  11. arXiv:2406.07777  [pdf, other]

    cs.LG

    Unifying Interpretability and Explainability for Alzheimer's Disease Progression Prediction

    Authors: Raja Farrukh Ali, Stephanie Milani, John Woods, Emmanuel Adenij, Ayesha Farooq, Clayton Mansel, Jeffrey Burns, William Hsu

    Abstract: Reinforcement learning (RL) has recently shown promise in predicting Alzheimer's disease (AD) progression due to its unique ability to model domain knowledge. However, it is not clear which RL algorithms are well-suited for this task. Furthermore, these methods are not inherently explainable, limiting their applicability in real-world clinical scenarios. Our work addresses these two important ques…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Previous versions accepted to NeurIPS 2023's XAIA and AAAI 2024's XAI4DRL workshops

  12. arXiv:2406.06727  [pdf, other]

    physics.optics cond-mat.dis-nn physics.comp-ph

    Full transmission of vectorial waves through 3D multiple-scattering media

    Authors: Ho-Chun Lin, Chia Wei Hsu

    Abstract: A striking prediction from the random matrix theory in mesoscopic physics is the existence of "open channels": waves that can use multipath interference to achieve perfect transmission across an opaque disordered medium even in the multiple-scattering regime. Realization of such open channels requires a coherent control of the complete incident wavefront. To date, the open channels have only been…

    Submitted 10 June, 2024; originally announced June 2024.

  13. arXiv:2406.06251  [pdf, other]

    eess.AS cs.CL

    Learning Fine-Grained Controllability on Speech Generation via Efficient Fine-Tuning

    Authors: Chung-Ming Chien, Andros Tjandra, Apoorv Vyas, Matt Le, Bowen Shi, Wei-Ning Hsu

    Abstract: As the scale of generative models continues to grow, efficient reuse and adaptation of pre-trained models have become crucial considerations. In this work, we propose Voicebox Adapter, a novel approach that integrates fine-grained conditions into a pre-trained Voicebox speech generation model using a cross-attention module. To ensure a smooth integration of newly added modules with pre-trained one…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Accepted by InterSpeech 2024

  14. arXiv:2406.04377  [pdf, other]

    eess.IV cs.LG

    Combining Graph Neural Network and Mamba to Capture Local and Global Tissue Spatial Relationships in Whole Slide Images

    Authors: Ruiwen Ding, Kha-Dinh Luong, Erika Rodriguez, Ana Cristina Araujo Lemos da Silva, William Hsu

    Abstract: In computational pathology, extracting spatial features from gigapixel whole slide images (WSIs) is a fundamental task, but due to their large size, WSIs are typically segmented into smaller tiles. A critical aspect of this analysis is aggregating information from these tiles to make predictions at the WSI level. We introduce a model that combines a message-passing graph neural network (GNN) with…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  15. arXiv:2406.00761  [pdf, other]

    cs.LG cs.AI

    Shared-unique Features and Task-aware Prioritized Sampling on Multi-task Reinforcement Learning

    Authors: Po-Shao Lin, Jia-Fong Yeh, Yi-Ting Chen, Winston H. Hsu

    Abstract: We observe that current state-of-the-art (SOTA) methods suffer from the performance imbalance issue when performing multi-task reinforcement learning (MTRL) tasks. While these methods may achieve impressive performance on average, they perform extremely poorly on a few tasks. To address this, we propose a new and effective method called STARS, which consists of two novel strategies: a shared-uniqu…

    Submitted 2 June, 2024; originally announced June 2024.

    Comments: The first two authors contribute equally

  16. arXiv:2405.18357  [pdf, other]

    cs.CL

    Faithful Logical Reasoning via Symbolic Chain-of-Thought

    Authors: Jundong Xu, Hao Fei, Liangming Pan, Qian Liu, Mong-Li Lee, Wynne Hsu

    Abstract: While the recent Chain-of-Thought (CoT) technique enhances the reasoning ability of large language models (LLMs) with the theory of mind, it might still struggle in handling logical reasoning that relies much on symbolic expressions and rigid deducing rules. To strengthen the logical reasoning capability of LLMs, we propose a novel Symbolic Chain-of-Thought, namely SymbCoT, a fully LLM-based frame…

    Submitted 11 June, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    Comments: Accepted by ACL 2024 (main proceeding)

  17. arXiv:2405.17507  [pdf, other]

    cs.LG cs.AI cs.NI

    Enhancing Sustainable Urban Mobility Prediction with Telecom Data: A Spatio-Temporal Framework Approach

    Authors: ChungYi Lin, Shen-Lung Tung, Hung-Ting Su, Winston H. Hsu

    Abstract: Traditional traffic prediction, limited by the scope of sensor data, falls short in comprehensive traffic management. Mobile networks offer a promising alternative using network activity counts, but these lack crucial directionality. Thus, we present the TeltoMob dataset, featuring undirected telecom counts and corresponding directional flows, to predict directional mobility flows on roadways. To…

    Submitted 26 May, 2024; originally announced May 2024.

    Comments: 8 Figures, 5 Tables. Just accepted by IJCAI (to appear)

  18. arXiv:2405.16545  [pdf, other]

    cs.RO

    VICtoR: Learning Hierarchical Vision-Instruction Correlation Rewards for Long-horizon Manipulation

    Authors: Kuo-Han Hung, Pang-Chi Lo, Jia-Fong Yeh, Han-Yuan Hsu, Yi-Ting Chen, Winston H. Hsu

    Abstract: We study reward models for long-horizon manipulation tasks by learning from action-free videos and language instructions, which we term the visual-instruction correlation (VIC) problem. Recent advancements in cross-modality modeling have highlighted the potential of reward modeling through visual and language correlations. However, existing VIC methods face challenges in learning rewards for long-…

    Submitted 26 May, 2024; originally announced May 2024.

  19. arXiv:2405.13237  [pdf]

    eess.IV cs.CV

    Spatial Matching of 2D Mammography Images and Specimen Radiographs: Towards Improved Characterization of Suspicious Microcalcifications

    Authors: Noor Nakhaei, Chrysostomos Marasinou, Akinyinka Omigbodun, Nina Capiro, Bo Li, Anne Hoyt, William Hsu

    Abstract: Accurate characterization of suspicious microcalcifications is critical to determine whether these calcifications are associated with invasive disease. Our overarching objective is to enable the joint characterization of microcalcifications and surrounding breast tissue using mammography images and digital histopathology images. Towards this goal, we investigate a template matching-based approach…

    Submitted 21 May, 2024; originally announced May 2024.

    Journal ref: Medical Imaging 2021: Computer-Aided Diagnosis (Vol. 11597, pp. 511-516). SPIE

  20. arXiv:2405.11478  [pdf, other]

    cs.CV eess.IV

    Unsupervised Image Prior via Prompt Learning and CLIP Semantic Guidance for Low-Light Image Enhancement

    Authors: Igor Morawski, Kai He, Shusil Dangi, Winston H. Hsu

    Abstract: Currently, low-light conditions present a significant challenge for machine cognition. In this paper, rather than optimizing models by assuming that human and machine cognition are correlated, we use zero-reference low-light enhancement to improve the performance of downstream task models. We propose to improve the zero-reference low-light enhancement method by leveraging the rich visual-linguisti…

    Submitted 19 May, 2024; originally announced May 2024.

    Comments: Accepted to CVPR 2024 Workshop NTIRE: New Trends in Image Restoration and Enhancement workshop and Challenges

  21. arXiv:2405.08586  [pdf, other]

    cs.CV

    Cross-Domain Feature Augmentation for Domain Generalization

    Authors: Yingnan Liu, Yingtian Zou, Rui Qiao, Fusheng Liu, Mong Li Lee, Wynne Hsu

    Abstract: Domain generalization aims to develop models that are robust to distribution shifts. Existing methods focus on learning invariance across domains to enhance model robustness, and data augmentation has been widely used to learn invariant predictors, with most methods performing augmentation in the input space. However, augmentation in the input space has limited diversity whereas in the feature spa…

    Submitted 14 May, 2024; originally announced May 2024.

    Comments: Accepted to the 33rd International Joint Conference on Artificial Intelligence (IJCAI 2024); Code is available at https://github.com/NancyQuris/XDomainMix

  22. arXiv:2404.11678  [pdf, other]

    stat.ME math.OC stat.AP

    Corrected Correlation Estimates for Meta-Analysis

    Authors: Alexander Johnson-Vázquez, Alexander W. Hsu, Peng Zheng, Aleksandr Aravkin

    Abstract: Meta-analysis allows rigorous aggregation of estimates and uncertainty across multiple studies. When a given study reports multiple estimates, such as log odds ratios (ORs) or log relative risks (RRs) across exposure groups, accounting for within-study correlations improves accuracy and efficiency of meta-analytic results. Canonical approaches of Greenland-Longnecker and Hamling estimate pseudo ca…

    Submitted 28 June, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

    Comments: 31 pages, 9 figures

    MSC Class: 62-08; 62P10; 90C25

  23. arXiv:2404.09956  [pdf, other]

    cs.SD cs.AI cs.CL eess.AS

    Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization

    Authors: Navonil Majumder, Chia-Yu Hung, Deepanway Ghosal, Wei-Ning Hsu, Rada Mihalcea, Soujanya Poria

    Abstract: Generative multimodal content is increasingly prevalent in much of the content creation arena, as it has the potential to allow artists and media personnel to create pre-production mockups by quickly bringing their ideas to life. The generation of audio from text prompts is an important aspect of such processes in the music and film industry. Many of the recent diffusion-based text-to-audio models…

    Submitted 17 July, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

    Comments: Accepted at ACM MM 2024

  24. arXiv:2404.04248  [pdf, other]

    astro-ph.HE gr-qc

    Observation of Gravitational Waves from the Coalescence of a $2.5\text{-}4.5~M_\odot$ Compact Object and a Neutron Star

    Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, the KAGRA Collaboration, A. G. Abac, R. Abbott, I. Abouelfettouh, F. Acernese, K. Ackley, S. Adhicary, N. Adhikari, R. X. Adhikari, V. K. Adkins, D. Agarwal, M. Agathos, M. Aghaei Abchouyeh, O. D. Aguiar, I. Aguilar, L. Aiello, A. Ain, P. Ajith, S. Akçay, T. Akutsu, S. Albanesi, R. A. Alfaidi, A. Al-Jodah, et al. (1771 additional authors not shown)

    Abstract: We report the observation of a coalescing compact binary with component masses $2.5\text{-}4.5~M_\odot$ and $1.2\text{-}2.0~M_\odot$ (all measurements quoted at the 90% credible level). The gravitational-wave signal GW230529_181500 was observed during the fourth observing run of the LIGO-Virgo-KAGRA detector network on 2023 May 29 by the LIGO Livingston Observatory. The primary component of the so…

    Submitted 26 July, 2024; v1 submitted 5 April, 2024; originally announced April 2024.

    Comments: 45 pages (10 pages author list, 13 pages main text, 1 page acknowledgements, 13 pages appendices, 8 pages bibliography), 17 figures, 16 tables. Update to match version published in The Astrophysical Journal Letters. Data products available from https://zenodo.org/records/10845779

    Report number: LIGO-P2300352

    Journal ref: ApJL 970, L34 (2024)

  25. arXiv:2403.18330  [pdf, other]

    cs.CV cs.LG

    Tracking-Assisted Object Detection with Event Cameras

    Authors: Ting-Kang Yen, Igor Morawski, Shusil Dangi, Kai He, Chung-Yi Lin, Jia-Fong Yeh, Hung-Ting Su, Winston Hsu

    Abstract: Event-based object detection has recently garnered attention in the computer vision community due to the exceptional properties of event cameras, such as high dynamic range and no motion blur. However, feature asynchronism and sparsity cause invisible objects due to no relative motion to the camera, posing a significant challenge in the task. Prior works have studied various implicit-learned memor…

    Submitted 11 August, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

  26. arXiv:2403.14402  [pdf, other]

    cs.SD cs.CL eess.AS

    XLAVS-R: Cross-Lingual Audio-Visual Speech Representation Learning for Noise-Robust Speech Perception

    Authors: HyoJung Han, Mohamed Anwar, Juan Pino, Wei-Ning Hsu, Marine Carpuat, Bowen Shi, Changhan Wang

    Abstract: Speech recognition and translation systems perform poorly on noisy inputs, which are frequent in realistic environments. Augmenting these systems with visual signals has the potential to improve robustness to noise. However, audio-visual (AV) data is only available in limited amounts and for fewer languages than audio-only resources. To address this gap, we present XLAVS-R, a cross-lingual audio-v…

    Submitted 12 August, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

    Comments: ACL2024

  27. arXiv:2403.13493  [pdf, other]

    math.PR

    SPDEs on narrow channels and graphs: convergence and large deviations in case of non smooth noise

    Authors: Sandra Cerrai, Wen-Tai Hsu

    Abstract: We investigate a class of stochastic partial differential equations of reaction-diffusion type defined on graphs, which can be derived as the limit of SPDEs on narrow planar channels. In the first part, we demonstrate that this limit can be achieved under less restrictive assumptions on the regularity of the noise, compared to [4]. In the second part, we establish the validity of a large deviation…

    Submitted 20 March, 2024; originally announced March 2024.

  28. arXiv:2403.12991  [pdf, other]

    cs.CV cs.LG

    Tel2Veh: Fusion of Telecom Data and Vehicle Flow to Predict Camera-Free Traffic via a Spatio-Temporal Framework

    Authors: ChungYi Lin, Shen-Lung Tung, Hung-Ting Su, Winston H. Hsu

    Abstract: Vehicle flow, a crucial indicator for transportation, is often limited by detector coverage. With the advent of extensive mobile network coverage, we can leverage mobile user activities, or cellular traffic, on roadways as a proxy for vehicle flow. However, as counts of cellular traffic may not directly align with vehicle flow due to data from various user types, we present a new task: predicting…

    Submitted 5 March, 2024; originally announced March 2024.

    Comments: 4 pages, 5 figures, 4 tables. Accepted by WWW'24, to appear

  29. arXiv:2403.06392  [pdf, other]

    cs.LG

    Towards Robust Out-of-Distribution Generalization Bounds via Sharpness

    Authors: Yingtian Zou, Kenji Kawaguchi, Yingnan Liu, Jiashuo Liu, Mong-Li Lee, Wynne Hsu

    Abstract: Generalizing to out-of-distribution (OOD) data or unseen domain, termed OOD generalization, still lacks appropriate theoretical guarantees. Canonical OOD bounds focus on different distance measurements between source and target domains but fail to consider the optimization property of the learned model. As empirically shown in recent work, the sharpness of learned minima influences OOD generalizat…

    Submitted 10 March, 2024; originally announced March 2024.

    Comments: 40 pages, 9 figures, ICLR 2024 Spotlight Presentation

  30. arXiv:2403.03170  [pdf, other]

    cs.MM cs.AI cs.CL cs.CV cs.CY

    SNIFFER: Multimodal Large Language Model for Explainable Out-of-Context Misinformation Detection

    Authors: Peng Qi, Zehong Yan, Wynne Hsu, Mong Li Lee

    Abstract: Misinformation is a prevalent societal issue due to its potential high risks. Out-of-context (OOC) misinformation, where authentic images are repurposed with false text, is one of the easiest and most effective ways to mislead audiences. Current methods focus on assessing image-text consistency but lack convincing explanations for their judgments, which is essential for debunking misinformation. W…

    Submitted 5 March, 2024; originally announced March 2024.

    Comments: To appear in CVPR 2024

  31. arXiv:2403.03162  [pdf, other]

    cond-mat.soft cond-mat.mes-hall cond-mat.stat-mech physics.atm-clus

    Statistical modeling of equilibrium phase transition in confined fluids

    Authors: Gunjan Auti, Soumyadeep Paul, Wei-Lun Hsu, Shohei Chiashi, Shigeo Maruyama, Hirofumi Daiguji

    Abstract: The phase transition of confined fluids in mesoporous materials deviates from that of bulk fluids due to the interactions with the surrounding heterogeneous structure. For example, adsorbed fluids in metal-organic-frameworks (MOFs) have atypical phase characteristics such as capillary condensation and higher-order phase transitions due to a strong heterogeneous field. Considering a many-body probl…

    Submitted 20 March, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

    Comments: 25 pages, 14 figures

  32. arXiv:2403.03004  [pdf, other]

    astro-ph.CO gr-qc hep-ph

    Ultralight vector dark matter search using data from the KAGRA O3GK run

    Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, the KAGRA Collaboration, A. G. Abac, R. Abbott, H. Abe, I. Abouelfettouh, F. Acernese, K. Ackley, C. Adamcewicz, S. Adhicary, N. Adhikari, R. X. Adhikari, V. K. Adkins, V. B. Adya, C. Affeldt, D. Agarwal, M. Agathos, O. D. Aguiar, I. Aguilar, L. Aiello, A. Ain, P. Ajith, T. Akutsu, S. Albanesi, et al. (1778 additional authors not shown)

    Abstract: Among the various candidates for dark matter (DM), ultralight vector DM can be probed by laser interferometric gravitational wave detectors through the measurement of oscillating length changes in the arm cavities. In this context, KAGRA has a unique feature due to differing compositions of its mirrors, enhancing the signal of vector DM in the length change in the auxiliary channels. Here we prese…

    Submitted 5 March, 2024; originally announced March 2024.

    Comments: 20 pages, 5 figures

    Report number: LIGO-P2300250

  33. arXiv:2402.03860  [pdf, other]

    cs.RO

    AED: Adaptable Error Detection for Few-shot Imitation Policy

    Authors: Jia-Fong Yeh, Kuo-Han Hung, Pang-Chi Lo, Chi-Ming Chung, Tsung-Han Wu, Hung-Ting Su, Yi-Ting Chen, Winston H. Hsu

    Abstract: We introduce a new task called Adaptable Error Detection (AED), which aims to identify behavior errors in few-shot imitation (FSI) policies based on visual observations in novel environments. The potential to cause serious damage to surrounding areas limits the application of FSI policies in real-world scenarios. Thus, a robust system is necessary to notify operators when FSI policies are inconsis…

    Submitted 25 May, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

  34. arXiv:2402.00747  [pdf, other]

    cond-mat.str-el

    Mott resistive switching initiated by topological defects

    Authors: Alessandra Milloch, Ignacio Figueruelo-Campanero, Wei-Fan Hsu, Selene Mor, Simon Mellaerts, Francesco Maccherozzi, Larissa Ishibe Veiga, Sarnjeet S. Dhesi, Mauro Spera, Jin Won Seo, Jean-Pierre Locquet, Michele Fabrizio, Mariela Menghini, Claudio Giannetti

    Abstract: Resistive switching is the fundamental process that triggers the sudden change of the electrical properties in solid-state devices under the action of intense electric fields. Despite its relevance for information processing, ultrafast electronics, neuromorphic devices, resistive memories and brain-inspired computation, the nature of the local stochastic fluctuations that drive the formation of me…

    Submitted 1 February, 2024; originally announced February 2024.

  35. arXiv:2401.16737  [pdf]

    cond-mat.mtrl-sci cond-mat.mes-hall physics.chem-ph

    Formation of highly stable interfacial nitrogen gas hydrate overlayers under ambient conditions

    Authors: Chung-Kai Fang, Cheng-Hao Chuang, Chih-Wen Yang, Zheng-Rong Guo, Wei-Hao Hsu, Chia-Hsin Wang, Ing-Shouh Hwang

    Abstract: Surfaces (interfaces) dictate many physical and chemical properties of solid materials and adsorbates considerably affect these properties. Nitrogen molecules, which are the most abundant constituent in ambient air, are considered to be inert. Our study combining atomic force microscopy (AFM), X-ray photoemission spectroscopy (XPS), and thermal desorption spectroscopy (TDS) revealed that nitrogen…

    Submitted 29 January, 2024; originally announced January 2024.

  36. arXiv:2401.07781  [pdf, other]

    cs.CV

    Towards A Better Metric for Text-to-Video Generation

    Authors: Jay Zhangjie Wu, Guian Fang, Haoning Wu, Xintao Wang, Yixiao Ge, Xiaodong Cun, David Junhao Zhang, Jia-Wei Liu, Yuchao Gu, Rui Zhao, Weisi Lin, Wynne Hsu, Ying Shan, Mike Zheng Shou

    Abstract: Generative models have demonstrated remarkable capability in synthesizing high-quality text, images, and videos. For video generation, contemporary text-to-video models exhibit impressive capabilities, crafting visually stunning videos. Nonetheless, evaluating such videos poses significant challenges. Current research predominantly employs automated metrics such as FVD, IS, and CLIP Score. However…

    Submitted 15 January, 2024; originally announced January 2024.

    Comments: Project page: https://showlab.github.io/T2VScore/

  37. arXiv:2401.03138  [pdf, other]

    cs.LG cs.AI

    TelTrans: Applying Multi-Type Telecom Data to Transportation Evaluation and Prediction via Multifaceted Graph Modeling

    Authors: ChungYi Lin, Shen-Lung Tung, Hung-Ting Su, Winston H. Hsu

    Abstract: To address the limitations of traffic prediction from location-bound detectors, we present Geographical Cellular Traffic (GCT) flow, a novel data source that leverages the extensive coverage of cellular traffic to capture mobility patterns. Our extensive analysis validates its potential for transportation. Focusing on vehicle-related GCT flow prediction, we propose a graph neural network that inte…

    Submitted 6 January, 2024; originally announced January 2024.

    Comments: 7 pages, 7 figures, 4 tables. Accepted by AAAI-24-IAAI, to appear

  38. arXiv:2312.15821  [pdf, other]

    cs.SD cs.LG eess.AS

    Audiobox: Unified Audio Generation with Natural Language Prompts

    Authors: Apoorv Vyas, Bowen Shi, Matthew Le, Andros Tjandra, Yi-Chiao Wu, Baishan Guo, Jiemin Zhang, Xinyue Zhang, Robert Adkins, William Ngan, Jeff Wang, Ivan Cruz, Bapi Akula, Akinniyi Akinyemi, Brian Ellis, Rashel Moritz, Yael Yungster, Alice Rakotoarison, Liang Tan, Chris Summers, Carleigh Wood, Joshua Lane, Mary Williamson, Wei-Ning Hsu

    Abstract: Audio is an essential part of our life, but creating it often requires expertise and is time-consuming. Research communities have made great progress over the past year advancing the performance of large scale audio generative models for a single modality (speech, sound, or music) through adopting more powerful generative models and scaling data. However, these models lack controllability in sever…

    Submitted 25 December, 2023; originally announced December 2023.

  39. arXiv:2312.04425  [pdf, other]

    cond-mat.mtrl-sci cond-mat.str-el

    Confinement-Induced Isosymmetric Metal-Insulator Transition in Ultrathin Epitaxial V2O3 Films

    Authors: Simon Mellaerts, Claudio Bellani, Wei-Fan Hsu, Alberto Binetti, Koen Schouteden, Maria Recaman-Payo, Mariela Menghini, Juan Rubio Zuazo, Jesús López Sánchez, Jin Won Seo, Michel Houssa, Jean-Pierre Locquet

    Abstract: Dimensional confinement has shown to be an effective strategy to tune competing degrees of freedom in complex oxides. Here, we achieved atomic layered growth of trigonal vanadium sesquioxide (V2O3) by means of oxygen-assisted molecular beam epitaxy. This led to a series of high-quality epitaxial ultrathin V2O3 films down to unit cell thickness, enabling the study of the intrinsic electron correlat…

    Submitted 7 December, 2023; originally announced December 2023.

    Journal ref: ACS Appl. Mater. Interfaces 2021, 13, 30941-30949

  40. High-efficiency high-NA metalens designed by maximizing the efficiency limit

    Authors: Shiyu Li, Ho-Chun Lin, Chia Wei Hsu

    Abstract: Theoretical bounds are commonly used to assess the limitations of photonic design. Here we introduce a more active way to use theoretical bounds, integrating them into part of the design process and identifying optimal system parameters that maximize the efficiency limit itself. As an example, we consider wide-field-of-view high-numerical-aperture metalenses, which can be used for high-resolution…

    Submitted 1 December, 2023; v1 submitted 22 November, 2023; originally announced November 2023.

    Journal ref: Optica 11, 454-459 (2024)

  41. arXiv:2311.05672  [pdf, other]

    math.OC math.PR stat.CO stat.ML

    Conditional Optimal Transport on Function Spaces

    Authors: Bamdad Hosseini, Alexander W. Hsu, Amirhossein Taghvaei

    Abstract: We present a systematic study of conditional triangular transport maps in function spaces from the perspective of optimal transportation and with a view towards amortized Bayesian inference. More specifically, we develop a theory of constrained optimal transport problems that describe block-triangular Monge maps that characterize conditional measures along with their Kantorovich relaxations. This…

    Submitted 6 February, 2024; v1 submitted 9 November, 2023; originally announced November 2023.

    MSC Class: 49Q22; 62G86; 62F15; 60B05

  42. arXiv:2311.02772  [pdf, ps, other]

    cs.SD cs.CL eess.AS

    Attention or Convolution: Transformer Encoders in Audio Language Models for Inference Efficiency

    Authors: Sungho Jeon, Ching-Feng Yeh, Hakan Inan, Wei-Ning Hsu, Rashi Rungta, Yashar Mehdad, Daniel Bikel

    Abstract: In this paper, we show that a simple self-supervised pre-trained audio model can achieve comparable inference efficiency to more complicated pre-trained models with speech transformer encoders. These speech transformers rely on mixing convolutional modules with self-attention modules. They achieve state-of-the-art performance on ASR with top efficiency. We first show that employing these speech tr…

    Submitted 8 February, 2024; v1 submitted 5 November, 2023; originally announced November 2023.

    Comments: 5 pages; accepted to Self-supervision in Audio, Speech and Beyond (SASB) workshop in ICASSP24

  43. arXiv:2311.02332  [pdf, other]

    cs.LG cs.CV

    Multimodal Machine Learning in Image-Based and Clinical Biomedicine: Survey and Prospects

    Authors: Elisa Warner, Joonsang Lee, William Hsu, Tanveer Syeda-Mahmood, Charles Kahn, Olivier Gevaert, Arvind Rao

    Abstract: Machine learning (ML) applications in medical artificial intelligence (AI) systems have shifted from traditional and statistical methods to increasing application of deep learning models. This survey navigates the current landscape of multimodal ML, focusing on its profound impact on medical image analysis and clinical decision support systems. Emphasizing challenges and innovations in addressing…

    Submitted 19 January, 2024; v1 submitted 4 November, 2023; originally announced November 2023.

  44. arXiv:2310.16338  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Generative Pre-training for Speech with Flow Matching

    Authors: Alexander H. Liu, Matt Le, Apoorv Vyas, Bowen Shi, Andros Tjandra, Wei-Ning Hsu

    Abstract: Generative models have gained more and more attention in recent years for their remarkable success in tasks that required estimating and sampling data distribution to generate high-fidelity synthetic data. In speech, text-to-speech synthesis and neural vocoder are good examples where generative models have shined. While generative models have been applied to different applications in speech, there…

    Submitted 25 March, 2024; v1 submitted 24 October, 2023; originally announced October 2023.

    Comments: ICLR 2024

  45. arXiv:2310.14643  [pdf, other]

    physics.optics

    Dynamic gain and frequency comb formation in exceptional-point lasers

    Authors: Xingwei Gao, Hao He, Scott Sobolewski, Alexander Cerjan, Chia Wei Hsu

    Abstract: Exceptional points (EPs)--singularities in the parameter space of non-Hermitian systems where two nearby eigenmodes coalesce--feature unique properties with applications for microcavity lasers such as sensitivity enhancement and chiral emission. Present EP lasers operate with static populations in the gain medium. Here, we show theoretically that a laser operating sufficiently close to an EP will…

    Submitted 23 October, 2023; originally announced October 2023.

  46. arXiv:2310.13615  [pdf, other]

    cs.CL

    Three Questions Concerning the Use of Large Language Models to Facilitate Mathematics Learning

    Authors: An-Zi Yen, Wei-Ling Hsu

    Abstract: Due to the remarkable language understanding and generation abilities of large language models (LLMs), their use in educational applications has been explored. However, little work has been done on investigating the pedagogical ability of LLMs in helping students to learn mathematics. In this position paper, we discuss the challenges associated with employing LLMs to enhance students' mathematical…

    Submitted 20 October, 2023; originally announced October 2023.

    Comments: Accepted by EMNLP 2023 Findings

  47. arXiv:2310.08715  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    Toward Joint Language Modeling for Speech Units and Text

    Authors: Ju-Chieh Chou, Chung-Ming Chien, Wei-Ning Hsu, Karen Livescu, Arun Babu, Alexis Conneau, Alexei Baevski, Michael Auli

    Abstract: Speech and text are two major forms of human language. The research community has been focusing on mapping speech to text or vice versa for many years. However, in the field of language modeling, very little effort has been made to model them jointly. In light of this, we explore joint language modeling for speech units and text. Specifically, we compare different speech tokenizers to transform co…

    Submitted 12 October, 2023; originally announced October 2023.

    Comments: EMNLP findings 2023

  48. arXiv:2310.03821  [pdf, other]

    cs.CV cs.RO

    WLST: Weak Labels Guided Self-training for Weakly-supervised Domain Adaptation on 3D Object Detection

    Authors: Tsung-Lin Tsou, Tsung-Han Wu, Winston H. Hsu

    Abstract: In the field of domain adaptation (DA) on 3D object detection, most of the work is dedicated to unsupervised domain adaptation (UDA). Yet, without any target annotations, the performance gap between the UDA approaches and the fully-supervised approach is still noticeable, which is impractical for real-world applications. On the other hand, weakly-supervised domain adaptation (WDA) is an underexplo…

    Submitted 7 February, 2024; v1 submitted 5 October, 2023; originally announced October 2023.

    Comments: Accepted to ICRA 2024. Code is available at https://github.com/jacky121298/WLST

  49. arXiv:2309.17020  [pdf, other]

    eess.AS cs.SD

    Low-Resource Self-Supervised Learning with SSL-Enhanced TTS

    Authors: Po-chun Hsu, Ali Elkahky, Wei-Ning Hsu, Yossi Adi, Tu Anh Nguyen, Jade Copet, Emmanuel Dupoux, Hung-yi Lee, Abdelrahman Mohamed

    Abstract: Self-supervised learning (SSL) techniques have achieved remarkable results in various speech processing tasks. Nonetheless, a significant challenge remains in reducing the reliance on vast amounts of speech data for pre-training. This paper proposes to address this challenge by leveraging synthetic speech to augment a low-resource pre-training corpus. We construct a high-quality text-to-speech (TT…

    Submitted 4 June, 2024; v1 submitted 29 September, 2023; originally announced September 2023.

    Comments: ASRU 2023 SPARKS Workshop

  50. arXiv:2309.09376  [pdf, other]

    physics.optics physics.bio-ph

    Delivering Broadband Light Deep Inside Diffusive Media

    Authors: Rohin McIntosh, Arthur Goetschy, Nicholas Bender, Alexey Yamilov, Chia Wei Hsu, Hasan Yilmaz, Hui Cao

    Abstract: Wavefront shaping enables targeted delivery of coherent light into random-scattering media, such as biological tissue, by constructive interference of scattered waves. However, broadband waves have short coherence times, weakening the interference effect. Here, we introduce a broadband deposition matrix that identifies a single input wavefront that maximizes the broadband energy delivered to an ex…

    Submitted 17 September, 2023; originally announced September 2023.

    Comments: 17 pages, 10 figures

    Journal ref: Nature Photonics (2024)