Zum Hauptinhalt springen

Showing 1–50 of 686 results for author: Chang, K

.
  1. arXiv:2408.14262  [pdf

    cs.CL cs.SD eess.AS

    Self-supervised Speech Representations Still Struggle with African American Vernacular English

    Authors: Kalvin Chang, Yi-Hui Chou, Jiatong Shi, Hsuan-Ming Chen, Nicole Holliday, Odette Scharenborg, David R. Mortensen

    Abstract: Underperformance of ASR systems for speakers of African American Vernacular English (AAVE) and other marginalized language varieties is a well-documented phenomenon, and one that reinforces the stigmatization of these varieties. We investigate whether or not the recent wave of Self-Supervised Learning (SSL) speech models can close the gap in ASR performance between AAVE and Mainstream American Eng… ▽ More

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: INTERSPEECH 2024

  2. arXiv:2408.13905  [pdf

    cond-mat.mtrl-sci

    Circularly polarised electroluminescence from chiral excitons in vacuum-sublimed supramolecular semiconductor thin films

    Authors: Rituparno Chowdhury, Marco D. Preuss, Hwan-Hee Cho, Joshua J. P. Thompson, Samarpita Sen, Tomi Baikie, Pratyush Ghosh, Yorrick Boeije, Xian-Wei Chua, Kai-Wei Chang, Erjuan Guo, Joost van der Tol, Bart W. L. van den Bersselaar, Andrea Taddeucci, Nicolas Daub, Daphne M. Dekker, Scott T. Keene, Ghislaine Vantomme, Bruno Ehrler, Stefan C. J. Meskers, Akshay Rao, Bartomeu Monserrat, E. W. Meijer, Richard H. Friend

    Abstract: Materials with chiral electronic structures are of great interest. We report a triazatruxene, TAT, molecular semiconductor with chiral alkyl side chains that crystallises from solution to form chirally-stacked columns with a helical pitch of 6 TATs (2.3 nm). These crystals show strong circularly polarised, CP, green photoluminescence, with dissymmetry of 24%. Electronic structure calculations usin… ▽ More

    Submitted 25 August, 2024; originally announced August 2024.

  3. arXiv:2408.13040  [pdf, other

    eess.AS cs.AI cs.CL cs.LG

    SpeechPrompt: Prompting Speech Language Models for Speech Processing Tasks

    Authors: Kai-Wei Chang, Haibin Wu, Yu-Kai Wang, Yuan-Kuei Wu, Hua Shen, Wei-Cheng Tseng, Iu-thing Kang, Shang-Wen Li, Hung-yi Lee

    Abstract: Prompting has become a practical method for utilizing pre-trained language models (LMs). This approach offers several advantages. It allows an LM to adapt to new tasks with minimal training and parameter updates, thus achieving efficiency in both storage and computation. Additionally, prompting modifies only the LM's inputs and harnesses the generative capabilities of language models to address va… ▽ More

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: Published in IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP)

    Journal ref: in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 32, pp. 3730-3744, 2024

  4. arXiv:2408.06545  [pdf

    eess.SP

    Optimal Preprocessing for Joint Detection and Classification of Wireless Communication Signals in Congested Spectrum Using Computer Vision Methods

    Authors: Xiwen Kang, Hua-mei Chen, Genshe Chen, Kuo-Chu Chang, Thomas M. Clemons

    Abstract: The joint detection and classification of RF signals has been a critical problem in the field of wideband RF spectrum sensing. Recent advancements in deep learning models have revolutionized this field, remarkably through the application of state-of-the-art computer vision algorithms such as YOLO (You Only Look Once) and DETR (Detection Transformer) to the spectrogram images. This paper focuses on… ▽ More

    Submitted 12 August, 2024; originally announced August 2024.

  5. arXiv:2408.05457  [pdf, other

    cs.CL cs.AI

    Investigating Instruction Tuning Large Language Models on Graphs

    Authors: Kerui Zhu, Bo-Wei Huang, Bowen Jin, Yizhu Jiao, Ming Zhong, Kevin Chang, Shou-De Lin, Jiawei Han

    Abstract: Inspired by the recent advancements of Large Language Models (LLMs) in NLP tasks, there's growing interest in applying LLMs to graph-related tasks. This study delves into the capabilities of instruction-following LLMs for engaging with real-world graphs, aiming to offer empirical insights into how LLMs can effectively interact with graphs and generalize across graph tasks. We begin by constructing… ▽ More

    Submitted 10 August, 2024; originally announced August 2024.

    Comments: COLM 2024

  6. arXiv:2408.01046  [pdf, other

    cs.CL

    QUDSELECT: Selective Decoding for Questions Under Discussion Parsing

    Authors: Ashima Suvarna, Xiao Liu, Tanmay Parekh, Kai-Wei Chang, Nanyun Peng

    Abstract: Question Under Discussion (QUD) is a discourse framework that uses implicit questions to reveal discourse relationships between sentences. In QUD parsing, each sentence is viewed as an answer to a question triggered by an anchor sentence in prior context. The resulting QUD structure is required to conform to several theoretical criteria like answer compatibility (how well the question is answered)… ▽ More

    Submitted 2 August, 2024; originally announced August 2024.

    Comments: 11 Pages, 5 figures

  7. arXiv:2408.00842  [pdf, other

    quant-ph

    Surface Code with Imperfect Erasure Checks

    Authors: Kathleen Chang, Shraddha Singh, Jahan Claes, Kaavya Sahay, James Teoh, Shruti Puri

    Abstract: Recently, a lot of effort has been devoted towards designing erasure qubits in which dominant physical noise excites leakage states whose population can be detected and returned to the qubit subspace. Interest in these erasure qubits has been driven by studies showing that the requirements for fault-tolerant quantum error correction are significantly relaxed when noise in every gate operation is d… ▽ More

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: 19 pages, 15 figures

  8. arXiv:2407.21358  [pdf, other

    cs.AI

    Tree-of-Traversals: A Zero-Shot Reasoning Algorithm for Augmenting Black-box Language Models with Knowledge Graphs

    Authors: Elan Markowitz, Anil Ramakrishna, Jwala Dhamala, Ninareh Mehrabi, Charith Peris, Rahul Gupta, Kai-Wei Chang, Aram Galstyan

    Abstract: Knowledge graphs (KGs) complement Large Language Models (LLMs) by providing reliable, structured, domain-specific, and up-to-date external knowledge. However, KGs and LLMs are often developed separately and must be integrated after training. We introduce Tree-of-Traversals, a novel zero-shot reasoning algorithm that enables augmentation of black-box LLMs with one or more KGs. The algorithm equips… ▽ More

    Submitted 31 July, 2024; originally announced July 2024.

    Comments: Accepted for publication at the ACL 2024 Conference

  9. arXiv:2407.19283  [pdf, other

    cs.CE

    Smart Contracts, Smarter Payments: Innovating Cross Border Payments and Reporting Transactions

    Authors: Maruf Ahmed Mridul, Kaiyang Chang, Aparna Gupta, Oshani Seneviratne

    Abstract: The global financial landscape is experiencing significant transformation driven by technological advancements and evolving market dynamics. Moreover, blockchain technology has become a pivotal platform with widespread applications, especially in finance. Cross-border payments have emerged as a key area of interest, with blockchain offering inherent benefits such as enhanced security, transparency… ▽ More

    Submitted 27 July, 2024; originally announced July 2024.

    Comments: 8 pages, 1 figure, 1 table, CIFEr Conference 2024

  10. arXiv:2407.08473  [pdf, other

    cs.AR cs.AI

    Natural language is not enough: Benchmarking multi-modal generative AI for Verilog generation

    Authors: Kaiyan Chang, Zhirong Chen, Yunhao Zhou, Wenlong Zhu, kun wang, Haobo Xu, Cangyuan Li, Mengdi Wang, Shengwen Liang, Huawei Li, Yinhe Han, Ying Wang

    Abstract: Natural language interfaces have exhibited considerable potential in the automation of Verilog generation derived from high-level specifications through the utilization of large language models, garnering significant attention. Nevertheless, this paper elucidates that visual representations contribute essential contextual information critical to design intent for hardware architectures possessing… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Accepted by ICCAD 2024

  11. arXiv:2407.06549  [pdf, other

    cs.IR cs.AI cs.CL cs.LG

    AutoTask: Task Aware Multi-Faceted Single Model for Multi-Task Ads Relevance

    Authors: Shouchang Guo, Sonam Damani, Keng-hao Chang

    Abstract: Ads relevance models are crucial in determining the relevance between user search queries and ad offers, often framed as a classification problem. The complexity of modeling increases significantly with multiple ad types and varying scenarios that exhibit both similarities and differences. In this work, we introduce a novel multi-faceted attention model that performs task aware feature combination… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

  12. arXiv:2407.02511  [pdf, other

    cs.RO cs.AI cs.CL

    LLM-A*: Large Language Model Enhanced Incremental Heuristic Search on Path Planning

    Authors: Silin Meng, Yiwei Wang, Cheng-Fu Yang, Nanyun Peng, Kai-Wei Chang

    Abstract: Path planning is a fundamental scientific problem in robotics and autonomous navigation, requiring the derivation of efficient routes from starting to destination points while avoiding obstacles. Traditional algorithms like A* and its variants are capable of ensuring path validity but suffer from significant computational and memory inefficiencies as the state space grows. Conversely, large langua… ▽ More

    Submitted 19 June, 2024; originally announced July 2024.

    Comments: Submitted to The 2024 Conference on Empirical Methods in Natural Language Processing

  13. arXiv:2407.02235  [pdf

    cs.CL

    Towards a Holistic Framework for Multimodal Large Language Models in Three-dimensional Brain CT Report Generation

    Authors: Cheng-Yi Li, Kao-Jung Chang, Cheng-Fu Yang, Hsin-Yu Wu, Wenting Chen, Hritik Bansal, Ling Chen, Yi-Ping Yang, Yu-Chun Chen, Shih-Pin Chen, Jiing-Feng Lirng, Kai-Wei Chang, Shih-Hwa Chiou

    Abstract: Multi-modal large language models (MLLMs) have been given free rein to explore exciting medical applications with a primary focus on radiology report generation. Nevertheless, the preliminary success in 2D radiology captioning is incompetent to reflect the real-world diagnostic challenge in the volumetric 3D anatomy. To mitigate three crucial limitation aspects in the existing literature, includin… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 6 figures, 5 supplementary figures, 8 supplementary tables

  14. arXiv:2407.02083  [pdf, other

    math.DS eess.SY math.OC

    Passivity Tools for Hybrid Learning Rules in Large Populations

    Authors: Jair Certorio, Nuno C. Martins, Kevin Chang, Pierluigi Nuzzo, Yasser Shoukry

    Abstract: Recent work has pioneered the use of system-theoretic passivity to study equilibrium stability for the dynamics of noncooperative strategic interactions in large populations of learning agents. In this and related works, the stability analysis leverages knowledge that certain ``canonical'' classes of learning rules used to model the agents' strategic behaviors satisfy a passivity condition known a… ▽ More

    Submitted 16 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

    Comments: 12 pages, 3 figures

    MSC Class: 92D10; 92D25

  15. arXiv:2407.00377  [pdf, other

    cs.CL cs.AI cs.CV cs.CY

    The Factuality Tax of Diversity-Intervened Text-to-Image Generation: Benchmark and Fact-Augmented Intervention

    Authors: Yixin Wan, Di Wu, Haoran Wang, Kai-Wei Chang

    Abstract: Prompt-based "diversity interventions" are commonly adopted to improve the diversity of Text-to-Image (T2I) models depicting individuals with various racial or gender traits. However, will this strategy result in nonfactual demographic distribution, especially when generating real historical figures? In this work, we propose DemOgraphic FActualIty Representation (DoFaiR), a benchmark to systematic… ▽ More

    Submitted 29 June, 2024; originally announced July 2024.

  16. arXiv:2407.00191  [pdf, other

    cs.CL

    MetaKP: On-Demand Keyphrase Generation

    Authors: Di Wu, Xiaoxian Shen, Kai-Wei Chang

    Abstract: Traditional keyphrase prediction methods predict a single set of keyphrases per document, failing to cater to the diverse needs of users and downstream applications. To bridge the gap, we introduce on-demand keyphrase generation, a novel paradigm that requires keyphrases that conform to specific high-level goals or intents. For this task, we present MetaKP, a large-scale benchmark comprising four… ▽ More

    Submitted 28 June, 2024; originally announced July 2024.

  17. arXiv:2406.19486  [pdf, other

    cs.CL cs.AI cs.ET cs.LG eess.SP

    LoPT: Low-Rank Prompt Tuning for Parameter Efficient Language Models

    Authors: Shouchang Guo, Sonam Damani, Keng-hao Chang

    Abstract: In prompt tuning, a prefix or suffix text is added to the prompt, and the embeddings (soft prompts) or token indices (hard prompts) of the prefix/suffix are optimized to gain more control over language models for specific tasks. This approach eliminates the need for hand-crafted prompt engineering or explicit model fine-tuning. Prompt tuning is significantly more parameter-efficient than model fin… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

  18. arXiv:2406.15178  [pdf, other

    cs.CL

    Hybrid Alignment Training for Large Language Models

    Authors: Chenglong Wang, Hang Zhou, Kaiyan Chang, Bei Li, Yongyu Mu, Tong Xiao, Tongran Liu, Jingbo Zhu

    Abstract: Alignment training is crucial for enabling large language models (LLMs) to cater to human intentions and preferences. It is typically performed based on two stages with different objectives: instruction-following alignment and human-preference alignment. However, aligning LLMs with these objectives in sequence suffers from an inherent problem: the objectives may conflict, and the LLMs cannot guara… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: accepted by ACL (Findings) 2024

  19. arXiv:2406.14621  [pdf, other

    quant-ph

    A mid-circuit erasure check on a dual-rail cavity qubit using the joint-photon number-splitting regime of circuit QED

    Authors: Stijn J. de Graaf, Sophia H. Xue, Benjamin J. Chapman, James D. Teoh, Takahiro Tsunoda, Patrick Winkel, John W. O. Garmon, Kathleen M. Chang, Luigi Frunzio, Shruti Puri, Robert J. Schoelkopf

    Abstract: Quantum control of a linear oscillator using a static dispersive coupling to a nonlinear ancilla underpins a wide variety of experiments in circuit QED. Extending this control to more than one oscillator while minimizing the required connectivity to the ancilla would enable hardware-efficient multi-mode entanglement and measurements. We show that the spectrum of an ancilla statically coupled to a… ▽ More

    Submitted 13 August, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

    Comments: Updated references

  20. arXiv:2406.14137  [pdf, other

    cs.CL

    MACAROON: Training Vision-Language Models To Be Your Engaged Partners

    Authors: Shujin Wu, Yi R. Fung, Sha Li, Yixin Wan, Kai-Wei Chang, Heng Ji

    Abstract: Large vision-language models (LVLMs), while proficient in following instructions and responding to diverse questions, invariably generate detailed responses even when questions are ambiguous or unanswerable, leading to hallucinations and bias issues. Thus, it is essential for LVLMs to proactively engage with humans to ask for clarifications or additional information for better responses. In this s… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: The code will be made public at https://github.com/ShujinWu-0814/MACAROON

  21. arXiv:2406.13692  [pdf, other

    cs.CL

    Synchronous Faithfulness Monitoring for Trustworthy Retrieval-Augmented Generation

    Authors: Di Wu, Jia-Chen Gu, Fan Yin, Nanyun Peng, Kai-Wei Chang

    Abstract: Retrieval-augmented language models (RALMs) have shown strong performance and wide applicability in knowledge-intensive tasks. However, there are significant trustworthiness concerns as RALMs are prone to generating unfaithful outputs, including baseless information or contradictions with the retrieved context. This paper proposes SynCheck, a lightweight monitor that leverages fine-grained decodin… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  22. arXiv:2406.13444  [pdf, other

    cs.CL cs.CV

    VDebugger: Harnessing Execution Feedback for Debugging Visual Programs

    Authors: Xueqing Wu, Zongyu Lin, Songyan Zhao, Te-Lin Wu, Pan Lu, Nanyun Peng, Kai-Wei Chang

    Abstract: Visual programs are executable code generated by large language models to address visual reasoning problems. They decompose complex questions into multiple reasoning steps and invoke specialized models for each step to solve the problems. However, these programs are prone to logic errors, with our preliminary evaluation showing that 58% of the total errors are caused by program logic errors. Debug… ▽ More

    Submitted 27 June, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

    Comments: update reference

  23. arXiv:2406.12725  [pdf

    cs.CL cs.AI

    Can Large Language Models Code Like a Linguist?: A Case Study in Low Resource Sound Law Induction

    Authors: Atharva Naik, Kexun Zhang, Nathaniel Robinson, Aravind Mysore, Clayton Marr, Hong Sng Rebecca Byrnes, Anna Cai, Kalvin Chang, David Mortensen

    Abstract: Historical linguists have long written a kind of incompletely formalized ''program'' that converts reconstructed words in an ancestor language into words in one of its attested descendants that consist of a series of ordered string rewrite functions (called sound laws). They do this by observing pairs of words in the reconstructed language (protoforms) and the descendent language (reflexes) and co… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

  24. arXiv:2406.10746  [pdf, other

    cs.CL cs.IR

    SparseCL: Sparse Contrastive Learning for Contradiction Retrieval

    Authors: Haike Xu, Zongyu Lin, Yizhou Sun, Kai-Wei Chang, Piotr Indyk

    Abstract: Contradiction retrieval refers to identifying and extracting documents that explicitly disagree with or refute the content of a query, which is important to many downstream applications like fact checking and data cleaning. To retrieve contradiction argument to the query from large document corpora, existing methods such as similarity search and crossencoder models exhibit significant limitations.… ▽ More

    Submitted 15 June, 2024; originally announced June 2024.

  25. arXiv:2406.10046  [pdf, other

    physics.optics nlin.PS

    In-situ aligned all-polarization-maintaining Er-doped fiber laser mode-locked by a nonlinear amplifying loop mirror

    Authors: Xiang Zhang, Kangrui Chang, Haobin Zheng, Yongzhuang Zhou, Yong Shen, Hongxin Zou

    Abstract: Despite the wide applications for high-repetition-rate mode-locked fiber lasers, challenges persist in shortening the cavity length and coupling the fiber collimators for most existing techniques. Here, we introduce a novel collimator alignment method and demonstrate an all-polarization-maintaining erbium-doped fiber laser that contains a nonlinear amplifying loop mirror with a repetition rate of… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 6 pages, 7 figures

  26. arXiv:2406.09411  [pdf, other

    cs.CV cs.AI cs.CL

    MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding

    Authors: Fei Wang, Xingyu Fu, James Y. Huang, Zekun Li, Qin Liu, Xiaogeng Liu, Mingyu Derek Ma, Nan Xu, Wenxuan Zhou, Kai Zhang, Tianyi Lorena Yan, Wenjie Jacky Mo, Hsiang-Hui Liu, Pan Lu, Chunyuan Li, Chaowei Xiao, Kai-Wei Chang, Dan Roth, Sheng Zhang, Hoifung Poon, Muhao Chen

    Abstract: We introduce MuirBench, a comprehensive benchmark that focuses on robust multi-image understanding capabilities of multimodal LLMs. MuirBench consists of 12 diverse multi-image tasks (e.g., scene understanding, ordering) that involve 10 categories of multi-image relations (e.g., multiview, temporal relations). Comprising 11,264 images and 2,600 multiple-choice questions, MuirBench is created in a… ▽ More

    Submitted 1 July, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: typos corrected, references added, Project Page: https://muirbench.github.io/

  27. arXiv:2406.05755  [pdf, other

    cs.CV

    A DeNoising FPN With Transformer R-CNN for Tiny Object Detection

    Authors: Hou-I Liu, Yu-Wen Tseng, Kai-Cheng Chang, Pin-Jyun Wang, Hong-Han Shuai, Wen-Huang Cheng

    Abstract: Despite notable advancements in the field of computer vision, the precise detection of tiny objects continues to pose a significant challenge, largely owing to the minuscule pixel representation allocated to these objects in imagery data. This challenge resonates profoundly in the domain of geoscience and remote sensing, where high-fidelity detection of tiny objects can facilitate a myriad of appl… ▽ More

    Submitted 15 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

    Comments: The article is accepted by IEEE Transactions on Geoscience and Remote Sensing. Our code will be available at https://github.com/hoiliu-0801/DNTR

  28. arXiv:2406.05003  [pdf, other

    cs.RO cs.HC

    Designs for Enabling Collaboration in Human-Machine Teaming via Interactive and Explainable Systems

    Authors: Rohan Paleja, Michael Munje, Kimberlee Chang, Reed Jensen, Matthew Gombolay

    Abstract: Collaborative robots and machine learning-based virtual agents are increasingly entering the human workspace with the aim of increasing productivity and enhancing safety. Despite this, we show in a ubiquitous experimental domain, Overcooked-AI, that state-of-the-art techniques for human-machine teaming (HMT), which rely on imitation or reinforcement learning, are brittle and result in a machine ag… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

  29. arXiv:2406.03520  [pdf, other

    cs.CV cs.AI cs.LG

    VideoPhy: Evaluating Physical Commonsense for Video Generation

    Authors: Hritik Bansal, Zongyu Lin, Tianyi Xie, Zeshun Zong, Michal Yarom, Yonatan Bitton, Chenfanfu Jiang, Yizhou Sun, Kai-Wei Chang, Aditya Grover

    Abstract: Recent advances in internet-scale video data pretraining have led to the development of text-to-video generative models that can create high-quality videos across a broad range of visual concepts and styles. Due to their ability to synthesize realistic motions and render complex objects, these generative models have the potential to become general-purpose simulators of the physical world. However,… ▽ More

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: 36 pages, 26 figures, 8 tables

  30. arXiv:2406.01495  [pdf, other

    cs.CL

    Re-ReST: Reflection-Reinforced Self-Training for Language Agents

    Authors: Zi-Yi Dou, Cheng-Fu Yang, Xueqing Wu, Kai-Wei Chang, Nanyun Peng

    Abstract: Finetuning language agents with reasoning-action trajectories is effective, but obtaining these trajectories from human annotations or stronger models is costly and sometimes impractical. In this paper, we investigate the use of self-training in language agents, which can generate supervision from the agent itself, offering a promising alternative without relying on human or stronger model demonst… ▽ More

    Submitted 7 July, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

  31. arXiv:2406.00582  [pdf

    eess.SP

    Joint Detection and Classification of Communication and Radar Signals in Congested RF Environments Using YOLOv8

    Authors: Xiwen Kang, Hua-mei Chen, Genshe Chen, Kuo-Chu Chang, Thomas M. Clemons

    Abstract: In this paper, we present a comprehensive study on the application of YOLOv8, a state-of-the-art computer vision (CV) model, to the challenging problem of joint detection and classification of signals in a highly dynamic and congested RF environment. Using our synthetic RF datasets, we explored three different scenarios with congested communication and radar signals. In the first study, we applied… ▽ More

    Submitted 12 August, 2024; v1 submitted 1 June, 2024; originally announced June 2024.

    Comments: Revised from the previous version

  32. arXiv:2405.19716  [pdf, other

    cs.CV cs.CL

    Enhancing Large Vision Language Models with Self-Training on Image Comprehension

    Authors: Yihe Deng, Pan Lu, Fan Yin, Ziniu Hu, Sheng Shen, James Zou, Kai-Wei Chang, Wei Wang

    Abstract: Large vision language models (LVLMs) integrate large language models (LLMs) with pre-trained vision encoders, thereby activating the perception capability of the model to understand image inputs for different queries and conduct subsequent reasoning. Improving this capability requires high-quality vision-language data, which is costly and labor-intensive to acquire. Self-training approaches have b… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 19 pages, 14 figures, 6 tables

  33. arXiv:2405.19315  [pdf, other

    cs.CV cs.CL cs.LG

    Matryoshka Query Transformer for Large Vision-Language Models

    Authors: Wenbo Hu, Zi-Yi Dou, Liunian Harold Li, Amita Kamath, Nanyun Peng, Kai-Wei Chang

    Abstract: Large Vision-Language Models (LVLMs) typically encode an image into a fixed number of visual tokens (e.g., 576) and process these tokens with a language model. Despite their strong performance, LVLMs face challenges in adapting to varying computational constraints. This raises the question: can we achieve flexibility in the number of visual tokens to suit different tasks and computational resource… ▽ More

    Submitted 6 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

    Comments: Preprint. Our code and model are publicly available at https://github.com/gordonhu608/MQT-LLaVA

  34. arXiv:2405.18368  [pdf, other

    cs.CV

    The 2024 Brain Tumor Segmentation (BraTS) Challenge: Glioma Segmentation on Post-treatment MRI

    Authors: Maria Correia de Verdier, Rachit Saluja, Louis Gagnon, Dominic LaBella, Ujjwall Baid, Nourel Hoda Tahon, Martha Foltyn-Dumitru, Jikai Zhang, Maram Alafif, Saif Baig, Ken Chang, Gennaro D'Anna, Lisa Deptula, Diviya Gupta, Muhammad Ammar Haider, Ali Hussain, Michael Iv, Marinos Kontzialis, Paul Manning, Farzan Moodi, Teresa Nunes, Aaron Simon, Nico Sollmann, David Vu, Maruf Adewole , et al. (60 additional authors not shown)

    Abstract: Gliomas are the most common malignant primary brain tumors in adults and one of the deadliest types of cancer. There are many challenges in treatment and monitoring due to the genetic diversity and high intrinsic heterogeneity in appearance, shape, histology, and treatment response. Treatments include surgery, radiation, and systemic therapies, with magnetic resonance imaging (MRI) playing a key r… ▽ More

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: 10 pages, 4 figures, 1 table

  35. Fair Evaluation of Federated Learning Algorithms for Automated Breast Density Classification: The Results of the 2022 ACR-NCI-NVIDIA Federated Learning Challenge

    Authors: Kendall Schmidt, Benjamin Bearce, Ken Chang, Laura Coombs, Keyvan Farahani, Marawan Elbatele, Kaouther Mouhebe, Robert Marti, Ruipeng Zhang, Yao Zhang, Yanfeng Wang, Yaojun Hu, Haochao Ying, Yuyang Xu, Conrad Testagrose, Mutlu Demirer, Vikash Gupta, Ünal Akünal, Markus Bujotzek, Klaus H. Maier-Hein, Yi Qin, Xiaomeng Li, Jayashree Kalpathy-Cramer, Holger R. Roth

    Abstract: The correct interpretation of breast density is important in the assessment of breast cancer risk. AI has been shown capable of accurately predicting breast density, however, due to the differences in imaging characteristics across mammography systems, models built using data from one system do not generalize well to other systems. Though federated learning (FL) has emerged as a way to improve the… ▽ More

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: 16 pages, 9 figures

    Journal ref: Medical Image Analysis Volume 95, July 2024, 103206

  36. arXiv:2405.14842  [pdf, other

    physics.optics astro-ph.IM

    Dual-comb correlation spectroscopy of thermal light

    Authors: Eugene J. Tsao, Alexander J. Lind, Connor Fredrick, Ryan K. Cole, Peter Chang, Kristina F. Chang, Dahyeon Lee, Matthew Heyrich, Nazanin Hoghooghi, Franklyn Quinlan, Scott A. Diddams

    Abstract: The detection of light of thermal origin is the principal means by which humanity has learned about our world and the cosmos. In optical astronomy, in particular, direct detection of thermal photons and the resolution of their spectra have enabled discoveries of the broadest scope and impact. Such measurements, however, do not capture the phase of the thermal fields--a parameter that has proven cr… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: 54 pages, 7 figures

  37. arXiv:2405.13355  [pdf

    cond-mat.mtrl-sci cond-mat.other quant-ph

    Geodesic nature and quantization of shift vector

    Authors: Hua Wang, Kai Chang

    Abstract: We present the geodesic nature and quantization of geometric shift vector in quantum systems, with the parameter space defined by the Bloch momentum, using the Wilson loop approach. Our analysis extends to include bosonic phonon drag shift vectors with non-vertical transitions. We demonstrate that the gauge invariant shift vector can be quantized as integer values, analogous to the Euler character… ▽ More

    Submitted 17 June, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

    Comments: 11 pages, 4 figures

  38. arXiv:2405.11145  [pdf, other

    cs.CV cs.AI cs.MM

    Detecting Multimodal Situations with Insufficient Context and Abstaining from Baseless Predictions

    Authors: Junzhang Liu, Zhecan Wang, Hammad Ayyubi, Haoxuan You, Chris Thomas, Rui Sun, Shih-Fu Chang, Kai-Wei Chang

    Abstract: Despite the widespread adoption of Vision-Language Understanding (VLU) benchmarks such as VQA v2, OKVQA, A-OKVQA, GQA, VCR, SWAG, and VisualCOMET, our analysis reveals a pervasive issue affecting their integrity: these benchmarks contain samples where answers rely on assumptions unsupported by the provided context. Training models on such data foster biased learning and hallucinations as models te… ▽ More

    Submitted 25 May, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

  39. arXiv:2405.10452  [pdf, other

    cs.CL cs.LG

    Navigating Public Sentiment in the Circular Economy through Topic Modelling and Hyperparameter Optimisation

    Authors: Junhao Song, Yingfang Yuan, Kaiwen Chang, Bing Xu, Jin Xuan, Wei Pang

    Abstract: To advance the circular economy (CE), it is crucial to gain insights into the evolution of public sentiments, cognitive pathways of the masses concerning circular products and digital technology, and recognise the primary concerns. To achieve this, we collected data related to the CE from diverse platforms including Twitter, Reddit, and The Guardian. This comprehensive data collection spanned acro… ▽ More

    Submitted 16 May, 2024; originally announced May 2024.

  40. arXiv:2405.08991   

    cs.CV cs.RO

    Theoretical Analysis for Expectation-Maximization-Based Multi-Model 3D Registration

    Authors: David Jin, Harry Zhang, Kai Chang

    Abstract: We perform detailed theoretical analysis of an expectation-maximization-based algorithm recently proposed in for solving a variation of the 3D registration problem, named multi-model 3D registration. Despite having shown superior empirical results, did not theoretically justify the conditions under which the EM approach converges to the ground truth. In this project, we aim to close this gap by es… ▽ More

    Submitted 24 May, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

    Comments: Course project based on a previous submission. Very similar to the submitted conference version. see here: arXiv:2402.10865

  41. arXiv:2405.07387  [pdf, other

    cs.LG

    Semantic Loss Functions for Neuro-Symbolic Structured Prediction

    Authors: Kareem Ahmed, Stefano Teso, Paolo Morettin, Luca Di Liello, Pierfrancesco Ardino, Jacopo Gobbi, Yitao Liang, Eric Wang, Kai-Wei Chang, Andrea Passerini, Guy Van den Broeck

    Abstract: Structured output prediction problems are ubiquitous in machine learning. The prominent approach leverages neural networks as powerful feature extractors, otherwise assuming the independence of the outputs. These outputs, however, jointly encode an object, e.g. a path in a graph, and are therefore related through the structure underlying the output space. We discuss the semantic loss, which inject… ▽ More

    Submitted 12 May, 2024; originally announced May 2024.

    Comments: Preprint of Ch. 22 "Semantic Loss Functions for Neuro-Symbolic Structured Prediction" in "Compendium of Neurosymbolic Artificial Intelligence", https://ebooks.iospress.nl/ISBN/978-1-64368-406-2. arXiv admin note: substantial text overlap with arXiv:2201.11250, arXiv:2007.13197

  42. arXiv:2405.04682  [pdf, other

    cs.CV cs.AI cs.LG

    TALC: Time-Aligned Captions for Multi-Scene Text-to-Video Generation

    Authors: Hritik Bansal, Yonatan Bitton, Michal Yarom, Idan Szpektor, Aditya Grover, Kai-Wei Chang

    Abstract: Recent advances in diffusion-based generative modeling have led to the development of text-to-video (T2V) models that can generate high-quality videos conditioned on a text prompt. Most of these T2V models often produce single-scene video clips that depict an entity performing a particular action (e.g., 'a red panda climbing a tree'). However, it is pertinent to generate multi-scene videos since t… ▽ More

    Submitted 24 May, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

    Comments: 21 pages, 12 figures, 8 tables

  43. arXiv:2404.17779  [pdf, other

    cs.CL

    Medical Vision-Language Pre-Training for Brain Abnormalities

    Authors: Masoud Monajatipoor, Zi-Yi Dou, Aichi Chien, Nanyun Peng, Kai-Wei Chang

    Abstract: Vision-language models have become increasingly powerful for tasks that require an understanding of both visual and linguistic elements, bridging the gap between these modalities. In the context of multimodal clinical AI, there is a growing need for models that possess domain-specific knowledge, as existing models often lack the expertise required for medical applications. In this paper, we take b… ▽ More

    Submitted 27 April, 2024; originally announced April 2024.

  44. arXiv:2404.10965  [pdf, other

    eess.IV

    IMIL: Interactive Medical Image Learning Framework

    Authors: Adrit Rao, Andrea Fisher, Ken Chang, John Christopher Panagides, Katherine McNamara, Joon-Young Lee, Oliver Aalami

    Abstract: Data augmentations are widely used in training medical image deep learning models to increase the diversity and size of sparse datasets. However, commonly used augmentation techniques can result in loss of clinically relevant information from medical images, leading to incorrect predictions at inference time. We propose the Interactive Medical Image Learning (IMIL) framework, a novel approach for… ▽ More

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: CVPR 2024 Workshop on Domain adaptation, Explainability and Fairness in AI for Medical Image Analysis (DEF-AI-MIA)

  45. arXiv:2404.10508  [pdf, other

    cs.CL cs.AI cs.CY

    White Men Lead, Black Women Help? Benchmarking Language Agency Social Biases in LLMs

    Authors: Yixin Wan, Kai-Wei Chang

    Abstract: Language agency is an important aspect of evaluating social biases in texts. While several studies approached agency-related bias in human-written language, very limited research has investigated such biases in Large Language Model (LLM)-generated content. In addition, previous research often relies on string-matching techniques to identify agentic and communal words within texts, which fall short… ▽ More

    Submitted 20 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

  46. arXiv:2404.07376  [pdf, other

    cs.CL

    LLMs in Biomedicine: A study on clinical Named Entity Recognition

    Authors: Masoud Monajatipoor, Jiaxin Yang, Joel Stremmel, Melika Emami, Fazlolah Mohaghegh, Mozhdeh Rouhsedaghat, Kai-Wei Chang

    Abstract: Large Language Models (LLMs) demonstrate remarkable versatility in various NLP tasks but encounter distinct challenges in biomedical due to the complexities of language and data scarcity. This paper investigates LLMs application in the biomedical domain by exploring strategies to enhance their performance for the NER task. Our study reveals the importance of meticulously designed prompts in the bi… ▽ More

    Submitted 11 July, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

  47. arXiv:2404.04763  [pdf, other

    cs.CV cs.AI

    GenEARL: A Training-Free Generative Framework for Multimodal Event Argument Role Labeling

    Authors: Hritik Bansal, Po-Nien Kung, P. Jeffrey Brantingham, Kai-Wei Chang, Nanyun Peng

    Abstract: Multimodal event argument role labeling (EARL), a task that assigns a role for each event participant (object) in an image is a complex challenge. It requires reasoning over the entire image, the depicted event, and the interactions between various objects participating in the event. Existing models heavily rely on high-quality event-annotated training data to understand the event semantics and st… ▽ More

    Submitted 6 April, 2024; originally announced April 2024.

    Comments: 20 pages, 15 Figures, 13 figures

  48. arXiv:2404.03921  [pdf, other

    cs.CL

    Simple Techniques for Enhancing Sentence Embeddings in Generative Language Models

    Authors: Bowen Zhang, Kehua Chang, Chunping Li

    Abstract: Sentence Embedding stands as a fundamental task within the realm of Natural Language Processing, finding extensive application in search engines, expert systems, and question-and-answer platforms. With the continuous evolution of large language models such as LLaMA and Mistral, research on sentence embedding has recently achieved notable breakthroughs. However, these advancements mainly pertain to… ▽ More

    Submitted 15 May, 2024; v1 submitted 5 April, 2024; originally announced April 2024.

    Comments: Accepted by ICIC 2024 (Oral)

  49. arXiv:2404.03414  [pdf, other

    cs.CL cs.AI

    Can Small Language Models Help Large Language Models Reason Better?: LM-Guided Chain-of-Thought

    Authors: Jooyoung Lee, Fan Yang, Thanh Tran, Qian Hu, Emre Barut, Kai-Wei Chang, Chengwei Su

    Abstract: We introduce a novel framework, LM-Guided CoT, that leverages a lightweight (i.e., <1B) language model (LM) for guiding a black-box large (i.e., >10B) LM in reasoning tasks. Specifically, the lightweight LM first generates a rationale for each input instance. The Frozen large LM is then prompted to predict a task output based on the rationale generated by the lightweight LM. Our approach is resour… ▽ More

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: This paper is accepted to LREC-COLING 2024

  50. arXiv:2404.01679  [pdf, other

    cs.CL cs.SI physics.soc-ph

    Event Detection from Social Media for Epidemic Prediction

    Authors: Tanmay Parekh, Anh Mac, Jiarui Yu, Yuxuan Dong, Syed Shahriar, Bonnie Liu, Eric Yang, Kuan-Hao Huang, Wei Wang, Nanyun Peng, Kai-Wei Chang

    Abstract: Social media is an easy-to-access platform providing timely updates about societal trends and events. Discussions regarding epidemic-related events such as infections, symptoms, and social interactions can be crucial for informing policymaking during epidemic outbreaks. In our work, we pioneer exploiting Event Detection (ED) for better preparedness and early warnings of any upcoming epidemic by de… ▽ More

    Submitted 24 May, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: Accepted at NAACL 2024