Showing 1–50 of 513 results for author: Kang, J

Searching in archive cs.
  1. arXiv:2407.13427  [pdf, other]

    cs.CE cs.AI

    DeepClair: Utilizing Market Forecasts for Effective Portfolio Selection

    Authors: Donghee Choi, Jinkyu Kim, Mogan Gim, Jinho Lee, Jaewoo Kang

    Abstract: Utilizing market forecasts is pivotal in optimizing portfolio selection strategies. We introduce DeepClair, a novel framework for portfolio selection. DeepClair leverages a transformer-based time-series forecasting model to predict market trends, facilitating more informed and adaptable portfolio decisions. To integrate the forecasting model into a deep reinforcement learning-driven portfolio sele…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: CIKM 2024 Accepted

  2. arXiv:2407.11275  [pdf, other]

    cs.CV

    M18K: A Comprehensive RGB-D Dataset and Benchmark for Mushroom Detection and Instance Segmentation

    Authors: Abdollah Zakeri, Mulham Fawakherji, Jiming Kang, Bikram Koirala, Venkatesh Balan, Weihang Zhu, Driss Benhaddou, Fatima A. Merchant

    Abstract: Automating agricultural processes holds significant promise for enhancing efficiency and sustainability in various farming practices. This paper contributes to the automation of agricultural processes by providing a dedicated mushroom detection dataset related to automated harvesting, growth monitoring, and quality control of the button mushroom produced using Agaricus Bisporus fungus. With over 1…

    Submitted 15 July, 2024; originally announced July 2024.

  3. arXiv:2407.11036  [pdf, other]

    cs.AI cs.NI

    Hybrid-Generative Diffusion Models for Attack-Oriented Twin Migration in Vehicular Metaverses

    Authors: Yingkai Kang, Jinbo Wen, Jiawen Kang, Tao Zhang, Hongyang Du, Dusit Niyato, Rong Yu, Shengli Xie

    Abstract: The vehicular metaverse is envisioned as a blended immersive domain that promises to bring revolutionary changes to the automotive industry. As a core component of vehicular metaverses, Vehicle Twins (VTs) are digital twins that cover the entire life cycle of vehicles, providing immersive virtual services for Vehicular Metaverse Users (VMUs). Vehicles with limited resources offload the computation…

    Submitted 5 July, 2024; originally announced July 2024.

  4. arXiv:2407.10831  [pdf, other]

    cs.CV

    Temporal Event Stereo via Joint Learning with Stereoscopic Flow

    Authors: Hoonhee Cho, Jae-Young Kang, Kuk-Jin Yoon

    Abstract: Event cameras are dynamic vision sensors inspired by the biological retina, characterized by their high dynamic range, high temporal resolution, and low power consumption. These features make them capable of perceiving 3D environments even in extreme conditions. Event data is continuous across the time dimension, which allows a detailed description of each pixel's movements. To fully utilize the t…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV2024

  5. arXiv:2407.09817  [pdf, other]

    cs.SD cs.CL eess.AS

    Empowering Whisper as a Joint Multi-Talker and Target-Talker Speech Recognition System

    Authors: Lingwei Meng, Jiawen Kang, Yuejiao Wang, Zengrui Jin, Xixin Wu, Xunying Liu, Helen Meng

    Abstract: Multi-talker speech recognition and target-talker speech recognition, both involve transcription in multi-talker contexts, remain significant challenges. However, existing methods rarely attempt to simultaneously address both tasks. In this study, we propose a pioneering approach to empower Whisper, which is a speech foundation model, to tackle joint multi-talker and target-talker speech recogniti…

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: Accepted to INTERSPEECH 2024

  6. arXiv:2407.09541  [pdf, other]

    cs.CL cs.AI cs.CV

    MATE: Meet At The Embedding -- Connecting Images with Long Texts

    Authors: Young Kyun Jang, Junmo Kang, Yong Jae Lee, Donghyun Kim

    Abstract: While advancements in Vision Language Models (VLMs) have significantly improved the alignment of visual and textual data, these models primarily focus on aligning images with short descriptive captions. This focus limits their ability to handle complex text interactions, particularly with longer texts such as lengthy captions or documents, which have not been extensively explored yet. In this pape…

    Submitted 26 June, 2024; originally announced July 2024.

  7. arXiv:2407.09014  [pdf, other]

    cs.CL

    CompAct: Compressing Retrieved Documents Actively for Question Answering

    Authors: Chanwoong Yoon, Taewhoo Lee, Hyeon Hwang, Minbyul Jeong, Jaewoo Kang

    Abstract: Retrieval-augmented generation supports language models to strengthen their factual groundings by providing external contexts. However, language models often face challenges when given extensive information, diminishing their effectiveness in solving questions. Context compression tackles this issue by filtering out irrelevant information, but current methods still struggle in realistic scenarios…

    Submitted 15 July, 2024; v1 submitted 12 July, 2024; originally announced July 2024.

    Comments: Code available at https://github.com/dmis-lab/CompAct

  8. arXiv:2407.07475  [pdf, ps, other]

    cs.NI

    Learning-based Power Control for Secure Covert Semantic Communication

    Authors: Yansheng Liu, Jinbo Wen, Zongyao Zhang, Kun Zhu, Jiawen Kang

    Abstract: Despite progress in semantic communication (SemCom), research on SemCom security is still in its infancy. To bridge this gap, we propose a general covert SemCom framework for wireless networks, reducing eavesdropping risk. Our approach transmits semantic information covertly, making it difficult for wardens to detect. Given the aim of maximizing covert SemCom performance, we formulate a power cont…

    Submitted 10 July, 2024; originally announced July 2024.

  9. arXiv:2407.05744  [pdf, other]

    eess.AS cs.SD

    Automating Urban Soundscape Enhancements with AI: In-situ Assessment of Quality and Restorativeness in Traffic-Exposed Residential Areas

    Authors: Bhan Lam, Zhen-Ting Ong, Kenneth Ooi, Wen-Hui Ong, Trevor Wong, Karn N. Watcharasupat, Vanessa Boey, Irene Lee, Joo Young Hong, Jian Kang, Kar Fye Alvin Lee, Georgios Christopoulos, Woon-Seng Gan

    Abstract: Formalized in ISO 12913, the "soundscape" approach is a paradigmatic shift towards perception-based urban sound management, aiming to alleviate the substantial socioeconomic costs of noise pollution to advance the United Nations Sustainable Development Goals. Focusing on traffic-exposed outdoor residential sites, we implemented an automatic masker selection system (AMSS) utilizing natural sounds t…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: 41 pages, 4 figures. Preprint submitted to an Elsevier journal

  10. arXiv:2407.05520  [pdf, ps, other]

    cs.LG stat.ML

    A Theory of Machine Learning

    Authors: Jinsook Kim, Jinho Kang

    Abstract: We critically review three major theories of machine learning and provide a new theory according to which machines learn a function when the machines successfully compute it. We show that this theory challenges common assumptions in the statistical and the computational learning theories, for it implies that learning true probabilities is equivalent neither to obtaining a correct calculation of th…

    Submitted 7 July, 2024; originally announced July 2024.

  11. arXiv:2407.04367  [pdf, ps, other]

    math.CO cs.DS

    Reconfiguration of Independent Transversals

    Authors: Pjotr Buys, Ross J. Kang, Kenta Ozeki

    Abstract: Given integers $Δ\ge 2$ and $t\ge 2Δ$, suppose there is a graph of maximum degree $Δ$ and a partition of its vertices into blocks of size at least $t$. By a seminal result of Haxell, there must be some independent set of the graph that is transversal to the blocks, a so-called independent transversal. We show that, if moreover $t\ge2Δ+1$, then every independent transversal can be transformed withi…

    Submitted 5 July, 2024; originally announced July 2024.

    MSC Class: 05C35; 05C69; 05C15; 68R05; 68R10

  12. arXiv:2407.04345  [pdf, other]

    cs.CV

    CanonicalFusion: Generating Drivable 3D Human Avatars from Multiple Images

    Authors: Jisu Shin, Junmyeong Lee, Seongmin Lee, Min-Gyu Park, Ju-Mi Kang, Ju Hong Yoon, Hae-Gon Jeon

    Abstract: We present a novel framework for reconstructing animatable human avatars from multiple images, termed CanonicalFusion. Our central concept involves integrating individual reconstruction results into the canonical space. To be specific, we first predict Linear Blend Skinning (LBS) weight maps and depth maps using a shared-encoder-dual-decoder network, enabling direct canonicalization of the 3D mesh…

    Submitted 15 July, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

    Comments: ECCV 2024 Accepted (18 pages, 9 figures)

  13. arXiv:2407.02945  [pdf, other]

    cs.CV

    VEGS: View Extrapolation of Urban Scenes in 3D Gaussian Splatting using Learned Priors

    Authors: Sungwon Hwang, Min-Jung Kim, Taewoong Kang, Jayeon Kang, Jaegul Choo

    Abstract: Neural rendering-based urban scene reconstruction methods commonly rely on images collected from driving vehicles with cameras facing and moving forward. Although these methods can successfully synthesize from views similar to training camera trajectory, directing the novel view outside the training camera distribution does not guarantee on-par performance. In this paper, we tackle the Extrapolate…

    Submitted 13 July, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

    Comments: The first two authors contributed equally. Project Page: https://vegs3d.github.io/

  14. arXiv:2407.01850  [pdf, other]

    cs.CL

    Purple-teaming LLMs with Adversarial Defender Training

    Authors: Jingyan Zhou, Kun Li, Junan Li, Jiawen Kang, Minda Hu, Xixin Wu, Helen Meng

    Abstract: Existing efforts in safeguarding LLMs are limited in actively exposing the vulnerabilities of the target LLM and readily adapting to newly emerging safety risks. To address this, we present Purple-teaming LLMs with Adversarial Defender training (PAD), a pipeline designed to safeguard LLMs by novelly incorporating the red-teaming (attack) and blue-teaming (safety training) techniques. In PAD, we au…

    Submitted 1 July, 2024; originally announced July 2024.

  15. arXiv:2407.00978  [pdf, other]

    cs.AI cs.LG

    Hybrid RAG-empowered Multi-modal LLM for Secure Healthcare Data Management: A Diffusion-based Contract Theory Approach

    Authors: Cheng Su, Jinbo Wen, Jiawen Kang, Yonghua Wang, Hudan Pan, M. Shamim Hossain

    Abstract: Secure data management and effective data sharing have become paramount in the rapidly evolving healthcare landscape. The advancement of generative artificial intelligence has positioned Multi-modal Large Language Models (MLLMs) as crucial tools for managing healthcare data. MLLMs can support multi-modal inputs and generate diverse types of content by leveraging large-scale training on vast amount…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 12 pages, 6 figures

  16. arXiv:2406.18864  [pdf, other]

    cs.CV

    Learning Modality Knowledge Alignment for Cross-Modality Transfer

    Authors: Wenxuan Ma, Shuang Li, Lincan Cai, Jingxuan Kang

    Abstract: Cross-modality transfer aims to leverage large pretrained models to complete tasks that may not belong to the modality of pretraining data. Existing works achieve certain success in extending classical finetuning to cross-modal scenarios, yet we still lack understanding about the influence of modality gap on the transfer. In this work, a series of experiments focusing on the source representation…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: ICML 2024

  17. arXiv:2406.18763  [pdf, other]

    cs.LG cs.AI

    Conformalized Link Prediction on Graph Neural Networks

    Authors: Tianyi Zhao, Jian Kang, Lu Cheng

    Abstract: Graph Neural Networks (GNNs) excel in diverse tasks, yet their applications in high-stakes domains are often hampered by unreliable predictions. Although numerous uncertainty quantification methods have been proposed to address this limitation, they often lack rigorous uncertainty estimates. This work makes the first attempt to introduce a distribution-free and model-agnostic uncertainty…

    Submitted 26 June, 2024; originally announced June 2024.

  18. arXiv:2406.16030  [pdf, other]

    cs.CL cs.AI

    Zero-Shot Cross-Lingual NER Using Phonemic Representations for Low-Resource Languages

    Authors: Jimin Sohn, Haeji Jung, Alex Cheng, Jooeon Kang, Yilin Du, David R. Mortensen

    Abstract: Existing zero-shot cross-lingual NER approaches require substantial prior knowledge of the target language, which is impractical for low-resource languages. In this paper, we propose a novel approach to NER using phonemic representation based on the International Phonetic Alphabet (IPA) to bridge the gap between representations of different languages. Our experiments show that our method significa…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: 7 pages, 5 figures, 5 tables

  19. arXiv:2406.15664  [pdf, other]

    stat.ML cs.LG

    Flat Posterior Does Matter For Bayesian Transfer Learning

    Authors: Sungjun Lim, Jeyoon Yeom, Sooyon Kim, Hoyoon Byun, Jinho Kang, Yohan Jung, Jiyoung Jung, Kyungwoo Song

    Abstract: The large-scale pre-trained neural network has achieved notable success in enhancing performance for downstream tasks. Another promising approach for generalization is Bayesian Neural Network (BNN), which integrates Bayesian methods into neural network architectures, offering advantages such as Bayesian Model averaging (BMA) and uncertainty quantification. Despite these benefits, transfer learning…

    Submitted 21 June, 2024; originally announced June 2024.

  20. arXiv:2406.14333  [pdf, other]

    cs.IR cs.SD eess.AS

    LARP: Language Audio Relational Pre-training for Cold-Start Playlist Continuation

    Authors: Rebecca Salganik, Xiaohao Liu, Yunshan Ma, Jian Kang, Tat-Seng Chua

    Abstract: As online music consumption increasingly shifts towards playlist-based listening, the task of playlist continuation, in which an algorithm suggests songs to extend a playlist in a personalized and musically cohesive manner, has become vital to the success of music streaming. Currently, many existing playlist continuation approaches rely on collaborative filtering methods to perform recommendation.…

    Submitted 20 June, 2024; originally announced June 2024.

  21. arXiv:2406.13964  [pdf, other]

    cs.NI

    Hierarchical Micro-Segmentations for Zero-Trust Services via Large Language Model (LLM)-enhanced Graph Diffusion

    Authors: Yinqiu Liu, Guangyuan Liu, Hongyang Du, Dusit Niyato, Jiawen Kang, Zehui Xiong, Dong In Kim, Xuemin Shen

    Abstract: In the rapidly evolving Next-Generation Networking (NGN) era, the adoption of zero-trust architectures has become increasingly crucial to protect security. However, provisioning zero-trust services in NGNs poses significant challenges, primarily due to the environmental complexity and dynamics. Motivated by these challenges, this paper explores efficient zero-trust service provisioning using hiera…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 13 pages

  22. arXiv:2406.12034  [pdf, other]

    cs.CL cs.LG

    Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts

    Authors: Junmo Kang, Leonid Karlinsky, Hongyin Luo, Zhen Wang, Jacob Hansen, James Glass, David Cox, Rameswar Panda, Rogerio Feris, Alan Ritter

    Abstract: We present Self-MoE, an approach that transforms a monolithic LLM into a compositional, modular system of self-specialized experts, named MiXSE (MiXture of Self-specialized Experts). Our approach leverages self-specialization, which constructs expert modules using self-generated synthetic data, each equipped with a shared base LLM and incorporating self-optimized routing. This allows for dynamic a…

    Submitted 17 June, 2024; originally announced June 2024.

  23. arXiv:2406.11208  [pdf]

    cs.NI

    Privacy-preserving Pseudonym Schemes for Personalized 3D Avatars in Mobile Social Metaverses

    Authors: Cheng Su, Xiaofeng Luo, Zhenmou Liu, Jiawen Kang, Min Hao, Zehui Xiong, Zhaohui Yang, Chongwen Huang

    Abstract: The emergence of mobile social metaverses, a novel paradigm bridging physical and virtual realms, has led to the widespread adoption of avatars as digital representations for Social Metaverse Users (SMUs) within virtual spaces. Equipped with immersive devices, SMUs leverage Edge Servers (ESs) to deploy their avatars and engage with other SMUs in virtual spaces. To enhance immersion, SMUs incline t…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 6 pages, 4 figures

  24. arXiv:2406.10671  [pdf]

    cs.CL

    Augmenting Biomedical Named Entity Recognition with General-domain Resources

    Authors: Yu Yin, Hyunjae Kim, Xiao Xiao, Chih Hsuan Wei, Jaewoo Kang, Zhiyong Lu, Hua Xu, Meng Fang, Qingyu Chen

    Abstract: Training a neural network-based biomedical named entity recognition (BioNER) model usually requires extensive and costly human annotations. While several studies have employed multi-task learning with multiple BioNER datasets to reduce human effort, this approach does not consistently yield performance improvements and may introduce label ambiguity in different biomedical corpora. We aim to tackle…

    Submitted 18 June, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

    Comments: We make data, codes, and models publicly available via https://github.com/qingyu-qc/bioner_gerbera

  25. arXiv:2406.10382  [pdf, other]

    cs.AI cs.CL

    Efficient Prompting for LLM-based Generative Internet of Things

    Authors: Bin Xiao, Burak Kantarci, Jiawen Kang, Dusit Niyato, Mohsen Guizani

    Abstract: Large language models (LLMs) have demonstrated remarkable capacities on various tasks, and integrating the capacities of LLMs into the Internet of Things (IoT) applications has drawn much research attention recently. Due to security concerns, many institutions avoid accessing state-of-the-art commercial LLM services, requiring the deployment and utilization of open-source LLMs in a local network s…

    Submitted 17 June, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

    Comments: 13 pages, 11 figures

  26. arXiv:2406.09003  [pdf, other]

    cs.CV cs.LG

    Enhancing Cross-Modal Fine-Tuning with Gradually Intermediate Modality Generation

    Authors: Lincan Cai, Shuang Li, Wenxuan Ma, Jingxuan Kang, Binhui Xie, Zixun Sun, Chengwei Zhu

    Abstract: Large-scale pretrained models have proven immensely valuable in handling data-intensive modalities like text and image. However, fine-tuning these models for certain specialized modalities, such as protein sequence and cosmic ray, poses challenges due to the significant modality discrepancy and scarcity of labeled data. In this paper, we propose an end-to-end method, PaRe, to enhance cross-modal f…

    Submitted 13 June, 2024; originally announced June 2024.

  27. arXiv:2406.08809  [pdf, other]

    cs.SD cs.AI eess.AS

    Are we there yet? A brief survey of Music Emotion Prediction Datasets, Models and Outstanding Challenges

    Authors: Jaeyong Kang, Dorien Herremans

    Abstract: Deep learning models for music have advanced drastically in the last few years. But how good are machine learning models at capturing emotion these days and what challenges are researchers facing? In this paper, we provide a comprehensive overview of the available music-emotion datasets and discuss evaluation standards as well as competitions in the field. We also provide a brief overview of vario…

    Submitted 13 June, 2024; originally announced June 2024.

  28. arXiv:2406.05914  [pdf, other]

    eess.AS cs.SD eess.SP

    Soundscape Captioning using Sound Affective Quality Network and Large Language Model

    Authors: Yuanbo Hou, Qiaoqiao Ren, Andrew Mitchell, Wenwu Wang, Jian Kang, Tony Belpaeme, Dick Botteldooren

    Abstract: We live in a rich and varied acoustic world, which is experienced by individuals or communities as a soundscape. Computational auditory scene analysis, disentangling acoustic scenes by detecting and classifying events, focuses on objective attributes of sounds, such as their category and temporal characteristics, ignoring the effect of sounds on people and failing to explore the relationship betwe…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: Code: https://github.com/Yuanbo2020/SoundSCaper

  29. arXiv:2406.05422  [pdf, other]

    cs.AI cs.RO

    Diffusion-based Reinforcement Learning for Dynamic UAV-assisted Vehicle Twins Migration in Vehicular Metaverses

    Authors: Yongju Tong, Jiawen Kang, Junlong Chen, Minrui Xu, Gaolei Li, Weiting Zhang, Xincheng Yan

    Abstract: Air-ground integrated networks can relieve communication pressure on ground transportation networks and provide 6G-enabled vehicular Metaverses services offloading in remote areas with sparse RoadSide Units (RSUs) coverage and downtown areas where users have a high demand for vehicular services. Vehicle Twins (VTs) are the digital twins of physical vehicles to enable more immersive and realistic v…

    Submitted 8 June, 2024; originally announced June 2024.

  30. arXiv:2406.05418  [pdf, other]

    cs.AI cs.NI

    Multi-attribute Auction-based Resource Allocation for Twins Migration in Vehicular Metaverses: A GPT-based DRL Approach

    Authors: Yongju Tong, Junlong Chen, Minrui Xu, Jiawen Kang, Zehui Xiong, Dusit Niyato, Chau Yuen, Zhu Han

    Abstract: Vehicular Metaverses are developed to enhance the modern automotive industry with an immersive and safe experience among connected vehicles and roadside infrastructures, e.g., RoadSide Units (RSUs). For seamless synchronization with virtual spaces, Vehicle Twins (VTs) are constructed as digital representations of physical entities. However, resource-intensive VTs updating and high mobility of vehi…

    Submitted 8 June, 2024; originally announced June 2024.

    Comments: 16 pages, 6 figures, 3 tables

  31. arXiv:2406.00014  [pdf, other]

    cs.DB cs.AI cs.CL cs.IR

    KU-DMIS at EHRSQL 2024: Generating SQL query via question templatization in EHR

    Authors: Hajung Kim, Chanhwi Kim, Hoonick Lee, Kyochul Jang, Jiwoo Lee, Kyungjae Lee, Gangwoo Kim, Jaewoo Kang

    Abstract: Transforming natural language questions into SQL queries is crucial for precise data retrieval from electronic health record (EHR) databases. A significant challenge in this process is detecting and rejecting unanswerable questions that request information beyond the database's scope or exceed the system's capabilities. In this paper, we introduce a novel text-to-SQL framework that robustly handle…

    Submitted 19 June, 2024; v1 submitted 21 May, 2024; originally announced June 2024.

    Comments: Published at ClinicalNLP workshop @ NAACL 2024

  32. arXiv:2405.20568  [pdf, other]

    cs.LG cs.NI

    Generative AI for Deep Reinforcement Learning: Framework, Analysis, and Use Cases

    Authors: Geng Sun, Wenwen Xie, Dusit Niyato, Fang Mei, Jiawen Kang, Hongyang Du, Shiwen Mao

    Abstract: As a form of artificial intelligence (AI) technology based on interactive learning, deep reinforcement learning (DRL) has been widely applied across various fields and has achieved remarkable accomplishments. However, DRL faces certain limitations, including low sample efficiency and poor generalization. Therefore, we present how to leverage generative AI (GAI) to address these issues above and en…

    Submitted 30 May, 2024; originally announced May 2024.

  33. arXiv:2405.20567  [pdf, other]

    cs.RO

    Fast Decentralized State Estimation for Legged Robot Locomotion via EKF and MHE

    Authors: Jiarong Kang, Yi Wang, Xiaobin Xiong

    Abstract: In this paper, we present a fast and decentralized state estimation framework for the control of legged locomotion. The nonlinear estimation of the floating base states is decentralized to an orientation estimation via Extended Kalman Filter (EKF) and a linear velocity estimation via Moving Horizon Estimation (MHE). The EKF fuses the inertia sensor with vision to estimate the floating base orienta…

    Submitted 30 May, 2024; originally announced May 2024.

  34. arXiv:2405.16265  [pdf, other]

    cs.LG

    MindStar: Enhancing Math Reasoning in Pre-trained LLMs at Inference Time

    Authors: Jikun Kang, Xin Zhe Li, Xi Chen, Amirreza Kazemi, Qianyi Sun, Boxing Chen, Dong Li, Xu He, Quan He, Feng Wen, Jianye Hao, Jun Yao

    Abstract: Although Large Language Models (LLMs) achieve remarkable performance across various tasks, they often struggle with complex reasoning tasks, such as answering mathematical questions. Recent efforts to address this issue have primarily focused on leveraging mathematical datasets through supervised fine-tuning or self-improvement techniques. However, these methods often depend on high-quality datase…

    Submitted 26 June, 2024; v1 submitted 25 May, 2024; originally announced May 2024.

  35. arXiv:2405.14222  [pdf, other]

    cs.LG cs.CV eess.IV

    RAQ-VAE: Rate-Adaptive Vector-Quantized Variational Autoencoder

    Authors: Jiwan Seo, Joonhyuk Kang

    Abstract: Vector Quantized Variational AutoEncoder (VQ-VAE) is an established technique in machine learning for learning discrete representations across various modalities. However, its scalability and applicability are limited by the need to retrain the model to adjust the codebook for different data or model scales. We introduce the Rate-Adaptive VQ-VAE (RAQ-VAE) framework, which addresses this challenge…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: Under review

  36. arXiv:2405.12701  [pdf, other]

    cs.CL cs.AI

    OLAPH: Improving Factuality in Biomedical Long-form Question Answering

    Authors: Minbyul Jeong, Hyeon Hwang, Chanwoong Yoon, Taewhoo Lee, Jaewoo Kang

    Abstract: In the medical domain, numerous scenarios necessitate the long-form generation ability of large language models (LLMs). Specifically, when addressing patients' questions, it is essential that the model's response conveys factual claims, highlighting the need for an automated method to evaluate those claims. Thus, we introduce MedLFQA, a benchmark dataset reconstructed using long-form question-answ…

    Submitted 21 May, 2024; originally announced May 2024.

  37. arXiv:2405.12472  [pdf, ps, other]

    cs.NI

    Optimizing Generative AI Networking: A Dual Perspective with Multi-Agent Systems and Mixture of Experts

    Authors: Ruichen Zhang, Hongyang Du, Dusit Niyato, Jiawen Kang, Zehui Xiong, Ping Zhang, Dong In Kim

    Abstract: In the continued development of next-generation networking and artificial intelligence content generation (AIGC) services, the integration of multi-agent systems (MAS) and the mixture of experts (MoE) frameworks is becoming increasingly important. Motivated by this, this article studies the contrasting and converging of MAS and MoE in AIGC-enabled networking. First, we discuss the architectural de…

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: 9 pages, 4 figures

  38. arXiv:2405.11473  [pdf, other]

    cs.CV cs.AI

    FIFO-Diffusion: Generating Infinite Videos from Text without Training

    Authors: Jihwan Kim, Junoh Kang, Jinyoung Choi, Bohyung Han

    Abstract: We propose a novel inference technique based on a pretrained diffusion model for text-conditional video generation. Our approach, called FIFO-Diffusion, is conceptually capable of generating infinitely long videos without additional training. This is achieved by iteratively performing diagonal denoising, which concurrently processes a series of consecutive frames with increasing noise levels in a…

    Submitted 12 June, 2024; v1 submitted 19 May, 2024; originally announced May 2024.

    Comments: Project Page: https://jjihwan.github.io/projects/FIFO-Diffusion

  39. arXiv:2405.04907  [pdf, other]

    cs.NI

    Empowering Wireless Networks with Artificial Intelligence Generated Graph

    Authors: Jiacheng Wang, Yinqiu Liu, Hongyang Du, Dusit Niyato, Jiawen Kang, Haibo Zhou, Dong In Kim

    Abstract: In wireless communications, transforming network into graphs and processing them using deep learning models, such as Graph Neural Networks (GNNs), is one of the mainstream network optimization approaches. While effective, the generative AI (GAI) shows stronger capabilities in graph analysis, processing, and generation, than conventional methods such as GNN, offering a broader exploration space for…

    Submitted 8 May, 2024; originally announced May 2024.

  40. arXiv:2405.04198  [pdf, other]

    cs.CR

    Enhancing Physical Layer Communication Security through Generative AI with Mixture of Experts

    Authors: Changyuan Zhao, Hongyang Du, Dusit Niyato, Jiawen Kang, Zehui Xiong, Dong In Kim, Xuemin Shen, Khaled B. Letaief

    Abstract: AI technologies have become more widely adopted in wireless communications. As an emerging type of AI technologies, the generative artificial intelligence (GAI) gains lots of attention in communication security. Due to its powerful learning ability, GAI models have demonstrated superiority over conventional AI methods. However, GAI still has several limitations, including high computational comple…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: 9 pages, 4 figures

  41. arXiv:2405.00523  [pdf, other]

    cs.AI cs.CL

    CookingSense: A Culinary Knowledgebase with Multidisciplinary Assertions

    Authors: Donghee Choi, Mogan Gim, Donghyeon Park, Mujeen Sung, Hyunjae Kim, Jaewoo Kang, Jihun Choi

    Abstract: This paper introduces CookingSense, a descriptive collection of knowledge assertions in the culinary domain extracted from various sources, including web data, scientific papers, and recipes, from which knowledge covering a broad range of aspects is acquired. CookingSense is constructed through a series of dictionary-based filtering and language model-based semantic filtering techniques, which res…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: LREC-COLING 2024 Accepted

  42. arXiv:2404.18881  [pdf, other]

    cs.HC cs.LG cs.SE

    Human-in-the-Loop Synthetic Text Data Inspection with Provenance Tracking

    Authors: Hong Jin Kang, Fabrice Harel-Canada, Muhammad Ali Gulzar, Violet Peng, Miryung Kim

    Abstract: Data augmentation techniques apply transformations to existing texts to generate additional data. The transformations may produce low-quality texts, where the meaning of the text is changed and the text may even be mangled beyond human comprehension. Analyzing the synthetically generated texts and their corresponding labels is slow and demanding. To winnow out texts with incorrect labels, we devel…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: NAACL 2024 Findings

  43. arXiv:2404.18756  [pdf, other]

    cs.SE cs.PL

    K-CIRCT: A Layered, Composable, and Executable Formal Semantics for CIRCT Hardware IRs

    Authors: Jianhong Zhao, Jinhui Kang, Yongwang Zhao

    Abstract: CIRCT, an open-source EDA framework akin to LLVM for software, is a foundation for various hardware description languages. Despite its crucial role, CIRCT's lack of formal semantics challenges necessary rigorous hardware verification. Thus, this paper introduces K-CIRCT, the first formal semantics in K for a substantial CIRCT subset adequate for simulating a RISC-V processor. Our semantics are…

    Submitted 29 April, 2024; originally announced April 2024.

  44. arXiv:2404.18077  [pdf, other]

    cs.NI cs.LG

    Generative AI for Low-Carbon Artificial Intelligence of Things with Large Language Models

    Authors: Jinbo Wen, Ruichen Zhang, Dusit Niyato, Jiawen Kang, Hongyang Du, Yang Zhang, Zhu Han

    Abstract: By integrating Artificial Intelligence (AI) with the Internet of Things (IoT), Artificial Intelligence of Things (AIoT) has revolutionized many fields. However, AIoT is facing the challenges of energy consumption and carbon emissions due to the continuous advancement of mobile technology. Fortunately, Generative AI (GAI) holds immense potential to reduce carbon emissions of AIoT due to its excelle…

    Submitted 17 July, 2024; v1 submitted 28 April, 2024; originally announced April 2024.

  45. arXiv:2404.16947  [pdf, other]

    cs.SE

    Fuzzing MLIR by Synthesizing Custom Mutations

    Authors: Ben Limpanukorn, Jiyuan Wang, Hong Jin Kang, Eric Zitong Zhou, Miryung Kim

    Abstract: Multi-Level Intermediate Representation (MLIR) is an effort to enable faster compiler development by providing an extensible framework for downstream developers to define custom IRs with MLIR dialects. MLIR dialects define new IRs that are tailored for specific domains. The diversity and rapid evolution of these IRs make it impractical to pre-define custom generator logic for every available diale…

    Submitted 25 April, 2024; originally announced April 2024.

  46. arXiv:2404.16356  [pdf, other]

    cs.NI cs.AI cs.LG

    Integration of Mixture of Experts and Multimodal Generative AI in Internet of Vehicles: A Survey

    Authors: Minrui Xu, Dusit Niyato, Jiawen Kang, Zehui Xiong, Abbas Jamalipour, Yuguang Fang, Dong In Kim, Xuemin Shen

    Abstract: Generative AI (GAI) can enhance the cognitive, reasoning, and planning capabilities of intelligent modules in the Internet of Vehicles (IoV) by synthesizing augmented datasets, completing sensor data, and making sequential decisions. In addition, the mixture of experts (MoE) can enable the distributed and collaborative execution of AI models without performance degradation between connected vehicl…

    Submitted 25 April, 2024; originally announced April 2024.

  47. arXiv:2404.15292  [pdf, other]

    eess.SP cs.IT

    Multi-objective Optimization for Multi-UAV-assisted Mobile Edge Computing

    Authors: Geng Sun, Yixian Wang, Zemin Sun, Qingqing Wu, Jiawen Kang, Dusit Niyato, Victor C. M. Leung

    Abstract: Recent developments in unmanned aerial vehicles (UAVs) and mobile edge computing (MEC) have provided users with flexible and resilient computing services. However, meeting the computing-intensive and latency-sensitive demands of users poses a significant challenge due to the limited resources of UAVs. To address this challenge, we present a multi-objective optimization approach for multi-UAV-assis…

    Submitted 23 March, 2024; originally announced April 2024.

  48. arXiv:2404.14687  [pdf, other]

    cs.MM cs.AI cs.CL cs.CV

    Pegasus-v1 Technical Report

    Authors: Raehyuk Jung, Hyojun Go, Jaehyuk Yi, Jiho Jang, Daniel Kim, Jay Suh, Aiden Lee, Cooper Han, Jae Lee, Jeff Kim, Jin-Young Kim, Junwan Kim, Kyle Park, Lucas Lee, Mars Ha, Minjoon Seo, Abraham Jo, Ed Park, Hassan Kianinejad, SJ Kim, Tony Moon, Wade Jeong, Andrei Popescu, Esther Kim, EK Yoon, et al. (19 additional authors not shown)

    Abstract: This technical report introduces Pegasus-1, a multimodal language model specialized in video content understanding and interaction through natural language. Pegasus-1 is designed to address the unique challenges posed by video data, such as interpreting spatiotemporal information, to offer nuanced video content comprehension across various lengths. This technical report overviews Pegasus-1's archi…

    Submitted 22 April, 2024; originally announced April 2024.

  49. arXiv:2404.13919  [pdf, other]

    cs.CL cs.AI cs.HC

    Navigating the Path of Writing: Outline-guided Text Generation with Large Language Models

    Authors: Yukyung Lee, Soonwon Ka, Bokyung Son, Pilsung Kang, Jaewook Kang

    Abstract: Large Language Models (LLMs) have significantly impacted the writing process, enabling collaborative content creation and enhancing productivity. However, generating high-quality, user-aligned text remains challenging. In this paper, we propose Writing Path, a framework that uses explicit outlines to guide LLMs in generating goal-oriented, high-quality pieces of writing. Our approach draws inspira…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: under review

  50. arXiv:2404.13898  [pdf, other]

    cs.NI

    Cross-Modal Generative Semantic Communications for Mobile AIGC: Joint Semantic Encoding and Prompt Engineering

    Authors: Yinqiu Liu, Hongyang Du, Dusit Niyato, Jiawen Kang, Zehui Xiong, Shiwen Mao, Ping Zhang, Xuemin Shen

    Abstract: Employing massive Mobile AI-Generated Content (AIGC) Service Providers (MASPs) with powerful models, high-quality AIGC services can become accessible for resource-constrained end users. However, this advancement, referred to as mobile AIGC, also introduces a significant challenge: users should download large AIGC outputs from the MASPs, leading to substantial bandwidth consumption and potential tr…

    Submitted 22 April, 2024; originally announced April 2024.