
Showing 1–50 of 970 results for author: Park, S

Searching in archive cs.
  1. arXiv:2407.11682  [pdf, other]

    cs.CV

    MapDistill: Boosting Efficient Camera-based HD Map Construction via Camera-LiDAR Fusion Model Distillation

    Authors: Xiaoshuai Hao, Ruikai Li, Hui Zhang, Dingzhe Li, Rong Yin, Sangil Jung, Seung-In Park, ByungIn Yoo, Haimei Zhao, Jing Zhang

    Abstract: Online high-definition (HD) map construction is an important and challenging task in autonomous driving. Recently, there has been a growing interest in cost-effective multi-view camera-based methods without relying on other sensors like LiDAR. However, these methods suffer from a lack of explicit depth information, necessitating the use of large models to achieve satisfactory performance. To addre…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  2. arXiv:2407.11057  [pdf, other]

    cs.LG cs.AI q-bio.BM

    SPIN: SE(3)-Invariant Physics Informed Network for Binding Affinity Prediction

    Authors: Seungyeon Choi, Sangmin Seo, Sanghyun Park

    Abstract: Accurate prediction of protein-ligand binding affinity is crucial for rapid and efficient drug development. Recently, the importance of predicting binding affinity has led to increased attention on research that models the three-dimensional structure of protein-ligand complexes using graph neural networks to predict binding affinity. However, traditional methods often fail to accurately model the…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: Accepted to ECAI 2024

  3. arXiv:2407.10476  [pdf, other]

    cs.CV cs.AI

    Kinetic Typography Diffusion Model

    Authors: Seonmi Park, Inhwan Bae, Seunghyun Shin, Hae-Gon Jeon

    Abstract: This paper introduces a method for realistic kinetic typography that generates user-preferred animatable 'text content'. We draw on recent advances in guided video diffusion models to achieve visually-pleasing text appearances. To do this, we first construct a kinetic typography dataset, comprising about 600K videos. Our dataset is made from a variety of combinations in 584 templates designed by p…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted at ECCV 2024, Project page: https://seonmip.github.io/kinety

  4. arXiv:2407.06851  [pdf, other]

    cs.CL

    Safe-Embed: Unveiling the Safety-Critical Knowledge of Sentence Encoders

    Authors: Jinseok Kim, Jaewon Jung, Sangyeop Kim, Sohyung Park, Sungzoon Cho

    Abstract: Despite the impressive capabilities of Large Language Models (LLMs) in various tasks, their vulnerability to unsafe prompts remains a critical issue. These prompts can lead LLMs to generate responses on illegal or sensitive topics, posing a significant threat to their safe and ethical use. Existing approaches attempt to address this issue using classification models, but they have several drawback…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: ACL 2024 KnowledgeableLMs workshop paper

  5. arXiv:2407.06614  [pdf, other]

    eess.IV cs.CV

    Implicit Regression in Subspace for High-Sensitivity CEST Imaging

    Authors: Chu Chen, Yang Liu, Se Weon Park, Jizhou Li, Kannie W. Y. Chan, Raymond H. F. Chan

    Abstract: Chemical Exchange Saturation Transfer (CEST) MRI demonstrates its capability in significantly enhancing the detection of proteins and metabolites with low concentrations through exchangeable protons. The clinical application of CEST, however, is constrained by its low contrast and low signal-to-noise ratio (SNR) in the acquired data. Denoising, as one of the post-processing stages for CEST data, c…

    Submitted 9 July, 2024; originally announced July 2024.

  6. arXiv:2407.06328  [pdf, other]

    cs.MA eess.SY

    Learning Equilibrium with Estimated Payoffs in Population Games

    Authors: Shinkyu Park

    Abstract: We study a multi-agent decision problem in population games, where agents select from multiple available strategies and continually revise their selections based on the payoffs associated with these strategies. Unlike conventional population game formulations, we consider a scenario where agents must estimate the payoffs through local measurements and communication with their neighbors. By employi…

    Submitted 8 July, 2024; originally announced July 2024.

  7. arXiv:2407.05527  [pdf, other]

    cs.CV cs.LG eess.IV

    Rethinking Image Skip Connections in StyleGAN2

    Authors: Seung Park, Yong-Goo Shin

    Abstract: Various models based on StyleGAN have gained significant traction in the field of image synthesis, attributed to their robust training stability and superior performances. Within the StyleGAN framework, the adoption of image skip connection is favored over the traditional residual connection. However, this preference is just based on empirical observations; there has not been any in-depth mathemat…

    Submitted 7 July, 2024; originally announced July 2024.

  8. arXiv:2407.03051  [pdf, other]

    cs.CL

    Improving Conversational Abilities of Quantized Large Language Models via Direct Preference Alignment

    Authors: Janghwan Lee, Seongmin Park, Sukjin Hong, Minsoo Kim, Du-Seong Chang, Jungwook Choi

    Abstract: The rapid advancement of large language models (LLMs) has facilitated their transformation into conversational chatbots that can grasp contextual nuances and generate pertinent sentences, closely mirroring human values through advanced techniques such as instruction tuning and reinforcement learning from human feedback (RLHF). However, the computational efficiency required for LLMs, achieved throu…

    Submitted 18 July, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

    Comments: ACL 2024 Main

  9. arXiv:2407.02736  [pdf, other]

    cs.CL

    MentalAgora: A Gateway to Advanced Personalized Care in Mental Health through Multi-Agent Debating and Attribute Control

    Authors: Yeonji Lee, Sangjun Park, Kyunghyun Cho, JinYeong Bak

    Abstract: As mental health issues globally escalate, there is a tremendous need for advanced digital support systems. We introduce MentalAgora, a novel framework employing large language models enhanced by interaction between multiple agents for tailored mental health support. This framework operates through three stages: strategic debating, tailored counselor creation, and response generation, enabling the…

    Submitted 2 July, 2024; originally announced July 2024.

  10. arXiv:2407.01942  [pdf, other]

    cs.AI cs.CL cs.CV

    Certainly Uncertain: A Benchmark and Metric for Multimodal Epistemic and Aleatoric Awareness

    Authors: Khyathi Raghavi Chandu, Linjie Li, Anas Awadalla, Ximing Lu, Jae Sung Park, Jack Hessel, Lijuan Wang, Yejin Choi

    Abstract: The ability to acknowledge the inevitable uncertainty in their knowledge and reasoning is a prerequisite for AI systems to be truly truthful and reliable. In this paper, we present a taxonomy of uncertainty specific to vision-language AI systems, distinguishing between epistemic uncertainty (arising from a lack of information) and aleatoric uncertainty (due to inherent unpredictability), and furth…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 26 pages

  11. arXiv:2406.19502  [pdf, other]

    cs.CL cs.AI

    Investigating How Large Language Models Leverage Internal Knowledge to Perform Complex Reasoning

    Authors: Miyoung Ko, Sue Hyun Park, Joonsuk Park, Minjoon Seo

    Abstract: Despite significant advancements, there is a limited understanding of how large language models (LLMs) utilize knowledge for reasoning. To address this, we propose a method that deconstructs complex real-world questions into a graph, representing each question as a node with parent nodes of background knowledge needed to solve the question. We develop the DepthQA dataset, deconstructing questions…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: Work in progress; code is available at https://github.com/kaistAI/knowledge-reasoning

  12. arXiv:2406.17145  [pdf, other]

    cs.DC cs.AI cs.LG

    GraphPipe: Improving Performance and Scalability of DNN Training with Graph Pipeline Parallelism

    Authors: Byungsoo Jeon, Mengdi Wu, Shiyi Cao, Sunghyun Kim, Sunghyun Park, Neeraj Aggarwal, Colin Unger, Daiyaan Arfeen, Peiyuan Liao, Xupeng Miao, Mohammad Alizadeh, Gregory R. Ganger, Tianqi Chen, Zhihao Jia

    Abstract: Deep neural networks (DNNs) continue to grow rapidly in size, making them infeasible to train on a single device. Pipeline parallelism is commonly used in existing DNN systems to support large-scale DNN training by partitioning a DNN into multiple stages, which concurrently perform DNN training for different micro-batches in a pipeline fashion. However, existing pipeline-parallel approaches only c…

    Submitted 24 June, 2024; originally announced June 2024.

  13. arXiv:2406.16994  [pdf, other]

    eess.SP cs.AI

    Quantum Multi-Agent Reinforcement Learning for Cooperative Mobile Access in Space-Air-Ground Integrated Networks

    Authors: Gyu Seon Kim, Yeryeong Cho, Jaehyun Chung, Soohyun Park, Soyi Jung, Zhu Han, Joongheon Kim

    Abstract: Achieving global space-air-ground integrated network (SAGIN) access only with CubeSats presents significant challenges such as the access sustainability limitations in specific regions (e.g., polar regions) and the energy efficiency limitations in CubeSats. To tackle these problems, high-altitude long-endurance unmanned aerial vehicles (HALE-UAVs) can complement these CubeSat shortcomings for prov…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: 17 pages, 22 figures

  14. arXiv:2406.15819  [pdf, other]

    cs.LG cs.IT cs.NI eess.SP

    Automatic AI Model Selection for Wireless Systems: Online Learning via Digital Twinning

    Authors: Qiushuo Hou, Matteo Zecchin, Sangwoo Park, Yunlong Cai, Guanding Yu, Kaushik Chowdhury, Osvaldo Simeone

    Abstract: In modern wireless network architectures, such as O-RAN, artificial intelligence (AI)-based applications are deployed at intelligent controllers to carry out functionalities like scheduling or power control. The AI "apps" are selected on the basis of contextual information such as network conditions, topology, traffic statistics, and design goals. The mapping between context and AI model parameter…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: submitted for a journal publication

  15. arXiv:2406.15635  [pdf, other]

    cs.LG cs.CR cs.CV

    DataFreeShield: Defending Adversarial Attacks without Training Data

    Authors: Hyeyoon Lee, Kanghyun Choi, Dain Kwon, Sunjong Park, Mayoore Selvarasa Jaiswal, Noseong Park, Jonghyun Choi, Jinho Lee

    Abstract: Recent advances in adversarial robustness rely on an abundant set of training data, where using external or additional datasets has become a common setting. However, in real life, the training data is often kept private for security and privacy issues, while only the pretrained weight is available to the public. In such scenarios, existing methods that assume accessibility to the original data bec…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: ICML 2024

  16. arXiv:2406.13023  [pdf]

    math.OC cs.LG

    Stackelberg Games with $k$-Submodular Function under Distributional Risk-Receptiveness and Robustness

    Authors: Seonghun Park, Manish Bansal

    Abstract: We study submodular optimization in adversarial context, applicable to machine learning problems such as feature selection using data susceptible to uncertainties and attacks. We focus on Stackelberg games between an attacker (or interdictor) and a defender where the attacker aims to minimize the defender's objective of maximizing a $k$-submodular function. We allow uncertainties arising from the…

    Submitted 28 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

  17. arXiv:2406.12233  [pdf, other]

    cs.AI cs.CL cs.CV

    SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization

    Authors: Young Jin Ahn, Jungwoo Park, Sangha Park, Jonghyun Choi, Kee-Eung Kim

    Abstract: Visual Speech Recognition (VSR) stands at the intersection of computer vision and speech recognition, aiming to interpret spoken content from visual cues. A prominent challenge in VSR is the presence of homophenes, visually similar lip gestures that represent different phonemes. Prior approaches have sought to distinguish fine-grained visemes by aligning visual and auditory semantics, but often fel…

    Submitted 17 June, 2024; originally announced June 2024.

  18. arXiv:2406.11608  [pdf, other]

    cs.CV

    Learning Hierarchical Semantic Classification by Grounding on Consistent Image Segmentations

    Authors: Seulki Park, Youren Zhang, Stella X. Yu, Sara Beery, Jonathan Huang

    Abstract: Hierarchical semantic classification requires the prediction of a taxonomy tree instead of a single flat level of the tree, where both accuracies at individual levels and consistency across levels matter. We can train classifiers for individual levels, which has accuracy but not consistency, or we can train only the finest level classification and infer higher levels, which has consistency but not…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 34 pages

  19. arXiv:2406.11260  [pdf, other]

    cs.CL cs.AI

    Adversarial Style Augmentation via Large Language Model for Robust Fake News Detection

    Authors: Sungwon Park, Sungwon Han, Meeyoung Cha

    Abstract: The spread of fake news negatively impacts individuals and is regarded as a significant social challenge that needs to be addressed. A number of algorithmic and insightful features have been identified for detecting fake news. However, with the recent LLMs and their advanced generation capabilities, many of the detectable features (e.g., style-conversion attacks) can be altered, making it more cha…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 8 pages

  20. arXiv:2406.09799  [pdf, other]

    cs.CY

    GeoSEE: Regional Socio-Economic Estimation With a Large Language Model

    Authors: Sungwon Han, Donghyun Ahn, Seungeon Lee, Minhyuk Song, Sungwon Park, Sangyoon Park, Jihee Kim, Meeyoung Cha

    Abstract: Moving beyond traditional surveys, combining heterogeneous data sources with AI-driven inference models brings new opportunities to measure socio-economic conditions, such as poverty and population, over expansive geographic areas. The current research presents GeoSEE, a method that can estimate various socio-economic indicators using a unified pipeline powered by a large language model (LLM). Pre…

    Submitted 14 June, 2024; originally announced June 2024.

  21. arXiv:2406.09329  [pdf, other]

    cs.LG cs.AI

    Is Value Learning Really the Main Bottleneck in Offline RL?

    Authors: Seohong Park, Kevin Frans, Sergey Levine, Aviral Kumar

    Abstract: While imitation learning requires access to high-quality data, offline reinforcement learning (RL) should, in principle, perform similarly or better with substantially lower data quality by using a value function. However, current results indicate that offline RL often performs worse than imitation learning, and it is often unclear what holds back the performance of offline RL. Motivated by this o…

    Submitted 13 June, 2024; originally announced June 2024.

  22. arXiv:2406.08020  [pdf, other]

    cs.CV

    Generalizable Disaster Damage Assessment via Change Detection with Vision Foundation Model

    Authors: Kyeongjin Ahn, Sungwon Han, Sungwon Park, Jihee Kim, Sangyoon Park, Meeyoung Cha

    Abstract: The increasing frequency and intensity of natural disasters demand more sophisticated approaches for rapid and precise damage assessment. To tackle this issue, researchers have developed various methods on disaster benchmark datasets from satellite imagery to aid in detecting disaster damage. However, the diverse nature of geographical landscapes and disasters makes it challenging to apply existin…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: 9 pages, 4 figures, 2 tables

  23. arXiv:2406.07886  [pdf, other]

    cs.CL

    Label-aware Hard Negative Sampling Strategies with Momentum Contrastive Learning for Implicit Hate Speech Detection

    Authors: Jaehoon Kim, Seungwan Jin, Sohyun Park, Someen Park, Kyungsik Han

    Abstract: Detecting implicit hate speech that is not directly hateful remains a challenge. Recent research has attempted to detect implicit hate speech by applying contrastive learning to pre-trained language models such as BERT and RoBERTa, but the proposed models still do not have a significant advantage over cross-entropy loss-based learning. We found that contrastive learning based on randomly sampled b…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024 Findings

  24. arXiv:2406.07867  [pdf, other]

    cs.CV cs.AI cs.HC

    Let's Go Real Talk: Spoken Dialogue Model for Face-to-Face Conversation

    Authors: Se Jin Park, Chae Won Kim, Hyeongseop Rha, Minsu Kim, Joanna Hong, Jeong Hun Yeo, Yong Man Ro

    Abstract: In this paper, we introduce a novel Face-to-Face spoken dialogue model. It processes audio-visual speech from user input and generates audio-visual speech as the response, marking the initial step towards creating an avatar chatbot system without relying on intermediate text. To this end, we newly introduce MultiDialog, the first large-scale multimodal (i.e., audio and visual) spoken dialogue corp…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024

  25. arXiv:2406.07736  [pdf, other]

    cs.CL

    MultiPragEval: Multilingual Pragmatic Evaluation of Large Language Models

    Authors: Dojun Park, Jiwoo Lee, Seohyun Park, Hyeyun Jeong, Youngeun Koo, Soonha Hwang, Seonwoo Park, Sungeun Lee

    Abstract: As the capabilities of LLMs expand, it becomes increasingly important to evaluate them beyond basic knowledge assessment, focusing on higher-level language understanding. This study introduces MultiPragEval, a robust test suite designed for the multilingual pragmatic evaluation of LLMs across English, German, Korean, and Chinese. Comprising 1200 question units categorized according to Grice's Coop…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: 8 pages, under review

  26. arXiv:2406.07488  [pdf, other]

    cs.CV

    ReduceFormer: Attention with Tensor Reduction by Summation

    Authors: John Yang, Le An, Su Inn Park

    Abstract: Transformers have excelled in many tasks including vision. However, efficient deployment of transformer models in low-latency or high-throughput applications is hindered by the computation in the attention mechanism which involves expensive operations such as matrix multiplication and Softmax. To address this, we introduce ReduceFormer, a family of models optimized for efficiency with the spirit o…

    Submitted 11 June, 2024; originally announced June 2024.

  27. arXiv:2406.06559  [pdf, other]

    cs.CL cs.AI cs.LG

    Harnessing Business and Media Insights with Large Language Models

    Authors: Yujia Bao, Ankit Parag Shah, Neeru Narang, Jonathan Rivers, Rajeev Maksey, Lan Guan, Louise N. Barrere, Shelley Evenson, Rahul Basole, Connie Miao, Ankit Mehta, Fabien Boulay, Su Min Park, Natalie E. Pearson, Eldhose Joy, Tiger He, Sumiran Thakur, Koustav Ghosal, Josh On, Phoebe Morrison, Tim Major, Eva Siqi Wang, Gina Escobar, Jiaheng Wei, Tharindu Cyril Weerasooriya, et al. (8 additional authors not shown)

    Abstract: This paper introduces Fortune Analytics Language Model (FALM). FALM empowers users with direct access to comprehensive business analysis, including market trends, company performance metrics, and expert insights. Unlike generic LLMs, FALM leverages a curated knowledge base built from professional journalism, enabling it to deliver precise and in-depth answers to intricate business questions. Users…

    Submitted 2 June, 2024; originally announced June 2024.

  28. arXiv:2406.06287  [pdf, other]

    math.NA cs.LG

    VS-PINN: A fast and efficient training of physics-informed neural networks using variable-scaling methods for solving PDEs with stiff behavior

    Authors: Seungchan Ko, Sang Hyeon Park

    Abstract: Physics-informed neural networks (PINNs) have recently emerged as a promising way to compute the solutions of partial differential equations (PDEs) using deep neural networks. However, despite their significant success in various fields, it remains unclear in many aspects how to effectively train PINNs if the solutions of PDEs exhibit stiff behaviors or high frequencies. In this paper, we propose…

    Submitted 12 July, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

  29. arXiv:2406.05761  [pdf, other]

    cs.CL

    The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models

    Authors: Seungone Kim, Juyoung Suk, Ji Yong Cho, Shayne Longpre, Chaeeun Kim, Dongkeun Yoon, Guijin Son, Yejin Cho, Sheikh Shafayat, Jinheon Baek, Sue Hyun Park, Hyeonbin Hwang, Jinkyung Jo, Hyowon Cho, Haebin Shin, Seongyun Lee, Hanseok Oh, Noah Lee, Namgyu Ho, Se June Joo, Miyoung Ko, Yoonjoo Lee, Hyungjoo Chae, Jamin Shin, Joel Jang, et al. (7 additional authors not shown)

    Abstract: As language models (LMs) become capable of handling a wide range of tasks, their evaluation is becoming as challenging as their development. Most generation benchmarks currently assess LMs using abstract evaluation criteria like helpfulness and harmlessness, which often lack the flexibility and granularity of human assessment. Additionally, these benchmarks tend to focus disproportionately on spec…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: Work in Progress

  30. arXiv:2406.05432  [pdf, other]

    cs.CV

    Regularized Training with Generated Datasets for Name-Only Transfer of Vision-Language Models

    Authors: Minho Park, Sunghyun Park, Jooyeol Yun, Jaegul Choo

    Abstract: Recent advancements in text-to-image generation have inspired researchers to generate datasets tailored for perception models using generative models, which prove particularly valuable in scenarios where real-world data is limited. In this study, our goal is to address the challenges when fine-tuning vision-language models (e.g., CLIP) on generated datasets. Specifically, we aim to fine-tune visio…

    Submitted 8 June, 2024; originally announced June 2024.

    Comments: Preprint. Under review

  31. arXiv:2406.05396  [pdf, other]

    cs.LG cs.AI cs.CV

    Mean-field Chaos Diffusion Models

    Authors: Sungwoo Park, Dongjun Kim, Ahmed Alaa

    Abstract: In this paper, we introduce a new class of score-based generative models (SGMs) designed to handle high-cardinality data distributions by leveraging concepts from mean-field theory. We present mean-field chaos diffusion models (MF-CDMs), which address the curse of dimensionality inherent in high-cardinality data by utilizing the propagation of chaos property of interacting particles. By treating h…

    Submitted 8 June, 2024; originally announced June 2024.

  32. arXiv:2406.03671  [pdf, other]

    cs.LG cs.AI

    PANDA: Expanded Width-Aware Message Passing Beyond Rewiring

    Authors: Jeongwhan Choi, Sumin Park, Hyowon Wi, Sung-Bae Cho, Noseong Park

    Abstract: Recent research in the field of graph neural network (GNN) has identified a critical issue known as "over-squashing," resulting from the bottleneck phenomenon in graph structures, which impedes the propagation of long-range information. Prior works have proposed a variety of graph rewiring concepts that aim at optimizing the spatial or spectral properties of graphs to promote the signal propagatio…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted at ICML 2024

  33. SMCL: Saliency Masked Contrastive Learning for Long-tailed Recognition

    Authors: Sanglee Park, Seung-won Hwang, Jungmin So

    Abstract: Real-world data often follow a long-tailed distribution with a high imbalance in the number of samples between classes. The problem with training from imbalanced data is that some background features, common to all classes, can be unobserved in classes with scarce samples. As a result, this background correlates to biased predictions into "major" classes. In this paper, we propose saliency masked…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted at ICASSP 2023

  34. arXiv:2406.00505  [pdf, other]

    cs.CV

    Improving Text Generation on Images with Synthetic Captions

    Authors: Jun Young Koh, Sang Hyun Park, Joy Song

    Abstract: The recent emergence of latent diffusion models such as SDXL and SD 1.5 has shown significant capability in generating highly detailed and realistic images. Despite their remarkable ability to produce images, generating accurate text within images still remains a challenging task. In this paper, we examine the validity of fine-tuning approaches in generating legible text within the image. We propo…

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: 9 pages, 12 figures

  35. arXiv:2405.20829  [pdf, other]

    cs.CV cs.LG

    Rethinking Open-World Semi-Supervised Learning: Distribution Mismatch and Inductive Inference

    Authors: Seongheon Park, Hyuk Kwon, Kwanghoon Sohn, Kibok Lee

    Abstract: Open-world semi-supervised learning (OWSSL) extends conventional semi-supervised learning to open-world scenarios by taking account of novel categories in unlabeled datasets. Despite the recent advancements in OWSSL, the success often relies on the assumptions that 1) labeled and unlabeled datasets share the same balanced class prior distribution, which does not generally hold in real-world applic…

    Submitted 31 May, 2024; originally announced May 2024.

    Comments: CVPR Workshop on Computer Vision in the Wild (CVinW), 2024

  36. arXiv:2405.19961  [pdf, other]

    cs.LG

    Collective Variable Free Transition Path Sampling with Generative Flow Network

    Authors: Kiyoung Seong, Seonghyun Park, Seonghwan Kim, Woo Youn Kim, Sungsoo Ahn

    Abstract: Understanding transition paths between meta-stable states in molecular systems is fundamental for material design and drug discovery. However, sampling these paths via unbiased molecular dynamics simulations is computationally prohibitive due to the high energy barriers between the meta-stable states. Recent machine learning approaches are often restricted to simple systems or rely on collective v…

    Submitted 18 July, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

    Comments: 8 pages, 5 figures, 2 tables

  37. arXiv:2405.19346  [pdf, other]

    eess.SP cs.AI cs.LG

    Subject-Adaptive Transfer Learning Using Resting State EEG Signals for Cross-Subject EEG Motor Imagery Classification

    Authors: Sion An, Myeongkyun Kang, Soopil Kim, Philip Chikontwe, Li Shen, Sang Hyun Park

    Abstract: Electroencephalography (EEG) motor imagery (MI) classification is a fundamental, yet challenging task due to the variation of signals between individuals i.e., inter-subject variability. Previous approaches try to mitigate this using task-specific (TS) EEG signals from the target subject in training. However, recording TS EEG signals requires time and limits its applicability in various fields. In…

    Submitted 9 July, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

    Comments: Early Accepted at MICCAI 2024

  38. arXiv:2405.18400  [pdf, other]

    cs.CL cs.LG

    Superposed Decoding: Multiple Generations from a Single Autoregressive Inference Pass

    Authors: Ethan Shen, Alan Fan, Sarah M. Pratt, Jae Sung Park, Matthew Wallingford, Sham M. Kakade, Ari Holtzman, Ranjay Krishna, Ali Farhadi, Aditya Kusupati

    Abstract: Many applications today provide users with multiple auto-complete drafts as they type, including GitHub's code completion, Gmail's smart compose, and Apple's messaging auto-suggestions. Under the hood, language models support this by running an autoregressive inference pass to provide a draft. Consequently, providing $k$ drafts to the user requires running an expensive language model $k$ times. To…

    Submitted 24 June, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    Comments: 22 pages, 15 figures

  39. arXiv:2405.17977  [pdf, other]

    cs.CL

    Aligning to Thousands of Preferences via System Message Generalization

    Authors: Seongyun Lee, Sue Hyun Park, Seungone Kim, Minjoon Seo

    Abstract: Although humans inherently have diverse values, current large language model (LLM) alignment methods often assume that aligning LLMs with the general public's preferences is optimal. A major challenge in adopting a more individualized approach to LLM alignment is its lack of scalability, as it involves repeatedly acquiring preference data and training new reward models and LLMs for each individual…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: Work in progress

  40. arXiv:2405.17206  [pdf, other]

    cs.SD cs.LG

    A Novel Fusion Architecture for PD Detection Using Semi-Supervised Speech Embeddings

    Authors: Tariq Adnan, Abdelrahman Abdelkader, Zipei Liu, Ekram Hossain, Sooyong Park, MD Saiful Islam, Ehsan Hoque

    Abstract: We present a framework to recognize Parkinson's disease (PD) through an English pangram utterance speech collected using a web application from diverse recording settings and environments, including participants' homes. Our dataset includes a global cohort of 1306 participants, including 392 diagnosed with PD. Leveraging the diversity of the dataset, spanning various demographic properties (such a…

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: 25 pages, 5 figures, and 4 tables

  41. arXiv:2405.15912  [pdf, other]

    cs.PL cs.LG stat.ML

    Uncertainty Quantification for Neurosymbolic Programs via Compositional Conformal Prediction

    Authors: Ramya Ramalingam, Sangdon Park, Osbert Bastani

    Abstract: Machine learning has become an effective tool for automatically annotating unstructured data (e.g., images) with structured labels (e.g., object detections). As a result, a new programming paradigm called neurosymbolic programming has emerged where users write queries against these predicted annotations. However, due to the intrinsic fallibility of machine learning models, these programs currently…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: arXiv abstract edited

  42. arXiv:2405.11530  [pdf, other]

    cs.LG

    Learning More Generalized Experts by Merging Experts in Mixture-of-Experts

    Authors: Sejik Park

    Abstract: We observe that incorporating a shared layer in a mixture-of-experts can lead to performance degradation. This leads us to hypothesize that learning shared features poses challenges in deep learning, potentially caused by the same feature being learned as various different features. To address this issue, we track each expert's usage frequency and merge the two most frequently selected experts. We…

    Submitted 19 May, 2024; originally announced May 2024.

    Comments: 12 pages, 3 figures

  43. arXiv:2405.11441  [pdf, other]

    cs.IR cs.CL

    EmbSum: Leveraging the Summarization Capabilities of Large Language Models for Content-Based Recommendations

    Authors: Chiyu Zhang, Yifei Sun, Minghao Wu, Jun Chen, Jie Lei, Muhammad Abdul-Mageed, Rong Jin, Angli Liu, Ji Zhu, Sem Park, Ning Yao, Bo Long

    Abstract: Content-based recommendation systems play a crucial role in delivering personalized content to users in the digital world. In this work, we introduce EmbSum, a novel framework that enables offline pre-computations of users and candidate items while capturing the interactions within the user engagement history. By utilizing the pretrained encoder-decoder model and poly-attention layers, EmbSum deri…

    Submitted 19 May, 2024; originally announced May 2024.

    Comments: Under review

  44. Unsupervised Extractive Dialogue Summarization in Hyperdimensional Space

    Authors: Seongmin Park, Kyungho Kim, Jaejin Seo, Jihwa Lee

    Abstract: We present HyperSum, an extractive summarization framework that captures both the efficiency of traditional lexical summarization and the accuracy of contemporary neural approaches. HyperSum exploits the pseudo-orthogonality that emerges when randomly initializing vectors at extremely high dimensions ("blessing of dimensionality") to construct representative and efficient sentence embeddings. Simp…

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: ICASSP 2024

  45. arXiv:2405.07267  [pdf, other]

    cs.HC

    Fields, Bridges, and Foundations: How Researchers Browse Citation Network Visualizations

    Authors: Kiroong Choe, Eunhye Kim, Sangwon Park, Jinwook Seo

    Abstract: Visualizing citation relations with network structures is widely used, but the visual complexity can make it challenging for individual researchers to navigate through them. We collected data from 18 researchers using an interface that we designed using network simplification methods and analyzed how users browsed and identified important papers. Our analysis reveals six major patterns used for id…

    Submitted 12 May, 2024; originally announced May 2024.

  46. arXiv:2405.03183  [pdf, other]

    cs.DC cs.CR math.NA

    Impact of EIP-4844 on Ethereum: Consensus Security, Ethereum Usage, Rollup Transaction Dynamics, and Blob Gas Fee Markets

    Authors: Seongwan Park, Bosul Mun, Seungyun Lee, Woojin Jeong, Jaewook Lee, Hyeonsang Eom, Huisu Jang

    Abstract: On March 13, 2024, Ethereum implemented EIP-4844, designed to enhance its role as a data availability layer. While this upgrade reduces data posting costs for rollups, it also raises concerns about its impact on the consensus layer due to increased propagation sizes. Moreover, the broader effects on the overall Ethereum ecosystem remain largely unexplored. In this paper, we conduct an empirical an…

    Submitted 6 May, 2024; originally announced May 2024.

  47. arXiv:2405.01033  [pdf, other]

    cs.LG cs.IT

    CrossMPT: Cross-attention Message-Passing Transformer for Error Correcting Codes

    Authors: Seong-Joon Park, Hee-Youl Kwak, Sang-Hyo Kim, Yongjune Kim, Jong-Seon No

    Abstract: Error correcting codes (ECCs) are indispensable for reliable transmission in communication systems. The recent advancements in deep learning have catalyzed the exploration of ECC decoders based on neural networks. Among these, transformer-based neural decoders have achieved state-of-the-art decoding performance. In this paper, we propose a novel Cross-attention Message-Passing Transformer (CrossMP…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: 13 pages

  48. arXiv:2405.00260  [pdf, other]

    cs.CV

    CREPE: Coordinate-Aware End-to-End Document Parser

    Authors: Yamato Okamoto, Youngmin Baek, Geewook Kim, Ryota Nakao, DongHyun Kim, Moon Bin Yim, Seunghyun Park, Bado Lee

    Abstract: In this study, we formulate an OCR-free sequence generation model for visual document understanding (VDU). Our model not only parses text from document images but also extracts the spatial coordinates of the text based on the multi-head architecture. Named as Coordinate-aware End-to-end Document Parser (CREPE), our method uniquely integrates these capabilities by introducing a special token for OC…

    Submitted 30 April, 2024; originally announced May 2024.

    Comments: Accepted at the International Conference on Document Analysis and Recognition (ICDAR 2024) main conference

  49. arXiv:2405.00021  [pdf, other]

    cs.CV cs.AI cs.CL

    SIMPLOT: Enhancing Chart Question Answering by Distilling Essentials

    Authors: Wonjoong Kim, Sangwu Park, Yeonjun In, Seokwon Han, Chanyoung Park

    Abstract: Recently, interpreting complex charts with logical reasoning has emerged as challenges due to the development of vision-language models. A prior state-of-the-art (SOTA) model has presented an end-to-end method that leverages the vision-language model to convert charts into table format utilizing Large Language Model (LLM) for reasoning. However, unlike natural images, charts contain a mix of essen…

    Submitted 17 June, 2024; v1 submitted 22 February, 2024; originally announced May 2024.

  50. arXiv:2404.19299  [pdf, other]

    cs.CV

    Robust Pedestrian Detection via Constructing Versatile Pedestrian Knowledge Bank

    Authors: Sungjune Park, Hyunjun Kim, Yong Man Ro

    Abstract: Pedestrian detection is a crucial field of computer vision research which can be adopted in various real-world applications (e.g., self-driving systems). However, despite noticeable evolution of pedestrian detection, pedestrian representations learned within a detection framework are usually limited to particular scene data in which they were trained. Therefore, in this paper, we propose a novel a…

    Submitted 30 April, 2024; originally announced April 2024.