Showing 1–50 of 476 results for author: Choo, J

Searching in archive cs.
  1. arXiv:2407.12329  [pdf, other]

    cs.CV

    Label-Efficient 3D Brain Segmentation via Complementary 2D Diffusion Models with Orthogonal Views

    Authors: Jihoon Cho, Suhyun Ahn, Beomju Kim, Hyungjoon Bae, Xiaofeng Liu, Fangxu Xing, Kyungeun Lee, Georges Elfakhri, Van Wedeen, Jonghye Woo, Jinah Park

    Abstract: Deep learning-based segmentation techniques have shown remarkable performance in brain segmentation, yet their success hinges on the availability of extensive labeled training data. Acquiring such vast datasets, however, poses a significant challenge in many clinical applications. To address this issue, in this work, we propose a novel 3D brain segmentation approach using complementary 2D diffusio…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Extended version of "3D Segmentation of Subcortical Brain Structure with Few Labeled Data using 2D Diffusion Models" (ISMRM 2024 oral)

  2. arXiv:2407.11245  [pdf, other]

    cs.IR cs.AI

    Pacer and Runner: Cooperative Learning Framework between Single- and Cross-Domain Sequential Recommendation

    Authors: Chung Park, Taesan Kim, Hyungjun Yoon, Junui Hong, Yelim Yu, Mincheol Cho, Minsung Choi, Jaegul Choo

    Abstract: Cross-Domain Sequential Recommendation (CDSR) improves recommendation performance by utilizing information from multiple domains, which contrasts with Single-Domain Sequential Recommendation (SDSR) that relies on a historical interaction within a specific domain. However, CDSR may underperform compared to the SDSR approach in certain domains due to negative transfer, which occurs when there is a l…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted at SIGIR'24

  3. arXiv:2407.09779  [pdf, other]

    cs.CV cs.AI

    Layout-and-Retouch: A Dual-stage Framework for Improving Diversity in Personalized Image Generation

    Authors: Kangyeol Kim, Wooseok Seo, Sehyun Nam, Bodam Kim, Suhyeon Jeong, Wonwoo Cho, Jaegul Choo, Youngjae Yu

    Abstract: Personalized text-to-image (P-T2I) generation aims to create new, text-guided images featuring the personalized subject with a few reference images. However, balancing the trade-off relationship between prompt fidelity and identity preservation remains a critical challenge. To address the issue, we propose a novel P-T2I method called Layout-and-Retouch, consisting of two stages: 1) layout generati…

    Submitted 13 July, 2024; originally announced July 2024.

  4. arXiv:2407.09012  [pdf, other]

    cs.CV cs.AI

    TCAN: Animating Human Images with Temporally Consistent Pose Guidance using Diffusion Models

    Authors: Jeongho Kim, Min-Jung Kim, Junsoo Lee, Jaegul Choo

    Abstract: Pose-driven human-image animation diffusion models have shown remarkable capabilities in realistic human video synthesis. Despite the promising results achieved by previous approaches, challenges persist in achieving temporally consistent animation and ensuring robustness with off-the-shelf pose detectors. In this paper, we present TCAN, a pose-driven human image animation method that is robust to…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: The first two authors contributed equally

  5. arXiv:2407.07176  [pdf, other]

    cs.CV

    Scaling Up Personalized Aesthetic Assessment via Task Vector Customization

    Authors: Jooyeol Yun, Jaegul Choo

    Abstract: The task of personalized image aesthetic assessment seeks to tailor aesthetic score prediction models to match individual preferences with just a few user-provided inputs. However, the scalability and generalization capabilities of current approaches are considerably restricted by their reliance on an expensive curated database. To overcome this long-standing scalability challenge, we present a un…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  6. arXiv:2407.03890  [pdf, other]

    cs.RO

    Addressing Relative Pose Impact on UWB Localization: Dataset Introduction and Analysis

    Authors: Jun Hyeok Choe, Inwook Shim

    Abstract: UWB has recently gained new attention as an auxiliary sensor in the field of robot localization due to its compactness and ease of distance measurement. Consequently, various UWB-related localization and dataset research have increased. Despite this broad interest, there is a lack of UWB datasets that thoroughly analyze the performance of UWB ranging measurement. To address this issue, our paper i…

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: 4 pages

  7. arXiv:2407.02945  [pdf, other]

    cs.CV

    VEGS: View Extrapolation of Urban Scenes in 3D Gaussian Splatting using Learned Priors

    Authors: Sungwon Hwang, Min-Jung Kim, Taewoong Kang, Jayeon Kang, Jaegul Choo

    Abstract: Neural rendering-based urban scene reconstruction methods commonly rely on images collected from driving vehicles with cameras facing and moving forward. Although these methods can successfully synthesize from views similar to training camera trajectory, directing the novel view outside the training camera distribution does not guarantee on-par performance. In this paper, we tackle the Extrapolate…

    Submitted 13 July, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

    Comments: The first two authors contributed equally. Project Page: https://vegs3d.github.io/

  8. arXiv:2407.01158  [pdf, other]

    cs.CL

    Learning to Explore and Select for Coverage-Conditioned Retrieval-Augmented Generation

    Authors: Takyoung Kim, Kyungjae Lee, Young Rok Jang, Ji Yong Cho, Gangwoo Kim, Minseok Cho, Moontae Lee

    Abstract: Interactions with billion-scale large language models typically yield long-form responses due to their extensive parametric capacities, along with retrieval-augmented features. While detailed responses provide insightful viewpoint of a specific subject, they frequently generate redundant and less engaging content that does not meet user interests. In this work, we focus on the role of query outlin…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Work in progress. Resources are available at https://github.com/youngerous/qtree

  9. arXiv:2407.00553  [pdf, other]

    cs.LG cs.AI

    Cooperative Advisory Residual Policies for Congestion Mitigation

    Authors: Aamir Hasan, Neeloy Chakraborty, Haonan Chen, Jung-Hoon Cho, Cathy Wu, Katherine Driggs-Campbell

    Abstract: Fleets of autonomous vehicles can mitigate traffic congestion through simple actions, thus improving many socioeconomic factors such as commute time and gas costs. However, these approaches are limited in practice as they assume precise control over autonomous vehicle fleets, incur extensive installation costs for a centralized sensor ecosystem, and also fail to account for uncertainty in driver b…

    Submitted 29 June, 2024; originally announced July 2024.

  10. arXiv:2406.16521  [pdf, other]

    cs.CL cs.AI

    Carrot and Stick: Inducing Self-Motivation with Positive & Negative Feedback

    Authors: Jimin Sohn, Jeihee Cho, Junyong Lee, Songmu Heo, Ji-Eun Han, David R. Mortensen

    Abstract: Positive thinking is thought to be an important component of self-motivation in various practical fields such as education and the workplace. Previous work, including sentiment transfer and positive reframing, has focused on the positive side of language. However, self-motivation that drives people to reach their goals has not yet been studied from a computational perspective. Moreover, negative f…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: 10 pages, 8 figures

  11. arXiv:2406.16469  [pdf, other]

    cs.CL cs.CV

    Evaluating Visual and Cultural Interpretation: The K-Viscuit Benchmark with Human-VLM Collaboration

    Authors: Yujin Baek, ChaeHun Park, Jaeseok Kim, Yu-Jung Heo, Du-Seong Chang, Jaegul Choo

    Abstract: To create culturally inclusive vision-language models (VLMs), the foremost requirement is developing a test benchmark that can diagnose the models' ability to respond to questions reflecting cultural elements. This paper addresses the necessity for such benchmarks, noting that existing research has relied on human annotators' manual efforts, which impedes diversity and efficiency. We propose a sem…

    Submitted 24 June, 2024; originally announced June 2024.

  12. arXiv:2406.14954  [pdf, other]

    eess.IV cs.CV

    A Unified Framework for Synthesizing Multisequence Brain MRI via Hybrid Fusion

    Authors: Jihoon Cho, Jonghye Woo, Jinah Park

    Abstract: Multisequence Magnetic Resonance Imaging (MRI) provides a reliable diagnosis in clinical applications through complementary information within sequences. However, in practice, the absence of certain MR sequences is a common problem that can lead to inconsistent analysis results. In this work, we propose a novel unified framework for synthesizing multisequence MR images, called Hybrid Fusion GAN (H…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: 11 pages, 7 figures

  13. arXiv:2406.14585  [pdf]

    physics.comp-ph cs.LG physics.data-an physics.optics

    Deep-learning-assisted reconfigurable metasurface antenna for real-time holographic beam steering

    Authors: Hyunjun Ma, Jin-soo Kim, Jong-Ho Choe, Q-Han Park

    Abstract: We propose a metasurface antenna capable of real time holographic beam steering. An array of reconfigurable dipoles can generate on demand far field patterns of radiation through the specific encoding of meta atomic states, i.e., the configuration of each dipole. Suitable states for the generation of the desired patterns can be identified using iteration, but this is very slow and needs to be don…

    Submitted 19 June, 2024; originally announced June 2024.

    Journal ref: Nanophotonics 12.13 (2023): 2415-2423

  14. arXiv:2406.14091  [pdf, other]

    cs.CL

    Protecting Privacy Through Approximating Optimal Parameters for Sequence Unlearning in Language Models

    Authors: Dohyun Lee, Daniel Rim, Minseok Choi, Jaegul Choo

    Abstract: Although language models (LMs) demonstrate exceptional capabilities on various tasks, they are potentially vulnerable to extraction attacks, which represent a significant privacy risk. To mitigate the privacy concerns of LMs, machine unlearning has emerged as an important research area, which is utilized to induce the LM to selectively forget about some of its training data. While completely retra…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL2024 findings

  15. arXiv:2406.12998  [pdf, other]

    eess.AS cs.AI cs.CL cs.SD

    Articulatory Encodec: Vocal Tract Kinematics as a Codec for Speech

    Authors: Cheol Jun Cho, Peter Wu, Tejas S. Prabhune, Dhruv Agarwal, Gopala K. Anumanchipalli

    Abstract: Vocal tract articulation is a natural, grounded control space of speech production. The spatiotemporal coordination of articulators combined with the vocal source shapes intelligible speech sounds to enable effective spoken communication. Based on this physiological grounding of speech, we propose a new framework of neural encoding-decoding of speech -- articulatory encodec. The articulatory encod…

    Submitted 18 June, 2024; originally announced June 2024.

  16. arXiv:2406.12354  [pdf, other]

    cs.CL

    Cross-Lingual Unlearning of Selective Knowledge in Multilingual Language Models

    Authors: Minseok Choi, Kyunghyun Min, Jaegul Choo

    Abstract: Pretrained language models memorize vast amounts of information, including private and copyrighted data, raising significant safety concerns. Retraining these models after excluding sensitive data is prohibitively expensive, making machine unlearning a viable, cost-effective alternative. Previous research has focused on machine unlearning for monolingual models, but we find that unlearning in one…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 15 pages, 5 figures

  17. arXiv:2406.12329  [pdf, other]

    cs.CL

    SNAP: Unlearning Selective Knowledge in Large Language Models with Negative Instructions

    Authors: Minseok Choi, Daniel Rim, Dohyun Lee, Jaegul Choo

    Abstract: Instruction-following large language models (LLMs), such as ChatGPT, have become increasingly popular with the general audience, many of whom are incorporating them into their daily routines. However, these LLMs inadvertently disclose personal or copyrighted information, which calls for a machine unlearning method to remove selective knowledge. Previous attempts sought to forget the link between t…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 16 pages, 5 figures

  18. arXiv:2406.12319  [pdf, other]

    cs.CL

    PRePair: Pointwise Reasoning Enhance Pairwise Evaluating for Robust Instruction-Following Assessments

    Authors: Hawon Jeong, ChaeHun Park, Jimin Hong, Jaegul Choo

    Abstract: Pairwise evaluation using large language models (LLMs) is widely used for evaluating natural language generation (NLG) tasks. However, the reliability of LLMs is often compromised by biases, such as favoring verbosity and authoritative tone. In the study, we focus on the comparison of two LLM-based evaluation approaches, pointwise and pairwise. Our findings demonstrate that pointwise evaluators ex…

    Submitted 18 June, 2024; originally announced June 2024.

  19. arXiv:2406.12307  [pdf, other]

    cs.CL

    Can Tool-augmented Large Language Models be Aware of Incomplete Conditions?

    Authors: Seungbin Yang, ChaeHun Park, Taehee Kim, Jaegul Choo

    Abstract: Recent advancements in integrating large language models (LLMs) with tools have allowed the models to interact with real-world environments. However, these tool-augmented LLMs often encounter incomplete scenarios when users provide partial information or the necessary tools are unavailable. Recognizing and managing such scenarios is crucial for LLMs to ensure their reliability, but this exploratio…

    Submitted 18 June, 2024; originally announced June 2024.

  20. arXiv:2406.11672  [pdf, other]

    cs.CV

    Effective Rank Analysis and Regularization for Enhanced 3D Gaussian Splatting

    Authors: Junha Hyung, Susung Hong, Sungwon Hwang, Jaeseong Lee, Jaegul Choo, Jin-Hwa Kim

    Abstract: 3D reconstruction from multi-view images is one of the fundamental challenges in computer vision and graphics. Recently, 3D Gaussian Splatting (3DGS) has emerged as a promising technique capable of real-time rendering with high-quality 3D reconstruction. This method utilizes 3D Gaussian representation and tile-based splatting techniques, bypassing the expensive neural field querying. Despite its p…

    Submitted 18 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: project page: https://junhahyung.github.io/erankgs.github.io

  21. arXiv:2406.11427  [pdf, other]

    eess.AS cs.AI cs.CL cs.LG cs.SD

    DiTTo-TTS: Efficient and Scalable Zero-Shot Text-to-Speech with Diffusion Transformer

    Authors: Keon Lee, Dong Won Kim, Jaehyeon Kim, Jaewoong Cho

    Abstract: Large-scale diffusion models have shown outstanding generative abilities across multiple modalities including images, videos, and audio. However, text-to-speech (TTS) systems typically involve domain-specific modeling factors (e.g., phonemes and phoneme-level durations) to ensure precise temporal alignments between text and speech, which hinders the efficiency and scalability of diffusion models f…

    Submitted 17 June, 2024; originally announced June 2024.

  22. arXiv:2406.06947  [pdf, other]

    cs.AI cs.HC

    CAAP: Context-Aware Action Planning Prompting to Solve Computer Tasks with Front-End UI Only

    Authors: Junhee Cho, Jihoon Kim, Daseul Bae, Jinho Choo, Youngjune Gwon, Yeong-Dae Kwon

    Abstract: Software robots have long been deployed in Robotic Process Automation (RPA) to automate mundane and repetitive computer tasks. The advent of Large Language Models (LLMs) with advanced reasoning capabilities has set the stage for these agents to now undertake more complex and even previously unseen tasks. However, the LLM-based automation techniques in recent literature frequently rely on HTML sour…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: 10 pages, 5 figures; (19 pages and 6 figures more in appendix)

  23. arXiv:2406.06072  [pdf, other]

    cs.CV cs.LG cs.RO

    Adapting Pretrained ViTs with Convolution Injector for Visuo-Motor Control

    Authors: Dongyoon Hwang, Byungkun Lee, Hojoon Lee, Hyunseung Kim, Jaegul Choo

    Abstract: Vision Transformers (ViT), when paired with large-scale pretraining, have shown remarkable performance across various computer vision tasks, primarily due to their weak inductive bias. However, while such weak inductive bias aids in pretraining scalability, this may hinder the effective adaptation of ViTs for visuo-motor control tasks as a result of the absence of control-centric inductive biases.…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: accepted to ICML 2024

  24. arXiv:2406.06037  [pdf, other]

    cs.LG cs.AI cs.CV

    Investigating Pre-Training Objectives for Generalization in Vision-Based Reinforcement Learning

    Authors: Donghu Kim, Hojoon Lee, Kyungmin Lee, Dongyoon Hwang, Jaegul Choo

    Abstract: Recently, various pre-training methods have been introduced in vision-based Reinforcement Learning (RL). However, their generalization ability remains unclear due to evaluations being limited to in-distribution environments and non-unified experimental setups. To address this, we introduce the Atari Pre-training Benchmark (Atari-PB), which pre-trains a ResNet-50 model on 10 million transitions fro…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: accepted to ICML 2024

  25. arXiv:2406.05761  [pdf, other]

    cs.CL

    The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models

    Authors: Seungone Kim, Juyoung Suk, Ji Yong Cho, Shayne Longpre, Chaeeun Kim, Dongkeun Yoon, Guijin Son, Yejin Cho, Sheikh Shafayat, Jinheon Baek, Sue Hyun Park, Hyeonbin Hwang, Jinkyung Jo, Hyowon Cho, Haebin Shin, Seongyun Lee, Hanseok Oh, Noah Lee, Namgyu Ho, Se June Joo, Miyoung Ko, Yoonjoo Lee, Hyungjoo Chae, Jamin Shin, Joel Jang, et al. (7 additional authors not shown)

    Abstract: As language models (LMs) become capable of handling a wide range of tasks, their evaluation is becoming as challenging as their development. Most generation benchmarks currently assess LMs using abstract evaluation criteria like helpfulness and harmlessness, which often lack the flexibility and granularity of human assessment. Additionally, these benchmarks tend to focus disproportionately on spec…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: Work in Progress

  26. arXiv:2406.05432  [pdf, other]

    cs.CV

    Regularized Training with Generated Datasets for Name-Only Transfer of Vision-Language Models

    Authors: Minho Park, Sunghyun Park, Jooyeol Yun, Jaegul Choo

    Abstract: Recent advancements in text-to-image generation have inspired researchers to generate datasets tailored for perception models using generative models, which prove particularly valuable in scenarios where real-world data is limited. In this study, our goal is to address the challenges when fine-tuning vision-language models (e.g., CLIP) on generated datasets. Specifically, we aim to fine-tune visio…

    Submitted 8 June, 2024; originally announced June 2024.

    Comments: Preprint. Under review

  27. arXiv:2406.02596  [pdf, other]

    cs.LG cs.AI

    Slow and Steady Wins the Race: Maintaining Plasticity with Hare and Tortoise Networks

    Authors: Hojoon Lee, Hyeonseo Cho, Hyunseung Kim, Donghu Kim, Dugki Min, Jaegul Choo, Clare Lyle

    Abstract: This study investigates the loss of generalization ability in neural networks, revisiting warm-starting experiments from Ash & Adams. Our empirical analysis reveals that common methods designed to enhance plasticity by maintaining trainability provide limited benefits to generalization. While reinitializing the network can be effective, it also risks losing valuable prior knowledge. To this end, w…

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: accepted to ICML 2024

  28. arXiv:2406.02331  [pdf, other]

    cs.CL

    Translation Deserves Better: Analyzing Translation Artifacts in Cross-lingual Visual Question Answering

    Authors: ChaeHun Park, Koanho Lee, Hyesu Lim, Jaeseok Kim, Junmo Park, Yu-Jung Heo, Du-Seong Chang, Jaegul Choo

    Abstract: Building a reliable visual question answering (VQA) system across different languages is a challenging problem, primarily due to the lack of abundant samples for training. To address this challenge, recent studies have employed machine translation systems for the cross-lingual VQA task. This involves translating the evaluation samples into a source language (usually English) and using monolingual…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: ACL 2024 Findings Accepted

  29. arXiv:2406.01506  [pdf, other]

    cs.CL cs.AI cs.LG stat.ML

    The Geometry of Categorical and Hierarchical Concepts in Large Language Models

    Authors: Kiho Park, Yo Joong Choe, Yibo Jiang, Victor Veitch

    Abstract: Understanding how semantic meaning is encoded in the representation spaces of large language models is a fundamental problem in interpretability. In this paper, we study the two foundational questions in this area. First, how are categorical concepts, such as {'mammal', 'bird', 'reptile', 'fish'}, represented? Second, how are hierarchical relations between concepts encoded? For example, how is the…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: Code is available at https://github.com/KihoPark/LLM_Categorical_Hierarchical_Representations

  30. arXiv:2406.00324  [pdf, other]

    cs.LG cs.AI

    Do's and Don'ts: Learning Desirable Skills with Instruction Videos

    Authors: Hyunseung Kim, Byungkun Lee, Hojoon Lee, Dongyoon Hwang, Donghu Kim, Jaegul Choo

    Abstract: Unsupervised skill discovery is a learning paradigm that aims to acquire diverse behaviors without explicit rewards. However, it faces challenges in learning complex behaviors and often leads to learning unsafe or undesirable behaviors. For instance, in various continuous control tasks, current unsupervised skill discovery methods succeed in learning basic locomotions like standing but struggle wi…

    Submitted 1 June, 2024; originally announced June 2024.

  31. arXiv:2405.18832  [pdf, other]

    cs.LG cs.AI cs.AR

    MoNDE: Mixture of Near-Data Experts for Large-Scale Sparse Models

    Authors: Taehyun Kim, Kwanseok Choi, Youngmock Cho, Jaehoon Cho, Hyuk-Jae Lee, Jaewoong Sim

    Abstract: Mixture-of-Experts (MoE) large language models (LLM) have memory requirements that often exceed the GPU memory capacity, requiring costly parameter movement from secondary memories to the GPU for expert computation. In this work, we present Mixture of Near-Data Experts (MoNDE), a near-data computing solution that efficiently enables MoE LLM inference. MoNDE reduces the volume of MoE parameter move…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: Accepted to DAC 2024

  32. arXiv:2405.18368  [pdf, other]

    cs.CV

    The 2024 Brain Tumor Segmentation (BraTS) Challenge: Glioma Segmentation on Post-treatment MRI

    Authors: Maria Correia de Verdier, Rachit Saluja, Louis Gagnon, Dominic LaBella, Ujjwall Baid, Nourel Hoda Tahon, Martha Foltyn-Dumitru, Jikai Zhang, Maram Alafif, Saif Baig, Ken Chang, Gennaro D'Anna, Lisa Deptula, Diviya Gupta, Muhammad Ammar Haider, Ali Hussain, Michael Iv, Marinos Kontzialis, Paul Manning, Farzan Moodi, Teresa Nunes, Aaron Simon, Nico Sollmann, David Vu, Maruf Adewole, et al. (60 additional authors not shown)

    Abstract: Gliomas are the most common malignant primary brain tumors in adults and one of the deadliest types of cancer. There are many challenges in treatment and monitoring due to the genetic diversity and high intrinsic heterogeneity in appearance, shape, histology, and treatment response. Treatments include surgery, radiation, and systemic therapies, with magnetic resonance imaging (MRI) playing a key r…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: 10 pages, 4 figures, 1 table

  33. arXiv:2405.13154  [pdf, other]

    cs.HC

    Generating A Crowdsourced Conversation Dataset to Combat Cybergrooming

    Authors: Xinyi Zhang, Pamela J. Wisniewski, Jin-hee Cho, Lifu Huang, Sang Won Lee

    Abstract: Cybergrooming emerges as a growing threat to adolescent safety and mental health. One way to combat cybergrooming is to leverage predictive artificial intelligence (AI) to detect predatory behaviors in social media. However, these methods can encounter challenges like false positives and negative implications such as privacy concerns. Another complementary strategy involves using generative artifi…

    Submitted 21 May, 2024; originally announced May 2024.

  34. arXiv:2405.09806  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    MediSyn: Text-Guided Diffusion Models for Broad Medical 2D and 3D Image Synthesis

    Authors: Joseph Cho, Cyril Zakka, Dhamanpreet Kaur, Rohan Shad, Ross Wightman, Akshay Chaudhari, William Hiesinger

    Abstract: Diffusion models have recently gained significant traction due to their ability to generate high-fidelity and diverse images and videos conditioned on text prompts. In medicine, this application promises to address the critical challenge of data scarcity, a consequence of barriers in data sharing, stringent patient privacy regulations, and disparities in patient population and demographics. By gen…

    Submitted 10 July, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

  35. arXiv:2405.08142  [pdf]

    cs.CL cs.CY

    Discursive objection strategies in online comments: Developing a classification schema and validating its training

    Authors: Ashley L. Shea, Aspen K. B. Omapang, Ji Yong Cho, Miryam Y. Ginsparg, Natalie Bazarova, Winice Hui, René F. Kizilcec, Chau Tong, Drew Margolin

    Abstract: Most Americans agree that misinformation, hate speech and harassment are harmful and inadequately curbed on social media through current moderation practices. In this paper, we aim to understand the discursive strategies employed by people in response to harmful speech in news comments. We conducted a content analysis of more than 6500 comment replies to trending news videos on YouTube and Twitter…

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: This paper was accepted and presented at the 73rd Annual International Communication Association International Conference, May 2023

    ACM Class: I.2.7, J.4

  36. arXiv:2405.07896  [pdf, other]

    cs.AI cs.HC cs.IR cs.LG

    Almanac Copilot: Towards Autonomous Electronic Health Record Navigation

    Authors: Cyril Zakka, Joseph Cho, Gracia Fahed, Rohan Shad, Michael Moor, Robyn Fong, Dhamanpreet Kaur, Vishnu Ravi, Oliver Aalami, Roxana Daneshjou, Akshay Chaudhari, William Hiesinger

    Abstract: Clinicians spend large amounts of time on clinical documentation, and inefficiencies impact quality of care and increase clinician burnout. Despite the promise of electronic medical records (EMR), the transition from paper-based records has been negatively associated with clinician wellness, in part due to poor user experience, increased burden of documentation, and alert fatigue. In this study, w…

    Submitted 14 May, 2024; v1 submitted 30 April, 2024; originally announced May 2024.

  37. arXiv:2405.03958  [pdf, other]

    cs.CV cs.AI cs.LG

    Simple Drop-in LoRA Conditioning on Attention Layers Will Improve Your Diffusion Model

    Authors: Joo Young Choi, Jaesung R. Park, Inkyu Park, Jaewoong Cho, Albert No, Ernest K. Ryu

    Abstract: Current state-of-the-art diffusion models employ U-Net architectures containing convolutional and (qkv) self-attention layers. The U-Net processes images while being conditioned on the time embedding input for each sampling step and the class or caption embedding input corresponding to the desired conditional generation. Such conditioning involves scale-and-shift operations to the convolutional la…

    Submitted 6 May, 2024; originally announced May 2024.

  38. arXiv:2405.03685  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Language-Image Models with 3D Understanding

    Authors: Jang Hyun Cho, Boris Ivanovic, Yulong Cao, Edward Schmerling, Yue Wang, Xinshuo Weng, Boyi Li, Yurong You, Philipp Krähenbühl, Yan Wang, Marco Pavone

    Abstract: Multi-modal large language models (MLLMs) have shown incredible capabilities in a variety of 2D vision and language tasks. We extend MLLMs' perceptual capabilities to ground and reason about images in 3-dimensional space. To that end, we first develop a large-scale pre-training dataset for 2D and 3D called LV3D by combining multiple existing 2D and 3D recognition datasets under a common task formu…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: Project page: https://janghyuncho.github.io/Cube-LLM

  39. arXiv:2404.19753  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    DOCCI: Descriptions of Connected and Contrasting Images

    Authors: Yasumasa Onoe, Sunayana Rane, Zachary Berger, Yonatan Bitton, Jaemin Cho, Roopal Garg, Alexander Ku, Zarana Parekh, Jordi Pont-Tuset, Garrett Tanzer, Su Wang, Jason Baldridge

    Abstract: Vision-language datasets are vital for both text-to-image (T2I) and image-to-text (I2T) research. However, current datasets lack descriptions with fine-grained detail that would allow for richer associations to be learned by models. To fill the gap, we introduce Descriptions of Connected and Contrasting Images (DOCCI), a dataset with long, human-annotated English descriptions for 15k images that w…

    Submitted 30 April, 2024; originally announced April 2024.

  40. arXiv:2404.19250  [pdf, other]

    cs.CV

    Enhancing Intrinsic Features for Debiasing via Investigating Class-Discerning Common Attributes in Bias-Contrastive Pair

    Authors: Jeonghoon Park, Chaeyeon Chung, Juyoung Lee, Jaegul Choo

    Abstract: In the image classification task, deep neural networks frequently rely on bias attributes that are spuriously correlated with a target class in the presence of dataset bias, resulting in degraded performance when applied to data without bias attributes. The task of debiasing aims to compel classifiers to learn intrinsic attributes that inherently define a target class rather than focusing on bias…

    Submitted 17 June, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

    Comments: Accepted to CVPR 2024

  41. arXiv:2404.18826  [pdf, other]

    cs.SI

    Winning the Social Media Influence Battle: Uncertainty-Aware Opinions to Understand and Spread True Information via Competitive Influence Maximization

    Authors: Qi Zhang, Lance M. Kaplan, Audun Jøsang, Dong Hyun Jeong, Feng Chen, Jin-Hee Cho

    Abstract: Competitive Influence Maximization (CIM) involves entities competing to maximize influence in online social networks (OSNs). Current Deep Reinforcement Learning (DRL) methods in CIM rely on simplistic binary opinion models (i.e., an opinion is represented by either 0 or 1) and often overlook the complexity of users' behavioral characteristics and their prior knowledge. We propose a novel DRL-based…

    Submitted 29 April, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

    Comments: 8 pages, 3 figures, submitted to ASONAM 2024

  42. arXiv:2404.16137  [pdf, ps, other]

    cs.IT cs.LG eess.SP

    Learned Pulse Shaping Design for PAPR Reduction in DFT-s-OFDM

    Authors: Fabrizio Carpi, Soheil Rostami, Joonyoung Cho, Siddharth Garg, Elza Erkip, Charlie Jianzhong Zhang

    Abstract: High peak-to-average power ratio (PAPR) is one of the main factors limiting cell coverage for cellular systems, especially in the uplink direction. Discrete Fourier transform spread orthogonal frequency-domain multiplexing (DFT-s-OFDM) with spectrally-extended frequency-domain spectrum shaping (FDSS) is one of the efficient techniques deployed to lower the PAPR of the uplink waveforms. In this wor…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: 5 pages, under review

  43. arXiv:2404.12404  [pdf, other]

    cs.LG cs.AI

    Exploring Prompting Methods for Mitigating Class Imbalance through Synthetic Data Generation with Large Language Models

    Authors: Jinhee Kim, Taesung Kim, Jaegul Choo

    Abstract: Large language models (LLMs) have demonstrated impressive in-context learning capabilities across various domains. Inspired by this, our study explores the effectiveness of LLMs in generating realistic tabular data to mitigate class imbalance. We investigate and identify key prompt design elements such as data format, class presentation, and variable mapping to optimize the generation performance.…

    Submitted 26 May, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

  44. arXiv:2404.10980  [pdf, other]

    cs.CV cs.LG

    Hyper Evidential Deep Learning to Quantify Composite Classification Uncertainty

    Authors: Changbin Li, Kangshuo Li, Yuzhe Ou, Lance M. Kaplan, Audun Jøsang, Jin-Hee Cho, Dong Hyun Jeong, Feng Chen

    Abstract: Deep neural networks (DNNs) have been shown to perform well on exclusive, multi-class classification tasks. However, when different classes have similar visual features, it becomes challenging for human annotators to differentiate them. This scenario necessitates the use of composite class labels. In this paper, we propose a novel framework called Hyper-Evidential Neural Network (HENN) that explic…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: In Proceedings of The Twelfth International Conference on Learning Representations, ICLR 2024

  45. arXiv:2404.09967  [pdf, other]

    cs.CV cs.AI cs.LG

    Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model

    Authors: Han Lin, Jaemin Cho, Abhay Zala, Mohit Bansal

    Abstract: ControlNets are widely used for adding spatial control to text-to-image diffusion models with different conditions, such as depth maps, scribbles/sketches, and human poses. However, when it comes to controllable video generation, ControlNets cannot be directly integrated into new backbones due to feature space mismatches, and training ControlNets for new backbones can be a significant burden for m…

    Submitted 24 May, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

    Comments: First two authors contributed equally; Project page: https://ctrl-adapter.github.io/

  46. arXiv:2404.02781  [pdf, other]

    eess.AS cs.SD

    CLaM-TTS: Improving Neural Codec Language Model for Zero-Shot Text-to-Speech

    Authors: Jaehyeon Kim, Keon Lee, Seungjun Chung, Jaewoong Cho

    Abstract: With the emergence of neural audio codecs, which encode multiple streams of discrete tokens from audio, large language models have recently gained attention as a promising approach for zero-shot Text-to-Speech (TTS) synthesis. Despite the ongoing rush towards scaling paradigms, audio tokenization ironically amplifies the scalability challenge, stemming from its long sequence length and the complex…

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: ICLR 2024

  47. arXiv:2404.01808  [pdf, other]

    cs.CR

    Software-Defined Cryptography: A Design Feature of Cryptographic Agility

    Authors: Jihoon Cho, Changhoon Lee, Eunkyung Kim, Jieun Lee, Beumjin Cho

    Abstract: Cryptographic agility, or crypto-agility, is a design feature that enables agile updates to new cryptographic algorithms and standards without the need to modify or replace the surrounding infrastructure. This paper examines the prerequisites for crypto-agility and proposes its desired design feature. More specifically, we investigate the design characteristics of widely deployed cybersecurity par…

    Submitted 2 April, 2024; originally announced April 2024.

  48. arXiv:2404.01015  [pdf, other]

    cs.CL

    PairEval: Open-domain Dialogue Evaluation with Pairwise Comparison

    Authors: ChaeHun Park, Minseok Choi, Dohyun Lee, Jaegul Choo

    Abstract: Building a reliable and automated evaluation metric is a necessary but challenging problem for open-domain dialogue systems. Recent studies proposed evaluation metrics that assess generated responses by considering their relevance to previous dialogue histories. Although effective, these metrics evaluate individual responses directly rather than considering their relative quality compared to other…

    Submitted 17 July, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

    Comments: COLM2024 (accepted)

  49. arXiv:2404.00741  [pdf, other]

    cs.CV

    Rethinking Interactive Image Segmentation with Low Latency, High Quality, and Diverse Prompts

    Authors: Qin Liu, Jaemin Cho, Mohit Bansal, Marc Niethammer

    Abstract: The goal of interactive image segmentation is to delineate specific regions within an image via visual or language prompts. Low-latency and high-quality interactive segmentation with diverse prompts remain challenging for existing specialist and generalist models. Specialist models, with their limited prompts and task-specific designs, experience high latency because the image must be recomputed e…

    Submitted 31 March, 2024; originally announced April 2024.

    Comments: CVPR 2024 https://github.com/uncbiag/SegNext

  50. arXiv:2403.19985  [pdf, other]

    cs.CV

    Stable Surface Regularization for Fast Few-Shot NeRF

    Authors: Byeongin Joung, Byeong-Uk Lee, Jaesung Choe, Ukcheol Shin, Minjun Kang, Taeyeop Lee, In So Kweon, Kuk-Jin Yoon

    Abstract: This paper proposes an algorithm for synthesizing novel views under few-shot setup. The main concept is to develop a stable surface regularization technique called Annealing Signed Distance Function (ASDF), which anneals the surface in a coarse-to-fine manner to accelerate convergence speed. We observe that the Eikonal loss - which is a widely known geometric regularization - requires dense traini…

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: 3DV 2024