Search | arXiv e-print repository

MERLIN: Multimodal Embedding Refinement via LLM-based Iterative Navigation for Text-Video Retrieval-Rerank Pipeline

Authors: Donghoon Han, Eunhwan Park, Gisang Lee, Adam Lee, Nojun Kwak

Abstract: The rapid expansion of multimedia content has made accurately retrieving relevant videos from large collections increasingly challenging. Recent advancements in text-video retrieval have focused on cross-modal interactions, large-scale foundation model training, and probabilistic modeling, yet often neglect the crucial user perspective, leading to discrepancies between user queries and the content… ▽ More The rapid expansion of multimedia content has made accurately retrieving relevant videos from large collections increasingly challenging. Recent advancements in text-video retrieval have focused on cross-modal interactions, large-scale foundation model training, and probabilistic modeling, yet often neglect the crucial user perspective, leading to discrepancies between user queries and the content retrieved. To address this, we introduce MERLIN (Multimodal Embedding Refinement via LLM-based Iterative Navigation), a novel, training-free pipeline that leverages Large Language Models (LLMs) for iterative feedback learning. MERLIN refines query embeddings from a user perspective, enhancing alignment between queries and video content through a dynamic question answering process. Experimental results on datasets like MSR-VTT, MSVD, and ActivityNet demonstrate that MERLIN substantially improves Recall@1, outperforming existing systems and confirming the benefits of integrating LLMs into multimodal retrieval systems for more responsive and context-aware multimedia retrieval. △ Less

Submitted 17 July, 2024; originally announced July 2024.

Comments: Work in progress

arXiv:2406.18459 [pdf, other]

DiffuseHigh: Training-free Progressive High-Resolution Image Synthesis through Structure Guidance

Authors: Younghyun Kim, Geunmin Hwang, Junyu Zhang, Eunbyung Park

Abstract: Recent surge in large-scale generative models has spurred the development of vast fields in computer vision. In particular, text-to-image diffusion models have garnered widespread adoption across diverse domain due to their potential for high-fidelity image generation. Nonetheless, existing large-scale diffusion models are confined to generate images of up to 1K resolution, which is far from meeti… ▽ More Recent surge in large-scale generative models has spurred the development of vast fields in computer vision. In particular, text-to-image diffusion models have garnered widespread adoption across diverse domain due to their potential for high-fidelity image generation. Nonetheless, existing large-scale diffusion models are confined to generate images of up to 1K resolution, which is far from meeting the demands of contemporary commercial applications. Directly sampling higher-resolution images often yields results marred by artifacts such as object repetition and distorted shapes. Addressing the aforementioned issues typically necessitates training or fine-tuning models on higher resolution datasets. However, this undertaking poses a formidable challenge due to the difficulty in collecting large-scale high-resolution contents and substantial computational resources. While several preceding works have proposed alternatives, they often fail to produce convincing results. In this work, we probe the generative ability of diffusion models at higher resolution beyond its original capability and propose a novel progressive approach that fully utilizes generated low-resolution image to guide the generation of higher resolution image. Our method obviates the need for additional training or fine-tuning which significantly lowers the burden of computational costs. Extensive experiments and results validate the efficiency and efficacy of our method. Project page: https://yhyun225.github.io/DiffuseHigh/ △ Less

Submitted 11 July, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

arXiv:2406.15102 [pdf, other]

HLQ: Fast and Efficient Backpropagation via Hadamard Low-rank Quantization

Authors: Seonggon Kim, Eunhyeok Park

Abstract: With the rapid increase in model size and the growing importance of various fine-tuning applications, lightweight training has become crucial. Since the backward pass is twice as expensive as the forward pass, optimizing backpropagation is particularly important. However, modifications to this process can lead to suboptimal convergence, so training optimization should minimize perturbations, which… ▽ More With the rapid increase in model size and the growing importance of various fine-tuning applications, lightweight training has become crucial. Since the backward pass is twice as expensive as the forward pass, optimizing backpropagation is particularly important. However, modifications to this process can lead to suboptimal convergence, so training optimization should minimize perturbations, which is a highly challenging task. In this study, we introduce a novel optimization strategy called Hadamard Low-rank Quantization (HLQ), focusing on reducing the cost of backpropagation in convolutional and linear layers. We first analyze the sensitivity of gradient computation with respect to activation and weight, and judiciously design the HLQ pipeline to apply 4-bit Hadamard quantization to the activation gradient and Hadamard low-rank approximation to the weight gradient. This combination was found to be the best for maximizing benefits, and our extensive experiments demonstrate the outstanding performance of HLQ in both training from scratch and fine-tuning, achieving significant memory savings and acceleration on real GPUs with negligible quality degradation. △ Less

Submitted 21 June, 2024; originally announced June 2024.

arXiv:2406.13251 [pdf, other]

Freq-Mip-AA : Frequency Mip Representation for Anti-Aliasing Neural Radiance Fields

Authors: Youngin Park, Seungtae Nam, Cheul-hee Hahm, Eunbyung Park

Abstract: Neural Radiance Fields (NeRF) have shown remarkable success in representing 3D scenes and generating novel views. However, they often struggle with aliasing artifacts, especially when rendering images from different camera distances from the training views. To address the issue, Mip-NeRF proposed using volumetric frustums to render a pixel and suggested integrated positional encoding (IPE). While… ▽ More Neural Radiance Fields (NeRF) have shown remarkable success in representing 3D scenes and generating novel views. However, they often struggle with aliasing artifacts, especially when rendering images from different camera distances from the training views. To address the issue, Mip-NeRF proposed using volumetric frustums to render a pixel and suggested integrated positional encoding (IPE). While effective, this approach requires long training times due to its reliance on MLP architecture. In this work, we propose a novel anti-aliasing technique that utilizes grid-based representations, usually showing significantly faster training time. In addition, we exploit frequency-domain representation to handle the aliasing problem inspired by the sampling theorem. The proposed method, FreqMipAA, utilizes scale-specific low-pass filtering (LPF) and learnable frequency masks. Scale-specific low-pass filters (LPF) prevent aliasing and prioritize important image details, and learnable masks effectively remove problematic high-frequency elements while retaining essential information. By employing a scale-specific LPF and trainable masks, FreqMipAA can effectively eliminate the aliasing factor while retaining important details. We validated the proposed technique by incorporating it into a widely used grid-based method. The experimental results have shown that the FreqMipAA effectively resolved the aliasing issues and achieved state-of-the-art results in the multi-scale Blender dataset. Our code is available at https://github.com/yi0109/FreqMipAA . △ Less

Submitted 19 June, 2024; originally announced June 2024.

Comments: Accepted to ICIP 2024, 7 pages, 3 figures

arXiv:2405.17083 [pdf, other]

F-3DGS: Factorized Coordinates and Representations for 3D Gaussian Splatting

Authors: Xiangyu Sun, Joo Chan Lee, Daniel Rho, Jong Hwan Ko, Usman Ali, Eunbyung Park

Abstract: The neural radiance field (NeRF) has made significant strides in representing 3D scenes and synthesizing novel views. Despite its advancements, the high computational costs of NeRF have posed challenges for its deployment in resource-constrained environments and real-time applications. As an alternative to NeRF-like neural rendering methods, 3D Gaussian Splatting (3DGS) offers rapid rendering spee… ▽ More The neural radiance field (NeRF) has made significant strides in representing 3D scenes and synthesizing novel views. Despite its advancements, the high computational costs of NeRF have posed challenges for its deployment in resource-constrained environments and real-time applications. As an alternative to NeRF-like neural rendering methods, 3D Gaussian Splatting (3DGS) offers rapid rendering speeds while maintaining excellent image quality. However, as it represents objects and scenes using a myriad of Gaussians, it requires substantial storage to achieve high-quality representation. To mitigate the storage overhead, we propose Factorized 3D Gaussian Splatting (F-3DGS), a novel approach that drastically reduces storage requirements while preserving image quality. Inspired by classical matrix and tensor factorization techniques, our method represents and approximates dense clusters of Gaussians with significantly fewer Gaussians through efficient factorization. We aim to efficiently represent dense 3D Gaussians by approximating them with a limited amount of information for each axis and their combinations. This method allows us to encode a substantially large number of Gaussians along with their essential attributes -- such as color, scale, and rotation -- necessary for rendering using a relatively small number of elements. Extensive experimental results demonstrate that F-3DGS achieves a significant reduction in storage costs while maintaining comparable quality in rendered images. △ Less

Submitted 28 May, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

Comments: Our project page including code is available at https://xiangyu1sun.github.io/Factorize-3DGS/

arXiv:2404.19381 [pdf, other]

Low-overhead General-purpose Near-Data Processing in CXL Memory Expanders

Authors: Hyungkyu Ham, Jeongmin Hong, Geonwoo Park, Yunseon Shin, Okkyun Woo, Wonhyuk Yang, Jinhoon Bae, Eunhyeok Park, Hyojin Sung, Euicheol Lim, Gwangsun Kim

Abstract: To overcome the memory capacity wall of large-scale AI and big data applications, Compute Express Link (CXL) enables cost-efficient memory expansion beyond the local DRAM of processors. While its CXL.mem protocol stack minimizes interconnect latency, CXL memory accesses can still result in significant slowdowns for memory-bound applications. While near-data processing (NDP) in CXL memory can overc… ▽ More To overcome the memory capacity wall of large-scale AI and big data applications, Compute Express Link (CXL) enables cost-efficient memory expansion beyond the local DRAM of processors. While its CXL.mem protocol stack minimizes interconnect latency, CXL memory accesses can still result in significant slowdowns for memory-bound applications. While near-data processing (NDP) in CXL memory can overcome such limitations, prior works propose application-specific HW units that are not suitable for practical CXL memory-based systems that should support various applications. On the other hand, existing CPU or GPU cores are not cost-effective for NDP because they are not optimized for memory-bound applications. In addition, the communication between the host processor and CXL controller for NDP offloading should achieve low latency, but the CXL$.$io (or PCIe) protocol incurs $μ$s-scale latency and is not suitable for fine-grain NDP. To achieve high-performance NDP end-to-end, we propose a low-overhead general-purpose NDP architecture for CXL memory referred to as Memory-Mapped NDP (M$^2$NDP), which comprises memory-mapped functions (M$^2$func) and memory-mapped $μ$threading (M$^2μ$thr). The M$^2$func is a CXL.mem-compatible low-overhead communication mechanism between the host processor and NDP controller in the CXL memory. The M$^2μ$thr enables low-cost, general-purpose NDP unit design by introducing lightweight $μ$threads that support highly concurrent execution of NDP kernels with minimal resource wastage. By combining them, our M$^2$NDP achieves significant speedups for various applications, including in-memory OLAP, key-value store, large language model, recommendation model, and graph analytics by up to 128$\times$ (11.5$\times$ overall) and reduces energy by up to 87.9\% (80.1\% overall) compared to a baseline CPU or GPU host with passive CXL memory. △ Less

Submitted 30 April, 2024; originally announced April 2024.

arXiv:2404.14687 [pdf, other]

Pegasus-v1 Technical Report

Authors: Raehyuk Jung, Hyojun Go, Jaehyuk Yi, Jiho Jang, Daniel Kim, Jay Suh, Aiden Lee, Cooper Han, Jae Lee, Jeff Kim, Jin-Young Kim, Junwan Kim, Kyle Park, Lucas Lee, Mars Ha, Minjoon Seo, Abraham Jo, Ed Park, Hassan Kianinejad, SJ Kim, Tony Moon, Wade Jeong, Andrei Popescu, Esther Kim, EK Yoon , et al. (19 additional authors not shown)

Abstract: This technical report introduces Pegasus-1, a multimodal language model specialized in video content understanding and interaction through natural language. Pegasus-1 is designed to address the unique challenges posed by video data, such as interpreting spatiotemporal information, to offer nuanced video content comprehension across various lengths. This technical report overviews Pegasus-1's archi… ▽ More This technical report introduces Pegasus-1, a multimodal language model specialized in video content understanding and interaction through natural language. Pegasus-1 is designed to address the unique challenges posed by video data, such as interpreting spatiotemporal information, to offer nuanced video content comprehension across various lengths. This technical report overviews Pegasus-1's architecture, training strategies, and its performance in benchmarks on video conversation, zero-shot video question answering, and video summarization. We also explore qualitative characteristics of Pegasus-1 , demonstrating its capabilities as well as its limitations, in order to provide readers a balanced view of its current state and its future direction. △ Less

Submitted 22 April, 2024; originally announced April 2024.

arXiv:2404.04913 [pdf, other]

CodecNeRF: Toward Fast Encoding and Decoding, Compact, and High-quality Novel-view Synthesis

Authors: Gyeongjin Kang, Younggeun Lee, Seungjun Oh, Eunbyung Park

Abstract: Neural Radiance Fields (NeRF) have achieved huge success in effectively capturing and representing 3D objects and scenes. However, several factors have impeded its further proliferation as next-generation 3D media. To establish a ubiquitous presence in everyday media formats, such as images and videos, it is imperative to devise a solution that effectively fulfills three key objectives: fast encod… ▽ More Neural Radiance Fields (NeRF) have achieved huge success in effectively capturing and representing 3D objects and scenes. However, several factors have impeded its further proliferation as next-generation 3D media. To establish a ubiquitous presence in everyday media formats, such as images and videos, it is imperative to devise a solution that effectively fulfills three key objectives: fast encoding and decoding time, compact model sizes, and high-quality renderings. Despite significant advancements, a comprehensive algorithm that adequately addresses all objectives has yet to be fully realized. In this work, we present CodecNeRF, a neural codec for NeRF representations, consisting of a novel encoder and decoder architecture that can generate a NeRF representation in a single forward pass. Furthermore, inspired by the recent parameter-efficient finetuning approaches, we develop a novel finetuning method to efficiently adapt the generated NeRF representations to a new test instance, leading to high-quality image renderings and compact code sizes. The proposed CodecNeRF, a newly suggested encoding-decoding-finetuning pipeline for NeRF, achieved unprecedented compression performance of more than 150x and 20x reduction in encoding time while maintaining (or improving) the image quality on widely used 3D object datasets, such as ShapeNet and Objaverse. △ Less

Submitted 28 May, 2024; v1 submitted 7 April, 2024; originally announced April 2024.

Comments: Project page: https://gynjn.github.io/Codec-NeRF/

arXiv:2404.01745 [pdf, other]

Unleash the Potential of CLIP for Video Highlight Detection

Authors: Donghoon Han, Seunghyeon Seo, Eunhwan Park, Seong-Uk Nam, Nojun Kwak

Abstract: Multimodal and large language models (LLMs) have revolutionized the utilization of open-world knowledge, unlocking novel potentials across various tasks and applications. Among these domains, the video domain has notably benefited from their capabilities. In this paper, we present Highlight-CLIP (HL-CLIP), a method designed to excel in the video highlight detection task by leveraging the pre-train… ▽ More Multimodal and large language models (LLMs) have revolutionized the utilization of open-world knowledge, unlocking novel potentials across various tasks and applications. Among these domains, the video domain has notably benefited from their capabilities. In this paper, we present Highlight-CLIP (HL-CLIP), a method designed to excel in the video highlight detection task by leveraging the pre-trained knowledge embedded in multimodal models. By simply fine-tuning the multimodal encoder in combination with our innovative saliency pooling technique, we have achieved the state-of-the-art performance in the highlight detection task, the QVHighlight Benchmark, to the best of our knowledge. △ Less

Submitted 2 April, 2024; originally announced April 2024.

arXiv:2404.00018 [pdf, other]

Can AI Outperform Human Experts in Creating Social Media Creatives?

Authors: Eunkyung Park, Raymond K. Wong, Junbum Kwon

Abstract: Artificial Intelligence has outperformed human experts in functional tasks such as chess and baduk. How about creative tasks? This paper evaluates AI's capability in the creative domain compared to human experts, which little research has been conducted so far. We propose a novel Prompt-for-Prompt to generate social media creatives via prompt augmentation by Large Language Models. We take the most… ▽ More Artificial Intelligence has outperformed human experts in functional tasks such as chess and baduk. How about creative tasks? This paper evaluates AI's capability in the creative domain compared to human experts, which little research has been conducted so far. We propose a novel Prompt-for-Prompt to generate social media creatives via prompt augmentation by Large Language Models. We take the most popular Instagram posts (with the biggest number of like clicks) in top brands' Instagram accounts to create social media creatives. We give GPT 4 several prompt instructions with text descriptions to generate the most effective prompts for cutting-edge text-to-image generators: Midjourney, DALL E 3, and Stable Diffusion. LLM-augmented prompts can boost AI's abilities by adding objectives, engagement strategy, lighting and brand consistency for social media image creation. We conduct an extensive human evaluation experiment, and find that AI excels human experts, and Midjourney is better than the other text-to-image generators. Surprisingly, unlike conventional wisdom in the social media industry, prompt instruction including eye-catching shows much poorer performance than those including natural. Regarding the type of creatives, AI improves creatives with animals or products but less with real people. Also, AI improves creatives with short text descriptions more than with long text descriptions, because there is more room for AI to augment prompts with shorter descriptions. △ Less

Submitted 19 March, 2024; originally announced April 2024.

Comments: 17 pages, 5 figures

MSC Class: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

arXiv:2403.15485 [pdf, other]

MOGAM: A Multimodal Object-oriented Graph Attention Model for Depression Detection

Authors: Junyeop Cha, Seoyun Kim, Dongjae Kim, Eunil Park

Abstract: Early detection plays a crucial role in the treatment of depression. Therefore, numerous studies have focused on social media platforms, where individuals express their emotions, aiming to achieve early detection of depression. However, the majority of existing approaches often rely on specific features, leading to limited scalability across different types of social media datasets, such as text,… ▽ More Early detection plays a crucial role in the treatment of depression. Therefore, numerous studies have focused on social media platforms, where individuals express their emotions, aiming to achieve early detection of depression. However, the majority of existing approaches often rely on specific features, leading to limited scalability across different types of social media datasets, such as text, images, or videos. To overcome this limitation, we introduce a Multimodal Object-Oriented Graph Attention Model (MOGAM), which can be applied to diverse types of data, offering a more scalable and versatile solution. Furthermore, to ensure that our model can capture authentic symptoms of depression, we only include vlogs from users with a clinical diagnosis. To leverage the diverse features of vlogs, we adopt a multimodal approach and collect additional metadata such as the title, description, and duration of the vlogs. To effectively aggregate these multimodal features, we employed a cross-attention mechanism. MOGAM achieved an accuracy of 0.871 and an F1-score of 0.888. Moreover, to validate the scalability of MOGAM, we evaluated its performance with a benchmark dataset and achieved comparable results with prior studies (0.61 F1-score). In conclusion, we believe that the proposed model, MOGAM, is an effective solution for detecting depression in social media, offering potential benefits in the early detection and treatment of this mental health condition. △ Less

Submitted 21 March, 2024; originally announced March 2024.

Comments: 12 pages, 3 figures, 4 tables

arXiv:2403.13909 [pdf, other]

Sequential Modeling of Complex Marine Navigation: Case Study on a Passenger Vessel (Student Abstract)

Authors: Yimeng Fan, Pedram Agand, Mo Chen, Edward J. Park, Allison Kennedy, Chanwoo Bae

Abstract: The maritime industry's continuous commitment to sustainability has led to a dedicated exploration of methods to reduce vessel fuel consumption. This paper undertakes this challenge through a machine learning approach, leveraging a real-world dataset spanning two years of a ferry in west coast Canada. Our focus centers on the creation of a time series forecasting model given the dynamic and static… ▽ More The maritime industry's continuous commitment to sustainability has led to a dedicated exploration of methods to reduce vessel fuel consumption. This paper undertakes this challenge through a machine learning approach, leveraging a real-world dataset spanning two years of a ferry in west coast Canada. Our focus centers on the creation of a time series forecasting model given the dynamic and static states, actions, and disturbances. This model is designed to predict dynamic states based on the actions provided, subsequently serving as an evaluative tool to assess the proficiency of the ferry's operation under the captain's guidance. Additionally, it lays the foundation for future optimization algorithms, providing valuable feedback on decision-making processes. To facilitate future studies, our code is available at \url{https://github.com/pagand/model_optimze_vessel/tree/AAAI} △ Less

Submitted 20 March, 2024; originally announced March 2024.

Comments: 5 pages, 3 figures, AAAI 2024 student abstract

arXiv:2403.06342 [pdf, other]

Separable Physics-informed Neural Networks for Solving the BGK Model of the Boltzmann Equation

Authors: Jaemin Oh, Seung Yeon Cho, Seok-Bae Yun, Eunbyung Park, Youngjoon Hong

Abstract: In this study, we introduce a method based on Separable Physics-Informed Neural Networks (SPINNs) for effectively solving the BGK model of the Boltzmann equation. While the mesh-free nature of PINNs offers significant advantages in handling high-dimensional partial differential equations (PDEs), challenges arise when applying quadrature rules for accurate integral evaluation in the BGK operator, w… ▽ More In this study, we introduce a method based on Separable Physics-Informed Neural Networks (SPINNs) for effectively solving the BGK model of the Boltzmann equation. While the mesh-free nature of PINNs offers significant advantages in handling high-dimensional partial differential equations (PDEs), challenges arise when applying quadrature rules for accurate integral evaluation in the BGK operator, which can compromise the mesh-free benefit and increase computational costs. To address this, we leverage the canonical polyadic decomposition structure of SPINNs and the linear nature of moment calculation, achieving a substantial reduction in computational expense for quadrature rule application. The multi-scale nature of the particle density function poses difficulties in precisely approximating macroscopic moments using neural networks. To improve SPINN training, we introduce the integration of Gaussian functions into SPINNs, coupled with a relative loss approach. This modification enables SPINNs to decay as rapidly as Maxwellian distributions, thereby enhancing the accuracy of macroscopic moment approximations. The relative loss design further ensures that both large and small-scale features are effectively captured by the SPINNs. The efficacy of our approach is demonstrated through a series of five numerical experiments, including the solution to a challenging 3D Riemann problem. These results highlight the potential of our novel method in efficiently and accurately addressing complex challenges in computational physics. △ Less

Submitted 10 March, 2024; originally announced March 2024.

MSC Class: 68T20; 35R09

arXiv:2402.18293 [pdf, other]

Continuous Memory Representation for Anomaly Detection

Authors: Joo Chan Lee, Taejune Kim, Eunbyung Park, Simon S. Woo, Jong Hwan Ko

Abstract: There have been significant advancements in anomaly detection in an unsupervised manner, where only normal images are available for training. Several recent methods aim to detect anomalies based on a memory, comparing or reconstructing the input with directly stored normal features (or trained features with normal images). However, such memory-based approaches operate on a discrete feature space i… ▽ More There have been significant advancements in anomaly detection in an unsupervised manner, where only normal images are available for training. Several recent methods aim to detect anomalies based on a memory, comparing or reconstructing the input with directly stored normal features (or trained features with normal images). However, such memory-based approaches operate on a discrete feature space implemented by the nearest neighbor or attention mechanism, suffering from poor generalization or an identity shortcut issue outputting the same as input, respectively. Furthermore, the majority of existing methods are designed to detect single-class anomalies, resulting in unsatisfactory performance when presented with multiple classes of objects. To tackle all of the above challenges, we propose CRAD, a novel anomaly detection method for representing normal features within a "continuous" memory, enabled by transforming spatial features into coordinates and mapping them to continuous grids. Furthermore, we carefully design the grids tailored for anomaly detection, representing both local and global normal features and fusing them effectively. Our extensive experiments demonstrate that CRAD successfully generalizes the normal features and mitigates the identity shortcut, furthermore, CRAD effectively handles diverse classes in a single model thanks to the high-granularity continuous representation. In an evaluation using the MVTec AD dataset, CRAD significantly outperforms the previous state-of-the-art method by reducing 65.0% of the error for multi-class unified anomaly detection. The project page is available at https://tae-mo.github.io/crad/. △ Less

Submitted 10 March, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

Comments: Project page: https://tae-mo.github.io/crad/

arXiv:2402.14604 [pdf, other]

Embeddings and near-neighbor searching with constant additive error for hyperbolic spaces

Authors: Eunku Park, Antoine Vigneron

Abstract: We give an embedding of the Poincaré halfspace $H^D$ into a discrete metric space based on a binary tiling of $H^D$, with additive distortion $O(\log D)$. It yields the following results. We show that any subset $P$ of $n$ points in $H^D$ can be embedded into a graph-metric with $2^{O(D)}n$ vertices and edges, and with additive distortion $O(\log D)$. We also show how to construct, for any $k$, an… ▽ More We give an embedding of the Poincaré halfspace $H^D$ into a discrete metric space based on a binary tiling of $H^D$, with additive distortion $O(\log D)$. It yields the following results. We show that any subset $P$ of $n$ points in $H^D$ can be embedded into a graph-metric with $2^{O(D)}n$ vertices and edges, and with additive distortion $O(\log D)$. We also show how to construct, for any $k$, an $O(k\log D)$-purely additive spanner of $P$ with $2^{O(D)}n$ Steiner vertices and $2^{O(D)}n \cdot λ_k(n)$ edges, where $λ_k(n)$ is the $k$th-row inverse Ackermann function. Finally, we show how to construct an approximate Voronoi diagram for $P$ of size $2^{O(D)}n$. It allows us to answer approximate near-neighbor queries in $2^{O(D)}+O(\log n)$ time, with additive error $O(\log D)$. These constructions can be done in $2^{O(D)}n \log n$ time. △ Less

Submitted 1 April, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

arXiv:2402.14196 [pdf, other]

Mip-Grid: Anti-aliased Grid Representations for Neural Radiance Fields

Authors: Seungtae Nam, Daniel Rho, Jong Hwan Ko, Eunbyung Park

Abstract: Despite the remarkable achievements of neural radiance fields (NeRF) in representing 3D scenes and generating novel view images, the aliasing issue, rendering "jaggies" or "blurry" images at varying camera distances, remains unresolved in most existing approaches. The recently proposed mip-NeRF has addressed this challenge by rendering conical frustums instead of rays. However, it relies on MLP ar… ▽ More Despite the remarkable achievements of neural radiance fields (NeRF) in representing 3D scenes and generating novel view images, the aliasing issue, rendering "jaggies" or "blurry" images at varying camera distances, remains unresolved in most existing approaches. The recently proposed mip-NeRF has addressed this challenge by rendering conical frustums instead of rays. However, it relies on MLP architecture to represent the radiance fields, missing out on the fast training speed offered by the latest grid-based methods. In this work, we present mip-Grid, a novel approach that integrates anti-aliasing techniques into grid-based representations for radiance fields, mitigating the aliasing artifacts while enjoying fast training time. The proposed method generates multi-scale grids by applying simple convolution operations over a shared grid representation and uses the scale-aware coordinate to retrieve features at different scales from the generated multi-scale grids. To test the effectiveness, we integrated the proposed method into the two recent representative grid-based methods, TensoRF and K-Planes. Experimental results demonstrate that mip-Grid greatly improves the rendering performance of both methods and even outperforms mip-NeRF on multi-scale datasets while achieving significantly faster training time. For code and demo videos, please see https://stnamjef.github.io/mipgrid.github.io/. △ Less

Submitted 21 February, 2024; originally announced February 2024.

Comments: Accepted to NeurIPS 2023

arXiv:2401.17547 [pdf, other]

Task-Oriented Diffusion Model Compression

Authors: Geonung Kim, Beomsu Kim, Eunhyeok Park, Sunghyun Cho

Abstract: As recent advancements in large-scale Text-to-Image (T2I) diffusion models have yielded remarkable high-quality image generation, diverse downstream Image-to-Image (I2I) applications have emerged. Despite the impressive results achieved by these I2I models, their practical utility is hampered by their large model size and the computational burden of the iterative denoising process. In this paper,… ▽ More As recent advancements in large-scale Text-to-Image (T2I) diffusion models have yielded remarkable high-quality image generation, diverse downstream Image-to-Image (I2I) applications have emerged. Despite the impressive results achieved by these I2I models, their practical utility is hampered by their large model size and the computational burden of the iterative denoising process. In this paper, we explore the compression potential of these I2I models in a task-oriented manner and introduce a novel method for reducing both model size and the number of timesteps. Through extensive experiments, we observe key insights and use our empirical knowledge to develop practical solutions that aim for near-optimal results with minimal exploration costs. We validate the effectiveness of our method by applying it to InstructPix2Pix for image editing and StableSR for image restoration. Our approach achieves satisfactory output quality with 39.2% and 56.4% reduction in model footprint and 81.4% and 68.7% decrease in latency to InstructPix2Pix and StableSR, respectively. △ Less

Submitted 30 January, 2024; originally announced January 2024.

arXiv:2401.00834 [pdf, other]

Deblurring 3D Gaussian Splatting

Authors: Byeonghyeon Lee, Howoong Lee, Xiangyu Sun, Usman Ali, Eunbyung Park

Abstract: Recent studies in Radiance Fields have paved the robust way for novel view synthesis with their photorealistic rendering quality. Nevertheless, they usually employ neural networks and volumetric rendering, which are costly to train and impede their broad use in various real-time applications due to the lengthy rendering time. Lately 3D Gaussians splatting-based approach has been proposed to model… ▽ More Recent studies in Radiance Fields have paved the robust way for novel view synthesis with their photorealistic rendering quality. Nevertheless, they usually employ neural networks and volumetric rendering, which are costly to train and impede their broad use in various real-time applications due to the lengthy rendering time. Lately 3D Gaussians splatting-based approach has been proposed to model the 3D scene, and it achieves remarkable visual quality while rendering the images in real-time. However, it suffers from severe degradation in the rendering quality if the training images are blurry. Blurriness commonly occurs due to the lens defocusing, object motion, and camera shake, and it inevitably intervenes in clean image acquisition. Several previous studies have attempted to render clean and sharp images from blurry input images using neural fields. The majority of those works, however, are designed only for volumetric rendering-based neural radiance fields and are not straightforwardly applicable to rasterization-based 3D Gaussian splatting methods. Thus, we propose a novel real-time deblurring framework, Deblurring 3D Gaussian Splatting, using a small Multi-Layer Perceptron (MLP) that manipulates the covariance of each 3D Gaussian to model the scene blurriness. While Deblurring 3D Gaussian Splatting can still enjoy real-time rendering, it can reconstruct fine and sharp details from blurry images. A variety of experiments have been conducted on the benchmark, and the results have revealed the effectiveness of our approach for deblurring. Qualitative results are available at https://benhenryl.github.io/Deblurring-3D-Gaussian-Splatting/ △ Less

Submitted 26 May, 2024; v1 submitted 1 January, 2024; originally announced January 2024.

Comments: 29 pages, 16 figures

arXiv:2401.00825 [pdf, other]

Sharp-NeRF: Grid-based Fast Deblurring Neural Radiance Fields Using Sharpness Prior

Authors: Byeonghyeon Lee, Howoong Lee, Usman Ali, Eunbyung Park

Abstract: Neural Radiance Fields (NeRF) have shown remarkable performance in neural rendering-based novel view synthesis. However, NeRF suffers from severe visual quality degradation when the input images have been captured under imperfect conditions, such as poor illumination, defocus blurring, and lens aberrations. Especially, defocus blur is quite common in the images when they are normally captured usin… ▽ More Neural Radiance Fields (NeRF) have shown remarkable performance in neural rendering-based novel view synthesis. However, NeRF suffers from severe visual quality degradation when the input images have been captured under imperfect conditions, such as poor illumination, defocus blurring, and lens aberrations. Especially, defocus blur is quite common in the images when they are normally captured using cameras. Although few recent studies have proposed to render sharp images of considerably high-quality, yet they still face many key challenges. In particular, those methods have employed a Multi-Layer Perceptron (MLP) based NeRF, which requires tremendous computational time. To overcome these shortcomings, this paper proposes a novel technique Sharp-NeRF -- a grid-based NeRF that renders clean and sharp images from the input blurry images within half an hour of training. To do so, we used several grid-based kernels to accurately model the sharpness/blurriness of the scene. The sharpness level of the pixels is computed to learn the spatially varying blur kernels. We have conducted experiments on the benchmarks consisting of blurry images and have evaluated full-reference and non-reference metrics. The qualitative and quantitative results have revealed that our approach renders the sharp novel views with vivid colors and fine details, and it has considerably faster training time than the previous works. Our project page is available at https://benhenryl.github.io/SharpNeRF/ △ Less

Submitted 1 January, 2024; originally announced January 2024.

Comments: Accepted to WACV 2024

arXiv:2312.07905 [pdf, ps, other]

Plant Disease Recognition Datasets in the Age of Deep Learning: Challenges and Opportunities

Authors: Mingle Xu, Ji Eun Park, Jaehwan Lee, Jucheng Yang, Sook Yoon

Abstract: Plant disease recognition has witnessed a significant improvement with deep learning in recent years. Although plant disease datasets are essential and many relevant datasets are public available, two fundamental questions exist. First, how to differentiate datasets and further choose suitable public datasets for specific applications? Second, what kinds of characteristics of datasets are desired… ▽ More Plant disease recognition has witnessed a significant improvement with deep learning in recent years. Although plant disease datasets are essential and many relevant datasets are public available, two fundamental questions exist. First, how to differentiate datasets and further choose suitable public datasets for specific applications? Second, what kinds of characteristics of datasets are desired to achieve promising performance in real-world applications? To address the questions, this study explicitly propose an informative taxonomy to describe potential plant disease datasets. We further provide several directions for future, such as creating challenge-oriented datasets and the ultimate objective deploying deep learning in real-world applications with satisfactory performance. In addition, existing related public RGB image datasets are summarized. We believe that this study will contributing making better datasets and that this study will contribute beyond plant disease recognition such as plant species recognition. To facilitate the community, our project is public https://github.com/xml94/PPDRD with the information of relevant public datasets. △ Less

Submitted 13 December, 2023; originally announced December 2023.

Comments: Submission v1 to a journal

arXiv:2312.03517 [pdf, other]

FRDiff : Feature Reuse for Universal Training-free Acceleration of Diffusion Models

Authors: Junhyuk So, Jungwon Lee, Eunhyeok Park

Abstract: The substantial computational costs of diffusion models, especially due to the repeated denoising steps necessary for high-quality image generation, present a major obstacle to their widespread adoption. While several studies have attempted to address this issue by reducing the number of score function evaluations (NFE) using advanced ODE solvers without fine-tuning, the decreased number of denois… ▽ More The substantial computational costs of diffusion models, especially due to the repeated denoising steps necessary for high-quality image generation, present a major obstacle to their widespread adoption. While several studies have attempted to address this issue by reducing the number of score function evaluations (NFE) using advanced ODE solvers without fine-tuning, the decreased number of denoising iterations misses the opportunity to update fine details, resulting in noticeable quality degradation. In our work, we introduce an advanced acceleration technique that leverages the temporal redundancy inherent in diffusion models. Reusing feature maps with high temporal similarity opens up a new opportunity to save computation resources without compromising output quality. To realize the practical benefits of this intuition, we conduct an extensive analysis and propose a novel method, FRDiff. FRDiff is designed to harness the advantages of both reduced NFE and feature reuse, achieving a Pareto frontier that balances fidelity and latency trade-offs in various generative tasks. △ Less

Submitted 2 April, 2024; v1 submitted 6 December, 2023; originally announced December 2023.

Comments: Work in progress. Project page : https://jungwon-lee.github.io/Project_FRDiff/

arXiv:2311.14993 [pdf, other]

Coordinate-Aware Modulation for Neural Fields

Authors: Joo Chan Lee, Daniel Rho, Seungtae Nam, Jong Hwan Ko, Eunbyung Park

Abstract: Neural fields, mapping low-dimensional input coordinates to corresponding signals, have shown promising results in representing various signals. Numerous methodologies have been proposed, and techniques employing MLPs and grid representations have achieved substantial success. MLPs allow compact and high expressibility, yet often suffer from spectral bias and slow convergence speed. On the other h… ▽ More Neural fields, mapping low-dimensional input coordinates to corresponding signals, have shown promising results in representing various signals. Numerous methodologies have been proposed, and techniques employing MLPs and grid representations have achieved substantial success. MLPs allow compact and high expressibility, yet often suffer from spectral bias and slow convergence speed. On the other hand, methods using grids are free from spectral bias and achieve fast training speed, however, at the expense of high spatial complexity. In this work, we propose a novel way for exploiting both MLPs and grid representations in neural fields. Unlike the prevalent methods that combine them sequentially (extract features from the grids first and feed them to the MLP), we inject spectral bias-free grid representations into the intermediate features in the MLP. More specifically, we suggest a Coordinate-Aware Modulation (CAM), which modulates the intermediate features using scale and shift parameters extracted from the grid representations. This can maintain the strengths of MLPs while mitigating any remaining potential biases, facilitating the rapid learning of high-frequency components. In addition, we empirically found that the feature normalizations, which have not been successful in neural filed literature, proved to be effective when applied in conjunction with the proposed CAM. Experimental results demonstrate that CAM enhances the performance of neural representation and improves learning stability across a range of signals. Especially in the novel view synthesis task, we achieved state-of-the-art performance with the least number of parameters and fast training speed for dynamic scenes and the best performance under 1MB memory for static scenes. CAM also outperforms the best-performing video compression methods using neural fields by a large margin. △ Less

Submitted 25 November, 2023; originally announced November 2023.

Comments: Project page: http://maincold2.github.io/cam/

arXiv:2311.13681 [pdf, other]

Compact 3D Gaussian Representation for Radiance Field

Authors: Joo Chan Lee, Daniel Rho, Xiangyu Sun, Jong Hwan Ko, Eunbyung Park

Abstract: Neural Radiance Fields (NeRFs) have demonstrated remarkable potential in capturing complex 3D scenes with high fidelity. However, one persistent challenge that hinders the widespread adoption of NeRFs is the computational bottleneck due to the volumetric rendering. On the other hand, 3D Gaussian splatting (3DGS) has recently emerged as an alternative representation that leverages a 3D Gaussisan-ba… ▽ More Neural Radiance Fields (NeRFs) have demonstrated remarkable potential in capturing complex 3D scenes with high fidelity. However, one persistent challenge that hinders the widespread adoption of NeRFs is the computational bottleneck due to the volumetric rendering. On the other hand, 3D Gaussian splatting (3DGS) has recently emerged as an alternative representation that leverages a 3D Gaussisan-based representation and adopts the rasterization pipeline to render the images rather than volumetric rendering, achieving very fast rendering speed and promising image quality. However, a significant drawback arises as 3DGS entails a substantial number of 3D Gaussians to maintain the high fidelity of the rendered images, which requires a large amount of memory and storage. To address this critical issue, we place a specific emphasis on two key objectives: reducing the number of Gaussian points without sacrificing performance and compressing the Gaussian attributes, such as view-dependent color and covariance. To this end, we propose a learnable mask strategy that significantly reduces the number of Gaussians while preserving high performance. In addition, we propose a compact but effective representation of view-dependent color by employing a grid-based neural field rather than relying on spherical harmonics. Finally, we learn codebooks to compactly represent the geometric attributes of Gaussian by vector quantization. With model compression techniques such as quantization and entropy coding, we consistently show over 25$\times$ reduced storage and enhanced rendering speed, while maintaining the quality of the scene representation, compared to 3DGS. Our work provides a comprehensive framework for 3D scene representation, achieving high performance, fast training, compactness, and real-time rendering. Our project page is available at https://maincold2.github.io/c3dgs/. △ Less

Submitted 15 February, 2024; v1 submitted 22 November, 2023; originally announced November 2023.

Comments: Project page: http://maincold2.github.io/c3dgs/

arXiv:2310.13123 [pdf, other]

doi 10.1016/j.oceaneng.2023.115271

Fuel Consumption Prediction for a Passenger Ferry using Machine Learning and In-service Data: A Comparative Study

Authors: Pedram Agand, Allison Kennedy, Trevor Harris, Chanwoo Bae, Mo Chen, Edward J Park

Abstract: As the importance of eco-friendly transportation increases, providing an efficient approach for marine vessel operation is essential. Methods for status monitoring with consideration to the weather condition and forecasting with the use of in-service data from ships requires accurate and complete models for predicting the energy efficiency of a ship. The models need to effectively process all the… ▽ More As the importance of eco-friendly transportation increases, providing an efficient approach for marine vessel operation is essential. Methods for status monitoring with consideration to the weather condition and forecasting with the use of in-service data from ships requires accurate and complete models for predicting the energy efficiency of a ship. The models need to effectively process all the operational data in real-time. This paper presents models that can predict fuel consumption using in-service data collected from a passenger ship. Statistical and domain-knowledge methods were used to select the proper input variables for the models. These methods prevent over-fitting, missing data, and multicollinearity while providing practical applicability. Prediction models that were investigated include multiple linear regression (MLR), decision tree approach (DT), an artificial neural network (ANN), and ensemble methods. The best predictive performance was from a model developed using the XGboost technique which is a boosting ensemble approach. \rvv{Our code is available on GitHub at \url{https://github.com/pagand/model_optimze_vessel/tree/OE} for future research. △ Less

Submitted 23 October, 2023; v1 submitted 19 October, 2023; originally announced October 2023.

Comments: 20 pages, 11 figures, 7 tables

Journal ref: Ocean Engineering 284 (2023): 115271

arXiv:2309.16884 [pdf, other]

An MCTS-DRL Based Obstacle and Occlusion Avoidance Methodology in Robotic Follow-Ahead Applications

Authors: Sahar Leisiazar, Edward J. Park, Angelica Lim, Mo Chen

Abstract: We propose a novel methodology for robotic follow-ahead applications that address the critical challenge of obstacle and occlusion avoidance. Our approach effectively navigates the robot while ensuring avoidance of collisions and occlusions caused by surrounding objects. To achieve this, we developed a high-level decision-making algorithm that generates short-term navigational goals for the mobile… ▽ More We propose a novel methodology for robotic follow-ahead applications that address the critical challenge of obstacle and occlusion avoidance. Our approach effectively navigates the robot while ensuring avoidance of collisions and occlusions caused by surrounding objects. To achieve this, we developed a high-level decision-making algorithm that generates short-term navigational goals for the mobile robot. Monte Carlo Tree Search is integrated with a Deep Reinforcement Learning method to enhance the performance of the decision-making process and generate more reliable navigational goals. Through extensive experimentation and analysis, we demonstrate the effectiveness and superiority of our proposed approach in comparison to the existing follow-ahead human-following robotic methods. Our code is available at https://github.com/saharLeisiazar/follow-ahead-ros. △ Less

Submitted 28 September, 2023; originally announced September 2023.

arXiv:2309.06922 [pdf, other]

Hydra: Multi-head Low-rank Adaptation for Parameter Efficient Fine-tuning

Authors: Sanghyeon Kim, Hyunmo Yang, Younghyun Kim, Youngjoon Hong, Eunbyung Park

Abstract: The recent surge in large-scale foundation models has spurred the development of efficient methods for adapting these models to various downstream tasks. Low-rank adaptation methods, such as LoRA, have gained significant attention due to their outstanding parameter efficiency and no additional inference latency. This paper investigates a more general form of adapter module based on the analysis th… ▽ More The recent surge in large-scale foundation models has spurred the development of efficient methods for adapting these models to various downstream tasks. Low-rank adaptation methods, such as LoRA, have gained significant attention due to their outstanding parameter efficiency and no additional inference latency. This paper investigates a more general form of adapter module based on the analysis that parallel and sequential adaptation branches learn novel and general features during fine-tuning, respectively. The proposed method, named Hydra, due to its multi-head computational branches, combines parallel and sequential branch to integrate capabilities, which is more expressive than existing single branch methods and enables the exploration of a broader range of optimal points in the fine-tuning process. In addition, the proposed adaptation method explicitly leverages the pre-trained weights by performing a linear combination of the pre-trained features. It allows the learned features to have better generalization performance across diverse downstream tasks. Furthermore, we perform a comprehensive analysis of the characteristics of each adaptation branch with empirical evidence. Through an extensive range of experiments, encompassing comparisons and ablation studies, we substantiate the efficiency and demonstrate the superior performance of Hydra. This comprehensive evaluation underscores the potential impact and effectiveness of Hydra in a variety of applications. Our code is available on \url{https://github.com/extremebird/Hydra} △ Less

Submitted 13 September, 2023; originally announced September 2023.

arXiv:2307.09455 [pdf, other]

Pseudo Outlier Exposure for Out-of-Distribution Detection using Pretrained Transformers

Authors: Jaeyoung Kim, Kyuheon Jung, Dongbin Na, Sion Jang, Eunbin Park, Sungchul Choi

Abstract: For real-world language applications, detecting an out-of-distribution (OOD) sample is helpful to alert users or reject such unreliable samples. However, modern over-parameterized language models often produce overconfident predictions for both in-distribution (ID) and OOD samples. In particular, language models suffer from OOD samples with a similar semantic representation to ID samples since the… ▽ More For real-world language applications, detecting an out-of-distribution (OOD) sample is helpful to alert users or reject such unreliable samples. However, modern over-parameterized language models often produce overconfident predictions for both in-distribution (ID) and OOD samples. In particular, language models suffer from OOD samples with a similar semantic representation to ID samples since these OOD samples lie near the ID manifold. A rejection network can be trained with ID and diverse outlier samples to detect test OOD samples, but explicitly collecting auxiliary OOD datasets brings an additional burden for data collection. In this paper, we propose a simple but effective method called Pseudo Outlier Exposure (POE) that constructs a surrogate OOD dataset by sequentially masking tokens related to ID classes. The surrogate OOD sample introduced by POE shows a similar representation to ID data, which is most effective in training a rejection network. Our method does not require any external OOD data and can be easily implemented within off-the-shelf Transformers. A comprehensive comparison with state-of-the-art algorithms demonstrates POE's competitiveness on several text classification benchmarks. △ Less

Submitted 19 July, 2023; v1 submitted 18 July, 2023; originally announced July 2023.

Comments: 12 pages, 2 figures

MSC Class: 68T50

Journal ref: Findings of the Association for Computational Linguistics: ACL 2023 (2023) 1469-1482

arXiv:2306.15969 [pdf, other]

Separable Physics-Informed Neural Networks

Authors: Junwoo Cho, Seungtae Nam, Hyunmo Yang, Seok-Bae Yun, Youngjoon Hong, Eunbyung Park

Abstract: Physics-informed neural networks (PINNs) have recently emerged as promising data-driven PDE solvers showing encouraging results on various PDEs. However, there is a fundamental limitation of training PINNs to solve multi-dimensional PDEs and approximate highly complex solution functions. The number of training points (collocation points) required on these challenging PDEs grows substantially, but… ▽ More Physics-informed neural networks (PINNs) have recently emerged as promising data-driven PDE solvers showing encouraging results on various PDEs. However, there is a fundamental limitation of training PINNs to solve multi-dimensional PDEs and approximate highly complex solution functions. The number of training points (collocation points) required on these challenging PDEs grows substantially, but it is severely limited due to the expensive computational costs and heavy memory overhead. To overcome this issue, we propose a network architecture and training algorithm for PINNs. The proposed method, separable PINN (SPINN), operates on a per-axis basis to significantly reduce the number of network propagations in multi-dimensional PDEs unlike point-wise processing in conventional PINNs. We also propose using forward-mode automatic differentiation to reduce the computational cost of computing PDE residuals, enabling a large number of collocation points (>10^7) on a single commodity GPU. The experimental results show drastically reduced computational costs (62x in wall-clock time, 1,394x in FLOPs given the same number of collocation points) in multi-dimensional PDEs while achieving better accuracy. Furthermore, we present that SPINN can solve a chaotic (2+1)-d Navier-Stokes equation significantly faster than the best-performing prior method (9 minutes vs 10 hours in a single GPU), maintaining accuracy. Finally, we showcase that SPINN can accurately obtain the solution of a highly nonlinear and multi-dimensional PDE, a (3+1)-d Navier-Stokes equation. For visualized results and code, please see https://jwcho5576.github.io/spinn.github.io/. △ Less

Submitted 31 October, 2023; v1 submitted 28 June, 2023; originally announced June 2023.

Comments: To appear in NeurIPS 2023 (28 pages, 13 figures). workshop paper: arXiv:2211.08761

arXiv:2306.02316 [pdf, other]

Temporal Dynamic Quantization for Diffusion Models

Authors: Junhyuk So, Jungwon Lee, Daehyun Ahn, Hyungjun Kim, Eunhyeok Park

Abstract: The diffusion model has gained popularity in vision applications due to its remarkable generative performance and versatility. However, high storage and computation demands, resulting from the model size and iterative generation, hinder its use on mobile devices. Existing quantization techniques struggle to maintain performance even in 8-bit precision due to the diffusion model's unique property o… ▽ More The diffusion model has gained popularity in vision applications due to its remarkable generative performance and versatility. However, high storage and computation demands, resulting from the model size and iterative generation, hinder its use on mobile devices. Existing quantization techniques struggle to maintain performance even in 8-bit precision due to the diffusion model's unique property of temporal variation in activation. We introduce a novel quantization method that dynamically adjusts the quantization interval based on time step information, significantly improving output quality. Unlike conventional dynamic quantization techniques, our approach has no computational overhead during inference and is compatible with both post-training quantization (PTQ) and quantization-aware training (QAT). Our extensive experiments demonstrate substantial improvements in output quality with the quantized diffusion model across various datasets. △ Less

Submitted 11 December, 2023; v1 submitted 4 June, 2023; originally announced June 2023.

arXiv:2306.02272 [pdf, other]

OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Models

Authors: Changhun Lee, Jungyu Jin, Taesu Kim, Hyungjun Kim, Eunhyeok Park

Abstract: Large language models (LLMs) with hundreds of billions of parameters require powerful server-grade GPUs for inference, limiting their practical deployment. To address this challenge, we introduce the outlier-aware weight quantization (OWQ) method, which aims to minimize LLM's footprint through low-precision representation. OWQ prioritizes a small subset of structured weights sensitive to quantizat… ▽ More Large language models (LLMs) with hundreds of billions of parameters require powerful server-grade GPUs for inference, limiting their practical deployment. To address this challenge, we introduce the outlier-aware weight quantization (OWQ) method, which aims to minimize LLM's footprint through low-precision representation. OWQ prioritizes a small subset of structured weights sensitive to quantization, storing them in high-precision, while applying highly tuned quantization to the remaining dense weights. This sensitivity-aware mixed-precision scheme reduces the quantization error notably, and extensive experiments demonstrate that 3.1-bit models using OWQ perform comparably to 4-bit models optimized by OPTQ. Furthermore, OWQ incorporates a parameter-efficient fine-tuning for task-specific adaptation, called weak column tuning (WCT), enabling accurate task-specific LLM adaptation with minimal memory overhead in the optimized format. OWQ represents a notable advancement in the flexibility, efficiency, and practicality of LLM optimization literature. The source code is available at https://github.com/xvyaward/owq △ Less

Submitted 23 January, 2024; v1 submitted 4 June, 2023; originally announced June 2023.

Comments: Accepted at AAAI 2024 (oral presentation)

arXiv:2304.02330 [pdf, other]

SMPConv: Self-moving Point Representations for Continuous Convolution

Authors: Sanghyeon Kim, Eunbyung Park

Abstract: Continuous convolution has recently gained prominence due to its ability to handle irregularly sampled data and model long-term dependency. Also, the promising experimental results of using large convolutional kernels have catalyzed the development of continuous convolution since they can construct large kernels very efficiently. Leveraging neural networks, more specifically multilayer perceptrons… ▽ More Continuous convolution has recently gained prominence due to its ability to handle irregularly sampled data and model long-term dependency. Also, the promising experimental results of using large convolutional kernels have catalyzed the development of continuous convolution since they can construct large kernels very efficiently. Leveraging neural networks, more specifically multilayer perceptrons (MLPs), is by far the most prevalent approach to implementing continuous convolution. However, there are a few drawbacks, such as high computational costs, complex hyperparameter tuning, and limited descriptive power of filters. This paper suggests an alternative approach to building a continuous convolution without neural networks, resulting in more computationally efficient and improved performance. We present self-moving point representations where weight parameters freely move, and interpolation schemes are used to implement continuous functions. When applied to construct convolutional kernels, the experimental results have shown improved performance with drop-in replacement in the existing frameworks. Due to its lightweight structure, we are first to demonstrate the effectiveness of continuous convolution in a large-scale setting, e.g., ImageNet, presenting the improvements over the prior arts. Our code is available on https://github.com/sangnekim/SMPConv △ Less

Submitted 5 April, 2023; originally announced April 2023.

Comments: Accepted to CVPR 2023

arXiv:2303.15699 [pdf, other]

Enhancing Breast Cancer Risk Prediction by Incorporating Prior Images

Authors: Hyeonsoo Lee, Junha Kim, Eunkyung Park, Minjeong Kim, Taesoo Kim, Thijs Kooi

Abstract: Recently, deep learning models have shown the potential to predict breast cancer risk and enable targeted screening strategies, but current models do not consider the change in the breast over time. In this paper, we present a new method, PRIME+, for breast cancer risk prediction that leverages prior mammograms using a transformer decoder, outperforming a state-of-the-art risk prediction method th… ▽ More Recently, deep learning models have shown the potential to predict breast cancer risk and enable targeted screening strategies, but current models do not consider the change in the breast over time. In this paper, we present a new method, PRIME+, for breast cancer risk prediction that leverages prior mammograms using a transformer decoder, outperforming a state-of-the-art risk prediction method that only uses mammograms from a single time point. We validate our approach on a dataset with 16,113 exams and further demonstrate that it effectively captures patterns of changes from prior mammograms, such as changes in breast density, resulting in improved short-term and long-term breast cancer risk prediction. Experimental results show that our model achieves a statistically significant improvement in performance over the state-of-the-art based model, with a C-index increase from 0.68 to 0.73 (p < 0.05) on held-out test sets. △ Less

Submitted 28 August, 2023; v1 submitted 27 March, 2023; originally announced March 2023.

Comments: MICCAI 2023 accepted

arXiv:2212.12294 [pdf, other]

doi 10.1145/3581783.3612444

FFNeRV: Flow-Guided Frame-Wise Neural Representations for Videos

Authors: Joo Chan Lee, Daniel Rho, Jong Hwan Ko, Eunbyung Park

Abstract: Neural fields, also known as coordinate-based or implicit neural representations, have shown a remarkable capability of representing, generating, and manipulating various forms of signals. For video representations, however, mapping pixel-wise coordinates to RGB colors has shown relatively low compression performance and slow convergence and inference speed. Frame-wise video representation, which… ▽ More Neural fields, also known as coordinate-based or implicit neural representations, have shown a remarkable capability of representing, generating, and manipulating various forms of signals. For video representations, however, mapping pixel-wise coordinates to RGB colors has shown relatively low compression performance and slow convergence and inference speed. Frame-wise video representation, which maps a temporal coordinate to its entire frame, has recently emerged as an alternative method to represent videos, improving compression rates and encoding speed. While promising, it has still failed to reach the performance of state-of-the-art video compression algorithms. In this work, we propose FFNeRV, a novel method for incorporating flow information into frame-wise representations to exploit the temporal redundancy across the frames in videos inspired by the standard video codecs. Furthermore, we introduce a fully convolutional architecture, enabled by one-dimensional temporal grids, improving the continuity of spatial features. Experimental results show that FFNeRV yields the best performance for video compression and frame interpolation among the methods using frame-wise representations or neural fields. To reduce the model size even further, we devise a more compact convolutional architecture using the group and pointwise convolutions. With model compression techniques, including quantization-aware training and entropy coding, FFNeRV outperforms widely-used standard video codecs (H.264 and HEVC) and performs on par with state-of-the-art video compression algorithms. △ Less

Submitted 6 August, 2023; v1 submitted 23 December, 2022; originally announced December 2022.

Comments: Our project page including code is available at https://maincold2.github.io/ffnerv/

arXiv:2212.09069 [pdf, other]

Masked Wavelet Representation for Compact Neural Radiance Fields

Authors: Daniel Rho, Byeonghyeon Lee, Seungtae Nam, Joo Chan Lee, Jong Hwan Ko, Eunbyung Park

Abstract: Neural radiance fields (NeRF) have demonstrated the potential of coordinate-based neural representation (neural fields or implicit neural representation) in neural rendering. However, using a multi-layer perceptron (MLP) to represent a 3D scene or object requires enormous computational resources and time. There have been recent studies on how to reduce these computational inefficiencies by using a… ▽ More Neural radiance fields (NeRF) have demonstrated the potential of coordinate-based neural representation (neural fields or implicit neural representation) in neural rendering. However, using a multi-layer perceptron (MLP) to represent a 3D scene or object requires enormous computational resources and time. There have been recent studies on how to reduce these computational inefficiencies by using additional data structures, such as grids or trees. Despite the promising performance, the explicit data structure necessitates a substantial amount of memory. In this work, we present a method to reduce the size without compromising the advantages of having additional data structures. In detail, we propose using the wavelet transform on grid-based neural fields. Grid-based neural fields are for fast convergence, and the wavelet transform, whose efficiency has been demonstrated in high-performance standard codecs, is to improve the parameter efficiency of grids. Furthermore, in order to achieve a higher sparsity of grid coefficients while maintaining reconstruction quality, we present a novel trainable masking approach. Experimental results demonstrate that non-spatial grid coefficients, such as wavelet coefficients, are capable of attaining a higher level of sparsity than spatial grid coefficients, resulting in a more compact representation. With our proposed mask and compression pipeline, we achieved state-of-the-art performance within a memory budget of 2 MB. Our code is available at https://github.com/daniel03c1/masked_wavelet_nerf. △ Less

Submitted 21 March, 2023; v1 submitted 18 December, 2022; originally announced December 2022.

Comments: Accepted to CVPR 2023

arXiv:2212.05081 [pdf, other]

doi 10.1088/2632-2153/ad12e3

FAIR AI Models in High Energy Physics

Authors: Javier Duarte, Haoyang Li, Avik Roy, Ruike Zhu, E. A. Huerta, Daniel Diaz, Philip Harris, Raghav Kansal, Daniel S. Katz, Ishaan H. Kavoori, Volodymyr V. Kindratenko, Farouk Mokhtar, Mark S. Neubauer, Sang Eon Park, Melissa Quinnan, Roger Rusack, Zhizhen Zhao

Abstract: The findable, accessible, interoperable, and reusable (FAIR) data principles provide a framework for examining, evaluating, and improving how data is shared to facilitate scientific discovery. Generalizing these principles to research software and other digital products is an active area of research. Machine learning (ML) models -- algorithms that have been trained on data without being explicitly… ▽ More The findable, accessible, interoperable, and reusable (FAIR) data principles provide a framework for examining, evaluating, and improving how data is shared to facilitate scientific discovery. Generalizing these principles to research software and other digital products is an active area of research. Machine learning (ML) models -- algorithms that have been trained on data without being explicitly programmed -- and more generally, artificial intelligence (AI) models, are an important target for this because of the ever-increasing pace with which AI is transforming scientific domains, such as experimental high energy physics (HEP). In this paper, we propose a practical definition of FAIR principles for AI models in HEP and describe a template for the application of these principles. We demonstrate the template's use with an example AI model applied to HEP, in which a graph neural network is used to identify Higgs bosons decaying to two bottom quarks. We report on the robustness of this FAIR AI model, its portability across hardware architectures and software frameworks, and its interpretability. △ Less

Submitted 29 December, 2023; v1 submitted 9 December, 2022; originally announced December 2022.

Comments: 34 pages, 9 figures, 10 tables

Journal ref: Mach. Learn.: Sci. Technol. 4 (2023) 045062

arXiv:2211.08761 [pdf, other]

Separable PINN: Mitigating the Curse of Dimensionality in Physics-Informed Neural Networks

Authors: Junwoo Cho, Seungtae Nam, Hyunmo Yang, Seok-Bae Yun, Youngjoon Hong, Eunbyung Park

Abstract: Physics-informed neural networks (PINNs) have emerged as new data-driven PDE solvers for both forward and inverse problems. While promising, the expensive computational costs to obtain solutions often restrict their broader applicability. We demonstrate that the computations in automatic differentiation (AD) can be significantly reduced by leveraging forward-mode AD when training PINN. However, a… ▽ More Physics-informed neural networks (PINNs) have emerged as new data-driven PDE solvers for both forward and inverse problems. While promising, the expensive computational costs to obtain solutions often restrict their broader applicability. We demonstrate that the computations in automatic differentiation (AD) can be significantly reduced by leveraging forward-mode AD when training PINN. However, a naive application of forward-mode AD to conventional PINNs results in higher computation, losing its practical benefit. Therefore, we propose a network architecture, called separable PINN (SPINN), which can facilitate forward-mode AD for more efficient computation. SPINN operates on a per-axis basis instead of point-wise processing in conventional PINNs, decreasing the number of network forward passes. Besides, while the computation and memory costs of standard PINNs grow exponentially along with the grid resolution, that of our model is remarkably less susceptible, mitigating the curse of dimensionality. We demonstrate the effectiveness of our model in various PDE systems by significantly reducing the training run-time while achieving comparable accuracy. Project page: https://jwcho5576.github.io/spinn/ △ Less

Submitted 2 November, 2023; v1 submitted 16 November, 2022; originally announced November 2022.

Comments: To appear in NeurIPS 2022 Workshop on The Symbiosis of Deep Learning and Differential Equations (DLDE) - II, 12 pages, 5 figures, full paper: arXiv:2306.15969

arXiv:2208.05484 [pdf, other]

doi 10.1007/JHEP07(2023)108

Neural Embedding: Learning the Embedding of the Manifold of Physics Data

Authors: Sang Eon Park, Philip Harris, Bryan Ostdiek

Abstract: In this paper, we present a method of embedding physics data manifolds with metric structure into lower dimensional spaces with simpler metrics, such as Euclidean and Hyperbolic spaces. We then demonstrate that it can be a powerful step in the data analysis pipeline for many applications. Using progressively more realistic simulated collisions at the Large Hadron Collider, we show that this embedd… ▽ More In this paper, we present a method of embedding physics data manifolds with metric structure into lower dimensional spaces with simpler metrics, such as Euclidean and Hyperbolic spaces. We then demonstrate that it can be a powerful step in the data analysis pipeline for many applications. Using progressively more realistic simulated collisions at the Large Hadron Collider, we show that this embedding approach learns the underlying latent structure. With the notion of volume in Euclidean spaces, we provide for the first time a viable solution to quantifying the true search capability of model agnostic search algorithms in collider physics (i.e. anomaly detection). Finally, we discuss how the ideas presented in this paper can be employed to solve many practical challenges that require the extraction of physically meaningful representations from information in complex high dimensional datasets. △ Less

Submitted 14 August, 2022; v1 submitted 10 August, 2022; originally announced August 2022.

arXiv:2208.00338 [pdf, other]

Symmetry Regularization and Saturating Nonlinearity for Robust Quantization

Authors: Sein Park, Yeongsang Jang, Eunhyeok Park

Abstract: Robust quantization improves the tolerance of networks for various implementations, allowing reliable output in different bit-widths or fragmented low-precision arithmetic. In this work, we perform extensive analyses to identify the sources of quantization error and present three insights to robustify a network against quantization: reduction of error propagation, range clamping for error minimiza… ▽ More Robust quantization improves the tolerance of networks for various implementations, allowing reliable output in different bit-widths or fragmented low-precision arithmetic. In this work, we perform extensive analyses to identify the sources of quantization error and present three insights to robustify a network against quantization: reduction of error propagation, range clamping for error minimization, and inherited robustness against quantization. Based on these insights, we propose two novel methods called symmetry regularization (SymReg) and saturating nonlinearity (SatNL). Applying the proposed methods during training can enhance the robustness of arbitrary neural networks against quantization on existing post-training quantization (PTQ) and quantization-aware training (QAT) algorithms and enables us to obtain a single weight flexible enough to maintain the output quality under various conditions. We conduct extensive studies on CIFAR and ImageNet datasets and validate the effectiveness of the proposed methods. △ Less

Submitted 30 July, 2022; originally announced August 2022.

arXiv:2207.12800 [pdf, other]

PIXEL: Physics-Informed Cell Representations for Fast and Accurate PDE Solvers

Authors: Namgyu Kang, Byeonghyeon Lee, Youngjoon Hong, Seok-Bae Yun, Eunbyung Park

Abstract: With the increases in computational power and advances in machine learning, data-driven learning-based methods have gained significant attention in solving PDEs. Physics-informed neural networks (PINNs) have recently emerged and succeeded in various forward and inverse PDE problems thanks to their excellent properties, such as flexibility, mesh-free solutions, and unsupervised training. However, t… ▽ More With the increases in computational power and advances in machine learning, data-driven learning-based methods have gained significant attention in solving PDEs. Physics-informed neural networks (PINNs) have recently emerged and succeeded in various forward and inverse PDE problems thanks to their excellent properties, such as flexibility, mesh-free solutions, and unsupervised training. However, their slower convergence speed and relatively inaccurate solutions often limit their broader applicability in many science and engineering domains. This paper proposes a new kind of data-driven PDEs solver, physics-informed cell representations (PIXEL), elegantly combining classical numerical methods and learning-based approaches. We adopt a grid structure from the numerical methods to improve accuracy and convergence speed and overcome the spectral bias presented in PINNs. Moreover, the proposed method enjoys the same benefits in PINNs, e.g., using the same optimization frameworks to solve both forward and inverse PDE problems and readily enforcing PDE constraints with modern automatic differentiation techniques. We provide experimental results on various challenging PDEs that the original PINNs have struggled with and show that PIXEL achieves fast convergence speed and high accuracy. Project page: https://namgyukang.github.io/PIXEL/ △ Less

Submitted 4 February, 2023; v1 submitted 26 July, 2022; originally announced July 2022.

Comments: Accepted to the 37th AAAI Conference on Artificial Intelligence (AAAI 2023) Main Track, DLDE-II NeurIPS 2022 Workshop (Spotlight), https://namgyukang.github.io/PIXEL/ (Project Page), https://github.com/NamGyuKang/CosineSampler (Customized CUDA) (17 pages, 17 figures)

arXiv:2207.09663 [pdf, other]

Streamable Neural Fields

Authors: Junwoo Cho, Seungtae Nam, Daniel Rho, Jong Hwan Ko, Eunbyung Park

Abstract: Neural fields have emerged as a new data representation paradigm and have shown remarkable success in various signal representations. Since they preserve signals in their network parameters, the data transfer by sending and receiving the entire model parameters prevents this emerging technology from being used in many practical scenarios. We propose streamable neural fields, a single model that co… ▽ More Neural fields have emerged as a new data representation paradigm and have shown remarkable success in various signal representations. Since they preserve signals in their network parameters, the data transfer by sending and receiving the entire model parameters prevents this emerging technology from being used in many practical scenarios. We propose streamable neural fields, a single model that consists of executable sub-networks of various widths. The proposed architectural and training techniques enable a single network to be streamable over time and reconstruct different qualities and parts of signals. For example, a smaller sub-network produces smooth and low-frequency signals, while a larger sub-network can represent fine details. Experimental results have shown the effectiveness of our method in various domains, such as 2D images, videos, and 3D signed distance functions. Finally, we demonstrate that our proposed method improves training stability, by exploiting parameter sharing. △ Less

Submitted 20 July, 2022; originally announced July 2022.

Comments: To appear in ECCV 2022

arXiv:2207.07597 [pdf, other]

OASYS: Domain-Agnostic Automated System for Constructing Knowledge Base from Unstructured Text

Authors: Minsang Kim, Sang-hyun Je, Eunjoo Park

Abstract: In recent years, creating and managing knowledge bases have become crucial to the retail product and enterprise domains. We present an automatic knowledge base construction system that mines data from documents. This system can generate training data during the training process without human intervention. Therefore, it is domain-agnostic trainable using only the target domain text corpus and a pre… ▽ More In recent years, creating and managing knowledge bases have become crucial to the retail product and enterprise domains. We present an automatic knowledge base construction system that mines data from documents. This system can generate training data during the training process without human intervention. Therefore, it is domain-agnostic trainable using only the target domain text corpus and a pre-defined knowledge base. This system is called OASYS and is the first system built with the Korean language in mind. In addition, we also have constructed a new human-annotated benchmark dataset of the Korean Wikipedia corpus paired with a Korean DBpedia to aid system evaluation. The system performance results on human-annotated benchmark test dataset are meaningful and show that the generated knowledge base from OASYS trained on only auto-generated data is useful. We provide both a human-annotated test dataset and an auto-generated dataset. △ Less

Submitted 29 June, 2022; originally announced July 2022.

Comments: ACM SIGKDD Workshop on Mining and Learning with Graphs 2022, Accepted

arXiv:2206.00820 [pdf, other]

NIPQ: Noise proxy-based Integrated Pseudo-Quantization

Authors: Juncheol Shin, Junhyuk So, Sein Park, Seungyeop Kang, Sungjoo Yoo, Eunhyeok Park

Abstract: Straight-through estimator (STE), which enables the gradient flow over the non-differentiable function via approximation, has been favored in studies related to quantization-aware training (QAT). However, STE incurs unstable convergence during QAT, resulting in notable quality degradation in low precision. Recently, pseudoquantization training has been proposed as an alternative approach to updati… ▽ More Straight-through estimator (STE), which enables the gradient flow over the non-differentiable function via approximation, has been favored in studies related to quantization-aware training (QAT). However, STE incurs unstable convergence during QAT, resulting in notable quality degradation in low precision. Recently, pseudoquantization training has been proposed as an alternative approach to updating the learnable parameters using the pseudo-quantization noise instead of STE. In this study, we propose a novel noise proxy-based integrated pseudoquantization (NIPQ) that enables unified support of pseudoquantization for both activation and weight by integrating the idea of truncation on the pseudo-quantization framework. NIPQ updates all of the quantization parameters (e.g., bit-width and truncation boundary) as well as the network parameters via gradient descent without STE instability. According to our extensive experiments, NIPQ outperforms existing quantization algorithms in various vision and language applications by a large margin. △ Less

Submitted 1 July, 2023; v1 submitted 1 June, 2022; originally announced June 2022.

arXiv:2205.11179 [pdf, other]

Online Hybrid Lightweight Representations Learning: Its Application to Visual Tracking

Authors: Ilchae Jung, Minji Kim, Eunhyeok Park, Bohyung Han

Abstract: This paper presents a novel hybrid representation learning framework for streaming data, where an image frame in a video is modeled by an ensemble of two distinct deep neural networks; one is a low-bit quantized network and the other is a lightweight full-precision network. The former learns coarse primary information with low cost while the latter conveys residual information for high fidelity to… ▽ More This paper presents a novel hybrid representation learning framework for streaming data, where an image frame in a video is modeled by an ensemble of two distinct deep neural networks; one is a low-bit quantized network and the other is a lightweight full-precision network. The former learns coarse primary information with low cost while the latter conveys residual information for high fidelity to original representations. The proposed parallel architecture is effective to maintain complementary information since fixed-point arithmetic can be utilized in the quantized network and the lightweight model provides precise representations given by a compact channel-pruned network. We incorporate the hybrid representation technique into an online visual tracking task, where deep neural networks need to handle temporal variations of target appearances in real-time. Compared to the state-of-the-art real-time trackers based on conventional deep neural networks, our tracking algorithm demonstrates competitive accuracy on the standard benchmarks with a small fraction of computational cost and memory footprint. △ Less

Submitted 23 May, 2022; originally announced May 2022.

Comments: 7 pages, 1 figure, accepted at IJCAI2022

arXiv:2204.10836 [pdf, other]

doi 10.1038/s41467-022-33407-5

Federated Learning Enables Big Data for Rare Cancer Boundary Detection

Authors: Sarthak Pati, Ujjwal Baid, Brandon Edwards, Micah Sheller, Shih-Han Wang, G Anthony Reina, Patrick Foley, Alexey Gruzdev, Deepthi Karkada, Christos Davatzikos, Chiharu Sako, Satyam Ghodasara, Michel Bilello, Suyash Mohan, Philipp Vollmuth, Gianluca Brugnara, Chandrakanth J Preetha, Felix Sahm, Klaus Maier-Hein, Maximilian Zenk, Martin Bendszus, Wolfgang Wick, Evan Calabrese, Jeffrey Rudie, Javier Villanueva-Meyer , et al. (254 additional authors not shown)

Abstract: Although machine learning (ML) has shown promise in numerous domains, there are concerns about generalizability to out-of-sample data. This is currently addressed by centrally sharing ample, and importantly diverse, data from multiple sites. However, such centralization is challenging to scale (or even not feasible) due to various limitations. Federated ML (FL) provides an alternative to train acc… ▽ More Although machine learning (ML) has shown promise in numerous domains, there are concerns about generalizability to out-of-sample data. This is currently addressed by centrally sharing ample, and importantly diverse, data from multiple sites. However, such centralization is challenging to scale (or even not feasible) due to various limitations. Federated ML (FL) provides an alternative to train accurate and generalizable ML models, by only sharing numerical model updates. Here we present findings from the largest FL study to-date, involving data from 71 healthcare institutions across 6 continents, to generate an automatic tumor boundary detector for the rare disease of glioblastoma, utilizing the largest dataset of such patients ever used in the literature (25,256 MRI scans from 6,314 patients). We demonstrate a 33% improvement over a publicly trained model to delineate the surgically targetable tumor, and 23% improvement over the tumor's entire extent. We anticipate our study to: 1) enable more studies in healthcare informed by large and diverse data, ensuring meaningful results for rare diseases and underrepresented populations, 2) facilitate further quantitative analyses for glioblastoma via performance optimization of our consensus model for eventual public release, and 3) demonstrate the effectiveness of FL at such scale and task complexity as a paradigm shift for multi-site collaborations, alleviating the need for data sharing. △ Less

Submitted 25 April, 2022; v1 submitted 22 April, 2022; originally announced April 2022.

Comments: federated learning, deep learning, convolutional neural network, segmentation, brain tumor, glioma, glioblastoma, FeTS, BraTS

arXiv:2204.07439 [pdf, other]

INSTA-BNN: Binary Neural Network with INSTAnce-aware Threshold

Authors: Changhun Lee, Hyungjun Kim, Eunhyeok Park, Jae-Joon Kim

Abstract: Binary Neural Networks (BNNs) have emerged as a promising solution for reducing the memory footprint and compute costs of deep neural networks, but they suffer from quality degradation due to the lack of freedom as activations and weights are constrained to the binary values. To compensate for the accuracy drop, we propose a novel BNN design called Binary Neural Network with INSTAnce-aware thresho… ▽ More Binary Neural Networks (BNNs) have emerged as a promising solution for reducing the memory footprint and compute costs of deep neural networks, but they suffer from quality degradation due to the lack of freedom as activations and weights are constrained to the binary values. To compensate for the accuracy drop, we propose a novel BNN design called Binary Neural Network with INSTAnce-aware threshold (INSTA-BNN), which controls the quantization threshold dynamically in an input-dependent or instance-aware manner. According to our observation, higher-order statistics can be a representative metric to estimate the characteristics of the input distribution. INSTA-BNN is designed to adjust the threshold dynamically considering various information, including higher-order statistics, but it is also optimized judiciously to realize minimal overhead on a real device. Our extensive study shows that INSTA-BNN outperforms the baseline by 3.0% and 2.8% on the ImageNet classification task with comparable computing cost, achieving 68.5% and 72.2% top-1 accuracy on ResNet-18 and MobileNetV1 based models, respectively. △ Less

Submitted 19 October, 2023; v1 submitted 15 April, 2022; originally announced April 2022.

Comments: Accepted at ICCV 2023

arXiv:2203.10996 [pdf, other]

Technologies for AI-Driven Fashion Social Networking Service with E-Commerce

Authors: Jinseok Seol, Seongjae Kim, Sungchan Park, Holim Lim, Hyunsoo Na, Eunyoung Park, Dohee Jung, Soyoung Park, Kangwoo Lee, Sang-goo Lee

Abstract: The rapid growth of the online fashion market brought demands for innovative fashion services and commerce platforms. With the recent success of deep learning, many applications employ AI technologies such as visual search and recommender systems to provide novel and beneficial services. In this paper, we describe applied technologies for AI-driven fashion social networking service that incorporat… ▽ More The rapid growth of the online fashion market brought demands for innovative fashion services and commerce platforms. With the recent success of deep learning, many applications employ AI technologies such as visual search and recommender systems to provide novel and beneficial services. In this paper, we describe applied technologies for AI-driven fashion social networking service that incorporate fashion e-commerce. In the application, people can share and browse their outfit-of-the-day (OOTD) photos, while AI analyzes them and suggests similar style OOTDs and related products. To this end, we trained deep learning based AI models for fashion and integrated them to build a fashion visual search system and a recommender system for OOTD. With aforementioned technologies, the AI-driven fashion SNS platform, iTOO, has been successfully launched. △ Less

Submitted 10 March, 2022; originally announced March 2022.

Comments: 16 pages, accepted in International Semantic Intelligence Conference (ISIC) 2022, The Applications and Deployment Track

arXiv:2203.08373 [pdf, other]

doi 10.36190/2022.66

Are you aware of what you are watching? Role of machine heuristic in online content recommendations

Authors: Soyoung Oh, Eunil Park

Abstract: Since recommender systems have been created and developed to automate the recommendation process, users can easily consume their desired video content on online platforms. In this line, several content recommendation algorithms are introduced and employed to allow users to encounter content of their interests, effectively. However, the recommendation systems sometimes regrettably recommend inappro… ▽ More Since recommender systems have been created and developed to automate the recommendation process, users can easily consume their desired video content on online platforms. In this line, several content recommendation algorithms are introduced and employed to allow users to encounter content of their interests, effectively. However, the recommendation systems sometimes regrettably recommend inappropriate content, including misinformation or fake news. To make it worse, people would unreservedly accept such content due to their cognitive heuristic, machine heuristic, which is the rule of thumb that machines are more accurate and trustworthy than humans. In this study, we designed and conducted a web-based experiment where the participants are invoked machine heuristic by experiencing the whole process of machine or human recommendation system. The results demonstrated that participants (N = 89) showed a more positive attitude toward a machine recommender than a human recommender, even the recommended videos contain inappropriate content. While participants who have a high level of trust in machines exhibited a negative attitude toward recommendations. Based on these results, we suggest that a phenomenon known as algorithm aversion might be simultaneously considered with machine heuristic in investigating human interaction with a machine. △ Less

Submitted 15 March, 2022; originally announced March 2022.

arXiv:2201.04329 [pdf, other]

Neural Residual Flow Fields for Efficient Video Representations

Authors: Daniel Rho, Junwoo Cho, Jong Hwan Ko, Eunbyung Park

Abstract: Neural fields have emerged as a powerful paradigm for representing various signals, including videos. However, research on improving the parameter efficiency of neural fields is still in its early stages. Even though neural fields that map coordinates to colors can be used to encode video signals, this scheme does not exploit the spatial and temporal redundancy of video signals. Inspired by standa… ▽ More Neural fields have emerged as a powerful paradigm for representing various signals, including videos. However, research on improving the parameter efficiency of neural fields is still in its early stages. Even though neural fields that map coordinates to colors can be used to encode video signals, this scheme does not exploit the spatial and temporal redundancy of video signals. Inspired by standard video compression algorithms, we propose a neural field architecture for representing and compressing videos that deliberately removes data redundancy through the use of motion information across video frames. Maintaining motion information, which is typically smoother and less complex than color signals, requires a far fewer number of parameters. Furthermore, reusing color values through motion information further improves the network parameter efficiency. In addition, we suggest using more than one reference frame for video frame reconstruction and separate networks, one for optical flows and the other for residuals. Experimental results have shown that the proposed method outperforms the baseline methods by a significant margin. The code is available in https://github.com/daniel03c1/eff_video_representation △ Less

Submitted 5 October, 2022; v1 submitted 12 January, 2022; originally announced January 2022.

Comments: Accepted for ACCV 2022, codes are available at https://github.com/daniel03c1/eff_video_representation

arXiv:2112.14370 [pdf, other]

On the Overlooked Significance of Underutilized Contextual Features in Recent News Recommendation Models

Authors: Sungmin Cho, Hongjun Lim, Keunchan Park, Sungjoo Yoo, Eunhyeok Park

Abstract: Personalized news recommendation aims to provide attractive articles for readers by predicting their likelihood of clicking on a certain article. To accurately predict this probability, plenty of studies have been proposed that actively utilize content features of articles, such as words, categories, or entities. However, we observed that the articles' contextual features, such as CTR (click-throu… ▽ More Personalized news recommendation aims to provide attractive articles for readers by predicting their likelihood of clicking on a certain article. To accurately predict this probability, plenty of studies have been proposed that actively utilize content features of articles, such as words, categories, or entities. However, we observed that the articles' contextual features, such as CTR (click-through-rate), popularity, or freshness, were either neglected or underutilized recently. To prove that this is the case, we conducted an extensive comparison between recent deep-learning models and naive contextual models that we devised and surprisingly discovered that the latter easily outperforms the former. Furthermore, our analysis showed that the recent tendency to apply overly sophisticated deep-learning operations to contextual features was actually hindering the recommendation performance. From this knowledge, we design a purposefully simple contextual module that can boost the previous news recommendation models by a large margin. △ Less

Submitted 28 December, 2021; originally announced December 2021.

arXiv:2111.03146 [pdf, other]

Generating Diverse Realistic Laughter for Interactive Art

Authors: M. Mehdi Afsar, Eric Park, Étienne Paquette, Gauthier Gidel, Kory W. Mathewson, Eilif Muller

Abstract: We propose an interactive art project to make those rendered invisible by the COVID-19 crisis and its concomitant solitude reappear through the welcome melody of laughter, and connections created and explored through advanced laughter synthesis approaches. However, the unconditional generation of the diversity of human emotional responses in high-quality auditory synthesis remains an open problem,… ▽ More We propose an interactive art project to make those rendered invisible by the COVID-19 crisis and its concomitant solitude reappear through the welcome melody of laughter, and connections created and explored through advanced laughter synthesis approaches. However, the unconditional generation of the diversity of human emotional responses in high-quality auditory synthesis remains an open problem, with important implications for the application of these approaches in artistic settings. We developed LaughGANter, an approach to reproduce the diversity of human laughter using generative adversarial networks (GANs). When trained on a dataset of diverse laughter samples, LaughGANter generates diverse, high quality laughter samples, and learns a latent space suitable for emotional analysis and novel artistic applications such as latent mixing/interpolation and emotional transfer. △ Less

Submitted 29 July, 2022; v1 submitted 4 November, 2021; originally announced November 2021.

Comments: Presented at Machine Learning for Creativity and Design workshop at NeurIPS 2021, 6 pages

Showing 1–50 of 91 results for author: Park, E