Showing 1–18 of 18 results for author: Zharkov, I

Searching in archive cs.
  1. arXiv:2407.01425

    cs.CV

    FORA: Fast-Forward Caching in Diffusion Transformer Acceleration

    Authors: Pratheba Selvaraju, Tianyu Ding, Tianyi Chen, Ilya Zharkov, Luming Liang

    Abstract: Diffusion transformers (DiT) have become the de facto choice for generating high-quality images and videos, largely due to their scalability, which enables the construction of larger models for enhanced performance. However, the increased size of these models leads to higher inference costs, making them less attractive for real-time applications. We present Fast-FORward CAching (FORA), a simple ye…

    Submitted 1 July, 2024; originally announced July 2024.

  2. arXiv:2404.08292

    cs.CV cs.GR

    AdaContour: Adaptive Contour Descriptor with Hierarchical Representation

    Authors: Tianyu Ding, Jinxin Zhou, Tianyi Chen, Zhihui Zhu, Ilya Zharkov, Luming Liang

    Abstract: Existing angle-based contour descriptors suffer from lossy representation for non-starconvex shapes. By and large, this is the result of the shape being registered with a single global inner center and a set of radii corresponding to a polar coordinate parameterization. In this paper, we propose AdaContour, an adaptive contour descriptor that uses multiple local representations to desirably charac…

    Submitted 12 April, 2024; originally announced April 2024.

  3. arXiv:2404.08111

    cs.CV cs.AI cs.CL

    S3Editor: A Sparse Semantic-Disentangled Self-Training Framework for Face Video Editing

    Authors: Guangzhi Wang, Tianyi Chen, Kamran Ghasedi, HsiangTao Wu, Tianyu Ding, Chris Nuesmeyer, Ilya Zharkov, Mohan Kankanhalli, Luming Liang

    Abstract: Face attribute editing plays a pivotal role in various applications. However, existing methods encounter challenges in achieving high-quality results while preserving identity, editing faithfulness, and temporal consistency. These challenges are rooted in issues related to the training pipeline, including limited supervision, architecture design, and optimization strategy. In this work, we introdu…

    Submitted 11 April, 2024; originally announced April 2024.

  4. arXiv:2312.09411

    cs.LG cs.AI cs.CL cs.CV

    OTOv3: Automatic Architecture-Agnostic Neural Network Training and Compression from Structured Pruning to Erasing Operators

    Authors: Tianyi Chen, Tianyu Ding, Zhihui Zhu, Zeyu Chen, HsiangTao Wu, Ilya Zharkov, Luming Liang

    Abstract: Compressing a predefined deep neural network (DNN) into a compact sub-network with competitive performance is crucial in the efficient machine learning realm. This topic spans various techniques, from structured pruning to neural architecture search, encompassing both pruning and erasing operators perspectives. Despite advancements, existing methods suffer from complex, multi-stage processes that…

    Submitted 14 December, 2023; originally announced December 2023.

    Comments: 39 pages. Due to the page dim limitation, the full appendix is attached here https://tinyurl.com/otov3appendix. Recommend to zoom-in for finer details. arXiv admin note: text overlap with arXiv:2305.18030

  5. arXiv:2312.00678

    cs.CL

    The Efficiency Spectrum of Large Language Models: An Algorithmic Survey

    Authors: Tianyu Ding, Tianyi Chen, Haidong Zhu, Jiachen Jiang, Yiqi Zhong, Jinxin Zhou, Guangzhi Wang, Zhihui Zhu, Ilya Zharkov, Luming Liang

    Abstract: The rapid growth of Large Language Models (LLMs) has been a driving force in transforming various domains, reshaping the artificial general intelligence landscape. However, the increasing computational and memory demands of these models present substantial challenges, hindering both academic research and practical applications. To address these issues, a wide array of methods, including both algor…

    Submitted 18 April, 2024; v1 submitted 1 December, 2023; originally announced December 2023.

  6. arXiv:2312.00210

    cs.CV cs.AI

    DREAM: Diffusion Rectification and Estimation-Adaptive Models

    Authors: Jinxin Zhou, Tianyu Ding, Tianyi Chen, Jiachen Jiang, Ilya Zharkov, Zhihui Zhu, Luming Liang

    Abstract: We present DREAM, a novel training framework representing Diffusion Rectification and Estimation Adaptive Models, requiring minimal code changes (just three lines) yet significantly enhancing the alignment of training with sampling in diffusion models. DREAM features two components: diffusion rectification, which adjusts training to reflect the sampling process, and estimation adaptation, which ba…

    Submitted 19 March, 2024; v1 submitted 30 November, 2023; originally announced December 2023.

    Comments: 16 pages, 22 figures, 5 tables; the first two authors contributed to this work equally

  7. arXiv:2311.15510

    cs.CV

    CaesarNeRF: Calibrated Semantic Representation for Few-shot Generalizable Neural Rendering

    Authors: Haidong Zhu, Tianyu Ding, Tianyi Chen, Ilya Zharkov, Ram Nevatia, Luming Liang

    Abstract: Generalizability and few-shot learning are key challenges in Neural Radiance Fields (NeRF), often due to the lack of a holistic understanding in pixel-level rendering. We introduce CaesarNeRF, an end-to-end approach that leverages scene-level CAlibratEd SemAntic Representation along with pixel-level representations to advance few-shot, generalizable neural rendering, facilitating a holistic unders…

    Submitted 9 July, 2024; v1 submitted 26 November, 2023; originally announced November 2023.

    Comments: Accepted to ECCV 2024. Project available at https://haidongz-usc.github.io/project/caesarnerf

  8. arXiv:2311.03770

    cs.CV

    Lightweight Portrait Matting via Regional Attention and Refinement

    Authors: Yatao Zhong, Ilya Zharkov

    Abstract: We present a lightweight model for high resolution portrait matting. The model does not use any auxiliary inputs such as trimaps or background captures and achieves real time performance for HD videos and near real time for 4K. Our model is built upon a two-stage framework with a low resolution network for coarse alpha estimation followed by a refinement network for local region improvement. Howev…

    Submitted 7 November, 2023; originally announced November 2023.

  9. arXiv:2310.18356

    cs.CL cs.AI cs.LG

    LoRAShear: Efficient Large Language Model Structured Pruning and Knowledge Recovery

    Authors: Tianyi Chen, Tianyu Ding, Badal Yadav, Ilya Zharkov, Luming Liang

    Abstract: Large Language Models (LLMs) have transformed the landscape of artificial intelligence, while their enormous size presents significant challenges in terms of computational costs. We introduce LoRAShear, a novel efficient approach to structurally prune LLMs and recover knowledge. Given general LLMs, LoRAShear at first creates the dependency graphs over LoRA modules to discover minimally removal str…

    Submitted 31 October, 2023; v1 submitted 23 October, 2023; originally announced October 2023.

  10. arXiv:2308.16154

    cs.CV

    MMVP: Motion-Matrix-based Video Prediction

    Authors: Yiqi Zhong, Luming Liang, Ilya Zharkov, Ulrich Neumann

    Abstract: A central challenge of video prediction lies where the system has to reason the objects' future motions from image frames while simultaneously maintaining the consistency of their appearances across frames. This work introduces an end-to-end trainable two-stream video prediction framework, Motion-Matrix-based Video Prediction (MMVP), to tackle this challenge. Unlike previous methods that usually h…

    Submitted 30 August, 2023; v1 submitted 30 August, 2023; originally announced August 2023.

    Comments: ICCV 2023 (Oral)

  11. arXiv:2305.18030

    cs.LG cs.AI cs.CV

    Automated Search-Space Generation Neural Architecture Search

    Authors: Tianyi Chen, Luming Liang, Tianyu Ding, Ilya Zharkov

    Abstract: To search for an optimal sub-network within a general deep neural network (DNN), existing neural architecture search (NAS) methods typically rely on handcrafting a search space beforehand. Such requirements make it challenging to extend them onto general scenarios without significant human expertise and manual intervention. To overcome the limitations, we propose Automated Search-Space Generation Neur…

    Submitted 5 October, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

    Comments: Graph visualization for DARTS, SuperResNet are omitted for arXiv version due to exceeding page dimension limit. Please refer to the open-review version for taking the visualizations

  12. arXiv:2303.06862

    cs.CV cs.AI

    OTOV2: Automatic, Generic, User-Friendly

    Authors: Tianyi Chen, Luming Liang, Tianyu Ding, Zhihui Zhu, Ilya Zharkov

    Abstract: The existing model compression methods via structured pruning typically require complicated multi-stage procedures. Each individual stage necessitates numerous engineering efforts and domain-knowledge from the end-users which prevent their wider applications onto broader scenarios. We propose the second generation of Only-Train-Once (OTOv2), which first automatically trains and compresses a genera…

    Submitted 23 June, 2023; v1 submitted 13 March, 2023; originally announced March 2023.

    Comments: Published on ICLR 2023. Remark here that a few images of dependency graphs can not be included in arXiv due to exceeding size limit

  13. arXiv:2210.02391

    cs.CV cs.LG cs.MM

    Geometry Driven Progressive Warping for One-Shot Face Animation

    Authors: Yatao Zhong, Faezeh Amjadi, Ilya Zharkov

    Abstract: Face animation aims at creating photo-realistic portrait videos with animated poses and expressions. A common practice is to generate displacement fields that are used to warp pixels and features from source to target. However, prior attempts often produce sub-optimal displacements. In this work, we present a geometry driven model and propose two geometric patterns as guidance: 3D face rendered di…

    Submitted 5 October, 2022; originally announced October 2022.

  14. arXiv:2209.04551

    cs.CV

    Sparsity-guided Network Design for Frame Interpolation

    Authors: Tianyu Ding, Luming Liang, Zhihui Zhu, Tianyi Chen, Ilya Zharkov

    Abstract: DNN-based frame interpolation, which generates intermediate frames from two consecutive frames, is often dependent on model architectures with a large number of features, preventing their deployment on systems with limited resources, such as mobile devices. We present a compression-driven network design for frame interpolation that leverages model pruning through sparsity-inducing optimization to…

    Submitted 9 September, 2022; originally announced September 2022.

    Comments: Submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence. The corresponding CVPR paper can be found at arXiv:2103.10559

  15. arXiv:2203.14186

    cs.CV

    RSTT: Real-time Spatial Temporal Transformer for Space-Time Video Super-Resolution

    Authors: Zhicheng Geng, Luming Liang, Tianyu Ding, Ilya Zharkov

    Abstract: Space-time video super-resolution (STVSR) is the task of interpolating videos with both Low Frame Rate (LFR) and Low Resolution (LR) to produce High-Frame-Rate (HFR) and also High-Resolution (HR) counterparts. The existing methods based on Convolutional Neural Network (CNN) succeed in achieving visually satisfying results but suffer from slow inference speed due to their heavy architectures. We p…

    Submitted 26 March, 2022; originally announced March 2022.

  16. arXiv:2103.10559

    cs.CV

    CDFI: Compression-Driven Network Design for Frame Interpolation

    Authors: Tianyu Ding, Luming Liang, Zhihui Zhu, Ilya Zharkov

    Abstract: DNN-based frame interpolation--that generates the intermediate frames given two consecutive frames--typically relies on heavy model architectures with a huge number of features, preventing them from being deployed on systems with limited resources, e.g., mobile devices. We propose a compression-driven network design for frame interpolation (CDFI), that leverages model pruning through sparsity-indu…

    Submitted 27 March, 2021; v1 submitted 18 March, 2021; originally announced March 2021.

    Comments: To appear in the proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

  17. arXiv:2004.08513

    eess.IV cs.CV

    ImagePairs: Realistic Super Resolution Dataset via Beam Splitter Camera Rig

    Authors: Hamid Reza Vaezi Joze, Ilya Zharkov, Karlton Powell, Carl Ringler, Luming Liang, Andy Roulston, Moshe Lutz, Vivek Pradeep

    Abstract: Super Resolution is the problem of recovering a high-resolution image from a single or multiple low-resolution images of the same scene. It is an ill-posed problem since high frequency visual details of the scene are completely lost in low-resolution images. To overcome this, many machine learning approaches have been proposed aiming at training a model to recover the lost details in the new scene…

    Submitted 17 April, 2020; originally announced April 2020.

    Journal ref: The IEEE Conference on Computer Vision and Pattern Recognition Workshop (CVPRW), 2020

  18. arXiv:1803.11264

    cs.CV

    DIY Human Action Data Set Generation

    Authors: Mehran Khodabandeh, Hamid Reza Vaezi Joze, Ilya Zharkov, Vivek Pradeep

    Abstract: The recent successes in applying deep learning techniques to solve standard computer vision problems have inspired researchers to propose new computer vision problems in different domains. As previously established in the field, training data itself plays a significant role in the machine learning process, especially deep learning approaches which are data hungry. In order to solve each new problem…

    Submitted 29 March, 2018; originally announced March 2018.

    Journal ref: The IEEE Conference on Computer Vision and Pattern Recognition Workshop (CVPRW), 2018