Showing 1–34 of 34 results for author: Conde, M V

Searching in archive cs.
  1. arXiv:2405.16807

    cs.CV cs.AI cs.GR cs.MM

    Extreme Compression of Adaptive Neural Images

    Authors: Leo Hoshikawa, Marcos V. Conde, Takeshi Ohashi, Atsushi Irie

    Abstract: Implicit Neural Representations (INRs) and Neural Fields are a novel paradigm for signal representation, from images and audio to 3D scenes and videos. The fundamental idea is to represent a signal as a continuous and differentiable neural network. This idea offers unprecedented benefits such as continuous resolution and memory efficiency, enabling new compression techniques. However, representing…

    Submitted 4 June, 2024; v1 submitted 26 May, 2024; originally announced May 2024.

    Comments: Technical Report. Work in progress

  2. arXiv:2404.16687

    cs.CV

    NTIRE 2024 Quality Assessment of AI-Generated Content Challenge

    Authors: Xiaohong Liu, Xiongkuo Min, Guangtao Zhai, Chunyi Li, Tengchuan Kou, Wei Sun, Haoning Wu, Yixuan Gao, Yuqin Cao, Zicheng Zhang, Xiele Wu, Radu Timofte, Fei Peng, Huiyuan Fu, Anlong Ming, Chuanming Wang, Huadong Ma, Shuai He, Zifei Dou, Shu Chen, Huacong Zhang, Haiyi Xie, Chengwei Wang, Baoying Chen, Jishen Zeng, et al. (89 additional authors not shown)

    Abstract: This paper reports on the NTIRE 2024 Quality Assessment of AI-Generated Content Challenge, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2024. This challenge is to address a major challenge in the field of image and video processing, namely, Image Quality Assessment (IQA) and Video Quality Assessment (VQA) for AI-Generated Conte…

    Submitted 7 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

  3. arXiv:2404.16484

    cs.CV eess.IV

    Real-Time 4K Super-Resolution of Compressed AVIF Images. AIS 2024 Challenge Survey

    Authors: Marcos V. Conde, Zhijun Lei, Wen Li, Cosmin Stejerean, Ioannis Katsavounidis, Radu Timofte, Kihwan Yoon, Ganzorig Gankhuyag, Jiangtao Lv, Long Sun, Jinshan Pan, Jiangxin Dong, Jinhui Tang, Zhiyuan Li, Hao Wei, Chenyang Ge, Dongyang Zhang, Tianle Liu, Huaian Chen, Yi Jin, Menghan Zhou, Yiqiang Yan, Si Gao, Biao Wu, Shaoli Liu, et al. (50 additional authors not shown)

    Abstract: This paper introduces a novel benchmark as part of the AIS 2024 Real-Time Image Super-Resolution (RTSR) Challenge, which aims to upscale compressed images from 540p to 4K resolution (4x factor) in real-time on commercial GPUs. For this, we use a diverse test set containing a variety of 4K images ranging from digital art to gaming and photography. The images are compressed using the modern AVIF cod…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: CVPR 2024, AI for Streaming (AIS) Workshop

  4. arXiv:2404.16223

    cs.CV eess.IV

    Deep RAW Image Super-Resolution. A NTIRE 2024 Challenge Survey

    Authors: Marcos V. Conde, Florin-Alexandru Vasluianu, Radu Timofte, Jianxing Zhang, Jia Li, Fan Wang, Xiaopeng Li, Zikun Liu, Hyunhee Park, Sejun Song, Changho Kim, Zhijuan Huang, Hongyuan Yu, Cheng Wan, Wending Xiang, Jiamin Lin, Hang Zhong, Qiaosong Zhang, Yue Sun, Xuanwu Yin, Kunlong Zuo, Senyan Xu, Siyuan Jiang, Zhijing Sun, Jiaying Zhu, et al. (10 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 RAW Image Super-Resolution Challenge, highlighting the proposed solutions and results. New methods for RAW Super-Resolution could be essential in modern Image Signal Processing (ISP) pipelines, however, this problem is not as explored as in the RGB domain. The goal of this challenge is to upscale RAW Bayer images by 2x, considering unknown degradations such as nois…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: CVPR 2024 - NTIRE Workshop

  5. arXiv:2404.16205

    cs.CV cs.MM

    AIS 2024 Challenge on Video Quality Assessment of User-Generated Content: Methods and Results

    Authors: Marcos V. Conde, Saman Zadtootaghaj, Nabajeet Barman, Radu Timofte, Chenlong He, Qi Zheng, Ruoxi Zhu, Zhengzhong Tu, Haiqiang Wang, Xiangguang Chen, Wenhui Meng, Xiang Pan, Huiying Shi, Han Zhu, Xiaozhong Xu, Lei Sun, Zhenzhong Chen, Shan Liu, Zicheng Zhang, Haoning Wu, Yingjie Zhou, Chunyi Li, Xiaohong Liu, Weisi Lin, Guangtao Zhai, et al. (11 additional authors not shown)

    Abstract: This paper reviews the AIS 2024 Video Quality Assessment (VQA) Challenge, focused on User-Generated Content (UGC). The aim of this challenge is to gather deep learning-based methods capable of estimating the perceptual quality of UGC videos. The user-generated videos from the YouTube UGC Dataset include diverse content (sports, games, lyrics, anime, etc.), quality and resolutions. The proposed met…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: CVPR 2024 Workshop -- AI for Streaming (AIS) Video Quality Assessment Challenge

  6. arXiv:2404.14248

    cs.CV

    NTIRE 2024 Challenge on Low Light Image Enhancement: Methods and Results

    Authors: Xiaoning Liu, Zongwei Wu, Ao Li, Florin-Alexandru Vasluianu, Yulun Zhang, Shuhang Gu, Le Zhang, Ce Zhu, Radu Timofte, Zhi Jin, Hongjun Wu, Chenxi Wang, Haitao Ling, Yuanhao Cai, Hao Bian, Yuxin Zheng, Jing Lin, Alan Yuille, Ben Shao, Jin Guo, Tianli Liu, Mohao Wu, Yixu Feng, Shuo Hou, Haotian Lin, et al. (87 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 low light image enhancement challenge, highlighting the proposed solutions and results. The aim of this challenge is to discover an effective network design or solution capable of generating brighter, clearer, and visually appealing results when dealing with a variety of conditions, including ultra-high resolution (4K and beyond), non-uniform illumination, backlig…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: NTIRE 2024 Challenge Report

  7. arXiv:2404.11770

    cs.CV cs.AI

    Event-Based Eye Tracking. AIS 2024 Challenge Survey

    Authors: Zuowen Wang, Chang Gao, Zongwei Wu, Marcos V. Conde, Radu Timofte, Shih-Chii Liu, Qinyu Chen, Zheng-jun Zha, Wei Zhai, Han Han, Bohao Liao, Yuliang Wu, Zengyu Wan, Zhong Wang, Yang Cao, Ganchao Tan, Jinze Chen, Yan Ru Pei, Sasskia Brüers, Sébastien Crouzet, Douglas McLelland, Oliver Coenen, Baoheng Zhang, Yizhao Gao, Jingyuan Li, et al. (14 additional authors not shown)

    Abstract: This survey reviews the AIS 2024 Event-Based Eye Tracking (EET) Challenge. The task of the challenge focuses on processing eye movement recorded with event cameras and predicting the pupil center of the eye. The challenge emphasizes efficient eye tracking with event cameras to achieve good task accuracy and efficiency trade-off. During the challenge period, 38 participants registered for the Kaggl…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: Qinyu Chen is the corresponding author

  8. arXiv:2404.11569

    cs.CV cs.LG eess.IV

    Simple Image Signal Processing using Global Context Guidance

    Authors: Omar Elezabi, Marcos V. Conde, Radu Timofte

    Abstract: In modern smartphone cameras, the Image Signal Processor (ISP) is the core element that converts the RAW readings from the sensor into perceptually pleasant RGB images for the end users. The ISP is typically proprietary and handcrafted and consists of several blocks such as white balance, color correction, and tone mapping. Deep learning-based ISPs aim to transform RAW images into DSLR-like RGB im…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: Preprint under review

  9. arXiv:2404.11159

    cs.CV

    Deep Portrait Quality Assessment. A NTIRE 2024 Challenge Survey

    Authors: Nicolas Chahine, Marcos V. Conde, Daniela Carfora, Gabriel Pacianotto, Benoit Pochon, Sira Ferradans, Radu Timofte

    Abstract: This paper reviews the NTIRE 2024 Portrait Quality Assessment Challenge, highlighting the proposed solutions and results. This challenge aims to obtain an efficient deep neural network capable of estimating the perceptual quality of real portrait photos. The methods must generalize to diverse scenes and diverse lighting conditions (indoor, outdoor, low-light), movement, blur, and other challenging…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: CVPRW - NTIRE 2024

  10. arXiv:2401.16468

    cs.CV cs.LG eess.IV

    InstructIR: High-Quality Image Restoration Following Human Instructions

    Authors: Marcos V. Conde, Gregor Geigle, Radu Timofte

    Abstract: Image restoration is a fundamental problem that involves recovering a high-quality clean image from its degraded observation. All-In-One image restoration models can effectively restore images from various types and levels of degradation using degradation-specific information as prompts to guide the restoration model. In this work, we present the first approach that uses human-written instructions…

    Submitted 7 July, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

    Comments: European Conference on Computer Vision (ECCV) 2024

  11. arXiv:2312.15487

    eess.IV cs.CV

    BSRAW: Improving Blind RAW Image Super-Resolution

    Authors: Marcos V. Conde, Florin Vasluianu, Radu Timofte

    Abstract: In smartphones and compact cameras, the Image Signal Processor (ISP) transforms the RAW sensor image into a human-readable sRGB image. Most popular super-resolution methods depart from a sRGB image and upscale it further, improving its quality. However, modeling the degradations in the sRGB domain is complicated because of the non-linear ISP transformations. Despite this known issue, only a few me…

    Submitted 24 December, 2023; originally announced December 2023.

    Comments: IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2024

  12. arXiv:2310.13012

    cs.CL cs.AI

    H2O Open Ecosystem for State-of-the-art Large Language Models

    Authors: Arno Candel, Jon McKinney, Philipp Singer, Pascal Pfeiffer, Maximilian Jeblick, Chun Ming Lee, Marcos V. Conde

    Abstract: Large Language Models (LLMs) represent a revolution in AI. However, they also pose many significant risks, such as the presence of biased, private, copyrighted or harmful text. For this reason we need open, transparent and safe solutions. We introduce a complete open-source ecosystem for developing and testing LLMs. The goal of this project is to boost open alternatives to closed-source approaches…

    Submitted 23 October, 2023; v1 submitted 17 October, 2023; originally announced October 2023.

    Comments: EMNLP 2023 Demo - ACL Empirical Methods in Natural Language Processing

  13. arXiv:2309.03387

    cs.RO cs.AI cs.MA

    Efficient Baselines for Motion Prediction in Autonomous Driving

    Authors: Carlos Gómez-Huélamo, Marcos V. Conde, Rafael Barea, Manuel Ocaña, Luis M. Bergasa

    Abstract: Motion Prediction (MP) of multiple surrounding agents is a crucial task in arbitrarily complex environments, from simple robots to Autonomous Driving Stacks (ADS). Current techniques tackle this problem using end-to-end pipelines, where the input data is usually a rendered top-view of the physical information and the past trajectories of the most relevant agents; leveraging this information is a…

    Submitted 31 October, 2023; v1 submitted 6 September, 2023; originally announced September 2023.

    Comments: IEEE T-ITS Transactions on Intelligent Transportation Systems

  14. arXiv:2307.04916

    cs.CV eess.IV

    Rapid Deforestation and Burned Area Detection using Deep Multimodal Learning on Satellite Imagery

    Authors: Gabor Fodor, Marcos V. Conde

    Abstract: Deforestation estimation and fire detection in the Amazon forest poses a significant challenge due to the vast size of the area and the limited accessibility. However, these are crucial problems that lead to severe environmental consequences, including climate change, global warming, and biodiversity loss. To effectively address this problem, multimodal satellite imagery and remote sensing offer a…

    Submitted 10 July, 2023; originally announced July 2023.

    Comments: CVPR 2023 Workshop on Multimodal Learning for Earth and Environment (MultiEarth)

  15. arXiv:2306.11920

    cs.CV

    NILUT: Conditional Neural Implicit 3D Lookup Tables for Image Enhancement

    Authors: Marcos V. Conde, Javier Vazquez-Corral, Michael S. Brown, Radu Timofte

    Abstract: 3D lookup tables (3D LUTs) are a key component for image enhancement. Modern image signal processors (ISPs) have dedicated support for these as part of the camera rendering pipeline. Cameras typically provide multiple options for picture styles, where each style is usually obtained by applying a unique handcrafted 3D LUT. Current approaches for learning and applying 3D LUTs are notably fast, yet n…

    Submitted 24 December, 2023; v1 submitted 20 June, 2023; originally announced June 2023.

    Comments: AAAI 2024 - The 38th Annual AAAI Conference on Artificial Intelligence

  16. arXiv:2306.08161

    cs.CL cs.AI cs.HC cs.IR cs.LG

    h2oGPT: Democratizing Large Language Models

    Authors: Arno Candel, Jon McKinney, Philipp Singer, Pascal Pfeiffer, Maximilian Jeblick, Prithvi Prabhu, Jeff Gambera, Mark Landry, Shivam Bansal, Ryan Chesler, Chun Ming Lee, Marcos V. Conde, Pasha Stetsenko, Olivier Grellier, SriSatish Ambati

    Abstract: Applications built on top of Large Language Models (LLMs) such as GPT-4 represent a revolution in AI due to their human-level capabilities in natural language processing. However, they also pose many significant risks such as the presence of biased, private, or harmful text, and the unauthorized inclusion of copyrighted material. We introduce h2oGPT, a suite of open-source code repositories for…

    Submitted 16 June, 2023; v1 submitted 13 June, 2023; originally announced June 2023.

    Comments: Work in progress by H2O.ai, Inc

  17. arXiv:2211.14040

    eess.IV cs.CV

    Real-Time Under-Display Cameras Image Restoration and HDR on Mobile Devices

    Authors: Marcos V. Conde, Florin Vasluianu, Sabari Nathan, Radu Timofte

    Abstract: The new trend of full-screen devices implies positioning the camera behind the screen to bring a larger display-to-body ratio, enhance eye contact, and provide a notch-free viewing experience on smartphones, TV or tablets. On the other hand, the images captured by under-display cameras (UDCs) are degraded by the screen in front of them. Deep learning methods for image restoration can significantly…

    Submitted 25 November, 2022; originally announced November 2022.

    Comments: ECCV 2022 AIM Workshop. arXiv admin note: text overlap with arXiv:2210.13552

  18. arXiv:2211.13130

    cs.CY cs.AI cs.LG

    A Brief Overview of AI Governance for Responsible Machine Learning Systems

    Authors: Navdeep Gill, Abhishek Mathur, Marcos V. Conde

    Abstract: Organizations of all sizes, across all industries and domains are leveraging artificial intelligence (AI) technologies to solve some of their biggest challenges around operations, customer experience, and much more. However, due to the probabilistic nature of AI, the risks associated with it are far greater than traditional technologies. Research has shown that these risks can range anywhere from…

    Submitted 21 November, 2022; originally announced November 2022.

    Comments: NeurIPS 2022 Trustworthy and Socially Responsible Machine Learning (TSRML) Workshop

  19. arXiv:2211.04470

    cs.CV eess.IV

    Efficient Single-Image Depth Estimation on Mobile Devices, Mobile AI & AIM 2022 Challenge: Report

    Authors: Andrey Ignatov, Grigory Malivenko, Radu Timofte, Lukasz Treszczotko, Xin Chang, Piotr Ksiazek, Michal Lopuszynski, Maciej Pioro, Rafal Rudnicki, Maciej Smyl, Yujie Ma, Zhenyu Li, Zehui Chen, Jialei Xu, Xianming Liu, Junjun Jiang, XueChao Shi, Difan Xu, Yanan Li, Xiaotao Wang, Lei Lei, Ziyu Zhang, Yicheng Wang, Zilong Huang, Guozhong Luo, et al. (14 additional authors not shown)

    Abstract: Various depth estimation models are now widely used on many mobile and IoT devices for image segmentation, bokeh effect rendering, object tracking and many other mobile tasks. Thus, it is very crucial to have efficient and accurate depth estimation models that can run fast on low-power mobile chipsets. In this Mobile AI challenge, the target was to develop deep learning-based single image depth es…

    Submitted 7 November, 2022; originally announced November 2022.

    Comments: arXiv admin note: substantial text overlap with arXiv:2105.08630, arXiv:2211.03885; text overlap with arXiv:2105.08819, arXiv:2105.08826, arXiv:2105.08629, arXiv:2105.07809, arXiv:2105.07825

  20. arXiv:2211.03885

    cs.CV eess.IV

    Learned Smartphone ISP on Mobile GPUs with Deep Learning, Mobile AI & AIM 2022 Challenge: Report

    Authors: Andrey Ignatov, Radu Timofte, Shuai Liu, Chaoyu Feng, Furui Bai, Xiaotao Wang, Lei Lei, Ziyao Yi, Yan Xiang, Zibin Liu, Shaoqing Li, Keming Shi, Dehui Kong, Ke Xu, Minsu Kwon, Yaqi Wu, Jiesi Zheng, Zhihao Fan, Xun Wu, Feng Zhang, Albert No, Minhyeok Cho, Zewen Chen, Xiaze Zhang, Ran Li, et al. (13 additional authors not shown)

    Abstract: The role of mobile cameras increased dramatically over the past few years, leading to more and more research in automatic image quality enhancement and RAW photo processing. In this Mobile AI challenge, the target was to develop an efficient end-to-end AI-based image signal processing (ISP) pipeline replacing the standard mobile ISPs that can run on modern smartphone GPUs using TensorFlow Lite. Th…

    Submitted 7 November, 2022; originally announced November 2022.

  21. arXiv:2210.13552

    cs.CV eess.IV

    Perceptual Image Enhancement for Smartphone Real-Time Applications

    Authors: Marcos V. Conde, Florin Vasluianu, Javier Vazquez-Corral, Radu Timofte

    Abstract: Recent advances in camera designs and imaging pipelines allow us to capture high-quality images using smartphones. However, due to the small size and lens limitations of the smartphone cameras, we commonly find artifacts or degradation in the processed images. The most common unpleasant effects are noise artifacts, diffraction artifacts, blur, and HDR overexposure. Deep learning methods for image…

    Submitted 22 November, 2023; v1 submitted 24 October, 2022; originally announced October 2022.

    Comments: IEEE/CVF WACV 2023 (Oral)

  22. arXiv:2210.11153

    eess.IV cs.CV

    Reversed Image Signal Processing and RAW Reconstruction. AIM 2022 Challenge Report

    Authors: Marcos V. Conde, Radu Timofte, Yibin Huang, Jingyang Peng, Chang Chen, Cheng Li, Eduardo Pérez-Pellitero, Fenglong Song, Furui Bai, Shuai Liu, Chaoyu Feng, Xiaotao Wang, Lei Lei, Yu Zhu, Chenghua Li, Yingying Jiang, Yong A, Peisong Wang, Cong Leng, Jian Cheng, Xiaoyu Liu, Zhicun Yin, Zhilu Zhang, Junyi Li, Ming Liu, et al. (18 additional authors not shown)

    Abstract: Cameras capture sensor RAW images and transform them into pleasant RGB images, suitable for the human eyes, using their integrated Image Signal Processor (ISP). Numerous low-level vision tasks operate in the RAW domain (e.g. image denoising, white balance) due to its linear relationship with the scene irradiance, wide-range of information at 12bits, and sensor designs. Despite this, RAW image data…

    Submitted 20 October, 2022; originally announced October 2022.

    Comments: ECCV 2022 Advances in Image Manipulation (AIM) workshop

  23. arXiv:2210.11141

    cs.CV cs.AI

    General Image Descriptors for Open World Image Retrieval using ViT CLIP

    Authors: Marcos V. Conde, Ivan Aerlic, Simon Jégou

    Abstract: The Google Universal Image Embedding (GUIE) Challenge is one of the first competitions in multi-domain image representations in the wild, covering a wide distribution of objects: landmarks, artwork, food, etc. This is a fundamental computer vision problem with notable applications in image retrieval, search engines and e-commerce. In this work, we explain our 4th place solution to the GUIE Challen…

    Submitted 20 October, 2022; originally announced October 2022.

    Comments: ECCV 2022 Instance-Level Recognition Workshop

  24. arXiv:2209.12674

    cs.CV cs.AI cs.RO

    Exploring Attention GAN for Vehicle Motion Prediction

    Authors: Carlos Gómez-Huélamo, Marcos V. Conde, Miguel Ortiz, Santiago Montiel, Rafael Barea, Luis M. Bergasa

    Abstract: The design of a safe and reliable Autonomous Driving stack (ADS) is one of the most challenging tasks of our era. These ADS are expected to be driven in highly dynamic environments with full autonomy, and a reliability greater than human beings. In that sense, to efficiently and safely navigate through arbitrarily complex traffic scenarios, ADS must have the ability to forecast the future trajecto…

    Submitted 26 September, 2022; originally announced September 2022.

    Comments: IEEE International Conference on Intelligent Transportation Systems 2022

  25. arXiv:2209.11345

    cs.CV eess.IV

    Swin2SR: SwinV2 Transformer for Compressed Image Super-Resolution and Restoration

    Authors: Marcos V. Conde, Ui-Jin Choi, Maxime Burchi, Radu Timofte

    Abstract: Compression plays an important role in the efficient transmission and storage of images and videos through band-limited systems such as streaming services, virtual reality or videogames. However, compression unavoidably leads to artifacts and the loss of the original information, which may severely degrade the visual quality. For these reasons, quality enhancement of compressed images has become a…

    Submitted 22 September, 2022; originally announced September 2022.

    Comments: European Conference on Computer Vision (ECCV 2022) Workshops

  26. arXiv:2206.11260

    cs.SD cs.CV cs.LG eess.AS

    Few-shot Long-Tailed Bird Audio Recognition

    Authors: Marcos V. Conde, Ui-Jin Choi

    Abstract: It is easier to hear birds than see them. However, they still play an essential role in nature and are excellent indicators of deteriorating environmental quality and pollution. Recent advances in Deep Neural Networks allow us to process audio data to detect and classify birds. This technology can assist researchers in monitoring bird populations and biodiversity. We propose a sound detection and…

    Submitted 4 July, 2022; v1 submitted 22 June, 2022; originally announced June 2022.

    Comments: LifeCLEF2022 (best paper award)

  27. arXiv:2205.13071

    cs.RO cs.CV

    Exploring Map-based Features for Efficient Attention-based Vehicle Motion Prediction

    Authors: Carlos Gómez-Huélamo, Marcos V. Conde, Miguel Ortiz

    Abstract: Motion prediction (MP) of multiple agents is a crucial task in arbitrarily complex environments, from social robots to self-driving cars. Current approaches tackle this problem using end-to-end networks, where the input data is usually a rendered top-view of the scene and the past trajectories of all the agents; leveraging this information is a must to obtain optimal performance. In that sense, a…

    Submitted 10 June, 2022; v1 submitted 25 May, 2022; originally announced May 2022.

    Comments: CVPR MABe 2022 - ICRA FFPFAD 2022 Workshops

  28. CLIP-Art: Contrastive Pre-training for Fine-Grained Art Classification

    Authors: Marcos V. Conde, Kerem Turgutlu

    Abstract: Existing computer vision research in artwork struggles with artwork's fine-grained attributes recognition and lack of curated annotated datasets due to their costly creation. To the best of our knowledge, we are one of the first methods to use CLIP (Contrastive Language-Image Pre-Training) to train a neural network on a variety of artwork images and text descriptions pairs. CLIP is able to learn d…

    Submitted 29 April, 2022; originally announced April 2022.

    Comments: CVPR CVFAD Workshop 2021

    Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2021, pp. 3956-3960

  29. arXiv:2204.12819

    eess.IV cs.CV

    Conformer and Blind Noisy Students for Improved Image Quality Assessment

    Authors: Marcos V. Conde, Maxime Burchi, Radu Timofte

    Abstract: Generative models for image restoration, enhancement, and generation have significantly improved the quality of the generated images. Surprisingly, these models produce more pleasant images to the human eye than other methods, yet, they may get a lower perceptual quality score using traditional perceptual quality metrics such as PSNR or SSIM. Therefore, it is necessary to develop a quantitative me…

    Submitted 27 April, 2022; originally announced April 2022.

    Comments: CVPR NTIRE 2022

  30. Model-Based Image Signal Processors via Learnable Dictionaries

    Authors: Marcos V. Conde, Steven McDonagh, Matteo Maggioni, Aleš Leonardis, Eduardo Pérez-Pellitero

    Abstract: Digital cameras transform sensor RAW readings into RGB images by means of their Image Signal Processor (ISP). Computational photography tasks such as image denoising and colour constancy are commonly performed in the RAW domain, in part due to the inherent hardware design, but also due to the appealing simplicity of noise statistics that result from the direct sensor readings. Despite this, the av…

    Submitted 10 January, 2022; originally announced January 2022.

    Comments: AAAI 2022

    Journal ref: Vol. 36 No. 1: AAAI-22 Technical Tracks 1 (2022) 481-489

  31. arXiv:2112.05534

    cs.RO cs.CV

    An Embarrassingly Pragmatic Introduction to Vision-based Autonomous Robots

    Authors: Marcos V. Conde

    Abstract: Autonomous robots are currently one of the most popular Artificial Intelligence problems, having experienced significant advances in the last decade, from Self-driving cars and humanoids to delivery robots and drones. Part of the problem is to get a robot to emulate the perception of human beings, our sense of sight, replacing the eyes with cameras and the brain with mathematical models such as Ne…

    Submitted 14 December, 2021; v1 submitted 14 November, 2021; originally announced December 2021.

    Comments: CS Thesis. Lecture Notes in Computer Science

  32. arXiv:2107.04878

    cs.SD cs.MM eess.AS

    Weakly-Supervised Classification and Detection of Bird Sounds in the Wild. A BirdCLEF 2021 Solution

    Authors: Marcos V. Conde, Kumar Shubham, Prateek Agnihotri, Nitin D. Movva, Szilard Bessenyei

    Abstract: It is easier to hear birds than see them, however, they still play an essential role in nature and they are excellent indicators of deteriorating environmental quality and pollution. Recent advances in Machine Learning and Convolutional Neural Networks allow us to detect and classify bird sounds, by doing this, we can assist researchers in monitoring the status and trends of bird populations and b…

    Submitted 10 July, 2021; originally announced July 2021.

    Comments: Proceedings Working Notes CEURWS @ CLEF 2021 - BirdCLEF 2021

  33. arXiv:2106.10587

    cs.CV cs.LG

    Exploring Vision Transformers for Fine-grained Classification

    Authors: Marcos V. Conde, Kerem Turgutlu

    Abstract: Existing computer vision research in categorization struggles with fine-grained attributes recognition due to the inherently high intra-class variances and low inter-class variances. SOTA methods tackle this challenge by locating the most informative image regions and rely on them to classify the complete image. The most recent work, Vision Transformer (ViT), shows its strong performance in both t…

    Submitted 29 June, 2021; v1 submitted 19 June, 2021; originally announced June 2021.

    Comments: 4 pages, 5 figures, 4 tables. Published in IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2021 - FGVC8. For code see https://github.com/mv-lab/ViT-FGVC8 and for other workshop papers see https://sites.google.com/view/fgvc8/papers

  34. arXiv:1911.06866

    cs.CV

    Multi-attention Networks for Temporal Localization of Video-level Labels

    Authors: Lijun Zhang, Srinath Nizampatnam, Ahana Gangopadhyay, Marcos V. Conde

    Abstract: Temporal localization remains an important challenge in video understanding. In this work, we present our solution to the 3rd YouTube-8M Video Understanding Challenge organized by Google Research. Participants were required to build a segment-level classifier using a large-scale training data set with noisy video-level labels and a relatively small-scale validation data set with accurate segment-l…

    Submitted 15 November, 2019; originally announced November 2019.

    Comments: 7 pages, 3 figures; This work was presented at the 3rd Workshop on YouTube-8M Large-Scale Video Understanding, at the International Conference on Computer Vision (ICCV 2019) in Seoul, Korea