Showing 1–50 of 157 results for author: Davis, L

Searching in archive cs.
  1. arXiv:2403.14661  [pdf, other]

    cs.CY cs.CL cs.LG

    Towards Modeling Learner Performance with Large Language Models

    Authors: Seyed Parsa Neshaei, Richard Lee Davis, Adam Hazimeh, Bojan Lazarevski, Pierre Dillenbourg, Tanja Käser

    Abstract: Recent work exploring the capabilities of pre-trained large language models (LLMs) has demonstrated their ability to act as general pattern machines by completing complex token sequences representing a wide array of tasks, including time-series prediction and robot control. This paper investigates whether the pattern recognition and sequence modeling capabilities of LLMs can be extended to the dom…

    Submitted 29 February, 2024; originally announced March 2024.

    Comments: 12 pages, 4 figures

  2. arXiv:2310.05010  [pdf, other]

    cs.CV

    Building an Open-Vocabulary Video CLIP Model with Better Architectures, Optimization and Data

    Authors: Zuxuan Wu, Zejia Weng, Wujian Peng, Xitong Yang, Ang Li, Larry S. Davis, Yu-Gang Jiang

    Abstract: Despite significant results achieved by Contrastive Language-Image Pretraining (CLIP) in zero-shot image recognition, limited effort has been made exploring its potential for zero-shot video recognition. This paper presents Open-VCLIP++, a simple yet effective framework that adapts CLIP to a strong zero-shot video classifier, capable of identifying novel actions and events during testing. Open-VCL…

    Submitted 8 October, 2023; originally announced October 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2302.00624

  3. arXiv:2307.00751  [pdf, other]

    cs.LG cs.AI q-bio.PE

    Population Age Group Sensitivity for COVID-19 Infections with Deep Learning

    Authors: Md Khairul Islam, Tyler Valentine, Royal Wang, Levi Davis, Matt Manner, Judy Fox

    Abstract: The COVID-19 pandemic has created unprecedented challenges for governments and healthcare systems worldwide, highlighting the critical importance of understanding the factors that contribute to virus transmission. This study aimed to identify the most influential age groups in COVID-19 infection rates at the US county level using the Modified Morris Method and deep learning for time series. Our ap…

    Submitted 3 July, 2023; originally announced July 2023.

  4. arXiv:2306.02206  [pdf]

    q-bio.BM cond-mat.soft cs.LG

    Mitigating Molecular Aggregation in Drug Discovery with Predictive Insights from Explainable AI

    Authors: Hunter Sturm, Jonas Teufel, Kaitlin A. Isfeld, Pascal Friederich, Rebecca L. Davis

    Abstract: As the importance of high-throughput screening (HTS) continues to grow due to its value in early stage drug discovery and data generation for training machine learning models, there is a growing need for robust methods for pre-screening compounds to identify and prevent false-positive hits. Small, colloidally aggregating molecules are one of the primary sources of false-positive hits in high-throu…

    Submitted 3 June, 2023; originally announced June 2023.

    Comments: 17 pages, plus SI

  5. arXiv:2303.16759  [pdf]

    cs.CL cs.IR cs.LG cs.SI

    Exploring celebrity influence on public attitude towards the COVID-19 pandemic: social media shared sentiment analysis

    Authors: Brianna M White, Chad A Melton, Parya Zareie, Robert L Davis, Robert A Bednarczyk, Arash Shaban-Nejad

    Abstract: The COVID-19 pandemic has introduced new opportunities for health communication, including an increase in the public use of online outlets for health-related emotions. People have turned to social media networks to share sentiments related to the impacts of the COVID-19 pandemic. In this paper we examine the role of social messaging shared by Persons in the Public Eye (i.e. athletes, politicians,…

    Submitted 23 February, 2023; originally announced March 2023.

    Comments: 7 Pages, 4 Figures

    ACM Class: I.2.7

    Journal ref: BMJ Health & Care Informatics 2023;30:e100665

  6. arXiv:2303.14368  [pdf, other]

    cs.CV cs.AI cs.LG

    FlexNeRF: Photorealistic Free-viewpoint Rendering of Moving Humans from Sparse Views

    Authors: Vinoj Jayasundara, Amit Agrawal, Nicolas Heron, Abhinav Shrivastava, Larry S. Davis

    Abstract: We present FlexNeRF, a method for photorealistic freeviewpoint rendering of humans in motion from monocular videos. Our approach works well with sparse views, which is a challenging scenario when the subject is exhibiting fast/complex motions. We propose a novel approach which jointly optimizes a canonical time and pose configuration, with a pose-dependent motion field and pose-independent tempora…

    Submitted 25 March, 2023; originally announced March 2023.

    Comments: CVPR 2023

  7. arXiv:2212.05667  [pdf, other]

    cs.CV

    Fighting Malicious Media Data: A Survey on Tampering Detection and Deepfake Detection

    Authors: Junke Wang, Zhenxin Li, Chao Zhang, Jingjing Chen, Zuxuan Wu, Larry S. Davis, Yu-Gang Jiang

    Abstract: Online media data, in the forms of images and videos, are becoming mainstream communication channels. However, recent advances in deep learning, particularly deep generative models, open the doors for producing perceptually convincing images and videos at a low cost, which not only poses a serious threat to the trustworthiness of digital information but also has severe societal implications. This…

    Submitted 11 December, 2022; originally announced December 2022.

  8. arXiv:2211.15407  [pdf]

    cs.CL cs.SI

    Fine-tuned Sentiment Analysis of COVID-19 Vaccine-Related Social Media Data: Comparative Study

    Authors: Chad A Melton, Brianna M White, Robert L Davis, Robert A Bednarczyk, Arash Shaban-Nejad

    Abstract: This study investigated and compared public sentiment related to COVID-19 vaccines expressed on two popular social media platforms, Reddit and Twitter, harvested from January 1, 2020, to March 1, 2022. To accomplish this task, we created a fine-tuned DistilRoBERTa model to predict sentiments of approximately 9.5 million Tweets and 70 thousand Reddit comments. To fine-tune our model, our team manua…

    Submitted 17 October, 2022; originally announced November 2022.

    Comments: 11 Pages, 5 Figures, and 1 Table

    MSC Class: 92-11 ACM Class: I.2.7

    Journal ref: Journal of Medical Internet Research (JMIR) 2022;24(10):e40408

  9. arXiv:2208.01813  [pdf, other]

    cs.CV

    TAG: Boosting Text-VQA via Text-aware Visual Question-answer Generation

    Authors: Jun Wang, Mingfei Gao, Yuqian Hu, Ramprasaath R. Selvaraju, Chetan Ramaiah, Ran Xu, Joseph F. JaJa, Larry S. Davis

    Abstract: Text-VQA aims at answering questions that require understanding the textual cues in an image. Despite the great progress of existing Text-VQA methods, their performance suffers from insufficient human-labeled question-answer (QA) pairs. However, we observe that, in general, the scene text is not fully exploited in the existing datasets -- only a small portion of the text in each image participates…

    Submitted 7 October, 2022; v1 submitted 2 August, 2022; originally announced August 2022.

    Comments: BMVC 2022

  10. arXiv:2206.04875  [pdf, other]

    cs.HC

    Smallset Timelines: A Visual Representation of Data Preprocessing Decisions

    Authors: Lydia R. Lucchesi, Petra M. Kuhnert, Jenny L. Davis, Lexing Xie

    Abstract: Data preprocessing is a crucial stage in the data analysis pipeline, with both technical and social aspects to consider. Yet, the attention it receives is often lacking in research practice and dissemination. We present the Smallset Timeline, a visualisation to help reflect on and communicate data preprocessing decisions. A "Smallset" is a small selection of rows from the original dataset containi…

    Submitted 10 June, 2022; originally announced June 2022.

    Comments: In 2022 ACM Conference on Fairness, Accountability, and Transparency (FAccT '22), June 21-24, 2022, Seoul, Republic of Korea

  11. arXiv:2204.08453  [pdf, other]

    cs.CV

    Neural Space-filling Curves

    Authors: Hanyu Wang, Kamal Gupta, Larry Davis, Abhinav Shrivastava

    Abstract: We present Neural Space-filling Curves (SFCs), a data-driven approach to infer a context-based scan order for a set of images. Linear ordering of pixels forms the basis for many applications such as video scrambling, compression, and auto-regressive models that are used in generative modeling for images. Existing algorithms resort to a fixed scanning algorithm such as Raster scan or Hilbert scan.…

    Submitted 30 July, 2022; v1 submitted 18 April, 2022; originally announced April 2022.

    Comments: ECCV 2022. Project page: https://hywang66.github.io/publication/neuralsfc/

  12. arXiv:2202.00011  [pdf, other]

    eess.IV cs.CV cs.LG

    Leveraging Bitstream Metadata for Fast, Accurate, Generalized Compressed Video Quality Enhancement

    Authors: Max Ehrlich, Jon Barker, Namitha Padmanabhan, Larry Davis, Andrew Tao, Bryan Catanzaro, Abhinav Shrivastava

    Abstract: Video compression is a central feature of the modern internet powering technologies from social media to video conferencing. While video compression continues to mature, for many compression settings, quality loss is still noticeable. These settings nevertheless have important applications to the efficient transmission of videos over bandwidth constrained or otherwise unstable connections. In this…

    Submitted 30 October, 2023; v1 submitted 31 January, 2022; originally announced February 2022.

    Comments: WACV 2024

  13. arXiv:2112.04598  [pdf, other]

    cs.CV cs.LG stat.ML

    InvGAN: Invertible GANs

    Authors: Partha Ghosh, Dominik Zietlow, Michael J. Black, Larry S. Davis, Xiaochen Hu

    Abstract: Generation of photo-realistic images, semantic editing and representation learning are a few of many potential applications of high resolution generative models. Recent progress in GANs have established them as an excellent choice for such tasks. However, since they do not provide an inference model, image editing or downstream tasks such as classification can not be done on real images using the…

    Submitted 10 December, 2021; v1 submitted 8 December, 2021; originally announced December 2021.

  14. arXiv:2110.05458  [pdf, other]

    cs.CV

    Learning Realistic Human Reposing using Cyclic Self-Supervision with 3D Shape, Pose, and Appearance Consistency

    Authors: Soubhik Sanyal, Alex Vorobiov, Timo Bolkart, Matthew Loper, Betty Mohler, Larry Davis, Javier Romero, Michael J. Black

    Abstract: Synthesizing images of a person in novel poses from a single image is a highly ambiguous task. Most existing approaches require paired training images; i.e. images of the same person with the same clothing in different poses. However, obtaining sufficiently large datasets with paired data is challenging and costly. Previous methods that forego paired supervision lack realism. We propose a self-sup…

    Submitted 11 October, 2021; originally announced October 2021.

    Comments: International Conference on Computer Vision (ICCV)

  15. arXiv:2108.11579  [pdf, other]

    cs.LG stat.ML

    Modeling Item Response Theory with Stochastic Variational Inference

    Authors: Mike Wu, Richard L. Davis, Benjamin W. Domingue, Chris Piech, Noah Goodman

    Abstract: Item Response Theory (IRT) is a ubiquitous model for understanding human behaviors and attitudes based on their responses to questions. Large modern datasets offer opportunities to capture more nuances in human behavior, potentially improving psychometric modeling leading to improved scientific understanding and public policy. However, while larger datasets allow for more flexible approaches, many…

    Submitted 28 July, 2022; v1 submitted 26 August, 2021; originally announced August 2021.

    Comments: version two includes added experiments; 33 pages of content; 6 pages appendix; figures at the bottom. arXiv admin note: text overlap with arXiv:2002.00276

  16. arXiv:2108.08864  [pdf, other]

    cs.DS math.CO math.PR

    Partitioned K-nearest neighbor local depth for scalable comparison-based learning

    Authors: Jacob D. Baron, R. W. R. Darling, J. Laylon Davis, R. Pettit

    Abstract: A triplet comparison oracle on a set $S$ takes an object $x \in S$ and for any pair $\{y, z\} \subset S \setminus \{x\}$ declares which of $y$ and $z$ is more similar to $x$. Partitioned Local Depth (PaLD) supplies a principled non-parametric partitioning of $S$ under such triplet comparisons but needs $O(n^2 \log{n})$ oracle calls and $O(n^3)$ post-processing steps. We introduce Partitioned Nea…

    Submitted 2 December, 2021; v1 submitted 19 August, 2021; originally announced August 2021.

    Comments: 27 pages, 2 figures

    MSC Class: 90C35 ACM Class: F.2.2

  17. arXiv:2107.07430  [pdf, other]

    cs.CL

    Wordcraft: a Human-AI Collaborative Editor for Story Writing

    Authors: Andy Coenen, Luke Davis, Daphne Ippolito, Emily Reif, Ann Yuan

    Abstract: As neural language models grow in effectiveness, they are increasingly being applied in real-world settings. However these applications tend to be limited in the modes of interaction they support. In this extended abstract, we propose Wordcraft, an AI-assisted editor for story writing in which a writer and a dialog system collaborate to write a story. Our novel interface uses few-shot learning and…

    Submitted 15 July, 2021; originally announced July 2021.

    Journal ref: First Workshop on Bridging Human-Computer Interaction and Natural Language Processing at EACL 2021

  18. arXiv:2106.00168  [pdf, other]

    cs.CV

    Rethinking Pseudo Labels for Semi-Supervised Object Detection

    Authors: Hengduo Li, Zuxuan Wu, Abhinav Shrivastava, Larry S. Davis

    Abstract: Recent advances in semi-supervised object detection (SSOD) are largely driven by consistency-based pseudo-labeling methods for image classification tasks, producing pseudo labels as supervisory signals. However, when using pseudo labels, there is a lack of consideration in localization precision and amplified class imbalance, both of which are critical for detection tasks. In this paper, we introd…

    Submitted 29 December, 2021; v1 submitted 31 May, 2021; originally announced June 2021.

    Comments: AAAI 2022

  19. arXiv:2105.09597  [pdf, other]

    cs.CV

    More Than Just Attention: Improving Cross-Modal Attentions with Contrastive Constraints for Image-Text Matching

    Authors: Yuxiao Chen, Jianbo Yuan, Long Zhao, Tianlang Chen, Rui Luo, Larry Davis, Dimitris N. Metaxas

    Abstract: Cross-modal attention mechanisms have been widely applied to the image-text matching task and have achieved remarkable improvements thanks to its capability of learning fine-grained relevance across different modalities. However, the cross-modal attention models of existing methods could be sub-optimal and inaccurate because there is no direct supervision provided during the training process. In t…

    Submitted 3 October, 2022; v1 submitted 20 May, 2021; originally announced May 2021.

    Comments: Accepted to WACV 2023

  20. arXiv:2105.07322  [pdf, other]

    cs.CV cs.LG eess.IV

    Unsupervised Super-Resolution of Satellite Imagery for High Fidelity Material Label Transfer

    Authors: Arthita Ghosh, Max Ehrlich, Larry Davis, Rama Chellappa

    Abstract: Urban material recognition in remote sensing imagery is a highly relevant, yet extremely challenging problem due to the difficulty of obtaining human annotations, especially on low resolution satellite images. To this end, we propose an unsupervised domain adaptation based approach using adversarial learning. We aim to harvest information from smaller quantities of high resolution data (source dom…

    Submitted 15 May, 2021; originally announced May 2021.

    Comments: Published in the proceedings of the 2019 IEEE International Geoscience and Remote Sensing Symposium

    Journal ref: IGARSS (2019), 5144-5147

  21. arXiv:2105.06464  [pdf, other]

    cs.CV cs.LG

    DiscoBox: Weakly Supervised Instance Segmentation and Semantic Correspondence from Box Supervision

    Authors: Shiyi Lan, Zhiding Yu, Christopher Choy, Subhashree Radhakrishnan, Guilin Liu, Yuke Zhu, Larry S. Davis, Anima Anandkumar

    Abstract: We introduce DiscoBox, a novel framework that jointly learns instance segmentation and semantic correspondence using bounding box supervision. Specifically, we propose a self-ensembling framework where instance segmentation and semantic correspondence are jointly guided by a structured teacher in addition to the bounding box supervision. The teacher is a structured energy model incorporating a pai…

    Submitted 5 June, 2021; v1 submitted 13 May, 2021; originally announced May 2021.

    Comments: Tech Report

  22. arXiv:2105.02668  [pdf, other]

    cs.CV

    VideoLT: Large-scale Long-tailed Video Recognition

    Authors: Xing Zhang, Zuxuan Wu, Zejia Weng, Huazhu Fu, Jingjing Chen, Yu-Gang Jiang, Larry Davis

    Abstract: Label distributions in real-world are oftentimes long-tailed and imbalanced, resulting in biased models towards dominant labels. While long-tailed recognition has been extensively studied for image classification tasks, limited effort has been made for video domain. In this paper, we introduce VideoLT, a large-scale long-tailed video recognition dataset, as a step toward real-world video recogniti…

    Submitted 18 August, 2021; v1 submitted 6 May, 2021; originally announced May 2021.

    Comments: To appear in ICCV 2021

  23. arXiv:2104.14557  [pdf, other]

    cs.CV

    Learned Spatial Representations for Few-shot Talking-Head Synthesis

    Authors: Moustafa Meshry, Saksham Suri, Larry S. Davis, Abhinav Shrivastava

    Abstract: We propose a novel approach for few-shot talking-head synthesis. While recent works in neural talking heads have produced promising results, they can still produce images that do not preserve the identity of the subject in source images. We posit this is a result of the entangled representation of each subject in a single latent code that models 3D shape information, identity cues, colors, lightin…

    Submitted 29 April, 2021; originally announced April 2021.

    Comments: http://www.cs.umd.edu/~mmeshry/projects/lsr/

  24. arXiv:2104.11896  [pdf, other]

    cs.CV

    M3DeTR: Multi-representation, Multi-scale, Mutual-relation 3D Object Detection with Transformers

    Authors: Tianrui Guan, Jun Wang, Shiyi Lan, Rohan Chandra, Zuxuan Wu, Larry Davis, Dinesh Manocha

    Abstract: We present a novel architecture for 3D object detection, M3DeTR, which combines different point cloud representations (raw, voxels, bird-eye view) with different feature scales based on multi-scale feature pyramids. M3DeTR is the first approach that unifies multiple point cloud representations, feature scales, as well as models mutual relationships between point clouds simultaneously using transfo…

    Submitted 22 October, 2021; v1 submitted 24 April, 2021; originally announced April 2021.

  25. arXiv:2104.07098  [pdf, other]

    cs.CV

    StEP: Style-based Encoder Pre-training for Multi-modal Image Synthesis

    Authors: Moustafa Meshry, Yixuan Ren, Larry S Davis, Abhinav Shrivastava

    Abstract: We propose a novel approach for multi-modal Image-to-image (I2I) translation. To tackle the one-to-many relationship between input and output domains, previous works use complex training objectives to learn a latent embedding, jointly with the generator, that models the variability of the output domain. In contrast, we directly model the style variability of images, independent of the image synthe…

    Submitted 14 April, 2021; originally announced April 2021.

    Comments: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021

  26. arXiv:2104.01198  [pdf, other]

    cs.CV

    Beyond Short Clips: End-to-End Video-Level Learning with Collaborative Memories

    Authors: Xitong Yang, Haoqi Fan, Lorenzo Torresani, Larry Davis, Heng Wang

    Abstract: The standard way of training video models entails sampling at each iteration a single clip from a video and optimizing the clip prediction with respect to the video-level label. We argue that a single clip may not have enough temporal coverage to exhibit the label to recognize, since video datasets are often weakly labeled with categorical information but without dense temporal annotations. Furthe…

    Submitted 2 April, 2021; originally announced April 2021.

    Comments: CVPR 2021

  27. arXiv:2103.16748  [pdf, other]

    cs.CV cs.GR

    Dual Contrastive Loss and Attention for GANs

    Authors: Ning Yu, Guilin Liu, Aysegul Dundar, Andrew Tao, Bryan Catanzaro, Larry Davis, Mario Fritz

    Abstract: Generative Adversarial Networks (GANs) produce impressive results on unconditional image generation when powered with large-scale image datasets. Yet generated images are still easy to spot especially on datasets with high variance (e.g. bedroom, church). In this paper, we propose various improvements to further push the boundaries in image generation. Specifically, we propose a novel dual contras…

    Submitted 17 March, 2022; v1 submitted 30 March, 2021; originally announced March 2021.

    Comments: Accepted to ICCV'21

  28. arXiv:2103.13612  [pdf, other]

    cs.CV cs.LG

    THAT: Two Head Adversarial Training for Improving Robustness at Scale

    Authors: Zuxuan Wu, Tom Goldstein, Larry S. Davis, Ser-Nam Lim

    Abstract: Many variants of adversarial training have been proposed, with most research focusing on problems with relatively few classes. In this paper, we propose Two Head Adversarial Training (THAT), a two-stream adversarial learning network that is designed to handle the large-scale many-class ImageNet dataset. The proposed method trains a network with two heads and two loss functions; one to minimize fea…

    Submitted 25 March, 2021; originally announced March 2021.

  29. arXiv:2103.09311  [pdf]

    cs.AI cs.DL

    Using a Personal Health Library-Enabled mHealth Recommender System for Self-Management of Diabetes Among Underserved Populations: Use Case for Knowledge Graphs and Linked Data

    Authors: Nariman Ammar, James E Bailey, Robert L Davis, Arash Shaban-Nejad

    Abstract: Personal health libraries (PHLs) provide a single point of secure access to patients digital health data and enable the integration of knowledge stored in their digital health profiles with other sources of global knowledge. PHLs can help empower caregivers and health care providers to make informed decisions about patients health by understanding medical events in the context of their lives. This…

    Submitted 16 March, 2021; originally announced March 2021.

    Comments: 21 Pages, 13 Figures

    ACM Class: I.2.4; J.3

    Journal ref: JMIR Form Res. 2021 March 16;5(3):e24738

  30. arXiv:2103.05152  [pdf, other]

    cs.CV cs.AI cs.LG

    Knowledge Evolution in Neural Networks

    Authors: Ahmed Taha, Abhinav Shrivastava, Larry Davis

    Abstract: Deep learning relies on the availability of a large corpus of data (labeled or unlabeled). Thus, one challenging unsettled question is: how to train a deep network on a relatively small dataset? To tackle this question, we propose an evolution-inspired training approach to boost performance on relatively small datasets. The knowledge evolution (KE) approach splits a deep network into two hypothese…

    Submitted 8 March, 2021; originally announced March 2021.

    Comments: CVPR Oral 2021

  31. arXiv:2103.02770  [pdf, other]

    cs.CV cs.LG

    SVMax: A Feature Embedding Regularizer

    Authors: Ahmed Taha, Alex Hanson, Abhinav Shrivastava, Larry Davis

    Abstract: A neural network regularizer (e.g., weight decay) boosts performance by explicitly penalizing the complexity of a network. In this paper, we penalize inferior network activations -- feature embeddings -- which in turn regularize the network's weights implicitly. We propose singular value maximization (SVMax) to learn a more uniform feature embedding. The SVMax regularizer supports both supervised…

    Submitted 3 March, 2021; originally announced March 2021.

  32. arXiv:2102.05646  [pdf, other]

    cs.CV cs.AI

    Scale Normalized Image Pyramids with AutoFocus for Object Detection

    Authors: Bharat Singh, Mahyar Najibi, Abhishek Sharma, Larry S. Davis

    Abstract: We present an efficient foveal framework to perform object detection. A scale normalized image pyramid (SNIP) is generated that, like human vision, only attends to objects within a fixed size range at different scales. Such a restriction of objects' size during training affords better learning of object-sensitive filters, and therefore, results in better accuracy. However, the use of an image pyra…

    Submitted 10 February, 2021; originally announced February 2021.

    Comments: Accepted in T-PAMI 2021

  33. arXiv:2101.11080  [pdf, other]

    cs.CV

    Deep Video Inpainting Detection

    Authors: Peng Zhou, Ning Yu, Zuxuan Wu, Larry S. Davis, Abhinav Shrivastava, Ser-Nam Lim

    Abstract: This paper studies video inpainting detection, which localizes an inpainted region in a video both spatially and temporally. In particular, we introduce VIDNet, Video Inpainting Detection Network, which contains a two-stream encoder-decoder architecture with attention module. To reveal artifacts encoded in compression, VIDNet additionally takes in Error Level Analysis frames to augment RGB frames,…

    Submitted 26 January, 2021; originally announced January 2021.

  34. arXiv:2012.14950  [pdf, other]

    cs.CV

    2D or not 2D? Adaptive 3D Convolution Selection for Efficient Video Recognition

    Authors: Hengduo Li, Zuxuan Wu, Abhinav Shrivastava, Larry S. Davis

    Abstract: 3D convolutional networks are prevalent for video recognition. While achieving excellent recognition performance on standard benchmarks, they operate on a sequence of frames with 3D convolutions and thus are computationally demanding. Exploiting large variations among different videos, we introduce Ada3D, a conditional computation framework that learns instance-specific 3D usage policies to determ…

    Submitted 28 April, 2021; v1 submitted 29 December, 2020; originally announced December 2020.

    Comments: CVPR 2021

  35. arXiv:2012.13117  [pdf, other]

    cs.DL cs.CY

    Nine Best Practices for Research Software Registries and Repositories: A Concise Guide

    Authors: Task Force on Best Practices for Software Registries: Alain Monteil, Alejandra Gonzalez-Beltran, Alexandros Ioannidis, Alice Allen, Allen Lee, Anita Bandrowski, Bruce E. Wilson, Bryce Mecum, Cai Fan Du, Carly Robinson, Daniel Garijo, Daniel S. Katz, David Long, Genevieve Milliken, Hervé Ménager, Jessica Hausman, Jurriaan H. Spaaks, Katrina Fenlon, Kristin Vanderbilt, Lorraine Hwang, Lynn Davis, Martin Fenner, Michael R. Crusoe, et al. (8 additional authors not shown)

    Abstract: Scientific software registries and repositories serve various roles in their respective disciplines. These resources improve software discoverability and research transparency, provide information for software citations, and foster preservation of computational methods that might otherwise be lost over time, thereby supporting research reproducibility and replicability. However, developing these r…

    Submitted 24 December, 2020; originally announced December 2020.

    Comments: 18 pages

  36. arXiv:2012.08726  [pdf, other]

    cs.CR cs.CV cs.CY cs.GR cs.LG

    Responsible Disclosure of Generative Models Using Scalable Fingerprinting

    Authors: Ning Yu, Vladislav Skripniuk, Dingfan Chen, Larry Davis, Mario Fritz

    Abstract: Over the past years, deep generative models have achieved a new level of performance. Generated data has become difficult, if not impossible, to be distinguished from real data. While there are plenty of use cases that benefit from this technology, there are also strong concerns on how this new technology can be misused to generate deep fakes and enable misinformation at scale. Unfortunately, curr…

    Submitted 17 March, 2022; v1 submitted 15 December, 2020; originally announced December 2020.

    Comments: Accepted to ICLR'22 as Spotlight

  37. arXiv:2012.04643  [pdf, other]

    cs.CV cs.LG

    The Lottery Ticket Hypothesis for Object Recognition

    Authors: Sharath Girish, Shishira R. Maiya, Kamal Gupta, Hao Chen, Larry Davis, Abhinav Shrivastava

    Abstract: Recognition tasks, such as object recognition and keypoint estimation, have seen widespread adoption in recent years. Most state-of-the-art methods for these tasks use deep networks that are computationally expensive and have huge memory footprints. This makes it exceedingly difficult to deploy these systems on low power embedded devices. Hence, the importance of decreasing the storage requirement…

    Submitted 19 April, 2021; v1 submitted 8 December, 2020; originally announced December 2020.

    Comments: To appear at CVPR 2021

  38. arXiv:2011.10269  [pdf, other]

    cs.CV

    SLADE: A Self-Training Framework For Distance Metric Learning

    Authors: Jiali Duan, Yen-Liang Lin, Son Tran, Larry S. Davis, C. -C. Jay Kuo

    Abstract: Most existing distance metric learning approaches use fully labeled data to learn the sample similarities in an embedding space. We present a self-training framework, SLADE, to improve retrieval performance by leveraging additional unlabeled data. We first train a teacher model on the labeled data and use it to generate pseudo labels for the unlabeled data. We then train a student model on both la…

    Submitted 29 March, 2021; v1 submitted 20 November, 2020; originally announced November 2020.

    Comments: Accepted by CVPR 2021

  39. arXiv:2011.08932  [pdf, other]

    cs.CV cs.LG

    Analyzing and Mitigating JPEG Compression Defects in Deep Learning

    Authors: Max Ehrlich, Larry Davis, Ser-Nam Lim, Abhinav Shrivastava

    Abstract: With the proliferation of deep learning methods, many computer vision problems which were considered academic are now viable in the consumer setting. One drawback of consumer applications is lossy compression, which is necessary from an engineering standpoint to efficiently and cheaply store and transmit user images. Despite this, there has been little study of the effect of compression on deep ne…

    Submitted 20 September, 2021; v1 submitted 17 November, 2020; originally announced November 2020.

    Comments: Accepted to the ICCV MELEX Workshop

  40. arXiv:2008.12432  [pdf, other]

    cs.CV

    All About Knowledge Graphs for Actions

    Authors: Pallabi Ghosh, Nirat Saini, Larry S. Davis, Abhinav Shrivastava

    Abstract: Current action recognition systems require large amounts of training data for recognizing an action. Recent works have explored the paradigm of zero-shot and few-shot learning to learn classifiers for unseen categories or categories with few labels. Following similar paradigms in object recognition, these approaches utilize external sources of knowledge (eg. knowledge graphs from language domains)…

    Submitted 27 August, 2020; originally announced August 2020.

  41. arXiv:2007.10321  [pdf, other]

    cs.CV

    Hierarchical Contrastive Motion Learning for Video Action Recognition

    Authors: Xitong Yang, Xiaodong Yang, Sifei Liu, Deqing Sun, Larry Davis, Jan Kautz

    Abstract: One central question for video action recognition is how to model motion. In this paper, we present hierarchical contrastive motion learning, a new self-supervised learning framework to extract effective motion representations from raw video frames. Our approach progressively learns a hierarchy of motion features that correspond to different abstraction levels in a network. This hierarchical desig…

    Submitted 17 January, 2022; v1 submitted 20 July, 2020; originally announced July 2020.

    Comments: BMVC2021 camera ready (Oral)

  42. arXiv:2007.09785  [pdf, other]

    cs.CV

    ASAP-NMS: Accelerating Non-Maximum Suppression Using Spatially Aware Priors

    Authors: Rohun Tripathi, Vasu Singla, Mahyar Najibi, Bharat Singh, Abhishek Sharma, Larry Davis

    Abstract: The widely adopted sequential variant of Non Maximum Suppression (or Greedy-NMS) is a crucial module for object-detection pipelines. Unfortunately, for the region proposal stage of two/multi-stage detectors, NMS is turning out to be a latency bottleneck due to its sequential nature. In this article, we carefully profile Greedy-NMS iterations to find that a major chunk of computation is wasted in c…

    Submitted 21 August, 2020; v1 submitted 19 July, 2020; originally announced July 2020.

    Comments: Under Review at CVIU

  43. arXiv:2007.09748  [pdf, other]

    cs.CV

    A Generic Visualization Approach for Convolutional Neural Networks

    Authors: Ahmed Taha, Xitong Yang, Abhinav Shrivastava, Larry Davis

    Abstract: Retrieval networks are essential for searching and indexing. Compared to classification networks, attention visualization for retrieval networks is hardly studied. We formulate attention visualization as a constrained optimization problem. We leverage the unit L2-Norm constraint as an attention filter (L2-CAF) to localize attention in both classification and retrieval networks. Unlike recent liter…

    Submitted 19 July, 2020; originally announced July 2020.

    Comments: ECCV'2020

  44. arXiv:2007.08556  [pdf, other]

    cs.CV

    InfoFocus: 3D Object Detection for Autonomous Driving with Dynamic Information Modeling

    Authors: Jun Wang, Shiyi Lan, Mingfei Gao, Larry S. Davis

    Abstract: Real-time 3D object detection is crucial for autonomous cars. Achieving promising performance with high efficiency, voxel-based approaches have received considerable attention. However, previous methods model the input space with features extracted from equally divided sub-regions without considering that point cloud is generally non-uniformly distributed over the space. To address this issue, we…

    Submitted 16 July, 2020; originally announced July 2020.

  45. arXiv:2006.14615  [pdf, other]

    cs.CV cs.LG

    LayoutTransformer: Layout Generation and Completion with Self-attention

    Authors: Kamal Gupta, Justin Lazarow, Alessandro Achille, Larry Davis, Vijay Mahadevan, Abhinav Shrivastava

    Abstract: We address the problem of scene layout generation for diverse domains such as images, mobile applications, documents, and 3D objects. Most complex scenes, natural or human-designed, can be expressed as a meaningful arrangement of simpler compositional graphical primitives. Generating a new layout or extending an existing layout requires understanding the relationships between these primitives. To…

    Submitted 30 September, 2021; v1 submitted 25 June, 2020; originally announced June 2020.

    Comments: To appear at ICCV 2021

  46. arXiv:2004.09320  [pdf, other]

    eess.IV cs.CV cs.LG stat.ML

    Quantization Guided JPEG Artifact Correction

    Authors: Max Ehrlich, Larry Davis, Ser-Nam Lim, Abhinav Shrivastava

    Abstract: The JPEG image compression algorithm is the most popular method of image compression because of its ability for large compression ratios. However, to achieve such high compression, information is lost. For aggressive quantization settings, this leads to a noticeable reduction in image quality. Artifact correction has been studied in the context of deep neural networks for some time, but the curren…

    Submitted 16 July, 2020; v1 submitted 16 April, 2020; originally announced April 2020.

    Comments: Published in the proceedings of ECCV 2020, please see our released code and models at https://gitlab.com/Queuecumber/quantization-guided-ac

  47. arXiv:2004.03355  [pdf, other]

    cs.CV

    Inclusive GAN: Improving Data and Minority Coverage in Generative Models

    Authors: Ning Yu, Ke Li, Peng Zhou, Jitendra Malik, Larry Davis, Mario Fritz

    Abstract: Generative Adversarial Networks (GANs) have brought about rapid progress towards generating photorealistic images. Yet the equitable allocation of their modeling capacity among subgroups has received less attention, which could lead to potential biases against underrepresented minorities if left uncontrolled. In this work, we first formalize the problem of minority inclusion as one of data coverag…

    Submitted 22 August, 2020; v1 submitted 7 April, 2020; originally announced April 2020.

    Comments: Accepted to ECCV'20

  48. arXiv:2004.01170  [pdf, other]

    cs.CV

    DOPS: Learning to Detect 3D Objects and Predict their 3D Shapes

    Authors: Mahyar Najibi, Guangda Lai, Abhijit Kundu, Zhichao Lu, Vivek Rathod, Thomas Funkhouser, Caroline Pantofaru, David Ross, Larry S. Davis, Alireza Fathi

    Abstract: We propose DOPS, a fast single-stage 3D object detection method for LIDAR data. Previous methods often make domain-specific design decisions, for example projecting points into a bird-eye view image in autonomous driving scenarios. In contrast, we propose a general-purpose method that works on both indoor and outdoor scenes. The core novelty of our method is a fast, single-pass architecture that b…

    Submitted 6 April, 2020; v1 submitted 2 April, 2020; originally announced April 2020.

    Comments: To appear in CVPR 2020

  49. arXiv:2003.12125  [pdf, other]

    cs.CV

    SaccadeNet: A Fast and Accurate Object Detector

    Authors: Shiyi Lan, Zhou Ren, Yi Wu, Larry S. Davis, Gang Hua

    Abstract: Object detection is an essential step towards holistic scene understanding. Most existing object detection algorithms attend to certain object areas once and then predict the object locations. However, neuroscientists have revealed that humans do not look at the scene in fixed steadiness. Instead, human eyes move around, locating informative parts to understand the object location. This active per…

    Submitted 26 March, 2020; originally announced March 2020.

  50. arXiv:2003.11670  [pdf, other]

    cs.CV

    DeepStrip: High Resolution Boundary Refinement

    Authors: Peng Zhou, Brian Price, Scott Cohen, Gregg Wilensky, Larry S. Davis

    Abstract: In this paper, we target refining the boundaries in high resolution images given low resolution masks. For memory and computation efficiency, we propose to convert the regions of interest into strip images and compute a boundary prediction in the strip domain. To detect the target boundary, we present a framework with two prediction layers. First, all potential boundaries are predicted as an initi…

    Submitted 25 March, 2020; originally announced March 2020.

    Journal ref: CVPR 2020