
Showing 1–22 of 22 results for author: Fatahalian, K

Searching in archive cs.
  1. arXiv:2408.13934

    cs.LG cs.AI cs.GR

    Learning to Move Like Professional Counter-Strike Players

    Authors: David Durst, Feng Xie, Vishnu Sarukkai, Brennan Shacklett, Iuri Frosio, Chen Tessler, Joohwan Kim, Carly Taylor, Gilbert Bernstein, Sanjiban Choudhury, Pat Hanrahan, Kayvon Fatahalian

    Abstract: In multiplayer, first-person shooter games like Counter-Strike: Global Offensive (CS:GO), coordinated movement is a critical component of high-level strategic play. However, the complexity of team coordination and the variety of conditions present in popular game maps make it impractical to author hand-crafted movement policies for every scenario. We show that it is possible to take a data-driven…

    Submitted 25 August, 2024; originally announced August 2024.

    Comments: The project website is at https://davidbdurst.com/mlmove/

    Journal ref: ACM SIGGRAPH / Eurographics Symposium on Computer Animation (SCA), August 21-23, 2024, Montreal, Canada

  2. arXiv:2402.18116

    cs.GR cs.CV

    Block and Detail: Scaffolding Sketch-to-Image Generation

    Authors: Vishnu Sarukkai, Lu Yuan, Mia Tang, Maneesh Agrawala, Kayvon Fatahalian

    Abstract: We introduce a novel sketch-to-image tool that aligns with the iterative refinement process of artists. Our tool lets users sketch blocking strokes to coarsely represent the placement and form of objects and detail strokes to refine their shape and silhouettes. We develop a two-pass algorithm for generating high-fidelity images from such sketches at any point in the iterative process. In the first…

    Submitted 28 February, 2024; originally announced February 2024.

    Comments: 12 pages, 13 figures

  3. arXiv:2312.12080

    cs.CV cs.GR

    Learning Subject-Aware Cropping by Outpainting Professional Photos

    Authors: James Hong, Lu Yuan, Michaël Gharbi, Matthew Fisher, Kayvon Fatahalian

    Abstract: How to frame (or crop) a photo often depends on the image subject and its context; e.g., a human portrait. Recent works have defined the subject-aware image cropping task as a nuanced and practical version of image cropping. We propose a weakly-supervised approach (GenCrop) to learn what makes a high-quality, subject-aware crop from professional stock images. Unlike supervised prior work, GenCrop…

    Submitted 4 April, 2024; v1 submitted 19 December, 2023; originally announced December 2023.

    Comments: AAAI 24. Extended version with supplemental materials

  4. Iterative Motion Editing with Natural Language

    Authors: Purvi Goel, Kuan-Chieh Wang, C. Karen Liu, Kayvon Fatahalian

    Abstract: Text-to-motion diffusion models can generate realistic animations from text prompts, but do not support fine-grained motion editing controls. In this paper, we present a method for using natural language to iteratively specify local edits to existing character animations, a task that is common in most computer animation workflows. Our key idea is to represent a space of motion edits using a set of…

    Submitted 3 June, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

  5. arXiv:2303.00262

    cs.CV cs.GR cs.LG

    Collage Diffusion

    Authors: Vishnu Sarukkai, Linden Li, Arden Ma, Christopher Ré, Kayvon Fatahalian

    Abstract: We seek to give users precise control over diffusion-based image generation by modeling complex scenes as sequences of layers, which define the desired spatial arrangement and visual attributes of objects in the scene. Collage Diffusion harmonizes the input layers to make objects fit together -- the key challenge involves minimizing changes in the positions and key visual attributes of the input l…

    Submitted 31 August, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

  6. arXiv:2207.10213

    cs.CV

    Spotting Temporally Precise, Fine-Grained Events in Video

    Authors: James Hong, Haotian Zhang, Michaël Gharbi, Matthew Fisher, Kayvon Fatahalian

    Abstract: We introduce the task of spotting temporally precise, fine-grained events in video (detecting the precise moment in time events occur). Precise spotting requires models to reason globally about the full-time scale of actions and locally to identify subtle frame-to-frame appearance and motion differences that identify events during these actions. Surprisingly, we find that top performing solutions…

    Submitted 20 July, 2022; originally announced July 2022.

    Comments: ECCV 2022; Website URL: https://jhong93.github.io/projects/spot.html

  7. arXiv:2204.07596

    stat.ML cs.LG

    Perfectly Balanced: Improving Transfer and Robustness of Supervised Contrastive Learning

    Authors: Mayee F. Chen, Daniel Y. Fu, Avanika Narayan, Michael Zhang, Zhao Song, Kayvon Fatahalian, Christopher Ré

    Abstract: An ideal learned representation should display transferability and robustness. Supervised contrastive learning (SupCon) is a promising method for training accurate models, but produces representations that do not capture these properties due to class collapse -- when all points in a class map to the same representation. Recent work suggests that "spreading out" these representations improves them,…

    Submitted 13 July, 2022; v1 submitted 15 April, 2022; originally announced April 2022.

    Comments: ICML 2022 Camera Ready

  8. arXiv:2203.13270

    stat.ML cs.LG

    Shoring Up the Foundations: Fusing Model Embeddings and Weak Supervision

    Authors: Mayee F. Chen, Daniel Y. Fu, Dyah Adila, Michael Zhang, Frederic Sala, Kayvon Fatahalian, Christopher Ré

    Abstract: Foundation models offer an exciting new paradigm for constructing models with out-of-the-box embeddings and a few labeled examples. However, it is not clear how to best apply foundation models without labeled data. A potential approach is to fuse foundation models with weak supervision frameworks, which use weak label sources -- pre-trained models, heuristics, crowd-workers -- to construct pseudol…

    Submitted 1 August, 2022; v1 submitted 24 March, 2022; originally announced March 2022.

    Comments: UAI 2022 Camera Ready

  9. arXiv:2109.05720

    cs.CV cs.LG

    Low-Shot Validation: Active Importance Sampling for Estimating Classifier Performance on Rare Categories

    Authors: Fait Poms, Vishnu Sarukkai, Ravi Teja Mullapudi, Nimit S. Sohoni, William R. Mark, Deva Ramanan, Kayvon Fatahalian

    Abstract: For machine learning models trained with limited labeled training data, validation stands to become the main bottleneck to reducing overall annotation costs. We propose a statistical validation algorithm that accurately estimates the F-score of binary classifiers for rare categories, where finding relevant examples to evaluate on is particularly challenging. Our key insight is that simultaneous ca…

    Submitted 13 September, 2021; originally announced September 2021.

    Comments: Accepted to ICCV 2021; 12 pages, 12 figures

  10. arXiv:2109.01305

    cs.CV

    Video Pose Distillation for Few-Shot, Fine-Grained Sports Action Recognition

    Authors: James Hong, Matthew Fisher, Michaël Gharbi, Kayvon Fatahalian

    Abstract: Human pose is a useful feature for fine-grained sports action understanding. However, pose estimators are often unreliable when run on sports video due to domain shift and factors such as motion blur and occlusions. This leads to poor accuracy when downstream tasks, such as action recognition, depend on pose. End-to-end learning circumvents pose, but requires more labels to generalize. We introd…

    Submitted 3 September, 2021; originally announced September 2021.

    Comments: ICCV 2021 (poster)

  11. arXiv:2107.00643

    cs.LG

    Mandoline: Model Evaluation under Distribution Shift

    Authors: Mayee Chen, Karan Goel, Nimit S. Sohoni, Fait Poms, Kayvon Fatahalian, Christopher Ré

    Abstract: Machine learning models are often deployed in different settings than they were trained and validated on, posing a challenge to practitioners who wish to predict how well the deployed model will perform on a target distribution. If an unlabeled sample from the target distribution is available, along with a labeled sample from a possibly different source distribution, standard approaches such as im…

    Submitted 10 April, 2022; v1 submitted 1 July, 2021; originally announced July 2021.

    Comments: 33 pages. Published as a conference paper at ICML 2021

  12. arXiv:2103.07013

    cs.LG cs.AI cs.CV cs.GR

    Large Batch Simulation for Deep Reinforcement Learning

    Authors: Brennan Shacklett, Erik Wijmans, Aleksei Petrenko, Manolis Savva, Dhruv Batra, Vladlen Koltun, Kayvon Fatahalian

    Abstract: We accelerate deep reinforcement learning-based training in visually complex 3D environments by two orders of magnitude over prior work, realizing end-to-end training speeds of over 19,000 frames of experience per second on a single GPU and up to 72,000 frames per second on a single eight-GPU machine. The key idea of our approach is to design a 3D renderer and embodied navigation simulator around…

    Submitted 11 March, 2021; originally announced March 2021.

    Comments: Published as a conference paper at ICLR 2021

  13. arXiv:2011.10688

    cs.CV cs.GR cs.MM

    Iterative Text-based Editing of Talking-heads Using Neural Retargeting

    Authors: Xinwei Yao, Ohad Fried, Kayvon Fatahalian, Maneesh Agrawala

    Abstract: We present a text-based tool for editing talking-head video that enables an iterative editing workflow. On each iteration users can edit the wording of the speech, further refine mouth motions if necessary to reduce artifacts and manipulate non-verbal aspects of the performance by inserting mouth gestures (e.g. a smile) or changing the overall performance style (e.g. energetic, mumble). Our tool r…

    Submitted 20 November, 2020; originally announced November 2020.

    Comments: Project Website is https://davidyao.me/projects/text2vid

  14. arXiv:2008.12873

    cs.CV cs.LG

    Background Splitting: Finding Rare Classes in a Sea of Background

    Authors: Ravi Teja Mullapudi, Fait Poms, William R. Mark, Deva Ramanan, Kayvon Fatahalian

    Abstract: We focus on the real-world problem of training accurate deep models for image classification of a small number of rare categories. In these scenarios, almost all images belong to the background category in the dataset (>95% of the dataset is background). We demonstrate that both standard fine-tuning approaches and state-of-the-art approaches for training on imbalanced datasets do not produce accur…

    Submitted 28 August, 2020; originally announced August 2020.

  15. arXiv:2008.06007

    cs.CY cs.MM

    Analyzing Who and What Appears in a Decade of US Cable TV News

    Authors: James Hong, Will Crichton, Haotian Zhang, Daniel Y. Fu, Jacob Ritchie, Jeremy Barenholtz, Ben Hannel, Xinwei Yao, Michaela Murray, Geraldine Moriba, Maneesh Agrawala, Kayvon Fatahalian

    Abstract: Cable TV news reaches millions of U.S. households each day, meaning that decisions about who appears on the news and what stories get covered can profoundly influence public opinion and discourse. We analyze a data set of nearly 24/7 video, audio, and text captions from three U.S. cable TV networks (CNN, FOX, and MSNBC) from January 2010 to July 2019. Using machine learning tools, we detect faces…

    Submitted 24 January, 2022; v1 submitted 13 August, 2020; originally announced August 2020.

    Comments: Published in KDD 2021 as "Analysis of Faces in a Decade of US Cable TV News". ArXiv draft: 14 pages, 22 figures (15 pages, 16 figures in supplemental materials)

  16. arXiv:2008.04524

    cs.GR

    Vid2Player: Controllable Video Sprites that Behave and Appear like Professional Tennis Players

    Authors: Haotian Zhang, Cristobal Sciutto, Maneesh Agrawala, Kayvon Fatahalian

    Abstract: We present a system that converts annotated broadcast video of tennis matches into interactively controllable video sprites that behave and appear like professional tennis players. Our approach is based on controllable video textures, and utilizes domain knowledge of the cyclic structure of tennis rallies to place clip transitions and accept control inputs at key decision-making moments of point p…

    Submitted 21 December, 2020; v1 submitted 11 August, 2020; originally announced August 2020.

    Comments: 16 pages, Latex; added player shadows in Sec 8.1; added visual quality evaluation in Sec 9.3; website: https://cs.stanford.edu/~haotianz/research/vid2player/

  17. arXiv:2006.15168

    stat.ML cs.LG

    Train and You'll Miss It: Interactive Model Iteration with Weak Supervision and Pre-Trained Embeddings

    Authors: Mayee F. Chen, Daniel Y. Fu, Frederic Sala, Sen Wu, Ravi Teja Mullapudi, Fait Poms, Kayvon Fatahalian, Christopher Ré

    Abstract: Our goal is to enable machine learning systems to be trained interactively. This requires models that perform well and train quickly, without large amounts of hand-labeled data. We take a step forward in this direction by borrowing from weak supervision (WS), wherein models can be trained with noisy sources of signal instead of hand-labeled data. But WS relies on training downstream deep networks…

    Submitted 26 June, 2020; originally announced June 2020.

  18. arXiv:2002.11955

    stat.ML cs.LG

    Fast and Three-rious: Speeding Up Weak Supervision with Triplet Methods

    Authors: Daniel Y. Fu, Mayee F. Chen, Frederic Sala, Sarah M. Hooper, Kayvon Fatahalian, Christopher Ré

    Abstract: Weak supervision is a popular method for building machine learning models without relying on ground truth annotations. Instead, it generates probabilistic training labels by estimating the accuracies of multiple noisy labeling sources (e.g., heuristics, crowd workers). Existing approaches use latent variable estimation to model the noisy sources, but these methods can be computationally expensive,…

    Submitted 15 July, 2020; v1 submitted 27 February, 2020; originally announced February 2020.

  19. arXiv:1910.09505

    stat.ML cs.CV cs.LG

    Multi-Resolution Weak Supervision for Sequential Data

    Authors: Frederic Sala, Paroma Varma, Jason Fries, Daniel Y. Fu, Shiori Sagawa, Saelig Khattar, Ashwini Ramamoorthy, Ke Xiao, Kayvon Fatahalian, James Priest, Christopher Ré

    Abstract: Since manually labeling training data is slow and expensive, recent industrial and scientific research efforts have turned to weaker or noisier forms of supervision sources. However, existing weak supervision approaches fail to model multi-resolution sources for sequential data, like video, that can assign labels to individual elements or collections of elements in a sequence. A key challenge in w…

    Submitted 21 October, 2019; originally announced October 2019.

    Comments: NeurIPS 2019 (Conference on Neural Information Processing Systems)

  20. arXiv:1910.02993

    cs.DB cs.CL cs.CV cs.IR

    Rekall: Specifying Video Events using Compositions of Spatiotemporal Labels

    Authors: Daniel Y. Fu, Will Crichton, James Hong, Xinwei Yao, Haotian Zhang, Anh Truong, Avanika Narayan, Maneesh Agrawala, Christopher Ré, Kayvon Fatahalian

    Abstract: Many real-world video analysis applications require the ability to identify domain-specific events in video, such as interviews and commercials in TV news broadcasts, or action sequences in film. Unfortunately, pre-trained models to detect all the events of interest in video may not exist, and training new models from scratch can be costly and labor-intensive. In this paper, we explore the utility…

    Submitted 7 October, 2019; originally announced October 2019.

  21. arXiv:1812.02699

    cs.CV

    Online Model Distillation for Efficient Video Inference

    Authors: Ravi Teja Mullapudi, Steven Chen, Keyi Zhang, Deva Ramanan, Kayvon Fatahalian

    Abstract: High-quality computer vision models typically address the problem of understanding the general distribution of real-world images. However, most cameras observe only a very small fraction of this distribution. This offers the possibility of achieving more efficient inference by specializing compact, low-cost models to the specific distribution of frames observed by a single camera. In this paper, w…

    Submitted 27 January, 2020; v1 submitted 6 December, 2018; originally announced December 2018.

    Journal ref: ICCV 2019

  22. arXiv:1805.07339

    cs.CV cs.DC cs.GR

    Scanner: Efficient Video Analysis at Scale

    Authors: Alex Poms, Will Crichton, Pat Hanrahan, Kayvon Fatahalian

    Abstract: A growing number of visual computing applications depend on the analysis of large video collections. The challenge is that scaling applications to operate on these datasets requires efficient systems for pixel data access and parallel processing across large numbers of machines. Few programmers have the capability to operate efficiently at these scales, limiting the field's ability to explore new…

    Submitted 18 May, 2018; originally announced May 2018.

    Comments: 14 pages, 14 figures