Showing 1–12 of 12 results for author: You, A

Searching in archive cs.
  1. arXiv:2404.01476

    cs.CV cs.AI cs.CL cs.LG

    TraveLER: A Multi-LMM Agent Framework for Video Question-Answering

    Authors: Chuyi Shang, Amos You, Sanjay Subramanian, Trevor Darrell, Roei Herzig

    Abstract: Recently, Large Multimodal Models (LMMs) have made significant progress in video question-answering using a frame-wise approach by leveraging large-scale, image-based pretraining in a zero-shot manner. While image-based methods for videos have shown impressive performance, a current limitation is that they often overlook how key timestamps are selected and cannot adjust when incorrect timestamps a…

    Submitted 1 April, 2024; originally announced April 2024.

  2. arXiv:2309.11580

    cs.RO

    A real-time, hardware agnostic framework for close-up branch reconstruction using RGB data

    Authors: Alexander You, Aarushi Mehta, Luke Strohbehn, Jochen Hemming, Cindy Grimm, Joseph R. Davidson

    Abstract: Creating accurate 3D models of tree topology is an important task for tree pruning. The 3D model is used to decide which branches to prune and then to execute the pruning cuts. Previous methods for creating 3D tree models have typically relied on point clouds, which are often computationally expensive to process and can suffer from data defects, especially with thin branches. In this paper, we pro…

    Submitted 18 June, 2024; v1 submitted 20 September, 2023; originally announced September 2023.

  3. arXiv:2206.07201

    cs.RO

    An autonomous robot for pruning modern, planar fruit trees

    Authors: Alexander You, Nidhi Parayil, Josyula Gopala Krishna, Uddhav Bhattarai, Ranjan Sapkota, Dawood Ahmed, Matthew Whiting, Manoj Karkee, Cindy M. Grimm, Joseph R. Davidson

    Abstract: Dormant pruning of fruit trees is an important task for maintaining tree health and ensuring high-quality fruit. Due to decreasing labor availability, pruning is a prime candidate for robotic automation. However, pruning also represents a uniquely difficult problem for robots, requiring robust systems for perception, pruning point determination, and manipulation that must operate under variable li…

    Submitted 14 June, 2022; originally announced June 2022.

  4. arXiv:2202.13050

    cs.CV

    Optical flow-based branch segmentation for complex orchard environments

    Authors: Alexander You, Cindy Grimm, Joseph R. Davidson

    Abstract: Machine vision is a critical subsystem for enabling robots to be able to perform a variety of tasks in orchard environments. However, orchards are highly visually complex environments, and computer vision algorithms operating in them must be able to contend with variable lighting conditions and background noise. Past work on enabling deep learning algorithms to operate in these environments has ty…

    Submitted 25 February, 2022; originally announced February 2022.

  5. arXiv:2109.13162

    cs.RO

    Precision fruit tree pruning using a learned hybrid vision/interaction controller

    Authors: Alexander You, Hannah Kolano, Nidhi Parayil, Cindy Grimm, Joseph R. Davidson

    Abstract: Robotic tree pruning requires highly precise manipulator control in order to accurately align a cutting implement with the desired pruning point at the correct angle. Simultaneously, the robot must avoid applying excessive force to rigid parts of the environment such as trees, support posts, and wires. In this paper, we propose a hybrid control system that uses a learned vision-based controller to…

    Submitted 27 September, 2021; originally announced September 2021.

    Comments: Submitted for consideration for the 2022 IEEE International Conference on Robotics and Automation (ICRA)

  6. Towards Controllable and Photorealistic Region-wise Image Manipulation

    Authors: Ansheng You, Chenglin Zhou, Qixuan Zhang, Lan Xu

    Abstract: Adaptive and flexible image editing is a desirable function of modern generative models. In this work, we present a generative model with auto-encoder architecture for per-region style manipulation. We apply a code consistency loss to enforce an explicit disentanglement between content and style latent representations, making the content and style of generated samples consistent with their corresp…

    Submitted 19 August, 2021; originally announced August 2021.

    Journal ref: ACMMM 2021

  7. arXiv:2103.02833

    cs.RO

    Semantics-guided Skeletonization of Sweet Cherry Trees for Robotic Pruning

    Authors: Alexander You, Cindy Grimm, Abhisesh Silwal, Joseph R. Davidson

    Abstract: Dormant pruning for fresh market fruit trees is a relatively unexplored application of agricultural robotics for which few end-to-end systems exist. One of the biggest challenges in creating an autonomous pruning system is the need to reconstruct a model of a tree which is accurate and informative enough to be useful for deciding where to cut. One useful structure for modeling a tree is a skeleton…

    Submitted 3 March, 2021; originally announced March 2021.

  8. Towards Efficient Scene Understanding via Squeeze Reasoning

    Authors: Xiangtai Li, Xia Li, Ansheng You, Li Zhang, Guangliang Cheng, Kuiyuan Yang, Yunhai Tong, Zhouchen Lin

    Abstract: Graph-based convolutional model such as non-local block has shown to be effective for strengthening the context modeling ability in convolutional neural networks (CNNs). However, its pixel-wise computational overhead is prohibitive which renders it unsuitable for high resolution imagery. In this paper, we explore the efficiency of context graph reasoning and propose a novel framework called Squeez…

    Submitted 20 July, 2021; v1 submitted 6 November, 2020; originally announced November 2020.

    Comments: Accepted by IEEE-TIP

  9. arXiv:2003.12697

    cs.CV

    Semantically Multi-modal Image Synthesis

    Authors: Zhen Zhu, Zhiliang Xu, Ansheng You, Xiang Bai

    Abstract: In this paper, we focus on semantically multi-modal image synthesis (SMIS) task, namely, generating multi-modal images at the semantic level. Previous work seeks to use multiple class-specific generators, constraining its usage in datasets with a small number of classes. We instead propose a novel Group Decreasing Network (GroupDNet) that leverages group convolutions in the generator and progressi…

    Submitted 2 April, 2020; v1 submitted 28 March, 2020; originally announced March 2020.

    Comments: To appear in CVPR 2020

  10. arXiv:2002.10120

    cs.CV cs.RO

    Semantic Flow for Fast and Accurate Scene Parsing

    Authors: Xiangtai Li, Ansheng You, Zhen Zhu, Houlong Zhao, Maoke Yang, Kuiyuan Yang, Yunhai Tong

    Abstract: In this paper, we focus on designing effective method for fast and accurate scene parsing. A common practice to improve the performance is to attain high resolution feature maps with strong semantic representation. Two strategies are widely used -- atrous convolutions and feature pyramid fusion, are either computation intensive or ineffective. Inspired by the Optical Flow for motion alignment betw…

    Submitted 29 March, 2021; v1 submitted 24 February, 2020; originally announced February 2020.

    Comments: accepted by ECCV 2020 (oral)

  11. arXiv:1909.07229

    cs.CV

    Global Aggregation then Local Distribution in Fully Convolutional Networks

    Authors: Xiangtai Li, Li Zhang, Ansheng You, Maoke Yang, Kuiyuan Yang, Yunhai Tong

    Abstract: It has been widely proven that modelling long-range dependencies in fully convolutional networks (FCNs) via global aggregation modules is critical for complex scene understanding tasks such as semantic segmentation and object detection. However, global aggregation is often dominated by features of large patterns and tends to oversmooth regions that contain small patterns (e.g., boundaries and smal…

    Submitted 16 September, 2019; originally announced September 2019.

    Comments: accepted at BMVC 2019

  12. arXiv:1907.11830

    cs.CV

    Reprojection R-CNN: A Fast and Accurate Object Detector for 360° Images

    Authors: Pengyu Zhao, Ansheng You, Yuanxing Zhang, Jiaying Liu, Kaigui Bian, Yunhai Tong

    Abstract: 360° images are usually represented in either equirectangular projection (ERP) or multiple perspective projections. Different from the flat 2D images, the detection task is challenging for 360° images due to the distortion of ERP and the inefficiency of perspective projections. However, existing methods mostly focus on one of the above representations instead of both, leading to limited detection…

    Submitted 26 July, 2019; originally announced July 2019.

    Comments: 10 pages, 7 figures