Showing 1–19 of 19 results for author: A, Y

  1. arXiv:2408.07410  [pdf, other]

    cs.CL

    Aquila2 Technical Report

    Authors: Bo-Wen Zhang, Liangdong Wang, Jijie Li, Shuhao Gu, Xinya Wu, Zhengduo Zhang, Boyan Gao, Yulong Ao, Guang Liu

    Abstract: This paper introduces the Aquila2 series, which comprises a wide range of bilingual models with parameter sizes of 7, 34, and 70 billion. These models are trained based on an innovative framework named HeuriMentor (HM), which offers real-time insights into model convergence and enhances the training process and data management. The HM System, comprising the Adaptive Training Engine (ATE), Training…

    Submitted 14 August, 2024; originally announced August 2024.

  2. arXiv:2408.06567  [pdf, other]

    cs.CL cs.AI

    AquilaMoE: Efficient Training for MoE Models with Scale-Up and Scale-Out Strategies

    Authors: Bo-Wen Zhang, Liangdong Wang, Ye Yuan, Jijie Li, Shuhao Gu, Mengdi Zhao, Xinya Wu, Guang Liu, Chengwei Wu, Hanyu Zhao, Li Du, Yiming Ju, Quanyue Ma, Yulong Ao, Yingli Zhao, Songhe Zhu, Zhou Cao, Dong Liang, Yonghua Lin, Ming Zhang, Shunfei Wang, Yanxin Zhou, Min Ye, Xuekai Chen, Xinyang Yu, et al. (2 additional authors not shown)

    Abstract: In recent years, with the rapid application of large language models across various fields, the scale of these models has gradually increased, and the resources required for their pre-training have grown exponentially. Training an LLM from scratch will cost a lot of computation resources while scaling up from a smaller model is a more efficient approach and has thus attracted significant attention…

    Submitted 12 August, 2024; originally announced August 2024.

  3. arXiv:2405.05648  [pdf, other]

    cs.RO cs.CV

    ASGrasp: Generalizable Transparent Object Reconstruction and Grasping from RGB-D Active Stereo Camera

    Authors: Jun Shi, Yong A, Yixiang Jin, Dingzhe Li, Haoyu Niu, Zhezhu Jin, He Wang

    Abstract: In this paper, we tackle the problem of grasping transparent and specular objects. This issue holds importance, yet it remains unsolved within the field of robotics due to failure of recover their accurate geometry by depth cameras. For the first time, we propose ASGrasp, a 6-DoF grasp detection network that uses an RGB-D active stereo camera. ASGrasp utilizes a two-layer learning-based stereo net…

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: IEEE International Conference on Robotics and Automation (ICRA), 2024

  4. arXiv:2404.19130  [pdf, other]

    cs.IR cs.AI cs.LG

    SpherE: Expressive and Interpretable Knowledge Graph Embedding for Set Retrieval

    Authors: Zihao Li, Yuyi Ao, Jingrui He

    Abstract: Knowledge graphs (KGs), which store an extensive number of relational facts (head, relation, tail), serve various applications. While many downstream tasks highly rely on the expressive modeling and predictive embedding of KGs, most of the current KG representation learning methods, where each entity is embedded as a vector in the Euclidean space and each relation is embedded as a transformation,…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: Accepted by SIGIR 2024, Camera Ready Version

  5. arXiv:2404.18201  [pdf, other]

    cs.RO

    What Foundation Models can Bring for Robot Learning in Manipulation : A Survey

    Authors: Dingzhe Li, Yixiang Jin, Yong A, Hongze Yu, Jun Shi, Xiaoshuai Hao, Peng Hao, Huaping Liu, Fuchun Sun, Jianwei Zhang, Bin Fang

    Abstract: The realization of universal robots is an ultimate goal of researchers. However, a key hurdle in achieving this goal lies in the robots' ability to manipulate objects in their unstructured surrounding environments according to different tasks. The learning-based approach is considered an effective way to address generalization. The impressive performance of foundation models in the fields of compu…

    Submitted 9 August, 2024; v1 submitted 28 April, 2024; originally announced April 2024.

  6. arXiv:2403.00274  [pdf, other]

    cs.CV cs.SD eess.AS

    CustomListener: Text-guided Responsive Interaction for User-friendly Listening Head Generation

    Authors: Xi Liu, Ying Guo, Cheng Zhen, Tong Li, Yingying Ao, Pengfei Yan

    Abstract: Listening head generation aims to synthesize a non-verbal responsive listener head by modeling the correlation between the speaker and the listener in dynamic conversion. The applications of listener agent generation in virtual interaction have promoted many works achieving the diverse and fine-grained motion generation. However, they can only manipulate motions through simple emotional labels, but…

    Submitted 29 March, 2024; v1 submitted 29 February, 2024; originally announced March 2024.

    Comments: Accepted by CVPR 2024

  7. arXiv:2312.01421  [pdf, other]

    cs.RO

    RobotGPT: Robot Manipulation Learning from ChatGPT

    Authors: Yixiang Jin, Dingzhe Li, Yong A, Jun Shi, Peng Hao, Fuchun Sun, Jianwei Zhang, Bin Fang

    Abstract: We present RobotGPT, an innovative decision framework for robotic manipulation that prioritizes stability and safety. The execution code generated by ChatGPT cannot guarantee the stability and safety of the system. ChatGPT may provide different answers for the same task, leading to unpredictability. This instability prevents the direct integration of ChatGPT into the robot manipulation loop. Altho…

    Submitted 3 December, 2023; originally announced December 2023.

  8. arXiv:2305.05354  [pdf, other]

    cs.RO cs.AI

    Safe Deep RL for Intraoperative Planning of Pedicle Screw Placement

    Authors: Yunke Ao, Hooman Esfandiari, Fabio Carrillo, Yarden As, Mazda Farshad, Benjamin F. Grewe, Andreas Krause, Philipp Fuernstahl

    Abstract: Spinal fusion surgery requires highly accurate implantation of pedicle screw implants, which must be conducted in critical proximity to vital structures with a limited view of anatomy. Robotic surgery systems have been proposed to improve placement accuracy, however, state-of-the-art systems suffer from the limitations of open-loop approaches, as they follow traditional concepts of preoperative pl…

    Submitted 10 May, 2023; v1 submitted 9 May, 2023; originally announced May 2023.

    Comments: 10 pages, 4 figures

  9. arXiv:2210.11153  [pdf, other]

    eess.IV cs.CV

    Reversed Image Signal Processing and RAW Reconstruction. AIM 2022 Challenge Report

    Authors: Marcos V. Conde, Radu Timofte, Yibin Huang, Jingyang Peng, Chang Chen, Cheng Li, Eduardo Pérez-Pellitero, Fenglong Song, Furui Bai, Shuai Liu, Chaoyu Feng, Xiaotao Wang, Lei Lei, Yu Zhu, Chenghua Li, Yingying Jiang, Yong A, Peisong Wang, Cong Leng, Jian Cheng, Xiaoyu Liu, Zhicun Yin, Zhilu Zhang, Junyi Li, Ming Liu, et al. (18 additional authors not shown)

    Abstract: Cameras capture sensor RAW images and transform them into pleasant RGB images, suitable for the human eyes, using their integrated Image Signal Processor (ISP). Numerous low-level vision tasks operate in the RAW domain (e.g. image denoising, white balance) due to its linear relationship with the scene irradiance, wide-range of information at 12bits, and sensor designs. Despite this, RAW image data…

    Submitted 20 October, 2022; originally announced October 2022.

    Comments: ECCV 2022 Advances in Image Manipulation (AIM) workshop

  10. arXiv:2207.13505  [pdf, other]

    cs.CV

    Multi-Forgery Detection Challenge 2022: Push the Frontier of Unconstrained and Diverse Forgery Detection

    Authors: Jianshu Li, Man Luo, Jian Liu, Tao Chen, Chengjie Wang, Ziwei Liu, Shuo Liu, Kewei Yang, Xuning Shao, Kang Chen, Boyuan Liu, Mingyu Guo, Ying Guo, Yingying Ao, Pengfei Gao

    Abstract: In this paper, we present the Multi-Forgery Detection Challenge held concurrently with the IEEE Computer Society Workshop on Biometrics at CVPR 2022. Our Multi-Forgery Detection Challenge aims to detect automatic image manipulations including but not limited to image editing, image synthesis, image generation, image photoshop, etc. Our challenge has attracted 674 teams from all over the world, wit…

    Submitted 27 July, 2022; originally announced July 2022.

    Comments: Workshop and challenge summary paper, containing technical reports from different teams

  11. arXiv:2207.07268  [pdf, other]

    cs.CV

    Lightweight Vision Transformer with Cross Feature Attention

    Authors: Youpeng Zhao, Huadong Tang, Yingying Jiang, Yong A, Qiang Wu

    Abstract: Recent advances in vision transformers (ViTs) have achieved great performance in visual recognition tasks. Convolutional neural networks (CNNs) exploit spatial inductive bias to learn visual representations, but these networks are spatially local. ViTs can learn global representations with their self-attention mechanism, but they are usually heavy-weight and unsuitable for mobile devices. In this…

    Submitted 5 July, 2023; v1 submitted 14 July, 2022; originally announced July 2022.

    Comments: Technical Report. A shorter version has been accepted to ICIP 2023

  12. arXiv:2203.07068  [pdf]

    cs.LG

    A New Learning Paradigm for Stochastic Configuration Network: SCN+

    Authors: Yanshuang Ao, Xinyu Zhou, Wei Dai

    Abstract: Learning using privileged information (LUPI) paradigm, which pioneered teacher-student interaction mechanism, makes the learning models use additional information in training stage. This paper is the first to propose an incremental learning algorithm with LUPI paradigm for stochastic configuration network (SCN), named SCN+. This novel algorithm can leverage privileged information into SCN in the t…

    Submitted 11 March, 2022; originally announced March 2022.

  13. arXiv:2203.02086  [pdf, other]

    cs.LG cs.CV cs.NE

    WPNAS: Neural Architecture Search by jointly using Weight Sharing and Predictor

    Authors: Ke Lin, Yong A, Zhuoxin Gan, Yingying Jiang

    Abstract: Weight sharing based and predictor based methods are two major types of fast neural architecture search methods. In this paper, we propose to jointly use weight sharing and predictor in a unified framework. First, we construct a SuperNet in a weight-sharing way and probabilisticly sample architectures from the SuperNet. To increase the correctness of the evaluation of architectures, besides direct…

    Submitted 3 March, 2022; originally announced March 2022.

  14. arXiv:2112.02752  [pdf, other]

    cs.DC cs.AI cs.LG

    End-to-end Adaptive Distributed Training on PaddlePaddle

    Authors: Yulong Ao, Zhihua Wu, Dianhai Yu, Weibao Gong, Zhiqing Kui, Minxu Zhang, Zilingfeng Ye, Liang Shen, Yanjun Ma, Tian Wu, Haifeng Wang, Wei Zeng, Chao Yang

    Abstract: Distributed training has become a pervasive and effective approach for training a large neural network (NN) model with processing massive data. However, it is very challenging to satisfy requirements from various NN models, diverse computing resources, and their dynamic changes during a training job. In this study, we design our distributed training framework in a systematic end-to-end view to pro…

    Submitted 5 December, 2021; originally announced December 2021.

    Comments: 16 pages, 10 figures, 4 tables

  15. arXiv:2111.00659  [pdf, other]

    cs.CV

    Feature Aggregation and Refinement Network for 2D AnatomicalLandmark Detection

    Authors: Yueyuan Ao, Hong Wu

    Abstract: Localization of anatomical landmarks is essential for clinical diagnosis, treatment planning, and research. In this paper, we propose a novel deep network, named feature aggregation and refinement network (FARNet), for the automatic detection of anatomical landmarks. To alleviate the problem of limited training data in the medical domain, our network adopts a deep network pre-trained on natural im…

    Submitted 31 October, 2021; originally announced November 2021.

  16. arXiv:2110.00791  [pdf, other]

    cs.CV cs.AI

    Optimizing Neural Network for Computer Vision task in Edge Device

    Authors: Ranjith M S, S Parameshwara, Pavan Yadav A, Shriganesh Hegde

    Abstract: The field of computer vision has grown very rapidly in the past few years due to networks like convolution neural networks and their variants. The memory required to store the model and computational expense are very high for such a network limiting it to deploy on the edge device. Many times, applications rely on the cloud but that makes it hard for working in real-time due to round-trip delays.…

    Submitted 2 October, 2021; originally announced October 2021.

  17. arXiv:2109.14974  [pdf, other]

    cs.RO cs.LG

    Unified Data Collection for Visual-Inertial Calibration via Deep Reinforcement Learning

    Authors: Yunke Ao, Le Chen, Florian Tschopp, Michel Breyer, Andrei Cramariuc, Roland Siegwart

    Abstract: Visual-inertial sensors have a wide range of applications in robotics. However, good performance often requires different sophisticated motion routines to accurately calibrate camera intrinsics and inter-sensor extrinsics. This work presents a novel formulation to learn a motion policy to be executed on a robot arm for automatic data collection for calibrating intrinsics and extrinsics jointly. Ou…

    Submitted 30 September, 2021; originally announced September 2021.

  18. arXiv:2011.02574  [pdf, other]

    cs.RO cs.LG

    Learning Trajectories for Visual-Inertial System Calibration via Model-based Heuristic Deep Reinforcement Learning

    Authors: Le Chen, Yunke Ao, Florian Tschopp, Andrei Cramariuc, Michel Breyer, Jen Jen Chung, Roland Siegwart, Cesar Cadena

    Abstract: Visual-inertial systems rely on precise calibrations of both camera intrinsics and inter-sensor extrinsics, which typically require manually performing complex motions in front of a calibration target. In this work we present a novel approach to obtain favorable trajectories for visual-inertial system calibration, using model-based deep reinforcement learning. Our key contribution is to model the…

    Submitted 4 November, 2020; originally announced November 2020.

    Journal ref: Proceedings of the 4th Conference on Robot Learning (CoRL) 2020

  19. arXiv:2006.16767  [pdf, other]

    cs.DC cs.MS

    Adaptive SpMV/SpMSpV on GPUs for Input Vectors of Varied Sparsity

    Authors: Min Li, Yulong Ao, Chao Yang

    Abstract: Despite numerous efforts for optimizing the performance of Sparse Matrix and Vector Multiplication (SpMV) on modern hardware architectures, few works are done to its sparse counterpart, Sparse Matrix and Sparse Vector Multiplication (SpMSpV), not to mention dealing with input vectors of varied sparsity. The key challenge is that depending on the sparsity levels, distribution of data, and compute p…

    Submitted 17 December, 2020; v1 submitted 30 June, 2020; originally announced June 2020.

    Comments: 12 pages