
Showing 1–28 of 28 results for author: Bin, Y

  1. arXiv:2408.04388  [pdf, other]

    cs.MM cs.AI cs.IR

    MM-Forecast: A Multimodal Approach to Temporal Event Forecasting with Large Language Models

    Authors: Haoxuan Li, Zhengmao Yang, Yunshan Ma, Yi Bin, Yang Yang, Tat-Seng Chua

    Abstract: We study an emerging and intriguing problem of multimodal temporal event forecasting with large language models. Compared to using text or graph modalities, the investigation of utilizing images for temporal event forecasting has not been fully explored, especially in the era of large language models (LLMs). To bridge this gap, we are particularly interested in two key questions: 1) why images…

    Submitted 8 August, 2024; originally announced August 2024.

    ACM Class: H.3.3

  2. arXiv:2408.00491  [pdf, other]

    cs.CL cs.CV cs.MM

    GalleryGPT: Analyzing Paintings with Large Multimodal Models

    Authors: Yi Bin, Wenhao Shi, Yujuan Ding, Zhiqiang Hu, Zheng Wang, Yang Yang, See-Kiong Ng, Heng Tao Shen

    Abstract: Artwork analysis is an important and fundamental skill for art appreciation, which could enrich personal aesthetic sensibility and foster critical thinking. Understanding artworks is challenging due to their subjective nature, diverse interpretations, and complex visual elements, requiring expertise in art history, cultural background, and aesthetic theory. However, limited by the data…

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: Accepted as Oral Presentation at ACM Multimedia 2024

  3. Leveraging Weak Cross-Modal Guidance for Coherence Modelling via Iterative Learning

    Authors: Yi Bin, Junrong Liao, Yujuan Ding, Haoxuan Li, Yang Yang, See-Kiong Ng, Heng Tao Shen

    Abstract: Cross-modal coherence modeling is essential for intelligent systems to help them organize and structure information, thereby understanding and creating content of the physical world coherently like human beings. Previous work on cross-modal coherence modeling attempted to leverage the order information from another modality to assist coherence recovery in the target modality. Despite the…

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: Accepted by ACM Multimedia 2024

  4. arXiv:2407.12339  [pdf, other]

    cs.CV

    Exploring Deeper! Segment Anything Model with Depth Perception for Camouflaged Object Detection

    Authors: Zhenni Yu, Xiaoqin Zhang, Li Zhao, Yi Bin, Guobao Xiao

    Abstract: This paper introduces a new Segment Anything Model with Depth Perception (DSAM) for Camouflaged Object Detection (COD). DSAM exploits the zero-shot capability of SAM to realize precise segmentation in the RGB-D domain. It consists of the Prompt-Deeper Module and the Finer Module. The Prompt-Deeper Module utilizes knowledge distillation and the Bias Correction Module to achieve the interaction betw…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: ACM MM 2024

  5. arXiv:2407.03788  [pdf, other]

    cs.CV cs.CL

    Meta-optimized Angular Margin Contrastive Framework for Video-Language Representation Learning

    Authors: Thong Nguyen, Yi Bin, Xiaobao Wu, Xinshuai Dong, Zhiyuan Hu, Khoi Le, Cong-Duy Nguyen, See-Kiong Ng, Luu Anh Tuan

    Abstract: Data quality stands at the forefront of deciding the effectiveness of video-language representation learning. However, video-text pairs in previous data typically do not align perfectly with each other, which might lead to video-language representations that do not accurately reflect cross-modal semantics. Moreover, previous data also possess an uneven distribution of concepts, thereby hampering t…

    Submitted 19 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  6. arXiv:2406.17294  [pdf, other]

    cs.CL

    Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models

    Authors: Wenhao Shi, Zhiqiang Hu, Yi Bin, Junhua Liu, Yang Yang, See-Kiong Ng, Lidong Bing, Roy Ka-Wei Lee

    Abstract: Large language models (LLMs) have demonstrated impressive reasoning capabilities, particularly in textual mathematical problem-solving. However, existing open-source image instruction fine-tuning datasets, containing limited question-answer pairs per image, do not fully exploit visual information to enhance the multimodal mathematical reasoning capabilities of Multimodal LLMs (MLLMs). To bridge th…

    Submitted 26 June, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

    Comments: 8 pages

  7. arXiv:2406.05615  [pdf, other]

    cs.CL

    Video-Language Understanding: A Survey from Model Architecture, Model Training, and Data Perspectives

    Authors: Thong Nguyen, Yi Bin, Junbin Xiao, Leigang Qu, Yicong Li, Jay Zhangjie Wu, Cong-Duy Nguyen, See-Kiong Ng, Luu Anh Tuan

    Abstract: Humans use multiple senses to comprehend the environment. Vision and language are two of the most vital senses since they allow us to easily communicate our thoughts and perceive the world around us. There has been a lot of interest in creating video-language understanding systems with human-like senses since a video-language pair can mimic both our linguistic medium and visual environment with te…

    Submitted 1 July, 2024; v1 submitted 8 June, 2024; originally announced June 2024.

    Comments: Accepted at ACL 2024 (Findings)

  8. arXiv:2404.05705  [pdf, other]

    cs.CV

    Learning 3D-Aware GANs from Unposed Images with Template Feature Field

    Authors: Xinya Chen, Hanlei Guo, Yanrui Bin, Shangzhan Zhang, Yuanbo Yang, Yue Wang, Yujun Shen, Yiyi Liao

    Abstract: Collecting accurate camera poses of training images has been shown to serve the learning of 3D-aware generative adversarial networks (GANs) well, yet can be quite expensive in practice. This work targets learning 3D-aware GANs from unposed images, for which we propose to perform on-the-fly pose estimation of training images with a learned template feature field (TeFF). Concretely, in addition to a…

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: https://XDimlab.github.io/TeFF

  9. arXiv:2311.03133  [pdf, other]

    physics.flu-dyn

    Incorporating basic calibrations in existing machine-learned turbulence modeling

    Authors: Jiaqi J. L. Li, Yuanwei Bin, George P. Huang, Xiang I. A. Yang

    Abstract: This work aims to incorporate basic calibrations of Reynolds-averaged Navier-Stokes (RANS) models as part of machine learning (ML) frameworks. The ML frameworks considered are tensor-basis neural network (TBNN), physics-informed machine learning (PIML), and field inversion & machine learning (FIML) in J. Fluid Mech., 2016, 807, 155-166, Phys. Rev. Fluids, 2017, 2(3), 034603 and J. Comp. Phys., 201…

    Submitted 14 November, 2023; v1 submitted 6 November, 2023; originally announced November 2023.

  10. arXiv:2311.01807  [pdf, other]

    cs.SI

    Cross-modal Consistency Learning with Fine-grained Fusion Network for Multimodal Fake News Detection

    Authors: Jun Li, Yi Bin, Jie Zou, Guoqing Wang, Yang Yang

    Abstract: Previous studies on multimodal fake news detection have observed the mismatch between text and images in fake news and attempted to explore the consistency of multimodal news based on global features of different modalities. However, they fail to investigate this relationship between fine-grained fragments in multimodal content. To gain public trust, fake news often includes relevant parts in…

    Submitted 3 November, 2023; originally announced November 2023.

  11. arXiv:2310.12640  [pdf, other]

    cs.CL

    Non-Autoregressive Sentence Ordering

    Authors: Yi Bin, Wenhao Shi, Bin Ji, Jipeng Zhang, Yujuan Ding, Yang Yang

    Abstract: Existing sentence ordering approaches generally employ encoder-decoder frameworks with a pointer network to recover coherence by recurrently predicting each sentence step by step. Such an autoregressive manner only leverages unilateral dependencies during decoding and cannot fully explore the semantic dependency between sentences for ordering. To overcome these limitations, in this paper, we pro…

    Submitted 19 October, 2023; originally announced October 2023.

    Comments: Accepted at Findings of EMNLP2023

  12. arXiv:2310.09590  [pdf, other]

    cs.CL cs.AI

    Solving Math Word Problems with Reexamination

    Authors: Yi Bin, Wenhao Shi, Yujuan Ding, Yang Yang, See-Kiong Ng

    Abstract: Math word problem (MWP) solving aims to understand the descriptive math problem and calculate the result, for which previous efforts have mostly been devoted to upgrading different technical modules. This paper brings a different perspective, a reexamination process during training, by introducing a pseudo-dual task to enhance MWP solving. We propose a pseudo-dual (PseDual) learning scheme to…

    Submitted 19 November, 2023; v1 submitted 14 October, 2023; originally announced October 2023.

    Comments: To appear at the NeurIPS 2023 Workshop on MATH-AI

  13. arXiv:2310.09368  [pdf, other]

    physics.flu-dyn

    Constrained re-calibration of Reynolds-averaged Navier-Stokes models

    Authors: Yuanwei Bin, George Huang, Robert Kunz, Xiang I A Yang

    Abstract: The constants and functions in Reynolds-averaged Navier-Stokes (RANS) turbulence models are coupled. Consequently, modifications of a RANS model often negatively impact its basic calibrations, which is why machine-learned augmentations are often detrimental outside the training dataset. A solution to this is to identify the degrees of freedom that do not affect the basic calibrations and only modi…

    Submitted 13 October, 2023; originally announced October 2023.

  14. arXiv:2310.09367  [pdf, other]

    physics.flu-dyn

    Large-eddy simulation of separated flows on unconventionally coarse grids

    Authors: Yuanwei Bin, George I. Park, Yu Lv, Xiang I. A. Yang

    Abstract: We examine and benchmark the emerging idea of applying the large-eddy simulation (LES) formalism to unconventionally coarse grids where RANS would be considered more appropriate at first glance. We distinguish this idea from very-large-eddy simulation (VLES) and detached-eddy simulation (DES), which require switching between the RANS and LES formalisms. LES on a RANS grid is appealing because, first, it r…

    Submitted 13 October, 2023; originally announced October 2023.

  15. arXiv:2310.09366  [pdf, other]

    physics.flu-dyn

    A priori screening of data-enabled turbulence models

    Authors: Peng E S Chen, Yuanwei Bin, Xiang I A Yang, Yipeng Shi, Mahdi Abkar, George I. Park

    Abstract: Assessing the compliance of a white-box turbulence model with established knowledge of turbulence is straightforward. It enables users to screen conventional turbulence models and identify apparent inadequacies, thereby allowing for a more focused and fruitful validation and verification. However, comparing a black-box machine-learning model to known empirical scalings is not straightforward. Unless one imp…

    Submitted 13 October, 2023; originally announced October 2023.

  16. arXiv:2309.04800  [pdf, other]

    cs.CV

    VeRi3D: Generative Vertex-based Radiance Fields for 3D Controllable Human Image Synthesis

    Authors: Xinya Chen, Jiaxin Huang, Yanrui Bin, Lu Yu, Yiyi Liao

    Abstract: Unsupervised learning of 3D-aware generative adversarial networks has lately made much progress. Some recent work demonstrates promising results of learning human generative models using neural articulated radiance fields, yet their generalization ability and controllability lag behind parametric human models, i.e., they do not perform well when generalizing to novel pose/shape and are not part co…

    Submitted 9 September, 2023; originally announced September 2023.

  17. arXiv:2308.04380  [pdf, other]

    cs.CV cs.IR cs.MM

    Your Negative May not Be True Negative: Boosting Image-Text Matching with False Negative Elimination

    Authors: Haoxuan Li, Yi Bin, Junrong Liao, Yang Yang, Heng Tao Shen

    Abstract: Most existing image-text matching methods adopt triplet loss as the optimization objective, and choosing a proper negative sample for the triplet of <anchor, positive, negative> is important for effectively training the model, e.g., hard negatives make the model learn efficiently and effectively. However, we observe that existing methods mainly employ the most similar samples as hard negatives, wh…

    Submitted 8 August, 2023; originally announced August 2023.

    Comments: Accepted at ACM MM 2023

  18. arXiv:2308.04343  [pdf, other]

    cs.CV cs.IR cs.MM

    Unifying Two-Stream Encoders with Transformers for Cross-Modal Retrieval

    Authors: Yi Bin, Haoxuan Li, Yahui Xu, Xing Xu, Yang Yang, Heng Tao Shen

    Abstract: Most existing cross-modal retrieval methods employ two-stream encoders with different architectures for images and texts, e.g., CNN for images and RNN/Transformer for texts. Such a discrepancy in architectures may induce different semantic distribution spaces, limit the interactions between images and texts, and further result in inferior alignment between images and texts. To fill this…

    Submitted 8 August, 2023; originally announced August 2023.

    Comments: Accepted at ACM Multimedia 2023

  19. arXiv:2306.11746  [pdf, other]

    cs.SI cs.MM

    Focusing on Relevant Responses for Multi-modal Rumor Detection

    Authors: Jun Li, Yi Bin, Liang Peng, Yang Yang, Yangyang Li, Hao Jin, Zi Huang

    Abstract: In the absence of an authoritative statement about a rumor, people may expose the truth behind such a rumor through their responses on social media. Most rumor detection methods aggregate the information of all the responses and have made great progress. However, due to the different backgrounds of users, the responses have different relevance for discovering the suspicious points hidden in a rumor c…

    Submitted 18 June, 2023; originally announced June 2023.

    Comments: Submitted to TKDE

  20. arXiv:2305.04556  [pdf, other]

    cs.CL cs.AI

    Non-Autoregressive Math Word Problem Solver with Unified Tree Structure

    Authors: Yi Bin, Mengqun Han, Wenhao Shi, Lei Wang, Yang Yang, See-Kiong Ng, Heng Tao Shen

    Abstract: Existing MWP solvers employ sequences or binary trees to represent the solution expression and decode it from the given problem description. However, such structures fail to handle the variants that can be derived via mathematical manipulation, e.g., $(a_1+a_2) * a_3$ and $a_1 * a_3+a_2 * a_3$ can both be valid solutions for the same problem but are formulated as different expression sequences or trees…

    Submitted 28 October, 2023; v1 submitted 8 May, 2023; originally announced May 2023.

    Comments: Accepted at EMNLP2023

  21. arXiv:2201.02062  [pdf]

    cs.NI

    Traffic Flow Modeling for UAV-Enabled Wireless Networks

    Authors: A. Abada, Y. Bin, T. Taleb

    Abstract: This paper investigates the traffic flow modeling issue in multi-service-oriented unmanned aerial vehicle (UAV)-enabled wireless networks, which is critical for supporting various future applications of such networks. We propose a general traffic flow model for multi-service-oriented UAV-enabled wireless networks. Under this model, we first classify the network services into three subsets: telemetry,…

    Submitted 5 January, 2022; originally announced January 2022.

  22. arXiv:2105.03072  [pdf, other]

    eess.IV cs.CV

    NTIRE 2021 Challenge on Perceptual Image Quality Assessment

    Authors: Jinjin Gu, Haoming Cai, Chao Dong, Jimmy S. Ren, Yu Qiao, Shuhang Gu, Radu Timofte, Manri Cheon, Sungjun Yoon, Byungyeon Kang, Junwoo Lee, Qing Zhang, Haiyang Guo, Yi Bin, Yuqing Hou, Hengliang Luo, Jingyu Guo, Zirui Wang, Hai Wang, Wenming Yang, Qingyan Bai, Shuwei Shi, Weihao Xia, Mingdeng Cao, Jiahao Wang, et al. (25 additional authors not shown)

    Abstract: This paper reports on the NTIRE 2021 challenge on perceptual image quality assessment (IQA), held in conjunction with the New Trends in Image Restoration and Enhancement (NTIRE) workshop at CVPR 2021. As a new type of image processing technology, perceptual image processing algorithms based on Generative Adversarial Networks (GAN) have produced images with more realistic textures. These o…

    Submitted 28 June, 2021; v1 submitted 7 May, 2021; originally announced May 2021.

  23. arXiv:2011.11221  [pdf, other]

    cs.CV

    Adversarial Refinement Network for Human Motion Prediction

    Authors: Xianjin Chao, Yanrui Bin, Wenqing Chu, Xuan Cao, Yanhao Ge, Chengjie Wang, Jilin Li, Feiyue Huang, Howard Leung

    Abstract: Human motion prediction aims to predict future 3D skeletal sequences given a limited human motion sequence as input. Two popular methods, recurrent neural networks and feed-forward deep networks, are able to predict the rough motion trend, but motion details such as limb movement may be lost. To predict more accurate future human motion, we propose an Adversarial Refinement Network (ARNet) following a sim…

    Submitted 23 November, 2020; v1 submitted 23 November, 2020; originally announced November 2020.

    Comments: Accepted by ACCV 2020 (Oral)

  24. arXiv:2008.00697  [pdf, other]

    cs.CV

    Adversarial Semantic Data Augmentation for Human Pose Estimation

    Authors: Yanrui Bin, Xuan Cao, Xinya Chen, Yanhao Ge, Ying Tai, Chengjie Wang, Jilin Li, Feiyue Huang, Changxin Gao, Nong Sang

    Abstract: Human pose estimation is the task of localizing body keypoints from still images. State-of-the-art methods suffer from insufficient examples of challenging cases such as symmetric appearance, heavy occlusion and nearby persons. To increase the number of challenging cases, previous methods augmented images by cropping and pasting image patches with weak semantics, which leads to unrealistic appe…

    Submitted 3 August, 2020; originally announced August 2020.

  25. arXiv:2005.09816  [pdf, other]

    cs.CV

    Relevant Region Prediction for Crowd Counting

    Authors: Xinya Chen, Yanrui Bin, Changxin Gao, Nong Sang, Hao Tang

    Abstract: Crowd counting is an important and challenging task in computer vision. Existing density-map-based methods focus excessively on localizing individuals, which harms crowd counting performance in highly congested scenes. In addition, the dependency between regions of different density is also ignored. In this paper, we propose Relevant Region Prediction (RRP) for crowd counting, which c…

    Submitted 19 May, 2020; originally announced May 2020.

    Comments: Accepted by Neurocomputing

  26. arXiv:1606.04631  [pdf, other]

    cs.MM cs.CL

    Bidirectional Long-Short Term Memory for Video Description

    Authors: Yi Bin, Yang Yang, Zi Huang, Fumin Shen, Xing Xu, Heng Tao Shen

    Abstract: Video captioning has been attracting broad research attention in the multimedia community. However, most existing approaches either ignore temporal information among video frames or just employ local contextual temporal knowledge. In this work, we propose a novel video captioning framework, termed Bidirectional Long-Short Term Memory (BiLSTM), which deeply captures bidirectional global tempo…

    Submitted 14 June, 2016; originally announced June 2016.

    Comments: 5 pages

  27. arXiv:1311.4659  [pdf]

    physics.med-ph

    A Fast local Reconstruction algorithm by selective backprojection for Low-Dose in Dental Computed Tomography

    Authors: Yan Bin, Deng Lin, Han Yu, Zhang Feng, Wang Xian Chao, Li Lei

    Abstract: High radiation dose in computed tomography (CT) scans increases the lifetime risk of cancer, which has become a major clinical concern. The backprojection-filtration (BPF) algorithm could reduce radiation dose by reconstructing images from truncated data in a short scan. In dental CT, it could reduce radiation dose for the teeth by using the projections acquired in a short scan, and could avoid irradia…

    Submitted 19 November, 2013; originally announced November 2013.

    Comments: 18 pages, 10 figures

    MSC Class: 78-05

  28. arXiv:1208.1379  [pdf, ps, other]

    physics.optics

    All-optical Switch Based on Optical Waveguide Coupling with Micro Cavity Array

    Authors: Yang Bin, Li Heling, Zhao Hongsheng, Yang Tao

    Abstract: This paper theoretically analyzes the optical transmission characteristics of an optical waveguide when coupled to a micro cavity array. The results showed not only that there were sharp peaks on the transmission and reflection spectra, but also that a certain system configuration can produce a backward wave to obtain a phase shift in the small detuning range between the incident wave and the mic…

    Submitted 7 August, 2012; originally announced August 2012.