Showing 1–8 of 8 results for author: Nan, X

Searching in archive cs.
  1. arXiv:2408.13836  [pdf, other]

    cs.CV cs.AI

    PropSAM: A Propagation-Based Model for Segmenting Any 3D Objects in Multi-Modal Medical Images

    Authors: Zifan Chen, Xinyu Nan, Jiazheng Li, Jie Zhao, Haifeng Li, Zilin Lin, Haoshen Li, Heyun Chen, Yiting Liu, Bin Dong, Li Zhang, Lei Tang

    Abstract: Volumetric segmentation is crucial for medical imaging but is often constrained by labor-intensive manual annotations and the need for scenario-specific model training. Furthermore, existing general segmentation models are inefficient due to their design and inferential approaches. Addressing this clinical demand, we introduce PropSAM, a propagation-based segmentation model that optimizes the use…

    Submitted 25 August, 2024; originally announced August 2024.

    Comments: 26 figures, 6 figures

  2. arXiv:2408.04865  [pdf, other]

    cs.SD cs.MM eess.AS

    TEAdapter: Supply abundant guidance for controllable text-to-music generation

    Authors: Jialing Zou, Jiahao Mei, Xudong Nan, Jinghua Li, Daoguo Dong, Liang He

    Abstract: Although current text-guided music generation technology can cope with simple creative scenarios, achieving fine-grained control over individual text-modality conditions remains challenging as user demands become more intricate. Accordingly, we introduce the TEAcher Adapter (TEAdapter), a compact plugin designed to guide the generation process with diverse control information provided by users. In…

    Submitted 9 August, 2024; originally announced August 2024.

    Comments: Accepted by ICME'24: IEEE International Conference on Multimedia and Expo

    Journal ref: 2024 IEEE International Conference on Multimedia and Expo (ICME 2024)

  3. arXiv:2401.17862  [pdf, other]

    cs.CV

    Proximity QA: Unleashing the Power of Multi-Modal Large Language Models for Spatial Proximity Analysis

    Authors: Jianing Li, Xi Nan, Ming Lu, Li Du, Shanghang Zhang

    Abstract: Multi-modal large language models (MLLMs) have demonstrated remarkable vision-language capabilities, primarily due to the exceptional in-context understanding and multi-task learning strengths of large language models (LLMs). The advent of visual instruction tuning has further enhanced MLLMs' performance in vision-language understanding. However, while existing MLLMs adeptly recognize what…

    Submitted 31 January, 2024; originally announced January 2024.

    Comments: 15 pages, version 1

    ACM Class: I.5.4; I.2.7

  4. arXiv:2308.05201  [pdf, other]

    cs.AI cs.HC econ.GN

    "Generate" the Future of Work through AI: Empirical Evidence from Online Labor Markets

    Authors: Jin Liu, Xingchen Xu, Xi Nan, Yongjun Li, Yong Tan

    Abstract: Large Language Model (LLM) based generative AI, such as ChatGPT, is considered the first generation of Artificial General Intelligence (AGI), exhibiting zero-shot learning abilities for a wide variety of downstream tasks. Due to its general-purpose and emergent nature, its impact on labor dynamics becomes complex and difficult to anticipate. Leveraging an extensive dataset from a prominent online…

    Submitted 6 June, 2024; v1 submitted 9 August, 2023; originally announced August 2023.

    Comments: 65 pages, 6 figures, 22 tables

    ACM Class: J.4

  5. FoodWise: Food Waste Reduction and Behavior Change on Campus with Data Visualization and Gamification

    Authors: Yue Yu, Sophia Yi, Xi Nan, Leo Yu-Ho Lo, Kento Shigyo, Liwenhan Xie, Jeffry Wicaksana, Kwang-Ting Cheng, Huamin Qu

    Abstract: Food waste presents a substantial challenge with significant environmental and economic ramifications, and its severity on campus environments is of particular concern. In response to this, we introduce FoodWise, a dual-component system tailored to inspire and incentivize campus communities to reduce food waste. The system consists of a data storytelling dashboard that graphically displays food wa…

    Submitted 27 July, 2023; v1 submitted 24 July, 2023; originally announced July 2023.

    Comments: Accepted in ACM SIGCAS/SIGCHI Conference on Computing and Sustainable Societies (COMPASS) 2023

  6. arXiv:2304.05571  [pdf, other]

    cs.CV

    SGL: Structure Guidance Learning for Camera Localization

    Authors: Xudong Zhang, Shuang Gao, Xiaohu Nan, Haikuan Ning, Yuchen Yang, Yishan Ping, Jixiang Wan, Shuzhou Dong, Jijunnan Li, Yandong Guo

    Abstract: Camera localization is a classical computer vision task that serves various Artificial Intelligence and Robotics applications. With the rapid developments of Deep Neural Networks (DNNs), end-to-end visual localization methods are prosperous in recent years. In this work, we focus on the scene coordinate prediction ones and propose a network architecture named as Structure Guidance Learning (SGL) w…

    Submitted 11 April, 2023; originally announced April 2023.

  7. arXiv:2007.13525  [pdf, other]

    cs.LG cs.CY cs.SI stat.ML

    Detecting Transaction-based Tax Evasion Activities on Social Media Platforms Using Multi-modal Deep Neural Networks

    Authors: Lelin Zhang, Xi Nan, Eva Huang, Sidong Liu

    Abstract: Social media platforms now serve billions of users by providing convenient means of communication, content sharing and even payment between different users. Due to such convenient and anarchic nature, they have also been used rampantly to promote and conduct business activities between unregistered market participants without paying taxes. Tax authorities worldwide face difficulties in regulating…

    Submitted 27 July, 2020; originally announced July 2020.

  8. arXiv:1811.00778  [pdf, other]

    cs.CR cs.LG

    Towards the AlexNet Moment for Homomorphic Encryption: HCNN, the First Homomorphic CNN on Encrypted Data with GPUs

    Authors: Ahmad Al Badawi, Jin Chao, Jie Lin, Chan Fook Mun, Jun Jie Sim, Benjamin Hong Meng Tan, Xiao Nan, Khin Mi Mi Aung, Vijay Ramaseshan Chandrasekhar

    Abstract: Deep Learning as a Service (DLaaS) stands as a promising solution for cloud-based inference applications. In this setting, the cloud has a pre-learned model whereas the user has samples on which she wants to run the model. The biggest concern with DLaaS is user privacy if the input samples are sensitive data. We provide here an efficient privacy-preserving system by employing high-end technologies…

    Submitted 18 August, 2020; v1 submitted 2 November, 2018; originally announced November 2018.