Showing 1–50 of 54 results for author: Nie, W

  1. arXiv:2407.01648

    q-bio.BM cs.LG q-bio.QM

    Aligning Target-Aware Molecule Diffusion Models with Exact Energy Optimization

    Authors: Siyi Gu, Minkai Xu, Alexander Powers, Weili Nie, Tomas Geffner, Karsten Kreis, Jure Leskovec, Arash Vahdat, Stefano Ermon

    Abstract: Generating ligand molecules for specific protein targets, known as structure-based drug design, is a fundamental problem in therapeutics development and biological discovery. Recently, target-aware generative models, especially diffusion models, have shown great promise in modeling protein-ligand interactions and generating candidate drugs. However, existing models primarily focus on learning the…

    Submitted 1 July, 2024; originally announced July 2024.

  2. arXiv:2406.03831

    cs.CR

    Malware Classification Based on Image Segmentation

    Authors: Wanhu Nie

    Abstract: Executable programs are highly structured files that can be recognized by operating systems and loaded into memory, analyzed for their dependencies, allocated resources, and ultimately executed. Each section of an executable program possesses distinct file and semantic boundaries, resembling puzzle pieces with varying shapes, textures, and sizes. These individualistic sections, when combined in di…

    Submitted 6 June, 2024; originally announced June 2024.

  3. arXiv:2406.02509

    cs.CV

    CamCo: Camera-Controllable 3D-Consistent Image-to-Video Generation

    Authors: Dejia Xu, Weili Nie, Chao Liu, Sifei Liu, Jan Kautz, Zhangyang Wang, Arash Vahdat

    Abstract: Recently, video diffusion models have emerged as expressive generative tools for high-quality video content creation readily available to general users. However, these models often do not offer precise control over camera poses for video generation, limiting the expression of cinematic language and user control. To address this issue, we introduce CamCo, which allows fine-grained Camera pose Contro…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: Project page: https://ir1d.github.io/CamCo/

  4. arXiv:2406.01594

    cs.CV cs.GR cs.LG

    DiffUHaul: A Training-Free Method for Object Dragging in Images

    Authors: Omri Avrahami, Rinon Gal, Gal Chechik, Ohad Fried, Dani Lischinski, Arash Vahdat, Weili Nie

    Abstract: Text-to-image diffusion models have proven effective for solving many image editing tasks. However, the seemingly straightforward task of seamlessly relocating objects within a scene remains surprisingly challenging. Existing methods addressing this problem often struggle to function reliably in real-world scenarios due to lacking spatial reasoning. In this work, we propose a training-free method,…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: Project page is available at https://omriavrahami.com/diffuhaul/

  5. arXiv:2405.08246

    cs.CV cs.AI cs.LG

    Compositional Text-to-Image Generation with Dense Blob Representations

    Authors: Weili Nie, Sifei Liu, Morteza Mardani, Chao Liu, Benjamin Eckart, Arash Vahdat

    Abstract: Existing text-to-image models struggle to follow complex text prompts, raising the need for extra grounding inputs for better controllability. In this work, we propose to decompose a scene into visual primitives - denoted as dense blob representations - that contain fine-grained details of the scene while being modular, human-interpretable, and easy-to-construct. Based on blob representations, we…

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: ICML 2024

  6. arXiv:2403.14148

    cs.CV cs.LG

    Efficient Video Diffusion Models via Content-Frame Motion-Latent Decomposition

    Authors: Sihyun Yu, Weili Nie, De-An Huang, Boyi Li, Jinwoo Shin, Anima Anandkumar

    Abstract: Video diffusion models have recently made great progress in generation quality, but are still limited by the high memory and computational requirements. This is because current video diffusion models often attempt to process high-dimensional videos directly. To tackle this issue, we propose content-motion latent diffusion model (CMD), a novel efficient extension of pretrained image diffusion model…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: ICLR 2024. Project page: https://sihyun.me/CMD

  7. arXiv:2402.14167

    cs.CV cs.LG

    T-Stitch: Accelerating Sampling in Pre-Trained Diffusion Models with Trajectory Stitching

    Authors: Zizheng Pan, Bohan Zhuang, De-An Huang, Weili Nie, Zhiding Yu, Chaowei Xiao, Jianfei Cai, Anima Anandkumar

    Abstract: Sampling from diffusion probabilistic models (DPMs) is often expensive for high-quality image generation and typically requires many steps with a large model. In this paper, we introduce sampling Trajectory Stitching (T-Stitch), a simple yet efficient technique to improve the sampling efficiency with little or no generation degradation. Instead of solely using a large DPM for the entire sampling tra…

    Submitted 21 February, 2024; originally announced February 2024.

  8. arXiv:2401.17123

    cs.LG cs.AI q-bio.QM

    Unsupervised Discovery of Steerable Factors When Graph Deep Generative Models Are Entangled

    Authors: Shengchao Liu, Chengpeng Wang, Jiarui Lu, Weili Nie, Hanchen Wang, Zhuoxinran Li, Bolei Zhou, Jian Tang

    Abstract: Deep generative models (DGMs) have been widely developed for graph data. However, much less investigation has been carried out on understanding the latent space of such pretrained graph DGMs. These understandings possess the potential to provide constructive guidelines for crucial tasks, such as graph controllable generation. Thus in this work, we are interested in studying this problem and propos…

    Submitted 29 January, 2024; originally announced January 2024.

  9. arXiv:2311.18405

    cs.CV

    CAT-DM: Controllable Accelerated Virtual Try-on with Diffusion Model

    Authors: Jianhao Zeng, Dan Song, Weizhi Nie, Hongshuo Tian, Tongtong Wang, Anan Liu

    Abstract: Generative Adversarial Networks (GANs) dominate the research field in image-based virtual try-on, but have not resolved problems such as unnatural deformation of garments and the blurry generation quality. While the generative quality of diffusion models is impressive, achieving controllability poses a significant challenge when applying it to virtual try-on and multiple denoising iterations limit…

    Submitted 25 April, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

  10. arXiv:2311.18402

    cs.CV

    MV-CLIP: Multi-View CLIP for Zero-shot 3D Shape Recognition

    Authors: Dan Song, Xinwei Fu, Weizhi Nie, Wenhui Li, Lanjun Wang, You Yang, Anan Liu

    Abstract: Large-scale pre-trained models have demonstrated impressive performance in vision and language tasks within open-world scenarios. Due to the lack of comparable pre-trained models for 3D shapes, recent methods utilize language-image pre-training to realize zero-shot 3D shape recognition. However, due to the modality gap, pretrained language-image models are not confident enough in the generalizatio…

    Submitted 17 April, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

  11. arXiv:2311.06978

    cs.LG cs.CV stat.ML

    Augmented Bridge Matching

    Authors: Valentin De Bortoli, Guan-Horng Liu, Tianrong Chen, Evangelos A. Theodorou, Weilie Nie

    Abstract: Flow and bridge matching are a novel class of processes which encompass diffusion models. One of the main aspects of their increased flexibility is that these models can interpolate between arbitrary data distributions, i.e., they generalize beyond generative modeling and can be applied to learning stochastic (and deterministic) processes of arbitrary transfer tasks between two given distributions. I…

    Submitted 12 November, 2023; originally announced November 2023.

  12. arXiv:2311.04811

    cs.CV

    Image-Based Virtual Try-On: A Survey

    Authors: Dan Song, Xuanpu Zhang, Juan Zhou, Weizhi Nie, Ruofeng Tong, Mohan Kankanhalli, An-An Liu

    Abstract: Image-based virtual try-on aims to synthesize a naturally dressed person image with a clothing image, which revolutionizes online shopping and inspires related topics within image generation, showing both research significance and commercial potential. However, there is a gap between current research progress and commercial applications and an absence of comprehensive overview of this field to acc…

    Submitted 1 May, 2024; v1 submitted 8 November, 2023; originally announced November 2023.

    Comments: 30 pages, 18 figures

  13. arXiv:2310.07159

    cs.CR cs.SI

    My Brother Helps Me: Node Injection Based Adversarial Attack on Social Bot Detection

    Authors: Lanjun Wang, Xinran Qiao, Yanwei Xie, Weizhi Nie, Yongdong Zhang, Anan Liu

    Abstract: Social platforms such as Twitter are under siege from a multitude of fraudulent users. In response, social bot detection tasks have been developed to identify such fake users. Due to the structure of social networks, the majority of methods are based on the graph neural network (GNN), which is susceptible to attacks. In this study, we propose a node injection-based adversarial attack method designe…

    Submitted 10 October, 2023; originally announced October 2023.

  14. arXiv:2310.04610

    cs.AI cs.LG

    DeepSpeed4Science Initiative: Enabling Large-Scale Scientific Discovery through Sophisticated AI System Technologies

    Authors: Shuaiwen Leon Song, Bonnie Kruft, Minjia Zhang, Conglong Li, Shiyang Chen, Chengming Zhang, Masahiro Tanaka, Xiaoxia Wu, Jeff Rasley, Ammar Ahmad Awan, Connor Holmes, Martin Cai, Adam Ghanem, Zhongzhu Zhou, Yuxiong He, Pete Luferenko, Divya Kumar, Jonathan Weyn, Ruixiong Zhang, Sylwester Klocek, Volodymyr Vragov, Mohammed AlQuraishi, Gustaf Ahdritz, Christina Floristean, Cristina Negri, et al. (67 additional authors not shown)

    Abstract: In the upcoming decade, deep learning may revolutionize the natural sciences, enhancing our capacity to model and predict natural occurrences. This could herald a new era of scientific exploration, bringing significant advancements across sectors from drug development to renewable energy. To answer this call, we present DeepSpeed4Science initiative (deepspeed4science.ai) which aims to build unique…

    Submitted 11 October, 2023; v1 submitted 6 October, 2023; originally announced October 2023.

  15. arXiv:2309.06928

    cs.CL cs.CV

    Dynamic Causal Disentanglement Model for Dialogue Emotion Detection

    Authors: Yuting Su, Yichen Wei, Weizhi Nie, Sicheng Zhao, Anan Liu

    Abstract: Emotion detection is a critical technology extensively employed in diverse fields. While the incorporation of commonsense knowledge has proven beneficial for existing emotion detection methods, dialogue-based emotion detection encounters numerous difficulties and challenges due to human agency and the variability of dialogue content. In dialogues, human emotions tend to accumulate in bursts. Howeve…

    Submitted 13 September, 2023; originally announced September 2023.

  16. arXiv:2308.13801

    cs.AI cs.MM

    Reinforcement Learning Based Multi-modal Feature Fusion Network for Novel Class Discovery

    Authors: Qiang Li, Qiuyang Ma, Weizhi Nie, Anan Liu

    Abstract: With the development of deep learning techniques, supervised learning has achieved performances surpassing those of humans. Researchers have designed numerous corresponding models for different data modalities, achieving excellent results in supervised tasks. However, with the exponential increase of data in multiple fields, the recognition and classification of unlabeled data have gradually becom…

    Submitted 26 August, 2023; originally announced August 2023.

  17. arXiv:2308.12351

    hep-ph cs.LG hep-ex

    Improving Generative Model-based Unfolding with Schrödinger Bridges

    Authors: Sascha Diefenbacher, Guan-Horng Liu, Vinicius Mikuni, Benjamin Nachman, Weili Nie

    Abstract: Machine learning-based unfolding has enabled unbinned and high-dimensional differential cross section measurements. Two main approaches have emerged in this research area: one based on discriminative models and one based on generative models. The main advantage of discriminative models is that they learn a small correction to a starting simulation while generative models scale better to regions of…

    Submitted 22 September, 2023; v1 submitted 23 August, 2023; originally announced August 2023.

    Comments: 9 pages, 5 figures

  18. arXiv:2308.03027

    cs.LG cs.CV eess.SP

    Causal Disentanglement Hidden Markov Model for Fault Diagnosis

    Authors: Rihao Chang, Yongtao Ma, Weizhi Nie, Jie Nie, An-an Liu

    Abstract: In modern industries, fault diagnosis has been widely applied with the goal of realizing predictive maintenance. The key issue for the fault diagnosis system is to extract representative characteristics of the fault signal and then accurately predict the fault type. In this paper, we propose a Causal Disentanglement Hidden Markov model (CDHM) to learn the causality in the bearing fault mechanism a…

    Submitted 6 August, 2023; originally announced August 2023.

  19. arXiv:2306.09305

    cs.CV cs.AI cs.LG

    Fast Training of Diffusion Models with Masked Transformers

    Authors: Hongkai Zheng, Weili Nie, Arash Vahdat, Anima Anandkumar

    Abstract: We propose an efficient approach to train large diffusion models with masked transformers. While masked transformers have been extensively explored for representation learning, their application to generative learning is less explored in the vision domain. Our work is the first to exploit masked training to reduce the training cost of diffusion models significantly. Specifically, we randomly mask…

    Submitted 4 March, 2024; v1 submitted 15 June, 2023; originally announced June 2023.

  20. arXiv:2306.01970

    cs.LG cs.AI cs.CV cs.CY

    Temporal-spatial Correlation Attention Network for Clinical Data Analysis in Intensive Care Unit

    Authors: Weizhi Nie, Yuhe Yu, Chen Zhang, Dan Song, Lina Zhao, Yunpeng Bai

    Abstract: In recent years, medical information technology has made it possible for electronic health record (EHR) to store fairly complete clinical data. This has brought health care into the era of "big data". However, medical data are often sparse and strongly correlated, which means that medical problems cannot be solved effectively. With the rapid development of deep learning in recent years, it has pro…

    Submitted 2 June, 2023; originally announced June 2023.

  21. arXiv:2306.01232

    eess.IV cs.CV

    Deep Reinforcement Learning Framework for Thoracic Diseases Classification via Prior Knowledge Guidance

    Authors: Weizhi Nie, Chen Zhang, Dan Song, Lina Zhao, Yunpeng Bai, Keliang Xie, Anan Liu

    Abstract: The chest X-ray is often utilized for diagnosing common thoracic diseases. In recent years, many approaches have been proposed to handle the problem of automatic diagnosis based on chest X-rays. However, the scarcity of labeled data for related diseases still poses a huge challenge to an accurate diagnosis. In this paper, we focus on the thorax disease diagnostic problem and propose a novel deep r…

    Submitted 1 June, 2023; originally announced June 2023.

  22. arXiv:2305.17770

    cs.CV

    Point Cloud Completion Guided by Prior Knowledge via Causal Inference

    Authors: Songxue Gao, Chuanqi Jiao, Ruidong Chen, Weijie Wang, Weizhi Nie

    Abstract: Point cloud completion aims to recover raw point clouds captured by scanners from partial observations caused by occlusion and limited view angles. This makes it hard to recover details because the global feature is unlikely to capture the full details of all missing parts. In this paper, we propose a novel approach to point cloud completion task called Point-PC, which uses a memory network to ret…

    Submitted 15 December, 2023; v1 submitted 28 May, 2023; originally announced May 2023.

  23. arXiv:2305.15753

    cs.CV

    T2TD: Text-3D Generation Model based on Prior Knowledge Guidance

    Authors: Weizhi Nie, Ruidong Chen, Weijie Wang, Bruno Lepri, Nicu Sebe

    Abstract: In recent years, 3D models have been utilized in many applications, such as auto-driver, 3D reconstruction, VR, and AR. However, the scarcity of 3D model data does not meet its practical demands. Thus, generating high-quality 3D models efficiently from textual descriptions is a promising but challenging way to solve this problem. In this paper, inspired by the ability of human beings to complement…

    Submitted 25 May, 2023; originally announced May 2023.

  24. arXiv:2305.12072

    eess.IV cs.CV

    Chest X-ray Image Classification: A Causal Perspective

    Authors: Weizhi Nie, Chen Zhang, Dan Song, Lina Zhao, Yunpeng Bai, Keliang Xie, Anan Liu

    Abstract: The chest X-ray (CXR) is one of the most common and easy-to-get medical tests used to diagnose common diseases of the chest. Recently, many deep learning-based methods have been proposed that are capable of effectively classifying CXRs. Even though these techniques have worked quite well, it is difficult to establish whether what these algorithms actually learn is the cause-and-effect link between…

    Submitted 19 May, 2023; originally announced May 2023.

  25. arXiv:2305.12070

    eess.IV cs.CV

    Instrumental Variable Learning for Chest X-ray Classification

    Authors: Weizhi Nie, Chen Zhang, Dan Song, Yunpeng Bai, Keliang Xie, Anan Liu

    Abstract: The chest X-ray (CXR) is commonly employed to diagnose thoracic illnesses, but the challenge of achieving accurate automatic diagnosis through this method persists due to the complex relationship between pathology. In recent years, various deep learning-based approaches have been suggested to tackle this problem but confounding factors such as image resolution or noise problems often damage model…

    Submitted 19 May, 2023; originally announced May 2023.

  26. arXiv:2303.01507

    cs.SD cs.CR cs.LG eess.AS

    Defending against Adversarial Audio via Diffusion Model

    Authors: Shutong Wu, Jiongxiao Wang, Wei Ping, Weili Nie, Chaowei Xiao

    Abstract: Deep learning models have been widely used in commercial acoustic systems in recent years. However, adversarial audio examples can cause abnormal behaviors for those acoustic systems, while being hard for humans to perceive. Various methods, such as transformation-based defenses and adversarial training, have been proposed to protect acoustic systems from adversarial attacks, but they are less eff…

    Submitted 2 March, 2023; originally announced March 2023.

  27. arXiv:2302.05872

    cs.CV cs.LG stat.ML

    I$^2$SB: Image-to-Image Schrödinger Bridge

    Authors: Guan-Horng Liu, Arash Vahdat, De-An Huang, Evangelos A. Theodorou, Weili Nie, Anima Anandkumar

    Abstract: We propose Image-to-Image Schrödinger Bridge (I$^2$SB), a new class of conditional diffusion models that directly learn the nonlinear diffusion processes between two given distributions. These diffusion bridges are particularly useful for image restoration, as the degraded images are structurally informative priors for reconstructing the clean images. I$^2$SB belongs to a tractable class of Schröd…

    Submitted 25 May, 2023; v1 submitted 12 February, 2023; originally announced February 2023.

    Comments: ICML camera ready (high-resolution figures)

  28. arXiv:2302.04858

    cs.CV cs.AI cs.CL cs.IR cs.LG

    Re-ViLM: Retrieval-Augmented Visual Language Model for Zero and Few-Shot Image Captioning

    Authors: Zhuolin Yang, Wei Ping, Zihan Liu, Vijay Korthikanti, Weili Nie, De-An Huang, Linxi Fan, Zhiding Yu, Shiyi Lan, Bo Li, Ming-Yu Liu, Yuke Zhu, Mohammad Shoeybi, Bryan Catanzaro, Chaowei Xiao, Anima Anandkumar

    Abstract: Augmenting pretrained language models (LMs) with a vision encoder (e.g., Flamingo) has obtained the state-of-the-art results in image-to-text generation. However, these models store all the knowledge within their parameters, thus often requiring enormous model parameters to model the abundant visual concepts and very rich textual descriptions. Additionally, they are inefficient in incorporating ne…

    Submitted 22 October, 2023; v1 submitted 9 February, 2023; originally announced February 2023.

    Comments: Findings of EMNLP 2023

  29. arXiv:2302.04611

    cs.LG cs.AI q-bio.QM stat.ML

    A Text-guided Protein Design Framework

    Authors: Shengchao Liu, Yanjing Li, Zhuoxinran Li, Anthony Gitter, Yutao Zhu, Jiarui Lu, Zhao Xu, Weili Nie, Arvind Ramanathan, Chaowei Xiao, Jian Tang, Hongyu Guo, Anima Anandkumar

    Abstract: Current AI-assisted protein design mainly utilizes protein sequential and structural information. Meanwhile, there exists tremendous knowledge curated by humans in the text format describing proteins' high-level functionalities. Yet, whether the incorporation of such text data can help protein design tasks has not been explored. To bridge this gap, we propose ProteinDT, a multi-modal framework tha…

    Submitted 12 August, 2024; v1 submitted 9 February, 2023; originally announced February 2023.

  30. arXiv:2212.10789

    cs.LG cs.CL q-bio.QM stat.ML

    Multi-modal Molecule Structure-text Model for Text-based Retrieval and Editing

    Authors: Shengchao Liu, Weili Nie, Chengpeng Wang, Jiarui Lu, Zhuoran Qiao, Ling Liu, Jian Tang, Chaowei Xiao, Anima Anandkumar

    Abstract: There is increasing adoption of artificial intelligence in drug discovery. However, existing studies use machine learning to mainly utilize the chemical structures of molecules but ignore the vast textual knowledge available in chemistry. Incorporating textual knowledge enables us to realize new drug design objectives, adapt to text-based instructions and predict complex biological activities. Her…

    Submitted 29 January, 2024; v1 submitted 21 December, 2022; originally announced December 2022.

  31. arXiv:2211.13449

    cs.LG cs.CV

    Fast Sampling of Diffusion Models via Operator Learning

    Authors: Hongkai Zheng, Weili Nie, Arash Vahdat, Kamyar Azizzadenesheli, Anima Anandkumar

    Abstract: Diffusion models have found widespread adoption in various areas. However, their sampling process is slow because it requires hundreds to thousands of network evaluations to emulate a continuous process defined by differential equations. In this work, we use neural operators, an efficient method to solve the probability flow differential equations, to accelerate the sampling process of diffusion m…

    Submitted 22 July, 2023; v1 submitted 24 November, 2022; originally announced November 2022.

  32. arXiv:2211.00322

    cs.LG cs.AI cs.CR

    DensePure: Understanding Diffusion Models towards Adversarial Robustness

    Authors: Chaowei Xiao, Zhongzhu Chen, Kun Jin, Jiongxiao Wang, Weili Nie, Mingyan Liu, Anima Anandkumar, Bo Li, Dawn Song

    Abstract: Diffusion models have been recently employed to improve certified robustness through the process of denoising. However, the theoretical understanding of why diffusion models are able to improve the certified robustness is still lacking, preventing further improvement. In this study, we close this gap by analyzing the fundamental properties of diffusion models and establishing the conditions u…

    Submitted 1 November, 2022; originally announced November 2022.

  33. arXiv:2210.15136

    cs.CV

    3D Shape Knowledge Graph for Cross-domain 3D Shape Retrieval

    Authors: Rihao Chang, Yongtao Ma, Tong Hao, Weizhi Nie

    Abstract: The surge in 3D modeling has led to a pronounced research emphasis on the field of 3D shape retrieval. Numerous contemporary approaches have been put forth to tackle this intricate challenge. Nevertheless, effectively addressing the intricacies of cross-modal 3D shape retrieval remains a formidable undertaking, owing to inherent modality-based disparities. This study presents an innovative notion,…

    Submitted 21 December, 2023; v1 submitted 26 October, 2022; originally announced October 2022.

  34. arXiv:2209.15171

    q-bio.QM cs.LG q-bio.BM

    State-specific protein-ligand complex structure prediction with a multi-scale deep generative model

    Authors: Zhuoran Qiao, Weili Nie, Arash Vahdat, Thomas F. Miller III, Anima Anandkumar

    Abstract: The binding complexes formed by proteins and small molecule ligands are ubiquitous and critical to life. Despite recent advancements in protein structure prediction, existing algorithms are so far unable to systematically predict the binding ligand structures along with their regulatory effects on protein folding. To address this discrepancy, we present NeuralPLexer, a computational approach that…

    Submitted 19 April, 2023; v1 submitted 29 September, 2022; originally announced September 2022.

    Comments: 19 pages, 5 figures, 1 table & Supplementary Information (18 pages, 2 figures, 7 tables, 12 algorithms); supersedes an earlier version arXiv:2209.15171v1 presented at the NeurIPS 2022 MLSB workshop as a contributed talk

  35. arXiv:2209.07511

    cs.CV

    Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models

    Authors: Manli Shu, Weili Nie, De-An Huang, Zhiding Yu, Tom Goldstein, Anima Anandkumar, Chaowei Xiao

    Abstract: Pre-trained vision-language models (e.g., CLIP) have shown promising zero-shot generalization in many downstream tasks with properly designed text prompts. Instead of relying on hand-engineered prompts, recent works learn prompts using the training data from downstream tasks. While effective, training on domain-specific data reduces a model's generalization capability to unseen new domains. In thi…

    Submitted 15 September, 2022; originally announced September 2022.

    Comments: NeurIPS 2022

  36. arXiv:2209.02976

    cs.CV

    YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications

    Authors: Chuyi Li, Lulu Li, Hongliang Jiang, Kaiheng Weng, Yifei Geng, Liang Li, Zaidan Ke, Qingyuan Li, Meng Cheng, Weiqiang Nie, Yiduo Li, Bo Zhang, Yufei Liang, Linyuan Zhou, Xiaoming Xu, Xiangxiang Chu, Xiaoming Wei, Xiaolin Wei

    Abstract: For years, the YOLO series has been the de facto industry-level standard for efficient object detection. The YOLO community has prospered overwhelmingly to enrich its use in a multitude of hardware platforms and abundant scenarios. In this technical report, we strive to push its limits to the next level, stepping forward with an unwavering mindset for industry application. Considering the divers…

    Submitted 7 September, 2022; originally announced September 2022.

    Comments: technical report

  37. arXiv:2208.11126

    q-bio.QM cs.LG

    Retrieval-based Controllable Molecule Generation

    Authors: Zichao Wang, Weili Nie, Zhuoran Qiao, Chaowei Xiao, Richard Baraniuk, Anima Anandkumar

    Abstract: Generating new molecules with specified chemical and biological properties via generative models has emerged as a promising direction for drug discovery. However, existing methods require extensive training/fine-tuning with a large dataset, often unavailable in real-world generation tasks. In this work, we propose a new retrieval-based framework for controllable molecule generation. We use a small…

    Submitted 24 April, 2023; v1 submitted 23 August, 2022; originally announced August 2022.

    Comments: ICLR 2023

  38. arXiv:2208.09801

    cs.CV cs.CR cs.LG

    PointDP: Diffusion-driven Purification against Adversarial Attacks on 3D Point Cloud Recognition

    Authors: Jiachen Sun, Weili Nie, Zhiding Yu, Z. Morley Mao, Chaowei Xiao

    Abstract: 3D Point cloud is becoming a critical data representation in many real-world applications like autonomous driving, robotics, and medical imaging. Although the success of deep learning further accelerates the adoption of 3D point clouds in the physical world, deep learning is notorious for its vulnerability to adversarial attacks. In this work, we first identify that the state-of-the-art empirical…

    Submitted 21 August, 2022; originally announced August 2022.

  39. arXiv:2206.05641

    cs.CV cs.LG eess.IV

    An Unsupervised Deep-Learning Method for Bone Age Assessment

    Authors: Hao Zhu, Wan-Jing Nie, Yue-Jie Hou, Qi-Meng Du, Si-Jing Li, Chi-Chun Zhou

    Abstract: The bone age, reflecting the degree of development of the bones, can be used to predict the adult height and detect endocrine diseases of children. Both examinations of radiologists and variability of operators have a significant impact on bone age assessment. To decrease human intervention, machine learning algorithms are used to assess the bone age automatically. However, conventional supervise…

    Submitted 11 June, 2022; originally announced June 2022.

  40. arXiv:2205.13803

    cs.CV cs.AI cs.LG

    Bongard-HOI: Benchmarking Few-Shot Visual Reasoning for Human-Object Interactions

    Authors: Huaizu Jiang, Xiaojian Ma, Weili Nie, Zhiding Yu, Yuke Zhu, Song-Chun Zhu, Anima Anandkumar

    Abstract: A significant gap remains between today's visual pattern recognition models and human-level visual cognition especially when it comes to few-shot learning and compositional reasoning of novel concepts. We introduce Bongard-HOI, a new visual reasoning benchmark that focuses on compositional learning of human-object interactions (HOIs) from natural images. It is inspired by two desirable characteris…

    Submitted 13 April, 2023; v1 submitted 27 May, 2022; originally announced May 2022.

    Comments: CVPR 2022 (oral); First two authors contributed equally; Code: https://github.com/NVlabs/Bongard-HOI

  41. arXiv:2205.07460

    cs.LG cs.CR cs.CV

    Diffusion Models for Adversarial Purification

    Authors: Weili Nie, Brandon Guo, Yujia Huang, Chaowei Xiao, Arash Vahdat, Anima Anandkumar

    Abstract: Adversarial purification refers to a class of defense methods that remove adversarial perturbations using a generative model. These methods do not make assumptions on the form of attack and the classification model, and thus can defend pre-existing classifiers against unseen threats. However, their performance currently falls behind adversarial training methods. In this work, we propose DiffPure t…

    Submitted 16 May, 2022; originally announced May 2022.

    Comments: ICML 2022

  42. arXiv:2204.11167

    cs.CV cs.AI cs.LG

    RelViT: Concept-guided Vision Transformer for Visual Relational Reasoning

    Authors: Xiaojian Ma, Weili Nie, Zhiding Yu, Huaizu Jiang, Chaowei Xiao, Yuke Zhu, Song-Chun Zhu, Anima Anandkumar

    Abstract: Reasoning about visual relationships is central to how humans interpret the visual world. This task remains challenging for current deep learning algorithms since it requires addressing three key technical problems jointly: 1) identifying object entities and their properties, 2) inferring semantic relations between pairs of entities, and 3) generalizing to novel object-relation combinations, i.e.,…

    Submitted 11 June, 2022; v1 submitted 23 April, 2022; originally announced April 2022.

    Comments: ICLR 2022; Code: https://github.com/NVlabs/RelViT

  43. arXiv:2110.10873  [pdf, other]

    cs.CV cs.AI cs.LG

    Controllable and Compositional Generation with Latent-Space Energy-Based Models

    Authors: Weili Nie, Arash Vahdat, Anima Anandkumar

    Abstract: Controllable generation is one of the key requirements for successful adoption of deep generative models in real-world applications, but it still remains as a great challenge. In particular, the compositional ability to generate novel concept combinations is out of reach for most current models. In this work, we use energy-based models (EBMs) to handle compositional generation over a set of attrib…

    Submitted 3 December, 2021; v1 submitted 20 October, 2021; originally announced October 2021.

    Comments: 32 pages, NeurIPS 2021

  44. arXiv:2108.04527  [pdf, other]

    cs.CV

    Multigranular Visual-Semantic Embedding for Cloth-Changing Person Re-identification

    Authors: Zan Gao, Hongwei Wei, Weili Guan, Weizhi Nie, Meng Liu, Meng Wang

    Abstract: Person reidentification (ReID) is a very hot research topic in machine learning and computer vision, and many person ReID approaches have been proposed; however, most of these methods assume that the same person has the same clothes within a short time interval, and thus their visual appearance must be similar. However, in an actual surveillance environment, a given person has a great probability…

    Submitted 10 August, 2021; originally announced August 2021.

  45. arXiv:2012.15262  [pdf, other]

    cs.CL cs.AI

    Robustness Testing of Language Understanding in Task-Oriented Dialog

    Authors: Jiexi Liu, Ryuichi Takanobu, Jiaxin Wen, Dazhen Wan, Hongguang Li, Weiran Nie, Cheng Li, Wei Peng, Minlie Huang

    Abstract: Most language understanding models in task-oriented dialog systems are trained on a small amount of annotated training data, and evaluated in a small set from the same distribution. However, these models can lead to system failure or undesirable output when being exposed to natural language perturbation or variation in practice. In this paper, we conduct comprehensive evaluation and analysis with…

    Submitted 4 June, 2021; v1 submitted 30 December, 2020; originally announced December 2020.

    Comments: ACL 2021 long paper

  46. arXiv:2010.00763  [pdf, other]

    cs.AI cs.CV cs.LG

    Bongard-LOGO: A New Benchmark for Human-Level Concept Learning and Reasoning

    Authors: Weili Nie, Zhiding Yu, Lei Mao, Ankit B. Patel, Yuke Zhu, Animashree Anandkumar

    Abstract: Humans have an inherent ability to learn novel concepts from only a few samples and generalize these concepts to different situations. Even though today's machine learning models excel with a plethora of training data on standard recognition tasks, a considerable gap exists between machine-level pattern recognition and human-level concept learning. To narrow this gap, the Bongard problems (BPs) we…

    Submitted 4 January, 2021; v1 submitted 1 October, 2020; originally announced October 2020.

    Comments: 22 pages, NeurIPS 2020

  47. arXiv:2009.05103  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    Emotion-Based End-to-End Matching Between Image and Music in Valence-Arousal Space

    Authors: Sicheng Zhao, Yaxian Li, Xingxu Yao, Weizhi Nie, Pengfei Xu, Jufeng Yang, Kurt Keutzer

    Abstract: Both images and music can convey rich semantics and are widely used to induce specific emotions. Matching images and music with similar emotions might help to make emotion perceptions more vivid and stronger. Existing emotion-based image and music matching methods either employ limited categorical emotion states which cannot well reflect the complexity and subtlety of emotions, or train the matchi…

    Submitted 22 August, 2020; originally announced September 2020.

    Comments: Accepted by ACM Multimedia 2020

  48. arXiv:2006.07460  [pdf, other]

    cs.LG stat.ML

    An Improved Semi-Supervised VAE for Learning Disentangled Representations

    Authors: Weili Nie, Zichao Wang, Ankit B. Patel, Richard G. Baraniuk

    Abstract: Learning interpretable and disentangled representations is a crucial yet challenging task in representation learning. In this work, we focus on semi-supervised disentanglement learning and extend work by Locatello et al. (2019) by introducing another source of supervision that we denote as label replacement. Specifically, during training, we replace the inferred representation associated with a da…

    Submitted 22 June, 2020; v1 submitted 12 June, 2020; originally announced June 2020.

  49. arXiv:2005.05519  [pdf]

    cs.CV cs.AI eess.SP

    A Novel Granular-Based Bi-Clustering Method of Deep Mining the Co-Expressed Genes

    Authors: Kaijie Xu, Witold Pedrycz, Zhiwu Li, Yinghui Quan, Weike Nie

    Abstract: Traditional clustering methods are limited when dealing with huge and heterogeneous groups of gene expression data, which motivates the development of bi-clustering methods. Bi-clustering methods are used to mine bi-clusters whose subsets of samples (genes) are co-regulated under their test conditions. Studies show that mining bi-clusters of consistent trends and trends with similar degrees of flu…

    Submitted 11 May, 2020; originally announced May 2020.

  50. arXiv:2003.03461  [pdf, other]

    cs.CV cs.LG

    Semi-Supervised StyleGAN for Disentanglement Learning

    Authors: Weili Nie, Tero Karras, Animesh Garg, Shoubhik Debnath, Anjul Patney, Ankit B. Patel, Anima Anandkumar

    Abstract: Disentanglement learning is crucial for obtaining disentangled representations and controllable generation. Current disentanglement methods face several inherent limitations: difficulty with high-resolution images, primarily focusing on learning disentangled representations, and non-identifiability due to the unsupervised setting. To alleviate these limitations, we design new architectures and los…

    Submitted 25 November, 2020; v1 submitted 6 March, 2020; originally announced March 2020.

    Comments: ICML 2020, 21 pages. Project page: https://sites.google.com/nvidia.com/semi-stylegan