Showing 1–50 of 69 results for author: Bi, J

  1. arXiv:2407.11106  [pdf, other]

    cs.LG cs.AI

    Deep Learning Evidence for Global Optimality of Gerver's Sofa

    Authors: Kuangdai Leng, Jia Bi, Jaehoon Cha, Samuel Pinilla, Jeyan Thiyagalingam

    Abstract: The Moving Sofa Problem, formally proposed by Leo Moser in 1966, seeks to determine the largest area of a two-dimensional shape that can navigate through an $L$-shaped corridor with unit width. The current best lower bound is about 2.2195, achieved by Joseph Gerver in 1992, though its global optimality remains unproven. In this paper, we investigate this problem by leveraging the universal approxi…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 16 pages, 9 figures

  2. arXiv:2407.05869  [pdf, other]

    cs.AI

    PORCA: Root Cause Analysis with Partially Observed Data

    Authors: Chang Gong, Di Yao, Jin Wang, Wenbin Li, Lanting Fang, Yongtao Xie, Kaiyu Feng, Peng Han, Jingping Bi

    Abstract: Root Cause Analysis (RCA) aims at identifying the underlying causes of system faults by uncovering and analyzing the causal structure from complex systems. It has been widely used in many application domains. Reliable diagnostic conclusions are of great importance in mitigating system failures and financial losses. However, previous studies implicitly assume a full observation of the system, which…

    Submitted 11 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

  3. arXiv:2406.19475  [pdf, other]

    math.OC cs.LG

    Stochastic First-Order Methods with Non-smooth and Non-Euclidean Proximal Terms for Nonconvex High-Dimensional Stochastic Optimization

    Authors: Yue Xie, Jiawen Bi, Hongcheng Liu

    Abstract: When the nonconvex problem is complicated by stochasticity, the sample complexity of stochastic first-order methods may depend linearly on the problem dimension, which is undesirable for large-scale problems. In this work, we propose dimension-insensitive stochastic first-order methods (DISFOMs) to address nonconvex optimization with expected-valued objective function. Our algorithms allow for non…

    Submitted 27 June, 2024; originally announced June 2024.

    MSC Class: 90C06; 90C15; 90C26; 90C30

  4. arXiv:2406.19065  [pdf, other]

    cs.CL

    STBench: Assessing the Ability of Large Language Models in Spatio-Temporal Analysis

    Authors: Wenbin Li, Di Yao, Ruibo Zhao, Wenjie Chen, Zijie Xu, Chengxue Luo, Chang Gong, Quanliang Jing, Haining Tan, Jingping Bi

    Abstract: The rapid evolution of large language models (LLMs) holds promise for reforming the methodology of spatio-temporal data mining. However, current works for evaluating the spatio-temporal understanding capability of LLMs are somewhat limited and biased. These works either fail to incorporate the latest language models or only focus on assessing the memorized spatio-temporal knowledge. To address thi…

    Submitted 27 June, 2024; originally announced June 2024.

  5. CausalMMM: Learning Causal Structure for Marketing Mix Modeling

    Authors: Chang Gong, Di Yao, Lei Zhang, Sheng Chen, Wenbin Li, Yueyang Su, Jingping Bi

    Abstract: In online advertising, marketing mix modeling (MMM) is employed to predict the gross merchandise volume (GMV) of brand shops and help decision-makers to adjust the budget allocation of various advertising channels. Traditional MMM methods leveraging regression techniques can fail in handling the complexity of marketing. Although some efforts try to encode the causal structures for better predictio…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: WSDM 2024, full version

  6. arXiv:2406.14491  [pdf, other]

    cs.CL

    Instruction Pre-Training: Language Models are Supervised Multitask Learners

    Authors: Daixuan Cheng, Yuxian Gu, Shaohan Huang, Junyu Bi, Minlie Huang, Furu Wei

    Abstract: Unsupervised multitask pre-training has been the critical method behind the recent success of language models (LMs). However, supervised multitask learning still holds significant promise, as scaling it in the post-training stage trends towards better generalization. In this paper, we explore supervised multitask pre-training by proposing Instruction Pre-Training, a framework that scalably augment…

    Submitted 20 June, 2024; originally announced June 2024.

  7. arXiv:2405.16036  [pdf, other]

    cs.LG cs.CR cs.CV

    Certifying Adapters: Enabling and Enhancing the Certification of Classifier Adversarial Robustness

    Authors: Jieren Deng, Hanbin Hong, Aaron Palmer, Xin Zhou, Jinbo Bi, Kaleel Mahmood, Yuan Hong, Derek Aguiar

    Abstract: Randomized smoothing has become a leading method for achieving certified robustness in deep classifiers against $\ell_p$-norm adversarial perturbations. Current approaches for achieving certified robustness, such as data augmentation with Gaussian noise and adversarial training, require expensive training procedures that tune large models for different Gaussian noise levels and thus cannot leverage h…

    Submitted 24 May, 2024; originally announced May 2024.

  8. arXiv:2405.07626  [pdf, other]

    cs.LG cs.AI

    AnomalyLLM: Few-shot Anomaly Edge Detection for Dynamic Graphs using Large Language Models

    Authors: Shuo Liu, Di Yao, Lanting Fang, Zhetao Li, Wenbin Li, Kaiyu Feng, XiaoWen Ji, Jingping Bi

    Abstract: Detecting anomaly edges in dynamic graphs aims to identify edges that deviate significantly from the normal pattern and can be applied in various domains, such as cybersecurity, financial transactions and AIOps. Over time, new types of anomaly edges emerge, and labeled anomaly samples are few for each type. Current methods are either designed to detect randomly inserted edge…

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: 13 pages

  9. arXiv:2404.07308  [pdf, other]

    cs.LG

    Spatial Transfer Learning for Estimating PM2.5 in Data-poor Regions

    Authors: Shrey Gupta, Yongbee Park, Jianzhao Bi, Suyash Gupta, Andreas Züfle, Avani Wildani, Yang Liu

    Abstract: Air pollution, especially particulate matter 2.5 (PM2.5), is a pressing concern for public health and is difficult to estimate in developing countries (data-poor regions) due to a lack of ground sensors. Transfer learning models can be leveraged to solve this problem, as they use alternate data sources to gain knowledge (i.e., data from data-rich regions). However, current transfer learning method…

    Submitted 22 June, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

    Comments: Accepted for publication at ECML-PKDD 2024

  10. arXiv:2403.16276  [pdf, other]

    cs.CV cs.AI

    AVicuna: Audio-Visual LLM with Interleaver and Context-Boundary Alignment for Temporal Referential Dialogue

    Authors: Yunlong Tang, Daiki Shimada, Jing Bi, Chenliang Xu

    Abstract: In everyday communication, humans frequently use speech and gestures to refer to specific areas or objects, a process known as Referential Dialogue (RD). While prior studies have investigated RD through Large Language Models (LLMs) or Large Multimodal Models (LMMs) in static contexts, the exploration of Temporal Referential Dialogue (TRD) within audio-visual media remains limited. Two primary chal…

    Submitted 24 March, 2024; originally announced March 2024.

  11. arXiv:2403.05796  [pdf, other]

    cs.CV

    Weakly Supervised Change Detection via Knowledge Distillation and Multiscale Sigmoid Inference

    Authors: Binghao Lu, Caiwen Ding, Jinbo Bi, Dongjin Song

    Abstract: Change detection, which aims to detect spatial changes from a pair of multi-temporal images due to natural or man-made causes, has been widely applied in remote sensing, disaster management, urban management, etc. Most existing change detection approaches, however, are fully supervised and require labor-intensive pixel-level labels. To address this, we develop a novel weakly supervised change dete…

    Submitted 9 March, 2024; originally announced March 2024.

    Comments: Code is available at https://github.com/BinghaoLu/KD-MSI

  12. arXiv:2402.17128  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    OSCaR: Object State Captioning and State Change Representation

    Authors: Nguyen Nguyen, Jing Bi, Ali Vosoughi, Yapeng Tian, Pooyan Fazli, Chenliang Xu

    Abstract: The capability of intelligent models to extrapolate and comprehend changes in object states is a crucial yet demanding aspect of AI research, particularly through the lens of human interaction in real-world settings. This task involves describing complex visual environments, identifying active objects, and interpreting their changes as conveyed through language. Traditional methods, which isolate…

    Submitted 2 April, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

    Comments: NAACL 2024

  13. arXiv:2402.15586  [pdf, other]

    cs.CV cs.CR

    Distilling Adversarial Robustness Using Heterogeneous Teachers

    Authors: Jieren Deng, Aaron Palmer, Rigel Mahmood, Ethan Rathbun, Jinbo Bi, Kaleel Mahmood, Derek Aguiar

    Abstract: Achieving resiliency against adversarial attacks is necessary prior to deploying neural network classifiers in domains where misclassification incurs substantial costs, e.g., self-driving cars or medical imaging. Recent work has demonstrated that robustness can be transferred from an adversarially trained teacher to a student model using knowledge distillation. However, current methods perform dis…

    Submitted 23 February, 2024; originally announced February 2024.

  14. arXiv:2312.17432  [pdf, other]

    cs.CV cs.CL

    Video Understanding with Large Language Models: A Survey

    Authors: Yunlong Tang, Jing Bi, Siting Xu, Luchuan Song, Susan Liang, Teng Wang, Daoan Zhang, Jie An, Jingyang Lin, Rongyi Zhu, Ali Vosoughi, Chao Huang, Zeliang Zhang, Feng Zheng, Jianguo Zhang, Ping Luo, Jiebo Luo, Chenliang Xu

    Abstract: With the burgeoning growth of online video platforms and the escalating volume of video content, the demand for proficient video understanding tools has intensified markedly. Given the remarkable capabilities of Large Language Models (LLMs) in language and multimodal tasks, this survey provides a detailed overview of the recent advancements in video understanding harnessing the power of LLMs (Vid-…

    Submitted 3 January, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

  15. arXiv:2312.07934  [pdf, other]

    eess.IV cs.CV

    Toward Real World Stereo Image Super-Resolution via Hybrid Degradation Model and Discriminator for Implied Stereo Image Information

    Authors: Yuanbo Zhou, Yuyang Xue, Jiang Bi, Wenlin He, Xinlin Zhang, Jiajun Zhang, Wei Deng, Ruofeng Nie, Junlin Lan, Qinquan Gao, Tong Tong

    Abstract: Real-world stereo image super-resolution has a significant influence on enhancing the performance of computer vision systems. Although existing methods for single-image super-resolution can be applied to improve stereo images, these methods often introduce notable modifications to the inherent disparity, resulting in a loss in the consistency of disparity between the original and the enhanced ster…

    Submitted 13 December, 2023; originally announced December 2023.

  16. arXiv:2311.12919  [pdf, other]

    cs.CV cs.AI

    SPOT! Revisiting Video-Language Models for Event Understanding

    Authors: Gengyuan Zhang, Jinhe Bi, Jindong Gu, Yanyu Chen, Volker Tresp

    Abstract: Understanding videos is an important research topic for multimodal learning. Leveraging large-scale datasets of web-crawled video-text pairs as weak supervision has become a pre-training paradigm for learning joint representations and showcased remarkable potential in video understanding tasks. However, videos can be multi-event and multi-grained, while these video-text pairs usually contain only…

    Submitted 1 December, 2023; v1 submitted 21 November, 2023; originally announced November 2023.

  17. arXiv:2310.11699  [pdf, other]

    cs.CL cs.CV

    MISAR: A Multimodal Instructional System with Augmented Reality

    Authors: Jing Bi, Nguyen Manh Nguyen, Ali Vosoughi, Chenliang Xu

    Abstract: Augmented reality (AR) requires the seamless integration of visual, auditory, and linguistic channels for optimized human-computer interaction. While auditory and visual inputs facilitate real-time and contextual user guidance, the potential of large language models (LLMs) in this landscape remains largely untapped. Our study introduces an innovative method harnessing LLMs to assimilate informatio…

    Submitted 18 October, 2023; originally announced October 2023.

    Comments: Accepted at ICCV 2023 - AV4D, 6 figures, 2 tables

  18. arXiv:2307.12732  [pdf, other]

    cs.CV

    CLIP-KD: An Empirical Study of CLIP Model Distillation

    Authors: Chuanguang Yang, Zhulin An, Libo Huang, Junyu Bi, Xinqiang Yu, Han Yang, Boyu Diao, Yongjun Xu

    Abstract: Contrastive Language-Image Pre-training (CLIP) has become a promising language-supervised visual pre-training framework. This paper aims to distill small CLIP models supervised by a large teacher CLIP model. We propose several distillation strategies, including relation, feature, gradient and contrastive paradigms, to examine the effectiveness of CLIP-Knowledge Distillation (KD). We show that a si…

    Submitted 7 May, 2024; v1 submitted 24 July, 2023; originally announced July 2023.

    Comments: CVPR-2024

  19. arXiv:2307.02507  [pdf, other]

    cs.LG cs.AI

    STS-CCL: Spatial-Temporal Synchronous Contextual Contrastive Learning for Urban Traffic Forecasting

    Authors: Lincan Li, Kaixiang Yang, Fengji Luo, Jichao Bi

    Abstract: Efficiently capturing the complex spatiotemporal representations from large-scale unlabeled traffic data remains a challenging task. To address this dilemma, this work employs advanced contrastive learning and proposes a novel Spatial-Temporal Synchronous Contextual Contrastive Learning (STS-CCL) model. First, we elaborate the basic and strong augmentation methods for spatiotempora…

    Submitted 16 December, 2023; v1 submitted 4 July, 2023; originally announced July 2023.

    Comments: This work was accepted by the 49th IEEE International Conference on Acoustics, Speech, & Signal Processing (ICASSP 2024). We will present our work in Seoul, Korea

  20. arXiv:2306.17100  [pdf, other]

    cs.LG cs.AI

    RL4CO: an Extensive Reinforcement Learning for Combinatorial Optimization Benchmark

    Authors: Federico Berto, Chuanbo Hua, Junyoung Park, Laurin Luttmann, Yining Ma, Fanchen Bu, Jiarui Wang, Haoran Ye, Minsu Kim, Sanghyeok Choi, Nayeli Gast Zepeda, André Hottung, Jianan Zhou, Jieyi Bi, Yu Hu, Fei Liu, Hyeonah Kim, Jiwoo Son, Haeyeon Kim, Davide Angioni, Wouter Kool, Zhiguang Cao, Qingfu Zhang, Joungho Kim, Jie Zhang, et al. (8 additional authors not shown)

    Abstract: Deep reinforcement learning (RL) has recently shown significant benefits in solving combinatorial optimization (CO) problems, reducing reliance on domain expertise, and improving computational efficiency. However, the field lacks a unified benchmark for easy development and standardized comparison of algorithms across diverse CO problems. To fill this gap, we introduce RL4CO, a unified and extensi…

    Submitted 21 June, 2024; v1 submitted 29 June, 2023; originally announced June 2023.

    Comments: A previous version was presented as a workshop paper at the NeurIPS 2023 GLFrontiers Workshop (Oral)

  21. arXiv:2306.13699  [pdf, other]

    q-bio.QM cs.AI cs.LG q-bio.BM

    Curvature-enhanced Graph Convolutional Network for Biomolecular Interaction Prediction

    Authors: Cong Shen, Pingjian Ding, Junjie Wee, Jialin Bi, Jiawei Luo, Kelin Xia

    Abstract: Geometric deep learning has demonstrated a great potential in non-Euclidean data analysis. The incorporation of geometric insights into learning architecture is vital to its success. Here we propose a curvature-enhanced graph convolutional network (CGCN) for biomolecular interaction prediction, for the first time. Our CGCN employs Ollivier-Ricci curvature (ORC) to characterize network local struct…

    Submitted 23 June, 2023; originally announced June 2023.

  22. arXiv:2306.09391  [pdf, other]

    q-bio.QM cs.CV cs.LG q-bio.GN

    Multi-omics Prediction from High-content Cellular Imaging with Deep Learning

    Authors: Rahil Mehrizi, Arash Mehrjou, Maryana Alegro, Yi Zhao, Benedetta Carbone, Carl Fishwick, Johanna Vappiani, Jing Bi, Siobhan Sanford, Hakan Keles, Marcus Bantscheff, Cuong Nguyen, Patrick Schwab

    Abstract: High-content cellular imaging, transcriptomics, and proteomics data provide rich and complementary views on the molecular layers of biology that influence cellular states and function. However, the biological determinants through which changes in multi-omics measurements influence cellular morphology have not yet been systematically explored, and the degree to which cell imaging could potentially…

    Submitted 21 May, 2024; v1 submitted 15 June, 2023; originally announced June 2023.

  23. arXiv:2303.10112  [pdf, other]

    cs.LG stat.ME

    Causal Discovery from Temporal Data: An Overview and New Perspectives

    Authors: Chang Gong, Di Yao, Chuzhe Zhang, Wenbin Li, Jingping Bi

    Abstract: Temporal data, representing chronological observations of complex systems, has always been a typical data structure that can be widely generated by many domains, such as industry, medicine and finance. Analyzing this type of data is extremely valuable for various applications. Thus, different temporal data analysis tasks, e.g., classification, clustering and prediction, have been proposed in the pas…

    Submitted 3 August, 2023; v1 submitted 17 March, 2023; originally announced March 2023.

    Comments: 54 pages, 7 figures

  24. arXiv:2303.08518  [pdf, other]

    cs.CL

    UPRISE: Universal Prompt Retrieval for Improving Zero-Shot Evaluation

    Authors: Daixuan Cheng, Shaohan Huang, Junyu Bi, Yuefeng Zhan, Jianfeng Liu, Yujing Wang, Hao Sun, Furu Wei, Denvy Deng, Qi Zhang

    Abstract: Large Language Models (LLMs) are popular for their impressive abilities, but the need for model-specific fine-tuning or task-specific prompt engineering can hinder their generalization. We propose UPRISE (Universal Prompt Retrieval for Improving zero-Shot Evaluation), which tunes a lightweight and versatile retriever that automatically retrieves prompts for a given zero-shot task input. Specifical…

    Submitted 16 December, 2023; v1 submitted 15 March, 2023; originally announced March 2023.

    Comments: EMNLP 2023 Main Conference

  25. arXiv:2211.07518  [pdf, other]

    cs.LG

    Heterogeneous Graph Sparsification for Efficient Representation Learning

    Authors: Chandan Chunduru, Chun Jiang Zhu, Blake Gains, Jinbo Bi

    Abstract: Graph sparsification is a powerful tool to approximate an arbitrary graph and has been used in machine learning over homogeneous graphs. In heterogeneous graphs such as knowledge graphs, however, sparsification has not been systematically exploited to improve efficiency of learning tasks. In this work, we initiate the study on heterogeneous graph sparsification and develop sampling-based algorithm…

    Submitted 14 November, 2022; originally announced November 2022.

    Comments: Accepted and to appear in IEEE BIBM 2022 Workshop

  26. arXiv:2211.05361  [pdf, other]

    cs.LG

    Safety-Constrained Policy Transfer with Successor Features

    Authors: Zeyu Feng, Bowen Zhang, Jianxin Bi, Harold Soh

    Abstract: In this work, we focus on the problem of safe policy transfer in reinforcement learning: we seek to leverage existing policies when learning a new task with specified constraints. This problem is important for safety-critical applications where interactions are costly and unconstrained policies can lead to undesirable or dangerous outcomes, e.g., with physical robots that interact with humans. We…

    Submitted 10 November, 2022; originally announced November 2022.

  27. arXiv:2210.07686  [pdf, other]

    cs.LG cs.AI

    Learning Generalizable Models for Vehicle Routing Problems via Knowledge Distillation

    Authors: Jieyi Bi, Yining Ma, Jiahai Wang, Zhiguang Cao, Jinbiao Chen, Yuan Sun, Yeow Meng Chee

    Abstract: Recent neural methods for vehicle routing problems always train and test the deep models on the same instance distribution (i.e., uniform). To tackle the consequent cross-distribution generalization concerns, we bring knowledge distillation to this field and propose an Adaptive Multi-Distribution Knowledge Distillation (AMDKD) scheme for learning more generalizable deep models. Particularly, o…

    Submitted 19 January, 2023; v1 submitted 14 October, 2022; originally announced October 2022.

    Comments: Accepted at NeurIPS 2022

  28. arXiv:2210.06546  [pdf, other]

    cs.LG stat.ML

    Auto-Encoding Goodness of Fit

    Authors: Aaron Palmer, Zhiyi Chi, Derek Aguiar, Jinbo Bi

    Abstract: For generative autoencoders to learn a meaningful latent representation for data generation, a careful balance must be achieved between reconstruction error and how close the distribution in the latent space is to the prior. However, this balance is challenging to achieve due to a lack of criteria that work both at the mini-batch (local) and aggregated posterior (global) level. Goodness of fit (Go…

    Submitted 12 October, 2022; originally announced October 2022.

  29. arXiv:2210.04132  [pdf, other]

    cs.CR cs.DC cs.LG

    Performances of Symmetric Loss for Private Data from Exponential Mechanism

    Authors: Jing Bi, Vorapong Suppakitpaisarn

    Abstract: This study explores the robustness of learning by symmetric loss on private data. Specifically, we leverage the exponential mechanism (EM) on private labels. First, we theoretically revisit properties of EM when it is used for private learning with symmetric loss. Then, we propose numerical guidance of privacy budgets corresponding to different data scales and utility guarantees. Further, we cond…

    Submitted 8 October, 2022; originally announced October 2022.

    Comments: 14th International Workshop on Parallel and Distributed Algorithms and Applications (PDAA2022)

  30. arXiv:2209.10860  [pdf, other]

    cs.LG cs.AI cs.CY

    SCALES: From Fairness Principles to Constrained Decision-Making

    Authors: Sreejith Balakrishnan, Jianxin Bi, Harold Soh

    Abstract: This paper proposes SCALES, a general framework that translates well-established fairness principles into a common representation based on the Constraint Markov Decision Process (CMDP). With the help of causal language, our framework can place constraints on both the procedure of decision making (procedural fairness) as well as the outcomes resulting from decisions (outcome fairness). Specifically…

    Submitted 22 September, 2022; originally announced September 2022.

    Comments: Accepted to the 2022 AAAI/ACM Conference on AI, Ethics, and Society (AIES '22), Updated version with additional citations, 14 pages

  31. A Length Adaptive Algorithm-Hardware Co-design of Transformer on FPGA Through Sparse Attention and Dynamic Pipelining

    Authors: Hongwu Peng, Shaoyi Huang, Shiyang Chen, Bingbing Li, Tong Geng, Ang Li, Weiwen Jiang, Wujie Wen, Jinbo Bi, Hang Liu, Caiwen Ding

    Abstract: Transformers have been considered among the most important deep learning models since 2018, in part because they establish state-of-the-art (SOTA) records and could potentially replace existing Deep Neural Networks (DNNs). Despite the remarkable triumphs, the prolonged turnaround time of Transformer models is a widely recognized roadblock. The variety of sequence lengths imposes additional computing ov…

    Submitted 20 August, 2022; v1 submitted 7 August, 2022; originally announced August 2022.

    Comments: 2022 59th ACM/IEEE Design Automation Conference (DAC)

    ACM Class: I.2; B.6; C.3

  32. arXiv:2204.12044  [pdf, other]

    cs.LG stat.ML

    ISTRBoost: Importance Sampling Transfer Regression using Boosting

    Authors: Shrey Gupta, Jianzhao Bi, Yang Liu, Avani Wildani

    Abstract: Current Instance Transfer Learning (ITL) methodologies use domain adaptation and sub-space transformation to achieve successful transfer learning. However, these methodologies, in their processes, sometimes overfit on the target dataset or suffer from negative transfer if the test dataset has a high variance. Boosting methodologies have been shown to reduce the risk of overfitting by iteratively r…

    Submitted 25 April, 2022; originally announced April 2022.

  33. arXiv:2201.00689  [pdf, other]

    cs.IR cs.AI

    CausalMTA: Eliminating the User Confounding Bias for Causal Multi-touch Attribution

    Authors: Di Yao, Chang Gong, Lei Zhang, Sheng Chen, Jingping Bi

    Abstract: Multi-touch attribution (MTA), aiming to estimate the contribution of each advertisement touchpoint in conversion journeys, is essential for budget allocation and automated advertising. Existing methods first train a model to predict the conversion probability of the advertisement journeys with historical data and calculate the attribution of each touchpoint using counterfactual predictions. A…

    Submitted 21 July, 2022; v1 submitted 20 December, 2021; originally announced January 2022.

    Comments: 11 pages, 5 figures. This paper has been accepted at KDD 2022

  34. arXiv:2110.01770  [pdf, other]

    cs.CV cs.AI

    Procedure Planning in Instructional Videos via Contextual Modeling and Model-based Policy Learning

    Authors: Jing Bi, Jiebo Luo, Chenliang Xu

    Abstract: Learning new skills by observing humans' behaviors is an essential capability of AI. In this work, we leverage instructional videos to study humans' decision-making processes, focusing on learning a model to plan goal-directed actions in real-life videos. In contrast to conventional action recognition, goal-directed actions are based on expectations of their outcomes requiring causal knowledge of…

    Submitted 8 October, 2021; v1 submitted 4 October, 2021; originally announced October 2021.

    Comments: ICCV 2021 Oral

  35. arXiv:2109.09495  [pdf, other]

    cs.LG cs.NE

    GhostShiftAddNet: More Features from Energy-Efficient Operations

    Authors: Jia Bi, Jonathon Hare, Geoff V. Merrett

    Abstract: Deep convolutional neural networks (CNNs) are computationally and memory intensive. In CNNs, intensive multiplication can have resource implications that may challenge the ability for effective deployment of inference on resource-constrained edge devices. This paper proposes GhostShiftAddNet, where the motivation is to implement a hardware-efficient deep network: a multiplication-free CNN with few…

    Submitted 3 February, 2022; v1 submitted 20 September, 2021; originally announced September 2021.

    Journal ref: The 32nd British Machine Vision Conference BMVC 2021

  36. arXiv:2109.06355  [pdf, other]

    cs.AR

    Optimizing FPGA-based Accelerator Design for Large-Scale Molecular Similarity Search

    Authors: Hongwu Peng, Shiyang Chen, Zhepeng Wang, Junhuan Yang, Scott A. Weitze, Tong Geng, Ang Li, Jinbo Bi, Minghu Song, Weiwen Jiang, Hang Liu, Caiwen Ding

    Abstract: Molecular similarity search has been widely used in drug discovery to identify structurally similar compounds from large molecular databases rapidly. With the increasing size of chemical libraries, there is growing interest in the efficient acceleration of large-scale similarity search. Existing works mainly focus on CPU and GPU to accelerate the computation of the Tanimoto coefficient in measurin…

    Submitted 13 September, 2021; originally announced September 2021.

    Comments: ICCAD 2021

    ACM Class: B.0; I.0

  37. arXiv:2109.00201  [pdf, other]

    cs.LG cs.AI cs.CV

    An Empirical Study on the Joint Impact of Feature Selection and Data Re-sampling on Imbalance Classification

    Authors: Chongsheng Zhang, Paolo Soda, Jingjun Bi, Gaojuan Fan, George Almpanidis, Salvador Garcia

    Abstract: In predictive tasks, real-world datasets often present different degrees of imbalanced (i.e., long-tailed or skewed) distributions. While the majority (the head) classes have sufficient samples, the minority (the tail) classes can be under-represented by a rather limited number of samples. Data pre-processing has been shown to be very effective in dealing with such problems. On one hand, data re-s…

    Submitted 13 September, 2021; v1 submitted 1 September, 2021; originally announced September 2021.

    Comments: 25 pages, 12 figures; revision v1

  38. arXiv:2107.08199  [pdf, other]

    cs.CL cs.LG

    Dynamic Transformer for Efficient Machine Translation on Embedded Devices

    Authors: Hishan Parry, Lei Xun, Amin Sabet, Jia Bi, Jonathon Hare, Geoff V. Merrett

    Abstract: The Transformer architecture is widely used for machine translation tasks. However, its resource-intensive nature makes it challenging to implement on constrained embedded devices, particularly where available hardware resources can vary at run-time. We propose a dynamic machine translation model that scales the Transformer architecture based on the available resources at any particular time. The…

    Submitted 30 July, 2021; v1 submitted 17 July, 2021; originally announced July 2021.

    Comments: Accepted at MLCAD 2021

  39. arXiv:2105.03596  [pdf, other]

    cs.CV

    Dynamic-OFA: Runtime DNN Architecture Switching for Performance Scaling on Heterogeneous Embedded Platforms

    Authors: Wei Lou, Lei Xun, Amin Sabet, Jia Bi, Jonathon Hare, Geoff V. Merrett

    Abstract: Mobile and embedded platforms are increasingly required to efficiently execute computationally demanding DNNs across heterogeneous processing elements. At runtime, the available hardware resources to DNNs can vary considerably due to other concurrently running applications. The performance requirements of the applications could also change under different scenarios. To achieve the desired performa…

    Submitted 11 May, 2021; v1 submitted 8 May, 2021; originally announced May 2021.

    Comments: Accepted at CVPR ECV Workshop 2021

  40. arXiv:2104.05613  [pdf, other]

    cs.LG cs.GT

    An Efficient Algorithm for Deep Stochastic Contextual Bandits

    Authors: Tan Zhu, Guannan Liang, Chunjiang Zhu, Haining Li, Jinbo Bi

    Abstract: In stochastic contextual bandit (SCB) problems, an agent selects an action based on certain observed context to maximize the cumulative reward over iterations. Recently there have been a few studies using a deep neural network (DNN) to predict the expected reward for an action, and the DNN is trained by a stochastic gradient based method. However, convergence analysis has been greatly ignored to e…

    Submitted 21 April, 2021; v1 submitted 12 April, 2021; originally announced April 2021.

    Comments: Accepted by AAAI 2021. Appendix uploaded

  41. arXiv:2103.04413  [pdf, other]

    math.OC cs.LG stat.ML

    Escaping Saddle Points with Stochastically Controlled Stochastic Gradient Methods

    Authors: Guannan Liang, Qianqian Tong, Chunjiang Zhu, Jinbo Bi

    Abstract: Stochastically controlled stochastic gradient (SCSG) methods have been proved to converge efficiently to first-order stationary points which, however, can be saddle points in nonconvex optimization. It has been observed that a stochastic gradient descent (SGD) step introduces anisotropic noise around saddle points for deep learning and non-convex half space learning problems, which indicates that S…

    Submitted 23 April, 2021; v1 submitted 7 March, 2021; originally announced March 2021.

  42. arXiv:2102.09893  [pdf, other]

    cs.LG cs.AI math.OC

    A Variance Controlled Stochastic Method with Biased Estimation for Faster Non-convex Optimization

    Authors: Jia Bi, Steve R. Gunn

    Abstract: In this paper, we propose a new technique, variance controlled stochastic gradient (VCSG), to improve the performance of the stochastic variance reduced gradient (SVRG) algorithm. To avoid over-reducing the variance of the gradient by SVRG, a hyper-parameter $λ$ is introduced in VCSG that is able to control the reduced variance of SVRG. Theory shows that the optimization method can converge by…

    Submitted 19 February, 2021; originally announced February 2021.

  43. arXiv:2101.06861  [pdf, other]

    cs.LG stat.ML

    Discrete Graph Structure Learning for Forecasting Multiple Time Series

    Authors: Chao Shang, Jie Chen, Jinbo Bi

    Abstract: Time series forecasting is an extensively studied subject in statistics, economics, and computer science. Exploration of the correlation and causation among the variables in a multivariate time series shows promise in enhancing the performance of a time series model. When using deep neural networks as forecasting models, we hypothesize that exploiting the pairwise information among multiple (multi…

    Submitted 20 April, 2021; v1 submitted 17 January, 2021; originally announced January 2021.

    Comments: ICLR 2021. Code is available at https://github.com/chaoshangcs/GTS

  44. arXiv:2101.00052  [pdf, other]

    cs.LG

    Federated Nonconvex Sparse Learning

    Authors: Qianqian Tong, Guannan Liang, Tan Zhu, Jinbo Bi

    Abstract: Nonconvex sparse learning plays an essential role in many areas, such as signal processing and deep network compression. Iterative hard thresholding (IHT) methods are the state-of-the-art for nonconvex sparse learning due to their capability of recovering true support and scalability with large datasets. Theoretical analysis of IHT is currently based on centralized IID data. In realistic large-sca…

    Submitted 31 December, 2020; originally announced January 2021.

  45. arXiv:2011.00667  [pdf, other]

    math.OC cs.DC

    Asynchronous Parallel Stochastic Quasi-Newton Methods

    Authors: Qianqian Tong, Guannan Liang, Xingyu Cai, Chunjiang Zhu, Jinbo Bi

    Abstract: Although first-order stochastic algorithms, such as stochastic gradient descent, have been the main force to scale up machine learning models, such as deep neural nets, second-order quasi-Newton methods are starting to draw attention due to their effectiveness in dealing with ill-conditioned optimization problems. The L-BFGS method is one of the most widely used quasi-Newton methods. We propose an as…

    Submitted 1 November, 2020; originally announced November 2020.

    Comments: Accepted by Parallel Computing Journal

  46. arXiv:2010.01381  [pdf, other]

    cs.LG stat.ML

    Cubic Spline Smoothing Compensation for Irregularly Sampled Sequences

    Authors: Jing Shi, Jing Bi, Yingru Liu, Chenliang Xu

    Abstract: The marriage of recurrent neural networks and neural ordinary differential networks (ODE-RNN) is effective in modeling irregularly-observed sequences. While the ODE produces smooth hidden states between observation intervals, the RNN will trigger a hidden state jump when a new observation arrives, thus causing the interpolation discontinuity problem. To address this issue, we propose the cubic splin…

    Submitted 3 October, 2020; originally announced October 2020.

  47. arXiv:2009.06562  [pdf, other]

    cs.LG stat.ML

    Effective Proximal Methods for Non-convex Non-smooth Regularized Learning

    Authors: Guannan Liang, Qianqian Tong, Jiahao Ding, Miao Pan, Jinbo Bi

    Abstract: Sparse learning is a very important tool for mining useful information and patterns from high-dimensional data. Non-convex non-smooth regularized learning problems play essential roles in sparse learning, and have drawn extensive attention recently. We design a family of stochastic proximal gradient methods by applying arbitrary sampling to solve the empirical risk minimization problem with a non…

    Submitted 21 October, 2020; v1 submitted 14 September, 2020; originally announced September 2020.

    Comments: Accepted by ICDM 2020, 24 pages

  48. arXiv:2009.06557  [pdf, other]

    cs.LG stat.ML

    Effective Federated Adaptive Gradient Methods with Non-IID Decentralized Data

    Authors: Qianqian Tong, Guannan Liang, Jinbo Bi

    Abstract: Federated learning allows large numbers of edge computing devices to collaboratively learn a global model without data sharing. The analysis with partial device participation under non-IID and unbalanced data better reflects reality. In this work, we propose federated learning versions of adaptive gradient methods - Federated AGMs - which employ both the first-order and second-order momenta, to alleviate ge…

    Submitted 21 December, 2020; v1 submitted 14 September, 2020; originally announced September 2020.

    Comments: 42 pages

  49. arXiv:2008.13578  [pdf, other]

    cs.LG stat.ML

    Against Membership Inference Attack: Pruning is All You Need

    Authors: Yijue Wang, Chenghong Wang, Zigeng Wang, Shanglin Zhou, Hang Liu, Jinbo Bi, Caiwen Ding, Sanguthevar Rajasekaran

    Abstract: The large model size, high computational operations, and vulnerability against membership inference attack (MIA) have impeded the popularity of deep learning or deep neural networks (DNNs), especially on mobile devices. To address the challenge, we envision that the weight pruning technique will help defend DNNs against MIA while reducing model storage and computational operation. In this work, we propose a pru…

    Submitted 4 July, 2021; v1 submitted 27 August, 2020; originally announced August 2020.

    Comments: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Machine Learning (stat.ML)

    Journal ref: IJCAI, 2021

  50. arXiv:2008.04500  [pdf, other]

    cs.LG cs.CR stat.ML

    Towards Plausible Differentially Private ADMM Based Distributed Machine Learning

    Authors: Jiahao Ding, Jingyi Wang, Guannan Liang, Jinbo Bi, Miao Pan

    Abstract: The Alternating Direction Method of Multipliers (ADMM) and its distributed version have been widely used in machine learning. In the iterations of ADMM, model updates using local private data and model exchanges among agents impose critical privacy concerns. Despite some pioneering works to relieve such concerns, differentially private ADMM still confronts many research challenges. For example, th…

    Submitted 10 August, 2020; originally announced August 2020.

    Comments: Accepted for publication in CIKM'20