Search | arXiv e-print repository

arXiv:2408.02019 [pdf, other]

Personalized Federated Learning on Heterogeneous and Long-Tailed Data via Expert Collaborative Learning

Authors: Fengling Lv, Xinyi Shang, Yang Zhou, Yiqun Zhang, Mengke Li, Yang Lu

Abstract: Personalized Federated Learning (PFL) aims to acquire customized models for each client without disclosing raw data by leveraging the collective knowledge of distributed clients. However, the data collected in real-world scenarios is likely to follow a long-tailed distribution. For example, in the medical domain, general health notes are usually far more numerous than notes specifically related to certain diseases. The presence of long-tailed data can significantly degrade the performance of PFL models. Additionally, due to the diverse environments in which each client operates, data heterogeneity is also a classic challenge in federated learning. In this paper, we explore the joint problem of global long-tailed distribution and data heterogeneity in PFL and propose a method called Expert Collaborative Learning (ECL) to tackle this problem. Specifically, each client has multiple experts, and each expert has a different training subset, which ensures that each class, especially the minority classes, receives sufficient training. Multiple experts collaborate synergistically to produce the final prediction output. Without special bells and whistles, the vanilla ECL outperforms other state-of-the-art PFL methods on several benchmark datasets under different degrees of data heterogeneity and long-tailed distribution.

Submitted 4 August, 2024; originally announced August 2024.

arXiv:2408.00799 [pdf, other]

Deep Uncertainty-Based Explore for Index Construction and Retrieval in Recommendation System

Authors: Xin Jiang, Kaiqiang Wang, Yinlong Wang, Fengchang Lv, Taiyang Peng, Shuai Yang, Xianteng Wu, Pengye Zhang, Shuo Yuan, Yifan Zeng

Abstract: In recommendation systems, the relevance and novelty of the final results are selected through a cascade system of Matching -> Ranking -> Strategy. The matching model serves as the starting point of the pipeline and determines the upper bound of the subsequent stages. Balancing the relevance and novelty of matching results is a crucial step in the design and optimization of recommendation systems, contributing significantly to improving recommendation quality. However, typical matching algorithms have not addressed relevance and novelty simultaneously. One main reason is that deep matching algorithms exhibit significant uncertainty when estimating items in the long tail (e.g., due to insufficient training samples). The uncertainty not only affects the training of the models but also influences the confidence in the index construction and beam search retrieval process of these models. This paper proposes the UICR (Uncertainty-based explore for Index Construction and Retrieval) algorithm, which introduces the concept of uncertainty modeling in the matching stage and achieves multi-task modeling of model uncertainty and index uncertainty. The final matching results are obtained by combining the relevance score and uncertainty score inferred by the model. Experimental results demonstrate that UICR improves novelty without sacrificing relevance in real-world industrial production environments and on multiple open-source datasets. Remarkably, online A/B test results of display advertising in Shopee demonstrate the effectiveness of the proposed algorithm.

Submitted 5 August, 2024; v1 submitted 21 July, 2024; originally announced August 2024.

Comments: Accepted by CIKM 2024

arXiv:2406.12193 [pdf, other]

Adaptive Collaborative Correlation Learning-based Semi-Supervised Multi-Label Feature Selection

Authors: Yanyong Huang, Li Yang, Dongjie Wang, Ke Li, Xiuwen Yi, Fengmao Lv, Tianrui Li

Abstract: Semi-supervised multi-label feature selection has recently been developed to solve the curse of dimensionality problem in high-dimensional multi-label data with certain samples missing labels. Although many efforts have been made, most existing methods use a predefined graph approach to capture the sample similarity or the label correlation. In this manner, the presence of noise and outliers within the original feature space can undermine the reliability of the resulting sample similarity graph. It also fails to precisely depict the label correlation due to the existence of unknown labels. Besides, these methods only consider the discriminative power of selected features, while neglecting their redundancy. In this paper, we propose an Adaptive Collaborative Correlation lEarning-based Semi-Supervised Multi-label Feature Selection (Access-MFS) method to address these issues. Specifically, a generalized regression model equipped with an extended uncorrelated constraint is introduced to select discriminative yet irrelevant features and maintain consistency between predicted and ground-truth labels in labeled data, simultaneously. Then, the instance correlation and label correlation are integrated into the proposed regression model to adaptively learn both the sample similarity graph and the label similarity graph, which mutually enhance feature selection performance. Extensive experimental results demonstrate the superiority of the proposed Access-MFS over other state-of-the-art methods.

Submitted 25 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.00644 [pdf, other]

Ultrasound Report Generation with Cross-Modality Feature Alignment via Unsupervised Guidance

Authors: Jun Li, Tongkun Su, Baoliang Zhao, Faqin Lv, Qiong Wang, Nassir Navab, Ying Hu, Zhongliang Jiang

Abstract: Automatic report generation has arisen as a significant research area in computer-aided diagnosis, aiming to alleviate the burden on clinicians by generating reports automatically based on medical images. In this work, we propose a novel framework for automatic ultrasound report generation, leveraging a combination of unsupervised and supervised learning methods to aid the report generation process. Our framework incorporates unsupervised learning methods to extract potential knowledge from ultrasound text reports, serving as the prior information to guide the model in aligning visual and textual features, thereby addressing the challenge of feature discrepancy. Additionally, we design a global semantic comparison mechanism to enhance the performance of generating more comprehensive and accurate medical reports. To enable the implementation of ultrasound report generation, we constructed three large-scale ultrasound image-text datasets from different organs for training and validation purposes. Extensive comparisons with other state-of-the-art approaches demonstrate its superior performance across all three datasets. Code and dataset are available at this link.

Submitted 2 June, 2024; originally announced June 2024.

arXiv:2404.00226 [pdf, other]

Design as Desired: Utilizing Visual Question Answering for Multimodal Pre-training

Authors: Tongkun Su, Jun Li, Xi Zhang, Haibo Jin, Hao Chen, Qiong Wang, Faqin Lv, Baoliang Zhao, Yin Hu

Abstract: Multimodal pre-training demonstrates its potential in the medical domain, which learns medical visual representations from paired medical reports. However, many pre-training tasks require extra annotations from clinicians, and most of them fail to explicitly guide the model to learn the desired features of different pathologies. To the best of our knowledge, we are the first to utilize Visual Question Answering (VQA) for multimodal pre-training to guide the framework to focus on targeted pathological features. In this work, we leverage descriptions in medical reports to design multi-granular question-answer pairs associated with different diseases, which assist the framework in pre-training without requiring extra annotations from experts. We also propose a novel pre-training framework with a quasi-textual feature transformer, a module designed to transform visual features into a quasi-textual space closer to the textual domain via a contrastive learning strategy. This narrows the vision-language gap and facilitates modality alignment. Our framework is applied to four downstream tasks: report generation, classification, segmentation, and detection across five datasets. Extensive experiments demonstrate the superiority of our framework compared to other state-of-the-art methods. Our code will be released upon acceptance.

Submitted 8 April, 2024; v1 submitted 29 March, 2024; originally announced April 2024.

arXiv:2403.07392 [pdf, other]

ViT-CoMer: Vision Transformer with Convolutional Multi-scale Feature Interaction for Dense Predictions

Authors: Chunlong Xia, Xinliang Wang, Feng Lv, Xin Hao, Yifeng Shi

Abstract: Although Vision Transformer (ViT) has achieved significant success in computer vision, it does not perform well in dense prediction tasks due to the lack of inner-patch information interaction and the limited diversity of feature scale. Most existing studies are devoted to designing vision-specific transformers to solve the above problems, which introduce additional pre-training costs. Therefore, we present a plain, pre-training-free, and feature-enhanced ViT backbone with Convolutional Multi-scale feature interaction, named ViT-CoMer, which facilitates bidirectional interaction between CNN and transformer. Compared to the state-of-the-art, ViT-CoMer has the following advantages: (1) We inject spatial pyramid multi-receptive field convolutional features into the ViT architecture, which effectively alleviates the problems of limited local information interaction and single-feature representation in ViT. (2) We propose a simple and efficient CNN-Transformer bidirectional fusion interaction module that performs multi-scale fusion across hierarchical features, which is beneficial for handling dense prediction tasks. (3) We evaluate the performance of ViT-CoMer across various dense prediction tasks, different frameworks, and multiple advanced pre-training. Notably, our ViT-CoMer-L achieves 64.3% AP on COCO val2017 without extra training data, and 62.1% mIoU on ADE20K val, both of which are comparable to state-of-the-art methods. We hope ViT-CoMer can serve as a new backbone for dense prediction tasks to facilitate future research. The code will be released at https://github.com/Traffic-X/ViT-CoMer.

Submitted 27 March, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

Comments: CVPR2024

arXiv:2403.04338 [pdf, other]

doi 10.1103/PhysRevB.109.195301

Structural disorder-induced topological phase transitions in quasicrystals

Authors: Tan Peng, Yong-Chen Xiong, Chun-Bo Hua, Zheng-Rong Liu, Xiaolu Zhu, Wei Cao, Fang Lv, Yue Hou, Bin Zhou, Ziyu Wang, Rui Xiong

Abstract: Recently, the structural disorder-induced topological phase transitions in periodic systems have attracted much attention. However, in aperiodic systems such as quasicrystalline systems, the interplay between structural disorder and band topology is still unclear. In this work, we investigate the effects of structural disorder on a quantum spin Hall insulator phase and a higher-order topological phase in a two-dimensional Ammann-Beenker tiling quasicrystalline lattice, respectively. We demonstrate that the structural disorder can induce a topological phase transition from a quasicrystalline normal insulator phase to an amorphous quantum spin Hall insulator phase, which is confirmed by bulk gap closing and reopening, robust edge states, quantized spin Bott index and conductance. Furthermore, the structural disorder-induced higher-order topological phase transition from a quasicrystalline normal insulator phase to an amorphous higher-order topological phase characterized by quantized quadrupole moment and topological corner states is also found. More strikingly, the disorder-induced higher-order topological insulator with eight corner states represents a distinctive topological state that eludes realization in conventional crystalline systems. Our work extends the study of the interplay between disorder effects and topologies to quasicrystalline and amorphous systems.

Submitted 7 March, 2024; originally announced March 2024.

Comments: 9 pages, 7 figures. arXiv admin note: text overlap with arXiv:2108.04971

Journal ref: Phys. Rev. B 109, 195301 (2024)

arXiv:2402.06165 [pdf, other]

Learning Contrastive Feature Representations for Facial Action Unit Detection

Authors: Ziqiao Shang, Bin Liu, Fengmao Lv, Fei Teng, Tianrui Li

Abstract: Facial action unit (AU) detection has long encountered the challenge of detecting subtle feature differences when AUs activate. Existing methods often rely on encoding pixel-level information of AUs, which not only encodes additional redundant information but also leads to increased model complexity and limited generalizability. Additionally, the accuracy of AU detection is negatively impacted by the class imbalance issue of each AU type, and the presence of noisy and false AU labels. In this paper, we introduce a novel contrastive learning framework aimed at AU detection that incorporates both self-supervised and supervised signals, thereby enhancing the learning of discriminative features for accurate AU detection. To tackle the class imbalance issue, we employ a negative sample re-weighting strategy that adjusts the step size of updating parameters for minority and majority class samples. Moreover, to address the challenges posed by noisy and false AU labels, we employ a sampling technique that encompasses three distinct types of positive sample pairs. This enables us to inject self-supervised signals into the supervised signal, effectively mitigating the adverse effects of noisy labels. Our experimental assessments, conducted on four widely-utilized benchmark datasets (BP4D, DISFA, GFT and Aff-Wild2), underscore the superior performance of our approach compared to state-of-the-art methods of AU detection. Our code is available at https://github.com/Ziqiao-Shang/AUNCE.

Submitted 12 July, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

Comments: 13 pages, 17 figures, submitted to IEEE Transactions on Circuits and Systems for Video Technology (TCSVT)

arXiv:2401.11734 [pdf, other]

Colorectal Polyp Segmentation in the Deep Learning Era: A Comprehensive Survey

Authors: Zhenyu Wu, Fengmao Lv, Chenglizhao Chen, Aimin Hao, Shuo Li

Abstract: Colorectal polyp segmentation (CPS), an essential problem in medical image analysis, has garnered growing research attention. Recently, deep learning-based models have completely overwhelmed traditional methods in the field of CPS, and more and more deep CPS methods have emerged, bringing CPS into the deep learning era. To help researchers quickly grasp the main techniques, datasets, evaluation metrics, challenges, and trends of deep CPS, this paper presents a systematic and comprehensive review of deep-learning-based CPS methods from 2014 to 2023, a total of 115 technical papers. In particular, we first provide a comprehensive review of the current deep CPS with a novel taxonomy, including network architectures, level of supervision, and learning paradigm. More specifically, network architectures include eight subcategories, the level of supervision comprises six subcategories, and the learning paradigm encompasses 12 subcategories, totaling 26 subcategories. Then, we provide a comprehensive analysis of the characteristics of each dataset, including the number of datasets, annotation types, image resolution, polyp size, contrast values, and polyp location. Following that, we summarize CPS's commonly used evaluation metrics and conduct a detailed analysis of 40 deep SOTA models, including out-of-distribution generalization and attribute-based performance analysis. Finally, we discuss the main challenges and opportunities of deep learning-based CPS methods.

Submitted 22 January, 2024; originally announced January 2024.

Comments: 21 pages, 8 figures

arXiv:2401.10747

Multimodal Sentiment Analysis with Missing Modality: A Knowledge-Transfer Approach

Authors: Weide Liu, Huijing Zhan, Hao Chen, Fengmao Lv

Abstract: Multimodal sentiment analysis aims to identify the emotions expressed by individuals through visual, language, and acoustic cues. However, most of the existing research efforts assume that all modalities are available during both training and testing, making their algorithms susceptible to the missing modality scenario. In this paper, we propose a novel knowledge-transfer network to translate between different modalities to reconstruct the missing audio modalities. Moreover, we develop a cross-modality attention mechanism to retain the maximal information of the reconstructed and observed modalities for sentiment prediction. Extensive experiments on three publicly available datasets demonstrate significant improvements over baselines and achieve comparable results to the previous methods with complete multi-modality supervision.

Submitted 10 July, 2024; v1 submitted 28 December, 2023; originally announced January 2024.

Comments: We request to withdraw our paper from the archive due to significant errors identified in the analysis and conclusions. Upon further review, we realized that these errors undermine the validity of our findings. We plan to conduct additional research to correct these issues and resubmit a revised version in the future

arXiv:2401.10549 [pdf, other]

Unified View Imputation and Feature Selection Learning for Incomplete Multi-view Data

Authors: Yanyong Huang, Zongxin Shen, Tianrui Li, Fengmao Lv

Abstract: Although multi-view unsupervised feature selection (MUFS) is an effective technology for reducing dimensionality in machine learning, existing methods cannot directly deal with incomplete multi-view data where some samples are missing in certain views. These methods should first apply predetermined values to impute missing data, then perform feature selection on the complete dataset. Separating imputation and feature selection processes fails to capitalize on the potential synergy where local structural information gleaned from feature selection could guide the imputation, thereby improving the feature selection performance in turn. Additionally, previous methods only focus on leveraging samples' local structure information, while ignoring the intrinsic locality of the feature space. To tackle these problems, a novel MUFS method, called UNified view Imputation and Feature selectIon lEaRning (UNIFIER), is proposed. UNIFIER explores the local structure of multi-view data by adaptively learning similarity-induced graphs from both the sample and feature spaces. Then, UNIFIER dynamically recovers the missing views, guided by the sample and feature similarity graphs during the feature selection procedure. Furthermore, the half-quadratic minimization technique is used to automatically weight different instances, alleviating the impact of outliers and unreliable restored data. Comprehensive experimental results demonstrate that UNIFIER outperforms other state-of-the-art methods.

Submitted 19 January, 2024; originally announced January 2024.

arXiv:2310.10008 [pdf, other]

Towards Unified and Effective Domain Generalization

Authors: Yiyuan Zhang, Kaixiong Gong, Xiaohan Ding, Kaipeng Zhang, Fangrui Lv, Kurt Keutzer, Xiangyu Yue

Abstract: We propose UniDG, a novel and Unified framework for Domain Generalization that is capable of significantly enhancing the out-of-distribution generalization performance of foundation models regardless of their architectures. The core idea of UniDG is to finetune models during the inference stage, which saves the cost of iterative training. Specifically, we encourage models to learn the distribution of test data in an unsupervised manner and impose a penalty regarding the updating step of model parameters. The penalty term can effectively reduce the catastrophic forgetting issue as we would like to maximally preserve the valuable knowledge in the original model. Empirically, across 12 visual backbones, including CNN-, MLP-, and Transformer-based models, ranging from 1.89M to 303M parameters, UniDG shows an average accuracy improvement of +5.4% on DomainBed. These performance results demonstrate the superiority and versatility of UniDG. The code is publicly available at https://github.com/invictus717/UniDG

Submitted 15 October, 2023; originally announced October 2023.

Comments: Project Website: https://invictus717.github.io/Generalization/

arXiv:2308.00721 [pdf, other]

A Pre-trained Data Deduplication Model based on Active Learning

Authors: Xinyao Liu, Shengdong Du, Fengmao Lv, Hongtao Xue, Jie Hu, Tianrui Li

Abstract: In the era of big data, the issue of data quality has become increasingly prominent. One of the main challenges is the problem of duplicate data, which can arise from repeated entry or the merging of multiple data sources. These "dirty data" problems can significantly limit the effective application of big data. To address the issue of data deduplication, we propose a pre-trained deduplication model based on active learning, which is the first work that utilizes active learning to address the problem of deduplication at the semantic level. The model is built on a pre-trained Transformer and fine-tuned to solve the deduplication problem as a sequence-to-classification task. It is the first to integrate the Transformer with active learning in an end-to-end architecture to select the most valuable data for deduplication model training, and the first to employ the R-Drop method to perform data augmentation on each round of labeled data, which can reduce the cost of manual labeling and improve the model's performance. Experimental results demonstrate that our proposed model outperforms the previous state of the art (SOTA) for deduplicated data identification, achieving up to a 28% improvement in Recall score on benchmark datasets.

Submitted 20 March, 2024; v1 submitted 30 July, 2023; originally announced August 2023.

arXiv:2304.07858 [pdf, other]

Cold-Start based Multi-Scenario Ranking Model for Click-Through Rate Prediction

Authors: Peilin Chen, Hong Wen, Jing Zhang, Fuyu Lv, Zhao Li, Qijie Shen, Wanjie Tao, Ying Zhou, Chao Zhang

Abstract: Online travel platforms (OTPs), e.g., Ctrip.com or Fliggy.com, can effectively provide travel-related products or services to users. In this paper, we focus on the multi-scenario click-through rate (CTR) prediction, i.e., training a unified model to serve all scenarios. Existing multi-scenario CTR methods struggle in the OTP setting because they ignore cold-start users, who have very limited data. To fill this gap, we propose a novel method named Cold-Start based Multi-scenario Network (CSMN). Specifically, it consists of two basic components: 1) User Interest Projection Network (UIPN), which first purifies users' behaviors by eliminating the scenario-irrelevant information in behaviors with respect to the visiting scenario, followed by obtaining users' scenario-specific interests by summarizing the purified behaviors with respect to the target item via an attention mechanism; and 2) User Representation Memory Network (URMN), which benefits cold-start users from users with rich behaviors through a memory read and write mechanism. CSMN seamlessly integrates both components in an end-to-end learning framework. Extensive experiments on a real-world offline dataset and an online A/B test demonstrate the superiority of CSMN over state-of-the-art methods.

Submitted 16 April, 2023; originally announced April 2023.

Comments: accepted by DASFAA'23 as a Research Paper

arXiv:2304.06051 [pdf, other]

Open-TransMind: A New Baseline and Benchmark for 1st Foundation Model Challenge of Intelligent Transportation

Authors: Yifeng Shi, Feng Lv, Xinliang Wang, Chunlong Xia, Shaojie Li, Shujie Yang, Teng Xi, Gang Zhang

Abstract: With the continuous improvement of computing power and deep learning algorithms in recent years, the foundation model has grown in popularity. Because of its powerful capabilities and excellent performance, this technology is being adopted and applied by an increasing number of industries. In the intelligent transportation industry, artificial intelligence faces the following typical challenges: few shots, poor generalization, and a lack of multi-modal techniques. Foundation model technology can significantly alleviate the aforementioned issues. To address these, we designed the 1st Foundation Model Challenge, with the goal of increasing the popularity of foundation model technology in traffic scenarios and promoting the rapid development of the intelligent transportation industry. The challenge is divided into two tracks: all-in-one and cross-modal image retrieval. Furthermore, we provide a new baseline and benchmark for the two tracks, called Open-TransMind. To our knowledge, Open-TransMind is the first open-source transportation foundation model with multi-task and multi-modal capabilities. Simultaneously, Open-TransMind can achieve state-of-the-art performance on detection, classification, and segmentation datasets of traffic scenarios. Our source code is available at https://github.com/Traffic-X/Open-TransMind.

Submitted 7 June, 2023; v1 submitted 12 April, 2023; originally announced April 2023.

arXiv:2304.04377 [pdf, other]

doi 10.1145/3539618.3591859

Delving into E-Commerce Product Retrieval with Vision-Language Pre-training

Authors: Xiaoyang Zheng, Fuyu Lv, Zilong Wang, Qingwen Liu, Xiaoyi Zeng

Abstract: E-commerce search engines comprise a retrieval phase and a ranking phase, where the first one returns a candidate product set given user queries. Recently, vision-language pre-training, combining textual information with visual clues, has been popular in the application of retrieval tasks. In this paper, we propose a novel V+L pre-training method to solve the retrieval problem in Taobao Search. We design a visual pre-training task based on contrastive learning, outperforming common regression-based visual pre-training tasks. In addition, we adopt two negative sampling schemes, tailored for the large-scale retrieval task. Besides, we introduce the details of the online deployment of our proposed method in real-world situations. Extensive offline/online experiments demonstrate the superior performance of our method on the retrieval task. Our proposed method is employed as one retrieval channel of Taobao Search and serves hundreds of millions of users in real time.

Submitted 17 April, 2023; v1 submitted 10 April, 2023; originally announced April 2023.

Comments: 5 pages, 4 figures, accepted to SIRIP 2023

arXiv:2304.03679 [pdf, other]

T2Ranking: A large-scale Chinese Benchmark for Passage Ranking

Authors: Xiaohui Xie, Qian Dong, Bingning Wang, Feiyang Lv, Ting Yao, Weinan Gan, Zhijing Wu, Xiangsheng Li, Haitao Li, Yiqun Liu, Jin Ma

Abstract: Passage ranking involves two stages: passage retrieval and passage re-ranking, which are important and challenging topics for both academics and industries in the area of Information Retrieval (IR). However, the commonly-used datasets for passage ranking usually focus on the English language. For non-English scenarios, such as Chinese, the existing datasets are limited in terms of data scale, fine-grained relevance annotation and false negative issues. To address this problem, we introduce T2Ranking, a large-scale Chinese benchmark for passage ranking. T2Ranking comprises more than 300K queries and over 2M unique passages from real-world search engines. Expert annotators are recruited to provide 4-level graded relevance scores (fine-grained) for query-passage pairs instead of binary relevance judgments (coarse-grained). To ease the false negative issues, more passages with higher diversities are considered when performing relevance annotations, especially in the test set, to ensure a more accurate evaluation. Apart from the textual query and passage data, other auxiliary resources are also provided, such as query types and XML files of documents which passages are generated from, to facilitate further studies. To evaluate the dataset, commonly used ranking models are implemented and tested on T2Ranking as baselines. The experimental results show that T2Ranking is challenging and there is still scope for improvement. The full data and all codes are available at https://github.com/THUIR/T2Ranking/

Submitted 7 April, 2023; originally announced April 2023.

Comments: This Resource paper has been accepted by the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2023)

arXiv:2303.13826 [pdf, other]

Hard Sample Matters a Lot in Zero-Shot Quantization

Authors: Huantong Li, Xiangmiao Wu, Fanbing Lv, Daihai Liao, Thomas H. Li, Yonggang Zhang, Bo Han, Mingkui Tan

Abstract: Zero-shot quantization (ZSQ) is promising for compressing and accelerating deep neural networks when the data for training full-precision models are inaccessible. In ZSQ, network quantization is performed using synthetic samples, thus, the performance of quantized models depends heavily on the quality of synthetic samples. Nonetheless, we find that the synthetic samples constructed in existing ZSQ methods can be easily fitted by models. Accordingly, quantized models obtained by these methods suffer from significant performance degradation on hard samples. To address this issue, we propose HArd sample Synthesizing and Training (HAST). Specifically, HAST pays more attention to hard samples when synthesizing samples and makes synthetic samples hard to fit when training quantized models. HAST aligns features extracted by full-precision and quantized models to ensure the similarity between features extracted by these two models. Extensive experiments show that HAST significantly outperforms existing ZSQ methods, achieving performance comparable to models that are quantized with real data.

Submitted 24 March, 2023; originally announced March 2023.

Comments: 12 pages, CVPR 2023

arXiv:2303.13297 [pdf, other]

Improving Generalization with Domain Convex Game

Authors: Fangrui Lv, Jian Liang, Shuang Li, Jinming Zhang, Di Liu

Abstract: Domain generalization (DG) tends to alleviate the poor generalization capability of deep neural networks by learning a model with multiple source domains. A classical solution to DG is domain augmentation, the common belief of which is that diversifying source domains will be conducive to out-of-distribution generalization. However, these claims are understood intuitively, rather than mathematically. Our explorations empirically reveal that the correlation between model generalization and the diversity of domains may not be strictly positive, which limits the effectiveness of domain augmentation. This work therefore aims to guarantee and further enhance the validity of this strand. To this end, we propose a new perspective on DG that recasts it as a convex game between domains. We first encourage each diversified domain to enhance model generalization by elaborately designing a regularization term based on supermodularity. Meanwhile, a sample filter is constructed to eliminate low-quality samples, thereby avoiding the impact of potentially harmful information. Our framework presents a new avenue for the formal analysis of DG; heuristic analysis and extensive experiments demonstrate its rationality and effectiveness.

Submitted 23 March, 2023; originally announced March 2023.

Comments: accepted by CVPR 2023

arXiv:2301.01970 [pdf, other]

CAT: LoCalization and IdentificAtion Cascade Detection Transformer for Open-World Object Detection

Authors: Shuailei Ma, Yuefeng Wang, Jiaqi Fan, Ying Wei, Thomas H. Li, Hongli Liu, Fanbing Lv

Abstract: Open-world object detection (OWOD), as a more general and challenging goal, requires the model trained from data on known objects to detect both known and unknown objects and incrementally learn to identify these unknown objects. The existing works which employ a standard detection framework and a fixed pseudo-labelling mechanism (PLM) have the following problems: (i) The inclusion of detecting unknown objects substantially reduces the model's ability to detect known ones. (ii) The PLM does not adequately utilize the prior knowledge of inputs. (iii) The fixed selection manner of PLM cannot guarantee that the model is trained in the right direction. We observe that humans subconsciously prefer to focus on all foreground objects and then identify each one in detail, rather than localize and identify a single object simultaneously, for alleviating the confusion. This motivates us to propose a novel solution called CAT: LoCalization and IdentificAtion Cascade Detection Transformer, which decouples the detection process via the shared decoder in the cascade decoding way. Meanwhile, we propose the self-adaptive pseudo-labelling mechanism, which combines the model-driven with input-driven PLM and self-adaptively generates robust pseudo-labels for unknown objects, significantly improving the ability of CAT to retrieve unknown objects. Comprehensive experiments on two benchmark datasets, i.e., MS-COCO and PASCAL VOC, show that our model outperforms the state-of-the-art in terms of all metrics in the task of OWOD, incremental object detection (IOD) and open-set detection.

Submitted 27 March, 2023; v1 submitted 5 January, 2023; originally announced January 2023.

Comments: CVPR 2023 camera-ready version

arXiv:2210.17180 [pdf, other]

doi 10.1109/TCSVT.2024.3395463

Automated Dominative Subspace Mining for Efficient Neural Architecture Search

Authors: Yaofo Chen, Yong Guo, Daihai Liao, Fanbing Lv, Hengjie Song, James Tin-Yau Kwok, Mingkui Tan

Abstract: Neural Architecture Search (NAS) aims to automatically find effective architectures within a predefined search space. However, the search space is often extremely large. As a result, directly searching in such a large search space is non-trivial and also very time-consuming. To address the above issues, in each search step, we seek to limit the search space to a small but effective subspace to boost both the search performance and search efficiency. To this end, we propose a novel Neural Architecture Search method via Dominative Subspace Mining (DSM-NAS) that finds promising architectures in automatically mined subspaces. Specifically, we first perform a global search, i.e., dominative subspace mining, to find a good subspace from a set of candidates. Then, we perform a local search within the mined subspace to find effective architectures. More critically, we further boost search performance by taking well-designed/searched architectures to initialize candidate subspaces. Experimental results demonstrate that DSM-NAS not only reduces the search cost but also discovers better architectures than state-of-the-art methods in various benchmark search spaces.

Submitted 6 June, 2024; v1 submitted 31 October, 2022; originally announced October 2022.

Comments: Published in IEEE TCSVT

arXiv:2209.13947 [pdf, ps, other]

$^{197}$Au($γ,\,xn;\,x\,=\,1\thicksim9$) Reaction Cross Section Measurements using Laser-Driven Ultra-Intense $γ$-Ray Source

Authors: D. Wu, H. Y. Lan, J. Y. Zhang, J. X. Liu, H. G. Lu, J. F. Lv, X. Z. Wu, H. Zhang, J. Cai, Q. Y. Ma, Y. H. Xia, Z. N. Wang, M. Z. Wang, Z. Y. Yang, X. L. Xu, Y. X. Geng, Y. Y. Zhao, C. Lin, W. J. Ma, J. Q. Yu, H. R. Wang, F. L. Liu, C. Y. He, B. Guo, P. Zhu, et al. (4 additional authors not shown)

Abstract: We present a new method for the measurements of photonuclear reaction flux-weighted average cross sections and isomeric ratios using a laser-driven bremsstrahlung $γ$-ray source. An ultra-bright ultra-fast 60$\,\thicksim\,$250 MeV bremsstrahlung $γ$-ray source was established using the 200 TW laser facility in the Compact Laser Plasma Accelerator Laboratory, Peking University, which could cover the energy range from knocking out neutrons to producing pions. Stable quasi-monoenergetic electron beams were generated via laser wakefield acceleration with a charge of 300$\,\thicksim\,$600 pC per shot. The averaged $γ$-ray intensities ($\geqslant$8 MeV) were higher than 10$^{8}$ per shot and the instantaneous intensities can reach above 10$^{19}$ s$^{-1}$ with a duration time about 6.7 ps. $^{65}$Cu($γ,\,n$)$^{64}$Cu and $^{27}$Al($γ,\,x$)$^{24}$Na reactions were used as $γ$-ray flux monitors in the experiments. The flux-weighted average cross sections and isomeric ratios of $^{197}$Au($γ,\,xn;\,x\,=\,1\thicksim9$) reactions were analyzed through activation measurements. The results showed good agreement with previous works and proved this method to be accurate. The $^{197}$Au($γ,\,xn;\,x\,=\,7\thicksim\,9$) reaction cross sections were first achieved with the highest threshold energy of 71.410 MeV. Theoretical cross sections of TALYS 1.9 were calculated to compare with experiment results. This method offered a unique way of gaining insight into photonuclear reaction research, especially for short-lived isomers which extremely lack experimental data.

Submitted 23 November, 2023; v1 submitted 28 September, 2022; originally announced September 2022.

arXiv:2208.09736 [pdf, other]

C$^{2}$IMUFS: Complementary and Consensus Learning-based Incomplete Multi-view Unsupervised Feature Selection

Authors: Yanyong Huang, Zongxin Shen, Yuxin Cai, Xiuwen Yi, Dongjie Wang, Fengmao Lv, Tianrui Li

Abstract: Multi-view unsupervised feature selection (MUFS) has been demonstrated as an effective technique to reduce the dimensionality of multi-view unlabeled data. The existing methods assume that all views are complete. However, multi-view data are usually incomplete, i.e., a part of the instances are present in some views but not all views. Besides, learning the complete similarity graph, as an important and promising technique in existing MUFS methods, cannot be achieved due to the missing views. In this paper, we propose a complementary and consensus learning-based incomplete multi-view unsupervised feature selection method (C$^{2}$IMUFS) to address the aforementioned issues. Concretely, C$^{2}$IMUFS integrates feature selection into an extended weighted non-negative matrix factorization model equipped with adaptive learning of view-weights and a sparse $\ell_{2,p}$-norm, which can offer better adaptability and flexibility. By the sparse linear combinations of multiple similarity matrices derived from different views, a complementary learning-guided similarity matrix reconstruction model is presented to obtain the complete similarity graph in each view. Furthermore, C$^{2}$IMUFS learns a consensus clustering indicator matrix across different views and embeds it into a spectral graph term to preserve the local geometric structure. Comprehensive experimental results on real-world datasets demonstrate the effectiveness of C$^{2}$IMUFS compared with state-of-the-art methods.

Submitted 20 August, 2022; originally announced August 2022.

arXiv:2208.03030 [pdf, other]

ChiQA: A Large Scale Image-based Real-World Question Answering Dataset for Multi-Modal Understanding

Authors: Bingning Wang, Feiyang Lv, Ting Yao, Yiming Yuan, Jin Ma, Yu Luo, Haijin Liang

Abstract: Visual question answering is an important task in both natural language and vision understanding. However, in most of the public visual question answering datasets such as VQA and CLEVR, the questions are human-generated and specific to the given image, such as 'What color are her eyes?'. The human-generated crowdsourcing questions are relatively simple and sometimes have a bias toward certain entities or attributes. In this paper, we introduce a new image-based question answering dataset, ChiQA. It contains the real-world queries issued by internet users, combined with several related open-domain images. The system should determine whether the image could answer the question or not. Different from previous VQA datasets, the questions are real-world image-independent queries that are more various and unbiased. Compared with previous image-retrieval or image-caption datasets, ChiQA not only measures the relatedness but also measures the answerability, which demands more fine-grained vision and language reasoning. ChiQA contains more than 40K questions and more than 200K question-image pairs. A three-level 2/1/0 label is assigned to each pair, indicating perfect answer, partial answer and irrelevant. Data analysis shows ChiQA requires a deep understanding of both language and vision, including grounding, comparisons, and reading. We evaluate several state-of-the-art visual-language models such as ALBEF, demonstrating that there is still large room for improvement on ChiQA.

Submitted 5 August, 2022; originally announced August 2022.

Comments: CIKM2022 camera ready version

arXiv:2207.02468 [pdf, other]

Re-weighting Negative Samples for Model-Agnostic Matching

Authors: Jiazhen Lou, Hong Wen, Fuyu Lv, Jing Zhang, Tengfei Yuan, Zhao Li

Abstract: Recommender Systems (RS), as an efficient tool to discover users' interested items from a very large corpus, has attracted more and more attention from academia and industry. As the initial stage of RS, large-scale matching is fundamental yet challenging. A typical recipe is to learn user and item representations with a two-tower architecture and then calculate the similarity score between both representation vectors, which however still struggles in how to properly deal with negative samples. In this paper, we find that the common practice that randomly sampling negative samples from the entire space and treating them equally is not an optimal choice, since the negative samples from different sub-spaces at different stages have different importance to a matching model. To address this issue, we propose a novel method named Unbiased Model-Agnostic Matching Approach (UMA$^2$). It consists of two basic modules including 1) General Matching Model (GMM), which is model-agnostic and can be implemented as any embedding-based two-tower models; and 2) Negative Samples Debias Network (NSDN), which discriminates negative samples by borrowing the idea of Inverse Propensity Weighting (IPW) and re-weighs the loss in GMM. UMA$^2$ seamlessly integrates these two modules in an end-to-end multi-task learning framework. Extensive experiments on both real-world offline dataset and online A/B test demonstrate its superiority over state-of-the-art methods.

Submitted 6 July, 2022; originally announced July 2022.

arXiv:2206.12296 [pdf, other]

Intelligent Request Strategy Design in Recommender System

Authors: Xufeng Qian, Yue Xu, Fuyu Lv, Shengyu Zhang, Ziwen Jiang, Qingwen Liu, Xiaoyi Zeng, Tat-Seng Chua, Fei Wu

Abstract: Waterfall Recommender System (RS), a popular form of RS in mobile applications, is a stream of recommended items consisting of successive pages that can be browsed by scrolling. In waterfall RS, when a user finishes browsing a page, the edge (e.g., mobile phones) would send a request to the cloud server to get a new page of recommendations, known as the paging request mechanism. RSs typically put a large number of items into one page to reduce excessive resource consumption from numerous paging requests, which, however, would diminish the RSs' ability to timely renew the recommendations according to users' real-time interest and lead to a poor user experience. Intuitively, inserting additional requests inside pages to update the recommendations with a higher frequency can alleviate the problem. However, previous attempts, including only non-adaptive strategies (e.g., insert requests uniformly), would eventually lead to resource overconsumption. To this end, we envision a new learning task of edge intelligence named Intelligent Request Strategy Design (IRSD). It aims to improve the effectiveness of waterfall RSs by determining the appropriate occasions of request insertion based on users' real-time intention. Moreover, we propose a new paradigm of adaptive request insertion strategy named Uplift-based On-edge Smart Request Framework (AdaRequest). AdaRequest 1) captures the dynamic change of users' intentions by matching their real-time behaviors with their historical interests based on attention-based neural networks; 2) estimates the counterfactual uplift of user purchase brought by an inserted request based on causal inference; and 3) determines the final request insertion strategy by maximizing the utility function under online resource constraints. We conduct extensive experiments on both offline dataset and online A/B test to verify the effectiveness of AdaRequest.

Submitted 23 June, 2022; originally announced June 2022.

arXiv:2203.14237 [pdf, other]

Causality Inspired Representation Learning for Domain Generalization

Authors: Fangrui Lv, Jian Liang, Shuang Li, Bin Zang, Chi Harold Liu, Ziteng Wang, Di Liu

Abstract: Domain generalization (DG) is essentially an out-of-distribution problem, aiming to generalize the knowledge learned from multiple source domains to an unseen target domain. The mainstream is to leverage statistical models to model the dependence between data and labels, intending to learn representations independent of domain. Nevertheless, the statistical models are superficial descriptions of reality since they are only required to model dependence instead of the intrinsic causal mechanism. When the dependence changes with the target distribution, the statistical models may fail to generalize. In this regard, we introduce a general structural causal model to formalize the DG problem. Specifically, we assume that each input is constructed from a mix of causal factors (whose relationship with the label is invariant across domains) and non-causal factors (category-independent), and only the former cause the classification judgments. Our goal is to extract the causal factors from inputs and then reconstruct the invariant causal mechanisms. However, the theoretical idea is far from practical for DG since the required causal/non-causal factors are unobserved. We highlight that ideal causal factors should meet three basic properties: separated from the non-causal ones, jointly independent, and causally sufficient for the classification. Based on that, we propose a Causality Inspired Representation Learning (CIRL) algorithm that enforces the representations to satisfy the above properties and then uses them to simulate the causal factors, which yields improved generalization ability. Extensive experimental results on several widely used datasets verify the effectiveness of our approach.

Submitted 27 March, 2022; originally announced March 2022.

Comments: Accepted in CVPR 2022

arXiv:2202.08959 [pdf, other]

doi 10.1145/3485447.3511970

Deep Interest Highlight Network for Click-Through Rate Prediction in Trigger-Induced Recommendation

Authors: Qijie Shen, Hong Wen, Wanjie Tao, Jing Zhang, Fuyu Lv, Zulong Chen, Zhao Li

Abstract: In many classical e-commerce platforms, personalized recommendation has been proven to be of great business value, which can improve user satisfaction and increase the revenue of platforms. In this paper, we present a new recommendation problem, Trigger-Induced Recommendation (TIR), where users' instant interest can be explicitly induced with a trigger item and follow-up related target items are recommended accordingly. TIR has become ubiquitous and popular in e-commerce platforms. In this paper, we figure out that although existing recommendation models are effective in traditional recommendation scenarios by mining users' interests based on their massive historical behaviors, they are struggling in discovering users' instant interests in the TIR scenario due to the discrepancy between these scenarios, resulting in inferior performance. To tackle the problem, we propose a novel recommendation method named Deep Interest Highlight Network (DIHN) for Click-Through Rate (CTR) prediction in TIR scenarios. It has three main components including 1) User Intent Network (UIN), which is responsible for generating a precise probability score to predict the user's intent on the trigger item; 2) Fusion Embedding Module (FEM), which adaptively fuses trigger item and target item embeddings based on the prediction from UIN; and 3) Hybrid Interest Extracting Module (HIEM), which can effectively highlight users' instant interest from their behaviors based on the result of FEM. Extensive offline and online evaluations on a real-world e-commerce platform demonstrate the superiority of DIHN over state-of-the-art methods.

Submitted 20 February, 2022; v1 submitted 5 February, 2022; originally announced February 2022.

Comments: Accepted by WWW 2022

arXiv:2202.06081 [pdf, other]

Modeling User Behavior with Graph Convolution for Personalized Product Search

Authors: Fan Lu, Qimai Li, Bo Liu, Xiao-Ming Wu, Xiaotong Zhang, Fuyu Lv, Guli Lin, Sen Li, Taiwei Jin, Keping Yang

Abstract: User preference modeling is a vital yet challenging problem in personalized product search. In recent years, latent space based methods have achieved state-of-the-art performance by jointly learning semantic representations of products, users, and text tokens. However, existing methods are limited in their ability to model user preferences. They typically represent users by the products they visited in a short span of time using attentive models and lack the ability to exploit relational information such as user-product interactions or item co-occurrence relations. In this work, we propose to address the limitations of prior arts by exploring local and global user behavior patterns on a user successive behavior graph, which is constructed by utilizing short-term actions of all users. To capture implicit user preference signals and collaborative patterns, we use an efficient jumping graph convolution to explore high-order relations to enrich product representations for user preference modeling. Our approach can be seamlessly integrated with existing latent space based methods and be potentially applied in any product retrieval method that uses purchase history to model user preferences. Extensive experiments on eight Amazon benchmarks demonstrate the effectiveness and potential of our approach. The source code is available at https://github.com/floatSDSDS/SBG.

Submitted 12 February, 2022; originally announced February 2022.

arXiv:2202.04972 [pdf, other]

IHGNN: Interactive Hypergraph Neural Network for Personalized Product Search

Authors: Dian Cheng, Jiawei Chen, Wenjun Peng, Wenqin Ye, Fuyu Lv, Tao Zhuang, Xiaoyi Zeng, Xiangnan He

Abstract: A good personalized product search (PPS) system should not only focus on retrieving relevant products, but also consider user personalized preference. Recent work on PPS mainly adopts the representation learning paradigm, e.g., learning representations for each entity (including user, product and query) from historical user behaviors (a.k.a. user-product-query interactions). However, we argue that existing methods do not sufficiently exploit the crucial collaborative signal, which is latent in historical interactions to reveal the affinity between the entities. Collaborative signal is quite helpful for generating high-quality representation, exploiting which would benefit the representation learning of one node from its connected nodes. To tackle this limitation, in this work, we propose a new model IHGNN for personalized product search. IHGNN resorts to a hypergraph constructed from the historical user-product-query interactions, which could completely preserve ternary relations and express collaborative signal based on the topological structure. On this basis, we develop a specific interactive hypergraph neural network to explicitly encode the structure information (i.e., collaborative signal) into the embedding process. It collects the information from the hypergraph neighbors and explicitly models neighbor feature interaction to enhance the representation of the target entity. Extensive experiments on three real-world datasets validate the superiority of our proposal over the state of the art.

Submitted 10 February, 2022; originally announced February 2022.

Comments: Presented at Proceedings of the ACM Web Conference 2022 (WWW '22)

arXiv:2201.02010 [pdf, other]

doi 10.1109/TCSVT.2023.3235704

Self-Training Vision Language BERTs with a Unified Conditional Model

Authors: Xiaofeng Yang, Fengmao Lv, Fayao Liu, Guosheng Lin

Abstract: Natural language BERTs are trained with language corpus in a self-supervised manner. Unlike natural language BERTs, vision language BERTs need paired data to train, which restricts the scale of VL-BERT pretraining. We propose a self-training approach that allows training VL-BERTs from unlabeled image data. The proposed method starts with our unified conditional model -- a vision language BERT model that can perform zero-shot conditional generation. Given different conditions, the unified conditional model can generate captions, dense captions, and even questions. We use the labeled image data to train a teacher model and use the trained model to generate pseudo captions on unlabeled image data. We then combine the labeled data and pseudo labeled data to train a student model. The process is iterated by putting the student model as a new teacher. By using the proposed self-training approach and only 300k unlabeled extra data, we are able to get competitive or even better performances compared to the models of similar model size trained with 3 million extra image data.

Submitted 19 January, 2023; v1 submitted 6 January, 2022; originally announced January 2022.

arXiv:2112.04137 [pdf, other]

Pareto Domain Adaptation

Authors: Fangrui Lv, Jian Liang, Kaixiong Gong, Shuang Li, Chi Harold Liu, Han Li, Di Liu, Guoren Wang

Abstract: Domain adaptation (DA) attempts to transfer the knowledge from a labeled source domain to an unlabeled target domain that follows a different distribution from the source. To achieve this, DA methods include a source classification objective to extract the source knowledge and a domain alignment objective to diminish the domain shift, ensuring knowledge transfer. Typically, previous DA methods adopt some weight hyper-parameters to linearly combine the training objectives to form an overall objective. However, the gradient directions of these objectives may conflict with each other due to domain shift. Under such circumstances, the linear optimization scheme might decrease the overall objective value at the expense of damaging one of the training objectives, leading to restricted solutions. In this paper, we rethink the optimization scheme for DA from a gradient-based perspective. We propose a Pareto Domain Adaptation (ParetoDA) approach to control the overall optimization direction, aiming to cooperatively optimize all training objectives. Specifically, to reach a desirable solution on the target domain, we design a surrogate loss mimicking target classification. To improve target-prediction accuracy to support the mimicking, we propose a target-prediction refining mechanism which exploits domain labels via Bayes' theorem. On the other hand, since prior knowledge of weighting schemes for objectives is often unavailable to guide optimization to approach the optimal solution on the target domain, we propose a dynamic preference mechanism to dynamically guide our cooperative optimization by the gradient of the surrogate loss on a held-out unlabeled target dataset. Extensive experiments on image classification and semantic segmentation benchmarks demonstrate the effectiveness of ParetoDA.

Submitted 9 December, 2021; v1 submitted 8 December, 2021; originally announced December 2021.

Comments: Accepted in NeurIPS 2021

arXiv:2112.02792 [pdf, other]

Incentive Compatible Pareto Alignment for Multi-Source Large Graphs

Authors: Jian Liang, Fangrui Lv, Di Liu, Zehui Dai, Xu Tian, Shuang Li, Fei Wang, Han Li

Abstract: In this paper, we focus on learning effective entity matching models over multi-source large-scale data. For real applications, we relax typical assumptions that data distributions/spaces, or entity identities are shared between sources, and propose a Relaxed Multi-source Large-scale Entity-matching (RMLE) problem. Challenges of the problem include 1) how to align large-scale entities between sources to share information and 2) how to mitigate negative transfer from jointly learning multi-source data. What's worse, one practical issue is the entanglement between both challenges. Specifically, incorrect alignments may increase negative transfer, while mitigating negative transfer for one source may result in poorly learned representations for other sources and then decrease alignment accuracy. To handle the entangled challenges, we point out that the key is to optimize information sharing first based on Pareto front optimization, by showing that information sharing significantly influences the Pareto front, which depicts lower bounds of negative transfer. Consequently, we propose an Incentive Compatible Pareto Alignment (ICPA) method to first optimize cross-source alignments based on Pareto front optimization, then mitigate negative transfer constrained on the optimized alignments. This mechanism lets each source learn based on its true preference without worrying about deteriorating representations of other sources. Specifically, the Pareto front optimization encourages minimizing lower bounds of negative transfer, which optimizes whether and which to align. Comprehensive empirical evaluation results on four large-scale datasets are provided to demonstrate the effectiveness and superiority of ICPA. Online A/B test results at a search advertising platform also demonstrate the effectiveness of ICPA in production environments.

Submitted 6 December, 2021; originally announced December 2021.

arXiv:2108.05720 [pdf, other]

Semantic Concentration for Domain Adaptation

Authors: Shuang Li, Mixue Xie, Fangrui Lv, Chi Harold Liu, Jian Liang, Chen Qin, Wei Li

Abstract: Domain adaptation (DA) offers a way to address label annotation and dataset bias issues by transferring knowledge from a label-rich source domain to a related but unlabeled target domain. A mainstream of DA methods is to align the feature distributions of the two domains. However, the majority of them focus on the entire image features, where irrelevant semantic information, e.g., the messy background, is inevitably embedded. Enforcing feature alignments in such cases will negatively influence the correct matching of objects and consequently lead to semantically negative transfer due to the confusion of irrelevant semantics. To tackle this issue, we propose Semantic Concentration for Domain Adaptation (SCDA), which encourages the model to concentrate on the most principal features via the pair-wise adversarial alignment of prediction distributions. Specifically, we train the classifier to class-wisely maximize the prediction distribution divergence of each sample pair, which enables the model to find the region with large differences among the same class of samples. Meanwhile, the feature extractor attempts to minimize that discrepancy, which suppresses the features of dissimilar regions among the same class of samples and accentuates the features of principal parts. As a general method, SCDA can be easily integrated into various DA methods as a regularizer to further boost their performance. Extensive experiments on the cross-domain benchmarks show the efficacy of SCDA.

Submitted 12 August, 2021; originally announced August 2021.

Comments: Accepted by ICCV 2021

arXiv:2106.09297 [pdf, other]

Embedding-based Product Retrieval in Taobao Search

Authors: Sen Li, Fuyu Lv, Taiwei Jin, Guli Lin, Keping Yang, Xiaoyi Zeng, Xiao-Ming Wu, Qianli Ma

Abstract: Nowadays, the product search service of e-commerce platforms has become a vital shopping channel in people's lives. The retrieval phase of products determines the search system's quality and gradually attracts researchers' attention. Retrieving the most relevant products from a large-scale corpus while preserving personalized user characteristics remains an open question. Recent approaches in this domain have mainly focused on embedding-based retrieval (EBR) systems. However, after a long period of practice on Taobao, we find that the performance of the EBR system is dramatically degraded due to its: (1) low relevance with a given query and (2) discrepancy between the training and inference phases. Therefore, we propose a novel and practical embedding-based product retrieval model, named Multi-Grained Deep Semantic Product Retrieval (MGDSPR). Specifically, we first identify the inconsistency between the training and inference stages, and then use the softmax cross-entropy loss as the training objective, which achieves better performance and faster convergence. Two efficient methods are further proposed to improve retrieval relevance, including smoothing noisy training data and generating relevance-improving hard negative samples without requiring extra knowledge and training procedures. We evaluate MGDSPR on Taobao Product Search with significant metrics gains observed in offline experiments and online A/B tests. MGDSPR has been successfully deployed to the existing multi-channel retrieval system in Taobao Search. We also introduce the online deployment scheme and share practical lessons of our retrieval system to contribute to the community.

Submitted 17 June, 2021; originally announced June 2021.

Comments: 9 pages, accepted by KDD2021
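
The abstract above names the softmax cross-entropy loss as the training objective for retrieval. As a rough illustration only, the sketch below computes that loss for one query scored against a small candidate set that contains a single positive item. The function name, the `temperature` parameter, and the toy scores are assumptions made for this example; the listing does not describe how MGDSPR actually builds its candidates or scales its scores.

```python
import math


def softmax_cross_entropy(scores, positive_index, temperature=1.0):
    """Negative log-probability of the positive item under a softmax over scores.

    scores: query-item similarity scores (hypothetical values).
    positive_index: position of the clicked/relevant item in scores.
    temperature: assumed scaling factor; not taken from the abstract.
    """
    scaled = [s / temperature for s in scores]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scaled)
    log_partition = m + math.log(sum(math.exp(s - m) for s in scaled))
    return log_partition - scaled[positive_index]


if __name__ == "__main__":
    # One positive (index 0) scored against three sampled negatives.
    print(softmax_cross_entropy([2.0, 0.5, -1.0, 0.1], positive_index=0))
```

With equal scores the loss reduces to the logarithm of the candidate count, and it shrinks as the positive item's score rises relative to the negatives.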

arXiv:2106.04879 [pdf, ps, other]

doi 10.1016/j.physletb.2021.136840

Evidence against the wobbling nature of low-spin bands in $^{135}$Pr

Authors: B. F. Lv, C. M. Petrache, E. A. Lawrie, S. Guo, A. Astier, E. Dupont, K. K. Zheng, H. J. Ong, J. G. Wang, X. H. Zhou, Z. Y. Sun, P. Greenlees, H. Badran, T. Calverley, D. M. Cox, T. Grahn, J. Hilton, R. Julin, S. Juutinen, J. Konki, J. Pakarinen, P. Papadakis, J. Partanen, P. Rahkila, P. Ruotsalainen, et al. (14 additional authors not shown)

Abstract: The electromagnetic character of the $ΔI=1$ transitions connecting the one- to zero-phonon and the two- to one-phonon wobbling bands should be dominated by an $E2$ component, due to the collective motion of the entire nuclear charge. In the present work it is shown, based on combined angular correlation and linear polarization measurements, that the mixing ratios of all analyzed connecting transitions between low-lying bands in $^{135}$Pr interpreted as zero-, one-, and two-phonon wobbling bands, have absolute values smaller than one. This indicates predominant $M1$ magnetic character, which is incompatible with the proposed wobbling nature. All experimental observables are instead in good agreement with quasiparticle-plus-triaxial-rotor model calculations, which describe the bands as resulting from a rapid re-alignment of the total angular momentum from the short to the intermediate nuclear axis.

Submitted 18 June, 2021; v1 submitted 9 June, 2021; originally announced June 2021.

arXiv:2104.09713 [pdf, other]

Hierarchically Modeling Micro and Macro Behaviors via Multi-Task Learning for Conversion Rate Prediction

Authors: Hong Wen, Jing Zhang, Fuyu Lv, Wentian Bao, Tianyi Wang, Zulong Chen

Abstract: Conversion Rate (CVR) prediction in modern industrial e-commerce platforms is becoming increasingly important, which directly contributes to the final revenue. In order to address the well-known sample selection bias (SSB) and data sparsity (DS) issues encountered during CVR modeling, the abundant labeled macro behaviors (i.e., user's interactions with items) are used. Nonetheless, we observe that several purchase-related micro behaviors (i.e., user's interactions with specific components on the item detail page) can supplement fine-grained cues for CVR prediction. Motivated by this observation, we propose a novel CVR prediction method by Hierarchically Modeling both Micro and Macro behaviors ($HM^3$). Specifically, we first construct a complete user sequential behavior graph to hierarchically represent micro behaviors and macro behaviors as one-hop and two-hop post-click nodes. Then, we embody $HM^3$ as a multi-head deep neural network, which predicts six probability variables corresponding to explicit sub-paths in the graph. They are further combined into the prediction targets of four auxiliary tasks as well as the final CVR according to the conditional probability rule defined on the graph. By employing multi-task learning and leveraging the abundant supervisory labels from micro and macro behaviors, $HM^3$ can be trained end-to-end and address the SSB and DS issues. Extensive experiments on both offline and online settings demonstrate the superiority of the proposed $HM^3$ over representative state-of-the-art methods.

Submitted 19 April, 2021; originally announced April 2021.

Comments: Accepted as SIGIR 2021 short paper

arXiv:2103.00546 [pdf, ps, other]

Diophantine analysis of the expansions of a fixed point under continuum many bases

Authors: Fan Lv, Baowei Wang, Jun Wu

Abstract: In this paper, we study the Diophantine properties of the orbits of a fixed point in its expansions under continuum many bases. More precisely, let $T_β$ be the beta-transformation with base $β>1$, $\{x_{n}\}_{n\geq 1}$ be a sequence of real numbers in $[0,1]$ and $\varphi\colon \mathbb{N}\rightarrow (0,1]$ be a positive function. With a detailed analysis on the distribution of full cylinders in the base space $\{β>1\}$, it is shown that for any given $x\in(0,1]$, for almost all or almost no bases $β>1$, the orbit of $x$ under $T_β$ can $\varphi$-well approximate the sequence $\{x_{n}\}_{n\geq 1}$ according to the divergence or convergence of the series $\sum \varphi(n)$. This strengthens Schmeling's result significantly and completes all known results in this aspect. Moreover, the idea presented here can also be used to determine the Lebesgue measure of the set \begin{equation*} \{x\in [0,1]\colon|T^{n}_βx-L(x)|<\varphi(n) \text{ for infinitely many } n\in\mathbb{N}\}, \end{equation*} for a fixed base $β>1$, where $L\colon [0,1]\rightarrow[0,1]$ is a Lipschitz function.

Submitted 28 February, 2021; originally announced March 2021.

MSC Class: 11K55; 11J83
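
To make the objects in the abstract above concrete, the following sketch iterates the beta-transformation, taken here as $T_β(x) = βx \bmod 1$, and lists the indices $n$ at which $|T^{n}_βx - x_n| < \varphi(n)$ for a finite target sequence. The function names and the sample inputs are assumptions for illustration. A finite floating-point orbit can only illustrate the definitions; it says nothing about the "infinitely many $n$" or "almost all $β$" statements in the paper.

```python
import math


def beta_transform(x, beta):
    """One step of the beta-transformation, assumed to be T_beta(x) = beta*x mod 1."""
    y = beta * x
    return y - math.floor(y)


def orbit(x, beta, n):
    """Return [T_beta(x), T_beta^2(x), ..., T_beta^n(x)]."""
    points = []
    for _ in range(n):
        x = beta_transform(x, beta)
        points.append(x)
    return points


def approximation_hits(x, beta, targets, phi):
    """1-based indices n with |T_beta^n(x) - x_n| < phi(n), for the given finite targets."""
    orb = orbit(x, beta, len(targets))
    return [
        n
        for n, (point, target) in enumerate(zip(orb, targets), start=1)
        if abs(point - target) < phi(n)
    ]


if __name__ == "__main__":
    # Hypothetical inputs: base 2, starting point 1/4, shrinking radius 1/(n+1).
    print(orbit(0.25, 2.0, 3))
    print(approximation_hits(0.25, 2.0, [0.5, 0.5, 0.0], lambda n: 1.0 / (n + 1)))
```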

arXiv:2012.13892 [pdf, other]

Adaptive Graph-based Generalized Regression Model for Unsupervised Feature Selection

Authors: Yanyong Huang, Zongxin Shen, Fuxu Cai, Tianrui Li, Fengmao Lv

Abstract: Unsupervised feature selection is an important method to reduce the dimensionality of high-dimensional data without labels, which helps avoid the "curse of dimensionality" and improve the performance of subsequent machine learning tasks, like clustering and retrieval. How to select the uncorrelated and discriminative features is the key problem of unsupervised feature selection. Many proposed methods select features with strong discriminant and high redundancy, or vice versa. However, they only satisfy one of these two criteria. Other existing methods choose the discriminative features with low redundancy by constructing the graph matrix on the original feature space. Since the original feature space usually contains redundancy and noise, it will degrade the performance of feature selection. In order to address these issues, we first present a novel generalized regression model imposed by an uncorrelated constraint and the $\ell_{2,1}$-norm regularization. It can simultaneously select the uncorrelated and discriminative features as well as reduce the variance of the data points belonging to the same neighborhood, which helps the clustering task. Furthermore, the local intrinsic structure of data is constructed on the reduced dimensional space by learning the similarity-induced graph adaptively. Then the learning of the graph structure and of the indicator matrix based on the spectral analysis is integrated into the generalized regression model. Finally, we develop an alternative iterative optimization algorithm to solve the objective function. A series of experiments are carried out on nine real-world data sets to demonstrate the effectiveness of the proposed method in comparison with other competing approaches.

Submitted 27 December, 2020; originally announced December 2020.

arXiv:2012.06995 [pdf, other]

Bi-Classifier Determinacy Maximization for Unsupervised Domain Adaptation

Authors: Shuang Li, Fangrui Lv, Binhui Xie, Chi Harold Liu, Jian Liang, Chen Qin

Abstract: Unsupervised domain adaptation challenges the problem of transferring knowledge from a well-labelled source domain to an unlabelled target domain. Recently, adversarial learning with bi-classifier has been proven effective in pushing cross-domain distributions close. Prior approaches typically leverage the disagreement between bi-classifier to learn transferable representations; however, they often neglect the classifier determinacy in the target domain, which could result in a lack of feature discriminability. In this paper, we present a simple yet effective method, namely Bi-Classifier Determinacy Maximization (BCDM), to tackle this problem. Motivated by the observation that target samples cannot always be separated distinctly by the decision boundary, here in the proposed BCDM, we design a novel classifier determinacy disparity (CDD) metric, which formulates classifier discrepancy as the class relevance of distinct target predictions and implicitly introduces a constraint on the target feature discriminability. To this end, the BCDM can generate discriminative representations by encouraging target predictive outputs to be consistent and determined, while preserving the diversity of predictions in an adversarial manner. Furthermore, the properties of CDD as well as the theoretical guarantees of BCDM's generalization bound are both elaborated. Extensive experiments show that BCDM compares favorably against the existing state-of-the-art domain adaptation methods.

Submitted 13 December, 2020; originally announced December 2020.

Comments: Accepted as AAAI 2021. The code is publicly available at https://github.com/BIT-DA/BCDM

arXiv:2011.14354 [pdf]

doi 10.1016/j.physletb.2022.137010

Probing the nature of the conjectured low-spin wobbling bands in atomic nuclei

Authors: S. Guo, X. H. Zhou, C. M. Petrache, E. A. Lawrie, S. Mthembu, Y. D. Fang, H. Y. Wu, H. L. Wang, H. Y. Meng, G. S. Li, Y. H. Qiang, J. G. Wang, M. L. Liu, Y. Zheng, B. Ding, W. Q. Zhang, A. Rohilla, K. R. Mukhi, Y. Y. Yang, H. J. Ong, J. B. Ma, S. W. Xu, Z. Bai, H. L. Fan, J. F. Huang, et al. (6 additional authors not shown)

Abstract: Precession is a unique motion in which the orientation of the rotational axis of a rotating body is not fixed but moving, and it generally exists in the Universe from giant stars through tiny atomic nuclei. In principle, the precession of an atomic nuclide can be approximately described as wobbling motion, arising from the coupling of a rotation and a harmonic vibration. Recently, a number of wobbling bands were reported at low spin, which violate the wobbling approximation that can be valid only at high spin. Here we explore the nature of the reported low-spin wobbling bands. Via a new experiment, we demonstrate that one such band in $^{187}$Au is generated by dominant single-particle excitation rather than by the excitation of a wobbling phonon. We point out that the imperfect research paradigm used previously would lead to unreliable identification of low-spin wobbling bands. Consequently, new experimental approaches should be developed to distinguish among the different excitation mechanisms that can give rise to the observed low-spin bands in odd-even nuclei.

Submitted 18 September, 2021; v1 submitted 29 November, 2020; originally announced November 2020.

arXiv:2010.12837 [pdf, other]

XDM: Improving Sequential Deep Matching with Unclicked User Behaviors for Recommender System

Authors: Fuyu Lv, Mengxue Li, Tonglei Guo, Changlong Yu, Fei Sun, Taiwei Jin, Wilfred Ng

Abstract: Deep learning-based sequential recommender systems have recently attracted increasing attention from both academia and industry. Most industrial Embedding-Based Retrieval (EBR) systems for recommendation share similar ideas with sequential recommenders. Among them, how to comprehensively capture sequential user interest is a fundamental problem. However, most existing sequential recommendation models take as input clicked or purchased behavior sequences from user-item interactions. This leads to incomplete user representations and sub-optimal model performance, since they ignore the complete user behavior exposure data, i.e., items impressed yet unclicked by users. In this work, we attempt to incorporate and model those unclicked item sequences using a new learning approach in order to explore better sequential recommendation techniques. An efficient triplet metric learning algorithm is proposed to appropriately learn the representation of unclicked items. Our method can be simply integrated with existing sequential recommendation models by a confidence fusion network and further gain better user representations. The offline experimental results based on real-world E-commerce data demonstrate the effectiveness and verify the importance of unclicked items in sequential recommendation. Moreover, we deploy our new model (named XDM) into the EBR of the recommender system at Taobao, outperforming the previously deployed generation, SDM.

Submitted 30 March, 2022; v1 submitted 24 October, 2020; originally announced October 2020.

Comments: 12 pages, accepted by DASFAA2022

arXiv:2008.07925 [pdf, ps, other]

doi 10.1140/epja/s10050-020-00218-5

Multiple chiral bands in $^{137}$Nd

Authors: C. M. Petrache, B. F. Lv, Q. B. Chen, J. Meng, A. Astier, E. Dupont, K. K. Zheng, P. T. Greenlees, H. Badran, T. Calverley, D. M. Cox, T. Grahn, J. Hilton, R. Julin, S. Juutinen, J. Konki, J. Pakarinen, P. Papadakis, J. Partanen, P. Rahkila, P. Ruotsalainen, M. Sandzelius, J. Saren, C. Scholey, J. Sorri, et al. (13 additional authors not shown)

Abstract: Two new bands have been identified in $^{137}$Nd from a high-statistics JUROGAM II gamma-ray spectroscopy experiment. Constrained density functional theory and particle rotor model calculations are used to assign configurations and investigate the band properties, which are well described and understood. It is demonstrated that these two new bands can be interpreted as chiral partners of previously known three-quasiparticle positive- and negative-parity bands. The newly observed chiral doublet bands in $^{137}$Nd represent an important support to the existence of multiple chiral bands in nuclei. The present results constitute the missing stone in the series of Nd nuclei showing multiple chiral bands, which becomes the most extended sequence of nuclei presenting multiple chiral bands in the Segré chart.

Submitted 18 August, 2020; originally announced August 2020.

arXiv:2008.05673 [pdf, other]

doi 10.1145/3340531.3412729

MTBRN: Multiplex Target-Behavior Relation Enhanced Network for Click-Through Rate Prediction

Authors: Yufei Feng, Fuyu Lv, Binbin Hu, Fei Sun, Kun Kuang, Yang Liu, Qingwen Liu, Wenwu Ou

Abstract: Click-through rate (CTR) prediction is a critical task for many industrial systems, such as display advertising and recommender systems. Recently, modeling user behavior sequences has attracted much attention and shown great improvements in the CTR field. Existing works mainly exploit attention mechanisms based on the embedding product when considering relations between user behaviors and the target item. However, this methodology lacks concrete semantics and overlooks the underlying reasons driving a user to click on a target item. In this paper, we propose a new framework named Multiplex Target-Behavior Relation enhanced Network (MTBRN) to leverage multiplex relations between user behaviors and the target item to enhance CTR prediction. Multiplex relations consist of meaningful semantics, which can bring a better understanding of users' interests from different perspectives. To explore and model multiplex relations, we propose to incorporate various graphs (e.g., knowledge graph and item-item similarity graph) to construct multiple relational paths between user behaviors and the target item. Then Bi-LSTM is applied to encode each path in the path extractor layer. A path fusion network and a path activation network are devised to adaptively aggregate and finally learn the representation of all paths for CTR prediction. Extensive offline and online experiments clearly verify the effectiveness of our framework.

Submitted 26 August, 2020; v1 submitted 12 August, 2020; originally announced August 2020.

Comments: Accepted by CIKM2020

arXiv:2007.10692 [pdf, other]

Interpretable Fault Detection using Projections of Mutual Information Matrix

Authors: Feiya Lv, Shujian Yu, Chenglin Wen, Jose C. Principe

Abstract: This paper presents a novel mutual information (MI) matrix based method for fault detection. Given an $m$-dimensional fault process, the MI matrix is an $m \times m$ matrix in which the $(i,j)$-th entry measures the MI values between the $i$-th dimension and the $j$-th dimension variables. We introduce the recently proposed matrix-based Rényi's $α$-entropy functional to estimate MI values in each entry of the MI matrix. The new estimator avoids density estimation and it operates on the eigenspectrum of a (normalized) symmetric positive definite (SPD) matrix, which makes it well suited for industrial processes. We combine different orders of statistics of the transformed components (TCs) extracted from the MI matrix to constitute the detection index, and derive a simple similarity index to monitor the changes of characteristics of the underlying process in consecutive windows. We term the overall methodology "projections of mutual information matrix" (PMIM). Experiments on both synthetic data and the benchmark Tennessee Eastman process demonstrate the interpretability of PMIM in identifying the root variables that cause the faults, and its superiority in detecting the occurrence of faults in terms of the improved fault detection rate (FDR) and the lowest false alarm rate (FAR). PMIM is also less sensitive to hyper-parameters. Code of PMIM is available at https://github.com/SJYuCNEL/Fault_detection_PMIM

Submitted 18 February, 2021; v1 submitted 21 July, 2020; originally announced July 2020.

Comments: manuscript accepted at the Journal of The Franklin Institute. MATLAB code is available at https://github.com/SJYuCNEL/Fault_detection_PMIM

arXiv:2007.00515 [pdf, other]

Learning unbiased zero-shot semantic segmentation networks via transductive transfer

Authors: Haiyang Liu, Yichen Wang, Jiayi Zhao, Guowu Yang, Fengmao Lv

Abstract: Semantic segmentation, which aims to acquire a detailed understanding of images, is an essential issue in computer vision. However, in practical scenarios, new categories that differ from the categories seen in training usually appear. Since it is impractical to collect labeled data for all categories, how to conduct zero-shot learning in semantic segmentation is an important problem. Although the attribute embedding of categories can promote effective knowledge transfer across different categories, the prediction of the segmentation network reveals an obvious bias towards seen categories. In this paper, we propose an easy-to-implement transductive approach to alleviate the prediction bias in zero-shot semantic segmentation. Our method assumes that both the source images with full pixel-level labels and unlabeled target images are available during training. To be specific, the source images are used to learn the relationship between visual images and semantic embeddings, while the target images are used to alleviate the prediction bias towards seen categories. We conduct comprehensive experiments on diverse splits of the PASCAL dataset. The experimental results clearly demonstrate the effectiveness of our method.

Submitted 1 July, 2020; originally announced July 2020.

arXiv:2006.09235 [pdf, other]

Weakly-supervised Domain Adaption for Aspect Extraction via Multi-level Interaction Transfer

Authors: Tao Liang, Wenya Wang, Fengmao Lv

Abstract: Fine-grained aspect extraction is an essential sub-task in aspect-based opinion analysis. It aims to identify the aspect terms (a.k.a. opinion targets) of a product or service in each sentence. However, an expensive annotation process is usually involved to acquire sufficient token-level labels for each domain. To address this limitation, some previous works propose domain adaptation strategies to transfer knowledge from a sufficiently labeled source domain to unlabeled target domains. But due to both the difficulty of fine-grained prediction problems and the large gap between domains, the performance remains unsatisfactory. This work conducts a pioneering study on leveraging sentence-level aspect category labels, which are usually available in commercial services like review sites, to promote token-level transfer for the extraction purpose. Specifically, the aspect category information is used to construct pivot knowledge for transfer, under the assumption that the interactions between sentence-level aspect categories and token-level aspect terms are invariant across domains. To this end, we propose a novel multi-level reconstruction mechanism that aligns both the fine-grained and coarse-grained information at multiple levels of abstraction. Comprehensive experiments demonstrate that our approach can fully utilize sentence-level aspect category labels to improve cross-domain aspect extraction with a large performance gain.

Submitted 16 June, 2020; originally announced June 2020.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2006.00439 [pdf, other]

Fast Enhancement for Non-Uniform Illumination Images using Light-weight CNNs

Authors: Feifan Lv, Bo Liu, Feng Lu

Abstract: This paper proposes a new light-weight convolutional neural network (5k parameters) for non-uniform illumination image enhancement to handle color, exposure, contrast, noise, artifacts, etc., simultaneously and effectively. More concretely, the input image is first enhanced using the Retinex model from two different aspects (enhancing under-exposure and suppressing over-exposure). Then, these two enhanced results and the original image are fused to obtain an image with satisfactory brightness, contrast and details. Finally, the extra noise and compression artifacts are removed to get the final result. To train this network, we propose a semi-supervised retouching solution and construct a new dataset (82k images) containing various scenes and light conditions. Our model can enhance 0.5 mega-pixel (like 600*800) images in real time (50 fps), which is faster than existing enhancement methods. Extensive experiments show that our solution is fast and effective in dealing with non-uniform illumination images.

Submitted 31 May, 2020; originally announced June 2020.

Comments: 9 pages, 12 figures, 2 tables

arXiv:2005.12002 [pdf, other]

doi 10.1145/3397271.3401428

ATBRG: Adaptive Target-Behavior Relational Graph Network for Effective Recommendation

Authors: Yufei Feng, Binbin Hu, Fuyu Lv, Qingwen Liu, Zhiqiang Zhang, Wenwu Ou

Abstract: Recommender systems (RS) are devoted to predicting user preference for a given item and have been widely deployed in most web-scale applications. Recently, knowledge graphs (KG) have attracted much attention in RS due to their abundant connective information. Existing methods either explore independent meta-paths for user-item pairs over the KG, or employ a graph neural network (GNN) on the whole KG to produce representations for users and items separately. Despite their effectiveness, the former type of methods fails to fully capture the structural information implied in the KG, while the latter ignores the mutual effect between target user and item during embedding propagation. In this work, we propose a new framework named Adaptive Target-Behavior Relational Graph network (ATBRG for short) to effectively capture structural relations of target user-item pairs over the KG. Specifically, to associate the given target item with user behaviors over the KG, we propose graph connect and graph prune techniques to construct an adaptive target-behavior relational graph. To fully distill structural information from the sub-graph connected by rich relations in an end-to-end fashion, we elaborate on the model design of ATBRG, which is equipped with a relation-aware extractor layer and a representation activation layer. We perform extensive experiments on both industrial and benchmark datasets. Empirical results show that ATBRG consistently and significantly outperforms state-of-the-art methods. Moreover, ATBRG has also achieved a performance improvement of 5.1% on the CTR metric after successful deployment in one popular recommendation scenario of the Taobao APP.

Submitted 27 May, 2020; v1 submitted 25 May, 2020; originally announced May 2020.

Comments: Accepted by SIGIR 2020, full paper with 10 pages and 5 figures

arXiv:2005.04580 [pdf, other]

An Integrated Enhancement Solution for 24-hour Colorful Imaging

Authors: Feifan Lv, Yinqiang Zheng, Yicheng Li, Feng Lu

Abstract: The current industry practice for 24-hour outdoor imaging is to use a silicon camera supplemented with near-infrared (NIR) illumination. This results in color images with poor contrast in the daytime and an absence of chrominance at nighttime. To resolve this dilemma, all existing solutions try to capture RGB and NIR images separately. However, they need additional hardware support and suffer from various drawbacks, including short service life, high price, specific usage scenarios, etc. In this paper, we propose a novel and integrated enhancement solution that produces clear color images, whether in abundant daytime sunlight or in extremely low-light nighttime conditions. Our key idea is to separate the VIS and NIR information from mixed signals, and enhance the VIS signal adaptively with the NIR signal as assistance. To this end, we build an optical system to collect a new VIS-NIR-MIX dataset and present a physically meaningful image processing algorithm based on CNN. Extensive experiments show outstanding results, which demonstrate the effectiveness of our solution.

Submitted 10 May, 2020; originally announced May 2020.

Comments: AAAI 2020 (Oral)

Showing 1–50 of 68 results for author: Lv, F