Showing 1–50 of 74 results for author: Si, S

  1. arXiv:2407.05282  [pdf, other]

    cs.CV

    UltraEdit: Instruction-based Fine-Grained Image Editing at Scale

    Authors: Haozhe Zhao, Xiaojian Ma, Liang Chen, Shuzheng Si, Rujie Wu, Kaikai An, Peiyu Yu, Minjia Zhang, Qing Li, Baobao Chang

    Abstract: This paper presents UltraEdit, a large-scale (approximately 4 million editing samples), automatically generated dataset for instruction-based image editing. Our key idea is to address the drawbacks in existing image editing datasets like InstructPix2Pix and MagicBrush, and provide a systematic approach to producing massive and high-quality image editing samples. UltraEdit offers several distinct a…

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: 32 pages, 14 figures

  2. arXiv:2406.18856  [pdf, ps, other]

    cs.CL cs.AI cs.CE

    FFN: a Fine-grained Chinese-English Financial Domain Parallel Corpus

    Authors: Yuxin Fu, Shijing Si, Leyi Mai, Xi-ang Li

    Abstract: Large Language Models (LLMs) have stunningly advanced the field of machine translation, though their effectiveness within the financial domain remains largely underexplored. To probe this issue, we constructed a fine-grained Chinese-English parallel corpus of financial news called FFN. We acquired financial news articles spanning between January 1st, 2014, to December 31, 2023, from mainstream med…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: a simplified version of this paper is accepted by International Conference on Asian Language Processing 2024

  3. arXiv:2406.17224  [pdf, other]

    cs.AI cs.CL cs.CV cs.LG cs.SC

    Large Language Models are Interpretable Learners

    Authors: Ruochen Wang, Si Si, Felix Yu, Dorothea Wiesmann, Cho-Jui Hsieh, Inderjit Dhillon

    Abstract: The trade-off between expressiveness and interpretability remains a core challenge when building human-centric predictive models for classification and decision-making. While symbolic rules offer interpretability, they often lack expressiveness, whereas neural networks excel in performance but are known for being black boxes. In this paper, we show a combination of Large Language Models (LLMs) and…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Preliminary Version, Code at [this url](https://github.com/ruocwang/llm-symbolic-program)

    MSC Class: 68T05

  4. arXiv:2405.17505  [pdf, other]

    cs.LG cs.CL

    Predicting Rental Price of Lane Houses in Shanghai with Machine Learning Methods and Large Language Models

    Authors: Tingting Chen, Shijing Si

    Abstract: Housing has emerged as a crucial concern among young individuals residing in major cities, including Shanghai. Given the unprecedented surge in property prices in this metropolis, young people have increasingly resorted to the rental market to address their housing needs. This study utilizes five traditional machine learning methods: multiple linear regression (MLR), ridge regression (RR), lasso r…

    Submitted 26 May, 2024; originally announced May 2024.

    Comments: 13 pages, 11 figures, 39 references

  5. arXiv:2404.08491  [pdf, other]

    cs.CL cs.AI

    Mitigating Language-Level Performance Disparity in mPLMs via Teacher Language Selection and Cross-lingual Self-Distillation

    Authors: Haozhe Zhao, Zefan Cai, Shuzheng Si, Liang Chen, Yufeng He, Kaikai An, Baobao Chang

    Abstract: Large-scale multilingual Pretrained Language Models (mPLMs) yield impressive performance on cross-language tasks, yet significant performance disparities exist across different languages within the same mPLM. Previous studies endeavored to narrow these disparities by supervise fine-tuning the mPLMs with multilingual data. However, obtaining labeled multilingual data is time-consuming, and fine-tun…

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: NAACL 2024

  6. arXiv:2402.15537  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG

    Evaluating the Performance of ChatGPT for Spam Email Detection

    Authors: Shijing Si, Yuwei Wu, Le Tang, Yugui Zhang, Jedrek Wosik

    Abstract: Email continues to be a pivotal and extensively utilized communication medium within professional and commercial domains. Nonetheless, the prevalence of spam emails poses a significant challenge for users, disrupting their daily routines and diminishing productivity. Consequently, accurately identifying and filtering spam based on content has become crucial for cybersecurity. Recent advancements i…

    Submitted 19 June, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

    Comments: 12 pages, 4 figures

  7. arXiv:2312.15304  [pdf, other]

    cs.CL cs.AI

    Exploring the Capabilities of ChatGPT in Ancient Chinese Translation and Person Name Recognition

    Authors: Shijing Si, Siqing Zhou, Le Tang, Xiaoqing Cheng, Yugui Zhang

    Abstract: ChatGPT's proficiency in handling modern standard languages suggests potential for its use in understanding ancient Chinese. This paper explores ChatGPT's capabilities on ancient Chinese via two tasks: translating ancient Chinese to modern Chinese and recognizing ancient Chinese names. A comparison of ChatGPT's output with human translations serves to evaluate its comprehension of ancient Chinese.…

    Submitted 23 February, 2024; v1 submitted 23 December, 2023; originally announced December 2023.

    Comments: Technical report

  8. arXiv:2312.10302  [pdf, other]

    cs.CL cs.AI

    One-Shot Learning as Instruction Data Prospector for Large Language Models

    Authors: Yunshui Li, Binyuan Hui, Xiaobo Xia, Jiaxi Yang, Min Yang, Lei Zhang, Shuzheng Si, Ling-Hao Chen, Junhao Liu, Tongliang Liu, Fei Huang, Yongbin Li

    Abstract: Contemporary practices in instruction tuning often hinge on enlarging data scaling without a clear strategy for ensuring data quality, inadvertently introducing noise that may compromise model performance. To address this challenge, we introduce Nuggets, a novel and efficient methodology that leverages one-shot learning to discern and select high-quality instruction data from extensive da…

    Submitted 3 June, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

    Comments: ACL 2024

  9. arXiv:2312.06522  [pdf, other]

    cs.CL cs.AI cs.LG

    Revisiting the Role of Label Smoothing in Enhanced Text Sentiment Classification

    Authors: Yijie Gao, Shijing Si, Hua Luo, Haixia Sun, Yugui Zhang

    Abstract: Label smoothing is a widely used technique in various domains, such as text classification, image classification and speech recognition, known for effectively combating model overfitting. However, there is little fine-grained analysis on how label smoothing enhances text sentiment classification. To fill in the gap, this article performs a set of in-depth analyses on eight datasets for text sentim…

    Submitted 22 February, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

    Comments: Technical Report

  10. arXiv:2311.10117  [pdf, other]

    cs.AI cs.LG

    Automatic Engineering of Long Prompts

    Authors: Cho-Jui Hsieh, Si Si, Felix X. Yu, Inderjit S. Dhillon

    Abstract: Large language models (LLMs) have demonstrated remarkable capabilities in solving complex open-domain tasks, guided by comprehensive instructions and demonstrations provided in the form of prompts. However, these prompts can be lengthy, often comprising hundreds of lines and thousands of tokens, and their design often requires considerable human effort. Recent research has explored automatic promp…

    Submitted 16 November, 2023; originally announced November 2023.

  11. arXiv:2311.09835  [pdf, other]

    cs.CL cs.AI

    ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code

    Authors: Xiangru Tang, Yuliang Liu, Zefan Cai, Yanjun Shao, Junjie Lu, Yichi Zhang, Zexuan Deng, Helan Hu, Kaikai An, Ruijun Huang, Shuzheng Si, Sheng Chen, Haozhe Zhao, Liang Chen, Yan Wang, Tianyu Liu, Zhiwei Jiang, Baobao Chang, Yin Fang, Yujia Qin, Wangchunshu Zhou, Yilun Zhao, Arman Cohan, Mark Gerstein

    Abstract: Despite Large Language Models (LLMs) like GPT-4 achieving impressive results in function-level code generation, they struggle with repository-scale code understanding (e.g., coming up with the right arguments for calling routines), requiring a deeper comprehension of complex file interactions. Also, recently, people have developed LLM agents that attempt to interact with repository code (e.g., com…

    Submitted 21 August, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

  12. arXiv:2311.08010  [pdf, other]

    cs.CL cs.AI

    Improving the Robustness of Distantly-Supervised Named Entity Recognition via Uncertainty-Aware Teacher Learning and Student-Student Collaborative Learning

    Authors: Helan Hu, Shuzheng Si, Haozhe Zhao, Shuang Zeng, Kaikai An, Zefan Cai, Baobao Chang

    Abstract: Distantly-Supervised Named Entity Recognition (DS-NER) is widely used in real-world scenarios. It can effectively alleviate the burden of annotation by matching entities in existing knowledge bases with snippets in the text but suffer from the label noise. Recent works attempt to adopt the teacher-student framework to gradually refine the training labels and improve the overall robustness. However…

    Submitted 9 July, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

    Comments: ACL 2024 (Findings)

  13. arXiv:2309.16231  [pdf, other]

    cs.CL

    Controllable Text Generation with Residual Memory Transformer

    Authors: Hanqing Zhang, Sun Si, Haiming Wu, Dawei Song

    Abstract: Large-scale Causal Language Models (CLMs), e.g., GPT3 and ChatGPT, have brought great success in text generation. However, it is still an open challenge to control the generation process of CLM while balancing flexibility, control granularity, and generation efficiency. In this paper, we provide a new alternative for controllable text generation (CTG), by designing a non-intrusive, lightweight con…

    Submitted 28 September, 2023; originally announced September 2023.

    Comments: GitHub: https://github.com/littlehacker26/Residual_Memory_Transformer

    Journal ref: ACL 2024

  14. arXiv:2309.11065  [pdf, other]

    cs.CL

    UniPCM: Universal Pre-trained Conversation Model with Task-aware Automatic Prompt

    Authors: Yucheng Cai, Wentao Ma, Yuchuan Wu, Shuzheng Si, Yuan Shao, Zhijian Ou, Yongbin Li

    Abstract: Recent research has shown that multi-task pre-training greatly improves the model's robustness and transfer ability, which is crucial for building a high-quality dialog system. However, most previous works on multi-task pre-training rely heavily on human-defined input format or prompt, which is not optimal in quality and quantity. In this work, we propose to use Task-based Automatic Prompt generat…

    Submitted 20 September, 2023; originally announced September 2023.

  15. arXiv:2309.07915  [pdf, other]

    cs.CL cs.AI cs.CV

    MMICL: Empowering Vision-language Model with Multi-Modal In-Context Learning

    Authors: Haozhe Zhao, Zefan Cai, Shuzheng Si, Xiaojian Ma, Kaikai An, Liang Chen, Zixuan Liu, Sheng Wang, Wenjuan Han, Baobao Chang

    Abstract: Since the resurgence of deep learning, vision-language models (VLMs) enhanced by large language models (LLMs) have grown exponentially in popularity. However, while LLMs can utilize extensive background knowledge and task information with in-context learning, most VLMs still struggle with understanding complex multi-modal prompts with multiple images, making VLMs less effective in downstream visio…

    Submitted 20 March, 2024; v1 submitted 14 September, 2023; originally announced September 2023.

    Comments: Accepted by ICLR2024

  16. arXiv:2307.00866  [pdf, other]

    cs.CL cs.AI

    Mining Clues from Incomplete Utterance: A Query-enhanced Network for Incomplete Utterance Rewriting

    Authors: Shuzheng Si, Shuang Zeng, Baobao Chang

    Abstract: Incomplete utterance rewriting has recently raised wide attention. However, previous works do not consider the semantic structural information between incomplete utterance and rewritten utterance or model the semantic structure implicitly and insufficiently. To address this problem, we propose a QUEry-Enhanced Network (QUEEN). Firstly, our proposed query template explicitly brings guided semantic…

    Submitted 27 July, 2023; v1 submitted 3 July, 2023; originally announced July 2023.

    Comments: NAACL 2022

  17. arXiv:2305.14987  [pdf, other]

    cs.CL

    Investigating Table-to-Text Generation Capabilities of LLMs in Real-World Information Seeking Scenarios

    Authors: Yilun Zhao, Haowei Zhang, Shengyun Si, Linyong Nan, Xiangru Tang, Arman Cohan

    Abstract: Tabular data is prevalent across various industries, necessitating significant time and effort for users to understand and manipulate for their information-seeking purposes. The advancements in large language models (LLMs) have shown enormous potential to improve user efficiency. However, the adoption of LLMs in real-world applications for table information seeking remains underexplored. In this p…

    Submitted 30 October, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: Camera-ready version for EMNLP 2023 industry track

  18. arXiv:2305.13040  [pdf, other]

    cs.CL cs.AI

    SpokenWOZ: A Large-Scale Speech-Text Benchmark for Spoken Task-Oriented Dialogue Agents

    Authors: Shuzheng Si, Wentao Ma, Haoyu Gao, Yuchuan Wu, Ting-En Lin, Yinpei Dai, Hangyu Li, Rui Yan, Fei Huang, Yongbin Li

    Abstract: Task-oriented dialogue (TOD) models have made significant progress in recent years. However, previous studies primarily focus on datasets written by annotators, which has resulted in a gap between academic research and real-world spoken conversation scenarios. While several small-scale spoken TOD datasets are proposed to address robustness issues such as ASR errors, they ignore the unique challeng…

    Submitted 12 March, 2024; v1 submitted 22 May, 2023; originally announced May 2023.

    Comments: NeurIPS 2023

  19. arXiv:2305.08372  [pdf, other]

    cs.CL cs.MM

    Hierarchical Aligned Multimodal Learning for NER on Tweet Posts

    Authors: Peipei Liu, Hong Li, Yimo Ren, Jie Liu, Shuaizong Si, Hongsong Zhu, Limin Sun

    Abstract: Mining structured knowledge from tweets using named entity recognition (NER) can be beneficial for many down stream applications such as recommendation and intention understanding. With tweet posts tending to be multimodal, multimodal named entity recognition (MNER) has attracted more attention. In this paper, we propose a novel approach, which can dynamically align the image and text sequence and…

    Submitted 4 January, 2024; v1 submitted 15 May, 2023; originally announced May 2023.

  20. arXiv:2305.04076  [pdf, other]

    cs.CL cs.AI

    SANTA: Separate Strategies for Inaccurate and Incomplete Annotation Noise in Distantly-Supervised Named Entity Recognition

    Authors: Shuzheng Si, Zefan Cai, Shuang Zeng, Guoqiang Feng, Jiaxing Lin, Baobao Chang

    Abstract: Distantly-Supervised Named Entity Recognition effectively alleviates the burden of time-consuming and expensive annotation in the supervised setting. But the context-free matching process and the limited coverage of knowledge bases introduce inaccurate and incomplete annotation noise respectively. Previous studies either considered only incomplete annotation noise or indiscriminately handle two ty…

    Submitted 28 July, 2023; v1 submitted 6 May, 2023; originally announced May 2023.

    Comments: Findings of ACL2023

  21. arXiv:2304.07778  [pdf]

    cs.CL

    SikuGPT: A Generative Pre-trained Model for Intelligent Information Processing of Ancient Texts from the Perspective of Digital Humanities

    Authors: Liu Chang, Wang Dongbo, Zhao Zhixiao, Hu Die, Wu Mengcheng, Lin Litao, Shen Si, Li Bin, Liu Jiangfeng, Zhang Hai, Zhao Lianzheng

    Abstract: The rapid advance in artificial intelligence technology has facilitated the prosperity of digital humanities research. Against such backdrop, research methods need to be transformed in the intelligent processing of ancient texts, which is a crucial component of digital humanities research, so as to adapt to new development trends in the wave of AIGC. In this study, we propose a GPT model called Si…

    Submitted 16 April, 2023; originally announced April 2023.

    Comments: 20 pages, 1 figure

  22. arXiv:2303.08606  [pdf, other]

    cs.CL cs.AI

    On the Calibration and Uncertainty with Pólya-Gamma Augmentation for Dialog Retrieval Models

    Authors: Tong Ye, Shijing Si, Jianzong Wang, Ning Cheng, Zhitao Li, Jing Xiao

    Abstract: Deep neural retrieval models have amply demonstrated their power but estimating the reliability of their predictions remains challenging. Most dialog response retrieval models output a single score for a response on how relevant it is to a given question. However, the bad calibration of deep neural network results in various uncertainty for the single score such that the unreliable predictions alw…

    Submitted 15 March, 2023; originally announced March 2023.

    Comments: Accepted by AAAI 2023

  23. arXiv:2211.10586  [pdf, other]

    cs.CV cs.AI

    Scaling Up Dataset Distillation to ImageNet-1K with Constant Memory

    Authors: Justin Cui, Ruochen Wang, Si Si, Cho-Jui Hsieh

    Abstract: Dataset Distillation is a newly emerging area that aims to distill large datasets into much smaller and highly informative synthetic ones to accelerate training and reduce storage. Among various dataset distillation methods, trajectory-matching-based methods (MTT) have achieved SOTA performance in many tasks, e.g., on CIFAR-10/100. However, due to exorbitant memory consumption when unrolling optim…

    Submitted 31 October, 2023; v1 submitted 18 November, 2022; originally announced November 2022.

  24. arXiv:2211.04620  [pdf, other]

    cs.CL cs.AI

    DeepE: a deep neural network for knowledge graph embedding

    Authors: Zhu Danhao, Shen Si, Huang Shujian, Yin Chang, Ding Ziqi

    Abstract: Recently, neural network based methods have shown their power in learning more expressive features on the task of knowledge graph embedding (KGE). However, the performance of deep methods often falls behind the shallow ones on simple graphs. One possible reason is that deep models are difficult to train, while shallow models might suffice for accurately representing the structure of the simple KGs…

    Submitted 8 November, 2022; originally announced November 2022.

    Comments: 10 pages, 5 figures, 7 tables

  25. arXiv:2211.00635  [pdf, other]

    cs.CL cs.LG

    Two-stage LLM Fine-tuning with Less Specialization and More Generalization

    Authors: Yihan Wang, Si Si, Daliang Li, Michal Lukasik, Felix Yu, Cho-Jui Hsieh, Inderjit S Dhillon, Sanjiv Kumar

    Abstract: Pretrained large language models (LLMs) are general purpose problem solvers applicable to a diverse set of tasks with prompts. They can be further improved towards a specific task by fine-tuning on a specialized dataset. However, fine-tuning usually makes the model narrowly specialized on this dataset with reduced general in-context learning performances, which is undesirable whenever the fine-tun…

    Submitted 12 March, 2024; v1 submitted 1 November, 2022; originally announced November 2022.

    Comments: ICLR 2024

  26. arXiv:2210.17170  [pdf, other]

    cs.IR

    Efficient Document Retrieval by End-to-End Refining and Quantizing BERT Embedding with Contrastive Product Quantization

    Authors: Zexuan Qiu, Qinliang Su, Jianxing Yu, Shijing Si

    Abstract: Efficient document retrieval heavily relies on the technique of semantic hashing, which learns a binary code for every document and employs Hamming distance to evaluate document distances. However, existing semantic hashing methods are mostly established on outdated TFIDF features, which obviously do not contain lots of important semantic information about documents. Furthermore, the Hamming dista…

    Submitted 31 October, 2022; originally announced October 2022.

    Journal ref: EMNLP 2022

  27. arXiv:2210.12818  [pdf, other]

    cs.CV

    Pushing the Efficiency Limit Using Structured Sparse Convolutions

    Authors: Vinay Kumar Verma, Nikhil Mehta, Shijing Si, Ricardo Henao, Lawrence Carin

    Abstract: Weight pruning is among the most popular approaches for compressing deep convolutional neural networks. Recent work suggests that in a randomly initialized deep neural network, there exist sparse subnetworks that achieve performance comparable to the original network. Unfortunately, finding these subnetworks involves iterative stages of training and pruning, which can be computationally expensive.…

    Submitted 23 October, 2022; originally announced October 2022.

    Comments: Accepted at the IEEE Winter Conference on Applications of Computer Vision, WACV 2023

  28. arXiv:2210.03627  [pdf, other]

    cs.CV cs.AI

    Pose Guided Human Image Synthesis with Partially Decoupled GAN

    Authors: Jianhan Wu, Jianzong Wang, Shijing Si, Xiaoyang Qu, Jing Xiao

    Abstract: Pose Guided Human Image Synthesis (PGHIS) is a challenging task of transforming a human image from the reference pose to a target pose while preserving its style. Most existing methods encode the texture of the whole reference human image into a latent space, and then utilize a decoder to synthesize the image texture of the target pose. However, it is difficult to recover the detailed texture of t…

    Submitted 7 October, 2022; originally announced October 2022.

    Comments: 16 pages, 14th Asian Conference on Machine Learning conference

  29. arXiv:2209.15276  [pdf, other]

    cs.LG cs.AI

    Machine Unlearning Method Based On Projection Residual

    Authors: Zihao Cao, Jianzong Wang, Shijing Si, Zhangcheng Huang, Jing Xiao

    Abstract: Machine learning models (mainly neural networks) are used more and more in real life. Users feed their data to the model for training. But these processes are often one-way. Once trained, the model remembers the data. Even when data is removed from the dataset, the effects of these data persist in the model. With more and more laws and regulations around the world protecting data privacy, it becom…

    Submitted 30 September, 2022; originally announced September 2022.

    Comments: This paper is accepted by DSAA2022. The 9th IEEE International Conference on Data Science and Advanced Analytics

  30. arXiv:2209.15181  [pdf, other]

    cs.LG cs.AI q-bio.GN

    RL-MD: A Novel Reinforcement Learning Approach for DNA Motif Discovery

    Authors: Wen Wang, Jianzong Wang, Shijing Si, Zhangcheng Huang, Jing Xiao

    Abstract: The extraction of sequence patterns from a collection of functionally linked unlabeled DNA sequences is known as DNA motif discovery, and it is a key task in computational biology. Several deep learning-based techniques have recently been introduced to address this issue. However, these algorithms can not be used in real-world situations because of the need for labeled data. Here, we presented RL-…

    Submitted 29 September, 2022; originally announced September 2022.

    Comments: This paper is accepted by DSAA2022. The 9th IEEE International Conference on Data Science and Advanced Analytics

  31. arXiv:2209.10088  [pdf, other]

    eess.AS cs.AI cs.LG cs.SD

    Boosting Star-GANs for Voice Conversion with Contrastive Discriminator

    Authors: Shijing Si, Jianzong Wang, Xulong Zhang, Xiaoyang Qu, Ning Cheng, Jing Xiao

    Abstract: Nonparallel multi-domain voice conversion methods such as the StarGAN-VCs have been widely applied in many scenarios. However, the training of these models usually poses a challenge due to their complicated adversarial network architectures. To address this, in this work we leverage the state-of-the-art contrastive learning techniques and incorporate an efficient Siamese network structure into the…

    Submitted 27 September, 2022; v1 submitted 20 September, 2022; originally announced September 2022.

    Comments: 12 pages, 3 figures, Accepted by ICONIP 2022

  32. arXiv:2209.01646  [pdf, other]

    cs.CL

    SCL-RAI: Span-based Contrastive Learning with Retrieval Augmented Inference for Unlabeled Entity Problem in NER

    Authors: Shuzheng Si, Shuang Zeng, Jiaxing Lin, Baobao Chang

    Abstract: Named Entity Recognition is the task to locate and classify the entities in the text. However, Unlabeled Entity Problem in NER datasets seriously hinders the improvement of NER performance. This paper proposes SCL-RAI to cope with this problem. Firstly, we decrease the distance of span representations with the same label while increasing it for different ones via span-based contrastive learning, w…

    Submitted 24 October, 2023; v1 submitted 4 September, 2022; originally announced September 2022.

    Comments: COLING 2022

  33. arXiv:2208.11628  [pdf, other]

    cs.IR cs.CY cs.LG

    Debias the Black-box: A Fair Ranking Framework via Knowledge Distillation

    Authors: Zhitao Zhu, Shijing Si, Jianzong Wang, Yaodong Yang, Jing Xiao

    Abstract: Deep neural networks can capture the intricate interaction history information between queries and documents, because of their many complicated nonlinear units, allowing them to provide correct search recommendations. However, service providers frequently face more complex obstacles in real-world circumstances, such as deployment cost constraints and fairness requirements. Knowledge distillation,…

    Submitted 24 August, 2022; originally announced August 2022.

    Comments: This paper has been accepted by the 23rd International Conference on Web Information Systems Engineering (WISE 2022)

  34. arXiv:2207.09639  [pdf, other]

    cs.LG cs.AI cs.CV

    DC-BENCH: Dataset Condensation Benchmark

    Authors: Justin Cui, Ruochen Wang, Si Si, Cho-Jui Hsieh

    Abstract: Dataset Condensation is a newly emerging technique aiming at learning a tiny dataset that captures the rich information encoded in the original dataset. As the size of datasets contemporary machine learning models rely on becomes increasingly large, condensation methods become a prominent direction for accelerating network training and reducing data storage. Despite numerous methods have been prop…

    Submitted 17 October, 2022; v1 submitted 19 July, 2022; originally announced July 2022.

  35. arXiv:2206.13071  [pdf, other]

    cs.SD cs.LG eess.AS

    Uncertainty Calibration for Deep Audio Classifiers

    Authors: Tong Ye, Shijing Si, Jianzong Wang, Ning Cheng, Jing Xiao

    Abstract: Although deep Neural Networks (DNNs) have achieved tremendous success in audio classification tasks, their uncertainty calibration are still under-explored. A well-calibrated model should be accurate when it is certain about its prediction and indicate high uncertainty when it is likely to be inaccurate. In this work, we investigate the uncertainty calibration for deep audio classifiers. In partic…

    Submitted 27 June, 2022; originally announced June 2022.

    Comments: Accepted by InterSpeech 2022, the first two authors contributed equally

  36. arXiv:2205.13415  [pdf, other]

    cs.LG

    A Fair Federated Learning Framework With Reinforcement Learning

    Authors: Yaqi Sun, Shijing Si, Jianzong Wang, Yuhan Dong, Zhitao Zhu, Jing Xiao

    Abstract: Federated learning (FL) is a paradigm where many clients collaboratively train a model under the coordination of a central server, while keeping the training data locally stored. However, heterogeneous data distributions over different clients remain a challenge to mainstream FL algorithms, which may cause slow convergence, overall performance degradation and unfairness of performance across clien…

    Submitted 26 May, 2022; originally announced May 2022.

  37. arXiv:2205.13342  [pdf, other]

    cs.SE cs.AI cs.CL

    Leveraging Causal Inference for Explainable Automatic Program Repair

    Authors: Jianzong Wang, Shijing Si, Zhitao Zhu, Xiaoyang Qu, Zhenhou Hong, Jing Xiao

    Abstract: Deep learning models have made significant progress in automatic program repair. However, the black-box nature of these methods has restricted their practical applications. To address this challenge, this paper presents an interpretable approach for program repair based on sequence-to-sequence models with causal inference and our method is called CPR, short for causal program repair. Our CPR can g…

    Submitted 6 June, 2022; v1 submitted 26 May, 2022; originally announced May 2022.

    Comments: This paper has been accepted by IJCNN2022

  38. arXiv:2205.13300  [pdf, other]

    cs.CL cs.AI cs.LG

    Federated Non-negative Matrix Factorization for Short Texts Topic Modeling with Mutual Information

    Authors: Shijing Si, Jianzong Wang, Ruiyi Zhang, Qinliang Su, Jing Xiao

    Abstract: Non-negative matrix factorization (NMF) based topic modeling is widely used in natural language processing (NLP) to uncover hidden topics of short text documents. Usually, training a high-quality topic model requires large amount of textual data. In many real-world scenarios, customer textual data should be private and sensitive, precluding uploading to data centers. This paper proposes a Federate…

    Submitted 26 May, 2022; originally announced May 2022.

    Comments: 7 pages, 4 figures, accepted by IJCNN 2022

    Journal ref: IJCNN 2022

  39. arXiv:2205.13299  [pdf, other]

    cs.CL cs.AI cs.LG

    Federated Split BERT for Heterogeneous Text Classification

    Authors: Zhengyang Li, Shijing Si, Jianzong Wang, Jing Xiao

    Abstract: Pre-trained BERT models have achieved impressive performance in many natural language processing (NLP) tasks. However, in many real-world situations, textual data are usually decentralized over many clients and unable to be uploaded to a central server due to privacy protection and regulations. Federated learning (FL) enables multiple clients collaboratively to train a global model while keeping t…

    Submitted 26 May, 2022; originally announced May 2022.

    Comments: 8 pages, 6 figures, accepted by IJCNN 2022

  40. arXiv:2205.13121  [pdf, other]

    cs.IR cs.CY cs.LG

    Cali3F: Calibrated Fast Fair Federated Recommendation System

    Authors: Zhitao Zhu, Shijing Si, Jianzong Wang, Jing Xiao

    Abstract: The increasingly stringent regulations on privacy protection have sparked interest in federated learning. As a distributed machine learning framework, it bridges isolated data islands by training a global model over devices while keeping data localized. Specific to recommendation systems, many federated recommendation algorithms have been proposed to realize the privacy-preserving collaborative re…

    Submitted 25 May, 2022; originally announced May 2022.

    Comments: This paper has been accepted by IJCNN2022

  41. arXiv:2205.12461  [pdf, other]

    cs.LG cs.AI

    Augmentation-induced Consistency Regularization for Classification

    Authors: Jianhan Wu, Shijing Si, Jianzong Wang, Jing Xiao

    Abstract: Deep neural networks have become popular in many supervised learning tasks, but they may suffer from overfitting when the training dataset is limited. To mitigate this, many researchers use data augmentation, which is a widely used and effective method for increasing the variety of datasets. However, the randomness introduced by data augmentation causes inevitable inconsistency between training an…

    Submitted 26 May, 2022; v1 submitted 24 May, 2022; originally announced May 2022.

    Comments: This paper is accepted by IJCNN 2022

  42. arXiv:2205.12022  [pdf, other]

    cs.CV cs.AI cs.LG

    Improving Human Image Synthesis with Residual Fast Fourier Transformation and Wasserstein Distance

    Authors: Jianhan Wu, Shijing Si, Jianzong Wang, Jing Xiao

    Abstract: With the rapid development of the Metaverse, virtual humans have emerged, and human image synthesis and editing techniques, such as pose transfer, have recently become popular. Most of the existing techniques rely on GANs, which can generate good human images even with large variants and occlusions. But to the best of our knowledge, the existing state-of-the-art method still has the following problems:…

    Submitted 26 May, 2022; v1 submitted 24 May, 2022; originally announced May 2022.

    Comments: This paper is accepted by IJCNN 2022

  43. arXiv:2202.11424  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    Towards Speaker Age Estimation with Label Distribution Learning

    Authors: Shijing Si, Jianzong Wang, Junqing Peng, Jing Xiao

    Abstract: Existing methods for speaker age estimation usually treat it as a multi-class classification or a regression problem. However, precise age identification remains a challenge due to label ambiguity, i.e., utterances from adjacent ages of the same person are often indistinguishable. To address this, we utilize the ambiguous information among the age labels, convert each age label into a discre…

    Submitted 23 February, 2022; originally announced February 2022.

    Comments: Accepted by the 47th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2022)

  44. arXiv:2202.10787  [pdf, other]

    cs.CL cs.AI cs.CV cs.LG

    VU-BERT: A Unified framework for Visual Dialog

    Authors: Tong Ye, Shijing Si, Jianzong Wang, Rui Wang, Ning Cheng, Jing Xiao

    Abstract: The visual dialog task attempts to train an agent to answer multi-turn questions given an image, which requires a deep understanding of interactions between the image and dialog history. Existing research tends to employ modality-specific modules to model the interactions, which might be troublesome to use. To fill this gap, we propose a unified framework for image-text joint embedding,…

    Submitted 22 February, 2022; originally announced February 2022.

    Comments: 5 pages, 2 figures, accepted by 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2022)

  45. arXiv:2107.04806  [pdf, other]

    cs.SD cs.CV eess.AS eess.IV

    Speech2Video: Cross-Modal Distillation for Speech to Video Generation

    Authors: Shijing Si, Jianzong Wang, Xiaoyang Qu, Ning Cheng, Wenqi Wei, Xinghua Zhu, Jing Xiao

    Abstract: This paper investigates a novel task of talking face video generation solely from speeches. The speech-to-video generation technique can spark interesting applications in entertainment, customer service, and human-computer-interaction industries. Indeed, the timbre, accent and speed in speeches could contain rich information relevant to speakers' appearance. The challenge mainly lies in disentangl…

    Submitted 10 July, 2021; originally announced July 2021.

    Comments: Accepted by InterSpeech2021

  46. arXiv:2107.04803  [pdf, other]

    cs.SD eess.AS

    Variational Information Bottleneck for Effective Low-resource Audio Classification

    Authors: Shijing Si, Jianzong Wang, Huiming Sun, Jianhan Wu, Chuanyao Zhang, Xiaoyang Qu, Ning Cheng, Lei Chen, Jing Xiao

    Abstract: Large-scale deep neural networks (DNNs) such as convolutional neural networks (CNNs) have achieved impressive performance in audio classification for their powerful capacity and strong generalization ability. However, when training a DNN model on low-resource tasks, it is usually prone to overfitting the small data and learning too much redundant information. To address this issue, we propose to u…

    Submitted 10 July, 2021; originally announced July 2021.

    Comments: Accepted by InterSpeech 2021

  47. arXiv:2106.02795  [pdf, other]

    cs.LG cs.AI cs.CV

    Learnable Fourier Features for Multi-Dimensional Spatial Positional Encoding

    Authors: Yang Li, Si Si, Gang Li, Cho-Jui Hsieh, Samy Bengio

    Abstract: Attentional mechanisms are order-invariant. Positional encoding is a crucial component to allow attention-based deep model architectures such as Transformer to address sequences or images where the position of information matters. In this paper, we propose a novel positional encoding method based on learnable Fourier features. Instead of hard-coding each position as a token or a vector, we represe…

    Submitted 8 November, 2021; v1 submitted 5 June, 2021; originally announced June 2021.

    Comments: 35th Conference on Neural Information Processing Systems (NeurIPS 2021)

  48. arXiv:2103.06413  [pdf, other]

    cs.CL cs.LG

    FairFil: Contrastive Neural Debiasing Method for Pretrained Text Encoders

    Authors: Pengyu Cheng, Weituo Hao, Siyang Yuan, Shijing Si, Lawrence Carin

    Abstract: Pretrained text encoders, such as BERT, have been applied increasingly in various natural language processing (NLP) tasks, and have recently demonstrated significant performance gains. However, recent studies have demonstrated the existence of social bias in these pretrained NLP models. Although prior works have made progress on word-level debiasing, improved sentence-level fairness of pretrained…

    Submitted 10 March, 2021; originally announced March 2021.

    Comments: Accepted by the 9th International Conference on Learning Representations (ICLR 2021)

  49. arXiv:2101.00355  [pdf, other]

    cs.LG cs.AI

    Reinforcement Learning for Flexibility Design Problems

    Authors: Yehua Wei, Lei Zhang, Ruiyi Zhang, Shijing Si, Hao Zhang, Lawrence Carin

    Abstract: Flexibility design problems are a class of problems that appear in strategic decision-making across industries, where the objective is to design a (e.g., manufacturing) network that affords flexibility and adaptivity. The underlying combinatorial nature and stochastic objectives make flexibility design problems challenging for standard optimization methods. In this paper, we develop a reinforcem…

    Submitted 18 January, 2021; v1 submitted 1 January, 2021; originally announced January 2021.

  50. arXiv:2010.09889  [pdf, other]

    cs.LG math.OC stat.ML

    How much progress have we made in neural network training? A New Evaluation Protocol for Benchmarking Optimizers

    Authors: Yuanhao Xiong, Xuanqing Liu, Li-Cheng Lan, Yang You, Si Si, Cho-Jui Hsieh

    Abstract: Many optimizers have been proposed for training deep neural networks, and they often have multiple hyperparameters, which makes it tricky to benchmark their performance. In this work, we propose a new benchmarking protocol to evaluate both end-to-end efficiency (training a model from scratch without knowing the best hyperparameter) and data-addition training efficiency (the previously selected hype…

    Submitted 19 October, 2020; originally announced October 2020.