Showing 1–33 of 33 results for author: Fung, C

Searching in archive cs.
  1. arXiv:2406.07925  [pdf, other]

    cs.DC

    FDLoRA: Personalized Federated Learning of Large Language Model via Dual LoRA Tuning

    Authors: Jiaxing Qi, Zhongzhi Luan, Shaohan Huang, Carol Fung, Hailong Yang, Depei Qian

    Abstract: Large language models (LLMs) have emerged as important components across various fields, yet their training requires substantial computational resources and abundant labeled data. This poses a challenge to robustly training LLMs for individual users (clients). To tackle this challenge, the intuitive idea is to introduce federated learning (FL), which can collaboratively train models on distributed pri…

    Submitted 12 June, 2024; originally announced June 2024.

  2. arXiv:2404.08562  [pdf, other]

    cs.CR cs.AI cs.LG

    Dynamic Neural Control Flow Execution: An Agent-Based Deep Equilibrium Approach for Binary Vulnerability Detection

    Authors: Litao Li, Steven H. H. Ding, Andrew Walenstein, Philippe Charland, Benjamin C. M. Fung

    Abstract: Software vulnerabilities are a challenge in cybersecurity. Manual security patches are often difficult and slow to be deployed, while new vulnerabilities are created. Binary code vulnerability detection is less studied and more complex compared to source code, and this has important practical implications. Deep learning has become an efficient and powerful tool in the security domain, where it pro…

    Submitted 3 April, 2024; originally announced April 2024.

  3. arXiv:2402.01905  [pdf, other]

    cs.SI cs.CY cs.MA

    Carthago Delenda Est: Co-opetitive Indirect Information Diffusion Model for Influence Operations on Online Social Media

    Authors: Jwen Fai Low, Benjamin C. M. Fung, Farkhund Iqbal, Claude Fachkha

    Abstract: For a state or non-state actor whose credibility is bankrupt, relying on bots to conduct non-attributable, non-accountable, and seemingly-grassroots-but-decentralized-in-actuality influence/information operations (info ops) on social media can help circumvent the issue of trust deficit while advancing its interests. Planning and/or defending against decentralized info ops can be aided by computati…

    Submitted 6 February, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

    Comments: 60 pages, 9 figures, 1 table

  4. arXiv:2310.10461  [pdf, other]

    cs.LG cs.CV

    Model Selection of Zero-shot Anomaly Detectors in the Absence of Labeled Validation Data

    Authors: Clement Fung, Chen Qiu, Aodong Li, Maja Rudolph

    Abstract: Anomaly detection requires detecting abnormal samples in large unlabeled datasets. While progress in deep learning and the advent of foundation models has produced powerful zero-shot anomaly detection methods, their deployment in practice is often hindered by the lack of labeled data -- without it, their detection performance cannot be evaluated reliably. In this work, we propose SWSA (Selection W…

    Submitted 9 February, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

    Comments: 14 pages

  5. arXiv:2309.01189  [pdf, other]

    cs.LG cs.AI cs.SE

    LogGPT: Exploring ChatGPT for Log-Based Anomaly Detection

    Authors: Jiaxing Qi, Shaohan Huang, Zhongzhi Luan, Carol Fung, Hailong Yang, Depei Qian

    Abstract: The increasing volume of log data produced by software-intensive systems makes it impractical to analyze them manually. Many deep learning-based methods have been proposed for log-based anomaly detection. These methods face several challenges such as high-dimensional and noisy log data, class imbalance, generalization, and model interpretability. Recently, ChatGPT has shown promising results in va…

    Submitted 3 September, 2023; originally announced September 2023.

  6. arXiv:2307.10631  [pdf, other]

    cs.SE cs.AI

    Pluvio: Assembly Clone Search for Out-of-domain Architectures and Libraries through Transfer Learning and Conditional Variational Information Bottleneck

    Authors: Zhiwei Fu, Steven H. H. Ding, Furkan Alaca, Benjamin C. M. Fung, Philippe Charland

    Abstract: The practice of code reuse is crucial in software development for a faster and more efficient development lifecycle. In reality, however, code reuse practices lack proper control, resulting in issues such as vulnerability propagation and intellectual property infringements. Assembly clone search, a critical shift-right defence mechanism, has been effective in identifying vulnerable code resulting…

    Submitted 20 July, 2023; originally announced July 2023.

    Comments: 13 pages and 4 figures. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  7. arXiv:2303.11715  [pdf, other]

    cs.NI

    LogQA: Question Answering in Unstructured Logs

    Authors: Shaohan Huang, Yi Liu, Carol Fung, Jiaxing Qi, Hailong Yang, Zhongzhi Luan

    Abstract: Modern systems produce a large volume of logs to record run-time status and events. System operators use these raw logs to track a system and obtain useful information for diagnosing system anomalies. One of the most important problems in this area is to help operators find the answers to log-based questions efficiently and in a user-friendly way. In this work, we propose LogQA, which aims at ans…

    Submitted 21 March, 2023; originally announced March 2023.

  8. arXiv:2210.11711  [pdf, ps, other]

    cs.CL cs.AI

    Modelling Multi-relations for Convolutional-based Knowledge Graph Embedding

    Authors: Sirui Li, Kok Wai Wong, Dengya Zhu, Chun Che Fung

    Abstract: Representation learning of knowledge graphs aims to embed entities and relations into low-dimensional vectors. Most existing works only consider the direct relations or paths between an entity pair. It is considered that such approaches disconnect the semantic connection of multi-relations between an entity pair, and we propose a convolutional and multi-relational representation learning model, Co…

    Submitted 20 October, 2022; originally announced October 2022.

    Comments: Knowledge-Based and Intelligent Information & Engineering Systems: Proceedings of the 26th International Conference KES2022

  9. arXiv:2209.15207  [pdf, other]

    math.ST cs.LG cs.NE stat.ME

    Mixture of experts models for multilevel data: modelling framework and approximation theory

    Authors: Tsz Chai Fung, Spark C. Tseung

    Abstract: Multilevel data are prevalent in many real-world applications. However, it remains an open research problem to identify and justify a class of models that flexibly capture a wide range of multilevel data. Motivated by the versatility of the mixture of experts (MoE) models in fitting regression data, in this article we extend upon the MoE and study a class of mixed MoE (MMoE) models for multilevel…

    Submitted 29 September, 2022; originally announced September 2022.

  10. arXiv:2203.13570  [pdf, other]

    cs.LG

    Improving Question Answering over Knowledge Graphs Using Graph Summarization

    Authors: Sirui Li, Kok Wai Wong, Dengya Zhu, Chun Che Fung

    Abstract: Question Answering (QA) systems over Knowledge Graphs (KGs) (KGQA) automatically answer natural language questions using triples contained in a KG. The key idea is to represent questions and entities of a KG as low-dimensional embeddings. Previous KGQAs have attempted to represent entities using Knowledge Graph Embedding (KGE) and Deep Learning (DL) methods. However, KGEs are too shallow to captur…

    Submitted 25 March, 2022; originally announced March 2022.

    Comments: The paper is accepted by ICONIP 2021

  11. Privacy Guarantees of BLE Contact Tracing: A Case Study on COVIDWISE

    Authors: Salman Ahmed, Ya Xiao, Taejoong Chung, Carol Fung, Moti Yung, Danfeng Yao

    Abstract: Google and Apple jointly introduced a digital contact tracing technology and an API called "exposure notification," to help health organizations and governments with contact tracing. The technology and its interplay with security and privacy constraints require investigation. In this study, we examine and analyze the security, privacy, and reliability of the technology with actual and typical scen…

    Submitted 16 December, 2021; v1 submitted 16 November, 2021; originally announced November 2021.

    Comments: © 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

    Journal ref: IEEE Computer 2021

  12. arXiv:2111.02303  [pdf, other]

    cs.LG cs.AI

    On the Effectiveness of Interpretable Feedforward Neural Network

    Authors: Miles Q. Li, Benjamin C. M. Fung, Adel Abusitta

    Abstract: Deep learning models have achieved state-of-the-art performance in many classification tasks. However, most of them cannot provide an interpretation for their classification results. Machine learning models that are interpretable are usually linear or piecewise linear and yield inferior performance. Non-linear models achieve much better classification performance, but it is hard to interpret their…

    Submitted 3 November, 2021; originally announced November 2021.

  13. arXiv:2110.08254  [pdf, other]

    cs.LG cs.CL

    Inconsistent Few-Shot Relation Classification via Cross-Attentional Prototype Networks with Contrastive Learning

    Authors: Hongru Wang, Zhijing Jin, Jiarun Cao, Gabriel Pui Cheong Fung, Kam-Fai Wong

    Abstract: Standard few-shot relation classification (RC) is designed to learn a robust classifier with only few labeled data for each class. However, previous works rarely investigate the effects of a different number of classes (i.e., $N$-way) and number of labeled data per class (i.e., $K$-shot) during training vs. testing. In this work, we define a new task, inconsistent few-shot RC, where the m…

    Submitted 13 October, 2021; originally announced October 2021.

  14. arXiv:2109.05234  [pdf, other]

    cs.CL cs.AI

    Prior Omission of Dissimilar Source Domain(s) for Cost-Effective Few-Shot Learning

    Authors: Zezhong Wang, Hongru Wang, Kwan Wai Chung, Jia Zhu, Gabriel Pui Cheong Fung, Kam-Fai Wong

    Abstract: Few-shot slot tagging is an emerging research topic in the field of Natural Language Understanding (NLU). With sufficient annotated data from source domains, the key challenge is how to train and adapt the model to another target domain which only has few labels. Conventional few-shot approaches use all the data from the source domains without considering inter-domain relations and implicitly assu…

    Submitted 11 September, 2021; originally announced September 2021.

  15. arXiv:2109.05187  [pdf, other]

    cs.CL cs.AI

    TopicRefine: Joint Topic Prediction and Dialogue Response Generation for Multi-turn End-to-End Dialogue System

    Authors: Hongru Wang, Mingyu Cui, Zimo Zhou, Gabriel Pui Cheong Fung, Kam-Fai Wong

    Abstract: A multi-turn dialogue always follows a specific topic thread, and topic shift at the discourse level occurs naturally as the conversation progresses, necessitating the model's ability to capture different topics and generate topic-aware responses. Previous research has either predicted the topic first and then generated the relevant response, or simply applied the attention mechanism to all topics…

    Submitted 11 September, 2021; originally announced September 2021.

  16. arXiv:2104.08530  [pdf, other]

    cs.CL

    The Topic Confusion Task: A Novel Scenario for Authorship Attribution

    Authors: Malik H. Altakrori, Jackie Chi Kit Cheung, Benjamin C. M. Fung

    Abstract: Authorship attribution is the problem of identifying the most plausible author of an anonymous text from a set of candidate authors. Researchers have investigated same-topic and cross-topic scenarios of authorship attribution, which differ according to whether new, unseen topics are used in the testing phase. However, neither scenario allows us to explain whether errors are caused by a failure to…

    Submitted 9 September, 2021; v1 submitted 17 April, 2021; originally announced April 2021.

    Comments: 15 pages (9 + ref./appin.), 6 figures, Accepted to Findings of EMNLP 2021

  17. arXiv:2011.08772  [pdf, other]

    cs.CL cs.AI

    KddRES: A Multi-level Knowledge-driven Dialogue Dataset for Restaurant Towards Customized Dialogue System

    Authors: Hongru Wang, Min Li, Zimo Zhou, Gabriel Pui Cheong Fung, Kam-Fai Wong

    Abstract: Compared with the CrossWOZ (Chinese) and MultiWOZ (English) datasets, which have coarse-grained information, there is no dataset which handles fine-grained and hierarchical level information properly. In this paper, we publish the first Cantonese knowledge-driven Dialogue Dataset for REStaurant (KddRES) in Hong Kong, which grounds the information in multi-turn conversations to one specific restaurant. Our…

    Submitted 14 December, 2021; v1 submitted 17 November, 2020; originally announced November 2020.

    Comments: 8 pages, 2 figures

  18. Learning Inter-Modal Correspondence and Phenotypes from Multi-Modal Electronic Health Records

    Authors: Kejing Yin, William K. Cheung, Benjamin C. M. Fung, Jonathan Poon

    Abstract: Non-negative tensor factorization has been shown to be a practical solution to automatically discover phenotypes from electronic health records (EHR) with minimal human supervision. Such methods generally require an input tensor describing the inter-modal interactions to be pre-established; however, the correspondence between different modalities (e.g., correspondence between medications and diagnos…

    Submitted 12 November, 2020; originally announced November 2020.

    Comments: Accepted by IEEE Transactions on Knowledge and Data Engineering (TKDE)

  19. arXiv:2006.09161  [pdf, other]

    cs.CL cs.AI cs.LG

    CUHK at SemEval-2020 Task 4: CommonSense Explanation, Reasoning and Prediction with Multi-task Learning

    Authors: Hongru Wang, Xiangru Tang, Sunny Lai, Kwong Sak Leung, Jia Zhu, Gabriel Pui Cheong Fung, Kam-Fai Wong

    Abstract: This paper describes our system submitted to task 4 of SemEval 2020: Commonsense Validation and Explanation (ComVE), which consists of three sub-tasks. The task is to directly validate whether or not a given sentence makes sense and to require the model to explain it. Based on the BERT architecture with a multi-task setting, we propose an effective and interpretable "Explain, Reason and Predict" (ERP)…

    Submitted 27 July, 2020; v1 submitted 12 June, 2020; originally announced June 2020.

  20. arXiv:2006.06862   

    cs.LG q-bio.NC stat.AP stat.ML

    Deep Learning-based Stress Determinator for Mouse Psychiatric Analysis using Hippocampus Activity

    Authors: Donghan Liu, Benjamin C. M. Fung, Tak Pan Wong

    Abstract: Decoding neural activity to extract the information it transmits and put that information to other uses is a goal of neuroscience research. Because the field currently relies on traditional methods, we combine state-of-the-art deep learning techniques with the theory of neuron decoding and discuss their potential. Besides, the stress level that is related to…

    Submitted 27 June, 2020; v1 submitted 11 June, 2020; originally announced June 2020.

    Comments: The paper needs to be re-evaluated and reviewed, which may lead to significant changes

  21. arXiv:1909.06865  [pdf, other]

    cs.LG cs.CR stat.ML

    I-MAD: Interpretable Malware Detector Using Galaxy Transformer

    Authors: Miles Q. Li, Benjamin C. M. Fung, Philippe Charland, Steven H. H. Ding

    Abstract: Malware currently presents a number of serious threats to computer users. Signature-based malware detection methods are limited in detecting new malware samples that are significantly different from known ones. Therefore, machine learning-based methods have been proposed, but there are two challenges these methods face. The first is to model the full semantics behind the assembly code of malware.…

    Submitted 20 June, 2021; v1 submitted 15 September, 2019; originally announced September 2019.

    Comments: Published by Elsevier Computers & Security

  22. arXiv:1907.08736  [pdf, other]

    cs.CR cs.CL cs.LG

    ER-AE: Differentially Private Text Generation for Authorship Anonymization

    Authors: Haohan Bo, Steven H. H. Ding, Benjamin C. M. Fung, Farkhund Iqbal

    Abstract: Most privacy protection studies for textual data focus on removing explicit sensitive identifiers. However, personal writing style, as a strong indicator of authorship, is often neglected. Recent studies, such as SynTF, have shown promising results on privacy-preserving text mining. However, their anonymization algorithm can only output numeric term vectors which are difficult for the recip…

    Submitted 13 May, 2021; v1 submitted 19 July, 2019; originally announced July 2019.

  23. arXiv:1811.09904  [pdf, other]

    cs.LG cs.CR cs.DC stat.ML

    Biscotti: A Ledger for Private and Secure Peer-to-Peer Machine Learning

    Authors: Muhammad Shayan, Clement Fung, Chris J. M. Yoon, Ivan Beschastnikh

    Abstract: Federated Learning is the current state of the art in supporting secure multi-party machine learning (ML): data is maintained on the owner's device and the updates to the model are aggregated through a secure protocol. However, this process assumes a trusted centralized infrastructure for coordination, and clients must trust that the central service does not use the byproducts of client data. In a…

    Submitted 11 December, 2019; v1 submitted 24 November, 2018; originally announced November 2018.

    Comments: 20 pages

  24. arXiv:1811.09712  [pdf, other]

    cs.CR cs.DC cs.LG

    Dancing in the Dark: Private Multi-Party Machine Learning in an Untrusted Setting

    Authors: Clement Fung, Jamie Koerner, Stewart Grant, Ivan Beschastnikh

    Abstract: Distributed machine learning (ML) systems today use an unsophisticated threat model: data sources must trust a central ML process. We propose a brokered learning abstraction that allows data sources to contribute towards a globally-shared model with provable privacy guarantees in an untrusted setting. We realize this abstraction by building on federated learning, the state of the art in multi-part…

    Submitted 23 February, 2019; v1 submitted 23 November, 2018; originally announced November 2018.

    Comments: 16 pages

  25. All One Needs to Know about Fog Computing and Related Edge Computing Paradigms: A Complete Survey

    Authors: Ashkan Yousefpour, Caleb Fung, Tam Nguyen, Krishna Kadiyala, Fatemeh Jalali, Amirreza Niakanlahiji, Jian Kong, Jason P. Jue

    Abstract: With the Internet of Things (IoT) becoming part of our daily life and our environment, we expect rapid growth in the number of connected devices. IoT is expected to connect billions of devices and humans to bring promising advantages for us. With this growth, fog computing, along with its related edge computing paradigms, such as multi-access edge computing (MEC) and cloudlet, are seen as promisin…

    Submitted 13 February, 2019; v1 submitted 15 August, 2018; originally announced August 2018.

    Comments: 48 pages, 7 tables, 11 figures, 450 references. The data (categories and features/objectives of the papers) of this survey are now available publicly. Accepted by Elsevier Journal of Systems Architecture

  26. arXiv:1808.04866  [pdf, other]

    cs.LG cs.CR cs.DC stat.ML

    Mitigating Sybils in Federated Learning Poisoning

    Authors: Clement Fung, Chris J. M. Yoon, Ivan Beschastnikh

    Abstract: Machine learning (ML) over distributed multi-party data is required for a variety of domains. Existing approaches, such as federated learning, collect the outputs computed by a group of devices at a central aggregator and run iterative algorithms to train a globally shared model. Unfortunately, such approaches are susceptible to a variety of attacks, including model poisoning, which is made substa…

    Submitted 15 July, 2020; v1 submitted 14 August, 2018; originally announced August 2018.

    Comments: 16 pages, Extended technical version of conference paper "The Limitations of Federated Learning in Sybil Settings" accepted at RAID 2020

  27. arXiv:1711.06710  [pdf, other]

    cs.CY

    Instant Accident Reporting and Crowdsensed Road Condition Analytics for Smart Cities

    Authors: Ashkan Yousefpour, Caleb Fung, Tam Nguyen, David Hong, Daniel Zhang

    Abstract: The following report contains information about a technology proposed by the authors, which consists of a device that sits inside a vehicle and constantly monitors the car's information. It can determine speed, g-force, and location coordinates. Using these data, the device can detect a car crash or a pothole on the road. The data collected from the car is forwarded to a server for more in-depth…

    Submitted 17 November, 2017; originally announced November 2017.

    Comments: 8 pages, 7 figures, submitted to "Communication Technology Changing the World Competition", Sponsored by IEEE Communication Society

  28. arXiv:1606.01219  [pdf, other]

    cs.CL cs.CY cs.SI

    Learning Stylometric Representations for Authorship Analysis

    Authors: Steven H. H. Ding, Benjamin C. M. Fung, Farkhund Iqbal, William K. Cheung

    Abstract: Authorship analysis (AA) is the study of unveiling the hidden properties of authors from a body of exponentially exploding textual data. It extracts an author's identity and sociolinguistic characteristics based on the reflected writing styles in the text. It is an essential process for various areas, such as cybercrime investigation, psycholinguistics, political socialization, etc. However, most…

    Submitted 3 June, 2016; originally announced June 2016.

    ACM Class: K.4.1; I.7.5; I.2.7

  29. arXiv:1405.0198   

    quant-ph cs.CR

    No Superluminal Signaling Implies Unconditionally Secure Bit Commitment

    Authors: H. F. Chau, C. -H. Fred Fung, H. -K. Lo

    Abstract: Bit commitment (BC) is an important cryptographic primitive for an agent to convince a mutually mistrustful party that she has already made a binding choice of 0 or 1 but only to reveal her choice at a later time. Ideally, a BC protocol should be simple, reliable, easy to implement using existing technologies, and most importantly unconditionally secure in the sense that its security is based on a…

    Submitted 18 November, 2014; v1 submitted 1 May, 2014; originally announced May 2014.

    Comments: This paper has been withdrawn by the authors due to a crucial oversight on an earlier work by A. Kent

  30. arXiv:1208.2773  [pdf, other]

    cs.DB

    Privacy Preserving Record Linkage via grams Projections

    Authors: Luca Bonomi, Li Xiong, Rui Chen, Benjamin C. M. Fung

    Abstract: Record linkage has been extensively used in various data mining applications involving sharing data. While the amount of available data is growing, the concern of disclosing sensitive information poses the problem of utility vs privacy. In this paper, we study the problem of private record linkage via secure data transformations. In contrast to the existing techniques in this area, we propose a no…

    Submitted 13 August, 2012; originally announced August 2012.

  31. arXiv:1112.2020  [pdf, ps, other]

    cs.DB

    Differentially Private Trajectory Data Publication

    Authors: Rui Chen, Benjamin C. M. Fung, Bipin C. Desai

    Abstract: With the increasing prevalence of location-aware devices, trajectory data has been generated and collected in various application domains. Trajectory data carries rich information that is useful for many data analysis tasks. Yet, improper publishing and use of trajectory data could jeopardize individual privacy. However, it has been shown that existing privacy-preserving trajectory data publishing…

    Submitted 9 December, 2011; originally announced December 2011.

  32. arXiv:1002.3190  [pdf, ps, other]

    cs.CR cs.NI

    A Distributed Sequential Algorithm for Collaborative Intrusion Detection Networks

    Authors: Quanyan Zhu, Carol J. Fung, Raouf Boutaba, Tamer Basar

    Abstract: Collaborative intrusion detection networks are often used to gain better detection accuracy and cost efficiency as compared to a single host-based intrusion detection system (IDS). Through cooperation, it is possible for a local IDS to detect new attacks that may be known to other experienced acquaintances. In this paper, we present a sequential hypothesis testing method for feedback aggregation…

    Submitted 16 February, 2010; originally announced February 2010.

  33. Phase-Remapping Attack in Practical Quantum Key Distribution Systems

    Authors: Chi-Hang Fred Fung, Bing Qi, Kiyoshi Tamaki, Hoi-Kwong Lo

    Abstract: Quantum key distribution (QKD) can be used to generate secret keys between two distant parties. Even though QKD has been proven unconditionally secure against eavesdroppers with unlimited computation power, practical implementations of QKD may contain loopholes that may lead to the generated secret keys being compromised. In this paper, we propose a phase-remapping attack targeting two practical…

    Submitted 5 March, 2007; v1 submitted 17 January, 2006; originally announced January 2006.

    Comments: 13 pages, 8 figures

    Journal ref: Phys. Rev. A 75, 032314 (2007)