Showing 1–50 of 82 results for author: Ooi, C

  1. arXiv:2408.03013  [pdf, other]

    cs.DB cs.AI cs.LG

    NeurDB: On the Design and Implementation of an AI-powered Autonomous Database

    Authors: Zhanhao Zhao, Shaofeng Cai, Haotian Gao, Hexiang Pan, Siqi Xiang, Naili Xing, Gang Chen, Beng Chin Ooi, Yanyan Shen, Yuncheng Wu, Meihui Zhang

    Abstract: Databases are increasingly embracing AI to provide autonomous system optimization and intelligent in-database analytics, aiming to relieve end-user burdens across various industry sectors. Nonetheless, most existing approaches fail to account for the dynamic nature of databases, which renders them ineffective for real-world applications characterized by evolving data and workloads. This paper intr…

    Submitted 6 August, 2024; originally announced August 2024.

  2. arXiv:2408.00513  [pdf, other]

    cs.LG

    VecAug: Unveiling Camouflaged Frauds with Cohort Augmentation for Enhanced Detection

    Authors: Fei Xiao, Shaofeng Cai, Gang Chen, H. V. Jagadish, Beng Chin Ooi, Meihui Zhang

    Abstract: Fraud detection presents a challenging task characterized by ever-evolving fraud patterns and scarce labeled data. Existing methods predominantly rely on graph-based or sequence-based approaches. While graph-based approaches connect users through shared entities to capture structural information, they remain vulnerable to fraudsters who can disrupt or manipulate these connections. In contrast, seq…

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: Accepted by KDD 2024

  3. arXiv:2407.05034  [pdf]

    cs.CR

    GCON: Differentially Private Graph Convolutional Network via Objective Perturbation

    Authors: Jianxin Wei, Yizheng Zhu, Xiaokui Xiao, Ergute Bao, Yin Yang, Kuntai Cai, Beng Chin Ooi

    Abstract: Graph Convolutional Networks (GCNs) are a popular machine learning model with a wide range of applications in graph analytics, including healthcare, transportation, and finance. Similar to other neural networks, a GCN may memorize parts of the training data through its model weights. Thus, when the underlying graph data contains sensitive information such as interpersonal relationships, a GCN trai…

    Submitted 6 July, 2024; originally announced July 2024.

  4. arXiv:2406.14015  [pdf, other]

    cs.LG

    CohortNet: Empowering Cohort Discovery for Interpretable Healthcare Analytics

    Authors: Qingpeng Cai, Kaiping Zheng, H. V. Jagadish, Beng Chin Ooi, James Yip

    Abstract: Cohort studies are of significant importance in the field of healthcare analysis. However, existing methods typically involve manual, labor-intensive, and expert-driven pattern definitions or rely on simplistic clustering techniques that lack medical relevance. Automating cohort studies with interpretable patterns has great potential to facilitate healthcare analysis but remains an unmet need in p…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 10 pages, 12 figures

  5. arXiv:2405.03924  [pdf, other]

    cs.DB cs.AI cs.LG

    NeurDB: An AI-powered Autonomous Data System

    Authors: Beng Chin Ooi, Shaofeng Cai, Gang Chen, Yanyan Shen, Kian-Lee Tan, Yuncheng Wu, Xiaokui Xiao, Naili Xing, Cong Yue, Lingze Zeng, Meihui Zhang, Zhanhao Zhao

    Abstract: In the wake of rapid advancements in artificial intelligence (AI), we stand on the brink of a transformative leap in data systems. The imminent fusion of AI and DB (AIxDB) promises a new generation of data systems, which will relieve the burden on end-users across all industry sectors by featuring AI-enhanced functionalities, such as personalized and automated in-database AI-powered analytics, sel…

    Submitted 4 July, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

  6. arXiv:2405.00568  [pdf, other]

    cs.DB cs.AI

    Powering In-Database Dynamic Model Slicing for Structured Data Analytics

    Authors: Lingze Zeng, Naili Xing, Shaofeng Cai, Gang Chen, Beng Chin Ooi, Jian Pei, Yuncheng Wu

    Abstract: Relational database management systems (RDBMS) are widely used for the storage and retrieval of structured data. To derive insights beyond statistical aggregation, we typically have to extract specific subdatasets from the database using conventional database operations, and then apply deep neural networks (DNN) training and inference on these respective subdatasets in a separate machine learning…

    Submitted 1 May, 2024; originally announced May 2024.

  7. arXiv:2403.10318  [pdf, other]

    cs.LG

    Anytime Neural Architecture Search on Tabular Data

    Authors: Naili Xing, Shaofeng Cai, Zhaojing Luo, Beng Chin Ooi, Jian Pei

    Abstract: The increasing demand for tabular data analysis calls for transitioning from manual architecture design to Neural Architecture Search (NAS). This transition demands an efficient and responsive anytime NAS approach that is capable of returning current optimal architectures within any given time budget while progressively enhancing architecture quality with increased budget allocation. However, the…

    Submitted 6 May, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

  8. arXiv:2402.18607  [pdf, other]

    cs.LG cs.AI cs.CR

    Exploring Privacy and Fairness Risks in Sharing Diffusion Models: An Adversarial Perspective

    Authors: Xinjian Luo, Yangfan Jiang, Fei Wei, Yuncheng Wu, Xiaokui Xiao, Beng Chin Ooi

    Abstract: Diffusion models have recently gained significant attention in both academia and industry due to their impressive generative performance in terms of both sampling quality and distribution coverage. Accordingly, proposals are made for sharing pre-trained diffusion models across different organizations, as a way of improving data utilization while enhancing privacy protection by avoiding sharing pri…

    Submitted 3 March, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

  9. METER: A Dynamic Concept Adaptation Framework for Online Anomaly Detection

    Authors: Jiaqi Zhu, Shaofeng Cai, Fang Deng, Beng Chin Ooi, Wenqiao Zhang

    Abstract: Real-time analytics and decision-making require online anomaly detection (OAD) to handle drifts in data streams efficiently and effectively. Unfortunately, existing approaches are often constrained by their limited detection capacity and slow adaptation to evolving data streams, inhibiting their efficacy and efficiency in handling concept drift, which is a major challenge in evolving data streams.…

    Submitted 28 December, 2023; originally announced December 2023.

  10. arXiv:2312.03243  [pdf, other]

    cs.NE cs.CE cs.LG

    Generalizable Neural Physics Solvers by Baldwinian Evolution

    Authors: Jian Cheng Wong, Chin Chun Ooi, Abhishek Gupta, Pao-Hsiung Chiu, Joshua Shao Zheng Low, My Ha Dao, Yew-Soon Ong

    Abstract: Physics-informed neural networks (PINNs) are at the forefront of scientific machine learning, making possible the creation of machine intelligence that is cognizant of physical laws and able to accurately simulate them. In this paper, the potential of discovering PINNs that generalize over an entire family of physics tasks is studied, for the first time, through a biological lens of the Baldwin ef…

    Submitted 5 December, 2023; originally announced December 2023.

  11. arXiv:2311.15310  [pdf, other]

    cs.CR cs.DB cs.DC cs.LG

    Secure and Verifiable Data Collaboration with Low-Cost Zero-Knowledge Proofs

    Authors: Yizheng Zhu, Yuncheng Wu, Zhaojing Luo, Beng Chin Ooi, Xiaokui Xiao

    Abstract: Organizations are increasingly recognizing the value of data collaboration for data analytics purposes. Yet, stringent data protection laws prohibit the direct exchange of raw data. To facilitate data collaboration, federated learning (FL) emerges as a viable solution, which enables multiple clients to collaboratively train a machine learning (ML) model under the supervision of a central server wh…

    Submitted 26 November, 2023; originally announced November 2023.

  12. arXiv:2310.10483  [pdf, other]

    cs.CR cs.LG

    Passive Inference Attacks on Split Learning via Adversarial Regularization

    Authors: Xiaochen Zhu, Xinjian Luo, Yuncheng Wu, Yangfan Jiang, Xiaokui Xiao, Beng Chin Ooi

    Abstract: Split Learning (SL) has emerged as a practical and efficient alternative to traditional federated learning. While previous attempts to attack SL have often relied on overly strong assumptions or targeted easily exploitable models, we seek to develop more practical attacks. We introduce SDAR, a novel attack framework against SL with an honest-but-curious server. SDAR leverages auxiliary data and ad…

    Submitted 28 January, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

    Comments: 19 pages, 20 figures

  13. arXiv:2304.10539  [pdf, other]

    cs.LG cs.CV

    Learning in Imperfect Environment: Multi-Label Classification with Long-Tailed Distribution and Partial Labels

    Authors: Wenqiao Zhang, Changshuo Liu, Lingze Zeng, Beng Chin Ooi, Siliang Tang, Yueting Zhuang

    Abstract: Conventional multi-label classification (MLC) methods assume that all samples are fully labeled and identically distributed. Unfortunately, this assumption is unrealistic in large-scale MLC data that has long-tailed (LT) distribution and partial labels (PL). To address the problem, we introduce a novel task, Partial labeling and Long-Tailed Multi-Label Classification (PLT-MLC), to jointly consider…

    Submitted 20 April, 2023; originally announced April 2023.

  14. arXiv:2304.04468  [pdf, other]

    cs.LG cs.AI

    Toward Cohort Intelligence: A Universal Cohort Representation Learning Framework for Electronic Health Record Analysis

    Authors: Changshuo Liu, Wenqiao Zhang, Beng Chin Ooi, James Wei Luen Yip, Lingze Zeng, Kaiping Zheng

    Abstract: Electronic Health Records (EHR) are generated from clinical routine care recording valuable information of broad patient populations, which provide plentiful opportunities for improving patient management and intervention strategies in clinical practice. To exploit the enormous potential of EHR data, a popular EHR data analysis paradigm in machine learning is EHR representation learning, which fir…

    Submitted 12 April, 2023; v1 submitted 10 April, 2023; originally announced April 2023.

    Comments: 10 pages

  15. arXiv:2303.17526  [pdf, other]

    cs.CV

    CAusal and collaborative proxy-tasKs lEarning for Semi-Supervised Domain Adaptation

    Authors: Wenqiao Zhang, Changshuo Liu, Can Cui, Beng Chin Ooi

    Abstract: Semi-supervised domain adaptation (SSDA) adapts a learner to a new domain by effectively utilizing source domain data and a few labeled target samples. It is a practical yet under-investigated research topic. In this paper, we analyze the SSDA problem from two perspectives that have previously been overlooked, and correspondingly decompose it into two key subproblems: robust domain ad…

    Submitted 30 March, 2023; originally announced March 2023.

  16. arXiv:2302.04500  [pdf, other]

    cs.DC cs.AI cs.DB

    FLAC: A Robust Failure-Aware Atomic Commit Protocol for Distributed Transactions

    Authors: Hexiang Pan, Quang-Trung Ta, Meihui Zhang, Yeow Meng Chee, Gang Chen, Beng Chin Ooi

    Abstract: In distributed transaction processing, atomic commit protocol (ACP) is used to ensure database consistency. With the use of commodity compute nodes and networks, failures such as system crashes and network partitioning are common. It is therefore important for ACP to dynamically adapt to the operating condition for efficiency while ensuring the consistency of the database. Existing ACPs often assu…

    Submitted 2 March, 2023; v1 submitted 9 February, 2023; originally announced February 2023.

    MSC Class: H.2.4

  17. arXiv:2302.01518  [pdf, other]

    cs.LG cs.CE physics.flu-dyn

    LSA-PINN: Linear Boundary Connectivity Loss for Solving PDEs on Complex Geometry

    Authors: Jian Cheng Wong, Pao-Hsiung Chiu, Chinchun Ooi, My Ha Dao, Yew-Soon Ong

    Abstract: We present a novel loss formulation for efficient learning of complex dynamics from governing physics, typically described by partial differential equations (PDEs), using physics-informed neural networks (PINNs). In our experiments, existing versions of PINNs are seen to learn poorly in many problems, especially for complex geometries, as it becomes increasingly difficult to establish appropriate…

    Submitted 2 March, 2023; v1 submitted 2 February, 2023; originally announced February 2023.

    Comments: 11 pages, 7 figures

    Journal ref: 2023 International Joint Conference on Neural Networks (IJCNN)

  18. Graph Neural Network Based Surrogate Model of Physics Simulations for Geometry Design

    Authors: Jian Cheng Wong, Chin Chun Ooi, Joyjit Chattoraj, Lucas Lestandi, Guoying Dong, Umesh Kizhakkinan, David William Rosen, Mark Hyunpong Jhon, My Ha Dao

    Abstract: Computational Intelligence (CI) techniques have shown great potential as a surrogate model of expensive physics simulation, with demonstrated ability to make fast predictions, albeit at the expense of accuracy in some cases. For many scientific and engineering problems involving geometrical design, it is desirable for the surrogate models to precisely describe the change in geometry and predict th…

    Submitted 1 February, 2023; originally announced February 2023.

    Comments: 7 pages, 5 figures, 2022 IEEE Symposium Series on Computational Intelligence

  19. arXiv:2301.03829  [pdf, other]

    cs.LG cs.AI cs.CV cs.DB cs.MM

    From Plate to Prevention: A Dietary Nutrient-aided Platform for Health Promotion in Singapore

    Authors: Kaiping Zheng, Thao Nguyen, Jesslyn Hwei Sing Chong, Charlene Enhui Goh, Melanie Herschel, Hee Hoon Lee, Changshuo Liu, Beng Chin Ooi, Wei Wang, James Yip

    Abstract: Singapore has been striving to improve the provision of healthcare services to her people. In this course, the government has taken note of the deficiency in regulating and supervising people's nutrient intake, which is identified as a contributing factor to the development of chronic diseases. Consequently, this issue has garnered significant attention. In this paper, we share our experience in a…

    Submitted 28 March, 2023; v1 submitted 10 January, 2023; originally announced January 2023.

  20. arXiv:2212.07624  [pdf, other]

    cs.NE cs.AI cs.LG physics.comp-ph

    Neuroevolution of Physics-Informed Neural Nets: Benchmark Problems and Comparative Results

    Authors: Nicholas Sung Wei Yong, Jian Cheng Wong, Pao-Hsiung Chiu, Abhishek Gupta, Chinchun Ooi, Yew-Soon Ong

    Abstract: The potential of learned models for fundamental scientific research and discovery is drawing increasing attention worldwide. Physics-informed neural networks (PINNs), where the loss function directly embeds governing equations of scientific phenomena, is one of the key techniques at the forefront of recent advances. PINNs are typically trained using stochastic gradient descent methods, akin to the…

    Submitted 6 December, 2023; v1 submitted 15 December, 2022; originally announced December 2022.

    Comments: 11 pages, 6 figures, 4 tables

    Journal ref: Proceedings of the Companion Conference on Genetic and Evolutionary Computation July 2023

  21. arXiv:2212.04371  [pdf]

    cs.LG cs.CR

    Skellam Mixture Mechanism: a Novel Approach to Federated Learning with Differential Privacy

    Authors: Ergute Bao, Yizheng Zhu, Xiaokui Xiao, Yin Yang, Beng Chin Ooi, Benjamin Hong Meng Tan, Khin Mi Mi Aung

    Abstract: Deep neural networks have strong capabilities of memorizing the underlying training data, which can be a serious privacy concern. An effective solution to this problem is to train models with differential privacy, which provides rigorous privacy guarantees by injecting random noise to the gradients. This paper focuses on the scenario where sensitive data are distributed among multiple participants…

    Submitted 2 July, 2024; v1 submitted 8 December, 2022; originally announced December 2022.

  22. arXiv:2211.13464  [pdf, other]

    cs.LG physics.bio-ph physics.chem-ph physics.comp-ph physics.flu-dyn

    Design of Turing Systems with Physics-Informed Neural Networks

    Authors: Jordon Kho, Winston Koh, Jian Cheng Wong, Pao-Hsiung Chiu, Chin Chun Ooi

    Abstract: Reaction-diffusion (Turing) systems are fundamental to the formation of spatial patterns in nature and engineering. These systems are governed by a set of non-linear partial differential equations containing parameters that determine the rate of constituent diffusion and reaction. Critically, these parameters, such as diffusion coefficient, heavily influence the mode and type of the final pattern,…

    Submitted 24 November, 2022; originally announced November 2022.

  23. arXiv:2211.13455  [pdf, other]

    cs.CY cs.CV physics.flu-dyn

    Automated Quantification of Traffic Particulate Emissions via an Image Analysis Pipeline

    Authors: Kong Yuan Ho, Chin Seng Lim, Matthena A. Kattar, Bharathi Boppana, Liya Yu, Chin Chun Ooi

    Abstract: Traffic emissions are known to contribute significantly to air pollution around the world, especially in heavily urbanized cities such as Singapore. It has been previously shown that the particulate pollution along major roadways exhibits strong correlation with increased traffic during peak hours, and that reductions in traffic emissions can lead to better health outcomes. However, in many instanc…

    Submitted 24 November, 2022; originally announced November 2022.

  24. arXiv:2211.12042  [pdf, other]

    cs.LG physics.comp-ph

    Robustness of Physics-Informed Neural Networks to Noise in Sensor Data

    Authors: Jian Cheng Wong, Pao-Hsiung Chiu, Chin Chun Ooi, My Ha Dao

    Abstract: Physics-Informed Neural Networks (PINNs) have been shown to be an effective way of incorporating physics-based domain knowledge into neural network models for many important real-world systems. They have been particularly effective as a means of inferring system information based on data, even in cases where data is scarce. Most of the current work however assumes the availability of high-quality…

    Submitted 22 November, 2022; originally announced November 2022.

  25. arXiv:2211.12035  [pdf, other]

    cs.LG cs.CY physics.flu-dyn

    FastFlow: AI for Fast Urban Wind Velocity Prediction

    Authors: Shi Jer Low, Venugopalan S. G. Raghavan, Harish Gopalan, Jian Cheng Wong, Justin Yeoh, Chin Chun Ooi

    Abstract: Data-driven approaches, including deep learning, have shown great promise as surrogate models across many domains. These extend to various areas in sustainability. An interesting direction for which data-driven methods have not been applied much yet is in the quick quantitative evaluation of urban layouts for planning and design. In particular, urban designs typically involve complex trade-offs be…

    Submitted 22 November, 2022; originally announced November 2022.

  26. arXiv:2209.05227  [pdf, other]

    cs.DC cs.AI cs.CV cs.IR

    DUET: A Tuning-Free Device-Cloud Collaborative Parameters Generation Framework for Efficient Device Model Generalization

    Authors: Zheqi Lv, Wenqiao Zhang, Shengyu Zhang, Kun Kuang, Feng Wang, Yongwei Wang, Zhengyu Chen, Tao Shen, Hongxia Yang, Beng Chin Ooi, Fei Wu

    Abstract: Device Model Generalization (DMG) is a practical yet under-investigated research topic for on-device machine learning applications. It aims to improve the generalization ability of pre-trained models when deployed on resource-constrained devices, such as improving the performance of pre-trained cloud models on smart mobiles. While quite a lot of works have investigated the data distribution shift…

    Submitted 16 February, 2023; v1 submitted 12 September, 2022; originally announced September 2022.

  27. arXiv:2207.00944  [pdf, other]

    cs.DB

    GlassDB: An Efficient Verifiable Ledger Database System Through Transparency

    Authors: Cong Yue, Tien Tuan Anh Dinh, Zhongle Xie, Meihui Zhang, Gang Chen, Beng Chin Ooi, Xiaokui Xiao

    Abstract: Verifiable ledger databases protect data history against malicious tampering. Existing systems, such as blockchains and certificate transparency, are based on transparency logs -- a simple abstraction allowing users to verify that a log maintained by an untrusted server is append-only. They expose a simple key-value interface. Building a practical database from transparency logs, on the other hand…

    Submitted 19 February, 2023; v1 submitted 2 July, 2022; originally announced July 2022.

  28. arXiv:2206.10326  [pdf, other]

    cs.HC cs.AI cs.CV cs.DB cs.DC

    The Metaverse Data Deluge: What Can We Do About It?

    Authors: Beng Chin Ooi, Gang Chen, Mike Zheng Shou, Kian-Lee Tan, Anthony Tung, Xiaokui Xiao, James Wei Luen Yip, Meihui Zhang

    Abstract: In the Metaverse, the physical space and the virtual space co-exist, and interact simultaneously. While the physical space is virtually enhanced with information, the virtual space is continuously refreshed with real-time, real-world information. To allow users to process and manipulate information seamlessly between the real and digital spaces, novel technologies must be developed. These include…

    Submitted 10 November, 2022; v1 submitted 14 June, 2022; originally announced June 2022.

  29. arXiv:2205.06941  [pdf, ps, other]

    cs.DC cs.DB cs.PF

    Blockchain Goes Green? Part II: Characterizing the Performance and Cost of Blockchains on the Cloud and at the Edge

    Authors: Dumitrel Loghin, Tien Tuan Anh Dinh, Aung Maw, Chen Gang, Yong Meng Teo, Beng Chin Ooi

    Abstract: While state-of-the-art permissioned blockchains can achieve thousands of transactions per second on commodity hardware with x86/64 architecture, their performance when running on different architectures is not clear. The goal of this work is to characterize the performance and cost of permissioned blockchains on different hardware systems, which is important as diverse application domains are adop…

    Submitted 13 May, 2022; originally announced May 2022.

    Comments: 13 pages, 10 figures, 3 tables

  30. arXiv:2203.02533  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG

    BoostMIS: Boosting Medical Image Semi-supervised Learning with Adaptive Pseudo Labeling and Informative Active Annotation

    Authors: Wenqiao Zhang, Lei Zhu, James Hallinan, Andrew Makmur, Shengyu Zhang, Qingpeng Cai, Beng Chin Ooi

    Abstract: In this paper, we propose a novel semi-supervised learning (SSL) framework named BoostMIS that combines adaptive pseudo labeling and informative active annotation to unleash the potential of medical image SSL models: (1) BoostMIS can adaptively leverage the cluster assumption and consistency regularization of the unlabeled data according to the current learning status. This strategy can adaptively…

    Submitted 21 March, 2022; v1 submitted 4 March, 2022; originally announced March 2022.

    Comments: 11 pages

    Journal ref: CVPR 2022

  31. arXiv:2112.04963  [pdf, other]

    cs.LG physics.ao-ph

    Model-Agnostic Hybrid Numerical Weather Prediction and Machine Learning Paradigm for Solar Forecasting in the Tropics

    Authors: Nigel Yuan Yun Ng, Harish Gopalan, Venugopalan S. G. Raghavan, Chin Chun Ooi

    Abstract: Numerical weather prediction (NWP) and machine learning (ML) methods are popular for solar forecasting. However, NWP models have multiple possible physical parameterizations, which requires site-specific NWP optimization. This is further complicated when regional NWP models are used with global climate models with different possible parameterizations. In this study, an alternative approach is prop…

    Submitted 9 December, 2021; originally announced December 2021.

  32. arXiv:2110.15832  [pdf]

    cs.LG cs.CE math.NA physics.comp-ph physics.flu-dyn

    CAN-PINN: A Fast Physics-Informed Neural Network Based on Coupled-Automatic-Numerical Differentiation Method

    Authors: Pao-Hsiung Chiu, Jian Cheng Wong, Chinchun Ooi, My Ha Dao, Yew-Soon Ong

    Abstract: In this study, novel physics-informed neural network (PINN) methods for coupling neighboring support points and their derivative terms which are obtained by automatic differentiation (AD), are proposed to allow efficient training with improved accuracy. The computation of differential operators required for PINNs loss evaluation at collocation points are conventionally obtained via AD. Although AD…

    Submitted 27 March, 2022; v1 submitted 29 October, 2021; originally announced October 2021.

    Comments: 25 pages, 20 figures

    Journal ref: Computer Methods in Applied Mechanics and Engineering, Volume 395, 15 May 2022, 114909

  33. arXiv:2109.09338  [pdf]

    cs.LG cs.AI cs.CE physics.comp-ph

    Learning in Sinusoidal Spaces with Physics-Informed Neural Networks

    Authors: Jian Cheng Wong, Chinchun Ooi, Abhishek Gupta, Yew-Soon Ong

    Abstract: A physics-informed neural network (PINN) uses physics-augmented loss functions, e.g., incorporating the residual term from governing partial differential equations (PDEs), to ensure its output is consistent with fundamental physics laws. However, it turns out to be difficult to train an accurate PINN model for many problems in practice. In this paper, we present a novel perspective of the merits o…

    Submitted 14 March, 2022; v1 submitted 20 September, 2021; originally announced September 2021.

    Comments: 16 pages, 13 figures

    Journal ref: IEEE Transactions on Artificial Intelligence, 2022

  34. arXiv:2109.00817  [pdf, other]

    cs.LG cs.AI

    NASI: Label- and Data-agnostic Neural Architecture Search at Initialization

    Authors: Yao Shu, Shaofeng Cai, Zhongxiang Dai, Beng Chin Ooi, Bryan Kian Hsiang Low

    Abstract: Recent years have witnessed a surging interest in Neural Architecture Search (NAS). Various algorithms have been proposed to improve the search efficiency and effectiveness of NAS, i.e., to reduce the search cost and improve the generalization performance of the selected architectures, respectively. However, the search efficiency of these algorithms is severely limited by the need for model traini…

    Submitted 25 April, 2022; v1 submitted 2 September, 2021; originally announced September 2021.

    Comments: Published as a conference paper at ICLR 2022

  35. SINGA-Easy: An Easy-to-Use Framework for MultiModal Analysis

    Authors: Naili Xing, Sai Ho Yeung, Chenghao Cai, Teck Khim Ng, Wei Wang, Kaiyuan Yang, Nan Yang, Meihui Zhang, Gang Chen, Beng Chin Ooi

    Abstract: Deep learning has achieved great success in a wide spectrum of multimedia applications such as image classification, natural language processing and multimodal data analysis. Recent years have seen the development of many deep learning frameworks that provide a high-level programming interface for users to design models, conduct training and deploy inference. However, it remains challenging to bui…

    Submitted 3 August, 2021; originally announced August 2021.

    Comments: 10 pages, 10 figures

  36. ARM-Net: Adaptive Relation Modeling Network for Structured Data

    Authors: Shaofeng Cai, Kaiping Zheng, Gang Chen, H. V. Jagadish, Beng Chin Ooi, Meihui Zhang

    Abstract: Relational databases are the de facto standard for storing and querying structured data, and extracting insights from structured data requires advanced analytics. Deep neural networks (DNNs) have achieved super-human prediction performance in particular data types, e.g., images. However, existing DNNs may not produce meaningful results when applied to structured data. The reason is that there are…

    Submitted 5 July, 2021; originally announced July 2021.

    Comments: 14 pages, 11 figures, 5 tables, published as a conference paper in ACM SIGMOD 2020

  37. A Fusion-Denoising Attack on InstaHide with Data Augmentation

    Authors: Xinjian Luo, Xiaokui Xiao, Yuncheng Wu, Juncheng Liu, Beng Chin Ooi

    Abstract: InstaHide is a state-of-the-art mechanism for protecting private training images, by mixing multiple private images and modifying them such that their visual features are indistinguishable to the naked eye. In recent work, however, Carlini et al. show that it is possible to reconstruct private images from the encrypted dataset generated by InstaHide. Nevertheless, we demonstrate that Carlini et al…

    Submitted 5 December, 2021; v1 submitted 17 May, 2021; originally announced May 2021.

    Comments: 15 pages

  38. arXiv:2105.05173  [pdf]

    physics.flu-dyn cs.LG physics.comp-ph

    U-Net-Based Surrogate Model For Evaluation of Microfluidic Channels

    Authors: Quang Tuyen Le, Pao-Hsiung Chiu, Chin Chun Ooi

    Abstract: Microfluidics have shown great promise in multiple applications, especially in biomedical diagnostics and separations. While the flow properties of these microfluidic devices can be solved by numerical methods such as computational fluid dynamics (CFD), the process of mesh generation and setting up a numerical solver requires some domain familiarity, while more intuitive commercial programs such a…

    Submitted 11 May, 2021; originally announced May 2021.

    Comments: 10 pages, 7 figures

  39. arXiv:2105.03854  [pdf]

    physics.flu-dyn cs.LG physics.comp-ph

    Surrogate Modeling of Fluid Dynamics with a Multigrid Inspired Neural Network Architecture

    Authors: Quang Tuyen Le, Chin Chun Ooi

    Abstract: Algebraic or geometric multigrid methods are commonly used in numerical solvers as they are a multi-resolution method able to handle problems with multiple scales. In this work, we propose a modification to the commonly-used U-Net neural network architecture that is inspired by the principles of multigrid methods, referred to here as U-Net-MG. We then demonstrate that this proposed U-Net-MG archit…

    Submitted 9 May, 2021; originally announced May 2021.

    Comments: 22 pages, 15 figures

  40. arXiv:2105.01838  [pdf]

    cs.LG physics.comp-ph physics.flu-dyn

    Improved Surrogate Modeling of Fluid Dynamics with Physics-Informed Neural Networks

    Authors: Jian Cheng Wong, Chinchun Ooi, Pao-Hsiung Chiu, My Ha Dao

    Abstract: Physics-Informed Neural Networks (PINNs) have recently shown great promise as a way of incorporating physics-based domain knowledge, including fundamental governing equations, into neural network models for many complex engineering systems. They have been particularly effective in the area of inverse problems, where boundary conditions may be ill-defined, and data-absent scenarios, where typical s…

    Submitted 4 May, 2021; originally announced May 2021.

  41. AlphaEvolve: A Learning Framework to Discover Novel Alphas in Quantitative Investment

    Authors: Can Cui, Wei Wang, Meihui Zhang, Gang Chen, Zhaojing Luo, Beng Chin Ooi

    Abstract: Alphas are stock prediction models capturing trading signals in a stock market. A set of effective alphas can generate weakly correlated high returns to diversify the risk. Existing alphas can be categorized into two classes: Formulaic alphas are simple algebraic expressions of scalar features, and thus can generalize well and be mined into a weakly correlated set. Machine learning alphas are data…

    Submitted 1 April, 2021; v1 submitted 30 March, 2021; originally announced March 2021.

    Comments: Accepted by SIGMOD 2021 Data Science and Engineering Track

    ACM Class: H.2.8

  42. arXiv:2103.02958  [pdf, other]

    cs.DC cs.AI cs.DB cs.LG

    Serverless Data Science -- Are We There Yet? A Case Study of Model Serving

    Authors: Yuncheng Wu, Tien Tuan Anh Dinh, Guoyu Hu, Meihui Zhang, Yeow Meng Chee, Beng Chin Ooi

    Abstract: Machine learning (ML) is an important part of modern data science applications. Data scientists today have to manage the end-to-end ML life cycle that includes both model training and model serving, the latter of which is essential, as it makes their works available to end-users. Systems of model serving require high performance, low cost, and ease of management. Cloud providers are already offeri…

    Submitted 1 March, 2022; v1 submitted 4 March, 2021; originally announced March 2021.

    Comments: Accepted by ACM SIGMOD 2022, 10 pages

  43. arXiv:2102.00196  [pdf, ps, other]

    eess.AS cs.LG cs.SD eess.SP

    Directional Sparse Filtering using Weighted Lehmer Mean for Blind Separation of Unbalanced Speech Mixtures

    Authors: Karn Watcharasupat, Anh H. T. Nguyen, Ching-Hui Ooi, Andy W. H. Khong

    Abstract: In blind source separation of speech signals, the inherent imbalance in the source spectrum poses a challenge for methods that rely on single-source dominance for the estimation of the mixing matrix. We propose an algorithm based on the directional sparse filtering (DSF) framework that utilizes the Lehmer mean with learnable weights to adaptively account for source imbalance. Performance evaluatio…

    Submitted 14 May, 2021; v1 submitted 30 January, 2021; originally announced February 2021.

    Comments: (c) 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

    Journal ref: Proceedings of the 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021, pp. 4485-4489

  44. arXiv:2010.10246  [pdf, other]

    cs.SE cs.DB cs.DC cs.LG

    MLCask: Efficient Management of Component Evolution in Collaborative Data Analytics Pipelines

    Authors: Zhaojing Luo, Sai Ho Yeung, Meihui Zhang, Kaiping Zheng, Lei Zhu, Gang Chen, Feiyi Fan, Qian Lin, Kee Yuan Ngiam, Beng Chin Ooi

    Abstract: With the ever-increasing adoption of machine learning for data analytics, maintaining a machine learning pipeline is becoming more complex as both the datasets and trained models evolve with time. In a collaborative environment, the changes and updates due to pipeline evolution often cause cumbersome coordination and maintenance work, raising the costs and making it hard to use. Existing solutions…

    Submitted 16 March, 2021; v1 submitted 17 October, 2020; originally announced October 2020.

    Comments: 13 pages; added new baselines, i.e., MLflow and ModelDB, in Section VII-C; added experience on the system deployment in Section VIII; added Table I to clarify the correctness of the prioritized pipeline search in Section VII-E

  45. Feature Inference Attack on Model Predictions in Vertical Federated Learning

    Authors: Xinjian Luo, Yuncheng Wu, Xiaokui Xiao, Beng Chin Ooi

    Abstract: Federated learning (FL) is an emerging paradigm for facilitating multiple organizations' data collaboration without revealing their private data to each other. Recently, vertical FL, where the participating organizations hold the same set of samples but with disjoint features and only one organization owns the labels, has received increased attention. This paper presents several feature inference…

    Submitted 22 April, 2021; v1 submitted 20 October, 2020; originally announced October 2020.

    Comments: Accepted at the IEEE 37th International Conference on Data Engineering (ICDE 2021); 15 pages

  46. arXiv:2009.05766  [pdf, other]

    cs.DC

    Communication-efficient Decentralized Machine Learning over Heterogeneous Networks

    Authors: Pan Zhou, Qian Lin, Dumitrel Loghin, Beng Chin Ooi, Yuncheng Wu, Hongfang Yu

    Abstract: In the last few years, distributed machine learning has been usually executed over heterogeneous networks such as a local area network within a multi-tenant cluster or a wide area network connecting data centers and edge clusters. In these heterogeneous networks, the link speeds among worker nodes vary significantly, making it challenging for state-of-the-art machine learning approaches to perform…

    Submitted 20 October, 2020; v1 submitted 12 September, 2020; originally announced September 2020.

    Comments: 17 pages, 19 figures, accepted by conference ICDE'2021

  47. Privacy Preserving Vertical Federated Learning for Tree-based Models

    Authors: Yuncheng Wu, Shaofeng Cai, Xiaokui Xiao, Gang Chen, Beng Chin Ooi

    Abstract: Federated learning (FL) is an emerging paradigm that enables multiple organizations to jointly train a model without revealing their private data to each other. This paper studies vertical federated learning, which tackles the scenarios where (i) collaborating organizations own data of the same set of users but with disjoint features, and (ii) only one organization holds the labels. We propo…

    Submitted 13 August, 2020; originally announced August 2020.

    Comments: Proc. VLDB Endow. 13(11): 2090-2103 (2020)

  48. arXiv:2004.07585  [pdf, other]

    cs.DB

    ForkBase: Immutable, Tamper-evident Storage Substrate for Branchable Applications

    Authors: Qian Lin, Kaiyuan Yang, Tien Tuan Anh Dinh, Qingchao Cai, Gang Chen, Beng Chin Ooi, Pingcheng Ruan, Sheng Wang, Zhongle Xie, Meihui Zhang, Olafs Vandans

    Abstract: Data collaboration activities typically require systematic or protocol-based coordination to be scalable. Git, an effective enabler for collaborative coding, has been attested for its success in countless projects around the world. Hence, applying the Git philosophy to general data collaboration beyond coding is motivating. We call it Git for data. However, the original Git design handles data at…

    Submitted 16 April, 2020; originally announced April 2020.

    Comments: In Proceedings of the IEEE International Conference on Data Engineering (ICDE), 2020 (Demo)

  49. arXiv:2003.12012  [pdf, other]

    eess.SP cs.AI cs.LG stat.AP stat.ML

    TRACER: A Framework for Facilitating Accurate and Interpretable Analytics for High Stakes Applications

    Authors: Kaiping Zheng, Shaofeng Cai, Horng Ruey Chua, Wei Wang, Kee Yuan Ngiam, Beng Chin Ooi

    Abstract: In high stakes applications such as healthcare and finance analytics, the interpretability of predictive models is required and necessary for domain practitioners to trust the predictions. Traditional machine learning models, e.g., logistic regression (LR), are easy to interpret in nature. However, many of these models aggregate time-series data without considering the temporal correlations and va…

    Submitted 24 March, 2020; originally announced March 2020.

    Comments: A version of this preprint will appear in ACM SIGMOD 2020

  50. arXiv:2003.10064  [pdf, other]

    cs.DC cs.DB cs.PF

    A Transactional Perspective on Execute-order-validate Blockchains

    Authors: Pingcheng Ruan, Dumitrel Loghin, Quang-Trung Ta, Meihui Zhang, Gang Chen, Beng Chin Ooi

    Abstract: Smart contracts have enabled blockchain systems to evolve from simple cryptocurrency platforms, such as Bitcoin, to general transactional systems, such as Ethereum. Catering for emerging business requirements, a new architecture called execute-order-validate has been proposed in Hyperledger Fabric to support parallel transactions and improve the blockchain's throughput. However, this new architect…

    Submitted 22 March, 2020; originally announced March 2020.