Showing 1–50 of 181 results for author: Jianzhong

Searching in archive cs.
  1. arXiv:2408.16224  [pdf, other]

    cs.CV cs.AI

    LLaVA-SG: Leveraging Scene Graphs as Visual Semantic Expression in Vision-Language Models

    Authors: Jingyi Wang, Jianzhong Ju, Jian Luan, Zhidong Deng

    Abstract: Recent advances in large vision-language models (VLMs) typically employ vision encoders based on the Vision Transformer (ViT) architecture. The division of the images into patches by ViT results in a fragmented perception, thereby hindering the visual understanding capabilities of VLMs. In this paper, we propose an innovative enhancement to address this limitation by introducing a Scene Graph Expr…

    Submitted 29 August, 2024; v1 submitted 28 August, 2024; originally announced August 2024.

  2. arXiv:2408.14158  [pdf, other]

    cs.DC cs.AI

    Fire-Flyer AI-HPC: A Cost-Effective Software-Hardware Co-Design for Deep Learning

    Authors: Wei An, Xiao Bi, Guanting Chen, Shanhuang Chen, Chengqi Deng, Honghui Ding, Kai Dong, Qiushi Du, Wenjun Gao, Kang Guan, Jianzhong Guo, Yongqiang Guo, Zhe Fu, Ying He, Panpan Huang, Jiashi Li, Wenfeng Liang, Xiaodong Liu, Xin Liu, Yiyuan Liu, Yuxuan Liu, Shanghao Lu, Xuan Lu, Xiaotao Nie, Tian Pei, et al. (27 additional authors not shown)

    Abstract: The rapid progress in Deep Learning (DL) and Large Language Models (LLMs) has exponentially increased demands of computational power and bandwidth. This, combined with the high costs of faster computing chips and interconnects, has significantly inflated High Performance Computing (HPC) construction costs. To address these challenges, we introduce the Fire-Flyer AI-HPC architecture, a synergistic…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: This is the preprint version of the paper accepted for presentation at the 2024 International Conference for High Performance Computing, Networking, Storage, and Analysis (SC'24). © 2024 IEEE. Personal use of this material is permitted. For other uses, permission from IEEE must be obtained. Please refer to IEEE Xplore for the final published version

  3. arXiv:2408.06134  [pdf, other]

    cs.DB

    Learned Indexes with Distribution Smoothing via Virtual Points

    Authors: Kasun Amarasinghe, Farhana Choudhury, Jianzhong Qi, James Bailey

    Abstract: Recent research on learned indexes has created a new perspective for indexes as models that map keys to their respective storage locations. These learned indexes are created to approximate the cumulative distribution function of the key set, where using only a single model may have limited accuracy. To overcome this limitation, a typical method is to use multiple models, arranged in a hierarchical…

    Submitted 12 August, 2024; originally announced August 2024.
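
The idea in the abstract above, a model that maps keys to storage locations by approximating the key set's cumulative distribution function, can be illustrated with a minimal sketch. This is not the paper's method (the virtual-point smoothing it proposes is not described in the truncated abstract). It assumes a single least-squares line over distinct sorted keys, and the names `fit_linear_cdf` and `lookup` are hypothetical.

```python
from bisect import bisect_left


def fit_linear_cdf(keys):
    """Fit one least-squares line mapping key -> position in the sorted array.

    Assumes `keys` is sorted, non-empty, and has no duplicates.
    Returns (slope, intercept, max_err), where max_err is the largest
    prediction error observed over the indexed keys.
    """
    n = len(keys)
    mean_k = sum(keys) / n
    mean_p = (n - 1) / 2
    var = sum((k - mean_k) ** 2 for k in keys)
    cov = sum((k - mean_k) * (i - mean_p) for i, k in enumerate(keys))
    slope = cov / var if var else 0.0
    intercept = mean_p - slope * mean_k
    max_err = max(abs(round(slope * k + intercept) - i) for i, k in enumerate(keys))
    return slope, intercept, max_err


def lookup(keys, model, key):
    """Predict a position, then binary-search only inside the error window."""
    slope, intercept, max_err = model
    n = len(keys)
    guess = round(slope * key + intercept)
    lo = min(max(0, guess - max_err), n)
    hi = max(lo, min(n, guess + max_err + 1))
    pos = bisect_left(keys, key, lo, hi)
    return pos if pos < n and keys[pos] == key else None
```

The further the key distribution is from a straight line, the larger `max_err` becomes and the wider the final search window, which is the accuracy limitation of a single model that the abstract mentions.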

  4. arXiv:2408.04939  [pdf, other]

    cs.CR cs.SE

    Demystifying and Detecting Cryptographic Defects in Ethereum Smart Contracts

    Authors: Jiashuo Zhang, Yiming Shen, Jiachi Chen, Jianzhong Su, Yanlin Wang, Ting Chen, Jianbo Gao, Zhong Chen

    Abstract: Ethereum has officially provided a set of system-level cryptographic APIs to enhance smart contracts with cryptographic capabilities. These APIs have been utilized in over 10% of Ethereum transactions, motivating developers to implement various on-chain cryptographic tasks, such as digital signatures. However, since developers may not always be cryptographic experts, their ad-hoc and potentially d…

    Submitted 9 August, 2024; originally announced August 2024.

  5. arXiv:2407.02014  [pdf, other]

    cs.CV

    Multi-Grained Contrast for Data-Efficient Unsupervised Representation Learning

    Authors: Chengchao Shen, Jianzhong Chen, Jianxin Wang

    Abstract: The existing contrastive learning methods mainly focus on single-grained representation learning, e.g., part-level, object-level or scene-level ones, thus inevitably neglecting the transferability of representations on other granularity levels. In this paper, we aim to learn multi-grained representations, which can effectively describe the image on various granularity levels, thus improving genera…

    Submitted 2 July, 2024; originally announced July 2024.

  6. arXiv:2406.14709  [pdf, other]

    cs.CL

    Factual Dialogue Summarization via Learning from Large Language Models

    Authors: Rongxin Zhu, Jey Han Lau, Jianzhong Qi

    Abstract: Factual consistency is an important quality in dialogue summarization. Large language model (LLM)-based automatic text summarization models generate more factually consistent summaries compared to those by smaller pretrained language models, but they face deployment challenges in real-world applications due to privacy or resource constraints. In this paper, we investigate the use of symbolic knowl…

    Submitted 20 June, 2024; originally announced June 2024.

    ACM Class: F.2.2; I.2.7

  7. arXiv:2406.10054  [pdf, other]

    cs.SE cs.CR

    SmartOracle: Generating Smart Contract Oracle via Fine-Grained Invariant Detection

    Authors: Jianzhong Su, Jiachi Chen, Zhiyuan Fang, Xingwei Lin, Yutian Tang, Zibin Zheng

    Abstract: As decentralized applications (DApps) proliferate, the increased complexity and usage of smart contracts have heightened their susceptibility to security incidents and financial losses. Although various vulnerability detection tools have been developed to mitigate these issues, they often suffer poor performance in detecting vulnerabilities, as they either rely on simplistic and general-purpose or…

    Submitted 14 June, 2024; originally announced June 2024.

  8. arXiv:2406.05817  [pdf, other]

    cs.DB

    Convex-area-wise Linear Regression and Algorithms for Data Analysis

    Authors: Bohan Lyu, Jianzhong Li

    Abstract: This paper introduces a new type of regression methodology named Convex-Area-Wise Linear Regression (CALR), which separates given datasets by disjoint convex areas and fits different linear regression models for different areas. This regression model is highly interpretable, and it is able to interpolate any given datasets, even when the underlying relationship between explanatory and response v…

    Submitted 9 June, 2024; originally announced June 2024.

  9. arXiv:2406.03150  [pdf, other]

    cs.LG cs.CV

    Sample-specific Masks for Visual Reprogramming-based Prompting

    Authors: Chengyi Cai, Zesheng Ye, Lei Feng, Jianzhong Qi, Feng Liu

    Abstract: Visual reprogramming (VR) is a prompting technique that aims to re-purpose a pre-trained model (e.g., a classifier on ImageNet) to target tasks (e.g., medical data prediction) by learning a small-scale pattern added into input images instead of tuning considerable parameters within the model. The location of the pattern within input samples is usually determined by a pre-defined mask shared across…

    Submitted 5 June, 2024; originally announced June 2024.

  10. arXiv:2405.12168  [pdf, other]

    cs.IT

    WiDRa -- Enabling Millimeter-Level Differential Ranging Accuracy in Wi-Fi Using Carrier Phase

    Authors: Vishnu V. Ratnam, Bilal Sadiq, Hao Chen, Wei Sun, Shunyao Wu, Boon L. Ng, Jianzhong Zhang

    Abstract: Although Wi-Fi is an ideal technology for many ranging applications, the performance of current methods is limited by the system bandwidth, leading to low accuracy of $\sim 1$ m. For many applications, measuring differential range, viz., the change in the range between adjacent measurements, is sufficient. Correspondingly, this work proposes WiDRa - a Wi-Fi based Differential Ranging solution that…

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: Accepted to IEEE JSAC special issue on Positioning and Sensing Over Wireless Networks, 2024

  11. arXiv:2405.10570  [pdf]

    eess.IV cs.AI

    Simultaneous Deep Learning of Myocardium Segmentation and T2 Quantification for Acute Myocardial Infarction MRI

    Authors: Yirong Zhou, Chengyan Wang, Mengtian Lu, Kunyuan Guo, Zi Wang, Dan Ruan, Rui Guo, Peijun Zhao, Jianhua Wang, Naiming Wu, Jianzhong Lin, Yinyin Chen, Hang Jin, Lianxin Xie, Lilan Wu, Liuhong Zhu, Jianjun Zhou, Congbo Cai, He Wang, Xiaobo Qu

    Abstract: In cardiac Magnetic Resonance Imaging (MRI) analysis, simultaneous myocardial segmentation and T2 quantification are crucial for assessing myocardial pathologies. Existing methods often address these tasks separately, limiting their synergistic potential. To address this, we propose SQNet, a dual-task network integrating Transformer and Convolutional Neural Network (CNN) components. SQNet features…

    Submitted 29 May, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

    Comments: 10 pages, 8 figures, 6 tables

  12. arXiv:2405.09079  [pdf, other]

    eess.SP cs.IT

    Integrated Monostatic Sensing and Full-Duplex Multiuser Communication for mmWave Systems

    Authors: Murat Bayraktar, Nuria González-Prelcic, Mikko Valkama, Hao Chen, Charlie Jianzhong Zhang

    Abstract: In this paper, we propose a hybrid precoding/combining framework for communication-centric integrated sensing and full-duplex (FD) communication operating at mmWave bands. The designed precoders and combiners enable multiuser (MU) FD communication while simultaneously supporting monostatic sensing in a frequency-selective setting. The joint design of precoders and combiners involves the mitigation…

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: 13 pages, 7 figures

  13. arXiv:2405.04434  [pdf, other]

    cs.CL cs.AI

    DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

    Authors: DeepSeek-AI, Aixin Liu, Bei Feng, Bin Wang, Bingxuan Wang, Bo Liu, Chenggang Zhao, Chengqi Deng, Chong Ruan, Damai Dai, Daya Guo, Dejian Yang, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fuli Luo, Guangbo Hao, Guanting Chen, Guowei Li, H. Zhang, Hanwei Xu, Hao Yang, Haowei Zhang, Honghui Ding, et al. (132 additional authors not shown)

    Abstract: We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token, and supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA guarantees efficient inference…

    Submitted 19 June, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

  14. arXiv:2404.16137  [pdf, ps, other]

    cs.IT cs.LG eess.SP

    Learned Pulse Shaping Design for PAPR Reduction in DFT-s-OFDM

    Authors: Fabrizio Carpi, Soheil Rostami, Joonyoung Cho, Siddharth Garg, Elza Erkip, Charlie Jianzhong Zhang

    Abstract: High peak-to-average power ratio (PAPR) is one of the main factors limiting cell coverage for cellular systems, especially in the uplink direction. Discrete Fourier transform spread orthogonal frequency-domain multiplexing (DFT-s-OFDM) with spectrally-extended frequency-domain spectrum shaping (FDSS) is one of the efficient techniques deployed to lower the PAPR of the uplink waveforms. In this wor…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: 5 pages, under review

  15. arXiv:2404.05949  [pdf, ps, other]

    cs.DB cs.DS

    Balanced Partitioning for Optimizing Big Graph Computation: Complexities and Approximation Algorithms

    Authors: Baoling Ning, Jianzhong Li

    Abstract: Graph partitioning is a key fundamental problem in the area of big graph computation. Previous works do not consider the practical requirements when optimizing the big data analysis in real applications. In this paper, motivated by optimizing the big data computing applications, two typical problems of graph partitioning are studied. The first problem is to optimize the performance of specific wor…

    Submitted 8 April, 2024; originally announced April 2024.

  16. arXiv:2401.10518  [pdf, other]

    cs.LG

    Spatial-temporal Forecasting for Regions without Observations

    Authors: Xinyu Su, Jianzhong Qi, Egemen Tanin, Yanchuan Chang, Majid Sarvi

    Abstract: Spatial-temporal forecasting plays an important role in many real-world applications, such as traffic forecasting, air pollutant forecasting, crowd-flow forecasting, and so on. State-of-the-art spatial-temporal forecasting models take data-driven approaches and rely heavily on data availability. Such models suffer from accuracy issues when data is incomplete, which is common in reality due to the…

    Submitted 19 January, 2024; originally announced January 2024.

    Comments: Accepted by EDBT2024

  17. arXiv:2401.02954  [pdf, other]

    cs.CL cs.AI cs.LG

    DeepSeek LLM: Scaling Open-Source Language Models with Longtermism

    Authors: DeepSeek-AI, Xiao Bi, Deli Chen, Guanting Chen, Shanhuang Chen, Damai Dai, Chengqi Deng, Honghui Ding, Kai Dong, Qiushi Du, Zhe Fu, Huazuo Gao, Kaige Gao, Wenjun Gao, Ruiqi Ge, Kang Guan, Daya Guo, Jianzhong Guo, Guangbo Hao, Zhewen Hao, Ying He, Wenjie Hu, Panpan Huang, Erhang Li, et al. (63 additional authors not shown)

    Abstract: The rapid development of open-source large language models (LLMs) has been truly remarkable. However, the scaling law described in previous literature presents varying conclusions, which casts a dark cloud over scaling LLMs. We delve into the study of scaling laws and present our distinctive findings that facilitate scaling of large scale models in two commonly used open-source configurations, 7B…

    Submitted 5 January, 2024; originally announced January 2024.

  18. arXiv:2401.00819  [pdf, other]

    cs.IT eess.SP

    3D Beamforming Through Joint Phase-Time Arrays

    Authors: Ozlem Yildiz, Ahmad AlAmmouri, Jianhua Mo, Younghan Nam, Elza Erkip, Jianzhong Zhang

    Abstract: High-frequency wideband cellular communications over mmWave and sub-THz offer the opportunity for high data rates. However, it also presents high path loss, resulting in limited coverage. High-gain beamforming from the antenna array is essential to mitigate the coverage limitations. The conventional phased antenna arrays (PAA) cause high scheduling latency owing to analog beam constraints, i.e., o…

    Submitted 13 August, 2024; v1 submitted 1 January, 2024; originally announced January 2024.

  19. arXiv:2312.16355  [pdf, other]

    cs.DB

    Efficient Cost Modeling of Space-filling Curves

    Authors: Guanli Liu, Lars Kulik, Christian S. Jensen, Tianyi Li, Jianzhong Qi

    Abstract: A space-filling curve (SFC) maps points in a multi-dimensional space to one-dimensional points by discretizing the multi-dimensional space into cells and imposing a linear order on the cells. This way, an SFC enables the indexing of multi-dimensional data using a one-dimensional index such as a B+-tree. Choosing an appropriate SFC is crucial, as different SFCs have different effects on query perfo…

    Submitted 26 December, 2023; originally announced December 2023.
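
The mapping described in the abstract above (discretize the space into cells, then impose a linear order on the cells) can be sketched with the Z-order (Morton) curve, one well-known SFC. This is only an illustration and not the paper's cost model. The 2-D setting, the default bit width, and the helper names `cell_of` and `z_order` are assumptions.

```python
def cell_of(value, lo, hi, bits=16):
    """Discretize a coordinate in [lo, hi) into one of 2**bits cell indices."""
    cells = 1 << bits
    idx = int((value - lo) / (hi - lo) * cells)
    # Clamp so that values on or outside the boundary fall into an edge cell.
    return min(max(idx, 0), cells - 1)


def z_order(x, y, bits=16):
    """Interleave the bits of cell indices (x, y) into a single Morton code.

    Bit i of x goes to position 2*i and bit i of y to position 2*i + 1,
    which yields a total order over all cells of the 2-D grid.
    """
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code
```

The resulting integer can serve as the key in any one-dimensional index. A different curve, such as a Hilbert curve, would order the same cells differently, and that choice is what affects query performance.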

  20. Joint Phase-Time Arrays: A Paradigm for Frequency-Dependent Analog Beamforming in 6G

    Authors: Vishnu V. Ratnam, Jianhua Mo, Ahmad AlAmmouri, Boon L. Ng, Jianzhong Zhang, Andreas F. Molisch

    Abstract: Hybrid beamforming is an attractive solution to build cost-effective and energy-efficient transceivers for millimeter-wave and terahertz systems. However, conventional hybrid beamforming techniques rely on analog components that generate a frequency flat response such as phase-shifters and switches, which limits the flexibility of the achievable beam patterns. As a novel alternative, this paper pr…

    Submitted 18 December, 2023; originally announced December 2023.

    Comments: The paper is a revised version of the IEEE Access paper, that includes the full operation of Algorithms 1-3 to help curtail incorrect implementations

    Journal ref: IEEE Access, vol. 10, pp. 73364-73377, 2022

  21. arXiv:2312.04606  [pdf, other]

    cs.LG cs.DB

    Urban Region Representation Learning with Attentive Fusion

    Authors: Fengze Sun, Jianzhong Qi, Yanchuan Chang, Xiaoliang Fan, Shanika Karunasekera, Egemen Tanin

    Abstract: An increasing number of related urban data sources have brought forth novel opportunities for learning urban region representations, i.e., embeddings. The embeddings describe latent features of urban regions and enable discovering similar regions for urban planning applications. Existing methods learn an embedding for a region using every different type of region feature data, and subsequently fus…

    Submitted 26 April, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

  22. arXiv:2311.05167  [pdf, other]

    physics.chem-ph cond-mat.soft cs.LG physics.comp-ph stat.AP

    Perfecting Liquid-State Theories with Machine Intelligence

    Authors: Jianzhong Wu, Mengyang Gu

    Abstract: Recent years have seen a significant increase in the use of machine intelligence for predicting electronic structure, molecular force fields, and the physicochemical properties of various condensed systems. However, substantial challenges remain in developing a comprehensive framework capable of handling a wide range of atomic compositions and thermodynamic conditions. This perspective discusses p…

    Submitted 9 November, 2023; originally announced November 2023.

  23. arXiv:2311.01212  [pdf, other]

    cs.CV cs.AI

    Multi-level Relation Learning for Cross-domain Few-shot Hyperspectral Image Classification

    Authors: Chun Liu, Longwei Yang, Zheng Li, Wei Yang, Zhigang Han, Jianzhong Guo, Junyong Yu

    Abstract: Cross-domain few-shot hyperspectral image classification focuses on learning prior knowledge from a large number of labeled samples from source domains and then transferring the knowledge to the tasks which contain few labeled samples in target domains. Following the metric-based manner, many current methods first extract the features of the query and support samples, and then directly predict the…

    Submitted 25 December, 2023; v1 submitted 2 November, 2023; originally announced November 2023.

  24. arXiv:2311.00960  [pdf, other]

    cs.DB

    Trajectory Similarity Measurement: An Efficiency Perspective

    Authors: Yanchuan Chang, Egemen Tanin, Gao Cong, Christian S. Jensen, Jianzhong Qi

    Abstract: Trajectories that capture object movement have numerous applications, in which similarity computation between trajectories often plays a key role. Traditionally, the similarity between two trajectories is quantified by means of heuristic measures, e.g., Hausdorff or ERP, that operate directly on the trajectories. In contrast, recent studies exploit deep learning to map trajectories to d-dimensiona…

    Submitted 11 June, 2024; v1 submitted 1 November, 2023; originally announced November 2023.

    Comments: Accepted by VLDB 2024

  25. arXiv:2310.04018  [pdf, ps, other]

    cs.DS

    Testing Higher-order Clusterability on graphs

    Authors: Yifei Li, Donghua Yang, Jianzhong Li

    Abstract: Analysis of higher-order organizations, usually small connected subgraphs called motifs, is a fundamental task on complex networks. This paper studies a new problem of testing higher-order clusterability: given query access to an undirected graph, can we judge whether this graph can be partitioned into a few clusters of highly-connected motifs? This problem is an extension of the former work propo…

    Submitted 6 October, 2023; originally announced October 2023.

  26. MaaSDB: Spatial Databases in the Era of Large Language Models (Vision Paper)

    Authors: Jianzhong Qi, Zuqing Li, Egemen Tanin

    Abstract: Large language models (LLMs) are advancing rapidly. Such models have demonstrated strong capabilities in learning from large-scale (unstructured) text data and answering user queries. Users do not need to be experts in structured query languages to interact with systems built upon such models. This provides great opportunities to reduce the barrier of information retrieval for the general public.…

    Submitted 29 September, 2023; originally announced September 2023.

    Comments: Accepted to appear in ACM SIGSPATIAL 2023

  27. Nucleus-aware Self-supervised Pretraining Using Unpaired Image-to-image Translation for Histopathology Images

    Authors: Zhiyun Song, Penghui Du, Junpeng Yan, Kailu Li, Jianzhong Shou, Maode Lai, Yubo Fan, Yan Xu

    Abstract: Self-supervised pretraining attempts to enhance model performance by obtaining effective features from unlabeled data, and has demonstrated its effectiveness in the field of histopathology images. Despite its success, few works concentrate on the extraction of nucleus-level information, which is essential for pathologic analysis. In this work, we propose a novel nucleus-aware self-supervised pretr…

    Submitted 13 September, 2023; originally announced September 2023.

  28. arXiv:2309.05520  [pdf, other]

    cs.SE

    When ChatGPT Meets Smart Contract Vulnerability Detection: How Far Are We?

    Authors: Chong Chen, Jianzhong Su, Jiachi Chen, Yanlin Wang, Tingting Bi, Jianxing Yu, Yanli Wang, Xingwei Lin, Ting Chen, Zibin Zheng

    Abstract: With the development of blockchain technology, smart contracts have become an important component of blockchain applications. Despite their crucial role, the development of smart contracts may introduce vulnerabilities and potentially lead to severe consequences, such as financial losses. Meanwhile, large language models, represented by ChatGPT, have gained great attention, showcasing great capab…

    Submitted 21 August, 2024; v1 submitted 11 September, 2023; originally announced September 2023.

  29. arXiv:2308.12691  [pdf, other]

    cs.LG cs.DB

    An Efficient Data Analysis Method for Big Data using Multiple-Model Linear Regression

    Authors: Bohan Lyu, Jianzhong Li

    Abstract: This paper introduces a new data analysis method for big data using a newly defined regression model named multiple model linear regression (MMLR), which separates input datasets into subsets and constructs local linear regression models for them. The proposed data analysis method is shown to be more efficient and flexible than other regression based methods. This paper also proposes an approximate a…

    Submitted 24 August, 2023; originally announced August 2023.

  30. arXiv:2307.13220  [pdf]

    eess.IV cs.AI physics.med-ph

    One for Multiple: Physics-informed Synthetic Data Boosts Generalizable Deep Learning for Fast MRI Reconstruction

    Authors: Zi Wang, Xiaotong Yu, Chengyan Wang, Weibo Chen, Jiazheng Wang, Ying-Hua Chu, Hongwei Sun, Rushuai Li, Peiyong Li, Fan Yang, Haiwei Han, Taishan Kang, Jianzhong Lin, Chen Yang, Shufu Chang, Zhang Shi, Sha Hua, Yan Li, Juan Hu, Liuhong Zhu, Jianjun Zhou, Meijing Lin, Jiefeng Guo, Congbo Cai, Zhong Chen, et al. (3 additional authors not shown)

    Abstract: Magnetic resonance imaging (MRI) is a widely used radiological modality renowned for its radiation-free, comprehensive insights into the human body, facilitating medical diagnoses. However, the drawback of prolonged scan times hinders its accessibility. The k-space undersampling offers a solution, yet the resultant artifacts necessitate meticulous removal during image reconstruction. Although Deep…

    Submitted 28 February, 2024; v1 submitted 24 July, 2023; originally announced July 2023.

    Comments: 38 pages, 19 figures, 5 tables

  31. arXiv:2307.12639  [pdf, other]

    cs.SI cs.CL cs.GR cs.LG

    Fake News Detection Through Graph-based Neural Networks: A Survey

    Authors: Shuzhi Gong, Richard O. Sinnott, Jianzhong Qi, Cecile Paris

    Abstract: The popularity of online social networks has enabled rapid dissemination of information. People now can share and consume information much more rapidly than ever before. However, low-quality and/or accidentally/deliberately fake information can also spread rapidly. This can lead to considerable and negative impacts on society. Identifying, labelling and debunking online misinformation as early as…

    Submitted 24 July, 2023; originally announced July 2023.

    Comments: 18 pages, 3 tables, 7 figures

  32. Optimal preprocessing of WiFi CSI for sensing applications

    Authors: Vishnu V. Ratnam, Hao Chen, Hao Hsuan Chang, Abhishek Sehgal, Jianzhong Zhang

    Abstract: Due to its ubiquitous and contact-free nature, the use of WiFi infrastructure for performing sensing tasks has tremendous potential. However, the channel state information (CSI) measured by a WiFi receiver suffers from errors in both its gain and phase, which can significantly hinder sensing tasks. By analyzing these errors from different WiFi receivers, a mathematical model for these gain and pha…

    Submitted 21 May, 2024; v1 submitted 22 July, 2023; originally announced July 2023.

    Comments: Paper is accepted to IEEE Transactions on Wireless Communications

    Journal ref: IEEE Transactions on Wireless Communications (2024)

  33. arXiv:2307.11772  [pdf, other]

    cs.IR cs.CL cs.LG

    AutoAlign: Fully Automatic and Effective Knowledge Graph Alignment enabled by Large Language Models

    Authors: Rui Zhang, Yixin Su, Bayu Distiawan Trisedya, Xiaoyan Zhao, Min Yang, Hong Cheng, Jianzhong Qi

    Abstract: The task of entity alignment between knowledge graphs (KGs) aims to identify every pair of entities from two different KGs that represent the same entity. Many machine learning-based methods have been proposed for this task. However, to our best knowledge, existing methods all require manually crafted seed alignments, which are expensive to obtain. In this paper, we propose the first fully automat…

    Submitted 13 November, 2023; v1 submitted 18 July, 2023; originally announced July 2023.

    Comments: 14 pages, 5 figures, 4 tables, IEEE Transactions on Knowledge and Data Engineering

  34. Bundle-specific Tractogram Distribution Estimation Using Higher-order Streamline Differential Equation

    Authors: Yuanjing Feng, Lei Xie, Jingqiang Wang, Qiyuan Tian, Jianzhong He, Qingrun Zeng, Fei Gao

    Abstract: Tractography traces the peak directions extracted from fiber orientation distribution (FOD) suffering from ambiguous spatial correspondences between diffusion directions and fiber geometry, which is prone to producing erroneous tracks while missing true positive connections. The peaks-based tractography methods 'locally' reconstructed streamlines in 'single to single' manner, thus lacking of globa…

    Submitted 17 August, 2024; v1 submitted 6 July, 2023; originally announced July 2023.

  35. arXiv:2306.17659  [pdf, other]

    cs.CV

    Zero-shot Nuclei Detection via Visual-Language Pre-trained Models

    Authors: Yongjian Wu, Yang Zhou, Jiya Saiyin, Bingzheng Wei, Maode Lai, Jianzhong Shou, Yubo Fan, Yan Xu

    Abstract: Large-scale visual-language pre-trained models (VLPM) have proven their excellent performance in downstream object detection for natural scenes. However, zero-shot nuclei detection on H&E images via VLPMs remains underexplored. The large gap between medical images and the web-originated text-image pairs used for pre-training makes it a challenging task. In this paper, we attempt to explore the po…

    Submitted 30 June, 2023; originally announced June 2023.

    Comments: This article has been accepted by MICCAI 2023, but has not been fully edited. Content may change prior to final publication

  36. arXiv:2306.09681  [pdf]

    physics.med-ph cs.LG

    Magnetic Resonance Spectroscopy Quantification Aided by Deep Estimations of Imperfection Factors and Macromolecular Signal

    Authors: Dicheng Chen, Meijin Lin, Huiting Liu, Jiayu Li, Yirong Zhou, Taishan Kang, Liangjie Lin, Zhigang Wu, Jiazheng Wang, Jing Li, Jianzhong Lin, Xi Chen, Di Guo, Xiaobo Qu

    Abstract: Objective: Magnetic Resonance Spectroscopy (MRS) is an important technique for biomedical detection. However, it is challenging to accurately quantify metabolites with proton MRS due to serious overlaps of metabolite signals, imperfections because of non-ideal acquisition conditions, and interference with strong background signals mainly from macromolecules. The most popular method, LCModel, adopt…

    Submitted 9 October, 2023; v1 submitted 16 June, 2023; originally announced June 2023.

  37. arXiv:2306.05623  [pdf]

    cs.CV

    Reconstructing the somatotopic organization of the corticospinal tract remains a challenge for modern tractography methods

    Authors: Jianzhong He, Fan Zhang, Yiang Pan, Yuanjing Feng, Jarrett Rushmore, Erickson Torio, Yogesh Rathi, Nikos Makris, Ron Kikinis, Alexandra J. Golby, Lauren J. O'Donnell

    Abstract: The corticospinal tract (CST) is a critically important white matter fiber tract in the human brain that enables control of voluntary movements of the body. Diffusion MRI tractography is the only method that enables the study of the anatomy and variability of the CST pathway in human health. In this work, we explored the performance of six widely used tractography methods for reconstructing the CS…

    Submitted 14 June, 2023; v1 submitted 8 June, 2023; originally announced June 2023.

    Comments: 41 pages, 19 figures

  38. arXiv:2306.02854  [pdf, other]

    cs.CV

    Asymmetric Patch Sampling for Contrastive Learning

    Authors: Chengchao Shen, Jianzhong Chen, Shu Wang, Hulin Kuang, Jin Liu, Jianxin Wang

    Abstract: Asymmetric appearance between positive pair effectively reduces the risk of representation degradation in contrastive learning. However, there are still a mass of appearance similarities between positive pair constructed by the existing methods, which inhibits the further representation improvement. In this paper, we propose a novel asymmetric patch sampling strategy for contrastive learning, to f…

    Submitted 5 June, 2023; originally announced June 2023.

  39. Cyclic Learning: Bridging Image-level Labels and Nuclei Instance Segmentation

    Authors: Yang Zhou, Yongjian Wu, Zihua Wang, Bingzheng Wei, Maode Lai, Jianzhong Shou, Yubo Fan, Yan Xu

    Abstract: Nuclei instance segmentation on histopathology images is of great clinical value for disease analysis. Generally, fully-supervised algorithms for this task require pixel-wise manual annotations, which is especially time-consuming and laborious for the high nuclei density. To alleviate the annotation burden, we seek to solve the problem through image-level weakly supervised learning, which is under…

    Submitted 5 June, 2023; originally announced June 2023.

    Comments: This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI https://doi.org/10.1109/TMI.2023.3275609, IEEE Transactions on Medical Imaging. Code: https://github.com/wuyongjianCODE/Cyclic

  40. arXiv:2305.16548  [pdf, other]

    cs.CL cs.AI

    Annotating and Detecting Fine-grained Factual Errors for Dialogue Summarization

    Authors: Rongxin Zhu, Jianzhong Qi, Jey Han Lau

    Abstract: A series of datasets and models have been proposed for summaries generated for well-formatted documents such as news articles. Dialogue summaries, however, have been under explored. In this paper, we present the first dataset with fine-grained factual error annotations named DIASUMFACT. We define fine-grained factual error detection as a sentence-level multi-label classification problem, and we ev…

    Submitted 25 May, 2023; originally announced May 2023.

    Comments: Accepted in ACL 2023

  41. arXiv:2305.08456  [pdf, other]

    cs.SE

    DAppSCAN: Building Large-Scale Datasets for Smart Contract Weaknesses in DApp Projects

    Authors: Zibin Zheng, Jianzhong Su, Jiachi Chen, David Lo, Zhijie Zhong, Mingxi Ye

    Abstract: The Smart Contract Weakness Classification Registry (SWC Registry) is a widely recognized list of smart contract weaknesses specific to the Ethereum platform. Despite the SWC Registry not being updated with new entries since 2020, the sustained development of smart contract analysis tools for detecting SWC-listed weaknesses highlights their ongoing significance in the field. However, evaluating th…

    Submitted 18 November, 2023; v1 submitted 15 May, 2023; originally announced May 2023.

    Comments: Dataset available at https://github.com/InPlusLab/DAppSCAN

  42. arXiv:2303.13770  [pdf, other]

    cs.SE

    Turn the Rudder: A Beacon of Reentrancy Detection for Smart Contracts on Ethereum

    Authors: Zibin Zheng, Neng Zhang, Jianzhong Su, Zhijie Zhong, Mingxi Ye, Jiachi Chen

    Abstract: Smart contracts are programs deployed on a blockchain and are immutable once deployed. Reentrancy, one of the most important vulnerabilities in smart contracts, has caused millions of dollars in financial loss. Many reentrancy detection approaches have been proposed. It is necessary to investigate the performance of these approaches to provide useful guidelines for their application. In this work,…

    Submitted 23 March, 2023; originally announced March 2023.

    Comments: Accepted by ICSE 2023. Dataset available at https://github.com/InPlusLab/ReentrancyStudy-Data

  43. arXiv:2303.06565  [pdf, other]

    cs.CL cs.AI

    Compressed Heterogeneous Graph for Abstractive Multi-Document Summarization

    Authors: Miao Li, Jianzhong Qi, Jey Han Lau

    Abstract: Multi-document summarization (MDS) aims to generate a summary for a number of related documents. We propose HGSUM, an MDS model that extends an encoder-decoder architecture, to incorporate a heterogeneous graph to represent different semantic units (e.g., words and sentences) of the documents. This contrasts with existing MDS models which do not consider different edge types of graphs and as such…

    Submitted 11 March, 2023; originally announced March 2023.

    Comments: AAAI 2023

  44. arXiv:2303.06213  [pdf, other]

    cs.LG cs.AI

    CHGNN: A Semi-Supervised Contrastive Hypergraph Learning Network

    Authors: Yumeng Song, Yu Gu, Tianyi Li, Jianzhong Qi, Zhenghao Liu, Christian S. Jensen, Ge Yu

    Abstract: Hypergraphs can model higher-order relationships among data objects that are found in applications such as social networks and bioinformatics. However, recent studies on hypergraph learning that extend graph convolutional networks to hypergraphs cannot learn effectively from features of unlabeled data. To enable such learning, we propose a contrastive hypergraph neural network, CHGNN, that exploits self-…

    Submitted 28 May, 2024; v1 submitted 10 March, 2023; originally announced March 2023.

    Comments: Accepted by TKDE

  45. arXiv:2303.00259  [pdf, other]

    cs.DS

    Computing All Restricted Skyline Probabilities on Uncertain Datasets

    Authors: Xiangyu Gao, Jianzhong Li, Dongjing Miao

    Abstract: The restricted skyline (rskyline) query is widely used in multi-criteria decision making. It generalizes the skyline query by additionally considering a set of personalized scoring functions F. Since uncertainty is inherent in datasets for multi-criteria decision making, we study rskyline queries on uncertain datasets from both complexity and algorithmic perspectives. We formalize the problem of computin…

    Submitted 12 January, 2024; v1 submitted 1 March, 2023; originally announced March 2023.

    Comments: Full version, a shorter version to appear in ICDE 2024

  46. arXiv:2302.14287  [pdf, other]

    cs.DB cs.LG

    WISK: A Workload-aware Learned Index for Spatial Keyword Queries

    Authors: Yufan Sheng, Xin Cao, Yixiang Fang, Kaiqi Zhao, Jianzhong Qi, Gao Cong, Wenjie Zhang

    Abstract: Spatial objects often come with textual information, such as Points of Interest (POIs) with their descriptions, which are referred to as geo-textual data. To retrieve such data, spatial keyword queries that take into account both spatial proximity and textual relevance have been extensively studied. Existing indexes designed for spatial keyword queries are mostly built based on the geo-textual dat…

    Submitted 13 April, 2023; v1 submitted 27 February, 2023; originally announced February 2023.

    Comments: PACMMOD camera-ready version with appendix. Accepted by ACM SIGMOD 2023

  47. arXiv:2302.13549  [pdf]

    cs.DS

    Random-Order Enumeration for Self-Reducible NP-Problems

    Authors: Pengyu Chen, Dongjing Miao, Weitian Tong, Zizheng Guo, Jianzhong Li, Zhipeng Cai

    Abstract: In many data analysis tasks, a basic and time-consuming process is to produce a large number of solutions and feed them into downstream processing. Various enumeration algorithms have been developed for this purpose. An enumeration algorithm produces all solutions of a problem instance without repetition. To be a statistically meaningful representation of the solution space, solutions are req…

    Submitted 27 February, 2023; originally announced February 2023.

  48. arXiv:2302.12953  [pdf, other]

    cs.CC

    The Hardness of Optimization Problems on the Weighted Massively Parallel Computation Model

    Authors: Hengzhao Ma, Jianzhong Li

    Abstract: The topology-aware Massively Parallel Computation (MPC) model was recently proposed and studied; it enhances the classical MPC model with awareness of the network topology. The work of Hu et al. on the topology-aware MPC model considers only the tree topology. In this paper, a more general case is considered, where the underlying network is a weighted complete graph. We call this model Weighted…

    Submitted 16 June, 2023; v1 submitted 24 February, 2023; originally announced February 2023.

  49. INCREASE: Inductive Graph Representation Learning for Spatio-Temporal Kriging

    Authors: Chuanpan Zheng, Xiaoliang Fan, Cheng Wang, Jianzhong Qi, Chaochao Chen, Longbiao Chen

    Abstract: Spatio-temporal kriging is an important problem in web and social applications, such as the Web or the Internet of Things, where things (e.g., sensors) connected into a web often come with spatial and temporal properties. It aims to infer knowledge for (the things at) unobserved locations using the data from (the things at) observed locations during a given time period of interest. This problem essentiall…

    Submitted 6 February, 2023; originally announced February 2023.

    Comments: WWW 2023 paper

  50. arXiv:2211.13975  [pdf, other]

    cs.LG

    FedGS: Federated Graph-based Sampling with Arbitrary Client Availability

    Authors: Zheng Wang, Xiaoliang Fan, Jianzhong Qi, Haibing Jin, Peizhen Yang, Siqi Shen, Cheng Wang

    Abstract: While federated learning has shown strong results in optimizing a machine learning model without direct access to the original data, its performance may be hindered by intermittent client availability, which slows down convergence and biases the final learned model. There are significant challenges in achieving both stable and bias-free training under arbitrary client availability. To address the…

    Submitted 7 December, 2022; v1 submitted 25 November, 2022; originally announced November 2022.

    Comments: Accepted by AAAI23