Showing 1–14 of 14 results for author: Zhang, C J

Searching in archive cs.
  1. arXiv:2408.14507 [pdf, other]

    cs.DB cs.AI

    Cost-Aware Uncertainty Reduction in Schema Matching with GPT-4: The Prompt-Matcher Framework

    Authors: Longyu Feng, Huahang Li, Chen Jason Zhang

    Abstract: Schema matching is the process of identifying correspondences between the elements of two given schemata, essential for database management systems, data integration, and data warehousing. The inherent uncertainty of current schema matching algorithms leads to the generation of a set of candidate matches. Storing these results necessitates the use of databases and systems capable of handling proba…

    Submitted 24 August, 2024; originally announced August 2024.

  2. arXiv:2408.07401 [pdf, other]

    cs.CL cs.AI cs.DB

    DataVisT5: A Pre-trained Language Model for Jointly Understanding Text and Data Visualization

    Authors: Zhuoyue Wan, Yuanfeng Song, Shuaimin Li, Chen Jason Zhang, Raymond Chi-Wing Wong

    Abstract: Data visualization (DV) is a fundamental tool for efficiently conveying the insights behind big data, and it has been widely accepted in today's data-driven world. Task automation in DV, such as converting natural language queries to visualizations (i.e., text-to-vis), generating explanations from visualizations (i.e., vis-to-text), answering DV-related questions in…

    Submitted 14 August, 2024; originally announced August 2024.

  3. arXiv:2408.04197 [pdf, other]

    cs.IR cs.AI cs.DB

    Pairwise Judgment Formulation for Semantic Embedding Model in Web Search

    Authors: Mengze Hong, Chen Jason Zhang

    Abstract: Semantic Embedding Model (SEM), a neural network-based Siamese architecture, is gaining momentum in information retrieval and natural language processing. In order to train SEM in a supervised fashion for Web search, the search engine query log is typically utilized to automatically formulate pairwise judgments as training data. Despite the growing application of semantic embeddings in the search…

    Submitted 7 August, 2024; originally announced August 2024.

  4. arXiv:2407.15360 [pdf, other]

    cs.CL

    Dissecting Multiplication in Transformers: Insights into LLMs

    Authors: Luyu Qiu, Jianing Li, Chi Su, Chen Jason Zhang, Lei Chen

    Abstract: Transformer-based large language models have achieved remarkable performance across various natural language processing tasks. However, they often struggle with seemingly easy tasks like arithmetic despite their vast capabilities. This stark disparity raises concerns about their safe and ethical use and hinders their widespread adoption. In this paper, we focus on a typical arithmetic task, inte…

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: 8 pages, 5 figures

  5. arXiv:2406.15960 [pdf, other]

    cs.LG cs.AI cs.CY cs.DS

    Fair Clustering: Critique, Caveats, and Future Directions

    Authors: John Dickerson, Seyed A. Esmaeili, Jamie Morgenstern, Claire Jie Zhang

    Abstract: Clustering is a fundamental problem in machine learning and operations research. Therefore, given the fact that fairness considerations have become of paramount importance in algorithm design, fairness in clustering has received significant attention from the research community. The literature on fair clustering has resulted in a collection of interesting fairness notions and elaborate algorithms.…

    Submitted 22 June, 2024; originally announced June 2024.

  6. arXiv:2405.09079 [pdf, other]

    eess.SP cs.IT

    Integrated Monostatic Sensing and Full-Duplex Multiuser Communication for mmWave Systems

    Authors: Murat Bayraktar, Nuria González-Prelcic, Mikko Valkama, Hao Chen, Charlie Jianzhong Zhang

    Abstract: In this paper, we propose a hybrid precoding/combining framework for communication-centric integrated sensing and full-duplex (FD) communication operating at mmWave bands. The designed precoders and combiners enable multiuser (MU) FD communication while simultaneously supporting monostatic sensing in a frequency-selective setting. The joint design of precoders and combiners involves the mitigation…

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: 13 pages, 7 figures

  7. arXiv:2404.16137 [pdf, ps, other]

    cs.IT cs.LG eess.SP

    Learned Pulse Shaping Design for PAPR Reduction in DFT-s-OFDM

    Authors: Fabrizio Carpi, Soheil Rostami, Joonyoung Cho, Siddharth Garg, Elza Erkip, Charlie Jianzhong Zhang

    Abstract: High peak-to-average power ratio (PAPR) is one of the main factors limiting cell coverage for cellular systems, especially in the uplink direction. Discrete Fourier transform spread orthogonal frequency-division multiplexing (DFT-s-OFDM) with spectrally-extended frequency-domain spectrum shaping (FDSS) is one of the efficient techniques deployed to lower the PAPR of the uplink waveforms. In this wor…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: 5 pages, under review

  8. arXiv:2403.12370 [pdf, other]

    cs.CV

    XPose: eXplainable Human Pose Estimation

    Authors: Luyu Qiu, Jianing Li, Lei Wen, Chi Su, Fei Hao, Chen Jason Zhang, Lei Chen

    Abstract: Current approaches in pose estimation primarily concentrate on enhancing model architectures, often overlooking the importance of comprehensively understanding the rationale behind model decisions. In this paper, we propose XPose, a novel framework that incorporates Explainable AI (XAI) principles into pose estimation. This integration aims to elucidate the individual contribution of each keypoint…

    Submitted 18 March, 2024; originally announced March 2024.

  9. BoostER: Leveraging Large Language Models for Enhancing Entity Resolution

    Authors: Huahang Li, Shuangyin Li, Fei Hao, Chen Jason Zhang, Yuanfeng Song, Lei Chen

    Abstract: Entity resolution, which involves identifying and merging records that refer to the same real-world entity, is a crucial task in areas like Web data integration. This importance is underscored by the presence of numerous duplicated and multi-version data resources on the Web. However, achieving high-quality entity resolution typically demands significant effort. The advent of Large Language Models…

    Submitted 11 March, 2024; originally announced March 2024.

    Comments: 4 pages, 3 figures, The Web Conf 2024 - WWW'24

  10. arXiv:2401.03426 [pdf, other]

    cs.CL cs.AI

    On Leveraging Large Language Models for Enhancing Entity Resolution

    Authors: Huahang Li, Longyu Feng, Shuangyin Li, Fei Hao, Chen Jason Zhang, Yuanfeng Song, Lei Chen

    Abstract: Entity resolution, the task of identifying and consolidating records that pertain to the same real-world entity, plays a pivotal role in various sectors such as e-commerce, healthcare, and law enforcement. The emergence of Large Language Models (LLMs) like GPT-4 has introduced a new dimension to this task, leveraging their advanced linguistic capabilities. This paper explores the potential of LLMs…

    Submitted 7 January, 2024; originally announced January 2024.

    Comments: 12 pages, 6 figures, ICDE 2024

  11. arXiv:2310.05035 [pdf, other]

    cs.CL cs.AI

    Self-Convinced Prompting: Few-Shot Question Answering with Repeated Introspection

    Authors: Haodi Zhang, Min Cai, Xinhe Zhang, Chen Jason Zhang, Rui Mao, Kaishun Wu

    Abstract: While large language models (LLMs) such as ChatGPT and PaLM have demonstrated remarkable performance in various language understanding and generation tasks, their capabilities in complex reasoning and intricate knowledge utilization still fall short of human-level proficiency. Recent studies have established the effectiveness of prompts in steering LLMs towards generating desired outputs. Building…

    Submitted 10 October, 2023; v1 submitted 8 October, 2023; originally announced October 2023.

  12. arXiv:2305.19475 [pdf, other]

    cs.LG cs.AI cs.DS

    Doubly Constrained Fair Clustering

    Authors: John Dickerson, Seyed A. Esmaeili, Jamie Morgenstern, Claire Jie Zhang

    Abstract: The remarkable attention which fair clustering has received in the last few years has resulted in a significant number of different notions of fairness. Despite the fact that these notions are well-justified, they are often motivated and studied in a disjoint manner where one fairness desideratum is considered exclusively in isolation from the others. This leaves the understanding of the relations…

    Submitted 30 May, 2023; originally announced May 2023.

  13. arXiv:1809.04017 [pdf, other]

    cs.DB

    Reducing Uncertainty of Schema Matching via Crowdsourcing with Accuracy Rates

    Authors: Chen Jason Zhang, Lei Chen, H. V. Jagadish, Mengchen Zhang, Yongxin Tong

    Abstract: Schema matching is a central challenge for data integration systems. Inspired by the popularity and the success of crowdsourcing platforms, we explore the use of crowdsourcing to reduce the uncertainty of schema matching. Since crowdsourcing platforms are most effective for simple questions, we assume that each Correspondence Correctness Question (CCQ) asks the crowd to decide whether a given corr…

    Submitted 11 September, 2018; originally announced September 2018.

    Comments: 15 pages

  14. arXiv:1702.00567 [pdf, other]

    cs.DB

    CrowdFusion: A Crowdsourced Approach on Data Fusion Refinement

    Authors: Yunfan Chen, Lei Chen, Chen Jason Zhang

    Abstract: Data fusion has played an important role in data mining because high-quality data is required in many applications. As online data may be out of date and errors in the data may propagate through copying and referencing between sources, it is hard to achieve satisfying results by merely applying existing data fusion methods to fuse Web data. In this paper, we make use of the crowd to achieve high…

    Submitted 2 February, 2017; originally announced February 2017.

    Comments: A short version of this paper will be published as an ICDE 2017 poster