Skip to main content

Showing 1–50 of 96 results for author: Cheng, F

Searching in archive cs. Search in all archives.
.
  1. arXiv:2407.03963  [pdf, other

    cs.CL cs.AI

    LLM-jp: A Cross-organizational Project for the Research and Development of Fully Open Japanese LLMs

    Authors: LLM-jp, :, Akiko Aizawa, Eiji Aramaki, Bowen Chen, Fei Cheng, Hiroyuki Deguchi, Rintaro Enomoto, Kazuki Fujii, Kensuke Fukumoto, Takuya Fukushima, Namgi Han, Yuto Harada, Chikara Hashimoto, Tatsuya Hiraoka, Shohei Hisada, Sosuke Hosokawa, Lu Jie, Keisuke Kamata, Teruhito Kanazawa, Hiroki Kanezashi, Hiroshi Kataoka, Satoru Katsumata, Daisuke Kawahara, Seiya Kawano , et al. (57 additional authors not shown)

    Abstract: This paper introduces LLM-jp, a cross-organizational project for the research and development of Japanese large language models (LLMs). LLM-jp aims to develop open-source and strong Japanese LLMs, and as of this writing, more than 1,500 participants from academia and industry are working together for this purpose. This paper presents the background of the establishment of LLM-jp, summaries of its… ▽ More

    Submitted 4 July, 2024; originally announced July 2024.

  2. arXiv:2407.03314  [pdf, other

    cs.CV cs.CL cs.DB

    BACON: Supercharge Your VLM with Bag-of-Concept Graph to Mitigate Hallucinations

    Authors: Zhantao Yang, Ruili Feng, Keyu Yan, Huangji Wang, Zhicai Wang, Shangwen Zhu, Han Zhang, Jie Xiao, Pingyu Wu, Kai Zhu, Jixuan Chen, Chen-Wei Xie, Chaojie Mao, Yue Yang, Hongyang Zhang, Yu Liu, Fan Cheng

    Abstract: This paper presents Bag-of-Concept Graph (BACON) to gift models with limited linguistic abilities to taste the privilege of Vision Language Models (VLMs) and boost downstream tasks such as detection, visual question answering (VQA), and image generation. Since the visual scenes in physical worlds are structured with complex relations between objects, BACON breaks down annotations into basic minimu… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

  3. arXiv:2406.10432  [pdf, other

    cs.CL

    Enhancing In-Context Learning with Semantic Representations for Relation Extraction

    Authors: Peitao Han, Lis Kanashiro Pereira, Fei Cheng, Wan Jou She, Eiji Aramaki

    Abstract: In this work, we employ two AMR-enhanced semantic representations for ICL on RE: one that explores the AMR structure generated for a sentence at the subgraph level (shortest AMR path), and another that explores the full AMR structure generated for a sentence. In both cases, we demonstrate that all settings benefit from the fine-grained AMR's semantic structure. We evaluate our model on four RE dat… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

  4. arXiv:2406.06847  [pdf, other

    cs.CV

    Generalized W-Net: Arbitrary-style Chinese Character Synthesization

    Authors: Haochuan Jiang, Guanyu Yang, Fei Cheng, Kaizhu Huang

    Abstract: Synthesizing Chinese characters with consistent style using few stylized examples is challenging. Existing models struggle to generate arbitrary style characters with limited examples. In this paper, we propose the Generalized W-Net, a novel class of W-shaped architectures that addresses this. By incorporating Adaptive Instance Normalization and introducing multi-content, our approach can synthesi… ▽ More

    Submitted 10 June, 2024; originally announced June 2024.

    Journal ref: International Conference on Brain Inspired Cognitive Systems 2023

  5. arXiv:2405.19209  [pdf, other

    cs.CV cs.AI cs.CL

    VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos

    Authors: Ziyang Wang, Shoubin Yu, Elias Stengel-Eskin, Jaehong Yoon, Feng Cheng, Gedas Bertasius, Mohit Bansal

    Abstract: Video-language understanding tasks have focused on short video clips, often struggling with long-form video understanding tasks. Recently, many long video-language understanding approaches have leveraged the reasoning capabilities of Large Language Models (LLMs) to perform long video QA, transforming videos into densely sampled frame captions, and asking LLMs to respond to text queries over captio… ▽ More

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: 20 pages, first three authors contributed equally; Project page: https://videotree2024.github.io/

  6. arXiv:2405.17137  [pdf, other

    cs.CV

    Jump-teaching: Ultra Efficient and Robust Learning with Noisy Label

    Authors: Kangye Ji, Fei Cheng, Zeqing Wang, Bohu Huang

    Abstract: Sample selection is the most straightforward technique to combat label noise, aiming to distinguish mislabeled samples during training and avoid the degradation of the robustness of the model. In the workflow, $\textit{selecting possibly clean data}$ and $\textit{model update}$ are iterative. However, their interplay and intrinsic characteristics hinder the robustness and efficiency of learning wi… ▽ More

    Submitted 28 May, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

  7. arXiv:2405.11921  [pdf, other

    cs.CV

    MirrorGaussian: Reflecting 3D Gaussians for Reconstructing Mirror Reflections

    Authors: Jiayue Liu, Xiao Tang, Freeman Cheng, Roy Yang, Zhihao Li, Jianzhuang Liu, Yi Huang, Jiaqi Lin, Shiyong Liu, Xiaofei Wu, Songcen Xu, Chun Yuan

    Abstract: 3D Gaussian Splatting showcases notable advancements in photo-realistic and real-time novel view synthesis. However, it faces challenges in modeling mirror reflections, which exhibit substantial appearance variations from different viewpoints. To tackle this problem, we present MirrorGaussian, the first method for mirror scene reconstruction with real-time rendering based on 3D Gaussian Splatting.… ▽ More

    Submitted 20 May, 2024; originally announced May 2024.

  8. arXiv:2405.03913  [pdf, other

    q-bio.QM cs.LG stat.ML

    Digital Twin Calibration for Biological System-of-Systems: Cell Culture Manufacturing Process

    Authors: Fuqiang Cheng, Wei Xie, Hua Zheng

    Abstract: Biomanufacturing innovation relies on an efficient Design of Experiments (DoEs) to optimize processes and product quality. Traditional DoE methods, ignoring the underlying bioprocessing mechanisms, often suffer from a lack of interpretability and sample efficiency. This limitation motivates us to create a new optimal learning approach for digital twin model calibration. In this study, we consider… ▽ More

    Submitted 28 June, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

    Comments: 11 pages, 5 figures

  9. arXiv:2405.00708  [pdf, other

    cs.CL cs.AI cs.HC cs.LG

    Interactive Analysis of LLMs using Meaningful Counterfactuals

    Authors: Furui Cheng, Vilém Zouhar, Robin Shing Moon Chan, Daniel Fürst, Hendrik Strobelt, Mennatallah El-Assady

    Abstract: Counterfactual examples are useful for exploring the decision boundaries of machine learning models and determining feature attributions. How can we apply counterfactual-based methods to analyze and explain LLMs? We identify the following key challenges. First, the generated textual counterfactuals should be meaningful and readable to users and thus can be mentally compared to draw conclusions. Se… ▽ More

    Submitted 23 April, 2024; originally announced May 2024.

    ACM Class: I.2.7; H.5.2

  10. arXiv:2404.10209  [pdf, other

    cs.AI cs.LG

    Demonstration of DB-GPT: Next Generation Data Interaction System Empowered by Large Language Models

    Authors: Siqiao Xue, Danrui Qi, Caigao Jiang, Wenhui Shi, Fangyin Cheng, Keting Chen, Hongjun Yang, Zhiping Zhang, Jianshan He, Hongyang Zhang, Ganglin Wei, Wang Zhao, Fan Zhou, Hong Yi, Shaodong Liu, Hongjun Yang, Faqiang Chen

    Abstract: The recent breakthroughs in large language models (LLMs) are positioned to transition many areas of software. The technologies of interacting with data particularly have an important entanglement with LLMs as efficient and intuitive data interactions are paramount. In this paper, we present DB-GPT, a revolutionary and product-ready Python library that integrates LLMs into traditional data interact… ▽ More

    Submitted 24 April, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

  11. arXiv:2403.18504  [pdf

    cs.CL

    AcTED: Automatic Acquisition of Typical Event Duration for Semi-supervised Temporal Commonsense QA

    Authors: Felix Virgo, Fei Cheng, Lis Kanashiro Pereira, Masayuki Asahara, Ichiro Kobayashi, Sadao Kurohashi

    Abstract: We propose a voting-driven semi-supervised approach to automatically acquire the typical duration of an event and use it as pseudo-labeled data. The human evaluation demonstrates that our pseudo labels exhibit surprisingly high accuracy and balanced coverage. In the temporal commonsense QA task, experimental results show that using only pseudo examples of 400 events, we achieve performance compara… ▽ More

    Submitted 27 March, 2024; originally announced March 2024.

  12. arXiv:2403.11517  [pdf, other

    q-bio.NC cs.HC

    Inter-individual and inter-site neural code conversion and image reconstruction without shared stimuli

    Authors: Haibao Wang, Jun Kai Ho, Fan L. Cheng, Shuntaro C. Aoki, Yusuke Muraki, Misato Tanaka, Yukiyasu Kamitani

    Abstract: The human brain demonstrates substantial inter-individual variability in fine-grained functional topography, posing challenges in identifying common neural representations across individuals. Functional alignment has the potential to harmonize these individual differences. However, it typically requires an identical set of stimuli presented to different individuals, which is often unavailable. To… ▽ More

    Submitted 18 March, 2024; originally announced March 2024.

  13. arXiv:2403.08755  [pdf, other

    cs.CV cs.AI cs.CL cs.LG

    DAM: Dynamic Adapter Merging for Continual Video QA Learning

    Authors: Feng Cheng, Ziyang Wang, Yi-Lin Sung, Yan-Bo Lin, Mohit Bansal, Gedas Bertasius

    Abstract: We present a parameter-efficient method for continual video question-answering (VidQA) learning. Our method, named DAM, uses the proposed Dynamic Adapter Merging to (i) mitigate catastrophic forgetting, (ii) enable efficient adaptation to continually arriving datasets, (iii) handle inputs from unknown datasets during inference, and (iv) enable knowledge sharing across similar dataset domains. Give… ▽ More

    Submitted 22 April, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

    Comments: The first two authors contribute equally

  14. arXiv:2403.03690  [pdf

    cs.CL cs.AI

    Rapidly Developing High-quality Instruction Data and Evaluation Benchmark for Large Language Models with Minimal Human Effort: A Case Study on Japanese

    Authors: Yikun Sun, Zhen Wan, Nobuhiro Ueda, Sakiko Yahata, Fei Cheng, Chenhui Chu, Sadao Kurohashi

    Abstract: The creation of instruction data and evaluation benchmarks for serving Large language models often involves enormous human annotation. This issue becomes particularly pronounced when rapidly developing such resources for a non-English language like Japanese. Instead of following the popular practice of directly translating existing English resources into Japanese (e.g., Japanese-Alpaca), we propos… ▽ More

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: COLING 2024. Our code are available here: \href{https://github.com/hitoshizuku7/awesome-Ja-self-instruct}{self-instruct data} and \href{https://github.com/ku-nlp/ja-vicuna-qa-benchmark}{evaluation benchmark}

  15. arXiv:2402.00891  [pdf, other

    cs.CR cs.AI cs.CL cs.LG

    Large Language Models in Cybersecurity: State-of-the-Art

    Authors: Farzad Nourmohammadzadeh Motlagh, Mehrdad Hajizadeh, Mehryar Majd, Pejman Najafi, Feng Cheng, Christoph Meinel

    Abstract: The rise of Large Language Models (LLMs) has revolutionized our comprehension of intelligence bringing us closer to Artificial Intelligence. Since their introduction, researchers have actively explored the applications of LLMs across diverse fields, significantly elevating capabilities. Cybersecurity, traditionally resistant to data-driven solutions and slow to embrace machine learning, stands out… ▽ More

    Submitted 30 January, 2024; originally announced February 2024.

  16. arXiv:2312.17449  [pdf, other

    cs.DB

    DB-GPT: Empowering Database Interactions with Private Large Language Models

    Authors: Siqiao Xue, Caigao Jiang, Wenhui Shi, Fangyin Cheng, Keting Chen, Hongjun Yang, Zhiping Zhang, Jianshan He, Hongyang Zhang, Ganglin Wei, Wang Zhao, Fan Zhou, Danrui Qi, Hong Yi, Shaodong Liu, Faqiang Chen

    Abstract: The recent breakthroughs in large language models (LLMs) are positioned to transition many areas of software. Database technologies particularly have an important entanglement with LLMs as efficient and intuitive database interactions are paramount. In this paper, we present DB-GPT, a revolutionary and production-ready project that integrates LLMs with traditional database systems to enhance user… ▽ More

    Submitted 3 January, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

  17. arXiv:2312.12414  [pdf, ps, other

    cs.DB cs.AI cs.LG

    Translating Natural Language Queries to SQL Using the T5 Model

    Authors: Albert Wong, Lien Pham, Young Lee, Shek Chan, Razel Sadaya, Youry Khmelevsky, Mathias Clement, Florence Wing Yau Cheng, Joe Mahony, Michael Ferri

    Abstract: This paper presents the development process of a natural language to SQL model using the T5 model as the basis. The models, developed in August 2022 for an online transaction processing system and a data warehouse, have a 73\% and 84\% exact match accuracy respectively. These models, in conjunction with other work completed in the research project, were implemented for several companies and used s… ▽ More

    Submitted 12 December, 2023; originally announced December 2023.

  18. arXiv:2312.03814  [pdf, other

    cs.LG cs.AI

    Pearl: A Production-ready Reinforcement Learning Agent

    Authors: Zheqing Zhu, Rodrigo de Salvo Braz, Jalaj Bhandari, Daniel Jiang, Yi Wan, Yonathan Efroni, Liyuan Wang, Ruiyang Xu, Hongbo Guo, Alex Nikulkov, Dmytro Korenkevych, Urun Dogan, Frank Cheng, Zheng Wu, Wanqiao Xu

    Abstract: Reinforcement Learning (RL) offers a versatile framework for achieving long-term goals. Its generality allows us to formalize a wide range of problems that real-world intelligent systems encounter, such as dealing with delayed rewards, handling partial observability, addressing the exploration and exploitation dilemma, utilizing offline data to improve online performance, and ensuring safety const… ▽ More

    Submitted 6 December, 2023; originally announced December 2023.

  19. arXiv:2311.18259  [pdf, other

    cs.CV cs.AI

    Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

    Authors: Kristen Grauman, Andrew Westbury, Lorenzo Torresani, Kris Kitani, Jitendra Malik, Triantafyllos Afouras, Kumar Ashutosh, Vijay Baiyya, Siddhant Bansal, Bikram Boote, Eugene Byrne, Zach Chavis, Joya Chen, Feng Cheng, Fu-Jen Chu, Sean Crane, Avijit Dasgupta, Jing Dong, Maria Escobar, Cristhian Forigua, Abrham Gebreselasie, Sanjay Haresh, Jing Huang, Md Mohaiminul Islam, Suyog Jain , et al. (76 additional authors not shown)

    Abstract: We present Ego-Exo4D, a diverse, large-scale multimodal multiview video dataset and benchmark challenge. Ego-Exo4D centers around simultaneously-captured egocentric and exocentric video of skilled human activities (e.g., sports, music, dance, bike repair). 740 participants from 13 cities worldwide performed these activities in 123 different natural scene contexts, yielding long-form captures from… ▽ More

    Submitted 29 April, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

    Comments: updated baseline results and dataset statistics to match the released v2 data; added table to appendix comparing stats of Ego-Exo4D alongside other datasets

  20. RELIC: Investigating Large Language Model Responses using Self-Consistency

    Authors: Furui Cheng, Vilém Zouhar, Simran Arora, Mrinmaya Sachan, Hendrik Strobelt, Mennatallah El-Assady

    Abstract: Large Language Models (LLMs) are notorious for blending fact with fiction and generating non-factual content, known as hallucinations. To address this challenge, we propose an interactive system that helps users gain insight into the reliability of the generated text. Our approach is based on the idea that the self-consistency of multiple samples generated by the same LLM relates to its confidence… ▽ More

    Submitted 4 April, 2024; v1 submitted 28 November, 2023; originally announced November 2023.

  21. arXiv:2311.14381  [pdf

    cs.CY cs.AI

    Potential Societal Biases of ChatGPT in Higher Education: A Scoping Review

    Authors: Ming Li, Ariunaa Enkhtur, Beverley Anne Yamamoto, Fei Cheng

    Abstract: Purpose:Generative Artificial Intelligence (GAI) models, such as ChatGPT, may inherit or amplify societal biases due to their training on extensive datasets. With the increasing usage of GAI by students, faculty, and staff in higher education institutions (HEIs), it is urgent to examine the ethical issues and potential biases associated with these technologies. Design/Approach/Methods:This scoping… ▽ More

    Submitted 22 June, 2024; v1 submitted 24 November, 2023; originally announced November 2023.

    Comments: Work in progress

  22. arXiv:2311.14378  [pdf

    cs.AI cs.CY

    Ethical Implications of ChatGPT in Higher Education: A Scoping Review

    Authors: Ming Li, Ariunaa Enkhtur, Fei Cheng, Beverley Anne Yamamoto

    Abstract: This scoping review explores the ethical challenges of using ChatGPT in higher education. By reviewing recent academic articles in English, Chinese, and Japanese, we aimed to provide a deep dive review and identify gaps in the literature. Drawing on Arksey and O'Malley's (2005) scoping review framework, we defined search terms and identified relevant publications from four databases in the three t… ▽ More

    Submitted 5 June, 2024; v1 submitted 24 November, 2023; originally announced November 2023.

    Comments: Accepted by Journal of Interdisciplinary Studies in Education

    Journal ref: Volume 13, Issue 1, 2024, pp. 55-68

  23. arXiv:2310.20236  [pdf, other

    cs.CL

    Dynamically Updating Event Representations for Temporal Relation Classification with Multi-category Learning

    Authors: Fei Cheng, Masayuki Asahara, Ichiro Kobayashi, Sadao Kurohashi

    Abstract: Temporal relation classification is a pair-wise task for identifying the relation of a temporal link (TLINK) between two mentions, i.e. event, time, and document creation time (DCT). It leads to two crucial limits: 1) Two TLINKs involving a common mention do not share information. 2) Existing models with independent classifiers for each TLINK category (E2E, E2T, and E2D) hinder from using the whol… ▽ More

    Submitted 31 October, 2023; originally announced October 2023.

    Comments: EMNLP 2020 Findings

  24. arXiv:2310.12182  [pdf, other

    cs.AR

    Block-Wise Mixed-Precision Quantization: Enabling High Efficiency for Practical ReRAM-based DNN Accelerators

    Authors: Xueying Wu, Edward Hanson, Nansu Wang, Qilin Zheng, Xiaoxuan Yang, Huanrui Yang, Shiyu Li, Feng Cheng, Partha Pratim Pande, Janardhan Rao Doppa, Krishnendu Chakrabarty, Hai Li

    Abstract: Resistive random access memory (ReRAM)-based processing-in-memory (PIM) architectures have demonstrated great potential to accelerate Deep Neural Network (DNN) training/inference. However, the computational accuracy of analog PIM is compromised due to the non-idealities, such as the conductance variation of ReRAM cells. The impact of these non-idealities worsens as the number of concurrently activ… ▽ More

    Submitted 27 October, 2023; v1 submitted 17 October, 2023; originally announced October 2023.

    Comments: 12 pages, 13 figures

  25. arXiv:2310.09426  [pdf, other

    cs.LG stat.ML

    Offline Reinforcement Learning for Optimizing Production Bidding Policies

    Authors: Dmytro Korenkevych, Frank Cheng, Artsiom Balakir, Alex Nikulkov, Lingnan Gao, Zhihao Cen, Zuobing Xu, Zheqing Zhu

    Abstract: The online advertising market, with its thousands of auctions run per second, presents a daunting challenge for advertisers who wish to optimize their spend under a budget constraint. Thus, advertising platforms typically provide automated agents to their customers, which act on their behalf to bid for impression opportunities in real time at scale. Because these proxy agents are owned by the plat… ▽ More

    Submitted 13 October, 2023; originally announced October 2023.

  26. arXiv:2310.03328  [pdf, other

    cs.CL

    Reformulating Domain Adaptation of Large Language Models as Adapt-Retrieve-Revise

    Authors: Zhen wan, Yating Zhang, Yexiang Wang, Fei Cheng, Sadao Kurohashi

    Abstract: While large language models (LLMs) like GPT-4 have recently demonstrated astonishing zero-shot capabilities in general domain tasks, they often generate content with hallucinations in specific domains such as Chinese law, hindering their application in these areas. This is typically due to the absence of training data that encompasses such a specific domain, preventing GPT-4 from acquiring in-doma… ▽ More

    Submitted 12 October, 2023; v1 submitted 5 October, 2023; originally announced October 2023.

    Comments: Under submission to ICLR 2024

  27. arXiv:2309.10091  [pdf, other

    cs.CV cs.AI cs.CL cs.LG

    Unified Coarse-to-Fine Alignment for Video-Text Retrieval

    Authors: Ziyang Wang, Yi-Lin Sung, Feng Cheng, Gedas Bertasius, Mohit Bansal

    Abstract: The canonical approach to video-text retrieval leverages a coarse-grained or fine-grained alignment between visual and textual information. However, retrieving the correct video according to the text query is often challenging as it requires the ability to reason about both high-level (scene) and low-level (object) visual clues and how they relate to the text query. To this end, we propose a Unifi… ▽ More

    Submitted 18 September, 2023; originally announced September 2023.

    Comments: ICCV 2023

  28. arXiv:2307.12199  [pdf, other

    cs.HC

    Leveraging Historical Medical Records as a Proxy via Multimodal Modeling and Visualization to Enrich Medical Diagnostic Learning

    Authors: Yang Ouyang, Yuchen Wu, He Wang, Chenyang Zhang, Furui Cheng, Chang Jiang, Lixia Jin, Yuanwu Cao, Quan Li

    Abstract: Simulation-based Medical Education (SBME) has been developed as a cost-effective means of enhancing the diagnostic skills of novice physicians and interns, thereby mitigating the need for resource-intensive mentor-apprentice training. However, feedback provided in most SBME is often directed towards improving the operational proficiency of learners, rather than providing summative medical diagnose… ▽ More

    Submitted 22 July, 2023; originally announced July 2023.

    Comments: Accepted by IEEE VIS 2023

  29. arXiv:2306.11251  [pdf, other

    cs.CV

    Eliminating Lipschitz Singularities in Diffusion Models

    Authors: Zhantao Yang, Ruili Feng, Han Zhang, Yujun Shen, Kai Zhu, Lianghua Huang, Yifei Zhang, Yu Liu, Deli Zhao, Jingren Zhou, Fan Cheng

    Abstract: Diffusion models, which employ stochastic differential equations to sample images through integrals, have emerged as a dominant class of generative models. However, the rationality of the diffusion process itself receives limited attention, leaving the question of whether the problem is well-posed and well-conditioned. In this paper, we uncover a vexing propensity of diffusion models: they frequen… ▽ More

    Submitted 19 June, 2023; originally announced June 2023.

  30. arXiv:2306.09719  [pdf, other

    cs.CL cs.AI

    Pushing the Limits of ChatGPT on NLP Tasks

    Authors: Xiaofei Sun, Linfeng Dong, Xiaoya Li, Zhen Wan, Shuhe Wang, Tianwei Zhang, Jiwei Li, Fei Cheng, Lingjuan Lyu, Fei Wu, Guoyin Wang

    Abstract: Despite the success of ChatGPT, its performances on most NLP tasks are still well below the supervised baselines. In this work, we looked into the causes, and discovered that its subpar performance was caused by the following factors: (1) token limit in the prompt does not allow for the full utilization of the supervised datasets; (2) mismatch between the generation nature of ChatGPT and NLP tasks… ▽ More

    Submitted 9 October, 2023; v1 submitted 16 June, 2023; originally announced June 2023.

  31. arXiv:2305.16896  [pdf, other

    cs.CL cs.AI cs.LG

    MultiTool-CoT: GPT-3 Can Use Multiple External Tools with Chain of Thought Prompting

    Authors: Tatsuro Inaba, Hirokazu Kiyomaru, Fei Cheng, Sadao Kurohashi

    Abstract: Large language models (LLMs) have achieved impressive performance on various reasoning tasks. To further improve the performance, we propose MultiTool-CoT, a novel framework that leverages chain-of-thought (CoT) prompting to incorporate multiple external tools, such as a calculator and a knowledge retriever, during the reasoning process. We apply MultiTool-CoT to the Task 2 dataset of NumGLUE, whi… ▽ More

    Submitted 26 May, 2023; originally announced May 2023.

    Comments: ACL2023. Our code is available at https://github.com/InabaTatsuro/MultiTool-CoT

  32. arXiv:2305.07475  [pdf, other

    cs.CL

    Comprehensive Solution Program Centric Pretraining for Table-and-Text Hybrid Numerical Reasoning

    Authors: Qianying Liu, Dongsheng Yang, Wenjie Zhong, Fei Cheng, Sadao Kurohashi

    Abstract: Numerical reasoning over table-and-text hybrid passages, such as financial reports, poses significant challenges and has numerous potential applications. Noise and irrelevant variables in the model input have been a hindrance to its performance. Additionally, coarse-grained supervision of the whole solution program has impeded the model's ability to learn the underlying numerical reasoning process… ▽ More

    Submitted 12 May, 2023; originally announced May 2023.

    Comments: 11 pages

  33. arXiv:2305.02105  [pdf, other

    cs.CL

    GPT-RE: In-context Learning for Relation Extraction using Large Language Models

    Authors: Zhen Wan, Fei Cheng, Zhuoyuan Mao, Qianying Liu, Haiyue Song, Jiwei Li, Sadao Kurohashi

    Abstract: In spite of the potential for ground-breaking achievements offered by large language models (LLMs) (e.g., GPT-3), they still lag significantly behind fully-supervised baselines (e.g., fine-tuned BERT) in relation extraction (RE). This is due to the two major shortcomings of LLMs in RE: (1) low relevance regarding entity and relation in retrieved demonstrations for in-context learning; and (2) the… ▽ More

    Submitted 8 December, 2023; v1 submitted 3 May, 2023; originally announced May 2023.

    Comments: Accepted by EMNLP 2023 Main Conference (long paper)

  34. arXiv:2304.03928  [pdf

    cs.LG stat.AP

    Interpretable machine learning-accelerated seed treatment by nanomaterials for environmental stress alleviation

    Authors: Hengjie Yu, Dan Luo, Sam F. Y. Li, Maozhen Qu, Da Liu, Yingchao He, Fang Cheng

    Abstract: Crops are constantly challenged by different environmental conditions. Seed treatment by nanomaterials is a cost-effective and environmentally-friendly solution for environmental stress mitigation in crop plants. Here, 56 seed nanopriming treatments are used to alleviate environmental stresses in maize. Seven selected nanopriming treatments significantly increase the stress resistance index (SRI)… ▽ More

    Submitted 8 April, 2023; originally announced April 2023.

    Comments: 30 pages, 6 figures

  35. arXiv:2303.10318  [pdf, other

    cs.CV

    Crowd Counting with Online Knowledge Learning

    Authors: Shengqin Jiang, Bowen Li, Fengna Cheng, Qingshan Liu

    Abstract: Efficient crowd counting models are urgently required for the applications in scenarios with limited computing resources, such as edge computing and mobile devices. A straightforward method to achieve this is knowledge distillation (KD), which involves using a trained teacher network to guide the training of a student network. However, this traditional two-phase training method can be time-consumi… ▽ More

    Submitted 17 March, 2023; originally announced March 2023.

    Comments: under review

  36. HOOV: Hand Out-Of-View Tracking for Proprioceptive Interaction using Inertial Sensing

    Authors: Paul Streli, Rayan Armani, Yi Fei Cheng, Christian Holz

    Abstract: Current Virtual Reality systems are designed for interaction under visual control. Using built-in cameras, headsets track the user's hands or hand-held controllers while they are inside the field of view. Current systems thus ignore the user's interaction with off-screen content -- virtual objects that the user could quickly access through proprioception without requiring laborious head motions to… ▽ More

    Submitted 30 April, 2023; v1 submitted 13 March, 2023; originally announced March 2023.

    Comments: Accepted at 2023 CHI Conference on Human Factors in Computing Systems

    ACM Class: I.2; I.5; H.5

  37. arXiv:2301.13337   

    cs.CV

    DAFD: Domain Adaptation via Feature Disentanglement for Image Classification

    Authors: Zhize Wu, Changjiang Du, Le Zou, Ming Tan, Tong Xu, Fan Cheng, Fudong Nian, Thomas Weise

    Abstract: A good feature representation is the key to image classification. In practice, image classifiers may be applied in scenarios different from what they have been trained on. This so-called domain shift leads to a significant performance drop in image classification. Unsupervised domain adaptation (UDA) reduces the domain shift by transferring the knowledge learned from a labeled source domain to an… ▽ More

    Submitted 9 January, 2024; v1 submitted 30 January, 2023; originally announced January 2023.

    Comments: Update the experimental results

  38. arXiv:2301.08237  [pdf, other

    cs.CV

    LoCoNet: Long-Short Context Network for Active Speaker Detection

    Authors: Xizi Wang, Feng Cheng, Gedas Bertasius, David Crandall

    Abstract: Active Speaker Detection (ASD) aims to identify who is speaking in each frame of a video. ASD reasons from audio and visual information from two contexts: long-term intra-speaker context and short-term inter-speaker context. Long-term intra-speaker context models the temporal dependencies of the same speaker, while short-term inter-speaker context models the interactions of speakers in the same sc… ▽ More

    Submitted 29 March, 2024; v1 submitted 19 January, 2023; originally announced January 2023.

    Comments: accepted by CVPR 2024

  39. arXiv:2212.14193  [pdf, other

    cs.CV

    A Unified Object Counting Network with Object Occupation Prior

    Authors: Shengqin Jiang, Qing Wang, Fengna Cheng, Yuankai Qi, Qingshan Liu

    Abstract: The counting task, which plays a fundamental role in numerous applications (e.g., crowd counting, traffic statistics), aims to predict the number of objects with various densities. Existing object counting tasks are designed for a single object class. However, it is inevitable to encounter newly coming data with new classes in our real world. We name this scenario as \textit{evolving object counti… ▽ More

    Submitted 30 June, 2023; v1 submitted 29 December, 2022; originally announced December 2022.

    Comments: Accepted by IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY; The dataset and code are available at: https://github.com/Tanyjiang/EOCO

  40. arXiv:2212.05051  [pdf, other

    cs.CV

    VindLU: A Recipe for Effective Video-and-Language Pretraining

    Authors: Feng Cheng, Xizi Wang, Jie Lei, David Crandall, Mohit Bansal, Gedas Bertasius

    Abstract: The last several years have witnessed remarkable progress in video-and-language (VidL) understanding. However, most modern VidL approaches use complex and specialized model architectures and sophisticated pretraining protocols, making the reproducibility, analysis and comparisons of these frameworks difficult. Hence, instead of proposing yet another new VidL model, this paper conducts a thorough e… ▽ More

    Submitted 5 April, 2023; v1 submitted 9 December, 2022; originally announced December 2022.

    Comments: CVPR 2023. Project page: https://klauscc.github.io/vindlu.html

  41. arXiv:2211.16032  [pdf, other

    cs.LG cs.CV

    Dimensionality-Varying Diffusion Process

    Authors: Han Zhang, Ruili Feng, Zhantao Yang, Lianghua Huang, Yu Liu, Yifei Zhang, Yujun Shen, Deli Zhao, Jingren Zhou, Fan Cheng

    Abstract: Diffusion models, which learn to reverse a signal destruction process to generate new data, typically require the signal at each step to have the same dimension. We argue that, considering the spatial redundancy in image signals, there is no need to maintain a high dimensionality in the evolution process, especially in the early generation phase. To this end, we make a theoretical generalization o… ▽ More

    Submitted 29 November, 2022; originally announced November 2022.

  42. arXiv:2211.16022  [pdf, other

    cs.CL

    Textual Enhanced Contrastive Learning for Solving Math Word Problems

    Authors: Yibin Shen, Qianying Liu, Zhuoyuan Mao, Fei Cheng, Sadao Kurohashi

    Abstract: Solving math word problems is the task that analyses the relation of quantities and requires an accurate understanding of contextual natural language information. Recent studies show that current models rely on shallow heuristics to predict solutions and could be easily misled by small textual perturbations. To address this problem, we propose a Textual Enhanced Contrastive Learning framework, whi… ▽ More

    Submitted 29 November, 2022; originally announced November 2022.

    Comments: Findings of EMNLP 2022

  43. arXiv:2211.07466  [pdf, other

    cs.NI cs.LG

    Reinforcement Learning Based Resource Allocation for Network Slices in O-RAN Midhaul

    Authors: Nien Fang Cheng, Turgay Pamuklu, Melike Erol-Kantarci

    Abstract: Network slicing envisions the 5th generation (5G) mobile network resource allocation to be based on different requirements for different services, such as Ultra-Reliable Low Latency Communication (URLLC) and Enhanced Mobile Broadband (eMBB). Open Radio Access Network (O-RAN), proposes an open and disaggregated concept of RAN by modulizing the functionalities into independent components. Network sl… ▽ More

    Submitted 14 November, 2022; originally announced November 2022.

    Comments: Accepted Paper for IEEE CCNC 2023

  44. arXiv:2210.11800  [pdf, other

    cs.CL

    Rescue Implicit and Long-tail Cases: Nearest Neighbor Relation Extraction

    Authors: Zhen Wan, Qianying Liu, Zhuoyuan Mao, Fei Cheng, Sadao Kurohashi, Jiwei Li

    Abstract: Relation extraction (RE) has achieved remarkable progress with the help of pre-trained language models. However, existing RE models are usually incapable of handling two situations: implicit expressions and long-tail relation types, caused by language complexity and data sparsity. In this paper, we introduce a simple enhancement of RE using $k$ nearest neighbors ($k$NN-RE). $k$NN-RE allows the mod… ▽ More

    Submitted 30 January, 2023; v1 submitted 21 October, 2022; originally announced October 2022.

    Comments: EMNLP 2022 (short paper)

  45. arXiv:2210.10421  [pdf

    cs.CV cs.LG

    Multi-view Gait Recognition based on Siamese Vision Transformer

    Authors: Yanchen Yang, Lijun Yun, Ruoyu Li, Feiyan Cheng

    Abstract: While the Vision Transformer has been used in gait recognition, its application in multi-view gait recognition is still limited. Different views significantly affect the extraction and identification accuracy of the characteristics of gait contour. To address this, this paper proposes a Siamese Mobile Vision Transformer (SMViT). This model not only focuses on the local characteristics of the human… ▽ More

    Submitted 19 October, 2022; originally announced October 2022.

    Comments: 13 pages,9 figures,1 table

  46. arXiv:2210.07017  [pdf, other

    cs.CL

    ComSearch: Equation Searching with Combinatorial Strategy for Solving Math Word Problems with Weak Supervision

    Authors: Qianying Liu, Wenyu Guan, Jianhao Shen, Fei Cheng, Sadao Kurohashi

    Abstract: Previous studies have introduced a weakly-supervised paradigm for solving math word problems requiring only the answer value annotation. While these methods search for correct value equation candidates as pseudo labels, they search among a narrow sub-space of the enormous equation space. To address this problem, we propose a novel search algorithm with combinatorial strategy \textbf{ComSearch}, wh… ▽ More

    Submitted 7 March, 2023; v1 submitted 13 October, 2022; originally announced October 2022.

    Comments: EACL 2023 long paper, 14 pages

  47. arXiv:2209.10310  [pdf, other

    cs.CL

    Seeking Diverse Reasoning Logic: Controlled Equation Expression Generation for Solving Math Word Problems

    Authors: Yibin Shen, Qianying Liu, Zhuoyuan Mao, Zhen Wan, Fei Cheng, Sadao Kurohashi

    Abstract: To solve Math Word Problems, human students leverage diverse reasoning logic that reaches different possible equation solutions. However, the mainstream sequence-to-sequence approach of automatic solvers aims to decode a fixed solution equation supervised by human annotation. In this paper, we propose a controlled equation generation solver by leveraging a set of control codes to guide the model t… ▽ More

    Submitted 29 November, 2022; v1 submitted 21 September, 2022; originally announced September 2022.

    Comments: AACL 2022 short paper

  48. arXiv:2208.13108  [pdf, ps, other

    cs.IT math-ph math.AG

    A Reformulation of Gaussian Completely Monotone Conjecture: A Hodge Structure on the Fisher Information along Heat Flow

    Authors: Fan Cheng

    Abstract: In the past decade, J. Huh solved several long-standing open problems on log-concave sequences in combinatorics. The ground-breaking techniques developed in those work are from algebraic geometry: "We believe that behind any log-concave sequence that appears in nature there is such a Hodge structure responsible for the log-concavity". A function is called completely monotone if its derivatives a… ▽ More

    Submitted 27 August, 2022; originally announced August 2022.

  49. ShortcutLens: A Visual Analytics Approach for Exploring Shortcuts in Natural Language Understanding Dataset

    Authors: Zhihua Jin, Xingbo Wang, Furui Cheng, Chunhui Sun, Qun Liu, Huamin Qu

    Abstract: Benchmark datasets play an important role in evaluating Natural Language Understanding (NLU) models. However, shortcuts -- unwanted biases in the benchmark datasets -- can damage the effectiveness of benchmark datasets in revealing models' real capabilities. Since shortcuts vary in coverage, productivity, and semantic meaning, it is challenging for NLU experts to systematically understand and avoi… ▽ More

    Submitted 16 August, 2022; originally announced August 2022.

    Comments: 15 pages, 6 figures

  50. arXiv:2206.13042  [pdf, other

    cs.CV eess.IV

    A Strategy Optimized Pix2pix Approach for SAR-to-Optical Image Translation Task

    Authors: Fujian Cheng, Yashu Kang, Chunlei Chen, Kezhao Jiang

    Abstract: This technical report summarizes the analysis and approach on the image-to-image translation task in the Multimodal Learning for Earth and Environment Challenge (MultiEarth 2022). In terms of strategy optimization, cloud classification is utilized to filter optical images with dense cloud coverage to aid the supervised learning alike approach. The commonly used pix2pix framework with a few optimiz… ▽ More

    Submitted 4 July, 2022; v1 submitted 27 June, 2022; originally announced June 2022.