Showing 1–50 of 1,124 results for author: Lin, J

Searching in archive cs.
  1. arXiv:2407.12579  [pdf, other]

    cs.CV cs.AI

    The Fabrication of Reality and Fantasy: Scene Generation with LLM-Assisted Prompt Interpretation

    Authors: Yi Yao, Chan-Feng Hsu, Jhe-Hao Lin, Hongxia Xie, Terence Lin, Yi-Ning Huang, Hong-Han Shuai, Wen-Huang Cheng

    Abstract: In spite of recent advancements in text-to-image generation, limitations persist in handling complex and imaginative prompts due to the restricted diversity and complexity of training data. This work explores how diffusion models can generate images from prompts requiring artistic creativity or specialized knowledge. We introduce the Realistic-Fantasy Benchmark (RFBench), a novel evaluation framew…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  2. arXiv:2407.11578  [pdf, other]

    cs.CV eess.IV

    UP-Diff: Latent Diffusion Model for Remote Sensing Urban Prediction

    Authors: Zeyu Wang, Zecheng Hao, Jingyu Lin, Yuchao Feng, Yufei Guo

    Abstract: This study introduces a novel Remote Sensing (RS) Urban Prediction (UP) task focused on future urban planning, which aims to forecast urban layouts by utilizing information from existing urban layouts and planned change maps. To address the proposed RS UP task, we propose UP-Diff, which leverages a Latent Diffusion Model (LDM) to capture position-aware embeddings of pre-change urban layouts and pla…

    Submitted 16 July, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

    Comments: 5 pages, 4 figures

  3. arXiv:2407.11477  [pdf, other]

    cs.LG cs.AI

    XTraffic: A Dataset Where Traffic Meets Incidents with Explainability and More

    Authors: Xiaochuan Gou, Ziyue Li, Tian Lan, Junpeng Lin, Zhishuai Li, Bingyu Zhao, Chen Zhang, Di Wang, Xiangliang Zhang

    Abstract: Long-separated research has been conducted on two highly correlated tracks: traffic and incidents. Traffic track witnesses complicating deep learning models, e.g., to push the prediction a few percent more accurate, and the incident track only studies the incidents alone, e.g., to infer the incident risk. We, for the first time, spatiotemporally aligned the two tracks in a large-scale region (16,9…

    Submitted 16 July, 2024; originally announced July 2024.

  4. arXiv:2407.11007  [pdf, other]

    cs.CL cs.AI

    Panacea: A foundation model for clinical trial search, summarization, design, and recruitment

    Authors: Jiacheng Lin, Hanwen Xu, Zifeng Wang, Sheng Wang, Jimeng Sun

    Abstract: Clinical trials are fundamental in developing new drugs, medical devices, and treatments. However, they are often time-consuming and have low success rates. Although there have been initial attempts to create large language models (LLMs) for clinical trial design and patient-trial matching, these models remain task-specific and not adaptable to diverse clinical trial tasks. To address this challen…

    Submitted 25 June, 2024; originally announced July 2024.

  5. arXiv:2407.10916  [pdf, other]

    cs.LG cs.SI

    When Heterophily Meets Heterogeneity: New Graph Benchmarks and Effective Methods

    Authors: Junhong Lin, Xiaojie Guo, Shuaicheng Zhang, Dawei Zhou, Yada Zhu, Julian Shun

    Abstract: Many real-world graphs frequently present challenges for graph learning due to the presence of both heterophily and heterogeneity. However, existing benchmarks for graph learning often focus on heterogeneous graphs with homophily or homogeneous graphs with heterophily, leaving a gap in understanding how methods perform on graphs that are both heterogeneous and heterophilic. To bridge this gap, we…

    Submitted 15 July, 2024; originally announced July 2024.

  6. arXiv:2407.10759  [pdf, other]

    eess.AS cs.CL cs.LG

    Qwen2-Audio Technical Report

    Authors: Yunfei Chu, Jin Xu, Qian Yang, Haojie Wei, Xipin Wei, Zhifang Guo, Yichong Leng, Yuanjun Lv, Jinzheng He, Junyang Lin, Chang Zhou, Jingren Zhou

    Abstract: We introduce the latest progress of Qwen-Audio, a large-scale audio-language model called Qwen2-Audio, which is capable of accepting various audio signal inputs and performing audio analysis or direct textual responses with regard to speech instructions. In contrast to complex hierarchical tags, we have simplified the pre-training process by utilizing natural language prompts for different data an…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: https://github.com/QwenLM/Qwen2-Audio. Checkpoints, code, and scripts will be open-sourced soon

  7. arXiv:2407.10671  [pdf, other]

    cs.CL cs.AI

    Qwen2 Technical Report

    Authors: An Yang, Baosong Yang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Zhou, Chengpeng Li, Chengyuan Li, Dayiheng Liu, Fei Huang, Guanting Dong, Haoran Wei, Huan Lin, Jialong Tang, Jialin Wang, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Ma, Jianxin Yang, Jin Xu, Jingren Zhou, Jinze Bai, Jinzheng He, Junyang Lin, et al. (37 additional authors not shown)

    Abstract: This report introduces the Qwen2 series, the latest addition to our large language models and large multimodal models. We release a comprehensive suite of foundational and instruction-tuned language models, encompassing a parameter range from 0.5 to 72 billion, featuring dense models and a Mixture-of-Experts model. Qwen2 surpasses most prior open-weight models, including its predecessor Qwen1.5, a…

    Submitted 17 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

    Comments: 25 pages, 1 figure

  8. arXiv:2407.09652  [pdf, other]

    cs.CL

    How Chinese are Chinese Language Models? The Puzzling Lack of Language Policy in China's LLMs

    Authors: Andrea W Wen-Yi, Unso Eun Seo Jo, Lu Jia Lin, David Mimno

    Abstract: Contemporary language models are increasingly multilingual, but Chinese LLM developers must navigate complex political and business considerations of language diversity. Language policy in China aims at influencing the public discourse and governing a multi-ethnic society, and has gradually transitioned from a pluralist to a more assimilationist approach since 1949. We explore the impact of these…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: Wen-Yi and Jo contributed equally to this work

  9. ecVoice: Audio Text Extraction and Optimization of Video Based on Idioms Similarity Replacement

    Authors: Jinwei Lin

    Abstract: Extracting text from the audio of a video plays an important role in multimedia editing and processing. As a popular open source toolkit, Whisper performs fast in human voice recognition. However, the recognition performance is dependent on the computing resource, which makes running Whisper with limited computing memory difficult. Our paper presents an available solution to extract the hu…

    Submitted 20 May, 2024; originally announced July 2024.

    Comments: APSIPA ASC 2023

  10. arXiv:2407.09484  [pdf]

    cs.HC cs.CY

    GPTutor: Great Personalized Tutor with Large Language Models for Personalized Learning Content Generation

    Authors: Eason Chen, Jia-En Lee, Jionghao Lin, Kenneth Koedinger

    Abstract: We developed GPTutor, a pioneering web application designed to revolutionize personalized learning by leveraging the capabilities of Generative AI at scale. GPTutor adapts educational content and practice exercises to align with individual students' interests and career goals, enhancing their engagement and understanding of critical academic concepts. The system uses a serverless architecture to d…

    Submitted 16 May, 2024; originally announced July 2024.

  11. Systematic Evaluation of Neural Retrieval Models on the Touché 2020 Argument Retrieval Subset of BEIR

    Authors: Nandan Thakur, Luiz Bonifacio, Maik Fröbe, Alexander Bondarenko, Ehsan Kamalloo, Martin Potthast, Matthias Hagen, Jimmy Lin

    Abstract: The zero-shot effectiveness of neural retrieval models is often evaluated on the BEIR benchmark -- a combination of different IR evaluation datasets. Interestingly, previous studies found that particularly on the BEIR subset Touché 2020, an argument retrieval task, neural retrieval models are considerably less effective than BM25. Still, so far, no further investigation has been conducted on what…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: SIGIR 2024 (Resource & Reproducibility Track)

  12. arXiv:2407.07397  [pdf, other]

    cs.SD eess.AS

    SimuSOE: A Simulated Snoring Dataset for Obstructive Sleep Apnea-Hypopnea Syndrome Evaluation during Wakefulness

    Authors: Jie Lin, Xiuping Yang, Li Xiao, Xinhong Li, Weiyan Yi, Yuhong Yang, Weiping Tu, Xiong Chen

    Abstract: Obstructive Sleep Apnea-Hypopnea Syndrome (OSAHS) is a prevalent chronic breathing disorder caused by upper airway obstruction. Previous studies advanced OSAHS evaluation through machine learning-based systems trained on sleep snoring or speech signal datasets. However, constructing datasets for training a precise and rapid OSAHS evaluation system poses a challenge, since 1) it is time-consuming t…

    Submitted 10 July, 2024; originally announced July 2024.

  13. arXiv:2407.06886  [pdf, other]

    cs.CV cs.AI cs.LG cs.MA cs.RO

    Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI

    Authors: Yang Liu, Weixing Chen, Yongjie Bai, Jingzhou Luo, Xinshuai Song, Kaixuan Jiang, Zhida Li, Ganlong Zhao, Junyi Lin, Guanbin Li, Wen Gao, Liang Lin

    Abstract: Embodied Artificial Intelligence (Embodied AI) is crucial for achieving Artificial General Intelligence (AGI) and serves as a foundation for various applications that bridge cyberspace and the physical world. Recently, the emergence of Multi-modal Large Models (MLMs) and World Models (WMs) has attracted significant attention due to their remarkable perception, interaction, and reasoning capabilit…

    Submitted 18 July, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

    Comments: The first comprehensive review of Embodied AI in the era of MLMs, 37 pages. We also provide the paper list for Embodied AI: https://github.com/HCPLab-SYSU/Embodied_AI_Paper_List

  14. arXiv:2407.05652  [pdf, other]

    cs.SE

    StmtTree: An Easy-to-Use yet Versatile Fortran Transformation Toolkit

    Authors: Jingbo Lin, Yi Yu, Zhang Yang, Yafan Zhao

    Abstract: The Fortran programming language continues to dominate the scientific computing community, with many production codes written in the outdated Fortran-77 dialect, yet with many non-standard extensions such as Cray pointers. This creates significant maintenance burden within the community, with tremendous efforts devoted to modernization. However, despite the modern age of advanced compiler framework…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: 10 pages, 2 tables, 1 figure, submitted to ICSME 2024

  15. arXiv:2407.04960  [pdf, other]

    cs.IR

    MemoCRS: Memory-enhanced Sequential Conversational Recommender Systems with Large Language Models

    Authors: Yunjia Xi, Weiwen Liu, Jianghao Lin, Bo Chen, Ruiming Tang, Weinan Zhang, Yong Yu

    Abstract: Conversational recommender systems (CRSs) aim to capture user preferences and provide personalized recommendations through multi-round natural language dialogues. However, most existing CRS models mainly focus on dialogue comprehension and preferences mining from the current dialogue session, overlooking user preferences in historical dialogue sessions. The preferences embedded in the user's histo…

    Submitted 6 July, 2024; originally announced July 2024.

  16. arXiv:2407.04925  [pdf, other]

    cs.IR cs.AI cs.HC

    RAMO: Retrieval-Augmented Generation for Enhancing MOOCs Recommendations

    Authors: Jiarui Rao, Jionghao Lin

    Abstract: Massive Open Online Courses (MOOCs) have significantly enhanced educational accessibility by offering a wide variety of courses and breaking down traditional barriers related to geography, finance, and time. However, students often face difficulties navigating the vast selection of courses, especially when exploring new fields of study. Driven by this challenge, researchers have been exploring cou…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: 7 pages, this paper underwent a rigorous review process and was officially accepted on May 31, 2024, for presentation at the Educational Data Mining 2024 Workshop: Leveraging Large Language Models for Next Generation Educational Technologies

  17. arXiv:2407.03535  [pdf, other]

    cs.CV

    BVI-RLV: A Fully Registered Dataset and Benchmarks for Low-Light Video Enhancement

    Authors: Ruirui Lin, Nantheera Anantrasirichai, Guoxi Huang, Joanne Lin, Qi Sun, Alexandra Malyugina, David R Bull

    Abstract: Low-light videos often exhibit spatiotemporal incoherent noise, compromising visibility and performance in computer vision applications. One significant challenge in enhancing such content using deep learning is the scarcity of training data. This paper introduces a novel low-light video dataset, consisting of 40 scenes with various motion scenarios under two distinct low-lighting conditions, inco…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2402.01970

  18. arXiv:2407.02818  [pdf, other]

    cs.SE cs.ET cs.PL

    WizardMerge -- Save Us From Merging Without Any Clues

    Authors: Qingyu Zhang, Junzhe Li, Jiayi Lin, Jie Ding, Lanteng Lin, Chenxiong Qian

    Abstract: Modern software development necessitates efficient version-oriented collaboration among developers. While Git is the most popular version control system, it generates unsatisfactory version merging results due to textual-based workflow, leading to potentially unexpected results in the merged version of the project. Although numerous merging tools have been proposed for improving merge results, dev…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: 22 pages

    ACM Class: D.2; D.3

  19. arXiv:2407.02068  [pdf, other]

    cs.CV

    LPViT: Low-Power Semi-structured Pruning for Vision Transformers

    Authors: Kaixin Xu, Zhe Wang, Chunyun Chen, Xue Geng, Jie Lin, Xulei Yang, Min Wu, Xiaoli Li, Weisi Lin

    Abstract: Vision transformers have emerged as a promising alternative to convolutional neural networks for various image analysis tasks, offering comparable or superior performance. However, one significant drawback of ViTs is their resource-intensive nature, leading to increased memory footprint, computation complexity, and power consumption. To democratize this high-performance technology and make it more…

    Submitted 12 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

  20. arXiv:2407.01245  [pdf, other]

    cs.AI cs.CY

    SINKT: A Structure-Aware Inductive Knowledge Tracing Model with Large Language Model

    Authors: Lingyue Fu, Hao Guan, Kounianhua Du, Jianghao Lin, Wei Xia, Weinan Zhang, Ruiming Tang, Yasheng Wang, Yong Yu

    Abstract: Knowledge Tracing (KT) aims to determine whether students will respond correctly to the next question, which is a crucial task in intelligent tutoring systems (ITS). In educational KT scenarios, transductive ID-based methods often face severe data sparsity and cold start problems, where interactions between individual students and questions are sparse, and new questions and concepts consistently a…

    Submitted 1 July, 2024; originally announced July 2024.

  21. arXiv:2407.00118  [pdf, other]

    cs.LG cs.AI

    From Efficient Multimodal Models to World Models: A Survey

    Authors: Xinji Mai, Zeng Tao, Junxiong Lin, Haoran Wang, Yang Chang, Yanlan Kang, Yan Wang, Wenqiang Zhang

    Abstract: Multimodal Large Models (MLMs) are becoming a significant research focus, combining powerful large language models with multimodal learning to perform complex tasks across different data modalities. This review explores the latest developments and challenges in MLMs, emphasizing their potential in achieving artificial general intelligence and as a pathway to world models. We provide an overview of…

    Submitted 27 June, 2024; originally announced July 2024.

  22. arXiv:2407.00025  [pdf, other]

    cs.DC

    Anywhere: A Web Crawler Automation Management Interface

    Authors: Jinwei Lin

    Abstract: Web crawling projects or design is significant in the current information age. Using the web spider or crawler can automatically search and collect a huge amount of internet information. As one of the most popular web crawler frameworks, Scrapy is robust in abundant functions but weak in easy operation. In this paper, we provide a framework Anywhere, for optimising the usage feeling and improving…

    Submitted 10 May, 2024; originally announced July 2024.

    Comments: 8 pages

  23. arXiv:2406.19647  [pdf, other]

    cs.IR

    Doc2Token: Bridging Vocabulary Gap by Predicting Missing Tokens for E-commerce Search

    Authors: Kaihao Li, Juexin Lin, Tony Lee

    Abstract: Addressing the "vocabulary mismatch" issue in information retrieval is a central challenge for e-commerce search engines, because product pages often miss important keywords that customers search for. Doc2Query[1] is a popular document-expansion technique that predicts search queries for a document and includes the predicted queries with the document for retrieval. However, this approach can be in…

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: 9 pages, 1 figure, SIGIR 2024 Workshop on eCommerce

    ACM Class: H.3.3

  24. arXiv:2406.19394  [pdf, other]

    cs.CV

    HUWSOD: Holistic Self-training for Unified Weakly Supervised Object Detection

    Authors: Liujuan Cao, Jianghang Lin, Zebo Hong, Yunhang Shen, Shaohui Lin, Chao Chen, Rongrong Ji

    Abstract: Most WSOD methods rely on traditional object proposals to generate candidate regions and are confronted with unstable training, which easily gets stuck in a poor local optimum. In this paper, we introduce a unified, high-capacity weakly supervised object detection (WSOD) network called HUWSOD, which utilizes a comprehensive self-training framework without needing external modules or additional sup…

    Submitted 27 June, 2024; originally announced June 2024.

  25. arXiv:2406.18825  [pdf, other]

    cs.IR

    ELCoRec: Enhance Language Understanding with Co-Propagation of Numerical and Categorical Features for Recommendation

    Authors: Jizheng Chen, Kounianhua Du, Jianghao Lin, Bo Chen, Ruiming Tang, Weinan Zhang

    Abstract: Large language models have been flourishing in the natural language processing (NLP) domain, and their potential for recommendation has been paid much attention to. Despite the intelligence shown by the recommendation-oriented finetuned models, LLMs struggle to fully understand the user behavior patterns due to their innate weakness in interpreting numerical features and the overhead for long cont…

    Submitted 26 June, 2024; originally announced June 2024.

  26. arXiv:2406.18762  [pdf, other]

    cs.CL

    Categorical Syllogisms Revisited: A Review of the Logical Reasoning Abilities of LLMs for Analyzing Categorical Syllogism

    Authors: Shi Zong, Jimmy Lin

    Abstract: There have been a huge number of benchmarks proposed to evaluate how large language models (LLMs) behave for logic inference tasks. However, it remains an open question how to properly evaluate this ability. In this paper, we provide a systematic overview of prior works on the logical reasoning ability of LLMs for analyzing categorical syllogisms. We first investigate all the possible variations f…

    Submitted 26 June, 2024; originally announced June 2024.

  27. arXiv:2406.18197  [pdf, other]

    cs.CV

    Human-free Prompted Based Anomaly Detection: prompt optimization with Meta-guiding prompt scheme

    Authors: Pi-Wei Chen, Jerry Chun-Wei Lin, Jia Ji, Feng-Hao Yeh, Chao-Chun Chen

    Abstract: Pre-trained vision-language models (VLMs) are highly adaptable to various downstream tasks through few-shot learning, making prompt-based anomaly detection a promising approach. Traditional methods depend on human-crafted prompts that require prior knowledge of specific anomaly types. Our goal is to develop a human-free prompt-based anomaly detection framework that optimally learns prompts through…

    Submitted 26 June, 2024; originally announced June 2024.

  28. arXiv:2406.18019  [pdf, other]

    cs.RO

    Continuous Execution of High-Level Collaborative Tasks for Heterogeneous Robot Teams

    Authors: Amy Fang, Tenny Yin, Jiawei Lin, Hadas Kress-Gazit

    Abstract: We propose a control synthesis framework for a heterogeneous multi-robot system to satisfy collaborative tasks, where actions may take varying duration of time to complete. We encode tasks using the discrete logic LTL^ψ, which uses the concept of bindings to interleave robot actions and express information about relationship between specific task requirements and robot assignments. We present a sy…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Under review in IEEE Transactions on Robotics

  29. arXiv:2406.17413  [pdf, other]

    cs.CV

    Depth-Guided Semi-Supervised Instance Segmentation

    Authors: Xin Chen, Jie Hu, Xiawu Zheng, Jianghang Lin, Liujuan Cao, Rongrong Ji

    Abstract: Semi-Supervised Instance Segmentation (SSIS) aims to leverage an amount of unlabeled data during training. Previous frameworks primarily utilized the RGB information of unlabeled images to generate pseudo-labels. However, such a mechanism often introduces unstable noise, as a single instance can display multiple RGB values. To overcome this limitation, we introduce a Depth-Guided (DG) SSIS framewo…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 12 pages, 6 figures, 4 tables

  30. arXiv:2406.16828  [pdf, other]

    cs.IR cs.AI cs.CL

    Ragnarök: A Reusable RAG Framework and Baselines for TREC 2024 Retrieval-Augmented Generation Track

    Authors: Ronak Pradeep, Nandan Thakur, Sahel Sharifymoghaddam, Eric Zhang, Ryan Nguyen, Daniel Campos, Nick Craswell, Jimmy Lin

    Abstract: Did you try out the new Bing Search? Or maybe you fiddled around with Google AI Overviews? These might sound familiar because the modern-day search stack has recently evolved to include retrieval-augmented generation (RAG) systems. They allow searching and incorporating real-time data into large language models (LLMs) to provide a well-informed, attributed, concise summary in contrast to the tradi…

    Submitted 24 June, 2024; originally announced June 2024.

  31. arXiv:2406.16473  [pdf, other]

    cs.CV cs.AI

    Seeking Certainty In Uncertainty: Dual-Stage Unified Framework Solving Uncertainty in Dynamic Facial Expression Recognition

    Authors: Haoran Wang, Xinji Mai, Zeng Tao, Xuan Tong, Junxiong Lin, Yan Wang, Jiawen Yu, Boyang Wang, Shaoqi Yan, Qing Zhao, Ziheng Zhou, Shuyong Gao, Wenqiang Zhang

    Abstract: The contemporary state-of-the-art of Dynamic Facial Expression Recognition (DFER) technology facilitates remarkable progress by deriving emotional mappings of facial expressions from video content, underpinned by training on voluminous datasets. Yet, the DFER datasets encompass a substantial volume of noise data. Noise arises from low-quality captures that defy logical labeling, and instances that…

    Submitted 24 June, 2024; originally announced June 2024.

  32. arXiv:2406.16459  [pdf, other]

    cs.CV

    Suppressing Uncertainties in Degradation Estimation for Blind Super-Resolution

    Authors: Junxiong Lin, Zeng Tao, Xuan Tong, Xinji Mai, Haoran Wang, Boyang Wang, Yan Wang, Qing Zhao, Jiawen Yu, Yuxuan Lin, Shaoqi Yan, Shuyong Gao, Wenqiang Zhang

    Abstract: The problem of blind image super-resolution aims to recover high-resolution (HR) images from low-resolution (LR) images with unknown degradation modes. Most existing methods model the image degradation process using blur kernels. However, this explicit modeling approach struggles to cover the complex and varied degradation processes encountered in the real world, such as high-order combinations of…

    Submitted 24 June, 2024; originally announced June 2024.

  33. arXiv:2406.15396  [pdf, other]

    cs.CV cs.AI cs.LG

    Feature Purified Transformer With Cross-level Feature Guiding Decoder For Multi-class OOD and Anomaly Deteciton

    Authors: Jerry Chun-Wei Lin, Pi-Wei Chen, Chao-Chun Chen

    Abstract: Reconstruction networks are prevalently used in unsupervised anomaly and Out-of-Distribution (OOD) detection due to their independence from labeled anomaly data. However, in multi-class datasets, the effectiveness of anomaly detection is often compromised by the models' generalized reconstruction capabilities, which allow anomalies to blend within the expanded boundaries of normality resulting fro…

    Submitted 30 April, 2024; originally announced June 2024.

    Comments: 12 pages

  34. arXiv:2406.14548  [pdf, other]

    cs.LG cs.CV

    Consistency Models Made Easy

    Authors: Zhengyang Geng, Ashwini Pokle, William Luo, Justin Lin, J. Zico Kolter

    Abstract: Consistency models (CMs) are an emerging class of generative models that offer faster sampling than traditional diffusion models. CMs enforce that all points along a sampling trajectory are mapped to the same initial point. But this target leads to resource-intensive training: for example, as of 2024, training a SoTA CM on CIFAR-10 takes one week on 8 GPUs. In this work, we propose an alternative…

    Submitted 20 June, 2024; originally announced June 2024.

  35. arXiv:2406.13919  [pdf, other]

    cs.AI

    SPL: A Socratic Playground for Learning Powered by Large Language Model

    Authors: Liang Zhang, Jionghao Lin, Ziyi Kuang, Sheng Xu, Mohammed Yeasin, Xiangen Hu

    Abstract: Dialogue-based Intelligent Tutoring Systems (ITSs) have significantly advanced adaptive and personalized learning by automating sophisticated human tutoring strategies within interactive dialogues. However, replicating the nuanced patterns of expert human communication remains a challenge in Natural Language Processing (NLP). Recent advancements in NLP, particularly Large Language Models (LLMs) su…

    Submitted 20 June, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

  36. arXiv:2406.12809  [pdf, other]

    cs.CL

    Can Large Language Models Always Solve Easy Problems if They Can Solve Harder Ones?

    Authors: Zhe Yang, Yichang Zhang, Tianyu Liu, Jian Yang, Junyang Lin, Chang Zhou, Zhifang Sui

    Abstract: Large language models (LLMs) have demonstrated impressive capabilities, but still suffer from inconsistency issues (e.g. LLMs can react differently to disturbances like rephrasing or inconsequential order change). In addition to these inconsistencies, we also observe that LLMs, while capable of solving hard problems, can paradoxically fail at easier ones. To evaluate this hard-to-easy inconsistenc…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 25 pages, 12 figures, 10 tables

  37. arXiv:2406.12580  [pdf, other]

    cs.IR

    Behavior-Dependent Linear Recurrent Units for Efficient Sequential Recommendation

    Authors: Chengkai Liu, Jianghao Lin, Hanzhou Liu, Jianling Wang, James Caverlee

    Abstract: Sequential recommender systems aim to predict the users' next interaction through user behavior modeling with various operators like RNNs and attentions. However, existing models generally fail to achieve the three golden principles for sequential recommendation simultaneously, i.e., training efficiency, low-cost inference, and strong performance. To this end, we propose RecBLR, an Efficient Sequ…

    Submitted 18 June, 2024; originally announced June 2024.

  38. ViDSOD-100: A New Dataset and a Baseline Model for RGB-D Video Salient Object Detection

    Authors: Junhao Lin, Lei Zhu, Jiaxing Shen, Huazhu Fu, Qing Zhang, Liansheng Wang

    Abstract: With the rapid development of depth sensors, more and more RGB-D videos can be obtained. Identifying the foreground in RGB-D videos is a fundamental and important task. However, the existing salient object detection (SOD) works only focus on either static RGB-D images or RGB videos, ignoring the collaboration of RGB-D and video information. In this paper, we first collect a new annotated RGB-D vi…

    Submitted 18 June, 2024; originally announced June 2024.

    Journal ref: International Journal of Computer Vision (2024)

  39. arXiv:2406.12326  [pdf, other]

    cs.SE cs.AI

    Toward Exploring the Code Understanding Capabilities of Pre-trained Code Generation Models

    Authors: Jiayi Lin, Yutao Xie, Yue Yu, Yibiao Yang, Lei Zhang

    Abstract: Recently, large code generation models trained in a self-supervised manner on extensive unlabeled programming language data have achieved remarkable success. While these models acquire vast amounts of code knowledge, they perform poorly on code understanding tasks, such as code search and clone detection, as they are specifically trained for generation. Pre-training a larger encoder-only architect…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 8 pages, 4 figures

  40. arXiv:2406.11921  [pdf, other]

    cs.LG cs.AI

    Rethinking Spatio-Temporal Transformer for Traffic Prediction:Multi-level Multi-view Augmented Learning Framework

    Authors: Jiaqi Lin, Qianqian Ren

    Abstract: Traffic prediction is a challenging spatio-temporal forecasting problem that involves highly complex spatio-temporal correlations. This paper proposes a Multi-level Multi-view Augmented Spatio-temporal Transformer (LVSTformer) for traffic prediction. The model aims to capture spatial dependencies from three different levels: local geographic, global semantic, and pivotal nodes, along with long- an…

    Submitted 17 June, 2024; originally announced June 2024.

  41. arXiv:2406.11251  [pdf, other]

    cs.IR

    Unifying Multimodal Retrieval via Document Screenshot Embedding

    Authors: Xueguang Ma, Sheng-Chieh Lin, Minghan Li, Wenhu Chen, Jimmy Lin

    Abstract: In the real world, documents are organized in different formats and varied modalities. Traditional retrieval pipelines require tailored document parsing techniques and content extraction modules to prepare input for indexing. This process is tedious, prone to errors, and has information loss. To this end, we propose Document Screenshot Embedding (DSE), a novel retrieval paradigm that regards docu…

    Submitted 17 June, 2024; originally announced June 2024.

  42. arXiv:2406.10393  [pdf, other]

    cs.CL

    EWEK-QA: Enhanced Web and Efficient Knowledge Graph Retrieval for Citation-based Question Answering Systems

    Authors: Mohammad Dehghan, Mohammad Ali Alomrani, Sunyam Bagga, David Alfonso-Hermelo, Khalil Bibi, Abbas Ghaddar, Yingxue Zhang, Xiaoguang Li, Jianye Hao, Qun Liu, Jimmy Lin, Boxing Chen, Prasanna Parthasarathi, Mahdi Biparva, Mehdi Rezagholizadeh

    Abstract: The emerging citation-based QA systems are gaining more attention especially in generative AI search applications. The importance of extracted knowledge provided to these systems is vital from both accuracy (completeness of information) and efficiency (extracting the information in a timely manner). In this regard, citation-based QA systems are suffering from two shortcomings. First, they usually…

    Submitted 14 June, 2024; originally announced June 2024.

  43. arXiv:2406.09401  [pdf, other]

    cs.CV cs.AI cs.RO

    MMScan: A Multi-Modal 3D Scene Dataset with Hierarchical Grounded Language Annotations

    Authors: Ruiyuan Lyu, Tai Wang, Jingli Lin, Shuai Yang, Xiaohan Mao, Yilun Chen, Runsen Xu, Haifeng Huang, Chenming Zhu, Dahua Lin, Jiangmiao Pang

    Abstract: With the emergence of LLMs and their integration with other data modalities, multi-modal 3D perception attracts more attention due to its connectivity to the physical world and makes rapid progress. However, limited by existing datasets, previous works mainly focus on understanding object properties or inter-object spatial relationships in a 3D scene. To tackle this problem, this paper builds the…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Follow-up of EmbodiedScan. A multi-modal 3D dataset with the most-ever comprehensive language annotations for 3D-LLMs. Project page: https://tai-wang.github.io/mmscan/

  44. arXiv:2406.09355  [pdf, other]

    cs.IR

    Can't Hide Behind the API: Stealing Black-Box Commercial Embedding Models

    Authors: Manveer Singh Tamber, Jasper Xian, Jimmy Lin

    Abstract: Embedding models that generate representation vectors from natural language text are widely used, reflect substantial investments, and carry significant commercial value. Companies such as OpenAI and Cohere have developed competing embedding models accessed through APIs that require users to pay for usage. In this architecture, the models are "hidden" behind APIs, but this does not mean that they…

    Submitted 13 June, 2024; originally announced June 2024.

  45. arXiv:2406.08482  [pdf, other]

    cs.CV cs.CL

    Words Worth a Thousand Pictures: Measuring and Understanding Perceptual Variability in Text-to-Image Generation

    Authors: Raphael Tang, Xinyu Zhang, Lixinyu Xu, Yao Lu, Wenyan Li, Pontus Stenetorp, Jimmy Lin, Ferhan Ture

    Abstract: Diffusion models are the state of the art in text-to-image generation, but their perceptual variability remains understudied. In this paper, we examine how prompts affect image variability in black-box diffusion-based models. We propose W1KP, a human-calibrated measure of variability in a set of images, bootstrapped from existing image-pair perceptual distances. Current datasets do not cover recen…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: 13 pages, 11 figures

  46. arXiv:2406.07437  [pdf, other]

    cs.SD eess.AS

    Graph-based multi-Feature fusion method for speech emotion recognition

    Authors: Xueyu Liu, Jie Lin, Chao Wang

    Abstract: Exploring proper way to conduct multi-speech feature fusion for cross-corpus speech emotion recognition is crucial as different speech features could provide complementary cues reflecting human emotion status. While most previous approaches only extract a single speech feature for emotion recognition, existing fusion methods such as concatenation, parallel connection, and splicing ignore heterogen…

    Submitted 13 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

    Comments: 25 pages, 4 figures

  47. arXiv:2406.06986  [pdf, other]

    cs.LG

    DNN Partitioning, Task Offloading, and Resource Allocation in Dynamic Vehicular Networks: A Lyapunov-Guided Diffusion-Based Reinforcement Learning Approach

    Authors: Zhang Liu, Hongyang Du, Junzhe Lin, Zhibin Gao, Lianfen Huang, Seyyedali Hosseinalipour, Dusit Niyato

    Abstract: The rapid advancement of Artificial Intelligence (AI) has introduced Deep Neural Network (DNN)-based tasks to the ecosystem of vehicular networks. These tasks are often computation-intensive, requiring substantial computation resources, which are beyond the capability of a single vehicle. To address this challenge, Vehicular Edge Computing (VEC) has emerged as a solution, offering computing servic…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: 16 pages, 9 figures, and with extra appendix

  48. arXiv:2406.06519  [pdf, other]

    cs.IR

    UMBRELA: UMbrela is the (Open-Source Reproduction of the) Bing RELevance Assessor

    Authors: Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Nick Craswell, Jimmy Lin

    Abstract: Copious amounts of relevance judgments are necessary for the effective training and accurate evaluation of retrieval systems. Conventionally, these judgments are made by human assessors, rendering this process expensive and laborious. A recent study by Thomas et al. from Microsoft Bing suggested that large language models (LLMs) can accurately perform the relevance assessment task and provide huma…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 5 pages, 3 figures

  49. arXiv:2406.03835  [pdf, other]

    cs.CV cs.RO

    Monocular Localization with Semantics Map for Autonomous Vehicles

    Authors: Jixiang Wan, Xudong Zhang, Shuzhou Dong, Yuwei Zhang, Yuchen Yang, Ruoxi Wu, Ye Jiang, Jijunnan Li, Jinquan Lin, Ming Yang

    Abstract: Accurate and robust localization remains a significant challenge for autonomous vehicles. The cost of sensors and limitations in local computational efficiency make it difficult to scale to large commercial applications. Traditional vision-based approaches focus on texture features that are susceptible to changes in lighting, season, perspective, and appearance. Additionally, the large storage siz…

    Submitted 6 June, 2024; originally announced June 2024.

  50. arXiv:2406.02883  [pdf, other]

    cs.LG cs.CR

    Nonlinear Transformations Against Unlearnable Datasets

    Authors: Thushari Hapuarachchi, Jing Lin, Kaiqi Xiong, Mohamed Rahouti, Gitte Ost

    Abstract: Automated scraping stands out as a common method for collecting data in deep learning models without the authorization of data owners. Recent studies have begun to tackle the privacy concerns associated with this data collection method. Notable approaches include Deepconfuse, error-minimizing, error-maximizing (also known as adversarial poisoning), Neural Tangent Generalization Attack, synthetic,…

    Submitted 4 June, 2024; originally announced June 2024.