-
Non-Stabilizing Parallel Chip-Firing Games
Authors:
David Ji,
Michael Li,
Daniel Wang
Abstract:
In 2010, Kominers and Kominers proved that any parallel chip-firing game on $G(V,\,E)$ with $|σ|\geq 4|E|-|V|$ chips stabilizes. Recently, Bu, Choi, and Xu made the bound exact: all games with $|σ|< |E|$ chips or $|σ|> 3|E|-|V|$ chips stabilize. Meanwhile, Levine found a "devil's staircase" pattern in the plot of the activity of parallel chip-firing games against their density of chips. The stabilizing bound of Bu, Choi, and Xu corresponds to the top and bottom stairs of this staircase, in which the activity is 1 and 0, respectively. In this paper, we analyze the middle stair of the staircase, corresponding to activity $\frac{1}{2}$. We prove that all parallel chip-firing games with $2|E|-|V|< |σ|< 2|E|$ have period $T\neq 3,\,4$. In fact, this is exactly the range of $|σ|$ for which all games are non-stabilizing. We conjecture that all parallel chip-firing games with $2|E|-|V|< |σ|<2|E|$ have $T=2$ and thus activity $\frac{1}{2}$. This conjecture has been proven for trees by Bu, Choi, and Xu, cycles by Dall'Asta, and complete graphs by Levine. We extend Levine's method of conjugate configurations to prove the conjecture on complete bipartite graphs $K_{a,a}$.
Submitted 24 August, 2024; v1 submitted 19 August, 2024;
originally announced August 2024.
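A minimal sketch of the game discussed in the abstract above, assuming the usual parallel rule: in each round, every vertex holding at least as many chips as its degree fires, sending one chip along each incident edge, and all such vertices fire simultaneously. The names `step` and `period` and the adjacency-list representation are illustrative choices, not taken from the paper.

```python
from typing import Dict, List, Tuple

# Undirected simple graph as adjacency lists; vertices are 0..n-1 (assumed layout).
Graph = Dict[int, List[int]]


def step(graph: Graph, chips: Tuple[int, ...]) -> Tuple[int, ...]:
    """Play one parallel round and return the next configuration."""
    # Decide who fires from the current configuration only, so firing is simultaneous.
    fires = [chips[v] >= len(graph[v]) for v in range(len(chips))]
    nxt = list(chips)
    for v, fired in enumerate(fires):
        if fired:
            nxt[v] -= len(graph[v])      # v gives away deg(v) chips
            for u in graph[v]:
                nxt[u] += 1              # one chip to each neighbor
    return tuple(nxt)


def period(graph: Graph, chips: Tuple[int, ...]) -> int:
    """Iterate until a configuration repeats and return the cycle length T.

    The chip total is conserved and counts stay non-negative, so the
    configuration space is finite and the loop terminates.
    """
    seen: Dict[Tuple[int, ...], int] = {}
    t = 0
    while chips not in seen:
        seen[chips] = t
        chips = step(graph, chips)
        t += 1
    return t - seen[chips]


# 4-cycle: |V| = 4, |E| = 4, so the middle range is 4 < |σ| < 8.
cycle4: Graph = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
# Path on 3 vertices (a tree): |V| = 3, |E| = 2, middle range 1 < |σ| < 4.
path3: Graph = {0: [1], 1: [0, 2], 2: [1]}
```

Under these assumptions, `period(cycle4, (2, 1, 2, 1))` with 6 chips gives a period of 2, while `period(cycle4, (1, 1, 1, 0))` with 3 chips (fewer than $|E|$) gives 1, meaning the game has stabilized.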
-
The Story Behind the Lines: Line Charts as a Gateway to Dataset Discovery
Authors:
Daomin Ji,
Hui Luo,
Zhifeng Bao,
J. Shane Culpepper
Abstract:
Line charts are a valuable tool for data analysis and exploration, distilling essential insights from a dataset. However, access to the underlying dataset behind a line chart is rarely readily available. In this paper, we explore a novel dataset discovery problem, dataset discovery via line charts, focusing on the use of line charts as queries to discover datasets within a large data repository that are capable of generating similar line charts. To solve this problem, we propose a novel approach called Fine-grained Cross-modal Relevance Learning Model (FCM), which aims to estimate the relevance between a line chart and a candidate dataset. To achieve this goal, FCM first employs a visual element extractor to extract informative visual elements, i.e., lines and y-ticks, from a line chart. Then, two novel segment-level encoders are adopted to learn representations for a line chart and a dataset, preserving fine-grained information, followed by a cross-modal matcher to match the learned representations in a fine-grained way. Furthermore, we extend FCM to support line chart queries generated based on data aggregation. Last, we propose a benchmark tailored for this problem since no such dataset exists. Extensive evaluation on the new benchmark verifies the effectiveness of our proposed method. Specifically, our proposed approach surpasses the best baseline by 30.1% and 41.0% in terms of prec@50 and ndcg@50, respectively.
Submitted 18 August, 2024;
originally announced August 2024.
-
SAM2-Adapter: Evaluating & Adapting Segment Anything 2 in Downstream Tasks: Camouflage, Shadow, Medical Image Segmentation, and More
Authors:
Tianrun Chen,
Ankang Lu,
Lanyun Zhu,
Chaotao Ding,
Chunan Yu,
Deyi Ji,
Zejian Li,
Lingyun Sun,
Papa Mao,
Ying Zang
Abstract:
The advent of large models, also known as foundation models, has significantly transformed the AI research landscape, with models like Segment Anything (SAM) achieving notable success in diverse image segmentation scenarios. Despite its advancements, SAM encountered limitations in handling some complex low-level segmentation tasks like camouflaged object and medical imaging. In response, in 2023, we introduced SAM-Adapter, which demonstrated improved performance on these challenging tasks. Now, with the release of Segment Anything 2 (SAM2), a successor with enhanced architecture and a larger training corpus, we reassess these challenges. This paper introduces SAM2-Adapter, the first adapter designed to overcome the persistent limitations observed in SAM2 and achieve new state-of-the-art (SOTA) results in specific downstream tasks including medical image segmentation, camouflaged (concealed) object detection, and shadow detection. SAM2-Adapter builds on the SAM-Adapter's strengths, offering enhanced generalizability and composability for diverse applications. We present extensive experimental results demonstrating SAM2-Adapter's effectiveness. We show the potential and encourage the research community to leverage the SAM2 model with our SAM2-Adapter for achieving superior segmentation outcomes. Code, pre-trained models, and data processing protocols are available at http://tianrun-chen.github.io/SAM-Adaptor/
Submitted 10 August, 2024; v1 submitted 8 August, 2024;
originally announced August 2024.
-
Generative Sentiment Analysis via Latent Category Distribution and Constrained Decoding
Authors:
Jun Zhou,
Dongyang Yu,
Kamran Aziz,
Fangfang Su,
Qing Zhang,
Fei Li,
Donghong Ji
Abstract:
Fine-grained sentiment analysis involves extracting and organizing sentiment elements from textual data. However, existing approaches often overlook issues of category semantic inclusion and overlap, as well as inherent structural patterns within the target sequence. This study introduces a generative sentiment analysis model. To address the challenges related to category semantic inclusion and overlap, a latent category distribution variable is introduced. By reconstructing the input of a variational autoencoder, the model learns the intensity of the relationship between categories and text, thereby improving sequence generation. Additionally, a trie data structure and constrained decoding strategy are utilized to exploit structural patterns, which in turn reduces the search space and regularizes the generation process. Experimental results on the Restaurant-ACOS and Laptop-ACOS datasets demonstrate a significant performance improvement compared to baseline models. Ablation experiments further confirm the effectiveness of latent category distribution and constrained decoding strategy.
Submitted 31 July, 2024;
originally announced July 2024.
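A rough sketch of how a trie can restrict decoding to valid target sequences, as the abstract above describes in general terms. This is not the authors' implementation: the greedy search, the `score` callback standing in for model probabilities, the `END` marker, and the example category tokens are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Sequence

END = "<end>"  # hypothetical marker for "a valid sequence may stop here"
Trie = Dict[str, dict]


def build_trie(sequences: Sequence[Sequence[str]]) -> Trie:
    """Store every allowed token sequence as a path in a nested-dict trie."""
    root: Trie = {}
    for seq in sequences:
        node = root
        for tok in seq:
            node = node.setdefault(tok, {})
        node[END] = {}
    return root


def allowed_next(trie: Trie, prefix: Sequence[str]) -> List[str]:
    """Return the tokens that may follow `prefix`; empty if the prefix is invalid."""
    node = trie
    for tok in prefix:
        node = node.get(tok)
        if node is None:
            return []
    return sorted(node)


def constrained_greedy(trie: Trie, score: Callable[[List[str], str], float]) -> List[str]:
    """Greedy decoding that only ever considers tokens permitted by the trie.

    Assumes the trie holds at least one sequence.
    """
    prefix: List[str] = []
    while True:
        options = allowed_next(trie, prefix)
        best = max(options, key=lambda tok: score(prefix, tok))
        if best == END:
            return prefix
        prefix.append(best)


# Toy category vocabulary and fixed per-token scores (both made up).
trie = build_trie([["food", "quality"], ["food", "prices"], ["service", "general"]])
toy_scores = {"food": 0.2, "service": 0.9, "quality": 0.5,
              "prices": 0.7, "general": 0.1, END: 0.0}
decoded = constrained_greedy(trie, lambda prefix, tok: toy_scores[tok])
```

At each step the candidate set shrinks from the whole vocabulary to the children of the current trie node, which is the search-space reduction the abstract refers to. A real decoder would more likely apply the same mask inside beam search over subword tokens.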
-
Parallel chip-firing games on directed graphs
Authors:
David Ji,
Michael Li,
Daniel Wang
Abstract:
In 1992, Bitar and Goles introduced the parallel chip-firing game on undirected graphs. Two years later, Prisner extended the game to directed graphs. While the properties of parallel chip-firing games on undirected graphs have been extensively studied, their analogs for parallel chip-firing games on directed graphs have been sporadic. In this paper, we prove the outstanding analogs of the core results of parallel chip-firing games on undirected graphs for those on directed graphs. We find the possible periods of a parallel chip-firing game on a directed simple cycle and use Gauss-Jordan Elimination on a Laplacian-like matrix to establish a lower bound on the maximum period of a parallel chip-firing game on a directed complete graph and a directed complete bipartite graph. Finally, we use the method of motors by Jiang, Scully, and Zhang to show that a binary string $s$ can be the atomic firing sequence of a vertex in a parallel chip-firing game on a strongly connected directed graph if and only if $s$ contains 1 or $s=0$.
Submitted 21 July, 2024;
originally announced July 2024.
-
Revisiting Structured Sentiment Analysis as Latent Dependency Graph Parsing
Authors:
Chengjie Zhou,
Bobo Li,
Hao Fei,
Fei Li,
Chong Teng,
Donghong Ji
Abstract:
Structured Sentiment Analysis (SSA) was cast as a problem of bi-lexical dependency graph parsing by prior studies. Multiple formulations have been proposed to construct the graph, which share several intrinsic drawbacks: (1) The internal structures of spans are neglected, so only the boundary tokens of spans are used for relation prediction and span recognition, hindering the model's expressiveness; (2) Long spans occupy a significant proportion of the SSA datasets, which further exacerbates the problem of internal structure neglect. In this paper, we treat the SSA task as a dependency parsing task on partially-observed dependency trees, regarding flat spans without determined tree annotations as latent subtrees to consider internal structures of spans. We propose a two-stage parsing method and leverage TreeCRFs with a novel constrained inside algorithm to model latent structures explicitly, which also takes advantage of jointly scoring graph arcs and headed spans for global optimization and inference. Results of extensive experiments on five benchmark datasets reveal that our method performs significantly better than all previous bi-lexical methods, achieving a new state of the art.
Submitted 5 July, 2024;
originally announced July 2024.
-
xLSTM-UNet can be an Effective 2D & 3D Medical Image Segmentation Backbone with Vision-LSTM (ViL) better than its Mamba Counterpart
Authors:
Tianrun Chen,
Chaotao Ding,
Lanyun Zhu,
Tao Xu,
Deyi Ji,
Yan Wang,
Ying Zang,
Zejian Li
Abstract:
Convolutional Neural Networks (CNNs) and Vision Transformers (ViT) have been pivotal in biomedical image segmentation, yet their ability to manage long-range dependencies remains constrained by inherent locality and computational overhead. To overcome these challenges, in this technical report, we first propose xLSTM-UNet, a UNet-structured deep learning neural network that leverages Vision-LSTM (xLSTM) as its backbone for medical image segmentation. xLSTM was recently proposed as the successor of Long Short-Term Memory (LSTM) networks and has demonstrated superior performance compared to Transformers and State Space Models (SSMs) like Mamba in Natural Language Processing (NLP) and image classification (as demonstrated in the Vision-LSTM, or ViL, implementation). Here, the xLSTM-UNet we designed extends this success to the biomedical image segmentation domain. By integrating the local feature extraction strengths of convolutional layers with the long-range dependency capturing abilities of xLSTM, xLSTM-UNet offers a robust solution for comprehensive image analysis. We validate the efficacy of xLSTM-UNet through experiments. Our findings demonstrate that xLSTM-UNet consistently surpasses the performance of leading CNN-based, Transformer-based, and Mamba-based segmentation networks on multiple biomedical segmentation datasets, including organs in abdomen MRI, instruments in endoscopic images, and cells in microscopic images. With comprehensive experiments performed, this technical report highlights the potential of xLSTM-based architectures in advancing biomedical image analysis in both 2D and 3D. The code, models, and datasets are publicly available at http://tianrun-chen.github.io/xLSTM-UNet/
Submitted 2 July, 2024; v1 submitted 1 July, 2024;
originally announced July 2024.
-
PPTFormer: Pseudo Multi-Perspective Transformer for UAV Segmentation
Authors:
Deyi Ji,
Wenwei Jin,
Hongtao Lu,
Feng Zhao
Abstract:
The ascension of Unmanned Aerial Vehicles (UAVs) in various fields necessitates effective UAV image segmentation, which faces challenges due to the dynamic perspectives of UAV-captured images. Traditional segmentation algorithms falter as they cannot accurately mimic the complexity of UAV perspectives, and the cost of obtaining multi-perspective labeled datasets is prohibitive. To address these issues, we introduce the PPTFormer, a novel Pseudo Multi-Perspective Transformer network that revolutionizes UAV image segmentation. Our approach circumvents the need for actual multi-perspective data by creating pseudo perspectives for enhanced multi-perspective learning. The PPTFormer network boasts Perspective Representation, novel Perspective Prototypes, and a specialized encoder and decoder that together achieve superior segmentation results through Pseudo Multi-Perspective Attention (PMP Attention) and fusion. Our experiments demonstrate that PPTFormer achieves state-of-the-art performance across five UAV segmentation datasets, confirming its capability to effectively simulate UAV flight perspectives and significantly advance segmentation precision. This work presents a pioneering leap in UAV scene understanding and sets a new benchmark for future developments in semantic segmentation.
Submitted 11 July, 2024; v1 submitted 27 June, 2024;
originally announced June 2024.
-
Harvesting Events from Multiple Sources: Towards a Cross-Document Event Extraction Paradigm
Authors:
Qiang Gao,
Zixiang Meng,
Bobo Li,
Jun Zhou,
Fei Li,
Chong Teng,
Donghong Ji
Abstract:
Document-level event extraction aims to extract structured event information from unstructured text. However, a single document often contains limited event information and the roles of different event arguments may be biased due to the influence of the information source. This paper addresses the limitations of traditional document-level event extraction by proposing the task of cross-document event extraction (CDEE) to integrate event information from multiple documents and provide a comprehensive perspective on events. We construct a novel cross-document event extraction dataset, namely CLES, which contains 20,059 documents and 37,688 mention-level events, where over 70% of them are cross-document. To build a benchmark, we propose a CDEE pipeline that includes 5 steps, namely event extraction, coreference resolution, entity normalization, role normalization and entity-role resolution. Our CDEE pipeline achieves about 72% F1 in end-to-end cross-document event extraction, suggesting the challenge of this task. Our work builds a new line of information extraction research and will attract new research attention.
Submitted 23 June, 2024;
originally announced June 2024.
-
Enhancing Cross-Document Event Coreference Resolution by Discourse Structure and Semantic Information
Authors:
Qiang Gao,
Bobo Li,
Zixiang Meng,
Yunlong Li,
Jun Zhou,
Fei Li,
Chong Teng,
Donghong Ji
Abstract:
Existing cross-document event coreference resolution models either compute mention similarity directly or enhance mention representation by extracting event arguments (such as location, time, agent, and patient), and lack the ability to utilize document-level information. As a result, they struggle to capture long-distance dependencies. This shortcoming leads to their underwhelming performance in determining coreference for events whose argument information relies on long-distance dependencies. In light of these limitations, we propose the construction of document-level Rhetorical Structure Theory (RST) trees and cross-document Lexical Chains to model the structural and semantic information of documents. Subsequently, cross-document heterogeneous graphs are constructed and GAT is utilized to learn the representations of events. Finally, a pair scorer calculates the similarity between each pair of events, and co-referred events can be recognized using a standard clustering algorithm. Additionally, as the existing cross-document event coreference datasets are limited to English, we have developed a large-scale Chinese cross-document event coreference dataset to fill this gap, which comprises 53,066 event mentions and 4,476 clusters. Applied to the English and Chinese datasets, our model outperforms all baselines by large margins.
Submitted 22 June, 2024;
originally announced June 2024.
-
Discrete Latent Perspective Learning for Segmentation and Detection
Authors:
Deyi Ji,
Feng Zhao,
Lanyun Zhu,
Wenwei Jin,
Hongtao Lu,
Jieping Ye
Abstract:
In this paper, we address the challenge of Perspective-Invariant Learning in machine learning and computer vision, which involves enabling a network to understand images from varying perspectives to achieve consistent semantic interpretation. While standard approaches rely on the labor-intensive collection of multi-view images or limited data augmentation techniques, we propose a novel framework, Discrete Latent Perspective Learning (DLPL), for latent multi-perspective fusion learning using conventional single-view images. DLPL comprises three main modules: Perspective Discrete Decomposition (PDD), Perspective Homography Transformation (PHT), and Perspective Invariant Attention (PIA), which work together to discretize visual features, transform perspectives, and fuse multi-perspective semantic information, respectively. DLPL is a universal perspective learning framework applicable to a variety of scenarios and vision tasks. Extensive experiments demonstrate that DLPL significantly enhances the network's capacity to depict images across diverse scenarios (daily photos, UAV, auto-driving) and tasks (detection, segmentation).
Submitted 14 June, 2024;
originally announced June 2024.
-
Development of high-level applications for High Energy Photon Source booster
Authors:
Yuemei Peng,
Daheng Ji,
Hongfei Ji,
Nan Li,
Xiaohan Lu,
Saike Tian,
Yuanyuan Wei,
Haisheng Xu,
Yaliang Zhao,
Yi Jiao,
Jingyi Li
Abstract:
The High Energy Photon Source (HEPS) is the first fourth-generation storage ring light source being built in the suburbs of Beijing, China. The storage ring was designed with an emittance lower than 60 pm·rad, a circumference of 1.36 km, and a beam energy of 6 GeV. Its injector contains a 500 MeV S-band Linac and a 454 m booster, which was designed as an accumulator at the extraction energy. In the energy ramping control design of the HEPS booster, the ramping process was programmed to be able to stop and stay at any energy between the injection energy and the extraction energy. This feature enables us to conduct energy-dependent machine studies and ramping curve optimization. Beam commissioning of the HEPS Linac finished in June 2023, and beam commissioning of the booster started at the end of July 2023. On November 17, the main target values proposed in the preliminary design report were reached. The high-level applications (HLAs) are essential tools for beam commissioning. The development of the HLAs, which are based on the framework named Python accelerator physics application set (Pyapas), started at the end of 2021. The HEPS physics team spent more than one year developing and testing the HLAs to meet the requirements of beam commissioning of the booster. Thanks to Pyapas's modular design, its principle based on physical quantities, and its ability to run simulation models online, the development efficiency and reliability of the HLAs have been greatly improved. In particular, the principle based on physical quantities allows us to control the beam more intuitively.
Submitted 6 June, 2024;
originally announced June 2024.
-
Reasoning3D -- Grounding and Reasoning in 3D: Fine-Grained Zero-Shot Open-Vocabulary 3D Reasoning Part Segmentation via Large Vision-Language Models
Authors:
Tianrun Chen,
Chunan Yu,
Jing Li,
Jianqi Zhang,
Lanyun Zhu,
Deyi Ji,
Yong Zhang,
Ying Zang,
Zejian Li,
Lingyun Sun
Abstract:
In this paper, we introduce a new task: Zero-Shot 3D Reasoning Segmentation for parts searching and localization for objects, which is a new paradigm for 3D segmentation that transcends the limitations of previous category-specific 3D semantic segmentation, 3D instance segmentation, and open-vocabulary 3D segmentation. We design a simple baseline method, Reasoning3D, with the capability to understand and execute complex commands for (fine-grained) segmentation of specific parts of 3D meshes with contextual awareness and reasoned answers for interactive segmentation. Specifically, Reasoning3D leverages an off-the-shelf pre-trained 2D segmentation network, powered by Large Language Models (LLMs), to interpret user input queries in a zero-shot manner. Previous research has shown that extensive pre-training endows foundation models with prior world knowledge, enabling them to comprehend complex commands, a capability we can harness to "segment anything" in 3D with limited 3D datasets (source efficient). Experimentation reveals that our approach is generalizable and can effectively localize and highlight parts of 3D objects (in 3D mesh) based on implicit textual queries, including articulated 3D objects and real-world scanned data. Our method can also generate natural language explanations corresponding to these 3D models and the decomposition. Moreover, our training-free approach allows rapid deployment and serves as a viable universal baseline for future research on part-level 3D (semantic) object understanding in various fields including robotics, object manipulation, part assembly, autonomous driving applications, augmented reality and virtual reality (AR/VR), and medical applications. The code, the model weight, the deployment guide, and the evaluation protocol are available at: http://tianrun-chen.github.io/Reason3D/
Submitted 29 May, 2024;
originally announced May 2024.
-
The RoboDrive Challenge: Drive Anytime Anywhere in Any Condition
Authors:
Lingdong Kong,
Shaoyuan Xie,
Hanjiang Hu,
Yaru Niu,
Wei Tsang Ooi,
Benoit R. Cottereau,
Lai Xing Ng,
Yuexin Ma,
Wenwei Zhang,
Liang Pan,
Kai Chen,
Ziwei Liu,
Weichao Qiu,
Wei Zhang,
Xu Cao,
Hao Lu,
Ying-Cong Chen,
Caixin Kang,
Xinning Zhou,
Chengyang Ying,
Wentao Shang,
Xingxing Wei,
Yinpeng Dong,
Bo Yang,
Shengyin Jiang
, et al. (66 additional authors not shown)
Abstract:
In the realm of autonomous driving, robust perception under out-of-distribution conditions is paramount for the safe deployment of vehicles. Challenges such as adverse weather, sensor malfunctions, and environmental unpredictability can severely impact the performance of autonomous systems. The 2024 RoboDrive Challenge was crafted to propel the development of driving perception technologies that can withstand and adapt to these real-world variabilities. Focusing on four pivotal tasks -- BEV detection, map segmentation, semantic occupancy prediction, and multi-view depth estimation -- the competition laid down a gauntlet to innovate and enhance system resilience against typical and atypical disturbances. This year's challenge consisted of five distinct tracks and attracted 140 registered teams from 93 institutes across 11 countries, resulting in nearly one thousand submissions evaluated through our servers. The competition culminated in 15 top-performing solutions, which introduced a range of innovative approaches including advanced data augmentation, multi-sensor fusion, self-supervised learning for error correction, and new algorithmic strategies to enhance sensor robustness. These contributions significantly advanced the state of the art, particularly in handling sensor inconsistencies and environmental variability. Participants, through collaborative efforts, pushed the boundaries of current technologies, showcasing their potential in real-world scenarios. Extensive evaluations and analyses provided insights into the effectiveness of these solutions, highlighting key trends and successful strategies for improving the resilience of driving perception systems. This challenge has set a new benchmark in the field, providing a rich repository of techniques expected to guide future research in this field.
Submitted 29 May, 2024; v1 submitted 14 May, 2024;
originally announced May 2024.
-
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
Authors:
DeepSeek-AI,
Aixin Liu,
Bei Feng,
Bin Wang,
Bingxuan Wang,
Bo Liu,
Chenggang Zhao,
Chengqi Dengr,
Chong Ruan,
Damai Dai,
Daya Guo,
Dejian Yang,
Deli Chen,
Dongjie Ji,
Erhang Li,
Fangyun Lin,
Fuli Luo,
Guangbo Hao,
Guanting Chen,
Guowei Li,
H. Zhang,
Hanwei Xu,
Hao Yang,
Haowei Zhang,
Honghui Ding
, et al. (132 additional authors not shown)
Abstract:
We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token, and supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA guarantees efficient inference through significantly compressing the Key-Value (KV) cache into a latent vector, while DeepSeekMoE enables training strong models at an economical cost through sparse computation. Compared with DeepSeek 67B, DeepSeek-V2 achieves significantly stronger performance, and meanwhile saves 42.5% of training costs, reduces the KV cache by 93.3%, and boosts the maximum generation throughput to 5.76 times. We pretrain DeepSeek-V2 on a high-quality and multi-source corpus consisting of 8.1T tokens, and further perform Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) to fully unlock its potential. Evaluation results show that, even with only 21B activated parameters, DeepSeek-V2 and its chat versions still achieve top-tier performance among open-source models.
Submitted 19 June, 2024; v1 submitted 7 May, 2024;
originally announced May 2024.
-
Novel Topological Machine Learning Methodology for Stream-of-Quality Modeling in Smart Manufacturing
Authors:
Jay Lee,
Dai-Yan Ji,
Yuan-Ming Hsu
Abstract:
This paper presents a topological analytics approach within the 5-level Cyber-Physical Systems (CPS) architecture for the Stream-of-Quality assessment in smart manufacturing. The proposed methodology not only enables real-time quality monitoring and predictive analytics but also discovers the hidden relationships between quality features and process parameters across different manufacturing processes. A case study in additive manufacturing was used to demonstrate the feasibility of the proposed methodology to maintain high product quality and adapt to product quality variations. This paper demonstrates how topological graph visualization can be effectively used for the real-time identification of new representative data through the Stream-of-Quality assessment.
Submitted 23 April, 2024;
originally announced April 2024.
-
Theta oscillons in behaving rats
Authors:
M. S. Zobaer,
N. Lotfi,
C. M. Domenico,
C. Hoffman,
L. Perotti,
D. Ji,
Y. Dabaghian
Abstract:
Recently discovered constituents of the brain waves -- the oscillons -- provide high-resolution representation of the extracellular field dynamics. Here we study the most robust, highest-amplitude oscillons that manifest in actively behaving rats and generally correspond to the traditional theta-waves. We show that the resemblances between theta-oscillons and the conventional theta-waves apply to the ballpark characteristics -- mean frequencies, amplitudes, and bandwidths. In addition, both hippocampal and cortical oscillons exhibit a number of intricate, behavior-attuned, transient properties that suggest a new vantage point for understanding the theta-rhythms' structure, origins and functions. We demonstrate that oscillons are frequency-modulated waves, with speed-controlled parameters, embedded into a noise background. We also use a basic model of neuronal synchronization to contextualize and to interpret the observed phenomena. In particular, we argue that the synchronicity level in physiological networks is fairly weak and modulated by the animal's locomotion.
Submitted 21 April, 2024;
originally announced April 2024.
-
Altered patterning of neural activity in a tauopathy mouse model
Authors:
C. Hoffman,
J. Cheng,
R. Morales,
D. Ji,
Y. Dabaghian
Abstract:
Alzheimer's disease (AD) is a complex neurodegenerative condition that manifests at multiple levels and involves a spectrum of abnormalities ranging from the cellular to cognitive. Here, we investigate the impact of AD-related tau-pathology on hippocampal circuits in mice engaged in spatial navigation, and study changes of neuronal firing and dynamics of extracellular fields. While most studies are based on analyzing instantaneous or time-averaged characteristics of neuronal activity, we focus on intermediate timescales -- spike trains and waveforms of oscillatory potentials, which we consider as single entities. We find that, in healthy mice, spike arrangements and wave patterns (series of crests or troughs) are coupled to the animal's location, speed, and acceleration. In contrast, in tau-mice, neural activity is structurally disarrayed: brainwave cadence is detached from locomotion, spatial selectivity is lost, the spike flow is scrambled. Importantly, these alterations start early and accumulate with age, which exposes progressive disinvolvement of the hippocampus circuit in spatial navigation. These features highlight qualitatively different neurodynamics than the ones provided by conventional analyses, and are more salient, thus revealing a new level of the hippocampal circuit disruptions.
Submitted 23 March, 2024;
originally announced March 2024.
-
Modeling Unified Semantic Discourse Structure for High-quality Headline Generation
Authors:
Minghui Xu,
Hao Fei,
Fei Li,
Shengqiong Wu,
Rui Sun,
Chong Teng,
Donghong Ji
Abstract:
Headline generation aims to summarize a long document with a short, catchy title that reflects the main idea. This requires accurately capturing the core document semantics, which is challenging due to the lengthy and background information-rich nature of the texts. In this work, we propose using a unified semantic discourse structure (S3) to represent document semantics, achieved by combining document-level rhetorical structure theory (RST) trees with sentence-level abstract meaning representation (AMR) graphs to construct S3 graphs. The hierarchical composition of sentence, clause, and word intrinsically characterizes the semantic meaning of the overall document. We then develop a headline generation framework, in which the S3 graphs are encoded as contextual features. To consolidate the efficacy of S3 graphs, we further devise a hierarchical structure pruning mechanism to dynamically screen the redundant and nonessential nodes within the graph. Experimental results on two headline generation datasets demonstrate that our method consistently outperforms existing state-of-the-art methods. Our work can be instructive for a broad range of document modeling tasks beyond headline generation and summarization.
Submitted 23 March, 2024;
originally announced March 2024.
-
View-Centric Multi-Object Tracking with Homographic Matching in Moving UAV
Authors:
Deyi Ji,
Siqi Gao,
Lanyun Zhu,
Qi Zhu,
Yiru Zhao,
Peng Xu,
Hongtao Lu,
Feng Zhao,
Jieping Ye
Abstract:
In this paper, we address the challenge of multi-object tracking (MOT) in moving Unmanned Aerial Vehicle (UAV) scenarios, where irregular flight trajectories, such as hovering, turning left/right, and moving up/down, lead to significantly greater complexity compared to fixed-camera MOT. Specifically, changes in the scene background not only render traditional frame-to-frame object IOU association methods ineffective but also introduce significant view shifts in the objects, which complicates tracking. To overcome these issues, we propose a novel universal HomView-MOT framework, which, for the first time, harnesses the view Homography inherent in changing scenes to solve MOT challenges in moving environments, incorporating Homographic Matching and View-Centric concepts. We introduce a Fast Homography Estimation (FHE) algorithm for rapid computation of Homography matrices between video frames, enabling object View-Centric ID Learning (VCIL) and leveraging multi-view Homography to learn cross-view ID features. Concurrently, our Homographic Matching Filter (HMF) maps object bounding boxes from different frames onto a common view plane for a more realistic physical IOU association. Extensive experiments show that these innovations allow HomView-MOT to achieve state-of-the-art performance on the prominent UAV MOT datasets VisDrone and UAVDT.
Submitted 14 May, 2024; v1 submitted 16 March, 2024;
originally announced March 2024.
-
CMDA: Cross-Modal and Domain Adversarial Adaptation for LiDAR-Based 3D Object Detection
Authors:
Gyusam Chang,
Wonseok Roh,
Sujin Jang,
Dongwook Lee,
Daehyun Ji,
Gyeongrok Oh,
Jinsun Park,
Jinkyu Kim,
Sangpil Kim
Abstract:
Recent LiDAR-based 3D Object Detection (3DOD) methods show promising results, but they often do not generalize well to target domains outside the source (or training) data distribution. To reduce such domain gaps and thus to make 3DOD models more generalizable, we introduce a novel unsupervised domain adaptation (UDA) method, called CMDA. First, (i) it leverages visual semantic cues from an image modality (i.e., camera images) as an effective semantic bridge to close the domain gap in the cross-modal Bird's Eye View (BEV) representations. Further, (ii) we introduce a self-training-based learning strategy, wherein a model is adversarially trained to generate domain-invariant features, which disrupt the discrimination of whether a feature instance comes from a source or an unseen target domain. Overall, our CMDA framework guides the 3DOD model to generate highly informative and domain-adaptive features for novel data distributions. In our extensive experiments with large-scale benchmarks, such as nuScenes, Waymo, and KITTI, these two components provide significant performance gains for UDA tasks, achieving state-of-the-art performance.
Submitted 6 March, 2024; v1 submitted 6 March, 2024;
originally announced March 2024.
-
IBD: Alleviating Hallucinations in Large Vision-Language Models via Image-Biased Decoding
Authors:
Lanyun Zhu,
Deyi Ji,
Tianrun Chen,
Peng Xu,
Jieping Ye,
Jun Liu
Abstract:
Despite rapid development and widespread application, Large Vision-Language Models (LVLMs) face a serious challenge: they are prone to generating hallucinations. An over-reliance on linguistic priors has been identified as a key factor leading to these hallucinations. In this paper, we propose to alleviate this problem by introducing a novel image-biased decoding (IBD) technique. Our method derives the next-token probability distribution by contrasting predictions from a conventional LVLM with those of an image-biased LVLM, thereby amplifying the correct information highly correlated with image content while mitigating the hallucinatory errors caused by excessive dependence on text. We further conduct a comprehensive statistical analysis to validate the reliability of our method, and design an adaptive adjustment strategy to achieve robust and flexible handling under varying conditions. Experimental results across multiple evaluation metrics verify that our method, despite requiring no additional training data and only a minimal increase in model parameters, can significantly reduce hallucinations in LVLMs and enhance the truthfulness of the generated response.
Submitted 28 February, 2024;
originally announced February 2024.
-
CMNER: A Chinese Multimodal NER Dataset based on Social Media
Authors:
Yuanze Ji,
Bobo Li,
Jun Zhou,
Fei Li,
Chong Teng,
Donghong Ji
Abstract:
Multimodal Named Entity Recognition (MNER) is a pivotal task designed to extract named entities from text with the support of pertinent images. Nonetheless, a notable paucity of data for Chinese MNER has considerably impeded the progress of this natural language processing task within the Chinese domain. Consequently, in this study, we compile a Chinese Multimodal NER dataset (CMNER) utilizing data sourced from Weibo, China's largest social media platform. Our dataset encompasses 5,000 Weibo posts paired with 18,326 corresponding images. The entities are classified into four distinct categories: person, location, organization, and miscellaneous. We perform baseline experiments on CMNER, and the outcomes underscore the effectiveness of incorporating images for NER. Furthermore, we conduct cross-lingual experiments on the publicly available English MNER dataset (Twitter2015), and the results substantiate our hypothesis that Chinese and English multimodal NER data can mutually enhance the performance of the NER model.
Submitted 1 March, 2024; v1 submitted 21 February, 2024;
originally announced February 2024.
-
Sensor Misalignment-tolerant AUV Navigation with Passive DoA and Doppler Measurements
Authors:
Bingbing Zhang,
Shuo Liu,
Shanmin Zhou,
Daxiong Ji,
Tao Wang,
Tian Xia,
Wen Xu
Abstract:
We present a sensor misalignment-tolerant AUV navigation method that leverages measurements from an acoustic array and dead reckoned information. Recent studies have demonstrated the potential use of passive acoustic Direction of Arrival (DoA) measurements for AUV navigation without requiring ranging measurements. However, the sensor misalignment between the acoustic array and the attitude sensor was not accounted for. Such misalignment may deteriorate the navigation accuracy. This paper proposes a novel approach that allows simultaneous AUV navigation, beacon localization, and sensor alignment. An Unscented Kalman Filter (UKF) that enables the necessary calculations to be completed at an affordable computational load is developed. A Nonlinear Least Squares (NLS)-based technique is employed to find an initial solution for beacon localization and sensor alignment as early as possible using a short-term window of measurements. Experimental results demonstrate the performance of the proposed method.
Submitted 11 February, 2024;
originally announced February 2024.
-
Disentanglement Provides a Unified Estimation for Quantum Entropies and Distance Measures
Authors:
Myeongjin Shin,
Seungwoo Lee,
Junseo Lee,
Mingyu Lee,
Donghwa Ji,
Hyeonjun Yeo,
Kabgyun Jeong
Abstract:
The estimation of quantum entropies and distance measures, such as von Neumann entropy, Rényi entropy, Tsallis entropy, trace distance, and fidelity-induced distances like Bures distance, has been a key area of research. This paper introduces a unified approach using Disentangling Quantum Neural Networks (DEQNN) for estimating these quantities, leveraging continuity bounds and disentanglement in the cost function design. Our mathematical proof demonstrates that DEQNN can preserve quantum entropies and distances in smaller partial states, making them suitable for further estimation. This method is scalable to an arbitrary number of quantum states and is particularly effective for less complex quantum systems. Numerical simulations validate our approach, and we also discuss strategies to enhance trainability and avoid barren plateaus.
Submitted 29 July, 2024; v1 submitted 15 January, 2024;
originally announced January 2024.
-
ChangeNet: Multi-Temporal Asymmetric Change Detection Dataset
Authors:
Deyi Ji,
Siqi Gao,
Mingyuan Tao,
Hongtao Lu,
Feng Zhao
Abstract:
Change Detection (CD) has been attracting extensive interest with the availability of bi-temporal datasets. However, due to the huge cost of multi-temporal image acquisition and labeling, existing change detection datasets are small in quantity, short in temporal coverage, and low in practicability. Therefore, a large-scale practical-oriented dataset covering wide temporal phases is urgently needed to facilitate the community. To this end, the ChangeNet dataset is presented especially for multi-temporal change detection, along with the new task of "Asymmetric Change Detection". Specifically, ChangeNet consists of 31,000 multi-temporal image pairs, a wide range of complex scenes from 100 cities, and 6 pixel-level annotated categories, which is far superior to all the existing change detection datasets including LEVIR-CD, WHU Building CD, etc. In addition, ChangeNet contains many real-world perspective distortions in different temporal phases on the same areas, which can promote the practical application of change detection algorithms. The ChangeNet dataset is suitable for both binary change detection (BCD) and semantic change detection (SCD) tasks. Accordingly, we benchmark the ChangeNet dataset on six BCD methods and two SCD methods, and extensive experiments demonstrate its challenges and great significance. The dataset is available at https://github.com/jankyee/ChangeNet.
Submitted 11 April, 2024; v1 submitted 28 December, 2023;
originally announced December 2023.
-
Reverse Multi-Choice Dialogue Commonsense Inference with Graph-of-Thought
Authors:
Li Zheng,
Hao Fei,
Fei Li,
Bobo Li,
Lizi Liao,
Donghong Ji,
Chong Teng
Abstract:
With the proliferation of dialogic data across the Internet, the Dialogue Commonsense Multi-choice Question Answering (DC-MCQ) task has emerged as a response to the challenge of comprehending user queries and intentions. Although prevailing methodologies exhibit effectiveness in addressing single-choice questions, they encounter difficulties in handling multi-choice queries due to the heightened intricacy and informational density. In this paper, inspired by the human cognitive process of progressively excluding options, we propose a three-step Reverse Exclusion Graph-of-Thought (ReX-GoT) framework, including Option Exclusion, Error Analysis, and Combine Information. Specifically, our ReX-GoT mimics human reasoning by gradually excluding irrelevant options and learning the reasons for option errors to choose the optimal path of the GoT and ultimately infer the correct answer. By progressively integrating intricate clues, our method effectively reduces the difficulty of multi-choice reasoning and provides a novel solution for DC-MCQ. Extensive experiments on the CICERO and CICERO$_{v2}$ datasets validate the significant improvement of our approach on the DC-MCQ task. In the zero-shot setting, our model outperforms the best baseline by 17.67% in terms of F1 score for the multi-choice task. Most strikingly, our GPT3.5-based ReX-GoT framework achieves a remarkable 39.44% increase in F1 score.
Submitted 26 December, 2023; v1 submitted 23 December, 2023;
originally announced December 2023.
-
Compositional Generalization for Multi-label Text Classification: A Data-Augmentation Approach
Authors:
Yuyang Chai,
Zhuang Li,
Jiahui Liu,
Lei Chen,
Fei Li,
Donghong Ji,
Chong Teng
Abstract:
Despite significant advancements in multi-label text classification, the ability of existing models to generalize to novel and seldom-encountered complex concepts, which are compositions of elementary ones, remains underexplored. This research addresses this gap. By creating unique data splits across three benchmarks, we assess the compositional generalization ability of existing multi-label text classification models. Our results show that these models often fail to generalize to compositional concepts encountered infrequently during training, leading to inferior performance on tests with these new combinations. To address this, we introduce a data augmentation method that leverages two innovative text generation models designed to enhance the classification models' capacity for compositional generalization. Our experiments show that this data augmentation approach significantly improves the compositional generalization capabilities of classification models on our benchmarks, with both generation models surpassing other text generation baselines.
Submitted 20 December, 2023; v1 submitted 18 December, 2023;
originally announced December 2023.
-
LLaFS: When Large Language Models Meet Few-Shot Segmentation
Authors:
Lanyun Zhu,
Tianrun Chen,
Deyi Ji,
Jieping Ye,
Jun Liu
Abstract:
This paper proposes LLaFS, the first attempt to leverage large language models (LLMs) in few-shot segmentation. In contrast to the conventional few-shot segmentation methods that only rely on the limited and biased information from the annotated support images, LLaFS leverages the vast prior knowledge gained by LLM as an effective supplement and directly uses the LLM to segment images in a few-shot manner. To enable the text-based LLM to handle image-related tasks, we carefully design an input instruction that allows the LLM to produce segmentation results represented as polygons, and propose a region-attribute table to simulate the human visual mechanism and provide multi-modal guidance. We also synthesize pseudo samples and use curriculum learning for pretraining to augment data and achieve better optimization. LLaFS achieves state-of-the-art results on multiple datasets, showing the potential of using LLMs for few-shot computer vision tasks.
Submitted 3 April, 2024; v1 submitted 28 November, 2023;
originally announced November 2023.
-
OceanGPT: A Large Language Model for Ocean Science Tasks
Authors:
Zhen Bi,
Ningyu Zhang,
Yida Xue,
Yixin Ou,
Daxiong Ji,
Guozhou Zheng,
Huajun Chen
Abstract:
Ocean science, which delves into the oceans that are reservoirs of life and biodiversity, is of great significance given that oceans cover over 70% of our planet's surface. Recently, advances in Large Language Models (LLMs) have transformed the paradigm in science. Despite the success in other domains, current LLMs often fall short in catering to the needs of domain experts like oceanographers, and the potential of LLMs for ocean science is under-explored. The intrinsic reasons are the immense and intricate nature of ocean data as well as the necessity for higher granularity and richness in knowledge. To alleviate these issues, we introduce OceanGPT, the first-ever large language model in the ocean domain, which is expert in various ocean science tasks. We also propose OceanGPT, a novel framework to automatically obtain a large volume of ocean domain instruction data, which generates instructions based on multi-agent collaboration. Additionally, we construct the first oceanography benchmark, OceanBench, to evaluate the capabilities of LLMs in the ocean domain. Through comprehensive experiments, OceanGPT not only shows a higher level of knowledge expertise for ocean science tasks but also gains preliminary embodied intelligence capabilities in ocean technology.
Submitted 3 September, 2024; v1 submitted 3 October, 2023;
originally announced October 2023.
-
Revisiting Disentanglement and Fusion on Modality and Context in Conversational Multimodal Emotion Recognition
Authors:
Bobo Li,
Hao Fei,
Lizi Liao,
Yu Zhao,
Chong Teng,
Tat-Seng Chua,
Donghong Ji,
Fei Li
Abstract:
Enabling machines to understand human emotions in multimodal contexts under dialogue scenarios, a task known as multimodal emotion analysis in conversation (MM-ERC), has been a hot research topic. MM-ERC has received consistent attention in recent years, and a diverse range of methods has been proposed for securing better task performance. Most existing works treat MM-ERC as a standard multimodal classification problem and perform multimodal feature disentanglement and fusion for maximizing feature utility. Yet after revisiting the characteristics of MM-ERC, we argue that both the feature multimodality and conversational contextualization should be properly modeled simultaneously during the feature disentanglement and fusion steps. In this work, we target further pushing the task performance by taking full consideration of the above insights. On the one hand, during feature disentanglement, based on the contrastive learning technique, we devise a Dual-level Disentanglement Mechanism (DDM) to decouple the features into both the modality space and utterance space. On the other hand, during the feature fusion stage, we propose a Contribution-aware Fusion Mechanism (CFM) and a Context Refusion Mechanism (CRM) for multimodal and context integration, respectively. They together schedule the proper integrations of multimodal and context features. Specifically, CFM explicitly manages the multimodal feature contributions dynamically, while CRM flexibly coordinates the introduction of dialogue contexts. On two public MM-ERC datasets, our system achieves new state-of-the-art performance consistently. Further analyses demonstrate that all our proposed mechanisms greatly facilitate the MM-ERC task by making full use of the multimodal and context features adaptively. Note that our proposed methods have great potential to facilitate a broader range of other conversational multimodal tasks.
Submitted 12 August, 2023; v1 submitted 8 August, 2023;
originally announced August 2023.
-
DialogRE^C+: An Extension of DialogRE to Investigate How Much Coreference Helps Relation Extraction in Dialogs
Authors:
Yiyun Xiong,
Mengwei Dai,
Fei Li,
Hao Fei,
Bobo Li,
Shengqiong Wu,
Donghong Ji,
Chong Teng
Abstract:
Dialogue relation extraction (DRE), which identifies the relations between argument pairs in dialogue text, suffers much from the frequent occurrence of personal pronouns, or entity and speaker coreference. This work introduces a new benchmark dataset, DialogRE^C+, which brings coreference resolution into the DRE scenario. With the aid of high-quality coreference knowledge, the reasoning of argument relations is expected to be enhanced. In the DialogRE^C+ dataset, we manually annotate a total of 5,068 coreference chains over 36,369 argument mentions based on the existing DialogRE data, where four different coreference chain types, namely speaker chain, person chain, location chain, and organization chain, are explicitly marked. We further develop 4 coreference-enhanced graph-based DRE models, which learn effective coreference representations for improving the DRE task. We also train a coreference resolution model based on our annotations and evaluate the effect of automatically extracted coreference chains, demonstrating the practicality of our dataset and its potential for other domains and tasks.
Submitted 12 August, 2023; v1 submitted 8 August, 2023;
originally announced August 2023.
-
A Bi-directional Multi-hop Inference Model for Joint Dialog Sentiment Classification and Act Recognition
Authors:
Li Zheng,
Fei Li,
Yuyang Chai,
Chong Teng,
Donghong Ji
Abstract:
The joint task of Dialog Sentiment Classification (DSC) and Act Recognition (DAR) aims to predict the sentiment label and act label for each utterance in a dialog simultaneously. However, current methods encode the dialog context in only one direction, which limits their ability to thoroughly comprehend the context. Moreover, these methods overlook the explicit correlations between sentiment and act labels, which leads to an insufficient ability to capture rich sentiment and act clues and hinders effective and accurate reasoning. To address these issues, we propose a Bi-directional Multi-hop Inference Model (BMIM) that leverages a feature selection network and a bi-directional multi-hop inference network to iteratively extract and integrate rich sentiment and act clues in a bi-directional manner. We also employ contrastive learning and dual learning to explicitly model the correlations of sentiment and act labels. Our experiments on two widely-used datasets show that BMIM outperforms state-of-the-art baselines by at least 2.6% on F1 score in DAR and 1.4% on F1 score in DSC. Additionally, our proposed model not only improves the performance but also enhances the interpretability of the joint sentiment and act prediction task.
Submitted 12 August, 2023; v1 submitted 8 August, 2023;
originally announced August 2023.
-
Guided Patch-Grouping Wavelet Transformer with Spatial Congruence for Ultra-High Resolution Segmentation
Authors:
Deyi Ji,
Feng Zhao,
Hongtao Lu
Abstract:
Most existing ultra-high resolution (UHR) segmentation methods struggle to balance memory cost and local characterization accuracy, both of which are taken into account in our proposed Guided Patch-Grouping Wavelet Transformer (GPWFormer), which achieves impressive performance. In this work, GPWFormer is a Transformer ($\mathcal{T}$)-CNN ($\mathcal{C}$) mutual learning framework, where $\mathcal{T}$ takes the whole UHR image as input and harvests both local details and fine-grained long-range contextual dependencies, while $\mathcal{C}$ takes the downsampled image as input for learning the category-wise deep context. For the sake of high inference speed and low computation complexity, $\mathcal{T}$ partitions the original UHR image into patches and groups them dynamically, then learns the low-level local details with the lightweight multi-head Wavelet Transformer (WFormer) network. Meanwhile, the fine-grained long-range contextual dependencies are also captured during this process, since patches that are far away in the spatial domain can also be assigned to the same group. In addition, masks produced by $\mathcal{C}$ are utilized to guide the patch grouping process, providing a heuristic decision. Moreover, the congruence constraints between the two branches are also exploited to maintain the spatial consistency among the patches. Overall, we stack the multi-stage process in a pyramid way. Experiments show that GPWFormer outperforms the existing methods with significant improvements on five benchmark datasets.
Submitted 5 July, 2023; v1 submitted 2 July, 2023;
originally announced July 2023.
-
Revisiting Conversation Discourse for Dialogue Disentanglement
Authors:
Bobo Li,
Hao Fei,
Fei Li,
Shengqiong Wu,
Lizi Liao,
Yinwei Wei,
Tat-Seng Chua,
Donghong Ji
Abstract:
Dialogue disentanglement aims to detach the chronologically ordered utterances into several independent sessions. Conversation utterances are essentially organized and described by the underlying discourse, and thus dialogue disentanglement requires the full understanding and harnessing of the intrinsic discourse attribute. In this paper, we propose enhancing dialogue disentanglement by taking full advantage of the dialogue discourse characteristics. First, in the feature encoding stage, we construct heterogeneous graph representations to model the various dialogue-specific discourse structural features, including the static speaker-role structures (i.e., speaker-utterance and speaker-mentioning structure) and the dynamic contextual structures (i.e., the utterance-distance and partial-replying structure). We then develop a structure-aware framework to integrate the rich structural features for better modeling the conversational semantic context. Second, in the model learning stage, we perform optimization with a hierarchical ranking loss mechanism, which groups dialogue utterances into different discourse levels and carries out training hierarchically at the pair-wise and session-wise levels. Third, in the inference stage, we devise an easy-first decoding algorithm, which performs utterance pairing in an easy-to-hard manner with a global context, breaking the constraint of traditional sequential decoding order. On two benchmark datasets, our overall system achieves new state-of-the-art performances on all evaluations. In-depth analyses further demonstrate the efficacy of each proposed idea and also reveal how our methods help advance the task. Our work has great potential to facilitate broader multi-party multi-thread dialogue applications.
Submitted 10 June, 2023; v1 submitted 6 June, 2023;
originally announced June 2023.
-
TKDP: Threefold Knowledge-enriched Deep Prompt Tuning for Few-shot Named Entity Recognition
Authors:
Jiang Liu,
Hao Fei,
Fei Li,
Jingye Li,
Bobo Li,
Liang Zhao,
Chong Teng,
Donghong Ji
Abstract:
Few-shot named entity recognition (NER) exploits limited annotated instances to identify named mentions. Effectively transferring the internal or external resources thus becomes the key to few-shot NER. While the existing prompt tuning methods have shown remarkable few-shot performances, they still fail to make full use of knowledge. In this work, we investigate the integration of rich knowledge into prompt tuning for stronger few-shot NER. We propose incorporating the deep prompt tuning framework with threefold knowledge (namely TKDP), including the internal 1) context knowledge and the external 2) label knowledge & 3) sememe knowledge. TKDP encodes the three feature sources and incorporates them into the soft prompt embeddings, which are further injected into an existing pre-trained language model to facilitate predictions. On five benchmark datasets, our knowledge-enriched model improves by up to 11.53% F1 over the raw deep prompt method, and significantly outperforms 8 strong-performing baseline systems in 5-/10-/20-shot settings, showing great potential in few-shot NER. Our TKDP can be broadly adapted to other few-shot tasks without effort.
Submitted 10 June, 2023; v1 submitted 6 June, 2023;
originally announced June 2023.
-
ECQED: Emotion-Cause Quadruple Extraction in Dialogs
Authors:
Li Zheng,
Donghong Ji,
Fei Li,
Hao Fei,
Shengqiong Wu,
Jingye Li,
Bobo Li,
Chong Teng
Abstract:
The existing emotion-cause pair extraction (ECPE) task, unfortunately, ignores extracting the emotion type and cause type, while this fine-grained meta-information can be practically useful in real-world applications, e.g., chat robots and empathic dialog generation. Also, the current ECPE is limited to the scenario of a single text piece, neglecting studies at the dialog level that should have more realistic value. In this paper, we extend the ECPE task with a broader definition and scenario, presenting a new task, Emotion-Cause Quadruple Extraction in Dialogs (ECQED), which requires detecting emotion-cause utterance pairs and emotion and cause types. We present an ECQED model based on a structural and semantic heterogeneous graph as well as a parallel grid tagging scheme, which effectively incorporates the dialog context structure while solving the challenging overlapped quadruple issue. Via experiments, we show that introducing the fine-grained emotion and cause features evidently helps better dialog generation. Our proposed ECQED system also shows exceptional superiority over baselines on both the emotion-cause quadruple and pair extraction tasks, while being highly efficient.
Submitted 10 June, 2023; v1 submitted 6 June, 2023;
originally announced June 2023.
-
FACTUAL: A Benchmark for Faithful and Consistent Textual Scene Graph Parsing
Authors:
Zhuang Li,
Yuyang Chai,
Terry Yue Zhuo,
Lizhen Qu,
Gholamreza Haffari,
Fei Li,
Donghong Ji,
Quan Hung Tran
Abstract:
Textual scene graph parsing has become increasingly important in various vision-language applications, including image caption evaluation and image retrieval. However, existing scene graph parsers that convert image captions into scene graphs often suffer from two types of errors. First, the generated scene graphs fail to capture the true semantics of the captions or the corresponding images, resulting in a lack of faithfulness. Second, the generated scene graphs have high inconsistency, with the same semantics represented by different annotations.
To address these challenges, we propose a novel dataset, which involves re-annotating the captions in Visual Genome (VG) using a new intermediate representation called FACTUAL-MR. FACTUAL-MR can be directly converted into faithful and consistent scene graph annotations. Our experimental results clearly demonstrate that the parser trained on our dataset outperforms existing approaches in terms of faithfulness and consistency. This improvement leads to a significant performance boost in both image caption evaluation and zero-shot image retrieval tasks. Furthermore, we introduce a novel metric for measuring scene graph similarity, which, when combined with the improved scene graph parser, achieves state-of-the-art (SOTA) results on multiple benchmark datasets for the aforementioned tasks. The code and dataset are available at https://github.com/zhuang-li/FACTUAL .
Submitted 1 June, 2023; v1 submitted 27 May, 2023;
originally announced May 2023.
-
Ultra-High Resolution Segmentation with Ultra-Rich Context: A Novel Benchmark
Authors:
Deyi Ji,
Feng Zhao,
Hongtao Lu,
Mingyuan Tao,
Jieping Ye
Abstract:
With the increasing interest and rapid development of methods for Ultra-High Resolution (UHR) segmentation, a large-scale benchmark covering a wide range of scenes with full fine-grained dense annotations is urgently needed to facilitate the field. To this end, the URUR dataset is introduced, whose name stands for Ultra-High Resolution dataset with Ultra-Rich Context. As the name suggests, URUR contains a large number of images with high enough resolution (3,008 images of size 5,120x5,120), a wide range of complex scenes (from 63 cities), rich-enough context (1 million instances with 8 categories) and fine-grained annotations (about 80 billion manually annotated pixels), which is far superior to all the existing UHR datasets including DeepGlobe, Inria Aerial, UDD, etc. Moreover, we also propose WSDNet, a more efficient and effective framework for UHR segmentation, especially with ultra-rich context. Specifically, a multi-level Discrete Wavelet Transform (DWT) is naturally integrated to relieve the computation burden while preserving more spatial details, along with a Wavelet Smooth Loss (WSL) to reconstruct the original structured context and texture with a smoothness constraint. Experiments on several UHR datasets demonstrate its state-of-the-art performance. The dataset is available at https://github.com/jankyee/URUR.
Submitted 18 May, 2023;
originally announced May 2023.
-
Structural and Statistical Texture Knowledge Distillation for Semantic Segmentation
Authors:
Deyi Ji,
Haoran Wang,
Mingyuan Tao,
Jianqiang Huang,
Xian-Sheng Hua,
Hongtao Lu
Abstract:
Existing knowledge distillation works for semantic segmentation mainly focus on transferring high-level contextual knowledge from teacher to student. However, low-level texture knowledge is also of vital importance for characterizing the local structural pattern and global statistical property, such as boundary, smoothness, regularity and color contrast, which may not be well addressed by high-level deep features. In this paper, we intend to take full advantage of both structural and statistical texture knowledge and propose a novel Structural and Statistical Texture Knowledge Distillation (SSTKD) framework for semantic segmentation. Specifically, for structural texture knowledge, we introduce a Contourlet Decomposition Module (CDM) that decomposes low-level features with an iterative Laplacian pyramid and a directional filter bank to mine the structural texture knowledge. For statistical knowledge, we propose a Denoised Texture Intensity Equalization Module (DTIEM) to adaptively extract and enhance statistical texture knowledge through heuristic iterative quantization and denoising operations. Finally, each knowledge learning is supervised by an individual loss function, forcing the student network to mimic the teacher better from a broader perspective. Experiments show that the proposed method achieves state-of-the-art performance on Cityscapes, Pascal VOC 2012 and ADE20K datasets.
Submitted 5 July, 2023; v1 submitted 6 May, 2023;
originally announced May 2023.
-
On the Robustness of Aspect-based Sentiment Analysis: Rethinking Model, Data, and Training
Authors:
Hao Fei,
Tat-Seng Chua,
Chenliang Li,
Donghong Ji,
Meishan Zhang,
Yafeng Ren
Abstract:
Aspect-based sentiment analysis (ABSA) aims at automatically inferring the specific sentiment polarities toward certain aspects of products or services behind social media texts or reviews, and has been a fundamental application in real-world society. Since the early 2010s, ABSA has achieved extraordinarily high accuracy with various deep neural models. However, existing ABSA models with strong in-house performances may fail to generalize to some challenging cases where the contexts are variable, i.e., they show low robustness to real-world environments. In this study, we propose to enhance ABSA robustness by systematically rethinking the bottlenecks from all possible angles, including model, data, and training. First, we strengthen the current most robust syntax-aware models by further incorporating the rich external syntactic dependencies and the labels with aspect simultaneously with a universal-syntax graph convolutional network. From the corpus perspective, we propose to automatically induce high-quality synthetic training data of various types, allowing models to learn sufficient inductive bias for better robustness. Last, we perform adversarial training based on the rich pseudo data to enhance the resistance to context perturbation, and meanwhile employ contrastive learning to reinforce the representations of instances with contrastive sentiments. Extensive robustness evaluations are conducted. The results demonstrate that our enhanced syntax-aware model achieves better robustness performances than all the state-of-the-art baselines. By additionally incorporating our synthetic corpus, the robust testing results are improved by around 10% accuracy, and are then further improved by applying the advanced training strategies. In-depth analyses are presented to reveal the factors influencing ABSA robustness.
Submitted 19 April, 2023;
originally announced April 2023.
-
BUFFALO/Flashlights: Constraints on the abundance of lensed supergiant stars in the Spock galaxy at redshift 1
Authors:
Jose M. Diego,
Sung Kei Li,
Ashish K. Meena,
Anna Niemiec,
Ana Acebron,
Mathilde Jauzac,
Mitchell F. Struble,
Alfred Amruth,
Tom J. Broadhurst,
Catherine Cerny,
Harald Ebeling,
Alexei V. Filippenko,
Eric Jullo,
Patrick Kelly,
Anton M. Koekemoer,
David Lagatutta,
Jeremy Lim,
Marceau Limousin,
Guillaume Mahler,
Nency Patel,
Juan Remolina,
Johan Richard,
Keren Sharon,
Charles Steinhardt,
Keichii Umetsu
, et al. (5 additional authors not shown)
Abstract:
We present a constraint on the abundance of supergiant (SG) stars at redshift z approx. 1, based on recent observations of a strongly lensed arc at this redshift. First we derive a free-form model of MACS J0416.1-2403 using data from the BUFFALO program. The new lens model is based on 72 multiply lensed galaxies that produce 214 multiple images, making it the largest sample of spectroscopically confirmed lensed galaxies on this cluster. The larger coverage in BUFFALO allows us to measure the shear up to the outskirts of the cluster, and extend the range of lensing constraints up to ~ 1 Mpc from the central region, providing a mass estimate up to this radius. As an application, we make predictions for the number of high-redshift multiply-lensed galaxies detected in future observations with JWST. Then we focus on a previously known lensed galaxy at z=1.0054, nicknamed Spock, which contains four previously reported transients. We interpret these transients as microcaustic crossings of SG stars and compute the probability of such events. Based on simplifications regarding the stellar evolution, we find that microlensing (by stars in the intracluster medium) of SG stars at z=1.0054 can fully explain these events. The inferred abundance of SG stars is consistent with either (1) a number density of stars with bolometric luminosities beyond the Humphreys-Davidson (HD) limit (L ~ $6\times10^5 L_{\odot}$) that is below 400 stars per sq. kpc, or (2) the absence of stars beyond the HD limit but with a SG number density of ~ 9000 per sq. kpc for stars with luminosities between $10^5$ and $6\times10^5$. This is equivalent to one SG star per 10x10 pc$^2$. We finally make predictions for future observations with JWST's NIRcam. We find that in observations made with the F200W filter that reach 29 mag AB, if cool red SG stars exist at z~1 beyond the HD limit, they should be easily detected in this arc.
Submitted 18 April, 2023;
originally announced April 2023.
-
Interactions between two adjacent convection rolls in turbulent Rayleigh-Benard convection
Authors:
Eric Brown,
Dandan Ji
Abstract:
Rayleigh-Bénard convection experiments were done with two adjacent cubic cells with a partial wall in between to force the generation of two interacting convection rolls. Observed stable states include both counter-rotating and co-rotating states. The stability of each of these states and their dynamics were modeled by stochastic ordinary differential equations of motion in terms of the orientation, amplitude, and mean temperature of each convection roll. The form of the interaction terms is predicted based on an effective turbulent diffusion of temperature between the adjacent rolls. Predictions are made for stable fixed points of the co- and counter-rotating states. This suggests that the same turbulent thermal diffusivity that describes macroscopically averaged heat transport also controls the interactions between neighboring convection rolls. The surprising stability of co-rotating states is due to the temperature difference between the neighboring rolls becoming large enough that the heat flux between the rolls stabilizes the temperature profile of aligned co-rotating states. This temperature difference can be driven by heating the plates of the two cells to different mean temperatures. This shifts the orientations of the rolls of counter-rotating states in opposite directions, and for large temperature differences only co-rotating states are stable. Spontaneous switching between co-rotating and counter-rotating states is also observed. Switching to counter-rotating states occurs mainly due to cessation (a significant weakening of a convection roll), which reduces damping on changes in orientation, allowing the orientation to change rapidly due to diffusive fluctuations. Switching to co-rotating states is mainly driven by smaller diffusive fluctuations, which have a positive feedback that destabilizes the counter-rotating state.
Submitted 12 June, 2023; v1 submitted 25 February, 2023;
originally announced February 2023.
-
Booster Free From Spin Resonance For Future 100 km-scale Circular e$^{+}$e$^{-}$ Colliders
Authors:
Tao Chen,
Zhe Duan,
Daheng Ji,
Dou Wang
Abstract:
Acceleration of polarized electron (positron) beams in a booster synchrotron may suffer from depolarization due to crossings of many spin depolarization resonances, which could limit its applications. We have studied the spin depolarization resonance structure of a 100 km scale booster lattice of the Circular Electron Positron Collider (CEPC). The lattice has 8 arc regions with hundreds of FODO cells, interleaved with straight sections, which leads to a high periodicity. Our analysis shows the contributions to the strength of intrinsic and imperfection spin resonances add up coherently near the super strong resonances beyond 120 GeV, but mostly cancel out and result in generally weak resonance strengths at lower beam energies. Detailed simulations confirm that beam polarization can be mostly maintained in the fast acceleration to 45.6 GeV and 80 GeV, but severe depolarization may occur at even higher energies. This study suggests the possibility of acceleration of polarized electron (positron) beams to ultra-high beam energies without the help of Siberian snakes, and supports injecting highly polarized beams into the collider rings as an attractive solution for resonant depolarization measurements and longitudinal polarized colliding beam experiments for future 100 km scale circular e$^{+}$e$^{-}$ colliders.
Submitted 6 June, 2023; v1 submitted 10 February, 2023;
originally announced February 2023.
-
DiaASQ: A Benchmark of Conversational Aspect-based Sentiment Quadruple Analysis
Authors:
Bobo Li,
Hao Fei,
Fei Li,
Yuhan Wu,
Jinsong Zhang,
Shengqiong Wu,
Jingye Li,
Yijiang Liu,
Lizi Liao,
Tat-Seng Chua,
Donghong Ji
Abstract:
The rapid development of aspect-based sentiment analysis (ABSA) within recent decades shows great potential for real-world society. The current ABSA works, however, are mostly limited to the scenario of a single text piece, leaving the study in dialogue contexts unexplored. To bridge the gap between fine-grained sentiment analysis and conversational opinion mining, in this work, we introduce a novel task of conversational aspect-based sentiment quadruple analysis, namely DiaASQ, aiming to detect the quadruple of target-aspect-opinion-sentiment in a dialogue. We manually construct a large-scale high-quality DiaASQ dataset in both Chinese and English languages. We deliberately develop a neural model to benchmark the task, which advances in effectively performing end-to-end quadruple prediction, and manages to incorporate rich dialogue-specific and discourse feature representations for better cross-utterance quadruple extraction. We hope the new benchmark will spur more advancements in the sentiment analysis community.
Submitted 22 May, 2023; v1 submitted 10 November, 2022;
originally announced November 2022.
-
Umehara algebra and complex submanifolds of indefinite complex space forms
Authors:
Xu Zhang,
Donghai Ji
Abstract:
The Umehara algebra is studied, motivated by the problem of the non-existence of common complex submanifolds. In this paper, we prove some new results in Umehara algebra and obtain some applications. In particular, if a complex manifold admits a holomorphic polynomial isometric immersion to one indefinite complex space form, then it cannot admit a holomorphic isometric immersion to another indefinite complex space form of different type. Other consequences include the non-existence of common complex submanifolds for indefinite complex projective space or hyperbolic space and a complex manifold with a distinguished metric, such as homogeneous domains, the Hartogs triangle, the minimal ball, the symmetrized polydisc, etc., equipped with their intrinsic Bergman metrics, which generalizes more or less all existing results.
Submitted 2 November, 2022;
originally announced November 2022.
-
TOE: A Grid-Tagging Discontinuous NER Model Enhanced by Embedding Tag/Word Relations and More Fine-Grained Tags
Authors:
Jiang Liu,
Donghong Ji,
Jingye Li,
Dongdong Xie,
Chong Teng,
Liang Zhao,
Fei Li
Abstract:
So far, discontinuous named entity recognition (NER) has received increasing research attention and many related methods have surged such as hypergraph-based methods, span-based methods, and sequence-to-sequence (Seq2Seq) methods, etc. However, these methods more or less suffer from some problems such as decoding ambiguity and efficiency, which limit their performance. Recently, grid-tagging methods, which benefit from the flexible design of tagging systems and model architectures, have shown superiority to adapt for various information extraction tasks. In this paper, we follow the line of such methods and propose a competitive grid-tagging model for discontinuous NER. We call our model TOE because we incorporate two kinds of Tag-Oriented Enhancement mechanisms into a state-of-the-art (SOTA) grid-tagging model that casts the NER problem into word-word relationship prediction. First, we design a Tag Representation Embedding Module (TREM) to force our model to consider not only word-word relationships but also word-tag and tag-tag relationships. Concretely, we construct tag representations and embed them into TREM, so that TREM can treat tag and word representations as queries/keys/values and utilize self-attention to model their relationships. On the other hand, motivated by the Next-Neighboring-Word (NNW) and Tail-Head-Word (THW) tags in the SOTA model, we add two new symmetric tags, namely Previous-Neighboring-Word (PNW) and Head-Tail-Word (HTW), to model more fine-grained word-word relationships and alleviate error propagation from tag prediction. In the experiments of three benchmark datasets, namely CADEC, ShARe13 and ShARe14, our TOE model pushes the SOTA results by about 0.83%, 0.05% and 0.66% in F1, demonstrating its effectiveness.
Submitted 1 November, 2022;
originally announced November 2022.
-
Entity-centered Cross-document Relation Extraction
Authors:
Fengqi Wang,
Fei Li,
Hao Fei,
Jingye Li,
Shengqiong Wu,
Fangfang Su,
Wenxuan Shi,
Donghong Ji,
Bo Cai
Abstract:
Relation Extraction (RE) is a fundamental task of information extraction, which has attracted a large amount of research attention. Previous studies focus on extracting the relations within a sentence or document, while researchers have recently begun to explore cross-document RE. However, current cross-document RE methods directly utilize text snippets surrounding target entities in multiple given documents, which brings in considerable noisy and non-relevant sentences. Moreover, they utilize all the text paths in a document bag in a coarse-grained way, without considering the connections between these text paths. In this paper, we aim to address both of these shortcomings and push the state-of-the-art for cross-document RE. First, we focus on input construction for our RE model and propose an entity-based document-context filter to retain useful information in the given documents by using the bridge entities in the text paths. Second, we propose a cross-document RE model based on cross-path entity relation attention, which allows the entity relations across text paths to interact with each other. We compare our cross-document RE method with the state-of-the-art methods on the dataset CodRED. Our method outperforms them by at least 10% in F1, thus demonstrating its effectiveness.
Submitted 29 October, 2022;
originally announced October 2022.
-
Weakly-Supervised Multi-Granularity Map Learning for Vision-and-Language Navigation
Authors:
Peihao Chen,
Dongyu Ji,
Kunyang Lin,
Runhao Zeng,
Thomas H. Li,
Mingkui Tan,
Chuang Gan
Abstract:
We address a practical yet challenging problem of training robot agents to navigate in an environment following a path described by language instructions. The instructions often contain descriptions of objects in the environment. To achieve accurate and efficient navigation, it is critical to build a map that accurately represents both the spatial location and the semantic information of the objects in the environment. However, enabling a robot to build a map that represents the environment well is extremely challenging, as the environment often involves diverse objects with various attributes. In this paper, we propose a multi-granularity map, which contains both fine-grained object details (e.g., color, texture) and semantic classes, to represent objects more comprehensively. Moreover, we propose a weakly-supervised auxiliary task, which requires the agent to localize instruction-relevant objects on the map. Through this task, the agent not only learns to localize the instruction-relevant objects for navigation but is also encouraged to learn a better map representation that reveals object information. We then feed the learned map and instruction to a waypoint predictor to determine the next navigation goal. Experimental results show that our method outperforms the state of the art by 4.0% and 4.6% in success rate in seen and unseen environments, respectively, on the VLN-CE dataset. Code is available at https://github.com/PeihaoChen/WS-MGMap.
Submitted 14 October, 2022;
originally announced October 2022.
-
Learning Active Camera for Multi-Object Navigation
Authors:
Peihao Chen,
Dongyu Ji,
Kunyang Lin,
Weiwen Hu,
Wenbing Huang,
Thomas H. Li,
Mingkui Tan,
Chuang Gan
Abstract:
Getting robots to navigate to multiple objects autonomously is essential yet difficult in robot applications. One of the key challenges is how to explore environments efficiently with camera sensors only. Existing navigation methods mainly focus on fixed cameras, and few attempts have been made to navigate with active cameras. As a result, the agent may take a very long time to perceive the environment due to its limited camera scope. In contrast, humans typically gain a larger field of view by looking around for a better perception of the environment. How to make robots perceive the environment as efficiently as humans is a fundamental problem in robotics. In this paper, we consider navigating to multiple objects more efficiently with active cameras. Specifically, we cast camera movement as a Markov Decision Process and reformulate the active camera problem as a reinforcement learning problem. However, we have to address two new challenges: 1) how to learn a good camera policy in complex environments and 2) how to coordinate it with the navigation policy. To address these, we carefully design a reward function to encourage the agent to explore more areas by actively moving the camera. Moreover, we exploit human experience to infer a rule-based camera action to guide the learning process. Finally, to better coordinate the two policies, the camera policy takes navigation actions into account when making camera movement decisions. Experimental results show that our camera policy consistently improves the performance of multi-object navigation over four baselines on two datasets.
Submitted 14 October, 2022;
originally announced October 2022.