-
Fast Downflows Observed during a Polar Crown Filament Eruption
Authors:
Zheng Sun,
Hui Tian,
Ting Li,
Rui Liu,
Yadan Duan
Abstract:
Solar filaments can undergo eruptions and result in the formation of coronal mass ejections (CMEs), which could significantly impact planetary space environments. Observations of eruptions involving polar crown filaments, situated in the polar regions of the Sun, are limited. In this study, we report a polar crown filament eruption (SOL2023-06-12), characterized by fast downflows below the filament. The downflows appear immediately after the onset of the filament eruption and persist for approximately 2 hours, exhibiting plane-of-sky (POS) velocities ranging between 92 and 144 km s$^{-1}$. They originate from the leading edge of the filament and no clear acceleration is observed. Intriguingly, these downflows appear at two distinct sites, symmetrically positioned at the opposite ends of the conjugate flare ribbons. Based on the observations, we propose that the filament might be supported by a magnetic flux rope (MFR), and these downflows possibly occur along the legs of the MFR. The downflows likely result from continuous reconnections between the MFR and the overlying magnetic field structures, and could either be reconnection outflows or redirected filament materials. We also observed horizontal drifting of the locations of downflows, which might correspond to the MFR's footpoint drifting. This type of downflow can potentially be utilized to track the footpoints of MFRs during eruptions.
Submitted 28 August, 2024;
originally announced August 2024.
-
Optical Routing via High Efficiency Composite Acoustic Diffraction
Authors:
Yuxiang Zhao,
Jiangyong Hu,
Ruijuan Liu,
Ruochen Gao,
Yiming Li,
Xiao Zhang,
Huanfeng Zhu,
Saijun Wu
Abstract:
Acousto-optical modulation (AOM) is a powerful and widely used technique for rapidly controlling the frequency, phase, intensity, and direction of light. Based on Bragg diffraction, AOMs typically exhibit moderate diffraction efficiency, often less than 90\% even for collimated inputs. In this work, we demonstrate that this efficiency can be significantly improved using a composite (CP) setup comprising a pair of 4-F-linked AOMs, enabling 2-by-2 beamsplitting with fully tunable splitting amplitude and phase. The efficiency enhancement arises from two effects, termed "momentum echo" and "high-order rephasing," which can be simultaneously optimized by adjusting the relative distance between the two AOMs. This method is resource-efficient, does not require ultra-collimation, and maintains control bandwidth. Experimentally, we achieved a diffraction efficiency exceeding 99\% (excluding insertion loss) and a 35 dB single-mode suppression of the 0th-order beam, demonstrating a full-contrast optical router with a switching time of less than 100~nanoseconds. Theoretically, we formulate the dynamics of CP-AOM in terms of multi-mode quantum control and discuss extensions beyond the $N=2$ configuration presented in this work. The substantially enhanced performance of CP-AOMs, coupled with reduced acoustic amplitude requirements, may significantly advance our ability to accurately control light at high speeds with low-loss acousto-optics.
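As a rough aid for reading the 35 dB suppression figure above, the short sketch below converts a suppression value in decibels into a linear power ratio. It assumes the figure refers to optical power (so the conversion is 10·log10); the function name `db_to_power_ratio` is illustrative and does not come from the paper.

```python
def db_to_power_ratio(suppression_db: float) -> float:
    """Convert a suppression figure in dB to a linear power ratio.

    Assumption: the dB value describes optical power, so the ratio is
    10 ** (-dB / 10). This is a generic conversion, not the authors' code.
    """
    return 10.0 ** (-suppression_db / 10.0)


# Fraction of power that would remain in the 0th-order beam at 35 dB suppression.
residual_fraction = db_to_power_ratio(35.0)
print(f"residual fraction: {residual_fraction:.2e}")
```

Under that assumption, 35 dB corresponds to roughly 3 parts in 10,000 of the power remaining in the suppressed beam.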
Submitted 27 August, 2024;
originally announced August 2024.
-
General targeted machine learning for modern causal mediation analysis
Authors:
Richard Liu,
Nicholas T. Williams,
Kara E. Rudolph,
Iván Díaz
Abstract:
Causal mediation analyses investigate the mechanisms through which causes exert their effects, and are therefore central to scientific progress. The literature on the non-parametric definition and identification of mediational effects in rigorous causal models has grown significantly in recent years, and there has been important progress in addressing challenges in the interpretation and identification of such effects. Despite great progress on the causal inference front, statistical methodology for non-parametric estimation has lagged behind, with few or no methods available for tackling non-parametric estimation in the presence of multiple, continuous, or high-dimensional mediators. In this paper we show that the identification formulas for six popular non-parametric approaches to mediation analysis proposed in recent years can be recovered from just two statistical estimands. We leverage this finding to propose an all-purpose one-step estimation algorithm that can be coupled with machine learning in any mediation study that uses any of these six definitions of mediation. The estimators have desirable properties, such as $\sqrt{n}$-convergence and asymptotic normality. Estimating the first-order correction for the one-step estimator requires estimation of complex density ratios on the potentially high-dimensional mediators, a challenge that is solved using recent advancements in so-called Riesz learning. We illustrate the properties of our methods in a simulation study and illustrate their use on real data to estimate the extent to which pain management practices mediate the total effect of having a chronic pain disorder on opioid use disorder.
Submitted 26 August, 2024;
originally announced August 2024.
-
The Frame-Dragging effect on the excitation rate of atoms
Authors:
Rui-Chen Liu,
C. P. Sun
Abstract:
The frame-dragging phenomenon in gravitational fields is revisited to explore the geometric effects induced by spacetime curvature. We quantize a massless scalar field in the spacetime of a rotating sphere, incorporating the frame-dragging frequency into the field modes. The excitation rate for an atom undergoing uniform circular motion and interacting with the scalar field is calculated. Our results reveal that the time-dependent excitation rates of atoms following different trajectories exhibit a common envelope, from which the frame-dragging frequency can be effectively extracted. This discovery leads us to propose a novel detection scheme for measuring the frame-dragging frequency caused by rotating celestial bodies, eliminating the need for traditional starlight calibration methods.
Submitted 23 August, 2024;
originally announced August 2024.
-
Month-long-lifetime microwave spectral holes in an erbium-doped scheelite crystal at millikelvin temperature
Authors:
Zhiren Wang,
Sen Lin,
Marianne Le Dantec,
Miloš Rančić,
Philippe Goldner,
Sylvain Bertaina,
Thierry Chanelière,
Ren-Bao Liu,
Daniel Esteve,
Denis Vion,
Emmanuel Flurin,
Patrice Bertet
Abstract:
Rare-earth-ion (REI) ensembles in crystals have remarkable optical and spin properties characterized by narrow homogeneous linewidths relative to the inhomogeneous ensemble broadening. This makes it possible to precisely tailor the ensemble spectral density and therefore the absorption profile by applying narrow-linewidth radiation to transfer population into auxiliary levels, a process broadly known as spectral hole burning (SHB). REI-doped crystals find applications in information processing, both classical (pattern recognition, filtering, spectral analysis) and quantum (photon storage), all protocols requiring suitable ensemble preparation by SHB as a first step. In Er$^{3+}$-doped materials, the longest reported hole lifetime is one minute, and longer lifetimes are desirable. Here, we report SHB and accumulated echo measurements in a scheelite crystal of CaWO$_4$ by pumping the electron spin transition of Er$^{3+}$ ions at microwave frequencies and millikelvin temperatures, with nuclear spin states of neighboring $^{183}$W atoms serving as the auxiliary levels. The lifetime of the holes and accumulated echoes rises steeply as the sample temperature is decreased, exceeding a month at 10 mK. Our results demonstrate that millikelvin temperatures can be beneficial for signal processing applications requiring long spectral hole lifetimes.
Submitted 22 August, 2024;
originally announced August 2024.
-
MCDubber: Multimodal Context-Aware Expressive Video Dubbing
Authors:
Yuan Zhao,
Zhenqi Jia,
Rui Liu,
De Hu,
Feilong Bao,
Guanglai Gao
Abstract:
Automatic Video Dubbing (AVD) aims to take the given script and generate speech that aligns with lip motion and prosody expressiveness. Current AVD models mainly utilize visual information of the current sentence to enhance the prosody of synthesized speech. However, it is crucial to consider whether the prosody of the generated dubbing aligns with the multimodal context, as the dubbing will be combined with the original context in the final video. This aspect has been overlooked in previous studies. To address this issue, we propose a Multimodal Context-aware video Dubbing model, termed MCDubber, to convert the modeling object from a single sentence to a longer sequence with context information to ensure the consistency of the global context prosody. MCDubber comprises three main components: (1) A context duration aligner aims to learn the context-aware alignment between the text and lip frames; (2) A context prosody predictor seeks to read the global context visual sequence and predict the context-aware global energy and pitch; (3) A context acoustic decoder ultimately predicts the global context mel-spectrogram with the assistance of adjacent ground-truth mel-spectrograms of the target sentence. Through this process, MCDubber fully considers the influence of multimodal context on the prosody expressiveness of the current sentence when dubbing. The extracted mel-spectrogram belonging to the target sentence from the output context mel-spectrograms is the final required dubbing audio. Extensive experiments on the Chem benchmark dataset demonstrate that our MCDubber significantly improves dubbing expressiveness compared to all advanced baselines. The code and demos are available at https://github.com/XiaoYuanJun-zy/MCDubber.
Submitted 21 August, 2024;
originally announced August 2024.
-
GRANDlib: A simulation pipeline for the Giant Radio Array for Neutrino Detection (GRAND)
Authors:
GRAND Collaboration,
Rafael Alves Batista,
Aurélien Benoit-Lévy,
Teresa Bister,
Martina Bohacova,
Mauricio Bustamante,
Washington Carvalho,
Yiren Chen,
LingMei Cheng,
Simon Chiche,
Jean-Marc Colley,
Pablo Correa,
Nicoleta Cucu Laurenciu,
Zigao Dai,
Rogerio M. de Almeida,
Beatriz de Errico,
Sijbrand de Jong,
João R. T. de Mello Neto,
Krijn D. de Vries,
Valentin Decoene,
Peter B. Denton,
Bohao Duan,
Kaikai Duan,
Ralph Engel,
William Erba
, et al. (90 additional authors not shown)
Abstract:
The operation of upcoming ultra-high-energy cosmic-ray, gamma-ray, and neutrino radio-detection experiments, like the Giant Radio Array for Neutrino Detection (GRAND), poses significant computational challenges involving the production of numerous simulations of particle showers and their detection, and a high data throughput. GRANDlib is an open-source software tool designed to meet these challenges. Its primary goal is to perform end-to-end simulations of the detector operation, from the interaction of ultra-high-energy particles, through -- by interfacing with external air-shower simulations -- the ensuing particle shower development and its radio emission, to its detection by antenna arrays and its processing by data-acquisition systems. Additionally, GRANDlib manages the visualization, storage, and retrieval of experimental and simulated data. We present an overview of GRANDlib to serve as the basis of future GRAND analyses.
Submitted 20 August, 2024;
originally announced August 2024.
-
MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval
Authors:
Haoran Tang,
Meng Cao,
Jinfa Huang,
Ruyang Liu,
Peng Jin,
Ge Li,
Xiaodan Liang
Abstract:
Text-Video Retrieval (TVR) aims to align and associate relevant video content with corresponding natural language queries. Most existing TVR methods are based on large-scale pre-trained vision-language models (e.g., CLIP). However, due to the inherent plain structure of CLIP, few TVR methods explore the multi-scale representations which offer richer contextual information for a more thorough understanding. To this end, we propose MUSE, a multi-scale mamba with linear computational complexity for efficient cross-resolution modeling. Specifically, the multi-scale representations are generated by applying a feature pyramid on the last single-scale feature map. Then, we employ the Mamba structure as an efficient multi-scale learner to jointly learn scale-wise representations. Furthermore, we conduct comprehensive studies to investigate different model structures and designs. Extensive results on three popular benchmarks have validated the superiority of MUSE.
Submitted 20 August, 2024;
originally announced August 2024.
-
Physics-Aware Combinatorial Assembly Planning using Deep Reinforcement Learning
Authors:
Ruixuan Liu,
Alan Chen,
Weiye Zhao,
Changliu Liu
Abstract:
Combinatorial assembly uses standardized unit primitives to build objects that satisfy user specifications. Lego is a widely used platform for combinatorial assembly, in which people use unit primitives (i.e., Lego bricks) to build highly customizable 3D objects. This paper studies sequence planning for physical combinatorial assembly using Lego. Given the shape of the desired object, we want to find a sequence of actions for placing Lego bricks to build the target object. In particular, we aim to ensure the planned assembly sequence is physically executable. However, assembly sequence planning (ASP) for combinatorial assembly is particularly challenging due to its combinatorial nature, i.e., the vast number of possible combinations and complex constraints. To address the challenges, we employ deep reinforcement learning to learn a construction policy for placing unit primitives sequentially to build the desired object. Specifically, we design an online physics-aware action mask that efficiently filters out invalid actions and guides policy learning. In the end, we demonstrate that the proposed method successfully plans physically valid assembly sequences for constructing different Lego structures. The generated construction plan can be executed in the real world.
Submitted 19 August, 2024;
originally announced August 2024.
-
GLIMMER: Incorporating Graph and Lexical Features in Unsupervised Multi-Document Summarization
Authors:
Ran Liu,
Ming Liu,
Min Yu,
Jianguo Jiang,
Gang Li,
Dan Zhang,
Jingyuan Li,
Xiang Meng,
Weiqing Huang
Abstract:
Pre-trained language models are increasingly being used in multi-document summarization tasks. However, these models need large-scale corpora for pre-training and are domain-dependent. Other non-neural unsupervised summarization approaches mostly rely on key sentence extraction, which can lead to information loss. To address these challenges, we propose a lightweight yet effective unsupervised approach called GLIMMER: a Graph and LexIcal features based unsupervised Multi-docuMEnt summaRization approach. It first constructs a sentence graph from the source documents, then automatically identifies semantic clusters by mining low-level features from raw texts, thereby improving intra-cluster correlation and the fluency of generated sentences. Finally, it summarizes clusters into natural sentences. Experiments conducted on Multi-News, Multi-XScience and DUC-2004 demonstrate that our approach outperforms existing unsupervised approaches. Furthermore, it surpasses state-of-the-art pre-trained multi-document summarization models (e.g. PEGASUS and PRIMERA) under zero-shot settings in terms of ROUGE scores. Additionally, human evaluations indicate that summaries generated by GLIMMER achieve high readability and informativeness scores. Our code is available at https://github.com/Oswald1997/GLIMMER.
Submitted 19 August, 2024;
originally announced August 2024.
-
Reconstruct Spine CT from Biplanar X-Rays via Diffusion Learning
Authors:
Zhi Qiao,
Xuhui Liu,
Xiaopeng Wang,
Runkun Liu,
Xiantong Zhen,
Pei Dong,
Zhen Qian
Abstract:
Intraoperative CT imaging serves as a crucial resource for surgical guidance; however, it may not always be readily accessible or practical to implement. In scenarios where CT imaging is not an option, reconstructing CT scans from X-rays can offer a viable alternative. In this paper, we introduce an innovative method for 3D CT reconstruction utilizing biplanar X-rays. Distinct from previous research that relies on conventional image generation techniques, our approach leverages a conditional diffusion process to tackle the task of reconstruction. More precisely, we employ a diffusion-based probabilistic model trained to produce 3D CT images based on orthogonal biplanar X-rays. To improve the structural integrity of the reconstructed images, we incorporate a novel projection loss function. Experimental results validate that our proposed method surpasses existing state-of-the-art benchmarks in both visual image quality and multiple evaluative metrics. Specifically, our technique achieves a higher Structural Similarity Index (SSIM) of 0.83, a relative increase of 10\%, and a lower Fréchet Inception Distance (FID) of 83.43, which represents a relative decrease of 25\%.
Submitted 20 August, 2024; v1 submitted 19 August, 2024;
originally announced August 2024.
-
ByCAN: Reverse Engineering Controller Area Network (CAN) Messages from Bit to Byte Level
Authors:
Xiaojie Lin,
Baihe Ma,
Xu Wang,
Guangsheng Yu,
Ying He,
Ren Ping Liu,
Wei Ni
Abstract:
As the primary standard protocol for modern cars, the Controller Area Network (CAN) is a critical research target for automotive cybersecurity threats and autonomous applications. As the decoding specification of CAN is a proprietary black-box maintained by Original Equipment Manufacturers (OEMs), conducting related research and industry developments can be challenging without a comprehensive understanding of the meaning of CAN messages. In this paper, we propose a fully automated reverse-engineering system, named ByCAN, to reverse engineer CAN messages. ByCAN outperforms existing research by introducing byte-level clusters and integrating multiple features at both byte and bit levels. ByCAN employs the clustering and template matching algorithms to automatically decode the specifications of CAN frames without the need for prior knowledge. Experimental results demonstrate that ByCAN achieves high accuracy in slicing and labeling performance, i.e., the identification of CAN signal boundaries and labels. In the experiments, ByCAN achieves slicing accuracy of 80.21%, slicing coverage of 95.21%, and labeling accuracy of 68.72% for general labels when analyzing the real-world CAN frames.
Submitted 17 August, 2024;
originally announced August 2024.
-
Fishers Harvest Parallel Unlearning in Inherited Model Networks
Authors:
Xiao Liu,
Mingyuan Li,
Xu Wang,
Guangsheng Yu,
Wei Ni,
Lixiang Li,
Haipeng Peng,
Renping Liu
Abstract:
Unlearning in various learning frameworks remains challenging, with the continuous growth and updates of models exhibiting complex inheritance relationships. This paper presents a novel unlearning framework, which enables fully parallel unlearning among models exhibiting inheritance. A key enabler is the new Unified Model Inheritance Graph (UMIG), which captures the inheritance using a Directed Acyclic Graph (DAG). Central to our framework is the new Fisher Inheritance Unlearning (FIUn) algorithm, which utilizes the Fisher Information Matrix (FIM) from initial unlearning models to pinpoint impacted parameters in inherited models. By employing FIM, the FIUn method breaks the sequential dependencies among the models, facilitating simultaneous unlearning and reducing computational overhead. We further design a scheme to merge disparate FIMs into a single matrix, synchronizing updates across inherited models. Experiments confirm the effectiveness of our unlearning framework. For single-class tasks, it achieves complete unlearning with 0\% accuracy for unlearned labels while maintaining 94.53\% accuracy for retained labels on average. For multi-class tasks, the accuracy is 1.07\% for unlearned labels and 84.77\% for retained labels on average. Our framework accelerates unlearning by 99\% compared to alternative methods.
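To make the idea of using Fisher information to pinpoint impacted parameters more concrete, here is a minimal, generic sketch. It is not the FIUn algorithm from the paper: it only shows the common diagonal empirical approximation (the mean of squared per-sample gradients) followed by a top-k selection. The gradient values and the helper names `diagonal_fisher` and `top_k_parameters` are hypothetical.

```python
from typing import List


def diagonal_fisher(per_sample_grads: List[List[float]]) -> List[float]:
    """Diagonal empirical Fisher: mean of squared per-sample gradients.

    Each inner list is assumed to hold the gradient of the log-likelihood
    with respect to every parameter for one sample of the data to forget.
    """
    n = len(per_sample_grads)
    dim = len(per_sample_grads[0])
    return [sum(g[i] ** 2 for g in per_sample_grads) / n for i in range(dim)]


def top_k_parameters(fisher_diag: List[float], k: int) -> List[int]:
    """Return the indices of the k parameters with the largest Fisher values."""
    ranked = sorted(range(len(fisher_diag)), key=lambda i: fisher_diag[i], reverse=True)
    return sorted(ranked[:k])


# Hypothetical gradients for two samples and four parameters.
grads = [[0.1, 2.0, 0.0, -1.0], [0.2, -1.5, 0.1, 1.2]]
fisher = diagonal_fisher(grads)
impacted = top_k_parameters(fisher, k=2)
print(fisher, impacted)
```

In this toy example the parameters at indices 1 and 3 carry the most Fisher information, so they would be the candidates for modification; how such scores are shared or merged across inherited models is specific to the paper and is not sketched here.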
Submitted 20 August, 2024; v1 submitted 15 August, 2024;
originally announced August 2024.
-
Scaling Up Natural Language Understanding for Multi-Robots Through the Lens of Hierarchy
Authors:
Shaojun Xu,
Xusheng Luo,
Yutong Huang,
Letian Leng,
Ruixuan Liu,
Changliu Liu
Abstract:
Long-horizon planning is hindered by challenges such as uncertainty accumulation, computational complexity, delayed rewards and incomplete information. This work proposes an approach to exploit the task hierarchy from human instructions to facilitate multi-robot planning. Using Large Language Models (LLMs), we propose a two-step approach to translate multi-sentence instructions into a structured language, Hierarchical Linear Temporal Logic (LTL), which serves as a formal representation for planning. Initially, LLMs transform the instructions into a hierarchical representation defined as Hierarchical Task Tree, capturing the logical and temporal relations among tasks. Following this, a domain-specific fine-tuned LLM translates sub-tasks of each task into flat LTL formulas, aggregating them to form hierarchical LTL specifications. These specifications are then leveraged for planning using off-the-shelf planners. Our framework not only bridges the gap between instructions and algorithmic planning but also showcases the potential of LLMs in harnessing hierarchical reasoning to automate multi-robot task planning. Through evaluations in both simulation and real-world experiments involving human participants, we demonstrate that our method can handle more complex instructions compared to existing methods. The results indicate that our approach achieves higher success rates and lower costs in multi-robot task allocation and plan generation. Demo videos are available at https://youtu.be/7WOrDKxIMIs .
Submitted 15 August, 2024;
originally announced August 2024.
-
Intensity correlations in measurement-device-independent quantum key distribution
Authors:
Junxuan Liu,
Tianyi Xing,
Ruiyin Liu,
Zihao Chen,
Hao Tan,
Anqi Huang
Abstract:
The intensity correlations due to imperfect modulation during the quantum-state preparation in a measurement-device-independent quantum key distribution (MDI QKD) system compromise its security performance. Therefore, it is crucial to assess the impact of intensity correlations on the practical security of MDI QKD systems. In this work, we propose a theoretical model that quantitatively analyzes the secure key rate of MDI QKD systems under intensity correlations. Furthermore, we apply the theoretical model to a practical MDI QKD system with measured intensity correlations, which shows that the system struggles to generate keys efficiently under this model. We also explore the boundary conditions of intensity correlations to generate secret keys. This study extends the security analysis of intensity correlations to MDI QKD protocols, providing a methodology to evaluate the practical security of MDI QKD systems.
Submitted 18 August, 2024; v1 submitted 15 August, 2024;
originally announced August 2024.
-
Characterization of Intensity Correlation via Single-photon Detection in Quantum Key Distribution
Authors:
Tianyi Xing,
Junxuan Liu,
Likang Zhang,
Min-Yan Wang,
Yu-Huai Li,
Ruiyin Liu,
Qingquan Peng,
Dongyang Wang,
Yaxuan Wang,
Hongwei Liu,
Wei Li,
Yuan Cao,
Anqi Huang
Abstract:
One of the most significant vulnerabilities in the source unit of quantum key distribution (QKD) is the correlation between quantum states after modulation, which should be characterized and evaluated for its practical security performance. In this work, we propose a methodology to characterize the intensity correlation according to the single-photon detection results in the measurement unit without modifying the configuration of the QKD system. In contrast to previous research that employs an extra classical optical detector to measure the correlation, our method can directly analyse the detection data generated during the raw key exchange, enabling the correlation to be characterized during real-time system operation. The basic method is applied to a BB84 QKD system and the characterized correlation decreases the secure key rate shown by the security proof. Furthermore, the method is extended and applied to characterize the correlation from the result of Bell-state measurement, which demonstrates its applicability to a running full-scheme MDI QKD system. This study provides an approach for standard certification of a QKD system.
Submitted 18 August, 2024; v1 submitted 15 August, 2024;
originally announced August 2024.
-
Training Language Models on the Knowledge Graph: Insights on Hallucinations and Their Detectability
Authors:
Jiri Hron,
Laura Culp,
Gamaleldin Elsayed,
Rosanne Liu,
Ben Adlam,
Maxwell Bileschi,
Bernd Bohnet,
JD Co-Reyes,
Noah Fiedel,
C. Daniel Freeman,
Izzeddin Gur,
Kathleen Kenealy,
Jaehoon Lee,
Peter J. Liu,
Gaurav Mishra,
Igor Mordatch,
Azade Nova,
Roman Novak,
Aaron Parisi,
Jeffrey Pennington,
Alex Rizkowsky,
Isabelle Simpson,
Hanie Sedghi,
Jascha Sohl-dickstein,
Kevin Swersky
, et al. (6 additional authors not shown)
Abstract:
While many capabilities of language models (LMs) improve with increased training budget, the influence of scale on hallucinations is not yet fully understood. Hallucinations come in many forms, and there is no universally accepted definition. We thus focus on studying only those hallucinations where a correct answer appears verbatim in the training set. To fully control the training data content, we construct a knowledge graph (KG)-based dataset, and use it to train a set of increasingly large LMs. We find that for a fixed dataset, larger and longer-trained LMs hallucinate less. However, hallucinating on $\leq5$% of the training data requires an order of magnitude larger model, and thus an order of magnitude more compute, than Hoffmann et al. (2022) reported was optimal. Given this costliness, we study how hallucination detectors depend on scale. While we see that detector size improves performance on a fixed LM's outputs, we find an inverse relationship between the scale of the LM and the detectability of its hallucinations.
Submitted 14 August, 2024;
originally announced August 2024.
-
RTAT: A Robust Two-stage Association Tracker for Multi-Object Tracking
Authors:
Song Guo,
Rujie Liu,
Narishige Abe
Abstract:
Data association is an essential part of tracking-by-detection based Multi-Object Tracking (MOT). Most trackers focus on how to design a better data association strategy to improve the tracking performance. The rule-based handcrafted association methods are simple and highly efficient but lack the generalization capability to deal with complex scenes. The learnt association methods can learn high-order contextual information to deal with various complex scenes, but they have the limitations of higher complexity and cost. To address these limitations, we propose a Robust Two-stage Association Tracker, named RTAT. The first-stage association is performed between tracklets and detections to generate tracklets with high purity, and the second-stage association is performed between tracklets to form complete trajectories. For the first-stage association, we use a simple data association strategy to generate tracklets with high purity by setting a low threshold for the matching cost in the assignment process. We conduct the tracklet association in the second stage based on the framework of message-passing GNN. Our method models the tracklet association as a series of edge classification problems in hierarchical graphs, which can recursively merge short tracklets into longer ones. Our tracker RTAT ranks first on the test set of MOT17 and MOT20 benchmarks in most of the main MOT metrics: HOTA, IDF1, and AssA. We achieve 67.2 HOTA, 84.7 IDF1, and 69.7 AssA on MOT17, and 66.2 HOTA, 82.5 IDF1, and 68.1 AssA on MOT20.
Submitted 14 August, 2024;
originally announced August 2024.
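The first-stage idea in the RTAT abstract above (accept a tracklet-detection pair only when its matching cost is under a low threshold, so that tracklets stay pure) can be sketched as follows. This is a minimal illustration and not the authors' implementation: a greedy matcher stands in for whatever assignment solver the paper uses, and the function name `associate`, the `max_cost` parameter, and the cost values are all hypothetical.

```python
def associate(cost, max_cost):
    """Greedy one-to-one matching of tracklets (rows) to detections (columns).

    Assumption: a simple greedy matcher replaces the paper's assignment step.
    Only pairs whose cost is <= max_cost are eligible, so a lower threshold
    yields fewer but more reliable matches.
    """
    # Collect every eligible pair, cheapest first.
    candidates = sorted(
        (c, i, j)
        for i, row in enumerate(cost)
        for j, c in enumerate(row)
        if c <= max_cost
    )
    used_rows, used_cols, matches = set(), set(), []
    for _, i, j in candidates:
        # Each tracklet and each detection may be matched at most once.
        if i not in used_rows and j not in used_cols:
            matches.append((i, j))
            used_rows.add(i)
            used_cols.add(j)
    return sorted(matches)


# Hypothetical cost matrix: rows are tracklets, columns are detections.
cost = [
    [0.10, 0.80, 0.90],
    [0.70, 0.35, 0.60],
    [0.90, 0.85, 0.20],
]
strict = associate(cost, max_cost=0.25)  # low threshold: ambiguous pair (1, 1) is left unmatched
loose = associate(cost, max_cost=0.50)   # higher threshold: all three pairs are matched
```

Under the low threshold, tracklet 1 stays unmatched and would start or end a short tracklet; in the scheme the abstract describes, joining such fragments is deferred to the second stage.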
-
Flexible 3D Lane Detection by Hierarchical Shape Matching
Authors:
Zhihao Guan,
Ruixin Liu,
Zejian Yuan,
Ao Liu,
Kun Tang,
Tong Zhou,
Erlong Li,
Chao Zheng,
Shuqi Mei
Abstract:
As one of the basic yet vital technologies for HD map construction, 3D lane detection is still an open problem due to varying visual conditions, complex typologies, and strict demands for precision. In this paper, an end-to-end flexible and hierarchical lane detector is proposed to precisely predict 3D lane lines from point clouds. Specifically, we design a hierarchical network predicting flexible representations of lane shapes at different levels, simultaneously collecting global instance semantics and avoiding local errors. In the global scope, we propose to regress parametric curves w.r.t. adaptive axes that help to make more robust predictions in complex scenes, while in the local scope the structure of lane segments is detected in each of the dynamic anchor cells sampled along the globally predicted curves. Moreover, corresponding global and local shape matching losses and anchor cell generation strategies are designed. Experiments on two datasets show that we outperform current top methods under high precision standards, and full ablation studies also verify each part of our method. Our code will be released at https://github.com/Doo-do/FHLD.
Submitted 13 August, 2024;
originally announced August 2024.
-
Controlling the World by Sleight of Hand
Authors:
Sruthi Sudhakar,
Ruoshi Liu,
Basile Van Hoorick,
Carl Vondrick,
Richard Zemel
Abstract:
Humans naturally build mental models of object interactions and dynamics, allowing them to imagine how their surroundings will change if they take a certain action. While generative models today have shown impressive results on generating/editing images unconditionally or conditioned on text, current methods do not provide the ability to perform object manipulation conditioned on actions, an important tool for world modeling and action planning. Therefore, we propose to learn an action-conditional generative model from unlabeled videos of human hands interacting with objects. The vast quantity of such data on the internet allows for efficient scaling, which can enable high-performing action-conditional models. Given an image and the shape/location of a desired hand interaction, CosHand synthesizes an image of a future after the interaction has occurred. Experiments show that the resulting model can predict the effects of hand-object interactions well, with strong generalization particularly to translation, stretching, and squeezing interactions of unseen objects in unseen environments. Further, CosHand can be sampled many times to predict multiple possible effects, modeling the uncertainty of forces in the interaction/environment. Finally, the method generalizes to different embodiments, including non-human hands, i.e. robot hands, suggesting that generative video models can be powerful models for robotics.
Submitted 13 August, 2024;
originally announced August 2024.
-
A magnetised Galactic halo from inner Galaxy outflows
Authors:
He-Shou Zhang,
Gabriele Ponti,
Ettore Carretti,
Ruo-Yu Liu,
Mark R. Morris,
Marijke Haverkorn,
Nicola Locatelli,
Xueying Zheng,
Felix Aharonian,
Haiming Zhang,
Yi Zhang,
Giovanni Stel,
Andrew Strong,
Micheal Yeung,
Andrea Merloni
Abstract:
Large-scale magnetic fields are observed off the midplanes of disk galaxies, indicating that they harbour magnetised halos. These halos are crucial to studies of galaxy evolution, galactic-scale outflows, and feedback from star formation activity. Identifying the magnetised halo of the Milky Way is challenging because of the potential contamination from foreground emission arising in local spiral arms. Additionally, it is unclear how our magnetic halo is influenced by recently revealed large-scale structures such as the X-ray emitting eROSITA Bubbles, which, according to previous simulations, might be transient structures powered by the Galactic Center or the Galaxy's star-forming ring. Here we report the identification of several kpc-scale magnetised structures based on their polarized radio emission and their gamma-ray counterparts, which can be interpreted as the radiation of relativistic electrons. These non-thermal structures extend far above and below the Galactic plane and are spatially coincident with the thermal X-ray emission from the eROSITA Bubbles. The morphological consistency of these structures suggests a common origin, which can be sustained by Galactic outflows driven by the active star-forming regions located at 3-5 kpc from the Galactic Centre. These results reveal how X-ray-emitting and magnetised halos of spiral galaxies can be related to intense star formation activities and suggest that the X-shaped coherent magnetic structures observed in their halos can stem from galaxy outflows.
Submitted 12 August, 2024;
originally announced August 2024.
-
MU-MAE: Multimodal Masked Autoencoders-Based One-Shot Learning
Authors:
Rex Liu,
Xin Liu
Abstract:
With the exponential growth of multimedia data, leveraging multimodal sensors presents a promising approach for improving accuracy in human activity recognition. Nevertheless, accurately identifying these activities using both video data and wearable sensor data presents challenges due to the labor-intensive data annotation, and reliance on external pretrained models or additional data. To address these challenges, we introduce Multimodal Masked Autoencoders-Based One-Shot Learning (Mu-MAE). Mu-MAE integrates a multimodal masked autoencoder with a synchronized masking strategy tailored for wearable sensors. This masking strategy compels the networks to capture more meaningful spatiotemporal features, which enables effective self-supervised pretraining without the need for external data. Furthermore, Mu-MAE leverages the representation extracted from multimodal masked autoencoders as prior information input to a cross-attention multimodal fusion layer. This fusion layer emphasizes spatiotemporal features requiring attention across different modalities while highlighting differences from other classes, aiding in the classification of various classes in metric-based one-shot learning. Comprehensive evaluations on MMAct one-shot classification show that Mu-MAE outperforms all the evaluated approaches, achieving up to an 80.17% accuracy for five-way one-shot multimodal classification, without the use of additional data.
Submitted 8 August, 2024;
originally announced August 2024.
-
Single-photon interference over 8.4 km urban atmosphere: towards testing quantum effects in curved spacetime with photons
Authors:
Hui-Nan Wu,
Yu-Huai Li,
Bo Li,
Xiang You,
Run-Ze Liu,
Ji-Gang Ren,
Juan Yin,
Chao-Yang Lu,
Yuan Cao,
Cheng-Zhi Peng,
Jian-Wei Pan
Abstract:
The emergence of quantum mechanics and general relativity has transformed our understanding of the natural world significantly. However, integrating these two theories presents immense challenges, and their interplay remains untested. Recent theoretical studies suggest that the single-photon interference covering huge space can effectively probe the interface between quantum mechanics and general relativity. We developed an alternative design using unbalanced Michelson interferometers to address this and validated its feasibility over an 8.4 km free-space channel. Using a high-brightness single-photon source based on quantum dots, we demonstrated single-photon interference along this long-distance baseline. We achieved a phase measurement precision of 16.2 mrad, which satisfied the measurement requirements for a gravitational redshift at the geosynchronous orbit by five times the standard deviation. Our results confirm the feasibility of the single-photon version of the Colella-Overhauser-Werner experiment for testing the quantum effects in curved spacetime.
Submitted 18 August, 2024; v1 submitted 6 August, 2024;
originally announced August 2024.
-
Comb, Prune, Distill: Towards Unified Pruning for Vision Model Compression
Authors:
Jonas Schmitt,
Ruiping Liu,
Junwei Zheng,
Jiaming Zhang,
Rainer Stiefelhagen
Abstract:
Lightweight and effective models are essential for devices with limited resources, such as intelligent vehicles. Structured pruning offers a promising approach to model compression and efficiency enhancement. However, existing methods often tie pruning techniques to specific model architectures or vision tasks. To address this limitation, we propose a novel unified pruning framework Comb, Prune, Distill (CPD), which addresses both model-agnostic and task-agnostic concerns simultaneously. Our framework employs a combing step to resolve hierarchical layer-wise dependency issues, enabling architecture independence. Additionally, the pruning pipeline adaptively removes parameters based on the importance scoring metrics regardless of vision tasks. To support the model in retaining its learned information, we introduce knowledge distillation during the pruning step. Extensive experiments demonstrate the generalizability of our framework, encompassing both convolutional neural network (CNN) and transformer models, as well as image classification and segmentation tasks. In image classification we achieve a speedup of up to x4.3 with an accuracy loss of 1.8% and in semantic segmentation up to x1.89 with a 5.1% loss in mIoU.
Submitted 6 August, 2024;
originally announced August 2024.
-
Cascading failures with group support in interdependent hypergraphs
Authors:
Lei Chen,
Chunxiao Jia,
Run-Ran Liu,
Fanyuan Meng
Abstract:
The functionality of an entity frequently necessitates the support of a group situated in another layer of the system. To unravel the profound impact of such group support on a system's resilience against cascading failures, we devise a framework comprising a double-layer interdependent hypergraph system, wherein nodes are capable of receiving support via hyperedges. Our central hypothesis posits that the failure may transcend to another layer when all support groups of each dependent node fail, thereby initiating a potentially iterative cascade across layers. Through rigorous analytical methods, we derive the critical threshold for the initial node survival probability that marks the second-order phase transition point. A salient discovery is that as the prevalence of dependent nodes escalates, the system dynamics shift from a second-order to a first-order phase transition. Notably, irrespective of the collapse pattern, systems characterized by scale-free hyperdegree distributions within both hypergraph layers consistently demonstrate superior robustness compared to those adhering to Poisson hyperdegree distributions. In summary, our research underscores the paramount significance of group support mechanisms and intricate network topologies in determining the resilience of interconnected systems against the propagation of cascading failures. By exploring the interplay between these factors, we have gained insights into how systems can be designed or optimized to mitigate the risk of widespread disruptions, ensuring their continued functionality and stability in the face of adverse events.
Submitted 2 August, 2024;
originally announced August 2024.
-
Illumination Design for Joint Imaging and Wireless Power Transfer Systems
Authors:
Qianyu Yang,
Haiyang Zhang,
Chunguo Li,
Ruiqi Liu,
Baoyun Wang
Abstract:
This paper presents a novel concept termed Integrated Imaging and Wireless Power Transfer (IWPT), wherein the integration of imaging and wireless power transfer functionalities is achieved on a unified hardware platform. IWPT leverages a transmitting array to efficiently illuminate a specific Region of Interest (ROI), enabling the extraction of the ROI's scattering coefficients while concurrently providing wireless power to nearby users. The integration of IWPT offers compelling advantages, including notable reductions in power consumption and spectrum utilization, pivotal for the optimization of future 6G wireless networks. As an initial investigation, we explore two antenna architectures: a fully digital array and a digital/analog hybrid array. Our goal is to characterize the fundamental trade-off between imaging and wireless power transfer by optimizing the illumination signal. With imaging operating in the near-field, we formulate the illumination signal design as an optimization problem that minimizes the condition number of the equivalent channel. To address this optimization problem, we propose a semi-definite relaxation-based approach for the fully digital array and an alternating optimization algorithm for the hybrid array. Finally, numerical results verify the effectiveness of our proposed solutions and demonstrate the trade-off between imaging and wireless power transfer.
Submitted 1 August, 2024;
originally announced August 2024.
-
Invariant Discovery of Features Across Multiple Length Scales: Applications in Microscopy and Autonomous Materials Characterization
Authors:
Aditya Raghavan,
Utkarsh Pratiush,
Mani Valleti,
Richard Liu,
Reece Emery,
Hiroshi Funakubo,
Yongtao Liu,
Philip Rack,
Sergei Kalinin
Abstract:
Physical imaging is a foundational characterization method in areas from condensed matter physics and chemistry to astronomy and spans length scales from atomic to universe. Images encapsulate crucial data regarding atomic bonding, materials microstructures, and dynamic phenomena such as microstructural evolution and turbulence, among other phenomena. The challenge lies in effectively extracting and interpreting this information. Variational Autoencoders (VAEs) have emerged as powerful tools for identifying underlying factors of variation in image data, providing a systematic approach to distilling meaningful patterns from complex datasets. However, a significant hurdle in their application is the definition and selection of appropriate descriptors reflecting local structure. Here we introduce the scale-invariant VAE approach (SI-VAE) based on the progressive training of the VAE with the descriptors sampled at different length scales. The SI-VAE allows the discovery of the length scale dependent factors of variation in the system. Here, we illustrate this approach using the ferroelectric domain images and generalize it to the movies of the electron-beam induced phenomena in graphene and topography evolution across combinatorial libraries. This approach can further be used to initialize the decision making in automated experiments including structure-property discovery and can be applied across a broad range of imaging methods. This approach is universal and can be applied to any spatially resolved data including both experimental imaging studies and simulations, and can be particularly useful for exploration of phenomena such as turbulence, scale-invariant transformation fronts, etc.
Submitted 31 July, 2024;
originally announced August 2024.
-
Generative Expressive Conversational Speech Synthesis
Authors:
Rui Liu,
Yifan Hu,
Yi Ren,
Xiang Yin,
Haizhou Li
Abstract:
Conversational Speech Synthesis (CSS) aims to express a target utterance with the proper speaking style in a user-agent conversation setting. Existing CSS methods employ effective multi-modal context modeling techniques to achieve empathy understanding and expression. However, they often need to design complex network architectures and meticulously optimize the modules within them. In addition, due to the limitations of small-scale datasets containing scripted recording styles, they often fail to simulate real natural conversational styles. To address the above issues, we propose a novel generative expressive CSS system, termed GPT-Talker. We transform the multimodal information of the multi-turn dialogue history into discrete token sequences and seamlessly integrate them to form a comprehensive user-agent dialogue context. Leveraging the power of GPT, we predict the token sequence, which includes both semantic and style knowledge, of the response for the agent. After that, the expressive conversational speech is synthesized by the conversation-enriched VITS to deliver feedback to the user. Furthermore, we propose a large-scale Natural CSS Dataset called NCSSD, which includes both naturally recorded conversational speech in improvised styles and dialogues extracted from TV shows. It encompasses both Chinese and English languages, with a total duration of 236 hours. We conducted comprehensive experiments on the reliability of the NCSSD and the effectiveness of our GPT-Talker. Both subjective and objective evaluations demonstrate that our model outperforms other state-of-the-art CSS systems significantly in terms of naturalness and expressiveness. The Code, Dataset, and Pre-trained Model are available at: https://github.com/AI-S2-Lab/GPT-Talker.
Submitted 31 July, 2024; v1 submitted 31 July, 2024;
originally announced July 2024.
-
Regularization by noise for the inviscid primitive equations
Authors:
Ruimeng Hu,
Quyuan Lin,
Rongchang Liu
Abstract:
The deterministic inviscid primitive equations (also called the hydrostatic Euler equations) are known to be ill-posed in Sobolev spaces and in Gevrey classes of order strictly greater than 1, and some of their analytic solutions exist only locally in time and exhibit finite-time blowup. This work demonstrates that introducing suitable random noise can restore the local well-posedness and prevent finite-time blowups. Specifically, random diffusion addresses the ill-posedness in certain Gevrey classes, allowing us to establish the local well-posedness almost surely and the global existence of solutions with high probability. In the case of random damping (linear multiplicative noise), the noise prevents analytic solutions from forming singularities in finite time, resulting in globally existing solutions with high probability.
Submitted 31 July, 2024;
originally announced July 2024.
-
Model-driven Heart Rate Estimation and Heart Murmur Detection based on Phonocardiogram
Authors:
Jingping Nie,
Ran Liu,
Behrooz Mahasseni,
Erdrin Azemi,
Vikramjit Mitra
Abstract:
Acoustic signals are crucial for health monitoring, particularly heart sounds, which provide essential data such as heart rate and help detect cardiac anomalies such as murmurs. This study utilizes a publicly available phonocardiogram (PCG) dataset to estimate heart rate using model-driven methods and extends the best-performing model to a multi-task learning (MTL) framework for simultaneous heart rate estimation and murmur detection. Heart rate estimates are derived using a sliding window technique on heart sound snippets, analyzed with a combination of acoustic features (Mel spectrogram, cepstral coefficients, power spectral density, root mean square energy). Our findings indicate that a 2D convolutional neural network (2dCNN) is most effective for heart rate estimation, achieving a mean absolute error (MAE) of 1.312 bpm. We systematically investigate the impact of different feature combinations and find that utilizing all four features yields the best results. The MTL model (2dCNN-MTL) achieves accuracy over 95% in murmur detection, surpassing existing models, while maintaining an MAE of 1.636 bpm in heart rate estimation, satisfying the requirements stated by the Association for the Advancement of Medical Instrumentation (AAMI).
Submitted 25 July, 2024;
originally announced July 2024.
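Two ingredients named in the phonocardiogram abstract above, the sliding window over heart sound snippets and the mean absolute error (MAE) in bpm, can be sketched in a few lines. This is only an illustration under stated assumptions: the helper names `sliding_windows` and `mean_absolute_error`, the window size and hop, and the heart-rate values are hypothetical, and the feature extraction and the 2D CNN itself are not shown.

```python
def sliding_windows(signal, size, hop):
    """Split a 1-D signal into fixed-size windows taken every `hop` samples.

    Trailing samples that do not fill a whole window are dropped
    (an assumption; the abstract does not say how edges are handled).
    """
    return [signal[s:s + size] for s in range(0, len(signal) - size + 1, hop)]


def mean_absolute_error(predicted, reference):
    """MAE between per-window predictions and reference values (here in bpm)."""
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(reference)


# Hypothetical stand-in for a heart sound snippet: ten samples.
samples = list(range(10))
windows = sliding_windows(samples, size=4, hop=3)

# Hypothetical per-window heart-rate predictions and ground truth, in bpm.
predicted_bpm = [72, 75, 80]
reference_bpm = [70, 76, 83]
mae = mean_absolute_error(predicted_bpm, reference_bpm)
```

In a full pipeline, each window would be turned into acoustic features and passed to the model; the MAE is then computed over the per-window predictions.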
-
NC-NCD: Novel Class Discovery for Node Classification
Authors:
Yue Hou,
Xueyuan Chen,
He Zhu,
Romei Liu,
Bowen Shi,
Jiaheng Liu,
Junran Wu,
Ke Xu
Abstract:
Novel Class Discovery (NCD) involves identifying new categories within unlabeled data by utilizing knowledge acquired from previously established categories. However, existing NCD methods often struggle to maintain a balance between the performance of old and new categories. Discovering unlabeled new categories in a class-incremental way is more practical but also more challenging, as it is frequently hindered by either catastrophic forgetting of old categories or an inability to learn new ones. Furthermore, the implementation of NCD on continuously scalable graph-structured data remains an under-explored area. In response to these challenges, we introduce for the first time a more practical NCD scenario for node classification (i.e., NC-NCD), and propose a novel self-training framework with prototype replay and distillation called SWORD, adapted to our NC-NCD setting. Our approach enables the model to cluster unlabeled new category nodes after learning labeled nodes while preserving performance on old categories without reliance on old category nodes. SWORD achieves this by employing a self-training strategy to learn new categories and preventing the forgetting of old categories through the joint use of feature prototypes and knowledge distillation. Extensive experiments on four common benchmarks demonstrate the superiority of SWORD over other state-of-the-art methods.
Submitted 25 July, 2024;
originally announced July 2024.
-
Universal clusters in quasi-two-dimensional ultracold Fermi mixtures
Authors:
Ruijin Liu,
Tingting Shi,
Matteo Zaccanti,
Xiaoling Cui
Abstract:
We study universal clusters in quasi-two dimensions (q2D) that consist of a light (L) atom interacting with two or three heavy (H) identical fermions, forming the trimer or tetramer bound state. The axial confinement in q2D is shown to lift the three-fold degeneracy of the 3D trimer (tetramer) in the $p$-wave channel and uniquely select the ground state with magnetic angular momentum $|m|=1$ ($m=0$). By varying the interaction or confinement strength, we explore the dimensional crossover of these clusters from 3D to 2D, characterized by a gradual change of the critical H-L mass ratio for their emergence and of their momentum-space distribution. Importantly, we find that a finite effective range will not alter their critical mass ratios in the weak coupling regime. There, we establish an effective 2D model to quantitatively reproduce the properties of q2D clusters, and further identify the optimal interaction strengths for their detection in experiments. Our results suggest a promising prospect for observing universal clusters and associated high-order correlation effects in realistic q2D ultracold Fermi mixtures.
Submitted 3 August, 2024; v1 submitted 24 July, 2024;
originally announced July 2024.
-
MMRA: A Benchmark for Evaluating Multi-Granularity and Multi-Image Relational Association Capabilities in Large Visual Language Models
Authors:
Siwei Wu,
Kang Zhu,
Yu Bai,
Yiming Liang,
Yizhi Li,
Haoning Wu,
J. H. Liu,
Ruibo Liu,
Xingwei Qu,
Xuxin Cheng,
Ge Zhang,
Wenhao Huang,
Chenghua Lin
Abstract:
Given the remarkable success that large visual language models (LVLMs) have achieved in image perception tasks, the endeavor to make LVLMs perceive the world like humans is drawing increasing attention. Current multi-modal benchmarks primarily focus on facts or specific topic-related knowledge contained within individual images. However, they often overlook the associative relations between multiple images, which require the identification and analysis of similarities among entities or content present in different images. Therefore, we propose the multi-image relation association task and a meticulously curated Multi-granularity Multi-image Relational Association (MMRA) benchmark, comprising 1,024 samples. In order to systematically and comprehensively evaluate current LVLMs, we establish an associational relation system among images that contains 11 subtasks (e.g., UsageSimilarity, SubEvent) at two granularity levels (i.e., image and entity) according to the relations in ConceptNet. Our experiments reveal that on the MMRA benchmark, current multi-image LVLMs exhibit distinct advantages and disadvantages across various subtasks. Notably, fine-grained, entity-level multi-image perception tasks pose a greater challenge for LVLMs compared to image-level tasks. Moreover, LVLMs perform poorly on spatial-related tasks, indicating that LVLMs still have limited spatial awareness. Additionally, our findings indicate that while LVLMs demonstrate a strong capability to perceive image details, enhancing their ability to associate information across multiple images hinges on improving the reasoning capabilities of their language model component. Moreover, we explored the ability of LVLMs to perceive image sequences within the context of our multi-image association task. Our experiments show that the majority of current LVLMs do not adequately model image sequences during the pre-training process.
Submitted 5 August, 2024; v1 submitted 24 July, 2024;
originally announced July 2024.
-
Some $3$-designs invariant under $2.PΣL(2,49).$
Authors:
Minjia Shi,
Ruowen Liu,
Patrick Solé
Abstract:
We construct a ternary [49,25,7] code from the row span of a Jacobsthal matrix. It is equivalent to a Generalized Quadratic Residue (GQR) code in the sense of van Lint and MacWilliams (1978). These codes are the abelian generalizations of the quadratic residue (QR) codes which are cyclic. The union of the [50,25,8] extension of the said code and its dual supports a 3-(50,14,1248) design. The automorphism group of the latter design is a double cover of the permutation part of the automorphism group of the [50,25,8] code, which is isomorphic to $PΣL(2,49).$ Other weights in this code, other GQR codes, and other QR codes yield other 3-designs by the same process. A simple group action argument is provided to explain this behaviour of isodual codes.
Submitted 23 July, 2024;
originally announced July 2024.
-
Nonparametric Statistics on Magnetic Properties at the Footpoints of Erupting Magnetic Flux Ropes
Authors:
Rui Liu,
Wensi Wang
Abstract:
It is under debate whether the magnetic field in the solar atmosphere carries neutralized electric currents; particularly, whether a magnetic flux rope (MFR), which is considered the core structure of coronal mass ejections, carries neutralized electric currents. Recently Wang et al. (2023, ApJ, 943, 80) studied magnetic flux and electric current measured at the footpoints of 28 eruptive MFRs from 2010 to 2015. Because of the small sample size, no rigorous statistical analysis was done. Here, we include 9 more events from 2016 to 2023 and perform a series of nonparametric statistical tests at a significance level of 5\%. The tests confirm that there exist no significant differences in magnetic properties between conjugated footpoints of the same MFR, which justifies the method of identifying the MFR footpoints through coronal dimming. The tests demonstrate that there exist no significant differences between MFRs with pre-eruption dimming and those with only post-eruption dimming. However, there is a medium level of association between MFRs carrying substantial net current and those that produce pre-eruption dimming, which can be understood by the Lorentz self-force of the current channel. The tests also suggest that in estimating the magnetic twist of MFRs, it is necessary to take into account the spatially inhomogeneous distribution of electric current density and magnetic field.
Submitted 21 July, 2024;
originally announced July 2024.
-
Navigation Instruction Generation with BEV Perception and Large Language Models
Authors:
Sheng Fan,
Rui Liu,
Wenguan Wang,
Yi Yang
Abstract:
Navigation instruction generation, which requires embodied agents to describe the navigation routes, has been of great interest in robotics and human-computer interaction. Existing studies directly map the sequence of 2D perspective observations to route descriptions. Though straightforward, they overlook the geometric information and object semantics of the 3D environment. To address these challenges, we propose BEVInstructor, which incorporates Bird's Eye View (BEV) features into Multi-Modal Large Language Models (MLLMs) for instruction generation. Specifically, BEVInstructor constructs a Perspective-BEV Visual Encoder for the comprehension of 3D environments through fusing BEV and perspective features. To leverage the powerful language capabilities of MLLMs, the fused representations are used as visual prompts for MLLMs, and perspective-BEV prompt tuning is proposed for parameter-efficient updating. Based on the perspective-BEV prompts, BEVInstructor further adopts an instance-guided iterative refinement pipeline, which improves the instructions in a progressive manner. BEVInstructor achieves impressive performance across diverse datasets (i.e., R2R, REVERIE, and UrbanWalk).
Submitted 21 July, 2024;
originally announced July 2024.
-
Scalable Optimization for Locally Relevant Geo-Location Privacy
Authors:
Chenxi Qiu,
Ruiyao Liu,
Primal Pappachan,
Anna Squicciarini,
Xinpeng Xie
Abstract:
Geo-obfuscation functions as a location privacy protection mechanism (LPPM), enabling mobile users to share obfuscated locations with servers instead of their exact locations. This technique protects users' location privacy during server-side data breaches since the obfuscation process is irreversible. To minimize the utility loss caused by data obfuscation, linear programming (LP) is widely used. However, LP can face a polynomial explosion in decision variables, making it impractical for large-scale geo-obfuscation applications. In this paper, we propose a new LPPM called Locally Relevant Geo-obfuscation (LR-Geo) to optimize geo-obfuscation using LP more efficiently. This is accomplished by restricting the geo-obfuscation calculations for each user to locally relevant (LR) locations near the user's actual location. To prevent LR locations from inadvertently revealing a user's true whereabouts, users compute the LP coefficients locally and upload only these coefficients to the server, rather than the LR locations themselves. The server then solves the LP problem using the provided coefficients. Additionally, we enhance the LP framework with an exponential obfuscation mechanism to ensure that the obfuscation distribution is indistinguishable across multiple users. By leveraging the constraint structure of the LP formulation, we apply Benders' decomposition to further boost computational efficiency. Our theoretical analysis confirms that, even though geo-obfuscation is calculated independently for each user, it still adheres to geo-indistinguishability constraints across multiple users with high probability. Finally, experimental results using a real-world dataset demonstrate that LR-Geo outperforms existing geo-obfuscation methods in terms of computational time, data utility, and privacy protection.
Submitted 29 August, 2024; v1 submitted 18 July, 2024;
originally announced July 2024.
-
Unsupervised and Interpretable Synthesizing for Electrical Time Series Based on Information Maximizing Generative Adversarial Nets
Authors:
Zhenghao Zhou,
Yiyan Li,
Runlong Liu,
Zheng Yan,
Mo-Yuen Chow
Abstract:
Generating synthetic data has become a popular alternative solution to deal with the difficulties in accessing and sharing field measurement data in power systems. However, to make the generation results controllable, existing methods (e.g., Conditional Generative Adversarial Nets, cGAN) require a labeled dataset to train the model, which is demanding in practice because much field measurement data lacks descriptive labels. In this paper, we introduce the Information Maximizing Generative Adversarial Nets (infoGAN) to achieve interpretable feature extraction and controllable synthetic data generation based on an unlabeled electrical time series dataset. Features with clear physical meanings can be automatically extracted by maximizing the mutual information between the input latent code and the classifier output of infoGAN. Then the extracted features are used to control the generation results, similar to a vanilla cGAN framework. The case study is based on time series datasets of power load and renewable energy output. Results demonstrate that infoGAN can extract both discrete and continuous features with clear physical meanings, and can generate realistic synthetic time series that satisfy given features.
Submitted 18 July, 2024;
originally announced July 2024.
-
DiffuX2CT: Diffusion Learning to Reconstruct CT Images from Biplanar X-Rays
Authors:
Xuhui Liu,
Zhi Qiao,
Runkun Liu,
Hong Li,
Juan Zhang,
Xiantong Zhen,
Zhen Qian,
Baochang Zhang
Abstract:
Computed tomography (CT) is widely utilized in clinical settings because it delivers detailed 3D images of the human body. However, performing CT scans is not always feasible due to radiation exposure and limitations in certain surgical environments. As an alternative, reconstructing CT images from ultra-sparse X-rays offers a valuable solution and has gained significant interest in scientific research and medical applications. However, it presents great challenges as it is inherently an ill-posed problem, often compromised by artifacts resulting from overlapping structures in X-ray images. In this paper, we propose DiffuX2CT, which models CT reconstruction from orthogonal biplanar X-rays as a conditional diffusion process. DiffuX2CT is established with a 3D global coherence denoising model with a new, implicit conditioning mechanism. We realize the conditioning mechanism by a newly designed tri-plane decoupling generator and an implicit neural decoder. By doing so, DiffuX2CT achieves structure-controllable reconstruction, which enables 3D structural information to be recovered from 2D X-rays, therefore producing faithful textures in CT images. As an extra contribution, we collect a real-world lumbar CT dataset, called LumbarV, as a new benchmark to verify the clinical significance and performance of CT reconstruction from X-rays. Extensive experiments on this dataset and three more publicly available datasets demonstrate the effectiveness of our proposal.
Submitted 18 July, 2024;
originally announced July 2024.
-
ICAGC 2024: Inspirational and Convincing Audio Generation Challenge 2024
Authors:
Ruibo Fu,
Rui Liu,
Chunyu Qiang,
Yingming Gao,
Yi Lu,
Shuchen Shi,
Tao Wang,
Ya Li,
Zhengqi Wen,
Chen Zhang,
Hui Bu,
Yukun Liu,
Xin Qi,
Guanjun Li
Abstract:
The Inspirational and Convincing Audio Generation Challenge 2024 (ICAGC 2024) is part of the ISCSLP 2024 Competitions and Challenges track. While current text-to-speech (TTS) technology can generate high-quality audio, its ability to convey complex emotions and controlled detail content remains limited. This constraint leads to a discrepancy between the generated audio and human subjective perception in practical applications like companion robots for children and marketing bots. The core issue lies in the inconsistency between high-quality audio generation and the ultimate human subjective experience. Therefore, this challenge aims to enhance the persuasiveness and acceptability of synthesized audio, focusing on human-aligned, convincing, and inspirational audio generation. A total of 19 teams have registered for the challenge, and this paper describes the competition and its results.
Submitted 31 July, 2024; v1 submitted 1 July, 2024;
originally announced July 2024.
-
A Survey of Distance-Based Vessel Trajectory Clustering: Data Pre-processing, Methodologies, Applications, and Experimental Evaluation
Authors:
Maohan Liang,
Ryan Wen Liu,
Ruobin Gao,
Zhe Xiao,
Xiaocai Zhang,
Hua Wang
Abstract:
Vessel trajectory clustering, a crucial component of maritime intelligent transportation systems, provides valuable insights for applications such as anomaly detection and trajectory prediction. This paper presents a comprehensive survey of the most prevalent distance-based vessel trajectory clustering methods, which encompass two main steps: trajectory similarity measurement and clustering. Initially, we conduct a thorough literature review using relevant keywords to gather and summarize pertinent research papers and datasets. Then, the paper discusses the principal methods of data pre-processing that prepare data for further analysis. The survey progresses to detail the leading algorithms for measuring vessel trajectory similarity and the main clustering techniques used in the field today. Furthermore, the various applications of trajectory clustering within the maritime context are explored. Finally, the paper evaluates the effectiveness of different algorithm combinations and pre-processing methods through experimental analysis, focusing on their impact on the performance of distance-based trajectory clustering algorithms. The experimental results demonstrate the effectiveness of various trajectory clustering algorithms and notably highlight the significant improvements that trajectory compression techniques contribute to the efficiency and accuracy of trajectory clustering. This comprehensive approach ensures a deep understanding of current capabilities and future directions in vessel trajectory clustering.
Submitted 19 July, 2024; v1 submitted 13 July, 2024;
originally announced July 2024.
-
Enhancing Building Safety Design for Active Shooter Incidents: Exploration of Building Exit Parameters using Reinforcement Learning-Based Simulations
Authors:
Ruying Liu,
Wanjing Wu,
Burcin Becerik-Gerber,
Gale M. Lucas
Abstract:
With the alarming rise in active shooter incidents (ASIs) in the United States, enhancing public safety through building design has become a pressing need. This study proposes a reinforcement learning-based simulation approach addressing gaps in existing research that has neglected the dynamic behaviours of shooters. We developed an autonomous agent to simulate an active shooter within a realistic office environment, aiming to offer insights into the interactions between building design parameters and ASI outcomes. A case study is conducted to quantitatively investigate the impact of building exit numbers (total count of accessible exits) and configuration (arrangement of which exits are available or not) on evacuation and harm rates. Findings demonstrate that greater exit availability significantly improves evacuation outcomes and reduces harm. Exits nearer to the shooter's initial position hold greater importance for accessibility than those farther away. By encompassing dynamic shooter behaviours, this study offers preliminary insights into effective building safety design against evolving threats.
Submitted 15 July, 2024;
originally announced July 2024.
-
Exploiting Scale-Variant Attention for Segmenting Small Medical Objects
Authors:
Wei Dai,
Rui Liu,
Zixuan Wu,
Tianyi Wu,
Min Wang,
Junxian Zhou,
Yixuan Yuan,
Jun Liu
Abstract:
Early detection and accurate diagnosis can predict the risk of malignant disease transformation, thereby increasing the probability of effective treatment. Identifying mild syndrome with small pathological regions serves as an ominous warning and is fundamental in the early diagnosis of diseases. While deep learning algorithms, particularly convolutional neural networks (CNNs), have shown promise in segmenting medical objects, analyzing small areas in medical images remains challenging. This difficulty arises due to information losses and compression defects from convolution and pooling operations in CNNs, which become more pronounced as the network deepens, especially for small medical objects. To address these challenges, we propose a novel scale-variant attention-based network (SvANet) for accurately segmenting small-scale objects in medical images. The SvANet consists of scale-variant attention, cross-scale guidance, Monte Carlo attention, and vision transformer, which incorporates cross-scale features and alleviates compression artifacts for enhancing the discrimination of small medical objects. Quantitative experimental results demonstrate the superior performance of SvANet, achieving 96.12%, 96.11%, 89.79%, 84.15%, 80.25%, 73.05%, and 72.58% in mean Dice coefficient for segmenting kidney tumors, skin lesions, hepatic tumors, polyps, surgical excision cells, retinal vasculatures, and sperms, which occupy less than 1% of the image areas in KiTS23, ISIC 2018, ATLAS, PolypGen, TissueNet, FIVES, and SpermHealth datasets, respectively.
Submitted 5 August, 2024; v1 submitted 10 July, 2024;
originally announced July 2024.
-
Distributed multi-robot potential-field-based exploration with submap-based mapping and noise-augmented strategy
Authors:
Khattiya Pongsirijinda,
Zhiqiang Cao,
Kaushik Bhowmik,
Muhammad Shalihan,
Billy Pik Lik Lau,
Ran Liu,
Chau Yuen,
U-Xuan Tan
Abstract:
Multi-robot collaboration has become a needed component in unknown environment exploration due to its ability to handle various challenging situations. Potential-field-based methods are widely used for autonomous exploration because of their high efficiency and low travel cost. However, exploration speed and collaboration ability are still challenging topics. Therefore, we propose a Distributed Multi-Robot Potential-Field-Based Exploration method (DMPF-Explore). In particular, we first present a Distributed Submap-Based Multi-Robot Collaborative Mapping Method (DSMC-Map), which can efficiently estimate the robot trajectories and construct the global map by merging the local maps from each robot. Second, we introduce a Potential-Field-Based Exploration Strategy Augmented with Modified Wave-Front Distance and Colored Noises (MWF-CN), in which the accessible frontier neighborhood is extended, and the colored noise enhances exploration performance. The proposed exploration method is deployed in simulation and real-world scenarios. The results show that our approach outperforms the existing ones regarding exploration speed and collaboration ability.
Submitted 10 July, 2024;
originally announced July 2024.
-
ActionVOS: Actions as Prompts for Video Object Segmentation
Authors:
Liangyang Ouyang,
Ruicong Liu,
Yifei Huang,
Ryosuke Furuta,
Yoichi Sato
Abstract:
Delving into the realm of egocentric vision, the advancement of referring video object segmentation (RVOS) stands as pivotal in understanding human activities. However, the existing RVOS task primarily relies on static attributes such as object names to segment target objects, posing challenges in distinguishing target objects from background objects and in identifying objects undergoing state changes. To address these problems, this work proposes a novel action-aware RVOS setting called ActionVOS, aiming at segmenting only active objects in egocentric videos using human actions as a key language prompt. This is because human actions precisely describe the behavior of humans, thereby helping to identify the objects truly involved in the interaction and to understand possible state changes. We also build a method tailored to work under this specific setting. Specifically, we develop an action-aware labeling module with an efficient action-guided focal loss. Such designs enable the ActionVOS model to prioritize active objects with existing readily-available annotations. Experimental results on the VISOR dataset reveal that ActionVOS significantly reduces the mis-segmentation of inactive objects, confirming that actions help the ActionVOS model understand objects' involvement. Further evaluations on the VOST and VSCOS datasets show that the novel ActionVOS setting enhances segmentation performance when encountering challenging circumstances involving object state changes. We will make our implementation available at https://github.com/ut-vision/ActionVOS.
Submitted 10 July, 2024;
originally announced July 2024.
-
Evidence for large baryonic feedback at low and intermediate redshifts from kinematic Sunyaev-Zel'dovich observations with ACT and DESI photometric galaxies
Authors:
B. Hadzhiyska,
S. Ferraro,
B. Ried Guachalla,
E. Schaan,
J. Aguilar,
N. Battaglia,
J. R. Bond,
D. Brooks,
E. Calabrese,
S. K. Choi,
T. Claybaugh,
W. R. Coulton,
K. Dawson,
M. Devlin,
B. Dey,
P. Doel,
A. J. Duivenvoorden,
J. Dunkley,
G. S. Farren,
A. Font-Ribera,
J. E. Forero-Romero,
P. A. Gallardo,
E. Gaztañaga,
S. Gontcho Gontcho,
M. Gralla
, et al. (48 additional authors not shown)
Abstract:
Recent advances in cosmological observations have provided an unprecedented opportunity to investigate the distribution of baryons relative to the underlying matter. In this work, we robustly show that the gas is much more extended than the dark matter at 40$σ$ and the amount of baryonic feedback at $z \lesssim 1$ strongly disfavors low-feedback models such as that of state-of-the-art hydrodynamical simulation IllustrisTNG compared with high-feedback models such as that of the original Illustris simulation. This has important implications for bridging the gap between theory and observations and understanding galaxy formation and evolution. Furthermore, a better grasp of the baryon-dark matter link is critical to future cosmological analyses, which are currently impeded by our limited knowledge of baryonic feedback. Here, we measure the kinematic Sunyaev-Zel'dovich (kSZ) effect from the Atacama Cosmology Telescope (ACT), stacked on the luminous red galaxy (LRG) sample of the Dark Energy Spectroscopic Instrument (DESI) imaging survey. This is the first analysis to use photometric redshifts for reconstructing galaxy velocities. Due to the large number of galaxies comprising the DESI imaging survey, this is the highest signal-to-noise stacked kSZ measurement to date: we detect the signal at 13$σ$ and find that the gas is more spread out than the dark matter at $\sim$40$σ$. Our work opens up the possibility to recalibrate large hydrodynamical simulations using the kSZ effect. In addition, our findings point towards a way of alleviating inconsistencies between weak lensing surveys and cosmic microwave background (CMB) experiments such as the `low $S_8$' tension, and shed light on long-standing enigmas in astrophysics such as the `missing baryon' problem.
Submitted 9 July, 2024;
originally announced July 2024.
-
Masked Video and Body-worn IMU Autoencoder for Egocentric Action Recognition
Authors:
Mingfang Zhang,
Yifei Huang,
Ruicong Liu,
Yoichi Sato
Abstract:
Compared with visual signals, Inertial Measurement Units (IMUs) placed on human limbs can capture accurate motion signals while being robust to lighting variation and occlusion. While these characteristics are intuitively valuable to help egocentric action recognition, the potential of IMUs remains under-explored. In this work, we present a novel method for action recognition that integrates motion data from body-worn IMUs with egocentric video. Due to the scarcity of labeled multimodal data, we design an MAE-based self-supervised pretraining method, obtaining strong multi-modal representations via modeling the natural correlation between visual and motion signals. To model the complex relation of multiple IMU devices placed across the body, we exploit the collaborative dynamics in multiple IMU devices and propose to embed the relative motion features of human joints into a graph structure. Experiments show our method can achieve state-of-the-art performance on multiple public datasets. The effectiveness of our MAE-based pretraining and graph-based IMU modeling is further validated by experiments in more challenging scenarios, including partially missing IMU devices and video quality corruption, promoting more flexible usage in the real world.
Submitted 9 July, 2024;
originally announced July 2024.
-
FinCon: A Synthesized LLM Multi-Agent System with Conceptual Verbal Reinforcement for Enhanced Financial Decision Making
Authors:
Yangyang Yu,
Zhiyuan Yao,
Haohang Li,
Zhiyang Deng,
Yupeng Cao,
Zhi Chen,
Jordan W. Suchow,
Rong Liu,
Zhenyu Cui,
Denghui Zhang,
Koduvayur Subbalakshmi,
Guojun Xiong,
Yueru He,
Jimin Huang,
Dong Li,
Qianqian Xie
Abstract:
Large language models (LLMs) have demonstrated notable potential in conducting complex tasks and are increasingly utilized in various financial applications. However, high-quality sequential financial investment decision-making remains challenging. These tasks require multiple interactions with a volatile environment for every decision, demanding sufficient intelligence to maximize returns and manage risks. Although LLMs have been used to develop agent systems that surpass human teams and yield impressive investment returns, opportunities to enhance multi-sourced information synthesis and optimize decision-making outcomes through timely experience refinement remain unexplored. Here, we introduce the FinCon, an LLM-based multi-agent framework with CONceptual verbal reinforcement tailored for diverse FINancial tasks. Inspired by effective real-world investment firm organizational structures, FinCon utilizes a manager-analyst communication hierarchy. This structure allows for synchronized cross-functional agent collaboration towards unified goals through natural language interactions and equips each agent with greater memory capacity than humans. Additionally, a risk-control component in FinCon enhances decision quality by episodically initiating a self-critiquing mechanism to update systematic investment beliefs. The conceptualized beliefs serve as verbal reinforcement for the future agent's behavior and can be selectively propagated to the appropriate node that requires knowledge updates. This feature significantly improves performance while reducing unnecessary peer-to-peer communication costs. Moreover, FinCon demonstrates strong generalization capabilities in various financial tasks, including single stock trading and portfolio management.
Submitted 10 July, 2024; v1 submitted 9 July, 2024;
originally announced July 2024.
-
Analytic Convolutional Layer: A Step to Analytic Neural Network
Authors:
Jingmao Cui,
Donglai Tao,
Linmi Tao,
Ruiyang Liu,
Yu Cheng
Abstract:
The prevailing approach to embedding prior knowledge within convolutional layers typically includes the design of steerable kernels or their modulation using designated kernel banks. In this study, we introduce the Analytic Convolutional Layer (ACL), an innovative model-driven convolutional layer, which is a mosaic of analytical convolution kernels (ACKs) and traditional convolution kernels. ACKs are characterized by mathematical functions governed by analytic kernel parameters (AKPs) learned during training. Learnable AKPs permit the adaptive update of incorporated knowledge to align with the feature representation of the data. Our extensive experiments demonstrate that ACLs not only have a remarkable capacity for feature representation with a reduced number of parameters but also attain increased reliability through the analytical formulation of ACKs. Furthermore, ACLs offer a means for neural network interpretation, thereby paving the way for the intrinsic interpretability of neural networks. The source code will be published along with the paper.
Submitted 3 July, 2024;
originally announced July 2024.
-
Empowering 1000 tokens/second on-device LLM prefilling with mllm-NPU
Authors:
Daliang Xu,
Hao Zhang,
Liming Yang,
Ruiqi Liu,
Gang Huang,
Mengwei Xu,
Xuanzhe Liu
Abstract:
On-device large language models (LLMs) are catalyzing novel mobile applications such as UI task automation and personalized email auto-reply, without giving away users' private data. However, on-device LLMs still suffer from unacceptably long inference latency, especially the time to first token (prefill stage), due to the need for long contexts for accurate, personalized content generation, as well as the limited parallel computing capacity of mobile CPUs/GPUs.
To enable practical on-device LLMs, we present mllm-NPU, the first-of-its-kind LLM inference system that efficiently leverages on-device Neural Processing Unit (NPU) offloading. Essentially, mllm-NPU is an algorithm-system co-design that tackles a few semantic gaps between the LLM architecture and contemporary NPU design. Specifically, it re-constructs the prompt and model at three levels: (1) at the prompt level, it divides variable-length prompts into multiple fixed-sized chunks while maintaining data dependencies; (2) at the tensor level, it identifies and extracts significant outliers to run on the CPU/GPU in parallel with minimal overhead; (3) at the block level, it schedules Transformer blocks in an out-of-order manner to the CPU/GPU and NPU based on their hardware affinity and sensitivity to accuracy. Compared to competitive baselines, mllm-NPU achieves 22.4x faster prefill speed and 30.7x energy savings on average, and up to 32.8x speedup in an end-to-end real-world application. For the first time, mllm-NPU achieves more than 1,000 tokens/sec prefilling for a billion-sized model (Qwen1.5-1.8B), paving the way towards practical on-device LLMs.
Submitted 8 July, 2024;
originally announced July 2024.