Search | arXiv e-print repository

On the Optimal MMSE Channel Estimation for One-Bit Quantized MIMO Systems

Authors: Minhua Ding, Italo Atzeni, Antti Tölli, A. Lee Swindlehurst

Abstract: This paper focuses on the minimum mean squared error (MMSE) channel estimator for multiple-input multiple-output (MIMO) systems with one-bit quantization at the receiver side. Despite its optimality and significance in estimation theory, the MMSE channel estimator has not been fully investigated in this context due to its general non-linearity and computational complexity. Instead, the typically suboptimal Bussgang linear MMSE (BLMMSE) estimator has been widely adopted. In this work, we develop a new framework to compute the MMSE channel estimator that hinges on computation of the orthant probability of the multivariate normal distribution. Based on this framework, we determine a necessary and sufficient condition for the BLMMSE channel estimator to be optimal and equivalent to the MMSE estimator. Under the assumption of specific channel correlation or pilot symbols, we further utilize the framework to derive analytical expressions for the MMSE channel estimator that are particularly convenient for computation when certain system dimensions become large, thereby enabling a comparison between the BLMMSE and MMSE channel estimators in these cases.

Submitted 8 April, 2024; originally announced April 2024.

Comments: IEEE journal submission, 13 pages, 5 figures

MSC Class: 94A12; 94A05

arXiv:2404.05252 [pdf, other]

doi 10.1038/s41550-024-02244-5

A model for heating the super-hot corona in solar active regions

Authors: Zekun Lu, Feng Chen, M. D. Ding, Can Wang, Yu Dai, Xin Cheng

Abstract: What physical mechanisms heat the outer solar or stellar atmosphere to million-Kelvin temperatures is a fundamental but long-standing open question. In particular, the solar corona in active region cores contains an even hotter component reaching ten million Kelvin, manifesting as persistent coronal loops in extreme ultraviolet and soft X-ray images, which imposes a more stringent energy budget. Here, we present a self-consistent coronal heating model using a state-of-the-art three-dimensional radiative magnetohydrodynamics simulation. We find that the continuous magnetic flux emergence in active regions keeps driving magnetic reconnections that release energy impulsively but, on time average, persistently. As a result, numerous sub-structures are heated to ten million Kelvin and then evolve independently, which collectively form long-lived and stable coronal loops as in observations. This provides a heating model explaining the origin of the super-hot coronal plasma and the persistence of hot coronal loops in emerging active regions.

Submitted 8 April, 2024; originally announced April 2024.

Comments: 34 pages, 14 figures

Journal ref: 8 April 2024, Nature Astronomy

arXiv:2404.01465 [pdf, ps, other]

Mahonian-Stirling statistics for partial permutations

Authors: Ming-Jian Ding, Jiang Zeng

Abstract: Recently Cheng et al. (Adv. in Appl. Math. 143 (2023) 102451) generalized the inversion number to partial permutations, which are also known as Laguerre digraphs, and asked for a suitable analogue of MacMahon's major index. We provide such a major index, namely, the corresponding maj and inv statistics are equidistributed, and exhibit a Haglund-Remmel-Wilson type identity. We then interpret some Jacobi-Rogers polynomials in terms of Laguerre digraphs generalizing Deb and Sokal's alternating Laguerre digraph interpretation of some special Jacobi-Rogers polynomials.

Submitted 1 April, 2024; originally announced April 2024.

arXiv:2403.15779 [pdf, other]

The Frontier of Data Erasure: Machine Unlearning for Large Language Models

Authors: Youyang Qu, Ming Ding, Nan Sun, Kanchana Thilakarathna, Tianqing Zhu, Dusit Niyato

Abstract: Large Language Models (LLMs) are foundational to AI advancements, facilitating applications like predictive text generation. Nonetheless, they pose risks by potentially memorizing and disseminating sensitive, biased, or copyrighted information from their vast datasets. Machine unlearning emerges as a cutting-edge solution to mitigate these concerns, offering techniques for LLMs to selectively discard certain data. This paper reviews the latest in machine unlearning for LLMs, introducing methods for the targeted forgetting of information to address privacy, ethical, and legal challenges without necessitating full model retraining. It divides existing research into unlearning from unstructured/textual data and structured/classification data, showcasing the effectiveness of these approaches in removing specific data while maintaining model efficacy. Highlighting the practicality of machine unlearning, this analysis also points out the hurdles in preserving model integrity, avoiding excessive or insufficient data removal, and ensuring consistent outputs, underlining the role of machine unlearning in advancing responsible, ethical AI.

Submitted 23 March, 2024; originally announced March 2024.

arXiv:2403.10180 [pdf, other]

An efficient asymptotic DC method for sparse and low-rank matrix recovery

Authors: Mingcai Ding, Xiaoliang Song, Bo Yu

Abstract: The optimization problem of sparse and low-rank matrix recovery is considered, which involves a least squares problem with a rank constraint and a cardinality constraint. To overcome the challenges posed by these constraints, an asymptotic difference-of-convex (ADC) method that employs a Moreau smoothing approach and an exact penalty approach is proposed to transform this problem into a DC programming format gradually. To solve the resulting DC program, by making full use of its DC structure, an efficient inexact DC algorithm with sieving strategy (siDCA) is introduced. The subproblem of siDCA is solved by an efficient dual-based semismooth Newton method. The convergence of the solution sequence generated by siDCA is proved. To illustrate the effectiveness of ADC-siDCA, matrix recovery experiments on nonnegative and positive semidefinite matrices are conducted. The numerical results are compared with those obtained using a successive DC approximation minimization method and a penalty proximal alternating linearized minimization approach. The outcome of the comparison indicates that ADC-siDCA surpasses the other two methods in terms of efficiency and recovery error. Additionally, numerical experiments on sparse phase retrieval demonstrate that ADC-siDCA is a valuable tool for recovering sparse and low-rank Hermitian matrices.

Submitted 15 March, 2024; originally announced March 2024.

MSC Class: 65K05; 90C06; 90C26; 91G80

arXiv:2403.09011 [pdf, other]

Sun-as-a-star Study of an X-class Solar Flare with Spectroscopic Observations of CHASE

Authors: Y. L. Ma, Q. H. Lao, X. Cheng, B. T. Wang, Z. H. Zhao, S. H. Rao, C. Li, M. D. Ding

Abstract: Sun-as-a-star spectroscopic characteristics of solar flares can be used as a benchmark for the detection and analyses of stellar flares. Here, we study the Sun-as-a-star properties of an X1.0 solar flare using high-resolution spectroscopic data obtained by the Chinese $\mathrm{H}\alpha$ Solar Explorer (CHASE). A noise reduction algorithm based on discrete Fourier transformation is first employed to enhance the signal-to-noise ratio of the space-integral $\mathrm{H}\alpha$ spectrum with a focus on its typical characteristics. For the flare of interest, we find that the average $\mathrm{H}\alpha$ profile displays a strong emission at the line center and an obvious line broadening. It also presents a clear red asymmetry, corresponding to a redshift velocity of around $50 \ \mathrm{km \ s^{-1}}$ that slightly decreases with time, consistent with previous results. Furthermore, we study how the size of the space-integral region affects the characteristics of the flare Sun-as-a-star $\mathrm{H}\alpha$ profile. It is found that although the redshift velocity calculated from the $\mathrm{H}\alpha$ profile remains unchanged, the detectability of the characteristics weakens as the space-integral region becomes large. An upper limit for the size of the target region where the red asymmetry is detectable is estimated. It is also found that the intensity in $\mathrm{H}\alpha$ profiles, measured by the equivalent widths of the spectra, is significantly underestimated if the $\mathrm{H}\alpha$ spectra are further averaged in the time domain.

Submitted 13 March, 2024; originally announced March 2024.

arXiv:2403.08125 [pdf, other]

Q-SLAM: Quadric Representations for Monocular SLAM

Authors: Chensheng Peng, Chenfeng Xu, Yue Wang, Mingyu Ding, Heng Yang, Masayoshi Tomizuka, Kurt Keutzer, Marco Pavone, Wei Zhan

Abstract: Monocular SLAM has long grappled with the challenge of accurately modeling 3D geometries. Recent advances in Neural Radiance Fields (NeRF)-based monocular SLAM have shown promise, yet these methods typically focus on novel view synthesis rather than precise 3D geometry modeling. This focus results in a significant disconnect between NeRF applications, i.e., novel-view synthesis, and the requirements of SLAM. We identify that the gap results from the volumetric representations used in NeRF, which are often dense and noisy. In this study, we propose a novel approach that reimagines volumetric representations through the lens of quadric forms. We posit that most scene components can be effectively represented as quadric planes. Leveraging this assumption, we reshape the volumetric representations, which comprise millions of cubes, using several quadric planes, which leads to more accurate and efficient modeling of 3D scenes in SLAM contexts. Our method involves two key steps: First, we use the quadric assumption to enhance coarse depth estimations obtained from tracking modules, e.g., Droid-SLAM. This step alone significantly improves depth estimation accuracy. Second, in the subsequent mapping phase, we diverge from previous NeRF-based SLAM methods that distribute sampling points across the entire volume space. Instead, we concentrate sampling points around quadric planes and aggregate them using a novel quadric-decomposed Transformer. Additionally, we introduce an end-to-end joint optimization strategy that synchronizes pose estimation with 3D reconstruction.

Submitted 12 March, 2024; originally announced March 2024.

arXiv:2403.07470 [pdf, other]

doi 10.1109/LRA.2024.3441493

DrPlanner: Diagnosis and Repair of Motion Planners for Automated Vehicles Using Large Language Models

Authors: Yuanfei Lin, Chenran Li, Mingyu Ding, Masayoshi Tomizuka, Wei Zhan, Matthias Althoff

Abstract: Motion planners are essential for the safe operation of automated vehicles across various scenarios. However, no motion planning algorithm has achieved perfection in the literature, and improving its performance is often time-consuming and labor-intensive. To tackle the aforementioned issues, we present DrPlanner, the first framework designed to automatically diagnose and repair motion planners using large language models. Initially, we generate a structured description of the planner and its planned trajectories from both natural and programming languages. Leveraging the profound capabilities of large language models, our framework returns repaired planners with detailed diagnostic descriptions. Furthermore, our framework advances iteratively with continuous feedback from the evaluation of the repaired outcomes. Our approach is validated using both search- and sampling-based motion planners for automated vehicles; experimental results highlight the need for demonstrations in the prompt and show the ability of our framework to effectively identify and rectify elusive issues.

Submitted 7 August, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

Comments: @2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

arXiv:2403.05121 [pdf, other]

CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion

Authors: Wendi Zheng, Jiayan Teng, Zhuoyi Yang, Weihan Wang, Jidong Chen, Xiaotao Gu, Yuxiao Dong, Ming Ding, Jie Tang

Abstract: Recent advancements in text-to-image generative systems have been largely driven by diffusion models. However, single-stage text-to-image diffusion models still face challenges, in terms of computational efficiency and the refinement of image details. To tackle the issue, we propose CogView3, an innovative cascaded framework that enhances the performance of text-to-image diffusion. CogView3 is the first model implementing relay diffusion in the realm of text-to-image generation, executing the task by first creating low-resolution images and subsequently applying relay-based super-resolution. This methodology not only results in competitive text-to-image outputs but also greatly reduces both training and inference costs. Our experimental results demonstrate that CogView3 outperforms SDXL, the current state-of-the-art open-source text-to-image diffusion model, by 77.0% in human evaluations, all while requiring only about 1/2 of the inference time. The distilled variant of CogView3 achieves comparable performance while only utilizing 1/10 of the inference time by SDXL.

Submitted 8 March, 2024; originally announced March 2024.

arXiv:2403.02110 [pdf, ps, other]

Generalized Coronal Loop Scaling Laws and Their Implication for Turbulence in Solar Active Region Loops

Authors: Y. Dai, J. J. Xiang, M. D. Ding

Abstract: Recent coronal loop modeling has emphasized the importance of combining both Coulomb collisions and turbulent scattering to characterize field-aligned thermal conduction, which invokes a hybrid loop model. In this work we generalize the hybrid model by incorporating nonuniform heating and cross section that are both formulated by a power-law function of temperature. Based on the hybrid model solutions, we construct scaling laws that relate loop-top temperature ($T_a$) and heating rate ($H_a$) to other loop parameters. It is found that the loop-top properties for turbulent loops are additionally power-law functions of turbulent mean free path ($\lambda_T$), with the functional forms varying from situation to situation depending on the specification of the heating and/or areal parameters. More importantly, both a sufficiently footpoint-concentrated heating and a cross-sectional expansion with height can effectively weaken (strengthen) the negative (positive) power-law dependence of $T_a$ ($H_a$) on $\lambda_T$. The reason lies in a notable reduction of heat flux by footpoint heating and/or cross-sectional expansion in the turbulence-dominated coronal part, where turbulent scattering introduces a much weaker dependence of the conduction coefficient on temperature. In this region, therefore, the reduction of the heat flux predominantly relies on a backward flattening of the temperature gradient. Through numerical modeling that incorporates more realistic conditions, this scenario is further consolidated.
Our results have important implications for solar active region (AR) loops. With the factors of nonuniform heating and cross section taken into account, AR loops can bear relatively stronger turbulence while still keeping a physically reasonable temperature for nonflaring loops.

Submitted 4 March, 2024; originally announced March 2024.

Comments: 18 pages, 8 figures and 2 tables, accepted for publication in The Astrophysical Journal

arXiv:2403.01937

Examining the critical phenomenon of pion parton distribution: Insights from the Moment Problem

Authors: Xiaobin Wang, Zexin Wu, Minghui Ding, Lei Chang

Abstract: A recent study by Wang et al. (arXiv:2309.01417) proposed a novel connection between the nature of the parton distribution function (PDF) and the characteristics of its moments. In this study, we apply these findings to analyze the evolution of the pion valence quark PDF, garnering valuable qualitative insights. Firstly, we validate the non-negativity and continuity of the PDF across a wide range of scales, indicating the logical consistency of our chosen evolution scheme. Subsequently, we examine the unimodality of both the PDF and its transformed counterpart, the xPDF, i.e., the parton distribution function multiplied by the momentum fraction. We observe a smooth evolution of the peak position of the xPDF towards the small-$x$ region with increasing scale, while intriguingly, the PDF undergoes a phase of bimodal competition as the energy scale evolves.

Submitted 15 July, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

Comments: The judgment made based on finite-order moments information about the distribution function (DF) is insufficient

arXiv:2403.01815 [pdf, other]

Lithium Abundances from the LAMOST Med-Resolution Survey Data Release 9

Authors: Ming-Yi Ding, Jian-Rong Shi, Hong-liang Yan, Chun-Qian Li, Qi Gao, Tian-Yi Chen, Jing-Hua Zhang, Shuai Liu, Xiao-Jin Xie, Yao-Jia Tang, Ze-Ming Zhou, Jiang-Tao Wang

Abstract: Lithium, a fragile but crucial chemical element in the universe, exhibits interesting and complex behaviors. Thanks to the massive spectroscopic data from the Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST) medium-resolution survey (MRS), we can investigate the lithium abundances in a large and diverse sample of stars, which could bring vital help to study the origin and evolution of lithium. In this work, we use the Li 6,707.8 Å line to derive the lithium abundance through a template-matching method. A catalog of precise lithium abundance is presented for 795,384 spectra corresponding to 455,752 stars from the LAMOST MRS Data Release (DR) 9. Comparing our results with those of external high-resolution references, we find a good consistency with a typical deviation of σ A(Li) ~ 0.2 dex. We also analyze the internal errors using stars that have multiple LAMOST MRS observations, which reach as low as 0.1 dex when the signal-to-noise ratio (S/N) of the spectra > 20. Besides, our result indicates that a small fraction of giant stars still exhibit surprisingly high lithium content, and 967 stars are identified as Li-rich giants with A(Li) > 1.5 dex, accounting for ~ 2.6% of our samples. If one takes into account the fact that nearly all stars deplete lithium during the main sequence, then the fraction of Li-rich stars may be much higher than 2.6%. This new catalog covers a wide range of stellar evolutionary stages from pre-main sequence to giants, and will provide help to the further study of the chemical evolution of lithium.

Submitted 4 March, 2024; originally announced March 2024.

Comments: 16 pages, 9 figures, 2 tables. Accepted for publication in ApJS

arXiv:2402.17238 [pdf, other]

Does Negative Sampling Matter? A Review with Insights into its Theory and Applications

Authors: Zhen Yang, Ming Ding, Tinglin Huang, Yukuo Cen, Junshuai Song, Bin Xu, Yuxiao Dong, Jie Tang

Abstract: Negative sampling has swiftly risen to prominence as a focal point of research, with wide-ranging applications spanning machine learning, computer vision, natural language processing, data mining, and recommender systems. This growing interest raises several critical questions: Does negative sampling really matter? Is there a general framework that can incorporate all existing negative sampling methods? In what fields is it applied? Addressing these questions, we propose a general framework that leverages negative sampling. Delving into the history of negative sampling, we trace the development of negative sampling through five evolutionary paths. We dissect and categorize the strategies used to select negative sample candidates, detailing global, local, mini-batch, hop, and memory-based approaches. Our review categorizes current negative sampling methods into five types: static, hard, GAN-based, Auxiliary-based, and In-batch methods, providing a clear structure for understanding negative sampling. Beyond detailed categorization, we highlight the application of negative sampling in various areas, offering insights into its practical benefits. Finally, we briefly discuss open problems and future directions for negative sampling.

Submitted 27 February, 2024; originally announced February 2024.

Comments: 20 pages, 11 figures

arXiv:2402.16836 [pdf, other]

PhyGrasp: Generalizing Robotic Grasping with Physics-informed Large Multimodal Models

Authors: Dingkun Guo, Yuqi Xiang, Shuqi Zhao, Xinghao Zhu, Masayoshi Tomizuka, Mingyu Ding, Wei Zhan

Abstract: Robotic grasping is a fundamental aspect of robot functionality, defining how robots interact with objects. Despite substantial progress, its generalizability to counter-intuitive or long-tailed scenarios, such as objects with uncommon materials or shapes, remains a challenge. In contrast, humans can easily apply their intuitive physics to grasp skillfully and change grasps efficiently, even for objects they have never seen before. This work delves into infusing such physical commonsense reasoning into robotic manipulation. We introduce PhyGrasp, a multimodal large model that leverages inputs from two modalities: natural language and 3D point clouds, seamlessly integrated through a bridge module. The language modality exhibits robust reasoning capabilities concerning the impacts of diverse physical properties on grasping, while the 3D modality comprehends object shapes and parts. With these two capabilities, PhyGrasp is able to accurately assess the physical properties of object parts and determine optimal grasping poses. Additionally, the model's language comprehension enables human instruction interpretation, generating grasping poses that align with human preferences. To train PhyGrasp, we construct a dataset PhyPartNet with 195K object instances with varying physical properties and human preferences, alongside their corresponding language descriptions.
Extensive experiments conducted in the simulation and on the real robots demonstrate that PhyGrasp achieves state-of-the-art performance, particularly in long-tailed cases, e.g., about 10% improvement in success rate over GraspNet. Project page: https://sites.google.com/view/phygrasp

Submitted 26 February, 2024; originally announced February 2024.

arXiv:2402.16679 [pdf, other]

Unveiling the Initiation Route of Coronal Mass Ejections through their Slow Rise Phase

Authors: Chen Xing, Guillaume Aulanier, Xin Cheng, Chun Xia, Mingde Ding

Abstract: Understanding the early evolution of coronal mass ejections (CMEs), in particular their initiation, is the key to forecasting solar eruptions and induced disastrous space weather. Although many initiation mechanisms have been proposed, a full understanding of CME initiation, which is identified as a slow rise of CME progenitors in kinematics before the impulsive acceleration, remains elusive. Here, with a state-of-the-art thermal-magnetohydrodynamics simulation, we determine a complete CME initiation route in which multiple mainstream mechanisms occur in sequence yet are tightly coupled. The slow rise is first triggered and driven by the developing hyperbolic flux tube (HFT) reconnection. Subsequently, the slow rise continues as driven by the coupling of the HFT reconnection and the early development of torus instability. The end of the slow rise, i.e., the onset of the impulsive acceleration, is induced by the start of the fast magnetic reconnection coupled with the torus instability. These results unveil that the CME initiation is a complicated process involving multiple physical mechanisms, thus being hardly resolved by a single initiation mechanism.

Submitted 26 February, 2024; originally announced February 2024.

Comments: 35 pages, 15 figures, accepted for publication in ApJ

arXiv:2402.16117 [pdf, other]

RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis

Authors: Yao Mu, Junting Chen, Qinglong Zhang, Shoufa Chen, Qiaojun Yu, Chongjian Ge, Runjian Chen, Zhixuan Liang, Mengkang Hu, Chaofan Tao, Peize Sun, Haibao Yu, Chao Yang, Wenqi Shao, Wenhai Wang, Jifeng Dai, Yu Qiao, Mingyu Ding, Ping Luo

Abstract: Robotic behavior synthesis, the problem of understanding multimodal inputs and generating precise physical control for robots, is an important part of Embodied AI. Despite successes in applying multimodal large language models for high-level understanding, it remains challenging to translate these conceptual understandings into detailed robotic actions while achieving generalization across various scenarios. In this paper, we propose a tree-structured multimodal code generation framework for generalized robotic behavior synthesis, termed RoboCodeX. RoboCodeX decomposes high-level human instructions into multiple object-centric manipulation units consisting of physical preferences such as affordance and safety constraints, and applies code generation to introduce generalization ability across various robotics platforms. To further enhance the capability to map conceptual and perceptual understanding into control commands, a specialized multimodal reasoning dataset is collected for pre-training and an iterative self-updating methodology is introduced for supervised fine-tuning. Extensive experiments demonstrate that RoboCodeX achieves state-of-the-art performance in both simulators and real robots on four different kinds of manipulation tasks and one navigation task.

Submitted 25 February, 2024; originally announced February 2024.

arXiv:2402.14623 [pdf, other]

RoboScript: Code Generation for Free-Form Manipulation Tasks across Real and Simulation

Authors: Junting Chen, Yao Mu, Qiaojun Yu, Tianming Wei, Silang Wu, Zhecheng Yuan, Zhixuan Liang, Chao Yang, Kaipeng Zhang, Wenqi Shao, Yu Qiao, Huazhe Xu, Mingyu Ding, Ping Luo

Abstract: Rapid progress in high-level task planning and code generation for open-world robot manipulation has been witnessed in Embodied AI. However, previous studies put much effort into the general common sense reasoning and task planning capabilities of large-scale language or multi-modal models, but relatively little effort into ensuring the deployability of generated code on real robots and into other fundamental components of autonomous robot systems, including robot perception, motion planning, and control. To bridge this "ideal-to-real" gap, this paper presents RobotScript, a platform for 1) a deployable robot manipulation pipeline powered by code generation; and 2) a code generation benchmark for robot manipulation tasks in free-form natural language. The RobotScript platform addresses this gap by emphasizing the unified interface with both simulation and real robots, based on abstraction from the Robot Operating System (ROS), ensuring syntax compliance and simulation validation with Gazebo. We demonstrate the adaptability of our code generation framework across multiple robot embodiments, including the Franka and UR5 robot arms, and multiple grippers. Additionally, our benchmark assesses reasoning abilities for physical space and constraints, highlighting the differences between GPT-3.5, GPT-4, and Gemini in handling complex physical interactions. Finally, we present a thorough evaluation of the whole system, exploring how each module in the pipeline (code generation, perception, motion planning, and even object geometric properties) impacts the overall performance of the system.

Submitted 22 February, 2024; originally announced February 2024.

Comments: 10 pages of main paper, 4 pages of appendix; 10 figures in main paper, 3 figures in appendix

ACM Class: I.2.7; I.2.8; I.2.9; I.2.10

arXiv:2402.14209 [pdf, other]

Developing an Automated Detection, Tracking and Analysis Method for Solar Filaments Observed by CHASE via Machine Learning

Authors: Z. Zheng, Q. Hao, Y. Qiu, J. Hong, C. Li, M. D. Ding

Abstract: Studies on the dynamics of solar filaments have significant implications for understanding their formation, evolution, and eruption, which are of great importance for space weather warning and forecasting. The H$α$ Imaging Spectrograph (HIS) onboard the recently launched Chinese H$α$ Solar Explorer (CHASE) can provide full-disk solar H$α$ spectroscopic observations, which bring us an opportunity to systematically explore and analyze the plasma dynamics of filaments. The dramatically increased volume of observation data requires automated processing and analysis, which would be impossible to carry out manually. In this paper, we utilize the U-Net model to identify filaments and implement the Channel and Spatial Reliability Tracking (CSRT) algorithm for automated filament tracking. In addition, we use the cloud model to invert the line-of-sight velocity of filaments and employ the graph theory algorithm to extract the filament spine, which can advance our understanding of the dynamics of filaments. The favorable test performance confirms the validity of our method, which will be implemented in the following statistical analyses of filament features and dynamics of CHASE/HIS observations.

Submitted 21 February, 2024; originally announced February 2024.

Comments: 13 pages, 9 figures, Accepted for publication in ApJ

arXiv:2402.12957 [pdf, other]

Energy-Efficient Wireless Federated Learning via Doubly Adaptive Quantization

Authors: Xuefeng Han, Wen Chen, Jun Li, Ming Ding, Qingqing Wu, Kang Wei, Xiumei Deng, Zhen Mei

Abstract: Federated learning (FL) has been recognized as a viable distributed learning paradigm for training a machine learning model across distributed clients without uploading raw data. However, FL in wireless networks still faces two major challenges, i.e., large communication overhead and high energy consumption, which are exacerbated by client heterogeneity in dataset sizes and wireless channels. While model quantization is effective for energy reduction, existing works ignore adapting quantization to heterogeneous clients and FL convergence. To address these challenges, this paper develops an energy optimization problem of jointly designing quantization levels, scheduling clients, allocating channels, and controlling computation frequencies (QCCF) in wireless FL. Specifically, we derive an upper bound identifying the influence of client scheduling and quantization errors on FL convergence. Under the long-term convergence constraints and wireless constraints, the problem is established and transformed into an instantaneous problem with Lyapunov optimization. Solving Karush-Kuhn-Tucker conditions, our closed-form solution indicates that the doubly adaptive quantization level rises with the training process and correlates negatively with dataset sizes. Experimental results validate our theoretical results, showing that QCCF consumes less energy with faster convergence compared with state-of-the-art baselines.

Submitted 20 February, 2024; originally announced February 2024.

arXiv:2402.10473 [pdf, other]

Privacy for Fairness: Information Obfuscation for Fair Representation Learning with Local Differential Privacy

Authors: Songjie Xie, Youlong Wu, Jiaxuan Li, Ming Ding, Khaled B. Letaief

Abstract: As machine learning (ML) becomes more prevalent in human-centric applications, there is a growing emphasis on algorithmic fairness and privacy protection. While previous research has explored these areas as separate objectives, there is a growing recognition of the complex relationship between privacy and fairness. However, previous works have primarily focused on examining the interplay between privacy and fairness through empirical investigations, with limited attention given to theoretical exploration. This study aims to bridge this gap by introducing a theoretical framework that enables a comprehensive examination of their interrelation. We shall develop and analyze an information bottleneck (IB) based information obfuscation method with local differential privacy (LDP) for fair representation learning. In contrast to many empirical studies on fairness in ML, we show that the incorporation of LDP randomizers during the encoding process can enhance the fairness of the learned representation. Our analysis will demonstrate that the disclosure of sensitive information is constrained by the privacy budget of the LDP randomizer, thereby enabling the optimization process within the IB framework to effectively suppress sensitive information while preserving the desired utility through obfuscation. Based on the proposed method, we further develop a variational representation encoding approach that simultaneously achieves fairness and LDP. Our variational encoding approach offers practical advantages. It is trained using a non-adversarial method and does not require the introduction of any variational prior. Extensive experiments will be presented to validate our theoretical results and demonstrate the ability of our proposed approach to achieve both LDP and fairness while preserving adequate utility.

Submitted 16 February, 2024; originally announced February 2024.

arXiv:2402.08931 [pdf, other]

Depth-aware Volume Attention for Texture-less Stereo Matching

Authors: Tong Zhao, Mingyu Ding, Wei Zhan, Masayoshi Tomizuka, Yintao Wei

Abstract: Stereo matching plays a crucial role in 3D perception and scenario understanding. Despite the proliferation of promising methods, addressing texture-less and texture-repetitive conditions remains challenging due to the insufficient availability of rich geometric and semantic information. In this paper, we propose a lightweight volume refinement scheme to tackle the texture deterioration in practical outdoor scenarios. Specifically, we introduce a depth volume supervised by the ground-truth depth map, capturing the relative hierarchy of image texture. Subsequently, the disparity discrepancy volume undergoes hierarchical filtering through the incorporation of depth-aware hierarchy attention and target-aware disparity attention modules. Local fine structure and context are emphasized to mitigate ambiguity and redundancy during volume aggregation. Furthermore, we propose a more rigorous evaluation metric that considers depth-wise relative error, providing comprehensive evaluations for universal stereo matching and depth estimation models. We extensively validate the superiority of our proposed methods on public datasets. Results demonstrate that our model achieves state-of-the-art performance, particularly excelling in scenarios with texture-less images. The code is available at https://github.com/ztsrxh/DVANet.

Submitted 26 February, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

Comments: 10 pages, 6 figures

arXiv:2402.07374 [pdf, ps, other]

The White-light Emissions in Two X-class Flares Observed by ASO-S and CHASE

Authors: Ying Li, Zhichen Jing, De-Chao Song, Qiao Li, Jun Tian, Xiaofeng Liu, Ya Wang, M. D. Ding, Andrea Francesco Battaglia, Li Feng, Hui Li, Weiqun Gan

Abstract: The white-light continuum emissions in solar flares (i.e., white-light flares) are usually observed on the solar disk but, in a few cases, off the limb. Here we present on-disk as well as off-limb continuum emissions at 3600 Å (in the Balmer continuum) in an X2.1 flare (SOL2023-03-03T17:52) and an X1.5 flare (SOL2023-08-07T20:46), respectively, observed by the White-light Solar Telescope (WST) on the Advanced Space-based Solar Observatory (ASO-S). These continuum emissions are seen at the ribbons for the X2.1 flare and on loops during the X1.5 event, in which the latter also appears in the decay phase. These emissions also show up in the pseudo-continuum images at Fe I λ6173 from the Helioseismic and Magnetic Imager (HMI) on the Solar Dynamics Observatory (SDO). In addition, the ribbon sources in the X2.1 flare exhibit significant enhancements in the Fe I line at 6569.2 Å and the nearby continuum observed by the Chinese Hα Solar Explorer (CHASE). It is found that the on-disk continuum emissions in the X2.1 flare are related to nonthermal electron-beam heating, either directly or indirectly, while the off-limb emissions in the X1.5 flare are associated with thermal plasma cooling or due to Thomson scattering. These comprehensive continuum observations can provide good constraints on flare energy deposition models, which helps to better understand the physical mechanism of white-light flares.

Submitted 11 February, 2024; originally announced February 2024.

Comments: 13 pages, 1 table, 4 figures, accepted for publication in ApJL

arXiv:2402.06682 [pdf, other]

Private Knowledge Sharing in Distributed Learning: A Survey

Authors: Yasas Supeksala, Dinh C. Nguyen, Ming Ding, Thilina Ranbaduge, Calson Chua, Jun Zhang, Jun Li, H. Vincent Poor

Abstract: The rise of Artificial Intelligence (AI) has revolutionized numerous industries and transformed the way society operates. Its widespread use has led to the distribution of AI and its underlying data across many intelligent systems. In this light, it is crucial to utilize information in learning processes that are either distributed or owned by different entities. As a result, modern data-driven services have been developed to integrate distributed knowledge entities into their outcomes. In line with this goal, the latest AI models are frequently trained in a decentralized manner. Distributed learning involves multiple entities working together to make collective predictions and decisions. However, this collaboration can also bring about security vulnerabilities and challenges. This paper provides an in-depth survey on private knowledge sharing in distributed learning, examining various knowledge components utilized in leading distributed learning architectures. Our analysis sheds light on the most critical vulnerabilities that may arise when using these components in a distributed setting. We further identify and examine defensive strategies for preserving the privacy of these knowledge components and preventing malicious parties from manipulating or accessing the knowledge information. Finally, we highlight several key limitations of knowledge sharing in distributed learning and explore potential avenues for future research.

Submitted 8 February, 2024; originally announced February 2024.

Comments: Manuscript submitted to ACM

arXiv:2402.04236 [pdf, other]

CogCoM: Train Large Vision-Language Models Diving into Details through Chain of Manipulations

Authors: Ji Qi, Ming Ding, Weihan Wang, Yushi Bai, Qingsong Lv, Wenyi Hong, Bin Xu, Lei Hou, Juanzi Li, Yuxiao Dong, Jie Tang

Abstract: Vision-Language Models (VLMs) have demonstrated their broad effectiveness thanks to extensive training in aligning visual instructions to responses. However, such training of conclusive alignment leads models to ignore essential visual reasoning, further resulting in failures in meticulous visual problems and unfaithful responses. Drawing inspiration from human cognition in solving visual problems (e.g., marking, zoom in), this paper introduces Chain of Manipulations, a mechanism that enables VLMs to solve problems step-by-step with evidence. After training, models can solve various visual problems by eliciting intrinsic manipulations (e.g., grounding, zoom in) with results (e.g., boxes, image) actively without involving external tools, while also allowing users to trace error causes. We study the roadmap to implement this mechanism, including (1) a flexible design of manipulations upon extensive analysis, (2) an efficient automated data generation pipeline, (3) a compatible VLM architecture capable of multi-turn multi-image, and (4) a model training process for versatile capabilities. With the design, we also manually annotate 6K high-quality samples for the challenging graphical mathematical problems. Our trained model, CogCoM, equipped with this mechanism with 17B parameters achieves state-of-the-art performance across 9 benchmarks from 4 categories, demonstrating the effectiveness while preserving the interpretability. Our code, model weights, and collected data are publicly available at https://github.com/THUDM/CogCoM.

Submitted 22 May, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

Comments: 19 pages, 9 figures

arXiv:2401.16730 [pdf, other]

doi 10.3847/2041-8213/ad1e4f

Three-dimensional velocity fields of the solar filament eruptions detected by CHASE

Authors: Ye Qiu, Chuan Li, Yang Guo, Zhen Li, Mingde Ding, Linggao Kong

Abstract: The eruption of solar filaments, also known as prominences appearing off-limb, is a common phenomenon in the solar atmosphere. It ejects massive plasma and high-energy particles into interplanetary space, disturbing the solar-terrestrial environment. It is vital to obtain the three-dimensional velocity fields of erupting filaments for space-weather predictions. We derive the three-dimensional kinematics of an off-limb prominence and an on-disk filament, respectively, using the full-disk spectral and imaging data detected by the Chinese H$α$ Solar Explorer (CHASE). It is found that both the prominence and the filament experience a fast semicircle-shaped expansion at first. The prominence keeps propagating outward with an increasing velocity until escaping successfully, whereas the south leg of the prominence finally moves back to the Sun in a swirling manner. For the filament, the internal plasma falls back to the Sun associated with an anticlockwise rotation in the late ejection, matching the failed eruption without a coronal mass ejection. During the eruptions, both the prominence and the filament show material splitting along the line-of-sight direction, revealed by the bimodal H$α$ spectral profiles. For the prominence, the splitting begins at the top and gradually spreads to almost the whole prominence with a fast blue-shift component and a slow red-shift component. The material splitting in the filament is more fragmental. As shown by the present results, the CHASE full-disk spectroscopic observations make it possible to systematically study the three-dimensional kinematics of solar filament eruptions.

Submitted 29 January, 2024; originally announced January 2024.

Comments: 12 pages, 5 figures

Journal ref: The Astrophysical Journal Letters, 961 (2024), 2, L30

arXiv:2401.11477 [pdf, ps, other]

Some properties of generalized cluster algebras of geometric types

Authors: Junyuan Huang, Xueqing Chen, Fan Xu, Ming Ding

Abstract: We study the lower bound algebras generated by the generalized projective cluster variables of acyclic generalized cluster algebras of geometric types. We prove that this lower bound algebra coincides with the corresponding generalized cluster algebra under the coprimality condition. As a corollary, we obtain the dual PBW bases of these generalized cluster algebras. Moreover, we show that if the standard monomials of a generalized cluster algebra of geometric type are linearly independent, then the directed graph associated to the initial generalized seed of this generalized cluster algebra does not have 3-cycles.

Submitted 21 January, 2024; originally announced January 2024.

Comments: 18 pages

arXiv:2401.08573 [pdf, other]

WAVES: Benchmarking the Robustness of Image Watermarks

Authors: Bang An, Mucong Ding, Tahseen Rabbani, Aakriti Agrawal, Yuancheng Xu, Chenghao Deng, Sicheng Zhu, Abdirisak Mohamed, Yuxin Wen, Tom Goldstein, Furong Huang

Abstract: In the burgeoning age of generative AI, watermarks act as identifiers of provenance and artificial content. We present WAVES (Watermark Analysis Via Enhanced Stress-testing), a benchmark for assessing image watermark robustness, overcoming the limitations of current evaluation methods. WAVES integrates detection and identification tasks and establishes a standardized evaluation protocol comprised of a diverse range of stress tests. The attacks in WAVES range from traditional image distortions to advanced, novel variations of diffusive, and adversarial attacks. Our evaluation examines two pivotal dimensions: the degree of image quality degradation and the efficacy of watermark detection after attacks. Our novel, comprehensive evaluation reveals previously undetected vulnerabilities of several modern watermarking algorithms. We envision WAVES as a toolkit for the future development of robust watermarks. The project is available at https://wavesbench.github.io/

Submitted 6 June, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

Comments: Accepted by ICML 2024

arXiv:2401.08402 [pdf, other]

Uniform Recovery Guarantees for Quantized Corrupted Sensing Using Structured or Generative Priors

Authors: Junren Chen, Zhaoqiang Liu, Meng Ding, Michael K. Ng

Abstract: This paper studies quantized corrupted sensing where the measurements are contaminated by unknown corruption and then quantized by a dithered uniform quantizer. We establish uniform guarantees for Lasso that ensure the accurate recovery of all signals and corruptions using a single draw of the sub-Gaussian sensing matrix and uniform dither. For signal and corruption with structured priors (e.g., sparsity, low-rankness), our uniform error rate for constrained Lasso typically coincides with the non-uniform one [Sun, Cui and Liu, 2022] up to logarithmic factors. By contrast, our uniform error rate for unconstrained Lasso exhibits worse dependence on the structured parameters due to regularization parameters larger than the ones for non-uniform recovery. For signal and corruption living in the ranges of some Lipschitz continuous generative models (referred to as generative priors), we achieve uniform recovery via constrained Lasso with a measurement number proportional to the latent dimensions of the generative models. Our treatments of the two kinds of priors are (nearly) unified and share the common key ingredient of the (global) quantized product embedding (QPE) property, which states that the dithered uniform quantization (universally) preserves inner product. As a by-product, our QPE result refines the one in [Xu and Jacques, 2020] under a sub-Gaussian random matrix, and in this specific instance we are able to sharpen the uniform error decaying rate (for the projected-back projection estimator with signals in some convex symmetric set) presented therein from $O(m^{-1/16})$ to $O(m^{-1/8})$.

Submitted 16 January, 2024; originally announced January 2024.

Comments: 69 pages, 11 figures (In Review)

arXiv:2401.04783 [pdf, other]

Hyperbolic Machine Learning Moment Closures for the BGK Equations

Authors: Andrew J. Christlieb, Mingchang Ding, Juntao Huang, Nicholas A. Krupansky

Abstract: We introduce a hyperbolic closure for the Grad moment expansion of the Bhatnagar-Gross-Krook's (BGK) kinetic model using a neural network (NN) trained on BGK's moment data. This closure is motivated by the exact closure for the free streaming limit that we derived in our paper on closures in transport \cite{Huang2022-RTE1}. The exact closure relates the gradient of the highest moment to the gradient of four lower moments. As with our past work, the model presented here learns the gradient of the highest moment in terms of the coefficients of gradients for all lower ones. By necessity, this means that the resulting hyperbolic system is not conservative in the highest moment. For stability, the output layers of the NN are designed to enforce hyperbolicity and Galilean invariance. This ensures the model can be run outside of the training window of the NN. Unlike our previous work on radiation transport that dealt with linear models, the BGK model's nonlinearity demanded advanced training tools. These comprised an optimal learning rate discovery, one cycle training, batch normalization in each neural layer, and the use of the AdamW optimizer. To address the non-conservative structure of the hyperbolic model, we adopt the FORCE numerical method to achieve robust solutions. This results in a comprehensive computing model combining learned closures with methods for solving hyperbolic models. The proposed model can capture accurate moment solutions across a broad spectrum of Knudsen numbers. Our paper details the multi-scale model construction, and the model is run on a range of test problems.

Submitted 9 January, 2024; originally announced January 2024.

Comments: 30 pages, 7 figures

MSC Class: 82C32 (Primary); 82C40; 82C70 (Secondary)

arXiv:2312.16415 [pdf, other]

Deterministic Minimum Steiner Cut in Maximum Flow Time

Authors: Matthew Ding, Jason Li

Abstract: We devise a deterministic algorithm for minimum Steiner cut, which uses $(\log n)^{O(1)}$ maximum flow calls and additional near-linear time. This algorithm improves on Li and Panigrahi's (FOCS 2020) algorithm, which uses $(\log n)^{O(1/ε^4)}$ maximum flow calls and additional $O(m^{1+ε})$ time, for $ε > 0$. Our algorithm thus shows that deterministic minimum Steiner cut can be solved in maximum flow time up to polylogarithmic factors, given any black-box deterministic maximum flow algorithm. Our main technical contribution is a novel deterministic graph decomposition method for terminal vertices that generalizes all existing $s$-strong partitioning methods, which we believe may have future applications.

Submitted 1 July, 2024; v1 submitted 27 December, 2023; originally announced December 2023.

Comments: 18 pages, 1 figure, to appear at ESA 2024

ACM Class: G.2.2; F.2.2

arXiv:2312.15589 [pdf, other]

doi 10.1051/0004-6361/202347564

A Method for Determining the Locations and Configurations of Magnetic Reconnection within 3D Turbulent Plasmas

Authors: Yulei Wang, Xin Cheng, Yang Guo, Jinhan Guo, Mingde Ding

Abstract: Context. Three-dimensional (3D) reconnection is an important mechanism for efficiently releasing energy during astrophysical eruptive events, which is difficult to analyze quantitatively, especially within turbulent plasmas. Aims. In this paper, an efficient method for identifying locations and configurations of 3D reconnection from MHD data is developed. Methods. This method analyzes the local nonideal electric field and magnetic structure at an arbitrary position. Because it performs only algebraic manipulations on the discrete field data and avoids computationally expensive operations like field-line tracing and root-finding, the method is naturally efficient. To validate this method, we apply it to the 3D data from a high-resolution simulation of a Harris-sheet reconnection and a data-driven simulation of a coronal flux rope eruption. Results. It is shown that this method can precisely identify the local structures of a discrete magnetic field. Through the information of the nonideal electric field and the geometric attributes of the magnetic field, the local structures of reconnection sites can be effectively and comprehensively determined. For fine turbulent processes, both qualitative pictures and quantitative statistical properties of small-scale reconnection structures can be obtained. For large-scale solar simulations, macro-scale magnetic structures such as flux ropes and eruption current sheets can also be recognized. Conclusions. We develop a powerful method to analyze multi-scale structures of 3D reconnection. It can be applied not only in MHD simulations but also in kinetic simulations, plasma experiments, and in-situ observations.

Submitted 26 March, 2024; v1 submitted 24 December, 2023; originally announced December 2023.

Comments: 19 pages, 14 figures, 4 tables. Accepted for publication in Astronomy & Astrophysics. The code URL: https://github.com/RainthunderWYL/LoRD

arXiv:2312.12378 [pdf, other]

doi 10.3847/1538-4357/ad09d7

A Statistical Study of Soft X-ray Flares on Solar-type Stars

Authors: Zhanhao Zhao, Ziqian Hua, Xin Cheng, Zhiyuan Li, Mingde Ding

Abstract: The statistical characteristics of stellar flares at optical bands have been studied extensively, but they remain to be studied at soft X-ray bands, in particular for solar-type stars. Here, we present a statistical study of soft X-ray flares on solar-type stars, which can help understand multi-wavelength behaviors of stellar flares. We mainly use Chandra Source Catalog Release 2.0, which includes a number of flaring stars with denoted variability, and Gaia Data Release 3, which includes necessary information for classifying stars. We also develop a set of methods for identifying and classifying stellar soft X-ray flares and estimating their properties. A detailed statistical investigation of 129 flare samples on 103 selected nearby solar-type stars yields the following main results. (1) The flare energy emitted at the soft X-ray band in our sample ranges from $\sim 10^{33}$ to $\sim 10^{37} \ \mathrm{erg}$, and the majority of them are superflares with the most energetic one having energy of $6.0_{-4.7}^{+3.2} \times 10^{37} \ \mathrm{erg}$. (2) The flare duration is related to its energy as formulated by $T_\mathrm{duration,SXR} \propto E_\mathrm{flare,SXR}^{\ 0.201 \pm 0.024}$, which is different from those derived at optical and NIR bands, indicating distinct radiation mechanisms at different bands. (3) The frequency distribution of stellar flares as a function of energy is formulated as $\mathrm{d} N_\mathrm{flare} / \mathrm{d} E_\mathrm{flare,SXR} \propto E_\mathrm{flare,SXR}^{\ -1.77}$, which is similar to the results found at other bands and on other types of stars, indicating that the energy emitted at the soft X-ray band could be a constant fraction of the full-band bolometric energy.

Submitted 2 December, 2023; originally announced December 2023.

arXiv:2312.11598 [pdf, other]

SkillDiffuser: Interpretable Hierarchical Planning via Skill Abstractions in Diffusion-Based Task Execution

Authors: Zhixuan Liang, Yao Mu, Hengbo Ma, Masayoshi Tomizuka, Mingyu Ding, Ping Luo

Abstract: Diffusion models have demonstrated strong potential for robotic trajectory planning. However, generating coherent trajectories from high-level instructions remains challenging, especially for long-range composition tasks requiring multiple sequential skills. We propose SkillDiffuser, an end-to-end hierarchical planning framework integrating interpretable skill learning with conditional diffusion planning to address this problem. At the higher level, the skill abstraction module learns discrete, human-understandable skill representations from visual observations and language instructions. These learned skill embeddings are then used to condition the diffusion model to generate customized latent trajectories aligned with the skills. This allows generating diverse state trajectories that adhere to the learnable skills. By integrating skill learning with conditional trajectory generation, SkillDiffuser produces coherent behavior following abstract instructions across diverse tasks. Experiments on multi-task robotic manipulation benchmarks like Meta-World and LOReL demonstrate state-of-the-art performance and human-interpretable skill representations from SkillDiffuser. More visualization results and information could be found on our website.

Submitted 28 March, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

Comments: Accepted by CVPR 2024. Camera ready version. Project page: https://skilldiffuser.github.io/

arXiv:2312.11562 [pdf, other]

A Survey of Reasoning with Foundation Models

Authors: Jiankai Sun, Chuanyang Zheng, Enze Xie, Zhengying Liu, Ruihang Chu, Jianing Qiu, Jiaqi Xu, Mingyu Ding, Hongyang Li, Mengzhe Geng, Yue Wu, Wenhai Wang, Junsong Chen, Zhangyue Yin, Xiaozhe Ren, Jie Fu, Junxian He, Wu Yuan, Qi Liu, Xihui Liu, Yu Li, Hao Dong, Yu Cheng, Ming Zhang, Pheng Ann Heng, et al. (9 additional authors not shown)

Abstract: Reasoning, a crucial ability for complex problem-solving, plays a pivotal role in various real-world settings such as negotiation, medical diagnosis, and criminal investigation. It serves as a fundamental methodology in the field of Artificial General Intelligence (AGI). With the ongoing development of foundation models, e.g., Large Language Models (LLMs), there is a growing interest in exploring their abilities in reasoning tasks. In this paper, we introduce seminal foundation models proposed or adaptable for reasoning, highlighting the latest advancements in various reasoning tasks, methods, and benchmarks. We then delve into the potential future directions behind the emergence of reasoning abilities within foundation models. We also discuss the relevance of multimodal learning, autonomous agents, and super alignment in the context of reasoning. By discussing these future research directions, we hope to inspire researchers in their exploration of this field, stimulate further advancements in reasoning with foundation models, and contribute to the development of AGI.

Submitted 25 January, 2024; v1 submitted 17 December, 2023; originally announced December 2023.

Comments: 20 Figures, 160 Pages, 750+ References, Project Page https://github.com/reasoning-survey/Awesome-Reasoning-Foundation-Models

arXiv:2312.08914 [pdf, other]

CogAgent: A Visual Language Model for GUI Agents

Authors: Wenyi Hong, Weihan Wang, Qingsong Lv, Jiazheng Xu, Wenmeng Yu, Junhui Ji, Yan Wang, Zihan Wang, Yuxuan Zhang, Juanzi Li, Bin Xu, Yuxiao Dong, Ming Ding, Jie Tang

Abstract: People are spending an enormous amount of time on digital devices through graphical user interfaces (GUIs), e.g., computer or smartphone screens. Large language models (LLMs) such as ChatGPT can assist people in tasks like writing emails, but struggle to understand and interact with GUIs, thus limiting their potential to increase automation levels. In this paper, we introduce CogAgent, an 18-billion-parameter visual language model (VLM) specializing in GUI understanding and navigation. By utilizing both low-resolution and high-resolution image encoders, CogAgent supports input at a resolution of 1120*1120, enabling it to recognize tiny page elements and text. As a generalist visual language model, CogAgent achieves the state of the art on five text-rich and four general VQA benchmarks, including VQAv2, OK-VQA, Text-VQA, ST-VQA, ChartQA, infoVQA, DocVQA, MM-Vet, and POPE. CogAgent, using only screenshots as input, outperforms LLM-based methods that consume extracted HTML text on both PC and Android GUI navigation tasks -- Mind2Web and AITW, advancing the state of the art. The model and codes are available at https://github.com/THUDM/CogVLM.

Submitted 21 December, 2023; v1 submitted 14 December, 2023; originally announced December 2023.

Comments: 27 pages, 19 figures

arXiv:2312.08323 [pdf, other]

PnPNet: Pull-and-Push Networks for Volumetric Segmentation with Boundary Confusion

Authors: Xin You, Ming Ding, Minghui Zhang, Hanxiao Zhang, Yi Yu, Jie Yang, Yun Gu

Abstract: Precise boundary segmentation of volumetric images is a critical task for image-guided diagnosis and computer-assisted intervention, especially for boundary confusion in clinical practice. However, U-shape networks cannot effectively resolve this challenge due to the lack of boundary shape constraints. Besides, existing methods of refining boundaries overemphasize the slender structure, which results in the overfitting phenomenon due to networks' limited abilities to model tiny objects. In this paper, we reconceptualize the mechanism of boundary generation by encompassing the interaction dynamics with adjacent regions. Moreover, we propose a unified network termed PnPNet to model shape characteristics of the confused boundary region. Core ingredients of PnPNet contain the pushing and pulling branches. Specifically, based on diffusion theory, we devise the semantic difference module (SDM) from the pushing branch to squeeze the boundary region. Explicit and implicit differential information inside SDM significantly boost representation abilities for inter-class boundaries. Additionally, motivated by the K-means algorithm, the class clustering module (CCM) from the pulling branch is introduced to stretch the intersected boundary region. Thus, pushing and pulling branches will shrink and enlarge the boundary uncertainty respectively. They furnish two adversarial forces to promote models to output a more precise delineation of boundaries. We carry out experiments on three challenging public datasets and one in-house dataset, containing three types of boundary confusion in model predictions. Experimental results demonstrate the superiority of PnPNet over other segmentation networks, especially on evaluation metrics of HD and ASSD. Besides, pushing and pulling branches can serve as plug-and-play modules to enhance classic U-shape baseline models. Codes are available.

Submitted 13 December, 2023; originally announced December 2023.

Comments: 13 Figures, 8 Tables

arXiv:2312.07999 [pdf, other]

Random Serial Dictatorship with Transfers

Authors: Sudharsan Sundar, Eric Gao, Trevor Chow, Matthew Ding

Abstract: It is well known that Random Serial Dictatorship is strategy-proof and leads to a Pareto-Efficient outcome. We show that this result breaks down when individuals are allowed to make transfers, and adapt Random Serial Dictatorship to encompass trades between individuals. Strategic analysis of play under the new mechanisms we define is given, accompanied by simulations to quantify the gains from trade.

Submitted 13 December, 2023; originally announced December 2023.

arXiv:2312.07532 [pdf, other]

Interfacing Foundation Models' Embeddings

Authors: Xueyan Zou, Linjie Li, Jianfeng Wang, Jianwei Yang, Mingyu Ding, Junyi Wei, Zhengyuan Yang, Feng Li, Hao Zhang, Shilong Liu, Arul Aravinthan, Yong Jae Lee, Lijuan Wang

Abstract: Foundation models possess strong capabilities in reasoning and memorizing across modalities. To further unleash the power of foundation models, we present FIND, a generalized interface for aligning foundation models' embeddings with unified image and dataset-level understanding spanning modality and granularity. As shown in the teaser figure, a lightweight transformer interface without tuning any foundation model weights is enough for segmentation, grounding, and retrieval in an interleaved manner. The proposed interface has the following favorable attributes: (1) Generalizable. It applies to various tasks spanning retrieval, segmentation, etc., under the same architecture and weights. (2) Interleavable. With the benefit of multi-task multi-modal training, the proposed interface creates an interleaved shared embedding space. (3) Extendable. The proposed interface is adaptable to new tasks and new models. In light of the interleaved embedding space, we introduce FIND-Bench, which adds new training and evaluation annotations to the COCO dataset for interleaved segmentation and retrieval. Ours is the first work aligning foundation models' embeddings for interleaved understanding. Meanwhile, our approach achieves state-of-the-art performance on FIND-Bench and competitive performance on standard retrieval and segmentation settings.

Submitted 15 July, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

Comments: CODE: https://github.com/UX-Decoder/FIND

arXiv:2312.07406 [pdf, other]

Rotation and Confined Eruption of a Double Flux-Rope System

Authors: Xiaomeng Zhang, Jinhan Guo, Yang Guo, Mingde Ding, Rony Keppens

Abstract: We perform a data-constrained simulation with the zero-$β$ assumption to study the mechanisms of strong rotation and failed eruption of a filament in active region 11474 on 2012 May 5 observed by Solar Dynamics Observatory and Solar Terrestrial Relations Observatory. The initial magnetic field is provided by nonlinear force-free field extrapolation, which is reconstructed by the regularized Biot-Savart laws and magnetofrictional method. Our simulation reproduces most observational features very well, e.g., the filament large-angle rotation of about $130 ^{\circ}$, the confined eruption and the flare ribbons, allowing us to analyze the underlying physical processes behind observations. We discover two flux ropes in the sigmoid system, an upper flux rope (MFR1) and a lower flux rope (MFR2), which correspond to the filament and hot channel in observations, respectively. Both flux ropes undergo confined eruptions. MFR2 grows by tether-cutting reconnection during the eruption. The rotation of MFR1 is related to the shear-field component along the axis. The toroidal field tension force and the non-axisymmetry forces confine the eruption of MFR1. We also suggest that the mutual interaction between MFR1 and MFR2 contributes to the large-angle rotation and the eruption failure. In addition, we calculate the temporal evolution of the twist and writhe of MFR1, which is a hint of probable reversal rotation.

Submitted 12 December, 2023; originally announced December 2023.

Comments: 18 pages, 7 figures, Accepted for publication in ApJ

arXiv:2312.06722 [pdf, other]

EgoPlan-Bench: Benchmarking Multimodal Large Language Models for Human-Level Planning

Authors: Yi Chen, Yuying Ge, Yixiao Ge, Mingyu Ding, Bohao Li, Rui Wang, Ruifeng Xu, Ying Shan, Xihui Liu

Abstract: The pursuit of artificial general intelligence (AGI) has been accelerated by Multimodal Large Language Models (MLLMs), which exhibit superior reasoning, generalization capabilities, and proficiency in processing multimodal inputs. A crucial milestone in the evolution of AGI is the attainment of human-level planning, a fundamental ability for making informed decisions in complex environments, and solving a wide range of real-world problems. Despite the impressive advancements in MLLMs, a question remains: How far are current MLLMs from achieving human-level planning? To shed light on this question, we introduce EgoPlan-Bench, a comprehensive benchmark to evaluate the planning abilities of MLLMs in real-world scenarios from an egocentric perspective, mirroring human perception. EgoPlan-Bench emphasizes the evaluation of planning capabilities of MLLMs, featuring realistic tasks, diverse action plans, and intricate visual observations. Our rigorous evaluation of a wide range of MLLMs reveals that EgoPlan-Bench poses significant challenges, highlighting a substantial scope for improvement in MLLMs to achieve human-level task planning. To facilitate this advancement, we further present EgoPlan-IT, a specialized instruction-tuning dataset that effectively enhances model performance on EgoPlan-Bench. We have made all codes, data, and a maintained benchmark leaderboard available to advance future research.

Submitted 11 June, 2024; v1 submitted 10 December, 2023; originally announced December 2023.

Comments: Project released at: https://github.com/ChenYi99/EgoPlan

arXiv:2312.04010 [pdf, ps, other]

On a conjecture of transposed Poisson $n$-Lie algebras

Authors: Junyuan Huang, Xueqing Chen, Zhiqi Chen, Ming Ding

Abstract: In this paper, we obtain a rich family of identities for transposed Poisson $n$-Lie algebras, and then prove the conjecture of Bai, Bai, Guo and Wu in \cite{BBGW} under a certain strong condition.

Submitted 6 December, 2023; originally announced December 2023.

Comments: 25 pages

arXiv:2312.03440 [pdf, ps, other]

doi 10.1021/acs.jpcc.3c06132

Reduction of Interlayer Interaction in Multilayer Stacking Graphene with Carbon Nanotube Insertion: Insights from Experiment and Simulation

Authors: Mingda Ding, Taiki Inoue, John Isaac Enriquez, Harry Handoko Halim, Yui Ogawa, Yoshitaka Taniyasu, Yuji Hamamoto, Yoshitada Morikawa, Yoshihiro Kobayashi

Abstract: The creation of multilayer graphene (Gr), while preserving the brilliant properties of monolayer Gr derived from its unique band structure, can expand the application field of Gr to the macroscale. However, the energy-favorable AB stacking structure in the multilayer Gr induces a strong interlayer interaction and alters the band structure. Consequently, the intrinsic properties of each monolayer are degraded. In this work, we insert carbon nanotubes (CNTs) as nanospacers to modulate the microstructure of multilayer stacking Gr. Nanospacers can increase the interlayer distance and reduce the interlayer interaction. The Gr/CNT stacking structure is experimentally fabricated using a dry transfer method in a layer-by-layer manner. Raman spectroscopy verifies the reduction in the interlayer interaction within the stacking structure. Atomic force microscopy shows an increase in the interlayer distance, which can explain the weakening of the interlayer interactions. The microstructure of the stacked Gr and CNTs is studied by molecular dynamics simulation to systematically investigate the effect of CNT insertion. We found that the distribution distance, size, and arrangement of the CNT can modulate the interlayer distance. These results will help us to understand and improve the properties of the composite systems consisting of Gr and CNTs.

Submitted 6 December, 2023; originally announced December 2023.

Comments: This is the submitted version before review and revision

Journal ref: J. Phys. Chem. C 2023

arXiv:2312.00855 [pdf, other]

Refine, Discriminate and Align: Stealing Encoders via Sample-Wise Prototypes and Multi-Relational Extraction

Authors: Shuchi Wu, Chuan Ma, Kang Wei, Xiaogang Xu, Ming Ding, Yuwen Qian, Tao Xiang

Abstract: This paper introduces RDA, a pioneering approach designed to address two primary deficiencies prevalent in previous endeavors aiming at stealing pre-trained encoders: (1) suboptimal performances attributed to biased optimization objectives, and (2) elevated query costs stemming from the end-to-end paradigm that necessitates querying the target encoder every epoch. Specifically, we initially Refine the representations of the target encoder for each training sample, thereby establishing a less biased optimization objective before the steal-training phase. This is accomplished via a sample-wise prototype, which consolidates the target encoder's representations for a given sample's various perspectives. Demanding exponentially fewer queries compared to the end-to-end approach, prototypes can be instantiated to guide subsequent query-free training. For more potent efficacy, we develop a multi-relational extraction loss that trains the surrogate encoder to Discriminate mismatched embedding-prototype pairs while Aligning those matched ones in terms of both amplitude and angle. In this way, the trained surrogate encoder achieves state-of-the-art results across the board in various downstream datasets with limited queries. Moreover, RDA is shown to be robust to multiple widely-used defenses.

Submitted 10 July, 2024; v1 submitted 1 December, 2023; originally announced December 2023.

Comments: 25 pages, 12 figures, 15 tables

arXiv:2311.17637 [pdf, other]

Observations of a Failed Solar Filament Eruption Involving External Reconnection

Authors: Yuehong Chen, Xin Cheng, Jun Chen, Yu Dai, Mingde Ding

Abstract: We report a failed solar filament eruption that involves external magnetic reconnection in a quadrupolar magnetic configuration. The evolution exhibits three kinematic phases: a slow-rise phase, an acceleration phase, and a deceleration phase. In the early slow rise, extreme-ultraviolet (EUV) brightenings appear at the expected null point above the filament and are connected to the outer polarities by the hot loops, indicating the occurrence of a breakout reconnection. Subsequently, the filament is accelerated outward, accompanied by the formation of low-lying high-temperature post-flare loops ($>$ 15 MK), complying with the standard flare model. However, after 2--3 minutes, the erupting filament starts to decelerate and is finally confined in the corona. The important finding is that the confinement is closely related to an external reconnection as evidenced by the formation of high-lying large-scale hot loops ($>$ 10 MK) with their brightened footpoints at the outer polarities, the filament fragmentation and subsequent falling along the newly formed large-scale loops, as well as a hard X-ray source close to one of the outer footpoint brightenings. We propose that, even though the initial breakout reconnection and subsequent flare reconnection commence and accelerate the filament eruption, the following external reconnection between the erupting flux rope and overlying field, as driven by the upward filament eruption, ultimately causes the eruption to fail, as validated by the numerical simulation of a failed flux rope eruption.

Submitted 29 November, 2023; originally announced November 2023.

Comments: Accepted by ApJ

arXiv:2311.14832 [pdf, other]

Pion and kaon electromagnetic and gravitational form factors

Authors: Yin-Zhen Xu, Minghui Ding, Khépani Raya, Craig D. Roberts, José Rodríguez-Quintero, Sebastian M. Schmidt

Abstract: A unified set of predictions for pion and kaon elastic electromagnetic and gravitational form factors is obtained using a symmetry-preserving truncation of each relevant quantum field equation. A key part of the study is a description of salient aspects of the dressed graviton + quark vertices. The calculations reveal that each meson's mass radius is smaller than its charge radius, matching available empirical inferences; and meson core pressures are commensurate with those in neutron stars. The analysis described herein paves the way for a direct calculation of nucleon gravitational form factors.

Submitted 24 November, 2023; originally announced November 2023.

Comments: 12 pages, 4 figures, 3 tables

Report number: NJU-INP 081/23

arXiv:2311.14531 [pdf, other]

Formation of a long filament through the connection of two filament segments observed by CHASE

Authors: H. T. Li, X. Cheng, Y. W. Ni, C. Li, S. H. Rao, J. H. Guo, M. D. Ding, P. F. Chen

Abstract: We present imaging and spectroscopic diagnostics of a long filament during its formation with the observations from the Chinese H$α$ Solar Explorer and Solar Dynamics Observatory. The seed filament first appeared at about 05:00 UT on 2022 September 13. Afterwards, it grew gradually and connected to another filament segment nearby, building up a long filament at about 20:00 UT on the same day. The CHASE H$α$ spectra show an obvious centroid absorption with mild broadening at the main spine of the long filament, which is interpreted as evidence of filament material accumulation. More interestingly, near the footpoints of the filament, persistent redshifts have been detected in the H$α$ spectra during the filament formation, indicating continuous drainage of filament materials. Furthermore, through inspecting the extreme ultraviolet images and magnetograms, it is found that EUV jets and brightenings appeared repeatedly at the junction of the two filament segments, where opposite magnetic polarities converged and canceled each other continuously. These results suggest the occurrence of intermittent magnetic reconnection that not only connects magnetic structures of the two filament segments but also supplies cold materials for the filament channel, likely by the condensation of injected hot plasma, even though part of the cold material falls down to the filament footpoints at the same time.

Submitted 24 November, 2023; originally announced November 2023.

Comments: 11 pages, 6 figures, Accepted for publication in ApJL

arXiv:2311.06497 [pdf, other]

DRUformer: Enhancing the driving scene Important object detection with driving relationship self-understanding

Authors: Yingjie Niu, Ming Ding, Keisuke Fujii, Kento Ohtani, Alexander Carballo, Kazuya Takeda

Abstract: Traffic accidents frequently lead to fatal injuries, contributing to over 50 million deaths until 2023. To mitigate driving hazards and ensure personal safety, it is crucial to assist vehicles in anticipating important objects during travel. Previous research on important object detection primarily assessed the importance of individual participants, treating them as independent entities and frequently overlooking the connections between these participants. Unfortunately, this approach has proven less effective in detecting important objects in complex scenarios. In response, we introduce Driving scene Relationship self-Understanding transformer (DRUformer), designed to enhance the important object detection task. The DRUformer is a transformer-based multi-modal important object detection model that takes into account the relationships between all the participants in the driving scenario. Recognizing that driving intention also significantly affects the detection of important objects during driving, we have incorporated a module for embedding driving intention. To assess the performance of our approach, we conducted a comparative experiment on the DRAMA dataset, pitting our model against other state-of-the-art (SOTA) models. The results demonstrated a noteworthy 16.2\% improvement in mIoU and a substantial 12.3\% boost in ACC compared to SOTA methods. Furthermore, we conducted a qualitative analysis of our model's ability to detect important objects across different road scenarios and classes, highlighting its effectiveness in diverse contexts. Finally, we conducted various ablation studies to assess the efficiency of the proposed modules in our DRUformer model.

Submitted 13 December, 2023; v1 submitted 11 November, 2023; originally announced November 2023.

arXiv:2311.05191 [pdf, other]

Determining Sources in the Bioluminescence Tomography Problem

Authors: Ming-Hui Ding, Rongfang Gong, Hongyu Liu, Catharine W. K. Lo

Abstract: In this paper, we revisit the bioluminescence tomography (BLT) problem, where one seeks to reconstruct bioluminescence signals (an internal light source) from external measurements of the Cauchy data. As one kind of optical imaging, the BLT has many merits such as high signal-to-noise ratio, non-destructivity and cost-effectiveness etc., and has potential applications such as cancer diagnosis, drug discovery and development as well as gene therapies and so on. In the literature, BLT is extensively studied based on diffusion approximation (DA) equation, where the distribution of peak sources is to be reconstructed and no solution uniqueness is guaranteed without adequate a priori information. Motivated by the solution uniqueness issue, several theoretical results are explored. The major contributions in this work that are new to the literature are two-fold: first, we show the theoretical uniqueness of the BLT problem where the light sources are in the shape of $C^2$ domains or polyhedral- or corona-shaped; second, we support our results with plenty of problem-orientated numerical experiments.

Submitted 9 November, 2023; originally announced November 2023.

MSC Class: Primary 35R30; secondary 78A46; 92C55; 35Q60; 78A70

arXiv:2311.03865 [pdf, other]

doi 10.24963/ijcai.2024/57

When Fairness Meets Privacy: Exploring Privacy Threats in Fair Binary Classifiers via Membership Inference Attacks

Authors: Huan Tian, Guangsheng Zhang, Bo Liu, Tianqing Zhu, Ming Ding, Wanlei Zhou

Abstract: Previous studies have developed fairness methods for biased models that exhibit discriminatory behaviors towards specific subgroups. While these models have shown promise in achieving fair predictions, recent research has identified their potential vulnerability to score-based membership inference attacks (MIAs). In these attacks, adversaries can infer whether a particular data sample was used during training by analyzing the model's prediction scores. However, our investigations reveal that these score-based MIAs are ineffective when targeting fairness-enhanced models in binary classifications. The attack models trained to launch the MIAs degrade into simplistic threshold models, resulting in lower attack performance. Meanwhile, we observe that fairness methods often lead to prediction performance degradation for the majority subgroups of the training data. This raises the barrier to successful attacks and widens the prediction gaps between member and non-member data. Building upon these insights, we propose an efficient MIA method against fairness-enhanced models based on fairness discrepancy results (FD-MIA). It leverages the difference in the predictions from both the original and fairness-enhanced models and exploits the observed prediction gaps as attack clues. We also explore potential strategies for mitigating privacy leakages. Extensive experiments validate our findings and demonstrate the efficacy of the proposed method.

Submitted 26 August, 2024; v1 submitted 7 November, 2023; originally announced November 2023.

Comments: Accepted by IJCAI 2024

arXiv:2311.03079 [pdf, other]

CogVLM: Visual Expert for Pretrained Language Models

Authors: Weihan Wang, Qingsong Lv, Wenmeng Yu, Wenyi Hong, Ji Qi, Yan Wang, Junhui Ji, Zhuoyi Yang, Lei Zhao, Xixuan Song, Jiazheng Xu, Bin Xu, Juanzi Li, Yuxiao Dong, Ming Ding, Jie Tang

Abstract: We introduce CogVLM, a powerful open-source visual language foundation model. Different from the popular shallow alignment method which maps image features into the input space of language model, CogVLM bridges the gap between the frozen pretrained language model and image encoder by a trainable visual expert module in the attention and FFN layers. As a result, CogVLM enables deep fusion of vision language features without sacrificing any performance on NLP tasks. CogVLM-17B achieves state-of-the-art performance on 10 classic cross-modal benchmarks, including NoCaps, Flicker30k captioning, RefCOCO, RefCOCO+, RefCOCOg, Visual7W, GQA, ScienceQA, VizWiz VQA and TDIUC, and ranks the 2nd on VQAv2, OKVQA, TextVQA, COCO captioning, etc., surpassing or matching PaLI-X 55B. Codes and checkpoints are available at https://github.com/THUDM/CogVLM.

Submitted 4 February, 2024; v1 submitted 6 November, 2023; originally announced November 2023.
