Search | arXiv e-print repository

Hill Function-based Model of Transcriptional Response: Impact of Nonspecific Binding and RNAP Interactions

Authors: Wenjia Shi, Yao Ma, Peilin Hu, Mi Pang, Xiaona Huang, Yiting Dang, Yuxin Xie, Danni Wu

Abstract: The Hill function is one of the most widely used models of gene transcription regulation. Because it is a fitting function, it may lack an underlying physical picture, yet its fitting parameters can provide information about biochemical reactions, such as the number of transcription factors (TFs) and the binding energy between regulatory elements. However, it remains unclear when, and how much, biochemical information the Hill function can provide in addition to fitting. Here, starting from the interactions between TFs and RNA polymerase during transcription regulation and the association-dissociation reactions of both at specific/nonspecific sites on DNA, the regulatory effect of TFs was deduced as a fold change. We found that, for a weak promoter, the fold change reduces to the regulatory factor (Freg), which is closely correlated with the Hill function. By directly comparing and fitting with the Hill function, the fitting parameters and the corresponding biochemical reaction parameters in Freg were analyzed and discussed, considering both a single TF and multiple TFs with cooperativity and basic logic effects. We conclude that the strength of the promoter and the interactions between TFs determine whether the Hill function can reflect the corresponding biochemical information. Our findings highlight the role of the Hill function in modeling/fitting for transcriptional regulation, which also benefits the preparation of synthetic regulatory elements.

Submitted 3 March, 2024; originally announced March 2024.

arXiv:2403.01047 [pdf]

doi 10.1088/1361-6404/ad2393

Student Understanding of the Bloch Sphere

Authors: Peter Hu, Yangqiuting Li, Roger S. K. Mong, Chandralekha Singh

Abstract: Quantum information science is a rapidly growing interdisciplinary field that is attracting the attention of academics and industry experts alike. It requires talent from a wide variety of traditional fields, including physics, engineering, chemistry, and computer science, to name a few. To prepare students for such opportunities, it is important to give them a strong foundation in the basics of quantum information science, in which quantum computing plays a central role. In this study, we discuss the development, validation, and evaluation of a tutorial on the Bloch sphere, a useful visual tool for developing intuition about single quantum bits (qubits), which are the basic building block of any quantum computer. Students' understanding was evaluated after they received traditional lecture-based instruction on the requisite topics, and again after engaging with the tutorial. We observe, analyze, and discuss their improvement in performance on concepts covered in the tutorial.

Submitted 1 March, 2024; originally announced March 2024.

Comments: 19 pages, 1 figure

Journal ref: European Journal of Physics 45 025705 (2024)

arXiv:2402.18080 [pdf]

doi 10.1088/1361-6404/ac9ba3

Challenges in addressing student difficulties with measurement uncertainty of two-state quantum systems using a multiple-choice question sequence in online and in-person classes

Authors: Peter Hu, Yangqiuting Li, Chandralekha Singh

Abstract: Research-validated multiple-choice questions comprise an easy-to-implement instructional tool that serves to scaffold student learning and formatively assess students' knowledge. We present findings from the implementation, in consecutive years, of a research-validated multiple-choice question sequence on measurement uncertainty as it applies to two-state quantum systems. This study was conducted in an advanced undergraduate quantum mechanics course, in online and in-person learning environments in consecutive years. Student learning was assessed after receiving traditional lecture-based instruction in relevant concepts, and their performance was compared with that on a similar assessment given after engaging with the multiple-choice question sequence. We analyze and discuss the similar and differing trends observed in the two modes of instruction.

Submitted 28 February, 2024; originally announced February 2024.

Comments: 23 pages, 0 figures

Journal ref: European Journal of Physics 44 015702 (2022)

arXiv:2402.18075 [pdf]

doi 10.1088/1361-6404/ac49f4

Challenges in addressing student difficulties with time-development of two-state quantum systems using a multiple-choice question sequence in virtual and in-person classes

Authors: Peter Hu, Yangqiuting Li, Chandralekha Singh

Abstract: Research-validated clicker questions as instructional tools for formative assessment are relatively easy to implement and can provide effective scaffolding when developed and implemented in a sequence. We present findings from the implementation of a research-validated clicker question sequence (CQS) on student understanding of the time-development of two-state quantum systems. This study was conducted in an advanced undergraduate quantum mechanics course for two consecutive years in virtual and in-person classes. The effectiveness of the CQS discussed here in both modes of instruction was determined by evaluating students' performance after traditional lecture-based instruction and comparing it to their performance after engaging with the CQS.

Submitted 28 February, 2024; originally announced February 2024.

Comments: 18 pages, 1 figure

Journal ref: European Journal of Physics 43 025704 (2022)

arXiv:2402.18072 [pdf]

doi 10.1103/PhysRevPhysEducRes.19.020130

Challenges in addressing student difficulties with quantum measurement of two-state quantum systems using a multiple-choice question sequence in online and in-person classes

Authors: Peter Hu, Yangqiuting Li, Chandralekha Singh

Abstract: Research-validated multiple-choice questions comprise an easy-to-implement instructional tool that serves to scaffold student learning and formatively assess students' knowledge. We present findings from the implementation, in consecutive years, of a research-validated multiple-choice question sequence [referred to in this study as a Clicker Question Sequence (CQS)] on quantum measurement as it applies to two-state quantum systems. This study was conducted in an advanced undergraduate quantum mechanics course, in both online and in-person learning environments across three years. Student learning was assessed after traditional lecture-based instruction in relevant concepts, and their performance was compared with that on a similar assessment given after engaging with the CQS. We analyze, compare, and discuss the trends observed in the three implementations.

Submitted 29 May, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

Comments: 30 pages, 3 figures

Journal ref: Physical Review Physics Education Research 19, 020130 (2023)

arXiv:2402.18069 [pdf]

doi 10.1088/1361-6404/acf5b3

Challenges in addressing student difficulties with basics and change of basis for two-state quantum systems using a multiple-choice question sequence in online and in-person classes

Authors: Peter Hu, Yangqiuting Li, Chandralekha Singh

Abstract: Research-validated multiple-choice questions comprise an easy-to-implement instructional tool for scaffolding student learning and providing formative assessment of students' knowledge. We present findings from the implementation of a research-validated multiple-choice question sequence on the basics of two-state quantum systems, including inner products, outer products, translation between Dirac notation and matrix representation in a particular basis, and change of basis. This study was conducted in an advanced undergraduate quantum mechanics course, in both online and in-person learning environments, across three years. For each cohort, students had their learning assessed after traditional lecture-based instruction in relevant concepts before engaging with the multiple-choice question sequence. Their performance was evaluated again afterwards with a similar assessment and compared to their earlier performance. We analyze, compare, and discuss the trends observed in the three implementations.

Submitted 28 February, 2024; originally announced February 2024.

Comments: 25 pages, 0 figures

Journal ref: European Journal of Physics 44 065703 (2023)

arXiv:2402.17271 [pdf, other]

Capacitive coupling study of the HERD SCD prototype: preliminary results

Authors: Ruo-Si Lu, Rui Qiao, Ke Gong, Wen-Xi Peng, Wei-Shuai Zhang, Dong-Ya Guo, Jia-Ju Wei, Yi-Ming Hu, Jian-Hua Guo, Qi Wu, Peng Hu, Xuan Liu, Bing Lu, Yi-Rong Zhang

Abstract: The Silicon Charge Detector (SCD) is a subdetector of the High Energy Cosmic Radiation Detection payload. The dynamic range of the silicon microstrip detector can be extended by the capacitive coupling effect, which is related to the interstrip capacitance and the coupling capacitance. A detector prototype with several sets of parameters was designed and tested in the ion beams at the CERN Super Proton Synchrotron. The capacitive coupling fractions with readout strip and floating strip incidences were studied using the beam test data and SPICE simulation.

Submitted 27 February, 2024; originally announced February 2024.

arXiv:2402.16297 [pdf, other]

A Poisson-Gamma Dynamic Factor Model with Time-Varying Transition Dynamics

Authors: Jiahao Wang, Sikun Yang, Heinz Koeppl, Xiuzhen Cheng, Pengfei Hu, Guoming Zhang

Abstract: Probabilistic approaches for handling count-valued time sequences have attracted considerable research attention because of their ability to infer explainable latent structures and to estimate uncertainties, and thus are especially suitable for dealing with noisy and incomplete count data. Among these models, Poisson-Gamma Dynamical Systems (PGDSs) have proven effective in capturing the evolving dynamics underlying observed count sequences. However, the state-of-the-art PGDS still fails to capture the time-varying transition dynamics that are commonly observed in real-world count time sequences. To close this gap, a non-stationary PGDS is proposed that allows the underlying transition matrices to evolve over time, and the evolving transition matrices are modeled by specially designed Dirichlet Markov chains. Leveraging Dirichlet-Multinomial-Beta data augmentation techniques, a fully conjugate and efficient Gibbs sampler is developed to perform posterior simulation. Experiments show that, in comparison with related models, the proposed non-stationary PGDS achieves improved predictive performance due to its capacity to learn the non-stationary dependency structure captured by the time-evolving transition matrices.

Submitted 23 May, 2024; v1 submitted 25 February, 2024; originally announced February 2024.

arXiv:2402.15141 [pdf, ps, other]

A note on the adjoint method for neural ordinary differential equation network

Authors: Pipi Hu

Abstract: Perturbation and operator adjoint methods are used to derive the correct adjoint form rigorously. The derivation yields the following results: 1) the loss gradient is not an ODE but an integral, and we show the reason; 2) the traditional adjoint form is not equivalent to the backpropagation (BP) results; 3) the adjoint operator analysis shows that the adjoint form gives the same results as BP if and only if the discrete adjoint has the same scheme as the discrete neural ODE.

Submitted 23 February, 2024; originally announced February 2024.

arXiv:2401.11818 [pdf, other]

MInD: Improving Multimodal Sentiment Analysis via Multimodal Information Disentanglement

Authors: Weichen Dai, Xingyu Li, Zeyu Wang, Pengbo Hu, Ji Qi, Jianlin Peng, Yi Zhou

Abstract: Learning effective joint representations has been a central task in multi-modal sentiment analysis. Previous works addressing this task focus on exploring sophisticated fusion techniques to enhance performance. However, the inherent heterogeneity of distinct modalities remains a core problem that brings challenges in fusing and coordinating the multi-modal signals at both the representational level and the informational level, impeding the full exploitation of multi-modal information. To address this problem, we propose the Multi-modal Information Disentanglement (MInD) method, which decomposes the multi-modal inputs into modality-invariant and modality-specific components through a shared encoder and multiple private encoders. Furthermore, by explicitly training generated noise in an adversarial manner, MInD is able to isolate uninformativeness, thereby improving the learned representations. The proposed disentangled decomposition therefore allows for a fusion process that is simpler than alternative methods and results in improved performance. Experimental evaluations conducted on representative benchmark datasets demonstrate MInD's effectiveness in both multi-modal emotion recognition and multi-modal humor detection tasks. Code will be released upon acceptance of the paper.

Submitted 17 August, 2024; v1 submitted 22 January, 2024; originally announced January 2024.

arXiv:2401.11399 [pdf, other]

Prospects for Joint Detection of Gravitational Waves with Counterpart Gamma-Ray Bursts Detected by the HADAR Experiment

Authors: Pei-Jin Hu, Qi-Ling Chen, Tian-Lu Chen, Ming-Ming Kang, Yi-Qing Guo, Dan-Zeng Luo-Bu, You-Liang Feng, Qi Gao, Quan-Bu Gou, Hong-Bo Hu, Hai-Jin Li, Cheng Liu, Mao-Yuan Liu, Wei Liu, Xiang-Li Qian, Bing-Qiang Qiao, Jing-Jing Su, Hui-Ying Sun, Xu Wang, Zhen Wang, Guang-Guang Xin, Chao-Wen Yang, Yu-Hua Yao, Qiang Yuan, Yi Zhang

Abstract: The detection of GW170817/GRB170817A implied a strong association between short gamma-ray bursts (SGRBs) and binary neutron star (BNS) mergers, which produce gravitational waves (GWs). More evidence is needed to confirm the association and reveal the physical processes of BNS mergers. The upcoming High Altitude Detection of Astronomical Radiation (HADAR) experiment, which offers a wide field of view (FOV) and a large effective area above tens of GeV, is a promising instrument for the prompt detection of very-high-energy (VHE; > 10 GeV) SGRBs. The aim of this paper is to simulate and analyse GW/SGRB joint detections by future GW detector networks in synergy with HADAR, including the second-generation LIGO, Virgo and KAGRA and the third-generation ET and CE. We provide a brief introduction to the HADAR experiment for SGRB simulations and its expected SGRB detections. For GW simulations, we adopt a phenomenological model to describe GWs produced by BNS mergers and introduce the signal-to-noise ratios (SNRs) as detector responses. Following a theoretical analysis, we compute the redshift-dependent efficiency functions of GW detector networks. We then construct the simulation of GW detection by Monte Carlo sampling. We compare the simulated results of the LIGO-Virgo O2 and O3 runs with their actual detections as a check. The combination of GW and SGRB models is then discussed for joint detection, including parameter correlations, triggered SNRs and efficiency skymaps. The estimated joint detection rates are 0.09-2.52 per year for the LHVK network with HADAR under different possible configurations, and approximately 0.27-7.89 per year for the ET+CE network with HADAR.

Submitted 20 January, 2024; originally announced January 2024.

arXiv:2401.10370 [pdf, other]

Deep Generative Modeling for Financial Time Series with Application in VaR: A Comparative Review

Authors: Lars Ericson, Xuejun Zhu, Xusi Han, Rao Fu, Shuang Li, Steve Guo, Ping Hu

Abstract: In the financial services industry, forecasting the risk factor distribution conditional on the history and the current market environment is the key to market risk modeling in general and value at risk (VaR) model in particular. As one of the most widely adopted VaR models in commercial banks, Historical simulation (HS) uses the empirical distribution of daily returns in a historical window as the forecast distribution of risk factor returns in the next day. The objectives for financial time series generation are to generate synthetic data paths with good variety, and similar distribution and dynamics to the original historical data. In this paper, we apply multiple existing deep generative methods (e.g., CGAN, CWGAN, Diffusion, and Signature WGAN) for conditional time series generation, and propose and test two new methods for conditional multi-step time series generation, namely Encoder-Decoder CGAN and Conditional TimeVAE. Furthermore, we introduce a comprehensive framework with a set of KPIs to measure the quality of the generated time series for financial modeling. The KPIs cover distribution distance, autocorrelation and backtesting. All models (HS, parametric and neural networks) are tested on both historical USD yield curve data and additional data simulated from GARCH and CIR processes. The study shows that top performing models are HS, GARCH and CWGAN models. Future research directions in this area are also discussed.

Submitted 18 January, 2024; originally announced January 2024.

arXiv:2401.07842 [pdf, ps, other]

Closing the Performance and Management Gaps with Satellite Internet: Challenges, Approaches, and Future Directions

Authors: Peng Hu

Abstract: Recent advancements in low-Earth orbit (LEO) satellites, represented by large constellations and advanced payloads, hold great promise for enabling beyond-5G and 6G telecommunications and high-quality, ubiquitous Internet connectivity for everyone anywhere on Earth. LEO satellite networks are envisioned to bridge the urban-rural connectivity gap underlying the digital divide. However, the digital divide can hardly be closed by only providing connectivity to rural and remote areas. Various unprecedented challenges brought by the emerging satellite Internet still need to be resolved, such as inconsistent end-to-end performance guarantees and a lack of efficient management and operations in these areas, which are referred to as "performance gap" and "management gap", respectively. This position paper will briefly discuss these gaps, approaches to addressing the gaps, and some research directions based on our recent works.

Submitted 15 January, 2024; originally announced January 2024.

Comments: Published at the IAB Workshop on Barriers to Internet Access of Services (BIAS) 2024. Available at: https://www.ietf.org/slides/slides-biasws-closing-the-performance-and-management-gaps-with-satellite-internet-challenges-approaches-and-future-directions-01.pdf

arXiv:2401.06786 [pdf, other]

CloudEval-YAML: A Practical Benchmark for Cloud Configuration Generation

Authors: Yifei Xu, Yuning Chen, Xumiao Zhang, Xianshang Lin, Pan Hu, Yunfei Ma, Songwu Lu, Wan Du, Zhuoqing Mao, Ennan Zhai, Dennis Cai

Abstract: Among the thriving ecosystem of cloud computing and the proliferation of Large Language Model (LLM)-based code generation tools, there is a lack of benchmarking for code generation in cloud-native applications. In response to this need, we present CloudEval-YAML, a practical benchmark for cloud configuration generation. CloudEval-YAML tackles the diversity challenge by focusing on YAML, the de facto standard of numerous cloud-native tools. We develop the CloudEval-YAML benchmark with practicality in mind: the dataset consists of hand-written problems with unit tests targeting practical scenarios. We further enhanced the dataset to meet practical needs by rephrasing questions in a concise, abbreviated, and bilingual manner. The dataset consists of 1011 problems that take more than 1200 human hours to complete. To improve practicality during evaluation, we build a scalable evaluation platform for CloudEval-YAML that achieves a 20 times speedup over a single machine. To the best of our knowledge, the CloudEval-YAML dataset is the first hand-written dataset targeting cloud-native applications. We present an in-depth evaluation of 12 LLMs, leading to a deeper understanding of the problems and LLMs, as well as effective methods to improve task performance and reduce cost.

Submitted 9 November, 2023; originally announced January 2024.

arXiv:2401.02869 [pdf, ps, other]

Practical Reasoning in DatalogMTL

Authors: Dingmin Wang, Przemysław A. Wałęga, Pan Hu, Bernardo Cuenca Grau

Abstract: DatalogMTL is an extension of Datalog with metric temporal operators that has found an increasing number of applications in recent years. Reasoning in DatalogMTL is, however, of high computational complexity, which makes reasoning in modern data-intensive applications challenging. In this paper we present a practical reasoning algorithm for the full DatalogMTL language, which we have implemented in a system called MeTeoR. Our approach effectively combines an optimised (but generally non-terminating) materialisation (a.k.a. forward chaining) procedure, which provides scalable behaviour, with an automata-based component that guarantees termination and completeness. To ensure favourable scalability of the materialisation component, we propose a novel seminaïve materialisation procedure for DatalogMTL enjoying the non-repetition property, which ensures that each specific rule application will be considered at most once throughout the entire execution of the algorithm. Moreover, our materialisation procedure is enhanced with additional optimisations which further reduce the number of redundant computations performed during materialisation by disregarding rules as soon as it is certain that they cannot derive new facts in subsequent materialisation steps. Our extensive evaluation supports the practicality of our approach.

Submitted 5 January, 2024; originally announced January 2024.

Comments: Under consideration in Theory and Practice of Logic Programming (TPLP). arXiv admin note: text overlap with arXiv:2208.07100

arXiv:2401.01077 [pdf, other]

Constrained Online Two-stage Stochastic Optimization: Algorithm with (and without) Predictions

Authors: Piao Hu, Jiashuo Jiang, Guodong Lyu, Hao Su

Abstract: We consider an online two-stage stochastic optimization with long-term constraints over a finite horizon of $T$ periods. At each period, we take the first-stage action, observe a model parameter realization and then take the second-stage action from a feasible set that depends both on the first-stage decision and the model parameter. We aim to minimize the cumulative objective value while guaranteeing that the long-term average second-stage decision belongs to a set. We develop online algorithms for the online two-stage problem from adversarial learning algorithms. Also, the regret bound of our algorithm can be reduced to the regret bound of embedded adversarial learning algorithms. Based on this framework, we obtain new results under various settings. When the model parameters are drawn from unknown non-stationary distributions and we are given machine-learned predictions of the distributions, we develop a new algorithm from our framework with a regret $O(W_T+\sqrt{T})$, where $W_T$ measures the total inaccuracy of the machine-learned predictions. We then develop another algorithm that works when no machine-learned predictions are given and show the performances.

Submitted 2 January, 2024; originally announced January 2024.

Comments: arXiv admin note: substantial text overlap with arXiv:2302.00997

arXiv:2401.00435 [pdf, other]

Bidirectional Trained Tree-Structured Decoder for Handwritten Mathematical Expression Recognition

Authors: Hanbo Cheng, Chenyu Liu, Pengfei Hu, Zhenrong Zhang, Jiefeng Ma, Jun Du

Abstract: The Handwritten Mathematical Expression Recognition (HMER) task is a critical branch in the field of OCR. Recent studies have demonstrated that incorporating bidirectional context information significantly improves the performance of HMER models. However, existing methods fail to effectively utilize bidirectional context information during the inference stage. Furthermore, current bidirectional training methods are primarily designed for string decoders and cannot adequately generalize to tree decoders, which offer superior generalization capabilities and structural analysis capacity. In order to overcome these limitations, we propose the Mirror-Flipped Symbol Layout Tree (MF-SLT) and Bidirectional Asynchronous Training (BAT) structure. Our method extends the bidirectional training strategy to the tree decoder, allowing for more effective training by leveraging bidirectional information. Additionally, we analyze the impact of the visual and linguistic perception of the HMER model separately and introduce the Shared Language Modeling (SLM) mechanism. Through the SLM, we enhance the model's robustness and generalization when dealing with visual ambiguity, particularly in scenarios with abundant training data. Our approach has been validated through extensive experiments, demonstrating its ability to achieve new state-of-the-art results on the CROHME 2014, 2016, and 2019 datasets, as well as the HME100K dataset. The code used in our experiments will be publicly available.

Submitted 31 December, 2023; originally announced January 2024.

arXiv:2312.17460 [pdf, other]

Momentum and angular correlations in $Z/γ$-hadron production in relativistic heavy-ion collisions

Authors: Zhan Gao, Lin Chen, Peng-Hui Hu, Man Xie, Han-Zhong Zhang

Abstract: We carry out a detailed study of medium modifications on momentum and angular correlations between a large transverse momentum hadron and a $Z/γ$ trigger in relativistic heavy-ion collisions within a perturbative QCD parton model improved by the Sudakov resummation technique. The total energy loss of a hard parton propagating inside the medium is employed to modify the fragmentation function, while the medium-induced transverse momentum broadening is included in the resummation approach, and both of them are related to the jet transport parameter and obtained by the high-twist formalism. We obtain good agreements with the existing data on transverse momentum and azimuthal angular correlations for the $Z/γ$-hadron pairs in $pp$ and $AA$ collisions, and predict the correlations for the $γ$-hadron in central $PbPb$ collisions at 5.02 TeV. The numerical analyses for the $Z/γ$-hadron in central $PbPb$ collisions show that the normalized angular distribution is decorrelated due to the medium-induced transverse momentum broadening, however, the angular correlation is enhanced due to the parton energy loss, namely anti-broadening. The observed modification of the angular correlation is a result of the competition between the broadening and the anti-broadening. This work provides a reliable theoretical tool for a comprehensive and precise study of jet quenching in relativistic heavy-ion collisions.

Submitted 28 December, 2023; originally announced December 2023.

Comments: 19 pages, 20 figures

arXiv:2312.11610 [pdf, ps, other]

doi 10.1007/s11433-024-2398-1

Improved Reall-Santos method for AdS black holes in general 4-derivative gravities

Authors: Peng-Ju Hu, Liang Ma, H. Lu, Yi Pang

Abstract: For asymptotically flat black holes, Reall-Santos method is a convenient tool to compute leading higher derivative corrections to the thermodynamic quantities without actually solving the modified field equations. However, there are subtleties in its generalization to asymptotically AdS black holes with general higher derivative corrections. First of all, it is necessary to know all the higher derivative holographic counterterms and the surface terms implementing the variational principle and subtracting the divergence. One then needs to solve for the modified AdS radius and rescale the time coordinate in an appropriate way such that the induced metric on the conformal boundary of AdS black hole is not modified. We observe that Reall-Santos method can be directly applied to a particular 4-derivative gravity model, known as the Einstein-Weyl gravity, which does not modify the AdS radius and requires only the Gibbons-Hawking-York term and holographic counterterms for the 2-derivative theory. We thus suggest that to compute the thermodynamic quantities of AdS black holes in general 4-derivative theories of gravity, one simply needs to transform it to a Einstein-Weyl gravity with identical thermodynamic variables by appropriate field redefinitions. We explicitly verify this proposal with spherically-symmetric and static charged black holes in Einstein-Maxwell theory extended with generic 4-derivative interactions.

Submitted 17 April, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

Comments: LaTeX; 33 pages; accepted by SCIENCE CHINA Physics, Mechanics & Astronomy

Journal ref: Sci. China Phys. Mech. Astron. 67 (2024) 8, 280412

arXiv:2312.11297 [pdf, other]

Optimised Storage for Datalog Reasoning

Authors: Xinyue Zhang, Pan Hu, Yavor Nenov, Ian Horrocks

Abstract: Materialisation facilitates Datalog reasoning by precomputing all consequences of the facts and the rules so that queries can be directly answered over the materialised facts. However, storing all materialised facts may be infeasible in practice, especially when the rules are complex and the given set of facts is large. We observe that for certain combinations of rules, there exist data structures that compactly represent the reasoning result and can be efficiently queried when necessary. In this paper, we present a general framework that allows for the integration of such optimised storage schemes with standard materialisation algorithms. Moreover, we devise optimised storage schemes targeting at transitive rules and union rules, two types of (combination of) rules that commonly occur in practice. Our experimental evaluation shows that our approach significantly improves memory consumption, sometimes by orders of magnitude, while remaining competitive in terms of query answering time.

Submitted 19 December, 2023; v1 submitted 18 December, 2023; originally announced December 2023.

Comments: 19 pages

arXiv:2312.05583 [pdf, other]

Better Neural PDE Solvers Through Data-Free Mesh Movers

Authors: Peiyan Hu, Yue Wang, Zhi-Ming Ma

Abstract: Recently, neural networks have been extensively employed to solve partial differential equations (PDEs) in physical system modeling. While major studies focus on learning system evolution on predefined static mesh discretizations, some methods utilize reinforcement learning or supervised learning techniques to create adaptive and dynamic meshes, due to the dynamic nature of these systems. However, these approaches face two primary challenges: (1) the need for expensive optimal mesh data, and (2) the change of the solution space's degree of freedom and topology during mesh refinement. To address these challenges, this paper proposes a neural PDE solver with a neural mesh adapter. To begin with, we introduce a novel data-free neural mesh adaptor, called Data-free Mesh Mover (DMM), with two main innovations. Firstly, it is an operator that maps the solution to adaptive meshes and is trained using the Monge-Ampère equation without optimal mesh data. Secondly, it dynamically changes the mesh by moving existing nodes rather than adding or deleting nodes and edges. Theoretical analysis shows that meshes generated by DMM have the lowest interpolation error bound. Based on DMM, to efficiently and accurately model dynamic systems, we develop a moving mesh based neural PDE solver (MM-PDE) that embeds the moving mesh with a two-branch architecture and a learnable interpolation framework to preserve information within the data. Empirical experiments demonstrate that our method generates suitable meshes and considerably enhances accuracy when modeling widely considered PDE systems. The code can be found at: https://github.com/Peiyannn/MM-PDE.git.

Submitted 19 February, 2024; v1 submitted 9 December, 2023; originally announced December 2023.

arXiv:2312.04795 [pdf, other]

Latency versus Transmission Power Trade-off in Free-Space Optical (FSO) Satellite Networks with Multiple Inter-Continental Connections

Authors: Jintao Liang, Aizaz Chaudhry, John Chinneck, Halim Yanikomeroglu, Gunes Kurt, Peng Hu, Khaled Ahmed, Stephane Martel

Abstract: In free-space optical satellite networks (FSOSNs), satellites connected via laser inter-satellite links (LISLs), latency is a critical factor, especially for long-distance inter-continental connections. Since satellites depend on solar panels for power supply, power consumption is also a vital factor. We investigate the minimization of total network latency (i.e., the sum of the network latencies of all inter-continental connections in a time slot) in a realistic model of a FSOSN, the latest version of the Starlink Phase 1 Version 3 constellation. We develop mathematical formulations of the total network latency over different LISL ranges and different satellite transmission power constraints for multiple simultaneous inter-continental connections. We use practical system models for calculating network latency and satellite optical link transmission power, and we formulate the problem as a binary integer linear program. The results reveal that, for satellite transmission power limits set at 0.5 W, 0.3 W, and 0.1 W, the average total network latency for all five inter-continental connections studied in this work levels off at 339 ms, 361 ms, and 542 ms, respectively. Furthermore, the corresponding LISL ranges required to achieve these average total network latency values are 4500 km, 3000 km, and 1731 km, respectively. Different limitations on satellite transmission power exhibit varying effects on average total network latency (over 100 time slots), and they also induce differing changes in the corresponding LISL ranges. In the absence of satellite transmission power constraints, as the LISL range extends from the minimum feasible range of 1575 km to the maximum feasible range of 5016 km, the average total network latency decreases from 589 ms to 311 ms.

Submitted 7 December, 2023; originally announced December 2023.

Comments: Accepted for publication in IEEE Open Journal of the Communications Society

arXiv:2312.04788 [pdf, other]

Free-Space Optical (FSO) Satellite Networks Performance Analysis: Transmission Power, Latency, and Outage Probability

Authors: Jintao Liang, Aizaz U. Chaudhry, Eylem Erdogan, Halim Yanikomeroglu, Gunes Karabulut Kurt, Peng Hu, Khaled Ahmed, Stephane Martel

Abstract: In free-space optical satellite networks (FSOSNs), satellites can have different laser inter-satellite link (LISL) ranges for connectivity. Greater LISL ranges can reduce network latency of the path but can also result in an increase in transmission power for satellites on the path. Consequently, this tradeoff between satellite transmission power and network latency should be investigated, and in this work we examine it in FSOSNs drawing on the Starlink Phase 1 Version 3 and Kuiper Shell 2 constellations for different LISL ranges and different inter-continental connections. We use appropriate system models for calculating the average satellite transmission power and network latency. The results show that the mean network latency decreases and mean average satellite transmission power increases with an increase in LISL range. For the Toronto--Sydney inter-continental connection in an FSOSN with Starlink's Phase 1 Version 3 constellation, when the LISL range is approximately 2,900 km, the mean network latency and mean average satellite transmission power intersect are approximately 135 ms and 380 mW, respectively. For an FSOSN with the Kuiper Shell 2 constellation in this inter-continental connection, this LISL range is around 3,800 km, and the two parameters are approximately 120 ms and 700 mW, respectively. For the Toronto--Istanbul and Toronto--London inter-continental connections, the LISL ranges at the intersection are different and vary from 2,600 km to 3,400 km. Furthermore, we analyze outage probability performance of optical uplink/downlink due to atmosphere attenuation and turbulence.

Submitted 7 December, 2023; originally announced December 2023.

Comments: Accepted for publication in IEEE Open Journal of Vehicular Technology

arXiv:2312.04038 [pdf, other]

Reconstruction of dynamical systems from data without time labels

Authors: Zhijun Zeng, Pipi Hu, Chenglong Bao, Yi Zhu, Zuoqiang Shi

Abstract: In this paper, we study the method to reconstruct dynamical systems from data without time labels. Data without time labels appear in many applications, such as molecular dynamics, single-cell RNA sequencing etc. Reconstruction of dynamical system from time sequence data has been studied extensively. However, these methods do not apply if time labels are unknown. Without time labels, sequence data becomes distribution data. Based on this observation, we propose to treat the data as samples from a probability distribution and try to reconstruct the underlying dynamical system by minimizing the distribution loss, sliced Wasserstein distance more specifically. Extensive experiment results demonstrate the effectiveness of the proposed method.

Submitted 8 April, 2024; v1 submitted 6 December, 2023; originally announced December 2023.

arXiv:2312.03018 [pdf, other]

DreamVideo: High-Fidelity Image-to-Video Generation with Image Retention and Text Guidance

Authors: Cong Wang, Jiaxi Gu, Panwen Hu, Songcen Xu, Hang Xu, Xiaodan Liang

Abstract: Image-to-video generation, which aims to generate a video starting from a given reference image, has drawn great attention. Existing methods try to extend pre-trained text-guided image diffusion models to image-guided video generation models. Nevertheless, these methods often result in either low fidelity or flickering over time due to their limitation to shallow image guidance and poor temporal consistency. To tackle these problems, we propose a high-fidelity image-to-video generation method by devising a frame retention branch based on a pre-trained video diffusion model, named DreamVideo. Instead of integrating the reference image into the diffusion process at a semantic level, our DreamVideo perceives the reference image via convolution layers and concatenates the features with the noisy latents as model input. By this means, the details of the reference image can be preserved to the greatest extent. In addition, by incorporating double-condition classifier-free guidance, a single image can be directed to videos of different actions by providing varying prompt texts. This has significant implications for controllable video generation and holds broad application prospects. We conduct comprehensive experiments on the public dataset, and both quantitative and qualitative results indicate that our method outperforms the state-of-the-art method. Especially for fidelity, our model has a powerful image retention ability and delivers the best results in UCF101 compared to other image-to-video models to our best knowledge. Also, precise control can be achieved by giving different text prompts. Further details and comprehensive results of our model will be presented in https://anonymous0769.github.io/DreamVideo/.

Submitted 12 December, 2023; v1 submitted 4 December, 2023; originally announced December 2023.

arXiv:2312.00823 [pdf, other]

Adaptive Multi-Modality Prompt Learning

Authors: Zongqian Wu, Yujing Liu, Mengmeng Zhan, Jialie Shen, Ping Hu, Xiaofeng Zhu

Abstract: Although current prompt learning methods have successfully been designed to effectively reuse the large pre-trained models without fine-tuning their large number of parameters, they still have limitations to be addressed, i.e., without considering the adverse impact of meaningless patches in every image and without simultaneously considering in-sample generalization and out-of-sample generalization. In this paper, we propose an adaptive multi-modality prompt learning to address the above issues. To do this, we employ previous text prompt learning and propose a new image prompt learning. The image prompt learning achieves in-sample and out-of-sample generalization, by first masking meaningless patches and then padding them with the learnable parameters and the information from texts. Moreover, each of the prompts provides auxiliary information to each other, further strengthening these two kinds of generalization. Experimental results on real datasets demonstrate that our method outperforms SOTA methods, in terms of different downstream tasks.

Submitted 30 November, 2023; originally announced December 2023.

arXiv:2311.15834 [pdf, other]

doi 10.1103/PhysRevB.109.L041113

Charge-density wave transition in magnetic topological semimetal EuAl$_4$

Authors: R. Yang, C. C. Le, P. Zhu, Z. W. Wang, T. Shang, Y. M. Dai, J. P. Hu, M. Dressel

Abstract: The interplay among topology, charge-density wave (CDW), and magnetism can give rise to a plethora of exotic quantum phenomena. Recently, a group of magnetic topological semimetals with tetragonal lattices and CDW order were found to exhibit anomalous magnetic instability, helical spin ordering, and the presence of skyrmions. However, the underlying mechanism responsible for these observations remains unclear. Here, we conducted a comprehensive investigation into the impact of CDW on the topological and magnetic properties of EuAl$_4$ using optical spectroscopy and the first-principles calculations. Through optical spectroscopy, we observed a partial gap (60~meV) on the Fermi surface and an enhanced mid-infrared absorption around 0.4~eV after the CDW transition. Magneto-optical spectroscopy and the first-principles calculations proved that, by affecting the band structure, the CDW order frustrates the antiferromagnetic interactions but strengthened the ferromagnetic ones, which can destabilize the magnetism. With lower symmetry in the CDW ordered state, carriers from the Weyl bands will mediate the anisotropic magnetic interactions promoting the formation of chiral spin textures. Conversely, without the CDW order, the counterpart EuGa$_4$ shows robust collinear antiferromagnetic order. Our findings uncover the pivotal role played by CDW order in arousing intricate magnetism in topological materials and provide valuable insights into controlling topological and magnetic properties through the manipulation of CDW orders.

Submitted 27 November, 2023; originally announced November 2023.

Comments: 8 pages, 4 figures

Report number: RIKEN-iTHEMS-Report-24

arXiv:2311.07062 [pdf, other]

doi 10.1109/TASLP.2023.3332542

Decoupling and Interacting Multi-Task Learning Network for Joint Speech and Accent Recognition

Authors: Qijie Shao, Pengcheng Guo, Jinghao Yan, Pengfei Hu, Lei Xie

Abstract: Accents, as variations from standard pronunciation, pose significant challenges for speech recognition systems. Although joint automatic speech recognition (ASR) and accent recognition (AR) training has been proven effective in handling multi-accent scenarios, current multi-task ASR-AR approaches overlook the granularity differences between tasks. Fine-grained units capture pronunciation-related accent characteristics, while coarse-grained units are better for learning linguistic information. Moreover, an explicit interaction of two tasks can also provide complementary information and improve the performance of each other, but it is rarely used by existing approaches. In this paper, we propose a novel Decoupling and Interacting Multi-task Network (DIMNet) for joint speech and accent recognition, which is comprised of a connectionist temporal classification (CTC) branch, an AR branch, an ASR branch, and a bottom feature encoder. Specifically, AR and ASR are first decoupled by separated branches and two-granular modeling units to learn task-specific representations. The AR branch is from our previously proposed linguistic-acoustic bimodal AR model and the ASR branch is an encoder-decoder based Conformer model. Then, for the task interaction, the CTC branch provides aligned text for the AR task, while accent embeddings extracted from our AR model are incorporated into the ASR branch's encoder and decoder. Finally, during ASR inference, a cross-granular rescoring method is introduced to fuse the complementary information from the CTC and attention decoder after the decoupling. Our experiments on English and Chinese datasets demonstrate the effectiveness of the proposed model, which achieves 21.45%/28.53% AR accuracy relative improvement and 32.33%/14.55% ASR error rate relative reduction over a published standard baseline, respectively.

Submitted 17 November, 2023; v1 submitted 12 November, 2023; originally announced November 2023.

Comments: Accepted by IEEE Transactions on Audio, Speech and Language Processing (TASLP)

arXiv:2310.18946 [pdf, other]

Video Frame Interpolation with Many-to-many Splatting and Spatial Selective Refinement

Authors: Ping Hu, Simon Niklaus, Lu Zhang, Stan Sclaroff, Kate Saenko

Abstract: In this work, we first propose a fully differentiable Many-to-Many (M2M) splatting framework to interpolate frames efficiently. Given a frame pair, we estimate multiple bidirectional flows to directly forward warp the pixels to the desired time step before fusing overlapping pixels. In doing so, each source pixel renders multiple target pixels and each target pixel can be synthesized from a larger area of visual context, establishing a many-to-many splatting scheme with robustness to undesirable artifacts. For each input frame pair, M2M has a minuscule computational overhead when interpolating an arbitrary number of in-between frames, hence achieving fast multi-frame interpolation. However, directly warping and fusing pixels in the intensity domain is sensitive to the quality of motion estimation and may suffer from less effective representation capacity. To improve interpolation accuracy, we further extend an M2M++ framework by introducing a flexible Spatial Selective Refinement (SSR) component, which allows for trading computational efficiency for interpolation quality and vice versa. Instead of refining the entire interpolated frame, SSR only processes difficult regions selected under the guidance of an estimated error map, thereby avoiding redundant computation. Evaluation on multiple benchmark datasets shows that our method is able to improve the efficiency while maintaining competitive video interpolation quality, and it can be adjusted to use more or less compute as needed.

Submitted 29 October, 2023; originally announced October 2023.

Comments: T-PAMI. arXiv admin note: substantial text overlap with arXiv:2204.03513

arXiv:2310.17468 [pdf, other]

Cross-modal Active Complementary Learning with Self-refining Correspondence

Authors: Yang Qin, Yuan Sun, Dezhong Peng, Joey Tianyi Zhou, Xi Peng, Peng Hu

Abstract: Recently, image-text matching has attracted more and more attention from academia and industry, which is fundamental to understanding the latent correspondence across visual and textual modalities. However, most existing methods implicitly assume the training pairs are well-aligned while ignoring the ubiquitous annotation noise, a.k.a noisy correspondence (NC), thereby inevitably leading to a performance drop. Although some methods attempt to address such noise, they still face two challenging problems: excessive memorizing/overfitting and unreliable correction for NC, especially under high noise. To address the two problems, we propose a generalized Cross-modal Robust Complementary Learning framework (CRCL), which benefits from a novel Active Complementary Loss (ACL) and an efficient Self-refining Correspondence Correction (SCC) to improve the robustness of existing methods. Specifically, ACL exploits active and complementary learning losses to reduce the risk of providing erroneous supervision, leading to theoretically and experimentally demonstrated robustness against NC. SCC utilizes multiple self-refining processes with momentum correction to enlarge the receptive field for correcting correspondences, thereby alleviating error accumulation and achieving accurate and stable corrections. We carry out extensive experiments on three image-text benchmarks, i.e., Flickr30K, MS-COCO, and CC152K, to verify the superior robustness of our CRCL against synthetic and real-world noisy correspondences.

Submitted 7 January, 2024; v1 submitted 26 October, 2023; originally announced October 2023.

Comments: This paper is accepted by NeurIPS 2023

arXiv:2310.17145 [pdf]

doi 10.1063/5.0216883

Unveiling microstructural damage for leakage current degradation in SiC Schottky diode after heavy ions irradiation under 200 V

Authors: Xiaoyu Yan, Pengfei Zhai, Chen Yang, Shiwei Zhao, Shuai Nan, Peipei Hu, Teng Zhang, Qiyu Chen, Lijun Xu, Zongzhen Li, Jie Liu

Abstract: Single-event burnout and single-event leakage current (SELC) in SiC power devices induced by heavy ions severely limit their space application, and the underlying mechanism is still unclear. One fundamental problem is lack of high-resolution characterization of radiation damage in the irradiated SiC power devices, which is a crucial indicator of the related mechanism. In this letter, high-resolution transmission electron microscopy (TEM) was used to characterize the radiation damage in the 1437.6 MeV 181Ta-irradiated SiC junction barrier Schottky diode under 200 V. The amorphous radiation damage with about 52 nm in diameter and 121 nm in length at the Schottky metal (Ti)-semiconductor (SiC) interface was observed. More importantly, in the damage site the atomic mixing of Ti, Si, and C was identified by electron energy loss spectroscopy and high-angle annular dark-field scanning TEM. It indicates that the melting of the Ti-SiC interface induced by localized Joule heating is responsible for the amorphization and the formation of titanium silicide, titanium carbide, or ternary phases. These modifications at nanoscale in turn cause the localized degradation of the Schottky contact, resulting in the permanent increase in leakage current. This experimental study provides very valuable clues to thorough understanding of the SELC mechanism in SiC diode.

Submitted 7 March, 2024; v1 submitted 26 October, 2023; originally announced October 2023.

Comments: 4 pages,4 figures

Journal ref: Applied Physics Letters, 125, 042103 (2024)

arXiv:2310.11989 [pdf, other]

Image Clustering with External Guidance

Authors: Yunfan Li, Peng Hu, Dezhong Peng, Jiancheng Lv, Jianping Fan, Xi Peng

Abstract: The core of clustering is incorporating prior knowledge to construct supervision signals. From classic k-means based on data compactness to recent contrastive clustering guided by self-supervision, the evolution of clustering methods intrinsically corresponds to the progression of supervision signals. At present, substantial efforts have been devoted to mining internal supervision signals from data. Nevertheless, the abundant external knowledge such as semantic descriptions, which naturally conduces to clustering, is regrettably overlooked. In this work, we propose leveraging external knowledge as a new supervision signal to guide clustering, even though it seems irrelevant to the given data. To implement and validate our idea, we design an externally guided clustering method (Text-Aided Clustering, TAC), which leverages the textual semantics of WordNet to facilitate image clustering. Specifically, TAC first selects and retrieves WordNet nouns that best distinguish images to enhance the feature discriminability. Then, to improve image clustering performance, TAC collaborates text and image modalities by mutually distilling cross-modal neighborhood information. Experiments demonstrate that TAC achieves state-of-the-art performance on five widely used and three more challenging image clustering benchmarks, including the full ImageNet-1K dataset.

Submitted 16 July, 2024; v1 submitted 18 October, 2023; originally announced October 2023.

Journal ref: ICML 2024 (Oral)

arXiv:2310.11727 [pdf, other]

Testing the cosmological principle with the Pantheon+ sample and the region-fitting method

Authors: J. P. Hu, Y. Y. Wang, J. Hu, F. Y. Wang

Abstract: The cosmological principle is fundamental to the standard cosmological model. It assumes that the Universe is homogeneous and isotropic on very large scales. As the basic assumption, it must stand the test of various observations. In this work, using the region fitting (RF) method, we mapped the all-sky distribution of cosmological parameters ($Ω_{m}$ and $H_{0}$) and found that the distribution significantly deviates from isotropy. A local matter underdensity region exists toward (${308.4^{\circ}}$$_{-48.7}^{+47.6}$, ${-18.2^{\circ}}$$_{-28.8}^{+21.1}$) as well as a preferred direction of the cosmic anisotropy (${313.4^{\circ}}$$_{-18.2}^{+19.6}$, ${-16.8^{\circ}}$$_{-10.7}^{+11.1}$) in galactic coordinates. Similar directions may imply that local matter density might be responsible for the anisotropy of the accelerated expansion of the Universe. Results of statistical isotropy analyses including Isotropy and Isotropy with real-data positions (RP) show high confidence levels. For the local matter underdensity, the statistical significances are 2.78$σ$ (isotropy) and 2.34$σ$ (isotropy RP). For the cosmic anisotropy, the statistical significances are 3.96$σ$ (isotropy) and 3.15$σ$ (isotropy RP). The comparison of these two kinds of statistical isotropy analyses suggests that inhomogeneous spatial distribution of real sample can increase the deviation from isotropy.
Similar results are also found in reanalyses of the low-redshift sample (lp+) and the lower screening angle ($θ_\mathrm{max}$ = 60$^{\circ}$), but with a slight decrease in statistical significance. Overall, our results provide clear indications for a possible cosmic anisotropy. This possibility must be taken seriously. Further testing is needed to better understand this signal.

Submitted 9 November, 2023; v1 submitted 18 October, 2023; originally announced October 2023.

Comments: 15 pages, 20 figures and 3 tables, published version of A&A, corrected some typos and updated the reference, the abstract appearing here is slightly shorter than that in the PDF file

arXiv:2310.11598 [pdf, other]

Learning Neural Implicit through Volume Rendering with Attentive Depth Fusion Priors

Authors: Pengchong Hu, Zhizhong Han

Abstract: Learning neural implicit representations has achieved remarkable performance in 3D reconstruction from multi-view images. Current methods use volume rendering to render implicit representations into either RGB or depth images that are supervised by multi-view ground truth. However, rendering a view each time suffers from incomplete depth at holes and unawareness of occluded structures from the depth supervision, which severely affects the accuracy of geometry inference via volume rendering. To resolve this issue, we propose to learn neural implicit representations from multi-view RGBD images through volume rendering with an attentive depth fusion prior. Our prior allows neural networks to perceive coarse 3D structures from the Truncated Signed Distance Function (TSDF) fused from all depth images available for rendering. The TSDF enables accessing the missing depth at holes on one depth image and the occluded parts that are invisible from the current view. By introducing a novel attention mechanism, we allow neural networks to directly use the depth fusion prior with the inferred occupancy as the learned implicit function. Our attention mechanism works with either a one-time fused TSDF that represents a whole scene or an incrementally fused TSDF that represents a partial scene in the context of Simultaneous Localization and Mapping (SLAM). Our evaluations on widely used benchmarks including synthetic and real-world scans show our superiority over the latest neural implicit methods.
Project page: https://machineperceptionlab.github.io/Attentive_DF_Prior/

Submitted 7 January, 2024; v1 submitted 17 October, 2023; originally announced October 2023.

Comments: NeurIPS 2023

arXiv:2310.09297 [pdf, other]

A Framework for Inference Inspired by Human Memory Mechanisms

Authors: Xiangyu Zeng, Jie Lin, Piao Hu, Ruizheng Huang, Zhicheng Zhang

Abstract: How humans and machines make sense of current inputs for relation reasoning and question-answering while putting the perceived information into the context of our past memories has been a challenging conundrum in cognitive science and artificial intelligence. Inspired by the human brain's memory system and cognitive architectures, we propose a PMI framework that consists of perception, memory and inference components. Notably, the memory module comprises working and long-term memory, with the latter endowed with a higher-order structure to retain extensive and complex relational knowledge and experience. Through a differentiable competitive write access, current perceptions update working memory, which is later merged with long-term memory via outer product associations, reducing information conflicts and averting memory overflow. In the inference module, relevant information is retrieved from two separate memory origins and associatively integrated to attain a more comprehensive and precise interpretation of current perceptions. We exploratively apply our PMI to improve prevailing Transformers and CNN models on question-answering tasks like bAbI-20k and Sort-of-CLEVR datasets, as well as detecting equilateral triangles, language modeling and image classification tasks, and in each case, our PMI enhancements consistently outshine their original counterparts significantly.
Visualization analyses reveal that relational memory consolidation, along with the interaction and integration of information from diverse memory sources, substantially contributes to the model effectiveness on inference tasks.

Submitted 20 May, 2024; v1 submitted 1 October, 2023; originally announced October 2023.

arXiv:2310.05372 [pdf]

doi 10.1016/j.scitotenv.2023.166714

The role of hydrodynamics for the spatial distribution of high-temperature hydrothermal vent-endemic fauna in the deep ocean environment

Authors: Zhiguo He, Yingzhong Lou, Haoyang Zhang, Xiqiu Han, Thomas Pähtz, Pengcheng Jiao, Peng Hu, Yadong Zhou, Yejian Wang, Zhongyan Qiu

Abstract: Active hydrothermal vents provide the surrounding submarine environment with substantial amounts of matter and energy, thus serving as important habitats for diverse megabenthic communities in the deep ocean and constituting a unique, highly productive chemosynthetic ecosystem on Earth. Vent-endemic biological communities gather near the venting site and are usually not found beyond a distance of the order of 100 m from the vent. This is surprising because one would actually expect matter ejected from high-temperature vents, which generate highly turbulent buoyancy plumes, to be suspended and carried far away by the plume flows and deep-sea currents. Here, we study this problem from a fluid dynamics perspective by simulating the vent hydrodynamics using a numerical model that couples the plume flow with induced matter and energy transport. We find that both low- and high-temperature vents deposit most vent matter relatively close to the plume. In particular, the tendency of turbulent buoyancy plumes to carry matter far away is strongly counteracted by generated entrainment flows back into the plume stem. The deposition ranges of organic and inorganic hydrothermal particles obtained from the simulations for various natural high-temperature vents are consistent with the observed maximum spatial extent of biological communities, evidencing that plume hydrodynamics exercises strong control over the spatial distribution of vent-endemic fauna.
While other factors affecting the spatial distribution of vent-endemic fauna, such as geology and geochemistry, are site-specific, the main physical features of plume hydrodynamics unraveled in this study are largely site-unspecific and therefore universal across vent sites on Earth.

Submitted 8 October, 2023; originally announced October 2023.

Journal ref: Science of the Total Environment 904, 166714 (2023)

arXiv:2309.12113 [pdf, other]

Incentivizing Massive Unknown Workers for Budget-Limited Crowdsensing: From Off-Line and On-Line Perspectives

Authors: Feng Li, Yuqi Chai, Huan Yang, Pengfei Hu, Lingjie Duan

Abstract: How to incentivize strategic workers using limited budget is a very fundamental problem for crowdsensing systems; nevertheless, since the sensing abilities of the workers may not always be known as prior knowledge due to the diversities of their sensor devices and behaviors, it is difficult to properly select and pay the unknown workers. Although the uncertainties of the workers can be addressed by the standard Combinatorial Multi-Armed Bandit (CMAB) framework in existing proposals through a trade-off between exploration and exploitation, we may not have sufficient budget to enable the trade-off among the individual workers, especially when the number of the workers is huge while the budget is limited. Moreover, the standard CMAB usually assumes the workers always stay in the system, whereas the workers may join in or depart from the system over time, such that what we have learnt for an individual worker cannot be applied after the worker leaves. To address the above challenging issues, in this paper, we first propose an off-line Context-Aware CMAB-based Incentive (CACI) mechanism. We innovate in leveraging the exploration-exploitation trade-off in an elaborately partitioned context space instead of the individual workers, to effectively incentivize the massive unknown workers with a very limited budget. We also extend the above basic idea to the on-line setting where unknown workers may join in or depart from the systems dynamically, and propose an on-line version of the CACI mechanism.
We perform rigorous theoretical analysis to reveal the upper bounds on the regrets of our CACI mechanisms and to prove their truthfulness and individual rationality, respectively. Extensive experiments on both synthetic and real datasets are also conducted to verify the efficacy of our mechanisms.

Submitted 2 January, 2024; v1 submitted 21 September, 2023; originally announced September 2023.

arXiv:2309.07925 [pdf, other]

doi 10.1145/3581783.3612859

Hierarchical Audio-Visual Information Fusion with Multi-label Joint Decoding for MER 2023

Authors: Haotian Wang, Yuxuan Xi, Hang Chen, Jun Du, Yan Song, Qing Wang, Hengshun Zhou, Chenxi Wang, Jiefeng Ma, Pengfei Hu, Ya Jiang, Shi Cheng, Jie Zhang, Yuzhe Weng

Abstract: In this paper, we propose a novel framework for recognizing both discrete and dimensional emotions. In our framework, deep features extracted from foundation models are used as robust acoustic and visual representations of raw video. Three different structures based on attention-guided feature gathering (AFG) are designed for deep feature fusion. Then, we introduce a joint decoding structure for emotion classification and valence regression in the decoding stage. A multi-task loss based on uncertainty is also designed to optimize the whole process. Finally, by combining three different structures on the posterior probability level, we obtain the final predictions of discrete and dimensional emotions. When tested on the dataset of multimodal emotion recognition challenge (MER 2023), the proposed framework yields consistent improvements in both emotion classification and valence regression. Our final system achieves state-of-the-art performance and ranks third on the leaderboard on MER-MULTI sub-challenge.

Submitted 10 September, 2023; originally announced September 2023.

Comments: 5 pages, 4 figures

Journal ref: The 31st ACM International Conference on Multimedia (MM'23), 2023

arXiv:2309.04814 [pdf, other]

Speech2Lip: High-fidelity Speech to Lip Generation by Learning from a Short Video

Authors: Xiuzhe Wu, Pengfei Hu, Yang Wu, Xiaoyang Lyu, Yan-Pei Cao, Ying Shan, Wenming Yang, Zhongqian Sun, Xiaojuan Qi

Abstract: Synthesizing realistic videos according to a given speech is still an open challenge. Previous works have been plagued by issues such as inaccurate lip shape generation and poor image quality. The key reason is that only motions and appearances on limited facial areas (e.g., lip area) are mainly driven by the input speech. Therefore, directly learning a mapping function from speech to the entire head image is prone to ambiguity, particularly when using a short video for training. We thus propose a decomposition-synthesis-composition framework named Speech to Lip (Speech2Lip) that disentangles speech-sensitive and speech-insensitive motion/appearance to facilitate effective learning from limited training data, resulting in the generation of natural-looking videos. First, given a fixed head pose (i.e., canonical space), we present a speech-driven implicit model for lip image generation which concentrates on learning speech-sensitive motion and appearance. Next, to model the major speech-insensitive motion (i.e., head movement), we introduce a geometry-aware mutual explicit mapping (GAMEM) module that establishes geometric mappings between different head poses. This allows us to paste generated lip images at the canonical space onto head images with arbitrary poses and synthesize talking videos with natural head movements. In addition, a Blend-Net and a contrastive sync loss are introduced to enhance the overall synthesis performance.
Quantitative and qualitative results on three benchmarks demonstrate that our model can be trained by a video of just a few minutes in length and achieve state-of-the-art performance in both visual quality and speech-visual synchronization. Code: https://github.com/CVMI-Lab/Speech2Lip.

Submitted 9 September, 2023; originally announced September 2023.

arXiv:2309.02399 [pdf, other]

The Batik-plays-Mozart Corpus: Linking Performance to Score to Musicological Annotations

Authors: Patricia Hu, Gerhard Widmer

Abstract: We present the Batik-plays-Mozart Corpus, a piano performance dataset combining professional Mozart piano sonata performances with expert-labelled scores at a note-precise level. The performances originate from a recording by Viennese pianist Roland Batik on a computer-monitored Bösendorfer grand piano, and are available both as MIDI files and audio recordings. They have been precisely aligned, note by note, with a current standard edition of the corresponding scores (the New Mozart Edition) in such a way that they can further be connected to the musicological annotations (harmony, cadences, phrases) on these scores that were recently published by Hentschel et al. (2021). The result is a high-quality, high-precision corpus mapping scores and musical structure annotations to precise note-level professional performance information. As the first of its kind, it can serve as a valuable resource for studying various facets of expressive performance and their relationship with structural aspects. In the paper, we outline the curation process of the alignment and conduct two exploratory experiments to demonstrate its usefulness in analyzing expressive performance.

Submitted 6 September, 2023; v1 submitted 5 September, 2023; originally announced September 2023.

Comments: To be published in the Proceedings of the 24th International Society for Music Information Retrieval Conference (ISMIR 2023), Milan, Italy

arXiv:2308.14667 [pdf]

Neural Network-Based Histologic Remission Prediction In Ulcerative Colitis

Authors: Yemin Li, Zhongcheng Liu, Xiaoying Lou, Mirigual Kurban, Miao Li, Jie Yang, Kaiwei Che, Jiankun Wang, Max Q.-H. Meng, Yan Huang, Qin Guo, Pinjin Hu

Abstract: BACKGROUND & AIMS: Histological remission (HR) is advocated and considered as a new therapeutic target in ulcerative colitis (UC). Diagnosis of histologic remission currently relies on biopsy; during this process, patients are at risk for bleeding, infection, and post-biopsy fibrosis. In addition, histologic response scoring is complex and time-consuming, and there is heterogeneity among pathologists. Endocytoscopy (EC) is a novel ultra-high magnification endoscopic technique that can provide excellent in vivo assessment of glands. Based on the EC technique, we propose a neural network model that can assess histological disease activity in UC using EC images to address the above issues. The experiment results demonstrate that the proposed method can assist patients in precise treatment and prognostic assessment. METHODS: We construct a neural network model for UC evaluation. A total of 5105 images of 154 intestinal segments from 87 patients undergoing EC treatment at a center in China between March 2022 and March 2023 are scored according to the Geboes score. Subsequently, 103 intestinal segments are used as the training set, 16 intestinal segments are used as the validation set for neural network training, and the remaining 35 intestinal segments are used as the test set to measure the model performance together with the validation set.
RESULTS: By treating HR as a negative category and histologic activity as a positive category, the proposed neural network model can achieve an accuracy of 0.9, a specificity of 0.95, a sensitivity of 0.75, and an area under the curve (AUC) of 0.81. CONCLUSION: We develop a specific neural network model that can distinguish histologic remission/activity in EC images of UC, which helps to accelerate clinical histological diagnosis. keywords: ulcerative colitis; Endocytoscopy; Geboes score; neural network.

Submitted 28 August, 2023; originally announced August 2023.

arXiv:2308.12350 [pdf, other]

Diffusion-based Image Translation with Label Guidance for Domain Adaptive Semantic Segmentation

Authors: Duo Peng, Ping Hu, Qiuhong Ke, Jun Liu

Abstract: Translating images from a source domain to a target domain for learning target models is one of the most common strategies in domain adaptive semantic segmentation (DASS). However, existing methods still struggle to preserve semantically-consistent local details between the original and translated images. In this work, we present an innovative approach that addresses this challenge by using source-domain labels as explicit guidance during image translation. Concretely, we formulate cross-domain image translation as a denoising diffusion process and utilize a novel Semantic Gradient Guidance (SGG) method to constrain the translation process, conditioning it on the pixel-wise source labels. Additionally, a Progressive Translation Learning (PTL) strategy is devised to enable the SGG method to work reliably across domains with large gaps. Extensive experiments demonstrate the superiority of our approach over state-of-the-art methods.

Submitted 23 August, 2023; originally announced August 2023.

Comments: Accepted to ICCV2023

arXiv:2308.11164 [pdf, other]

Decoupled Contrastive Multi-View Clustering with High-Order Random Walks

Authors: Yiding Lu, Yijie Lin, Mouxing Yang, Dezhong Peng, Peng Hu, Xi Peng

Abstract: Recently, some robust contrastive multi-view clustering (MvC) methods have been proposed, which construct data pairs from neighborhoods to alleviate the false negative issue, i.e., some intra-cluster samples are wrongly treated as negative pairs. Although promising performance has been achieved by these methods, the false negative issue is still far from addressed and the false positive issue emerges because all in- and out-of-neighborhood samples are simply treated as positive and negative, respectively. To address the issues, we propose a novel robust method, dubbed decoupled contrastive multi-view clustering with high-order random walks (DIVIDE). In brief, DIVIDE leverages random walks to progressively identify data pairs in a global instead of local manner. As a result, DIVIDE could identify in-neighborhood negatives and out-of-neighborhood positives. Moreover, DIVIDE embraces a novel MvC architecture to perform inter- and intra-view contrastive learning in different embedding spaces, thus boosting clustering performance and embracing the robustness against missing views. To verify the efficacy of DIVIDE, we carry out extensive experiments on four benchmark datasets comparing with nine state-of-the-art MvC methods in both complete and incomplete MvC settings.

Submitted 18 January, 2024; v1 submitted 21 August, 2023; originally announced August 2023.

Comments: Accepted by AAAI 2024

arXiv:2308.09911 [pdf, other]

Noisy-Correspondence Learning for Text-to-Image Person Re-identification

Authors: Yang Qin, Yingke Chen, Dezhong Peng, Xi Peng, Joey Tianyi Zhou, Peng Hu

Abstract: Text-to-image person re-identification (TIReID) is a compelling topic in the cross-modal community, which aims to retrieve the target person based on a textual query. Although numerous TIReID methods have been proposed and achieved promising performance, they implicitly assume the training image-text pairs are correctly aligned, which is not always the case in real-world scenarios. In practice, some image-text pairs are inevitably under-correlated or even falsely correlated, a.k.a. noisy correspondence (NC), due to the low quality of the images and annotation errors. To address this problem, we propose a novel Robust Dual Embedding method (RDE) that can learn robust visual-semantic associations even with NC. Specifically, RDE consists of two main components: 1) A Confident Consensus Division (CCD) module that leverages the dual-grained decisions of dual embedding modules to obtain a consensus set of clean training data, which enables the model to learn correct and reliable visual-semantic associations. 2) A Triplet Alignment Loss (TAL) that relaxes the conventional Triplet Ranking loss with the hardest negative samples to a log-exponential upper bound over all negative ones, thus preventing model collapse under NC while still focusing on hard-negative samples for promising performance. We conduct extensive experiments on three public benchmarks, namely CUHK-PEDES, ICFG-PEDES, and RSTPReID, to evaluate the performance and robustness of our RDE.
Our method achieves state-of-the-art results both with and without synthetic noisy correspondences on all three datasets. Code is available at https://github.com/QinYang79/RDE.

Submitted 28 March, 2024; v1 submitted 19 August, 2023; originally announced August 2023.

arXiv:2308.09658 [pdf, other]

Tree-of-Mixed-Thought: Combining Fast and Slow Thinking for Multi-hop Visual Reasoning

Authors: Pengbo Hu, Ji Qi, Xingyu Li, Hong Li, Xinqi Wang, Bing Quan, Ruiyu Wang, Yi Zhou

Abstract: There emerges a promising trend of using large language models (LLMs) to generate code-like plans for complex inference tasks such as visual reasoning. This paradigm, known as LLM-based planning, provides flexibility in problem solving and endows better interpretability. However, current research is mostly limited to basic scenarios of simple questions that can be straightforwardly answered in a few inference steps. Planning for the more challenging multi-hop visual reasoning tasks remains under-explored. Specifically, under multi-hop reasoning situations, the trade-off between accuracy and the complexity of plan-searching becomes prominent. The prevailing algorithms either address the efficiency issue by employing the fast one-stop generation or adopt a complex iterative generation method to improve accuracy. Both fail to balance the need for efficiency and performance. Drawing inspiration from the dual system of cognition in the human brain, the fast and the slow thinking processes, we propose a hierarchical plan-searching algorithm that integrates the one-stop reasoning (fast) and the Tree-of-thought (slow). Our approach succeeds in performance while significantly saving inference steps. Moreover, we repurpose the PTR and the CLEVER datasets, developing a systematic framework for evaluating the performance and efficiency of LLM-based plan-search algorithms under reasoning tasks at different levels of difficulty. Extensive experiments demonstrate the superiority of our proposed algorithm in terms of performance and efficiency.
The dataset and code will be released soon.

Submitted 20 August, 2023; v1 submitted 18 August, 2023; originally announced August 2023.

Comments: 16 pages, 1 figure, under review

arXiv:2308.07686 [pdf, other]

Boosting Multi-modal Model Performance with Adaptive Gradient Modulation

Authors: Hong Li, Xingyu Li, Pengbo Hu, Yinuo Lei, Chunxiao Li, Yi Zhou

Abstract: While the field of multi-modal learning keeps growing fast, the deficiency of the standard joint training paradigm has become clear through recent studies. They attribute the sub-optimal performance of the jointly trained model to the modality competition phenomenon. Existing works attempt to improve the jointly trained model by modulating the training process. Despite their effectiveness, those methods can only apply to late fusion models. More importantly, the mechanism of the modality competition remains unexplored. In this paper, we first propose an adaptive gradient modulation method that can boost the performance of multi-modal models with various fusion strategies. Extensive experiments show that our method surpasses all existing modulation methods. Furthermore, to have a quantitative understanding of the modality competition and the mechanism behind the effectiveness of our modulation method, we introduce a novel metric to measure the competition strength. This metric is built on the mono-modal concept, a function that is designed to represent the competition-less state of a modality. Through systematic investigation, our results confirm the intuition that the modulation encourages the model to rely on the more informative modality. In addition, we find that the jointly trained model typically has a preferred modality on which the competition is weaker than other modalities. However, this preferred modality need not dominate others. Our code will be available at https://github.com/lihong2303/AGM_ICCV2023.

Submitted 15 August, 2023; originally announced August 2023.

Comments: Accepted by ICCV2023

arXiv:2308.05901 [pdf, other]

Deepsea: A Meta-ocean Prototype for Undersea Exploration

Authors: Jinyu Li, Ping Hu, Weicheng Cui, Tianyi Huang, Shenghui Cheng

Abstract: Metaverse has attracted great attention from industry and academia in recent years. Metaverse for the ocean (Meta-ocean) is the implementation of the Metaverse technologies in virtual emersion of the ocean which is beneficial for people yearning for the ocean. It has demonstrated great potential for tourism and education with its strong immersion and appealing interactive user experience. However, quite limited endeavors have been spent on exploring the full possibility of Meta-ocean, especially in modeling the movements of marine creatures. In this paper, we first investigate the technology status of Metaverse and virtual reality (VR) and develop a prototype that builds the Meta-ocean in VR devices with strong immersive visual effects. Then, we demonstrate a method to model the undersea scene and marine creatures and propose an optimized path algorithm based on the Catmull-Rom spline to model the movements of marine life. Finally, we conduct a user study to analyze our Meta-ocean prototype. This user study illustrates that our new prototype can give us strong immersion and an appealing interactive user experience.

Submitted 10 August, 2023; originally announced August 2023.

arXiv:2308.03822 [pdf, other]

Search for Eccentric Black Hole Coalescences during the Third Observing Run of LIGO and Virgo

Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, the KAGRA Collaboration, A. G. Abac, R. Abbott, H. Abe, F. Acernese, K. Ackley, C. Adamcewicz, S. Adhicary, N. Adhikari, R. X. Adhikari, V. K. Adkins, V. B. Adya, C. Affeldt, D. Agarwal, M. Agathos, O. D. Aguiar, I. Aguilar, L. Aiello, A. Ain, P. Ajith, T. Akutsu, S. Albanesi, R. A. Alfaidi, et al. (1750 additional authors not shown)

Abstract: Despite the growing number of confident binary black hole coalescences observed through gravitational waves so far, the astrophysical origin of these binaries remains uncertain. Orbital eccentricity is one of the clearest tracers of binary formation channels. Identifying binary eccentricity, however, remains challenging due to the limited availability of gravitational waveforms that include effects of eccentricity. Here, we present observational results for a waveform-independent search sensitive to eccentric black hole coalescences, covering the third observing run (O3) of the LIGO and Virgo detectors. We identified no new high-significance candidates beyond those that were already identified with searches focusing on quasi-circular binaries. We determine the sensitivity of our search to high-mass (total mass $M>70$ $M_\odot$) binaries covering eccentricities up to 0.3 at 15 Hz orbital frequency, and use this to compare model predictions to search results. Assuming all detections are indeed quasi-circular, for our fiducial population model, we place an upper limit for the merger rate density of high-mass binaries with eccentricities $0 < e \leq 0.3$ at $0.33$ Gpc$^{-3}$ yr$^{-1}$ at 90\% confidence level.

Submitted 7 August, 2023; originally announced August 2023.

Comments: 24 pages, 5 figures

Report number: LIGO-P2300080

arXiv:2308.01890 [pdf, other]

DualCoOp++: Fast and Effective Adaptation to Multi-Label Recognition with Limited Annotations

Authors: Ping Hu, Ximeng Sun, Stan Sclaroff, Kate Saenko

Abstract: Multi-label image recognition in the low-label regime is a task of great challenge and practical significance. Previous works have focused on learning the alignment between textual and visual spaces to compensate for limited image labels, yet may suffer from reduced accuracy due to the scarcity of high-quality multi-label annotations. In this research, we leverage the powerful alignment between textual and visual features pretrained with millions of auxiliary image-text pairs. We introduce an efficient and effective framework called Evidence-guided Dual Context Optimization (DualCoOp++), which serves as a unified approach for addressing partial-label and zero-shot multi-label recognition. In DualCoOp++ we separately encode evidential, positive, and negative contexts for target classes as parametric components of the linguistic input (i.e., prompts). The evidential context aims to discover all the related visual content for the target class, and serves as guidance to aggregate positive and negative contexts from the spatial domain of the image, enabling better distinguishment between similar categories. Additionally, we introduce a Winner-Take-All module that promotes inter-class interaction during training, while avoiding the need for extra parameters and costs. As DualCoOp++ imposes minimal additional learnable overhead on the pretrained vision-language framework, it enables rapid adaptation to multi-label recognition tasks with limited annotations and even unseen classes. Experiments on standard multi-label recognition benchmarks across two challenging low-label settings demonstrate the superior performance of our approach compared to state-of-the-art methods.

Submitted 13 December, 2023; v1 submitted 3 August, 2023; originally announced August 2023.

Comments: TPAMI. arXiv admin note: substantial text overlap with arXiv:2206.09541

arXiv:2307.16253 [pdf, other]

Count, Decode and Fetch: A New Approach to Handwritten Chinese Character Error Correction

Authors: Pengfei Hu, Jiefeng Ma, Zhenrong Zhang, Jun Du, Jianshu Zhang

Abstract: Recently, handwritten Chinese character error correction has been greatly improved by employing encoder-decoder methods to decompose a Chinese character into an ideographic description sequence (IDS). However, existing methods implicitly capture and encode linguistic information inherent in IDS sequences, leading to a tendency to generate IDS sequences that match seen characters. This poses a challenge when dealing with an unseen misspelled character, as the decoder may generate an IDS sequence that matches a seen character instead. Therefore, we introduce Count, Decode and Fetch (CDF), a novel approach that exhibits better generalization towards unseen misspelled characters. CDF is mainly composed of three parts: the counter, the decoder, and the fetcher. In the first stage, the counter predicts the number of each radical class without the symbol-level position annotations. In the second stage, the decoder employs the counting information and generates the IDS sequence step by step. Moreover, by updating the counting information at each time step, the decoder becomes aware of the existence of each radical. With the decomposed IDS sequence, we can determine whether the given character is misspelled. If it is misspelled, the fetcher under the transductive transfer learning strategy predicts the ideal character that the user originally intended to write. We integrate our method into existing encoder-decoder models and significantly enhance their performance.

Submitted 30 July, 2023; originally announced July 2023.

Showing 51–100 of 410 results for author: Hu, P