-
Molecular interaction volume model of mixing enthalpy for molten salt system: An integrated calorimetry-model case study of LaCl$_3$-(LiCl-KCl)
Authors:
Vitaliy G. Goncharov,
William Smith,
Jiahong Li,
Jeffrey A. Eakin,
Erik D. Reinhart,
James Boncella,
Luke D. Gibson,
Vyacheslav S. Bryantsev,
Rushi Gong,
Shun-Li Shang,
Zi-Kui Liu,
Hongwu Xu,
Aurora Clark,
Xiaofeng Guo
Abstract:
Calorimetric determination of enthalpies of mixing ($Δ$H$_{\rm mix}$) of multicomponent molten salts often employs empirical models that lack parameters with clear physical interpretation (e.g., coordination numbers, molar volumes, and pair potentials). Although such physics-informed models are not always needed, a thermodynamic understanding of the relationships between excess energies of mixing and local to intermediate solvation structures is particularly important for pyrochemical separation, as is the case for lanthanides (Ln), which are common neutron poisons and critical industrial elements found in spent nuclear fuels. Here we implement the molecular interaction volume model (MIVM) to synthesize information from experimentally measured $Δ$H$_{\rm mix}$ (using high temperature melt drop calorimetry) and the distribution of solvation structures from ab initio molecular dynamics (AIMD) simulations. This was demonstrated by a case study of a molten salt system consisting of LaCl$_3$ mixed with eutectic LiCl-KCl (58 mol% to 42 mol%) at 873 K and 1133 K. The parameters modelled from MIVM were used to extrapolate the excess Gibbs energy ($Δ$G$_{\rm mix}$) and the compositional dependence of La$^{3+}$ activity in the LaCl$_3$-(LiCl-KCl) system. In contrast, a significant deviation in the predicted $Δ$H$_{\rm mix}$ was seen when it was computed directly from the molecular dynamics trajectories of AIMD or polarizable ion model (PIM) simulations. The integrated experimental and simulation data within the MIVM formalism are generalizable to a wide variety of molten salts and demonstrate a significant improvement over currently employed methods to study molten salts for nuclear and separations sciences.
Submitted 29 August, 2024;
originally announced August 2024.
-
Deconvoluting Thermomechanical Effects in X-ray Diffraction Data using Machine Learning
Authors:
Rachel E. Lim,
Shun-Li Shang,
Chihpin Chuang,
Thien Q. Phan,
Zi-Kui Liu,
Darren C. Pagan
Abstract:
X-ray diffraction is ideal for probing sub-surface state during complex or rapid thermomechanical loading of crystalline materials. However, challenges arise as the size of diffraction volumes increases, due to spatial broadening and the inability to deconvolute the effects of different lattice deformation mechanisms. Here we present a novel approach that combines physics-based modeling and machine learning to deconvolve thermal and mechanical elastic strains for diffraction data analysis. The method builds on a previous effort to extract thermal strain distribution information from diffraction data. The new approach is applied to extract the evolution of thermomechanical state during laser melting of an Inconel 625 wall specimen, which produces significant residual stress upon cooling. A combination of heat transfer and fluid flow, elasto-plasticity, and X-ray diffraction simulations is used to generate training data for machine-learning (Gaussian Process Regression, GPR) models that map diffracted intensity distributions to underlying thermomechanical strain fields. First-principles density functional theory is used to determine the accurate temperature-dependent thermal expansion and elastic stiffness used in the elasto-plasticity modeling. The trained GPR models are found to be capable of deconvoluting the effects of thermal and mechanical strains, in addition to providing information about underlying strain distributions, even from complex diffraction patterns with irregularly shaped peaks.
Submitted 18 August, 2024;
originally announced August 2024.
-
Ergodicity of Stochastic two-phase Stefan problem driven by pure jump Lévy noise
Authors:
Xiaotian Ge,
Shijie Shang,
Jianliang Zhai,
Tusheng Zhang
Abstract:
In this paper, we consider the stochastic two-phase Stefan problem driven by general jump Lévy noise. We first obtain the existence and uniqueness of the strong solution and then establish the ergodicity of the stochastic Stefan problem. Moreover, we give a precise characterization of the support of the invariant measures, which provides the regularity of the stationary solutions of the stochastic free boundary problem.
Submitted 2 August, 2024;
originally announced August 2024.
-
What Affects the Stability of Tool Learning? An Empirical Study on the Robustness of Tool Learning Frameworks
Authors:
Chengrui Huang,
Zhengliang Shi,
Yuntao Wen,
Xiuying Chen,
Peng Han,
Shen Gao,
Shuo Shang
Abstract:
Tool learning methods have enhanced the ability of large language models (LLMs) to interact with real-world applications. Many existing works fine-tune LLMs or design prompts to enable LLMs to select appropriate tools and correctly invoke them to meet user requirements. However, previous works have observed that the performance of tool learning varies across tasks, datasets, training settings, and algorithms. Failing to understand the impact of these factors can lead to inconsistent results, inefficient model deployment, and suboptimal tool utilization, ultimately hindering the practical integration and scalability of LLMs in real-world scenarios. Therefore, in this paper, we explore the impact of both internal and external factors on the performance of tool learning frameworks. Through extensive experiments on two benchmark datasets, we draw several insightful conclusions for future work, including the observation that LLMs can benefit significantly from increased trial and exploration. We believe our empirical study provides a new perspective for future tool learning research.
Submitted 3 July, 2024;
originally announced July 2024.
-
Pinyin Regularization in Error Correction for Chinese Speech Recognition with Large Language Models
Authors:
Zhiyuan Tang,
Dong Wang,
Shen Huang,
Shidong Shang
Abstract:
Recent studies have demonstrated the efficacy of large language models (LLMs) in error correction for automatic speech recognition (ASR). However, much of the research focuses on the English language. This paper redirects the attention to Chinese. Firstly, we construct a specialized benchmark dataset aimed at error correction for Chinese ASR with 724K hypotheses-transcription pairs, named the Chinese Hypotheses Paradise dataset (ChineseHP), which contains a wide range of scenarios and presents significant challenges. Subsequently, we conduct a preliminary evaluation using the dataset for both direct-prompting and fine-tuning pre-trained LLMs. Furthermore, we propose a straightforward method of Pinyin regularization for prompts, which involves the transcription of Pinyin directly from text hypotheses. The experimental results reveal that Pinyin regularization consistently enhances the error-correcting ability of LLMs when compared with those without regularization. The dataset is available on the website.
Submitted 1 July, 2024;
originally announced July 2024.
-
Mobile-Bench: An Evaluation Benchmark for LLM-based Mobile Agents
Authors:
Shihan Deng,
Weikai Xu,
Hongda Sun,
Wei Liu,
Tao Tan,
Jianfeng Liu,
Ang Li,
Jian Luan,
Bin Wang,
Rui Yan,
Shuo Shang
Abstract:
With the remarkable advancements of large language models (LLMs), LLM-based agents have become a research hotspot in human-computer interaction. However, there is a scarcity of benchmarks available for LLM-based mobile agents. Benchmarking these agents generally faces three main challenges: (1) The inefficiency of UI-only operations imposes limitations to task evaluation. (2) Specific instructions within a singular application lack adequacy for assessing the multi-dimensional reasoning and decision-making capacities of LLM mobile agents. (3) Current evaluation metrics are insufficient to accurately assess the process of sequential actions. To this end, we propose Mobile-Bench, a novel benchmark for evaluating the capabilities of LLM-based mobile agents. First, we expand conventional UI operations by incorporating 103 collected APIs to accelerate the efficiency of task completion. Subsequently, we collect evaluation data by combining real user queries with augmentation from LLMs. To better evaluate different levels of planning capabilities for mobile agents, our data is categorized into three distinct groups: SAST, SAMT, and MAMT, reflecting varying levels of task complexity. Mobile-Bench comprises 832 data entries, with more than 200 tasks specifically designed to evaluate multi-APP collaboration scenarios. Furthermore, we introduce a more accurate evaluation metric, named CheckPoint, to assess whether LLM-based mobile agents reach essential points during their planning and reasoning steps.
Submitted 1 July, 2024;
originally announced July 2024.
-
Simulating Financial Market via Large Language Model based Agents
Authors:
Shen Gao,
Yuntao Wen,
Minghang Zhu,
Jianing Wei,
Yuhan Cheng,
Qunzi Zhang,
Shuo Shang
Abstract:
Most economic theories typically assume that financial market participants are fully rational individuals and use mathematical models to simulate human behavior in financial markets. However, human behavior is often not entirely rational and is challenging to predict accurately with mathematical models. In this paper, we propose the Agent-based Simulated Financial Market (ASFM), which first constructs a simulated stock market with a real order matching system. Then, we propose a large language model based agent as the stock trader, which contains profile, observation, and tool-learning based action modules. The trading agent can comprehensively understand current market dynamics and financial policy information, and make decisions that align with its trading strategy. In the experiments, we first verify that the reactions of our ASFM are consistent with the real stock market in two controllable scenarios. In addition, we conduct experiments in two popular economics research directions, and we find that conclusions drawn in our ASFM align with the preliminary findings in economics research. Based on these observations, we believe our proposed ASFM provides a new paradigm for economic research.
Submitted 28 June, 2024;
originally announced June 2024.
-
Thermodynamic modeling of the LiCl-KCl-LaCl$_3$ system with Bayesian model selection and uncertainty quantification
Authors:
Rushi Gong,
Shun-Li Shang,
Vitaliy G. Goncharov,
Xiaofeng Guo,
Zi-Kui Liu
Abstract:
Chloride molten salts are increasingly recognized for their applications in pyroprocessing techniques for the separation of lanthanides. Understanding the thermodynamic properties of these molten salts is essential to optimize the separation process. Several thermodynamic models, including the associate model, the two-sublattice ionic model, and the modified quasichemical model with quadruplet approximation (MQMQA), are utilized to capture the complexity of molten salts. In the present work, the Bayes factor was used to guide the model selection process for the thermodynamic modeling of the KCl-LaCl$_3$ system and to provide statistical comparisons of liquid models. The results indicate that the MQMQA model is the most favorable model based on the available thermochemical data. The LiCl-KCl-LaCl$_3$ system was further optimized with uncertainty quantification using MQMQA. The thermodynamic properties of compounds in the KCl-LaCl$_3$ system were obtained from DFT-based phonon calculations. The calculated phase stability shows excellent agreement with experimental data, indicating that an appropriate model is important for accurately predicting the behavior of complex molten salts.
Submitted 21 June, 2024;
originally announced June 2024.
-
ChatEMG: Synthetic Data Generation to Control a Robotic Hand Orthosis for Stroke
Authors:
Jingxi Xu,
Runsheng Wang,
Siqi Shang,
Ava Chen,
Lauren Winterbottom,
To-Liang Hsu,
Wenxi Chen,
Khondoker Ahmed,
Pedro Leandro La Rotta,
Xinyue Zhu,
Dawn M. Nilsen,
Joel Stein,
Matei Ciocarlie
Abstract:
Intent inferral on a hand orthosis for stroke patients is challenging due to the difficulty of data collection from impaired subjects. Additionally, EMG signals exhibit significant variations across different conditions, sessions, and subjects, making it hard for classifiers to generalize. Traditional approaches require a large labeled dataset from the new condition, session, or subject to train intent classifiers; however, this data collection process is burdensome and time-consuming. In this paper, we propose ChatEMG, an autoregressive generative model that can generate synthetic EMG signals conditioned on prompts (i.e., a given sequence of EMG signals). ChatEMG enables us to collect only a small dataset from the new condition, session, or subject and expand it with synthetic samples conditioned on prompts from this new context. ChatEMG leverages a vast repository of previous data via generative training while still remaining context-specific via prompting. Our experiments show that these synthetic samples are classifier-agnostic and can improve intent inferral accuracy for different types of classifiers. We demonstrate that our complete approach can be integrated into a single patient session, including the use of the classifier for functional orthosis-assisted tasks. To the best of our knowledge, this is the first time an intent classifier trained partially on synthetic data has been deployed for functional control of an orthosis by a stroke survivor. Videos and additional information can be found at https://jxu.ai/chatemg.
Submitted 17 June, 2024;
originally announced June 2024.
-
Facilitating Multi-Role and Multi-Behavior Collaboration of Large Language Models for Online Job Seeking and Recruiting
Authors:
Hongda Sun,
Hongzhan Lin,
Haiyu Yan,
Chen Zhu,
Yang Song,
Xin Gao,
Shuo Shang,
Rui Yan
Abstract:
The emergence of online recruitment services has revolutionized the traditional landscape of job seeking and recruitment, necessitating the development of high-quality industrial applications to improve person-job fitting. Existing methods generally rely on modeling the latent semantics of resumes and job descriptions and learning a matching function between them. Inspired by the powerful role-playing capabilities of Large Language Models (LLMs), we propose to introduce a mock interview process between LLM-played interviewers and candidates. The mock interview conversations can provide additional evidence for candidate evaluation, thereby augmenting traditional person-job fitting based solely on resumes and job descriptions. However, characterizing these two roles in online recruitment still presents several challenges, such as developing the skills to raise interview questions, formulating appropriate answers, and evaluating two-sided fitness. To this end, we propose MockLLM, a novel applicable framework that divides the person-job matching process into two modules: mock interview generation and two-sided evaluation in handshake protocol, jointly enhancing their performance through collaborative behaviors between interviewers and candidates. We design a role-playing framework as a multi-role and multi-behavior paradigm to enable a single LLM agent to effectively behave with multiple functions for both parties. Moreover, we propose reflection memory generation and dynamic prompt modification techniques to refine the behaviors of both sides, enabling continuous optimization of the augmented additional evidence. Extensive experimental results show that MockLLM can achieve the best performance on person-job matching accompanied by high mock interview quality, envisioning its emerging application in real online recruitment in the future.
Submitted 28 May, 2024;
originally announced May 2024.
-
Revisiting first-principles thermodynamics by quasiharmonic approach: Application to study thermal expansion of additively-manufactured Inconel 625
Authors:
Shun-Li Shang,
Rushi Gong,
Michael C. Gao,
Darren C. Pagan,
Zi-Kui Liu
Abstract:
An innovative method is developed for accurate determination of thermodynamic properties as a function of temperature by revisiting the density functional theory (DFT) based quasiharmonic approach (QHA). The present methodology individually evaluates the contributions from static total energy, phonon, and thermal electron to free energy for increased efficiency and accuracy. The Akaike information criterion with a correction (AICc) is used to select models and model parameters for fitting each contribution as a function of volume. Using the additively manufactured Inconel alloy 625 (IN625) as an example, predicted temperature-dependent linear coefficient of thermal expansion (CTE) agrees well with dilatometer measurements and values in the literature. Sensitivity and uncertainty are also analyzed for the predicted IN625 CTE due to different structural configurations used by DFT, and hence different equilibrium properties determined.
Submitted 15 May, 2024;
originally announced May 2024.
-
Can LLMs Deeply Detect Complex Malicious Queries? A Framework for Jailbreaking via Obfuscating Intent
Authors:
Shang Shang,
Xinqiang Zhao,
Zhongjiang Yao,
Yepeng Yao,
Liya Su,
Zijing Fan,
Xiaodan Zhang,
Zhengwei Jiang
Abstract:
To demonstrate and address the underlying maliciousness, we propose a theoretical hypothesis and analytical approach, and introduce a new black-box jailbreak attack methodology named IntentObfuscator, exploiting this identified flaw by obfuscating the true intentions behind user prompts. This approach compels LLMs to inadvertently generate restricted content, bypassing their built-in content security measures. We detail two implementations under this framework: "Obscure Intention" and "Create Ambiguity", which manipulate query complexity and ambiguity to evade malicious intent detection effectively. We empirically validate the effectiveness of the IntentObfuscator method across several models, including ChatGPT-3.5, ChatGPT-4, Qwen and Baichuan, achieving an average jailbreak success rate of 69.21%. Notably, our tests on ChatGPT-3.5, which claims 100 million weekly active users, achieved a remarkable success rate of 83.65%. We also extend our validation to diverse types of sensitive content like graphic violence, racism, sexism, political sensitivity, cybersecurity threats, and criminal skills, further proving the substantial impact of our findings on enhancing 'Red Team' strategies against LLM content security frameworks.
Submitted 7 May, 2024; v1 submitted 6 May, 2024;
originally announced May 2024.
-
Investigation of ideal shear strength of dilute binary and ternary Ni-based alloys using first-principles calculations, CALPHAD modeling and correlation analysis
Authors:
Shuang Lin,
Shun-Li Shang,
John D. Shimanek,
Yi Wang,
Allison M. Beese,
Zi-Kui Liu
Abstract:
In the present work, the ideal shear strength (Tis) of dilute Ni34XZ ternary alloys (X or Z = Al, Co, Cr, Fe, Mn, Mo, Nb, Si, Ti) is predicted by first-principles calculations based on density functional theory (DFT) in terms of pure alias shear deformations. The results show that, for concentrations of the alloying elements up to 8.3%, Tis increases with composition in binary systems with Mn, Fe, Co in ascending order, and decreases with composition with Nb, Si, Mo, Ti, Al, Cr in descending order. Combining the DFT-based results for Ni34XZ from the present work and for Ni11X from the literature, the composition dependence of Tis in binary and ternary systems is modeled using the CALculation of PHAse Diagrams (CALPHAD) approach considering lattice instability, indicating that atomic bonding strength significantly influences Tis. Furthermore, correlational analyses show that, of the elemental features, the Burgers vector and the elastic constant C11 affect Tis the most.
Submitted 29 April, 2024;
originally announced April 2024.
-
Global prediction of nuclear charge density distributions using deep neural network
Authors:
Tian Shuai Shang,
Hui Hui Xie,
Jian Li,
Haozhao Liang
Abstract:
A deep neural network (DNN) has been developed to generate the distributions of nuclear charge density, utilizing training data from relativistic density functional theory and incorporating the available experimental charge radii of 1014 nuclei into the loss function. The DNN achieved a root-mean-square (rms) deviation of 0.0193 fm for charge radii on its validation set. Furthermore, the DNN can improve the description in both the tail and central regions of the charge density, enhancing agreement with experimental findings. The model's predictive capability has been further validated by its agreement with recent experimental data on charge radii. Finally, this refined model is employed to predict the charge density distributions across a wider range of the nuclide chart, and the parameterized charge densities, charge radii, and higher-order moments of charge density distributions are given, providing a robust reference for future experimental investigations.
Submitted 15 April, 2024;
originally announced April 2024.
-
DRE: Generating Recommendation Explanations by Aligning Large Language Models at Data-level
Authors:
Shen Gao,
Yifan Wang,
Jiabao Fang,
Lisi Chen,
Peng Han,
Shuo Shang
Abstract:
Recommendation systems play a crucial role in various domains, suggesting items based on user behavior. However, the lack of transparency in presenting recommendations can lead to user confusion. In this paper, we introduce Data-level Recommendation Explanation (DRE), a non-intrusive explanation framework for black-box recommendation models. Different from existing methods, DRE does not require any intermediary representations of the recommendation model or latent alignment training, mitigating potential performance issues. We propose a data-level alignment method, leveraging large language models to reason about relationships between user data and recommended items. Additionally, we address the challenge of enriching the details of the explanation by introducing target-aware user preference distillation, utilizing item reviews. Experimental results on benchmark datasets demonstrate the effectiveness of DRE in providing accurate and user-centric explanations, enhancing user engagement with recommended items.
Submitted 9 April, 2024;
originally announced April 2024.
-
360$^\circ$REA: Towards A Reusable Experience Accumulation with 360° Assessment for Multi-Agent System
Authors:
Shen Gao,
Hao Li,
Chengrui Huang,
Quan Tu,
Zhiliang Tian,
Minlie Huang,
Shuo Shang
Abstract:
Large language model agents have demonstrated remarkable advancements across various complex tasks. Recent works focus on optimizing the agent team or employing self-reflection to iteratively solve complex tasks. Since these agents are all based on the same LLM, only conducting self-evaluation or removing underperforming agents does not substantively enhance the capability of the agents. We argue that comprehensive evaluation, combined with accumulating experience from evaluation feedback, is an effective approach to improving system performance. In this paper, we propose Reusable Experience Accumulation with 360$^\circ$ Assessment (360$^\circ$REA), a hierarchical multi-agent framework inspired by corporate organizational practices. The framework employs a novel 360$^\circ$ performance assessment method for multi-perspective performance evaluation with fine-grained assessment. To enhance the capability of agents in addressing complex tasks, we introduce a dual-level experience pool for agents to accumulate experience through fine-grained assessment. Extensive experiments on complex task datasets demonstrate the effectiveness of 360$^\circ$REA.
Submitted 26 June, 2024; v1 submitted 8 April, 2024;
originally announced April 2024.
-
Revealing Symmetry-broken Superconducting Configurations by Density Functional Theory
Authors:
Zi-Kui Liu,
Shun-Li Shang
Abstract:
A coherent theory for both conventional and unconventional superconductors is currently lacking. Here we show that the electron charge densities of Al, YBa2Cu3O7 (YBCO), and LaH10, along with Pb and Nb3Sn, share the same feature of electron charge gains in their respective superconducting configurations (SCCs) predicted by first-principles calculations based on density functional theory (DFT). It is discovered that the formation of SCCs is due to local symmetry breaking from their normal conducting configurations (NCCs), and that the electron charge gains in SCCs form electron tunnels in crystals that resemble pontoons, termed electron pontoon tunnels (EPTs) here. The nuclei promoting the formation of EPTs in conventional superconductors bond strongly with other nuclei, so that their EPTs are easily destroyed and the superconducting critical temperature (Tc) is low, while in unconventional superconductors this bonding is very weak, as shown by negative stretching force constants in YBCO, resulting in a much higher Tc. The fundamental understanding of SCCs and the capability to predict them by DFT enable the theoretical search for room temperature superconductors without empirical models.
Submitted 31 March, 2024;
originally announced April 2024.
-
MaterialsMap: A CALPHAD-Based Tool to Design Composition Pathways through feasibility map for Desired Dissimilar Materials, demonstrated with RSW Joining of Ag-Al-Cu
Authors:
Hui Sun,
Bo Pan,
Zhening Yang,
Adam M. Krajewski,
Brandon Bocklund,
Shun-Li Shang,
Jingjing Li,
Allison M. Beese,
Zi-Kui Liu
Abstract:
Assembly of dissimilar metals can be achieved by different methods, for example, casting, welding, and additive manufacturing (AM). However, undesired phases formed in liquid-phase assembling processes due to solute segregation during solidification diminish mechanical and other properties of the processed parts. In the present work, an open-source software package named MaterialsMap has been developed based on the CALculation of Phase Diagrams (CALPHAD) approach. The primary objective of MaterialsMap is to facilitate the design of an optimal composition pathway for assembling dissimilar alloys with liquid phases based on the formation of desired and undesired phases along the pathway. In MaterialsMap, equilibrium thermodynamic calculations are used to predict equilibrium phases formed at slow cooling rates, while Scheil-Gulliver simulations are employed to predict non-equilibrium phases formed during rapid cooling. By combining these two simulations, MaterialsMap offers a thorough guide for understanding phase formation in various manufacturing processes, assisting users in making informed decisions during material selection and production. As a demonstration of this approach, a compositional pathway was designed from pure Al to pure Cu through Ag using MaterialsMap. The design was experimentally verified using resistance spot welding (RSW).
Submitted 27 March, 2024;
originally announced March 2024.
-
DataCook: Crafting Anti-Adversarial Examples for Healthcare Data Copyright Protection
Authors:
Sihan Shang,
Jiancheng Yang,
Zhenglong Sun,
Pascal Fua
Abstract:
In the realm of healthcare, the challenges of copyright protection and unauthorized third-party misuse are increasingly significant. Traditional methods for data copyright protection are applied prior to data distribution, implying that models trained on these data become uncontrollable. This paper introduces a novel approach, named DataCook, designed to safeguard the copyright of healthcare data during the deployment phase. DataCook operates by "cooking" the raw data before distribution, enabling the development of models that perform normally on this processed data. However, during the deployment phase, the original test data must also be "cooked" through DataCook to ensure normal model performance. This process grants copyright holders control over authorization during the deployment phase. DataCook works by crafting anti-adversarial examples (AntiAdv), which are designed to enhance model confidence, as opposed to standard adversarial examples (Adv) that aim to confuse models. Similar to Adv, AntiAdv introduces imperceptible perturbations, ensuring that the data processed by DataCook remains easily understandable. We conducted extensive experiments on MedMNIST datasets, encompassing both 2D/3D data and the high-resolution variants. The outcomes indicate that DataCook effectively meets its objectives, preventing models trained on AntiAdv from analyzing unauthorized data effectively, without compromising the validity and accuracy of the data in legitimate scenarios. Code and data are available at https://github.com/MedMNIST/DataCook.
Submitted 26 March, 2024;
originally announced March 2024.
-
HDRTransDC: High Dynamic Range Image Reconstruction with Transformer Deformation Convolution
Authors:
Shuaikang Shang,
Xuejing Kang,
Anlong Ming
Abstract:
High Dynamic Range (HDR) imaging aims to generate an artifact-free HDR image with realistic details by fusing multi-exposure Low Dynamic Range (LDR) images. Because of large motion and severe under-/over-exposure among input LDR images, HDR imaging suffers from ghosting artifacts and fusion distortions. To address these critical issues, we propose an HDR Transformer Deformation Convolution (HDRTransDC) network to generate high-quality HDR images, which consists of the Transformer Deformable Convolution Alignment Module (TDCAM) and the Dynamic Weight Fusion Block (DWFB). To remove ghosting artifacts, the proposed TDCAM extracts long-distance content similar to the reference feature from the entire set of non-reference features, which can accurately remove misalignment and fill in the content occluded by moving objects. To eliminate fusion distortions, we propose DWFB to spatially and adaptively select useful information across frames and effectively fuse multi-exposed features. Extensive experiments show that our method quantitatively and qualitatively achieves state-of-the-art performance.
Submitted 29 August, 2024; v1 submitted 11 March, 2024;
originally announced March 2024.
-
Harnessing Multi-Role Capabilities of Large Language Models for Open-Domain Question Answering
Authors:
Hongda Sun,
Yuxuan Liu,
Chengwei Wu,
Haiyu Yan,
Cheng Tai,
Xin Gao,
Shuo Shang,
Rui Yan
Abstract:
Open-domain question answering (ODQA) has emerged as a pivotal research spotlight in information systems. Existing methods follow two main paradigms to collect evidence: (1) The \textit{retrieve-then-read} paradigm retrieves pertinent documents from an external corpus; and (2) the \textit{generate-then-read} paradigm employs large language models (LLMs) to generate relevant documents. However, neither can fully address multifaceted requirements for evidence. To this end, we propose LLMQA, a generalized framework that formulates the ODQA process into three basic steps: query expansion, document selection, and answer generation, combining the superiority of both retrieval-based and generation-based evidence. Since LLMs exhibit their excellent capabilities to accomplish various tasks, we instruct LLMs to play multiple roles as generators, rerankers, and evaluators within our framework, integrating them to collaborate in the ODQA process. Furthermore, we introduce a novel prompt optimization algorithm to refine role-playing prompts and steer LLMs to produce higher-quality evidence and answers. Extensive experimental results on widely used benchmarks (NQ, WebQ, and TriviaQA) demonstrate that LLMQA achieves the best performance in terms of both answer accuracy and evidence quality, showcasing its potential for advancing ODQA research and applications.
Submitted 8 March, 2024;
originally announced March 2024.
-
"In Dialogues We Learn": Towards Personalized Dialogue Without Pre-defined Profiles through In-Dialogue Learning
Authors:
Chuanqi Cheng,
Quan Tu,
Wei Wu,
Shuo Shang,
Cunli Mao,
Zhengtao Yu,
Rui Yan
Abstract:
Personalized dialogue systems have gained significant attention in recent years for their ability to generate responses in alignment with different personas. However, most existing approaches rely on pre-defined personal profiles, which are not only time-consuming and labor-intensive to create but also lack flexibility. We propose In-Dialogue Learning (IDL), a fine-tuning framework that enhances the ability of pre-trained large language models to leverage dialogue history to characterize persona for completing personalized dialogue generation tasks without pre-defined profiles. Our experiments on three datasets demonstrate that IDL brings substantial improvements, with BLEU and ROUGE scores increasing by up to 200% and 247%, respectively. Additionally, the results of human evaluations further validate the efficacy of our proposed method.
Submitted 12 March, 2024; v1 submitted 5 March, 2024;
originally announced March 2024.
-
Not All Layers of LLMs Are Necessary During Inference
Authors:
Siqi Fan,
Xin Jiang,
Xiang Li,
Xuying Meng,
Peng Han,
Shuo Shang,
Aixin Sun,
Yequan Wang,
Zhongyuan Wang
Abstract:
Due to the large number of parameters, the inference phase of Large Language Models (LLMs) is resource-intensive. However, not all requests posed to LLMs are equally difficult to handle. Through analysis, we show that for some tasks, LLMs can achieve results comparable to the final output at some intermediate layers. That is, not all layers of LLMs are necessary during inference. If we can predict at which layer the inferred results match the final results (produced by evaluating all layers), we could significantly reduce the inference cost. To this end, we propose a simple yet effective algorithm named AdaInfer to adaptively terminate the inference process for an input instance. AdaInfer relies on easily obtainable statistical features and classic classifiers like SVM. Experiments on well-known LLMs like the Llama2 series and OPT show that AdaInfer can achieve an average pruning ratio of 17.8%, and up to 43% on sentiment tasks, with nearly no performance drop (<1%). Because AdaInfer does not alter LLM parameters, the LLMs incorporated with AdaInfer maintain generalizability across tasks.
Submitted 9 July, 2024; v1 submitted 4 March, 2024;
originally announced March 2024.
-
Explainable Session-based Recommendation via Path Reasoning
Authors:
Yang Cao,
Shuo Shang,
Jun Wang,
Wei Zhang
Abstract:
This paper explores providing explainability for session-based recommendation (SR) by path reasoning. Current SR models emphasize accuracy but lack explainability, while traditional path reasoning prioritizes knowledge graph exploration, ignoring sequential patterns present in the session history. Therefore, we propose a generalized hierarchical reinforcement learning framework for SR, which improves the explainability of existing SR models via Path Reasoning, namely PR4SR. Considering the different importance of items to the session, we design the session-level agent to select the items in the session as the starting point for path reasoning and the path-level agent to perform path reasoning. In particular, we design a multi-target reward mechanism to adapt to the skip behaviors of sequential patterns in SR, and introduce path midpoint reward to enhance the exploration efficiency in knowledge graphs. To improve the completeness of the knowledge graph and to diversify the paths of explanation, we incorporate extracted feature information from images into the knowledge graph. We instantiate PR4SR in five state-of-the-art SR models (i.e., GRU4REC, NARM, GCSAN, SR-GNN, SASRec) and compare it with other explainable SR frameworks, to demonstrate the effectiveness of PR4SR for recommendation and explanation tasks through extensive experiments with these approaches on four datasets.
Submitted 28 February, 2024;
originally announced March 2024.
-
First-principles Investigation of Thermodynamic Properties of CrNbO4 and CrTaO4
Authors:
Shuang Lin,
Shun-Li Shang,
Allison M. Beese,
Zi-Kui Liu
Abstract:
In the present study, the DFT+U method was employed to predict the thermodynamic properties of Cr2O3, Nb2O5, and Ta2O5. Results were benchmarked with experimental data showing high accuracy, except for the negative thermal expansion (NTE) of Nb2O5, which is attributed to its polymorphic complexity. Additionally, we extended our analysis to rutile-type oxides CrNbO4 and CrTaO4, examining their entropy and heat capacity at finite temperatures. CrNbO4 displayed slightly higher entropy and heat capacity at high temperatures. The mean linear thermal expansion coefficients for CrNbO4 and CrTaO4 from 500 K to 2000 K were predicted to be $6.00\times10^{-6}$/K and $13.49\times10^{-6}$/K, respectively, corroborating with DFT predictions and experimental evidence. Our research highlights the precision of the DFT+U and phonon methods in predicting the thermodynamic properties of oxide materials, offering insights into the design of corrosion-resistant materials.
Submitted 1 March, 2024;
originally announced March 2024.
-
Predicting Phase Transitions in PbTiO$_3$ using Zentropy through Quasi-Harmonic Phonon Calculations
Authors:
Nigel Lee En Hew,
Shun-Li Shang,
Zi-Kui Liu
Abstract:
According to X-ray diffraction (XRD) measurements, PbTiO$_3$ undergoes a phase transition from a tetragonal ferroelectric phase to a cubic paraelectric phase at 763 K. However, X-ray absorption fine-structure (XAFS) measurements indicate that PbTiO$_3$ is locally tetragonal even after the phase transition. The difference in these results arises because XAFS measurements can probe local features of a structure, while XRD averages over such local features. For both measurements to be consistent, PbTiO$_3$ must be macroscopically cubic but locally tetragonal after the phase transition. Despite this, most models, such as the Landau-Ginzburg-Devonshire theory and effective Hamiltonians, are still unable to explain this phenomenon. Moreover, these methods involve model parameters fitted to experimental or theoretical data and do not consider other tetragonal configurations, such as domain walls, to predict the phase transition. Our previous study used our novel zentropy approach to predict the phase transition by considering the tetragonal ferroelectric ground state configuration and the tetragonal 90° and 180° domain wall configurations with their total energies at 0 K. In the present work, the Helmholtz energies of the three configurations are obtained from density functional theory calculations through energy-volume curves and phonon calculations. The predicted phase transition temperature using the meta-GGA $r^2\text{SCAN}$ and revised multiplicities of configurations is 716 K, showing good agreement with the experimental value of 763 K.
Submitted 17 July, 2024; v1 submitted 28 February, 2024;
originally announced February 2024.
-
Revisiting thermodynamics in (LiF, NaF, KF, CrF2)-CrF3 by first-principles calculations and CALPHAD modeling
Authors:
Rushi Gong,
Shun-Li Shang,
Yi Wang,
Jorge Paz Soldan Palma,
Hojong Kim,
Zi-Kui Liu
Abstract:
The thermodynamic description of the (LiF, NaF, KF, CrF2)-CrF3 systems has been revisited, aiming for a better understanding of the effects of Cr on the FLiNaK molten salt. First-principles calculations based on density functional theory (DFT) were performed to determine the electronic and structural properties of each compound, including the formation enthalpy, volume, and bulk modulus. DFT-based phonon calculations were carried out to determine the thermodynamic properties of compounds, for example, enthalpy, entropy, and heat capacity as functions of temperature. Phonon-based thermodynamic properties show a good agreement with experimental data of binary compounds LiF, NaF, KF, CrF3, and CrF2, establishing a solid foundation to determine thermodynamic properties of ternary compounds as well as to verify results estimated by the Neumann-Kopp rule. Additionally, DFT-based ab initio molecular dynamics (AIMD) simulations were employed to predict the mixing enthalpies of liquid salts. Using DFT-based results and experimental data in the literature, the (LiF, NaF, KF, CrF2)-CrF3 system has been remodeled in terms of the CALculation of PHAse Diagrams (CALPHAD) approach using the modified quasichemical model with quadruplet approximation (MQMQA) for liquid. Calculated phase stability in the present work shows an excellent agreement with experiments, indicating the effectiveness of combining DFT-based total energy, phonon, and AIMD calculations, and CALPHAD modeling to provide the thermodynamic description in complex molten salt systems.
Submitted 28 February, 2024; v1 submitted 19 February, 2024;
originally announced February 2024.
-
R$\times$R: Rapid eXploration for Reinforcement Learning via Sampling-based Reset Distributions and Imitation Pre-training
Authors:
Gagan Khandate,
Tristan L. Saidi,
Siqi Shang,
Eric T. Chang,
Yang Liu,
Seth Dennis,
Johnson Adams,
Matei Ciocarlie
Abstract:
We present a method for enabling Reinforcement Learning of motor control policies for complex skills such as dexterous manipulation. We posit that a key difficulty for training such policies is the difficulty of exploring the problem state space, as the accessible and useful regions of this space form a complex structure along manifolds of the original high-dimensional state space. This work presents a method to enable and support exploration with Sampling-based Planning. We use a generally applicable non-holonomic Rapidly-exploring Random Trees algorithm and present multiple methods to use the resulting structure to bootstrap model-free Reinforcement Learning. Our method is effective at learning various challenging dexterous motor control skills of higher difficulty than previously shown. In particular, we achieve dexterous in-hand manipulation of complex objects while simultaneously securing the object without the use of passive support surfaces. These policies also transfer effectively to real robots. A number of example videos can also be found on the project website: https://sbrl.cs.columbia.edu
Submitted 27 January, 2024;
originally announced January 2024.
-
Additively manufactured Ni-20Cr to V functionally graded material: computational predictions and experimental verification of phase formations
Authors:
Beril Tonyali,
Hui Sun,
Brandon Bocklund,
John Paul Borgonia,
Richard A. Otis,
Shun-Li Shang,
Zi-Kui Liu,
Allison M. Beese
Abstract:
A database for the Cr-Ni-V system was constructed by modeling the binary Cr-V and ternary Cr-Ni-V systems using the CALPHAD approach aided by density functional theory (DFT)-based first-principles calculations and ab initio molecular dynamics (AIMD) simulations. To validate this new database, a functionally graded material (FGM) using Ni-20Cr and elemental V was fabricated using directed energy deposition additive manufacturing (DED AM) and experimentally characterized. The deposited Ni-20Cr was pure fcc phase, while increasing the amount of V across the gradient resulted in the formation of sigma phase, followed by the bcc phase. The experimentally measured phase data was compared with computational predictions made using a Cr-Ni-V thermodynamic database from the literature as well as the database developed in the present work. The newly developed database was shown to better predict the experimentally observed phases due to its accurate modeling of binary systems within the database and the ternary liquid phase, which is critical for accurate Scheil calculations.
Submitted 25 January, 2024;
originally announced January 2024.
-
Viscosity bounds in liquids with different structure and bonding types
Authors:
M. Withington,
H. L. Devereux,
C. Cockrell,
A. M. Elena,
I. T. Todorov,
Z. K. Liu,
S. L. Shang,
J. S. McCloy,
P. A. Bingham,
K. Trachenko
Abstract:
Recently, it was realised that liquid viscosity has a lower bound which is nearly constant for all liquids and is governed by fundamental physical constants. This was supported by experimental data in noble and molecular liquids. Here, we perform large-scale molecular dynamics simulations to ascertain this bound in two other important liquid types: the ionic molten salt system LiF and metallic Pb. We find that these ionic and metallic systems similarly have lower viscosity bounds corresponding to the minimum of kinematic viscosity of about 10$^{-7}$ $\frac{{\rm m}^2}{\rm s}$. We show that this agrees with experimental data in other systems with different structures and bonding types, including noble, molecular, metallic and covalent liquids. This expands the universality of viscosity bounds into the main system types known.
Submitted 6 January, 2024;
originally announced January 2024.
-
Allison-Benkart-Gao functor and the free non-unital alternative algebras
Authors:
Shikui Shang
Abstract:
Let $k$ be a field of characteristic $0$. We introduce a pair of adjoint functors, Allison-Benkart-Gao functor $\mathcal{ABG}$ and Berman-Moody functor $\mathcal{BM}$, between the category of non-unital alternative algebras over $k$ and the category ${\text{\bf Lie}_{\text{R}}}$ of Lie algebras with appropriate $sl_3(k)$-module structures. Surprisingly, when $A$ is a non-unital alternative algebra, the Allison-Gao Lie algebra $\mathcal{ABG}(A)$ is different from the more well-known Steinberg Lie algebra $st_3(A)$.
Next, let $A(D)$ be the free (non-unit) alternative algebra generated by $D$ elements and $\text{Inner} A(D)$ the inner derivation algebra of $A(D)$. A conjecture on the homology of $H_r(\mathcal{ABG}(A(D)))$ is proposed.
Let $A(D)_n$ (resp. $\text{Inner} A(D)_n$) be the degree $n$ component of $A(D)$ (resp. $\text{Inner} A(D)$). The previous conjecture implies another conjecture on the dimensions of $A(D)_n$ and $\text{Inner} A(D)_n$. We also give some evidence to support these conjectures.
△ Less
Submitted 30 December, 2023; v1 submitted 26 December, 2023;
originally announced December 2023.
-
FeaInfNet: Diagnosis in Medical Image with Feature-Driven Inference and Visual Explanations
Authors:
Yitao Peng,
Lianghua He,
Die Hu,
Yihang Liu,
Longzhen Yang,
Shaohua Shang
Abstract:
Interpretable deep learning models have received widespread attention in the field of image recognition. Due to the unique multi-instance learning of medical images and the difficulty in identifying decision-making regions, many interpretability models that have been proposed still have problems of insufficient accuracy and interpretability in medical image disease diagnosis. To solve these problems, we propose a feature-driven inference network (FeaInfNet). Our first key innovation is a feature-based network reasoning structure, which is applied to FeaInfNet. A network with this structure compares the similarity of each sub-region image patch with the disease templates and normal templates that may appear in the region, and then combines the comparisons across sub-regions to make the final diagnosis. It simulates the diagnosis process of doctors to make the model interpretable in the reasoning process, while avoiding the misleading influence of normal areas participating in the reasoning. Secondly, we propose local feature masks (LFM) to extract feature vectors in order to provide global information for these vectors, thus enhancing the expressive ability of FeaInfNet. Finally, we propose adaptive dynamic masks (Adaptive-DM) to interpret feature vectors and prototypes into human-understandable image patches to provide accurate visual interpretation. We conducted qualitative and quantitative experiments on multiple publicly available medical datasets, including RSNA, iChallenge-PM, Covid-19, ChinaCXRSet, and MontgomerySet. The results of our experiments validate that our method achieves state-of-the-art performance in terms of classification accuracy and interpretability compared to baseline methods in medical image diagnosis. Additional ablation studies verify the effectiveness of each of our proposed components.
Submitted 4 December, 2023;
originally announced December 2023.
-
Value sets of non-permutation polynomials over the residue class rings of integers
Authors:
Shikui Shang
Abstract:
In this paper, we study the value sets of non-permutation polynomial functions over the residue class ring $\mathbb{Z}/m\mathbb{Z}$. When $m=p^r$ is a power of some prime $p$, an upper bound is given for the size of the value set of a polynomial function which is not a permutation. We also show that this upper bound can be achieved by some integral polynomials. Finally, we generalize the results for any positive integer $m$ with known prime decomposition.
Submitted 31 October, 2023;
originally announced October 2023.
-
DetermLR: Augmenting LLM-based Logical Reasoning from Indeterminacy to Determinacy
Authors:
Hongda Sun,
Weikai Xu,
Wei Liu,
Jian Luan,
Bin Wang,
Shuo Shang,
Ji-Rong Wen,
Rui Yan
Abstract:
Recent advances in large language models (LLMs) have revolutionized the landscape of reasoning tasks. To enhance the capabilities of LLMs to emulate human reasoning, prior studies have focused on modeling reasoning steps using various thought structures like chains, trees, or graphs. However, LLM-based reasoning still encounters the following challenges: (1) Limited adaptability of preset structures to diverse tasks; (2) Insufficient precision in exploiting known conditions to derive new ones; and (3) Inadequate consideration of historical reasoning experiences for subsequent reasoning steps. To this end, we propose DetermLR, a novel perspective that rethinks the reasoning process as an evolution from indeterminacy to determinacy. First, we categorize known conditions into two types: determinate and indeterminate premises. This provides an overall direction for the reasoning process and guides LLMs in converting indeterminate data into progressively determinate insights. Subsequently, we leverage quantitative measurements to prioritize more relevant premises to explore new insights. Furthermore, we automate the storage and extraction of available premises and reasoning paths with reasoning memory, preserving historical reasoning details for subsequent reasoning steps. Comprehensive experimental results demonstrate that DetermLR surpasses all baselines on various logical reasoning benchmarks: LogiQA, ProofWriter, FOLIO, PrOntoQA, and LogicalDeduction. Compared to previous multi-step reasoning methods, DetermLR achieves higher accuracy with fewer reasoning steps, highlighting its superior efficiency and effectiveness in solving logical reasoning tasks.
Submitted 26 May, 2024; v1 submitted 28 October, 2023;
originally announced October 2023.
-
Generative and Contrastive Paradigms Are Complementary for Graph Self-Supervised Learning
Authors:
Yuxiang Wang,
Xiao Yan,
Chuang Hu,
Fangcheng Fu,
Wentao Zhang,
Hao Wang,
Shuo Shang,
Jiawei Jiang
Abstract:
For graph self-supervised learning (GSSL), masked autoencoder (MAE) follows the generative paradigm and learns to reconstruct masked graph edges or node features. Contrastive Learning (CL) maximizes the similarity between augmented views of the same graph and is widely used for GSSL. However, MAE and CL are considered separately in existing works for GSSL. We observe that the MAE and CL paradigms are complementary and propose the graph contrastive masked autoencoder (GCMAE) framework to unify them. Specifically, by focusing on local edges or node features, MAE cannot capture global information of the graph and is sensitive to particular edges and features. On the contrary, CL excels in extracting global information because it considers the relation between graphs. As such, we equip GCMAE with an MAE branch and a CL branch, and the two branches share a common encoder, which allows the MAE branch to exploit the global information extracted by the CL branch. To force GCMAE to capture global graph structures, we train it to reconstruct the entire adjacency matrix instead of only the masked edges as in existing works. Moreover, a discrimination loss is proposed for feature reconstruction, which improves the disparity between node embeddings rather than reducing the reconstruction error to tackle the feature smoothing problem of MAE. We evaluate GCMAE on four popular graph tasks (i.e., node classification, node clustering, link prediction, and graph classification) and compare with 14 state-of-the-art baselines. The results show that GCMAE consistently provides good accuracy across these tasks, and the maximum accuracy improvement is up to 3.2% compared with the best-performing baseline.
Submitted 24 October, 2023;
originally announced October 2023.
-
The unimodular equivalence of sublattices in an $n$-dimensional lattice
Authors:
Shikui Shang
Abstract:
In this paper, we study the unimodular equivalence of sublattices in an $n$-dimensional lattice. A recursive procedure is given to compute the cardinalities of the unimodular equivalence classes whose indices are powers of a prime $p$. We also show that these cardinalities are integral polynomials in $p$. When $n=2$, explicit formulae for the cardinalities are presented in terms of the prime decomposition of the index $m$. We also give an explicit formula for the number of co-cyclic sublattices with a fixed index $m$, which constitute a unimodular equivalence class of sublattices.
Submitted 19 October, 2023;
originally announced October 2023.
-
A High Fidelity and Low Complexity Neural Audio Coding
Authors:
Wenzhe Liu,
Wei Xiao,
Meng Wang,
Shan Yang,
Yupeng Shi,
Yuyong Kang,
Dan Su,
Shidong Shang,
Dong Yu
Abstract:
Audio coding is an essential module in real-time communication systems. Neural audio codecs can compress audio samples at a low bitrate thanks to the strong modeling and generative capabilities of deep neural networks. To address poor high-frequency expression and high computational and storage costs, we propose an integrated framework that utilizes a neural network to model wide-band components and adopts traditional signal processing to compress high-band components according to psychological hearing knowledge. Inspired by auditory perception theory, a perception-based loss function is designed to improve harmonic modeling. In addition, generative adversarial network (GAN) compression is proposed for the first time for neural audio codecs. Our method is superior to prior advanced neural codecs across subjective and objective metrics and allows real-time inference on desktop and mobile.
Submitted 17 October, 2023;
originally announced October 2023.
-
Zentropy theory for accurate prediction of free energy, volume, and thermal expansion without fitting parameters
Authors:
Zi-Kui Liu,
Nigel L. E. Hew,
Shun-Li Shang
Abstract:
Based on statistical mechanics, a macroscopically homogeneous system, i.e., a single phase in the present context, is composed of many independent configurations that the system embraces. The macroscopic properties of the system are determined by the properties and statistical probabilities of those configurations with respect to external conditions. The volume of a single phase is thus the weighted sum of the volumes of all configurations. Consequently, the temperature derivative of the volume of a single phase depends on both the temperature derivatives of the volumes of the individual configurations and the temperature derivatives of their statistical probabilities, with the latter introducing non-linear emergent behaviors. It is shown that the temperature derivative of the volume of the single phase can be negative, i.e., negative thermal expansion (NTE), due to symmetry-breaking non-ground-state configurations with smaller volumes than that of the ground-state configuration and the rapid increase of their statistical probabilities. NTE can be predicted without fitting parameters from the zentropy theory, which combines quantum mechanics and statistical mechanics, with the free energy of each configuration predicted from quantum mechanics and the partition function of each configuration calculated from its free energy.
Submitted 7 November, 2023; v1 submitted 10 October, 2023;
originally announced October 2023.
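Worked form of the zentropy abstract above: the weighted-sum statement and its temperature derivative can be sketched as follows. The symbols $p_k$, $V_k$, $F_k$, $Z_k$, and $k_B$ are notation introduced here for illustration only (the abstract gives no symbols), and the Boltzmann-type expression for $p_k$ is an assumption based on the abstract's statement that the partition function of each configuration is calculated from its free energy.

$$V = \sum_k p_k V_k, \qquad \frac{\partial V}{\partial T} = \sum_k p_k \frac{\partial V_k}{\partial T} + \sum_k V_k \frac{\partial p_k}{\partial T}$$

$$Z_k = \exp\!\left(-\frac{F_k}{k_B T}\right), \qquad p_k = \frac{Z_k}{\sum_j Z_j}$$

Under this reading, negative thermal expansion corresponds to the second sum being negative and larger in magnitude than the first, which can happen when the probabilities of smaller-volume non-ground-state configurations rise rapidly with temperature.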
-
BenchTemp: A General Benchmark for Evaluating Temporal Graph Neural Networks
Authors:
Qiang Huang,
Jiawei Jiang,
Xi Susie Rao,
Ce Zhang,
Zhichao Han,
Zitao Zhang,
Xin Wang,
Yongjun He,
Quanqing Xu,
Yang Zhao,
Chuang Hu,
Shuo Shang,
Bo Du
Abstract:
To handle graphs in which features or connectivities evolve over time, a series of temporal graph neural networks (TGNNs) have been proposed. Despite the success of these TGNNs, previous TGNN evaluations reveal limitations regarding four critical issues: 1) inconsistent datasets, 2) inconsistent evaluation pipelines, 3) a lack of workload diversity, and 4) a lack of efficient comparison. Overall, an empirical study that puts TGNN models on the same ground and compares them comprehensively is lacking. To this end, we propose BenchTemp, a general benchmark for evaluating TGNN models on various workloads. BenchTemp provides a set of benchmark datasets so that different TGNN models can be fairly compared. Further, BenchTemp engineers a standard pipeline that unifies the TGNN evaluation. With BenchTemp, we extensively compare the representative TGNN models on different tasks (e.g., link prediction and node classification) and settings (transductive and inductive), w.r.t. both effectiveness and efficiency metrics. We have made BenchTemp publicly available at https://github.com/qianghuangwhu/benchtemp.
Submitted 30 August, 2023;
originally announced August 2023.
-
Electronic-grade epitaxial (111) KTaO3 heterostructures
Authors:
Jieun Kim,
Muqing Yu,
Jung-Woo Lee,
Shun-Li Shang,
Gi-Yeop Kim,
Pratap Pal,
Jinsol Seo,
Neil Campbell,
Kitae Eom,
Ranjani Ramachandran,
Mark S. Rzchowski,
Sang Ho Oh,
Si-Young Choi,
Zi-Kui Liu,
Jeremy Levy,
Chang-Beom Eom
Abstract:
KTaO3 has recently attracted attention as a model system to study the interplay of quantum paraelectricity, spin-orbit coupling, and superconductivity. However, the high and low vapor pressures of potassium and tantalum, respectively, present processing challenges to creating interfaces clean enough to reveal the intrinsic quantum properties. Here, we report superconducting heterostructures based on electronic-grade epitaxial (111) KTaO3 thin films. Electrical and structural characterizations reveal that the two-dimensional electron gas at the heterointerface between amorphous LaAlO3 and the KTaO3 thin film exhibits significantly higher electron mobility, superconducting transition temperature, and critical current density than those in bulk single crystal KTaO3-based heterostructures, owing to the cleaner interface in KTaO3 thin films. Our hybrid approach may enable epitaxial growth of other alkali metal-based oxides that lie beyond the capabilities of conventional methods.
Submitted 25 August, 2023;
originally announced August 2023.
-
CharacterChat: Learning towards Conversational AI with Personalized Social Support
Authors:
Quan Tu,
Chuanqi Chen,
Jinpeng Li,
Yanran Li,
Shuo Shang,
Dongyan Zhao,
Ran Wang,
Rui Yan
Abstract:
In our modern, fast-paced, and interconnected world, the importance of mental well-being has grown into a matter of great urgency. However, traditional methods such as Emotional Support Conversations (ESC) face challenges in effectively addressing a diverse range of individual personalities. In response, we introduce the Social Support Conversation (S2Conv) framework. It comprises a series of support agents and an interpersonal matching mechanism, linking individuals with persona-compatible virtual supporters. Utilizing persona decomposition based on the MBTI (Myers-Briggs Type Indicator), we have created the MBTI-1024 Bank, a group of virtual characters with distinct profiles. Through improved role-playing prompts with behavior preset and dynamic memory, we facilitate the development of the MBTI-S2Conv dataset, which contains conversations between the characters in the MBTI-1024 Bank. Building upon these foundations, we present CharacterChat, a comprehensive S2Conv system, which includes a conversational model driven by personas and memories, along with an interpersonal matching plugin model that dispatches the optimal supporters from the MBTI-1024 Bank for individuals with specific personas. Empirical results indicate the remarkable efficacy of CharacterChat in providing personalized social support and highlight the substantial advantages derived from interpersonal matching. The source code is available at https://github.com/morecry/CharacterChat.
Submitted 20 August, 2023;
originally announced August 2023.
-
Predictions and correlation analyses of Ellingham diagrams in binary oxides
Authors:
Shun-Li Shang,
Shuang Lin,
Michael C. Gao,
Darrell G. Schlom,
Zi-Kui Liu
Abstract:
Knowing oxide-forming ability is vital for obtaining desired oxides or avoiding deleterious oxide formation by tuning the oxidizing environment and materials chemistry. Here, we have conducted a comprehensive thermodynamic analysis of 137 binary oxides using the presently predicted Ellingham diagrams. It is found that the active elements that form oxides easily are the f-block elements (lanthanides and actinides), elements in the groups II, III, and IV (alkaline earth, Sc, Y, Ti, Zr, and Hf), and Al and Li; while the noble elements, whose oxides are unstable and easily reduced, are the coinage metals (Cu, Ag, and especially Au), Pt-group elements, and Hg and Se. Machine learning based sequential feature selection indicates that oxide-forming ability can be represented by electronic structures of pure elements, for example, their d- and s-valence electrons, Mendeleev numbers, and the groups, making the periodic table a useful tool to tailor oxide-forming ability. The other key elemental features that correlate with oxide-forming ability are thermochemical properties such as melting points and standard entropy at 298 K of pure elements. It is further shown that the present Ellingham diagrams enable a qualitative understanding and even prediction of the oxides formed in multicomponent materials, such as the Fe-20Cr-20Ni alloy (in wt.%) and the equimolar high entropy alloy of AlCoCrFeNi, in accordance with thermodynamic calculations using the CALPHAD approach and experimental observations in the literature.
Submitted 10 August, 2023;
originally announced August 2023.
-
Large deviation principle for stochastic reaction-diffusion equations with super-linear drift on $\mathbb{R}$ driven by space-time white noise
Authors:
Yue Li,
Shijie Shang,
Jianliang Zhai
Abstract:
In this paper, we consider stochastic reaction-diffusion equations with super-linear drift on the real line $\mathbb{R}$ driven by space-time white noise. A Freidlin-Wentzell large deviation principle is established by a modified weak convergence method on the space $C([0,T], C_{tem}(\mathbb{R}))$. Obtaining the main result of this paper is challenging due to the unbounded domain, the space-time white noise, and the super-linear drift term without dissipation. To overcome these difficulties, a specially designed norm on $C([0,T], C_{tem}(\mathbb{R}))$, one-order moment estimates of the stochastic convolution, and two nonlinear Gronwall-type inequalities play an important role.
Submitted 26 July, 2023;
originally announced July 2023.
-
Comparing Forward and Inverse Design Paradigms: A Case Study on Refractory High-Entropy Alloys
Authors:
Arindam Debnath,
Lavanya Raman,
Wenjie Li,
Adam M. Krajewski,
Marcia Ahn,
Shuang Lin,
Shunli Shang,
Allison M. Beese,
Zi-Kui Liu,
Wesley F. Reinhart
Abstract:
The rapid design of advanced materials is a topic of great scientific interest. The conventional, "forward" paradigm of materials design involves evaluating multiple candidates to determine the best candidate that matches the target properties. However, recent advances in the field of deep learning have given rise to the possibility of an "inverse" design paradigm for advanced materials, wherein a model provided with the target properties is able to find the best candidate. Being a relatively new concept, there remains a need to systematically evaluate how these two paradigms perform in practical applications. Therefore, the objective of this study is to directly, quantitatively compare the forward and inverse design modeling paradigms. We do so by considering two case studies of refractory high-entropy alloy design with different objectives and constraints and comparing the inverse design method to other forward schemes like localized forward search, high-throughput screening, and multi-objective optimization.
Submitted 25 July, 2023;
originally announced July 2023.
-
Transportation cost inequalities for stochastic reaction diffusion equations on the whole line $\mathbb{R}$
Authors:
Yue Li,
Shijie Shang,
Tusheng Zhang
Abstract:
In this paper, we establish quadratic transportation cost inequalities for solutions of stochastic reaction diffusion equations driven by multiplicative space-time white noise on the whole line $\mathbb{R}$. Since the space variable is defined on the unbounded domain $\mathbb{R}$, the inequalities are proved under a weighted $L^2$-norm and a weighted uniform metric in the so-called $L^2_{tem}$, $C_{tem}$ spaces. New moment estimates of the stochastic convolution with respect to space-time white noise play an important role. In addition, the transportation cost inequalities are also obtained for the stochastic reaction diffusion equations with random initial values.
Submitted 31 May, 2023;
originally announced May 2023.
-
Hard Lefschetz theorems for free line bundles
Authors:
Jiajun Hu,
Shijie Shang,
Jian Xiao
Abstract:
We introduce a partial positivity notion for algebraic maps via the defect of semismallness. This positivity notion is modeled on $m$-positivity in the analytic setting and $m$-ampleness in the geometric setting. Using this positivity condition for algebraic maps, we establish Kähler packages, that is, Hard Lefschetz theorems and Hodge-Riemann bilinear relations, for the complete intersections of Chern classes of free line bundles.
Submitted 30 May, 2023;
originally announced May 2023.
-
Inter-SubNet: Speech Enhancement with Subband Interaction
Authors:
Jun Chen,
Wei Rao,
Zilin Wang,
Jiuxin Lin,
Zhiyong Wu,
Yannan Wang,
Shidong Shang,
Helen Meng
Abstract:
Subband-based approaches process subbands in parallel through a model with shared parameters to learn the commonality of local spectra for noise reduction. In this way, they have achieved remarkable results with fewer parameters. However, in some complex environments, the lack of global spectral information has a negative impact on the performance of these subband-based approaches. To this end, this paper introduces subband interaction as a new way to complement the subband model with global spectral information such as cross-band dependencies and global spectral patterns, and proposes a new lightweight single-channel speech enhancement framework called Interactive Subband Network (Inter-SubNet). Experimental results on the DNS Challenge - Interspeech 2021 dataset show that the proposed Inter-SubNet yields a significant improvement over the subband model and outperforms other state-of-the-art speech enhancement approaches, which demonstrates the effectiveness of subband interaction.
Submitted 9 May, 2023;
originally announced May 2023.
-
nanoLM: an Affordable LLM Pre-training Benchmark via Accurate Loss Prediction across Scales
Authors:
Yiqun Yao,
Siqi fan,
Xiusheng Huang,
Xuezhi Fang,
Xiang Li,
Ziyi Ni,
Xin Jiang,
Xuying Meng,
Peng Han,
Shuo Shang,
Kang Liu,
Aixin Sun,
Yequan Wang
Abstract:
As language models scale up, it becomes increasingly expensive to verify research ideas because conclusions on small models do not trivially transfer to large ones. A possible solution is to establish a generic system that accurately predicts certain metrics for large models without training them. Existing scaling laws require hyperparameter search on the largest models, limiting their predictive capability. In this paper, we present an approach (namely μScaling) to predict the pre-training loss, based on our observations that Maximal Update Parametrization (μP) enables accurate fitting of scaling laws close to common loss basins in hyperparameter space. With μScaling, different model designs can be compared on large scales by training only their smaller counterparts. Further, we introduce nanoLM: an affordable LLM pre-training benchmark that facilitates this new research paradigm. With around 14% of the one-time pre-training cost, we can accurately forecast the loss for models up to 52B. Our goal with nanoLM is to empower researchers with limited resources to reach meaningful conclusions on large models. We also aspire for our benchmark to serve as a bridge between the academic community and the industry. Code for μScaling is available at https://github.com/cofe-ai/Mu-scaling. Code for nanoLM will be available later.
Submitted 6 April, 2024; v1 submitted 13 April, 2023;
originally announced April 2023.
-
ResDiff: Combining CNN and Diffusion Model for Image Super-Resolution
Authors:
Shuyao Shang,
Zhengyang Shan,
Guangxing Liu,
LunQian Wang,
XingHua Wang,
Zekai Zhang,
Jinglin Zhang
Abstract:
Adapting the Diffusion Probabilistic Model (DPM) for direct image super-resolution is wasteful, given that a simple Convolutional Neural Network (CNN) can recover the main low-frequency content. Therefore, we present ResDiff, a novel Diffusion Probabilistic Model based on Residual structure for Single Image Super-Resolution (SISR). ResDiff utilizes a combination of a CNN, which restores primary low-frequency components, and a DPM, which predicts the residual between the ground-truth image and the CNN-predicted image. In contrast to the common diffusion-based methods that directly use LR images to guide the noise towards HR space, ResDiff utilizes the CNN's initial prediction to direct the noise towards the residual space between HR space and CNN-predicted space, which not only accelerates the generation process but also yields superior sample quality. Additionally, a frequency-domain-based loss function for the CNN is introduced to facilitate its restoration, and a frequency-domain guided diffusion is designed for the DPM to predict high-frequency details. Extensive experiments on multiple benchmark datasets demonstrate that ResDiff outperforms previous diffusion-based methods in terms of shorter model convergence time, superior generation quality, and more diverse samples.
Submitted 2 February, 2024; v1 submitted 15 March, 2023;
originally announced March 2023.
-
TEA-PSE 3.0: Tencent-Ethereal-Audio-Lab Personalized Speech Enhancement System For ICASSP 2023 DNS Challenge
Authors:
Yukai Ju,
Jun Chen,
Shimin Zhang,
Shulin He,
Wei Rao,
Weixin Zhu,
Yannan Wang,
Tao Yu,
Shidong Shang
Abstract:
This paper introduces the Unbeatable Team's submission to the ICASSP 2023 Deep Noise Suppression (DNS) Challenge. We expand our previous work, TEA-PSE, to its upgraded version, TEA-PSE 3.0. Specifically, TEA-PSE 3.0 incorporates a residual LSTM after the squeezed temporal convolution network (S-TCN) to enhance sequence modeling capabilities. Additionally, the local-global representation (LGR) structure is introduced to boost speaker information extraction, and a multi-STFT resolution loss is used to effectively capture the time-frequency characteristics of the speech signals. Moreover, retraining methods are employed based on the freeze training strategy to fine-tune the system. According to the official results, TEA-PSE 3.0 ranks 1st in both ICASSP 2023 DNS-Challenge track 1 and track 2.
Submitted 14 March, 2023;
originally announced March 2023.