-
ViMo: Generating Motions from Casual Videos
Authors:
Liangdong Qiu,
Chengxing Yu,
Yanran Li,
Zhao Wang,
Haibin Huang,
Chongyang Ma,
Di Zhang,
Pengfei Wan,
Xiaoguang Han
Abstract:
Although humans have the innate ability to imagine multiple possible actions from videos, it remains an extraordinary challenge for computers due to the intricate camera movements and montages. Most existing motion generation methods predominantly rely on manually collected motion datasets, usually tediously sourced from motion capture (Mocap) systems or multi-view cameras, unavoidably resulting in a limited size that severely undermines their generalizability. Inspired by recent advances in diffusion models, we probe a simple and effective way to capture motions from videos and propose a novel Video-to-Motion-Generation framework (ViMo) that can leverage the immense trove of untapped video content to produce abundant and diverse 3D human motions. Distinct from prior work, our videos can be more casual, including complicated camera movements and occlusions. Striking experimental results demonstrate that the proposed model can generate natural motions even for videos where rapid movements, varying perspectives, or frequent occlusions might exist. We also show that this work can enable three important downstream applications, such as generating dancing motions according to arbitrary music and source video style. Extensive experimental results show that our model offers an effective and scalable way to generate diverse and realistic motions. Code and demos will be public soon.
Submitted 12 August, 2024;
originally announced August 2024.
-
Deep Inertia $L_p$ Half-Quadratic Splitting Unrolling Network for Sparse View CT Reconstruction
Authors:
Yu Guo,
Caiying Wu,
Yaxin Li,
Qiyu Jin,
Tieyong Zeng
Abstract:
Sparse view computed tomography (CT) reconstruction poses a challenging ill-posed inverse problem, necessitating effective regularization techniques. In this letter, we employ $L_p$-norm ($0<p<1$) regularization to induce sparsity and introduce inertial steps, leading to the development of the inertial $L_p$-norm half-quadratic splitting algorithm. We rigorously prove the convergence of this algorithm. Furthermore, we leverage deep learning to initialize the conjugate gradient method, resulting in a deep unrolling network with theoretical guarantees. Our extensive numerical experiments demonstrate that our proposed algorithm surpasses existing methods, particularly excelling in fewer scanned views and complex noise conditions.
Submitted 12 August, 2024;
originally announced August 2024.
-
Gravity and a universal cutoff for field theory
Authors:
Simon Caron-Huot,
Yue-Zhou Li
Abstract:
We analyze the one-loop effects of massive fields on 2-to-2 scattering processes involving gravitons. It has been suggested that in the presence of gravity, any local effective field theory description must break down at the "species scale". We first observe that unitarity and analyticity of the amplitude indeed imply a species-type bound $GΛ^{d-2}N\leq O(1)$, where $N$ counts parametrically light species and $Λ$ is an energy scale above which new unknown ingredients must modify the graviton amplitude. To clarify what happens at this scale, we contrast the partial wave decomposition of calculated amplitudes with that of some ultraviolet scenarios: string theory and strongly interacting Planck-scale physics. Observing that the latter exhibit a markedly stronger high-spin content, we define nonperturbatively the high-spin onset scale $Λ_{\rm o}$, which coincides with the string scale and higher-dimensional Planck scale in respective examples. We argue that, generally, no local field description can exist at distances shorter than $1/Λ_{\rm o}$.
Submitted 23 August, 2024; v1 submitted 12 August, 2024;
originally announced August 2024.
-
Production characteristics of light nuclei, hypertritons and $Ω$-hypernuclei in Pb+Pb collisions at $\sqrt{s_{NN}}=5.02$ TeV
Authors:
Rui-Qin Wang,
Xin-Lei Hou,
Yan-Hao Li,
Jun Song,
Feng-Lan Shao
Abstract:
We extend an analytical nucleon coalescence model with hyperons to study the production of light nuclei, hypertritons and $Ω$-hypernuclei in Pb+Pb collisions at $\sqrt{s_{NN}}=5.02$ TeV. We derive the formula of the momentum distribution of two bodies coalescing into dibaryon states and that of three bodies coalescing into tribaryon states. We explain the available data on the coalescence factors $B_2$ and $B_3$, the transverse momentum spectra, the averaged transverse momenta, the yield rapidity densities, and the yield ratios of the deuteron, antihelium-3, antitriton, and hypertriton measured by the ALICE collaboration, and give predictions for different $Ω$-hypernuclei, e.g., $H(pΩ^-)$, $H(nΩ^-)$ and $H(pnΩ^-)$. We find two groups of interesting observables: the averaged transverse momentum ratios of light (hyper-)nuclei to protons (hyperons) and their centrality-dependent yield ratios. The former group exhibits a reverse hierarchy of the nucleus size, and the latter helps in judging the nucleus production mechanism as well as the nucleus's own size.
Submitted 10 August, 2024;
originally announced August 2024.
-
Large Language Model Agent in Financial Trading: A Survey
Authors:
Han Ding,
Yinheng Li,
Junhao Wang,
Hang Chen
Abstract:
Trading is a highly competitive task that requires a combination of strategy, knowledge, and psychological fortitude. With the recent success of large language models (LLMs), it is appealing to apply the emerging intelligence of LLM agents in this competitive arena and to understand whether they can outperform professional traders. In this survey, we provide a comprehensive review of the current research on using LLMs as agents in financial trading. We summarize the common architecture used in the agent, the data inputs, and the performance of LLM trading agents in backtesting, as well as the challenges presented in this research. This survey aims to provide insights into the current state of LLM-based financial trading agents and outline future research directions in this field.
Submitted 26 July, 2024;
originally announced August 2024.
-
ACCELERATION: Sequentially-scanning DECT Imaging Using High Temporal Resolution Image Reconstruction And Temporal Extrapolation
Authors:
Qiaoxin Li,
Dong Liang,
Yinsheng Li
Abstract:
Dual-energy computed tomography (DECT) has been widely used to obtain the quantitative elemental composition of imaged subjects for personalized and precise medical diagnosis. Compared with existing high-end DECT leveraging advanced X-ray source and/or detector technologies, the use of the sequentially-scanning data acquisition scheme to implement DECT may make a broader impact on clinical practice because this scheme requires no specialized hardware designs. However, since the concentration of iodinated contrast agent in the imaged subject varies over time, sequentially-scanned data sets acquired at two tube potentials are temporally inconsistent. As existing material decomposition approaches for DECT assume that the data sets acquired at two tube potentials are temporally consistent, the violation of this assumption results in inaccurate quantification of iodine concentration. In this work, we developed a technique to achieve sequentially-scanning DECT imaging using high temporal resolution image reconstruction and temporal extrapolation, ACCELERATION for short, to address the technical challenge induced by temporal inconsistency of sequentially-scanned data sets and improve iodine quantification accuracy in sequentially-scanning DECT. ACCELERATION has been validated and evaluated using numerical simulation data sets generated from clinical human subject exams. Results demonstrated the improvement of iodine quantification accuracy using ACCELERATION.
Submitted 12 August, 2024;
originally announced August 2024.
-
Understanding Byzantine Robustness in Federated Learning with A Black-box Server
Authors:
Fangyuan Zhao,
Yuexiang Xie,
Xuebin Ren,
Bolin Ding,
Shusen Yang,
Yaliang Li
Abstract:
Federated learning (FL) is vulnerable to Byzantine attacks, in which some participants try to damage the utility or hinder the convergence of the learned model by sending malicious model updates. Previous works propose applying robust rules to aggregate updates from participants against different types of Byzantine attacks, while attackers can in turn design advanced Byzantine attack algorithms targeting a specific aggregation rule when it is known. In practice, FL systems can involve a black-box server that makes the adopted aggregation rule inaccessible to participants, which can naturally defend against or weaken some Byzantine attacks. In this paper, we provide an in-depth understanding of the Byzantine robustness of the FL system with a black-box server. Our investigation demonstrates the improved Byzantine robustness of a black-box server employing a dynamic defense strategy. We provide both empirical evidence and theoretical analysis to reveal that the black-box server can mitigate the worst-case attack impact from a maximum level to an expectation level, which is attributed to the inherent inaccessibility and randomness offered by a black-box server. The source code is available at https://github.com/alibaba/FederatedScope/tree/Byzantine_attack_defense to promote further research in the community.
Submitted 12 August, 2024;
originally announced August 2024.
-
Global weak solutions to a fractional Cahn-Hilliard cross-diffusion system in lymphangiogenesis
Authors:
Ansgar Jüngel,
Yue Li
Abstract:
A spectral-fractional Cahn-Hilliard cross-diffusion system, which describes the pre-patterning of lymphatic vessel morphology in collagen gels, is studied. The model consists of two higher-order quasilinear parabolic equations and describes the evolution of the fiber phase volume fraction and the solute concentration. The free energy consists of the nonconvex Flory-Huggins energy and a fractional gradient energy, modeling nonlocal long-range correlations. The existence of global weak solutions to this system in a bounded domain with no-flux boundary conditions is shown. The proof is based on a three-level approximation scheme, spectral-fractional calculus, and a priori estimates coming from the energy inequality.
Submitted 12 August, 2024;
originally announced August 2024.
-
ConvKGYarn: Spinning Configurable and Scalable Conversational Knowledge Graph QA datasets with Large Language Models
Authors:
Ronak Pradeep,
Daniel Lee,
Ali Mousavi,
Jeff Pound,
Yisi Sang,
Jimmy Lin,
Ihab Ilyas,
Saloni Potdar,
Mostafa Arefiyan,
Yunyao Li
Abstract:
The rapid advancement of Large Language Models (LLMs) and conversational assistants necessitates dynamic, scalable, and configurable conversational datasets for training and evaluation. These datasets must accommodate diverse user interaction modes, including text and voice, each presenting unique modeling challenges. Knowledge Graphs (KGs), with their structured and evolving nature, offer an ideal foundation for current and precise knowledge. Although human-curated KG-based conversational datasets exist, they struggle to keep pace with the rapidly changing user information needs. We present ConvKGYarn, a scalable method for generating up-to-date and configurable conversational KGQA datasets. Qualitative psychometric analyses confirm our method can generate high-quality datasets rivaling a popular conversational KGQA dataset while offering it at scale and covering a wide range of human-interaction configurations. We showcase its utility by testing LLMs on diverse conversations - exploring model behavior on conversational KGQA sets with different configurations grounded in the same KG fact set. Our results highlight the ability of ConvKGYarn to improve KGQA foundations and evaluate parametric knowledge of LLMs, thus offering a robust solution to the constantly evolving landscape of conversational assistants.
Submitted 12 August, 2024;
originally announced August 2024.
-
Observation of single-quantum vortex splitting in the Ba$_{1-x}$K$_x$Fe$_2$As$_2$ superconductor
Authors:
Q. Z. Zhou,
B. R. Chen,
B. K. Xiang,
I. Timoshuk,
J. Garaud,
Y. Li,
K. Y. Liang,
Q. S. He,
Z. J. Li,
P. H. Zhang,
K. Z. Yao,
H. X. Yao,
E. Babaev,
V. Grinenko,
Y. H. Wang
Abstract:
Since their theoretical discovery more than a half-century ago, vortices observed in bulk superconductors have carried a quantized value of magnetic flux determined only by fundamental constants. A recent experiment reported 'unquantized' quantum vortices carrying the same fraction of flux quantum in Ba$_{0.23}$K$_{0.77}$Fe$_2$As$_2$ in a small temperature range below its superconducting critical temperature ($T_C$). Here, we use scanning superconducting quantum interference device (sSQUID) microscopy with improved sensitivity to investigate the genesis of fractional vortices in Ba$_{0.23}$K$_{0.77}$Fe$_2$As$_2$. We report the direct observation of a single-flux quantum vortex splitting into two different fractions with increasing temperature. The flux of the two fractions has opposite dependence on temperature, while the total flux sums up to one flux quantum despite their spatial separation. Overall, our study shows the existence of different fractional vortices and their stability in temperature ranging from 0.1 to 0.99 $T_C$. Besides the implications of this observation for the fundamental question of quantum vorticity, the discovery of these objects paves the way for the new platform for anyon quasiparticles and applications for fractional fluxonics.
Submitted 27 August, 2024; v1 submitted 11 August, 2024;
originally announced August 2024.
-
Dynamic hysteresis of an oscillatory contact line
Authors:
Jiaxing Shen,
Yaerim Lee,
Yuanzhe Li,
Stéphane Zaleski,
Gustav Amberg,
Junichiro Shiomi
Abstract:
During oscillatory wetting, a phase retardation emerges between contact angle variation and contact line velocity, presenting as a hysteresis loop in their correlation -- an effect we term dynamic hysteresis. This phenomenon is found to be tunable by modifying the surface with different molecular layers. A comparative analysis of dynamic hysteresis, static hysteresis, and contact line friction coefficients across diverse substrates reveals that dynamic hysteresis is not a result of dissipative effects but is instead proportionally linked to the flexibility of the grafted layer on the surface. In the quest for appropriate conditions to model oscillatory contact line motion, we identify the generalized Hocking's linear law and modified Generalized Navier Boundary Condition (GNBC) as alternative options for predicting realistic dynamic hysteresis.
Submitted 11 August, 2024;
originally announced August 2024.
-
HateSieve: A Contrastive Learning Framework for Detecting and Segmenting Hateful Content in Multimodal Memes
Authors:
Xuanyu Su,
Yansong Li,
Diana Inkpen,
Nathalie Japkowicz
Abstract:
Amidst the rise of Large Multimodal Models (LMMs) and their widespread application in generating and interpreting complex content, the risk of propagating biased and harmful memes remains significant. Current safety measures often fail to detect subtly integrated hateful content within "Confounder Memes". To address this, we introduce HateSieve, a new framework designed to enhance the detection and segmentation of hateful elements in memes. HateSieve features a novel Contrastive Meme Generator that creates semantically paired memes, a customized triplet dataset for contrastive learning, and an Image-Text Alignment module that produces context-aware embeddings for accurate meme segmentation. Empirical experiments on the Hateful Meme Dataset show that HateSieve not only surpasses existing LMMs in performance with fewer trainable parameters but also offers a robust mechanism for precisely identifying and isolating hateful content. Caution: Contains academic discussions of hate speech; viewer discretion advised.
Submitted 11 August, 2024;
originally announced August 2024.
-
U-DECN: End-to-End Underwater Object Detection ConvNet with Improved DeNoising Training
Authors:
Zhuoyan Liu,
Bo Wang,
Ye Li
Abstract:
Underwater object detection has higher requirements of running speed and deployment efficiency for the detector due to its specific environmental challenges. NMS of two- or one-stage object detectors and transformer architecture of query-based end-to-end object detectors are not conducive to deployment on underwater embedded devices with limited processing power. As for the detrimental effect of underwater color cast noise, recent underwater object detectors make network architecture or training complex, which also hinders their application and deployment on underwater vehicle platforms. In this paper, we propose the Underwater DECO with improved deNoising training (U-DECN), the query-based end-to-end object detector (with ConvNet encoder-decoder architecture) for underwater color cast noise that addresses the above problems. We integrate advanced technologies from DETR variants into DECO and design optimization methods specifically for the ConvNet architecture, including Separate Contrastive DeNoising Forward and Deformable Convolution in SIM. To address the underwater color cast noise issue, we propose an underwater color denoising query to improve the generalization of the model for the biased object feature information by different color cast noise. Our U-DECN, with ResNet-50 backbone, achieves 61.4 AP (50 epochs), 63.3 AP (72 epochs), 64.0 AP (100 epochs) on DUO, and 21 FPS (5 times faster than Deformable DETR and DINO 4 FPS) on NVIDIA AGX Orin by TensorRT FP16, outperforming the other state-of-the-art query-based end-to-end object detectors. The code is available at https://github.com/LEFTeyex/U-DECN.
Submitted 11 August, 2024;
originally announced August 2024.
-
Unlocking the Power of Numbers: Log Compression via Numeric Token Parsing
Authors:
Siyu Yu,
Yifan Wu,
Ying Li,
Pinjia He
Abstract:
Parser-based log compressors have been widely explored in recent years because the explosive growth of log volumes makes the compression performance of general-purpose compressors unsatisfactory. These parser-based compressors preprocess logs by grouping the logs based on the parsing result and then feed the preprocessed files into a general-purpose compressor. However, parser-based compressors have their limitations. First, the goals of parsing and compression are misaligned, so the inherent characteristics of logs were not fully utilized. In addition, the performance of parser-based compressors depends on the sample logs and thus it is very unstable. Moreover, parser-based compressors often incur a long processing time. To address these limitations, we propose Denum, a simple, general log compressor with high compression ratio and speed. The core insight is that a majority of the tokens in logs are numeric tokens (i.e. pure numbers, tokens with only numbers and special characters, and numeric variables) and effective compression of them is critical for log compression. Specifically, Denum contains a Numeric Token Parsing module, which extracts all numeric tokens and applies tailored processing methods (e.g. store the differences of incremental numbers like timestamps), and a String Processing module, which processes the remaining log content without numbers. The processed files of the two modules are then fed as input to a general-purpose compressor and it outputs the final compression results. Denum has been evaluated on 16 log datasets and it achieves an 8.7%-434.7% higher average compression ratio and 2.6x-37.7x faster average compression speed (i.e. 26.2MB/S) compared to the baselines. Moreover, integrating Denum's Numeric Token Parsing into existing log compressors can provide an 11.8% improvement in their average compression ratio and achieve 37% faster average compression speed.
Submitted 11 August, 2024;
originally announced August 2024.
-
Various Features of the X-class White-light Flares in Super Active Region NOAA 13664
Authors:
Ying Li,
Xiaofeng Liu,
Zhichen Jing,
Wei Chen,
Qiao Li,
Yang Su,
De-Chao Song,
M. D. Ding,
Li Feng,
Hui Li,
Weiqun Gan
Abstract:
Super active region NOAA 13664 produced 12 X-class flares (including the largest one, an occulted X8.7 flare, in solar cycle 25 so far) during 2024 May 8-15 and 11 of them are identified as white-light flares. Here we present various features of these X-class white-light flares observed by the White-light Solar Telescope (WST) on board the Advanced Space-based Solar Observatory and the Helioseismic and Magnetic Imager (HMI) on board the Solar Dynamics Observatory. It is found that both the white-light emissions at WST 3600 Å (Balmer continuum) and HMI 6173 Å (Paschen continuum) show up in different regions of the sunspot group in these flares, including outside the sunspots and within the penumbra and umbra of the sunspots. They exhibit a point-, ribbon-, loop-, or ejecta-like shape, which can come from flare ribbons (or footpoints), flare loops, and plasma ejecta depending on the perspective view. The white-light duration and relative enhancement are measured and both parameters for 3600 Å emission have greater values than those for 6173 Å emission. It is also found that these white-light emissions are cospatial well with the hard X-ray (HXR) sources in the on-disk flares but have some offsets with the HXR emissions in the off-limb flares. In addition, it is interesting that the 3600 and 6173 Å emissions show different correlations with the peak HXR fluxes, with the former one more sensitive to the HXR emission. All these greatly help us understand the white-light flares of a large magnitude from a super active region on the Sun and also provide important insights into superflares on Sun-like stars.
Submitted 11 August, 2024;
originally announced August 2024.
-
Moment&Cross: Next-Generation Real-Time Cross-Domain CTR Prediction for Live-Streaming Recommendation at Kuaishou
Authors:
Jiangxia Cao,
Shen Wang,
Yue Li,
Shenghui Wang,
Jian Tang,
Shiyao Wang,
Shuang Yang,
Zhaojie Liu,
Guorui Zhou
Abstract:
Kuaishou is one of the largest short-video and live-streaming platforms. Compared with short-video recommendation, live-streaming recommendation is more complex because: (1) live streams are only temporarily alive for distribution, (2) users may watch for a long time, which delays feedback, and (3) content is unpredictable and changes over time. Indeed, even if a user is interested in the live-streaming author, the result may still be a negative watch (e.g., a short view < 3s) when the real-time content is not attractive enough. Therefore, live-streaming recommendation poses a challenging task: how do we recommend a live stream at the right moment for users? Additionally, our platform's major exposure content is short video, and the amount of exposed short video is 9x more than exposed live streaming. Thus users leave more behaviors on short videos, which leads to a serious data imbalance problem in which the live-streaming data cannot fully reflect user interests. This raises another challenging task: how do we utilize users' short-video behaviors to make live-streaming recommendation better?
Submitted 11 August, 2024;
originally announced August 2024.
-
Tunable atomically enhanced moiré Berry curvatures in twisted triple bilayer graphene
Authors:
Konstantin Davydov,
Ziyan Zhu,
Noah Friedman,
Ethan Gramowski,
Yaotian Li,
Jack Tavakley,
Kenji Watanabe,
Takashi Taniguchi,
Mitchell Luskin,
Efthimios Kaxiras,
Ke Wang
Abstract:
We report a twisted triple bilayer graphene platform consisting of three units of Bernal bilayer graphene (BLG) consecutively twisted at 1.49° and 1.68°. We observe inter-moiré Hofstadter butterflies from two co-existing moiré superlattices and a Hofstadter butterfly from reconstructed moiré-of-moiré lattice, and show that their Brown-Zak (BZ) oscillations quantitatively agree with each other, both evidencing strong atomic reconstruction with a lattice constant of 18.1 nm. We further demonstrate such atomic reconstruction strongly enhances the Berry curvature of each moiré and moiré-of-moiré band-insulator state, characterized by measured strong non-local valley Hall effect (VHE) that sensitively depends on the inter-moiré competition strength, tunable by manipulating the out-of-the-plane carrier distribution which controls the magnitude of the valley currents. Our study sheds new light on the microscopic mechanism of atomic and electronic reconstruction in twisted-multilayer systems, by investigating novel emergent quantum phenomena of reconstructed quasi-crystalline moiré-of-moiré superlattice, including a new type of moiré-of-moiré band-insulator states and atomically enhanced moiré Berry curvature. We show that the reconstructed electronic band can be versatilely tuned by electrostatics, providing an approach towards engineering the band structure and its topology for a novel quantum material platform with designer electrical and optical properties.
Submitted 11 August, 2024;
originally announced August 2024.
-
PointNCBW: Towards Dataset Ownership Verification for Point Clouds via Negative Clean-label Backdoor Watermark
Authors:
Cheng Wei,
Yang Wang,
Kuofeng Gao,
Shuo Shao,
Yiming Li,
Zhibo Wang,
Zhan Qin
Abstract:
Recently, point clouds have been widely used in computer vision, whereas their collection is time-consuming and expensive. As such, point cloud datasets are the valuable intellectual property of their owners and deserve protection. To detect and prevent unauthorized use of these datasets, especially for commercial or open-sourced ones that cannot be sold again or used commercially without permission, we intend to identify whether a suspicious third-party model is trained on our protected dataset under the black-box setting. We achieve this goal by designing a scalable clean-label backdoor-based dataset watermark for point clouds that ensures both effectiveness and stealthiness. Unlike existing clean-label watermark schemes, which are susceptible to the number of categories, our method could watermark samples from all classes instead of only from the target one. Accordingly, it can still preserve high effectiveness even on large-scale datasets with many classes. Specifically, we perturb selected point clouds with non-target categories in both shape-wise and point-wise manners before inserting trigger patterns without changing their labels. The features of perturbed samples are similar to those of benign samples from the target class. As such, models trained on the watermarked dataset will have a distinctive yet stealthy backdoor behavior, i.e., misclassifying samples from the target class whenever triggers appear, since the trained DNNs will treat the inserted trigger pattern as a signal to deny predicting the target label. We also design a hypothesis-test-guided dataset ownership verification based on the proposed watermark. Extensive experiments on benchmark datasets are conducted, verifying the effectiveness of our method and its resistance to potential removal methods.
Submitted 10 August, 2024;
originally announced August 2024.
-
Accelerating In-transit Isosurface Generation With Topology Preserving Compression
Authors:
Yanliang Li,
Jieyang Chen
Abstract:
Data visualization through isosurface generation is critical in various scientific fields, including computational fluid dynamics, medical imaging, and geophysics. However, the high cost of data sharing between simulation sources and visualization resources poses a significant challenge. This paper introduces a novel framework that leverages lossy compression to accelerate in-transit isosurface generation. Our approach involves a Compressed Hierarchical Representation (CHR) and topology-preserving compression to ensure the fidelity of the isosurface generation. Experimental evaluations demonstrate that our framework can achieve up to 4x speedup in visualization workflows, making it a promising solution for real-time scientific data analysis.
Submitted 10 August, 2024;
originally announced August 2024.
-
A Versatile Framework for Attributed Network Clustering via K-Nearest Neighbor Augmentation
Authors:
Yiran Li,
Gongyao Guo,
Jieming Shi,
Renchi Yang,
Shiqi Shen,
Qing Li,
Jun Luo
Abstract:
Attributed networks containing entity-specific information in node attributes are ubiquitous in modeling social networks, e-commerce, bioinformatics, etc. Their inherent network topology ranges from simple graphs to hypergraphs with high-order interactions and multiplex graphs with separate layers. An important graph mining task is node clustering, aiming to partition the nodes of an attributed network into k disjoint clusters such that intra-cluster nodes are closely connected and share similar attributes, while inter-cluster nodes are far apart and dissimilar. It is highly challenging to capture multi-hop connections via nodes or attributes for effective clustering on multiple types of attributed networks. In this paper, we first present AHCKA as an efficient approach to attributed hypergraph clustering (AHC). AHCKA includes a carefully-crafted K-nearest neighbor augmentation strategy for the optimized exploitation of attribute information on hypergraphs, a joint hypergraph random walk model to devise an effective AHC objective, and an efficient solver with speedup techniques for the objective optimization. The proposed techniques are extensible to various types of attributed networks, and thus, we develop ANCKA as a versatile attributed network clustering framework, capable of attributed graph clustering (AGC), attributed multiplex graph clustering (AMGC), and AHC. Moreover, we devise ANCKA with algorithmic designs tailored for GPU acceleration to boost efficiency. We have conducted extensive experiments to compare our methods with 19 competitors on 8 attributed hypergraphs, 16 competitors on 6 attributed graphs, and 16 competitors on 3 attributed multiplex graphs, all demonstrating the superb clustering quality and efficiency of our methods.
Submitted 10 August, 2024;
originally announced August 2024.
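The K-nearest neighbor augmentation mentioned in this abstract can be illustrated on a simple graph. The following is a minimal sketch, not the authors' AHCKA/ANCKA implementation: the function names, the use of cosine similarity, and the rule of adding KNN edges symmetrically to the existing adjacency are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two attribute vectors (0.0 if either is all zeros)."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def knn_augment(adjacency, attributes, k):
    """Return a copy of `adjacency` with each node linked to its k most
    attribute-similar nodes (hypothetical helper; edges are added symmetrically)."""
    n = len(attributes)
    augmented = [set(neighbors) for neighbors in adjacency]
    for i in range(n):
        # Rank all other nodes by similarity, breaking ties by node index.
        ranked = sorted(
            ((cosine(attributes[i], attributes[j]), j) for j in range(n) if j != i),
            key=lambda pair: (-pair[0], pair[1]),
        )
        for _, j in ranked[:k]:
            augmented[i].add(j)
            augmented[j].add(i)
    return augmented


# Toy example: two disconnected edges whose endpoints share similar attributes.
adjacency = [{1}, {0}, {3}, {2}]
attributes = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1], [0.1, 1.0]]
augmented = knn_augment(adjacency, attributes, k=1)
```

In the toy example the augmentation connects nodes 0 and 2 and nodes 1 and 3, so attribute similarity bridges components that the original topology keeps apart.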
-
Existence and non-uniqueness of probabilistically strong solutions to 3D stochastic magnetohydrodynamic equations
Authors:
Wenping Cao,
Yachun Li,
Deng Zhang
Abstract:
We are concerned with the 3D stochastic magnetohydrodynamic (MHD) equations driven by additive noise on the torus. For arbitrarily prescribed divergence-free initial data in $L^{2}_x$, we construct infinitely many probabilistically strong and analytically weak solutions in the class $L^{r}_ΩL_{t}^γW_{x}^{s,p}$, where $r>1$ and $(s, γ, p)$ lie in a supercritical regime with respect to the Ladyžhenskaya-Prodi-Serrin (LPS) criteria. In particular, we get the non-uniqueness of probabilistically strong solutions, which is sharp at one LPS endpoint space. Our proof utilizes intermittent flows which are different from those of Navier-Stokes equations and derives the non-uniqueness even in the high viscous and resistive regime beyond the Lions exponent 5/4. Furthermore, we prove that as the noise intensity tends to zero, the accumulation points of stochastic MHD solutions contain all deterministic solutions to the MHD equations, which include the recently constructed solutions in [28, 29] to deterministic MHD systems.
Submitted 10 August, 2024;
originally announced August 2024.
-
Unidirectional imaging with partially coherent light
Authors:
Guangdong Ma,
Che-Yung Shen,
Jingxi Li,
Luzhe Huang,
Cagatay Isil,
Fazil Onuralp Ardic,
Xilin Yang,
Yuhang Li,
Yuntian Wang,
Md Sadman Sakib Rahman,
Aydogan Ozcan
Abstract:
Unidirectional imagers form images of input objects only in one direction, e.g., from field-of-view (FOV) A to FOV B, while blocking the image formation in the reverse direction, from FOV B to FOV A. Here, we report unidirectional imaging under spatially partially coherent light and demonstrate high-quality imaging only in the forward direction (A->B) with high power efficiency while distorting the image formation in the backward direction (B->A) along with low power efficiency. Our reciprocal design features a set of spatially engineered linear diffractive layers that are statistically optimized for partially coherent illumination with a given phase correlation length. Our analyses reveal that when illuminated by a partially coherent beam with a correlation length of ~1.5 w or larger, where w is the wavelength of light, diffractive unidirectional imagers achieve robust performance, exhibiting asymmetric imaging performance between the forward and backward directions - as desired. A partially coherent unidirectional imager designed with a smaller correlation length of less than 1.5 w still supports unidirectional image transmission, but with a reduced figure of merit. These partially coherent diffractive unidirectional imagers are compact (axially spanning less than 75 w), polarization-independent, and compatible with various types of illumination sources, making them well-suited for applications in asymmetric visual information processing and communication.
Submitted 10 August, 2024;
originally announced August 2024.
-
SAM-FNet: SAM-Guided Fusion Network for Laryngo-Pharyngeal Tumor Detection
Authors:
Jia Wei,
Yun Li,
Meiyu Qiu,
Hongyu Chen,
Xiaomao Fan,
Wenbin Lei
Abstract:
Laryngo-pharyngeal cancer (LPC) is a highly fatal malignant disease affecting the head and neck region. Previous studies on endoscopic tumor detection, particularly those leveraging dual-branch network architectures, have shown significant advancements in tumor detection. These studies highlight the potential of dual-branch networks in improving diagnostic accuracy by effectively integrating global and local (lesion) feature extraction. However, they are still limited in their capabilities to accurately locate the lesion region and capture the discriminative feature information between the global and local branches. To address these issues, we propose a novel SAM-guided fusion network (SAM-FNet), a dual-branch network for laryngo-pharyngeal tumor detection. By leveraging the powerful object segmentation capabilities of the Segment Anything Model (SAM), we introduce the SAM into the SAM-FNet to accurately segment the lesion region. Furthermore, we propose a GAN-like feature optimization (GFO) module to capture the discriminative features between the global and local branches, enhancing the fusion feature complementarity. Additionally, we collect two LPC datasets from the First Affiliated Hospital (FAHSYSU) and the Sixth Affiliated Hospital (SAHSYSU) of Sun Yat-sen University. The FAHSYSU dataset is used as the internal dataset for training the model, while the SAHSYSU dataset is used as the external dataset for evaluating the model's performance. Extensive experiments on both datasets of FAHSYSU and SAHSYSU demonstrate that the SAM-FNet can achieve competitive results, outperforming the state-of-the-art counterparts. The source code of SAM-FNet is available at the URL of https://github.com/VVJia/SAM-FNet.
Submitted 14 August, 2024; v1 submitted 10 August, 2024;
originally announced August 2024.
-
Designing Band Structures by Patterned Dielectric Superlattices
Authors:
Zhen Zhan,
Yonggang Li,
Pierre A. Pantaleon
Abstract:
We investigate the electronic structure of graphene monolayers subjected to patterned dielectric superlattices. Through a quantum capacitance model approach, we simulate realistic devices capable of imposing periodic potentials on graphene. By means of both tight-binding and continuum models, we analyze the electronic structure across varied patterning geometries, including triangular, kagome, and square configurations. We explicitly explore the influence of device parameters such as the superlattice potential strength, geometry, and periodicity on the electronic properties of graphene. By introducing a long-range Coulomb interaction, we found an emergent periodic potential strong enough to open a mass gap, thereby generating a Chern band. Our study highlights the robustness and versatility of patterned dielectric superlattices for band engineering in graphene systems.
Submitted 9 August, 2024;
originally announced August 2024.
-
Global Existence of Large Strong Solutions to the 3D Full Compressible Navier-Stokes Equations with Density-dependent Viscosity
Authors:
Yachun Li,
Peng Lu,
Zhaoyang Shang,
Shaojun Yu
Abstract:
The purpose of this work is to investigate the Cauchy problem of global-in-time existence of large strong solutions to the Navier-Stokes equations for viscous compressible and heat-conducting fluids. A class of density-dependent viscosity is considered. By introducing the modified effective viscous flux and using the bootstrap argument, we establish the global existence of large strong solutions when the initial density is linearly equivalent to a large constant state. It should be mentioned that this result is obtained without any restrictions on the size of initial velocity and initial temperature.
Submitted 26 August, 2024; v1 submitted 9 August, 2024;
originally announced August 2024.
-
Observation of muonic Dalitz decays of $χ_{b}$ mesons and precise spectroscopy of hidden-beauty states
Authors:
LHCb collaboration,
R. Aaij,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
A. A. Adefisoye,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
F. Alessio,
M. Alexander,
Z. Aliouche,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey,
Y. Amhis
, et al. (1114 additional authors not shown)
Abstract:
The decays of the $χ_{b1}(1P)$, $χ_{b2}(1P)$, $χ_{b1}(2P)$ and $χ_{b2}(2P)$ mesons into the $Υ(1S)μ^+μ^-$ final state are observed with a high significance using proton-proton collision data collected with the LHCb detector and corresponding to an integrated luminosity of 9 fb$^{-1}$. The newly observed decays together with the $Υ(2S)\rightarrow Υ(1S)π^+π^-$ and $Υ(3S)\rightarrow Υ(2S)π^+π^-$ decay modes are used for precision measurements of the mass and mass splittings for the hidden-beauty states.
Submitted 9 August, 2024;
originally announced August 2024.
-
Lithography-free patterning of chalcogenide materials for integrated photonic devices
Authors:
Zhen Hu,
Yuru Li,
Yan Li,
Shunyu Yao,
Hongfei Chen,
Tao Zhang,
Zhaohuan Ao,
Zhaohui Li
Abstract:
Chalcogenide material-based integrated photonic devices have garnered widespread attention due to their unique wideband transparency. Despite their recognized CMOS compatibility, the fabrication of these devices relies predominantly on lithography techniques. However, chalcogenide thin films are highly susceptible to oxidation, necessitating customized process flows and complex protective measures during lithography. These requirements are hardly compatible with current commercial CMOS manufacturing platforms designed for silicon photonics, significantly limiting the practical applications of chalcogenide photonic devices. In this work, we ingeniously exploit the ease of oxidation of chalcogenide materials, presenting a novel laser-induced localized oxidation technique for spatial patterning on chalcogenide thin films, enabling concise lithography-free fabrication of chalcogenide integrated photonic devices. Using Sb2S3 as an example, we experimentally demonstrate localized multi-level oxidation with a sizable overall refractive index contrast of 0.7 at near-infrared, featuring a high spatial resolution of 0.6 μm. Based on this technique, multiple integrated photonic devices are demonstrated, showing versatile functionalities, including color printing at visible and metasurface-based spatial light modulation at near-infrared regions. Leveraging the inherent phase-change property of Sb2S3, an active Fresnel zone plate, enabling switchable beam focusing, is further demonstrated, indicating the feasibility of concise fabrication of active photonic devices. Our work offers a brand-new modulation dimension for chalcogenide materials and provides a significantly simplified approach for realizing chalcogenide-integrated photonic devices.
Submitted 9 August, 2024;
originally announced August 2024.
-
Loc4Plan: Locating Before Planning for Outdoor Vision and Language Navigation
Authors:
Huilin Tian,
Jingke Meng,
Wei-Shi Zheng,
Yuan-Ming Li,
Junkai Yan,
Yunong Zhang
Abstract:
Vision and Language Navigation (VLN) is a challenging task that requires agents to understand instructions and navigate to the destination in a visual environment. One of the key challenges in outdoor VLN is keeping track of which part of the instruction has been completed. To alleviate this problem, previous works mainly focus on grounding the natural language to the visual input, but neglect the crucial role of the agent's spatial position information in the grounding process. In this work, we first explore the substantial effect of spatial position locating on the grounding of outdoor VLN, drawing inspiration from human navigation. In real-world navigation scenarios, before planning a path to the destination, humans typically need to figure out their current location. This observation underscores the pivotal role of spatial localization in the navigation process. In this work, we introduce a novel framework, Locating before Planning (Loc4Plan), designed to incorporate spatial perception for action planning in outdoor VLN tasks. The main idea behind Loc4Plan is to perform the spatial localization before planning a decision action based on corresponding guidance, which comprises a block-aware spatial locating (BAL) module and a spatial-aware action planning (SAP) module. Specifically, to help the agent perceive its spatial location in the environment, we propose to learn a position predictor that measures how far the agent is from the next intersection for reflecting its position, which is achieved by the BAL module. After the locating process, we propose the SAP module to incorporate spatial information to ground the corresponding guidance and enhance the precision of action planning. Extensive experiments on the Touchdown and map2seq datasets show that the proposed Loc4Plan outperforms the SOTA methods.
Submitted 9 August, 2024;
originally announced August 2024.
-
Global Strong Solutions to the Cauchy Problem of Three-dimensional Isentropic Magnetohydrodynamics Equations with Large Initial Data
Authors:
Yachun Li,
Peng Lu,
Zhaoyang Shang
Abstract:
We consider the Cauchy problem to the three-dimensional isentropic compressible Magnetohydrodynamics (MHD) system with density-dependent viscosities. When the initial density is linearly equivalent to a large constant state, we prove that strong solutions exist globally in time, and there is no restriction on the size of the initial velocity and initial magnetic field.
Submitted 18 August, 2024; v1 submitted 9 August, 2024;
originally announced August 2024.
-
High-resolution closed-loop seismic inversion network in time-frequency phase mixed domain
Authors:
Yingtian Liu,
Yong Li,
Junheng Peng,
Huating Li,
Mingwei Wang
Abstract:
Thin layers and reservoirs may be concealed in areas of low seismic reflection amplitude, making them difficult to recognize. Deep learning (DL) techniques provide new opportunities for accurate impedance prediction by establishing a nonlinear mapping between seismic data and impedance. However, existing methods primarily use time domain seismic data, which limits the capture of frequency bands, thus leading to insufficient resolution of the inversion results. To address these problems, we introduce a new time-frequency-phase (TFP) mixed-domain closed-loop seismic inversion network (TFP-CSIN) to improve the identification of thin layers and reservoirs. First, the inversion network and closed-loop network are constructed by using bidirectional gated recurrent units (Bi-GRU) and convolutional neural network (CNN) architectures, enabling bidirectional mapping between seismic data and impedance data. Next, to enable comprehensive learning across the entire frequency spectrum, the Fourier transform is used to capture frequency information and establish frequency domain constraints. At the same time, the phase domain constraint is introduced through the Hilbert transform, which improves the method's ability to recognize the features of weak reflection regions. Experiments on the synthetic data show that TFP-CSIN outperforms the traditional supervised learning method and time domain semi-supervised learning methods in seismic inversion. The field data further verify that the proposed method improves the identification of weak reflection areas and thin layers.
Submitted 9 August, 2024;
originally announced August 2024.
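The frequency-domain and phase-domain constraints described in this abstract can be sketched with a discrete Fourier transform and a Hilbert-transform-based analytic signal. The code below is an illustration only: the function names and the exact loss forms (mean absolute difference of amplitude spectra, mean absolute wrapped difference of instantaneous phase) are assumptions, not the losses used in TFP-CSIN.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (O(n^2), fine for short traces)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Inverse of `dft`."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def analytic_signal(x):
    """Analytic signal x + i*H[x], built by zeroing negative frequencies."""
    n = len(x)
    h = [0.0] * n
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        upper = n // 2
    else:
        upper = (n + 1) // 2
    for k in range(1, upper):
        h[k] = 2.0
    X = dft(x)
    return idft([X[k] * h[k] for k in range(n)])


def amplitude_spectrum_loss(pred, target):
    """Assumed frequency-domain constraint: mean |amplitude difference|."""
    P, T = dft(pred), dft(target)
    return sum(abs(abs(p) - abs(t)) for p, t in zip(P, T)) / len(pred)


def phase_loss(pred, target):
    """Assumed phase-domain constraint: mean |wrapped instantaneous-phase difference|."""
    a, b = analytic_signal(pred), analytic_signal(target)
    return sum(abs(cmath.phase(p * q.conjugate())) for p, q in zip(a, b)) / len(pred)


# Two traces with identical amplitude spectra but a 90-degree phase shift.
n = 32
cos_trace = [math.cos(2 * math.pi * 3 * t / n) for t in range(n)]
sin_trace = [math.sin(2 * math.pi * 3 * t / n) for t in range(n)]
amp = amplitude_spectrum_loss(sin_trace, cos_trace)
phs = phase_loss(sin_trace, cos_trace)
```

For these two traces the amplitude term is essentially zero while the phase term is close to π/2, which shows why a phase constraint can carry information that an amplitude-only constraint misses.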
-
GlitchProber: Advancing Effective Detection and Mitigation of Glitch Tokens in Large Language Models
Authors:
Zhibo Zhang,
Wuxia Bai,
Yuxi Li,
Mark Huasong Meng,
Kailong Wang,
Ling Shi,
Li Li,
Jun Wang,
Haoyu Wang
Abstract:
Large language models (LLMs) have achieved unprecedented success in the field of natural language processing. However, the black-box nature of their internal mechanisms has brought many concerns about their trustworthiness and interpretability. Recent research has discovered a class of abnormal tokens in the model's vocabulary space and named them "glitch tokens". Those tokens, once included in the input, may induce the model to produce incorrect, irrelevant, or even harmful results, drastically undermining the reliability and practicality of LLMs.
In this work, we aim to enhance the understanding of glitch tokens and propose techniques for their detection and mitigation. We first reveal the characteristic features induced by glitch tokens on LLMs, which are evidenced by significant deviations in the distributions of attention patterns and dynamic information from intermediate model layers. Based on the insights, we develop GlitchProber, a tool for efficient glitch token detection and mitigation. GlitchProber utilizes small-scale sampling, principal component analysis for accelerated feature extraction, and a simple classifier for efficient vocabulary screening. Taking one step further, GlitchProber rectifies abnormal model intermediate layer values to mitigate the destructive effects of glitch tokens. Evaluated on five mainstream open-source LLMs, GlitchProber demonstrates higher efficiency, precision, and recall compared to existing approaches, with an average F1 score of 0.86 and an average repair rate of 50.06%. GlitchProber unveils a novel path to address the challenges posed by glitch tokens and inspires future research toward more robust and interpretable LLMs.
Submitted 9 August, 2024;
originally announced August 2024.
-
Investigating and improving student understanding of the basics of quantum computing
Authors:
Peter Hu,
Yangqiuting Li,
Chandralekha Singh
Abstract:
Quantum information science and engineering (QISE) is a rapidly developing field that leverages the skills of experts from many disciplines to utilize the potential of quantum systems in a variety of applications. It requires talent from a wide variety of traditional fields, including physics, engineering, chemistry, and computer science, to name a few. To prepare students for such opportunities, it is important to give them a strong foundation in the basics of QISE, in which quantum computing plays a central role. In this study, we discuss the development, validation, and evaluation of a QuILT, or Quantum Interactive Learning Tutorial, on the basics and applications of quantum computing. These include an overview of key quantum mechanical concepts relevant for quantum computation (including ways a quantum computer is different from a classical computer), properties of single- and multi-qubit systems, and the basics of single-qubit quantum gates. The tutorial uses guided inquiry-based teaching-learning sequences. Its development and validation involved conducting cognitive task analysis from both expert and student perspectives and using common student difficulties as a guide. The inquiry-based learning sequences in the tutorial provide scaffolding support to help students develop a functional understanding. The final version of the validated tutorial was implemented in two distinct courses offered by the physics department with slightly different student populations and broader course goals. Students' understanding was evaluated after traditional lecture-based instruction on the requisite concepts, and again after engaging with the tutorial. We analyze and discuss their improvement in performance on concepts covered in the tutorial.
Submitted 9 August, 2024;
originally announced August 2024.
-
Your Classifier Can Be Secretly a Likelihood-Based OOD Detector
Authors:
Jirayu Burapacheep,
Yixuan Li
Abstract:
The ability to detect out-of-distribution (OOD) inputs is critical to guarantee the reliability of classification models deployed in an open environment. A fundamental challenge in OOD detection is that a discriminative classifier is typically trained to estimate the posterior probability p(y|z) for class y given an input z, but lacks the explicit likelihood estimation of p(z) ideally needed for OOD detection. While numerous OOD scoring functions have been proposed for classification models, these estimated scores are often heuristic-driven and cannot be rigorously interpreted as likelihood. To bridge the gap, we propose Intrinsic Likelihood (INK), which offers rigorous likelihood interpretation to modern discriminative-based classifiers. Specifically, our proposed INK score operates on the constrained latent embeddings of a discriminative classifier, which are modeled as a mixture of hyperspherical embeddings with constant norm. We draw a novel connection between the hyperspherical distribution and the intrinsic likelihood, which can be effectively optimized in modern neural networks. Extensive experiments on the OpenOOD benchmark empirically demonstrate that INK establishes a new state-of-the-art in a variety of OOD detection setups, including both far-OOD and near-OOD. Code is available at https://github.com/deeplearning-wisc/ink.
Submitted 9 August, 2024;
originally announced August 2024.
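One way to picture a likelihood-style score on constant-norm embeddings is a log-sum-exp over scaled cosine similarities to per-class prototypes. The sketch below assumes that form; the function names, the prototype vectors, and the temperature value are hypothetical and are not taken from the INK code at the repository above.

```python
import math


def normalize(v):
    """Project a vector onto the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def logsumexp(values):
    """Numerically stable log(sum(exp(values)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def likelihood_style_score(embedding, prototypes, temperature=0.1):
    """Assumed score: log-sum-exp of cosine similarity / temperature over classes.
    Higher values mean the embedding lies closer to some class prototype."""
    z = normalize(embedding)
    logits = [
        sum(a * b for a, b in zip(z, normalize(p))) / temperature
        for p in prototypes
    ]
    return logsumexp(logits)


# Hypothetical two-class setup on the unit circle.
prototypes = [[1.0, 0.0], [0.0, 1.0]]
score_in = likelihood_style_score([2.0, 0.0], prototypes)     # aligned with class 0
score_out = likelihood_style_score([-1.0, -1.0], prototypes)  # far from both classes
```

An input would then be flagged as OOD when its score falls below a threshold chosen on held-out in-distribution data; the threshold choice is left open here.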
-
Hyper-YOLO: When Visual Object Detection Meets Hypergraph Computation
Authors:
Yifan Feng,
Jiangang Huang,
Shaoyi Du,
Shihui Ying,
Jun-Hai Yong,
Yipeng Li,
Guiguang Ding,
Rongrong Ji,
Yue Gao
Abstract:
We introduce Hyper-YOLO, a new object detection method that integrates hypergraph computations to capture the complex high-order correlations among visual features. Traditional YOLO models, while powerful, have limitations in their neck designs that restrict the integration of cross-level features and the exploitation of high-order feature interrelationships. To address these challenges, we propose the Hypergraph Computation Empowered Semantic Collecting and Scattering (HGC-SCS) framework, which transposes visual feature maps into a semantic space and constructs a hypergraph for high-order message propagation. This enables the model to acquire both semantic and structural information, advancing beyond conventional feature-focused learning. Hyper-YOLO incorporates the proposed Mixed Aggregation Network (MANet) in its backbone for enhanced feature extraction and introduces the Hypergraph-Based Cross-Level and Cross-Position Representation Network (HyperC2Net) in its neck. HyperC2Net operates across five scales and breaks free from traditional grid structures, allowing for sophisticated high-order interactions across levels and positions. This synergy of components positions Hyper-YOLO as a state-of-the-art architecture in various scale models, as evidenced by its superior performance on the COCO dataset. Specifically, Hyper-YOLO-N significantly outperforms the advanced YOLOv8-N and YOLOv9-T with 12\% $\text{AP}^{val}$ and 9\% $\text{AP}^{val}$ improvements. The source codes are at https://github.com/iMoonLab/Hyper-YOLO.
Submitted 8 August, 2024;
originally announced August 2024.
-
DiPGrasp: Parallel Local Searching for Efficient Differentiable Grasp Planning
Authors:
Wenqiang Xu,
Jieyi Zhang,
Tutian Tang,
Zhenjun Yu,
Yutong Li,
Cewu Lu
Abstract:
Grasp planning is an important task for robotic manipulation. Though it is a richly studied area, a standalone, fast, and differentiable grasp planner that can work with robot grippers of different DOFs has not been reported. In this work, we present DiPGrasp, a grasp planner that satisfies all these goals. DiPGrasp takes a force-closure geometric surface matching grasp quality metric. It adopts a gradient-based optimization scheme on the metric, which also considers parallel sampling and collision handling. This not only drastically accelerates the grasp search process over the object surface but also makes it differentiable. We apply DiPGrasp to three applications, namely grasp dataset construction, mask-conditioned planning, and pose refinement. For dataset generation, as a standalone planner, DiPGrasp has clear advantages in speed and quality compared with several classic planners. For mask-conditioned planning, it can turn a 3D perception model into a 3D grasp detection model instantly. As a pose refiner, it can optimize the coarse grasp prediction from the neural network, as well as the neural network parameters. Finally, we conduct real-world experiments with the Barrett hand and Schunk SVH 5-finger hand. Video and supplementary materials can be viewed on our website: https://dipgrasp.robotflow.ai.
Submitted 8 August, 2024;
originally announced August 2024.
-
DyGMamba: Efficiently Modeling Long-Term Temporal Dependency on Continuous-Time Dynamic Graphs with State Space Models
Authors:
Zifeng Ding,
Yifeng Li,
Yuan He,
Antonio Norelli,
Jingcheng Wu,
Volker Tresp,
Yunpu Ma,
Michael Bronstein
Abstract:
Learning useful representations for continuous-time dynamic graphs (CTDGs) is challenging, due to the concurrent need to span long node interaction histories and grasp nuanced temporal details. In particular, two problems emerge: (1) Encoding longer histories requires more computational resources, making it crucial for CTDG models to maintain low computational complexity to ensure efficiency; (2) Meanwhile, more powerful models are needed to identify and select the most critical temporal information within the extended context provided by longer histories. To address these problems, we propose a CTDG representation learning model named DyGMamba, originating from the popular Mamba state space model (SSM). DyGMamba first leverages a node-level SSM to encode the sequence of historical node interactions. Another time-level SSM is then employed to exploit the temporal patterns hidden in the historical graph, where its output is used to dynamically select the critical information from the interaction history. We validate DyGMamba experimentally on the dynamic link prediction task. The results show that our model achieves state-of-the-art in most cases. DyGMamba also maintains high efficiency in terms of computational resources, making it possible to capture long temporal dependencies with a limited computation budget.
Submitted 8 August, 2024;
originally announced August 2024.
-
APE: Active Learning-based Tooling for Finding Informative Few-shot Examples for LLM-based Entity Matching
Authors:
Kun Qian,
Yisi Sang,
Farima Fatahi Bayat,
Anton Belyi,
Xianqi Chu,
Yash Govind,
Samira Khorshidi,
Rahul Khot,
Katherine Luna,
Azadeh Nikfarjam,
Xiaoguang Qi,
Fei Wu,
Xianhan Zhang,
Yunyao Li
Abstract:
Prompt engineering is an iterative procedure often requiring extensive manual effort to formulate suitable instructions for effectively directing large language models (LLMs) in specific tasks. Incorporating few-shot examples is a vital and effective approach to providing LLMs with precise instructions, leading to improved LLM performance. Nonetheless, identifying the most informative demonstrations for LLMs is labor-intensive, frequently entailing sifting through an extensive search space. In this demonstration, we showcase a human-in-the-loop tool called APE (Active Prompt Engineering) designed for refining prompts through active learning. Drawing inspiration from active learning, APE iteratively selects the most ambiguous examples for human feedback, which will be transformed into few-shot examples within the prompt. The demo recording can be found with the submission or be viewed at https://youtu.be/OwQ6MQx53-Y.
Submitted 29 July, 2024;
originally announced August 2024.
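The selection step described in the APE abstract (pick the most ambiguous examples for human feedback) can be illustrated with a small sketch. The abstract does not say how ambiguity is measured, so the criterion used here, a match probability closest to 0.5, is an assumption, as are the function name `select_most_ambiguous` and the toy scores.

```python
from typing import List, Tuple


def select_most_ambiguous(scored_pairs: List[Tuple[str, float]], k: int) -> List[str]:
    """Return the ids of the k candidate pairs whose score is closest to 0.5.

    Assumption: each score is a model-estimated probability that the two
    records refer to the same entity; 0.5 is treated as maximal ambiguity.
    """
    ranked = sorted(scored_pairs, key=lambda item: abs(item[1] - 0.5))
    return [pair_id for pair_id, _ in ranked[:k]]


# Hypothetical pool of candidate entity pairs with made-up scores.
pool = [("p1", 0.97), ("p2", 0.52), ("p3", 0.08), ("p4", 0.41)]
picked = select_most_ambiguous(pool, k=2)
```

In a human-in-the-loop setting, the pairs in `picked` would be shown to an annotator, and the labeled results would then be added to the prompt as few-shot examples.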
-
Img-Diff: Contrastive Data Synthesis for Multimodal Large Language Models
Authors:
Qirui Jiao,
Daoyuan Chen,
Yilun Huang,
Yaliang Li,
Ying Shen
Abstract:
High-performance Multimodal Large Language Models (MLLMs) rely heavily on data quality. This study introduces a novel dataset named Img-Diff, designed to enhance fine-grained image recognition in MLLMs by leveraging insights from contrastive learning and image difference captioning. By analyzing object differences between similar images, we challenge models to identify both matching and distinct components. We utilize the Stable-Diffusion-XL model and advanced image editing techniques to create pairs of similar images that highlight object replacements. Our methodology includes a Difference Area Generator for identifying object differences, followed by a Difference Captions Generator for detailed difference descriptions. The result is a relatively small but high-quality dataset of "object replacement" samples. We use the proposed dataset to finetune state-of-the-art (SOTA) MLLMs such as MGM-7B, yielding comprehensive improvements in performance scores over SOTA models trained with larger-scale datasets, in numerous image difference and Visual Question Answering tasks. For instance, our trained models notably surpass the SOTA models GPT-4V and Gemini on the MMVP benchmark. Besides, we investigate alternative methods for generating image difference data through "object removal" and conduct a thorough evaluation to confirm the dataset's diversity, quality, and robustness, presenting several insights on the synthesis of such a contrastive dataset. To encourage further research and advance the field of multimodal data synthesis and enhancement of MLLMs' fundamental capabilities for image understanding, we release our codes and dataset at https://github.com/modelscope/data-juicer/tree/ImgDiff.
Submitted 9 August, 2024; v1 submitted 8 August, 2024;
originally announced August 2024.
-
Sketch2Scene: Automatic Generation of Interactive 3D Game Scenes from User's Casual Sketches
Authors:
Yongzhi Xu,
Yonhon Ng,
Yifu Wang,
Inkyu Sa,
Yunfei Duan,
Yang Li,
Pan Ji,
Hongdong Li
Abstract:
3D Content Generation is at the heart of many computer graphics applications, including video gaming, film-making, virtual and augmented reality, etc. This paper proposes a novel deep-learning based approach for automatically generating interactive and playable 3D game scenes, all from the user's casual prompts such as a hand-drawn sketch. Sketch-based input offers a natural, and convenient way to convey the user's design intention in the content creation process. To circumvent the data-deficient challenge in learning (i.e. the lack of large training data of 3D scenes), our method leverages a pre-trained 2D denoising diffusion model to generate a 2D image of the scene as the conceptual guidance. In this process, we adopt the isometric projection mode to factor out unknown camera poses while obtaining the scene layout. From the generated isometric image, we use a pre-trained image understanding method to segment the image into meaningful parts, such as off-ground objects, trees, and buildings, and extract the 2D scene layout. These segments and layouts are subsequently fed into a procedural content generation (PCG) engine, such as a 3D video game engine like Unity or Unreal, to create the 3D scene. The resulting 3D scene can be seamlessly integrated into a game development environment and is readily playable. Extensive tests demonstrate that our method can efficiently generate high-quality and interactive 3D game scenes with layouts that closely follow the user's intention.
Submitted 8 August, 2024;
originally announced August 2024.
-
Randomness versus Nonlocality in Multi-input and Multi-output Quantum Scenario
Authors:
Chao Zhang,
Yi Li,
Xiao-Min Hu,
Yu Xiang,
Chuan-Feng Li,
Guang-Can Guo,
Jordi Tura,
Qihuang Gong,
Qiongyi He,
Bi-Heng Liu
Abstract:
Device-independent randomness certification based on Bell nonlocality does not require any assumptions about the devices and therefore provides adequate security. Great effort has been made to demonstrate that nonlocality is necessary for generating quantum randomness, but the minimal resource required for random number generation has not been clarified. Here we first prove and experimentally demonstrate that violating any two-input Bell inequality is both necessary and sufficient for certifying randomness; however, for the multi-input cases, this sufficiency ceases to apply, leading to certain states exhibiting Bell nonlocality without the capability to certify randomness. We examine two typical classes of Bell inequalities with multi-input and multi-output, the facet inequalities and Salavrakos-Augusiak-Tura-Wittek-Acín-Pironio Bell inequalities, in the high-dimensional photonic system, and observe that violation of the latter can always certify randomness, which is not true for the former. The private randomness with a generation rate of $1.867\pm0.018$ bits per photon pair is obtained in the scenario of Salavrakos-Augusiak-Tura-Wittek-Acín-Pironio Bell inequalities with 3-input and 4-output. Our work unravels the internal connection between randomness and nonlocality, and effectively enhances the performance of tasks such as device-independent random number generation.
Submitted 8 August, 2024;
originally announced August 2024.
-
Unconventional Hall effects in a quasi-kagome Kondo Weyl semimetal candidate Ce$_3$TiSb$_5$
Authors:
Xiaobo He,
Ying Li,
Yongheng Ge,
Hai Zeng,
Shi-Jie Song,
Shuo Zou,
Zhuo Wang,
Yuke Li,
Wenxin Ding,
Jianhui Dai,
Guang-Han Cao,
Xiao-Xiao Zhang,
Gang Xu,
Yongkang Luo
Abstract:
It is generally believed that electronic correlation, geometric frustration, and topology, \textit{individually}, can facilitate the emergence of various intriguing properties that have attracted a broad audience for both fundamental research and potential applications. Here, we report a systematic investigation on a quasi-kagome Kondo Weyl semimetal candidate Ce$_3$TiSb$_5$. A series of unconventional Hall effects are observed. In the paramagnetic phase, a signature of dynamic $c$-$f$ hybridization is revealed by a reduction of the anomalous Hall effect and is connected to frustration-promoted incoherent Kondo scattering. A large topological Hall effect exceeding 0.2 $μΩ$ cm is found at low temperatures, which should be ascribed to the noncollinear magnetic structures of the frustrated quasi-kagome lattice. In addition, a peculiar loop-shaped Hall effect with switching chirality is also seen, which is inferred to be associated with magnetic domain walls that pin history-dependent spin chirality and/or Fermi-arc surface states projected from the in-gap Weyl nodes. These exotic results place Ce$_3$TiSb$_5$ in a regime of highly-frustrated antiferromagnetic dense Kondo lattice with a nontrivial topology on an "extended" global phase diagram, and highlight the interplay among electronic correlation, geometric frustration and topology.
Submitted 8 August, 2024;
originally announced August 2024.
-
Analysis of the dynamics of the decay $D^{+}\to K_{S}^{0} π^{0} e^{+}ν_{e}$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (644 additional authors not shown)
Abstract:
The branching fraction of $D^+\to K_{S}^{0} π^{0}e^+ν_e$ is measured for the first time using $7.93~\mathrm{fb}^{-1}$ of $e^+e^-$ annihilation data collected at the center-of-mass energy $\sqrt{s}=3.773$~GeV with the BESIII detector operating at the BEPCII collider, and is determined to be ${\mathcal B}$($D^+\to K_S^0π^0e^+ν_e$) = $(0.881~\pm~0.017_{\rm stat.}~\pm~0.016_{\rm syst.})$\%. Based on an analysis of the $D^+\to K_S^0π^0e^+ν_e$ decay dynamics, we observe the $S\text{-}{\rm wave}$ and $P$-wave components with fractions of $f_{S\text{-}{\rm wave}}$ = $(6.13~\pm~0.27_{\rm stat.}~\pm ~0.30_{\rm syst.})\%$ and $f_{\bar K^{*}(892)^0}$ = $(93.88~\pm~0.27_{\rm stat.}~\pm~0.29_{\rm syst.})$\%, respectively. From these results, we obtain the branching fractions ${\mathcal B}$($D^+\to (K_S^0π^0)_{S\text{-}{\rm wave}}~e^+ν_e$) = $(5.41~\pm~0.35_{\rm stat.}~\pm~0.37_{\rm syst.})\times10^{-4}$ and ${\mathcal B}$($D^+\to \bar K^{*}(892)^0e^+ν_e$) = $(4.97~\pm~0.11_{\rm stat.}~\pm~0.12_{\rm syst.})$\%. In addition, the hadronic form-factor ratios of $D^{+} \to \bar {K}^{*}(892)^0e^+ν_e$ at $q^2=0$, assuming a single-pole dominance parameterization, are determined to be $r_V=\frac{V(0)}{A_1(0)}= 1.43~\pm~0.07_{\rm stat.}~\pm~0.03_{\rm syst.}$ and $r_2=\frac{A_2(0)}{A_1(0)}=0.72~\pm~0.06_{\rm stat.}~\pm~0.02_{\rm syst.}$.
Submitted 8 August, 2024;
originally announced August 2024.
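As a rough consistency check on the numbers quoted in this abstract, the component branching fractions can be reproduced from the total branching fraction and the fitted fractions. The sketch below uses central values only and ignores uncertainties. The factor of 1/6 for $\bar K^{*}(892)^0 \to K_S^0 π^0$ (1/3 from isospin times 1/2 for the $K_S^0$ component of the $\bar K^0$) is not stated in the abstract and is an assumption made here; the variable names are illustrative.

```python
# Central values quoted in the abstract (uncertainties ignored).
total_bf = 0.881e-2   # B(D+ -> K_S0 pi0 e+ nu_e)
f_s_wave = 6.13e-2    # S-wave fraction
f_kstar = 93.88e-2    # Kbar*(892)0 fraction

# Assumption, not from the abstract: Kbar*(892)0 -> K_S0 pi0 has probability
# 1/3 (isospin) * 1/2 (K_S0 content of Kbar0) = 1/6.
kstar_to_ks_pi0 = (1.0 / 3.0) * (1.0 / 2.0)

# S-wave component in the K_S0 pi0 final state.
bf_s_wave = f_s_wave * total_bf

# Kbar*(892)0 component, corrected back to the full Kbar*(892)0 e+ nu_e rate.
bf_kstar = f_kstar * total_bf / kstar_to_ks_pi0

print(f"S-wave: {bf_s_wave:.3e}")
print(f"K*:     {bf_kstar * 100:.2f} %")
```

Under this assumption the script gives roughly $5.40\times10^{-4}$ and 4.96\%, which agree with the quoted $5.41\times10^{-4}$ and 4.97\% well within the stated uncertainties.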
-
TupleChain: Fast Lookup of OpenFlow Table with Multifaceted Scalability
Authors:
Yanbiao Li,
Neng Ren,
Xin Wang,
Yuxuan Chen,
Xinyi Zhang,
Lingbo Guo,
Gaogang Xie
Abstract:
OpenFlow switches are fundamental components of software defined networking, where the key operation is to look up flow tables to determine which flow an incoming packet belongs to. This needs to address the same multi-field rule-matching problem as legacy packet classification, but faces more serious scalability challenges. The demand for fast on-line updates makes most existing solutions unfit, while the rest still lack scalability to either large data sets or a large number of match fields per rule. In this work, we propose TupleChain for fast OpenFlow table lookup with multifaceted scalability. We group rules based on their masks, each group being maintained with a hash table, and explore the connections among rule groups to skip unnecessary hash probes for fast search. We show via theoretical analysis and extensive experiments that the proposed scheme not only has competitive computing complexity, but is also scalable and can achieve high performance in both search and update. It can process multiple millions of packets per second, while dealing with millions of on-line updates per second at the same time, and its lookup speed stays at the same level whether it handles a large flow table with 10 million rules or a flow table in which every entry has as many as 100 match fields.
Submitted 8 August, 2024;
originally announced August 2024.
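The grouping idea mentioned in the TupleChain abstract (rules grouped by mask, one hash table per group) can be sketched as follows. This is only the baseline tuple-space-search structure; it probes every group on each lookup and does not implement the chaining among groups that TupleChain uses to skip probes. The class name, the priority convention (larger wins), and the toy rules are all assumptions for illustration.

```python
from typing import Dict, Optional, Tuple

Key = Tuple[int, ...]


class TupleSpaceTable:
    """Baseline tuple space search: one hash table per distinct mask tuple."""

    def __init__(self) -> None:
        # mask tuple -> {masked field values -> (priority, action)}
        self.groups: Dict[Key, Dict[Key, Tuple[int, str]]] = {}

    @staticmethod
    def _apply(mask: Key, fields: Key) -> Key:
        # Bitwise-AND each field with the corresponding mask.
        return tuple(f & m for f, m in zip(fields, mask))

    def insert(self, mask: Key, value: Key, priority: int, action: str) -> None:
        table = self.groups.setdefault(mask, {})
        table[self._apply(mask, value)] = (priority, action)

    def lookup(self, packet: Key) -> Optional[str]:
        best: Optional[Tuple[int, str]] = None
        # One hash probe per rule group; keep the highest-priority hit.
        for mask, table in self.groups.items():
            hit = table.get(self._apply(mask, packet))
            if hit is not None and (best is None or hit[0] > best[0]):
                best = hit
        return None if best is None else best[1]


# Hypothetical two-field rules (e.g. a 16-bit address prefix and a port).
table = TupleSpaceTable()
table.insert((0xFF00, 0xFFFF), (0x0A00, 80), priority=1, action="forward")
table.insert((0xFFFF, 0x0000), (0x0A01, 0), priority=2, action="drop")
```

Because each group is an ordinary hash table, inserting or deleting a rule touches a single table, which is why this family of schemes suits frequent on-line updates; the lookup cost grows with the number of distinct masks, which is the cost the abstract says TupleChain reduces.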
-
Overview of the NLPCC 2024 Shared Task on Chinese Metaphor Generation
Authors:
Xingwei Qu,
Ge Zhang,
Siwei Wu,
Yizhi Li,
Chenghua Lin
Abstract:
This paper presents the results of the shared task on Chinese metaphor generation, hosted at the 13th CCF Conference on Natural Language Processing and Chinese Computing (NLPCC 2024). The goal of this shared task is to generate Chinese metaphors using machine learning techniques and to effectively identify the basic components of metaphorical sentences. It is divided into two subtasks: 1) Metaphor Generation, which involves creating a metaphor from a provided tuple consisting of TENOR, GROUND, and VEHICLE. The goal here is to synthesize a metaphor that connects the subject (i.e. TENOR) with the object (i.e. VEHICLE), guided by the concept of the GROUND. 2) Metaphor Components Identification, which extracts the most fitting TENORs, GROUNDs, and VEHICLEs from a metaphorical sentence. This component requires the identification of the most fitting metaphor elements that correspond to the specified grounds. In addition to overall results, we report on the setup and insights from the metaphor generation shared task, which attracted a total of 4 participating teams across both subtasks.
Submitted 8 August, 2024;
originally announced August 2024.
-
Ultrabright-entanglement-based quantum key distribution over a 404-km-long optical fiber
Authors:
Shi-Chang Zhuang,
Bo Li,
Ming-Yang Zheng,
Yi-Xi Zeng,
Hui-Nan Wu,
Guang-Bing Li,
Quan Yao,
Xiu-Ping Xie,
Yu-Huai Li,
Hao Qin,
Li-Xing You,
Fei-Hu Xu,
Juan Yin,
Yuan Cao,
Qiang Zhang,
Cheng-Zhi Peng,
Jian-Wei Pan
Abstract:
The entangled photons are crucial resources for quantum communications and networking. Here, we present an ultra-bright polarization-entangled photon source based on a periodically poled lithium niobate waveguide designed for practical quantum communication networks. Using a 780 nm pump laser, the source achieves a pair generation rate of 2.4 $\times 10^{10}$ pairs/s/mW. This work has achieved a directly measured power of 17.9 nW in entangled photon generation with a 3.2 mW pump power. Based on this, we demonstrate the practicality of the source by conducting quantum key distribution experiments over long-distance fiber links, achieving the applicable secure key rates of up to 440.80 bits/s over 200 km with 62 dB loss and reaching a maximum secure key generation distance of 404 km. These results demonstrate the potential of wavelength-multiplexed polarization-entangled photon sources for high-speed, long-distance quantum communication, positioning them as key components for future large-scale quantum networks.
Submitted 8 August, 2024; v1 submitted 8 August, 2024;
originally announced August 2024.
-
VideoQA in the Era of LLMs: An Empirical Study
Authors:
Junbin Xiao,
Nanxin Huang,
Hangyu Qin,
Dongyang Li,
Yicong Li,
Fengbin Zhu,
Zhulin Tao,
Jianxing Yu,
Liang Lin,
Tat-Seng Chua,
Angela Yao
Abstract:
Video Large Language Models (Video-LLMs) are flourishing and have advanced many video-language tasks. As a golden testbed, Video Question Answering (VideoQA) plays a pivotal role in Video-LLM development. This work conducts a timely and comprehensive study of Video-LLMs' behavior in VideoQA, aiming to elucidate their success and failure modes, and provide insights towards more human-like video understanding and question answering. Our analyses demonstrate that Video-LLMs excel in VideoQA; they can correlate contextual cues and generate plausible responses to questions about varied video contents. However, models falter in handling video temporality, both in reasoning about temporal content ordering and grounding QA-relevant temporal moments. Moreover, the models behave unintuitively: they are unresponsive to adversarial video perturbations while being sensitive to simple variations of candidate answers and questions. Also, they do not necessarily generalize better. The findings demonstrate Video-LLMs' QA capability under standard conditions yet highlight their severe deficiency in robustness and interpretability, suggesting the urgent need for rationales in Video-LLM development.
Submitted 8 August, 2024;
originally announced August 2024.
-
Quantum-Enhanced Polarimetric Imaging
Authors:
Meng-Yu Xie,
Su-Jian Niu,
Zhao-Qi-Zhi Han,
Yin-Hai Li,
Ren-Hui Chen,
Xiao-Hua Wang,
Ming-Yuan Gao,
Li Chen,
Yue-Wei Song,
Zhi-Yuan Zhou,
Bao-Sen Shi
Abstract:
Polarimetric imaging, a technique that captures the invisible polarization-related properties of given materials, has broad applications from fundamental physics to advanced fields such as target recognition, stress detection, biomedical diagnosis and remote sensing. The introduction of quantum sources into classical imaging systems has demonstrated distinct advantages, yet few studies have explored their combination with polarimetric imaging. In this study, we present a quantum polarimetric imaging system that integrates polarization-entangled photon pairs into a polarizer-sample-compensator-analyzer (PSRA)-type polarimeter. Our system visualizes the birefringence properties of a periodical-distributed anisotropic material under decreasing illumination levels and diverse disturbing light sources. Compared to the classical system, the quantum approach reveals the superior sensitivity and robustness in low-light conditions, particularly useful in biomedical studies where the low illumination and non-destructive detection are urgently needed. The study also highlights the nonlocality of entangled photons in birefringence measurement, indicating the potential of quantum polarimetric system in the remote sensing domain.
Submitted 7 August, 2024;
originally announced August 2024.
-
Perceive, Reflect, and Plan: Designing LLM Agent for Goal-Directed City Navigation without Instructions
Authors:
Qingbin Zeng,
Qinglong Yang,
Shunan Dong,
Heming Du,
Liang Zheng,
Fengli Xu,
Yong Li
Abstract:
This paper considers a scenario in city navigation: an AI agent is provided with language descriptions of the goal location with respect to some well-known landmarks. By only observing the scene around it, including recognizing landmarks and road network connections, the agent has to make decisions to navigate to the goal location without instructions. This problem is very challenging, because it requires the agent to establish its self-position and acquire a spatial representation of a complex urban environment, where landmarks are often invisible. In the absence of navigation instructions, such abilities are vital for the agent to make high-quality decisions in long-range city navigation. With the emergent reasoning ability of large language models (LLMs), a tempting baseline is to prompt LLMs to "react" on each observation and make decisions accordingly. However, this baseline performs very poorly: the agent often repeatedly visits the same locations and makes short-sighted, inconsistent decisions. To address these issues, this paper introduces a novel agentic workflow characterized by its abilities to perceive, reflect and plan. Specifically, we find LLaVA-7B can be fine-tuned to perceive the direction and distance of landmarks with sufficient accuracy for city navigation. Moreover, reflection is achieved through a memory mechanism, where past experiences are stored and can be retrieved with current perception for effective decision argumentation. Planning uses reflection results to produce long-term plans, which can avoid short-sighted decisions in long-range navigation. We show the designed workflow significantly improves the navigation ability of the LLM agent compared with the state-of-the-art baselines.
Submitted 7 August, 2024;
originally announced August 2024.
-
A Metastable Pentagonal 2D Material Synthesized by Symmetry-Driven Epitaxy
Authors:
Lina Liu,
Yujin Ji,
Marco Bianchi,
Saban M. Hus,
Zheshen Li,
Richard Balog,
Jill A. Miwa,
Philip Hofmann,
An-ping Li,
Dmitry Y. Zemlyanov,
Youyong Li,
Yong P. Chen
Abstract:
Most two-dimensional (2D) materials experimentally studied so far have hexagons as their building blocks. Only a few exceptions, such as PdSe2, are lower in energy in pentagonal phases and exhibit pentagons as building blocks. While theory has predicted a large number of pentagonal 2D materials, many of them are metastable and their experimental realization is difficult. Here we report the successful synthesis of a metastable pentagonal 2D material, the monolayer pentagonal PdTe2, by symmetry-driven epitaxy. Scanning tunneling microscopy and complementary spectroscopy measurements are used to characterize the monolayer pentagonal PdTe2, which demonstrates well-ordered low-symmetry atomic arrangements and is stabilized by lattice matching with the underlying Pd(100) substrate. Theoretical calculations, along with angle-resolved photoemission spectroscopy, reveal monolayer pentagonal PdTe2 is a semiconductor with an indirect bandgap of 1.05 eV. Our work opens an avenue for the synthesis of pentagon-based 2D materials and gives opportunities to explore their applications such as multifunctional nanoelectronics.
Submitted 7 August, 2024;
originally announced August 2024.
-
From Black Box to Clarity: AI-Powered Smart Grid Optimization with Kolmogorov-Arnold Networks
Authors:
Xiaoting Wang,
Yuzhuo Li,
Yunwei Li,
Gregory Kish
Abstract:
This work is the first to adopt Kolmogorov-Arnold Networks (KAN), a recent breakthrough in artificial intelligence, for smart grid optimizations. To fully leverage KAN's interpretability, a general framework is proposed considering complex uncertainties. The stochastic optimal power flow problem in hybrid AC/DC systems is chosen as a particularly tough case study for demonstrating the effectiveness of this framework.
Submitted 7 August, 2024;
originally announced August 2024.