-
Unsupervised Manifold Linearizing and Clustering
Authors:
Tianjiao Ding,
Shengbang Tong,
Kwan Ho Ryan Chan,
Xili Dai,
Yi Ma,
Benjamin D. Haeffele
Abstract:
We consider the problem of simultaneously clustering and learning a linear representation of data lying close to a union of low-dimensional manifolds, a fundamental task in machine learning and computer vision. When the manifolds are assumed to be linear subspaces, this reduces to the classical problem of subspace clustering, which has been studied extensively over the past two decades. Unfortunately, many real-world datasets such as natural images can not be well approximated by linear subspaces. On the other hand, numerous works have attempted to learn an appropriate transformation of the data, such that data is mapped from a union of general non-linear manifolds to a union of linear subspaces (with points from the same manifold being mapped to the same subspace). However, many existing works have limitations such as assuming knowledge of the membership of samples to clusters, requiring high sampling density, or being shown theoretically to learn trivial representations. In this paper, we propose to optimize the Maximal Coding Rate Reduction metric with respect to both the data representation and a novel doubly stochastic cluster membership, inspired by state-of-the-art subspace clustering results. We give a parameterization of such a representation and membership, allowing efficient mini-batching and one-shot initialization. Experiments on CIFAR-10, -20, -100, and TinyImageNet-200 datasets show that the proposed method is much more accurate and scalable than state-of-the-art deep clustering methods, and further learns a latent linear representation of the data.
Submitted 24 August, 2023; v1 submitted 4 January, 2023;
originally announced January 2023.
-
An extended plane wave framework for the electronic structure calculations of twisted bilayer material systems
Authors:
Xiaoying Dai,
Aihui Zhou,
Yuzhi Zhou
Abstract:
In this paper, we propose an extended plane wave framework to make the electronic structure calculations of the twisted bilayer 2D material systems practically feasible. Building on the foundation in [Y. Zhou, H. Chen, A. Zhou, J. Comput. Phys. 384, 99 (2019)], we make the following extensions: (1) a tensor-product basis set, which adopts PWs in the incommensurate dimensions and a localized basis in the interlayer dimension, (2) a practical application of a novel cutoff technique we have recently developed, and (3) a quasi-band structure picture under the small twisted angles and weak interlayer coupling limits. With (1) and (2), the dimensions of the Hamiltonian matrix are reduced by about 2 orders of magnitude compared with the original framework, and (3) enables us to better organize the calculations and understand the results. For numerical examples, we study the electronic structures of the linear bilayer graphene lattice system with the magic twisted angle ($\sim 1.05^{\circ}$). The famous flat bands have been reproduced with their features in quantitative agreement with those from experiments and other theoretical calculations. Moreover, the extended framework has much lower computational cost than the commensurate cell approximations, and is more extendable than the traditional model Hamiltonians and tight-binding models. Finally, this framework can readily accommodate nonlinear models and thus lays the foundations for more effective yet accurate Density Functional Theory (DFT) calculations.
Submitted 3 January, 2023;
originally announced January 2023.
-
Unconventionally Fast Transport through Sliding Dynamics of Rodlike Particles in Macromolecular Networks
Authors:
Xuanyu Zhang,
Xiaobin Dai,
Md Ahsan Habib,
Ziyang Xu,
Lijuan Gao,
Wenlong Chen,
Wenjie Wei,
Zhongqiu Tang,
Xianyu Qi,
Xiangjun Gong,
Lingxiang Jiang,
Li-Tang Yan
Abstract:
Transport of rodlike particles in confined environments of macromolecular networks plays crucial roles in many important biological processes and technological applications. The relevant understanding has been limited to thin rods with diameter much smaller than the network mesh size, although the opposite case, of which the dynamical behaviors and underlying physical mechanisms remain unclear, is ubiquitous. Here, we solve this issue by combining experiments, simulations and theory. We find a nonmonotonic dependence of translational diffusion on rod length, characterized by length commensuration-governed unconventionally fast dynamics, which is in striking contrast to the monotonic dependence for thin rods. Our results clarify that such fast diffusion of thick rods whose length is an integral multiple of the mesh size follows sliding dynamics and demonstrate it to be "anomalous yet Brownian". Moreover, good agreement between theoretical analysis and simulations corroborates that the sliding dynamics is an intermediate regime between hopping and Brownian dynamics, and provides a mechanistic interpretation based on the rod-length dependent entropic free energy barrier. The findings yield a principle, that is, length commensuration, for optimal design of rodlike particles with highly efficient transport in confined environments of macromolecular networks, and might enrich the physics of the diffusion dynamics in heterogeneous media.
Submitted 19 November, 2023; v1 submitted 26 December, 2022;
originally announced December 2022.
-
First observation and branching fraction measurement of the $Λ_b^0\to D_s^- p$ decay
Authors:
LHCb collaboration,
R. Aaij,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
B. Adeva,
M. Adinolfi,
P. Adlarson,
H. Afsharnia,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
F. Alessio,
M. Alexander,
A. Alfonso Albero,
Z. Aliouche,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey
, et al. (1040 additional authors not shown)
Abstract:
The first observation of the $Λ_b^0\to D_s^- p$ decay is presented using proton-proton collision data collected by the LHCb experiment at a centre-of-mass energy of ${\sqrt{s}=13 \,\textrm{TeV}}$, corresponding to a total integrated luminosity of $6\,\textrm{fb}^{-1}$. Using the $Λ_b^0\toΛ_c^+π^-$ decay as the normalisation mode, the branching fraction of the $Λ_b^0\to D_s^- p$ decay is measured to be ${\mathcal{B}(Λ_b^0\to D_s^- p)=(12.6 \pm 0.5 \pm 0.3 \pm 1.2 )\times 10^{-6}}$, where the first uncertainty is statistical, the second systematic and the third due to uncertainties in the branching fractions of the $Λ_b^0\toΛ_c^+π^-$, $D_s^- \to K^-K^+π^-$ and $Λ_c^+\to p K^- π^+$ decays.
Submitted 17 July, 2023; v1 submitted 23 December, 2022;
originally announced December 2022.
-
A Method to Load Tellurium in Liquid Scintillator for the Study of Neutrinoless Double Beta Decay
Authors:
D. J. Auty,
D. Bartlett,
S. D. Biller,
D. Chauhan,
M. Chen,
O. Chkvorets,
S. Connolly,
X. Dai,
E. Fletcher,
K. Frankiewicz,
D. Gooding,
C. Grant,
S. Hall,
D. Horne,
S. Hans,
B. Hreljac,
T. Kaptanoglu,
B. Krar,
C. Kraus,
T. Kroupová,
I. Lam,
Y. Liu,
S. Maguire,
C. Miller,
S. Manecki
, et al. (12 additional authors not shown)
Abstract:
A method has been developed to load tellurium into liquid scintillator so as to permit searches for neutrinoless double beta decay with high sensitivity. The approach involves the synthesis of an oil-soluble tellurium compound from telluric acid and an organic diol. The process utilises distillable chemicals that can be safely handled underground and affords low radioactive backgrounds, low optical absorption and high light yields at loading levels of at least several percent Te by weight.
Submitted 4 April, 2023; v1 submitted 23 December, 2022;
originally announced December 2022.
-
Generalized Decoding for Pixel, Image, and Language
Authors:
Xueyan Zou,
Zi-Yi Dou,
Jianwei Yang,
Zhe Gan,
Linjie Li,
Chunyuan Li,
Xiyang Dai,
Harkirat Behl,
Jianfeng Wang,
Lu Yuan,
Nanyun Peng,
Lijuan Wang,
Yong Jae Lee,
Jianfeng Gao
Abstract:
We present X-Decoder, a generalized decoding model that can predict pixel-level segmentation and language tokens seamlessly. X-Decoder takes as input two types of queries: (i) generic non-semantic queries and (ii) semantic queries induced from text inputs, to decode different pixel-level and token-level outputs in the same semantic space. With such a novel design, X-Decoder is the first work that provides a unified way to support all types of image segmentation and a variety of vision-language (VL) tasks. Further, our design enables seamless interactions across tasks at different granularities and brings mutual benefits by learning a common and rich pixel-level visual-semantic understanding space, without any pseudo-labeling. After pretraining on a mixed set of a limited amount of segmentation data and millions of image-text pairs, X-Decoder exhibits strong transferability to a wide range of downstream tasks in both zero-shot and finetuning settings. Notably, it achieves (1) state-of-the-art results on open-vocabulary segmentation and referring segmentation on eight datasets; (2) better or competitive finetuned performance compared with other generalist and specialist models on segmentation and VL tasks; and (3) flexibility for efficient finetuning and novel task composition (e.g., referring captioning and image editing). Code, demo, video, and visualization are available at https://x-decoder-vl.github.io.
Submitted 21 December, 2022;
originally announced December 2022.
-
Measurement of lepton universality parameters in $B^+\to K^+\ell^+\ell^-$ and $B^0\to K^{*0}\ell^+\ell^-$ decays
Authors:
LHCb collaboration,
R. Aaij,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
B. Adeva,
M. Adinolfi,
P. Adlarson,
H. Afsharnia,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
F. Alessio,
M. Alexander,
A. Alfonso Albero,
Z. Aliouche,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey
, et al. (1039 additional authors not shown)
Abstract:
A simultaneous analysis of the $B^+\to K^+\ell^+\ell^-$ and $B^0\to K^{*0}\ell^+\ell^-$ decays is performed to test muon-electron universality in two ranges of the square of the dilepton invariant mass, $q^2$. The measurement uses a sample of beauty meson decays produced in proton-proton collisions collected with the LHCb detector between 2011 and 2018, corresponding to an integrated luminosity of $9$ $\text{fb}^{-1}$. A sequence of multivariate selections and strict particle identification requirements produce a higher signal purity and a better statistical sensitivity per unit luminosity than previous LHCb lepton universality tests using the same decay modes. Residual backgrounds due to misidentified hadronic decays are studied using data and included in the fit model. Each of the four lepton universality measurements reported is either the first in the given $q^2$ interval or supersedes previous LHCb measurements. The results are compatible with the predictions of the Standard Model.
Submitted 7 November, 2023; v1 submitted 18 December, 2022;
originally announced December 2022.
-
Test of lepton universality in $b \rightarrow s \ell^+ \ell^-$ decays
Authors:
LHCb collaboration,
R. Aaij,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
B. Adeva,
M. Adinolfi,
P. Adlarson,
H. Afsharnia,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
F. Alessio,
M. Alexander,
A. Alfonso Albero,
Z. Aliouche,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey
, et al. (1039 additional authors not shown)
Abstract:
The first simultaneous test of muon-electron universality using $B^{+}\rightarrow K^{+}\ell^{+}\ell^{-}$ and $B^{0}\rightarrow K^{*0}\ell^{+}\ell^{-}$ decays is performed, in two ranges of the dilepton invariant-mass squared, $q^{2}$. The analysis uses beauty mesons produced in proton-proton collisions collected with the LHCb detector between 2011 and 2018, corresponding to an integrated luminosity of 9 $\mathrm{fb}^{-1}$. Each of the four lepton universality measurements reported is either the first in the given $q^{2}$ interval or supersedes previous LHCb measurements. The results are compatible with the predictions of the Standard Model.
Submitted 7 November, 2023; v1 submitted 18 December, 2022;
originally announced December 2022.
-
Look Before You Match: Instance Understanding Matters in Video Object Segmentation
Authors:
Junke Wang,
Dongdong Chen,
Zuxuan Wu,
Chong Luo,
Chuanxin Tang,
Xiyang Dai,
Yucheng Zhao,
Yujia Xie,
Lu Yuan,
Yu-Gang Jiang
Abstract:
Exploring dense matching between the current frame and past frames for long-range context modeling, memory-based methods have demonstrated impressive results in video object segmentation (VOS) recently. Nevertheless, due to the lack of instance understanding ability, the above approaches are oftentimes brittle to large appearance variations or viewpoint changes resulting from the movement of objects and cameras. In this paper, we argue that instance understanding matters in VOS, and integrating it with memory-based matching can enjoy the synergy, which is intuitively sensible from the definition of the VOS task, i.e., identifying and segmenting object instances within the video. Towards this goal, we present a two-branch network for VOS, where the query-based instance segmentation (IS) branch delves into the instance details of the current frame and the VOS branch performs spatial-temporal matching with the memory bank. We employ the well-learned object queries from the IS branch to inject instance-specific information into the query key, with which the instance-augmented matching is further performed. In addition, we introduce a multi-path fusion block to effectively combine the memory readout with multi-scale features from the instance segmentation decoder, which incorporates high-resolution instance-aware features to produce final segmentation results. Our method achieves state-of-the-art performance on DAVIS 2016/2017 val (92.6% and 87.1%), DAVIS 2017 test-dev (82.8%), and YouTube-VOS 2018/2019 val (86.3% and 86.3%), outperforming alternative methods by clear margins.
Submitted 13 December, 2022;
originally announced December 2022.
-
Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning
Authors:
Rui Wang,
Dongdong Chen,
Zuxuan Wu,
Yinpeng Chen,
Xiyang Dai,
Mengchen Liu,
Lu Yuan,
Yu-Gang Jiang
Abstract:
Benefiting from masked visual modeling, self-supervised video representation learning has achieved remarkable progress. However, existing methods focus on learning representations from scratch through reconstructing low-level features like raw pixel RGB values. In this paper, we propose masked video distillation (MVD), a simple yet effective two-stage masked feature modeling framework for video representation learning: firstly we pretrain an image (or video) model by recovering low-level features of masked patches, then we use the resulting features as targets for masked feature modeling. For the choice of teacher models, we observe that students taught by video teachers perform better on temporally-heavy video tasks, while image teachers transfer stronger spatial representations for spatially-heavy video tasks. Visualization analysis also indicates different teachers produce different learned patterns for students. Motivated by this observation, we design a spatial-temporal co-teaching method for MVD. Specifically, we distill student models from both video teachers and image teachers by masked feature modeling. Extensive experimental results demonstrate that video transformers pretrained with spatial-temporal co-teaching outperform models distilled with a single teacher on a multitude of video datasets. Our MVD with vanilla ViT achieves state-of-the-art performance compared with previous supervised or self-supervised methods on several challenging video downstream tasks. For example, with the ViT-Large model, our MVD achieves 86.4% and 76.7% Top-1 accuracy on Kinetics-400 and Something-Something-v2, outperforming VideoMAE by 1.2% and 2.4% respectively. When a larger ViT-Huge model is adopted, MVD achieves the state-of-the-art performance with 77.3% Top-1 accuracy on Something-Something-v2 and 41.1 mAP on AVA v2.2. Code will be available at https://github.com/ruiwang2021/mvd.
Submitted 6 March, 2023; v1 submitted 8 December, 2022;
originally announced December 2022.
-
ABN: Anti-Blur Neural Networks for Multi-Stage Deformable Image Registration
Authors:
Yao Su,
Xin Dai,
Lifang He,
Xiangnan Kong
Abstract:
Deformable image registration, i.e., the task of aligning multiple images into one coordinate system by non-linear transformation, serves as an essential preprocessing step for neuroimaging data. Recent research on deformable image registration is mainly focused on improving the registration accuracy using multi-stage alignment methods, where the source image is repeatedly deformed in stages by the same neural network until it is well-aligned with the target image. Conventional methods for multi-stage registration can often blur the source image as the pixel/voxel values are repeatedly interpolated from the image generated by the previous stage. However, maintaining image quality such as sharpness during image registration is crucial to medical data analysis. In this paper, we study the problem of anti-blur deformable image registration and propose a novel solution, called Anti-Blur Network (ABN), for multi-stage image registration. Specifically, we use a pair of short-term registration and long-term memory networks to learn the nonlinear deformations at each stage, where the short-term registration network learns how to improve the registration accuracy incrementally and the long-term memory network combines all the previous deformations so that interpolation can be performed on the raw image directly, preserving image sharpness. Extensive experiments on both natural and medical image datasets demonstrated that ABN can accurately register images while preserving their sharpness. Our code and data can be found at https://github.com/anonymous3214/ABN
Submitted 6 December, 2022;
originally announced December 2022.
-
Amplitude analysis of $B^0 \rightarrow \overline{D}^0 D_s^+ π^-$ and $B^+ \rightarrow D^- D_s^+ π^+$ decays
Authors:
LHCb collaboration,
R. Aaij,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
B. Adeva,
M. Adinolfi,
P. Adlarson,
H. Afsharnia,
C. Agapopoulou,
C. A. Aidala,
S. Aiola,
Z. Ajaltouni,
S. Akar,
K. Akiba,
J. Albrecht,
F. Alessio,
M. Alexander,
A. Alfonso Albero,
Z. Aliouche,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey
, et al. (1047 additional authors not shown)
Abstract:
Resonant contributions in $B^0 \rightarrow \overline{D}^0 D^+_sπ^-$ and $B^+\rightarrow D^- D^+_sπ^+$ decays are determined with an amplitude analysis, which is performed both separately and simultaneously, where in the latter case isospin symmetry between the decays is assumed. The analysis is based on data collected by the LHCb detector in proton-proton collisions at center-of-mass energies of 7, 8 and 13 $\rm{TeV}$. The full data sample corresponds to an integrated luminosity of 9 $\rm fb^{-1}$. A doubly charged spin-0 open-charm tetraquark candidate together with a neutral partner, both with masses near $2.9\,\rm{GeV}$, are observed in the $D_sπ$ decay channel.
Submitted 1 August, 2023; v1 submitted 5 December, 2022;
originally announced December 2022.
-
First observation of a doubly charged tetraquark and its neutral partner
Authors:
LHCb collaboration,
R. Aaij,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
B. Adeva,
M. Adinolfi,
P. Adlarson,
H. Afsharnia,
C. Agapopoulou,
C. A. Aidala,
S. Aiola,
Z. Ajaltouni,
S. Akar,
K. Akiba,
J. Albrecht,
F. Alessio,
M. Alexander,
A. Alfonso Albero,
Z. Aliouche,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey
, et al. (1047 additional authors not shown)
Abstract:
A combined amplitude analysis is performed for the decays $B^0 \rightarrow \overline{D}^0 D^+_sπ^-$ and $B^+\rightarrow D^- D^+_sπ^+$, which are related by isospin symmetry. The analysis is based on data collected by the LHCb detector in proton-proton collisions at center-of-mass energies of 7, 8 and 13$\,\rm{TeV}$. The full data sample corresponds to an integrated luminosity of 9$\,\rm{fb^{-1}}$. Two new resonant states with masses of $2.908\pm0.011\pm0.020\,\rm{GeV}$ and widths of $0.136\pm0.023\pm0.011\,\rm{GeV}$ are observed, which decay to $D^+_sπ^+$ and $D^+_sπ^-$ respectively. The former state indicates the first observation of a doubly charged open-charm tetraquark state with minimal quark content $[c\bar{s}u\bar{d}]$, and the latter state is a neutral tetraquark composed of $[c\bar{s}\bar{u}d]$ quarks. Both states are found to have spin-parity $0^+$, and their resonant parameters are consistent with each other, which suggests that they belong to an isospin triplet.
Submitted 1 August, 2023; v1 submitted 5 December, 2022;
originally announced December 2022.
-
Incentive-Aware Recommender Systems in Two-Sided Markets
Authors:
Xiaowu Dai,
Wenlu Xu,
Yuan Qi,
Michael I. Jordan
Abstract:
Online platforms in the Internet Economy commonly incorporate recommender systems that recommend products (or "arms") to users (or "agents"). A key challenge in this domain arises from myopic agents who are naturally incentivized to exploit by choosing the optimal arm based on current information, rather than exploring various alternatives to gather information that benefits the collective. We propose a novel recommender system that aligns with agents' incentives while achieving asymptotically optimal performance, as measured by regret in repeated interactions. Our framework models this incentive-aware system as a multi-agent bandit problem in two-sided markets, where the interactions of agents and arms are facilitated by recommender systems on online platforms. This model incorporates incentive constraints induced by agents' opportunity costs. In scenarios where opportunity costs are known to the platform, we show the existence of an incentive-compatible recommendation algorithm. This algorithm pools recommendations between a genuinely good arm and an unknown arm using a randomized and adaptive strategy. Moreover, when these opportunity costs are unknown, we introduce an algorithm that randomly pools recommendations across all arms, utilizing the cumulative loss from each arm as feedback for strategic exploration. We demonstrate that both algorithms satisfy an ex-post fairness criterion, which protects agents from over-exploitation. All code for using the proposed algorithms and reproducing results is made available on GitHub.
Submitted 18 June, 2024; v1 submitted 23 November, 2022;
originally announced November 2022.
-
Compacitification and positive mass theorem for fibered Euclidean end
Authors:
Xianzhe Dai,
Yukai Sun
Abstract:
In this note, we consider the positive mass theorem for Riemannian manifolds $(M^{n},g)$ asymptotic to $(\mathbb{R}^{k}\times X^{n-k}, g_{\mathbb{R}^{k}}+g_{X})$ for $k\geq 3$ by studying the corresponding compactification problem.
Submitted 26 November, 2022;
originally announced November 2022.
-
Detecting Entities in the Astrophysics Literature: A Comparison of Word-based and Span-based Entity Recognition Methods
Authors:
Xiang Dai,
Sarvnaz Karimi
Abstract:
Information Extraction from scientific literature can be challenging due to the highly specialised nature of such text. We describe our entity recognition methods developed as part of the DEAL (Detecting Entities in the Astrophysics Literature) shared task. The aim of the task is to build a system that can identify Named Entities in a dataset composed of scholarly articles from the astrophysics literature. We planned our participation so that it enables us to conduct an empirical comparison between word-based tagging and span-based classification methods. When evaluated on two hidden test sets provided by the organizer, our best-performing submission achieved $F_1$ scores of 0.8307 (validation phase) and 0.7990 (testing phase).
Submitted 24 November, 2022;
originally announced November 2022.
-
Self-Supervised Learning based on Heat Equation
Authors:
Yinpeng Chen,
Xiyang Dai,
Dongdong Chen,
Mengchen Liu,
Lu Yuan,
Zicheng Liu,
Youzuo Lin
Abstract:
This paper presents a new perspective on self-supervised learning based on extending the heat equation into a high-dimensional feature space. In particular, we remove time dependence by imposing a steady-state condition, and extend the remaining 2D Laplacian from x-y isotropic to linear correlated. Furthermore, we simplify it by splitting the x and y axes into two first-order linear differential equations. Such simplification explicitly models the spatial invariance along horizontal and vertical directions separately, supporting prediction across image blocks. This introduces a very simple masked image modeling (MIM) method, named QB-Heat.
QB-Heat leaves a single block the size of a quarter of the image unmasked and extrapolates the other three masked quarters linearly. It brings MIM to CNNs without bells and whistles, and even works well for pre-training light-weight networks that are suitable for both image classification and object detection without fine-tuning. Compared with MoCo-v2 on pre-training a Mobile-Former with 5.8M parameters and 285M FLOPs, QB-Heat is on par in linear probing on ImageNet, but clearly outperforms it in non-linear probing that adds a transformer block before the linear classifier (65.6% vs. 52.9%). When transferring to object detection with a frozen backbone, QB-Heat outperforms MoCo-v2 and supervised pre-training on ImageNet by 7.9 and 4.5 AP respectively.
This work provides an insightful hypothesis on the invariance within visual representation over different shapes and textures: the linear relationship between horizontal and vertical derivatives. The code will be publicly released.
Submitted 23 November, 2022;
originally announced November 2022.
-
Differentiable Fuzzy $\mathcal{ALC}$: A Neural-Symbolic Representation Language for Symbol Grounding
Authors:
Xuan Wu,
Xinhao Zhu,
Yizheng Zhao,
Xinyu Dai
Abstract:
Neural-symbolic computing aims at integrating robust neural learning and sound symbolic reasoning into a single framework, so as to leverage the complementary strengths of both of these, seemingly unrelated (maybe even contradictory) AI paradigms. The central challenge in neural-symbolic computing is to unify the formulation of neural learning and symbolic reasoning into a single framework with common semantics, that is, to seek a joint representation between a neural model and a logical theory that can support the basic grounding learned by the neural model and also stick to the semantics of the logical theory. In this paper, we propose differentiable fuzzy $\mathcal{ALC}$ (DF-$\mathcal{ALC}$) for this role, as a neural-symbolic representation language with the desired semantics. DF-$\mathcal{ALC}$ unifies the description logic $\mathcal{ALC}$ and neural models for symbol grounding; in particular, it infuses an $\mathcal{ALC}$ knowledge base into neural models through differentiable concept and role embeddings. We define a hierarchical loss to the constraint that the grounding learned by neural models must be semantically consistent with $\mathcal{ALC}$ knowledge bases. And we find that capturing the semantics in grounding solely by maximizing satisfiability cannot revise grounding rationally. We further define a rule-based loss for DF adapting to symbol grounding problems. The experiment results show that DF-$\mathcal{ALC}$ with rule-based loss can improve the performance of image object detectors in an unsupervised learning way, even in low-resource situations.
Submitted 1 December, 2022; v1 submitted 21 November, 2022;
originally announced November 2022.
-
Open charm production and asymmetry in $p$Ne collisions at $\sqrt{s_{\scriptscriptstyle\rm NN}} =$ 68.5 GeV
Authors:
LHCb collaboration,
R. Aaij,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
B. Adeva,
M. Adinolfi,
P. Adlarson,
H. Afsharnia,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
J. Albrecht,
F. Alessio,
M. Alexander,
A. Alfonso Albero,
Z. Aliouche,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey,
Y. Amhis
, et al. (1045 additional authors not shown)
Abstract:
A measurement of $D^0$ meson production by the LHCb experiment in its fixed-target configuration is presented. The production of $D^0$ mesons is studied with a beam of 2.5 TeV protons colliding on a gaseous neon target at rest, corresponding to a nucleon-nucleon centre-of-mass energy of $\sqrt{s_{\rm NN}}$ = 68.5 GeV. The sum of the $D^0$ and ${\overline D^0}$ production cross-section in $p$Ne collisions in the centre-of-mass rapidity range $y^{\star}\in [-2.29, 0]$ is found to be $σ_{D^{0}}^{y^\star \in [-2.29, 0]} = 48.2 \pm 0.3 \pm 4.5 \,μ\textrm{b/nucleon}$ where the first uncertainty is statistical and the second is systematic. The $D^0-{\overline D^0}$ production asymmetry is also evaluated and suggests a trend towards negative values at large negative $y^{\star}$. The considered models do not account precisely for all the features observed in the LHCb data, but theoretical predictions including 1$\%$ intrinsic charm and 10$\%$ recombination contributions better describe the data than the other models considered.
Submitted 20 February, 2024; v1 submitted 21 November, 2022;
originally announced November 2022.
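
The centre-of-mass energy quoted in the abstract above can be cross-checked with standard fixed-target kinematics. The following is a minimal sketch, not taken from the paper: it assumes the 2.5 TeV figure is the total beam energy and approximates the target nucleon mass by the proton mass; the function name `fixed_target_sqrt_s_nn` is hypothetical.

```python
import math

# Assumption: the nucleon mass is approximated by the proton mass, in GeV.
NUCLEON_MASS_GEV = 0.938272


def fixed_target_sqrt_s_nn(beam_energy_gev,
                           m_beam=NUCLEON_MASS_GEV,
                           m_target=NUCLEON_MASS_GEV):
    """Return sqrt(s) in GeV for a beam hitting a target nucleon at rest.

    In natural units: s = m_beam^2 + m_target^2 + 2 * E_beam * m_target.
    """
    s = m_beam ** 2 + m_target ** 2 + 2.0 * beam_energy_gev * m_target
    return math.sqrt(s)


# 2.5 TeV proton beam on a nucleon at rest.
sqrt_s_nn = fixed_target_sqrt_s_nn(2500.0)
print(f"sqrt(s_NN) = {sqrt_s_nn:.1f} GeV")
```

With these assumptions the result agrees with the 68.5 GeV stated in the abstract to the quoted precision.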
-
Long-lived particle reconstruction downstream of the LHCb magnet
Authors:
LHCb collaboration,
R. Aaij,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
A. A. Adefisoye,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
S. Aiola,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
F. Alessio,
M. Alexander,
Z. Aliouche,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey
, et al. (1110 additional authors not shown)
Abstract:
Charged-particle trajectories are usually reconstructed with the LHCb detector using combined information from the tracking devices placed upstream and downstream of the 4\,T\,m dipole magnet. Trajectories reconstructed using only information from the tracker downstream of the dipole magnet, which are referred to as T tracks, have not been used for physics analysis to date due to their limited momentum resolution. The challenges of the reconstruction of long-lived particles using T tracks for use in physics analyses are discussed and solutions are proposed. The feasibility and the tracking performance are studied using samples of long-lived $\varLambda$ and $K_S^0$ hadrons decaying between 6.0 and 7.6 metres downstream of the proton-proton collision point, thereby traversing most of the magnetic field region and providing maximal sensitivity to magnetic and electric dipole moments. The reconstruction can be expanded below this range for use in direct searches of exotic long-lived particles. The data used in this analysis have been recorded between 2015 and 2018 and correspond to an integrated luminosity of 6\,fb$^{-1}$. The results obtained demonstrate the possibility to further extend the fiducial volume and the physics reach of the LHCb experiment.
Submitted 27 August, 2024; v1 submitted 20 November, 2022;
originally announced November 2022.
-
Castling-ViT: Compressing Self-Attention via Switching Towards Linear-Angular Attention at Vision Transformer Inference
Authors:
Haoran You,
Yunyang Xiong,
Xiaoliang Dai,
Bichen Wu,
Peizhao Zhang,
Haoqi Fan,
Peter Vajda,
Yingyan Celine Lin
Abstract:
Vision Transformers (ViTs) have shown impressive performance but still require a high computation cost compared to convolutional neural networks (CNNs). One reason is that ViTs' attention measures global similarities and thus has quadratic complexity in the number of input tokens. Existing efficient ViTs adopt local attention (e.g., Swin) or linear attention (e.g., Performer), which sacrifice ViTs' capabilities of capturing either global or local context. In this work, we ask an important research question: Can ViTs learn both global and local context while being more efficient during inference? To this end, we propose a framework called Castling-ViT, which trains ViTs using both linear-angular attention and masked softmax-based quadratic attention, but then switches to having only linear-angular attention during ViT inference. Our Castling-ViT leverages angular kernels to measure the similarities between queries and keys via spectral angles. We further simplify it with two techniques: (1) a novel linear-angular attention mechanism: we decompose the angular kernels into linear terms and high-order residuals, and only keep the linear terms; and (2) we adopt two parameterized modules to approximate high-order residuals: a depthwise convolution and an auxiliary masked softmax attention to help learn both global and local information, where the masks for softmax attention are regularized to gradually become zeros and thus incur no overhead during ViT inference. Extensive experiments and ablation studies on three tasks consistently validate the effectiveness of the proposed Castling-ViT, e.g., achieving up to a 1.8% higher accuracy or 40% MACs reduction on ImageNet classification and 1.2 higher mAP on COCO detection under comparable FLOPs, as compared to ViTs with vanilla softmax-based attentions.
Submitted 25 July, 2024; v1 submitted 18 November, 2022;
originally announced November 2022.
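
The quadratic-versus-linear distinction mentioned in the abstract above can be illustrated with a generic, softmax-free attention. This is only a sketch of the general reordering idea and is not the linear-angular kernel of Castling-ViT: once the similarity is a plain dot product, (Q Kᵀ) V can be regrouped as Q (Kᵀ V), which avoids the n × n token-similarity matrix. All function names here are hypothetical, and normalization is omitted for brevity.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    """Transpose a matrix given as a list of rows."""
    return [list(row) for row in zip(*m)]


def quadratic_attention(q, k, v):
    # (Q K^T) V: materializes an n x n similarity matrix, cost O(n^2 d).
    return matmul(matmul(q, transpose(k)), v)


def linear_attention(q, k, v):
    # Q (K^T V): materializes only a d x d summary, cost O(n d^2).
    return matmul(q, matmul(transpose(k), v))


# n = 4 tokens, d = 2 features; integer entries keep the comparison exact.
q = [[1, 2], [3, 4], [5, 6], [7, 8]]
k = [[1, 0], [0, 1], [1, 1], [2, 1]]
v = [[1, 2], [3, 4], [5, 6], [7, 8]]

same = quadratic_attention(q, k, v) == linear_attention(q, k, v)
print(same)
```

Both orderings give the same output by associativity of matrix multiplication; only the intermediate shapes, and hence the cost as the token count n grows, differ.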
-
Simple Vertex Algebras Arising From Congruence Subgroups
Authors:
Xuanzhong Dai,
Bailin Song
Abstract:
The chiral de Rham complex, introduced by Malikov et al. in 1998, is a sheaf of vertex algebras on any complex analytic manifold or non-singular algebraic variety. Starting from the vertex algebra of global sections of the chiral de Rham complex on the upper half plane, we consider the subspace of $Γ$-invariant sections that are meromorphic at the cusps. The space is again a vertex operator algebra, with a linear basis consisting of lifting formulas of meromorphic modular forms. We will describe two types of lifting formulas, and generalize the Rankin-Cohen bracket to the meromorphic modular forms. As an application, we will show that the vertex algebras constructed by congruence subgroups are simple.
Submitted 16 August, 2024; v1 submitted 18 November, 2022;
originally announced November 2022.
-
A Bird's-eye View of Reranking: from List Level to Page Level
Authors:
Yunjia Xi,
Jianghao Lin,
Weiwen Liu,
Xinyi Dai,
Weinan Zhang,
Rui Zhang,
Ruiming Tang,
Yong Yu
Abstract:
Reranking, as the final stage of multi-stage recommender systems, refines the initial lists to maximize the total utility. With the development of multimedia and user interface design, the recommendation page has evolved to a multi-list style. Separately employing traditional list-level reranking methods for different lists overlooks the inter-list interactions and the effect of different page formats, thus yielding suboptimal reranking performance. Moreover, simply applying a shared network for all the lists fails to capture the commonalities and distinctions in user behaviors on different lists. To this end, we propose to draw a bird's-eye view of \textbf{page-level reranking} and design a novel Page-level Attentional Reranking (PAR) model. We introduce a hierarchical dual-side attention module to extract personalized intra- and inter-list interactions. A spatial-scaled attention network is devised to integrate the spatial relationship into pairwise item influences, which explicitly models the page format. The multi-gated mixture-of-experts module is further applied to capture the commonalities and differences of user behaviors between different lists. Extensive experiments on a public dataset and a proprietary dataset show that PAR significantly outperforms existing baseline models.
Submitted 16 November, 2022;
originally announced November 2022.
-
3D-Aware Encoding for Style-based Neural Radiance Fields
Authors:
Yu-Jhe Li,
Tao Xu,
Bichen Wu,
Ningyuan Zheng,
Xiaoliang Dai,
Albert Pumarola,
Peizhao Zhang,
Peter Vajda,
Kris Kitani
Abstract:
We tackle the task of NeRF inversion for style-based neural radiance fields (e.g., StyleNeRF). In this task, we aim to learn an inversion function to project an input image to the latent space of a NeRF generator and then synthesize novel views of the original image based on the latent code. Compared with GAN inversion for 2D generative models, NeRF inversion not only needs to 1) preserve the identity of the input image, but also 2) ensure 3D consistency in generated novel views. This requires the latent code obtained from the single-view image to be invariant across multiple views. To address this new challenge, we propose a two-stage encoder for style-based NeRF inversion. In the first stage, we introduce a base encoder that converts the input image to a latent code. To ensure the latent code is view-invariant and is able to synthesize 3D consistent novel view images, we utilize identity contrastive learning to train the base encoder. Second, to better preserve the identity of the input image, we introduce a refining encoder to refine the latent code and add finer details to the output image. Importantly, the novelty of this model lies in the design of its first-stage encoder, which produces the closest latent code lying on the latent manifold, so that the refinement in the second stage stays close to the NeRF manifold. Through extensive experiments, we demonstrate that our proposed two-stage encoder qualitatively and quantitatively exhibits superiority over the existing encoders for inversion in both image reconstruction and novel-view rendering.
Submitted 12 November, 2022;
originally announced November 2022.
-
First observation of the $B^+ \rightarrow D_s^+ D_s^- K^+$ decay
Authors:
LHCb collaboration,
R. Aaij,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
B. Adeva,
M. Adinolfi,
H. Afsharnia,
C. Agapopoulou,
C. A. Aidala,
S. Aiola,
Z. Ajaltouni,
S. Akar,
K. Akiba,
J. Albrecht,
F. Alessio,
M. Alexander,
A. Alfonso Albero,
Z. Aliouche,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey,
Y. Amhis
, et al. (1038 additional authors not shown)
Abstract:
The $B^+ \rightarrow D_s^+ D_s^- K^+$ decay is observed for the first time using proton-proton collision data collected by the LHCb detector at centre-of-mass energies of $7$, $8$ and $13\, \text{TeV}$, corresponding to an integrated luminosity of $9\,\text{fb}^{-1}$. Its branching fraction relative to that of the $B^{+} \rightarrow D^{+} D^{-} K^{+}$ decay is measured to be $$\frac{B\left(B^{+} \rightarrow D_s^{+} D_s^{-} K^{+}\right)}{B\left(B^{+} \rightarrow D^{+} D^{-} K^{+}\right)}=0.525 \pm 0.033 \pm 0.027 \pm 0.034,$$ where the first uncertainty is statistical, the second systematic, and the third is due to the uncertainties on the branching fractions of the $D_s^{\pm} \rightarrow K^{\mp} K^{\pm} π^{\pm}$ and $D^{\pm} \rightarrow K^{\mp} π^{\pm} π^{\pm}$ decays. This measurement fills an experimental gap in the knowledge of the family of Cabibbo$-$favoured $\bar{b} \rightarrow \bar{c} c \bar{s}$ transitions and opens the path for unique studies of spectroscopy in future.
Submitted 7 November, 2023; v1 submitted 9 November, 2022;
originally announced November 2022.
-
Optimizing for periodicity: a model-independent approach to flux crosstalk calibration for superconducting circuits
Authors:
X. Dai,
R. Trappen,
R. Yang,
S. M. Disseler,
J. I. Basham,
J. Gibson,
A. J. Melville,
B. M. Niedzielski,
R. Das,
D. K. Kim,
J. L. Yoder,
S. J. Weber,
C. F. Hirjibehedin,
D. A. Lidar,
A. Lupascu
Abstract:
Flux tunability is an important engineering resource for superconducting circuits. Large-scale quantum computers based on flux-tunable superconducting circuits face the problem of flux crosstalk, which needs to be accurately calibrated to realize high-fidelity quantum operations. Typical calibration methods either assume that circuit elements can be effectively decoupled and simple models can be applied, or require a large amount of data. Such methods become ineffective as the system size increases and circuit interactions become stronger. Here we propose a new method for calibrating flux crosstalk, which is independent of the underlying circuit model. Using the fundamental property that superconducting circuits respond periodically to external fluxes, crosstalk calibration of N flux channels can be treated as N independent optimization problems, with the objective functions being the periodicity of a measured signal depending on the compensation parameters. We demonstrate this method on a small-scale quantum annealing circuit based on superconducting flux qubits, achieving comparable accuracy with previous methods. We also show that the objective function usually has a nearly convex landscape, allowing efficient optimization.
Submitted 3 February, 2024; v1 submitted 2 November, 2022;
originally announced November 2022.
-
Unsupervised Learning of Structured Representations via Closed-Loop Transcription
Authors:
Shengbang Tong,
Xili Dai,
Yubei Chen,
Mingyang Li,
Zengyi Li,
Brent Yi,
Yann LeCun,
Yi Ma
Abstract:
This paper proposes an unsupervised method for learning a unified representation that serves both discriminative and generative purposes. While most existing unsupervised learning approaches focus on a representation for only one of these two goals, we show that a unified representation can enjoy the mutual benefits of having both. Such a representation is attainable by generalizing the recently proposed \textit{closed-loop transcription} framework, known as CTRL, to the unsupervised setting. This entails solving a constrained maximin game over a rate reduction objective that expands features of all samples while compressing features of augmentations of each sample. Through this process, we see discriminative low-dimensional structures emerge in the resulting representations. Under comparable experimental conditions and network complexities, we demonstrate that these structured representations enable classification performance close to state-of-the-art unsupervised discriminative representations, and conditionally generated image quality significantly higher than that of state-of-the-art unsupervised generative models. Source code can be found at https://github.com/Delay-Xili/uCTRL.
Submitted 30 October, 2022;
originally announced October 2022.
-
Observation of a resonant structure near the $D_s^+ D_s^-$ threshold in the $B^+\to D_s^+ D_s^- K^+$ decay
Authors:
LHCb collaboration,
R. Aaij,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
B. Adeva,
M. Adinolfi,
H. Afsharnia,
C. Agapopoulou,
C. A. Aidala,
S. Aiola,
Z. Ajaltouni,
S. Akar,
K. Akiba,
J. Albrecht,
F. Alessio,
M. Alexander,
A. Alfonso Albero,
Z. Aliouche,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey,
Y. Amhis
, et al. (1038 additional authors not shown)
Abstract:
An amplitude analysis of the $B^+\to D_s^+ D_s^- K^+$ decay is carried out to study for the first time its intermediate resonant contributions, using proton-proton collision data collected with the LHCb detector at centre-of-mass energies of 7, 8 and 13 TeV. A near-threshold peaking structure, referred to as $X(3960)$, is observed in the $D_s^+ D_s^-$ invariant-mass spectrum with significance greater than 12 standard deviations. The mass, width and the quantum numbers of the structure are measured to be $3956\pm5\pm10$ MeV, $43\pm13\pm8$ MeV and $J^{PC}=0^{++}$, respectively, where the first uncertainties are statistical and the second systematic. The properties of the new structure are consistent with recent theoretical predictions for a state composed of $c\bar{c}s\bar{s}$ quarks. Evidence for an additional structure is found around 4140 MeV in the $D_s^+ D_s^-$ invariant mass, which might be caused either by a new resonance with the $0^{++}$ assignment or by a $J/ψφ\leftrightarrow D_s^+ D_s^-$ coupled-channel effect.
Submitted 18 August, 2023; v1 submitted 26 October, 2022;
originally announced October 2022.
-
Observation of the $B^0_s\!\to D^{*+}D^{*-}$ decay
Authors:
LHCb collaboration,
R. Aaij,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
B. Adeva,
M. Adinolfi,
P. Adlarson,
H. Afsharnia,
C. Agapopoulou,
C. A. Aidala,
S. Aiola,
Z. Ajaltouni,
S. Akar,
K. Akiba,
J. Albrecht,
F. Alessio,
M. Alexander,
A. Alfonso Albero,
Z. Aliouche,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey
, et al. (1049 additional authors not shown)
Abstract:
The first observation of the $B^0_s\!\to D^{*+}D^{*-}$ decay and the measurement of its branching ratio relative to the $B^0\!\to D^{*+}D^{*-}$ decay are presented. The data sample used corresponds to an integrated luminosity of $9\,\text{fb}^{-1}$ of proton-proton collisions recorded by the LHCb experiment at centre-of-mass energies of 7, 8 and $13\,\text{TeV}$ between 2011 and 2018. The decay is observed with more than $10$ standard deviations and the time-integrated ratio of branching fractions is determined to be \begin{align*}
\frac{\mathcal{B}(B^0_s\!\to D^{*+}D^{*-})}{\mathcal{B}(B^0\!\to D^{*+}D^{*-})} = 0.269 \pm 0.032 \pm 0.011 \pm 0.008\, , \end{align*} where the first uncertainty is statistical, the second systematic and the third due to the uncertainty of the fragmentation fraction ratio $f_s/f_d$. The $B^0_s\!\to D^{*+}D^{*-}$ branching fraction is calculated to be \begin{align*}
\mathcal{B}(B^0_s\!\to D^{*+}D^{*-}) = (2.15 \pm 0.26 \pm 0.09 \pm 0.06 \pm 0.16)\times 10^{-4} \,, \end{align*} where the fourth uncertainty is due to the $B^0\!\to D^{*+}D^{*-}$ branching fraction. These results are calculated using the average $B^0_s$ meson lifetime in simulation. Correction factors are reported for scenarios where either a purely heavy or a purely light $B^0_s$ eigenstate is considered.
Submitted 17 July, 2023; v1 submitted 26 October, 2022;
originally announced October 2022.
-
Revisiting Sparse Convolutional Model for Visual Recognition
Authors:
Xili Dai,
Mingyang Li,
Pengyuan Zhai,
Shengbang Tong,
Xingjian Gao,
Shao-Lun Huang,
Zhihui Zhu,
Chong You,
Yi Ma
Abstract:
Despite strong empirical performance for image classification, deep neural networks are often regarded as ``black boxes'' and they are difficult to interpret. On the other hand, sparse convolutional models, which assume that a signal can be expressed by a linear combination of a few elements from a convolutional dictionary, are powerful tools for analyzing natural images with good theoretical interpretability and biological plausibility. However, such principled models have not demonstrated competitive performance when compared with empirically designed deep networks. This paper revisits the sparse convolutional modeling for image classification and bridges the gap between good empirical performance (of deep learning) and good interpretability (of sparse convolutional models). Our method uses differentiable optimization layers that are defined from convolutional sparse coding as drop-in replacements of standard convolutional layers in conventional deep neural networks. We show that such models have equally strong empirical performance on CIFAR-10, CIFAR-100, and ImageNet datasets when compared to conventional neural networks. By leveraging stable recovery property of sparse modeling, we further show that such models can be much more robust to input corruptions as well as adversarial perturbations in testing through a simple proper trade-off between sparse regularization and data reconstruction terms. Source code can be found at https://github.com/Delay-Xili/SDNet.
Submitted 24 October, 2022;
originally announced October 2022.
-
Structure-Unified M-Tree Coding Solver for MathWord Problem
Authors:
Bin Wang,
Jiangzhou Ju,
Yang Fan,
Xinyu Dai,
Shujian Huang,
Jiajun Chen
Abstract:
As one of the challenging NLP tasks, designing math word problem (MWP) solvers has attracted increasing research attention for the past few years. In previous work, models designed by taking into account the properties of the binary tree structure of mathematical expressions at the output side have achieved better performance. However, the expressions corresponding to a MWP are often diverse (e.g., $n_1+n_2 \times n_3-n_4$, $n_3\times n_2-n_4+n_1$, etc.), and so are the corresponding binary trees, which creates difficulties in model learning due to the non-deterministic output space. In this paper, we propose the Structure-Unified M-Tree Coding Solver (SUMC-Solver), which applies a tree with any M branches (M-tree) to unify the output structures. To learn the M-tree, we use a mapping to convert the M-tree into the M-tree codes, where codes store the information of the paths from tree root to leaf nodes and the information of leaf nodes themselves, and then devise a Sequence-to-Code (seq2code) model to generate the codes. Experimental results on the widely used MAWPS and Math23K datasets have demonstrated that SUMC-Solver not only outperforms several state-of-the-art models under similar experimental settings but also performs much better under low-resource conditions.
Submitted 25 October, 2022; v1 submitted 22 October, 2022;
originally announced October 2022.
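
The abstract above notes that equivalent expressions such as $n_1+n_2 \times n_3-n_4$ and $n_3\times n_2-n_4+n_1$ correspond to different binary trees. A minimal sketch of that observation follows, using Python's own expression parser as a stand-in for a binary-tree decoder; it does not implement the M-tree coding itself, and the sample values in `names` are arbitrary assumptions chosen for illustration.

```python
import ast

# Arbitrary illustrative values for the number slots (an assumption).
names = {"n1": 2, "n2": 3, "n3": 4, "n4": 5}

expr_a = "n1 + n2 * n3 - n4"
expr_b = "n3 * n2 - n4 + n1"

# Each expression parses to a binary tree of BinOp nodes.
tree_a = ast.dump(ast.parse(expr_a, mode="eval"))
tree_b = ast.dump(ast.parse(expr_b, mode="eval"))

# Evaluate both with only the number slots in scope.
value_a = eval(expr_a, {"__builtins__": {}}, names)
value_b = eval(expr_b, {"__builtins__": {}}, names)

print("same value:", value_a == value_b)
print("same binary tree:", tree_a == tree_b)
```

The two expressions evaluate to the same number while their parse trees differ, which is the kind of output-space ambiguity the abstract describes as motivating a unified tree structure.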
-
Measurement of the ratio of branching fractions $\mathcal{B}(B_c^+ \to B_s^0 π^+)/\mathcal{B}(B_c^+ \to J/ψπ^+)$
Authors:
LHCb collaboration,
R. Aaij,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
B. Adeva,
M. Adinolfi,
P. Adlarson,
H. Afsharnia,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
F. Alessio,
M. Alexander,
A. Alfonso Albero,
Z. Aliouche,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey
, et al. (1046 additional authors not shown)
Abstract:
The ratio of branching fractions of $B_c^+ \to B_s^0 π^+$ and $B_c^+ \to J/ψπ^+$ decays is measured with proton-proton collision data at a centre-of-mass energy of $13\,\text{TeV}$. The data were collected with the LHCb experiment during 2016--2018, corresponding to an integrated luminosity of $5.4\,\text{fb}^{-1}$. The $B_s^0$ mesons are reconstructed via the decays $B_s^0 \to J/ψφ$ and $B_s^0 \to D_s^- π^+$. The ratio of branching fractions is measured to be $\mathcal{B}(B_c^+ \to B_s^0 π^+)/\mathcal{B}(B_c^+ \to J/ψπ^+) = 91 \pm 10 \pm 8 \pm 3$ where the first uncertainty is statistical, the second is systematic and the third is due to the knowledge of the branching fractions of the intermediate state decays.
Submitted 18 July, 2023; v1 submitted 21 October, 2022;
originally announced October 2022.
-
Probing Cross-modal Semantics Alignment Capability from the Textual Perspective
Authors:
Zheng Ma,
Shi Zong,
Mianzhi Pan,
Jianbing Zhang,
Shujian Huang,
Xinyu Dai,
Jiajun Chen
Abstract:
In recent years, vision and language pre-training (VLP) models have advanced the state-of-the-art results in a variety of cross-modal downstream tasks. Aligning cross-modal semantics is claimed to be one of the essential capabilities of VLP models. However, the inner working mechanism of alignment in VLP models remains unclear. In this paper, we propose a new probing method based on image captioning to provide a first empirical study of the cross-modal semantics alignment of VLP models. Our probing method is built upon the fact that, given an image-caption pair, a VLP model will give a score indicating how well the two modalities are aligned; maximizing such scores will generate sentences that VLP models believe are well aligned. Analyzing these sentences thus reveals in what way different modalities are aligned and how good these alignments are in VLP models. We apply our probing method to five popular VLP models, including UNITER, ROSITA, ViLBERT, CLIP, and LXMERT, and provide a comprehensive analysis of the generated captions guided by these models. Our results show that VLP models (1) focus more on just aligning objects with visual words, while neglecting global semantics; (2) prefer fixed sentence patterns, thus ignoring more important textual information including fluency and grammar; and (3) deem captions with more visual words to be better aligned with images. These findings indicate that VLP models still have weaknesses in cross-modal semantics alignment and we hope this work will draw researchers' attention to such problems when designing a new VLP model.
Submitted 17 October, 2022;
originally announced October 2022.
-
Token Merging: Your ViT But Faster
Authors:
Daniel Bolya,
Cheng-Yang Fu,
Xiaoliang Dai,
Peizhao Zhang,
Christoph Feichtenhofer,
Judy Hoffman
Abstract:
We introduce Token Merging (ToMe), a simple method to increase the throughput of existing ViT models without needing to train. ToMe gradually combines similar tokens in a transformer using a general and light-weight matching algorithm that is as fast as pruning while being more accurate. Off-the-shelf, ToMe can 2x the throughput of state-of-the-art ViT-L @ 512 and ViT-H @ 518 models on images and 2.2x the throughput of ViT-L on video with only a 0.2-0.3% accuracy drop in each case. ToMe can also easily be applied during training, improving in practice training speed up to 2x for MAE fine-tuning on video. Training with ToMe further minimizes accuracy drop, leading to 2x the throughput of ViT-B on audio for only a 0.4% mAP drop. Qualitatively, we find that ToMe merges object parts into one token, even over multiple frames of video. Overall, ToMe's accuracy and speed are competitive with state-of-the-art on images, video, and audio.
Submitted 1 March, 2023; v1 submitted 17 October, 2022;
originally announced October 2022.
-
Understanding or Manipulation: Rethinking Online Performance Gains of Modern Recommender Systems
Authors:
Zhengbang Zhu,
Rongjun Qin,
Junjie Huang,
Xinyi Dai,
Yang Yu,
Yong Yu,
Weinan Zhang
Abstract:
Recommender systems are expected to be assistants that help human users find relevant information automatically without explicit queries. As recommender systems evolve, increasingly sophisticated learning techniques are applied and have achieved better performance in terms of user engagement metrics such as clicks and browsing time. The increase in the measured performance, however, can have two possible attributions: a better understanding of user preferences, and a more proactive ability to utilize human bounded rationality to seduce users into over-consumption. A natural follow-up question is whether current recommendation algorithms are manipulating user preferences. If so, can we measure the manipulation level? In this paper, we present a general framework for benchmarking the degree of manipulation of recommendation algorithms, in both slate recommendation and sequential recommendation scenarios. The framework consists of four stages: initial preference calculation, training data collection, algorithm training and interaction, and metrics calculation that involves two proposed metrics. We benchmark some representative recommendation algorithms in both synthetic and real-world datasets under the proposed framework. We have observed that a high online click-through rate does not necessarily mean a better understanding of users' initial preferences, but ends in prompting users to choose more documents they initially did not favor. Moreover, we find that the training data have notable impacts on the manipulation degrees, and algorithms with more powerful modeling abilities are more sensitive to such impacts. The experiments also verified the usefulness of the proposed metrics for measuring the degree of manipulation. We advocate that future recommendation algorithm studies should be treated as an optimization problem with constrained user preference manipulations.
Submitted 18 December, 2023; v1 submitted 11 October, 2022;
originally announced October 2022.
-
An Exploration of Hierarchical Attention Transformers for Efficient Long Document Classification
Authors:
Ilias Chalkidis,
Xiang Dai,
Manos Fergadiotis,
Prodromos Malakasiotis,
Desmond Elliott
Abstract:
Non-hierarchical sparse attention Transformer-based models, such as Longformer and Big Bird, are popular approaches to working with long documents. There are clear benefits to these approaches compared to the original Transformer in terms of efficiency, but Hierarchical Attention Transformer (HAT) models are a vastly understudied alternative. We develop and release fully pre-trained HAT models that use segment-wise followed by cross-segment encoders and compare them with Longformer models and partially pre-trained HATs. In several long document downstream classification tasks, our best HAT model outperforms equally-sized Longformer models while using 10-20% less GPU memory and processing documents 40-45% faster. In a series of ablation studies, we find that HATs perform better with cross-segment contextualization throughout the model than with alternative configurations that implement either early or late cross-segment contextualization. Our code is on GitHub: https://github.com/coastalcph/hierarchical-transformers.
Submitted 11 October, 2022;
originally announced October 2022.
-
Thurston's asymmetric metrics for Anosov representations
Authors:
León Carvajales,
Xian Dai,
Beatrice Pozzetti,
Anna Wienhard
Abstract:
We provide a good dynamical framework allowing to generalize Thurston's asymmetric metric and the associated Finsler norm from Teichmüller space to large classes of Anosov representations. In many cases, including the space of Hitchin representations, this gives a (possibly asymmetric) Finsler distance. In some cases we explicitly compute the associated Finsler norm.
Submitted 7 May, 2024; v1 submitted 11 October, 2022;
originally announced October 2022.
-
Label-Driven Denoising Framework for Multi-Label Few-Shot Aspect Category Detection
Authors:
Fei Zhao,
Yuchen Shen,
Zhen Wu,
Xinyu Dai
Abstract:
Multi-Label Few-Shot Aspect Category Detection (FS-ACD) is a new sub-task of aspect-based sentiment analysis, which aims to detect aspect categories accurately with limited training instances. Recently, dominant works use the prototypical network to accomplish this task, and employ the attention mechanism to extract keywords of aspect category from the sentences to produce the prototype for each aspect. However, they still suffer from serious noise problems: (1) due to lack of sufficient supervised data, the previous methods easily catch noisy words irrelevant to the current aspect category, which largely affects the quality of the generated prototype; (2) the semantically-close aspect categories usually generate similar prototypes, which are mutually noisy and confuse the classifier seriously. In this paper, we resort to the label information of each aspect to tackle the above problems, along with proposing a novel Label-Driven Denoising Framework (LDF). Extensive experimental results show that our framework achieves better performance than other state-of-the-art methods.
Submitted 9 October, 2022;
originally announced October 2022.
-
Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP
Authors:
Feng Liang,
Bichen Wu,
Xiaoliang Dai,
Kunpeng Li,
Yinan Zhao,
Hang Zhang,
Peizhao Zhang,
Peter Vajda,
Diana Marculescu
Abstract:
Open-vocabulary semantic segmentation aims to segment an image into semantic regions according to text descriptions, which may not have been seen during training. Recent two-stage methods first generate class-agnostic mask proposals and then leverage pre-trained vision-language models, e.g., CLIP, to classify masked regions. We identify the performance bottleneck of this paradigm to be the pre-trained CLIP model, since it does not perform well on masked images. To address this, we propose to finetune CLIP on a collection of masked image regions and their corresponding text descriptions. We collect training data by mining an existing image-caption dataset (e.g., COCO Captions), using CLIP to match masked image regions to nouns in the image captions. Compared with the more precise and manually annotated segmentation labels with fixed classes (e.g., COCO-Stuff), we find our noisy but diverse dataset can better retain CLIP's generalization ability. Along with finetuning the entire model, we utilize the "blank" areas in masked images using a method we dub mask prompt tuning. Experiments demonstrate mask prompt tuning brings significant improvement without modifying any weights of CLIP, and it can further improve a fully finetuned model. In particular, when trained on COCO and evaluated on ADE20K-150, our best model achieves 29.6% mIoU, which is +8.5% higher than the previous state-of-the-art. For the first time, open-vocabulary generalist models match the performance of supervised specialist models in 2017 without dataset-specific adaptations.
Submitted 1 April, 2023; v1 submitted 8 October, 2022;
originally announced October 2022.
-
Truncated atomic plane wave method for the subband structure calculations of Moiré systems
Authors:
Wangqian Miao,
Chu Li,
Xu Han,
Ding Pan,
Xi Dai
Abstract:
We propose a highly efficient and accurate numerical scheme named Truncated Atomic Plane Wave (TAPW) method to determine the subband structure of Twisted Bilayer Graphene (TBG) inspired by BM model. Our method utilizes real space information of carbon atoms in the moiré unit cell and projects the full tight binding Hamiltonian into a much smaller subspace using atomic plane waves. We present accurate electronic band structures of TBG in a wide range of twist angles together with detailed moiré potential and screened Coulomb interaction at the first magic angle using our new method. Furthermore, we generalize our formalism to solve the problem of low frequency moiré phonons in TBG.
Submitted 15 February, 2023; v1 submitted 5 October, 2022;
originally announced October 2022.
-
The Answer to Baggett's Problem is Affirmative
Authors:
Xingde Dai
Abstract:
Let $ψ$ be a Parseval wavelet in $L^2 (\R)$ with the space of negative dilates $V(ψ)$. The intersection of the dilates $V(ψ)$ is the zero space. In other words, we have \begin{align*}
\bigcap_{n\in\Z} D^n \overline{\textrm{span}}\{D^{\textrm{-}m} T^\ell ψ\mid m\geq 0, m,\ell\in\Z\}=\{0\}. \end{align*}
Submitted 3 October, 2022;
originally announced October 2022.
-
Revealing AGNs Through TESS Variability
Authors:
Helena P. Treiber,
Jason T. Hinkle,
Michael M. Fausnaugh,
Benjamin J. Shappee,
Christopher S. Kochanek,
Patrick J. Vallely,
Katie Auchettl,
Thomas W. S. Holoien,
Anna V. Payne,
Xinyu Dai
Abstract:
We used Transiting Exoplanet Survey Satellite (TESS) data to identify 29 candidate active galactic nuclei (AGNs) through their optical variability. The high-cadence, high-precision TESS light curves present a unique opportunity for the identification of AGNs, including those not selected through other methods. Of the candidates, we found that 18 have either previously been identified as AGNs in the literature or could have been selected based on emission-line diagnostics, mid-IR colors, or X-ray luminosity. AGNs in low-mass galaxies offer a window into supermassive black hole (SMBH) and galaxy co-evolution and 8 of the 29 candidates have estimated black hole masses $\mathrm{\lesssim 10^{6} M_{\odot}}$. The low-mass galaxies NGC 4395 and NGC 4449 are two of our five "high-confidence" candidates. By applying our methodology to the entire TESS main and extended mission datasets, we expect to identify $\sim$45 more AGN candidates, of which $\sim$26 will be new and $\sim$8 will be in low-mass galaxies.
Submitted 29 September, 2022;
originally announced September 2022.
-
Zero-Shot 3D Drug Design by Sketching and Generating
Authors:
Siyu Long,
Yi Zhou,
Xinyu Dai,
Hao Zhou
Abstract:
Drug design is a crucial step in the drug discovery cycle. Recently, various deep learning-based methods design drugs by generating novel molecules from scratch, avoiding traversing large-scale drug libraries. However, they depend on scarce experimental data or time-consuming docking simulation, leading to overfitting issues with limited training data and slow generation speed. In this study, we propose the zero-shot drug design method DESERT (Drug dEsign by SkEtching and geneRaTing). Specifically, DESERT splits the design process into two stages: sketching and generating, and bridges them with the molecular shape. The two-stage fashion enables our method to utilize the large-scale molecular database to reduce the need for experimental data and docking simulation. Experiments show that DESERT achieves a new state-of-the-art at a fast speed.
Submitted 4 October, 2022; v1 submitted 28 September, 2022;
originally announced September 2022.
-
Heavy fermion representation for twisted bilayer graphene systems
Authors:
Hao Shi,
Xi Dai
Abstract:
We construct a heavy fermion representation for twisted bilayer graphene (TBG) systems. Two local orbitals (per spin/valley) are analytically found, which are exactly the maximally localized zero modes of the continuum Hamiltonian near the AA-stacking center. They have similar properties to the Wannier functions in [arXiv:2111.05865v2], but also have a clear interpretation as the zeroth pseudo Landau levels (ZLL) of Dirac fermions under the uniform strain field created by twisting [arXiv:1810.03103v3]. The electronic states of TBG can be viewed as the hybridization between these ZLL orbitals and other itinerant states which can be obtained following the standard procedure of orthogonalized plane wave method. The "heavy fermion" model for TBG separates the strongly correlated components from the itinerant components and provides a solid base for the comprehensive understanding of the exotic physics in TBG.
Submitted 10 November, 2022; v1 submitted 20 September, 2022;
originally announced September 2022.
-
Nonparametric Estimation via Partial Derivatives
Authors:
Xiaowu Dai
Abstract:
Traditional nonparametric estimation methods often lead to a slow convergence rate in large dimensions and require unrealistically enormous sizes of datasets for reliable conclusions. We develop an approach based on partial derivatives, either observed or estimated, to effectively estimate the function at near-parametric convergence rates. The novel approach and computational algorithm could lead to methods useful to practitioners in many areas of science and engineering. Our theoretical results reveal a behavior universal to this class of nonparametric estimation problems. We explore a general setting involving tensor product spaces and build upon the smoothing spline analysis of variance (SS-ANOVA) framework. For $d$-dimensional models under full interaction, the optimal rates with gradient information on $p$ covariates are identical to those for the $(d-p)$-interaction models without gradients and, therefore, the models are immune to the "curse of interaction." For additive models, the optimal rates using gradient information are root-$n$, thus achieving the "parametric rate." We demonstrate aspects of the theoretical results through synthetic and real data applications.
Submitted 18 August, 2024; v1 submitted 15 September, 2022;
originally announced September 2022.
-
Hydra Attention: Efficient Attention with Many Heads
Authors:
Daniel Bolya,
Cheng-Yang Fu,
Xiaoliang Dai,
Peizhao Zhang,
Judy Hoffman
Abstract:
While transformers have begun to dominate many tasks in vision, applying them to large images is still computationally difficult. A large reason for this is that self-attention scales quadratically with the number of tokens, which in turn, scales quadratically with the image size. On larger images (e.g., 1080p), over 60% of the total computation in the network is spent solely on creating and applying attention matrices. We take a step toward solving this issue by introducing Hydra Attention, an extremely efficient attention operation for Vision Transformers (ViTs). Paradoxically, this efficiency comes from taking multi-head attention to its extreme: by using as many attention heads as there are features, Hydra Attention is computationally linear in both tokens and features with no hidden constants, making it significantly faster than standard self-attention in an off-the-shelf ViT-B/16 by a factor of the token count. Moreover, Hydra Attention retains high accuracy on ImageNet and, in some cases, actually improves it.
Submitted 15 September, 2022;
originally announced September 2022.
-
Landmark Tracking in Liver US images Using Cascade Convolutional Neural Networks with Long Short-Term Memory
Authors:
Yupei Zhang,
Xianjin Dai,
Zhen Tian,
Yang Lei,
Jacob F. Wynne,
Pretesh Patel,
Yue Chen,
Tian Liu,
Xiaofeng Yang
Abstract:
This study proposed a deep learning-based tracking method for ultrasound (US) image-guided radiation therapy. The proposed cascade deep learning model is composed of an attention network, a mask region-based convolutional neural network (mask R-CNN), and a long short-term memory (LSTM) network. The attention network learns a mapping from a US image to a suspected area of landmark motion in order to reduce the search region. The mask R-CNN then produces multiple region-of-interest (ROI) proposals in the reduced region and identifies the proposed landmark via three network heads: bounding box regression, proposal classification, and landmark segmentation. The LSTM network models the temporal relationship among the successive image frames for bounding box regression and proposal classification. To consolidate the final proposal, a selection method is designed according to the similarities between sequential frames. The proposed method was tested on the liver US tracking datasets used in the Medical Image Computing and Computer Assisted Interventions (MICCAI) 2015 challenges, where the landmarks were annotated by three experienced observers to obtain their mean positions. Five-fold cross-validation on the 24 given US sequences with ground truths shows that the mean tracking error for all landmarks is 0.65+/-0.56 mm, and the errors of all landmarks are within 2 mm. We further tested the proposed model on 69 landmarks from the testing dataset that has a similar image pattern to the training pattern, resulting in a mean tracking error of 0.94+/-0.83 mm. Our experimental results have demonstrated the feasibility and accuracy of our proposed method in tracking liver anatomic landmarks using US images, providing a potential solution for real-time liver tracking for active motion management during radiation therapy.
Submitted 14 September, 2022;
originally announced September 2022.
-
Singular Weyl's law with Ricci curvature bounded below
Authors:
Xianzhe Dai,
Shouhei Honda,
Jiayin Pan,
Guofang Wei
Abstract:
We establish two surprising types of Weyl's laws for some compact $\mathrm{RCD}(K, N)$/Ricci limit spaces. The first type could have power growth of any order (bigger than one). The other one has an order corrected by logarithm similar to some fractals even though the space is 2-dimensional. Moreover the limits in both types can be written in terms of the singular sets of null capacities, instead of the regular sets. These are the first examples with such features for $\mathrm{RCD}(K,N)$ spaces. Our results depend crucially on analyzing and developing important properties of the examples constructed by the last two authors, showing them isometric to the $α$-Grushin halfplanes. Of independent interest, this also allows us to provide counterexamples to conjectures by Cheeger-Colding and by Kapovitch-Kell-Ketterer.
Submitted 6 May, 2023; v1 submitted 29 August, 2022;
originally announced August 2022.
-
Open-Set Semi-Supervised Object Detection
Authors:
Yen-Cheng Liu,
Chih-Yao Ma,
Xiaoliang Dai,
Junjiao Tian,
Peter Vajda,
Zijian He,
Zsolt Kira
Abstract:
Recent developments for Semi-Supervised Object Detection (SSOD) have shown the promise of leveraging unlabeled data to improve an object detector. However, thus far these methods have assumed that the unlabeled data does not contain out-of-distribution (OOD) classes, which is unrealistic with larger-scale unlabeled datasets. In this paper, we consider a more practical yet challenging problem, Open-Set Semi-Supervised Object Detection (OSSOD). We first find that the existing SSOD method obtains a lower performance gain in open-set conditions, and that this is caused by semantic expansion, where the distracting OOD objects are mispredicted as in-distribution pseudo-labels for the semi-supervised training. To address this problem, we consider online and offline OOD detection modules, which are integrated with SSOD methods. Through extensive studies, we find that leveraging an offline OOD detector based on a self-supervised vision transformer performs favorably against online OOD detectors due to its robustness to the interference of pseudo-labeling. In the experiments, our proposed framework effectively addresses the semantic expansion issue and shows consistent improvements on many OSSOD benchmarks, including large-scale COCO-OpenImages. We also verify the effectiveness of our framework under different OSSOD conditions, including varying numbers of in-distribution classes, different degrees of supervision, and different combinations of unlabeled sets.
Submitted 29 August, 2022;
originally announced August 2022.
-
Video Mobile-Former: Video Recognition with Efficient Global Spatial-temporal Modeling
Authors:
Rui Wang,
Zuxuan Wu,
Dongdong Chen,
Yinpeng Chen,
Xiyang Dai,
Mengchen Liu,
Luowei Zhou,
Lu Yuan,
Yu-Gang Jiang
Abstract:
Transformer-based models have achieved top performance on major video recognition benchmarks. Benefiting from the self-attention mechanism, these models show a stronger ability to model long-range dependencies compared to CNN-based models. However, significant computation overheads, resulting from the quadratic complexity of self-attention on top of a tremendous number of tokens, limit the use of existing video transformers in applications with limited resources like mobile devices. In this paper, we extend Mobile-Former to Video Mobile-Former, which decouples the video architecture into lightweight 3D-CNNs for local context modeling and Transformer modules for global interaction modeling in a parallel fashion. To avoid the significant computational cost incurred by computing self-attention between the large number of local patches in videos, we propose to use very few global tokens (e.g., 6) for a whole video in Transformers to exchange information with 3D-CNNs with a cross-attention mechanism. Through efficient global spatial-temporal modeling, Video Mobile-Former significantly improves the video recognition performance of alternative lightweight baselines, and outperforms other efficient CNN-based models at the low FLOP regime from 500M to 6G total FLOPs on various video recognition tasks. It is worth noting that Video Mobile-Former is the first Transformer-based video model which constrains the computational budget within 1G FLOPs.
Submitted 25 August, 2022;
originally announced August 2022.