Search | arXiv e-print repository

S-EPOA: Overcoming the Indivisibility of Annotations with Skill-Driven Preference-Based Reinforcement Learning

Authors: Ni Mu, Yao Luan, Yiqin Yang, Qing-shan Jia

Abstract: Preference-based reinforcement learning (PbRL) stands out by utilizing human preferences as a direct reward signal, eliminating the need for intricate reward engineering. However, despite its potential, traditional PbRL methods are often constrained by the indivisibility of annotations, which impedes the learning process. In this paper, we introduce a new approach, the Skill-Enhanced Preference Optimization Algorithm (S-EPOA), which addresses the annotation indivisibility issue by integrating skill mechanisms into the preference learning framework. Specifically, we first conduct unsupervised pretraining to learn useful skills. Then, we propose a novel query selection mechanism to balance the information gain and discriminability over the learned skill space. Experimental results on a range of tasks, including robotic manipulation and locomotion, demonstrate that S-EPOA significantly outperforms conventional PbRL methods in terms of both robustness and learning efficiency. The results highlight the effectiveness of skill-driven learning in overcoming the challenges posed by annotation indivisibility.

Submitted 22 August, 2024; originally announced August 2024.

Comments: Submitted to AAAI 2025
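
Conventional PbRL methods commonly turn a pairwise human preference over two trajectory segments into a reward-learning signal with a Bradley-Terry model. The abstract above does not state S-EPOA's loss, so the following is only a minimal sketch of that generic mechanism, assuming per-step rewards predicted by some reward model; all function names are hypothetical.

```python
import math
from typing import Sequence


def segment_return(rewards: Sequence[float]) -> float:
    """Sum of predicted per-step rewards over one trajectory segment."""
    return float(sum(rewards))


def preference_probability(seg1: Sequence[float], seg2: Sequence[float]) -> float:
    """Bradley-Terry probability that segment 1 is preferred to segment 2.

    Equal to exp(R1) / (exp(R1) + exp(R2)), evaluated as a logistic function
    of R1 - R2 in a form that does not overflow for large return gaps.
    """
    diff = segment_return(seg1) - segment_return(seg2)
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)


def preference_loss(seg1: Sequence[float], seg2: Sequence[float], label: float) -> float:
    """Cross-entropy against a human label: 1.0 if segment 1 was preferred,
    0.0 if segment 2 was preferred, 0.5 for an 'equally good' annotation."""
    p = preference_probability(seg1, seg2)
    eps = 1e-12  # guards log(0) when the model is saturated
    return -(label * math.log(p + eps) + (1.0 - label) * math.log(1.0 - p + eps))
```

In this sketch a single label scores each segment as a whole; the model receives no per-step annotation.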

arXiv:2406.13121 [pdf, other]

Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?

Authors: Jinhyuk Lee, Anthony Chen, Zhuyun Dai, Dheeru Dua, Devendra Singh Sachan, Michael Boratko, Yi Luan, Sébastien M. R. Arnold, Vincent Perot, Siddharth Dalmia, Hexiang Hu, Xudong Lin, Panupong Pasupat, Aida Amini, Jeremy R. Cole, Sebastian Riedel, Iftekhar Naim, Ming-Wei Chang, Kelvin Guu

Abstract: Long-context language models (LCLMs) have the potential to revolutionize our approach to tasks traditionally reliant on external tools like retrieval systems or databases. Leveraging LCLMs' ability to natively ingest and process entire corpora of information offers numerous advantages. It enhances user-friendliness by eliminating the need for specialized knowledge of tools, provides robust end-to-end modeling that minimizes cascading errors in complex pipelines, and allows for the application of sophisticated prompting techniques across the entire system. To assess this paradigm shift, we introduce LOFT, a benchmark of real-world tasks requiring contexts of up to millions of tokens, designed to evaluate LCLMs' performance on in-context retrieval and reasoning. Our findings reveal LCLMs' surprising ability to rival state-of-the-art retrieval and RAG systems, despite never having been explicitly trained for these tasks. However, LCLMs still face challenges in areas like compositional reasoning that are required in SQL-like tasks. Notably, prompting strategies significantly influence performance, emphasizing the need for continued research as context lengths grow. Overall, LOFT provides a rigorous testing ground for LCLMs, showcasing their potential to supplant existing paradigms and tackle novel tasks as model capabilities scale.

Submitted 18 June, 2024; originally announced June 2024.

Comments: 29 pages. Dataset available at https://github.com/google-deepmind/loft

arXiv:2404.18704 [pdf, other]

A geometric approach for stability analysis of delay systems: Applications to network dynamics

Authors: Shijie Zhou, Yang Luan, Xuzhe Qian, Wei Lin

Abstract: Investigating the network stability or synchronization dynamics of multi-agent systems with time delays is of significant importance in numerous real-world applications. Such investigations often rely on solving the transcendental characteristic equations (TCEs) obtained from linearization of the considered systems around specific solutions. While stability results based on TCEs with real-valued coefficients induced by symmetric networks in time-delayed models have been extensively explored in the literature, there remains a notable gap in stability analysis for TCEs with complex-valued coefficients arising from asymmetric networked dynamics with time delays. To address this challenge comprehensively, we propose a rigorous geometric approach. By identifying and studying the stability crossing curves in the complex plane, we are able to determine the stability region of these systems. This approach is suitable not only for analyzing the stability of models with discrete time delays but also for models with various types of delays, including distributed time delays. It can also handle random networks. We demonstrate the efficacy of this approach in designing delayed control strategies for car-following systems, mechanical systems, and deep brain stimulation modeling, which involve complex-valued TCEs and/or different types of delays. Together, these applications highlight the broad applicability of our approach across diverse domains.

Submitted 6 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

Comments: No

MSC Class: 14J60 (Primary) 14F05 ACM Class: F.2.2
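
For intuition about the crossing idea in the abstract above, consider the textbook scalar equation x'(t) = a x(t) + b x(t - tau) with real a and complex b (for example, an eigenvalue of an asymmetric coupling matrix), whose TCE is lambda = a + b exp(-lambda tau). The sketch below computes the classical crossing points where a root lies on the imaginary axis. It is an illustration under these assumptions, not the paper's geometric crossing-curve method, and `crossing_points` is a hypothetical name. For this retarded equation, if the system is stable at tau = 0 (Re(a + b) < 0), roots can only enter the right half-plane through the imaginary axis, so the system stays stable for every delay below the smallest crossing delay.

```python
import cmath
import math
from typing import List, Tuple


def crossing_points(a: float, b: complex, k_max: int = 3) -> List[Tuple[float, float]]:
    """Delays tau > 0 at which  lam = a + b*exp(-lam*tau)  has a root lam = i*omega.

    Substituting lam = i*omega gives  i*omega - a = b*exp(-i*omega*tau).
      * moduli:    omega**2 + a**2 = |b|**2, so omega = +/- sqrt(|b|**2 - a**2)
      * arguments: -omega*tau = arg((i*omega - a) / b) + 2*pi*k
    Only branches with |k| <= k_max are returned, as (tau, omega) pairs sorted by tau.
    """
    if abs(b) <= abs(a):
        return []  # no nonzero crossing frequency exists
    w0 = math.sqrt(abs(b) ** 2 - a ** 2)
    points = []
    # b is complex, so the +omega and -omega branches are not mirror images
    for omega in (w0, -w0):
        phase = cmath.phase((1j * omega - a) / b)
        for k in range(-k_max, k_max + 1):
            tau = -(phase + 2.0 * math.pi * k) / omega
            if tau > 0:
                points.append((tau, omega))
    return sorted(points)
```

For a = 0 and b = -1, i.e. x'(t) = -x(t - tau), the smallest returned delay is pi/2, the well-known stability limit of that equation.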

arXiv:2404.10063 [pdf, other]

Adjusting for bias due to measurement error in functional quantile regression models with error-prone functional and scalar covariates

Authors: Xiwei Chen, Yuanyuan Luan, Roger S. Zoh, Lan Xue, Sneha Jadhav, Carmen D. Tekwe

Abstract: Wearable devices enable the continuous monitoring of physical activity (PA) but generate complex functional data with poorly characterized errors. Most work on functional data views the data as smooth, latent curves obtained at discrete time intervals with some random noise with mean zero and constant variance. Viewing this noise as homoscedastic and independent ignores potential serial correlations. Our preliminary studies indicate that failing to account for these serial correlations can bias estimates. In dietary assessments, epidemiologists often use self-reported measures based on food frequency questionnaires that are prone to recall bias. With the increased availability of complex, high-dimensional functional, and scalar biomedical data potentially prone to measurement errors, it is necessary to adjust for biases induced by these errors to permit accurate analyses in various regression settings. However, there has been limited work to address measurement errors in functional and scalar covariates in the context of quantile regression. Therefore, we developed new statistical methods based on simulation extrapolation (SIMEX) and mixed effects regression with repeated measures to correct for measurement error biases in this context. We conducted simulation studies to establish the finite sample properties of our new methods. The methods are illustrated through application to a real data set.

Submitted 15 April, 2024; originally announced April 2024.
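
SIMEX, named in the abstract above, is easiest to see in its simplest setting. The following is a minimal sketch assuming a scalar covariate, ordinary simple linear regression, and a known classical error variance sigma_u**2; it is not the authors' functional quantile regression method. It adds extra simulated noise at levels zeta, tracks how the naive slope degrades, fits a quadratic in zeta, and extrapolates to zeta = -1, the hypothetical error-free case. The function names, zeta grid, and replicate count are illustrative choices.

```python
import random
import statistics


def ols_slope(x, y):
    """Least-squares slope of y on x (simple linear regression with intercept)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def quadratic_fit(zs, gs):
    """Fit g = c0 + c1*z + c2*z**2 by least squares via the 3x3 normal equations."""
    A = [[sum(z ** (i + j) for z in zs) for j in range(3)] for i in range(3)]
    r = [sum(g * z ** i for z, g in zip(zs, gs)) for i in range(3)]
    # Gaussian elimination with partial pivoting
    for col in range(3):
        piv = max(range(col, 3), key=lambda row: abs(A[row][col]))
        A[col], A[piv] = A[piv], A[col]
        r[col], r[piv] = r[piv], r[col]
        for row in range(col + 1, 3):
            f = A[row][col] / A[col][col]
            for j in range(col, 3):
                A[row][j] -= f * A[col][j]
            r[row] -= f * r[col]
    # Back substitution
    c = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        c[i] = (r[i] - sum(A[i][j] * c[j] for j in range(i + 1, 3))) / A[i][i]
    return c


def simex_slope(w, y, sigma_u, zetas=(0.5, 1.0, 1.5, 2.0), n_rep=50, seed=0):
    """SIMEX slope estimate when w = x + u with u ~ N(0, sigma_u**2), sigma_u known."""
    rng = random.Random(seed)
    grid = [0.0] + list(zetas)
    means = [ols_slope(w, y)]  # zeta = 0 is the naive estimate
    for zeta in zetas:
        extra_sd = (zeta ** 0.5) * sigma_u  # total error variance becomes (1 + zeta) * sigma_u**2
        reps = []
        for _ in range(n_rep):
            w_star = [wi + rng.gauss(0.0, extra_sd) for wi in w]
            reps.append(ols_slope(w_star, y))
        means.append(statistics.fmean(reps))
    c0, c1, c2 = quadratic_fit(grid, means)
    return c0 - c1 + c2  # quadratic evaluated at zeta = -1


if __name__ == "__main__":
    rng = random.Random(42)
    n, beta, sigma_u = 2000, 2.0, 0.7
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    w = [xi + rng.gauss(0.0, sigma_u) for xi in x]
    y = [beta * xi + rng.gauss(0.0, 0.5) for xi in x]
    print("naive:", ols_slope(w, y), "SIMEX:", simex_slope(w, y, sigma_u))
```

The quadratic extrapolant is a common default; it usually removes much, but not necessarily all, of the attenuation bias.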

arXiv:2404.03918 [pdf, ps, other]

Dirac cohomology, branching laws and Wallach modules

Authors: Chao-Ping Dong, Yongzhi Luan, Haojun Xu

Abstract: The idea of using Dirac cohomology to study branching laws was initiated by Huang, Pandžić and Zhu in 2013 [HPZ]. One of their results says that the Dirac cohomology of $\pi$ completely determines $\pi|_{K}$, where $\pi$ is any irreducible unitarizable highest weight $(\mathfrak{g}, K)$ module. This paper aims to develop this idea for the exceptional Lie groups $E_{6(-14)}$ and $E_{7(-25)}$: we recover the $K$-spectrum of the Wallach modules from their Dirac cohomology.

Submitted 5 April, 2024; originally announced April 2024.

Comments: 17 pages, 4 figures, 5 tables

MSC Class: 22E46

arXiv:2403.20327 [pdf, other]

Gecko: Versatile Text Embeddings Distilled from Large Language Models

Authors: Jinhyuk Lee, Zhuyun Dai, Xiaoqi Ren, Blair Chen, Daniel Cer, Jeremy R. Cole, Kai Hui, Michael Boratko, Rajvi Kapadia, Wen Ding, Yi Luan, Sai Meher Karthik Duddu, Gustavo Hernandez Abrego, Weiqiang Shi, Nithi Gupta, Aditya Kusupati, Prateek Jain, Siddhartha Reddy Jonnalagadda, Ming-Wei Chang, Iftekhar Naim

Abstract: We present Gecko, a compact and versatile text embedding model. Gecko achieves strong retrieval performance by leveraging a key idea: distilling knowledge from large language models (LLMs) into a retriever. Our two-step distillation process begins with generating diverse, synthetic paired data using an LLM. Next, we further refine the data quality by retrieving a set of candidate passages for each query, and relabeling the positive and hard negative passages using the same LLM. The effectiveness of our approach is demonstrated by the compactness of Gecko. On the Massive Text Embedding Benchmark (MTEB), Gecko with 256 embedding dimensions outperforms all existing entries with a 768 embedding size. Gecko with 768 embedding dimensions achieves an average score of 66.31, competing with 7x larger models and 5x higher-dimensional embeddings.

Submitted 29 March, 2024; originally announced March 2024.

Comments: 18 pages

arXiv:2403.19651 [pdf, other]

MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions

Authors: Kai Zhang, Yi Luan, Hexiang Hu, Kenton Lee, Siyuan Qiao, Wenhu Chen, Yu Su, Ming-Wei Chang

Abstract: Image retrieval, i.e., finding desired images given a reference image, inherently encompasses rich, multi-faceted search intents that are difficult to capture solely using image-based measures. Recent works leverage text instructions to allow users to more freely express their search intents. However, they primarily focus on image pairs that are visually similar and/or can be characterized by a small set of pre-defined relations. The core thesis of this paper is that text instructions can enable retrieving images with richer relations beyond visual similarity. To show this, we introduce MagicLens, a series of self-supervised image retrieval models that support open-ended instructions. MagicLens is built on a key novel insight: image pairs that naturally occur on the same web pages contain a wide range of implicit relations (e.g., "inside view of"), and we can make those implicit relations explicit by synthesizing instructions via foundation models. Trained on 36.7M (query image, instruction, target image) triplets with rich semantic relations mined from the web, MagicLens achieves results comparable with or better than the prior best on eight benchmarks of various image retrieval tasks, while maintaining high parameter efficiency with a significantly smaller model size. Additional human analyses on a 1.4M-image unseen corpus further demonstrate the diversity of search intents supported by MagicLens. Code and models are publicly available at https://open-vision-language.github.io/MagicLens/.

Submitted 24 June, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

Comments: ICML 2024 (Oral); Project Website: https://open-vision-language.github.io/MagicLens/

arXiv:2402.00286 [pdf, ps, other]

Dirac series of $E_{8(-24)}$

Authors: Yi-Hao Ding, Chao-Ping Dong, Chengyu Du, Yong-Zhi Luan, Liang Yang

Abstract: This paper classifies the Dirac series of $E_{8(-24)}$, the linear quaternionic real form of complex $E_8$. One of our tools is a further sharpening of the 1969 Helgason-Johnson bound. Our calculation continues to support Vogan's fundamental parallelepiped conjecture.

Submitted 31 January, 2024; originally announced February 2024.

Comments: 32 pages

arXiv:2312.11805 [pdf, other]

Gemini: A Family of Highly Capable Multimodal Models

Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks, notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.

Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

arXiv:2311.07911 [pdf, other]

Instruction-Following Evaluation for Large Language Models

Authors: Jeffrey Zhou, Tianjian Lu, Swaroop Mishra, Siddhartha Brahma, Sujoy Basu, Yi Luan, Denny Zhou, Le Hou

Abstract: One core capability of Large Language Models (LLMs) is to follow natural language instructions. However, the evaluation of such abilities is not standardized: Human evaluations are expensive, slow, and not objectively reproducible, while LLM-based auto-evaluation is potentially biased or limited by the ability of the evaluator LLM. To overcome these issues, we introduce Instruction-Following Eval (IFEval) for large language models. IFEval is a straightforward and easy-to-reproduce evaluation benchmark. It focuses on a set of "verifiable instructions" such as "write in more than 400 words" and "mention the keyword of AI at least 3 times". We identified 25 types of those verifiable instructions and constructed around 500 prompts, with each prompt containing one or more verifiable instructions. We show evaluation results of two widely available LLMs on the market. Our code and data can be found at https://github.com/google-research/google-research/tree/master/instruction_following_eval

Submitted 14 November, 2023; originally announced November 2023.

MSC Class: 68T50 (Primary) 68T99 (Secondary) ACM Class: I.2.7
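
The two example instructions quoted in the abstract above can be checked programmatically without any evaluator model, which is what makes them "verifiable". The following is a minimal sketch with hypothetical function names; the official checkers live in the linked repository and may count words or match keywords differently. The two aggregate scores (all instructions in a prompt satisfied, versus the fraction of individual instructions satisfied) are an illustrative choice.

```python
import re
from typing import List, Tuple


def check_min_words(response: str, min_words: int) -> bool:
    """Instruction of the form 'write in more than N words'.
    Words are approximated as whitespace-separated tokens."""
    return len(response.split()) > min_words


def check_keyword_frequency(response: str, keyword: str, min_count: int) -> bool:
    """Instruction of the form 'mention the keyword K at least N times'.
    Counts case-insensitive whole-word matches."""
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return len(re.findall(pattern, response, flags=re.IGNORECASE)) >= min_count


def accuracy(results: List[List[bool]]) -> Tuple[float, float]:
    """results[i] holds one pass/fail flag per instruction attached to prompt i.
    Returns (prompt-level accuracy, instruction-level accuracy)."""
    prompt_level = sum(all(r) for r in results) / len(results)
    flags = [flag for r in results for flag in r]
    instruction_level = sum(flags) / len(flags)
    return prompt_level, instruction_level
```

A prompt carrying both example instructions would be scored as `[check_min_words(resp, 400), check_keyword_frequency(resp, "AI", 3)]`.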

arXiv:2305.12624 [pdf, other]

Scalable regression calibration approaches to correcting measurement error in multi-level generalized functional linear regression models with heteroscedastic measurement errors

Authors: Yuanyuan Luan, Roger S. Zoh, Erjia Cui, Xue Lan, Sneha Jadhav, Carmen D. Tekwe

Abstract: Wearable devices permit the continuous monitoring of biological processes, such as blood glucose metabolism, and behavior, such as sleep quality and physical activity. The continuous monitoring often occurs in epochs of 60 seconds over multiple days, resulting in high-dimensional longitudinal curves that are best described and analyzed as functional data. From this perspective, the functional data are smooth, latent functions obtained at discrete time intervals and prone to homoscedastic white noise. However, the assumption of homoscedastic errors might not be appropriate in this setting because the devices collect the data serially. While researchers have previously addressed measurement error in scalar covariates prone to errors, less work has been done on correcting measurement error in high-dimensional longitudinal curves prone to heteroscedastic errors. We present two new methods for correcting measurement error in longitudinal functional curves prone to complex measurement error structures in multi-level generalized functional linear regression models. These methods are based on two-stage scalable regression calibration. We assume that the distribution of the scalar responses and the surrogate measures prone to heteroscedastic errors both belong to the exponential family and that the measurement errors follow Gaussian processes. In simulations and sensitivity analyses, we established some finite sample properties of these methods. In our simulations, both regression calibration methods for correcting measurement error performed better than estimators based on averaging the longitudinal functional data or on observations from a single day. We also applied the methods to assess the relationship between physical activity and type 2 diabetes in community-dwelling adults in the United States who participated in the National Health and Nutrition Examination Survey.

Submitted 20 April, 2024; v1 submitted 21 May, 2023; originally announced May 2023.
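
The comparison in the abstract above (regression calibration versus averaging across days versus a single day) has a simple scalar analogue. The following is a minimal sketch assuming one scalar covariate measured on k >= 2 days, classical homoscedastic Gaussian errors W_ij = X_i + U_ij, and a linear outcome model; the paper's methods instead handle functional curves, heteroscedastic errors, and exponential-family responses. It replaces the day-averaged covariate with an estimate of E[X | mean of replicates] before regressing. Function names are hypothetical.

```python
import random
import statistics
from typing import List, Sequence


def ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Least-squares slope of y on x, with intercept."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def regression_calibration_slope(w_reps: List[List[float]], y: Sequence[float]) -> float:
    """Slope of y on the calibrated covariate E[X | W_bar], where W_ij = X_i + U_ij.

    Assumes the same number k >= 2 of replicates per subject and
    a positive estimated between-subject variance sigma_x^2.
    """
    k = len(w_reps[0])
    w_bar = [statistics.fmean(r) for r in w_reps]
    # Pooled within-subject variance estimates the error variance sigma_u^2
    sigma_u2 = statistics.fmean([statistics.variance(r) for r in w_reps])
    mu = statistics.fmean(w_bar)
    # Var(W_bar) = sigma_x^2 + sigma_u^2 / k
    sigma_x2 = statistics.variance(w_bar) - sigma_u2 / k
    lam = sigma_x2 / (sigma_x2 + sigma_u2 / k)  # reliability of W_bar
    x_hat = [mu + lam * (wb - mu) for wb in w_bar]  # calibrated covariate
    return ols_slope(x_hat, y)
```

Because the calibrated covariate is an affine shrinkage of the day average, this linear-model sketch amounts to dividing the averaged-data slope by the estimated reliability `lam`.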

arXiv:2304.02651 [pdf, other]

Generalized functional linear regression models with a mixture of complex function-valued and scalar-valued covariates prone to measurement error

Authors: Yuanyuan Luan, Roger S. Zoh, Sneha Jadhav, Lan Xue, Carmen D. Tekwe

Abstract: While extensive work has been done to correct for biases due to measurement error in scalar-valued covariates prone to errors in generalized linear regression models, limited work has been done to address biases associated with functional covariates prone to errors or the combination of scalar and functional covariates prone to errors in these models. We propose Simulation Extrapolation (SIMEX) and Regression Calibration approaches to correct for measurement errors associated with a mixture of functional and scalar covariates prone to classical measurement errors in generalized functional linear regression. The simulation extrapolation method is developed to handle the functional and scalar covariates prone to errors. We also develop methods based on regression calibration extended to our current measurement error settings. Extensive simulation studies are conducted to assess the finite sample performance of our developed methods. The methods are applied to the 2011-2014 cycles of the National Health and Nutrition Examination Survey data to assess the relationship of physical activity and total caloric intake with type 2 diabetes among community-dwelling adults living in the United States. We treat the device-based measures of physical activity as functional covariates prone to complex, arbitrary heteroscedastic errors, while the total caloric intake is considered a scalar-valued covariate prone to error. We also examine the characteristics of observed measurement errors in device-based physical activity by important demographic subgroups including age, sex, and race.

Submitted 12 May, 2023; v1 submitted 4 April, 2023; originally announced April 2023.

arXiv:2302.11713 [pdf, other]

Can Pre-trained Vision and Language Models Answer Visual Information-Seeking Questions?

Authors: Yang Chen, Hexiang Hu, Yi Luan, Haitian Sun, Soravit Changpinyo, Alan Ritter, Ming-Wei Chang

Abstract: Pre-trained vision and language models have demonstrated state-of-the-art capabilities over existing tasks involving images and texts, including visual question answering. However, it remains unclear whether these models possess the capability to answer questions that not only query visual content but are also knowledge-intensive and information-seeking. In this study, we introduce InfoSeek, a visual question answering dataset tailored for information-seeking questions that cannot be answered with only common sense knowledge. Using InfoSeek, we analyze various pre-trained visual question answering models and gain insights into their characteristics. Our findings reveal that state-of-the-art pre-trained multi-modal models (e.g., PaLI-X, BLIP2) face challenges in answering visual information-seeking questions, but fine-tuning on the InfoSeek dataset leads models to use fine-grained knowledge that was learned during their pre-training. Furthermore, we show that accurate visual entity recognition can be used to improve performance on InfoSeek by retrieving relevant documents, indicating significant room for improvement.

Submitted 17 October, 2023; v1 submitted 22 February, 2023; originally announced February 2023.

Comments: EMNLP 2023 (main conference); Our dataset and evaluation is available at https://open-vision-language.github.io/infoseek/

arXiv:2302.11154 [pdf, other]

Open-domain Visual Entity Recognition: Towards Recognizing Millions of Wikipedia Entities

Authors: Hexiang Hu, Yi Luan, Yang Chen, Urvashi Khandelwal, Mandar Joshi, Kenton Lee, Kristina Toutanova, Ming-Wei Chang

Abstract: Large-scale multi-modal pre-training models such as CLIP and PaLI exhibit strong generalization on various visual domains and tasks. However, existing image classification benchmarks often evaluate recognition on a specific domain (e.g., outdoor images) or a specific task (e.g., classifying plant species), which falls short of evaluating whether pre-trained foundational models are universal visual recognizers. To address this, we formally present the task of Open-domain Visual Entity recognitioN (OVEN), where a model needs to link an image to a Wikipedia entity with respect to a text query. We construct OVEN-Wiki by re-purposing 14 existing datasets with all labels grounded in a single label space: Wikipedia entities. OVEN challenges models to select among six million possible Wikipedia entities, making it a general visual recognition benchmark with the largest number of labels. Our study on state-of-the-art pre-trained models reveals large headroom in generalizing to the massive-scale label space. We show that a PaLI-based auto-regressive visual recognition model performs surprisingly well, even on Wikipedia entities that have never been seen during fine-tuning. We also find that existing pretrained models have different strengths: while PaLI-based models obtain higher overall performance, CLIP-based models are better at recognizing tail entities.

Submitted 23 February, 2023; v1 submitted 22 February, 2023; originally announced February 2023.

Comments: Dataset available at https://open-vision-language.github.io/oven

arXiv:2301.12980 [pdf]

doi 10.1038/nphoton.2017.65

Imaging exciton-polariton transport in MoSe2 waveguides

Authors: Fengrui Hu, Yilong Luan, M. E. Scott, Jiaqiang Yan, D. G. Mandrus, Xiaodong Xu, Z Fei

Abstract: The exciton polariton (EP), a half-light and half-matter quasiparticle, is potentially an important element for future photonic and quantum technologies. It provides both strong light-matter interactions and long-distance propagation that is necessary for applications associated with energy or information transfer. Recently, strongly-coupled cavity EPs at room temperature have been demonstrated in van der Waals (vdW) materials due to their strongly-bound excitons. Here we report a nano-optical imaging study of waveguide EPs in MoSe2, a prototypical vdW semiconductor. The measured propagation length of the EPs is sensitive to the excitation photon energy and reaches over 12 μm. The polariton wavelength can be conveniently altered from 600 nm down to 300 nm by controlling the waveguide thickness. Furthermore, we found an intriguing mode back-bending dispersion close to the exciton resonance. The observed EPs in vdW semiconductors could be useful in future nanophotonic circuits operating in the near-infrared to visible spectral regions.

Submitted 26 January, 2023; originally announced January 2023.

Comments: 21 pages

Journal ref: Nature Photonics 11, 356-360 (2017)

arXiv:2301.11646 [pdf]

doi 10.1103/PhysRevLett.119.247402

Real-space imaging of the tailored plasmons in twisted bilayer graphene

Authors: Fengrui Hu, Suprem R Das, Yilong Luan, T. -F. Chung, Yong P. Chen, Zhe Fei

Abstract: We report a systematic plasmonic study of twisted bilayer graphene (TBLG), which consists of two graphene layers stacked with a twist angle. Through real-space nanoimaging of TBLG single crystals with a wide distribution of twist angles, we find that TBLG supports confined infrared plasmons that are sensitively dependent on the twist angle. At small twist angles, TBLG has a plasmon wavelength comparable to that of single-layer graphene. At larger twist angles, the plasmon wavelength of TBLG increases significantly with apparently lower damping. Further analysis and modeling indicate that the observed twist-angle dependence of TBLG plasmons in the Dirac linear regime is mainly due to the Fermi-velocity renormalization, a direct consequence of interlayer electronic coupling. Our work unveils the tailored plasmonic characteristics of TBLG and deepens our understanding of the intriguing nano-optical physics in novel van der Waals coupled two-dimensional materials.

Submitted 26 January, 2023; originally announced January 2023.

Comments: 22 pages

Journal ref: Phys. Rev. Lett. 119, 247402 (2017)

arXiv:2301.11645 [pdf]

doi 10.1021/acs.nanolett.9b01945

Tailored plasmons in pentacene/graphene heterostructures with interlayer electron transfer

Authors: Fengrui Hu, Minsung Kim, Y. Zhang, Yilong Luan, Kai-Ming Ho, Yi Shi, Cai-Zhuang Wang, Xinran Wang, Zhe Fei

Abstract: Van der Waals (vdW) heterostructures, which are produced by the precise assembly of various two-dimensional (2D) materials, have demonstrated many novel properties and functionalities. Here we report a nano-plasmonic study of vdW heterostructures that were produced by depositing ordered molecular layers of pentacene on top of graphene. We find through nano-infrared (IR) imaging that surface plasmons formed due to the collective oscillations of Dirac fermions in graphene are highly sensitive to the adjacent pentacene layers. In particular, the plasmon wavelength declines systematically but nonlinearly with increasing pentacene thickness. Further analysis and density functional theory (DFT) calculations indicate that the observed peculiar thickness dependence is mainly due to the tunneling-type electron transfer from pentacene to graphene. Our work unveils a new method for tailoring graphene plasmons and deepens our understanding of the intriguing nano-optical phenomena due to interlayer couplings in novel vdW heterostructures.

Submitted 26 January, 2023; originally announced January 2023.

Comments: 21 pages

Journal ref: Nano Lett. 19, 6058-6064 (2019)

arXiv:2301.11381 [pdf]

doi 10.1021/acs.nanolett.1c03833

Imaging Anisotropic Waveguide Exciton Polaritons in Tin Sulfide

Authors: Yilong Luan, Hamidreza Zobeiri, Xinwei Wang, Eli Sutter, Peter Sutter, Zhe Fei

Abstract: In recent years, novel materials supporting in-plane anisotropic polaritons have attracted considerable research interest due to their capability of shaping nanoscale field distributions and controlling nanophotonic energy flows. Here we report a nano-optical imaging study of waveguide exciton polaritons (EPs) in tin sulfide (SnS) in the near-infrared (IR) region using scattering-type scanning near-field optical microscopy (s-SNOM). With s-SNOM, we mapped in real space the propagating EPs in SnS, which show a sensitive dependence on the excitation energy and sample thickness. Moreover, we found that both the polariton wavelength and propagation length are anisotropic in the sample plane. In particular, in a narrow spectral range from 1.32 to 1.44 eV, the EPs demonstrate quasi-one-dimensional propagation, which is rarely seen in natural polaritonic materials. Further analysis indicates that the observed polariton anisotropy originates from the different optical bandgaps and exciton binding energies along the two principal crystal axes of SnS.

Submitted 26 January, 2023; originally announced January 2023.

Comments: 21 pages

Journal ref: Nano Lett. 22, 4, 1497-1503 (2022)

arXiv:2301.11171 [pdf]

doi 10.1103/PhysRevApplied.13.034020

Tip- and plasmon-enhanced infrared nanoscopy for ultrasensitive molecular characterizations

Authors: Yilong Luan, Liam McDermott, Fengrui Hu, Zhe Fei

Abstract: We propose a novel method for ultrasensitive infrared (IR) vibrational spectroscopy of molecules with nanoscale footprints by combining the tip enhancement of the scattering-type scanning near-field optical microscope (s-SNOM) with the plasmon enhancement of the breathing-mode (BM) plasmon resonances of graphene nanodisks (GNDs). To demonstrate this, we developed a quantitative model capable of accurately computing the s-SNOM signals of nanoscale samples. Using this model, we show that the s-SNOM tip can effectively excite gate-tunable BM plasmonic resonances in GNDs with strong field enhancement and a sensitive dependence on GND size. Moreover, we demonstrate that the intense electric field of tip-excited plasmonic BMs can strongly enhance the IR vibrational modes of molecules. As a result, IR vibrational signatures of individual molecular particles with sizes down to 1-2 nm are readily observable by s-SNOM. Our study sheds light on future ultrasensitive IR biosensing that takes advantage of both tip and plasmon enhancement.

Submitted 26 January, 2023; originally announced January 2023.

Comments: 18 pages

Journal ref: Phys. Rev. Applied 13, 034020 (2020)

arXiv:2301.11157 [pdf]

doi 10.1103/PhysRevApplied.18.024052

Imaging Stacking-Dependent Surface Plasmon Polaritons in Trilayer Graphene

Authors: Yilong Luan, Jun Qian, Minsung Kim, Kai-Ming Ho, Yi Shi, Yun Li, Cai-Zhuang Wang, Michael C. Tringides, Zhe Fei

Abstract: We report a nano-infrared (IR) imaging study of trilayer graphene (TLG) with both ABA (Bernal) and ABC (rhombohedral) stacking orders using the scattering-type scanning near-field optical microscope (s-SNOM). With s-SNOM operating in the mid-IR region, we mapped in real space the surface plasmon polaritons (SPPs) of ABA-TLG and ABC-TLG, which are tunable with electrical gating. Through quantitative modeling of the plasmonic imaging data, we found that the plasmon wavelength of ABA-TLG is significantly larger than that of ABC-TLG, resulting in a sizable impedance mismatch and hence a strong plasmon reflection at the ABA/ABC lateral junction. Further analysis indicates that the different plasmonic responses of the two types of TLG are directly linked to their electronic structures and carrier properties. Our work uncovers the physics behind the stacking-dependent plasmonic responses of TLG and sheds light on future applications of TLG and the ABA/ABC junctions in IR plasmonics and planar nano-optics.

Submitted 26 January, 2023; originally announced January 2023.

Comments: 11 pages

Journal ref: Phys. Rev. Applied 18, 024052 (2022)

arXiv:2210.08868 [pdf, other]

Cerebrovascular Segmentation via Vessel Oriented Filtering Network

Authors: Zhanqiang Guo, Yao Luan, Jianjiang Feng, Wangsheng Lu, Yin Yin, Guangming Yang, Jie Zhou

Abstract: Accurate cerebrovascular segmentation from Magnetic Resonance Angiography (MRA) and Computed Tomography Angiography (CTA) is of great significance in the diagnosis and treatment of cerebrovascular pathology. Due to the complexity and topological variability of blood vessels, complete and accurate segmentation of the vascular network remains a challenge. In this paper, we propose a Vessel Oriented Filtering Network (VOF-Net), which embeds domain knowledge into a convolutional neural network. We design oriented filters for blood vessels according to the vessel orientation field, which is obtained by an orientation estimation network. Features extracted by oriented filtering are injected into the segmentation network so as to exploit the prior that blood vessels are slender, curved tubular structures. Experimental results on CTA and MRA datasets show that the proposed method is effective for vessel segmentation and that embedding the vessel-specific filters improves segmentation performance.

Submitted 17 October, 2022; originally announced October 2022.

arXiv:2210.01170 [pdf, ps, other]

Irreducible components of Hilbert scheme of points on non-reduced curves

Authors: Yuze Luan

Abstract: We classify the irreducible components of the Hilbert scheme of $n$ points on non-reduced algebraic plane curves, and give a formula for the multiplicities of the irreducible components. The irreducible components are indexed by partitions of $n$; all have dimension $n$; and their multiplicities are given as a polynomial in the parts of the corresponding partitions.

Submitted 23 October, 2023; v1 submitted 3 October, 2022; originally announced October 2022.
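
Because the abstract states that the irreducible components are indexed by partitions of $n$, the number of components equals the partition number p(n). The following small sketch enumerates those index sets. The multiplicity polynomial is not given in the abstract, so it is not reproduced here. The function name is illustrative.

```python
def partitions(n, max_part=None):
    """Yield the partitions of n as non-increasing tuples of positive parts."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()  # the empty partition closes each recursive branch
        return
    # Choose the largest part first, then partition the remainder with parts no larger.
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest


for p in partitions(4):
    print(p)
```

For n = 4, this prints (4,), (3, 1), (2, 2), (2, 1, 1) and (1, 1, 1, 1). Under the indexing stated above, that corresponds to five components, each of dimension 4.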

arXiv:2209.11755 [pdf, other]

Promptagator: Few-shot Dense Retrieval From 8 Examples

Authors: Zhuyun Dai, Vincent Y. Zhao, Ji Ma, Yi Luan, Jianmo Ni, Jing Lu, Anton Bakalov, Kelvin Guu, Keith B. Hall, Ming-Wei Chang

Abstract: Much recent research on information retrieval has focused on how to transfer from one task (typically with abundant supervised data) to various other tasks where supervision is limited, with the implicit assumption that it is possible to generalize from one task to all the rest. However, this overlooks the fact that there are many diverse and unique retrieval tasks, each targeting different search intents, queries, and search domains. In this paper, we suggest working on Few-shot Dense Retrieval, a setting where each task comes with a short description and a few examples. To amplify the power of a few examples, we propose Prompt-base Query Generation for Retriever (Promptagator), which leverages large language models (LLMs) as few-shot query generators and creates task-specific retrievers based on the generated data. Powered by the generalization ability of LLMs, Promptagator makes it possible to create task-specific end-to-end retrievers solely from a few examples, without using Natural Questions or MS MARCO to train question generators or dual encoders. Surprisingly, LLM prompting with no more than 8 examples allows dual encoders to outperform heavily engineered models trained on MS MARCO, such as ColBERT v2, by more than 1.2 nDCG on average on 11 retrieval sets. Further training standard-size re-rankers on the same generated data yields another 5.0-point nDCG improvement. Our studies show that query generation can be far more effective than previously observed, especially when a small amount of task-specific knowledge is given.

Submitted 23 September, 2022; originally announced September 2022.
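
The abstract describes prompting an LLM with no more than 8 (document, query) examples so that it generates task-specific queries, which then serve as training data for a retriever. The following minimal sketch shows only the prompt-assembly step. The "Document:/Query:" template, the optional instruction string and the function name are hypothetical choices, not the paper's actual prompt, and the LLM call itself is left out.

```python
def build_query_generation_prompt(examples, new_document, instruction=""):
    """Assemble a few-shot prompt asking an LLM to write a query for new_document.

    examples: list of (document, query) pairs. The template below is a
    hypothetical format used only for illustration.
    """
    if len(examples) > 8:
        raise ValueError("the few-shot setting described uses at most 8 examples")
    parts = [instruction] if instruction else []
    for doc, query in examples:
        parts.append(f"Document: {doc}\nQuery: {query}")
    # Leave the final query slot open for the LLM to complete.
    parts.append(f"Document: {new_document}\nQuery:")
    return "\n\n".join(parts)


prompt = build_query_generation_prompt(
    [("Aspirin reduces fever.", "does aspirin lower fever"),
     ("Ibuprofen is an NSAID.", "what class of drug is ibuprofen")],
    "Paracetamol is used to treat pain.",
)
print(prompt)
```

Each completion, paired with its document, would form one synthetic training pair. The abstract says such generated data is used to train dual encoders and re-rankers.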

arXiv:2204.06092 [pdf, other]

ASQA: Factoid Questions Meet Long-Form Answers

Authors: Ivan Stelmakh, Yi Luan, Bhuwan Dhingra, Ming-Wei Chang

Abstract: An abundance of datasets and availability of reliable evaluation metrics have resulted in strong progress in factoid question answering (QA). This progress, however, does not easily transfer to the task of long-form QA, where the goal is to answer questions that require in-depth explanations. The hurdles include (i) a lack of high-quality data, and (ii) the absence of a well-defined notion of the answer's quality. In this work, we address these problems by (i) releasing a novel dataset and a task that we call ASQA (Answer Summaries for Questions which are Ambiguous); and (ii) proposing a reliable metric for measuring performance on ASQA. Our task focuses on factoid questions that are ambiguous, that is, have different correct answers depending on interpretation. Answers to ambiguous questions should synthesize factual information from multiple sources into a long-form summary that resolves the ambiguity. In contrast to existing long-form QA tasks (such as ELI5), ASQA admits a clear notion of correctness: a user faced with a good summary should be able to answer different interpretations of the original ambiguous question. We use this notion of correctness to define an automated metric of performance for ASQA. Our analysis demonstrates an agreement between this metric and human judgments, and reveals a considerable gap between human performance and strong baselines.

Submitted 22 January, 2023; v1 submitted 12 April, 2022; originally announced April 2022.

Comments: A minor bug in computing the ROUGE score was fixed. The fix **did not** result in any changes in observations and conclusions

arXiv:2202.00711 [pdf, other]

A fully Bayesian semi-parametric scalar-on-function regression (SoFR) with measurement error using instrumental variables

Authors: Roger S. Zoh, Yuanyuan Luan, Carmen Tekwe

Abstract: Wearable devices such as the ActiGraph are now commonly used in health studies to monitor or track physical activity. This trend aligns well with the growing need to accurately assess the effects of physical activity on health outcomes such as obesity. When assessing the association between these device-based physical activity measures and health outcomes such as body mass index, the device-based data are treated as functions, while the outcome is scalar-valued. The regression model applied in these settings is scalar-on-function regression (SoFR). Most estimation approaches in SoFR assume that the functional covariates are precisely observed, or that the measurement errors are random errors. Violation of this assumption can lead to both under-estimation of the model parameters and sub-optimal analysis. The literature on measurement-error-corrected approaches in SoFR is sparse in the non-Bayesian setting and virtually non-existent in the Bayesian setting. This paper considers a fully nonparametric Bayesian measurement-error-corrected SoFR model that relaxes all the constraining assumptions often made in these models. Our estimation relies on an instrumental variable (IV) to identify the measurement error model. In addition, we introduce a scalar IV-quality parameter that is jointly estimated along with all other model parameters. Our method is easy to implement, and we demonstrate its finite-sample properties through an extensive simulation.
Finally, the developed methods are applied to the National Health and Nutrition Examination Survey to assess the relationship between wearable-device-based measures of physical activity and body mass index among adults living in the United States.

Submitted 9 November, 2022; v1 submitted 1 February, 2022; originally announced February 2022.

arXiv:2112.14549 [pdf, ps, other]

doi 10.1088/1674-1137/ace312

Nucleons as modified Ising models

Authors: Shu-Man Hu, Yin-Sen Luan, Ji Xu

Abstract: In this paper, we propose a map that connects nucleons bound in nuclei to Ising spins in the Ising model. This proposal is based on the fact that the states of nucleons and of Ising spins can be described by the same type of observables. We present a nuclear model as a correspondence to an explicit modified Ising model and qualitatively confirm the correctness of this map by simulation on a two-dimensional square lattice. This map could help us understand the profound connections between different physical systems.

Submitted 26 March, 2023; v1 submitted 29 December, 2021; originally announced December 2021.

Comments: 8 pages, 6 figures
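
The abstract confirms the map by simulation on a two-dimensional square lattice but does not specify the modified model. The following sketch therefore shows only the standard nearest-neighbour Ising model, H = -J Σ s_i s_j, sampled with single-spin-flip Metropolis updates under periodic boundary conditions. The function name, parameters and defaults are illustrative assumptions, and none of the nuclear-model modifications are represented.

```python
import math
import random


def metropolis_ising(L=16, beta=0.6, sweeps=200, J=1.0, seed=0):
    """Sample the standard 2D Ising model on an L x L periodic square lattice.

    Uses single-spin-flip Metropolis updates and returns the magnetization
    per spin of the final configuration.
    """
    rng = random.Random(seed)
    spins = [[rng.choice((-1, 1)) for _ in range(L)] for _ in range(L)]
    for _ in range(sweeps):
        for _ in range(L * L):  # one sweep = L*L attempted flips
            i, j = rng.randrange(L), rng.randrange(L)
            # Sum of the four nearest neighbours with periodic wrap-around.
            nb = (spins[(i + 1) % L][j] + spins[(i - 1) % L][j]
                  + spins[i][(j + 1) % L] + spins[i][(j - 1) % L])
            # Energy change if spin (i, j) is flipped.
            dE = 2.0 * J * spins[i][j] * nb
            # Metropolis rule: always accept downhill moves, else with prob exp(-beta*dE).
            if dE <= 0 or rng.random() < math.exp(-beta * dE):
                spins[i][j] = -spins[i][j]
    return sum(map(sum, spins)) / (L * L)


print(metropolis_ising(beta=0.2), metropolis_ising(beta=0.6))
```

For J = 1, the exact 2D critical coupling is β_c = ln(1 + √2)/2 ≈ 0.4407. Runs well above it tend to order, with |m| near 1, while runs well below it stay near m = 0. Short runs can also get trapped in multi-domain states.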

arXiv:2112.08558 [pdf, other]

CONQRR: Conversational Query Rewriting for Retrieval with Reinforcement Learning

Authors: Zeqiu Wu, Yi Luan, Hannah Rashkin, David Reitter, Hannaneh Hajishirzi, Mari Ostendorf, Gaurav Singh Tomar

Abstract: Compared to standard retrieval tasks, passage retrieval for conversational question answering (CQA) poses new challenges in understanding the current user question, as each question needs to be interpreted within the dialogue context. Moreover, it can be expensive to re-train well-established retrievers such as search engines that are originally developed for non-conversational queries. To facilitate their use, we develop a query rewriting model CONQRR that rewrites a conversational question in the context into a standalone question. It is trained with a novel reward function to directly optimize towards retrieval using reinforcement learning and can be adapted to any off-the-shelf retriever. CONQRR achieves state-of-the-art results on a recent open-domain CQA dataset containing conversations from three different sources, and is effective for two different off-the-shelf retrievers. Our extensive analysis also shows the robustness of CONQRR to out-of-domain dialogues as well as to zero query rewriting supervision.

Submitted 28 October, 2022; v1 submitted 15 December, 2021; originally announced December 2021.

Comments: EMNLP 2022 camera-ready

arXiv:2112.07899 [pdf, other]

Large Dual Encoders Are Generalizable Retrievers

Authors: Jianmo Ni, Chen Qu, Jing Lu, Zhuyun Dai, Gustavo Hernández Ábrego, Ji Ma, Vincent Y. Zhao, Yi Luan, Keith B. Hall, Ming-Wei Chang, Yinfei Yang

Abstract: It has been shown that dual encoders trained on one domain often fail to generalize to other domains for retrieval tasks. One widespread belief is that the bottleneck layer of a dual encoder, where the final score is simply a dot product between a query vector and a passage vector, is too limited to make dual encoders an effective retrieval model for out-of-domain generalization. In this paper, we challenge this belief by scaling up the size of the dual encoder model while keeping the bottleneck embedding size fixed. With multi-stage training, surprisingly, scaling up the model size brings significant improvement on a variety of retrieval tasks, especially for out-of-domain generalization. Experimental results show that our dual encoders, Generalizable T5-based dense Retrievers (GTR), significantly outperform existing sparse and dense retrievers on the BEIR dataset (Thakur et al., 2021). Most surprisingly, our ablation study finds that GTR is very data efficient, as it only needs 10% of the MS MARCO supervised data to achieve the best out-of-domain performance. All the GTR models are released at https://tfhub.dev/google/collections/gtr/1.

Submitted 15 December, 2021; originally announced December 2021.

arXiv:2112.03539 [pdf, other]

A Function-Based Approach to Model the Measurement Error in Wearable Devices

Authors: Sneha Jadhav, Carmen D. Tekwe, Yuanyuan Luan

Abstract: Physical activity (PA) is an important risk factor for many health outcomes. Wearable devices such as accelerometers are increasingly used in biomedical studies to understand the associations between PA and health outcomes. Statistical analyses involving accelerometer data are challenging due to three characteristics: (i) high dimensionality, (ii) temporal dependence, and (iii) measurement error. To address these challenges, we treat accelerometer-based measures of physical activity as a single function-valued covariate prone to measurement error. Specifically, to determine the relationship between PA and a health outcome of interest, we propose a regression model with a functional covariate that accounts for measurement error. Using regression calibration, we develop a two-step estimation method for the model parameters and establish their consistency. We also propose a test of the significance of the estimated model parameters. Simulation studies are conducted to compare the proposed methods with existing alternative approaches under varying scenarios. Finally, the developed methods are used to assess the relationship between PA intensity and BMI obtained from the National Health and Nutrition Examination Survey data.

Submitted 7 December, 2021; originally announced December 2021.
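
The two-step regression-calibration idea mentioned above can be illustrated in the simplest scalar case. The paper's covariate is function-valued, so this sketch is only an analogy. It assumes a hypothetical design with two replicate error-prone measurements W1 and W2 of a true covariate X under classical additive error. Step 1 replaces X by an estimate of E[X | W]. Step 2 regresses the outcome on that estimate. The function name and the replicate design are assumptions, not the authors' estimator.

```python
import numpy as np


def regression_calibration(y, w1, w2):
    """Two-step regression calibration with two replicate measurements.

    Assumes w_k = x + u_k with independent errors u_k of equal variance.
    Returns (slope, intercept) of y regressed on the calibrated covariate.
    """
    w_bar = (w1 + w2) / 2.0
    sigma_u2 = np.var(w1 - w2, ddof=1) / 2.0       # per-replicate error variance
    var_wbar = np.var(w_bar, ddof=1)
    sigma_x2 = var_wbar - sigma_u2 / 2.0            # variance of the true covariate
    lam = sigma_x2 / var_wbar                       # reliability ratio
    x_hat = w_bar.mean() + lam * (w_bar - w_bar.mean())  # step 1: estimate E[X | W]
    slope, intercept = np.polyfit(x_hat, y, 1)      # step 2: regress Y on X_hat
    return slope, intercept


rng = np.random.default_rng(0)
n = 5000
x = rng.normal(0.0, 1.0, n)                         # true (unobserved) covariate
w1 = x + rng.normal(0.0, 1.0, n)
w2 = x + rng.normal(0.0, 1.0, n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, n)
naive_slope = np.polyfit((w1 + w2) / 2.0, y, 1)[0]  # attenuated toward zero
rc_slope, _ = regression_calibration(y, w1, w2)
print(naive_slope, rc_slope)
```

With these simulated settings, the naive slope is attenuated to roughly 2/1.5 ≈ 1.33. The calibrated slope should land near the true value of 2.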

arXiv:2106.06462 [pdf, other]

Semi-Supervised and Unsupervised Sense Annotation via Translations

Authors: Bradley Hauer, Grzegorz Kondrak, Yixing Luan, Arnob Mallik, Lili Mou

Abstract: Acquisition of multilingual training data continues to be a challenge in word sense disambiguation (WSD). To address this problem, unsupervised approaches have been proposed to automatically generate sense annotations for training supervised WSD systems. We present three new methods for creating sense-annotated corpora which leverage translations, parallel bitexts, lexical resources, as well as contextual and synset embeddings. Our semi-supervised method applies machine translation to transfer existing sense annotations to other languages. Our two unsupervised methods refine sense annotations produced by a knowledge-based WSD system via lexical translations in a parallel corpus. We obtain state-of-the-art results on standard WSD benchmarks.

Submitted 17 September, 2021; v1 submitted 11 June, 2021; originally announced June 2021.

Comments: In proceedings of RANLP 2021

arXiv:2102.04407 [pdf]

doi 10.1063/5.0046789

Quantifying the Temperature of Heated Microdevices Using Scanning Thermal Probes

Authors: Amin Reihani, Shen Yan, Yuxuan Luan, Rohith Mittapally, Edgar Meyhofer, Pramod Reddy

Abstract: Quantifying the temperature of microdevices is critical for probing nanoscale energy transport. Such quantification is often accomplished by integrating resistance thermometers into microdevices. However, such thermometers frequently become structurally unstable and fail due to thermal stresses at elevated temperatures. Here, we show that custom-fabricated scanning thermal probes (STPs) with a sharp tip and an integrated heater/thermometer can accurately measure the temperature of microdevices held at elevated temperatures. This measurement is accomplished by introducing a modulated heat input to the STP after contacting the microdevice with the STP's tip, and characterizing the DC and AC components of the STP's temperature. From these measured temperature components, the tip-to-sample thermal resistance and the microdevice surface temperature are deduced via a simple lumped-capacitance model. The advances presented here can greatly facilitate temperature measurements of a variety of heated microdevices.

Submitted 8 February, 2021; originally announced February 2021.

Comments: Manuscript submitted to Applied Physics Letters

arXiv:2005.00181 [pdf, other]

Sparse, Dense, and Attentional Representations for Text Retrieval

Authors: Yi Luan, Jacob Eisenstein, Kristina Toutanova, Michael Collins

Abstract: Dual encoders perform retrieval by encoding documents and queries into dense low-dimensional vectors, scoring each document by its inner product with the query. We investigate the capacity of this architecture relative to sparse bag-of-words models and attentional neural networks. Using both theoretical and empirical analysis, we establish connections between the encoding dimension, the margin between gold and lower-ranked documents, and the document length, suggesting limitations in the capacity of fixed-length encodings to support precise retrieval of long documents. Building on these insights, we propose a simple neural model that combines the efficiency of dual encoders with some of the expressiveness of more costly attentional architectures, and explore sparse-dense hybrids to capitalize on the precision of sparse retrieval. These models outperform strong alternatives in large-scale retrieval.

Submitted 16 February, 2021; v1 submitted 30 April, 2020; originally announced May 2020.

Comments: To appear in TACL 2020. The arXiv version is a pre-MIT Press publication version
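
The dual-encoder scoring rule described above, where each document is scored by its inner product with the query vector, can be sketched in a few lines of NumPy. A sparse-dense hybrid can be sketched the same way. The term-overlap scorer is a hypothetical stand-in for a real sparse model such as BM25. The linear mixing weight `lam` is an assumed way of combining the two signals, not necessarily the paper's formulation.

```python
import numpy as np


def dense_scores(query_vec, doc_matrix):
    """Dual-encoder scoring: inner product of the query embedding with each document row."""
    return doc_matrix @ query_vec


def sparse_scores(query_terms, docs_terms):
    """Hypothetical sparse scorer: number of query terms present in each document."""
    q = set(query_terms)
    return np.array([sum(t in q for t in doc) for doc in docs_terms], dtype=float)


def hybrid_scores(dense, sparse, lam=0.5):
    """Assumed linear interpolation of dense and sparse scores."""
    return dense + lam * sparse


def top_k(scores, k):
    """Indices of the k highest-scoring documents, best first."""
    return [int(i) for i in np.argsort(-scores)[:k]]


q = np.array([1.0, 0.0])
D = np.array([[0.9, 0.1], [0.1, 0.9]])
dense = dense_scores(q, D)
sparse = sparse_scores(["a", "b"], [["a"], ["a", "b", "c"]])
print(top_k(hybrid_scores(dense, sparse, lam=0.5), 2))
print(top_k(hybrid_scores(dense, sparse, lam=1.0), 2))
```

In this toy example, increasing the weight on exact term overlap flips the ranking. This illustrates how a sparse component can add precision that a fixed-length dense vector misses.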

arXiv:2004.12006 [pdf, other]

Contextualized Representations Using Textual Encyclopedic Knowledge

Authors: Mandar Joshi, Kenton Lee, Yi Luan, Kristina Toutanova

Abstract: We present a method to represent input texts by contextualizing them jointly with dynamically retrieved textual encyclopedic background knowledge from multiple documents. We apply our method to reading comprehension tasks by encoding questions and passages together with background sentences about the entities they mention. We show that integrating background knowledge from text is effective for tasks focusing on factual reasoning and allows direct reuse of powerful pretrained BERT-style encoders. Moreover, knowledge integration can be further improved with suitable pretraining via a self-supervised masked language model objective over words in background-augmented input text. On TriviaQA, our approach obtains improvements of 1.6 to 3.1 F1 over comparable RoBERTa models which do not integrate background knowledge dynamically. On MRQA, a large collection of diverse QA datasets, we see consistent gains in-domain along with large improvements out-of-domain on BioASQ (2.1 to 4.2 F1), TextbookQA (1.6 to 2.0 F1), and DuoRC (1.1 to 2.0 F1).

Submitted 13 July, 2021; v1 submitted 24 April, 2020; originally announced April 2020.

Comments: Added experiments comparing linkers

arXiv:1911.09418 [pdf, other]

MSD: Multi-Self-Distillation Learning via Multi-classifiers within Deep Neural Networks

Authors: Yunteng Luan, Hanyu Zhao, Zhi Yang, Yafei Dai

Abstract: With the development of neural networks, more and more deep neural networks are adopted in various tasks, such as image classification. However, due to their huge computational overhead, these networks cannot be deployed on mobile devices or in other low-latency scenarios. To address this dilemma, multi-classifier convolutional networks have been proposed to allow faster inference via early classifiers. These networks rely on sophisticated designs to increase the accuracy of the early classifiers. However, naively training a multi-classifier network can hurt the performance (accuracy) of deep neural networks, as the early classifiers interfere with the feature generation process. In this paper, we propose a general training framework named multi-self-distillation learning (MSD), which mines knowledge from the different classifiers within the same network and increases the accuracy of every classifier. Our approach can be applied not only to multi-classifier networks, but also to modern CNNs (e.g., the ResNet series) augmented with additional side-branch classifiers. We use a sampling-based branch augmentation technique to transform a single-classifier network into a multi-classifier network. This reduces the capacity gap between different classifiers and improves the effectiveness of applying MSD.
Our experiments show that MSD improves the accuracy of various networks: it significantly enhances the accuracy of every classifier in an existing multi-classifier network (MSDNet), and it improves vanilla single-classifier networks with high-accuracy internal classifiers while also improving the final accuracy.

Submitted 2 December, 2019; v1 submitted 21 November, 2019; originally announced November 2019.
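
The idea of classifiers within one network teaching each other can be sketched as a combined loss. The abstract does not give MSD's exact objective, so this NumPy sketch uses a generic self-distillation form as an assumption. Each classifier receives cross-entropy on the labels, plus a temperature-softened KL term toward the average prediction of the other classifiers. The names `T` and `alpha` are illustrative hyperparameters, and the targets are treated as constants because no gradients are computed here.

```python
import numpy as np


def softmax(z, T=1.0):
    """Row-wise softmax of logits z at temperature T (numerically stabilized)."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def multi_self_distillation_loss(logits_list, labels, T=3.0, alpha=0.5):
    """Assumed generic multi-classifier self-distillation loss (not MSD's exact form).

    logits_list: one (batch, classes) logit array per classifier, at least two.
    """
    assert len(logits_list) >= 2, "need at least two classifiers to distill between"
    n = labels.shape[0]
    eps = 1e-12
    probs_T = [softmax(z, T) for z in logits_list]
    total = 0.0
    for k, z in enumerate(logits_list):
        p = softmax(z)
        ce = -np.mean(np.log(p[np.arange(n), labels] + eps))  # supervised term
        # Soft target: mean of the other classifiers' softened predictions.
        target = np.mean([probs_T[j] for j in range(len(logits_list)) if j != k], axis=0)
        kl = np.mean(np.sum(target * (np.log(target + eps) - np.log(probs_T[k] + eps)), axis=1))
        total += (1 - alpha) * ce + alpha * (T ** 2) * kl  # T^2 keeps KL scale comparable
    return total


rng = np.random.default_rng(1)
heads = [rng.normal(size=(4, 5)) for _ in range(3)]  # e.g. two early exits + final classifier
print(multi_self_distillation_loss(heads, np.array([0, 1, 2, 3])))
```

When all classifiers agree exactly, the KL terms vanish and only the weighted cross-entropy remains. Disagreement between the early and final classifiers adds a distillation penalty.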

arXiv:1909.03546 [pdf, other]

Entity, Relation, and Event Extraction with Contextualized Span Representations

Authors: David Wadden, Ulme Wennberg, Yi Luan, Hannaneh Hajishirzi

Abstract: We examine the capabilities of a unified, multi-task framework for three information extraction tasks: named entity recognition, relation extraction, and event extraction. Our framework (called DyGIE++) accomplishes all tasks by enumerating, refining, and scoring text spans designed to capture local (within-sentence) and global (cross-sentence) context. Our framework achieves state-of-the-art results across all tasks, on four datasets from a variety of domains. We perform experiments comparing different techniques to construct span representations. Contextualized embeddings like BERT perform well at capturing relationships among entities in the same or adjacent sentences, while dynamic span graph updates model long-range cross-sentence relationships. For instance, propagating span representations via predicted coreference links can enable the model to disambiguate challenging entity mentions. Our code is publicly available at https://github.com/dwadden/dygiepp and can be easily adapted for new tasks or datasets.

Submitted 9 September, 2019; v1 submitted 8 September, 2019; originally announced September 2019.

Comments: EMNLP 2019

arXiv:1905.07870 [pdf, other]

doi 10.18653/v1/P19-1191

PaperRobot: Incremental Draft Generation of Scientific Ideas

Authors: Qingyun Wang, Lifu Huang, Zhiying Jiang, Kevin Knight, Heng Ji, Mohit Bansal, Yi Luan

Abstract: We present PaperRobot, which acts as an automatic research assistant by (1) developing a deep understanding of a large collection of human-written papers in a target domain and constructing comprehensive background knowledge graphs (KGs); (2) creating new ideas by predicting links from the background KGs, combining graph attention and contextual text attention; and (3) incrementally writing some key elements of a new paper based on memory-attention networks: generating a paper abstract from the input title along with predicted related entities, generating the conclusion and future work from the abstract, and finally generating a title for a follow-on paper from the future work. Turing Tests, in which a biomedical domain expert is asked to compare a system output and a human-authored string, show that PaperRobot-generated abstracts, conclusion and future work sections, and new titles are chosen over human-written ones up to 30%, 24%, and 12% of the time, respectively.

Submitted 31 May, 2019; v1 submitted 20 May, 2019; originally announced May 2019.

Comments: 12 pages. Accepted by ACL 2019. Code and resources are available at https://github.com/EagleW/PaperRobot

arXiv:1904.03296 [pdf, other]

A General Framework for Information Extraction using Dynamic Span Graphs

Authors: Yi Luan, Dave Wadden, Luheng He, Amy Shah, Mari Ostendorf, Hannaneh Hajishirzi

Abstract: We introduce a general framework for several information extraction tasks that share span representations using dynamically constructed span graphs. The graphs are constructed by selecting the most confident entity spans and linking these nodes with confidence-weighted relation types and coreferences. The dynamic span graph allows coreference and relation type confidences to propagate through the graph to iteratively refine the span representations. This is unlike previous multi-task frameworks for information extraction in which the only interaction between tasks is in the shared first-layer LSTM. Our framework significantly outperforms the state-of-the-art on multiple information extraction tasks across multiple datasets reflecting different domains. We further observe that the span enumeration approach is good at detecting nested span entities, with significant F1 score improvement on the ACE dataset.

Submitted 5 April, 2019; originally announced April 2019.

Comments: NAACL 2019

arXiv:1904.02342 [pdf, other]

Text Generation from Knowledge Graphs with Graph Transformers

Authors: Rik Koncel-Kedziorski, Dhanush Bekal, Yi Luan, Mirella Lapata, Hannaneh Hajishirzi

Abstract: Generating texts which express complex ideas spanning multiple sentences requires a structured representation of their content (document plan), but these representations are prohibitively expensive to produce manually. In this work, we address the problem of generating coherent multi-sentence texts from the output of an information extraction system, and in particular a knowledge graph. Graphical knowledge representations are ubiquitous in computing, but pose a significant challenge for text generation techniques due to their non-hierarchical nature, collapsing of long-distance dependencies, and structural variety. We introduce a novel graph transforming encoder which can leverage the relational structure of such knowledge graphs without imposing linearization or hierarchical constraints. By incorporating this encoder into an encoder-decoder setup, we provide an end-to-end trainable system for graph-to-text generation that we apply to the domain of scientific text. Automatic and human evaluations show that our technique produces more informative texts which exhibit better document structure than competitive encoder-decoder methods.

Submitted 24 March, 2022; v1 submitted 4 April, 2019; originally announced April 2019.

Comments: Accepted as a long paper in NAACL 2019

arXiv:1903.01934 [pdf, other]

doi 10.1109/TUFFC.2019.2921103

Coherent Multi-Transducer Ultrasound Imaging

Authors: Laura Peralta, Alberto Gomez, Ying Luan, Baehyung Kim, Joseph V. Hajnal, Robert J. Eckersley

Abstract: An extended aperture has the potential to greatly improve ultrasound imaging performance. This work extends the effective aperture size by coherently compounding the received radio frequency data from multiple transducers. A framework is developed in which an ultrasound imaging system consisting of $N$ synchronized matrix arrays, each with a partly shared field of view, take turns transmitting plane waves. Only one individual transducer transmits at a time while all $N$ transducers receive simultaneously. The subwavelength localization accuracy required to combine information from multiple transducers is achieved without the use of any external tracking device. The method developed in this study is based on analysing the backscattered echoes received by the same transducer and resulting from a targeted scatterer point in the medium insonated by the multiple ultrasound probes of the system. The current transducer locations, along with the speed of sound in the medium, are deduced by optimizing the cross-correlation between these echoes. The method is demonstrated experimentally in 2-D using ultrasound point and anechoic lesion phantoms, and a first demonstration of a free-hand experiment is also shown. Results demonstrate that coherent multi-transducer imaging has the potential to improve ultrasound image quality, improving resolution and target detectability.
Lateral resolution, contrast and contrast-to-noise ratio improved from 0.67 mm, -6.708 dB and 0.702, respectively, when using a single probe, to 0.18 mm, -7.251 dB and 0.721 in the coherent multi-transducer imaging case.

Submitted 5 March, 2019; originally announced March 2019.

MSC Class: Mechanics of deformable solids

arXiv:1901.00401 [pdf, other]

Information Extraction from Scientific Literature for Method Recommendation

Authors: Yi Luan

Abstract: As a research community grows, more and more papers are published each year. As a result, there is increasing demand for improved methods for finding relevant papers, automatically understanding the key ideas, and recommending potential methods for a target problem. Despite advances in search engines, it is still hard to identify new technologies according to a researcher's need. Due to the large variety of domains and extremely limited annotated resources, there has been relatively little work on leveraging natural language processing in scientific recommendation. In this proposal, we aim to make scientific recommendations by extracting scientific terms from a large collection of scientific papers and organizing the terms into a knowledge graph. In preliminary work, we trained a scientific term extractor using a small amount of annotated data and obtained state-of-the-art performance by leveraging a large amount of unannotated papers through multiple semi-supervised approaches. We propose to construct a knowledge graph in a way that makes minimal use of hand-annotated data, using only the extracted terms, unsupervised relational signals such as co-occurrence, and structural external resources such as Wikipedia. Latent relations between scientific terms can be learned from the graph. Recommendations will be made through graph inference for both observed and unobserved relational pairs.

Submitted 13 December, 2018; originally announced January 2019.

Comments: Thesis Proposal. arXiv admin note: text overlap with arXiv:1708.06075

arXiv:1809.08703 [pdf, other]

Monolingual sentence matching for text simplification

Authors: Yonghui Huang, Yunhui Li, Yi Luan

Abstract: This work improves monolingual sentence alignment for text simplification, specifically for text in standard and simple Wikipedia. We introduce a convolutional neural network structure to model similarity between two sentences. Due to the limited availability of parallel corpora, the model is trained in a semi-supervised way, using the output of a knowledge-based, high-performance aligning system. We apply the resulting similarity score to rescore the knowledge-based output, and adapt the model with a small hand-aligned dataset. Experiments show that both rescoring and adaptation improve the performance of the knowledge-based method.

Submitted 19 September, 2018; originally announced September 2018.

arXiv:1808.09602 [pdf, other]

Multi-Task Identification of Entities, Relations, and Coreference for Scientific Knowledge Graph Construction

Authors: Yi Luan, Luheng He, Mari Ostendorf, Hannaneh Hajishirzi

Abstract: We introduce a multi-task setup of identifying and classifying entities, relations, and coreference clusters in scientific articles. We create SciERC, a dataset that includes annotations for all three tasks, and develop a unified framework called Scientific Information Extractor (SciIE) with shared span representations. The multi-task setup reduces cascading errors between tasks and leverages cross-sentence relations through coreference links. Experiments show that our multi-task model outperforms previous models in scientific information extraction without using any domain-specific features. We further show that the framework supports construction of a scientific knowledge graph, which we use to analyze information in scientific literature.

Submitted 28 August, 2018; originally announced August 2018.

Journal ref: EMNLP 2018

arXiv:1808.08643 [pdf, other]

Scientific Relation Extraction with Selectively Incorporated Concept Embeddings

Authors: Yi Luan, Mari Ostendorf, Hannaneh Hajishirzi

Abstract: This paper describes our submission for the SemEval 2018 Task 7 shared task on semantic relation extraction and classification in scientific papers. We extend the end-to-end relation extraction model of Miwa and Bansal with enhancements such as a character-level encoding and an attention mechanism for selecting pretrained concept candidate embeddings. Our official submission ranked second in the relation classification task (Subtask 1.1 and Subtask 2 Scenario 2), and first in the relation extraction task (Subtask 2 Scenario 1).

Submitted 26 August, 2018; originally announced August 2018.

arXiv:1808.06729 [pdf, other]

You Shall Know the Most Frequent Sense by the Company it Keeps

Authors: Bradley Hauer, Yixing Luan, Grzegorz Kondrak

Abstract: Identification of the most frequent sense (MFS) of a polysemous word is an important semantic task. We introduce two concepts that can benefit MFS detection: companions, which are the most frequently co-occurring words, and the most frequent translation in a bitext. We present two novel methods that incorporate these new concepts, and show that they advance the state of the art on MFS detection.

Submitted 15 February, 2019; v1 submitted 20 August, 2018; originally announced August 2018.

Comments: Updated to reflect the camera-ready version accepted to ICSC 2019

Journal ref: Proceedings of IEEE ICSC 2019

arXiv:1806.09250 [pdf]

doi 10.1109/TNS.2019.2900480

Electronics of Time-of-flight Measurement for Back-n at CSNS

Authors: T. Yu, P. Cao, X. Y. Ji, L. K. Xie, X. R. Huang, Q. An, H. Y. Bai, J. Bao, Y. H. Chen, P. J. Cheng, Z. Q. Cui, R. R. Fan, C. Q. Feng, M. H. Gu, Z. J. Han, G. Z. He, Y. C. He, Y. F. He, H. X. Huang, W. L. Huang, X. L. Ji, H. Y. Jiang, W. Jiang, H. Y. Jing, L. Kang, et al. (46 additional authors not shown)

Abstract: Back-n is a white neutron experimental facility at China Spallation Neutron Source (CSNS). The time structure of the primary proton beam makes it fully applicable to use the TOF (time-of-flight) method for neutron energy measurement. We implement the electronics of TOF measurement on the general-purpose readout electronics designed for all seven detectors in Back-n. The electronics is based on the PXIe (Peripheral Component Interconnect Express eXtensions for Instrumentation) platform, which is composed of FDM (Field Digitizer Modules), TCM (Trigger and Clock Module), and SCM (Signal Conditioning Module). The T0 signal, synchronous with the CSNS accelerator, represents the neutron emission from the target and serves as the start time stamp. The trigger and clock module (TCM) receives, synchronizes and distributes the T0 signal to each FDM over the PXIe backplane bus. Meanwhile, detector signals, after being conditioned, are fed into FDMs for waveform digitizing. The first sample point of the signal serves as the stop time stamp. From the start and stop time stamps and the signal's time over threshold, the total TOF can be obtained. An FPGA-based (Field Programmable Gate Array) TDC is implemented on the TCM to accurately acquire the time interval between the asynchronous T0 signal and the global synchronous clock phase. There is also an FPGA-based TDC on each FDM to accurately acquire the time interval between T0 arriving at the FDM and the first sample point of the detector signal, while the time over threshold of the signal is obtained offline.
This method for TOF measurement is efficient and requires no additional modules. Test results show that the TOF accuracy is sub-nanosecond, meeting the requirement for Back-n at CSNS.

Submitted 24 June, 2018; originally announced June 2018.

Comments: 4 pages, 13 figures, 21st IEEE Real Time Conference

arXiv:1806.09249 [pdf]

T0 Fan-out for Back-n White Neutron Facility at CSNS

Authors: X. Y. Ji, P. Cao, T. Yu, L. K. Xie, X. R. Huang, Q. An, H. Y. Bai, J. Bao, Y. H. Chen, P. J. Cheng, Z. Q. Cui, R. R. Fan, C. Q. Feng, M. H. Gu, Z. J. Han, G. Z. He, Y. C. He, Y. F. He, H. X. Huang, W. L. Huang, X. L. Ji, H. Y. Jiang, W. Jiang, H. Y. Jing, L. Kang, et al. (46 additional authors not shown)

Abstract: The main physics goal of the Back-n white neutron facility at China Spallation Neutron Source (CSNS) is to measure nuclear data. The energy of neutrons is one of the most important parameters for measuring nuclear data. The time-of-flight (TOF) method is used to obtain the energy of neutrons. The time when proton bunches hit the thick tungsten target is considered the start point of TOF. The T0 signal, generated by the CSNS accelerator, represents this start time. The T0 signal is also used as the gate control signal that triggers the readout electronics. The timing precision of T0 therefore directly affects the measurement precision of TOF and the operation of the readout electronics. In this paper, the T0 fan-out for the Back-n white neutron facility at CSNS is proposed. The T0 signal travelling from the CSNS accelerator is fanned out over long cables to each of the two underground experimental stations. To guarantee the timing precision, the T0 signal is conditioned to have a clean edge. Furthermore, signal pre-emphasis and equalization techniques are used to improve signal quality after T0 is transmitted over cables about 100 m long. Experiments show that the T0 fan-out works well: the T0 signal transmitted over 100 m retains a good time resolution, with a standard deviation of 25 ps. This fully meets the accuracy required for TOF measurement.

Submitted 24 June, 2018; originally announced June 2018.

Comments: 3 pages, 6 figures, the 21st IEEE Real Time Conference

arXiv:1711.07660 [pdf, other]

doi 10.1103/PhysRevB.98.195131

Evolution of the optimal trial wave function with interactions in fractional Chern insulators

Authors: Yumin Luan, Yinhan Zhang, Junren Shi

Abstract: We show that the optimal trial wave function of a fractional Chern insulator depends on the form of its electron-electron interaction. The gauge of single-particle Bloch bases for constructing the optimal trial wave function is obtained by applying the variational principle proposed by Zhang et al. [Phys. Rev. B 93, 165129 (2016)]. We consider a short-range interaction, the Coulomb interaction, and an interpolation between them, and determine the evolution of the optimal gauge with the different interactions. We compare the optimal gauge with those proposed by Qi [Phys. Rev. Lett. 107, 126803 (2011)] and Wu et al. [Phys. Rev. B 86, 085129 (2012)], and find that Wu et al.'s gauge is close to the optimal gauge when the interaction is a certain mixture of the Coulomb interaction and the short-range interaction, while Qi's gauge is qualitatively different from the optimal gauge in all cases. Both gauges deviate significantly from the optimal gauge when the short-range component of the interaction becomes more prominent.

Submitted 14 December, 2017; v1 submitted 21 November, 2017; originally announced November 2017.

Comments: 7 pages, 5 figures

Journal ref: Phys. Rev. B 98, 195131 (2018)

arXiv:1710.07388 [pdf, other]

Multi-Task Learning for Speaker-Role Adaptation in Neural Conversation Models

Authors: Yi Luan, Chris Brockett, Bill Dolan, Jianfeng Gao, Michel Galley

Abstract: Building a persona-based conversation agent is challenging owing to the lack of large amounts of speaker-specific conversation data for model training. This paper addresses the problem by proposing a multi-task learning approach to training neural conversation models that leverages both conversation data across speakers and other types of data pertaining to the speaker and speaker roles to be modeled. Experiments show that our approach leads to significant improvements over baseline model quality, generating responses that more precisely capture speakers' traits and speaking styles. The model offers the benefits of being algorithmically simple and easy to implement, and of not relying on large quantities of data representing specific individual speakers.

Submitted 19 October, 2017; originally announced October 2017.

arXiv:1708.06075 [pdf, other]

Scientific Information Extraction with Semi-supervised Neural Tagging

Authors: Yi Luan, Mari Ostendorf, Hannaneh Hajishirzi

Abstract: This paper addresses the problem of extracting keyphrases from scientific articles and categorizing them as corresponding to a task, process, or material. We cast the problem as sequence tagging and introduce semi-supervised methods to a neural tagging model, which builds on recent advances in named entity recognition. Since annotated training data is scarce in this domain, we introduce a graph-based semi-supervised algorithm together with a data selection scheme to leverage unannotated articles. Both inductive and transductive semi-supervised learning strategies outperform the state-of-the-art information extraction performance on the 2017 SemEval Task 10 ScienceIE task.

Submitted 20 August, 2017; originally announced August 2017.

Comments: accepted by EMNLP 2017

arXiv:1603.09457 [pdf, other]

LSTM based Conversation Models

Authors: Yi Luan, Yangfeng Ji, Mari Ostendorf

Abstract: In this paper, we present a conversational model that incorporates both context and participant role for two-party conversations. Different architectures are explored for integrating participant role and context information into a Long Short-term Memory (LSTM) language model. The conversational model can function as a language model or a language generation model. Experiments on the Ubuntu Dialog Corpus show that our model can capture multi-turn interactions between participants. The proposed method outperforms a traditional LSTM model as measured by language model perplexity and response ranking. Generated responses show characteristic differences between the two participant roles.

Submitted 31 March, 2016; originally announced March 2016.

Showing 1–50 of 58 results for author: Luan, Y