Search | arXiv e-print repository

arXiv:2405.20563 [pdf, ps, other]

Limit sets, internal chain transitivity and orbital shadowing of tree-shifts defined on Markov-Cayley trees

Authors: Jung-Chao Ban, Nai-Zhu Huang, Guan-Yu Lai

Abstract: In this paper, we introduce the concepts of $ω$-limit sets and pseudo orbits for a tree-shift defined on a Markov-Cayley tree, extending the results of tree-shifts defined on $d$-trees [5,6]. Firstly, we establish the relationships between $ω$-limit sets and we introduce a modified definition of $ω$-limit set based on complete prefix sets (Theorems 1.4 and 1.9). Secondly, we introduce the concept of projected pseudo orbits and investigate the concept of the shadowing property (Theorems 1.12 and 1.14).

Submitted 30 May, 2024; originally announced May 2024.

arXiv:2403.05530 [pdf, other]

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1110 additional authors not shown)

Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k).
Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.

Submitted 8 August, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

arXiv:2402.19324 [pdf, other]

Entropy of axial product of multiplicative subshifts

Authors: Jung-Chao Ban, Wen-Guei Hu, Guan-Yu Lai, Lingmin Liao

Abstract: We obtain the entropy and the surface entropy of the axial products on $\mathbb{N}^d$ and the $d$-tree $T^d$ of two types of systems: the subshift and the multiplicative subshift.

Submitted 29 February, 2024; originally announced February 2024.

arXiv:2402.18822 [pdf, ps, other]

Hausdorff dimensions of affine multiplicative subshifts

Authors: Jung-Chao Ban, Wen-Guei Hu, Guan-Yu Lai, Lingmin Liao

Abstract: We calculate the Minkowski and Hausdorff dimensions of affine multiplicative subshifts on $\mathbb{N}$.

Submitted 28 February, 2024; originally announced February 2024.

arXiv:2401.09695 [pdf]

Should ChatGPT Write Your Breakup Text? Exploring the Role of AI in Relationship Dissolution

Authors: Yue Fu, Yixin Chen, Zelia Gomes Da Costa Lai, Alexis Hiniker

Abstract: Relationships are essential to our happiness and wellbeing. The dissolution of a relationship, the final stage of a relationship's lifecycle and one of the most stressful events in an individual's life, can have profound and long-lasting impacts on people. With the breakup process increasingly facilitated by computer-mediated communication (CMC), and the likely future influence of AI-mediated communication (AIMC) tools, we conducted a semi-structured interview study with 21 participants. We aim to understand: 1) the current role of technology in the breakup process, 2) the needs and support individuals have during the process, and 3) how AI might address these needs. Our research shows that people have distinct needs at various stages of ending a relationship. Presently, technology is used for information gathering and community support, acting as a catalyst for breakups, enabling ghosting and blocking, and facilitating communication. Participants anticipate that AI could aid in sense-making of their relationship leading up to the breakup, act as a mediator, assist in crafting appropriate wording, tones, and language during breakup conversations, and support companionship, reflection, recovery, and growth after a breakup. Our findings also demonstrate an overlap between the breakup process and the Transtheoretical Model (TTM) of behavior change. Through the lens of TTM, we explore the potential support and affordances AI could offer in breakups, including its benefits and the necessary precautions regarding AI's role in this sensitive process.

Submitted 17 January, 2024; originally announced January 2024.

arXiv:2401.05320 [pdf, other]

Hausdorff dimensions of topologically transitive Markov hom tree-shifts

Authors: Jung-Chao Ban, Guan-Yu Lai, Yu-Liang Wu

Abstract: This paper features an analog of Sanov's theorem for finite-state Markov chains indexed by rooted d-trees, obtained via the method of types in the classical analysis of large deviations. Along with the theorem come two applications: an almost-sure type convergence of sample means and a formula for the Hausdorff dimension of the symbolic space associated with the irreducible Markov chain.

Submitted 10 January, 2024; originally announced January 2024.

MSC Class: 28A80; 60J10 (Primary) 37B10 (Secondary)

arXiv:2311.10614 [pdf, other]

A Self-enhancement Approach for Domain-specific Chatbot Training via Knowledge Mining and Digest

Authors: Ruohong Zhang, Luyu Gao, Chen Zheng, Zhen Fan, Guokun Lai, Zheng Zhang, Fangzhou Ai, Yiming Yang, Hongxia Yang

Abstract: Large Language Models (LLMs), despite their great power in language generation, often encounter challenges when dealing with intricate and knowledge-demanding queries in specific domains. This paper introduces a novel approach to enhance LLMs by effectively extracting the relevant knowledge from domain-specific textual sources, and the adaptive training of a chatbot with domain-specific inquiries. Our two-step approach starts from training a knowledge miner, namely LLMiner, which autonomously extracts Question-Answer pairs from relevant documents through a chain-of-thought reasoning process. Subsequently, we blend the mined QA pairs with a conversational dataset to fine-tune the LLM as a chatbot, thereby enriching its domain-specific expertise and conversational capabilities. We also developed a new evaluation benchmark which comprises four domain-specific text corpora and associated human-crafted QA pairs for testing. Our model shows remarkable performance improvement over generally aligned LLM and surpasses domain-adapted models directly fine-tuned on domain corpus. In particular, LLMiner achieves this with minimal human intervention, requiring only 600 seed instances, thereby providing a pathway towards self-improvement of LLMs through model-synthesized training data.

Submitted 17 November, 2023; originally announced November 2023.

Comments: Work in progress

arXiv:2309.00309 [pdf, other]

The strip entropy approximation of Markov shifts on trees

Authors: Jung-Chao Ban, Guan-Yu Lai, Cheng-Yu Tsai

Abstract: The strip entropy is studied in this article. We prove that the strip entropy approximation is valid for every ray of a golden-mean tree. This result extends the previous result of [Petersen-Salama, Discrete \& Continuous Dynamical Systems, 2020] on the conventional 2-tree. Lastly, we prove that the strip entropy approximation is valid for eventually periodic rays of a class of Markov-Cayley trees.

Submitted 1 September, 2023; originally announced September 2023.

arXiv:2303.13011 [pdf, other]

The entropy structures of axial products on $\mathbb{N}^d$ and Trees

Authors: Jung-Chao Ban, Wen-Guei Hu, Guan-Yu Lai

Abstract: In this paper, we first concentrate on the possible values and dense property of entropies for isotropic and anisotropic axial products of subshifts of finite type (SFTs) on $\mathbb{N}^d$ and $d$-tree $\mathcal{T}_d$. We prove that the entropies of isotropic and anisotropic axial products of SFTs on $\mathbb{N}^d$ are dense in $[0,\infty)$, and the same result also holds for anisotropic axial products of SFTs on $\mathcal{T}_d$. However, the result is no longer true for isotropic axial products of SFTs on $\mathcal{T}_d$. Next, motivated by the work of Johnson, Kass and Madden [16], and Schraudner [28], we establish the entropy formula and structures for full axial extension shifts on $\mathbb{N}^d$ and $\mathcal{T}_d$. Combining the aforementioned results with the findings on the surface entropy for multiplicative integer systems [8] on $\mathbb{N}^d$ enables us to estimate the surface entropy for the full axial extension shifts on $\mathcal{T}_d$. Finally, we extend the results of full axial extension shifts on $\mathcal{T}_d$ to general trees.

Submitted 22 March, 2023; originally announced March 2023.

arXiv:2303.11550 [pdf, ps, other]

On the discrete modified KP hierarchy: tau functions, Fay identity and squared eigenfunction symmetries

Authors: Kelei Tian, Guangmiao Lai, Ge Yi, Ying Xu

Abstract: In this paper, we prove the existence of tau functions of the discrete modified KP hierarchy and define the squared eigenfunction symmetry. Meanwhile, the Fay identity with its difference form, the squared eigenfunction potentials and the symmetry flow acting on tau functions are obtained.

Submitted 20 March, 2023; originally announced March 2023.

arXiv:2210.09115 [pdf, ps, other]

doi 10.1063/5.0118652

Boundary complexity and surface entropy of 2-multiplicative integer systems on $\mathbb{N}^d$

Authors: Jung-Chao Ban, Wen-Guei Hu, Guan-Yu Lai

Abstract: In this article, we introduce the concept of the boundary complexity and prove that for a 2-multiplicative integer system (2-MIS) $X^{p}_Ω$ on $\mathbb{N}$ (or $X^{\bf p}_Ω$ on $\mathbb{N}^d,d\geq 2$), every point in $[h(X^p_Ω), \log r]$ can be realized as a boundary complexity of a 2-MIS with a specific speed, where $r$ stands for the number of symbols in the alphabet. The result is new and quite different from $\mathbb{N}^d$ subshifts of finite type (SFT) for $d\geq 1$. Furthermore, the rigorous formula of surface entropy for a $\mathbb{N}^d$ 2-MIS is also presented. This provides an efficient method to calculate the topological entropy for $\mathbb{N}^d$ 2-MIS and also reveals intrinsic differences between $\mathbb{N}^d$ $k$-MIS and SFTs for $d\geq 1$ and $k\geq 2$.

Submitted 17 October, 2022; originally announced October 2022.

arXiv:2210.03314 [pdf, other]

doi 10.1088/1361-6420/acc2b6

Uniformly convex neural networks and non-stationary iterated network Tikhonov (iNETT) method

Authors: Davide Bianchi, Guanghao Lai, Wenbin Li

Abstract: We propose a non-stationary iterated network Tikhonov (iNETT) method for the solution of ill-posed inverse problems. The iNETT employs deep neural networks to build a data-driven regularizer, and it avoids the difficult task of estimating the optimal regularization parameter. To achieve the theoretical convergence of iNETT, we introduce uniformly convex neural networks to build the data-driven regularizer. Rigorous theories and detailed algorithms are proposed for the construction of convex and uniformly convex neural networks. In particular, given a general neural network architecture, we prescribe sufficient conditions to achieve a trained neural network which is component-wise convex or uniformly convex; moreover, we provide concrete examples of realizing convexity and uniform convexity in the modern U-net architecture. With the tools of convex and uniformly convex neural networks, the iNETT algorithm is developed and a rigorous convergence analysis is provided. Lastly, we show applications of the iNETT algorithm in 2D computerized tomography, where numerical examples illustrate the efficacy of the proposed algorithm.

Submitted 1 February, 2023; v1 submitted 7 October, 2022; originally announced October 2022.

MSC Class: 47A52; 65F22; 68T07

arXiv:2208.04089 [pdf]

The mechanism of Li deposition on the Cu substrates in the anode-free Li metal batteries

Authors: Genming Lai, Junyu Jiao, Chi Fang, Liyuan Sheng, Yao Jiang, Chuying Ouyang, Jiaxin Zheng

Abstract: Due to the rapid growth in the demand for high-energy-density Li batteries and insufficient global Li reserves, the anode-free Li metal batteries are receiving increasing attention. Various strategies, such as surface modification and structural design of Cu current collectors, have been proposed to stabilize the anode-free Li metal batteries. Unfortunately, the mechanism of Li deposition on the Cu surfaces with the different Miller indices is poorly understood, especially on the atomic scale. Here, a large-scale molecular dynamics simulation of Li deposition on the Cu substrates was performed in the anode-free Li metal batteries. The results show that the Li layers on the Cu (100), Cu (110), and Cu (111) surfaces are closer to the structures of Li (110), Li (100), and Li (110) surfaces, respectively. The mechanism was studied through the surface similarity analysis, potential energy surfaces, and lattice features. Finally, a proposal to reduce the fraction of the (110) facet in commercial Cu foils was made to improve the reversibility and stability of Li plating/stripping in the anode-free Li metal batteries.

Submitted 8 August, 2022; originally announced August 2022.

arXiv:2207.11381 [pdf, other]

On spatial entropy and periodic entropies of Two-dimensional Shifts of Finite Type

Authors: Wen-Guei Hu, Guan-Yu Lai, Song-Sun Lin

Abstract: Topological entropy or spatial entropy is a way to measure the complexity of shift spaces. This study investigates the relationships between the spatial entropy and the various periodic entropies which are computed by skew-coordinated systems $γ\in GL_2(\mathbb{Z})$ on two-dimensional shifts of finite type.

Submitted 22 July, 2022; originally announced July 2022.

arXiv:2207.06366 [pdf, other]

N-Grammer: Augmenting Transformers with latent n-grams

Authors: Aurko Roy, Rohan Anil, Guangda Lai, Benjamin Lee, Jeffrey Zhao, Shuyuan Zhang, Shibo Wang, Ye Zhang, Shen Wu, Rigel Swavely, Tao, Yu, Phuong Dao, Christopher Fifty, Zhifeng Chen, Yonghui Wu

Abstract: Transformer models have recently emerged as one of the foundational models in natural language processing, and as a byproduct, there is significant recent interest and investment in scaling these models. However, the training and inference costs of these large Transformer language models are prohibitive, thus necessitating more research in identifying more efficient variants. In this work, we propose a simple yet effective modification to the Transformer architecture inspired by the literature in statistical language modeling, by augmenting the model with n-grams that are constructed from a discrete latent representation of the text sequence. We evaluate our model, the N-Grammer on language modeling on the C4 data-set as well as text classification on the SuperGLUE data-set, and find that it outperforms several strong baselines such as the Transformer and the Primer. We open-source our model for reproducibility purposes in Jax.

Submitted 13 July, 2022; originally announced July 2022.

Comments: 8 pages, 2 figures

arXiv:2204.07705 [pdf, other]

Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks

Authors: Yizhong Wang, Swaroop Mishra, Pegah Alipoormolabashi, Yeganeh Kordi, Amirreza Mirzaei, Anjana Arunkumar, Arjun Ashok, Arut Selvan Dhanasekaran, Atharva Naik, David Stap, Eshaan Pathak, Giannis Karamanolakis, Haizhi Gary Lai, Ishan Purohit, Ishani Mondal, Jacob Anderson, Kirby Kuznia, Krima Doshi, Maitreya Patel, Kuntal Kumar Pal, Mehrad Moradshahi, Mihir Parmar, Mirali Purohit, Neeraj Varshney, Phani Rohitha Kaza, et al. (15 additional authors not shown)

Abstract: How well can NLP models generalize to a variety of unseen tasks when provided with task instructions? To address this question, we first introduce Super-NaturalInstructions, a benchmark of 1,616 diverse NLP tasks and their expert-written instructions. Our collection covers 76 distinct task types, including but not limited to classification, extraction, infilling, sequence tagging, text rewriting, and text composition. This large and diverse collection of tasks enables rigorous benchmarking of cross-task generalization under instructions -- training models to follow instructions on a subset of tasks and evaluating them on the remaining unseen ones. Furthermore, we build Tk-Instruct, a transformer model trained to follow a variety of in-context instructions (plain language task definitions or k-shot examples). Our experiments show that Tk-Instruct outperforms existing instruction-following models such as InstructGPT by over 9% on our benchmark despite being an order of magnitude smaller. We further analyze generalization as a function of various scaling parameters, such as the number of observed tasks, the number of instances per task, and model sizes. We hope our dataset and model facilitate future progress towards more general-purpose NLP models.

Submitted 24 October, 2022; v1 submitted 15 April, 2022; originally announced April 2022.

Comments: Accepted to EMNLP 2022, 25 pages

arXiv:2204.02604 [pdf, other]

Interactive Evolutionary Multi-Objective Optimization via Learning-to-Rank

Authors: Ke Li, Guiyu Lai, Xin Yao

Abstract: In practical multi-criterion decision-making, it is cumbersome if a decision maker (DM) is asked to choose among a set of trade-off alternatives covering the whole Pareto-optimal front. This is a paradox in conventional evolutionary multi-objective optimization (EMO), which always aims to achieve a good balance between convergence and diversity. In essence, the ultimate goal of multi-objective optimization is to help a DM identify solution(s) of interest (SOI) achieving satisfactory trade-offs among multiple conflicting criteria. Bearing this in mind, this paper develops a framework for designing preference-based EMO algorithms to find SOI in an interactive manner. Its core idea is to involve a human in the loop of EMO. After every several iterations, the DM is invited to give her feedback with regard to a couple of incumbent candidates. By collecting such information, her preference is progressively learned by a learning-to-rank neural network and then applied to guide the baseline EMO algorithm. Note that this framework is so general that any existing EMO algorithm can be applied in a plug-in manner. Experiments on $48$ benchmark test problems with up to 10 objectives fully demonstrate the effectiveness of our proposed algorithms for finding SOI.

Submitted 6 April, 2022; originally announced April 2022.

arXiv:2203.08970 [pdf, other]

Thermodynamic formalism and large deviation principle of multiplicative Ising models

Authors: Jung-Chao Ban, Wen-Guei Hu, Guan-Yu Lai

Abstract: The aim of this study is threefold. First, we investigate the thermodynamics of the Ising models with respect to 2-multiple Hamiltonians. This extends the previous results of [Chazotte and Redig, Electron. J. Probab., 2014] to $\mathbb{N}^d$. Second, we establish the large deviation principle (LDP) of the average $\frac{1}{N} S_N^G$, where $S_N^G$ is a 2-multiple sum along a semigroup generated by $k$ numbers which are $k$ co-primes. This extends the previous results [Ban et al. Indag. Math., 2021] to a broad class of the long-range interactions. Finally, the results described above are generalized to the multidimensional lattice $\mathbb{N}^d, d\geq1$.

Submitted 16 March, 2022; originally announced March 2022.

arXiv:2110.13569 [pdf, other]

doi 10.1145/3472688.3472689

Two Decades of Game Jams

Authors: Gorm Lai, Annakaisa Kultima, Foaad Khosmood, Johanna Pirker, Allan Fowler, Ilaria Vecchi, William Latham, Frederic Fol Leymarie

Abstract: In less than a year's time, March 2022 will mark the twentieth anniversary of the first documented game jam, the Indie Game Jam, which took place in Oakland, California in 2002. Initially, game jams were widely seen as frivolous activities. Since then, they have taken the world by storm. Game jams have not only become part of the day-to-day process of many game developers, but jams are also used for activist purposes, for learning and teaching, as part of the experience economy, for making commercial prototypes that gamers can vote on, and more. Beyond only surveying game jams and the relevant published scientific literature from the last two decades, this paper has several additional contributions. It builds a history of game jams, and proposes two different taxonomies of game jams - a historical and a categorical. In addition, it discusses the definition of game jam and identifies the most active research areas within the game jam community such as the interplay and development with local communities, the study and analysis of game jammers and organisers, and works that bring a critical look on game jams.

Submitted 26 October, 2021; originally announced October 2021.

Journal ref: ICGJ 2021: Sixth Annual International Conference on Game Jams, Hackathons, and Game Creation Events

arXiv:2108.12986 [pdf, other]

Characterization and Topological Behavior of Homomorphism Tree-Shifts

Authors: Jung-Chao Ban, Chih-Hung Chang, Wen-Guei Hu, Guan-Yu Lai, Yu-Liang Wu

Abstract: The purpose of this article is twofold. On one hand, we reveal the equivalence of shift of finite type between a one-sided shift $X$ and its associated hom tree-shift $\mathcal{T}_{X}$, as well as the equivalence in the sofic shift. On the other hand, we investigate the interrelationship among the comparable mixing properties on tree-shifts as those on multidimensional shift spaces. They include irreducibility, topologically mixing, block gluing, and strong irreducibility, all of which are defined in the spirit of classical multidimensional shift, complete prefix code (CPC), and uniform CPC. In summary, the mixing properties defined in all three manners coincide for $\mathcal{T}_{X}$. Furthermore, an equivalence between irreducibility on $\mathcal{T}_{A}$ and irreducibility on $X_A$ is seen, and so is one between topologically mixing on $\mathcal{T}_{A}$ and mixing property on $X_A$, where $X_A$ is the one-sided shift space induced by the matrix $A$ and $\mathcal{T}_{A}$ is the associated tree-shift. These equivalences are consistent with the mixing properties on $X$ or $X_A$ when viewed as a degenerate tree-shift.

Submitted 30 August, 2021; originally announced August 2021.

MSC Class: 37B10; 37E25

arXiv:2106.10979 [pdf]

Self-healing mechanism of lithium in lithium metal batteries

Authors: Junyu Jiao, Genming Lai, Liang Zhao, Jiaze Lu, Qidong Li, Xianqi Xu, Yao Jiang, Yan-Bing He, Chuying Ouyang, Feng Pan, Hong Li, Jiaxin Zheng

Abstract: Li metal is an ideal anode material for use in state-of-the-art secondary batteries. However, Li-dendrite growth is a safety concern and results in low coulombic efficiency, which significantly restricts the commercial application of Li secondary batteries. Unfortunately, the Li deposition (growth) mechanism is poorly understood on the atomic scale. Here, we used machine learning to construct a Li potential model with quantum-mechanical computational accuracy. Molecular dynamics simulations in this study with this model revealed two self-healing mechanisms in a large Li-metal system, viz. surface self-healing and bulk self-healing, and identified three Li-dendrite morphologies under different conditions, viz. "needle", "mushroom", and "hemisphere". Finally, we introduce the concepts of local current density and variance in local current density to supplement the critical current density when evaluating the probability of self-healing.

Submitted 27 September, 2021; v1 submitted 21 June, 2021; originally announced June 2021.

arXiv:2106.09860 [pdf, other]

Large Deviation Principle of Multidimensional Multiple Averages on $\mathbb{N}^d$

Authors: Jung-Chao Ban, Wen-Guei Hu, Guan-Yu Lai

Abstract: This paper establishes the large deviation principle (LDP) for multiple averages on $\mathbb{N}^d$. We extend the previous work of [Carinci et al., Indag. Math. 2012] to multidimensional lattice $\mathbb{N}^d$ for $d\geq 2$. The same technique is also applicable to the weighted multiple average launched by Fan [Fan, Adv. Math. 2021]. Finally, the boundary conditions are imposed to the multiple sum and explicit formulae of the energy functions with respect to the boundary conditions are obtained.

Submitted 17 June, 2021; originally announced June 2021.

arXiv:2012.11291 [pdf]

How to estimate the association between change in a risk factor and a health outcome?

Authors: Michail Katsoulis, Alvina G Lai, Dimitra-Kleio Kipourou, Reecha Sofat, Manuel Gomes, Amitava Banerjee, Spiros Denaxas, Thomas R Lumbers, Kostas Tsilidis, Harry Hemingway, Karla Diaz-Ordaz

Abstract: Estimating the effect of a change in a particular risk factor and a chronic disease requires information on the risk factor from two time points; the enrolment and the first follow-up. When using observational data to study the effect of such an exposure (change in risk factor) extra complications arise, namely (i) when is time zero? and (ii) which information on confounders should we account for in this type of analysis? From enrolment or the 1st follow-up? Or from both? The combination of these questions has proven to be very challenging. Researchers have applied different methodologies with mixed success, because the different choices made when answering these questions induce systematic bias. Here we review these methodologies and highlight the sources of bias in each type of analysis. We discuss the advantages and the limitations of each method ending by making our recommendations on the analysis plan.

Submitted 21 December, 2020; originally announced December 2020.

Comments: 13 pages, 2 Tables, 3 Figures

MSC Class: 62-07 (in MSC2010) or 62R07 (in MSC2020)

arXiv:2009.08595 [pdf, ps, other]

Unsupervised Parallel Corpus Mining on Web Data

Authors: Guokun Lai, Zihang Dai, Yiming Yang

Abstract: With a large amount of parallel data, neural machine translation systems are able to deliver human-level performance for sentence-level translation. However, it is costly to label a large amount of parallel data by humans. In contrast, there is a large-scale of parallel corpus created by humans on the Internet. The major difficulty to utilize them is how to filter them out from the noise website environments. Current parallel data mining methods all require labeled parallel data as the training source. In this paper, we present a pipeline to mine the parallel corpus from the Internet in an unsupervised manner. On the widely used WMT'14 English-French and WMT'16 English-German benchmarks, the machine translator trained with the data extracted by our pipeline achieves very close performance to the supervised results. On the WMT'16 English-Romanian and Romanian-English benchmarks, our system produces new state-of-the-art results, 39.81 and 38.95 BLEU scores, even compared with supervised approaches.

Submitted 17 September, 2020; originally announced September 2020.

arXiv:2006.03236 [pdf, other]

Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing

Authors: Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le

Abstract: With the success of language pretraining, it is highly desirable to develop more efficient architectures of good scalability that can exploit the abundant unlabeled data at a lower cost. To improve the efficiency, we examine the much-overlooked redundancy in maintaining a full-length token-level presentation, especially for tasks that only require a single-vector presentation of the sequence. With this intuition, we propose Funnel-Transformer which gradually compresses the sequence of hidden states to a shorter one and hence reduces the computation cost. More importantly, by re-investing the saved FLOPs from length reduction in constructing a deeper or wider model, we further improve the model capacity. In addition, to perform token-level predictions as required by common pretraining objectives, Funnel-Transformer is able to recover a deep representation for each token from the reduced hidden sequence via a decoder. Empirically, with comparable or fewer FLOPs, Funnel-Transformer outperforms the standard Transformer on a wide variety of sequence-level prediction tasks, including text classification, language understanding, and reading comprehension. The code and pretrained checkpoints are available at https://github.com/laiguokun/Funnel-Transformer.

Submitted 5 June, 2020; originally announced June 2020.

arXiv:2005.09324 [pdf, other]

Towards Friendly Mixed Initiative Procedural Content Generation: Three Pillars of Industry

Authors: Gorm Lai, William Latham, Frederic Fol Leymarie

Abstract: While the games industry is moving towards procedural content generation (PCG) with tools available under popular platforms such as Unreal, Unity or Houdini, and video game titles like No Man's Sky and Horizon Zero Dawn taking advantage of PCG, the gap between academia and industry is as wide as it has ever been, in terms of communication and sharing methods. One of the authors has worked on both sides of this gap and, in an effort to shorten it and increase the synergy between the two sectors, has identified three design pillars for PCG using mixed-initiative interfaces. The three pillars are Respect Designer Control, Respect the Creative Process and Respect Existing Work Processes. Respecting designer control is about creating a tool that gives enough control to bring out the designer's vision. Respecting the creative process concerns itself with having a feedback loop that is short enough that the creative process is not disturbed. Respecting existing work processes means that a PCG tool should plug in easily to existing asset pipelines. As academics and communicators, it is surprising that publications often do not describe ways for developers to use our work or lack considerations for how a piece of work might fit into existing content pipelines.

Submitted 19 May, 2020; originally announced May 2020.

arXiv:2004.11934 [pdf, other]

Correlation-aware Unsupervised Change-point Detection via Graph Neural Networks

Authors: Ruohong Zhang, Yu Hao, Donghan Yu, Wei-Cheng Chang, Guokun Lai, Yiming Yang

Abstract: Change-point detection (CPD) aims to detect abrupt changes over time series data. Intuitively, effective CPD over multivariate time series should require explicit modeling of the dependencies across input variables. However, existing CPD methods either ignore the dependency structures entirely or rely on the (unrealistic) assumption that the correlation structures are static over time. In this paper, we propose a Correlation-aware Dynamics Model for CPD, which explicitly models the correlation structure and dynamics of variables by incorporating graph neural networks into an encoder-decoder framework. Extensive experiments on synthetic and real-world datasets demonstrate the advantageous performance of the proposed model on CPD tasks over strong baselines, as well as its ability to classify the change-points as correlation changes or independent changes. Keywords: Multivariate Time Series, Change-point Detection, Graph Neural Networks

Submitted 13 September, 2020; v1 submitted 24 April, 2020; originally announced April 2020.

Comments: Accepted for publication in the International Conference on Neural Information Processing (ICONIP) 2020 Original paper is 12 pages, additional appendix is available on arxiv

MSC Class: I.2.6

Journal ref: ICONIP 2020: Neural Information Processing

arXiv:2004.01170 [pdf, other]

DOPS: Learning to Detect 3D Objects and Predict their 3D Shapes

Authors: Mahyar Najibi, Guangda Lai, Abhijit Kundu, Zhichao Lu, Vivek Rathod, Thomas Funkhouser, Caroline Pantofaru, David Ross, Larry S. Davis, Alireza Fathi

Abstract: We propose DOPS, a fast single-stage 3D object detection method for LIDAR data. Previous methods often make domain-specific design decisions, for example projecting points into a bird-eye view image in autonomous driving scenarios. In contrast, we propose a general-purpose method that works on both indoor and outdoor scenes. The core novelty of our method is a fast, single-pass architecture that both detects objects in 3D and estimates their shapes. 3D bounding box parameters are estimated in one pass for every point, aggregated through graph convolutions, and fed into a branch of the network that predicts latent codes representing the shape of each detected object. The latent shape space and shape decoder are learned on a synthetic dataset and then used as supervision for the end-to-end training of the 3D object detection pipeline. Thus our model is able to extract shapes without access to ground-truth shape information in the target dataset. During experiments, we find that our proposed method achieves state-of-the-art results by ~5% on object detection in ScanNet scenes, and it gets top results by 3.4% in the Waymo Open Dataset, while reproducing the shapes of detected cars.

Submitted 6 April, 2020; v1 submitted 2 April, 2020; originally announced April 2020.

Comments: To appear in CVPR 2020

arXiv:2003.10256 [pdf, ps, other]

Self-similar solutions of the spherically symmetric Euler equations for general equations of state

Authors: Jianjun Chen, Geng Lai

Abstract: The study of spherically symmetric motion is important for the theory of explosion waves. In this paper, we construct rigorously self-similar solutions to the Riemann problem of the spherically symmetric Euler equations for general equations of state. We used the assumption of self-similarity to reduce the spherically symmetric Euler equations to a system of nonlinear ordinary differential equations, from which we obtain detailed structures of solutions besides their existence.

Submitted 23 March, 2020; originally announced March 2020.

arXiv:1911.10000 [pdf, other]

Topologically Mixing Properties of Multiplicative Integer System

Authors: Jung-Chao Ban, Chih-Hung Chang, Wen-Guei Hu, Guan-Yu Lai, Yu-Liang Wu

Abstract: Motivated from the study of multiple ergodic average, the investigation of multiplicative shift spaces has drawn much of interest among researchers. This paper focuses on the relation of topologically mixing properties between multiplicative shift spaces and traditional shift spaces. Suppose that $\mathsf{X}_Ω^{(l)}$ is the multiplicative subshift derived from the shift space $Ω$ with given $l > 1$. We show that $\mathsf{X}_Ω^{(l)}$ is (topologically) transitive/mixing if and only if $Ω$ is extensible/mixing. After introducing $l$-directional mixing property, we derive the equivalence between $l$-directional mixing property of $\mathsf{X}_Ω^{(l)}$ and weakly mixing property of $Ω$.

Submitted 22 November, 2019; originally announced November 2019.

Comments: 14 pages, 6 figures

MSC Class: 37B10

arXiv:1909.07009 [pdf, other]

Bridging the domain gap in cross-lingual document classification

Authors: Guokun Lai, Barlas Oguz, Yiming Yang, Veselin Stoyanov

Abstract: The scarcity of labeled training data often prohibits the internationalization of NLP models to multiple languages. Recent developments in cross-lingual understanding (XLU) have made progress in this area, trying to bridge the language barrier using language universal representations. However, even if the language problem was resolved, models trained in one language would not transfer to another language perfectly due to the natural domain drift across languages and cultures. We consider the setting of semi-supervised cross-lingual understanding, where labeled data is available in a source language (English), but only unlabeled data is available in the target language. We combine state-of-the-art cross-lingual methods with recently proposed methods for weakly supervised learning such as unsupervised pre-training and unsupervised data augmentation to simultaneously close both the language gap and the domain gap in XLU. We show that addressing the domain gap is crucial. We improve over strong baselines and achieve a new state-of-the-art for cross-lingual document classification.

Submitted 20 September, 2019; v1 submitted 16 September, 2019; originally announced September 2019.

arXiv:1908.11459 [pdf, ps, other]

doi 10.1145/3337722.3341844

Introducing: The Game Jam License

Authors: Gorm Lai, Kai Erenli, Foaad Khosmood, William Latham

Abstract: Since their inception at the Indie Game Jam in 2002, a significant part of game jams has been knowledge sharing and showcasing ideas and work to peers. While various licensing mechanisms have been used for game jams throughout the years, there has never been a licence uniquely designed for artifacts created during a game jam. In this paper, we present to the community the Game Jam License (GJL) which is designed to facilitate that sharing and knowledge transfer, while making sure the original creators retain commercial rights. The Global Game Jam, since 2009, strives to formalise sharing in a similar manner, by having jammers upload and license their creations under Creative Commons Non Commercial Share Alike 3.0 free license. However, the CC family of licenses is not well suited for software. CC is not compatible with most other licenses, and introduces a legal grey area with the division between commercial and non-commercial use. Moreover, open source licences like GPL are well suited for source code, but not for art and design content. Instead the GJL presented in this paper, aims to uphold the original ideas of game jams (sharing and knowledge transfer), while still allowing the original team to hold on to all rights to their creation, without any of the deficiencies of the CC family of licenses.

Submitted 29 August, 2019; originally announced August 2019.

arXiv:1907.05221 [pdf, ps, other]

Global non-isentropic rotational supersonic flows in a semi-infinite divergent duct

Authors: Geng Lai

Abstract: Supersonic flows for the two-dimensional (2D) steady full Euler system are studied. We construct a global non-isentropic rotational supersonic flow in a semi-infinite divergent duct. The flow satisfies the slip condition on the walls of the duct, and the state of the flow is given at the inlet of the duct. The solution is constructed by the method of characteristics. The main difficulty for the global existence is that uniform a priori $C^1$ norm estimate of the solution is hard to obtain, especially when the solution tends to vacuum state. We derive a group of characteristic decompositions for the 2D steady full Euler system. Using these decompositions, we obtain the uniform a priori estimates of the derivatives of the solution. A sufficient condition for the appearance of vacuum is given. We also show that if there is a vacuum then the vacuum is always adjacent to one of the walls, and the interface between gas and vacuum must be straight.

Submitted 23 March, 2020; v1 submitted 11 July, 2019; originally announced July 2019.

arXiv:1902.01388 [pdf, ps, other]

Re-examination of the Role of Latent Variables in Sequence Modeling

Authors: Zihang Dai, Guokun Lai, Yiming Yang, Shinjae Yoo

Abstract: With latent variables, stochastic recurrent models have achieved state-of-the-art performance in modeling sound-wave sequence. However, opposite results are also observed in other domains, where standard recurrent networks often outperform stochastic models. To better understand this discrepancy, we re-examine the roles of latent variables in stochastic recurrent models for speech density estimation. Our analysis reveals that under the restriction of fully factorized output distribution in previous evaluations, the stochastic models were implicitly leveraging intra-step correlation but the standard recurrent baselines were prohibited to do so, resulting in an unfair comparison. To correct the unfairness, we remove such restriction in our re-examination, where all the models can explicitly leverage intra-step correlation with an auto-regressive structure. Over a diverse set of sequential data, including human speech, MIDI music, handwriting trajectory and frame-permuted speech, our results show that stochastic recurrent models fail to exhibit any practical advantage despite the claimed theoretical superiority. In contrast, standard recurrent models equipped with an auto-regressive output distribution consistently perform better, significantly advancing the state-of-the-art results on three speech datasets.

Submitted 16 September, 2019; v1 submitted 4 February, 2019; originally announced February 2019.

Comments: Code available at https://github.com/zihangdai/reexamine-srnn, accepted by NeurIPS 2019

arXiv:1806.06116 [pdf, other]

Stochastic WaveNet: A Generative Latent Variable Model for Sequential Data

Authors: Guokun Lai, Bohan Li, Guoqing Zheng, Yiming Yang

Abstract: How to model distribution of sequential data, including but not limited to speech and human motions, is an important ongoing research problem. It has been demonstrated that model capacity can be significantly enhanced by introducing stochastic latent variables in the hidden states of recurrent neural networks. Simultaneously, WaveNet, equipped with dilated convolutions, achieves astonishing empirical performance in natural speech generation task. In this paper, we combine the ideas from both stochastic latent variables and dilated convolutions, and propose a new architecture to model sequential data, termed as Stochastic WaveNet, where stochastic latent variables are injected into the WaveNet structure. We argue that Stochastic WaveNet enjoys powerful distribution modeling capacity and the advantage of parallel training from dilated convolutions. In order to efficiently infer the posterior distribution of the latent variables, a novel inference network structure is designed based on the characteristics of WaveNet architecture. State-of-the-art performances on benchmark datasets are obtained by Stochastic WaveNet on natural speech modeling and high quality human handwriting samples can be generated as well.

Submitted 15 June, 2018; originally announced June 2018.

Comments: ICML 2018 Workshop

arXiv:1711.03225 [pdf, other]

Large-scale Cloze Test Dataset Created by Teachers

Authors: Qizhe Xie, Guokun Lai, Zihang Dai, Eduard Hovy

Abstract: Cloze tests are widely adopted in language exams to evaluate students' language proficiency. In this paper, we propose the first large-scale human-created cloze test dataset CLOTH, containing questions used in middle-school and high-school language exams. With missing blanks carefully created by teachers and candidate choices purposely designed to be nuanced, CLOTH requires a deeper language understanding and a wider attention span than previously automatically-generated cloze datasets. We test the performance of dedicatedly designed baseline models including a language model trained on the One Billion Word Corpus and show humans outperform them by a significant margin. We investigate the source of the performance gap, trace model deficiencies to some distinct properties of CLOTH, and identify the limited ability of comprehending the long-term context to be the key bottleneck.

Submitted 27 August, 2018; v1 submitted 8 November, 2017; originally announced November 2017.

Comments: EMNLP 2018

arXiv:1710.11577 [pdf, other]

Learning Depthwise Separable Graph Convolution from Data Manifold

Authors: Guokun Lai, Hanxiao Liu, Yiming Yang

Abstract: Convolution Neural Network (CNN) has gained tremendous success in computer vision tasks with its outstanding ability to capture the local latent features. Recently, there has been an increasing interest in extending convolution operations to the non-Euclidean geometry. Although various types of convolution operations have been proposed for graphs or manifolds, their connections with traditional convolution over grid-structured data are not well-understood. In this paper, we show that depthwise separable convolution can be successfully generalized for the unification of both graph-based and grid-based convolution methods. Based on this insight we propose a novel Depthwise Separable Graph Convolution (DSGC) approach which is compatible with the traditional convolution network and subsumes existing convolution methods as special cases. It is equipped with the combined strengths in model expressiveness, compatibility (relatively small number of parameters), modularity and computational efficiency in training. Extensive experiments show the outstanding performance of DSGC in comparison with strong baselines on multi-domain benchmark datasets.

Submitted 8 November, 2018; v1 submitted 31 October, 2017; originally announced October 2017.

arXiv:1704.04683 [pdf, other]

RACE: Large-scale ReAding Comprehension Dataset From Examinations

Authors: Guokun Lai, Qizhe Xie, Hanxiao Liu, Yiming Yang, Eduard Hovy

Abstract: We present RACE, a new dataset for benchmark evaluation of methods in the reading comprehension task. Collected from the English exams for middle and high school Chinese students in the age range between 12 to 18, RACE consists of near 28,000 passages and near 100,000 questions generated by human experts (English instructors), and covers a variety of topics which are carefully designed for evaluating the students' ability in understanding and reasoning. In particular, the proportion of questions that requires reasoning is much larger in RACE than that in other benchmark datasets for reading comprehension, and there is a significant gap between the performance of the state-of-the-art models (43%) and the ceiling human performance (95%). We hope this new dataset can serve as a valuable resource for research and evaluation in machine comprehension. The dataset is freely available at http://www.cs.cmu.edu/~glai1/data/race/ and the code is available at https://github.com/qizhex/RACE_AR_baselines.

Submitted 5 December, 2017; v1 submitted 15 April, 2017; originally announced April 2017.

Comments: EMNLP 2017

arXiv:1703.07015 [pdf, other]

Modeling Long- and Short-Term Temporal Patterns with Deep Neural Networks

Authors: Guokun Lai, Wei-Cheng Chang, Yiming Yang, Hanxiao Liu

Abstract: Multivariate time series forecasting is an important machine learning problem across many domains, including predictions of solar plant energy output, electricity consumption, and traffic jam situation. Temporal data arising in these real-world applications often involve a mixture of long-term and short-term patterns, for which traditional approaches such as Autoregressive models and Gaussian Process may fail. In this paper, we proposed a novel deep learning framework, namely Long- and Short-term Time-series network (LSTNet), to address this open challenge. LSTNet uses the Convolution Neural Network (CNN) and the Recurrent Neural Network (RNN) to extract short-term local dependency patterns among variables and to discover long-term patterns for time series trends. Furthermore, we leverage traditional autoregressive model to tackle the scale insensitive problem of the neural network model. In our evaluation on real-world data with complex mixtures of repetitive patterns, LSTNet achieved significant performance improvements over that of several state-of-the-art baseline methods. All the data and experiment codes are available online.

Submitted 18 April, 2018; v1 submitted 20 March, 2017; originally announced March 2017.

Comments: Accepted by SIGIR 2018

arXiv:0906.1276 [pdf]

Ultrafast Imaging and the "Phase Problem" for Inelastic X-Ray Scattering

Authors: P. Abbamonte, G. C. L. Wong, D. Cahill, J. P. Reed, R. H. Coridan, N. W. Schmidt, G. H. Lai, Y. I. Joe, D. Casa

Abstract: We describe a new method for imaging ultrafast dynamics in condensed matter using inelastic x-ray scattering (IXS). We use the concepts of causality and irreversibility to construct a general solution to the inverse scattering problem (or "phase problem") for inelastic x-ray scattering, which enables direct imaging of dynamics of the electron density with resolutions of ~1 attosecond ($10^{-18}$ sec) in time and < 1 Å in space. This method is not a Fourier transform of IXS data, but a means to impose causality on the data and reconstruct the charge propagator. The method can also be applied to inelastic electron or neutron scattering. We give a general outline of phenomena that can and cannot be studied with this technique, and provide an outlook for the future.

Submitted 12 June, 2009; v1 submitted 6 June, 2009; originally announced June 2009.

Comments: General-interest paper; 19 pages, 3 figures; submission to Advanced Materials

Showing 1–40 of 40 results for author: Lai, G