-
Using Sequences of Life-events to Predict Human Lives
Authors:
Germans Savcisens,
Tina Eliassi-Rad,
Lars Kai Hansen,
Laust Mortensen,
Lau Lilleholt,
Anna Rogers,
Ingo Zettler,
Sune Lehmann
Abstract:
Over the past decade, machine learning has revolutionized computers' ability to analyze text through flexible computational models. Due to their structural similarity to written language, transformer-based architectures have also shown promise as tools to make sense of a range of multi-variate sequences from protein-structures, music, electronic health records to weather-forecasts. We can also represent human lives in a way that shares this structural similarity to language. From one perspective, lives are simply sequences of events: People are born, visit the pediatrician, start school, move to a new location, get married, and so on. Here, we exploit this similarity to adapt innovations from natural language processing to examine the evolution and predictability of human lives based on detailed event sequences. We do this by drawing on arguably the most comprehensive registry data in existence, available for an entire nation of more than six million individuals across decades. Our data include information about life-events related to health, education, occupation, income, address, and working hours, recorded with day-to-day resolution. We create embeddings of life-events in a single vector space showing that this embedding space is robust and highly structured. Our models allow us to predict diverse outcomes ranging from early mortality to personality nuances, outperforming state-of-the-art models by a wide margin. Using methods for interpreting deep learning models, we probe the algorithm to understand the factors that enable our predictions. Our framework allows researchers to identify new potential mechanisms that impact life outcomes and associated possibilities for personalized interventions.
Submitted 5 June, 2023;
originally announced June 2023.
-
Generating stable molecules using imitation and reinforcement learning
Authors:
Søren Ager Meldgaard,
Jonas Köhler,
Henrik Lund Mortensen,
Mads-Peter V. Christiansen,
Frank Noé,
Bjørk Hammer
Abstract:
Chemical space is routinely explored by machine learning methods to discover interesting molecules, before time-consuming experimental synthesizing is attempted. However, these methods often rely on a graph representation, ignoring 3D information necessary for determining the stability of the molecules. We propose a reinforcement learning approach for generating molecules in cartesian coordinates allowing for quantum chemical prediction of the stability. To improve sample-efficiency we learn basic chemical rules from imitation learning on the GDB-11 database to create an initial model applicable for all stoichiometries. We then deploy multiple copies of the model conditioned on a specific stoichiometry in a reinforcement learning setting. The models correctly identify low energy molecules in the database and produce novel isomers not found in the training set. Finally, we apply the model to larger molecules to show how reinforcement learning further refines the imitation learning model in domains far from the training data.
Submitted 11 July, 2021;
originally announced July 2021.
-
Atomistic Structure Learning Algorithm with surrogate energy model relaxation
Authors:
Henrik Lund Mortensen,
Søren Ager Meldgaard,
Malthe Kjær Bisbo,
Mads-Peter V. Christiansen,
Bjørk Hammer
Abstract:
The recently proposed Atomistic Structure Learning Algorithm (ASLA) builds on neural network enabled image recognition and reinforcement learning. It enables fully autonomous structure determination when used in combination with a first-principles total energy calculator, e.g. a density functional theory (DFT) program. To save on the computational requirements, ASLA utilizes the DFT program in a single-point mode, i.e. without allowing for relaxation of the structural candidates according to the force information at the DFT level. In this work, we augment ASLA to establish a surrogate energy model concurrently with its structure search. This enables approximative but computationally cheap relaxation of the structural candidates before the single-point energy evaluation with the computationally expensive DFT program. We demonstrate a significantly increased performance of ASLA for building benzene while utilizing a surrogate energy landscape. Further we apply this model-enhanced ASLA in a thorough investigation of the c(4x8) phase of the Ag(111) surface oxide. ASLA successfully identifies a surface reconstruction which has previously only been guessed on the basis of scanning tunnelling microscopy images.
Submitted 15 July, 2020;
originally announced July 2020.
-
Atomistic structure learning
Authors:
Mathias S. Jørgensen,
Henrik L. Mortensen,
Søren A. Meldgaard,
Esben L. Kolsbjerg,
Thomas L. Jacobsen,
Knud H. Sørensen,
Bjørk Hammer
Abstract:
One endeavour of modern physical chemistry is to use bottom-up approaches to design materials and drugs with desired properties. Here we introduce an atomistic structure learning algorithm (ASLA) that utilizes a convolutional neural network to build 2D compounds and layered structures atom by atom. The algorithm takes no prior data or knowledge on atomic interactions but inquires a first-principles quantum mechanical program for physical properties. Using reinforcement learning, the algorithm accumulates knowledge of chemical compound space for a given number and type of atoms and stores this in the neural network, ultimately learning the blueprint for the optimal structural arrangement of the atoms for a given target property. ASLA is demonstrated to work on diverse problems, including grain boundaries in graphene sheets, organic compound formation and a surface oxide structure. This approach to structure prediction is a first step toward direct manipulation of atoms with artificially intelligent first principles computer codes.
Submitted 27 February, 2019;
originally announced February 2019.
-
Estimating the Impact of Unknown Unknowns on Aggregate Query Results
Authors:
Yeounoh Chung,
Michael Lind Mortensen,
Carsten Binnig,
Tim Kraska
Abstract:
It is common practice for data scientists to acquire and integrate disparate data sources to achieve higher quality results. But even with a perfectly cleaned and merged data set, two fundamental questions remain: (1) is the integrated data set complete and (2) what is the impact of any unknown (i.e., unobserved) data on query results?
In this work, we develop and analyze techniques to estimate the impact of the unknown data (a.k.a., unknown unknowns) on simple aggregate queries. The key idea is that the overlap between different data sources enables us to estimate the number and values of the missing data items. Our main techniques are parameter-free and do not assume prior knowledge about the distribution. Through a series of experiments, we show that estimating the impact of unknown unknowns is invaluable to better assess the results of aggregate queries over integrated data sources.
Submitted 26 December, 2015; v1 submitted 20 July, 2015;
originally announced July 2015.