Search | arXiv e-print repository

Passive learning of active causal strategies in agents and language models

Authors: Andrew Kyle Lampinen, Stephanie C Y Chan, Ishita Dasgupta, Andrew J Nam, Jane X Wang

Abstract: What can be learned about causality and experimentation from passive data? This question is salient given recent successes of passively-trained language models in interactive domains such as tool use. Passive learning is inherently limited. However, we show that purely passive learning can in fact allow an agent to learn generalizable strategies for determining and using causal structures, as long… ▽ More What can be learned about causality and experimentation from passive data? This question is salient given recent successes of passively-trained language models in interactive domains such as tool use. Passive learning is inherently limited. However, we show that purely passive learning can in fact allow an agent to learn generalizable strategies for determining and using causal structures, as long as the agent can intervene at test time. We formally illustrate that learning a strategy of first experimenting, then seeking goals, can allow generalization from passive learning in principle. We then show empirically that agents trained via imitation on expert data can indeed generalize at test time to infer and use causal links which are never present in the training data; these agents can also generalize experimentation strategies to novel variable sets never observed in training. We then show that strategies for causal intervention and exploitation can be generalized from passive data even in a more complex environment with high-dimensional observations, with the support of natural language explanations. Explanations can even allow passive learners to generalize out-of-distribution from perfectly-confounded training data. Finally, we show that language models, trained only on passive next-word prediction, can generalize causal intervention strategies from a few-shot prompt containing examples of experimentation, together with explanations and reasoning. These results highlight the surprising power of passive learning of active causal strategies, and may help to understand the behaviors and capabilities of language models. △ Less

Submitted 2 October, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

Comments: Advances in Neural Information Processing Systems (NeurIPS 2023). 10 pages main text

arXiv:2210.03275 [pdf, other]

Achieving and Understanding Out-of-Distribution Generalization in Systematic Reasoning in Small-Scale Transformers

Authors: Andrew J. Nam, Mustafa Abdool, Trevor Maxfield, James L. McClelland

Abstract: Out-of-distribution generalization (OODG) is a longstanding challenge for neural networks. This challenge is quite apparent in tasks with well-defined variables and rules, where explicit use of the rules could solve problems independently of the particular values of the variables, but networks tend to be tied to the range of values sampled in their training data. Large transformer-based language m… ▽ More Out-of-distribution generalization (OODG) is a longstanding challenge for neural networks. This challenge is quite apparent in tasks with well-defined variables and rules, where explicit use of the rules could solve problems independently of the particular values of the variables, but networks tend to be tied to the range of values sampled in their training data. Large transformer-based language models have pushed the boundaries on how well neural networks can solve previously unseen problems, but their complexity and lack of clarity about the relevant content in their training data obfuscates how they achieve such robustness. As a step toward understanding how transformer-based systems generalize, we explore the question of OODG in small scale transformers trained with examples from a known distribution. Using a reasoning task based on the puzzle Sudoku, we show that OODG can occur on a complex problem if the training set includes examples sampled from the whole distribution of simpler component tasks. Successful generalization depends on carefully managing positional alignment when absolute position encoding is used, but we find that suppressing sensitivity to absolute positions overcomes this limitation. Taken together our results represent a small step toward understanding and promoting systematic generalization in transformers. △ Less

Submitted 13 December, 2022; v1 submitted 6 October, 2022; originally announced October 2022.

arXiv:2210.02615 [pdf, other]

Learning to Reason With Relational Abstractions

Authors: Andrew J. Nam, Mengye Ren, Chelsea Finn, James L. McClelland

Abstract: Large language models have recently shown promising progress in mathematical reasoning when fine-tuned with human-generated sequences walking through a sequence of solution steps. However, the solution sequences are not formally structured and the resulting model-generated sequences may not reflect the kind of systematic reasoning we might expect an expert human to produce. In this paper, we study… ▽ More Large language models have recently shown promising progress in mathematical reasoning when fine-tuned with human-generated sequences walking through a sequence of solution steps. However, the solution sequences are not formally structured and the resulting model-generated sequences may not reflect the kind of systematic reasoning we might expect an expert human to produce. In this paper, we study how to build stronger reasoning capability in language models using the idea of relational abstractions. We introduce new types of sequences that more explicitly provide an abstract characterization of the transitions through intermediate solution steps to the goal state. We find that models that are supplied with such sequences as prompts can solve tasks with a significantly higher accuracy, and models that are trained to produce such sequences solve problems better than those that are trained with previously used human-generated sequences and other baselines. Our work thus takes several steps toward elucidating and improving how language models perform on tasks requiring multi-step mathematical reasoning. △ Less

Submitted 5 December, 2022; v1 submitted 5 October, 2022; originally announced October 2022.

arXiv:2107.06994 [pdf, other]

Systematic human learning and generalization from a brief tutorial with explanatory feedback

Authors: Andrew J. Nam, James L. McClelland

Abstract: Neural networks have long been used to model human intelligence, capturing elements of behavior and cognition, and their neural basis. Recent advancements in deep learning have enabled neural network models to reach and even surpass human levels of intelligence in many respects, yet unlike humans, their ability to learn new tasks quickly remains a challenge. People can reason not only in familiar… ▽ More Neural networks have long been used to model human intelligence, capturing elements of behavior and cognition, and their neural basis. Recent advancements in deep learning have enabled neural network models to reach and even surpass human levels of intelligence in many respects, yet unlike humans, their ability to learn new tasks quickly remains a challenge. People can reason not only in familiar domains, but can also rapidly learn to reason through novel problems and situations, raising the question of how well modern neural network models capture human intelligence and in which ways they diverge. In this work, we explore this gap by investigating human adults' ability to learn an abstract reasoning task based on Sudoku from a brief instructional tutorial with explanatory feedback for incorrect responses using a narrow range of training examples. We find that participants who master the task do so within a small number of trials and generalize well to puzzles outside of the training range. We also find that most of those who master the task can describe a valid solution strategy, and such participants perform better on transfer puzzles than those whose strategy descriptions are vague or incomplete. Interestingly, fewer than half of our human participants were successful in acquiring a valid solution strategy, and this ability is associated with high school mathematics education. We consider the challenges these findings pose for building computational models that capture all aspects of our findings and point toward a possible role for learning to engage in explanation-based reasoning to support rapid learning and generalization. △ Less

Submitted 28 March, 2023; v1 submitted 9 July, 2021; originally announced July 2021.

Comments: 27 pages, 108 references, 8 Figures, and one Table, plus Supplementary Materials

arXiv:1811.07974 [pdf, other]

A Map of Knowledge

Authors: Zachary A. Pardos, Andrew Joo Hun Nam

Abstract: Knowledge representation has gained in relevance as data from the ubiquitous digitization of behaviors amass and academia and industry seek methods to understand and reason about the information they encode. Success in this pursuit has emerged with data from natural language, where skip-grams and other linear connectionist models of distributed representation have surfaced scrutable relational str… ▽ More Knowledge representation has gained in relevance as data from the ubiquitous digitization of behaviors amass and academia and industry seek methods to understand and reason about the information they encode. Success in this pursuit has emerged with data from natural language, where skip-grams and other linear connectionist models of distributed representation have surfaced scrutable relational structures which have also served as artifacts of anthropological interest. Natural language is, however, only a fraction of the big data deluge. Here we show that latent semantic structure, comprised of elements from digital records of our interactions, can be informed by behavioral data and that domain knowledge can be extracted from this structure through visualization and a novel mapping of the literal descriptions of elements onto this behaviorally informed representation. We use the course enrollment behaviors of 124,000 students at a public university to learn vector representations of its courses. From these behaviorally informed representations, a notable 88% of course attribute information were recovered (e.g., department and division), as well as 40% of course relationships constructed from prior domain knowledge and evaluated by analogy (e.g., Math 1B is to Math H1B as Physics 7B is to Physics H7B). To aid in interpretation of the learned structure, we create a semantic interpolation, translating course vectors to a bag-of-words of their respective catalog descriptions. We find that the representations learned from enrollments resolved course vectors to a level of semantic fidelity exceeding that of their catalog descriptions, depicting a vector space of high conceptual rationality. We end with a discussion of the possible mechanisms by which this knowledge structure may be informed and its implications for data science. △ Less

Submitted 19 November, 2018; originally announced November 2018.

Showing 1–5 of 5 results for author: Nam, A J