-
Modularis: Modular Underwater Robot for Rapid Development and Validation of Autonomous Systems
Authors:
Baker Herrin,
Victoria Close,
Nathan Berner,
Joshua Herbert,
Ethan Reussow,
Ryan James,
Cale Woodward,
Jared Mindlin,
Sebastian Paez,
Nilson Bretas,
Jane Shin
Abstract:
Autonomous underwater robots typically require higher cost and time for demonstrations compared to other domains due to the complexity of the environment. Due to the limited capacity and payload flexibility, it is challenging to find off-the-shelf underwater robots that are affordable, customizable, and subject to environmental variability. Custom-built underwater robots may be necessary for speci…
▽ More
Autonomous underwater robots typically require higher cost and time for demonstrations compared to other domains due to the complexity of the environment. Due to the limited capacity and payload flexibility, it is challenging to find off-the-shelf underwater robots that are affordable, customizable, and subject to environmental variability. Custom-built underwater robots may be necessary for specialized applications or missions, but the process can be more costly and time-consuming than purchasing an off-the-shelf autonomous underwater vehicle (AUV). To address these challenges, we propose a modular underwater robot, Modularis, that can serve as an open-source testbed system. Our proposed system expedites the testing of perception, planning, and control algorithms.
△ Less
Submitted 11 January, 2024;
originally announced January 2024.
-
In-context Pretraining: Language Modeling Beyond Document Boundaries
Authors:
Weijia Shi,
Sewon Min,
Maria Lomeli,
Chunting Zhou,
Margaret Li,
Gergely Szilvasy,
Rich James,
Xi Victoria Lin,
Noah A. Smith,
Luke Zettlemoyer,
Scott Yih,
Mike Lewis
Abstract:
Large language models (LMs) are currently trained to predict tokens given document prefixes, enabling them to directly perform long-form generation and prompting-style tasks which can be reduced to document completion. Existing pretraining pipelines train LMs by concatenating random sets of short documents to create input contexts but the prior documents provide no signal for predicting the next d…
▽ More
Large language models (LMs) are currently trained to predict tokens given document prefixes, enabling them to directly perform long-form generation and prompting-style tasks which can be reduced to document completion. Existing pretraining pipelines train LMs by concatenating random sets of short documents to create input contexts but the prior documents provide no signal for predicting the next document. We instead present In-Context Pretraining, a new approach where language models are pretrained on a sequence of related documents, thereby explicitly encouraging them to read and reason across document boundaries. We can do In-Context Pretraining by simply changing the document ordering so that each context contains related documents, and directly applying existing pretraining pipelines. However, this document sorting problem is challenging. There are billions of documents and we would like the sort to maximize contextual similarity for every document without repeating any data. To do this, we introduce approximate algorithms for finding related documents with efficient nearest neighbor search and constructing coherent input contexts with a graph traversal algorithm. Our experiments show In-Context Pretraining offers a simple and scalable approach to significantly enhance LMs'performance: we see notable improvements in tasks that require more complex contextual reasoning, including in-context learning (+8%), reading comprehension (+15%), faithfulness to previous contexts (+16%), long-context reasoning (+5%), and retrieval augmentation (+9%).
△ Less
Submitted 24 June, 2024; v1 submitted 16 October, 2023;
originally announced October 2023.
-
RA-DIT: Retrieval-Augmented Dual Instruction Tuning
Authors:
Xi Victoria Lin,
Xilun Chen,
Mingda Chen,
Weijia Shi,
Maria Lomeli,
Rich James,
Pedro Rodriguez,
Jacob Kahn,
Gergely Szilvasy,
Mike Lewis,
Luke Zettlemoyer,
Scott Yih
Abstract:
Retrieval-augmented language models (RALMs) improve performance by accessing long-tail and up-to-date knowledge from external data stores, but are challenging to build. Existing approaches require either expensive retrieval-specific modifications to LM pre-training or use post-hoc integration of the data store that leads to suboptimal performance. We introduce Retrieval-Augmented Dual Instruction…
▽ More
Retrieval-augmented language models (RALMs) improve performance by accessing long-tail and up-to-date knowledge from external data stores, but are challenging to build. Existing approaches require either expensive retrieval-specific modifications to LM pre-training or use post-hoc integration of the data store that leads to suboptimal performance. We introduce Retrieval-Augmented Dual Instruction Tuning (RA-DIT), a lightweight fine-tuning methodology that provides a third option by retrofitting any LLM with retrieval capabilities. Our approach operates in two distinct fine-tuning steps: (1) one updates a pre-trained LM to better use retrieved information, while (2) the other updates the retriever to return more relevant results, as preferred by the LM. By fine-tuning over tasks that require both knowledge utilization and contextual awareness, we demonstrate that each stage yields significant performance improvements, and using both leads to additional gains. Our best model, RA-DIT 65B, achieves state-of-the-art performance across a range of knowledge-intensive zero- and few-shot learning benchmarks, significantly outperforming existing in-context RALM approaches by up to +8.9% in 0-shot setting and +1.4% in 5-shot setting on average.
△ Less
Submitted 6 May, 2024; v1 submitted 2 October, 2023;
originally announced October 2023.
-
Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning
Authors:
Lili Yu,
Bowen Shi,
Ramakanth Pasunuru,
Benjamin Muller,
Olga Golovneva,
Tianlu Wang,
Arun Babu,
Binh Tang,
Brian Karrer,
Shelly Sheynin,
Candace Ross,
Adam Polyak,
Russell Howes,
Vasu Sharma,
Puxin Xu,
Hovhannes Tamoyan,
Oron Ashual,
Uriel Singer,
Shang-Wen Li,
Susan Zhang,
Richard James,
Gargi Ghosh,
Yaniv Taigman,
Maryam Fazel-Zarandi,
Asli Celikyilmaz
, et al. (2 additional authors not shown)
Abstract:
We present CM3Leon (pronounced "Chameleon"), a retrieval-augmented, token-based, decoder-only multi-modal language model capable of generating and infilling both text and images. CM3Leon uses the CM3 multi-modal architecture but additionally shows the extreme benefits of scaling up and tuning on more diverse instruction-style data. It is the first multi-modal model trained with a recipe adapted fr…
▽ More
We present CM3Leon (pronounced "Chameleon"), a retrieval-augmented, token-based, decoder-only multi-modal language model capable of generating and infilling both text and images. CM3Leon uses the CM3 multi-modal architecture but additionally shows the extreme benefits of scaling up and tuning on more diverse instruction-style data. It is the first multi-modal model trained with a recipe adapted from text-only language models, including a large-scale retrieval-augmented pre-training stage and a second multi-task supervised fine-tuning (SFT) stage. It is also a general-purpose model that can do both text-to-image and image-to-text generation, allowing us to introduce self-contained contrastive decoding methods that produce high-quality outputs. Extensive experiments demonstrate that this recipe is highly effective for multi-modal models. CM3Leon achieves state-of-the-art performance in text-to-image generation with 5x less training compute than comparable methods (zero-shot MS-COCO FID of 4.88). After SFT, CM3Leon can also demonstrate unprecedented levels of controllability in tasks ranging from language-guided image editing to image-controlled generation and segmentation.
△ Less
Submitted 5 September, 2023;
originally announced September 2023.
-
Reimagining Retrieval Augmented Language Models for Answering Queries
Authors:
Wang-Chiew Tan,
Yuliang Li,
Pedro Rodriguez,
Richard James,
Xi Victoria Lin,
Alon Halevy,
Scott Yih
Abstract:
We present a reality check on large language models and inspect the promise of retrieval augmented language models in comparison. Such language models are semi-parametric, where models integrate model parameters and knowledge from external data sources to make their predictions, as opposed to the parametric nature of vanilla large language models. We give initial experimental findings that semi-pa…
▽ More
We present a reality check on large language models and inspect the promise of retrieval augmented language models in comparison. Such language models are semi-parametric, where models integrate model parameters and knowledge from external data sources to make their predictions, as opposed to the parametric nature of vanilla large language models. We give initial experimental findings that semi-parametric architectures can be enhanced with views, a query analyzer/planner, and provenance to make a significantly more powerful system for question answering in terms of accuracy and efficiency, and potentially for other NLP tasks
△ Less
Submitted 1 June, 2023;
originally announced June 2023.
-
REPLUG: Retrieval-Augmented Black-Box Language Models
Authors:
Weijia Shi,
Sewon Min,
Michihiro Yasunaga,
Minjoon Seo,
Rich James,
Mike Lewis,
Luke Zettlemoyer,
Wen-tau Yih
Abstract:
We introduce REPLUG, a retrieval-augmented language modeling framework that treats the language model (LM) as a black box and augments it with a tuneable retrieval model. Unlike prior retrieval-augmented LMs that train language models with special cross attention mechanisms to encode the retrieved text, REPLUG simply prepends retrieved documents to the input for the frozen black-box LM. This simpl…
▽ More
We introduce REPLUG, a retrieval-augmented language modeling framework that treats the language model (LM) as a black box and augments it with a tuneable retrieval model. Unlike prior retrieval-augmented LMs that train language models with special cross attention mechanisms to encode the retrieved text, REPLUG simply prepends retrieved documents to the input for the frozen black-box LM. This simple design can be easily applied to any existing retrieval and language models. Furthermore, we show that the LM can be used to supervise the retrieval model, which can then find documents that help the LM make better predictions. Our experiments demonstrate that REPLUG with the tuned retriever significantly improves the performance of GPT-3 (175B) on language modeling by 6.3%, as well as the performance of Codex on five-shot MMLU by 5.1%.
△ Less
Submitted 24 May, 2023; v1 submitted 29 January, 2023;
originally announced January 2023.
-
Retrieval-Augmented Multimodal Language Modeling
Authors:
Michihiro Yasunaga,
Armen Aghajanyan,
Weijia Shi,
Rich James,
Jure Leskovec,
Percy Liang,
Mike Lewis,
Luke Zettlemoyer,
Wen-tau Yih
Abstract:
Recent multimodal models such as DALL-E and CM3 have achieved remarkable progress in text-to-image and image-to-text generation. However, these models store all learned knowledge (e.g., the appearance of the Eiffel Tower) in the model parameters, requiring increasingly larger models and training data to capture more knowledge. To integrate knowledge in a more scalable and modular way, we propose a…
▽ More
Recent multimodal models such as DALL-E and CM3 have achieved remarkable progress in text-to-image and image-to-text generation. However, these models store all learned knowledge (e.g., the appearance of the Eiffel Tower) in the model parameters, requiring increasingly larger models and training data to capture more knowledge. To integrate knowledge in a more scalable and modular way, we propose a retrieval-augmented multimodal model, which enables a base multimodal model (generator) to refer to relevant text and images fetched by a retriever from external memory (e.g., documents on the web). Specifically, for the retriever, we use a pretrained CLIP, and for the generator, we train a CM3 Transformer on the LAION dataset. Our resulting model, named Retrieval-Augmented CM3 (RA-CM3), is the first multimodal model that can retrieve and generate both text and images. We show that RA-CM3 significantly outperforms baseline multimodal models such as DALL-E and CM3 on both image and caption generation tasks (12 FID and 17 CIDEr improvements on MS-COCO), while requiring much less compute for training (<30% of DALL-E). Moreover, we show that RA-CM3 exhibits novel capabilities, such as faithful image generation and multimodal in-context learning (e.g., image generation from demonstrations).
△ Less
Submitted 5 June, 2023; v1 submitted 22 November, 2022;
originally announced November 2022.
-
Personal+Context navigation: combining AR and shared displays in network path-following
Authors:
Raphaël James,
Anastasia Bezerianos,
Olivier Chapuis,
Maxime Cordeil,
Tim Dwyer,
Arnaud Prouzeau
Abstract:
Shared displays are well suited to public viewing and collaboration, however they lack personal space to view private information and act without disturbing others. Combining them with Augmented Reality (AR) headsets allows interaction without altering the context on the shared display. We study a set of such interaction techniques in the context of network navigation, in particular path following…
▽ More
Shared displays are well suited to public viewing and collaboration, however they lack personal space to view private information and act without disturbing others. Combining them with Augmented Reality (AR) headsets allows interaction without altering the context on the shared display. We study a set of such interaction techniques in the context of network navigation, in particular path following, an important network analysis task. Applications abound, for example planning private trips on a network map shown on a public display.The proposed techniques allow for hands-free interaction, rendering visual aids inside the headset, in order to help the viewer maintain a connection between the AR cursor and the network that is only shown on the shared display. In two experiments on path following, we found that adding persistent connections between the AR cursor and the network on the shared display works well for high precision tasks, but more transient connections work best for lower precision tasks. More broadly, we show that combining personal AR interaction with shared displays is feasible for network navigation.
△ Less
Submitted 19 May, 2020;
originally announced May 2020.
-
Correlated structural evolution within multiplex networks
Authors:
Haochen Wu,
Ryan G. James,
James P. Crutchfield,
Raissa M. D'Souza
Abstract:
Many natural, engineered, and social systems can be represented using the framework of a layered network, where each layer captures a different type of interaction between the same set of nodes. The study of such multiplex networks is a vibrant area of research. Yet, understanding how to quantify the correlations present between pairs of layers, and more so present in their co-evolution, is lackin…
▽ More
Many natural, engineered, and social systems can be represented using the framework of a layered network, where each layer captures a different type of interaction between the same set of nodes. The study of such multiplex networks is a vibrant area of research. Yet, understanding how to quantify the correlations present between pairs of layers, and more so present in their co-evolution, is lacking. Such methods would enable us to address fundamental questions involving issues such as function, redundancy and potential disruptions. Here we show first how the edge-set of a multiplex network can be used to construct an estimator of a joint probability distribution describing edge existence over all layers. We then adapt an information-theoretic measure of general correlation called the conditional mutual information, which uses the estimated joint probability distribution, to quantify the pairwise correlations present between layers. The pairwise comparisons can also be temporal, allowing us to identify if knowledge of a certain layer can provide additional information about the evolution of another layer.
We analyze datasets from three distinct domains---economic, political, and airline networks---to demonstrate how pairwise correlation in structure and dynamical evolution between layers can be identified and show that anomalies can serve as potential indicators of major events such as shocks.
△ Less
Submitted 9 May, 2020;
originally announced May 2020.
-
Embracing the Laws of Physics: Three Reversible Models of Computation
Authors:
Jacques Carette,
Roshan P. James,
Amr Sabry
Abstract:
Our main models of computation (the Turing Machine and the RAM) make fundamental assumptions about which primitive operations are realizable. The consensus is that these include logical operations like conjunction, disjunction and negation, as well as reading and writing to memory locations. This perspective conforms to a macro-level view of physics and indeed these operations are realizable using…
▽ More
Our main models of computation (the Turing Machine and the RAM) make fundamental assumptions about which primitive operations are realizable. The consensus is that these include logical operations like conjunction, disjunction and negation, as well as reading and writing to memory locations. This perspective conforms to a macro-level view of physics and indeed these operations are realizable using macro-level devices involving thousands of electrons. This point of view is however incompatible with quantum mechanics, or even elementary thermodynamics, as both imply that information is a conserved quantity of physical processes, and hence of primitive computational operations.
Our aim is to re-develop foundational computational models that embraces the principle of conservation of information. We first define what conservation of information means in a computational setting. We emphasize that computations must be reversible transformations on data. One can think of data as modeled using topological spaces and programs as modeled by reversible deformations. We illustrate this idea using three notions of data. The first assumes unstructured finite data, i.e., discrete topological spaces. The corresponding notion of reversible computation is that of permutations. We then consider a structured notion of data based on the Curry-Howard correspondence; here reversible deformations, as a programming language for witnessing type isomorphisms, comes from proof terms for commutative semirings. We then "move up a level" to treat programs as data. The corresponding notion of reversible programs equivalences comes from the "higher dimensional" analog to commutative semirings: symmetric rig groupoids. The coherence laws for these are exactly the program equivalences we seek.
We conclude with some generalizations inspired by homotopy type theory and survey directions for further research.
△ Less
Submitted 10 December, 2018; v1 submitted 8 November, 2018;
originally announced November 2018.
-
Unique Information and Secret Key Agreement
Authors:
Ryan G. James,
Jeffrey Emenheiser,
James P. Crutchfield
Abstract:
The partial information decomposition (PID) is a promising framework for decomposing a joint random variable into the amount of influence each source variable Xi has on a target variable Y, relative to the other sources. For two sources, influence breaks down into the information that both X0 and X1 redundantly share with Y, what X0 uniquely shares with Y, what X1 uniquely shares with Y, and final…
▽ More
The partial information decomposition (PID) is a promising framework for decomposing a joint random variable into the amount of influence each source variable Xi has on a target variable Y, relative to the other sources. For two sources, influence breaks down into the information that both X0 and X1 redundantly share with Y, what X0 uniquely shares with Y, what X1 uniquely shares with Y, and finally what X0 and X1 synergistically share with Y. Unfortunately, considerable disagreement has arisen as to how these four components should be quantified. Drawing from cryptography, we consider the secret key agreement rate as an operational method of quantifying unique informations. Secret key agreement rate comes in several forms, depending upon which parties are permitted to communicate. We demonstrate that three of these four forms are inconsistent with the PID. The remaining form implies certain interpretations as to the PID's meaning---interpretations not present in PID's definition but that, we argue, need to be explicit. These reveal an inconsistency between third-order connected information, two-way secret key agreement rate, and synergy. Similar difficulties arise with a popular PID measure in light the results here as well as from a maximum entropy viewpoint. We close by reviewing the challenges facing the PID.
△ Less
Submitted 1 November, 2018;
originally announced November 2018.
-
A Perspective on Unique Information: Directionality, Intuitions, and Secret Key Agreement
Authors:
Ryan G. James,
Jeffrey Emenheiser,
James P. Crutchfield
Abstract:
Recently, the partial information decomposition emerged as a promising framework for identifying the meaningful components of the information contained in a joint distribution. Its adoption and practical application, however, have been stymied by the lack of a generally-accepted method of quantifying its components. Here, we briefly discuss the bivariate (two-source) partial information decomposit…
▽ More
Recently, the partial information decomposition emerged as a promising framework for identifying the meaningful components of the information contained in a joint distribution. Its adoption and practical application, however, have been stymied by the lack of a generally-accepted method of quantifying its components. Here, we briefly discuss the bivariate (two-source) partial information decomposition and two implicitly directional interpretations used to intuitively motivate alternative component definitions. Drawing parallels with secret key agreement rates from information-theoretic cryptography, we demonstrate that these intuitions are mutually incompatible and suggest that this underlies the persistence of competing definitions and interpretations. Having highlighted this hitherto unacknowledged issue, we outline several possible solutions.
△ Less
Submitted 26 August, 2018;
originally announced August 2018.
-
Modes of Information Flow
Authors:
Ryan G. James,
Blanca Daniella Mansante Ayala,
Bahti Zakirov,
James P. Crutchfield
Abstract:
Information flow between components of a system takes many forms and is key to understanding the organization and functioning of large-scale, complex systems. We demonstrate three modalities of information flow from time series X to time series Y. Intrinsic information flow exists when the past of X is individually predictive of the present of Y, independent of Y's past; this is most commonly cons…
▽ More
Information flow between components of a system takes many forms and is key to understanding the organization and functioning of large-scale, complex systems. We demonstrate three modalities of information flow from time series X to time series Y. Intrinsic information flow exists when the past of X is individually predictive of the present of Y, independent of Y's past; this is most commonly considered information flow. Shared information flow exists when X's past is predictive of Y's present in the same manner as Y's past; this occurs due to synchronization or common driving, for example. Finally, synergistic information flow occurs when neither X's nor Y's pasts are predictive of Y's present on their own, but taken together they are. The two most broadly-employed information-theoretic methods of quantifying information flow---time-delayed mutual information and transfer entropy---are both sensitive to a pair of these modalities: time-delayed mutual information to both intrinsic and shared flow, and transfer entropy to both intrinsic and synergistic flow. To quantify each mode individually we introduce our cryptographic flow ansatz, positing that intrinsic flow is synonymous with secret key agreement between X and Y. Based on this, we employ an easily-computed secret-key-agreement bound---intrinsic mutual information&mdashto quantify the three flow modalities in a variety of systems including asymmetric flows and financial markets.
△ Less
Submitted 20 August, 2018;
originally announced August 2018.
-
Unique Information via Dependency Constraints
Authors:
Ryan G. James,
Jeffrey Emenheiser,
James P. Crutchfield
Abstract:
The partial information decomposition (PID) is perhaps the leading proposal for resolving information shared between a set of sources and a target into redundant, synergistic, and unique constituents. Unfortunately, the PID framework has been hindered by a lack of a generally agreed-upon, multivariate method of quantifying the constituents. Here, we take a step toward rectifying this by developing…
▽ More
The partial information decomposition (PID) is perhaps the leading proposal for resolving information shared between a set of sources and a target into redundant, synergistic, and unique constituents. Unfortunately, the PID framework has been hindered by a lack of a generally agreed-upon, multivariate method of quantifying the constituents. Here, we take a step toward rectifying this by developing a decomposition based on a new method that quantifies unique information. We first develop a broadly applicable method---the dependency decomposition---that delineates how statistical dependencies influence the structure of a joint distribution. The dependency decomposition then allows us to define a measure of the information about a target that can be uniquely attributed to a particular source as the least amount which the source-target statistical dependency can influence the information shared between the sources and the target. The result is the first measure that satisfies the core axioms of the PID framework while not satisfying the Blackwell relation, which depends on a particular interpretation of how the variables are related. This makes a key step forward to a practical PID.
△ Less
Submitted 27 October, 2018; v1 submitted 19 September, 2017;
originally announced September 2017.
-
Prediction and Generation of Binary Markov Processes: Can a Finite-State Fox Catch a Markov Mouse?
Authors:
J. Ruebeck,
R. G. James,
J. R. Mahoney,
J. P. Crutchfield
Abstract:
Understanding the generative mechanism of a natural system is a vital component of the scientific method. Here, we investigate one of the fundamental steps toward this goal by presenting the minimal generator of an arbitrary binary Markov process. This is a class of processes whose predictive model is well known. Surprisingly, the generative model requires three distinct topologies for different r…
▽ More
Understanding the generative mechanism of a natural system is a vital component of the scientific method. Here, we investigate one of the fundamental steps toward this goal by presenting the minimal generator of an arbitrary binary Markov process. This is a class of processes whose predictive model is well known. Surprisingly, the generative model requires three distinct topologies for different regions of parameter space. We show that a previously proposed generator for a particular set of binary Markov processes is, in fact, not minimal. Our results shed the first quantitative light on the relative (minimal) costs of prediction and generation. We find, for instance, that the difference between prediction and generation is maximized when the process is approximately independently, identically distributed.
△ Less
Submitted 31 July, 2017;
originally announced August 2017.
-
Actor-Critic for Linearly-Solvable Continuous MDP with Partially Known Dynamics
Authors:
Tomoki Nishi,
Prashant Doshi,
Michael R. James,
Danil Prokhorov
Abstract:
In many robotic applications, some aspects of the system dynamics can be modeled accurately while others are difficult to obtain or model. We present a novel reinforcement learning (RL) method for continuous state and action spaces that learns with partial knowledge of the system and without active exploration. It solves linearly-solvable Markov decision processes (L-MDPs), which are well suited f…
▽ More
In many robotic applications, some aspects of the system dynamics can be modeled accurately while others are difficult to obtain or model. We present a novel reinforcement learning (RL) method for continuous state and action spaces that learns with partial knowledge of the system and without active exploration. It solves linearly-solvable Markov decision processes (L-MDPs), which are well suited for continuous state and action spaces, based on an actor-critic architecture. Compared to previous RL methods for L-MDPs and path integral methods which are model based, the actor-critic learning does not need a model of the uncontrolled dynamics and, importantly, transition noise levels; however, it requires knowing the control dynamics for the problem. We evaluate our method on two synthetic test problems, and one real-world problem in simulation and using real traffic data. Our experiments demonstrate improved learning and policy performance.
△ Less
Submitted 4 June, 2017;
originally announced June 2017.
-
Trimming the Independent Fat: Sufficient Statistics, Mutual Information, and Predictability from Effective Channel States
Authors:
Ryan G. James,
John R. Mahoney,
James P. Crutchfield
Abstract:
One of the most fundamental questions one can ask about a pair of random variables X and Y is the value of their mutual information. Unfortunately, this task is often stymied by the extremely large dimension of the variables. We might hope to replace each variable by a lower-dimensional representation that preserves the relationship with the other variable. The theoretically ideal implementation i…
▽ More
One of the most fundamental questions one can ask about a pair of random variables X and Y is the value of their mutual information. Unfortunately, this task is often stymied by the extremely large dimension of the variables. We might hope to replace each variable by a lower-dimensional representation that preserves the relationship with the other variable. The theoretically ideal implementation is the use of minimal sufficient statistics, where it is well-known that either X or Y can be replaced by their minimal sufficient statistic about the other while preserving the mutual information. While intuitively reasonable, it is not obvious or straightforward that both variables can be replaced simultaneously. We demonstrate that this is in fact possible: the information X's minimal sufficient statistic preserves about Y is exactly the information that Y's minimal sufficient statistic preserves about X. As an important corollary, we consider the case where one variable is a stochastic process' past and the other its future and the present is viewed as a memoryful channel. In this case, the mutual information is the channel transmission rate between the channel's effective states. That is, the past-future mutual information (the excess entropy) is the amount of information about the future that can be predicted using the past. Translating our result about minimal sufficient statistics, this is equivalent to the mutual information between the forward- and reverse-time causal states of computational mechanics. We close by discussing multivariate extensions to this use of minimal sufficient statistics.
△ Less
Submitted 6 February, 2017;
originally announced February 2017.
-
Multivariate Dependence Beyond Shannon Information
Authors:
Ryan G. James,
James P. Crutchfield
Abstract:
Accurately determining dependency structure is critical to discovering a system's causal organization. We recently showed that the transfer entropy fails in a key aspect of this---measuring information flow---due to its conflation of dyadic and polyadic relationships. We extend this observation to demonstrate that this is true of all such Shannon information measures when used to analyze multivari…
▽ More
Accurately determining dependency structure is critical to discovering a system's causal organization. We recently showed that the transfer entropy fails in a key aspect of this---measuring information flow---due to its conflation of dyadic and polyadic relationships. We extend this observation to demonstrate that this is true of all such Shannon information measures when used to analyze multivariate dependencies. This has broad implications, particularly when employing information to express the organization and mechanisms embedded in complex systems, including the burgeoning efforts to combine complex network theory with information theory. Here, we do not suggest that any aspect of information theory is wrong. Rather, the vast majority of its informational measures are simply inadequate for determining the meaningful dependency structure within joint probability distributions. Therefore, such information measures are inadequate for discovering intrinsic causal relations. We close by demonstrating that such distributions exist across an arbitrary set of variables.
△ Less
Submitted 8 September, 2016; v1 submitted 5 September, 2016;
originally announced September 2016.
-
Information Flows? A Critique of Transfer Entropies
Authors:
Ryan G. James,
Nix Barnett,
James P. Crutchfield
Abstract:
A central task in analyzing complex dynamics is to determine the loci of information storage and the communication topology of information flows within a system. Over the last decade and a half, diagnostics for the latter have come to be dominated by the transfer entropy. Via straightforward examples, we show that it and a derivative quantity, the causation entropy, do not, in fact, quantify the f…
▽ More
A central task in analyzing complex dynamics is to determine the loci of information storage and the communication topology of information flows within a system. Over the last decade and a half, diagnostics for the latter have come to be dominated by the transfer entropy. Via straightforward examples, we show that it and a derivative quantity, the causation entropy, do not, in fact, quantify the flow of information. At one and the same time they can overestimate flow or underestimate influence. We isolate why this is the case and propose several avenues to alternate measures for information flow. We also address an auxiliary consequence: The proliferation of networks as a now-common theoretical model for large-scale systems, in concert with the use of transfer-like entropies, has shoehorned dyadic relationships into our structural interpretation of the organization and behavior of complex systems. This interpretation thus fails to include the effects of polyadic dependencies. The net result is that much of the sophisticated organization of complex systems may go undetected.
△ Less
Submitted 17 June, 2016; v1 submitted 20 December, 2015;
originally announced December 2015.
-
Anatomy of a Spin: The Information-Theoretic Structure of Classical Spin Systems
Authors:
V. S. Vijayaraghavan,
R. G. James,
J. P. Crutchfield
Abstract:
Collective organization in matter plays a significant role in its expressed physical properties. Typically, it is detected via an order parameter, appropriately defined for each given system's observed emergent patterns. Recent developments in information theory, however, suggest quantifying collective organization in a system- and phenomenon-agnostic way: decompose the system's thermodynamic entr…
▽ More
Collective organization in matter plays a significant role in its expressed physical properties. Typically, it is detected via an order parameter, appropriately defined for each given system's observed emergent patterns. Recent developments in information theory, however, suggest quantifying collective organization in a system- and phenomenon-agnostic way: decompose the system's thermodynamic entropy density into a localized entropy, that solely contained in the dynamics at a single location, and a bound entropy, that stored in space as domains, clusters, excitations, or other emergent structures. We compute this decomposition and related quantities explicitly for the nearest-neighbor Ising model on the 1D chain, the Bethe lattice with coordination number k=3, and the 2D square lattice, illustrating its generality and the functional insights it gives near and away from phase transitions. In particular, we consider the roles that different spin motifs play (in cluster bulk, cluster edges, and the like) and how these affect the dependencies between spins.
△ Less
Submitted 13 August, 2016; v1 submitted 29 October, 2015;
originally announced October 2015.
-
A new method for choosing parameters in delay reconstruction-based forecast strategies
Authors:
Joshua Garland,
Ryan G. James,
Elizabeth Bradley
Abstract:
Delay-coordinate reconstruction is a proven modeling strategy for building effective forecasts of nonlinear time series. The first step in this process is the estimation of good values for two parameters, the time delay and the embedding dimension. Many heuristics and strategies have been proposed in the literature for estimating these values. Few, if any, of these methods were developed with fore…
▽ More
Delay-coordinate reconstruction is a proven modeling strategy for building effective forecasts of nonlinear time series. The first step in this process is the estimation of good values for two parameters, the time delay and the embedding dimension. Many heuristics and strategies have been proposed in the literature for estimating these values. Few, if any, of these methods were developed with forecasting in mind, however, and their results are not optimal for that purpose. Even so, these heuristics---intended for other applications---are routinely used when building delay coordinate reconstruction-based forecast models. In this paper, we propose a new strategy for choosing optimal parameter values for forecast methods that are based on delay-coordinate reconstructions. The basic calculation involves maximizing the shared information between each delay vector and the future state of the system. We illustrate the effectiveness of this method on several synthetic and experimental systems, showing that this metric can be calculated quickly and reliably from a relatively short time series, and that it provides a direct indication of how well a near-neighbor based forecasting method will work on a given delay reconstruction of that time series. This allows a practitioner to choose reconstruction parameters that avoid any pathologies, regardless of the underlying mechanism, and maximize the predictive information contained in the reconstruction.
△ Less
Submitted 15 October, 2015; v1 submitted 5 September, 2015;
originally announced September 2015.
-
The Elusive Present: Hidden Past and Future Dependency and Why We Build Models
Authors:
Pooneh M. Ara,
Ryan G. James,
James P. Crutchfield
Abstract:
Modeling a temporal process as if it is Markovian assumes the present encodes all of the process's history. When this occurs, the present captures all of the dependency between past and future. We recently showed that if one randomly samples in the space of structured processes, this is almost never the case. So, how does the Markov failure come about? That is, how do individual measurements fail…
▽ More
Modeling a temporal process as if it is Markovian assumes the present encodes all of the process's history. When this occurs, the present captures all of the dependency between past and future. We recently showed that if one randomly samples in the space of structured processes, this is almost never the case. So, how does the Markov failure come about? That is, how do individual measurements fail to encode the past? And, how many are needed to capture dependencies between the past and future? Here, we investigate how much information can be shared between the past and future, but not be reflected in the present. We quantify this elusive information, give explicit calculational methods, and draw out the consequences. The most important of which is that when the present hides past-future dependency we must move beyond sequence-based statistics and build state-based models.
△ Less
Submitted 2 July, 2015;
originally announced July 2015.
-
Understanding and Designing Complex Systems: Response to "A framework for optimal high-level descriptions in science and engineering---preliminary report"
Authors:
James P. Crutchfield,
Ryan G. James,
Sarah Marzen,
Dowman P. Varn
Abstract:
We recount recent history behind building compact models of nonlinear, complex processes and identifying their relevant macroscopic patterns or "macrostates". We give a synopsis of computational mechanics, predictive rate-distortion theory, and the role of information measures in monitoring model complexity and predictive performance. Computational mechanics provides a method to extract the optima…
▽ More
We recount recent history behind building compact models of nonlinear, complex processes and identifying their relevant macroscopic patterns or "macrostates". We give a synopsis of computational mechanics, predictive rate-distortion theory, and the role of information measures in monitoring model complexity and predictive performance. Computational mechanics provides a method to extract the optimal minimal predictive model for a given process. Rate-distortion theory provides methods for systematically approximating such models. We end by commenting on future prospects for developing a general framework that automatically discovers optimal compact models. As a response to the manuscript cited in the title above, this brief commentary corrects potentially misleading claims about its state space compression method and places it in a broader historical setting.
△ Less
Submitted 29 December, 2014;
originally announced December 2014.
-
Model-free quantification of time-series predictability
Authors:
Joshua Garland,
Ryan James,
Elizabeth Bradley
Abstract:
This paper provides insight into when, why, and how forecast strategies fail when they are applied to complicated time series. We conjecture that the inherent complexity of real-world time-series data---which results from the dimension, nonlinearity, and non-stationarity of the generating process, as well as from measurement issues like noise, aggregation, and finite data length---is both empirica…
▽ More
This paper provides insight into when, why, and how forecast strategies fail when they are applied to complicated time series. We conjecture that the inherent complexity of real-world time-series data---which results from the dimension, nonlinearity, and non-stationarity of the generating process, as well as from measurement issues like noise, aggregation, and finite data length---is both empirically quantifiable and directly correlated with predictability. In particular, we argue that redundancy is an effective way to measure complexity and predictive structure in an experimental time series and that weighted permutation entropy is an effective way to estimate that redundancy. To validate these conjectures, we study 120 different time-series data sets. For each time series, we construct predictions using a wide variety of forecast models, then compare the accuracy of the predictions with the permutation entropy of that time series. We use the results to develop a model-free heuristic that can help practitioners recognize when a particular prediction method is not well matched to the task at hand: that is, when the time series has more predictive structure than that method can capture and exploit.
△ Less
Submitted 5 August, 2014; v1 submitted 27 April, 2014;
originally announced April 2014.
-
Intersection Information based on Common Randomness
Authors:
Virgil Griffith,
Edwin K. P. Chong,
Ryan G. James,
Christopher J. Ellison,
James P. Crutchfield
Abstract:
The introduction of the partial information decomposition generated a flurry of proposals for defining an intersection information that quantifies how much of "the same information" two or more random variables specify about a target random variable. As of yet, none is wholly satisfactory. A palatable measure of intersection information would provide a principled way to quantify slippery concepts,…
▽ More
The introduction of the partial information decomposition generated a flurry of proposals for defining an intersection information that quantifies how much of "the same information" two or more random variables specify about a target random variable. As of yet, none is wholly satisfactory. A palatable measure of intersection information would provide a principled way to quantify slippery concepts, such as synergy. Here, we introduce an intersection information measure based on the Gács-Körner common random variable that is the first to satisfy the coveted target monotonicity property. Our measure is imperfect, too, and we suggest directions for improvement.
△ Less
Submitted 10 June, 2015; v1 submitted 6 October, 2013;
originally announced October 2013.
-
Chaos Forgets and Remembers: Measuring Information Creation, Destruction, and Storage
Authors:
Ryan G. James,
Korana Burke,
James P. Crutchfield
Abstract:
The hallmark of deterministic chaos is that it creates information---the rate being given by the Kolmogorov-Sinai metric entropy. Since its introduction half a century ago, the metric entropy has been used as a unitary quantity to measure a system's intrinsic unpredictability. Here, we show that it naturally decomposes into two structurally meaningful components: A portion of the created informati…
▽ More
The hallmark of deterministic chaos is that it creates information---the rate being given by the Kolmogorov-Sinai metric entropy. Since its introduction half a century ago, the metric entropy has been used as a unitary quantity to measure a system's intrinsic unpredictability. Here, we show that it naturally decomposes into two structurally meaningful components: A portion of the created information---the ephemeral information---is forgotten and a portion---the bound information---is remembered. The bound information is a new kind of intrinsic computation that differs fundamentally from information creation: it measures the rate of active information storage. We show that it can be directly and accurately calculated via symbolic dynamics, revealing a hitherto unknown richness in how dynamical systems compute.
△ Less
Submitted 16 December, 2013; v1 submitted 21 September, 2013;
originally announced September 2013.
-
Determinism, Complexity, and Predictability in Computer Performance
Authors:
Joshua Garland,
Ryan James,
Elizabeth Bradley
Abstract:
Computers are deterministic dynamical systems (CHAOS 19:033124, 2009). Among other things, that implies that one should be able to use deterministic forecast rules to predict their behavior. That statement is sometimes-but not always-true. The memory and processor loads of some simple programs are easy to predict, for example, but those of more-complex programs like compilers are not. The goal of…
▽ More
Computers are deterministic dynamical systems (CHAOS 19:033124, 2009). Among other things, that implies that one should be able to use deterministic forecast rules to predict their behavior. That statement is sometimes-but not always-true. The memory and processor loads of some simple programs are easy to predict, for example, but those of more-complex programs like compilers are not. The goal of this paper is to determine why that is the case. We conjecture that, in practice, complexity can effectively overwhelm the predictive power of deterministic forecast models. To explore that, we build models of a number of performance traces from different programs running on different Intel-based computers. We then calculate the permutation entropy-a temporal entropy metric that uses ordinal analysis-of those traces and correlate those values against the prediction success
△ Less
Submitted 23 May, 2013;
originally announced May 2013.
-
How Hidden are Hidden Processes? A Primer on Crypticity and Entropy Convergence
Authors:
John R. Mahoney,
Christopher J. Ellison,
Ryan G. James,
James P. Crutchfield
Abstract:
We investigate a stationary process's crypticity---a measure of the difference between its hidden state information and its observed information---using the causal states of computational mechanics. Here, we motivate crypticity and cryptic order as physically meaningful quantities that monitor how hidden a hidden process is. This is done by recasting previous results on the convergence of block en…
▽ More
We investigate a stationary process's crypticity---a measure of the difference between its hidden state information and its observed information---using the causal states of computational mechanics. Here, we motivate crypticity and cryptic order as physically meaningful quantities that monitor how hidden a hidden process is. This is done by recasting previous results on the convergence of block entropy and block-state entropy in a geometric setting, one that is more intuitive and that leads to a number of new results. For example, we connect crypticity to how an observer synchronizes to a process. We show that the block-causal-state entropy is a convex function of block length. We give a complete analysis of spin chains. We present a classification scheme that surveys stationary processes in terms of their possible cryptic and Markov orders. We illustrate related entropy convergence behaviors using a new form of foliated information diagram. Finally, along the way, we provide a variety of interpretations of crypticity and cryptic order to establish their naturalness and pervasiveness. Hopefully, these will inspire new applications in spatially extended and network dynamical systems.
△ Less
Submitted 6 August, 2011;
originally announced August 2011.
-
Information Symmetries in Irreversible Processes
Authors:
Christopher J. Ellison,
John R. Mahoney,
Ryan G. James,
James P. Crutchfield,
Joerg Reichardt
Abstract:
We study dynamical reversibility in stationary stochastic processes from an information theoretic perspective. Extending earlier work on the reversibility of Markov chains, we focus on finitary processes with arbitrarily long conditional correlations. In particular, we examine stationary processes represented or generated by edge-emitting, finite-state hidden Markov models. Surprisingly, we find p…
▽ More
We study dynamical reversibility in stationary stochastic processes from an information theoretic perspective. Extending earlier work on the reversibility of Markov chains, we focus on finitary processes with arbitrarily long conditional correlations. In particular, we examine stationary processes represented or generated by edge-emitting, finite-state hidden Markov models. Surprisingly, we find pervasive temporal asymmetries in the statistics of such stationary processes with the consequence that the computational resources necessary to generate a process in the forward and reverse temporal directions are generally not the same. In fact, an exhaustive survey indicates that most stationary processes are irreversible. We study the ensuing relations between model topology in different representations, the process's statistical properties, and its reversibility in detail. A process's temporal asymmetry is efficiently captured using two canonical unifilar representations of the generating model, the forward-time and reverse-time epsilon-machines. We analyze example irreversible processes whose epsilon-machine presentations change size under time reversal, including one which has a finite number of recurrent causal states in one direction, but an infinite number in the opposite. From the forward-time and reverse-time epsilon-machines, we are able to construct a symmetrized, but nonunifilar, generator of a process---the bidirectional machine. Using the bidirectional machine, we show how to directly calculate a process's fundamental information properties, many of which are otherwise only poorly approximated via process samples. The tools we introduce and the insights we offer provide a better understanding of the many facets of reversibility and irreversibility in stochastic processes.
△ Less
Submitted 11 July, 2011;
originally announced July 2011.
-
Anatomy of a Bit: Information in a Time Series Observation
Authors:
Ryan G. James,
Christopher J. Ellison,
James P. Crutchfield
Abstract:
Appealing to several multivariate information measures---some familiar, some new here---we analyze the information embedded in discrete-valued stochastic time series. We dissect the uncertainty of a single observation to demonstrate how the measures' asymptotic behavior sheds structural and semantic light on the generating process's internal information dynamics. The measures scale with the length…
▽ More
Appealing to several multivariate information measures---some familiar, some new here---we analyze the information embedded in discrete-valued stochastic time series. We dissect the uncertainty of a single observation to demonstrate how the measures' asymptotic behavior sheds structural and semantic light on the generating process's internal information dynamics. The measures scale with the length of time window, which captures both intensive (rates of growth) and subextensive components. We provide interpretations for the components, developing explicit relationships between them. We also identify the informational component shared between the past and the future that is not contained in a single observation. The existence of this component directly motivates the notion of a process's effective (internal) states and indicates why one must build models.
△ Less
Submitted 15 May, 2011;
originally announced May 2011.
-
Many Roads to Synchrony: Natural Time Scales and Their Algorithms
Authors:
Ryan G. James,
John R. Mahoney,
Christopher J. Ellison,
James P. Crutchfield
Abstract:
We consider two important time scales---the Markov and cryptic orders---that monitor how an observer synchronizes to a finitary stochastic process. We show how to compute these orders exactly and that they are most efficiently calculated from the epsilon-machine, a process's minimal unifilar model. Surprisingly, though the Markov order is a basic concept from stochastic process theory, it is not a…
▽ More
We consider two important time scales---the Markov and cryptic orders---that monitor how an observer synchronizes to a finitary stochastic process. We show how to compute these orders exactly and that they are most efficiently calculated from the epsilon-machine, a process's minimal unifilar model. Surprisingly, though the Markov order is a basic concept from stochastic process theory, it is not a probabilistic property of a process. Rather, it is a topological property and, moreover, it is not computable from any finite-state model other than the epsilon-machine. Via an exhaustive survey, we close by demonstrating that infinite Markov and infinite cryptic orders are a dominant feature in the space of finite-memory processes. We draw out the roles played in statistical mechanical spin systems by these two complementary length scales.
△ Less
Submitted 20 December, 2013; v1 submitted 26 October, 2010;
originally announced October 2010.
-
Synchronization and Control in Intrinsic and Designed Computation: An Information-Theoretic Analysis of Competing Models of Stochastic Computation
Authors:
James P. Crutchfield,
Christopher J. Ellison,
Ryan G. James,
John R. Mahoney
Abstract:
We adapt tools from information theory to analyze how an observer comes to synchronize with the hidden states of a finitary, stationary stochastic process. We show that synchronization is determined by both the process's internal organization and by an observer's model of it. We analyze these components using the convergence of state-block and block-state entropies, comparing them to the previousl…
▽ More
We adapt tools from information theory to analyze how an observer comes to synchronize with the hidden states of a finitary, stationary stochastic process. We show that synchronization is determined by both the process's internal organization and by an observer's model of it. We analyze these components using the convergence of state-block and block-state entropies, comparing them to the previously known convergence properties of the Shannon block entropy. Along the way, we introduce a hierarchy of information quantifiers as derivatives and integrals of these entropies, which parallels a similar hierarchy introduced for block entropy. We also draw out the duality between synchronization properties and a process's controllability. The tools lead to a new classification of a process's alternative representations in terms of minimality, synchronizability, and unifilarity.
△ Less
Submitted 29 July, 2010;
originally announced July 2010.