Search | arXiv e-print repository

StructuredRAG: JSON Response Formatting with Large Language Models

Authors: Connor Shorten, Charles Pierse, Thomas Benjamin Smith, Erika Cardenas, Akanksha Sharma, John Trengrove, Bob van Luijt

Abstract: The ability of Large Language Models (LLMs) to generate structured outputs, such as JSON, is crucial for their use in Compound AI Systems. However, evaluating and improving this capability remains challenging. In this work, we introduce StructuredRAG, a benchmark of six tasks designed to assess LLMs' proficiency in following response format instructions. We evaluate two state-of-the-art LLMs, Gemini 1.5 Pro and Llama 3 8B-instruct with 4-bit quantization using two distinct prompting strategies. We introduce these prompting strategies as f-String and Follow the Format (FF) prompting. Across 24 experiments, we find an average success rate of 82.55%. We further find a high variance in performance across tasks, models, and prompting strategies with success rates ranging from 0 to 100%. We find that Llama 3 8B-instruct often performs competitively with Gemini 1.5 Pro. We observe that task complexity significantly influences performance, with tasks involving lists or composite object outputs proving more challenging. Our findings highlight the need for further research into improving the reliability and consistency of structured output generation in LLMs. We have open-sourced our experimental code and results at github.com/weaviate/structured-rag.

Submitted 7 August, 2024; originally announced August 2024.

Comments: Preprint. 10 pages, 6 figures

arXiv:2408.05604 [pdf, other]

Cellular Plasticity Model for Bottom-Up Robotic Design

Authors: Trevor R. Smith, Thomas J. Smith, Nicholas S. Szczecinski, Sergiy Yakovenko, Yu Gu

Abstract: Traditional top-down robotic design often lacks the adaptability needed to handle real-world complexities, prompting the need for more flexible approaches. Therefore, this study introduces a novel cellular plasticity model tailored for bottom-up robotic design. The proposed model utilizes an activator-inhibitor reaction, a common foundation of Turing patterns, which are fundamental in morphogenesis -- the emergence of form from simple interactions. Turing patterns describe how diffusion and interactions between two chemical substances -- an activator and an inhibitor -- can lead to complex patterns and structures, such as the formation of limbs and feathers. Our study extends this concept by modeling cellular plasticity as an activator-inhibitor reaction augmented with environmental stimuli, encapsulating the core phenomena observed across various cell types: stem cells, neurons, and muscle cells. In addition to demonstrating self-regulation and self-containment, this approach ensures that a robot's form and function are direct emergent responses to its environment without a comprehensive environmental model. In the proposed model, a factory acts as the activator, producing a product that serves as the inhibitor, which is then influenced by environmental stimuli through consumption. These components are regulated by cellular plasticity phenomena as feedback loops. We calculate the equilibrium points of the model and the stability criterion. Simulations examine how varying parameters affect the system's transient behavior and the impact of competing functions on its functional capacity. Results show the model converges to a single stable equilibrium tuned to the environmental stimulation. Such dynamic behavior underscores the model's utility for generating predictable responses within robotics and biological systems, showcasing its potential for navigating the complexities of adaptive systems.

Submitted 10 August, 2024; originally announced August 2024.

Comments: 15 pages, 7 figures, Living Machines 2024

arXiv:2407.19115 [pdf, other]

Towards Scalable and Stable Parallelization of Nonlinear RNNs

Authors: Xavier Gonzalez, Andrew Warrington, Jimmy T. H. Smith, Scott W. Linderman

Abstract: Conventional nonlinear RNNs are not naturally parallelizable across the sequence length, whereas transformers and linear RNNs are. Lim et al. [2024] therefore tackle parallelized evaluation of nonlinear RNNs by posing it as a fixed point problem, solved with Newton's method. By deriving and applying a parallelized form of Newton's method, they achieve huge speedups over sequential evaluation. However, their approach inherits cubic computational complexity and numerical instability. We tackle these weaknesses. To reduce the computational complexity, we apply quasi-Newton approximations and show they converge comparably to full-Newton, use less memory, and are faster. To stabilize Newton's method, we leverage a connection between Newton's method damped with trust regions and Kalman smoothing. This connection allows us to stabilize Newton's method, per the trust region, while using efficient parallelized Kalman algorithms to retain performance. We compare these methods empirically, and highlight the use cases where each algorithm excels.

Submitted 26 July, 2024; originally announced July 2024.

Comments: 22 pages, 6 figures

ACM Class: I.2.6

arXiv:2407.13277 [pdf, other]

URCDM: Ultra-Resolution Image Synthesis in Histopathology

Authors: Sarah Cechnicka, James Ball, Matthew Baugh, Hadrien Reynaud, Naomi Simmonds, Andrew P. T. Smith, Catherine Horsfield, Candice Roufosse, Bernhard Kainz

Abstract: Diagnosing medical conditions from histopathology data requires a thorough analysis across the various resolutions of Whole Slide Images (WSI). However, existing generative methods fail to consistently represent the hierarchical structure of WSIs due to a focus on high-fidelity patches. To tackle this, we propose Ultra-Resolution Cascaded Diffusion Models (URCDMs) which are capable of synthesising entire histopathology images at high resolutions whilst authentically capturing the details of both the underlying anatomy and pathology at all magnification levels. We evaluate our method on three separate datasets, consisting of brain, breast and kidney tissue, and surpass existing state-of-the-art multi-resolution models. Furthermore, an expert evaluation study was conducted, demonstrating that URCDMs consistently generate outputs across various resolutions that trained evaluators cannot distinguish from real images. All code and additional examples can be found on GitHub.

Submitted 18 July, 2024; originally announced July 2024.

Comments: arXiv admin note: text overlap with arXiv:2312.01152

arXiv:2407.09902 [pdf, other]

doi 10.1109/TFR.2024.3424748

Air-Ground Collaboration with SPOMP: Semantic Panoramic Online Mapping and Planning

Authors: Ian D. Miller, Fernando Cladera, Trey Smith, Camillo Jose Taylor, Vijay Kumar

Abstract: Mapping and navigation have gone hand-in-hand since long before robots existed. Maps are a key form of communication, allowing someone who has never been somewhere to nonetheless navigate that area successfully. In the context of multi-robot systems, the maps and information that flow between robots are necessary for effective collaboration, whether those robots are operating concurrently, sequentially, or completely asynchronously. In this paper, we argue that maps must go beyond encoding purely geometric or visual information to enable increasingly complex autonomy, particularly between robots. We propose a framework for multi-robot autonomy, focusing in particular on air and ground robots operating in outdoor 2.5D environments. We show that semantic maps can enable the specification, planning, and execution of complex collaborative missions, including localization in GPS-denied settings. A distinguishing characteristic of this work is that we strongly emphasize field experiments and testing, and by doing so demonstrate that these ideas can work at scale in the real world. We also perform extensive simulation experiments to validate our ideas at even larger scales. We believe these experiments and the experimental results constitute a significant step forward toward advancing the state-of-the-art of large-scale, collaborative multi-robot systems operating with real communication, navigation, and perception constraints.

Submitted 13 July, 2024; originally announced July 2024.

Comments: Video: https://www.youtube.com/watch?v=ieNYH40buBo

Journal ref: IEEE Transactions on Field Robotics (2024)

arXiv:2407.07279 [pdf, other]

Towards a theory of learning dynamics in deep state space models

Authors: Jakub Smékal, Jimmy T. H. Smith, Michael Kleinman, Dan Biderman, Scott W. Linderman

Abstract: State space models (SSMs) have shown remarkable empirical performance on many long sequence modeling tasks, but a theoretical understanding of these models is still lacking. In this work, we study the learning dynamics of linear SSMs to understand how covariance structure in data, latent state size, and initialization affect the evolution of parameters throughout learning with gradient descent. We show that focusing on the learning dynamics in the frequency domain affords analytical solutions under mild assumptions, and we establish a link between one-dimensional SSMs and the dynamics of deep linear feed-forward networks. Finally, we analyze how latent state over-parameterization affects convergence time and describe future work in extending our results to the study of deep SSMs with nonlinear connections. This work is a step toward a theory of learning dynamics in deep state space models.

Submitted 9 July, 2024; originally announced July 2024.

arXiv:2405.06147 [pdf, other]

State-Free Inference of State-Space Models: The Transfer Function Approach

Authors: Rom N. Parnichkun, Stefano Massaroli, Alessandro Moro, Jimmy T. H. Smith, Ramin Hasani, Mathias Lechner, Qi An, Christopher Ré, Hajime Asama, Stefano Ermon, Taiji Suzuki, Atsushi Yamashita, Michael Poli

Abstract: We approach designing a state-space model for deep learning applications through its dual representation, the transfer function, and uncover a highly efficient sequence parallel inference algorithm that is state-free: unlike other proposed algorithms, state-free inference does not incur any significant memory or computational cost with an increase in state size. We achieve this using properties of the proposed frequency domain transfer function parametrization, which enables direct computation of its corresponding convolutional kernel's spectrum via a single Fast Fourier Transform. Our experimental results across multiple sequence lengths and state sizes illustrate, on average, a 35% training speed improvement over S4 layers -- parametrized in time-domain -- on the Long Range Arena benchmark, while delivering state-of-the-art downstream performances over other attention-free approaches. Moreover, we report improved perplexity in language modeling over a long convolutional Hyena baseline, by simply introducing our transfer function parametrization. Our code is available at https://github.com/ruke1ire/RTF.

Submitted 1 June, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

Comments: Resubmission 02/06/2024: Fixed minor typo of recurrent form RTF

arXiv:2404.07703 [pdf, ps, other]

Learning Hamiltonian Dynamics with Reproducing Kernel Hilbert Spaces and Random Features

Authors: Torbjørn Smith, Olav Egeland

Abstract: A method for learning Hamiltonian dynamics from a limited and noisy dataset is proposed. The method learns a Hamiltonian vector field on a reproducing kernel Hilbert space (RKHS) of inherently Hamiltonian vector fields, and in particular, odd Hamiltonian vector fields. This is done with a symplectic kernel, and it is shown how the kernel can be modified to an odd symplectic kernel to impose the odd symmetry. A random feature approximation is developed for the proposed kernel to reduce the problem size. This includes random feature approximations for odd kernels. The performance of the method is validated in simulations for three Hamiltonian systems. It is demonstrated that the use of an odd symplectic kernel improves prediction accuracy, and that the learned vector fields are Hamiltonian and exhibit the imposed odd symmetry characteristics.

Submitted 11 April, 2024; originally announced April 2024.

Comments: arXiv admin note: substantial text overlap with arXiv:2312.09734

arXiv:2404.03489 [pdf, other]

Design of Stickbug: a Six-Armed Precision Pollination Robot

Authors: Trevor Smith, Madhav Rijal, Christopher Tatsch, R. Michael Butts, Jared Beard, R. Tyler Cook, Andy Chu, Jason Gross, Yu Gu

Abstract: This work presents the design of Stickbug, a six-armed, multi-agent, precision pollination robot that combines the accuracy of single-agent systems with swarm parallelization in greenhouses. Precision pollination robots have often been proposed to offset the effects of a decreasing population of natural pollinators, but they frequently lack the required parallelization and scalability. Stickbug achieves this by allowing each arm and drive base to act as an individual agent, significantly reducing planning complexity. Stickbug uses a compact holonomic Kiwi drive to navigate narrow greenhouse rows, a tall mast to support multiple manipulators and reach plant heights, a detection model and classifier to identify Bramble flowers, and a felt-tipped end-effector for contact-based pollination. Initial experimental validation demonstrates that Stickbug can attempt over 1.5 pollinations per minute with a 50% success rate. Additionally, a Bramble flower perception dataset was created and is publicly available alongside Stickbug's software and design files.

Submitted 4 April, 2024; originally announced April 2024.

Comments: 7 pages, 7 figures

arXiv:2404.03039 [pdf, ps, other]

Illustrating Finite Automata with Grail+ and TikZ

Authors: Alastair May, Taylor J. Smith

Abstract: In this article, we discuss a new software tool that interacts with Grail+, a library of automata-theoretic command-line utilities. Our software, the Grail+ Visualizer, takes the textual representation of a finite automaton produced by Grail+ and generates TikZ code to illustrate the finite automaton, with automatic layout of states and transitions. In addition to giving an overview of the basics of automata theory and Grail+, we discuss how the Grail+ Visualizer works in detail and suggest avenues for future work.

Submitted 3 April, 2024; originally announced April 2024.

MSC Class: 68-04 (primary); 68Q45 (secondary)

arXiv:2403.08707 [pdf, ps, other]

Improved Randomized Approximation of Hard Universality and Emptiness Problems

Authors: Pantelis Andreou, Stavros Konstantinidis, Taylor J. Smith

Abstract: We build on recent research on polynomial randomized approximation (PRAX) algorithms for the hard problems of NFA universality and NFA equivalence. Loosely speaking, PRAX algorithms use sampling of infinite domains within any desired accuracy $δ$. In the spirit of experimental mathematics, we extend the concept of PRAX algorithms to be applicable to the emptiness and universality problems in any domain whose instances admit a tractable distribution as defined in this paper. A technical result here is that a linear (w.r.t. $1/δ$) number of samples is sufficient, as opposed to the quadratic number of samples in previous papers. We show how the improved and generalized PRAX algorithms apply to universality and emptiness problems in various domains: ordinary automata, tautology testing of propositions, 2D automata, and to solution sets of certain Diophantine equations.

Submitted 13 March, 2024; originally announced March 2024.

MSC Class: 68W25 (primary); 68W20; 68Q45 (secondary)

arXiv:2312.09734 [pdf, ps, other]

Learning of Hamiltonian Dynamics with Reproducing Kernel Hilbert Spaces

Authors: Torbjørn Smith, Olav Egeland

Abstract: This paper presents a method for learning Hamiltonian dynamics from a limited set of data points. The Hamiltonian vector field is found by regularized optimization over a reproducing kernel Hilbert space of vector fields that are inherently Hamiltonian, and where the vector field is required to be odd or even. This is done with a symplectic kernel, and it is shown how this symplectic kernel can be modified to be odd or even. The performance of the method is validated in simulations for two Hamiltonian systems. It is shown that the learned dynamics are Hamiltonian, and that the learned Hamiltonian vector field can be prescribed to be odd or even.

Submitted 15 December, 2023; originally announced December 2023.

arXiv:2312.02396 [pdf, other]

doi 10.2514/6.2024-1960

Unsupervised Change Detection for Space Habitats Using 3D Point Clouds

Authors: Jamie Santos, Holly Dinkel, Julia Di, Paulo V. K. Borges, Marina Moreira, Oleg Alexandrov, Brian Coltin, Trey Smith

Abstract: This work presents an algorithm for scene change detection from point clouds to enable autonomous robotic caretaking in future space habitats. Autonomous robotic systems will help maintain future deep-space habitats, such as the Gateway space station, which will be uncrewed for extended periods. Existing scene analysis software used on the International Space Station (ISS) relies on manually-labeled images for detecting changes. In contrast, the algorithm presented in this work uses raw, unlabeled point clouds as inputs. The algorithm first applies modified Expectation-Maximization Gaussian Mixture Model (GMM) clustering to two input point clouds. It then performs change detection by comparing the GMMs using the Earth Mover's Distance. The algorithm is validated quantitatively and qualitatively using a test dataset collected by an Astrobee robot in the NASA Ames Granite Lab comprising single frame depth images taken directly by Astrobee and full-scene reconstructed maps built with RGB-D and pose data from Astrobee. The runtimes of the approach are also analyzed in depth. The source code is publicly released to promote further development.

Submitted 5 August, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

Comments: 15 pages, 7 figures, Manuscript was presented at the AIAA SciTech Forum in Orlando, FL, USA, 8 - 12 January 2024. Video presentation: [https://www.youtube.com/watch?v=7WHp0dQYG4Y]. Code: [https://github.com/nasa/isaac/tree/master/anomaly/gmm-change-detection]

Report number: AIAA 2024-1960

Journal ref: AIAA SCITECH 2024 Forum

arXiv:2311.14711 [pdf, other]

Towards Publicly Accountable Frontier LLMs: Building an External Scrutiny Ecosystem under the ASPIRE Framework

Authors: Markus Anderljung, Everett Thornton Smith, Joe O'Brien, Lisa Soder, Benjamin Bucknall, Emma Bluemke, Jonas Schuett, Robert Trager, Lacey Strahm, Rumman Chowdhury

Abstract: With the increasing integration of frontier large language models (LLMs) into society and the economy, decisions related to their training, deployment, and use have far-reaching implications. These decisions should not be left solely in the hands of frontier LLM developers. LLM users, civil society and policymakers need trustworthy sources of information to steer such decisions for the better. Involving outside actors in the evaluation of these systems - what we term 'external scrutiny' - via red-teaming, auditing, and external researcher access, offers a solution. Though there are encouraging signs of increasing external scrutiny of frontier LLMs, its success is not assured. In this paper, we survey six requirements for effective external scrutiny of frontier AI systems and organize them under the ASPIRE framework: Access, Searching attitude, Proportionality to the risks, Independence, Resources, and Expertise. We then illustrate how external scrutiny might function throughout the AI lifecycle and offer recommendations to policymakers.

Submitted 15 November, 2023; originally announced November 2023.

Comments: Accepted to Workshop on Socially Responsible Language Modelling Research (SoLaR) at the 2023 Conference on Neural Information Processing Systems (NeurIPS 2023)

ACM Class: I.2.0

arXiv:2311.02558 [pdf, other]

doi 10.1016/j.actaastro.2024.06.037

Multi-Agent 3D Map Reconstruction and Change Detection in Microgravity with Free-Flying Robots

Authors: Holly Dinkel, Julia Di, Jamie Santos, Keenan Albee, Paulo Borges, Marina Moreira, Oleg Alexandrov, Brian Coltin, Trey Smith

Abstract: Assistive free-flyer robots autonomously caring for future crewed outposts -- such as NASA's Astrobee robots on the International Space Station (ISS) -- must be able to detect day-to-day interior changes to track inventory, detect and diagnose faults, and monitor the outpost status. This work presents a framework for multi-agent cooperative mapping and change detection to enable robotic maintenance of space outposts. One agent is used to reconstruct a 3D model of the environment from sequences of images and corresponding depth information. Another agent is used to periodically scan the environment for inconsistencies against the 3D model. Change detection is validated after completing the surveys using real image and pose data collected by Astrobee robots in a ground testing environment and from microgravity aboard the ISS. This work outlines the objectives, requirements, and algorithmic modules for the multi-agent reconstruction system, including recommendations for its use by assistive free-flyers aboard future microgravity outposts. *Denotes Equal Contribution

Submitted 6 August, 2024; v1 submitted 4 November, 2023; originally announced November 2023.

Comments: 11 pages, 8 figures, Manuscript presented at the 74th International Astronautical Congress, IAC 2023, Baku, Azerbaijan, 2 - 6 October 2023. Video presentation: [https://www.youtube.com/watch?v=VfjV-zwFEtU]. Code: [https://github.com/hollydinkel/astrobeecd]

Journal ref: Acta Astronautica 223 (2024) 98-107

arXiv:2310.19694 [pdf, other]

Convolutional State Space Models for Long-Range Spatiotemporal Modeling

Authors: Jimmy T. H. Smith, Shalini De Mello, Jan Kautz, Scott W. Linderman, Wonmin Byeon

Abstract: Effectively modeling long spatiotemporal sequences is challenging due to the need to model complex spatial correlations and long-range temporal dependencies simultaneously. ConvLSTMs attempt to address this by updating tensor-valued states with recurrent neural networks, but their sequential computation makes them slow to train. In contrast, Transformers can process an entire spatiotemporal sequence, compressed into tokens, in parallel. However, the cost of attention scales quadratically in length, limiting their scalability to longer sequences. Here, we address the challenges of prior methods and introduce convolutional state space models (ConvSSM) that combine the tensor modeling ideas of ConvLSTM with the long sequence modeling approaches of state space methods such as S4 and S5. First, we demonstrate how parallel scans can be applied to convolutional recurrences to achieve subquadratic parallelization and fast autoregressive generation. We then establish an equivalence between the dynamics of ConvSSMs and SSMs, which motivates parameterization and initialization strategies for modeling long-range dependencies. The result is ConvS5, an efficient ConvSSM variant for long-range spatiotemporal modeling. ConvS5 significantly outperforms Transformers and ConvLSTM on a long horizon Moving-MNIST experiment while training 3X faster than ConvLSTM and generating samples 400X faster than Transformers. In addition, ConvS5 matches or exceeds the performance of state-of-the-art methods on challenging DMLab, Minecraft and Habitat prediction benchmarks and enables new directions for modeling long spatiotemporal sequences.

Submitted 30 October, 2023; originally announced October 2023.

arXiv:2310.07322 [pdf]

A webcam-based machine learning approach for three-dimensional range of motion evaluation

Authors: Xiaoye Michael Wang, Derek T. Smith, Qin Zhu

Abstract: Background. Joint range of motion (ROM) is an important quantitative measure for physical therapy. Commonly relying on a goniometer, accurate and reliable ROM measurement requires extensive training and practice. This, in turn, imposes a significant barrier for those who have limited in-person access to healthcare. Objective. The current study presents and evaluates an alternative machine learning-based ROM evaluation method that could be remotely accessed via a webcam. Methods. To evaluate its reliability, the ROM measurements for a diverse set of joints (neck, spine, and upper and lower extremities) derived using this method were compared to those obtained from a marker-based optical motion capture system. Results. Data collected from 25 healthy adults demonstrated that the webcam solution exhibited high test-retest reliability, with substantial to almost perfect intraclass correlation coefficients for most joints. Compared with the marker-based system, the webcam-based system demonstrated substantial to almost perfect inter-rater reliability for some joints, and lower inter-rater reliability for other joints (e.g., shoulder flexion and elbow flexion), which could be attributed to the reduced sensitivity to joint locations at the apex of the movement. Conclusions. The proposed webcam-based method exhibited high test-retest and inter-rater reliability, making it a versatile alternative for existing ROM evaluation methods in clinical practice and the tele-implementation of physical therapy and rehabilitation.

Submitted 11 October, 2023; originally announced October 2023.

arXiv:2310.02183 [pdf, other]

Puddles: Application-Independent Recovery and Location-Independent Data for Persistent Memory

Authors: Suyash Mahar, Mingyao Shen, TJ Smith, Joseph Izraelevitz, Steven Swanson

Abstract: In this paper, we argue that current work has failed to provide a comprehensive and maintainable in-memory representation for persistent memory. PM data should be easily mappable into a process address space, shareable across processes, shippable between machines, consistent after a crash, and accessible to legacy code with fast, efficient pointers as first-class abstractions. While existing systems have provided niceties like mmap()-based load/store access, they have not been able to support all these necessary properties due to conflicting requirements. We propose Puddles, a new persistent memory abstraction, to solve these problems. Puddles provide application-independent recovery after a power outage; they make recovery from a system failure a system-level property of the stored data rather than the responsibility of the programs that access it. Puddles use native pointers, so they are compatible with existing code. Finally, Puddles implement support for sharing and shipping of PM data between processes and systems without expensive serialization and deserialization. Compared to existing systems, Puddles are at least as fast as and up to 1.34$\times$ faster than PMDK while being competitive with other PM libraries across YCSB workloads. Moreover, to demonstrate Puddles' ability to relocate data, we showcase a sensor network data-aggregation workload that results in a 4.7$\times$ speedup over PMDK.

Submitted 3 October, 2023; originally announced October 2023.

Comments: To appear in EuroSys 2024

arXiv:2310.00206 [pdf, other]

An Investigation of Multi-feature Extraction and Super-resolution with Fast Microphone Arrays

Authors: Eric T. Chang, Runsheng Wang, Peter Ballentine, Jingxi Xu, Trey Smith, Brian Coltin, Ioannis Kymissis, Matei Ciocarlie

Abstract: In this work, we use MEMS microphones as vibration sensors to simultaneously classify texture and estimate contact position and velocity. Vibration sensors are an important facet of both human and robotic tactile sensing, providing fast detection of contact and onset of slip. Microphones are an attractive option for implementing vibration sensing as they offer a fast response and can be sampled quickly, are affordable, and occupy a very small footprint. Our prototype sensor uses only a sparse array (8-9 mm spacing) of distributed MEMS microphones (<$1, 3.76 x 2.95 x 1.10 mm) embedded under an elastomer. We use transformer-based architectures for data analysis, taking advantage of the microphones' high sampling rate to run our models on time-series data as opposed to individual snapshots. This approach allows us to obtain 77.3% average accuracy on 4-class texture classification (84.2% when excluding the slowest drag velocity), 1.8 mm mean error on contact localization, and 5.6 mm/s mean error on contact velocity. We show that the learned texture and localization models are robust to varying velocity and generalize to unseen velocities. We also report that our sensor provides fast contact detection, an important advantage of fast transducers. This investigation illustrates the capabilities one can achieve with a MEMS microphone array alone, leaving valuable sensor real estate available for integration with complementary tactile sensing modalities.

Submitted 7 March, 2024; v1 submitted 29 September, 2023; originally announced October 2023.

Comments: 6 pages, 4 figures, accepted to 2024 IEEE International Conference on Robotics and Automation (ICRA)

arXiv:2308.13135 [pdf, other]

Nonparametric Additive Value Functions: Interpretable Reinforcement Learning with an Application to Surgical Recovery

Authors: Patrick Emedom-Nnamdi, Timothy R. Smith, Jukka-Pekka Onnela, Junwei Lu

Abstract: We propose a nonparametric additive model for estimating interpretable value functions in reinforcement learning. Learning effective adaptive clinical interventions that rely on digital phenotyping features is a major concern for medical practitioners. With respect to spine surgery, different post-operative recovery recommendations concerning patient mobilization can lead to significant variation in patient recovery. While reinforcement learning has achieved widespread success in domains such as games, recent methods heavily rely on black-box methods, such as neural networks. Unfortunately, these methods hinder the ability to examine the contribution each feature makes in producing the final suggested decision. While such interpretations are easily provided in classical algorithms such as Least Squares Policy Iteration, basic linearity assumptions prevent learning higher-order flexible interactions between features. In this paper, we present a novel method that offers a flexible technique for estimating action-value functions without making explicit parametric assumptions regarding their additive functional form. This nonparametric estimation strategy relies on incorporating local kernel regression and basis expansion to obtain a sparse, additive representation of the action-value function. Under this approach, we are able to locally approximate the action-value function and retrieve the nonlinear, independent contribution of select features as well as joint feature pairs. We validate the proposed approach with a simulation study, and, in an application to spine disease, uncover recovery recommendations that are in line with related clinical knowledge.

Submitted 24 August, 2023; originally announced August 2023.

Comments: 28 pages, 13 figures

arXiv:2306.12629 [pdf, other]

Swarm of One: Bottom-up Emergence of Stable Robot Bodies from Identical Cells

Authors: Trevor Smith, R. Michael Butts, Nathan Adkins, Yu Gu

Abstract: Unlike most human-engineered systems, biological systems are emergent from low-level interactions, allowing much broader diversity and superior adaptation to complex environments. Inspired by the process of morphogenesis in nature, a bottom-up design approach for robot morphology is proposed to treat a robot's body as an emergent response to underlying processes rather than a predefined shape. This paper presents Loopy, a "Swarm-of-One" polymorphic robot testbed that can be viewed simultaneously as a robotic swarm and a single robot. Loopy's shape is determined jointly by self-organization and morphological computing using physically linked homogeneous cells. Experimental results show that Loopy can form symmetric shapes consisting of lobes. Using the same set of parameters, even small amounts of initial noise can change the number of lobes formed. However, once in a stable configuration, Loopy has an "inertia" to transfiguring in response to dynamic parameters. By making the connections among self-organization, morphological computing, and robot design, this paper lays the foundation for more adaptable robot designs in the future.

Submitted 21 June, 2023; originally announced June 2023.

Comments: 6 pages, 6 figures, IROS 2023

arXiv:2306.05810 [pdf, other]

Explaining Reinforcement Learning with Shapley Values

Authors: Daniel Beechey, Thomas M. S. Smith, Özgür Şimşek

Abstract: For reinforcement learning systems to be widely adopted, their users must understand and trust them. We present a theoretical analysis of explaining reinforcement learning using Shapley values, following a principled approach from game theory for identifying the contribution of individual players to the outcome of a cooperative game. We call this general framework Shapley Values for Explaining Reinforcement Learning (SVERL). Our analysis exposes the limitations of earlier uses of Shapley values in reinforcement learning. We then develop an approach that uses Shapley values to explain agent performance. In a variety of domains, SVERL produces meaningful explanations that match and supplement human intuition.

Submitted 9 June, 2023; originally announced June 2023.

Comments: 12 pages, 9 figures. Accepted at ICML 2023

arXiv:2305.00100 [pdf, other]

Temporal Subsampling Diminishes Small Spatial Scales in Recurrent Neural Network Emulators of Geophysical Turbulence

Authors: Timothy A. Smith, Stephen G. Penny, Jason A. Platt, Tse-Chun Chen

Abstract: The immense computational cost of traditional numerical weather and climate models has sparked the development of machine learning (ML) based emulators. Because ML methods benefit from long records of training data, it is common to use datasets that are temporally subsampled relative to the time steps required for the numerical integration of differential equations. Here, we investigate how this often overlooked processing step affects the quality of an emulator's predictions. We implement two ML architectures from a class of methods called reservoir computing: (1) a form of Nonlinear Vector Autoregression (NVAR), and (2) an Echo State Network (ESN). Despite their simplicity, it is well documented that these architectures excel at predicting low dimensional chaotic dynamics. We are therefore motivated to test these architectures in an idealized setting of predicting high dimensional geophysical turbulence as represented by Surface Quasi-Geostrophic dynamics. In all cases, subsampling the training data consistently leads to an increased bias at small spatial scales that resembles numerical diffusion. Interestingly, the NVAR architecture becomes unstable when the temporal resolution is increased, indicating that the polynomial based interactions are insufficient at capturing the detailed nonlinearities of the turbulent flow. The ESN architecture is found to be more robust, suggesting a benefit to the more expensive but more general structure. Spectral errors are reduced by including a penalty on the kinetic energy density spectrum during training, although the subsampling related errors persist. Future work is warranted to understand how the temporal resolution of training data affects other ML architectures.

Submitted 21 September, 2023; v1 submitted 28 April, 2023; originally announced May 2023.

arXiv:2304.12865 [pdf, other]

Constraining Chaos: Enforcing dynamical invariants in the training of recurrent neural networks

Authors: Jason A. Platt, Stephen G. Penny, Timothy A. Smith, Tse-Chun Chen, Henry D. I. Abarbanel

Abstract: Drawing on ergodic theory, we introduce a novel training method for machine learning based forecasting methods for chaotic dynamical systems. The training enforces dynamical invariants--such as the Lyapunov exponent spectrum and fractal dimension--in the systems of interest, enabling longer and more stable forecasts when operating with limited data. The technique is demonstrated in detail using the recurrent neural network architecture of reservoir computing. Results are given for the Lorenz 1996 chaotic dynamical system and a spectral quasi-geostrophic model, both typical test cases for numerical weather prediction.

Submitted 23 April, 2023; originally announced April 2023.

arXiv:2301.03708 [pdf, ps, other]

Descriptional Complexity of Finite Automata -- Selected Highlights

Authors: Arto Salomaa, Kai Salomaa, Taylor J. Smith

Abstract: The state complexity, respectively, nondeterministic state complexity of a regular language $L$ is the number of states of the minimal deterministic, respectively, of a minimal nondeterministic finite automaton for $L$. Some of the most studied state complexity questions deal with size comparisons of nondeterministic finite automata of differing degree of ambiguity. More generally, if for a regular language we compare the size of description by a finite automaton and by a more powerful language definition mechanism, such as a context-free grammar, we encounter non-recursive trade-offs. Operational state complexity studies the state complexity of the language resulting from a regularity preserving operation as a function of the complexity of the argument languages. Determining the state complexity of combined operations is generally challenging and for general combinations of operations that include intersection and marked concatenation it is uncomputable.

Submitted 4 July, 2024; v1 submitted 9 January, 2023; originally announced January 2023.

arXiv:2212.06701 [pdf]

A Novel Approach For Generating Customizable Light Field Datasets for Machine Learning

Authors: Julia Huang, Toure Smith, Aloukika Patro, Vidhi Chhabra

Abstract: To train deep learning models, which often outperform traditional approaches, large datasets of a specified medium, e.g., images, are used in numerous areas. However, for light field-specific machine learning tasks, there is a lack of such available datasets. Therefore, we create our own light field datasets, which have great potential for a variety of applications due to the abundance of information in light fields compared to singular images. Using the Unity and C# frameworks, we develop a novel approach for generating large, scalable, and reproducible light field datasets based on customizable hardware configurations to accelerate light field deep learning research.

Submitted 13 December, 2022; originally announced December 2022.

Comments: 5 pages, 5 figures, accepted to and presented at MIT URTC Conference, and will be published in IEEE proceedings

ACM Class: I.3.6

arXiv:2212.06566 [pdf, other]

doi 10.1029/2023WR035803

How to select an objective function using information theory

Authors: Timothy O. Hodson, Thomas M. Over, Tyler J. Smith, Lucy M. Marshall

Abstract: In machine learning or scientific computing, model performance is measured with an objective function. But why choose one objective over another? Information theory gives one answer: To maximize the information in the model, select the objective function that represents the error in the fewest bits. To evaluate different objectives, transform them into likelihood functions. As likelihoods, their relative magnitude represents how strongly we should prefer one objective versus another, and the log of that relation represents the difference in their bit-length, as well as the difference in their uncertainty. In other words, prefer whichever objective minimizes the uncertainty. Under the information-theoretic paradigm, the ultimate objective is to maximize information (and minimize uncertainty), as opposed to any specific utility. We argue that this paradigm is well-suited to models that have many uses and no definite utility, like the large Earth system models used to understand the effects of climate change.

Submitted 3 June, 2024; v1 submitted 9 December, 2022; originally announced December 2022.

Comments: 17 pages, 3 figures, 1 table

Journal ref: Water Resources Research, 60, e2023WR035803 (2024)

arXiv:2209.09313 [pdf, ps, other]

Natural Wave Numbers, Natural Wave Co-numbers, and the Computation of the Primes

Authors: Terence R. Smith

Abstract: The paper exploits an isomorphism between the natural numbers N and a space U of periodic sequences of the roots of unity in constructing a recursive procedure for representing and computing the prime numbers. The nth wave number ${\bf u}_n$ is the countable sequence of the nth roots of unity having frequencies k/n for all integer phases k. The space U is closed under a commutative and associative binary operation ${\bf u}_m \odot{\bf u}_n={\bf u}_{mn}$, termed the circular product, and is isomorphic with N under their respective product operators. Functions are defined on U that partition wave numbers into two complementary sequences, of which the co-number $ {\overset {\bf \ast }{ \bf u}}_n$ is a function of a wave number in which zeros replace its positive roots of unity. The recursive procedure $ {\overset {\bf \ast }{ \bf U}}_{N+1}= {\overset {\bf \ast }{ \bf U}}_{N}\odot{\overset {\bf \ast }{\bf u}}_{N+1}$ represents prime numbers explicitly in terms of preceding prime numbers, starting with $p_1=2$, and is shown never to terminate. If ${p}_1, ... , { p}_{N+1}$ are the first $N+1$ prime phases, then the phases in the range $p_{N+1} \leq k < p^2_{N+1}$ that are associated with the non-zero terms of $ {\overset {\bf \ast }{\bf U}}_{N}$ are, together with $ p_1, ...,p_N$, all of the prime phases less than $p^2_{N+1}$. When applied with all of the primes identified at the previous step, the recursive procedure identifies approximately $7^{2(N-1)}/(2(N-1)ln7)$ primes at each iteration for $ N>1$. When the phases of wave numbers are represented in modular arithmetic, the prime phases are representable in terms of sums of reciprocals of the initial set of prime phases and have a relation with the zeta-function.

Submitted 19 September, 2022; originally announced September 2022.

Comments: 16 pages

arXiv:2208.04933 [pdf, other]

Simplified State Space Layers for Sequence Modeling

Authors: Jimmy T. H. Smith, Andrew Warrington, Scott W. Linderman

Abstract: Models using structured state space sequence (S4) layers have achieved state-of-the-art performance on long-range sequence modeling tasks. An S4 layer combines linear state space models (SSMs), the HiPPO framework, and deep learning to achieve high performance. We build on the design of the S4 layer and introduce a new state space layer, the S5 layer. Whereas an S4 layer uses many independent single-input, single-output SSMs, the S5 layer uses one multi-input, multi-output SSM. We establish a connection between S5 and S4, and use this to develop the initialization and parameterization used by the S5 model. The result is a state space layer that can leverage efficient and widely implemented parallel scans, allowing S5 to match the computational efficiency of S4, while also achieving state-of-the-art performance on several long-range sequence modeling tasks. S5 averages 87.4% on the long range arena benchmark, and 98.5% on the most difficult Path-X task.

Submitted 3 March, 2023; v1 submitted 9 August, 2022; originally announced August 2022.

arXiv:2207.10016 [pdf, ps, other]

Two-Dimensional Typewriter Automata

Authors: Taylor J. Smith

Abstract: A typewriter automaton is a special variant of a two-dimensional automaton that receives two-dimensional words as input and is only capable of moving its input head through its input word in three directions: downward, leftward, and rightward. In addition, downward and leftward moves may only be made via a special "reset" move that simulates the action of a typewriter's carriage return. In this paper, we initiate the study of the typewriter automaton model and relate it to similar models, including three-way two-dimensional automata, boustrophedon automata, and returning automata. We study the recognition powers of the typewriter automaton model, establish closure properties of the class of languages recognized by the model, and consider operational state complexity bounds for the specific operation of row concatenation. We also provide a variety of potential future research directions pertaining to the model.

Submitted 20 July, 2022; originally announced July 2022.

MSC Class: 68Q45 (primary); 68Q15; 68Q19 (secondary)

arXiv:2206.14289 [pdf, other]

Stronger Together: Air-Ground Robotic Collaboration Using Semantics

Authors: Ian D. Miller, Fernando Cladera, Trey Smith, Camillo Jose Taylor, Vijay Kumar

Abstract: In this work, we present an end-to-end heterogeneous multi-robot system framework where ground robots are able to localize, plan, and navigate in a semantic map created in real time by a high-altitude quadrotor. The ground robots choose and deconflict their targets independently, without any external intervention. Moreover, they perform cross-view localization by matching their local maps with the overhead map using semantics. The communication backbone is opportunistic and distributed, allowing the entire system to operate with no external infrastructure aside from GPS for the quadrotor. We extensively tested our system by performing different missions on top of our framework over multiple experiments in different environments. Our ground robots travelled over 6 km autonomously with minimal intervention in the real world and over 96 km in simulation without interventions.

Submitted 28 June, 2022; originally announced June 2022.

Comments: Submitted to RA-L and IROS

arXiv:2206.08972 [pdf, other]

Shallow and Deep Nonparametric Convolutions for Gaussian Processes

Authors: Thomas M. McDonald, Magnus Ross, Michael T. Smith, Mauricio A. Álvarez

Abstract: A key challenge in the practical application of Gaussian processes (GPs) is selecting a proper covariance function. The moving average, or process convolutions, construction of GPs allows some additional flexibility, but still requires choosing a proper smoothing kernel, which is non-trivial. Previous approaches have built covariance functions by using GP priors over the smoothing kernel, and by extension the covariance, as a way to bypass the need to specify it in advance. However, such models have been limited in several ways: they are restricted to single dimensional inputs, e.g. time; they only allow modelling of single outputs and they do not scale to large datasets since inference is not straightforward. In this paper, we introduce a nonparametric process convolution formulation for GPs that alleviates these weaknesses by using a functional sampling approach based on Matheron's rule to perform fast sampling using interdomain inducing variables. Furthermore, we propose a composition of these nonparametric convolutions that serves as an alternative to classic deep GP models, and allows the covariance functions of the intermediate layers to be inferred from the data. We test the performance of our model on benchmarks for single output GPs, multiple output GPs and deep GPs and find that our approach can provide improvements over standard GP models, particularly for larger datasets.

Submitted 18 October, 2022; v1 submitted 17 June, 2022; originally announced June 2022.

Comments: 19 pages, 7 figures. NP-DGP results and discussion updated

arXiv:2205.01988 [pdf, other]

Modelling calibration uncertainty in networks of environmental sensors

Authors: Michael Thomas Smith, Magnus Ross, Joel Ssematimba, Pablo A. Alvarado, Mauricio Alvarez, Engineer Bainomugisha, Richard Wilkinson

Abstract: Networks of low-cost sensors are becoming ubiquitous, but often suffer from poor accuracies and drift. Regular colocation with reference sensors allows recalibration but is complicated and expensive. Alternatively, the calibration can be transferred using low-cost, mobile sensors. However, inferring the calibration (with uncertainty) becomes difficult. We propose a variational approach to model the calibration across the network. We demonstrate the approach on synthetic and real air pollution data, and find it can perform better than the state of the art (multi-hop calibration). We extend it to categorical data produced by citizen-scientist labelling. In summary, the method achieves uncertainty-quantified calibration, which has been one of the barriers to low-cost sensor deployment and citizen-science research.

Submitted 9 May, 2022; v1 submitted 4 May, 2022; originally announced May 2022.

Comments: 31 pages (23 pages of content, 4 pages of references, 4 supplementary). 11 figures. 4 tables. Submitted to Journal of the Royal Statistical Society. Series C

MSC Class: 60G15

arXiv:2203.13595 [pdf, other]

doi 10.1109/ICIP42928.2021.9506584

Fast Hybrid Image Retargeting

Authors: Daniel Valdez-Balderas, Oleg Muraveynyk, Timothy Smith

Abstract: Image retargeting changes the aspect ratio of images while aiming to preserve content and minimise noticeable distortion. Fast and high-quality methods are particularly relevant at present, due to the large variety of image and display aspect ratios. We propose a retargeting method that quantifies and limits warping distortions with the use of content-aware cropping. The pipeline of the proposed approach consists of the following steps. First, an importance map of a source image is generated using deep semantic segmentation and saliency detection models. Then, a preliminary warping mesh is computed using axis aligned deformations, enhanced with the use of a distortion measure to ensure low warping deformations. Finally, the retargeted image is produced using a content-aware cropping algorithm. In order to evaluate our method, we perform a user study based on the RetargetMe benchmark. Experimental analyses show that our method outperforms recent approaches, while running in a fraction of their execution time.

Submitted 25 March, 2022; originally announced March 2022.

Comments: 5 pages

ACM Class: I.2.10; I.4.0; I.5.4

Journal ref: 2021 IEEE International Conference on Image Processing (ICIP), 2021, pp. 1849-1853

arXiv:2203.08925 [pdf, other]

doi 10.1109/LRA.2021.3061332

Any Way You Look At It: Semantic Crossview Localization and Mapping with LiDAR

Authors: Ian D. Miller, Anthony Cowley, Ravi Konkimalla, Shreyas S. Shivakumar, Ty Nguyen, Trey Smith, Camillo Jose Taylor, Vijay Kumar

Abstract: Currently, GPS is by far the most popular global localization method. However, it is not always reliable or accurate in all environments. SLAM methods enable local state estimation but provide no means of registering the local map to a global one, which can be important for inter-robot collaboration or human interaction. In this work, we present a real-time method for utilizing semantics to globally localize a robot using only egocentric 3D semantically labelled LiDAR and IMU as well as top-down RGB images obtained from satellites or aerial robots. Additionally, as it runs, our method builds a globally registered, semantic map of the environment. We validate our method on KITTI as well as our own challenging datasets, and show better than 10 meter accuracy, a high degree of robustness, and the ability to estimate the scale of a top-down map on the fly if it is initially unknown.

Submitted 16 March, 2022; originally announced March 2022.

Comments: Published in the IEEE Robotics and Automation Letters and presented at the IEEE 2021 International Conference on Robotics and Automation. See https://www.youtube.com/watch?v=_qwAoYK9iGU for accompanying video

Journal ref: in IEEE Robotics and Automation Letters, vol. 6, no. 2, pp. 2397-2404, April 2021

arXiv:2201.08910 [pdf, other]

A Systematic Exploration of Reservoir Computing for Forecasting Complex Spatiotemporal Dynamics

Authors: Jason A. Platt, Stephen G. Penny, Timothy A. Smith, Tse-Chun Chen, Henry D. I. Abarbanel

Abstract: A reservoir computer (RC) is a type of simplified recurrent neural network architecture that has demonstrated success in the prediction of spatiotemporally chaotic dynamical systems. A further advantage of RC is that it reproduces intrinsic dynamical quantities essential for its incorporation into numerical forecasting routines such as the ensemble Kalman filter -- used in numerical weather prediction to compensate for sparse and noisy data. We explore here the architecture and design choices for a "best in class" RC for a number of characteristic dynamical systems, and then show the application of these choices in scaling up to larger models using localization. Our analysis points to the importance of large scale parameter optimization. We also note in particular the importance of including input bias in the RC design, which has a significant impact on the forecast skill of the trained RC model. In our tests, the use of a nonlinear readout operator does not affect the forecast time or the stability of the forecast. The effects of the reservoir dimension, spinup time, amount of training data, normalization, noise, and the RC time step are also investigated. While we are not aware of a generally accepted best reported mean forecast time for different models in the literature, we report over a factor of 2 increase in the mean forecast time compared to the best performing RC model of Vlachas et al. (2020) for the 40 dimensional spatiotemporally chaotic Lorenz 1996 dynamics, and we are able to accomplish this using a smaller reservoir size.

Submitted 21 January, 2022; originally announced January 2022.

arXiv:2201.05985 [pdf, other]

Exposing the Obscured Influence of State-Controlled Media: A Causal Estimation of Influence Between Media Outlets Via Quotation Propagation

Authors: Joseph Schlessinger, Richard Bennet, Jacob Coakwell, Steven T. Smith, Edward K. Kao

Abstract: This study quantifies influence between media outlets by applying a novel methodology that uses causal effect estimation on networks and transformer language models. We demonstrate the obscured influence of state-controlled outlets over other outlets, regardless of orientation, by analyzing a large dataset of quotations from over 100 thousand articles published by the most prominent European and Russian traditional media outlets, appearing between May 2018 and October 2019. The analysis maps out the network structure of influence with news wire services serving as prominent bridges that connect outlets in different geo-political spheres. Overall, this approach demonstrates capabilities to identify and quantify the channels of influence in intermedia agenda setting over specific topics.

Submitted 16 January, 2022; originally announced January 2022.

arXiv:2201.05193 [pdf, other]

`Next Generation' Reservoir Computing: an Empirical Data-Driven Expression of Dynamical Equations in Time-Stepping Form

Authors: Tse-Chun Chen, Stephen G. Penny, Timothy A. Smith, Jason A. Platt

Abstract: Next generation reservoir computing based on nonlinear vector autoregression (NVAR) is applied to emulate simple dynamical system models and compared to numerical integration schemes such as Euler and the $2^\text{nd}$ order Runge-Kutta. It is shown that the NVAR emulator can be interpreted as a data-driven method used to recover the numerical integration scheme that produced the data. It is also shown that the approach can be extended to produce high-order numerical schemes directly from data. The impacts of the presence of noise and temporal sparsity in the training set are further examined to gauge the potential use of this method for more realistic applications.

Submitted 13 January, 2022; originally announced January 2022.

Comments: 12 pages, 6 figures

arXiv:2112.01718 [pdf, other]

Improving Predictions of Tail-end Labels using Concatenated BioMed-Transformers for Long Medical Documents

Authors: Vithya Yogarajan, Bernhard Pfahringer, Tony Smith, Jacob Montiel

Abstract: Multi-label learning predicts a subset of labels from a given label set for an unseen instance while considering label correlations. A known challenge with multi-label classification is the long-tailed distribution of labels. Many studies focus on improving the overall predictions of the model and thus do not prioritise tail-end labels. Improving the tail-end label predictions in multi-label classifications of medical text enables the potential to understand patients better and improve care. The knowledge gained by one or more infrequent labels can impact the cause of medical decisions and treatment plans. This research presents variations of concatenated domain-specific language models, including multi-BioMed-Transformers, to achieve two primary goals. First, to improve F1 scores of infrequent labels across multi-label problems, especially with long-tail labels; second, to handle long medical text and multi-sourced electronic health records (EHRs), a challenging task for standard transformers designed to work on short input sequences. A vital contribution of this research is new state-of-the-art (SOTA) results obtained using TransformerXL for predicting medical codes. A variety of experiments are performed on the Medical Information Mart for Intensive Care (MIMIC-III) database. Results show that concatenated BioMed-Transformers outperform standard transformers in terms of overall micro and macro F1 scores and individual F1 scores of tail-end labels, while incurring lower training times than existing transformer-based solutions for long input sequences.

Submitted 3 December, 2021; originally announced December 2021.

arXiv:2111.01256 [pdf, other]

Reverse engineering recurrent neural networks with Jacobian switching linear dynamical systems

Authors: Jimmy T. H. Smith, Scott W. Linderman, David Sussillo

Abstract: Recurrent neural networks (RNNs) are powerful models for processing time-series data, but it remains challenging to understand how they function. Improving this understanding is of substantial interest to both the machine learning and neuroscience communities. The framework of reverse engineering a trained RNN by linearizing around its fixed points has provided insight, but the approach has significant challenges. These include difficulty choosing which fixed point to expand around when studying RNN dynamics and error accumulation when reconstructing the nonlinear dynamics with the linearized dynamics. We present a new model that overcomes these limitations by co-training an RNN with a novel switching linear dynamical system (SLDS) formulation. A first-order Taylor series expansion of the co-trained RNN and an auxiliary function trained to pick out the RNN's fixed points govern the SLDS dynamics. The results are a trained SLDS variant that closely approximates the RNN, an auxiliary function that can produce a fixed point for each point in state-space, and a trained nonlinear RNN whose dynamics have been regularized such that its first-order terms perform the computation, if possible. This model removes the post-training fixed point optimization and allows us to unambiguously study the learned dynamics of the SLDS at any point in state-space. It also generalizes SLDS models to continuous manifolds of switching points while sharing parameters across switches. We validate the utility of the model on two synthetic tasks relevant to previous work reverse engineering RNNs. We then show that our model can be used as a drop-in in more complex architectures, such as LFADS, and apply this LFADS hybrid to analyze single-trial spiking activity from the motor system of a non-human primate.

Submitted 1 November, 2021; originally announced November 2021.

Comments: 23 pages, 9 figures

arXiv:2110.00183 [pdf, other]

doi 10.1007/978-3-030-97546-3_27

Predicting COVID-19 Patient Shielding: A Comprehensive Study

Authors: Vithya Yogarajan, Jacob Montiel, Tony Smith, Bernhard Pfahringer

Abstract: There are many ways machine learning and big data analytics are used in the fight against the COVID-19 pandemic, including predictions, risk management, diagnostics, and prevention. This study focuses on predicting COVID-19 patient shielding -- identifying and protecting patients who are clinically extremely vulnerable from coronavirus. This study focuses on techniques used for the multi-label classification of medical text. Using the information published by the United Kingdom NHS and the World Health Organisation, we present a novel approach to predicting COVID-19 patient shielding as a multi-label classification problem. We use publicly available, de-identified ICU medical text data for our experiments. The labels are derived from the published COVID-19 patient shielding data. We present an extensive comparison across 12 multi-label classifiers from the simple binary relevance to neural networks and the most recent transformers. To the best of our knowledge this is the first comprehensive study, where such a range of multi-label classifiers for medical text are considered. We highlight the benefits of various approaches, and argue that, for the task at hand, both predictive accuracy and processing time are essential.

Submitted 30 September, 2021; originally announced October 2021.

Comments: Accepted in AJCAI 2021

Journal ref: The 2021 Australasian Joint Conference on Artificial Intelligence (AJCAI 2021)

arXiv:2109.12269 [pdf, other]

doi 10.1029/2021MS002843

Integrating Recurrent Neural Networks with Data Assimilation for Scalable Data-Driven State Estimation

Authors: Stephen G. Penny, Timothy A. Smith, Tse-Chun Chen, Jason A. Platt, Hsin-Yi Lin, Michael Goodliff, Henry D. I. Abarbanel

Abstract: Data assimilation (DA) is integrated with machine learning in order to perform entirely data-driven online state estimation. To achieve this, recurrent neural networks (RNNs) are implemented as surrogate models to replace key components of the DA cycle in numerical weather prediction (NWP), including the conventional numerical forecast model, the forecast error covariance matrix, and the tangent linear and adjoint models. It is shown how these RNNs can be initialized using DA methods to directly update the hidden/reservoir state with observations of the target system. The results indicate that these techniques can be applied to estimate the state of a system for the repeated initialization of short-term forecasts, even in the absence of a traditional numerical forecast model. Further, it is demonstrated how these integrated RNN-DA methods can scale to higher dimensions by applying domain localization and parallelization, providing a path for practical applications in NWP.

Submitted 24 September, 2021; originally announced September 2021.

Comments: 22 pages, 16 figures

arXiv:2108.02955 [pdf, other]

Impressions of the GDMC AI Settlement Generation Challenge in Minecraft

Authors: Christoph Salge, Claus Aranha, Adrian Brightmoore, Sean Butler, Rodrigo Canaan, Michael Cook, Michael Cerny Green, Hagen Fischer, Christian Guckelsberger, Jupiter Hadley, Jean-Baptiste Hervé, Mark R Johnson, Quinn Kybartas, David Mason, Mike Preuss, Tristan Smith, Ruck Thawonmas, Julian Togelius

Abstract: The GDMC AI settlement generation challenge is a PCG competition about producing an algorithm that can create an "interesting" Minecraft settlement for a given map. This paper contains a collection of written experiences with this competition, by participants, judges, organizers and advisors. We asked people to reflect both on the artifacts themselves, and on the competition in general. The aim of this paper is to offer a shareable and edited collection of experiences and qualitative feedback - which seem to contain a lot of insights on PCG and computational creativity, but would otherwise be lost once the output of the competition is reduced to scalar performance values. We reflect upon some organizational issues for AI competitions, and discuss the future of the GDMC competition.

Submitted 6 August, 2021; originally announced August 2021.

Comments: 28 pages, 5 figures

arXiv:2106.05582 [pdf, other]

Learning Nonparametric Volterra Kernels with Gaussian Processes

Authors: Magnus Ross, Michael T. Smith, Mauricio A. Álvarez

Abstract: This paper introduces a method for the nonparametric Bayesian learning of nonlinear operators, through the use of the Volterra series with kernels represented using Gaussian processes (GPs), which we term the nonparametric Volterra kernels model (NVKM). When the input function to the operator is unobserved and has a GP prior, the NVKM constitutes a powerful method for both single and multiple output regression, and can be viewed as a nonlinear and nonparametric latent force model. When the input function is observed, the NVKM can be used to perform Bayesian system identification. We use recent advances in efficient sampling of explicit functions from GPs to map process realisations through the Volterra series without resorting to numerical integration, allowing scalability through doubly stochastic variational inference, and avoiding the need for Gaussian approximations of the output processes. We demonstrate the performance of the model for both multiple output regression and system identification using standard benchmarks.

Submitted 10 June, 2021; originally announced June 2021.

Comments: 17 pages, 5 figures

arXiv:2105.01179 [pdf, ps, other]

Degrees of Restriction for Two-Dimensional Automata

Authors: Taylor J. Smith, Kai Salomaa

Abstract: A three-way (resp., two-way) two-dimensional automaton has a read-only input head that moves in three (resp., two) directions on a finite array of cells labelled by symbols of the input alphabet. Restricting the input head movement of a two-dimensional automaton results in a model that is weaker in terms of recognition power. In this paper, we introduce the notion of "degrees of restriction" for two-dimensional automata, and we develop sets of extended two-dimensional automaton models that allow for some bounded number of restricted moves. We establish recognition hierarchies for both deterministic and nondeterministic extended three-way two-dimensional automata, and we find similar hierarchies for both deterministic and nondeterministic extended two-way two-dimensional automata. We also prove incomparability results between nondeterministic and deterministic extended three-way two-dimensional automata. Lastly, we consider closure properties for some operations on languages recognized by extended three-way two-dimensional automata.

Submitted 3 May, 2021; originally announced May 2021.

MSC Class: 68Q45 (primary); 68Q15 (secondary)

arXiv:2104.06481 [pdf, other]

Political Polarization in Online News Consumption

Authors: Kiran Garimella, Tim Smith, Rebecca Weiss, Robert West

Abstract: Political polarization appears to be on the rise, as measured by voting behavior, general affect towards opposing partisans and their parties, and contents posted and consumed online. Research over the years has focused on the role of the Web as a driver of polarization. In order to further our understanding of the factors behind online polarization, in the present work we collect and analyze Web browsing histories of tens of thousands of users alongside careful measurements of the time spent browsing various news sources. We show that online news consumption follows a polarized pattern, where users' visits to news sources aligned with their own political leaning are substantially longer than their visits to other news sources. Next, we show that such preferences hold at the individual as well as the population level, as evidenced by the emergence of clear partisan communities of news domains from aggregated browsing patterns. Finally, we tackle the important question of the role of user choices in polarization. Are users simply following the links proffered by their Web environment, or do they exacerbate partisan polarization by intentionally pursuing like-minded news sources? To answer this question, we compare browsing patterns with the underlying hyperlink structure spanned by the considered news domains, finding strong evidence of polarization in partisan browsing habits beyond that which can be explained by the hyperlink structure of the Web.

Submitted 9 April, 2021; originally announced April 2021.

Comments: Accepted at ICWSM 2021

arXiv:2009.00602 [pdf, ps, other]

Recognition and Complexity Results for Projection Languages of Two-Dimensional Automata

Authors: Taylor J. Smith, Kai Salomaa

Abstract: The row projection (resp., column projection) of a two-dimensional language $L$ is the one-dimensional language consisting of all first rows (resp., first columns) of each two-dimensional word in $L$. The operation of row projection has previously been studied under the name "frontier language", and previous work has focused on one- and two-dimensional language classes. In this paper, we study projections of languages recognized by various two-dimensional automaton classes. We show that both the row and column projections of languages recognized by (four-way) two-dimensional automata are exactly context-sensitive. We also show that the column projections of languages recognized by unary three-way two-dimensional automata can be recognized using nondeterministic logspace. Finally, we study the state complexity of projection languages for two-way two-dimensional automata, focusing on the language operations of union and diagonal concatenation.

Submitted 1 September, 2020; originally announced September 2020.

MSC Class: 68Q45 (primary); 68Q15; 68Q19 (secondary)

arXiv:2008.11164 [pdf, ps, other]

Concatenation Operations and Restricted Variants of Two-Dimensional Automata

Authors: Taylor J. Smith, Kai Salomaa

Abstract: A two-dimensional automaton operates on arrays of symbols. While a standard (four-way) two-dimensional automaton can move its input head in four directions, restricted two-dimensional automata are only permitted to move their input heads in three or two directions; these models are called three-way and two-way two-dimensional automata, respectively. In two dimensions, we may extend the notion of concatenation in multiple ways, depending on the words to be concatenated. We may row-concatenate (resp., column-concatenate) a pair of two-dimensional words when they have the same number of columns (resp., rows). In addition, the diagonal concatenation operation combines two words at their lower-right and upper-left corners, and is not dimension-dependent. In this paper, we investigate closure properties of restricted models of two-dimensional automata under various concatenation operations. We give non-closure results for two-way two-dimensional automata under row and column concatenation in both the deterministic and nondeterministic cases. We further give positive closure results for the same concatenation operations on unary nondeterministic two-way two-dimensional automata. Finally, we study closure properties of diagonal concatenation on both two- and three-way two-dimensional automata.

Submitted 25 August, 2020; originally announced August 2020.

MSC Class: 68Q45 (primary); 68R15 (secondary)

arXiv:2005.10879 [pdf, other]

doi 10.1073/pnas.2011216118

Automatic Detection of Influential Actors in Disinformation Networks

Authors: Steven T. Smith, Edward K. Kao, Erika D. Mackin, Danelle C. Shah, Olga Simek, Donald B. Rubin

Abstract: The weaponization of digital communications and social media to conduct disinformation campaigns at immense scale, speed, and reach presents new challenges to identify and counter hostile influence operations (IOs). This paper presents an end-to-end framework to automate detection of disinformation narratives, networks, and influential actors. The framework integrates natural language processing, machine learning, graph analytics, and a novel network causal inference approach to quantify the impact of individual actors in spreading IO narratives. We demonstrate its capability on real-world hostile IO campaigns with Twitter datasets collected during the 2017 French presidential elections, and known IO accounts disclosed by Twitter over a broad range of IO campaigns (May 2007 to February 2020), over 50,000 accounts, 17 countries, and different account types including both trolls and bots. Our system detects IO accounts with 96% precision, 79% recall, and 96% area-under-the-PR-curve, maps out salient network communities, and discovers high-impact accounts that escape the lens of traditional impact statistics based on activity counts and network centrality. Results are corroborated with independent sources of known IO accounts from U.S. Congressional reports, investigative journalism, and IO datasets provided by Twitter.

Submitted 7 January, 2021; v1 submitted 21 May, 2020; originally announced May 2020.

Comments: Proc. Natl. Acad. Sciences U.S.A. Vol. 118, No. 4, e2011216118

arXiv:2004.03666 [pdf, other]

Compositional Formal Analysis Based on Conventional Engineering Models

Authors: Tyler D. Smith, Ryan Peroutka, Robert Edman

Abstract: Applications of formal methods for state space exploration have been successfully applied to evaluate robust critical software systems. Formal methods enable discovery of error conditions that conventional testing may miss, and can aid in planning complex system operations. However, broad application of formal methods has been hampered by the effort required to generate formal specifications for real systems. In this paper we present State Linked Interface Compliance Engine for Data (SLICED), a methodology that addresses the complexity of formal state machine specification generation by leveraging conventional engineering models to derive compositional formal state models and to generate formal assertions on the state machines. We demonstrate SLICED using the Virtual ADAPT model published by NASA and validate our results by replicating them using Simulink.

Submitted 7 April, 2020; originally announced April 2020.

ACM Class: F.0

Showing 1–50 of 90 results for author: Smith, T