Can an unsupervised clustering algorithm reproduce a categorization system?

Authors: Nathalia Castellanos, Dhruv Desai, Sebastian Frank, Stefano Pasquali, Dhagash Mehta

Abstract: Peer analysis is a critical component of investment management, often relying on expert-provided categorization systems. These systems' consistency is questioned when they do not align with cohorts from unsupervised clustering algorithms optimized for various metrics. We investigate whether unsupervised clustering can reproduce ground truth classes in a labeled dataset, showing that success depends on feature selection and the chosen distance metric. Using toy datasets and fund categorization as real-world examples we demonstrate that accurately reproducing ground truth classes is challenging. We also highlight the limitations of standard clustering evaluation metrics in identifying the optimal number of clusters relative to the ground truth classes. We then show that if appropriate features are available in the dataset, and a proper distance metric is known (e.g., using a supervised Random Forest-based distance metric learning method), then an unsupervised clustering can indeed reproduce the ground truth classes as distinct clusters.

Submitted 19 August, 2024; originally announced August 2024.

Comments: 9 pages, 4 tables, 28 figures

arXiv:2408.09604 [pdf, other]

Circuit design in biology and machine learning. I. Random networks and dimensional reduction

Authors: Steven A. Frank

Abstract: A biological circuit is a neural or biochemical cascade, taking inputs and producing outputs. How have biological circuits learned to solve environmental challenges over the history of life? The answer certainly follows Dobzhansky's famous quote that "nothing in biology makes sense except in the light of evolution." But that quote leaves out the mechanistic basis by which natural selection's trial-and-error learning happens, which is exactly what we have to understand. How does the learning process that designs biological circuits actually work? How much insight can we gain about the form and function of biological circuits by studying the processes that have made those circuits? Because life's circuits must often solve the same problems as those faced by machine learning, such as environmental tracking, homeostatic control, dimensional reduction, or classification, we can begin by considering how machine learning designs computational circuits to solve problems. We can then ask: How much insight do those computational circuits provide about the design of biological circuits? How much does biology differ from computers in the particular circuit designs that it uses to solve problems? This article steps through two classic machine learning models to set the foundation for analyzing broad questions about the design of biological circuits. One insight is the surprising power of randomly connected networks. Another is the central role of internal models of the environment embedded within biological circuits, illustrated by a model of dimensional reduction and trend prediction. Overall, many challenges in biology have machine learning analogs, suggesting hypotheses about how biology's circuits are designed.

Submitted 18 August, 2024; originally announced August 2024.

arXiv:2404.17358 [pdf, ps, other]

Adversarial Consistency and the Uniqueness of the Adversarial Bayes Classifier

Authors: Natalie S. Frank

Abstract: Adversarial training is a common technique for learning robust classifiers. Prior work showed that convex surrogate losses are not statistically consistent in the adversarial context -- or in other words, a minimizing sequence of the adversarial surrogate risk will not necessarily minimize the adversarial classification error. We connect the consistency of adversarial surrogate losses to properties of minimizers of the adversarial classification risk, known as \emph{adversarial Bayes classifiers}. Specifically, under reasonable distributional assumptions, a convex loss is statistically consistent for adversarial learning iff the adversarial Bayes classifier satisfies a certain notion of uniqueness.

Submitted 15 May, 2024; v1 submitted 26 April, 2024; originally announced April 2024.

Comments: 18 pages, v2: fixed typos

arXiv:2404.16956 [pdf, other]

A Notion of Uniqueness for the Adversarial Bayes Classifier

Authors: Natalie S. Frank

Abstract: We propose a new notion of uniqueness for the adversarial Bayes classifier in the setting of binary classification. Analyzing this concept produces a simple procedure for computing all adversarial Bayes classifiers for a well-motivated family of one-dimensional data distributions. This characterization is then leveraged to show that as the perturbation radius increases, the regularity of adversarial Bayes classifiers improves. Various examples demonstrate that the boundary of the adversarial Bayes classifier frequently lies near the boundary of the Bayes classifier.

Submitted 17 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

Comments: 49 pages, 7 figures. v2: fixed typos, notation errors, and a mistake in example 7

arXiv:2308.15084 [pdf, other]

Introducing Interactions in Multi-Objective Optimization of Software Architectures

Authors: Vittorio Cortellessa, J. Andres Diaz-Pace, Daniele Di Pompeo, Sebastian Frank, Pooyan Jamshidi, Michele Tucci, André van Hoorn

Abstract: Software architecture optimization aims to enhance non-functional attributes like performance and reliability while meeting functional requirements. Multi-objective optimization employs metaheuristic search techniques, such as genetic algorithms, to explore feasible architectural changes and propose alternatives to designers. However, the resource-intensive process may not always align with practical constraints. This study investigates the impact of designer interactions on multi-objective software architecture optimization. Designers can intervene at intermediate points in the fully automated optimization process, making choices that guide exploration towards more desirable solutions. We compare this interactive approach with the fully automated optimization process, which serves as the baseline. The findings demonstrate that designer interactions lead to a more focused solution space, resulting in improved architectural quality. By directing the search towards regions of interest, the interaction uncovers architectures that remain unexplored in the fully automated process.

Submitted 29 August, 2023; originally announced August 2023.

arXiv:2305.14585 [pdf, other]

Faithful and Efficient Explanations for Neural Networks via Neural Tangent Kernel Surrogate Models

Authors: Andrew Engel, Zhichao Wang, Natalie S. Frank, Ioana Dumitriu, Sutanay Choudhury, Anand Sarwate, Tony Chiang

Abstract: A recent trend in explainable AI research has focused on surrogate modeling, where neural networks are approximated as simpler ML algorithms such as kernel machines. A second trend has been to utilize kernel functions in various explain-by-example or data attribution tasks. In this work, we combine these two trends to analyze approximate empirical neural tangent kernels (eNTK) for data attribution. Approximation is critical for eNTK analysis due to the high computational cost to compute the eNTK. We define new approximate eNTK and perform novel analysis on how well the resulting kernel machine surrogate models correlate with the underlying neural network. We introduce two new random projection variants of approximate eNTK which allow users to tune the time and memory complexity of their calculation. We conclude that kernel machines using approximate neural tangent kernel as the kernel function are effective surrogate models, with the introduced trace NTK the most consistent performer. Open source software allowing users to efficiently calculate kernel functions in the PyTorch framework is available (https://github.com/pnnl/projection_ntk).

Submitted 11 March, 2024; v1 submitted 23 May, 2023; originally announced May 2023.

Comments: 9 pages, 2 figures, 3 tables. Updated 3/11/2024: various additions/clarifications after ICLR review. Accepted as a Spotlight paper at ICLR 2024

arXiv:2210.13134 [pdf, other]

Multilingual Multimodal Learning with Machine Translated Text

Authors: Chen Qiu, Dan Oneata, Emanuele Bugliarello, Stella Frank, Desmond Elliott

Abstract: Most vision-and-language pretraining research focuses on English tasks. However, the creation of multilingual multimodal evaluation datasets (e.g. Multi30K, xGQA, XVNLI, and MaRVL) poses a new challenge in finding high-quality training data that is both multilingual and multimodal. In this paper, we investigate whether machine translating English multimodal data can be an effective proxy for the lack of readily available multilingual data. We call this framework TD-MML: Translated Data for Multilingual Multimodal Learning, and it can be applied to any multimodal dataset and model. We apply it to both pretraining and fine-tuning data with a state-of-the-art model. In order to prevent models from learning from low-quality translated text, we propose two metrics for automatically removing such translations from the resulting datasets. In experiments on five tasks across 20 languages in the IGLUE benchmark, we show that translated data can provide a useful signal for multilingual multimodal learning, both at pretraining and fine-tuning.

Submitted 24 October, 2022; originally announced October 2022.

Comments: EMNLP 2022

arXiv:2207.04487 [pdf, other]

doi 10.3389/fevo.2022.1010278

Automatic differentiation and the optimization of differential equation models in biology

Authors: Steven A. Frank

Abstract: A computational revolution unleashed the power of artificial neural networks. At the heart of that revolution is automatic differentiation, which calculates the derivative of a performance measure relative to a large number of parameters. Differentiation enhances the discovery of improved performance in large models, an achievement that was previously difficult or impossible. Recently, a second computational advance optimizes the temporal trajectories traced by differential equations. Optimization requires differentiating a measure of performance over a trajectory, such as the closeness of tracking the environment, with respect to the parameters of the differential equations. Because model trajectories are usually calculated numerically by multistep algorithms, such as Runge-Kutta, the automatic differentiation must be passed through the numerical algorithm. This article explains how such automatic differentiation of trajectories is achieved. It also discusses why such computational breakthroughs are likely to advance theoretical and statistical studies of biological problems, in which one can consider variables as dynamic paths over time and space. Many common problems arise between improving success in computational learning models over performance landscapes, improving evolutionary fitness over adaptive landscapes, and improving statistical fits to data over information landscapes.

Submitted 11 October, 2022; v1 submitted 10 July, 2022; originally announced July 2022.

arXiv:2206.09099

The Consistency of Adversarial Training for Binary Classification

Authors: Natalie S. Frank, Jonathan Niles-Weed

Abstract: Robustness to adversarial perturbations is of paramount concern in modern machine learning. One of the state-of-the-art methods for training robust classifiers is adversarial training, which involves minimizing a supremum-based surrogate risk. The statistical consistency of surrogate risks is well understood in the context of standard machine learning, but not in the adversarial setting. In this paper, we characterize which supremum-based surrogates are consistent for distributions absolutely continuous with respect to Lebesgue measure in binary classification. Furthermore, we obtain quantitative bounds relating adversarial surrogate risks to the adversarial classification risk. Lastly, we discuss implications for the $\mathcal{H}$-consistency of adversarial training.

Submitted 17 May, 2023; v1 submitted 17 June, 2022; originally announced June 2022.

Comments: There was an error in the main theorem of the paper (Theorem 7)

arXiv:2206.09098 [pdf, ps, other]

Existence and Minimax Theorems for Adversarial Surrogate Risks in Binary Classification

Authors: Natalie S. Frank, Jonathan Niles-Weed

Abstract: Adversarial training is one of the most popular methods for training methods robust to adversarial attacks; however, it is not well understood from a theoretical perspective. We prove existence, regularity, and minimax theorems for adversarial surrogate risks. Our results explain some empirical observations on adversarial robustness from prior work and suggest new directions in algorithm development. Furthermore, our results extend previously known existence and minimax theorems for the adversarial classification risk to surrogate risks.

Submitted 10 December, 2023; v1 submitted 17 June, 2022; originally announced June 2022.

Comments: 42 pages. version 2: corrects several errors and employs a significantly different proof technique. version 3: modifies the arXiv author list but has no other changes. version 4: improved exposition and fixed typos

arXiv:2204.07833 [pdf, other]

doi 10.1002/ece3.9895

Optimizing differential equations to fit data and predict outcomes

Authors: Steven A. Frank

Abstract: Many scientific problems focus on observed patterns of change or on how to design a system to achieve particular dynamics. Those problems often require fitting differential equation models to target trajectories. Fitting such models can be difficult because each evaluation of the fit must calculate the distance between the model and target patterns at numerous points along a trajectory. The gradient of the fit with respect to the model parameters can be challenging to compute. Recent technical advances in automatic differentiation through numerical differential equation solvers potentially change the fitting process into a relatively easy problem, opening up new possibilities to study dynamics. However, application of the new tools to real data may fail to achieve a good fit. This article illustrates how to overcome a variety of common challenges, using the classic ecological data for oscillations in hare and lynx populations. Models include simple ordinary differential equations (ODEs) and neural ordinary differential equations (NODEs), which use artificial neural networks to estimate the derivatives of differential equation systems. Comparing the fits obtained with ODEs versus NODEs, representing small and large parameter spaces, and changing the number of variable dimensions provide insight into the geometry of the observed and model trajectories. To analyze the quality of the models for predicting future observations, a Bayesian-inspired preconditioned stochastic gradient Langevin dynamics (pSGLD) calculation of the posterior distribution of predicted model trajectories clarifies the tendency for various models to underfit or overfit the data. Coupling fitted differential equation systems with pSGLD sampling provides a powerful way to study the properties of optimization surfaces, raising an analogy with mutation-selection dynamics on fitness landscapes.

Submitted 16 April, 2022; originally announced April 2022.

arXiv:2203.10020 [pdf, other]

Challenges and Strategies in Cross-Cultural NLP

Authors: Daniel Hershcovich, Stella Frank, Heather Lent, Miryam de Lhoneux, Mostafa Abdou, Stephanie Brandl, Emanuele Bugliarello, Laura Cabello Piqueras, Ilias Chalkidis, Ruixiang Cui, Constanza Fierro, Katerina Margatina, Phillip Rust, Anders Søgaard

Abstract: Various efforts in the Natural Language Processing (NLP) community have been made to accommodate linguistic diversity and serve speakers of many different languages. However, it is important to acknowledge that speakers, and the content they produce and require, vary not just by language, but also by culture. Although language and culture are tightly linked, there are important differences. Analogous to cross-lingual and multilingual NLP, cross-cultural and multicultural NLP considers these differences in order to better serve users of NLP systems. We propose a principled framework to frame these efforts, and survey existing and potential strategies.

Submitted 18 March, 2022; originally announced March 2022.

Comments: ACL 2022 - Theme track

arXiv:2203.06937 [pdf, ps, other]

Modelling word learning and recognition using visually grounded speech

Authors: Danny Merkx, Sebastiaan Scholten, Stefan L. Frank, Mirjam Ernestus, Odette Scharenborg

Abstract: Background: Computational models of speech recognition often assume that the set of target words is already given. This implies that these models do not learn to recognise speech from scratch without prior knowledge and explicit supervision. Visually grounded speech models learn to recognise speech without prior knowledge by exploiting statistical dependencies between spoken and visual input. While it has previously been shown that visually grounded speech models learn to recognise the presence of words in the input, we explicitly investigate such a model as a model of human speech recognition. Methods: We investigate the time-course of word recognition as simulated by the model using a gating paradigm to test whether its recognition is affected by well-known word-competition effects in human speech processing. We furthermore investigate whether vector quantisation, a technique for discrete representation learning, aids the model in the discovery and recognition of words. Results/Conclusion: Our experiments show that the model is able to recognise nouns in isolation and even learns to properly differentiate between plural and singular nouns. We also find that recognition is influenced by word competition from the word-initial cohort and neighbourhood density, mirroring word competition effects in human speech comprehension. Lastly, we find no evidence that vector quantisation is helpful in discovering and recognising words. Our gating experiments even show that the vector quantised model requires more of the input sequence for correct recognition.

Submitted 14 March, 2022; originally announced March 2022.

arXiv:2202.10292 [pdf, other]

Seeing the advantage: visually grounding word embeddings to better capture human semantic knowledge

Authors: Danny Merkx, Stefan L. Frank, Mirjam Ernestus

Abstract: Distributional semantic models capture word-level meaning that is useful in many natural language processing tasks and have even been shown to capture cognitive aspects of word meaning. The majority of these models are purely text based, even though the human sensory experience is much richer. In this paper we create visually grounded word embeddings by combining English text and images and compare them to popular text-based methods, to see if visual information allows our model to better capture cognitive aspects of word meaning. Our analysis shows that visually grounded embedding similarities are more predictive of the human reaction times in a large priming experiment than the purely text-based embeddings. The visually grounded embeddings also correlate well with human word similarity ratings. Importantly, in both experiments we show that the grounded embeddings account for a unique portion of explained variance, even when we include text-based embeddings trained on huge corpora. This shows that visual grounding allows our model to capture information that cannot be extracted using text as the only source of information.

Submitted 21 February, 2022; originally announced February 2022.

Journal ref: Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics (CMCL) 2022

arXiv:2112.01694 [pdf, other]

On the Existence of the Adversarial Bayes Classifier (Extended Version)

Authors: Pranjal Awasthi, Natalie S. Frank, Mehryar Mohri

Abstract: Adversarial robustness is a critical property in a variety of modern machine learning applications. While it has been the subject of several recent theoretical studies, many important questions related to adversarial robustness are still open. In this work, we study a fundamental question regarding Bayes optimality for adversarial robustness. We provide general sufficient conditions under which the existence of a Bayes optimal classifier can be guaranteed for adversarial robustness. Our results can provide a useful tool for a subsequent study of surrogate losses in adversarial robustness and their consistency properties. This manuscript is the extended and corrected version of the paper \emph{On the Existence of the Adversarial Bayes Classifier} published in NeurIPS 2021. There were two errors in theorem statements in the original paper -- one in the definition of pseudo-certifiable robustness and the other in the measurability of $A^\epsilon$ for arbitrary metric spaces. In this version we correct the errors. Furthermore, the results of the original paper did not apply to some non-strictly convex norms and here we extend our results to all possible norms.

Submitted 28 August, 2023; v1 submitted 2 December, 2021; originally announced December 2021.

Comments: 27 pages, 3 figures. Version 2: Corrects 2 errors in the paper "On the Existence of the Adversarial Bayes Classifier" published in NeurIPS. Version 3: Update to acknowledgements

arXiv:2109.06129 [pdf, other]

Can Language Models Encode Perceptual Structure Without Grounding? A Case Study in Color

Authors: Mostafa Abdou, Artur Kulmizev, Daniel Hershcovich, Stella Frank, Ellie Pavlick, Anders Søgaard

Abstract: Pretrained language models have been shown to encode relational information, such as the relations between entities or concepts in knowledge-bases -- (Paris, Capital, France). However, simple relations of this type can often be recovered heuristically and the extent to which models implicitly reflect topological structure that is grounded in the world, such as perceptual structure, is unknown. To explore this question, we conduct a thorough case study on color. Namely, we employ a dataset of monolexemic color terms and color chips represented in CIELAB, a color space with a perceptually meaningful distance metric. Using two methods of evaluating the structural alignment of colors in this space with text-derived color term representations, we find significant correspondence. Analyzing the differences in alignment across the color spectrum, we find that warmer colors are, on average, better aligned to the perceptual color space than cooler ones, suggesting an intriguing connection to findings from recent work on efficient communication in color naming. Further analysis suggests that differences in alignment are, in part, mediated by collocationality and differences in syntactic usage, posing questions as to the relationship between color perception and usage and context.

Submitted 14 September, 2021; v1 submitted 13 September, 2021; originally announced September 2021.

Comments: CoNLL 2021

arXiv:2109.04448 [pdf, other]

Vision-and-Language or Vision-for-Language? On Cross-Modal Influence in Multimodal Transformers

Authors: Stella Frank, Emanuele Bugliarello, Desmond Elliott

Abstract: Pretrained vision-and-language BERTs aim to learn representations that combine information from both modalities. We propose a diagnostic method based on cross-modal input ablation to assess the extent to which these models actually integrate cross-modal information. This method involves ablating inputs from one modality, either entirely or selectively based on cross-modal grounding alignments, and evaluating the model prediction performance on the other modality. Model performance is measured by modality-specific tasks that mirror the model pretraining objectives (e.g. masked language modelling for text). Models that have learned to construct cross-modal representations using both modalities are expected to perform worse when inputs are missing from a modality. We find that recently proposed models have much greater relative difficulty predicting text when visual information is ablated, compared to predicting visual object categories when text is ablated, indicating that these models are not symmetrically cross-modal.

Submitted 9 September, 2021; originally announced September 2021.

Comments: EMNLP 2021

arXiv:2106.08648 [pdf, other]

doi 10.21437/Interspeech.2021-1464

Semantic sentence similarity: size does not always matter

Authors: Danny Merkx, Stefan L. Frank, Mirjam Ernestus

Abstract: This study addresses the question of whether visually grounded speech recognition (VGS) models learn to capture sentence semantics without access to any prior linguistic knowledge. We produce synthetic and natural spoken versions of a well-known semantic textual similarity database and show that our VGS model produces embeddings that correlate well with human semantic similarity judgements. Our results show that a model trained on a small image-caption database outperforms two models trained on much larger databases, indicating that database size is not all that matters. We also investigate the importance of having multiple captions per image and find that this is indeed helpful even if the total number of images is lower, suggesting that paraphrasing is a valuable learning signal. While the general trend in the field is to create ever larger datasets to train models on, our findings indicate other characteristics of the database can be just as important.

Submitted 16 June, 2021; originally announced June 2021.

Comments: This paper has been accepted at Interspeech 2021 where it will be presented and appear in the conference proceedings in September 2021

Journal ref: Proc. Interspeech 2021

arXiv:2101.11587 [pdf]

doi 10.1162/leon_a_02095

The Work of Art in an Age of Mechanical Generation

Authors: Steven J. Frank

Abstract: Can we define what it means to be "creative," and if so, can our definition drive artificial intelligence (AI) systems to feats of creativity indistinguishable from human efforts? This mixed question is considered from technological and social perspectives. Beginning with an exploration of the value we attach to authenticity in works of art, the article considers the ability of AI to detect forgeries of renowned paintings and, in so doing, somehow reveal the quiddity of a work of art. We conclude by considering whether evolving technical capability can revise traditional relationships among art, artist, and the market.

Submitted 10 August, 2022; v1 submitted 27 January, 2021; originally announced January 2021.

Comments: This is the author's final version; the article has been accepted for publication in Leonardo Journal

Journal ref: Leonardo (2022) 55(4): 378-381

arXiv:2010.14544 [pdf, other]

doi 10.3390/e22121395

The fundamental equations of change in statistical ensembles and biological populations

Authors: Steven A. Frank, Frank J. Bruggeman

Abstract: A recent article in Nature Physics unified key results from thermodynamics, statistics, and information theory. The unification arose from a general equation for the rate of change in the information content of a system. The general equation describes the change in the moments of an observable quantity over a probability distribution. One term in the equation describes the change in the probability distribution. The other term describes the change in the observable values for a given state. We show the equivalence of this general equation for moment dynamics with the widely known Price equation from evolutionary theory, named after George Price. We introduce the Price equation from its biological roots, review a mathematically abstract form of the equation, and discuss the potential for this equation to unify diverse mathematical theories from different disciplines. The new work in Nature Physics and many applications in biology show that this equation also provides the basis for deriving many novel theoretical results within each discipline.

Submitted 27 October, 2020; originally announced October 2020.

arXiv:2010.07143 [pdf, other]

A Graph Neural Network Framework for Causal Inference in Brain Networks

Authors: Simon Wein, Wilhelm Malloni, Ana Maria Tomé, Sebastian M. Frank, Gina-Isabelle Henze, Stefan Wüst, Mark W. Greenlee, Elmar W. Lang

Abstract: A central question in neuroscience is how self-organizing dynamic interactions in the brain emerge on their relatively static structural backbone. Due to the complexity of spatial and temporal dependencies between different brain areas, fully comprehending the interplay between structure and function is still challenging and an area of intense research. In this paper we present a graph neural network (GNN) framework, to describe functional interactions based on the structural anatomical layout. A GNN allows us to process graph-structured spatio-temporal signals, providing a possibility to combine structural information derived from diffusion tensor imaging (DTI) with temporal neural activity profiles, like observed in functional magnetic resonance imaging (fMRI). Moreover, dynamic interactions between different brain regions learned by this data-driven approach can provide a multi-modal measure of causal connectivity strength. We assess the proposed model's accuracy by evaluating its capabilities to replicate empirically observed neural activation profiles, and compare the performance to those of a vector auto regression (VAR), like typically used in Granger causality. We show that GNNs are able to capture long-term dependencies in data and also computationally scale up to the analysis of large-scale networks.
Finally we confirm that features learned by a GNN can generalize across MRI scanner types and acquisition protocols, by demonstrating that the performance on small datasets can be improved by pre-training the GNN on data from an earlier and different study. We conclude that the proposed multi-modal GNN framework can provide a novel perspective on the structure-function relationship in the brain. Therewith this approach can be promising for the characterization of the information flow in brain networks.

Submitted 14 October, 2020; originally announced October 2020.

arXiv:2008.05458 [pdf]

Deep-Learning-Based, Multi-Timescale Load Forecasting in Buildings: Opportunities and Challenges from Research to Deployment

Authors: Sakshi Mishra, Stephen M. Frank, Anya Petersen, Robert Buechler, Michelle Slovensky

Abstract: Electricity load forecasting for buildings and campuses is becoming increasingly important as the penetration of distributed energy resources (DERs) grows. Efficient operation and dispatch of DERs require reasonably accurate predictions of future energy consumption in order to conduct near-real-time optimized dispatch of on-site generation and storage assets. Electric utilities have traditionally performed load forecasting for load pockets spanning large geographic areas, and therefore forecasting has not been a common practice by buildings and campus operators. Given the growing trends of research and prototyping in the grid-interactive efficient buildings domain, characteristics beyond simple algorithm forecast accuracy are important in determining true utility of the algorithm for smart buildings. Other characteristics include the overall design of the deployed architecture and the operational efficiency of the forecasting system. In this work, we present a deep-learning-based load forecasting system that predicts the building load at 1-hour intervals for 18 hours in the future. We also discuss challenges associated with the real-time deployment of such systems as well as the research opportunities presented by a fully functional forecasting system that has been developed within the National Renewable Energy Laboratory Intelligent Campus program.

Submitted 16 December, 2021; v1 submitted 12 August, 2020; originally announced August 2020.

Comments: 13 pages, 4 figures

arXiv:2006.02174 [pdf, other]

CompGuessWhat?!: A Multi-task Evaluation Framework for Grounded Language Learning

Authors: Alessandro Suglia, Ioannis Konstas, Andrea Vanzo, Emanuele Bastianelli, Desmond Elliott, Stella Frank, Oliver Lemon

Abstract: Approaches to Grounded Language Learning typically focus on a single task-based final performance measure that may not depend on desirable properties of the learned hidden representations, such as their ability to predict salient attributes or to generalise to unseen situations. To remedy this, we present GROLLA, an evaluation framework for Grounded Language Learning with Attributes with three sub-tasks: 1) Goal-oriented evaluation; 2) Object attribute prediction evaluation; and 3) Zero-shot evaluation. We also propose a new dataset CompGuessWhat?! as an instance of this framework for evaluating the quality of learned neural representations, in particular concerning attribute grounding. To this end, we extend the original GuessWhat?! dataset by including a semantic layer on top of the perceptual one. Specifically, we enrich the VisualGenome scene graphs associated with the GuessWhat?! images with abstract and situated attributes. By using diagnostic classifiers, we show that current models learn representations that are not expressive enough to encode object attributes (average F1 of 44.27). In addition, they do not learn strategies nor representations that are robust enough to perform well when novel scenes or objects are involved in gameplay (zero-shot best accuracy 50.06%).

Submitted 3 June, 2020; originally announced June 2020.

Comments: Accepted to the Annual Conference of the Association for Computational Linguistics (ACL) 2020

arXiv:2005.10600 [pdf]

A Neural Network Looks at Leonardo's(?) Salvator Mundi

Authors: Steven J. Frank, Andrea M. Frank

Abstract: We use convolutional neural networks (CNNs) to analyze authorship questions surrounding the works of Leonardo da Vinci -- in particular, Salvator Mundi, the world's most expensive painting and among the most controversial. Trained on the works of an artist under study and visually comparable works of other artists, our system can identify likely forgeries and shed light on attribution controversies. Leonardo's few extant paintings test the limits of our system and require corroborative techniques of testing and analysis.

Submitted 21 May, 2020; originally announced May 2020.

Comments: This is the author's final version. The article has been accepted for publication in Leonardo (MIT Press)

arXiv:2005.09471 [pdf, other]

doi 10.18653/v1/2021.cmcl-1.2

Human Sentence Processing: Recurrence or Attention?

Authors: Danny Merkx, Stefan L. Frank

Abstract: Recurrent neural networks (RNNs) have long been an architecture of interest for computational models of human sentence processing. The recently introduced Transformer architecture outperforms RNNs on many natural language processing tasks but little is known about its ability to model human language processing. We compare Transformer- and RNN-based language models' ability to account for measures of human reading effort. Our analysis shows Transformers to outperform RNNs in explaining self-paced reading times and neural activity during reading English sentences, challenging the widely held idea that human sentence processing involves recurrent and immediate processing and providing evidence for cue-based retrieval.

Submitted 4 May, 2021; v1 submitted 19 May, 2020; originally announced May 2020.

Comments: This paper will appear in the proceedings of CMCL 2021 to be held June 10th

Journal ref: Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics (CMCL) 2021

arXiv:2003.07690 [pdf]

doi 10.1016/j.autcon.2020.103411

A Unified Architecture for Data-Driven Metadata Tagging of Building Automation Systems

Authors: Sakshi Mishra, Andrew Glaws, Dylan Cutler, Stephen Frank, Muhammad Azam, Farzam Mohammadi, Jean-Simon Venne

Abstract: This article presents a Unified Architecture (UA) for automated point tagging of Building Automation System (BAS) data, based on a combination of data-driven approaches. Advanced energy analytics applications, including fault detection and diagnostics and supervisory control, have emerged as a significant opportunity for improving the performance of our built environment. Effective application of these analytics depends on harnessing structured data from the various building control and monitoring systems, but typical Building Automation System implementations do not employ any standardized metadata schema. While standards such as Project Haystack and Brick Schema have been developed to address this issue, the process of structuring the data, i.e., tagging the points to apply a standard metadata schema, has, to date, been a manual process. This process is typically costly, labor-intensive, and error-prone. In this work we address this gap by proposing a UA that automates the process of point tagging by leveraging the data accessible through connection to the BAS, including time series data and the raw point names. The UA intertwines supervised classification and unsupervised clustering techniques from machine learning and leverages both their deterministic and probabilistic outputs to inform the point tagging process. Furthermore, we extend the UA to embed additional input and output data-processing modules that are designed to address the challenges associated with the real-time deployment of this automation solution.
We test the UA on two datasets for real-life buildings: (1) commercial retail buildings and (2) office buildings from the National Renewable Energy Laboratory campus. The proposed methodology correctly applied 85-90 percent and 70-75 percent of the tags in each of these test scenarios, respectively.

Submitted 11 September, 2020; v1 submitted 26 February, 2020; originally announced March 2020.

Comments: 19 pages, 9 figures, accepted for publication in Automation in Construction

arXiv:2002.07621 [pdf]

doi 10.1016/j.bspc.2020.102388

Resource-Frugal Classification and Analysis of Pathology Slides Using Image Entropy

Authors: Steven J. Frank

Abstract: Pathology slides of lung malignancies are classified using resource-frugal convolution neural networks (CNNs) that may be deployed on mobile devices. In particular, the challenging task of distinguishing adenocarcinoma (LUAD) and squamous-cell carcinoma (LUSC) lung cancer subtypes is approached in two stages. First, whole-slide histopathology images are downsampled to a size too large for CNN analysis but large enough to retain key anatomic detail. The downsampled images are decomposed into smaller square tiles, which are sifted based on their image entropies. A lightweight CNN produces tile-level classifications that are aggregated to classify the slide. The resulting accuracies are comparable to those obtained with much more complex CNNs and larger training sets. To allow clinicians to visually assess the basis for the classification -- that is, to see the image regions that underlie it -- color-coded probability maps are created by overlapping tiles and averaging the tile-level probabilities at a pixel level.

Submitted 2 December, 2020; v1 submitted 16 February, 2020; originally announced February 2020.

Journal ref: Biomedical Signal Processing and Control, vol. 66, April 2021, 102388

arXiv:2002.05107 [pdf]

Analysis of Dutch Master Paintings with Convolutional Neural Networks

Authors: Steven J. Frank, Andrea M. Frank

Abstract: Trained on the works of an artist under study and visually comparable works of other artists, convolutional neural networks can identify forgeries and provide attributions. They can also assign classification probabilities within a painting, revealing mixed authorship and identifying regions painted by different hands.

Submitted 16 August, 2020; v1 submitted 12 February, 2020; originally announced February 2020.

arXiv:1910.05291 [pdf, other]

The Emergence of Compositional Languages for Numeric Concepts Through Iterated Learning in Neural Agents

Authors: Shangmin Guo, Yi Ren, Serhii Havrylov, Stella Frank, Ivan Titov, Kenny Smith

Abstract: Since first introduced, computer simulation has been an increasingly important tool in evolutionary linguistics. Recently, with the development of deep learning techniques, research in grounded language learning has also started to focus on facilitating the emergence of compositional languages without pre-defined elementary linguistic knowledge. In this work, we explore the emergence of compositional languages for numeric concepts in multi-agent communication systems. We demonstrate that compositional language for encoding numeric concepts can emerge through iterated learning in populations of deep neural network agents. However, language properties greatly depend on the input representations given to agents. We found that compositional languages only emerge if they require fewer iterations to be fully learnt than other non-degenerate languages for agents on a given input representation.

Submitted 11 October, 2019; originally announced October 2019.

arXiv:1909.03795 [pdf, ps, other]

doi 10.21437/Interspeech.2019-3067

Language learning using Speech to Image retrieval

Authors: Danny Merkx, Stefan L. Frank, Mirjam Ernestus

Abstract: Humans learn language by interaction with their environment and listening to other humans. It should also be possible for computational models to learn language directly from speech but so far most approaches require text. We improve on existing neural network approaches to create visually grounded embeddings for spoken utterances. Using a combination of a multi-layer GRU, importance sampling, cyclic learning rates, ensembling and vectorial self-attention our results show a remarkable increase in image-caption retrieval performance over previous work. Furthermore, we investigate which layers in the model learn to recognise words in the input. We find that deeper network layers are better at encoding word presence, although the final layer has slightly lower performance. This shows that our visually grounded sentence encoder learns to recognise words from the input even though it is not explicitly trained for word recognition.

Submitted 9 September, 2019; originally announced September 2019.

Comments: Submitted to InterSpeech 2019

Journal ref: Proc. Interspeech 2019

arXiv:1907.12436 [pdf]

Salient Slices: Improved Neural Network Training and Performance with Image Entropy

Authors: Steven J. Frank, Andrea M. Frank

Abstract: As a training and analysis strategy for convolutional neural networks (CNNs), we slice images into tiled segments and use, for training and prediction, segments that both satisfy a criterion of information diversity and contain sufficient content to support classification. In particular, we utilize image entropy as the diversity criterion. This ensures that each tile carries as much information diversity as the original image, and for many applications serves as an indicator of usefulness in classification. To make predictions, a probability aggregation framework is applied to probabilities assigned by the CNN to the input image tiles. This technique facilitates the use of large, high-resolution images that would be impractical to analyze unmodified; provides data augmentation for training, which is particularly valuable when image availability is limited; and the ensemble nature of the input for prediction enhances its accuracy.

Submitted 4 May, 2020; v1 submitted 29 July, 2019; originally announced July 2019.

Comments: Final version; article will be published in Neural Computation 32, 1222-1237 (June 2020)

arXiv:1904.00825 [pdf, other]

doi 10.1098/rstb.2019.0351

Simple unity among the fundamental equations of science

Authors: Steven A. Frank

Abstract: The Price equation describes the change in populations. Change concerns some value, such as biological fitness, information or physical work. The Price equation reveals universal aspects for the nature of change, independently of the meaning ascribed to values. By understanding those universal aspects, we can see more clearly why fundamental mathematical results in different disciplines often share a common form. We can also interpret more clearly the meaning of key results within each discipline. For example, the mathematics of natural selection in biology has a form closely related to information theory and physical entropy. Does that mean that natural selection is about information or entropy? Or do natural selection, information and entropy arise as interpretations of a common underlying abstraction? The Price equation suggests the latter. The Price equation achieves its abstract generality by partitioning change into two terms. The first term naturally associates with the direct forces that cause change. The second term naturally associates with the changing frame of reference. In the Price equation's canonical form, total change remains zero because the conservation of total probability requires that all probabilities invariantly sum to one. Much of the shared common form for the mathematics of different disciplines may arise from that seemingly trivial invariance of total probability, which leads to the partitioning of total change into equal and opposite components of the direct forces and the changing frame of reference.

Submitted 4 August, 2019; v1 submitted 29 March, 2019; originally announced April 2019.

Comments: arXiv admin note: text overlap with arXiv:1810.09262

arXiv:1903.11393 [pdf, other]

doi 10.1017/S1351324919000196

Learning semantic sentence representations from visually grounded language without lexical knowledge

Authors: Danny Merkx, Stefan Frank

Abstract: Current approaches to learning semantic representations of sentences often use prior word-level knowledge. The current study aims to leverage visual information in order to capture sentence level semantics without the need for word embeddings. We use a multimodal sentence encoder trained on a corpus of images with matching text captions to produce visually grounded sentence embeddings. Deep Neural Networks are trained to map the two modalities to a common embedding space such that for an image the corresponding caption can be retrieved and vice versa. We show that our model achieves results comparable to the current state-of-the-art on two popular image-caption retrieval benchmark data sets: MSCOCO and Flickr8k. We evaluate the semantic content of the resulting sentence embeddings using the data from the Semantic Textual Similarity benchmark task and show that the multimodal embeddings correlate well with human semantic similarity judgements. The system achieves state-of-the-art results on several of these benchmarks, which shows that a system trained solely on multimodal data, without assuming any word representations, is able to capture sentence level semantics. Importantly, this result shows that we do not need prior knowledge of lexical level semantics in order to model sentence level semantics. These findings demonstrate the importance of visual information in semantics.

Submitted 27 March, 2019; originally announced March 2019.

Journal ref: Natural Language Engineering, Volume 25 - Issue 4 - July 2019

arXiv:1810.09262 [pdf, other]

doi 10.3390/e20120978

The Price equation program: simple invariances unify population dynamics, thermodynamics, probability, information and inference

Authors: Steven A. Frank

Abstract: The fundamental equations of various disciplines often seem to share the same basic structure. Natural selection increases information in the same way that Bayesian updating increases information. Thermodynamics and the forms of common probability distributions express maximum increase in entropy, which appears mathematically as loss of information. Physical mechanics follows paths of change that maximize Fisher information. The information expressions typically have analogous interpretations as the Newtonian balance between force and acceleration, representing a partition between direct causes of change and opposing changes in the frame of reference. This web of vague analogies hints at a deeper common mathematical structure. I suggest that the Price equation expresses that underlying universal structure. The abstract Price equation describes dynamics as the change between two sets. One component of dynamics expresses the change in the frequency of things, holding constant the values associated with things. The other component of dynamics expresses the change in the values of things, holding constant the frequency of things. The separation of frequency from value generalizes Shannon's separation of the frequency of symbols from the meaning of symbols in information theory. The Price equation's generalized separation of frequency and value reveals a few simple invariances that define universal geometric aspects of change.
For example, the conservation of total frequency, although a trivial invariance by itself, creates a powerful constraint on the geometry of change. That constraint plus a few others seem to explain the common structural forms of the equations in different disciplines. From that abstract perspective, interpretations such as selection, information, entropy, force, acceleration, and physical work arise from the same underlying geometry expressed by the Price equation.

Submitted 14 December, 2018; v1 submitted 22 October, 2018; originally announced October 2018.

Comments: Version 3: added figure illustrating geometry; added table of symbols and two tables summarizing mathematical relations; this version accepted for publication in Entropy

Journal ref: 2018. Entropy 20:978

arXiv:1809.08758 [pdf, other]

Low Frequency Adversarial Perturbation

Authors: Chuan Guo, Jared S. Frank, Kilian Q. Weinberger

Abstract: Adversarial images aim to change a target model's decision by minimally perturbing a target image. In the black-box setting, the absence of gradient information often renders this search problem costly in terms of query complexity. In this paper we propose to restrict the search for adversarial images to a low frequency domain. This approach is readily compatible with many existing black-box attack frameworks and consistently reduces their query cost by 2 to 4 times. Further, we can circumvent image transformation defenses even when both the model and the defense strategy are unknown. Finally, we demonstrate the efficacy of this technique by fooling the Google Cloud Vision platform with an unprecedented low number of model queries.

Submitted 22 July, 2019; v1 submitted 24 September, 2018; originally announced September 2018.

Comments: 9 pages, 9 figures. Accepted to UAI 2019

arXiv:1710.07177 [pdf, other]

Findings of the Second Shared Task on Multimodal Machine Translation and Multilingual Image Description

Authors: Desmond Elliott, Stella Frank, Loïc Barrault, Fethi Bougares, Lucia Specia

Abstract: We present the results from the second shared task on multimodal machine translation and multilingual image description. Nine teams submitted 19 systems to two tasks. The multimodal translation task, in which the source sentence is supplemented by an image, was extended with a new language (French) and two new test sets. The multilingual image description task was changed such that at test time, only the image is given. Compared to last year, multimodal systems improved, but text-only systems remain competitive.

Submitted 19 October, 2017; originally announced October 2017.

Journal ref: Proceedings of the Second Conference on Machine Translation, 2017, pp. 215--233

arXiv:1706.05656 [pdf, ps, other]

doi: 10.1371/journal.pone.0197304

Lexical representation explains cortical entrainment during speech comprehension

Authors: Stefan Frank, Jinbiao Yang

Abstract: Results from a recent neuroimaging study on spoken sentence comprehension have been interpreted as evidence for cortical entrainment to hierarchical syntactic structure. We present a simple computational model that predicts the power spectra from this study, even though the model's linguistic knowledge is restricted to the lexical level, and word-level representations are not combined into higher-level units (phrases or sentences). Hence, the cortical entrainment results can also be explained from the lexical properties of the stimuli, without recourse to hierarchical syntax.

Submitted 10 January, 2018; v1 submitted 18 June, 2017; originally announced June 2017.

Comments: Submitted for publication

arXiv:1605.00459 [pdf, ps, other]

Multi30K: Multilingual English-German Image Descriptions

Authors: Desmond Elliott, Stella Frank, Khalil Sima'an, Lucia Specia

Abstract: We introduce the Multi30K dataset to stimulate multilingual multimodal research. Recent advances in image description have been demonstrated on English-language datasets almost exclusively, but image description should not be limited to English. This dataset extends the Flickr30K dataset with i) German translations created by professional translators over a subset of the English descriptions, and ii) descriptions crowdsourced independently of the original English descriptions. We outline how the data can be used for multilingual image description and multimodal machine translation, but we anticipate the data will be useful for a broader range of tasks.

Submitted 2 May, 2016; originally announced May 2016.

arXiv:1510.04709 [pdf, ps, other]

Multilingual Image Description with Neural Sequence Models

Authors: Desmond Elliott, Stella Frank, Eva Hasler

Abstract: In this paper we present an approach to multi-language image description bringing together insights from neural machine translation and neural image description. To create a description of an image for a given target language, our sequence generation models condition on feature vectors from the image, the description from the source language, and/or a multimodal vector computed over the image and a description in the source language. In image description experiments on the IAPR-TC12 dataset of images aligned with English and German sentences, we find significant and substantial improvements in BLEU4 and Meteor scores for models trained over multiple languages, compared to a monolingual baseline.

Submitted 18 November, 2015; v1 submitted 15 October, 2015; originally announced October 2015.

Comments: Under review as a conference paper at ICLR 2016

arXiv:1509.04473 [pdf, other]

Splitting Compounds by Semantic Analogy

Authors: Joachim Daiber, Lautaro Quiroz, Roger Wechsler, Stella Frank

Abstract: Compounding is a highly productive word-formation process in some languages that is often problematic for natural language processing applications. In this paper, we investigate whether distributional semantics in the form of word embeddings can enable a deeper, i.e., more knowledge-rich, processing of compounds than the standard string-based methods. We present an unsupervised approach that exploits regularities in the semantic vector space (based on analogies such as "bookshop is to shop as bookshelf is to shelf") to produce compound analyses of high quality. A subsequent compound splitting algorithm based on these analyses is highly effective, particularly for ambiguous compounds. German to English machine translation experiments show that this semantic analogy-based compound splitter leads to better translations than a commonly used frequency-based method.

Submitted 15 September, 2015; originally announced September 2015.

Journal ref: Proceedings of the 1st Deep Machine Translation Workshop. Prague, Czech Republic. 2015

arXiv:1412.1285 [pdf, other]

The inductive theory of natural selection: summary and synthesis

Authors: Steven A. Frank

Abstract: The theory of natural selection has two forms. Deductive theory describes how populations change over time. One starts with an initial population and some rules for change. From those assumptions, one calculates the future state of the population. Deductive theory predicts how populations adapt to environmental challenge. Inductive theory describes the causes of change in populations. One starts with a given amount of change. One then assigns different parts of the total change to particular causes. Inductive theory analyzes alternative causal models for how populations have adapted to environmental challenge. This chapter emphasizes the inductive analysis of cause.

Submitted 12 November, 2016; v1 submitted 3 December, 2014; originally announced December 2014.

Comments: Version 2: Changed title. Noted that condensed and simplified version of this manuscript will be published as book chapter with original title "The inductive theory of natural selection." See footnote on title page of pdf

Showing 1–41 of 41 results for author: Frank, S