Search | arXiv e-print repository

Function Trees: Transparent Machine Learning

Abstract: The output of a machine learning algorithm can usually be represented by one or more multivariate functions of its input variables. Knowing the global properties of such functions can help in understanding the system that produced the data as well as interpreting and explaining corresponding model predictions. A method is presented for representing a general multivariate function as a tree of simpler functions. This tree exposes the global internal structure of the function by uncovering and describing the combined joint influences of subsets of its input variables. Given the inputs and corresponding function values, a function tree is constructed that can be used to rapidly identify and compute all of the function's main and interaction effects up to high order. Interaction effects involving up to four variables are graphically visualized.

Submitted 19 March, 2024; originally announced March 2024.
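The abstract above is about detecting interaction effects between input variables. As a rough, generic illustration of what an interaction is (this is not the function-tree construction, which the abstract does not spell out), the sketch below uses a mixed finite difference: if f splits as a term without x_j plus a term without x_k, the mixed difference is zero everywhere, so a nonzero value signals an x_j-x_k interaction. The names `mixed_difference` and `interaction_score`, the unit step sizes, and the averaging over sample rows are all illustrative assumptions.

```python
import numpy as np


def mixed_difference(f, x, j, k, hj=1.0, hk=1.0):
    """Second-order mixed finite difference of f in coordinates j and k at x.

    If f(x) = g(x without x_j) + h(x without x_k), this is zero for every
    x, hj and hk; a nonzero value signals an x_j-x_k interaction.
    """
    def f_shifted(dj, dk):
        z = np.array(x, dtype=float)  # copy so the caller's x is untouched
        z[j] += dj
        z[k] += dk
        return f(z)

    return f_shifted(hj, hk) - f_shifted(hj, 0.0) - f_shifted(0.0, hk) + f_shifted(0.0, 0.0)


def interaction_score(f, X, j, k, hj=1.0, hk=1.0):
    """Hypothetical score: mean absolute mixed difference over the rows of X."""
    return float(np.mean([abs(mixed_difference(f, x, j, k, hj, hk)) for x in X]))


# Usage: an additive function versus one with an x0*x1 interaction.
rng = np.random.default_rng(0)
X_demo = rng.uniform(-1, 1, size=(200, 3))
additive = lambda x: x[0] + x[1] ** 2 + np.sin(x[2])
coupled = lambda x: x[0] * x[1] + x[2]
print(interaction_score(additive, X_demo, 0, 1))  # zero up to rounding error
print(interaction_score(coupled, X_demo, 0, 1))   # approximately 1.0 with unit steps
```

For the product term, (a+1)(b+1) - (a+1)b - a(b+1) + ab = 1, which is why the second score is about one.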

arXiv:2207.05112 [pdf, other]

An Interpretable Joint Nonnegative Matrix Factorization-Based Point Cloud Distance Measure

Authors: Hannah Friedman, Amani R. Maina-Kilaas, Julianna Schalkwyk, Hina Ahmed, Jamie Haddock

Abstract: In this paper, we propose a new method for determining shared features of and measuring the distance between data sets or point clouds. Our approach uses the joint factorization of two data matrices $X_1,X_2$ into non-negative matrices $X_1 = AS_1, X_2 = AS_2$ to derive a similarity measure that determines how well the shared basis $A$ approximates $X_1, X_2$. We also propose a point cloud distance measure built upon this method and the learned factorization. Our method reveals structural differences in both image and text data. Potential applications include classification, detecting plagiarism or other manipulation, data denoising, and transfer learning.

Submitted 27 November, 2022; v1 submitted 11 July, 2022; originally announced July 2022.
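A shared basis $A$ with $X_1 = AS_1$ and $X_2 = AS_2$ is the same as an ordinary non-negative factorization of the column-stacked matrix $[X_1, X_2] = A[S_1, S_2]$. The sketch below fits that stacked problem with the standard Lee-Seung multiplicative updates; it is an assumption-laden illustration, not the authors' algorithm. In particular, `relative_error` is a hypothetical stand-in for "how well $A$ approximates $X_1, X_2$", since the abstract does not give the exact similarity or distance formula.

```python
import numpy as np


def joint_nmf(X1, X2, rank, n_iter=500, eps=1e-10, seed=0):
    """Fit X1 ≈ A @ S1 and X2 ≈ A @ S2 with a shared nonnegative basis A.

    Sharing A is equivalent to ordinary NMF of the column-stacked matrix
    [X1, X2] ≈ A @ [S1, S2]; this sketch fits it with Lee-Seung
    multiplicative updates for the squared Frobenius loss.
    """
    X = np.hstack([X1, X2])
    rng = np.random.default_rng(seed)
    m, n = X.shape
    A = rng.random((m, rank)) + eps
    S = rng.random((rank, n)) + eps
    for _ in range(n_iter):
        S *= (A.T @ X) / (A.T @ A @ S + eps)  # update coefficients
        A *= (X @ S.T) / (A @ S @ S.T + eps)  # update shared basis
    n1 = X1.shape[1]
    return A, S[:, :n1], S[:, n1:]


def relative_error(X, A, S):
    """Hypothetical fit score ||X - A S||_F / ||X||_F (not the paper's measure)."""
    return np.linalg.norm(X - A @ S) / np.linalg.norm(X)


# Usage: two data sets generated from the same rank-3 nonnegative basis.
rng = np.random.default_rng(42)
A_true = rng.random((30, 3))
X1 = A_true @ rng.random((3, 20))
X2 = A_true @ rng.random((3, 25))
A, S1, S2 = joint_nmf(X1, X2, rank=3, n_iter=2000)
print(relative_error(X1, A, S1), relative_error(X2, A, S2))
```

Stacking columns keeps the implementation to a single NMF loop; a comparison of how well one shared $A$ fits each data set could then be built on the per-matrix errors.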

arXiv:2107.07160 [pdf, other]

Lockout: Sparse Regularization of Neural Networks

Authors: Gilmer Valdes, Wilmer Arbelo, Yannet Interian, Jerome H. Friedman

Abstract: Many regression and classification procedures fit a parameterized function $f(x;w)$ of predictor variables $x$ to data $\{x_{i},y_{i}\}_1^N$ based on some loss criterion $L(y,f)$. Often, regularization is applied to improve accuracy by placing a constraint $P(w)\leq t$ on the values of the parameters $w$. Although efficient methods exist for finding solutions to these constrained optimization problems for all values of $t\geq0$ in the special case when $f$ is a linear function, none are available when $f$ is non-linear (e.g., Neural Networks). Here we present a fast algorithm that provides all such solutions for any differentiable function $f$ and loss $L$, and any constraint $P$ that is a monotonically increasing function of the absolute value of each parameter. Applications involving sparsity-inducing regularization of arbitrary Neural Networks are discussed. Empirical results indicate that these sparse solutions are usually superior to their dense counterparts in both accuracy and interpretability. This improvement in accuracy can often make Neural Networks competitive with, and sometimes superior to, state-of-the-art methods in the analysis of tabular data.

Submitted 15 July, 2021; originally announced July 2021.
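To make the constrained problem $P(w)\leq t$ concrete, here is a sketch of the linear special case the abstract mentions, with squared-error loss and $P(w)=\sum_j |w_j|$. It is not Lockout: it solves one problem per value on a user-chosen grid of $t$ by projected gradient descent, using the well-known sort-based Euclidean projection onto the $\ell_1$ ball. The function names, the grid of $t$ values, and the iteration count are illustrative assumptions.

```python
import numpy as np


def project_l1_ball(v, t):
    """Euclidean projection of v onto {w : sum(|w|) <= t} (sort-based method)."""
    if t <= 0:
        return np.zeros_like(v)
    a = np.abs(v)
    if a.sum() <= t:
        return v.copy()
    u = np.sort(a)[::-1]                      # magnitudes, largest first
    css = np.cumsum(u)
    k = np.arange(1, len(u) + 1)
    rho = np.nonzero(u * k > css - t)[0][-1]  # last sorted index that stays nonzero
    theta = (css[rho] - t) / (rho + 1.0)      # soft-threshold level
    return np.sign(v) * np.maximum(a - theta, 0.0)


def l1_constrained_path(X, y, ts, n_iter=2000):
    """Least-squares fits of y ≈ X @ w subject to sum(|w|) <= t, one per t in ts.

    Projected gradient descent on (1/2n)||y - Xw||^2, warm-started from the
    previous t, with step 1/L where L = ||X||_2^2 / n.
    """
    n, p = X.shape
    step = n / np.linalg.norm(X, 2) ** 2
    w = np.zeros(p)
    path = []
    for t in ts:
        for _ in range(n_iter):
            grad = X.T @ (X @ w - y) / n
            w = project_l1_ball(w - step * grad, t)
        path.append(w.copy())
    return np.array(path)


# Usage: a sparse linear truth recovered once t is large enough.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
w_true = np.array([3.0, 0.0, 0.0, -2.0, 0.0])
y = X @ w_true
ts = [0.0, 1.0, 2.5, 10.0]
path = l1_constrained_path(X, y, ts)
print(np.round(path, 2))  # one row of coefficients per value of t
```

Small $t$ forces many coefficients to exactly zero, which is the sparsity effect the abstract refers to; the abstract's contribution is obtaining such solutions for nonlinear $f$, which this sketch does not attempt.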

arXiv:2001.10102 [pdf, ps, other]

Predicting Regression Probability Distributions with Imperfect Data Through Optimal Transformations

Authors: Jerome H. Friedman

Abstract: The goal of regression analysis is to predict the value of a numeric outcome variable y given a vector of joint values of other (predictor) variables x. Usually a particular x-vector does not specify a repeatable value for y, but rather a probability distribution of possible y-values, p(y|x). This distribution has a location, scale and shape, all of which can depend on x, and are needed to infer likely values for y given x. Regression methods usually assume that training data y-values are perfect numeric realizations from some well-behaved p(y|x). Often actual training data y-values are discrete, truncated and/or arbitrarily censored. Regression procedures based on an optimal transformation strategy are presented for estimating location, scale and shape of p(y|x) as general functions of x, in the possible presence of such imperfect training data. In addition, validation diagnostics are presented to ascertain the quality of the solutions.

Submitted 27 January, 2020; originally announced January 2020.

Comments: 33 pages, 19 figures

arXiv:1912.03785 [pdf, ps, other]

doi 10.1073/pnas.1921562117

Contrast Trees and Distribution Boosting

Authors: Jerome H. Friedman

Abstract: Often machine learning methods are applied and results reported in cases where there is little to no information concerning accuracy of the output. Simply because a computer program returns a result does not ensure its validity. If decisions are to be made based on such results it is important to have some notion of their veracity. Contrast trees represent a new approach for assessing the accuracy of many types of machine learning estimates that are not amenable to standard (cross) validation methods. In situations where inaccuracies are detected, boosted contrast trees can often improve performance. A special case, distribution boosting, provides an assumption free method for estimating the full probability distribution of an outcome variable given any set of joint input predictor variable values.

Submitted 8 December, 2019; originally announced December 2019.

Comments: 18 pages, 20 figures
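The core idea of a contrast tree, as described above, is to search the predictor space for regions where two quantities (for example observed outcomes y and model predictions z) disagree. The sketch below performs only a single split with a hypothetical, simplified criterion, the largest $|\overline{y} - \overline{z}|$ within either child; a real contrast tree splits recursively and the paper may use different discrepancy measures and split rules. The function name `best_contrast_split` and the `min_size` guard are illustrative assumptions.

```python
import numpy as np


def best_contrast_split(X, y, z, min_size=20):
    """One-level contrast split: find the feature, threshold and side whose
    child region has the largest |mean(y) - mean(z)|.

    Hypothetical simplified criterion; a full contrast tree would split
    recursively and may use other discrepancy measures.
    """
    n, p = X.shape
    r = y - z                                   # pointwise difference of the two quantities
    best = (0.0, None, None, None)              # (discrepancy, feature, threshold, side)
    for j in range(p):
        order = np.argsort(X[:, j], kind="stable")
        xs, rs = X[order, j], r[order]
        csum = np.cumsum(rs)
        total = csum[-1]
        for i in range(min_size, n - min_size + 1):  # left child = first i sorted points
            if xs[i - 1] == xs[i]:
                continue                        # no threshold separates tied values
            thr = 0.5 * (xs[i - 1] + xs[i])
            left = abs(csum[i - 1] / i)
            right = abs((total - csum[i - 1]) / (n - i))
            for disc, side in ((left, "<="), (right, ">")):
                if disc > best[0]:
                    best = (disc, j, thr, side)
    return best


# Usage: a model that predicts 0 everywhere misses a jump at x0 = 0.7.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(500, 2))
z = np.zeros(500)                   # model predictions
y = (X[:, 0] > 0.7).astype(float)   # observed outcomes
print(best_contrast_split(X, y, z))
```

The split isolates the region where the predictions are systematically wrong; in the abstract's terms, such a region is where boosting with contrast trees would then try to correct the estimates.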

arXiv:1903.09731 [pdf]

doi 10.1073/pnas.1906831117

Expert-Augmented Machine Learning

Authors: E. D. Gennatas, J. H. Friedman, L. H. Ungar, R. Pirracchio, E. Eaton, L. Reichman, Y. Interian, C. B. Simone, A. Auerbach, E. Delgado, M. J. Van der Laan, T. D. Solberg, G. Valdes

Abstract: Machine Learning is proving invaluable across disciplines. However, its success is often limited by the quality and quantity of available data, and its adoption by the level of trust that models afford users. Human vs. machine performance is commonly compared empirically to decide whether a certain task should be performed by a computer or an expert. In reality, the optimal learning strategy may involve combining the complementary strengths of man and machine. Here we present Expert-Augmented Machine Learning (EAML), an automated method that guides the extraction of expert knowledge and its integration into machine-learned models. We use a large dataset of intensive care patient data to predict mortality and show that we can extract expert knowledge using an online platform, help reveal hidden confounders, improve generalizability on a different population and learn using less data. EAML presents a novel framework for high performance and dependable machine learning in critical applications.

Submitted 5 January, 2021; v1 submitted 22 March, 2019; originally announced March 2019.

arXiv:0805.1386 [pdf, ps, other]

A language for mathematical knowledge management

Authors: Steven Kieffer, Jeremy Avigad, Harvey Friedman

Abstract: We argue that the language of Zermelo–Fraenkel set theory with definitions and partial functions provides the most promising bedrock semantics for communicating and sharing mathematical knowledge. We then describe a syntactic sugaring of that language that provides a way of writing remarkably readable assertions without straying far from the set-theoretic semantics. We illustrate with some examples of formalized textbook definitions from elementary set theory and point-set topology. We also present statistics concerning the complexity of these definitions, under various complexity measures.

Submitted 3 January, 2011; v1 submitted 9 May, 2008; originally announced May 2008.

ACM Class: F.4.1; I.2.4

Journal ref: Studies in Logic, Grammar and Rhetoric, 18:51-66, 2009

arXiv:cs/0601134 [pdf, ps, other]

doi 10.2168/LMCS-2(4:4)2006

Combining decision procedures for the reals

Authors: Jeremy Avigad, Harvey Friedman

Abstract: We address the general problem of determining the validity of Boolean combinations of equalities and inequalities between real-valued expressions. In particular, we consider methods of establishing such assertions using only restricted forms of distributivity. At the same time, we explore ways in which "local" decision or heuristic procedures for fragments of the theory of the reals can be amalgamated into global ones. Let $T_{\mathrm{add}}[\mathbb{Q}]$ be the first-order theory of the real numbers in the language of ordered groups, with negation, a constant 1, and function symbols for multiplication by rational constants. Let $T_{\mathrm{mult}}[\mathbb{Q}]$ be the analogous theory for the multiplicative structure, and let $T[\mathbb{Q}]$ be the union of the two. We show that although $T[\mathbb{Q}]$ is undecidable, the universal fragment of $T[\mathbb{Q}]$ is decidable. We also show that terms of $T[\mathbb{Q}]$ can fruitfully be put in a normal form. We prove analogous results for theories in which $\mathbb{Q}$ is replaced, more generally, by suitable subfields $F$ of the reals. Finally, we consider practical methods of establishing quantifier-free validities that approximate our (impractical) decidability results.

Submitted 18 October, 2006; v1 submitted 31 January, 2006; originally announced January 2006.

Comments: Will appear in Logical Methods in Computer Science

ACM Class: F.4.1; I.2.3

Journal ref: Logical Methods in Computer Science, Volume 2, Issue 4 (October 18, 2006) lmcs:2240

Showing 1–8 of 8 results for author: Friedman, H