Search | arXiv e-print repository

Mechanistic interpretability of large language models with applications to the financial services industry

Authors: Ashkan Golgoon, Khashayar Filom, Arjun Ravi Kannan

Abstract: Large Language Models such as GPTs (Generative Pre-trained Transformers) exhibit remarkable capabilities across a broad spectrum of applications. Nevertheless, due to their intrinsic complexity, these models present substantial challenges in interpreting their internal decision-making processes. This lack of transparency poses critical challenges for their adoption by financial institutions, where concerns and accountability regarding bias, fairness, and reliability are of paramount importance. Mechanistic interpretability aims to reverse engineer complex AI models such as transformers. In this paper, we pioneer the use of mechanistic interpretability to shed some light on the inner workings of large language models for use in financial services applications. We offer several examples of how algorithmic tasks can be designed for compliance monitoring purposes. In particular, we investigate GPT-2 Small's attention patterns when it is prompted to identify potential violations of Fair Lending laws. Using direct logit attribution, we study the contributions of each layer and its corresponding attention heads to the logit difference in the residual stream. Finally, we design clean and corrupted prompts and use activation patching as a causal intervention method to further localize the components responsible for task completion. We observe that the (positive) heads $10.2$ (head $2$, layer $10$), $10.7$, and $11.3$, as well as the (negative) heads $9.6$ and $10.6$, play a significant role in task completion.

Submitted 15 July, 2024; originally announced July 2024.

MSC Class: 68T01 ACM Class: I.2.7
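Direct logit attribution, as used in the abstract above, rests on the residual stream being a sum of component outputs, so the logit difference can be split across layers and heads. The following minimal Python sketch illustrates that decomposition on a toy linear model. It assumes no final LayerNorm (or a frozen LayerNorm scale), and the component names, vectors, and unembedding matrix are hypothetical values chosen for illustration, not results from the paper.

```python
# Toy direct logit attribution (DLA) on a linear residual stream.
# Assumption: no final LayerNorm, so logits are a linear function of the
# residual stream and the logit difference splits exactly across components.
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]  # d_model rows x vocab columns


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def unembed(residual: Vector, w_u: Matrix) -> Vector:
    """Logits = residual @ W_U."""
    vocab = len(w_u[0])
    return [sum(residual[i] * w_u[i][t] for i in range(len(residual))) for t in range(vocab)]


def logit_diff_direction(w_u: Matrix, correct: int, wrong: int) -> Vector:
    """The residual-space direction W_U[:, correct] - W_U[:, wrong]."""
    return [row[correct] - row[wrong] for row in w_u]


def direct_logit_attribution(components: Dict[str, Vector], direction: Vector) -> Dict[str, float]:
    """Project each component's write to the residual stream onto the direction."""
    return {name: dot(vec, direction) for name, vec in components.items()}


# Hypothetical toy values: d_model = 3, two candidate answer tokens.
W_U: Matrix = [[1.0, -1.0], [0.5, 0.5], [0.0, 2.0]]
components = {
    "embed": [0.1, 0.3, 0.0],
    "L9H6": [-0.5, 0.2, 0.4],    # illustrative "negative" head
    "L10H2": [1.0, -0.1, -0.2],  # illustrative "positive" head
}

residual = [sum(vals) for vals in zip(*components.values())]
logits = unembed(residual, W_U)
total_diff = logits[0] - logits[1]
attributions = direct_logit_attribution(components, logit_diff_direction(W_U, 0, 1))

for name, value in attributions.items():
    print(f"{name}: {value:+.2f}")
print(f"sum = {sum(attributions.values()):+.2f}, logit diff = {total_diff:+.2f}")
```

In this toy setting the head labelled L10H2 pushes toward the first token, L9H6 pushes away from it, and the per-component contributions add up to the logit difference. In a real transformer the final LayerNorm makes the split only approximate, which is one reason a causal check such as activation patching is useful alongside it.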

arXiv:2303.10216

Approximation of group explainers with coalition structure using Monte Carlo sampling on the product space of coalitions and features

Authors: Konstandinos Kotsiopoulos, Alexey Miroshnikov, Khashayar Filom, Arjun Ravi Kannan

Abstract: In recent years, many Machine Learning (ML) explanation techniques have been designed using ideas from cooperative game theory. These game-theoretic explainers suffer from high complexity, hindering their exact computation in practical settings. In our work, we focus on a wide class of linear game values, as well as coalitional values, for the marginal game based on a given ML model and predictor vector. By viewing these explainers as expectations over appropriate sample spaces, we design a novel Monte Carlo sampling algorithm that estimates them at a reduced complexity that depends linearly on the size of the background dataset. We set up a rigorous framework for the statistical analysis and obtain error bounds for our sampling methods. The advantage of this approach is that it is fast, easily implementable, and model-agnostic. Furthermore, its statistical accuracy is similar to that of other known estimation techniques that are more complex and model-specific. We provide rigorous proofs of statistical convergence, as well as numerical experiments whose results agree with our theoretical findings.

Submitted 18 April, 2024; v1 submitted 17 March, 2023; originally announced March 2023.

Comments: 31 pages, 6 figures
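To make "explainers as expectations over sample spaces" concrete, here is a minimal Python sketch of a generic Monte Carlo estimator for marginal (interventional) Shapley values: it jointly samples a random feature ordering and a random background row, then averages marginal contributions. This is a standard permutation-style estimator shown for illustration under that assumption; it is not the algorithm from the paper, and the linear model and background data are hypothetical. An exact brute-force version is included for comparison.

```python
# Monte Carlo estimate of marginal (interventional) Shapley values.
# Assumption: a generic permutation-and-background sampler for illustration;
# it is not the estimator proposed in the paper.
import itertools
import math
import random
from typing import Callable, List, Sequence, Set

Model = Callable[[Sequence[float]], float]


def hybrid(x: Sequence[float], b: Sequence[float], coalition: Set[int]) -> List[float]:
    """Features in `coalition` come from x; all others from background row b."""
    return [x[j] if j in coalition else b[j] for j in range(len(x))]


def marginal_game(f: Model, x, background, coalition: Set[int]) -> float:
    """v(S) = E_b[f(x_S, b_{-S})], averaged over the background dataset."""
    return sum(f(hybrid(x, b, coalition)) for b in background) / len(background)


def exact_marginal_shapley(f: Model, x, background) -> List[float]:
    """Exact Shapley values of the marginal game (exponential in len(x))."""
    d = len(x)
    phi = [0.0] * d
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for r in range(d):
            weight = math.factorial(r) * math.factorial(d - r - 1) / math.factorial(d)
            for s in itertools.combinations(others, r):
                coal = set(s)
                phi[i] += weight * (marginal_game(f, x, background, coal | {i})
                                    - marginal_game(f, x, background, coal))
    return phi


def mc_marginal_shapley(f: Model, x, background, n_samples: int, seed: int = 0) -> List[float]:
    """Sample (ordering, background row) pairs jointly; average marginal contributions."""
    rng = random.Random(seed)
    d = len(x)
    phi = [0.0] * d
    for _ in range(n_samples):
        order = list(range(d))
        rng.shuffle(order)
        b = rng.choice(background)
        coal: Set[int] = set()
        prev = f(hybrid(x, b, coal))
        for i in order:
            coal.add(i)
            cur = f(hybrid(x, b, coal))
            phi[i] += cur - prev
            prev = cur
    return [p / n_samples for p in phi]


def f(z: Sequence[float]) -> float:
    """Hypothetical linear model; exact answer is w_i * (x_i - mean(b_i))."""
    return 2.0 * z[0] - 1.0 * z[1] + 0.5 * z[2]


x = [1.0, 2.0, 3.0]
background = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [2.0, 0.0, 1.0], [3.0, 1.0, 2.0]]

exact = exact_marginal_shapley(f, x, background)
approx = mc_marginal_shapley(f, x, background, n_samples=5000)
print("exact:", [round(v, 3) for v in exact])
print("monte carlo:", [round(v, 3) for v in approx])
```

Each sample costs one model call per feature plus one, regardless of how many coalitions exist, which is what makes sampling attractive compared with the exponential exact sum.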

arXiv:2302.08434

doi 10.3934/fods.2024021

On marginal feature attributions of tree-based models

Authors: Khashayar Filom, Alexey Miroshnikov, Konstandinos Kotsiopoulos, Arjun Ravi Kannan

Abstract: Due to their power and ease of use, tree-based machine learning models, such as random forests and gradient-boosted tree ensembles, have become very popular. To interpret them, local feature attributions based on marginal expectations, e.g., marginal (interventional) Shapley, Owen or Banzhaf values, may be employed. Such methods are true to the model and implementation invariant, i.e., dependent only on the input-output function of the model. We contrast this with the popular TreeSHAP algorithm by presenting two (statistically similar) decision trees that compute the exact same function, for which the "path-dependent" TreeSHAP yields different rankings of features, whereas the marginal Shapley values coincide. Furthermore, we discuss how the internal structure of tree-based models may be leveraged to help with computing their marginal feature attributions according to a linear game value. One important observation is that these are simple (piecewise-constant) functions with respect to a certain grid partition of the input space determined by the trained model. Another crucial observation, showcased by experiments with the XGBoost, LightGBM and CatBoost libraries, is that each tree in the ensemble uses only a portion of all features. Thus, the complexity of computing marginal Shapley (or Owen or Banzhaf) feature attributions may be reduced. This remains valid for a broader class of game values, which we characterize axiomatically.
A prime example is the case of CatBoost models where the trees are oblivious (symmetric) and the number of features in each of them is no larger than the depth. We exploit the symmetry to derive an explicit formula, with improved complexity and only in terms of the internal model parameters, for marginal Shapley (and Banzhaf and Owen) values of CatBoost models. This results in a fast, accurate algorithm for estimating these feature attributions.

Submitted 5 May, 2024; v1 submitted 16 February, 2023; originally announced February 2023.

Comments: Minor corrections. 30 pages+appendix (64 pages in total), 10 figures. To appear in Foundations of Data Science

MSC Class: Primary: 68T01; 91A12; 91A80; 05A19; Secondary: 91A68; 91A06; 05C05
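Two observations from the abstract above can be checked on a small example: an oblivious tree reads at most depth-many features, so features it never splits on are null players with zero attribution, and linear game values let an ensemble's attributions be computed tree by tree. The Python sketch below uses plain brute force over the features each tree uses; the tree encoding, thresholds, leaf values, and data are hypothetical, and this is not the explicit CatBoost formula derived in the paper.

```python
# Brute-force marginal Shapley values for an ensemble of oblivious (symmetric) trees.
# Not the paper's explicit formula: it only exploits (1) per-tree feature sparsity
# and (2) linearity of the Shapley value across the trees of an ensemble.
import itertools
import math
from typing import List, Sequence, Set, Tuple

Split = Tuple[int, float]  # (feature index, threshold); bit is 1 if x[feature] > threshold


class ObliviousTree:
    """Depth-d tree in which every node on level k uses the same split."""

    def __init__(self, splits: List[Split], leaf_values: List[float]):
        assert len(leaf_values) == 2 ** len(splits)
        self.splits = splits
        self.leaf_values = leaf_values

    def __call__(self, x: Sequence[float]) -> float:
        idx = 0
        for feature, threshold in self.splits:
            idx = 2 * idx + (1 if x[feature] > threshold else 0)
        return self.leaf_values[idx]

    def features(self) -> List[int]:
        return sorted({feature for feature, _ in self.splits})


def tree_marginal_shapley(tree: ObliviousTree, x, background, n_features: int) -> List[float]:
    used = tree.features()  # at most depth-many distinct features
    k = len(used)
    phi = [0.0] * n_features  # features the tree never reads stay at 0

    def v(coal: Set[int]) -> float:
        rows = ([x[j] if j in coal else b[j] for j in range(n_features)] for b in background)
        return sum(tree(z) for z in rows) / len(background)

    for i in used:
        others = [j for j in used if j != i]
        for r in range(k):
            weight = math.factorial(r) * math.factorial(k - r - 1) / math.factorial(k)
            for s in itertools.combinations(others, r):
                phi[i] += weight * (v(set(s) | {i}) - v(set(s)))
    return phi


def ensemble_marginal_shapley(trees, x, background, n_features: int) -> List[float]:
    per_tree = [tree_marginal_shapley(t, x, background, n_features) for t in trees]
    return [sum(col) for col in zip(*per_tree)]


# Hypothetical two-tree ensemble over 4 features; feature 3 is never used.
trees = [
    ObliviousTree([(0, 0.5), (1, 0.5)], [0.0, 1.0, 2.0, 3.0]),
    ObliviousTree([(2, 0.5), (0, 0.5)], [0.0, -1.0, 0.5, 0.25]),
]
x = [1.0, 0.0, 1.0, 7.0]
background = [[0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 0.0, 5.0], [0.0, 1.0, 1.0, 2.0]]
phi = ensemble_marginal_shapley(trees, x, background, n_features=4)
print([round(p, 4) for p in phi])
```

The per-tree cost grows with 2 to the number of features in that tree rather than 2 to the total number of features, which is the kind of saving the abstract describes; the paper's closed-form treatment of oblivious trees goes further than this brute force.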

arXiv:2102.10878

Stability theory of game-theoretic group feature explanations for machine learning models

Authors: Alexey Miroshnikov, Konstandinos Kotsiopoulos, Khashayar Filom, Arjun Ravi Kannan

Abstract: In this article, we study feature attributions of Machine Learning (ML) models originating from linear game values and coalitional values defined as operators on appropriate functional spaces. The main focus is on random games based on the conditional and marginal expectations. The first part of our work formulates a stability theory for these explanation operators by establishing certain bounds for both marginal and conditional explanations. The differences between the two games are then elucidated; for example, we show that the marginal explanations can become discontinuous on some naturally designed domains, while the conditional explanations remain stable. In the second part of our work, group explanation methodologies are devised based on game values with coalition structure, where the features are grouped based on dependencies. We show analytically that grouping features this way has a stabilizing effect on the marginal operator on both group and individual levels, and allows for the unification of marginal and conditional explanations. Our results are verified in a number of numerical experiments where an information-theoretic measure of dependence is used for grouping.

Submitted 10 August, 2024; v1 submitted 22 February, 2021; originally announced February 2021.

Comments: 82 pages, 43 figures. Typos fixed. Some technical results have been improved

MSC Class: 91A06; 91A12; 91A80; 46N30; 46N99; 68T01
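The Owen value is a standard example of a game value with coalition structure: an outer Shapley step treats whole groups as players, and an inner Shapley step splits each group's share among its members. The Python sketch below computes it by brute force for a generic cooperative game. The game `v` and the partitions are hypothetical; in the paper such values are applied to marginal and conditional games of an ML model, with features grouped by dependence, which this sketch does not attempt.

```python
# Brute-force Owen values: a game value with coalition structure.
# Assumption: `v` is any cooperative game on players 0..n-1, given as a
# function of a frozenset of players.
import itertools
import math
from typing import Callable, FrozenSet, Iterable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
Game = Callable[[FrozenSet[int]], float]


def subsets(items: Sequence[T]) -> Iterable[Tuple[T, ...]]:
    for r in range(len(items) + 1):
        yield from itertools.combinations(items, r)


def shapley_weight(size: int, total: int) -> float:
    """|S|! (n - |S| - 1)! / n! for a coalition S not containing the player."""
    return math.factorial(size) * math.factorial(total - size - 1) / math.factorial(total)


def owen_values(v: Game, partition: List[List[int]]) -> List[float]:
    m = len(partition)
    n = sum(len(group) for group in partition)
    phi = [0.0] * n
    for k, group in enumerate(partition):
        other_groups = [g for j, g in enumerate(partition) if j != k]
        for i in group:
            teammates = [j for j in group if j != i]
            # Outer Shapley step over whole groups, inner step within i's own group.
            for outside in subsets(other_groups):
                base = frozenset(p for g in outside for p in g)
                w_out = shapley_weight(len(outside), m)
                for inside in subsets(teammates):
                    coal = base | frozenset(inside)
                    w_in = shapley_weight(len(inside), len(group))
                    phi[i] += w_out * w_in * (v(coal | {i}) - v(coal))
    return phi


def v(s: FrozenSet[int]) -> float:
    """Hypothetical 4-player game with interactions between players 0-1 and 0-3."""
    return 1.0 * ({0, 1} <= s) + 0.5 * (2 in s) + 1.0 * (0 in s and 3 in s)


grouped = owen_values(v, [[0, 1], [2, 3]])
print([round(p, 4) for p in grouped])
```

With every player in its own group, or all players in a single group, the formula reduces to the ordinary Shapley value; intermediate partitions are where the coalition structure changes the attributions.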

arXiv:2005.08859

PDE constraints on smooth hierarchical functions computed by neural networks

Authors: Khashayar Filom, Konrad Paul Kording, Roozbeh Farhoodi

Abstract: Neural networks are versatile tools for computation, having the ability to approximate a broad range of functions. An important problem in the theory of deep neural networks is expressivity; that is, we want to understand the functions that are computable by a given network. We study real infinitely differentiable (smooth) hierarchical functions implemented by feedforward neural networks via composing simpler functions in two cases: 1) each constituent function of the composition has fewer inputs than the resulting function; 2) constituent functions are in the more specific yet prevalent form of a non-linear univariate function (e.g., tanh) applied to a linear multivariate function. We establish that in each of these regimes there exist non-trivial algebraic partial differential equations (PDEs) that are satisfied by the computed functions. These PDEs are purely in terms of the partial derivatives and depend only on the topology of the network. For compositions of polynomial functions, the algebraic PDEs yield non-trivial equations (of degrees dependent only on the architecture) in the ambient polynomial space that are satisfied on the associated functional varieties. Conversely, we conjecture that such PDE constraints, once accompanied by appropriate non-singularity conditions and perhaps certain inequalities involving partial derivatives, guarantee that the smooth function under consideration can be represented by the network.
The conjecture is verified in numerous examples, including the case of tree architectures, which are of neuroscientific interest. Our approach is a step toward formulating an algebraic description of functional spaces associated with specific neural networks, and may provide new, useful tools for constructing neural networks.

Submitted 13 August, 2021; v1 submitted 18 May, 2020; originally announced May 2020.

Comments: Minor changes, typos corrected. 52 pages, 17 figures
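As a worked illustration of the kind of PDE constraint described in the abstract above (a simple standard case, not necessarily stated this way in the paper), consider a smooth function computed by the tree architecture $f(x, y, z) = g(h(x, y), z)$, where $g_1$ denotes the partial derivative of $g$ with respect to its first argument, and assume $f_y \neq 0$ on the region of interest.

```latex
\begin{aligned}
f_x &= g_1\big(h(x,y),z\big)\,h_x(x,y), \qquad f_y = g_1\big(h(x,y),z\big)\,h_y(x,y),\\
\frac{f_x}{f_y} &= \frac{h_x(x,y)}{h_y(x,y)} \quad\text{does not depend on } z,\\
0 = \frac{\partial}{\partial z}\left(\frac{f_x}{f_y}\right) &= \frac{f_{xz}\,f_y - f_x\,f_{yz}}{f_y^{2}}
\quad\Longrightarrow\quad f_{xz}\,f_y - f_x\,f_{yz} = 0 .
\end{aligned}
```

The resulting equation involves only partial derivatives of $f$ and is determined by the wiring of the tree, not by the particular choice of $g$ and $h$. In the same spirit, for a unit of the form $f(\mathbf{x}) = \sigma(\mathbf{w}\cdot\mathbf{x})$ the ratio $f_{x_i}/f_{x_j} = w_i/w_j$ is constant, which gives $f_{x_i x_k} f_{x_j} - f_{x_i} f_{x_j x_k} = 0$ for every $k$.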

arXiv:1904.02309

doi 10.1162/neco_a_01231

On functions computed on trees

Authors: Roozbeh Farhoodi, Khashayar Filom, Ilenna Simone Jones, Konrad Paul Kording

Abstract: Any function can be constructed using a hierarchy of simpler functions through compositions. Such a hierarchy can be characterized by a binary rooted tree. Each node of this tree is associated with a function which takes as inputs two numbers from its children and produces one output. Since thinking about functions in terms of computation graphs is becoming popular, we may want to know which functions can be implemented on a given tree. Here, we describe a set of necessary constraints in the form of a system of non-linear partial differential equations that must be satisfied. Moreover, we prove that these conditions are sufficient in the contexts of both analytic and bit-valued functions. In the latter case, we explicitly enumerate discrete functions and observe that there are relatively few. Our point of view allows us to compare different neural network architectures with regard to their function spaces. Our work connects the structure of computation graphs with the functions they can implement and has potential applications to neuroscience and computer science.

Submitted 22 October, 2019; v1 submitted 3 April, 2019; originally announced April 2019.

Comments: 52 pages, 10 figures. The final version. To appear in Neural Computation. May vary slightly from published version

Journal ref: Neural Computation 31 (2019), no. 11, 2075--2137
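The observation that relatively few bit-valued functions fit a given tree can be seen on the smallest interesting case. The Python sketch below enumerates every Boolean function of the form $f(x_1, x_2, x_3) = g(h(x_1, x_2), x_3)$, with $g$ and $h$ ranging over all 16 two-input gates, and compares the count with all Boolean functions of three inputs. The choice of tree and the test functions (parity, majority) are illustrative assumptions, not examples taken from the paper.

```python
# Enumerate the Boolean functions computable on the binary tree ((x1, x2), x3),
# i.e. f(x1, x2, x3) = g(h(x1, x2), x3) with arbitrary 2-input gates g and h.
import itertools
from typing import Callable, Dict, List, Set, Tuple

Gate = Dict[Tuple[int, int], int]
TruthTable = Tuple[int, ...]
INPUTS: List[Tuple[int, int, int]] = list(itertools.product((0, 1), repeat=3))


def all_gates() -> List[Gate]:
    """All 16 Boolean functions {0,1}^2 -> {0,1} as lookup tables."""
    pairs = list(itertools.product((0, 1), repeat=2))
    return [dict(zip(pairs, outputs)) for outputs in itertools.product((0, 1), repeat=4)]


def tree_functions() -> Set[TruthTable]:
    """Distinct truth tables produced by every (h, g) pair of gates."""
    gates = all_gates()
    return {
        tuple(g[(h[(x1, x2)], x3)] for x1, x2, x3 in INPUTS)
        for h in gates
        for g in gates
    }


def truth_table(fn: Callable[[int, int, int], int]) -> TruthTable:
    return tuple(fn(*bits) for bits in INPUTS)


tables = tree_functions()
print(f"{len(tables)} of {2 ** len(INPUTS)} Boolean functions of 3 inputs fit this tree")
parity = truth_table(lambda a, b, c: a ^ b ^ c)
majority = truth_table(lambda a, b, c: int(a + b + c >= 2))
print("parity on tree:", parity in tables)  # realized with h = g = XOR
print("majority on tree:", majority in tables)
```

Majority is excluded because its two slices ($x_3 = 0$ gives AND, $x_3 = 1$ gives OR) cannot both be functions of a single intermediate bit $h(x_1, x_2)$, while parity fits with XOR at both nodes.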

Showing 1–6 of 6 results for author: Filom, K