-
An energy-based comparative analysis of common approaches to text classification in the Legal domain
Authors:
Sinan Gultekin,
Achille Globo,
Andrea Zugarini,
Marco Ernandes,
Leonardo Rigutini
Abstract:
Most Machine Learning research identifies the best solutions in terms of performance. However, in the race for the best-performing model, many important aspects that should be carefully considered are often overlooked. In fact, the performance gaps between different approaches are sometimes negligible, whereas factors such as production costs, energy consumption, and carbon footprint must be taken into consideration. Large Language Models (LLMs) are extensively adopted to address NLP problems in academia and industry. In this work, we present a detailed quantitative comparison of LLMs and traditional approaches (e.g., SVM) on the LexGLUE benchmark, which takes into account both performance (standard indices) and alternative metrics such as timing, power consumption, and cost; in a word, the carbon footprint. In our analysis, we considered the prototyping phase (model selection through training-validation-test iterations) and the in-production phase separately, since they follow different implementation procedures and require different resources. The results indicate that, very often, the simplest algorithms achieve performance very close to that of large LLMs, but with very low power consumption and lower resource demands. These results could encourage companies to include additional evaluations when choosing Machine Learning (ML) solutions.
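At its simplest, the carbon-footprint accounting described above comes down to power × time × grid carbon intensity. The following is a minimal sketch that assumes a known average power draw per job and a fixed grid intensity. The PUE (data-centre overhead) factor, the model names, and all numbers are hypothetical and are not taken from the paper.

```python
def energy_kwh(avg_power_watts: float, hours: float, pue: float = 1.0) -> float:
    """Energy of a job: average power x duration, scaled by an assumed PUE overhead."""
    return avg_power_watts * hours * pue / 1000.0


def carbon_grams(kwh: float, grams_co2_per_kwh: float) -> float:
    """Emissions: energy consumed x carbon intensity of the electricity grid."""
    return kwh * grams_co2_per_kwh


# Hypothetical jobs (average watts, hours); not measurements from the paper.
runs = {
    "svm_cpu": (65.0, 0.5),
    "llm_gpu": (300.0, 2.0),
}
GRID_INTENSITY = 400.0  # gCO2/kWh, assumed

for name, (watts, hours) in runs.items():
    kwh = energy_kwh(watts, hours)
    print(f"{name}: {kwh:.4f} kWh, {carbon_grams(kwh, GRID_INTENSITY):.1f} gCO2")
```

The prototyping and production phases mentioned in the abstract could each be accounted for this way by summing over the jobs in that phase.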
Submitted 5 February, 2024; v1 submitted 2 November, 2023;
originally announced November 2023.
-
DENTEX: An Abnormal Tooth Detection with Dental Enumeration and Diagnosis Benchmark for Panoramic X-rays
Authors:
Ibrahim Ethem Hamamci,
Sezgin Er,
Enis Simsar,
Atif Emre Yuksel,
Sadullah Gultekin,
Serife Damla Ozdemir,
Kaiyuan Yang,
Hongwei Bran Li,
Sarthak Pati,
Bernd Stadlinger,
Albert Mehl,
Mustafa Gundogar,
Bjoern Menze
Abstract:
Panoramic X-rays are frequently used in dentistry for treatment planning, but their interpretation can be both time-consuming and prone to error. Artificial intelligence (AI) has the potential to aid in the analysis of these X-rays, thereby improving the accuracy of dental diagnoses and treatment plans. Nevertheless, designing automated algorithms for this purpose poses significant challenges, mainly due to the scarcity of annotated data and variations in anatomical structure. To address these issues, the Dental Enumeration and Diagnosis on Panoramic X-rays Challenge (DENTEX) was organized in association with the International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI) in 2023. This challenge aims to promote the development of algorithms for multi-label detection of abnormal teeth, using three types of hierarchically annotated data: partially annotated quadrant data, partially annotated quadrant-enumeration data, and fully annotated quadrant-enumeration-diagnosis data covering four different diagnoses. In this paper, we present the results of evaluating participant algorithms on the fully annotated data, and we additionally investigate how performance varies across quadrant, enumeration, and diagnosis labels in the detection of abnormal teeth. The provision of this annotated dataset, alongside the results of this challenge, may lay the groundwork for AI-powered tools that offer more precise and efficient diagnosis and treatment planning in dentistry. The evaluation code and datasets can be accessed at https://github.com/ibrahimethemhamamci/DENTEX
Submitted 30 May, 2023;
originally announced May 2023.
-
Nonlinear Kalman Filtering with Reparametrization Gradients
Authors:
San Gultekin,
Brendan Kitts,
Aaron Flores,
John Paisley
Abstract:
We introduce a novel nonlinear Kalman filter that utilizes reparametrization gradients. The widely used parametric approximation is based on a jointly Gaussian assumption of the state-space model, which is in turn equivalent to minimizing an approximation to the Kullback-Leibler divergence. It is possible to obtain better approximations using the alpha divergence, but the resulting problem is substantially more complex. In this paper, we introduce an alternate formulation based on an energy function, which can be optimized instead of the alpha divergence. The optimization can be carried out using reparametrization gradients, a technique that has recently been utilized in a number of deep learning models.
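The reparametrization gradient mentioned above estimates gradients of an expectation under a Gaussian by writing the sample as a deterministic function of the parameters plus independent noise. The sketch below applies this technique to a toy objective f(x) = x², for which the exact gradients are known. It is not the paper's energy function. The function and parameter names are hypothetical.

```python
import random


def reparam_grad(df, mean, std, n_samples=200_000, seed=0):
    """Monte Carlo estimate of d/d(mean) and d/d(std) of E_{x ~ N(mean, std^2)}[f(x)].

    Reparametrization: x = mean + std * eps with eps ~ N(0, 1), so
    d f(x) / d mean = f'(x) and d f(x) / d std = f'(x) * eps.
    `df` is the derivative f' of the (hypothetical) objective f.
    """
    rng = random.Random(seed)
    g_mean = g_std = 0.0
    for _ in range(n_samples):
        eps = rng.gauss(0.0, 1.0)
        x = mean + std * eps
        slope = df(x)
        g_mean += slope
        g_std += slope * eps
    return g_mean / n_samples, g_std / n_samples


# Toy check with f(x) = x**2: E[f] = mean**2 + std**2, so the exact
# gradients are (2 * mean, 2 * std) = (3.0, 1.0) here.
gm, gs = reparam_grad(lambda x: 2.0 * x, mean=1.5, std=0.5)
print(gm, gs)  # should be close to 3.0 and 1.0
```

In a filter, one would presumably apply the same estimator to whatever objective replaces the divergence, with `mean` and `std` parameterizing the Gaussian state approximation. A stochastic optimizer would then follow these gradients.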
Submitted 8 March, 2023;
originally announced March 2023.
-
An Efficient Deep Distribution Network for Bid Shading in First-Price Auctions
Authors:
Tian Zhou,
Hao He,
Shengjun Pan,
Niklas Karlsson,
Bharatbhushan Shetty,
Brendan Kitts,
Djordje Gligorijevic,
San Gultekin,
Tingyu Mao,
Junwei Pan,
Jianlong Zhang,
Aaron Flores
Abstract:
Since 2019, most ad exchanges and sell-side platforms (SSPs) in the online advertising industry have shifted from second-price to first-price auctions. Because of the fundamental difference between these auctions, demand-side platforms (DSPs) have had to update their bidding strategies to avoid bidding unnecessarily high and hence overpaying. Bid shading was proposed to adjust bid prices intended for second-price auctions so as to balance cost and winning probability in a first-price auction setting. In this study, we introduce a novel deep distribution network for optimal bidding in both open (non-censored) and closed (censored) online first-price auctions. Offline and online A/B testing results show that our algorithm outperforms previous state-of-the-art algorithms in terms of both surplus and effective cost per action (eCPX) metrics. Furthermore, the algorithm is optimized for run time and has been deployed in the VerizonMedia DSP as the production algorithm, serving hundreds of billions of bid requests per day. Online A/B tests show that advertisers' ROI improved by +2.4%, +2.4%, and +8.6% for impression-based (CPM), click-based (CPC), and conversion-based (CPA) campaigns, respectively.
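The distinction between open (non-censored) and closed (censored) auctions shows up in the training likelihood of any model that predicts a distribution over the minimum bid needed to win. The following is a minimal sketch with two assumptions. First, it assumes a normal distribution whose parameters `mu` and `sigma` a network would output per request. Second, it assumes a closed auction reveals only win or loss. The abstract does not specify the paper's actual distribution family or loss, and all names here are hypothetical.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF via erfc (accurate in both tails)."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def log_lik_open(min_bid_to_win: float, mu: float, sigma: float) -> float:
    """Open auction: the minimum bid to win is observed, so use the log density."""
    z = (min_bid_to_win - mu) / sigma
    return -0.5 * math.log(2.0 * math.pi * sigma ** 2) - 0.5 * z * z


def log_lik_closed(bid: float, won: bool, mu: float, sigma: float) -> float:
    """Closed auction: only win/loss is seen, so the target is censored at our bid.

    P(win) = P(min_bid_to_win <= bid) = CDF(z); P(loss) = 1 - CDF(z) = CDF(-z).
    """
    z = (bid - mu) / sigma
    return math.log(normal_cdf(z) if won else normal_cdf(-z))


# Hypothetical predicted distribution for one request: mu = 1.0, sigma = 0.3.
print(log_lik_closed(bid=1.0, won=True, mu=1.0, sigma=0.3))  # log(0.5)
print(log_lik_open(min_bid_to_win=0.8, mu=1.0, sigma=0.3))
```

Training would minimize the negative sum of these terms over logged auctions. The same predicted CDF then gives P(win | bid) for choosing a shaded bid.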
Submitted 15 July, 2021; v1 submitted 12 July, 2021;
originally announced July 2021.
-
Bid Shading by Win-Rate Estimation and Surplus Maximization
Authors:
Shengjun Pan,
Brendan Kitts,
Tian Zhou,
Hao He,
Bharatbhushan Shetty,
Aaron Flores,
Djordje Gligorijevic,
Junwei Pan,
Tingyu Mao,
San Gultekin,
Jianlong Zhang
Abstract:
This paper describes a new win-rate-based bid shading algorithm (WR) that does not rely on minimum-bid-to-win feedback from a Sell-Side Platform (SSP). The method uses a modified logistic regression to predict the profit from each possible shaded bid price. The functional form allows fast maximization at run time, a key requirement for Real-Time Bidding (RTB) systems. We report production results from this method along with several other algorithms. We found that bid shading, in general, can deliver significant value to advertisers, reducing the price per impression to about 55% of the unshaded cost. Further, the particular approach described in this paper captures 7% more profit for advertisers than benchmark methods that simply bid the most probable winning price. We also report 4.3% higher surplus than an industry Sell-Side Platform shading service. Furthermore, we observed 3% to 7% lower eCPM, eCPC, and eCPA when the algorithm was integrated with budget controllers. We attribute these gains mainly to the explicit maximization of the surplus function, and note that other algorithms can take advantage of the same approach.
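The explicit surplus maximization credited above can be illustrated with a win-rate curve w(b) and the expected surplus (value − b)·w(b). The sketch below assumes a plain logistic win rate in the bid price with made-up coefficients `alpha` and `beta`. This is a stand-in for the paper's "modified" logistic regression, whose exact form the abstract does not give. The bisection on the first-order condition is one fast way to maximize a curve of this shape, not necessarily the paper's.

```python
import math


def win_rate(bid, alpha, beta):
    """Hypothetical logistic win-rate curve P(win | bid); beta > 0 assumed."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta * bid)))


def surplus(bid, value, alpha, beta):
    """Expected surplus of a first-price bid: (value - bid) * P(win | bid)."""
    return (value - bid) * win_rate(bid, alpha, beta)


def optimal_shaded_bid(value, alpha, beta, tol=1e-9):
    """Maximize surplus over [0, value] by bisection on the first-order condition.

    d surplus / d bid = w(b) * g(b), with g(b) = (value - b) * beta * (1 - w(b)) - 1.
    w > 0 and g is decreasing, so the maximizer is the root of g, or 0 if g(0) <= 0.
    """
    def g(b):
        return (value - b) * beta * (1.0 - win_rate(b, alpha, beta)) - 1.0

    if g(0.0) <= 0.0:
        return 0.0
    lo, hi = 0.0, value  # g(lo) > 0 and g(value) = -1 < 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical impression worth 2.0 with made-up win-rate coefficients.
b_star = optimal_shaded_bid(value=2.0, alpha=-3.0, beta=2.5)
print(b_star, surplus(b_star, 2.0, -3.0, 2.5))
```

Because the surplus curve is unimodal under these assumptions, a few dozen evaluations suffice per request. This is consistent with the abstract's emphasis on fast run-time maximization.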
Submitted 19 September, 2020;
originally announced September 2020.
-
Risk Bounds for Low Cost Bipartite Ranking
Authors:
San Gultekin,
John Paisley
Abstract:
Bipartite ranking is an important supervised learning problem; however, unlike regression or classification, it has a quadratic dependence on the number of samples. To circumvent the prohibitive sample cost, much recent work focuses on stochastic gradient-based methods. In this paper, we consider an alternative approach that leverages the structure of the widely adopted pairwise squared loss to obtain a stochastic, low-cost algorithm that requires neither stochastic gradients nor learning rates. Using a novel uniform risk bound (based on matrix and vector concentration inequalities), we show that the sample size required for competitive performance against the all-pairs batch algorithm does not have a quadratic dependence. Generalization bounds for both the batch and the low-cost stochastic algorithms are presented. Experimental results show significant speed gains over the batch algorithm, as well as competitive performance against state-of-the-art bipartite ranking algorithms on real datasets.
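One way to see why the pairwise squared loss has exploitable structure: its sum over all P × N positive/negative pairs expands into per-class sums. It can therefore be computed in O(P + N) instead of O(P·N). The sketch below checks this identity on scalar scores. With a linear scorer the same expansion yields class means and second-moment matrices. This is an illustration of the loss structure only, not the paper's stochastic algorithm or its risk bound.

```python
def pairwise_sq_loss_bruteforce(pos_scores, neg_scores):
    """Sum over all positive/negative pairs of (1 - (s_i - t_j))^2: O(P * N)."""
    return sum((1.0 - (s - t)) ** 2 for s in pos_scores for t in neg_scores)


def pairwise_sq_loss_moments(pos_scores, neg_scores):
    """The same sum from per-class moments in O(P + N).

    With d_ij = s_i - t_j, (1 - d)^2 = 1 - 2d + d^2, and
      sum d   = N*S1 - P*T1
      sum d^2 = N*S2 - 2*S1*T1 + P*T2
    where S1, S2 (T1, T2) are the sums of scores and squared scores of the
    positives (negatives).
    """
    p, n = len(pos_scores), len(neg_scores)
    s1, s2 = sum(pos_scores), sum(s * s for s in pos_scores)
    t1, t2 = sum(neg_scores), sum(t * t for t in neg_scores)
    sum_d = n * s1 - p * t1
    sum_d2 = n * s2 - 2.0 * s1 * t1 + p * t2
    return p * n - 2.0 * sum_d + sum_d2


pos = [0.9, 0.2, 0.5]
neg = [0.1, 0.4]
print(pairwise_sq_loss_bruteforce(pos, neg), pairwise_sq_loss_moments(pos, neg))
```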
Submitted 1 December, 2019;
originally announced December 2019.
-
MBA: Mini-Batch AUC Optimization
Authors:
San Gultekin,
Avishek Saha,
Adwait Ratnaparkhi,
John Paisley
Abstract:
Area under the receiver operating characteristic curve (AUC) is an important metric for a wide range of signal processing and machine learning problems, and scalable methods for optimizing AUC have recently been proposed. However, handling very large datasets remains an open challenge for this problem. This paper proposes a novel approach to AUC maximization based on sampling mini-batches of positive/negative instance pairs and computing U-statistics to approximate a global risk minimization problem. The resulting algorithm is simple, fast, and learning-rate free. We show that the number of samples required for good performance is independent of the number of available pairs, which grows quadratically with the numbers of positive and negative instances. Extensive experiments show the practical utility of the proposed method.
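The mini-batch U-statistic idea can be seen on the simpler task of estimating AUC itself. Sample a subset of positives and a subset of negatives, average the pairwise comparisons inside the batch, and average over batches. Because every positive/negative pair is equally likely to land in a batch drawn without replacement, each batch statistic is unbiased for the all-pairs AUC. This sketch estimates AUC and is not the paper's optimization procedure. Batch sizes and data are hypothetical.

```python
import random


def exact_auc(pos_scores, neg_scores):
    """All-pairs AUC: fraction of (positive, negative) pairs ranked correctly (ties = 1/2)."""
    wins = 0.0
    for s in pos_scores:
        for t in neg_scores:
            wins += 1.0 if s > t else (0.5 if s == t else 0.0)
    return wins / (len(pos_scores) * len(neg_scores))


def minibatch_auc(pos_scores, neg_scores, batch_size, num_batches, seed=0):
    """Average of two-sample U-statistics computed on random mini-batches."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_batches):
        bp = rng.sample(pos_scores, min(batch_size, len(pos_scores)))
        bn = rng.sample(neg_scores, min(batch_size, len(neg_scores)))
        total += exact_auc(bp, bn)  # U-statistic over the pairs in this batch only
    return total / num_batches


pos = [0.9, 0.8, 0.4]
neg = [0.7, 0.3]
print(exact_auc(pos, neg), minibatch_auc(pos, neg, batch_size=2, num_batches=5000))
```

Each batch costs only batch_size² comparisons, however many pairs the full dataset contains.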
Submitted 31 May, 2018; v1 submitted 28 May, 2018;
originally announced May 2018.
-
Online Forecasting Matrix Factorization
Authors:
San Gultekin,
John Paisley
Abstract:
This paper considers the problem of forecasting high-dimensional time series. Such time series can be modeled as matrices in which each column denotes a measurement. In addition, when missing values are present, low-rank matrix factorization approaches are suitable for predicting future values. This paper formally defines and analyzes the forecasting problem in the online setting, i.e., where the data arrive as a stream and only a single pass is allowed. We present and analyze novel matrix factorization techniques that can learn low-dimensional embeddings effectively in an online manner. Based on these embeddings, a recursive minimum mean square error estimator is derived, which learns an autoregressive model on them. Experiments on two real datasets with tens of millions of measurements show the benefits of the proposed approach.
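The abstract does not give the form of its recursive estimator. As a generic stand-in, the sketch below fits an autoregressive coefficient to a single embedding coordinate with scalar recursive least squares. This is a single-pass update in the spirit of the online setting described above. The scalar AR(1) model, the prior variance `p0`, the forgetting factor, and the trajectory are all hypothetical simplifications.

```python
def rls_ar1(series, forgetting=1.0, p0=1000.0):
    """Recursive least-squares estimate of a in x_t = a * x_{t-1} + noise.

    Processes the stream one observation at a time (single pass).
    `p0` is a large initial variance (weak prior around a = 0);
    `forgetting` < 1 would discount old observations.
    """
    a, p = 0.0, p0
    for prev, cur in zip(series, series[1:]):
        gain = p * prev / (forgetting + prev * p * prev)
        a += gain * (cur - a * prev)  # correct by the one-step prediction error
        p = (p - gain * prev * p) / forgetting
    return a


# Hypothetical 1-D embedding trajectory following x_t = 0.8 * x_{t-1}.
traj = [0.8 ** t for t in range(30)]
a_hat = rls_ar1(traj)
forecast = a_hat * traj[-1]  # one-step-ahead prediction of the next embedding value
print(a_hat, forecast)
```

With a weak prior the estimate is the (slightly regularized) least-squares AR coefficient. A vector version would replace the scalar `p` with a covariance matrix.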
Submitted 23 December, 2017;
originally announced December 2017.
-
Nonlinear Kalman Filtering with Divergence Minimization
Authors:
San Gultekin,
John Paisley
Abstract:
We consider the nonlinear Kalman filtering problem using Kullback-Leibler (KL) and $\alpha$-divergence measures as optimization criteria. Unlike linear Kalman filters, nonlinear Kalman filters do not have closed-form Gaussian posteriors, because the nonlinearity in the likelihood breaks conjugacy. In this paper, we propose novel algorithms to optimize the forward and reverse forms of the KL divergence, as well as the $\alpha$-divergence, which contains these two as limiting cases. Unlike previous approaches, our algorithms do not approximate the divergences being optimized; instead, they use Monte Carlo integration techniques to derive unbiased algorithms for direct optimization. We assess performance on radar and sensor tracking and on options pricing problems, showing general improvement over the UKF and EKF, as well as competitive performance with particle filtering.
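For the forward KL case, minimizing KL(p‖q) over Gaussians q reduces to matching q's mean and variance to those of the posterior p. The sketch below performs one measurement update this way. It estimates the posterior moments by self-normalized importance sampling with the prior as proposal. That estimator is consistent but not unbiased, so it is a simpler stand-in for the unbiased Monte Carlo algorithms described in the abstract. The measurement function `h` and all numbers are hypothetical.

```python
import math
import random


def forward_kl_update(prior_mean, prior_var, y, h, obs_var, n=200_000, seed=0):
    """Gaussian q minimizing KL(p || q), with p(x | y) proportional to N(x; prior) N(y; h(x), obs_var).

    The forward-KL optimum over Gaussians matches the posterior mean and
    variance; both are estimated by self-normalized importance sampling.
    """
    rng = random.Random(seed)
    xs = [rng.gauss(prior_mean, math.sqrt(prior_var)) for _ in range(n)]
    log_w = [-((y - h(x)) ** 2) / (2.0 * obs_var) for x in xs]  # log-likelihoods
    top = max(log_w)  # subtract the max before exponentiating, for stability
    w = [math.exp(lw - top) for lw in log_w]
    z = sum(w)
    mean = sum(wi * x for wi, x in zip(w, xs)) / z
    var = sum(wi * (x - mean) ** 2 for wi, x in zip(w, xs)) / z
    return mean, var


# Hypothetical nonlinear sensor h(x) = x + 0.2 * x**3.
print(forward_kl_update(0.0, 1.0, y=1.5, h=lambda x: x + 0.2 * x ** 3, obs_var=0.1))
```

In the linear-Gaussian special case (h(x) = x), the result should approach the exact Kalman update. This gives a convenient sanity check.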
Submitted 1 May, 2017;
originally announced May 2017.
-
Stochastic Annealing for Variational Inference
Authors:
San Gultekin,
Aonan Zhang,
John Paisley
Abstract:
We empirically evaluate a stochastic annealing strategy for Bayesian posterior optimization with variational inference. Variational inference is a deterministic approach to approximate posterior inference in Bayesian models, in which a typically non-convex objective function is locally optimized over the parameters of the approximating distribution. We investigate an annealing method for optimizing this objective with the aim of finding a better locally optimal solution, and compare it with deterministic annealing methods and with no annealing. We show that stochastic annealing can provide clear improvements on the GMM and HMM, while performance on LDA tends to favor deterministic annealing methods.
Submitted 25 May, 2015;
originally announced May 2015.
-
A Collaborative Kalman Filter for Time-Evolving Dyadic Processes
Authors:
San Gultekin,
John Paisley
Abstract:
We present the collaborative Kalman filter (CKF), a dynamic model for collaborative filtering and related factorization models. Using the matrix factorization approach to collaborative filtering, the CKF accounts for time evolution by modeling each low-dimensional latent embedding as a multidimensional Brownian motion. Each observation is a random variable whose distribution is parameterized by the dot product of the relevant Brownian motions at that moment in time. This is naturally interpreted as a Kalman filter with multiple interacting state-space vectors. We also present a method for learning a dynamically evolving drift parameter for each location by modeling it as a geometric Brownian motion. We handle posterior intractability via a mean-field variational approximation, which also preserves tractability for downstream calculations in a manner similar to the Kalman filter. We evaluate the model on several large datasets, providing quantitative evaluation on the 10 million MovieLens and 100 million Netflix datasets and qualitative evaluation on a set of 39 million stock returns divided across roughly 6,500 companies from 1962 to 2014.
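The Kalman-filter reading described above can be sketched for one user embedding u. A Brownian-motion predict step inflates its covariance, and an observation y with mean v·u then conditions it through a standard scalar-measurement update. As a simplifying assumption, the item vector v is treated as known. The mean-field approach in the abstract would also account for v's own uncertainty and evolve both vectors. The names, the Gaussian observation model, and the numbers are hypothetical.

```python
def predict(mean, cov, q, dt):
    """Brownian-motion prior: the mean is unchanged, the covariance grows by q*dt*I."""
    k = len(mean)
    new_cov = [[cov[i][j] + (q * dt if i == j else 0.0) for j in range(k)] for i in range(k)]
    return list(mean), new_cov


def update(mean, cov, v, y, r):
    """Condition u on one observation y ~ N(v . u, r), treating v as known."""
    k = len(mean)
    cov_v = [sum(cov[i][j] * v[j] for j in range(k)) for i in range(k)]  # P v
    s = sum(v[i] * cov_v[i] for i in range(k)) + r  # innovation variance v^T P v + r
    gain = [c / s for c in cov_v]  # Kalman gain
    resid = y - sum(v[i] * mean[i] for i in range(k))  # innovation
    new_mean = [mean[i] + gain[i] * resid for i in range(k)]
    # P - K (P v)^T, valid because P is symmetric
    new_cov = [[cov[i][j] - gain[i] * cov_v[j] for j in range(k)] for i in range(k)]
    return new_mean, new_cov


# Hypothetical 2-D user embedding and a fixed item vector v.
m, P = [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]
m, P = predict(m, P, q=0.1, dt=1.0)
m, P = update(m, P, v=[1.0, 0.5], y=2.0, r=1.0)
print(m, P)
```

Alternating such updates between user and item vectors is one way a coordinate-wise variational scheme might proceed.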
Submitted 22 January, 2015;
originally announced January 2015.