Practical Knowledge Distillation: Using DNNs to Beat DNNs

Lee, Chung-Wei; Apostolopulos, Pavlos Athanasios; Markov, Igor L.

arXiv:2302.12360 (cs)

[Submitted on 23 Feb 2023 (v1), last revised 1 Mar 2023 (this version, v2)]

Abstract: For tabular data sets, we explore data and model distillation, as well as data denoising. These techniques improve both gradient-boosting models and a specialized DNN architecture. While gradient boosting is known to outperform DNNs on tabular data, we close the gap for data sets with 100K+ rows and give DNNs an advantage on small data sets. We extend these results with input-data distillation and optimized ensembling to help DNN performance match or exceed that of gradient boosting. As a theoretical justification of our practical method, we prove its equivalence to classical cross-entropy knowledge distillation. We also qualitatively explain the superiority of DNN ensembles over XGBoost on small data sets. For an industry end-to-end real-time ML platform with 4M production inferences per second, we develop a model-training workflow based on data sampling that distills ensembles of models into a single gradient-boosting model favored for high-performance real-time inference, without performance loss. Empirical evaluation shows that the proposed combination of methods consistently improves model accuracy over prior best models across several production applications deployed worldwide.
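
One way to read the stated link between a sampling-based workflow and cross-entropy distillation is a standard identity: if hard labels are drawn from a teacher's predicted class probabilities, the expected log loss of a student on those labels equals the cross-entropy between the teacher's soft labels and the student's predictions. The Python sketch below checks this numerically on a toy example. It is an illustration under stated assumptions, not the authors' workflow: the abstract does not specify the sampling procedure, and the function names, the three-member ensemble, and all probability values are hypothetical.

```python
import math
import random


def ensemble_average(member_probs):
    """Average class probabilities over ensemble members (hypothetical teacher)."""
    n_members = len(member_probs)
    return [sum(column) / n_members for column in zip(*member_probs)]


def soft_cross_entropy(teacher_probs, student_probs):
    """Cross-entropy of the student against the teacher's soft labels."""
    return -sum(
        p * math.log(q) for p, q in zip(teacher_probs, student_probs) if p > 0
    )


def sampled_label_loss(teacher_probs, student_probs, n_samples, rng):
    """Mean hard-label log loss, with labels sampled from the teacher."""
    classes = list(range(len(teacher_probs)))
    labels = rng.choices(classes, weights=teacher_probs, k=n_samples)
    return sum(-math.log(student_probs[y]) for y in labels) / n_samples


# Toy values, assumed for illustration only.
rng = random.Random(0)
members = [[0.7, 0.3], [0.9, 0.1], [0.8, 0.2]]  # per-member class probabilities
teacher = ensemble_average(members)              # ensemble soft label
student = [0.6, 0.4]                             # student's predicted probabilities

exact = soft_cross_entropy(teacher, student)
estimate = sampled_label_loss(teacher, student, 200_000, rng)
print(f"soft-label cross-entropy: {exact:.4f}")
print(f"sampled-label estimate:   {estimate:.4f}")
```

With enough samples, `estimate` approaches `exact`, which is why a student trained on sampled hard labels can in principle stand in for one trained directly on soft labels. Whether this is the argument used in the paper's proof is an assumption here.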

Comments:	11 pages, 1 figure, 17 tables
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2302.12360 [cs.LG]
	(or arXiv:2302.12360v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2302.12360

Submission history

From: Igor L. Markov
[v1] Thu, 23 Feb 2023 22:53:02 UTC (140 KB)
[v2] Wed, 1 Mar 2023 18:28:36 UTC (136 KB)
