MNIST database

From Wikipedia, the free encyclopedia

[Image: Sample images from the MNIST test dataset]

The MNIST database (Modified National Institute of Standards and Technology database[1]) is a large database of handwritten digits that is commonly used for training various image processing systems.[2][3] The database is also widely used for training and testing in the field of machine learning.[4][5] It was created by "re-mixing" the samples from NIST's original datasets.[6] The creators felt that since NIST's training dataset was taken from American Census Bureau employees, while the testing dataset was taken from American high school students, it was not well-suited for machine learning experiments.[7] Furthermore, the black and white images from NIST were normalized to fit into a 28x28 pixel bounding box and anti-aliased, which introduced grayscale levels.[7]

The MNIST database contains 60,000 training images and 10,000 testing images.[8] Half of the training set and half of the test set were taken from NIST's training dataset, while the other half of the training set and the other half of the test set were taken from NIST's testing dataset.[9] The original creators of the database keep a list of some of the methods tested on it.[7] In their original paper, they use a support-vector machine to get an error rate of 0.8%.[10]
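
The published split and image format can be inspected directly. The following is a minimal sketch, assuming the Keras API bundled with TensorFlow (one of several common loaders, not part of MNIST itself):

    # Minimal sketch: load MNIST via Keras and confirm the published split sizes.
    # Assumes TensorFlow is installed; load_data() fetches a hosted copy of MNIST.
    from tensorflow.keras.datasets import mnist

    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    print(x_train.shape)  # (60000, 28, 28): 60,000 training images of 28x28 pixels
    print(x_test.shape)   # (10000, 28, 28): 10,000 testing images
    print(x_train.dtype, x_train.max())  # uint8 255: 8-bit grayscale levels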

Extended MNIST (EMNIST) is a newer dataset developed and released by NIST to be the (final) successor to MNIST.[11][12] MNIST included images only of handwritten digits. EMNIST includes all the images from NIST Special Database 19, which is a large database of handwritten uppercase and lowercase letters as well as digits.[13][14] The images in EMNIST were converted into the same 28x28 pixel format, by the same process, as were the MNIST images. Accordingly, tools which work with the older, smaller MNIST dataset will likely work unmodified with EMNIST.
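
Since EMNIST keeps the 28x28 format, existing loaders only need to point at the new dataset. A minimal sketch, assuming torchvision (one common distribution channel, not NIST's own tooling):

    # Minimal sketch: EMNIST images share MNIST's 28x28 format, so MNIST-oriented
    # code typically works unchanged. Assumes torchvision is installed.
    from torchvision import datasets

    emnist = datasets.EMNIST(root="data", split="digits", train=True, download=True)
    image, label = emnist[0]
    print(image.size)  # (28, 28), the same format as MNIST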

History

The set of images in the MNIST database was created in 1994 as a combination of two of NIST's databases: Special Database 1 and Special Database 3.[15]

Special Database 1 and Special Database 3 consist of digits written by high school students and employees of the United States Census Bureau, respectively.[7]

The original dataset was a set of 128x128 binary images, processed into 28x28 grayscale images. The training set and the testing set both originally contained 60,000 samples, but 50,000 of the testing set samples were discarded.[16]
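
The grayscale levels in the released images are a by-product of anti-aliased downsampling of the binary scans. A rough sketch of that effect, assuming Pillow and NumPy; this is not the creators' exact pipeline, which also size-normalized and centered each digit:

    # Rough sketch: anti-aliased downsampling turns a binary image into grayscale.
    # NOT the original processing pipeline; for illustration only.
    import numpy as np
    from PIL import Image

    binary = (np.random.rand(128, 128) > 0.5).astype(np.uint8) * 255  # stand-in for a 128x128 binary scan
    small = Image.fromarray(binary).resize((28, 28), Image.LANCZOS)   # anti-aliased resize
    print(len(np.unique(np.asarray(small))))  # many intermediate gray levels, not just 0 and 255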

Performance

Some researchers have achieved "near-human performance" on the MNIST database, using a committee of neural networks; in the same paper, the authors achieve performance double that of humans on other recognition tasks.[17] The highest error rate listed[7] on the original website of the database is 12 percent, which is achieved using a simple linear classifier with no preprocessing.[10]
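
The weak linear baseline is easy to approximate. A minimal sketch, assuming scikit-learn and the Keras loader shown earlier; this is the same class of model as the 12 percent baseline, not a reproduction of the original paper's classifier:

    # Minimal sketch of a linear classifier on raw pixels; assumes scikit-learn
    # and TensorFlow are installed. Not the original paper's exact setup.
    from sklearn.linear_model import LogisticRegression
    from tensorflow.keras.datasets import mnist

    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    clf = LogisticRegression(max_iter=200)
    clf.fit(x_train.reshape(len(x_train), -1) / 255.0, y_train)
    error = 1.0 - clf.score(x_test.reshape(len(x_test), -1) / 255.0, y_test)
    print(error)  # typically several percent; the cited 12 percent baseline used no preprocessing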

In 2004, a best-case error rate of 0.42 percent was achieved on the database by researchers using a new classifier called the LIRA, which is a neural classifier with three neuron layers based on Rosenblatt's perceptron principles.[18]

Some researchers have tested artificial intelligence systems on the database with random distortions applied. The systems in these cases are usually neural networks, and the distortions used tend to be either affine distortions or elastic distortions.[7] Sometimes, these systems can be very successful; one such system achieved an error rate on the database of 0.39 percent.[19]
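
An elastic distortion can be sketched as a smoothed random displacement field used to resample the image. A minimal sketch, assuming NumPy and SciPy; the alpha and sigma parameters are illustrative, not taken from any cited paper:

    # Minimal sketch of an elastic distortion: smooth a random displacement field
    # with a Gaussian filter, then resample the image along it.
    import numpy as np
    from scipy.ndimage import gaussian_filter, map_coordinates

    def elastic_distort(image, alpha=8.0, sigma=3.0, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        dx = gaussian_filter(rng.uniform(-1, 1, image.shape), sigma) * alpha  # horizontal shifts
        dy = gaussian_filter(rng.uniform(-1, 1, image.shape), sigma) * alpha  # vertical shifts
        rows, cols = np.meshgrid(np.arange(image.shape[0]),
                                 np.arange(image.shape[1]), indexing="ij")
        return map_coordinates(image, [rows + dy, cols + dx], order=1)

    distorted = elastic_distort(np.random.rand(28, 28))  # stand-in for an MNIST digit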

In 2011, an error rate of 0.27 percent, improving on the previous best result, was reported by researchers using a similar system of neural networks.[20] In 2013, an approach based on regularization of neural networks using DropConnect was claimed to achieve a 0.21 percent error rate.[21] In 2016, the best performance by a single convolutional neural network was an error rate of 0.25 percent.[22] As of August 2018, this remained the best result for a single convolutional neural network trained on the MNIST training data without data augmentation.[22][23] Also, the Parallel Computing Center (Khmelnytskyi, Ukraine) obtained an ensemble of only 5 convolutional neural networks that performs on MNIST at a 0.21 percent error rate.[24][25]
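
The committee and ensemble results above combine independently trained networks, commonly by averaging their per-class probabilities. A toy sketch with NumPy; the member outputs here are synthetic stand-ins, not the cited models:

    # Toy sketch of committee averaging: mean of the members' per-class
    # probabilities, then argmax. Member outputs are synthetic stand-ins.
    import numpy as np

    rng = np.random.default_rng(0)
    member_probs = rng.dirichlet(np.ones(10), size=(5, 100))  # 5 members x 100 images x 10 classes
    committee_probs = member_probs.mean(axis=0)               # average over the 5 members
    predictions = committee_probs.argmax(axis=1)              # one predicted digit per image
    print(predictions[:10])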

Classifiers

This is a table of some of the machine learning methods used on the dataset and their error rates, by type of classifier:

Type | Classifier | Distortion | Preprocessing | Error rate (%)
Neural network | Gradient Descent Tunneling | None | None | 0[26]
Linear classifier | Pairwise linear classifier | None | Deskewing | 7.6[10]
K-Nearest Neighbors | K-NN with rigid transformations | None | None | 0.96[27]
K-Nearest Neighbors | K-NN with non-linear deformation (P2DHMDM) | None | Shiftable edges | 0.52[28]
Boosted stumps | Product of stumps on Haar features | None | Haar features | 0.87[29]
Non-linear classifier | 40 PCA + quadratic classifier | None | None | 3.3[10]
Random forest | Fast Unified Random Forests for Survival, Regression, and Classification (RF-SRC)[30] | None | Simple statistical pixel importance | 2.8[31]
Support-vector machine (SVM) | Virtual SVM, deg-9 poly, 2-pixel jittered | None | Deskewing | 0.56[32]
Neural network | 2-layer 784-800-10 | None | None | 1.6[33]
Neural network | 2-layer 784-800-10 | Elastic distortions | None | 0.7[33]
Deep neural network (DNN) | 6-layer 784-2500-2000-1500-1000-500-10 | Elastic distortions | None | 0.35[34]
Convolutional neural network (CNN) | 6-layer 784-40-80-500-1000-2000-10 | None | Expansion of the training data | 0.31[35]
Convolutional neural network | 6-layer 784-50-100-500-1000-10-10 | None | Expansion of the training data | 0.27[36]
Convolutional neural network (CNN) | 13-layer 64-128(5x)-256(3x)-512-2048-256-256-10 | None | None | 0.25[22]
Convolutional neural network | Committee of 35 CNNs, 1-20-P-40-P-150-10 | Elastic distortions | Width normalizations | 0.23[17]
Convolutional neural network | Committee of 5 CNNs, 6-layer 784-50-100-500-1000-10-10 | None | Expansion of the training data | 0.21[24][25]
Convolutional neural network | Committee of 20 CNNs with Squeeze-and-Excitation Networks[37] | None | Data augmentation | 0.17[38]
Convolutional neural network | Ensemble of 3 CNNs with varying kernel sizes | None | Data augmentation consisting of rotation and translation | 0.09[39]

References

  1. ^ "THE MNIST DATABASE of handwritten digits". Yann LeCun, Courant Institute, NYU Corinna Cortes, Google Labs, New York Christopher J.C. Burges, Microsoft Research, Redmond.
  2. ^ "Support vector machines speed pattern recognition - Vision Systems Design". Vision Systems Design. September 2004. Retrieved 17 August 2013.
  3. ^ Gangaputra, Sachin. "Handwritten digit database". Retrieved 17 August 2013.
  4. ^ Qiao, Yu (2007). "THE MNIST DATABASE of handwritten digits". Retrieved 18 August 2013.
  5. ^ Platt, John C. (1999). "Using analytic QP and sparseness to speed training of support vector machines" (PDF). Advances in Neural Information Processing Systems: 557–563. Archived from the original (PDF) on 4 March 2016. Retrieved 18 August 2013.
  6. ^ Grother, Patrick J. "NIST Special Database 19 - Handprinted Forms and Characters Database" (PDF). National Institute of Standards and Technology.
  7. ^ a b c d e f LeCun, Yann; Cortes, Corinna; Burges, Christopher C.J. "The MNIST Handwritten Digit Database". Yann LeCun's website, yann.lecun.com. Retrieved 30 April 2020.
  8. ^ Kussul, Ernst; Baidyk, Tatiana (2004). "Improved method of handwritten digit recognition tested on MNIST database". Image and Vision Computing. 22 (12): 971–981. doi:10.1016/j.imavis.2004.03.008.
  9. ^ Zhang, Bin; Srihari, Sargur N. (2004). "Fast k-Nearest Neighbor Classification Using Cluster-Based Trees" (PDF). IEEE Transactions on Pattern Analysis and Machine Intelligence. 26 (4): 525–528. doi:10.1109/TPAMI.2004.1265868. PMID 15382657. S2CID 6883417. Retrieved 20 April 2020.
  10. ^ a b c d LeCun, Yann; Léon Bottou; Yoshua Bengio; Patrick Haffner (1998). "Gradient-Based Learning Applied to Document Recognition" (PDF). Proceedings of the IEEE. 86 (11): 2278–2324. doi:10.1109/5.726791. S2CID 14542261. Retrieved 18 August 2013.
  11. ^ NIST (4 April 2017). "The EMNIST Dataset". NIST. Retrieved 11 April 2022.
  12. ^ NIST (27 August 2010). "NIST Special Database 19". NIST. Retrieved 11 April 2022.
  13. ^ Cohen, G.; Afshar, S.; Tapson, J.; van Schaik, A. (2017). "EMNIST: an extension of MNIST to handwritten letters". arXiv:1702.05373 [cs.CV].
  14. ^ Cohen, G.; Afshar, S.; Tapson, J.; van Schaik, A. (2017). "EMNIST: an extension of MNIST to handwritten letters". arXiv:1702.05373v1 [cs.CV].
  15. ^ Bottou, Léon; Cortes, Corinna; Denker, John S.; Drucker, Harris; Guyon, Isabelle; Jackel, L. D.; LeCun, Y.; Muller, U. A.; Sackinger, E.; Simard, P.; Vapnik, V. (1994). "Comparison of classifier methods: A case study in handwritten digit recognition". Proceedings of the 12th IAPR International Conference on Pattern Recognition (Cat. No.94CH3440-5). Vol. 2. Jerusalem, Israel. pp. 77–82. doi:10.1109/ICPR.1994.576879. ISBN 0-8186-6270-0.
  16. ^ Yadav, Chhavi; Bottou, Leon (2019). "Cold Case: The Lost MNIST Digits". Advances in Neural Information Processing Systems. 32. arXiv:1905.10498. Article has a detailed history and a reconstruction of the discarded testing set.
  17. ^ a b Cireşan, Dan; Ueli Meier; Jürgen Schmidhuber (2012). "Multi-column deep neural networks for image classification" (PDF). 2012 IEEE Conference on Computer Vision and Pattern Recognition. pp. 3642–3649. arXiv:1202.2745. CiteSeerX 10.1.1.300.3283. doi:10.1109/CVPR.2012.6248110. ISBN 978-1-4673-1228-8. S2CID 2161592.
  18. ^ Kussul, Ernst; Tatiana Baidyk (2004). "Improved method of handwritten digit recognition tested on MNIST database" (PDF). Image and Vision Computing. 22 (12): 971–981. doi:10.1016/j.imavis.2004.03.008. Archived from the original (PDF) on 21 September 2013. Retrieved 20 September 2013.
  19. ^ Ranzato, Marc'Aurelio; Christopher Poultney; Sumit Chopra; Yann LeCun (2006). "Efficient Learning of Sparse Representations with an Energy-Based Model" (PDF). Advances in Neural Information Processing Systems. 19: 1137–1144. Retrieved 20 September 2013.
  20. ^ Ciresan, Dan Claudiu; Ueli Meier; Luca Maria Gambardella; Jürgen Schmidhuber (2011). "Convolutional neural network committees for handwritten character classification" (PDF). 2011 International Conference on Document Analysis and Recognition (ICDAR). pp. 1135–1139. CiteSeerX 10.1.1.465.2138. doi:10.1109/ICDAR.2011.229. ISBN 978-1-4577-1350-7. S2CID 10122297. Archived from the original (PDF) on 22 February 2016. Retrieved 20 September 2013.
  21. ^ Wan, Li; Matthew Zeiler; Sixin Zhang; Yann LeCun; Rob Fergus (2013). Regularization of Neural Network using DropConnect. International Conference on Machine Learning(ICML).
  22. ^ a b c SimpleNet (2016). "Lets Keep it simple, Using simple architectures to outperform deeper and more complex architectures". arXiv:1608.06037. Retrieved 3 December 2020.
  23. ^ SimpNet (2018). "Towards Principled Design of Deep Convolutional Networks: Introducing SimpNet". Github. arXiv:1802.06205. Retrieved 3 December 2020.
  24. ^ a b Romanuke, Vadim. "Parallel Computing Center (Khmelnytskyi, Ukraine) represents an ensemble of 5 convolutional neural networks which performs on MNIST at 0.21 percent error rate". Retrieved 24 November 2016.
  25. ^ a b Romanuke, Vadim (2016). "Training data expansion and boosting of convolutional neural networks for reducing the MNIST dataset error rate". Research Bulletin of NTUU "Kyiv Polytechnic Institute". 6 (6): 29–34. doi:10.20535/1810-0546.2016.6.84115.
  26. ^ Deng, Bo (2023-12-26). "Error-free Training for Artificial Neural Network". arXiv:2312.16060 [cs.LG].
  27. ^ Lindblad, Joakim; Nataša Sladoje (January 2014). "Linear time distances between fuzzy sets with applications to pattern matching and classification". IEEE Transactions on Image Processing. 23 (1): 126–136. Bibcode:2014ITIP...23..126L. doi:10.1109/TIP.2013.2286904. PMID 24158476. S2CID 1908950.
  28. ^ Keysers, Daniel; Thomas Deselaers; Christian Gollan; Hermann Ney (August 2007). "Deformation models for image recognition". IEEE Transactions on Pattern Analysis and Machine Intelligence. 29 (8): 1422–1435. CiteSeerX 10.1.1.106.3963. doi:10.1109/TPAMI.2007.1153. PMID 17568145. S2CID 2528485.
  29. ^ Kégl, Balázs; Róbert Busa-Fekete (2009). "Boosting products of base classifiers" (PDF). Proceedings of the 26th Annual International Conference on Machine Learning. pp. 497–504. doi:10.1145/1553374.1553439. ISBN 9781605585161. S2CID 8460779. Retrieved 27 August 2013.
  30. ^ "RandomForestSRC: Fast Unified Random Forests for Survival, Regression, and Classification (RF-SRC)". 21 January 2020.
  31. ^ "Mehrad Mahmoudian / MNIST with RandomForest".
  32. ^ Decoste, Dennis; Schölkopf, Bernhard (2002). "Training Invariant Support Vector Machines". Machine Learning. 46 (1–3): 161–190. doi:10.1023/A:1012454411458. ISSN 0885-6125. OCLC 703649027.
  33. ^ a b Patrice Y. Simard; Dave Steinkraus; John C. Platt (2003). "Best Practices for Convolutional Neural Networks Applied to Visual Document Analysis". Proceedings of the Seventh International Conference on Document Analysis and Recognition. Vol. 1. Institute of Electrical and Electronics Engineers. p. 958. doi:10.1109/ICDAR.2003.1227801. ISBN 978-0-7695-1960-9. S2CID 4659176.
  34. ^ Ciresan, Claudiu Dan; Ueli Meier; Luca Maria Gambardella; Juergen Schmidhuber (December 2010). "Deep Big Simple Neural Nets Excel on Handwritten Digit Recognition". Neural Computation. 22 (12): 3207–20. arXiv:1003.0358. doi:10.1162/NECO_a_00052. PMID 20858131. S2CID 1918673.
  35. ^ Romanuke, Vadim. "The single convolutional neural network best performance in 18 epochs on the expanded training data at Parallel Computing Center, Khmelnytskyi, Ukraine". Retrieved 16 November 2016.
  36. ^ Romanuke, Vadim. "Parallel Computing Center (Khmelnytskyi, Ukraine) gives a single convolutional neural network performing on MNIST at 0.27 percent error rate". Retrieved 24 November 2016.
  37. ^ Hu, Jie; Shen, Li; Albanie, Samuel; Sun, Gang; Wu, Enhua (2019). "Squeeze-and-Excitation Networks". IEEE Transactions on Pattern Analysis and Machine Intelligence. 42 (8): 2011–2023. arXiv:1709.01507. doi:10.1109/TPAMI.2019.2913372. PMID 31034408. S2CID 140309863.
  38. ^ "GitHub - Matuzas77/MNIST-0.17: MNIST classifier with average 0.17% error". GitHub. 25 February 2020.
  39. ^ An, Sanghyeon; Lee, Minjun; Park, Sanglee; Yang, Heerin; So, Jungmin (2020-10-04). "An Ensemble of Simple Convolutional Neural Network Models for MNIST Digit Recognition". arXiv:2008.10400 [cs.CV].

Further reading