ImageNet

From Wikipedia, the free encyclopedia
{{Short description|Image dataset}}
{{Use dmy dates|date=September 2019}}
The '''ImageNet''' project is a large visual [[database]] designed for use in [[Outline of object recognition|visual object recognition software]] research. More than 14 million<ref name="New Scientist">{{cite news|title=New computer vision challenge wants to teach robots to see in 3D|url=https://www.newscientist.com/article/2127131-new-computer-vision-challenge-wants-to-teach-robots-to-see-in-3d/|access-date=3 February 2018|work=New Scientist|date=7 April 2017}}</ref><ref name="nytimes 2012">{{cite news|last1=Markoff|first1=John|title=For Web Images, Creating New Technology to Seek and Find|url=https://www.nytimes.com/2012/11/20/science/for-web-images-creating-new-technology-to-seek-and-find.html|access-date=3 February 2018|work=The New York Times|date=19 November 2012}}</ref> images have been hand-annotated by the project to indicate what objects are pictured, and in at least one million of the images, bounding boxes are also provided.<ref>{{Cite web |date=2020-09-07 |title=ImageNet |url=http://image-net.org/about-stats.php |archive-url=https://web.archive.org/web/20200907212153/http://image-net.org/about-stats.php |archive-date=2020-09-07 |access-date=2022-10-11 }}</ref> ImageNet contains more than 20,000 categories,<ref name="nytimes 2012"/> with a typical category, such as "balloon" or "strawberry", consisting of several hundred images.<ref name=economist>{{cite news|title=From not working to neural networking|url=https://www.economist.com/news/special-report/21700756-artificial-intelligence-boom-based-old-idea-modern-twist-not|access-date=3 February 2018|newspaper=The Economist|date=25 June 2016}}</ref> The database of annotations of third-party image [[URL]]s is freely available directly from ImageNet, though the actual images are not owned by ImageNet.<ref>{{cite web|title=ImageNet Overview|url=https://image-net.org/about.php|publisher=ImageNet|access-date=15 October 2022}}</ref> Since 2010, the ImageNet project has run an annual software contest, the ImageNet Large Scale Visual Recognition Challenge ([[#History_of_the_ImageNet_challenge|ILSVRC]]), where software programs compete to correctly classify and detect objects and scenes. The challenge uses a "trimmed" list of one thousand non-overlapping classes.<ref name=ILJVRC-2015/>


==Significance for deep learning==
On 30 September 2012, a [[convolutional neural network]] (CNN) called [[AlexNet]]<ref name=":0">{{Cite journal|last1=Krizhevsky|first1=Alex|last2=Sutskever|first2=Ilya|last3=Hinton|first3=Geoffrey E.|access-date=24 May 2017|title=ImageNet classification with deep convolutional neural networks|url=https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf|journal=Communications of the ACM|volume=60|issue=6|date=June 2017|pages=84–90|doi=10.1145/3065386|s2cid=195908774|issn=0001-0782|doi-access=free}}</ref> achieved a top-5 error of 15.3% in the ImageNet 2012 Challenge, more than 10.8 percentage points lower than that of the runner up. Using convolutional neural networks was feasible due to the use of [[graphics processing unit]]s (GPUs) during training,<ref name=":0" /> an essential ingredient of the [[deep learning]] revolution. According to ''[[The Economist]]'', "Suddenly people started to pay attention, not just within the AI community but across the technology industry as a whole."<ref name=economist/><ref>{{cite news|title=Machines 'beat humans' for a growing number of tasks|url=https://www.ft.com/content/4cc048f6-d5f4-11e7-a303-9060cb1e5f44|access-date=3 February 2018|work=Financial Times|date=30 November 2017}}</ref><ref>{{Cite web|url=https://qz.com/1307091/the-inside-story-of-how-ai-got-good-enough-to-dominate-silicon-valley/|title=The inside story of how AI got good enough to dominate Silicon Valley|last1=Gershgorn|first1=Dave|website=Quartz|date=18 June 2018 |access-date=10 December 2018}}</ref>
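The top-5 error used to rank these results counts an image as correct if its true label is among the model's five highest-scoring classes. A minimal NumPy sketch of the metric (the scores and labels below are hypothetical, not AlexNet's actual outputs):

```python
import numpy as np

def top5_error(scores, labels):
    # scores: (n_images, n_classes) array of per-class scores.
    # An image counts as correct if its true label is among the
    # five highest-scoring classes for that image.
    top5 = np.argsort(scores, axis=1)[:, -5:]
    correct = (top5 == labels[:, None]).any(axis=1)
    return 1.0 - correct.mean()

# Hypothetical scores for 4 images over the 1,000 ILSVRC classes.
rng = np.random.default_rng(0)
scores = rng.standard_normal((4, 1000))
labels = np.array([3, 17, 3, 999])  # hypothetical ground-truth class indices
print(top5_error(scores, labels))
```

AlexNet's reported 15.3% corresponds to this quantity evaluated over the held-out test images.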


In 2015, AlexNet was outperformed by [[Microsoft]]'s [[ResNets|very deep CNN]] with over 100 layers, which won the ImageNet 2015 contest.<ref name="microsoft2015">{{cite book|last1=He|first1=Kaiming|last2=Zhang|first2=Xiangyu|last3=Ren|first3=Shaoqing|last4=Sun|first4=Jian|title=2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) |chapter=Deep Residual Learning for Image Recognition |pages=770–778|year=2016|doi=10.1109/CVPR.2016.90|arxiv=1512.03385|isbn=978-1-4673-8851-1|s2cid=206594692}}</ref>


==History of the database==
AI researcher [[Fei-Fei Li]] began working on the idea for ImageNet in 2006. At a time when most AI research focused on models and algorithms, Li wanted to expand and improve the data available to train AI algorithms.<ref name="WiredQuest">{{Cite magazine |url=https://www.wired.com/story/fei-fei-li-artificial-intelligence-humanity/ |title=Fei-Fei Li's Quest to Make AI Better for Humanity |last=Hempel |first=Jesse |magazine=Wired |quote=When Li, who had moved back to Princeton to take a job as an assistant professor in 2007, talked up her idea for ImageNet, she had a hard time getting faculty members to help out. Finally, a professor who specialized in computer architecture agreed to join her as a collaborator. |date=13 November 2018 |access-date=5 May 2019}}</ref> In 2007, Li met with Princeton professor [[Christiane Fellbaum]], one of the creators of [[WordNet]], to discuss the project. As a result of this meeting, Li went on to build ImageNet starting from the word database of WordNet and using many of its features.<ref name="Gershgorn"/>


As an assistant professor at [[Princeton University|Princeton]], Li assembled a team of researchers to work on the ImageNet project. They used [[Amazon Mechanical Turk]] to help with the classification of images.<ref name="Gershgorn"/>


They presented their database for the first time as a poster at the 2009 [[Conference on Computer Vision and Pattern Recognition]] (CVPR) in Florida.<ref name="Gershgorn">{{cite web |url=https://qz.com/1034972/the-data-that-changed-the-direction-of-ai-research-and-possibly-the-world/ |title=The data that transformed AI research—and possibly the world |last=Gershgorn |first=Dave |date=26 July 2017 |website=Quartz |publisher=Atlantic Media Co.|quote=Having read about WordNet's approach, Li met with professor Christiane Fellbaum, a researcher influential in the continued work on WordNet, during a 2006 visit to Princeton. |access-date=26 July 2017 }}</ref><ref>{{Citation |last1=Deng |first1=Jia |last2=Dong |first2=Wei |last3=Socher |first3=Richard |last4=Li |first4=Li-Jia |last5=Li |first5=Kai |last6=Fei-Fei |first6=Li |contribution=ImageNet: A Large-Scale Hierarchical Image Database |year=2009 |title=2009 conference on Computer Vision and Pattern Recognition |contribution-url=http://www.image-net.org/papers/imagenet_cvpr09.pdf |access-date=26 July 2017 |archive-date=15 January 2021 |archive-url=https://web.archive.org/web/20210115185228/http://www.image-net.org/papers/imagenet_cvpr09.pdf |url-status=dead }}</ref><ref>{{Citation|last=Li|first=Fei-Fei|title=How we're teaching computers to understand pictures|date=23 March 2015 |url=https://www.ted.com/talks/fei_fei_li_how_we_re_teaching_computers_to_understand_pictures?language=en|access-date=16 December 2018}}</ref>


==Dataset==
ImageNet [[crowdsources]] its annotation process. Image-level annotations indicate the presence or absence of an object class in an image, such as "there are tigers in this image" or "there are no tigers in this image". Object-level annotations provide a bounding box around the (visible part of the) indicated object. ImageNet uses a variant of the broad [[WordNet]] schema to categorize objects, augmented with 120 categories of [[dog breeds]] to showcase fine-grained classification.<ref name=ILJVRC-2015>Olga Russakovsky*, Jia Deng*, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, [[Andrej Karpathy]], Aditya Khosla, Michael Bernstein, Alexander C. Berg and Li Fei-Fei. (* = equal contribution) ImageNet Large Scale Visual Recognition Challenge. IJCV, 2015.</ref> One downside of WordNet use is that the categories may be more "elevated" than would be optimal for ImageNet: "Most people are more interested in Lady Gaga or the iPod Mini than in this rare kind of [[diplodocus]]."{{Clarify|date=August 2019}}<!-- elevated? --> In 2012, ImageNet was the world's largest academic user of [[Amazon Mechanical Turk|Mechanical Turk]]. The average worker identified 50 images per minute.<ref name="nytimes 2012"/>
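The two annotation levels described above can be sketched as simple data structures. ImageNet identifies each category by a WordNet synset ID ("wnid"); the IDs and coordinates below are hypothetical, and ImageNet's actual annotation files use their own formats:

```python
from dataclasses import dataclass, field

@dataclass
class BoundingBox:
    # Pixel coordinates of the visible part of the object.
    xmin: int
    ymin: int
    xmax: int
    ymax: int

@dataclass
class ImageAnnotation:
    image_id: str
    # Image-level annotations: wnid -> present/absent
    # ("there are tigers in this image" / "there are no tigers").
    present: dict = field(default_factory=dict)
    # Object-level annotations: (wnid, BoundingBox) pairs.
    boxes: list = field(default_factory=list)

ann = ImageAnnotation("n02129604_12345")        # hypothetical image ID
ann.present["n02129604"] = True                 # image-level: tiger present
ann.boxes.append(("n02129604", BoundingBox(40, 30, 420, 310)))
print(len(ann.boxes))  # → 1
```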

== Subsets of the dataset ==
There are various subsets of the ImageNet dataset used in various contexts. One of the most widely used subsets of ImageNet is the "ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2012–2017 image classification and localization dataset". This is also referred to in the research literature as ImageNet-1K or ILSVRC2017, reflecting the original ILSVRC challenge that involved 1,000 classes. ImageNet-1K contains 1,281,167 training images, 50,000 validation images and 100,000 test images.<ref>{{Cite web |title=ImageNet |url=https://www.image-net.org/download.php |access-date=2022-10-19 |website=www.image-net.org}}</ref> The full original dataset is referred to as ImageNet-21K. ImageNet-21K contains 14,197,122 images divided into 21,841 classes. Some papers round this up and name it ImageNet-22k.<ref>{{cite arXiv |last1=Ridnik |first1=Tal |last2=Ben-Baruch |first2=Emanuel |last3=Noy |first3=Asaf |last4=Zelnik-Manor |first4=Lihi |date=2021-08-05 |title=ImageNet-21K Pretraining for the Masses |class=cs.CV |eprint=2104.10972 }}</ref>
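As a quick sanity check, the ImageNet-1K split sizes quoted above can be tallied (the constant names are ours, not ImageNet's):

```python
# ILSVRC 2012-2017 classification splits, sizes as stated in the text.
IMAGENET_1K_SPLITS = {"train": 1_281_167, "val": 50_000, "test": 100_000}
NUM_CLASSES = 1_000

total = sum(IMAGENET_1K_SPLITS.values())
print(total)                  # total images across the three splits
print(total // NUM_CLASSES)   # rough average number of images per class
```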


==History of the ImageNet challenge==
[[File:ImageNet_error_rate_history_(just_systems).svg|thumb|Error rate history on ImageNet (showing best result per team and up to 10 entries per year)]]
The ILSVRC aims to "follow in the footsteps" of the smaller-scale PASCAL VOC challenge, established in 2005, which contained only about 20,000 images and twenty object classes.<ref name="ILJVRC-2015" /> To "democratize" ImageNet, Fei-Fei Li proposed to the PASCAL VOC team a collaboration, beginning in 2010, where research teams would evaluate their algorithms on the given data set, and compete to achieve higher accuracy on several visual recognition tasks.<ref name="Gershgorn"/>


The resulting annual competition is now known as the ImageNet Large Scale Visual Recognition Challenge (ILSVRC). The ILSVRC uses a "trimmed" list of only 1000 image categories or "classes", including 90 of the 120 dog breeds classified by the full ImageNet schema.<ref name="ILJVRC-2015" /> The 2010s saw dramatic progress in image processing. Around 2011, a good ILSVRC classification top-5 error rate was 25%. In 2012, a deep [[Convolutional neural network|convolutional neural net]] called [[AlexNet]] achieved 16%; in the next couple of years, top-5 error rates fell to a few percent.<ref>{{cite news|last1=Robbins|first1=Martin|title=Does an AI need to make love to Rembrandt's girlfriend to make art?|url=https://www.theguardian.com/science/2016/may/06/does-an-ai-need-to-make-love-to-rembrandts-girlfriend-to-make-art|access-date=22 June 2016|work=The Guardian|date=6 May 2016}}</ref> While the 2012 breakthrough "combined pieces that were all there before", the dramatic quantitative improvement marked the start of an industry-wide artificial intelligence boom.<ref name="economist" /> By 2015, researchers at Microsoft reported that their CNNs exceeded human ability at the narrow ILSVRC tasks.<ref name="microsoft2015" /><ref>{{cite news|last1=Markoff|first1=John|title=A Learning Advance in Artificial Intelligence Rivals Human Abilities|url=https://www.nytimes.com/2015/12/11/science/an-advance-in-artificial-intelligence-rivals-human-vision-abilities.html|access-date=22 June 2016|work=The New York Times|date=10 December 2015}}</ref> However, as one of the challenge's organizers, [[Olga Russakovsky]], pointed out in 2015, the programs only have to identify images as belonging to one of a thousand categories; humans can recognize a larger number of categories, and also (unlike the programs) can judge the context of an image.<ref>{{cite news|last1=Aron|first1=Jacob|title=Forget the Turing test – there are better ways of judging AI|url=https://www.newscientist.com/article/dn28206-forget-the-turing-test-there-are-better-ways-of-judging-ai/|access-date=22 June 2016|work=New Scientist|date=21 September 2015}}</ref>


By 2014, more than fifty institutions participated in the ILSVRC.<ref name=ILJVRC-2015 /> In 2017, 29 of 38 competing teams had greater than 95% accuracy.<ref>{{cite news|last1=Gershgorn|first1=Dave|title=The Quartz guide to artificial intelligence: What is it, why is it important, and should we be afraid?|url=https://qz.com/1046350/the-quartz-guide-to-artificial-intelligence-what-is-it-why-is-it-important-and-should-we-be-afraid/|access-date=3 February 2018|work=Quartz|date=10 September 2017}}</ref> In 2017, ImageNet stated it would roll out a new, much more difficult challenge in 2018 that involves classifying 3D objects using natural language. Because creating 3D data is more costly than annotating a pre-existing 2D image, the dataset is expected to be smaller. The applications of progress in this area would range from robotic navigation to [[augmented reality]].<ref name="New Scientist"/>


== Bias in ImageNet ==
A study of the history of the multiple layers ([[Taxonomy (general)|taxonomy]], object classes and labeling) of ImageNet and WordNet in 2019 described how [[Algorithmic bias|bias]]{{clarification needed|date=December 2023}} is deeply embedded in most classification approaches for all sorts of images.<ref>{{Cite magazine|url=https://www.wired.com/story/viral-app-labels-you-isnt-what-you-think/|title=The Viral App That Labels You Isn't Quite What You Think|magazine=Wired|access-date=22 September 2019|issn=1059-1028}}</ref><ref>{{Cite news |url=https://www.theguardian.com/technology/2019/sep/17/imagenet-roulette-asian-racist-slur-selfie |title=The viral selfie app ImageNet Roulette seemed fun – until it called me a racist slur |last=Wong |first=Julia Carrie |author-link=Julia Carrie Wong |date=18 September 2019 |work=The Guardian|access-date=22 September 2019 |issn=0261-3077}}</ref><ref>{{Cite web|url=https://www.excavating.ai/|title=Excavating AI: The Politics of Training Sets for Machine Learning|last1=Crawford|first1=Kate|last2=Paglen|first2=Trevor|date=19 September 2019|website=-|access-date=22 September 2019}}</ref><ref>{{cite arXiv|last=Lyons|first=Michael|date=24 December 2020|title=Excavating "Excavating AI": The Elephant in the Gallery|eprint=2009.01215}}</ref> ImageNet is working to address various sources of bias.<ref>{{Cite web|url=http://image-net.org/update-sep-17-2019.php|title=Towards Fairer Datasets: Filtering and Balancing the Distribution of the People Subtree in the ImageNet Hierarchy|date=17 September 2019|website=image-net.org|access-date=22 September 2019}}</ref>


== See also ==

Latest revision as of 02:08, 25 April 2024

The ImageNet project is a large visual database designed for use in visual object recognition software research. More than 14 million[1][2] images have been hand-annotated by the project to indicate what objects are pictured and in at least one million of the images, bounding boxes are also provided.[3] ImageNet contains more than 20,000 categories,[2] with a typical category, such as "balloon" or "strawberry", consisting of several hundred images.[4] The database of annotations of third-party image URLs is freely available directly from ImageNet, though the actual images are not owned by ImageNet.[5] Since 2010, the ImageNet project runs an annual software contest, the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), where software programs compete to correctly classify and detect objects and scenes. The challenge uses a "trimmed" list of one thousand non-overlapping classes.[6]

Significance for deep learning[edit]

On 30 September 2012, a convolutional neural network (CNN) called AlexNet[7] achieved a top-5 error of 15.3% in the ImageNet 2012 Challenge, more than 10.8 percentage points lower than that of the runner up. Using convolutional neural networks was feasible due to the use of graphics processing units (GPUs) during training,[7] an essential ingredient of the deep learning revolution. According to The Economist, "Suddenly people started to pay attention, not just within the AI community but across the technology industry as a whole."[4][8][9]

In 2015, AlexNet was outperformed by Microsoft's very deep CNN with over 100 layers, which won the ImageNet 2015 contest.[10]

History of the database[edit]

AI researcher Fei-Fei Li began working on the idea for ImageNet in 2006. At a time when most AI research focused on models and algorithms, Li wanted to expand and improve the data available to train AI algorithms.[11] In 2007, Li met with Princeton professor Christiane Fellbaum, one of the creators of WordNet, to discuss the project. As a result of this meeting, Li went on to build ImageNet starting from the word database of WordNet and using many of its features.[12]

As an assistant professor at Princeton, Li assembled a team of researchers to work on the ImageNet project. They used Amazon Mechanical Turk to help with the classification of images.[12]

They presented their database for the first time as a poster at the 2009 Conference on Computer Vision and Pattern Recognition (CVPR) in Florida.[12][13][14]

Dataset[edit]

ImageNet crowdsources its annotation process. Image-level annotations indicate the presence or absence of an object class in an image, such as "there are tigers in this image" or "there are no tigers in this image". Object-level annotations provide a bounding box around the (visible part of the) indicated object. ImageNet uses a variant of the broad WordNet schema to categorize objects, augmented with 120 categories of dog breeds to showcase fine-grained classification.[6] One downside of WordNet use is the categories may be more "elevated" than would be optimal for ImageNet: "Most people are more interested in Lady Gaga or the iPod Mini than in this rare kind of diplodocus."[clarification needed] In 2012, ImageNet was the world's largest academic user of Mechanical Turk. The average worker identified 50 images per minute.[2]

Subsets of the dataset

Various subsets of the ImageNet dataset are used in various contexts, the most widely used being the "ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2012–2017 image classification and localization dataset". This subset is also referred to in the research literature as ImageNet-1K or ILSVRC2017, reflecting the original ILSVRC challenge that involved 1,000 classes. ImageNet-1K contains 1,281,167 training images, 50,000 validation images and 100,000 test images.[15] The full original dataset is referred to as ImageNet-21K; it contains 14,197,122 images divided into 21,841 classes. Some papers round this up and name it ImageNet-22K.[16]
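The ImageNet-1K split sizes quoted above can be sanity-checked in a few lines (the per-class figure is simple arithmetic over the published counts, not an official statistic):

```python
# ImageNet-1K (ILSVRC 2012-2017 classification) split sizes from the text.
splits = {"train": 1_281_167, "val": 50_000, "test": 100_000}

total = sum(splits.values())
per_class = splits["train"] / 1_000   # 1,000 classes in ImageNet-1K

print(total)      # 1431167 labelled images across the three splits
print(per_class)  # ~1281 training images per class on average
```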

History of the ImageNet challenge

Error rate history on ImageNet (showing best result per team and up to 10 entries per year)

The ILSVRC aims to "follow in the footsteps" of the smaller-scale PASCAL VOC challenge, established in 2005, which contained only about 20,000 images and twenty object classes.[6] To "democratize" ImageNet, Fei-Fei Li proposed to the PASCAL VOC team a collaboration, beginning in 2010, where research teams would evaluate their algorithms on the given data set, and compete to achieve higher accuracy on several visual recognition tasks.[12]

The resulting annual competition is now known as the ImageNet Large Scale Visual Recognition Challenge (ILSVRC). The ILSVRC uses a "trimmed" list of only 1,000 image categories or "classes", including 90 of the 120 dog breeds classified by the full ImageNet schema.[6] The 2010s saw dramatic progress in image processing. Around 2011, a good ILSVRC classification top-5 error rate was 25%. In 2012, a deep convolutional neural net called AlexNet achieved 16%; in the next couple of years, top-5 error rates fell to a few percent.[17] While the 2012 breakthrough "combined pieces that were all there before", the dramatic quantitative improvement marked the start of an industry-wide artificial intelligence boom.[4] By 2015, researchers at Microsoft reported that their CNNs exceeded human ability at the narrow ILSVRC tasks.[10][18] However, as one of the challenge's organizers, Olga Russakovsky, pointed out in 2015, the programs only have to identify images as belonging to one of a thousand categories; humans can recognize a larger number of categories, and also (unlike the programs) can judge the context of an image.[19]

By 2014, more than fifty institutions were participating in the ILSVRC.[6] In 2017, 29 of 38 competing teams achieved greater than 95% accuracy.[20] That year, ImageNet stated it would roll out a new, much more difficult challenge in 2018 involving classifying 3D objects using natural language. Because creating 3D data is more costly than annotating a pre-existing 2D image, the dataset is expected to be smaller. The applications of progress in this area would range from robotic navigation to augmented reality.[1]

Bias in ImageNet

A study of the history of the multiple layers (taxonomy, object classes and labeling) of ImageNet and WordNet in 2019 described how bias[clarification needed] is deeply embedded in most classification approaches for all sorts of images.[21][22][23][24] ImageNet is working to address various sources of bias.[25]

See also

References

  1. ^ a b "New computer vision challenge wants to teach robots to see in 3D". New Scientist. 7 April 2017. Retrieved 3 February 2018.
  2. ^ a b c Markoff, John (19 November 2012). "For Web Images, Creating New Technology to Seek and Find". The New York Times. Retrieved 3 February 2018.
  3. ^ "ImageNet". 7 September 2020. Archived from the original on 7 September 2020. Retrieved 11 October 2022.
  4. ^ a b c "From not working to neural networking". The Economist. 25 June 2016. Retrieved 3 February 2018.
  5. ^ "ImageNet Overview". ImageNet. Retrieved 15 October 2022.
  6. ^ a b c d e Russakovsky, Olga*; Deng, Jia*; Su, Hao; Krause, Jonathan; Satheesh, Sanjeev; Ma, Sean; Huang, Zhiheng; Karpathy, Andrej; Khosla, Aditya; Bernstein, Michael; Berg, Alexander C.; Fei-Fei, Li (2015). "ImageNet Large Scale Visual Recognition Challenge" (* = equal contribution). IJCV.
  7. ^ a b Krizhevsky, Alex; Sutskever, Ilya; Hinton, Geoffrey E. (June 2017). "ImageNet classification with deep convolutional neural networks" (PDF). Communications of the ACM. 60 (6): 84–90. doi:10.1145/3065386. ISSN 0001-0782. S2CID 195908774. Retrieved 24 May 2017.
  8. ^ "Machines 'beat humans' for a growing number of tasks". Financial Times. 30 November 2017. Retrieved 3 February 2018.
  9. ^ Gershgorn, Dave (18 June 2018). "The inside story of how AI got good enough to dominate Silicon Valley". Quartz. Retrieved 10 December 2018.
  10. ^ a b He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing; Sun, Jian (2016). "Deep Residual Learning for Image Recognition". 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pp. 770–778. arXiv:1512.03385. doi:10.1109/CVPR.2016.90. ISBN 978-1-4673-8851-1. S2CID 206594692.
  11. ^ Hempel, Jesse (13 November 2018). "Fei-Fei Li's Quest to Make AI Better for Humanity". Wired. Retrieved 5 May 2019. When Li, who had moved back to Princeton to take a job as an assistant professor in 2007, talked up her idea for ImageNet, she had a hard time getting faculty members to help out. Finally, a professor who specialized in computer architecture agreed to join her as a collaborator.
  12. ^ a b c d Gershgorn, Dave (26 July 2017). "The data that transformed AI research—and possibly the world". Quartz. Atlantic Media Co. Retrieved 26 July 2017. Having read about WordNet's approach, Li met with professor Christiane Fellbaum, a researcher influential in the continued work on WordNet, during a 2006 visit to Princeton.
  13. ^ Deng, Jia; Dong, Wei; Socher, Richard; Li, Li-Jia; Li, Kai; Fei-Fei, Li (2009), "ImageNet: A Large-Scale Hierarchical Image Database" (PDF), 2009 conference on Computer Vision and Pattern Recognition, archived from the original (PDF) on 15 January 2021, retrieved 26 July 2017
  14. ^ Li, Fei-Fei (23 March 2015), How we're teaching computers to understand pictures, retrieved 16 December 2018
  15. ^ "ImageNet". www.image-net.org. Retrieved 19 October 2022.
  16. ^ Ridnik, Tal; Ben-Baruch, Emanuel; Noy, Asaf; Zelnik-Manor, Lihi (5 August 2021). "ImageNet-21K Pretraining for the Masses". arXiv:2104.10972 [cs.CV].
  17. ^ Robbins, Martin (6 May 2016). "Does an AI need to make love to Rembrandt's girlfriend to make art?". The Guardian. Retrieved 22 June 2016.
  18. ^ Markoff, John (10 December 2015). "A Learning Advance in Artificial Intelligence Rivals Human Abilities". The New York Times. Retrieved 22 June 2016.
  19. ^ Aron, Jacob (21 September 2015). "Forget the Turing test – there are better ways of judging AI". New Scientist. Retrieved 22 June 2016.
  20. ^ Gershgorn, Dave (10 September 2017). "The Quartz guide to artificial intelligence: What is it, why is it important, and should we be afraid?". Quartz. Retrieved 3 February 2018.
  21. ^ "The Viral App That Labels You Isn't Quite What You Think". Wired. ISSN 1059-1028. Retrieved 22 September 2019.
  22. ^ Wong, Julia Carrie (18 September 2019). "The viral selfie app ImageNet Roulette seemed fun – until it called me a racist slur". The Guardian. ISSN 0261-3077. Retrieved 22 September 2019.
  23. ^ Crawford, Kate; Paglen, Trevor (19 September 2019). "Excavating AI: The Politics of Training Sets for Machine Learning". Retrieved 22 September 2019.
  24. ^ Lyons, Michael (24 December 2020). "Excavating "Excavating AI": The Elephant in the Gallery". arXiv:2009.01215.
  25. ^ "Towards Fairer Datasets: Filtering and Balancing the Distribution of the People Subtree in the ImageNet Hierarchy". image-net.org. 17 September 2019. Retrieved 22 September 2019.

External links