{{short description|Artificial intelligence model geared towards programming}}


'''OpenAI Codex''' is an [[artificial intelligence]] model developed by [[OpenAI]]. It parses natural language and generates [[computer program|code]] in response. It powers [[GitHub Copilot]], a programming [[autocompletion]] tool for select [[Integrated development environment|IDEs]], like [[Visual Studio Code]] and [[Vim (text editor)|Neovim]].<ref name="OAI">{{cite web|last=Zaremba|first=Wojciech|author-link=Wojciech Zaremba|date=August 10, 2021|title=OpenAI Codex|url=https://openai.com/blog/openai-codex/|access-date=2021-09-03|website=[[OpenAI]]|archive-date=2023-02-03|archive-url=https://web.archive.org/web/20230203201912/https://openai.com/blog/openai-codex/|url-status=live}}</ref> Codex is a descendant of OpenAI's [[GPT-3]] model, [[fine-tuning (machine learning)|fine-tuned]] for use in programming applications.


OpenAI released an [[API]] for Codex in [[closed beta]].<ref name="OAI" /> In March 2023, OpenAI shut down access to Codex.<ref>{{Cite web |last=Kemper |first=Jonathan |date=2023-03-22 |title=OpenAI kills its Codex code model, recommends GPT3.5 instead |url=https://the-decoder.com/openai-kills-code-model-codex/ |access-date=2023-03-29 |website=THE DECODER |language=en-US |archive-date=2023-06-01 |archive-url=https://web.archive.org/web/20230601195835/https://the-decoder.com/openai-kills-code-model-codex/ |url-status=live }}</ref> Due to public appeals from researchers, OpenAI reversed course.<ref>{{Cite tweet |user=OfficialLoganK |author=Logan Kilpatrick |number=1638336152800206858 |title=Hey Carolyn, we will continue to support Codex access via our Researcher Access Program. Sorry for any confusion and hopefully the research is going well! |access-date=2023-04-08}}</ref> The Codex model remains available to researchers through the OpenAI Researcher Access Program.<ref>{{Cite web |title=Researcher Access Program application |url=https://openai.com/form/researcher-access-program |access-date=2023-04-08 |website=openai.com |language=en-US |archive-date=2023-10-10 |archive-url=https://web.archive.org/web/20231010073704/https://openai.com/form/researcher-access-program |url-status=live }}</ref>


== Capabilities ==
Based on GPT-3, a [[neural network]] trained on text, Codex was additionally trained on 159 gigabytes of [[Python (programming language)|Python]] code from 54 million [[GitHub]] repositories.<ref name="VB-bias">{{Cite news|last=Wiggers|first=Kyle|date=July 8, 2021|title=OpenAI warns AI behind GitHub's Copilot may be susceptible to bias|work=[[VentureBeat]]|url=https://venturebeat.com/2021/07/08/openai-warns-ai-behind-githubs-copilot-may-be-susceptible-to-bias/|access-date=2021-09-03|archive-date=2023-02-03|archive-url=https://web.archive.org/web/20230203201912/https://venturebeat.com/business/openai-warns-ai-behind-githubs-copilot-may-be-susceptible-to-bias/|url-status=live}}</ref><ref name="IQ">{{Cite news|last=Alford|first=Anthony|date=August 31, 2021|title=OpenAI Announces 12 Billion Parameter Code-Generation AI Codex|work=InfoQ|url=https://www.infoq.com/news/2021/08/openai-codex/|access-date=2021-09-03|archive-date=2022-07-09|archive-url=https://web.archive.org/web/20220709221205/https://www.infoq.com/news/2021/08/openai-codex/|url-status=live}}</ref> A typical use case of Codex is for a user to type a comment, such as "<code>//compute the moving average of an array for a given window size</code>", then use the AI to suggest a block of code that satisfies that comment prompt.<ref name="RegTA">{{Cite news|last1=Anderson|first1=Tim|last2=Quach|first2=Katyanna|date=July 6, 2021|title=GitHub Copilot auto-coder snags emerge, from seemingly spilled secrets to bad code, but some love it|work=[[The Register]]|url=https://www.theregister.com/2021/07/06/github_copilot_autocoder_caught_spilling/|access-date=2021-09-04|archive-date=2023-06-02|archive-url=https://web.archive.org/web/20230602214528/https://www.theregister.com/2021/07/06/github_copilot_autocoder_caught_spilling/|url-status=live}}</ref> OpenAI stated that Codex can complete approximately 37% of requests and is meant to make human programming faster rather than to replace it. According to OpenAI's blog, Codex excels most at "mapping... simple problems to existing code", which they describe as "probably the least fun part of programming".<ref name="SH">{{Cite news|last=Dorrier|first=Jason|date=August 15, 2021|title=OpenAI's Codex Translates Everyday Language Into Computer Code|work=[[SingularityHub]]|url=https://singularityhub.com/2021/08/15/openais-codex-translates-everyday-language-into-computer-code/|access-date=2021-09-03|archive-date=2023-05-26|archive-url=https://web.archive.org/web/20230526045651/https://singularityhub.com/2021/08/15/openais-codex-translates-everyday-language-into-computer-code/|url-status=live}}</ref><ref name="VB">{{Cite news|last=Dickson|first=Ben|date=August 16, 2021|title=What to expect from OpenAI's Codex API|work=[[VentureBeat]]|url=https://venturebeat.com/2021/08/16/what-to-expect-from-openais-codex-api/|access-date=2021-09-03|archive-date=2023-02-03|archive-url=https://web.archive.org/web/20230203201913/https://venturebeat.com/ai/what-to-expect-from-openais-codex-api/|url-status=live}}</ref> [[Jeremy Howard (entrepreneur)|Jeremy Howard]], co-founder of [[Fast.ai]], stated that "[Codex] is a way of getting code written without having to write as much code", and that "it is not always correct, but it is just close enough".<ref name="NYT">{{Cite news|last=Metz|first=Cade|date=September 9, 2021|title=A.I. Can Now Write Its Own Computer Code. That's Good News for Humans.|work=[[The New York Times]]|url=https://www.nytimes.com/2021/09/09/technology/codex-artificial-intelligence-coding.html|access-date=2021-09-16|archive-date=2022-03-30|archive-url=https://web.archive.org/web/20220330010719/https://www.nytimes.com/2021/09/09/technology/codex-artificial-intelligence-coding.html|url-status=live}}</ref> According to a paper written by OpenAI researchers, when Codex attempted each test case 100 times, it generated working solutions for 70.2% of prompts.<ref name="arXiv">{{Cite arXiv|last1=Chen|first1=Mark|last2=Tworek|first2=Jerry|last3=Jun|first3=Heewoo|last4=Yuan|first4=Qiming|last5=Pinto|first5=Henrique Ponde de Oliveira|last6=Kaplan|first6=Jared|last7=Edwards|first7=Harri|last8=Burda|first8=Yuri|last9=Joseph|first9=Nicholas|last10=Brockman|first10=Greg|last11=Ray|first11=Alex|date=2021-07-14|title=Evaluating Large Language Models Trained on Code |eprint=2107.03374 |class=cs}}</ref>
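
This workflow can be illustrated with a short, hypothetical exchange in Python, the language in which Codex performs best; the comment restates the example prompt above in Python syntax, and the function body is the kind of completion Codex might suggest, not verbatim model output:

<syntaxhighlight lang="python">
# compute the moving average of an array for a given window size
def moving_average(values, window_size):
    # average each run of window_size consecutive elements
    return [sum(values[i:i + window_size]) / window_size
            for i in range(len(values) - window_size + 1)]

moving_average([1, 2, 3, 4, 5], 3)  # -> [2.0, 3.0, 4.0]
</syntaxhighlight>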


OpenAI claims that Codex can create code in over a dozen programming languages, including [[Go (programming language)|Go]], [[JavaScript]], [[Perl]], [[PHP]], [[Ruby (programming language)|Ruby]], [[Shell (programming language)|Shell]], [[Swift (programming language)|Swift]], and [[TypeScript]], though it is most effective in Python.<ref name="OAI" /> According to ''[[VentureBeat]]'', demonstrations uploaded by OpenAI showed impressive [[coreference resolution]] capabilities. The demonstrators were able to create a [[browser game]] in JavaScript and generate data science charts using [[matplotlib]].<ref name="VB" />
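
The chart-generating demonstration is described only in broad terms; a minimal sketch of the kind of matplotlib code such a prompt could yield (the prompt wording and data here are invented for illustration):

<syntaxhighlight lang="python">
import matplotlib.pyplot as plt

# "plot monthly sales as a bar chart" -- hypothetical natural-language prompt
months = ["Jan", "Feb", "Mar", "Apr"]
sales = [120, 135, 150, 143]

plt.bar(months, sales)      # one bar per month
plt.xlabel("Month")
plt.ylabel("Sales")
plt.title("Monthly sales")
plt.show()
</syntaxhighlight>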


OpenAI showed that Codex can interface with services and apps such as [[Mailchimp]], [[Microsoft Word]], [[Spotify]], and [[Google Calendar]].<ref name="VB" /><ref name="Verge">{{Cite news|last=Vincent|first=James|date=August 10, 2021|title=OpenAI can translate English into code with its new machine learning software Codex|work=[[The Verge]]|url=https://www.theverge.com/2021/8/10/22618128/openai-codex-natural-language-into-code-api-beta-access|access-date=2021-09-03|archive-date=2021-09-02|archive-url=https://web.archive.org/web/20210902142401/https://www.theverge.com/2021/8/10/22618128/openai-codex-natural-language-into-code-api-beta-access|url-status=live}}</ref> [[Microsoft]] is {{vague|text=reportedly interested in exploring|reason=this sentence says only marginally more than nothing|date=March 2023}} Codex's capabilities.<ref name="Verge" />


== Issues ==
OpenAI demonstrations showcased flaws such as inefficient code and one-off quirks in code samples.<ref name="VB" /> In an interview with ''[[The Verge]]'', OpenAI [[chief technology officer]] [[Greg Brockman]] said that "sometimes [Codex] doesn't quite know exactly what you're asking" and that it can require some trial and error.<ref name="Verge" /> OpenAI researchers found that Codex struggles with multi-step and {{clarify|text=higher-level|date=March 2023}} prompts, often failing or yielding counter-intuitive behavior. Additionally, they brought up several safety issues, such as over-reliance by novice programmers, biases based on the training data, and security impacts due to vulnerable code.<ref name="arXiv" />
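
These findings draw on the paper's pass@<math>k</math> evaluation: for each problem, <math>n</math> samples are generated, <math>c</math> of which pass the unit tests, and <math>\operatorname{pass@}k = \mathbb{E}\left[1 - \binom{n-c}{k}\Big/\binom{n}{k}\right]</math> is the probability that at least one of <math>k</math> samples is correct; the 70.2% result quoted above corresponds to pass@100. A sketch of the numerically stable estimator the authors give:

<syntaxhighlight lang="python">
import numpy as np

def pass_at_k(n, c, k):
    # unbiased estimate of pass@k from n samples, c of which are correct
    if n - c < k:
        return 1.0  # every size-k subset must contain a correct sample
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))
</syntaxhighlight>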


''VentureBeat'' stated that because Codex is trained on public data, it could be vulnerable to "data poisoning" via intentional uploads of malicious code.<ref name="VB" /> According to a study by researchers from [[New York University]], approximately 40% of code generated by [[GitHub Copilot]] (which uses Codex) in scenarios relevant to high-risk [[Common Weakness Enumeration|CWEs]] included glitches or other exploitable design flaws.<ref name="RegTC">{{cite arXiv |last1=Pearce |first1=Hammond |last2=Ahmad |first2=Baleegh |last3=Tan |first3=Benjamin |last4=Dolan-Gavitt |first4=Brendan |last5=Karri |first5=Ramesh |date=2021-12-16 |title=Asleep at the Keyboard? Assessing the Security of GitHub Copilot's Code Contributions |class=cs.CR |eprint=2108.09293 }}</ref>


===Copyright===
The [[Free Software Foundation]] expressed concerns that code snippets generated by Copilot and Codex could [[copyright infringement|violate copyright]], in particular the condition of the [[GPL]] that requires [[derivative work]]s to be licensed under equivalent terms.<ref name="IW-FSF">{{Cite news|last=Krill|first=Paul|date=August 2, 2021|title=GitHub Copilot is 'unacceptable and unjust,' says Free Software Foundation|work=[[InfoWorld]]|url=https://www.infoworld.com/article/3627319/github-copilot-is-unacceptable-and-unjust-says-free-software-foundation.html|access-date=2021-09-03|archive-date=2021-09-03|archive-url=https://web.archive.org/web/20210903201419/https://www.infoworld.com/article/3627319/github-copilot-is-unacceptable-and-unjust-says-free-software-foundation.html|url-status=live}}</ref> Issues they raised include whether training on public repositories falls into [[fair use]] or not, how developers could discover infringing generated code, whether trained [[machine learning]] models could be considered modifiable source code or a compilation of the training data, and if machine learning models could themselves be copyrighted and by whom.<ref name="IW-FSF" /><ref name="FSF">{{Cite news|last=Robertson|first=Donald|date=2021-07-28|title=FSF-funded call for white papers on philosophical and legal questions around Copilot: Submit before Monday, August 23, 2021|work=[[Free Software Foundation]]|url=https://www.fsf.org/blogs/licensing/fsf-funded-call-for-white-papers-on-philosophical-and-legal-questions-around-copilot|access-date=2021-09-04|archive-date=2021-08-11|archive-url=https://web.archive.org/web/20210811003717/https://www.fsf.org/blogs/licensing/fsf-funded-call-for-white-papers-on-philosophical-and-legal-questions-around-copilot|url-status=live}}</ref> An internal GitHub study found that approximately 0.1% of generated code contained direct copies from the training data. In one example the model outputted the training data code implementing the [[fast inverse square root]] algorithm, including comments and an incorrect [[copyright notice]].<ref name="RegTA"/>


In response, OpenAI stated that "legal uncertainty on the copyright implications of training AI systems imposes substantial costs on AI developers and so should be authoritatively resolved."<ref name="RegTA" />
The copyright issues with Codex have been compared to the ''[[Authors Guild, Inc. v. Google, Inc.]]'' court case, in which judges ruled that [[Google Books]]'s use of text snippets from millions of [[Book scanning|scanned books]] constituted fair use.<ref name="RegTA" /><ref name="WIRED">{{Cite magazine|last=Barber|first=Gregory|date=July 12, 2021|title=GitHub's Commercial AI Tool Was Built From Open Source Code|magazine=[[WIRED]]|url=https://www.wired.com/story/github-commercial-ai-tool-built-open-source-code/|access-date=2021-09-04|archive-date=2021-07-25|archive-url=https://web.archive.org/web/20210725233825/https://www.wired.com/story/github-commercial-ai-tool-built-open-source-code/|url-status=live}}</ref> However, Google Books displays each snippet alongside a reference to the book it came from, whereas code generated from Codex's training data carries no such attribution to the original works or their copyright owners.


== References ==
{{reflist}}

{{OpenAI navbox}}

[[Category:Deep learning software applications]]
[[Category:Copyright infringement of software]]
[[Category:Generative pre-trained transformers]]
[[Category:OpenAI|Codex]]
