{{short description|Artificial intelligence model geared towards programming}}
 
'''OpenAI Codex''' is an [[artificial intelligence]] model developed by [[OpenAI]]. It parses natural language and generates [[computer program|code]] in response. It powers [[GitHub Copilot]], a programming [[autocompletion]] tool developed for select [[Integrated development environment|IDEs]], like [[Visual Studio Code]] and [[Vim (text editor)|Neovim]].<ref name="OAI">{{cite web|last=Zaremba|first=Wojciech|author-link=Wojciech Zaremba|date=August 10, 2021|title=OpenAI Codex|url=https://openai.com/blog/openai-codex/|access-date=2021-09-03|website=[[OpenAI]]|archive-date=2023-02-03|archive-url=https://web.archive.org/web/20230203201912/https://openai.com/blog/openai-codex/|url-status=live}}</ref> Codex is a descendant of OpenAI's [[GPT-3]] model, [[fine-tuning (machine learning)|fine-tuned]] for use in programming applications.
 
OpenAI released an [[API]] for Codex in [[closed beta]].<ref name="OAI" /> In March 2023, OpenAI shut down access to Codex.<ref>{{Cite web |last=Kemper |first=Jonathan |date=2023-03-22 |title=OpenAI kills its Codex code model, recommends GPT3.5 instead |url=https://the-decoder.com/openai-kills-code-model-codex/ |access-date=2023-03-29 |website=THE DECODER |language=en-US |archive-date=2023-06-01 |archive-url=https://web.archive.org/web/20230601195835/https://the-decoder.com/openai-kills-code-model-codex/ |url-status=live }}</ref> Due to public appeals from researchers, OpenAI reversed course.<ref>{{Cite tweet |user=OfficialLoganK |author=Logan Kilpatrick |number=1638336152800206858 |title=Hey Carolyn, we will continue to support Codex access via our Researcher Access Program. Sorry for any confusion and hopefully the research is going well! |access-date=2023-04-08}}</ref> The Codex model can still be used by researchers of the OpenAI Research Access Program.<ref>{{Cite web |title=Researcher Access Program application |url=https://openai.com/form/researcher-access-program |access-date=2023-04-08 |website=openai.com |language=en-US |archive-date=2023-10-10 |archive-url=https://web.archive.org/web/20231010073704/https://openai.com/form/researcher-access-program |url-status=live }}</ref>
 
== Capabilities ==
Based on GPT-3, a [[neural network]] trained on text, Codex was additionally trained on 159 gigabytes of [[Python (programming language)|Python]] code from 54 million [[GitHub]] repositories.<ref name="VB-bias">{{Cite news|last=Wiggers|first=Kyle|date=July 8, 2021|title=OpenAI warns AI behind GitHub's Copilot may be susceptible to bias|work=[[VentureBeat]]|url=https://venturebeat.com/2021/07/08/openai-warns-ai-behind-githubs-copilot-may-be-susceptible-to-bias/|access-date=2021-09-03|archive-date=2023-02-03|archive-url=https://web.archive.org/web/20230203201912/https://venturebeat.com/business/openai-warns-ai-behind-githubs-copilot-may-be-susceptible-to-bias/|url-status=live}}</ref><ref name="IQ">{{Cite news|last=Alford|first=Anthony|date=August 31, 2021|title=OpenAI Announces 12 Billion Parameter Code-Generation AI Codex|work=InfoQ|url=https://www.infoq.com/news/2021/08/openai-codex/|access-date=2021-09-03|archive-date=2022-07-09|archive-url=https://web.archive.org/web/20220709221205/https://www.infoq.com/news/2021/08/openai-codex/|url-status=live}}</ref> A typical use case of Codex is for a user to type a comment, such as "<code>//compute the moving average of an array for a given window size</code>", then use the AI to suggest a block of code that satisfies that comment.<ref name="RegTA">{{Cite news|last1=Anderson|first1=Tim|last2=Quach|first2=Katyanna|date=July 6, 2021|title=GitHub Copilot auto-coder snags emerge, from seemingly spilled secrets to bad code, but some love it|work=[[The Register]]|url=https://www.theregister.com/2021/07/06/github_copilot_autocoder_caught_spilling/|access-date=2021-09-04|archive-date=2023-06-02|archive-url=https://web.archive.org/web/20230602214528/https://www.theregister.com/2021/07/06/github_copilot_autocoder_caught_spilling/|url-status=live}}</ref> OpenAI has stated that Codex can complete approximately 37% of requests and is meant to make human programming faster rather than to replace it.
According to OpenAI's blog, Codex excels most at "mapping [...] simple problems to existing code", which they describe as "probably the least fun part of programming".<ref name="SH">{{Cite news|last=Dorrier|first=Jason|date=August 15, 2021|title=OpenAI's Codex Translates Everyday Language Into Computer Code|work=[[SingularityHub]]|url=https://singularityhub.com/2021/08/15/openais-codex-translates-everyday-language-into-computer-code/|access-date=2021-09-03|archive-date=2023-05-26|archive-url=https://web.archive.org/web/20230526045651/https://singularityhub.com/2021/08/15/openais-codex-translates-everyday-language-into-computer-code/|url-status=live}}</ref><ref name="VB">{{Cite news|last=Dickson|first=Ben|date=August 16, 2021|title=What to expect from OpenAI's Codex API|work=[[VentureBeat]]|url=https://venturebeat.com/2021/08/16/what-to-expect-from-openais-codex-api/|access-date=2021-09-03|archive-date=2023-02-03|archive-url=https://web.archive.org/web/20230203201913/https://venturebeat.com/ai/what-to-expect-from-openais-codex-api/|url-status=live}}</ref> [[Jeremy Howard (entrepreneur)|Jeremy Howard]], co-founder of [[Fast.ai]], stated that "[[Codex]] is a way of getting code written without having to write as much code", and that "it is not always correct, but it is just close enough".<ref name="NYT">{{Cite news|last=Metz|first=Cade|date=September 9, 2021|title=A.I. Can Now Write Its Own Computer Code. That's Good News for Humans.|work=[[The New York Times]]|url=https://www.nytimes.com/2021/09/09/technology/codex-artificial-intelligence-coding.html|access-date=2021-09-16|archive-date=2022-03-30|archive-url=https://web.archive.org/web/20220330010719/https://www.nytimes.com/2021/09/09/technology/codex-artificial-intelligence-coding.html|url-status=live}}</ref> According to a paper written by OpenAI researchers, when Codex attempted each test case 100 times, it generated working solutions for 70.2% of prompts.<ref name="arXiv">{{Cite arXiv|last1=Chen|first1=Mark|last2=Tworek|first2=Jerry|last3=Jun|first3=Heewoo|last4=Yuan|first4=Qiming|last5=Pinto|first5=Henrique Ponde de Oliveira|last6=Kaplan|first6=Jared|last7=Edwards|first7=Harri|last8=Burda|first8=Yuri|last9=Joseph|first9=Nicholas|last10=Brockman|first10=Greg|last11=Ray|first11=Alex|date=2021-07-14|title=Evaluating Large Language Models Trained on Code |eprint=2107.03374 |class=cs}}</ref>
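For illustration, a completion of the kind Codex might suggest for the moving-average comment above could resemble the following sketch (a hypothetical example written for this article, not verified Codex output):

```python
def moving_average(arr, window_size):
    """Compute the moving average of arr for the given window size."""
    if window_size <= 0 or window_size > len(arr):
        raise ValueError("window size must be between 1 and len(arr)")
    # Average each consecutive run of window_size elements.
    return [sum(arr[i:i + window_size]) / window_size
            for i in range(len(arr) - window_size + 1)]
```

As the paper's pass-rate figures suggest, such a completion may need review and correction by the programmer before use.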
 
OpenAI claims that Codex can create code in over a dozen programming languages, including [[Go (programming language)|Go]], [[JavaScript]], [[Perl]], [[PHP]], [[Ruby (programming language)|Ruby]], [[Shell (programming language)|Shell]], [[Swift (programming language)|Swift]], and [[TypeScript]], though it is most effective in Python.<ref name="OAI" /> According to ''[[VentureBeat]]'', demonstrations uploaded by OpenAI showed impressive [[coreference resolution]] capabilities. The demonstrators were able to create a [[browser game]] in JavaScript and generate data science charts using [[matplotlib]].<ref name="VB" />
 
 
OpenAI showed that Codex can interface with services and apps such as [[Mailchimp]], [[Microsoft Word]], [[Spotify]], and [[Google Calendar]].<ref name="VB" /><ref name="Verge">{{Cite news|last=Vincent|first=James|date=August 10, 2021|title=OpenAI can translate English into code with its new machine learning software Codex|work=[[The Verge]]|url=https://www.theverge.com/2021/8/10/22618128/openai-codex-natural-language-into-code-api-beta-access|access-date=2021-09-03|archive-date=2021-09-02|archive-url=https://web.archive.org/web/20210902142401/https://www.theverge.com/2021/8/10/22618128/openai-codex-natural-language-into-code-api-beta-access|url-status=live}}</ref> [[Microsoft]] is {{vague|text=reportedly interested in exploring|reason=this sentence says only marginally more than nothing|date=March 2023}} Codex's capabilities.<ref name="Verge" />
 
== Issues ==
OpenAI demonstrations showcased flaws such as inefficient code and one-off quirks in code samples.<ref name="VB" /> In an interview with ''[[The Verge]]'', OpenAI [[chief technology officer]] [[Greg Brockman]] said that "sometimes [Codex] doesn't quite know exactly what you're asking" and that it can require some trial and error.<ref name="Verge" /> OpenAI researchers found that Codex struggles with multi-step and {{clarify|text=higher-level|date=March 2023}} prompts, often failing or yielding counter-intuitive behavior. Additionally, they brought up several safety issues, such as over-reliance by novice programmers, biases based on the training data, and security impacts due to vulnerable code.<ref name="arXiv" />
 
''VentureBeat'' has stated that because Codex is trained on public data, it could be vulnerable to "data poisoning" via intentional uploads of malicious code.<ref name="VB" /> According to a study by researchers from [[New York University]], approximately 40% of code generated by [[GitHub Copilot]] (which uses Codex) in scenarios relevant to high-risk [[Common Weakness Enumeration|CWEs]] included glitches or other exploitable design flaws.<ref name="RegTC">{{cite arXiv |last1=Pearce |first1=Hammond |last2=Ahmad |first2=Baleegh |last3=Tan |first3=Benjamin |last4=Dolan-Gavitt |first4=Brendan |last5=Karri |first5=Ramesh |date=2021-12-16 |title=Asleep at the Keyboard? Assessing the Security of GitHub Copilot's Code Contributions |class=cs.CR |eprint=2108.09293 }}</ref>
 
===Copyright===
The [[Free Software Foundation]] has expressed concerns that code snippets generated by Copilot and Codex could unknowingly [[copyright infringement|violate copyright]], and in particular the condition of the [[GPL]] that requires [[derivative work]]s to be licensed under equivalent terms.<ref name="IW-FSF">{{Cite news|last=Krill|first=Paul|date=August 2, 2021|title=GitHub Copilot is 'unacceptable and unjust,' says Free Software Foundation|work=[[InfoWorld]]|url=https://www.infoworld.com/article/3627319/github-copilot-is-unacceptable-and-unjust-says-free-software-foundation.html|access-date=2021-09-03|archive-date=2021-09-03|archive-url=https://web.archive.org/web/20210903201419/https://www.infoworld.com/article/3627319/github-copilot-is-unacceptable-and-unjust-says-free-software-foundation.html|url-status=live}}</ref> Issues they raised include whether training on public repositories falls into [[fair use]] or not, how developers could discover infringing generated code, whether trained [[machine learning]] models could be considered modifiable source code or a compilation of the training data, and if machine learning models could themselves be copyrighted and by whom.<ref name="IW-FSF" /><ref name="FSF">{{Cite news|last=Robertson|first=Donald|date=2021-07-28|title=FSF-funded call for white papers on philosophical and legal questions around Copilot: Submit before Monday, August 23, 2021|work=[[Free Software Foundation]]|url=https://www.fsf.org/blogs/licensing/fsf-funded-call-for-white-papers-on-philosophical-and-legal-questions-around-copilot|access-date=2021-09-04|archive-date=2021-08-11|archive-url=https://web.archive.org/web/20210811003717/https://www.fsf.org/blogs/licensing/fsf-funded-call-for-white-papers-on-philosophical-and-legal-questions-around-copilot|url-status=live}}</ref> An internal GitHub study found that approximately 0.1% of generated code contained direct copies from the training data. 
In one example, the model outputted training data code implementing the [[fast inverse square root]] algorithm, including comments and an incorrect [[copyright notice]].<ref name="RegTA"/>
 
In response, OpenAI has stated that "legal uncertainty on the copyright implications of training AI systems imposes substantial costs on AI developers and so should be authoritatively resolved."<ref name="RegTA" />

The copyright issues with Codex have been compared to the ''[[Authors Guild, Inc. v. Google, Inc.]]'' court case, in which judges ruled that [[Google Books]]'s use of text snippets from millions of [[Book scanning|scanned books]] constituted fair use.<ref name="RegTA" /><ref name="WIRED">{{Cite magazine|last=Barber|first=Gregory|date=July 12, 2021|title=GitHub's Commercial AI Tool Was Built From Open Source Code|magazine=[[WIRED]]|url=https://www.wired.com/story/github-commercial-ai-tool-built-open-source-code/|access-date=2021-09-04|archive-date=2021-07-25|archive-url=https://web.archive.org/web/20210725233825/https://www.wired.com/story/github-commercial-ai-tool-built-open-source-code/|url-status=live}}</ref> However, text snippets from books retain a clear reference to the copyright owner, whereas code generated from a model's compiled training data carries no such reference.
 
== References ==
{{reflist}}
 
{{OpenAI navbox}}
[[Category:Artificial intelligence]]
[[Category:Deep learning software applications]]
[[Category:Copyright infringement of software]]
[[Category:Generative pre-trained transformers]]
[[Category:OpenAI|Codex]]