Background: The generation of innovative research ideas is crucial to advancing the field of medicine. As physicians face increasingly demanding clinical schedules, it is important to identify tools that may expedite the research process. Artificial intelligence (AI) may offer a promising solution by enabling the efficient generation of novel research ideas. This study aimed to assess the feasibility of using AI to build upon existing knowledge by generating innovative research questions.
Methods: A comparative evaluation study was conducted to assess the ability of AI models to generate novel research questions. The prompt "research ideas for adolescent idiopathic scoliosis" was entered into ChatGPT 3.5, Gemini 1.5, Copilot, and Llama 3. Each model returned between 10 and 14 research questions. Each AI-generated response was condensed into a keyword-friendly query and searched in the PubMed database. Results were limited to manuscripts published in the English language from the year 2000 to the present. Each response was then cross-referenced against the PubMed search results and assigned an originality score from 0 to 5 by adding one point for each paper already published on the topic, so that 0 indicated the most original response and 5 indicated a response that was not original at all. The mean originality score for each AI model was calculated manually by summing the originality scores of all of its responses and dividing that sum by the number of responses the model generated. The standard deviation of the originality scores for each AI model was calculated using the STDEV function in Google Sheets (Google, Mountain View, California). Each AI model was also evaluated on its percent novelty, defined as the percentage of its generated responses that yielded an originality score of 0 when searched in PubMed.
Results: Each AI model produced a different number of research questions, all of which were searched in PubMed.
The mean originality scores for ChatGPT, Gemini, Copilot, and Llama were 4.2 ± 1.9, 4.1 ± 1.3, 4.0 ± 1.6, and 3.8 ± 1.7, respectively. Of ChatGPT's 12 responses, 16.67% were completely novel (no prior research had been published on the topic proposed by the AI model). Of Copilot's 10 responses, 10.00% were completely novel, and of Llama's 12 responses, 8.33% were completely novel. None of Gemini's 14 responses yielded an originality score of 0.
Conclusions: Our findings demonstrate that ChatGPT, Llama, and Copilot are capable of generating novel ideas in orthopaedic research. As these models continue to evolve and become more refined, physicians and scientists should consider incorporating them when brainstorming and planning their research studies.
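The scoring arithmetic described in the Methods can be sketched in a few lines of Python. This is a minimal illustration, not the authors' workflow: the function names, the explicit cap at 5 (inferred from the 0-5 scale), and the PubMed hit counts are all hypothetical. The sample standard deviation is used because the STDEV function in Google Sheets computes the sample (n - 1) form.

```python
from statistics import mean, stdev

MAX_SCORE = 5  # assumption: scores are capped at the top of the 0-5 scale


def originality_score(prior_papers: int) -> int:
    """One point per previously published paper, capped at MAX_SCORE."""
    if prior_papers < 0:
        raise ValueError("prior_papers must be non-negative")
    return min(prior_papers, MAX_SCORE)


def summarize(scores: list[int]) -> dict[str, float]:
    """Mean, sample standard deviation, and percent novelty for one AI model."""
    novel = sum(1 for s in scores if s == 0)  # responses with no prior papers
    return {
        "mean": mean(scores),
        "sd": stdev(scores),  # sample SD, the same form as Google Sheets STDEV
        "percent_novelty": 100 * novel / len(scores),
    }


# Hypothetical PubMed hit counts for four responses (illustrative, not study data).
hits = [0, 2, 7, 12]
scores = [originality_score(h) for h in hits]
summary = summarize(scores)
```

With these invented counts, the scores are 0, 2, 5, and 5, and one of the four responses counts as novel. By the same percent-novelty formula, 2 novel responses out of 12 gives 16.67%, which is consistent with the figure reported for ChatGPT.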
Keywords: adolescent idiopathic scoliosis (ais); artificial intelligence; chatgpt-3.5; ortho-surgery; research design.
Copyright © 2024, Leonardo et al.