Using Large Language Models in Public Transit Systems: San Antonio as a Case Study

Ramya Jonnala
Texas A&M University, San Antonio
San Antonio, Texas
[email protected]
Gongbo Liang, Jeong Yang, Izzat Alsmadi
Texas A&M University, San Antonio
San Antonio, Texas
{gliang, JYang, ialsmadi}@tamusa.edu
Abstract

The integration of large language models (LLMs) into public transit systems represents a significant advancement in urban transportation management and passenger experience. This study examines the impact of LLMs within San Antonio’s public transit system, leveraging their capabilities in natural language processing, data analysis, and real-time communication. Using General Transit Feed Specification (GTFS) data and other public transportation information, the research highlights the potential of LLMs to enhance route planning, reduce wait times, and provide personalized travel assistance. The San Antonio case study is part of a project that aims to demonstrate how LLMs can optimize resource allocation, improve passenger satisfaction, and support decision-making in transit management. We evaluated LLM responses to questions that test both understanding and information retrieval. Ultimately, we believe that the adoption of LLMs in public transit systems can lead to more efficient, responsive, and user-friendly transportation networks, providing a model for other cities to follow.

Keywords Large Language Models · GTFS · Public Transit Systems

1 Introduction

The advent of artificial intelligence (AI) and machine learning (ML) has ushered in a new era of technological advancements that are transforming various sectors, such as cyber security [1, 2, 3], healthcare [4, 5, 6], and public transportation [7, 8, 9]. Among these innovations, large language models (LLMs), such as OpenAI’s GPT series [10, 11], have demonstrated exceptional capabilities in natural language processing, understanding, and generation. These models can analyze vast amounts of data [12, 13], generate human-like text [14], and facilitate complex decision-making processes [15, 16], making them potentially invaluable tools for enhancing public transit systems.

Public transit systems are the backbone of urban mobility, providing essential services to millions of passengers daily [17, 18]. Efficient and reliable public transportation is crucial for reducing traffic congestion, minimizing environmental impact, and promoting equitable access to mobility [19]. However, transit agencies often face challenges such as fluctuating passenger demand, route optimization, real-time communication with passengers, and efficient resource allocation [20, 21]. Traditional methods of addressing these issues may fall short due to their limited scalability and adaptability.

San Antonio, one of the fastest-growing cities in the United States, presents a unique case study for examining the integration of LLMs in public transit. The city’s rapid population growth has increased the demand for efficient public transportation solutions [22, 23]. The deployment of LLMs offers a promising avenue for addressing current challenges and future demands in public transportation as well as many other domains.

This study aims to investigate the potential of LLMs to improve various aspects of San Antonio’s public transit system. The potential benefits of employing LLMs in public transportation include the following:

  • Optimize Route Planning and Scheduling: Evaluating how LLMs can analyze historical and real-time data to optimize routes and schedules, thereby reducing wait times and improving service reliability.

  • Enhance Passenger Communication: Exploring the use of LLMs for real-time interaction with passengers, providing personalized travel assistance, updates, and recommendations.

  • Improve Operational Efficiency: Assessing the impact of LLMs on resource allocation, including the deployment of buses and drivers, to enhance overall operational efficiency.

2 Significance of the Study

The integration of LLMs in public transit systems holds the potential to revolutionize urban mobility by making transportation more efficient, responsive, and user-friendly. This study not only contributes to the academic understanding of AI applications in transportation but also provides practical insights for transit authorities and policymakers. By focusing on San Antonio, a city representative of many growing urban areas, the findings can be generalized and applied to other cities facing similar challenges.

Furthermore, the research highlights the broader implications of AI in public services, emphasizing the importance of ethical considerations, data privacy, and the need for continuous evaluation and improvement. As cities worldwide grapple with the complexities of modern urbanization, the lessons learned from San Antonio’s experience with LLMs can serve as a valuable guide for future innovations in public transit systems.

In conclusion, this study endeavors to bridge the gap between cutting-edge AI technologies and practical applications in public transportation, demonstrating how LLMs can be harnessed to create smarter, more adaptive, and passenger-centric transit networks. The following sections delve deeper into the theoretical framework, detailed methodology, findings, and implications of this transformative approach to public transit management.

3 Related Work

The integration of large language models (LLMs) like OpenAI’s GPT-4 into public transit systems is a burgeoning field that aims to enhance the efficiency, accessibility, and user experience of public transportation. LLMs can process and analyze vast amounts of data, generate human-like text, and understand complex queries, making them suitable for a range of applications in public transit. This literature review explores the current state of research on the deployment of LLMs in public transit systems, focusing on areas such as passenger information services, operational efficiency, and accessibility improvements.

One of the primary applications of LLMs in public transit is in improving passenger information services. Studies have demonstrated that LLMs can enhance the quality and accuracy of real-time information provided to passengers. For instance, researchers explored the use of GPT in generating real-time updates and personalized travel advice for passengers [24, 25, 26, 27]. Their findings indicated that LLMs could effectively handle complex passenger queries and provide accurate, context-aware responses, thereby improving the overall passenger experience.

Furthermore, researchers highlighted the potential of LLMs in multilingual support for transit systems [28, 29, 30]. Given the diverse linguistic backgrounds of urban populations, LLMs like GPT-4 can be trained to provide information in multiple languages, ensuring that non-native speakers have equal access to transit information. This capability not only improves user satisfaction but also promotes inclusivity and accessibility.

The paper [31] presents an evaluation of LLMs, specifically ChatGPT, in interpreting and retrieving information from General Transit Feed Specification (GTFS) data. The study demonstrates that ChatGPT can effectively understand and respond to various queries about public transit schedules and services, showcasing its potential in enhancing transit information systems. However, the paper also highlights areas for improvement, such as the model’s occasional inaccuracies and the need for further fine-tuning to handle complex and domain-specific transit queries more reliably.

The paper [32] explores the potential of using ChatGPT and similar LLMs to revolutionize intelligent transportation systems. It argues that LLMs could significantly enhance various aspects of transportation, such as traffic management, passenger assistance, and operational efficiency, but also points out the challenges related to data privacy, model accuracy, and integration with existing systems.

4 Goals and Approaches

Most LLMs today rely on learning-based methods. For example, the well-known ChatGPT [10] leverages the Transformer [33] architecture and generative pre-training (GPT) [34, 35, 36, 11]. The output of these models is inherently tied to the data they were trained on. Consequently, incorrect LLM responses can stem from multiple factors, such as limited information on a specific topic within the pre-training data or an LLM architecture (including its embedding method) incapable of correctly processing the user’s input. Therefore, differentiating between pre-trained models and architectures is crucial when evaluating learning-based LLMs.

This project aims to assess LLMs’ ability to understand GTFS (General Transit Feed Specification) and other public transportation information in two ways:

  1. Performance of Common Pre-trained Models: We will evaluate a pre-trained LLM "as-is" by posing transportation-related questions and analyzing the accuracy of its responses. This assesses the model’s ability to leverage its existing knowledge of GTFS data and public transportation information. Errors in this experiment might indicate either limited information within the pre-training dataset on the topic or an LLM architecture unsuited for handling the specific topic or questions. We denote this as the "understanding" task in our experiments.

  2. Impact of LLM Architecture: To delve deeper into the cause of errors, we propose a second experiment, which assumes that the LLMs have not encountered relevant information during pre-training. Before posing a specific transportation-related question, we will provide the necessary GTFS data and public transportation information to the LLMs and instruct them to answer based on the provided information. We will then re-ask the questions that resulted in failures during the first experiment. We denote this as the "information retrieval" task in our experiments.

The findings from these tasks will offer valuable insights into the cause of errors. For instance, if the LLMs can answer the questions correctly in the second experiment but not the first, it suggests insufficient pre-training data on the specific topic within the models. Conversely, the results might indicate that even with adequate data, the LLM models struggle with the questions, potentially due to architectural limitations.
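The diagnostic reasoning above can be written as a small decision rule. The following is a minimal sketch under stated assumptions, not code from this study: the function name `diagnose`, the label strings, and the sample outcomes are all hypothetical.

```python
def diagnose(correct_as_is: bool, correct_with_data: bool) -> str:
    """Map one question's outcomes on the two tasks to a likely cause of error."""
    if correct_as_is:
        # Answered from pre-trained knowledge alone; nothing to diagnose.
        return "no error"
    if correct_with_data:
        # Fails as-is but succeeds once the GTFS data is supplied.
        return "insufficient pre-training data"
    # Fails even when the relevant data is in the prompt.
    return "possible architectural limitation"


# Hypothetical outcomes: question id -> (as-is result, result with data provided).
outcomes = {"q1": (True, True), "q2": (False, True), "q3": (False, False)}
for qid, (as_is, with_data) in outcomes.items():
    print(qid, "->", diagnose(as_is, with_data))
```

The rule only suggests a likely cause for each question; a failure in both tasks is consistent with an architectural limitation but does not prove one.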

5 Experiments and Analysis

5.1 Experiment Setup

This project specifically investigates the ability of LLMs to understand GTFS and public transportation information in the context of San Antonio’s public transportation system. We leverage OpenAI’s ChatGPT as the representative LLM due to its widespread public availability through both a web portal and a programmatic API. We designed a set of 275 questions specifically tailored to San Antonio’s public transportation system. These questions are used to evaluate the LLM’s performance in two key areas: 1) Understanding and 2) Information Retrieval (IR).

The Understanding task assesses how well the pre-trained ChatGPT model can comprehend and respond to questions about San Antonio’s public transportation system (Goal #1 in Section 4). In contrast, the IR task examines the impact of LLM architecture on retrieving relevant information from a provided dataset (Goal #2 in Section 4).

For the Understanding task, we employ 195 original multiple-choice questions (MCQs), each with a single correct answer, that span six question categories. The benchmarking dataset with all questionnaires is publicly available (see Appendix I). Table 1 presents the number of questions in each category. We derived these questions and categories from the official GTFS Schedule documentation (https://gtfs.org/schedule/reference).

Question Type | Number of Questions
Term Definitions | 14
Common Reasoning | 28
File Structure | 17
Attribute Mapping | 32
Data Structure | 30
Categorical Mapping | 74
Total | 195
Table 1: GTFS Understanding benchmarking dataset questions and their categories
S.No | Category | Type | Question
1 | Categorical Mapping | Original | In the "trips.txt" file, what is the meaning of "wheelchair_accessible" 0 or empty? a) No accessibility information for the trip b) Vehicle being used on this particular trip can accommodate at least one rider in a wheelchair c) No riders in wheelchairs can be accommodated on this trip d) Stop cannot be accessed by anyone
2 | Attribute Mapping | Original | In which file does the shape_dist_traveled attribute appear in GTFS? a) stops.txt b) shapes.txt c) trips.txt d) stop_times.txt
3 | Common Reasoning | Original | Can a GTFS feed contain multiple agency information? a) Each agency should publish a separate GTFS. b) No, GTFS feeds can only represent a single agency. c) Multiple agency information is specified in the "agency.txt" file. d) Agencies are not relevant in GTFS feeds.
4 | Data Structure | Original | How is the wheelchair_accessible attribute represented in GTFS? a) Boolean (true or false) b) Float (number of accessible seats) c) Enum (e.g., 0,1,2) d) Text representation of wheelchair accessibility …
5 | File Structure | Original | What is the purpose of the "transfers.txt" file in GTFS? a) It contains information about fare rules and transfers. b) It provides details about the geographic shapes of routes. c) It specifies the frequency of trips. d) It provides real-time arrival and departure information.
6 | Term Definition | Original | What is a dataset in the context of GTFS? a) A single file containing all transit information b) A collection of tables representing different entities c) A specific date for transit service d) A record representing a transit agency
7 | Attribute Mapping | Augmented | In which file can you find the route_desc attribute in GTFS? a) stops.txt b) None of these c) trips.txt d) calendar.txt
8 | Categorical Mapping | Augmented | What value is used in the "wheelchair_boarding" field of the "stops.txt" file to indicate that the stop has no information regarding wheelchair accessibility? a) 0 b) 1 c) None of these d) 3
9 | Common Reasoning | Augmented | How does GTFS handle multiple trips on the same route at the same time? a) GTFS does not allow multiple trips on the same route at the same time. b) None of these c) Multiple trips are represented as separate routes in GTFS. d) GTFS relies on real-time updates to handle such cases.
10 | Data Structure | Augmented | What data type is used for the stop_sequence attribute in GTFS? a) None of these b) Time c) Text d) Integer
Table 2: Ten questions that are used in this study.

When evaluating the LLM on MCQs, the model selects the answer choice with the highest probability for each question and outputs it without any explanation. Although the LLM may always choose the correct answer when it is present, it could opt for an alternative option when the correct choice is missing. To check the LLM’s robustness, we generate an augmented question set by creating variations of the original questions. Specifically, each original answer choice (‘a’, ‘b’, ‘c’, and ‘d’) is replaced, one at a time, with the phrase ‘None of these’, resulting in 780 (195×4) additional variant questions and a total of 975 questions in the augmented dataset. The augmentation aims to evaluate how well the LLM adapts to scenarios where the correct answer is removed. Table 2 includes examples of augmented questions.
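The option-substitution step can be illustrated with a short sketch. It is not the authors' implementation: the `MCQ` class and the `augment` function are hypothetical names, the answer key for the sample question is assumed, and the sketch assumes the answer letter stays the same after substitution (so that when the correct option is replaced, ‘None of these’ becomes the expected answer).

```python
from dataclasses import dataclass
from typing import Dict, List

PLACEHOLDER = "None of these"


@dataclass
class MCQ:
    question: str
    options: Dict[str, str]  # option letter -> option text
    answer: str              # letter of the correct option


def augment(mcq: MCQ) -> List[MCQ]:
    """Create one variant per option, replacing that option with the placeholder."""
    variants = []
    for letter in mcq.options:
        new_options = dict(mcq.options)  # copy so the original stays untouched
        new_options[letter] = PLACEHOLDER
        # Assumption: the answer letter is unchanged in every variant.
        variants.append(MCQ(mcq.question, new_options, mcq.answer))
    return variants


original = MCQ(
    question="How is the wheelchair_accessible attribute represented in GTFS?",
    options={
        "a": "Boolean (true or false)",
        "b": "Float (number of accessible seats)",
        "c": "Enum (e.g., 0,1,2)",
        "d": "Text representation of wheelchair accessibility",
    },
    answer="c",  # assumed answer key for illustration
)

variants = augment(original)
print(len(variants))                  # four variants per original question
print(195 * len(variants), 195 + 195 * len(variants))
```

With four options per question, each original yields four variants, which matches the 780 additional questions and the 975-question augmented set described above.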

The ‘GTFS Retrieval’ benchmark employs a question-answer (QA) format, where no options are given and the LLM is supposed to give a single, correct answer. To prepare the questionnaire, we used the San Antonio VIA GTFS feed. The full feed included data on 98 bus routes. However, LLMs have limited context length: a metric for the number of tokens the LLM can process at once. The GPT-3.5-Turbo and GPT-4o have a maximum context length of 16,385 and 128k tokens respectively. The full GTFS feed is much larger than either LLM can accept, so we trim the dataset to just three bus routes (‘242’, ‘243’, and ‘246’) and 34 trips on these routes. These routes have 60 unique stops.
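One way to trim a feed to a handful of routes is to filter each GTFS table through its relational keys. The sketch below is illustrative only: `trim_feed` is a hypothetical helper, the toy rows are invented, and it keeps every trip on the selected routes (reducing the feed further to a fixed number of trips would need a selection rule that the text does not describe).

```python
from typing import Dict, List, Set

Table = List[Dict[str, str]]


def trim_feed(feed: Dict[str, Table], route_ids: Set[str]) -> Dict[str, Table]:
    """Keep only rows that are reachable from the selected route_ids."""
    routes = [r for r in feed["routes.txt"] if r["route_id"] in route_ids]
    trips = [t for t in feed["trips.txt"] if t["route_id"] in route_ids]
    trip_ids = {t["trip_id"] for t in trips}
    stop_times = [s for s in feed["stop_times.txt"] if s["trip_id"] in trip_ids]
    stop_ids = {s["stop_id"] for s in stop_times}
    stops = [s for s in feed["stops.txt"] if s["stop_id"] in stop_ids]
    return {
        "routes.txt": routes,
        "trips.txt": trips,
        "stop_times.txt": stop_times,
        "stops.txt": stops,
    }


# Toy feed with invented rows; a real feed would be parsed from the GTFS text files.
feed = {
    "routes.txt": [{"route_id": "242"}, {"route_id": "243"}, {"route_id": "7"}],
    "trips.txt": [
        {"trip_id": "t1", "route_id": "242"},
        {"trip_id": "t2", "route_id": "243"},
        {"trip_id": "t3", "route_id": "7"},
    ],
    "stop_times.txt": [
        {"trip_id": "t1", "stop_id": "s1"},
        {"trip_id": "t2", "stop_id": "s2"},
        {"trip_id": "t3", "stop_id": "s3"},
        {"trip_id": "t3", "stop_id": "s1"},
    ],
    "stops.txt": [{"stop_id": "s1"}, {"stop_id": "s2"}, {"stop_id": "s3"}],
}

trimmed = trim_feed(feed, {"242", "243", "246"})
print({name: len(rows) for name, rows in trimmed.items()})
```

Filtering in the order routes, trips, stop times, stops keeps the trimmed tables consistent with each other, so no remaining row refers to a trip or stop that was dropped.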

The questions in this benchmark range from basic lookups (in a single file or across multiple files) to data manipulations performed by the LLM. These include common data manipulation techniques such as filtering, sorting, grouping, and joining. We divide the questions into two categories:

  • Simple: These questions are based on simple lookups within the same file or two different files (using relational keys) within GTFS. Example: What route_type corresponds to route_id 243?

  • Complex: These questions need multiple files to extract information, require a deeper understanding, and could be open-ended. Example: Tell the route_long_name in which there is a stop_name as "GILLETTE & PLEASANTON RD."?
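To make the difference between the two categories concrete, the sketch below computes reference answers for the two example questions on a toy feed. All rows and values are invented for illustration (they are not taken from the VIA feed), and the helper names `route_type_for` and `route_names_serving` are hypothetical.

```python
from typing import Dict, List, Optional

Table = List[Dict[str, str]]

# Toy feed with invented values; the real answers depend on the actual feed.
feed: Dict[str, Table] = {
    "routes.txt": [
        {"route_id": "242", "route_long_name": "Hypothetical Route 242", "route_type": "3"},
        {"route_id": "243", "route_long_name": "Hypothetical Route 243", "route_type": "3"},
    ],
    "trips.txt": [
        {"trip_id": "t1", "route_id": "242"},
        {"trip_id": "t2", "route_id": "243"},
    ],
    "stop_times.txt": [
        {"trip_id": "t1", "stop_id": "s1"},
        {"trip_id": "t2", "stop_id": "s2"},
    ],
    "stops.txt": [
        {"stop_id": "s1", "stop_name": "HYPOTHETICAL STOP"},
        {"stop_id": "s2", "stop_name": "GILLETTE & PLEASANTON RD."},
    ],
}


def route_type_for(feed: Dict[str, Table], route_id: str) -> Optional[str]:
    """Simple question: a single-file lookup in routes.txt."""
    for row in feed["routes.txt"]:
        if row["route_id"] == route_id:
            return row["route_type"]
    return None


def route_names_serving(feed: Dict[str, Table], stop_name: str) -> List[str]:
    """Complex question: join stops -> stop_times -> trips -> routes."""
    stop_ids = {s["stop_id"] for s in feed["stops.txt"] if s["stop_name"] == stop_name}
    trip_ids = {st["trip_id"] for st in feed["stop_times.txt"] if st["stop_id"] in stop_ids}
    route_ids = {t["route_id"] for t in feed["trips.txt"] if t["trip_id"] in trip_ids}
    return sorted(
        r["route_long_name"] for r in feed["routes.txt"] if r["route_id"] in route_ids
    )


print(route_type_for(feed, "243"))
print(route_names_serving(feed, "GILLETTE & PLEASANTON RD."))
```

The simple question touches one table, while the complex one chains three key-based joins, which is the kind of multi-step reasoning the LLM must carry out implicitly when given the raw files.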

5.2 Experimental Result

In this study, we benchmarked both GPT-3.5-Turbo and GPT-4o on the original and augmented ‘GTFS Understanding’ datasets. With the zero-shot (ZS) technique, the LLM attempts to answer the questions without having been explicitly trained on them. Figure 1 shows the ZS accuracy of both models for each question category. For both LLMs, accuracy is higher on the original dataset than on the augmented dataset in every category except Attribute Mapping. This indicates that LLMs might not be robust to option substitution. On the original dataset, however, the accuracy of GPT-4o is equal to or lower than that of GPT-3.5-Turbo.
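A zero-shot MCQ evaluation of this kind can be sketched in three steps: format the prompt, extract the chosen letter from the reply, and aggregate accuracy per category. The code below is a minimal illustration under assumptions, not the evaluation harness used in this study: the prompt wording, the helper names, and the sample replies are hypothetical, and the model call itself is omitted.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple


def build_prompt(question: str, options: Dict[str, str]) -> str:
    """Format one MCQ as a zero-shot prompt (no solved examples are included)."""
    lines = [question]
    lines += [f"{letter}) {text}" for letter, text in options.items()]
    lines.append("Answer with a single letter (a, b, c, or d) and no explanation.")
    return "\n".join(lines)


def parse_choice(reply: str) -> Optional[str]:
    """Return the option letter that starts the reply, or None if there is none."""
    match = re.match(r"\s*\(?([abcd])\b", reply.lower())
    return match.group(1) if match else None


def accuracy_by_category(
    records: Iterable[Tuple[str, Optional[str], str]]
) -> Dict[str, float]:
    """records holds (category, predicted letter, correct letter) triples."""
    totals: Dict[str, int] = defaultdict(int)
    hits: Dict[str, int] = defaultdict(int)
    for category, predicted, gold in records:
        totals[category] += 1
        hits[category] += int(predicted == gold)
    return {category: hits[category] / totals[category] for category in totals}


prompt = build_prompt(
    "What data type is used for the stop_sequence attribute in GTFS?",
    {"a": "None of these", "b": "Time", "c": "Text", "d": "Integer"},
)

# Hypothetical model replies paired with assumed answer keys.
records = [
    ("File Structure", parse_choice("a"), "a"),
    ("File Structure", parse_choice("b) shapes.txt"), "a"),
    ("Data Structure", parse_choice("(c)"), "c"),
]
print(accuracy_by_category(records))
```

Parsing only a leading letter is deliberately strict: a reply that does not start with an option letter is counted as wrong rather than guessed at.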

The remainder of the paper focuses on the augmented dataset alone. Overall, GPT-4o performs better than GPT-3.5-Turbo, with above 98% accuracy in “Attribute Mapping”, above 88% in “File Structure”, above 80% in “Term Definitions”, and above 75% in “Data Structure” and “Common Reasoning”. However, both GPT-4o and GPT-3.5-Turbo achieve below 50% accuracy in “Categorical Mapping”, which is the worst category for both LLMs. GPT-3.5-Turbo has around 70% accuracy in all other categories.

Figure 1: Summary of performance by question category for GPT-3.5-Turbo and GPT-4o on GTFS Understanding.

As with the GTFS understanding test, we posed questions to the LLMs to assess their information retrieval capabilities. A total of 80 questions were posed: 42 simple and 38 complex. We again used the ZS technique. Before posing the questions, we extracted the content of all files in the filtered feed; the extracted content and each question were then sent to the GPT API. The results in Figure 2 show that the accuracy of GPT-4o is notably higher than that of GPT-3.5-Turbo.
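The retrieval setup, in which the file contents are placed in the prompt ahead of the question, might look like the sketch below. It is an assumption-laden illustration rather than the study's code: the prompt wording, the helper names, and the sample file contents are hypothetical, the API call is omitted, and the four-characters-per-token ratio is only a rough heuristic for checking against a context limit.

```python
from typing import Dict


def build_ir_prompt(files: Dict[str, str], question: str) -> str:
    """Concatenate the GTFS file contents and append the question at the end."""
    parts = ["Answer the question using only the GTFS data below."]
    for name, content in files.items():
        parts.append(f"--- {name} ---\n{content.strip()}")
    parts.append(f"Question: {question}")
    return "\n\n".join(parts)


def fits_context(prompt: str, token_limit: int, chars_per_token: float = 4.0) -> bool:
    """Rough size check; a real tokenizer would give an exact count."""
    return len(prompt) / chars_per_token <= token_limit


# Invented file contents for illustration.
files = {
    "routes.txt": "route_id,route_long_name,route_type\n243,Hypothetical Route 243,3\n",
    "trips.txt": "route_id,trip_id\n243,t2\n",
}
question = "What route_type corresponds to route_id 243?"

prompt = build_ir_prompt(files, question)
print(prompt)
print(fits_context(prompt, token_limit=16385))
```

Placing the data before the question keeps the instruction, the context, and the query clearly separated, which makes it easier to reuse the same context for all questions.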

Figure 2: Summary of performance by question type for GPT-3.5-Turbo and GPT-4o on the GTFS Retrieval benchmark.

For simple questions, the accuracy of GPT-4o is approximately 15% higher than that of GPT-3.5-Turbo, and for complex questions it is approximately 8% higher. Overall, GPT-4o performs considerably better than GPT-3.5-Turbo on the IR task.

6 Conclusion and Future Work

This work evaluates the ability of Large Language Models (LLMs) to understand public transportation information through two tasks: "understanding" and "information retrieval." The LLMs achieved accuracy ranging from 47.97% to 98.44% on the understanding task and from 60.53% to 90.48% on information retrieval. The high performance on some understanding tasks suggests that pre-trained LLMs have acquired a significant amount of transportation-related information from their training datasets. However, the large gap between the best and worst performing tasks also indicates that the models might have been trained on an imbalanced dataset, with significantly less information on certain areas. When relevant information is provided, modern LLMs can handle tasks involving previously unseen data, as suggested by the high performance on the information retrieval task. However, their ability to do so seems to be significantly reduced when the task complexity increases.

This work demonstrated that the use of large language models in public transit systems holds great promise for transforming how these systems operate and serve their users. From improving passenger information services and operational efficiency to enhancing accessibility, LLMs offer a wide range of applications that can significantly benefit public transit. However, the large performance gaps between the best and worst performing tasks need to be addressed before LLMs are used in real-world settings. In addition, addressing ethical concerns and ensuring the responsible use of these technologies will be essential as this field continues to evolve. With continued research and development, LLMs have the potential to play a pivotal role in the future of public transportation.

7 Acknowledgment

This work is supported by the National Science Foundation under Grant No. 2131193. Any opinions, findings, conclusions, or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

References

  • [1] Izzat Alsmadi, Kashif Ahmad, Mahmoud Nazzal, Firoj Alam, Ala Al-Fuqaha, Abdallah Khreishah, and Abdulelah Algosaibi. Adversarial nlp for social network applications: Attacks, defenses, and research directions. IEEE Transactions on Computational Social Systems, 10(6):3089–3108, 2022.
  • [2] Rasheed Ahmad, Izzat Alsmadi, Wasim Alhamdani, and Lo’ai Tawalbeh. A deep learning ensemble approach to detecting unknown network attacks. Journal of Information Security and Applications, 67:103196, 2022.
  • [3] Gongbo Liang, Jesus Guerrero, Fengbo Zheng, and Izzat Alsmadi. Enhancing neural text detector robustness with μ attacking and rr-training. Electronics, 12(8):1948, 2023.
  • [4] Gongbo Liang, Xiaoqin Wang, Yu Zhang, Xin Xing, Hunter Blanton, Tawfiq Salem, and Nathan Jacobs. Joint 2d-3d breast cancer classification. In 2019 IEEE International Conference on Bioinformatics and biomedicine (BIBM), pages 692–696. IEEE, 2019.
  • [5] Xin Xing, Gongbo Liang, Chris Wang, Nathan Jacobs, and Ai-Ling Lin. Self-supervised learning application on covid-19 chest x-ray image classification using masked autoencoder. Bioengineering, 10(8):901, 2023.
  • [6] Liangliang Liu, Jing Chang, Gongbo Liang, and Shufeng Xiong. Simulated quantum mechanics-based joint learning network for stroke lesion segmentation and tici grading. IEEE Journal of Biomedical and Health Informatics, 2023.
  • [7] Mei Chen, Armin Hadzic, Weilian Song, and Nathan Jacobs. Applications of deep machine learning to highway safety and usage assessment. In Transportation Research Board Workshop (Sponsored by AED50), January 2021. (oral).
  • [8] Nawaf O Alsrehin, Mohit Gupta, Izzat Alsmadi, and Saif Addeen Alrababah. U2-net: A very-deep convolutional neural network for detecting distracted drivers. Applied Sciences, 13(21):11898, 2023.
  • [9] Gongbo Liang, Janet Zulu, Xin Xing, and Nathan Jacobs. Unveiling roadway hazards: Enhancing fatal crash risk estimation through multiscale satellite imagery and self-supervised cross-matching. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 17:535–546, 2023.
  • [10] Introducing chatgpt. OpenAI.com, 2022.
  • [11] Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. Gpt-4 technical report. arXiv preprint arXiv:2303.08774, 2023.
  • [12] Dinesh Kalla, Nathan Smith, Fnu Samaah, and Sivaraju Kuraku. Study and analysis of chat gpt and its impact on different fields of study. International journal of innovative science and research technology, 8(3), 2023.
  • [13] Osamah Mohammed Alyasiri, Ahmed Hussein Ali, et al. Exploring gpt-4’s characteristics through the 5vs of big data: A brief perspective. Babylonian Journal of Artificial Intelligence, 2023:5–9, 2023.
  • [14] Graziella Orrù, Andrea Piarulli, Ciro Conversano, and Angelo Gemignani. Human-like problem-solving abilities in large language models using chatgpt. Frontiers in artificial intelligence, 6:1199350, 2023.
  • [15] Alexandre Goossens and Jan Vanthienen. Integrating gpt-technologies with decision models for explainability. In World Conference on Explainable Artificial Intelligence, pages 428–448. Springer, 2023.
  • [16] Wenbo Li, Mingshu Fu, Siyu Liu, and Hongyu Yu. Revolutionizing neurosurgery with gpt-4: a leap forward or ethical conundrum? Annals of Biomedical Engineering, 51(10):2105–2112, 2023.
  • [17] Thomas Abdallah. Sustainable mass transit: Challenges and opportunities in urban public transportation. 2023.
  • [18] Meng Xu, Tao Liu, Shao-Peng Zhong, and Yu Jiang. Urban smart public transport studies: a review and prospect. J. Transp. Syst. Eng. Inf. Technol., 22(2):91–108, 2022.
  • [19] Yujie Guo, Zhiwei Chen, Amy Stuart, Xiaopeng Li, and Yu Zhang. A systematic overview of transportation equity in terms of accessibility, traffic emissions, and safety outcomes: From conventional to emerging technologies. Transportation research interdisciplinary perspectives, 4:100091, 2020.
  • [20] Jiamin Zhang. Agent-based optimizing match between passenger demand and service supply for urban rail transit network with netlogo. IEEE Access, 9:32064–32080, 2021.
  • [21] Ailing Huang, Ziqi Dou, Liuzi Qi, and Lewen Wang. Flexible route optimization for demand-responsive public transit service. Journal of Transportation Engineering, Part A: Systems, 146(12):04020132, 2020.
  • [22] Beto Altamirano, Javier Paredes, and Joey Pawlik. To thrive, san antonio must enhance, redefine transportation. https://www.expressnews.com/opinion/commentary/article/san-antonio-transportation-mobility-planning-18710066.php, 2024. Accessed: 2024-06-22.
  • [23] Colin Hounston. San antonio needs to fund public transit. https://trinitonian.com/2023/02/23/san-antonio-needs-to-fund-public-transit/, 2023. Accessed: 2024-06-22.
  • [24] Alexandros Papangelis, Mahdi Namazifar, Chandra Khatri, Yi-Chia Wang, Piero Molino, and Gokhan Tur. Plato dialogue system: A flexible conversational ai research platform. arXiv preprint arXiv:2001.06463, 2020.
  • [25] Gokul Yenduri, M Ramalingam, G Chemmalar Selvi, Y Supriya, Gautam Srivastava, Praveen Kumar Reddy Maddikunta, G Deepti Raj, Rutvij H Jhaveri, B Prabadevi, Weizheng Wang, et al. Gpt (generative pre-trained transformer)–a comprehensive review on enabling technologies, potential applications, emerging challenges, and future directions. IEEE Access, 2024.
  • [26] Stefan Voß. Bus bunching and bus bridging: What can we learn from generative ai tools like chatgpt? Sustainability, 15(12):9625, 2023.
  • [27] Ruhul Amin Khalil, Ziad Safelnasr, Naod Yemane, Mebruk Kedir, Atawulrahman Shafiqurrahman, and Nasir Saeed. Advanced learning technologies for intelligent transportation systems: Prospects and challenges. IEEE Open Journal of Vehicular Technology, 2024.
  • [28] Amin Ullah, Guilin Qi, Saddam Hussain, Irfan Ullah, and Zafar Ali. The role of llms in sustainable smart cities: Applications, challenges, and future directions. arXiv preprint arXiv:2402.14596, 2024.
  • [29] Pravneet Kaur, Gautam Siddharth Kashyap, Ankit Kumar, Md Tabrez Nafis, Sandeep Kumar, and Vikrant Shokeen. From text to transformation: A comprehensive review of large language models’ versatility. arXiv preprint arXiv:2402.16142, 2024.
  • [30] Ou Zheng, Mohamed Abdel-Aty, Dongdong Wang, Chenzhu Wang, and Shengxuan Ding. Trafficsafetygpt: Tuning a pre-trained large language model to a domain-specific expert in transportation safety. arXiv preprint arXiv:2307.15311, 2023.
  • [31] Saipraneeth Devunuri, Shirin Qiam, and Lewis Lehe. Chatgpt for gtfs: Benchmarking llms on gtfs understanding and retrieval.
  • [32] Ou Zheng, Mohamed Abdel-Aty, Dongdong Wang, Zijin Wang, and Shengxuan Ding. Chatgpt is on the horizon: Could a large language model be all we need for intelligent transportation? arXiv preprint arXiv:2303.05382, 2023.
  • [33] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in neural information processing systems, 30, 2017.
  • [34] Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever, et al. Improving language understanding by generative pre-training. 2018.
  • [35] Alec Radford, Jeffrey Wu, Dario Amodei, Daniela Amodei, Jack Clark, Miles Brundage, and Ilya Sutskever. Better language models and their implications. OpenAI blog, 1(2), 2019.
  • [36] Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. Advances in neural information processing systems, 33:1877–1901, 2020.