Reducing the environmental risks and impacts of orphaned wells (abandoned oil and gas wells) requires first locating and then plugging them. Manually reading and digitizing information from historical documents is not feasible given the large number of wells. Here, we propose a new computational approach for characterizing these wells rapidly and cost-effectively. Specifically, we leverage the capabilities of large language models (LLMs) to extract vital information, including well location and depth, from historical records of orphaned wells. We present an information extraction workflow based on open-source Llama 2 models and test it on a dataset of 160 well documents. When applied to clean, PDF-based reports, the workflow achieves an overall accuracy of 100%, accounting for both text conversion and LLM analysis. However, it struggles with unstructured image-based well records, where accuracy drops to 70%. Compared with manual digitization, the workflow reduces labor and increases automation. Additionally, more detailed prompting improves information extraction, and LLMs with more parameters typically perform better. Given that a vast amount of geoscientific information is locked up in old documents, this work demonstrates that recent breakthroughs in LLMs allow us to access and use this information more effectively.
© 2024. The Author(s).
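As an illustration of the kind of workflow summarized in the abstract, the following is a minimal sketch of the steps surrounding the LLM call: building an extraction prompt, parsing the model's reply, and scoring document-level accuracy. It is not the authors' implementation. The function names, the field names (`latitude`, `longitude`, `depth_ft`), the JSON reply format, and the all-fields-must-match accuracy definition are all assumptions made for this example. The call to a Llama 2 model is deliberately left out, since the abstract does not specify how the model is invoked.

```python
import json
import re

# Hypothetical field names; the abstract only says "well location and depth".
FIELDS = ("latitude", "longitude", "depth_ft")


def build_prompt(document_text: str, detailed: bool = True) -> str:
    """Assemble an extraction prompt; `detailed` adds explicit field guidance."""
    instructions = "Extract the well location and depth from the record below."
    if detailed:
        instructions += (
            " Return a JSON object with keys latitude, longitude, depth_ft."
            " Use null for any value that the record does not state."
        )
    return f"{instructions}\n\nRecord:\n{document_text}\n\nAnswer:"


def parse_response(response: str) -> dict:
    """Pull the first flat JSON object out of a reply; missing keys become None."""
    empty = {key: None for key in FIELDS}
    match = re.search(r"\{.*?\}", response, flags=re.DOTALL)
    if match is None:
        return empty
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return empty
    return {key: data.get(key) for key in FIELDS}


def accuracy(predictions: list, references: list) -> float:
    """Fraction of documents whose extracted fields all match the reference."""
    if not references:
        raise ValueError("no reference records")
    correct = sum(pred == ref for pred, ref in zip(predictions, references))
    return correct / len(references)


# Example with a made-up model reply (not real data from the study).
reply = 'Result: {"latitude": 40.1, "longitude": -79.9, "depth_ft": 2150}'
record = parse_response(reply)
```

The `detailed` flag mirrors the abstract's observation that more detailed prompting improves extraction: the same record can be run through both prompt variants and the resulting accuracies compared.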