Predicting hospitalizations can support the development of applications for health insurance, hospitals, and medicine. The data collected by health insurers has potential that is not always explored, and extracting features from it for machine learning applications requires demanding processes and specialized knowledge. The emergence of large language models (LLMs) makes it possible to use this data in a wide range of applications with little specialized knowledge. To do so, the data must first be organized and prepared for use by these models. This work therefore presents an approach for using health insurance data in LLMs with the objective of predicting hospitalizations. As a result, models pre-trained on health insurance data were generated in Portuguese and English and can be used in several applications. To assess the effectiveness of the models, tests were carried out to predict hospitalizations in general and hospitalizations due to stroke. For hospitalizations in general, F1-Score = 87.8 and AUC = 0.955 were achieved; for hospitalizations due to stroke, the best model achieved F1-Score = 88.7 and AUC = 0.964. Considering their potential for use, the models were made available to the scientific community.
Keywords: BERT; Health insurance; Hospitalization; LLaMA; Large language models; Machine learning; RoBERTa; Strokes.
© 2024. International Federation for Medical and Biological Engineering.