Background: Medical narratives are fundamental to the correct identification of a patient's health condition. This is not only because it describes the patient's situation. It also contains relevant information about the patient's context and health state evolution. Narratives are usually vague and cannot be categorized easily. On the other hand, once the patient's situation is correctly identified based on a narrative, it is then possible to map the patient's situation into precise classification schemas and ontologies that are machine-readable. To this end, language models can be trained to read and extract elements from these narratives. However, the main problem is the lack of data for model identification and model training in languages other than English. First, gold standard annotations are usually not available due to the high level of data protection for patient data. Second, gold standard annotations (if available) are difficult to access. Alternative available data, like MIMIC (Sci Data 3:1, 2016) is written in English and for specific patient conditions like intensive care. Thus, when model training is required for other types of patients, like oncology (and not intensive care), this could lead to bias. To facilitate clinical narrative model training, a method for creating high-quality synthetic narratives is needed.
Method: We devised workflows based on generative AI methods to synthesize narratives in the German language to avoid the disclosure of patient's health data. Since we required highly realistic narratives, we generated prompts, written with high-quality medical terminology, asking for clinical narratives containing both a main and co-disease. The frequency of distribution of both the main and co-disease was extracted from the hospital's structured data, such that the synthetic narratives reflect the disease distribution among the patient's cohort. In order to validate the quality of the synthetic narratives, we annotated them to train a Named Entity Recognition (NER) algorithm. According to our assumptions, the validation of this system implies that the synthesized data used for its training are of acceptable quality.
Result: We report precision, recall and F1 score for the NER model while also considering metrics that take into account both exact and partial entity matches. Trained models are cautious, with a precision up to 0.8 for Entity Type match metric and a F1 score of 0.3.
Conclusion: Despite its inherent limitations, this technology has the potential to allow data interoperability by using encoded diseases across languages and regions without compromising data safety. Additionally, it facilitates the synthesis of unstructured patient data. In this way, the identification and training of models can be accelerated. We believe that this method may be able to generate discharge letters for any combination of main and co-diseases, which will significantly reduce the amount of time spent writing these letters by healthcare professionals.
Keywords: Generative AI; Language models; Non-english Language models; Synthetic data; Synthetic narratives.
© 2024. The Author(s).