-
70B-parameter large language models in Japanese medical question-answering
Authors:
Issey Sukeda,
Risa Kishikawa,
Satoshi Kodera
Abstract:
Since the rise of large language models (LLMs), domain adaptation has been a hot topic across many fields. Many medical LLMs trained on English medical datasets have recently been made public. However, Japanese LLMs in the medical domain remain under-researched. Here we use multiple 70B-parameter LLMs for the first time and show that instruction tuning on a Japanese medical question-answering dataset significantly improves the ability of Japanese LLMs to solve Japanese medical license exams, surpassing 50% accuracy. In particular, the Japanese-centric models show a larger improvement from instruction tuning than their English-centric counterparts. This underscores the importance of continual pretraining and tokenizer adjustment for our local language. We also examine two slightly different prompt formats, which yield a non-negligible performance improvement.
Submitted 21 June, 2024;
originally announced June 2024.
-
JMedLoRA: Medical Domain Adaptation on Japanese Large Language Models using Instruction-tuning
Authors:
Issey Sukeda,
Masahiro Suzuki,
Hiroki Sakaji,
Satoshi Kodera
Abstract:
In the ongoing wave of impact driven by large language models (LLMs) like ChatGPT, the adaptation of LLMs to the medical domain has emerged as a crucial research frontier. Since mainstream LLMs tend to be designed for general-purpose applications, constructing a medical LLM through domain adaptation is a huge challenge. While instruction-tuning is used to fine-tune some LLMs, its precise role in domain adaptation remains unknown. Here we show the contribution of LoRA-based instruction-tuning to performance in Japanese medical question-answering tasks. In doing so, we employ a multifaceted evaluation for multiple-choice questions, including scoring based on "Exact match" and "Gestalt distance" in addition to the conventional accuracy. Our findings suggest that LoRA-based instruction-tuning can partially incorporate domain-specific knowledge into LLMs, with larger models demonstrating more pronounced effects. Furthermore, our results underscore the potential of adapting English-centric models for Japanese applications in domain adaptation, while also highlighting the persisting limitations of Japanese-centric models. This initiative represents a pioneering effort in enabling medical institutions to fine-tune and operate models without relying on external services.
Submitted 30 November, 2023; v1 submitted 16 October, 2023;
originally announced October 2023.
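The "Exact match" and "Gestalt distance" scores named in the JMedLoRA abstract above can be illustrated with a minimal sketch. This is not the authors' evaluation code: it assumes that "Gestalt distance" means one minus the Ratcliff/Obershelp gestalt pattern-matching similarity, which Python's standard `difflib.SequenceMatcher` computes, and the function names are hypothetical.

```python
from difflib import SequenceMatcher


def exact_match(prediction: str, answer: str) -> bool:
    """True when the model output equals the reference answer (ignoring outer whitespace)."""
    return prediction.strip() == answer.strip()


def gestalt_distance(prediction: str, answer: str) -> float:
    """Assumed definition: 1 - gestalt similarity, so 0.0 is identical and 1.0 shares nothing."""
    similarity = SequenceMatcher(None, prediction.strip(), answer.strip()).ratio()
    return 1.0 - similarity


if __name__ == "__main__":
    # Hypothetical multiple-choice outputs; the reference answer is "a,c".
    for output in ["a,c", "a,d", "b"]:
        print(output, exact_match(output, "a,c"), round(gestalt_distance(output, "a,c"), 3))
```

Unlike plain accuracy, a distance of this kind gives partial credit to an output that is close to the reference string but not identical to it.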
-
COVID-19 forecasting using new viral variants and vaccination effectiveness models
Authors:
Essam A. Rashed,
Sachiko Kodera,
Akimasa Hirata
Abstract:
Background: Recently, a high number of daily positive COVID-19 cases have been reported in regions with relatively high vaccination rates; hence, booster vaccination has become necessary. In addition, infections caused by the different variants and correlated factors have not been discussed in depth. With large variabilities and different co-factors, it is difficult to use conventional mathematical models to forecast the incidence of COVID-19.
Methods: Machine learning based on long short-term memory was applied to forecasting the time series of new daily positive cases (DPC), serious cases, hospitalized cases, and deaths. Data acquired from regions with high rates of vaccination, such as Israel, were blended with the current data of other regions in Japan to factor in the potential effects of vaccination. The protection provided by symptomatic infection was also considered in terms of the population effectiveness of vaccination as well as the waning protection and ratio and infectivity of viral variants. To represent changes in public behavior, public mobility and interactions through social media were also included in the analysis.
Findings: Comparing the observed and estimated new DPC in Tel Aviv, Israel, the parameters characterizing vaccination effectiveness and the waning protection from infection were well estimated; the vaccination effectiveness of the second dose after 5 months and the third dose after two weeks from infection by the delta variant were 0.24 and 0.95, respectively. Using the extracted parameters regarding vaccination effectiveness, new cases in three prefectures of Japan were replicated.
Submitted 11 August, 2022; v1 submitted 23 January, 2022;
originally announced January 2022.
-
Knowledge discovery from emergency ambulance dispatch during COVID-19: A case study of Nagoya City, Japan
Authors:
Essam A. Rashed,
Sachiko Kodera,
Hidenobu Shirakami,
Ryotetsu Kawaguchi,
Kazuhiro Watanabe,
Akimasa Hirata
Abstract:
Accurate forecasting of medical service requirements is an important big data problem that is crucial for resource management in critical times such as natural disasters and pandemics. With the global spread of coronavirus disease 2019 (COVID-19), several concerns have been raised regarding the ability of medical systems to handle sudden changes in the daily routines of healthcare providers. One significant problem is the management of ambulance dispatch and control during a pandemic. To help address this problem, we first analyze ambulance dispatch data records from April 2014 to August 2020 for Nagoya City, Japan. Significant changes were observed in the data during the pandemic, including the state of emergency (SoE) declared across Japan. In this study, we propose a deep learning framework based on recurrent neural networks to estimate the number of emergency ambulance dispatches (EADs) during an SoE. The fused data include environmental factors, the localization data of mobile phone users, and the history of EADs, thereby providing a general framework for knowledge discovery and better resource management. The results indicate that the proposed blend of training data can be used efficiently in a real-world estimation of EAD requirements during periods of high uncertainty such as pandemics.
Submitted 17 February, 2021;
originally announced February 2021.
-
Model-based approach for analyzing prevalence of nuclear cataracts in elderly residents
Authors:
Sachiko Kodera,
Akimasa Hirata,
Fumiaki Miura,
Essam A. Rashed,
Natsuko Hatsusaka,
Naoki Yamamoto,
Eri Kubo,
Hiroshi Sasaki
Abstract:
Recent epidemiological studies have hypothesized that the prevalence of cortical cataracts is closely related to ultraviolet radiation. However, the prevalence of nuclear cataracts is higher in elderly people in tropical areas than in temperate areas. The dominant factors inducing nuclear cataracts have been widely debated. In this study, the temperature increase in the lens due to exposure to ambient conditions was computationally quantified in subjects of 50-60 years of age in tropical and temperate areas, accounting for differences in thermoregulation. A thermoregulatory response model was extended to consider elderly people in tropical areas. The time course of lens temperature for different weather conditions in five cities in Asia was computed. The temperature was higher around the mid and posterior part of the lens, which coincides with the position of the nuclear cataract. The duration of higher temperatures in the lens varied, although the daily maximum temperatures were comparable. A strong correlation (adjusted R² > 0.85) was observed between the prevalence of nuclear cataract and the computed cumulative thermal dose in the lens. We propose the use of a cumulative thermal dose to assess the prevalence of nuclear cataracts. Cumulative wet-bulb globe temperature, a new metric computed from weather data, would be useful for practical assessment in different cities.
Submitted 16 September, 2020;
originally announced September 2020.
-
Correlation between COVID-19 morbidity and mortality rates in Japan and local population density, temperature and absolute humidity
Authors:
Sachiko Kodera,
Essam A. Rashed,
Akimasa Hirata
Abstract:
This study analyzed the morbidity and mortality rates of the COVID-19 pandemic in different prefectures of Japan. Under the constraint that daily maximum confirmed deaths and daily maximum cases should exceed 4 and 10, respectively, 14 prefectures were included, and cofactors affecting the morbidity and mortality rates were evaluated. In particular, the number of confirmed deaths was assessed excluding the cases of nosocomial infections and nursing home patients. A mild correlation was observed between morbidity rate and population density (R² = 0.394). In addition, the percentage of elderly people in the population was found to be non-negligible. Among weather parameters, the maximum temperature and absolute humidity averaged over the duration were found to correlate modestly with the morbidity and mortality rates, excluding the cases of nosocomial infections. Lower morbidity and mortality were observed at higher temperature and absolute humidity. Multivariate analysis considering these factors showed that determination coefficients for the spread, decay, and combined stages were 0.708, 0.785, and 0.615, respectively. These findings could be useful for intervention planning during future pandemics, including a potential second COVID-19 outbreak.
Submitted 29 July, 2020; v1 submitted 28 July, 2020;
originally announced July 2020.
-
Influence of Absolute Humidity, Temperature and Population Density on COVID-19 Spread and Decay Durations: Multi-prefecture Study in Japan
Authors:
Essam A. Rashed,
Sachiko Kodera,
Jose Gomez-Tames,
Akimasa Hirata
Abstract:
This study analyzed the spread and decay durations of the COVID-19 pandemic in different prefectures of Japan. During the pandemic, affordable healthcare was widely available in Japan and the medical system did not suffer a collapse, making accurate comparisons between prefectures possible. For the 16 prefectures included in this study that had daily maximum confirmed cases exceeding ten, the number of daily confirmed cases follows a bell-shaped or log-normal distribution in most prefectures. A good correlation was observed between the spread and decay durations. However, some exceptions were observed in areas where travelers returned from foreign countries, which were defined as the origins of infection clusters. Excluding these prefectures, the population density was shown to be a major factor affecting the spread and decay patterns, with R² = 0.39 (p < 0.05) and 0.42 (p < 0.05), respectively, approximately corresponding to social distancing. The maximum absolute humidity was found to affect the decay duration normalized by the population density (R² > 0.36, p < 0.05). Our findings indicate that the estimated pandemic spread duration, based on the multivariate analysis of maximum absolute humidity, ambient temperature, and population density (adjusted R² = 0.53, p < 0.05), could prove useful for intervention planning during potential future pandemics, including a second COVID-19 outbreak.
Submitted 21 July, 2020; v1 submitted 3 June, 2020;
originally announced June 2020.