Aims: The hepatitis C virus (HCV) has developed a strategy to coexist with its host, resulting in varying degrees of tissue and cell damage that give rise to different pathological phenotypes, such as fibrosis of varying severity, cirrhosis, and hepatocellular carcinoma (HCC). However, there is no integrated information that can predict the evolutionary course of the infection. We propose combining near-infrared spectroscopy (NIRS) and machine learning techniques to provide a predictive model. In this work, we aim to discriminate HCV positivity in biobank patient serum samples.
Methods: A total of 126 serum samples from 38 HCV patients at different stages of the disease were obtained from the Biobank of Hospital Universitario Fundación Alcorcón. NIRS spectra were acquired with an FT-NIRS Spectrum 100 (Perkin Elmer) device in reflectance mode. For each patient, HCV positivity was determined by PCR and labeled as detectable (1) or undetectable (0). We propose an L1-penalized logistic regression model to classify each spectrum as positive (1) or negative (0) for HCV presence (x). The regularization parameter is selected using 5-fold cross-validation. The L1 penalty induces sparsity in the solution, so that only a few relevant wavelengths have non-zero coefficients.
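The modelling step described in Methods can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the solver (proximal gradient descent with soft-thresholding), the step size, the iteration count, the candidate penalty values, and the synthetic data standing in for NIRS spectra are all assumptions made for illustration. It shows how an L1 penalty drives most coefficients to exactly zero and how the penalty strength can be chosen by 5-fold cross-validation.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(v, t):
    # Proximal operator of t*|v|: shrinks v toward zero, exactly zero if |v| <= t.
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def fit_l1_logistic(X, y, lam, lr=0.5, n_iter=200):
    """Proximal gradient descent for L1-penalized logistic regression."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        # Residuals p_i - y_i of the current model.
        resid = [sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - yi
                 for row, yi in zip(X, y)]
        grad_b = sum(resid) / n
        grad_w = [sum(r * row[j] for r, row in zip(resid, X)) / n
                  for j in range(d)]
        b -= lr * grad_b  # the intercept is not penalized
        w = [soft_threshold(wj - lr * gj, lr * lam)
             for wj, gj in zip(w, grad_w)]
    return w, b

def predict(X, w, b):
    return [1 if b + sum(wj * xj for wj, xj in zip(w, row)) > 0 else 0
            for row in X]

def accuracy(y_true, y_pred):
    return sum(a == p for a, p in zip(y_true, y_pred)) / len(y_true)

def select_lambda_cv(X, y, lambdas, k=5, seed=0):
    """Return the penalty with the best mean k-fold validation accuracy."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    best_lam, best_acc = None, -1.0
    for lam in lambdas:
        accs = []
        for fold in folds:
            held_out = set(fold)
            train = [i for i in idx if i not in held_out]
            w, b = fit_l1_logistic([X[i] for i in train],
                                   [y[i] for i in train], lam)
            accs.append(accuracy([y[i] for i in fold],
                                 predict([X[i] for i in fold], w, b)))
        mean_acc = sum(accs) / k
        if mean_acc > best_acc:  # ties keep the earlier (stronger) penalty
            best_lam, best_acc = lam, mean_acc
    return best_lam, best_acc

# Synthetic stand-in for spectra (assumption): 60 samples, 10 "wavelengths",
# of which only the first carries information about the label.
rng = random.Random(42)
y = [i % 2 for i in range(60)]
X = [[1.0 if yi else -1.0] + [rng.uniform(-1, 1) for _ in range(9)]
     for yi in y]

lambdas = [0.3, 0.1, 0.03, 0.01]  # hypothetical grid, strongest penalty first
best_lam, cv_acc = select_lambda_cv(X, y, lambdas)
w, b = fit_l1_logistic(X, y, best_lam)
n_nonzero = sum(1 for wj in w if wj != 0.0)
print("selected penalty:", best_lam)
print("mean CV accuracy:", cv_acc)
print("non-zero coefficients:", n_nonzero, "of", len(w))
```

With real spectra, each row of `X` would hold the measured values at every wavelength, and the indices of the non-zero entries of `w` would identify the wavelengths retained by the model. In practice, an established library solver would usually be preferred over this hand-written loop.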
Results: The L1-penalized logistic regression model retained 167 wavelengths with non-zero coefficients. The accuracy on an independent test set was 0.78.
Conclusions: We present a straightforward and promising approach to detect HCV positivity in patient serum samples by combining NIRS and machine learning techniques. This result is encouraging for predicting HCV progression, among other applications. Clinical relevance: We present a simple yet promising approach that uses machine learning and NIRS to analyze viral presence in serum samples.