Objective: To construct a gene-pair signature for identifying active tuberculosis (TB) based on the relative expression orderings (REOs) of genes within a single sample. Methods: Peripheral whole blood samples from 75 individuals with active TB and 69 latently infected individuals, drawn from four datasets, were used as the training set. Highly stable REO patterns were extracted separately from the gene expression profiles of each group. Gene pairs whose REO pattern was reversed between the two groups were then selected and ranked in descending order of their degree of reversal. Finally, the top k gene pairs giving the highest classification accuracy were selected as the signature and validated on independent datasets. Results: A signature of seven gene pairs, denoted 7-GPS, was obtained from the training set. Using the majority voting rule, 7-GPS distinguished active TB from latently infected samples with an accuracy of 88.89%, and active TB from normal samples with an accuracy of 90.09%. In mixed validation data from different detection platforms, the area under the receiver operating characteristic curve (AUC) was 0.914 (95% CI: 0.881-0.948) for distinguishing active TB from latently infected samples and 0.934 (95% CI: 0.904-0.964) for distinguishing active TB from normal samples. In addition, four genes in the signature, ETV7, BATF2, ANKRD22 and CARD17P, tended to be highly expressed in peripheral blood samples from active TB, and their expression values were significantly associated with the duration of clinical anti-tuberculosis treatment. Conclusion: The 7-GPS signature is robust, shows good classification performance, and is suitable for individualized analysis of a single peripheral blood sample. It has some potential for clinical application.
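The procedure described in the Methods (stable REOs per group, reversed gene pairs ranked by reversal degree, top k pairs, majority vote) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the 0.95 stability threshold, the reversal score (sum of the two within-group fractions), and the restriction to odd k are all assumptions made for illustration, since the abstract does not define them.

```python
import numpy as np


def reo_fractions(expr):
    """expr: genes x samples array.

    Returns F where F[i, j] is the fraction of samples in which
    gene i is expressed higher than gene j.
    Note: this builds a genes x genes x samples array, so it only
    suits small gene sets; genome-wide data would need chunking.
    """
    return (expr[:, None, :] > expr[None, :, :]).mean(axis=2)


def rank_reversed_pairs(expr_tb, expr_ltbi, stability=0.95):
    """Return gene-index pairs (i, j) with i > j stable in active TB
    and j > i stable in latent infection, sorted by an ASSUMED
    reversal score in descending order."""
    f_tb = reo_fractions(expr_tb)
    f_lt = reo_fractions(expr_ltbi)
    scored = []
    n_genes = expr_tb.shape[0]
    for i in range(n_genes):
        for j in range(n_genes):
            if i == j:
                continue
            if f_tb[i, j] >= stability and f_lt[j, i] >= stability:
                # Assumed reversal degree: sum of the two stabilities.
                scored.append((f_tb[i, j] + f_lt[j, i], i, j))
    scored.sort(key=lambda t: -t[0])  # stable sort keeps ties in scan order
    return [(i, j) for _, i, j in scored]


def classify(sample, pairs):
    """Majority vote: each pair votes 'active TB' if gene i > gene j."""
    votes = sum(1 for i, j in pairs if sample[i] > sample[j])
    return "active_TB" if votes > len(pairs) / 2 else "not_active_TB"


def choose_k(expr_tb, expr_ltbi, ranked):
    """Pick the smallest odd k whose top-k pairs give the highest
    training accuracy (odd k is an assumption that avoids tied votes)."""
    n_total = expr_tb.shape[1] + expr_ltbi.shape[1]
    best_k, best_acc = 0, -1.0
    for k in range(1, len(ranked) + 1, 2):
        top = ranked[:k]
        correct = sum(classify(expr_tb[:, s], top) == "active_TB"
                      for s in range(expr_tb.shape[1]))
        correct += sum(classify(expr_ltbi[:, s], top) == "not_active_TB"
                       for s in range(expr_ltbi.shape[1]))
        acc = correct / n_total
        if acc > best_acc:
            best_k, best_acc = k, acc
    return best_k, best_acc
```

Because each vote depends only on the within-sample ordering of two genes, a classifier of this kind can be applied to one sample at a time without cross-sample normalization, which is the property the abstract relies on for single-sample analysis.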