A multivariate analysis of the National Cancer Institute gene expression database is reported here. The soft independent modelling of class analogy (SIMCA) approach classified cell lines according to their histological origin. Using principal component analysis (PCA) on the expression of 9605 genes and expressed sequence tags (ESTs), colon, leukaemia, renal, melanoma and CNS cells could be classified, whereas lung, breast and ovarian cells could not. A second multivariate procedure, partial least squares discriminant analysis (PLS-DA), provides bioinformatic clues for selecting a limited number of gene transcripts that are most effective in discriminating between tumour histotypes. Among these transcripts it is possible to identify candidates for the development of new diagnostic tests for cancer detection, as well as unknown genes that deserve high priority in further studies. In particular, melan-A, acid phosphatase 5, dopachrome tautomerase, S100-beta and acid ceramidase were found to be among the most important genes for melanoma. The potential of this bioinformatic approach is exemplified by its ability to identify differentiation and diagnostic markers already in clinical use, such as protein S-100, a prognostic parameter in patients with metastatic melanoma and a screening marker for melanoma metastasis.
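
To illustrate the kind of gene ranking that a PLS-DA model makes possible, the following is a minimal sketch and not the procedure used in this study. It assumes a single class of interest coded as 0/1 and computes only the first PLS component, whose weight vector is proportional to the covariance between each centred gene and the centred class label. The function names, gene names and expression values are all hypothetical and invented for illustration.

```python
import math


def first_pls_weights(X, y):
    """Unit-length weight vector of the first PLS component (single response).

    X: list of samples, each a list of gene expression values.
    y: list of 0/1 labels (1 = class of interest), the dummy coding
       typically used in PLS-DA.
    """
    n, p = len(X), len(X[0])
    # Centre each gene (column) and the class label.
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # The first weight vector is proportional to Xc^T yc.
    w = [
        sum((X[i][j] - x_mean[j]) * (y[i] - y_mean) for i in range(n))
        for j in range(p)
    ]
    norm = math.sqrt(sum(v * v for v in w))
    if norm == 0.0:
        raise ValueError("no gene covaries with the class label")
    return [v / norm for v in w]


def rank_genes(X, y, gene_names):
    """Return (gene, weight) pairs sorted by decreasing absolute weight."""
    w = first_pls_weights(X, y)
    order = sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)
    return [(gene_names[j], w[j]) for j in order]


# Toy data, invented for illustration: rows are cell lines, columns are genes.
GENES = ["gene_A", "gene_B", "gene_C"]
X = [
    [5.0, 1.0, 2.0],  # class of interest
    [6.0, 1.2, 2.1],  # class of interest
    [1.0, 1.1, 2.0],  # other
    [0.5, 0.9, 2.0],  # other
]
y = [1, 1, 0, 0]

ranking = rank_genes(X, y, GENES)
for gene, weight in ranking:
    print(f"{gene}\t{weight:+.3f}")
```

Genes with large absolute weights are those whose expression covaries most strongly with class membership. A full PLS-DA analysis would normally use several components, scaled variables and cross-validation, none of which this sketch attempts.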