Description of Session
Lung cancer has the highest incidence and mortality of all cancer types in both sexes, and its rates continue to rise rapidly year after year. Approximately 80% of newly diagnosed lung cancers are non-small cell lung cancers. Early diagnosis of any type of cancer raises the patient's probability of recovery and thus improves the chances of survival. Machine learning allows us to process the large number of variables involved in this disease. Using metabolites as attributes for the analysis, we can distinguish lung cancer patients from healthy individuals. In addition, machine-learning algorithms show us which metabolites contribute most to the classification. The goal of this study is to demonstrate the accuracy, sensitivity and specificity of a supervised learning algorithm in classifying and predicting non-small cell lung cancer, using concentration values found in the serum and plasma metabolome of affected and healthy humans. We obtained the dataset, which contains 335 samples and 139 known detected metabolites, from the Metabolomics Workbench repository. Of the models applied, the Random Forest classifier achieved the highest accuracy. It was possible to identify metabolite patterns that classify participants according to diagnosis with more than 75% accuracy. Important metabolites for the classification included aspartic acid and fructose in the serum profiles, and cystine and arabinose in the plasma profiles. The metabolomic profiles of healthy individuals and lung cancer patients allow the classification and prediction of this disease with relatively high accuracy, which indicates that some metabolites are strongly associated with this condition and could be considered potential biomarkers. This study helps us to understand the biological processes of lung cancer and contributes to the field of personalized medicine by providing clues for earlier diagnosis.
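For readers less familiar with the three evaluation measures named above, the following is a minimal sketch of how accuracy, sensitivity and specificity can be computed from true and predicted diagnoses. The function name `binary_metrics`, the label coding (1 for lung cancer, 0 for healthy) and the example labels are illustrative assumptions, not the study's code or data.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, sensitivity and specificity for binary labels.

    `positive` is the label treated as "disease present"
    (an assumed coding: 1 = lung cancer, 0 = healthy).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1  # patient correctly flagged
        elif truth == positive:
            fn += 1  # patient missed
        elif pred == positive:
            fp += 1  # healthy participant wrongly flagged
        else:
            tn += 1  # healthy participant correctly cleared

    total = tp + tn + fp + fn
    nan = float("nan")
    return {
        # Share of all participants classified correctly.
        "accuracy": (tp + tn) / total if total else nan,
        # Share of true patients that the model detects.
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        # Share of healthy participants that the model clears.
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
    }


# Made-up labels for illustration only (not taken from the dataset).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In a typical workflow, `y_pred` would come from a classifier evaluated on held-out samples. With scikit-learn, for example, a fitted `RandomForestClassifier` exposes a `feature_importances_` attribute that offers one way to rank metabolites; whether the study used that library or that ranking is not stated in the abstract.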