Published in October 2019 | Data Science, Machine Learning
Breast cancer is one of the most frequently diagnosed cancers among women worldwide. Accurate detection of breast cancer is essential for providing better treatment and minimizing risk to patients. Recently, biological data such as gene expression profiles, protein sequences, and DNA sequences have been used to diagnose the disease at an earlier stage, owing to improvements in accessible data mining techniques. However, current state-of-the-art methods have been reported to have certain limitations in their diagnostic capability. To improve breast cancer classification, an efficient technique called Gaussian Kernelized Neighbor Embedding based Light Gradient Boost Classification (GKNE-LGBC) is introduced. The GKNE-LGBC technique uses a benchmark microarray dataset and performs two processes, feature selection and classification, to detect breast cancer from gene expression data. First, the genes and their expression data are collected from the microarray dataset. Then, the Gaussian Kernelized stochastic neighbor embedding algorithm is applied to select the relevant features (i.e. genes) and remove the irrelevant ones based on distance similarity. Next, the gene expression data are classified with a steepest descent light gradient boosting algorithm. The boosting algorithm first constructs a number of weak learners, i.e. bivariate regression trees, that classify the input expression data as normal or cancerous using the selected features. The weak classifiers are then combined into a strong classifier by minimizing the training error. This improves breast cancer detection accuracy and reduces the false positive rate. The experimental evaluation is carried out on a gene microarray dataset using parameters such as breast cancer detection accuracy, false positive rate, and breast cancer detection time for varying numbers of genes.
The experimental results confirm that the proposed GKNE-LGBC technique identifies breast cancer with higher accuracy, lower time complexity, and a lower false positive rate than state-of-the-art methods.
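The two-stage pipeline described above (gene selection followed by boosted classification) can be illustrated with a minimal plain-Python sketch. It is not the authors' implementation. The abstract does not give the exact Gaussian kernelized SNE selection criterion, so `sne_relevance` uses an assumed, hypothetical score: the SNE-style Gaussian neighbor probability mass that each sample places on samples of its own class, computed one gene at a time. The classifier is ordinary gradient boosting with decision stumps, where each stump is fitted to the negative gradient of the log-loss (a steepest descent step in function space). It is not the LightGBM library and omits its histogram binning and leaf-wise tree growth. All function names, the kernel width `sigma`, the learning rate, and the toy data are illustrative assumptions.

```python
import math


def sne_relevance(values, labels, sigma=1.0):
    """Hypothetical gene score: mean SNE-style probability mass that each
    sample assigns (via a Gaussian kernel on this gene) to same-class samples."""
    n = len(values)
    total = 0.0
    for i in range(n):
        # Unnormalised Gaussian similarities to every other sample (self excluded).
        weights = [0.0 if j == i else
                   math.exp(-((values[i] - values[j]) ** 2) / (2 * sigma ** 2))
                   for j in range(n)]
        z = sum(weights)
        if z > 0:
            # Normalising gives p_{j|i}; keep only the mass on same-class neighbours.
            same = sum(w for j, w in enumerate(weights) if labels[j] == labels[i])
            total += same / z
    return total / n


def select_genes(X, y, k, sigma=1.0):
    """Return the indices of the k highest-scoring genes (columns of X)."""
    scores = [sne_relevance([row[g] for row in X], y, sigma) for g in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)[:k]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_stump(X, r, features):
    """Least-squares decision stump (a single split) fitted to residuals r."""
    best = None
    for g in features:
        vals = sorted(set(row[g] for row in X))
        for a, b in zip(vals, vals[1:]):
            t = (a + b) / 2
            left = [r[i] for i, row in enumerate(X) if row[g] <= t]
            right = [r[i] for i, row in enumerate(X) if row[g] > t]
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lv) ** 2 for v in left) + sum((v - rv) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, g, t, lv, rv)
    if best is None:  # every selected feature is constant: predict the mean residual
        m = sum(r) / len(r)
        return features[0], math.inf, m, m
    return best[1:]


def boost(X, y, features, rounds=50, lr=0.1):
    """Gradient boosting on log-loss; assumes y contains both labels 0 and 1."""
    p = sum(y) / len(y)
    base = math.log(p / (1 - p))  # initial prediction: class-prior log-odds
    F = [base] * len(y)
    stumps = []
    for _ in range(rounds):
        # Negative gradient of log-loss with respect to F is y - sigmoid(F).
        r = [yi - sigmoid(f) for yi, f in zip(y, F)]
        g, t, lv, rv = fit_stump(X, r, features)
        stumps.append((g, t, lv, rv))
        F = [f + lr * (lv if row[g] <= t else rv) for f, row in zip(F, X)]
    return base, lr, stumps


def predict(model, row):
    """Label 1 (cancerous) if the boosted probability is at least 0.5, else 0 (normal)."""
    base, lr, stumps = model
    f = base + lr * sum(lv if row[g] <= t else rv for g, t, lv, rv in stumps)
    return 1 if sigmoid(f) >= 0.5 else 0


# Toy data: 8 samples x 3 genes; only gene 0 separates normal (0) from cancerous (1).
X = [[0.1, 5.0, 2.0], [0.2, 1.0, 3.1], [0.3, 4.0, 2.2], [0.15, 2.0, 3.0],
     [2.1, 4.5, 2.1], [2.3, 1.5, 3.2], [1.9, 3.5, 2.05], [2.2, 2.5, 2.9]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
selected = select_genes(X, y, k=1)
model = boost(X, y, selected)
preds = [predict(model, row) for row in X]
```

On the toy data, gene 0 separates the two classes while genes 1 and 2 interleave them, so the assumed score ranks gene 0 first and every boosted stump splits on it. A real microarray study would also need held-out evaluation of accuracy and false positive rate, which this sketch leaves out.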