Neural network based ensemble classifier for digital mammography
Thesis posted on 06.12.2017, 00:00, authored by McLeod
Breast cancer is a debilitating condition with high mortality and morbidity rates, and it is the most commonly diagnosed form of cancer in women. The aetiology of the disease is not known; however, certain lifestyle and genetic factors are known to increase the likelihood of developing it. This thesis proposes a number of techniques designed to increase the diagnostic accuracy of Computer Aided Diagnostic (CADx) systems. The first technique breaks the benign and malignant classes down into sub-classes through clustering and applies a support vector machine classifier to these sub-classes; it is known as the Soft Clustered Support Vector Machine (SCSVM). The sub-classes are designed to reduce the variability within the classes in order to increase classification accuracy. The second technique uses clustering in two complementary mechanisms. The first classifies those patterns that are readily identified with a cluster by retrieving the cluster label. The second uses the variability in cluster membership of k-means clustering to introduce diversity into feed forward neural networks for the construction of an ensemble. The idea is that clustering handles the easily classified patterns, allowing the ensemble to be trained on the harder-to-classify patterns. This technique is known as a Clustered Ensemble Neural Network (CENN). The third technique varies the number of neurons in the single hidden layer of feed forward neural networks. These candidate networks are then ranked by performance in order to create an ensemble classifier whose outputs are fused using the majority vote algorithm. The technique is known as a Variable Hidden Neuron Ensemble Classifier (VHNEC). Only three base classifiers are needed to obtain high classification accuracy.
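The ranking and majority-vote fusion described for the VHNEC can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the validation accuracies and the per-network predictions below are hypothetical, and the networks themselves are assumed to have been trained elsewhere. Labels are encoded as 0 (benign) and 1 (malignant).

```python
from collections import Counter


def select_top_networks(val_accuracy, k=3):
    """Rank candidate networks (keyed by hidden-neuron count) by
    validation accuracy and keep the k best."""
    ranked = sorted(val_accuracy, key=val_accuracy.get, reverse=True)
    return ranked[:k]


def majority_vote(predictions):
    """Fuse several label sequences into one label per sample by
    taking the most common vote for each sample."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


# Hypothetical validation accuracy for each hidden-neuron count.
val_accuracy = {4: 0.88, 8: 0.93, 16: 0.95, 32: 0.91, 64: 0.86}

# Hypothetical test-set predictions from each candidate network.
candidate_predictions = {
    4: [0, 1, 0, 0, 1],
    8: [1, 0, 0, 1, 0],
    16: [1, 0, 1, 1, 0],
    32: [0, 0, 1, 1, 1],
    64: [1, 1, 1, 0, 0],
}

chosen = select_top_networks(val_accuracy, k=3)
fused = majority_vote([candidate_predictions[n] for n in chosen])
print("base classifiers (hidden neurons):", chosen)
print("fused labels:", fused)
```

Using an odd number of base classifiers on a two-class problem means a vote can never tie, which is one practical reason a three-member ensemble is convenient.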
During the creation of the VHNEC it was found that polynomial regression on the number of hidden neurons can predict a range that holds the best performing networks. The SCSVM produces a classification accuracy of 94%, which is 4% higher than that achieved by a Support Vector Machine classifier on a dataset from the Digital Database of Screening Mammography (DDSM). The CENN technique achieves a classification accuracy of 91% on mass anomalies from the DDSM and, while the improvement in accuracy is not high, the CENN classifier reduces the variability in classification accuracy. The VHNEC technique is tested on mass anomalies from the DDSM and biopsy samples from the Wisconsin Breast Cancer dataset and achieves 99% classification accuracy. The research also identifies a mechanism (polynomial regression) for determining the upper and lower bounds on the number of hidden-layer neurons that yields the best classification accuracy for a neural network. The technique successfully predicts the range that contains the highest performing classifiers on the DDSM and Wisconsin Breast Cancer datasets. Given that breast cancer can be a fatal condition [1-3], and that the research presented achieves a high degree of accuracy on breast masses and can reduce the resources (time, CPU and memory) needed to configure and operate computer aided diagnostic systems, the work is highly significant. This thesis also makes recommendations about possible improvements to the proposed techniques, including suggestions for future research.
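One way to read the polynomial-regression step is sketched below. The abstract does not give the polynomial degree or the rule for turning the fitted curve into bounds, so the quadratic fit, the `tolerance` parameter, the function name and the accuracy figures are all assumptions made for illustration.

```python
import numpy as np


def predict_neuron_range(neurons, accuracy, degree=2, tolerance=0.012):
    """Fit a polynomial to accuracy versus hidden-neuron count and return
    the (lower, upper) neuron counts whose fitted accuracy lies within
    `tolerance` of the fitted maximum. Both the degree and the tolerance
    rule are assumptions, not taken from the thesis."""
    coeffs = np.polyfit(neurons, accuracy, degree)
    grid = np.arange(min(neurons), max(neurons) + 1)
    fitted = np.polyval(coeffs, grid)
    good = grid[fitted >= fitted.max() - tolerance]
    return int(good.min()), int(good.max())


# Hypothetical accuracies measured for a handful of hidden-layer sizes.
neurons = [5, 10, 15, 20, 25, 30, 35]
accuracy = [0.86, 0.91, 0.94, 0.95, 0.94, 0.91, 0.86]

lo, hi = predict_neuron_range(neurons, accuracy)
print(f"train candidate networks with {lo} to {hi} hidden neurons")
```

Restricting training to the predicted range is what would save time, CPU and memory: only the hidden-layer sizes inside the bounds need to be trained and evaluated in full.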