A Representative Standardized Sample Set Selection For Improving Student’s Performance Prediction

Research Article
Sasi Regha, R and Uma Rani, R
DOI: 
http://dx.doi.org/10.24327/ijrsr.2018.0902.1693
Subject: 
science
KeyWords: 
NMFC, Modified Computer Aided Design of Experiments, Divergence, Modified Principal Component Analysis
Abstract: 

A hybrid Artificial Fish Swarm-Cuckoo Search (AFSCS) and Non-negative Matrix Factorization Clustering (NMFC) approach was proposed for selecting optimal relevant features and removing redundant features in a student academic dataset. However, outliers and redundant data samples in the dataset reduce the efficiency and effectiveness of classifiers. In this paper, the most class-specific representative samples are first selected using the Computer Aided Design of Experiments (CADEX) algorithm. CADEX first selects the sample closest to the mean; each subsequent selection is the sample most distant, by Euclidean distance, from the already selected samples. Principal Component Analysis (PCA) is used within CADEX to select the optimal components. However, Euclidean distance in CADEX selects appropriate samples only when all attributes have similar units. A Modified CADEX (MCADEX) is therefore proposed, based on Kullback-Leibler divergence. In this approach, the mean value is calculated for each class, and the divergence between the mean and the data samples is computed using multiple reference data samples. The most divergent data samples are selected for each class. MCADEX uses Modified PCA (MPCA) instead of PCA because some attributes in the dataset may differ from others by orders of magnitude, which can produce disproportionately high variance during eigenvalue calculation. In MPCA, the eigenvectors of the covariance matrix are derived from various similarity measures, such as mutual information, angle information, and a hybrid Gaussian and polynomial kernel. The sample-selected datasets are used to predict student performance with the Prism and J48 classifiers. The experimental results show that the proposed sample selection approaches improve classifier accuracy.
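
The selection rule described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `cadex_select`, `euclidean`, and `kl_divergence` are hypothetical, the PCA and MPCA steps are omitted, and "most distant from the already selected samples" is interpreted here as the usual max-min rule (pick the candidate whose nearest selected sample is farthest away). The `kl_divergence` helper additionally assumes non-negative attributes that can be normalised to sum to one, which the abstract does not state.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) after normalising both vectors to sum to 1.
    Assumption: all entries are non-negative; eps guards against log(0)."""
    p = [x + eps for x in p]
    q = [x + eps for x in q]
    sp, sq = sum(p), sum(q)
    return sum((x / sp) * math.log((x / sp) / (y / sq)) for x, y in zip(p, q))

def cadex_select(samples, k, dist=euclidean):
    """Return indices of k samples for one class.
    Step 1: take the sample closest to the class mean.
    Step 2: repeatedly take the sample whose distance to its nearest
            already-selected sample is largest (max-min assumption)."""
    if not 1 <= k <= len(samples):
        raise ValueError("k must be between 1 and the number of samples")
    n = len(samples)
    mean = [sum(row[j] for row in samples) / n for j in range(len(samples[0]))]
    first = min(range(n), key=lambda i: dist(samples[i], mean))
    selected = [first]
    # Distance from every sample to its nearest selected sample so far.
    min_dist = [dist(row, samples[first]) for row in samples]
    while len(selected) < k:
        candidates = [i for i in range(n) if i not in selected]
        nxt = max(candidates, key=lambda i: min_dist[i])
        selected.append(nxt)
        min_dist = [min(d, dist(row, samples[nxt]))
                    for d, row in zip(min_dist, samples)]
    return selected

# Toy one-class data (made up for illustration only).
toy = [[0, 0], [1, 0], [10, 0], [4, 0], [5, 0]]
picked = cadex_select(toy, 3)
```

Passing `dist=kl_divergence` swaps the Euclidean distance for a divergence, in the spirit of MCADEX. Because Kullback-Leibler divergence is asymmetric, the argument order used here (sample first, reference second) is itself an assumption.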