A Representative Standardized Sample Set Selection For Improving Student’s Performance Prediction

Research Article

Sasi Regha, R and Uma Rani, R

DOI:

http://dx.doi.org/10.24327/ijrsr.2018.0902.1693

Subject:

science

Keywords:

NMFC, Modified Computer Aided Design of Experiments, Divergence, Modified Principal Component Analysis

Abstract:

A hybrid Artificial Fish Swarm-Cuckoo Search (AFSCS) and Non-negative Matrix Factorization Clustering (NMFC) approach was proposed for selecting optimal relevant features and removing redundant features from a student academic dataset. However, outliers and redundant data samples in the dataset reduce the efficiency and effectiveness of the classifiers. In this paper, the most class-specific representative samples are first selected using the Computer Aided Design of Experiments (CADEX) algorithm. CADEX first selects the sample closest to the mean; each subsequent sample is the one most distant, in Euclidean distance, from the samples already selected. Principal Component Analysis (PCA) is used within CADEX to select the optimal components. However, Euclidean distance lets CADEX select the right samples only when all attributes are measured in similar units. A Modified CADEX (MCADEX) based on the Kullback-Leibler divergence is therefore proposed. In this approach, the mean of each class is calculated, and the divergence between the mean and the data samples is then computed using multiple reference samples; the samples with the highest divergence are selected for each class. MCADEX also uses a Modified PCA (MPCA) instead of PCA, because some attributes in the dataset may be orders of magnitude larger than others and would therefore dominate the variance when the eigenvalues are calculated. In MPCA, the eigenvectors of the covariance matrix are derived from similarity measures such as mutual information, angle information, and a hybrid Gaussian-polynomial kernel. The datasets obtained after sample selection are used to predict student performance with the Prism and J48 classifiers. The experimental results show that the proposed sample selection approaches improve the accuracy of the classifiers.
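The CADEX selection rule described above can be sketched in Python as follows. This is a minimal, hypothetical sketch, not the authors' implementation: it assumes the features have already been projected (for example by PCA) into lists of numbers, reads "most distant from the already selected samples" in the usual max-min (Kennard-Stone) sense, and uses illustrative function names (`cadex_select`, `select_per_class`, `symmetric_kl`) that do not come from the paper. `symmetric_kl` is an assumed stand-in for the MCADEX divergence: the abstract does not say how a sample becomes a probability distribution, so each non-negative vector is simply normalized to sum to 1.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
Distance = Callable[[Vector, Vector], float]


def euclidean(a: Vector, b: Vector) -> float:
    """Plain Euclidean distance, as used by the original CADEX."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _to_distribution(v: Vector, eps: float) -> List[float]:
    # Assumption: clip negatives to 0 and add eps so every entry is positive,
    # then normalize so the entries sum to 1.
    shifted = [max(x, 0.0) + eps for x in v]
    total = sum(shifted)
    return [x / total for x in shifted]


def symmetric_kl(a: Vector, b: Vector, eps: float = 1e-12) -> float:
    """Symmetric Kullback-Leibler divergence KL(p||q) + KL(q||p).

    Hypothetical stand-in for the MCADEX divergence, not the paper's exact formula.
    """
    p = _to_distribution(a, eps)
    q = _to_distribution(b, eps)
    kl_pq = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    kl_qp = sum(qi * math.log(qi / pi) for pi, qi in zip(p, q))
    return kl_pq + kl_qp


def mean_vector(X: Sequence[Vector]) -> List[float]:
    """Column-wise mean of the samples."""
    return [sum(col) / len(X) for col in zip(*X)]


def cadex_select(X: Sequence[Vector], k: int, dist: Distance = euclidean) -> List[int]:
    """Return indices of k samples chosen by the CADEX (max-min) rule."""
    n = len(X)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and len(X)")
    mu = mean_vector(X)
    # Step 1: start from the sample closest to the mean.
    first = min(range(n), key=lambda i: dist(X[i], mu))
    selected = [first]
    # nearest[i] = distance from sample i to its closest already-selected sample.
    nearest = [dist(X[i], X[first]) for i in range(n)]
    while len(selected) < k:
        # Step 2: add the unselected sample farthest from the selected set.
        candidates = [i for i in range(n) if i not in selected]
        nxt = max(candidates, key=lambda i: nearest[i])
        selected.append(nxt)
        nearest = [min(d, dist(X[i], X[nxt])) for i, d in enumerate(nearest)]
    return selected


def select_per_class(
    X: Sequence[Vector], y: Sequence, k_per_class: int, dist: Distance = euclidean
) -> List[int]:
    """Run the selection separately within each class; return sorted global indices."""
    chosen: List[int] = []
    for label in sorted(set(y)):
        idx = [i for i, lab in enumerate(y) if lab == label]
        subset = [X[i] for i in idx]
        k = min(k_per_class, len(subset))
        chosen.extend(idx[j] for j in cadex_select(subset, k, dist))
    return sorted(chosen)
```

With `X = [[0, 0], [1, 0], [0, 1], [10, 10], [4, 4]]`, `cadex_select(X, 3)` returns `[4, 3, 0]`: the point nearest the mean (3, 3) comes first, then the distant point (10, 10), then the point farthest from both. Passing `dist=symmetric_kl` swaps in the divergence; because symmetric KL is not a true metric, the max-min rule then acts as a heuristic rather than a distance-based space-filling design.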
