DIABETIC PREDICTION USING FUZZY BACK PROPAGATION AND ANALYSIS

As the volume of data grows, manual analysis has become nearly impossible. Because real-world data are so varied, a crisp data set is often impractical, hence the need to process fuzzy data. Data mining algorithms help analyze large amounts of data and make predictions with very little human effort. This project implements an enhanced fuzzy back propagation neural network with a triangular membership function and compares it with a standard neural network and Naïve Bayes. The project aims to predict diabetic status, since diabetes is a serious condition that needs to be predicted. Eight indicators are used to determine the diabetic state. The best algorithm was Naïve Bayes with an accuracy of 89%, followed by the fuzzy ANN with 86% and the standard ANN with 82%. The study was carried out in Java using Java Servlet packages, with an SQL database for storage. Accurate prediction is necessary because human error can lead to poor diagnosis and sometimes to false reports.


INTRODUCTION
Diabetes among pregnant women is a major concern because it affects not only the mother but also the newborn child, so proper diagnosis of the disease is necessary. Many factors affect the diabetic condition in women, and these are determined by the optimized algorithm. Health organizations have estimated that 11 million deaths occur globally due to diabetes. Advances in the medical and health industry have made it easier to determine the risk factors that are the most probable causes of diabetes, but many cases of improper treatment and misuse of resources are still seen. Medical diagnosis is a necessary procedure, but it is complicated and must be handled very carefully. With a growing population, the number of diabetes cases is increasing alarmingly, and given the shortage of medical staff, an automated system that provides efficient and accurate results would be very beneficial. Not all doctors possess the expertise required for diagnosis. Hence an automated system is needed to overcome the problem.
This study implements three algorithms and compares them: fuzzy back propagation using a triangular membership function, a standard artificial neural network, and Naïve Bayes. Fuzzy back propagation is also analyzed by varying the number of epochs, N and M (memory constant). The best results are taken into consideration for the proposed system and for further implementation.
The results have been recorded and plotted as graphs, and the variation in the results is examined in the study.

LITERATURE REVIEW
The first fuzzy back propagation neural network was proposed in 1994 by Stefka Stoeva and Alexander Nikov. They showed the importance of the fuzzy approach compared with the standard neural network. Their approach simulated a hierarchical approach to the implementation of the fuzzy neural network. Fuzzy back propagation was next applied by Li, Liu, et al. to the early diagnosis of hypoxic ischemic encephalopathy in newborns. M. Anbarasi and M. A. Saleem Durai then used it for the prediction of protein folding kinetics in 2015.
The study of the standard ANN goes back to Frank Rosenblatt (1958), who developed the first perceptron algorithm; the outcome of that work was used to develop smart automated software and systems. It later became one of the most important algorithms in the field of supervised learning and prediction. It was applied to credit risk evaluation, where it was compared with Naïve Bayes and SVM by Lohit et al. in 2016. Naïve Bayes is another important algorithm, based on a probabilistic approach. In some cases it has given better results than algorithms such as PSO, random forest and ANN, hence it is necessary to compare the results with the Naïve Bayes algorithm.
Jeroen Eggermont, Joost N. Kok and Walter A. Kosters were the first to work with the dataset. They used a genetic algorithm. In 2004, Hastie, Trevor, et al. tried the same dataset and obtained decent results. So far, no one has implemented fuzzy back propagation using a triangular membership function on this dataset. In this study, parameters such as the number of epochs, N and M (memory constant) are varied.
Many works on Naïve Bayes classifiers have also been done in this context.

Dataset
This section describes the data set on which the research was carried out and the attributes used for prediction. The data set was donated by the National Institute of Diabetes and Digestive and Kidney Diseases. A total of 760 records are taken and worked upon. Because missing values in the data set were recorded as 0, the data set required some preprocessing. The method adopted is to replace each 0 value with the global mean of the corresponding column. The data set was then preprocessed again to convert it to a binary data set in which the output is either 1 or 2. Here 1 and 2 are class labels rather than numeric values: 1 represents risk of diabetes and 2 represents no risk. The output is in the range 0 to 1. It is marked 2 if the value is greater than 0.75 and 1 if it is lower than 0.75, which results in a binary data set as shown in Table 1.
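The two preprocessing steps described above can be sketched in Java as follows. This is a minimal illustration rather than the study's actual code: the class and method names are hypothetical, and it assumes that the column mean is taken over the non-zero (observed) entries only, which the text does not specify. An output of exactly 0.75 is mapped to label 1 here, which is also an assumption.

```java
/** Illustrative preprocessing helpers; all names here are hypothetical. */
final class Preprocess {

    /**
     * Replaces every 0 in the given column with the column mean.
     * Assumption: the mean is computed over the non-zero entries only.
     */
    static void replaceZerosWithColumnMean(double[][] rows, int col) {
        double sum = 0.0;
        int observed = 0;
        for (double[] row : rows) {
            if (row[col] != 0.0) {
                sum += row[col];
                observed++;
            }
        }
        if (observed == 0) {
            return; // nothing to impute from; leave the column unchanged
        }
        double mean = sum / observed;
        for (double[] row : rows) {
            if (row[col] == 0.0) {
                row[col] = mean;
            }
        }
    }

    /** Maps an output in [0, 1] to label 2 (greater than 0.75) or label 1. */
    static int toLabel(double output) {
        return output > 0.75 ? 2 : 1;
    }
}
```

For example, a column holding 0, 2 and 4 becomes 3, 2 and 4 under this assumption.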
The data set has eight predictor attributes and one class variable:
1. Number of times pregnant
2. Plasma glucose concentration at 2 hours in an oral glucose tolerance test
3. Diastolic blood pressure (mm Hg)
4. Triceps skin fold thickness (mm)
5. 2-hour serum insulin (mu U/ml)
6. Body mass index (weight in kg/(height in m)^2)
7. Diabetes pedigree function
8. Age (years)
9. Class variable (0 or 1)
To increase the accuracy of the above algorithms, the data must be preprocessed, and different algorithms require different preprocessing. In the fuzzy algorithms, preprocessing is done using a membership function. In the non-fuzzy algorithms, preprocessing is done using average values, and the data are divided into classes. The number of classes can vary, typically from 3 to 5.
Each class is given a label, for example high, medium or low. In fuzzy back propagation, the triangular membership function divides one data set into three sets. In Naïve Bayes, the data are also divided using a membership function. The data set must be normalised, with the missing (zero) values replaced by the global mean. The data were stored in a CSV (comma-delimited) file and loaded into a MySQL database using phpMyAdmin for all the algorithms. The data set is retrieved through the Java JDBC driver and loaded for further processing. The data set is divided into two groups: one is used for training and the other for testing.
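A triangular membership function that splits one attribute into three fuzzy sets (low, medium, high) might look like the sketch below. The placement of the triangle peaks at the minimum, midpoint and maximum of the attribute range is an assumption made for illustration; the study does not state the breakpoints it used, and the class and method names are hypothetical.

```java
/** Illustrative triangular fuzzifier; names and breakpoints are assumptions. */
final class TriangularMembership {

    /** Membership degree of x in a triangle with feet a and c and peak b (a < b < c). */
    static double degree(double x, double a, double b, double c) {
        if (x <= a || x >= c) {
            return 0.0;
        }
        if (x == b) {
            return 1.0;
        }
        return x < b ? (x - a) / (b - a) : (c - x) / (c - b);
    }

    /**
     * Fuzzifies x over [min, max] (max > min) into {low, medium, high} degrees.
     * Assumed peaks: low at min, medium at the midpoint, high at max.
     */
    static double[] fuzzify(double x, double min, double max) {
        double mid = (min + max) / 2.0;
        double half = max - mid; // half of the attribute range
        return new double[] {
            degree(x, min - half, min, mid),
            degree(x, min, mid, max),
            degree(x, mid, max, max + half)
        };
    }
}
```

With these assumed breakpoints, a value one quarter of the way through the range belongs equally to the low and medium sets and not at all to the high set.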

Artificial Neural Network
An ANN is an algorithm that models the human brain using the concept of a perceptron network. It is a learning-based algorithm in which the weights are adjusted repeatedly until the network output matches the expected output. This is called the training phase. It is a standard algorithm that follows the same procedure as the fuzzy ANN but is simpler and less complex.
The figure above shows the architecture of the ANN. It consists first of the input neurons, to which the corresponding data are fed. Next come the hidden layers, whose inputs are calculated as the weighted sum of the input values. The hidden-layer outputs are then calculated and fed, together with a bias, into the output neuron, and the error is calculated. The weight update formula is then applied and the process is repeated until the best accuracy is reached. The resulting weight set is recorded and used for testing the data, on which the application is built.
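The forward pass described above can be sketched as follows for a network with one hidden layer and a single output neuron. The sigmoid activation is an assumption (it is consistent with an output in the range 0 to 1, but the study does not name the activation function), and the class and method names are hypothetical.

```java
/** Illustrative forward pass; the sigmoid activation is an assumption. */
final class ForwardPass {

    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /** Weighted sum of the inputs plus a bias, passed through the sigmoid. */
    static double neuron(double[] inputs, double[] weights, double bias) {
        double sum = bias;
        for (int i = 0; i < inputs.length; i++) {
            sum += inputs[i] * weights[i];
        }
        return sigmoid(sum);
    }

    /** One hidden layer (one weight row per hidden neuron), then one output neuron. */
    static double predict(double[] x, double[][] wHidden, double[] bHidden,
                          double[] wOut, double bOut) {
        double[] hidden = new double[wHidden.length];
        for (int j = 0; j < wHidden.length; j++) {
            hidden[j] = neuron(x, wHidden[j], bHidden[j]);
        }
        return neuron(hidden, wOut, bOut);
    }
}
```

The error used for training would then be the difference between this output and the target label.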

Fuzzy Back propagation Using Triangular Membership Function
Algorithm for fuzzy back propagation using the triangular membership function:
1. Generate weights for the hidden, input and output layers randomly in the range from 0 ...
...
7. Compute step 6 similarly for the output neuron.
8. ... learning factor and memory constant.
9. ... statement should be used for the weight update.
10. Iterate from step 2 to step 9 until the error is sufficiently small.
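Since the update step refers to a learning factor and a memory constant, one plausible reading is the standard back propagation update with momentum, sketched below. This is an assumption: the exact update statement used in the study is not given in the text. The class name, method name and the use of an error gradient as input are hypothetical.

```java
/** Standard momentum update, shown as an assumed form of the weight update step. */
final class MomentumUpdate {

    /**
     * Updates the weights in place:
     *   delta = -learningFactor * gradient + memoryConstant * previousDelta
     * prevDelta is overwritten with the step just taken so that it can be
     * reused in the next iteration.
     */
    static void step(double[] weights, double[] gradient, double[] prevDelta,
                     double learningFactor, double memoryConstant) {
        for (int i = 0; i < weights.length; i++) {
            double delta = -learningFactor * gradient[i] + memoryConstant * prevDelta[i];
            weights[i] += delta;
            prevDelta[i] = delta;
        }
    }
}
```

A larger memory constant carries more of the previous step into the current one, which tends to smooth the updates across iterations.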

Naïve Bayes
The algorithm determines the occurrence of a state using probabilistic prediction, and it works on class labels. Attributes are assumed to be conditionally independent of one another. Because the algorithm works on discrete data, applying it to continuous data requires dividing the attributes into classes. Gaussian, mean and fuzzy classifiers can be used for this. The Gaussian classifier produces very poor results, hence the mean and fuzzy classifiers are used.
The formula above is applied once the data have been divided into a set of classes, and it is determined whether they are dependent or independent. Another important property of the Naïve Bayesian classifier is that it is very effective in terms of cost, since it reduces the calculation cost. It is best suited to problems where there is a strong relation between the variables and the data given.
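A Naïve Bayes classifier over attributes that have already been divided into classes (bins) can be sketched as below. It is an illustration under stated assumptions, not the study's implementation: labels and bins are 0-based integers, Laplace (add-one) smoothing is added to avoid zero probabilities, and the class name is hypothetical.

```java
/** Illustrative categorical Naive Bayes; smoothing and naming are assumptions. */
final class NaiveBayesSketch {
    private final int numBins;
    private final int[] classCount;   // training rows seen per class
    private final int[][][] binCount; // [class][attribute][bin]
    private int total;

    NaiveBayesSketch(int numClasses, int numAttributes, int numBins) {
        this.numBins = numBins;
        this.classCount = new int[numClasses];
        this.binCount = new int[numClasses][numAttributes][numBins];
    }

    /** Counts class and per-attribute bin frequencies from the training rows. */
    void train(int[][] rows, int[] labels) {
        for (int r = 0; r < rows.length; r++) {
            int c = labels[r];
            classCount[c]++;
            total++;
            for (int a = 0; a < rows[r].length; a++) {
                binCount[c][a][rows[r][a]]++;
            }
        }
    }

    /** Returns the class with the highest log posterior under attribute independence. */
    int predict(int[] row) {
        int best = 0;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int c = 0; c < classCount.length; c++) {
            double score = Math.log((classCount[c] + 1.0) / (total + classCount.length));
            for (int a = 0; a < row.length; a++) {
                score += Math.log((binCount[c][a][row[a]] + 1.0) / (classCount[c] + numBins));
            }
            if (score > bestScore) {
                bestScore = score;
                best = c;
            }
        }
        return best;
    }
}
```

Working in log space avoids numerical underflow when the per-attribute probabilities are multiplied together.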

RESULTS AND DISCUSSION
The artificial neural network, fuzzy back propagation using the triangular membership function, and Naïve Bayes were all implemented in JDK 8.0 with JSP (JavaServer Pages), and the data set is managed using MySQL. The CSV file was converted to an SQL file using phpMyAdmin and parsed using NetBeans. For fuzzy back propagation, results were obtained by varying the parameters. There are three main parameters: number of iterations, learning factor and momentum factor. The number of iterations was varied from 100 to 500000, the learning factor from 0.2 to 0.9, and the momentum factor from 0.1 to 0.95. In Table 3, the comparison is made over the number of iterations and the momentum factor. The momentum factor is varied from 0.2 to 0.9 and the number of iterations from 300 to 10000, and the corresponding error is noted. Increasing the number of iterations results in better accuracy, and the same holds for the momentum factor: as it increases, the error decreases.
The next table compares the number of iterations and the learning factor. An increase in iterations leads to a decrease in error, but an increase in the learning factor leads to an increase in error.
The next table compares the learning factor and the momentum factor. The learning factor has much less effect than the momentum factor. The results agree with the above observations.
Figure 2 shows the relation between error and learning factor. An increase in the learning factor mostly leads to an increase in error, hence the learning factor should be kept low to achieve better accuracy. Figure 3 relates the learning rate and the momentum factor: as the momentum factor increases, the error decreases.
Figure 4 relates the number of iterations and the error rate. As the number of iterations increases, the error decreases for a while, but after a certain number of iterations the error rate remains constant.
The accuracy on the data set is then measured for the three algorithms, and their confusion matrices are computed. A set of 300 values was used for testing.
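For reference, a confusion matrix and the accuracy derived from it can be computed as in the sketch below. The class and method names are hypothetical, and labels are assumed to be 0-based integers (the labels 1 and 2 used in this study would need to be shifted by one).

```java
/** Illustrative confusion-matrix helpers; names are hypothetical. */
final class ConfusionMatrix {

    /** Builds a matrix m where m[actual][predicted] counts the test rows. */
    static int[][] build(int[] actual, int[] predicted, int numClasses) {
        int[][] m = new int[numClasses][numClasses];
        for (int i = 0; i < actual.length; i++) {
            m[actual[i]][predicted[i]]++;
        }
        return m;
    }

    /** Accuracy is the sum of the diagonal divided by the total count. */
    static double accuracy(int[][] m) {
        int correct = 0;
        int total = 0;
        for (int r = 0; r < m.length; r++) {
            for (int c = 0; c < m[r].length; c++) {
                total += m[r][c];
                if (r == c) {
                    correct += m[r][c];
                }
            }
        }
        return total == 0 ? 0.0 : (double) correct / total;
    }
}
```

The off-diagonal entries show which class is being mistaken for which, information that a single accuracy figure hides.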
As can be seen from the above results, Naïve Bayes gives better accuracy than both the fuzzy ANN and the standard ANN, and the fuzzy ANN gives better accuracy than the standard ANN.

CONCLUSION
The novelty of this work lies in comparing machine learning algorithms to determine the best one. The best algorithm obtained was Naïve Bayes, as it is more efficient in classification than the other two. Although the other algorithms were slightly less accurate, they can also be used for diabetes prediction. The results obtained can serve as a basis for implementing a diabetes prediction system.