Performance Analysis of Classification Models


In machine learning, once a model is built, the next step is to evaluate it using various performance criteria.

In a classification model the output is a discrete value, so the following methods are used for classification performance analysis:

  1. Confusion matrix
  2. Accuracy
  3. Precision
  4. Recall (sensitivity)
  5. Specificity
  6. ROC curve (AUC): the Area Under the ROC Curve is useful when we are not specifically concerned with the (often small) positive class, in contrast to the F1 score, where the positive class is the focus.
  7. F-score (the F1 score is useful when the positive class is relatively small)

Performance metrics should be chosen based on the problem domain, project goals, and objectives. 

A confusion matrix

A confusion matrix is a table that is used to describe the performance of a classification algorithm (or "classifier") on a set of test data for which the true values/targets are known.
For a binary classifier, the confusion matrix is as shown in Figure 1; the values for the worked example below are shown in Figure 2.


                                Figure 1: Confusion Matrix (rows = actual class, columns = predicted class)



Total Samples = 1000

                    Predicted Yes    Predicted No
    Actual Yes      TP (800)         FN (40)
    Actual No       FP (60)          TN (100)

                                                                    Figure 2: Confusion Matrix Values
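
As a minimal sketch (not part of the original example), a confusion matrix like Figure 2 can be computed with scikit-learn from arrays of true and predicted labels; the label lists below are made-up stand-ins, with 1 = diabetic ("yes") and 0 = non-diabetic ("no").

```python
# A minimal sketch: building a confusion matrix with scikit-learn.
# The label lists are hypothetical stand-ins (1 = diabetic, 0 = non-diabetic).
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 0, 1, 0, 0, 1, 1]   # actual classes
y_pred = [1, 0, 0, 1, 1, 0, 1, 1]   # classes predicted by the model

# With labels=[0, 1] the rows are actual classes and the columns are
# predicted classes:  [[TN, FP],
#                      [FN, TP]]
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
tn, fp, fn, tp = cm.ravel()
print(cm)
print("TP:", tp, "FN:", fn, "FP:", fp, "TN:", tn)
```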

Suppose the machine learning model is used to predict whether a patient is diabetic or not. In Figures 1 and 2:

"Yes" means the patient is diabetic.

"No" means the patient is not diabetic.

Suppose the total number of patients is 1000 (total samples). Out of these 1000 samples:

800 patients are correctly predicted as diabetic,

100 patients are correctly predicted as non-diabetic, and

100 predictions are wrong.

 

  • Correctly predicted: 900
  • Wrongly predicted: 100
  • Basic terms of the confusion matrix

    Correct predictions: the diagonal elements of the confusion matrix give the correct predictions.

    True positives (TP): cases in which the model predicted yes (the patient has the disease) and the patient actually is diabetic.

    True negatives (TN): the model predicted no, and the patient is not diabetic.

    Wrong predictions: the off-diagonal elements.

    False positives (FP): the model predicted yes, but the patient is not actually diabetic (also known as a "Type I error").

    False negatives (FN): the model predicted no, but the patient actually is diabetic (also known as a "Type II error").

     

    For balanced data, accuracy is used for performance analysis.

    Balanced data: in binary classification, a roughly equal number of samples belongs to class I and class II. A quick balance check is sketched below.
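
A minimal sketch of such a check, using the class counts from the worked example (840 actual positives, 160 actual negatives):

```python
# A minimal sketch: checking class balance before relying on accuracy.
# The label list reproduces the worked example's class counts, so the
# data is clearly imbalanced.
from collections import Counter

y_true = [1] * 840 + [0] * 160
counts = Counter(y_true)
print(counts)                                      # Counter({1: 840, 0: 160})

balance_ratio = min(counts.values()) / max(counts.values())
print("balance ratio:", round(balance_ratio, 2))   # ~0.19; close to 1.0 would mean balanced
```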

  • Accuracy

    Accuracy = (TP + TN) / (TP + FP + FN + TN) = (800 + 100) / 1000 = 0.9

    (90% of samples are correctly classified)

    Error Rate = (FP + FN) / (TP + TN + FN + FP) = (60 + 40) / 1000 = 0.1

    Accuracy = 1 - Error Rate
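
The same arithmetic as a short sketch, using the counts from the worked example:

```python
# Accuracy and error rate for the worked example (TP=800, TN=100, FP=60, FN=40).
TP, TN, FP, FN = 800, 100, 60, 40
total = TP + TN + FP + FN                # 1000

accuracy = (TP + TN) / total             # (800 + 100) / 1000 = 0.9
error_rate = (FP + FN) / total           # (60 + 40) / 1000 = 0.1
print(accuracy, error_rate)              # 0.9 0.1  (accuracy = 1 - error_rate)
```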

  • Recall / Sensitivity / TPR

    Recall = Sensitivity = TPR = TP / (TP + FN)   (denominator = actual positives)

    Sensitivity = TP / (actual yes) = 800 / 840 ≈ 0.95
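
The recall calculation above as a short sketch:

```python
# Recall / sensitivity / TPR for the worked example.
TP, FN = 800, 40
recall = TP / (TP + FN)                  # 800 / 840
print(round(recall, 2))                  # 0.95
```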

  • TPR=True Positive Rate

  • False Positive Rate (FPR): corresponds to Type I errors

    FPR = FP / (FP + TN) = 60 / (60 + 100) = 0.375

    True Negative Rate (TNR):

    TNR = TN / (TN + FP) = 100 / (100 + 60) = 0.625
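
The FPR and TNR calculations above as a short sketch:

```python
# False positive rate and true negative rate for the worked example.
FP, TN = 60, 100
fpr = FP / (FP + TN)                     # 60 / 160 = 0.375
tnr = TN / (TN + FP)                     # 100 / 160 = 0.625
print(fpr, tnr)                          # note that TNR = 1 - FPR
```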

  • Precision

    Precision = TP / (TP + FP) = 800 / (800 + 60) ≈ 0.93   (denominator = predicted yes)
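
The precision calculation above as a short sketch:

```python
# Precision for the worked example.
TP, FP = 800, 60
precision = TP / (TP + FP)               # 800 / 860
print(round(precision, 2))               # 0.93
```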

  • F Score: The F1-score is the harmonic mean of precision and recall, so it gives a combined idea of these two metrics. It is maximum when precision is equal to recall. More generally, the F-beta score is a weighted harmonic mean of precision and recall.

    F1 = 2 * Precision * Recall / (Precision + Recall)

    F_beta = (1 + beta^2) * Precision * Recall / (beta^2 * Precision + Recall)

    beta = 1: precision and recall are equally important (false positives and false negatives both have an impact).

    beta between 0 and 1 (e.g. beta = 0.5): false positives are more important, so precision is weighted more heavily.

    beta = 2: false negatives are more important, so recall is weighted more heavily.
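
The F1 and F-beta calculations as a short sketch, reusing the precision and recall from the worked example (f_beta is a small helper defined here, not a library function):

```python
# F1 and F-beta scores for the worked example.
precision = 800 / (800 + 60)             # ~0.93
recall = 800 / (800 + 40)                # ~0.95

f1 = 2 * precision * recall / (precision + recall)

def f_beta(p, r, beta):
    # beta < 1 weights precision more (false positives costly),
    # beta > 1 weights recall more (false negatives costly).
    return (1 + beta**2) * p * r / (beta**2 * p + r)

print(round(f1, 3))                              # ~0.941
print(round(f_beta(precision, recall, 0.5), 3))  # precision-weighted
print(round(f_beta(precision, recall, 2), 3))    # recall-weighted
```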
 

  •  AUC-ROC

    The Receiver Operator Characteristic (ROC) curve is an evaluation metric for binary classification problems. It is a probability curve that plots the TPR against the FPR at various threshold values. The Area Under the Curve (AUC) is a measure of the ability of a classifier to distinguish between classes and is used as a summary of the ROC curve.

    In a ROC curve, a higher X-axis value indicates a higher number of False positives than True negatives, while a higher Y-axis value indicates a higher number of True positives than False negatives. So, the choice of the threshold depends on the ability to balance between False positives and False negatives.


The higher the AUC, the better the performance of the model at distinguishing between the positive and negative classes.


When AUC = 1, the classifier is able to perfectly distinguish between all the Positive and the Negative class points. If, however, the AUC had been 0, the classifier would be predicting all Negatives as Positives and all Positives as Negatives.

When 0.5 < AUC < 1, there is a high chance that the classifier will be able to distinguish the positive class values from the negative class values. This is because the classifier detects more True positives and True negatives than False negatives and False positives.
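
A minimal sketch of computing the ROC curve points and the AUC with scikit-learn; the labels and scores below are made-up, chosen so that the classifier is useful but imperfect (0.5 < AUC < 1):

```python
# A minimal sketch with hypothetical labels and scores: an imperfect but
# useful classifier should give 0.5 < AUC < 1.
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)            # hypothetical true labels (0 or 1)
y_score = 0.4 * y_true + 0.6 * rng.random(200)   # hypothetical predicted probabilities

fpr, tpr, thresholds = roc_curve(y_true, y_score)   # points of the ROC curve
auc = roc_auc_score(y_true, y_score)
print("AUC:", round(auc, 3))                     # closer to 1.0 means better separation
```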



