Performance Analysis of Classification Models

In machine learning, once a model is built, the next step is to evaluate it using suitable performance criteria.

In a classification model the output is a discrete value, so the following methods are used to analyze classification performance:

  1. Confusion matrix
  2. Accuracy
  3. Precision
  4. Recall (sensitivity)
  5. Specificity
  6. ROC curve and AUC (area under the ROC curve): useful when both classes matter, regardless of which one is the small class; the F1 score, by contrast, focuses on the positive class.
  7. F-score (the F1 score is useful when the positive class is relatively small)

Performance metrics should be chosen based on the problem domain, project goals, and objectives. 

Confusion matrix

A confusion matrix is a table that describes the performance of a classification algorithm (a "classifier") on a set of test data for which the true values (targets) are known. For a binary classifier, the confusion matrix has the layout shown in Figure 1.

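                 Predicted Yes          Predicted No
Actual Yes       True positive (TP)     False negative (FN)
Actual No        False positive (FP)    True negative (TN)

Figure 1: Confusion matrix

Total samples = 1000

                 Predicted Yes    Predicted No
Actual Yes       TP = 800         FN = 40
Actual No        FP = 60          TN = 100

Figure 2: Confusion matrix values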

Suppose the machine learning model is used to predict whether a patient is diabetic. In Figure 1 and Figure 2:

"Yes" means the patient is diabetic.

"No" means the patient is not diabetic.

Suppose the total number of patients (total samples) is 1000. Out of these 1000 samples:

800 diabetic patients are correctly predicted as diabetic

100 non-diabetic patients are correctly predicted as non-diabetic

100 patients receive a wrong prediction


  • Correctly predicted: 900
  • Wrongly predicted: 100
  • Basic terms of the confusion matrix

    Correct predictions: the diagonal elements of the matrix give the correct predictions.

    True positives (TP): the model predicted yes (the patient has diabetes), and the patient does have diabetes.

    True negatives (TN): the model predicted no, and the patient does not have diabetes.

    Wrong predictions: the off-diagonal elements give the wrong predictions.

    False positives (FP): the model predicted yes, but the patient does not actually have diabetes (also known as a "Type I error").

    False negatives (FN): the model predicted no, but the patient actually does have diabetes (also known as a "Type II error").
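The four terms above can be counted directly from paired actual and predicted labels. The following is a minimal Python sketch, not taken from the original text; the function name `confusion_counts` and the six sample labels are hypothetical and chosen only for illustration.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive="yes"):
    """Count TP, TN, FP and FN for a binary classifier.

    `positive` names the label treated as the positive class
    (an assumption of this sketch: labels are plain strings).
    """
    counts = Counter()
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            # Predicted yes: correct if the patient really is positive.
            counts["TP" if actual == positive else "FP"] += 1
        else:
            # Predicted no: a miss if the patient really is positive.
            counts["FN" if actual == positive else "TN"] += 1
    return {key: counts[key] for key in ("TP", "TN", "FP", "FN")}


# Hypothetical labels for six patients (illustrative only).
y_true = ["yes", "yes", "no", "no", "yes", "no"]
y_pred = ["yes", "no", "no", "yes", "yes", "no"]
print(confusion_counts(y_true, y_pred))
```

TP and TN together give the correct predictions, while FP and FN together give the wrong ones.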


    Accuracy is a suitable performance measure when the data is balanced.

    Balanced data: in binary classification, an equal (or nearly equal) number of samples belongs to class I and class II.

  • Accuracy

    Accuracy = (TP + TN) / (TP + FP + FN + TN) = (800 + 100) / 1000 = 0.9

    (90% of the samples are correctly classified.)

    Error Rate = (FP + FN) / (TP + TN + FN + FP) = (60 + 40) / 1000 = 0.1

    Accuracy = 1 - Error Rate
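The accuracy and error-rate arithmetic can be checked with a short Python sketch. The function names are illustrative, and the first call uses the counts implied by the diabetes example (TP = 800, TN = 100, FP = 60, FN = 40). The second call uses hypothetical counts, not from the text, to suggest why accuracy is reserved for balanced data.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all samples that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def error_rate(tp, tn, fp, fn):
    """Fraction of all samples that were classified wrongly."""
    return (fp + fn) / (tp + tn + fp + fn)


# Counts from the diabetes example.
acc = accuracy(tp=800, tn=100, fp=60, fn=40)
err = error_rate(tp=800, tn=100, fp=60, fn=40)
print(f"accuracy={acc:.2f} error_rate={err:.2f}")

# Hypothetical imbalanced data: 950 negatives, 50 positives, and a
# model that answers "no" for every patient. Accuracy looks high even
# though no positive case is ever detected.
skewed_acc = accuracy(tp=0, tn=950, fp=0, fn=50)
print(f"accuracy on skewed data={skewed_acc:.2f}")
```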

  • Recall / Sensitivity / TPR (True Positive Rate)

    Recall = Sensitivity = TPR = TP / (TP + FN), where TP + FN is the number of actual positives.

    Sensitivity = TP / (actual yes) = 800 / 840 ≈ 0.95

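  • False Positive Rate (FPR): the rate of Type I errors

    FPR = FP / (FP + TN) = 60 / (60 + 100) = 0.375

    True Negative Rate (TNR, specificity):

    TNR = TN / (TN + FP) = 100 / (100 + 60) = 0.625 = 1 - FPR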
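The rate-based metrics above can be sketched in Python as follows. The function names are illustrative assumptions; the counts are those implied by the diabetes example (TP = 800, FN = 40, FP = 60, TN = 100), and the true negative rate is computed with its standard definition TN / (TN + FP).

```python
def true_positive_rate(tp, fn):
    """Recall / sensitivity: share of actual positives that were found."""
    return tp / (tp + fn)


def false_positive_rate(fp, tn):
    """Share of actual negatives that were wrongly flagged as positive."""
    return fp / (fp + tn)


def true_negative_rate(tn, fp):
    """Specificity: share of actual negatives correctly left unflagged."""
    return tn / (tn + fp)


tpr = true_positive_rate(tp=800, fn=40)
fpr = false_positive_rate(fp=60, tn=100)
tnr = true_negative_rate(tn=100, fp=60)
print(f"TPR={tpr:.3f} FPR={fpr:.3f} TNR={tnr:.3f}")
```

FPR and TNR share the same denominator (all actual negatives), so they always sum to 1.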


  • Precision

    Precision = TP / (TP + FP) = 800 / (800 + 60) ≈ 0.93, where TP + FP is the number of predicted positives (predicted yes).

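  • F Score: The F1-score is the harmonic mean of precision and recall, so it summarizes both metrics in a single number. For a fixed sum of precision and recall, it is highest when the two are equal. More generally, the F-score is a weighted harmonic mean of precision and recall (the true positive rate), with the weight set by a parameter b.

F1 = 2 * Precision * Recall / (Precision + Recall) (b = 1)

Fb = (1 + b^2) * Precision * Recall / (b^2 * Precision + Recall)

b = 1: precision and recall are equally important, so FP and FN both have an impact.

If false positives matter more (precision is more important), choose b between 0 and 1, for example b = 0.5.

If false negatives matter more (recall is more important), choose b greater than 1, for example b = 2.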
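The effect of b can be seen by evaluating the score for several values. This is a minimal Python sketch that assumes the standard weighted form Fb = (1 + b^2) * Precision * Recall / (b^2 * Precision + Recall); the function name `f_beta` is illustrative, and precision and recall are taken from the diabetes example.

```python
def f_beta(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall.

    beta < 1 weights precision more heavily (false positives cost more);
    beta > 1 weights recall more heavily (false negatives cost more).
    """
    if precision == 0 and recall == 0:
        return 0.0  # Convention of this sketch: avoid dividing by zero.
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Values from the diabetes example: TP = 800, FP = 60, FN = 40.
precision = 800 / (800 + 60)
recall = 800 / (800 + 40)

for beta in (0.5, 1.0, 2.0):
    print(f"b={beta}: F={f_beta(precision, recall, beta):.3f}")
```

Because recall is higher than precision in this example, the score rises as b increases and more weight shifts toward recall.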

  •  AUC-ROC

    The Receiver Operating Characteristic (ROC) curve is an evaluation tool for binary classification problems. It is a probability curve that plots the TPR against the FPR at various threshold values. The Area Under the Curve (AUC) measures the ability of a classifier to distinguish between classes and is used as a summary of the ROC curve.

    In a ROC curve, a higher X-axis value (FPR) means more false positives relative to true negatives, while a higher Y-axis value (TPR) means more true positives relative to false negatives. The choice of threshold therefore depends on how false positives are to be balanced against false negatives.

The higher the AUC, the better the performance of the model at distinguishing between the positive and negative classes.

When AUC = 1, the classifier perfectly separates all positive class points from all negative class points. If, however, the AUC is 0, the classifier predicts all negatives as positives and all positives as negatives.

When AUC = 0.5, the classifier does no better than random guessing. When 0.5 < AUC < 1, there is a high chance that the classifier will distinguish the positive class values from the negative class values, because it detects more true positives and true negatives than false negatives and false positives.
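The threshold sweep behind the ROC curve, and the area under it, can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not a reference implementation: the function names `roc_points` and `auc` are hypothetical, labels are encoded as 1 (positive) and 0 (negative), both classes are assumed to be present, and the labels and scores are made-up values.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) points obtained by lowering the threshold step by step.

    Assumes y_true contains at least one 1 and at least one 0.
    """
    positives = sum(y_true)
    negatives = len(y_true) - positives
    # Highest scores first: each step lowers the threshold past one sample.
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample sharing this score (ties).
        if i == len(ranked) - 1 or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical labels (1 = positive) and predicted probabilities.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(f"AUC={auc(roc_points(y_true, scores)):.3f}")
```

With these made-up scores, one negative sample outranks one positive sample, so the AUC falls between 0.5 and 1. Scores that rank every positive above every negative would give an AUC of 1, and the reverse ranking would give 0.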


