Role of Confusion Matrix in Cyber Security

Aug 25, 2021

Cyber Security

Cybersecurity is the practice of protecting systems, networks, and programs from digital attacks, such as stealing information or spying on another system. Cybersecurity experts work to protect users and to prevent these attacks. Digital attacks of this kind are also known as cybercrime.

Confusion Matrix

A confusion matrix is a technique for measuring the performance of a machine learning classification model. It is a table that shows how the model performs on a set of test data for which the true values are known. The idea itself is simple, but the related terminology can be a little confusing, so here is a short explanation of each term.

TP (True Positive) - You predicted positive, and it turned out to be true. For example, you predicted that France would win the World Cup, and it won.
TN (True Negative) - You predicted negative, and it turned out to be true. You predicted that England would not win, and it lost.
FP (False Positive) - You predicted positive, but it turned out to be false. You predicted that England would win, but it lost. This is also called a Type I error.
FN (False Negative) - You predicted negative, but it turned out to be false. You predicted that France would not win, but it won. This is also called a Type II error.
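To make the four terms concrete, here is a minimal Python sketch that counts them from two lists of labels. It assumes binary labels where 1 means positive (for example "attack") and 0 means negative; the function name confusion_counts and the sample data are hypothetical, made up for illustration.

```python
from typing import Dict, List

def confusion_counts(y_true: List[int], y_pred: List[int]) -> Dict[str, int]:
    """Count TP, TN, FP and FN for binary labels (1 = positive, 0 = negative)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for actual, predicted in zip(y_true, y_pred):
        if actual not in (0, 1) or predicted not in (0, 1):
            raise ValueError("labels must be 0 or 1")
        if predicted == 1 and actual == 1:
            counts["TP"] += 1  # predicted positive, and it was positive
        elif predicted == 0 and actual == 0:
            counts["TN"] += 1  # predicted negative, and it was negative
        elif predicted == 1 and actual == 0:
            counts["FP"] += 1  # predicted positive, but it was negative (Type I error)
        else:
            counts["FN"] += 1  # predicted negative, but it was positive (Type II error)
    return counts

# Hypothetical data: 1 = "attack", 0 = "no attack"
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
print(confusion_counts(y_true, y_pred))  # {'TP': 3, 'TN': 3, 'FP': 1, 'FN': 1}
```

Each pair of (actual, predicted) labels lands in exactly one of the four cells, so the four counts always add up to the size of the test set.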

After the confusion matrix is created and all of its component values are determined, calculating the accuracy is straightforward.

The accuracy score can be calculated from the confusion matrix:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

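In the formula above, the sum of TP (True Positive) and TN (True Negative) is the number of correct predictions. Dividing it by the total number of predictions (the sum of all four components) gives the accuracy as a fraction, and multiplying by 100 turns it into a percentage. However, accuracy has limitations and we cannot depend on it alone: when one class is much rarer than the other, as real attacks often are compared with normal activity, a model can reach a high accuracy while missing most of the rare class.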
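The accuracy formula and its main weakness can be checked with a few lines of Python. This is a sketch with made-up numbers: a hypothetical test set of 1,000 events of which only 10 are attacks, scored against a "model" that always predicts "no attack".

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Accuracy = (TP + TN) / (TP + TN + FP + FN)."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("the confusion matrix is empty")
    return (tp + tn) / total

# Hypothetical imbalanced test set: 1,000 events, only 10 of them attacks.
# A "model" that always answers "no attack" gets all 990 benign events right (TN)
# and misses all 10 attacks (FN).
acc = accuracy(tp=0, tn=990, fp=0, fn=10)
print(f"{acc:.1%}")  # 99.0%
```

The model scores 99% accuracy while detecting zero attacks, which is why accuracy should be read together with the individual cells of the matrix.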

Type II error {False Negative}

This type of error is the most dangerous. Treating "attack" as the positive class, a false negative means our system predicts that we are safe and that no attack is happening, when in reality a cyber attack is taking place. No notification reaches the security team, so the attack can proceed without anyone responding to it.

Type I error {False Positive}

This type of error is less dangerous: our system is actually safe, but the model predicts an attack. The team gets notified, checks for malicious activity, and finds none. The cost is the time spent investigating a false alarm.
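One way to reflect the difference between the two error types is to weight them. The sketch below compares two hypothetical models with the same accuracy but a different mix of missed attacks (false negatives) and false alarms (false positives). The costs COST_FP and COST_FN, with a missed attack assumed to cost 50 times a false alarm, are illustrative assumptions, not standard values.

```python
from typing import Dict

# Assumed, illustrative costs: real weights depend on the organization.
COST_FP = 1   # analyst time spent on a false alarm
COST_FN = 50  # damage from an attack that went unnoticed

def error_cost(cm: Dict[str, int]) -> int:
    """Total weighted cost of the model's errors."""
    return cm["FP"] * COST_FP + cm["FN"] * COST_FN

def miss_rate(cm: Dict[str, int]) -> float:
    """False negative rate: share of real attacks the model failed to flag."""
    return cm["FN"] / (cm["FN"] + cm["TP"])

# Hypothetical confusion matrices for two models on the same 1,000 events
model_a = {"TP": 90, "TN": 880, "FP": 20, "FN": 10}  # more false alarms, fewer misses
model_b = {"TP": 80, "TN": 890, "FP": 10, "FN": 20}  # fewer false alarms, more misses

for name, cm in [("A", model_a), ("B", model_b)]:
    acc = (cm["TP"] + cm["TN"]) / sum(cm.values())
    print(name, f"accuracy={acc:.2f}", f"miss_rate={miss_rate(cm):.2f}", f"cost={error_cost(cm)}")
# A accuracy=0.97 miss_rate=0.10 cost=520
# B accuracy=0.97 miss_rate=0.20 cost=1010
```

Both models reach 97% accuracy, but under these assumed costs model A is far cheaper because it misses half as many attacks.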

Need for Confusion Matrix in Machine learning

It evaluates the performance of classification models when they make predictions on test data, and tells us how good a model is.
It shows not only how many errors the classifier makes but also which type they are: Type I or Type II.
From the confusion matrix we can calculate further metrics for the model, such as accuracy, precision, recall, etc.
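As an illustration of those metrics, the sketch below derives accuracy, precision, recall (true positive rate), specificity, and F1 from the four cells. Returning 0.0 when a denominator is zero is a convention chosen here, not a universal rule, and the counts are hypothetical.

```python
from typing import Dict

def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Derive common metrics from the four confusion-matrix cells."""
    def safe_div(num: float, den: float) -> float:
        # Convention assumed here: a metric with a zero denominator is reported as 0.0
        return num / den if den else 0.0

    precision = safe_div(tp, tp + fp)    # of the events flagged as attacks, share that were attacks
    recall = safe_div(tp, tp + fn)       # of the real attacks, share that were flagged
    specificity = safe_div(tn, tn + fp)  # of the benign events, share correctly left alone
    f1 = safe_div(2 * precision * recall, precision + recall)  # harmonic mean of the two
    accuracy = safe_div(tp + tn, tp + tn + fp + fn)
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "specificity": specificity, "f1": f1}

# Hypothetical counts
metrics = classification_metrics(tp=90, tn=880, fp=20, fn=10)
print({name: round(value, 3) for name, value in metrics.items()})
```

In a security setting, recall answers "how many attacks did we catch?", while precision answers "how many of our alerts were real?". A low recall corresponds to many Type II errors, a low precision to many Type I errors.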

The confusion matrix is used to determine the performance of classification models on a given set of test data. It can only be built if the true values for the test data are known. The matrix itself is easy to understand and straightforward to compute when testing an ML model.

Cyber Attacks

A cyber attack is an attempt to disable computers, steal data, or use a breached computer system to launch additional attacks. Cybercriminals launch cyber attacks using different methods, including malware, phishing, ransomware, and man-in-the-middle attacks.

Cybercrimes and Confusion Matrix

Cyber attacks have become one of the biggest problems in the world. They cause serious financial damage to countries and individuals every day, and the rise in cyber attacks brings a rise in cybercrime along with it. The key factors in fighting cybercrime are identifying the perpetrators and understanding their methods of attack. Detecting and preventing cyber attacks is difficult, but researchers have recently been tackling these problems by building security models and making predictions with artificial intelligence methods. Many crime prediction methods are available in the literature, but they are weak at predicting cybercrime and cyber attack methods. This problem can be tackled by identifying an attack and its perpetrator from actual data, such as the type of crime, the gender of the perpetrator, the damage caused, and the method of attack. Such data can be obtained from the reports that victims of cyber attacks file with forensic units.

One study took this approach: it analyzed cybercrimes with machine learning methods in two different models and predicted how the defined features affect the detection of the attack method and of the perpetrator. Eight machine learning methods were used, and their accuracy ratios turned out to be close to one another.

In the first model, which predicted the types of attacks that victims were likely to be exposed to, Linear Support Vector Machine was the most successful method, with an accuracy of 95.02%. In the second model, which predicted whether perpetrators could be identified by comparing their characteristics, Logistic Regression led with an accuracy of 65.42%. The results also indicated that the probability of cyber attacks decreases as the education and income levels of victims increase.

The authors expect that cybercrime units will use the proposed model, and that it will make detecting and fighting cyber attacks easier and more effective.
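Predicting the attack method is a multi-class problem, and an accuracy figure like the ones above is the share of test cases that fall on the diagonal of a multi-class confusion matrix. The sketch below builds such a matrix and collapses it into TP/FP/FN/TN for one class. The attack labels are hypothetical and this is not the study's data or method; multiclass_confusion and one_vs_rest are illustrative names.

```python
from typing import Dict, List, Tuple

def multiclass_confusion(y_true: List[str], y_pred: List[str]) -> Tuple[List[str], List[List[int]]]:
    """Build a confusion matrix: rows are true classes, columns are predicted classes."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred))
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for actual, predicted in zip(y_true, y_pred):
        matrix[index[actual]][index[predicted]] += 1
    return labels, matrix

def one_vs_rest(labels: List[str], matrix: List[List[int]], label: str) -> Dict[str, int]:
    """Collapse the multi-class matrix into TP/FP/FN/TN for a single class."""
    k = labels.index(label)
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]
    fn = sum(matrix[k]) - tp                 # true class k, predicted as something else
    fp = sum(row[k] for row in matrix) - tp  # predicted as k, true class is something else
    tn = total - tp - fn - fp
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}

# Hypothetical attack-method labels, not data from the study
y_true = ["phishing", "malware", "phishing", "ransomware", "malware", "phishing"]
y_pred = ["phishing", "phishing", "phishing", "ransomware", "malware", "malware"]

labels, matrix = multiclass_confusion(y_true, y_pred)
accuracy = sum(matrix[i][i] for i in range(len(labels))) / len(y_true)
print(labels)              # ['malware', 'phishing', 'ransomware']
print(matrix)              # [[1, 1, 0], [1, 2, 0], [0, 0, 1]]
print(round(accuracy, 3))  # 0.667
print(one_vs_rest(labels, matrix, "phishing"))  # {'TP': 2, 'FP': 1, 'FN': 1, 'TN': 2}
```

Looking at the off-diagonal cells shows which attack methods the model confuses with each other, something a single accuracy number hides.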

Thank You..!!