Confusion Matrix Role in Cyber Crime Detection

Spam mail detection with the help of confusion matrix.

What is Cyber Crime?

Cybercrime is defined as a crime where a computer is the object of the crime or is used as a tool to commit an offense. A cybercriminal may use a device to access a user’s personal information, confidential business information, government information, or disable a device. It is also a cybercrime to sell or elicit the above information online.

Some types of cybercrimes

  • Email and internet fraud.
  • Identity fraud.
  • Theft of financial or card payment data.
  • Theft and sale of corporate data.
  • Cyberextortion (demanding money to prevent a threatened attack).
  • Ransomware attacks (a type of cyber extortion).

Email Classification (binomial) :

Our problem is to classify incoming emails into two categories: useful and spam. For this we use the Spambase Data Set, in which each email is described by 57 independent features. Using these features, we have to classify each email into one of two outcome categories: ‘spam’ and ‘normal’. To judge how well a classifier does this, we want to answer questions such as:

  • How many of the actual spam emails were predicted as spam?
  • How many as normal?
  • Were some normal emails predicted as spam?
  • How many normal emails were predicted correctly?

Confusion Matrix :

The confusion matrix was initially introduced to evaluate results from binomial classification. Thus, the first thing to do is to pick one of the two values in the target column as the class of interest, i.e. the positive class. The other value is then automatically considered the negative class. This assignment is arbitrary, but keep in mind that some class statistics will show different values depending on which class is selected as positive.

So in our problem we chose the spam emails as the positive class and the normal emails as the negative class.

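With this choice, the confusion matrix reports four counts:

  • True positives (TP): spam emails correctly predicted as spam.
  • False negatives (FN): spam emails incorrectly predicted as normal.
  • False positives (FP): normal emails incorrectly predicted as spam.
  • True negatives (TN): normal emails correctly predicted as normal.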

Binomial email classification matrix.
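As a minimal sketch of how the four counts of the binomial confusion matrix can be tallied, the Python snippet below compares actual and predicted labels with spam as the positive class. The function name `confusion_counts` and the toy labels are assumptions made for illustration; they are not taken from the Spambase Data Set.

```python
from collections import Counter

def confusion_counts(actual, predicted, positive="spam"):
    """Return (TP, FN, FP, TN) for a two-class problem, given the positive class."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if a == positive:
            # Actual positive: either found (TP) or missed (FN).
            counts["TP" if p == positive else "FN"] += 1
        else:
            # Actual negative: either wrongly flagged (FP) or correctly passed (TN).
            counts["FP" if p == positive else "TN"] += 1
    return counts["TP"], counts["FN"], counts["FP"], counts["TN"]

# Toy labels, made up for illustration only.
actual    = ["spam", "spam", "spam", "normal", "normal", "normal", "normal", "spam"]
predicted = ["spam", "normal", "spam", "normal", "spam", "normal", "normal", "spam"]

tp, fn, fp, tn = confusion_counts(actual, predicted)
print(f"TP={tp} FN={fn} FP={fp} TN={tn}")  # prints: TP=3 FN=1 FP=1 TN=3
```

Swapping the positive class (passing `positive="normal"`) exchanges TP with TN and FN with FP, which is why some class statistics depend on that choice.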

Measures to calculate model performance :

From the four counts in the confusion matrix, we can calculate a few class statistics measures to quantify the model performance. The class statistics summarize the model performance for the positive and negative classes separately. The most commonly used are sensitivity and specificity, recall and precision, and the F-measure.
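The standard definitions of these measures can be sketched in a few lines of Python. The function name `class_statistics`, the choice to return 0.0 when a denominator is zero, and the example counts are assumptions for illustration, not results from the article's model.

```python
def class_statistics(tp, fn, fp, tn):
    """Compute common class statistics from the four confusion-matrix counts."""
    def ratio(num, den):
        # Convention chosen for this sketch: return 0.0 when the denominator is zero.
        return num / den if den else 0.0

    recall = ratio(tp, tp + fn)        # also called sensitivity
    specificity = ratio(tn, tn + fp)   # share of actual negatives kept as negative
    precision = ratio(tp, tp + fp)     # share of predicted positives that are correct
    f_measure = ratio(2 * precision * recall, precision + recall)
    accuracy = ratio(tp + tn, tp + fn + fp + tn)
    return {
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }

# Hypothetical counts, chosen only to show the calculation.
stats = class_statistics(tp=90, fn=10, fp=5, tn=95)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

Recall and sensitivity are the same quantity under two names, so only one of them needs to be computed.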

Multinomial Email Classification model :

We can also use the confusion matrix for a multinomial classification model. Suppose we have to classify emails into three categories: ‘normal’, ‘advertisement’ and ‘spam’. As in binomial classification, the target class values are assigned to a positive and a negative class. Here we define spam as the positive class and the normal and ad emails together as the negative class.

Multinomial email classification confusion matrix.
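One way to picture this reduction is the sketch below: it first builds the full three-class matrix and then collapses it into the four counts for spam as the positive class. The helper names `confusion_matrix` and `one_vs_rest_counts`, the short label "ad", and the toy labels are all assumptions for illustration.

```python
from collections import defaultdict

def confusion_matrix(actual, predicted):
    """Nested dict: matrix[actual_label][predicted_label] -> count."""
    matrix = defaultdict(lambda: defaultdict(int))
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix

def one_vs_rest_counts(matrix, positive):
    """Collapse a multiclass matrix into (TP, FN, FP, TN) for one positive class."""
    tp = fn = fp = tn = 0
    for a, row in matrix.items():
        for p, n in row.items():
            if a == positive and p == positive:
                tp += n
            elif a == positive:
                fn += n   # positive email predicted as any other class
            elif p == positive:
                fp += n   # any other class predicted as positive
            else:
                tn += n   # both labels are in the negative group
    return tp, fn, fp, tn

# Toy labels, made up for illustration only.
actual    = ["spam", "spam", "ad", "ad", "normal", "normal", "normal", "spam", "ad", "normal"]
predicted = ["spam", "ad", "ad", "spam", "normal", "normal", "ad", "spam", "normal", "normal"]

matrix = confusion_matrix(actual, predicted)
print(one_vs_rest_counts(matrix, positive="spam"))  # prints: (2, 1, 1, 6)
```

Note that a normal email predicted as an ad counts as a true negative here, because both labels belong to the negative group once spam is the positive class.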

The four counts obtained this way are then used to calculate the model performance with the statistics measures discussed above.

Conclusion :

  • In this article, we have seen what cybercrime is and some of its types.
  • We took one cybercrime problem, spam email, and framed it as a classification model for incoming emails whose spam detection is evaluated with a confusion matrix.
  • The confusion matrix shows the performance of a classification model: how many positive and negative events are predicted correctly or incorrectly. These counts are the basis for the calculation of more general class statistics metrics. The most commonly used are sensitivity and specificity, recall and precision, and the F-measure.
  • Confusion matrix and class statistics have been defined for binomial classification problems. However, we have shown how they can be easily extended to address multinomial classification problems.

Thank you very much for reading the article!!!
