Cyberattacks Detection in IoT-based Smart City Network Traffic

Last Updated on November 24, 2021 by Editorial Team

Author(s): Abhinav Dubey

Originally published on Towards AI.


This article applies several machine learning and deep learning models to network intrusion detection: classifying cyberattacks such as DoS, Worms, and Backdoor, and separating them from normal network traffic. The UNSW-NB15 dataset is used to train the models. The complete code, trained models, plots, datasets, and preprocessed files are available on my GitHub account.

Made using Draw.io

The idea behind the Internet of Things (IoT) is to extend Internet connectivity beyond computers and smartphones to electronic and mechanical devices, sensors, and more. As the number of IoT use cases has grown, so has the number of security vulnerabilities.

Today, IoT devices are used in fire systems, drones, smart homes, and healthcare, to name a few. You can imagine what a disaster it would be if someone with bad intent gained access to these systems. That is why a Network Intrusion Detection System (NIDS) is installed: it analyzes all traffic, detects malicious traffic, and helps the organization monitor its cloud, on-premise, or hybrid infrastructure.

Dataset

The Pcap files (raw network packets) were created at the Cyber Range Lab of the Australian Centre for Cyber Security (ACCS) using the IXIA PerfectStorm tool.

The dataset is officially available at the University of New South Wales website https://research.unsw.edu.au/projects/unsw-nb15-dataset

  • UNSW_NB15.csv: Original dataset
  • UNSW_NB15_features.csv: Descriptions of the 49 features, including the class label
  • bin_data.csv: Processed CSV dataset file for binary classification
  • multi_data.csv: Processed CSV dataset file for multi-class classification

Machine Learning Models Used

Data Preprocessing

  • The dataset had 45 attributes and 175341 rows.
  • After dropping null values, the dataset had 45 attributes and 81173 rows.
  • The data types of the attributes were converted using the data type information provided in the features dataset.
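The cleaning steps above can be sketched roughly as follows. This is a minimal illustration on a made-up three-row frame, not the author's code: the `dtype_info` mapping and the `clean` helper are hypothetical stand-ins for whatever is derived from the features file.

```python
import pandas as pd

# Toy frame standing in for the raw dataset (values are made up).
data = pd.DataFrame({
    "dur": ["0.12", "0.50", "1.25"],
    "service": ["http", None, "dns"],
    "sbytes": ["496", "1762", "364"],
})

# Hypothetical column-to-type mapping, as could be built from the features file.
dtype_info = {"dur": "float", "service": "nominal", "sbytes": "integer"}


def clean(df, dtype_info):
    """Drop rows with null values, then cast each column to its declared type."""
    df = df.dropna().copy()
    for col, kind in dtype_info.items():
        if kind == "integer":
            df[col] = pd.to_numeric(df[col]).astype("int64")
        elif kind == "float":
            df[col] = pd.to_numeric(df[col]).astype("float64")
        else:
            # Nominal columns stay as strings.
            df[col] = df[col].astype(str)
    return df


cleaned = clean(data, dtype_info)
print(cleaned.dtypes)
```

Dropping nulls before casting avoids conversion errors on missing values.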

One-hot Encoding

  • The categorical columns ‘proto’, ‘service’, and ‘state’ are one-hot encoded using pd.get_dummies(), and these 3 attributes are removed afterward.
  • The data_cat DataFrame had 19 attributes after one-hot encoding.
  • data_cat is concatenated with the main data DataFrame.
  • Total attributes of the data DataFrame: 61
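A minimal sketch of this encoding step, assuming a toy frame with made-up values. The names `data` and `data_cat` follow the text, while `cat_cols` is introduced here for illustration.

```python
import pandas as pd

# Toy frame with the three categorical columns named in the text.
data = pd.DataFrame({
    "dur": [0.12, 0.50, 1.25],
    "proto": ["tcp", "udp", "tcp"],
    "service": ["http", "dns", "ftp"],
    "state": ["FIN", "CON", "INT"],
})

cat_cols = ["proto", "service", "state"]

# One indicator column per category value, named like proto_tcp or state_CON.
data_cat = pd.get_dummies(data[cat_cols], columns=cat_cols)

# Remove the original categorical columns and append the encoded ones.
data = pd.concat([data.drop(columns=cat_cols), data_cat], axis=1)
print(data.columns.tolist())
```

Column names such as `proto_tcp` and `state_CON` in the feature lists later in the article match this default naming.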

Data Normalization

  • The 58 numeric columns of the DataFrame are scaled to the range 0 to 1 using MinMaxScaler.
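As a rough illustration of the scaling step, the sketch below applies scikit-learn's `MinMaxScaler` to two made-up numeric columns. The column list `num_cols` is an assumption for the example; the article's 58 columns are not listed.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Toy frame: two numeric columns plus one indicator column that is left alone.
data = pd.DataFrame({
    "dur": [0.0, 2.0, 4.0],
    "sbytes": [100.0, 300.0, 200.0],
    "proto_tcp": [1, 0, 1],
})

num_cols = ["dur", "sbytes"]

# Each column is mapped to (x - min) / (max - min), so its values fall in [0, 1].
scaler = MinMaxScaler(feature_range=(0, 1))
data[num_cols] = scaler.fit_transform(data[num_cols])
print(data)
```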

Preparing for Binary Classification

  • A copy of the DataFrame is created for binary classification.
  • The ‘label’ attribute is classified into two categories, ‘normal’ and ‘abnormal’.
  • ‘label’ is encoded using LabelEncoder(), and the corresponding encoded labels (0, 1) are saved in the ‘label’ column itself.
  • Binary dataset: 81173 rows, 61 columns
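The binary labelling can be sketched as below. It assumes the raw ‘label’ column holds 0 for normal and 1 for attack records; that mapping is an assumption for this example and is not stated in the text.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy copy of the data for binary classification (values are made up).
bin_data = pd.DataFrame({"rate": [0.1, 0.9, 0.4], "label": [0, 1, 1]})

# Assumed mapping of the raw flag to the two category names.
bin_data["label"] = bin_data["label"].map({0: "normal", 1: "abnormal"})

# LabelEncoder assigns integers in sorted class order.
encoder = LabelEncoder()
bin_data["label"] = encoder.fit_transform(bin_data["label"])

print(list(encoder.classes_))      # class at index i is encoded as i
print(bin_data["label"].tolist())
```

Because `LabelEncoder` sorts class names alphabetically, ‘abnormal’ is encoded as 0 and ‘normal’ as 1 in this sketch, which matters when reading a confusion matrix.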

Preparing for Multi-class Classification

  • A copy of the DataFrame is created for multi-class classification.
  • The ‘attack_cat’ attribute is classified into 9 categories: ‘Analysis’, ‘Backdoor’, ‘DoS’, ‘Exploits’, ‘Fuzzers’, ‘Generic’, ‘Normal’, ‘Reconnaissance’, ‘Worms’.
  • attack_cat is encoded using LabelEncoder(), and the corresponding encoded labels (0, 1, 2, 3, 4, 5, 6, 7, 8) are saved in the label attribute.
  • attack_cat is one-hot encoded.
  • Multi-class dataset: 81173 rows, 69 columns
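A minimal sketch of the multi-class preparation, using only three of the nine categories on made-up rows. The order of operations (integer labels first, then one-hot columns) follows the bullets above.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy copy of the data for multi-class classification (values are made up).
multi_data = pd.DataFrame({
    "dur": [0.1, 0.4, 0.7, 0.2],
    "attack_cat": ["Normal", "DoS", "Worms", "Normal"],
})

# Integer class ids go into the 'label' column.
encoder = LabelEncoder()
multi_data["label"] = encoder.fit_transform(multi_data["attack_cat"])

# attack_cat itself is replaced by one indicator column per category.
multi_data = pd.get_dummies(multi_data, columns=["attack_cat"])
print(multi_data.columns.tolist())
```

With all nine categories, replacing the single attack_cat column with nine indicator columns takes the frame from 61 to 69 columns, which matches the counts reported above.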

Feature Selection

  • No. of attributes of ‘bin_data’: 61
  • No. of attributes of ‘multi_data’: 69
  • The Pearson correlation coefficient is used for feature selection.
  • Attributes with a correlation coefficient greater than 0.3 with the target attribute label were selected.
  • No. of attributes of ‘bin_data’ after feature selection: 15
  • ‘rate’, ‘sttl’, ‘sload’, ‘dload’, ‘ct_srv_src’, ‘ct_state_ttl’, ‘ct_dst_ltm’, ‘ct_src_dport_ltm’, ‘ct_dst_sport_ltm’, ‘ct_dst_src_ltm’, ‘ct_src_ltm’, ‘ct_srv_dst’, ‘state_CON’, ‘state_INT’, ‘label’.
  • No. of attributes of ‘multi_data’ after feature selection: 16
  • ‘dttl’, ‘swin’, ‘dwin’, ‘tcprtt’, ‘synack’, ‘ackdat’, ‘label’, ‘proto_tcp’, ‘proto_udp’, ‘service_dns’, ‘state_CON’, ‘state_FIN’, ‘attack_cat_Analysis’, ‘attack_cat_DoS’, ‘attack_cat_Exploits’, ‘attack_cat_Normal’.
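The selection rule can be sketched as a filter on the Pearson correlation of each column with the target. The helper name `select_by_correlation` is hypothetical, and the sketch assumes the threshold is applied to the absolute value of the coefficient, which the text does not state explicitly.

```python
import pandas as pd


def select_by_correlation(df, target="label", threshold=0.3):
    """Return the columns whose |Pearson r| with the target exceeds the threshold."""
    # DataFrame.corr() computes pairwise Pearson correlation by default.
    corr = df.corr()[target].abs()
    return corr[corr > threshold].index.tolist()


# Toy data: 'sttl' tracks the label closely, 'noise' is uncorrelated with it.
df = pd.DataFrame({
    "sttl": [10, 12, 11, 200, 210, 205],
    "noise": [1, 2, 3, 1, 2, 3],
    "label": [0, 0, 0, 1, 1, 1],
})

selected = select_by_correlation(df)
print(selected)
```

The target always passes the filter because its correlation with itself is 1, which is consistent with ‘label’ appearing in both selected feature lists above.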

Splitting Dataset into Training and Testing

  • bin_data is randomly split into 80% for training and 20% for testing.
  • multi_data is randomly split into 70% for training and 30% for testing.
  • Target feature: label
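A minimal sketch of the two splits with scikit-learn's `train_test_split`. The `split` helper and the seed value are assumptions for the example; the text does not state which seed, if any, was used for splitting.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def split(df, test_size, target="label", seed=123):
    """Randomly split a frame into train/test features and targets."""
    X = df.drop(columns=[target])
    y = df[target]
    return train_test_split(X, y, test_size=test_size, random_state=seed)


# Toy frames standing in for bin_data and multi_data (values are made up).
bin_data = pd.DataFrame({"sttl": range(10), "label": [0, 1] * 5})
multi_data = pd.DataFrame({"dttl": range(10), "label": [0, 1, 2, 0, 1, 2, 0, 1, 2, 0]})

X_train_b, X_test_b, y_train_b, y_test_b = split(bin_data, test_size=0.20)
X_train_m, X_test_m, y_train_m, y_test_m = split(multi_data, test_size=0.30)

print(len(X_train_b), len(X_test_b))
print(len(X_train_m), len(X_test_m))
```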

Decision Tree Classifier

Binary Classification

  • Accuracy: 98.09054511857099
  • Mean Absolute Error: 0.019094548814290114
  • Mean Squared Error: 0.019094548814290114
  • Root Mean Squared Error: 0.13818302650575473
  • R2 Score: 89.55757103838098
DecisionTreeClassifier(ccp_alpha=0.0, class_weight=None, criterion='gini', max_depth=None, max_features=None, max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, min_samples_leaf=1, min_samples_split=2, min_weight_fraction_leaf=0.0, presort='deprecated', random_state=123, splitter='best')
Binary Classification with Decision Tree Classifier
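The five figures reported for each model can be computed with scikit-learn's metric functions, as in the sketch below. It assumes, from the size of the reported values, that accuracy and R2 are multiplied by 100. The `report` helper and the toy predictions are hypothetical.

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    mean_absolute_error,
    mean_squared_error,
    r2_score,
)


def report(y_true, y_pred):
    """Compute the five metrics listed for each model."""
    mse = mean_squared_error(y_true, y_pred)
    return {
        "accuracy": accuracy_score(y_true, y_pred) * 100,  # assumed to be a percentage
        "mae": mean_absolute_error(y_true, y_pred),
        "mse": mse,
        "rmse": float(np.sqrt(mse)),
        "r2": r2_score(y_true, y_pred) * 100,  # assumed to be scaled by 100
    }


# Toy 0/1 labels with one mistake out of ten.
y_test = np.array([0, 1, 1, 0, 1, 0, 1, 1, 0, 1])
y_pred = np.array([0, 1, 1, 0, 1, 0, 1, 0, 0, 1])

metrics = report(y_test, y_pred)
print(metrics)
```

With 0/1 labels every error has magnitude 1, so MAE and MSE both equal the misclassification rate (1 minus accuracy). That is why the two values are identical in every binary table in this article, and why they differ in the multi-class tables, where errors are differences between integer class ids.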

Multi-class Classification

  • Accuracy: 97.19940867279895
  • Mean Absolute Error: 0.06800262812089355
  • Mean Squared Error: 0.20532194480946123
  • Root Mean Squared Error: 0.4531246459965086
  • R2 Score: 86.17743099336013
DecisionTreeClassifier(ccp_alpha=0.0, class_weight=None, criterion='gini', max_depth=None, max_features=None, max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, min_samples_leaf=1, min_samples_split=2, min_weight_fraction_leaf=0.0, presort='deprecated', random_state=123, splitter='best')
Multi-class Classification with Decision Tree Classifier

K-Nearest Neighbor Classifier

Binary Classification

  • Accuracy: 98.3061287342162
  • Mean Absolute Error: 0.016938712657838004
  • Mean Squared Error: 0.016938712657838004
  • Root Mean Squared Error: 0.13014880966738807
  • R2 Score: 90.74435871039374
KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski', metric_params=None, n_jobs=None, n_neighbors=5, p=2, weights='uniform')
Binary Classification with KNN Classifier

Multi-class Classification

  • Accuracy: 97.36777266754271
  • Mean Absolute Error: 0.06508705650459921
  • Mean Squared Error: 0.19411136662286466
  • Root Mean Squared Error: 0.44058071521897624
  • R2 Score: 86.92848100772136
KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski', metric_params=None, n_jobs=None, n_neighbors=5, p=2, weights='uniform')
Multi-class Classification with KNN Classifier

Linear Regression Model

Binary Classification

  • Accuracy: 97.80720665229443
  • Mean Absolute Error: 0.021927933477055742
  • Mean Squared Error: 0.021927933477055742
  • Root Mean Squared Error: 0.1480808342664767
  • R2 Score: 88.20923868071647
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)
Binary Classification with Linear Regression Model
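Linear regression outputs continuous values, so some conversion to class labels is needed before an accuracy can be reported. The text does not say how this was done. The sketch below shows one common approach, thresholding at 0.5, purely as an assumption and not as the author's method.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy one-feature training set with 0/1 targets (values are made up).
X_train = np.array([[0.0], [0.1], [0.2], [0.8], [0.9], [1.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])

model = LinearRegression().fit(X_train, y_train)

# Continuous scores, which may fall outside [0, 1].
scores = model.predict(np.array([[0.05], [0.95]]))

# Assumed conversion: scores of 0.5 or more are mapped to class 1, the rest to class 0.
y_pred = (scores >= 0.5).astype(int)
print(scores, y_pred)
```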

Multi-class Classification

  • Accuracy: 95.12976346911958
  • Mean Absolute Error: 0.06824901445466491
  • Mean Squared Error: 0.12146846254927726
  • Root Mean Squared Error: 0.3485232596962178
  • R2 Score: 91.82055676180129
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)
Multi-class Classification with Linear Regression Model

Linear Support Vector Machine

Binary Classification

  • Accuracy: 97.85032337542347
  • Mean Absolute Error: 0.021496766245765322
  • Mean Squared Error: 0.021496766245765322
  • Root Mean Squared Error: 0.1466177555610688
  • R2 Score: 88.45167193436498
SVC(C=1.0, break_ties=False, cache_size=200, class_weight=None, coef0=0.0, decision_function_shape='ovr', degree=3, gamma='auto', kernel='linear', max_iter=-1, probability=False, random_state=None, shrinking=True, tol=0.001, verbose=False)
Binary Classification with Linear Support Vector Machine

Multi-class Classification

  • Accuracy: 97.59362680683311
  • Mean Absolute Error: 0.059912943495400786
  • Mean Squared Error: 0.17941031537450722
  • Root Mean Squared Error: 0.42356854861345317
  • R2 Score: 87.93449282205455
SVC(C=1.0, break_ties=False, cache_size=200, class_weight=None, coef0=0.0, decision_function_shape='ovr', degree=3, gamma='auto', kernel='linear', max_iter=-1, probability=False, random_state=None, shrinking=True, tol=0.001, verbose=False)
Multi-class Classification with Linear Support Vector Machine

Logistic Regression Model

Binary Classification

  • Accuracy: 97.80104712041884
  • Mean Absolute Error: 0.02198952879581152
  • Mean Squared Error: 0.02198952879581152
  • Root Mean Squared Error: 0.1482886671186019
  • R2 Score: 88.17947258428785
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True, intercept_scaling=1, l1_ratio=None, max_iter=5000, multi_class='auto', n_jobs=None, penalty='l2', random_state=123, solver='lbfgs', tol=0.0001, verbose=0, warm_start=False)
Binary Classification with Logistic Regression Model

Multi-class Classification

  • Accuracy: 97.58952036793693
  • Mean Absolute Error: 0.060077201051248356
  • Mean Squared Error: 0.18056011826544022
  • Root Mean Squared Error: 0.42492366169165047
  • R2 Score: 87.87674567880146
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True, intercept_scaling=1, l1_ratio=None, max_iter=5000, multi_class='multinomial', n_jobs=None, penalty='l2', random_state=123, solver='newton-cg', tol=0.0001, verbose=0, warm_start=False)
Multi-class Classification with Logistic Regression Model

Multi-Layer Perceptron Classifier

Binary Classification

  • Accuracy: 98.36772405297197
  • Mean Absolute Error: 0.01632275947028026
  • Mean Squared Error: 0.01632275947028026
  • Root Mean Squared Error: 0.12776055522061674
  • R2 Score: 91.10646238100463
MLPClassifier(activation='relu', alpha=0.0001, batch_size='auto', beta_1=0.9, beta_2=0.999, early_stopping=False, epsilon=1e-08, hidden_layer_sizes=(100,), learning_rate='constant', learning_rate_init=0.001, max_fun=15000, max_iter=8000, momentum=0.9, n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5, random_state=123, shuffle=True, solver='adam', tol=0.0001, validation_fraction=0.1, verbose=False, warm_start=False)
Binary Classification with Multi-Layer Perceptron Classifier

Multi-class Classification

  • Accuracy: 97.54434954007884
  • Mean Absolute Error: 0.06065210249671485
  • Mean Squared Error: 0.17858902759526937
  • Root Mean Squared Error: 0.4225979502970517
  • R2 Score: 87.97913543550516
MLPClassifier(activation='relu', alpha=0.0001, batch_size='auto', beta_1=0.9, beta_2=0.999, early_stopping=False, epsilon=1e-08, hidden_layer_sizes=(100,), learning_rate='constant', learning_rate_init=0.001, max_fun=15000, max_iter=8000, momentum=0.9, n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5, random_state=123, shuffle=True, solver='adam', tol=0.0001, validation_fraction=0.1, verbose=False, warm_start=False)
Multi-class Classification with Multi-Layer Perceptron Classifier

Random Forest Classifier

Binary Classification

  • Accuracy: 98.64490298737296
  • Mean Absolute Error: 0.013550970126270403
  • Mean Squared Error: 0.013550970126270403
  • Root Mean Squared Error: 0.1164086342427846
  • R2 Score: 92.59509512345335
RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None, criterion='gini', max_depth=None, max_features='auto', max_leaf_nodes=None, max_samples=None, min_impurity_decrease=0.0, min_impurity_split=None, min_samples_leaf=1, min_samples_split=2, min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=None, oob_score=False, random_state=123, verbose=0, warm_start=False)
Binary Classification with Random Forest Classifier

Multi-class Classification

  • Accuracy: 97.31849540078844
  • Mean Absolute Error: 0.06611366622864652
  • Mean Squared Error: 0.1985052562417871
  • Root Mean Squared Error: 0.4455392869790352
  • R2 Score: 86.6379909424011
RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None, criterion='gini', max_depth=None, max_features='auto', max_leaf_nodes=None, max_samples=None, min_impurity_decrease=0.0, min_impurity_split=None, min_samples_leaf=1, min_samples_split=2, min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=None, oob_score=False, random_state=50, verbose=0, warm_start=False)
Multi-class Classification with Random Forest Classifier
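All the classifiers above share the same fit and predict workflow, so they can be compared in a single loop. The sketch below is illustrative only. It uses the non-default hyperparameters visible in the printed estimators, but runs on synthetic data from `make_classification` as a stand-in for the preprocessed dataset, so its accuracies say nothing about UNSW-NB15. Linear regression is left out because it is not a classifier.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption for this sketch).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=123
)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=123),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Linear SVM": SVC(kernel="linear", gamma="auto"),
    "Logistic Regression": LogisticRegression(max_iter=5000, random_state=123),
    "MLP": MLPClassifier(hidden_layer_sizes=(100,), max_iter=8000, random_state=123),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=123),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # Accuracy as a percentage, matching the scale used in the tables above.
    results[name] = accuracy_score(y_test, model.predict(X_test)) * 100

for name, acc in results.items():
    print(f"{name}: {acc:.2f}")
```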

Get the complete code, models, and plots on my GitHub account

GitHub – abhinav-bhardwaj/IoT-Network-Intrusion-Detection-System-UNSW-NB15: Network Intrusion Detection based on various machine learning and deep learning algorithms using UNSW-NB15 Dataset

Citations

  • N. Moustafa and J. Slay, “UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set),” 2015 Military Communications and Information Systems Conference (MilCIS), 2015, pp. 1–6, DOI: 10.1109/MilCIS.2015.7348942
  • N. Moustafa and J. Slay, “The evaluation of Network Anomaly Detection Systems: Statistical analysis of the UNSW-NB15 data set and the comparison with the KDD99 data set,” Information Security Journal: A Global Perspective, 25:1–3, 18–31, 2016, DOI: 10.1080/19393555.2015.1125974


