
Training a Deep Learning Model to Detect DoS Attacks on Microcontrollers | by Déborah Mesquita | Apr, 2023



Thanks Hamed for the pic!

The ESP32 is an MCU (microcontroller unit) widely used in IoT, home automation, and robotics projects due to its low cost and to ESP-IDF (Espressif IoT Development Framework), its development framework. The board has a dual-core 32-bit processor and single-chip Wi-Fi and Bluetooth connectivity.

IoT applications can solve problems in every industry, including agriculture, home controllers and smart cities. The sad thing is that I still don’t see many IoT projects in our daily lives, at least here in Brazil. When I think of IoT projects, my first thought is that they’re something very technological that we couldn’t implement without lots of money. Fortunately, that’s no longer true now that we have boards like the ESP32: with about $3 we can conceive and experiment with IoT project ideas.

After hearing about the ESP32, my goal was to find a way to insert myself into this IoT world. LACNIC is an international organization responsible for assigning and managing Internet number resources and for contributing to the regional development of the Internet in Latin America and the Caribbean. One of the organization’s research areas is Cryptography, Security and Resilience. I was missing the feeling of being part of a research project, so I submitted a proposal for a technical paper focused on IoT and network security as part of their IT Women Mentoring Program.

The main reference for my project was T800: Firewall tool and benchmark for IoT [1], a project from Brazilian researchers that implements a packet filter for the ESP32 to block scanning attacks. They made all their code available on GitHub. Since this was my first time working with the ESP32 and the lwIP stack (more on that later), I would have been completely lost without this reference.

Cool, but what is my project anyway? Security should be an essential part of the development of IoT systems, and these devices are easy targets for cyberattacks, mainly because of their poor update and maintenance cycles [1]. My work focuses on detecting volumetric attacks, more specifically DoS (Denial of Service) attacks. In volumetric attacks, the attacker’s goal is to send many network packets to a device in a short period of time, aiming to disrupt the operation of the system.

The main task was then to train a machine learning model to detect DoS attacks and deploy it on the ESP32. Most models trained to detect DoS and DDoS attacks use volumetric and statistical features, like “average duration of aggregated records” and “source-to-destination packet count”. My quest was to figure out whether we could train a model on raw packet features, like tcp.window_size and udp.length. It was quite a journey, and today we’ll talk about the main roadblocks and how ChatGPT came to my rescue in some of them.

The dataset I’m using is the CIC IoT Dataset 2022, created by the Canadian Institute for Cybersecurity (CIC) with the goal of generating a state-of-the-art dataset for profiling, behavioral analysis and vulnerability testing of different IoT devices [2]. They used Wireshark as the network protocol analyzer, capturing and saving the network packets in pcap files.

For the modeling part my main reference was DeepDefense: Identifying DDoS Attack via Deep Learning [3]. They designed a recurrent deep neural network to learn patterns from sequences of network traffic by using only 20 network traffic fields.

To communicate with other network devices the ESP32 needs support for the TCP/IP protocols. For that it uses lwIP (lightweight IP), an open-source stack of TCP/IP protocols designed to work in embedded systems with low memory and low computational power. My first task was to figure out which features are available in the lwIP implementation of ESP32 and find how to extract them from the pcaps.

In the beginning I was reading the Wireshark docs and the esp-lwip code to try to find the corresponding features, but then I thought “what if I ask ChatGPT?” It turns out the LLM was super useful for that:

Using chatGPT to get “ip.ttl” inside the ESP32 code

Of course I had to do some tweaks, and the code from [1] was my main guide, but I saved a lot of time by using the model to help me find the lwIP variables corresponding to the features I wanted.
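On the dataset side, one practical way to get these fields out of the pcaps is tshark, the command-line version of Wireshark, which can export per-packet fields to CSV. Here’s a minimal sketch of how that export could be scripted (the field list matches the model features shown later in the post; the file names are just placeholders):

import subprocess

# Wireshark field names to export (the same names are used later as model features)
FIELDS = [
    "_ws.col.Time", "ip.hdr_len", "ip.flags.df", "ip.frag_offset", "ip.proto",
    "ip.ttl", "tcp.window_size", "tcp.flags.syn", "tcp.flags.urg", "tcp.hdr_len",
]

def pcap_to_csv(pcap_path: str, csv_path: str) -> None:
    """Export the selected fields of every packet in a pcap to a CSV file."""
    cmd = ["tshark", "-r", pcap_path, "-T", "fields",
           "-E", "header=y", "-E", "separator=,", "-E", "occurrence=f"]
    for field in FIELDS:
        cmd += ["-e", field]
    with open(csv_path, "w") as out:
        subprocess.run(cmd, stdout=out, check=True)

# Example (placeholder file names):
# pcap_to_csv("flood_attack_1.pcap", "flood_attack_1.csv")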

The dataset has 3 pcaps with flood attacks for each IoT device. The attacks were carried out over the HTTP, UDP and TCP protocols. I thought “ok, I could use 2 attacks for training/validation and 1 for testing”, but the main problem was how to merge the malicious traffic with the legitimate traffic to create the data splits.

pcaps for the attacks on one device; each pcap has only malicious traffic

The legitimate traffic of the dataset was captured day by day, so we have a pcap for each day (30 days in total). My first plan was to use a recurrent deep neural network like in [3], and since this kind of network learns patterns from sequences of network traffic, taking random sample packets for each bucket (legitimate vs. malicious) would not work well. To make things worse, all the papers I’ve read only say they used 80/20 splits but don’t explain how they built them.

My goal was to make the train and test datasets as close to a real-world scenario as possible. After a lot of thinking and back and forth I came to a solution that I’m not sure totally accomplishes that goal, but it’s a start. The strategy was to use the _ws.col.Time feature of the pcaps to insert random attacks at different times. The algorithm goes like this (a rough pandas sketch follows the list):

  1. Take a day full of legitimate packets as the starting point.
  2. For each attack pcap, insert some attack packets (50,000 to 70,000) at a random place in the full day of packets, adjusting the _ws.col.Time of the attack packets and re-sorting the dataset each time.
  3. Repeat step 2 for each device, using the first 2 attack pcaps of each device (the third pcap is reserved for the test dataset).
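A rough sketch of step 2 could look like the code below. It assumes the pcaps were already exported to CSVs with the feature columns (plus a label column, added here, marking malicious packets); it inserts one contiguous burst per call, so calling it once per attack pcap scatters the attacks across the day.

import numpy as np
import pandas as pd

def insert_attack(day_df, attack_df, n_packets=60_000, seed=0):
    """Insert a burst of attack packets at a random point inside a day of
    legitimate traffic, shifting the attack timestamps accordingly."""
    rng = np.random.default_rng(seed)
    attack = (
        attack_df.sample(n=min(n_packets, len(attack_df)), random_state=seed)
        .sort_values("_ws.col.Time")
        .copy()
    )
    # Pick a random start time inside the legitimate capture and shift the
    # attack timestamps so the burst begins there
    start = rng.uniform(day_df["_ws.col.Time"].min(), day_df["_ws.col.Time"].max())
    attack["_ws.col.Time"] = start + (attack["_ws.col.Time"] - attack["_ws.col.Time"].min())
    attack["label"] = 1  # malicious
    day = day_df if "label" in day_df else day_df.assign(label=0)
    merged = pd.concat([day, attack], ignore_index=True)
    return merged.sort_values("_ws.col.Time").reset_index(drop=True)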

For the test dataset I do the same thing (starting from another full day of legitimate packets), but instead of taking only 50,000 to 70,000 attack packets I use all the packets from the attacks. This makes the test dataset very unbalanced, but that’s what would happen in real life.

To program for the ESP32 we use C++. Besides some basic C classes in college I had no experience with it, but I thought “I’ve been programming in Python for a long time, I’ll only have to deal with some changes in syntax, I’ll be fine”. Indeed I was fine most of the time, but for some things it was hard to find the right query to search for exactly what I was looking for, and that’s where ChatGPT became very handy.

One example was the & operator. If you’re a Python programmer and have never programmed in C++, what do you think the next line of code does?

(TCPH_FLAGS(tcphdr) & TCP_SYN)

It returns 0 or 1 depending on whether the flags match, right? WRONG!

chatGPT helping us figure out what the code does

In the training dataset the values of tcp.flags.syn are 0 and 1, so to get the same values when we deploy the model on the ESP32 we need to do this:

input->data.f[7] = (TCPH_FLAGS(tcphdr) & TCP_SYN) ? 1 : 0;
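If the difference isn’t obvious, here’s the same idea in Python: in lwIP the TCP flags are bit masks (TCP_SYN is 0x02), so the & expression yields the masked bits, not a boolean 0/1:

TCP_SYN = 0x02  # lwIP defines the TCP flags as bit masks

flags = 0x12                        # a header with SYN and ACK set
print(flags & TCP_SYN)              # -> 2: the masked bit, not a boolean
print(1 if flags & TCP_SYN else 0)  # -> 1: what the model's tcp.flags.syn expects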

This simple question to chatGPT saved me a lot of potential debugging time. Thanks chatGPT.

The model uses 10 features and the architecture is very simple:

train_features = [
    "_ws.col.Time",
    "ip.hdr_len",
    "ip.flags.df",
    "ip.frag_offset",
    "ip.proto",
    "ip.ttl",
    "tcp.window_size",
    "tcp.flags.syn",
    "tcp.flags.urg",
    "tcp.hdr_len",
]

simple_nn_model = tf.keras.models.Sequential(
    [
        tf.keras.layers.InputLayer(input_shape=(10,), dtype=tf.float32),
        tf.keras.layers.Dense(9, activation="relu", kernel_regularizer="l2"),
        tf.keras.layers.Dense(9, activation="relu", kernel_regularizer="l2"),
        tf.keras.layers.Dense(units=1, activation="sigmoid", kernel_regularizer="l2"),
    ],
    name="simple_nn",
)

simple_nn_model.compile(
    loss="binary_crossentropy",
    optimizer=tf.keras.optimizers.Adam(),
    metrics=["accuracy"],
)

The accuracy was 97% for the test dataset.

Results for the test dataset
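To run on the ESP32, the trained model still has to be converted to TensorFlow Lite (that’s how it gets embedded, as mentioned at the end of the post). A minimal sketch of the conversion step, with placeholder file names, could be:

import tensorflow as tf

# Convert the trained Keras model to a TensorFlow Lite flatbuffer
converter = tf.lite.TFLiteConverter.from_keras_model(simple_nn_model)
tflite_model = converter.convert()

with open("simple_nn.tflite", "wb") as f:
    f.write(tflite_model)

# On the microcontroller side the flatbuffer is typically embedded as a C array,
# e.g. with: xxd -i simple_nn.tflite > simple_nn_model_data.cc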

When I started the research I wondered why we don’t just use a threshold on packets per second or something similar, since DoS attackers send many network packets to the device in a short period of time. This doesn’t work because there are different volumetric attacks with different characteristics. A machine learning model is a better choice because it’s more robust and can potentially detect suspicious traffic regardless of the traffic rate.

But whether the model works well also depends on the application and on the IoT system in place. I wanted to get a glimpse of the model’s performance on a real IoT system, so I tested it with a simple UDP server application running on the ESP32.

I’ve used a Python script to create the legitimate traffic and another Python script as the attacker. These were the results:

|                    | Malicious packets dropped (true positives) | Malicious packets processed (false negatives) | Legitimate packets processed |
| ------------------ | ------------------------------------------ | --------------------------------------------- | ---------------------------- |
| Without dosguard32 | —                                           | 1798                                           | 20                           |
| With dosguard32    | 1855                                        | 495                                            | 20                           |

The attacks ran for 60 seconds and the real traffic ran for 80 seconds. The total of processed packets includes all packets (legitimate and malicious), and the total of legitimate packets includes only the packets that came from my simulated sensor data. The firewall achieved a considerable reduction of 72.44% in processed malicious packets while not disrupting the legitimate traffic. The recall for malicious packets was 0.789, which is significantly lower than the results on the test dataset. It’s important to note that the results of the UDP server experiment are preliminary, as they are based on a single execution.
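For reference, here is roughly how those two figures fall out of the table above:

dropped = 1855    # malicious packets dropped with dosguard32 (true positives)
passed = 495      # malicious packets still processed with dosguard32 (false negatives)
baseline = 1798   # malicious packets processed without dosguard32

recall = dropped / (dropped + passed)   # ≈ 0.789
reduction = 1 - passed / baseline       # ≈ 72%
print(f"recall={recall:.3f}, reduction={reduction:.2%}")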

If you’re curious, this was the code I used to simulate the real traffic (and yes, I asked ChatGPT for that as well):

import socket
import random
import time

start_time = time.time()

# UDP server config
UDP_IP = "10.0.0.105"
UDP_PORT = 3333

# Create UDP socket
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

# Send simulated sensor readings for 80 seconds
while (time.time() - start_time) < 80:
    temperature = round(random.uniform(18.0, 30.0), 2)
    humidity = round(random.uniform(30.0, 80.0), 2)

    data = f"Temp: {temperature} C, Humidity: {humidity} %".encode()

    sock.sendto(data, (UDP_IP, UDP_PORT))

    time.sleep(1)
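The attacker script follows the same pattern, just without the pause between packets, flooding the server for 60 seconds. A hypothetical sketch (meant only for stress-testing your own device) could look like this:

import socket
import time

UDP_IP = "10.0.0.105"   # the ESP32 running the UDP server
UDP_PORT = 3333

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
payload = b"X" * 64      # arbitrary payload

start_time = time.time()
while (time.time() - start_time) < 60:   # flood for 60 seconds
    sock.sendto(payload, (UDP_IP, UDP_PORT))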

I think the main next step would be to research how to create training and test datasets that really resemble real network traffic. On the test dataset we got 99% recall, but in my “real scenario” test a lot of attack packets were still classified as legitimate. Maybe the characteristics of my Python script attacks were very different from the attacks in the training dataset? Did the model overfit? These would be good questions to guide future research.

This project got me totally out of my comfort zone, since I had to work with a lot of new things:

  • The TCP/IP protocol (computer networks)
  • DoS attacks (cybersecurity)
  • The TCP/IP protocol implementation used in the ESP32 (embedded devices)
  • Using TensorFlow Lite to embed the model in the ESP32 (embedded devices + data science)

By far the hardest part was working with the TCP/IP stack on the ESP32, mainly because there is not much study material about it. Fortunately, the authors of [1] made their code available and I got on track based on their work. This is one of the reasons why I love the open-source community ❤.

To my surprise, ChatGPT also helped me a lot. I was skeptical about whether it could explain some lwIP functions and variables to me, but it did a really good job. It felt like I had a lwIP expert I could ask questions and discuss ideas with.

There is a lot of debate about whether LLMs will replace us and so on, but we also need to talk about how they are game changers. They are very powerful at increasing our capacity to learn new things and at making knowledge more accessible to everyone.

Of course, you can check the code for everything we’ve talked about today here: https://github.com/dmesquita/dosguard32

And if you got this far, thank you very much for reading! 😀

[1] Fernandes, Gabriel Victor C., et al. “Implementação de um filtro de pacotes inteligente para dispositivos de Internet das Coisas.” Anais do XL Simpósio Brasileiro de Redes de Computadores e Sistemas Distribuídos. SBC, 2022.

[2] Dadkhah, Sajjad, Hassan Mahdikhani, Priscilla Kyei Danso, Alireza Zohourian, Kevin Anh Truong, and Ali A. Ghorbani. “Towards the Development of a Realistic Multidimensional IoT Profiling Dataset.” Submitted to the 19th Annual International Conference on Privacy, Security & Trust (PST 2022), August 22–24, 2022, Fredericton, Canada.

[3] Yuan, Xiaoyong, Chuanhuang Li, and Xiaolin Li. “DeepDefense: identifying DDoS attack via deep learning.” 2017 IEEE international conference on smart computing (SMARTCOMP). IEEE, 2017.

[4] Hamza, Ayyoob, Hassan Habibi Gharakheili, and Vijay Sivaraman. “IoT network security: requirements, threats, and countermeasures.” arXiv preprint arXiv:2008.09339 (2020).


