Making Decisions from IoT Data | by Andrew Blance
IoT Development
How to design a scalable IoT architecture in Azure for real-time data analysis
Using data to get insight into problems can be difficult. Whether it’s the amount of data you have, the rate you can collect it, or the quality of the data itself, there are numerous roadblocks to becoming data-driven. However, there is another problem, perhaps more fundamental than those: you are simply not collecting the information you need to make the decision. This one can be hard to fix.
Say you run a factory and want to understand when your production line will malfunction. Or you work in a shop and want to know the footfall through the building. Or you want to know the temperature of your flat so you can adjust the heating accordingly. This data can be hard to get hold of! There is a solution, however: targeted deployment of Internet of Things (IoT) sensors, aimed at gathering the specific information you need to make a decision.
This article is about the practicalities of doing just that. We will primarily focus on the creation and deployment of a software stack that can handle live data from numerous sensors. This architecture will then have to analyse the data, make AI/ML predictions, and finally visualise everything. We will focus on Azure technologies, though I am sure there are AWS (or other cloud provider) equivalents. First, we will spend a little time upfront discussing hardware considerations.
It is also about the journey I have gone on to understand how to manage and use IoT devices. The solution may not be perfect, but I hope it will give you some things to consider when building your own architecture.
This article is based on a talk, “Getting Insight From Anything”, that I recently gave at the North East Data Science Meetup.
So, if we aim to put sensors somewhere in the real world, we are going to have some constraints. Naively, I thought a good Arduino or Raspberry Pi was all I was going to need. However, the real world has other ideas. Here are some considerations for real-world deployment: (i) is it cheap? (ii) how are you powering it? (iii) how are you connecting it to the internet? (iv) is it safe?
A Raspberry Pi might seem like it fits most of those categories! I spent a lot of time developing a battery-powered, 3G-enabled, packed-full-of-sensors system. But the truth is these are mostly dev devices. Making a factory, government building, school, or anywhere else comfortable with a fleet of them sitting on their premises is tough. Is it IP67 rated? Even in a case, is it dustproof? Drop-resistant? Are you sure the battery is safe? (Li-ion batteries are pretty dangerous, actually!) Frankly, a board with cables and sensors coming off it can make people nervous; they wonder whether this thing is legit. The list of challenges goes on! The step between a Raspberry Pi with a breadboard and a production-ready system is enormous. By the time you’ve crossed that gap, you may find you have accidentally ended up in the hardware business. I learnt this pretty late.
The solution here is to find a production-ready sensor setup. I came across Monnit, a manufacturer of sensors. They provide a hub which connects to Azure over a 3G SIM. Ideal: we don’t need to connect to a company’s WiFi (connecting strange devices to a corporate network makes cybersecurity people very nervous!). This hub then connects to a set of sensors, ranging from temperature/humidity to CO2 and accelerometers.
These sensors have super long battery lives: Monnit claims (and my tests back this up) that they will last multiple years without a charge. Ideal: no annoying cables hanging about the place, and no need for someone to be constantly charging them! Importantly, they are cheap and conform to industrial standards. They are my preferred hardware choice for IoT data collection.
So, for example, we could attach an accelerometer to an industrial saw to measure vibrations (a good indicator of failure), and have the sensor send that to the Monnit hub, which in turn drops the data into Azure for analysis.
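To make the shape of that data concrete, here is a minimal sketch of the device-to-cloud step as it would look if you were sending readings into IoT Hub yourself with the Python device SDK. With the Monnit setup this happens inside their hardware; the connection string, device ID, and payload fields below are purely illustrative assumptions.

```python
# Hypothetical sketch: a device (or gateway) pushing one vibration reading
# into IoT Hub. With a Monnit hub this step happens inside their hardware;
# the connection string and payload fields are made up for illustration.
import json
import time

from azure.iot.device import IoTHubDeviceClient, Message

CONNECTION_STRING = "HostName=<your-hub>.azure-devices.net;DeviceId=saw-01;SharedAccessKey=<key>"

def send_reading(client: IoTHubDeviceClient, rms_vibration: float) -> None:
    payload = {
        "deviceId": "saw-01",
        "sensor": "accelerometer",
        "rmsVibration": rms_vibration,  # e.g. mm/s
        "timestamp": time.time(),
    }
    msg = Message(json.dumps(payload))
    msg.content_type = "application/json"
    msg.content_encoding = "utf-8"
    client.send_message(msg)

if __name__ == "__main__":
    client = IoTHubDeviceClient.create_from_connection_string(CONNECTION_STRING)
    client.connect()
    send_reading(client, rms_vibration=4.2)
    client.shutdown()
```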
With our hardware decided, we need to sort out the software. Our software solution needs to consider several things that a smaller project with one Raspberry Pi might not: (i) how do we handle multiple devices and their settings, (ii) how do we handle and store incoming live data, and (iii) how do we make this scalable, ensuring it works with 1 or 1,000 sensors?
To do this we will stay entirely in the Azure ecosystem. Let’s take an initial look at our architecture:
We have split the process into three separate columns. The goal is to have a generic “middle” layer, where cleaned sensor data and other live data eventually land. For a user accessing the data here, it is now “generic”: they do not need to care about the hardware itself. We can swap other sensors or API calls in and out, and someone accessing the data should not be affected.
Let’s go through the architecture, and explain what these services are and why they have been chosen.
Streaming Data Input
- IoT Hub: This is Azure’s service for IoT device management. Here, I can register devices and control their firmware. It also allows me to message the device, and receive messages back. In this scenario, the message I receive is the data from the sensors.
- Functions: Microsoft describes this as “Functions as a Service” (why is everything a service now?). Functions are serverless compute, letting you write code that is triggered by an action without having to think about the hardware it runs on. What’s nice about staying within the Azure stack is that you (usually) don’t have to hardcode file paths: you can just point the Function at another Azure service and tell it to trigger when that service is used. For example, a Function could be triggered when a message is received in IoT Hub, or when a file is dropped into storage. Furthermore, they are extremely cheap: some back-of-the-napkin maths suggests 200 million function calls may cost around $35.
In this architecture, we are using a Function to clean some particularly messy data (more on that later).
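As a rough illustration of that cleaning step, here is a minimal sketch of what such a Function could look like with the Python programming model: it listens to IoT Hub’s built-in Event Hub-compatible endpoint, tidies each message, and forwards it to the generic Event Hub. The binding names, connection settings, and payload fields are assumptions, not the exact code behind this architecture.

```python
# Sketch of the cleaning Function (Python v2 programming model). It reads
# raw IoT Hub messages, keeps the fields downstream consumers need, and
# forwards the cleaned record to an Event Hub. Names are placeholders.
import json
import math

import azure.functions as func

app = func.FunctionApp()

@app.event_hub_message_trigger(arg_name="event",
                               event_hub_name="iothub-ehub-endpoint",
                               connection="IOT_HUB_EVENTS")
@app.event_hub_output(arg_name="cleaned",
                      event_hub_name="cleaned-sensor-events",
                      connection="EVENT_HUB_OUT")
def clean_sensor_message(event: func.EventHubEvent, cleaned: func.Out[str]) -> None:
    raw = json.loads(event.get_body().decode("utf-8"))

    # Keep only the fields downstream consumers care about, with sane types.
    record = {
        "deviceId": str(raw.get("deviceId", "unknown")),
        "sensor": raw.get("sensor"),
        "value": float(raw.get("rmsVibration", "nan")),
        "timestamp": raw.get("timestamp"),
    }

    # Drop obviously broken readings rather than passing them on.
    if not math.isnan(record["value"]):
        cleaned.set(json.dumps(record))
```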
Generic Event Handling
- Event Hub: This is, very loosely, IoT Hub without device management. We wanted somewhere to land all data. Somewhere generic that other developers could call to access what we have gathered, regardless of which IoT device it came from, or even if it came from another source entirely (e.g., calling an API to get live weather data).
Events that have been cleaned and landed here are now ready for analysis.
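To show what “generic” means in practice, here is a hedged sketch of a downstream consumer reading those cleaned events with the azure-eventhub package. The connection string, hub name, and field names are placeholders; the point is that the consumer neither knows nor cares which sensor (or API) produced each event.

```python
# Sketch of a downstream consumer reading cleaned events from the generic
# Event Hub layer. Connection string and hub name are placeholders; any
# JSON-producing source upstream looks identical from here.
import json

from azure.eventhub import EventHubConsumerClient

CONNECTION_STRING = "Endpoint=sb://<namespace>.servicebus.windows.net/;SharedAccessKeyName=...;SharedAccessKey=..."
EVENT_HUB_NAME = "cleaned-sensor-events"

def on_event(partition_context, event):
    record = json.loads(event.body_as_str())
    print(f"{record['deviceId']}: {record['value']} at {record['timestamp']}")

if __name__ == "__main__":
    client = EventHubConsumerClient.from_connection_string(
        CONNECTION_STRING,
        consumer_group="$Default",
        eventhub_name=EVENT_HUB_NAME,
    )
    with client:
        # "-1" means start from the beginning of the stream.
        client.receive(on_event=on_event, starting_position="-1")
```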
Live/Historical
- Stream Analytics: Microsoft’s naming conventions can be very literal sometimes. Stream Analytics (SA) is a platform for analysing streaming data. If you look at most architecture diagrams Microsoft provides for IoT setups, SA is usually connected directly to IoT Hub. However, we found this very difficult to do. The service uses a SQL-like query language which is quite restrictive, and we found it impossible to both clean the data and perform the analysis within SA. It was suggested that we start piping one SA instance into another, but this quickly became prohibitively expensive. Therefore, cleaning is done by Functions before SA performs the analysis.
The query language SA uses is fantastic for things like windowing and aggregations. You can quickly form moving averages and prepare data for the next stage: visualisation.
- Power BI: So far, everything has been happening live: SA is triggered by Event Hub, which is itself fed by a Function watching the output of IoT Hub. We are going to continue this by having Power BI plot data live. Power BI is a model-first BI solution, rather than a visualisation-first BI solution. A really clever BI developer I know once said that within earshot of me, and now I parrot it endlessly. I take it to mean that Power BI has an incredibly strong modelling engine and a powerful query language (DAX). It makes really nice visualisations too.
Power BI Desktop has some refresh limits, but the online service supports streaming dashboards that are truly live. You can point Stream Analytics directly at such a dashboard, making it simple to integrate.
In the diagram above, we also regularly push data out into storage, which is then surfaced in traditional Power BI reports.
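To give a feel for the windowed aggregation SA performs before results reach Power BI, here is the same idea sketched in Python with pandas. In the real pipeline this is a few lines of SA’s SQL-like query language rather than code you deploy, and the sample values below are invented.

```python
# Illustration only: the tumbling-window average that Stream Analytics
# computes for us, expressed in pandas so the logic is visible.
import pandas as pd

# A few cleaned events, as they might arrive from the Event Hub layer.
events = pd.DataFrame({
    "deviceId": ["saw-01"] * 6,
    "value": [4.1, 4.3, 9.8, 4.0, 4.2, 10.1],  # rms vibration
    "timestamp": pd.to_datetime([
        "2024-01-01 12:00:05", "2024-01-01 12:00:35",
        "2024-01-01 12:01:10", "2024-01-01 12:01:40",
        "2024-01-01 12:02:15", "2024-01-01 12:02:45",
    ]),
})

# One-minute tumbling window, averaged per device: the shape of result
# that gets pushed on to the live Power BI dashboard.
windowed = (
    events.set_index("timestamp")
          .groupby("deviceId")["value"]
          .resample("1min")
          .mean()
          .reset_index()
)
print(windowed)
```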
This setup allows us to handle both live data and historical data. It lets us access IoT data and other live sources from the same system. IoT Hub gives us device management. Finally, Azure tools scale well: Function calls are cheap and fast, IoT Hub’s free tier gives you thousands of messages a day, and Power BI can handle huge datasets.
So far our solution can handle live data and gain some insight into it via Stream Analytics. The next step, however, is to perform more in-depth data analysis with the machine learning stack Azure gives us access to.
Two services I find interesting are:
- Time Series Insights: Again, Microsoft’s naming conventions are very literal. Time Series Insights (TSI) does exactly what it suggests: it lets you apply prebuilt, standard time series analysis techniques to time series data. In fact, you can point it directly at an instance of IoT Hub. My reservations here come from using Stream Analytics: big promises were made about how easily that integrates with IoT Hub too, and I ended up needing to augment it with Functions. Will the same be true here?
- Azure Machine Learning: Azure’s machine learning service is fantastic. It is my go-to tool for machine learning development. With a large range of features (model tracking, pipelines, and almost everything else you would expect from a modern ML platform), it is a good choice for implementing a machine learning model. You can point it towards storage and use it to analyse the historical data we have just gathered; results could then be surfaced in Power BI.
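As a sketch of where that analysis could go, the historical readings sitting in storage could feed a simple anomaly detector; the same modelling code could run locally or be tracked and scheduled as an Azure Machine Learning job. The file name, columns, and contamination rate below are hypothetical, not part of the architecture above.

```python
# Hypothetical sketch: fit a simple anomaly detector on historical vibration
# data landed in storage. File name and columns are made up for illustration.
import pandas as pd
from sklearn.ensemble import IsolationForest

# Historical, cleaned sensor readings exported from storage.
history = pd.read_csv("saw-01-vibration-history.csv", parse_dates=["timestamp"])

# IsolationForest flags unusual readings, which for a saw could be an early
# sign of a failing bearing or blade.
model = IsolationForest(contamination=0.01, random_state=42)
history["anomaly"] = model.fit_predict(history[["value"]])  # -1 marks an anomaly

anomalies = history[history["anomaly"] == -1]
print(f"{len(anomalies)} suspicious readings out of {len(history)}")
# These flags could be written back to storage and surfaced in Power BI.
```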
As I said at the start, this is a journey I am still on. This final piece, the pointed analytical end, is the least developed. However, Azure provides plenty of options to explore.
It is hard to get insight into every process you want to understand. Sometimes, you simply don’t have the data to make the decisions you want to. However, there are instances where you can deploy cheap, flexible, non-intrusive IoT devices to perform highly-targeted data gathering. This data could then be used to support you in making a good decision.
Most of this article has covered the software stack I have been building to handle live data collection from a variety of sensors. It is heavily focused on Azure technologies: IoT Hub to handle the devices and the landing of data, Functions to clean everything up, Stream Analytics to gain some insights, and Power BI for live data visualisation.
Going forward, Azure provides two (and probably more than that!) interesting tools to automate decision-making and extract more knowledge from the data we have gathered: Time Series Insights and Azure Machine Learning. I’m looking forward to playing with these more!
I hope you have enjoyed this article, and have found it useful for creating your own architectures!