Travel Time Estimation Using Quadkeys | by João Paulo Figueira | Sep, 2022
This article explains how to estimate travel times using known speed vectors indexed by quadkeys
How long is your road trip going to last? If you are like me, you go to Google maps or Here maps and query the route directly. You will not only get the fastest or shortest paths, but you will also get estimates of your travel times. These services report the most common speeds according to the learned time patterns and live traffic information and do so for the typical vehicles on the road.
These routing services are less valuable if you have a fleet of specialized vehicles, like garbage collection trucks, tuk-tuks, or even plain trucks. For instance, the maximum allowed highway speeds tend to vary according to the vehicle classification, and in Europe, trucks have lower top allowed highway speeds than light vehicles. Some commercial map routing products support truck routing, but you can roll your own if you have a fleet of specialized vehicles.
What if you could make these predictions using your own privately-collected speed vector data? It is not uncommon for fleets to collect telematics data for various purposes ranging from fuel savings to traffic safety. Telematics data usually includes GPS information that, besides the sampled location, also reports the corresponding speed vector, containing both the absolute speed and the direction (bearing). If you have such data, you can use it to predict your fleet’s expected travel times using the approach I describe in this article.
We must start with a substantial speed vector dataset covering an area of interest. In this article, I will use the Extended Vehicle Energy Dataset. Here’s a quote from the paper’s presentation that explains the main differences from its previous incarnation, the Vehicle Energy Dataset:
This work presents an extended version of the Vehicle Energy Dataset (VED), which is a openly released large-scale dataset for vehicle energy consumption analysis. Compared with its original version, the extended VED (eVED) dataset is enhanced with accurate vehicle trip GPS coordinates, serving as a basis to associate the VED trip records with external information, e.g., road speed limit and intersections, from accessible map services to accumulate attributes that is essential in analyzing vehicle energy consumption.
The Vehicle Energy Dataset collects trip information from November 2017 to November 2018 on 383 personal cars in Ann Arbor, Michigan. The Extended Vehicle Energy Dataset enhances the original by correcting the GPS fluctuations through map-matching, “snapping” the sampled locations to the known roads, and adding much more contextual information.
Each record of this dataset contains both the original and the map-matched GPS locations, the vehicle's reported speed, and a set of additional properties that describe the vehicle's state and location. Unfortunately, the dataset does not include the speed directions, so we are missing half of the speed vector information. We will solve this problem by inferring the angles from the geospatial data of the location sequences (see below).
Travel Time Estimation
To estimate travel time, we must start by identifying the trip endpoints using direct geospatial coordinates or address geocoding. Next, we find the route that fits our requirements using a graph with the geospatial description of the area of concern. We create this graph from publicly available maps, the OpenStreetMap provider, using the OSMnx Python package to acquire and enhance the data.
The resulting path is a sequence of graph nodes and directed edges, where the nodes contain geospatial coordinates. The edges carry descriptive properties, such as the distance between endpoints, travel time, maximum speed, etc.
With the route information, we can now query the database of known speed vectors and extract the ones closest to the route with the same approximate direction or bearing. We can even impose that the query returns the speed vectors recorded at a specific time interval (day of week or hour of the day).
Bearings
Unfortunately, the extended dataset does not include the speed directions associated with each sampled GPS location. The vector direction is known as the bearing, measured as an angle in degrees from true North in the clockwise direction, as depicted in Figure 1.
We will calculate the bearings of the collected GPS speed vectors using an approximation based on the natural location sequences. The GPS location sequences in the data belong to vehicle trips, each with its unique identifier and sorted by a monotonically increasing timestamp. For each of these sequences, we assign the bearing of the line connecting two sequential locations to the second location. The exception is the first location, which receives the same direction as the second. We will use these angles as criteria when querying the speed vector database. Figure 2 below illustrates the bearing calculation process.
To calculate the bearing between consecutive locations, I used the “Movable Type Scripts” online resource, a treasure trove of geospatial algorithms and formulae. The function that implements the vectorized bearing calculation is shown below in Figure 3.
Note that for N points, the function calculates N-1 bearings. We then need to use the rule depicted in Figure 2 to assign bearings to the individual locations of the speed measurements.
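The repository's function appears in Figure 3; as a reference, here is a minimal NumPy sketch of the same calculation, using the forward-azimuth formula from Movable Type Scripts (the function and argument names are my own, not the repository's):

```python
import numpy as np

def vec_bearings(lats: np.ndarray, lons: np.ndarray) -> np.ndarray:
    """Vectorized bearing (degrees from true North, clockwise) between
    consecutive points. For N points it returns N-1 bearings."""
    phi1, phi2 = np.radians(lats[:-1]), np.radians(lats[1:])
    dlam = np.radians(lons[1:] - lons[:-1])
    # Forward azimuth: theta = atan2(sin dL * cos p2,
    #                                cos p1 * sin p2 - sin p1 * cos p2 * cos dL)
    y = np.sin(dlam) * np.cos(phi2)
    x = np.cos(phi1) * np.sin(phi2) - np.sin(phi1) * np.cos(phi2) * np.cos(dlam)
    return np.degrees(np.arctan2(y, x)) % 360.0
```

Per the rule depicted in Figure 2, the first location of a trip then receives the first computed bearing as well, so every sampled location ends up with an angle.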
Space Discretization
We need to query the database of sampled input vectors repeatedly to infer an arbitrary trip’s speeds. With over twenty-two million records, the database requires a very efficient index. To keep things simple and avoid complex geospatial indexing, we will use a fixed space partition with quadkeys. Alternatively, we could use Uber H3’s hexagonal partitioning with similar results.
A square space discretization allows quick implicit indexing and a canvas to draw on. Why do we need to draw? Once we have the trip path, we can break it down into the intersecting quadkeys and use those to query the speed vector database. Figure 4 illustrates the process.
The process depicted in Figure 4 amounts to drawing lines in the quadkey tile space. We determine the tile endpoints for each graph edge and then use a line drawing algorithm to decide which quadkeys intersect the path. Finally, we use the list of resulting quadkeys and the edge bearing to query the database for sampled speed vectors.
We then estimate an average speed for each trip edge and reconstruct the total trip time using the individual edge distances.
Time Discretization
We should expect that for the same road segment and in the same direction, traffic speeds will vary during the day and throughout the week. Traffic density is closely related to vehicle speeds; we should expect higher velocities when there is low traffic density and vice versa. Rush hour is a typical density cycle with slow speeds, and weekends should also display different traffic behaviors than weekdays. If we want to capture these speed changes on a micro level, we are better off by discretizing time.
In this article, I will use a simple time discretization technique, splitting the week into seven weekdays and each day into one hundred and forty-four ten-minute intervals. This time discretization allows us to research weekly and daily patterns.
We now focus on the implementation details of this article’s code. Please start by heading over to the accompanying GitHub repository and cloning it.
Data Acquisition
To download and decompress the dataset, you should use the first notebook. The code is as simple as it gets. Once ready, you can import the data into the supporting database.
Data Import
Like in many articles before this one, I used an SQLite database to hold all the data, and I implemented the process of importing the data in the second notebook. The code merely iterates through the expanded CSV files, reads the data into a Pandas DataFrame, and then inserts it into the database. Note that the code first reads the CSV file into a string and then filters out the semicolon character because some files have this character at the line ending, which defeats the standard CSV import.
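A minimal sketch of that import step, assuming pandas and the standard sqlite3 module (the paths and table name here are placeholders, not the repository's exact ones):

```python
import sqlite3
from io import StringIO

import pandas as pd

def import_csv(db_path: str, csv_path: str, table: str = "signals") -> None:
    # Read the raw file into a string and drop the stray semicolons that
    # some files carry at the line endings, which break the CSV parse.
    with open(csv_path, "r", encoding="utf-8") as f:
        text = f.read().replace(";", "")
    df = pd.read_csv(StringIO(text))
    # Append the cleaned rows to the SQLite table.
    with sqlite3.connect(db_path) as conn:
        df.to_sql(table, conn, if_exists="append", index=False)
```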
Bearing and Quadkey Calculation
Now we can calculate bearings and quadkeys from the sampled speed vector data. To do this, please use the third notebook. As you can see, it merely calls an external Python script to do all the heavy lifting. Because bearings and quadkeys are computed independently for each vehicle trip, the script runs in parallel over trips. Note that this code will take a long time to run (expect at least ninety minutes).
Time Slot Calculation
Time slot calculation requires two extra columns in the signals table: the weekday and the time slot. We calculate weekdays based on the fact that the “day number” column encodes an offset in days from November 1st, 2017. As discussed in the original GitHub repository documentation, a value of one means midnight of that day (a Wednesday). The decimal part represents a fraction of that day, so 1.5 means noon of that same day. Using this information, we can easily break the day numbers down into the respective weekdays and time slots. We will use ten-minute buckets for the time slots, meaning each day has 144 such buckets. To update the table, we use the following SQL query:
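The repository's notebook contains the actual query; the sketch below shows the arithmetic with assumed column names (`day_num`, `week_day`, `time_slot`), anchored so that day number 1 maps to a Wednesday, which is 3 when Sunday is encoded as zero:

```python
import sqlite3

# Column names are assumptions for illustration; adapt them to your schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE signals (day_num REAL, week_day INT, time_slot INT)")
conn.execute("INSERT INTO signals (day_num) VALUES (1.0), (1.5)")

# Day number 1 is midnight of November 1st, 2017, a Wednesday, so adding 2
# before the modulo makes Sunday zero. The fractional part of the day,
# multiplied by 144, yields the ten-minute slot index.
conn.execute("""
    UPDATE signals SET
        week_day  = (CAST(day_num AS INT) + 2) % 7,
        time_slot = CAST((day_num - CAST(day_num AS INT)) * 144 AS INT)
""")
```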
Note that the query above encodes Sunday as zero. You can adapt it to your preference.
Run notebook number four to update your database with the time slots. We are now ready to predict trip times.
Now that we have set up all the preconditions, let us illustrate the process of estimating an arbitrary trip in Ann Arbor using the eVED. Please refer to notebook number five for the actual implementation.
Load and Prepare Map Data
Our first step is to load the OpenStreetMap data on Ann Arbor. We will use Geoff Boeing’s brilliant OSMnx Python package to do this.
Figure 6 shows the single line of code required to load the drivable road information for Ann Arbor from OpenStreetMap. I explicitly avoid the network simplification step to get better resolution when matching trip endpoint locations to the road graph nodes. Figure 7 below shows what the loaded road graph looks like.
After loading the road network, we need to add a few missing pieces of information, namely edge speeds, travel times, and bearings. Fortunately, we have help from OSMnx to do that, as shown in Figure 8 below.
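With OSMnx 1.x, the enrichment boils down to three calls; the toy two-node graph below is only there to make the sketch self-contained (in the notebook, the calls run on the downloaded Ann Arbor graph instead):

```python
import networkx as nx
import osmnx as ox

# Tiny hand-built graph standing in for the downloaded Ann Arbor network.
G = nx.MultiDiGraph(crs="epsg:4326")
G.add_node(1, x=-83.745, y=42.280)
G.add_node(2, x=-83.735, y=42.288)
G.add_edge(1, 2, length=1000.0, highway="residential")

# Impute edge speeds (km/h) per highway type, then derive travel times
# (seconds) and edge bearings (degrees clockwise from North).
G = ox.add_edge_speeds(G, hwy_speeds={"residential": 40})
G = ox.add_edge_travel_times(G)
G = ox.add_edge_bearings(G)
```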
Now that the data is ingested and prepared, we can proceed to the route definition process.
Define the Route
We define a route using the most straightforward approach possible by specifying its endpoints. I adapted the method and the code you find here from the excellent article below.
The code in Figure 9 below illustrates the process of obtaining a route from its endpoints. We start by defining the addresses, geocoding them into geospatial locations, mapping these to the nearest graph nodes, and finally obtaining the route. The resulting route object is a list of graph nodes in sequence. We readily get the connecting edges by querying the graph object using the sequential node pairs.
After acquiring a valid route between the endpoints, we can now use it to query the discretized speed vector database. We will approach this by converting the graph nodes’ locations to their corresponding quadkeys and using their tile coordinates to “draw” the edges between them, thereby determining the quadkeys that best match them. Figure 10 below shows our route, drawn from the road network information.
Draw the Route Quadkeys
One of the exciting advantages of using quadkeys is that we partition the latitude and longitude space into a mesh of square areas, like graph paper or a raster display. Each square receives a unique x and y coordinate pair, which depends on the “zoom” or detail level. As the detail increases, the square size decreases.
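Following the Bing Maps tile system conventions, the mapping from a location to tile coordinates and on to a quadkey string can be sketched as follows (valid within the Web Mercator latitude bounds of roughly ±85.05 degrees):

```python
import math

def latlon_to_tile(lat: float, lon: float, level: int) -> tuple[int, int]:
    """Map a latitude/longitude pair to tile (x, y) coordinates."""
    n = 1 << level
    x = int((lon + 180.0) / 360.0 * n)
    sin_lat = math.sin(math.radians(lat))
    y = int((0.5 - math.log((1 + sin_lat) / (1 - sin_lat)) / (4 * math.pi)) * n)
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)

def tile_to_quadkey(x: int, y: int, level: int) -> str:
    """Interleave the bits of x and y into the quadkey string."""
    digits = []
    for i in range(level, 0, -1):
        mask = 1 << (i - 1)
        digit = 0
        if x & mask:
            digit += 1
        if y & mask:
            digit += 2
        digits.append(str(digit))
    return "".join(digits)
```

At level 20, the detail level used later in the article, tiles are only a few dozen meters wide at Ann Arbor's latitude, which is fine-grained enough to follow individual road segments.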
We need a way to draw virtual lines in the quadkey space, as illustrated in Figure 4 above. Although we could use the classic Bresenham line drawing algorithm, I opted for an anti-aliased version, Xiaolin Wu’s algorithm. The reason for this choice becomes apparent in Figure 11 below, or once you see Wikipedia’s animation of the algorithm at work. Each drawn square is a quadkey to query, and its shade is the weight of the reported information. This algorithm allows us to query data in the line’s vicinity and assign it an importance or relevance score.
You can see the anti-aliased line generating function in Figure 12 below. Note how the function generates a list of tuples containing x and y coordinates and pixel weight. We will later use these coordinates to create quadkey indexes to query the database for speed vectors that match the line direction.
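Figure 12 shows the repository's version; the sketch below is a direct transcription of Wu's algorithm adapted to collect (x, y, weight) tuples instead of drawing pixels:

```python
import math

def wu_line(x0: float, y0: float, x1: float, y1: float):
    """Anti-aliased line: returns (x, y, weight) tuples, weight in (0, 1]."""
    points = []

    def plot(x, y, w):
        if w > 0.0:
            points.append((int(x), int(y), w))

    # Work in the octant where x is the driving axis.
    steep = abs(y1 - y0) > abs(x1 - x0)
    if steep:
        x0, y0, x1, y1 = y0, x0, y1, x1
    if x0 > x1:
        x0, x1, y0, y1 = x1, x0, y1, y0

    dx, dy = x1 - x0, y1 - y0
    gradient = dy / dx if dx != 0.0 else 1.0

    # First endpoint.
    xend = round(x0)
    yend = y0 + gradient * (xend - x0)
    xgap = 1.0 - ((x0 + 0.5) % 1.0)
    xpxl1, ypxl1 = xend, math.floor(yend)
    if steep:
        plot(ypxl1, xpxl1, (1.0 - (yend % 1.0)) * xgap)
        plot(ypxl1 + 1, xpxl1, (yend % 1.0) * xgap)
    else:
        plot(xpxl1, ypxl1, (1.0 - (yend % 1.0)) * xgap)
        plot(xpxl1, ypxl1 + 1, (yend % 1.0) * xgap)
    intery = yend + gradient

    # Second endpoint.
    xend = round(x1)
    yend = y1 + gradient * (xend - x1)
    xgap = (x1 + 0.5) % 1.0
    xpxl2, ypxl2 = xend, math.floor(yend)
    if steep:
        plot(ypxl2, xpxl2, (1.0 - (yend % 1.0)) * xgap)
        plot(ypxl2 + 1, xpxl2, (yend % 1.0) * xgap)
    else:
        plot(xpxl2, ypxl2, (1.0 - (yend % 1.0)) * xgap)
        plot(xpxl2, ypxl2 + 1, (yend % 1.0) * xgap)

    # Main loop: two pixels per column, weighted by distance to the line.
    for x in range(xpxl1 + 1, xpxl2):
        if steep:
            plot(math.floor(intery), x, 1.0 - (intery % 1.0))
            plot(math.floor(intery) + 1, x, intery % 1.0)
        else:
            plot(x, math.floor(intery), 1.0 - (intery % 1.0))
            plot(x, math.floor(intery) + 1, intery % 1.0)
        intery += gradient
    return points
```

Feeding this function the tile coordinates of an edge's endpoints yields exactly the weighted quadkey list the query step needs.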
We can look at the result of applying this method to the generated route by plotting the results on a map. Figure 13 below shows our path fully covered with level 20 quadkeys. We can now use these to query the database for matching speed vectors and infer the edge speeds using the quadkey weights. We calculate the edge travel duration from the edge speeds and their known distances.
Speed Inference
As previously stated, we estimate edge speeds by breaking them down into quadkeys and querying each. The function depicted in Figure 14 below shows how we query each quadkey for the speed vectors. Note how we apply the angle condition, forcing the vector bearings to be within a range of five degrees (ten degrees total) from the edge bearing.
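In essence, the per-quadkey query looks like the following sketch (the schema and column names are illustrative rather than the repository's exact ones, and bearing wrap-around near North is ignored for brevity):

```python
import sqlite3

# Toy database standing in for the full signals table.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE signal (
    quadkey TEXT, bearing REAL, speed REAL, week_day INT, time_slot INT)""")
conn.executemany(
    "INSERT INTO signal VALUES (?, ?, ?, ?, ?)",
    [("0231", 88.0, 52.0, 1, 54),
     ("0231", 92.0, 48.0, 1, 54),
     ("0231", 270.0, 50.0, 1, 54)])  # opposite direction: filtered out

def query_speeds(conn, quadkey, bearing, week_day, time_slot, tol=5.0):
    """Speeds recorded in one quadkey whose bearing lies within +/- tol
    degrees of the edge bearing, for the given weekday and time slot."""
    sql = """SELECT speed FROM signal
             WHERE quadkey = ? AND week_day = ? AND time_slot = ?
               AND bearing BETWEEN ? AND ?"""
    rows = conn.execute(sql, (quadkey, week_day, time_slot,
                              bearing - tol, bearing + tol)).fetchall()
    return [r[0] for r in rows]
```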
We use the previous function to estimate the actual edge speed during the given time slot and weekdays. The function below, depicted in Figure 15, queries the speed vectors in the line-drawn quadkeys and averages them using the weight of the anti-aliasing algorithm.
Note how this function accepts a time offset in seconds instead of the day slot number. As we progress through the edges, estimating their duration, we must also update the implicit clock and advance time. We see this mechanism work in the following function that builds on the previous one and estimates the trip duration for the whole route.
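Stripped of the database details, the accumulation logic can be sketched as below, where `speed_fn` stands in for the quadkey-based speed estimator and its interface is hypothetical:

```python
def estimate_trip_duration(edges, speed_fn, start_offset_s, week_day):
    """Walk the route edges, inferring each edge's speed at the current
    clock offset and accumulating travel time in seconds.

    edges: list of (length_m, bearing_deg) pairs.
    speed_fn(bearing, week_day, slot): average speed in km/h.
    Note: for simplicity, the weekday is held fixed even if the clock
    rolls past midnight."""
    t = float(start_offset_s)
    for length_m, bearing in edges:
        slot = int(t // 600) % 144          # 600 s = one ten-minute slot
        speed_kph = speed_fn(bearing, week_day, slot)
        t += length_m / (speed_kph / 3.6)   # km/h -> m/s
    return t - start_offset_s
```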
Using the function above, we can estimate the trip duration for a given time of day and weekday and obtain the daily profile, as shown in Figure 17 below.
We can see some noise in the chart, possibly due to the short ten-minute time slot and the resulting data sparseness. With the current data preparation, we can only mitigate this issue by averaging the trip durations throughout the week, as depicted in Figure 18 below.
Painting Speed
We can now take an extra step and color the route according to the relation between the actual speed and the map-provided speed. The function below iterates the route’s edges and compares the inferred and edge speeds.
By running the function above for a Monday at 9 PM, we get the following map:
In this article, we have developed a technique to infer trip times using a database of sampled speed vectors and a digital map for directions. We used the Extended Vehicle Energy Dataset as the source for speed information that we enriched with the bearing information. The speed query process involves drawing the route on the quadkey tile space and then querying each tile for the speeds that align with the underlying edge bearing. With the resulting data, we estimate route durations on different days of the week and times of day and create a graphical depiction of the route showing the fastest and slowest segments.
An obvious experiment I should conduct is to increase the time slot from ten to fifteen or even twenty minutes. While the result will have a coarser granularity, it will gain by decreasing data sparseness.
OSMnx 1.2.2 — OSMnx 1.2.2 documentation
joaofig/eved-explore: An exploration of the Extended Vehicle Energy Dataset (github.com)