Travel Time Estimation Using Quadkeys | by João Paulo Figueira | Sep, 2022
This article explains how to estimate travel times using known speed vectors indexed by quadkeys
How long is your road trip going to last? If you are like me, you go to Google maps or Here maps and query the route directly. You will not only get the fastest or shortest paths, but you will also get estimates of your travel times. These services report the most common speeds according to the learned time patterns and live traffic information and do so for the typical vehicles on the road.
These routing services are less valuable if you have a fleet of specialized vehicles, like garbage collection trucks, tuk-tuks, or even plain trucks. For instance, the maximum allowed highway speeds tend to vary according to the vehicle classification, and in Europe, trucks have lower top allowed highway speeds than light vehicles. Some commercial map routing products support truck routing, but you can roll your own if you have a fleet of specialized vehicles.
What if you could make these predictions using your own privately-collected speed vector data? It is not uncommon for fleets to collect telematics data for various purposes ranging from fuel savings to traffic safety. Telematics data usually includes GPS information that, besides the sampled location, also reports the corresponding speed vector, containing both the absolute speed and the direction (bearing). If you have such data, you can use it to predict your fleet’s expected travel times using the approach I describe in this article.
We must start with a substantial speed vector dataset covering an area of interest. In this article, I will use the Extended Vehicle Energy Dataset. Here’s a quote from the paper’s presentation that explains the main differences from its previous incarnation, the Vehicle Energy Dataset:
This work presents an extended version of the Vehicle Energy Dataset (VED), which is a openly released large-scale dataset for vehicle energy consumption analysis. Compared with its original version, the extended VED (eVED) dataset is enhanced with accurate vehicle trip GPS coordinates, serving as a basis to associate the VED trip records with external information, e.g., road speed limit and intersections, from accessible map services to accumulate attributes that is essential in analyzing vehicle energy consumption.
The Vehicle Energy Dataset collects trip information from November 2017 to November 2018 on 383 personal cars in Ann Arbor, Michigan. The Extended Vehicle Energy Dataset enhances the original by correcting the GPS fluctuations through map-matching, “snapping” the sampled locations to the known roads, and adding much more contextual information.
Each record of this dataset contains both the original and the map-matched GPS locations, the vehicle's reported speed, and a set of additional properties that describe the vehicle's state and location. Unfortunately, the dataset does not include the speed directions, so we are missing half of the speed vector information. We will solve this problem by inferring the angles from the geospatial data of the location sequences (see below).
Travel Time Estimation
To estimate travel time, we must start by identifying the trip endpoints using direct geospatial coordinates or address geocoding. Next, we find the route that fits our requirements using a graph with the geospatial description of the area of concern. We create this graph from publicly available maps, the OpenStreetMap provider, using the OSMnx Python package to acquire and enhance the data.
The resulting path is a sequence of graph nodes and directed edges, where the nodes contain geospatial coordinates. The edges carry descriptive properties, such as the distance between endpoints, travel time, maximum speed, etc.
With the route information, we can now query the database of known speed vectors and extract the ones closest to the route with the same approximate direction or bearing. We can even impose that the query returns the speed vectors recorded at a specific time interval (day of week or hour of the day).
Bearings
Unfortunately, the extended dataset does not include the speed directions associated with each sampled GPS location. The vector direction is known as the bearing, measured as an angle in degrees from true North in the clockwise direction, as depicted in Figure 1.
We will calculate the bearings of the collected GPS speed vectors using an approximation based on the natural location sequences. The GPS location sequences in the data belong to vehicle trips, each with its unique identifier and sorted by a monotonically increasing timestamp. For each of these sequences, we assign the bearing of the line connecting two sequential locations to the second location. The exception is the first location, which receives the same direction as the second. We will use these angles as criteria when querying the speed vector database. Figure 2 below illustrates the bearing calculation process.
To calculate the bearing between consecutive locations, I used the “Movable Type Scripts” online resource, a treasure trove of geospatial algorithms and formulae. The function that implements the vectorized bearing calculation is shown below in Figure 3.
Note that for N points, the function calculates N-1 bearings. We then need to use the rule depicted in Figure 2 to assign bearings to the individual locations of the speed measurements.
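The repository's function appears in Figure 3; as a reference, here is a minimal NumPy sketch of the same calculation, using the forward-azimuth formula from Movable Type Scripts (the function and argument names are my own, not the repository's):

```python
import numpy as np

def vec_bearings(lats: np.ndarray, lons: np.ndarray) -> np.ndarray:
    """Vectorized bearing (degrees from true North, clockwise) between
    consecutive points. For N points it returns N-1 bearings."""
    phi1, phi2 = np.radians(lats[:-1]), np.radians(lats[1:])
    dlam = np.radians(lons[1:] - lons[:-1])
    # Forward azimuth: theta = atan2(sin dL * cos p2,
    #                                cos p1 * sin p2 - sin p1 * cos p2 * cos dL)
    y = np.sin(dlam) * np.cos(phi2)
    x = np.cos(phi1) * np.sin(phi2) - np.sin(phi1) * np.cos(phi2) * np.cos(dlam)
    return np.degrees(np.arctan2(y, x)) % 360.0
```

Per the rule depicted in Figure 2, the first location of a trip then receives the first computed bearing as well, so every sampled location ends up with an angle.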
Space Discretization
We need to query the database of sampled input vectors repeatedly to infer an arbitrary trip’s speeds. With over twenty-two million records, the database requires a very efficient index. To keep things simple and avoid complex geospatial indexing, we will use a fixed space partition with quadkeys. Alternatively, we could use Uber H3’s hexagonal partitioning with similar results.
A square space discretization allows quick implicit indexing and a canvas to draw on. Why do we need to draw? Once we have the trip path, we can break it down into the intersecting quadkeys and use those to query the speed vector database. Figure 4 illustrates the process.
The process depicted in Figure 4 amounts to drawing lines in the quadkey tile space. We determine the tile endpoints for each graph edge and then use a line drawing algorithm to decide which quadkeys intersect the path. Finally, we use the list of resulting quadkeys and the edge bearing to query the database for sampled speed vectors.
We then estimate an average speed for each trip edge and reconstruct the total trip time using the individual edge distances.
Time Discretization
We should expect that for the same road segment and in the same direction, traffic speeds will vary during the day and throughout the week. Traffic density is closely related to vehicle speeds; we should expect higher velocities when there is low traffic density and vice versa. Rush hour is a typical density cycle with slow speeds, and weekends should also display different traffic behaviors than weekdays. If we want to capture these speed changes on a micro level, we are better off by discretizing time.
In this article, I will use a simple time discretization technique, splitting the week into seven weekdays and each day into one hundred and forty-four ten-minute intervals. This time discretization allows us to research weekly and daily patterns.
We now focus on the implementation details of this article’s code. Please start by heading over to the accompanying GitHub repository and cloning it.
Data Acquisition
To download and decompress the dataset, you should use the first notebook. The code is as simple as it gets. Once ready, you can import the data into the supporting database.
Data Import
Like in many articles before this one, I used an SQLite database to hold all the data, and I implemented the process of importing the data in the second notebook. The code merely iterates through the expanded CSV files, reads the data into a Pandas DataFrame, and then inserts it into the database. Note that the code first reads the CSV file into a string and then filters out the semicolon character because some files have this character at the line ending, which defeats the standard CSV import.
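A minimal sketch of that import step, assuming pandas and the standard sqlite3 module (the paths and table name here are placeholders, not the repository's exact ones):

```python
import sqlite3
from io import StringIO

import pandas as pd

def import_csv(db_path: str, csv_path: str, table: str = "signals") -> None:
    # Read the raw file into a string and drop the stray semicolons that
    # some files carry at the line endings, which break the CSV parse.
    with open(csv_path, "r", encoding="utf-8") as f:
        text = f.read().replace(";", "")
    df = pd.read_csv(StringIO(text))
    # Append the cleaned rows to the SQLite table.
    with sqlite3.connect(db_path) as conn:
        df.to_sql(table, conn, if_exists="append", index=False)
```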
Bearing and Quadkey Calculation
Now we can calculate bearings and quadkeys from the sampled speed vector data. To do this, please use the third notebook. As you can see, it merely calls an external Python script to do all the heavy lifting. Because bearings and quadkeys are computed independently for each vehicle trip, the script runs in parallel over trips. Note that this code will take a long time to run (expect at least ninety minutes).
Time Slot Calculation
Time slot calculation requires two extra columns in the signals table: the weekday and the time slot. We calculate weekdays based on the fact that the “day number” column encodes an offset in days from November 1st, 2017. As discussed in the original GitHub repository documentation, a value of one means midnight of that day (a Wednesday). The decimal part represents a fraction of that day, so 1.5 means noon of that same day. Using this information, we can easily break the day numbers down into the respective weekdays and time slots. We will use ten-minute buckets for the time slots, meaning each day has 144 such buckets. To update the table, we use the following SQL query:
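The repository's notebook contains the actual query; the sketch below shows the arithmetic with assumed column names (`day_num`, `week_day`, `time_slot`), anchored so that day number 1 maps to a Wednesday, which is 3 when Sunday is encoded as zero:

```python
import sqlite3

# Column names are assumptions for illustration; adapt them to your schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE signals (day_num REAL, week_day INT, time_slot INT)")
conn.execute("INSERT INTO signals (day_num) VALUES (1.0), (1.5)")

# Day number 1 is midnight of November 1st, 2017, a Wednesday, so adding 2
# before the modulo makes Sunday zero. The fractional part of the day,
# multiplied by 144, yields the ten-minute slot index.
conn.execute("""
    UPDATE signals SET
        week_day  = (CAST(day_num AS INT) + 2) % 7,
        time_slot = CAST((day_num - CAST(day_num AS INT)) * 144 AS INT)
""")
```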
Note that the query above encodes Sunday as zero. You can adapt it to your preference.
Run notebook number four to update your database with the time slots. We are now ready to predict trip times.
Now that we have set up all the preconditions, let us illustrate the process of estimating an arbitrary trip in Ann Arbor using the eVED. Please refer to notebook number five for the actual implementation.
Load and Prepare Map Data
Our first step is to load the OpenStreetMap data on Ann Arbor. We will use Geoff Boeing’s brilliant OSMnx Python package to do this.
Figure 6 shows the single line of code required to load the drivable road information for Ann Arbor from OpenStreetMap. I explicitly avoid the network simplification step to get better resolution when matching trip endpoint locations to the road graph nodes. Figure 7 below shows what the loaded road graph looks like.
After loading the road network, we need to add a few missing pieces of information, namely edge speeds, travel times, and bearings. Fortunately, we have help from OSMnx to do that, as shown in Figure 8 below.
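With OSMnx 1.x, the enrichment boils down to three calls; the toy two-node graph below is only there to make the sketch self-contained (in the notebook, the calls run on the downloaded Ann Arbor graph instead):

```python
import networkx as nx
import osmnx as ox

# Tiny hand-built graph standing in for the downloaded Ann Arbor network.
G = nx.MultiDiGraph(crs="epsg:4326")
G.add_node(1, x=-83.745, y=42.280)
G.add_node(2, x=-83.735, y=42.288)
G.add_edge(1, 2, length=1000.0, highway="residential")

# Impute edge speeds (km/h) per highway type, then derive travel times
# (seconds) and edge bearings (degrees clockwise from North).
G = ox.add_edge_speeds(G, hwy_speeds={"residential": 40})
G = ox.add_edge_travel_times(G)
G = ox.add_edge_bearings(G)
```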
Now that the data is ingested and prepared, we can proceed to the route definition process.
Define the Route
We define a route using the most straightforward approach possible by specifying its endpoints. I adapted the method and the code you find here from the excellent article below.
The code in Figure 9 below illustrates the process of obtaining a route from its endpoints. We start by defining the addresses, geocoding them into geospatial locations, mapping these to the nearest graph nodes, and finally obtaining the route. The resulting route object is a list of graph nodes in sequence. We readily get the connecting edges by querying the graph object using the sequential node pairs.
After acquiring a valid route between the endpoints, we can now use it to query the discretized speed vector database. We will approach this by converting the graph nodes’ locations to their corresponding quadkeys and using their tile coordinates to “draw” the edges between them, thereby determining the quadkeys that best match them. Figure 10 below shows our route, drawn from the road network information.
Draw the Route Quadkeys
One of the exciting advantages of using quadkeys is that we partition the latitude and longitude space into a mesh of square areas, like graph paper or a raster display. Each square receives a unique x and y coordinate pair, which depends on the “zoom” or detail level. As the detail increases, the square size decreases.
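Following the Bing Maps tile system conventions, the mapping from a location to tile coordinates and on to a quadkey string can be sketched as follows (valid within the Web Mercator latitude bounds of roughly ±85.05 degrees):

```python
import math

def latlon_to_tile(lat: float, lon: float, level: int) -> tuple[int, int]:
    """Map a latitude/longitude pair to tile (x, y) coordinates."""
    n = 1 << level
    x = int((lon + 180.0) / 360.0 * n)
    sin_lat = math.sin(math.radians(lat))
    y = int((0.5 - math.log((1 + sin_lat) / (1 - sin_lat)) / (4 * math.pi)) * n)
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)

def tile_to_quadkey(x: int, y: int, level: int) -> str:
    """Interleave the bits of x and y into the quadkey string."""
    digits = []
    for i in range(level, 0, -1):
        mask = 1 << (i - 1)
        digit = 0
        if x & mask:
            digit += 1
        if y & mask:
            digit += 2
        digits.append(str(digit))
    return "".join(digits)
```

At level 20, the detail level used later in the article, tiles are only a few dozen meters wide at Ann Arbor's latitude, which is fine-grained enough to follow individual road segments.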
We need a way to draw virtual lines in the quadkey space, as illustrated in Figure 4 above. Although we could use the classic Bresenham line drawing algorithm, I opted for an anti-aliased version, Xiaolin Wu’s algorithm. The reason for this choice becomes apparent in Figure 11 below, or once you see Wikipedia’s animation of the algorithm at work. Each drawn square is a quadkey to query, and its shade is the weight of the reported information. This algorithm allows us to query data in the line’s vicinity and assign it an importance or relevance score.
You can see the anti-aliased line generating function in Figure 12 below. Note how the function generates a list of tuples containing x and y coordinates and pixel weight. We will later use these coordinates to create quadkey indexes to query the database for speed vectors that match the line direction.
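Figure 12 shows the repository's version; the sketch below is a direct transcription of Wu's algorithm adapted to collect (x, y, weight) tuples instead of drawing pixels:

```python
import math

def wu_line(x0: float, y0: float, x1: float, y1: float):
    """Anti-aliased line: returns (x, y, weight) tuples, weight in (0, 1]."""
    points = []

    def plot(x, y, w):
        if w > 0.0:
            points.append((int(x), int(y), w))

    # Work in the octant where x is the driving axis.
    steep = abs(y1 - y0) > abs(x1 - x0)
    if steep:
        x0, y0, x1, y1 = y0, x0, y1, x1
    if x0 > x1:
        x0, x1, y0, y1 = x1, x0, y1, y0

    dx, dy = x1 - x0, y1 - y0
    gradient = dy / dx if dx != 0.0 else 1.0

    # First endpoint.
    xend = round(x0)
    yend = y0 + gradient * (xend - x0)
    xgap = 1.0 - ((x0 + 0.5) % 1.0)
    xpxl1, ypxl1 = xend, math.floor(yend)
    if steep:
        plot(ypxl1, xpxl1, (1.0 - (yend % 1.0)) * xgap)
        plot(ypxl1 + 1, xpxl1, (yend % 1.0) * xgap)
    else:
        plot(xpxl1, ypxl1, (1.0 - (yend % 1.0)) * xgap)
        plot(xpxl1, ypxl1 + 1, (yend % 1.0) * xgap)
    intery = yend + gradient

    # Second endpoint.
    xend = round(x1)
    yend = y1 + gradient * (xend - x1)
    xgap = (x1 + 0.5) % 1.0
    xpxl2, ypxl2 = xend, math.floor(yend)
    if steep:
        plot(ypxl2, xpxl2, (1.0 - (yend % 1.0)) * xgap)
        plot(ypxl2 + 1, xpxl2, (yend % 1.0) * xgap)
    else:
        plot(xpxl2, ypxl2, (1.0 - (yend % 1.0)) * xgap)
        plot(xpxl2, ypxl2 + 1, (yend % 1.0) * xgap)

    # Main loop: two pixels per column, weighted by distance to the line.
    for x in range(xpxl1 + 1, xpxl2):
        if steep:
            plot(math.floor(intery), x, 1.0 - (intery % 1.0))
            plot(math.floor(intery) + 1, x, intery % 1.0)
        else:
            plot(x, math.floor(intery), 1.0 - (intery % 1.0))
            plot(x, math.floor(intery) + 1, intery % 1.0)
        intery += gradient
    return points
```

Feeding this function the tile coordinates of an edge's endpoints yields exactly the weighted quadkey list the query step needs.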
We can look at the result of applying this method to the generated route by plotting the results on a map. Figure 13 below shows our path fully covered with level 20 quadkeys. We can now use these to query the database for matching speed vectors and infer the edge speeds using the quadkey weights. We calculate the edge travel duration from the edge speeds and their known distances.
Speed Inference
As previously stated, we estimate edge speeds by breaking them down into quadkeys and querying each. The function depicted in Figure 14 below shows how we query each quadkey for the speed vectors. Note how we apply the angle condition, forcing the vector bearings to be within a range of five degrees (ten degrees total) from the edge bearing.
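In essence, the per-quadkey query looks like the following sketch (the schema and column names are illustrative rather than the repository's exact ones, and bearing wrap-around near North is ignored for brevity):

```python
import sqlite3

# Toy database standing in for the full signals table.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE signal (
    quadkey TEXT, bearing REAL, speed REAL, week_day INT, time_slot INT)""")
conn.executemany(
    "INSERT INTO signal VALUES (?, ?, ?, ?, ?)",
    [("0231", 88.0, 52.0, 1, 54),
     ("0231", 92.0, 48.0, 1, 54),
     ("0231", 270.0, 50.0, 1, 54)])  # opposite direction: filtered out

def query_speeds(conn, quadkey, bearing, week_day, time_slot, tol=5.0):
    """Speeds recorded in one quadkey whose bearing lies within +/- tol
    degrees of the edge bearing, for the given weekday and time slot."""
    sql = """SELECT speed FROM signal
             WHERE quadkey = ? AND week_day = ? AND time_slot = ?
               AND bearing BETWEEN ? AND ?"""
    rows = conn.execute(sql, (quadkey, week_day, time_slot,
                              bearing - tol, bearing + tol)).fetchall()
    return [r[0] for r in rows]
```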
We use the previous function to estimate the actual edge speed during the given time slot and weekdays. The function below, depicted in Figure 15, queries the speed vectors in the line-drawn quadkeys and averages them using the weight of the anti-aliasing algorithm.
Note how this function accepts a time offset in seconds instead of the day slot number. As we progress through the edges, estimating their duration, we must also update the implicit clock and advance time. We see this mechanism work in the following function that builds on the previous one and estimates the trip duration for the whole route.
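Stripped of the database details, the accumulation logic can be sketched as below, where `speed_fn` stands in for the quadkey-based speed estimator and its interface is hypothetical:

```python
def estimate_trip_duration(edges, speed_fn, start_offset_s, week_day):
    """Walk the route edges, inferring each edge's speed at the current
    clock offset and accumulating travel time in seconds.

    edges: list of (length_m, bearing_deg) pairs.
    speed_fn(bearing, week_day, slot): average speed in km/h.
    Note: for simplicity, the weekday is held fixed even if the clock
    rolls past midnight."""
    t = float(start_offset_s)
    for length_m, bearing in edges:
        slot = int(t // 600) % 144          # 600 s = one ten-minute slot
        speed_kph = speed_fn(bearing, week_day, slot)
        t += length_m / (speed_kph / 3.6)   # km/h -> m/s
    return t - start_offset_s
```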
Using the function above, we can estimate the trip duration for a given time of day and weekday and obtain the daily profile, as shown in Figure 17 below.
We can see some noise in the chart, possibly due to the short ten-minute time slot and the resulting data sparseness. With the current data preparation, we can only mitigate this issue by averaging the trip durations throughout the week, as depicted in Figure 18 below.
Painting Speed
We can now take an extra step and color the route according to the relation between the actual speed and the map-provided speed. The function below iterates the route’s edges and compares the inferred and edge speeds.
By running the function above for a Monday at 9 PM, we get the following map:
In this article, we have developed a technique to infer trip times using a database of sampled speed vectors and a digital map for directions. We used the Extended Vehicle Energy Dataset as the source for speed information that we enriched with the bearing information. The speed query process involves drawing the route on the quadkey tile space and then querying each tile for the speeds that align with the underlying edge bearing. With the resulting data, we estimate route durations on different days of the week and times of day and create a graphical depiction of the route showing the fastest and slowest segments.
An obvious experiment I should conduct is to increase the time slot from ten to fifteen or even twenty minutes. While the result will have a coarser granularity, it will gain by decreasing data sparseness.
OSMnx 1.2.2 — OSMnx 1.2.2 documentation
joaofig/eved-explore: An exploration of the Extended Vehicle Energy Dataset (github.com)