A Practical Introduction to Geospatial Data Analysis using QGIS | by Eugenia Anello | Feb, 2023

By Jessie Hobb On Feb 27, 2023

This interactive tutorial teaches the key GIS concepts while using QGIS

Do you want to learn geospatial data analysis but don’t know where to start? Then this tutorial is for you. Many concepts are taken for granted when you begin this journey, and understanding them will allow you to manipulate the geographic information in your dataset.

Geospatial data analysis is a subfield of data science that focuses on a special type of data: geospatial data. Unlike ordinary data, each record of geospatial data corresponds to a specific location and can be drawn on a map.

A single data point can be described by just its latitude and longitude. But when a dataset contains more complex items, like roads, rivers, boundaries between countries or a physical map with mountains, deserts and forests, a pair of coordinates is no longer enough. Did I intrigue you? Let’s get started!

Types of Geospatial data

There are two main types of geospatial data: vector data and raster data. With vector data, you still have a tabular dataset, while raster data is more similar to a coloured image with three channels: red, green and blue. Focusing on vector data first, we can distinguish three different cases: point data, line data and polygon data.

Screenshot by Author. Point data example obtained with QGIS.

Point data is the simplest data type: each point is described by a pair of coordinates, latitude and longitude. Examples of point data are cities, restaurants and shopping centres.

Above, you can see an example of point data: the locations of all the airports in the world, retrieved from Natural Earth Data, one of the many free data sources.

Now, it’s time to move on to line data, which consists of a sequence of connected points with a starting point and an ending point. Classic examples are streets, train routes and rivers, which you can visualize below.

Screenshot by Author. Line data example obtained with QGIS.

The third and last case is polygon data, which is composed of several points that are connected to form a closed shape. The simplest example to keep in mind is the boundaries of countries. Below, I provide you with an overview of glaciers and recently deglaciated areas.

After explaining vector data, it’s the turn of raster data, which is the most fascinating for me. As I said previously, it may be confused with images, since both are matrices of pixels. But unlike common images, each pixel corresponds to a specific geographic area, and the value of the pixel describes a particular characteristic of that territory.

Screenshot by Author. Raster data example obtained with QGIS.

As you can deduce from this visualization, raster data can provide more information than vector data about the real-world surface. Examples of raster data are satellite images and aerial photographs.
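To make the idea of "a matrix of pixels tied to locations" more concrete, here is a minimal sketch in plain Python. The grid values, the top-left corner coordinates and the pixel size are all made up for illustration; real rasters store this georeferencing in the file itself.

```python
# A tiny 3x4 single-band raster: each value could be, say, elevation in metres.
# All numbers below are invented for the example.
grid = [
    [120, 125, 130, 128],
    [118, 122, 127, 126],
    [115, 119, 124, 123],
]

# Hypothetical georeferencing: top-left corner of the raster and pixel size,
# both expressed in degrees.
ORIGIN_LON = 10.0
ORIGIN_LAT = 45.0
PIXEL_SIZE = 0.5


def pixel_center(row, col):
    """Return (lon, lat) of the centre of the pixel at (row, col)."""
    lon = ORIGIN_LON + (col + 0.5) * PIXEL_SIZE
    lat = ORIGIN_LAT - (row + 0.5) * PIXEL_SIZE  # rows grow southwards
    return lon, lat


def value_at(lon, lat):
    """Return the raster value of the pixel that contains (lon, lat)."""
    col = int((lon - ORIGIN_LON) // PIXEL_SIZE)
    row = int((ORIGIN_LAT - lat) // PIXEL_SIZE)
    if not (0 <= row < len(grid) and 0 <= col < len(grid[0])):
        raise ValueError("location falls outside the raster")
    return grid[row][col]


print(pixel_center(1, 2))      # (11.25, 44.25)
print(value_at(11.25, 44.25))  # 127
```

The point of the sketch is only that a raster is a grid plus a rule for turning row and column indices into coordinates, which is what distinguishes it from an ordinary image.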

This data can be crucial for monitoring disasters and speeding up the rescue of people. So, it doesn’t just provide actionable insights for businesses; it can even save lives. This is possible by training deep learning models to detect specific objects in satellite images.

Format of vector data

When working with geospatial data, it’s also important to know the file formats. In the case of vector data, the most common geospatial format is the Shapefile. You will come across it often in the many free, open datasets. When you download vector data, you’ll typically get a zip file containing three mandatory files:

.shp is the most important file: it provides the geometry, that is, the coordinates used to plot the points, lines and polygons on the map.
.shx provides the positional index of the feature geometry.
.dbf is the standard database file that contains the attribute data: non-geospatial fields that give context to the geospatial data, like the names of the cities, rivers, streets and countries.
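Since the three components only work together, it can be handy to check that none is missing before opening a download. A minimal sketch, assuming you already have the list of file names in the archive (the `airports.*` names are hypothetical; in practice the list could come from `zipfile.ZipFile(path).namelist()`):

```python
from pathlib import PurePosixPath

# The three mandatory Shapefile components described above.
MANDATORY_EXTENSIONS = {".shp", ".shx", ".dbf"}


def missing_shapefile_parts(filenames):
    """Return the mandatory extensions that are absent from `filenames`."""
    found = {PurePosixPath(name).suffix.lower() for name in filenames}
    return sorted(MANDATORY_EXTENSIONS - found)


# Hypothetical archive listing, including one optional extra file (.prj).
archive_listing = ["airports.shp", "airports.shx", "airports.dbf", "airports.prj"]

print(missing_shapefile_parts(archive_listing))   # []
print(missing_shapefile_parts(["airports.shp"]))  # ['.dbf', '.shx']
```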

Another common format is GeoJSON, which stands for Geographic JavaScript Object Notation and is widely used for web-based mapping. Unlike the Shapefile, it is a single text file, usually with the extension .geojson or .json.
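Because GeoJSON is plain JSON, it is easy to see how the three vector types and their attributes fit together. The sketch below builds a small feature collection with Python's standard `json` module; the names and coordinates are invented for the example, and coordinates follow the GeoJSON convention of longitude first, then latitude.

```python
import json

# Illustrative features only: names and coordinates are made up.
feature_collection = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [12.25, 41.80]},
            "properties": {"name": "Example airport"},
        },
        {
            "type": "Feature",
            "geometry": {
                "type": "LineString",
                "coordinates": [[12.0, 41.0], [12.5, 41.5], [13.0, 42.0]],
            },
            "properties": {"name": "Example river"},
        },
        {
            "type": "Feature",
            "geometry": {
                "type": "Polygon",
                # One closed ring: the last position repeats the first.
                "coordinates": [
                    [[10.0, 40.0], [11.0, 40.0], [11.0, 41.0], [10.0, 41.0], [10.0, 40.0]]
                ],
            },
            "properties": {"name": "Example boundary"},
        },
    ],
}

# Serialize to text (what would be saved in a .geojson file) and read it back.
text = json.dumps(feature_collection, indent=2)
parsed = json.loads(text)

geometry_types = [feature["geometry"]["type"] for feature in parsed["features"]]
print(geometry_types)  # ['Point', 'LineString', 'Polygon']
```

Each feature pairs a geometry with a `properties` dictionary, which plays the same role as the attribute data stored in the .dbf file of a Shapefile.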

Format of raster data

Raster data also has a common format, called GeoTIFF. Unlike the Shapefile, a GeoTIFF is a single file with the extension .tif or .tiff; an optional .ovr file may accompany it to store overviews, which are lower-resolution copies used for faster display. Both Shapefile and GeoTIFF can come with other additional files, but luckily they aren’t mandatory.

Alternative formats include ERDAS Imagine (.img) and IDRISI Raster (.rst, .rdc). That’s it!

A practical example with QGIS

QGIS is the open-source software we are going to use to visualize geospatial data. If you don’t have QGIS, download it from here. Once it’s installed, open it and you should see a window like this:

The first step is to add a background map to the map window. The most popular choice is OpenStreetMap, which provides the biggest free and editable geographic database, continuously updated by a community of volunteers. The procedure to add it is very simple:

Click the arrow next to the “XYZ Tiles” option in the Browser panel.
Double-click OpenStreetMap.

And voilà. We have added OpenStreetMap to our QGIS project. Next, we can drag the geographical data chosen for the analysis into the Layers panel. For example, let’s import the data with the airports, provided by Natural Earth Data and shown in the previous sections.

GIF by Author. Add the data with airports.

We can also check the information of the data and change the colour of the dots:

GIF by Author. Check the data information and Change the color of the dots.

The Information section provides an overview of the data type, which is point data, and the coordinate reference system (CRS), which is another characteristic of geospatial data. The CRS is crucial for projecting locations on the Earth, which has an irregular spheroid-like shape, onto a 2D map. You may notice that the layer’s CRS doesn’t match the CRS of the QGIS project and needs to be changed.

Now, the error is corrected and we can breathe a sigh of relief.
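To give a feeling for what changing the CRS means numerically, here is a small sketch of one common conversion: from longitude and latitude in degrees (EPSG:4326) to Web Mercator metres (EPSG:3857). The tutorial does not say which two systems were involved, so treat this pairing as an assumption: it is the one typically met when data stored in degrees is shown on top of OpenStreetMap tiles. The function name is hypothetical, and QGIS performs this kind of reprojection for you.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS 84 semi-major axis, used by Web Mercator


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Project longitude/latitude in degrees to Web Mercator x/y in metres.

    Spherical Mercator formula; only valid away from the poles
    (roughly between -85 and 85 degrees of latitude).
    """
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


# Illustrative coordinates (longitude, latitude), not taken from the dataset.
x, y = lonlat_to_web_mercator(12.25, 41.80)
print(round(x), round(y))
```

The same place ends up with very different numbers in the two systems, which is why layers only line up on the map when their coordinates are interpreted in the right CRS.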

Final thoughts

That’s it! This was a quick tutorial to introduce you to the magical world of geospatial data analysis. I decided to use QGIS in this tutorial to provide intuitive examples of geospatial data. This is just the beginning. In the next articles, I am going to cover more applications with Python libraries. If you are interested in going deeper and finding free GIS data sources, check here.