
Create a Local dbt Project



Photo by Daniel K Cheung on Unsplash

dbt (data build tool) is one of the hottest technologies in the data engineering and analytics space. Recently, I’ve been working on a task that performs some post-processing over dbt artefacts and wanted to write some tests for it. In order to do so, I had to create an example project that could run locally (or in a Docker container), so that I wouldn’t have to interact with an actual data warehouse.

In this article we will go through a step-by-step process for creating a dbt project and connecting it to a containerized Postgres instance. You can use such a project for testing purposes, or as a playground for experimenting with dbt in order to try out features and practise your skills.

Step 1: Create a dbt project

We will be populating some data into a Postgres database; therefore, we first need to install the dbt Postgres adapter from PyPI:

pip install dbt-postgres==1.3.1

Note that the command will also install the dbt-core package as well as other dependencies that are required for running dbt.
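
You can verify that the installation was successful by checking the installed dbt version:

dbt --version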

Now let’s go ahead and create a dbt project. To do so, we can initialise a new project by running the dbt init command in the terminal:

dbt init test_dbt_project

You will then be prompted to select which database you would like to use (depending on the adapters you have installed locally, you may see different options):

16:43:08  Running with dbt=1.3.1
Which database would you like to use?
[1] postgres

(Don't see the one you want? https://docs.getdbt.com/docs/available-adapters)

Enter a number: 1

Make sure to enter the number that corresponds to the Postgres adapter, as shown in the output list. Now the init command should have created the following basic structure in the directory where you’ve executed it:

dbt project structure created by `dbt init` command — Source: Author
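
For reference, the layout generated by dbt init typically looks roughly like the sketch below (the exact contents may vary slightly between dbt versions):

test_dbt_project
├── README.md
├── analyses
├── dbt_project.yml
├── macros
├── models
│   └── example
├── seeds
├── snapshots
└── tests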

Step 2: Create a Docker Compose file

Now let’s create a docker-compose.yml file (placed at the same level as the test_dbt_project directory) in which we will specify two services: one corresponding to a ready-made Postgres image and the other to a dbt image that we will define in a Dockerfile in the next step:

version: "3.9"

services:
  postgres:
    container_name: postgres
    image: frantiseks/postgres-sakila
    ports:
      - '5432:5432'
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 5s
      retries: 5
  dbt:
    container_name: dbt
    build: .
    image: dbt-dummy
    volumes:
      - ./:/usr/src/dbt
    depends_on:
      postgres:
        condition: service_healthy

As you can tell, for the Postgres container we will be using an image called frantiseks/postgres-sakila, which is publicly available on Docker Hub. This image populates the Sakila database on the Postgres instance. The database models a DVD rental store and consists of multiple normalised tables corresponding to entities such as films, actors, customers and payments. In the following sections we’ll make use of this data in order to build some example dbt data models.

The second service, called dbt, will be the one that creates an environment where we build our data models. Note that we mount the current directory into the docker container. This lets the container pick up any changes we make to the data models without having to re-build the image. Additionally, any metadata generated by dbt commands (such as manifest.json) will appear instantly on the host machine.

Step 3: Create a Dockerfile

Now let’s write a Dockerfile that will be used to build the image for the dbt container, which will in turn build the models specified in our example dbt project.

FROM python:3.10-slim-buster

RUN apt-get update \
    && apt-get install -y --no-install-recommends

WORKDIR /usr/src/dbt/dbt_project

# Install the dbt Postgres adapter. This step will also install dbt-core
RUN pip install --upgrade pip
RUN pip install dbt-postgres==1.3.1

# Install dbt dependencies (as specified in packages.yml file)
# Build seeds, models and snapshots (and run tests wherever applicable)
CMD dbt deps && dbt build --profiles-dir profiles && sleep infinity

Note that in the final CMD command we intentionally added an extra && sleep infinity so that the container won’t exit after running the steps specified in the Dockerfile; this way we can later access the container and run additional dbt commands (if needed).

Step 4: Create a dbt profile for the Postgres database

Now that we have created the required infrastructure on our host machine (a Postgres database populated with some dummy data, and an image for our dbt environment), let’s focus on the dbt side.

We will first have to create a dbt profile that will be used when interacting with the target Postgres database. Within the test_dbt_project directory, create another directory called profiles and then a file called profiles.yml with the following content:

test_profile:
  target: dev
  outputs:
    dev:
      type: postgres
      host: postgres
      user: postgres
      password: postgres
      port: 5432
      dbname: postgres
      schema: public
      threads: 1
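
Also make sure that the dbt_project.yml file generated by dbt init references this profile name. Assuming the default file layout, the relevant line should look like this:

profile: 'test_profile'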

Step 5: Define some data models

The next step is to create some data models based on the Sakila data populated by the Postgres container. If you are planning to use this project for testing purposes, I would advise creating at least one seed, one model and one snapshot (with tests if possible) so that you have full coverage of all dbt entities (excluding macros).

I have already created some data models, seeds and snapshots, which you can access in this repository
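
As an illustration only, a minimal staging model over the Sakila payment table could look like the snippet below (the models in the repository may be organised differently, e.g. via dbt sources and ref() calls):

-- models/staging/stg_payment.sql (illustrative sketch)
SELECT
    payment_id,
    customer_id,
    staff_id,
    rental_id,
    amount,
    payment_date
FROM public.payment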

Step 6: Run the Docker containers

We now have everything we need in order to spin up the two docker containers we specified in the docker-compose.yml file earlier, and build the data models defined in our example dbt project.

First, let’s build the images:

docker-compose build

And now let’s spin up the running containers:

docker-compose up

This command will initialise a Postgres database populated with the Sakila data and build the dbt models specified. For now, let’s make sure you have two running containers:

docker ps

should give an output that includes one container named dbt and another named postgres.
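
Since the dbt container keeps running after the build (thanks to the sleep infinity command), you can also follow the output of the dbt build step with:

docker logs -f dbt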

Step 7: Query the models on Postgres database

In order to access the Postgres container, you’ll first need to find its container ID:

docker ps

And then run

docker exec -it <container-id> /bin/bash

We will then need to use psql, a command-line interface that gives us access to the Postgres instance:

psql -U postgres
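
Before running the queries below, you can list the relations dbt created using psql’s meta-commands (\dt lists tables and \dv lists views in the current search path):

\dt
\dv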

If you have used the data models I’ve shared in the previous sections, you can now query each of the models created on Postgres using the queries below.

-- Query seed tables
SELECT * FROM customer_base;

-- Query staging views
SELECT * FROM stg_payment;

-- Query intermediate views
SELECT * FROM int_customers_per_store;
SELECT * FROM int_revenue_by_date;

-- Query mart tables
SELECT * FROM cumulative_revenue;

-- Query snapshot tables
SELECT * FROM int_stock_balances_daily_grouped_by_day_snapshot;

Step 8: Creating additional or modifying existing models

As mentioned already, the Dockerfile and docker-compose.yml files were written so that the dbt container stays up and running. Therefore, whenever you modify or create data models, you can still use that container to re-build seeds, models, snapshots and/or tests.

To do so, first find the container ID of the dbt container:

docker ps

Then enter the running container:

docker exec -it <container-id> /bin/bash

And finally run any dbt command you wish, depending on the modifications you’ve made to the example dbt project. Here’s a quick reference of the most commonly used commands for these purposes:

# Install dbt deps
dbt deps

# Build seeds
dbt seed --profiles-dir profiles

# Build data models
dbt run --profiles-dir profiles

# Build snapshots
dbt snapshot --profiles-dir profiles

# Run tests
dbt test --profiles-dir profiles
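
Alternatively, dbt build (which is what the container’s CMD runs on startup) builds seeds, models and snapshots and runs tests in a single invocation:

# Build seeds, models and snapshots, and run tests
dbt build --profiles-dir profiles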

How to get the full code of this tutorial

I’ve created a GitHub repository called dbt-dummy that contains all the pieces you need in order to quickly create a containerized dbt project that uses Postgres. You can access it in the link below.

This project is also available in the example projects section of the official dbt documentation!

Final Thoughts

In today’s tutorial we went through a step-by-step process for creating a dbt project on a local machine using Docker. We built two images, one for the Postgres database that is also populated with the Sakila data, and another one for our dbt environment.

It’s important to be able to quickly build example projects with dbt (data build tool) that can then be used as a testing environment or even a playground for experimenting and learning.
