How To Handle Date and Time Difference in PostgreSQL | by Marie Lefevre | Jun, 2022
SQL TIPS
3 tricks to make your life easier when there is no DATE_DIFF() function available
Prospect A signed up for your company’s service offer on January 20th, 2022 at 8 AM. She cancelled her subscription on February 3rd, 2022 at 4 PM. How long has she been a client of your company?
To answer this question, you will need to compute the difference between two points in time. Regardless of the most relevant time scale in this specific situation (should the difference be expressed in weeks? in days? in hours?), there must exist a function to help you find easily how much time has passed between two events — in this case, between the prospect’s signup and the cancellation of her subscription.
In my previous work experience I used to get an answer to this type of questions by using the Google BigQuery console that was directly connected to our database. In Google BigQuery queries are written in standard SQL. Conveniently enough, there is an easy-to-implement formula to calculate the difference between two points in time: DATE_DIFF().
However, in PostgreSQL there is no direct equivalent to DATE_DIFF(). Therefore we must find the right combination of existing functions to get what we really need as an output. On that topic, PostgreSQL’s documentation is extensive, but I must admit that I became somehow confused with all the formulas listed.
What seemed to be easy turned out to be slightly more complex than I expected, so I thought I could share with you the 3 following tips to correctly handle date differences in PostgreSQL.
Let’s go into them in details by using the example mentioned in the introduction. In this example let’s assume that we define two input variables: start_date and end_date. What we want to get as an output is the difference between these two date times:
The first tip I used to come across is from SQLines:
They suggest to create a user-defined function or, if you only have the rights or the skills to write some declarative SQL (SELECT...
), the suggested solution is to use a succession of DATE_PART() functions. In our example, we would write the following to get the difference in days and hours:
This would lead to the following output:
The calculation is correct, and the output is easily interpretable. However I see two disadvantages with this method.
First, the piece of code required is not compact. For a better code readability, one must ensure that the code is correctly indented so the reader understands which “DATEPART” corresponds to which part of the calculation. In our example we only went until the hour scale, but if we wanted to compute the difference in seconds, we would add two more DATE_PART functions.
Second, calling several times the function DATE_PART() leads to possible loss performance when executing the piece of code. When implementing various options for the same outputs, you must make sure to compare the performance between two queries when applied to the same dataset.
In PostgreSQL the function AGE() will give you a detailed answer to your question: how much time did pass between these 2 events? The output is in the form of a text line displaying the difference between the two points in time.
The output will look like this:
Depending in your data inputs and the use case you are tackling, you may be satisfied with this method. In particular if you answer the end user’s need with the output given by the function AGE(), there may be no need to over-complicate the output.
If you are sure that all your values are included within a given date and time scale, keep it simple. Just stick to the AGE() function and combine it with EXTRACT(). This works correctly for the difference expressed in days below. But look closely at the output of the difference in hours:
As you could notice, the problem with the second query is that we try to return the difference between two days in hours. Here, the AGE() function is not appropriate: it returns the difference in hours…regardless of the day!
In this type of situation— and in any case if you have a doubt — you should rather use the combination of the EXTRACT() function and the EPOCH parameter. Let’s take our example one last time to demonstrate that:
The output will look like this:
It works! With two advantages compared to the code written with option #1:
- The code is more compact
- The difference in days is expressed at a more detailed level (by default, with two decimals)
Keep it simple where applicable
Make it complex where relevant
The idea of this article is to gather useful tips to handle date and time differences in PostgreSQL. There is no need for you to apply all these functions at once — just choose the most relevant for your own use case.
Do you use other functions to easily handle dates and times in PostgreSQL? I would love to hear them and to share our best practices in the comments section!
SQL TIPS
3 tricks to make your life easier when there is no DATE_DIFF() function available
Prospect A signed up for your company’s service offer on January 20th, 2022 at 8 AM. She cancelled her subscription on February 3rd, 2022 at 4 PM. How long has she been a client of your company?
To answer this question, you will need to compute the difference between two points in time. Regardless of the most relevant time scale in this specific situation (should the difference be expressed in weeks? in days? in hours?), there must exist a function to help you find easily how much time has passed between two events — in this case, between the prospect’s signup and the cancellation of her subscription.
In my previous work experience I used to get an answer to this type of questions by using the Google BigQuery console that was directly connected to our database. In Google BigQuery queries are written in standard SQL. Conveniently enough, there is an easy-to-implement formula to calculate the difference between two points in time: DATE_DIFF().
However, in PostgreSQL there is no direct equivalent to DATE_DIFF(). Therefore we must find the right combination of existing functions to get what we really need as an output. On that topic, PostgreSQL’s documentation is extensive, but I must admit that I became somehow confused with all the formulas listed.
What seemed to be easy turned out to be slightly more complex than I expected, so I thought I could share with you the 3 following tips to correctly handle date differences in PostgreSQL.
Let’s go into them in details by using the example mentioned in the introduction. In this example let’s assume that we define two input variables: start_date and end_date. What we want to get as an output is the difference between these two date times:
The first tip I used to come across is from SQLines:
They suggest to create a user-defined function or, if you only have the rights or the skills to write some declarative SQL (SELECT...
), the suggested solution is to use a succession of DATE_PART() functions. In our example, we would write the following to get the difference in days and hours:
This would lead to the following output:
The calculation is correct, and the output is easily interpretable. However I see two disadvantages with this method.
First, the piece of code required is not compact. For a better code readability, one must ensure that the code is correctly indented so the reader understands which “DATEPART” corresponds to which part of the calculation. In our example we only went until the hour scale, but if we wanted to compute the difference in seconds, we would add two more DATE_PART functions.
Second, calling several times the function DATE_PART() leads to possible loss performance when executing the piece of code. When implementing various options for the same outputs, you must make sure to compare the performance between two queries when applied to the same dataset.
In PostgreSQL the function AGE() will give you a detailed answer to your question: how much time did pass between these 2 events? The output is in the form of a text line displaying the difference between the two points in time.
The output will look like this:
Depending in your data inputs and the use case you are tackling, you may be satisfied with this method. In particular if you answer the end user’s need with the output given by the function AGE(), there may be no need to over-complicate the output.
If you are sure that all your values are included within a given date and time scale, keep it simple. Just stick to the AGE() function and combine it with EXTRACT(). This works correctly for the difference expressed in days below. But look closely at the output of the difference in hours:
As you could notice, the problem with the second query is that we try to return the difference between two days in hours. Here, the AGE() function is not appropriate: it returns the difference in hours…regardless of the day!
In this type of situation— and in any case if you have a doubt — you should rather use the combination of the EXTRACT() function and the EPOCH parameter. Let’s take our example one last time to demonstrate that:
The output will look like this:
It works! With two advantages compared to the code written with option #1:
- The code is more compact
- The difference in days is expressed at a more detailed level (by default, with two decimals)
Keep it simple where applicable
Make it complex where relevant
The idea of this article is to gather useful tips to handle date and time differences in PostgreSQL. There is no need for you to apply all these functions at once — just choose the most relevant for your own use case.
Do you use other functions to easily handle dates and times in PostgreSQL? I would love to hear them and to share our best practices in the comments section!