BigQuery SQL Procedural Language to Simplify Data Engineering | by Vicky Yu | Nov, 2022

An introduction

As a long-time SQL user, I’ve often had to run the same code over and over with minor changes to the WHERE clause. In a programming language such as Python, this copy-and-replace wouldn’t be necessary because I could write a function and pass in different parameter values to rerun the same code. Today I want to share how you can use BigQuery’s procedural language to set variables and apply conditional logic when running SQL statements.

DECLARE and SET

The DECLARE statement declares a variable, optionally with a default value, and the SET statement assigns a value to it. This is useful if you need to run SQL code that is essentially the same except for a few values. The example uses a table named product with two fields, item_name and quantity.

To get the quantity for apples, we would use a WHERE clause that matches item_name equal to apple.
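As a minimal sketch of that starting point: the query below filters on a hard-coded fruit name. The inline product rows are made up purely so the statement runs on its own; they are not the author’s data.

```sql
-- Hypothetical sample rows standing in for the product table.
WITH product AS (
  SELECT 'apple' AS item_name, 10 AS quantity UNION ALL
  SELECT 'lemon', 4 UNION ALL
  SELECT 'orange', 7
)
SELECT item_name, quantity
FROM product
WHERE item_name = 'apple';  -- the value that has to be edited by hand for each fruit
```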

Now let’s say we want to query different fruits from this table, but we don’t want to copy the entire SQL statement multiple times just to change the fruit name in the WHERE clause. In this case, we can use DECLARE to create a variable named fruit_name and SET to assign it the value lemon. When the query runs, the WHERE clause compares item_name to the fruit_name variable, which is set to lemon.
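One way this might look as a BigQuery script is sketched here. The variable name fruit_name comes from the text; the STRING type and the inline sample rows are assumptions added so the script is self-contained.

```sql
-- DECLARE statements must come before any other statement in the script.
DECLARE fruit_name STRING;
SET fruit_name = 'lemon';  -- change this one value to query another fruit

-- Hypothetical sample rows standing in for the product table.
WITH product AS (
  SELECT 'apple' AS item_name, 10 AS quantity UNION ALL
  SELECT 'lemon', 4 UNION ALL
  SELECT 'orange', 7
)
SELECT item_name, quantity
FROM product
WHERE item_name = fruit_name;  -- compares the column to the script variable
```

The variable is deliberately named differently from any column in the table, so the reference in the WHERE clause cannot be mistaken for a column.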

To query for apple again, we just need to change the value assigned to fruit_name in the SET statement from lemon back to apple.

This is a simple example of DECLARE and SET, but they can be used in more ways than what I’ve shown above. Repetitive SQL statements can also be placed in a table function, so users don’t have to write the same SQL code multiple times with different values in the WHERE clause.
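A rough sketch of the table-function idea follows. The dataset name my_dataset and the function name quantity_for are hypothetical, and the sketch assumes the product table already exists in that dataset.

```sql
-- Hypothetical names: my_dataset and quantity_for are placeholders.
CREATE OR REPLACE TABLE FUNCTION my_dataset.quantity_for(fruit_name STRING)
AS (
  SELECT item_name, quantity
  FROM my_dataset.product
  WHERE item_name = fruit_name
);

-- Callers pass the fruit as an argument instead of editing the query text.
SELECT * FROM my_dataset.quantity_for('apple');
SELECT * FROM my_dataset.quantity_for('lemon');
```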

IF-THEN

You can use IF-THEN conditional statements to execute SQL statements if the conditions have been met. A typical scenario I ran into was checking if a table had the latest data before running the remaining SQL for a report. Having the ability to skip over the code if the data wasn’t ready would’ve made it a lot easier from a data engineering perspective.

In this example, I declare two variables, rowcnt and latest_date. I count the rows in the prod_data table where the daily_date field is equal to 2022-11-18 and assign that count to the rowcnt variable.

Using an IF-THEN statement, I check whether rowcnt equals 1, meaning data was found for 2022-11-18; if so, the string FOUND LATEST DATA is shown. Otherwise, latest_date is set to the max date in the prod_data table and DATA DELAYED is displayed along with the value of latest_date. In this case, data wasn’t found and the latest_date field shows 2022-11-15.
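The check described above could be scripted roughly as follows. The variable and column names follow the text; the temporary table with one made-up row per date is an assumption that stands in for prod_data. The test rowcnt = 1 mirrors the text and presumes one row per date; for a table with many rows per day, rowcnt > 0 would be the more general test.

```sql
DECLARE rowcnt INT64;
DECLARE latest_date DATE;

-- Hypothetical stand-in for prod_data: one row per loaded date, ending 2022-11-15.
CREATE TEMP TABLE prod_data AS
SELECT d AS daily_date
FROM UNNEST([DATE '2022-11-13', DATE '2022-11-14', DATE '2022-11-15']) AS d;

-- Count the rows for the date the report expects.
SET rowcnt = (
  SELECT COUNT(*)
  FROM prod_data
  WHERE daily_date = DATE '2022-11-18'
);

IF rowcnt = 1 THEN
  SELECT 'FOUND LATEST DATA' AS status;
ELSE
  -- Data for the expected date is missing: report the newest date available.
  SET latest_date = (SELECT MAX(daily_date) FROM prod_data);
  SELECT 'DATA DELAYED' AS status, latest_date;
END IF;
```

With these sample rows the ELSE branch runs, because no row exists for 2022-11-18. In a real report, the remaining SQL would go inside the THEN branch so that it is skipped when the data is late.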

Again, this is a simple example, but you can see how conditional statements can prevent SQL code from running if the data isn’t available.

LOOP and LEAVE

You can use a combination of LOOP and LEAVE to loop until a condition is met before running your SQL statements. Building on the IF-THEN example, I added a counter variable with a default value of -1. I keep subtracting days from 2022-11-18, using the DATE_SUB function with the counter variable, until the rowcnt variable equals 1.

Once rowcnt equals 1, the LEAVE statement ends the loop.

The last_date field shows that the loop stopped when it found data in the prod_data table for 2022-11-15.
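A sketch of how such a loop might be written is given here. Incrementing the counter at the top of each pass is an assumption about how the default of -1 is used, and the temporary table with made-up rows again stands in for prod_data.

```sql
DECLARE rowcnt INT64;
DECLARE counter INT64 DEFAULT -1;

-- Hypothetical stand-in for prod_data: one row per loaded date, ending 2022-11-15.
CREATE TEMP TABLE prod_data AS
SELECT d AS daily_date
FROM UNNEST([DATE '2022-11-13', DATE '2022-11-14', DATE '2022-11-15']) AS d;

LOOP
  -- The first pass moves counter from -1 to 0, so 2022-11-18 itself is checked first.
  SET counter = counter + 1;
  SET rowcnt = (
    SELECT COUNT(*)
    FROM prod_data
    WHERE daily_date = DATE_SUB(DATE '2022-11-18', INTERVAL counter DAY)
  );
  IF rowcnt = 1 THEN
    LEAVE;  -- exit the loop once a date with data is found
  END IF;
END LOOP;

-- The date on which the loop stopped.
SELECT DATE_SUB(DATE '2022-11-18', INTERVAL counter DAY) AS last_date;
```

With these sample rows the loop checks 2022-11-18, 11-17, and 11-16 without a match and leaves on 2022-11-15. Note that the loop never ends if no date ever matches, so a script meant for regular use would likely also leave once counter passes some maximum lookback.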

Special Mention: Besides LOOP and LEAVE, the WHILE, CONTINUE, and FOR...IN statements can also be used to control loops.
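For comparison, here is a hedged sketch of two of those alternatives: the same lookback written with WHILE, followed by a FOR...IN loop over a short list of fruit names. The sample rows and the fruit list are made up for illustration.

```sql
DECLARE rowcnt INT64 DEFAULT 0;
DECLARE counter INT64 DEFAULT -1;

-- Hypothetical stand-in for prod_data: one row per loaded date, ending 2022-11-15.
CREATE TEMP TABLE prod_data AS
SELECT d AS daily_date
FROM UNNEST([DATE '2022-11-13', DATE '2022-11-14', DATE '2022-11-15']) AS d;

-- WHILE puts the exit condition in the loop header, so no LEAVE is needed.
WHILE rowcnt != 1 DO
  SET counter = counter + 1;
  SET rowcnt = (
    SELECT COUNT(*)
    FROM prod_data
    WHERE daily_date = DATE_SUB(DATE '2022-11-18', INTERVAL counter DAY)
  );
END WHILE;

SELECT DATE_SUB(DATE '2022-11-18', INTERVAL counter DAY) AS last_date;

-- FOR...IN runs the body once per row; each row is exposed as a STRUCT.
FOR fruit IN (SELECT name FROM UNNEST(['apple', 'lemon']) AS name)
DO
  SELECT fruit.name AS item_name;
END FOR;
```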

Final Thoughts

I’ve barely scratched the surface of BigQuery’s procedural language, but I hope you see the potential to simplify data engineering tasks. I highly recommend reviewing the documentation and giving the procedural language a try.

Note: All queries above were run in the BigQuery sandbox, which is free to anyone with a Google account.

