
Time series forecasting in Snowflake using SQL | by Adithya Krishnan | Nov, 2022



Demand forecasting, supply-chain and inventory management, and financial planning are important for business operations. Modelstar lets you do all of that in Snowflake with just 1 line of SQL.

Blog output overview. Image by Author.

Time series forecasting is a technique for predicting future values from historical, time-sampled data.

Forecasting is fundamental to business management

Forecasting helps companies make sound business decisions about supply chain management, inventory management (how much and when to restock), financial planning, product roadmaps, and hiring strategy. With accurate and timely forecasts, management gets a better understanding of how to allocate resources or take advantage of tailwinds.

Technical challenges for forecasting

Forecasting is an application of time series analysis. There are several components to consider:

  • Seasonality: periodic changes over time. Example: summer and winter vacations recur yearly; higher coffee consumption in the morning recurs daily.
  • Trend: continuous non-periodic changes. Example: company sales growth over the past 5 years.
  • Disruptive events: sudden changes. They can be driven by both predictable factors, such as holidays or service maintenance, and unpredictable issues, such as random errors or bugs.

A good prediction algorithm should capture most of these components and attach a confidence level to the predictions it makes.
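To make the three components concrete, here is a minimal Python sketch that builds a toy monthly series as the sum of a trend, a yearly seasonal cycle, and a single disruptive event. The function name, parameters and numbers are all made up for illustration; this is not how Modelstar or Prophet generate or model data.

```python
import math

def synthetic_monthly_sales(n_months, base=100.0, growth=2.0,
                            seasonal_amplitude=10.0,
                            event_month=None, event_effect=0.0):
    """Build a toy monthly series as trend + yearly seasonality + one event."""
    series = []
    for t in range(n_months):
        # Trend: steady, non-periodic change over time.
        trend = base + growth * t
        # Seasonality: a cycle that repeats every 12 months.
        seasonality = seasonal_amplitude * math.sin(2 * math.pi * t / 12)
        # Disruptive event: a one-off jump in a single month.
        event = event_effect if t == event_month else 0.0
        series.append(trend + seasonality + event)
    return series

# Three years of data with a one-off dip in month 20 (hypothetical numbers).
sales = synthetic_monthly_sales(36, event_month=20, event_effect=-30.0)
baseline = synthetic_monthly_sales(36)
print([round(v, 1) for v in sales[:6]])
```

Comparing `sales` with `baseline` isolates the event, and comparing values 12 months apart isolates the trend, which is roughly what a forecasting algorithm has to untangle from real data.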

Technical challenges of implementation

Python has a rich ecosystem for implementing machine learning and forecasting algorithms. Snowflake’s new Snowpark capability brings Python to your data warehouse, and using UDFs to run Python from SQL is a game changer for the transformations you can perform on your data. However, implementing an end-to-end forecasting solution yourself can be daunting and time consuming. Modelstar solves this by providing a streamlined way to bring Python’s superpowers to SQL.

Modelstar is an open source project built on recently launched Snowflake features such as Snowpark. It automatically handles dependencies, model artefacts and file I/O in Snowflake compute.

The SQL 1-liner for forecasting

Modelstar lets you ship and manage forecasting models and visualise modelling results with 1 line of SQL inside Snowflake. Under the hood, Modelstar provides pre-built forecast algorithms and exposes them as SQL stored procedures in your database. In this example, we will be using univariate_time_series_forecast (API doc). This API is based on Prophet, an open source library that is one of the most widely used forecasting algorithms in industry.

This tutorial provides the steps to build a time series forecasting model and a report. It covers:

  • Basic concept: about sales forecasting use cases and technology.
  • Modelstar CLI tool: Modelstar installation guide
  • univariate_time_series_forecast SQL syntax: the SQL 1-liner to make a forecast
  • Forecasting report: forecast results ready to be consumed by business teams

By the end of this example, you will know how to train a forecast model inside Snowflake, and generate a report showing model performance like this:

Output report. Image by Author.

This is a quick start guide to set up Modelstar if you are a first-time Modelstar user.

Step #1: Install Modelstar

$ pip install modelstar

TIP: We recommend using a virtual environment to manage dependencies. Here is a quick overview on how to get started: Python environments.

Verify the installation with a quick version check:

$ modelstar --version

This should display the version number in your terminal.

Step #2: Initialise a Modelstar project

$ modelstar init forecast_project

TIP: modelstar init <project_name> is the base command, where <project_name> can be replaced with the name of your choice.

You will now see a forecast_project folder created in your working directory.

Step #3: Configure a Snowflake session

Inside the forecast_project folder, find the file modelstar.config.yaml and open it with your favourite editor. Add your Snowflake account info and credentials to it. You can give the session any name; in this example, we use snowflake-test. The credentials in this file are used to connect to your Snowflake data warehouse. (Note: Do not commit the modelstar.config.yaml file to version control or expose it in your CI/CD.)

# ./modelstar.config.yaml
# MODELSTAR CONFIGURATION FILE
---
sessions:
  - name: snowflake-test
    connector: snowflake
    config:
      account: WQA*****
      username: <username>
      password: <password>
      database: MODELSTAR_TEST
      schema: PUBLIC
      stage: test
      warehouse: COMPUTE_WH

NOTE: Please create the stage inside your Snowflake warehouse database and specify it here in the configuration.

Step #4: Ping Snowflake

We can now start a Modelstar session from your terminal. Inside the directory of the newly generated Modelstar project (in our example, it’s ./forecast_project/), run this:

$ modelstar use snowflake-test

TIP: modelstar use <session name> is the base command. If you gave your session another name, use it in place of <session name>.

A successful ping should lead to something like this:

Console output. Image by Author.

Step #5: Register the forecast algorithm to Snowflake

Modelstar provides the forecasting algorithm out of the box and manages its dependencies, so you don’t have to. To make it available in your Snowflake warehouse, run the following command:

$ modelstar register forecast:univariate_time_series_forecast

A success message looks like this:

Console output. Image by Author.

Step #6: Upload sample sales data to Snowflake (optional, if you are using your own dataset)

If you want to try the forecast algorithm on a sample sales dataset, run this command to create a data table in your data warehouse. You can skip this step if you want to use your own data.

$ modelstar create table sample_data/time_series_data.csv:TS_DATA_TABLE

This command uploads the time_series_data.csv file to Snowflake and creates a table called ‘TS_DATA_TABLE’. Find out more about this API here.

Run this script in a Snowflake Worksheet

Use the following command in Snowflake to build the prediction model (example below uses the sample data uploaded in step #6):

CALL UNIVARIATE_TIME_SERIES_FORECAST('TS_DATA_TABLE', 'DS', 'Y', 40, 'M');

This call forecasts the next 40 months ('M') of Y values from the historical data in the TS_DATA_TABLE table, where DS is the time column.

Snowflake Snowsight. Image by Author.

To run the forecasting algorithm on your own data

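Under the hood, the forecast algorithm runs inside Snowflake as a stored procedure. Going by the example call above, it takes five arguments in this order: the name of the source table, the time column, the value column, the number of periods to forecast, and the unit of those periods.

To configure your own forecast period, check this API doc for a full list of unit aliases.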
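If you generate the call from a script, a small helper can assemble the statement for your own table. The sketch below is illustrative only: build_forecast_call is a hypothetical helper and not part of Modelstar, and the argument order is inferred from the example call in this tutorial. The identifier check is deliberately strict (simple unquoted names only), on the assumption that your table and column names follow that pattern.

```python
import re

# Simple unquoted identifiers only: letters, digits, underscores, dollar signs.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*")

def build_forecast_call(table, time_column, value_column, periods, unit):
    """Return a CALL statement in the shape used in this tutorial (hypothetical helper)."""
    for name in (table, time_column, value_column):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"unexpected identifier: {name!r}")
    if not isinstance(periods, int) or periods <= 0:
        raise ValueError("periods must be a positive integer")
    if "'" in unit:
        raise ValueError("unit must not contain quotes")
    # String arguments are single-quoted, as in the tutorial's example call.
    args = [f"'{table}'", f"'{time_column}'", f"'{value_column}'",
            str(periods), f"'{unit}'"]
    return f"CALL UNIVARIATE_TIME_SERIES_FORECAST({', '.join(args)});"

statement = build_forecast_call("TS_DATA_TABLE", "DS", "Y", 40, "M")
print(statement)
```

Swap in your own table, column names, horizon and unit alias, then paste the resulting statement into a Snowflake worksheet.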

Check the result

After the model training is finished, a successful run should output a JSON string similar to this in the Snowflake Results window:

{
"return_table": "RESULT_UNIVARIATE_TIME_SERIES_FORECAST",
"run_id": "3NvQXnHQqUdYG4Fu"
}

It means a table named “RESULT_UNIVARIATE_TIME_SERIES_FORECAST” has been created to materialise the prediction data, and the run id (“3NvQXnHQqUdYG4Fu”) can help you pull a prediction report.
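If you copy that JSON string out of the worksheet, the two fields can be pulled apart with a few lines of Python. This is just a convenience sketch: the variable names are made up, and the string is the sample output from this tutorial pasted in by hand rather than fetched from Snowflake.

```python
import json

# Sample output from this tutorial, pasted in as a string.
raw_result = """{
"return_table": "RESULT_UNIVARIATE_TIME_SERIES_FORECAST",
"run_id": "3NvQXnHQqUdYG4Fu"
}"""

result = json.loads(raw_result)

# The table name feeds the SELECT used to inspect the predictions,
# and the run id feeds the CLI command that opens the report.
select_sql = f"SELECT * FROM {result['return_table']};"
check_command = f"modelstar check {result['run_id']}"

print(select_sql)
print(check_command)
```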

Check the prediction data table

Let’s check the results table from the run using:

SELECT * FROM RESULT_UNIVARIATE_TIME_SERIES_FORECAST;

There are 4 columns in the table:

  • DS (datetime): datetime
  • Y_FORECAST, YHAT_LOWER, YHAT_UPPER (float): mean, lower and upper bounds of the predicted value (see Uncertainty Intervals in the Glossary section for their meaning).

Snowflake Snowsight. Image by Author.

Check the forecasting report

Modelstar automatically generates a report that records information about the run, along with the machine learning artefacts. To check the report, simply run this command on your local computer:

$ modelstar check <run_id>

The following message should be seen in your terminal:

As it mentions, a report will show up in your browser:

Modelstar report. Image by Author.

What’s in the report

The report includes 3 sections:

  • Meta information of this run
  • Forecasting chart: to check modelling quality and forecast results.
Modelstar report. Image by Author.
  • Component analysis: to illustrate trend and seasonality your model has “learned”, including an overall trend, and yearly and weekly seasonality (cyclical patterns over 1 year/week).
Modelstar report. Image by Author.

GLOSSARY

In-sample and out-of-sample forecast: The in-sample forecast covers the historical period, so you can check how well the forecast model fits the actual data. The out-of-sample forecast shows the prediction of the future.
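The distinction is easy to see with a deliberately simple model. The sketch below fits a straight line to a short made-up history with ordinary least squares, then evaluates it on the same points (in-sample) and extends it three steps ahead (out-of-sample). A plain linear fit stands in for the real model here; Prophet is far richer, and the data is invented.

```python
def fit_line(values):
    """Ordinary least-squares fit of values against their index 0..n-1."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

history = [10.0, 12.0, 13.0, 15.0, 18.0, 19.0]  # made-up observations
slope, intercept = fit_line(history)

# In-sample: predictions for the points the model was fitted on.
in_sample = [intercept + slope * x for x in range(len(history))]
# Out-of-sample: predictions for the next 3 steps, beyond the data.
out_of_sample = [intercept + slope * x
                 for x in range(len(history), len(history) + 3)]

# Mean absolute error of the in-sample fit measures how well the model fits history.
mae = sum(abs(a - f) for a, f in zip(history, in_sample)) / len(history)
print(round(mae, 3), [round(v, 2) for v in out_of_sample])
```

Only the in-sample part can be scored against actual values; the out-of-sample part has nothing to compare against until the future arrives.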

Uncertainty Intervals: the band between the upper and lower bounds. It means there is an 80% probability that the true value falls within that interval. Requiring higher certainty leads to a wider band (see Bartosz’s article). Uncertainty also grows the farther we go into the future, so the band widens over time.
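Both effects (a wider band for higher certainty, and a wider band further into the future) can be reproduced with a toy simulation. The sketch below simulates many possible futures as a random walk and reads the interval off the simulated values. The random walk, the noise level and the helper names are assumptions made for illustration; this is not the procedure Modelstar or Prophet runs internally.

```python
import random

def simulate_paths(last_value, horizon, n_paths=2000, step_noise=1.0, seed=0):
    """Simulate possible futures as a random walk starting from last_value."""
    rng = random.Random(seed)
    paths = []
    for _ in range(n_paths):
        value, path = last_value, []
        for _ in range(horizon):
            value += rng.gauss(0.0, step_noise)  # noise accumulates step by step
            path.append(value)
        paths.append(path)
    return paths

def interval_from_samples(samples, width):
    """Return (lower, upper) bounds that cover roughly `width` of the samples."""
    ordered = sorted(samples)
    tail = (1.0 - width) / 2.0
    lower = ordered[int(tail * (len(ordered) - 1))]
    upper = ordered[int((1.0 - tail) * (len(ordered) - 1))]
    return lower, upper

paths = simulate_paths(last_value=100.0, horizon=12)
step_1 = [p[0] for p in paths]    # simulated values 1 step ahead
step_12 = [p[-1] for p in paths]  # simulated values 12 steps ahead

lo80_near, hi80_near = interval_from_samples(step_1, 0.80)
lo80_far, hi80_far = interval_from_samples(step_12, 0.80)
lo95_far, hi95_far = interval_from_samples(step_12, 0.95)

print("80% band, 1 step ahead:  ", round(hi80_near - lo80_near, 2))
print("80% band, 12 steps ahead:", round(hi80_far - lo80_far, 2))
print("95% band, 12 steps ahead:", round(hi95_far - lo95_far, 2))
```

The 80% band contains about 80% of the simulated values by construction, the 95% band has to be wider to contain more of them, and the band at 12 steps is wider than at 1 step because the noise has had longer to accumulate.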

Forecasting is fundamental to business management. Our goal was to ship a forecasting function to Snowflake, train a machine learning model with it, and make predictions. We achieved all of this with just 1 line of SQL. On top of that, Modelstar generates a run report containing the details of the run and the forecasting analysis.

Check out Modelstar’s GitHub repository here, and star it to stay updated on the latest. For bugs, issues or feature requests for your use case, open an issue on GitHub.


