Streamlit Tutorial: Creating Word Reports for Data Science Projects | by Andy McDonald | Apr, 2023

By Jessie Hobb On Apr 19, 2023

Combining python-docx and Streamlit for Data Science Report Automation

Report image generated by the author using Midjourney Basic Plan.

At the end of data-related projects, whether for petrophysics or data science, creating a report is a very common occurrence. The generated reports provide clients and end users with information on the key results and conclusions obtained during the study, as well as detailing the methodologies used.

However, creating structured reports can be a tedious and time-consuming process, especially when it comes to ensuring they are formatted correctly and the data is presented in the best way possible.

This article will show how we can use the popular Streamlit library, combined with python-docx, to take the first step towards automating part of the reporting process.

The python-docx library allows us to create a Microsoft Word report. Having the report in this format lets us make edits and apply finishing touches before converting it to a PDF.

Even though the worked example in this article requires mostly manual input, it could be adapted to include the power of large language models to summarise the data and create the required text.

Let’s get started building a Streamlit Word document report generator.

First, we will import the main libraries we are going to work with. These are Streamlit, pandas, matplotlib and python-docx.

import streamlit as st
import pandas as pd
import matplotlib.pyplot as plt
import docx

Next, we will set the Streamlit page layout to wide. This allows the app to take up the full width of the browser window rather than being a narrow column in the middle.

st.set_page_config(layout='wide')

Now we can start building out the User Interface (UI).

We will start by giving our app a title.

st.title('Streamlit Data Report Generator')

The starting point for the Report Generator. Image by the author.

To keep things simple, we will pre-load our data using pd.read_csv() and pass in a file name.

df = pd.read_csv('Xeek_Well_15-9-15.csv')

The dataset used for this tutorial is a subset of a training dataset used as part of a Machine Learning competition run by Xeek and FORCE 2020 (Bormann et al., 2020). This dataset is licensed under a Creative Commons Attribution 4.0 International license.

To make this app more flexible, we could add a file uploader allowing users to load their own data.

You can find out more about how to do this in my article Uploading and Reading Files with Streamlit.
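Here is a minimal sketch of that idea. It assumes the uploaded file is a CSV. pd.read_csv accepts either a path or a file-like object, so the object returned by st.file_uploader can be passed straight through, with the tutorial's CSV used as a fallback. The load_data helper and its parameters are hypothetical names and are not part of the original app.

```python
import pandas as pd


def load_data(uploaded_file=None, default_path='Xeek_Well_15-9-15.csv'):
    """Read an uploaded CSV if one was provided, otherwise the default file (hypothetical helper)."""
    # pd.read_csv accepts a file path or any file-like object,
    # which covers the UploadedFile returned by st.file_uploader
    source = uploaded_file if uploaded_file is not None else default_path
    return pd.read_csv(source)
```

In the app, this could replace the pd.read_csv line, for example df = load_data(st.file_uploader('Upload a CSV file', type='csv')). st.file_uploader returns None until a file has been uploaded, and that is what triggers the fallback.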

Creating the Report Form with st.form

When widgets are included in a Streamlit app, the app re-runs from top to bottom every time one of them is edited or selected. To prevent this, we can create a form.

A form lets us enter all the values first. The app then only re-runs when the form's submit button is pressed.

We can create a form by writing with st.form('report'): and placing the inputs we want inside that block.

Streamlit report generator with report details section. Image by the author.

We will use the upper section of our app to create the report metadata. This includes the title of the report, who the author is, who the client is and the date for the report.

Each of these elements is tied to a user input widget from Streamlit.

col1, col2 = st.columns(2, gap='large')
report_title = col1.text_input("Enter report title")
report_author = col1.text_input("Enter the report author's name")
report_date = col2.date_input("Select a date for the report")
report_client = col2.text_input("Enter the client's name")
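st.date_input returns a datetime.date object, so the date can be formatted however the report requires before it is written out. The original app simply uses str(report_date), which gives the ISO form (for example 2023-04-19). The sketch below assumes a "day month year" style is preferred instead. format_report_metadata is a hypothetical helper.

```python
import datetime


def format_report_metadata(author: str, report_date: datetime.date, client: str) -> list[str]:
    """Build the metadata lines shown under the report title (hypothetical helper)."""
    return [
        f'Authored By: {author}',
        # %d %B %Y gives e.g. "19 April 2023"; month names follow the active locale
        f'Created On: {report_date.strftime("%d %B %Y")}',
        f'Created For: {client}',
    ]
```

Each returned line could then be written to the document with doc.add_paragraph(line).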

For the form to display, we need to add a submit button. This is done using st.form_submit_button(). Here, we can pass a label that will appear on the button.

if st.form_submit_button('Generate'):
    generate_report(report_title)

Underneath this, I have placed a call to a generate_report function, which we will create shortly. For the moment, this will act as a placeholder.
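Streamlit executes the script from top to bottom on every run. Because of that, generate_report has to be defined above the form; otherwise, pressing the button raises a NameError. The stub below is a minimal placeholder for now. It assumes we only want confirmation that the button works, and its body is hypothetical: the article replaces it later.

```python
def generate_report(report_title):
    """Temporary stand-in until the real report writer is built (hypothetical stub)."""
    # Returning a message keeps the stub easy to check; in the app it
    # could be shown with st.write(generate_report(report_title))
    return f'Report requested: {report_title or "Untitled"}'
```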

Here is the code for the form so far.

with st.form('report'):
    st.write("### Report Details")
    col1, col2 = st.columns(2, gap='large')
    report_title = col1.text_input("Enter report title")
    report_author = col1.text_input("Enter the report author's name")
    report_date = col2.date_input("Select a date for the report")
    report_client = col2.text_input("Enter the client's name")
    if st.form_submit_button('Generate'):
        generate_report(report_title)

Creating a Report Section in the Streamlit Form

Reports are generally made up of multiple sections or chapters.

To illustrate creating a very simple section within our app, we will add some inputs for the user to enter the section title and summary.

Streamlit report generator with the section input boxes. Image by the author.

In the above image, I have added two new input widgets.

A section title, which is a simple text input (st.text_input), and a summary of the section, which is a text area (st.text_area).

Additionally, I have created two new columns to separate them from the columns above. This is important if we want to add any full-width text/info between these parts of the form.

Here is our form code so far:

with st.form('report'):
    st.write("### Report Details")
    col1, col2 = st.columns(2, gap='large')
    report_title = col1.text_input("Enter report title")
    report_author = col1.text_input("Enter the report author's name")
    report_date = col2.date_input("Select a date for the report")
    report_client = col2.text_input("Enter the client's name")
    sect_col1, sect_col2 = st.columns(2, gap='large')
    sect_col1.write("### Section Details")
    section_title = sect_col1.text_input("Enter section title")
    section_text_summary = sect_col1.text_area("Section Summary")

We could expand this capability so that the user can add multiple sections. Each could then be coded to start on a new page using a page break.

Additionally, to make it more comprehensive, we could generate a preview of the report as it is being built within the app.

Lots of possibilities!

Including a Dataframe in a Word Document Using docx

Tables are essential in reports as they help present information in a clear, simple and organised way. This allows the readers to quickly understand the data and also compare it against other data values/categories in the same or different table.

To illustrate including a table in our report, we can use the statistical summary generated by the pandas describe() function as an example.

Within the UI, we can add a multi-select option that will allow the user to select columns from the dataframe. This is especially handy if we have many columns and are only interested in a select few.

Streamlit report generator with the option for users to select columns from a dataframe. Image by the author.

Before creating the multi-select entry box, we first need to grab the column names from the dataframe. We do this by creating a new variable and assigning df.columns to it.

We then create the multi-select box using st.multiselect(). As we are working within columns, we need to call it on the required column, which in this case is sect_col2.

with st.form('report'):
    st.write("### Report Details")
    col1, col2 = st.columns(2, gap='large')
    report_title = col1.text_input("Enter report title")
    report_author = col1.text_input("Enter the report author's name")
    report_date = col2.date_input("Select a date for the report")
    report_client = col2.text_input("Enter the client's name")
    sect_col1, sect_col2 = st.columns(2, gap='large')
    sect_col1.write("### Section Details")
    section_title = sect_col1.text_input("Enter section title")
    section_text_summary = sect_col1.text_area("Section Summary")
    data_features = df.columns
    sect_col2.write("### Data Summary")
    data_to_summarise = sect_col2.multiselect("Select features to include in statistical summary",
                                              options=data_features)
    if st.form_submit_button('Generate'):
        generate_report(report_title)
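df.columns offers every column. However, the colour mapping in the scatterplot built later needs numeric values, and describe() summarises only numeric columns when a mix is selected. If the CSV contains text columns (such as well names), one option is to offer only the numeric columns in the widgets. The sketch below assumes that. numeric_features is a hypothetical helper built on pandas' DataFrame.select_dtypes.

```python
import pandas as pd


def numeric_features(dataframe: pd.DataFrame) -> list:
    """Names of the numeric columns, suitable for describe() and the scatterplot selectors."""
    # select_dtypes keeps the original column order
    return dataframe.select_dtypes(include='number').columns.tolist()
```

In the form, data_features = numeric_features(df) would then replace data_features = df.columns.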

Next, we need to create two functions.

The first will take the features we are interested in along with the dataframe, and generate the statistical summary of the data.

def create_df_stats_summary(dataframe, features_to_include):
    sub_df = dataframe[features_to_include].copy()
    return sub_df.describe()
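One case to watch for: if the form is submitted without any features selected, dataframe[[]] is a DataFrame with no columns, and pandas' describe() raises a ValueError on it. Below is a hedged variant that returns None in that case. It assumes generate_report would then be adjusted to skip the table.

```python
import pandas as pd


def create_df_stats_summary(dataframe: pd.DataFrame, features_to_include):
    """Statistical summary of the selected features, or None when nothing is selected."""
    # describe() raises a ValueError for a DataFrame without columns,
    # which is what an empty multiselect would produce
    if len(features_to_include) == 0:
        return None
    return dataframe[list(features_to_include)].describe()
```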

The second function is a little bit more complex.

As python-docx doesn’t natively support dataframes, we need to create a table using docx like so:

def add_df_to_docx(doc, dataframe):
    # Reset the index and get the new shape
    dataframe = dataframe.reset_index()
    num_rows, num_cols = dataframe.shape
    # Add a table to the document with the necessary number
    # of rows and columns
    table = doc.add_table(rows=num_rows + 1, cols=num_cols)
    # Add the header row
    for i, col in enumerate(dataframe.columns):
        table.cell(0, i).text = str(col)
    # Add the data rows (after reset_index the index runs 0..n-1,
    # so it doubles as the row position)
    for i, row in dataframe.iterrows():
        for j, value in enumerate(row):
            table.cell(i + 1, j).text = str(value)
    return table

We will call these functions when the button is pressed.

Adding a Chart to the Word Document

Charts form another essential part of a report. They allow us to convey large amounts of data concisely.

To illustrate the creation and inclusion of a chart within the final Word document, we will allow the user to select three columns from the dataset. These will then be used to create a scatterplot to be added to the report.

Streamlit report generator after including options for the Scatterplot. Image by the author.

As seen in the image above, we will add the three selection boxes below the previous two sections. These will be added to three new columns and are created using Streamlit’s selectbox().

with st.form('report'):
    st.write("### Report Details")
    col1, col2 = st.columns(2, gap='large')
    report_title = col1.text_input("Enter report title")
    report_author = col1.text_input("Enter the report author's name")
    report_date = col2.date_input("Select a date for the report")
    report_client = col2.text_input("Enter the client's name")
    sect_col1, sect_col2 = st.columns(2, gap='large')
    sect_col1.write("### Section Details")
    section_title = sect_col1.text_input("Enter section title")
    section_text_summary = sect_col1.text_area("Section Summary")
    data_features = df.columns
    sect_col2.write("### Data Summary")
    data_to_summarise = sect_col2.multiselect("Select features to include in statistical summary",
                                              options=data_features)
    st.write("### Scatterplot Setup")
    sub_col1, sub_col2, sub_col3 = st.columns(3)
    chart_x = sub_col1.selectbox('X axis', options=data_features)
    chart_y = sub_col2.selectbox('Y axis', options=data_features)
    chart_z = sub_col3.selectbox('Z axis', options=data_features)
    if st.form_submit_button('Generate'):
        generate_report(report_title)
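With the code above, all three selectboxes default to the first column, so the initial plot shows a feature against itself. st.selectbox accepts an index argument that sets the initially selected option. A small helper like the one below can pick more sensible defaults. It is a hypothetical sketch, and the column names in the usage note are assumptions about the dataset.

```python
def preferred_index(options, preferred):
    """Position of `preferred` within `options`, falling back to 0 if it is missing."""
    options = list(options)  # works for a list or a pandas Index
    return options.index(preferred) if preferred in options else 0
```

Usage might look like chart_y = sub_col2.selectbox('Y axis', options=data_features, index=preferred_index(data_features, 'RHOB')), where 'RHOB' is only an assumed column name.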

We will then create a new function called create_scatterplot that will be used to create our figure.

We will set our function up to take several arguments:

dataframe: the dataframe object containing the data
xaxis: feature to be plotted on the x-axis
yaxis: feature to be plotted on the y-axis
colour: feature to be used to colour the data points
plot_name: the name of our plot. This will be used as the file name
xaxis_scale: a list which contains two elements and will be used to define the min and max range for the x-axis
yaxis_scale: a list which contains two elements and will be used to define the min and max range for the y-axis

By default, xaxis_scale and yaxis_scale are both set to None. If the user does not pass them in, matplotlib autoscales each axis to the range of the plotted data (with a small default margin).
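Each scale is passed as [first, second] straight to set_xlim or set_ylim, so giving the larger value first flips the axis. This is what the yaxis_scale=[3, 1] used later in the article does. The sketch below shows that logic in isolation. apply_axis_scale is a hypothetical helper and is not part of the original function.

```python
import matplotlib.pyplot as plt


def apply_axis_scale(ax, xaxis_scale=None, yaxis_scale=None):
    """Apply optional [first, second] axis limits; None keeps matplotlib's autoscaled range."""
    if xaxis_scale is not None:
        ax.set_xlim(xaxis_scale[0], xaxis_scale[1])
    if yaxis_scale is not None:
        # A larger first value (e.g. [3, 1]) draws the axis inverted
        ax.set_ylim(yaxis_scale[0], yaxis_scale[1])
    return ax
```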

Python-docx does not natively support matplotlib figures. As a workaround, we need to save our plot as a file, which can be picked up when we start writing to the Word document.

def create_scatterplot(dataframe, xaxis, yaxis, colour, plot_name,
                       xaxis_scale=None, yaxis_scale=None):
    fig, ax = plt.subplots()
    ax.scatter(dataframe[xaxis], dataframe[yaxis],
               c=dataframe[colour], cmap='viridis')
    ax.set_xlabel(xaxis)
    ax.set_ylabel(yaxis)
    if xaxis_scale is not None:
        ax.set_xlim(xmin=xaxis_scale[0], xmax=xaxis_scale[1])
    if yaxis_scale is not None:
        ax.set_ylim(ymin=yaxis_scale[0], ymax=yaxis_scale[1])
    # Save the figure for python-docx to pick up, then close it to free memory
    filename = f'{plot_name}.png'
    fig.savefig(filename)
    plt.close(fig)
    return filename

Adding Separating Horizontal Lines to the Streamlit UI

To help break up the UI and make each section stand out, we can add a horizontal line using st.write('---').

Streamlit renders this Markdown syntax as a horizontal rule.

If you want to find out more about the st.write function, then check out: How to Use Streamlit’s st.write Function to Improve Your Streamlit Dashboard.

Our final code for the Streamlit form is as follows:

with st.form('report'):
    st.write("### Report Details")
    col1, col2 = st.columns(2, gap='large')
    report_title = col1.text_input("Enter report title")
    report_author = col1.text_input("Enter the report author's name")
    report_date = col2.date_input("Select a date for the report")
    report_client = col2.text_input("Enter the client's name")
    st.write("---")
    sect_col1, sect_col2 = st.columns(2, gap='large')
    sect_col1.write("### Section Details")
    section_title = sect_col1.text_input("Enter section title")
    section_text_summary = sect_col1.text_area("Section Summary")
    data_features = df.columns
    sect_col2.write("### Data Summary")
    data_to_summarise = sect_col2.multiselect("Select features to include in statistical summary",
                                              options=data_features)
    st.write("---")
    st.write("### Scatterplot Setup")
    sub_col1, sub_col2, sub_col3 = st.columns(3)
    chart_x = sub_col1.selectbox('X axis', options=data_features)
    chart_y = sub_col2.selectbox('Y axis', options=data_features)
    chart_z = sub_col3.selectbox('Z axis', options=data_features)
    if st.form_submit_button('Generate'):
        generate_report(report_title)

Our final part is to create the generate_report function.

This function will take in all the previous pieces we gathered from the user and then write them into our Word document.

As seen in the code below, we first need to create our docx object, which is done by calling docx.Document().

We then start creating each section of the report using a combination of headings and paragraphs. Some of these utilise f-strings so that we can combine text with the input variables.

Then we move on to adding in the scatterplot we created earlier, which is done using doc.add_picture().

The final section contains our dataframe statistical summary, which calls upon the add_df_to_docx function.

Finally, we save the report to the docx file.

def generate_report(report_title, report_author, report_date, report_client,
                    section_title=None,
                    section_text_summary=None,
                    data_stats_summary=None,
                    graph_figure=None):
    doc = docx.Document()
    # Add Title Page followed by section summary
    doc.add_heading(report_title, 0)
    doc.add_paragraph(f'Authored By: {report_author}')
    doc.add_paragraph(f'Created On: {str(report_date)}')
    doc.add_paragraph(f'Created For: {report_client}')
    doc.add_heading(section_title, 1)
    doc.add_paragraph(section_text_summary)
    # Add Scatter plot
    doc.add_heading('Data Visualisation', 2)
    doc.add_picture(graph_figure)
    # Add dataframe summary
    doc.add_heading('Data Summary', 2)
    summary_table = add_df_to_docx(doc, data_stats_summary)
    # python-docx looks styles up by their display name
    summary_table.style = 'Light Shading Accent 1'
    doc.save('report.docx')
    return st.info('Report Generated')

Once the writing function has been created, we can then fill out what happens when the user clicks on the Generate button.

First, we call the create_df_stats_summary and create_scatterplot functions. The resulting summary table and the saved plot file (scatter.png) are then passed into the generate_report function.

if st.form_submit_button('Generate'):
    summary_stats = create_df_stats_summary(df, data_to_summarise)
    scatter_plot_file = create_scatterplot(df, chart_x, chart_y, chart_z,
                                           plot_name='scatter', yaxis_scale=[3, 1])
    generate_report(report_title, report_author, report_date, report_client,
                    section_title, section_text_summary, summary_stats,
                    graph_figure='scatter.png')

When we view our app, we can fill the input boxes with the required information and hit Generate.

Final view of the Streamlit Word report generator. Image by the author.

This will then create our Word document that we see below.

Page one of the report generated from the Streamlit app. Image by the author.

Page two of the report generated from the Streamlit app. Image by the author.

Creating reports is an essential part of any data science or petrophysics workflow. However, creating these reports can often be time-consuming and tedious.

Using a combination of Streamlit for the UI, and docx for creating the Word document, we can help reduce some of the burdens of report creation and begin automating the process.

With the arrival of Large Language Models (LLMs), we could potentially integrate these with this app to further enhance its capabilities and take automation to the next level.

Bormann, Peter, Aursand, Peder, Dilib, Fahad, Manral, Surrender, & Dischington, Peter. (2020). FORCE 2020 Well well log and lithofacies dataset for machine learning competition [Data set]. Zenodo. https://doi.org/10.5281/zenodo.4351156

Below is the full code to generate the Word Report Streamlit app:

import streamlit as st
import pandas as pd
import matplotlib.pyplot as plt
import docx

st.set_page_config(layout='wide')


def create_df_stats_summary(dataframe, features_to_include):
    sub_df = dataframe[features_to_include].copy()
    return sub_df.describe()


def create_scatterplot(dataframe, xaxis, yaxis, colour, plot_name,
                       xaxis_scale=None, yaxis_scale=None):
    fig, ax = plt.subplots()
    ax.scatter(dataframe[xaxis], dataframe[yaxis],
               c=dataframe[colour], cmap='viridis')
    ax.set_xlabel(xaxis)
    ax.set_ylabel(yaxis)
    if xaxis_scale is not None:
        ax.set_xlim(xmin=xaxis_scale[0], xmax=xaxis_scale[1])
    if yaxis_scale is not None:
        ax.set_ylim(ymin=yaxis_scale[0], ymax=yaxis_scale[1])
    # Save the figure for python-docx to pick up, then close it to free memory
    filename = f'{plot_name}.png'
    fig.savefig(filename)
    plt.close(fig)
    return filename


def add_df_to_docx(doc, dataframe):
    # Reset the index and get the new shape
    dataframe = dataframe.reset_index()
    num_rows, num_cols = dataframe.shape
    # Add a table to the document with the necessary number
    # of rows and columns
    table = doc.add_table(rows=num_rows + 1, cols=num_cols)
    # Add the header row
    for i, col in enumerate(dataframe.columns):
        table.cell(0, i).text = str(col)
    # Add the data rows (after reset_index the index runs 0..n-1)
    for i, row in dataframe.iterrows():
        for j, value in enumerate(row):
            table.cell(i + 1, j).text = str(value)
    return table


def generate_report(report_title, report_author, report_date, report_client,
                    section_title=None,
                    section_text_summary=None,
                    data_stats_summary=None,
                    graph_figure=None):
    doc = docx.Document()
    # Add Title Page followed by section summary
    doc.add_heading(report_title, 0)
    doc.add_paragraph(f'Authored By: {report_author}')
    doc.add_paragraph(f'Created On: {str(report_date)}')
    doc.add_paragraph(f'Created For: {report_client}')
    doc.add_heading(section_title, 1)
    doc.add_paragraph(section_text_summary)
    # Add Scatter plot
    doc.add_heading('Data Visualisation', 2)
    doc.add_picture(graph_figure)
    # Add dataframe summary
    doc.add_heading('Data Summary', 2)
    summary_table = add_df_to_docx(doc, data_stats_summary)
    # python-docx looks styles up by their display name
    summary_table.style = 'Light Shading Accent 1'
    doc.save('report.docx')
    return st.info('Report Generated')


st.title('Streamlit Data Report Generator')
df = pd.read_csv('Xeek_Well_15-9-15.csv')

with st.form('report'):
    st.write("### Report Details")
    col1, col2 = st.columns(2, gap='large')
    # Setup the title and associated data
    report_title = col1.text_input("Enter report title")
    report_author = col1.text_input("Enter the report author's name")
    report_date = col2.date_input("Select a date for the report")
    report_client = col2.text_input("Enter the client's name")
    st.write("---")
    sect_col1, sect_col2 = st.columns(2, gap='large')
    # Setup the first report section and associated data
    sect_col1.write("### Section Details")
    section_title = sect_col1.text_input("Enter section title")
    section_text_summary = sect_col1.text_area("Section Summary")
    data_features = df.columns
    sect_col2.write("### Data Summary")
    data_to_summarise = sect_col2.multiselect("Select features to include in statistical summary",
                                              options=data_features)
    st.write("---")
    st.write("### Scatterplot Setup")
    sub_col1, sub_col2, sub_col3 = st.columns(3)
    chart_x = sub_col1.selectbox('X axis', options=data_features)
    chart_y = sub_col2.selectbox('Y axis', options=data_features)
    chart_z = sub_col3.selectbox('Z axis', options=data_features)
    if st.form_submit_button('Generate'):
        summary_stats = create_df_stats_summary(df, data_to_summarise)
        scatter_plot_file = create_scatterplot(df, chart_x, chart_y, chart_z,
                                               plot_name='scatter', yaxis_scale=[3, 1])
        generate_report(report_title, report_author, report_date, report_client,
                        section_title, section_text_summary, summary_stats,
                        graph_figure=scatter_plot_file)