Simulation with SimPy. Part 8: Output Data Analysis for… | by Darío Weitz | Jul, 2022

By Jessie Hobb On Jul 14, 2022

Part 8: Output Data Analysis for Non-terminating Simulations

As previously indicated, output data analysis for non-terminating simulations is more complicated than for terminating ones.

The usual procedure for terminating simulations is to make a certain number of independent replications, each one beginning with the same initial conditions and ending at the predefined terminal event (ending condition or ending time). That independence between runs is achieved utilizing a sequence of different random numbers for each run.

Performing a certain number of independent replications in a terminating simulation results in a random sample where the observations are independent. Besides, the analyst assumes that the observations are sampled from identical distributions and particularly drawn from a normal distribution.

If the conditions previously indicated are satisfied, we can use the procedures described in Article 4 and Article 5 to obtain a point estimator and a confidence interval for the measure of performance under study.

So, the procedure of independent replications provides a method for analyzing terminating simulations where the basic statistical assumptions are met.

Remember that a non-terminating simulation is one in which we are interested about the steady- state performance of the system under study. So, we make a very long run looking for a steady-state parameter, characteristic of the steady-state distribution of our measure of performance.

In article 7 we indicated that the most usual procedure to perform the output data analysis of a non-terminating simulation is the method of batch means.

It is a relatively simple method but it has two important drawbacks: 1) there is an initialization bias as a result of the fact that the system usually does not start under steady-state conditions; it is necessary to remove the data from the transient state –the warm-up period–, so that it does not have any influence on the steady-state parameter of the system. 2) as the method consists of a single very long run, the phenomenon of autocorrelation inevitably appears and consequently another bias arises. This situation should not be ignored since we need an unbiased estimator for the variance for a proper calculation of the size of the confidence interval of the measure of performance under study.

We will use the same example of Article 7—simulation of an emergency room with SimPy—to indicate procedures to reduce or eliminate those biases.

Several methods are used to determine the warm-up period in a non-terminating simulation [1]: Welch method; SPC method; Randomization test; Conway rule; Crossing of the mean rule; Marginal Standard Error Rule-5. The first two are graphical methods, the third is a statistical method, and the last three are heuristic procedures.

The Welch Method [2] is the simplest and most widely used. It consists of the following steps:

1. Make n replications (at least 5) each with length m.

2. Let Yji be the ith observation of the jth replication. So, i takes values from 1 to m; j takes values from 1 to n.

3. Calculate the averages across replications (Y⋅i) for each observation (i = 1,…,m) where m = min(mn) because each replication has a different number of observations.

4. Calculate a smoothing average (Yi(w)) for Y⋅i and plot it.

5. Visually assess a value of i beyond which Yi(w) appears to converge.

The method of batch means consists of performing a single long run, eliminating data from the warm-up period, and dividing the remaining length into intervals of equal length named batches. Care must be taken when constructing the batches because they can exhibit a high degree of correlation.

When there is a correlation between data, S2/n may be a biased estimator of the variance. For positive correlation, S2/n underestimates the value of the variance and the resulting confidence interval will be too short. For negative correlation, S2/n overestimates the value of the variance and the resulting confidence interval will be too wide. In any case, the true confidence interval will not be the desired (1-α).

The principal advantage of the batch means method is that provides an approximately unbiased estimator for the variance. The method divides the useful data into a number of contiguous batches. Each of the batches is treated as an observation from a random sample, allowing the calculation of a confidence interval using the sample average and the sample variance of the batch means data.

Since the observations that originate the batches show autocorrelation, it is inevitable that the batches also exhibit some degree of autocorrelation. This question has been intensively studied and several procedures have been developed to resolve it. However, there is no unanimous agreement as to which is the best and most effective procedure.

The major points of discussion revolve around what the batch size should be and what the number of batches should be. Increasing the batch size improves the independence between batches but as it also reduces the final number of batches, it increases the variance of the estimator.

It is recommended (heuristically) that the number of batches be between 20 and 30 and that the batch size be as close as possible to the warm-up period.

Below we will detail only those parts of the code that had some sort of variation with respect to what was indicated in Article 7.

Firstly, we need to import matplotlib.pyplot as plt for plotting the smoothing average indicated in the Welch Method.

@author: darwt
"""
## Emergency Room Simulation Model
## Non Terminating Simulation## Batch Means Technique
## Determining the warm-up period# Import Modulesimport pandas as pd
import numpy  as np
from numpy.random import RandomStateimport simpyfrom scipy import stats
from scipy.stats import uniform
from scipy.stats import truncnormimport matplotlib.pyplot as plt

We changed the value of some parameters in the initialization module and tripled the length of the run:

# initialization module
# Unit of time = hoursPATIENT_ARRIVAL_RATE  = 1.2
NUMBER_ADMISSIONS_DESKS = 2ADMISSION_MEAN = 0.3
ADMISSION_STD  = 0.15HOSPITAL_MEAN = 15
HOSPITAL_STD  = 1.5AMBULATORY_MEAN = 4
AMBULATORY_STD  = 1NO_CARE_INF = 0.5
NO_CARE_SUP = 1.0NUMBER_DOCS_HOSPITAL= 1
NUMBER_DOCS_AMBULAT = 5
NUMBER_DOCS_NO_CARE = 1# discrete probabilities for three care levels
prob1, prob2, prob3 = 0.3, 0.6, 0.1prob1 = round(prob1, 2)
prob2 = round(prob1 + prob2,2)
prob3 = round(prob2 + prob3,2)
list_of_probs = [prob1, prob2, prob3]listoflists = []SIM_TIME = 24 * 365 * 6        ## 24 hours * 365 days * 6 years

We didn’t make any changes to the two Python generator functions described in Article 7: def generate_patient()modeled the arrival of patients to the emergency room; def patient_stream()describes the flow of individual patients through the whole system.

The central core of the simulation algorithm is coded below. It includes a Boolean variable named l_warm_up. When it takes the True keyword, we make ten independent runs and call the function warm_up_period() to calculate and plot the smoothing average (Fig. 1).

prng = RandomState(5678)
stop_arrivals = 15000l_warm_up = Trueif l_warm_up:
number_of_runs = 10
else:
number_of_runs = 1for run in range(number_of_runs):
patient_arrival, arrival = [], []
patient_admission, patient_hospital_care = [], []   
patient_ambulatory_care, patient_no_care = [], []     time_in_admission, delay_in_admission = [],[]   
time_in_hospital_care,delay_in_hospital_care = [],[]
time_in_ambulatory_care, delay_in_ambulatory_care = [],[]
time_in_no_care, delay_in_no_care = [], []    env = simpy.Environment()
admission_desks = simpy.Resource(env, 
capacity = NUMBER_ADMISSIONS_DESKS)  
hospital_care   = simpy.Resource(env,
capacity = NUMBER_DOCS_HOSPITAL)
ambulatory_care = simpy.Resource(env, 
capacity = NUMBER_DOCS_AMBULAT)
no_care_level   = simpy.Resource(env, 
capacity = NUMBER_DOCS_NO_CARE)
env.process(generate_patient(env,
PATIENT_ARRIVAL_RATE,0,stop_arrivals, prng ))
env.run(until = SIM_TIME)
listoflists.append(delay_in_ambulatory_care)if l_warm_up:
warm_up_period()
else:
batch_means()

def warm_up_period():
df = pd.DataFrame([list(x) for x in zip(*listoflists)])
df['mean']   = df.mean(axis = 1)
df['cumavg'] = df['mean'].expanding().mean()    plt.plot(df.index.values,df['cumavg'])
plt.xlabel("Patient")
plt.ylabel("Cum. Avg.")
plt.savefig( your_path + 'warm_up_period.png')
plt.show()

Figure 1 illustrates a plot of the data from the Welch Method for determining the warm-up period using data from 10 replications. From the chart, it seems that the smoothing average begins to converge after the arrival of 1000 patients to the ambulatory care sector.

Fig. 1: plot of the warm-up period based on the Welch Method. Made by the author with Matplotlib.

So, we used the value indicated in the plot for our warm-up period, deleting the first 1000 records. We made a production run based on the remaining data using the Batch Means Method with the function def batch_means():

We turn the Boolean variable to False (l_warm_up = False) to make only one run. We decided to use 20 batches. The warm-up period is deleted in the first line of code:

def batch_means():
# eliminating the warm-up period
delay_in_ambulatory_care = delay_in_ambulatory_care[1000:]    number_batchs = 20        ## selected by the analyst
number_recs    = len(delay_in_ambulatory_care)
recs_per_batch = int(number_recs/number_batchs)    # to garantee equal number of records in each batch
matrix_dim = number_batchs*recs_per_batch
rows_to_eliminate = number_recs - matrix_dim
delay_in_ambulatory_care = 
delay_in_ambulatory_care[rows_to_eliminate:]    matrix = []
while delay_in_ambulatory_care != []:
matrix.append(delay_in_ambulatory_care[:recs_per_batch])
delay_in_ambulatory_care = 
delay_in_ambulatory_care[recs_per_batch:]    ## Calculating and printing the average delay in ambulatory care
dof  = number_batchs - 1
confidence = 0.90            ## selected by the analyst
t_crit = np.abs(stats.t.ppf((1-confidence)/2,dof))
batch_means = np.mean(matrix, axis = 1)
average_batch_means  = np.mean(batch_means,axis = 0)
standard_batch_means = np.std(batch_means, axis = 0)    inf = average_batch_means  
-standard_batch_means*t_crit/np.sqrt(number_batchs)    sup = average_batch_means 
+standard_batch_means*t_crit/np.sqrt(number_batchs)    inf = round(float(inf),2)
sup = round(float(sup),2)    print('')
print('Simulation of an Emergency Room')
print('')
print('Run %2s'% (run+1))
print('%3s patients arrived to the emergency room' % (len(patient_arrival)))
print('%3s patients derived to ambulatory care' % (number_recs))    print('%3s batches of %3s records were used for calculations' % (number_batchs, recs_per_batch))
print ('')
print(' The average delay in ambulatory care is %3s ' % (average_batch_means))
print ('')
print('The average delay in ambulatory care belongs to the interval  %3s %3s' % (inf, sup))

Finally, we used our simulation model to calculate the average delay in ambulatory care. We made a single long run using 6 years for SIM_TIME (SIM_TIME = 24 * 365 * 6 ) . It was necessary to increase the length of the run so that each batch had a significant number of observations.

Figure 2 summarizes the output data: after 60128 simulated patients, we claim that the average delay in the ambulatory care level is 1.20 hours, contained in the approximately 90% confidence interval [1.12- 1.29] hours.

The batch means method avoids replicating the simulation run several times but care must be taken with the autocorrelation bias.

[1]: Prasad S. Mahajan, Ricki G. Ingalls. “Evaluation of methods used to detect warm-up period in steady state simulation.” Proceedings of the 2004 Winter Simulation Conference R .G. Ingalls, M. D. Rossetti, J. S. Smith, and B. A. Peters, eds.

[2]: Welch, P. 1983. “The statistical analysis of simulation results.” In The computer modeling handbook, ed. S. Lavenberg, 268–328. New York: Academic Press.