
Simple Probabilistic Inference in a Manufacturing Context
By Somik Raha, September 2022



TL;DR This post applies probabilistic inference to a long-established mechanical engineering problem. If you don’t care much about theory and epistemology (how we got here), then just read The Problem and proceed to Fitting prior knowledge to a Beta distribution. If you are in a bigger rush, just read The Problem and open the model to figure it out.

How does one learn from data? With the explosion of data in so many business contexts, data science is no longer an optional discipline. With advanced statistical methods packaged in pretty libraries and commoditized for machine learning, it can be all too easy to miss the foundations of probability theory that lie at the heart of how we learn from data. Those foundations are over 250 years old, and they are both intuitive and philosophical. Understanding them helps us be better practitioners of data science, machine learning and experimentation.

To do this, I am going to draw on a workshop I recently taught titled “The Magic of Probability” (public domain bilingual slides in English and Kannada here) under the auspices of the Dr. R. Venkatram Memorial Lecture Series at Bangalore Institute of Technology, my alma mater. The participants were mostly senior faculty members across all disciplines of engineering. As part of a deep dive, I asked for someone to volunteer a problem of inference. The professors of Mechanical Engineering gave me a great one to work with, and the goal of this article is to use that example to illustrate simple probabilistic inference. You can easily replace the manufacturing example with data-rich contexts. When extending this to AB tests, see this article.

The Problem

A particular component being manufactured has a tolerance for acceptance of 0.5 mm above or below 25 mm. If it falls outside this range, the component is called non-conforming and is rejected from the batch. The folks running the manufacturing shop believe that they will see a range of 4% to 6% non-conforming parts in each batch. When they run the next batch and count the number of non-conforming parts, how should they update their belief about non-conforming parts? What is the probability of the next part manufactured being non-conforming? What probability should they ascribe to being below an upper limit on the non-conformance level? How many batches should be run in order to reach a 90% confidence level of being below that upper limit? Further, what annual operating profit should they forecast, and what is the probability of meeting a target operating profit?

In 1763, a great friendship changed the course of how humans do inference. The Reverend Thomas Bayes had passed away two years prior, leaving behind his unpublished work on probability. His dear friend and mathematician Richard Price published this work as An Essay towards solving a Problem in the Doctrine of Chances.

(Left) Rev. Thomas Bayes, Public Domain | (Right) Bayes’ friend Richard Price, Public Domain.

This work carried two important advances. The first was the derivation of Bayes’ theorem from conditional probability. The second was the introduction of the Beta distribution. Pierre Simon Laplace independently worked out the same theorem and the Beta distribution, and also worked out a lot of what we call probability theory today.

Beta distributions start with the coin toss metaphor: a coin toss has only two possible outcomes, heads or tails. Hence, the probability of k successes in n trials is given by the binomial distribution (bi = “two”). This was known before Bayes and Laplace came on the scene. They took inspiration from the discrete binomial distribution and, by applying calculus, took it to continuous-distribution land. They retained k and n but called them alpha (the number of successes, k) and beta (the number of failures, n − k), which became the shape parameters of this new distribution, which they called the “Beta” distribution.
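In symbols (these are the standard forms, written out here for reference rather than taken from the article’s figures), the binomial probability of k successes in n trials with long-run success fraction f, and the Beta density it inspired, are:

$$\Pr(k \mid n, f) = \binom{n}{k} f^{k} (1-f)^{n-k}, \qquad p(f \mid \alpha, \beta) = \frac{f^{\alpha-1}(1-f)^{\beta-1}}{B(\alpha, \beta)}, \quad 0 \le f \le 1,$$

where $B(\alpha, \beta)$ is the Beta function that normalizes the density.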

The Beta distribution with its shape parameters set to alpha = 1, and beta = 1. This is a graph that looks rectangular, starting at 0% with a height of 1 unit, and ending at 100% with the same height of 1 unit. Basically, this is the Uniform distribution.
The Beta distribution set to (alpha = 1, beta = 1) produces a Uniform Distribution. Image produced by author.

The amazing thing about this distribution is that it shape-shifts in a way that matches our common sense. If you were to start off your probabilistic inference by believing that you have seen only one success in two trials, then alpha = 1 and beta = 2 − 1 = 1. The Beta(1,1) is exactly the Uniform Distribution. Changing alpha and beta changes the distribution and helps us express different beliefs.

An animated gif showing the different distributions produced by varying the Beta Distribution’s shape parameters alpha and beta.
The shape-shifting Beta Distribution, Public Domain.

Bayes suggested, “with a great deal of doubt,” the Uniform Distribution as the prior probability distribution to use when we are ignorant about the quantity being inferred. Laplace had no such hesitation and asserted this was the way to go when we feel each outcome is equally likely. Further, Laplace provided a rule of succession, which says that we must use the mean of the distribution to place a probability on the next coin landing heads (or the next trial being a success).
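In symbols (the standard statement of the rule, not a quotation from Laplace): starting from a uniform prior and observing k heads in n tosses, the posterior mean gives

$$\Pr(\text{next toss lands heads}) = \frac{k+1}{n+2}.$$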

Such was Laplace’s contribution in his Memoir on the Probability of the Causes of Events that most people in the West stopped focusing on probability theory, believing there wasn’t much more to be advanced there. The Russians didn’t get that memo, and so they continued to think about probability, producing fundamental advances like Markov processes.

For our purposes though, the next big jump in classical probability came with the work of E. T. Jaynes, followed by Ronald A. Howard. Before we go there, did you notice an important detail? The x-axis of the graph above says “long-run fraction of heads”, and not “probability of heads.” This is an important detail because one cannot have a probability distribution on a probability — that is non-interpretable. Where did this thought come from?

Like Bayes, Jaynes did not live to see his seminal work published. His student Larry Bretthorst published Probability Theory: The Logic of Science after his passing. Jaynes’ class notes were a huge influence on the work of my teacher, Ronald A. Howard, the co-founder of Decision Analysis.

Jaynes introduced the concept of a reasoning robot which would use principles of logic that we would agree with. He wrote in the book cited above: “In order to direct attention to constructive things and away from controversial irrelevancies, we shall invent an imaginary being. Its brain is to be designed by us, so that it reasons according to certain definite rules. These rules will be deduced from simple desiderata which, it appears to us, would be desirable in human brains; i.e. we think that a rational person, on discovering that they were violating one of these desiderata, would wish to revise their thinking.”

“Our robot is going to reason about propositions. As already indicated above, we shall denote various propositions by italicized capital letters, {A, B, C, etc.}, and for the time being we must require that any proposition used must have, to the robot, an unambiguous meaning and must be of the simple, definite logical type that must be either true or false.”

Jaynes’ robot is the ancestor of Howard’s clairvoyant, an imaginary being that does not understand models but can answer factual questions about the future. The implication: we can only place probabilities on distinctions that are clear and have not a trace of uncertainty in them. In some early writings, you will see the Beta distribution formulated on the “probability of heads.” A probability distribution on the “probability of heads” would not be interpretable in any meaningful way. Hence, the edit that Ronald Howard provided in his seminal 1970 paper, Perspectives on Inference, was to reframe the distinction as the long-run fraction of heads (or successes), a question that the clairvoyant can answer.

The beta distribution has a most interesting property. As we find more evidence, we can simply update alpha and beta, since they correspond to the number of successes and the number of failures, in order to obtain the updated probability distribution on the distinction of interest. Here is a simple example of different configurations of alpha and beta (S = number of successes, N = number of tosses):

A graph showing four charts. The first is a uniform distribution with alpha = 1, beta = 1. Second, the updated distribution with 0 heads in 2 tosses, alpha = 1, beta = 3. Third, the updated distribution with 1 head in 2 tosses, alpha = 2, beta = 2. Fourth, the updated distribution with 2 heads in 2 tosses, alpha = 3, beta = 1.
Updating the Beta(1,1) prior based on observations, Image created by author.
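As a minimal sketch of the same updates (my code, not the author’s spreadsheet; it assumes scipy is installed), each observed toss simply increments alpha (a head) or beta (a tail):

```python
# Reproduce the four Beta updates charted above, starting from a
# Beta(1, 1) uniform prior on the long-run fraction of heads.
from scipy.stats import beta

cases = {
    "prior, Beta(1,1)":               (1, 1),
    "0 heads in 2 tosses, Beta(1,3)": (1, 3),
    "1 head in 2 tosses, Beta(2,2)":  (2, 2),
    "2 heads in 2 tosses, Beta(3,1)": (3, 1),
}
for label, (a, b) in cases.items():
    # The mean a / (a + b) shifts with the evidence; the density at
    # f = 0.5 shows the shape-shifting.
    print(f"{label}: mean {beta.mean(a, b):.2f}, "
          f"density at f=0.5 is {beta.pdf(0.5, a, b):.2f}")
```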

We can use this distribution to do our inference. I have prepared a public domain Google sheet (US version, India English version, India Kannada version) that you can play with after making a copy. I will use this sheet to explain the rest of the theory.

Fitting prior knowledge to a Beta distribution

Remember that we started with a distribution of non-conformance (4% to 6%)? As an exercise for the reader, refer to the mean and variance of the Beta distribution and derive the formulae for alpha and beta using mean and variance.

The four equations below give the mean and variance in terms of alpha and beta, and then alpha and beta in terms of the mean and variance:

$$\mu = \frac{\alpha}{\alpha + \beta}, \qquad \sigma^2 = \frac{\alpha \beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}$$

$$\alpha = \frac{\mu^2 (1 - \mu)}{\sigma^2} - \mu, \qquad \beta = \frac{\alpha (1 - \mu)}{\mu}$$

The first two are from Wikipedia, and the third and fourth are derived with basic algebra.

How do we find the mean and variance of our prior assessment? We assessed the following percentile / probability pairs from our mechanical engineering experts:

A screenshot of the Google sheet with the inference model showing the quantile parameterized distribution. The 10% assessment for non-conformance is 4%. The 50% assessment is 5%, and the 90% assessment is 6%.
Snapshot from spreadsheet, Image created by author.

The interpretation of the above is that there is only a 10% chance of the non-conformance rate being below 4%, and a 10% chance of it being above 6%. There is a 50–50 shot of being above or below 5%. A rule of thumb is to assign probabilities of 25%/50%/25% to the 10th/50th/90th percentile values. If you’d like to read more about this theory, see [1][2][3]. This shortcut makes it easy for us to compute the mean as:

Mean = 25% × 4% + 50% × 5% + 25% × 6% = 5%

Snapshot from spreadsheet, Image created by author.

We can similarly calculate the variance using the standard formula to yield the following alpha and beta shape parameters.

As you can see, the worksheet shows the equivalent number of successes and tosses. Providing an input of 4%-5%-6% as our prior belief is the same as saying, “we have a strength of belief that is equivalent to seeing 47 successes in 949 tosses.” This framing allows our experts to cross-check whether 47 out of 949 tosses makes intuitive sense to them.
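Here is a minimal sketch of that fit (my code, not the author’s spreadsheet), combining the 25%/50%/25% shortcut with the alpha and beta formulas above:

```python
# Fit a Beta prior to the expert 10/50/90 assessments.
p10, p50, p90 = 0.04, 0.05, 0.06   # expert percentile assessments
w10, w50, w90 = 0.25, 0.50, 0.25   # rule-of-thumb discretization weights

mean = w10 * p10 + w50 * p50 + w90 * p90
var = (w10 * (p10 - mean) ** 2
       + w50 * (p50 - mean) ** 2
       + w90 * (p90 - mean) ** 2)

alpha = mean ** 2 * (1 - mean) / var - mean
beta = alpha * (1 - mean) / mean

print(f"alpha ~ {alpha:.1f}, beta ~ {beta:.1f}")                   # ~47.4, ~901.5
print(f"i.e. {alpha:.0f} successes in {alpha + beta:.0f} tosses")  # 47 in 949
```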

We can also discretize the fitted beta distribution and compare it with the original inputs, as shown below.

Two rows in table — first row shows original inputs: 4%-5%-6%. Second row shows fitted distribution discretized: 3.9%-4.95%-5.9%. A callout on the second row notes how close it is to the original inputs.
Comparing the fitted distribution to the original inputs, snapshot from spreadsheet, Image created by author.

Now that we have the prior, we can easily update it with our observations. We ask the following question:

A screenshot of the supplied model showing two inputs: number of non-conforming components, and the total number of components manufactured.
Input for observations, snapshot from spreadsheet, Image created by author.

The new alpha (successes) and beta (failures) parameters are simply the sum of the previous alpha with the new successes and the previous beta with the new failures respectively. This is shown separately in the section below:

The section of the model which shows the updating of the beta distribution, snapshot from spreadsheet, Image created by author.
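In symbols, with s non-conforming parts (“successes”) observed out of n parts, this is the standard conjugate update the sheet implements:

$$\alpha_{\text{post}} = \alpha_{\text{prior}} + s, \qquad \beta_{\text{post}} = \beta_{\text{prior}} + (n - s)$$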

This can now be visualized in different ways. First, we can see the posterior distribution discretized and compare it with the inputs:

Comparing the posterior with the prior, snapshot from spreadsheet, Image created by author.

We see that the posterior distribution is left-shifted. We can also see this in the visualizations that follow:

Snapshot from spreadsheet, Image created by author.

First, by Laplace’s succession rule, we can answer the question: What is the probability of the next component being non-conforming?

Snapshot from spreadsheet, Image created by author.

This was arrived at by simply dividing the number of posterior successes (posterior alpha) by the number of posterior trials (posterior alpha + posterior beta), or the mean of the posterior distribution.
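As a sketch (my code; the prior parameters are the fitted values from above, and the 30-in-1,000 observation is the one used later in the article):

```python
# Laplace's succession rule on the posterior: the probability of the
# next part being non-conforming is the posterior mean.
a0, b0 = 47.45, 901.55   # fitted prior (approximately 47 in 949)
s, n = 30, 1000          # observed: 30 non-conforming parts in 1,000

a1, b1 = a0 + s, b0 + (n - s)   # conjugate update
p_next = a1 / (a1 + b1)         # posterior mean
print(f"P(next part non-conforming) ~ {p_next:.2%}")   # ~3.97%
```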

Since we have the posterior cumulative distribution, we can easily read it to answer probability questions.

Next, we are interested in knowing the probability of being below the target non-conforming level. We can answer this easily by reading the cumulative distribution function against the target level. In our example below, we can do the readout against both the prior and the posterior.

Table showing the probability of the non-conformance level being below the target, set at 5% in a green input cell on the spreadsheet. Before testing, the probability that must be assigned would be 51.78%, and after testing (meaning, running the current batch), it would be 98.56%.
Probability of being below non-conformance target, before and after testing current batch, snapshot from spreadsheet, Image created by author.
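The same readout can be sketched in a few lines (my code; the parameters are the approximate fitted values, so the outputs match the screenshot only up to rounding):

```python
# Probability of the non-conformance rate being below the 5% target,
# read off the Beta CDF before and after the batch.
from scipy.stats import beta

a0, b0 = 47.45, 901.55        # fitted prior
a1, b1 = a0 + 30, b0 + 970    # posterior after 30 NC parts in 1,000
target = 0.05

print(f"before batch: {beta.cdf(target, a0, b0):.2%}")
print(f"after batch:  {beta.cdf(target, a1, b1):.2%}")
```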

As we can see, our observations made us far more confident about being below the target non-conformance level. Note that this inference is good as far as it goes. One critique that can be leveled here is that we have taken all the data at face value and not discounted for the broader context in which this data appears (for instance, how many batches are we going to see over the year?). Using that information would lead us to introduce a posterior scale power (a.k.a. data scale power) that tempers our inference from data.

Posterior scale power, or data scale power, can be thought of as the answer to the question: “how many trials (successes) in this test/batch should count as one trial (success)?” The worksheet sets the data scale power to 1 by default, which means all of the data is taken at face value and fully used. The problem with this is that we can make up our minds too quickly. A data scale power of 10, which means we treat every 10 trials as 1 trial and every 10 successes as 1 success, immediately changes our conclusion. As we can see below, the needle barely moves from the prior, since we are now treating the 30 successes in 1,000 trials as 3 successes in 100 trials (dividing by 10).

A snapshot of a spreadsheet showing inputs around observations of non-compliant components in the total batch that has gone through a test run. The first row has the input non-compliance rate of 4%, 5% and 6% (at the 10th, 50th and 90th percentile), followed by a fitted prior to a beta distribution at 4.1%, 4.95% and 5.9%, which is pretty close to the inputs. The posterior (3.95%, 4.75%, 5.65%) and probabilities of meeting a target non compliance rate of 5%, before and after the test are shown.
Reading Probabilities from distributions, snapshot from spreadsheet, Image created by author.
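A sketch of the same dilution (my code; k is my name for the scale power):

```python
# Apply a data scale power of k: divide both the observed successes
# and the observed failures by k before updating the prior.
from scipy.stats import beta

a0, b0 = 47.45, 901.55
s, n, k = 30, 1000, 10

a1 = a0 + s / k         # 30 successes counted as 3
b1 = b0 + (n - s) / k   # 970 failures counted as 97
print(f"P(below 5% target): {beta.cdf(0.05, a1, b1):.2%}")
```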

Looking at the above, we quickly realize that we need to run more batches in order to gain more confidence, as one would expect. Let’s say we ran 5 batches of 1,000 components each and saw the same proportion as before, 30 successes per 1,000 trials — only now we saw 30 × 5 = 150 successes over 1,000 × 5 = 5,000 trials. We now see a close to 90% confidence level that we will be below the 5% target non-conformance level.

Snapshot from spreadsheet, Image created by author.

Now, a key question is: what is a principled way of setting a data scale power? Let’s say we want the forecast to be valid annually. One principle we can use is the proportion of batches used for inference out of the total batches to be manufactured over the year. Let’s say our plan was to manufacture 50 batches, and we have used 5 batches for inference. Then we can set our data scale power to 50/5 = 10. Another way to interpret the data scale power is that we have to dilute the data by 10 times in order to interpret it for the entire year.

Let’s now turn to the final forecasting question on operating economics.

It is very easy to place an economic model on top of the forecasting work we have already done. By taking as inputs the price and cost of each component, the number of batches to be processed in a year, and the number of components in each batch, we can get a distribution of the number of non-compliant components by multiplying the total components manufactured (e.g. 50,000) by each item in the NC posterior distribution that we produced in the prior section. We can then directly calculate the loss distribution by multiplying the NC forecast by the cost of each component. The operating profit can also be calculated easily by calculating the net revenue of each compliant component and subtracting the loss of the non-compliant components.

A snapshot of a spreadsheet showing inputs for price, cost, number of components manufactured, and then a forecast table showing a range of the 10th, 50th and 90th percentile of compliant components, non-compliant components, the derived revenue from compliant components, the loss from non-compliant components, and the operating profit. Further, there is an input for the target operating profit, for which a probability of meeting the target is calculated, along with the implied non-compliance rate.
Forecasting operating profit. Also available: US Version and Kannada Version. Snapshot from spreadsheet, Image created by author.
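A sketch of that economic layer (my code; the price, cost, and volume inputs are illustrative assumptions, not the article’s):

```python
# Price out the posterior NC-rate percentiles: compliant parts earn
# (price - cost); non-conforming parts lose their cost.
from scipy.stats import beta

a1, b1 = 47.45 + 15, 901.55 + 485   # posterior: 150 NC in 5,000, scale power 10
price, cost = 10.0, 6.0             # per component (assumed)
total = 50 * 1000                   # 50 batches of 1,000 components

for q in (0.10, 0.50, 0.90):
    nc_rate = beta.ppf(q, a1, b1)   # NC rate at this percentile
    nc = total * nc_rate
    profit = (total - nc) * (price - cost) - nc * cost
    # Note: a higher NC percentile implies a lower profit percentile.
    print(f"p{int(q * 100)}: NC rate {nc_rate:.2%}, profit {profit:,.0f}")
```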

Further, as the screenshot above shows, we can calculate the probability of exceeding the target operating profit, which is the same as the probability of being below the non-conformance rate implied by that profit target, which we read off the cumulative distribution function of the posterior (on non-conformance) from the previous section.

This is a simple model to show how we can get started using probability in forecasting. There are limitations to this approach, and one important limitation is that we are considering the price, cost, number of batches run and the number of components produced as fixed. These might all be uncertain, and when that happens, the model has to become a little more sophisticated. The reader is referred to the Tornado Diagram tool to make more sophisticated economic models that handle multi-factor uncertainty.

Further, the beta-binomial updating model works only if we assume stationarity in the process of making the parts, meaning there is no drift. The field of Statistical Process Control[4] gets into drift, and that is beyond the scope of this article.

Thanks to Dr. Brad Powley for reviewing this article, and to Anmol Mandhania for helpful comments. Mistakes are mine.

[1] Miller III, Allen C., and Thomas R. Rice. “Discrete Approximations of Probability Distributions.” Management Science 29, no. 3 (1983): 352–362. See P8.

[2] McNamee, Peter, and John Nunzio Celona. Decision Analysis for the Professional. SmartOrg, Inc., 2007. Free PDF available online. See page 36 in the chapter “Encoding Probabilities.”

[3] Howard, Ronald A. “The Foundations of Decision Analysis.” IEEE Transactions on Systems Science and Cybernetics 4, no. 3 (1968): 211–219.

[4] Wheeler, Donald J., and David S. Chambers. Understanding Statistical Process Control. SPC Press, 1992.


