
Integrating Neural Net: Deriving the Normal Distribution CDF

by John Morrow | May 2023



Photo by Jack Anstey on Unsplash

1. Introduction

This article presents a method for training a neural network to derive the integral of a function. The technique works not only with analytically solvable integrals but also with integrals that have no closed-form solution and are typically solved by numerical methods. An example is the normal distribution’s cumulative distribution function (CDF). Equation 1 is this distribution’s probability density function (PDF), and Equation 2 is its CDF, the integral of the PDF. Figure 1 is a plot of these functions. Once trained, the resulting network can serve as a stand-alone function generator that delivers points on the CDF curve given points from the domain of the distribution’s PDF.

Equation 1: PDF (with μ=0, σ=1): $f(x) = \dfrac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$
Equation 2: CDF: $F(x) = \displaystyle\int_{-\infty}^{x} f(t)\,dt$
Figure 1. PDF and CDF of the normal distribution

2. Integrating neural network

An integrating neural network is trained to produce the integral of a function y = f(x). Expressed in terms of the network’s input and output:

Equation 3: $h(x) = \displaystyle\int f(x)\,dx$

where h and x are the network’s output and input, respectively. For the normal distribution, f(x) is given by Equation 1, the PDF of the distribution.

Integration of the function is accomplished by training the neural network such that the derivative of the network’s output is equal to the function’s output, resulting in the network’s output becoming the integral of the function.

2.1 Neural network training

Following are the steps of the training procedure:

1. Apply a training point, xᵢ, to the function y = f(x):

Equation 4: $y_i = f(x_i)$

2. Also apply xᵢ to the input of the neural network:
(The neural network model comprises a single input, x, two hidden layers, and a single output, h, and is represented by h(x) = nn_model(x).)

Equation 5: $h_i = \mathrm{nn\_model}(x_i)$

3. Take the derivative of h:

Equation 6: $g_i = \dfrac{dh_i}{dx_i}$

(Differentiation is provided in TensorFlow and PyTorch via their automatic differentiation facilities. In this article, the neural network is developed with TensorFlow’s GradientTape; a minimal sketch follows this list.)

4. Train the neural network with a loss function (loss 2 in Section 2.2) that forces the following relationship:

Equation 7: $g_i = y_i$

After the neural network is trained, g = y, and substituting g and y from Equation 6 and Equation 4, respectively, gives:

Equation 8: $\dfrac{dh}{dx} = f(x)$

Integrating both sides of Equation 8 confirms that the neural network’s output is the integral of the function f(x):

Equation 9: $h(x) = \displaystyle\int f(x)\,dx + C$

where C is the constant of integration.
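
As referenced in step 3, here is a minimal sketch of how the derivative in Equation 6 can be obtained with TensorFlow’s GradientTape. The tiny model below is only a stand-in so the snippet runs on its own; its size and activation are assumptions, not the author’s code.

import tensorflow as tf

# Stand-in model: one input, one output (size and activation are illustrative)
nn_model = tf.keras.Sequential([
    tf.keras.Input(shape=(1,)),
    tf.keras.layers.Dense(16, activation="tanh"),
    tf.keras.layers.Dense(1),
])

x_i = tf.constant([[0.5]])      # a training point x_i
with tf.GradientTape() as tape:
    tape.watch(x_i)             # track x_i so dh/dx can be computed
    h_i = nn_model(x_i)         # Equation 5: h_i = nn_model(x_i)
g_i = tape.gradient(h_i, x_i)   # Equation 6: g_i = dh_i/dx_i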

2.2 Neural network loss function

Typically, a neural network is trained with pairs of known input and output data. The training input data is presented to the neural network, and the resulting output is compared to the training output data using a loss function. The loss returned by this function is used via backpropagation to adjust the network’s weights to reduce the loss. An integrating neural network uses a custom loss function to constrain the neural network to produce an output that complies with the output of the integrated function.

The loss function for the integrating neural network, Figure 2, has three components. Loss 2, described in the training procedure above (Section 2.1), forces the output of the neural network to comply with the integral of f(x).

Loss 3 forces the neural network to comply with the initial condition h(x_init2) = h_init2. For the CDF model, this condition is h(−10) = 0, which sets C = 0 (Equation 9). For the purpose of this model, the responses of the PDF and CDF at x = −10 approximate the responses at x = −∞.

Setting the initial condition in Loss 3 to h(−∞) = 0 also simplifies the CDF calculation. Expanding the definite integral of Equation 2:

Equation 10: $F(x) = \displaystyle\int_{-\infty}^{x} f(t)\,dt = h(x) - h(-\infty)$

The initial condition, h(−∞) = 0, means that the second term equals zero, and the output of the trained neural network is the value of the CDF for the corresponding x input:

Equation 11: $\mathrm{CDF}(x) = h(x)$

Loss 1, with condition h(10) = 1, stabilizes the training process for points near the right tail of the distribution. For the purpose of this model, the responses of the PDF and CDF at x = 10 approximate the responses at x = ∞.

Figure 2. Loss function
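
Figure 2 itself is not reproduced here. Based on the three components described above, the total loss presumably takes a form along these lines (the squared-error form of each term is an assumption):

$$\text{loss} = \underbrace{\bigl(h(10) - 1\bigr)^2}_{\text{loss 1}} + \underbrace{\frac{1}{N}\sum_{i=1}^{N}\bigl(g_i - y_i\bigr)^2}_{\text{loss 2}} + \underbrace{\bigl(h(-10) - 0\bigr)^2}_{\text{loss 3}}$$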

3. Integrating neural network implementation

Following is the Python code for the integrating neural network implementation of the normal distribution CDF. The complete code is available here.

3.1 Neural network model definition

The neural network has two fully-connected hidden layers, each with 512 neurons. There is a single input for domain points and a single output for the corresponding integral values.

Listing 1. TensorFlow neural network model
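
Listing 1 appears as an image in the original article and is not reproduced here. The following is a minimal sketch consistent with the description above; the activation function is an assumption, as the article does not state which one was used.

import tensorflow as tf

# Two fully-connected hidden layers of 512 neurons each, a single input
# for domain points, and a single output for the integral values
nn_model = tf.keras.Sequential([
    tf.keras.Input(shape=(1,)),
    tf.keras.layers.Dense(512, activation="elu"),  # activation is an assumption
    tf.keras.layers.Dense(512, activation="elu"),
    tf.keras.layers.Dense(1),
])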

3.2 Initialization

The xᵢ training points from Section 2.1 for loss 2 are defined on line 9 of Listing 2. The order of these points is randomly shuffled on line 11 to promote stable training of the neural network. On line 12 the points are applied to the PDF as described in Equation 4.

The initial conditions for loss 1 and loss 3 are defined in lines 15-16 and lines 19-20, respectively.

Listing 2. Initialization
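
Listing 2 is likewise an image in the original, and the line numbers cited above refer to it. A sketch of the same initialization follows; the number and spacing of the training points are assumptions.

import numpy as np
import tensorflow as tf

# Training points x_i for loss 2, spanning the truncated domain [-10, 10]
# (the count of 2000 points is an assumption)
x_train = np.linspace(-10.0, 10.0, 2000, dtype=np.float32).reshape(-1, 1)
np.random.shuffle(x_train)  # shuffle to promote stable training

# Equation 4: y_i = f(x_i), the normal PDF with mu = 0, sigma = 1
y_train = np.exp(-x_train**2 / 2) / np.sqrt(2 * np.pi)

# Initial condition for loss 1: h(10) = 1, approximating h(inf) = 1
x_init = tf.constant([[10.0]])
h_init = tf.constant([[1.0]])

# Initial condition for loss 3: h(-10) = 0, approximating h(-inf) = 0
x_init2 = tf.constant([[-10.0]])
h_init2 = tf.constant([[0.0]])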

3.3 Batch training step

Listing 3 is the training-step function applied to each batch of training points. The total loss (the sum of loss 1, loss 2, and loss 3) in line 24 is used to update the neural network’s weights via gradient descent (lines 26–30). Each training epoch includes multiple batches, which collectively use all the training points in model updates.

Line 9 produces the neural network’s response to the x_init initial condition. The response is compared to the corresponding initial condition, h_init, producing loss 1 (line 10).

Similarly, line 13 produces the network’s response to the x_init2 initial condition. The response is compared to the corresponding initial condition, h_init2, producing loss 3 (line 14).

Line 17 produces the network’s response to training point xᵢ (Equation 5). Line 18 extracts the gradient of the response (Equation 6), and lines 19–20 compare the gradient to f(xᵢ) (Equation 7), producing loss 2.

Listing 3. Batch training step
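
Listing 3 is also an image in the original, and the line numbers cited above refer to it. A sketch of the same training step follows; the optimizer choice and the squared-error losses are assumptions.

optimizer = tf.keras.optimizers.Adam()
mse = tf.keras.losses.MeanSquaredError()

@tf.function
def train_step(x_batch, y_batch):
    with tf.GradientTape() as outer_tape:
        # loss 1: network response at x_init vs. h_init, i.e. h(10) = 1
        loss1 = mse(h_init, nn_model(x_init))

        # loss 3: network response at x_init2 vs. h_init2, i.e. h(-10) = 0
        loss3 = mse(h_init2, nn_model(x_init2))

        # loss 2: gradient of the response vs. f(x_i)
        with tf.GradientTape() as inner_tape:
            inner_tape.watch(x_batch)
            h = nn_model(x_batch)            # Equation 5
        g = inner_tape.gradient(h, x_batch)  # Equation 6
        loss2 = mse(y_batch, g)              # Equation 7

        total_loss = loss1 + loss2 + loss3

    # update the weights from the total loss
    grads = outer_tape.gradient(total_loss, nn_model.trainable_variables)
    optimizer.apply_gradients(zip(grads, nn_model.trainable_variables))
    return total_loss

A training loop would then slice x_train and y_train into batches and call train_step on each batch once per epoch.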

4. Results

Figure 3 shows the CDF response (red trace) from the output of the trained neural network. As verification of the accuracy of the results, the CDF response from the norm.cdf function in the Python SciPy library is included (green dots).

Figure 3. Trained CDF neural network output
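
A check along these lines reproduces the comparison, assuming nn_model is the trained model from the sketches above:

import numpy as np
from scipy.stats import norm

x_test = np.linspace(-5.0, 5.0, 201, dtype=np.float32).reshape(-1, 1)
h_pred = nn_model(x_test).numpy().ravel()  # network estimate of the CDF
h_ref = norm.cdf(x_test).ravel()           # SciPy reference values
print("max abs error:", np.abs(h_pred - h_ref).max())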

Figure 4 shows the total training loss versus epoch, logged during the training process.

Figure 4. Training loss

5. Conclusion

This article demonstrates a method for training a neural network to integrate a function by using a custom loss function and automatic differentiation. Specifically, a neural network is trained to successfully integrate the PDF of the normal distribution to produce the CDF.

An upcoming article will present a method for training a neural network to invert a function. The inverting network will be used to invert the output of the CDF-trained network from this article, then produce samples from the normal distribution.

