
Optimization, Newton’s Method, & Profit Maximization: Part 3 — Econometric Profit Maximization | by Jacob Pieniazek



All Images by Author

This article is the third, and final, installment in a three-part series. In part 1, we studied basic optimization theory. In part 2, we extended this theory to constrained optimization problems. Now, in part 3, we will apply the optimization theory covered, along with econometric and economic theory, to solve a profit maximization problem.

Suppose, as a data scientist working for your company, you are tasked with estimating the optimal amounts of money to allocate across different advertising channels to maximize the overall profit of a certain product line. Furthermore, suppose you are given constraints on these allocation decisions, such as the maximum total spend you can allocate and/or minimum amounts that must be spent in certain channels. In this article, we combine the optimization theory covered in part 1 and part 2 of this series with additional economic and econometric theory to tackle a theoretical profit maximization problem of this sort, which we will flesh out in more detail below.

The goal of this article is to tie together what we have learned thus far, and my hope is to motivate and inspire readers to incorporate these techniques into applied settings. It is not meant to be a comprehensive solution to the problem covered, as nuances and idiosyncrasies can, of course, complicate theoretical examples. Nevertheless, it provides a strong framework for constructing applied optimization problems and others like them. Let’s dive in!

In part 1, we covered basic optimization theory — including 1) setting up and solving a simple single-variable optimization problem analytically, 2) iterative optimization schemes — namely, gradient descent & Newton’s Method, and 3) implementing Newton’s method by hand and in Python for a multi-dimensional optimization problem. In part 2, we covered constrained optimization theory — including 1) incorporating equality constraints and 2) incorporating inequality constraints into our optimization problems and solving them via Newton’s Method. This article is designed to be accessible for those who are already familiar with the content covered in part 1 and part 2.

Optimization Theory (Part 1 & Part 2 Recap)

A mathematical optimization problem can be formulated abstractly as follows:

(1) min f(x) over x ∈ ℝⁿ, subject to g(x) ≤ 0 and h(x) = 0

where we choose real values of the vector x that minimize the objective function f(x) (or maximize –f(x)) subject to the inequality constraints g(x) and equality constraints h(x). In part 2, we discussed how to incorporate these constraints directly into our optimization problem. Notably, using Lagrange Multipliers and Logarithmic Barrier functions we can construct a new objective function O(x, Λ, ρ):

(2) Generalizable function for constrained optimization problems:
O(x, Λ, ρ) = f(x) + Λᵀh(x) − ρ·Σᵢ ln(−gᵢ(x))

where Λ is the vector of Lagrange multipliers associated with the equality constraints h(x), and ρ is the barrier parameter associated with all of the inequality constraints g(x). We then solve this new objective function iteratively: choose a starting value of ρ (note that objective functions taking large values will require much larger starting values of ρ to scale the penalization), optimize the new objective function evaluated at that ρ using Newton’s method, update ρ by slowly decreasing it (ρ → 0), and repeat until convergence — where Newton’s Method iterative scheme is as follows:

(3) Newton’s Method Iterative Scheme:
xₙ₊₁ = xₙ − H(xₙ)⁻¹∇f(xₙ)

where H(x) and ∇f(x) denote the Hessian and gradient of our objective function O(x, Λ, ρ), respectively. Convergence is achieved when one or more of the following criteria are met:

(4) Convergence Criteria for Iterative Optimization Schemes:
‖xₙ₊₁ − xₙ‖ < ε,   ‖∇f(xₙ)‖ < ε,   or   |f(xₙ₊₁) − f(xₙ)| < ε

In Python, utilizing SymPy, we have four functions: one that obtains the gradient of a SymPy function, one that obtains its Hessian, one that solves an unconstrained optimization problem via Newton’s method, and one that solves a constrained optimization problem via Newton’s method according to the generalization in eq. (2) (see part 1 and part 2 of the series for more in-depth coverage of this material):

import sympy as sm
import numpy as np

def get_gradient(function, symbols, x0):
    '''
    Helper function to solve for Gradient of SymPy function.
    '''
    d1 = {}
    gradient = np.array([])

    for s in symbols:
        d1[s] = sm.diff(function, s).evalf(subs=x0)  # Take first derivative w/ respect to each symbol
        gradient = np.append(gradient, d1[s])

    return gradient.astype(np.float64)

def get_hessian(function, symbols, x0):
    '''
    Helper function to solve for Hessian of SymPy function.
    '''
    d2 = {}
    hessian = np.array([])

    for s1 in symbols:
        for s2 in symbols:
            d2[f"{s1}{s2}"] = sm.diff(function, s1, s2).evalf(subs=x0)  # Take second derivative w/ respect to each combination of symbols
            hessian = np.append(hessian, d2[f"{s1}{s2}"])

    hessian = np.array(np.array_split(hessian, len(symbols)))

    return hessian.astype(np.float64)

def newton_method(function, symbols, x0, iterations=100, mute=False):
    '''
    Function to run Newton's method to optimize SymPy function.
    '''
    # Dictionary of values to record each iteration
    x_star = {}
    x_star[0] = np.array(list(x0.values()))

    if not mute:
        print(f"Starting Values: {x_star[0]}")

    i = 0
    while i < iterations:

        # Get gradient and hessian
        gradient = get_gradient(function, symbols, dict(zip(x0.keys(), x_star[i])))
        hessian = get_hessian(function, symbols, dict(zip(x0.keys(), x_star[i])))

        # Newton method iteration scheme
        x_star[i+1] = x_star[i].T - np.linalg.inv(hessian) @ gradient.T

        # Check convergence criteria
        if np.linalg.norm(x_star[i+1] - x_star[i]) < 10e-5:
            solution = dict(zip(x0.keys(), x_star[i+1]))
            print(f"\nConvergence Achieved ({i+1} iterations): Solution = {solution}")
            break
        else:
            solution = None

        if not mute:
            print(f"Step {i+1}: {x_star[i+1]}")

        i += 1

    return solution

def constrained_newton_method(function, symbols, x0, iterations=100, mute=False):
    '''
    Function to run Barrier method & Newton's method to optimize a
    constrained SymPy function.
    '''
    # Values to record over each iteration
    x_star = {}
    x_star[0] = np.array(list(x0.values())[:-1])
    optimal_solutions = []
    optimal_solutions.append(dict(zip(list(x0.keys())[:-1], x_star[0])))

    # Barrier Method Iteration
    step = 1
    while step < iterations:

        # Starting rho
        if step == 1:
            rho_sub = list(x0.values())[-1]

        # Evaluate function at rho value
        rho_sub_values = {list(x0.keys())[-1]: rho_sub}
        function_eval = function.evalf(subs=rho_sub_values)

        if not mute:
            print(f"Step {step} w/ {rho_sub_values}")
            print(f"Starting Values: {x_star[0]}")

        # Newton's Method
        i = 0
        while i < iterations:

            gradient = get_gradient(function_eval, symbols[:-1], dict(zip(list(x0.keys())[:-1], x_star[i])))
            hessian = get_hessian(function_eval, symbols[:-1], dict(zip(list(x0.keys())[:-1], x_star[i])))

            x_star[i+1] = x_star[i].T - np.linalg.inv(hessian) @ gradient.T

            if np.linalg.norm(x_star[i+1] - x_star[i]) < 10e-5:
                solution = dict(zip(list(x0.keys())[:-1], x_star[i+1]))
                if not mute:
                    print(f"Convergence Achieved ({i+1} iterations): Solution = {solution}\n")
                break

            i += 1

        # Record optimal solution for each barrier method iteration
        optimal_solution = x_star[i+1]
        previous_optimal_solution = list(optimal_solutions[step-1].values())
        optimal_solutions.append(dict(zip(list(x0.keys())[:-1], optimal_solution)))

        # Check for overall convergence
        if np.linalg.norm(optimal_solution - previous_optimal_solution) < 10e-5:
            print(f"\nOverall Convergence Achieved ({step} steps): Solution = {optimal_solutions[step]}\n")
            break

        # Set new starting point
        x_star = {}
        x_star[0] = optimal_solution

        # Update rho
        rho_sub = 0.9 * rho_sub

        # Update Steps
        step += 1

    return optimal_solutions[step]

And to solve a constrained optimization problem, we can run the following code (make sure the starting values lie in the feasible region of the inequality constraints!):

import sympy as sm

x, y, λ, ρ = sm.symbols('x y λ ρ')

# f(x): 100*(y-x**2)**2 + (1-x)**2
# h(x): x**2 - y = 2
# g_1(x): x <= 0
# g_2(x): y >= 3

objective = 100*(y-x**2)**2 + (1-x)**2 + λ*(x**2-y-2) - ρ*sm.log((y-3)*(-x))
symbols = [x,y,λ,ρ] # Function requires last symbol to be ρ (barrier parameter)
x0 = {x:-15,y:15,λ:0,ρ:10}

constrained_newton_method(objective,symbols,x0,mute=False)

With the corresponding output:

If the material above felt foreign or you need a more rigorous recap, then I recommend taking a look at part 1 and part 2 of this series, which provide a more in-depth survey of this material. For the remainder of this article, we will first discuss basic profit maximization & econometric theory and then move into solving the theoretical example.

Applied Profit Maximization

Suppose we have a $100,000 advertising budget, all of which must be spent. We are tasked with choosing the optimal allocation of this budget across two advertising channels (digital ads and television ads) to maximize the overall profit of a particular product line. Furthermore, suppose that we must allocate a minimum of $20k to television advertising and $10k to digital advertising.

Theoretical Formulation

Let’s now mathematically formulate the profit maximization problem we seek to solve:

(5) min −π(δ, τ, ⋅) over δ, τ, subject to δ + τ = 100,000, δ ≥ 10,000, τ ≥ 20,000

where π(⋅) denotes the profit function, δ denotes digital ad spend, τ denotes television ad spend, and (⋅) is a placeholder for additional variables. Note that we are minimizing the negative of π(⋅), which is equivalent to maximizing π(⋅). The profit function is defined as follows:

(6) π(δ, τ, ⋅) = p·q(δ, τ, ⋅) − C(q(⋅), δ, τ)

where p denotes the price, q(δ, τ, ⋅) denotes the quantity demanded function, and C(q(⋅), δ, τ) denotes the cost function — which, intuitively, is a direct function of the quantity produced (if we make more, it costs more to produce) and of how much we spend on advertising. The cost function can take additional inputs, but for the sake of demonstration we keep it as a function of quantity and advertising spend. Notice that our choices of δ and τ impact the profit function through their impact on quantity demanded and on the cost function. To make our optimization problem tractable, we will need to use econometric techniques to estimate the quantity function. Once we have specified our cost function and estimated the quantity function, we can solve our optimization problem as follows:

(7) min −[p·q̂(δ, τ, ⋅) − C(q̂(⋅), δ, τ)] over δ, τ, subject to δ + τ = 100,000, δ ≥ 10,000, τ ≥ 20,000

where q̂ is our estimated econometric model of quantity demanded. Before we lay out the econometric specification of our quantity model, we must discuss the key assumption required for this optimization problem to be tractable: it is imperative that we obtain causal estimates of the effects of digital and television advertising on quantity demanded. In economists’ jargon, digital and television advertising must be exogenous in the econometric model — that is, uncorrelated with the model’s error term. Exogeneity can be achieved in two ways: 1) we have the correct structural specification of the econometric model for the impact of digital and television advertising on quantity demanded (i.e., we include all of the relevant variables that are correlated with both quantity demanded and digital & television advertising spend), or 2) we have random variation in digital & television advertising spend (achievable by randomly varying spend over time to see how demand responds).

Intuitively, exogeneity is required because we need to capture the causal impact of changing advertising spend — that is, what will happen, on average, if we change the values of advertising spend. If the effect we estimate is not causal, then the changes we make in advertising spend will not correspond to the true change in quantity demanded. Note that the model need not produce the best predictions of quantity demanded; rather, it must accurately capture the causal relationship. See my prior article, which discusses the exogeneity assumption in more depth.

Let’s now suppose we specify the following econometric model for quantity demanded indexed by time t:

(8) qₜ = α + β·ln(δₜ) + γ·ln(τₜ) + ϕ₁·qₜ₋₁ + ϕ₂·qₜ₋₂ + Sₜ + XₜΩ + ϵₜ

where β and γ are the estimates of the impact of the natural log of digital ad spend, δ, and television ad spend, τ, respectively. Additionally, α is our intercept, ϕ1 and ϕ2 are estimates of the autoregressive components of quantity demanded, S denotes seasonality, X is the set of all relevant covariates and lagged covariates along with the matrix of their coefficient estimates Ω, and ϵ is the error term. Furthermore, assume that digital and television advertising satisfy the exogeneity assumption conditional on X, S, and the autoregressive components within our model. That is,

(9) E[ϵₜ | ln(δₜ), ln(τₜ), Xₜ, Sₜ, qₜ₋₁, qₜ₋₂] = 0

Why the natural log of digital and television ad spend, you may ask? This is by no means a required nor definitive choice in this context, but it demonstrates how variable transformations can capture hypotheses about the relationship between our choice variables and the outcome of interest. In our case, suppose we hypothesize that the impact of ad spend increases sharply at first but gradually levels off. This is exactly what the logarithmic transformation allows us to model. Observe:
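A small sketch of this shape, assuming an illustrative coefficient of 10,000 on ln(δ) (any positive coefficient gives the same qualitative curve):

import matplotlib.pyplot as plt

spend = np.linspace(1_000, 100_000, 500)
effect = 10_000 * np.log(spend)  # β·ln(δ) with an assumed β = 10,000

plt.plot(spend, effect)
plt.xlabel('Ad spend')
plt.ylabel('Contribution to quantity demanded')
plt.title('Logarithmic effect: steep initial gains that level off')
plt.show()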

Note that the functional form of the cost function is generally easier to know internally. Thus, let’s also specify the functional form of our cost function:

(10) C(q, δ, τ) = (ζ − η·q)·q + δ + τ

Here we have a cost ζ associated with each unit produced, and this per-unit cost is discounted (at rate η) as we produce more — think discounts on larger contracts, or economies of scale. We then simply add digital ad spend and television ad spend to total cost.
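As a small sketch of eq. (10) in SymPy (ζ and η here are placeholder symbols; concrete values come later in the article):

q, δ, τ, ζ, η = sm.symbols('q δ τ ζ η')

C = (ζ - η*q)*q + δ + τ  # Per-unit cost ζ, discounted at rate η as output grows

print(sm.diff(C, q))  # Marginal cost: ζ - 2*η*q, falling in q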

Now that we have developed the theoretical basis for our econometric profit maximization problem, let’s simulate some data and take this to Python!

Data Simulation (Optional)

Note this section can be skipped without any loss of the primary content.

Let’s first simulate monthly data over 10 years for quantity demanded, with the following variables:

import pandas as pd

df = pd.DataFrame()

## Digital Advertising - ln(δ)
df['log_digital_advertising'] = np.log(np.random.normal(loc=50000,scale=15000,size=120).round())

## Television Advertising - ln(τ)
df['log_television_advertising'] = np.log(np.random.normal(loc=50000,scale=15000,size=120).round())

## Matrix X of covariates

# Lag Digital Advertising
df['log_digital_advertising_lag1'] = df['log_digital_advertising'].shift(1)
df['log_digital_advertising_lag2'] = df['log_digital_advertising'].shift(2)

# Lag Television Advertising
df['log_television_advertising_lag1'] = df['log_television_advertising'].shift(1)
df['log_television_advertising_lag2'] = df['log_television_advertising'].shift(2)

# Price
df['price'] = np.random.normal(loc=180,scale=15,size=120).round()
df['price_lag1'] = df['price'].shift(1)
df['price_lag2'] = df['price'].shift(2)

# Competitor Price
df['comp_price'] = np.random.normal(loc=120,scale=15,size=120).round()
df['comp_price_lag1'] = df['comp_price'].shift(1)
df['comp_price_lag2'] = df['comp_price'].shift(2)

# Seasonality
from itertools import cycle

months = cycle(['Jan','Feb','Mar','Apr','May','June','July','Aug','Sep','Oct','Nov','Dec'])
df['months'] = [next(months) for m in range(len(df))]

one_hot = pd.get_dummies(df['months'])
one_hot = one_hot[['Jan','Feb','Mar','Apr','May','June','July','Aug','Sep','Oct','Nov','Dec']]
df = df.join(one_hot).drop('months',axis=1)

## Constant
df['constant'] = 1

# Drop NaN (Two lags)
df = df.dropna()

Note that we include lagged variables because it is highly plausible that today’s quantity demanded is a function of lagged values of many of these variables. We also control for seasonality by incorporating dummy variables for each month (one of many ways to incorporate seasonality into a model). We then specify the parameters associated with each variable as follows (note that these parameters are specified in the same order as the columns of the dataframe!):

params = np.array(
[10_000, # β
5_000, # γ
2_000, # Ω
1_000, # Ω
3_000, # Ω
1_000, # Ω
-1_000, # Ω
-500, # Ω
-100, # Ω
500, # Ω
300, # Ω
100, # Ω
25_000, # S
15_000, # S
15_000, # S
10_000, # S
10_000, # S
10_000, # S
15_000, # S
15_000, # S
25_000, # S
35_000, # S
35_000, # S
40_000, # S
50_000 # α
])

We can then simulate our econometric specification (eq. 8) of quantity demanded by running quantity_demanded = np.array(df) @ params. However, this is missing the autoregressive components: as mentioned above, quantity demanded is also a function of its own lagged values. We include two lags here (an AR(2) process) with respective coefficients ϕ1 and ϕ2. We can simulate this process, with initial conditions q₀ and q₋₁, via the following system:

(11) Φq = B, where Φ is the T×T matrix with 1s on the diagonal and −ϕ₁, −ϕ₂ on the first two subdiagonals, and B collects the non-autoregressive terms of eq. (8) plus ϵ, with the initial conditions q₀ and q₋₁ added to its first two entries

def quantity_ar2_process(T, ϕ1, ϕ2, q0, q_1, ϵ, df, params):

    Φ = np.identity(T)  # The T x T identity matrix

    for i in range(T):

        if i-1 >= 0:
            Φ[i, i-1] = -ϕ1

        if i-2 >= 0:
            Φ[i, i-2] = -ϕ2

    B = np.array(df) @ params + ϵ

    B[0] = B[0] + ϕ1 * q0 + ϕ2 * q_1
    B[1] = B[1] + ϕ2 * q0

    return np.linalg.inv(Φ) @ B

## Quantity Demand AR(2) component process

# Parameters
T = 118 # Time periods less two lags
ϕ1 = 0.3 # Lag 1 coefficient (ϕ1)
ϕ2 = 0.05 # Lag 2 coefficient (ϕ2)
q_1 = 250_000 # Initial Condition q_-1
q0 = 300_000 # Initial Condition q_0
ϵ = np.random.normal(0, 5000, size=T) # Random Error (ϵ)

quantity_demanded_ar = quantity_ar2_process(T,ϕ1,ϕ2,q0,q_1,ϵ,df,params)

# Quantity_demanded target variable
df['quantity_demanded'] = quantity_demanded_ar

# Additional covariates of lagged quantity demanded
df['quantity_demanded_lag1'] = df['quantity_demanded'].shift(1)
df['quantity_demanded_lag2'] = df['quantity_demanded'].shift(2)

Econometric Estimation & Optimization

Let’s first use our framework in eq. (2) to transform our constrained optimization problem in eq. (7) into one that we can solve using our constrained_newton_method() function from above:

(12) O(δ, τ, λ, ρ) = −π̂(δ, τ) + λ·(δ + τ − 100,000) − ρ·[ln(τ − 20,000) + ln(δ − 10,000)]

As discussed before, we need to estimate our quantity demanded, q_hat. Let’s take a look at what our quantity demanded looks like over the 10 years simulated:

We can clearly see some seasonality occurring towards the end of each year, and it appears we are dealing with a stationary process (this is all by design).
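As a quick, hedged check of that stationarity claim, we can run an augmented Dickey-Fuller test using statsmodels (a small p-value rejects a unit root):

from statsmodels.tsa.stattools import adfuller

adf_stat, p_value, *_ = adfuller(df['quantity_demanded'])
print(f"ADF statistic: {adf_stat:.2f}, p-value: {p_value:.4f}")

Now suppose that we have the following observed variables: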

where, in eq. 8, our econometric specification, quantity_demanded is our outcome q, log_digital_advertising is our ln(δ), log_television_advertising is our ln(τ), constant is our α, quantity_demanded_lag1 & quantity_demanded_lag2 are our autoregressive components qₜ₋₁ & qₜ₋₂, and the remainder are our additional covariates X, including seasonality S.

Now, with this data, we seek to estimate our econometric specification in eq. 8. We can estimate this structural model using OLS via statsmodels.

A great exercise would be to solve the linear regression using the Newton’s method code we have constructed and compare the results to statsmodels. Hint: the objective in a linear regression is to minimize the residual sum of squares. Note that the code we have written is by no means an efficient approach to solving a linear regression; it is oriented towards illustrating optimization theory in a model-fitting (regression) context. A minimal sketch is provided just below!
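Here is that sketch, using synthetic data rather than our demand data for brevity: because the residual sum of squares is quadratic in the coefficients, a single Newton step lands exactly on the OLS solution (XᵀX)⁻¹Xᵀy.

rng = np.random.default_rng(0)
X_demo = np.column_stack([np.ones(100), rng.normal(size=100)])  # Constant + one regressor
y_demo = X_demo @ np.array([2.0, 5.0]) + rng.normal(size=100)   # True coefficients: (2, 5)

beta = np.zeros(2)  # Starting values

# Gradient and Hessian of RSS(β) = (y - Xβ)ᵀ(y - Xβ)
gradient = -2 * X_demo.T @ (y_demo - X_demo @ beta)
hessian = 2 * X_demo.T @ X_demo

beta = beta - np.linalg.inv(hessian) @ gradient  # One Newton step = OLS solution

print(beta)  # ≈ [2, 5]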

Note that we drop the first two observations, as these are our first two lags, and we drop July as the reference month:

import statsmodels.api as stats

## Fit Econometric model using OLS

df = df[2:] # Drop first two lagged values

y = df['quantity_demanded']
X = df.drop(['quantity_demanded','July'],axis=1)

mod = stats.OLS(y,X)
results = mod.fit()

print(results.summary())

Now we have our estimated econometric specification for quantity demanded! A few observations:

  1. Increases in log digital ad spend and log television ad spend are associated with increases in quantity demanded
  2. An increase in price is associated with a decrease in quantity demanded (this is expected behavior)
  3. We see clear seasonality, with demand increasing during Sep–Dec; this is consistent with our time series above
  4. The first lag of quantity demanded is predictive of the present, consistent with an autoregressive process
  • The results above can be verified against the data construction in the data simulation section (see the comparison sketch below)
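As a sketch of that comparison (assuming the optional simulation section above was run; the index position of July’s coefficient below follows the column order constructed there), we can line up the OLS estimates against the true simulation parameters — they should land close to one another:

## Compare estimated coefficients to true simulation parameters (sketch)

# X dropped 'July' (the reference month), so drop its seasonal effect (index 18)
# from the true-parameter vector and append the AR coefficients ϕ1 and ϕ2
true_params = np.append(np.delete(params, 18), [0.3, 0.05])

comparison = pd.DataFrame(
    {'true': true_params, 'estimated': results.params.values},
    index=X.columns
)
print(comparison)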

Let’s now specify our symbolic variables for our optimization problem (δ, τ, λ, and ρ), the values of our present variables at time t, and grab the lagged values from our data. We will then have everything necessary to construct our optimization problem:

# Build Symbolic Functions with all variables in function
δ, τ, λ, ρ = sm.symbols('δ τ λ ρ')

## Values of current variables
price = 180
comp_price = 120
Jan = 1

## Obtain Lagged Values
log_digital_advertising_lag1 = df['log_digital_advertising_lag1'].iloc[-1]
log_digital_advertising_lag2 = df['log_digital_advertising_lag2'].iloc[-2]
log_television_advertising_lag1 = df['log_television_advertising_lag1'].iloc[-1]
log_television_advertising_lag2 = df['log_television_advertising_lag2'].iloc[-2]
price_lag1 = df['price_lag1'].iloc[-1]
price_lag2 = df['price_lag2'].iloc[-2]
comp_price_lag1 = df['comp_price_lag1'].iloc[-1]
comp_price_lag2 = df['comp_price_lag2'].iloc[-2]
quantity_demanded_lag1 = df['quantity_demanded_lag1'].iloc[-1]
quantity_demanded_lag2 = df['quantity_demanded_lag2'].iloc[-2]

variables = [sm.log(δ),
sm.log(τ),
log_digital_advertising_lag1,
log_digital_advertising_lag2,
log_television_advertising_lag1,
log_television_advertising_lag2,
price,
price_lag1,
price_lag2,
comp_price,
comp_price_lag1,
comp_price_lag2,
Jan,0,0,0,0,0,0,0,0,0,0, # All Months less July
1, # Constant
quantity_demanded_lag1,
quantity_demanded_lag2
]

# Quantity Demanded
quantity_demanded = np.array([variables]) @ np.array(results.params) # params from ols model
quantity_demanded = quantity_demanded[0]

print(quantity_demanded)

where our estimated quantity demanded from eq. 8 becomes:

Estimated Quantity Demanded Function

Now we can construct our revenue and cost functions and put them together to form our profit function. Here the base cost to produce each unit is $140, discounted by $0.0001 for each additional unit produced (i.e., ζ = 140 and η = 0.0001 in eq. (10)):

Revenue = price * quantity_demanded
Cost = quantity_demanded * (140 - 0.0001*quantity_demanded) + τ + δ
profit = Revenue - Cost

print(profit)

Estimated Profit Function

Plotting our profit as a function of digital ad spend and television ad spend, π(δ, τ):
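A minimal sketch of how such a surface plot can be generated with matplotlib, assuming the symbolic profit function from above:

import matplotlib.pyplot as plt

profit_fn = sm.lambdify([δ, τ], profit)  # Convert symbolic profit to a NumPy-callable function

d_grid = np.linspace(10_000, 100_000, 100)
t_grid = np.linspace(20_000, 100_000, 100)
D, Tv = np.meshgrid(d_grid, t_grid)

fig = plt.figure()
ax = fig.add_subplot(projection='3d')
ax.plot_surface(D, Tv, profit_fn(D, Tv), cmap='viridis')
ax.set_xlabel('Digital ad spend (δ)')
ax.set_ylabel('Television ad spend (τ)')
ax.set_zlabel('Profit π(δ, τ)')
plt.show()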

Let’s now solve our optimization problem as formulated in eq. 12 in Python, using the optimization theory we have learned from part 1 and part 2 of this series. Note that the extremely high starting value of ρ accounts for the fact that our objective function takes extremely large values; the penalization must be large enough to keep Newton’s method from “jumping” out of the feasible region.

## Optimization Problem

objective = -profit + λ*(τ + δ - 100_000) - ρ*sm.log((τ-20_000)*(δ-10_000))

symbols = [δ, τ, λ, ρ]
x0 = {δ:20_000, τ:80_000, λ:0, ρ:100000}

results = constrained_newton_method(objective,symbols,x0,iterations=1000,mute=False)

with the corresponding output:

Thus, our solution is to allocate ~$66,831 to digital advertising and ~$33,169 to television advertising. These values correspond to:

digital_ad = results[δ]
television_ad = results[τ]

quantity = quantity_demanded.evalf(subs={δ:digital_ad,τ:television_ad})
revenue = Revenue.evalf(subs={δ:digital_ad,τ:television_ad})
cost = Cost.evalf(subs={δ:digital_ad,τ:television_ad})
profit = revenue - cost

print(f"Quantity: {int(quantity):,}")
print(f"Total Revenue: ${round(revenue,2):,}")
print(f"Total Cost: ${round(cost,2):,}")
print(f"Profit: ${round(profit,2):,}")

And there you have it!

