Thursday, April 30, 2026

A Gentle Introduction to Stochastic Programming


In my first TDS post, I wrote about how to translate a real-world problem into an integer linear program. In my second, I wrote about how to make that program robust against uncertainty. Both were variations on the same idea: take a fuzzy real-world question, squeeze it into an LP, and let a solver do the rest.

There’s a moment in every optimizer’s life, though, when the LP starts to feel a bit too neat. Demand is a number. Travel time is a number. Wind speed is a number. The model accepts the input, returns an optimal solution, and goes on its way. The reality those numbers were supposed to describe (messy, jittery, and occasionally surprising) doesn’t really show up anywhere.

Stochastic programming is the field that takes that discomfort seriously. Instead of pretending the data is exact, it builds the uncertainty directly into the model. The price you pay is a little more notation; the payoff is decisions that hold up when the world doesn’t cooperate.

This post is a gentle tour of the basics. We’ll see why the obvious approach doesn’t work, walk through the four standard ways to handle uncertainty in a linear program, and finish with a quick sanity check on whether any of this is worth the effort. There’s some math, but it’s the same math you already know from LP, with one extra symbol attached.

Starting point: a fashion company with a bad crystal ball

To make this concrete, we’ll use the running example from Dr. Ruben van Beesten’s lectures (more on that in the credits below). It goes like this.

You run a fashion company that sells winter clothing in Germany. Production happens in Bangladesh, which is cheap but slow: the goods take several weeks to arrive. So in the fall, you have to decide how much to produce for the upcoming winter season.

Two ways this can go wrong: produce too little, and you lose sales; produce too much, and you’re stuck with stock you can’t sell. The whole question is how much to produce now, and the answer depends on something you don’t actually know yet: winter demand.

If you ignored the uncertainty for a moment and pretended demand was a fixed number, you could write down a vanilla LP:
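In symbols, a standard way to write that LP (with x, c, h, and T defined just below) is:

```latex
\min_{x \ge 0} \; c^\top x \quad \text{s.t.} \quad T x \ge h
```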

Here x is how much you produce, c is the unit production cost, h is demand, and T is just the identity matrix (one unit produced satisfies one unit of demand). The constraint says: produce at least as much as is demanded.

This is fine if h is actually known. The trouble is that demand isn’t a number, it’s a random variable. Let’s call it ξ. The honest version of the model would look like this:
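With the randomness made explicit, the same model (in one standard notation) becomes:

```latex
\min_{x \ge 0} \; c^\top x \quad \text{s.t.} \quad T(\xi)\, x \ge h(\xi)
```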

And here we hit a wall. What does it mean for x to satisfy a constraint that depends on a random variable? Is x = 100 feasible if demand might be 80, might be 120, and could be anywhere in between? The problem isn’t hard to solve: it’s ill-defined. The solver doesn’t even know which problem you’re asking it to solve.

Stochastic programming is, in essence, a collection of principled answers to that question. We’ll look at the four most common ones.

Four ways to handle the uncertainty

Each of the four approaches takes the ill-defined LP above and turns it into a well-defined optimization problem. They differ in what they assume you know about the uncertainty, and in how cautious they are about bad outcomes.

1. Robust optimization: prepare for the worst

The most cautious approach. You don’t need to know the full probability distribution of ξ, only its support, i.e., the set of values it can possibly take. We call this set the uncertainty set, written U. Then you ask: what is the best decision that stays feasible no matter which ξ ∈ U actually shows up?
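In one standard notation, the robust version of our LP reads:

```latex
\min_{x \ge 0} \; c^\top x \quad \text{s.t.} \quad T(\xi)\, x \ge h(\xi) \quad \forall\, \xi \in U
```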

The constraint now has to hold for every ξ in the uncertainty set. In our fashion example with U = [0, 10], you’d be planning for a demand of 10, the worst case, every time.

That’s the strength and the weakness of robust optimization in one sentence. The solution is bulletproof, but it’s also conservative: you’ll often be sitting on inventory you didn’t need, because you planned as if the unlikely worst case were guaranteed. If you’ve read my earlier post on robustifying linear programs, this is exactly the framework that sits behind those four steps.

2. Chance constraints: relax the worst case

Robust optimization plans for every possible outcome. Chance constraints relax that to: plan for most of them. You pick a probability level α, say 95%, and require the constraint to hold with at least that probability:
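For our LP, the chance-constrained version (in one standard notation) is:

```latex
\min_{x \ge 0} \; c^\top x \quad \text{s.t.} \quad \mathbb{P}\bigl(T(\xi)\, x \ge h(\xi)\bigr) \ge \alpha
```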

This is called a joint chance constraint: all entries of the constraint vector must be satisfied simultaneously, with joint probability ≥ α. A weaker variant treats each row separately:
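In the same notation, with the subscript i picking out the i-th row:

```latex
\min_{x \ge 0} \; c^\top x \quad \text{s.t.} \quad \mathbb{P}\bigl(T_i(\xi)\, x \ge h_i(\xi)\bigr) \ge \alpha_i \quad \forall\, i
```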

These are individual chance constraints: each constraint i must hold with probability at least αᵢ, but you don’t care about the joint event. Quick exercise: if you set every αᵢ equal to the joint α, which formulation is more conservative?

Answer: the joint version. Satisfying all constraints simultaneously is a stricter requirement than satisfying each one in isolation, so the joint formulation has a smaller feasible region and a worse (higher) optimal cost. Either way, chance constraints give you a knob, α, to dial how cautious you want to be. Crank it to 1, and you’re back to (almost) robust. Drop it to 0.5, and you’re basically flipping a coin on feasibility. Most real applications live somewhere in the 0.9–0.99 range.

There’s a catch worth flagging: chance constraints are hard in general. The probability term inside the constraint is a non-linear, often non-convex function of x, so you usually can’t hand the formulation directly to a standard LP solver. There are tractable special cases (Gaussian noise, certain mixtures of distributions, sample-based approximations), but the general problem is harder than it looks at first glance.
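To make the sample-based route concrete: for our single-constraint fashion example, the cheapest x satisfying the chance constraint is just the α-quantile of demand. A minimal sketch, with an invented uniform demand distribution:

```python
import math
import random

random.seed(0)

# Invented setup: demand is uniform on [0, 10]; find the cheapest production
# level x such that P(x >= demand) >= alpha, using plain samples.
alpha = 0.95
samples = sorted(random.uniform(0, 10) for _ in range(10_000))

# The cheapest feasible x is the empirical alpha-quantile of the demand samples.
k = math.ceil(alpha * len(samples)) - 1
x_chance = samples[k]

coverage = sum(d <= x_chance for d in samples) / len(samples)
print(round(x_chance, 2), coverage)  # x_chance lands near the true 95% quantile, 9.5
```

Note that dialing α up to 1 makes x_chance the sample maximum, i.e., the robust solution on the sampled uncertainty set; α is exactly the caution knob described above.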

3. Two-stage recourse models: decide, observe, correct

The first two approaches treat constraint violation as something to avoid, either always (robust) or with high probability (chance). Sometimes that’s the wrong frame. In our fashion example, falling short of demand isn’t catastrophic. It’s annoying. You can usually fix it: produce a small emergency batch in Germany at a higher cost, or ship by air, or simply accept the lost sales and move on.

This idea, that violating a constraint isn’t the end of the world because you can take a corrective action later, is the heart of recourse models. In the two-stage version, the timeline looks like this:

  • Stage 1 (now): you make a first-stage decision x while ξ is still uncertain.
  • Then: ξ is realized, i.e., the random variable becomes a known number.
  • Stage 2 (later): you make a second-stage decision y, knowing ξ.

Mathematically, the first stage looks almost like a vanilla LP, except the objective now contains an expected future cost:
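In one standard notation, the first-stage problem is:

```latex
\min_{x \ge 0} \; c^\top x + \mathbb{E}_{\xi}\bigl[v(\xi, x)\bigr]
```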

The function v(ξ, x) is the optimal value of the second-stage problem, given that you chose x in the first stage and that ξ turned out to be the realized value:
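With W denoting the recourse matrix (more on that below) and q(ξ) the recourse cost vector, a standard way to write the second stage is:

```latex
v(\xi, x) = \min_{y \ge 0} \; q(\xi)^\top y \quad \text{s.t.} \quad W y \ge h(\xi) - T(\xi)\, x
```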

Read this carefully. The right-hand side, h(ξ) − T(ξ) x, is the shortfall: how much your first-stage decision failed to cover, once ξ was revealed. The recourse decision y then closes that gap, at a cost q(ξ) y. So the structure is: pay the up-front cost c x, and on top of it pay the expected cost of cleaning up after the random variable does its thing.

That’s the whole idea. Two-stage recourse models are by far the most common formulation in practice, partly because they capture the actual chronology of decisions in many real problems (production planning, inventory, energy dispatch, scheduling), and partly because they are relatively well-behaved mathematically.

A couple of pieces of vocabulary you’ll trip over if you read further:

  • A model has fixed recourse if the recourse matrix W doesn’t depend on ξ. Many algorithms only work in this case.
  • A model has (relatively) complete recourse if there is always a feasible recourse decision y, no matter what ξ turns out to be and no matter which x you chose. If complete recourse fails, the second-stage problem can be infeasible, which becomes an implicit constraint on the first stage. (This is exactly where Benders’ feasibility cuts come from, but that’s a story for another post.)

4. Multi-stage recourse models: keep going

Sometimes life isn’t two stages. You don’t just decide-observe-correct once and go home; you decide, observe, decide, observe, decide, … over and over. Multi-stage recourse models are the natural extension.

In our fashion example, suppose we’re no longer choosing once in the fall, but three times: in the fall (cheap, in Bangladesh), in early winter (more expensive, in Romania), and in late winter (most expensive, in Germany). Demand is gradually revealed over the season, and at each stage we decide based on what we’ve observed so far.

The notation gets heavier (you end up writing recursive value functions Qₜ, with histories ξ[t] = (ξ₁, …, ξₜ) hanging off them), but conceptually nothing new is going on. Each stage is a recourse problem nested inside the previous one. The natural way to picture this is as a scenario tree: each node is a state of the world, each branch is a possible realization of the next random variable, and a scenario is a complete root-to-leaf path.

Example of a three-stage scenario tree. Source: course slides by Dr. Ruben van Beesten.

One subtlety. A scenario is the entire trajectory of ξ, not just one realization. Knowing that ξ₂ = 10 doesn’t tell you which scenario you’re in, because ξ₃ hasn’t happened yet. This matters when you start writing the deterministic equivalent (next section), because you have to be careful that your decisions only depend on information that has actually been observed by the time the decision is made. That property is called non-anticipativity: you can’t anticipate the future. The model would happily cheat if you didn’t enforce it explicitly.

How do we actually solve a recourse model?

So far we’ve been writing models. To solve them, we typically transform them into something a standard LP solver can chew on. The trick is the deterministic equivalent formulation.

Suppose the random variable ξ has a discrete distribution: it takes finitely many values ξ¹, ξ², …, ξˢ (called scenarios), each with probability pₛ. Then the expected second-stage cost is just a finite sum, and we can write the entire two-stage problem as one big LP by introducing one copy of y per scenario:
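In one standard notation, with one recourse vector yˢ per scenario:

```latex
\min_{x \ge 0,\; y^1, \dots, y^S \ge 0} \;
  c^\top x + \sum_{s=1}^{S} p_s \, q(\xi^s)^\top y^s
\quad \text{s.t.} \quad
  W y^s \ge h(\xi^s) - T(\xi^s)\, x \quad \forall\, s
```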

That’s a regular LP. Big, possibly very big (with S scenarios you’ve essentially copied the second stage S times), but an LP nonetheless. You can hand it straight to HiGHS, Gurobi, CPLEX, or whatever solver you like, and it will solve it.
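As a sketch of what this looks like in code, here is a minimal deterministic equivalent of the fashion example with three invented demand scenarios, solved with SciPy’s `linprog` (which uses HiGHS under the hood): x is first-stage production at unit cost 1, and each scenario gets its own emergency-production variable at unit cost 3.

```python
from scipy.optimize import linprog

# Invented toy data: cheap production now vs. expensive emergency production later.
c, q = 1.0, 3.0
demands = [2.0, 6.0, 10.0]   # scenario demands
probs = [0.3, 0.4, 0.3]      # scenario probabilities
S = len(demands)

# Decision vector z = [x, y_1, ..., y_S]: one first-stage x, one recourse y per scenario.
# Each scenario's recourse cost is weighted by its probability (the finite-sum expectation).
obj = [c] + [p * q for p in probs]

# Demand coverage x + y_s >= d_s, written as -(x + y_s) <= -d_s for linprog.
A_ub = [[-1.0] + [-1.0 if j == s else 0.0 for j in range(S)] for s in range(S)]
b_ub = [-d for d in demands]

res = linprog(obj, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * (S + 1))
print(res.x[0], res.fun)  # optimal first-stage production 6.0, expected cost 9.6
```

Note how the solver hedges: it produces for the middle scenario up front and leaves the rare high-demand tail to the (expensive) recourse variables.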

Two natural questions follow.

First: what if the distribution of ξ is not discrete? In that case the deterministic equivalent has infinitely many scenarios and isn’t finite-dimensional. The standard fix is sample average approximation: draw a sample of size S from the true distribution, solve the sampled deterministic equivalent, and let S grow until your solution stabilizes statistically. There’s a whole literature on how large S needs to be and what guarantees you get.

Second: what if the deterministic equivalent is too big to solve directly? This is where decomposition methods come in. Benders’ decomposition splits the problem into a master problem in the first-stage variables and a subproblem per scenario, then iteratively passes information between them. For multi-stage models with many stages, the analogous trick is stochastic dual dynamic programming (SDDP), which uses sampling and approximate value functions to avoid building the full scenario tree. Both are advanced enough to deserve their own posts, so I’ll come back to them later.

Is any of this actually worth the hassle?

Honest question. Stochastic programs are messier to formulate, harder to solve, and slower to run than their deterministic cousins. If your real-world problem isn’t very sensitive to uncertainty, you might be better off simply plugging the expected demand into a regular LP and calling it a day.

The good news is that you can quantify exactly how much the stochastic formulation buys you. There are two classical metrics, and both are worth knowing.

Define four numbers:
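In the notation of the two-stage model above (each is spelled out in words right below):

```latex
\begin{aligned}
\mathrm{SP}  &= \min_{x \ge 0} \; c^\top x + \mathbb{E}\bigl[v(\xi, x)\bigr] \\
\mathrm{EV}  &= \min_{x \ge 0} \; c^\top x + v\bigl(\mathbb{E}[\xi],\, x\bigr),
               \quad \text{with minimizer } \bar{x} \\
\mathrm{EEV} &= c^\top \bar{x} + \mathbb{E}\bigl[v(\xi, \bar{x})\bigr] \\
\mathrm{WS}  &= \mathbb{E}\Bigl[\, \min_{x \ge 0} \; c^\top x + v(\xi, x) \Bigr]
\end{aligned}
```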

In words: SP is the optimal value of the actual stochastic program. EV is what you get if you replace ξ with its expected value and solve the resulting deterministic problem; call its solution x̄. EEV is the expected cost of implementing that deterministic solution x̄ in the actual stochastic world. And WS (“wait-and-see”) is the expected cost if you got to peek at the realized ξ before deciding x, the cheating-but-best case.

From these four numbers you can build two highly informative quantities:
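For a minimization problem (smaller is better), both are defined so that they come out non-negative:

```latex
\mathrm{VSS} = \mathrm{EEV} - \mathrm{SP},
\qquad
\mathrm{EVPI} = \mathrm{SP} - \mathrm{WS}
```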

VSS is the Value of the Stochastic Solution: how much worse off you’d be if you just solved the deterministic problem with average values and implemented its solution. If VSS is small, the stochastic program isn’t buying you much; the deterministic shortcut is fine.

EVPI is the Expected Value of Perfect Information: how much you’d gain if a benevolent oracle handed you the realized ξ before you had to decide. If EVPI is small, your forecasts already contain most of the information you need; investing in better predictions probably won’t move the needle. If EVPI is large, better data has real value.

Explanation of useful metrics for a stochastic program.

The two metrics ride along on a tidy chain of inequalities (assuming uncertainty only on the right-hand side):
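For a minimization problem, the chain is:

```latex
\mathrm{EV} \;\le\; \mathrm{WS} \;\le\; \mathrm{SP} \;\le\; \mathrm{EEV}
```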

Read it left to right: cheating-with-the-mean (EV) is at most as bad as cheating-with-the-realization (WS), which is at most as bad as the honest stochastic answer (SP), which is at most as bad as plugging-in-the-deterministic-solution-and-living-with-it (EEV). The chain implies a free upper bound on VSS that you can compute before you ever solve the SP: VSS ≤ EEV − EV. If that gap is tiny, the deterministic shortcut is good enough and you can save yourself the headache.
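As a sanity check on the chain, here is a tiny invented version of the fashion problem (produce now at unit cost 1, cover any shortfall later at unit cost 3) where all four numbers can be computed by brute force:

```python
# Invented toy problem: produce x now at unit cost 1; any shortfall is covered
# later at unit cost 3. Demand takes one of three values.
demands = [2.0, 6.0, 20.0]
probs = [0.2, 0.5, 0.3]
c, q = 1.0, 3.0

def cost(x, d):
    """Total cost if we produce x and demand turns out to be d."""
    return c * x + q * max(d - x, 0.0)

def expected_cost(x):
    return sum(p * cost(x, d) for p, d in zip(probs, demands))

mean_d = sum(p * d for p, d in zip(probs, demands))

# EV: pretend demand equals its mean; the best response is to produce exactly that.
EV = cost(mean_d, mean_d)

# EEV: implement the mean-value solution in the real stochastic world.
EEV = expected_cost(mean_d)

# SP: the honest stochastic optimum. The expected cost is piecewise linear with
# kinks at the scenario demands, so enumerating those stands in for a solver.
SP = min(expected_cost(d) for d in demands)

# WS: peek at the realized demand first, then produce exactly what is needed.
WS = sum(p * cost(d, d) for p, d in zip(probs, demands))

print(EV <= WS <= SP <= EEV)   # True: the chain holds
print(EEV - SP, SP - WS)       # VSS and EVPI
```

Here the mean-value solution overproduces relative to the stochastic optimum, so VSS is positive; EVPI is larger, since knowing demand in advance removes all the expensive recourse.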

Where to go from here

This post stuck to the basics: how to write a stochastic program down. The natural next step is how to solve large ones efficiently. The two big workhorses are:

  • Benders’ decomposition: for two-stage models, decomposes the deterministic equivalent into a master problem (in x) plus one subproblem per scenario, and reconciles them with cuts. Particularly elegant when you have lots of scenarios but a relatively small first stage.
  • Stochastic Dual Dynamic Programming (SDDP): for multi-stage models, uses sampling and piecewise-linear approximations of the future value functions. Famously used in hydropower scheduling, where the scenario tree is so large that explicit enumeration is hopeless.

Both deserve their own posts. If there’s interest, I’ll write them up.

Takeaway

If you’re using LPs in any context where the input data is genuinely uncertain (forecasted demand, weather, prices, travel times, or anything else), then your model is making an implicit choice about how to handle that uncertainty. “Just use the mean” is a choice. So is “plan for the worst.” Stochastic programming gives you the vocabulary to make that choice explicit, and the tools to evaluate whether your choice was a good one (hello, VSS).

To summarize the four main ways to model uncertainty in an LP:

  1. Robust optimization: plan for the worst case in a given uncertainty set.
  2. Chance constraints: require feasibility with at least probability α.
  3. Two-stage recourse: decide, observe, correct; pay an expected recourse cost.
  4. Multi-stage recourse: the same idea, repeated over time on a scenario tree.

And two metrics worth keeping in your back pocket: VSS (does the stochastic model help?) and EVPI (would better forecasts help?).

Most real problems aren’t deterministic. The good news is your modeling toolkit doesn’t have to be either.

Credits and references

This post is based on lectures by Dr. Ruben van Beesten (Norwegian University of Science and Technology) from his course on Stochastic Programming given in October 2023, which I had the pleasure of attending in Trondheim, Norway. The fashion-company example, the four-way taxonomy of formulations, and the VSS/EVPI framing all come straight from his slides; any clumsiness in the retelling is mine.

The original modeling exercise that motivates much of the recourse-model intuition is from:

  • Higle, J. L. (2005). Stochastic Programming: Optimization When Uncertainty Matters. In INFORMS TutORials in Operations Research, pp. 30–53.

A couple of further pointers worth knowing about:

  • Kleywegt, A. J., Shapiro, A., and Homem-de-Mello, T. (2002). The sample average approximation method for stochastic discrete optimization. SIAM Journal on Optimization, 12(2), 479–502. The standard reference for SAA.
  • Higle, J. L., and Sen, S. (1991). Stochastic decomposition: an algorithm for two-stage linear programs with recourse. Mathematics of Operations Research, 16(3), 650–669. One of the few methods that handles non-discrete distributions directly.

And of course, the two earlier posts in this series: 5 questions that will help you model integer linear programs better and 4 steps to robustify your linear program.
