Saturday, January 24, 2026

Flexible discrete choice modeling using a multinomial probit model, part 1


\(\newcommand{\xb}{{\bf x}}
\newcommand{\betab}{\boldsymbol{\beta}}
\newcommand{\zb}{{\bf z}}
\newcommand{\gammab}{\boldsymbol{\gamma}}\)We have no choice but to choose

We make choices every day, and often these choices are made among a finite number of possible alternatives. For example, do we take the car or ride a bike to get to work? Do we have dinner at home or eat out, and if we eat out, where do we go? Scientists, marketing analysts, or political consultants, to name a few, would like to find out why people choose what they choose.

In this post, I provide some background on discrete choice models, specifically, the multinomial probit model. I discuss this model from a random utility model perspective and show you how to simulate data from it. This is helpful for understanding the underpinnings of the model. In my next post, we will use the simulated data to demonstrate how to estimate and interpret effects of interest.

Random utility model and discrete choice

A person faced with a discrete set of alternatives is assumed to choose the alternative that maximizes his or her utility in some defined way. Utilities are usually conceived of as the result of a function that consists of an observed deterministic part and an unobserved random part, because not all factors that may be relevant for a given decision can be observed. The frequently used linear random utility model is

\[U_{ij} = V_{ij} + \epsilon_{ij}, \hspace{5mm} j = 1,\ldots,J\]

where \(U_{ij}\) is the utility of the \(i\)th individual related to the \(j\)th alternative, \(V_{ij}\) is the observed component, and \(\epsilon_{ij}\) is the unobserved component. In the context of regression modeling, the observed part, \(V_{ij}\), is typically construed as some linear or nonlinear combination of observed characteristics related to individuals and alternatives and corresponding parameter estimates, while the parameters are estimated based on a model that makes certain assumptions about the distribution of the unobserved components, \(\epsilon_{ij}\).

Motivating example

Let’s take a look at an example. Suppose that individuals can enroll in one of three health insurance plans: Sickmaster, Allgood, and Cowboy Health. Thus we have the following set of alternatives:

\[s=\{\mathrm{Sickmaster},\mathrm{Allgood},\mathrm{Cowboy\, Health}\}\]

We would expect a person’s utility related to each of the three alternatives to be a function of both personal characteristics (such as income or age) and characteristics of the health care plan (such as its price).

We might sample individuals and ask them which health plan they would prefer if they had to enroll in one of them. If we collected data on the person’s age (in decades), the person’s household income (in $10,000), and the price of a plan (in $100/month), then our data might look something like the first three cases from the simulated data below:


. list in 1/9, sepby(id)

     +-----------------------------------------------------------+
     | id             alt   choice   hhinc   age   price       U |
     |-----------------------------------------------------------|
  1. |  1      Sickmaster        1    3.66   2.1    2.05    2.38 |
  2. |  1         Allgood        0    3.66   2.1    1.73   -1.04 |
  3. |  1   Cowboy Health        0    3.66   2.1    1.07   -2.61 |
     |-----------------------------------------------------------|
  4. |  2      Sickmaster        0    3.75   4.2    2.19   -2.97 |
  5. |  2         Allgood        1    3.75   4.2    1.12    0.29 |
  6. |  2   Cowboy Health        0    3.75   4.2    0.78   -2.22 |
     |-----------------------------------------------------------|
  7. |  3      Sickmaster        0    2.32   2.4    2.25   -4.49 |
  8. |  3         Allgood        0    2.32   2.4    1.31   -5.76 |
  9. |  3   Cowboy Health        1    2.32   2.4    1.02    1.19 |
     +-----------------------------------------------------------+

Taking the first case (id==1), we see that the case-specific variables hhinc and age are constant across alternatives and that the alternative-specific variable price varies over alternatives.

The variable alt labels the alternatives, and the binary variable choice indicates the chosen alternative (it is coded 1 for the chosen plan, and 0 otherwise). Because this is a simulated dataset, we know the underlying utilities that correspond to each alternative, and those are given in variable U. The first respondent’s utility is highest for the first alternative, and so the outcome variable choice takes the value 1 for alt=="Sickmaster" and 0 otherwise. This is the marginal distribution of cases over alternatives:


. tabulate alt if choice == 1

    Insurance |
         plan |      Freq.     Percent        Cum.
--------------+-----------------------------------
   Sickmaster |      6,315       31.57       31.57
      Allgood |      8,308       41.54       73.11
Cowboy Health |      5,377       26.89      100.00
--------------+-----------------------------------
        Total |     20,000      100.00

As we will see below, a useful model for analyzing these types of data is the multinomial probit model.

Multinomial probit model

The multinomial probit model is a discrete choice model that is based on the assumption that the unobserved components \(\epsilon_{ij}\) come from a normal distribution. Different probit models arise from different specifications of \(V_{ij}\) and different assumptions about \(\epsilon_{ij}\). For example, with a basic multinomial probit model, as is implemented in Stata’s mprobit command (see [R] mprobit), we specify \(V_{ij}\) to be

\[V_{ij} = \xb_{i}\betab_{j}'\]

where \(\xb_{i}\) is a vector of individual-specific covariates, and \(\betab_{j}\) is the corresponding parameter vector for alternative \(j\). The random components \(\epsilon_{ij}\) are assumed to come from a multivariate normal distribution with mean zero and identity variance–covariance matrix. For example, if we had three alternatives, we would assume

\begin{equation*}
\epsilon_{ij} \sim \mathcal{N}(0,\Sigma), \hspace{5mm}
\Sigma =
\begin{bmatrix}
1 & 0 & 0 \\
  & 1 & 0 \\
  &   & 1
\end{bmatrix}
\end{equation*}

Specifying the above covariance structure means that the unobserved components, \(\epsilon_{ij}\), are assumed to be homoskedastic and independent across alternatives.

Independence implies that differences in utility between any two alternatives depend on those two alternatives but not on any of the other alternatives. This property is known as the independence from irrelevant alternatives (IIA) assumption. When the IIA assumption holds, it can lead to a number of convenient advantages, such as studying only a subset of alternatives (see Train [2009, 48]). However, IIA is a rather restrictive assumption that might not hold.

Continuing with our health care plan example, suppose that Sickmaster and Allgood both favor people with health problems, while Cowboy Health favors people who only rarely see a doctor. In this case, we would expect the utilities that correspond to alternatives Sickmaster and Allgood to be positively correlated with each other while being negatively correlated with the utility corresponding to Cowboy Health. In other words, utilities with respect to alternatives Sickmaster and Allgood are related to those of Cowboy Health. In this case, we must use a model that relaxes the IIA assumption and allows for correlated utilities across alternatives.

Another potential limitation of our multinomial probit specification concerns the observed \(V_{ij}\), which consists of the linear combination of individual-specific variables and alternative-specific parameters. In other words, we only consider observed variables that vary over individuals but not over alternatives. To also accommodate variables such as the price of a plan, which varies over alternatives, we would use

\[V_{ij} = \xb_{i}\betab_{j}' + \zb_{ij}\gammab'\]

where \(\zb_{ij}\) are alternative-specific variables that vary both over individuals and alternatives and \(\gammab\) is the corresponding parameter vector. Combining this with our more flexible assumptions about the unobservables, we can write our model as

\[U_{ij} = \xb_{i}\betab_{j}' + \zb_{ij}\gammab' + \epsilon_{ij}, \hspace{5mm} j = 1,\ldots,J\]

with \(\epsilon_{ij} \sim \mathcal{N}(0,\Sigma)\).

Assuming unstructured correlation and heteroskedastic errors across \(J=3\) alternatives, for example, \(\Sigma\) is given by

\begin{equation*}
\Sigma =
\begin{bmatrix}
\sigma_{11} & \sigma_{12} & \sigma_{13} \\
            & \sigma_{22} & \sigma_{23} \\
            &             & \sigma_{33}
\end{bmatrix}
\end{equation*}

As we will see later, we can fit this model in Stata with the asmprobit command; see [R] asmprobit for details about the command and the implemented methods.

We said in our health plan example that we think that the price that person \(i\) has to pay for the plan is important and that it may vary both over individuals and alternatives. We can therefore write our utility model for three alternatives as

\[U_{ij} = \beta_{j,\mathtt{cons}} + \beta_{j,\mathtt{hhinc}}{\tt hhinc}_{i} + \beta_{j,\mathtt{age}}{\tt age}_{i} + \gamma\,{\tt price}_{ij} + \epsilon_{ij}, \hspace{5mm} j = 1,2,3\]

Simulation

We can simulate data assuming the data-generating process given in the above model. We will specify the two case-specific variables, household income (hhinc) and age (age), and we will take the price of the plan (price) as the alternative-specific variable. The case-specific variables hhinc and age will be constant across alternatives within each individual, while the alternative-specific variable price will vary over individuals and within individuals over alternatives.

We specify the following population parameters for \(\betab_{j}\) and \(\gamma\):

\begin{align*}
\beta_{1,\mathtt{cons}} &= -1, &\beta_{1,\mathtt{hhinc}} &= \hspace{2.7mm} 1, &\beta_{1,\mathtt{age}} &= -1 \\
\beta_{2,\mathtt{cons}} &= -6, &\beta_{2,\mathtt{hhinc}} &= \hspace{2.7mm} 0.5, &\beta_{2,\mathtt{age}} &= \hspace{2.7mm} 1 \\
\beta_{3,\mathtt{cons}} &= \hspace{2.7mm} 2, &\beta_{3,\mathtt{hhinc}} &= -1, &\beta_{3,\mathtt{age}} &= \hspace{2.7mm} 0.5 \\
\gamma &= -0.5
\end{align*}

For \(\epsilon_{ij}\), we will specify the following:

\begin{equation*}
\epsilon_{ij} \sim \mathcal{N}(0,\Sigma), \hspace{5mm}
\Sigma =
\begin{bmatrix}
2.1 & 0.6 & -0.5 \\
    & 1.7 & -0.8 \\
    &     & 1.4
\end{bmatrix}
\end{equation*}

With these specifications, we can now create a simulated dataset. We start by drawing our three error terms and two case-specific covariates:


. clear

. set seed 65482

. set obs 20000
number of observations (_N) was 0, now 20,000

. generate id = _n

. scalar s11 =  2.1

. scalar s22 =  1.7

. scalar s33 =  1.4

. scalar s12 =  0.6

. scalar s13 = -0.5

. scalar s23 = -0.8

. mat C = (s11,s12,s13) \
>         (s12,s22,s23) \
>         (s13,s23,s33)

. drawnorm e1 e2 e3, cov(C)

. generate double hhinc = max(0,rnormal(5,1.5))

. generate double age = runiformint(20,60)/10
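For readers who want to mirror this step outside Stata, here is a rough Python equivalent (an illustrative sketch only, not the code behind the results below; the seed is reused arbitrarily, and Python's random streams differ from Stata's):

```python
import numpy as np

rng = np.random.default_rng(65482)  # arbitrary seed, not Stata's stream
n = 20_000

# Population covariance of the unobserved components, as specified above
Sigma = np.array([[ 2.1,  0.6, -0.5],
                  [ 0.6,  1.7, -0.8],
                  [-0.5, -0.8,  1.4]])

# Correlated errors, analogous to: drawnorm e1 e2 e3, cov(C)
e = rng.multivariate_normal(np.zeros(3), Sigma, size=n)

# Case-specific covariates, mirroring the generate commands above
hhinc = np.maximum(0, rng.normal(5, 1.5, size=n))
age = rng.integers(20, 61, size=n) / 10  # runiformint(20,60)/10
```

With 20,000 draws, the sample covariance of `e` should sit close to the specified \(\Sigma\).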

To allow for alternative-specific covariates, we will expand our data so that we have one observation for each alternative for each case, then create an index for the alternatives, and then generate our variable \({\tt price}_{ij}\):


. expand 3
(40,000 observations created)

. bysort id : gen alt = _n

. generate double price = rbeta(2,2) + 1.50 if alt == 1
(40,000 missing values generated)

. replace         price = rbeta(2,2) + 0.75 if alt == 2
(20,000 real changes made)

. replace         price = rbeta(2,2) + 0.25 if alt == 3
(20,000 real changes made)

We can now go ahead and generate three variables for the observed utility components, one for each alternative:


. generate double xb1 = -1.0 + 1.0*hhinc - 1.0*age - 0.5*price

. generate double xb2 = -6.0 + 0.5*hhinc + 1.0*age - 0.5*price

. generate double xb3 =  2.0 - 1.0*hhinc + 0.5*age - 0.5*price

To calculate the utilities that correspond to each alternative, we add the unobserved to the observed components:


. local snorm = sqrt((s11 + s22 - 2*s12)/2)

. generate double U1 = xb1*`snorm' + e1

. generate double U2 = xb2*`snorm' + e2

. generate double U3 = xb3*`snorm' + e3

Looking at the code above, you will notice that we included a factor to scale our specified population parameters. This is due to identification details related to our model that I explain further in the Identification section. One thing we need to know now, however, is that for the model to be identified, the utilities must be normalized for level and scale. Normalizing for level is easy because, since we are only interested in the utilities relative to each other, we can define a base-level alternative and then take the differences of utilities with respect to the set base. If we set the first alternative as the base, we can rewrite our model as follows:
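As a quick arithmetic check (in Python rather than Stata), the scale factor used in the code above works out to \(\sqrt{(2.1+1.7-2\times 0.6)/2}=\sqrt{1.3}\):

```python
import math

# Variance parameters from the specified Sigma (see above)
s11, s22, s12 = 2.1, 1.7, 0.6

# Scale factor applied to the observed utility components in the Stata code
snorm = math.sqrt((s11 + s22 - 2 * s12) / 2)
print(round(snorm, 4))  # 1.1402
```

Why this particular factor identifies the model is explained in the Identification section below.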

\begin{align*}
U^{*}_{ij} &= \beta_{j,\mathtt{cons}}-\beta_{1,\mathtt{cons}} + (\beta_{j,\mathtt{hhinc}}-\beta_{1,\mathtt{hhinc}}){\tt hhinc}_{i} + (\beta_{j,\mathtt{age}}-\beta_{1,\mathtt{age}}){\tt age}_{i} \\
&\quad + \gamma\,({\tt price}_{ij}-{\tt price}_{i1}) + \epsilon_{ij}-\epsilon_{i1}, \hspace{5mm} j = 2,3
\end{align*}

This implies that only \(J-1\) parameter vectors in \(\betab\) are identified. Let’s define these parameters as

\begin{align*}
\Delta \beta_{j,\mathtt{cons}} &= \beta_{j,\mathtt{cons}}-\beta_{1,\mathtt{cons}} \\
\Delta \beta_{j,\mathtt{hhinc}} &= \beta_{j,\mathtt{hhinc}}-\beta_{1,\mathtt{hhinc}} \\
\Delta \beta_{j,\mathtt{age}} &= \beta_{j,\mathtt{age}}-\beta_{1,\mathtt{age}}
\end{align*}

for \(j = 2,3\). The parameters in \(\betab_{j}\) that we will try to recover will then be the following differences:

\begin{align*}
\Delta \beta_{2,\mathtt{cons}} &= -5 \\
\Delta \beta_{3,\mathtt{cons}} &= \hspace{2.7mm} 3 \\
\Delta \beta_{2,\mathtt{hhinc}} &= -0.5 \\
\Delta \beta_{3,\mathtt{hhinc}} &= -2 \\
\Delta \beta_{2,\mathtt{age}} &= \hspace{2.7mm} 2 \\
\Delta \beta_{3,\mathtt{age}} &= \hspace{2.7mm} 1.5
\end{align*}
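These differences follow directly from the population values given earlier; a quick check in Python (just arithmetic, nothing model-specific):

```python
# Population parameters (cons, hhinc, age) for alternatives 1-3, from above
beta = {1: (-1.0,  1.0, -1.0),
        2: (-6.0,  0.5,  1.0),
        3: ( 2.0, -1.0,  0.5)}

# Differences relative to the base alternative j = 1
delta = {j: tuple(bj - b1 for bj, b1 in zip(beta[j], beta[1])) for j in (2, 3)}
print(delta[2])  # (-5.0, -0.5, 2.0)
print(delta[3])  # (3.0, -2.0, 1.5)
```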

What’s left to complete our simulated dataset is to generate the outcome variable that takes the value 1 if case \(i\) chooses alternative \(k\), and 0 otherwise. To do this, we will first create a single variable for the utilities and then determine the alternative with the highest utility:


. quietly generate double U = .

. quietly generate y = .

. forval i = 1/3 {
  2.     quietly replace U = U`i' if alt==`i'
  3. }

. bysort id : egen double umax_i = max(U)

. forval i = 1/3 {
  2.     quietly bysort id : replace y = alt if umax_i == U
  3. }

. generate choice = alt == y

We obtain the following by using asmprobit:


. asmprobit choice price, case(id) alternatives(alt) casevars(hhinc age)
> basealternative(1) scalealternative(2) nolog

Alternative-specific multinomial probit      Number of obs      =     60,000
Case variable: id                            Number of cases    =     20,000

Alternative variable: alt                    Alts per case: min =          3
                                                            avg =        3.0
                                                            max =          3
Integration sequence:      Hammersley
Integration points:               150           Wald chi2(5)    =    4577.15
Log simulated-likelihood = -11219.181           Prob > chi2     =     0.0000

----------------------------------------------------------------------------
      choice |     Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
-------------+--------------------------------------------------------------
alt          |
       price | -.4896106   .0523626    -9.35   0.000   -.5922394   -.3869818
-------------+--------------------------------------------------------------
Sickmaster   | (base alternative)
-------------+--------------------------------------------------------------
Allgood      |
       hhinc | -.5006212   .0302981   -16.52   0.000   -.5600043    -.441238
         age |  2.001367   .0306663    65.26   0.000    1.941262    2.061472
       _cons | -4.980841   .1968765   -25.30   0.000   -5.366711    -4.59497
-------------+--------------------------------------------------------------
Cowboy_Hea~h |
       hhinc | -1.991202   .1092118   -18.23   0.000   -2.205253    -1.77715
         age |  1.494056   .0446662    33.45   0.000    1.406512    1.581601
       _cons |  3.038869   .4066901     7.47   0.000    2.241771    3.835967
-------------+--------------------------------------------------------------
     /lnl2_2 |  .5550228   .0742726     7.47   0.000    .4094512    .7005944
-------------+--------------------------------------------------------------
       /l2_1 |   .667308   .1175286     5.68   0.000    .4369562    .8976598
----------------------------------------------------------------------------
(alt=Sickmaster is the alternative normalizing location)
(alt=Allgood is the alternative normalizing scale)

Looking at the above output, we see that the coefficient of the alternative-specific variable price is \(\widehat \gamma = -0.49\), which is close to our specified population parameter of \(\gamma = -0.50\). We can say the same about our case-specific variables. The estimated coefficients of hhinc are \(\widehat{\Delta \beta}_{2,\mathtt{hhinc}} = -0.50\) for the second and \(\widehat{\Delta \beta}_{3,\mathtt{hhinc}} = -1.99\) for the third alternative. The estimates for age are \(\widehat{\Delta \beta}_{2,\mathtt{age}} = 2.00\) and \(\widehat{\Delta \beta}_{3,\mathtt{age}} = 1.49\). The estimated differences in alternative-specific constants are \(\widehat{\Delta \beta}_{2,\mathtt{cons}} = -4.98\) and \(\widehat{\Delta \beta}_{3,\mathtt{cons}} = 3.04\).

Identification

Now let me shed more light on the identification details related to our model that we needed to consider when we simulated our dataset. An important feature of \(U_{ij}\) is that the level as well as the scale of utility is irrelevant with respect to the chosen alternative, because shifting the level by some constant amount, or multiplying it by a (positive) constant, does not change the rank order of utilities and thus would have no impact on the chosen alternative. This has important ramifications for modeling utilities because without a set level and scale of \(U_{ij}\), there are an infinite number of parameters in \(V_{ij}\) that yield the same result in terms of the chosen alternatives. Therefore, utilities must be normalized to identify the parameters of the model.
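The invariance to level and scale is easy to demonstrate numerically (a small sketch in Python; any random utilities will do):

```python
import numpy as np

rng = np.random.default_rng(42)
U = rng.normal(size=(5, 3))  # utilities for 5 cases and 3 alternatives

choice = U.argmax(axis=1)

# Shifting all utilities by a constant, or multiplying them by a positive
# constant, leaves the chosen alternative unchanged
assert np.array_equal((U + 7.3).argmax(axis=1), choice)
assert np.array_equal((2.5 * U).argmax(axis=1), choice)
```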

We already saw how to normalize for level. Normalizing for scale is a bit more difficult, though, because we assume correlated and heteroskedastic errors. Because of the heteroskedasticity, we need to set the scale for one of the variances and then estimate the other variances in relation to the set variance. We must also account for the nonzero covariance between the errors, which makes additional identifying restrictions necessary. It turns out that given our model assumptions, only \(J(J-1)/2-1\) parameters of our variance–covariance matrix are identifiable (see chapter 5 in Train [2009] for details about identifying restrictions in the context of probit models). To be concrete, our original variance–covariance matrix was the following:

\begin{equation*}
\Sigma =
\begin{bmatrix}
\sigma_{11} & \sigma_{12} & \sigma_{13} \\
            & \sigma_{22} & \sigma_{23} \\
            &             & \sigma_{33}
\end{bmatrix}
\end{equation*}

Taking differences of correlated errors reduces the \(3 \times 3\) matrix of error variances to a \(2 \times 2\) variance–covariance matrix of error differences:

\begin{equation*}
\Sigma^{*} =
\begin{bmatrix}
\sigma_{11}+\sigma_{22}-2\sigma_{12} & \sigma_{11}+\sigma_{23}-\sigma_{12}-\sigma_{13} \\
 & \sigma_{11}+\sigma_{33}-2\sigma_{13}
\end{bmatrix}
\end{equation*}

If we normalize this matrix with respect to the second alternative, we get

\begin{equation*}
\widetilde{\Sigma}^{*} =
\begin{bmatrix}
1 & (\sigma_{11}+\sigma_{23}-\sigma_{12}-\sigma_{13})/\nu \\
 & (\sigma_{11}+\sigma_{33}-2\sigma_{13})/\nu
\end{bmatrix}
\end{equation*}

where \(\nu = \sigma_{11}+\sigma_{22}-2\sigma_{12}\). Because we also want to set the scale for our base alternative, our normalized matrix becomes

\begin{equation*}
\check{\Sigma}^{*} =
\begin{bmatrix}
2 & 2(\sigma_{11}+\sigma_{23}-\sigma_{12}-\sigma_{13})/\nu \\
 & 2(\sigma_{11}+\sigma_{33}-2\sigma_{13})/\nu
\end{bmatrix}
\end{equation*}

Thus, because utilities are scaled by the standard deviation, they are divided by \(\sqrt{\nu/2}\). Now, getting back to our simulation, if we wish to recover our specified parameters, we need to scale them accordingly. We start from the variance–covariance matrix of error differences:

\begin{equation*}
\Sigma^{*} =
\begin{bmatrix}
2.1 + 1.7 - 2\times 0.6 & 2.1 - 0.8 - 0.6 + 0.5 \\
 & 2.1 + 1.4 - 2\times(-0.5)
\end{bmatrix}
=
\begin{bmatrix}
2.6 & 1.2 \\
 & 4.5
\end{bmatrix}
\end{equation*}

Normalizing with respect to the second alternative yields

\begin{equation*}
\widetilde{\Sigma}^{*} =
\begin{bmatrix}
1 & 1.2/2.6 \\
 & 4.5/2.6
\end{bmatrix}
=
\begin{bmatrix}
1 & 0.4615 \\
 & 1.7308
\end{bmatrix}
\end{equation*}

and then multiplying \(\widetilde{\Sigma}^{*}\) by 2 yields

\begin{equation*}
\check{\Sigma}^{*} =
\begin{bmatrix}
2 & 0.9231 \\
 & 3.4615
\end{bmatrix}
\end{equation*}

which are the true variance–covariance parameters. Our scaling term is \(\sqrt{2.6/2}\), and because utilities will be divided by this term, we need to multiply our parameters by this term.
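The whole chain of the algebra above — differencing against the base alternative, normalizing to the second alternative, and rescaling so the base variance is 2 — can be reproduced with a few lines of matrix arithmetic (a Python sketch):

```python
import numpy as np

# Specified population covariance of the errors
Sigma = np.array([[ 2.1,  0.6, -0.5],
                  [ 0.6,  1.7, -0.8],
                  [-0.5, -0.8,  1.4]])

# Differencing matrix: subtract alternative 1 (the base) from alternatives 2 and 3
M = np.array([[-1.0, 1.0, 0.0],
              [-1.0, 0.0, 1.0]])

Sigma_star = M @ Sigma @ M.T   # covariance of error differences: [[2.6, 1.2], [1.2, 4.5]]

nu = Sigma_star[0, 0]          # 2.6, the variance used for normalization
Sigma_check = 2 * Sigma_star / nu  # approx. [[2, 0.9231], [0.9231, 3.4615]]
```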

Finally, we check whether we can recover our variance–covariance parameters. We use the postestimation command estat covariance to display the estimated variance–covariance matrix of error differences:


. estat covariance

  +-------------------------------------+
  |              |   Allgood  Cowboy_~h |
  |--------------+----------------------|
  |      Allgood |         2            |
  | Cowboy_Hea~h |   .943716   3.479797 |
  +-------------------------------------+
Note: Covariances are for alternatives differenced with Sickmaster.

We see that our estimates are close to the true normalized covariance matrix.

Conclusion

I discussed multinomial probit models in a discrete choice context and showed how to generate a simulated dataset accordingly. In my next post, we will use our simulated dataset and discuss the estimation and interpretation of model results, which is not as straightforward as one might think.

Reference

Train, K. E. 2009. Discrete Choice Methods with Simulation. 2nd ed. New York: Cambridge University Press.


