Tuesday, January 13, 2026

Bayesian logistic regression with Cauchy priors using the bayes prefix


Introduction

Stata 15 provides a convenient and elegant way of fitting Bayesian regression models: simply prefix the estimation command with bayes. You can choose from 45 supported estimation commands. All of Stata's existing Bayesian features are supported by the new bayes prefix. You can use default priors for model parameters or select from many prior distributions. I will demonstrate the use of the bayes prefix for fitting a Bayesian logistic regression model and explore the use of Cauchy priors (available as of the update on July 20, 2017) for regression coefficients.

A common problem for Bayesian practitioners is the choice of priors for the coefficients of a regression model. The conservative approach of specifying very weak or completely uninformative priors is considered data-driven and objective but is at odds with the Bayesian paradigm. Noninformative priors may also be inadequate for resolving some common regression problems, such as the separation problem in logistic regression. On the other hand, in the absence of strong prior knowledge, there are no universal rules for choosing informative priors. In this article, I follow some recommendations from Gelman et al. (2008) for specifying weakly informative Cauchy priors for the coefficients of logistic regression models and demonstrate how these priors can be supplied using the bayes prefix command.

Data

I consider a version of the famous Iris dataset (Fisher 1936), which describes three iris species using their sepal and petal shapes. The binary variable virg distinguishes the Iris virginica class from the Iris versicolor and Iris setosa classes. The variables slen and swid describe sepal length and width. The variables plen and pwid describe petal length and width. These four variables are standardized so that they have mean 0 and standard deviation 0.5.

Standardizing the variables used as covariates in a regression model is recommended by Gelman et al. (2008) in order to apply common prior distributions to the regression coefficients. This approach is also favored by other researchers, for example, Raftery (1996).
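If you start from raw measurements, this rescaling is easy to do yourself. Below is a minimal sketch for a hypothetical raw variable x (not part of this dataset): summarize leaves the mean and standard deviation in r(mean) and r(sd), and dividing the centered variable by twice the standard deviation yields mean 0 and standard deviation 0.5.


. summarize x
. generate xstd = (x - r(mean))/(2*r(sd))    // center and rescale to sd 0.5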


. use irisstd

. summarize

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
        virg |        150    .3333333    .4729838          0          1
        slen |        150    4.09e-09          .5  -.9318901   1.241849
        swid |        150   -2.88e-09          .5  -1.215422   1.552142
        plen |        150    2.05e-10          .5  -.7817487   .8901885
        pwid |        150    4.27e-09          .5  -.7198134   .8525946

For validation purposes, I withhold the first and last observations from model estimation. I generate the indicator variable touse that will mark the estimation subsample.


. generate touse = _n>1 & _n<_N
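
As a quick check (a sketch, not part of the original output), counting the withheld observations should return 2, the first and the last:


. count if !touse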

Models

I first run a standard logistic regression model with outcome variable virg and predictors slen, swid, plen, and pwid.


. logit virg slen swid plen pwid if touse, nolog

Logistic regression                             Number of obs     =        148
                                                LR chi2(4)        =     176.09
                                                Prob > chi2       =     0.0000
Log likelihood = -5.9258976                     Pseudo R2         =     0.9369

------------------------------------------------------------------------------
        virg |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        slen |  -3.993255   3.953552    -1.01   0.312    -11.74207    3.755564
        swid |  -5.760794   3.849978    -1.50   0.135    -13.30661    1.785024
        plen |   32.73222   16.62764     1.97   0.049     .1426454     65.3218
        pwid |   27.60757   14.78357     1.87   0.062    -1.367681    56.58283
       _cons |  -19.83216   9.261786    -2.14   0.032    -37.98493   -1.679393
------------------------------------------------------------------------------
Note: 54 failures and 6 successes completely determined.

The logit command issues a note that some observations are completely determined. This is because the continuous covariates, especially pwid, have many repeated values.

I then fit a Bayesian logistic regression model by prefixing the above command with bayes. I also specify a random-number seed for reproducibility.


. set seed 15

. bayes: logit virg slen swid plen pwid if touse

Burn-in ...
Simulation ...

Model summary
------------------------------------------------------------------------------
Likelihood:
  virg ~ logit(xb_virg)

Prior:
  {virg:slen swid plen pwid _cons} ~ normal(0,10000)                       (1)
------------------------------------------------------------------------------
(1) Parameters are elements of the linear form xb_virg.

Bayesian logistic regression                     MCMC iterations  =     12,500
Random-walk Metropolis-Hastings sampling         Burn-in          =      2,500
                                                 MCMC sample size =     10,000
                                                 Number of obs    =        148
                                                 Acceptance rate  =      .1511
                                                 Efficiency:  min =      .0119
                                                              avg =      .0204
Log marginal likelihood =  -20.64697                          max =     .03992

------------------------------------------------------------------------------
             |                                                Equal-tailed
        virg |      Mean   Std. Dev.     MCSE     Median  [95% Cred. Interval]
-------------+----------------------------------------------------------------
        slen | -7.391316   5.256959   .263113  -6.861438  -18.87585   2.036088
        swid | -9.686068    5.47113   .492419  -9.062651  -21.32787  -1.430718
        plen |  59.90382   23.97788   1.53277   56.48103   23.00752   116.0312
        pwid |  45.65266    22.2054   2.03525   42.01611   14.29399   99.51405
       _cons | -34.50204   13.77856   1.19136  -32.61649  -66.09916  -14.83324
------------------------------------------------------------------------------
Note: Default priors are used for model parameters.

By default, normal priors with mean 0 and standard deviation 100 are used for the intercept and regression coefficients. These default normal priors are provided for convenience and so that users can see the naming conventions for parameters when specifying their own priors. They are chosen to be fairly uninformative but may not be so for parameters with large values.
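
To make the defaults explicit, you could spell out the prior yourself. The following sketch should be equivalent to the default call above; note that Stata's normal() prior is parameterized by the variance, so normal(0,10000) corresponds to a standard deviation of 100, and {virg:} refers to all parameters of the virg equation.


. bayes, prior({virg:}, normal(0, 10000)):        ///
>         logit virg slen swid plen pwid if touse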

The bayes: logit command produces estimates that are much larger in absolute value than the corresponding maximum likelihood estimates. For example, the posterior mean estimate for the coefficient of the plen variable, {virg:plen}, is about 60. This means that a unit change in plen results in a 60-unit change in the outcome on the logistic scale, which is very large. The corresponding maximum likelihood estimate is about 33. Given that the default priors are vague, how can we explain this difference? Let's look at the sampled posterior distributions of the regression coefficients. I draw histograms using the bayesgraph histogram command:


. bayesgraph histogram _all, combine(rows(3))

Under vague priors, posterior modes are expected to be close to the MLEs. However, all the posterior distributions are skewed, so the sampled posterior means are much larger in absolute value than the posterior modes and thus differ from the MLEs.

Gelman et al. (2008) suggest applying Cauchy priors to the regression coefficients when the data are standardized so that all continuous variables have standard deviation 0.5. Specifically, we use a scale of 10 for the intercept and a scale of 2.5 for the regression coefficients. This choice is based on the observation that, for a unit change in each predictor, an outcome change of 5 units on the logistic scale moves the outcome probability from about 0.01 to 0.5 or from 0.5 to about 0.99.

The Cauchy priors are centered at 0 because the covariates are centered at 0.
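
As a quick numerical check of this reasoning, we can evaluate the inverse-logit function 5 units below and above 0:


. display invlogit(-5)
.00669285

. display invlogit(5)
.99330715

A 5-unit change on the logistic scale thus moves the probability from roughly 0.01 through 0.5 to roughly 0.99.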


. set seed 15

. bayes, prior({virg:_cons}, cauchy(0, 10))                  ///
>         prior({virg:slen swid plen pwid}, cauchy(0, 2.5)): ///
>         logit virg slen swid plen pwid if touse

Burn-in ...
Simulation ...

Model summary
------------------------------------------------------------------------------
Likelihood:
  virg ~ logit(xb_virg)

Priors:
                {virg:_cons} ~ cauchy(0,10)                                (1)
  {virg:slen swid plen pwid} ~ cauchy(0,2.5)                               (1)
------------------------------------------------------------------------------
(1) Parameters are elements of the linear form xb_virg.

Bayesian logistic regression                     MCMC iterations  =     12,500
Random-walk Metropolis-Hastings sampling         Burn-in          =      2,500
                                                 MCMC sample size =     10,000
                                                 Number of obs    =        148
                                                 Acceptance rate  =      .2072
                                                 Efficiency:  min =     .01907
                                                              avg =     .02489
Log marginal likelihood = -18.651475                          max =     .03497

------------------------------------------------------------------------------
             |                                                Equal-tailed
        virg |      Mean   Std. Dev.     MCSE     Median  [95% Cred. Interval]
-------------+----------------------------------------------------------------
        slen |  -2.04014   2.332062   .143792  -1.735382  -7.484299    1.63583
        swid |  -2.59423    2.00863   .107406  -2.351623  -7.379954   .6207466
        plen |  21.27293   10.37093   .717317   19.99503   4.803869   45.14316
        pwid |  16.74598   7.506278    .54353   16.00158   4.382826   35.00463
       _cons | -11.96009   4.157192   .272998  -11.20163  -22.45385  -6.549506
------------------------------------------------------------------------------

We can use bayesgraph diagnostics to verify that there are no convergence problems with the model, but I skip this step here.
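
For example, we could inspect the diagnostics for the largest coefficient (output omitted):


. bayesgraph diagnostics {virg:plen}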

The posterior mean estimates in this model are about three times smaller in absolute value than those of the model with default vague normal priors and are closer to the maximum likelihood estimates. For example, the posterior mean estimate for {virg:plen} is now only about 21.

The estimated log marginal likelihood of this model, -18.7, is higher than that of the model with default normal priors, -20.6, which indicates that the model with independent Cauchy priors fits the data better.
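
To have Stata tabulate this comparison, you can save each model's simulation results and use bayesstats ic. Below is a sketch under the assumption that both models are refit with the saving() option and stored under the hypothetical names normal_m and cauchy_m:


. bayes, saving(normal_sim): logit virg slen swid plen pwid if touse
. estimates store normal_m

. bayes, prior({virg:_cons}, cauchy(0, 10))                  ///
>         prior({virg:slen swid plen pwid}, cauchy(0, 2.5))  ///
>         saving(cauchy_sim): logit virg slen swid plen pwid if touse
. estimates store cauchy_m

. bayesstats ic normal_m cauchy_m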

Predictions

Now that we are satisfied with our model, we can perform some postestimation. All Bayesian postestimation features work after the bayes prefix just as they do after the bayesmh command. Below, I show examples of obtaining out-of-sample predictions. Say we want to make predictions for the first and last observations in our dataset, which were not used for fitting the model. The first observation is not from the Iris virginica class, but the last one is.


. list if !touse

     +------------------------------------------------------------+
     | virg       slen        swid       plen        pwid   touse |
     |------------------------------------------------------------|
  1. |    0   -.448837    .5143057   -.668397   -.6542964       0 |
150. |    1   .0342163   -.0622702   .3801059    .3939755       0 |
     +------------------------------------------------------------+

We can use the bayesstats summary command to predict the outcome class by applying the invlogit() transformation to the desired linear combination of predictors.


. bayesstats summary (prob0:invlogit(-.448837*{virg:slen}  ///
>         +.5143057*{virg:swid}-.668397*{virg:plen}        ///
>         -.6542964*{virg:pwid}+{virg:_cons})), nolegend

Posterior summary statistics                      MCMC sample size =    10,000

------------------------------------------------------------------------------
             |                                                Equal-tailed
             |      Mean   Std. Dev.     MCSE     Median  [95% Cred. Interval]
-------------+----------------------------------------------------------------
       prob0 |  7.26e-10   1.02e-08   3.1e-10   4.95e-16   3.18e-31   8.53e-10
------------------------------------------------------------------------------

. bayesstats summary (prob1:invlogit(.0342163*{virg:slen}   ///
>         -.0622702*{virg:swid}+.3801059*{virg:plen}       ///
>         +.3939755*{virg:pwid}+{virg:_cons})), nolegend

Posterior summary statistics                      MCMC sample size =    10,000

------------------------------------------------------------------------------
             |                                                Equal-tailed
             |      Mean   Std. Dev.     MCSE     Median  [95% Cred. Interval]
-------------+----------------------------------------------------------------
       prob1 |  .9135251   .0779741   .004257   .9361961   .7067089   .9959991
------------------------------------------------------------------------------

The posterior mean probability that the first observation belongs to the Iris virginica class is estimated to be essentially zero, 7.3e-10. In contrast, the estimated probability for the last observation is about 0.91. Both predictions agree with the observed classes.
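
Both probabilities could also be obtained in a single call by passing the two expressions to bayesstats summary together:


. bayesstats summary                                        ///
>         (prob0:invlogit(-.448837*{virg:slen}              ///
>         +.5143057*{virg:swid}-.668397*{virg:plen}         ///
>         -.6542964*{virg:pwid}+{virg:_cons}))              ///
>         (prob1:invlogit(.0342163*{virg:slen}              ///
>         -.0622702*{virg:swid}+.3801059*{virg:plen}        ///
>         +.3939755*{virg:pwid}+{virg:_cons})), nolegend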

The dataset used in this post is available here: irisstd.dta

References

Fisher, R. A. 1936. The use of multiple measurements in taxonomic problems. Annals of Eugenics 7: 179–188.

Gelman, A., A. Jakulin, M. G. Pittau, and Y.-S. Su. 2008. A weakly informative default prior distribution for logistic and other regression models. Annals of Applied Statistics 2: 1360–1383.

Raftery, A. E. 1996. Approximate Bayes factors and accounting for model uncertainty in generalized linear models. Biometrika 83: 251–266.


