Calculating power using Monte Carlo simulations, part 3: Linear and logistic regression

In my last two posts, I showed you how to calculate power for a t test using Monte Carlo simulations and how to integrate your simulations into Stata’s power command. In today’s post, I’m going to show you how to do these tasks for linear and logistic regression models. The strategy and overall structure of the programs for linear and logistic regression are similar to the t test examples. The parts that will change are the simulation of the data and the models used to test the null hypothesis.

Choosing realistic regression parameters is challenging when simulating regression models. Sometimes, pilot data or historical data can provide clues, but often we must consider a range of parameter values that we believe are realistic. I’m going to deal with this challenge by working examples that are based on the data from the National Health and Nutrition Examination Survey (NHANES). You can download a version of these data by typing webuse nhanes2.

Linear regression example

Let’s imagine that you are planning a study of systolic blood pressure (SBP) and you believe that there is an interaction between age and sex. The NHANES dataset includes the variables bpsystol (SBP), age, and sex. Below, I have fit a linear regression model that includes an age-by-sex interaction term, and the p-values for all the parameter estimates are reported as 0.000. This is not surprising, because the dataset includes 10,351 observations. p-values become smaller as sample sizes become larger when everything else is held constant.

. webuse nhanes2

. regress bpsystol c.age##ib1.sex

      Source |       SS           df       MS      Number of obs   =    10,351
-------------+----------------------------------   F(3, 10347)     =   1180.87
       Model |     1437147         3      479049   Prob > F        =    0.0000
    Residual |  4197523.03    10,347  405.675367   R-squared       =    0.2551
-------------+----------------------------------   Adj R-squared   =    0.2548
       Total |  5634670.03    10,350  544.412563   Root MSE        =    20.141

------------------------------------------------------------------------------
    bpsystol |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |     .47062   .0167357    28.12   0.000     .4378147    .5034252
             |
         sex |
     Female  |  -20.45813   1.165263   -17.56   0.000    -22.74227   -18.17398
             |
   sex#c.age |
     Female  |   .3457346   .0230373    15.01   0.000      .300577    .3908923
             |
       _cons |   110.5691   .8440692   131.00   0.000     108.9146    112.2236
------------------------------------------------------------------------------

Perhaps you don’t have the resources to collect a sample of 10,351 participants for your study, but you would like to have 80% power to detect an interaction parameter of 0.35. How large does your sample need to be?

Let’s start by creating a single pseudo-random dataset based on the parameter estimates from the NHANES model. We begin the code block below by clearing Stata’s memory. Next, we set the random seed to 15 so that we can reproduce our results and set the number of observations to 100.

clear
set seed 15
set obs 100
generate age = runiformint(18,65)
generate female = rbinomial(1,0.5)
generate interact = age*female
generate e = rnormal(0,20)
generate sbp = 110 + 0.5*age + (-20)*female + 0.35*interact  + e

The fourth line of the code block generates a variable named age, which contains integers drawn from a uniform distribution on the interval [18,65].

The fifth line generates an indicator variable named female using a Bernoulli distribution with probability equal to 0.5. Recall that a binomial distribution with one trial is equivalent to a Bernoulli distribution.

The sixth line generates a variable for the interaction of age and female.

The seventh line generates a variable, e, that is the error term for the regression model. The errors are generated from a normal distribution with a mean of 0 and standard deviation of 20. The value of 20 is based on the root MSE estimated from the NHANES regression model.

The last line of the code block generates the variable sbp based on a linear combination of our simulated variables and the rounded parameter estimates from the NHANES regression model.

Below are the results of a linear model fit to our simulated data using regress. The parameter estimates differ somewhat from our input parameters because I generated only one relatively small dataset. We could reduce this discrepancy by increasing our sample size, drawing many samples, or both.

. regress sbp age i.female c.age#i.female

      Source |       SS           df       MS      Number of obs   =       100
-------------+----------------------------------   F(3, 96)        =      7.15
       Model |  9060.32024         3  3020.10675   Prob > F        =    0.0002
    Residual |  40569.7504        96  422.601567   R-squared       =    0.1826
-------------+----------------------------------   Adj R-squared   =    0.1570
       Total |  49630.0707        99  501.313845   Root MSE        =    20.557

------------------------------------------------------------------------------
         sbp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .7752417   .1883855     4.12   0.000     .4012994    1.149184
    1.female |   5.759113   15.09611     0.38   0.704    -24.20644    35.72466
             |
female#c.age |
          1  |  -.2690026   .3319144    -0.81   0.420    -.9278475    .3898423
             |
       _cons |   100.4409   8.381869    11.98   0.000     83.80306    117.0788
------------------------------------------------------------------------------

The p-value for the interaction term equals 0.420, which is not statistically significant at the 0.05 level. Clearly, we need a larger sample size.

We could use the p-value from the regression model to test the null hypothesis that the interaction term is zero. That would work in this example because we are testing only one parameter. But we would have to test multiple parameters simultaneously if our interaction included a categorical variable such as race. And there may be times when we wish to test multiple variables simultaneously.
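
To make the multiple-parameter case concrete, here is a minimal sketch. It assumes a hypothetical three-level variable named race and made-up slopes, neither of which is part of the simulation in this post. It uses testparm, which performs a joint Wald test of the two age-by-race interaction coefficients, rather than the likelihood-ratio test discussed next.

```stata
clear
set seed 15
set obs 300
generate age = runiformint(18,65)
// Hypothetical three-level categorical covariate (an assumption for illustration)
generate race = runiformint(1,3)
generate e = rnormal(0,20)
// Assumed age slopes of 0.5, 0.7, and 0.9 for the three groups
generate sbp = 110 + (0.5 + 0.2*(race-1))*age + e
regress sbp c.age##i.race
// Joint Wald test that both age-by-race interaction coefficients are zero
testparm i.race#c.age
display "df = " r(df) ", p = " %6.4f r(p)
```

Because the interaction involves a three-level variable, the test has 2 numerator degrees of freedom, and the single p-value in r(p) could be used to define reject in a simulation program.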

Likelihood-ratio tests can test many kinds of hypotheses, including multiple parameters simultaneously. I’m going to show you how to use a likelihood-ratio test in this example because it will generalize to other hypotheses you may encounter in your research. You can read more about likelihood-ratio tests in the Stata Base Reference Manual if you are not familiar with them.

The code block below shows four of the five steps used to calculate a likelihood-ratio test. We will test the null hypothesis that the coefficient for the interaction term equals zero. The first line fits the “full” regression model that includes the interaction term. The second line stores the estimates of the full model in memory. The name “full” is arbitrary. We could have named the results of this model anything we like. The third line fits the “reduced” regression model that omits the interaction term. And the fourth line stores the results of the reduced model in memory.

regress sbp age i.female c.age#i.female
estimates store full
regress sbp age i.female
estimates store reduced

The fifth step uses lrtest to calculate a likelihood-ratio test of the full model versus the reduced model. The test yields a p-value of 0.4089, which is close to the p-value of 0.420 from the Wald test reported in the regression output above. We cannot reject the null hypothesis that the interaction parameter is zero.

. lrtest full reduced

Likelihood-ratio test                                 LR chi2(1)  =      0.68
(Assumption: reduced nested in full)                  Prob > chi2 =    0.4089

You can type return list to see that the p-value is stored in the scalar r(p). And you can use r(p) to define reject the same way we did in our t test program.

. return list

scalars:
                  r(p) =  .4089399864598747
               r(chi2) =  .6818803412616035
                 r(df) =  1

. local reject = (r(p)<0.05)

Simulating the data and testing the null hypothesis for a regression model are a bit more complicated than for a t test. But writing a program to automate this process is almost identical to the t test example. Let’s consider the code block below, which defines the program simregress.

capture program drop simregress
program simregress, rclass
    version 16
    // DEFINE THE INPUT PARAMETERS AND THEIR DEFAULT VALUES
    syntax, n(integer)          /// Sample size
          [ alpha(real 0.05)    /// Alpha level
            intercept(real 110) /// Intercept parameter
            age(real 0.5)       /// Age parameter
            female(real -20)    /// Female parameter
            interact(real 0.35) /// Interaction parameter
            esd(real 20) ]      //  Standard deviation of the error
    quietly {
        // GENERATE THE RANDOM DATA
        clear
        set obs `n'
        generate age = runiformint(18,65)
        generate female = rbinomial(1,0.5)
        generate interact = age*female
        generate e = rnormal(0,`esd')
        generate sbp = `intercept' + `age'*age + `female'*female + ///
           `interact'*interact  + e
        // TEST THE NULL HYPOTHESIS
        regress sbp age i.female c.age#i.female
        estimates store full
        regress sbp age i.female
        estimates store reduced
        lrtest full reduced
    }
    // RETURN RESULTS
    return scalar reject = (r(p)<`alpha')
end

The first three lines, which begin with capture program, program, and version, are basically the same as in our t test program.

The syntax section of the program is similar to that of the t test program, but the names of the input parameters are, obviously, different. I have included input parameters for the sample size, alpha level, and basic regression parameters. I have not included an input parameter for every possible parameter in the model, but you could if you like. For example, I have “hard coded” the range of the variable age as 18 to 65 in my program. But you could include an input parameter for the upper and lower bounds of age if you wish. I also find it helpful to include comments that describe the parameter names so that there is no ambiguity.
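
As a hedged sketch of that idea, the variant below exposes the bounds of age as options. The program name simregress_age and the options agemin() and agemax() are hypothetical names chosen for illustration, and the other regression parameters are hard coded at the values used in this post to keep the example short.

```stata
capture program drop simregress_age
program simregress_age, rclass
    version 16
    syntax, n(integer)            /// Sample size
          [ alpha(real 0.05)      /// Alpha level
            agemin(integer 18)    /// Lower bound of age (hypothetical option)
            agemax(integer 65)    /// Upper bound of age (hypothetical option)
            interact(real 0.35)   /// Interaction parameter
            esd(real 20) ]        //  Standard deviation of the error
    quietly {
        clear
        set obs `n'
        // The age range now comes from the options instead of being hard coded
        generate age = runiformint(`agemin',`agemax')
        generate female = rbinomial(1,0.5)
        generate e = rnormal(0,`esd')
        generate sbp = 110 + 0.5*age - 20*female + `interact'*age*female + e
        regress sbp age i.female c.age#i.female
        estimates store full
        regress sbp age i.female
        estimates store reduced
        lrtest full reduced
    }
    return scalar reject = (r(p)<`alpha')
end

// Example call with a narrower age range
set seed 15
simregress_age, n(200) agemin(30) agemax(50)
return list
```

A narrower age range leaves less variation in age, so we would generally expect lower power to detect the interaction for a given sample size.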

The next section of code is embedded in a “quietly” block. Commands like set obs, generate, and regress send output to the Results window and log file (if you have one open). Placing these commands in a quietly block suppresses that output.

We have already written the commands to create our random data and test the null hypothesis. So we can copy that code into the quietly block and replace any input parameters with their corresponding local macros defined by syntax. For example, I have changed set obs 100 to set obs `n' so that the number of observations will be set by the input parameter specified with syntax. I have also given the input parameters the same names as the simulated variables in the model. So `age'*age is the product of the input parameter `age' defined by syntax and the variable age generated by simulation.

The p-value from the likelihood-ratio test is stored in the scalar r(p), and our program returns the scalar reject exactly as it did in our t test program.

Below, I have used simulate to run simregress 100 times and summarized the variable reject. The results indicate that we would have 16% power to detect an interaction parameter of 0.35 given a sample of 100 participants and the other assumptions about the model.

. simulate reject=r(reject), reps(100):                            ///
>         simregress, n(100) age(0.5) female(20) interact(0.35)    ///
>            esd(20) alpha(0.05)

      command:  simregress, n(100) age(0.5) female(20) interact(0.35)
> esd(20) alpha(0.05)
       reject:  r(reject)

Simulations (100)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50
..................................................   100

. summarize reject

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
      reject |        100         .16    .3684529          0          1
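
An estimate based on only 100 replications is itself noisy. As a rough sketch (not part of the original program), the Monte Carlo standard error of a simulated power estimate p based on R replications is sqrt(p(1-p)/R), and cii proportions reports an exact binomial confidence interval for the same quantity. The scalar names below are arbitrary.

```stata
// Simulated power and number of replications from the output above
scalar phat = 0.16
scalar nreps = 100
// Monte Carlo standard error of the power estimate
scalar mcse = sqrt(phat*(1-phat)/nreps)
display "Monte Carlo SE = " %5.3f mcse
// Exact binomial confidence interval for 16 rejections in 100 replications
cii proportions 100 16
```

The standard error is about 0.037, so the estimate of 16% is consistent with true power anywhere from roughly 9% to 23%. Increasing reps() narrows this range.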

Next, let’s write a program called power_cmd_simregress so that we can integrate simregress into Stata’s power command. The structure of power_cmd_simregress is the same as power_cmd_ttest in my last post. First, we define the syntax and input parameters and specify their default values. Then, we run the simulation and summarize the variable reject. And finally, we return the results.

capture program drop power_cmd_simregress
program power_cmd_simregress, rclass
    version 16
    // DEFINE THE INPUT PARAMETERS AND THEIR DEFAULT VALUES
    syntax, n(integer)          /// Sample size
          [ alpha(real 0.05)    /// Alpha level
            intercept(real 110) /// Intercept parameter
            age(real 0.5)       /// Age parameter
            female(real -20)    /// Female parameter
            interact(real 0.35) /// Interaction parameter
            esd(real 20)        /// Standard deviation of the error
            reps(integer 100)]  //   Number of repetitions

    // GENERATE THE RANDOM DATA AND TEST THE NULL HYPOTHESIS
    quietly {
        // Pass every input parameter, including the intercept, on to simregress
        simulate reject=r(reject), reps(`reps'):                  ///
             simregress, n(`n') intercept(`intercept') age(`age') ///
                         female(`female') interact(`interact')    ///
                         esd(`esd') alpha(`alpha')
        summarize reject
    }
    // RETURN RESULTS
    return scalar power = r(mean)
    return scalar N = `n'
    return scalar alpha = `alpha'
    return scalar intercept = `intercept'
    return scalar age = `age'
    return scalar female = `female'
    return scalar interact = `interact'
    return scalar esd = `esd'
end

Let’s also write a program called power_cmd_simregress_init. Recall from my last post that this program will allow us to run power simregress for a range of input parameter values, including the parameters listed in double quotes.

capture program drop power_cmd_simregress_init
program power_cmd_simregress_init, sclass
    sreturn local pss_colnames "intercept age female interact esd"
    sreturn local pss_numopts  "intercept age female interact esd"
end

Now, we’re ready to use power simregress! The output below shows the simulated power when the interaction parameter equals 0.2 to 0.4 in increments of 0.05 for samples of size 400, 500, 600, and 700.

. power simregress, n(400(100)700) intercept(110)                 ///
>                   age(0.5) female(-20) interact(0.2(0.05)0.4)   ///
>                   reps(1000) table graph(xdimension(interact)   ///
>                   legend(rows(1)))

Estimated power
Two-sided test

  +--------------------------------------------------------------------+
  |   alpha   power       N intercept     age  female interact     esd |
  |--------------------------------------------------------------------|
  |     .05    .267     400       110      .5     -20       .2      20 |
  |     .05    .398     400       110      .5     -20      .25      20 |
  |     .05    .547     400       110      .5     -20       .3      20 |
  |     .05    .677     400       110      .5     -20      .35      20 |
  |     .05    .792     400       110      .5     -20       .4      20 |
  |     .05     .33     500       110      .5     -20       .2      20 |
  |     .05     .46     500       110      .5     -20      .25      20 |
  |     .05    .646     500       110      .5     -20       .3      20 |
  |     .05    .763     500       110      .5     -20      .35      20 |
  |     .05    .854     500       110      .5     -20       .4      20 |
  |     .05    .384     600       110      .5     -20       .2      20 |
  |     .05    .563     600       110      .5     -20      .25      20 |
  |     .05    .702     600       110      .5     -20       .3      20 |
  |     .05    .841     600       110      .5     -20      .35      20 |
  |     .05    .928     600       110      .5     -20       .4      20 |
  |     .05    .444     700       110      .5     -20       .2      20 |
  |     .05    .641     700       110      .5     -20      .25      20 |
  |     .05    .793     700       110      .5     -20       .3      20 |
  |     .05    .904     700       110      .5     -20      .35      20 |
  |     .05    .958     700       110      .5     -20       .4      20 |
  +--------------------------------------------------------------------+

Figure 1 displays the results graphically.

Figure 1: Estimated power for the interaction term in a regression model

The table and the graph show us that there are several combinations of parameters that would result in 80% power. A sample of 700 participants would give us approximately 80% power to detect an interaction parameter of 0.30. A sample of 600 participants would give us approximately 80% power to detect an interaction parameter of 0.33. A sample of 500 participants would give us approximately 80% power to detect an interaction parameter of approximately 0.37. And a sample of 400 participants would give us approximately 80% power to detect an interaction parameter of 0.40. Our final choice of sample size is then based on the size of the interaction parameter that we would like to detect.

This example focused on the interaction term in a regression model with two covariates. But you could modify this example to simulate power for almost any kind of regression model you can imagine. I would suggest the following steps when planning your simulation:

1. Write down the regression model of interest, including all parameters.
2. Specify the details of the covariates, such as the range of age or the proportion of females.
3. Locate or think about reasonable values for the parameters in your model.
4. Simulate a single dataset assuming the alternative hypothesis, and fit the model.
5. Write a program to create the datasets, fit the models, and use simulate to test the program.
6. Write a program called power_cmd_mymethod, which allows you to run your simulations with power.
7. Write a program called power_cmd_mymethod_init so that you can use numlists for all parameters.

Let’s try this approach for a logistic regression model.

Logistic regression example

In this example, let’s imagine that you are planning a study of hypertension (highbp). Hypertension is binary, so we will use logistic regression to fit the model and use odds ratios for the effect size.

Step 1: Write down the model

The first step toward simulating power is to write down the model.

\[
{\rm logit}({\bf highbp}) = \beta_0 + \beta_1({\bf age}) + \beta_2({\bf sex}) + \beta_3({\bf age}\times {\bf sex})
\]

We will need to create variables for highbp, age, sex, and the interaction term age\(\times\)sex. We will also need to specify reasonable parameter values for \(\beta_0\), \(\beta_1\), \(\beta_2\), and \(\beta_3\).

Step 2: Specify details of the covariates

Next, we need to think about the covariates in our model. What values of age are reasonable for our study? Are we interested in older adults? Younger adults? Let’s assume that we are interested in adults between the ages of 18 and 65. Is the distribution of age likely to be uniform over the interval [18,65], or do we expect a hump-shaped distribution around the middle of the age range? We also need to think about the proportion of males and females in our study. Are we likely to sample 50% males and 50% females? These are the kinds of questions that we need to ask ourselves when planning our power calculations.

Let’s assume that we are interested in adults between the ages of 18 and 65 and we believe that age is uniformly distributed. Let’s also assume that the sample will be 50% female. The interaction term age\(\times\)sex is easy to calculate once we create variables for age and sex.
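
If the uniform assumption seemed doubtful, the covariates could be simulated differently. The sketch below is only an illustration under assumed values: the Beta(2,2) shape, the 60% female proportion, and the variable names age_unif and age_hump are not used anywhere else in this post.

```stata
clear
set seed 12345
set obs 1000
// Uniform ages on [18,65], as assumed in this post
generate age_unif = runiformint(18,65)
// Hump-shaped alternative: a Beta(2,2) draw rescaled to [18,65] and rounded
generate age_hump = round(18 + 47*rbeta(2,2))
// Unequal sex ratio, for example 60% female (an assumed value)
generate female = rbinomial(1,0.6)
summarize age_unif age_hump female
```

Both age variables share the same range and a mean near 41.5, but age_hump has a smaller standard deviation, which would typically reduce power for effects involving age.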

Step 3: Specify reasonable values for the parameters

Next we need to think about reasonable values for the parameters in our model. We could choose parameter values based on a review of the literature, results from a pilot study, or publicly available data.

I have chosen to use the NHANES data again because it includes the variables hypertension (highbp), age, and sex.

. webuse nhanes2

. logistic highbp c.age##ib1.sex

Logistic regression                             Number of obs     =     10,351
                                                LR chi2(3)        =    1675.19
                                                Prob > chi2       =     0.0000
Log likelihood = -6213.1696                     Pseudo R2         =     0.1188

------------------------------------------------------------------------------
      highbp | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   1.035184   .0018459    19.39   0.000     1.031572    1.038808
             |
         sex |
     Female  |   .1556985   .0224504   -12.90   0.000     .1173677    .2065477
             |
   sex#c.age |
     Female  |   1.028811    .002794    10.46   0.000      1.02335    1.034302
             |
       _cons |   .1690035   .0153794   -19.54   0.000     .1413957    .2020018
------------------------------------------------------------------------------
Note: _cons estimates baseline odds.

The output includes estimates of the odds ratio for each of the variables in our model. Odds ratios are exponentiated parameter estimates (that is, \(\widehat{\rm OR}_i = e^{\hat{\beta}_i}\)), so we could specify the natural logarithms of the odds ratios, \(\beta_i = {\rm ln}({\rm OR}_i)\), as parameters in our power simulations.

For example, the estimate of the odds ratio for age in the output above rounds to 1.04, so we can specify \(\beta_1 = {\rm ln}(\widehat{\rm OR}_{\bf age}) = {\rm ln}(1.04)\).

We can also specify \(\beta_0 = {\rm ln}(\widehat{\rm OR}_{\bf cons}) = {\rm ln}(0.17)\), \(\beta_2 = {\rm ln}(\widehat{\rm OR}_{\bf sex}) = {\rm ln}(0.16)\), and \(\beta_3 = {\rm ln}(\widehat{\rm OR}_{{\bf age}\times {\bf sex}}) = {\rm ln}(1.03)\).
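
As a quick check of this relationship, the sketch below fits the same model with logit, which reports coefficients on the log-odds scale, and then displays the logarithms of the rounded odds ratios. It assumes an internet connection for webuse.

```stata
webuse nhanes2, clear
// logit reports the coefficients (log odds ratios) rather than odds ratios
logit highbp c.age##ib1.sex
// Exponentiating a coefficient recovers the odds ratio shown by logistic
display exp(_b[age])
// Natural logarithms of the rounded odds ratios
display ln(0.17)    // approximately -1.772
display ln(1.04)    // approximately  0.039
display ln(0.16)    // approximately -1.833
display ln(1.03)    // approximately  0.030
```

Typing logistic with the coef option would display the same coefficients as logit.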

Step 4: Simulate a dataset assuming the alternative hypothesis, and fit the model

Next, we create a simulated dataset based on our assumptions about the model under the alternative hypothesis. The code block below is almost identical to the code we used to create the data for our linear regression model, but there are two important differences. First, we use generate xb to create a linear combination of the parameters and the simulated variables. The parameters are expressed as the natural logarithms of the odds ratios estimated with the NHANES data (the code uses 0.15 for the female odds ratio, slightly below the rounded estimate of 0.16). Second, we use rlogistic(m,s) to create the binary dependent variable highbp from the variable xb. Because rlogistic(xb,1) draws from a logistic distribution with mean xb and scale 1, the draw exceeds 0 with probability invlogit(xb).

clear
set seed 123456
set obs 100
generate age = runiformint(18,65)
generate female = rbinomial(1,0.5)
generate interact = age*female
generate xb = (ln(0.17) + ln(1.04)*age + ln(0.15)*female + ln(1.03)*interact)
generate highbp = rlogistic(xb,1) > 0
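
The latent-variable construction in the last line can be compared with a direct Bernoulli draw. The sketch below is an illustration rather than part of the original program: it uses a large assumed sample of 100,000 observations and the hypothetical variable names highbp1, highbp2, and p.

```stata
clear
set seed 123456
set obs 100000
generate age = runiformint(18,65)
generate female = rbinomial(1,0.5)
generate xb = ln(0.17) + ln(1.04)*age + ln(0.15)*female + ln(1.03)*age*female
// Method used in this post: latent logistic variable compared with 0
generate highbp1 = rlogistic(xb,1) > 0
// Alternative: Bernoulli draw with success probability invlogit(xb)
generate highbp2 = rbinomial(1, invlogit(xb))
// Model-implied probability of hypertension for each observation
generate p = invlogit(xb)
summarize highbp1 highbp2 p
```

The means of highbp1 and highbp2 should both be close to the mean of p, because the two methods generate outcomes from the same distribution.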

We can then fit a logistic regression model to our simulated data.

. logistic highbp age i.female c.age#i.female

Logistic regression                             Number of obs     =        100
                                                LR chi2(3)        =      14.95
                                                Prob > chi2       =     0.0019
Log likelihood = -61.817323                     Pseudo R2         =     0.1079

------------------------------------------------------------------------------
      highbp | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   1.055245   .0250921     2.26   0.024     1.007194    1.105589
    1.female |   .2365046   .3730999    -0.91   0.361     .0107404    5.207868
             |
female#c.age |
          1  |   1.015651   .0365417     0.43   0.666     .9464976    1.089857
             |
       _cons |   .1578931    .151025    -1.93   0.054     .0242207    1.029293
------------------------------------------------------------------------------
Note: _cons estimates baseline odds.

Step 5: Write a program to create the datasets, fit the models, and use simulate to test the program

Next, let’s write a program that creates datasets under the alternative hypothesis, fits logistic regression models, and tests the null hypothesis. We will then use simulate to run many iterations of the program.

The code block below contains the syntax for a program named simlogit. The default parameter values in the syntax command are the odds ratios that we estimated using the NHANES data. And we use lrtest to test the null hypothesis that the odds ratio for the age\(\times\)sex interaction equals 1.

capture program drop simlogit
program simlogit, rclass
    version 16
    // DEFINE THE INPUT PARAMETERS AND THEIR DEFAULT VALUES
    syntax, n(integer)              /// Sample size
          [ alpha(real 0.05)        /// Alpha level
            intercept(real 0.17)    /// Intercept odds ratio
            age(real 1.04)          /// Age odds ratio
            female(real 0.15)       /// Female odds ratio
            interact(real 1.03) ]   //  Interaction odds ratio
    // GENERATE THE RANDOM DATA AND TEST THE NULL HYPOTHESIS
    quietly {
        drop _all
        set obs `n'
        generate age = runiformint(18,65)
        generate female = rbinomial(1,0.5)
        generate interact = age*female
        generate xb = (ln(`intercept') + ln(`age')*age +            ///
                       ln(`female')*female + ln(`interact')*interact)
        generate highbp = rlogistic(xb,1) > 0

        logistic highbp age i.female c.age#i.female
        estimates store full
        logistic highbp age i.female
        estimates store reduced
        lrtest full reduced
    }
    // RETURN RESULTS
    return scalar reject = (r(p)<`alpha')
end

We then use simulate to run simlogit 100 times using the default parameter values and a sample size of 500.

. simulate reject=r(reject), reps(100):   ///
>         simlogit, n(500) intercept(0.17) age(1.04) female(.15) interact(1.03) alpha(0.05)

      command:  simlogit, n(500) intercept(0.17) age(1.04) female(.15)
> interact(1.03) alpha(0.05)
       reject:  r(reject)

Simulations (100)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5 
..................................................    50
..................................................   100

simulate saved the results of the hypothesis tests to a variable named reject. The mean of reject is our estimate of the power to detect an odds ratio of 1.03 for the age\(\times\)sex interaction term assuming a sample size of 500 people.

. summarize reject

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
      reject |        100         .53    .5016136          0          1

Step 6: Write a program called power_cmd_simlogit

We could stop with our quick simulation if we were interested only in a specific set of assumptions. But it’s easy to write an additional program called power_cmd_simlogit, which will allow us to use Stata’s power command to create tables and graphs for a range of sample sizes.

capture program drop power_cmd_simlogit
program power_cmd_simlogit, rclass
    version 16
    // DEFINE THE INPUT PARAMETERS AND THEIR DEFAULT VALUES
    syntax, n(integer)              /// Sample size
          [ alpha(real 0.05)        /// Alpha level
            intercept(real 0.17)    /// Intercept odds ratio
            age(real 1.04)          /// Age odds ratio
            female(real 0.15)       /// Female odds ratio
            interact(real 1.03)     /// Interaction odds ratio
            reps(integer 100) ]     //  Number of repetitions
    // GENERATE THE RANDOM DATA AND TEST THE NULL HYPOTHESIS
    quietly {
        simulate reject=r(reject), reps(`reps'):                      ///
                 simlogit, n(`n') intercept(`intercept') age(`age')   ///
                 female(`female') interact(`interact') alpha(`alpha')
        summarize reject
    }
    // RETURN RESULTS
    return scalar power = r(mean)
    return scalar N = `n'
    return scalar alpha = `alpha'
    return scalar intercept = `intercept'
    return scalar age = `age'
    return scalar female = `female'
    return scalar interact = `interact'
end

Step 7: Write a program called power_cmd_simlogit_init

It’s also easy to write a program called power_cmd_simlogit_init, which will allow us to simulate power for a range of values for the parameters in our model.

capture program drop power_cmd_simlogit_init
program power_cmd_simlogit_init, sclass
    sreturn local pss_colnames "intercept age female interact"
    sreturn local pss_numopts  "intercept age female interact"
end

Using power simlogit

Now, we can use power simlogit to simulate power for a variety of assumptions. The example below simulates power for a range of sample sizes and effect sizes. Sample sizes range from 400 to 1000 people in increments of 200. And the odds ratios for the age\(\times\)sex interaction term range from 1.02 to 1.05 in increments of 0.01.

. power simlogit, n(400(200)1000) intercept(0.17) age(1.04) female(.15) interact(1.02(0.01)1.05)  ///
>                 reps(1000) table graph(xdimension(interact) legend(rows(1)))

Estimated power
Two-sided test

  +------------------------------------------------------------+
  |   alpha   power       N intercept     age  female interact |
  |------------------------------------------------------------|
  |     .05    .197     400       .17    1.04     .15     1.02 |
  |     .05    .423     400       .17    1.04     .15     1.03 |
  |     .05    .645     400       .17    1.04     .15     1.04 |
  |     .05    .797     400       .17    1.04     .15     1.05 |
  |     .05    .275     600       .17    1.04     .15     1.02 |
  |     .05    .602     600       .17    1.04     .15     1.03 |
  |     .05    .838     600       .17    1.04     .15     1.04 |
  |     .05    .927     600       .17    1.04     .15     1.05 |
  |     .05    .357     800       .17    1.04     .15     1.02 |
  |     .05    .675     800       .17    1.04     .15     1.03 |
  |     .05    .899     800       .17    1.04     .15     1.04 |
  |     .05    .987     800       .17    1.04     .15     1.05 |
  |     .05     .46   1,000       .17    1.04     .15     1.02 |
  |     .05    .791   1,000       .17    1.04     .15     1.03 |
  |     .05    .956   1,000       .17    1.04     .15     1.04 |
  |     .05    .993   1,000       .17    1.04     .15     1.05 |
  +------------------------------------------------------------+

Figure 2: Estimated power for the interaction term in a logistic regression model

The table and graph above indicate that 80% power is achieved with four combinations of sample size and effect size. Given our assumptions, we estimate that we will have at least 80% power to detect an odds ratio of 1.04 for sample sizes of 600, 800, and 1000. And we estimate that we will have approximately 80% power to detect an odds ratio of 1.05 with a sample size of 400 people.

In this blog post, I showed you how to simulate statistical power for the interaction term in both linear and logistic regression models. You may be interested in simulating power for variations of these models, and you can modify the examples above for your own purposes. In my next post, I will show you how to simulate power for multilevel and longitudinal models.
