Tuesday, December 9, 2025

Calculating energy utilizing Monte Carlo simulations, half 4: Multilevel/longitudinal fashions


In my final three posts, I confirmed you the way to calculate energy for a t check utilizing Monte Carlo simulations, the way to combine your simulations into Stata’s energy command, and the way to do that for linear and logistic regression fashions. In immediately’s put up, I’m going to point out you the way to estimate energy for multilevel/longitudinal fashions utilizing simulations. You could wish to evaluation my earlier put up titled “How you can simulate multilevel/longitudinal information” earlier than you learn this put up.

Our objective is to put in writing a program that can calculate energy for various values of the mannequin parameters. For instance, determine 1 shows energy for various numbers of observations at stage 1 and a pair of in a longitudinal mannequin.

Determine 1: Estimated energy for a multilevel/longitudinal mannequin

In my final put up, I prompt the next steps for writing such a program, so let’s comply with these steps once more for a multilevel/longitudinal mannequin.

  1. Write down the regression mannequin of curiosity, together with all parameters.
  2. Specify the main points of the covariates, such because the vary of age or the proportion of females.
  3. Find or take into consideration affordable values for the parameters in your mannequin.
  4. Simulate a single dataset assuming the choice speculation, and match the mannequin.
  5. Write a program to create the datasets, match the fashions, and use simulate to check this system.
  6. Write a program known as power_cmd_mymethod, which lets you run your simulations with energy.
  7. Write a program known as power_cmd_mymethod_init so that you could use numlists for all parameters.

On this instance, let’s think about that you’re planning a longitudinal research of childrens’ weight and you have an interest within the interplay of age and intercourse.

Step 1: Write down the mannequin

Step one towards simulating energy is to put in writing down the mannequin.

[
{bf weight}_{it} = beta_0 + beta_1({bf age}_{it}) + beta_2({bf female}_i) + beta_3({bf age}_{it}times {bf female}_i) + u_{0i} + u_{1i}({bf age}) + e_{it}
]

Right here i indexes youngsters, t indexes time (age), and we assume that (u_{0i} sim N(0,tau_0)), (u_{1i} sim N(0,tau_1)), (e_{it} sim N(0,sigma)), and ({rm cov}(tau_0,tau_1)=0).

We might want to create variables for weight, age, feminine, and the interplay time period age(occasions )feminine. We may even must specify affordable parameter values for (beta_0), (beta_1), (beta_2), (beta_3), (tau_0), (tau_1), and (sigma).

Again to desk of contents

Step 2: Specify the main points of the covariates

Subsequent, we want to consider the covariates in our mannequin. This can be a longitudinal research, so we have to specify the beginning age, the size of time between measurements, and the whole variety of measurements. We additionally want to contemplate the proportion of women and men in our research. Are we prone to pattern 50% females and 50% males?

Let’s assume that we are going to measure the childrens’ weight each 6 months for two years starting at age 12. And let’s additionally assume that the pattern will likely be 50% feminine. The interplay time period age(occasions )feminine is straightforward to calculate as soon as we create variables for age and feminine.

Again to desk of contents

Step 3: Find or take into consideration affordable values for the parameters in your mannequin

Subsequent, we want to consider affordable values for the parameters in our mannequin. We might select parameter values based mostly on a evaluation of the literature, outcomes from a pilot research, or from publicly obtainable information.

I’ve chosen to make use of information from a research of the load of Asian youngsters. You possibly can obtain and describe this dataset by typing

. use http://www.stata-press.com/information/r16/childweight.dta, clear
(Weight information on Asian youngsters)

. describe

Accommodates information from http://www.stata-press.com/information/r16/childweight.dta
  obs:           198                          Weight information on Asian youngsters
 vars:             5                          23 Might 2016 15:12
 dimension:         3,168                          (_dta has notes)
--------------------------------------------------------------------------------
              storage   show    worth
variable title   kind    format     label      variable label
--------------------------------------------------------------------------------
id              int     %8.0g                 little one identifier
age             float   %8.0g                 age in years
weight          float   %8.0g                 weight in Kg
brthwt          int     %8.0g                 Delivery weight in g
lady            float   %9.0g      bg         gender
--------------------------------------------------------------------------------
Sorted by: id  age

This dataset consists of variables for age measured in years (age), weight measured in kilograms (weight), and gender (lady). We will confirm that these are longitudinal information by itemizing among the observations.

. listing in 1/5

     +---------------------------------------+
     | id       age   weight   brthwt   lady |
     |---------------------------------------|
  1. | 45   .136893    5.171     4140    boy |
  2. | 45   .657084    10.86     4140    boy |
  3. | 45   1.21834    13.15     4140    boy |
  4. | 45   1.42916     13.2     4140    boy |
  5. | 45   2.27242    15.88     4140    boy |
     +---------------------------------------+

Baby 45 is a boy whose weight was recorded 5 occasions over two years. Discover that age is just not saved as absolute age. It’s saved as decimals representing the variety of years since an unknown time limit. This isn’t an issue as a result of we have an interest within the change of weight over time. We will match our mannequin to this dataset to estimate the parameters.

. blended weight c.age##i.lady || id: age, stddev

Performing EM optimization:

Performing gradient-based optimization:

Iteration 0:   log chance = -339.79909
Iteration 1:   log chance = -339.41232
Iteration 2:   log chance = -339.41033
Iteration 3:   log chance = -339.41032

Computing customary errors:

Blended-effects ML regression                     Variety of obs     =        198
Group variable: id                              Variety of teams  =         68

                                                Obs per group:
                                                              min =          1
                                                              avg =        2.9
                                                              max =          5

                                                Wald chi2(3)      =     680.51
Log chance = -339.41032                     Prob > chi2       =     0.0000

------------------------------------------------------------------------------
      weight |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   3.588488   .1892333    18.96   0.000     3.217598    3.959379
             |
        lady |
       lady  |   -.468204   .2954284    -1.58   0.113    -1.047233     .110825
             |
  lady#c.age |
       lady  |  -.2397793    .267607    -0.90   0.370    -.7642793    .2847208
             |
       _cons |   5.351429   .2076606    25.77   0.000     4.944421    5.758436
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
id: Unbiased              |
                     sd(age) |   .5662947   .1219764      .3712777    .8637461
                   sd(_cons) |   .2384136   .4034962      .0086445    6.575403
-----------------------------+------------------------------------------------
                sd(Residual) |   1.165676   .0762599      1.025395    1.325148
------------------------------------------------------------------------------
LR check vs. linear mannequin: chi2(2) = 20.38                 Prob > chi2 = 0.0000

Notice: LR check is conservative and supplied just for reference.

We will use the parameter estimates from this mannequin to assist us select the parameters for our simulation. For instance, we might start our simulation by setting the simulation parameters equal to the parameter estimates above: (beta_0=5.35), (beta_1=3.59), (beta_2=-0.47), (beta_3=-0.24), (tau_0=0.24), (tau_1=0.57), and (sigma=1.17).

Again to desk of contents

Step 4: Simulate a single dataset assuming the choice speculation, and match the mannequin

Subsequent, we create a simulated dataset based mostly on our assumptions in regards to the mannequin below the choice speculation. We’ll simulate 5 observations at 6-month increments for 200 youngsters. You could want to learn my earlier weblog put up titled “How you can simulate multilevel/longitudinal information” you probably have not completed this earlier than.

The code block under is much like the code we used to create the info for our linear regression mannequin, however there are two options which are distinctive to simulating multilevel information. First, we generate the child-level variables little one and feminine in addition to the random deviations u_0i and u_1i. Second, we use increase to make 5 copies of the observations for every little one. Third, we use bysort little one: to generate age inside every little one. Fourth, we generate the observation-level variable interplay and the observation-level error (e_ij). Lastly, we generate the variable weight utilizing a linear mixture of parameter values and the opposite variables that we generated.

set seed 16
clear
set obs 200
generate little one = _n
generate feminine = rbinomial(1,0.5)
generate u_0i = rnormal(0,0.25)
generate u_1i = rnormal(0,0.60)
increase 5
bysort little one: generate age = (_n-1)*0.5
generate interplay = age*feminine
generate e_ij = rnormal(0,1.2)
generate weight = 5.35 + 3.6*age + (-0.5)*feminine + (-0.25)*interplay ///
    + u_0i + age*u_1i + e_ij

Let’s listing the info for little one 1 to test our work. The simulated information consists of 5 observations for weight, age, and feminine. Our dataset additionally consists of the random deviations that we might not observe in an actual dataset.

. listing little one weight age feminine u_0i u_1i e_ij if little one==1

      +------------------------------------------------------------------+
      | little one     weight   age   feminine       u_0i       u_1i       e_ij |
      |------------------------------------------------------------------|
   1. |     1   6.957936     0        1   .0530039   .6878794   2.054933 |
   2. |     1   8.517486    .5        1   .0530039   .6878794   1.595542 |
   3. |     1   10.28054     1        1   .0530039   .6878794   1.339654 |
   4. |     1   11.48091   1.5        1   .0530039   .6878794   .5210837 |
   5. |     1   14.25277     2        1   .0530039   .6878794   1.274008 |
      +------------------------------------------------------------------+

We will then use blended to suit a mannequin to our simulated information.

. blended weight age i.feminine c.age#i.feminine || little one: age , stddev nolog noheader

------------------------------------------------------------------------------
      weight |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   3.654038   .1029402    35.50   0.000     3.452279    3.855797
    1.feminine |  -.3765285   .1389315    -2.71   0.007    -.6488291   -.1042279
             |
feminine#c.age |
          1  |  -.4101041    .142071    -2.89   0.004    -.6885581   -.1316501
             |
       _cons |   5.309637   .1006654    52.75   0.000     5.112336    5.506937
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
little one: Unbiased           |
                     sd(age) |   .6628522   .0544682      .5642497    .7786855
                   sd(_cons) |   .3342406   .1071213       .178343    .6264151
-----------------------------+------------------------------------------------
                sd(Residual) |   1.190916   .0316823      1.130411    1.254659
------------------------------------------------------------------------------
LR check vs. linear mannequin: chi2(2) = 157.47                Prob > chi2 = 0.0000

Notice: LR check is conservative and supplied just for reference.

We will then check the null speculation that the interplay time period equals zero utilizing a likelihood-ratio check. Recall from my final put up that we will calculate a likelihood-ratio check by becoming the mannequin with and with out the intereaction time period feminine#c.age and use lrtest to calculate the check statistic.

blended weight age i.feminine c.age#i.feminine || little one: age , stddev nolog noheader
estimates retailer full
blended weight age i.feminine || little one: age , stddev nolog noheader
estimates retailer decreased
lrtest full decreased

The p-value for our check is 0.0041, so we might reject the null speculation that the interplay time period equals zero.

. lrtest full decreased

Probability-ratio check                                 LR chi2(1)  =      8.23
(Assumption: decreased nested in full)                  Prob > chi2 =    0.0041

On this instance, each fashions converged. However generally fashions fail to converge for among the datasets once we match fashions to many simulated datasets. This is actually because the method of simulating information can lead to information which are incompatible with the mannequin. However there could also be different causes, and you need to think about them fastidiously if you happen to observe many fashions that fail to converge.

Lack of mannequin convergence also can trigger our simulation to run far longer than crucial. By default, blended will run 16,000 maximum-likelihood iterations earlier than it stops. Within the code block under, I’ve added choices and contours to deal with fashions that fail to converge. The choice iter(200) within the blended instructions will cease them from working if a mannequin fails to converge within the first 200 iterations. blended will retailer a worth of 1 to e(converged) if the mannequin converges and 0 in any other case. The second line of the code block shops the worth of e(converged) from the primary mannequin to an area macro named conv1. The fifth line of the code block shops the worth of e(converged) from the second mannequin to an area macro named conv2. The final line of the code block shops a worth to the native macro reject. The worth that’s saved in reject is set by the conditional operate cond(). If each blended fashions converged and `conv1’+`conv2’==2, then r(p)<`alpha’ will likely be evaluated. A worth of 1 will likely be saved to reject if the likelihood-ratio p-value, r(p), is lower than the alpha stage laid out in `alpha’ and 0 in any other case. If both of the blended fashions fails to converge and `conv1’+`conv2′ doesn’t equal 2, then a lacking worth will likely be saved to reject.

blended weight age i.feminine c.age#i.feminine || little one: age, iter(200)
native conv1 = e(converged)
estimates retailer full
blended weight age i.feminine || little one: age, iter(200)
native conv2 = e(converged)
estimates retailer decreased
lrtest full decreased
native reject = cond(`conv1'+`conv2'==2, r(p)<`alpha', .)

Again to desk of contents

Step 5: Write a program to create the datasets, match the fashions, and use simulate to check this system

Subsequent, let’s write a program that creates datasets below the choice speculation, suits blended fashions, assessments the null speculation of curiosity, and makes use of simulate to run many iterations of this system.

The code block under accommodates the syntax for a program named simmixed. The default parameter values within the syntax command are much like the parameters that we estimated utilizing the Asian youngsters dataset. And we use lrtest to check the null speculation that the parameter for the age(occasions )feminine interplay equals zero.

seize program drop simmixed
program simmixed, rclass
    model 16
    // PARSE INPUT
    syntax, n1(integer)             ///
            n(integer)              ///
          [ alpha(real 0.05)        ///
            intercept(real 5.35)    ///
            age(real 3.6)           ///
            female(real -0.5)       ///
            interact(real -0.25)    ///
            u0i(real 0.25)          ///
            u1i(real 0.60)          ///
            eij(real 1.2) ]

    // COMPUTE POWER
    quietly 
    // RETURN RESULTS
    return scalar reject = `reject'
    return scalar conv = `conv1'+`conv2'
finish

We then use simulate to run simmixed 10 occasions utilizing the default parameter values for five observations on every of 200 youngsters.

. simulate reject=r(reject) converged=r(conv), reps(10) seed(12345):      
>              simmixed, n1(5) n(200)

      command:  simmixed, n1(5) n(200)
       reject:  r(reject)
    converged:  r(conv)

Simulations (10)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..........

simulate saved the outcomes of the speculation assessments to a variable named reject. The imply of reject is our estimate of the ability to check the null speculation that the age(occasions )intercourse interplay time period equals zero, assuming that the load of 200 youngsters is measured 5 occasions.

. summarize reject

    Variable |        Obs        Imply    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
      reject |         10          .3    .4830459          0          1

Again to desk of contents

Step 6: Write a program known as power_cmd_mymethod, which lets you run your simulations with energy

We might cease with our fast simulation if we have been solely in a selected set of assumptions. But it surely’s straightforward to put in writing an extra program named power_cmd_simmixed that can permit us to make use of Stata’s energy command to create tables and graphs for a spread of pattern sizes.

seize program drop power_cmd_simmixed
program power_cmd_simmixed, rclass
    model 16
    // PARSE INPUT
    syntax, n1(integer)             ///
            n(integer)              ///
          [ alpha(real 0.05)        ///
            intercept(real 5.35)    ///
            age(real 3.6)           ///
            female(real -0.5)       ///
            interact(real -0.25)    ///
            u0i(real 0.25)          ///
            u1i(real 0.60)          ///
            eij(real 1.2)           ///
            reps(integer 1000) ]

    // COMPUTE POWER
    quietly {
        simulate reject=r(reject), reps(`reps'):                            ///
        simmixed, n1(`n1') n(`n') alpha(`alpha') intercept(`intercept')     ///
                  age(`age') feminine(`feminine') work together(`work together')          ///
                  u0i(`u0i') u1i(`u1i') eij(`eij')
        summarize reject
    }

    // RETURN RESULTS
    return scalar energy = r(imply)
    return scalar n1 = `n1'
    return scalar N = `n'
    return scalar alpha = `alpha'
    return scalar intercept = `intercept'
    return scalar age = `age'
    return scalar feminine = `feminine'
    return scalar work together = `work together'
    return scalar u0i = `u0i'
    return scalar u1i = `u1i'
    return scalar eij = `eij'
finish

Again to desk of contents

Step 7: Write a program known as power_cmd_mymethod_init so that you could use numlists for all parameters

It’s additionally straightforward to put in writing a program named power_cmd_simmixed_init that can permit us to simulate energy for a spread of values for the parameters in our mannequin.

seize program drop power_cmd_simmixed_init
program power_cmd_simmixed_init, sclass
      model 16
      sreturn clear
      // ADD COLUMNS TO THE OUTPUT TABLE
      sreturn native pss_colnames "n1 intercept age feminine work together u0i u1i eij"
      // ALLOW NUMLISTS FOR ALL PARAMETERS
      sreturn native pss_numopts  "n1 intercept age feminine work together u0i u1i eij"
finish

Utilizing energy simmixed

Now, we will use energy simmixed to simulate energy for a wide range of assumptions. The instance under simulates energy for a spread of pattern sizes at each ranges 1 and a pair of. Degree 2 pattern sizes vary from 100 to 500 youngsters in increments of 100. At stage 1, we think about 5 and 6 observations per little one.

. energy simmixed, n1(5 6) n(100(100)500) reps(1000)                         
>               desk(n1 N energy)                                         
>               graph(ydimension(energy) xdimension(N) plotdimension(n1)    
>               xtitle(Degree 2 Pattern Dimension) legend(title(Degree 1 Pattern Dimension)))
xxxxxxxxxxxxxxxxxxxxxxxxxxx
Estimated energy
Two-sided check

  +-------------------------+
  |      n1       N   energy |
  |-------------------------|
  |       5     100   .2629 |
  |       6     100    .313 |
  |       5     200    .397 |
  |       6     200    .569 |
  |       5     300    .621 |
  |       6     300    .735 |
  |       5     400    .734 |
  |       6     400    .855 |
  |       5     500    .828 |
  |       6     500    .917 |
  +-------------------------+

Determine 2: Estimated energy for a multilevel/longitudinal mannequin
graph1

The desk and graph above point out that 80% energy is achieved with three mixtures of pattern sizes. Given our assumptions, we estimate that we are going to have at the very least 80% energy to detect an interplay parameter of -0.25 with 400 youngsters measured 6 occasions every and 500 youngsters measured 5 or 6 occasions every.

Operating our simulations with energy provides us the flexibleness to make use of energy simmixed to estimate energy for various values of any mannequin parameter we want. For instance, we might think about completely different values of the interplay time period together with completely different numbers of observations per little one.

On this weblog put up, I confirmed you the way to simulate statistical energy for the interplay time period in a multilevel regression mannequin. You could be all in favour of simulating energy for variations of those fashions, and you may modify this instance on your personal functions. In my subsequent put up, I’ll present you the way to simulate energy for path coefficients in structural equation fashions.



Related Articles

Latest Articles