Sunday, January 18, 2026

Exact matching on discrete covariates is the same as regression adjustment


I illustrate that exact matching on discrete covariates and regression adjustment (RA) with fully interacted discrete covariates perform the same nonparametric estimation.

Comparing exact matching with RA

A well-known example from the causal inference literature estimates the average treatment effect (ATE) of pregnant women smoking on their babies’ birthweights. Cattaneo (2010) discusses this example, and I use an extract of his data. (My extract is not representative, and the results below only illustrate the methods I discuss.) See Wooldridge (2010, chap. 21) for an introduction to estimating an ATE.

The birthweight of the baby born to each mother is recorded in bweight. mbsmoke is the binary treatment indicating whether each woman smoked while she was pregnant. I also control for the women’s education (medu), a binary indicator for whether this was her first baby (fbaby), and a binary indicator for whether she was married (mmarried).

As is frequently the case, one of my control variables has too many categories for exact matching or to include as a categorical variable in a fully interacted regression. In example 1, I impose a priori knowledge that allows me to combine 0–8 years of schooling into the “Before HS” category, 9–11 years into “In HS”, 12 into “HS”, and more than 12 into “HS+”, where HS stands for high school.

Example 1: Cutting medu into four categories


. generate medu2 = irecode(medu, 8, 11, 12)

. label define  ed2l 0 "before HS"  1 "in HS" 2 "HS" 3 "HS+"

. label values medu2 ed2l
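
Written out, the mapping that irecode() applies with the cutpoints 8, 11, and 12 is the following. The piecewise notation is my own shorthand and is not part of the original example.

```latex
\[
\texttt{medu2} =
\begin{cases}
0 \ (\text{before HS}) & \text{if } \texttt{medu} \le 8 \\
1 \ (\text{in HS})     & \text{if } 8 < \texttt{medu} \le 11 \\
2 \ (\text{HS})        & \text{if } 11 < \texttt{medu} \le 12 \\
3 \ (\text{HS+})       & \text{if } \texttt{medu} > 12
\end{cases}
\]
```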

Exact matching requires that none of the cells formed by the treatment variable and the values of the discrete variables be empty. In example 2, I create case, which enumerates the set of possible covariate values, and then tabulate case over the treatment levels.

Example 2: Tabulating covariate patterns by treatment level


. egen case = group(medu2 fbaby mmarried) , label

. tab case mbsmoke

    group(medu2 fbaby |  1 if mother smoked
            mmarried) | nonsmoker     smoker |     Total
----------------------+----------------------+----------
before HS No notmarri |        29         18 |        47
 before HS No married |        63          4 |        67
before HS Yes notmarr |        29         12 |        41
before HS Yes married |        17          3 |        20
  in HS No notmarried |       106        103 |       209
     in HS No married |        76         53 |       129
 in HS Yes notmarried |       173         62 |       235
    in HS Yes married |        28         18 |        46
     HS No notmarried |       197        119 |       316
        HS No married |       706        163 |       869
    HS Yes notmarried |       233         90 |       323
       HS Yes married |       502         69 |       571
    HS+ No notmarried |        77         25 |       102
       HS+ No married |       812         58 |       870
   HS+ Yes notmarried |        95         26 |       121
      HS+ Yes married |       635         41 |       676
----------------------+----------------------+----------
                Total |     3,778        864 |     4,642

Some further consolidation might be required, because so few smokers with “before HS” education were married. There are only 4 treated cases with “before HS” education, not first baby, and married; there are only 3 treated cases with “before HS” education, first baby, and married. As I discuss in Done and undone, how I combine the categories is crucial to obtaining consistent estimates. For this example, I leave the categories as previously defined and proceed to estimate the ATE by matching exactly on the covariates.

Example 3: ATE estimated by exact matching on discrete covariates


. teffects nnmatch (bweight ) (mbsmoke), ematch(medu2 fbaby mmarried)

Treatment-effects estimation                   Number of obs      =      4,642
Estimator      : nearest-neighbor matching     Matches: requested =          1
Outcome model  : matching                                     min =          3
Distance metric: Mahalanobis                                  max =        812
------------------------------------------------------------------------------
             |              AI Robust
     bweight |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
ATE          |
     mbsmoke |
    (smoker  |
         vs  |
 nonsmoker)  |  -227.3809   26.99005    -8.42   0.000    -280.2804   -174.4813
------------------------------------------------------------------------------

Exact matching with replacement compares each treated case with the mean of the not-treated cases with the same covariate pattern, and it compares each not-treated case with the mean of the treated cases with the same covariate pattern. The mean of the case-level comparisons estimates the ATE.

RA estimates the ATE by the difference between the averages of the predicted values under treatment and under no treatment, each averaged over all the cases. With fully interacted discrete covariates, the predicted values are the outcome averages within each covariate pattern and treatment level.
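
The equivalence of the two point estimates can be sketched algebraically. The notation below is my own and does not appear in the original discussion: c indexes covariate patterns, c(i) is the pattern of case i, n_{1c} and n_{0c} count the treated and not-treated cases in pattern c, n_c = n_{1c} + n_{0c}, \bar{y}_{1c} and \bar{y}_{0c} are the corresponding outcome means, and N is the sample size. The sketch assumes every cell is nonempty and that matching uses all the exact matches in a cell, as described above.

```latex
\begin{align*}
% Case-level comparison under exact matching with replacement
\hat{\tau}_i &=
  \begin{cases}
    y_i - \bar{y}_{0c(i)} & \text{if case } i \text{ is treated} \\
    \bar{y}_{1c(i)} - y_i & \text{if case } i \text{ is not treated}
  \end{cases} \\[4pt]
% Averaging the comparisons within each pattern, then over patterns
\widehat{\mathrm{ATE}}_{\text{match}}
  &= \frac{1}{N}\sum_{i=1}^{N}\hat{\tau}_i
   = \frac{1}{N}\sum_{c}\Bigl[
       n_{1c}\bigl(\bar{y}_{1c}-\bar{y}_{0c}\bigr)
     + n_{0c}\bigl(\bar{y}_{1c}-\bar{y}_{0c}\bigr)\Bigr]
   = \sum_{c}\frac{n_c}{N}\bigl(\bar{y}_{1c}-\bar{y}_{0c}\bigr) \\[4pt]
% RA with fully interacted covariates predicts the cell means
\widehat{\mathrm{ATE}}_{\text{RA}}
  &= \frac{1}{N}\sum_{i=1}^{N}\bigl(\bar{y}_{1c(i)}-\bar{y}_{0c(i)}\bigr)
   = \sum_{c}\frac{n_c}{N}\bigl(\bar{y}_{1c}-\bar{y}_{0c}\bigr)
\end{align*}
```

Both estimators reduce to the same weighted average of within-pattern differences in means, with weights equal to the sample share of each covariate pattern.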

Example 4 illustrates that exact matching with replacement produces the same point estimates as RA with fully interacted discrete covariates.

Example 4: ATE estimated by RA on discrete covariates


. regress bweight ibn.mbsmoke#ibn.case,
>         noconstant vce(robust) vsquish

Linear regression                               Number of obs     =      4,642
                                                F(32, 4610)       =    5472.14
                                                Prob > F          =     0.0000
                                                R-squared         =     0.9731
                                                Root MSE          =     561.89

-------------------------------------------------------------------------------
              |               Robust
      bweight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
 mbsmoke#case |
   nonsmoker #|
before HS ..  |   3412.345   85.26789    40.02   0.000     3245.179    3579.511
   nonsmoker #|
before HS ..  |   3382.048   64.77681    52.21   0.000     3255.054    3509.041
   nonsmoker #|
before HS ..  |   3095.897   121.4719    25.49   0.000     2857.753     3334.04
   nonsmoker #|
before HS ..  |   3213.588   108.5406    29.61   0.000     3000.797     3426.38
   nonsmoker #|
in HS No n..  |   3219.255    66.9732    48.07   0.000     3087.955    3350.554
   nonsmoker #|
in HS No m..  |   3454.434   57.21777    60.37   0.000      3342.26    3566.608
   nonsmoker #|
in HS Yes ..  |   3227.977   49.20252    65.61   0.000     3131.516    3324.437
   nonsmoker #|
in HS Yes ..  |   3467.286   95.52026    36.30   0.000      3280.02    3654.551
   nonsmoker #|
HS No notm..  |   3327.249   45.20513    73.60   0.000     3238.625    3415.872
   nonsmoker #|
HS No marr..  |   3498.307   20.41325   171.37   0.000     3458.288    3538.327
   nonsmoker #|
HS Yes not..  |   3258.069   38.79208    83.99   0.000     3182.018     3334.12
   nonsmoker #|
HS Yes mar..  |   3382.054   24.69261   136.97   0.000     3333.644    3430.463
   nonsmoker #|
HS+ No not..  |   3227.597   80.73945    39.98   0.000     3069.309    3385.885
   nonsmoker #|
HS+ No mar..  |   3514.036   18.78391   187.08   0.000      3477.21    3550.861
   nonsmoker #|
HS+ Yes no..  |   3248.295   64.86602    50.08   0.000     3121.126    3375.463
   nonsmoker #|
HS+ Yes ma..  |   3441.787   21.05667   163.45   0.000     3400.506    3483.069
      smoker #|
before HS ..  |   3181.111   105.5454    30.14   0.000     2974.192    3388.031
      smoker #|
before HS ..  |    3373.75   229.6108    14.69   0.000     2923.603    3823.897
      smoker #|
before HS ..  |   2924.333   139.0673    21.03   0.000     2651.695    3196.972
      smoker #|
before HS ..  |   2863.333   93.69532    30.56   0.000     2679.646    3047.021
      smoker #|
in HS No n..  |    3038.68   59.37928    51.17   0.000     2922.268    3155.091
      smoker #|
in HS No m..  |   3115.698   58.70879    53.07   0.000     3000.601    3230.795
      smoker #|
in HS Yes ..  |   3147.097   62.21084    50.59   0.000     3025.134     3269.06
      smoker #|
in HS Yes ..  |   3353.889   111.5621    30.06   0.000     3135.174    3572.604
      smoker #|
HS No notm..  |   3061.437   60.37705    50.71   0.000     2943.069    3179.805
      smoker #|
HS No marr..  |   3184.221   47.77988    66.64   0.000     3090.549    3277.892
      smoker #|
HS Yes not..  |   3131.533   44.98026    69.62   0.000     3043.351    3219.716
      smoker #|
HS Yes mar..  |   3199.174   63.82476    50.12   0.000     3074.047    3324.301
      smoker #|
HS+ No not..  |    3002.36   89.60639    33.51   0.000     2826.689    3178.031
      smoker #|
HS+ No mar..  |   3199.707   82.92361    38.59   0.000     3037.137    3362.277
      smoker #|
HS+ Yes no..  |   3161.923   79.54319    39.75   0.000      3005.98    3317.866
      smoker #|
HS+ Yes ma..  |   3271.293   90.92146    35.98   0.000     3093.043    3449.542
-------------------------------------------------------------------------------

. margins r.mbsmoke , vce(unconditional) contrast(nowald)

Contrasts of predictive margins

Expression   : Linear prediction, predict()

------------------------------------------------------------------------
                       |            Unconditional
                       |   Contrast   Std. Err.     [95% Conf. Interval]
-----------------------+------------------------------------------------
               mbsmoke |
(smoker vs nonsmoker)  |  -227.3809   26.82888     -279.9783   -174.7834
------------------------------------------------------------------------

The 32 parameters estimated by regress are the means of the outcome for the 32 cells in the table in example 2 (16 covariate patterns for each of the 2 treatment levels). The standard errors reported by exact matching and RA are asymptotically equivalent but differ in finite samples.

The regression underlying RA with fully interacted discrete covariates is an interaction of the treatment factor with an interaction among all the discrete covariates. Example 5 illustrates that this regression produces the same results as example 4.

Example 5: RA estimated with interactions


. regress bweight ibn.mbsmoke#ibn.medu2#ibn.fbaby#ibn.mmarried,
>         noconstant vce(robust) vsquish

Linear regression                               Number of obs     =      4,642
                                                F(32, 4610)       =    5472.14
                                                Prob > F          =     0.0000
                                                R-squared         =     0.9731
                                                Root MSE          =     561.89

------------------------------------------------------------------------------
             |               Robust
     bweight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     mbsmoke#|
 medu2#fbaby#|
    mmarried |
  nonsmoker #|
  before HS #|
         No #|
 notmarried  |   3412.345   85.26789    40.02   0.000     3245.179    3579.511
  nonsmoker #|
  before HS #|
         No #|
    married  |   3382.048   64.77681    52.21   0.000     3255.054    3509.041
  nonsmoker #|
  before HS #|
        Yes #|
 notmarried  |   3095.897   121.4719    25.49   0.000     2857.753     3334.04
  nonsmoker #|
  before HS #|
        Yes #|
    married  |   3213.588   108.5406    29.61   0.000     3000.797     3426.38
  nonsmoker #|
      in HS #|
         No #|
 notmarried  |   3219.255    66.9732    48.07   0.000     3087.955    3350.554
  nonsmoker #|
      in HS #|
         No #|
    married  |   3454.434   57.21777    60.37   0.000      3342.26    3566.608
  nonsmoker #|
      in HS #|
        Yes #|
 notmarried  |   3227.977   49.20252    65.61   0.000     3131.516    3324.437
  nonsmoker #|
      in HS #|
        Yes #|
    married  |   3467.286   95.52026    36.30   0.000      3280.02    3654.551
  nonsmoker #|
         HS #|
         No #|
 notmarried  |   3327.249   45.20513    73.60   0.000     3238.625    3415.872
  nonsmoker #|
         HS #|
         No #|
    married  |   3498.307   20.41325   171.37   0.000     3458.288    3538.327
  nonsmoker #|
         HS #|
        Yes #|
 notmarried  |   3258.069   38.79208    83.99   0.000     3182.018     3334.12
  nonsmoker #|
         HS #|
        Yes #|
    married  |   3382.054   24.69261   136.97   0.000     3333.644    3430.463
  nonsmoker #|
        HS+ #|
         No #|
 notmarried  |   3227.597   80.73945    39.98   0.000     3069.309    3385.885
  nonsmoker #|
        HS+ #|
         No #|
    married  |   3514.036   18.78391   187.08   0.000      3477.21    3550.861
  nonsmoker #|
        HS+ #|
        Yes #|
 notmarried  |   3248.295   64.86602    50.08   0.000     3121.126    3375.463
  nonsmoker #|
        HS+ #|
        Yes #|
    married  |   3441.787   21.05667   163.45   0.000     3400.506    3483.069
     smoker #|
  before HS #|
         No #|
 notmarried  |   3181.111   105.5454    30.14   0.000     2974.192    3388.031
     smoker #|
  before HS #|
         No #|
    married  |    3373.75   229.6108    14.69   0.000     2923.603    3823.897
     smoker #|
  before HS #|
        Yes #|
 notmarried  |   2924.333   139.0673    21.03   0.000     2651.695    3196.972
     smoker #|
  before HS #|
        Yes #|
    married  |   2863.333   93.69532    30.56   0.000     2679.646    3047.021
     smoker #|
      in HS #|
         No #|
 notmarried  |    3038.68   59.37928    51.17   0.000     2922.268    3155.091
     smoker #|
      in HS #|
         No #|
    married  |   3115.698   58.70879    53.07   0.000     3000.601    3230.795
     smoker #|
      in HS #|
        Yes #|
 notmarried  |   3147.097   62.21084    50.59   0.000     3025.134     3269.06
     smoker #|
      in HS #|
        Yes #|
    married  |   3353.889   111.5621    30.06   0.000     3135.174    3572.604
     smoker #|
         HS #|
         No #|
 notmarried  |   3061.437   60.37705    50.71   0.000     2943.069    3179.805
     smoker #|
         HS #|
         No #|
    married  |   3184.221   47.77988    66.64   0.000     3090.549    3277.892
     smoker #|
         HS #|
        Yes #|
 notmarried  |   3131.533   44.98026    69.62   0.000     3043.351    3219.716
     smoker #|
         HS #|
        Yes #|
    married  |   3199.174   63.82476    50.12   0.000     3074.047    3324.301
     smoker #|
        HS+ #|
         No #|
 notmarried  |    3002.36   89.60639    33.51   0.000     2826.689    3178.031
     smoker #|
        HS+ #|
         No #|
    married  |   3199.707   82.92361    38.59   0.000     3037.137    3362.277
     smoker #|
        HS+ #|
        Yes #|
 notmarried  |   3161.923   79.54319    39.75   0.000      3005.98    3317.866
     smoker #|
        HS+ #|
        Yes #|
    married  |   3271.293   90.92146    35.98   0.000     3093.043    3449.542
------------------------------------------------------------------------------

. margins r.mbsmoke , vce(unconditional) contrast(nowald)

Contrasts of predictive margins

Expression   : Linear prediction, predict()

------------------------------------------------------------------------
                       |            Unconditional
                       |   Contrast   Std. Err.     [95% Conf. Interval]
-----------------------+------------------------------------------------
               mbsmoke |
(smoker vs nonsmoker)  |  -227.3809   26.82888     -279.9783   -174.7834
------------------------------------------------------------------------

Finally, I illustrate that teffects ra produces the same point estimates.

Example 6: RA estimated by teffects


. teffects ra (bweight ibn.medu2#ibn.fbaby#ibn.mmarried, noconstant) (mbsmoke)

Iteration 0:   EE criterion =  2.010e-25
Iteration 1:   EE criterion =  5.818e-26

Treatment-effects estimation                    Number of obs     =      4,642
Estimator      : regression adjustment
Outcome model  : linear
Treatment model: none
------------------------------------------------------------------------------
             |               Robust
     bweight |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
ATE          |
     mbsmoke |
    (smoker  |
         vs  |
 nonsmoker)  |  -227.3809   26.73625    -8.50   0.000     -279.783   -174.9788
-------------+----------------------------------------------------------------
POmean       |
     mbsmoke |
  nonsmoker  |   3402.793    9.59059   354.81   0.000     3383.995     3421.59
------------------------------------------------------------------------------

The standard errors are asymptotically equivalent but differ in finite samples because teffects does not adjust for the number of parameters estimated in the regression, as regress does.
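
As a rough check, and assuming the only difference between the two is the small-sample factor N/(N-k) that regress applies with vce(robust), the reported standard errors can be reconciled by hand. N = 4,642 and k = 32 are taken from the output above; the symbols are my own notation.

```latex
\[
26.73625 \times \sqrt{\frac{N}{N-k}}
  = 26.73625 \times \sqrt{\frac{4642}{4610}}
  \approx 26.73625 \times 1.003465
  \approx 26.8289
\]
```

This is very close to the 26.82888 that margins reports after regress, which is consistent with the degrees-of-freedom explanation.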

Done and undone

I illustrated that exact matching on discrete covariates is the same as RA with fully interacted discrete covariates. Key to both methods is that the covariates are in fact discrete. If some collapsing of categories is performed as above, or if a discrete covariate is formed by cutting up a continuous covariate, all the results require that this combining step be performed correctly.

Exact matching on discrete covariates and RA with fully interacted discrete covariates perform the same nonparametric estimation. Collapsing categories or cutting up discrete covariates performs the same function as a bandwidth in nonparametric kernel regression; it determines which observations are comparable with each other. Just as with kernel regression, the bandwidth must be properly chosen to obtain consistent estimates.

References

Cattaneo, M. 2010. Efficient semiparametric estimation of multi-valued treatment effects under ignorability. Journal of Econometrics 155: 138–154.

Wooldridge, J. M. 2010. Econometric Analysis of Cross Section and Panel Data. 2nd ed. Cambridge, Massachusetts: MIT Press.


