I illustrate that exact matching on discrete covariates and regression adjustment (RA) with fully interacted discrete covariates perform the same nonparametric estimation.
Comparing exact matching with RA
A well-known example from the causal inference literature estimates the average treatment effect (ATE) of pregnant women smoking on the babies' birthweights. Cattaneo (2010) discusses this example, and I use an extract of his data. (My extract is not representative, and the results below only illustrate the methods I discuss.) See Wooldridge (2010, chap. 21) for an introduction to estimating an ATE.
The birthweight of the baby born to a mother is recorded in bweight. mbsmoke is the binary treatment indicating whether each woman smoked while she was pregnant. I also control for the women's education (medu), a binary indicator for whether this was her first baby (fbaby), and a binary indicator for whether she was married (mmarried).
As is frequently the case, one of my control variables has too many categories for exact matching or to include as a categorical variable in a fully interacted regression. In example 1, I impose a priori knowledge that allows me to combine 0–8 years of schooling into the “Before HS” category, 9–11 years into “In HS”, 12 into “HS”, and more than 12 into “HS+”, where HS stands for high school.
Example 1: Cutting medu into four categories
. generate medu2 = irecode(medu, 8, 11, 12)
. label define ed2l 0 "before HS" 1 "in HS" 2 "HS" 3 "HS+"
. label values medu2 ed2l
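As a quick check on how the cutpoints in the irecode() call above fall, the following sketch applies the same call to a toy variable. The data and the names medu_toy and grp are illustrative assumptions, not part of the extract; the toy values simply run from 0 to 17 years.

```stata
* Toy check of the cutpoints 8, 11, 12 (illustrative data, not the extract)
clear
set obs 18
generate byte medu_toy = _n - 1
generate byte grp = irecode(medu_toy, 8, 11, 12)

* irecode() returns 0 if x <= 8, 1 if 8 < x <= 11, 2 if 11 < x <= 12, 3 if x > 12
assert grp == 0 if medu_toy <= 8
assert grp == 1 if inrange(medu_toy, 9, 11)
assert grp == 2 if medu_toy == 12
assert grp == 3 if medu_toy > 12
tabulate medu_toy grp
```

Each cutpoint is the upper end of its category, which is why 8, 11, and 12 years land in the first three groups and anything above 12 lands in the last.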
Exact matching requires that none of the cells formed by the treatment variable and the values of the discrete variables be empty. In example 2, I create case, which enumerates the set of possible covariate values, and then tabulate case over the treatment levels.
Example 2: Tabulating covariate patterns by treatment level
. egen case = group(medu2 fbaby mmarried) , label
. tab case mbsmoke
group(medu2 fbaby | 1 if mother smoked
mmarried) | nonsmoker smoker | Total
----------------------+----------------------+----------
before HS No notmarri | 29 18 | 47
before HS No married | 63 4 | 67
before HS Yes notmarr | 29 12 | 41
before HS Yes married | 17 3 | 20
in HS No notmarried | 106 103 | 209
in HS No married | 76 53 | 129
in HS Yes notmarried | 173 62 | 235
in HS Yes married | 28 18 | 46
HS No notmarried | 197 119 | 316
HS No married | 706 163 | 869
HS Yes notmarried | 233 90 | 323
HS Yes married | 502 69 | 571
HS+ No notmarried | 77 25 | 102
HS+ No married | 812 58 | 870
HS+ Yes notmarried | 95 26 | 121
HS+ Yes married | 635 41 | 676
----------------------+----------------------+----------
Total | 3,778 864 | 4,642
Some extra consolidation might be required, because so few smokers with “before HS” education were married. There are only 4 treated cases with “before HS” education, not first baby, and married; there are only 3 treated cases with “before HS” education, first baby, and married. As I discuss in Done and undone, how I combine the categories is essential to obtaining consistent estimates. For this example, I leave the categories as previously defined and proceed to estimate the ATE by matching exactly on the covariates.
Example 3: ATE estimated by exact matching on discrete covariates
. teffects nnmatch (bweight ) (mbsmoke), ematch(medu2 fbaby mmarried)
Treatment-effects estimation Number of obs = 4,642
Estimator : nearest-neighbor matching Matches: requested = 1
Outcome model : matching min = 3
Distance metric: Mahalanobis max = 812
------------------------------------------------------------------------------
| AI Robust
bweight | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
ATE |
mbsmoke |
(smoker |
vs |
nonsmoker) | -227.3809 26.99005 -8.42 0.000 -280.2804 -174.4813
------------------------------------------------------------------------------
Exact matching with replacement compares each treated case with the mean of the not-treated cases with the same covariate pattern, and it compares each not-treated case with the mean of the treated cases with the same covariate pattern. The mean of the case-level comparisons estimates the ATE.
RA estimates the ATE by the difference between the averages, taken over all cases, of the values predicted under treatment and under no treatment. With fully interacted discrete covariates, the predicted values are the outcome averages within each covariate pattern.
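A short derivation shows why the two point estimates must coincide. The notation is mine, not the post's: cells $c = 1, \dots, C$ index the covariate patterns, $t_i$ is the treatment indicator, $n_{1c}$ and $n_{0c}$ count the treated and not-treated cases in cell $c$, $n_c = n_{1c} + n_{0c}$, $N = \sum_c n_c$, and $\bar{y}_{1c}$ and $\bar{y}_{0c}$ are the cell-level outcome means. The argument assumes that no cell is empty.

```latex
\begin{align*}
\widehat{\tau}_{\text{match}}
  &= \frac{1}{N}\sum_{c=1}^{C}\Bigl[\sum_{i \in c,\; t_i = 1} (y_i - \bar{y}_{0c})
     + \sum_{i \in c,\; t_i = 0} (\bar{y}_{1c} - y_i)\Bigr] \\
  &= \frac{1}{N}\sum_{c=1}^{C}\bigl[n_{1c}\bar{y}_{1c} - n_{1c}\bar{y}_{0c}
     + n_{0c}\bar{y}_{1c} - n_{0c}\bar{y}_{0c}\bigr] \\
  &= \frac{1}{N}\sum_{c=1}^{C} n_c\,(\bar{y}_{1c} - \bar{y}_{0c}) \\
  &= \frac{1}{N}\sum_{c=1}^{C} n_c\,\bar{y}_{1c}
     - \frac{1}{N}\sum_{c=1}^{C} n_c\,\bar{y}_{0c}
   = \widehat{\tau}_{\text{RA}}.
\end{align*}
```

The first line is the mean of the case-level matching comparisons; the last line is the difference between the average predicted values, because every case in cell $c$ receives the predictions $\bar{y}_{1c}$ and $\bar{y}_{0c}$.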
Example 4 illustrates that exact matching with replacement produces the same point estimates as RA with fully interacted discrete covariates.
Example 4: ATE estimated by RA on discrete covariates
. regress bweight ibn.mbsmoke#ibn.case,
> noconstant vce(robust) vsquish
Linear regression Number of obs = 4,642
F(32, 4610) = 5472.14
Prob > F = 0.0000
R-squared = 0.9731
Root MSE = 561.89
-------------------------------------------------------------------------------
| Robust
bweight | Coef. Std. Err. t P>|t| [95% Conf. Interval]
--------------+----------------------------------------------------------------
mbsmoke#case |
nonsmoker #|
before HS .. | 3412.345 85.26789 40.02 0.000 3245.179 3579.511
nonsmoker #|
before HS .. | 3382.048 64.77681 52.21 0.000 3255.054 3509.041
nonsmoker #|
before HS .. | 3095.897 121.4719 25.49 0.000 2857.753 3334.04
nonsmoker #|
before HS .. | 3213.588 108.5406 29.61 0.000 3000.797 3426.38
nonsmoker #|
in HS No n.. | 3219.255 66.9732 48.07 0.000 3087.955 3350.554
nonsmoker #|
in HS No m.. | 3454.434 57.21777 60.37 0.000 3342.26 3566.608
nonsmoker #|
in HS Yes .. | 3227.977 49.20252 65.61 0.000 3131.516 3324.437
nonsmoker #|
in HS Yes .. | 3467.286 95.52026 36.30 0.000 3280.02 3654.551
nonsmoker #|
HS No notm.. | 3327.249 45.20513 73.60 0.000 3238.625 3415.872
nonsmoker #|
HS No marr.. | 3498.307 20.41325 171.37 0.000 3458.288 3538.327
nonsmoker #|
HS Yes not.. | 3258.069 38.79208 83.99 0.000 3182.018 3334.12
nonsmoker #|
HS Yes mar.. | 3382.054 24.69261 136.97 0.000 3333.644 3430.463
nonsmoker #|
HS+ No not.. | 3227.597 80.73945 39.98 0.000 3069.309 3385.885
nonsmoker #|
HS+ No mar.. | 3514.036 18.78391 187.08 0.000 3477.21 3550.861
nonsmoker #|
HS+ Yes no.. | 3248.295 64.86602 50.08 0.000 3121.126 3375.463
nonsmoker #|
HS+ Yes ma.. | 3441.787 21.05667 163.45 0.000 3400.506 3483.069
smoker #|
before HS .. | 3181.111 105.5454 30.14 0.000 2974.192 3388.031
smoker #|
before HS .. | 3373.75 229.6108 14.69 0.000 2923.603 3823.897
smoker #|
before HS .. | 2924.333 139.0673 21.03 0.000 2651.695 3196.972
smoker #|
before HS .. | 2863.333 93.69532 30.56 0.000 2679.646 3047.021
smoker #|
in HS No n.. | 3038.68 59.37928 51.17 0.000 2922.268 3155.091
smoker #|
in HS No m.. | 3115.698 58.70879 53.07 0.000 3000.601 3230.795
smoker #|
in HS Yes .. | 3147.097 62.21084 50.59 0.000 3025.134 3269.06
smoker #|
in HS Yes .. | 3353.889 111.5621 30.06 0.000 3135.174 3572.604
smoker #|
HS No notm.. | 3061.437 60.37705 50.71 0.000 2943.069 3179.805
smoker #|
HS No marr.. | 3184.221 47.77988 66.64 0.000 3090.549 3277.892
smoker #|
HS Yes not.. | 3131.533 44.98026 69.62 0.000 3043.351 3219.716
smoker #|
HS Yes mar.. | 3199.174 63.82476 50.12 0.000 3074.047 3324.301
smoker #|
HS+ No not.. | 3002.36 89.60639 33.51 0.000 2826.689 3178.031
smoker #|
HS+ No mar.. | 3199.707 82.92361 38.59 0.000 3037.137 3362.277
smoker #|
HS+ Yes no.. | 3161.923 79.54319 39.75 0.000 3005.98 3317.866
smoker #|
HS+ Yes ma.. | 3271.293 90.92146 35.98 0.000 3093.043 3449.542
-------------------------------------------------------------------------------
. margins r.mbsmoke , vce(unconditional) contrast(nowald)
Contrasts of predictive margins
Expression : Linear prediction, predict()
------------------------------------------------------------------------
| Unconditional
| Contrast Std. Err. [95% Conf. Interval]
-----------------------+------------------------------------------------
mbsmoke |
(smoker vs nonsmoker) | -227.3809 26.82888 -279.9783 -174.7834
------------------------------------------------------------------------
The 32 parameters estimated by regress are the means of the outcome for the 32 cells (16 covariate patterns by 2 treatment levels) in the table in example 2. The standard errors reported by exact matching and RA are asymptotically equivalent but differ in finite samples.
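The point estimate can also be rebuilt by hand from numbers already printed above: the cell counts tabulated in example 2 and the 32 cell means reported by regress. The sketch below types those values into a small dataset and takes the count-weighted average of the within-cell differences in means. The variable names (n0, n1, ybar0, ybar1, diff) are my own, and because the means are entered as printed, the result is subject to rounding in the last digit.

```stata
* Rebuild the ATE from printed cell counts (example 2) and cell means (regress)
* Columns: cell id, nonsmoker count, smoker count, nonsmoker mean, smoker mean
clear
input byte case int n0 int n1 double ybar0 double ybar1
 1   29   18  3412.345  3181.111
 2   63    4  3382.048  3373.75
 3   29   12  3095.897  2924.333
 4   17    3  3213.588  2863.333
 5  106  103  3219.255  3038.68
 6   76   53  3454.434  3115.698
 7  173   62  3227.977  3147.097
 8   28   18  3467.286  3353.889
 9  197  119  3327.249  3061.437
10  706  163  3498.307  3184.221
11  233   90  3258.069  3131.533
12  502   69  3382.054  3199.174
13   77   25  3227.597  3002.36
14  812   58  3514.036  3199.707
15   95   26  3248.295  3161.923
16  635   41  3441.787  3271.293
end

* Each cell contributes its smoker-minus-nonsmoker difference in means,
* weighted by the total number of cases in the cell
generate int    n    = n0 + n1
generate double diff = ybar1 - ybar0
summarize diff [aweight = n]
display %9.3f r(mean)
```

The weighted mean comes out at about -227.38, in line with the -227.3809 reported by teffects nnmatch and by margins.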
The regression underlying RA with fully interacted discrete covariates interacts the treatment factor with the full interaction of all the discrete covariates. Example 5 illustrates that this regression produces the same results as example 4.
Example 5: RA estimated with interactions
. regress bweight ibn.mbsmoke#ibn.medu2#ibn.fbaby#ibn.mmarried,
> noconstant vce(robust) vsquish
Linear regression Number of obs = 4,642
F(32, 4610) = 5472.14
Prob > F = 0.0000
R-squared = 0.9731
Root MSE = 561.89
------------------------------------------------------------------------------
| Robust
bweight | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
mbsmoke#|
medu2#fbaby#|
mmarried |
nonsmoker #|
before HS #|
No #|
notmarried | 3412.345 85.26789 40.02 0.000 3245.179 3579.511
nonsmoker #|
before HS #|
No #|
married | 3382.048 64.77681 52.21 0.000 3255.054 3509.041
nonsmoker #|
before HS #|
Yes #|
notmarried | 3095.897 121.4719 25.49 0.000 2857.753 3334.04
nonsmoker #|
before HS #|
Yes #|
married | 3213.588 108.5406 29.61 0.000 3000.797 3426.38
nonsmoker #|
in HS #|
No #|
notmarried | 3219.255 66.9732 48.07 0.000 3087.955 3350.554
nonsmoker #|
in HS #|
No #|
married | 3454.434 57.21777 60.37 0.000 3342.26 3566.608
nonsmoker #|
in HS #|
Yes #|
notmarried | 3227.977 49.20252 65.61 0.000 3131.516 3324.437
nonsmoker #|
in HS #|
Yes #|
married | 3467.286 95.52026 36.30 0.000 3280.02 3654.551
nonsmoker #|
HS #|
No #|
notmarried | 3327.249 45.20513 73.60 0.000 3238.625 3415.872
nonsmoker #|
HS #|
No #|
married | 3498.307 20.41325 171.37 0.000 3458.288 3538.327
nonsmoker #|
HS #|
Yes #|
notmarried | 3258.069 38.79208 83.99 0.000 3182.018 3334.12
nonsmoker #|
HS #|
Yes #|
married | 3382.054 24.69261 136.97 0.000 3333.644 3430.463
nonsmoker #|
HS+ #|
No #|
notmarried | 3227.597 80.73945 39.98 0.000 3069.309 3385.885
nonsmoker #|
HS+ #|
No #|
married | 3514.036 18.78391 187.08 0.000 3477.21 3550.861
nonsmoker #|
HS+ #|
Yes #|
notmarried | 3248.295 64.86602 50.08 0.000 3121.126 3375.463
nonsmoker #|
HS+ #|
Yes #|
married | 3441.787 21.05667 163.45 0.000 3400.506 3483.069
smoker #|
before HS #|
No #|
notmarried | 3181.111 105.5454 30.14 0.000 2974.192 3388.031
smoker #|
before HS #|
No #|
married | 3373.75 229.6108 14.69 0.000 2923.603 3823.897
smoker #|
before HS #|
Yes #|
notmarried | 2924.333 139.0673 21.03 0.000 2651.695 3196.972
smoker #|
before HS #|
Yes #|
married | 2863.333 93.69532 30.56 0.000 2679.646 3047.021
smoker #|
in HS #|
No #|
notmarried | 3038.68 59.37928 51.17 0.000 2922.268 3155.091
smoker #|
in HS #|
No #|
married | 3115.698 58.70879 53.07 0.000 3000.601 3230.795
smoker #|
in HS #|
Yes #|
notmarried | 3147.097 62.21084 50.59 0.000 3025.134 3269.06
smoker #|
in HS #|
Yes #|
married | 3353.889 111.5621 30.06 0.000 3135.174 3572.604
smoker #|
HS #|
No #|
notmarried | 3061.437 60.37705 50.71 0.000 2943.069 3179.805
smoker #|
HS #|
No #|
married | 3184.221 47.77988 66.64 0.000 3090.549 3277.892
smoker #|
HS #|
Yes #|
notmarried | 3131.533 44.98026 69.62 0.000 3043.351 3219.716
smoker #|
HS #|
Yes #|
married | 3199.174 63.82476 50.12 0.000 3074.047 3324.301
smoker #|
HS+ #|
No #|
notmarried | 3002.36 89.60639 33.51 0.000 2826.689 3178.031
smoker #|
HS+ #|
No #|
married | 3199.707 82.92361 38.59 0.000 3037.137 3362.277
smoker #|
HS+ #|
Yes #|
notmarried | 3161.923 79.54319 39.75 0.000 3005.98 3317.866
smoker #|
HS+ #|
Yes #|
married | 3271.293 90.92146 35.98 0.000 3093.043 3449.542
------------------------------------------------------------------------------
. margins r.mbsmoke , vce(unconditional) contrast(nowald)
Contrasts of predictive margins
Expression : Linear prediction, predict()
------------------------------------------------------------------------
| Unconditional
| Contrast Std. Err. [95% Conf. Interval]
-----------------------+------------------------------------------------
mbsmoke |
(smoker vs nonsmoker) | -227.3809 26.82888 -279.9783 -174.7834
------------------------------------------------------------------------
Finally, I illustrate that teffects ra produces the same point estimates.
Example 6: RA estimated by teffects
. teffects ra (bweight bn.medu2#ibn.fbaby#ibn.mmarried, noconstant) (mbsmoke)
Iteration 0: EE criterion = 2.010e-25
Iteration 1: EE criterion = 5.818e-26
Treatment-effects estimation Number of obs = 4,642
Estimator : regression adjustment
Outcome model : linear
Treatment model: none
------------------------------------------------------------------------------
| Robust
bweight | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
ATE |
mbsmoke |
(smoker |
vs |
nonsmoker) | -227.3809 26.73625 -8.50 0.000 -279.783 -174.9788
-------------+----------------------------------------------------------------
POmean |
mbsmoke |
nonsmoker | 3402.793 9.59059 354.81 0.000 3383.995 3421.59
------------------------------------------------------------------------------
The standard errors are asymptotically equivalent but differ in finite samples because teffects does not adjust for the number of parameters estimated in the regression, as regress does.
Done and undone
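The size of the gap can be checked against the printed output. The sketch below assumes that the finite-sample adjustment applied by regress with robust standard errors is the usual N/(N-k) factor on the variance, with N = 4,642 observations and k = 32 estimated cell means, and that teffects ra applies no such factor. Under that assumption, rescaling the teffects ra standard error by the square root of the factor should recover the standard error reported by margins.

```stata
* Rescale the teffects ra standard error (26.73625) by sqrt(N/(N-k))
* N = 4,642 observations, k = 32 parameters; both taken from the output above
display %9.5f 26.73625 * sqrt(4642 / (4642 - 32))
```

The result agrees with the 26.82888 reported after regress and margins to the printed precision, which is consistent with a degrees-of-freedom adjustment being the only difference between the two standard errors.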
I illustrated that exact matching on discrete covariates is the same as RA with fully interacted discrete covariates. Key to both methods is that the covariates are in fact discrete. If some collapsing of categories is performed as above, or if a discrete covariate is formed by cutting up a continuous covariate, all the results require that this combining step be performed correctly.
Exact matching on discrete covariates and RA with fully interacted discrete covariates perform the same nonparametric estimation. Collapsing categories or cutting up covariates performs the same function as a bandwidth in nonparametric kernel regression; it determines which observations are comparable with each other. Just as with kernel regression, the bandwidth must be properly chosen to obtain consistent estimates.
References
Cattaneo, M. 2010. Efficient semiparametric estimation of multi-valued treatment effects under ignorability. Journal of Econometrics 155: 138–154.
Wooldridge, J. M. 2010. Econometric Analysis of Cross Section and Panel Data. 2nd ed. Cambridge, Massachusetts: MIT Press.
