Do you ever fit regressions of the form
ln(yj) = b0 + b1x1j + b2x2j + … + bkxkj + εj
by typing
. generate lny = ln(y)
. regress lny x1 x2 … xk
The above is just an ordinary linear regression, except that ln(y) appears on the left-hand side in place of y.
The next time you need to fit such a model, rather than fitting a regression on ln(y), consider typing
. poisson y x1 x2 … xk, vce(robust)
which is to say, fit instead a model of the form
yj = exp(b0 + b1x1j + b2x2j + … + bkxkj + εj)
Wait, you are probably thinking: Poisson regression assumes the variance is equal to the mean,
E(yj) = Var(yj) = exp(b0 + b1x1j + b2x2j + … + bkxkj)
whereas linear regression merely assumes E(ln(yj)) = b0 + b1x1j + b2x2j + … + bkxkj and places no constraint on the variance. Actually, regression does assume the variance is constant, but because we are working in logs, that amounts to assuming that Var(yj) is proportional to the square of its mean, which is reasonable in many cases and can be relaxed if you specify vce(robust).
Granted, in a Poisson process, the mean is equal to the variance. But if your goal is to fit something like a Mincer earnings model,
ln(earningsj) = b0 + b1*educationj + b2*experiencej + b3*experiencej2 + εj
there is simply no reason to think that the variance of earnings is equal to its mean. If a person has expected earnings of $45,000, there is no reason to think that the variance around that mean is 45,000, which is to say, that the standard deviation is $212.13. Indeed, it would be absurd to think one could predict earnings so accurately based solely on years of schooling and job experience.
Nevertheless, I suggest you fit this model using Poisson regression rather than linear regression. It turns out that the estimated coefficients of the maximum-likelihood Poisson estimator in no way depend on the assumption that E(yj) = Var(yj), so even when the assumption is violated, the estimates of the coefficients b0, b1, …, bk are unaffected. In the maximum-likelihood estimator for Poisson, what does depend on the assumption that E(yj) = Var(yj) are the estimated standard errors of the coefficients b0, b1, …, bk. If the E(yj) = Var(yj) assumption is violated, the reported standard errors are useless. I did not suggest, however, that you type
. poisson y x1 x2 … xk
I suggested that you type
. poisson y x1 x2 … xk, vce(robust)
That is, I suggested that you specify that the variance-covariance matrix of the estimates (of which the standard errors are the square roots of the diagonal) be estimated using the Huber/White/sandwich linearized estimator. That estimator of the variance-covariance matrix does not assume E(yj) = Var(yj), nor does it even require that Var(yj) be constant across j. Thus, Poisson regression with the Huber/White/sandwich linearized estimator of variance is a reasonable alternative to log linear regression, which I am about to show you, and then I am going to tell you why it is better.
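You can see why the point estimates survive a broken variance assumption by looking at the Poisson first-order conditions, which involve only the mean: the fitted coefficients solve sum over j of (yj - exp(Xjb))Xj = 0, and the variance of yj appears nowhere. Here is a small pure-Python sketch of my own (not Stata code) for the intercept-only case, where the condition forces exp(b0) to equal the sample mean no matter how badly the data violate E(yj) = Var(yj):

```python
import math

# Wildly non-Poisson data: the variance is nowhere near the mean of 10
y = [0.0, 0.0, 0.0, 40.0]

# Intercept-only Poisson maximum likelihood via Newton's method.
# Score:   sum(y) - n*exp(b0)
# Hessian: -n*exp(b0)
b0 = 0.0
for _ in range(50):
    score = sum(y) - len(y) * math.exp(b0)
    hessian = -len(y) * math.exp(b0)
    b0 -= score / hessian

print(round(math.exp(b0), 6))  # → 10.0, the sample mean
```

The maximizer sets exp(b0) to the sample mean exactly, so the coefficient estimate is consistent for the mean model whether or not the data are Poisson; only the model-based standard errors go wrong.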
I have created simulated data in which
yj = exp(8.5172 + 0.06*educj + 0.1*expj - 0.002*expj2 + εj)
where εj is distributed normal with mean 0 and variance 1.083 (standard deviation 1.041). Here is the result of estimation using regress:
. regress lny educ exp exp2
      Source |       SS       df       MS              Number of obs =    5000
-------------+------------------------------           F(  3,  4996) =   44.72
       Model |  141.437342     3  47.1457806           Prob > F      =  0.0000
    Residual |  5267.33405  4996  1.05431026           R-squared     =  0.0261
-------------+------------------------------           Adj R-squared =  0.0256
       Total |  5408.77139  4999  1.08197067           Root MSE      =  1.0268

------------------------------------------------------------------------------
         lny |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0716126   .0099511     7.20   0.000      .052104    .0911212
         exp |   .1091811   .0129334     8.44   0.000     .0838261    .1345362
        exp2 |  -.0022044   .0002893    -7.62   0.000    -.0027716   -.0016373
       _cons |   8.272475   .1855614    44.58   0.000     7.908693    8.636257
------------------------------------------------------------------------------
I intentionally created these data to produce a low R-squared.
We obtained the following results:
         truth      est.     S.E.
----------------------------------
educ    0.0600    0.0716   0.0100
exp     0.1000    0.1092   0.0129
exp2   -0.0020   -0.0022   0.0003
----------------------------------
_cons   8.5172    8.2725   0.1856   <- unadjusted (1)
        9.0587    8.7959        ?   <- adjusted   (2)
----------------------------------
(1) To be used for predicting E(ln(yj))
(2) To be used for predicting E(yj)
Note that the estimated coefficients are quite close to the true values. Ordinarily, we would not know the true values, except that I created this artificial dataset and those are the values I used.
For the intercept, I list two values, so I need to explain. We estimated a linear regression of the form,
ln(yj) = b0 + Xjb + εj
As with all linear regressions,
E(ln(yj)) = E(b0 + Xjb + εj)
= b0 + Xjb + E(εj)
= b0 + Xjb
We, however, have no real interest in E(ln(yj)). We fit this log regression as a way of obtaining estimates of our real model, namely
yj = exp(b0 + Xjb + εj)
So rather than taking the expectation of ln(yj), let's take the expectation of yj:
E(yj) = E(exp(b0 + Xjb + εj))
= E(exp(b0 + Xjb) * exp(εj))
= exp(b0 + Xjb) * E(exp(εj))
E(exp(εj)) is not one. For εj distributed N(0, σ2), E(exp(εj)) is exp(σ2/2). We thus obtain
E(yj) = exp(b0 + Xjb) * exp(σ2/2)
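The retransformation factor exp(σ2/2) is easy to verify by simulation. Here is a quick pure-Python check of my own (not from the original analysis), using the same error variance of 1.083 as in the simulated data above:

```python
import math
import random

random.seed(1)
sigma2 = 1.083  # the error variance used in the simulated data

# Average exp(eps) over many N(0, sigma2) draws and compare with exp(sigma2/2)
n = 200_000
mc_mean = sum(math.exp(random.gauss(0.0, math.sqrt(sigma2))) for _ in range(n)) / n

# Both quantities are close to exp(0.5415) ≈ 1.72, not 1
print(round(mc_mean, 2), round(math.exp(sigma2 / 2), 2))
```

The Monte Carlo average lands near exp(σ2/2) ≈ 1.72 rather than 1, which is exactly the multiplicative adjustment derived above.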
People who fit log regressions know about this (or should) and know that to obtain predicted yj values, they must

- Obtain predicted values for ln(yj) = b0 + Xjb.
- Exponentiate the predicted log values.
- Multiply those exponentiated values by exp(σ2/2), where σ2 is the square of the root mean squared error (RMSE) of the regression.

They do this in Stata by typing
. predict yhat
. replace yhat = exp(yhat)
. replace yhat = yhat*exp(e(rmse)^2/2)
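Outside Stata, the same three steps can be sketched in a few lines; here is a minimal Python illustration of mine, plugging in the coefficients and RMSE from the regress output above:

```python
import math

# Coefficients and RMSE taken from the regress output above
b0, b_educ, b_exp, b_exp2 = 8.272475, 0.0716126, 0.1091811, -0.0022044
rmse = 1.0268

def predict_y(educ, exp):
    # Step 1: predicted value of ln(y)
    lnyhat = b0 + b_educ * educ + b_exp * exp + b_exp2 * exp ** 2
    # Step 2: exponentiate
    yhat = math.exp(lnyhat)
    # Step 3: multiply by exp(sigma^2/2), using RMSE^2 as the estimate of sigma^2
    return yhat * math.exp(rmse ** 2 / 2)

# Predicted earnings for 12 years of schooling and 10 years of experience
print(round(predict_y(12, 10), 2))
```

Skipping step 3 understates every predicted yj by the same factor exp(σ2/2), roughly 1.7 with this RMSE.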
In the table I just showed you,
         truth      est.     S.E.
----------------------------------
educ    0.0600    0.0716   0.0100
exp     0.1000    0.1092   0.0129
exp2   -0.0020   -0.0022   0.0003
----------------------------------
_cons   8.5172    8.2725   0.1856   <- unadjusted (1)
        9.0587    8.7959        ?   <- adjusted   (2)
----------------------------------
(1) To be used for predicting E(ln(yj))
(2) To be used for predicting E(yj)
I am setting us up to compare these estimates with those produced by poisson. When we estimate using poisson, we will not need to take logs because the Poisson model is stated in terms of yj, not ln(yj). In preparation for that, I have included two lines for the intercept: 8.5172, which is the intercept reported by regress and is the one appropriate for making predictions of ln(y); and 9.0587, an intercept appropriate for making predictions of y and equal to 8.5172 plus σ2/2. Poisson regression will estimate the 9.0587 result because Poisson is stated in terms of y rather than ln(y).
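The adjusted truth value is just the arithmetic from the previous section applied to the true parameters; a one-line check (shown only to make the adjustment concrete):

```python
sigma2 = 1.083  # the true error variance used to simulate the data

# Adjusted intercept = unadjusted intercept + sigma^2/2
print(round(8.5172 + sigma2 / 2, 4))  # → 9.0587
```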
I placed a question mark in the column for the standard error of the adjusted intercept because, to calculate that, I would need to know the standard error of the estimated RMSE, and regress does not report that.
Let's now look at the results that poisson with option vce(robust) reports. We must not forget to specify option vce(robust) because otherwise, in this model that violates the Poisson assumption that E(yj) = Var(yj), we would obtain incorrect standard errors.
. poisson y educ exp exp2, vce(robust)
note: you are responsible for interpretation of noncount dep. variable

Iteration 0:   log pseudolikelihood = -1.484e+08
Iteration 1:   log pseudolikelihood = -1.484e+08
Iteration 2:   log pseudolikelihood = -1.484e+08

Poisson regression                                Number of obs   =       5000
                                                  Wald chi2(3)    =      67.52
                                                  Prob > chi2     =     0.0000
Log pseudolikelihood = -1.484e+08                 Pseudo R2       =     0.0183

------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0575636   .0127996     4.50   0.000     .0324769    .0826504
         exp |   .1074603   .0163766     6.56   0.000     .0753628    .1395578
        exp2 |  -.0022204   .0003604    -6.16   0.000    -.0029267   -.0015141
       _cons |   9.016428   .2359002    38.22   0.000     8.554072    9.478784
------------------------------------------------------------------------------
So now we can fill in the rest of our table:
                 regress            poisson
         truth      est.     S.E.      est.     S.E.
-----------------------------------------------------
educ    0.0600    0.0716   0.0100    0.0576   0.0128
exp     0.1000    0.1092   0.0129    0.1075   0.0164
exp2   -0.0020   -0.0022   0.0003   -0.0022   0.0003
-----------------------------------------------------
_cons   8.5172    8.2725   0.1856         ?        ?   <- (1)
        9.0587    8.7959        ?    9.0164   0.2359   <- (2)
-----------------------------------------------------
(1) To be used for predicting E(ln(yj))
(2) To be used for predicting E(yj)
I told you that Poisson works, and in this case it works well. I will now tell you that in all cases it works well, and it works better than log regression. You want to think of Poisson regression with the vce(robust) option as a better alternative to log regression.
How is Poisson better?
First, Poisson handles outcomes that are zero. Log regression does not, because ln(0) is -∞. You want to be careful about what it means to handle zeros, however. Poisson handles zeros that arise in correspondence with the model. In the Poisson model, everybody participates in the yj = exp(b0 + Xjb + εj) process. Poisson regression does not handle cases where some participate and others do not, and where those who do not, had they participated, would likely have produced an outcome greater than zero. I would never suggest using Poisson regression to handle zeros in an earned income model, because individuals who earned zero simply did not participate in the labor force. Had they participated, their earnings might have been low, but certainly they would have been greater than zero. Log linear regression does not handle that problem, either.
Natural zeros do arise in other situations, however, and a popular question on Statalist is whether one should recode those natural zeros as 0.01, 0.0001, or 0.0000001 to avoid missing values when using log linear regression. The answer is that you should not recode at all; you should use Poisson regression with vce(robust).
Second, small nonzero values, however they arise, can be influential in log-linear regressions. 0.01, 0.0001, 0.0000001, and 0 may be close to one another on the original scale, but in the logs they are -4.61, -9.21, -16.12, and -∞, and thus not close at all. Pretending that the values are close would be the same as pretending that exp(4.61)=100, exp(9.21)=9,997, exp(16.12)=10,019,062, and exp(∞)=∞ are close to one another. Poisson regression understands that 0.01, 0.0001, 0.0000001, and 0 are indeed nearly equal.
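The way the log stretches near-zero values apart is easy to see directly; a quick illustration of mine:

```python
import math

# Values that are nearly equal on the original scale...
vals = [0.01, 0.0001, 0.0000001]

# ...are spread far apart once logged
print([round(math.log(v), 2) for v in vals])  # → [-4.61, -9.21, -16.12]

# and math.log(0) raises ValueError: the log has no finite value at zero
```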
Third, when estimating with Poisson, you do not need to remember to apply the exp(σ2/2) multiplicative adjustment to transform results from ln(y) to y. I wrote earlier that people who fit log regressions of course remember to apply the adjustment, but the sad fact is that they often do not.
Finally, I would like to tell you that everyone who estimates log models knows about the Poisson-regression alternative and that it is only you who have been out to lunch. You, however, are in esteemed company. At the recent Stata Conference in Chicago, I asked a group of knowledgeable researchers a loaded question, to which the right answer was Poisson regression with option vce(robust), but they mostly got it wrong.
I said to them, "I have a process for which it is perfectly reasonable to assume that the mean of yj is given by exp(b0 + Xjb), but I have no reason to believe that E(yj) = Var(yj), which is to say, no reason to suspect that the process is Poisson. How would you suggest I estimate the model?" Certainly not using Poisson, they replied. Social scientists suggested I use log regression. Biostatisticians and health researchers suggested I use negative binomial regression even when I objected that the process was not the gamma mixture of Poissons that negative binomial regression assumes. "What else can you do?" they said, shrugging their collective shoulders. And of course, they simply assumed overdispersion.
Based on those answers, I was ready to write this blog entry, but it turned out differently than I expected. I was going to slam negative binomial regression. Negative binomial regression makes assumptions about the variance, assumptions different from those made by Poisson, but assumptions nonetheless, and unlike the assumption made by Poisson, those assumptions do appear in the first-order conditions that determine the fitted coefficients that negative binomial regression reports. Not only would negative binomial's standard errors be wrong (which vce(robust) could fix), but the coefficients would be biased, too, and vce(robust) would not fix that. I planned to run simulations showing this.
When I ran the simulations, I was surprised by the results. The negative binomial estimator (Stata's nbreg) was remarkably robust to violations of its variance assumptions so long as the data were overdispersed. In fact, negative binomial regression did about as well as Poisson regression. I did not run enough simulations to make generalizations, and theory tells me those generalizations must favor Poisson, but the simulations suggested that if Poisson does do better, it is not in the first four decimal places. I was impressed. And disappointed. It would have been a dynamite blog entry.
So you will have to content yourself with this one.
Others have preceded me in the knowledge that Poisson regression with vce(robust) is a better alternative to log-linear regression. I direct you to Jeffrey Wooldridge, Econometric Analysis of Cross Section and Panel Data, 2nd ed., chapter 18, or to A. Colin Cameron and Pravin K. Trivedi, Microeconometrics Using Stata, revised edition, chapter 17.3.2.
I first learned about this from a talk given by Austin Nichols, Regression for nonnegative skewed dependent variables, given in 2010 at the Stata Conference in Boston. That talk goes far beyond what I have presented here, and I heartily recommend it.