Monday, March 16, 2026

Patrick Boyle lays out the ugly details of Elon Musk's final scam (maybe in both senses of the word).



From the video description:

SpaceX is targeting a $1.75 trillion valuation for what could be the
largest IPO in history. In this video, we examine how Elon Musk is
folding a money-burning AI startup and a struggling social media
platform into a rocket company to justify a price tag that defies
financial gravity. From the engineering absurdity of "orbital data
centers" and lunar railguns to the structural manipulation of the Nasdaq
100, we explore how low-float strategies and "fast-track" index
inclusion rules are being used to turn passive 401(k) investors into
exit liquidity for insiders. We look at the gap between EBITDA "vibes"
and GAAP reality and the pivot from Mars to the Moon.

 

This is a slightly drier treatment than we've come to expect from Boyle. There are no laugh-out-loud lines or extended bits of beautiful Irish deadpan sarcasm. Boyle means business here, and his presentation is devastating.

Managing to be concise and yet in-depth, Boyle lays out the legal absurdity of the entire thing: from the silliness of shooting data centers into orbit, to SpaceX's suspect profitability, to the way Musk used the rubber-stamp board of directors to bail out his disastrous artificial intelligence company (currently losing a billion dollars a month), which he had previously used to bail out his even more disastrous purchase of Twitter (all of which echoed his first great scam and the basis of his fortune, leaving Tesla investors holding the bag for SolarCity, but we really don't have time to dive that deep), to the various schemes and misrepresentations he is currently employing to get a trillion-dollar-plus market cap for SpaceX before the credit bubble and/or the artificial intelligence bubble pop.

The video is too pithy to summarize (just watch the damn thing), but I did want to highlight this one part.

Video transcript cleaned up by ChatGPT:

This all leads us to the question of how Elon Musk actually expects to get that $1.75 trillion price tag. The answer isn't found in the physics of rockets, but instead in the physics of the stock market.

The most important thing to understand about this IPO is that SpaceX isn't selling the whole company to the public. They're likely only releasing a tiny float. The financial press is suggesting that just 5 to 10% of the total shares might be issued. By keeping the supply of stock artificially low while marketing it to every retail investor on the planet, this could create a supply squeeze before the opening bell even rings.

When everyone wants a piece of the future but there are only a few shares to go around, the price has only one way to go, and that's up.

Craig Coben wrote about this low-float strategy a few years ago in the Financial Times, where he pointed out that many recent tech IPOs had floated only a tiny number of shares, often between 7 and 10%, compared with the historical average of 20%. He used the example of Instacart, where only 8% of the shares were floated, and of those 60% went to cornerstone investors and 5% to friends and family. This left a tiny supply of shares to be traded, presumably to cultivate a sense of scarcity.

The real magic for SpaceX may happen with the index-inclusion trap. This isn't just a lucky break for Elon Musk: it's reported to be driven by a specific demand from him. According to Reuters, SpaceX made early inclusion in the NASDAQ-100 index a necessary condition for agreeing to list on the exchange. In response, the exchange is consulting on a new rule that would allow SpaceX to join the NASDAQ-100 after just 15 trading days, bypassing the year of price discovery required for every other company.

And it isn't just the NASDAQ either. Bloomberg reports that S&P Dow Jones Indices is also considering historic rule changes to fast-track SpaceX into the S&P 500.

This is significant because there is roughly $24 trillion tied to that index. Maybe Elon can get on the phone to the people at Russell to see if he can get SpaceX classified as a value stock too.

If SpaceX is added to the S&P 500 and the NASDAQ-100, every passive investor will find themselves buying the stock at whatever price the supply squeeze has manufactured.

As veteran fund manager George Noble points out, the proposed five-times float multiplier used to determine its NASDAQ index weighting is "shameless structural manipulation." It effectively turns every pension fund into exit liquidity for SpaceX insiders, where your retirement account is being forced to buy into a bubble designed by the seller.

As Noble argues, the rules are being rewritten to benefit IPO issuers and early-stage insiders, and your capital is the tool being used to enrich them.

We've seen this movie before. In December 2020, Tesla was added to the S&P 500 after a massive run-up in price. Since the day of inclusion, Tesla has underperformed the broader index by more than 20%. The passive investors who were forced to buy in at an enormous valuation right before sales started drying up became the bag holders, while the early insiders took their victory laps.

SpaceX may be an extraordinary engineering company, but it's being sold to the public as a financial miracle at a price that, to me, is very difficult to explain.

This could be Musk's final scam in terms of scale (the potential payout here makes the mind reel), but also in terms of being his final chance to cash in at this level. There are signs (spelled out by Boyle and already familiar to anyone who has been following the story) that things are starting to dry up. Add to that the likely scrutiny of congressional hearings if the Democrats retake one or both houses in November. Remember that Musk apparently immigrated here illegally, was already forced to step down as chairman of Tesla due to fraud charges, is actively funding far-right extremists around the world, has extensive ties to Epstein, and has an AI company best known for creating nonconsensual, sexualized images of undressed women and children. Retail investors may be the dumbest of dumb money, but presumably even they aren't stupid enough to bet on an aerospace company that can't get any government contracts.

Introduction to treatment effects in Stata: Part 1



This post was written jointly with David Drukker, Director of Econometrics, StataCorp.

The topic for today is the treatment-effects features in Stata.

Treatment-effects estimators estimate the causal effect of a treatment on an outcome based on observational data.

In today's post, we will discuss four treatment-effects estimators:

  1. RA: Regression adjustment
  2. IPW: Inverse probability weighting
  3. IPWRA: Inverse probability weighting with regression adjustment
  4. AIPW: Augmented inverse probability weighting

We will save the matching estimators for part 2.

We should note that nothing about treatment-effects estimators magically extracts causal relationships. As with any regression analysis of observational data, the causal interpretation must be based on a reasonable underlying scientific rationale.

Introduction

We’re going to focus on remedies and outcomes.

A therapy may very well be a brand new drug and the end result blood stress or levels of cholesterol. A therapy may very well be a surgical process and the end result affected person mobility. A therapy may very well be a job coaching program and the end result employment or wages. A therapy may even be an advert marketing campaign designed to extend the gross sales of a product.

Take into account whether or not a mom’s smoking impacts the load of her child at beginning. Questions like this one can solely be answered utilizing observational information. Experiments could be unethical.

The issue with observational information is that the themes select whether or not to get the therapy. For instance, a mom decides to smoke or to not smoke. The themes are mentioned to have self-selected into the handled and untreated teams.

In a super world, we’d design an experiment to check cause-and-effect and treatment-and-outcome relationships. We might randomly assign topics to the handled or untreated teams. Randomly assigning the therapy ensures that the therapy is impartial of the end result, which significantly simplifies the evaluation.

Causal inference requires the estimation of the unconditional technique of the outcomes for every therapy degree. We solely observe the end result of every topic conditional on the acquired therapy no matter whether or not the info are observational or experimental. For experimental information, random project of the therapy ensures that the therapy is impartial of the end result; so averages of the outcomes conditional on noticed therapy estimate the unconditional technique of curiosity. For observational information, we mannequin the therapy project course of. If our mannequin is appropriate, the therapy project course of is taken into account nearly as good as random conditional on the covariates in our mannequin.

Let’s contemplate an instance. Determine 1 is a scatterplot of observational information much like these utilized by Cattaneo (2010). The therapy variable is the mom’s smoking standing throughout being pregnant, and the end result is the birthweight of her child.

The crimson factors symbolize the moms who smoked throughout being pregnant, whereas the inexperienced factors symbolize the moms who didn’t. The moms themselves selected whether or not to smoke, and that complicates the evaluation.

We can’t estimate the impact of smoking on birthweight by evaluating the imply birthweights of infants of moms who did and didn’t smoke. Why not? Look once more at our graph. Older moms are likely to have heavier infants no matter whether or not they smoked whereas pregnant. In these information, older moms have been additionally extra prone to be people who smoke. Thus, mom’s age is said to each therapy standing and consequence. So how ought to we proceed?

RA: The regression adjustment estimator

RA estimators model the outcome to account for the nonrandom treatment assignment.

We might ask, "How would the outcomes have changed had the mothers who smoked chosen not to smoke?" or "How would the outcomes have changed had the mothers who did not smoke chosen to smoke?" If we knew the answers to these counterfactual questions, analysis would be easy: we would just subtract the observed outcomes from the counterfactual outcomes.

The counterfactual outcomes are called unobserved potential outcomes in the treatment-effects literature. Sometimes the word unobserved is dropped.

If we could construct measurements of these unobserved potential outcomes, our data might look like this:

Figure 2

In figure 2, the observed data are shown using solid points and the unobserved potential outcomes are shown using hollow points. The hollow red points represent the potential outcomes for the smokers had they not smoked. The hollow green points represent the potential outcomes for the nonsmokers had they smoked.

We can then estimate the unobserved potential outcomes by fitting separate linear regression models on the observed data (solid points) for the two treatment groups.

Figure 3

In figure 3, we have one regression line for nonsmokers (the green line) and a separate regression line for smokers (the red line).

Let's understand what the two lines mean:

Figure 4

The green point on the left in figure 4, labeled Observed, is an observation for a mother who did not smoke. The point labeled E(y0) on the green regression line is the expected birthweight of the baby given the mother's age and that she did not smoke. The point labeled E(y1) on the red regression line is the expected birthweight of the baby for the same mother had she smoked.

The difference between these expectations estimates the covariate-specific treatment effect for those who did not get the treatment.

Now, let's look at the other counterfactual question.

The red point on the right in figure 4, labeled Observed in red, is an observation for a mother who smoked during pregnancy. The points on the green and red regression lines again represent the expected birthweights (the potential outcomes) of the mother's baby under the two treatment conditions.

The difference between these expectations estimates the covariate-specific treatment effect for those who got the treatment.

Note that we estimate an average treatment effect (ATE), conditional on covariate values, for each subject. Furthermore, we estimate this effect for each subject, regardless of which treatment was actually received. Averages of these effects over all the subjects in the data estimate the ATE.

We could also use figure 4 to motivate a prediction of the outcome that each subject would obtain for each treatment level, regardless of the treatment received. The story is analogous to the one above. Averages of these predictions over all the subjects in the data estimate the potential-outcome means (POMs) for each treatment level.

It is reassuring that the difference in the estimated POMs is the same estimate of the ATE discussed above.

The ATE on the treated (ATET) is similar to the ATE, but it uses only the subjects who were observed in the treatment group. This method of calculating treatment effects is called regression adjustment (RA).

Let's open a dataset and try this using Stata.


. webuse cattaneo2.dta, clear
(Excerpt from Cattaneo (2010) Journal of Econometrics 155: 138-154)

To estimate the POMs in the two treatment groups, we type

. teffects ra (bweight mage) (mbsmoke), pomeans

We specify the outcome model in the first set of parentheses with the outcome variable followed by its covariates. In this example, the outcome variable is bweight and the only covariate is mage.

We specify the treatment model, which is simply the treatment variable, in the second set of parentheses. In this example, we specify only the treatment variable mbsmoke. We will discuss covariates in the next section.

The result of typing the command is


. teffects ra (bweight mage) (mbsmoke), pomeans

Iteration 0:   EE criterion =  7.878e-24
Iteration 1:   EE criterion =  8.468e-26

Treatment-effects estimation                   Number of obs      =      4642
Estimator      : regression adjustment
Outcome model  : linear
Treatment model: none
------------------------------------------------------------------------------
             |               Robust
     bweight |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
POmeans      |
     mbsmoke |
  nonsmoker  |   3409.435   9.294101   366.84   0.000     3391.219    3427.651
     smoker  |   3132.374   20.61936   151.91   0.000     3091.961    3172.787
------------------------------------------------------------------------------

The output reports that the average birthweight would be 3,132 grams if all mothers smoked and 3,409 grams if no mothers smoked.

We can estimate the ATE of smoking on birthweight by subtracting the POMs: 3132.374 - 3409.435 = -277.061. Or we can reissue our teffects ra command with the ate option and get standard errors and confidence intervals:


. teffects ra (bweight mage) (mbsmoke), ate

Iteration 0:   EE criterion =  7.878e-24
Iteration 1:   EE criterion =  5.185e-26

Treatment-effects estimation                   Number of obs      =      4642
Estimator      : regression adjustment
Outcome model  : linear
Treatment model: none
-------------------------------------------------------------------------------
              |               Robust   
      bweight |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
ATE           |        
      mbsmoke |
(smoker vs    |        
  nonsmoker)  |  -277.0611   22.62844   -12.24   0.000    -321.4121   -232.7102
--------------+----------------------------------------------------------------
POmean        |        
      mbsmoke |
   nonsmoker  |   3409.435   9.294101   366.84   0.000     3391.219    3427.651
-------------------------------------------------------------------------------

The output reports the same ATE we calculated by hand: -277.061. The ATE is the average of the differences between the birthweights when each mother smokes and the birthweights when no mother smokes.

We can also estimate the ATET by using the teffects ra command with the atet option, but we will not do so here.
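For reference, that command would look like the one above with atet in place of pomeans (output omitted here):

. teffects ra (bweight mage) (mbsmoke), atet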

IPW: The inverse probability weighting estimator

RA estimators model the outcome to account for the nonrandom treatment assignment. Some researchers prefer to model the treatment assignment process and not specify a model for the outcome.

We know that smokers tend to be older than nonsmokers in our data. We also hypothesize that mother's age directly affects birthweight. We observed this in figure 1, which we show again below.

Figure 1

This figure shows that treatment assignment depends on mother's age. We would like to have a way of adjusting for this dependence. In particular, we wish we had more upper-age green points and lower-age red points. If we did, the mean birthweight for each group would change. We don't know how that would affect the difference in means, but we do know it would be a better estimate of the difference.

To achieve a similar result, we are going to weight smokers in the lower-age range and nonsmokers in the upper-age range more heavily, and weight smokers in the upper-age range and nonsmokers in the lower-age range less heavily.

We will fit a probit or logit model of the form

Pr(woman smokes) = F(a + b*age)

teffects uses logit by default, but we will specify the probit option for illustration.

Once we have fit that model, we can obtain the prediction Pr(woman smokes) for each observation in the data; we will call this pi. Then, in making our POMs calculations, which is just a mean calculation, we will use those probabilities to weight the observations. We will weight observations on smokers by 1/pi so that weights will be large when the probability of being a smoker is small. We will weight observations on nonsmokers by 1/(1-pi) so that weights will be large when the probability of being a nonsmoker is small.

That results in the following graph replacing figure 1:

Figure 5

In figure 5, larger circles indicate larger weights.

To estimate the POMs with this IPW estimator, we can type


. teffects ipw (bweight) (mbsmoke mage, probit), pomeans

The first set of parentheses specifies the outcome model, which is just the outcome variable in this case; there are no covariates. The second set of parentheses specifies the treatment model, which includes the treatment variable (mbsmoke) followed by its covariates (in this case, just mage) and the kind of model (probit).

The result is


. teffects ipw (bweight) (mbsmoke mage, probit), pomeans

Iteration 0:   EE criterion =  3.615e-15
Iteration 1:   EE criterion =  4.381e-25

Treatment-effects estimation                   Number of obs      =      4642
Estimator      : inverse-probability weights
Outcome model  : weighted mean
Treatment model: probit
------------------------------------------------------------------------------
             |               Robust
     bweight |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
POmeans      |
     mbsmoke |
  nonsmoker  |   3408.979   9.307838   366.25   0.000     3390.736    3427.222
     smoker  |   3133.479   20.66762   151.61   0.000     3092.971    3173.986
------------------------------------------------------------------------------

Our output reports that the average birthweight would be 3,133 grams if all the mothers smoked and 3,409 grams if none of the mothers smoked.

This time, the ATE is -275.5, and if we typed


. teffects ipw (bweight) (mbsmoke mage, probit), ate
(Output omitted)

we’d be taught that the usual error is 22.68 and the 95% confidence interval is [-319.9,231.0].

Simply as with teffects ra, if we needed ATET, we may specify the teffects ipw command with the atet possibility.

IPWRA: The IPW with regression adjustment estimator

RA estimators model the outcome to account for the nonrandom treatment assignment. IPW estimators model the treatment to account for the nonrandom treatment assignment. IPWRA estimators model both the outcome and the treatment to account for the nonrandom treatment assignment.

IPWRA uses IPW weights to estimate corrected regression coefficients that are subsequently used to perform regression adjustment.

The covariates in the outcome model and the treatment model do not have to be the same, and they often are not, because the variables that influence a subject's selection of treatment group are often different from the variables associated with the outcome. The IPWRA estimator has the double-robust property, which means that the estimates of the effects will be consistent if either the treatment model or the outcome model, but not both, is misspecified.

Let’s contemplate a state of affairs with extra complicated consequence and therapy fashions however nonetheless utilizing our low-birthweight information.

The result mannequin will embody

  1. mage: the mom’s age
  2. prenatal1: an indicator for prenatal go to throughout the first trimester
  3. mmarried: an indicator for marital standing of the mom
  4. fbaby: an indicator for being first born

The therapy mannequin will embody

  1. all of the covariates of the end result mannequin
  2. mage^2
  3. medu: years of maternal schooling

We will even specify the aequations choice to report the coefficients of the end result and therapy fashions.


. teffects ipwra (bweight mage prenatal1 mmarried fbaby)                ///
                 (mbsmoke mmarried c.mage##c.mage fbaby medu, probit)   ///
                 , pomeans aequations

Iteration 0:   EE criterion =  1.001e-20
Iteration 1:   EE criterion =  1.134e-25

Treatment-effects estimation                   Number of obs      =      4642
Estimator      : IPW regression adjustment
Outcome model  : linear
Treatment model: probit
-------------------------------------------------------------------------------
              |               Robust
      bweight |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
POmeans       |
      mbsmoke |
   nonsmoker  |   3403.336    9.57126   355.58   0.000     3384.576    3422.095
      smoker  |   3173.369   24.86997   127.60   0.000     3124.624    3222.113
--------------+----------------------------------------------------------------
OME0          |
         mage |   2.893051   2.134788     1.36   0.175    -1.291056    7.077158
    prenatal1 |   67.98549   28.78428     2.36   0.018     11.56933    124.4017
     mmarried |   155.5893   26.46903     5.88   0.000      103.711    207.4677
        fbaby |   -71.9215   20.39317    -3.53   0.000    -111.8914   -31.95162
        _cons |   3194.808   55.04911    58.04   0.000     3086.913    3302.702
--------------+----------------------------------------------------------------
OME1          |
         mage |  -5.068833   5.954425    -0.85   0.395    -16.73929    6.601626
    prenatal1 |   34.76923   43.18534     0.81   0.421    -49.87248    119.4109
     mmarried |   124.0941   40.29775     3.08   0.002     45.11193    203.0762
        fbaby |   39.89692   56.82072     0.70   0.483    -71.46966    151.2635
        _cons |   3175.551   153.8312    20.64   0.000     2874.047    3477.054
--------------+----------------------------------------------------------------
TME1          |
     mmarried |  -.6484821   .0554173   -11.70   0.000     -.757098   -.5398663
         mage |   .1744327   .0363718     4.80   0.000     .1031452    .2457202
              |
c.mage#c.mage |  -.0032559   .0006678    -4.88   0.000    -.0045647   -.0019471
              |
        fbaby |  -.2175962   .0495604    -4.39   0.000    -.3147328   -.1204595
         medu |  -.0863631   .0100148    -8.62   0.000    -.1059917   -.0667345
        _cons |  -1.558255   .4639691    -3.36   0.001    -2.467618   -.6488926
-------------------------------------------------------------------------------

The POmeans section of the output displays the POMs for the two treatment groups. The ATE is now calculated to be 3173.369 - 3403.336 = -229.967.

The OME0 and OME1 sections display the RA coefficients for the untreated and treated groups, respectively.

The TME1 section of the output displays the coefficients for the probit treatment model.

Just as in the two previous cases, if we wanted the ATE with standard errors, etc., we would specify the ate option. If we wanted the ATET, we would specify the atet option.
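For example, the ATE version of the command would be (a sketch; output omitted):

. teffects ipwra (bweight mage prenatal1 mmarried fbaby)                ///
                 (mbsmoke mmarried c.mage##c.mage fbaby medu, probit), ate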

AIPW: The augmented IPW estimator

IPWRA estimators model both the outcome and the treatment to account for the nonrandom treatment assignment. So do AIPW estimators.

The AIPW estimator adds a bias-correction term to the IPW estimator. If the treatment model is correctly specified, the bias-correction term is 0 and the estimator reduces to the IPW estimator. If the treatment model is misspecified but the outcome model is correctly specified, the bias-correction term corrects the estimator. Thus, the bias-correction term gives the AIPW estimator the same double-robust property as the IPWRA estimator.

The syntax and output for the AIPW estimator are almost identical to those for the IPWRA estimator.


. teffects aipw (bweight mage prenatal1 mmarried fbaby)                 ///
                (mbsmoke mmarried c.mage##c.mage fbaby medu, probit)    ///
                , pomeans aequations

Iteration 0:   EE criterion =  4.632e-21
Iteration 1:   EE criterion =  5.810e-26

Treatment-effects estimation                   Number of obs      =      4642
Estimator      : augmented IPW
Outcome model  : linear by ML
Treatment model: probit
-------------------------------------------------------------------------------
              |               Robust
      bweight |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
POmeans       |
      mbsmoke |
   nonsmoker  |   3403.355   9.568472   355.68   0.000     3384.601    3422.109
      smoker  |   3172.366   24.42456   129.88   0.000     3124.495    3220.237
--------------+----------------------------------------------------------------
OME0          |
         mage |   2.546828   2.084324     1.22   0.222    -1.538373    6.632028
    prenatal1 |   64.40859   27.52699     2.34   0.019     10.45669    118.3605
     mmarried |   160.9513    26.6162     6.05   0.000     108.7845    213.1181
        fbaby |   -71.3286   19.64701    -3.63   0.000     -109.836   -32.82117
        _cons |   3202.746   54.01082    59.30   0.000     3096.886    3308.605
--------------+----------------------------------------------------------------
OME1          |
         mage |  -7.370881    4.21817    -1.75   0.081    -15.63834    .8965804
    prenatal1 |   25.11133   40.37541     0.62   0.534    -54.02302    104.2457
     mmarried |   133.6617   40.86443     3.27   0.001      53.5689    213.7545
        fbaby |   41.43991   39.70712     1.04   0.297    -36.38461    119.2644
        _cons |   3227.169   104.4059    30.91   0.000     3022.537    3431.801
--------------+----------------------------------------------------------------
TME1          |
     mmarried |  -.6484821   .0554173   -11.70   0.000     -.757098   -.5398663
         mage |   .1744327   .0363718     4.80   0.000     .1031452    .2457202
              |
c.mage#c.mage |  -.0032559   .0006678    -4.88   0.000    -.0045647   -.0019471
              |
        fbaby |  -.2175962   .0495604    -4.39   0.000    -.3147328   -.1204595
         medu |  -.0863631   .0100148    -8.62   0.000    -.1059917   -.0667345
        _cons |  -1.558255   .4639691    -3.36   0.001    -2.467618   -.6488926
-------------------------------------------------------------------------------

The ATE is 3172.366 – 3403.355 = -230.989.
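As before, specifying the ate option instead of pomeans would report this difference along with a standard error and confidence interval (a sketch; output omitted):

. teffects aipw (bweight mage prenatal1 mmarried fbaby)                 ///
                (mbsmoke mmarried c.mage##c.mage fbaby medu, probit), ate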

Final thoughts

The example above used a continuous outcome: birthweight. teffects can also be used with binary, count, and nonnegative continuous outcomes.

The estimators also allow multiple treatment categories.

An entire manual is devoted to the treatment-effects features in Stata 13, and it includes a basic introduction, advanced discussion, and worked examples. If you would like to learn more, you can download the [TE] Treatment-Effects Reference Manual from the Stata website.

More to come

Next time, in part 2, we will cover the matching estimators.

Reference

Cattaneo, M. D. 2010. Efficient semiparametric estimation of multi-valued treatment effects under ignorability. Journal of Econometrics 155: 138–154.



4 Reasons That Make Tailwind Great for Building Layouts



When I talk about layouts, I'm referring to how you position items on a page. The CSS properties that are widely used here include:

  • display (usually grid or flex these days)
  • margin
  • padding
  • width
  • height
  • position
  • top, left, bottom, right

I usually include border-width as a minor item on this list as well.

At this point, there's only one thing I'd like to say.

Tailwind is really great for making layouts.

There are many reasons why.

First: Layout styles are highly dependent on the HTML structure

When we shift layouts into CSS, we lose the mental structure, and it takes effort to re-establish it. Consider the following three-column grid in HTML and CSS:

.grid {
  display: grid;
  grid-template-columns: repeat(3, 1fr);

  .grid-item:first-child {
    grid-column: span 2;
  }

  .grid-item:last-child {
    grid-column: span 1;
  }
}

Now cover up the HTML structure and just read the CSS. As you do that, notice that you need to exert effort to imagine the HTML structure this applies to.

Now consider the same thing, but built with Tailwind utilities:
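The markup here would look something like this (reconstructed from the grid-cols-3 and col-span-2 utilities discussed below):

<div class="grid grid-cols-3">
  <div class="col-span-2">First item</div>
  <div>Second item</div>
</div>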

You can almost see the layout manifest before your eyes without seeing the actual output. It's quite clear: a three-column grid, where the first item spans two columns and the second spans one.

But grid-cols-3 and col-span-2 are kind of weird and foreign-looking, because we're trying to parse Tailwind's way of writing CSS.

Now, watch what happens when we shift the syntax out of the way and use CSS variables to define the layout instead. The layout becomes crystal clear immediately:
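A sketch of what the variable-based markup might look like, based on the --cols variable and the .grid-simple utility used later in this article (the --span variable is an assumption):

```html
<div class="grid-simple" style="--cols: 3">
  <div style="--span: 2">...</div>
  <div style="--span: 1">...</div>
</div>
```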

Two blue rectangles side-by-side illustrating a two-column layout where the left column is twice the width of the right column.

The same three-column layout.

But it makes the layout much easier to write, read, and visualize. It also has other benefits, but I'll let you explore its documentation instead of explaining it here.

For now, let's move on.

Why not use 2fr 1fr?

It makes sense to write 2fr 1fr for a three-column grid, doesn't it?

.grid {
  display: grid;
  grid-template-columns: 2fr 1fr;
}

Unfortunately, it won't work. This is because fr is calculated based on the available space after subtracting the grid's gutters (or gap).

Since 2fr 1fr only contains two columns, the output from 2fr 1fr will be different from a standard three-column grid.
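A quick back-of-envelope calculation makes the difference concrete. The container width (600px) and gap (30px) below are assumed numbers, not from the article:

```python
# fr units resolve against the space left after gaps are subtracted, so a
# two-column 2fr 1fr track list (one gap) and a three-column grid with a
# spanned item (two gaps) produce different widths.

def fr_widths(container, gap, fractions):
    """Resolve fr units: subtract gaps first, then split the rest by fraction."""
    available = container - gap * (len(fractions) - 1)
    unit = available / sum(fractions)
    return [f * unit for f in fractions]

# grid-template-columns: 2fr 1fr  ->  one gap
print(fr_widths(600, 30, [2, 1]))          # [380.0, 190.0]

# repeat(3, 1fr) with the first item spanning two columns  ->  two gaps
col = fr_widths(600, 30, [1, 1, 1])[0]     # 180.0 per column
print([col * 2 + 30, col])                 # [390.0, 180.0]
```

The spanned item ends up 390px wide, not 380px, because it absorbs one of the two gaps.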

Three examples of multi-column layouts stacked. The first is an equal three-column layout, the second and third are two columns where the left column is double the width of the right column.

Alright. Let's continue with the reasons that make Tailwind great for building layouts.

Second: No need to name layouts

I think layouts are the hardest things to name. I rarely come up with better names than:

  • Number + columns, e.g. .two-columns
  • Semantic names, e.g. .content-sidebar

But these names don't do the layout justice. You can't really tell what's going on, even if you see .two-columns, because .two-columns can mean a variety of things:

  • Two equal columns
  • Two columns with 1fr auto
  • Two columns with auto 1fr
  • Two columns that span a total of seven "columns", where the first item takes up four columns and the second takes up three…

You can already see me tripping up when I try to explain that last one…

Instead of forcing ourselves to name the layout, we can let the numbers do the talking; the whole structure then becomes very clear.

The variables paint a picture.

Example of a seven-column layout above a two-column layout with equally-sized columns.

Third: Layout requirements can change depending on context

A "two-column" layout might need different properties when used in different contexts. Here's an example.

Two two-by-two layouts next to each other. In both cases, the third item wraps to the second line, followed by the fourth item.

In this example, you can see that:

  • A larger gap is used between the I and J groups.
  • A smaller gap is used within the I and J groups.

The difference in gap sizes is subtle, but it shows that the items belong to separate groups.

Here's an example where this concept is used in a real project. You can see the difference between the gap used within the newsletter container and the gap used between the newsletter and quote containers.

A two-column layout for a newsletter signup component, with the form as the left column, which is wider than the right column containing content.

If this kind of layout is only used in one place, we don't need to create a modifier class just to change the gap value. We can change it directly.

Another common example

Let's say you have a heading for a marketing section. The heading would look nicer if you can vary its max-width so the text isn't orphaned.

text-balance might work here, but this is often nicer with manual positioning.

Without Tailwind, you might write an inline style for it.

Your subscription has been confirmed

With Tailwind, you can specify the max-width in a more terse way:

Your subscription has been confirmed

A centered heading in black that says Your subscription has been confirmed.

Fourth: Responsive variants can be created on the fly

"At which breakpoint would you change your layouts?" is another factor you'd want to consider when designing your layouts. I'll call this the responsive factor for this section.

Most likely, similar layouts should have the same responsive factor. In that case, it makes sense to group the layouts together into a named layout.

.two-column {
  @apply grid-simple;
  /* --cols: 1 is the default */

  @media (width >= 800px) {
    --cols: 2;
  }
}

However, you may have layouts where you want two-column grids on mobile and a much larger column count on tablets and desktops. This layout style is often used in a site's footer section.

Since the footer grid is unique, we can add Tailwind's responsive variants and change the layout on the fly.
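Illustratively (the exact class names and column counts are assumptions), that might look like:

```html
<footer class="grid grid-cols-2 lg:grid-cols-5">
  <!-- footer link groups -->
</footer>
```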

Example of a footer that adapts to the screen size. It goes from a two-column layout on small screens to a five-column layout on wider screens.

Again, we get to create a new layout on the fly without creating an additional modifier class, which keeps our CSS clean and focused.

How best to use Tailwind

This article is a sample lesson from my course, Unorthodox Tailwind, where I show you how to use Tailwind and CSS synergistically.

Personally, I think the best way to use Tailwind is not to litter your HTML with Tailwind utilities, but to create utilities that let you build layouts and styles easily.

I cover much more of that in the course if you're interested in finding out more!

Bayesian Thinking for People Who Hated Statistics

0


A lecture hall, Tuesday morning. The professor uncaps a marker and writes across the whiteboard: P(A|B) = P(B|A) · P(A) / P(B). Your hand copies the formula. Your brain checks out somewhere around the vertical bar.

If that memory just surfaced, you're in good company. Research suggests up to 80% of college students experience some form of statistics anxiety. For many, it's the strongest predictor of their course grade (stronger than prior math ability, according to a University of Kansas study).

Here's what most statistics courses never mention: you've been doing Bayesian reasoning since childhood. The formula on the whiteboard wasn't teaching you something new. It was burying something you already understood beneath a pile of notation.


The Problem That Broke 82% of Doctors

Try this before reading further.

One percent of women aged 40 who participate in routine screening have breast cancer. A mammogram correctly identifies cancer 80% of the time. It also produces a false alarm 9.6% of the time, flagging cancer when none exists.

A woman gets a positive mammogram. What's the probability she actually has cancer?

Take a moment.

In 1978, researchers at Harvard Medical School posed a similar base-rate problem to 60 physicians and medical students. Only 18% arrived at the correct answer. Nearly half guessed 95%.

The actual answer for the mammogram problem: 7.8%.

The trick is to count instead of calculate. Take 10,000 women:

  • 100 have cancer (that's 1%)
  • Of those 100, 80 test positive (80% sensitivity)
  • Of the 9,900 cancer-free women, about 950 get a false positive (9.6%)

Total positive mammograms: 80 + 950 = 1,030.

Women who actually have cancer among the positives: 80.

Probability: 80 ÷ 1,030 = 7.8%.

The false positives from the large healthy group swamp the true positives from the small cancer group. Image by the author.

No Greek letters required. Just counting.

In Python, it's four lines:

prior = 0.01           # 1% base rate
sensitivity = 0.80     # P(positive | cancer)
false_pos = 0.096      # P(positive | no cancer)

posterior = (sensitivity * prior) / (
    sensitivity * prior + false_pos * (1 - prior)
)
print(f"{posterior:.1%}")  # 7.8%

German psychologist Gerd Gigerenzer spent decades studying this exact failure. When he and Ulrich Hoffrage rewrote probability problems using natural frequencies (counting real people instead of juggling percentages), correct responses among naive participants jumped from the single digits to nearly 50%. Same math, different representation. The bottleneck was never intelligence. It was the format.


You've Been Bayesian Your Whole Life

You do this calculation unconsciously every day.

Your friend recommends a restaurant. "Best pad thai in the city," she says. You open Google Maps: 4.2 stars, 1,200 reviews. Your prior (she knows Thai food, she's been right before) meets the evidence (solid but not stellar reviews from strangers). Your updated belief: probably good, worth trying. You go.

That's Bayes' theorem in three seconds. Prior belief + new evidence = updated belief.

A noise at 3 AM. Your prior: the cat knocked something over (this happens twice a week). The evidence: it sounds like glass shattering, not a soft thud. Your posterior shifts. You get up to check. If you find the cat standing next to a broken vase, whiskers twitching, your belief updates again. Prior confirmed. Back to sleep.

You check the weather app: 40% chance of rain. You look outside at a blue sky with no clouds on the horizon. Your internal model disagrees with the app. You grab a light jacket but leave the umbrella.

You get an email from your CEO asking you to buy gift cards. Your prior: she has never made a request like this before. The evidence: the email came from a Gmail address, the grammar feels off, the tone is wrong. Your posterior: almost certainly phishing. You don't click.

None of these feel like statistics. They feel like common sense. That's the point.

The formula on the whiteboard was just notation for what your brain does between sensing a problem and making a decision.

The perceived gap between "statistics" and "common sense" is an artifact of how statistics is taught. Start with the formula and you get confusion. Start with the intuition and the formula writes itself.


Why Your Statistics Course Got It Backwards

This isn't a fringe critique. The statistics establishment itself has started saying it out loud.

In 2016, the American Statistical Association (ASA) released its first formal guidance on a specific statistical method in its 177 years of existence. The target: p-value misuse. Among the six principles: p-values don't measure the probability that a hypothesis is true, and the 0.05 significance threshold is "conventional and arbitrary."

Three years later, 854 scientists signed a Nature commentary titled "Scientists Rise Up Against Statistical Significance." The same issue of The American Statistician carried 43 papers on what comes after p < 0.05.

Image by the author.

The core structural problem, as biostatistician Frank Harrell at Vanderbilt describes it: frequentist statistics asks "how strange is my data, assuming nothing interesting is happening?" That's P(data | hypothesis). What you actually want is: "given this data, how likely is my hypothesis?" That's P(hypothesis | data).

These are not the same question. Confusing them is what mathematician Aubrey Clayton calls "Bernoulli's Fallacy," an error he traces to a specific mistake by Jacob Bernoulli in the 18th century that has been baked into curricula ever since.

How deep does this confusion go? A 2022 study found that 73% of statistics methodology instructors (not students, instructors) endorsed the most common misinterpretation of p-values, treating them as P(hypothesis | data).

"P-values condition on what is unknown and don't condition on what is known. They are backward probabilities."

Frank Harrell, Vanderbilt University


Bayes in Five Minutes, No Formulas

Every Bayesian calculation has exactly three parts.

The Prior. What you believed before seeing any evidence. In the mammogram problem, it's the 1% base rate. In the restaurant decision, it's your friend's track record. Priors aren't guesses; they can incorporate decades of data. They're your starting place.

The Likelihood. How likely is the evidence you observed, under each possible state of reality? If cancer is present, how likely is a positive test? (80%.) If absent, how likely? (9.6%.) The ratio of these two numbers (80 ÷ 9.6 ≈ 8.3) is the likelihood ratio. It measures the diagnostic power of the evidence: how much should this evidence move your belief?

The Posterior. Your updated belief after combining the prior with the evidence. This is what you care about. In the mammogram case: 7.8%.

That's the whole framework. Prior × Likelihood = Posterior (after normalizing). The formula P(A|B) = P(B|A) · P(A) / P(B) is just shorthand for "update what you believed based on what you just learned."

One critical rule: a strong prior needs strong evidence to move. If you're 95% sure your deployment is stable and a single noisy alert fires, your posterior barely budges. But if three independent monitoring systems all flag the same service at 3 AM, the evidence overwhelms the prior. Your belief shifts fast. This is why patterns matter more than single data points, and why accumulating evidence is more powerful than any single test.
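This rule can be sketched in the odds form of Bayes' theorem. The numbers here (a 5% prior that something is broken, and a likelihood ratio of 5 per independent alert) are illustrative assumptions, not from the article:

```python
# Posterior odds = prior odds x likelihood ratio. A strong prior moves only
# modestly on one piece of evidence, but independent pieces multiply together.

def update(prob, likelihood_ratio):
    """One Bayesian update in odds form, returning a probability."""
    odds = prob / (1 - prob) * likelihood_ratio
    return odds / (1 + odds)

p = 0.05                         # prior: 95% sure the deployment is stable

print(f"{update(p, 5):.0%}")     # one alert: 21% -- noticeable, not decisive

belief = p
for _ in range(3):               # three independent systems flag the service
    belief = update(belief, 5)
print(f"{belief:.0%}")           # 87% -- the evidence overwhelms the prior
```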


The PRIOR Framework: Bayesian Reasoning at Work

Here's a five-step process you can apply at your desk on Monday morning. No statistical software required.

P: Pin Your Prior

Before any data, write down what you believe and why. Force a number. "I think there's a 60% chance the conversion drop is caused by the new checkout flow." This prevents anchoring to whatever the data shows first.

Worked example: Your team's A/B test reports a 12% lift in sign-ups. Before interpreting it, ask: what was your prior? If nine out of ten similar experiments at your company produced lifts below 5%, a 12% result deserves scrutiny, not celebration. Your prior says large effects are rare here.

R: Rate the Evidence

Ask two questions:

  • If my belief is correct, how likely is this evidence?
  • If my belief is wrong, how likely is this evidence?

The ratio matters more than either number alone. A ratio near 1 means the evidence is equally consistent with both explanations (it's weak, barely worth updating on). A ratio of 8:1 or higher means the evidence strongly favors one side. Move your belief accordingly.

I: Invert the Question

Before concluding anything, check: am I answering the question I care about? "What's the probability of seeing this data if my hypothesis were true" is not "what's the probability my hypothesis is true given this data." The first is a p-value. The second is what you want. Confusing them is the single most common statistical error in published research.

O: Output Your Updated Belief

Combine prior and evidence. Strong evidence with a high likelihood ratio shifts your belief considerably. Ambiguous evidence barely touches it. State the result explicitly: "I now estimate a 35% chance this effect is real, down from 60%."

You don't need exact numbers. Even rough categories (unlikely, plausible, likely, near-certain) beat binary thinking (significant vs. not significant).

R: Rinse and Repeat

Your posterior today becomes tomorrow's prior. Run a follow-up experiment. Check a different cut of the data. Every piece of evidence refines the picture. The discipline: never throw away your accumulated knowledge and start from scratch with every new dataset.
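The loop can be sketched directly; the 60% prior and the likelihood ratios below are invented for illustration (2.0 supports the belief, 0.5 counts against it):

```python
# Today's posterior is tomorrow's prior: feed each new likelihood ratio into
# the belief produced by the previous update.

def update(prob, lr):
    odds = prob / (1 - prob) * lr
    return odds / (1 + odds)

belief = 0.60                       # pinned prior
for lr in [2.0, 0.5, 2.0]:          # three rounds of evidence, rated as ratios
    belief = update(belief, lr)
    print(f"after LR {lr}: {belief:.0%}")

# Prints 75%, 60%, 75%: supporting and opposing evidence of equal strength
# cancel out, and nothing accumulated is ever thrown away.
```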

Image by the author.

From Spam Filters to Sunken Submarines

Bayesian reasoning isn't just a thinking tool. It runs in production systems processing billions of decisions.

Spam filtering. In August 2002, Paul Graham published "A Plan for Spam," introducing Bayesian classification for email. The system assigned each word a probability of appearing in spam versus legitimate mail (the likelihood), combined it with the base rate of spam (the prior), and computed a posterior for each message. Graham's filter caught spam at a 99.5% rate with zero false positives on his personal corpus. Every major email provider now uses some descendant of this approach.
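A toy version of the idea (not Graham's exact algorithm, and with made-up word probabilities) shows the mechanics:

```python
# Naive-Bayes spam scoring: multiply per-word likelihoods under "spam" and
# "legitimate", weight by the base rate, and normalize into a posterior.

P_SPAM = 0.4                        # assumed base rate of spam

WORD_PROBS = {                      # (P(word | spam), P(word | ham)), invented
    "viagra":  (0.90, 0.01),
    "free":    (0.50, 0.10),
    "meeting": (0.02, 0.30),
}

def spam_score(words):
    num = P_SPAM                    # running prior x likelihood for spam
    den = 1 - P_SPAM                # same for legitimate mail
    for w in words:
        if w in WORD_PROBS:
            ps, ph = WORD_PROBS[w]
            num *= ps
            den *= ph
    return num / (num + den)        # posterior P(spam | words)

print(f"{spam_score(['free', 'viagra']):.1%}")   # 99.7%
print(f"{spam_score(['meeting']):.1%}")          # 4.3%
```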

Hyperparameter tuning. Bayesian optimization has replaced grid search at companies running expensive training jobs. Instead of exhaustively testing every combination of settings, it builds a probabilistic model of which configurations will perform well (the prior), evaluates the most promising candidate, observes the result, and updates (the posterior). Each iteration makes a smarter choice. For a model that takes hours to train, this can cut tuning time from weeks to days.

Uncertainty quantification. Probabilistic programming frameworks like PyMC and Stan build models that output full probability distributions instead of single numbers. Rather than "the coefficient is 0.42," you get "the coefficient falls between 0.35 and 0.49 with 95% probability." This is a Bayesian credible interval. Unlike a frequentist confidence interval, it actually means what most people think a confidence interval means: there's a 95% chance the true value is in that range.

But the most dramatic Bayesian success story involves a nuclear submarine at the bottom of the Atlantic.

In May 1968, the USS Scorpion failed to arrive at its home port in Norfolk, Virginia. Ninety-nine men were aboard. The Navy knew the sub was somewhere in the Atlantic, but the search area spanned thousands of square miles of deep ocean floor.

Mathematician John Craven took a different approach than grid-searching the ocean. He assembled experts and had them assign probabilities to nine failure scenarios (hull implosion, torpedo malfunction, navigation error). He divided the search area into grid squares and assigned each a prior probability based on the combined estimates.

Then the search began. Every time a team cleared a grid square and found nothing, Craven updated the posteriors. Empty square 47? Probability mass shifted to the remaining squares. Each failed search was not wasted effort. It was evidence, systematically narrowing the possibilities.

Every grid square that turned up empty wasn't a failure. It was data.

The method pinpointed the Scorpion within 220 yards of the predicted location, on the ocean floor at 10,000 feet. The same Bayesian search technique later located a hydrogen bomb lost after a 1966 B-52 crash near Palomares, Spain, and helped find the wreckage of Air France Flight 447 in the deep Atlantic in 2011.
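The update Craven ran after every empty square can be sketched on a toy grid. The priors and the 70% detection probability are invented for illustration:

```python
# Bayesian search theory: searching a square and finding nothing lowers that
# square's probability (the wreck might have been there but missed), and
# renormalizing shifts the mass to the unsearched squares.

DETECT = 0.7                        # P(find wreck | searching the right square)

def search_fails(belief, square):
    post = dict(belief)
    post[square] *= 1 - DETECT      # evidence: we looked here, saw nothing
    total = sum(post.values())
    return {sq: p / total for sq, p in post.items()}

prior = {"A": 0.5, "B": 0.3, "C": 0.2}
belief = search_fails(prior, "A")   # square A comes up empty
print({sq: round(p, 2) for sq, p in belief.items()})
# {'A': 0.23, 'B': 0.46, 'C': 0.31} -- mass shifts toward B and C
```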


Return to the mammogram problem for a moment.

The reason 82% of doctors got it wrong wasn't arithmetic. It was that nobody taught them to ask the one question that matters: how common is this condition in the population being tested?

That question (the prior) is the most neglected step in data interpretation. Skip it, and you mistake a false alarm for a diagnosis, a noisy experiment for a real effect, a coincidence for a pattern.

Every statistic you encounter this week is a mammogram result. The headline claiming a drug "doubles your risk." The A/B test with p = 0.03. The performance review based on a single quarter of data.

Each is evidence. None is a conclusion.

The conclusion requires what you've always had: what you knew before you saw the number. Your statistics professor just never gave you permission to use it.


References

  1. Casscells, W., Schoenberger, A., & Graboys, T.B. (1978). "Interpretation by Physicians of Clinical Laboratory Results." New England Journal of Medicine, 299(18), 999-1001.
  2. Gigerenzer, G. & Hoffrage, U. (1995). "How to Improve Bayesian Reasoning Without Instruction: Frequency Formats." Psychological Review, 102, 684-704.
  3. American Statistical Association (2016). "The ASA Statement on Statistical Significance and P-Values." The American Statistician, 70(2), 129-133.
  4. Amrhein, V., Greenland, S., & McShane, B. (2019). "Scientists Rise Up Against Statistical Significance." Nature, 567, 305-307.
  5. Open Science Collaboration (2015). "Estimating the Reproducibility of Psychological Science." Science, 349(6251), aac4716.
  6. Graham, P. (2002). "A Plan for Spam."
  7. Harrell, F. (2017). "My Journey from Frequentist to Bayesian Statistics." Statistical Thinking.
  8. Clayton, A. (2021). Bernoulli's Fallacy: Statistical Illogic and the Crisis of Modern Science. Columbia University Press.
  9. Badenes-Ribera, L., et al. (2022). "Persistent Misconceptions About P-Values Among Academic Psychologists." PMC.
  10. Azad, K. "An Intuitive (and Short) Explanation of Bayes' Theorem." BetterExplained.
  11. Wikipedia contributors. "Bayesian Search Theory." Wikipedia.

Open VSX extensions hijacked: GlassWorm malware spreads through dependency abuse

0

The latest wave also mimics widely used developer tools to maximize installation chances. "The extensions overwhelmingly impersonate widely installed developer utilities: linters and formatters like ESLint and Prettier, code runners, popular language tooling for Angular, Flutter, Python, and Vue, and common quality-of-life extensions like vscode-icons, WakaTime, and Better Comments," the researchers said. "Notably, the campaign also targets AI developer tooling, with extensions targeting Claude Code, Codex, and Antigravity."

The researchers added that, as of March 13, Open VSX has removed the majority of the transitively malicious extensions, but a few remain live, indicating ongoing takedowns.

Socket published indicators of compromise (IOCs) tied to the campaign, including the names of dozens of malicious Open VSX extensions and associated publisher accounts believed to be linked to the operation. Additionally, the researchers recommend treating extension dependencies with the same scrutiny typically applied to software packages. Organizations should monitor extension updates, audit dependency relationships, and restrict installation to trusted publishers where possible, as attackers increasingly exploit the developer-tooling ecosystem as a supply-chain entry point.

Moonshot AI Releases Attention Residuals to Replace Fixed Residual Mixing with Depth-Wise Attention for Better Scaling in Transformers


Residual connections are one of the least questioned parts of modern Transformer design. In PreNorm architectures, each layer adds its output back into a running hidden state, which keeps optimization stable and allows deep models to train. Moonshot AI researchers argue that this standard mechanism also introduces a structural problem: all prior layer outputs are accumulated with fixed unit weights, which causes the hidden-state magnitude to grow with depth and progressively weakens the contribution of any single layer.

The research team proposes Attention Residuals (AttnRes) as a drop-in replacement for standard residual accumulation. Instead of forcing every layer to consume the same uniformly mixed residual stream, AttnRes lets each layer aggregate earlier representations using softmax attention over depth. The input to layer l is a weighted sum of the token embedding and previous layer outputs, where the weights are computed over prior depth positions rather than over sequence positions. The core idea is simple: if attention improved sequence modeling by replacing fixed recurrence over time, the same idea can be applied to the depth dimension of a network.

https://github.com/MoonshotAI/Attention-Residuals/tree/master?tab=readme-ov-file

Why Standard Residuals Become a Bottleneck

The research team identified three issues with standard residual accumulation. First, there is no selective access: all layers receive the same aggregated state even though attention layers and feed-forward or MoE layers may benefit from different mixtures of earlier information. Second, there is irreversible loss: once information is merged into a single residual stream, later layers cannot selectively recover specific earlier representations. Third, there is output growth: deeper layers tend to produce larger outputs to remain influential within an ever-growing accumulated state, which can destabilize training.

This is the research team's main framing: standard residuals behave like a compressed recurrence over layers. AttnRes replaces that fixed recurrence with explicit attention over earlier layer outputs.

Full AttnRes: Attention Over All Previous Layers

In Full AttnRes, each layer computes attention weights over all preceding depth sources. The default design does not use an input-conditioned query. Instead, each layer has a learned layer-specific pseudo-query vector wl ∈ Rd, while keys and values come from the token embedding and previous layer outputs after RMSNorm. The RMSNorm step is important because it prevents large-magnitude layer outputs from dominating the depth-wise attention weights.

Full AttnRes is straightforward, but it increases cost. Per token, it requires O(L²d) arithmetic and O(Ld) memory to store layer outputs. In standard training this memory largely overlaps with activations already needed for backpropagation, but under activation recomputation and pipeline parallelism the overhead becomes more significant, because these earlier outputs must remain available and may need to be transmitted across stages.
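As a rough sketch of the mechanism (not Moonshot's implementation; the shapes, zero initialization, and NumPy framing are assumptions), depth-wise attention for one token might look like:

```python
import numpy as np

rng = np.random.default_rng(0)
d, L = 8, 4                          # hidden size, number of previous layers

def rmsnorm(x):
    return x / np.sqrt(np.mean(x**2, axis=-1, keepdims=True) + 1e-6)

# token embedding plus the outputs of L previous layers, for a single token
sources = rng.normal(size=(L + 1, d))

w_l = np.zeros(d)                    # learned pseudo-query; zero-init here
kv = rmsnorm(sources)                # keys/values from RMS-normalized sources

logits = kv @ w_l                    # one score per depth position
weights = np.exp(logits) / np.exp(logits).sum()   # softmax over depth
layer_input = weights @ kv           # aggregated input to the next layer

print(weights)                       # zero query -> uniform 1/(L+1) weights,
                                     # i.e. equal-weight averaging at init
```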

Block AttnRes: A Practical Variant for Large Models

To make the method usable at scale, the Moonshot AI research team introduces Block AttnRes. Instead of attending over every previous layer output, the model partitions the layers into N blocks. Within each block, outputs are accumulated into a single block representation, and attention is applied only over these block-level representations plus the token embedding. This reduces the memory and communication overhead from O(Ld) to O(Nd).

The research team describes cache-based pipeline communication and a two-phase computation strategy that make Block AttnRes practical in distributed training and inference. This results in less than 4% training overhead under pipeline parallelism, while the repository reports less than 2% inference latency overhead on typical workloads.

Scaling Results

The research team evaluates five model sizes and compares three variants at each size: a PreNorm baseline, Full AttnRes, and Block AttnRes with about eight blocks. All variants within each size group share the same hyperparameters chosen under the baseline, which the research team notes makes the comparison conservative. The fitted scaling laws are reported as:

Baseline: L = 1.891 × C^(-0.057)
Block AttnRes: L = 1.870 × C^(-0.058)
Full AttnRes: L = 1.865 × C^(-0.057)

The practical implication is that AttnRes achieves lower validation loss across the tested compute range, and Block AttnRes matches the loss of a baseline trained with about 1.25× more compute.

Integration into Kimi Linear

Moonshot AI also integrates AttnRes into Kimi Linear, its MoE architecture with 48B total parameters and 3B activated parameters, and pre-trains it on 1.4T tokens. According to the research paper, AttnRes mitigates PreNorm dilution by keeping output magnitudes more bounded across depth and distributing gradients more uniformly across layers. Another implementation detail is that all pseudo-query vectors are initialized to zero, so the initial attention weights are uniform across source layers, effectively reducing AttnRes to equal-weight averaging at the start of training and avoiding early instability.

On downstream evaluation, the reported gains are consistent across all listed tasks. The paper reports improvements from 73.5 to 74.6 on MMLU, 36.9 to 44.4 on GPQA-Diamond, 76.3 to 78.0 on BBH, 53.5 to 57.1 on Math, 59.1 to 62.2 on HumanEval, 72.0 to 73.9 on MBPP, 82.0 to 82.9 on CMMLU, and 79.6 to 82.5 on C-Eval.

Key Takeaways

  • Attention Residuals replaces fixed residual accumulation with softmax attention over earlier layers.
  • The default AttnRes design uses a learned layer-specific pseudo-query, not an input-conditioned query.
  • Block AttnRes makes the method practical by reducing depth-wise memory and communication from O(Ld) to O(Nd).
  • The Moonshot research team reports lower scaling loss than the PreNorm baseline, with Block AttnRes matching about 1.25× more baseline compute.
  • In Kimi Linear, AttnRes improves results across reasoning, coding, and evaluation benchmarks with limited overhead.

Check out the Paper and Repo.


Microsoft pulls Samsung app blocking Windows C: drive from Store

0


​Microsoft has eliminated the Samsung Galaxy Join app from the Microsoft Retailer as a result of it was inflicting points on particular Samsung Galaxy Ebook 4 and desktop fashions working Home windows 11.

This comes after the corporate mentioned on Friday that it was investigating studies of app failures and customers shedding entry to their C: drive on some Home windows 11 programs.

“Customers would possibly encounter the error, ‘C: shouldn’t be accessible – Entry denied,’ which prevents entry to recordsdata and blocks the launch of some purposes together with Outlook, Workplace apps, internet browsers, system utilities and Fast Help,” Microsoft defined.

The recognized subject impacts a variety of Samsung Galaxy Ebook 4 and Samsung Desktop fashions working Home windows 11, together with NP750XGJ, NP750XGL, NP754XGJ, NP754XFG, NP754XGK, DM500SGA, DM500TDA, DM500TGA, and DM501SGA.

On affected gadgets, customers have been experiencing issues launching apps, accessing recordsdata, or performing administrative duties, and, in some circumstances, points elevating privileges, uninstalling updates, or gathering logs attributable to permission failures.

​Following a joint investigation with Samsung, Microsoft has attributed these points to the Samsung Galaxy Join app (used for display screen mirroring, file sharing, and knowledge switch between Galaxy gadgets and Home windows PCs) and briefly eliminated it from the Microsoft Retailer.

“The affected Samsung Galaxy Join software was briefly faraway from the Microsoft Retailer to forestall additional installations,” Microsoft mentioned.

“Samsung has republished a secure earlier model of the appliance to cease recurrence on further gadgets. Restoration choices for gadgets already impacted stay restricted, and Samsung continues to judge remediation approaches with Microsoft’s help.”

Microsoft and Samsung have not yet provided a workaround and are still working on a fix for affected Windows 11 devices. Impacted users are advised to contact Samsung for device-specific support.

On Friday, Microsoft also released an out-of-band (OOB) update to fix a security issue in the Routing and Remote Access Service (RRAS) management tool affecting Windows 11 Enterprise devices that receive hotpatch updates instead of regular Patch Tuesday cumulative updates.


A 100-year-old theory could explain what’s wrong with quantum mechanics



Quantum mechanics is both the most powerful theory physicists have ever devised and the most baffling. On the one hand, countless experiments have confirmed its predictions; the theory undergirds modern technology and enables the electronic devices we use every day. On the other hand, quantum mechanics describes an underlying reality that is entirely at odds with the world we perceive. In the quantum realm, a single particle exists in many places at once—at least while no one is looking at it. The theory also allows for inexplicable connectedness: a pair of atoms, no matter how widely separated, can be “entangled,” such that whatever happens to one atom instantaneously affects the other. Albert Einstein called the phenomenon “spooky action at a distance.”

These paradoxes have defined—or plagued—the theory since its inception more than a century ago. To this day, physicists still don’t agree on what quantum mechanics is telling us about the nature of reality. Are there multiple universes? Do things come into existence only when they’re observed? Is consciousness somehow central to the laws of physics? And what if all these mysteries could have been resolved right at the birth of quantum mechanics?

That’s the case that Antony Valentini, a physicist at Imperial College London, makes in his new book Beyond the Quantum: A Quest for the Origin and Hidden Meaning of Quantum Mechanics (Oxford University Press, 2026).




Valentini argues that Louis de Broglie, a French physicist and Nobel laureate, developed a framework for quantum mechanics that eliminated its paradoxes around 100 years ago. In pilot wave theory, as de Broglie’s brainchild is known, particles are guided by attendant waves. The particles themselves are always in one place and one place only; it’s the spatially extended pilot wave that creates the impression that a particle is at once here and there. There’s no need for an observer to conjure the particle into being. Though de Broglie’s 1924 conjecture about the wavelike nature of matter was quickly confirmed by experiment and became integral to quantum theory, the physics community discounted or misrepresented the larger ideas from which he derived his key insights.

Valentini has spent his entire career championing and extending de Broglie’s views. He recently spoke to Scientific American about his lonely path and why de Broglie might have been on to something.

[An edited transcript of the interview follows.]

In the history of science, has there ever been another situation like this, where there were such wildly divergent views about what a theory means?

I’m not sure there has. If you go back to the time of [Isaac] Newton, he thought that space was empty and that there was a direct gravitational action at a distance. And on the continent, there were the Cartesians [followers of mathematician and philosopher René Descartes], who thought, “Oh no, space is filled with this material medium, and that explains gravitational attraction.” But [the debate] didn’t last all that long. Certainly in the quantum case, the sheer variety of interpretations that say such completely different things about the world—I think it’s a pretty safe bet that there’s no analogue in the history of science.

One of the most striking things about modern physics is the stark divide between the macroscopic and quantum worlds, each of which seems to be governed by entirely different physical laws. You liken this to the way medieval astronomers split the cosmos into earthly and celestial regions.

I think it’s a useful and valid parallel, this idea that there was a heavenly realm that we couldn’t understand; anything above the moon and beyond was eternal and unchanging, completely different from the sublunar world, which was made of ordinary, imperfect matter that was always changing. It’s a distinction that goes back to Aristotle. The parallel with quantum mechanics is extraordinary: the quantum system is something that our mind cannot understand. We can only understand the macroscopic one.

Austrian physicist Erwin Schrödinger developed quantum theory’s wave equation, which describes quantum systems as waves that evolve with time. What role did this equation play in the so-called measurement problem: if a particle exists in different places at once, why do measurements find any given particle in only a single location?

Schrödinger created the measurement problem by removing the particles from de Broglie’s theory. Mathematically, a [quantum wave] is a superposition of many different positions: a particle can be here and here and here; it can be anywhere. You can have a superposition of a live cat and a dead cat, or a superposition of different energies. They’re all just different variations on the same theme. The wave equation contains all possible positions. How can you then explain that we see this little pointlike object if the only reality is an extended wave?

And this conundrum was recognized early in the development of quantum theory.

Here’s Wolfgang Pauli writing to Niels Bohr in 1927: “In the last issue of the Journal de Physique, a paper by de Broglie has appeared…. It is very rich in ideas and very sharp, and on a much higher level than the childish papers by Schrödinger, who even today still thinks he could … abolish material points.” And because Schrödinger removed [particles from his equation], we’ve ended up with decades of confusion.

Why do you think de Broglie’s theory was set aside and neglected?

I’m not sure there’s one simple answer. It’s maybe a combination of reasons.

In 1923 de Broglie had developed a new theory of motion. It was a complete break, very different from Newtonian and even Einsteinian physics. And yet this completely passed people by. The only thing that entered the collective consciousness of physicists was that de Broglie had shown that a particle can behave like a wave.

Word of de Broglie’s thesis spread, though hardly anyone actually read it. Einstein did. It was Einstein who really alerted people that de Broglie had done something very important. He encouraged Schrödinger to read it—and he read it. Most other people, it seems, never read de Broglie’s thesis.

And then there’s the sociological point that de Broglie was quite isolated in Paris. De Broglie was a bit of a loner; he worked mostly by himself. At the time, in the 1920s, France was really a backwater in theoretical physics. It was strong in experimental physics, strong in mathematics, but not in theoretical physics.

Has your pursuit of pilot wave theory been a lonely one? Rewarding? Frustrating?

The short answer is all of that and more. Has it been lonely? It’s been this peculiar situation. I’ve really tried to get the key points across to physicists. And it just seems to fall on deaf ears. It’s as if people are stuck on repeat—the same mistaken arguments, the same historical misconceptions just go around and around and around.

When I first came across pilot wave theory, it seemed to me so obvious. Oh my god, pilot wave theory in principle is a wider physics; quantum theory is a special case of something bigger. Pilot wave theory has exciting new physics, and maybe we can find evidence for it.

In your book, you describe how pilot wave theory’s predictions about the physics of matter differ in some cases from the predictions of accepted quantum mechanics. In particular, you mention how the cosmic microwave background (CMB)—the radiation created during the big bang that now permeates the universe—could support some of the predictions of pilot wave theory.

The CMB is a great and promising avenue, and I’ve done a lot of work on that with various collaborators. There are reported anomalies in the CMB that qualitatively match the kind of anomalies that pilot wave theory would predict. There are some tantalizing hints, but the data are just too noisy to draw any firm conclusions. This probably won’t be settled for another 10 years or so.

Is pilot wave theory true? Is it an accurate theory of the world? If I knew it was true, I wouldn’t be researching it. There’s always, in the back of my mind, the thought that this could all be completely wrong! Or it could be that it’s kind of partly right. In the late nineteenth century Ludwig Boltzmann modeled [gas molecules] as little billiard balls—little hard spheres that are bouncing around. It turns out that molecules are much more complicated than that. But still, his model contained a lot of truth. It may be that pilot wave theory is a bit like that, an approximate model.

Let’s Get Caught Up… Again – EpidemioLogical



Sorry that I haven’t kept up with writing on this blog. I feel guilty, if only a little, for posting more consistently on Medium and Substack than here. The two or three of you who have been following me since 2004 deserve more and better content here. My apologies.

Now, let’s get caught up, shall we?

When I last updated you on my shenanigans, I was reeling about how some lives matter, and others don’t. I’ve given this more thought as we’re dealing now with a nascent war in — where else? — the Middle East.

You see, my daughter is now more awake than ever, and she is asking a lot of questions. Many are easy to answer, but other questions… I mean, how do you explain to her why the President of the United States is so seemingly full of anger at people he’s never met? How do you explain to her that there are people who genuinely believe she isn’t an American because I, not she, was not born here?

Yes, the world is a crazy place, and she will come to understand that soon enough. But I’m easing her into that understanding. I’m not just dropping her into the deep end of the pool, so to speak. (Does that ever work, by the way? Do kids learn to swim by being dropped into the deep end? Or do they drown?) So I’m relying a lot on my wife’s understanding of human behavior to explain to the child that, yes, on the average and in the long run, there will always be evil in the world. But, on the average and in the long run, things are getting better for us as a species.

I mean, things are getting better, right? We seem to be progressing, even if we take two or three steps back more often than I think apes with big brains should.

Speaking of big brains, why is it that some people honestly think they’re superior to others? Like, what’s the psychology behind it? Are they compensating for feeling inferior by trying to make others feel inferior? Or do they really feel superior, and they want everyone to know it? Why not just be humble and quiet about it instead of being — for lack of a better term — dicks?

Yes, yes, I’ve acted all high and mighty with people before. I’m not immune to that feeling. But I’ve never even pretended to be smarter than someone who I know is smarter than me on a subject. You don’t see me go up to the Paul Offits or Stanley Plotkins of the world and challenge them on vaccines. I don’t tell my dad how to fix a Dodge transmission.

But people sure do seem comfortable trying to put me in my place when they don’t know what they’re talking about, or they clearly don’t know more about a subject than I do.

I wonder why.

Continuing on the subject of brains, I’ve been undergoing a battery of tests since November because I developed a weird neurological thing. It’s not a disorder. I can do work and other activities of daily living without a problem. It’s just that my balance has been off, and there have been a couple of times in the car where I felt like I was about to pass out.

So I’ve been meeting with doctors, trying different treatments, and things are getting better. I have some bad days, but most days are good. And I’m confident I’m going to get over this. It’s just a matter of being patient.

Then there’s work… And, man, there’s been a lot of work. The current administration running the Executive Branch of the Federal Government, and the designees at the Department of Health and Human Services, have been doing a pretty good job of destroying any trust in public health. Destroying public health itself is going to be harder for them, but that’s not stopping them from trying.

It’s almost as if a certain scion of an American political family got a new toy dog, and he’s decided to break the toy into a million pieces because he’s always hated dogs. (Then again, he probably eats dogs, given his tendency to call himself a “carnivore.”)

Anyway, that’s all for now. You may have noticed I’ve updated the site a little bit. I threw on a new coat of paint and some custom code here and there to make it look a little better. I’m very happy with the result. The best part? I didn’t have to pay thousands of dollars to web developers.

I did it all on my own… with some help from certain AIs. But that is a whole other story.

I hope you’re well. Thanks for reading.

The 2026 Definitive Guide to Running Local LLMs in Production




A comprehensive pillar guide on architecting, deploying, and managing local Large Language Models (LLMs) for enterprise and production use cases in 2026. This article must move beyond “how to install Ollama” and cover the full stack: hardware selection (H100 vs A100 vs RTX 4090 clusters), inference engine selection (vLLM vs TGI vs TensorRT-LLM), and observability pipelines.

Key Sections:
1. **The Business Case:** Privacy, latency, and cost modeling (Cloud vs On-Prem).
2. **Hardware Landscape 2026:** VRAM math, quantization trade-offs (AWQ vs GPTQ vs GGUF), and multi-GPU orchestration.
3. **The Software Stack:** Operating system optimizations, Docker/containerization, and the rise of the “AI OS.”
4. **Inference Engines:** Deep dive into high-throughput serving with vLLM and continuous batching.
5. **Observability:** Metrics that matter (Time to First Token, Tokens Per Second, Queue Depth) using Prometheus/Grafana.

**Internal Linking Strategy:** Link to all 7 supporting articles in this cluster as deep-dive resources. This is the central hub.
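As a rough illustration of the “VRAM math” mentioned in section 2, here is a back-of-the-envelope sketch (not from the guide itself; the model dimensions below are hypothetical examples): weights need roughly parameter count times bytes per parameter, while the KV cache scales with layers, KV heads, head dimension, sequence length, and batch size.

```python
def weights_vram_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate VRAM for model weights alone (ignores runtime overhead)."""
    return params_billions * 1e9 * (bits_per_weight / 8) / 1e9

def kv_cache_vram_gb(layers: int, kv_heads: int, head_dim: int,
                     seq_len: int, batch: int, bits: int = 16) -> float:
    """Approximate KV-cache size: two tensors (K and V) per layer."""
    return 2 * layers * kv_heads * head_dim * seq_len * batch * (bits / 8) / 1e9

# Hypothetical 70B-parameter model quantized to 4 bits (AWQ/GPTQ-style):
print(weights_vram_gb(70, 4))  # -> 35.0 GB for weights alone
# Plus KV cache for an illustrative config: 80 layers, 8 KV heads (GQA),
# head dim 128, 8K context, batch of 4, FP16 cache:
print(round(kv_cache_vram_gb(80, 8, 128, 8192, 4), 1))  # -> 10.7 GB
```

Real deployments add overhead for activations, CUDA context, and fragmentation, so these numbers are a floor, not a budget.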

Continue reading The 2026 Definitive Guide to Running Local LLMs in Production on SitePoint.