Tuesday, June 9, 2026

On-line (one-pass) algorithms


Canonical instance

The pattern variance of a set of numbers is outlined when it comes to the sum of the squared distances from every level to the imply.

So it will appear that you simply first must calculate the imply, then return and compute the squared variations from the imply. And but pattern variance may be computed in a single move by way of the info.

You’ll discover two equal equations in statistics books: the one described above and one other primarily based on the sum of the info factors and the sum of the info factors squared.

s^2 = frac{1}{n(n-1)}left(nsum_{i=1}^n x_i^2 -left(sum_{i=1}^n x_iright)^2right)

Whereas this equation is theoretically appropriate, it’s numerically unstable. Code that straight implements this equation can return a detrimental worth for a amount that’s theoretically optimistic. I’ve seen this occur with actual knowledge, inflicting a program to crash when taking the sq. root of the variance to get the usual deviation.

Nevertheless, there’s an algorithm that computes imply and variance in a single move that’s correct and numerically secure. This algorithm was developed by B. P. Welford in 1962. I talk about Welford’s algorithm and provides code for implementing it right here.

On-line algorithms

Welford’s algorithm is understood in pc science as an “on-line” algorithm. This time period was coined nicely earlier than the Web. For instance, see the paper [1] from 1965.

However in fact now “on-line” means one thing else, and so the technical and colloquial makes use of of “on-line algorithm” have break up. Technical literature makes use of the phrase to explain the sorts of algorithms on this put up. Most individuals would take “on-line algorithm” to imply code that runs on a distant server. You might even see “streaming algorithm” as a up to date technical time period, however I’d nonetheless search on “on-line algorithm” to seek out papers.

Computing greater moments on-line

Welford’s algorithm computes the primary two moments, imply and variance, of a knowledge set on-line. It’s also potential to compute skewness and kurtosis on-line, in addition to greater moments.

On-line regression

Easy linear regression is carefully associated to calculating imply and variance, and it’s potential to compute easy regression coefficients on-line. I’ve some outdated notes on this right here.

This put up was motivated by an e-mail asking me about a number of regression. It’s also potential to compute a number of regression coefficients on-line, however I haven’t carried out this. I discovered a pair references, [2] and [3], however I’ve not learn them. There’s a easy process for 2 predictor variables however I consider issues get a little bit extra difficult with three or extra predictors, requiring a recursive least squares algorithm.

Associated posts

The notion of on-line algorithms is carefully associated to the notion of a fold in practical programming. Listed below are a number of posts on computing issues with folds.

[1] One-Tape, Off-Line Turing Machine Computations by F. C. Hennie. Data and Management. 8, 553-578 (1965). Obtainable right here. On this paper Hennie writes “In an on-line computation the enter knowledge are provided to the machine, one image at a time, at a particular enter terminal. … In an off-line computation the entire enter symbols are written on one of many machine’s tapes previous to the beginning of the computation.

[2] Arthur Albert and Robert W. Sittler, “A Methodology for Computing Least Squares Estimators that Hold Up with the Information,” Journal of the Society for Industrial and Utilized Arithmetic, Sequence A: Management, 3(3), 384–417, 1965. DOI: 10.1137/0303026.

[3] Petre Stoica and Per Ashgren. Actual initialization of the recursive least-squares algorithm. Int. J. Adapt. Management Sign Course of. 2002; 16:219&ndashh;230.

Related Articles

Latest Articles