Ten years of blog posts
A few months ago (26 July 2025, to be exact) was the tenth anniversary of my first blog post. Over that time it turns out I’ve written about 225 blog posts, and an astonishing (to me) 350,000 words. That’s after you take out the code.
Free Range Statistics is an old-fashioned blog, with a single author and very much representing the ideal of a “web log” simply recording things of interest to me. It’s not a full personal blog (I never post about my travel, family life, etc.), but is focused on issues that somehow relate to statistics, ranging from the abstract and methodological through to specific applications of the type “here’s a fun chart of some interesting historical or current data I saw”. It’s strictly non-monetised; open to the world to read for free, and will never make paid endorsements. I’ll go a bit into what’s kept me motivated later, but the spoiler is that, like art, blogging is in my opinion something best done primarily for your own interests and needs, and if anyone else likes it that’s a bonus.
The ten years of blog history hasn’t been an even one, but has had some ebbs and flows. We can see this in this chart of the number of blog posts per month over time.
Code for these charts is at the bottom of the post. Two things worth noting about this one are how I’ve turned the months with zero posts into hollow circles to de-emphasise them visually while still including the zeroes in the modelling; and how I’ve used for the trend line a single continuous model over all years instead of a separate model fit to each year-facet, which would be the easy default but does not really make sense given how time is continuous and all.
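To make those two choices concrete, here is a minimal sketch using made-up monthly counts rather than the real blog data. The data frame `d` and its columns are hypothetical names for illustration only, and the span is chosen arbitrarily; the actual code at the bottom of the post is the authoritative version.

```r
# Synthetic monthly post counts (invented for illustration only)
set.seed(123)
d <- expand.grid(month = 1:12, year = 2016:2019)
d$posts <- rpois(nrow(d), lambda = 2)

# One continuous time axis, so the trend does not restart each January
d$time <- d$year + d$month / 12

# A single loess fit over the whole series, zero months included
mod <- loess(posts ~ time, data = d, span = 0.5)
d$fitted <- predict(mod)

# Zero months get a hollow circle (shape 1), the rest a solid one (shape 19)
d$shape <- ifelse(d$posts == 0, 1, 19)
```

The fitted values can then be drawn as a line within each year facet, and because they come from one model the line joins up smoothly from December to January.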
The low point of post frequency was 2021 and 2022, when life events got in the way. I was very busy in my day job as Chief Data Scientist for Nous Group, and this was also fairly hands-on technical itself, which reduced my motivation to write code out-of-hours to relax. I was also playing a lot of Elite Dangerous in this period, right up until 2024 (when the civil unrest in Noumea led me to drop that cold turkey). Mid 2018 and mid 2022 both saw me change jobs and countries. In 2025 I’ve had health challenges, but these seem to be under control and I’m getting into a better modus vivendi with them.
The last couple of years have seen a gentle but material uptick in my posting frequency, and I think this is going to continue. I’ve got quite a backlog of half-finished posts to write up. These are on topics ranging from synthetic controls, to power and p-values, to lots of empirical stuff on the Pacific.
One thing that’s happened is that the posts have gotten longer and, perhaps, more thorough over time. Certainly they are much more likely to be crafted over weeks or even months (or years in some cases), rather than knocked out in a single Saturday morning as used to be the case. Back when I wrote 45 posts in 2016 (nearly one a week) they were short, very single-topic, with no great level of detail. More recently I’m more inclined to try to thoroughly tease something out, particularly when I am learning for myself or trying to consolidate my understanding of something. A good example would be my recent set of posts on modelling fertility rates, which I had to split into two, one on the substance and one on the grab bag of things I learned on the way.
Here’s a connected scatter plot that lets us see both word count and posting frequency together, with some very crude characterisations of characteristic themes I was writing about at the time:
While one does one’s art for its own sake, there’s no denying it’s interesting to see what other people read in my blog too. I get a modest but steady trickle of around 60 unique visitors and 80-100 pages read a day. That is, modest compared to say Heather Armstrong’s peak numbers of about 300,000 visitors a day at the height of mummy blogging, but quite a few more than I thought I’d get when I set out (which would have been, to be honest, in round numbers, around zero).
At its high point, back when Twitter more or less worked and I wrote more frequently and was doing election forecasts, I guess I got about 70% more traffic than now, but it’s hard to tell, with changing approaches to tracking visitors.
I used to have an automated “most popular” listing but changes in analytics services over the blog’s lifetime degraded this and I’ve pulled it. But from a more ad hoc examination using partial data from some mixed sources (too complicated to talk about here), here are some posts that have been most read recently:
This is interesting and I think is probably showing that some external searches are turning up my blog on basic methodological questions. This must be dominating over social media or RSS feeds pulling in visitors when I publish a new post. I’m pleased to say each of these posts above does indeed have something useful in it, roughly defined as meaning I sometimes go back to them myself to see what I thought. So I hope other people are finding them of some use at the end of their random web search too.
If I had a longer series of analytics data I’m sure my various election-related pages, and time series modelling posts, would be in the genuine top hits. At one point it looked like some of my comparisons of forecasting methods were in the required reading for some courses, they were getting so many hits.
Blog benchmarks
I did some cursory internet research into blog longevity, to see how my 10 years stands up in comparison. ChatGPT first assured me that research said that 60-70% of blogs are abandoned after one year (attributed to Herring et al) and that the median life was 4 months (Mishne and de Rijke) or 50% stopped after one month (same alleged authors).
These all sound plausible! And maybe those authors did find that. But I can’t (with limited time and access, admittedly) find them doing so. Application of intensive interrogation techniques to ChatGPT revealed that these were things that it thought sounded plausible as things these people could have written, rather than that it could actually find real, published papers that contained these numbers.
Really, ChatGPT is like an enthusiastic, immensely well-read but very unreliable research assistant who has had a few drinks, whose outputs should all be prefaced with “I seem to remember reading or hearing somewhere…” and treated with a heap of scepticism.
In terms of real findings I can actually source, some research from back when blogs were cool and before short-form social media really took off found that a quarter of blogs only last one post. Back in 2003, apparently, “the typical blog is written by a teenage girl who uses it twice a month to update her friends and classmates on happenings in her life.” These days, I don’t think such people write blogs or even micro-blogs, but post videos on TikTok or equivalent.
A 2012 study of research blogs (closer in style and motivation to my own than the more personal blogs that make, or made, up the bulk of the blogosphere) found 84% of research blogs published under the author’s own name; 86% in English; and 72% by one or two male authors. So I’m in the majority in these respects.
At around 1,500 words each, my blog posts are much longer than the average of 200-300 words found by Susan Herring and others in a 2004 study.
Much of the research above is dated. Effectively it precedes the rise of video-based influencers. Short-form video (TikTok etc), podcasts, regular video, and short-form text (X, Bluesky, LinkedIn etc) seem to dominate over written blogs these days. I have no interest in producing any of these things except the short-form text / social media sites.
There are still apparently a million or so active blogs, many of them forming a more stable piece of infrastructure beneath the froth of these more modern forms. This is basically how I engage with Bluesky, Mastodon and LinkedIn too, in terms of the relationship with my blog. I write in the blog, and use the social media to publicise that writing.
Why I write my blog
Ten years is a success, I guess. While I couldn’t find a citable source, I am well prepared to believe that most blogs are abandoned after a few months. So what kept me motivated to keep writing for ten years?
My motivations have certainly evolved over time as I settled into a rhythm of writing and publishing posts. Compared to when I set out, I can give a much more accurate picture of why I’m really doing this:
- It helps me exercise and extend my hands-on technical craft, something that doesn’t happen naturally in the managerial roles in my day job, but is still useful for executing those roles even in a directorial and decision-making rather than hands-on capacity.
- I can learn things, with the motivation for extra discipline (I really want to be confident I’m getting some unfamiliar thing right if I’m going to post it) that comes from doing so in public. A number of times I’ve had my course corrected by positive engagements after posting a blog, either on social media or in the comments section.
- Sometimes it’s just fun, and relaxing, to play around with data and code. Particularly when I read something interesting and want to check “wait, is that for real?”. Or when I just want to make a cool animation.
- I can try out stuff we might (or might not) want to use at work but that, for whatever reason, needs me to give it a go myself in a way that doesn’t fit in with my normal work responsibilities but can be drawn on if helpful.
- Sometimes (but not very often) I actually want to make an intervention in the public sphere and communicate some facts and ideas. How important this motivation is has varied over time, and it’s never been particularly important. There were periods when I published election forecasts for New Zealand that had no other equivalent at the time, and some Covid modelling in Australia, where communicating actual issues was an important thing for my blog. But these didn’t (and almost certainly couldn’t possibly, given my energy and interest levels) last. Perhaps the high point blog post that I really wanted people to read was my exposure of Surgisphere, which made me Twitter-famous for a few days and was an important contribution to an investigation by the Guardian and then retraction of an article in the Lancet (surprisingly but gratifyingly quickly).
- My day job is helped by networking, and my interest and skills in data and code are one tool I can use in a small way to do that. I’m certainly not into blogging for fame (or I hope I’d do it differently and better than I am), but I do seek to use my posts in a certain way to broaden and strengthen my professional networks. I publicise my posts on Bluesky and LinkedIn, sometimes Facebook (and until 2024 on Twitter). They are a way of getting myself known to niche audiences, and very occasionally a way to achieve an objective for my day job by publicising something cool we’re doing, positions we’re recruiting for, or an issue we’re concerned about.
Technical stuff about the blog
When I set up my blog I really, really hated the non-data technical stuff about getting it to work, having the fonts right, understanding how domains work, deciding on layout, etc. I had to read quite a few blogs on how to set up blogs, and vowed to myself not to become one of them. So I have relatively few posts on the back end of my blog. But ten years on, there is some (small) potential interest in what works for me, so here is how my blog works under the hood:
- It’s hosted on GitHub Pages but has its own domain name. This (the GitHub part) is free, gives me a lot of control over formatting, and works well with Jekyll.
- I use Jekyll and resisted upgrading to Hugo when it came along. In matters like this, “there is a time for change, which is when it can no longer be resisted”. If it ain’t broke, don’t fix it.
- It’s a Git repository within a repository. The source code is the main one that I work on and has a _working folder with all the R and other technical scripts, and a _posts folder with Markdown or HTML files for the actual posts.
- When I build the site it appears in the _site folder of the source code repository. _site is also a Git repository and, when it’s all good to go, I push that to the https://github.com/ellisp/ellisp.github.io repository on GitHub, which is automatically published on GitHub Pages.
- I write all the Markdown or HTML pages by hand. I use HTML when things get too complicated layout-wise for Markdown (not very often).
- I don’t use RMarkdown or similar for this blog (that is, knitting results in with the code and text) because I prefer to have full, manual control of where I put a code chunk, plot or table. And my creative process is very much “work on the analysis” and then “write it up”, which is well supported by having a separate R script with the analysis and a Markdown or HTML file with the write-up.
- I created and use the frs R package with a few supporting functions, the most important of which is the svg_png() function. It uses the approach described in this post. It helps SVG files look good with Google fonts and work across platforms. It also saves near-identical PNG and SVG versions of images, so I can have PNG fall-backs for browsers that don’t show SVGs (this was a real issue 10 years ago, I don’t know about now).
- There are some things like syntax highlighting, the domain name, and the link to Disqus for the comments section, that involved a bunch of mucking around, and I’m pleased to say I’ve completely forgotten what I had to do.
Yeah, blog to live, don’t live to blog. That’s true in general, but never more so than in thinking about the stuff that makes it possible to blog.
Word count code
Here’s the code that produced the charts shown earlier in this post:
library(tidyverse)
library(stylo)  # for delete.markup
library(glue)
library(ggtext)
library(scales) # for comma(), used in the chart labels below

#---------------Import and process blog posts-------------
blog_names <- list.files("../_posts", full.names = TRUE)
blogs <- tibble()

# read each post in as one long string:
for(i in seq_along(blog_names)){
  blogs[i, "full"] <- paste(readLines(blog_names[i]), collapse = " ")
  blogs[i, "filename"] <- gsub("../_posts/", "", blog_names[i], fixed = TRUE)
}

# strip out the Jekyll R code chunks so code is not counted as words
# (braces are escaped so the regex engine reads them as literal characters):
blogs <- blogs |>
  mutate(no_jekyll = gsub("\\{% highlight R.*?%\\}.*?\\{% endhighlight %\\}", " ", full),
         txt = "")

# delete.markup only works on one string at a time, seems easiest to do it in a loop:
for(i in seq_len(nrow(blogs))){
  blogs[i, ]$txt <- delete.markup(blogs[i, ]$no_jekyll, markup.type = "html")
}

# a few more basic stats per blog post:
blogs <- blogs |>
  mutate(word_count = stringi::stri_count_words(txt),
         word_count_with_tags = stringi::stri_count_words(no_jekyll),
         date = as.Date(str_extract(filename, "^[0-9]*-[0-9]*-[0-9]*")),
         month = month(date),
         year = year(date))

#---------------Minimal analysis----------------
# Summary aggregates
blog_sum <- blogs |>
  summarise(number_blogs = n(),
            words_with_tabs = sum(word_count_with_tags),
            total_words = sum(word_count),
            mean_words = mean(word_count),
            median_words = median(word_count),
            max_words = max(word_count),
            min_words = min(word_count))

# Shortest blog (turns out to be one just announcing a work shiny app):
blogs |>
  arrange(word_count) |>
  slice(1) |>
  pull(txt)

#------------------Graphics for use in blog-------------------------
the_caption <- "Source: https://freerangestats.info"

# Time series plot showing number of posts by month:
d1 <- blogs |>
  group_by(year, month) |>
  summarise(number_blogs = n()) |>
  ungroup() |>
  complete(year, month, fill = list(number_blogs = 0)) |>
  # remove October, November, December in 2025 (as time of writing is September 2025):
  filter(!(year == 2025 & month %in% 10:12)) |>
  # remove months the blog did not exist:
  filter(!(year == 2015 & month %in% 1:6)) |>
  group_by(year) |>
  mutate(year_lab = glue("{year}: {sum(number_blogs)} posts"),
         is_zero = ifelse(number_blogs == 0, "Zero", "NotZero"))

# model a smooth curve to the whole data set (don't want
# to do this with geom_smooth in the plot as then it has
# a break each year):
mod <- loess(number_blogs ~ I(year + month / 12), data = d1, span = 0.15)
d1$fitted <- predict(mod)

# draw time series plot of number of blogs:
d1 |>
  ggplot(aes(x = month, y = number_blogs)) +
  facet_wrap(~year_lab) +
  geom_line(aes(y = fitted), color = "grey80") +
  geom_point(color = "steelblue", size = 2.5, aes(shape = is_zero)) +
  expand_limits(y = 0) +
  scale_x_continuous(breaks = 1:12, labels = month.abb) +
  scale_shape_manual(values = c("Zero" = 1, "NotZero" = 19)) +
  theme(panel.grid.minor = element_blank(),
        axis.text.x = element_text(angle = 45, hjust = 1),
        legend.position = "none") +
  labs(x = "",
       y = "Number of blog posts",
       title = "Ten years of Free Range Statistics blogging",
       subtitle = glue("{nrow(blogs)} posts and {comma(blog_sum$total_words)} words, in just over ten years."),
       caption = the_caption)

# Connected scatter plot comparing average word count to number of posts:
blogs |>
  # number of months the blog was active in each year:
  mutate(number_months = case_when(
    year == 2015 ~ 6,
    year == 2025 ~ 8.5,
    TRUE ~ 12
  )) |>
  group_by(year, number_months) |>
  summarise(avg_word_count = mean(word_count, trim = 0.1),
            number_blogs = n()) |>
  ungroup() |>
  mutate(blogs_per_month = number_blogs / number_months) |>
  ggplot(aes(x = blogs_per_month, y = avg_word_count, label = year)) +
  geom_path(color = "grey80") +
  geom_text(color = "grey50") +
  scale_y_continuous(labels = comma) +
  expand_limits(x = 4.5) +
  # add themes
  annotate("text", fontface = "italic", hjust = 0, color = "darkblue",
           x = c(4, 3.4, 2.1),
           y = c(1165, 1350, 1880),
           label = c("Time series", "Elections", "Covid")
  ) +
  # add day jobs
  annotate("text", fontface = "italic", hjust = 0, color = "brown",
           x = c(3.1, 2.5, 0, 1.1),
           y = c(1130, 1675, 1420, 1330),
           label = c("NZ economics", "Consultant", "Chief Data Scientist", "Pacific")
  ) +
  labs(x = "Blog posts per month",
       y = "Average words per blog post",
       title = "Ten years of Free Range Statistics blogging",
       subtitle = "Annotated with important (but not necessarily dominant) themes and day-jobs for different phases.",
       caption = the_caption) +
  theme(plot.subtitle = element_markdown())