# Introduction
If you work with sensor readings, server metrics, or any data that arrives over time, you already know that standard scikit-learn pipelines do not quite fit. Time series data has structure that tabular models ignore: seasonality, trend, temporal ordering, and the fact that future values depend on past ones.
sktime is a Python library built specifically for this. It gives you a scikit-learn-style API (fit, predict, transform) that is designed from the ground up for time series. You can do forecasting, classification, regression, and clustering on time series, all with a consistent interface.
In this article, you'll work through an example problem: forecasting temperature readings from an industrial HVAC sensor. You'll learn how sktime handles time series data, how to build preprocessing pipelines, how to fit forecasters, and how to evaluate them.
You can get the code on GitHub.
# Prerequisites
You'll need Python 3.10 or higher and a basic familiarity with pandas. Install everything you need with:
pip install sktime pmdarima statsmodels
If you'd rather have all optional dependencies in one shot, pip install sktime[all_extras] covers them.
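Before moving on, a quick check that the packages are importable can save a confusing traceback later. The snippet below is a minimal sketch that uses only the standard library; the helper name `check_packages` is made up for this illustration and is not part of sktime.

```python
from importlib.util import find_spec


def check_packages(names):
    """Return a dict mapping each top-level package name to True if it is importable."""
    return {name: find_spec(name) is not None for name in names}


# The three packages installed by the pip command above
status = check_packages(["sktime", "pmdarima", "statsmodels"])
for name, ok in status.items():
    print(f"{name}: {'installed' if ok else 'MISSING'}")
```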
# What Makes sktime Useful
It helps to understand the problem sktime is solving. In scikit-learn, your data is a 2D table: rows are samples, columns are features. Time series data breaks this assumption because each "row" is actually a sequence of values over time, and the order of those values matters.
The main data containers you'll use are:
| Data Type | Representation | Description |
|---|---|---|
| Series | pd.Series or pd.DataFrame | A single time series used in vanilla forecasting. |
| Panel | pd.DataFrame with a 2-level MultiIndex | A collection of multiple independent time series. |
| Hierarchical | pd.DataFrame with a 3+ level MultiIndex | A structured set of time series with aggregation levels across multiple dimensions. |
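To make the Panel row concrete, here is a small pandas-only sketch of a 2-level MultiIndex frame. The sensor names, level names, and values are assumptions chosen for illustration; the only point is the shape: the outer level identifies the series and the inner level is the time index.

```python
import numpy as np
import pandas as pd

# Two hypothetical sensors, each with four hourly readings
sensors = ["sensor_a", "sensor_b"]
timestamps = pd.date_range("2026-01-01", periods=4, freq="h")

# Level 0 identifies the series (instance), level 1 is the time index
index = pd.MultiIndex.from_product(
    [sensors, timestamps], names=["sensor", "timestamp"]
)
panel = pd.DataFrame(
    {"temp_celsius": np.linspace(18.0, 21.5, len(index))}, index=index
)
print(panel)

# Selecting one instance drops the outer level and leaves a single time series
single = panel.xs("sensor_a", level="sensor")["temp_celsius"]
print(single)
```

A Hierarchical container follows the same idea with additional outer levels (for example site, then sensor, then timestamp).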
For the time index itself, sktime supports several index types on your pandas objects: DatetimeIndex, PeriodIndex, Int64Index, and RangeIndex. The index must be monotonic. If you're using DatetimeIndex, the freq attribute should be set.
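Both index requirements can be checked with plain pandas before handing data to a forecaster. The sketch below uses a toy six-point series (an assumption, not the article's data) to show how a gap removes the frequency and how `asfreq` restores a regular grid.

```python
import pandas as pd

idx = pd.date_range("2026-01-01", periods=6, freq="h")
s = pd.Series(range(6), index=idx, dtype="float64")

# Dropping a row leaves a gap, so pandas can no longer attach a frequency
gappy = s.drop(idx[2])
print(gappy.index.is_monotonic_increasing)  # ordering is unaffected by the gap
print(gappy.index.freq)                     # frequency is no longer set

# asfreq rebuilds the regular hourly grid; the missing hour reappears as NaN
regular = gappy.asfreq("h")
print(regular.index.freq)
print(regular.isna().sum())
```

Restoring the grid this way turns an irregular index into explicit missing values, which an imputation step can then fill.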
# Setting Up the Dataset
Let's create a realistic dataset. Imagine an HVAC sensor in a factory that records temperature every hour. The readings have a daily seasonal pattern (higher during working hours), a slight upward trend as summer approaches, and some noise.
import numpy as np
import pandas as pd
np.random.seed(42)
# 90 days of hourly readings starting Jan 1, 2026
n_hours = 90 * 24
timestamps = pd.date_range(start="2026-01-01", periods=n_hours, freq="h")
# Trend: gradual 5-degree rise over 90 days
trend = np.linspace(0, 5, n_hours)
# Daily seasonality: with a 4-hour phase shift the sine peaks at 10am and dips at 10pm
hour_of_day = np.arange(n_hours) % 24
daily_cycle = 4 * np.sin(2 * np.pi * (hour_of_day - 4) / 24)
# Noise
noise = np.random.normal(0, 0.8, n_hours)
# Base temperature around 20°C
temperature = 20 + trend + daily_cycle + noise
# Introduce a few missing values (sensor dropout)
dropout_indices = [300, 301, 302, 1440, 1441]
temperature[dropout_indices] = np.nan
y = pd.Series(temperature, index=timestamps, name="temp_celsius")
y.index.freq = pd.tseries.frequencies.to_offset("h")
print(y.head())
print(f"\nShape: {y.shape}")
print(f"Missing values: {y.isna().sum()}")
print(f"Index type: {type(y.index)}")
Output:
2026-01-01 00:00:00 16.933270
2026-01-01 01:00:00 17.063277
2026-01-01 02:00:00 18.522783
2026-01-01 03:00:00 20.190095
2026-01-01 04:00:00 19.821941
Freq: h, Name: temp_celsius, dtype: float64
Shape: (2160,)
Missing values: 5
Index type: <class 'pandas.core.indexes.datetimes.DatetimeIndex'>
# Splitting Time Series Data for Training and Testing
Splitting time series data is different from splitting tabular data: you can't shuffle rows. You must always split chronologically, training on earlier data and testing on later data.
sktime provides temporal_train_test_split for this purpose:
from sktime.split import temporal_train_test_split
# Hold out the last 7 days (168 hours) as the test set
y_train, y_test = temporal_train_test_split(y, test_size=168)
print(f"Train: {y_train.index[0]} → {y_train.index[-1]}")
print(f"Test: {y_test.index[0]} → {y_test.index[-1]}")
print(f"Train size: {len(y_train)}, Test size: {len(y_test)}")
Output:
Train: 2026-01-01 00:00:00 → 2026-03-24 23:00:00
Test: 2026-03-25 00:00:00 → 2026-03-31 23:00:00
Train size: 1992, Test size: 168
The function ensures the split is clean and chronological, with no data leakage from the future into the training set.
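Conceptually, a chronological split with an integer test size is just a positional cut near the end of the series. The pandas-only sketch below illustrates that idea on a stand-in series with the same index as the article's data; it is not sktime's implementation, and the variable names are made up for this example.

```python
import numpy as np
import pandas as pd

# Stand-in series with the same hourly index as the article's data (values are arbitrary)
timestamps = pd.date_range("2026-01-01", periods=90 * 24, freq="h")
y_demo = pd.Series(np.arange(len(timestamps), dtype=float), index=timestamps)

test_size = 168
# Everything before the cut is training data; the last 168 points form the test set
train_demo = y_demo.iloc[:-test_size]
test_demo = y_demo.iloc[-test_size:]

print(train_demo.index[-1], "->", test_demo.index[0])
print(len(train_demo), len(test_demo))
```

Because no row is shuffled, every training timestamp is strictly earlier than every test timestamp.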
# Defining the Forecasting Horizon
Before fitting any model, you need to tell sktime which time steps you want to predict. This is the ForecastingHorizon.
from sktime.forecasting.base import ForecastingHorizon
# Predict 168 steps ahead (7 days of hourly data)
# is_relative=False means we're using absolute timestamps
fh = ForecastingHorizon(y_test.index, is_relative=False)
print(f"Horizon length: {len(fh)}")
print(f"First forecast point: {fh[0]}")
print(f"Last forecast point: {fh[-1]}")
This gives:
Horizon length: 168
First forecast point: 2026-03-25 00:00:00
Last forecast point: 2026-03-31 23:00:00
You can also use relative horizons like fh = [1, 2, 3, ..., 168], which means "1 step ahead, 2 steps ahead, …". Absolute horizons are cleaner when you have exact timestamps you want predictions for.
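The two styles describe the same points in time. As a rough illustration in plain pandas (not sktime's own conversion logic), step k of a relative horizon on hourly data lands k hours after the last training timestamp, often called the cutoff:

```python
import numpy as np
import pandas as pd

# Last timestamp of the training data in this article's example
cutoff = pd.Timestamp("2026-03-24 23:00:00")

# Relative horizon: 1, 2, ..., 168 steps ahead
relative_steps = np.arange(1, 169)

# With hourly data, step k ahead lands k hours after the cutoff
absolute_times = cutoff + pd.to_timedelta(relative_steps, unit="h")

print(absolute_times[0])   # first forecast point
print(absolute_times[-1])  # last forecast point
```

The first and last values line up with the absolute horizon printed above.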
# Building a Preprocessing and Forecasting Pipeline
Real sensor data has missing values, seasonal patterns, and trend, and you need to handle all of these before or during forecasting. sktime's TransformedTargetForecaster lets you chain transformations with a forecaster into a single estimator. The transformations are applied to the target series y before fitting, and automatically reversed on the way out during prediction.
from sktime.forecasting.exp_smoothing import ExponentialSmoothing
from sktime.forecasting.compose import TransformedTargetForecaster
from sktime.transformations.series.impute import Imputer
from sktime.transformations.series.detrend import Deseasonalizer, Detrender
pipeline = TransformedTargetForecaster(
steps=[
# Step 1: Fill missing sensor readings using linear interpolation
("imputer", Imputer(method="linear")),
# Step 2: Remove the linear trend so the forecaster sees a stationary series
("detrender", Detrender()),
# Step 3: Remove the daily seasonality (sp=24 for hourly data with 24-hour cycles)
("deseasonalizer", Deseasonalizer(model="additive", sp=24)),
# Step 4: Forecast the cleaned, stationary residuals
("forecaster", ExponentialSmoothing(trend=None, seasonal=None)),
]
)
pipeline.fit(y_train, fh=fh)
y_pred = pipeline.predict()
print(y_pred.head())
Output:
2026-03-25 00:00:00 21.210066
2026-03-25 01:00:00 21.788986
2026-03-25 02:00:00 22.615184
2026-03-25 03:00:00 23.688449
2026-03-25 04:00:00 24.621127
Freq: h, Name: temp_celsius, dtype: float64
Here's what each step does:
- Imputer(method="linear") fills missing values by linearly interpolating between the surrounding readings, which works well for sensor data.
- Detrender() fits a linear trend to the training series and subtracts it; on prediction it adds the trend back.
- Deseasonalizer(sp=24) removes the 24-hour cycle from the residuals; sp stands for seasonal period.
- Finally, ExponentialSmoothing forecasts the detrended, deseasonalized residuals.
- When predict() is called, all inverse transformations are applied in reverse order automatically, and you get back predictions in the original temperature scale.
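To see the "transform, forecast, invert" idea without any library machinery, the steps above can be imitated by hand with NumPy and pandas. This is a deliberately simplified sketch, not what sktime does internally: the seasonal component is estimated as a per-hour average, and a constant mean forecast stands in for exponential smoothing. All variable names are assumptions made for this illustration.

```python
import numpy as np
import pandas as pd

# Rebuild a series with the same recipe as the article (different random generator)
rng = np.random.default_rng(0)
n = 90 * 24
idx = pd.date_range("2026-01-01", periods=n, freq="h")
hours = np.arange(n) % 24
values = 20 + np.linspace(0, 5, n) + 4 * np.sin(2 * np.pi * (hours - 4) / 24)
values = values + rng.normal(0, 0.8, n)
values[[300, 301, 302, 1440, 1441]] = np.nan
y_all = pd.Series(values, index=idx)
train, test = y_all.iloc[:-168], y_all.iloc[-168:]

# 1. Impute: linear interpolation across the gaps
filled = train.interpolate(method="linear")

# 2. Detrend: fit a straight line on the time position and subtract it
t_train = np.arange(len(filled))
slope, intercept = np.polyfit(t_train, filled.to_numpy(), deg=1)
detrended = filled - (slope * t_train + intercept)

# 3. Deseasonalize: subtract the average detrended value for each hour of the day
hourly_mean = detrended.groupby(detrended.index.hour).mean().to_numpy()
residual = detrended - hourly_mean[detrended.index.hour.to_numpy()]

# 4. Forecast the residual (stand-in model: a constant equal to the residual mean)
resid_forecast = np.full(len(test), residual.mean())

# 5. Invert in reverse order: add the seasonality back first, then the trend
t_future = np.arange(len(filled), len(filled) + len(test))
forecast = (
    resid_forecast
    + hourly_mean[test.index.hour.to_numpy()]
    + slope * t_future
    + intercept
)
y_hat = pd.Series(forecast, index=test.index)
mae_manual = (test - y_hat).abs().mean()
print(f"Hand-rolled MAE: {mae_manual:.3f} °C")
```

The pipeline object does the same bookkeeping for you, which is why predictions come back on the original temperature scale.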
# Evaluating the Forecast
sktime integrates with standard evaluation metrics. For forecasting, mean absolute error (MAE) and mean absolute percentage error (MAPE) are common choices.
from sktime.performance_metrics.forecasting import (
mean_absolute_error,
mean_absolute_percentage_error,
)
mae = mean_absolute_error(y_test, y_pred)
mape = mean_absolute_percentage_error(y_test, y_pred)
print(f"MAE: {mae:.3f} °C")
print(f"MAPE: {mape*100:.2f}%")
Output:
MAE: 0.584 °C
MAPE: 2.40%
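For reference, both metrics are simple averages over the forecast errors. The NumPy sketch below spells out the standard definitions on three made-up values; the function names `mae_by_hand` and `mape_by_hand` are hypothetical and exist only for this example.

```python
import numpy as np


def mae_by_hand(y_true, y_pred):
    """Mean of the absolute errors, in the units of the data."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs(y_true - y_pred))


def mape_by_hand(y_true, y_pred):
    """Mean of the absolute errors relative to the true values, as a fraction."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs(y_true - y_pred) / np.abs(y_true))


y_true = [20.0, 25.0, 10.0]
y_hat = [22.0, 24.0, 11.0]
print(mae_by_hand(y_true, y_hat))   # (2 + 1 + 1) / 3, about 1.333
print(mape_by_hand(y_true, y_hat))  # (0.10 + 0.04 + 0.10) / 3 = 0.08
```

MAPE computed this way is a fraction, which is why the article's code multiplies it by 100 before printing a percentage.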
# Swapping in a Different Forecaster
One of the biggest advantages of the sktime interface is that swapping the underlying algorithm requires changing only one line. Let's try an ARIMA model in place of exponential smoothing and compare.
from sktime.forecasting.arima import ARIMA
pipeline_arima = TransformedTargetForecaster(
steps=[
("imputer", Imputer(method="linear")),
("detrender", Detrender()),
("deseasonalizer", Deseasonalizer(model="additive", sp=24)),
# ARIMA(1,1,1) on the cleaned residuals
("forecaster", ARIMA(order=(1, 1, 1), suppress_warnings=True)),
]
)
pipeline_arima.fit(y_train, fh=fh)
y_pred_arima = pipeline_arima.predict()
mae_arima = mean_absolute_error(y_test, y_pred_arima)
mape_arima = mean_absolute_percentage_error(y_test, y_pred_arima)
print(f"ARIMA MAE: {mae_arima:.3f} °C")
print(f"ARIMA MAPE: {mape_arima*100:.2f}%")
Output:
ARIMA MAE: 0.586 °C
ARIMA MAPE: 2.41%
The key point is that the preprocessing steps (imputation, detrending, deseasonalization) stayed identical. You only changed the final forecaster, and everything else composed cleanly around it.
# Cross-Validating Across Time
Holding out a single test window can be misleading. sktime provides time series cross-validation through splitters that respect temporal ordering.
SlidingWindowSplitter uses a rolling window: the training window slides forward in time, always staying the same length. ExpandingWindowSplitter grows the training set cumulatively as you move forward, which is more appropriate when you want to use all available history.
from sktime.split import ExpandingWindowSplitter
from sktime.forecasting.model_evaluation import evaluate
# Expanding window: start with an 1800-hour train set, evaluate on 168-hour windows
cv = ExpandingWindowSplitter(
initial_window=1800,
fh=list(range(1, 169)),
step_length=168,
)
results = evaluate(
forecaster=pipeline,
y=y,
cv=cv,
scoring=mean_absolute_error,
return_data=False,
)
print(results[["test__DynamicForecastingErrorMetric", "fit_time"]].round(3))
print(f"\nMean CV MAE: {results['test__DynamicForecastingErrorMetric'].mean():.3f} °C")
Output:
test__DynamicForecastingErrorMetric fit_time
0 0.627 0.274
1 0.585 0.100
Mean CV MAE: 0.606 °C
evaluate returns a DataFrame with per-fold metrics and timing. The cross-validation MAE is close to the single hold-out MAE, which suggests that the model generalizes consistently across different time windows in the data, although two folds is a small sample.
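The two rows in the output follow from the window arithmetic: with 2160 observations, an 1800-hour initial window, a 168-step horizon, and a step of 168, only two complete folds fit. The pure-Python sketch below mimics the expanding and sliding layouts under those assumptions. The generator names are hypothetical, and this is a simplified picture of the splitters' semantics rather than sktime's implementation.

```python
def expanding_windows(n_obs, initial_window, horizon, step):
    """Yield (train, test) index ranges where the training set grows each fold."""
    train_end = initial_window
    while train_end + horizon <= n_obs:
        yield range(0, train_end), range(train_end, train_end + horizon)
        train_end += step


def sliding_windows(n_obs, window_length, horizon, step):
    """Yield (train, test) index ranges where the training window keeps a fixed length."""
    train_start = 0
    while train_start + window_length + horizon <= n_obs:
        train_end = train_start + window_length
        yield range(train_start, train_end), range(train_end, train_end + horizon)
        train_start += step


expanding = list(expanding_windows(2160, 1800, 168, 168))
sliding = list(sliding_windows(2160, 1800, 168, 168))

for name, folds in [("expanding", expanding), ("sliding", sliding)]:
    for train, test in folds:
        print(
            f"{name}: train {train.start}-{train.stop - 1} ({len(train)} obs), "
            f"test {test.start}-{test.stop - 1}"
        )
```

In the second fold the expanding layout trains on 1968 observations while the sliding layout still trains on 1800, which is the practical difference between the two splitters.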
# Next Steps
This article covered the core forecasting workflow in sktime, but the library extends far beyond basic prediction tasks.
It also supports time series classification, probabilistic forecasting with uncertainty estimates, training shared models across multiple related time series, adapting traditional machine learning algorithms for sequential forecasting, and automating model selection and tuning workflows.
One of sktime's biggest strengths is its consistent API and integration with the broader Python machine learning ecosystem, which makes experimentation easier for both beginners and experienced practitioners. The sktime docs and example notebooks are especially well written and are worth bookmarking if you regularly work with forecasting or temporal data problems.
Bala Priya C is a developer and technical writer from India. She likes working at the intersection of math, programming, data science, and content creation. Her areas of interest and expertise include DevOps, data science, and natural language processing. She enjoys reading, writing, coding, and coffee! Currently, she's working on learning and sharing her knowledge with the developer community by authoring tutorials, how-to guides, opinion pieces, and more. Bala also creates engaging resource overviews and coding tutorials.
