Overview

TimeDataModel is a lightweight Python library for working with time series data. It provides a metadata-rich container whose underlying DataFrame is optional — the same TimeSeries class covers both data-bearing series and metadata-only declarations. Backed by Polars internally and fully interoperable with pandas, NumPy, Polars, and PyArrow — together with enums and geographic types for annotating your data.

The library is designed for energy, weather, and forecasting workflows where data arrives from many sources at different times — but it is general enough for any domain that works with timestamped numerical data.


Core data structures

TimeSeries

A single univariate time series. The underlying Polars DataFrame always has a "value" column and one or more timestamp columns whose layout is determined by the inferred DataShape (see below).

import pandas as pd
from timedatamodel import TimeSeries, Frequency

df = pd.DataFrame({
    "valid_time": pd.date_range("2024-01-01", periods=48, freq="h", tz="UTC"),
    "value": [float(i) for i in range(48)],
})

ts = TimeSeries.from_pandas(
    df,
    frequency=Frequency.PT1H,
    name="wind_power",
    unit="MW",
)

Supported metadata fields: name, unit, data_type, timeseries_type, frequency, timezone, description.

Key operations:

Operation

Example

Slicing

ts.head(n), ts.tail(n)

Unit conversion

ts.convert_unit("kW") (requires [pint])

Format export

ts.to_pandas(), ts.to_polars(), ts.to_list(), ts.to_numpy(), ts.to_pyarrow()

Format import

TimeSeries.from_pandas(df, ...), from_polars(...), from_list(...), from_numpy(...), from_pyarrow(...)

Validation

ts.validate_for_insert()


Metadata-only TimeSeries

Construct a TimeSeries with df=None to declare the structure of a series before any data exists — useful for cataloging or registering series in advance. All metadata fields (name, unit, data_type, timeseries_type, frequency, timezone, description) are available; data-bearing methods (to_pandas, head, convert_unit, …) raise ValueError. Use ts.has_df to check.

from timedatamodel import TimeSeries, DataType

# Metadata-only — no DataFrame yet.
ts = TimeSeries(
    name="wind_power",
    unit="MW",
    data_type=DataType.FORECAST,
)
assert ts.has_df is False
assert ts.shape is None

# Later, once data arrives, build a fresh data-bearing instance.
ts_with_data = TimeSeries(df, name=ts.name, unit=ts.unit, data_type=ts.data_type)

DataShape is inferred from the DataFrame at construction time; metadata-only instances have shape is None.

Identity-shaped concepts (labels, tags) and spatial context (locations) belong to consumer layers (e.g. energydatamodel, energydb), not to time-series metadata itself.


Data shapes

The DataShape enum controls which timestamp columns are present in the underlying DataFrame. This lets a single class cover standard time series as well as versioned and audit-trail patterns:

Shape

Columns

Use case

SIMPLE

valid_time, value

Standard time series

VERSIONED

knowledge_time, valid_time, value

Record when each value was produced

CORRECTED

valid_time, change_time, value

Track when a value was revised

AUDIT

knowledge_time, change_time, valid_time, value

Full audit trail

All timestamp columns are stored as pl.Datetime("us", time_zone="UTC").


Frequency

The Frequency enum defines ISO 8601 duration values. Three are calendar-based (their exact duration depends on the calendar); the rest are fixed-interval:

Value

Description

Calendar-based

P1Y

1 year

yes

P3M

3 months (quarter)

yes

P1M

1 month

yes

P1W

1 week

P1D

1 day

PT1H

1 hour

PT30M

30 minutes

PT15M

15 minutes

PT10M

10 minutes

PT5M

5 minutes

PT1M

1 minute

PT1S

1 second

NONE

No fixed frequency


DataType taxonomy

Every time series can carry a DataType annotation describing the nature of the data. The taxonomy has two roots and 10 leaf types:

ACTUAL
├── OBSERVATION ── directly measured or recorded values
└── DERIVED ────── computed from observations (e.g. aggregated measurements)

CALCULATED
├── ESTIMATION
│   ├── FORECAST ─────── future predictions with a known issue time
│   ├── PREDICTION ───── statistical/ML predictions (general)
│   ├── SCENARIO ─────── what-if analyses under assumed conditions
│   ├── SIMULATION ───── outputs from physical or numerical models
│   └── RECONSTRUCTION ─ hindcasts or gap-filled historical data
└── REFERENCE
    ├── BASELINE ──────── reference scenario for comparison
    ├── BENCHMARK ─────── industry-standard or best-practice reference
    └── IDEAL ─────────── theoretical optimum or design values

Each DataType member exposes .parent, .children, .is_leaf, and .root properties.


Geographic types

Geographic primitives are exported from the package for consumer layers (e.g. energydatamodel) that need to attach spatial context to entities. They are no longer attached to TimeSeries itself.

GeoLocation — a single point defined by latitude and longitude:

from timedatamodel import GeoLocation

loc = GeoLocation(latitude=59.33, longitude=18.07)  # Stockholm

Provides distance_to() (Haversine), bearing_to(), midpoint(), and offset().

GeoArea — a polygon region (requires the [geo] extra):

from timedatamodel import GeoArea

area = GeoArea.from_coordinates([(59.0, 17.5), (59.0, 18.5), (59.5, 18.5), (59.5, 17.5)])

Provides contains_point(), overlaps(), centroid, and bounding_box().


Other types

DataPoint

A NamedTuple with two fields: timestamp (a datetime) and value (a float or None).

TimeSeriesType

Describes the index structure of a time series:

  • FLAT — standard non-overlapping timestamps

  • OVERLAPPING — overlapping windows (e.g. rolling forecasts with a (issue_time, valid_time) multi-index)


Next steps