
Pandas helpers

Everything in this module requires the vayulib[data] extra (pip install 'vayulib[data]').

slice_frame

Slice a DataFrame or Series by an Interval (or TimeWindow). Works on the index, a specific index level, or a column:

from vayu import Interval, TimeWindow
from vayu.pandas_utils import slice_frame

# Slice by index (defaults to axis=0, level=0)
subset = slice_frame(Interval(100, 200), df)

# Slice by a named column
subset = slice_frame(TimeWindow.behind(hours=6), df, key="timestamp")

# Slice by a specific MultiIndex level
subset = slice_frame(Interval(0, 10), df, level=1)

# Invert — everything OUTSIDE the interval
outside = slice_frame(Interval(100, 200), df, exclude=True)

Caveat: when slicing by index (no key), the index must be sorted.
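
The sortedness requirement matches how label-based slicing behaves in pandas itself. The sketch below uses plain pandas only and does not call slice_frame; whether slice_frame relies on .loc internally is an assumption. It shows that a label slice with missing endpoints fails on an unsorted index and works once the index is sorted:

```python
import pandas as pd

# An unsorted integer index; 100 and 200 are not labels in it.
df = pd.DataFrame({"value": [1, 2, 3, 4]}, index=[250, 120, 90, 180])

try:
    df.loc[100:200]
    unsorted_ok = True
except KeyError:
    # pandas cannot locate missing slice bounds in a non-monotonic index
    unsorted_ok = False

# Sorting first makes the label slice well defined (both ends inclusive).
subset = df.sort_index().loc[100:200]
```

If the frame's index order is not guaranteed, calling df.sort_index() before slicing by index is the safer route.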

select_frame / df.select

A tiny query DSL using double-underscore operators, borrowed from the Django ORM:

from vayu import TimeWindow
from vayu.pandas_utils import select_frame

select_frame(df, status="active")             # status == "active"
select_frame(df, price__gt=100)               # price > 100
select_frame(df, price__gte=100, price__lt=500)
select_frame(df, tag__in=["a", "b"])
select_frame(df, tag__notin=["x"])
select_frame(df, notes__isnull=True)          # notes.isna()
select_frame(df, created=TimeWindow.behind(hours=6))   # Interval → slice_frame

Supported ops: eq (default), ne/neq, gt, gte, lt, lte, in/isin, nin/notin, isna/isnull.
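
To make the double-underscore convention concrete, here is a minimal sketch of how such keyword arguments could be turned into boolean masks with plain pandas. It is not vayu's implementation: select_sketch and the OPS table are hypothetical names, the Interval dispatch is left out, and treating isnull=False as "not null" is an assumption.

```python
import pandas as pd

# Hypothetical operator table; the names mirror the list above,
# but this is not vayu's actual implementation.
OPS = {
    "eq": lambda s, v: s == v,
    "ne": lambda s, v: s != v,
    "neq": lambda s, v: s != v,
    "gt": lambda s, v: s > v,
    "gte": lambda s, v: s >= v,
    "lt": lambda s, v: s < v,
    "lte": lambda s, v: s <= v,
    "in": lambda s, v: s.isin(v),
    "isin": lambda s, v: s.isin(v),
    "nin": lambda s, v: ~s.isin(v),
    "notin": lambda s, v: ~s.isin(v),
    # Assumption: a falsy value inverts the null check.
    "isna": lambda s, v: s.isna() if v else s.notna(),
    "isnull": lambda s, v: s.isna() if v else s.notna(),
}


def select_sketch(df: pd.DataFrame, **conditions) -> pd.DataFrame:
    """AND together one boolean mask per keyword argument."""
    mask = pd.Series(True, index=df.index)
    for key, value in conditions.items():
        # "price__gt" -> ("price", "gt"); a bare "status" means equality.
        column, sep, op = key.rpartition("__")
        if not sep or op not in OPS:
            column, op = key, "eq"
        mask &= OPS[op](df[column], value)
    return df[mask]


df = pd.DataFrame({
    "status": ["active", "inactive", "active"],
    "price": [50, 150, 300],
    "notes": [None, "ok", None],
})
result = select_sketch(df, status="active", price__gt=100)
```

All conditions are combined with a logical AND, which is the usual reading of multiple keyword filters in this style of DSL.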

The .select() method

Enable a df.select(...) / series.select(...) method by calling this once at startup:

import vayu
vayu.install_pandas_extensions()

df.select(price__gt=100, status="active")

This is opt-in because it monkey-patches pandas.DataFrame and pandas.Series globally. The method is installed only where a select attribute isn't already present, so an existing attribute of that name is never overwritten.
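
The guarded patching described above can be sketched roughly as follows. This is an illustration rather than vayu's code: install_method, _price_above, and the attribute name price_above_sketch are all hypothetical, and the name is chosen so it cannot clash with anything pandas defines.

```python
import pandas as pd


def install_method(cls, name, func) -> bool:
    """Attach func to cls as `name` unless the attribute already exists."""
    if hasattr(cls, name):
        return False  # leave any existing attribute untouched
    setattr(cls, name, func)
    return True


def _price_above(self, threshold):
    # Hypothetical helper used only to demonstrate the patch.
    return self[self["price"] > threshold]


# "price_above_sketch" is a made-up name chosen to avoid clashing with pandas.
first = install_method(pd.DataFrame, "price_above_sketch", _price_above)
second = install_method(pd.DataFrame, "price_above_sketch", _price_above)

df = pd.DataFrame({"price": [50, 150]})
expensive = df.price_above_sketch(100)
```

The second call is a no-op because the attribute now exists, which is what makes repeated installation safe.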

split_frame

Split at a fractional, integer, or datetime boundary:

from vayu.pandas_utils import split_frame

train, test = split_frame(df, 0.8)             # 80/20 by length
a, b = split_frame(df, 1000)                   # split at iloc 1000
before, after = split_frame(df, cutoff_dt)     # split at a datetime boundary (cutoff_dt defined elsewhere)
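
For readers who want to see the three boundary types side by side, the following is a rough plain-pandas equivalent. split_sketch is a hypothetical name, and placing the cutoff timestamp itself in the second part is an assumption; the source does not say which side split_frame puts it on.

```python
import pandas as pd


def split_sketch(df: pd.DataFrame, boundary):
    """Split df in two at a fraction, a row position, or an index label."""
    if isinstance(boundary, float):
        # Fraction of rows that goes into the first part.
        pos = int(len(df) * boundary)
        return df.iloc[:pos], df.iloc[pos:]
    if isinstance(boundary, int):
        return df.iloc[:boundary], df.iloc[boundary:]
    # Otherwise treat the boundary as a timestamp compared against the index.
    # Assumption: the cutoff itself goes into the second part.
    before = df[df.index < boundary]
    after = df[df.index >= boundary]
    return before, after


idx = pd.date_range("2026-01-01", periods=10, freq="D")
df = pd.DataFrame({"x": range(10)}, index=idx)

train, test = split_sketch(df, 0.8)
a, b = split_sketch(df, 3)
before, after = split_sketch(df, pd.Timestamp("2026-01-05"))
```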

concat_frame_from_dir

Read and concatenate all files with a given extension in a directory, in filename order, optionally restricted to filenames that start with a prefix:

from vayu.pandas_utils import concat_frame_from_dir

big = concat_frame_from_dir("/data/ingest", extension="parquet", progress=True)
filtered = concat_frame_from_dir("/data/ingest", prefix="2026-04-", extension="parquet")

Supported extensions: parquet (default), feather, csv, pickle. Empty frames are dropped before concatenation.
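
The behaviour described here (filename order, optional prefix, empty frames dropped) can be approximated with pathlib and pandas. The sketch below handles CSV only; concat_csv_dir_sketch is a hypothetical name, and resetting the index with ignore_index=True is an assumption about the desired result, not something the source states.

```python
from pathlib import Path
import tempfile

import pandas as pd


def concat_csv_dir_sketch(directory, prefix=""):
    """Read every '<prefix>*.csv' in directory, in filename order, and concatenate."""
    paths = sorted(Path(directory).glob(f"{prefix}*.csv"))
    frames = [pd.read_csv(p) for p in paths]
    frames = [f for f in frames if not f.empty]  # drop empty frames
    if not frames:
        return pd.DataFrame()
    # Assumption: a fresh 0..n-1 index is wanted for the combined frame.
    return pd.concat(frames, ignore_index=True)


with tempfile.TemporaryDirectory() as tmp:
    # Files are written out of order on purpose; one is header-only (empty).
    pd.DataFrame({"x": [3]}).to_csv(Path(tmp, "2026-04-02.csv"), index=False)
    pd.DataFrame({"x": [1, 2]}).to_csv(Path(tmp, "2026-04-01.csv"), index=False)
    pd.DataFrame({"x": []}).to_csv(Path(tmp, "2026-04-03.csv"), index=False)
    pd.DataFrame({"x": [9]}).to_csv(Path(tmp, "2026-05-01.csv"), index=False)
    april = concat_csv_dir_sketch(tmp, prefix="2026-04-")
```

Dropping empty frames before concatenation keeps a header-only file from influencing the dtypes of the combined result.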

get_frame_window

Derive a TimeWindow from a DataFrame's index or a timestamp column:

from vayu.pandas_utils import get_frame_window

window = get_frame_window(df)                 # uses index
window = get_frame_window(df, column="ts")    # uses a column
window = get_frame_window(df, level=1)        # uses MultiIndex level 1

Returns None for empty frames.
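
As a rough illustration of the idea, the bounds of a frame can be read from the minimum and maximum of its index or of a column. The sketch below returns a plain (start, end) tuple instead of a TimeWindow, and frame_bounds_sketch is a hypothetical name; how get_frame_window actually builds its window is not specified here.

```python
import pandas as pd


def frame_bounds_sketch(df: pd.DataFrame, column=None):
    """Return (start, end) of the index or a column, or None if df has no rows."""
    if len(df) == 0:
        return None
    values = df.index if column is None else df[column]
    return values.min(), values.max()


idx = pd.to_datetime(["2026-04-02", "2026-04-01", "2026-04-03"])
df = pd.DataFrame({"x": [1, 2, 3]}, index=idx)

bounds = frame_bounds_sketch(df)
empty_bounds = frame_bounds_sketch(df.iloc[0:0])
```

Because min and max are used, the index does not need to be sorted for this to work.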

See also