Pandas helpers¶
Everything in this module requires the vayulib[data] extra (pip install 'vayulib[data]').
slice_frame¶
Slice a DataFrame or Series by an Interval (or TimeWindow). Works on the index, a specific index level, or a column:
from vayu import Interval, TimeWindow
from vayu.pandas_utils import slice_frame
# Slice by index (defaults to axis=0, level=0)
subset = slice_frame(Interval(100, 200), df)
# Slice by a named column
subset = slice_frame(TimeWindow.behind(hours=6), df, key="timestamp")
# Slice by a specific MultiIndex level
subset = slice_frame(Interval(0, 10), df, level=1)
# Invert: keep everything outside the interval
outside = slice_frame(Interval(100, 200), df, exclude=True)
Caveat: when slicing by index (no key), the index must be sorted.
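A sorted-index requirement like this is typical of range lookups that rely on binary search. The sketch below is plain Python and is not vayu's implementation: `slice_sorted` is a hypothetical helper, and treating both bounds as inclusive is an assumption. It shows how a range can be located in a sorted list of labels with `bisect`, which only gives meaningful positions when the labels are in order.

```python
from bisect import bisect_left, bisect_right


def slice_sorted(labels, lo, hi):
    """Return the labels within [lo, hi], assuming `labels` is sorted.

    Hypothetical helper for illustration only. Both bounds are treated
    as inclusive, which is an assumption, not documented behaviour.
    """
    start = bisect_left(labels, lo)   # first position with label >= lo
    stop = bisect_right(labels, hi)   # first position with label > hi
    return labels[start:stop]


index = [50, 100, 150, 200, 250]
inside = slice_sorted(index, 100, 200)

# Complement of the range, analogous to exclude=True in the text above.
outside = [label for label in index if label not in inside]
```

On an unsorted list the two `bisect` calls can return arbitrary positions, so the slice would silently contain the wrong rows rather than raise an error.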
select_frame / df.select¶
A tiny query DSL using double-underscore operators, borrowed from the Django ORM:
from vayu import TimeWindow
from vayu.pandas_utils import select_frame
select_frame(df, status="active") # status == "active"
select_frame(df, price__gt=100) # price > 100
select_frame(df, price__gte=100, price__lt=500) # price >= 100 and price < 500
select_frame(df, tag__in=["a", "b"]) # tag.isin(["a", "b"])
select_frame(df, tag__notin=["x"]) # ~tag.isin(["x"])
select_frame(df, notes__isnull=True) # notes.isna()
select_frame(df, created=TimeWindow.behind(hours=6)) # Interval → slice_frame
Supported ops: eq (default), ne/neq, gt, gte, lt, lte, in/isin, nin/notin, isna/isnull.
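To make the double-underscore convention concrete, here is a minimal plain-Python sketch of how such keyword arguments might be parsed and applied. It works on a list of dicts rather than a DataFrame, and `OPS`, `parse_key`, and `select_rows` are hypothetical names, not part of vayu. Combining several conditions with AND is also an assumption.

```python
import operator


def _is_in(value, options):
    return value in options


def _not_in(value, options):
    return value not in options


def _is_null(value, flag):
    # flag=True keeps missing values, flag=False keeps present ones.
    return (value is None) == flag


# Operator names follow the list above; the implementations are
# illustrative stand-ins that act on plain Python values.
OPS = {
    "eq": operator.eq,
    "ne": operator.ne, "neq": operator.ne,
    "gt": operator.gt, "gte": operator.ge,
    "lt": operator.lt, "lte": operator.le,
    "in": _is_in, "isin": _is_in,
    "nin": _not_in, "notin": _not_in,
    "isna": _is_null, "isnull": _is_null,
}


def parse_key(key):
    """Split 'price__gte' into ('price', 'gte'); a bare name means 'eq'."""
    column, sep, op = key.rpartition("__")
    if sep and op in OPS:
        return column, op
    return key, "eq"


def select_rows(rows, **conditions):
    """Keep the rows that satisfy every condition (AND semantics assumed)."""
    parsed = [(parse_key(key), expected) for key, expected in conditions.items()]
    return [
        row for row in rows
        if all(OPS[op](row[column], expected) for (column, op), expected in parsed)
    ]


rows = [
    {"status": "active", "price": 120, "notes": None},
    {"status": "active", "price": 80, "notes": "check"},
    {"status": "closed", "price": 600, "notes": None},
]
matches = select_rows(rows, status="active", price__gt=100)
```

Because the operator is taken from the text after the last `__`, a column name that itself contains a double underscore still resolves to `eq` as long as its suffix is not a known operator.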
The .select() method¶
Enable a df.select(...) / series.select(...) method by calling this once at startup:
This is opt-in because it monkey-patches pandas.DataFrame and pandas.Series globally. The method is installed only if the attribute isn't already present, so an existing select attribute is never overwritten.
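The "install only if absent" guard can be sketched in plain Python. `install_method` and the `Frame` stand-in class below are hypothetical and exist only to illustrate the pattern; they are not vayu's installer and do not touch pandas.

```python
def install_method(cls, name, func):
    """Attach `func` to `cls` as `name` unless the attribute already exists.

    Returns True if the method was installed, False if an existing
    attribute was left untouched.
    """
    if hasattr(cls, name):
        return False
    setattr(cls, name, func)
    return True


class Frame:
    """Stand-in for a third-party class such as pandas.DataFrame."""

    def __init__(self, rows):
        self.rows = rows


def select(self, **conditions):
    """Keep rows whose values equal every given keyword argument."""
    return [
        row for row in self.rows
        if all(row.get(key) == value for key, value in conditions.items())
    ]


first = install_method(Frame, "select", select)               # installs
second = install_method(Frame, "select", lambda self: None)   # skipped: already present
```

The guard makes repeated calls harmless and avoids clobbering a method that another library, or a future version of the patched class, may already define.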
split_frame¶
Split a DataFrame in two at a fractional, integer, or datetime boundary:
from vayu.pandas_utils import split_frame
train, test = split_frame(df, 0.8) # 80/20 by length
a, b = split_frame(df, 1000) # split at iloc 1000
before, after = split_frame(df, cutoff_dt) # split at a timestamp
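The three boundary types can be told apart by a simple type dispatch. The following is a plain-Python sketch over a time-sorted list of `(timestamp, value)` pairs, not vayu's code: `split_rows` is a hypothetical name, and placing a row that falls exactly on the timestamp in the second half is an assumption.

```python
from datetime import datetime


def split_rows(rows, boundary):
    """Split `rows`, a time-sorted list of (timestamp, value) pairs, in two."""
    if isinstance(boundary, float):
        # Fraction of the total length, e.g. 0.8 keeps the first 80%.
        cut = int(len(rows) * boundary)
    elif isinstance(boundary, int):
        # Positional split, like slicing at iloc `boundary`.
        cut = boundary
    elif isinstance(boundary, datetime):
        # First row at or after the timestamp starts the second part
        # (which side gets the boundary row is an assumption).
        cut = next(
            (i for i, (ts, _) in enumerate(rows) if ts >= boundary),
            len(rows),
        )
    else:
        raise TypeError(f"unsupported boundary type: {type(boundary).__name__}")
    return rows[:cut], rows[cut:]


rows = [(datetime(2026, 4, 1, hour), hour) for hour in range(10)]

train, holdout = split_rows(rows, 0.8)                 # 8 rows / 2 rows
head, tail = split_rows(rows, 3)                       # 3 rows / 7 rows
before, after = split_rows(rows, datetime(2026, 4, 1, 5))
```

The float check comes first so that `0.8` and `1000` are never confused. Note that `1.0` would be read as a fraction while `1` would be read as a position.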
concat_frame_from_dir¶
Read every file with a given extension in a directory and concatenate them in filename order:
from vayu.pandas_utils import concat_frame_from_dir
big = concat_frame_from_dir("/data/ingest", extension="parquet", progress=True)
filtered = concat_frame_from_dir("/data/ingest", prefix="2026-04-", extension="parquet") # only filenames starting with "2026-04-"
Supported extensions: parquet (default), feather, csv, pickle. Empty frames are dropped before concatenation.
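The file-selection step can be sketched with the standard library alone. `list_input_files` is a hypothetical helper, and matching `prefix` against the start of the filename is an assumption. Reading and concatenating the files is deliberately left out.

```python
import tempfile
from pathlib import Path


def list_input_files(directory, extension="parquet", prefix=""):
    """Return files in `directory` matching prefix and extension, sorted by filename."""
    pattern = f"{prefix}*.{extension}"
    return sorted(Path(directory).glob(pattern), key=lambda path: path.name)


# Demonstrate on a throwaway directory containing empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["2026-04-02.parquet", "2026-03-31.parquet",
                 "2026-04-01.parquet", "notes.csv"]:
        (Path(tmp) / name).touch()

    april = [p.name for p in list_input_files(tmp, prefix="2026-04-")]
    everything = [p.name for p in list_input_files(tmp)]
```

Sorting by filename is lexicographic, so it matches chronological order only when names use a zero-padded, most-significant-first date format such as `YYYY-MM-DD`.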
get_frame_window¶
Derive a TimeWindow from a DataFrame's index or a timestamp column:
from vayu.pandas_utils import get_frame_window
window = get_frame_window(df) # uses index
window = get_frame_window(df, column="ts") # uses a column
window = get_frame_window(df, level=1) # uses MultiIndex level 1
Returns None for empty frames.
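Since an empty frame yields None, callers need to check the result before using it. The plain-Python sketch below assumes the window spans the earliest to the latest timestamp, which the text above does not state explicitly. `window_of` is a hypothetical stand-in that returns a `(start, end)` tuple instead of a TimeWindow.

```python
from datetime import datetime


def window_of(timestamps):
    """Return (earliest, latest) for the timestamps, or None if there are none."""
    timestamps = list(timestamps)
    if not timestamps:
        return None
    return min(timestamps), max(timestamps)


stamps = [
    datetime(2026, 4, 1, 12, 0),
    datetime(2026, 4, 1, 9, 30),
    datetime(2026, 4, 2, 8, 15),
]
window = window_of(stamps)
empty = window_of([])

# Guard against the empty case before unpacking.
if window is not None:
    start, end = window
```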