
Python API

This page documents the public Python API of ioc_cleanup. Only stable, user-facing functions are listed here.


Transformations & Cleaning

Functions related to loading, applying, and managing cleaning transformations.

ioc_cleanup.load_transformation(ioc_code, sensor, src_dir=_constants.TRANSFORMATIONS_DIR)

Load a transformation definition for a station and sensor.

This is a convenience wrapper around load_transformation_from_path that builds the path <src_dir>/<ioc_code>_<sensor>.json from the IOC station code and sensor identifier.

Parameters:

ioc_code (str, required): IOC station code.
sensor (str, required): Sensor identifier.
src_dir (str | PathLike[str], default TRANSFORMATIONS_DIR): Directory containing transformation JSON files.

Returns:

Transformation: Parsed transformation model.

Source code in ioc_cleanup/_tools.py
def load_transformation(
    ioc_code: str,
    sensor: str,
    src_dir: str | os.PathLike[str] = _constants.TRANSFORMATIONS_DIR,
) -> _models.Transformation:
    """
    Load a transformation definition for a station and sensor.

    This is a convenience wrapper around
    `load_transformation_from_path` that constructs the transformation
    filename from the IOC station code and sensor identifier.

    Args:
        ioc_code: IOC station code.
        sensor: Sensor identifier.
        src_dir: Directory containing transformation JSON files.

    Returns:
        Parsed transformation model.
    """
    path = f"{src_dir}/{ioc_code}_{sensor}.json"
    return load_transformation_from_path(path)
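The filename convention can be checked without touching disk. The sketch below mirrors the f-string used in load_transformation with pathlib; transformation_path is a hypothetical helper, and the station code "abcd", sensor "rad" and directory name are placeholders, not real files.

```python
from pathlib import Path


def transformation_path(ioc_code: str, sensor: str, src_dir: str = "transformations") -> Path:
    """Hypothetical helper: rebuild the path that load_transformation reads."""
    # Same convention as load_transformation: <src_dir>/<ioc_code>_<sensor>.json
    return Path(src_dir) / f"{ioc_code}_{sensor}.json"


# "abcd" and "rad" are placeholder values for illustration only.
print(transformation_path("abcd", "rad").as_posix())  # transformations/abcd_rad.json
```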

ioc_cleanup.load_transformation_from_path(path)

Load a transformation definition from a JSON file.

Parameters:

path (str | PathLike[str], required): Path to a transformation JSON file describing cleaning rules.

Returns:

Transformation: Parsed transformation model.

Source code in ioc_cleanup/_tools.py
def load_transformation_from_path(path: str | os.PathLike[str]) -> _models.Transformation:
    """
    Load a transformation definition from a JSON file.

    Parameters:
        path: Path to a transformation JSON file describing cleaning rules.

    Returns:
        Parsed transformation model.
    """
    with open(path) as fd:
        contents = fd.read()
    model = _models.Transformation.model_validate_json(contents)
    return model
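A transformation file is a JSON object whose keys match the fields of the Transformation model listed under Models. The sketch below writes a hypothetical file to a temporary directory and reads it back with the standard library only; in the real function, parsing and validation are done by pydantic's model_validate_json. The station code, sensor and every date are invented.

```python
import json
import tempfile
from datetime import datetime
from pathlib import Path

# Hypothetical transformation; keys follow the Transformation model fields.
example = {
    "ioc_code": "abcd",
    "sensor": "rad",
    "notes": "",
    "start": "2020-01-01T00:00:00",
    "end": "2021-01-01T00:00:00",
    "dropped_date_ranges": [["2020-03-01T00:00:00", "2020-03-02T12:00:00"]],
    "dropped_timestamps": ["2020-05-10T08:15:00"],
    "breakpoints": ["2020-07-01T00:00:00"],
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "abcd_rad.json"
    path.write_text(json.dumps(example, indent=2))

    # Stand-in for model_validate_json: plain json parsing plus ISO 8601 dates.
    raw = json.loads(path.read_text())
    start = datetime.fromisoformat(raw["start"])
    end = datetime.fromisoformat(raw["end"])
    ranges = [tuple(datetime.fromisoformat(t) for t in pair) for pair in raw["dropped_date_ranges"]]

print(start, end, ranges[0])
```

With ioc_cleanup installed, passing a file like this to load_transformation_from_path should yield a Transformation whose datetime and tuple fields are converted by pydantic; fields left out (skip, wip, high, low, tsunami) take their model defaults.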

ioc_cleanup.transform(df, transformation=None)

Apply a cleaning transformation to an IOC sea-level time series.

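The transformation defines the valid time window, dropped timestamps, dropped date ranges, and sensor breakpoints. Rows outside the time window are removed; dropped timestamps and date ranges are set to NaN but stay in the index. Breakpoints are only recorded in DataFrame.attrs, and no offset correction is applied.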

Parameters:

df (DataFrame, required): Raw IOC sea-level time series. The DataFrame must have ioc_code and sensor entries in DataFrame.attrs if transformation is not provided.
transformation (Transformation | None, default None): Cleaning transformation to apply. If not provided, it is loaded with load_transformation using those DataFrame.attrs entries.

Returns:

DataFrame: Cleaned time series, with "breakpoints" and "status" stored in DataFrame.attrs.

Source code in ioc_cleanup/_tools.py
def transform(df: pd.DataFrame, transformation: _models.Transformation | None = None) -> pd.DataFrame:
    """
    Apply a cleaning transformation to an IOC sea-level time series.

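    The transformation defines the valid time window, dropped timestamps,
    dropped date ranges, and sensor breakpoints. Rows outside the time window
    are removed; dropped timestamps and date ranges are set to NaN.
    Breakpoints are only recorded in `DataFrame.attrs`; no offset correction
    is applied.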

    Parameters:
        df: Raw IOC sea-level time series. The DataFrame must have
            `ioc_code` and `sensor` entries in its attributes if
            `transformation` is not provided.
        transformation: Cleaning transformation to apply. If not provided,
            it is loaded automatically using DataFrame attributes.

    Returns:
        Cleaned time series with metadata stored in `DataFrame.attrs`.
    """
    df = df.copy()
    if transformation is None:
        transformation = load_transformation(ioc_code=df.attrs["ioc_code"], sensor=df.attrs["sensor"])
    df = df[transformation.start : transformation.end]  # type: ignore[misc]  # https://stackoverflow.com/questions/70763542/pandas-dataframe-mypy-error-slice-index-must-be-an-integer-or-none
    for start, end in transformation.dropped_date_ranges:
        df[start:end] = np.nan  # type: ignore[misc]  # https://stackoverflow.com/questions/70763542/pandas-dataframe-mypy-error-slice-index-must-be-an-integer-or-none
    if transformation.dropped_timestamps:
        t_ = pd.DatetimeIndex(transformation.dropped_timestamps)
        t0 = df.index[0]
        t1 = df.index[-1]
        # select only timestamps within the DataFrame time window (bounds included,
        # so a bad first or last sample can also be dropped)
        t_ = t_[(t_ >= t0) & (t_ <= t1)]
        df.loc[t_, :] = np.nan
    df.attrs["breakpoints"] = sorted(transformation.breakpoints)
    df.attrs["status"] = "transformed"
    return df
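The effect of transform on a toy series can be reproduced with plain pandas. The sketch below is a simplified re-implementation, not the library's code: a SimpleNamespace stands in for Transformation, the data are synthetic, the sensor column name "rad" is a placeholder, and dropped timestamps are filtered with isin as a simplification of the window check in the source. It shows that rows outside [start, end] disappear while dropped ranges and timestamps remain in the index as NaN.

```python
from types import SimpleNamespace

import numpy as np
import pandas as pd

# Synthetic 1-minute series with a placeholder sensor column "rad".
index = pd.date_range("2020-01-01 00:00", periods=10, freq="min")
df = pd.DataFrame({"rad": np.arange(10, dtype=float)}, index=index)

# Stand-in for a Transformation object: only the fields transform() reads.
trans = SimpleNamespace(
    start=pd.Timestamp("2020-01-01 00:01"),
    end=pd.Timestamp("2020-01-01 00:08"),
    dropped_date_ranges=[(pd.Timestamp("2020-01-01 00:03"), pd.Timestamp("2020-01-01 00:04"))],
    dropped_timestamps=[pd.Timestamp("2020-01-01 00:06")],
    breakpoints=[pd.Timestamp("2020-01-01 00:05")],
)

out = df.loc[trans.start : trans.end].copy()  # keep the valid window, bounds included
for start, end in trans.dropped_date_ranges:
    out.loc[start:end] = np.nan  # mask a whole range; its timestamps stay in the index
drop = pd.DatetimeIndex(trans.dropped_timestamps)
drop = drop[drop.isin(out.index)]  # ignore timestamps that are not in the trimmed frame
out.loc[drop, :] = np.nan
out.attrs["breakpoints"] = sorted(trans.breakpoints)

print(out["rad"].tolist())  # [1.0, 2.0, nan, nan, 5.0, nan, 7.0, 8.0]
```

Masking with NaN instead of deleting rows keeps a regular time axis, which is convenient for later resampling or tidal analysis.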

ioc_cleanup.clean(df, station, sensor)

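Clean a raw IOC time series using the transformation file ./transformations/<station>_<sensor>.json, resolved relative to the current working directory (not TRANSFORMATIONS_DIR).

This is a wrapper around transform that returns only the selected sensor's column as a Series.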

Parameters:

df (DataFrame, required): Raw IOC station data.
station (str, required): IOC station code.
sensor (str, required): Sensor identifier.

Returns:

Series: Cleaned sea-level time series for the selected sensor.

Source code in ioc_cleanup/_tools.py
def clean(df: pd.DataFrame, station: str, sensor: str) -> pd.Series:
    """
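    Clean a raw IOC time series using the transformation file
    `./transformations/<station>_<sensor>.json`, resolved relative to the
    current working directory (not `TRANSFORMATIONS_DIR`).

    Wrapper around `transform`: returns only the selected sensor's column
    as a Series.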

    Parameters:
        df: Raw IOC station data.
        station: IOC station code.
        sensor: Sensor identifier.

    Returns:
        Cleaned sea-level time series for the selected sensor.
    """
    trans = load_transformation_from_path("./transformations/" + station + "_" + sensor + ".json")
    return transform(df, trans)[sensor]

Surge & Signal Processing

Utilities for tidal analysis, demeaning, and surge extraction.

ioc_cleanup.surge(ts, opts, rsmp)

Compute the non-tidal (surge) component of a sea-level time series.

Tidal constituents are estimated using UTide and reconstructed at the original timestamps. The tidal signal is then subtracted from the observed series.

Parameters:

ts (Series, required): Sea-level time series.
opts (Mapping[str, Any], required): UTide solver options, passed to utide.solve as keyword arguments.
rsmp (int | None, required): Optional resampling interval in minutes. If provided, the series is averaged into bins of this width, with labels shifted to the bin centers, before tidal analysis; the tide is still reconstructed at the original timestamps.

Returns:

Series: Surge (non-tidal residual) time series.

Source code in ioc_cleanup/_tools.py
def surge(ts: pd.Series, opts: T.Mapping[str, T.Any], rsmp: int | None) -> pd.Series:
    """
    Compute the non-tidal (surge) component of a sea-level time series.

    Tidal constituents are estimated using UTide and reconstructed at the
    original timestamps. The tidal signal is then subtracted from the
    observed series.

    Parameters:
        ts: Sea-level time series.
        opts: UTide solver options.
        rsmp: Optional resampling interval in minutes. If provided, the
            series is resampled before tidal analysis.

    Returns:
        Surge (non-tidal residual) time series.
    """
    ts0 = ts.copy()
    if rsmp is not None:
        ts = ts.resample(f"{rsmp}min").mean()
        ts = ts.shift(freq=f"{rsmp / 2}min")
    coef = utide.solve(ts.index, ts, **opts)
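    # honor the caller's verbosity setting, falling back to the module-level default
    tidal = utide.reconstruct(ts0.index, coef, verbose=opts.get("verbose", OPTS["verbose"]))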
    data = T.cast(np.ndarray, ts0.values - tidal.h)
    return pd.Series(data=data, index=ts0.index)
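The fit-then-subtract idea behind surge can be illustrated without UTide. The sketch below swaps in an ordinary least-squares fit of a mean plus a single M2 sinusoid (period about 12.42 h) with NumPy; UTide fits many constituents with nodal corrections, so this is only a conceptual stand-in. The synthetic series, the 0.2 m step used as a fake surge and the 30-minute rsmp value are invented. Like surge, it fits on bin means relabelled to bin centers and reconstructs the tide at the original timestamps.

```python
import numpy as np
import pandas as pd

M2_HOURS = 12.4206012  # period of the principal lunar semidiurnal constituent

# Synthetic 1-minute series over 10 days: an M2 tide plus a 0.2 m step as a fake surge.
index = pd.date_range("2020-01-01", periods=10 * 24 * 60, freq="min")
t0 = index[0]
hours = ((index - t0) / pd.Timedelta(hours=1)).to_numpy()
tide = np.cos(2 * np.pi * hours / M2_HOURS + 0.3)
fake_surge = np.where(hours > 120, 0.2, 0.0)
ts = pd.Series(tide + fake_surge, index=index)

# Optional resampling step, as in surge(): bin means relabelled to bin centers.
rsmp = 30
ts_fit = ts.resample(f"{rsmp}min").mean().shift(freq=pd.Timedelta(minutes=rsmp / 2))


def design(idx: pd.DatetimeIndex) -> np.ndarray:
    """Least-squares design matrix: mean, cos(M2) and sin(M2) terms."""
    h = ((idx - t0) / pd.Timedelta(hours=1)).to_numpy()
    w = 2 * np.pi * h / M2_HOURS
    return np.column_stack([np.ones_like(h), np.cos(w), np.sin(w)])


# Fit on the resampled series, reconstruct at the original timestamps, subtract.
coef, *_ = np.linalg.lstsq(design(ts_fit.index), ts_fit.to_numpy(), rcond=None)
tidal = design(ts.index) @ coef
residual = pd.Series(ts.to_numpy() - tidal, index=ts.index)

print(round(residual.std(), 3))
```

The residual keeps the step (shifted by the fitted mean) while the tidal oscillation is largely removed, which is the behavior surge aims for on real data.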

Station Metadata

Access to IOC station metadata and geographic information.

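ioc_cleanup.get_meta() (cached)

Retrieve IOC station metadata with geographic coordinates.

Metadata are collected from the IOC web service (via searvey.get_ioc_stations) and from the IOC API station list, then merged on ioc_code into a single GeoDataFrame. Coordinates are taken from the API list, and only stations present in both sources are kept. The result is cached with functools.cache, so every call returns the same object without new requests; modify a copy rather than the returned frame.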

Returns:

GeoDataFrame: GeoDataFrame containing IOC station codes, longitude, latitude, and geometry in EPSG:4326.

Source code in ioc_cleanup/_searvey.py
@functools.cache
def get_meta() -> gpd.GeoDataFrame:
    """
    Retrieve IOC station metadata with geographic coordinates.

    Metadata are collected from both the IOC web service and the IOC API
    and merged into a single GeoDataFrame.

    Returns:
        GeoDataFrame containing IOC station codes, longitude, latitude,
        and geometry in EPSG:4326.
    """
    meta_web = searvey.get_ioc_stations()
    meta_api = (
        pd.read_json("http://www.ioc-sealevelmonitoring.org/service.php?query=stationlist&showall=all")
        .drop_duplicates()
        .rename(columns={"Code": "ioc_code"})
    )

    merged = pd.merge(
        meta_web.drop(columns=["lon", "lat", "geometry"]),
        meta_api[["ioc_code", "Lon", "Lat"]].rename(columns={"Lon": "lon", "Lat": "lat"}).drop_duplicates(),
        on=["ioc_code"],
    )
    merged = T.cast(
        gpd.GeoDataFrame,
        merged.assign(geometry=gpd.points_from_xy(merged.lon, merged.lat, crs="EPSG:4326")),
    )
    return merged
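The merge in get_meta keeps the web-service attributes, takes coordinates from the API list, and relies on the default inner join of pd.merge, so codes missing from either source are dropped. The toy frames below are invented (codes, the location column and coordinates) and stand in for the two network sources; the geometry step is omitted so the sketch needs only pandas.

```python
import pandas as pd

# Invented stand-ins for searvey.get_ioc_stations() and the API station list.
meta_web = pd.DataFrame(
    {
        "ioc_code": ["aaaa", "bbbb", "cccc"],
        "location": ["Alpha", "Bravo", "Charlie"],
        "lon": [0.0, 0.0, 0.0],  # placeholder coordinates, discarded below
        "lat": [0.0, 0.0, 0.0],
    }
)
meta_api = (
    pd.DataFrame(
        {
            "Code": ["aaaa", "bbbb", "bbbb", "dddd"],  # duplicate row on purpose
            "Lon": [10.5, -3.2, -3.2, 140.1],
            "Lat": [45.0, 36.7, 36.7, -33.9],
        }
    )
    .drop_duplicates()
    .rename(columns={"Code": "ioc_code"})
)

merged = pd.merge(
    meta_web.drop(columns=["lon", "lat"]),
    meta_api[["ioc_code", "Lon", "Lat"]].rename(columns={"Lon": "lon", "Lat": "lat"}),
    on=["ioc_code"],  # inner join by default: only codes in both frames survive
)
print(merged)
```

Without drop_duplicates, the repeated "bbbb" row would appear twice in the merged result.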

Data Download

Helpers for downloading and storing IOC raw data.

ioc_cleanup.download_raw(ioc_codes, start, end)

Download raw IOC sea-level data for multiple stations.

Parameters:

ioc_codes (list[str], required): List of IOC station codes.
start (Timestamp, required): Start timestamp.
end (Timestamp, required): End timestamp.

Returns:

dict[str, DataFrame]: Dictionary mapping station codes to raw dataframes.

Source code in ioc_cleanup/_searvey.py
def download_raw(ioc_codes: list[str], start: pd.Timestamp, end: pd.Timestamp) -> dict[str, pd.DataFrame]:
    """
    Download raw IOC sea-level data for multiple stations.

    Parameters:
        ioc_codes: List of IOC station codes.
        start: Start timestamp.
        end: End timestamp.

    Returns:
        Dictionary mapping station codes to raw dataframes.
    """
    no_codes = len(ioc_codes)
    start_dates = pd.DatetimeIndex([start] * no_codes)
    end_dates = pd.DatetimeIndex([end] * no_codes)
    dataframes: dict[str, pd.DataFrame] = searvey._ioc_api._fetch_ioc(
        station_ids=ioc_codes,
        start_dates=start_dates,
        end_dates=end_dates,
        http_client=None,
        rate_limit=None,
        multiprocessing_executor=None,
        multithreading_executor=None,
        progress_bar=False,
    )
    return dataframes

ioc_cleanup.download_year_station(station, year, data_folder='./data')

Download and store one year of IOC data for a single station.

Data are saved as <data_folder>/<year>/<station>.parquet. Nothing is written if the download is empty, and any exception is logged instead of raised.

Parameters:

station (str, required): IOC station code.
year (int, required): Year to download.
data_folder (str, default './data'): Base directory for storing downloaded data.

Source code in ioc_cleanup/_searvey.py
def download_year_station(
    station: str,
    year: int,
    data_folder: str = "./data",
) -> None:
    """
    Download and store one year of IOC data for a single station.

    Data are saved as Parquet files under `<data_folder>/<year>/`.

    Parameters:
        station: IOC station code.
        year: Year to download.
        data_folder: Base directory for storing downloaded data.
    """
    data_folder = os.path.abspath(data_folder)
    year_folder = os.path.join(data_folder, str(year))
    os.makedirs(year_folder, exist_ok=True)
    try:
        start = pd.Timestamp(f"{year}-01-01")
        end = pd.Timestamp(f"{year}-12-31T23:59:59")
        dict_df = download_raw([station], start, end)
        df = dict_df[station]
        if not df.empty:
            df.to_parquet(f"{year_folder}/{station}.parquet")
            logger.info(f"  Saved {station} for {year}")
    except Exception as e:
        logger.error(f"Error for {station} in {year}: {e}")

Data Loading

Utilities for loading archived IOC data from disk.

ioc_cleanup.load_station(station, data_dir=Path('./data'), start_year=2011, end_year=2024)

Load multi-year IOC data for a station from local Parquet files.

Parameters:

station (str, required): IOC station code.
data_dir (Path, default Path('./data')): Base directory containing yearly Parquet files.
start_year (int, default 2011): First year to load (inclusive).
end_year (int, default 2024): Last year to load (exclusive).

Returns:

DataFrame: Concatenated DataFrame containing the available station data. Returns an empty DataFrame if no data are found.

Source code in ioc_cleanup/_searvey.py
def load_station(
    station: str,
    data_dir: Path = Path("./data"),
    start_year: int = 2011,
    end_year: int = 2024,
) -> pd.DataFrame:
    """
    Load multi-year IOC data for a station from local Parquet files.

    Parameters:
        station: IOC station code.
        data_dir: Base directory containing yearly Parquet files.
        start_year: First year to load (inclusive).
        end_year: Last year to load (exclusive).

    Returns:
        Concatenated DataFrame containing the available station data.
        Returns an empty DataFrame if no data are found.
    """
    dfs = []
    for year in range(start_year, end_year):
        path = data_dir / str(year) / f"{station}.parquet"
        if not os.path.exists(path):
            continue
        df = pd.read_parquet(path)
        if df.empty:
            continue
        dfs.append(df)

    if dfs:
        return pd.concat(dfs)
    else:
        logger.error(f"No data found for station {station}")
        return pd.DataFrame()
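load_station reads the layout written by download_year_station, <data_dir>/<year>/<station>.parquet, and loops over range(start_year, end_year), so end_year itself is never read: the defaults cover 2011 through 2023. The standard-library sketch below lists the candidate files in the same order; expected_paths is a hypothetical helper and "abcd" a placeholder station code.

```python
from pathlib import Path


def expected_paths(
    station: str,
    data_dir: Path = Path("./data"),
    start_year: int = 2011,
    end_year: int = 2024,
) -> list[Path]:
    """Hypothetical helper: the files load_station would try, in order."""
    # Same loop bounds as load_station: end_year is exclusive.
    return [data_dir / str(year) / f"{station}.parquet" for year in range(start_year, end_year)]


paths = expected_paths("abcd")
print(paths[0].as_posix(), paths[-1].as_posix())  # data/2011/abcd.parquet data/2023/abcd.parquet
```

To include data for 2024, pass end_year=2025.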

Models

Core data models used by the cleaning workflow.

ioc_cleanup.Transformation

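Bases: BaseModel

Cleaning rules for one station and sensor, as stored in a transformation JSON file. load_transformation and load_transformation_from_path return instances of this model; transform reads its start, end, dropped_date_ranges, dropped_timestamps and breakpoints fields.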

Source code in ioc_cleanup/_models.py
class Transformation(pydantic.BaseModel):
    ioc_code: str
    sensor: str
    notes: str = ""
    skip: bool = False
    wip: bool = False
    start: datetime.datetime
    end: datetime.datetime
    high: float | None = None
    low: float | None = None
    dropped_date_ranges: list[tuple[datetime.datetime, datetime.datetime]] = []
    dropped_timestamps: list[datetime.datetime] = []
    breakpoints: list[datetime.datetime] = []
    tsunami: list[tuple[datetime.datetime, datetime.datetime]] = []