to get this notebook working, install the dependencies:

poetry add ipykernel hvplot shapely searvey geoviews utide

let's also add ioc_cleanup, the package we use for cleaning tide gauge records:

poetry add git+https://github.com/seareport/ioc_cleanup.git#ioc_2024
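
if you are not using poetry, a plain pip install of the same branch should also work (pip accepts git URLs with a branch after "@"):

pip install "git+https://github.com/seareport/ioc_cleanup.git@ioc_2024"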
In [1]:
import searvey
import shapely
import utide
import pandas as pd
import hvplot.pandas

import ioc_cleanup as C

some helper functions:

In [ ]:
def data_availability(series: pd.Series, freq: str = "60min") -> float:
    """Return the fraction of non-empty bins once the series is resampled at `freq`."""
    resampled = series.resample(freq).mean()
    data_avail_ratio = 1 - resampled.isna().sum() / len(resampled)
    return float(data_avail_ratio)

# small function to detide a signal (using UTide: https://github.com/wesleybowman/UTide)
def surge(ts: pd.Series, lat: float, resample: int | None = None) -> pd.Series:
    ts0 = ts.copy()
    OPTS = {
        "constit": "auto",
        "method": "ols",
        "order_constit": "frequency",
        "Rayleigh_min": 0.97,
        "lat": lat,
        "verbose": True,
    }
    if resample is not None:
        # resample to speed up the harmonic analysis, shifting by half a bin
        # so the timestamps stay centred on their averaging window
        ts = ts.resample(f"{resample}min").mean()
        ts = ts.shift(freq=f"{resample / 2}min")
    # fit the tidal constituents on the (possibly resampled) series,
    # then reconstruct the tide on the original index and subtract it
    coef = utide.solve(ts.index, ts, **OPTS)
    tidal = utide.reconstruct(ts0.index, coef, verbose=OPTS["verbose"])
    return pd.Series(data=ts0.values - tidal.h, index=ts0.index)
In [20]:
# Albatross project sites
albat_dict = {
    "Site": [
        "Keta Basin, Ghana",
        "Kigamboni District Hub, Tanzania",
        "Morondava District Hub, Madagascar"
    ],
    "lat": [5.9000, -6.8500, -20.2833],
    "lon": [0.9833, 39.3000, 44.3167]
}

albatross_sites = pd.DataFrame(albat_dict)
albatross_sites
Out[20]:
                                 Site       lat      lon
0                   Keta Basin, Ghana    5.9000   0.9833
1    Kigamboni District Hub, Tanzania   -6.8500  39.3000
2  Morondava District Hub, Madagascar  -20.2833  44.3167

get the IOC stations around Africa

In [7]:
ioc_df = searvey.get_ioc_stations()
africa = shapely.box(-26, -35, 63, 38)  # lon/lat bounding box roughly covering Africa
ioc_africa = ioc_df[ioc_df.geometry.within(africa)]
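
as a quick check (not part of the original workflow), we can look up the IOC station closest to each Albatross site; this is only a rough sketch using plain shapely distances in degrees, not geodesic distances:

for _, site in albatross_sites.iterrows():
    pt = shapely.Point(site.lon, site.lat)
    dist = ioc_africa.geometry.distance(pt)
    closest = ioc_africa.loc[dist.idxmin()]
    print(f"{site.Site}: closest IOC station is '{closest.ioc_code}' (~{dist.min():.2f} deg away)")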

example for 'zanz', a station in Zanzibar:

In [5]:
station = "zanz"
sensor = "prs"

details about the station and its location, plotted together with the Albatross sites:

In [27]:
ioc_df[ioc_df.ioc_code == station]
plot = ioc_africa.hvplot(
    tiles=True,
    hover_cols=["ioc_code"],
    label="IOC stations",
) * ioc_df[ioc_df.ioc_code == station].hvplot(
    geo=True,
    hover_cols=["ioc_code"],
    c="r",
    label="zanz",
) * albatross_sites.hvplot.points(
    x="lon",
    y="lat",
    geo=True,
    hover_cols=["Site"],
    c="g",
    s=700,
    marker="*",
    label="albatross sites",
)
plot.opts(width=1000, height=1000)
Out[27]:
row 1653
ioc_code                           zanz
gloss_id                           297
country                            Tanzania
location                           Zanzibar
connection                         SXXX33
dcp_id                             1605D3E0
last_observation_level             -2.23
last_observation_time              06:58
delay                              20'
interval                           15'
...
observations_expected_per_month    43200.0
observations_ratio_per_month       99
observations_ratio_per_day         100
sample_interval                    1'
average_delay_per_day              5'
transmit_interval                  15'
lat                                -6.15
lon                                39.183
contacts                           Department of Land and Survey ( Tanzania ) +Un...
geometry                           POINT (39.183 -6.15)

1 rows × 25 columns

Out[27]:
(interactive map: IOC stations around Africa, the 'zanz' station in red, and the Albatross sites as green stars)

let's extract the data and check its availability (extracting 25 years should take around 3 min with an average internet connection)

In [8]:
raw = searvey.fetch_ioc_station(
    station, 
    "2000-01-01", 
    pd.Timestamp.now()
)
raw.describe()
Out[8]:
                prs           rad           enc             bat             sw1             sw2
count  8.784477e+06  2.967301e+06  1.561668e+06   426897.000000   427939.000000   427611.000000
mean   5.372223e+00  4.971022e+00  5.076518e+00        0.012746        0.030172        0.028526
std    3.862470e+02  5.506712e+02  1.590083e+00        0.006977        0.039193        0.029911
min   -6.228000e+00 -6.497000e+00 -8.893000e+00        0.000000       -0.008000        0.000000
25%    2.690000e+00  3.660000e+00  4.120000e+00        0.012400        0.000000        0.000000
50%    3.711000e+00  4.536000e+00  5.182000e+00        0.012600        0.029000        0.018000
75%    5.032000e+00  5.427000e+00  6.191000e+00        0.013000        0.060000        0.060000
max    5.945058e+05  7.049471e+05  6.349000e+01        4.253000        4.910000        4.230000
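
we can also quantify how complete the raw record is with the data_availability helper defined above (the exact ratio depends on when the data is fetched):

print(f"'{sensor}' availability at '{station}': {data_availability(raw[sensor]):.1%}")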
In [28]:
_lat = ioc_df[ioc_df.ioc_code == station].lat.values[0]  # station latitude, needed by utide
In [29]:
raw[sensor].loc["2014-10":].resample("1h").mean().hvplot()
Out[29]:

let's clean the data using ioc_cleanup; first, download the transformation file for this station and sensor

In [10]:
!mkdir -p transformations
import requests

# fetch the ready-made cleaning transformation for this station/sensor from the ioc_cleanup repo
url = f"https://raw.githubusercontent.com/seareport/ioc_cleanup/refs/heads/ioc_2024/transformations/{station}_{sensor}.json"
open(f"transformations/{station}_{sensor}.json", "wb").write(requests.get(url).content)
Out[10]:
55773
In [11]:
! head -20 "transformations/{station}_{sensor}.json" 
{
  "ioc_code": "zanz",
  "sensor": "prs",
  "notes": "Various small spikes, but we need it for coverage",
  "skip": false,
  "wip": false,
  "start": "2022-01-01T00:00:00",
  "end": "2025-01-01T00:00:00",
  "high": null,
  "low": null,
  "dropped_date_ranges": [
    [
      "2022-03-27T00:41:00",
      "2022-03-27T01:59:00"
    ],
    [
      "2022-07-31T22:20:00",
      "2022-07-31T22:29:00"
    ],
    [

let's clean the signal using the transformation

In [30]:
trans = C.load_transformation(station, sensor)
ts = C.transform(raw, trans)[sensor]
ts.resample("1h").mean().hvplot(title=f"water level signal in '{station}'")
Out[30]:

let's detide the signal to isolate storm surges

In [32]:
detided = surge(ts, lat=_lat, resample=2)
solve: matrix prep ... solution ... done.
prep/calcs ... done.
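
if you also want to inspect the tidal component itself (the surge function above only returns the residual), a minimal sketch is to re-run the fit and reconstruction directly; this repeats work surge already does and can be slow on the full 1-minute record, so resample first if needed:

coef = utide.solve(ts.index, ts, constit="auto", method="ols", lat=_lat, verbose=False)
tide = utide.reconstruct(ts.index, coef, verbose=False)
tidal = pd.Series(tide.h, index=ts.index)
ts.rename("observed").to_frame().assign(tide=tidal).resample("1h").mean().hvplot()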

visualise the detided (surge) signal

In [34]:
detided.resample("1h").mean().dropna().hvplot().opts(width=1300, height = 400, title=f"surge level signal in '{station}'")
Out[34]:

not really interesting: no particular events happened in 2022-2024, although we can see a small surge contribution in September 2024

Unfortunately, we only cleaned the record from 2022 to 2025, for our model validation purposes.

However, you can still change the start date back to 2000 to get more data.

let's have a look at the longer time series and modify the transformation accordingly

In [36]:
trans.start 
trans.start = pd.Timestamp(2014,10,10).to_pydatetime()
trans.dropped_date_ranges.append([pd.Timestamp(2020,4,14).to_pydatetime(), pd.Timestamp(2021,10,27).to_pydatetime()])
Out[36]:
datetime.datetime(2022, 1, 1, 0, 0)
In [37]:
ts = C.transform(raw, trans)[sensor]
ts.resample("1h").mean().hvplot(title=f"water level signal in '{station}'")
Out[37]:

detide the last 10 years of data (this might take some time and processing power)

In [38]:
detided = surge(ts, lat=_lat, resample=20)
solve: matrix prep ... solution ... done.
prep/calcs ... done.
In [53]:
YEAR = 2014
detided.loc[f"{YEAR}":f"{YEAR+1}"].resample("1h").mean().hvplot().opts(width=1300, height = 400, title=f"surge level signal in '{station}'")
Out[53]:
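
as a possible next step (not covered in this notebook), one can scan the detided series for the largest surge value per year to spot candidate storm events:

hourly = detided.resample("1h").mean().dropna()
annual_max = hourly.groupby(hourly.index.year).max()
annual_max.sort_values(ascending=False)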