To get this notebook working, install the dependencies:
poetry add ipykernel hvplot shapely searvey geoviews utide
Let's also add the package we use for cleaning tide gauges ([ioc_cleanup](https://github.com/seareport/ioc_cleanup)):
poetry add git+https://github.com/seareport/ioc_cleanup.git#ioc_2024
import searvey
import shapely
import utide
import pandas as pd
import hvplot.pandas
import ioc_cleanup as C
Some helper functions:
def data_availability(series: pd.Series, freq: str = "60min") -> float:
    """Return the fraction of non-missing values after resampling to `freq`."""
    resampled = series.resample(freq).mean()
    data_avail_ratio = 1 - resampled.isna().sum() / len(resampled)
    return float(data_avail_ratio)
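As a quick sanity check, here is how `data_availability` behaves on a synthetic hourly series with a known gap (the series and the gap are invented for illustration):

```python
import pandas as pd

def data_availability(series: pd.Series, freq: str = "60min") -> float:
    resampled = series.resample(freq).mean()
    return float(1 - resampled.isna().sum() / len(resampled))

# 100 hourly values, 25 of them missing -> 75% availability
index = pd.date_range("2024-01-01", periods=100, freq="60min")
values = pd.Series(range(100), index=index, dtype="float64")
values.iloc[10:35] = float("nan")

print(data_availability(values))  # 0.75
```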
# Small function to detide a signal (using UTide: https://github.com/wesleybowman/UTide)
def surge(ts: pd.Series, lat: float, resample: int | None = None) -> pd.Series:
    ts0 = ts.copy()
    OPTS = {
        "constit": "auto",
        "method": "ols",
        "order_constit": "frequency",
        "Rayleigh_min": 0.97,
        "lat": lat,
        "verbose": True,
    }
    if resample is not None:
        ts = ts.resample(f"{resample}min").mean()
        # shift by half the resampling interval to re-centre each bin
        ts = ts.shift(freq=f"{resample / 2}min")
    coef = utide.solve(ts.index, ts, **OPTS)
    tidal = utide.reconstruct(ts0.index, coef, verbose=OPTS["verbose"])
    # surge = observed signal minus the reconstructed tide
    return pd.Series(data=ts0.values - tidal.h, index=ts0.index)
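A word on the `resample`/`shift` pair: `resample(...).mean()` labels each bin by its left edge, so shifting by half the interval re-centres the averaged value on the middle of the bin before the harmonic fit. A minimal sketch with a synthetic 1-minute series (values are arbitrary):

```python
import pandas as pd

# synthetic 1-minute series
idx = pd.date_range("2024-01-01 00:00", periods=6, freq="1min")
ts = pd.Series([0.0, 1.0, 2.0, 3.0, 4.0, 5.0], index=idx)

resample = 2  # minutes, as in surge(..., resample=2)
binned = ts.resample(f"{resample}min").mean()      # labelled at the bin start
centred = binned.shift(freq=f"{resample / 2}min")  # labelled at the bin centre

print(binned.index[0])   # 2024-01-01 00:00:00
print(centred.index[0])  # 2024-01-01 00:01:00
```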
# Albatross project sites
albat_dict = {
"Site": [
"Keta Basin, Ghana",
"Kigamboni District Hub, Tanzania",
"Morondava District Hub, Madagascar"
],
"lat": [5.9000, -6.8500, -20.2833],
"lon": [0.9833, 39.3000, 44.3167]
}
albatross_sites = pd.DataFrame(albat_dict)
albatross_sites
| | Site | lat | lon |
|---|---|---|---|
| 0 | Keta Basin, Ghana | 5.9000 | 0.9833 |
| 1 | Kigamboni District Hub, Tanzania | -6.8500 | 39.3000 |
| 2 | Morondava District Hub, Madagascar | -20.2833 | 44.3167 |
Get the IOC stations around Africa:
ioc_df = searvey.get_ioc_stations()
africa = shapely.box(-26, -35, 63, 38)
ioc_africa = ioc_df[ioc_df.geometry.within(africa)]
Example for `zanz`, a station in Zanzibar:
station = "zanz"
sensor = "prs"
Details about the station and its location:
ioc_df[ioc_df.ioc_code==station]
plot = ioc_africa.hvplot(
    tiles=True,
    hover_cols=["ioc_code"],
    label="IOC stations",
) * ioc_df[ioc_df.ioc_code == station].hvplot(
    geo=True,
    hover_cols=["ioc_code"],
    c="r",
    label="zanz",
) * albatross_sites.hvplot.points(
    x="lon",
    y="lat",
    geo=True,
    hover_cols=["Site"],
    c="g",
    s=700,
    marker="*",
    label="albatross sites",
)
plot.opts(width=1000, height=1000)
ioc_code | gloss_id | country | location | connection | dcp_id | last_observation_level | last_observation_time | delay | interval | ... | observations_expected_per_month | observations_ratio_per_month | observations_ratio_per_day | sample_interval | average_delay_per_day | transmit_interval | lat | lon | contacts | geometry | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1653 | zanz | 297 | Tanzania | Zanzibar | SXXX33 | 1605D3E0 | -2.23 | 06:58 | 20' | 15' | ... | 43200.0 | 99 | 100 | 1' | 5' | 15' | -6.15 | 39.183 | Department of Land and Survey ( Tanzania ) +Un... | POINT (39.183 -6.15) |
1 rows × 25 columns
Let's extract the data and check its availability (extracting 25 years of data should take around 3 minutes with an average internet connection):
raw = searvey.fetch_ioc_station(
station,
"2000-01-01",
pd.Timestamp.now()
)
raw.describe()
prs | rad | enc | bat | sw1 | sw2 | |
---|---|---|---|---|---|---|
count | 8.784477e+06 | 2.967301e+06 | 1.561668e+06 | 426897.000000 | 427939.000000 | 427611.000000 |
mean | 5.372223e+00 | 4.971022e+00 | 5.076518e+00 | 0.012746 | 0.030172 | 0.028526 |
std | 3.862470e+02 | 5.506712e+02 | 1.590083e+00 | 0.006977 | 0.039193 | 0.029911 |
min | -6.228000e+00 | -6.497000e+00 | -8.893000e+00 | 0.000000 | -0.008000 | 0.000000 |
25% | 2.690000e+00 | 3.660000e+00 | 4.120000e+00 | 0.012400 | 0.000000 | 0.000000 |
50% | 3.711000e+00 | 4.536000e+00 | 5.182000e+00 | 0.012600 | 0.029000 | 0.018000 |
75% | 5.032000e+00 | 5.427000e+00 | 6.191000e+00 | 0.013000 | 0.060000 | 0.060000 |
max | 5.945058e+05 | 7.049471e+05 | 6.349000e+01 | 4.253000 | 4.910000 | 4.230000 |
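The `describe()` output already hints at why cleaning is needed: the `prs` maximum is ~5.9e5 m, clearly an instrument spike rather than sea level. As a purely illustrative first pass (this is *not* what `ioc_cleanup` does internally, and the bounds and values below are invented), implausible values could be masked like this:

```python
import pandas as pd

# toy water-level series with two obvious spikes
idx = pd.date_range("2024-01-01", periods=8, freq="1min")
level = pd.Series([3.1, 3.2, 594505.8, 3.3, -6.2, 3.4, 3.2, 3.3], index=idx)

# keep only physically plausible values (bounds are an assumption)
plausible = level.where(level.between(-5.0, 10.0))

print(plausible.max())         # 3.4
print(plausible.isna().sum())  # 2 spikes masked
```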
_lat = ioc_df[ioc_df.ioc_code == station].lat.values[0]
raw[sensor].loc["2014-10":].resample("1h").mean().hvplot()
Let's clean the data using `ioc_cleanup`:
!mkdir -p transformations
import requests

open(f"transformations/{station}_{sensor}.json", "wb").write(
    requests.get(
        f"https://raw.githubusercontent.com/seareport/ioc_cleanup/refs/heads/ioc_2024/transformations/{station}_{sensor}.json"
    ).content
)
55773
! head -20 "transformations/{station}_{sensor}.json"
{
  "ioc_code": "zanz",
  "sensor": "prs",
  "notes": "Various small spikes, but we need it for coverage",
  "skip": false,
  "wip": false,
  "start": "2022-01-01T00:00:00",
  "end": "2025-01-01T00:00:00",
  "high": null,
  "low": null,
  "dropped_date_ranges": [
    [
      "2022-03-27T00:41:00",
      "2022-03-27T01:59:00"
    ],
    [
      "2022-07-31T22:20:00",
      "2022-07-31T22:29:00"
    ],
    [
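Conceptually, the transformation restricts the series to the `[start, end]` window and masks out each of the `dropped_date_ranges`. A minimal pandas sketch of that idea (a simplification, not `ioc_cleanup`'s actual implementation; the `apply_transformation` helper and the toy series are invented for illustration):

```python
import pandas as pd

def apply_transformation(series: pd.Series, start, end, dropped_date_ranges) -> pd.Series:
    # restrict to the curated window
    out = series.loc[pd.Timestamp(start):pd.Timestamp(end)]
    # drop every flagged bad range
    for t0, t1 in dropped_date_ranges:
        out = out.drop(out.loc[pd.Timestamp(t0):pd.Timestamp(t1)].index)
    return out

# toy minute series spanning one of the dropped ranges above
idx = pd.date_range("2022-03-27 00:30", periods=120, freq="1min")
series = pd.Series(1.0, index=idx)

clean = apply_transformation(
    series,
    start="2022-01-01T00:00:00",
    end="2025-01-01T00:00:00",
    dropped_date_ranges=[["2022-03-27T00:41:00", "2022-03-27T01:59:00"]],
)
print(len(series) - len(clean))  # 79 minutes dropped
```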
Let's clean the signal by applying the transformation:
trans = C.load_transformation(station, sensor)
ts = C.transform(raw, trans)[sensor]
ts.resample("1h").mean().hvplot(title=f"water level signal in '{station}'")
Let's detide the signal to isolate the storm surges:
detided = surge(ts, lat = _lat, resample=2)
solve: matrix prep ... solution ... done. prep/calcs ... done.
Visualise the detided signal:
detided.resample("1h").mean().dropna().hvplot().opts(width=1300, height = 400, title=f"surge level signal in '{station}'")
Not really interesting: no particular events happened in 2022-2024, although we can see a small surge contribution in September 2024.
Unfortunately we only cleaned the data from 2022 to 2025, for our model validation purposes.
However, you can still change the start date to 2000 to get more data.
Let's have a look at the bigger time series and modify the transformation accordingly:
trans.start
datetime.datetime(2022, 1, 1, 0, 0)
trans.start = pd.Timestamp(2014, 10, 10).to_pydatetime()
trans.dropped_date_ranges.append([pd.Timestamp(2020, 4, 14).to_pydatetime(), pd.Timestamp(2021, 10, 27).to_pydatetime()])
ts = C.transform(raw, trans)[sensor]
ts.resample("1h").mean().hvplot(title=f"water level signal in '{station}'")
Detide the last 10 years (this might take some time and processing power):
detided = surge(ts, lat = _lat, resample=20)
solve: matrix prep ... solution ... done. prep/calcs ... done.
YEAR = 2014
detided.loc[f"{YEAR}":f"{YEAR+1}"].resample("1h").mean().hvplot().opts(width=1300, height = 400, title=f"surge level signal in '{station}'")