Files
data-centers/historical_climate_data_centers.ipynb

935 lines
42 KiB
Plaintext
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
{
"cells": [
{
"cell_type": "markdown",
"id": "0",
"metadata": {},
"source": [
"# Historical Climate for Master Data Centers (Daymet + gridMET)\n",
"\n",
"This notebook builds or refreshes `public.data_center_historical_climate`, a one-row-per-data-center climate summary keyed by `master_id`.\n",
"\n",
"**Source switch on 2026-05-20.** Previously this notebook used Open-Meteo's archive API, but a 30-year × 9-variable request consumes ~7,000 of the free 10,000 daily call budget, limiting the run to ~1 site/day. Two free, generous federal sources replace it:\n",
"\n",
"- **Daymet v4 R1** (ORNL DAAC): daily 1 km gridded `tmin`, `tmax`, `prcp`, `vp` (vapor pressure) for North America, 1980present. Pulled via the single-pixel REST API.\n",
"- **gridMET / METDATA** (U. Idaho): daily 4 km gridded daily-mean wind speed (`vs`) for CONUS, 1979present. Pulled via THREDDS NetCDF Subset Service as CSV (no `xarray` dependency).\n",
"\n",
"Climate normal period: **19912020** (standard WMO baseline).\n",
"\n",
"| Metric | Field(s) | Source / derivation |\n",
"| --- | --- | --- |\n",
"| Mean annual / summer temperature | `mean_annual_temperature_c`, `mean_summer_temperature_c` | Daymet `(tmin + tmax) / 2` |\n",
"| Max / min daily temperature | `max_daily_temperature_c`, `min_daily_temperature_c` | Daymet `tmax`, `tmin` |\n",
"| Diurnal temperature range | `mean_diurnal_temperature_range_c` | Daymet `tmax tmin` |\n",
"| Relative humidity | `mean_relative_humidity_pct` | Derived `100 × vp / e_sat(T)` via Magnus formula |\n",
"| Wet bulb temperature | `mean_wet_bulb_temperature_c`, `max_wet_bulb_temperature_c`, `extreme_wet_bulb_days` | Stull (2011) approximation from T + derived RH |\n",
"| Cooling degree days | `cooling_degree_days_c`, `annual_cooling_degree_days_c_mean` | Σ max(0, T_mean 18.3 °C) |\n",
"| Extreme heat days | `extreme_heat_days`, `annual_extreme_heat_days_mean` | Count days with `tmax ≥ 35 °C` |\n",
"| Precipitation | `precipitation_total_mm`, `annual_precipitation_mm_mean`, `annual_precipitation_cv`, `wet_day_precipitation_p95_mm` | Daymet `prcp` aggregated |\n",
"| Wind | `mean_wind_speed_ms`, `max_daily_mean_wind_speed_ms`, `sustained_wind_days`, `annual_sustained_wind_days_mean` | gridMET `vs` (daily-mean wind). **Sustained-wind day = daily mean ≥ 8 m/s** — this is *not* the same as the old Open-Meteo gust threshold; see notes at end. |\n",
"| Drought severity, wildfire smoke | `*_source_status` columns | External sources needed (USDM, gridMET PDSI, NOAA HMS) |\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1",
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"import math\n",
"import os\n",
"import time\n",
"import urllib.error\n",
"import urllib.parse\n",
"import urllib.request\n",
"from io import StringIO\n",
"from pathlib import Path\n",
"\n",
"import numpy as np\n",
"import pandas as pd\n",
"import psycopg2\n",
"from psycopg2 import sql\n",
"from psycopg2.extras import execute_values\n",
"\n",
"pd.set_option('display.max_columns', 100)\n",
"pd.set_option('display.max_rows', 120)\n",
"\n",
"print('pandas:', pd.__version__)\n",
"print('numpy:', np.__version__)\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2",
"metadata": {},
"outputs": [],
"source": [
"def load_env_file(env_path: str = '.env') -> None:\n",
" p = Path(env_path)\n",
" if not p.exists():\n",
" print(f'No {env_path} file found in {Path.cwd()}')\n",
" return\n",
" loaded = 0\n",
" for raw_line in p.read_text(encoding='utf-8').splitlines():\n",
" line = raw_line.strip()\n",
" if not line or line.startswith('#') or '=' not in line:\n",
" continue\n",
" key, value = line.split('=', 1)\n",
" key = key.strip()\n",
" value = value.strip().strip('\"').strip(\"'\")\n",
" if key and key not in os.environ:\n",
" os.environ[key] = value\n",
" loaded += 1\n",
" print(f'Loaded {loaded} env var(s) from {env_path}')\n",
"\n",
"\n",
"def require_env(keys):\n",
" missing = [k for k in keys if not os.getenv(k)]\n",
" if missing:\n",
" raise EnvironmentError(\n",
" 'Missing required env vars in notebook kernel: ' + ', '.join(missing) +\n",
" '.\\nSet them in this notebook, or add them to a .env file in this folder.'\n",
" )\n",
"\n",
"\n",
"load_env_file('.env')\n",
"required_keys = ['PGWEB_HOST', 'PGWEB_PORT', 'PGWEB_USER', 'PGWEB_PASSWORD']\n",
"require_env(required_keys)\n",
"\n",
"DB_NAME = os.getenv('PGDATABASE', 'data_centers')\n",
"MASTER_TABLE = 'public.master_data_centers'\n",
"CLIMATE_TABLE = 'public.data_center_historical_climate'\n",
"\n",
"\n",
"def get_conn():\n",
" return psycopg2.connect(\n",
" host=os.environ['PGWEB_HOST'],\n",
" port=os.environ['PGWEB_PORT'],\n",
" user=os.environ['PGWEB_USER'],\n",
" password=os.environ['PGWEB_PASSWORD'],\n",
" dbname='data_centers',\n",
" )\n",
"\n",
"\n",
"with get_conn() as conn:\n",
" with conn.cursor() as cur:\n",
" cur.execute('select current_database(), current_user')\n",
" db, usr = cur.fetchone()\n",
" print('Connected to DB:', db)\n",
" print('As user:', usr)\n",
" cur.execute('create extension if not exists postgis')\n",
" cur.execute('select to_regclass(%s)', (MASTER_TABLE,))\n",
" if cur.fetchone()[0] is None:\n",
" raise RuntimeError(f'{MASTER_TABLE} does not exist. Run build_master_data_centers.py first.')\n",
" cur.execute(sql.SQL('select count(*) from {}').format(sql.SQL(MASTER_TABLE)))\n",
" print(f'{MASTER_TABLE} rows:', f'{cur.fetchone()[0]:,}')\n"
]
},
{
"cell_type": "markdown",
"id": "3",
"metadata": {},
"source": [
"## Parameters\n",
"\n",
"Climate normal period defaults to **19912020**, matching the current WMO baseline. Both Daymet and gridMET cover this range fully.\n",
"\n",
"- `REFRESH_EXISTING=True` recomputes existing rows (e.g. after a threshold change).\n",
"- `INCLUDE_WIND=False` skips the per-site gridMET pass; wind columns stay null and `wind_status='pending_gridmet'`. Useful for a fast first pass since Daymet alone is one HTTP call per site, while gridMET is ~30 (one per year).\n",
"- `MAX_POINTS=3` for testing, `None` for all eligible sites.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4",
"metadata": {},
"outputs": [],
"source": [
"# ---- Daymet (T, precip, vapor pressure) ----\n",
"DAYMET_API_URL = 'https://daymet.ornl.gov/single-pixel/api/data'\n",
"DAYMET_VARIABLES = ['tmin', 'tmax', 'prcp', 'vp']\n",
"DAYMET_DATASET_VERSION = 'daymet_v4r1'\n",
"\n",
"# ---- gridMET (daily-mean wind) ----\n",
"GRIDMET_NCSS_URL_TEMPLATE = 'https://thredds.northwestknowledge.net/thredds/ncss/MET/vs/vs_{year}.nc'\n",
"GRIDMET_VARIABLE = 'wind_speed'\n",
"GRIDMET_SCALE_FACTOR = 0.1 # NCSS CSV returns packed short ints; multiply to get m/s\n",
"GRIDMET_DATASET_VERSION = 'gridmet_vs'\n",
"\n",
"# ---- Period ----\n",
"START_DATE = '1991-01-01'\n",
"END_DATE = '2020-12-31'\n",
"START_YEAR = int(START_DATE[:4])\n",
"END_YEAR = int(END_DATE[:4])\n",
"\n",
"# ---- Run control ----\n",
"REFRESH_EXISTING = False\n",
"INCLUDE_WIND = True\n",
"DRY_RUN = False\n",
"MAX_POINTS = None # small int for testing; None for all eligible\n",
"REQUEST_SLEEP_SECONDS = 0.5 # politeness pause between sites\n",
"REQUEST_TIMEOUT_SECONDS = 90\n",
"MAX_RETRIES = 5\n",
"\n",
"# ---- Metric thresholds ----\n",
"CDD_BASE_C = 18.3\n",
"EXTREME_HEAT_THRESHOLD_C = 35.0\n",
"EXTREME_WET_BULB_THRESHOLD_C = 28.0\n",
"SUSTAINED_WIND_THRESHOLD_MS = 8.0 # ~18 mph daily-mean wind. NOT a gust threshold.\n",
"WET_DAY_THRESHOLD_MM = 1.0\n",
"\n",
"print(f'Period: {START_DATE} to {END_DATE}')\n",
"print(f'Target table: {CLIMATE_TABLE}')\n",
"print(f'Daymet vars: {DAYMET_VARIABLES}')\n",
"print(f'Include wind: {INCLUDE_WIND} (gridMET sustained-wind threshold = {SUSTAINED_WIND_THRESHOLD_MS} m/s daily mean)')\n",
"print(f'Refresh: {REFRESH_EXISTING}')\n",
"print(f'Max sites: {MAX_POINTS}')\n"
]
},
{
"cell_type": "markdown",
"id": "5",
"metadata": {},
"source": [
"## Create Target Table"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6",
"metadata": {},
"outputs": [],
"source": [
"CREATE_CLIMATE_TABLE_SQL = f'''\n",
"create table if not exists {CLIMATE_TABLE} (\n",
" master_id text primary key references public.master_data_centers(master_id) on delete cascade,\n",
" source text,\n",
" name text,\n",
" operator text,\n",
" city text,\n",
" state text,\n",
" country text,\n",
" longitude double precision not null,\n",
" latitude double precision not null,\n",
" geom geometry(Point, 4326),\n",
"\n",
" daymet_dataset_version text not null,\n",
" gridmet_dataset_version text,\n",
" climate_period_start date not null,\n",
" climate_period_end date not null,\n",
" daymet_tile integer,\n",
" daymet_elevation_m double precision,\n",
" daymet_grid_x_m double precision,\n",
" daymet_grid_y_m double precision,\n",
" gridmet_grid_latitude double precision,\n",
" gridmet_grid_longitude double precision,\n",
" observation_days integer not null,\n",
" observation_years integer not null,\n",
"\n",
" -- temperature\n",
" mean_annual_temperature_c double precision,\n",
" mean_summer_temperature_c double precision,\n",
" max_daily_temperature_c double precision,\n",
" min_daily_temperature_c double precision,\n",
" mean_diurnal_temperature_range_c double precision,\n",
"\n",
" -- humidity / wet-bulb (derived from Daymet T + vp)\n",
" mean_relative_humidity_pct double precision,\n",
" mean_wet_bulb_temperature_c double precision,\n",
" max_wet_bulb_temperature_c double precision,\n",
" extreme_wet_bulb_days integer,\n",
"\n",
" -- cooling load / heat extremes\n",
" cooling_degree_days_c double precision,\n",
" annual_cooling_degree_days_c_mean double precision,\n",
" extreme_heat_days integer,\n",
" annual_extreme_heat_days_mean double precision,\n",
"\n",
" -- precipitation\n",
" precipitation_total_mm double precision,\n",
" annual_precipitation_mm_mean double precision,\n",
" annual_precipitation_cv double precision,\n",
" wet_day_precipitation_p95_mm double precision,\n",
"\n",
" -- wind (gridMET daily-mean, NOT gust)\n",
" mean_wind_speed_ms double precision,\n",
" max_daily_mean_wind_speed_ms double precision,\n",
" sustained_wind_days integer,\n",
" annual_sustained_wind_days_mean double precision,\n",
" wind_status text not null default 'pending_gridmet',\n",
"\n",
" -- external-source placeholders\n",
" drought_severity_index_source_status text not null default 'needs_external_source',\n",
" wildfire_smoke_exposure_source_status text not null default 'needs_external_source',\n",
"\n",
" -- provenance\n",
" daymet_variables text[] not null,\n",
" daymet_url text,\n",
" fetched_at timestamptz not null default now(),\n",
" updated_at timestamptz not null default now()\n",
");\n",
"\n",
"create index if not exists data_center_hist_climate_geom_gix\n",
" on {CLIMATE_TABLE} using gist (geom);\n",
"create index if not exists data_center_hist_climate_state_idx\n",
" on {CLIMATE_TABLE} (state);\n",
"create index if not exists data_center_hist_climate_period_idx\n",
" on {CLIMATE_TABLE} (climate_period_start, climate_period_end);\n",
"'''\n",
"\n",
"with get_conn() as conn:\n",
" with conn.cursor() as cur:\n",
" cur.execute(CREATE_CLIMATE_TABLE_SQL)\n",
" cur.execute(sql.SQL('select count(*) from {}').format(sql.SQL(CLIMATE_TABLE)))\n",
" existing = cur.fetchone()[0]\n",
" print(f'{CLIMATE_TABLE} currently has {existing:,} row(s)')\n"
]
},
{
"cell_type": "markdown",
"id": "7",
"metadata": {},
"source": [
"## Load Data Centers Needing Climate Metrics"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8",
"metadata": {},
"outputs": [],
"source": [
"def load_target_points(refresh_existing=False, max_points=None):\n",
" exists_filter = ''\n",
" params = []\n",
" if not refresh_existing:\n",
" exists_filter = f'''\n",
" and not exists (\n",
" select 1\n",
" from {CLIMATE_TABLE} c\n",
" where c.master_id = m.master_id\n",
" and c.daymet_dataset_version = %s\n",
" and c.climate_period_start = %s::date\n",
" and c.climate_period_end = %s::date\n",
" )\n",
" '''\n",
" params = [DAYMET_DATASET_VERSION, START_DATE, END_DATE]\n",
"\n",
" limit_sql = ''\n",
" if max_points is not None:\n",
" limit_sql = 'limit %s'\n",
" params.append(int(max_points))\n",
"\n",
" query = f'''\n",
" select\n",
" m.master_id, m.source, m.name, m.operator,\n",
" m.city, m.state, m.country,\n",
" m.longitude, m.latitude\n",
" from {MASTER_TABLE} m\n",
" where m.longitude is not null\n",
" and m.latitude is not null\n",
" and m.geom is not null\n",
" -- Daymet covers North America bounding box\n",
" and m.longitude between -178 and -52\n",
" and m.latitude between 14 and 83\n",
" {exists_filter}\n",
" order by m.master_id\n",
" {limit_sql}\n",
" '''\n",
" with get_conn() as conn:\n",
" return pd.read_sql_query(query, conn, params=params)\n",
"\n",
"\n",
"targets = load_target_points(refresh_existing=REFRESH_EXISTING, max_points=MAX_POINTS)\n",
"print(f'Target data centers to fetch: {len(targets):,}')\n",
"display(targets.head())\n"
]
},
{
"cell_type": "markdown",
"id": "9",
"metadata": {},
"source": [
"## Fetchers and derived-metric helpers\n",
"\n",
"`fetch_daymet` issues one single-pixel REST call for the full 30-year period. `fetch_gridmet_wind` issues one NCSS CSV call per year (Daymet has no equivalent multi-year aggregated endpoint we can rely on; per-year fetches keep each response small).\n",
"\n",
"Wet bulb is derived in two steps:\n",
"1. **RH from Daymet vp** using the Magnus saturation-vapor-pressure formula.\n",
"2. **Wet bulb from T + RH** using the Stull (2011) approximation (valid 5 % ≤ RH ≤ 99 %, 20 °C ≤ T ≤ +50 °C).\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "10",
"metadata": {},
"outputs": [],
"source": [
"def http_get(url, max_retries=MAX_RETRIES, timeout=REQUEST_TIMEOUT_SECONDS):\n",
" last_error = None\n",
" for attempt in range(1, max_retries + 1):\n",
" try:\n",
" req = urllib.request.Request(url, headers={'User-Agent': 'data-center-climate-notebook/1.1'})\n",
" with urllib.request.urlopen(req, timeout=timeout) as resp:\n",
" return resp.read().decode('utf-8')\n",
" except urllib.error.HTTPError as exc:\n",
" last_error = exc\n",
" try:\n",
" body = exc.read().decode('utf-8', errors='replace')[:200]\n",
" except Exception:\n",
" body = ''\n",
" if exc.code in (400, 404):\n",
" raise RuntimeError(f'HTTP {exc.code} on {url}: {body or exc.reason}')\n",
" sleep_for = min(2 ** attempt, 60)\n",
" print(f' HTTP {exc.code} attempt {attempt}/{max_retries}: {body}; sleeping {sleep_for}s')\n",
" time.sleep(sleep_for)\n",
" except (urllib.error.URLError, TimeoutError) as exc:\n",
" last_error = exc\n",
" sleep_for = min(2 ** attempt, 60)\n",
" print(f' Network error attempt {attempt}/{max_retries}: {exc}; sleeping {sleep_for}s')\n",
" time.sleep(sleep_for)\n",
" raise RuntimeError(f'Request failed after {max_retries} attempt(s) for {url}: {last_error}')\n",
"\n",
"\n",
"def fetch_daymet(lat, lon):\n",
" \"\"\"Return (header_dict, dataframe) for one point covering START_DATE..END_DATE.\"\"\"\n",
" params = {\n",
" 'lat': f'{float(lat):.6f}',\n",
" 'lon': f'{float(lon):.6f}',\n",
" 'vars': ','.join(DAYMET_VARIABLES),\n",
" 'start': START_DATE,\n",
" 'end': END_DATE,\n",
" 'format': 'csv',\n",
" }\n",
" url = DAYMET_API_URL + '?' + urllib.parse.urlencode(params)\n",
" text = http_get(url)\n",
" # Daymet CSV preamble: Latitude / Longitude / X & Y / Tile / Elevation / \"All years; ...\" / cite line\n",
" lines = text.splitlines()\n",
" header = {'url': url}\n",
" data_start = None\n",
" for i, line in enumerate(lines):\n",
" if line.strip().startswith('year,'):\n",
" data_start = i\n",
" break\n",
" if ':' in line:\n",
" k, v = line.split(':', 1)\n",
" header[k.strip()] = v.strip()\n",
" if data_start is None:\n",
" raise RuntimeError(f'Daymet CSV missing column header. First 500 chars: {text[:500]}')\n",
" df = pd.read_csv(StringIO('\\n'.join(lines[data_start:])))\n",
" # Normalize column names: 'prcp (mm/day)' -> 'prcp', etc.\n",
" df.columns = [c.split(' ')[0] for c in df.columns]\n",
" # Daymet uses 365-day calendar (Dec 31 dropped on leap years), yday is 1..365\n",
" df['date'] = pd.to_datetime(df['year'].astype(str), format='%Y') + pd.to_timedelta(df['yday'] - 1, unit='D')\n",
" return header, df\n",
"\n",
"\n",
"# gridMET 4 km grid (from THREDDS dataset.xml): lat origin 49.4°N decreasing,\n",
"# lon origin -124.7667°E increasing, step 0.04166667°. Two sites in the same cell\n",
"# yield the same (row, col) and share one set of 30 yearly NCSS fetches.\n",
"GRIDMET_LAT_ORIGIN = 49.400000000000006\n",
"GRIDMET_LAT_STEP = -0.041666666666671404\n",
"GRIDMET_LON_ORIGIN = -124.76666663333334\n",
"GRIDMET_LON_STEP = 0.041666666666671404\n",
"\n",
"GRIDMET_WIND_CACHE = {}\n",
"_GRIDMET_CACHE_STATS = {'hits': 0, 'misses': 0}\n",
"\n",
"\n",
"def gridmet_cell_key(lat, lon):\n",
" row = int(round((float(lat) - GRIDMET_LAT_ORIGIN) / GRIDMET_LAT_STEP))\n",
" col = int(round((float(lon) - GRIDMET_LON_ORIGIN) / GRIDMET_LON_STEP))\n",
" return (row, col)\n",
"\n",
"\n",
"# gridMET CONUS coverage bounds (from THREDDS dataset.xml projectionBox).\n",
"# Sites outside this box (AK, HI, PR, territories) cannot get gridMET wind.\n",
"GRIDMET_CONUS_LAT_MIN = 25.04583333333333\n",
"GRIDMET_CONUS_LAT_MAX = 49.420833333333334\n",
"GRIDMET_CONUS_LON_MIN = -124.78749996666667\n",
"GRIDMET_CONUS_LON_MAX = -67.03749996666669\n",
"\n",
"\n",
"def is_in_gridmet_conus(lat, lon):\n",
" return (GRIDMET_CONUS_LAT_MIN <= float(lat) <= GRIDMET_CONUS_LAT_MAX\n",
" and GRIDMET_CONUS_LON_MIN <= float(lon) <= GRIDMET_CONUS_LON_MAX)\n",
"\n",
"\n",
"def fetch_gridmet_wind(lat, lon):\n",
" \"\"\"Return (meta_dict, dataframe[date, wind_speed_ms]) covering START_YEAR..END_YEAR.\n",
"\n",
" Cached by gridMET 4 km cell — sites in the same cell share one set of fetches.\n",
" Cache miss issues one NCSS CSV request per year (~30 requests, ~3 KB each).\"\"\"\n",
" cache_key = gridmet_cell_key(lat, lon)\n",
" cached = GRIDMET_WIND_CACHE.get(cache_key)\n",
" if cached is not None:\n",
" _GRIDMET_CACHE_STATS['hits'] += 1\n",
" return cached\n",
" _GRIDMET_CACHE_STATS['misses'] += 1\n",
"\n",
" frames = []\n",
" grid_lat = grid_lon = None\n",
" for year in range(START_YEAR, END_YEAR + 1):\n",
" url = (GRIDMET_NCSS_URL_TEMPLATE.format(year=year) +\n",
" f'?var={GRIDMET_VARIABLE}'\n",
" f'&latitude={float(lat):.6f}&longitude={float(lon):.6f}'\n",
" f'&time_start={year}-01-01T00:00:00Z&time_end={year}-12-31T23:59:59Z'\n",
" f'&accept=csv')\n",
" text = http_get(url)\n",
" df = pd.read_csv(StringIO(text))\n",
" # NCSS column names look like: time, latitude[unit=\"...\"], longitude[..], wind_speed[unit=\"m/s\"]\n",
" df = df.rename(columns={c: c.split('[')[0].strip() for c in df.columns})\n",
" df['date'] = pd.to_datetime(df['time']).dt.tz_localize(None).dt.normalize()\n",
" # NCSS CSV does NOT apply scale_factor for packed shorts; apply here.\n",
" df['wind_speed_ms'] = pd.to_numeric(df['wind_speed'], errors='coerce') * GRIDMET_SCALE_FACTOR\n",
" if grid_lat is None and 'latitude' in df.columns:\n",
" grid_lat = float(df['latitude'].iloc[0])\n",
" grid_lon = float(df['longitude'].iloc[0])\n",
" frames.append(df[['date', 'wind_speed_ms']])\n",
" result = ({'grid_lat': grid_lat, 'grid_lon': grid_lon}, pd.concat(frames, ignore_index=True))\n",
" GRIDMET_WIND_CACHE[cache_key] = result\n",
" return result\n",
"\n",
"\n",
"def saturation_vapor_pressure_pa(t_c):\n",
" \"\"\"Magnus / Tetens. T in °C, returns saturation vapor pressure in Pa.\"\"\"\n",
" return 611.2 * np.exp(17.67 * t_c / (t_c + 243.5))\n",
"\n",
"\n",
"def relative_humidity_pct(vp_pa, t_c):\n",
" e_sat = saturation_vapor_pressure_pa(t_c)\n",
" rh = 100.0 * vp_pa / e_sat\n",
" return np.clip(rh, 0.0, 100.0)\n",
"\n",
"\n",
"def wet_bulb_temperature_c(t_c, rh_pct):\n",
" \"\"\"Stull (2011) approximation. Valid for 5 ≤ RH ≤ 99 %, 20 ≤ T ≤ +50 °C.\"\"\"\n",
" rh = np.clip(rh_pct, 5.0, 99.0)\n",
" return (\n",
" t_c * np.arctan(0.151977 * np.sqrt(rh + 8.313659))\n",
" + np.arctan(t_c + rh)\n",
" - np.arctan(rh - 1.676331)\n",
" + 0.00391838 * (rh ** 1.5) * np.arctan(0.023101 * rh)\n",
" - 4.686035\n",
" )\n",
"\n",
"\n",
"def _parse_daymet_header(header):\n",
" def f(s):\n",
" try: return float(str(s).split()[0])\n",
" except Exception: return None\n",
" elevation_m = f(header.get('Elevation', '').replace('meters', '').strip())\n",
" tile = None\n",
" try: tile = int(header.get('Tile', '').strip())\n",
" except Exception: pass\n",
" xy = header.get('X & Y on Lambert Conformal Conic', '')\n",
" daymet_x = daymet_y = None\n",
" if xy:\n",
" parts = xy.split()\n",
" try:\n",
" daymet_x = float(parts[0]); daymet_y = float(parts[1])\n",
" except Exception:\n",
" pass\n",
" return tile, elevation_m, daymet_x, daymet_y\n",
"\n",
"\n",
"def summarize_site(dc_row, daymet_header, daymet_df, wind_info):\n",
" df = daymet_df.copy()\n",
" df['tmean'] = (df['tmin'] + df['tmax']) / 2.0\n",
" df['year'] = df['date'].dt.year\n",
" df['month'] = df['date'].dt.month\n",
"\n",
" df['rh_mean_pct'] = relative_humidity_pct(df['vp'], df['tmean'])\n",
" df['rh_at_tmax_pct'] = relative_humidity_pct(df['vp'], df['tmax'])\n",
" df['wet_bulb_mean_c'] = wet_bulb_temperature_c(df['tmean'], df['rh_mean_pct'])\n",
" df['wet_bulb_max_c'] = wet_bulb_temperature_c(df['tmax'], df['rh_at_tmax_pct'])\n",
"\n",
" df['cdd_c'] = (df['tmean'] - CDD_BASE_C).clip(lower=0)\n",
" df['extreme_heat'] = df['tmax'] >= EXTREME_HEAT_THRESHOLD_C\n",
" df['extreme_wet_bulb'] = df['wet_bulb_max_c'] >= EXTREME_WET_BULB_THRESHOLD_C\n",
" df['dtr_c'] = df['tmax'] - df['tmin']\n",
"\n",
" annual = (df.groupby('year', as_index=False)\n",
" .agg(annual_mean_t_c=('tmean','mean'),\n",
" annual_cdd=('cdd_c','sum'),\n",
" annual_extreme_heat=('extreme_heat','sum'),\n",
" annual_prcp_mm=('prcp','sum')))\n",
" summer = df.loc[df['month'].isin([6, 7, 8])]\n",
" wet_days = df.loc[df['prcp'] >= WET_DAY_THRESHOLD_MM, 'prcp']\n",
"\n",
" annual_prcp = annual['annual_prcp_mm'].dropna()\n",
" precip_cv = float(annual_prcp.std(ddof=1) / annual_prcp.mean()) if len(annual_prcp) > 1 and annual_prcp.mean() != 0 else np.nan\n",
"\n",
" tile, elevation_m, daymet_x, daymet_y = _parse_daymet_header(daymet_header)\n",
"\n",
" row = {\n",
" 'master_id': dc_row.master_id,\n",
" 'source': dc_row.source,\n",
" 'name': dc_row.name,\n",
" 'operator': dc_row.operator,\n",
" 'city': dc_row.city,\n",
" 'state': dc_row.state,\n",
" 'country': dc_row.country,\n",
" 'longitude': float(dc_row.longitude),\n",
" 'latitude': float(dc_row.latitude),\n",
"\n",
" 'daymet_dataset_version': DAYMET_DATASET_VERSION,\n",
" 'gridmet_dataset_version': GRIDMET_DATASET_VERSION if wind_info is not None else None,\n",
" 'climate_period_start': START_DATE,\n",
" 'climate_period_end': END_DATE,\n",
" 'daymet_tile': tile,\n",
" 'daymet_elevation_m': elevation_m,\n",
" 'daymet_grid_x_m': daymet_x,\n",
" 'daymet_grid_y_m': daymet_y,\n",
" 'gridmet_grid_latitude': wind_info['grid_lat'] if wind_info is not None else None,\n",
" 'gridmet_grid_longitude': wind_info['grid_lon'] if wind_info is not None else None,\n",
" 'observation_days': int(len(df)),\n",
" 'observation_years': int(df['year'].nunique()),\n",
"\n",
" 'mean_annual_temperature_c': float(annual['annual_mean_t_c'].mean()),\n",
" 'mean_summer_temperature_c': float(summer['tmean'].mean()) if len(summer) else np.nan,\n",
" 'max_daily_temperature_c': float(df['tmax'].max()),\n",
" 'min_daily_temperature_c': float(df['tmin'].min()),\n",
" 'mean_diurnal_temperature_range_c': float(df['dtr_c'].mean()),\n",
"\n",
" 'mean_relative_humidity_pct': float(df['rh_mean_pct'].mean()),\n",
" 'mean_wet_bulb_temperature_c': float(df['wet_bulb_mean_c'].mean()),\n",
" 'max_wet_bulb_temperature_c': float(df['wet_bulb_max_c'].max()),\n",
" 'extreme_wet_bulb_days': int(df['extreme_wet_bulb'].sum()),\n",
"\n",
" 'cooling_degree_days_c': float(df['cdd_c'].sum()),\n",
" 'annual_cooling_degree_days_c_mean': float(annual['annual_cdd'].mean()),\n",
" 'extreme_heat_days': int(df['extreme_heat'].sum()),\n",
" 'annual_extreme_heat_days_mean': float(annual['annual_extreme_heat'].mean()),\n",
"\n",
" 'precipitation_total_mm': float(df['prcp'].sum()),\n",
" 'annual_precipitation_mm_mean': float(annual['annual_prcp_mm'].mean()),\n",
" 'annual_precipitation_cv': precip_cv,\n",
" 'wet_day_precipitation_p95_mm': float(wet_days.quantile(0.95)) if len(wet_days) else 0.0,\n",
"\n",
" 'mean_wind_speed_ms': None,\n",
" 'max_daily_mean_wind_speed_ms': None,\n",
" 'sustained_wind_days': None,\n",
" 'annual_sustained_wind_days_mean': None,\n",
" 'wind_status': 'pending_gridmet',\n",
"\n",
" 'drought_severity_index_source_status': 'needs_external_source',\n",
" 'wildfire_smoke_exposure_source_status': 'needs_external_source',\n",
"\n",
" 'daymet_variables': DAYMET_VARIABLES,\n",
" 'daymet_url': daymet_header.get('url'),\n",
" }\n",
"\n",
" if wind_info is not None:\n",
" wdf = wind_info['data'].copy()\n",
" wdf['year'] = wdf['date'].dt.year\n",
" wdf['sustained'] = wdf['wind_speed_ms'] >= SUSTAINED_WIND_THRESHOLD_MS\n",
" annual_wind = wdf.groupby('year', as_index=False).agg(d=('sustained', 'sum'))\n",
" row.update({\n",
" 'mean_wind_speed_ms': float(wdf['wind_speed_ms'].mean()),\n",
" 'max_daily_mean_wind_speed_ms': float(wdf['wind_speed_ms'].max()),\n",
" 'sustained_wind_days': int(wdf['sustained'].sum()),\n",
" 'annual_sustained_wind_days_mean': float(annual_wind['d'].mean()),\n",
" 'wind_status': 'gridmet',\n",
" })\n",
"\n",
" return row\n"
]
},
{
"cell_type": "markdown",
"id": "11",
"metadata": {},
"source": [
"## Upsert Helper"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "12",
"metadata": {},
"outputs": [],
"source": [
"UPSERT_COLUMNS = [\n",
" 'master_id', 'source', 'name', 'operator', 'city', 'state', 'country',\n",
" 'longitude', 'latitude',\n",
" 'daymet_dataset_version', 'gridmet_dataset_version',\n",
" 'climate_period_start', 'climate_period_end',\n",
" 'daymet_tile', 'daymet_elevation_m', 'daymet_grid_x_m', 'daymet_grid_y_m',\n",
" 'gridmet_grid_latitude', 'gridmet_grid_longitude',\n",
" 'observation_days', 'observation_years',\n",
" 'mean_annual_temperature_c', 'mean_summer_temperature_c',\n",
" 'max_daily_temperature_c', 'min_daily_temperature_c',\n",
" 'mean_diurnal_temperature_range_c',\n",
" 'mean_relative_humidity_pct',\n",
" 'mean_wet_bulb_temperature_c', 'max_wet_bulb_temperature_c',\n",
" 'extreme_wet_bulb_days',\n",
" 'cooling_degree_days_c', 'annual_cooling_degree_days_c_mean',\n",
" 'extreme_heat_days', 'annual_extreme_heat_days_mean',\n",
" 'precipitation_total_mm', 'annual_precipitation_mm_mean',\n",
" 'annual_precipitation_cv', 'wet_day_precipitation_p95_mm',\n",
" 'mean_wind_speed_ms', 'max_daily_mean_wind_speed_ms',\n",
" 'sustained_wind_days', 'annual_sustained_wind_days_mean',\n",
" 'wind_status',\n",
" 'drought_severity_index_source_status',\n",
" 'wildfire_smoke_exposure_source_status',\n",
" 'daymet_variables', 'daymet_url',\n",
"]\n",
"\n",
"\n",
"def clean_db_value(value):\n",
" if isinstance(value, (list, tuple)):\n",
" return list(value)\n",
" if value is None:\n",
" return None\n",
" if isinstance(value, float) and np.isnan(value):\n",
" return None\n",
" try:\n",
" if pd.isna(value):\n",
" return None\n",
" except (TypeError, ValueError):\n",
" pass\n",
" return value\n",
"\n",
"\n",
"def upsert_climate_rows(rows):\n",
" if not rows:\n",
" return\n",
"\n",
" values = [\n",
" tuple(clean_db_value(row.get(col)) for col in UPSERT_COLUMNS)\n",
" for row in rows\n",
" ]\n",
" col_sql = sql.SQL(', ').join(map(sql.Identifier, UPSERT_COLUMNS))\n",
" update_assignments = sql.SQL(', ').join(\n",
" sql.SQL('{} = excluded.{}').format(sql.Identifier(col), sql.Identifier(col))\n",
" for col in UPSERT_COLUMNS if col != 'master_id'\n",
" )\n",
"\n",
" insert_sql = sql.SQL('''\n",
" insert into {table} ({cols}, geom, updated_at)\n",
" values %s\n",
" on conflict (master_id) do update set\n",
" {updates},\n",
" geom = excluded.geom,\n",
" fetched_at = now(),\n",
" updated_at = now()\n",
" ''').format(\n",
" table=sql.SQL(CLIMATE_TABLE),\n",
" cols=col_sql,\n",
" updates=update_assignments,\n",
" )\n",
"\n",
" template = '(' + ', '.join(['%s'] * len(UPSERT_COLUMNS)) + ', ST_SetSRID(ST_MakePoint(%s, %s), 4326), now())'\n",
" lon_i = UPSERT_COLUMNS.index('longitude')\n",
" lat_i = UPSERT_COLUMNS.index('latitude')\n",
" values_with_geom = [r + (r[lon_i], r[lat_i]) for r in values]\n",
"\n",
" with get_conn() as conn:\n",
" with conn.cursor() as cur:\n",
" execute_values(cur, insert_sql.as_string(conn), values_with_geom, template=template, page_size=500)\n"
]
},
{
"cell_type": "markdown",
"id": "13",
"metadata": {},
"source": [
"## Fetch + Upsert (per site, with checkpointing)\n",
"\n",
"Each site is fetched, summarized, **and upserted to Postgres immediately**. That way:\n",
"\n",
"- A mid-run crash never loses more than the current site.\n",
"- Re-running with `REFRESH_EXISTING=False` skips everything already in `data_center_historical_climate` (filtered in `load_target_points`).\n",
"- The CSV checkpoint at `output/historical_climate_checkpoint.csv` is purely a debug artifact; the DB is the source of truth.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "14",
"metadata": {},
"outputs": [],
"source": [
"if targets.empty:\n",
" climate_rows = []\n",
" print('No new target rows. Set REFRESH_EXISTING=True to recompute existing rows.')\n",
"else:\n",
" climate_rows = []\n",
" errors = []\n",
" total = len(targets)\n",
" checkpoint_path = Path('output/historical_climate_checkpoint.csv')\n",
" checkpoint_path.parent.mkdir(exist_ok=True)\n",
"\n",
" for i, dc_row in enumerate(targets.itertuples(index=False), start=1):\n",
" try:\n",
" daymet_header, daymet_df = fetch_daymet(dc_row.latitude, dc_row.longitude)\n",
" wind_info = None\n",
" wind_skip_reason = None\n",
" if INCLUDE_WIND:\n",
" if is_in_gridmet_conus(dc_row.latitude, dc_row.longitude):\n",
" grid_meta, wind_df = fetch_gridmet_wind(dc_row.latitude, dc_row.longitude)\n",
" wind_info = {\n",
" 'grid_lat': grid_meta['grid_lat'],\n",
" 'grid_lon': grid_meta['grid_lon'],\n",
" 'data': wind_df,\n",
" }\n",
" else:\n",
" wind_skip_reason = 'unavailable_outside_conus'\n",
" row = summarize_site(dc_row, daymet_header, daymet_df, wind_info)\n",
" if wind_skip_reason is not None:\n",
" row['wind_status'] = wind_skip_reason\n",
" climate_rows.append(row)\n",
"\n",
" if not DRY_RUN:\n",
" upsert_climate_rows([row])\n",
" pd.DataFrame(climate_rows).to_csv(checkpoint_path, index=False)\n",
"\n",
" print(f'{i:>5,}/{total:,} {dc_row.master_id} '\n",
" f'T_mean={row[\"mean_annual_temperature_c\"]:.2f}°C '\n",
" f'CDD={row[\"cooling_degree_days_c\"]:,.0f} '\n",
" f'wind={row[\"wind_status\"]} '\n",
" f'successes={len(climate_rows):,} errors={len(errors):,}')\n",
" except Exception as exc:\n",
" errors.append({'master_id': dc_row.master_id, 'error': str(exc)})\n",
" print(f'ERROR {i:>5,}/{total:,} ({dc_row.master_id}): {exc}')\n",
"\n",
" time.sleep(REQUEST_SLEEP_SECONDS)\n",
"\n",
" climate = pd.DataFrame(climate_rows)\n",
" error_log = pd.DataFrame(errors)\n",
" print(f'\\nDone: {len(climate):,} successes / {len(error_log):,} errors')\n",
" if INCLUDE_WIND:\n",
" h = _GRIDMET_CACHE_STATS['hits']; m = _GRIDMET_CACHE_STATS['misses']\n",
" tot = h + m\n",
" print(f'gridMET cache: {h:,} hits / {m:,} misses ({100*h/tot:.1f}% hit rate, {len(GRIDMET_WIND_CACHE):,} unique cells)' if tot else 'gridMET cache: no calls')\n",
" if not climate.empty:\n",
" display(climate.head())\n",
" if not error_log.empty:\n",
" display(error_log)\n"
]
},
{
"cell_type": "markdown",
"id": "15",
"metadata": {},
"source": [
"## Inspect Results"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "16",
"metadata": {},
"outputs": [],
"source": [
"summary_query = f'''\n",
"select\n",
" count(*) as rows,\n",
" min(climate_period_start) as min_period_start,\n",
" max(climate_period_end) as max_period_end,\n",
" count(*) filter (where wind_status = 'gridmet') as with_wind,\n",
" count(*) filter (where wind_status = 'pending_gridmet') as wind_pending,\n",
" count(*) filter (where drought_severity_index_source_status = 'needs_external_source') as drought_external_needed,\n",
" count(*) filter (where wildfire_smoke_exposure_source_status = 'needs_external_source') as smoke_external_needed\n",
"from {CLIMATE_TABLE}\n",
"where daymet_dataset_version = %s\n",
" and climate_period_start = %s::date\n",
" and climate_period_end = %s::date\n",
"'''\n",
"\n",
"sample_query = f'''\n",
"select\n",
" c.master_id, c.name, c.state,\n",
" round(c.mean_annual_temperature_c::numeric, 2) as mean_t_c,\n",
" round(c.mean_summer_temperature_c::numeric, 2) as summer_t_c,\n",
" round(c.mean_relative_humidity_pct::numeric, 1) as rh_pct,\n",
" round(c.mean_wet_bulb_temperature_c::numeric, 2) as wb_mean_c,\n",
" round(c.cooling_degree_days_c::numeric, 0) as cdd_total,\n",
" c.extreme_heat_days,\n",
" round(c.annual_precipitation_cv::numeric, 3) as precip_cv,\n",
" round(c.mean_wind_speed_ms::numeric, 2) as wind_mean_ms,\n",
" c.sustained_wind_days,\n",
" c.wind_status\n",
"from {CLIMATE_TABLE} c\n",
"order by c.cooling_degree_days_c desc nulls last\n",
"limit 20\n",
"'''\n",
"\n",
"with get_conn() as conn:\n",
" summary = pd.read_sql_query(summary_query, conn, params=(DAYMET_DATASET_VERSION, START_DATE, END_DATE))\n",
" sample = pd.read_sql_query(sample_query, conn)\n",
"\n",
"display(summary)\n",
"display(sample)\n"
]
},
{
"cell_type": "markdown",
"id": "17",
"metadata": {},
"source": [
"## Notes for the Next Data Sources\n",
"\n",
"Daymet (T/precip/VP at 1 km) + gridMET (wind at 4 km) cover the core weather/climate burden metrics for CONUS data centers. Two metrics still need better external sources:\n",
"\n",
"- **Drought severity index** — gridMET also publishes `pdsi` (Palmer Drought Severity Index) and `etr` (reference ET) at the same THREDDS server. A follow-up cell can pull PDSI percentiles using the same NCSS pattern as `vs` here. Alternatively, the US Drought Monitor (USDM) shapefiles give weekly drought categories that can be spatially joined to the DC point.\n",
"- **Wildfire smoke exposure** — NOAA HMS smoke shapefiles (daily plume polygons, 2005present) or EPA AirNow PM2.5 monitor data. Smoke-day counts would be directly comparable to the heat-day metric here.\n",
"\n",
"**Metric change vs. the prior Open-Meteo implementation.** The old `windstorm_days` field counted days where the daily *max wind gust* exceeded 17.2 m/s (~38 mph, ERA5 reanalysis gust). gridMET reports only daily *mean* wind, not gusts, so the new `sustained_wind_days` field counts days with daily-mean wind ≥ 8 m/s (~18 mph). These are fundamentally different signals — do not compare values across the two columns. If true gust/storm event counts are needed, NOAA's Storm Events Database is the appropriate next source.\n"
]
},
{
"cell_type": "markdown",
"id": "18",
"metadata": {},
"source": [
"## Tables Created by This Notebook and Their Relationships\n",
"\n",
"### Tables Created / Maintained\n",
"1. `public.data_center_historical_climate`\n",
"- One row per `master_id` with climate summary fields and geometry.\n",
"- Populated by incremental upsert so reruns refresh existing sites and add new sites.\n",
"\n",
"### Key Relationships\n",
"- `public.master_data_centers (master_id)`\n",
" - 1-to-1 (effective) -> `public.data_center_historical_climate (master_id)`\n",
"\n",
"### Rerun Notes\n",
"- Safe to rerun when the master data-center set changes.\n",
"- Existing rows are updated in place; no duplicate-per-site history table is created by this notebook."
]
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.14.5"
}
},
"nbformat": 4,
"nbformat_minor": 5
}