Standardize notebook table-relationship documentation cells
@@ -677,134 +677,32 @@
"id": "16",
"metadata": {},
"source": [
"## Tables Created\n",
"## Tables Created by This Notebook and Their Relationships\n",
"\n",
"This notebook builds three tables in the `public` schema, all keyed (directly or transitively) to `master_data_centers.master_id`.\n",
"### Tables Created / Maintained\n",
"1. `public.usdm_drought_weekly`\n",
"- Weekly USDM drought polygons by `week_date` and drought category.\n",
"\n",
"---\n",
"2. `public.data_center_usdm_drought_dc_week`\n",
"- One row per `(master_id, week_date)` with weekly worst drought category at each data center.\n",
"\n",
"### 1. `public.usdm_drought_weekly`\n",
"3. `public.data_center_usdm_drought_exposure`\n",
"- One row per `master_id` with summary drought-exposure metrics and streak fields.\n",
"\n",
"Raw weekly USDM drought polygons — one row per `(week_date, dm_category)` (occasionally multiple rows for early-USDM weeks that published per-category fragments). Source of truth for any later spatial query against the drought record.\n",
"### Key Relationships\n",
"- `public.usdm_drought_weekly (week_date, dm_category, geom)`\n",
" - spatial/time source for -> `public.data_center_usdm_drought_dc_week`\n",
"\n",
"| Column | Type | Meaning |\n",
"|---|---|---|\n",
"| `id` | `bigserial` PK | Surrogate row id |\n",
"| `week_date` | `date` | Tuesday valid date of the map, parsed from the filename (`USDM_YYYYMMDD_M.zip`); USDM releases each map on the following Thursday |\n",
"| `dm_category` | `smallint` | 0=D0 Abnormally Dry, 1=D1 Moderate, 2=D2 Severe, 3=D3 Extreme, 4=D4 Exceptional. **Cumulative**: each category's polygon also contains every more severe category (D4 lies inside D3, which lies inside D2, and so on) |\n",
"| `objectid`, `shape_leng`, `shape_area` | as in the shapefile | Original shapefile attributes |\n",
"| `geom` | `geometry(MultiPolygon, 4326)` | Drought-affected area for that category that week |\n",
"- `public.master_data_centers (master_id)`\n",
" - 1-to-many -> `public.data_center_usdm_drought_dc_week (master_id, week_date)`\n",
" - 1-to-1 (effective) -> `public.data_center_usdm_drought_exposure (master_id)`\n",
"\n",
"**Indexes:** GIST on `geom`, btree on `week_date`.\n",
"- `public.data_center_usdm_drought_dc_week`\n",
" - many-to-1 summary rollup -> `public.data_center_usdm_drought_exposure`\n",
"\n",
"**Size:** ~12,000 polygon rows across 1,356 weeks (Jan 2000 – mid 2025).\n",
"\n",
"**Example uses:**\n",
"```sql\n",
"-- D3+ drought polygons for the week of 2022-08-30\n",
"SELECT week_date, dm_category, geom\n",
"FROM usdm_drought_weekly\n",
"WHERE week_date = '2022-08-30' AND dm_category >= 3;\n",
"\n",
"-- Ten worst weeks on record for a specific point (ST_MakePoint takes longitude, then latitude)\n",
"SELECT week_date, MAX(dm_category) AS worst_dm\n",
"FROM usdm_drought_weekly\n",
"WHERE ST_Within(ST_SetSRID(ST_MakePoint(-98.5, 29.5), 4326), geom)\n",
"GROUP BY week_date ORDER BY worst_dm DESC, week_date LIMIT 10;\n",
"```\n",
"\n",
"---\n",
"\n",
"### 2. `public.data_center_usdm_drought_dc_week`\n",
"\n",
"Long-form per-(DC, week) intermediate. One row per data center per USDM week observed; useful for time-series and streak analysis. Computed from `usdm_drought_weekly` via spatial join, then back-filled so every covered DC has a row for every week.\n",
"\n",
"| Column | Type | Meaning |\n",
"|---|---|---|\n",
"| `master_id` | `text` PK (composite) | FK → `master_data_centers.master_id` |\n",
"| `week_date` | `date` PK (composite) | USDM week |\n",
"| `worst_dm` | `smallint` | Max `dm_category` whose polygon contained the DC point that week. **`-1` means observed week but no drought polygon contained the DC** (filter `worst_dm >= 0` for actual drought weeks) |\n",
"\n",
"**Indexes:** PK on `(master_id, week_date)`, btree on `week_date`, btree on `worst_dm`.\n",
"\n",
"**Size:** ~2.5 M rows (1,833 DCs × 1,356 weeks, minus DCs not covered by USDM).\n",
"\n",
"**Example uses:**\n",
"```sql\n",
"-- Drought timeline for one DC\n",
"SELECT week_date, worst_dm\n",
"FROM data_center_usdm_drought_dc_week\n",
"WHERE master_id = 'curated/1010260676' AND worst_dm >= 0\n",
"ORDER BY week_date;\n",
"\n",
"-- DCs that were in D4 during a specific week\n",
"SELECT master_id FROM data_center_usdm_drought_dc_week\n",
"WHERE week_date = '2012-07-24' AND worst_dm = 4;\n",
"```\n",
"\n",
"If you only need the per-DC summary, this table can be dropped — it's regenerable from `usdm_drought_weekly` + `master_data_centers`.\n",
"\n",
"---\n",
"\n",
"### 3. `public.data_center_usdm_drought_exposure`\n",
"\n",
"Per-DC drought-exposure summary keyed by `master_id`. This is the main analytical surface: one row per data center with all the headline metrics, joinable directly to `master_data_centers` and `data_center_historical_climate`.\n",
"\n",
"| Column | Type | Meaning |\n",
"|---|---|---|\n",
"| `master_id` | `text` PK | FK → `master_data_centers.master_id` |\n",
"| `source`, `name`, `operator`, `city`, `state`, `country`, `longitude`, `latitude`, `geom` | as in `master_data_centers` | Identity columns, denormalized from the master table for convenience |\n",
"| `usdm_status` | `text` | `'covered'` (USDM zone) or `'no_coverage'` (outside USDM extent) |\n",
"| `drought_period_start`, `drought_period_end` | `date` | First / last USDM week observed for this DC |\n",
"| `weeks_observed` | `int` | Total weekly observations |\n",
"| `weeks_in_d0_or_worse` … `weeks_in_d4` | `int` | Cumulative weekly counts at each severity threshold |\n",
"| `pct_weeks_in_d0_or_worse` … `pct_weeks_in_d4` | `double` | The same counts as fractions of `weeks_observed` (0 to 1; multiply by 100 for a percentage) |\n",
"| `worst_dm_category` | `smallint` | Max DM ever experienced (0–4) |\n",
"| `mean_dm_category` | `double` | Average DM across all weeks, treating no-drought (`-1`) as 0 |\n",
"| `longest_d0_streak_weeks` | `int` | Longest consecutive run with any drought (D0+) |\n",
"| `longest_d2_streak_weeks` | `int` | Longest consecutive run with severe drought (D2+) — **the headline streak metric** |\n",
"| `longest_d3_streak_weeks` | `int` | Longest consecutive run with extreme drought (D3+) |\n",
"| `fetched_at`, `updated_at` | `timestamptz` | Provenance |\n",
"\n",
"**Indexes:** GIST on `geom`, btree on `state`, btree on `worst_dm_category`.\n",
"\n",
"**Size:** 1,833 rows (one per master DC; PR sites flagged `no_coverage` if applicable).\n",
"\n",
"**Headline metric for site-selection analysis:** `pct_weeks_in_d2_or_worse`. D2 = \"Severe Drought\" is the threshold at which water-use restrictions typically kick in for utilities and municipalities.\n",
"\n",
"**Example: joined climate + drought view for cooling-water risk analysis**\n",
"```sql\n",
"SELECT\n",
" c.master_id, c.name, c.state,\n",
" c.cooling_degree_days_c, -- baseline cooling load\n",
" c.mean_wet_bulb_temperature_c, -- evaporative-cooling efficiency\n",
" d.pct_weeks_in_d2_or_worse * 100 AS pct_severe_drought,\n",
" d.longest_d2_streak_weeks,\n",
" d.worst_dm_category\n",
"FROM data_center_historical_climate c\n",
"JOIN data_center_usdm_drought_exposure d USING (master_id)\n",
"WHERE d.usdm_status = 'covered'\n",
"ORDER BY (c.cooling_degree_days_c * d.pct_weeks_in_d2_or_worse) DESC\n",
"LIMIT 25;\n",
"```\n",
"\n",
"---\n",
"\n",
"### Relationship diagram\n",
"\n",
"```\n",
"master_data_centers (master_id PK)\n",
" │\n",
" ├── data_center_historical_climate (master_id PK) ← from open_meteo/Daymet/gridMET notebook\n",
" │\n",
" └── data_center_usdm_drought_exposure (master_id PK) ← this notebook\n",
" │\n",
" └── data_center_usdm_drought_dc_week (master_id, week_date)\n",
" │\n",
" └── usdm_drought_weekly (id PK, week_date, dm_category, geom)\n",
"```\n",
"\n",
"All three USDM tables are regenerable from the zip files in `USDM Shape Files/`. `RELOAD_WEEKLY=True` rebuilds from scratch; `RECOMPUTE_SUMMARY=True` (default) recomputes the dc-week + exposure tables from whatever's in `usdm_drought_weekly`.\n"
"### Rerun Notes\n",
"- Supports repeat runs when new USDM weeks or new data centers are added.\n",
"- The weekly table can be reloaded, and the downstream `dc_week` and `exposure` tables recomputed from it."
]
}
],
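The cell above describes the streak and ratio fields of `data_center_usdm_drought_exposure` as rollups of the per-week `worst_dm` values in `data_center_usdm_drought_dc_week`. A minimal sketch of that rollup for a single data center follows. It is not the notebook's actual implementation: the function names `longest_streak` and `summarize` are hypothetical, and it assumes the input is one DC's `worst_dm` values ordered by `week_date` with no missing weeks (consistent with the back-fill described for the dc-week table) and `-1` meaning "observed, no drought".

```python
from typing import Dict, Optional, Sequence


def longest_streak(worst_dm: Sequence[int], threshold: int) -> int:
    """Longest consecutive run of weeks with worst_dm >= threshold."""
    best = current = 0
    for dm in worst_dm:
        if dm >= threshold:
            current += 1
            best = max(best, current)
        else:
            current = 0  # a week below the threshold breaks the run
    return best


def summarize(worst_dm: Sequence[int]) -> Dict[str, Optional[float]]:
    """Roll one DC's weekly series up into exposure-style metrics (hypothetical helper)."""
    weeks = len(worst_dm)
    weeks_d2 = sum(1 for dm in worst_dm if dm >= 2)
    return {
        "weeks_observed": weeks,
        "weeks_in_d2_or_worse": weeks_d2,
        # Stored as a fraction of weeks_observed, not a 0-100 percentage.
        "pct_weeks_in_d2_or_worse": weeks_d2 / weeks if weeks else None,
        # No-drought weeks (-1) count as 0, as the column description states.
        "mean_dm_category": sum(max(dm, 0) for dm in worst_dm) / weeks if weeks else None,
        "longest_d0_streak_weeks": longest_streak(worst_dm, 0),
        "longest_d2_streak_weeks": longest_streak(worst_dm, 2),
        "longest_d3_streak_weeks": longest_streak(worst_dm, 3),
    }


# Ten made-up weeks for one data center, ordered by week_date.
series = [-1, 0, 2, 3, 2, -1, 2, 2, 1, 4]
summary = summarize(series)
```

Because the streak logic only looks at adjacent list elements, it relies on every week being present; a series with gaps would need to be re-indexed by `week_date` first.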