Standardize notebook table-relationship documentation cells

2026-05-22 14:21:51 -07:00
parent c95f22fcdb
commit 03239ad007
9 changed files with 147 additions and 191 deletions
--- a/build_fcc_bdc_broadband_connection_table.ipynb
+++ b/build_fcc_bdc_broadband_connection_table.ipynb
@@ -1486,29 +1486,23 @@
   "source": [
    "## Tables Created by This Notebook and Their Relationships\n",
    "\n",
-    "This notebook creates and/or maintains five PostgreSQL tables in the `public` schema:\n",
+    "### Tables Created / Maintained\n",
-    "\n",
    "1. `public.fcc_bdc_as_of`\n",
-    "- One row per FCC BDC release date and data type.\n",
+    "- Release/version metadata by `as_of_date`.\n",
-    "- Primary metadata table used to track versioning (`as_of_date`) for downstream loads.\n",
    "\n",
    "2. `public.fcc_bdc_files`\n",
-    "- One row per file discovered/downloaded for a release.\n",
+    "- File-level lineage records for each FCC BDC release.\n",
-    "- Linked to releases via `as_of_date` and used as file-level lineage/provenance.\n",
    "\n",
    "3. `public.fcc_bdc_broadband_by_datacenter`\n",
-    "- Fact table keyed by `(master_id, as_of_date)` for per-data-center broadband availability metrics.\n",
+    "- Per-data-center broadband fact table keyed by `(master_id, as_of_date)`.\n",
-    "- Includes scalar broadband fields and summary JSON payloads.\n",
-    "- `master_id` aligns with `public.master_data_centers.master_id`.\n",
    "\n",
    "4. `public.fcc_bdc_broadband_summary`\n",
-    "- Aggregated summary metrics by release (`as_of_date`) used for QA and reporting.\n",
+    "- Release-level aggregate summary metrics.\n",
    "\n",
    "5. `public.fcc_bdc_provider_summary`\n",
-    "- Provider catalog/aggregation table by release (`as_of_date`) with provider class rollups.\n",
+    "- Release-level provider catalog and provider-class summary metrics.\n",
    "\n",
-    "### Relationship Summary\n",
-    "\n",
+    "### Key Relationships\n",
    "- `public.fcc_bdc_as_of (as_of_date)`\n",
    "  - 1-to-many -> `public.fcc_bdc_files (as_of_date)`\n",
    "  - 1-to-many -> `public.fcc_bdc_broadband_by_datacenter (as_of_date)`\n",
@@ -1518,7 +1512,9 @@
    "- `public.master_data_centers (master_id)`\n",
    "  - 1-to-many over time -> `public.fcc_bdc_broadband_by_datacenter (master_id, as_of_date)`\n",
    "\n",
-    "In short: **release metadata (`as_of` + `files`) supports reproducible loads, while per-DC broadband facts and release-level/provider-level summaries support analysis.**"
+    "### Rerun Notes\n",
+    "- The notebook is designed for repeat refreshes as new FCC releases arrive.\n",
+    "- Use `as_of_date` as the version key when comparing snapshots over time."
   ]
  }
 ],
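The FCC rerun note treats `as_of_date` as the version key. The following is a minimal sketch of a snapshot comparison, using an in-memory SQLite database as a stand-in for PostgreSQL. The trimmed schemas, the `data_type` column on `fcc_bdc_as_of`, and the `max_download_mbps` metric are hypothetical. The `IN (...)` semi-join, rather than a plain JOIN, avoids duplicating fact rows when one release date appears more than once in `fcc_bdc_as_of`. ISO-8601 date strings sort chronologically. Window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Hypothetical, trimmed schemas; the real tables live in PostgreSQL with more columns.
SCHEMA = """
CREATE TABLE fcc_bdc_as_of (
    as_of_date TEXT, data_type TEXT, PRIMARY KEY (as_of_date, data_type)
);
CREATE TABLE fcc_bdc_broadband_by_datacenter (
    master_id TEXT, as_of_date TEXT, max_download_mbps REAL,
    PRIMARY KEY (master_id, as_of_date)
);
"""

# Compare each data center's two newest registered releases.
RELEASE_DELTAS = """
WITH ranked AS (
    SELECT master_id, as_of_date, max_download_mbps,
           ROW_NUMBER() OVER (PARTITION BY master_id ORDER BY as_of_date DESC) AS rn
    FROM fcc_bdc_broadband_by_datacenter
    WHERE as_of_date IN (SELECT as_of_date FROM fcc_bdc_as_of)
)
SELECT cur.master_id, prev.as_of_date, cur.as_of_date,
       cur.max_download_mbps - prev.max_download_mbps AS delta_mbps
FROM ranked AS cur
JOIN ranked AS prev ON prev.master_id = cur.master_id AND prev.rn = 2
WHERE cur.rn = 1
ORDER BY cur.master_id
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO fcc_bdc_as_of VALUES (?, ?)",
                 [("2024-06-30", "fixed"), ("2024-12-31", "fixed"), ("2024-12-31", "mobile")])
conn.executemany("INSERT INTO fcc_bdc_broadband_by_datacenter VALUES (?, ?, ?)",
                 [("dc1", "2024-06-30", 100.0), ("dc1", "2024-12-31", 250.0),
                  ("dc2", "2024-12-31", 50.0)])  # dc2 has only one release, so no delta
deltas = conn.execute(RELEASE_DELTAS).fetchall()
print(deltas)
```

Sites with a single release drop out of the inner self-join. A LEFT JOIN on `prev` would keep them with a NULL delta instead.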
--- a/cluster_analysis.ipynb
+++ b/cluster_analysis.ipynb
@@ -916,6 +916,28 @@
    "print('Top non-metro watersheds (RUCA 4-10):')\n",
    "nm_ws.head(15).reset_index(drop=True)\n"
   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "25",
+   "metadata": {},
+   "source": [
+    "## Tables Created by This Notebook and Their Relationships\n",
+    "\n",
+    "### Tables Created / Maintained\n",
+    "1. `public.ruca_codes_2020_tract`\n",
+    "- Tract-level RUCA lookup loaded from `new/RUCA-codes-2020-tract.csv`.\n",
+    "- Rebuilt with drop + recreate during load.\n",
+    "- Primary key: `tract_fips_20`.\n",
+    "\n",
+    "### Key Relationships\n",
+    "- `public.master_data_centers (geoid)`\n",
+    "  - many-to-1 -> `public.ruca_codes_2020_tract (tract_fips_20)`\n",
+    "\n",
+    "### Rerun Notes\n",
+    "- Rerunning refreshes the RUCA lookup table from the latest CSV.\n",
+    "- Downstream joins in this notebook read from this table but do not create additional persistent analysis tables."
+   ]
  }
 ],
 "metadata": {
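The new `cluster_analysis.ipynb` cell records a many-to-1 join from `master_data_centers.geoid` to the tract-level RUCA lookup. The following is a minimal sketch of that join plus the metro / non-metro split used earlier in the notebook (RUCA 4-10 treated as non-metro), with SQLite standing in for PostgreSQL. The `primary_ruca` column name and the sample tract IDs are hypothetical. A LEFT JOIN keeps data centers whose `geoid` has no RUCA row.

```python
import sqlite3

def ruca_class(code):
    """Bucket a primary RUCA code: 1-3 metro, 4-10 non-metro, anything else unclassified."""
    if code is None:
        return "unmatched"  # no RUCA row for this geoid (the LEFT JOIN produced NULL)
    if 1 <= code <= 3:
        return "metro"
    if 4 <= code <= 10:
        return "non-metro"
    return "unclassified"

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ruca_codes_2020_tract (tract_fips_20 TEXT PRIMARY KEY, primary_ruca INTEGER);
CREATE TABLE master_data_centers (master_id TEXT PRIMARY KEY, geoid TEXT);
""")
conn.executemany("INSERT INTO ruca_codes_2020_tract VALUES (?, ?)",
                 [("00000000001", 1), ("00000000002", 7)])
conn.executemany("INSERT INTO master_data_centers VALUES (?, ?)",
                 [("dc1", "00000000001"), ("dc2", "00000000001"),   # many DCs share one tract
                  ("dc3", "00000000002"), ("dc4", "00000000099")])  # dc4: tract not in lookup
rows = conn.execute("""
    SELECT m.master_id, r.primary_ruca
    FROM master_data_centers AS m
    LEFT JOIN ruca_codes_2020_tract AS r ON r.tract_fips_20 = m.geoid
    ORDER BY m.master_id
""").fetchall()
labels = {master_id: ruca_class(code) for master_id, code in rows}
print(labels)
```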
--- a/historical_climate_data_centers.ipynb
+++ b/historical_climate_data_centers.ipynb
@@ -895,21 +895,18 @@
   "source": [
    "## Tables Created by This Notebook and Their Relationships\n",
    "\n",
-    "This notebook creates and/or maintains one primary PostGIS table:\n",
+    "### Tables Created / Maintained\n",
-    "\n",
    "1. `public.data_center_historical_climate`\n",
-    "- One row per data center (`master_id`).\n",
+    "- One row per `master_id` with climate summary fields and geometry.\n",
-    "- Stores climate summary metrics (temperature, humidity, wet-bulb, precipitation variability, cooling-degree-days, wind fields/status), geometry, and lineage timestamps.\n",
+    "- Populated by incremental upsert so reruns refresh existing sites and add new sites.\n",
-    "- Upserted incrementally so reruns refresh changed rows without duplicating records.\n",
    "\n",
-    "### Relationship Summary\n",
-    "\n",
+    "### Key Relationships\n",
    "- `public.master_data_centers (master_id)`\n",
    "  - 1-to-1 (effective) -> `public.data_center_historical_climate (master_id)`\n",
    "\n",
-    "`public.data_center_historical_climate.master_id` is a foreign key to `public.master_data_centers.master_id` (with cascade delete), so climate rows track the master data-center record set.\n",
+    "### Rerun Notes\n",
-    "\n",
+    "- Safe to rerun when the master data-center set changes.\n",
-    "In short: **`master_data_centers` is the entity table, and `data_center_historical_climate` is its one-row-per-site climate feature extension.**"
+    "- Existing rows are updated in place; no duplicate-per-site history table is created by this notebook."
   ]
  }
 ],
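The incremental-upsert behavior described for `public.data_center_historical_climate` keeps one row per `master_id` and refreshes it in place on rerun. It can be sketched with SQLite's `INSERT ... ON CONFLICT ... DO UPDATE` (SQLite 3.24+), whose syntax closely mirrors PostgreSQL's. The trimmed column list and the `updated_at` column are illustrative assumptions, not the notebook's actual schema. `cooling_degree_days_c` is the column name used in the USDM cell's example query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE data_center_historical_climate (
        master_id TEXT PRIMARY KEY,
        cooling_degree_days_c REAL,
        updated_at TEXT
    )
""")

# Insert new sites; for an existing master_id, overwrite the metrics in place.
UPSERT = """
INSERT INTO data_center_historical_climate (master_id, cooling_degree_days_c, updated_at)
VALUES (?, ?, ?)
ON CONFLICT (master_id) DO UPDATE SET
    cooling_degree_days_c = excluded.cooling_degree_days_c,
    updated_at = excluded.updated_at
"""

def upsert_climate(conn, rows):
    with conn:  # commit on success, roll back the whole batch on error
        conn.executemany(UPSERT, rows)

upsert_climate(conn, [("dc1", 1200.0, "2025-01-01"), ("dc2", 800.0, "2025-01-01")])
# Rerun: dc1 is refreshed and dc3 is added; dc1 is not duplicated.
upsert_climate(conn, [("dc1", 1250.0, "2025-06-01"), ("dc3", 950.0, "2025-06-01")])
result = conn.execute(
    "SELECT master_id, cooling_degree_days_c FROM data_center_historical_climate ORDER BY master_id"
).fetchall()
print(result)
```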
--- a/hms_smoke_data_centers.ipynb
+++ b/hms_smoke_data_centers.ipynb
@@ -1184,23 +1184,35 @@
   "id": "22",
   "metadata": {},
   "source": [
-    "## Tables Created\n",
+    "## Tables Created by This Notebook and Their Relationships\n",
    "\n",
-    "This notebook creates four PostGIS tables for NOAA HMS smoke exposure analysis. The tables are designed to separate source observations, raw geometries, long-form data-center exposure, and the final per-site summary.\n",
+    "### Tables Created / Maintained\n",
+    "1. `public.hms_smoke_days`\n",
+    "- One row per observed HMS product day (daily denominator table).\n",
    "\n",
-    "| Table | Grain | Purpose |\n",
+    "2. `public.hms_smoke_daily`\n",
-    "|---|---|---|\n",
+    "- One row per smoke polygon geometry from HMS source products.\n",
-    "| `public.hms_smoke_days` | One row per observed HMS product day | Denominator table for daily percentages, including days with zero smoke polygons. Stores `smoke_date`, source metadata, and `feature_count`. |\n",
-    "| `public.hms_smoke_daily` | One row per HMS smoke polygon | Raw smoke plume geometry table. Stores `smoke_date`, satellite/time fields, normalized `density`, `density_rank`, source metadata, and `geom`. |\n",
-    "| `public.data_center_hms_smoke_dc_day` | One row per `(master_id, smoke_date)` | Long-form daily exposure table for every data center on every observed HMS day. `max_density_rank = 0` means observed no smoke; `1`, `2`, and `3` represent light/unspecified, medium, and heavy smoke exposure. |\n",
-    "| `public.data_center_hms_smoke_exposure` | One row per `master_id` | Final per-data-center summary table joinable to `public.master_data_centers`. Includes location fields, observation status, smoke-period dates, exposure-day counts, percentage metrics, worst/mean density, and longest streak metrics. |\n",
    "\n",
-    "Recommended use:\n",
+    "3. `public.data_center_hms_smoke_dc_day`\n",
+    "- One row per `(master_id, smoke_date)` with daily smoke exposure classification.\n",
    "\n",
-    "- Use `public.data_center_hms_smoke_exposure` for most site-level analysis and ranking.\n",
+    "4. `public.data_center_hms_smoke_exposure`\n",
-    "- Use `public.data_center_hms_smoke_dc_day` for time-series analysis, seasonal summaries, or custom thresholds.\n",
+    "- One row per `master_id` with summary smoke-exposure metrics.\n",
-    "- Use `public.hms_smoke_daily` when you need the original smoke plume geometries for mapping or spatial QA.\n",
+    "\n",
-    "- Use `public.hms_smoke_days` whenever calculating percentages so no-smoke observed days remain in the denominator."
+    "### Key Relationships\n",
+    "- `public.hms_smoke_days (smoke_date)`\n",
+    "  - 1-to-many -> `public.hms_smoke_daily (smoke_date)`\n",
+    "\n",
+    "- `public.master_data_centers (master_id)`\n",
+    "  - 1-to-many -> `public.data_center_hms_smoke_dc_day (master_id, smoke_date)`\n",
+    "  - 1-to-1 (effective) -> `public.data_center_hms_smoke_exposure (master_id)`\n",
+    "\n",
+    "- `public.data_center_hms_smoke_dc_day`\n",
+    "  - many-to-1 summary rollup -> `public.data_center_hms_smoke_exposure`\n",
+    "\n",
+    "### Rerun Notes\n",
+    "- Designed for repeat refreshes as additional HMS days become available.\n",
+    "- Summary exposure table is recomputed from daily source/bridge tables so results stay consistent after reloads."
   ]
  }
 ],
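The HMS cell treats `hms_smoke_days` as the denominator table. It uses `max_density_rank` 0 for an observed day with no smoke and 1-3 for light/unspecified, medium, and heavy smoke. The following is a minimal plain-Python sketch of a percentage metric under those conventions; the function name and input shapes are hypothetical. Days are counted from `hms_smoke_days` rather than from `hms_smoke_daily`, because a day with zero smoke polygons has no `hms_smoke_daily` rows but is still an observed day.

```python
from collections import Counter

def pct_smoke_days(observed_days, dc_day_rows, min_rank=1):
    """Percent of observed HMS days with max_density_rank >= min_rank, per master_id.

    observed_days: smoke_date values from hms_smoke_days (includes zero-polygon days).
    dc_day_rows: (master_id, smoke_date, max_density_rank) tuples; rank 0 = observed, no smoke.
    """
    n_days = len(set(observed_days))
    if n_days == 0:
        return {}
    hits = Counter()
    sites = set()
    for master_id, _smoke_date, rank in dc_day_rows:
        sites.add(master_id)
        if rank >= min_rank:
            hits[master_id] += 1
    return {m: 100.0 * hits[m] / n_days for m in sorted(sites)}

observed = ["2024-07-01", "2024-07-02", "2024-07-03", "2024-07-04"]
rows = [
    ("dc1", "2024-07-01", 2), ("dc1", "2024-07-02", 0),
    ("dc1", "2024-07-03", 3), ("dc1", "2024-07-04", 0),
    ("dc2", "2024-07-01", 0), ("dc2", "2024-07-02", 0),
    ("dc2", "2024-07-03", 0), ("dc2", "2024-07-04", 0),
]
any_smoke = pct_smoke_days(observed, rows)
heavy = pct_smoke_days(observed, rows, min_rank=3)
print(any_smoke, heavy)
```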
--- a/open_meteo_historical_data_centers.ipynb
+++ b/open_meteo_historical_data_centers.ipynb
@@ -844,21 +844,18 @@
   "source": [
    "## Tables Created by This Notebook and Their Relationships\n",
    "\n",
-    "This notebook creates and/or maintains one primary PostGIS table:\n",
+    "### Tables Created / Maintained\n",
-    "\n",
    "1. `public.data_center_historical_climate`\n",
-    "- One row per data center (`master_id`).\n",
+    "- One row per `master_id` with climate summary fields and geometry.\n",
-    "- Stores climate summary metrics (temperature, humidity, wet-bulb, precipitation variability, cooling-degree-days, wind fields/status), geometry, and lineage timestamps.\n",
+    "- Populated by incremental upsert so reruns refresh existing sites and add new sites.\n",
-    "- Upserted incrementally so reruns refresh changed rows without duplicating records.\n",
    "\n",
-    "### Relationship Summary\n",
-    "\n",
+    "### Key Relationships\n",
    "- `public.master_data_centers (master_id)`\n",
    "  - 1-to-1 (effective) -> `public.data_center_historical_climate (master_id)`\n",
    "\n",
-    "`public.data_center_historical_climate.master_id` is a foreign key to `public.master_data_centers.master_id` (with cascade delete), so climate rows track the master data-center record set.\n",
+    "### Rerun Notes\n",
-    "\n",
+    "- Safe to rerun when the master data-center set changes.\n",
-    "In short: **`master_data_centers` is the entity table, and `data_center_historical_climate` is its one-row-per-site climate feature extension.**"
+    "- Existing rows are updated in place; no duplicate-per-site history table is created by this notebook."
   ]
  }
 ],
--- a/postgis_table_loader.ipynb
+++ b/postgis_table_loader.ipynb
@@ -538,6 +538,29 @@
    "        for row in cur.fetchall():\n",
    "            print(f'{row[0]}.{row[1]}')"
   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "11",
+   "metadata": {},
+   "source": [
+    "## Tables Created by This Notebook and Their Relationships\n",
+    "\n",
+    "### Tables Created / Maintained\n",
+    "1. `TARGET_TABLE` (configured at runtime)\n",
+    "- Generic loader output table built from the current dataframe schema.\n",
+    "- Replaced/appended according to `if_exists` behavior.\n",
+    "- Optional point geometry can be added in helper cells.\n",
+    "\n",
+    "### Key Relationships\n",
+    "- This notebook is table-agnostic: relationships depend on the selected `TARGET_TABLE` and source columns.\n",
+    "- When key columns (for example `master_id`, `geoid`, IDs, dates) are present, the loaded table can be joined to domain tables.\n",
+    "- When geometry is present, the loaded table can participate in spatial joins.\n",
+    "\n",
+    "### Rerun Notes\n",
+    "- Safe to rerun for recurring refreshes of different source files.\n",
+    "- Always confirm `TARGET_TABLE` and `if_exists` before execution to avoid unintended replacement of existing tables."
+   ]
  }
 ],
 "metadata": {
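The loader's rerun note asks for `TARGET_TABLE` and `if_exists` to be confirmed before writing. The following is a hypothetical pre-flight check. It assumes the loader forwards `if_exists` to pandas' `DataFrame.to_sql`, whose long-standing modes are `'fail'`, `'replace'`, and `'append'`. It also assumes that existing tables are available as `schema.table` strings, like those printed by the table-listing cell. The helper name, the `allow_replace` flag, and the sample table names are illustrative.

```python
VALID_IF_EXISTS = {"fail", "replace", "append"}

def check_load_plan(target_table, if_exists, existing_tables, allow_replace=False):
    """Hypothetical guard: describe the planned write, or raise before anything is modified.

    target_table: 'schema.table' name (the notebook's TARGET_TABLE).
    existing_tables: set of 'schema.table' names already in the database.
    """
    if if_exists not in VALID_IF_EXISTS:
        raise ValueError(f"if_exists must be one of {sorted(VALID_IF_EXISTS)}, got {if_exists!r}")
    if target_table not in existing_tables:
        return f"create {target_table}"
    if if_exists == "fail":
        raise ValueError(f"{target_table} exists and if_exists='fail'")
    if if_exists == "replace" and not allow_replace:
        raise ValueError(f"refusing to replace {target_table}; pass allow_replace=True to confirm")
    return f"{if_exists} {target_table}"

existing = {"public.master_data_centers", "public.ruca_codes_2020_tract"}
print(check_load_plan("public.new_site_list", "replace", existing))
print(check_load_plan("public.master_data_centers", "append", existing))
```

Requiring an explicit `allow_replace=True` makes overwriting an existing domain table a deliberate choice rather than a leftover setting from a previous run.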
--- a/rdh_precinct_vote_data_centers.ipynb
+++ b/rdh_precinct_vote_data_centers.ipynb
@@ -1676,36 +1676,20 @@
   "source": [
    "## Tables Created by This Notebook and Their Relationships\n",
    "\n",
-    "This notebook creates and/or maintains the following PostGIS/PostgreSQL tables:\n",
+    "### Tables Created / Maintained\n",
-    "\n",
    "1. `public.rdh_precinct_vote_layers`\n",
-    "- One row per RDH precinct-election layer ingested.\n",
+    "- One row per ingested precinct-election layer.\n",
-    "- Key columns: `layer_id` (PK), `state_code`, `title`, `format`, file/source metadata, `loaded_at`.\n",
    "\n",
    "2. `public.rdh_precinct_vote_features`\n",
-    "- One row per precinct polygon feature from a loaded layer.\n",
+    "- One row per precinct geometry feature with source properties JSON.\n",
-    "- Key columns: `feature_id` (PK), `layer_id` (FK), `state_code`, `source_row`, `properties` (JSONB), `geom` (MultiPolygon).\n",
-    "- Relationship: many features belong to one layer.\n",
    "\n",
    "3. `public.data_center_rdh_precinct_vote_matches`\n",
-    "- Spatial match table linking data centers to precinct features.\n",
+    "- Bridge table linking data centers to matched precinct features.\n",
-    "- Key columns: `master_id` (FK), `feature_id` (FK), `layer_id` (FK), `state_code`, `join_method`, `match_distance_m`, `matched_at`.\n",
-    "- Primary key: (`master_id`, `feature_id`).\n",
-    "- Relationship: many-to-many bridge between data centers and precinct features (with match metadata).\n",
    "\n",
    "4. `public.data_center_election_context`\n",
-    "- Final standardized, one-row-per-data-center election context used by downstream mapping/analysis.\n",
+    "- Standardized, one-row-per-data-center election context for downstream analysis/mapping.\n",
-    "- Key columns: `master_id` (PK, FK), `name`, `city`, `state`, `rdh_layer_title`,\n",
-    "  `precinct_identifier_name`, `election_year`, `office`, `democratic_votes`, `republican_votes`,\n",
-    "  `total_votes`, `turnout_or_vote_share`, `updated_at`.\n",
-    "- Relationship: one row per `master_id` in `public.master_data_centers` (left-joined so all master rows can be retained, even if election fields are null).\n",
    "\n",
-    "### Relationship Summary\n",
-    "\n",
-    "- `public.master_data_centers (master_id)`\n",
-    "  - 1-to-many -> `public.data_center_rdh_precinct_vote_matches (master_id)`\n",
-    "  - 1-to-1 (effective in this notebook) -> `public.data_center_election_context (master_id)`\n",
-    "\n",
+    "### Key Relationships\n",
    "- `public.rdh_precinct_vote_layers (layer_id)`\n",
    "  - 1-to-many -> `public.rdh_precinct_vote_features (layer_id)`\n",
    "  - 1-to-many -> `public.data_center_rdh_precinct_vote_matches (layer_id)`\n",
@@ -1713,7 +1697,13 @@
    "- `public.rdh_precinct_vote_features (feature_id)`\n",
    "  - 1-to-many -> `public.data_center_rdh_precinct_vote_matches (feature_id)`\n",
    "\n",
-    "In short: **layers -> features -> matches**, then matches are standardized into **one election-context row per data center**."
+    "- `public.master_data_centers (master_id)`\n",
+    "  - 1-to-many -> `public.data_center_rdh_precinct_vote_matches (master_id)`\n",
+    "  - 1-to-1 (effective) -> `public.data_center_election_context (master_id)`\n",
+    "\n",
+    "### Rerun Notes\n",
+    "- Safe to rerun as new RDH layers and/or data centers are added.\n",
+    "- Reruns refresh matching outputs and regenerate standardized election context rows."
   ]
  }
 ],
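The RDH cell describes a many-to-many bridge (`data_center_rdh_precinct_vote_matches`) that is standardized into one election-context row per data center. That step is left-joined so unmatched data centers are kept. The source does not say how a data center with several matched precincts is reduced to one row. The sketch below assumes the smallest `match_distance_m` wins, with ties broken by `feature_id`. It also flattens two vote fields out of the `properties` JSONB into hypothetical columns. SQLite stands in for PostgreSQL/PostGIS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Vote columns on the feature table are a hypothetical flattening of the real properties JSONB.
conn.executescript("""
CREATE TABLE master_data_centers (master_id TEXT PRIMARY KEY);
CREATE TABLE rdh_precinct_vote_features (
    feature_id INTEGER PRIMARY KEY, democratic_votes INTEGER, republican_votes INTEGER
);
CREATE TABLE data_center_rdh_precinct_vote_matches (
    master_id TEXT REFERENCES master_data_centers (master_id),
    feature_id INTEGER REFERENCES rdh_precinct_vote_features (feature_id),
    match_distance_m REAL,
    PRIMARY KEY (master_id, feature_id)
);
""")
conn.executemany("INSERT INTO master_data_centers VALUES (?)", [("dc1",), ("dc2",), ("dc3",)])
conn.executemany("INSERT INTO rdh_precinct_vote_features VALUES (?, ?, ?)",
                 [(1, 100, 50), (2, 30, 70)])
conn.executemany("INSERT INTO data_center_rdh_precinct_vote_matches VALUES (?, ?, ?)",
                 [("dc1", 1, 0.0), ("dc1", 2, 150.0),  # one DC matched to two precincts
                  ("dc2", 2, 0.0)])                     # precinct 2 also matched to dc2

# One row per master_id: closest match wins (assumed rule); unmatched DCs keep NULL fields.
context = conn.execute("""
    WITH best AS (
        SELECT master_id, feature_id,
               ROW_NUMBER() OVER (
                   PARTITION BY master_id ORDER BY match_distance_m, feature_id
               ) AS rn
        FROM data_center_rdh_precinct_vote_matches
    )
    SELECT m.master_id, f.democratic_votes, f.republican_votes
    FROM master_data_centers AS m
    LEFT JOIN best AS b ON b.master_id = m.master_id AND b.rn = 1
    LEFT JOIN rdh_precinct_vote_features AS f ON f.feature_id = b.feature_id
    ORDER BY m.master_id
""").fetchall()
print(context)
```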
--- a/spatial_clustering_master_data_centers.ipynb
+++ b/spatial_clustering_master_data_centers.ipynb
@@ -1116,6 +1116,27 @@
    "else:\n",
    "    print('WRITE_BACK_TO_DB is False; no database table was modified.')"
   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "32",
+   "metadata": {},
+   "source": [
+    "## Tables Created by This Notebook and Their Relationships\n",
+    "\n",
+    "### Tables Created / Maintained\n",
+    "1. `public.master_data_center_spatial_clusters` (optional write)\n",
+    "- One row per `master_id` with cluster label and clustering metadata.\n",
+    "- Written only when `WRITE_BACK_TO_DB = True`.\n",
+    "\n",
+    "### Key Relationships\n",
+    "- `public.master_data_centers (master_id)`\n",
+    "  - 1-to-1 (effective) -> `public.master_data_center_spatial_clusters (master_id)`\n",
+    "\n",
+    "### Rerun Notes\n",
+    "- Default behavior (`WRITE_BACK_TO_DB = False`) performs no table writes.\n",
+    "- With write-back enabled, reruns replace cluster assignments using the current parameters/data."
+   ]
  }
 ],
 "metadata": {
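With `WRITE_BACK_TO_DB` enabled, reruns replace cluster assignments. One way to get replace semantics is a DELETE plus INSERT inside a single transaction, sketched below with SQLite. The `write_clusters` helper and the `cluster_label` column are hypothetical, and the notebook may instead drop and recreate the table. In either case, a full replacement also removes any `master_id` that is absent from the current run.

```python
import sqlite3

WRITE_BACK_TO_DB = False  # notebook default: clustering runs without touching the database

def write_clusters(conn, assignments, write_back):
    """Replace every row of master_data_center_spatial_clusters with this run's labels.

    assignments: iterable of (master_id, cluster_label) pairs. Returns the rows written.
    """
    if not write_back:
        return 0
    rows = list(assignments)
    with conn:  # DELETE and INSERT commit together; an error rolls both back
        conn.execute("DELETE FROM master_data_center_spatial_clusters")
        conn.executemany(
            "INSERT INTO master_data_center_spatial_clusters (master_id, cluster_label) VALUES (?, ?)",
            rows,
        )
    return len(rows)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE master_data_center_spatial_clusters (master_id TEXT PRIMARY KEY, cluster_label INTEGER)"
)
write_clusters(conn, [("dc1", 0), ("dc2", 1)], write_back=True)
write_clusters(conn, [("dc1", 2)], write_back=True)  # rerun with new parameters; dc2 is dropped
skipped = write_clusters(conn, [("dc9", 5)], write_back=WRITE_BACK_TO_DB)  # default: no write
print(conn.execute("SELECT master_id, cluster_label FROM master_data_center_spatial_clusters").fetchall(), skipped)
```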
--- a/usdm_drought_data_centers.ipynb
+++ b/usdm_drought_data_centers.ipynb
@@ -677,134 +677,32 @@
   "id": "16",
   "metadata": {},
   "source": [
-    "## Tables Created\n",
+    "## Tables Created by This Notebook and Their Relationships\n",
    "\n",
-    "This notebook builds three tables in the `public` schema, all keyed (directly or transitively) to `master_data_centers.master_id`.\n",
+    "### Tables Created / Maintained\n",
+    "1. `public.usdm_drought_weekly`\n",
+    "- Weekly USDM drought polygons by `week_date` and drought category.\n",
    "\n",
-    "---\n",
+    "2. `public.data_center_usdm_drought_dc_week`\n",
+    "- One row per `(master_id, week_date)` with weekly worst drought category at each data center.\n",
    "\n",
-    "### 1. `public.usdm_drought_weekly`\n",
+    "3. `public.data_center_usdm_drought_exposure`\n",
+    "- One row per `master_id` with summary drought-exposure metrics and streak fields.\n",
    "\n",
-    "Raw weekly USDM drought polygons — one row per `(week_date, dm_category)` (occasionally multiple rows for early-USDM weeks that published per-category fragments). Source of truth for any later spatial query against the drought record.\n",
+    "### Key Relationships\n",
+    "- `public.usdm_drought_weekly (week_date, dm_category, geom)`\n",
+    "  - spatial/time source for -> `public.data_center_usdm_drought_dc_week`\n",
    "\n",
-    "| Column | Type | Meaning |\n",
+    "- `public.master_data_centers (master_id)`\n",
-    "|---|---|---|\n",
+    "  - 1-to-many -> `public.data_center_usdm_drought_dc_week (master_id, week_date)`\n",
-    "| `id` | `bigserial` PK | Surrogate row id |\n",
+    "  - 1-to-1 (effective) -> `public.data_center_usdm_drought_exposure (master_id)`\n",
-    "| `week_date` | `date` | Tuesday-of-publication date parsed from filename (`USDM_YYYYMMDD_M.zip`) |\n",
-    "| `dm_category` | `smallint` | 0=D0 Abnormally Dry, 1=D1 Moderate, 2=D2 Severe, 3=D3 Extreme, 4=D4 Exceptional. **Cumulative** — D4 polygon is inside D3 inside D2… |\n",
-    "| `objectid`, `shape_leng`, `shape_area` | original shapefile attributes |\n",
-    "| `geom` | `geometry(MultiPolygon, 4326)` | Drought-affected area for that category that week |\n",
    "\n",
-    "**Indexes:** GIST on `geom`, btree on `week_date`.\n",
+    "- `public.data_center_usdm_drought_dc_week`\n",
+    "  - many-to-1 summary rollup -> `public.data_center_usdm_drought_exposure`\n",
    "\n",
-    "**Size:** ~12,000 polygon rows across 1,356 weeks (Jan 2000 – mid 2025).\n",
+    "### Rerun Notes\n",
-    "\n",
+    "- Supports repeat runs when new USDM weeks or new data centers are added.\n",
-    "**Example uses:**\n",
+    "- Weekly table can be reloaded and the downstream `dc_week` + `exposure` tables can be recomputed from that source."
-    "```sql\n",
-    "-- Map of D3+ drought in August 2022\n",
-    "SELECT week_date, dm_category, geom\n",
-    "FROM usdm_drought_weekly\n",
-    "WHERE week_date = '2022-08-30' AND dm_category >= 3;\n",
-    "\n",
-    "-- Worst week ever for a specific lat/lon\n",
-    "SELECT week_date, MAX(dm_category) AS worst_dm\n",
-    "FROM usdm_drought_weekly\n",
-    "WHERE ST_Within(ST_SetSRID(ST_MakePoint(-98.5, 29.5), 4326), geom)\n",
-    "GROUP BY week_date ORDER BY worst_dm DESC, week_date LIMIT 10;\n",
-    "```\n",
-    "\n",
-    "---\n",
-    "\n",
-    "### 2. `public.data_center_usdm_drought_dc_week`\n",
-    "\n",
-    "Long-form per-(DC, week) intermediate. One row per data center per USDM week observed; useful for time-series and streak analysis. Computed from `usdm_drought_weekly` via spatial join, then back-filled so every covered DC has a row for every week.\n",
-    "\n",
-    "| Column | Type | Meaning |\n",
-    "|---|---|---|\n",
-    "| `master_id` | `text` PK (composite) | FK → `master_data_centers.master_id` |\n",
-    "| `week_date` | `date` PK (composite) | USDM week |\n",
-    "| `worst_dm` | `smallint` | Max `dm_category` whose polygon contained the DC point that week. **`-1` means observed week but no drought polygon contained the DC** (filter `worst_dm >= 0` for actual drought weeks) |\n",
-    "\n",
-    "**Indexes:** PK on `(master_id, week_date)`, btree on `week_date`, btree on `worst_dm`.\n",
-    "\n",
-    "**Size:** ~2.5 M rows (1,833 DCs × 1,356 weeks, minus DCs not covered by USDM).\n",
-    "\n",
-    "**Example uses:**\n",
-    "```sql\n",
-    "-- Drought timeline for one DC\n",
-    "SELECT week_date, worst_dm\n",
-    "FROM data_center_usdm_drought_dc_week\n",
-    "WHERE master_id = 'curated/1010260676' AND worst_dm >= 0\n",
-    "ORDER BY week_date;\n",
-    "\n",
-    "-- DCs that were in D4 during a specific week\n",
-    "SELECT master_id FROM data_center_usdm_drought_dc_week\n",
-    "WHERE week_date = '2012-07-24' AND worst_dm = 4;\n",
-    "```\n",
-    "\n",
-    "If you only need the per-DC summary, this table can be dropped — it's regenerable from `usdm_drought_weekly` + `master_data_centers`.\n",
-    "\n",
-    "---\n",
-    "\n",
-    "### 3. `public.data_center_usdm_drought_exposure`\n",
-    "\n",
-    "Per-DC drought-exposure summary keyed by `master_id`. The analytical surface — one row per data center with all the headline metrics. Joinable directly to `master_data_centers` and `data_center_historical_climate`.\n",
-    "\n",
-    "| Column | Type | Meaning |\n",
-    "|---|---|---|\n",
-    "| `master_id` | `text` PK | FK → `master_data_centers.master_id` |\n",
-    "| Identity cols | `source`, `name`, `operator`, `city`, `state`, `country`, `longitude`, `latitude`, `geom` — denormalized from master for convenience |\n",
-    "| `usdm_status` | `text` | `'covered'` (USDM zone) or `'no_coverage'` (outside USDM extent) |\n",
-    "| `drought_period_start`, `drought_period_end` | `date` | First / last USDM week observed for this DC |\n",
-    "| `weeks_observed` | `int` | Total weekly observations |\n",
-    "| `weeks_in_d0_or_worse` … `weeks_in_d4` | `int` | Cumulative weekly counts at each severity threshold |\n",
-    "| `pct_weeks_in_d0_or_worse` … `pct_weeks_in_d4` | `double` | Same as ratios over `weeks_observed` |\n",
-    "| `worst_dm_category` | `smallint` | Max DM ever experienced (0–4) |\n",
-    "| `mean_dm_category` | `double` | Average DM across all weeks, treating no-drought (`-1`) as 0 |\n",
-    "| `longest_d0_streak_weeks` | `int` | Longest consecutive run with any drought (D0+) |\n",
-    "| `longest_d2_streak_weeks` | `int` | Longest consecutive run with severe drought (D2+) — **the headline streak metric** |\n",
-    "| `longest_d3_streak_weeks` | `int` | Longest consecutive run with extreme drought (D3+) |\n",
-    "| `fetched_at`, `updated_at` | `timestamptz` | Provenance |\n",
-    "\n",
-    "**Indexes:** GIST on `geom`, btree on `state`, btree on `worst_dm_category`.\n",
-    "\n",
-    "**Size:** 1,833 rows (one per master DC; PR sites flagged `no_coverage` if applicable).\n",
-    "\n",
-    "**Headline metric for site-selection analysis:** `pct_weeks_in_d2_or_worse`. D2 = \"Severe Drought\" is the threshold at which water-use restrictions typically kick in for utilities and municipalities.\n",
-    "\n",
-    "**Example: joined climate + drought view for cooling-water risk analysis**\n",
-    "```sql\n",
-    "SELECT\n",
-    "    c.master_id, c.name, c.state,\n",
-    "    c.cooling_degree_days_c,                  -- baseline cooling load\n",
-    "    c.mean_wet_bulb_temperature_c,            -- evaporative-cooling efficiency\n",
-    "    d.pct_weeks_in_d2_or_worse * 100 AS pct_severe_drought,\n",
-    "    d.longest_d2_streak_weeks,\n",
-    "    d.worst_dm_category\n",
-    "FROM data_center_historical_climate c\n",
-    "JOIN data_center_usdm_drought_exposure d USING (master_id)\n",
-    "WHERE d.usdm_status = 'covered'\n",
-    "ORDER BY (c.cooling_degree_days_c * d.pct_weeks_in_d2_or_worse) DESC\n",
-    "LIMIT 25;\n",
-    "```\n",
-    "\n",
-    "---\n",
-    "\n",
-    "### Relationship diagram\n",
-    "\n",
-    "```\n",
-    "master_data_centers (master_id PK)\n",
-    "        │\n",
-    "        ├── data_center_historical_climate     (master_id PK)  ← from open_meteo/Daymet/gridMET notebook\n",
-    "        │\n",
-    "        └── data_center_usdm_drought_exposure  (master_id PK)  ← this notebook\n",
-    "                  │\n",
-    "                  └── data_center_usdm_drought_dc_week  (master_id, week_date)\n",
-    "                                  │\n",
-    "                                  └── usdm_drought_weekly  (id PK, week_date, dm_category, geom)\n",
-    "```\n",
-    "\n",
-    "All three USDM tables are regenerable from the zip files in `USDM Shape Files/`. `RELOAD_WEEKLY=True` rebuilds from scratch; `RECOMPUTE_SUMMARY=True` (default) recomputes the dc-week + exposure tables from whatever's in `usdm_drought_weekly`.\n"
   ]
  }
 ],
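The USDM cell defines the per-site exposure metrics in terms of weekly `worst_dm` values, with `-1` for an observed week with no drought polygon and 0-4 for D0-D4. Percentages are ratios over `weeks_observed`, the mean treats `-1` as 0, and streaks are the longest consecutive runs at D0+, D2+, and D3+. The following is a minimal plain-Python sketch of those definitions for one data center; the function name is hypothetical. It assumes the `dc_week` rows are back-filled and sorted by `week_date`, so adjacent list items are adjacent USDM weeks. The source does not state how `worst_dm_category` is stored for a site that never saw drought, so returning `None` for that case is an assumption.

```python
def drought_summary(worst_dm_by_week):
    """Exposure metrics for one data center from its weekly worst_dm values.

    worst_dm_by_week: worst_dm per USDM week, sorted by week_date; -1 = no drought that week.
    """
    weeks = len(worst_dm_by_week)
    if weeks == 0:
        return None

    def longest_streak(threshold):
        # Longest run of consecutive weeks with worst_dm >= threshold.
        best = run = 0
        for dm in worst_dm_by_week:
            run = run + 1 if dm >= threshold else 0
            best = max(best, run)
        return best

    worst = max(worst_dm_by_week)
    return {
        "weeks_observed": weeks,
        "worst_dm_category": worst if worst >= 0 else None,  # None = never in drought (assumption)
        "mean_dm_category": sum(max(dm, 0) for dm in worst_dm_by_week) / weeks,
        "pct_weeks_in_d2_or_worse": sum(dm >= 2 for dm in worst_dm_by_week) / weeks,
        "longest_d0_streak_weeks": longest_streak(0),
        "longest_d2_streak_weeks": longest_streak(2),
        "longest_d3_streak_weeks": longest_streak(3),
    }

summary = drought_summary([-1, 0, 2, 3, 2, -1, 2, 2, 4])
print(summary)
```

As in the cell's example query, multiply `pct_weeks_in_d2_or_worse` by 100 for a percentage.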