Standardize notebook table-relationship documentation cells
@@ -1486,29 +1486,23 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"## Tables Created by This Notebook and Their Relationships\n",
|
"## Tables Created by This Notebook and Their Relationships\n",
|
||||||
"\n",
|
"\n",
|
||||||
"This notebook creates and/or maintains five PostgreSQL tables in the `public` schema:\n",
|
"### Tables Created / Maintained\n",
|
||||||
"\n",
|
|
||||||
"1. `public.fcc_bdc_as_of`\n",
|
"1. `public.fcc_bdc_as_of`\n",
|
||||||
"- One row per FCC BDC release date and data type.\n",
|
"- Release/version metadata by `as_of_date`.\n",
|
||||||
"- Primary metadata table used to track versioning (`as_of_date`) for downstream loads.\n",
|
|
||||||
"\n",
|
"\n",
|
||||||
"2. `public.fcc_bdc_files`\n",
|
"2. `public.fcc_bdc_files`\n",
|
||||||
"- One row per file discovered/downloaded for a release.\n",
|
"- File-level lineage records for each FCC BDC release.\n",
|
||||||
"- Linked to releases via `as_of_date` and used as file-level lineage/provenance.\n",
|
|
||||||
"\n",
|
"\n",
|
||||||
"3. `public.fcc_bdc_broadband_by_datacenter`\n",
|
"3. `public.fcc_bdc_broadband_by_datacenter`\n",
|
||||||
"- Fact table keyed by `(master_id, as_of_date)` for per-data-center broadband availability metrics.\n",
|
"- Per-data-center broadband fact table keyed by `(master_id, as_of_date)`.\n",
|
||||||
"- Includes scalar broadband fields and summary JSON payloads.\n",
|
|
||||||
"- `master_id` aligns with `public.master_data_centers.master_id`.\n",
|
|
||||||
"\n",
|
"\n",
|
||||||
"4. `public.fcc_bdc_broadband_summary`\n",
|
"4. `public.fcc_bdc_broadband_summary`\n",
|
||||||
"- Aggregated summary metrics by release (`as_of_date`) used for QA and reporting.\n",
|
"- Release-level aggregate summary metrics.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"5. `public.fcc_bdc_provider_summary`\n",
|
"5. `public.fcc_bdc_provider_summary`\n",
|
||||||
"- Provider catalog/aggregation table by release (`as_of_date`) with provider class rollups.\n",
|
"- Release-level provider catalog and provider-class summary metrics.\n",
|
||||||
"\n",
|
|
||||||
"### Relationship Summary\n",
|
|
||||||
"\n",
|
"\n",
|
||||||
|
"### Key Relationships\n",
|
||||||
"- `public.fcc_bdc_as_of (as_of_date)`\n",
|
"- `public.fcc_bdc_as_of (as_of_date)`\n",
|
||||||
" - 1-to-many -> `public.fcc_bdc_files (as_of_date)`\n",
|
" - 1-to-many -> `public.fcc_bdc_files (as_of_date)`\n",
|
||||||
" - 1-to-many -> `public.fcc_bdc_broadband_by_datacenter (as_of_date)`\n",
|
" - 1-to-many -> `public.fcc_bdc_broadband_by_datacenter (as_of_date)`\n",
|
||||||
@@ -1518,7 +1512,9 @@
|
|||||||
"- `public.master_data_centers (master_id)`\n",
|
"- `public.master_data_centers (master_id)`\n",
|
||||||
" - 1-to-many over time -> `public.fcc_bdc_broadband_by_datacenter (master_id, as_of_date)`\n",
|
" - 1-to-many over time -> `public.fcc_bdc_broadband_by_datacenter (master_id, as_of_date)`\n",
|
||||||
"\n",
|
"\n",
|
||||||
"In short: **release metadata (`as_of` + `files`) supports reproducible loads, while per-DC broadband facts and release-level/provider-level summaries support analysis.**"
|
"### Rerun Notes\n",
|
||||||
|
"- The notebook is designed for repeat refreshes as new FCC releases arrive.\n",
|
||||||
|
"- Use `as_of_date` as the version key when comparing snapshots over time."
|
||||||
]
|
]
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
|
|||||||
@@ -916,6 +916,28 @@
|
|||||||
"print('Top non-metro watersheds (RUCA 4-10):')\n",
|
"print('Top non-metro watersheds (RUCA 4-10):')\n",
|
||||||
"nm_ws.head(15).reset_index(drop=True)\n"
|
"nm_ws.head(15).reset_index(drop=True)\n"
|
||||||
]
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"id": "25",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Tables Created by This Notebook and Their Relationships\n",
|
||||||
|
"\n",
|
||||||
|
"### Tables Created / Maintained\n",
|
||||||
|
"1. `public.ruca_codes_2020_tract`\n",
|
||||||
|
"- Tract-level RUCA lookup loaded from `new/RUCA-codes-2020-tract.csv`.\n",
|
||||||
|
"- Rebuilt with drop + recreate during load.\n",
|
||||||
|
"- Primary key: `tract_fips_20`.\n",
|
||||||
|
"\n",
|
||||||
|
"### Key Relationships\n",
|
||||||
|
"- `public.master_data_centers (geoid)`\n",
|
||||||
|
" - many-to-1 -> `public.ruca_codes_2020_tract (tract_fips_20)`\n",
|
||||||
|
"\n",
|
||||||
|
"### Rerun Notes\n",
|
||||||
|
"- Rerunning refreshes the RUCA lookup table from the latest CSV.\n",
|
||||||
|
"- Downstream joins in this notebook read from this table but do not create additional persistent analysis tables."
|
||||||
|
]
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
"metadata": {
|
"metadata": {
|
||||||
|
|||||||
@@ -895,21 +895,18 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"## Tables Created by This Notebook and Their Relationships\n",
|
"## Tables Created by This Notebook and Their Relationships\n",
|
||||||
"\n",
|
"\n",
|
||||||
"This notebook creates and/or maintains one primary PostGIS table:\n",
|
"### Tables Created / Maintained\n",
|
||||||
"\n",
|
|
||||||
"1. `public.data_center_historical_climate`\n",
|
"1. `public.data_center_historical_climate`\n",
|
||||||
"- One row per data center (`master_id`).\n",
|
"- One row per `master_id` with climate summary fields and geometry.\n",
|
||||||
"- Stores climate summary metrics (temperature, humidity, wet-bulb, precipitation variability, cooling-degree-days, wind fields/status), geometry, and lineage timestamps.\n",
|
"- Populated by incremental upsert so reruns refresh existing sites and add new sites.\n",
|
||||||
"- Upserted incrementally so reruns refresh changed rows without duplicating records.\n",
|
|
||||||
"\n",
|
|
||||||
"### Relationship Summary\n",
|
|
||||||
"\n",
|
"\n",
|
||||||
|
"### Key Relationships\n",
|
||||||
"- `public.master_data_centers (master_id)`\n",
|
"- `public.master_data_centers (master_id)`\n",
|
||||||
" - 1-to-1 (effective) -> `public.data_center_historical_climate (master_id)`\n",
|
" - 1-to-1 (effective) -> `public.data_center_historical_climate (master_id)`\n",
|
||||||
"\n",
|
"\n",
|
||||||
"`public.data_center_historical_climate.master_id` is a foreign key to `public.master_data_centers.master_id` (with cascade delete), so climate rows track the master data-center record set.\n",
|
"### Rerun Notes\n",
|
||||||
"\n",
|
"- Safe to rerun when the master data-center set changes.\n",
|
||||||
"In short: **`master_data_centers` is the entity table, and `data_center_historical_climate` is its one-row-per-site climate feature extension.**"
|
"- Existing rows are updated in place; no duplicate-per-site history table is created by this notebook."
|
||||||
]
|
]
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
|
|||||||
@@ -1184,23 +1184,35 @@
|
|||||||
"id": "22",
|
"id": "22",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Tables Created\n",
|
"## Tables Created by This Notebook and Their Relationships\n",
|
||||||
"\n",
|
"\n",
|
||||||
"This notebook creates four PostGIS tables for NOAA HMS smoke exposure analysis. The tables are designed to separate source observations, raw geometries, long-form data-center exposure, and the final per-site summary.\n",
|
"### Tables Created / Maintained\n",
|
||||||
|
"1. `public.hms_smoke_days`\n",
|
||||||
|
"- One row per observed HMS product day (daily denominator table).\n",
|
||||||
"\n",
|
"\n",
|
||||||
"| Table | Grain | Purpose |\n",
|
"2. `public.hms_smoke_daily`\n",
|
||||||
"|---|---|---|\n",
|
"- One row per smoke polygon geometry from HMS source products.\n",
|
||||||
"| `public.hms_smoke_days` | One row per observed HMS product day | Denominator table for daily percentages, including days with zero smoke polygons. Stores `smoke_date`, source metadata, and `feature_count`. |\n",
|
|
||||||
"| `public.hms_smoke_daily` | One row per HMS smoke polygon | Raw smoke plume geometry table. Stores `smoke_date`, satellite/time fields, normalized `density`, `density_rank`, source metadata, and `geom`. |\n",
|
|
||||||
"| `public.data_center_hms_smoke_dc_day` | One row per `(master_id, smoke_date)` | Long-form daily exposure table for every data center on every observed HMS day. `max_density_rank = 0` means observed no smoke; `1`, `2`, and `3` represent light/unspecified, medium, and heavy smoke exposure. |\n",
|
|
||||||
"| `public.data_center_hms_smoke_exposure` | One row per `master_id` | Final per-data-center summary table joinable to `public.master_data_centers`. Includes location fields, observation status, smoke-period dates, exposure-day counts, percentage metrics, worst/mean density, and longest streak metrics. |\n",
|
|
||||||
"\n",
|
"\n",
|
||||||
"Recommended use:\n",
|
"3. `public.data_center_hms_smoke_dc_day`\n",
|
||||||
|
"- One row per `(master_id, smoke_date)` with daily smoke exposure classification.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"- Use `public.data_center_hms_smoke_exposure` for most site-level analysis and ranking.\n",
|
"4. `public.data_center_hms_smoke_exposure`\n",
|
||||||
"- Use `public.data_center_hms_smoke_dc_day` for time-series analysis, seasonal summaries, or custom thresholds.\n",
|
"- One row per `master_id` with summary smoke-exposure metrics.\n",
|
||||||
"- Use `public.hms_smoke_daily` when you need the original smoke plume geometries for mapping or spatial QA.\n",
|
"\n",
|
||||||
"- Use `public.hms_smoke_days` whenever calculating percentages so no-smoke observed days remain in the denominator."
|
"### Key Relationships\n",
|
||||||
|
"- `public.hms_smoke_days (smoke_date)`\n",
|
||||||
|
" - 1-to-many -> `public.hms_smoke_daily (smoke_date)`\n",
|
||||||
|
"\n",
|
||||||
|
"- `public.master_data_centers (master_id)`\n",
|
||||||
|
" - 1-to-many -> `public.data_center_hms_smoke_dc_day (master_id, smoke_date)`\n",
|
||||||
|
" - 1-to-1 (effective) -> `public.data_center_hms_smoke_exposure (master_id)`\n",
|
||||||
|
"\n",
|
||||||
|
"- `public.data_center_hms_smoke_dc_day`\n",
|
||||||
|
" - many-to-1 summary rollup -> `public.data_center_hms_smoke_exposure`\n",
|
||||||
|
"\n",
|
||||||
|
"### Rerun Notes\n",
|
||||||
|
"- Designed for repeat refreshes as additional HMS days become available.\n",
|
||||||
|
"- Summary exposure table is recomputed from daily source/bridge tables so results stay consistent after reloads."
|
||||||
]
|
]
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
|
|||||||
@@ -844,21 +844,18 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"## Tables Created by This Notebook and Their Relationships\n",
|
"## Tables Created by This Notebook and Their Relationships\n",
|
||||||
"\n",
|
"\n",
|
||||||
"This notebook creates and/or maintains one primary PostGIS table:\n",
|
"### Tables Created / Maintained\n",
|
||||||
"\n",
|
|
||||||
"1. `public.data_center_historical_climate`\n",
|
"1. `public.data_center_historical_climate`\n",
|
||||||
"- One row per data center (`master_id`).\n",
|
"- One row per `master_id` with climate summary fields and geometry.\n",
|
||||||
"- Stores climate summary metrics (temperature, humidity, wet-bulb, precipitation variability, cooling-degree-days, wind fields/status), geometry, and lineage timestamps.\n",
|
"- Populated by incremental upsert so reruns refresh existing sites and add new sites.\n",
|
||||||
"- Upserted incrementally so reruns refresh changed rows without duplicating records.\n",
|
|
||||||
"\n",
|
|
||||||
"### Relationship Summary\n",
|
|
||||||
"\n",
|
"\n",
|
||||||
|
"### Key Relationships\n",
|
||||||
"- `public.master_data_centers (master_id)`\n",
|
"- `public.master_data_centers (master_id)`\n",
|
||||||
" - 1-to-1 (effective) -> `public.data_center_historical_climate (master_id)`\n",
|
" - 1-to-1 (effective) -> `public.data_center_historical_climate (master_id)`\n",
|
||||||
"\n",
|
"\n",
|
||||||
"`public.data_center_historical_climate.master_id` is a foreign key to `public.master_data_centers.master_id` (with cascade delete), so climate rows track the master data-center record set.\n",
|
"### Rerun Notes\n",
|
||||||
"\n",
|
"- Safe to rerun when the master data-center set changes.\n",
|
||||||
"In short: **`master_data_centers` is the entity table, and `data_center_historical_climate` is its one-row-per-site climate feature extension.**"
|
"- Existing rows are updated in place; no duplicate-per-site history table is created by this notebook."
|
||||||
]
|
]
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
|
|||||||
@@ -538,6 +538,29 @@
|
|||||||
" for row in cur.fetchall():\n",
|
" for row in cur.fetchall():\n",
|
||||||
" print(f'{row[0]}.{row[1]}')"
|
" print(f'{row[0]}.{row[1]}')"
|
||||||
]
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"id": "11",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Tables Created by This Notebook and Their Relationships\n",
|
||||||
|
"\n",
|
||||||
|
"### Tables Created / Maintained\n",
|
||||||
|
"1. `TARGET_TABLE` (configured at runtime)\n",
|
||||||
|
"- Generic loader output table built from the current dataframe schema.\n",
|
||||||
|
"- Replaced/appended according to `if_exists` behavior.\n",
|
||||||
|
"- Optional point geometry can be added in helper cells.\n",
|
||||||
|
"\n",
|
||||||
|
"### Key Relationships\n",
|
||||||
|
"- This notebook is table-agnostic: relationships depend on the selected `TARGET_TABLE` and source columns.\n",
|
||||||
|
"- When key columns (for example `master_id`, `geoid`, IDs, dates) are present, the loaded table can be joined to domain tables.\n",
|
||||||
|
"- When geometry is present, the loaded table can participate in spatial joins.\n",
|
||||||
|
"\n",
|
||||||
|
"### Rerun Notes\n",
|
||||||
|
"- Safe to rerun for recurring refreshes of different source files.\n",
|
||||||
|
"- Always confirm `TARGET_TABLE` and `if_exists` before execution to avoid unintended replacement of existing tables."
|
||||||
|
]
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
"metadata": {
|
"metadata": {
|
||||||
|
|||||||
@@ -1676,36 +1676,20 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"## Tables Created by This Notebook and Their Relationships\n",
|
"## Tables Created by This Notebook and Their Relationships\n",
|
||||||
"\n",
|
"\n",
|
||||||
"This notebook creates and/or maintains the following PostGIS/PostgreSQL tables:\n",
|
"### Tables Created / Maintained\n",
|
||||||
"\n",
|
|
||||||
"1. `public.rdh_precinct_vote_layers`\n",
|
"1. `public.rdh_precinct_vote_layers`\n",
|
||||||
"- One row per RDH precinct-election layer ingested.\n",
|
"- One row per ingested precinct-election layer.\n",
|
||||||
"- Key columns: `layer_id` (PK), `state_code`, `title`, `format`, file/source metadata, `loaded_at`.\n",
|
|
||||||
"\n",
|
"\n",
|
||||||
"2. `public.rdh_precinct_vote_features`\n",
|
"2. `public.rdh_precinct_vote_features`\n",
|
||||||
"- One row per precinct polygon feature from a loaded layer.\n",
|
"- One row per precinct geometry feature with source properties JSON.\n",
|
||||||
"- Key columns: `feature_id` (PK), `layer_id` (FK), `state_code`, `source_row`, `properties` (JSONB), `geom` (MultiPolygon).\n",
|
|
||||||
"- Relationship: many features belong to one layer.\n",
|
|
||||||
"\n",
|
"\n",
|
||||||
"3. `public.data_center_rdh_precinct_vote_matches`\n",
|
"3. `public.data_center_rdh_precinct_vote_matches`\n",
|
||||||
"- Spatial match table linking data centers to precinct features.\n",
|
"- Bridge table linking data centers to matched precinct features.\n",
|
||||||
"- Key columns: `master_id` (FK), `feature_id` (FK), `layer_id` (FK), `state_code`, `join_method`, `match_distance_m`, `matched_at`.\n",
|
|
||||||
"- Primary key: (`master_id`, `feature_id`).\n",
|
|
||||||
"- Relationship: many-to-many bridge between data centers and precinct features (with match metadata).\n",
|
|
||||||
"\n",
|
"\n",
|
||||||
"4. `public.data_center_election_context`\n",
|
"4. `public.data_center_election_context`\n",
|
||||||
"- Final standardized, one-row-per-data-center election context used by downstream mapping/analysis.\n",
|
"- Standardized, one-row-per-data-center election context for downstream analysis/mapping.\n",
|
||||||
"- Key columns: `master_id` (PK, FK), `name`, `city`, `state`, `rdh_layer_title`,\n",
|
|
||||||
" `precinct_identifier_name`, `election_year`, `office`, `democratic_votes`, `republican_votes`,\n",
|
|
||||||
" `total_votes`, `turnout_or_vote_share`, `updated_at`.\n",
|
|
||||||
"- Relationship: one row per `master_id` in `public.master_data_centers` (left-joined so all master rows can be retained, even if election fields are null).\n",
|
|
||||||
"\n",
|
|
||||||
"### Relationship Summary\n",
|
|
||||||
"\n",
|
|
||||||
"- `public.master_data_centers (master_id)`\n",
|
|
||||||
" - 1-to-many -> `public.data_center_rdh_precinct_vote_matches (master_id)`\n",
|
|
||||||
" - 1-to-1 (effective in this notebook) -> `public.data_center_election_context (master_id)`\n",
|
|
||||||
"\n",
|
"\n",
|
||||||
|
"### Key Relationships\n",
|
||||||
"- `public.rdh_precinct_vote_layers (layer_id)`\n",
|
"- `public.rdh_precinct_vote_layers (layer_id)`\n",
|
||||||
" - 1-to-many -> `public.rdh_precinct_vote_features (layer_id)`\n",
|
" - 1-to-many -> `public.rdh_precinct_vote_features (layer_id)`\n",
|
||||||
" - 1-to-many -> `public.data_center_rdh_precinct_vote_matches (layer_id)`\n",
|
" - 1-to-many -> `public.data_center_rdh_precinct_vote_matches (layer_id)`\n",
|
||||||
@@ -1713,7 +1697,13 @@
|
|||||||
"- `public.rdh_precinct_vote_features (feature_id)`\n",
|
"- `public.rdh_precinct_vote_features (feature_id)`\n",
|
||||||
" - 1-to-many -> `public.data_center_rdh_precinct_vote_matches (feature_id)`\n",
|
" - 1-to-many -> `public.data_center_rdh_precinct_vote_matches (feature_id)`\n",
|
||||||
"\n",
|
"\n",
|
||||||
"In short: **layers -> features -> matches**, then matches are standardized into **one election-context row per data center**."
|
"- `public.master_data_centers (master_id)`\n",
|
||||||
|
" - 1-to-many -> `public.data_center_rdh_precinct_vote_matches (master_id)`\n",
|
||||||
|
" - 1-to-1 (effective) -> `public.data_center_election_context (master_id)`\n",
|
||||||
|
"\n",
|
||||||
|
"### Rerun Notes\n",
|
||||||
|
"- Safe to rerun as new RDH layers and/or data centers are added.\n",
|
||||||
|
"- Reruns refresh matching outputs and regenerate standardized election context rows."
|
||||||
]
|
]
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
|
|||||||
@@ -1116,6 +1116,27 @@
|
|||||||
"else:\n",
|
"else:\n",
|
||||||
" print('WRITE_BACK_TO_DB is False; no database table was modified.')"
|
" print('WRITE_BACK_TO_DB is False; no database table was modified.')"
|
||||||
]
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"id": "32",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Tables Created by This Notebook and Their Relationships\n",
|
||||||
|
"\n",
|
||||||
|
"### Tables Created / Maintained\n",
|
||||||
|
"1. `public.master_data_center_spatial_clusters` (optional write)\n",
|
||||||
|
"- One row per `master_id` with cluster label and clustering metadata.\n",
|
||||||
|
"- Written only when `WRITE_BACK_TO_DB = True`.\n",
|
||||||
|
"\n",
|
||||||
|
"### Key Relationships\n",
|
||||||
|
"- `public.master_data_centers (master_id)`\n",
|
||||||
|
" - 1-to-1 (effective) -> `public.master_data_center_spatial_clusters (master_id)`\n",
|
||||||
|
"\n",
|
||||||
|
"### Rerun Notes\n",
|
||||||
|
"- Default behavior (`WRITE_BACK_TO_DB = False`) performs no table writes.\n",
|
||||||
|
"- With write-back enabled, reruns replace cluster assignments using the current parameters/data."
|
||||||
|
]
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
"metadata": {
|
"metadata": {
|
||||||
|
|||||||
@@ -677,134 +677,32 @@
|
|||||||
"id": "16",
|
"id": "16",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Tables Created\n",
|
"## Tables Created by This Notebook and Their Relationships\n",
|
||||||
"\n",
|
"\n",
|
||||||
"This notebook builds three tables in the `public` schema, all keyed (directly or transitively) to `master_data_centers.master_id`.\n",
|
"### Tables Created / Maintained\n",
|
||||||
|
"1. `public.usdm_drought_weekly`\n",
|
||||||
|
"- Weekly USDM drought polygons by `week_date` and drought category.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"---\n",
|
"2. `public.data_center_usdm_drought_dc_week`\n",
|
||||||
|
"- One row per `(master_id, week_date)` with weekly worst drought category at each data center.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"### 1. `public.usdm_drought_weekly`\n",
|
"3. `public.data_center_usdm_drought_exposure`\n",
|
||||||
|
"- One row per `master_id` with summary drought-exposure metrics and streak fields.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Raw weekly USDM drought polygons — one row per `(week_date, dm_category)` (occasionally multiple rows for early-USDM weeks that published per-category fragments). Source of truth for any later spatial query against the drought record.\n",
|
"### Key Relationships\n",
|
||||||
|
"- `public.usdm_drought_weekly (week_date, dm_category, geom)`\n",
|
||||||
|
" - spatial/time source for -> `public.data_center_usdm_drought_dc_week`\n",
|
||||||
"\n",
|
"\n",
|
||||||
"| Column | Type | Meaning |\n",
|
"- `public.master_data_centers (master_id)`\n",
|
||||||
"|---|---|---|\n",
|
" - 1-to-many -> `public.data_center_usdm_drought_dc_week (master_id, week_date)`\n",
|
||||||
"| `id` | `bigserial` PK | Surrogate row id |\n",
|
" - 1-to-1 (effective) -> `public.data_center_usdm_drought_exposure (master_id)`\n",
|
||||||
"| `week_date` | `date` | Tuesday-of-publication date parsed from filename (`USDM_YYYYMMDD_M.zip`) |\n",
|
|
||||||
"| `dm_category` | `smallint` | 0=D0 Abnormally Dry, 1=D1 Moderate, 2=D2 Severe, 3=D3 Extreme, 4=D4 Exceptional. **Cumulative** — D4 polygon is inside D3 inside D2… |\n",
|
|
||||||
"| `objectid`, `shape_leng`, `shape_area` | original shapefile attributes |\n",
|
|
||||||
"| `geom` | `geometry(MultiPolygon, 4326)` | Drought-affected area for that category that week |\n",
|
|
||||||
"\n",
|
"\n",
|
||||||
"**Indexes:** GIST on `geom`, btree on `week_date`.\n",
|
"- `public.data_center_usdm_drought_dc_week`\n",
|
||||||
|
" - many-to-1 summary rollup -> `public.data_center_usdm_drought_exposure`\n",
|
||||||
"\n",
|
"\n",
|
||||||
"**Size:** ~12,000 polygon rows across 1,356 weeks (Jan 2000 – mid 2025).\n",
|
"### Rerun Notes\n",
|
||||||
"\n",
|
"- Supports repeat runs when new USDM weeks or new data centers are added.\n",
|
||||||
"**Example uses:**\n",
|
"- Weekly table can be reloaded and the downstream `dc_week` + `exposure` tables can be recomputed from that source."
|
||||||
"```sql\n",
|
|
||||||
"-- Map of D3+ drought in August 2022\n",
|
|
||||||
"SELECT week_date, dm_category, geom\n",
|
|
||||||
"FROM usdm_drought_weekly\n",
|
|
||||||
"WHERE week_date = '2022-08-30' AND dm_category >= 3;\n",
|
|
||||||
"\n",
|
|
||||||
"-- Worst week ever for a specific lat/lon\n",
|
|
||||||
"SELECT week_date, MAX(dm_category) AS worst_dm\n",
|
|
||||||
"FROM usdm_drought_weekly\n",
|
|
||||||
"WHERE ST_Within(ST_SetSRID(ST_MakePoint(-98.5, 29.5), 4326), geom)\n",
|
|
||||||
"GROUP BY week_date ORDER BY worst_dm DESC, week_date LIMIT 10;\n",
|
|
||||||
"```\n",
|
|
||||||
"\n",
|
|
||||||
"---\n",
|
|
||||||
"\n",
|
|
||||||
"### 2. `public.data_center_usdm_drought_dc_week`\n",
|
|
||||||
"\n",
|
|
||||||
"Long-form per-(DC, week) intermediate. One row per data center per USDM week observed; useful for time-series and streak analysis. Computed from `usdm_drought_weekly` via spatial join, then back-filled so every covered DC has a row for every week.\n",
|
|
||||||
"\n",
|
|
||||||
"| Column | Type | Meaning |\n",
|
|
||||||
"|---|---|---|\n",
|
|
||||||
"| `master_id` | `text` PK (composite) | FK → `master_data_centers.master_id` |\n",
|
|
||||||
"| `week_date` | `date` PK (composite) | USDM week |\n",
|
|
||||||
"| `worst_dm` | `smallint` | Max `dm_category` whose polygon contained the DC point that week. **`-1` means observed week but no drought polygon contained the DC** (filter `worst_dm >= 0` for actual drought weeks) |\n",
|
|
||||||
"\n",
|
|
||||||
"**Indexes:** PK on `(master_id, week_date)`, btree on `week_date`, btree on `worst_dm`.\n",
|
|
||||||
"\n",
|
|
||||||
"**Size:** ~2.5 M rows (1,833 DCs × 1,356 weeks, minus DCs not covered by USDM).\n",
|
|
||||||
"\n",
|
|
||||||
"**Example uses:**\n",
|
|
||||||
"```sql\n",
|
|
||||||
"-- Drought timeline for one DC\n",
|
|
||||||
"SELECT week_date, worst_dm\n",
|
|
||||||
"FROM data_center_usdm_drought_dc_week\n",
|
|
||||||
"WHERE master_id = 'curated/1010260676' AND worst_dm >= 0\n",
|
|
||||||
"ORDER BY week_date;\n",
|
|
||||||
"\n",
|
|
||||||
"-- DCs that were in D4 during a specific week\n",
|
|
||||||
"SELECT master_id FROM data_center_usdm_drought_dc_week\n",
|
|
||||||
"WHERE week_date = '2012-07-24' AND worst_dm = 4;\n",
|
|
||||||
"```\n",
|
|
||||||
"\n",
|
|
||||||
"If you only need the per-DC summary, this table can be dropped — it's regenerable from `usdm_drought_weekly` + `master_data_centers`.\n",
|
|
||||||
"\n",
|
|
||||||
"---\n",
|
|
||||||
"\n",
|
|
||||||
"### 3. `public.data_center_usdm_drought_exposure`\n",
|
|
||||||
"\n",
|
|
||||||
"Per-DC drought-exposure summary keyed by `master_id`. The analytical surface — one row per data center with all the headline metrics. Joinable directly to `master_data_centers` and `data_center_historical_climate`.\n",
|
|
||||||
"\n",
|
|
||||||
"| Column | Type | Meaning |\n",
|
|
||||||
"|---|---|---|\n",
|
|
||||||
"| `master_id` | `text` PK | FK → `master_data_centers.master_id` |\n",
|
|
||||||
"| Identity cols | `source`, `name`, `operator`, `city`, `state`, `country`, `longitude`, `latitude`, `geom` — denormalized from master for convenience |\n",
|
|
||||||
"| `usdm_status` | `text` | `'covered'` (USDM zone) or `'no_coverage'` (outside USDM extent) |\n",
|
|
||||||
"| `drought_period_start`, `drought_period_end` | `date` | First / last USDM week observed for this DC |\n",
|
|
||||||
"| `weeks_observed` | `int` | Total weekly observations |\n",
|
|
||||||
"| `weeks_in_d0_or_worse` … `weeks_in_d4` | `int` | Cumulative weekly counts at each severity threshold |\n",
|
|
||||||
"| `pct_weeks_in_d0_or_worse` … `pct_weeks_in_d4` | `double` | Same as ratios over `weeks_observed` |\n",
|
|
||||||
"| `worst_dm_category` | `smallint` | Max DM ever experienced (0–4) |\n",
|
|
||||||
"| `mean_dm_category` | `double` | Average DM across all weeks, treating no-drought (`-1`) as 0 |\n",
|
|
||||||
"| `longest_d0_streak_weeks` | `int` | Longest consecutive run with any drought (D0+) |\n",
|
|
||||||
"| `longest_d2_streak_weeks` | `int` | Longest consecutive run with severe drought (D2+) — **the headline streak metric** |\n",
|
|
||||||
"| `longest_d3_streak_weeks` | `int` | Longest consecutive run with extreme drought (D3+) |\n",
|
|
||||||
"| `fetched_at`, `updated_at` | `timestamptz` | Provenance |\n",
|
|
||||||
"\n",
|
|
||||||
"**Indexes:** GIST on `geom`, btree on `state`, btree on `worst_dm_category`.\n",
|
|
||||||
"\n",
|
|
||||||
"**Size:** 1,833 rows (one per master DC; PR sites flagged `no_coverage` if applicable).\n",
|
|
||||||
"\n",
|
|
||||||
"**Headline metric for site-selection analysis:** `pct_weeks_in_d2_or_worse`. D2 = \"Severe Drought\" is the threshold at which water-use restrictions typically kick in for utilities and municipalities.\n",
|
|
||||||
"\n",
|
|
||||||
"**Example: joined climate + drought view for cooling-water risk analysis**\n",
|
|
||||||
"```sql\n",
|
|
||||||
"SELECT\n",
|
|
||||||
" c.master_id, c.name, c.state,\n",
|
|
||||||
" c.cooling_degree_days_c, -- baseline cooling load\n",
|
|
||||||
" c.mean_wet_bulb_temperature_c, -- evaporative-cooling efficiency\n",
|
|
||||||
" d.pct_weeks_in_d2_or_worse * 100 AS pct_severe_drought,\n",
|
|
||||||
" d.longest_d2_streak_weeks,\n",
|
|
||||||
" d.worst_dm_category\n",
|
|
||||||
"FROM data_center_historical_climate c\n",
|
|
||||||
"JOIN data_center_usdm_drought_exposure d USING (master_id)\n",
|
|
||||||
"WHERE d.usdm_status = 'covered'\n",
|
|
||||||
"ORDER BY (c.cooling_degree_days_c * d.pct_weeks_in_d2_or_worse) DESC\n",
|
|
||||||
"LIMIT 25;\n",
|
|
||||||
"```\n",
|
|
||||||
"\n",
|
|
||||||
"---\n",
|
|
||||||
"\n",
|
|
||||||
"### Relationship diagram\n",
|
|
||||||
"\n",
|
|
||||||
"```\n",
|
|
||||||
"master_data_centers (master_id PK)\n",
|
|
||||||
" │\n",
|
|
||||||
" ├── data_center_historical_climate (master_id PK) ← from open_meteo/Daymet/gridMET notebook\n",
|
|
||||||
" │\n",
|
|
||||||
" └── data_center_usdm_drought_exposure (master_id PK) ← this notebook\n",
|
|
||||||
" │\n",
|
|
||||||
" └── data_center_usdm_drought_dc_week (master_id, week_date)\n",
|
|
||||||
" │\n",
|
|
||||||
" └── usdm_drought_weekly (id PK, week_date, dm_category, geom)\n",
|
|
||||||
"```\n",
|
|
||||||
"\n",
|
|
||||||
"All three USDM tables are regenerable from the zip files in `USDM Shape Files/`. `RELOAD_WEEKLY=True` rebuilds from scratch; `RECOMPUTE_SUMMARY=True` (default) recomputes the dc-week + exposure tables from whatever's in `usdm_drought_weekly`.\n"
|
|
||||||
]
|
]
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
|
|||||||