diff --git a/output/data_center_demographic_ruca_energy_summary.docx b/output/data_center_demographic_ruca_energy_summary.docx
index eebca32..b61ae3c 100644
Binary files a/output/data_center_demographic_ruca_energy_summary.docx and b/output/data_center_demographic_ruca_energy_summary.docx differ
diff --git a/output/data_center_demographic_ruca_energy_summary.md b/output/data_center_demographic_ruca_energy_summary.md
index 6c38dc5..48c3b7b 100644
--- a/output/data_center_demographic_ruca_energy_summary.md
+++ b/output/data_center_demographic_ruca_energy_summary.md
@@ -29,8 +29,9 @@
 | `ruca_codes_2020_tract` | 85,528 tracts | `tract_fips_20 = geoid` | 1,826 matched (99.6%) |
 | `watershed_huc8` | 2,139 watersheds | `ST_Contains(w.geom, m.geom)` | 1,831 matched (99.9%) |
 | `energy_eia_operating_generator_capacity_flat` | 4.7M rows | `ST_DWithin(geom, 50km)` | 1,831 DCs have ≥1 nearby gen |
+| `energy_eia_seds_flat` (annual, 1960–2024) | 2.57M rows | `state_id` | Used in §7 for state electricity consumption (series `ESTCB`, 2024) |
 
-Energy aggregation uses period `2026-02` only with `status='OP'`, summing `nameplate_capacity_mw` for operating generators within 50 km of each DC. Note: EIA capacity columns were added to this table on 2026-05-17 — prior to that the `_flat` table had no MW values despite its name.
+Energy aggregation uses period `2026-02` only with `status='OP'`, summing `nameplate_capacity_mw` for operating generators within 50 km of each DC. Note: EIA capacity columns were added to this table on 2026-05-17 — prior to that the `_flat` table had no MW values despite its name. SEDS was backfilled 2026-05-18 (initial smoke-test had only 50 rows).
 
 ---
 
@@ -205,7 +206,44 @@ Aggregated across DCs in RUCA 2–10 (i.e. anything outside dense metro core, n=
 
 ---
 
-## 7. Watershed (HUC8) concentration
+## 7. State grid context — how DC-saturated is each top state?
+
+Section 6 shows DC-adjacent capacity in absolute MW, which is hard to interpret without knowing the size of the state grid. Using OGC for state-total generating capacity (period `2026-02`, status `OP`) and SEDS series `ESTCB` for 2024 in-state electricity consumption, we can express each state's DC footprint as a **share of its own grid**.
+
+The "DC-adjacent capacity" column sums distinct in-state generators (i.e., no double-counting) whose 50 km neighborhood includes at least one in-state data center.
+
+| State | DCs | State grid (GW) | State elec. consumption (TWh, 2024) | DC-adjacent capacity (GW) | **% of state capacity within 50 km of a DC** |
+|---|---:|---:|---:|---:|---:|
+| VA | 378 | 30.8 | 138.0 | 15.4 | 50% |
+| TX | 162 | 194.2 | 505.3 | 61.4 | 32% |
+| CA | 147 | 105.1 | 245.6 | 51.6 | 49% |
+| **OR** | 145 | 17.2 | 59.7 | 11.7 | **68%** |
+| OH | 103 | 34.4 | 153.7 | 12.7 | 37% |
+| WA | 93 | 29.6 | 90.0 | 7.9 | 27% |
+| AZ | 69 | 40.1 | 90.8 | 22.5 | 56% |
+| IA | 65 | 24.6 | 54.9 | 4.9 | 20% |
+| **NJ** | 62 | 17.8 | 73.5 | 14.7 | **83%** |
+| IL | 61 | 51.7 | 133.2 | 17.4 | 34% |
+| GA | 50 | 42.3 | 150.0 | 14.2 | 34% |
+| NY | 48 | 42.7 | 140.5 | 25.8 | 61% |
+| **NV** | 41 | 18.7 | 40.7 | 14.0 | **75%** |
+| **TN** | 32 | 23.3 | 102.9 | 16.4 | **70%** |
+| NC | 31 | 38.9 | 136.9 | 17.4 | 45% |
+
+**The DC-saturation reordering.** Virginia leads in raw DC count (378), but four states have grids where *more than two-thirds* of all in-state generating capacity sits within 50 km of a data center:
+
+- **New Jersey — 83%.** Effectively the entire state's electrical economy is DC-adjacent. NJ's 62 DCs are NYC-metro carrier hotels concentrated in a small geographic footprint relative to a small state grid (17.8 GW).
+- **Nevada — 75%.** Las Vegas and Reno DCs co-locate with the gas-and-solar generation that serves Las Vegas urbanization. NV has a small grid (18.7 GW) and most of it serves the same two metros.
+- **Tennessee — 70%.** Nashville + Memphis DCs sit near TVA's central generation belt.
+- **Oregon — 68%.** Even though OR's DC cluster is mostly non-metro (Boardman / Hermiston / The Dalles), the Columbia hydro corridor serving them accounts for two-thirds of OR's 17.2 GW grid. This is the only state where the saturation comes from rural hyperscale builds rather than urban carrier hotels.
+
+**The opposite end.** **Iowa (20%)** has 65 DCs but they all cluster around Council Bluffs / Des Moines, leaving the rural wind belt that dominates IA's grid unrelated to DC siting. **Washington (27%)** is similar — the Quincy hyperscale cluster is small relative to WA's Columbia hydro and Puget-area generation.
+
+**Why the proportional view matters.** A 1 GW DC load lands very differently on the NJ grid (5.6% of total capacity) than on the TX grid (0.5%). Reliability, transmission-queue interconnection waits, and political pushback all scale with the proportional draw, not the absolute MW. By that yardstick, the canonical "VA dominates US DCs" story is incomplete — VA, NJ, OR, NV, TN, NY, and AZ are the states where the DC industry is *structurally entangled* with the grid, and where any large new build runs into capacity-share constraints first.
+
+---
+
+## 8. Watershed (HUC8) concentration
 
 Each DC sits in exactly one USGS HUC8 watershed (8-digit hydrologic unit, subbasin scale, median ~3,000 sq km). Cooling-water draw and wastewater discharge happen at watershed scale, not state scale — a single stressed basin can cap an entire DC corridor regardless of how big the state's overall water budget is.
 
@@ -277,6 +315,71 @@ This watershed view is a **boundary set** for downstream water-stress analysis.
 
 ---
 
+## 9. Database inventory (`data_centers` schema `public`)
+
+All tables in the working database as of 2026-05-18. "Used here" = referenced in §1–§8 of this report. PostGIS internal tables (`spatial_ref_sys`, `geography_columns`, `geometry_columns`) are omitted.
+
+### Data center inventory and clustering
+
+| Table | Rows | Used here | Description |
+|---|---:|:-:|---|
+| `master_data_centers` | 1,833 | ✓ | Unified, deduplicated DC inventory — the canonical row-per-DC table joining curated, OSM, and sample sources via `master_id`. |
+| `osm_data_centers` | 1,549 | — | Raw OSM-derived DC features (nodes/ways tagged as data centers), one of the inputs to `master_data_centers`. |
+| `us_dc_sample_geocoded` | 1,489 | — | Earlier sample-list DC inventory with geocoding lineage (Nominatim + Census TIGER), superseded by `master_data_centers` but retained for provenance. |
+| `data_centers_union` (view) | — | — | Convenience view unioning the curated and OSM source rows with a `source` discriminator. |
+| `master_data_center_spatial_clusters` | 1,831 | ✓ | DBSCAN cluster assignment per DC (`cluster_id`, noise flag), used in §3. |
+
+### Per-DC join tables
+
+| Table | Rows | Used here | Description |
+|---|---:|:-:|---|
+| `data_center_census_tracts_2024` | 1,815 | ✓ | One row per DC with attached ACS-2024 demographics from its containing tract — the master demographic join. |
+| `data_center_watershed_huc8` | 1,833 | ✓ | One row per DC with its containing USGS HUC8 watershed (`huc8`, name, states, area), built 2026-05-18 via `ST_Within`. |
+
+### Base geographic / demographic layers
+
+| Table | Rows | Used here | Description |
+|---|---:|:-:|---|
+| `_dc_census_tract_acs_2024` | 85,382 | ✓ | Staging: ACS-2024 5-year profile attributes for every US tract that contains a DC (and surrounding tracts for context). |
+| `_dc_census_tract_boundaries_2024` | 85,058 | — | Staging: TIGER 2024 tract polygons for the DC-tract universe. |
+| `ruca_codes_2020_tract` | 85,528 | ✓ | USDA RUCA 2020 codes per tract, the metro/micropolitan/rural classification used in §4–§5. |
+| `watershed_huc8` | 2,139 | ✓ | USGS Watershed Boundary Dataset HUC8 subbasin polygons (median ~3,000 km²) covering CONUS + AK. |
+| `_watershed_huc8_stage` | 369 | — | Staging table from an earlier partial WBD load, superseded by `watershed_huc8`. Candidate for cleanup. |
+| `census_tract_huc8_link` | 806 | — | Tract↔HUC8 spatial overlap table (with overlap %) for the subset of tracts containing a DC. Useful for downstream tract-level water-stress joins. |
+
+### Energy data
+
+| Table | Rows | Used here | Description |
+|---|---:|:-:|---|
+| `energy_eia_operating_generator_capacity_flat` | 4.7M | ✓ | EIA Form-860 operating generator inventory, monthly 2008–2026, with nameplate / summer / winter MW and point geometry. Source for §6 and §7 capacity figures. |
+| `energy_eia_seds_flat` | 2.57M | ✓ | EIA SEDS annual state energy series 1960–2024 (consumption, prices, expenditures by sector / fuel). Source for §7 state electricity consumption (`ESTCB`, 2024). Backfilled 2026-05-18. |
+| `energy_atlas_layers_catalog` | ~5 | — | Metadata catalog of EIA layers ingested by `ingest_eia_energy_layers.py` (table name, source URL, import timestamp). |
+| `im3_state_projected_moderate_50` | 328 | — | PNNL IM3 projected DC siting under the moderate-growth scenario at gravity-weight 0.50 — one row per projected facility (cost, IT MW, cooling-water demand, lat/lon). Loaded but unused. |
+| `im3_projected_state_demand_summary` | 31 | — | State-level rollup of IM3 projected facility counts, IT MW, and cooling demand. Loaded but unused. |
+| `seds_national_msn_year` | 0 | — | Empty placeholder for national SEDS time-series; superseded by `energy_eia_seds_flat`. Drop candidate. |
+| `seds_state_msn_year` | 0 | — | Empty placeholder for state SEDS time-series; superseded by `energy_eia_seds_flat`. Drop candidate. |
+| `utility_rate_tracker_2025_2028` | 374 | — | Utility rate-increase tracker by provider × state × service type, with effective dates and monthly $ + % increases. Loaded but unused in the demographic/energy analysis. |
+
+### Connectivity (submarine cables, exchange capacity)
+
+| Table | Rows | Used here | Description |
+|---|---:|:-:|---|
+| `internet_cables` | 693 | — | Submarine cable routes (geometry, RFS year, decommission year, owners, length km) from TeleGeography-style data. |
+| `internet_cable_landing_points` | 3,361 | — | Cable landing points (country, name, TBD flag) — endpoint nodes for `internet_cables`. |
+| `internet_cable_meta` | 2 | — | Source-provenance metadata for the cable dataset (key/value). |
+| `internet_cable_year_summaries` | 58 | — | Year-by-year narrative descriptions of cable activity. |
+| `internet_city_dominance` | 4,552 | — | City-level physical capacity (Tbps), logical-dominance IP count, and top ASNs — proxy for internet-hub strength of each candidate DC city. |
+
+### Other
+
+| Table | Rows | Used here | Description |
+|---|---:|:-:|---|
+| `opposition_cases_geocoded` | 18 | — | Geocoded community-opposition cases against DC builds (developer, investment $B, outcome, governance response). Loaded but unused — see next-steps item #5. |
+
+**Cleanup candidates.** `_watershed_huc8_stage`, `seds_national_msn_year`, `seds_state_msn_year`, and possibly `us_dc_sample_geocoded` are superseded by their canonical counterparts and could be dropped to reduce confusion.
+
+---
+
 ## Data quality flags
 
 1. **`master_data_centers.power_mw` is populated for only 108 / 1,833 DCs (5.9%).** Useless as a sizing metric without imputation or alternative source. Nearby EIA capacity is the more reliable proxy (used as the per-DC scale in this analysis). A grant-funded scrape of Baxtel / Data Center Map would close this gap.
@@ -293,5 +396,5 @@ This watershed view is a **boundary set** for downstream water-stress analysis.
 1. **Backfill `power_mw`** from Baxtel / Data Center Map (paid scrape — grant work).
 2. **Operator-string deduplication** — collapse "Meta"/"Meta, Inc.", "AWS" variants, etc., before any per-operator analysis.
 3. **Water-stress overlay against the 257 watersheds** — now that the HUC8 join is in place, pull USGS WaterWatch streamflow data, USGS water-use estimates, or EPA drought-status indicators against this watershed set. A single stress index per HUC8 would size the entire US fleet's water exposure.
-4. **State-level energy demand context** — `im3_state_projected_moderate_50` and `seds_state_msn_year` are loaded; joining these would let us compute "DC nearby capacity as share of state grid" rather than absolute MW.
+4. **Forward-projected demand overlay** — the static SEDS / OGC capacity-share view in §7 is a snapshot. Joining `im3_state_projected_moderate_50` against the §7 saturation table would let us flag which already-saturated states (NJ, NV, TN, OR) are projected to need the most additional generation before 2050.
 5. **Opposition cases overlay** — `opposition_cases_geocoded` is loaded but unused; could test whether cluster-vs-isolated demographic differences (or watershed concentration) predict community opposition.
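
Review note: the capacity-share figures added in §7 can be sanity-checked without database access. The sketch below is not the query that produced the table; it only recomputes the percentage column from the two GW columns as printed in the diff, and reproduces the "1 GW load" comparison. The names `SECTION7`, `share_pct`, `mismatches`, and `saturated_states` are hypothetical helpers introduced for illustration, and the 1-point tolerance is an assumption to absorb rounding of the GW inputs to one decimal.

```python
# Values copied from the §7 table in this diff:
# state -> (state grid GW, DC-adjacent capacity GW, reported % within 50 km of a DC)
SECTION7 = {
    "VA": (30.8, 15.4, 50), "TX": (194.2, 61.4, 32), "CA": (105.1, 51.6, 49),
    "OR": (17.2, 11.7, 68), "OH": (34.4, 12.7, 37), "WA": (29.6, 7.9, 27),
    "AZ": (40.1, 22.5, 56), "IA": (24.6, 4.9, 20), "NJ": (17.8, 14.7, 83),
    "IL": (51.7, 17.4, 34), "GA": (42.3, 14.2, 34), "NY": (42.7, 25.8, 61),
    "NV": (18.7, 14.0, 75), "TN": (23.3, 16.4, 70), "NC": (38.9, 17.4, 45),
}


def share_pct(part_gw: float, total_gw: float) -> float:
    """Return part_gw as a percentage of total_gw."""
    return 100.0 * part_gw / total_gw


def mismatches(rows: dict, tolerance: float = 1.0) -> list:
    """List states whose recomputed share differs from the reported
    percentage by more than `tolerance` points (assumed rounding slack)."""
    bad = []
    for state, (grid_gw, adjacent_gw, reported) in rows.items():
        if abs(share_pct(adjacent_gw, grid_gw) - reported) > tolerance:
            bad.append(state)
    return bad


def saturated_states(rows: dict, threshold_pct: float = 200.0 / 3.0) -> list:
    """States whose DC-adjacent share exceeds the threshold (default: two-thirds)."""
    return sorted(
        state
        for state, (grid_gw, adjacent_gw, _) in rows.items()
        if share_pct(adjacent_gw, grid_gw) > threshold_pct
    )


if __name__ == "__main__":
    print("rows outside tolerance:", mismatches(SECTION7))
    print("above two-thirds:", saturated_states(SECTION7))
    # The "1 GW DC load" comparison from the proportional-view paragraph.
    for state in ("NJ", "TX"):
        grid_gw = SECTION7[state][0]
        print(state, "1 GW as % of grid:", round(share_pct(1.0, grid_gw), 1))
```

Because the check works only from the rounded table values, it can confirm internal consistency of the table but says nothing about whether the underlying 50 km generator selection is correct.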