Add state grid context and database inventory to DC summary

Extends the demographic/RUCA/energy summary with two new sections:

- §7 quantifies each top-DC state's "share of state capacity within 50 km
  of a DC," surfacing NJ (83%), NV (75%), TN (70%), and OR (68%) as the most
  DC-saturated grids. This reframes the canonical VA-centric story around
  structural entanglement rather than raw count.
- §9 inventories every table in the data_centers schema with a one-line
  description, flagging cleanup candidates and unused layers for downstream
  work.

Also renumbers watershed analysis to §8, adds the SEDS row to the dataset
coverage table, and narrows next-step #4 to the IM3 projection overlay (now
that the SEDS join is complete).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Binary file not shown.
@@ -29,8 +29,9 @@
| `ruca_codes_2020_tract` | 85,528 tracts | `tract_fips_20 = geoid` | 1,826 matched (99.6%) |
| `watershed_huc8` | 2,139 watersheds | `ST_Contains(w.geom, m.geom)` | 1,831 matched (99.9%) |
| `energy_eia_operating_generator_capacity_flat` | 4.7M rows | `ST_DWithin(geom, 50km)` | 1,831 DCs have ≥1 nearby gen |
| `energy_eia_seds_flat` (annual, 1960–2024) | 2.57M rows | `state_id` | Used in §7 for state electricity consumption (series `ESTCB`, 2024) |

Energy aggregation uses period `2026-02` only with `status='OP'`, summing `nameplate_capacity_mw` for operating generators within 50 km of each DC. Note: EIA capacity columns were added to this table on 2026-05-17; before that, the `_flat` table had no MW values despite its name. SEDS was backfilled on 2026-05-18 (the initial smoke-test load had only 50 rows).

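As a rough illustration of the per-DC aggregation described above, the sketch below filters generator rows by period and status and sums nameplate MW inside a 50 km radius. It swaps the PostGIS `ST_DWithin` join for a plain haversine distance so it runs without a database. The dictionary keys `lat` and `lon`, the sample rows, and the non-`OP` status value are assumptions made for illustration, not the actual table layout.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two lat/lon points (degrees)."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def nearby_capacity_mw(dc, generators, period="2026-02", radius_km=50.0):
    """Sum nameplate MW of operating generators within radius_km of one DC."""
    total = 0.0
    for g in generators:
        # Keep only the chosen monthly snapshot and operating units.
        if g["period"] != period or g["status"] != "OP":
            continue
        if haversine_km(dc["lat"], dc["lon"], g["lat"], g["lon"]) <= radius_km:
            total += g["nameplate_capacity_mw"]
    return total


# Hypothetical toy rows, not real EIA records.
dc = {"lat": 39.0, "lon": -77.5}
generators = [
    # roughly 14 km away, operating, right period: counted
    {"period": "2026-02", "status": "OP", "lat": 39.1, "lon": -77.4,
     "nameplate_capacity_mw": 120.0},
    # co-located but not operating (placeholder status value): skipped
    {"period": "2026-02", "status": "XX", "lat": 39.0, "lon": -77.5,
     "nameplate_capacity_mw": 50.0},
    # co-located but from a different monthly snapshot: skipped
    {"period": "2026-01", "status": "OP", "lat": 39.0, "lon": -77.5,
     "nameplate_capacity_mw": 75.0},
    # one degree of latitude north (about 111 km): outside the radius
    {"period": "2026-02", "status": "OP", "lat": 40.0, "lon": -77.5,
     "nameplate_capacity_mw": 300.0},
]
print(nearby_capacity_mw(dc, generators))
```

Restricting to a single period matters because the table is a monthly series: without the filter, the same generator would be summed once per month.
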
---

@@ -205,7 +206,44 @@ Aggregated across DCs in RUCA 2–10 (i.e. anything outside dense metro core, n=
---

## 7. State grid context: how DC-saturated is each top state?

Section 6 shows DC-adjacent capacity in absolute MW, which is hard to interpret without knowing the size of the state grid. Using the operating generator capacity table (OGC, `energy_eia_operating_generator_capacity_flat`; period `2026-02`, status `OP`) for state-total generating capacity and SEDS series `ESTCB` for 2024 in-state electricity consumption, we can express each state's DC footprint as a **share of its own grid**.

The "DC-adjacent capacity" column sums the capacity of distinct in-state generators that lie within 50 km of at least one in-state data center. Each generator is counted once, however many DCs it is near.

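The "count each generator once" rule can be sketched as follows, assuming the 50 km spatial join has already produced `(dc_id, gen_id)` pairs. The function name, argument names, and toy values are hypothetical; the point is only that collecting generator IDs into a set per state avoids double-counting a generator that sits near several DCs.

```python
def dc_adjacent_share(pairs, gen_mw, gen_state, dc_state, state_total_mw):
    """Share of each state's capacity from generators near an in-state DC.

    pairs: iterable of (dc_id, gen_id) from a 50 km spatial join.
    """
    adjacent = {}  # state -> set of distinct generator ids
    for dc_id, gen_id in pairs:
        state = gen_state[gen_id]
        if dc_state[dc_id] != state:
            continue  # only in-state DC / in-state generator pairs count
        adjacent.setdefault(state, set()).add(gen_id)
    return {
        state: sum(gen_mw[g] for g in gens) / state_total_mw[state]
        for state, gens in adjacent.items()
    }


# Hypothetical toy data: g1 is near two DCs but must be counted once;
# dc3 is out of state, so its pair with g3 is ignored.
pairs = [("dc1", "g1"), ("dc2", "g1"), ("dc2", "g2"), ("dc3", "g3")]
gen_mw = {"g1": 100.0, "g2": 50.0, "g3": 400.0, "g4": 50.0}
gen_state = {"g1": "NJ", "g2": "NJ", "g3": "NJ", "g4": "NJ"}
dc_state = {"dc1": "NJ", "dc2": "NJ", "dc3": "NY"}
state_total_mw = {"NJ": 600.0}

shares = dc_adjacent_share(pairs, gen_mw, gen_state, dc_state, state_total_mw)
print(shares)
```

Summing per-DC nearby capacity instead would count `g1` twice here and overstate the share.
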
| State | DCs | State grid (GW) | State elec. consumption (TWh, 2024) | DC-adjacent capacity (GW) | **% of state capacity within 50 km of a DC** |
|---|---:|---:|---:|---:|---:|
| VA | 378 | 30.8 | 138.0 | 15.4 | 50% |
| TX | 162 | 194.2 | 505.3 | 61.4 | 32% |
| CA | 147 | 105.1 | 245.6 | 51.6 | 49% |
| **OR** | 145 | 17.2 | 59.7 | 11.7 | **68%** |
| OH | 103 | 34.4 | 153.7 | 12.7 | 37% |
| WA | 93 | 29.6 | 90.0 | 7.9 | 27% |
| AZ | 69 | 40.1 | 90.8 | 22.5 | 56% |
| IA | 65 | 24.6 | 54.9 | 4.9 | 20% |
| **NJ** | 62 | 17.8 | 73.5 | 14.7 | **83%** |
| IL | 61 | 51.7 | 133.2 | 17.4 | 34% |
| GA | 50 | 42.3 | 150.0 | 14.2 | 34% |
| NY | 48 | 42.7 | 140.5 | 25.8 | 61% |
| **NV** | 41 | 18.7 | 40.7 | 14.0 | **75%** |
| **TN** | 32 | 23.3 | 102.9 | 16.4 | **70%** |
| NC | 31 | 38.9 | 136.9 | 17.4 | 45% |

**The DC-saturation reordering.** Virginia leads in raw DC count (378), but in four states *more than two-thirds* of all in-state generating capacity sits within 50 km of a data center:

- **New Jersey (83%).** Most of the state's generating capacity is DC-adjacent. NJ's 62 DCs are NYC-metro carrier hotels concentrated in a small geographic footprint, set against a small state grid (17.8 GW).
- **Nevada (75%).** Las Vegas and Reno DCs co-locate with the gas and solar generation that serves the Las Vegas metro. NV has a small grid (18.7 GW), and most of it serves the same two metros.
- **Tennessee (70%).** Nashville and Memphis DCs sit near TVA's central generation belt.
- **Oregon (68%).** Although OR's DC cluster is mostly non-metro (Boardman / Hermiston / The Dalles), the Columbia hydro corridor serving it accounts for about two-thirds of OR's 17.2 GW grid. This is the only state where the saturation comes from rural hyperscale builds rather than urban carrier hotels.

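The two-thirds cut can be checked against the rounded GW columns of the §7 table. The sketch below copies a subset of rows (the four highlighted states plus VA and TX for contrast) and recomputes the share. Because the inputs are already rounded to one decimal, a recomputed percentage for other rows may differ from the published column by a point.

```python
# (state, state grid GW, DC-adjacent GW), copied from the rounded table values.
rows = [
    ("VA", 30.8, 15.4),
    ("TX", 194.2, 61.4),
    ("OR", 17.2, 11.7),
    ("NJ", 17.8, 14.7),
    ("NV", 18.7, 14.0),
    ("TN", 23.3, 16.4),
]


def saturated(rows, threshold=2 / 3):
    """Return (state, whole-percent share) for rows above the threshold."""
    shares = [(state, adj / grid) for state, grid, adj in rows]
    hits = [(s, round(100 * share)) for s, share in shares if share > threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)


print(saturated(rows))
# [('NJ', 83), ('NV', 75), ('TN', 70), ('OR', 68)]
```
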
**The opposite end.** **Iowa (20%)** has 65 DCs, but they all cluster around Council Bluffs / Des Moines, so the rural wind belt that dominates IA's grid lies outside the 50 km DC neighborhoods. **Washington (27%)** is similar: the Quincy hyperscale cluster is small relative to WA's Columbia hydro and Puget-area generation.

**Why the proportional view matters.** A 1 GW DC load lands very differently on the NJ grid (5.6% of total capacity) than on the TX grid (0.5%). Reliability, interconnection-queue waits, and political pushback all scale with the proportional draw, not the absolute MW. By that yardstick, the canonical "VA dominates US DCs" story is incomplete. VA, NJ, OR, NV, TN, NY, and AZ (every state in the table at 50% or more) are the states where the DC industry is *structurally entangled* with the grid, and where any large new build runs into capacity-share constraints first.

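The NJ and TX figures quoted above follow directly from the state grid sizes in the §7 table. A minimal check, with a hypothetical helper name:

```python
def load_share_pct(load_gw, grid_gw):
    """A new load expressed as a percentage of total state capacity."""
    return 100.0 * load_gw / grid_gw


nj = load_share_pct(1.0, 17.8)   # NJ grid: 17.8 GW
tx = load_share_pct(1.0, 194.2)  # TX grid: 194.2 GW
print(f"NJ: {nj:.1f}%  TX: {tx:.1f}%")
# NJ: 5.6%  TX: 0.5%
```

The same 1 GW is roughly eleven times larger relative to the NJ grid than to the TX grid.
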
---

## 8. Watershed (HUC8) concentration

Each DC sits in exactly one USGS HUC8 watershed (8-digit hydrologic unit, subbasin scale, median ~3,000 sq km). Cooling-water draw and wastewater discharge happen at watershed scale, not state scale: a single stressed basin can cap an entire DC corridor regardless of how big the state's overall water budget is.

@@ -277,6 +315,71 @@ This watershed view is a **boundary set** for downstream water-stress analysis.
---

## 9. Database inventory (`data_centers` schema `public`)

All tables in the working database as of 2026-05-18. "Used here" = referenced in §1–§8 of this report. PostGIS internal tables (`spatial_ref_sys`, `geography_columns`, `geometry_columns`) are omitted.

### Data center inventory and clustering

| Table | Rows | Used here | Description |
|---|---:|:-:|---|
| `master_data_centers` | 1,833 | ✓ | Unified, deduplicated DC inventory: the canonical row-per-DC table joining curated, OSM, and sample sources via `master_id`. |
| `osm_data_centers` | 1,549 | - | Raw OSM-derived DC features (nodes/ways tagged as data centers), one of the inputs to `master_data_centers`. |
| `us_dc_sample_geocoded` | 1,489 | - | Earlier sample-list DC inventory with geocoding lineage (Nominatim + Census TIGER), superseded by `master_data_centers` but retained for provenance. |
| `data_centers_union` (view) | - | - | Convenience view unioning the curated and OSM source rows with a `source` discriminator. |
| `master_data_center_spatial_clusters` | 1,831 | ✓ | DBSCAN cluster assignment per DC (`cluster_id`, noise flag), used in §3. |

### Per-DC join tables

| Table | Rows | Used here | Description |
|---|---:|:-:|---|
| `data_center_census_tracts_2024` | 1,815 | ✓ | One row per DC with attached ACS-2024 demographics from its containing tract; the master demographic join. |
| `data_center_watershed_huc8` | 1,833 | ✓ | One row per DC with its containing USGS HUC8 watershed (`huc8`, name, states, area), built 2026-05-18 via `ST_Within`. |

### Base geographic / demographic layers

| Table | Rows | Used here | Description |
|---|---:|:-:|---|
| `_dc_census_tract_acs_2024` | 85,382 | ✓ | Staging: ACS-2024 5-year profile attributes for every US tract that contains a DC (and surrounding tracts for context). |
| `_dc_census_tract_boundaries_2024` | 85,058 | - | Staging: TIGER 2024 tract polygons for the DC-tract universe. |
| `ruca_codes_2020_tract` | 85,528 | ✓ | USDA RUCA 2020 codes per tract, the metro/micropolitan/rural classification used in §4–§5. |
| `watershed_huc8` | 2,139 | ✓ | USGS Watershed Boundary Dataset HUC8 subbasin polygons (median ~3,000 km²) covering CONUS + AK. |
| `_watershed_huc8_stage` | 369 | - | Staging table from an earlier partial WBD load, superseded by `watershed_huc8`. Candidate for cleanup. |
| `census_tract_huc8_link` | 806 | - | Tract↔HUC8 spatial overlap table (with overlap %) for the subset of tracts containing a DC. Useful for downstream tract-level water-stress joins. |

### Energy data

| Table | Rows | Used here | Description |
|---|---:|:-:|---|
| `energy_eia_operating_generator_capacity_flat` | 4.7M | ✓ | EIA Form-860 operating generator inventory, monthly 2008–2026, with nameplate / summer / winter MW and point geometry. Source for §6 and §7 capacity figures. |
| `energy_eia_seds_flat` | 2.57M | ✓ | EIA SEDS annual state energy series 1960–2024 (consumption, prices, expenditures by sector / fuel). Source for §7 state electricity consumption (`ESTCB`, 2024). Backfilled 2026-05-18. |
| `energy_atlas_layers_catalog` | ~5 | - | Metadata catalog of EIA layers ingested by `ingest_eia_energy_layers.py` (table name, source URL, import timestamp). |
| `im3_state_projected_moderate_50` | 328 | - | PNNL IM3 projected DC siting under the moderate-growth scenario at gravity-weight 0.50: one row per projected facility (cost, IT MW, cooling-water demand, lat/lon). Loaded but unused. |
| `im3_projected_state_demand_summary` | 31 | - | State-level rollup of IM3 projected facility counts, IT MW, and cooling demand. Loaded but unused. |
| `seds_national_msn_year` | 0 | - | Empty placeholder for national SEDS time-series; superseded by `energy_eia_seds_flat`. Drop candidate. |
| `seds_state_msn_year` | 0 | - | Empty placeholder for state SEDS time-series; superseded by `energy_eia_seds_flat`. Drop candidate. |
| `utility_rate_tracker_2025_2028` | 374 | - | Utility rate-increase tracker by provider × state × service type, with effective dates and monthly $ + % increases. Loaded but unused in the demographic/energy analysis. |

### Connectivity (submarine cables, exchange capacity)

| Table | Rows | Used here | Description |
|---|---:|:-:|---|
| `internet_cables` | 693 | - | Submarine cable routes (geometry, RFS year, decommission year, owners, length km) from TeleGeography-style data. |
| `internet_cable_landing_points` | 3,361 | - | Cable landing points (country, name, TBD flag); the endpoint nodes for `internet_cables`. |
| `internet_cable_meta` | 2 | - | Source-provenance metadata for the cable dataset (key/value). |
| `internet_cable_year_summaries` | 58 | - | Year-by-year narrative descriptions of cable activity. |
| `internet_city_dominance` | 4,552 | - | City-level physical capacity (Tbps), logical-dominance IP count, and top ASNs; a proxy for the internet-hub strength of each candidate DC city. |

### Other

| Table | Rows | Used here | Description |
|---|---:|:-:|---|
| `opposition_cases_geocoded` | 18 | - | Geocoded community-opposition cases against DC builds (developer, investment $B, outcome, governance response). Loaded but unused; see next-steps item #5. |

**Cleanup candidates.** `_watershed_huc8_stage`, `seds_national_msn_year`, `seds_state_msn_year`, and possibly `us_dc_sample_geocoded` are superseded by their canonical counterparts and could be dropped to reduce confusion.

---

## Data quality flags

1. **`master_data_centers.power_mw` is populated for only 108 / 1,833 DCs (5.9%).** Useless as a sizing metric without imputation or an alternative source. Nearby EIA capacity is the more reliable proxy (used as the per-DC scale in this analysis). A grant-funded scrape of Baxtel / Data Center Map would close this gap.

@@ -293,5 +396,5 @@ This watershed view is a **boundary set** for downstream water-stress analysis.
1. **Backfill `power_mw`** from Baxtel / Data Center Map (paid scrape; grant work).
2. **Operator-string deduplication:** collapse "Meta"/"Meta, Inc.", "AWS" variants, etc., before any per-operator analysis.
3. **Water-stress overlay against the 257 watersheds:** now that the HUC8 join is in place, pull USGS WaterWatch streamflow data, USGS water-use estimates, or EPA drought-status indicators against this watershed set. A single stress index per HUC8 would size the entire US fleet's water exposure.
4. **Forward-projected demand overlay:** the static SEDS / OGC capacity-share view in §7 is a snapshot. Joining `im3_state_projected_moderate_50` against the §7 saturation table would let us flag which already-saturated states (NJ, NV, TN, OR) are projected to need the most additional generation before 2050.
5. **Opposition cases overlay:** `opposition_cases_geocoded` is loaded but unused; could test whether cluster-vs-isolated demographic differences (or watershed concentration) predict community opposition.

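For next-step #2, one possible starting point is a small normalizer that strips legal suffixes and punctuation and then applies an alias map. Everything here is an assumption for illustration: the suffix list, the `ALIASES` entries, and the function name are hypothetical, and a real pass would need an alias map built from the actual operator strings in `master_data_centers`.

```python
import re

# Hypothetical alias map; real entries would come from reviewing the data.
ALIASES = {
    "aws": "amazon web services",
    "amazon": "amazon web services",
}

# Common legal suffixes, matched as whole words with an optional trailing dot.
SUFFIXES = re.compile(r"\b(inc|llc|corp|corporation|ltd)\b\.?", re.IGNORECASE)


def normalize_operator(name):
    """Collapse spelling variants of an operator name to one lowercase key."""
    cleaned = SUFFIXES.sub("", name)             # drop "Inc.", "LLC", ...
    cleaned = re.sub(r"[^\w\s]", " ", cleaned)   # punctuation to spaces
    cleaned = " ".join(cleaned.lower().split())  # lowercase, squeeze spaces
    return ALIASES.get(cleaned, cleaned)


for raw in ["Meta", "Meta, Inc.", "AWS", "Amazon Web Services"]:
    print(f"{raw!r} -> {normalize_operator(raw)!r}")
```

Grouping on the normalized key rather than overwriting the raw string keeps the original operator text available for provenance.
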