data-centers/output/data_center_demographic_ruca_energy_summary.md
dadams eccfbdbad9 Add data center demographic, RUCA, and energy capacity analysis
Adds three coordinated changes:

- Request nameplate, summer, and winter capacity from the EIA
  operating-generator-capacity endpoint and project them as typed columns
  on energy_eia_operating_generator_capacity_flat. The original ingest
  only pulled latitude and longitude, leaving the flat table with no MW
  values despite its name.
- New cluster_analysis.ipynb joins master_data_centers to ACS-2024
  demographics, USDA RUCA-2020 codes (loaded from new/), and EIA
  generation capacity within 50 km of each site.
- Summary doc consolidates the headline findings: DC tracts skew higher
  income / more educated / more racially diverse than US average, the
  metro over-index is only 1.11x, the non-metro tail is dominated by
  hyperscalers in the Columbia River corridor (OR+WA = 66% of non-metro
  DCs), and Microsoft co-locates with Palo Verde Nuclear in Goodyear AZ.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-18 08:14:57 -07:00


US Data Centers — Demographic, Urban-Rural & Energy Context Analysis

Date: 2026-05-18
Notebook: cluster_analysis.ipynb
Universe: 1,833 data centers in public.master_data_centers, joined to ACS-2024 demographics, USDA RUCA-2020 codes, and EIA operating-generator capacity (50 km radius, latest period 2026-02, status=OP).

Update 2026-05-18: 196 previously-null state values were backfilled from geoid (first 2 chars = state FIPS). All 1,833 DCs now have a state; all state-level numbers below reflect the corrected attribution.
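The backfill rule above can be sketched as follows. This is a minimal illustration, not the notebook's code: `FIPS_TO_STATE` is a deliberately partial lookup, `backfill_state` is a hypothetical helper name, and the sample geoids are made up. It assumes geoid is stored as a string so that leading zeros survive.

```python
# Partial state-FIPS lookup for illustration only; a full backfill needs every code.
FIPS_TO_STATE = {"04": "AZ", "41": "OR", "48": "TX", "51": "VA", "53": "WA"}


def backfill_state(state, geoid):
    """Return the existing state, or derive it from the first two geoid characters."""
    if state:  # keep any value that is already populated
        return state
    if not geoid or len(geoid) < 2:
        return None  # nothing to derive from
    return FIPS_TO_STATE.get(geoid[:2])  # None if the prefix is not in the lookup


# Made-up rows: one null state, one already populated.
rows = [
    {"state": None, "geoid": "51107611005"},
    {"state": "OR", "geoid": "41059950100"},
]
filled = [backfill_state(r["state"], r["geoid"]) for r in rows]
```

Only null values are touched, so a state that was already populated is never overwritten by the FIPS-derived one.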


Headline findings

  1. DC tracts are richer, more educated, and more diverse than the US average. Median household income $103,623 vs. national $78,538 (+32%); 49% bachelor's+ vs. 35% (+14 pp); poverty rate 7.2% vs. 12.4%. Non-Hispanic white share is below national (50% vs. 58%), driven by Asian-heavy (mean 13% vs. 6%) and Hispanic-significant tracts.
  2. The metro skew is more modest than expected: 1.11×. 89% of DCs sit in metropolitan tracts, but 80% of all US tracts are metropolitan, so DCs are only slightly more concentrated than the underlying tract distribution would predict.
  3. The non-metro tail is overwhelmingly hyperscale and Pacific Northwest. Of 190 DCs outside metropolitan tracts (RUCA 4–10), AWS owns 67, Meta 22, Microsoft 10, Google 4, Yahoo 2: combined, 55% of the non-metro footprint. Oregon (86) and Washington (40) alone hold 66% of non-metro DCs, anchored to the Columbia River hydropower corridor.
  4. Clustered DCs are demographically distinct from isolated ones. DCs in DBSCAN clusters (n=1,583) sit in tracts with $108K median income vs. $73K for isolated DCs (n=250), a $35K gap. Clustered DCs are more educated (+18 pp bachelor's), more diverse (−25 pp non-Hispanic white), and embedded in much denser energy infrastructure (89 vs. 40 generators within 50 km).
  5. Microsoft co-locates with one of the largest US nuclear plants. Microsoft's Goodyear, AZ campus has 14.6 GW of generation within 50 km, including 4.2 GW from Palo Verde Nuclear, one of the largest nuclear plants in the US. Although the campus sits in a RUCA-2 "Metro high-commute" tract (not strictly metro core), the surrounding grid is the densest by capacity in our analysis.

Dataset coverage and joins

| Source table | Rows | Join key | Coverage |
|---|---|---|---|
| master_data_centers | 1,833 | | base |
| master_data_center_spatial_clusters | 1,831 | master_id | 99.9% |
| _dc_census_tract_acs_2024 | ~73,000 tracts | geoid | 1,807 matched (98.6%) |
| ruca_codes_2020_tract | 85,528 tracts | tract_fips_20 = geoid | 1,826 matched (99.6%) |
| energy_eia_operating_generator_capacity_flat | 4.7M rows | ST_DWithin(geom, 50km) | 1,831 DCs have ≥1 nearby gen |

Energy aggregation uses period 2026-02 only with status='OP', summing nameplate_capacity_mw for operating generators within 50 km of each DC. Note: EIA capacity columns were added to this table on 2026-05-17 — prior to that the _flat table had no MW values despite its name.
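The per-site aggregation described above can be approximated outside the database. The sketch below is a pure-Python stand-in that uses a haversine distance instead of the PostGIS `ST_DWithin` join; the function names, the record layout, and all coordinates and MW values are assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby_capacity_mw(dc, generators, radius_km=50.0, period="2026-02"):
    """Sum nameplate MW of operating generators within radius_km of one data center."""
    total = 0.0
    for g in generators:
        if g["period"] != period or g["status"] != "OP":
            continue  # keep only the latest period and operating units
        if haversine_km(dc["lat"], dc["lon"], g["lat"], g["lon"]) <= radius_km:
            total += g["nameplate_capacity_mw"]
    return total


# Made-up site and generators, for illustration only.
dc = {"lat": 33.40, "lon": -112.40}
generators = [
    {"lat": 33.39, "lon": -112.86, "period": "2026-02", "status": "OP", "nameplate_capacity_mw": 100.0},  # inside 50 km
    {"lat": 35.00, "lon": -112.40, "period": "2026-02", "status": "OP", "nameplate_capacity_mw": 500.0},  # too far
    {"lat": 33.41, "lon": -112.41, "period": "2026-02", "status": "RE", "nameplate_capacity_mw": 75.0},   # not operating
    {"lat": 33.41, "lon": -112.41, "period": "2025-12", "status": "OP", "nameplate_capacity_mw": 60.0},   # older period
]
total_mw = nearby_capacity_mw(dc, generators)
```

Only the first generator passes all three filters, so the site's nearby capacity here is its 100 MW alone.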


1. Demographic profile of DC tracts (n=1,807 with non-null ACS)

| Metric | DC tract (median) | DC tract (mean) | US avg | Δ mean vs. US |
|---|---|---|---|---|
| Median household income | $103,623 | $114,543 | $78,538 | +$36,005 |
| Per-capita income | $51,283 | $55,725 | $43,313 | +$12,412 |
| Poverty rate | 7.2% | 10.1% | 12.4% | −2.3 pp |
| Unemployment rate | 3.5% | 4.4% | 5.4% | −1.0 pp |
| Bachelor's+ % | 49.3% | 46.2% | 35.0% | +11.2 pp |
| Broadband subscription % | 94.9% | 93.5% | 89.0% | +4.5 pp |
| Non-Hispanic white % | 50.2% | 51.0% | 58.4% | −7.4 pp |
| Hispanic / Latino % | 12.8% | 19.5% | 19.5% | 0.0 pp |
| Non-Hispanic Black % | 5.9% | 10.6% | 12.1% | −1.5 pp |
| Non-Hispanic Asian % | 6.4% | 13.4% | 6.4% | +7.0 pp |

Interpretation. DC tracts skew high-income, highly educated, well connected (broadband), and racially diverse (specifically Asian-heavy). Notably, DC tracts are less non-Hispanic white than the national average, not more. This reflects DC siting in mixed-race coastal/exurban tech corridors (Bay Area, Northern Virginia, Seattle) rather than in homogeneous suburbs.

Data quality note. avg_household_size contains sentinel-value pollution (min: 666,666,666), so the mean is unusable; the median (2.55) is sensible.
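A small sketch of why the median survives the sentinel while the mean does not, and of one way to filter before averaging. The sample values and the cutoff of 20 persons per household are assumptions chosen for illustration, not values taken from the dataset.

```python
import statistics

# Assumed plausibility cutoff for persons per household; tune to the real data.
SENTINEL_THRESHOLD = 20


def drop_sentinels(values, upper=SENTINEL_THRESHOLD):
    """Keep only non-null values in the plausible range (0, upper]."""
    return [v for v in values if v is not None and 0 < v <= upper]


# Made-up household sizes with one sentinel value mixed in.
sizes = [2.1, 2.4, 2.55, 2.7, 3.0, 666_666_666]

raw_mean = statistics.mean(sizes)      # dominated by the sentinel
raw_median = statistics.median(sizes)  # barely affected by one outlier
clean_mean = statistics.mean(drop_sentinels(sizes))
```

With a single polluted row the raw mean lands above 100 million, while the raw median and the filtered mean both stay near 2.6.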


2. Geographic concentration (top 15 states)

State DC count Total power_mw (where known) Median HH income Median bachelor's % Median % white Notes
VA 378 255 $141,250 62.6% 42.5% Loudoun / DC-Alley dominance (20.6% of all US DCs)
TX 162 597 $88,228 46.2% 32.0% DFW + Austin + San Antonio
CA 147 130 $164,928 56.4% 22.4% Bay Area + LA basin
OR 145 125 $72,719 20.0% 63.2% Columbia River hydro corridor (rural)
OH 103 135 $128,875 47.0% 74.5% Columbus boom — fastest-rising market
WA 93 70 $91,623 21.9% 40.3% Quincy/Wenatchee + Seattle
AZ 69 54 $85,335 35.2% 51.6% Phoenix/Goodyear hyperscale
IA 65 0 $93,393 34.3% 88.1% 88% white (rural Midwest)
NJ 62 98 $147,321 59.4% 32.9% NYC-metro carrier hotels
IL 61 128 $96,191 52.9% 52.0% Chicago metro
GA 50 241 $101,176 51.4% 31.6% Atlanta + high-power rural builds
NY 48 47 $77,465 47.6% 74.8% NYC + upstate
NV 41 0 $93,409 31.2% 34.6% Reno + Las Vegas
TN 32 0 54.8% Nashville + Memphis (newly visible after state backfill)
NC 31 56 $82,708 44.7% 59.6% Charlotte + Catawba (nuclear-adjacent)

Virginia alone holds 20.6% of all US DCs (378 of 1,833), with the most affluent tract profile in the top 15 — a Loudoun County effect. The state backfill substantially elevated Ohio (76 → 103) and Texas (135 → 162), pushing TX into the #2 slot. The previously-uncounted Tennessee (32) now appears in the top 15.

Oregon and Washington tracts look notably different from the urban-heavy states (lower income, lower education, lower broadband, higher Hispanic share), reflecting their rural Columbia River siting.


3. Spatially clustered DCs vs. isolated DCs

DBSCAN cluster assignment from master_data_center_spatial_clusters (1,583 clustered, 250 isolated):

| Metric (median) | Isolated | In cluster | Δ |
|---|---|---|---|
| Median household income | $73,500 | $108,359 | +$34,859 |
| Bachelor's+ % | 33.2 | 51.2 | +18.0 pp |
| Poverty rate | 11.6 | 6.9 | −4.7 pp |
| Non-Hispanic white % | 71.0 | 45.9 | −25.1 pp |
| EIA generators within 50 km | 40 | 89 | +49 |
| EIA capacity within 50 km (MW) | 2,176 | 3,300 | +1,125 |

Reading. A clustered data center sits, at the median, in a tract that is ~$35K richer, 18 pp more educated, and 25 pp less non-Hispanic white than an isolated one, and is surrounded by roughly twice as many generators (89 vs. 40 within 50 km) and about 50% more generation capacity (3,300 vs. 2,176 MW). The isolated set looks like rural / small-town America (whiter, poorer, less educated); the clustered set looks like coastal exurban tech corridors.


4. RUCA (urban-rural) distribution

National baseline of all US tracts: 80% Metropolitan, 9% Micropolitan, 3% Small town, 8% Rural.

| RUCA band | DCs | DC % | US tract % | Over-index |
|---|---|---|---|---|
| Metropolitan (1–3) | 1,636 | 89.3% | 80.1% | 1.11× |
| Micropolitan (4–6) | 98 | 5.3% | 9.0% | 0.59× |
| Small town (7–9) | 15 | 0.8% | 2.9% | 0.28× |
| Rural (10) | 77 | 4.2% | 7.6% | 0.55× |
| Unknown / missed match | 7 | 0.4% | | |

Reading. The metro skew is real but only mild: 1.11×. The eye-catching pattern is that rural tracts (RUCA 10) hold about five times as many DCs as small-town tracts (77 vs. 15) and over-index at nearly the micropolitan rate (0.55× vs. 0.59×), because the hyperscale greenfield model deliberately bypasses small-city economies in favor of remote, cheap-power, low-population sites.
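The over-index column is the DC share of a band divided by that band's share of all US tracts. A minimal sketch that recomputes it from the counts and baseline percentages in the table (the variable and function names are illustrative):

```python
TOTAL_DCS = 1833  # includes the 7 DCs with no RUCA match

# band: (DC count, share of all US tracts in percent), copied from the table
bands = {
    "Metropolitan (1-3)": (1636, 80.1),
    "Micropolitan (4-6)": (98, 9.0),
    "Small town (7-9)": (15, 2.9),
    "Rural (10)": (77, 7.6),
}


def over_index(dc_count, baseline_pct, total_dcs=TOTAL_DCS):
    """Ratio of the DC share (in percent) to the US tract share (in percent)."""
    dc_pct = 100 * dc_count / total_dcs
    return dc_pct / baseline_pct


ratios = {band: round(over_index(n, pct), 2) for band, (n, pct) in bands.items()}
```

A ratio above 1 means the band holds more DCs than its share of tracts would suggest; only the metropolitan band clears that bar.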

Per-RUCA-code drilldown

| RUCA | Description | DCs | Median HH income | Median pop density (per sq mi) | Median EIA gens (50 km) |
|---|---|---|---|---|---|
| 1 | Metro core | 1,425 | $110,333 | 1,859 | 97 |
| 2 | Metro high-commute | 206 | $105,404 | 96 | 49 |
| 3 | Metro low-commute | 5 | $119,495 | 22 | 23 |
| 4 | Micropolitan core | 54 | $63,698 | 312 | 53 |
| 5 | Micropolitan high-commute | 22 | $72,465 | 191 | 51 |
| 6 | Micropolitan low-commute | 22 | $72,719 | 69 | 59 |
| 7 | Small town core | 14 | $87,522 | 2,336 | 40 |
| 8 | Small town high-commute | 1 | $69,074 | 36 | 41 |
| 10 | Rural area | 77 | $93,820 | 12 | 42 |

Two surprises:

  • Rural DCs (RUCA 10) sit in tracts with $93.8K median income, higher than micropolitan DCs ($63.7K–$72.7K). The rural DC sites are not poor rural America; they are wealthy-by-rural-standards counties chosen for power and water access.
  • Micropolitan-core DCs (RUCA 4) have the lowest median income at $63.7K — the closest thing to "economic-development DC siting" in the dataset.

5. Non-metro deep dive (190 DCs, RUCA 4–10)

Operators

| Operator | Non-metro DCs |
|---|---|
| Amazon Web Services | 67 |
| (null operator) | 50 |
| Meta | 20 (+ 2 as "Meta, Inc.") |
| Microsoft | 10 |
| Google | 4 |
| Rowan Green Data | 4 |
| NTT | 2 |
| Yahoo | 2 |
| Amazon AWS (dupe) | 2 |

The five hyperscalers (AWS, Meta, Microsoft, Google, Yahoo) account for 105 of 190 non-metro DCs (55%). If the 50 null-operator rows skew similarly hyperscale (likely — they're disproportionately in OR/WA), the share is probably closer to 75%.
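The sensitivity of that share to the null-operator rows is simple arithmetic. The sketch below treats the hyperscale fraction of the 50 null rows as a free parameter; the function name and the chosen fractions are illustrative assumptions.

```python
NON_METRO_DCS = 190
NAMED_HYPERSCALE = 67 + 22 + 10 + 4 + 2  # AWS, Meta, Microsoft, Google, Yahoo
NULL_OPERATOR = 50


def hyperscale_share(null_fraction):
    """Hyperscale share of non-metro DCs if null_fraction of null rows are hyperscale."""
    return (NAMED_HYPERSCALE + null_fraction * NULL_OPERATOR) / NON_METRO_DCS


floor = hyperscale_share(0.0)      # no null rows are hyperscale
midpoint = hyperscale_share(0.75)  # three quarters of null rows are hyperscale
ceiling = hyperscale_share(1.0)    # every null row is hyperscale
```

Under these inputs the share ranges from about 55% to about 82%, and a 75% share corresponds to roughly three quarters of the null-operator rows being hyperscale.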

States (post-backfill)

| State | Non-metro DCs |
|---|---|
| Oregon | 86 |
| Washington | 40 |
| Texas | 9 |
| New Mexico | 7 |
| North Carolina | 6 |
| Pennsylvania | 5 |
| Wisconsin | 4 |
| New York | 3 |
| Tennessee | 3 |
| Georgia | 3 |

Oregon + Washington = 126 (66%) of all non-metro DCs. This is the Columbia River basin: Prineville / Hermiston / Boardman / The Dalles (OR) and Quincy / East Wenatchee / Moses Lake (WA). The pull is hydroelectric power (cheap, low-carbon, abundant) and cool dry climate (free-cooling).

The state backfill clarified the rest of the non-metro tail: Texas (9) and Pennsylvania (5) were previously hidden in the null bucket. These likely represent shale-gas-adjacent builds (Permian and Marcellus respectively).


6. Energy footprint by operator (using EIA capacity within 50 km)

Aggregated across DCs in RUCA 2–10 (i.e. anything outside the dense metro core, n=401):

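| Operator | DCs | States | Total nearby capacity (GW) | Median per site (GW) | Hydro (GW) | Nuclear (GW) | NG (GW) | Solar (GW) | Wind (GW) |
|---|---|---|---|---|---|---|---|---|---|
| AWS | 93 | 5 | 397 | 4.8 | 66 | 2.5 | 201 | 4.6 | 114 |
| (Unknown) | 118 | 26 | 339 | 2.3 | 86 | 35 | 135 | 23 | 19 |
| Meta | 51 | 11 | 120 | 2.0 | 4.9 | 0 | 61 | 16 | 0.3 |
| Microsoft | 26 | 6 | 113 | 3.4 | 28 | 13 | 39 | 9.1 | 8.1 |
| Google | 31 | 5 | 100 | 3.9 | 14 | 0 | 43 | 3.6 | 4.7 |
| Apple | 5 | 2 | 4 | 0.6 | 1.6 | 0 | 1.1 | 0.9 | 0.4 |
| Yahoo | 2 | 1 | 7 | 3.5 | 6.4 | 0 | 0 | 0 | 0.7 |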

Distinct hyperscaler strategies, visible in the fuel mix:

  • AWS has aggregated 114 GW of wind exposure across its 93 sites — by far the most renewable-coupled portfolio. Also heavy hydro (66 GW) from its OR/WA footprint and 201 GW of natural gas as baseline.
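  • Microsoft has the highest nuclear exposure among the named operators (12.6 GW, shown rounded to 13 in the table; only the Unknown bucket is higher at 35 GW), almost entirely from its Goodyear, AZ campuses near Palo Verde Nuclear.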
  • Meta has the most solar (16 GW) among the named hyperscalers, but minimal nuclear or wind — consistent with its New Mexico (Los Lunas) and Iowa builds.
  • Google is split — moderate hydro and natural gas, modest renewables.
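One way to compare these strategies on a common scale is each fuel's share of an operator's total nearby capacity. The sketch below recomputes those shares from the rounded GW figures in the table; the dictionary layout and function name are illustrative. It assumes the totals are sums of per-site 50 km values, in which case a generator near several sites is counted once per site, so the absolute GW figures measure exposure rather than distinct capacity.

```python
# Rounded GW values copied from the operator table (named hyperscalers only).
fuel_gw = {
    "AWS": {"total": 397, "hydro": 66, "nuclear": 2.5, "gas": 201, "solar": 4.6, "wind": 114},
    "Meta": {"total": 120, "hydro": 4.9, "nuclear": 0, "gas": 61, "solar": 16, "wind": 0.3},
    "Microsoft": {"total": 113, "hydro": 28, "nuclear": 13, "gas": 39, "solar": 9.1, "wind": 8.1},
    "Google": {"total": 100, "hydro": 14, "nuclear": 0, "gas": 43, "solar": 3.6, "wind": 4.7},
}


def fuel_shares(row):
    """Percent of an operator's total nearby capacity attributed to each fuel."""
    return {
        fuel: round(100 * gw / row["total"], 1)
        for fuel, gw in row.items()
        if fuel != "total"
    }


shares = {operator: fuel_shares(row) for operator, row in fuel_gw.items()}
```

The listed fuels do not sum to the total for every operator, since other fuel types are not broken out in the table.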

Largest grid neighborhoods outside the metro core (top sites by surrounding capacity)

| DC | Operator | Location | Nearby capacity | Fuel highlight |
|---|---|---|---|---|
| PHX70 / PHX-10 / PHX-11 | Microsoft (Azure) | Goodyear, AZ (RUCA 2) | 14.0–14.6 GW | 4.2 GW nuclear (Palo Verde) + 6.4 GW gas + 2.2 GW solar |
| Stream PHX-1 | Stream Data Centers | Goodyear, AZ | 13.4 GW | Same Palo Verde / gas mix |
| T5 Charlotte Campus | T5 | Kings Mountain, NC (RUCA 6) | 12.9 GW | 4.9 GW nuclear (Catawba) + 5.5 GW gas + 1.5 GW coal |
| Apple Maiden | Apple | Maiden, NC (RUCA 2) | 9.1 GW | 2.4 GW nuclear + 4.6 GW gas |
| Percheron DC | Rowan Green Data | Texas (RUCA 10) | 6.7 GW | 3.0 GW wind + 0.9 GW hydro + 2.4 GW gas |

Data quality flags

  1. 196 of 1,833 DCs (10.7%) had a null state. Resolved 2026-05-18 by backfilling from the first two characters of geoid (state FIPS).
  2. master_data_centers.power_mw is populated for only 108 / 1,833 DCs (5.9%). Useless as a sizing metric without imputation or alternative source. Nearby EIA capacity is the more reliable proxy (used as the per-DC scale in this analysis). A grant-funded scrape of Baxtel / Data Center Map would close this gap.
  3. 50 of 190 non-metro DCs (26%) have null operator. Likely hyperscalers based on geography (OR/WA) but unconfirmed.
  4. Operator-string fragmentation: "Meta" vs. "Meta, Inc."; "Amazon Web Services" vs. "Amazon AWS" vs. "amazon web services"; "Microsoft" vs. "Microsoft Azure". Inflates distinct-operator counts and fragments per-operator totals.
  5. avg_household_size column has sentinel pollution (min: 666,666,666). Use median or filter before using.
  6. 7 DCs failed RUCA join — Puerto Rico tracts or non-US locations; trivial.
  7. EIA generator coordinates had a longitude sign error for 2008-01 through 2010-11 (~11K rows with positive lower-48 longitudes). The flat-table build at ingest_eia_energy_layers.py:839-870 corrects this in longitude and geom, so spatial joins are unaffected.
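The longitude correction in item 7 could look something like the sketch below. This is not the code at ingest_eia_energy_layers.py:839-870, which is not shown here; the function name and the latitude band used to identify lower-48 rows are assumptions.

```python
def fix_longitude(period, lat, lon):
    """Negate a positive longitude for lower-48 rows in the affected periods."""
    in_window = "2008-01" <= period <= "2010-11"  # YYYY-MM strings sort chronologically
    in_lower_48_lat = 24.0 <= lat <= 50.0  # assumed latitude band for the lower 48
    if in_window and in_lower_48_lat and lon > 0:
        return -lon
    return lon


corrected = fix_longitude("2009-06", 33.4, 112.9)   # flipped to the western hemisphere
untouched = fix_longitude("2012-01", 33.4, 112.9)   # outside the affected window
already_ok = fix_longitude("2009-06", 33.4, -112.9)  # sign was already correct
```

Restricting the flip to the affected periods and a lower-48 latitude band avoids touching rows whose positive longitude might be legitimate.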

Suggested next steps

  1. Backfill power_mw from Baxtel / Data Center Map (paid scrape — grant work).
  2. Operator-string deduplication — collapse "Meta"/"Meta, Inc.", "AWS" variants, etc., before any per-operator analysis.
  3. Watershed (HUC8) join: public.watershed_huc8 is loaded but unused; it would let us look at water stress overlap, particularly for the 190 non-metro DCs.
  4. State-level energy demand context: im3_state_projected_moderate_50 and seds_state_msn_year are loaded; joining these would let us compute "DC nearby capacity as share of state grid" rather than absolute MW.
  5. Opposition cases overlay: opposition_cases_geocoded is loaded but unused; it could test whether cluster-vs-isolated demographic differences predict community opposition.
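For the operator-string deduplication step, a minimal sketch of one possible approach: normalize case and punctuation, then map known variants to a canonical name. The alias table is hypothetical and covers only the variants named in this summary; whether "Microsoft Azure" should collapse into "Microsoft" is an assumption.

```python
import re
from collections import Counter

# Hypothetical alias table keyed by the normalized string.
CANONICAL = {
    "meta": "Meta",
    "meta inc": "Meta",
    "amazon web services": "Amazon Web Services",
    "amazon aws": "Amazon Web Services",
    "aws": "Amazon Web Services",
    "microsoft": "Microsoft",
    "microsoft azure": "Microsoft",
}


def normalize_operator(name):
    """Map a raw operator string to a canonical name; unknown names pass through."""
    if name is None:
        return None
    key = re.sub(r"[^a-z0-9 ]", "", name.lower())  # drop punctuation
    key = re.sub(r"\s+", " ", key).strip()         # collapse whitespace
    return CANONICAL.get(key, name.strip())


raw = [
    "Meta", "Meta, Inc.", "Amazon Web Services", "Amazon AWS",
    "amazon web services", "Microsoft", "Microsoft Azure", "Rowan Green Data", None,
]
counts = Counter(normalize_operator(n) for n in raw)
```

Null operators are kept as their own bucket rather than guessed, which keeps the unresolved rows visible in any per-operator total.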