
US Data Centers — Demographic, Urban-Rural & Energy Context Analysis

Date: 2026-05-18

Notebook: cluster_analysis.ipynb

Universe: 1,833 data centers in public.master_data_centers, joined to ACS-2024 demographics, USDA RUCA-2020 codes, USGS HUC8 watersheds, and EIA operating-generator capacity (50 km radius, latest period 2026-02, status=OP).


Headline findings

  1. DC tracts are richer, more educated, and more diverse than the US average. Median household income $103,623 vs. national $78,538 (+32%); 49% bachelor's+ vs. 35% (+14 pp); poverty rate 7.2% vs. 12.4%. The non-Hispanic white share is below the national figure (50% vs. 58%), driven mainly by Asian-heavy tracts (mean Asian share 13% vs. 6%); the mean Hispanic share matches the national 19.5%.
  2. The metro skew is more modest than expected: 1.11×. 89% of DCs sit in metropolitan tracts, but 80% of all US tracts are metropolitan, so DCs are only slightly more concentrated in metro areas than the distribution of tracts (and hence, roughly, of population) would predict.
  3. The non-metro tail is overwhelmingly hyperscale and Pacific Northwest. Of 190 DCs outside metropolitan tracts (RUCA 4-10), AWS owns 67, Meta 22, Microsoft 10, Google 4, and Yahoo 2: combined, 105 DCs or 55% of the non-metro footprint. Oregon (86) and Washington (40) alone hold 66% of non-metro DCs, anchored to the Columbia River hydropower corridor.
  4. Clustered DCs are demographically distinct from isolated ones. DCs in DBSCAN clusters (n=1,583) sit in tracts with $108K median income vs. $73K for isolated DCs (n=250), a $35K gap. Clustered DCs are more educated (+18 pp bachelor's), more diverse (non-Hispanic white share 25 pp lower), and embedded in much denser energy infrastructure (89 vs. 40 generators within 50 km).
  5. Microsoft co-locates with one of the largest US nuclear plants. Microsoft's Goodyear, AZ campus has 14.6 GW of generation within 50 km, including 4.2 GW from Palo Verde Nuclear. Although the campus sits in a RUCA-2 "Metro high-commute" tract (not strictly metro core), its surrounding grid has the most nearby capacity of any site outside metro cores (RUCA 2-10) in our analysis.
  6. Extreme watershed concentration: nearly half of all US DCs (48.4%) sit in just 15 of 2,139 HUC8 watersheds. A single watershed, Middle Potomac-Catoctin (Loudoun County), holds 235 DCs (12.8% of the US total). The top 2 (both DC-Alley watersheds) hold 18.9%; the top 10 hold 40%. Water stress in any one of these basins would reach a large share of the national DC fleet.

Dataset coverage and joins

| Source table | Rows | Join key | Coverage |
|---|---|---|---|
| master_data_centers | 1,833 | base | |
| master_data_center_spatial_clusters | 1,831 | master_id | 99.9% |
| _dc_census_tract_acs_2024 | ~73,000 tracts | geoid | 1,807 matched (98.6%) |
| ruca_codes_2020_tract | 85,528 tracts | tract_fips_20 = geoid | 1,826 matched (99.6%) |
| watershed_huc8 | 2,139 watersheds | ST_Contains(w.geom, m.geom) | 1,831 matched (99.9%) |
| energy_eia_operating_generator_capacity_flat | 4.7M rows | ST_DWithin(geom, 50km) | 1,831 DCs have ≥1 nearby gen |

Energy aggregation uses period 2026-02 only with status='OP', summing nameplate_capacity_mw for operating generators within 50 km of each DC. Note: EIA capacity columns were added to this table on 2026-05-17 — prior to that the _flat table had no MW values despite its name.
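The notebook does this aggregation with PostGIS ST_DWithin; the plain-Python sketch below shows the same filter-then-sum logic for a single DC. It is a minimal sketch under assumptions: the Generator record and the function names are hypothetical, and distance uses the haversine formula on a sphere of radius 6,371 km, whereas ST_DWithin on geography types measures on the spheroid, so sites right at the 50 km edge may come out differently.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation (assumption)


@dataclass
class Generator:
    # Hypothetical record shape; only period, status and nameplate_capacity_mw
    # correspond to column names used in the report.
    lat: float
    lon: float
    period: str
    status: str
    nameplate_capacity_mw: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby_capacity_mw(dc_lat, dc_lon, generators, radius_km=50.0, period="2026-02"):
    """Return (generator count, summed nameplate MW) within radius_km of one DC."""
    count, total = 0, 0.0
    for g in generators:
        if g.period != period or g.status != "OP":
            continue  # keep only the chosen period and operating units
        if haversine_km(dc_lat, dc_lon, g.lat, g.lon) <= radius_km:
            count += 1
            total += g.nameplate_capacity_mw
    return count, total


gens = [
    Generator(39.0, -77.5, "2026-02", "OP", 500.0),  # same spot: included
    Generator(39.3, -77.5, "2026-02", "OP", 200.0),  # ~33 km north: included
    Generator(39.0, -77.5, "2026-01", "OP", 999.0),  # older period: excluded
    Generator(39.0, -77.5, "2026-02", "SB", 999.0),  # not operating: excluded
    Generator(40.0, -77.5, "2026-02", "OP", 100.0),  # ~111 km away: excluded
]
print(nearby_capacity_mw(39.0, -77.5, gens))  # (2, 700.0)
```

Running the same function once per DC yields the per-DC generator counts and MW totals used throughout this report.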


1. Demographic profile of DC tracts (n=1,807 with non-null ACS)

| Metric | DC tract (median) | DC tract (mean) | US avg | Δ mean vs. US |
|---|---|---|---|---|
| Median household income | $103,623 | $114,543 | $78,538 | +$36,005 |
| Per-capita income | $51,283 | $55,725 | $43,313 | +$12,412 |
| Poverty rate | 7.2% | 10.1% | 12.4% | -2.3 pp |
| Unemployment rate | 3.5% | 4.4% | 5.4% | -1.0 pp |
| Bachelor's+ % | 49.3% | 46.2% | 35.0% | +11.2 pp |
| Broadband subscription % | 94.9% | 93.5% | 89.0% | +4.5 pp |
| Non-Hispanic white % | 50.2% | 51.0% | 58.4% | -7.4 pp |
| Hispanic / Latino % | 12.8% | 19.5% | 19.5% | 0.0 pp |
| Non-Hispanic Black % | 5.9% | 10.6% | 12.1% | -1.5 pp |
| Non-Hispanic Asian % | 6.4% | 13.4% | 6.4% | +7.0 pp |

Interpretation. DC tracts skew high-income, highly educated, well connected (broadband), and racially diverse, specifically Asian-heavy. Notably, DC tracts are less non-Hispanic white than the national average, not more. This likely reflects DC siting in mixed-race coastal/exurban tech corridors (Bay Area, Northern Virginia, Seattle) rather than in homogeneous suburbs.

Data quality note. avg_household_size previously contained ACS sentinel values (-666,666,666, the ACS code for an estimate that could not be computed) for 1,089 zero-population tracts in _dc_census_tract_acs_2024 (29 of which contained DCs), plus 16 rows in data_center_census_tracts_2024. As of 2026-05-18, those sentinels have been replaced with NULL. The column now has a plausible range (min 1.00, max 9.33) and a usable mean.
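A minimal sketch of the sentinel-to-NULL cleanup and the resulting "usable mean", assuming values arrive as Python numbers with None standing in for NULL. The sentinel set is an assumption: ACS publishes this annotation as -666666666, and the positive form is matched as well only in case the sign was dropped somewhere upstream.

```python
import statistics

# ACS "estimate could not be computed" annotation. The positive form is also
# matched in case the sign was lost upstream (assumption, not an ACS value).
ACS_SENTINELS = {-666_666_666, 666_666_666}


def clean_acs_value(value):
    """Return None for NULLs and sentinels, otherwise the value unchanged."""
    if value is None or value in ACS_SENTINELS:
        return None
    return value


def usable_mean(values):
    """Mean of the values that survive cleanup, or None if nothing is left."""
    cleaned = [v for v in (clean_acs_value(x) for x in values) if v is not None]
    return statistics.fmean(cleaned) if cleaned else None


raw = [2.5, 3.1, -666_666_666, None, 2.9]  # toy avg_household_size values
print(usable_mean(raw))
```

Without the cleanup, a single sentinel would pull the mean of this toy list to roughly -1.7e8, which is why the uncleaned column had no usable mean.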


2. Geographic concentration (top 15 states)

State DC count Total power_mw (where known) Median HH income Median bachelor's % Median % white Notes
VA 378 255 $141,250 62.6% 42.5% Loudoun / DC-Alley dominance (20.6% of all US DCs)
TX 162 597 $88,228 46.2% 32.0% DFW + Austin + San Antonio
CA 147 130 $164,928 56.4% 22.4% Bay Area + LA basin
OR 145 125 $72,719 20.0% 63.2% Columbia River hydro corridor (rural)
OH 103 135 $128,875 47.0% 74.5% Columbus boom — fastest-rising market
WA 93 70 $91,623 21.9% 40.3% Quincy/Wenatchee + Seattle
AZ 69 54 $85,335 35.2% 51.6% Phoenix/Goodyear hyperscale
IA 65 0 $93,393 34.3% 88.1% 88% white (rural Midwest)
NJ 62 98 $147,321 59.4% 32.9% NYC-metro carrier hotels
IL 61 128 $96,191 52.9% 52.0% Chicago metro
GA 50 241 $101,176 51.4% 31.6% Atlanta + high-power rural builds
NY 48 47 $77,465 47.6% 74.8% NYC + upstate
NV 41 0 $93,409 31.2% 34.6% Reno + Las Vegas
TN 32 0 54.8% Nashville + Memphis (newly visible after state backfill)
NC 31 56 $82,708 44.7% 59.6% Charlotte + Catawba (nuclear-adjacent)

Virginia alone holds 20.6% of all US DCs (378 of 1,833), with the most educated tract profile in the top 15 (62.6% median bachelor's+), a Loudoun County effect; its median tract income ($141,250) trails only California and New Jersey. The state backfill substantially elevated Ohio (76 → 103) and Texas (135 → 162), pushing TX into the #2 slot. The previously uncounted Tennessee (32) now appears in the top 15.

Oregon and Washington tracts look notably different from the urban-heavy states (lower income, lower education, lower broadband, higher Hispanic share), reflecting their rural Columbia River siting.


3. Spatially clustered DCs vs. isolated DCs

DBSCAN cluster assignment from master_data_center_spatial_clusters (1,583 clustered, 250 isolated):

| Metric (median) | Isolated | In cluster | Δ |
|---|---|---|---|
| Median household income | $73,500 | $108,359 | +$34,859 |
| Bachelor's+ % | 33.2 | 51.2 | +18.0 pp |
| Poverty rate % | 11.6 | 6.9 | -4.7 pp |
| Non-Hispanic white % | 71.0 | 45.9 | -25.1 pp |
| EIA generators within 50 km | 40 | 89 | +49 |
| EIA capacity within 50 km (MW) | 2,176 | 3,300 | +1,125 |

Reading. At the median, a clustered data center sits in a tract that is ~$35K richer, 18 pp more educated, and 25 pp less non-Hispanic white than an isolated one, and it has more than twice as many EIA generators within 50 km (89 vs. 40) and about 50% more nearby generation capacity. The isolated set looks like rural / small-town America (whiter, poorer, less educated); the clustered set looks like coastal exurban tech corridors.
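The report takes cluster labels from master_data_center_spatial_clusters and does not state the DBSCAN parameters used to build that table. To make "clustered" vs. "isolated" concrete, here is a minimal stdlib DBSCAN over great-circle distances. It is a sketch only: eps_km=10 and min_pts=3 are hypothetical values, and a label of -1 (noise) corresponds to an isolated DC.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between (lat, lon) pairs in degrees, spherical Earth."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def dbscan(points, eps_km=10.0, min_pts=3):
    """Label each (lat, lon) point with a cluster id; -1 marks noise (isolated)."""
    NOISE = -1
    labels = [None] * len(points)  # None = not yet visited

    def region(i):
        # Linear scan per query; the neighbourhood includes the point itself.
        # Fine for ~2,000 DCs; use a spatial index for much larger inputs.
        return [j for j in range(len(points)) if haversine_km(points[i], points[j]) <= eps_km]

    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = region(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins the cluster, is not expanded
                continue
            if labels[j] is not None:
                continue  # already assigned
            labels[j] = cluster_id
            neighbours = region(j)
            if len(neighbours) >= min_pts:  # j is a core point: grow the cluster through it
                queue.extend(neighbours)
        cluster_id += 1
    return labels


points = [
    (39.04, -77.49), (39.05, -77.48), (39.03, -77.47),  # three sites ~1-2 km apart
    (45.60, -121.20),                                    # one far-away site
]
print(dbscan(points))  # [0, 0, 0, -1]
```

Raising min_pts or shrinking eps_km moves DCs from clusters into the isolated set, so the 1,583 / 250 split above depends on whatever parameters the clusters table was built with.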


4. RUCA (urban-rural) distribution

National baseline of all US tracts: 80% Metropolitan, 9% Micropolitan, 3% Small town, 8% Rural.

| RUCA band | DCs | DC % | US tract % | Over-index |
|---|---|---|---|---|
| Metropolitan (1-3) | 1,636 | 89.3% | 80.1% | 1.11× |
| Micropolitan (4-6) | 98 | 5.3% | 9.0% | 0.59× |
| Small town (7-9) | 15 | 0.8% | 2.9% | 0.28× |
| Rural (10) | 77 | 4.2% | 7.6% | 0.55× |
| Unknown / missed match | 7 | 0.4% | | |

Reading. The metro skew is real but mild (1.11×). The eye-catching pattern is at the rural end: rural tracts (RUCA 10) hold five times as many DCs as small-town tracts (77 vs. 15) and over-index almost as strongly as micropolitan tracts (0.55× vs. 0.59×). This is consistent with a hyperscale greenfield model that bypasses small-city economies in favor of remote, cheap-power, low-population sites.
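The band grouping and over-index column in the table above are simple arithmetic. The sketch below reproduces the over-index values from the table's own counts; the function names are illustrative. The DC denominator is all 1,833 DCs, including the 7 unmatched ones, which is what the table's "DC %" column uses.

```python
# RUCA 2020 primary codes grouped into the four bands used in the table above.
RUCA_BANDS = {1: "Metropolitan", 2: "Metropolitan", 3: "Metropolitan",
              4: "Micropolitan", 5: "Micropolitan", 6: "Micropolitan",
              7: "Small town", 8: "Small town", 9: "Small town",
              10: "Rural"}


def ruca_band(primary_code):
    """Map a RUCA primary code (1-10) to its band; anything else is 'Unknown'."""
    return RUCA_BANDS.get(primary_code, "Unknown")


def over_index(dc_counts, tract_shares):
    """(DC share of band) / (band share of all US tracts), rounded to 2 places.

    The DC denominator includes unmatched DCs, as the table's 'DC %' column does.
    """
    total = sum(dc_counts.values())
    return {band: round((dc_counts[band] / total) / share, 2)
            for band, share in tract_shares.items()}


dc_counts = {"Metropolitan": 1636, "Micropolitan": 98, "Small town": 15,
             "Rural": 77, "Unknown": 7}
tract_shares = {"Metropolitan": 0.801, "Micropolitan": 0.090,
                "Small town": 0.029, "Rural": 0.076}
print(over_index(dc_counts, tract_shares))
# {'Metropolitan': 1.11, 'Micropolitan': 0.59, 'Small town': 0.28, 'Rural': 0.55}
```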

Per-RUCA-code drilldown

| RUCA | Description | DCs | Median HH income | Median pop density (per sq mi) | Median EIA gens (50 km) |
|---|---|---|---|---|---|
| 1 | Metro core | 1,425 | $110,333 | 1,859 | 97 |
| 2 | Metro high-commute | 206 | $105,404 | 96 | 49 |
| 3 | Metro low-commute | 5 | $119,495 | 22 | 23 |
| 4 | Micropolitan core | 54 | $63,698 | 312 | 53 |
| 5 | Micropolitan high-commute | 22 | $72,465 | 191 | 51 |
| 6 | Micropolitan low-commute | 22 | $72,719 | 69 | 59 |
| 7 | Small town core | 14 | $87,522 | 2,336 | 40 |
| 8 | Small town high-commute | 1 | $69,074 | 36 | 41 |
| 10 | Rural area | 77 | $93,820 | 12 | 42 |

Two surprises:

  • Rural DCs (RUCA 10) sit in tracts with $93.8K median income, higher than micropolitan DCs ($63.7K-$72.7K). The rural DC sites are not poor rural America; they are wealthy-by-rural-standards tracts chosen for power and water access.
  • Micropolitan-core DCs (RUCA 4) have the lowest median income at $63.7K — the closest thing to "economic-development DC siting" in the dataset.

5. Non-metro deep dive (190 DCs, RUCA 4-10)

Operators

| Operator | Non-metro DCs |
|---|---|
| Amazon Web Services | 67 |
| (null operator) | 50 |
| Meta | 20 (+ 2 as "Meta, Inc.") |
| Microsoft | 10 |
| Google | 4 |
| Rowan Green Data | 4 |
| NTT | 2 |
| Yahoo | 2 |
| Amazon AWS (dupe) | 2 |

The five hyperscalers (AWS, Meta, Microsoft, Google, Yahoo) account for 105 of 190 non-metro DCs (55%). If the 50 null-operator rows skew similarly hyperscale (likely — they're disproportionately in OR/WA), the share is probably closer to 75%.
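The counts above depend on how raw operator strings are grouped (see data-quality flag 3). A minimal normalization sketch follows. The alias table is hypothetical, built only from the variants this report names, and Yahoo is counted as a hyperscaler because the report does so. Folding variants together can shift the headline share, so the 55% figure is best recomputed after normalization.

```python
from collections import Counter

# Hypothetical alias table, built only from the variants named in this report.
OPERATOR_ALIASES = {
    "amazon web services": "AWS",
    "amazon aws": "AWS",
    "aws": "AWS",
    "meta": "Meta",
    "meta, inc.": "Meta",
    "microsoft": "Microsoft",
    "microsoft azure": "Microsoft",
    "google": "Google",
    "yahoo": "Yahoo",
}
# The report counts these five as hyperscalers.
HYPERSCALERS = {"AWS", "Meta", "Microsoft", "Google", "Yahoo"}


def normalize_operator(raw):
    """Canonical operator name; None or blank strings become None."""
    if raw is None or not raw.strip():
        return None
    key = " ".join(raw.lower().split())  # case-fold and collapse whitespace
    return OPERATOR_ALIASES.get(key, raw.strip())


def hyperscaler_share(raw_operators):
    """Return (hyperscaler DCs, all DCs) after normalization; nulls count as non-hyperscaler."""
    counts = Counter(normalize_operator(op) for op in raw_operators)
    hyper = sum(n for op, n in counts.items() if op in HYPERSCALERS)
    return hyper, sum(counts.values())


sample = ["Amazon Web Services", "amazon web services", "Amazon AWS", "Meta, Inc.",
          "Meta", "Microsoft Azure", None, "NTT", "Rowan Green Data", "  "]
print(hyperscaler_share(sample))  # (6, 10)
```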

States (post-backfill)

| State | Non-metro DCs |
|---|---|
| Oregon | 86 |
| Washington | 40 |
| Texas | 9 |
| New Mexico | 7 |
| North Carolina | 6 |
| Pennsylvania | 5 |
| Wisconsin | 4 |
| New York | 3 |
| Tennessee | 3 |
| Georgia | 3 |

Oregon + Washington = 126 (66%) of all non-metro DCs. This is the Columbia River basin: Prineville / Hermiston / Boardman / The Dalles (OR) and Quincy / East Wenatchee / Moses Lake (WA). The pull is hydroelectric power (cheap, low-carbon, abundant) and cool dry climate (free-cooling).

The state backfill clarified the rest of the non-metro tail: Texas (9) and Pennsylvania (5) were previously hidden in the null bucket. These likely represent shale-gas-adjacent builds (Permian and Marcellus respectively).


6. Energy footprint by operator (using EIA capacity within 50 km)

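Aggregated across DCs in RUCA 2-10 (i.e. anything outside the dense metro core, n=401). Capacities are summed per DC, so a generator within 50 km of several DCs is counted once for each of them; read the operator totals as exposure indices, not unique grid capacity.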

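| Operator | DCs | States | Total nearby capacity (GW) | Median per site (GW) | Hydro (GW) | Nuclear (GW) | NG (GW) | Solar (GW) | Wind (GW) |
|---|---|---|---|---|---|---|---|---|---|
| AWS | 93 | 5 | 397 | 4.8 | 66 | 2.5 | 201 | 4.6 | 114 |
| (Unknown) | 118 | 26 | 339 | 2.3 | 86 | 35 | 135 | 23 | 19 |
| Meta | 51 | 11 | 120 | 2.0 | 4.9 | 0 | 61 | 16 | 0.3 |
| Microsoft | 26 | 6 | 113 | 3.4 | 28 | 13 | 39 | 9.1 | 8.1 |
| Google | 31 | 5 | 100 | 3.9 | 14 | 0 | 43 | 3.6 | 4.7 |
| Apple | 5 | 2 | 4 | 0.6 | 1.6 | 0 | 1.1 | 0.9 | 0.4 |
| Yahoo | 2 | 1 | 7 | 3.5 | 6.4 | 0 | 0 | 0 | 0.7 |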

Distinct hyperscaler strategies, visible in the fuel mix:

  • AWS has the largest wind exposure, 114 GW summed across its 93 sites, making it the most renewable-coupled portfolio in absolute terms. It also has heavy hydro exposure (66 GW) from its OR/WA footprint and 201 GW of natural gas as baseline.
  • Microsoft has the highest nuclear exposure among named operators (12.6 GW, rounded to 13 in the table), almost entirely from its Goodyear, AZ campuses near Palo Verde Nuclear.
  • Meta has the most solar (16 GW) among the named hyperscalers, but minimal nuclear or wind — consistent with its New Mexico (Los Lunas) and Iowa builds.
  • Google is split: moderate hydro (14 GW) and natural gas (43 GW), with modest solar and wind.
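The operator rollup above can be sketched as a group-by over per-DC fuel mixes. This is a minimal sketch under the assumption that each DC arrives as an (operator, {fuel: GW within 50 km}) pair; the input shape and function name are hypothetical. It makes the double counting explicit: three campuses within 50 km of the same 4.2 GW plant contribute 12.6 GW of nuclear exposure, which appears to be the arithmetic behind Microsoft's Goodyear figure.

```python
import statistics
from collections import defaultdict


def rollup_by_operator(sites):
    """Aggregate per-DC fuel mixes by operator.

    sites: iterable of (operator, {fuel: GW within 50 km}) pairs, one per DC.
    Totals are summed per DC, so a generator near several DCs of the same
    operator is counted once per DC: an exposure index, not unique capacity.
    """
    grouped = defaultdict(list)
    for operator, fuels in sites:
        grouped[operator or "(Unknown)"].append(fuels)

    summary = {}
    for operator, mixes in grouped.items():
        site_totals = [sum(mix.values()) for mix in mixes]
        by_fuel = defaultdict(float)
        for mix in mixes:
            for fuel, gw in mix.items():
                by_fuel[fuel] += gw
        summary[operator] = {
            "dcs": len(mixes),
            "total_gw": sum(site_totals),
            "median_site_gw": statistics.median(site_totals),
            "by_fuel_gw": dict(by_fuel),
        }
    return summary


# Toy input: three campuses sharing one 4.2 GW nuclear plant, plus one null-operator DC.
sites = [
    ("Microsoft", {"nuclear": 4.2, "gas": 6.4}),
    ("Microsoft", {"nuclear": 4.2, "gas": 6.4}),
    ("Microsoft", {"nuclear": 4.2, "gas": 6.0}),
    (None, {"hydro": 1.0}),
]
summary = rollup_by_operator(sites)
print(summary["Microsoft"]["by_fuel_gw"]["nuclear"])  # ~12.6: one plant, counted three times
```

Counting unique capacity instead would require deduplicating generator IDs across each operator's sites before summing.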

Largest grid neighborhoods outside metro cores (top RUCA 2-10 sites by surrounding capacity)

| DC | Operator | Location | Nearby capacity | Fuel highlight |
|---|---|---|---|---|
| PHX70 / PHX-10 / PHX-11 | Microsoft (Azure) | Goodyear, AZ (RUCA 2) | 14.0-14.6 GW | 4.2 GW nuclear (Palo Verde) + 6.4 GW gas + 2.2 GW solar |
| Stream PHX-1 | Stream Data Centers | Goodyear, AZ | 13.4 GW | Same Palo Verde / gas mix |
| T5 Charlotte Campus | T5 | Kings Mountain, NC (RUCA 6) | 12.9 GW | 4.9 GW nuclear (Catawba) + 5.5 GW gas + 1.5 GW coal |
| Apple Maiden | Apple | Maiden, NC (RUCA 2) | 9.1 GW | 2.4 GW nuclear + 4.6 GW gas |
| Percheron DC | Rowan Green Data | Texas (RUCA 10) | 6.7 GW | 3.0 GW wind + 0.9 GW hydro + 2.4 GW gas |

7. Watershed (HUC8) concentration

Each DC sits in exactly one USGS HUC8 watershed (8-digit hydrologic unit, subbasin scale, median ~3,000 sq km). Cooling-water draw and wastewater discharge happen at watershed scale, not state scale — a single stressed basin can cap an entire DC corridor regardless of how big the state's overall water budget is.
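The DC-to-watershed assignment relies on PostGIS ST_Contains(w.geom, m.geom), whose core is a point-in-polygon test. The sketch below uses even-odd ray casting for intuition only. Its simplifications are assumptions, not PostGIS behavior: lon/lat are treated as planar x/y, each watershed is a single ring without holes, and points exactly on an edge get no guaranteed answer, whereas ST_Contains returns false for a point on the boundary.

```python
def point_in_polygon(x, y, ring):
    """Even-odd ray casting: True if (x, y) lies inside the closed ring of (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to close the ring
        # Does a horizontal ray from (x, y) towards +x cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_watershed(lon, lat, watersheds):
    """Return the first HUC8 key whose ring contains (lon, lat), else None."""
    for huc8, ring in watersheds.items():
        if point_in_polygon(lon, lat, ring):
            return huc8
    return None


# Toy square "watersheds" sharing an edge at x = 10 (hypothetical keys).
watersheds = {
    "A": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "B": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
print(assign_watershed(15, 5, watersheds))  # B
```

Because HUC8 units tile the land surface without overlapping, at most one should contain a given DC, so returning the first match is safe in this setting.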

Where the 1,831 matched DCs land

  • 257 distinct HUC8s hold at least one DC — that's only 12% of the 2,139 US watersheds (the other 88% have zero data centers).
  • The top watershed alone (Middle Potomac-Catoctin) holds 235 DCs, 12.8% of the entire US data-center footprint.
  • DC concentration is much more extreme at the watershed level than at the state level. Virginia has 20.6% of US DCs; the single Loudoun watershed holds 12.8%.

Cumulative concentration

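| Top N watersheds | DCs | Share of all US DCs |
|---|---|---|
| 1 | 235 | 12.8% |
| 2 | 346 | 18.9% |
| 3 | 434 | 23.7% |
| 5 | 551 | 30.1% |
| 10 | 736 | 40.2% |
| 15 | 887 | 48.4% |
| 20 | 1,012 | 55.3% |
| 30 | 1,186 | 64.8% |
| 50 | 1,380 | 75.4% |
| 100 | 1,611 | 88.0% |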

Nearly half of all US data centers (48.4%) live in just 15 watersheds, and three-quarters in 50. Water stress, drought policy, or thermal-discharge limits in any one of these basins would affect a large share of the national footprint.
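The cumulative-concentration table is a ranked running sum. In the sketch below, the function names are illustrative; feeding it the 15 DC counts from the top-15 table in this section, with 1,833 DCs as the denominator, reproduces the published shares.

```python
from collections import Counter


def counts_from_labels(huc8_per_dc):
    """Tally DCs per HUC8 from one label per DC, ignoring unmatched (None) DCs."""
    return list(Counter(h for h in huc8_per_dc if h is not None).values())


def top_n_shares(dc_counts_per_huc8, total_dcs, cutoffs=(1, 2, 3, 5, 10, 15)):
    """Cumulative DCs and % of all DCs held by the top-N watersheds, ranked by DC count."""
    ranked = sorted(dc_counts_per_huc8, reverse=True)
    result, running = {}, 0
    for n, count in enumerate(ranked, start=1):
        running += count
        if n in cutoffs:
            result[n] = (running, round(100 * running / total_dcs, 1))
    return result


top15 = [235, 111, 88, 73, 44, 40, 39, 37, 36, 33, 32, 31, 30, 29, 29]
print(top_n_shares(top15, total_dcs=1833))
# {1: (235, 12.8), 2: (346, 18.9), 3: (434, 23.7), 5: (551, 30.1), 10: (736, 40.2), 15: (887, 48.4)}
```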

Top 15 watersheds by DC count

| HUC8 | Name | States | DCs | Cluster |
|---|---|---|---|---|
| 02070008 | Middle Potomac-Catoctin | DC, MD, VA, WV | 235 | Loudoun / Ashburn (DC-Alley) |
| 02070010 | Middle Potomac-Anacostia-Occoquan | DC, MD, VA | 111 | Fairfax + inner Loudoun + DC |
| 18050003 | Coyote | CA | 88 | Silicon Valley / San Jose |
| 05060001 | Upper Scioto | OH | 73 | Columbus (fastest-growing market) |
| 17070101 | Middle Columbia-Lake Wallula | OR, WA | 44 | Boardman / Hermiston (hyperscale hydro) |
| 17020015 | Lower Crab | WA | 40 | Quincy / Moses Lake (hyperscale hydro) |
| 17090010 | Tualatin | OR | 39 | Hillsboro (Intel / Google) |
| 12030105 | Upper Trinity | TX | 37 | DFW |
| 10230006 | Big Papillion-Mosquito | IA, NE | 36 | Council Bluffs / Omaha (Meta) |
| 07120004 | Des Plaines | IL, WI | 33 | Chicago metro |
| 12100302 | Medina | TX | 32 | San Antonio |
| 02030105 | Raritan | NJ | 31 | Central NJ carrier hotels |
| 15050100 | Middle Gila | AZ | 30 | Phoenix metro |
| 02030103 | Hackensack-Passaic | NJ, NY | 29 | NYC metro east |
| 17070103 | Umatilla | OR | 29 | AWS-only (all 29 DCs): pure single-operator basin |

The non-metro / hyperscale Pacific Northwest story is visible at watershed scale: three Columbia-system watersheds (Middle Columbia-Lake Wallula, Lower Crab, Umatilla) hold 113 DCs combined, all hyperscale-dominated. Umatilla is operationally AWS-exclusive — all 29 DCs in that basin are AWS.

Non-metro watersheds (RUCA 4-10): where hyperscalers cluster

| HUC8 | Name | States | DCs | Operators |
|---|---|---|---|---|
| 17070101 | Middle Columbia-Lake Wallula | OR, WA | 44 | AWS (multiple variants), Rowan Green Data |
| 17020015 | Lower Crab | WA | 40 | CyrusOne, Intuit, Microsoft, NTT, Sabey, Yahoo |
| 17070103 | Umatilla | OR | 29 | AWS only |
| 17070305 | Lower Crooked | OR | 8 | Meta (Prineville) |
| 13020203 | Rio Grande-Albuquerque | NM | 7 | Meta (Los Lunas) |
| 03050105 | Upper Broad | NC, SC | 6 | Meta |
| 13070001 | Lower Pecos-Red Bluff Reservoir | NM, TX | 5 | IONIC Digital |
| 17070105 | Middle Columbia-Hood | OR, WA | 4 | Google (The Dalles) |
| 02050107 | Upper Susquehanna-Lackawanna | PA | 3 | AWS |
| 03070103 | Upper Ocmulgee | GA | 2 | Meta |

This view is the cleanest evidence yet of the hyperscale geographic strategy — single-operator capture of individual watersheds (Umatilla = AWS, Lower Crooked = Meta, Middle Columbia-Hood = Google, Rio Grande-Albuquerque = Meta). Each of these basins has effectively been claimed by one player.
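Single-operator basins like these can be flagged mechanically once operator names are normalized. A minimal sketch, assuming one (huc8, operator) pair per DC; the function name, the min_dcs threshold, and the basin keys in the toy data are hypothetical. Basins containing a null operator are skipped because the null could hide a second player.

```python
from collections import defaultdict


def single_operator_basins(rows, min_dcs=2):
    """Return {huc8: operator} for basins where every DC has the same known operator.

    rows: (huc8, operator) pairs, one per DC, with operator names already normalized.
    Basins containing a null operator are skipped; min_dcs (hypothetical default)
    drops one-DC basins, which are trivially single-operator.
    """
    by_basin = defaultdict(list)
    for huc8, operator in rows:
        by_basin[huc8].append(operator)
    return {
        huc8: ops[0]
        for huc8, ops in by_basin.items()
        if len(ops) >= min_dcs and None not in ops and len(set(ops)) == 1
    }


rows = (
    [("basin-a", "AWS")] * 3
    + [("basin-b", "Meta")] * 2
    + [("basin-c", "Microsoft"), ("basin-c", "Sabey")]
    + [("basin-d", "Google"), ("basin-d", None)]
    + [("basin-e", "NTT")]
)
print(single_operator_basins(rows))  # {'basin-a': 'AWS', 'basin-b': 'Meta'}
```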

Implications for water-stress analysis

This watershed view is a boundary set for downstream water-stress analysis: pull USGS WaterWatch streamflow data, USGS water-use estimates, or EPA drought indicators for just these 257 HUC8s (or just the top 15 for the highest-leverage story). One stress index computed across this set would size the "water exposure" of the entire US DC fleet.
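One simple form such an index could take is a DC-weighted mean of per-basin stress scores. The sketch below assumes a 0-1 stress score per HUC8 from whichever source is chosen; the function name is hypothetical and the scores in the example are made-up placeholders, not real stress data. It also reports the share of DCs whose basin has no score, so coverage gaps stay visible.

```python
def fleet_water_exposure(dcs_per_basin, stress_per_basin):
    """DC-weighted mean stress across basins, plus the share of DCs left unscored.

    index = sum(n_h * s_h) / sum(n_h) over basins h that have a stress score s_h.
    """
    total = sum(dcs_per_basin.values())
    scored = {h: n for h, n in dcs_per_basin.items() if h in stress_per_basin}
    covered = sum(scored.values())
    if covered == 0:
        return None, 1.0
    index = sum(n * stress_per_basin[h] for h, n in scored.items()) / covered
    return index, 1 - covered / total


# DC counts echo the top three HUC8s in this report; stress scores are made-up placeholders.
dcs = {"basin-a": 235, "basin-b": 111, "basin-c": 88}
stress = {"basin-a": 0.4, "basin-b": 0.2}  # placeholder 0-1 scores, not real data
index, unscored = fleet_water_exposure(dcs, stress)
print(round(index, 3), round(unscored, 3))
```

Weighting by DC count mirrors the count-based concentration tables above; weighting by power_mw would be preferable once that column is backfilled.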


Data quality flags

  1. master_data_centers.power_mw is populated for only 108 / 1,833 DCs (5.9%). Useless as a sizing metric without imputation or alternative source. Nearby EIA capacity is the more reliable proxy (used as the per-DC scale in this analysis). A grant-funded scrape of Baxtel / Data Center Map would close this gap.
  2. 50 of 190 non-metro DCs (26%) have null operator. Likely hyperscalers based on geography (OR/WA) but unconfirmed.
  3. Operator-string fragmentation: "Meta" vs. "Meta, Inc."; "Amazon Web Services" vs. "Amazon AWS" vs. "amazon web services"; "Microsoft" vs. "Microsoft Azure". Inflates distinct-operator counts and fragments per-operator totals.
  4. avg_household_size sentinel pollution: resolved 2026-05-18. 1,089 sentinel values (-666,666,666) in _dc_census_tract_acs_2024 and 16 in data_center_census_tracts_2024 were replaced with NULL; 29 DCs were affected.
  5. 7 DCs failed RUCA join — Puerto Rico tracts or non-US locations; trivial.
  6. EIA generator coordinates had a longitude sign error for 2008-01 through 2010-11 (~11K rows with positive lower-48 longitudes). The flat-table build at ingest_eia_energy_layers.py:839-870 corrects this in both longitude and geom, so spatial joins are unaffected; the energy aggregation in this analysis also uses only period 2026-02, outside the affected window.
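For readers curious what such a correction looks like, here is a hypothetical sketch; the actual logic lives in ingest_eia_energy_layers.py:839-870 and is not reproduced here. The period window comes from the flag above, while the lower-48 bounding box (about 24-50° N, 66-125° W) is an assumption.

```python
def fix_longitude(lon, lat, period):
    """Restore the dropped minus sign on lower-48 longitudes in the affected periods.

    Hypothetical rule for illustration: the window 2008-01..2010-11 comes from
    the data-quality flag; the lower-48 box (about 24-50 N, 66-125 W) is an assumption.
    """
    in_window = "2008-01" <= period <= "2010-11"  # zero-padded YYYY-MM sorts chronologically
    looks_lower48 = 24.0 <= lat <= 50.0 and 66.0 <= lon <= 125.0
    return -lon if in_window and looks_lower48 else lon


print(fix_longitude(77.5, 39.0, "2009-06"))  # -77.5 (sign restored)
print(fix_longitude(77.5, 39.0, "2026-02"))  # 77.5 (outside the window, untouched)
```

Any real fix also has to rebuild geom from the corrected longitude, which the report says the flat-table build does.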

Suggested next steps

  1. Backfill power_mw from Baxtel / Data Center Map (paid scrape — grant work).
  2. Operator-string deduplication — collapse "Meta"/"Meta, Inc.", "AWS" variants, etc., before any per-operator analysis.
  3. Water-stress overlay against the 257 watersheds — now that the HUC8 join is in place, pull USGS WaterWatch streamflow data, USGS water-use estimates, or EPA drought-status indicators against this watershed set. A single stress index per HUC8 would size the entire US fleet's water exposure.
  4. State-level energy demand context: im3_state_projected_moderate_50 and seds_state_msn_year are loaded; joining these would let us compute "DC nearby capacity as share of state grid" rather than absolute MW.
  5. Opposition cases overlay: opposition_cases_geocoded is loaded but unused; could test whether cluster-vs-isolated demographic differences (or watershed concentration) predict community opposition.