Adds three coordinated changes:

- Request nameplate, summer, and winter capacity from the EIA operating-generator-capacity endpoint and project them as typed columns on energy_eia_operating_generator_capacity_flat. The original ingest only pulled latitude and longitude, leaving the flat table with no MW values despite its name.
- New cluster_analysis.ipynb joins master_data_centers to ACS-2024 demographics, USDA RUCA-2020 codes (loaded from new/), and EIA generation capacity within 50 km of each site.
- Summary doc consolidates the headline findings: DC tracts skew higher income / more educated / more racially diverse than US average, the metro over-index is only 1.11x, the non-metro tail is dominated by hyperscalers in the Columbia River corridor (OR+WA = 66% of non-metro DCs), and Microsoft co-locates with Palo Verde Nuclear in Goodyear AZ.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
US Data Centers — Demographic, Urban-Rural & Energy Context Analysis
Date: 2026-05-18
Notebook: cluster_analysis.ipynb
Universe: 1,833 data centers in public.master_data_centers, joined to ACS-2024 demographics, USDA RUCA-2020 codes, and EIA operating-generator capacity (50 km radius, latest period 2026-02, status=OP).
Update 2026-05-18: 196 previously-null state values were backfilled from geoid (first 2 chars = state FIPS). All 1,833 DCs now have a state; all state-level numbers below reflect the corrected attribution.
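The backfill rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than the code that was actually run: the lookup table is deliberately partial, the record keys and sample GEOIDs are hypothetical, and the function name `backfill_state` is an assumption of this sketch.

```python
# Partial state-FIPS -> USPS lookup, for illustration only.
# A real backfill needs the full list of state FIPS codes.
STATE_FIPS_TO_USPS = {
    "04": "AZ",
    "06": "CA",
    "41": "OR",
    "47": "TN",
    "48": "TX",
    "51": "VA",
    "53": "WA",
}


def backfill_state(record):
    """Return a copy of record with a null state filled from the geoid prefix."""
    filled = dict(record)
    geoid = filled.get("geoid")
    if filled.get("state") is None and geoid:
        # The first two characters of a tract GEOID are the state FIPS code.
        # Prefixes missing from the lookup stay None instead of being guessed.
        filled["state"] = STATE_FIPS_TO_USPS.get(geoid[:2])
    return filled


# Hypothetical rows; GEOIDs are kept as strings so leading zeros survive.
rows = [
    {"master_id": 1, "state": None, "geoid": "51107611005"},
    {"master_id": 2, "state": "OR", "geoid": "41049970200"},
    {"master_id": 3, "state": None, "geoid": None},
]
fixed = [backfill_state(r) for r in rows]
```

Rows that already carry a state are left untouched, and rows without a geoid remain null, so the rule only ever fills gaps.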
Headline findings
- DC tracts are richer, more educated, and more diverse than the US average. Median household income $103,623 vs. national $78,538 (+32%); 49% bachelor's+ vs. 35% (+14 pp); poverty rate 7.2% vs. 12.4%. Non-Hispanic white share is below national (50% vs. 58%), driven by Asian-heavy (mean 13% vs. 6%) and Hispanic-significant tracts.
- The metro skew is more modest than expected: 1.11×. 89% of DCs sit in metropolitan tracts, but 80% of all US tracts are metropolitan, so DCs are only slightly more concentrated than the underlying tract distribution would predict.
- The non-metro tail is overwhelmingly hyperscale and Pacific Northwest. Of 190 DCs outside metropolitan tracts (RUCA 4–10), AWS owns 67, Meta 22, Microsoft 10, Google 4, Yahoo 2 — combined 55% of the non-metro footprint. Oregon (86) and Washington (40) alone hold 66% of non-metro DCs, anchored to the Columbia River hydropower corridor.
- Clustered DCs are demographically distinct from isolated ones. DCs in DBSCAN clusters (n=1,583) sit in tracts with $108K median income vs. $73K for isolated DCs (n=250) — a $35K gap. Clustered DCs are more educated (+18 pp bachelor's), more diverse (–25 pp non-Hispanic white), and embedded in much denser energy infrastructure (89 vs. 40 generators within 50 km).
- Microsoft co-locates with the largest US nuclear plant. Microsoft's Goodyear, AZ campus has 14.6 GW of generation within 50 km — including 4.2 GW from Palo Verde Nuclear, the largest in the US. Despite the campus being in a RUCA-2 "Metro high-commute" tract (not strictly metro core), the surrounding grid is the densest by capacity in our analysis.
Dataset coverage and joins
| Source table | Rows | Join key | Coverage |
|---|---|---|---|
| master_data_centers | 1,833 | base | n/a |
| master_data_center_spatial_clusters | 1,831 | master_id | 99.9% |
| _dc_census_tract_acs_2024 | ~73,000 tracts | geoid | 1,807 matched (98.6%) |
| ruca_codes_2020_tract | 85,528 tracts | tract_fips_20 = geoid | 1,826 matched (99.6%) |
| energy_eia_operating_generator_capacity_flat | 4.7M rows | ST_DWithin(geom, 50km) | 1,831 DCs have ≥1 nearby gen |
Energy aggregation uses period 2026-02 only with status='OP', summing nameplate_capacity_mw for operating generators within 50 km of each DC. Note: EIA capacity columns were added to this table on 2026-05-17 — prior to that the _flat table had no MW values despite its name.
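The aggregation can be sketched in plain Python. This is a minimal illustration, not the notebook's query: it swaps the PostGIS ST_DWithin filter for a haversine great-circle distance, and the record keys (`lat`, `lon`, `period`, `status`), the non-OP status value, and all sample values are assumptions made for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby_capacity_mw(dc, generators, period="2026-02", radius_km=50.0):
    """Count operating generators near one DC and sum their nameplate MW."""
    count, total_mw = 0, 0.0
    for g in generators:
        # Keep only the chosen period and operating units.
        if g["period"] != period or g["status"] != "OP":
            continue
        if g["nameplate_capacity_mw"] is None:
            continue
        if haversine_km(dc["lat"], dc["lon"], g["lat"], g["lon"]) <= radius_km:
            count += 1
            total_mw += g["nameplate_capacity_mw"]
    return count, total_mw


# Synthetic data center and generators (not real sites).
dc = {"lat": 45.0, "lon": -120.0}
gens = [
    {"lat": 45.1, "lon": -120.0, "period": "2026-02", "status": "OP", "nameplate_capacity_mw": 100.0},
    {"lat": 45.2, "lon": -120.1, "period": "2026-02", "status": "OP", "nameplate_capacity_mw": 40.5},
    {"lat": 45.0, "lon": -121.0, "period": "2026-02", "status": "OP", "nameplate_capacity_mw": 250.0},  # ~79 km away
    {"lat": 45.05, "lon": -120.0, "period": "2026-02", "status": "RE", "nameplate_capacity_mw": 50.0},  # not operating
    {"lat": 45.05, "lon": -120.0, "period": "2026-01", "status": "OP", "nameplate_capacity_mw": 75.0},  # other period
]
n_gens, capacity_mw = nearby_capacity_mw(dc, gens)
```

Filtering to a single period matters because the flat table holds one row per generator per period; summing across periods would count the same unit many times.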
1. Demographic profile of DC tracts (n=1,807 with non-null ACS)
| Metric | DC tract (median) | DC tract (mean) | US avg | Δ mean vs. US |
|---|---|---|---|---|
| Median household income | $103,623 | $114,543 | $78,538 | +$36,005 |
| Per-capita income | $51,283 | $55,725 | $43,313 | +$12,412 |
| Poverty rate | 7.2% | 10.1% | 12.4% | −2.3 pp |
| Unemployment rate | 3.5% | 4.4% | 5.4% | −1.0 pp |
| Bachelor's+ % | 49.3% | 46.2% | 35.0% | +11.2 pp |
| Broadband subscription % | 94.9% | 93.5% | 89.0% | +4.5 pp |
| Non-Hispanic white % | 50.2% | 51.0% | 58.4% | −7.4 pp |
| Hispanic / Latino % | 12.8% | 19.5% | 19.5% | 0.0 pp |
| Non-Hispanic Black % | 5.9% | 10.6% | 12.1% | −1.5 pp |
| Non-Hispanic Asian % | 6.4% | 13.4% | 6.4% | +7.0 pp |
Interpretation. DC tracts skew toward high-income, highly-educated, technically connected, and racially diverse (specifically Asian-heavy). The race composition is interesting: DC tracts are less non-Hispanic white than national average, not more. This reflects DC siting in mixed-race coastal/exurban tech corridors (Bay Area, Northern Virginia, Seattle) rather than in homogeneous suburbs.
Data quality note. avg_household_size contains sentinel-value pollution (min: −666,666,666), so the mean is unusable; the median (2.55) is sensible.
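A short sketch of why the mean breaks while the median survives, and of one way to filter before aggregating. The sample values are invented for illustration, and the rule "drop anything that is not positive" is an assumption of this sketch (household size cannot be negative), not a documented step of the notebook.

```python
import statistics


def clean_household_sizes(values):
    """Drop nulls and non-positive sentinel codes before aggregating."""
    return [v for v in values if v is not None and v > 0]


# Invented sample containing one sentinel and one null.
raw = [2.4, 2.55, -666666666, 2.7, None, 3.1]

non_null = [v for v in raw if v is not None]
raw_mean = statistics.mean(non_null)      # dominated by the sentinel
raw_median = statistics.median(non_null)  # barely affected by one outlier

clean = clean_household_sizes(raw)
clean_mean = statistics.mean(clean)
clean_median = statistics.median(clean)
```

A single sentinel drags the raw mean to a large negative number, while the raw median stays within the range of plausible household sizes; after filtering, both statistics are usable.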
2. Geographic concentration (top 15 states)
| State | DC count | Total power_mw (where known) | Median HH income | Median bachelor's % | Median % white | Notes |
|---|---|---|---|---|---|---|
| VA | 378 | 255 | $141,250 | 62.6% | 42.5% | Loudoun / DC-Alley dominance (20.6% of all US DCs) |
| TX | 162 | 597 | $88,228 | 46.2% | 32.0% | DFW + Austin + San Antonio |
| CA | 147 | 130 | $164,928 | 56.4% | 22.4% | Bay Area + LA basin |
| OR | 145 | 125 | $72,719 | 20.0% | 63.2% | Columbia River hydro corridor (rural) |
| OH | 103 | 135 | $128,875 | 47.0% | 74.5% | Columbus boom — fastest-rising market |
| WA | 93 | 70 | $91,623 | 21.9% | 40.3% | Quincy/Wenatchee + Seattle |
| AZ | 69 | 54 | $85,335 | 35.2% | 51.6% | Phoenix/Goodyear hyperscale |
| IA | 65 | 0 | $93,393 | 34.3% | 88.1% | 88% white (rural Midwest) |
| NJ | 62 | 98 | $147,321 | 59.4% | 32.9% | NYC-metro carrier hotels |
| IL | 61 | 128 | $96,191 | 52.9% | 52.0% | Chicago metro |
| GA | 50 | 241 | $101,176 | 51.4% | 31.6% | Atlanta + high-power rural builds |
| NY | 48 | 47 | $77,465 | 47.6% | 74.8% | NYC + upstate |
| NV | 41 | 0 | $93,409 | 31.2% | 34.6% | Reno + Las Vegas |
| TN | 32 | 0 | — | — | 54.8% | Nashville + Memphis (newly visible after state backfill) |
| NC | 31 | 56 | $82,708 | 44.7% | 59.6% | Charlotte + Catawba (nuclear-adjacent) |
Virginia alone holds 20.6% of all US DCs (378 of 1,833), with the most affluent tract profile in the top 15 — a Loudoun County effect. The state backfill substantially elevated Ohio (76 → 103) and Texas (135 → 162), pushing TX into the #2 slot. The previously-uncounted Tennessee (32) now appears in the top 15.
Oregon and Washington tracts look notably different from the urban-heavy states (lower income, lower education, lower broadband, higher Hispanic share), reflecting their rural Columbia River siting.
3. Spatially clustered DCs vs. isolated DCs
DBSCAN cluster assignment from master_data_center_spatial_clusters (1,583 clustered, 250 isolated):
| Metric (median) | Isolated | In cluster | Δ |
|---|---|---|---|
| Median household income | $73,500 | $108,359 | +$34,859 |
| Bachelor's+ % | 33.2 | 51.2 | +18.0 pp |
| Poverty rate | 11.6 | 6.9 | −4.7 pp |
| Non-Hispanic white % | 71.0 | 45.9 | −25.1 pp |
| EIA generators within 50 km | 40 | 89 | +49 |
| EIA capacity within 50 km (MW) | 2,176 | 3,300 | +1,125 |
Reading. A clustered data center sits, at the median, in a tract that is ~$35K richer, 18 pp more educated, and 25 pp less non-Hispanic white than an isolated one, and it has roughly twice as many generators (89 vs. 40) and about 50% more generation capacity (3,300 vs. 2,176 MW) within 50 km. The isolated set looks like rural / small-town America (whiter, poorer, less educated); the clustered set looks like coastal exurban tech corridors.
4. RUCA (urban-rural) distribution
National baseline of all US tracts: 80% Metropolitan, 9% Micropolitan, 3% Small town, 8% Rural.
| RUCA band | DCs | DC % | US tract % | Over-index |
|---|---|---|---|---|
| Metropolitan (1–3) | 1,636 | 89.3% | 80.1% | 1.11× |
| Micropolitan (4–6) | 98 | 5.3% | 9.0% | 0.59× |
| Small town (7–9) | 15 | 0.8% | 2.9% | 0.28× |
| Rural (10) | 77 | 4.2% | 7.6% | 0.55× |
| Unknown / missed match | 7 | 0.4% | — | — |
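Reading. The metro skew is real but only mild: 1.11×. The eye-catching pattern is in the non-metro tail: rural tracts (RUCA 10) hold about five times as many DCs as small-town tracts (77 vs. 15) and nearly as many as micropolitan tracts (98), and their over-index (0.55×) is roughly double the small-town figure (0.28×). This is consistent with a hyperscale greenfield model that deliberately bypasses small-town economies in favor of remote, cheap-power, low-population sites.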
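The over-index column is the DC share of a band divided by that band's share of all US tracts. A minimal sketch that recomputes it from the counts and tract shares in the table above (the function and variable names are assumptions of this sketch):

```python
TOTAL_DCS = 1833

# (band, DC count, share of all US tracts in percent), copied from the table.
BANDS = [
    ("Metropolitan (1-3)", 1636, 80.1),
    ("Micropolitan (4-6)", 98, 9.0),
    ("Small town (7-9)", 15, 2.9),
    ("Rural (10)", 77, 7.6),
]


def over_index(dc_count, us_tract_pct, total_dcs=TOTAL_DCS):
    """Ratio of the band's DC share to its share of US tracts."""
    dc_pct = 100.0 * dc_count / total_dcs
    return dc_pct / us_tract_pct


ratios = {band: round(over_index(n, pct), 2) for band, n, pct in BANDS}
```

A ratio above 1 means the band holds more DCs than its share of tracts would predict; every non-metro band falls below 1, with small towns the most under-represented.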
Per-RUCA-code drilldown
| RUCA | Description | DCs | Median HH income | Median pop density (per sq mi) | Median EIA gens (50km) |
|---|---|---|---|---|---|
| 1 | Metro core | 1,425 | $110,333 | 1,859 | 97 |
| 2 | Metro high-commute | 206 | $105,404 | 96 | 49 |
| 3 | Metro low-commute | 5 | $119,495 | 22 | 23 |
| 4 | Micropolitan core | 54 | $63,698 | 312 | 53 |
| 5 | Micropolitan high-commute | 22 | $72,465 | 191 | 51 |
| 6 | Micropolitan low-commute | 22 | $72,719 | 69 | 59 |
| 7 | Small town core | 14 | $87,522 | 2,336 | 40 |
| 8 | Small town high-commute | 1 | $69,074 | 36 | 41 |
| 10 | Rural area | 77 | $93,820 | 12 | 42 |
Two surprises:
- Rural DCs (RUCA 10) sit in tracts with $93.8K median income, higher than micropolitan DCs ($63.7K–$72.7K). The rural DC sites are not poor rural America; they are wealthy-by-rural-standards tracts chosen for power and water access.
- Micropolitan-core DCs (RUCA 4) have the lowest median income at $63.7K — the closest thing to "economic-development DC siting" in the dataset.
5. Non-metro deep dive (190 DCs, RUCA 4–10)
Operators
| Operator | Non-metro DCs |
|---|---|
| Amazon Web Services | 67 |
| (null operator) | 50 |
| Meta | 20 (+ 2 as "Meta, Inc.") |
| Microsoft | 10 |
| Google | 4 |
| Rowan Green Data | 4 |
| NTT | 2 |
| Yahoo | 2 |
| Amazon AWS (dupe) | 2 |
The five hyperscalers (AWS, Meta, Microsoft, Google, Yahoo) account for 105 of 190 non-metro DCs (55%). If the 50 null-operator rows skew similarly hyperscale (likely — they're disproportionately in OR/WA), the share is probably closer to 75%.
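The range behind that estimate can be made explicit. The sketch below bounds the hyperscale share by assuming that a given fraction of the 50 null-operator rows is hyperscale; the counts come from the operator table, while the function name and the idea of sweeping that fraction are assumptions of this sketch.

```python
NON_METRO_DCS = 190
NULL_OPERATOR = 50
NAMED_HYPERSCALE = {"AWS": 67, "Meta": 22, "Microsoft": 10, "Google": 4, "Yahoo": 2}
NAMED_TOTAL = sum(NAMED_HYPERSCALE.values())  # 105


def hyperscale_share(null_fraction_hyperscale):
    """Share of non-metro DCs that are hyperscale under a given assumption."""
    return (NAMED_TOTAL + null_fraction_hyperscale * NULL_OPERATOR) / NON_METRO_DCS


lower = hyperscale_share(0.0)  # none of the null-operator rows are hyperscale
upper = hyperscale_share(1.0)  # all of the null-operator rows are hyperscale

# Fraction of null-operator rows that must be hyperscale to reach a 75% share.
needed_for_75 = (0.75 * NON_METRO_DCS - NAMED_TOTAL) / NULL_OPERATOR
```

The share is bounded between 55% and about 82%, and reaching 75% requires three quarters of the null-operator rows to be hyperscale.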
States (post-backfill)
| State | Non-metro DCs |
|---|---|
| Oregon | 86 |
| Washington | 40 |
| Texas | 9 |
| New Mexico | 7 |
| North Carolina | 6 |
| Pennsylvania | 5 |
| Wisconsin | 4 |
| New York | 3 |
| Tennessee | 3 |
| Georgia | 3 |
Oregon + Washington = 126 (66%) of all non-metro DCs. This is the Columbia River basin: Prineville / Hermiston / Boardman / The Dalles (OR) and Quincy / East Wenatchee / Moses Lake (WA). The pull is hydroelectric power (cheap, low-carbon, abundant) and cool dry climate (free-cooling).
The state backfill clarified the rest of the non-metro tail: Texas (9) and Pennsylvania (5) were previously hidden in the null bucket. These likely represent shale-gas-adjacent builds (Permian and Marcellus respectively).
6. Energy footprint by operator (using EIA capacity within 50 km)
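Aggregated across DCs in RUCA 2–10 (i.e. anything outside the dense metro core, n=401). Totals sum each site's 50 km capacity, so a generator within range of several sites is counted once per site: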
| Operator | DCs | States | Total nearby capacity (GW) | Median per site (GW) | Hydro (GW) | Nuclear (GW) | NG (GW) | Solar (GW) | Wind (GW) |
|---|---|---|---|---|---|---|---|---|---|
| AWS | 93 | 5 | 397 | 4.8 | 66 | 2.5 | 201 | 4.6 | 114 |
| (Unknown) | 118 | 26 | 339 | 2.3 | 86 | 35 | 135 | 23 | 19 |
| Meta | 51 | 11 | 120 | 2.0 | 4.9 | 0 | 61 | 16 | 0.3 |
| Microsoft | 26 | 6 | 113 | 3.4 | 28 | 13 | 39 | 9.1 | 8.1 |
| Google | 31 | 5 | 100 | 3.9 | 14 | 0 | 43 | 3.6 | 4.7 |
| Apple | 5 | 2 | 4 | 0.6 | 1.6 | 0 | 1.1 | 0.9 | 0.4 |
| Yahoo | 2 | 1 | 7 | 3.5 | 6.4 | 0 | 0 | 0 | 0.7 |
Distinct hyperscaler strategies, visible in the fuel mix:
- AWS has accumulated 114 GW of wind exposure across its 93 sites, by far the most renewable-coupled portfolio. It is also heavy on hydro (66 GW) from its OR/WA footprint, with 201 GW of natural gas as baseline.
- Microsoft has the highest nuclear exposure among the named operators (12.6 GW), almost entirely from its Goodyear, AZ campuses near Palo Verde Nuclear.
- Meta has the most solar (16 GW) among the named hyperscalers, but minimal nuclear or wind — consistent with its New Mexico (Los Lunas) and Iowa builds.
- Google is split — moderate hydro and natural gas, modest renewables.
Largest grid neighborhoods outside the metro core (top sites by surrounding capacity)
| DC | Operator | Location | Nearby capacity | Fuel highlight |
|---|---|---|---|---|
| PHX70 / PHX-10 / PHX-11 | Microsoft (Azure) | Goodyear, AZ (RUCA 2) | 14.0–14.6 GW | 4.2 GW nuclear (Palo Verde) + 6.4 GW gas + 2.2 GW solar |
| Stream PHX-1 | Stream Data Centers | Goodyear, AZ | 13.4 GW | Same Palo Verde / gas mix |
| T5 Charlotte Campus | T5 | Kings Mountain, NC (RUCA 6) | 12.9 GW | 4.9 GW nuclear (Catawba) + 5.5 GW gas + 1.5 GW coal |
| Apple Maiden | Apple | Maiden, NC (RUCA 2) | 9.1 GW | 2.4 GW nuclear + 4.6 GW gas |
| Percheron DC | Rowan Green Data | (Texas, RUCA 10) | 6.7 GW | 3.0 GW wind + 0.9 GW hydro + 2.4 GW gas |
Data quality flags
- 196 of 1,833 DCs (10.7%) have null state. Resolved 2026-05-18 by backfilling from geoid first-2-chars (state FIPS).
- master_data_centers.power_mw is populated for only 108 / 1,833 DCs (5.9%). Useless as a sizing metric without imputation or alternative source. Nearby EIA capacity is the more reliable proxy (used as the per-DC scale in this analysis). A grant-funded scrape of Baxtel / Data Center Map would close this gap.
- 50 of 190 non-metro DCs (26%) have null operator. Likely hyperscalers based on geography (OR/WA) but unconfirmed.
- Operator-string fragmentation: "Meta" vs. "Meta, Inc."; "Amazon Web Services" vs. "Amazon AWS" vs. "amazon web services"; "Microsoft" vs. "Microsoft Azure". Inflates distinct-operator counts and fragments per-operator totals.
- avg_household_size column has sentinel pollution (min: −666,666,666). Use median or filter before using.
- 7 DCs failed the RUCA join: Puerto Rico tracts or non-US locations; trivial.
- EIA generator coordinates had a longitude sign error for 2008-01 through 2010-11 (~11K rows with positive lower-48 longitudes). The flat-table build at ingest_eia_energy_layers.py:839-870 corrects this in longitude and geom, so spatial joins are unaffected.
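For readers unfamiliar with the longitude issue, the correction amounts to negating positive longitudes whose mirrored position lands in the lower 48. The sketch below illustrates that idea only; it is not the code in ingest_eia_energy_layers.py, and the bounding box and function name are assumptions of this sketch.

```python
# Approximate lower-48 bounding box; these bounds are an assumption.
LAT_RANGE = (24.0, 50.0)
LON_RANGE = (-125.0, -66.0)


def fix_longitude_sign(lat, lon):
    """Negate a positive longitude when the mirrored point is in the lower 48."""
    if lat is None or lon is None:
        return lon
    mirrored = -lon
    in_lat = LAT_RANGE[0] <= lat <= LAT_RANGE[1]
    in_lon = LON_RANGE[0] <= mirrored <= LON_RANGE[1]
    if lon > 0 and in_lat and in_lon:
        return mirrored
    # Already negative, or not plausibly a lower-48 point: leave unchanged.
    return lon


flipped = fix_longitude_sign(45.7, 119.8)      # sign error, gets corrected
unchanged = fix_longitude_sign(45.7, -119.8)   # already correct
```

Checking the mirrored point against a bounding box avoids flipping coordinates that are wrong for some other reason, which would otherwise be silently moved to an arbitrary location.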
Suggested next steps
- Backfill power_mw from Baxtel / Data Center Map (paid scrape; grant work).
- Operator-string deduplication: collapse "Meta"/"Meta, Inc.", "AWS" variants, etc., before any per-operator analysis.
- Watershed (HUC8) join: public.watershed_huc8 is loaded but unused; would let us look at water stress overlap, particularly for the 190 non-metro DCs.
- State-level energy demand context: im3_state_projected_moderate_50 and seds_state_msn_year are loaded; joining these would let us compute "DC nearby capacity as share of state grid" rather than absolute MW.
- Opposition cases overlay: opposition_cases_geocoded is loaded but unused; could test whether cluster-vs-isolated demographic differences predict community opposition.
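As a starting point for the operator-string deduplication, a small alias table keyed on a normalized string would collapse the variants listed under the data quality flags. This is a hedged sketch: the alias table covers only those variants, and the canonical names and the function `normalize_operator` are choices made for the example, not existing code.

```python
import re
from collections import Counter

# Alias table built from the variants named in the data quality flags.
# Canonical spellings are a choice of this sketch.
ALIASES = {
    "meta": "Meta",
    "meta inc": "Meta",
    "amazon web services": "Amazon Web Services",
    "amazon aws": "Amazon Web Services",
    "aws": "Amazon Web Services",
    "microsoft": "Microsoft",
    "microsoft azure": "Microsoft",
}


def normalize_operator(raw):
    """Map a raw operator string to a canonical name; None stays None."""
    if raw is None:
        return None
    # Lowercase, turn punctuation into spaces, and collapse whitespace.
    key = re.sub(r"[^a-z0-9 ]+", " ", raw.lower())
    key = " ".join(key.split())
    # Unknown operators pass through with surrounding whitespace trimmed.
    return ALIASES.get(key, raw.strip())


raw_operators = [
    "Meta", "Meta, Inc.", "Amazon Web Services", "Amazon AWS",
    "amazon web services", "Microsoft Azure", None, "Rowan Green Data ",
]
counts = Counter(normalize_operator(o) for o in raw_operators)
```

Keeping null operators as None, rather than mapping them to a placeholder name, lets the 50 unattributed non-metro rows stay visible as a separate bucket in per-operator totals.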