Data Centers, Submarine Cables, and the Concentrated-Costs / Dispersed-Benefits Frame
Author: David Adams · Date: 2026-05-17
Data: PostGIS data_centers DB — us_dc_sample_geocoded (1,489 DCs),
data_center_census_tracts_2024 (611 tracts, ACS 2024 5-yr enriched),
internet_cables (693 cables), internet_city_dominance (4,552 cities),
census_tract_acs_2024_selected_states.csv (83,811 tracts, 46 states).
1. Are US data centers spatially tied to submarine cables?
Distance from each point to the nearest submarine cable line (km):
| Group | n | Mean | p10 | p25 | p50 | p75 | p90 | ≤10 km | ≤50 km | ≤100 km | ≤250 km |
|---|---|---|---|---|---|---|---|---|---|---|---|
| US data centers | 1,489 | 358.7 | 21.6 | 163.1 | 276.1 | 477.4 | 867.4 | 5.2% | 16.8% | 21.4% | 32.2% |
| US population cities | 1,291 | 339.7 | 18.7 | 61.2 | 256.1 | 528.0 | 811.0 | 6.8% | 22.5% | 31.8% | 49.5% |
Mann-Whitney U two-sided: z = 2.66, p ≈ 0.008. The difference is significant, but in the opposite direction from the cable-siting hypothesis: DCs sit somewhat farther from cables than ordinary US cities (median 276.1 km vs. 256.1 km), not closer.
Interpretation. At the national level the "cables drive DC siting" story fails. The largest clusters — Loudoun County VA (Ashburn), central Washington, Hillsboro OR, Columbus OH, Iowa — are inland, anchored to terrestrial fiber, cheap power, and tax incentives rather than submarine landings. Only 21.4% of DCs sit within 100 km of any cable.
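The proximity test above can be sketched in plain Python. This is a minimal illustration, not the code in the analysis scripts: it assumes the normal approximation without tie or continuity correction, the function names are hypothetical, and the sign convention (positive z when the first sample tends to be larger) is an assumption about how the reported z was oriented.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def mann_whitney_z(x, y):
    """Two-sided Mann-Whitney U test via the normal approximation.

    Returns (z, p). z > 0 means values in x tend to be larger than in y.
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:n1])
    u_x = rank_sum_x - n1 * (n1 + 1) / 2   # pairs with x > y, ties count 1/2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u_x - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))   # two-sided tail probability
    return z, p

# Hypothetical distances in km, for illustration only.
dc_km = [21.0, 160.0, 280.0, 480.0, 870.0]
city_km = [18.0, 60.0, 250.0, 530.0, 810.0]
z, p = mann_whitney_z(dc_km, city_km)
```

Under this convention, passing DC distances as `x` and city distances as `y` gives a positive z when DCs are farther from cables. The two-sided p-value for z = 2.66 from the `erfc` formula is about 0.008, which matches the figure reported above.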
2. Cost concentration at the state level
| Measure | Value |
|---|---|
| States covered | 46 |
| Gini of DC counts across states | 0.648 |
| HHI of state shares | 0.080 |
Top states by share of US data centers:
| State | DCs | Share | Cumulative |
|---|---|---|---|
| VA | 319 | 21.4% | 21.4% |
| CA | 129 | 8.7% | 30.1% |
| TX | 120 | 8.1% | 38.1% |
| OR | 102 | 6.9% | 45.0% |
| WA | 90 | 6.0% | 51.0% |
| OH | 69 | 4.6% | 55.7% |
| AZ | 60 | 4.0% | 59.7% |
| IA | 58 | 3.9% | 63.6% |
Five states hold half of all US data centers.
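For reference, the two concentration measures in this section can be computed as in the sketch below. The function names are hypothetical, and the rank-weighted Gini shown is one common formulation; the analysis scripts may use a different variant. Only the eight state counts listed in the table are used, so the result is a partial sum, not the full 46-state HHI.

```python
def hhi(counts):
    """Herfindahl-Hirschman index: sum of squared shares (1/n if even, 1 if one holder)."""
    total = sum(counts)
    return sum((c / total) ** 2 for c in counts)

def gini(counts):
    """Gini coefficient of non-negative counts (0 = perfectly even)."""
    xs = sorted(counts)
    n = len(xs)
    total = sum(xs)
    # Rank-weighted form: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    # with ranks i = 1..n over the ascending values.
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n

# DC counts for the eight states listed above (VA, CA, TX, OR, WA, OH, AZ, IA).
top8 = [319, 129, 120, 102, 90, 69, 60, 58]
total_dcs = 1489

# Contribution of these eight states to the national HHI of state shares.
partial_hhi = sum(c * c for c in top8) / total_dcs ** 2
print(f"HHI contribution of the top 8 states: {partial_hhi:.4f}")
```

The eight listed states alone contribute about 0.074 to the HHI, which is consistent with the reported 0.080 once the remaining 38 states are added.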
3. Cost concentration at the tract level
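Concentration is much sharper at the tract level than at the state level: only 611 of the 83,811 tracts in the ACS universe host a data center, and the distribution among those host tracts is itself top-heavy: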
| Measure | Value |
|---|---|
| DC-hosting tracts | 611 |
| DCs in those tracts | 1,489 |
| Gini of DC counts across DC-hosting tracts | 0.499 |
| HHI of DC shares across DC-hosting tracts | 0.0069 |
| Top 1% of host tracts (6 tracts) hold | 14.6% of all DCs |
| Top 5% of host tracts (30 tracts) hold | 33.3% of all DCs |
| Top 20% of host tracts (122 tracts) hold | 60.6% of all DCs |
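
The top-share rows can be reproduced with a helper like the one below. This is a sketch with a hypothetical function name and toy counts; the only detail taken from the table is that the number of tracts in each top group matches rounding down (1%, 5%, and 20% of 611 give 6, 30, and 122 tracts).

```python
def top_share(counts, fraction):
    """Return (k, share): the top k units by count and their share of the total.

    k is the given fraction of the number of units, rounded down.
    """
    ranked = sorted(counts, reverse=True)
    k = int(len(ranked) * fraction)
    return k, sum(ranked[:k]) / sum(ranked)

# Group sizes implied by 611 host tracts.
group_sizes = [int(611 * f) for f in (0.01, 0.05, 0.20)]
print(group_sizes)

# Toy per-tract DC counts, for illustration only.
toy_counts = [10, 2, 1, 1, 1, 1, 1, 1, 1, 1]
k, share = top_share(toy_counts, 0.10)
print(f"top {k} tract(s) hold {share:.0%} of DCs")
```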
Population scaling:
| Metric | Value |
|---|---|
| Population living in a DC-hosting tract | 2,868,863 |
| Total population (DC-state ACS universe) | 332,343,349 |
| % of DC-host-state residents in a DC-hosting tract | 0.86% |
| DCs per resident, DC-hosting tracts | 1 per 1,927 |
| DCs per resident, DC-state average | 1 per 223,199 |
| Per-capita DC burden, host vs. average | ~115× |
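
The population-scaling figures follow directly from three numbers in the table. The short calculation below recomputes them; the variable names are illustrative.

```python
host_pop = 2_868_863        # population living in a DC-hosting tract
universe_pop = 332_343_349  # total population, DC-state ACS universe
dcs = 1_489                 # data centers in the sample

host_share = host_pop / universe_pop       # share of residents in a host tract
residents_per_dc_host = host_pop / dcs     # residents per DC inside host tracts
residents_per_dc_avg = universe_pop / dcs  # residents per DC across the universe

# The burden ratio reduces to universe_pop / host_pop.
burden_ratio = residents_per_dc_avg / residents_per_dc_host

print(f"host share:            {host_share:.2%}")
print(f"residents per DC host: {residents_per_dc_host:,.0f}")
print(f"residents per DC avg:  {residents_per_dc_avg:,.0f}")
print(f"burden ratio:          {burden_ratio:.1f}x")
```

The ratio works out to roughly 115.8, in line with the ~115× reported in the table.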
4. Who bears the costs? (ACS profile of DC tracts vs. peer tracts in same states)
| Field | DC tracts (median) | Non-DC peers (median) | Δ (DC − peer) |
|---|---|---|---|
| Median household income ($) | 91,082 | 76,637 | +14,446 |
| Per-capita income ($) | 48,111 | 38,546 | +9,565 |
| Broadband subscription (%) | 94.2 | 92.0 | +2.2 |
| Poverty rate (%) | 8.8 | 10.8 | −2.0 |
| Non-Hispanic White (%) | 52.4 | 64.7 | −12.3 |
| Non-Hispanic Black (%) | 6.7 | 3.9 | +2.8 |
| Hispanic/Latino (%) | 11.9 | 9.8 | +2.1 |
| Non-Hispanic Asian (%) | 5.2 | 1.5 | +3.7 |
Population-weighted means in DC tracts: MHI $109,145, broadband 93.2%, poverty 11.1%. The actual residents of host communities are concentrated in affluent tech corridors (Loudoun, Silicon Valley, Seattle eastside, Hillsboro OR).
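The population-weighted means differ from the tract medians in the table because large tracts count for more. A minimal sketch of the distinction, using made-up tract values and a hypothetical helper name:

```python
from statistics import median

def weighted_mean(values, weights):
    """Mean of values with each value weighted by its tract population."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Toy tracts (illustrative only): poverty rate in % and resident population.
poverty_pct = [5.0, 8.0, 30.0]
population = [1_000, 2_000, 7_000]

print(median(poverty_pct))                     # treats every tract equally
print(weighted_mean(poverty_pct, population))  # treats every resident equally
```

In the toy data a single populous high-poverty tract pulls the weighted mean well above the median. The same mechanism can explain why the weighted poverty figure (11.1%) sits above the tract median (8.8%) here.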
Primary-industry mix of host tracts (count of tracts):
| Tracts | Primary industry |
|---|---|
| 351 | Educational services, and health care and social assistance |
| 133 | Professional, scientific, management, administrative, and waste management services |
| 35 | Manufacturing |
| 26 | Arts, entertainment, recreation, accommodation, and food services |
| 22 | Retail trade |
| 14 | Agriculture, forestry, fishing and hunting, and mining |
| 10 | Finance and insurance, and real estate and rental and leasing |
| 9 | Construction |
| 4 | Transportation and warehousing, and utilities |
| 3 | Public administration |
5. Cable-adjacent vs. inland DC tracts
| | ≤100 km from a cable | >100 km from a cable |
|---|---|---|
| Tracts | 159 | 452 |
| Data centers | 319 | 1,170 |
| Median household income ($) | 106,406 | 86,289 |
| Median broadband (%) | 95.2 | 93.9 |
| Median DC count | 1 | 1 |
Inland DCs are roughly 3.7× the cable-adjacent count. Coastal/cable tracts skew even wealthier than inland DC tracts.
6. Benefit dispersion (broadband subscribers as a benefit proxy)
| Measure | Value |
|---|---|
| Estimated broadband subscribers (DC states) | 119,719,313 |
| Tracts with subscriber data | 81,839 |
| Gini of subscribers across tracts | 0.253 |
| HHI of subscribers across tracts | 0.00001 |
Side-by-side concentration:
| Series | HHI |
|---|---|
| DCs across DC-hosting tracts | 0.0069 |
| Broadband subscribers across DC-state tracts | 0.00001 |
| Concentration ratio | ~464× more concentrated for DCs |
7. Verdict
| Element of the frame | Holds? |
|---|---|
| Costs concentrated geographically | Yes — top 6 tracts carry 15% of DCs; <1% of host-state population lives in a DC tract; per-capita burden ~115× the average. |
| Driven by submarine cable infrastructure | No, broadly — proximity test fails nationally; submarine cables matter for a coastal subset only. Terrestrial fiber, power, water, land, and tax incentives dominate. |
| Benefits dispersed among users | Yes — broadband subscribers ~464× more dispersed (by HHI) than DCs. |
| Classic political failure mode (weak losers vs. diffuse winners) | No. Host tracts have higher incomes, lower poverty, and higher broadband subscription than their peers. The cost-bearing communities are affluent tech corridors with strong bargaining capacity, and they tend to convert concentrated costs into concentrated rents (tax base, jobs, infrastructure concessions). |
Bottom line. The structural asymmetry that defines "concentrated costs / dispersed benefits" is unambiguous in the data: DC siting is hyper-local while benefits are continental. But the predicted political dynamic doesn't fit cleanly, because the loser side here is not weak. A more targeted test would split host tracts into power-stressed exurban tracts (parts of Loudoun's edges, central Oregon, Iowa) and urban-suburban tech-corridor tracts, then check whether the exurban subset shows the weak-loser pattern (lower income, lower broadband subscription, higher poverty than its neighbors).
Caveats
- The ACS universe is the 46 DC-host states (already DC-heavy); excludes states with no DCs in the sample.
- `data_center_census_tracts_2024` only contains tracts that host at least one DC, by construction.
- Broadband-subscription rate is a coarse benefit proxy; cloud services benefit any internet user globally, not just local subscribers.
- 45 of 1,489 DCs use city-precision fallback coordinates, so a small share of tract assignments are approximate.
- The `logical_dominance_ips` field in `internet_city_dominance` measures IP blocks routed/hosted at each city. That is a supply-side measure that duplicates the DC signal, not a demand-side user-location measure, so it was excluded from the benefit-dispersion calculation.
Reproducible scripts
- `load_postgis_internet_cables.py`: ingest cables/landings/cities
- `make_internet_cables_map.py`: render the combined Leaflet map
- `analyze_cables_concentration.py`: state-level + cable-proximity analysis
- `analyze_dc_tract_concentration.py`: tract-level analysis used here