Enhance documentation with detailed findings from analysis report

- Add clustered vs isolated facility comparison to README
- Expand infrastructure insights with hyperscaler energy strategies
- Document additional database tables (opposition cases, IM3 projections, utility rates)
- Enhance research ideas with specific watershed names and grid saturation data
- Add data quality notes about EIA longitude corrections
- Reference loaded but unused tables for future analysis
2026-05-27 11:36:50 -07:00
parent 3758dcc02a
commit 46c8c58545
3 changed files with 158 additions and 7 deletions


@@ -40,12 +40,23 @@ Compared to the US average, data center host communities are:
- **Better connected**: 94.9% broadband (vs. 89%)
### Infrastructure Insights
-- **89% of data centers are in metropolitan tracts** (vs. 80% of all US tracts)
+- **89% of data centers are in metropolitan tracts** (vs. 80% of all US tracts) - only 1.11× over-index
- **Non-metro data centers (11%)** are dominated by hyperscalers:
  - AWS (67), Meta (22), Microsoft (10), Google (4) = 55% of non-metro facilities
  - 66% are in Oregon + Washington (Columbia River hydro corridor)
-- **Energy infrastructure**: 4 states have >2/3 of generation within 50 km of a data center:
+- **Grid saturation**: 4 states have >2/3 of generation within 50 km of a data center:
  - New Jersey: 83%, Nevada: 75%, Tennessee: 70%, Oregon: 68%
- **Hyperscaler energy strategies** (non-metro sites):
  - AWS: 114 GW wind + 66 GW hydro
  - Microsoft: 13 GW nuclear (Palo Verde co-location)
  - Meta: 16 GW solar
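The grid-saturation percentages above describe how much of a state's generation lies within 50 km of any data center. A minimal sketch of that calculation, assuming great-circle (haversine) distance and a capacity-weighted share; the function names, the tuple layouts, and the sample coordinates are illustrative and are not taken from the project's scripts:

```python
from collections import defaultdict
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def saturation_by_state(generators, data_centers, radius_km=50.0):
    """Share of each state's capacity within radius_km of any data center.

    generators:   iterable of (state, lat, lon, capacity_mw)  (assumed layout)
    data_centers: iterable of (lat, lon)                      (assumed layout)
    """
    data_centers = list(data_centers)
    total = defaultdict(float)
    near = defaultdict(float)
    for state, lat, lon, mw in generators:
        total[state] += mw
        if any(haversine_km(lat, lon, dlat, dlon) <= radius_km
               for dlat, dlon in data_centers):
            near[state] += mw
    return {s: near[s] / total[s] for s in total if total[s] > 0}


# Made-up coordinates and capacities, for illustration only.
generators = [
    ("OR", 45.60, -121.18, 300.0),  # co-located with the data center below
    ("OR", 42.20, -121.80, 100.0),  # several hundred km away
]
data_centers = [(45.60, -121.18)]
shares = saturation_by_state(generators, data_centers)
```

If the README's figures count generators rather than capacity, replacing `mw` with `1` in both accumulations gives the unweighted share.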
### Clustered vs. Isolated Facilities
Facilities in DBSCAN clusters differ significantly from isolated sites:
- **$35K income gap**: Clustered sites sit in tracts with a median income of $108K vs. $73K for isolated sites
- **+18 pp education**: 51% hold a bachelor's degree or higher vs. 33%
- **More diverse**: Non-Hispanic white share is 25 pp lower
- **2× energy infrastructure**: 89 vs. 40 generators within 50 km
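The clustered/isolated split above rests on DBSCAN labels: points DBSCAN marks as noise are the "isolated" sites. A small pure-Python sketch of that labelling, assuming haversine distance; `eps_km=25` and `min_samples=3` are placeholder values (the README does not state the parameters used), and the coordinates are made up:

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def dbscan(points, eps_km, min_samples):
    """Label each (lat, lon) point with a cluster id, or -1 if isolated."""
    n = len(points)
    # Brute-force neighbourhoods (each includes the point itself); O(n^2).
    neighbours = [
        [j for j in range(n) if haversine_km(*points[i], *points[j]) <= eps_km]
        for i in range(n)
    ]
    labels = [None] * n  # None means "not visited yet"
    cluster_id = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_samples:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        labels[i] = cluster_id
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster_id  # noise reachable from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            if len(neighbours[j]) >= min_samples:
                queue.extend(neighbours[j])  # j is a core point: keep expanding
        cluster_id += 1
    return labels


# Three nearby sites and one remote site (illustrative coordinates).
points = [(39.00, -77.45), (39.02, -77.46), (39.04, -77.44), (44.00, -103.00)]
labels = dbscan(points, eps_km=25.0, min_samples=3)
is_clustered = [label != -1 for label in labels]
```

The brute-force neighbour search is fine for a couple of thousand facilities; a spatial index would be the usual choice for larger inputs.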
### Submarine Cables
- **Data centers are NOT systematically closer to cables** than ordinary US cities
@@ -194,10 +205,18 @@ python3 make_internet_cables_map.py
## Data Quality Notes
1. **Incomplete power ratings**: Only 5.9% of data centers have power ratings (108/1,833)
-2. **Operator fragmentation**: String variations ("Meta" vs. "Meta, Inc.") inflate distinct-operator counts
+2. **Operator fragmentation**: String variations ("Meta" vs. "Meta, Inc.", AWS variants) inflate distinct-operator counts
3. **45 facilities** use city-precision fallback coordinates (approximate tract assignment)
4. **7 facilities** failed RUCA join (Puerto Rico / non-US)
5. **Broadband subscribers** are a coarse benefit proxy (actual cloud users are global)
6. **EIA longitude correction**: 2008-2010 generator coordinates had sign errors, corrected in flat-table build
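Note 6 can be illustrated with a short sketch of a sign correction. It assumes every affected generator lies in the western hemisphere, so a valid longitude is negative (this would mishandle the few Aleutian sites east of 180°); the function name and the `affected_years` parameter are hypothetical, and the actual flat-table build may apply a different rule:

```python
def fix_longitude_sign(year, lon, affected_years=range(2008, 2011)):
    """Force a negative longitude for records from the affected years.

    Assumes western-hemisphere generators. Records outside affected_years,
    or with a missing longitude, are returned unchanged.
    """
    if lon is None or year not in affected_years:
        return lon
    return -abs(lon)


corrected = fix_longitude_sign(2009, 121.18)   # flipped to -121.18
untouched = fix_longitude_sign(2015, 121.18)   # outside 2008-2010, left as-is
```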
## Known Limitations
- **Power capacity**: Only 5.9% of facilities have a power rating; nearby EIA generator capacity is used as a proxy
- **Operator strings**: Need deduplication (50 of 190 non-metro facilities have null operator)
- **Benefit measurement**: Broadband subscribers are an imperfect proxy for cloud computing benefits
- **Universe**: Limited to the 46 states that host data centers (states with none are excluded from the ACS comparison)
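The operator-deduplication item above could start from a normalization pass like the following sketch. The suffix list and the alias table are illustrative assumptions (only the "Meta" vs. "Meta, Inc." case comes from the notes); a real pass would build both from the operator strings actually observed:

```python
import re

# Illustrative only: extend from the strings found in the data.
CORPORATE_SUFFIXES = {"inc", "llc", "corp", "corporation", "co", "ltd"}
ALIASES = {"amazon web services": "aws"}


def normalize_operator(name):
    """Return a canonical key for an operator string, or None if missing."""
    if name is None or not name.strip():
        return None
    # Lowercase, turn punctuation into spaces, split into word tokens.
    tokens = re.sub(r"[^\w\s]", " ", name.lower()).split()
    # Drop trailing corporate suffixes such as "Inc." or "LLC".
    while tokens and tokens[-1] in CORPORATE_SUFFIXES:
        tokens.pop()
    if not tokens:
        return None
    key = " ".join(tokens)
    return ALIASES.get(key, key)


raw = ["Meta", "Meta, Inc.", "Amazon Web Services, Inc.", "AWS", None]
distinct = {normalize_operator(r) for r in raw} - {None}
```

Null operators are kept as `None` rather than merged into a placeholder name, so they stay visible as missing data in later counts.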
## Research Ideas & Future Work