Enhance documentation with detailed findings from analysis report

- Add clustered vs isolated facility comparison to README
- Expand infrastructure insights with hyperscaler energy strategies
- Document additional database tables (opposition cases, IM3 projections, utility rates)
- Enhance research ideas with specific watershed names and grid saturation data
- Add data quality notes about EIA longitude corrections
- Reference loaded but unused tables for future analysis
2026-05-27 11:36:50 -07:00
parent 3758dcc02a
commit 46c8c58545
3 changed files with 158 additions and 7 deletions

@@ -40,12 +40,23 @@ Compared to the US average, data center host communities are:
 - **Better connected**: 94.9% broadband (vs. 89%)
 ### Infrastructure Insights
-- **89% of data centers are in metropolitan tracts** (vs. 80% of all US tracts)
+- **89% of data centers are in metropolitan tracts** (vs. 80% of all US tracts) - only 1.11× over-index
 - **Non-metro data centers (11%)** are dominated by hyperscalers:
   - AWS (67), Meta (22), Microsoft (10), Google (4) = 55% of non-metro facilities
   - 66% are in Oregon + Washington (Columbia River hydro corridor)
-- **Energy infrastructure**: 4 states have >2/3 of generation within 50 km of a data center:
+- **Grid saturation**: 4 states have >2/3 of generation within 50 km of a data center:
   - New Jersey: 83%, Nevada: 75%, Tennessee: 70%, Oregon: 68%
+- **Hyperscaler energy strategies** (non-metro sites):
+  - AWS: 114 GW wind + 66 GW hydro
+  - Microsoft: 13 GW nuclear (Palo Verde co-location)
+  - Meta: 16 GW solar
+### Clustered vs. Isolated Facilities
+Facilities in DBSCAN clusters differ markedly from isolated sites:
+- **$35K income gap**: Clustered sites are in tracts with a median income of $108K vs. $73K for isolated sites
+- **+18 pp education**: 51% bachelor's+ vs. 33%
+- **More diverse**: Non-Hispanic white share is 25 pp lower
+- **2× energy infrastructure**: 89 vs. 40 generators within 50 km
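To make the clustered vs. isolated split concrete, here is a minimal pure-Python DBSCAN sketch over facility coordinates. The `eps_km` and `min_samples` values, the helper names, and the sample coordinates are illustrative assumptions; the DBSCAN parameters actually used in the analysis are not stated in this diff.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def dbscan(points, eps_km, min_samples):
    """Return one label per point: a cluster id (0, 1, ...) or -1 for isolated."""
    labels = [None] * len(points)  # None = not yet visited

    def neighbors(i):
        # Brute-force neighborhood query; includes the point itself.
        return [j for j in range(len(points))
                if haversine_km(points[i], points[j]) <= eps_km]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_samples:
            labels[i] = -1  # provisionally isolated (noise)
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_samples:
                queue.extend(j_nbrs)  # core point: keep growing the cluster
    return labels


# Illustrative coordinates only (three nearby sites and one remote site).
sample_points = [(39.04, -77.49), (39.05, -77.48), (39.03, -77.50), (45.84, -119.70)]
sample_labels = dbscan(sample_points, eps_km=10.0, min_samples=3)
print(sample_labels)  # [0, 0, 0, -1]
```

Facilities labeled `-1` would count as isolated; every other label is a cluster. The brute-force neighbor search is quadratic, which is tolerable for a couple of thousand facilities but would call for a spatial index at larger scale.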
 ### Submarine Cables
 - **Data centers are NOT systematically closer to cables** than ordinary US cities
@@ -194,10 +205,18 @@ python3 make_internet_cables_map.py
 ## Data Quality Notes
 1. **Incomplete power ratings**: Only 5.9% of data centers have power ratings (108/1,833)
-2. **Operator fragmentation**: String variations ("Meta" vs. "Meta, Inc.") inflate distinct-operator counts
+2. **Operator fragmentation**: String variations ("Meta" vs. "Meta, Inc.", AWS variants) inflate distinct-operator counts
 3. **45 facilities** use city-precision fallback coordinates (approximate tract assignment)
 4. **7 facilities** failed RUCA join (Puerto Rico / non-US)
 5. **Broadband subscribers** are a coarse benefit proxy (actual cloud users are global)
+6. **EIA longitude correction**: 2008-2010 generator coordinates had sign errors, corrected in flat-table build
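As an illustration of the longitude note, the sketch below flips positive longitudes to negative for records from the affected years. The record layout (`year`, `longitude` keys), the function name, and the rule itself are assumptions; the flat-table build may apply the correction differently.

```python
def correct_longitude_sign(records, years=range(2008, 2011)):
    """Return copies of the records with positive longitudes negated
    for the given years (2008-2010 by default).

    Assumption: every affected generator lies in the western hemisphere.
    Sites with legitimately positive longitudes (for example Guam or the
    far western Aleutians) would need a bounding-box check instead.
    """
    fixed = []
    for rec in records:
        rec = dict(rec)  # copy so the caller's data is left untouched
        if rec["year"] in years and rec["longitude"] > 0:
            rec["longitude"] = -rec["longitude"]
        fixed.append(rec)
    return fixed


# Hypothetical records for illustration.
sample_generators = [
    {"plant_id": "A", "year": 2009, "longitude": 119.3},   # sign error
    {"plant_id": "B", "year": 2015, "longitude": -77.5},   # already correct
]
corrected_generators = correct_longitude_sign(sample_generators)
```

Restricting the flip to the named years keeps the rule from silently altering later records whose coordinates are already correct.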
+## Known Limitations
+- **Power capacity**: Only 5.9% populated - nearby EIA generator capacity used as proxy
+- **Operator strings**: Need deduplication (50 of 190 non-metro facilities have null operator)
+- **Benefit measurement**: Broadband subscribers are an imperfect proxy for cloud computing benefits
+- **Universe**: Limited to 46 DC-host states (excludes DC-free states from ACS comparison)
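The operator deduplication mentioned above could look roughly like the following. The mapping entries and function names are hypothetical examples, not the project's actual canonical map.

```python
import re

# Hypothetical entries; a real map would be built from the observed strings.
example_canonical_map = {
    "meta": "Meta",
    "meta inc": "Meta",
    "aws": "AWS",
    "amazon web services": "AWS",
}


def normalize_operator(name):
    """Map a raw operator string to a canonical name; None stays None."""
    if name is None:
        return None
    key = re.sub(r"[^\w\s]", "", name).lower()   # drop punctuation
    key = re.sub(r"\s+", " ", key).strip()       # collapse whitespace
    return example_canonical_map.get(key, name.strip())


def distinct_operator_count(names):
    """Count distinct canonical operators, ignoring null operators."""
    return len({normalize_operator(n) for n in names if n is not None})


raw_operators = ["Meta", "Meta, Inc.", "AWS", "Amazon Web Services", None]
print(distinct_operator_count(raw_operators))  # 2
```

Unknown strings fall through unchanged, so the count can only shrink as entries are added to the map. Null operators are excluded rather than treated as one shared operator.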
 ## Research Ideas & Future Work

@@ -412,6 +412,93 @@ Tables are organized into four categories:
 ---
+### Other Tables
+#### `opposition_cases_geocoded`
+**Rows**: 18
+**Purpose**: Geocoded community-opposition cases against data center builds
+**Key Columns**:
+- `case_id` (TEXT) - Unique identifier
+- `developer` (TEXT) - Proposed developer/operator
+- `investment_billions` (DOUBLE PRECISION) - Investment amount in billions
+- `outcome` (TEXT) - Case outcome (approved, rejected, pending)
+- `governance_response` (TEXT) - Government response
+- `latitude`, `longitude`, `geom`
+**Source**: Compiled from news archives
+**Notes**: Loaded but currently unused - see research-ideas.md for proposed analyses
+#### `census_tract_huc8_link`
+**Rows**: 806
+**Purpose**: Tract↔HUC8 spatial overlap table
+**Key Columns**:
+- `geoid` (TEXT) - Census tract GEOID
+- `huc8` (TEXT) - HUC8 watershed code
+- `overlap_pct` (DOUBLE PRECISION) - Percentage of tract overlapping watershed
+**Notes**: Useful for downstream tract-level water-stress joins; limited to tracts containing data centers
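One way the tract-level water-stress join could use `overlap_pct` is as a weight when a tract straddles several watersheds. The sketch below assumes a per-HUC8 stress score already exists (for instance from the proposed `watershed_water_stress` table); the score values, the GEOID, and the function name are made up for illustration.

```python
from collections import defaultdict


def tract_water_stress(links, stress_by_huc8):
    """Overlap-weighted average stress score per census tract.

    links: iterable of (geoid, huc8, overlap_pct) rows.
    stress_by_huc8: dict mapping HUC8 code -> stress score.
    Watersheds without a score are skipped and the remaining
    weights are renormalized.
    """
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for geoid, huc8, overlap_pct in links:
        if huc8 not in stress_by_huc8:
            continue
        weighted_sum[geoid] += overlap_pct * stress_by_huc8[huc8]
        weight_total[geoid] += overlap_pct
    return {g: weighted_sum[g] / weight_total[g]
            for g in weighted_sum if weight_total[g] > 0}


# Hypothetical tract split 60/40 across two watersheds.
sample_links = [
    ("51107000100", "02070008", 60.0),
    ("51107000100", "02070010", 40.0),
]
sample_stress = {"02070008": 0.5, "02070010": 1.0}
tract_scores = tract_water_stress(sample_links, sample_stress)
```

For the sample tract the result is (60 × 0.5 + 40 × 1.0) / 100 = 0.7. Renormalizing over the watersheds that have a score avoids treating missing data as zero stress.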
+#### `im3_state_projected_moderate_50`
+**Rows**: 328
+**Purpose**: PNNL IM3 projected data center siting (moderate growth, gravity weight 0.50)
+**Key Columns**:
+- `facility_id` (TEXT)
+- `state` (TEXT)
+- `cost_millions` (DOUBLE PRECISION)
+- `it_mw` (DOUBLE PRECISION) - IT load in megawatts
+- `cooling_water_demand_gal_per_day` (DOUBLE PRECISION)
+- `latitude`, `longitude`, `geom`
+**Source**: PNNL Integrated Multisector Multiscale Modeling (IM3)
+**Notes**: Loaded but unused - potential for forward-projection analysis
+#### `im3_projected_state_demand_summary`
+**Rows**: 31
+**Purpose**: State-level rollup of IM3 projected facility counts, IT MW, and cooling demand
+**Key Columns**:
+- `state` (TEXT)
+- `facility_count` (INTEGER)
+- `total_it_mw` (DOUBLE PRECISION)
+- `total_cooling_demand_mgd` (DOUBLE PRECISION) - Million gallons per day
+**Source**: IM3 model outputs
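The facility table reports cooling demand in gallons per day while the summary uses million gallons per day (MGD), so a rollup has to convert units. The sketch below shows one plausible aggregation using the documented column names; whether the summary table was actually derived this way is an assumption, and the sample rows are invented.

```python
from collections import defaultdict

GALLONS_PER_MILLION_GALLONS = 1_000_000


def rollup_by_state(facilities):
    """Aggregate facility rows into per-state counts, IT MW, and cooling MGD."""
    totals = defaultdict(lambda: {
        "facility_count": 0,
        "total_it_mw": 0.0,
        "total_cooling_demand_mgd": 0.0,
    })
    for f in facilities:
        row = totals[f["state"]]
        row["facility_count"] += 1
        row["total_it_mw"] += f["it_mw"]
        # Convert gal/day to MGD while accumulating.
        row["total_cooling_demand_mgd"] += (
            f["cooling_water_demand_gal_per_day"] / GALLONS_PER_MILLION_GALLONS
        )
    return dict(totals)


# Invented sample rows for illustration.
sample_facilities = [
    {"state": "OR", "it_mw": 100.0, "cooling_water_demand_gal_per_day": 1_500_000},
    {"state": "OR", "it_mw": 50.0, "cooling_water_demand_gal_per_day": 500_000},
    {"state": "VA", "it_mw": 200.0, "cooling_water_demand_gal_per_day": 3_000_000},
]
state_summary = rollup_by_state(sample_facilities)
```

Comparing such a rollup against the 31-row summary table would be a quick consistency check before using either table in projections.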
+#### `utility_rate_tracker_2025_2028`
+**Rows**: 374
+**Purpose**: Utility rate-increase tracker by provider × state × service type
+**Key Columns**:
+- `provider` (TEXT) - Utility provider name
+- `state` (TEXT)
+- `service_type` (TEXT)
+- `effective_date` (DATE)
+- `monthly_increase_dollars` (DOUBLE PRECISION)
+- `percent_increase` (DOUBLE PRECISION)
+**Source**: Utility rate tracker database
+**Notes**: Loaded but unused in demographic/energy analysis
+#### `energy_atlas_layers_catalog`
+**Rows**: ~5
+**Purpose**: Metadata catalog of EIA layers ingested
+**Key Columns**:
+- `table_name` (TEXT)
+- `source_url` (TEXT)
+- `import_timestamp` (TIMESTAMP)
+- `row_count` (INTEGER)
+**Notes**: Created by `ingest_eia_energy_layers.py`
+---
 ## Commonly Used Joins
 ### Data Center to Demographics

@@ -61,6 +61,8 @@ canonical_map = {
 ### 3. Water Stress Overlay
 **Status**: 257 HUC8 watersheds contain data centers; 15 watersheds hold 50% of facilities
+**Priority**: HIGH - Critical for environmental impact analysis
 **Approach**:
 - Join to USGS WaterWatch streamflow data
 - Add USGS Drought Watch indicators by HUC8
@@ -69,10 +71,18 @@ canonical_map = {
 - Surface water withdrawal permits
 - Drought frequency/severity (USDM historical data)
+**Key Watersheds for Focus**:
+- **Middle Potomac-Catoctin** (HUC8 02070008): 235 DCs (12.8% of US total) - Loudoun/Ashburn
+- **Middle Potomac-Anacostia-Occoquan** (02070010): 111 DCs - Fairfax/inner Loudoun
+- **Coyote** (18050003): 88 DCs - Silicon Valley
+- **Upper Scioto** (05060001): 73 DCs - Columbus OH
+- **Umatilla** (17070103): 29 DCs - AWS-exclusive watershed
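The concentration figures can be reproduced with simple arithmetic. The sketch below uses the five watershed counts listed above and the 1,833-facility total from the data quality notes; the function names are illustrative, and the five listed watersheds are not necessarily the five largest.

```python
TOTAL_FACILITIES = 1833  # total data centers, per the data quality notes

listed_watersheds = {
    "02070008": 235,  # Middle Potomac-Catoctin
    "02070010": 111,  # Middle Potomac-Anacostia-Occoquan
    "18050003": 88,   # Coyote
    "05060001": 73,   # Upper Scioto
    "17070103": 29,   # Umatilla
}


def share_pct(count, total=TOTAL_FACILITIES):
    """Share of all facilities, in percent, rounded to one decimal."""
    return round(100.0 * count / total, 1)


def watersheds_needed(counts, total, target_share=0.5):
    """Smallest number of watersheds (largest first) whose facilities reach
    target_share of the total, or None if the counts never get there."""
    running = 0
    for n, count in enumerate(sorted(counts, reverse=True), start=1):
        running += count
        if running >= target_share * total:
            return n
    return None


print(share_pct(listed_watersheds["02070008"]))     # 12.8
print(share_pct(sum(listed_watersheds.values())))   # 29.2
```

The five listed watersheds together hold 536 facilities, about 29.2% of the total. Running `watersheds_needed` over the full per-watershed counts (all 257 watersheds) is how a figure like "15 watersheds hold 50% of facilities" could be checked.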
 **Research Questions**:
 - Are data centers sited in water-stressed watersheds?
 - Do high-density clusters (Loudoun County, Columbus OH) face water constraints?
 - Compare water stress in hyperscaler non-metro sites (Columbia River corridor) vs. metro clusters
+- Does single-operator watershed capture (Umatilla = AWS only) correlate with water availability?
 **Tables to Create**:
 - `watershed_water_stress` - HUC8-level water stress indicators
@@ -83,27 +93,38 @@ canonical_map = {
 ---
 ### 4. Opposition Cases Overlay
-**Status**: Anecdotal evidence of community opposition to new data centers
+**Status**: 18 geocoded opposition cases in `opposition_cases_geocoded` table (loaded but unused)
 **Approach**:
-- Compile cases of rejected/delayed data center proposals (news archive scraping)
+- Expand dataset: Compile additional cases of rejected/delayed data center proposals from news archives
-- Geocode opposition cases, join to demographics/hazards
+- Geocode all opposition cases, join to demographics/hazards
 - Test hypotheses:
   - Do wealthier/more educated communities successfully block projects?
   - Are opposition cases more common in water-stressed or drought-prone areas?
   - Do smaller non-metro communities have less bargaining power?
+  - Does clustered vs. isolated location predict opposition likelihood?
 **Research Questions**:
 - What predicts opposition success?
 - Are opposition cases spatially clustered?
 - Do demographics differ between accepted vs. rejected sites?
+- Correlation with FEMA hazard exposure scores?
+**Analysis Plan**:
+```sql
+-- Join opposition cases to demographics
+SELECT o.*, ct.median_household_income, ct.bachelors_or_higher_pct
+FROM opposition_cases_geocoded o
+JOIN _dc_census_tract_acs_2024 ct
+ON ST_Contains(ct.geom, o.geom);
+```
 **Output**: `opposition_cases_analysis.md`
 ---
 ### 5. IM3 Forward Projection Integration
-**Status**: IM3 model includes projected data center demand growth
+**Status**: IM3 model data loaded in `im3_state_projected_moderate_50` (328 rows) and `im3_projected_state_demand_summary` (31 rows)
 **Approach**:
 - Load IM3 projected demand scenarios (2030, 2040, 2050)
@@ -113,10 +134,34 @@ canonical_map = {
 - Land availability (zoned industrial parcels)
 - Identify regions where projected demand may exceed infrastructure capacity
+**Grid Saturation Context** (from current analysis):
+- **New Jersey**: 83% of generation within 50 km of a data center
+- **Nevada**: 75%
+- **Tennessee**: 70%
+- **Oregon**: 68%
+- **Arizona**: 56%
+- **Virginia**: 50%
 **Research Questions**:
 - Which states face grid saturation from data center growth?
 - Are projected sites in water-stressed watersheds?
 - Does IM3 assume spatial distribution patterns consistent with current siting?
+- Can states with >50% grid saturation accommodate projected demand?
+**Implementation**:
+```sql
+-- Compare current saturation to IM3 projected demand
+-- (alias is "sat" rather than "current", which is a reserved word in standard SQL)
+SELECT
+    sat.state,
+    sat.dc_count,
+    sat.pct_grid_saturated,
+    proj.facility_count AS projected_new_facilities,
+    proj.total_it_mw AS projected_new_mw
+FROM state_grid_saturation AS sat
+JOIN im3_projected_state_demand_summary AS proj ON sat.state = proj.state
+WHERE sat.pct_grid_saturated > 50
+ORDER BY sat.pct_grid_saturated DESC;
+```
 **Notebook**: `im3_projection_overlay.ipynb`