Enhance documentation with detailed findings from analysis report
- Add clustered vs isolated facility comparison to README
- Expand infrastructure insights with hyperscaler energy strategies
- Document additional database tables (opposition cases, IM3 projections, utility rates)
- Enhance research ideas with specific watershed names and grid saturation data
- Add data quality notes about EIA longitude corrections
- Reference loaded but unused tables for future analysis
25
README.md
@@ -40,12 +40,23 @@ Compared to the US average, data center host communities are:
- **Better connected**: 94.9% broadband (vs. 89%)

### Infrastructure Insights
- **89% of data centers are in metropolitan tracts** (vs. 80% of all US tracts)
- **89% of data centers are in metropolitan tracts** (vs. 80% of all US tracts) - only 1.11× over-index
- **Non-metro data centers (11%)** are dominated by hyperscalers:
  - AWS (67), Meta (22), Microsoft (10), Google (4) = 55% of non-metro facilities
  - 66% are in Oregon + Washington (Columbia River hydro corridor)
- **Energy infrastructure**: 4 states have >2/3 of generation within 50 km of a data center:
- **Grid saturation**: 4 states have >2/3 of generation within 50 km of a data center:
  - New Jersey: 83%, Nevada: 75%, Tennessee: 70%, Oregon: 68%
- **Hyperscaler energy strategies** (non-metro sites):
  - AWS: 114 GW wind + 66 GW hydro
  - Microsoft: 13 GW nuclear (Palo Verde co-location)
  - Meta: 16 GW solar

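The 1.11× figure is the ratio of the two metro shares quoted above. A minimal sketch of that arithmetic follows; `over_index` is a hypothetical helper name, not a function from this repository.

```python
def over_index(group_share: float, baseline_share: float) -> float:
    """Ratio of a group's share to the baseline share (1.0 means no over-representation)."""
    if baseline_share <= 0:
        raise ValueError("baseline_share must be positive")
    return group_share / baseline_share


# Shares quoted above: 89% of data centers vs. 80% of all US tracts are metropolitan.
metro_over_index = over_index(0.89, 0.80)
print(f"metro over-index: {metro_over_index:.4f}")
```

A value this close to 1.0 means data centers are only slightly more metropolitan than US census tracts overall.
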
### Clustered vs. Isolated Facilities
Facilities in DBSCAN clusters differ significantly from isolated sites:
- **$35K income gap**: Clustered sites sit in tracts with a median income of $108K vs. $73K for isolated sites
- **+18 pp education**: 51% bachelor's+ vs. 33%
- **More diverse**: 25 pp lower non-Hispanic white share
- **2× energy infrastructure**: 89 vs. 40 generators within 50 km

### Submarine Cables
- **Data centers are NOT systematically closer to cables** than ordinary US cities
@@ -194,10 +205,18 @@ python3 make_internet_cables_map.py
## Data Quality Notes

1. **Incomplete power ratings**: Only 5.9% of data centers have power ratings (108/1,833)
2. **Operator fragmentation**: String variations ("Meta" vs. "Meta, Inc.") inflate distinct-operator counts
2. **Operator fragmentation**: String variations ("Meta" vs. "Meta, Inc.", AWS variants) inflate distinct-operator counts
3. **45 facilities** use city-precision fallback coordinates (approximate tract assignment)
4. **7 facilities** failed RUCA join (Puerto Rico / non-US)
5. **Broadband subscribers** are a coarse benefit proxy (actual cloud users are global)
6. **EIA longitude correction**: 2008-2010 generator coordinates had sign errors, corrected in flat-table build

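The EIA longitude note can be illustrated with a small sketch. It assumes the sign error showed up as positive longitudes for US generators, which should be negative (western hemisphere). The `Generator` fields and the `fix_longitude_sign` name are hypothetical, and the actual flat-table build may apply a different rule.

```python
from dataclasses import dataclass, replace

# Assumed window of affected EIA vintages, taken from the note above.
AFFECTED_YEARS = range(2008, 2011)  # 2008, 2009, 2010


@dataclass(frozen=True)
class Generator:
    plant_id: str      # hypothetical field names, for illustration only
    year: int
    latitude: float
    longitude: float


def fix_longitude_sign(gen: Generator) -> Generator:
    """Flip a positive longitude to negative for affected years.

    Assumes the generator is in the western hemisphere, so a positive
    longitude can only be a sign error. This is not safe for the few
    Aleutian locations west of the 180th meridian.
    """
    if gen.year in AFFECTED_YEARS and gen.longitude > 0:
        return replace(gen, longitude=-gen.longitude)
    return gen


rows = [
    Generator("A", 2009, 36.1, 115.2),   # sign error: should be -115.2
    Generator("B", 2009, 36.1, -115.2),  # already correct
    Generator("C", 2015, 40.7, 74.0),    # outside the affected window, left alone
]
fixed = [fix_longitude_sign(g) for g in rows]
```

Restricting the flip to the affected years keeps the rule from silently masking unrelated coordinate problems in other vintages.
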
## Known Limitations

- **Power capacity**: Only 5.9% populated - nearby EIA generator capacity used as proxy
- **Operator strings**: Need deduplication (50 of 190 non-metro facilities have null operator)
- **Benefit measurement**: Broadband subscribers are an imperfect proxy for cloud computing benefits
- **Universe**: Limited to 46 DC-host states (excludes DC-free states from ACS comparison)

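One way to approach the operator deduplication is a canonical-name lookup applied after light string normalization. The sketch below is illustrative only: the map entries and the `canonical_operator` helper are assumptions, not the project's actual `canonical_map`. Null operators pass through unchanged so they can still be counted.

```python
import re
from collections import Counter
from typing import Optional

# Hypothetical variant -> canonical mappings; keys are already normalized.
CANONICAL_MAP = {
    "meta": "Meta",
    "meta inc": "Meta",
    "aws": "AWS",
    "amazon web services": "AWS",
}


def normalize(name: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    cleaned = re.sub(r"[^\w\s]", "", name.lower())
    return re.sub(r"\s+", " ", cleaned).strip()


def canonical_operator(name: Optional[str]) -> Optional[str]:
    """Map a raw operator string to its canonical name; keep None as None."""
    if name is None:
        return None
    return CANONICAL_MAP.get(normalize(name), name.strip())


raw = ["Meta", "Meta, Inc.", "AWS", "Amazon Web Services", None, "Acme Hosting"]
counts = Counter(canonical_operator(r) for r in raw)
```

Unknown operators fall back to their original spelling, so the map can be grown incrementally without dropping facilities.
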
## Research Ideas & Future Work

@@ -412,6 +412,93 @@ Tables are organized into four categories:

---

### Other Tables

#### `opposition_cases_geocoded`
**Rows**: 18
**Purpose**: Geocoded community-opposition cases against data center builds

**Key Columns**:
- `case_id` (TEXT) - Unique identifier
- `developer` (TEXT) - Proposed developer/operator
- `investment_billions` (DOUBLE PRECISION) - Investment amount in billions
- `outcome` (TEXT) - Case outcome (approved, rejected, pending)
- `governance_response` (TEXT) - Government response
- `latitude`, `longitude`, `geom`

**Source**: Compiled from news archives

**Notes**: Loaded but currently unused - see research-ideas.md for proposed analyses

#### `census_tract_huc8_link`
**Rows**: 806
**Purpose**: Tract↔HUC8 spatial overlap table

**Key Columns**:
- `geoid` (TEXT) - Census tract GEOID
- `huc8` (TEXT) - HUC8 watershed code
- `overlap_pct` (DOUBLE PRECISION) - Percentage of tract overlapping watershed

**Notes**: Useful for downstream tract-level water-stress joins; limited to tracts containing data centers

#### `im3_state_projected_moderate_50`
**Rows**: 328
**Purpose**: PNNL IM3 projected data center siting (moderate growth, gravity weight 0.50)

**Key Columns**:
- `facility_id` (TEXT)
- `state` (TEXT)
- `cost_millions` (DOUBLE PRECISION)
- `it_mw` (DOUBLE PRECISION) - IT load in megawatts
- `cooling_water_demand_gal_per_day` (DOUBLE PRECISION)
- `latitude`, `longitude`, `geom`

**Source**: PNNL Integrated Multisector Multiscale Modeling (IM3)

**Notes**: Loaded but unused - potential for forward-projection analysis

#### `im3_projected_state_demand_summary`
**Rows**: 31
**Purpose**: State-level rollup of IM3 projected facility counts, IT MW, and cooling demand

**Key Columns**:
- `state` (TEXT)
- `facility_count` (INTEGER)
- `total_it_mw` (DOUBLE PRECISION)
- `total_cooling_demand_mgd` (DOUBLE PRECISION) - Million gallons per day

**Source**: IM3 model outputs

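The rollup from facility rows to this summary can be sketched as follows. This assumes the summary is a plain per-state aggregation of `im3_state_projected_moderate_50` and that `total_cooling_demand_mgd` is the sum of `cooling_water_demand_gal_per_day` divided by one million. Neither assumption is confirmed by the source, and the sample rows are invented.

```python
from collections import defaultdict

GALLONS_PER_MILLION_GALLONS = 1_000_000

# Invented sample rows using the documented column names.
facilities = [
    {"facility_id": "f1", "state": "VA", "it_mw": 36.0, "cooling_water_demand_gal_per_day": 300_000.0},
    {"facility_id": "f2", "state": "VA", "it_mw": 48.0, "cooling_water_demand_gal_per_day": 450_000.0},
    {"facility_id": "f3", "state": "OR", "it_mw": 60.0, "cooling_water_demand_gal_per_day": 500_000.0},
]


def summarize_by_state(rows):
    """Aggregate facility rows into one summary row per state."""
    acc = defaultdict(lambda: {"facility_count": 0, "total_it_mw": 0.0, "gal_per_day": 0.0})
    for row in rows:
        s = acc[row["state"]]
        s["facility_count"] += 1
        s["total_it_mw"] += row["it_mw"]
        s["gal_per_day"] += row["cooling_water_demand_gal_per_day"]
    return {
        state: {
            "facility_count": s["facility_count"],
            "total_it_mw": s["total_it_mw"],
            # Convert gallons per day to million gallons per day (MGD).
            "total_cooling_demand_mgd": s["gal_per_day"] / GALLONS_PER_MILLION_GALLONS,
        }
        for state, s in acc.items()
    }


summary = summarize_by_state(facilities)
```

Recomputing the rollup this way and comparing it against the loaded summary table would be a cheap consistency check between the two IM3 tables.
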
#### `utility_rate_tracker_2025_2028`
**Rows**: 374
**Purpose**: Utility rate-increase tracker by provider × state × service type

**Key Columns**:
- `provider` (TEXT) - Utility provider name
- `state` (TEXT)
- `service_type` (TEXT)
- `effective_date` (DATE)
- `monthly_increase_dollars` (DOUBLE PRECISION)
- `percent_increase` (DOUBLE PRECISION)

**Source**: Utility rate tracker database

**Notes**: Loaded but unused in demographic/energy analysis

#### `energy_atlas_layers_catalog`
**Rows**: ~5
**Purpose**: Metadata catalog of EIA layers ingested

**Key Columns**:
- `table_name` (TEXT)
- `source_url` (TEXT)
- `import_timestamp` (TIMESTAMP)
- `row_count` (INTEGER)

**Notes**: Created by `ingest_eia_energy_layers.py`

---

## Commonly Used Joins

### Data Center to Demographics

@@ -61,6 +61,8 @@ canonical_map = {
### 3. Water Stress Overlay
**Status**: 257 HUC8 watersheds contain data centers; 15 watersheds hold 50% of facilities

**Priority**: HIGH - Critical for environmental impact analysis

**Approach**:
- Join to USGS WaterWatch streamflow data
- Add USGS Drought Watch indicators by HUC8
@@ -69,10 +71,18 @@ canonical_map = {
- Surface water withdrawal permits
- Drought frequency/severity (USDM historical data)

**Key Watersheds for Focus**:
- **Middle Potomac-Catoctin** (HUC8 02070008): 235 DCs (12.8% of US total) - Loudoun/Ashburn
- **Middle Potomac-Anacostia-Occoquan** (02070010): 111 DCs - Fairfax/inner Loudoun
- **Coyote** (18050003): 88 DCs - Silicon Valley
- **Upper Scioto** (05060001): 73 DCs - Columbus OH
- **Umatilla** (17070103): 29 DCs - AWS-exclusive watershed

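The concentration claim in the status line (15 watersheds hold 50% of facilities) is a cumulative-share calculation. A minimal sketch follows, assuming a mapping from HUC8 code to facility count; `watersheds_needed` is a hypothetical name. The counts are only the five watersheds listed above, which are not necessarily the five largest, set against the 1,833-facility total from the README. The example therefore illustrates the method rather than reproducing the 15-watershed result.

```python
from typing import Dict, Optional


def watersheds_needed(counts: Dict[str, int], total: int, share: float) -> Optional[int]:
    """Return how many of the largest watersheds are needed to reach `share` of `total`.

    Returns None if the supplied watersheds never reach the target share.
    """
    target = share * total
    running = 0
    for n, count in enumerate(sorted(counts.values(), reverse=True), start=1):
        running += count
        if running >= target:
            return n
    return None


# Facility counts for the five watersheds named above (HUC8 code -> count).
key_watersheds = {
    "02070008": 235,  # Middle Potomac-Catoctin
    "02070010": 111,  # Middle Potomac-Anacostia-Occoquan
    "18050003": 88,   # Coyote
    "05060001": 73,   # Upper Scioto
    "17070103": 29,   # Umatilla
}
TOTAL_FACILITIES = 1833  # total reported in the README data quality notes

listed_share = sum(key_watersheds.values()) / TOTAL_FACILITIES
```

With only these five watersheds the function returns `None` for a 50% target, since together they cover about 29% of facilities; the full 257-watershed table would be needed to check the 15-watershed figure.
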
**Research Questions**:
- Are data centers sited in water-stressed watersheds?
- Do high-density clusters (Loudoun County, Columbus OH) face water constraints?
- Compare water stress in hyperscaler non-metro sites (Columbia River corridor) vs. metro clusters
- Does single-operator watershed capture (Umatilla = AWS only) correlate with water availability?

**Tables to Create**:
- `watershed_water_stress` - HUC8-level water stress indicators
@@ -83,27 +93,38 @@ canonical_map = {
---

### 4. Opposition Cases Overlay
**Status**: Anecdotal evidence of community opposition to new data centers
**Status**: 18 geocoded opposition cases in `opposition_cases_geocoded` table (loaded but unused)

**Approach**:
- Compile cases of rejected/delayed data center proposals (news archive scraping)
- Geocode opposition cases, join to demographics/hazards
- Expand dataset: Compile additional cases of rejected/delayed data center proposals from news archives
- Geocode all opposition cases, join to demographics/hazards
- Test hypotheses:
  - Do wealthier/more educated communities successfully block projects?
  - Are opposition cases more common in water-stressed or drought-prone areas?
  - Do smaller non-metro communities have less bargaining power?
  - Does clustered vs. isolated location predict opposition likelihood?

**Research Questions**:
- What predicts opposition success?
- Are opposition cases spatially clustered?
- Do demographics differ between accepted vs. rejected sites?
- Correlation with FEMA hazard exposure scores?

**Analysis Plan**:
```sql
-- Join opposition cases to demographics
SELECT o.*, ct.median_household_income, ct.bachelors_or_higher_pct
FROM opposition_cases_geocoded o
JOIN _dc_census_tract_acs_2024 ct
  ON ST_Contains(ct.geom, o.geom);
```

**Output**: `opposition_cases_analysis.md`

---

### 5. IM3 Forward Projection Integration
**Status**: IM3 model includes projected data center demand growth
**Status**: IM3 model data loaded in `im3_state_projected_moderate_50` (328 rows) and `im3_projected_state_demand_summary` (31 rows)

**Approach**:
- Load IM3 projected demand scenarios (2030, 2040, 2050)
@@ -113,10 +134,34 @@ canonical_map = {
- Land availability (zoned industrial parcels)
- Identify regions where projected demand may exceed infrastructure capacity

**Grid Saturation Context** (from current analysis):
- **New Jersey**: 83% of generation within 50 km of a data center
- **Nevada**: 75%
- **Tennessee**: 70%
- **Oregon**: 68%
- **Arizona**: 56%
- **Virginia**: 50%

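The thresholds used in this section can be checked directly against the figures above. The sketch below assumes the percentages are exactly as listed, and `states_above` is a hypothetical helper. Note that a strict greater-than-50 comparison leaves out Virginia if its stored value is exactly 50%, so whether the boundary should be inclusive is worth deciding explicitly.

```python
# Share of state generation within 50 km of a data center, in percent (from the list above).
GRID_SATURATION_PCT = {
    "New Jersey": 83,
    "Nevada": 75,
    "Tennessee": 70,
    "Oregon": 68,
    "Arizona": 56,
    "Virginia": 50,
}


def states_above(saturation: dict, threshold_pct: float, inclusive: bool = False) -> list:
    """Return states whose saturation exceeds (or meets, if inclusive) the threshold, highest first."""
    keep = (lambda v: v >= threshold_pct) if inclusive else (lambda v: v > threshold_pct)
    ranked = sorted(saturation.items(), key=lambda kv: kv[1], reverse=True)
    return [state for state, pct in ranked if keep(pct)]


over_two_thirds = states_above(GRID_SATURATION_PCT, 200 / 3)
over_half_strict = states_above(GRID_SATURATION_PCT, 50)
over_half_inclusive = states_above(GRID_SATURATION_PCT, 50, inclusive=True)
```

The two-thirds cut reproduces the four states highlighted in the README, while the 50% cut adds Arizona and, only when inclusive, Virginia.
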
**Research Questions**:
- Which states face grid saturation from data center growth?
- Are projected sites in water-stressed watersheds?
- Does IM3 assume spatial distribution patterns consistent with current siting?
- Can states with >50% grid saturation accommodate projected demand?

**Implementation**:
```sql
-- Compare current saturation to IM3 projected demand
SELECT
  sat.state,
  sat.dc_count,
  sat.pct_grid_saturated,
  proj.facility_count AS projected_new_facilities,
  proj.total_it_mw AS projected_new_mw
FROM state_grid_saturation sat
JOIN im3_projected_state_demand_summary proj ON sat.state = proj.state
WHERE sat.pct_grid_saturated > 50
ORDER BY sat.pct_grid_saturated DESC;
```

**Notebook**: `im3_projection_overlay.ipynb`