Enhance documentation with detailed findings from analysis report
- Add clustered vs isolated facility comparison to README
- Expand infrastructure insights with hyperscaler energy strategies
- Document additional database tables (opposition cases, IM3 projections, utility rates)
- Enhance research ideas with specific watershed names and grid saturation data
- Add data quality notes about EIA longitude corrections
- Reference loaded but unused tables for future analysis
25
README.md
@@ -40,12 +40,23 @@ Compared to the US average, data center host communities are:
|
|||||||
- **Better connected**: 94.9% broadband (vs. 89%)
|
- **Better connected**: 94.9% broadband (vs. 89%)
|
||||||
|
|
||||||
### Infrastructure Insights
|
### Infrastructure Insights
|
||||||
- **89% of data centers are in metropolitan tracts** (vs. 80% of all US tracts)
|
- **89% of data centers are in metropolitan tracts** (vs. 80% of all US tracts) - only 1.11× over-index
|
||||||
- **Non-metro data centers (11%)** are dominated by hyperscalers:
|
- **Non-metro data centers (11%)** are dominated by hyperscalers:
|
||||||
- AWS (67), Meta (22), Microsoft (10), Google (4) = 55% of non-metro facilities
|
- AWS (67), Meta (22), Microsoft (10), Google (4) = 55% of non-metro facilities
|
||||||
- 66% are in Oregon + Washington (Columbia River hydro corridor)
|
- 66% are in Oregon + Washington (Columbia River hydro corridor)
|
||||||
- **Energy infrastructure**: 4 states have >2/3 of generation within 50 km of a data center:
|
- **Grid saturation**: 4 states have >2/3 of generation within 50 km of a data center:
|
||||||
- New Jersey: 83%, Nevada: 75%, Tennessee: 70%, Oregon: 68%
|
- New Jersey: 83%, Nevada: 75%, Tennessee: 70%, Oregon: 68%
|
||||||
|
- **Hyperscaler energy strategies** (non-metro sites):
|
||||||
|
- AWS: 114 GW wind + 66 GW hydro
|
||||||
|
- Microsoft: 13 GW nuclear (Palo Verde co-location)
|
||||||
|
- Meta: 16 GW solar
|
||||||
|
|
||||||
|
### Clustered vs. Isolated Facilities
|
||||||
|
Facilities in DBSCAN clusters differ significantly from isolated sites:
|
||||||
|
- **$35K income gap**: Clustered sites sit in tracts with a median income of $108K vs. $73K for isolated sites
|
||||||
|
- **+18 pp education**: 51% bachelor's+ vs. 33%
|
||||||
|
- **More diverse**: non-Hispanic white share is 25 pp lower in clustered-site tracts
|
||||||
|
- **2× energy infrastructure**: 89 vs. 40 generators within 50 km
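
The clustered/isolated split depends on how DBSCAN is configured, which this section does not state. The following is a minimal pure-Python sketch of the technique, assuming great-circle distance between facility coordinates; `eps_km` and `min_samples` are hypothetical parameters, not the values used in the analysis. A label of `-1` marks an isolated facility.

```python
import math

EARTH_RADIUS_KM = 6371.0088


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def dbscan(points, eps_km, min_samples):
    """Return one label per point: a cluster id (0, 1, ...) or -1 for isolated."""
    n = len(points)
    # Brute-force neighbourhoods; each point counts as its own neighbour.
    neighbours = [
        [j for j in range(n) if haversine_km(points[i], points[j]) <= eps_km]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_samples:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_samples:
                queue.extend(neighbours[j])  # j is a core point: keep expanding
    return labels


# Illustrative coordinates only (three nearby points, one far away).
sample = [(39.04, -77.49), (39.05, -77.48), (39.03, -77.50), (45.60, -121.20)]
labels = dbscan(sample, eps_km=10.0, min_samples=3)
```

The brute-force neighbour search is quadratic, which is acceptable for a few thousand facilities; a spatial index would be the usual replacement at larger scale.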
|
||||||
|
|
||||||
### Submarine Cables
|
### Submarine Cables
|
||||||
- **Data centers are NOT systematically closer to cables** than ordinary US cities
|
- **Data centers are NOT systematically closer to cables** than ordinary US cities
|
||||||
@@ -194,10 +205,18 @@ python3 make_internet_cables_map.py
|
|||||||
## Data Quality Notes
|
## Data Quality Notes
|
||||||
|
|
||||||
1. **Incomplete power ratings**: Only 5.9% of data centers have power ratings (108/1,833)
|
1. **Incomplete power ratings**: Only 5.9% of data centers have power ratings (108/1,833)
|
||||||
2. **Operator fragmentation**: String variations ("Meta" vs. "Meta, Inc.") inflate distinct-operator counts
|
2. **Operator fragmentation**: String variations ("Meta" vs. "Meta, Inc.", AWS variants) inflate distinct-operator counts
|
||||||
3. **45 facilities** use city-precision fallback coordinates (approximate tract assignment)
|
3. **45 facilities** use city-precision fallback coordinates (approximate tract assignment)
|
||||||
4. **7 facilities** failed RUCA join (Puerto Rico / non-US)
|
4. **7 facilities** failed RUCA join (Puerto Rico / non-US)
|
||||||
5. **Broadband subscribers** are a coarse benefit proxy (actual cloud users are global)
|
5. **Broadband subscribers** are a coarse benefit proxy (actual cloud users are global)
|
||||||
|
6. **EIA longitude correction**: 2008-2010 generator coordinates had sign errors, corrected in flat-table build
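
As an illustration of item 6, a sign correction of this kind could look like the sketch below. The rule shown is an assumption, not the logic in the flat-table build: it only flips positive longitudes for 2008-2010 rows whose latitude and absolute longitude fall in a rough contiguous-US box, so legitimately positive longitudes elsewhere are left untouched.

```python
def fix_longitude_sign(lat, lon, year):
    """Negate a longitude that looks like a 2008-2010 sign error.

    Assumption (hypothetical rule): a generator reported in 2008-2010 with a
    positive longitude inside the contiguous-US range should be west of
    Greenwich.
    """
    suspect_year = 2008 <= year <= 2010
    in_conus_box = 24.0 <= lat <= 50.0 and 66.0 <= lon <= 125.0
    if suspect_year and in_conus_box:
        return -lon
    return lon


fixed = fix_longitude_sign(33.39, 112.86, 2009)      # sign flipped
untouched = fix_longitude_sign(33.39, -112.86, 2009)  # already negative
```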
|
||||||
|
|
||||||
|
## Known Limitations
|
||||||
|
|
||||||
|
- **Power capacity**: Only 5.9% populated - nearby EIA generator capacity used as proxy
|
||||||
|
- **Operator strings**: Need deduplication (50 of 190 non-metro facilities have null operator)
|
||||||
|
- **Benefit measurement**: Broadband subscribers are an imperfect proxy for cloud computing benefits
|
||||||
|
- **Universe**: Limited to 46 DC-host states (excludes DC-free states from ACS comparison)
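
For the operator-string limitation, one possible deduplication approach is a normalize-then-lookup step. This is a sketch under assumptions: the map entries below are hypothetical examples (only the "Meta" / "Meta, Inc." pair comes from this document), and null operators are kept as `None` rather than guessed.

```python
import re

# Hypothetical canonical map: normalized key -> canonical operator name.
CANONICAL_MAP = {
    "meta": "Meta",
    "meta inc": "Meta",
    "aws": "AWS",
    "amazon web services": "AWS",
}


def normalize_operator(raw):
    """Map an operator string to its canonical name; keep unknowns as-is."""
    if raw is None or not raw.strip():
        return None  # null operators stay null
    key = re.sub(r"[^a-z0-9 ]", "", raw.lower())  # drop punctuation
    key = re.sub(r"\s+", " ", key).strip()        # collapse whitespace
    return CANONICAL_MAP.get(key, raw.strip())


distinct = {normalize_operator(s) for s in ["Meta", "Meta, Inc.", " meta  inc "]}
```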
|
||||||
|
|
||||||
## Research Ideas & Future Work
|
## Research Ideas & Future Work
|
||||||
|
|
||||||
|
|||||||
@@ -412,6 +412,93 @@ Tables are organized into four categories:
|
|||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
|
### Other Tables
|
||||||
|
|
||||||
|
#### `opposition_cases_geocoded`
|
||||||
|
**Rows**: 18
|
||||||
|
**Purpose**: Geocoded community-opposition cases against data center builds
|
||||||
|
|
||||||
|
**Key Columns**:
|
||||||
|
- `case_id` (TEXT) - Unique identifier
|
||||||
|
- `developer` (TEXT) - Proposed developer/operator
|
||||||
|
- `investment_billions` (DOUBLE PRECISION) - Investment amount in billions
|
||||||
|
- `outcome` (TEXT) - Case outcome (approved, rejected, pending)
|
||||||
|
- `governance_response` (TEXT) - Government response
|
||||||
|
- `latitude`, `longitude`, `geom`
|
||||||
|
|
||||||
|
**Source**: Compiled from news archives
|
||||||
|
|
||||||
|
**Notes**: Loaded but currently unused - see research-ideas.md for proposed analyses
|
||||||
|
|
||||||
|
#### `census_tract_huc8_link`
|
||||||
|
**Rows**: 806
|
||||||
|
**Purpose**: Tract↔HUC8 spatial overlap table
|
||||||
|
|
||||||
|
**Key Columns**:
|
||||||
|
- `geoid` (TEXT) - Census tract GEOID
|
||||||
|
- `huc8` (TEXT) - HUC8 watershed code
|
||||||
|
- `overlap_pct` (DOUBLE PRECISION) - Percentage of tract overlapping watershed
|
||||||
|
|
||||||
|
**Notes**: Useful for downstream tract-level water-stress joins; limited to tracts containing data centers
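
Because a tract can overlap more than one HUC8, a downstream join usually needs a rule for picking one watershed per tract. A minimal sketch, assuming the rule is "largest `overlap_pct` wins" (the document does not specify one); the GEOID and overlap values are illustrative, not rows from the table.

```python
def dominant_huc8(link_rows):
    """Pick, for each tract GEOID, the HUC8 with the largest overlap_pct.

    link_rows: iterable of (geoid, huc8, overlap_pct) tuples.
    """
    best = {}
    for geoid, huc8, overlap_pct in link_rows:
        if geoid not in best or overlap_pct > best[geoid][1]:
            best[geoid] = (huc8, overlap_pct)
    return {geoid: huc8 for geoid, (huc8, _) in best.items()}


# Illustrative rows only.
rows = [
    ("51107000001", "02070008", 72.5),
    ("51107000001", "02070010", 27.5),
    ("51059000002", "02070010", 100.0),
]
assignment = dominant_huc8(rows)
```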
|
||||||
|
|
||||||
|
#### `im3_state_projected_moderate_50`
|
||||||
|
**Rows**: 328
|
||||||
|
**Purpose**: PNNL IM3 projected data center siting (moderate growth, gravity weight 0.50)
|
||||||
|
|
||||||
|
**Key Columns**:
|
||||||
|
- `facility_id` (TEXT)
|
||||||
|
- `state` (TEXT)
|
||||||
|
- `cost_millions` (DOUBLE PRECISION)
|
||||||
|
- `it_mw` (DOUBLE PRECISION) - IT load in megawatts
|
||||||
|
- `cooling_water_demand_gal_per_day` (DOUBLE PRECISION)
|
||||||
|
- `latitude`, `longitude`, `geom`
|
||||||
|
|
||||||
|
**Source**: PNNL Integrated Multisector Multiscale Modeling (IM3)
|
||||||
|
|
||||||
|
**Notes**: Loaded but unused - potential for forward-projection analysis
|
||||||
|
|
||||||
|
#### `im3_projected_state_demand_summary`
|
||||||
|
**Rows**: 31
|
||||||
|
**Purpose**: State-level rollup of IM3 projected facility counts, IT MW, and cooling demand
|
||||||
|
|
||||||
|
**Key Columns**:
|
||||||
|
- `state` (TEXT)
|
||||||
|
- `facility_count` (INTEGER)
|
||||||
|
- `total_it_mw` (DOUBLE PRECISION)
|
||||||
|
- `total_cooling_demand_mgd` (DOUBLE PRECISION) - Million gallons per day
|
||||||
|
|
||||||
|
**Source**: IM3 model outputs
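
The relationship between the facility-level table and this summary can be sketched as a per-state sum with a unit conversion from gallons per day to million gallons per day (MGD). This assumes the summary is a plain aggregate of `im3_state_projected_moderate_50`, which the document does not confirm; the input rows are illustrative.

```python
from collections import defaultdict


def summarize_by_state(facilities):
    """Roll facility rows up to facility_count, total_it_mw, and cooling MGD."""
    totals = defaultdict(lambda: {
        "facility_count": 0,
        "total_it_mw": 0.0,
        "total_cooling_demand_mgd": 0.0,
    })
    for f in facilities:
        row = totals[f["state"]]
        row["facility_count"] += 1
        row["total_it_mw"] += f["it_mw"]
        # gal/day -> million gal/day
        row["total_cooling_demand_mgd"] += f["cooling_water_demand_gal_per_day"] / 1_000_000
    return dict(totals)


# Illustrative rows only.
summary = summarize_by_state([
    {"state": "OR", "it_mw": 50.0, "cooling_water_demand_gal_per_day": 500_000},
    {"state": "OR", "it_mw": 100.0, "cooling_water_demand_gal_per_day": 1_500_000},
    {"state": "VA", "it_mw": 75.0, "cooling_water_demand_gal_per_day": 1_000_000},
])
```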
|
||||||
|
|
||||||
|
#### `utility_rate_tracker_2025_2028`
|
||||||
|
**Rows**: 374
|
||||||
|
**Purpose**: Utility rate-increase tracker by provider × state × service type
|
||||||
|
|
||||||
|
**Key Columns**:
|
||||||
|
- `provider` (TEXT) - Utility provider name
|
||||||
|
- `state` (TEXT)
|
||||||
|
- `service_type` (TEXT)
|
||||||
|
- `effective_date` (DATE)
|
||||||
|
- `monthly_increase_dollars` (DOUBLE PRECISION)
|
||||||
|
- `percent_increase` (DOUBLE PRECISION)
|
||||||
|
|
||||||
|
**Source**: Utility rate tracker database
|
||||||
|
|
||||||
|
**Notes**: Loaded but unused in demographic/energy analysis
|
||||||
|
|
||||||
|
#### `energy_atlas_layers_catalog`
|
||||||
|
**Rows**: ~5
|
||||||
|
**Purpose**: Metadata catalog of EIA layers ingested
|
||||||
|
|
||||||
|
**Key Columns**:
|
||||||
|
- `table_name` (TEXT)
|
||||||
|
- `source_url` (TEXT)
|
||||||
|
- `import_timestamp` (TIMESTAMP)
|
||||||
|
- `row_count` (INTEGER)
|
||||||
|
|
||||||
|
**Notes**: Created by `ingest_eia_energy_layers.py`
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
## Commonly Used Joins
|
## Commonly Used Joins
|
||||||
|
|
||||||
### Data Center to Demographics
|
### Data Center to Demographics
|
||||||
|
|||||||
@@ -61,6 +61,8 @@ canonical_map = {
|
|||||||
### 3. Water Stress Overlay
|
### 3. Water Stress Overlay
|
||||||
**Status**: 257 HUC8 watersheds contain data centers; 15 watersheds hold 50% of facilities
|
**Status**: 257 HUC8 watersheds contain data centers; 15 watersheds hold 50% of facilities
|
||||||
|
|
||||||
|
**Priority**: HIGH - Critical for environmental impact analysis
|
||||||
|
|
||||||
**Approach**:
|
**Approach**:
|
||||||
- Join to USGS WaterWatch streamflow data
|
- Join to USGS WaterWatch streamflow data
|
||||||
- Add USGS Drought Watch indicators by HUC8
|
- Add USGS Drought Watch indicators by HUC8
|
||||||
@@ -69,10 +71,18 @@ canonical_map = {
|
|||||||
- Surface water withdrawal permits
|
- Surface water withdrawal permits
|
||||||
- Drought frequency/severity (USDM historical data)
|
- Drought frequency/severity (USDM historical data)
|
||||||
|
|
||||||
|
**Key Watersheds for Focus**:
|
||||||
|
- **Middle Potomac-Catoctin** (HUC8 02070008): 235 DCs (12.8% of US total) - Loudoun/Ashburn
|
||||||
|
- **Middle Potomac-Anacostia-Occoquan** (02070010): 111 DCs - Fairfax/inner Loudoun
|
||||||
|
- **Coyote** (18050003): 88 DCs - Silicon Valley
|
||||||
|
- **Upper Scioto** (05060001): 73 DCs - Columbus OH
|
||||||
|
- **Umatilla** (17070103): 29 DCs - AWS-exclusive watershed
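
The concentration figures above (a watershed's share of all facilities, and how many watersheds are needed to reach half of them) can be recomputed from a per-HUC8 count. A small sketch; the function names are hypothetical, the toy counts are illustrative, and the 1,833 total is the facility count quoted in the README's data quality notes.

```python
def share_pct(count, total):
    """Share of facilities in one watershed, in percent, rounded to 0.1."""
    return round(100 * count / total, 1)


def watersheds_to_reach(counts, share=0.5):
    """Smallest number of top-ranked watersheds holding at least `share` of facilities."""
    total = sum(counts.values())
    running = 0
    for rank, n in enumerate(sorted(counts.values(), reverse=True), start=1):
        running += n
        if running >= share * total:
            return rank
    return len(counts)


potomac_share = share_pct(235, 1833)
# Toy counts, not the real distribution.
needed = watersheds_to_reach({"A": 40, "B": 30, "C": 20, "D": 10})
```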
|
||||||
|
|
||||||
**Research Questions**:
|
**Research Questions**:
|
||||||
- Are data centers sited in water-stressed watersheds?
|
- Are data centers sited in water-stressed watersheds?
|
||||||
- Do high-density clusters (Loudoun County, Columbus OH) face water constraints?
|
- Do high-density clusters (Loudoun County, Columbus OH) face water constraints?
|
||||||
- Compare water stress in hyperscaler non-metro sites (Columbia River corridor) vs. metro clusters
|
- Compare water stress in hyperscaler non-metro sites (Columbia River corridor) vs. metro clusters
|
||||||
|
- Does single-operator watershed capture (Umatilla = AWS only) correlate with water availability?
|
||||||
|
|
||||||
**Tables to Create**:
|
**Tables to Create**:
|
||||||
- `watershed_water_stress` - HUC8-level water stress indicators
|
- `watershed_water_stress` - HUC8-level water stress indicators
|
||||||
@@ -83,27 +93,38 @@ canonical_map = {
|
|||||||
---
|
---
|
||||||
|
|
||||||
### 4. Opposition Cases Overlay
|
### 4. Opposition Cases Overlay
|
||||||
**Status**: Anecdotal evidence of community opposition to new data centers
|
**Status**: 18 geocoded opposition cases in `opposition_cases_geocoded` table (loaded but unused)
|
||||||
|
|
||||||
**Approach**:
|
**Approach**:
|
||||||
- Compile cases of rejected/delayed data center proposals (news archive scraping)
|
- Expand dataset: Compile additional cases of rejected/delayed data center proposals from news archives
|
||||||
- Geocode opposition cases, join to demographics/hazards
|
- Geocode all opposition cases, join to demographics/hazards
|
||||||
- Test hypotheses:
|
- Test hypotheses:
|
||||||
- Do wealthier/more educated communities successfully block projects?
|
- Do wealthier/more educated communities successfully block projects?
|
||||||
- Are opposition cases more common in water-stressed or drought-prone areas?
|
- Are opposition cases more common in water-stressed or drought-prone areas?
|
||||||
- Do smaller non-metro communities have less bargaining power?
|
- Do smaller non-metro communities have less bargaining power?
|
||||||
|
- Does clustered vs. isolated location predict opposition likelihood?
|
||||||
|
|
||||||
**Research Questions**:
|
**Research Questions**:
|
||||||
- What predicts opposition success?
|
- What predicts opposition success?
|
||||||
- Are opposition cases spatially clustered?
|
- Are opposition cases spatially clustered?
|
||||||
- Do demographics differ between accepted vs. rejected sites?
|
- Do demographics differ between accepted vs. rejected sites?
|
||||||
|
- Correlation with FEMA hazard exposure scores?
|
||||||
|
|
||||||
|
**Analysis Plan**:
|
||||||
|
```sql
-- Join opposition cases to demographics
SELECT o.*, ct.median_household_income, ct.bachelors_or_higher_pct
FROM opposition_cases_geocoded o
JOIN _dc_census_tract_acs_2024 ct
  ON ST_Contains(ct.geom, o.geom);
```
|
||||||
|
|
||||||
**Output**: `opposition_cases_analysis.md`
|
**Output**: `opposition_cases_analysis.md`
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
### 5. IM3 Forward Projection Integration
|
### 5. IM3 Forward Projection Integration
|
||||||
**Status**: IM3 model includes projected data center demand growth
|
**Status**: IM3 model data loaded in `im3_state_projected_moderate_50` (328 rows) and `im3_projected_state_demand_summary` (31 rows)
|
||||||
|
|
||||||
**Approach**:
|
**Approach**:
|
||||||
- Load IM3 projected demand scenarios (2030, 2040, 2050)
|
- Load IM3 projected demand scenarios (2030, 2040, 2050)
|
||||||
@@ -113,10 +134,34 @@ canonical_map = {
|
|||||||
- Land availability (zoned industrial parcels)
|
- Land availability (zoned industrial parcels)
|
||||||
- Identify regions where projected demand may exceed infrastructure capacity
|
- Identify regions where projected demand may exceed infrastructure capacity
|
||||||
|
|
||||||
|
**Grid Saturation Context** (from current analysis):
|
||||||
|
- **New Jersey**: 83% of generation within 50 km of a data center
|
||||||
|
- **Nevada**: 75%
|
||||||
|
- **Tennessee**: 70%
|
||||||
|
- **Oregon**: 68%
|
||||||
|
- **Arizona**: 56%
|
||||||
|
- **Virginia**: 50%
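
The two thresholds used in this document (more than 2/3, and more than 50%) select different sets of states from the list above. The sketch below makes the boundary case explicit: with a strict comparison, Virginia at exactly 50% is excluded. The helper name is hypothetical; the percentages are the ones listed above.

```python
# Share of state generation within 50 km of a data center (%), from the list above.
SATURATION_PCT = {
    "New Jersey": 83, "Nevada": 75, "Tennessee": 70,
    "Oregon": 68, "Arizona": 56, "Virginia": 50,
}


def states_above(threshold_pct, strict=True):
    """States whose saturation exceeds (or, if not strict, meets) the threshold."""
    keep = (lambda v: v > threshold_pct) if strict else (lambda v: v >= threshold_pct)
    selected = [s for s, v in SATURATION_PCT.items() if keep(v)]
    return sorted(selected, key=lambda s: SATURATION_PCT[s], reverse=True)


over_two_thirds = states_above(200 / 3)
over_half = states_above(50)
at_least_half = states_above(50, strict=False)
```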
|
||||||
|
|
||||||
**Research Questions**:
|
**Research Questions**:
|
||||||
- Which states face grid saturation from data center growth?
|
- Which states face grid saturation from data center growth?
|
||||||
- Are projected sites in water-stressed watersheds?
|
- Are projected sites in water-stressed watersheds?
|
||||||
- Does IM3 assume spatial distribution patterns consistent with current siting?
|
- Does IM3 assume spatial distribution patterns consistent with current siting?
|
||||||
|
- Can states with >50% grid saturation accommodate projected demand?
|
||||||
|
|
||||||
|
**Implementation**:
|
||||||
|
```sql
-- Compare current saturation to IM3 projected demand
SELECT
    sat.state,
    sat.dc_count,
    sat.pct_grid_saturated,
    proj.facility_count AS projected_new_facilities,
    proj.total_it_mw AS projected_new_mw
FROM state_grid_saturation sat
JOIN im3_projected_state_demand_summary proj ON sat.state = proj.state
-- Keep only states already above 50% saturation, most saturated first
WHERE sat.pct_grid_saturated > 50
ORDER BY sat.pct_grid_saturated DESC;
```
|
||||||
|
|
||||||
**Notebook**: `im3_projection_overlay.ipynb`
|
**Notebook**: `im3_projection_overlay.ipynb`
|
||||||
|
|
||||||
|
|||||||