Reorganize project into scripts/, docs/, data/, output/ directories

Move all Python scripts to scripts/, documentation to docs/, raw input data to data/, and generated HTML/CSV outputs to output/. Update path references in 8 scripts to use Path(__file__).parent.parent as project root so they work correctly from the new location. Update README links and quick-start commands accordingly. Notebooks remain at root.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
623
docs/research-ideas.md
Normal file
@@ -0,0 +1,623 @@
# Research Ideas & Future Work

This document outlines potential research directions, data improvements, and analyses that could extend the current US Data Centers geospatial research infrastructure.

## Priority Next Steps

### 1. Backfill Power Capacity Data
**Status**: Only 5.9% of facilities have `power_mw` values (108/1,833)

**Approach**:
- Scrape Baxtel data center database (requires paid subscription)
- Use Data Center Map API/scraping
- Cross-reference with utility interconnection queue filings
- File public records requests (state FOIA equivalents) with state utility commissions for large loads

**Research Impact**:
- Enable capacity-weighted concentration metrics (current analyses are facility-count only)
- Correlate power capacity with demographic/environmental variables
- Identify "hyperscale" facilities (>100 MW) vs. edge/enterprise (<10 MW)

**Implementation**:
```python
# Add capacity-weighted HHI calculation to analyze_dc_tract_concentration.py
# tract_capacities: list of total power_mw per tract (assumed to be computed upstream)
total_mw = sum(tract_capacities)
capacity_weighted_hhi = sum((mw_i / total_mw) ** 2 for mw_i in tract_capacities)
```
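
To illustrate what a capacity-weighted metric adds over the current facility-count approach, here is a minimal sketch that computes both indices from `(tract_id, power_mw)` pairs. The function names and sample values are hypothetical, and leaving facilities with a missing `power_mw` out of the capacity index is an assumption, not a decision already made in `analyze_dc_tract_concentration.py`.

```python
from collections import defaultdict


def hhi(values):
    """Herfindahl-Hirschman index: sum of squared shares of the total."""
    values = list(values)
    total = sum(values)
    if total <= 0:
        raise ValueError("values must sum to a positive number")
    return sum((v / total) ** 2 for v in values)


def tract_hhi(facilities):
    """Return (count_hhi, capacity_hhi) for an iterable of (tract_id, power_mw).

    Facilities whose power_mw is None still count toward the count-based
    index but contribute nothing to the capacity-weighted one (assumption).
    """
    counts = defaultdict(int)
    capacity = defaultdict(float)
    for tract_id, power_mw in facilities:
        counts[tract_id] += 1
        if power_mw is not None:
            capacity[tract_id] += power_mw
    return hhi(counts.values()), hhi(capacity.values())


# Hypothetical sample: one large facility in tract A, three small ones in B.
sample = [("A", 120.0), ("B", 5.0), ("B", 5.0), ("B", None), ("C", 20.0)]
count_hhi, capacity_hhi = tract_hhi(sample)
print(f"count-based HHI: {count_hhi:.3f}, capacity-weighted HHI: {capacity_hhi:.3f}")
```

In this toy sample the count-based index is driven by tract B, while the capacity-weighted index is driven by tract A. Until the backfill is done, a capacity-weighted figure would describe only the 5.9% of facilities that have `power_mw`.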

---

### 2. Operator Name Deduplication
**Status**: String fragmentation inflates operator counts ("Meta" vs. "Meta, Inc.", AWS variants)

**Approach**:
- Create `operator_mapping` table with canonical names
- Use fuzzy matching (e.g., the `fuzzywuzzy` library, since renamed `thefuzz`) to standardize
- Add `operator_canonical` column to `master_data_centers`

**Research Impact**:
- Accurate hyperscaler market share analysis
- Study operator-specific siting strategies (AWS hydro, Microsoft nuclear, Meta solar)
- Enable "operator power" political economy analyses

**Script**:
```python
# operators_dedupe.py
import os

import pandas as pd
from fuzzywuzzy import process  # maintained successor: thefuzz (same import shape)
from sqlalchemy import create_engine

# Assumption: the connection string is supplied via a DATABASE_URL environment variable
conn = create_engine(os.environ["DATABASE_URL"])

# Load unique operators
operators = pd.read_sql("SELECT DISTINCT operator FROM master_data_centers", conn)

# Manual + fuzzy matching to canonical names
canonical_map = {
    "Meta": ["Meta", "Meta, Inc.", "Meta Platforms", "Facebook"],
    "Amazon Web Services": ["AWS", "Amazon", "Amazon Web Services"],
    # ... etc.
}
```
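
The manual-plus-fuzzy step could look roughly like the sketch below. It deliberately swaps in the standard library's `difflib.SequenceMatcher` for `fuzzywuzzy` so that it runs without extra dependencies; the helper names and the 0.85 threshold are illustrative assumptions and would need tuning against the real operator strings.

```python
from difflib import SequenceMatcher

# Canonical names and known aliases (the two entries shown in the script above).
CANONICAL_MAP = {
    "Meta": ["Meta", "Meta, Inc.", "Meta Platforms", "Facebook"],
    "Amazon Web Services": ["AWS", "Amazon", "Amazon Web Services"],
}


def _normalize(name):
    """Lowercase, drop commas/periods, and collapse whitespace."""
    cleaned = name.lower().replace(",", " ").replace(".", " ")
    return " ".join(cleaned.split())


# Reverse lookup: normalized alias -> canonical name.
ALIAS_TO_CANONICAL = {
    _normalize(alias): canonical
    for canonical, aliases in CANONICAL_MAP.items()
    for alias in aliases
}


def canonical_operator(raw_name, threshold=0.85):
    """Return the canonical operator name, or None if nothing matches well.

    Exact alias hits win; otherwise the closest alias by SequenceMatcher
    ratio is accepted only if it reaches `threshold` (hypothetical value).
    """
    key = _normalize(raw_name)
    if key in ALIAS_TO_CANONICAL:
        return ALIAS_TO_CANONICAL[key]
    best_alias, best_score = None, 0.0
    for alias in ALIAS_TO_CANONICAL:
        score = SequenceMatcher(None, key, alias).ratio()
        if score > best_score:
            best_alias, best_score = alias, score
    if best_alias is not None and best_score >= threshold:
        return ALIAS_TO_CANONICAL[best_alias]
    return None


for raw in ["Meta, Inc.", "Amazon Web Service", "Equinix"]:
    print(raw, "->", canonical_operator(raw))
```

Returning `None` for unmatched names keeps unknown operators visible for manual review instead of silently forcing them into the nearest bucket.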

---

### 3. Water Stress Overlay
**Status**: 257 HUC8 watersheds contain data centers; 15 watersheds hold 50% of facilities

**Priority**: HIGH - Critical for environmental impact analysis

**Approach**:
- Join to USGS WaterWatch streamflow data
- Add USGS Drought Watch indicators by HUC8
- Correlate data center density with:
  - Groundwater depletion rates
  - Surface water withdrawal permits
  - Drought frequency/severity (USDM historical data)

**Key Watersheds for Focus**:
- **Middle Potomac-Catoctin** (HUC8 02070008): 235 DCs (12.8% of US total) - Loudoun/Ashburn
- **Middle Potomac-Anacostia-Occoquan** (02070010): 111 DCs - Fairfax/inner Loudoun
- **Coyote** (18050003): 88 DCs - Silicon Valley
- **Upper Scioto** (05060001): 73 DCs - Columbus OH
- **Umatilla** (17070103): 29 DCs - AWS-exclusive watershed

**Research Questions**:
- Are data centers sited in water-stressed watersheds?
- Do high-density clusters (Loudoun County, Columbus OH) face water constraints?
- Compare water stress in hyperscaler non-metro sites (Columbia River corridor) vs. metro clusters
- Does single-operator watershed capture (Umatilla = AWS only) correlate with water availability?

**Tables to Create**:
- `watershed_water_stress` - HUC8-level water stress indicators
- `data_center_water_risk` - Per-facility water-stress exposure

**Notebook**: `water_stress_analysis.ipynb`
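
The concentration figure in the status line (15 watersheds holding 50% of facilities) would need recomputing whenever the inventory changes. A small sketch of that calculation follows; it assumes each facility row already carries its HUC8 code, and the function name and sample codes are illustrative only.

```python
from collections import Counter


def watersheds_for_share(huc8_codes, share=0.5):
    """Smallest number of watersheds, taken largest-first, whose facilities
    make up at least `share` of all facilities.

    huc8_codes: one HUC8 code per facility.
    """
    counts = Counter(huc8_codes)
    total = sum(counts.values())
    if total == 0:
        return 0
    running = 0
    for n, (_, count) in enumerate(counts.most_common(), start=1):
        running += count
        if running / total >= share:
            return n
    return len(counts)


# Hypothetical mini-inventory: 10 facilities across three watersheds.
codes = ["02070008"] * 5 + ["18050003"] * 3 + ["05060001"] * 2
print(watersheds_for_share(codes, share=0.5))
print(watersheds_for_share(codes, share=0.9))
```

The same function applies unchanged to census tracts or counties, which makes it easy to compare concentration across spatial units.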

---

### 4. Opposition Cases Overlay
**Status**: 18 geocoded opposition cases in `opposition_cases_geocoded` table (loaded but unused)

**Approach**:
- Expand dataset: Compile additional cases of rejected/delayed data center proposals from news archives
- Geocode all opposition cases, join to demographics/hazards
- Test hypotheses:
  - Do wealthier/more educated communities successfully block projects?
  - Are opposition cases more common in water-stressed or drought-prone areas?
  - Do smaller non-metro communities have less bargaining power?
  - Does clustered vs. isolated location predict opposition likelihood?

**Research Questions**:
- What predicts opposition success?
- Are opposition cases spatially clustered?
- Do demographics differ between accepted vs. rejected sites?
- Correlation with FEMA hazard exposure scores?

**Analysis Plan**:
```sql
-- Join opposition cases to demographics
SELECT o.*, ct.median_household_income, ct.bachelors_or_higher_pct
FROM opposition_cases_geocoded o
JOIN _dc_census_tract_acs_2024 ct
  ON ST_Contains(ct.geom, o.geom);
```

**Output**: `opposition_cases_analysis.md`

---

### 5. IM3 Forward Projection Integration
**Status**: IM3 model data loaded in `im3_state_projected_moderate_50` (328 rows) and `im3_projected_state_demand_summary` (31 rows)

**Approach**:
- Load IM3 projected demand scenarios (2030, 2040, 2050)
- Overlay projected growth with:
  - Current grid saturation (% of generation within 50 km)
  - Water stress indicators
  - Land availability (zoned industrial parcels)
- Identify regions where projected demand may exceed infrastructure capacity

**Grid Saturation Context** (from current analysis; share of state generation within 50 km of a data center):
- **New Jersey**: 83%
- **Nevada**: 75%
- **Tennessee**: 70%
- **Oregon**: 68%
- **Arizona**: 56%
- **Virginia**: 50%

**Research Questions**:
- Which states face grid saturation from data center growth?
- Are projected sites in water-stressed watersheds?
- Does IM3 assume spatial distribution patterns consistent with current siting?
- Can states with >50% grid saturation accommodate projected demand?

**Implementation**:
```sql
-- Compare current saturation to IM3 projected demand
-- (alias "sat" is used instead of "current", which is a keyword in standard SQL)
SELECT
  sat.state,
  sat.dc_count,
  sat.pct_grid_saturated,
  proj.facility_count AS projected_new_facilities,
  proj.total_it_mw AS projected_new_mw
FROM state_grid_saturation sat
JOIN im3_projected_state_demand_summary proj ON sat.state = proj.state
WHERE sat.pct_grid_saturated > 50
ORDER BY sat.pct_grid_saturated DESC;
```

**Notebook**: `im3_projection_overlay.ipynb`
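
One detail worth checking before running the overlay: the `> 50` filter is strict, so a state sitting at exactly 50% drops out. The sketch below applies the filter to the six values listed under Grid Saturation Context. Those values appear to be rounded, so the underlying unrounded figures could behave differently; the function name and the `inclusive` switch are illustrative.

```python
# Percent of state generation within 50 km of a data center,
# copied from the Grid Saturation Context list (rounded values).
GRID_SATURATION_PCT = {
    "New Jersey": 83,
    "Nevada": 75,
    "Tennessee": 70,
    "Oregon": 68,
    "Arizona": 56,
    "Virginia": 50,
}


def states_above(saturation, threshold_pct=50, inclusive=False):
    """States whose saturation exceeds (or, if inclusive, meets) the threshold,
    ordered from most to least saturated."""
    if inclusive:
        selected = [s for s, pct in saturation.items() if pct >= threshold_pct]
    else:
        selected = [s for s, pct in saturation.items() if pct > threshold_pct]
    return sorted(selected, key=lambda s: saturation[s], reverse=True)


strict = states_above(GRID_SATURATION_PCT)
loose = states_above(GRID_SATURATION_PCT, inclusive=True)
print("strict  > 50:", strict)
print("loose  >= 50:", loose)
```

With the listed values, Virginia appears only under the inclusive comparison, which matters given that it hosts the largest cluster in the inventory.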

---

## Methodological Extensions

### 6. Time-Series Analysis of Cluster Growth
**Approach**:
- Use `rfs_year` (ready for service) from cable data and EIA generator vintage
- Reconstruct data center siting over time (requires RFS dates for facilities)
- Animate cluster formation in interactive map

**Research Questions**:
- Did Ashburn VA become dominant before or after major cable landings?
- Do clusters grow via agglomeration (new facilities near existing) or simultaneous build-out?
- Correlation between energy infrastructure build-out and data center growth
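
One way to make the agglomeration question measurable, once facility RFS dates exist, is to compute the share of facilities that opened near a facility from an earlier year. The sketch below is a simplified illustration: the 10 km radius, the function names, and the sample coordinates are all assumptions, and a real analysis would likely use PostGIS distance queries instead of a quadratic Python loop.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def agglomeration_share(facilities, radius_km=10.0):
    """Share of facilities that opened within radius_km of a facility from a
    strictly earlier year. Facilities with no predecessors are excluded.

    facilities: list of (rfs_year, lat, lon) tuples.
    """
    near = later = 0
    for year, lat, lon in facilities:
        earlier = [(la, lo) for y, la, lo in facilities if y < year]
        if not earlier:
            continue
        later += 1
        if any(haversine_km(lat, lon, la, lo) <= radius_km for la, lo in earlier):
            near += 1
    return near / later if later else float("nan")


# Hypothetical facilities: three near Ashburn VA, one in eastern Oregon.
sample = [
    (2005, 39.04, -77.49),
    (2010, 39.05, -77.48),
    (2012, 45.84, -119.70),
    (2015, 39.03, -77.47),
]
share = agglomeration_share(sample)
print(f"{share:.2f} of later facilities opened within 10 km of an earlier one")
```

A share close to 1 would point toward agglomeration, while a low share with many same-year openings would be more consistent with simultaneous build-out.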

**Data Needed**:
- Facility RFS dates (scrape from press releases, Baxtel historical data)
- Historical tract demographics (decennial Census + ACS back to 2000)

---

### 7. Network Effects: Fiber Route Proximity
**Status**: Current analysis tests submarine cable proximity (negative result)

**Approach**:
- Obtain fiber optic backbone route GIS data (from FCC, carriers, or Infrapedia)
- Test proximity to long-haul fiber routes (not just submarine cables)
- Hypothesis: Data centers cluster near terrestrial long-haul fiber, not submarine cable landings

**Data Sources**:
- FCC Form 477 fiber deployment data (historical; Form 477 has since been superseded by the Broadband Data Collection)
- Infrapedia fiber route database
- State-level fiber maps (e.g., Virginia Broadband Map)

**Expected Result**: Positive correlation (unlike submarine cables)

---

### 8. Land Use & Zoning Analysis
**Approach**:
- Join data centers to local zoning classifications (industrial, commercial, etc.)
- Analyze land prices in data center tracts before/after facility construction
- Correlate with property tax revenues

**Research Questions**:
- Do data centers drive local property value increases?
- Are they preferentially sited in already-zoned industrial areas?
- Do host communities capture tax base growth?

**Data Sources**:
- Zillow Home Value Index (ZHVI) by ZIP
- ATTOM property tax assessments
- Municipal zoning GIS layers (city-specific, requires scraping/FOIA)

---

### 9. Environmental Justice Scoring
**Approach**:
- Compare data center host tracts to EPA's EJScreen indices
- Add CalEnviroScreen-style burden/benefit framework
- Test if data centers increase cumulative environmental burdens

**Metrics**:
- Air quality (PM2.5, ozone)
- Hazardous waste proximity
- Superfund site proximity
- Heat island effect (LST from Landsat)
- Noise pollution (traffic, cooling systems)

**Expected Challenge**: Data centers may improve local metrics (compared to heavy industry) but increase water/energy consumption

---

## Policy & Political Economy Research

### 10. Tax Incentive Analysis
**Approach**:
- Compile state/local tax incentives for data center siting (property tax abatements, sales tax exemptions)
- Create `data_center_incentives` table with per-facility incentive details
- Correlate incentive generosity with:
  - State fiscal health
  - Local government bargaining power
  - Facility size/operator

**Research Questions**:
- Do weaker fiscal states offer larger incentives?
- Are incentives regressive (larger for hyperscalers)?
- Do incentives predict siting decisions (natural experiment approach)?

**Data Sources**:
- Good Jobs First Subsidy Tracker
- State economic development agency press releases
- Local news archives

---

### 11. Employment & Labor Market Effects
**Approach**:
- Join to BLS Quarterly Census of Employment and Wages (QCEW) by county (QCEW is published at county level and above, not by ZIP)
- Identify "data center construction boom" periods (before/after major facility openings)
- Analyze employment effects in:
  - Construction (NAICS 23)
  - Transportation/warehousing (NAICS 48-49)
  - Professional services (NAICS 54)

**Research Questions**:
- Do data centers create durable local employment?
- Are jobs filled by local residents or commuters?
- Wage effects in host tracts?

**Data Sources**:
- BLS QCEW
- Census LEHD Origin-Destination Employment Statistics (LODES)

---

### 12. Energy Cost Pass-Through
**Approach**:
- Join to state-level electricity rate data (EIA, utility rate tracker)
- Test if data center density correlates with residential rate increases
- Natural experiment: Compare rate trajectories in high-DC vs. low-DC states
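
In its simplest form, the high-DC vs. low-DC comparison is a two-group, two-period difference-in-differences. The sketch below shows only the arithmetic; the rates are invented for illustration, and the estimate is meaningful only if the two groups would have followed parallel trends without data center growth, which the actual analysis would have to test.

```python
from statistics import mean


def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Two-period difference-in-differences on group means:
    (change in treated group) minus (change in control group)."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical residential rates in cents/kWh, one value per state.
high_dc_pre, high_dc_post = [12.0, 13.0], [14.5, 15.5]
low_dc_pre, low_dc_post = [11.0, 12.0], [12.0, 13.0]

effect = diff_in_diff(high_dc_pre, high_dc_post, low_dc_pre, low_dc_post)
print(f"estimated extra rate increase in high-DC states: {effect:.2f} cents/kWh")
```

A regression with state and year fixed effects would be the natural next step once more than two periods are available.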

**Research Questions**:
- Do data centers drive residential rate increases (capacity cost allocation)?
- Are rate increases concentrated in utility service territories with large data center loads?
- Do states with retail choice (deregulated markets) see different effects?

**Data Sources**:
- EIA Form 861 (retail rates by state/utility)
- Utility rate case filings (state public utility commissions)

---

## Data Quality & Infrastructure Improvements

### 13. Address Validation & Geocoding Refinement
**Approach**:
- Re-geocode the 45 facilities that currently rely on the city-precision fallback
- Use USPS address validation API
- Cross-reference with Google Maps satellite imagery (manual review)

**Implementation**:
```bash
# Re-run geocoding with stricter thresholds
python3 load_postgis_data_centers.py --revalidate-addresses
```

---

### 14. OSM Continuous Monitoring
**Approach**:
- Set up automated Overpass API queries (daily/weekly)
- Detect new OSM data center tags
- Alert for review + merge into `master_data_centers`

**Implementation**:
- Cron job running `load_postgis_osm_data_centers.py --update-only`
- Slack/email notification on new facilities
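
The detection step can be kept independent of the network call by diffing two saved Overpass responses. The sketch below assumes the standard Overpass JSON layout (an `elements` list whose items carry `type` and `id`); the function names and sample payloads are hypothetical, and the tag shown is only an example of what a data center element might carry.

```python
def element_keys(overpass_json):
    """Set of (type, id) keys from an Overpass JSON response.

    OSM ids are only unique within an element type, so the type is part
    of the key.
    """
    return {(el["type"], el["id"]) for el in overpass_json.get("elements", [])}


def new_elements(previous_json, current_json):
    """Elements present in the current response but not in the previous one."""
    seen = element_keys(previous_json)
    return [
        el
        for el in current_json.get("elements", [])
        if (el["type"], el["id"]) not in seen
    ]


# Hypothetical snapshots from two consecutive runs.
previous = {"elements": [{"type": "way", "id": 1, "tags": {"telecom": "data_center"}}]}
current = {
    "elements": [
        {"type": "way", "id": 1, "tags": {"telecom": "data_center"}},
        {"type": "node", "id": 1, "tags": {"telecom": "data_center"}},
    ]
}

added = new_elements(previous, current)
print(f"{len(added)} new element(s) to review")
```

Whatever this returns would feed the Slack/email notification; merging into `master_data_centers` would stay a manual review step, as described above.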

---

### 15. Broadband Speed Validation
**Approach**:
- Cross-reference FCC BDC provider data with Ookla Speedtest results
- Test if data center host tracts have faster actual speeds (not just availability)

**Hypothesis**: Data center presence correlates with infrastructure investment → higher speeds

**Data Sources**:
- Ookla Open Data (aggregated Speedtest results by tile)
- FCC Measuring Broadband America

---

## Visualization & Communication

### 16. Interactive Story Map
**Approach**:
- Build Scrollama.js narrative map
- Sections:
  1. National overview (cluster map)
  2. Ashburn VA zoom (dominance of single region)
  3. Demographics comparison (host vs. national)
  4. Water stress hot spots
  5. Energy infrastructure saturation

**Output**: `story_map.html` (standalone web page)

---

### 17. Policy Brief Generation
**Approach**:
- Auto-generate policy briefs from analysis outputs
- Targeted audiences:
  - State legislators (energy/water policy)
  - Local governments (tax incentive negotiation)
  - Environmental justice advocates

**Template**:
```markdown
# Data Center Siting in [STATE]: Key Facts for Policymakers

- **[STATE] hosts X% of US data centers** (rank: #Y)
- **Host communities are Z% wealthier** than state average
- **A% of state generation is within 50 km of a data center**
- **Top watershed holds B facilities** (water stress: [HIGH/MEDIUM/LOW])
```
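
Filling the template can be as simple as Python string formatting. In the sketch below the placeholder names (`share_pct`, `rank`, and so on) are invented stand-ins for the X/Y/Z/A/B slots, and the sample values are made up; in practice they would come from the analysis tables.

```python
BRIEF_TEMPLATE = """# Data Center Siting in {state}: Key Facts for Policymakers

- **{state} hosts {share_pct:.1f}% of US data centers** (rank: #{rank})
- **Host communities are {wealth_gap_pct:.0f}% wealthier** than state average
- **{gen_within_50km_pct:.0f}% of state generation is within 50 km of a data center**
- **Top watershed holds {top_watershed_count} facilities** (water stress: {water_stress})
"""

ALLOWED_STRESS_LEVELS = {"HIGH", "MEDIUM", "LOW"}


def render_brief(**facts):
    """Render the policy brief; raises KeyError if a required fact is missing."""
    if facts.get("water_stress") not in ALLOWED_STRESS_LEVELS:
        raise ValueError("water_stress must be HIGH, MEDIUM, or LOW")
    return BRIEF_TEMPLATE.format(**facts)


# Hypothetical values for a fictional state.
brief = render_brief(
    state="Exampleland",
    share_pct=4.2,
    rank=7,
    wealth_gap_pct=18,
    gen_within_50km_pct=61,
    top_watershed_count=23,
    water_stress="MEDIUM",
)
print(brief)
```

Failing loudly on a missing or invalid fact is intentional: a brief with a blank statistic is worse than no brief.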

---

### 18. Comparative International Analysis
**Approach**:
- Extend methodology to EU, Canada, Australia
- Compare siting patterns (e.g., Nordic countries = renewable energy, cold climate)
- Test if "concentrated costs / dispersed benefits" holds internationally

**Data Sources**:
- OpenStreetMap (global coverage)
- Eurostat demographics
- IEA energy data
- TeleGeography global cable data (already available)

**Research Questions**:
- Are US patterns unique (tax-driven siting) vs. EU (regulatory constraints)?
- Do Nordic countries see more equitable distribution?

---

## Speculative / Long-Term Ideas

### 19. AI Demand Forecasting
**Approach**:
- Train ML model to predict data center siting
- Features: demographics, energy capacity, fiber proximity, tax rates, water availability
- Test on historical data (train on pre-2015, test on 2015-2025)

**Use Case**:
- Identify "likely future sites" for proactive policy intervention
- Warn communities of potential incoming projects

---

### 20. Cooling Technology Analysis
**Approach**:
- Classify facilities by cooling type (air, water, hybrid)
- Correlate with:
  - Climate (CDD: cooling degree days)
  - Water availability
  - Facility size

**Data Sources**:
- Manual classification from news/press releases
- FOIA requests to water utilities (cooling water withdrawal permits)

**Research Questions**:
- Are water-cooled facilities concentrated in water-stressed regions (paradox)?
- Do hyperscalers use more efficient cooling (e.g., Meta's Prineville OR evaporative cooling)?

---

### 21. Bitcoin Mining Facilities
**Approach**:
- Overlay cryptocurrency mining facilities (subset of "data centers")
- Compare siting patterns: Bitcoin mines prefer low electricity costs (WA, TX, NY hydro)
- Test if Bitcoin mines face more opposition (negative perception)

**Data Sources**:
- Cambridge Bitcoin Electricity Consumption Index (its mining map reports hashrate share by region; facility-level locations would need to come from other sources)
- News archives of mining farm proposals/rejections

---

### 22. Disaster Resilience & Redundancy
**Approach**:
- Model simultaneous hazard exposure across data center clusters
- E.g., "What % of US data centers are in wildfire risk zones?"
- Identify single points of failure (e.g., the Ashburn VA area: the two Middle Potomac watersheds hold 346 of 1,833 facilities, about 19% by count; the share of capacity is unknown until `power_mw` is backfilled)

**Research Questions**:
- Is the current spatial distribution resilient to climate change?
- Should policy incentivize geographic diversification?

**Output**: `disaster_resilience_report.md`

---

### 23. Edge Data Center Network
**Approach**:
- Separately analyze edge facilities (<1 MW, the low end of the <10 MW edge/enterprise class used in idea 1) vs. hyperscale (>100 MW)
- Test if edge DCs follow different siting logic (population density > energy cost)

**Data Challenge**: Current inventory does not distinguish edge vs. hyperscale (need `power_mw` backfill)

---

### 24. Carbon Intensity of Host Grids
**Approach**:
- Join to EPA eGRID subregion carbon intensity (lb CO₂/MWh)
- Calculate per-facility estimated carbon footprint (if `power_mw` available)
- Compare to corporate renewable energy procurement (RECs, PPAs)
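
The per-facility footprint estimate is a short chain of unit conversions. The sketch below assumes `power_mw` represents a capacity figure that must be scaled by an average utilization to approximate energy actually drawn; that interpretation, the default utilization, and the function name are assumptions to revisit once the field's meaning is documented.

```python
HOURS_PER_YEAR = 8760
LB_PER_METRIC_TON = 2204.62


def annual_co2_tonnes(power_mw, egrid_lb_per_mwh, utilization=1.0):
    """Rough annual CO2 emissions in metric tons for one facility.

    power_mw:         facility power capacity in MW
    egrid_lb_per_mwh: eGRID subregion emission rate in lb CO2 per MWh
    utilization:      assumed average load as a fraction of capacity (0 to 1)
    """
    if not 0 <= utilization <= 1:
        raise ValueError("utilization must be between 0 and 1")
    annual_mwh = power_mw * utilization * HOURS_PER_YEAR
    return annual_mwh * egrid_lb_per_mwh / LB_PER_METRIC_TON


# Hypothetical facility: 10 MW at 50% average load on an 800 lb/MWh grid.
estimate = annual_co2_tonnes(10, 800, utilization=0.5)
print(f"{estimate:,.0f} t CO2 per year")
```

This is a location-based estimate; RECs and PPAs from the third bullet would enter as a separate market-based adjustment, not as a change to the grid rate.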

**Research Questions**:
- Are data centers disproportionately in high-carbon grids?
- Do hyperscaler renewable commitments offset grid carbon?

**Data Sources**:
- EPA eGRID
- Corporate sustainability reports (Google, Microsoft, Meta, AWS)

---

## Collaboration Opportunities

### Academic Partnerships
- **Energy researchers**: Joint analysis of grid saturation + IM3 projections
- **Environmental justice scholars**: EJScreen overlay + opposition case studies
- **Political scientists**: Tax incentive analysis + local government bargaining power

### Policy Stakeholders
- **State energy offices**: Share grid saturation maps
- **Water resource agencies**: Watershed analysis for permitting
- **Local governments**: Demographic/tax revenue analysis for negotiation leverage

### Industry Engagement
- **Data center operators**: Validate facility data, discuss siting criteria
- **Colocation providers**: Access to tenant mix data (multi-tenant vs. single-tenant)

---

## Tools & Infrastructure Improvements

### Database Enhancements
- Add `version` column to track data vintage
- Implement `audit_log` table for data lineage
- Set up automated backups to S3/Azure Blob

### Code Quality
- Add unit tests for geocoding functions
- Centralize database connection settings in a `config.yaml` (replacing values hardcoded in scripts); keep credentials themselves out of version control
- Dockerize analysis environment for reproducibility

### Documentation
- Add docstrings (e.g., Google or NumPy style) to all Python functions
- Create `CONTRIBUTING.md` for external collaborators
- Record Jupyter notebook walkthroughs (video tutorials)

---

## Unfunded / Ambitious Ideas

### 25. Real-Time Energy Monitoring
- Partner with utility to get live load data from data center substations
- Build dashboard showing real-time energy consumption by facility
- Correlate with AWS/Azure/GCP service outages (reverse-engineer capacity from brownouts)

### 26. Social Media Sentiment Analysis
- Scrape Twitter/Reddit for mentions of local data center projects
- NLP sentiment analysis: support vs. opposition
- Correlate sentiment with facility approval outcomes

### 27. LIDAR Analysis of Cooling Infrastructure
- Use aerial LIDAR to measure rooftop cooling equipment volume
- Proxy for facility size (cooling = f(IT load))
- Build predictive model: cooling equipment → power capacity

---

## Contact & Contributions

If you're interested in collaborating on any of these research directions, please contact the repository owner.

**Priorities for external collaboration**:
1. Power capacity data acquisition
2. Water stress/drought overlay
3. Opposition cases database compilation
4. International comparative analysis

---

## References for Future Work

### Data Sources to Explore
- **Department of Energy**: Grid resilience reports, interconnection queues
- **NREL**: Renewable energy potential by HUC (solar, wind)
- **USDA**: Agricultural water use by county (competition for water)
- **NOAA**: Climate normals + projections by grid cell
- **BLS**: QCEW employment data, wage data
- **EPA**: eGRID, EJScreen, Superfund sites

### Academic Literature Gaps
- Limited peer-reviewed research on data center spatial concentration
- Few published studies on facility-level water stress exposure of data centers
- Opportunity for "first mover" publication in major geography/planning journals

### Policy Levers to Investigate
- State renewable portfolio standards (RPS) → data center siting
- Federal infrastructure investment (IRA, CHIPS Act) → energy grid capacity
- Local zoning reform (industrial land use restrictions)

---

## Legislative Analysis (LegiScan Data)

**Status**: Data loaded: 1.3M bills across all US states + federal, 2016–2026; ~60K tagged relevant.
**Tables**: `legiscan_sessions`, `legiscan_bills`
**Query file**: `query_legiscan_bills.sql`

### Research Questions

**1. Ratepayer Cost Shifting**
Do states with high data center density show more legislative activity on ratepayer protection?
- Join `legiscan_bills WHERE 'ratepayer_protection' = ANY(relevance_tags)` to `master_data_centers` counts by state
- Test correlation between DC concentration and # of ratepayer bills introduced/passed
- Compare outcomes: do high-DC states pass or fail more ratepayer protections?

**2. Data Center Legislative Wave**
Is there a measurable increase in DC-specific legislation after 2022 (AI boom)?
- Trend `data_center` and `large_load` tagged bills by `year_start`
- Cross-reference with major AI facility announcements (2022–2025)

**3. Tax Incentive Geography**
Which states enacted tax incentives that may have influenced DC location decisions?
- `tax_incentive` bills with `status IN (4,8)` (passed/chaptered)
- Overlay with `master_data_centers` growth by state over the same period
- Candidate for difference-in-differences analysis

**4. Grid Interconnection Policy**
Do states with `grid_impact` legislation show different EIA capacity expansion patterns?
- Join relevant bills to `energy_eia_operating_generator_capacity_flat` by state
- Look for correlations between grid policy activity and nameplate MW additions

**5. Siting Preemption vs. Local Control**
Are states passing bills to streamline or restrict local siting authority?
- Full-text search within `siting_permitting` bills for "preemption" vs. "local control"
- Map bill outcomes by state political environment (cross-ref RDH vote data)

### Suggested Joins

```sql
-- States with DCs and legislative activity by topic
SELECT
  dc.state,
  COUNT(DISTINCT dc.id) AS data_centers,
  COUNT(DISTINCT lb.bill_id) FILTER (WHERE 'data_center' = ANY(lb.relevance_tags)) AS dc_bills,
  COUNT(DISTINCT lb.bill_id) FILTER (WHERE 'ratepayer_protection' = ANY(lb.relevance_tags)) AS ratepayer_bills,
  COUNT(DISTINCT lb.bill_id) FILTER (WHERE 'tax_incentive' = ANY(lb.relevance_tags)
                                       AND lb.status IN (4,8)) AS tax_incentives_passed
FROM master_data_centers dc
LEFT JOIN legiscan_bills lb ON dc.state = lb.state AND lb.is_relevant
GROUP BY dc.state
ORDER BY data_centers DESC;
```

---

**Last Updated**: May 2026