Cross-tabulates the normalized data-center operator (owner) against the
leading ACS 2024 workforce industry of each enrichment geography (ZCTA and
census tract). Emits raw-count and row-percentage CSVs for both geographies.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
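
A minimal sketch of the raw-count and row-percentage cross-tab described
above, using only the standard library. The function name `crosstab` and
the sample operator/industry labels are hypothetical; the real script's
column names and CSV layout are not shown in this log.

```python
from collections import Counter, defaultdict


def crosstab(rows):
    """Count (operator, industry) pairs and derive row percentages.

    rows: iterable of (operator, leading_industry) tuples, one per facility.
    Returns (counts, percents), both keyed by operator, then by industry.
    """
    counts = defaultdict(Counter)
    for operator, industry in rows:
        counts[operator][industry] += 1

    percents = {}
    for operator, by_industry in counts.items():
        row_total = sum(by_industry.values())
        # Each row sums to 100, so operators of different sizes are comparable.
        percents[operator] = {
            industry: 100.0 * n / row_total
            for industry, n in by_industry.items()
        }
    return counts, percents


# Hypothetical sample: (operator, leading industry of the host geography).
sample = [
    ("Operator A", "Professional services"),
    ("Operator A", "Professional services"),
    ("Operator A", "Manufacturing"),
    ("Operator B", "Manufacturing"),
]
counts, percents = crosstab(sample)
```

Running the same function once per geography (ZCTA, then tract) would
yield the four tables the commit mentions.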
Mirrors create_data_center_census_tract_table.py but at ZIP Code
Tabulation Area geography (2020 boundary vintage, since ZCTAs are only
redrawn each decennial census). Builds data_center_zcta_2024 (607 ZCTAs
hosting >=1 facility, joined to ACS 2024 5-year demographics) and adds
master_data_centers.zcta_geoid, parallel to the existing tract geoid
column. Used to verify that the income/education premium for DC host
communities holds at ZIP-code resolution, not just census-tract
resolution, for the dc-siting-politics paper.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Update 8 scripts to use Path(__file__).parent.parent as PROJECT_ROOT
so they resolve data/, output/, and internet_cables/ relative to the
project root rather than the caller's working directory.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
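
A short sketch of the path resolution this change describes. The helper
`project_root` and the example script path are hypothetical, and the
`.resolve()` call is an addition not mentioned in the commit; it is there
so a relative `__file__` still yields an absolute root.

```python
from pathlib import Path


def project_root(script_file: str) -> Path:
    """Return the directory two levels above a script in scripts/."""
    # scripts/<name>.py -> scripts/ -> project root
    return Path(script_file).resolve().parent.parent


# Inside a real script this would be: PROJECT_ROOT = project_root(__file__)
# A literal path is used here so the sketch runs anywhere.
PROJECT_ROOT = project_root("/repo/scripts/ingest_example.py")

# Directories named in the commit, resolved from the root rather than
# from the caller's working directory.
DATA_DIR = PROJECT_ROOT / "data"
OUTPUT_DIR = PROJECT_ROOT / "output"
CABLES_DIR = PROJECT_ROOT / "internet_cables"
```

With this pattern a script behaves the same whether it is launched from
the repository root, from scripts/, or from a scheduler.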
Move all Python scripts to scripts/, documentation to docs/, raw input
data to data/, and generated HTML/CSV outputs to output/. Update path
references in 8 scripts to use Path(__file__).parent.parent as project
root so they work correctly from the new location. Update README links
and quick-start commands accordingly. Notebooks remain at root.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds ingest_legiscan.py to pull all US state + federal bills (2016-2026)
from the LegiScan API into legiscan_sessions and legiscan_bills tables.
Bills are keyword-tagged across 8 research categories (data_center,
ratepayer_protection, large_load, grid_impact, tax_incentive, etc.).
Loads ~1.3M bills; ~60K tagged relevant. Adds query_legiscan_bills.sql
with pre-built analysis queries including state/DC joins. Updates
database-tables.md, README.md, and research-ideas.md accordingly.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
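
The keyword tagging mentioned above could look roughly like the sketch
below. The category names come from the commit, but the keyword lists,
the `tag_bill` helper, and the sample bill titles are assumptions; the
actual terms used by ingest_legiscan.py are not shown here.

```python
import re

# Hypothetical keyword lists for three of the eight categories.
CATEGORY_KEYWORDS = {
    "data_center": ["data center", "data centre"],
    "large_load": ["large load"],
    "tax_incentive": ["tax exemption", "tax credit", "tax incentive"],
}

# One case-insensitive pattern per category, matching whole phrases only.
PATTERNS = {
    category: re.compile(
        "|".join(r"\b" + re.escape(keyword) + r"\b" for keyword in keywords),
        re.IGNORECASE,
    )
    for category, keywords in CATEGORY_KEYWORDS.items()
}


def tag_bill(text):
    """Return the sorted list of categories whose keywords appear in text."""
    return sorted(
        category for category, pattern in PATTERNS.items() if pattern.search(text)
    )


tags = tag_bill("Sales tax exemption for data center equipment")
untagged = tag_bill("An act relating to school buses")
```

A bill can carry several tags at once, which fits the commit's note that
only a small subset of the loaded bills ends up tagged as relevant.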
- Add clustered vs isolated facility comparison to README
- Expand infrastructure insights with hyperscaler energy strategies
- Document additional database tables (opposition cases, IM3 projections, utility rates)
- Enhance research ideas with specific watershed names and grid saturation data
- Add data quality notes about EIA longitude corrections
- Reference loaded but unused tables for future analysis
Extends the demographic/RUCA/energy summary with two new sections:
- §7 quantifies each top-DC state's "share of state capacity within
50 km of a DC," surfacing NJ (83%), NV (75%), TN (70%), and OR (68%)
as the most DC-saturated grids. This reframes the canonical VA-centric
story around structural entanglement rather than raw facility count.
- §9 inventories every table in the data_centers schema with a
one-line description, flagging cleanup candidates and unused layers
for downstream work.
Also renumbers watershed analysis to §8, adds the SEDS row to the
dataset coverage table, and narrows next-step #4 to the IM3 projection
overlay (now that the SEDS join is complete).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
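
One way to compute the "share of state capacity within 50 km of a DC"
metric from §7 is sketched below. It assumes great-circle (haversine)
distance on a spherical Earth and a brute-force comparison of every
generator against every data center; the function names and the sample
coordinates are hypothetical, and the summary doc may use a spatial index
or a database query instead.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def share_near_dc(generators, data_centers, radius_km=50.0):
    """Fraction of total MW whose generator lies within radius_km of any DC.

    generators: list of (lat, lon, capacity_mw)
    data_centers: list of (lat, lon)
    """
    total_mw = sum(mw for _, _, mw in generators)
    if total_mw == 0:
        return 0.0
    near_mw = sum(
        mw
        for lat, lon, mw in generators
        if any(
            haversine_km(lat, lon, dc_lat, dc_lon) <= radius_km
            for dc_lat, dc_lon in data_centers
        )
    )
    return near_mw / total_mw


# Hypothetical state with two generators and one data center.
generators = [(40.0, -75.0, 300.0), (42.0, -75.0, 100.0)]
data_centers = [(40.1, -75.0)]
share = share_near_dc(generators, data_centers)
```

In this toy example only the 300 MW unit (about 11 km away) falls inside
the radius, so the share is 300 of 400 MW.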
Adds three coordinated changes:
- Request nameplate, summer, and winter capacity from the EIA
operating-generator-capacity endpoint and project them as typed columns
on energy_eia_operating_generator_capacity_flat. The original ingest
only pulled latitude and longitude, leaving the flat table with no MW
values despite its name.
- New cluster_analysis.ipynb joins master_data_centers to ACS-2024
demographics, USDA RUCA-2020 codes (loaded from new/), and EIA
generation capacity within 50 km of each site.
- Summary doc consolidates the headline findings: DC tracts skew higher
income / more educated / more racially diverse than US average, the
metro over-index is only 1.11x, the non-metro tail is dominated by
hyperscalers in the Columbia River corridor (OR+WA = 66% of non-metro
DCs), and Microsoft co-locates with Palo Verde Nuclear in Goodyear AZ.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
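
The first change above, projecting capacity values as typed columns, can
be illustrated with the sketch below. The hyphenated source field names
are an assumption about the EIA response shape, and the snake_case column
names, `FIELD_MAP`, and `flatten_record` are hypothetical; the actual
ingest script and table definition are not shown in this log.

```python
def to_float(value):
    """Convert an API value to float, returning None for blanks or junk."""
    if value is None:
        return None
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


# Assumed source field -> hypothetical flat-table column.
FIELD_MAP = {
    "nameplate-capacity-mw": "nameplate_capacity_mw",
    "net-summer-capacity-mw": "net_summer_capacity_mw",
    "net-winter-capacity-mw": "net_winter_capacity_mw",
    "latitude": "latitude",
    "longitude": "longitude",
}


def flatten_record(record):
    """Project one raw API record onto typed numeric columns."""
    return {
        column: to_float(record.get(source))
        for source, column in FIELD_MAP.items()
    }


# Hypothetical record with a blank summer value and no winter value.
row = flatten_record(
    {
        "nameplate-capacity-mw": "12.5",
        "net-summer-capacity-mw": "",
        "latitude": "33.4",
        "longitude": "-112.4",
    }
)
```

Mapping missing or unparseable values to None keeps the MW columns
numeric, so sums over capacity near each site do not fail on blanks.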