Database Tables Documentation
Database Configuration
Database Name: data_centers
Type: PostgreSQL with PostGIS extension
Connection: Environment variables from ~/.zsh_secrets
- PGWEB_HOST: Database host
- PGWEB_PORT: Database port (5433)
- PGWEB_USER: Database user
- PGWEB_PASSWORD: Database password
- PGWEB_DATABASE: Database name (data_centers)
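The five variables map onto standard libpq connection keywords. The following is a minimal sketch of assembling a connection string from them, assuming the variables are already exported in the shell; `build_dsn` and `ENV_TO_KEYWORD` are hypothetical names, not part of this repository.

```python
import os

# Map each PGWEB_* variable to the libpq keyword it corresponds to.
ENV_TO_KEYWORD = {
    "PGWEB_HOST": "host",
    "PGWEB_PORT": "port",
    "PGWEB_USER": "user",
    "PGWEB_PASSWORD": "password",
    "PGWEB_DATABASE": "dbname",
}


def build_dsn(env=None):
    """Build a libpq keyword/value connection string from PGWEB_* variables.

    Hypothetical helper: raises KeyError naming any variable that is unset.
    """
    env = os.environ if env is None else env
    missing = [name for name in ENV_TO_KEYWORD if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    parts = []
    for name, keyword in ENV_TO_KEYWORD.items():
        # Single-quote each value, escaping backslashes and single quotes.
        value = env[name].replace("\\", "\\\\").replace("'", "\\'")
        parts.append(f"{keyword}='{value}'")
    return " ".join(parts)
```

The resulting string can be handed to a PostgreSQL driver that accepts libpq-style connection strings, for example `psycopg2.connect(build_dsn())` if psycopg2 is installed.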
Table Organization
Tables are organized into six categories:
- Core Data Center Tables - Master inventories and source data
- Enrichment Tables - Data centers joined with contextual data
- Environmental and Election Source Tables - Long-form climate, drought, fire/smoke, and precinct-election source layers
- Base Layer Tables - Geographic and demographic reference layers
- Infrastructure Tables - Energy and connectivity infrastructure
- Legislation Tables - LegiScan state and federal bill data (2016-2026)
Core Data Center Tables
master_data_centers
Rows: 1,833
Purpose: Canonical data center inventory - deduplicated merge of curated + OSM sources
Key Columns:
- id (INTEGER) - Unique identifier
- name (TEXT) - Facility name
- address (TEXT) - Street address
- city (TEXT) - City
- state (TEXT) - State code
- latitude (DOUBLE PRECISION) - Latitude
- longitude (DOUBLE PRECISION) - Longitude
- geom (GEOMETRY) - PostGIS point geometry (EPSG:4326)
- operator (TEXT) - Operator/owner
- power_mw (DOUBLE PRECISION) - Power capacity in megawatts (sparse: 5.9% populated)
- source (TEXT) - Data source (curated, osm, or both)
- osm_id (TEXT) - OpenStreetMap ID if applicable
- geocode_method (TEXT) - Geocoding provenance
Notes:
- 108 of 1,833 facilities have power ratings
- 45 facilities use city-precision fallback coordinates
- Operator strings have fragmentation issues ("Meta" vs. "Meta, Inc.")
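One way to reduce the operator fragmentation noted above is to normalize names before grouping. This is a minimal sketch only: the suffix list is an illustrative assumption, `normalize_operator` is a hypothetical helper, and real operator strings will likely need a curated alias table on top of it.

```python
import re

# Trailing corporate suffixes to strip. Illustrative list, not the project's rules.
_SUFFIX = re.compile(
    r"[,\s]+(inc|llc|ltd|corp|corporation|co|company)\.?$",
    re.IGNORECASE,
)


def normalize_operator(raw):
    """Collapse whitespace and strip trailing corporate suffixes."""
    if raw is None:
        return None
    name = " ".join(raw.split())
    # Strip repeatedly so stacked suffixes such as "Foo Co., Inc." are handled.
    while True:
        stripped = _SUFFIX.sub("", name)
        if stripped == name:
            break
        name = stripped
    return name.rstrip(" ,")
```

With this rule, "Meta" and "Meta, Inc." both normalize to "Meta", so they group together in operator-level counts.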
us_dc_sample_geocoded
Rows: 1,489
Purpose: Original curated sample with geocoding provenance (superseded by master_data_centers)
Key Columns:
- name, address, city, state, zip
- latitude, longitude, geom
- operator, power_mw
- census_lat, census_lon - Census TIGER geocode results
- nominatim_lat, nominatim_lon - Nominatim fallback results
- geocode_source - Which geocoder was used
osm_data_centers
Rows: 1,549
Purpose: Raw OpenStreetMap-derived facilities
Key Columns:
- osm_id (TEXT) - OSM element ID
- osm_type (TEXT) - node, way, or relation
- name (TEXT) - OSM name tag
- latitude, longitude, geom
- tags (JSONB) - All OSM tags as JSON
- operator (TEXT) - Extracted from OSM tags
- city, state, country
Notes: Fetched via Overpass API with query for telecom=data_center or building=data_center
master_data_center_spatial_clusters
Rows: 1,831
Purpose: DBSCAN cluster assignments for master data centers
Key Columns:
- All columns from master_data_centers
- cluster_id (INTEGER) - Cluster assignment (-1 = noise/singleton)
- cluster_size (INTEGER) - Number of facilities in cluster
- cluster_label (TEXT) - Human-readable cluster name
Notes: DBSCAN parameters: eps=15 km, min_samples=2
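With min_samples=2, DBSCAN reduces to a simple rule if min_samples counts the point itself (the scikit-learn convention, assumed here): any facility with at least one neighbor within eps is a core point, clusters are the connected components of the neighbor graph, and isolated facilities are noise. The sketch below illustrates that rule in plain Python. It is not the project's clustering code, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two WGS84 points in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def cluster_min_samples_2(points, eps_km=15.0):
    """Label (lat, lon) points as DBSCAN would for min_samples=2.

    Returns one label per point: 0, 1, 2, ... for clusters, -1 for noise.
    Uses an O(n^2) pairwise scan, which is fine for a sketch.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    has_neighbor = [False] * n
    for i in range(n):
        for j in range(i + 1, n):
            if haversine_km(*points[i], *points[j]) <= eps_km:
                has_neighbor[i] = has_neighbor[j] = True
                parent[find(i)] = find(j)

    labels, root_to_id = [], {}
    for i in range(n):
        if not has_neighbor[i]:
            labels.append(-1)  # noise/singleton
            continue
        root = find(i)
        root_to_id.setdefault(root, len(root_to_id))
        labels.append(root_to_id[root])
    return labels
```

Note that clusters can chain: two facilities more than 15 km apart still share a cluster_id if a third facility lies within 15 km of both.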
Enrichment Tables
data_center_census_tracts_2024
Rows: 1,815
Purpose: Per-facility demographics from containing Census tract
Key Columns:
- All columns from master_data_centers
- geoid (TEXT) - 11-digit Census tract GEOID
- state_fips, county_fips, tract
- Population: total_population, population_density_sq_mi
- Age: median_age, under_18_pct, over_65_pct
- Race/Ethnicity: white_nh_pct, black_nh_pct, asian_nh_pct, hispanic_pct
- Economics: median_household_income, per_capita_income, poverty_rate
- Education: bachelors_or_higher_pct, high_school_or_higher_pct
- Housing: median_home_value, median_rent, homeownership_rate
- Broadband: broadband_pct - Households with broadband subscription
Source: ACS 2024 5-year estimates
Notes:
- 18 of 1,833 facilities failed tract join (geocoding issues)
- Data from _dc_census_tract_acs_2024 base table
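The 11-digit tract GEOID is a concatenation of the 2-digit state FIPS code, the 3-digit county FIPS code, and the 6-digit tract code, which is why it is stored as TEXT (leading zeros matter). A small sketch of splitting it, with `split_tract_geoid` as a hypothetical helper name:

```python
def split_tract_geoid(geoid):
    """Split an 11-digit tract GEOID into (state_fips, county_fips, tract)."""
    if len(geoid) != 11 or not (geoid.isascii() and geoid.isdigit()):
        raise ValueError(f"expected 11 ASCII digits, got {geoid!r}")
    # Positions: 2 digits state, 3 digits county, 6 digits tract.
    return geoid[:2], geoid[2:5], geoid[5:]
```

The first five characters together form the county GEOID, which is convenient when rolling tract-level rows up to counties.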
data_center_watershed_huc8
Rows: 1,833
Purpose: Per-facility watershed assignment
Key Columns:
- All columns from master_data_centers
- huc8 (TEXT) - 8-digit Hydrologic Unit Code
- watershed_name (TEXT) - Watershed name
- watershed_area_sq_km (DOUBLE PRECISION)
- states (TEXT) - States intersecting watershed
Source: USGS Watershed Boundary Dataset
Notes: 257 unique HUC8 watersheds contain at least one data center
data_center_nri_exposure
Rows: 1,833
Purpose: Per-facility FEMA National Risk Index hazard exposure scores
Key Columns:
- All columns from master_data_centers
- nri_id (TEXT) - Census tract GEOID (matches geoid from demographics)
- risk_score (DOUBLE PRECISION) - Overall NRI risk score
- social_vulnerability (DOUBLE PRECISION) - Social vulnerability index
- Hazard-specific risk scores (18 hazards):
  - avalanche_risk, coastal_flooding_risk, cold_wave_risk
  - drought_risk, earthquake_risk, hail_risk
  - heat_wave_risk, hurricane_risk, ice_storm_risk
  - landslide_risk, lightning_risk, riverine_flooding_risk
  - strong_wind_risk, tornado_risk, tsunami_risk
  - volcanic_activity_risk, wildfire_risk, winter_weather_risk
Source: FEMA National Risk Index (December 2025 release)
data_center_historical_climate
Rows: 1,833
Purpose: One-row-per-facility historical climate summary for data center locations
Key Columns:
- master_id (TEXT) - FK to master_data_centers
- source, name, operator, city, state, country
- latitude, longitude, geom
- daymet_dataset_version, gridmet_dataset_version
- climate_period_start, climate_period_end - Current period: 1991-01-01 to 2020-12-31
- Temperature: mean_annual_temperature_c, mean_summer_temperature_c, max_daily_temperature_c, min_daily_temperature_c
- Humidity / wet bulb: mean_relative_humidity_pct, mean_wet_bulb_temperature_c, max_wet_bulb_temperature_c, extreme_wet_bulb_days
- Cooling / heat: cooling_degree_days_c, annual_cooling_degree_days_c_mean, extreme_heat_days, annual_extreme_heat_days_mean
- Precipitation: precipitation_total_mm, annual_precipitation_mm_mean, annual_precipitation_cv, wet_day_precipitation_p95_mm
- Wind: mean_wind_speed_ms, max_daily_mean_wind_speed_ms, sustained_wind_days, annual_sustained_wind_days_mean
Source: Daymet + gridMET historical climate data
Notes: Built by historical_climate_data_centers.ipynb / open_meteo_historical_data_centers.ipynb
data_center_usdm_drought_exposure
Rows: 1,833
Purpose: Per-facility drought exposure summary from weekly U.S. Drought Monitor polygons
Key Columns:
- master_id (TEXT) - FK to master_data_centers
- source, name, operator, city, state, country
- latitude, longitude, geom
- usdm_status - covered or no_coverage
- drought_period_start, drought_period_end - Current period: 2000-01-04 to 2025-12-30
- weeks_observed
- weeks_in_d0_or_worse, weeks_in_d1_or_worse, weeks_in_d2_or_worse, weeks_in_d3_or_worse, weeks_in_d4
- pct_weeks_in_d0_or_worse, pct_weeks_in_d1_or_worse, pct_weeks_in_d2_or_worse, pct_weeks_in_d3_or_worse, pct_weeks_in_d4
- worst_dm_category, mean_dm_category
- longest_d0_streak_weeks, longest_d2_streak_weeks, longest_d3_streak_weeks
Source: U.S. Drought Monitor weekly spatial data
Notes:
- Summary table is rolled up from data_center_usdm_drought_dc_week
- dm_category scale: D0-D4, stored as 0-4
- 1,830 facilities have covered status; 3 have no coverage
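The rollup from weekly rows to the per-facility summary can be sketched as follows. This is an illustration, not the project's build script. It assumes the weekly values arrive ordered by week_date with no gaps, that -1 means an observed week with no drought, and that percentages are expressed on a 0-100 scale (the stored scale is not stated here). The function names are hypothetical.

```python
def longest_streak(values, threshold):
    """Length of the longest run of consecutive values >= threshold."""
    best = current = 0
    for v in values:
        current = current + 1 if v >= threshold else 0
        best = max(best, current)
    return best


def summarize_drought(weekly_worst_dm):
    """Summarize one facility's weekly worst_dm values (-1, or 0-4 for D0-D4)."""
    weeks = len(weekly_worst_dm)
    summary = {"weeks_observed": weeks}
    for level in range(5):
        count = sum(1 for dm in weekly_worst_dm if dm >= level)
        # D4 is the top category, so its column has no "_or_worse" suffix.
        key = "d4" if level == 4 else f"d{level}_or_worse"
        summary[f"weeks_in_{key}"] = count
        summary[f"pct_weeks_in_{key}"] = 100.0 * count / weeks if weeks else None
    summary["worst_dm_category"] = max(weekly_worst_dm, default=None)
    for level in (0, 2, 3):
        summary[f"longest_d{level}_streak_weeks"] = longest_streak(weekly_worst_dm, level)
    return summary
```

For example, the sequence `[-1, 0, 1, 2, 2, -1, 3, 4, 0]` has 7 weeks at D0 or worse, and its longest D0-or-worse streak is 4 weeks because the -1 week breaks the run.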
data_center_hms_smoke_exposure
Rows: 1,833
Purpose: Per-facility wildfire-smoke exposure summary from NOAA HMS smoke polygons
Key Columns:
- master_id (TEXT) - FK to master_data_centers
- source, name, operator, city, state, country
- latitude, longitude, geom
- hms_status
- smoke_period_start, smoke_period_end - Current period: 2005-08-05 to 2026-05-22
- days_observed
- days_with_any_smoke, days_with_light_or_worse, days_with_medium_or_worse, days_with_heavy_smoke
- pct_days_with_any_smoke, pct_days_with_light_or_worse, pct_days_with_medium_or_worse, pct_days_with_heavy_smoke
- worst_density_rank, worst_density, mean_density_rank
- longest_any_smoke_streak_days, longest_medium_or_heavy_streak_days, longest_heavy_smoke_streak_days
Source: NOAA Hazard Mapping System (HMS) smoke polygons
Notes:
- Summary table is rolled up from data_center_hms_smoke_dc_day
- Density rank: 0 = observed no smoke, 1 = Light, 2 = Medium, 3 = Heavy
- HMS product path uses NOAA's /FIRE/web/HMS/Smoke_Polygons/archive
data_center_election_context
Rows: 1,833
Purpose: Standardized one-row-per-facility election context derived from RDH precinct matches
Key Columns:
- master_id (TEXT) - FK to master_data_centers
- name, city, state
- rdh_layer_title
- precinct_identifier_name
- election_year, office
- democratic_votes, republican_votes, total_votes
- turnout_or_vote_share
- updated_at
Source: Redistricting Data Hub precinct election shapefiles
Notes:
- Built from data_center_rdh_precinct_vote_matches plus RDH feature properties
- Current rows cover 2020-2024 election layers; 1,829 facilities have non-null election year context
data_center_rdh_precinct_vote_matches
Rows: 3,330
Purpose: Spatial join bridge between data centers and RDH precinct vote features
Key Columns:
- master_id (TEXT) - FK to master_data_centers
- feature_id (TEXT) - FK to rdh_precinct_vote_features
- layer_id (TEXT) - FK to rdh_precinct_vote_layers
- state_code
- join_method
- match_distance_m
- matched_at
Source: Redistricting Data Hub precinct shapefiles
Notes: Spatial join to voting precincts (point-in-polygon, with nearest/fallback logic where needed)
Environmental and Election Source Tables
usdm_drought_weekly
Rows: 12,080
Purpose: Raw weekly U.S. Drought Monitor polygons by drought category
Key Columns:
- id (BIGINT) - Primary key
- week_date (DATE)
- dm_category (SMALLINT) - Drought Monitor category D0-D4 stored as 0-4
- objectid, shape_leng, shape_area
- geom (GEOMETRY) - Drought polygon geometry
Source: U.S. Drought Monitor spatial archive
Notes: Source table for data_center_usdm_drought_dc_week
data_center_usdm_drought_dc_week
Rows: ~2.48 million
Purpose: Long-form weekly drought exposure for each covered data center
Key Columns:
- master_id (TEXT) - FK to master_data_centers
- week_date (DATE)
- worst_dm (SMALLINT) - Worst drought category covering the facility that week
Source: Spatial join of master_data_centers to usdm_drought_weekly
Notes:
- Primary key: (master_id, week_date)
- worst_dm = -1 indicates an observed week with no drought polygon covering the facility
hms_smoke_days
Rows: 7,075
Purpose: One row per observed NOAA HMS smoke product day, including zero-polygon days
Key Columns:
- smoke_date (DATE) - Primary key
- source, source_file, source_url
- feature_count (INTEGER) - Number of smoke polygons for the day
- fetched_at, updated_at
Source: NOAA HMS smoke polygon archive
Notes: Denominator table for daily smoke-exposure percentages
hms_smoke_daily
Rows: 536,286
Purpose: Raw daily NOAA HMS smoke polygons with density categories
Key Columns:
- id (BIGINT) - Primary key
- smoke_date (DATE) - FK to hms_smoke_days
- satellite
- start_raw, end_raw, start_utc, end_utc
- density, density_rank
- source, source_file, source_url
- geom (GEOMETRY) - Smoke polygon geometry
Source: NOAA Hazard Mapping System (HMS) smoke polygons
Notes: Density rank 1-3 corresponds to Light, Medium, Heavy
data_center_hms_smoke_dc_day
Rows: ~13.9 million
Purpose: Long-form daily smoke exposure for each data center and observed HMS product day
Key Columns:
- master_id (TEXT) - FK to master_data_centers
- smoke_date (DATE) - FK to hms_smoke_days
- max_density_rank (SMALLINT) - Maximum smoke density covering the facility on that date
- polygon_hits (INTEGER)
Source: Spatial join of master_data_centers to hms_smoke_daily
Notes:
- Primary key: (master_id, smoke_date)
- max_density_rank = 0 indicates an observed HMS day with no smoke polygon covering the facility
rdh_precinct_vote_layers
Rows: 69
Purpose: Metadata for downloaded RDH precinct election layers
Key Columns:
- layer_id (TEXT) - Primary key
- state_code
- title
- format
- datasetid
- source_url
- filename, local_path, spatial_path
- metadata (JSONB)
- loaded_at
Source: Redistricting Data Hub precinct election datasets
Notes: Current loaded layers cover 45 distinct state codes
rdh_precinct_vote_features
Rows: 260,953
Purpose: Staged RDH precinct polygons and source attributes
Key Columns:
- feature_id (TEXT) - Primary key
- layer_id (TEXT) - FK to rdh_precinct_vote_layers
- state_code
- source_row
- properties (JSONB) - Raw RDH election attributes
- geom (GEOMETRY) - Precinct polygon geometry
Source: Redistricting Data Hub precinct election shapefiles
Notes: Source feature table for data_center_rdh_precinct_vote_matches
Base Layer Tables
_dc_census_tract_acs_2024
Rows: 85,382
Purpose: ACS 2024 demographics for all Census tracts in states with data centers
Key Columns:
- geoid (TEXT) - 11-digit tract GEOID (PRIMARY KEY)
- name (TEXT) - Tract name
- state_fips, county_fips, tract
- Full ACS 5-year estimates (85+ columns):
  - Population by age, sex, race/ethnicity
  - Households, families, housing units
  - Income, poverty, education, employment
  - Housing values, rents, costs
  - Broadband, computer access
  - Commuting, vehicles
Source: Census ACS 2024 5-year estimates API
Notes: Universe limited to the 46 states that have at least one data center (states with no data centers are excluded)
_dc_census_tract_boundaries_2024
Rows: 85,058
Purpose: TIGER 2024 tract polygons for data center states
Key Columns:
- geoid (TEXT) - 11-digit tract GEOID
- name (TEXT) - Tract name
- state_fips, county_fips, tract_code
- geom (GEOMETRY) - Polygon geometry (EPSG:4326)
- area_land_sq_m (DOUBLE PRECISION) - Land area in square meters
- area_water_sq_m (DOUBLE PRECISION) - Water area in square meters
Source: Census TIGER/Line 2024
ruca_codes_2020_tract
Rows: 85,528
Purpose: USDA Rural-Urban Commuting Area codes for metro/rural classification
Key Columns:
- geoid (TEXT) - 11-digit tract GEOID (matches Census tracts)
- ruca_code (TEXT) - Primary RUCA code (1-10)
- ruca_category (TEXT) - Simplified category:
  - Metropolitan (codes 1-3)
  - Micropolitan (codes 4-6)
  - Small town (codes 7-9)
  - Rural (code 10)
- ruca_description (TEXT) - Full RUCA code description
- population_2020 (INTEGER)
Source: USDA Economic Research Service RUCA 2020
Notes:
- Based on 2020 Census tracts and 2010-2020 commuting patterns
- 7 data centers failed RUCA join (Puerto Rico / non-US)
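The mapping from primary RUCA code to the simplified ruca_category follows directly from the code ranges listed under Key Columns. A small sketch, assuming the code is stored as text and may occasionally carry a decimal part (`ruca_category` here is a hypothetical helper, not the project's loader):

```python
def ruca_category(ruca_code):
    """Map a primary RUCA code (1-10) to the simplified category."""
    code = int(float(ruca_code))  # tolerate "1", "1.0", or 1
    if 1 <= code <= 3:
        return "Metropolitan"
    if 4 <= code <= 6:
        return "Micropolitan"
    if 7 <= code <= 9:
        return "Small town"
    if code == 10:
        return "Rural"
    return None  # codes outside 1-10 have no simplified category
```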
watershed_huc8
Rows: 2,139
Purpose: USGS HUC8 subbasin polygons for water-stress analysis
Key Columns:
- huc8 (TEXT) - 8-digit Hydrologic Unit Code (PRIMARY KEY)
- name (TEXT) - Watershed name
- geom (GEOMETRY) - Polygon geometry (EPSG:4326)
- area_sq_km (DOUBLE PRECISION)
- states (TEXT) - Comma-separated state codes
- dc_count (INTEGER) - Number of data centers in watershed
Source: USGS Watershed Boundary Dataset
Notes:
- 257 of 2,139 watersheds contain at least one data center
- Top 15 watersheds contain 50% of all US data centers
nri_census_tracts
Rows: ~84,000
Purpose: Full FEMA National Risk Index by Census tract
Key Columns:
- nri_id (TEXT) - Census tract GEOID
- state_name, county_name, tract_name
- 460+ columns including:
  - Overall risk scores and ratings
  - Expected annual loss (dollars and building value %)
  - Social vulnerability components (15 factors)
  - Community resilience score
  - Individual hazard risk scores (18 hazards)
  - Exposure, annualized frequency, historic loss ratios per hazard
Source: FEMA National Risk Index v2.1 (December 2025)
Notes:
- Very wide table (460+ columns) with comprehensive natural hazard risk data
- Join to data centers by matching nri_id to the tract geoid
- See FEMA NRI Technical Documentation
Infrastructure Tables
Energy Infrastructure
energy_eia_operating_generator_capacity_flat
Rows: 4.7 million
Purpose: EIA generator inventory with lat/lon/MW (monthly 2008-2026)
Key Columns:
- plant_id (INTEGER) - EIA plant ID
- generator_id (TEXT) - Generator unit ID
- plant_name (TEXT)
- latitude, longitude, geom
- state, county
- utility_name, operator_name
- nameplate_capacity_mw (DOUBLE PRECISION)
- technology (TEXT) - Generation technology
- energy_source_1, energy_source_2 - Primary fuel codes
- operating_month, operating_year - When unit became operational
- status (TEXT) - Operating, standby, retired, etc.
- report_month, report_year - Data snapshot date
Source: EIA Form 860 via API
Notes:
- "Flat" means denormalized for fast spatial queries
- Each generator-month is a row (4.7M rows from monthly snapshots)
- Use for proximity analysis (e.g., "all generators within 50 km of data center")
energy_eia_facility_fuel_flat
Rows: Not loaded yet
Purpose: Monthly generation by plant/fuel
Key Columns:
- plant_id, plant_name
- report_month, report_year
- energy_source (TEXT) - Fuel code
- net_generation_mwh (DOUBLE PRECISION)
- fuel_consumed_mmbtu (DOUBLE PRECISION)
Source: EIA Form 923 via API
Notes: Target table defined in ingest_eia_energy_layers.py; current database does not yet have public.energy_eia_facility_fuel_flat.
energy_eia_seds_flat
Rows: 2.57 million
Purpose: Annual state energy consumption/production (1960-2024)
Key Columns:
- state_code (TEXT)
- year (INTEGER)
- msn (TEXT) - Mnemonic series names (e.g., TETCB = total energy consumption)
- value (DOUBLE PRECISION) - Energy in trillion BTU
- unit (TEXT)
- description (TEXT) - Human-readable MSN description
Source: EIA State Energy Data System (SEDS)
Notes:
- Annual aggregates by state
- Use for state-level energy context analysis
Connectivity Infrastructure
internet_cables
Rows: 693
Purpose: Submarine cable routes
Key Columns:
- cable_id (TEXT) - Unique cable identifier
- cable_name (TEXT) - Official cable name
- geom (GEOMETRY) - LineString geometry (EPSG:4326)
- rfs_year (INTEGER) - Ready For Service year
- length_km (DOUBLE PRECISION)
- owners (TEXT[]) - Array of owner names
- landing_points (TEXT[]) - Array of landing point names
Source: TeleGeography-style cable database
Notes:
- 693 unique submarine cables
- Geometry is approximate route (not exact seabed path)
internet_cable_landing_points
Rows: 3,361
Purpose: Cable landing points (where cables come ashore)
Key Columns:
- landing_point_id (TEXT) - Unique identifier
- name (TEXT) - Landing point name
- city, country
- latitude, longitude, geom
- cables (TEXT[]) - Array of cable names landing at this point
- cable_count (INTEGER)
Source: TeleGeography-style cable database
Notes:
- Used for proximity analysis (how close are data centers to cable landings?)
- Key finding: Data centers are NOT systematically closer to cables than ordinary US cities
internet_city_dominance
Rows: 4,552
Purpose: City-level IPs/capacity (internet hub strength proxy)
Key Columns:
- city (TEXT)
- country (TEXT)
- latitude, longitude, geom
- ip_addresses (INTEGER) - Number of routable IP addresses
- capacity_rank (INTEGER) - Relative capacity ranking
Source: Internet topology datasets
Notes: Proxy for "internet hub" strength (not directly used in main analyses)
Broadband
fcc_bdc_location_provider_aggregates
Rows: Varies
Purpose: FCC BDC provider availability aggregated by county/tract
Key Columns:
- geoid (TEXT) - County or tract GEOID
- geography_level (TEXT) - county or tract
- provider_count (INTEGER)
- technology_counts (JSONB) - Count by technology type
- max_download_mbps, max_upload_mbps
Source: FCC Broadband Data Collection (BDC)
fcc_bdc_broadband_connection_table
Rows: Varies
Purpose: Per-data-center broadband provider availability
Key Columns:
- Data center identifiers
- provider_id, provider_name
- technology (TEXT)
- max_advertised_download_speed, max_advertised_upload_speed
- low_latency (BOOLEAN)
Source: FCC BDC, joined to data center locations
Notes: Built by build_fcc_bdc_broadband_connection_table.py
Other Tables
opposition_cases_geocoded
Rows: 18
Purpose: Geocoded community-opposition cases against data center builds
Key Columns:
- case_id (TEXT) - Unique identifier
- developer (TEXT) - Proposed developer/operator
- investment_billions (DOUBLE PRECISION) - Investment amount in billions
- outcome (TEXT) - Case outcome (approved, rejected, pending)
- governance_response (TEXT) - Government response
- latitude, longitude, geom
Source: Compiled from news archives
Notes: Loaded but currently unused - see research-ideas.md for proposed analyses
census_tract_huc8_link
Rows: 806
Purpose: Tract↔HUC8 spatial overlap table
Key Columns:
- geoid (TEXT) - Census tract GEOID
- huc8 (TEXT) - HUC8 watershed code
- overlap_pct (DOUBLE PRECISION) - Percentage of tract overlapping watershed
Notes: Useful for downstream tract-level water-stress joins; limited to tracts containing data centers
im3_state_projected_moderate_50
Rows: 328
Purpose: PNNL IM3 projected data center siting (moderate growth, gravity weight 0.50)
Key Columns:
- facility_id (TEXT)
- state (TEXT)
- cost_millions (DOUBLE PRECISION)
- it_mw (DOUBLE PRECISION) - IT load in megawatts
- cooling_water_demand_gal_per_day (DOUBLE PRECISION)
- latitude, longitude, geom
Source: PNNL Integrated Multisector Multiscale Modeling (IM3)
Notes: Loaded but unused - potential for forward-projection analysis
im3_projected_state_demand_summary
Rows: 31
Purpose: State-level rollup of IM3 projected facility counts, IT MW, and cooling demand
Key Columns:
- state (TEXT)
- facility_count (INTEGER)
- total_it_mw (DOUBLE PRECISION)
- total_cooling_demand_mgd (DOUBLE PRECISION) - Million gallons per day
Source: IM3 model outputs
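A state-level rollup of this shape can be sketched from facility-level rows like those in im3_state_projected_moderate_50. This is only an illustration: whether the summary table was actually derived from that table is an assumption, and `rollup_by_state` is a hypothetical helper. The only unit conversion is gallons per day to million gallons per day (divide by 1,000,000).

```python
from collections import defaultdict

GALLONS_PER_MILLION = 1_000_000


def rollup_by_state(facilities):
    """Aggregate facility dicts into per-state counts, IT MW, and cooling MGD."""
    totals = defaultdict(
        lambda: {"facility_count": 0, "total_it_mw": 0.0, "total_cooling_demand_mgd": 0.0}
    )
    for row in facilities:
        agg = totals[row["state"]]
        agg["facility_count"] += 1
        agg["total_it_mw"] += row["it_mw"]
        # Convert gallons/day to million gallons/day before summing.
        agg["total_cooling_demand_mgd"] += (
            row["cooling_water_demand_gal_per_day"] / GALLONS_PER_MILLION
        )
    return dict(totals)
```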
utility_rate_tracker_2025_2028
Rows: 374
Purpose: Utility rate-increase tracker by provider × state × service type
Key Columns:
- provider (TEXT) - Utility provider name
- state (TEXT)
- service_type (TEXT)
- effective_date (DATE)
- monthly_increase_dollars (DOUBLE PRECISION)
- percent_increase (DOUBLE PRECISION)
Source: Utility rate tracker database
Notes: Loaded but unused in demographic/energy analysis
energy_atlas_layers_catalog
Rows: ~5
Purpose: Metadata catalog of EIA layers ingested
Key Columns:
- table_name (TEXT)
- source_url (TEXT)
- import_timestamp (TIMESTAMP)
- row_count (INTEGER)
Notes: Created by ingest_eia_energy_layers.py
Legislation Tables
Populated by ingest_legiscan.py using the LegiScan API.
Covers all 50 states + DC + US Congress, sessions from 2016 through 2026.
Data licensed CC BY 4.0 — attribute LegiScan LLC.
legiscan_sessions
Rows: 646
Purpose: One row per legislative session dataset downloaded from LegiScan
Key Columns:
- session_id (INTEGER) - LegiScan session ID (PRIMARY KEY)
- state_abbr (VARCHAR) - Two-letter state code (CA, TX, US, etc.)
- state_id (INTEGER) - LegiScan numeric state ID
- year_start, year_end (INTEGER) - Session year range
- session_title (TEXT) - Full session name
- session_tag (TEXT) - Short tag (e.g., "Regular Session", "1st Special Session")
- is_special (BOOLEAN) - True for special/extraordinary sessions
- is_prior (BOOLEAN) - True for completed/sine-die sessions
- dataset_hash (VARCHAR) - MD5 of dataset ZIP; used to detect updates
- dataset_date (DATE) - Date dataset was last published by LegiScan
- dataset_size_mb (FLOAT) - Compressed ZIP size
- bill_count (INTEGER) - Number of bills loaded from this session
- imported_at (TIMESTAMPTZ) - When this session was last imported
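The update check implied by dataset_hash amounts to comparing the stored hash with the one currently published for the same session. A minimal sketch of that comparison follows; the function names are hypothetical and the real logic in ingest_legiscan.py may differ.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Hex MD5 digest of a byte string, e.g. the contents of a dataset ZIP."""
    return hashlib.md5(data).hexdigest()


def needs_reimport(stored_hash, published_hash):
    """True when no hash is stored yet or the published hash has changed."""
    if stored_hash is None:
        return True
    # Compare case-insensitively, since hex digests may differ in letter case.
    return stored_hash.lower() != published_hash.lower()
```

Sessions for which `needs_reimport` returns False can be skipped, so an update run only downloads datasets that have changed.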
legiscan_bills
Rows: ~1,313,000
Purpose: All bills from all sessions; tagged for relevance to data center research topics
Key Columns:
- bill_id (INTEGER) - LegiScan bill ID (PRIMARY KEY)
- session_id (INTEGER) - FK → legiscan_sessions
- state (VARCHAR) - Two-letter state code
- bill_number (VARCHAR) - Bill number (e.g., SB 1000, HB 233)
- bill_type (VARCHAR) - B=Bill, R=Resolution, CR=Concurrent Resolution, etc.
- title (TEXT) - Short title
- description (TEXT) - Longer description
- status (INTEGER) - Current status code (see below)
- status_date (DATE) - Date of last status change
- completed (INTEGER) - 1 if bill is in a terminal state
- body (VARCHAR) - Originating chamber (H=House, S=Senate, C=Council, etc.)
- url (TEXT) - LegiScan bill page URL
- state_link (TEXT) - Official state legislature URL
- change_hash (VARCHAR) - MD5 used to detect bill-level updates
- subjects (TEXT[]) - LegiScan subject tags (GIN indexed)
- sponsor_count (INTEGER) - Number of sponsors
- vote_count (INTEGER) - Number of recorded votes
- text_count (INTEGER) - Number of bill text versions
- is_relevant (BOOLEAN) - True if any relevance tag matched (GIN indexed)
- relevance_tags (TEXT[]) - Matched topic tags (GIN indexed)
- imported_at (TIMESTAMPTZ) - When this bill was last upserted
Status codes: 1=Introduced, 2=Engrossed, 3=Enrolled, 4=Passed, 5=Vetoed, 6=Failed, 7=Override, 8=Chaptered, 9=Referred, 12=Draft
Relevance tags (keyword-matched against title + description + subjects):
| Tag | What it captures |
|---|---|
| data_center | Data centers, hyperscale, colocation, AI campuses, HPC facilities |
| large_load | Crypto mining, large industrial loads, extraordinary load |
| ratepayer_protection | Cost shifting, cross-subsidy, rate design, affordability, rate class |
| grid_impact | Grid reliability, transmission, interconnection queue, IRP |
| tax_incentive | Tax exemptions, abatements, credits for facilities |
| energy_policy | Renewable PPAs, green tariffs, clean electricity, decarbonization |
| water_use | Cooling water, evaporative cooling, water footprint |
| siting_permitting | Zoning, conditional use permits, local control, preemption |
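Keyword tagging of this kind can be sketched as below. The keyword lists are hypothetical stand-ins for illustration (the project's actual lists are not reproduced here), and `tag_bill` is not the real tagging function. Only three of the eight tags are shown.

```python
import re

# Hypothetical keyword lists for illustration only.
KEYWORDS = {
    "data_center": ["data center", "hyperscale", "colocation"],
    "water_use": ["cooling water", "evaporative cooling", "water footprint"],
    "tax_incentive": ["tax exemption", "abatement", "tax credit"],
}

# One case-insensitive pattern per tag; each keyword must start at a word boundary.
PATTERNS = {
    tag: re.compile("|".join(r"\b" + re.escape(k) for k in words), re.IGNORECASE)
    for tag, words in KEYWORDS.items()
}


def tag_bill(title, description, subjects):
    """Match keywords against title + description + subjects for one bill."""
    text = " ".join([title or "", description or "", *(subjects or [])])
    tags = sorted(tag for tag, pattern in PATTERNS.items() if pattern.search(text))
    return {"relevance_tags": tags, "is_relevant": bool(tags)}
```

A bill is marked relevant as soon as any tag matches, which mirrors the definition of is_relevant above. Broad keywords therefore raise recall at the cost of precision.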
Notes:
- ~60,000 relevant bills out of 1.3M total (~4.6%)
- data_center tag: ~2,182 bills; ratepayer_protection: ~49,000
- GIN indexes on subjects, relevance_tags, and full-text (title || description)
- Use query_legiscan_bills.sql for pre-built research queries
- Re-run python ingest_legiscan.py --fetch --load weekly to pick up dataset updates
- Re-run python ingest_legiscan.py --tag after editing keyword lists
Commonly Used Joins
Data Center to Demographics
SELECT
dc.*,
ct.median_household_income,
ct.bachelors_or_higher_pct,
ct.broadband_pct
FROM master_data_centers dc
JOIN data_center_census_tracts_2024 ct
ON dc.id = ct.id;
Data Center to Watershed
SELECT
dc.*,
w.huc8,
w.name AS watershed_name
FROM master_data_centers dc
JOIN data_center_watershed_huc8 dw ON dc.id = dw.id
JOIN watershed_huc8 w ON dw.huc8 = w.huc8;
Data Center to Energy Infrastructure (50 km radius)
SELECT
dc.id,
dc.name,
SUM(eg.nameplate_capacity_mw) AS total_capacity_50km
FROM master_data_centers dc
JOIN energy_eia_operating_generator_capacity_flat eg
ON ST_DWithin(
dc.geom::geography,
eg.geom::geography,
50000 -- 50 km in meters
)
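WHERE eg.status = 'OP' -- Operating only
-- The table holds one row per generator-month, so keep only the latest
-- snapshot to avoid summing capacity across months
-- (assumes report_year and report_month sort chronologically).
AND (eg.report_year, eg.report_month) = (
SELECT report_year, report_month
FROM energy_eia_operating_generator_capacity_flat
ORDER BY report_year DESC, report_month DESC
LIMIT 1
)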
GROUP BY dc.id, dc.name;
Data Center to FEMA Hazard Risk
SELECT
dc.*,
nri.risk_score,
nri.wildfire_risk,
nri.drought_risk,
nri.heat_wave_risk
FROM master_data_centers dc
JOIN data_center_census_tracts_2024 ct ON dc.id = ct.id
JOIN nri_census_tracts nri ON ct.geoid = nri.nri_id;
Table Naming Conventions
- master_* - Canonical, deduplicated tables (use these for analysis)
- data_center_* - Data center-specific enrichment tables
- _dc_* - Base layers scoped to data center states (underscore prefix = private/internal)
- energy_eia_* - EIA energy data
- internet_* - Connectivity infrastructure
- fcc_bdc_* - FCC Broadband Data Collection
Indexes and Performance
All tables with a geom column have a GiST spatial index on it for fast spatial joins:
CREATE INDEX idx_tablename_geom ON tablename USING GIST(geom);
Key geoid columns are indexed for fast demographic joins:
CREATE INDEX idx_tablename_geoid ON tablename(geoid);
Maintenance Notes
Updating Data Centers
- Run load_postgis_osm_data_centers.py to refresh OSM data
- Run build_master_data_centers.py to rebuild master table
- Run enrichment scripts to update joins
Updating Demographics
- Update _dc_census_tract_acs_2024 from Census API
- Run create_data_center_census_tract_table.py --replace-final
Updating Energy Data
python3 ingest_eia_energy_layers.py --category power --update
Schema Export
To export the full schema:
pg_dump -h "$PGWEB_HOST" -p "$PGWEB_PORT" -U "$PGWEB_USER" -d data_centers --schema-only > schema.sql
To list all tables:
SELECT schemaname, tablename, pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename))
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;
Contact
For database access or questions, contact the repository owner.