dadams 176f3d1eb6 Document database table previews

2026-06-09 15:04:47 -07:00

Database Tables Documentation

Database Configuration

Database Name: data_centers
Type: PostgreSQL with PostGIS extension
Connection: Environment variables from ~/.zsh_secrets

PGWEB_HOST: Database host
PGWEB_PORT: Database port (5433)
PGWEB_USER: Database user
PGWEB_PASSWORD: Database password
PGWEB_DATABASE: Database name (data_centers)
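
A minimal sketch of how a script might read these variables, assuming they have already been exported into the environment (for example by sourcing ~/.zsh_secrets). The function name `pgweb_conn_params` is hypothetical and not part of the repository; the fallback values mirror the port and database name listed above.

```python
import os


def pgweb_conn_params(env=None):
    """Collect PostgreSQL connection settings from the PGWEB_* variables.

    Hypothetical helper: raises KeyError if host, user, or password is missing.
    """
    env = os.environ if env is None else env
    return {
        "host": env["PGWEB_HOST"],
        # Fall back to the documented port when PGWEB_PORT is not set.
        "port": int(env.get("PGWEB_PORT", "5433")),
        "user": env["PGWEB_USER"],
        "password": env["PGWEB_PASSWORD"],
        # Fall back to the documented database name.
        "dbname": env.get("PGWEB_DATABASE", "data_centers"),
    }
```

The returned keys use libpq keyword names, so they could be passed to a driver such as psycopg2 via `psycopg2.connect(**params)`, assuming that driver is installed.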

Table Organization

Tables are organized into six categories:

Core Data Center Tables - Master inventories and source data
Enrichment Tables - Data centers joined with contextual data
Environmental and Election Source Tables - Long-form climate, drought, fire/smoke, and precinct-election source layers
Base Layer Tables - Geographic and demographic reference layers
Infrastructure Tables - Energy and connectivity infrastructure
Legislation Tables - LegiScan state and federal bill data (2016-2026)

Core Data Center Tables

`master_data_centers`

Rows: 1,833
Purpose: Canonical data center inventory - deduplicated merge of curated + OSM sources

Key Columns:

id (INTEGER) - Unique identifier
name (TEXT) - Facility name
address (TEXT) - Street address
city (TEXT) - City
state (TEXT) - State code
latitude (DOUBLE PRECISION) - Latitude
longitude (DOUBLE PRECISION) - Longitude
geom (GEOMETRY) - PostGIS point geometry (EPSG:4326)
operator (TEXT) - Operator/owner
power_mw (DOUBLE PRECISION) - Power capacity in megawatts (sparse: 5.9% populated)
source (TEXT) - Data source (curated, osm, or both)
osm_id (TEXT) - OpenStreetMap ID if applicable
geocode_method (TEXT) - Geocoding provenance

Notes:

108 of 1,833 facilities have power ratings
45 facilities use city-precision fallback coordinates
Operator strings have fragmentation issues ("Meta" vs. "Meta, Inc.")
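
One possible way to group fragmented operator strings before aggregating is to strip common corporate suffixes and compare case-insensitively. This is only a sketch: the suffix list and the `normalize_operator` name are assumptions, and it will not merge genuinely different spellings such as "Meta Platforms" and "Meta".

```python
import re

# Hypothetical suffix list; extend it to match the operator strings actually present.
_SUFFIX = re.compile(r"[,\s]+(inc|llc|corp|corporation|ltd|co)\.?$", re.IGNORECASE)


def normalize_operator(name):
    """Return a comparison key for an operator string."""
    cleaned = " ".join(name.split())  # collapse repeated whitespace
    previous = None
    while previous != cleaned:  # strip stacked suffixes such as "Co., Ltd."
        previous = cleaned
        cleaned = _SUFFIX.sub("", cleaned).rstrip(" ,")
    return cleaned.casefold()
```

With this key, "Meta" and "Meta, Inc." fall into the same group, while the original operator column stays untouched for display.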

`us_dc_sample_geocoded`

Rows: 1,489
Purpose: Original curated sample with geocoding provenance (superseded by master_data_centers)

Key Columns:

name, address, city, state, zip
latitude, longitude, geom
operator, power_mw
census_lat, census_lon - Census TIGER geocode results
nominatim_lat, nominatim_lon - Nominatim fallback results
geocode_source - Which geocoder was used

`osm_data_centers`

Rows: 1,549
Purpose: Raw OpenStreetMap-derived facilities

Key Columns:

osm_id (TEXT) - OSM element ID
osm_type (TEXT) - node, way, or relation
name (TEXT) - OSM name tag
latitude, longitude, geom
tags (JSONB) - All OSM tags as JSON
operator (TEXT) - Extracted from OSM tags
city, state, country

Notes: Fetched via Overpass API with query for telecom=data_center or building=data_center

`master_data_center_spatial_clusters`

Rows: 1,831
Purpose: DBSCAN cluster assignments for master data centers

Key Columns:

All columns from master_data_centers
cluster_id (INTEGER) - Cluster assignment (-1 = noise/singleton)
cluster_size (INTEGER) - Number of facilities in cluster
cluster_label (TEXT) - Human-readable cluster name

Notes: DBSCAN parameters: eps=15 km, min_samples=2
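
To illustrate what these parameters mean, here is a small pure-Python DBSCAN sketch over (latitude, longitude) pairs using haversine distance. It is not the project's clustering code: the function names and sample coordinates are made up for illustration, and it assumes the common convention that a point counts toward its own min_samples. Under that convention, min_samples=2 means any two facilities within 15 km of each other form a cluster, and isolated facilities get -1.

```python
import math
from collections import deque

EARTH_RADIUS_KM = 6371.0088


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def dbscan(points, eps_km=15.0, min_samples=2):
    """Return one cluster label per point; -1 marks noise/singletons."""
    n = len(points)
    # Brute-force neighborhoods; each list includes the point itself.
    neighbors = [
        [j for j in range(n) if haversine_km(points[i], points[j]) <= eps_km]
        for i in range(n)
    ]
    labels = [-1] * n
    cluster_id = 0
    for i in range(n):
        if labels[i] != -1 or len(neighbors[i]) < min_samples:
            continue  # already assigned, or not a core point
        labels[i] = cluster_id
        queue = deque(neighbors[i])
        while queue:
            j = queue.popleft()
            if labels[j] != -1:
                continue
            labels[j] = cluster_id
            if len(neighbors[j]) >= min_samples:  # expand only through core points
                queue.extend(neighbors[j])
        cluster_id += 1
    return labels


# Illustrative coordinates only: two nearby points and one far away.
sample = [(39.0438, -77.4874), (39.0062, -77.4286), (33.4484, -112.0740)]
labels = dbscan(sample)
```

The brute-force neighbor search is quadratic, which is acceptable for a couple of thousand facilities but would need a spatial index for larger inputs.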

Enrichment Tables

`data_center_census_tracts_2024`

Rows: 1,815
Purpose: Per-facility demographics from containing Census tract

Key Columns:

All columns from master_data_centers
geoid (TEXT) - 11-digit Census tract GEOID
state_fips, county_fips, tract
Population: total_population, population_density_sq_mi
Age: median_age, under_18_pct, over_65_pct
Race/Ethnicity: white_nh_pct, black_nh_pct, asian_nh_pct, hispanic_pct
Economics: median_household_income, per_capita_income, poverty_rate
Education: bachelors_or_higher_pct, high_school_or_higher_pct
Housing: median_home_value, median_rent, homeownership_rate
Broadband: broadband_pct - Households with broadband subscription

Source: ACS 2024 5-year estimates

Notes:

18 of 1,833 facilities failed tract join (geocoding issues)
Data from _dc_census_tract_acs_2024 base table
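
The geoid used for the tract join is the standard 11-digit Census tract identifier: a 2-digit state FIPS code, a 3-digit county FIPS code, and a 6-digit tract code. The sketch below splits it into those parts. The helper name is hypothetical, and whether the county_fips column stores the 3-digit or the 5-digit (state + county) form is not stated here, so the 3-digit split is an assumption.

```python
def split_tract_geoid(geoid):
    """Split an 11-digit tract GEOID into state, county, and tract parts."""
    if len(geoid) != 11 or not geoid.isdigit():
        raise ValueError(f"expected an 11-digit tract GEOID, got {geoid!r}")
    return {
        "state_fips": geoid[:2],    # 2 digits
        "county_fips": geoid[2:5],  # 3 digits (assumed storage form)
        "tract": geoid[5:],         # 6 digits
    }
```

Keeping geoid as TEXT matters because state codes below 10 carry a leading zero that an integer column would drop.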

`data_center_watershed_huc8`

Rows: 1,833
Purpose: Per-facility watershed assignment

Key Columns:

All columns from master_data_centers
huc8 (TEXT) - 8-digit Hydrologic Unit Code
watershed_name (TEXT) - Watershed name
watershed_area_sq_km (DOUBLE PRECISION)
states (TEXT) - States intersecting watershed

Source: USGS Watershed Boundary Dataset

Notes: 257 unique HUC8 watersheds contain at least one data center

`data_center_nri_exposure`

Rows: 1,833
Purpose: Per-facility FEMA National Risk Index hazard exposure scores

Key Columns:

All columns from master_data_centers
nri_id (TEXT) - Census tract GEOID (matches geoid from demographics)
risk_score (DOUBLE PRECISION) - Overall NRI risk score
social_vulnerability (DOUBLE PRECISION) - Social vulnerability index
Hazard-specific risk scores (18 hazards):
- avalanche_risk, coastal_flooding_risk, cold_wave_risk
- drought_risk, earthquake_risk, hail_risk
- heat_wave_risk, hurricane_risk, ice_storm_risk
- landslide_risk, lightning_risk, riverine_flooding_risk
- strong_wind_risk, tornado_risk, tsunami_risk
- volcanic_activity_risk, wildfire_risk, winter_weather_risk

Source: FEMA National Risk Index (December 2025 release)

`data_center_historical_climate`

Rows: 1,833
Purpose: One-row-per-facility historical climate summary for data center locations

Key Columns:

master_id (TEXT) - FK to master_data_centers
source, name, operator, city, state, country
latitude, longitude, geom
daymet_dataset_version, gridmet_dataset_version
climate_period_start, climate_period_end - Current period: 1991-01-01 to 2020-12-31
Temperature: mean_annual_temperature_c, mean_summer_temperature_c, max_daily_temperature_c, min_daily_temperature_c
Humidity / wet bulb: mean_relative_humidity_pct, mean_wet_bulb_temperature_c, max_wet_bulb_temperature_c, extreme_wet_bulb_days
Cooling / heat: cooling_degree_days_c, annual_cooling_degree_days_c_mean, extreme_heat_days, annual_extreme_heat_days_mean
Precipitation: precipitation_total_mm, annual_precipitation_mm_mean, annual_precipitation_cv, wet_day_precipitation_p95_mm
Wind: mean_wind_speed_ms, max_daily_mean_wind_speed_ms, sustained_wind_days, annual_sustained_wind_days_mean

Source: Daymet + gridMET historical climate data

Notes: Built by historical_climate_data_centers.ipynb / open_meteo_historical_data_centers.ipynb

`data_center_usdm_drought_exposure`

Rows: 1,833
Purpose: Per-facility drought exposure summary from weekly U.S. Drought Monitor polygons

Key Columns:

master_id (TEXT) - FK to master_data_centers
source, name, operator, city, state, country
latitude, longitude, geom
usdm_status - covered or no_coverage
drought_period_start, drought_period_end - Current period: 2000-01-04 to 2025-12-30
weeks_observed
weeks_in_d0_or_worse, weeks_in_d1_or_worse, weeks_in_d2_or_worse, weeks_in_d3_or_worse, weeks_in_d4
pct_weeks_in_d0_or_worse, pct_weeks_in_d1_or_worse, pct_weeks_in_d2_or_worse, pct_weeks_in_d3_or_worse, pct_weeks_in_d4
worst_dm_category, mean_dm_category
longest_d0_streak_weeks, longest_d2_streak_weeks, longest_d3_streak_weeks

Source: U.S. Drought Monitor weekly spatial data

Notes:

Summary table is rolled up from data_center_usdm_drought_dc_week
dm_category scale: D0-D4, stored as 0-4
1,830 facilities have covered status; 3 have no coverage
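
The rollup from weekly rows to this summary can be sketched as follows for a single facility. This is an illustration under stated assumptions, not the project's build script: the input is a list of worst_dm values already ordered by week_date with no missing weeks, -1 means an observed week with no drought polygon, and the pct_* fields are expressed on a 0-100 scale.

```python
def longest_streak(values, threshold):
    """Length of the longest run of consecutive values >= threshold."""
    best = run = 0
    for value in values:
        run = run + 1 if value >= threshold else 0
        best = max(best, run)
    return best


def summarize_drought(weekly_worst_dm):
    """Roll up one facility's ordered weekly worst_dm values (-1 to 4)."""
    weeks = len(weekly_worst_dm)
    summary = {"weeks_observed": weeks}
    for level in range(5):
        count = sum(1 for value in weekly_worst_dm if value >= level)
        suffix = f"d{level}_or_worse" if level < 4 else "d4"
        summary[f"weeks_in_{suffix}"] = count
        # Percent scale (0-100) is an assumption about the stored columns.
        summary[f"pct_weeks_in_{suffix}"] = 100.0 * count / weeks if weeks else None
    summary["worst_dm_category"] = max(weekly_worst_dm, default=None)
    for level in (0, 2, 3):
        summary[f"longest_d{level}_streak_weeks"] = longest_streak(weekly_worst_dm, level)
    return summary
```

How mean_dm_category treats the -1 weeks is not documented here, so it is left out of the sketch.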

`data_center_hms_smoke_exposure`

Rows: 1,833
Purpose: Per-facility wildfire-smoke exposure summary from NOAA HMS smoke polygons

Key Columns:

master_id (TEXT) - FK to master_data_centers
source, name, operator, city, state, country
latitude, longitude, geom
hms_status
smoke_period_start, smoke_period_end - Current period: 2005-08-05 to 2026-05-22
days_observed
days_with_any_smoke, days_with_light_or_worse, days_with_medium_or_worse, days_with_heavy_smoke
pct_days_with_any_smoke, pct_days_with_light_or_worse, pct_days_with_medium_or_worse, pct_days_with_heavy_smoke
worst_density_rank, worst_density, mean_density_rank
longest_any_smoke_streak_days, longest_medium_or_heavy_streak_days, longest_heavy_smoke_streak_days

Source: NOAA Hazard Mapping System (HMS) smoke polygons

Notes:

Summary table is rolled up from data_center_hms_smoke_dc_day
Density rank: 0 = observed no smoke, 1 = Light, 2 = Medium, 3 = Heavy
HMS product path uses NOAA's /FIRE/web/HMS/Smoke_Polygons/ archive

`data_center_election_context`

Rows: 1,833
Purpose: Standardized one-row-per-facility election context derived from RDH precinct matches

Key Columns:

master_id (TEXT) - FK to master_data_centers
name, city, state
rdh_layer_title
precinct_identifier_name
election_year, office
democratic_votes, republican_votes, total_votes
turnout_or_vote_share
updated_at

Source: Redistricting Data Hub precinct election shapefiles

Notes:

Built from data_center_rdh_precinct_vote_matches plus RDH feature properties
Current rows cover 2020-2024 election layers; 1,829 facilities have non-null election year context

`data_center_rdh_precinct_vote_matches`

Rows: 3,330
Purpose: Spatial join bridge between data centers and RDH precinct vote features

Key Columns:

master_id (TEXT) - FK to master_data_centers
feature_id (TEXT) - FK to rdh_precinct_vote_features
layer_id (TEXT) - FK to rdh_precinct_vote_layers
state_code
join_method
match_distance_m
matched_at

Source: Redistricting Data Hub precinct shapefiles

Notes: Spatial join to voting precincts (point-in-polygon, with nearest/fallback logic where needed)

Environmental and Election Source Tables

`usdm_drought_weekly`

Rows: 12,080
Purpose: Raw weekly U.S. Drought Monitor polygons by drought category

Key Columns:

id (BIGINT) - Primary key
week_date (DATE)
dm_category (SMALLINT) - Drought Monitor category D0-D4 stored as 0-4
objectid, shape_leng, shape_area
geom (GEOMETRY) - Drought polygon geometry

Source: U.S. Drought Monitor spatial archive

Notes: Source table for data_center_usdm_drought_dc_week

`data_center_usdm_drought_dc_week`

Rows: ~2.48 million
Purpose: Long-form weekly drought exposure for each covered data center

Key Columns:

master_id (TEXT) - FK to master_data_centers
week_date (DATE)
worst_dm (SMALLINT) - Worst drought category covering the facility that week

Source: Spatial join of master_data_centers to usdm_drought_weekly

Notes:

Primary key: (master_id, week_date)
worst_dm = -1 indicates an observed week with no drought polygon covering the facility

`hms_smoke_days`

Rows: 7,075
Purpose: One row per observed NOAA HMS smoke product day, including zero-polygon days

Key Columns:

smoke_date (DATE) - Primary key
source, source_file, source_url
feature_count (INTEGER) - Number of smoke polygons for the day
fetched_at, updated_at

Source: NOAA HMS smoke polygon archive

Notes: Denominator table for daily smoke-exposure percentages

`hms_smoke_daily`

Rows: 536,286
Purpose: Raw daily NOAA HMS smoke polygons with density categories

Key Columns:

id (BIGINT) - Primary key
smoke_date (DATE) - FK to hms_smoke_days
satellite
start_raw, end_raw, start_utc, end_utc
density, density_rank
source, source_file, source_url
geom (GEOMETRY) - Smoke polygon geometry

Source: NOAA Hazard Mapping System (HMS) smoke polygons

Notes: Density rank 1-3 corresponds to Light, Medium, Heavy

`data_center_hms_smoke_dc_day`

Rows: ~13.9 million
Purpose: Long-form daily smoke exposure for each data center and observed HMS product day

Key Columns:

master_id (TEXT) - FK to master_data_centers
smoke_date (DATE) - FK to hms_smoke_days
max_density_rank (SMALLINT) - Maximum smoke density covering the facility on that date
polygon_hits (INTEGER)

Source: Spatial join of master_data_centers to hms_smoke_daily

Notes:

Primary key: (master_id, smoke_date)
max_density_rank = 0 indicates an observed HMS day with no smoke polygon covering the facility
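
As a rough illustration of how one row of this table relates to the polygons covering a facility on a given day, the sketch below derives max_density_rank and polygon_hits from the density labels of the intersecting polygons. The function name is hypothetical, and the exact label strings stored in the density column ("Light", "Medium", "Heavy") are assumed from the rank description.

```python
# Rank mapping as documented: 1 = Light, 2 = Medium, 3 = Heavy.
DENSITY_RANK = {"Light": 1, "Medium": 2, "Heavy": 3}


def smoke_day_row(polygon_densities):
    """Summarize the smoke polygons covering one facility on one observed day."""
    ranks = [DENSITY_RANK[density] for density in polygon_densities]
    return {
        # 0 means the day was observed but no polygon covered the facility.
        "max_density_rank": max(ranks, default=0),
        "polygon_hits": len(ranks),
    }
```

Because zero-hit days are stored explicitly, the number of rows per facility can serve directly as the denominator for the pct_days_* columns.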

`rdh_precinct_vote_layers`

Rows: 69
Purpose: Metadata for downloaded RDH precinct election layers

Key Columns:

layer_id (TEXT) - Primary key
state_code
title
format
datasetid
source_url
filename, local_path, spatial_path
metadata (JSONB)
loaded_at

Source: Redistricting Data Hub precinct election datasets

Notes: Current loaded layers cover 45 distinct state codes

`rdh_precinct_vote_features`

Rows: 260,953
Purpose: Staged RDH precinct polygons and source attributes

Key Columns:

feature_id (TEXT) - Primary key
layer_id (TEXT) - FK to rdh_precinct_vote_layers
state_code
source_row
properties (JSONB) - Raw RDH election attributes
geom (GEOMETRY) - Precinct polygon geometry

Source: Redistricting Data Hub precinct election shapefiles

Notes: Source feature table for data_center_rdh_precinct_vote_matches

Base Layer Tables

`_dc_census_tract_acs_2024`

Rows: 85,382
Purpose: ACS 2024 demographics for all Census tracts in states with data centers

Key Columns:

geoid (TEXT) - 11-digit tract GEOID (PRIMARY KEY)
name (TEXT) - Tract name
state_fips, county_fips, tract
Full ACS 5-year estimates (85+ columns):
- Population by age, sex, race/ethnicity
- Households, families, housing units
- Income, poverty, education, employment
- Housing values, rents, costs
- Broadband, computer access
- Commuting, vehicles

Source: Census ACS 2024 5-year estimates API

Notes: Universe limited to the 46 states that have at least one data center (states with no data centers are excluded)

`_dc_census_tract_boundaries_2024`

Rows: 85,058
Purpose: TIGER 2024 tract polygons for data center states

Key Columns:

geoid (TEXT) - 11-digit tract GEOID
name (TEXT) - Tract name
state_fips, county_fips, tract_code
geom (GEOMETRY) - Polygon geometry (EPSG:4326)
area_land_sq_m (DOUBLE PRECISION) - Land area in square meters
area_water_sq_m (DOUBLE PRECISION) - Water area in square meters

Source: Census TIGER/Line 2024

`ruca_codes_2020_tract`

Rows: 85,528
Purpose: USDA Rural-Urban Commuting Area codes for metro/rural classification

Key Columns:

geoid (TEXT) - 11-digit tract GEOID (matches Census tracts)
ruca_code (TEXT) - Primary RUCA code (1-10)
ruca_category (TEXT) - Simplified category:
- Metropolitan (codes 1-3)
- Micropolitan (codes 4-6)
- Small town (codes 7-9)
- Rural (code 10)
ruca_description (TEXT) - Full RUCA code description
population_2020 (INTEGER)

Source: USDA Economic Research Service RUCA 2020

Notes:

Based on 2020 Census tracts and 2010-2020 commuting patterns
7 data centers failed RUCA join (Puerto Rico / non-US)
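
The ruca_category grouping listed above can be expressed as a small lookup. This is a sketch rather than the loader's actual code; the function name is hypothetical, and values outside 1-10 return None because the mapping above does not cover them.

```python
def ruca_category(ruca_code):
    """Map a primary RUCA code (stored as TEXT) to its simplified category."""
    code = int(ruca_code)
    if 1 <= code <= 3:
        return "Metropolitan"
    if 4 <= code <= 6:
        return "Micropolitan"
    if 7 <= code <= 9:
        return "Small town"
    if code == 10:
        return "Rural"
    return None  # not covered by the documented mapping
```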

`watershed_huc8`

Rows: 2,139
Purpose: USGS HUC8 subbasin polygons for water-stress analysis

Key Columns:

huc8 (TEXT) - 8-digit Hydrologic Unit Code (PRIMARY KEY)
name (TEXT) - Watershed name
geom (GEOMETRY) - Polygon geometry (EPSG:4326)
area_sq_km (DOUBLE PRECISION)
states (TEXT) - Comma-separated state codes
dc_count (INTEGER) - Number of data centers in watershed

Source: USGS Watershed Boundary Dataset

Notes:

257 of 2,139 watersheds contain at least one data center
Top 15 watersheds contain 50% of all US data centers

`nri_census_tracts`

Rows: ~84,000
Purpose: Full FEMA National Risk Index by Census tract

Key Columns:

nri_id (TEXT) - Census tract GEOID
state_name, county_name, tract_name
460+ columns including:
- Overall risk scores and ratings
- Expected annual loss (dollars and building value %)
- Social vulnerability components (15 factors)
- Community resilience score
- Individual hazard risk scores (18 hazards)
- Exposure, annualized frequency, historic loss ratios per hazard

Source: FEMA National Risk Index v2.1 (December 2025)

Notes:

Massive table with comprehensive natural hazard risk data
Join to data centers by matching nri_id to the tract geoid (for example, data_center_census_tracts_2024.geoid)
See FEMA NRI Technical Documentation

Infrastructure Tables

Energy Infrastructure

`energy_eia_operating_generator_capacity_flat`

Rows: 4.7 million
Purpose: EIA generator inventory with lat/lon/MW (monthly 2008-2026)

Key Columns:

plant_id (INTEGER) - EIA plant ID
generator_id (TEXT) - Generator unit ID
plant_name (TEXT)
latitude, longitude, geom
state, county
utility_name, operator_name
nameplate_capacity_mw (DOUBLE PRECISION)
technology (TEXT) - Generation technology
energy_source_1, energy_source_2 - Primary fuel codes
operating_month, operating_year - When unit became operational
status (TEXT) - Operating, standby, retired, etc.
report_month, report_year - Data snapshot date

Source: EIA Form 860 via API

Notes:

"Flat" means denormalized for fast spatial queries
Each generator-month is a row (4.7M rows from monthly snapshots)
Use for proximity analysis (e.g., "all generators within 50 km of data center")
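
Because each generator appears once per monthly snapshot, a proximity sum should be restricted to a single snapshot, otherwise the same unit is counted many times. The sketch below shows the idea in plain Python on in-memory rows. The function names and sample values are illustrative, report_month is assumed to be numeric, and 'OP' is assumed to be the status code for operating units.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    h = math.sin((p2 - p1) / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * 6371.0088 * math.asin(math.sqrt(h))


def capacity_within_km(dc, generator_rows, radius_km=50.0):
    """Sum operating nameplate MW near a data center, using only the latest snapshot."""
    if not generator_rows:
        return 0.0
    latest = max((r["report_year"], r["report_month"]) for r in generator_rows)
    total = 0.0
    for r in generator_rows:
        if (r["report_year"], r["report_month"]) != latest:
            continue  # skip older snapshots of the same generators
        if r["status"] != "OP":
            continue  # operating units only
        distance = haversine_km(dc["latitude"], dc["longitude"], r["latitude"], r["longitude"])
        if distance <= radius_km:
            total += r["nameplate_capacity_mw"]
    return total
```

In the database the same filter would be a condition on report_year and report_month, with ST_DWithin doing the distance test.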

`energy_eia_facility_fuel_flat`

Rows: Not loaded yet
Purpose: Monthly generation by plant/fuel

Key Columns:

plant_id, plant_name
report_month, report_year
energy_source (TEXT) - Fuel code
net_generation_mwh (DOUBLE PRECISION)
fuel_consumed_mmbtu (DOUBLE PRECISION)

Source: EIA Form 923 via API

Notes: Target table defined in ingest_eia_energy_layers.py; current database does not yet have public.energy_eia_facility_fuel_flat.

`energy_eia_seds_flat`

Rows: 2.57 million
Purpose: Annual state energy consumption/production (1960-2024)

Key Columns:

state_code (TEXT)
year (INTEGER)
msn (TEXT) - Mnemonic series names (e.g., TETCB = total energy consumption)
value (DOUBLE PRECISION) - Energy in trillion BTU
unit (TEXT)
description (TEXT) - Human-readable MSN description

Source: EIA State Energy Data System (SEDS)

Notes:

Annual aggregates by state
Use for state-level energy context analysis

Connectivity Infrastructure

`internet_cables`

Rows: 693
Purpose: Submarine cable routes

Key Columns:

cable_id (TEXT) - Unique cable identifier
cable_name (TEXT) - Official cable name
geom (GEOMETRY) - LineString geometry (EPSG:4326)
rfs_year (INTEGER) - Ready For Service year
length_km (DOUBLE PRECISION)
owners (TEXT[]) - Array of owner names
landing_points (TEXT[]) - Array of landing point names

Source: TeleGeography-style cable database

Notes:

693 unique submarine cables
Geometry is approximate route (not exact seabed path)

`internet_cable_landing_points`

Rows: 3,361
Purpose: Cable landing points (where cables come ashore)

Key Columns:

landing_point_id (TEXT) - Unique identifier
name (TEXT) - Landing point name
city, country
latitude, longitude, geom
cables (TEXT[]) - Array of cable names landing at this point
cable_count (INTEGER)

Source: TeleGeography-style cable database

Notes:

Used for proximity analysis (how close are data centers to cable landings?)
Key finding: Data centers are NOT systematically closer to cables than ordinary US cities

`internet_city_dominance`

Rows: 4,552
Purpose: City-level IPs/capacity (internet hub strength proxy)

Key Columns:

city (TEXT)
country (TEXT)
latitude, longitude, geom
ip_addresses (INTEGER) - Number of routable IP addresses
capacity_rank (INTEGER) - Relative capacity ranking

Source: Internet topology datasets

Notes: Proxy for "internet hub" strength (not directly used in main analyses)

Broadband

`fcc_bdc_location_provider_aggregates`

Rows: Varies
Purpose: FCC BDC provider availability aggregated by county/tract

Key Columns:

geoid (TEXT) - County or tract GEOID
geography_level (TEXT) - county or tract
provider_count (INTEGER)
technology_counts (JSONB) - Count by technology type
max_download_mbps, max_upload_mbps

Source: FCC Broadband Data Collection (BDC)

`fcc_bdc_broadband_connection_table`

Rows: Varies
Purpose: Per-data-center broadband provider availability

Key Columns:

Data center identifiers
provider_id, provider_name
technology (TEXT)
max_advertised_download_speed, max_advertised_upload_speed
low_latency (BOOLEAN)

Source: FCC BDC, joined to data center locations

Notes: Built by build_fcc_bdc_broadband_connection_table.py

Other Tables

`opposition_cases_geocoded`

Rows: 18
Purpose: Geocoded community-opposition cases against data center builds

Key Columns:

case_id (TEXT) - Unique identifier
developer (TEXT) - Proposed developer/operator
investment_billions (DOUBLE PRECISION) - Investment amount in billions
outcome (TEXT) - Case outcome (approved, rejected, pending)
governance_response (TEXT) - Government response
latitude, longitude, geom

Source: Compiled from news archives

Notes: Loaded but currently unused - see research-ideas.md for proposed analyses

`census_tract_huc8_link`

Rows: 806
Purpose: Tract↔HUC8 spatial overlap table

Key Columns:

geoid (TEXT) - Census tract GEOID
huc8 (TEXT) - HUC8 watershed code
overlap_pct (DOUBLE PRECISION) - Percentage of tract overlapping watershed

Notes: Useful for downstream tract-level water-stress joins; limited to tracts containing data centers

`im3_state_projected_moderate_50`

Rows: 328
Purpose: PNNL IM3 projected data center siting (moderate growth, gravity weight 0.50)

Key Columns:

facility_id (TEXT)
state (TEXT)
cost_millions (DOUBLE PRECISION)
it_mw (DOUBLE PRECISION) - IT load in megawatts
cooling_water_demand_gal_per_day (DOUBLE PRECISION)
latitude, longitude, geom

Source: PNNL Integrated Multisector Multiscale Modeling (IM3)

Notes: Loaded but unused - potential for forward-projection analysis

`im3_projected_state_demand_summary`

Rows: 31
Purpose: State-level rollup of IM3 projected facility counts, IT MW, and cooling demand

Key Columns:

state (TEXT)
facility_count (INTEGER)
total_it_mw (DOUBLE PRECISION)
total_cooling_demand_mgd (DOUBLE PRECISION) - Million gallons per day

Source: IM3 model outputs
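
A state-level rollup of this shape could be derived from facility rows like those in im3_state_projected_moderate_50, converting gallons per day to million gallons per day (MGD). Whether the summary table was actually built from that table is an assumption, and the function name is hypothetical.

```python
from collections import defaultdict

GALLONS_PER_MILLION_GALLONS = 1_000_000


def summarize_by_state(facilities):
    """Aggregate projected facilities into per-state counts, IT MW, and cooling MGD."""
    totals = defaultdict(
        lambda: {"facility_count": 0, "total_it_mw": 0.0, "total_cooling_demand_mgd": 0.0}
    )
    for facility in facilities:
        row = totals[facility["state"]]
        row["facility_count"] += 1
        row["total_it_mw"] += facility["it_mw"]
        # gal/day -> million gal/day
        row["total_cooling_demand_mgd"] += (
            facility["cooling_water_demand_gal_per_day"] / GALLONS_PER_MILLION_GALLONS
        )
    return dict(totals)
```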

`utility_rate_tracker_2025_2028`

Rows: 374
Purpose: Utility rate-increase tracker by provider × state × service type

Key Columns:

provider (TEXT) - Utility provider name
state (TEXT)
service_type (TEXT)
effective_date (DATE)
monthly_increase_dollars (DOUBLE PRECISION)
percent_increase (DOUBLE PRECISION)

Source: Utility rate tracker database

Notes: Loaded but unused in demographic/energy analysis

`energy_atlas_layers_catalog`

Rows: ~5
Purpose: Metadata catalog of EIA layers ingested

Key Columns:

table_name (TEXT)
source_url (TEXT)
import_timestamp (TIMESTAMP)
row_count (INTEGER)

Notes: Created by ingest_eia_energy_layers.py

Legislation Tables

Populated by ingest_legiscan.py using the LegiScan API.
Covers all 50 states + DC + US Congress, sessions from 2016 through 2026.
Data licensed CC BY 4.0 — attribute LegiScan LLC.

`legiscan_sessions`

Rows: 646
Purpose: One row per legislative session dataset downloaded from LegiScan

Key Columns:

session_id (INTEGER) - LegiScan session ID (PRIMARY KEY)
state_abbr (VARCHAR) - Two-letter state code (CA, TX, US, etc.)
state_id (INTEGER) - LegiScan numeric state ID
year_start, year_end (INTEGER) - Session year range
session_title (TEXT) - Full session name
session_tag (TEXT) - Short tag (e.g., "Regular Session", "1st Special Session")
is_special (BOOLEAN) - True for special/extraordinary sessions
is_prior (BOOLEAN) - True for completed/sine-die sessions
dataset_hash (VARCHAR) - MD5 of dataset ZIP; used to detect updates
dataset_date (DATE) - Date dataset was last published by LegiScan
dataset_size_mb (FLOAT) - Compressed ZIP size
bill_count (INTEGER) - Number of bills loaded from this session
imported_at (TIMESTAMPTZ) - When this session was last imported
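
The update check that dataset_hash enables can be sketched as comparing the stored hash with the MD5 of a newly downloaded ZIP. This is an assumed workflow for illustration, not a description of ingest_legiscan.py, and the function names are hypothetical.

```python
import hashlib


def md5_hex(data):
    """Return the hexadecimal MD5 digest of a bytes object."""
    return hashlib.md5(data).hexdigest()


def needs_reimport(stored_hash, new_zip_bytes):
    """True when a session has never been loaded or its dataset ZIP has changed."""
    return stored_hash is None or stored_hash.lower() != md5_hex(new_zip_bytes)
```

MD5 is used here only as a change detector, not for security.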

`legiscan_bills`

Rows: ~1,313,000
Purpose: All bills from all sessions; tagged for relevance to data center research topics

Key Columns:

bill_id (INTEGER) - LegiScan bill ID (PRIMARY KEY)
session_id (INTEGER) - FK → legiscan_sessions
state (VARCHAR) - Two-letter state code
bill_number (VARCHAR) - Bill number (e.g., SB 1000, HB 233)
bill_type (VARCHAR) - B=Bill, R=Resolution, CR=Concurrent Resolution, etc.
title (TEXT) - Short title
description (TEXT) - Longer description
status (INTEGER) - Current status code (see below)
status_date (DATE) - Date of last status change
completed (INTEGER) - 1 if bill is in a terminal state
body (VARCHAR) - Originating chamber (H=House, S=Senate, C=Council, etc.)
url (TEXT) - LegiScan bill page URL
state_link (TEXT) - Official state legislature URL
change_hash (VARCHAR) - MD5 used to detect bill-level updates
subjects (TEXT[]) - LegiScan subject tags (GIN indexed)
sponsor_count (INTEGER) - Number of sponsors
vote_count (INTEGER) - Number of recorded votes
text_count (INTEGER) - Number of bill text versions
is_relevant (BOOLEAN) - True if any relevance tag matched (GIN indexed)
relevance_tags (TEXT[]) - Matched topic tags (GIN indexed)
imported_at (TIMESTAMPTZ) - When this bill was last upserted

Status codes: 1=Introduced, 2=Engrossed, 3=Enrolled, 4=Passed, 5=Vetoed, 6=Failed, 7=Override, 8=Chaptered, 9=Referred, 12=Draft
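
For analysis code that needs readable labels, the status codes listed above can be kept in a small lookup. The names `LEGISCAN_STATUS` and `status_label` are hypothetical; codes not listed above fall back to an "Unknown" label rather than being guessed.

```python
# Status codes exactly as listed in this document.
LEGISCAN_STATUS = {
    1: "Introduced",
    2: "Engrossed",
    3: "Enrolled",
    4: "Passed",
    5: "Vetoed",
    6: "Failed",
    7: "Override",
    8: "Chaptered",
    9: "Referred",
    12: "Draft",
}


def status_label(code):
    """Return the label for a legiscan_bills.status value."""
    return LEGISCAN_STATUS.get(code, f"Unknown ({code})")
```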

Relevance tags (keyword-matched against title + description + subjects):

| Tag | What it captures |
|---|---|
| `data_center` | Data centers, hyperscale, colocation, AI campuses, HPC facilities |
| `large_load` | Crypto mining, large industrial loads, extraordinary load |
| `ratepayer_protection` | Cost shifting, cross-subsidy, rate design, affordability, rate class |
| `grid_impact` | Grid reliability, transmission, interconnection queue, IRP |
| `tax_incentive` | Tax exemptions, abatements, credits for facilities |
| `energy_policy` | Renewable PPAs, green tariffs, clean electricity, decarbonization |
| `water_use` | Cooling water, evaporative cooling, water footprint |
| `siting_permitting` | Zoning, conditional use permits, local control, preemption |
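
Keyword-based tagging of this kind can be sketched as below. The keyword lists are a small hypothetical subset chosen for illustration, not the lists used by ingest_legiscan.py, and the `tag_bill` function is likewise an assumption about the general approach.

```python
import re

# Hypothetical keyword subset; the real lists live in the ingest script.
KEYWORDS = {
    "data_center": ["data center", "hyperscale", "colocation"],
    "large_load": ["crypto mining", "large load"],
    "water_use": ["cooling water", "evaporative cooling", "water footprint"],
}


def tag_bill(title, description, subjects):
    """Match keywords against title + description + subjects, case-insensitively."""
    text = " ".join([title or "", description or "", *subjects]).lower()
    tags = sorted(
        tag
        for tag, words in KEYWORDS.items()
        # Word boundaries avoid matching inside longer words.
        if any(re.search(r"\b" + re.escape(word) + r"\b", text) for word in words)
    )
    return {"relevance_tags": tags, "is_relevant": bool(tags)}
```

Broad keyword lists tend to produce large tag counts, so a tag with many matches is best spot-checked before it is used in analysis.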

Notes:

~60,000 relevant bills out of 1.3M total (~4.6%)
data_center tag: ~2,182 bills; ratepayer_protection: ~49,000
GIN indexes on subjects, relevance_tags, and full-text (title || description)
Use query_legiscan_bills.sql for pre-built research queries
Re-run python ingest_legiscan.py --fetch --load weekly to pick up dataset updates
Re-run python ingest_legiscan.py --tag after editing keyword lists

Commonly Used Joins

Data Center to Demographics

SELECT 
    dc.*,
    ct.median_household_income,
    ct.bachelors_or_higher_pct,
    ct.broadband_pct
FROM master_data_centers dc
JOIN data_center_census_tracts_2024 ct 
    ON dc.id = ct.id;

Data Center to Watershed

SELECT
    dc.*,
    w.huc8,
    w.name AS watershed_name  -- watershed_huc8 stores the name in its "name" column
FROM master_data_centers dc
JOIN data_center_watershed_huc8 dw ON dc.id = dw.id
JOIN watershed_huc8 w ON dw.huc8 = w.huc8;

Data Center to Energy Infrastructure (50 km radius)

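SELECT
    dc.id,
    dc.name,
    SUM(eg.nameplate_capacity_mw) AS total_capacity_50km
FROM master_data_centers dc
JOIN energy_eia_operating_generator_capacity_flat eg
    ON ST_DWithin(
        dc.geom::geography,
        eg.geom::geography,
        50000  -- 50 km in meters
    )
WHERE eg.status = 'OP'  -- Operating only
  -- Each generator-month is a row, so restrict to one snapshot; otherwise the
  -- same generator is summed once per month. The latest snapshot is used here
  -- (assumes report_year and report_month are numeric).
  AND (eg.report_year, eg.report_month) = (
        SELECT report_year, report_month
        FROM energy_eia_operating_generator_capacity_flat
        ORDER BY report_year DESC, report_month DESC
        LIMIT 1
    )
GROUP BY dc.id, dc.name;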

Data Center to FEMA Hazard Risk

SELECT 
    dc.*,
    nri.risk_score,
    nri.wildfire_risk,
    nri.drought_risk,
    nri.heat_wave_risk
FROM master_data_centers dc
JOIN data_center_census_tracts_2024 ct ON dc.id = ct.id
JOIN nri_census_tracts nri ON ct.geoid = nri.nri_id;

Table Naming Conventions

master_* - Canonical, deduplicated tables (use these for analysis)
data_center_* - Data center-specific enrichment tables
_dc_* - Base layers scoped to data center states (underscore prefix = private/internal)
energy_eia_* - EIA energy data
internet_* - Connectivity infrastructure
fcc_bdc_* - FCC Broadband Data Collection

Indexes and Performance

Tables with a geom column have a GiST spatial index on it for fast spatial joins:

CREATE INDEX idx_tablename_geom ON tablename USING GIST(geom);

Key geoid columns are indexed for fast demographic joins:

CREATE INDEX idx_tablename_geoid ON tablename(geoid);

Maintenance Notes

Updating Data Centers

Run load_postgis_osm_data_centers.py to refresh OSM data
Run build_master_data_centers.py to rebuild master table
Run enrichment scripts to update joins

Updating Demographics

Update _dc_census_tract_acs_2024 from Census API
Run create_data_center_census_tract_table.py --replace-final

Updating Energy Data

python3 ingest_eia_energy_layers.py --category power --update

Schema Export

To export the full schema:

pg_dump -h "$PGWEB_HOST" -p "$PGWEB_PORT" -U "$PGWEB_USER" -d data_centers --schema-only > schema.sql

To list all tables:

SELECT schemaname, tablename, pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename))
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;

Contact

For database access or questions, contact the repository owner.
