# Database Tables Documentation

## Database Configuration

**Database Name**: `data_centers`
**Type**: PostgreSQL with PostGIS extension
**Connection**: Environment variables from `~/.zsh_secrets`
- `PGWEB_HOST`: Database host
- `PGWEB_PORT`: Database port (typically 5432)
- `PGWEB_USER`: Database user
- `PGWEB_PASSWORD`: Database password
- `PGWEB_DATABASE`: Database name (`data_centers`)

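As a minimal sketch of how a script might consume these variables, the helper below assembles a libpq-style connection URI. The function name `build_dsn` and the fallback defaults are illustrative assumptions, not part of the repository; it also assumes the variables have already been exported into the process environment (for example by sourcing `~/.zsh_secrets`).

```python
import os
from urllib.parse import quote


def build_dsn(env=None):
    """Build a PostgreSQL connection URI from the PGWEB_* variables."""
    env = os.environ if env is None else env
    host = env["PGWEB_HOST"]
    port = env.get("PGWEB_PORT", "5432")  # assumption: fall back to the typical port
    user = quote(env["PGWEB_USER"], safe="")
    # Percent-encode the password so characters like '@' or ':' do not break the URI.
    password = quote(env["PGWEB_PASSWORD"], safe="")
    database = env.get("PGWEB_DATABASE", "data_centers")
    return f"postgresql://{user}:{password}@{host}:{port}/{database}"
```

The resulting string can be handed to any driver or tool that accepts a PostgreSQL connection URI.
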
## Table Organization

Tables are organized into four categories:
1. **Core Data Center Tables** - Master inventories and source data
2. **Enrichment Tables** - Data centers joined with contextual data
3. **Base Layer Tables** - Geographic and demographic reference layers
4. **Infrastructure Tables** - Energy and connectivity infrastructure

---

## Core Data Center Tables

### `master_data_centers`
**Rows**: 1,833
**Purpose**: Canonical data center inventory - deduplicated merge of curated + OSM sources

**Key Columns**:
- `id` (INTEGER) - Unique identifier
- `name` (TEXT) - Facility name
- `address` (TEXT) - Street address
- `city` (TEXT) - City
- `state` (TEXT) - State code
- `latitude` (DOUBLE PRECISION) - Latitude
- `longitude` (DOUBLE PRECISION) - Longitude
- `geom` (GEOMETRY) - PostGIS point geometry (EPSG:4326)
- `operator` (TEXT) - Operator/owner
- `power_mw` (DOUBLE PRECISION) - Power capacity in megawatts (sparse: 5.9% populated)
- `source` (TEXT) - Data source (`curated`, `osm`, or `both`)
- `osm_id` (TEXT) - OpenStreetMap ID if applicable
- `geocode_method` (TEXT) - Geocoding provenance

**Notes**:
- 108 of 1,833 facilities have power ratings
- 45 facilities use city-precision fallback coordinates
- Operator strings have fragmentation issues ("Meta" vs. "Meta, Inc.")

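The operator fragmentation noted above can be reduced before grouping by operator. The sketch below is one possible approach, not the repository's method: `normalize_operator` and its suffix list are hypothetical, and a real cleanup would likely need a hand-curated alias table as well.

```python
import re

# Corporate suffixes to strip; this list is an illustrative assumption.
_SUFFIX = re.compile(r"[,\s]+(inc|llc|corp|corporation|ltd|co)\.?$", re.IGNORECASE)


def normalize_operator(raw):
    """Return a canonical grouping key for an operator string, or None if empty."""
    if raw is None:
        return None
    name = " ".join(raw.split())  # collapse repeated whitespace
    # Strip trailing suffixes repeatedly, e.g. "Foo Co., Inc." -> "Foo".
    while True:
        stripped = _SUFFIX.sub("", name)
        if stripped == name:
            break
        name = stripped
    name = name.rstrip(" ,.")
    return name.casefold() or None
```

With this key, `"Meta"` and `"Meta, Inc."` fall into the same group, while names that merely end in the letters of a suffix (such as "Cisco") are left intact because the pattern requires a separator before the suffix.
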
### `us_dc_sample_geocoded`
**Rows**: 1,489
**Purpose**: Original curated sample with geocoding provenance (superseded by `master_data_centers`)

**Key Columns**:
- `name`, `address`, `city`, `state`, `zip`
- `latitude`, `longitude`, `geom`
- `operator`, `power_mw`
- `census_lat`, `census_lon` - Census TIGER geocode results
- `nominatim_lat`, `nominatim_lon` - Nominatim fallback results
- `geocode_source` - Which geocoder was used

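The column layout suggests a Census-first, Nominatim-fallback geocoding order. A minimal sketch of that selection logic follows; the function name `choose_coordinates` and the source labels `"census"` and `"nominatim"` are assumptions, since the actual values stored in `geocode_source` are not documented here.

```python
def choose_coordinates(row):
    """Pick (lat, lon, source) for a row, preferring the Census result."""
    candidates = (
        ("census", row.get("census_lat"), row.get("census_lon")),
        ("nominatim", row.get("nominatim_lat"), row.get("nominatim_lon")),
    )
    for source, lat, lon in candidates:
        # A geocoder counts as successful only if it returned both coordinates.
        if lat is not None and lon is not None:
            return float(lat), float(lon), source
    return None, None, None  # neither geocoder returned a result
```
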
### `osm_data_centers`
**Rows**: 1,549
**Purpose**: Raw OpenStreetMap-derived facilities

**Key Columns**:
- `osm_id` (TEXT) - OSM element ID
- `osm_type` (TEXT) - `node`, `way`, or `relation`
- `name` (TEXT) - OSM name tag
- `latitude`, `longitude`, `geom`
- `tags` (JSONB) - All OSM tags as JSON
- `operator` (TEXT) - Extracted from OSM tags
- `city`, `state`, `country`

**Notes**: Fetched via Overpass API with query for `telecom=data_center` or `building=data_center`

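For orientation, an Overpass QL query matching those two tags could be assembled as below. This is a hedged sketch rather than the query the loader actually sends: the country filter, the timeout, and the helper name `build_overpass_query` are all assumptions.

```python
def build_overpass_query(country_code="US", timeout_s=180):
    """Return Overpass QL selecting data-center nodes, ways, and relations."""
    tag_filters = [("telecom", "data_center"), ("building", "data_center")]
    # `nwr` is Overpass shorthand for node + way + relation.
    selectors = "\n".join(
        f'  nwr["{key}"="{value}"](area.searchArea);' for key, value in tag_filters
    )
    return (
        f"[out:json][timeout:{timeout_s}];\n"
        f'area["ISO3166-1"="{country_code}"][admin_level=2]->.searchArea;\n'
        f"(\n{selectors}\n);\n"
        "out center tags;"  # `center` gives ways/relations a representative point
    )
```
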
### `master_data_center_spatial_clusters`
**Rows**: 1,831
**Purpose**: DBSCAN cluster assignments for master data centers

**Key Columns**:
- All columns from `master_data_centers`
- `cluster_id` (INTEGER) - Cluster assignment (-1 = noise/singleton)
- `cluster_size` (INTEGER) - Number of facilities in cluster
- `cluster_label` (TEXT) - Human-readable cluster name

**Notes**: DBSCAN parameters: eps=15 km, min_samples=2

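To illustrate what these parameters imply: when `min_samples=2` counts the point itself (the scikit-learn convention, assumed here), every facility with at least one neighbor within `eps` is a core point, so DBSCAN reduces to connected components of the "within 15 km" graph. The stdlib-only sketch below swaps in that equivalent formulation with haversine distances; it is not the repository's clustering script, and the function names are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two WGS84 points, in kilometers."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi, dlmb = phi2 - phi1, radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def cluster_min_samples_2(points, eps_km=15.0):
    """Label (lat, lon) points like DBSCAN(min_samples=2): -1 = noise, else 0, 1, ..."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # O(n^2) pairwise scan; acceptable for a couple of thousand facilities.
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if haversine_km(*points[i], *points[j]) <= eps_km:
                parent[find(i)] = find(j)

    sizes = {}
    for i in range(len(points)):
        sizes[find(i)] = sizes.get(find(i), 0) + 1

    labels, ids = [], {}
    for i in range(len(points)):
        root = find(i)
        if sizes[root] < 2:
            labels.append(-1)  # isolated facility
        else:
            labels.append(ids.setdefault(root, len(ids)))
    return labels
```

Two facilities a few kilometers apart receive the same non-negative label, and a facility with no neighbor within 15 km is labeled `-1`, matching the `cluster_id` convention above.
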
---

## Enrichment Tables

### `data_center_census_tracts_2024`
**Rows**: 1,815
**Purpose**: Per-facility demographics from containing Census tract

**Key Columns**:
- All columns from `master_data_centers`
- `geoid` (TEXT) - 11-digit Census tract GEOID
- `state_fips`, `county_fips`, `tract`
- **Population**: `total_population`, `population_density_sq_mi`
- **Age**: `median_age`, `under_18_pct`, `over_65_pct`
- **Race/Ethnicity**: `white_nh_pct`, `black_nh_pct`, `asian_nh_pct`, `hispanic_pct`
- **Economics**: `median_household_income`, `per_capita_income`, `poverty_rate`
- **Education**: `bachelors_or_higher_pct`, `high_school_or_higher_pct`
- **Housing**: `median_home_value`, `median_rent`, `homeownership_rate`
- **Broadband**: `broadband_pct` - Households with broadband subscription

**Source**: ACS 2024 5-year estimates

**Notes**:
- 18 of 1,833 facilities failed tract join (geocoding issues)
- Data from `_dc_census_tract_acs_2024` base table

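An 11-digit tract GEOID is the concatenation of a 2-digit state FIPS code, a 3-digit county FIPS code, and a 6-digit tract code, which is why it is stored as TEXT (leading zeros matter). A small sketch of that decomposition, with a hypothetical helper name:

```python
def split_tract_geoid(geoid):
    """Split an 11-digit tract GEOID into (state_fips, county_fips, tract)."""
    if len(geoid) != 11 or not geoid.isdigit():
        raise ValueError(f"expected an 11-digit tract GEOID, got {geoid!r}")
    return geoid[:2], geoid[2:5], geoid[5:]
```

A GEOID that was read as an integer somewhere upstream (and lost a leading zero) will fail the length check instead of silently producing the wrong state.
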
### `data_center_watershed_huc8`
**Rows**: 1,833
**Purpose**: Per-facility watershed assignment

**Key Columns**:
- All columns from `master_data_centers`
- `huc8` (TEXT) - 8-digit Hydrologic Unit Code
- `watershed_name` (TEXT) - Watershed name
- `watershed_area_sq_km` (DOUBLE PRECISION)
- `states` (TEXT) - States intersecting watershed

**Source**: USGS Watershed Boundary Dataset

**Notes**: 257 unique HUC8 watersheds contain at least one data center

### `data_center_nri_exposure`
**Rows**: 1,833
**Purpose**: Per-facility FEMA National Risk Index hazard exposure scores

**Key Columns**:
- All columns from `master_data_centers`
- `nri_id` (TEXT) - Census tract GEOID (matches `geoid` from demographics)
- `risk_score` (DOUBLE PRECISION) - Overall NRI risk score
- `social_vulnerability` (DOUBLE PRECISION) - Social vulnerability index
- **Hazard-specific risk scores** (18 hazards):
  - `avalanche_risk`, `coastal_flooding_risk`, `cold_wave_risk`
  - `drought_risk`, `earthquake_risk`, `hail_risk`
  - `heat_wave_risk`, `hurricane_risk`, `ice_storm_risk`
  - `landslide_risk`, `lightning_risk`, `riverine_flooding_risk`
  - `strong_wind_risk`, `tornado_risk`, `tsunami_risk`
  - `volcanic_activity_risk`, `wildfire_risk`, `winter_weather_risk`

**Source**: FEMA National Risk Index (December 2025 release)

### `data_center_rdh_precinct_vote_matches`
**Rows**: Varies
**Purpose**: Per-facility precinct-level election results

**Key Columns**:
- Data center identifiers
- `precinct_name`, `precinct_id`
- `election_year`, `office`
- `candidate`, `party`, `votes`
- `vote_share_pct`

**Source**: Redistricting Data Hub precinct shapefiles

**Notes**: Spatial join to voting precincts (point-in-polygon)

---

## Base Layer Tables

### `_dc_census_tract_acs_2024`
**Rows**: 85,382
**Purpose**: ACS 2024 demographics for all Census tracts in states with data centers

**Key Columns**:
- `geoid` (TEXT) - 11-digit tract GEOID (PRIMARY KEY)
- `name` (TEXT) - Tract name
- `state_fips`, `county_fips`, `tract`
- **Full ACS 5-year estimates** (85+ columns):
  - Population by age, sex, race/ethnicity
  - Households, families, housing units
  - Income, poverty, education, employment
  - Housing values, rents, costs
  - Broadband, computer access
  - Commuting, vehicles

**Source**: Census ACS 2024 5-year estimates API

**Notes**: Universe limited to the 46 states with data centers (states without any data center are excluded)

### `_dc_census_tract_boundaries_2024`
**Rows**: 85,058
**Purpose**: TIGER 2024 tract polygons for data center states

**Key Columns**:
- `geoid` (TEXT) - 11-digit tract GEOID
- `name` (TEXT) - Tract name
- `state_fips`, `county_fips`, `tract_code`
- `geom` (GEOMETRY) - Polygon geometry (EPSG:4326)
- `area_land_sq_m` (DOUBLE PRECISION) - Land area in square meters
- `area_water_sq_m` (DOUBLE PRECISION) - Water area in square meters

**Source**: Census TIGER/Line 2024

### `ruca_codes_2020_tract`
**Rows**: 85,528
**Purpose**: USDA Rural-Urban Commuting Area codes for metro/rural classification

**Key Columns**:
- `geoid` (TEXT) - 11-digit tract GEOID (matches Census tracts)
- `ruca_code` (TEXT) - Primary RUCA code (1-10)
- `ruca_category` (TEXT) - Simplified category:
  - `Metropolitan` (codes 1-3)
  - `Micropolitan` (codes 4-6)
  - `Small town` (codes 7-9)
  - `Rural` (code 10)
- `ruca_description` (TEXT) - Full RUCA code description
- `population_2020` (INTEGER)

**Source**: USDA Economic Research Service RUCA 2020

**Notes**:
- Based on 2020 Census tracts and 2010-2020 commuting patterns
- 7 data centers failed RUCA join (Puerto Rico / non-US)

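The `ruca_category` mapping above can be expressed as a small function. This is a sketch of the documented ranges only; the helper name and the tolerance for values like `"1.0"` are assumptions about how the TEXT column might be populated.

```python
def ruca_category(ruca_code):
    """Map a primary RUCA code (stored as TEXT) to the simplified category."""
    code = int(float(ruca_code))  # tolerate "1" as well as "1.0"
    if 1 <= code <= 3:
        return "Metropolitan"
    if 4 <= code <= 6:
        return "Micropolitan"
    if 7 <= code <= 9:
        return "Small town"
    if code == 10:
        return "Rural"
    return None  # anything outside 1-10 has no simplified category
```
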
### `watershed_huc8`
**Rows**: 2,139
**Purpose**: USGS HUC8 subbasin polygons for water-stress analysis

**Key Columns**:
- `huc8` (TEXT) - 8-digit Hydrologic Unit Code (PRIMARY KEY)
- `name` (TEXT) - Watershed name
- `geom` (GEOMETRY) - Polygon geometry (EPSG:4326)
- `area_sq_km` (DOUBLE PRECISION)
- `states` (TEXT) - Comma-separated state codes
- `dc_count` (INTEGER) - Number of data centers in watershed

**Source**: USGS Watershed Boundary Dataset

**Notes**:
- 257 of 2,139 watersheds contain at least one data center
- Top 15 watersheds contain 50% of all US data centers

### `nri_census_tracts`
**Rows**: ~84,000
**Purpose**: Full FEMA National Risk Index by Census tract

**Key Columns**:
- `nri_id` (TEXT) - Census tract GEOID
- `state_name`, `county_name`, `tract_name`
- **460+ columns** including:
  - Overall risk scores and ratings
  - Expected annual loss (dollars and building value %)
  - Social vulnerability components (15 factors)
  - Community resilience score
  - Individual hazard risk scores (18 hazards)
  - Exposure, annualized frequency, historic loss ratios per hazard

**Source**: FEMA National Risk Index v2.1 (December 2025)

**Notes**:
- Very wide table (460+ columns) with comprehensive natural hazard risk data
- Join to data centers by matching `nri_id` to the tract `geoid` in the enrichment tables
- See [FEMA NRI Technical Documentation](https://hazards.fema.gov/nri/)

---

## Infrastructure Tables

### Energy Infrastructure

#### `energy_eia_operating_generator_capacity_flat`
**Rows**: 4.7 million
**Purpose**: EIA generator inventory with lat/lon/MW (monthly 2008-2026)

**Key Columns**:
- `plant_id` (INTEGER) - EIA plant ID
- `generator_id` (TEXT) - Generator unit ID
- `plant_name` (TEXT)
- `latitude`, `longitude`, `geom`
- `state`, `county`
- `utility_name`, `operator_name`
- `nameplate_capacity_mw` (DOUBLE PRECISION)
- `technology` (TEXT) - Generation technology
- `energy_source_1`, `energy_source_2` - Primary fuel codes
- `operating_month`, `operating_year` - When unit became operational
- `status` (TEXT) - Operating, standby, retired, etc.
- `report_month`, `report_year` - Data snapshot date

**Source**: EIA Form 860 via API

**Notes**:
- "Flat" means denormalized for fast spatial queries
- Each generator-month is a row (4.7M rows from monthly snapshots)
- Use for proximity analysis (e.g., "all generators within 50 km of data center")

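Because each generator appears once per monthly snapshot, summing `nameplate_capacity_mw` over all rows counts the same unit many times. The sketch below shows one way to restrict rows to the most recent snapshot before aggregating; it assumes `report_year` and `report_month` are numeric, and the helper name is hypothetical.

```python
def rows_in_latest_snapshot(rows):
    """Keep only rows from the most recent (report_year, report_month)."""
    rows = list(rows)
    if not rows:
        return []
    # Tuples compare year first, then month.
    newest = max((r["report_year"], r["report_month"]) for r in rows)
    return [r for r in rows if (r["report_year"], r["report_month"]) == newest]
```

For example, a 100 MW unit reported in two consecutive months plus a 50 MW unit reported once would naively sum to 250 MW; filtering to the latest snapshot gives the intended 150 MW.
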
#### `energy_eia_facility_fuel_flat`
**Rows**: Varies
**Purpose**: Monthly generation by plant/fuel

**Key Columns**:
- `plant_id`, `plant_name`
- `report_month`, `report_year`
- `energy_source` (TEXT) - Fuel code
- `net_generation_mwh` (DOUBLE PRECISION)
- `fuel_consumed_mmbtu` (DOUBLE PRECISION)

**Source**: EIA Form 923 via API

#### `energy_eia_seds_flat`
**Rows**: 2.57 million
**Purpose**: Annual state energy consumption/production (1960-2024)

**Key Columns**:
- `state_code` (TEXT)
- `year` (INTEGER)
- `msn` (TEXT) - Mnemonic series names (e.g., `TETCB` = total energy consumption)
- `value` (DOUBLE PRECISION) - Energy in trillion BTU
- `unit` (TEXT)
- `description` (TEXT) - Human-readable MSN description

**Source**: EIA State Energy Data System (SEDS)

**Notes**:
- Annual aggregates by state
- Use for state-level energy context analysis

---

### Connectivity Infrastructure

#### `internet_cables`
**Rows**: 693
**Purpose**: Submarine cable routes

**Key Columns**:
- `cable_id` (TEXT) - Unique cable identifier
- `cable_name` (TEXT) - Official cable name
- `geom` (GEOMETRY) - LineString geometry (EPSG:4326)
- `rfs_year` (INTEGER) - Ready For Service year
- `length_km` (DOUBLE PRECISION)
- `owners` (TEXT[]) - Array of owner names
- `landing_points` (TEXT[]) - Array of landing point names

**Source**: TeleGeography-style cable database

**Notes**:
- 693 unique submarine cables
- Geometry is approximate route (not exact seabed path)

#### `internet_cable_landing_points`
**Rows**: 3,361
**Purpose**: Cable landing points (where cables come ashore)

**Key Columns**:
- `landing_point_id` (TEXT) - Unique identifier
- `name` (TEXT) - Landing point name
- `city`, `country`
- `latitude`, `longitude`, `geom`
- `cables` (TEXT[]) - Array of cable names landing at this point
- `cable_count` (INTEGER)

**Source**: TeleGeography-style cable database

**Notes**:
- Used for proximity analysis (how close are data centers to cable landings?)
- **Key finding**: Data centers are NOT systematically closer to cables than ordinary US cities

#### `internet_city_dominance`
**Rows**: 4,552
**Purpose**: City-level IPs/capacity (internet hub strength proxy)

**Key Columns**:
- `city` (TEXT)
- `country` (TEXT)
- `latitude`, `longitude`, `geom`
- `ip_addresses` (INTEGER) - Number of routable IP addresses
- `capacity_rank` (INTEGER) - Relative capacity ranking

**Source**: Internet topology datasets

**Notes**: Proxy for "internet hub" strength (not directly used in main analyses)

---

### Broadband

#### `fcc_bdc_location_provider_aggregates`
**Rows**: Varies
**Purpose**: FCC BDC provider availability aggregated by county/tract

**Key Columns**:
- `geoid` (TEXT) - County or tract GEOID
- `geography_level` (TEXT) - `county` or `tract`
- `provider_count` (INTEGER)
- `technology_counts` (JSONB) - Count by technology type
- `max_download_mbps`, `max_upload_mbps`

**Source**: FCC Broadband Data Collection (BDC)

#### `fcc_bdc_broadband_connection_table`
**Rows**: Varies
**Purpose**: Per-data-center broadband provider availability

**Key Columns**:
- Data center identifiers
- `provider_id`, `provider_name`
- `technology` (TEXT)
- `max_advertised_download_speed`, `max_advertised_upload_speed`
- `low_latency` (BOOLEAN)

**Source**: FCC BDC, joined to data center locations

**Notes**: Built by `build_fcc_bdc_broadband_connection_table.py`

---

### Other Tables

#### `opposition_cases_geocoded`
**Rows**: 18
**Purpose**: Geocoded community-opposition cases against data center builds

**Key Columns**:
- `case_id` (TEXT) - Unique identifier
- `developer` (TEXT) - Proposed developer/operator
- `investment_billions` (DOUBLE PRECISION) - Investment amount in billions
- `outcome` (TEXT) - Case outcome (approved, rejected, pending)
- `governance_response` (TEXT) - Government response
- `latitude`, `longitude`, `geom`

**Source**: Compiled from news archives

**Notes**: Loaded but currently unused - see research-ideas.md for proposed analyses

#### `census_tract_huc8_link`
**Rows**: 806
**Purpose**: Tract↔HUC8 spatial overlap table

**Key Columns**:
- `geoid` (TEXT) - Census tract GEOID
- `huc8` (TEXT) - HUC8 watershed code
- `overlap_pct` (DOUBLE PRECISION) - Percentage of tract overlapping watershed

**Notes**: Useful for downstream tract-level water-stress joins; limited to tracts containing data centers

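Since a tract can straddle several watersheds, a tract-level water-stress join usually needs one HUC8 per tract. A minimal sketch, assuming the largest `overlap_pct` wins (a choice made here for illustration, not a documented rule), with a hypothetical helper name:

```python
def dominant_watershed(links):
    """Map each tract GEOID to the HUC8 with the largest overlap_pct.

    `links` is an iterable of (geoid, huc8, overlap_pct) tuples.
    """
    best = {}
    for geoid, huc8, overlap_pct in links:
        if geoid not in best or overlap_pct > best[geoid][1]:
            best[geoid] = (huc8, overlap_pct)
    return {geoid: huc8 for geoid, (huc8, _) in best.items()}
```

An area-weighted average across all overlapping watersheds is the obvious alternative when a single assignment would hide a near-even split.
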
#### `im3_state_projected_moderate_50`
**Rows**: 328
**Purpose**: PNNL IM3 projected data center siting (moderate growth, gravity weight 0.50)

**Key Columns**:
- `facility_id` (TEXT)
- `state` (TEXT)
- `cost_millions` (DOUBLE PRECISION)
- `it_mw` (DOUBLE PRECISION) - IT load in megawatts
- `cooling_water_demand_gal_per_day` (DOUBLE PRECISION)
- `latitude`, `longitude`, `geom`

**Source**: PNNL Integrated Multisector Multiscale Modeling (IM3)

**Notes**: Loaded but unused - potential for forward-projection analysis

#### `im3_projected_state_demand_summary`
**Rows**: 31
**Purpose**: State-level rollup of IM3 projected facility counts, IT MW, and cooling demand

**Key Columns**:
- `state` (TEXT)
- `facility_count` (INTEGER)
- `total_it_mw` (DOUBLE PRECISION)
- `total_cooling_demand_mgd` (DOUBLE PRECISION) - Million gallons per day

**Source**: IM3 model outputs

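The rollup columns suggest a straightforward aggregation over per-facility projections, with cooling demand converted from gallons per day to million gallons per day (MGD). The sketch below shows that arithmetic; that the summary is derived from `im3_state_projected_moderate_50` in exactly this way is an assumption, as is the helper name.

```python
from collections import defaultdict

GALLONS_PER_MILLION = 1_000_000


def summarize_by_state(facilities):
    """Aggregate facility rows into per-state counts, IT MW, and cooling MGD."""
    totals = defaultdict(
        lambda: {"facility_count": 0, "total_it_mw": 0.0, "total_cooling_demand_mgd": 0.0}
    )
    for row in facilities:
        summary = totals[row["state"]]
        summary["facility_count"] += 1
        summary["total_it_mw"] += row["it_mw"]
        # gal/day -> million gal/day
        summary["total_cooling_demand_mgd"] += (
            row["cooling_water_demand_gal_per_day"] / GALLONS_PER_MILLION
        )
    return dict(totals)
```
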
#### `utility_rate_tracker_2025_2028`
**Rows**: 374
**Purpose**: Utility rate-increase tracker by provider × state × service type

**Key Columns**:
- `provider` (TEXT) - Utility provider name
- `state` (TEXT)
- `service_type` (TEXT)
- `effective_date` (DATE)
- `monthly_increase_dollars` (DOUBLE PRECISION)
- `percent_increase` (DOUBLE PRECISION)

**Source**: Utility rate tracker database

**Notes**: Loaded but unused in demographic/energy analysis

#### `energy_atlas_layers_catalog`
**Rows**: ~5
**Purpose**: Metadata catalog of EIA layers ingested

**Key Columns**:
- `table_name` (TEXT)
- `source_url` (TEXT)
- `import_timestamp` (TIMESTAMP)
- `row_count` (INTEGER)

**Notes**: Created by `ingest_eia_energy_layers.py`

---

## Commonly Used Joins

### Data Center to Demographics
```sql
SELECT
    dc.*,
    ct.median_household_income,
    ct.bachelors_or_higher_pct,
    ct.broadband_pct
FROM master_data_centers dc
JOIN data_center_census_tracts_2024 ct
    ON dc.id = ct.id;
```

### Data Center to Watershed
```sql
SELECT
    dc.*,
    w.huc8,
    w.name AS watershed_name  -- the base layer column is `name`, not `watershed_name`
FROM master_data_centers dc
JOIN data_center_watershed_huc8 dw ON dc.id = dw.id
JOIN watershed_huc8 w ON dw.huc8 = w.huc8;
```

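### Data Center to Energy Infrastructure (50 km radius)
```sql
SELECT
    dc.id,
    dc.name,
    SUM(eg.nameplate_capacity_mw) AS total_capacity_50km
FROM master_data_centers dc
JOIN energy_eia_operating_generator_capacity_flat eg
    ON ST_DWithin(
        dc.geom::geography,
        eg.geom::geography,
        50000  -- 50 km in meters
    )
WHERE eg.status = 'OP'  -- Operating only
  -- Use one monthly snapshot; otherwise each generator is summed once per report month.
  -- Assumes report_year and report_month are numeric.
  AND (eg.report_year, eg.report_month) = (
      SELECT report_year, report_month
      FROM energy_eia_operating_generator_capacity_flat
      ORDER BY report_year DESC, report_month DESC
      LIMIT 1
  )
GROUP BY dc.id, dc.name;
```
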
### Data Center to FEMA Hazard Risk
```sql
SELECT
    dc.*,
    nri.risk_score,
    nri.wildfire_risk,
    nri.drought_risk,
    nri.heat_wave_risk
FROM master_data_centers dc
JOIN data_center_census_tracts_2024 ct ON dc.id = ct.id
JOIN nri_census_tracts nri ON ct.geoid = nri.nri_id;
```

---

## Table Naming Conventions

- **`master_*`** - Canonical, deduplicated tables (use these for analysis)
- **`data_center_*`** - Data center-specific enrichment tables
- **`_dc_*`** - Base layers scoped to data center states (underscore prefix = private/internal)
- **`energy_eia_*`** - EIA energy data
- **`internet_*`** - Connectivity infrastructure
- **`fcc_bdc_*`** - FCC Broadband Data Collection

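These prefixes make it easy to group tables programmatically, for example when auditing which tables a script touches. A small sketch follows; the role labels and the helper name `table_role` are illustrative, and tables outside the listed prefixes (such as `watershed_huc8`) simply fall through to `"other"`.

```python
# Prefix -> role, in the order listed above.
PREFIX_ROLES = (
    ("master_", "canonical"),
    ("data_center_", "enrichment"),
    ("_dc_", "base layer"),
    ("energy_eia_", "energy"),
    ("internet_", "connectivity"),
    ("fcc_bdc_", "broadband"),
)


def table_role(table_name):
    """Classify a table by its naming-convention prefix."""
    for prefix, role in PREFIX_ROLES:
        if table_name.startswith(prefix):
            return role
    return "other"
```
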
---

## Indexes and Performance

Every table with a `geom` column has a GiST spatial index for fast spatial joins:
```sql
CREATE INDEX idx_tablename_geom ON tablename USING GIST(geom);
```

Key `geoid` columns are indexed for fast demographic joins:
```sql
CREATE INDEX idx_tablename_geoid ON tablename(geoid);
```

---

## Maintenance Notes

### Updating Data Centers
1. Run `load_postgis_osm_data_centers.py` to refresh OSM data
2. Run `build_master_data_centers.py` to rebuild master table
3. Run enrichment scripts to update joins

### Updating Demographics
1. Update `_dc_census_tract_acs_2024` from Census API
2. Run `create_data_center_census_tract_table.py --replace-final`

### Updating Energy Data
```bash
python3 ingest_eia_energy_layers.py --category power --update
```

---

## Schema Export

To export the full schema:
```bash
pg_dump -h "$PGWEB_HOST" -p "$PGWEB_PORT" -U "$PGWEB_USER" -d data_centers --schema-only > schema.sql
```

To list all tables, largest first:
```sql
SELECT
    schemaname,
    tablename,
    pg_size_pretty(pg_total_relation_size(format('%I.%I', schemaname, tablename)::regclass)) AS total_size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(format('%I.%I', schemaname, tablename)::regclass) DESC;
```

---

## Contact

For database access or questions, contact the repository owner.