first commit

2026-05-15 20:48:41 -07:00
commit f57969c9ee
34 changed files with 89262 additions and 0 deletions

postgis_tables_summary.txt
PostGIS Tables Summary
Database:
data_centers
Host:
db.dadams.io:5433
Credentials:
Loaded from PGWEB_* environment variables in /home/dadams/.zsh_secrets.
Point table:
public.us_dc_sample_geocoded
Point table description:
One row per data center loaded into public.us_dc_sample_geocoded from either:
- US_DC_Sample_geocoded.csv (Census + Nominatim geocoded sample), or
- new/IM3_Existing_DataCenters.csv (adapted to the same schema).
The geoid column stores the Census tract GEOID when assigned.
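A Census tract GEOID is 11 digits long: a 2-digit state FIPS code, a 3-digit county FIPS code and a 6-digit tract code. That structure is why the tract table can carry statefp and countyfp next to geoid. The sketch below assumes geoid is stored as text, because leading zeros matter (Alabama's state code is "01"). The helper name and the example GEOID are illustrative and do not come from the load scripts.

```python
def split_tract_geoid(geoid: str) -> dict[str, str]:
    """Split an 11-digit Census tract GEOID into its FIPS parts (illustrative helper)."""
    # Keep everything as strings so that leading zeros are preserved.
    if len(geoid) != 11 or not geoid.isdigit():
        raise ValueError(f"not an 11-digit tract GEOID: {geoid!r}")
    return {
        "statefp": geoid[:2],     # state FIPS
        "countyfp": geoid[2:5],   # county FIPS within the state
        "tractce": geoid[5:],     # 6-digit tract code
    }


# Illustrative GEOID in Loudoun County, VA (state 51, county 107).
print(split_tract_geoid("51107611700"))
# {'statefp': '51', 'countyfp': '107', 'tractce': '611700'}
```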
Point table geometry:
- Column: geom
- Type: geometry(Point, 4326)
- Generated from longitude and latitude
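PostGIS geometries take x before y, so with SRID 4326 longitude comes first. If the columns are swapped, US points land far outside the United States. A common way to build the column in SQL is st_setsrid(st_makepoint(longitude, latitude), 4326). This summary does not say whether the loader uses exactly that call. The sketch below is a hypothetical helper that builds the equivalent EWKT text and applies basic range checks.

```python
def point_ewkt(longitude: float, latitude: float, srid: int = 4326) -> str:
    """Build an EWKT point string in (x, y) = (longitude, latitude) order."""
    # Reject values outside valid geographic ranges before building the point.
    if not -180.0 <= longitude <= 180.0:
        raise ValueError(f"longitude out of range: {longitude}")
    if not -90.0 <= latitude <= 90.0:
        raise ValueError(f"latitude out of range: {latitude}")
    # WKT/EWKT and st_makepoint both put x (longitude) first.
    return f"SRID={srid};POINT({longitude} {latitude})"


print(point_ewkt(-77.5, 39.0))  # SRID=4326;POINT(-77.5 39.0)
```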
Point table validation:
- 1489 rows
- 1489 geom values
- 1489 geoid values
- 1240 rows from IM3_Existing_DataCenters
- 204 address_range geocoded points from U.S. Census Geocoder
- 45 city-precision fallback points from Nominatim/OpenStreetMap
- geocode_precision distribution: building=1039, campus=108, point=93, address_range=204, city=45
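The validation counts can be cross-checked with simple arithmetic. The five geocode_precision buckets sum to 1489. Also, building + campus + point = 1039 + 108 + 93 = 1240, which matches the IM3 row count. That match suggests the IM3 rows carry the building/campus/point precisions, while the Census/Nominatim sample supplies the 204 address_range and 45 city rows. This is an inference from the counts; the summary does not state it. The sketch hard-codes the counts listed above.

```python
# Counts copied from the point table validation list.
TOTAL_ROWS = 1489
IM3_ROWS = 1240
PRECISION_COUNTS = {
    "building": 1039,
    "campus": 108,
    "point": 93,
    "address_range": 204,
    "city": 45,
}


def check_point_counts() -> dict[str, float]:
    """Cross-check the counts and return each precision's share of rows (%)."""
    # Every row should fall into exactly one geocode_precision bucket.
    if sum(PRECISION_COUNTS.values()) != TOTAL_ROWS:
        raise ValueError("precision buckets do not sum to the row count")
    # address_range and city rows come from the Census/Nominatim sample,
    # so the remaining buckets should account for exactly the IM3 rows.
    sample_rows = PRECISION_COUNTS["address_range"] + PRECISION_COUNTS["city"]
    if TOTAL_ROWS - sample_rows != IM3_ROWS:
        raise ValueError("non-sample buckets do not match the IM3 row count")
    return {k: round(100 * v / TOTAL_ROWS, 1) for k, v in PRECISION_COUNTS.items()}


print(check_point_counts())
# {'building': 69.8, 'campus': 7.3, 'point': 6.2, 'address_range': 13.7, 'city': 3.0}
```

The output puts the city-precision fallback points, whose tract assignments are approximate, at about 3% of all rows.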
Point table indexes:
- Primary key on id
- GiST index on geom
- B-tree index on state_code, city
- B-tree index on geoid
Tract table:
public.data_center_census_tracts_2024
Tract table description:
One row per 2024 Census tract containing at least one data-center point from public.us_dc_sample_geocoded.
Tract table geometry:
- Column: geom
- Type: geometry(MultiPolygon, 4326)
- Source: U.S. Census 2024 cartographic tract boundaries
Tract table validation:
- 611 tract rows
- 1489 data centers assigned to tracts
- 204 address_range data-center points assigned
- 45 city-precision data-center points assigned
- 0 missing ACS population values
Tract table scope note:
The tract table now reflects the expanded 1489-point dataset, including IM3 rows adapted into the point table schema.
Tract table indexes:
- Primary key on geoid
- GiST index on geom
- B-tree index on statefp, countyfp
- B-tree index on data_center_count desc
Tract table enrichment:
ACS 2024 5-year profile fields were added for:
- population
- median age
- households
- average household size
- education
- broadband subscription
- labor force
- unemployment
- household income
- per-capita income
- family poverty rate
- overall poverty rate
- Hispanic/Latino population and percent
- non-Hispanic White population and percent
- non-Hispanic Black population and percent
- non-Hispanic Asian population and percent
- industry worker counts
Derived tract fields:
- data_center_count
- address_range_data_center_count
- city_precision_data_center_count
- data_center_ids
- providers
- primary_industry
- primary_industry_workers
- primary_industry_pct
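The count, id and provider fields are aggregations of point rows grouped by geoid. The sketch below shows that grouping in plain Python. It assumes each point is a dict keyed by the point-table columns id, geoid, provider and geocode_precision, and that providers are kept as a sorted list of distinct names. The real storage format of providers is not stated here. The primary_industry fields come from ACS data rather than from the points, so the sketch does not cover them.

```python
from collections import defaultdict


def derive_tract_fields(points: list[dict]) -> dict[str, dict]:
    """Aggregate point rows into per-tract derived fields (illustrative sketch)."""
    tracts: dict[str, dict] = defaultdict(lambda: {
        "data_center_count": 0,
        "address_range_data_center_count": 0,
        "city_precision_data_center_count": 0,
        "data_center_ids": [],
        "providers": set(),
    })
    for point in points:
        tract = tracts[point["geoid"]]
        tract["data_center_count"] += 1
        # Track the two lower-precision geocode types separately.
        if point["geocode_precision"] == "address_range":
            tract["address_range_data_center_count"] += 1
        elif point["geocode_precision"] == "city":
            tract["city_precision_data_center_count"] += 1
        tract["data_center_ids"].append(point["id"])
        if point.get("provider"):
            tract["providers"].add(point["provider"])
    # Sort ids and providers so the output is deterministic.
    return {
        geoid: {
            **fields,
            "data_center_ids": sorted(fields["data_center_ids"]),
            "providers": sorted(fields["providers"]),
        }
        for geoid, fields in tracts.items()
    }
```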
Primary industry note:
primary_industry is derived from ACS DP03 industry worker categories. It is the industry category with the largest worker count in that tract, not an industry classification for the data center itself.
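The "largest worker count" rule is an argmax over a tract's DP03 industry categories. The sketch below assumes primary_industry_pct uses the sum of all industry categories in the tract as its denominator and is rounded to one decimal. The DP03 industry categories together cover the civilian employed population 16 years and over. The script's actual denominator and rounding are not stated here, and the function name is hypothetical.

```python
def primary_industry(industry_workers: dict[str, int]) -> tuple[str, int, float]:
    """Return (primary_industry, primary_industry_workers, primary_industry_pct)."""
    if not industry_workers:
        raise ValueError("no industry categories supplied")
    total = sum(industry_workers.values())
    # max() keeps the first category it sees on ties, so dict order breaks ties.
    name, workers = max(industry_workers.items(), key=lambda item: item[1])
    pct = round(100 * workers / total, 1) if total else 0.0
    return name, workers, pct


print(primary_industry({
    "Manufacturing": 120,
    "Retail trade": 300,
    "Educational services, and health care and social assistance": 580,
}))
# ('Educational services, and health care and social assistance', 580, 58.0)
```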
Recommended direct join:
Use geoid for routine analysis joins between the point table and the tract table; both tables index geoid.
Example:
select
  dc.id,
  dc.provider,
  dc.facility_name,
  tr.geoid,
  tr.population,
  tr.median_household_income,
  tr.primary_industry
from public.us_dc_sample_geocoded dc
join public.data_center_census_tracts_2024 tr
  on tr.geoid = dc.geoid;
Array-based join:
Use data_center_ids (the array of point ids stored on each tract row) to reproduce the point-to-tract assignment recorded in the tract table.
Example:
select
  dc.id,
  dc.provider,
  dc.facility_name,
  tr.geoid,
  tr.population,
  tr.median_household_income,
  tr.primary_industry
from public.data_center_census_tracts_2024 tr
join public.us_dc_sample_geocoded dc
  on dc.id = any(tr.data_center_ids);
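The geoid join and the array join should return the same 1489 point-to-tract pairs. The sketch below is one way to confirm that offline once both assignments have been exported. It assumes two hypothetical inputs: a dict of point id to dc.geoid, and a dict of tract geoid to its data_center_ids list. It reports points whose geoid disagrees with the arrays, ids listed under more than one tract, and ids in the arrays that have no point row.

```python
def compare_assignments(
    point_geoids: dict[int, str],
    tract_ids: dict[str, list[int]],
) -> dict[str, list[int]]:
    """Compare dc.geoid with the assignment implied by data_center_ids."""
    from_arrays: dict[int, str] = {}
    duplicated: list[int] = []
    for geoid, ids in tract_ids.items():
        for dc_id in ids:
            # An id listed under two tracts would make the array join fan out.
            if dc_id in from_arrays:
                duplicated.append(dc_id)
            from_arrays[dc_id] = geoid
    # Points whose geoid disagrees with, or is missing from, the arrays.
    mismatched = sorted(
        dc_id for dc_id, geoid in point_geoids.items()
        if from_arrays.get(dc_id) != geoid
    )
    # Ids stored on tract rows that have no matching point row.
    orphaned = sorted(set(from_arrays) - set(point_geoids))
    return {"mismatched": mismatched, "duplicated": sorted(duplicated), "orphaned": orphaned}
```

When the two tables are consistent, all three lists are empty.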
Recommended spatial join:
Use this when you want PostGIS to recalculate the tract relationship from geometry.
Example:
select
  dc.id,
  dc.provider,
  dc.facility_name,
  tr.geoid,
  tr.population,
  tr.primary_industry
from public.us_dc_sample_geocoded dc
join public.data_center_census_tracts_2024 tr
  on st_covers(tr.geom, dc.geom);
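st_covers returns true for points in a polygon's interior and also for points on its boundary. Adjacent tracts share boundaries, so a point lying exactly on a shared tract line would join to both tracts and appear twice in the spatial join. The geoid join cannot produce that duplication. Grouping the spatial join by dc.id and looking for counts above 1 would surface such cases.

The toy sketch below illustrates the covers semantics in pure Python. It handles a single planar ring with no holes or multipolygon parts, and it is not how PostGIS implements the test.

```python
Point = tuple[float, float]


def on_segment(p: Point, a: Point, b: Point, eps: float = 1e-12) -> bool:
    """True if p lies on segment a-b (collinear and inside its bounding box)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    cross = (bx - ax) * (py - ay) - (by - ay) * (px - ax)
    if abs(cross) > eps:
        return False
    return (min(ax, bx) - eps <= px <= max(ax, bx) + eps
            and min(ay, by) - eps <= py <= max(ay, by) + eps)


def covers(ring: list[Point], p: Point) -> bool:
    """ST_Covers-style test for one simple ring: interior OR boundary."""
    n = len(ring)
    # Boundary points count as covered.
    if any(on_segment(p, ring[i], ring[(i + 1) % n]) for i in range(n)):
        return True
    # Even-odd ray casting toward +x for strictly interior points.
    x, y = p
    inside = False
    for i in range(n):
        (x1, y1), (x2, y2) = ring[i], ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Two toy "tracts" sharing the edge x = 1.
west = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
east = [(1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (1.0, 1.0)]
print([covers(r, (1.0, 0.5)) for r in (west, east)])  # [True, True]
print([covers(r, (0.5, 0.5)) for r in (west, east)])  # [True, False]
```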
Useful tract-level query:
select
  geoid,
  data_center_count,
  population,
  median_household_income,
  primary_industry,
  providers
from public.data_center_census_tracts_2024
order by data_center_count desc;
Caveats:
- 45 data centers used city-precision fallback coordinates, so their tract assignments are approximate.
- Census Geocoder address matches are interpolated along street address ranges, so those coordinates are not guaranteed to be rooftop or building-centroid locations.
- The tract table only includes tracts containing at least one data-center point. It is not a full national or selected-state tract universe.
- ACS values come from the 2024 ACS 5-year profile; they carry sampling error (margins of error) and should be treated as survey estimates, not exact counts.
Reproducible scripts:
- load_postgis_data_centers.py
- create_data_center_census_tract_table.py
IM3 append command used:
python3 load_postgis_data_centers.py --source im3 --append --upsert
Rebuild command used:
python3 create_data_center_census_tract_table.py --replace-final
Local audit files:
- US_DC_Sample_geocoded.csv
- census_tract_acs_2024_selected_states.csv
- census_address_results.csv
- nominatim_city_cache.csv
Primary sources:
- U.S. Census ACS API
- U.S. Census 2024 cartographic tract boundaries
- U.S. Census Geocoder
- Nominatim/OpenStreetMap for city-level fallback coordinates