diff --git a/README.md b/README.md index 2af2a7c..003e563 100644 --- a/README.md +++ b/README.md @@ -4,8 +4,9 @@ A comprehensive geospatial research project investigating the spatial concentrat ## Documentation -- **[Database Tables](database-tables.md)** - Complete database schema with table descriptions, column definitions, and SQL examples -- **[Research Ideas](research-ideas.md)** - Future research directions, data improvements, and potential collaborations +- **[Database Tables](docs/database-tables.md)** - Complete database schema with table descriptions, column definitions, and SQL examples +- **[Research Ideas](docs/research-ideas.md)** - Future research directions, data improvements, and potential collaborations +- **[SQL Queries](docs/query_legiscan_bills.sql)** - Pre-built legislative analysis queries ## Project Overview @@ -93,25 +94,25 @@ Facilities in DBSCAN clusters differ significantly from isolated sites: ### Core Python Scripts -**Data Ingestion** -- `load_postgis_data_centers.py` - Load curated data center CSV into PostGIS -- `load_postgis_osm_data_centers.py` - Fetch OSM data centers via Overpass API -- `build_master_data_centers.py` - Deduplicate & merge curated + OSM sources -- `load_postgis_internet_cables.py` - Load submarine cables and landing points -- `ingest_eia_energy_layers.py` - Ingest EIA energy data via API -- `build_watershed_huc8_tables.py` - Load USGS HUC8 watersheds -- `ingest_legiscan.py` - Download all US state/federal bills 2016–2026 via LegiScan API, tag for data center research topics +**Data Ingestion** (`scripts/`) +- `scripts/load_postgis_data_centers.py` - Load curated data center CSV into PostGIS +- `scripts/load_postgis_osm_data_centers.py` - Fetch OSM data centers via Overpass API +- `scripts/build_master_data_centers.py` - Deduplicate & merge curated + OSM sources +- `scripts/load_postgis_internet_cables.py` - Load submarine cables and landing points +- `scripts/ingest_eia_energy_layers.py` - Ingest EIA energy data via API +- `scripts/build_watershed_huc8_tables.py` - Load USGS HUC8 watersheds +- `scripts/ingest_legiscan.py` - Download all US state/federal bills 2016–2026 via LegiScan API, tag for data center research topics **Enrichment** -- `create_data_center_census_tract_table.py` - Join data centers to Census tracts with ACS demographics -- `build_fcc_bdc_broadband_connection_table.py` - Build per-facility broadband provider table -- `build_fcc_bdc_location_provider_aggregates.py` - Aggregate FCC BDC data by county/tract +- `scripts/create_data_center_census_tract_table.py` - Join data centers to Census tracts with ACS demographics +- `scripts/build_fcc_bdc_broadband_connection_table.py` - Build per-facility broadband provider table +- `scripts/build_fcc_bdc_location_provider_aggregates.py` - Aggregate FCC BDC data by county/tract **Analysis** -- `analyze_dc_tract_concentration.py` - Tract-level cost concentration analysis (Gini, HHI, demographic deltas) -- `analyze_cables_concentration.py` - Test if data centers cluster near submarine cables -- `make_data_center_map.py` - Generate Leaflet map of data centers -- `make_internet_cables_map.py` - Generate Leaflet map of data centers + cables +- `scripts/analyze_dc_tract_concentration.py` - Tract-level cost concentration analysis (Gini, HHI, demographic deltas) +- `scripts/analyze_cables_concentration.py` - Test if data centers cluster near submarine cables +- `scripts/make_data_center_map.py` - Generate Leaflet map of data centers +- `scripts/make_internet_cables_map.py` - Generate Leaflet map of data centers + cables ### Key Jupyter Notebooks - `spatial_clustering_master_data_centers.ipynb` - DBSCAN clustering of data centers @@ -161,34 +162,34 @@ Credentials stored in `~/.zsh_secrets`, loaded via environment variables: ```bash # 1. Load base data center data -python3 load_postgis_data_centers.py -python3 load_postgis_osm_data_centers.py -python3 build_master_data_centers.py +python3 scripts/load_postgis_data_centers.py +python3 scripts/load_postgis_osm_data_centers.py +python3 scripts/build_master_data_centers.py # 2. Enrich with context layers -python3 create_data_center_census_tract_table.py --replace-final -python3 load_postgis_internet_cables.py -python3 ingest_eia_energy_layers.py --category power -python3 build_watershed_huc8_tables.py +python3 scripts/create_data_center_census_tract_table.py --replace-final +python3 scripts/load_postgis_internet_cables.py +python3 scripts/ingest_eia_energy_layers.py --category power +python3 scripts/build_watershed_huc8_tables.py # 3. Run analyses -python3 analyze_dc_tract_concentration.py > output/tract_analysis.txt -python3 analyze_cables_concentration.py > output/cables_analysis.txt +python3 scripts/analyze_dc_tract_concentration.py > output/tract_analysis.txt +python3 scripts/analyze_cables_concentration.py > output/cables_analysis.txt # 4. Execute notebooks jupyter notebook cluster_analysis.ipynb # 5. Load legislation (all states, 2016-2026) -python3 ingest_legiscan.py --all +python3 scripts/ingest_legiscan.py --all # Weekly refresh (skips unchanged sessions): -python3 ingest_legiscan.py --fetch --load +python3 scripts/ingest_legiscan.py --fetch --load ``` ### Generate Maps ```bash -python3 make_data_center_map.py -python3 make_internet_cables_map.py +python3 scripts/make_data_center_map.py +python3 scripts/make_internet_cables_map.py ``` ## Key Outputs diff --git a/DC_by_State.csv b/data/DC_by_State.csv similarity index 100% rename from DC_by_State.csv rename to data/DC_by_State.csv diff --git a/DC_by_State.xlsx b/data/DC_by_State.xlsx similarity index 100% rename from DC_by_State.xlsx rename to data/DC_by_State.xlsx diff --git a/US_DC_Sample.xlsx b/data/US_DC_Sample.xlsx similarity index 100% rename from US_DC_Sample.xlsx rename to data/US_DC_Sample.xlsx diff --git a/US_DC_Sample_clean.csv b/data/US_DC_Sample_clean.csv similarity index 100% rename from US_DC_Sample_clean.csv rename to data/US_DC_Sample_clean.csv diff --git a/US_DC_Sample_geocoded.csv b/data/US_DC_Sample_geocoded.csv similarity index 100% rename from US_DC_Sample_geocoded.csv rename to data/US_DC_Sample_geocoded.csv diff --git a/US_DC_Sample_geocoding_base.csv b/data/US_DC_Sample_geocoding_base.csv similarity index 100% rename from US_DC_Sample_geocoding_base.csv rename to data/US_DC_Sample_geocoding_base.csv diff --git a/cb_2024_us_tract_500k.zip b/data/cb_2024_us_tract_500k.zip similarity index 100% rename from cb_2024_us_tract_500k.zip rename to data/cb_2024_us_tract_500k.zip diff --git a/census_address_input.csv b/data/census_address_input.csv similarity index 100% rename from census_address_input.csv rename to data/census_address_input.csv diff --git a/census_address_results.csv b/data/census_address_results.csv similarity index 100% rename from census_address_results.csv rename to data/census_address_results.csv diff --git a/census_test_input.csv b/data/census_test_input.csv similarity index 100% rename from census_test_input.csv rename to data/census_test_input.csv diff --git a/census_test_results.csv b/data/census_test_results.csv similarity index 100% rename from census_test_results.csv rename to data/census_test_results.csv diff --git a/census_tract_acs_2024_selected_states.csv b/data/census_tract_acs_2024_selected_states.csv similarity index 100% rename from census_tract_acs_2024_selected_states.csv rename to data/census_tract_acs_2024_selected_states.csv diff --git a/geocoding_summary.txt b/data/geocoding_summary.txt similarity index 100% rename from geocoding_summary.txt rename to data/geocoding_summary.txt diff --git a/nominatim_city_cache.csv b/data/nominatim_city_cache.csv similarity index 100% rename from nominatim_city_cache.csv rename to data/nominatim_city_cache.csv diff --git a/postgis_tables_summary.txt b/data/postgis_tables_summary.txt similarity index 100% rename from postgis_tables_summary.txt rename to data/postgis_tables_summary.txt diff --git a/cables_concentration_report.docx b/docs/cables_concentration_report.docx similarity index 100% rename from cables_concentration_report.docx rename to docs/cables_concentration_report.docx diff --git a/cables_concentration_report.md b/docs/cables_concentration_report.md similarity index 100% rename from cables_concentration_report.md rename to docs/cables_concentration_report.md diff --git a/database-tables.md b/docs/database-tables.md similarity index 100% rename from database-tables.md rename to docs/database-tables.md diff --git a/query_legiscan_bills.sql b/docs/query_legiscan_bills.sql similarity index 100% rename from query_legiscan_bills.sql rename to docs/query_legiscan_bills.sql diff --git a/research-ideas.md b/docs/research-ideas.md similarity index 100% rename from research-ideas.md rename to docs/research-ideas.md diff --git a/data_center_map.html b/output/data_center_map.html similarity index 100% rename from data_center_map.html rename to output/data_center_map.html diff --git a/data_centers_cables_map.html b/output/data_centers_cables_map.html similarity index 100% rename from data_centers_cables_map.html rename to output/data_centers_cables_map.html diff --git a/data_centers_cables_map_us.html b/output/data_centers_cables_map_us.html similarity index 100% rename from data_centers_cables_map_us.html rename to output/data_centers_cables_map_us.html diff --git a/master_data_center_spatial_cluster_points.csv b/output/master_data_center_spatial_cluster_points_v2.csv similarity index 100% rename from master_data_center_spatial_cluster_points.csv rename to output/master_data_center_spatial_cluster_points_v2.csv diff --git a/master_data_center_spatial_cluster_summary.csv b/output/master_data_center_spatial_cluster_summary_v2.csv similarity index 100% rename from master_data_center_spatial_cluster_summary.csv rename to output/master_data_center_spatial_cluster_summary_v2.csv diff --git a/analyze_cables_concentration.py b/scripts/analyze_cables_concentration.py similarity index 100% rename from analyze_cables_concentration.py rename to scripts/analyze_cables_concentration.py diff --git a/analyze_dc_tract_concentration.py b/scripts/analyze_dc_tract_concentration.py similarity index 100% rename from analyze_dc_tract_concentration.py rename to scripts/analyze_dc_tract_concentration.py diff --git a/build_fcc_bdc_broadband_connection_table.py b/scripts/build_fcc_bdc_broadband_connection_table.py similarity index 100% rename from build_fcc_bdc_broadband_connection_table.py rename to scripts/build_fcc_bdc_broadband_connection_table.py diff --git a/build_fcc_bdc_location_provider_aggregates.py b/scripts/build_fcc_bdc_location_provider_aggregates.py similarity index 100% rename from build_fcc_bdc_location_provider_aggregates.py rename to scripts/build_fcc_bdc_location_provider_aggregates.py diff --git a/build_master_data_centers.py b/scripts/build_master_data_centers.py similarity index 100% rename from build_master_data_centers.py rename to scripts/build_master_data_centers.py diff --git a/build_watershed_huc8_tables.py b/scripts/build_watershed_huc8_tables.py similarity index 100% rename from build_watershed_huc8_tables.py rename to scripts/build_watershed_huc8_tables.py diff --git a/create_data_center_census_tract_table.py b/scripts/create_data_center_census_tract_table.py similarity index 100% rename from create_data_center_census_tract_table.py rename to scripts/create_data_center_census_tract_table.py diff --git a/ingest_eia_energy_layers.py b/scripts/ingest_eia_energy_layers.py similarity index 100% rename from ingest_eia_energy_layers.py rename to scripts/ingest_eia_energy_layers.py diff --git a/ingest_legiscan.py b/scripts/ingest_legiscan.py similarity index 100% rename from ingest_legiscan.py rename to scripts/ingest_legiscan.py diff --git a/load_postgis_data_centers.py b/scripts/load_postgis_data_centers.py similarity index 100% rename from load_postgis_data_centers.py rename to scripts/load_postgis_data_centers.py diff --git a/load_postgis_internet_cables.py b/scripts/load_postgis_internet_cables.py similarity index 100% rename from load_postgis_internet_cables.py rename to scripts/load_postgis_internet_cables.py diff --git a/load_postgis_osm_data_centers.py b/scripts/load_postgis_osm_data_centers.py similarity index 100% rename from load_postgis_osm_data_centers.py rename to scripts/load_postgis_osm_data_centers.py diff --git a/make_data_center_map.py b/scripts/make_data_center_map.py similarity index 100% rename from make_data_center_map.py rename to scripts/make_data_center_map.py diff --git a/make_internet_cables_map.py b/scripts/make_internet_cables_map.py similarity index 100% rename from make_internet_cables_map.py rename to scripts/make_internet_cables_map.py