{ "cells": [ { "cell_type": "markdown", "id": "0", "metadata": {}, "source": [ "# Enhanced Data Center Cluster Map\n", "\n", "This notebook starts from the spatial clustering outputs created by `spatial_clustering_master_data_centers.ipynb` and adds contextual layers from the demographic/RUCA/energy analysis.\n", "\n", "Current features:\n", "- Loads point and cluster summary CSVs from `output/`.\n", "- Recreates the cluster-colored Folium map.\n", "- Enriches point popups with HUC8 watershed, RUCA, tract demographics, and state energy context where available.\n", "- Adds separate layers for clustered points, isolated/noise points, cluster centroids, HUC8 watersheds, and state IM3 projected demand.\n", "- Optionally adds PostGIS-backed layers for internet cables, opposition cases, climate, broadband, and 2020 precinct election context, and enriches point popups with drought and wildfire smoke exposure.\n", "- Saves a standalone HTML map to `output/enhanced_master_data_center_spatial_clusters_map.html`.\n", "\n", "Notes from `output/data_center_demographic_ruca_energy_summary.md`:\n", "- HUC8 watershed join is a recommended next step for water-context analysis.\n", "- `im3_state_projected_moderate_50` is populated and used for state projected demand context.\n", "- `seds_state_msn_year` is checked through the state context export, but it currently has no rows, so SEDS fields are blank until that table is populated.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "1", "metadata": {}, "outputs": [], "source": [ "import os\n", "import json\n", "import subprocess\n", "from html import escape\n", "from pathlib import Path\n", "\n", "os.environ.setdefault('MPLCONFIGDIR', '/tmp/matplotlib')\n", "Path(os.environ['MPLCONFIGDIR']).mkdir(parents=True, exist_ok=True)\n", "\n", "import pandas as pd\n", "import folium\n", "import psycopg2\n", "from folium import plugins\n", "\n", "print('pandas:', pd.__version__)\n", "print('folium:', folium.__version__)\n", "print('psycopg2:', psycopg2.__version__)\n" ] }, { "cell_type": "markdown", "id": "2", "metadata": {}, "source": [ "## Paths And Display Settings" ] }, { "cell_type": "code", "execution_count": null, "id": "3", 
"metadata": {}, "outputs": [], "source": [ "OUTPUT_DIR = Path('output')\n", "POINTS_CSV = OUTPUT_DIR / 'master_data_center_spatial_cluster_points.csv'\n", "CLUSTERS_CSV = OUTPUT_DIR / 'master_data_center_spatial_cluster_summary.csv'\n", "POINT_CONTEXT_CSV = OUTPUT_DIR / 'master_data_center_map_context.csv'\n", "HUC8_GEOJSON = OUTPUT_DIR / 'master_data_center_huc8_watersheds.geojson'\n", "STATE_ENERGY_CSV = OUTPUT_DIR / 'master_data_center_state_energy_context.csv'\n", "MAP_HTML = OUTPUT_DIR / 'enhanced_master_data_center_spatial_clusters_map.html'\n", "\n", "MAP_CENTER = [39, -98]\n", "MAP_ZOOM = 4\n", "BASE_TILES = 'CartoDB positron'\n", "\n", "MAX_POINTS = None\n", "\n", "CLUSTERED_RADIUS = 5\n", "NOISE_RADIUS = 3\n", "CENTROID_RADIUS = 7\n", "SHOW_CENTROID_P90_CIRCLES = True\n", "SHOW_HUC8_LAYER = True\n", "SHOW_STATE_ENERGY_LAYER = True\n", "\n", "# Existing DB-backed overlays.\n", "ENABLE_DB_LAYER_LOAD = True\n", "SHOW_INTERNET_CABLES_LAYER = True\n", "SHOW_OPPOSITION_CASES_LAYER = True\n", "SHOW_DROUGHT_AND_SMOKE_CONTEXT = True\n", "\n", "# New requested overlays.\n", "SHOW_CLIMATE_LAYER = True\n", "SHOW_BROADBAND_LAYER = True\n", "SHOW_ELECTION_LAYER = True\n", "\n", "OUTPUT_DIR.mkdir(exist_ok=True)\n", "print('points:', POINTS_CSV)\n", "print('clusters:', CLUSTERS_CSV)\n", "print('point context:', POINT_CONTEXT_CSV)\n", "print('HUC8 GeoJSON:', HUC8_GEOJSON)\n", "print('state energy context:', STATE_ENERGY_CSV)\n", "print('html output:', MAP_HTML)\n" ] }, { "cell_type": "markdown", "id": "4", "metadata": {}, "source": [ "## Load Cluster Outputs" ] }, { "cell_type": "code", "execution_count": null, "id": "5", "metadata": {}, "outputs": [], "source": [ "required_files = [POINTS_CSV, CLUSTERS_CSV]\n", "missing = [str(p) for p in required_files if not p.exists()]\n", "if missing:\n", "    raise FileNotFoundError('Missing required cluster output CSV(s): ' + ', '.join(missing))\n", "\n", "points = pd.read_csv(POINTS_CSV)\n", "clusters = pd.read_csv(CLUSTERS_CSV)\n", 
"point_context = pd.read_csv(POINT_CONTEXT_CSV) if POINT_CONTEXT_CSV.exists() else pd.DataFrame()\n", "state_energy = pd.read_csv(STATE_ENERGY_CSV) if STATE_ENERGY_CSV.exists() else pd.DataFrame()\n", "\n", "if MAX_POINTS is not None:\n", "    points = points.head(MAX_POINTS).copy()\n", "\n", "points['cluster_id'] = pd.to_numeric(points['cluster_id'], errors='coerce').fillna(-1).astype(int)\n", "points['is_noise'] = points['cluster_id'].eq(-1)\n", "points['is_clustered'] = ~points['is_noise']\n", "points['name'] = points['name'].fillna('')\n", "points['operator'] = points['operator'].fillna('Unknown').replace('', 'Unknown')\n", "points['city'] = points['city'].fillna('Unknown').replace('', 'Unknown')\n", "points['state'] = points['state'].fillna('UNK').replace('', 'UNK')\n", "points['source'] = points['source'].fillna('unknown').replace('', 'unknown')\n", "\n", "if not point_context.empty:\n", "    context_cols = [c for c in point_context.columns if c != 'master_id']\n", "    points = points.merge(point_context[['master_id'] + context_cols], on='master_id', how='left')\n", "\n", "if not state_energy.empty:\n", "    state_cols = [c for c in state_energy.columns if c != 'state_code']\n", "    points = points.merge(state_energy[['state_code'] + state_cols], left_on='state', right_on='state_code', how='left')\n", "\n", "clusters['cluster_id'] = pd.to_numeric(clusters['cluster_id'], errors='coerce').astype(int)\n", "clusters = clusters.sort_values(['point_count', 'radius_km_p90'], ascending=[False, True]).reset_index(drop=True)\n", "clusters['cluster_rank'] = clusters.index + 1\n", "\n", "huc8_geojson = None\n", "if HUC8_GEOJSON.exists():\n", "    huc8_geojson = json.loads(HUC8_GEOJSON.read_text())\n", "\n", "n_clusters = points.loc[points['cluster_id'].ne(-1), 'cluster_id'].nunique()\n", "print(f'Loaded {len(points):,} points and {n_clusters:,} clusters')\n", "print('point context columns:', 0 if point_context.empty else len(point_context.columns))\n", "print('HUC8 features:', 0 if 
huc8_geojson is None else len(huc8_geojson.get('features', [])))\n", "if not state_energy.empty:\n", "    seds_available = state_energy['seds_series_count'].notna().sum() if 'seds_series_count' in state_energy.columns else 0\n", "    print(f'state energy rows: {len(state_energy):,}; SEDS rows represented: {seds_available:,}')\n", "else:\n", "    print('state energy context file not found')\n", "display(points.head())\n", "display(clusters.head(10))\n", "if not state_energy.empty:\n", "    display(state_energy.head(10))\n" ] }, { "cell_type": "code", "execution_count": null, "id": "6", "metadata": {}, "outputs": [], "source": [ "DB_NAME = 'data_centers'\n", "DB_REQUIRED_ENV = ['PGWEB_HOST', 'PGWEB_PORT', 'PGWEB_USER', 'PGWEB_PASSWORD']\n", "\n", "internet_cables_geojson = None\n", "opposition_cases = pd.DataFrame()\n", "drought_context = pd.DataFrame()\n", "smoke_context = pd.DataFrame()\n", "climate_context = pd.DataFrame()\n", "broadband_context = pd.DataFrame()\n", "election_context = pd.DataFrame()\n", "\n", "\n", "def load_zsh_secrets() -> None:\n", "    secrets = Path.home() / '.zsh_secrets'\n", "    if not secrets.exists():\n", "        return\n", "    result = subprocess.run(\n", "        ['zsh', '-lc', 'source ~/.zsh_secrets >/dev/null 2>&1; env'],\n", "        check=True,\n", "        capture_output=True,\n", "        text=True,\n", "    )\n", "    for line in result.stdout.splitlines():\n", "        if '=' not in line:\n", "            continue\n", "        key, value = line.split('=', 1)\n", "        if key and key not in os.environ:\n", "            os.environ[key] = value\n", "\n", "\n", "def db_ready() -> bool:\n", "    return all(os.getenv(k) for k in DB_REQUIRED_ENV)\n", "\n", "\n", "def get_conn():\n", "    return psycopg2.connect(\n", "        host=os.environ['PGWEB_HOST'],\n", "        port=os.environ['PGWEB_PORT'],\n", "        user=os.environ['PGWEB_USER'],\n", "        password=os.environ['PGWEB_PASSWORD'],\n", "        dbname=DB_NAME,\n", "    )\n", "\n", "\n", "def load_optional_db_layers() -> None:\n", "    global internet_cables_geojson, opposition_cases, drought_context, 
smoke_context\n", "    global climate_context, broadband_context, election_context, points\n", "\n", "    if not ENABLE_DB_LAYER_LOAD:\n", "        print('DB layer load disabled')\n", "        return\n", "\n", "    load_zsh_secrets()\n", "    if not db_ready():\n", "        print('Skipping DB-backed layers: missing PGWEB_* environment variables')\n", "        return\n", "\n", "    with get_conn() as conn:\n", "        if SHOW_INTERNET_CABLES_LAYER:\n", "            cable_sql = \"\"\"\n", "            select json_build_object(\n", "                'type','FeatureCollection',\n", "                'features', coalesce(json_agg(\n", "                    json_build_object(\n", "                        'type','Feature',\n", "                        'geometry', ST_AsGeoJSON(geom)::json,\n", "                        'properties', json_build_object(\n", "                            'feature_id', feature_id,\n", "                            'name', name,\n", "                            'owners', owners,\n", "                            'rfs_year', rfs_year,\n", "                            'decommission_year', decommission_year,\n", "                            'length_km', length_km,\n", "                            'cable_type', cable_type\n", "                        )\n", "                    )\n", "                ), '[]'::json)\n", "            ) as fc\n", "            from public.internet_cables\n", "            where geom is not null\n", "            \"\"\"\n", "            internet_cables_geojson = pd.read_sql(cable_sql, conn).iloc[0]['fc']\n", "            n_cables = len(internet_cables_geojson.get('features', [])) if internet_cables_geojson else 0\n", "            print(f'internet_cables features: {n_cables:,}')\n", "\n", "        if SHOW_OPPOSITION_CASES_LAYER:\n", "            opposition_sql = \"\"\"\n", "            select\n", "                id, location, state, lat, lon, investment_billion, status,\n", "                developer, commons_type, governance_response, outcome, opposition_type, data_source\n", "            from public.opposition_cases_geocoded\n", "            where lat is not null and lon is not null\n", "            \"\"\"\n", "            opposition_cases = pd.read_sql(opposition_sql, conn)\n", "            print(f'opposition_cases rows: {len(opposition_cases):,}')\n", "\n", "        if SHOW_DROUGHT_AND_SMOKE_CONTEXT:\n", "            drought_sql = \"\"\"\n", "            select\n", "                master_id, usdm_status, worst_dm_category, mean_dm_category,\n", "                pct_weeks_in_d2_or_worse, pct_weeks_in_d3_or_worse,\n", "                longest_d2_streak_weeks, longest_d3_streak_weeks\n", "            from 
public.data_center_usdm_drought_exposure\n", "            \"\"\"\n", "            smoke_sql = \"\"\"\n", "            select\n", "                master_id, hms_status, smoke_period_start, smoke_period_end,\n", "                days_observed, days_with_any_smoke, days_with_heavy_smoke,\n", "                pct_days_with_any_smoke, pct_days_with_heavy_smoke,\n", "                worst_density, mean_density_rank\n", "            from public.data_center_hms_smoke_exposure\n", "            \"\"\"\n", "            drought_context = pd.read_sql(drought_sql, conn)\n", "            smoke_context = pd.read_sql(smoke_sql, conn)\n", "            print(f'drought_context rows: {len(drought_context):,}')\n", "            print(f'smoke_context rows: {len(smoke_context):,}')\n", "\n", "            if not drought_context.empty:\n", "                cols = [c for c in drought_context.columns if c != 'master_id']\n", "                points = points.merge(drought_context[['master_id'] + cols], on='master_id', how='left')\n", "\n", "            if not smoke_context.empty:\n", "                cols = [c for c in smoke_context.columns if c != 'master_id']\n", "                points = points.merge(smoke_context[['master_id'] + cols], on='master_id', how='left')\n", "\n", "        if SHOW_CLIMATE_LAYER:\n", "            climate_sql = \"\"\"\n", "            select\n", "                master_id, mean_annual_temperature_c, mean_summer_temperature_c,\n", "                max_wet_bulb_temperature_c, extreme_heat_days,\n", "                annual_cooling_degree_days_c_mean, annual_precipitation_mm_mean\n", "            from public.data_center_historical_climate\n", "            \"\"\"\n", "            climate_context = pd.read_sql(climate_sql, conn)\n", "            print(f'climate_context rows: {len(climate_context):,}')\n", "            if not climate_context.empty:\n", "                cols = [c for c in climate_context.columns if c != 'master_id']\n", "                points = points.merge(climate_context[['master_id'] + cols], on='master_id', how='left')\n", "\n", "        if SHOW_BROADBAND_LAYER:\n", "            broadband_sql = \"\"\"\n", "            select\n", "                master_id, census_broadband_subscription_pct,\n", "                fcc_bdc_status, fcc_bdc_as_of_date,\n", "                fcc_provider_count, fcc_fiber_provider_count, fcc_cable_provider_count,\n", "                fcc_fixed_wireless_provider_count,\n", "                
fcc_max_advertised_download_mbps, fcc_max_advertised_upload_mbps,\n", "                fcc_100_20_provider_count\n", "            from public.data_center_broadband_connection\n", "            \"\"\"\n", "            broadband_context = pd.read_sql(broadband_sql, conn)\n", "            print(f'broadband_context rows: {len(broadband_context):,}')\n", "            if not broadband_context.empty:\n", "                cols = [c for c in broadband_context.columns if c != 'master_id']\n", "                points = points.merge(broadband_context[['master_id'] + cols], on='master_id', how='left')\n", "\n", "        if SHOW_ELECTION_LAYER:\n", "            election_sql = \"\"\"\n", "            with best_match as (\n", "                select distinct on (m.master_id)\n", "                    m.master_id,\n", "                    m.state_code as election_state_code,\n", "                    m.join_method as election_join_method,\n", "                    m.match_distance_m as election_match_distance_m,\n", "                    f.feature_id, f.layer_id, f.properties,\n", "                    ST_Y(ST_PointOnSurface(f.geom)) as election_latitude,\n", "                    ST_X(ST_PointOnSurface(f.geom)) as election_longitude\n", "                from public.data_center_rdh_precinct_vote_matches m\n", "                join public.rdh_precinct_vote_features f\n", "                    on f.feature_id = m.feature_id and f.layer_id = m.layer_id\n", "                where f.geom is not null\n", "                order by m.master_id,\n", "                    case m.join_method when 'point_in_precinct' then 0 else 1 end,\n", "                    m.match_distance_m asc nulls last\n", "            )\n", "            select\n", "                master_id, election_state_code, election_join_method, election_match_distance_m,\n", "                feature_id, layer_id, election_latitude, election_longitude,\n", "                coalesce((properties->>'LOCALITY'), '') as election_locality,\n", "                coalesce((properties->>'PRECINCT'), '') as election_precinct,\n", "                nullif(properties->>'G20PREDBID','')::double precision as election_biden_votes,\n", "                nullif(properties->>'G20PRERTRU','')::double precision as election_trump_votes,\n", "                case\n", "                    when (coalesce(nullif(properties->>'G20PREDBID','')::double precision,0)\n", "                        + coalesce(nullif(properties->>'G20PRERTRU','')::double precision,0)) > 0\n", "                    then 100.0 * 
coalesce(nullif(properties->>'G20PREDBID','')::double precision,0)\n", "                        / (coalesce(nullif(properties->>'G20PREDBID','')::double precision,0)\n", "                        + coalesce(nullif(properties->>'G20PRERTRU','')::double precision,0))\n", "                end as election_biden_share_pct,\n", "                case\n", "                    when (coalesce(nullif(properties->>'G20PREDBID','')::double precision,0)\n", "                        + coalesce(nullif(properties->>'G20PRERTRU','')::double precision,0)) > 0\n", "                    then 100.0 * coalesce(nullif(properties->>'G20PRERTRU','')::double precision,0)\n", "                        / (coalesce(nullif(properties->>'G20PREDBID','')::double precision,0)\n", "                        + coalesce(nullif(properties->>'G20PRERTRU','')::double precision,0))\n", "                end as election_trump_share_pct\n", "            from best_match\n", "            \"\"\"\n", "            election_context = pd.read_sql(election_sql, conn)\n", "            if not election_context.empty:\n", "                election_context['election_trump_margin_pct'] = (\n", "                    election_context['election_trump_share_pct'] - election_context['election_biden_share_pct']\n", "                )\n", "            print(f'election_context rows: {len(election_context):,}')\n", "            if not election_context.empty:\n", "                cols = [c for c in election_context.columns if c != 'master_id']\n", "                points = points.merge(election_context[['master_id'] + cols], on='master_id', how='left')\n", "\n", "\n", "load_optional_db_layers()" ] }, { "cell_type": "markdown", "id": "7", "metadata": {}, "source": [ "## Optional DB-backed Layer Context\n", "\n", "This section pulls additional overlays directly from PostGIS:\n", "- `public.internet_cables` (line layer)\n", "- `public.opposition_cases_geocoded` (point layer)\n", "- `public.data_center_usdm_drought_exposure` (point popup enrichment)\n", "- `public.data_center_hms_smoke_exposure` (point popup enrichment)\n", "- `public.data_center_historical_climate` (climate layer and point popup enrichment)\n", "- `public.data_center_broadband_connection` (broadband layer and point popup enrichment)\n", "- `public.data_center_rdh_precinct_vote_matches` joined to `public.rdh_precinct_vote_features` (election layer and point popup enrichment)\n", "\n", "If DB credentials are unavailable, map generation still works with CSV/GeoJSON sources." 
] }, { "cell_type": "markdown", "id": "8", "metadata": {}, "source": [ "## Map Helpers" ] }, { "cell_type": "code", "execution_count": null, "id": "9", "metadata": {}, "outputs": [], "source": [ "CLUSTER_COLORS = [\n", "    '#2563eb', '#dc2626', '#16a34a', '#9333ea', '#ea580c', '#0891b2',\n", "    '#be123c', '#4f46e5', '#65a30d', '#c026d3', '#0f766e', '#b45309',\n", "]\n", "NOISE_COLOR = '#9ca3af'\n", "CENTROID_COLOR = '#111827'\n", "STATE_ENERGY_COLOR = '#f59e0b'\n", "INTERNET_CABLE_COLOR = '#7c3aed'\n", "OPPOSITION_CASE_COLOR = '#b91c1c'\n", "\n", "cluster_info = clusters.set_index('cluster_id').to_dict('index')\n", "\n", "\n", "def clean_value(value):\n", "    if pd.isna(value):\n", "        return ''\n", "    return escape(str(value))\n", "\n", "\n", "def fmt_number(value, decimals=0, prefix='', suffix=''):\n", "    if pd.isna(value):\n", "        return ''\n", "    try:\n", "        value = float(value)\n", "    except (TypeError, ValueError):\n", "        return clean_value(value)\n", "    return f\"{prefix}{value:,.{decimals}f}{suffix}\"\n", "\n", "\n", "def cluster_color(cluster_id):\n", "    if cluster_id == -1:\n", "        return NOISE_COLOR\n", "    info = cluster_info.get(cluster_id, {})\n", "    rank = int(info.get('cluster_rank', cluster_id + 1))\n", "    return CLUSTER_COLORS[(rank - 1) % len(CLUSTER_COLORS)]\n", "\n", "\n", "def cluster_label_and_size(cluster_id):\n", "    if cluster_id == -1:\n", "        return 'Noise / isolated', '1', ''\n", "    info = cluster_info.get(cluster_id, {})\n", "    rank = int(info.get('cluster_rank', cluster_id + 1))\n", "    point_count = int(info.get('point_count', 0))\n", "    return f'Cluster ID {cluster_id}', f'{point_count:,}', f'Rank {rank} of {n_clusters} by size'\n", "\n", "\n", "def climate_color(mean_summer_c):\n", "    if pd.isna(mean_summer_c):\n", "        return '#94a3b8'\n", "    if mean_summer_c >= 32:\n", "        return '#7f1d1d'\n", "    if mean_summer_c >= 29:\n", "        return '#b91c1c'\n", "    if mean_summer_c >= 26:\n", "        return '#ea580c'\n", "    if mean_summer_c >= 23:\n", "        return '#f59e0b'\n", "    return 
'#0284c7'\n", "\n", "\n", "def broadband_color(provider_count):\n", "    if pd.isna(provider_count):\n", "        return '#94a3b8'\n", "    p = float(provider_count)\n", "    if p >= 20:\n", "        return '#166534'\n", "    if p >= 10:\n", "        return '#16a34a'\n", "    if p >= 5:\n", "        return '#65a30d'\n", "    if p >= 2:\n", "        return '#ca8a04'\n", "    return '#b45309'\n", "\n", "\n", "def election_color(margin_pct):\n", "    if pd.isna(margin_pct):\n", "        return '#94a3b8'\n", "    m = float(margin_pct)\n", "    if m >= 20:\n", "        return '#7f1d1d'\n", "    if m >= 5:\n", "        return '#dc2626'\n", "    if m <= -20:\n", "        return '#1e3a8a'\n", "    if m <= -5:\n", "        return '#2563eb'\n", "    return '#6b7280'\n", "\n", "\n", "def point_popup(row):\n", "    cluster_label, cluster_size, cluster_rank = cluster_label_and_size(row.cluster_id)\n", "    nearest = row.nearest_neighbor_km\n", "    nearest_text = f'{nearest:.2f} km' if pd.notna(nearest) else ''\n", "    title = clean_value(row.name) or clean_value(row.master_id)\n", "\n", "    huc8_lines = ''\n", "    if hasattr(row, 'huc8') and pd.notna(row.huc8):\n", "        huc8_lines = f'''\n", "
\n", " Watershed
\n", " HUC8: {clean_value(row.huc8)}
\n", " Name: {clean_value(row.huc8_name)}
\n", " States: {clean_value(row.huc8_states)}
\n", "        '''\n", "\n", "    ruca_lines = ''\n", "    if hasattr(row, 'primary_ruca') and pd.notna(row.primary_ruca):\n", "        ruca_lines = f'''\n", "
\n", " RUCA / tract context
\n", " RUCA band: {clean_value(row.ruca_band)}
\n", " RUCA code: {fmt_number(row.primary_ruca)}
\n", " {clean_value(row.primary_ruca_description)}
\n", " Median HH income: {fmt_number(row.median_household_income, prefix='$')}
\n", " Bachelor's+: {fmt_number(row.bachelor_or_higher_pct, 1, suffix='%')}
\n", " Poverty: {fmt_number(row.poverty_rate, 1, suffix='%')}
\n", " Non-Hispanic white: {fmt_number(row.non_hispanic_white_pct, 1, suffix='%')}
\n", "        '''\n", "\n", "    energy_lines = ''\n", "    if hasattr(row, 'im3_projected_it_power_mw') and pd.notna(row.im3_projected_it_power_mw):\n", "        if hasattr(row, 'seds_series_count') and pd.notna(row.seds_series_count):\n", "            seds_note = f\"SEDS year: {fmt_number(row.seds_latest_year)}; series: {fmt_number(row.seds_series_count)}\"\n", "        else:\n", "            seds_note = 'SEDS context: unavailable in seds_state_msn_year'\n", "        energy_lines = f'''\n", "
\n", " State energy demand context
\n", " IM3 projected IT power: {fmt_number(row.im3_projected_it_power_mw, suffix=' MW')}
\n", " IM3 cooling water demand: {fmt_number(row.im3_cooling_water_demand_mgy, 1, suffix=' MGY')}
\n", " IM3 water consumption: {fmt_number(row.im3_cooling_water_consumption_mgy, 1, suffix=' MGY')}
\n", " IM3 avg siting score: {fmt_number(row.im3_avg_weighted_siting_score, 3)}
\n", "        {seds_note}\n", "        '''\n", "\n", "    drought_lines = ''\n", "    if hasattr(row, 'usdm_status') and pd.notna(row.usdm_status):\n", "        drought_lines = f'''\n", "
\n", " Drought context (USDM)
\n", " Status: {clean_value(row.usdm_status)}
\n", " Worst DM category: {fmt_number(row.worst_dm_category)}
\n", " Mean DM category: {fmt_number(row.mean_dm_category, 2)}
\n", " % weeks D2+: {fmt_number(row.pct_weeks_in_d2_or_worse, 1, suffix='%')}
\n", " % weeks D3+: {fmt_number(row.pct_weeks_in_d3_or_worse, 1, suffix='%')}
\n", " Longest D2 streak: {fmt_number(row.longest_d2_streak_weeks)} weeks
\n", " Longest D3 streak: {fmt_number(row.longest_d3_streak_weeks)} weeks
\n", "        '''\n", "\n", "    smoke_lines = ''\n", "    if hasattr(row, 'hms_status') and pd.notna(row.hms_status):\n", "        smoke_lines = f'''\n", "
\n", " Wildfire smoke context (HMS)
\n", " Status: {clean_value(row.hms_status)}
\n", " Observed days: {fmt_number(row.days_observed)}
\n", " Any-smoke days: {fmt_number(row.days_with_any_smoke)} ({fmt_number(row.pct_days_with_any_smoke, 1, suffix='%')})
\n", " Heavy-smoke days: {fmt_number(row.days_with_heavy_smoke)} ({fmt_number(row.pct_days_with_heavy_smoke, 1, suffix='%')})
\n", " Worst density class: {clean_value(row.worst_density)}
\n", " Mean density rank: {fmt_number(row.mean_density_rank, 2)}
\n", "        '''\n", "\n", "    climate_lines = ''\n", "    if hasattr(row, 'mean_summer_temperature_c') and pd.notna(row.mean_summer_temperature_c):\n", "        climate_lines = f'''\n", "
\n", " Climate context
\n", " Mean annual temp: {fmt_number(row.mean_annual_temperature_c, 1, suffix=' C')}
\n", " Mean summer temp: {fmt_number(row.mean_summer_temperature_c, 1, suffix=' C')}
\n", " Max wet-bulb temp: {fmt_number(row.max_wet_bulb_temperature_c, 1, suffix=' C')}
\n", " Extreme heat days: {fmt_number(row.extreme_heat_days)}
\n", " Annual CDD mean: {fmt_number(row.annual_cooling_degree_days_c_mean, 0)}
\n", " Annual precip mean: {fmt_number(row.annual_precipitation_mm_mean, 0, suffix=' mm')}
\n", "        '''\n", "\n", "    broadband_lines = ''\n", "    if hasattr(row, 'fcc_bdc_status') and pd.notna(row.fcc_bdc_status):\n", "        broadband_lines = f'''\n", "
\n", " Broadband context
\n", " FCC BDC status: {clean_value(row.fcc_bdc_status)}
\n", " FCC as-of date: {clean_value(row.fcc_bdc_as_of_date)}
\n", " Census broadband subscription: {fmt_number(row.census_broadband_subscription_pct, 1, suffix='%')}
\n", " Provider count: {fmt_number(row.fcc_provider_count)}
\n", " Fiber providers: {fmt_number(row.fcc_fiber_provider_count)}
\n", " Cable providers: {fmt_number(row.fcc_cable_provider_count)}
\n", " Fixed wireless providers: {fmt_number(row.fcc_fixed_wireless_provider_count)}
\n", " Max advertised down/up: {fmt_number(row.fcc_max_advertised_download_mbps, 0, suffix=' /')} {fmt_number(row.fcc_max_advertised_upload_mbps, 0, suffix=' Mbps')}
\n", " Providers >=100/20: {fmt_number(row.fcc_100_20_provider_count)}
\n", "        '''\n", "\n", "    election_lines = ''\n", "    if hasattr(row, 'election_biden_share_pct') and pd.notna(row.election_biden_share_pct):\n", "        election_lines = f'''\n", "
\n", " Election context (2020 precinct)
\n", " State: {clean_value(row.election_state_code)}
\n", " Locality: {clean_value(row.election_locality)}
\n", " Precinct: {clean_value(row.election_precinct)}
\n", " Biden share: {fmt_number(row.election_biden_share_pct, 1, suffix='%')}
\n", " Trump share: {fmt_number(row.election_trump_share_pct, 1, suffix='%')}
\n", " Trump margin: {fmt_number(row.election_trump_margin_pct, 1, suffix=' pp')}
\n", " Join method: {clean_value(row.election_join_method)}
\n", "        '''\n", "\n", "    return folium.Popup(f'''\n", "
\n", " {title}
\n", " {clean_value(row.city)}, {clean_value(row.state)}
\n", "
\n", " {cluster_label}
\n", " {cluster_rank}
\n", " Cluster size: {cluster_size} data center(s)
\n", " Source: {clean_value(row.source)}
\n", " Operator: {clean_value(row.operator)}
\n", " Nearest neighbor: {nearest_text}
\n", " Master ID: {clean_value(row.master_id)}\n", " {huc8_lines}\n", " {ruca_lines}\n", " {energy_lines}\n", " {drought_lines}\n", " {smoke_lines}\n", " {climate_lines}\n", " {broadband_lines}\n", " {election_lines}\n", "
\n", "    ''', max_width=460)\n", "\n", "\n", "def centroid_popup(row):\n", "    return folium.Popup(f'''\n", "
\n", " Cluster ID {int(row.cluster_id)}
\n", " Rank {int(row.cluster_rank)} of {n_clusters} by size
\n", "
\n", " Points: {int(row.point_count):,}
\n", " p50 radius: {row.radius_km_p50:.1f} km
\n", " p90 radius: {row.radius_km_p90:.1f} km
\n", " Max radius: {row.radius_km_max:.1f} km
\n", " States: {clean_value(row.states)}
\n", " Cities: {clean_value(row.cities)}
\n", " Operators: {clean_value(row.operators)}\n", "
\n", "    ''', max_width=420)\n", "\n", "\n", "def huc8_style(feature):\n", "    count = feature['properties'].get('data_center_count') or 0\n", "    if count >= 100:\n", "        fill = '#075985'\n", "    elif count >= 50:\n", "        fill = '#0284c7'\n", "    elif count >= 20:\n", "        fill = '#38bdf8'\n", "    elif count >= 10:\n", "        fill = '#7dd3fc'\n", "    else:\n", "        fill = '#bae6fd'\n", "    return {'fillColor': fill, 'color': '#0369a1', 'weight': 1, 'fillOpacity': 0.22}\n", "\n", "\n", "def huc8_popup(feature):\n", "    p = feature['properties']\n", "    return folium.Popup(f'''\n", "
\n", " {clean_value(p.get('name'))}
\n", " HUC8: {clean_value(p.get('huc8'))}
\n", " States: {clean_value(p.get('states'))}
\n", "
\n", " Data centers: {fmt_number(p.get('data_center_count'))}
\n", " Clustered DCs: {fmt_number(p.get('clustered_data_center_count'))}
\n", " Distinct clusters: {fmt_number(p.get('cluster_count'))}
\n", " Area: {fmt_number(p.get('areasqkm'), 0, suffix=' sq km')}\n", "
\n", "    ''', max_width=360)\n", "\n", "\n", "def state_energy_popup(row):\n", "    if hasattr(row, 'seds_series_count') and pd.notna(row.seds_series_count):\n", "        seds_note = f\"SEDS latest year: {fmt_number(row.seds_latest_year)}; series: {fmt_number(row.seds_series_count)}\"\n", "    else:\n", "        seds_note = 'SEDS context: unavailable in seds_state_msn_year'\n", "    return folium.Popup(f'''\n", "
\n", " {clean_value(row.state_code)} state energy context
\n", " Current data centers: {fmt_number(row.current_data_center_count)}
\n", "
\n", " IM3 projected sites: {fmt_number(row.im3_project_count)}
\n", " IM3 projected IT power: {fmt_number(row.im3_projected_it_power_mw, suffix=' MW')}
\n", " IM3 cooling water demand: {fmt_number(row.im3_cooling_water_demand_mgy, 1, suffix=' MGY')}
\n", " IM3 water consumption: {fmt_number(row.im3_cooling_water_consumption_mgy, 1, suffix=' MGY')}
\n", " IM3 avg siting score: {fmt_number(row.im3_avg_weighted_siting_score, 3)}
\n", " {seds_note}\n", "
\n", "    ''', max_width=380)\n", "\n", "\n", "def cable_style(_feature):\n", "    return {'color': INTERNET_CABLE_COLOR, 'weight': 1.6, 'opacity': 0.45}\n", "\n", "\n", "def cable_popup(feature):\n", "    p = feature.get('properties', {})\n", "    return folium.Popup(f'''\n", "
\n", " {clean_value(p.get('name') or 'Internet cable')}
\n", " Owners: {clean_value(p.get('owners'))}
\n", " Type: {clean_value(p.get('cable_type'))}
\n", " RFS year: {fmt_number(p.get('rfs_year'))}
\n", " Decommission year: {fmt_number(p.get('decommission_year'))}
\n", " Length: {fmt_number(p.get('length_km'), 0, suffix=' km')}
\n", " Feature ID: {clean_value(p.get('feature_id'))}\n", "
\n", "    ''', max_width=380)\n", "\n", "\n", "def opposition_popup(row):\n", "    return folium.Popup(f'''\n", "
\n", " Opposition case {fmt_number(row.id)}
\n", " Location: {clean_value(row.location)}
\n", " State: {clean_value(row.state)}
\n", "
\n", " Status: {clean_value(row.status)}
\n", " Developer: {clean_value(row.developer)}
\n", " Investment: {fmt_number(row.investment_billion, 2, prefix='$', suffix='B')}
\n", " Opposition type: {clean_value(row.opposition_type)}
\n", " Commons type: {clean_value(row.commons_type)}
\n", " Governance response: {clean_value(row.governance_response)}
\n", " Outcome: {clean_value(row.outcome)}
\n", " Source: {clean_value(row.data_source)}\n", "
\n", "    ''', max_width=400)\n" ] }, { "cell_type": "code", "execution_count": null, "id": "10", "metadata": {}, "outputs": [], "source": [ "def add_overlay_legend(map_obj: folium.Map) -> None:\n", "    legend_html = \"\"\"\n", "
\n", "
Overlay Legend
\n", "\n", "
Climate (mean summer temperature)
\n", "
< 23 C
\n", "
23-25.9 C
\n", "
26-28.9 C
\n", "
29-31.9 C
\n", "
>= 32 C
\n", "\n", "
Broadband (FCC provider count)
\n", "
0-1
\n", "
2-4
\n", "
5-9
\n", "
10-19
\n", "
>= 20
\n", "\n", "
Election (Trump margin, pp)
\n", "
<= -20
\n", "
-19.9 to -5
\n", "
-4.9 to 4.9
\n", "
5 to 19.9
\n", "
>= 20
\n", "
\n", "    \"\"\"\n", "    map_obj.get_root().html.add_child(folium.Element(legend_html))" ] }, { "cell_type": "markdown", "id": "11", "metadata": {}, "source": [ "## Build The Map" ] }, { "cell_type": "code", "execution_count": null, "id": "12", "metadata": {}, "outputs": [], "source": [ "def build_cluster_map(points_df: pd.DataFrame, clusters_df: pd.DataFrame) -> folium.Map:\n", "    m = folium.Map(location=MAP_CENTER, zoom_start=MAP_ZOOM, tiles=BASE_TILES, control_scale=True)\n", "    plugins.Fullscreen(position='topleft').add_to(m)\n", "    plugins.MeasureControl(position='topleft', primary_length_unit='kilometers').add_to(m)\n", "    plugins.MiniMap(toggle_display=True, minimized=True).add_to(m)\n", "\n", "    huc8_layer = folium.FeatureGroup(name='HUC8 watersheds with data centers', show=False)\n", "    state_energy_layer = folium.FeatureGroup(name='State energy demand context (IM3 / SEDS)', show=False)\n", "    cables_layer = folium.FeatureGroup(name='Internet cable network', show=False)\n", "    opposition_layer = folium.FeatureGroup(name='Opposition cases', show=False)\n", "    climate_layer = folium.FeatureGroup(name='Climate stress context', show=False)\n", "    broadband_layer = folium.FeatureGroup(name='Broadband capacity context', show=False)\n", "    election_layer = folium.FeatureGroup(name='Election context (2020 precinct match)', show=False)\n", "    clustered_layer = folium.FeatureGroup(name='Data centers: clustered', show=True)\n", "    noise_layer = folium.FeatureGroup(name='Data centers: noise / isolated', show=True)\n", "    centroid_layer = folium.FeatureGroup(name='Cluster centroids and p90 radius', show=True)\n", "\n", "    if SHOW_HUC8_LAYER and huc8_geojson is not None:\n", "        folium.GeoJson(\n", "            huc8_geojson,\n", "            name='HUC8 watersheds with data centers',\n", "            style_function=huc8_style,\n", "            highlight_function=lambda feature: {'weight': 3, 'fillOpacity': 0.35},\n", "            tooltip=folium.GeoJsonTooltip(\n", "                fields=['name', 'huc8', 'data_center_count', 'cluster_count'],\n", "                
aliases=['HUC8', 'Code', 'Data centers', 'Clusters'],\n", "                localize=True,\n", "                sticky=False,\n", "            ),\n", "            popup=huc8_popup,\n", "        ).add_to(huc8_layer)\n", "\n", "    if SHOW_STATE_ENERGY_LAYER and not state_energy.empty:\n", "        for row in state_energy.dropna(subset=['map_latitude', 'map_longitude']).itertuples(index=False):\n", "            power = getattr(row, 'im3_projected_it_power_mw')\n", "            radius = 6 if pd.isna(power) else max(6, min(28, 4 + float(power) ** 0.5 / 2.4))\n", "            folium.CircleMarker(\n", "                location=[row.map_latitude, row.map_longitude],\n", "                radius=radius,\n", "                color='#92400e',\n", "                fill=True,\n", "                fill_color=STATE_ENERGY_COLOR,\n", "                fill_opacity=0.55,\n", "                weight=1.5,\n", "                popup=state_energy_popup(row),\n", "                tooltip=f'{row.state_code}: IM3 {fmt_number(power, suffix=\" MW\")}',\n", "            ).add_to(state_energy_layer)\n", "\n", "    if SHOW_INTERNET_CABLES_LAYER and internet_cables_geojson is not None:\n", "        folium.GeoJson(\n", "            internet_cables_geojson,\n", "            name='Internet cable network',\n", "            style_function=cable_style,\n", "            highlight_function=lambda _f: {'weight': 3.0, 'opacity': 0.85},\n", "            popup=cable_popup,\n", "            tooltip=folium.GeoJsonTooltip(\n", "                fields=['name', 'cable_type', 'rfs_year'],\n", "                aliases=['Cable', 'Type', 'RFS year'],\n", "                localize=True,\n", "                sticky=False,\n", "            ),\n", "        ).add_to(cables_layer)\n", "\n", "    if SHOW_OPPOSITION_CASES_LAYER and not opposition_cases.empty:\n", "        for row in opposition_cases.itertuples(index=False):\n", "            marker_radius = 5 if pd.isna(row.investment_billion) else max(5, min(14, 4 + float(row.investment_billion) ** 0.5 * 2.2))\n", "            folium.CircleMarker(\n", "                location=[row.lat, row.lon],\n", "                radius=marker_radius,\n", "                color='#7f1d1d',\n", "                fill=True,\n", "                fill_color=OPPOSITION_CASE_COLOR,\n", "                fill_opacity=0.75,\n", "                weight=1.2,\n", "                popup=opposition_popup(row),\n", "                tooltip=f\"Opposition case: {row.state} ({clean_value(row.status)})\",\n", "            ).add_to(opposition_layer)\n", "\n", "    if 
SHOW_CLIMATE_LAYER:\n", " climate_rows = points_df.dropna(subset=['mean_summer_temperature_c']) if 'mean_summer_temperature_c' in points_df.columns else pd.DataFrame()\n", " for row in climate_rows.itertuples(index=False):\n", " color = climate_color(row.mean_summer_temperature_c)\n", " radius = max(4, min(12, 3 + (float(row.extreme_heat_days) if pd.notna(row.extreme_heat_days) else 0.0) ** 0.5 / 2.0))\n", " folium.CircleMarker(\n", " location=[row.latitude, row.longitude],\n", " radius=radius,\n", " color=color,\n", " fill=True,\n", " fill_color=color,\n", " fill_opacity=0.35,\n", " weight=1,\n", " tooltip=f\"Climate: summer {fmt_number(row.mean_summer_temperature_c, 1, suffix=' C')}; heat days {fmt_number(row.extreme_heat_days)}\",\n", " ).add_to(climate_layer)\n", "\n", " if SHOW_BROADBAND_LAYER:\n", " bb_rows = points_df.dropna(subset=['fcc_provider_count']) if 'fcc_provider_count' in points_df.columns else pd.DataFrame()\n", " for row in bb_rows.itertuples(index=False):\n", " color = broadband_color(row.fcc_provider_count)\n", " speed = float(row.fcc_max_advertised_download_mbps) if pd.notna(row.fcc_max_advertised_download_mbps) else 0.0\n", " radius = max(4, min(12, 4 + speed ** 0.5 / 10.0))\n", " folium.CircleMarker(\n", " location=[row.latitude, row.longitude],\n", " radius=radius,\n", " color=color,\n", " fill=True,\n", " fill_color=color,\n", " fill_opacity=0.3,\n", " weight=1,\n", " tooltip=f\"Broadband: providers {fmt_number(row.fcc_provider_count)}; max down {fmt_number(row.fcc_max_advertised_download_mbps, 0, suffix=' Mbps')}\",\n", " ).add_to(broadband_layer)\n", "\n", " if SHOW_ELECTION_LAYER and not election_context.empty:\n", " for row in election_context.dropna(subset=['election_latitude', 'election_longitude']).itertuples(index=False):\n", " margin = getattr(row, 'election_trump_margin_pct')\n", " color = election_color(margin)\n", " radius = max(4, min(11, 4 + abs(float(margin)) / 8.0)) if pd.notna(margin) else 5\n", " tip = (\n", " f\"Election 
precinct: {row.election_state_code} {clean_value(row.election_locality)}; \"\n", " f\"Biden {fmt_number(row.election_biden_share_pct, 1, suffix='%')} / \"\n", " f\"Trump {fmt_number(row.election_trump_share_pct, 1, suffix='%')}\"\n", " )\n", " folium.CircleMarker(\n", " location=[row.election_latitude, row.election_longitude],\n", " radius=radius,\n", " color=color,\n", " fill=True,\n", " fill_color=color,\n", " fill_opacity=0.4,\n", " weight=1,\n", " tooltip=tip,\n", " ).add_to(election_layer)\n", "\n", " bounds = []\n", " for row in points_df.itertuples(index=False):\n", " cluster_label, cluster_size, _ = cluster_label_and_size(row.cluster_id)\n", " marker = folium.CircleMarker(\n", " location=[row.latitude, row.longitude],\n", " radius=NOISE_RADIUS if row.cluster_id == -1 else CLUSTERED_RADIUS,\n", " color=cluster_color(row.cluster_id),\n", " fill=True,\n", " fill_opacity=0.75,\n", " weight=1,\n", " popup=point_popup(row),\n", " tooltip=f'{cluster_label}; size={cluster_size}',\n", " )\n", " if row.cluster_id == -1:\n", " marker.add_to(noise_layer)\n", " else:\n", " marker.add_to(clustered_layer)\n", " bounds.append([row.latitude, row.longitude])\n", "\n", " for row in clusters_df.itertuples(index=False):\n", " color = cluster_color(int(row.cluster_id))\n", " location = [row.centroid_latitude, row.centroid_longitude]\n", " if SHOW_CENTROID_P90_CIRCLES and pd.notna(row.radius_km_p90):\n", " folium.Circle(\n", " location=location,\n", " radius=float(row.radius_km_p90) * 1000,\n", " color=color,\n", " fill=False,\n", " weight=1,\n", " opacity=0.45,\n", " ).add_to(centroid_layer)\n", " folium.CircleMarker(\n", " location=location,\n", " radius=CENTROID_RADIUS,\n", " color=CENTROID_COLOR,\n", " fill=True,\n", " fill_color=color,\n", " fill_opacity=0.95,\n", " weight=2,\n", " popup=centroid_popup(row),\n", " tooltip=f'Cluster {int(row.cluster_id)} centroid; {int(row.point_count):,} points',\n", " ).add_to(centroid_layer)\n", "\n", " huc8_layer.add_to(m)\n", " 
state_energy_layer.add_to(m)\n", " cables_layer.add_to(m)\n", " opposition_layer.add_to(m)\n", " climate_layer.add_to(m)\n", " broadband_layer.add_to(m)\n", " election_layer.add_to(m)\n", " clustered_layer.add_to(m)\n", " noise_layer.add_to(m)\n", " centroid_layer.add_to(m)\n", " folium.LayerControl(collapsed=False).add_to(m)\n", " if bounds:\n", " m.fit_bounds(bounds, padding=(20, 20))\n", " return m\n", "\n", "\n", "cluster_map = build_cluster_map(points, clusters)\n", "cluster_map\n" ] }, { "cell_type": "markdown", "id": "13", "metadata": {}, "source": [ "## Export HTML" ] }, { "cell_type": "code", "execution_count": null, "id": "14", "metadata": {}, "outputs": [], "source": [ "cluster_map.save(MAP_HTML)\n", "print('Wrote:', MAP_HTML.resolve())" ] }, { "cell_type": "markdown", "id": "15", "metadata": {}, "source": [ "## Feature Staging Area\n", "\n", "Tell me what you want to add next and I will build it here. Good candidates:\n", "- filters by source/operator/state/cluster size\n", "- toggle layers for top-N clusters\n", "- water-stress overlays on top of the HUC8 layer\n", "- generator capacity / fuel mix overlays around each DC\n", "- opposition cases overlay\n", "- cluster labels or summary panels\n", "- downloadable GeoJSON exports\n" ] } ], "metadata": { "kernelspec": { "display_name": ".venv", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.14.5" } }, "nbformat": 4, "nbformat_minor": 5 }