extended broadband, fema table, updated map

2026-05-22 20:14:11 -07:00
parent 6afa97e0ba
commit c81dba025b
6 changed files with 64044 additions and 33182 deletions
--- a/fema_nri_data_centers.ipynb
+++ b/fema_nri_data_centers.ipynb
@@ -0,0 +1,623 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "0",
+   "metadata": {},
+   "source": [
+    "# FEMA National Risk Index Exposure for Master Data Centers\n",
+    "\n",
+    "Builds two tables from the FEMA NRI (December 2025) Census Tract shapefile in `data/NRI_Shapefile_CensusTracts/`:\n",
+    "\n",
+    "1. **`public.nri_census_tracts`** — full NRI by census tract (~84k rows). All 460+ NRI columns plus the polygon geometry. Useful for any later spatial / tract-level work.\n",
+    "2. **`public.data_center_nri_exposure`** — per-DC summary keyed by `master_id`, joinable to `master_data_centers`. Each row carries the NRI fields of the census tract that contains the DC point (composite scores + per-hazard risk scores/ratings).\n",
+    "\n",
+    "**NRI composite metrics** (tract level):\n",
+    "\n",
+    "| Field | Meaning |\n",
+    "|---|---|\n",
+    "| `RISK_SCORE` / `RISK_RATNG` | Overall National Risk Index score & rating |\n",
+    "| `EAL_SCORE` / `EAL_RATNG` | Expected Annual Loss score & rating |\n",
+    "| `SOVI_SCORE` / `SOVI_RATNG` | Social Vulnerability score & rating |\n",
+    "| `RESL_SCORE` / `RESL_RATNG` | Community Resilience score & rating |\n",
+    "\n",
+    "**Per-hazard fields** (18 hazards: AVLN, CFLD, CWAV, DRGT, ERQK, HAIL, HWAV, HRCN, ISTM, IFLD, LNDS, LTNG, SWND, TRND, TSUN, VLCN, WFIR, WNTW):\n",
+    "- `{HZ}_AFREQ` — annualized event frequency\n",
+    "- `{HZ}_EXPT` — total exposure value\n",
+    "- `{HZ}_EALT` — total expected annual loss (dollars)\n",
+    "- `{HZ}_RISKS` / `{HZ}_RISKR` — risk score & rating\n",
+    "\n",
+    "Source: FEMA NRI v1.20.0 (December 2025). Census tract is the most granular layer FEMA publishes.\n",
+    "\n",
+    "**Null handling:** NRI shapefiles use `-9999` for nulls; we convert to SQL `NULL` on load.\n",
+    "\n",
+    "**Coverage:** All US states + DC + PR + territories. A DC point outside any NRI tract gets `nri_status='no_coverage'`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "1",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "import time\n",
+    "from pathlib import Path\n",
+    "\n",
+    "import geopandas as gpd\n",
+    "import numpy as np\n",
+    "import pandas as pd\n",
+    "import psycopg2\n",
+    "from psycopg2 import sql\n",
+    "from psycopg2.extras import execute_values\n",
+    "from shapely import wkb as shapely_wkb\n",
+    "from sqlalchemy import create_engine\n",
+    "\n",
+    "pd.set_option('display.max_columns', 100)\n",
+    "pd.set_option('display.max_rows', 120)\n",
+    "\n",
+    "print('geopandas:', gpd.__version__)\n",
+    "print('pandas:   ', pd.__version__)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "2",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def load_env_file(env_path: str = '.env') -> None:\n",
+    "    p = Path(env_path)\n",
+    "    if not p.exists():\n",
+    "        print(f'No {env_path} file found in {Path.cwd()}')\n",
+    "        return\n",
+    "    loaded = 0\n",
+    "    for raw_line in p.read_text(encoding='utf-8').splitlines():\n",
+    "        line = raw_line.strip()\n",
+    "        if not line or line.startswith('#') or '=' not in line:\n",
+    "            continue\n",
+    "        key, value = line.split('=', 1)\n",
+    "        key = key.strip()\n",
+    "        value = value.strip().strip('\"').strip(\"'\")\n",
+    "        if key and key not in os.environ:\n",
+    "            os.environ[key] = value\n",
+    "            loaded += 1\n",
+    "    print(f'Loaded {loaded} env var(s) from {env_path}')\n",
+    "\n",
+    "\n",
+    "def require_env(keys):\n",
+    "    missing = [k for k in keys if not os.getenv(k)]\n",
+    "    if missing:\n",
+    "        raise EnvironmentError('Missing required env vars: ' + ', '.join(missing))\n",
+    "\n",
+    "\n",
+    "load_env_file('.env')\n",
+    "require_env(['PGWEB_HOST', 'PGWEB_PORT', 'PGWEB_USER', 'PGWEB_PASSWORD'])\n",
+    "\n",
+    "MASTER_TABLE   = 'public.master_data_centers'\n",
+    "TRACTS_TABLE   = 'public.nri_census_tracts'\n",
+    "EXPOSURE_TABLE = 'public.data_center_nri_exposure'\n",
+    "\n",
+    "\n",
+    "def get_conn():\n",
+    "    return psycopg2.connect(\n",
+    "        host=os.environ['PGWEB_HOST'],\n",
+    "        port=os.environ['PGWEB_PORT'],\n",
+    "        user=os.environ['PGWEB_USER'],\n",
+    "        password=os.environ['PGWEB_PASSWORD'],\n",
+    "        dbname='data_centers',\n",
+    "    )\n",
+    "\n",
+    "\n",
+    "def get_engine():\n",
+    "    return create_engine(\n",
+    "        f\"postgresql+psycopg2://{os.environ['PGWEB_USER']}:{os.environ['PGWEB_PASSWORD']}\"\n",
+    "        f\"@{os.environ['PGWEB_HOST']}:{os.environ['PGWEB_PORT']}/data_centers\"\n",
+    "    )\n",
+    "\n",
+    "\n",
+    "with get_conn() as conn:\n",
+    "    with conn.cursor() as cur:\n",
+    "        cur.execute('select current_database(), current_user')\n",
+    "        print('Connected:', cur.fetchone())\n",
+    "        cur.execute('create extension if not exists postgis')\n",
+    "        cur.execute('select to_regclass(%s)', (MASTER_TABLE,))\n",
+    "        if cur.fetchone()[0] is None:\n",
+    "            raise RuntimeError(f'{MASTER_TABLE} missing')\n",
+    "        cur.execute(sql.SQL('select count(*) from {}').format(sql.SQL(MASTER_TABLE)))\n",
+    "        print(f'{MASTER_TABLE}: {cur.fetchone()[0]:,} rows')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "3",
+   "metadata": {},
+   "source": [
+    "## Parameters"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "4",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "NRI_DIR = Path('data/NRI_Shapefile_CensusTracts')\n",
+    "NRI_SHP = NRI_DIR / 'NRI_Shapefile_CensusTracts.shp'\n",
+    "\n",
+    "# If True, drop and rebuild the census-tracts NRI table from scratch.\n",
+    "# If False, only build it when missing or empty.\n",
+    "RELOAD_TRACTS = False\n",
+    "\n",
+    "# If True, drop and rebuild the per-DC exposure summary table.\n",
+    "RECOMPUTE_SUMMARY = True\n",
+    "\n",
+    "# 18 NRI hazard prefixes (matches NRI_HazardInfo.csv \"Prefix\" column).\n",
+    "HAZARDS = [\n",
+    "    'AVLN', 'CFLD', 'CWAV', 'DRGT', 'ERQK', 'HAIL', 'HWAV', 'HRCN', 'ISTM',\n",
+    "    'LNDS', 'LTNG', 'IFLD', 'SWND', 'TRND', 'TSUN', 'VLCN', 'WFIR', 'WNTW',\n",
+    "]\n",
+    "\n",
+    "assert NRI_SHP.exists(), f'{NRI_SHP.resolve()} not found'\n",
+    "print(f'NRI shapefile: {NRI_SHP}')\n",
+    "print(f'  size: {NRI_SHP.stat().st_size / 1e6:.0f} MB')\n",
+    "print(f'Hazards: {len(HAZARDS)}')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5",
+   "metadata": {},
+   "source": [
+    "## Discover NRI Schema\n",
+    "\n",
+    "Read the shapefile header (no geometries, no rows) to derive the column list and dtypes. We map dtypes to PostgreSQL types so the rest of the notebook doesn't hardcode 460 column names."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "6",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Read 1 row just to get the column dtypes — much faster than reading the full 766 MB shapefile.\n",
+    "_schema_gdf = gpd.read_file(NRI_SHP, rows=1)\n",
+    "print(f'NRI columns: {len(_schema_gdf.columns)} (incl. geometry)')\n",
+    "print(f'CRS: {_schema_gdf.crs}')\n",
+    "\n",
+    "\n",
+    "def pg_type_for(dtype, colname: str) -> str:\n",
+    "    \"\"\"Map pandas dtype + column name to a PostgreSQL column type.\"\"\"\n",
+    "    s = str(dtype)\n",
+    "    if s.startswith('int'):\n",
+    "        return 'bigint'\n",
+    "    if s.startswith('float'):\n",
+    "        return 'double precision'\n",
+    "    return 'text'\n",
+    "\n",
+    "\n",
+    "# Drop shapefile-only bookkeeping fields; keep geometry separate.\n",
+    "DROP_COLS = {'geometry', 'Shape_Leng', 'Shape_Area'}\n",
+    "NRI_COLS = [c for c in _schema_gdf.columns if c not in DROP_COLS]\n",
+    "\n",
+    "NRI_COL_TYPES = [(c, pg_type_for(_schema_gdf[c].dtype, c)) for c in NRI_COLS]\n",
+    "print('First 8 NRI cols:', NRI_COL_TYPES[:8])\n",
+    "print('Last 4 NRI cols: ', NRI_COL_TYPES[-4:])\n",
+    "\n",
+    "# Sanity-check: each hazard prefix should contribute multiple columns.\n",
+    "_per_hz = {h: sum(1 for c in NRI_COLS if c.startswith(h + '_')) for h in HAZARDS}\n",
+    "print('Cols per hazard prefix:', _per_hz)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7",
+   "metadata": {},
+   "source": [
+    "## Create Tables\n",
+    "\n",
+    "`public.nri_census_tracts` carries the full NRI schema (one row per census tract, ~84k rows) plus the polygon geometry. `public.data_center_nri_exposure` is keyed on `master_id` and stores DC identity fields + the NRI fields of the containing tract."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "8",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def quote_ident(name: str) -> str:\n",
+    "    return '\"' + name.replace('\"', '\"\"') + '\"'\n",
+    "\n",
+    "\n",
+    "nri_col_defs = ',\\n    '.join(f'{quote_ident(c)} {t}' for c, t in NRI_COL_TYPES)\n",
+    "\n",
+    "CREATE_TRACTS_SQL = f'''\n",
+    "create table if not exists {TRACTS_TABLE} (\n",
+    "    {nri_col_defs},\n",
+    "    geom geometry(MultiPolygon, 4326) not null\n",
+    ");\n",
+    "\n",
+    "create index if not exists nri_census_tracts_geom_gix\n",
+    "    on {TRACTS_TABLE} using gist (geom);\n",
+    "create index if not exists nri_census_tracts_tractfips_idx\n",
+    "    on {TRACTS_TABLE} (\"TRACTFIPS\");\n",
+    "create index if not exists nri_census_tracts_state_idx\n",
+    "    on {TRACTS_TABLE} (\"STATEABBRV\");\n",
+    "'''\n",
+    "\n",
+    "# Per-DC exposure: DC identity + the NRI fields of the containing tract.\n",
+    "CREATE_EXPOSURE_SQL = f'''\n",
+    "create table if not exists {EXPOSURE_TABLE} (\n",
+    "    master_id text primary key references public.master_data_centers(master_id) on delete cascade,\n",
+    "    source text, name text, operator text, city text, dc_state text, country text,\n",
+    "    longitude double precision, latitude double precision,\n",
+    "    geom geometry(Point, 4326),\n",
+    "\n",
+    "    nri_status text not null,   -- 'covered' or 'no_coverage'\n",
+    "    {nri_col_defs},\n",
+    "\n",
+    "    fetched_at timestamptz not null default now(),\n",
+    "    updated_at timestamptz not null default now()\n",
+    ");\n",
+    "\n",
+    "create index if not exists data_center_nri_exposure_state_idx\n",
+    "    on {EXPOSURE_TABLE} (dc_state);\n",
+    "create index if not exists data_center_nri_exposure_geom_gix\n",
+    "    on {EXPOSURE_TABLE} using gist (geom);\n",
+    "create index if not exists data_center_nri_exposure_risk_idx\n",
+    "    on {EXPOSURE_TABLE} (\"RISK_SCORE\");\n",
+    "create index if not exists data_center_nri_exposure_tract_idx\n",
+    "    on {EXPOSURE_TABLE} (\"TRACTFIPS\");\n",
+    "'''\n",
+    "\n",
+    "with get_conn() as conn:\n",
+    "    with conn.cursor() as cur:\n",
+    "        if RELOAD_TRACTS:\n",
+    "            cur.execute(f'drop table if exists {EXPOSURE_TABLE} cascade')\n",
+    "            cur.execute(f'drop table if exists {TRACTS_TABLE} cascade')\n",
+    "        cur.execute(CREATE_TRACTS_SQL)\n",
+    "        cur.execute(CREATE_EXPOSURE_SQL)\n",
+    "        for t in (TRACTS_TABLE, EXPOSURE_TABLE):\n",
+    "            cur.execute(sql.SQL('select count(*) from {}').format(sql.SQL(t)))\n",
+    "            print(f'{t}: {cur.fetchone()[0]:,} rows')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "9",
+   "metadata": {},
+   "source": [
+    "## Load NRI Shapefile into PostGIS\n",
+    "\n",
+    "Reads the full 766 MB shapefile once, reprojects to EPSG:4326 (it ships as EPSG:3857), converts `-9999` sentinels to NULL, coerces any stray `Polygon` to `MultiPolygon`, and bulk-inserts via `execute_values`.\n",
+    "\n",
+    "`RELOAD_TRACTS=False` skips this step if the table is already populated, so re-runs are cheap."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "10",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def tracts_row_count() -> int:\n",
+    "    with get_conn() as conn:\n",
+    "        with conn.cursor() as cur:\n",
+    "            cur.execute(f'select count(*) from {TRACTS_TABLE}')\n",
+    "            return cur.fetchone()[0]\n",
+    "\n",
+    "\n",
+    "def load_nri_shapefile():\n",
+    "    t0 = time.time()\n",
+    "    print('Reading NRI shapefile (this takes ~1-2 min)...')\n",
+    "    gdf = gpd.read_file(NRI_SHP)\n",
+    "    print(f'  read {len(gdf):,} tracts, {len(gdf.columns)} cols in {time.time()-t0:.0f}s')\n",
+    "    print(f'  source CRS: {gdf.crs}')\n",
+    "\n",
+    "    if str(gdf.crs).lower() not in ('epsg:4326', 'wgs 84'):\n",
+    "        t = time.time()\n",
+    "        gdf = gdf.to_crs('EPSG:4326')\n",
+    "        print(f'  reprojected to EPSG:4326 in {time.time()-t:.0f}s')\n",
+    "\n",
+    "    # Drop shapefile bookkeeping cols we don't store.\n",
+    "    keep = [c for c in NRI_COLS] + ['geometry']\n",
+    "    gdf = gdf[keep]\n",
+    "\n",
+    "    # NRI shapefile uses -9999 for nulls in numeric fields.\n",
+    "    num_cols = [c for c, t in NRI_COL_TYPES if t in ('bigint', 'double precision')]\n",
+    "    t = time.time()\n",
+    "    gdf[num_cols] = gdf[num_cols].mask(gdf[num_cols] == -9999)\n",
+    "    # Strip whitespace from text fields and convert empty strings to NULL.\n",
+    "    txt_cols = [c for c, t in NRI_COL_TYPES if t == 'text']\n",
+    "    for c in txt_cols:\n",
+    "        gdf[c] = gdf[c].where(gdf[c].notna(), None)\n",
+    "        gdf[c] = gdf[c].map(lambda v: None if v is None or (isinstance(v, str) and v.strip() == '') else v)\n",
+    "    print(f'  nulled -9999 sentinels in {time.time()-t:.0f}s')\n",
+    "\n",
+    "    # Coerce geometry to MultiPolygon.\n",
+    "    from shapely.geometry import MultiPolygon\n",
+    "    def _to_multi(g):\n",
+    "        if g is None or g.is_empty:\n",
+    "            return None\n",
+    "        if g.geom_type == 'Polygon':\n",
+    "            return MultiPolygon([g])\n",
+    "        return g\n",
+    "    gdf['geometry'] = gdf['geometry'].apply(_to_multi)\n",
+    "    gdf = gdf[gdf['geometry'].notna()].copy()\n",
+    "    print(f'  {len(gdf):,} tracts with valid geometry')\n",
+    "\n",
+    "    # Build the insert. With ~84k rows and ~460 cols this is the slow step;\n",
+    "    # execute_values with a generous page_size keeps round-trips down.\n",
+    "    col_names = NRI_COLS  # ordered\n",
+    "    quoted_cols = ', '.join(quote_ident(c) for c in col_names)\n",
+    "    placeholders = ', '.join(['%s'] * len(col_names)) + ', ST_Multi(ST_GeomFromWKB(%s, 4326))'\n",
+    "\n",
+    "    insert_sql = (\n",
+    "        f'insert into {TRACTS_TABLE} ({quoted_cols}, geom) values %s'\n",
+    "    )\n",
+    "    template = '(' + placeholders + ')'\n",
+    "\n",
+    "    t = time.time()\n",
+    "    rows = []\n",
+    "    for r in gdf.itertuples(index=False, name=None):\n",
+    "        # itertuples respects column order: [NRI_COLS..., geometry]\n",
+    "        *vals, geom = r\n",
+    "        # Replace NaN with None for psycopg2.\n",
+    "        clean = [None if (isinstance(v, float) and np.isnan(v)) else v for v in vals]\n",
+    "        rows.append(tuple(clean) + (psycopg2.Binary(shapely_wkb.dumps(geom, hex=False)),))\n",
+    "    print(f'  built {len(rows):,} insert tuples in {time.time()-t:.0f}s')\n",
+    "\n",
+    "    t = time.time()\n",
+    "    with get_conn() as conn:\n",
+    "        with conn.cursor() as cur:\n",
+    "            cur.execute(f'truncate {TRACTS_TABLE}')\n",
+    "            execute_values(cur, insert_sql, rows, template=template, page_size=200)\n",
+    "        conn.commit()\n",
+    "    print(f'  inserted in {time.time()-t:.0f}s')\n",
+    "\n",
+    "    with get_conn() as conn:\n",
+    "        with conn.cursor() as cur:\n",
+    "            cur.execute(f'analyze {TRACTS_TABLE}')\n",
+    "    print(f'Done loading NRI in {(time.time()-t0)/60:.1f} min')\n",
+    "\n",
+    "\n",
+    "existing = tracts_row_count()\n",
+    "if RELOAD_TRACTS or existing == 0:\n",
+    "    load_nri_shapefile()\n",
+    "else:\n",
+    "    print(f'{TRACTS_TABLE} already has {existing:,} rows -- skipping (set RELOAD_TRACTS=True to rebuild)')\n",
+    "\n",
+    "print(f'\\n{TRACTS_TABLE}: {tracts_row_count():,} rows')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "11",
+   "metadata": {},
+   "source": [
+    "## Compute Per-DC NRI Exposure\n",
+    "\n",
+    "Single spatial join: for each DC point, find the census tract whose polygon contains it (`ST_Within`), and pull all NRI fields from that tract. DCs with no containing tract get `nri_status='no_coverage'` and NULL NRI fields.\n",
+    "\n",
+    "PostGIS uses the GIST index on `nri_census_tracts.geom` — with ~84k tract polygons and ~1,800 DC points this completes in seconds."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "12",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# NRI columns to copy from tract -> exposure row (all of them).\n",
+    "nri_quoted = [quote_ident(c) for c in NRI_COLS]\n",
+    "nri_select_from_t = ', '.join(f't.{q}' for q in nri_quoted)\n",
+    "nri_insert_cols   = ', '.join(nri_quoted)\n",
+    "\n",
+    "POPULATE_EXPOSURE_SQL = f'''\n",
+    "truncate {EXPOSURE_TABLE};\n",
+    "\n",
+    "with dc_tract as (\n",
+    "    -- Pick the single containing tract per DC. (ST_Within on point/polygon\n",
+    "    -- yields at most one match unless tract boundaries overlap, which they\n",
+    "    -- shouldn't; we DISTINCT ON master_id as a safety net.)\n",
+    "    select distinct on (dc.master_id)\n",
+    "        dc.master_id, dc.source, dc.name, dc.operator, dc.city, dc.state as dc_state,\n",
+    "        dc.country, dc.longitude, dc.latitude, dc.geom as dc_geom,\n",
+    "        t.*\n",
+    "    from {MASTER_TABLE} dc\n",
+    "    left join {TRACTS_TABLE} t on st_within(dc.geom, t.geom)\n",
+    "    where dc.geom is not null\n",
+    "    order by dc.master_id, t.\"TRACTFIPS\"\n",
+    ")\n",
+    "insert into {EXPOSURE_TABLE} (\n",
+    "    master_id, source, name, operator, city, dc_state, country, longitude, latitude, geom,\n",
+    "    nri_status, {nri_insert_cols}\n",
+    ")\n",
+    "select\n",
+    "    master_id, source, name, operator, city, dc_state, country, longitude, latitude, dc_geom,\n",
+    "    case when \"TRACTFIPS\" is not null then 'covered' else 'no_coverage' end as nri_status,\n",
+    "    {nri_insert_cols}\n",
+    "from dc_tract;\n",
+    "\n",
+    "analyze {EXPOSURE_TABLE};\n",
+    "'''\n",
+    "\n",
+    "if RECOMPUTE_SUMMARY:\n",
+    "    t = time.time()\n",
+    "    with get_conn() as conn:\n",
+    "        with conn.cursor() as cur:\n",
+    "            cur.execute('set local statement_timeout = 0')\n",
+    "            cur.execute(POPULATE_EXPOSURE_SQL)\n",
+    "        conn.commit()\n",
+    "    print(f'Populated {EXPOSURE_TABLE} in {time.time()-t:.0f}s')\n",
+    "\n",
+    "    with get_conn() as conn:\n",
+    "        with conn.cursor() as cur:\n",
+    "            cur.execute(f\"select nri_status, count(*) from {EXPOSURE_TABLE} group by nri_status order by 1\")\n",
+    "            print('Status distribution:')\n",
+    "            for r in cur.fetchall():\n",
+    "                print(' ', r)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "13",
+   "metadata": {},
+   "source": [
+    "## Inspect Results"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "14",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "engine = get_engine()\n",
+    "\n",
+    "overall_q = f'''\n",
+    "select\n",
+    "    count(*) as covered_dcs,\n",
+    "    round(avg(\"RISK_SCORE\")::numeric, 2) as avg_risk_score,\n",
+    "    round(avg(\"EAL_SCORE\")::numeric, 2)  as avg_eal_score,\n",
+    "    round(avg(\"SOVI_SCORE\")::numeric, 2) as avg_sovi_score,\n",
+    "    round(avg(\"RESL_SCORE\")::numeric, 2) as avg_resl_score,\n",
+    "    max(\"RISK_SCORE\") as max_risk_score,\n",
+    "    sum(case when \"RISK_RATNG\" in ('Relatively High','Very High') then 1 else 0 end) as n_high_risk\n",
+    "from {EXPOSURE_TABLE}\n",
+    "where nri_status = 'covered'\n",
+    "'''\n",
+    "\n",
+    "by_state_q = f'''\n",
+    "select\n",
+    "    dc_state,\n",
+    "    count(*) as n_dcs,\n",
+    "    round(avg(\"RISK_SCORE\")::numeric, 1)  as avg_risk_score,\n",
+    "    round(avg(\"EAL_SCORE\")::numeric, 1)   as avg_eal_score,\n",
+    "    round(avg(\"SOVI_SCORE\")::numeric, 1)  as avg_sovi_score,\n",
+    "    round(avg(\"RESL_SCORE\")::numeric, 1)  as avg_resl_score,\n",
+    "    sum(case when \"RISK_RATNG\" in ('Relatively High','Very High') then 1 else 0 end) as n_high_risk\n",
+    "from {EXPOSURE_TABLE}\n",
+    "where nri_status = 'covered'\n",
+    "group by dc_state\n",
+    "order by avg_risk_score desc nulls last\n",
+    "limit 15\n",
+    "'''\n",
+    "\n",
+    "# Top-10 most NRI-exposed individual DCs.\n",
+    "worst_q = f'''\n",
+    "select\n",
+    "    master_id, name, city, dc_state, \"TRACTFIPS\",\n",
+    "    round(\"RISK_SCORE\"::numeric, 1) as risk_score, \"RISK_RATNG\",\n",
+    "    round(\"EAL_SCORE\"::numeric,  1) as eal_score,  \"EAL_RATNG\",\n",
+    "    round(\"SOVI_SCORE\"::numeric, 1) as sovi_score, \"SOVI_RATNG\",\n",
+    "    round(\"RESL_SCORE\"::numeric, 1) as resl_score, \"RESL_RATNG\"\n",
+    "from {EXPOSURE_TABLE}\n",
+    "where nri_status = 'covered'\n",
+    "order by \"RISK_SCORE\" desc nulls last\n",
+    "limit 10\n",
+    "'''\n",
+    "\n",
+    "# Average per-hazard risk score across all DCs (which hazards dominate the DC portfolio?).\n",
+    "hazard_rollup_sql = 'select ' + ', '.join(\n",
+    "    f'round(avg(\"{h}_RISKS\")::numeric, 2) as \"{h.lower()}_risk\"'\n",
+    "    for h in HAZARDS\n",
+    ") + f' from {EXPOSURE_TABLE} where nri_status=\\'covered\\''\n",
+    "\n",
+    "print('=== Overall ===')\n",
+    "overall = pd.read_sql(overall_q, engine)\n",
+    "display(overall)\n",
+    "\n",
+    "print('\\n=== Top 15 states by avg NRI risk score (DC-weighted) ===')\n",
+    "display(pd.read_sql(by_state_q, engine))\n",
+    "\n",
+    "print('\\n=== Top 10 most NRI-exposed individual DCs ===')\n",
+    "display(pd.read_sql(worst_q, engine))\n",
+    "\n",
+    "print('\\n=== Average per-hazard risk score across all DCs ===')\n",
+    "hz = pd.read_sql(hazard_rollup_sql, engine).T.rename(columns={0: 'avg_risk_score'})\n",
+    "hz.index = [i.upper().replace('_RISK', '') for i in hz.index]\n",
+    "display(hz.sort_values('avg_risk_score', ascending=False))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "15",
+   "metadata": {},
+   "source": [
+    "## Notes\n",
+    "\n",
+    "- **Composite scores are 0-100 percentile-style indices.** They are *not* comparable across NRI versions; pin the version in any downstream work (this load is `December 2025`, NRI v1.20.0).\n",
+    "- **`RISK_SCORE` already bakes in `EAL`, `SOVI`, and `RESL`** — use it as the single overall metric. Use the components when you want to disentangle hazard exposure (`EAL`) from population vulnerability (`SOVI`) and community readiness (`RESL`).\n",
+    "- **Per-hazard fields use prefixes** listed in `HAZARDS`. For each hazard `{HZ}`:\n",
+    "  - `{HZ}_AFREQ` annualized event frequency\n",
+    "  - `{HZ}_EXPB` / `EXPP` / `EXPA` / `EXPT` exposure (building / population / agriculture / total)\n",
+    "  - `{HZ}_HLRB` / `HLRP` / `HLRA` historic loss ratio\n",
+    "  - `{HZ}_EALB` / `EALP` / `EALA` / `EALT` / `EALS` / `EALR` expected annual loss + score + rating\n",
+    "  - `{HZ}_RISKV` / `RISKS` / `RISKR` risk value + score + rating\n",
+    "  - Some hazards (DRGT, ERQK, ISTM, etc.) lack agricultural or building fields — see `NRI_HazardInfo.csv`.\n",
+    "- **`-9999` is FEMA's null sentinel** in the shapefile; we convert to SQL `NULL` on load. Filter `is not null` when ranking.\n",
+    "- **DCs outside any tract** (offshore, foreign territories not in NRI scope) get `nri_status='no_coverage'` and NULL NRI fields. Most US DCs will be covered.\n",
+    "- **Census tract is the most granular NRI layer.** Two DCs in the same tract will have identical NRI fields. If you need building-level resolution, NRI is not the right source."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "16",
+   "metadata": {},
+   "source": [
+    "## Tables Created by This Notebook and Their Relationships\n",
+    "\n",
+    "### Tables Created / Maintained\n",
+    "1. `public.nri_census_tracts`\n",
+    "- One row per US census tract (~84k) with the full FEMA NRI v1.20.0 (December 2025) field set plus the tract polygon geometry.\n",
+    "\n",
+    "2. `public.data_center_nri_exposure`\n",
+    "- One row per `master_id` with DC identity fields and the NRI fields of the census tract containing the DC point.\n",
+    "\n",
+    "### Key Relationships\n",
+    "- `public.nri_census_tracts (\"TRACTFIPS\", geom)`\n",
+    "  - spatial source for -> `public.data_center_nri_exposure (master_id)` via `ST_Within(dc.geom, tract.geom)`\n",
+    "\n",
+    "- `public.master_data_centers (master_id)`\n",
+    "  - 1-to-1 (effective) -> `public.data_center_nri_exposure (master_id)`\n",
+    "\n",
+    "- `public.data_center_nri_exposure (\"TRACTFIPS\")`\n",
+    "  - many-to-1 -> `public.nri_census_tracts (\"TRACTFIPS\")` for any extra tract-level joins.\n",
+    "\n",
+    "### Rerun Notes\n",
+    "- Supports repeat runs when new master data centers are added: re-run with `RECOMPUTE_SUMMARY=True` (default) to rebuild the per-DC exposure table from the existing tracts table.\n",
+    "- The tracts table only needs to be rebuilt when FEMA releases a new NRI version. Set `RELOAD_TRACTS=True` and re-run the load cell after dropping the new shapefile into `data/NRI_Shapefile_CensusTracts/`."
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": ".venv",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.14.5"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}