expanded voter data

2026-05-22 14:18:01 -07:00
parent dc8755cde0
commit c95f22fcdb
4 changed files with 628 additions and 7 deletions
--- a/rdh_precinct_vote_data_centers.ipynb
+++ b/rdh_precinct_vote_data_centers.ipynb
@@ -145,7 +145,7 @@
   "source": [
    "## Parameters\n",
    "\n",
-    "The defaults run a small real pilot for Virginia 2020, because Virginia has many data centers in the master table and a statewide precinct layer should produce visible matches. After the pilot works, broaden `TARGET_STATES` and `FILTER_YEARS_ANY`. Use `TARGET_STATES = None` to infer all states from `public.master_data_centers`.\n"
+    "The defaults now target both 2020 and 2024 precinct election layers across all inferred data-center states. Set `TARGET_STATES` to a small list like `['VA']` for a quick pilot run, or keep `TARGET_STATES = None` to infer all states from `public.master_data_centers`. Use `FILTER_YEARS_ANY = []` to keep all years returned by RDH."
   ]
  },
  {
@@ -166,7 +166,7 @@
    "TARGET_STATES = None  # None = infer all states from master_data_centers; or list e.g. ['VA','TX']\n",
    "FILTER_TERMS_ALL = ['election results', 'precinct']\n",
    "FILTER_TERMS_ANY = []  # e.g. ['general', 'president']\n",
-    "FILTER_YEARS_ANY = ['2020']  # pilot first; empty keeps all years returned by RDH\n",
+    "FILTER_YEARS_ANY = ['2020', '2024']  # set [] to keep all years returned by RDH\n",
    "PREFERRED_FORMATS = ['SHP']  # point-in-precinct joins need spatial files\n",
    "\n",
    "DOWNLOAD_FILES = True\n",
@@ -387,6 +387,11 @@
    "    return re.sub(r'[^A-Za-z0-9._-]+', '_', name)\n",
    "\n",
    "\n",
+    "def detect_year(text):\n",
+    "    match = re.search(r'\\b(20\\d{2})\\b', str(text))\n",
+    "    return match.group(1) if match else None\n",
+    "\n",
+    "\n",
    "work = listing.copy()\n",
    "for required in ['Title', 'Format', 'URL']:\n",
    "    if required not in work.columns:\n",
@@ -405,8 +410,20 @@
    "].copy()\n",
    "\n",
    "filtered = filtered.sort_values(['query_state_code', 'Title', 'Format', 'filename']).reset_index(drop=True)\n",
+    "filtered['detected_year'] = filtered['Title'].map(detect_year)\n",
+    "\n",
    "print(f'Filtered candidate files: {len(filtered):,}')\n",
-    "display(filtered[['query_state_code', 'Title', 'Format', 'datasetid', 'filename', 'URL']].head(100))\n"
+    "year_summary = (\n",
+    "    filtered.assign(detected_year=filtered['detected_year'].fillna('unknown'))\n",
+    "    .groupby('detected_year', dropna=False)\n",
+    "    .size()\n",
+    "    .reset_index(name='rows')\n",
+    "    .sort_values('detected_year')\n",
+    ")\n",
+    "print('Candidate rows by detected year:')\n",
+    "display(year_summary)\n",
+    "\n",
+    "display(filtered[['query_state_code', 'detected_year', 'Title', 'Format', 'datasetid', 'filename', 'URL']].head(100))"
   ]
  },
  {
@@ -1169,11 +1186,11 @@
   "id": "28",
   "metadata": {},
   "source": [
-    "## Next Refinement: Tidy Vote Columns\n",
+    "## Standardized Vote Fields\n",
    "\n",
-    "The RDH staging table intentionally stores each precinct row's original attributes in `properties jsonb`. Once the downloaded layers are visible, inspect `precinct_properties` above to identify vote-column patterns for the states/years you care about.\n",
+    "The cell below extracts a standardized set of election attributes from `precinct_properties` using heuristic key matching across RDH file families.\n",
    "\n",
-    "Useful follow-up views can then extract fields like:\n",
+    "Extracted fields:\n",
    "- precinct identifier/name\n",
    "- election year\n",
    "- office\n",
@@ -1182,7 +1199,521 @@
    "- total votes\n",
    "- turnout or vote share\n",
    "\n",
-    "That extraction is best added after confirming the specific RDH file families selected by the filters.\n"
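+    "Because RDH schemas vary by state and source, the extraction is deliberately tolerant. When no direct turnout or share field is present, `turnout_or_vote_share` falls back to `total_votes / reg_voters` if a `reg_voters` property exists, and otherwise to the Democratic share of the two-party vote (`democratic_votes / (democratic_votes + republican_votes)`)."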
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "29",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "STANDARDIZED_LIMIT = None  # set an int (e.g., 2000) for faster sampling\n",
+    "\n",
+    "limit_clause = '' if STANDARDIZED_LIMIT is None else 'limit %s'\n",
+    "standardized_sql = f'''\n",
+    "select\n",
+    "    m.master_id,\n",
+    "    dc.name,\n",
+    "    dc.city,\n",
+    "    dc.state,\n",
+    "    l.title as rdh_layer_title,\n",
+    "    f.properties as precinct_properties\n",
+    "from {MATCH_TABLE} m\n",
+    "join {MASTER_TABLE} dc on dc.master_id = m.master_id\n",
+    "join {FEATURE_TABLE} f on f.feature_id = m.feature_id\n",
+    "join {LAYER_TABLE} l on l.layer_id = m.layer_id\n",
+    "order by dc.state, dc.city, dc.name\n",
+    "{limit_clause}\n",
+    "'''\n",
+    "\n",
+    "with get_conn() as conn:\n",
+    "    if STANDARDIZED_LIMIT is None:\n",
+    "        raw_standardized = pd.read_sql_query(standardized_sql, conn)\n",
+    "    else:\n",
+    "        raw_standardized = pd.read_sql_query(standardized_sql, conn, params=[STANDARDIZED_LIMIT])\n",
+    "\n",
+    "\n",
+    "def parse_props(value):\n",
+    "    if isinstance(value, dict):\n",
+    "        return value\n",
+    "    if pd.isna(value):\n",
+    "        return {}\n",
+    "    text = str(value).strip()\n",
+    "    if not text:\n",
+    "        return {}\n",
+    "    try:\n",
+    "        obj = json.loads(text)\n",
+    "        return obj if isinstance(obj, dict) else {}\n",
+    "    except Exception:\n",
+    "        return {}\n",
+    "\n",
+    "\n",
+    "def norm_key(k):\n",
+    "    return re.sub(r'[^a-z0-9]+', '_', str(k).strip().lower()).strip('_')\n",
+    "\n",
+    "\n",
+    "def as_number(v):\n",
+    "    if v is None:\n",
+    "        return None\n",
+    "    if isinstance(v, (int, float, np.integer, np.floating)):\n",
+    "        if pd.isna(v):\n",
+    "            return None\n",
+    "        return float(v)\n",
+    "    text = str(v).strip().replace(',', '')\n",
+    "    if text == '':\n",
+    "        return None\n",
+    "    if re.fullmatch(r'-?\\d+(\\.\\d+)?', text):\n",
+    "        return float(text)\n",
+    "    return None\n",
+    "\n",
+    "\n",
+    "def parse_year_from_title(title):\n",
+    "    m = re.search(r'\\b((?:19|20)\\d{2})\\b', str(title))\n",
+    "    return int(m.group(1)) if m else None\n",
+    "\n",
+    "\n",
+    "def infer_year_from_keys(props_norm):\n",
+    "    key_patterns = [\n",
+    "        re.compile(r'^[pg](\\d{2})(pre|uss|con|gov|ag|sos|ltg|tre|aud).*'),\n",
+    "        re.compile(r'^[pg](\\d{2})[a-z]{3}.*'),  # require an office code after the year so Census-style keys like 'p0010001' do not match\n",
+    "    ]\n",
+    "    for key in props_norm.keys():\n",
+    "        nk = norm_key(key)\n",
+    "        for pat in key_patterns:\n",
+    "            m = pat.match(nk)\n",
+    "            if m:\n",
+    "                yy = int(m.group(1))\n",
+    "                return 2000 + yy if yy < 60 else 1900 + yy\n",
+    "    return None\n",
+    "\n",
+    "\n",
+    "def decode_rdh_vote_key(key):\n",
+    "    k = norm_key(key)\n",
+    "\n",
+    "    m = re.match(r'^[pg](\\d{2})pre([a-z]).*', k)\n",
+    "    if m:\n",
+    "        party_code = m.group(2)\n",
+    "        return ('President', party_code)\n",
+    "\n",
+    "    m = re.match(r'^[pg](\\d{2})uss([a-z]).*', k)\n",
+    "    if m:\n",
+    "        party_code = m.group(2)\n",
+    "        return ('U.S. Senate', party_code)\n",
+    "\n",
+    "    m = re.match(r'^[pg](\\d{2})con(\\d{2})([a-z]).*', k)\n",
+    "    if m:\n",
+    "        district = m.group(2)\n",
+    "        party_code = m.group(3)\n",
+    "        return (f'U.S. House District {district}', party_code)\n",
+    "\n",
+    "    return (None, None)\n",
+    "\n",
+    "\n",
+    "def party_from_key(key):\n",
+    "    k = norm_key(key)\n",
+    "    office, party_code = decode_rdh_vote_key(k)\n",
+    "    if party_code == 'd':\n",
+    "        return office, 'D'\n",
+    "    if party_code == 'r':\n",
+    "        return office, 'R'\n",
+    "\n",
+    "    if any(t in k for t in ['biden', 'dem', 'democrat']):\n",
+    "        return office, 'D'\n",
+    "    if any(t in k for t in ['trump', 'gop', 'rep', 'republican']):\n",
+    "        return office, 'R'\n",
+    "\n",
+    "    return office, None\n",
+    "\n",
+    "\n",
+    "def detect_office(title, props_norm, vote_office_totals):\n",
+    "    title_lower = str(title).lower()\n",
+    "    if 'president' in title_lower or 'presidential' in title_lower:\n",
+    "        return 'President'\n",
+    "    if 'senate' in title_lower:\n",
+    "        return 'U.S. Senate'\n",
+    "    if 'house' in title_lower or 'congress' in title_lower:\n",
+    "        return 'U.S. House'\n",
+    "    if 'governor' in title_lower:\n",
+    "        return 'Governor'\n",
+    "\n",
+    "    if vote_office_totals:\n",
+    "        return max(vote_office_totals.items(), key=lambda x: x[1])[0]\n",
+    "\n",
+    "    office_key_hits = [k for k in props_norm if any(x in k for x in ['office', 'contest', 'race'])]\n",
+    "    if office_key_hits:\n",
+    "        best = office_key_hits[0]\n",
+    "        val = props_norm.get(best)\n",
+    "        if isinstance(val, str) and val.strip():\n",
+    "            return val.strip()\n",
+    "    return None\n",
+    "\n",
+    "\n",
+    "def best_precinct_identifier(props_norm):\n",
+    "    preferred_keys = [\n",
+    "        'precinct', 'precinct_name', 'precinctid', 'precinct_id', 'precinct20',\n",
+    "        'pctname', 'pct', 'vtd', 'vtdst', 'vtdst20', 'name20',\n",
+    "        'district', 'district_name', 'ward', 'geoid', 'geoid20', 'unique_id',\n",
+    "    ]\n",
+    "    for key in preferred_keys:\n",
+    "        if props_norm.get(key) is not None and str(props_norm[key]).strip():\n",
+    "            return str(props_norm[key]).strip()\n",
+    "\n",
+    "    fallback_candidates = [\n",
+    "        (k, v) for k, v in props_norm.items()\n",
+    "        if any(t in k for t in ['precinct', 'vtd', 'ward', 'district', 'geo', 'name']) and v is not None and str(v).strip()\n",
+    "    ]\n",
+    "    if fallback_candidates:\n",
+    "        return str(fallback_candidates[0][1]).strip()\n",
+    "    return None\n",
+    "\n",
+    "\n",
+    "def extract_vote_fields(row):\n",
+    "    props = parse_props(row['precinct_properties'])\n",
+    "    props_norm = {norm_key(k): v for k, v in props.items()}\n",
+    "\n",
+    "    precinct_id_or_name = best_precinct_identifier(props_norm)\n",
+    "    election_year = parse_year_from_title(row['rdh_layer_title'])\n",
+    "    if election_year is None:\n",
+    "        election_year = infer_year_from_keys(props_norm)\n",
+    "\n",
+    "    year_keys = [k for k in props_norm if 'year' in k]\n",
+    "    if election_year is None and year_keys:\n",
+    "        for k in year_keys:\n",
+    "            y = as_number(props_norm[k])\n",
+    "            if y and 1900 <= y <= 2100:\n",
+    "                election_year = int(y)\n",
+    "                break\n",
+    "\n",
+    "    numeric_items = [(k, as_number(v)) for k, v in props_norm.items()]\n",
+    "    numeric_items = [(k, v) for k, v in numeric_items if v is not None]\n",
+    "\n",
+    "    vote_office_totals = {}\n",
+    "    party_votes = {}  # office (or None) -> {'D': votes, 'R': votes}\n",
+    "    for key, value in numeric_items:\n",
+    "        office_guess, party_guess = party_from_key(key)\n",
+    "        if party_guess not in {'D', 'R'}:\n",
+    "            continue\n",
+    "        bucket = party_votes.setdefault(office_guess, {})\n",
+    "        bucket[party_guess] = max(bucket.get(party_guess, value), value)\n",
+    "        if office_guess is not None:\n",
+    "            vote_office_totals[office_guess] = vote_office_totals.get(office_guess, 0.0) + float(value)\n",
+    "    # Take both party counts from the single contest with the most D+R votes, so they are never mixed across offices.\n",
+    "    best = max(party_votes.values(), key=lambda b: sum(b.values()), default={})\n",
+    "    dem_votes, rep_votes = best.get('D'), best.get('R')\n",
+    "\n",
+    "    total_candidates = [\n",
+    "        v for k, v in numeric_items\n",
+    "        if (\n",
+    "            ('total' in k and 'vote' in k)\n",
+    "            or ('tot' in k and 'vote' in k)\n",
+    "            or k in {'votes_total', 'total_votes', 'vote_total'}\n",
+    "        )\n",
+    "    ]\n",
+    "    total_votes = max(total_candidates) if total_candidates else None\n",
+    "    if total_votes is None and dem_votes is not None and rep_votes is not None:\n",
+    "        total_votes = dem_votes + rep_votes\n",
+    "\n",
+    "    turnout_candidates = [\n",
+    "        v for k, v in numeric_items\n",
+    "        if any(x in k for x in ['turnout', 'share']) or k.endswith('_pct')  # a bare 'pct' test also matched precinct-ID keys like 'pct' and 'pctname'\n",
+    "    ]\n",
+    "    turnout_or_vote_share = turnout_candidates[0] if turnout_candidates else None\n",
+    "\n",
+    "    if turnout_or_vote_share is None:\n",
+    "        reg_voters = props_norm.get('reg_voters')\n",
+    "        reg_voters_num = as_number(reg_voters)\n",
+    "        if reg_voters_num and total_votes:\n",
+    "            turnout_or_vote_share = total_votes / reg_voters_num\n",
+    "        elif dem_votes is not None and rep_votes is not None and (dem_votes + rep_votes) > 0:\n",
+    "            turnout_or_vote_share = dem_votes / (dem_votes + rep_votes)\n",
+    "\n",
+    "    office = detect_office(row['rdh_layer_title'], props_norm, vote_office_totals)\n",
+    "\n",
+    "    return pd.Series({\n",
+    "        'precinct_identifier_name': precinct_id_or_name,\n",
+    "        'election_year': election_year,\n",
+    "        'office': office,\n",
+    "        'democratic_votes': dem_votes,\n",
+    "        'republican_votes': rep_votes,\n",
+    "        'total_votes': total_votes,\n",
+    "        'turnout_or_vote_share': turnout_or_vote_share,\n",
+    "    })\n",
+    "\n",
+    "\n",
+    "standardized_fields = raw_standardized.apply(extract_vote_fields, axis=1)\n",
+    "standardized_preview = pd.concat(\n",
+    "    [\n",
+    "        raw_standardized[['master_id', 'name', 'city', 'state', 'rdh_layer_title']],\n",
+    "        standardized_fields,\n",
+    "    ],\n",
+    "    axis=1,\n",
+    ")\n",
+    "\n",
+    "standardized_summary = pd.DataFrame({\n",
+    "    'field': [\n",
+    "        'precinct_identifier_name', 'election_year', 'office',\n",
+    "        'democratic_votes', 'republican_votes', 'total_votes', 'turnout_or_vote_share',\n",
+    "    ]\n",
+    "})\n",
+    "standardized_summary['non_null_rows'] = standardized_summary['field'].map(\n",
+    "    lambda c: int(standardized_preview[c].notna().sum())\n",
+    ")\n",
+    "\n",
+    "print(f'Standardized preview rows: {len(standardized_preview):,}')\n",
+    "display(standardized_summary)\n",
+    "display(standardized_preview.head(50))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "30",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "ELECTION_CONTEXT_TABLE = 'public.data_center_election_context'\n",
+    "\n",
+    "required_cols = [\n",
+    "    'master_id', 'rdh_layer_title',\n",
+    "    'precinct_identifier_name', 'election_year', 'office',\n",
+    "    'democratic_votes', 'republican_votes', 'total_votes', 'turnout_or_vote_share',\n",
+    "]\n",
+    "\n",
+    "missing_cols = [c for c in required_cols if c not in standardized_preview.columns]\n",
+    "if missing_cols:\n",
+    "    raise RuntimeError(\n",
+    "        'standardized_preview is missing required columns: '\n",
+    "        + ', '.join(missing_cols)\n",
+    "        + '. Run the standardized extraction cell first.'\n",
+    "    )\n",
+    "\n",
+    "persist_best = standardized_preview[required_cols].copy()\n",
+    "persist_best['non_null_score'] = persist_best[\n",
+    "    ['precinct_identifier_name', 'election_year', 'office', 'democratic_votes', 'republican_votes', 'total_votes', 'turnout_or_vote_share']\n",
+    "].notna().sum(axis=1)\n",
+    "\n",
+    "persist_best = persist_best.sort_values(\n",
+    "    ['master_id', 'non_null_score', 'total_votes'],\n",
+    "    ascending=[True, False, False],\n",
+    "    na_position='last'\n",
+    ")\n",
+    "persist_best = persist_best.drop_duplicates(subset=['master_id'], keep='first').copy()\n",
+    "\n",
+    "with get_conn() as conn:\n",
+    "    master_base = pd.read_sql_query(\n",
+    "        f'''\n",
+    "        select master_id, name, city, upper(state) as state\n",
+    "        from {MASTER_TABLE}\n",
+    "        ''',\n",
+    "        conn,\n",
+    "    )\n",
+    "\n",
+    "persist_df = master_base.merge(\n",
+    "    persist_best.drop(columns=['non_null_score']),\n",
+    "    on='master_id',\n",
+    "    how='left',\n",
+    ")\n",
+    "\n",
+    "create_sql = f'''\n",
+    "create table if not exists {ELECTION_CONTEXT_TABLE} (\n",
+    "    master_id text primary key references public.master_data_centers(master_id) on delete cascade,\n",
+    "    name text,\n",
+    "    city text,\n",
+    "    state text,\n",
+    "    rdh_layer_title text,\n",
+    "    precinct_identifier_name text,\n",
+    "    election_year integer,\n",
+    "    office text,\n",
+    "    democratic_votes double precision,\n",
+    "    republican_votes double precision,\n",
+    "    total_votes double precision,\n",
+    "    turnout_or_vote_share double precision,\n",
+    "    updated_at timestamptz not null default now()\n",
+    ");\n",
+    "create index if not exists data_center_election_context_state_idx\n",
+    "    on {ELECTION_CONTEXT_TABLE} (state);\n",
+    "create index if not exists data_center_election_context_year_idx\n",
+    "    on {ELECTION_CONTEXT_TABLE} (election_year);\n",
+    "'''\n",
+    "\n",
+    "upsert_sql = f'''\n",
+    "insert into {ELECTION_CONTEXT_TABLE} (\n",
+    "    master_id, name, city, state, rdh_layer_title,\n",
+    "    precinct_identifier_name, election_year, office,\n",
+    "    democratic_votes, republican_votes, total_votes, turnout_or_vote_share,\n",
+    "    updated_at\n",
+    ")\n",
+    "values %s\n",
+    "on conflict (master_id) do update set\n",
+    "    name = excluded.name,\n",
+    "    city = excluded.city,\n",
+    "    state = excluded.state,\n",
+    "    rdh_layer_title = excluded.rdh_layer_title,\n",
+    "    precinct_identifier_name = excluded.precinct_identifier_name,\n",
+    "    election_year = excluded.election_year,\n",
+    "    office = excluded.office,\n",
+    "    democratic_votes = excluded.democratic_votes,\n",
+    "    republican_votes = excluded.republican_votes,\n",
+    "    total_votes = excluded.total_votes,\n",
+    "    turnout_or_vote_share = excluded.turnout_or_vote_share,\n",
+    "    updated_at = now()\n",
+    "'''\n",
+    "\n",
+    "rows = []\n",
+    "for rec in persist_df.to_dict('records'):\n",
+    "    rows.append((\n",
+    "        rec['master_id'],\n",
+    "        rec['name'],\n",
+    "        rec['city'],\n",
+    "        rec['state'],\n",
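+    "        rec['rdh_layer_title'] if pd.notna(rec.get('rdh_layer_title')) else None,  # left-merge gaps are NaN; store NULL, not 'NaN'\n",
+    "        rec['precinct_identifier_name'] if pd.notna(rec.get('precinct_identifier_name')) else None,\n",
+    "        int(rec['election_year']) if pd.notna(rec.get('election_year')) else None,\n",
+    "        rec['office'] if pd.notna(rec.get('office')) else None,\n",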
+    "        float(rec['democratic_votes']) if pd.notna(rec.get('democratic_votes')) else None,\n",
+    "        float(rec['republican_votes']) if pd.notna(rec.get('republican_votes')) else None,\n",
+    "        float(rec['total_votes']) if pd.notna(rec.get('total_votes')) else None,\n",
+    "        float(rec['turnout_or_vote_share']) if pd.notna(rec.get('turnout_or_vote_share')) else None,\n",
+    "    ))\n",
+    "\n",
+    "with get_conn() as conn:\n",
+    "    with conn.cursor() as cur:\n",
+    "        cur.execute(create_sql)\n",
+    "        if rows:\n",
+    "            execute_values(\n",
+    "                cur,\n",
+    "                upsert_sql,\n",
+    "                rows,\n",
+    "                template='(%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, now())',\n",
+    "                page_size=1000,\n",
+    "            )\n",
+    "        cur.execute(f'select count(*) from {ELECTION_CONTEXT_TABLE}')\n",
+    "        table_rows = cur.fetchone()[0]\n",
+    "        cur.execute(\n",
+    "            f'''\n",
+    "            select\n",
+    "                state,\n",
+    "                count(*) as rows,\n",
+    "                count(*) filter (\n",
+    "                    where election_year is not null\n",
+    "                       or office is not null\n",
+    "                       or democratic_votes is not null\n",
+    "                       or republican_votes is not null\n",
+    "                       or total_votes is not null\n",
+    "                       or turnout_or_vote_share is not null\n",
+    "                ) as rows_with_election\n",
+    "            from {ELECTION_CONTEXT_TABLE}\n",
+    "            group by state\n",
+    "            order by rows desc, state\n",
+    "            limit 15\n",
+    "            '''\n",
+    "        )\n",
+    "        state_counts = cur.fetchall()\n",
+    "\n",
+    "rows_with_election = int(\n",
+    "    persist_df[\n",
+    "        ['election_year', 'office', 'democratic_votes', 'republican_votes', 'total_votes', 'turnout_or_vote_share']\n",
+    "    ].notna().any(axis=1).sum()\n",
+    ")\n",
+    "print(f'Rows prepared for upsert: {len(rows):,}')\n",
+    "print(f'Rows with election context: {rows_with_election:,}')\n",
+    "print(f'Rows currently in {ELECTION_CONTEXT_TABLE}: {table_rows:,}')\n",
+    "display(pd.DataFrame(state_counts, columns=['state', 'rows', 'rows_with_election']))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "31",
+   "metadata": {},
+   "source": [
+    "## Persist Standardized Election Context\n",
+    "\n",
+    "The cell above writes one standardized election-context row per `master_id` into `public.data_center_election_context` for reuse in map and reporting workflows. Data centers without a matched precinct still get a row, with null election fields."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "32",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Targeted state coverage check\n",
+    "states = ['VA', 'WA', 'WI', 'WV', 'WY', 'DC', 'PR']\n",
+    "\n",
+    "with get_conn() as conn:\n",
+    "    check_df = pd.read_sql_query(\n",
+    "        f'''\n",
+    "        select\n",
+    "            state,\n",
+    "            count(*) as rows,\n",
+    "            count(*) filter (\n",
+    "                where election_year is not null\n",
+    "                   or office is not null\n",
+    "                   or democratic_votes is not null\n",
+    "                   or republican_votes is not null\n",
+    "                   or total_votes is not null\n",
+    "                   or turnout_or_vote_share is not null\n",
+    "            ) as rows_with_election\n",
+    "        from {ELECTION_CONTEXT_TABLE}\n",
+    "        where state = any(%s)\n",
+    "        group by state\n",
+    "        order by state\n",
+    "        ''',\n",
+    "        conn,\n",
+    "        params=[states],\n",
+    "    )\n",
+    "\n",
+    "display(check_df)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "33",
+   "metadata": {},
+   "source": [
+    "## Tables Created by This Notebook and Their Relationships\n",
+    "\n",
+    "This notebook creates and/or maintains the following PostGIS/PostgreSQL tables:\n",
+    "\n",
+    "1. `public.rdh_precinct_vote_layers`\n",
+    "   - One row per RDH precinct-election layer ingested.\n",
+    "   - Key columns: `layer_id` (PK), `state_code`, `title`, `format`, file/source metadata, `loaded_at`.\n",
+    "\n",
+    "2. `public.rdh_precinct_vote_features`\n",
+    "   - One row per precinct polygon feature from a loaded layer.\n",
+    "   - Key columns: `feature_id` (PK), `layer_id` (FK), `state_code`, `source_row`, `properties` (JSONB), `geom` (MultiPolygon).\n",
+    "   - Relationship: many features belong to one layer.\n",
+    "\n",
+    "3. `public.data_center_rdh_precinct_vote_matches`\n",
+    "   - Spatial match table linking data centers to precinct features.\n",
+    "   - Key columns: `master_id` (FK), `feature_id` (FK), `layer_id` (FK), `state_code`, `join_method`, `match_distance_m`, `matched_at`.\n",
+    "   - Primary key: (`master_id`, `feature_id`).\n",
+    "   - Relationship: many-to-many bridge between data centers and precinct features (with match metadata).\n",
+    "\n",
+    "4. `public.data_center_election_context`\n",
+    "   - Final standardized, one-row-per-data-center election context used by downstream mapping/analysis.\n",
+    "   - Key columns: `master_id` (PK, FK), `name`, `city`, `state`, `rdh_layer_title`,\n",
+    "     `precinct_identifier_name`, `election_year`, `office`, `democratic_votes`, `republican_votes`,\n",
+    "     `total_votes`, `turnout_or_vote_share`, `updated_at`.\n",
+    "   - Relationship: one row per `master_id` in `public.master_data_centers` (left-joined so all master rows can be retained, even if election fields are null).\n",
+    "\n",
+    "### Relationship Summary\n",
+    "\n",
+    "- `public.master_data_centers (master_id)`\n",
+    "  - 1-to-many -> `public.data_center_rdh_precinct_vote_matches (master_id)`\n",
+    "  - 1-to-1 (effective in this notebook) -> `public.data_center_election_context (master_id)`\n",
+    "\n",
+    "- `public.rdh_precinct_vote_layers (layer_id)`\n",
+    "  - 1-to-many -> `public.rdh_precinct_vote_features (layer_id)`\n",
+    "  - 1-to-many -> `public.data_center_rdh_precinct_vote_matches (layer_id)`\n",
+    "\n",
+    "- `public.rdh_precinct_vote_features (feature_id)`\n",
+    "  - 1-to-many -> `public.data_center_rdh_precinct_vote_matches (feature_id)`\n",
+    "\n",
+    "In short: **layers -> features -> matches**, then matches are standardized into **one election-context row per data center**."
   ]
  }
 ],