# Monday morning checklist: EIA ingest verification

Written 2026-05-16 (Saturday). The user will return Monday after the scheduled weekly ingest runs at **Mon 03:30** via the systemd user timer `ingest-eia-energy-layers.timer`.

## What's new since last session

Monday's run is the first weekly run since wiring up a **second EIA endpoint**: `electricity/facility-fuel` (Form EIA-923, monthly per-plant generation in MWh). Previously only `electricity/operating-generator-capacity` was ingested.

**Critical unknown:** the facility-fuel column mapping in `build_flat_tables` was written from EIA's API docs without confirming the actual JSON key casing. EIA-923 endpoints were returning 503 or timing out all day Saturday, so we couldn't smoke-test. `raw_properties` JSONB is the safety net.

The longitude-sign bug (historical lower-48 longitudes stored as positive values for periods 2008-01 → 2010-11) was fixed in `build_flat_tables` and applied in-place to the live table on Saturday. Monday's rebuild should produce identical corrected data.

## Step 1: Did the timer fire? Did it succeed?

```bash
systemctl --user status ingest-eia-energy-layers.service
journalctl --user -u ingest-eia-energy-layers.service --since "yesterday" | tail -50
ls -lt output/ingest_*.log | head -3
```

Look for `Active: inactive (dead)` with `Main PID: ... (code=exited, status=0/SUCCESS)`. Anything else means the job failed; read the log.

## Step 2: Check both tables populated

```sql
-- Expected: ~4.7M rows, 2008-01 → ~2026-03
select count(*), min(period), max(period)
from public.energy_eia_operating_generator_capacity_flat;

-- Expected (if facility-fuel ingest succeeded): millions of rows, ~2001-01 → ~2026-02
select count(*), min(period), max(period)
from public.energy_eia_facility_fuel_flat;
```

## Step 3: Verify facility-fuel column mapping

This is the one that needs human eyes.
Run:

```sql
select plant_id, plant_name, state_id, state_name,
       prime_mover_code, prime_mover_desc,
       energy_source_code, energy_source_desc,
       generation_mwh, gross_generation_mwh,
       raw_properties
from public.energy_eia_facility_fuel_flat
limit 3;
```

**If typed columns are populated:** mapping is correct, ship it.

**If typed columns are NULL but `raw_properties` has data:** EIA's actual JSON keys differ from my guesses. Inspect `raw_properties` to find the real keys (probably a mix of camelCase, lowercase, or hyphenated names), then patch the SELECT in [ingest_eia_energy_layers.py](../ingest_eia_energy_layers.py) at the `if "energy_eia_electricity_facility_fuel" in available:` block (~line 870).

After patching, the obvious move would be to rebuild just the flat table without re-ingesting:

```bash
set -a && . ~/.zsh_secrets && set +a
python3 ingest_eia_energy_layers.py --skip-ingest --endpoint facility-fuel
```

That will not work after a successful weekly run. `--skip-ingest` bypasses ingest, but `build_flat_tables` reads from the *intermediate raw table*, which `keep_only_target_flat_table` prunes at end-of-run, so the raw facility-fuel table is gone. To patch the flat columns, re-fetch the raw data instead (with the environment sourced as above):

```bash
python3 ingest_eia_energy_layers.py --endpoint facility-fuel
```

That re-ingests *only* facility-fuel (does not touch OGC), then rebuilds both flat tables with the corrected SELECT.

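
When hunting for the real keys in `raw_properties`, it can help to compare the guessed names against the actual ones with casing and separators ignored. The following is a minimal sketch of that comparison, not part of `ingest_eia_energy_layers.py`: the helper names `normalize_key` and `match_keys` are hypothetical, and the sample dictionary is made-up data standing in for one `raw_properties` value copied from the Step 3 query.

```python
import re


def normalize_key(key: str) -> str:
    """Lowercase the key and drop every non-alphanumeric character,
    so 'plantName', 'plant-name' and 'plant_name' compare equal."""
    return re.sub(r"[^a-z0-9]", "", key.lower())


def match_keys(guessed, raw_properties):
    """Map each guessed key to the actual key in raw_properties that
    matches it after normalization, or None if nothing matches.
    If two actual keys normalize identically, the later one wins."""
    actual = {normalize_key(k): k for k in raw_properties}
    return {g: actual.get(normalize_key(g)) for g in guessed}


# Made-up sample, NOT real EIA output: paste one raw_properties value here.
sample = {"plantCode": "3", "plantName": "Example Plant", "generation": "1234.5"}

# Hypothetical guesses of the kind the flat-table SELECT might contain.
guessed = ["plantcode", "plant-name", "gross_generation"]

result = match_keys(guessed, sample)
for guess, real in result.items():
    print(f"{guess!r:22} -> {real!r}")
```

A guess that maps to `None` has no counterpart under any casing, so the field is either named differently or absent from the response; a guess that maps to a differently spelled key is a candidate for the SELECT patch.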
## Step 4: Possible failure modes & responses

| Symptom | Diagnosis | Action |
|---|---|---|
| Service status = failed, log shows `503` from facility-fuel | EIA-923 still down | Wait, then manually re-run `python3 ingest_eia_energy_layers.py --endpoint facility-fuel` once EIA recovers |
| Service status = failed, log shows error in OGC ingest | EIA-860 down or different bug | Diagnose from the traceback; OGC has run successfully many times, so likely transient |
| Service succeeded, OGC row count looks right, facility-fuel table missing | Endpoint failed silently and the error did not propagate; should not happen with current code | Check log carefully; bug in the new error handling |
| Service succeeded, both tables present, facility-fuel columns NULL | JSON key casing wrong | See Step 3 patch path |

## Key paths

- Script: [`ingest_eia_energy_layers.py`](../ingest_eia_energy_layers.py)
- Wrapper: `~/.local/bin/ingest-eia-energy-layers-weekly`
- Service: `~/.config/systemd/user/ingest-eia-energy-layers.service`
- Timer: `~/.config/systemd/user/ingest-eia-energy-layers.timer`
- Per-run logs: `output/ingest_YYYYMMDD_HHMMSS.log` (kept by wrapper)
- Sample/narrative output: [`output/operating_generator_capacity_sample.txt`](../output/operating_generator_capacity_sample.txt)
- Facility-fuel narrative (pre-ingest): [`output/facility_fuel_pending_narrative.txt`](../output/facility_fuel_pending_narrative.txt)
- Env vars sourced from: `~/.zsh_secrets`

## What does NOT need doing

- OGC longitude fix is already deployed (in script + applied in-place Saturday). No re-run needed.
- systemd unit files: no changes required; the new code path uses the same wrapper.
- The `--endpoint` flag was added Saturday for ad-hoc per-dataset runs. Useful for re-running just facility-fuel without disturbing OGC.

## Open thread to close out after verification

Update [`output/facility_fuel_pending_narrative.txt`](../output/facility_fuel_pending_narrative.txt) once facility-fuel is actually ingested.
Replace the "Pending" framing with the real row count, period range, and any column-mapping notes from Step 3. Mirror the format of `operating_generator_capacity_sample.txt`.

## New endpoint added: SEDS (State Energy Data System)

Wired up 2026-05-17. Endpoint: `seds` (annual frequency, `https://api.eia.gov/v2/seds/data/`). Probed live, columns verified, smoke test of 50 rows landed in `public.energy_eia_seds_flat` with all typed columns populated.

**Verified JSON keys** (no sector field; sector is encoded in `seriesId`): `period` (YYYY), `seriesId`, `seriesDescription`, `stateId`, `stateDescription`, `value`, `unit`.

**Total volume:** ~2.57M rows across 65 years (1960–2024), ~40k rows/year. Ingested year-by-year via the generalized `fetch_eia_pages_by_period` to stay under EIA's 503 threshold (same pattern as the monthly endpoints).

**What to verify Monday:**

```sql
-- Expected: ~2.5M+ rows, 1960 → 2024
select count(*), min(year), max(year)
from public.energy_eia_seds_flat;

-- Spot-check that typed columns landed (not all NULL)
select period, year, series_id, state_id, value, unit
from public.energy_eia_seds_flat
order by random()
limit 5;
```

If the row count is well under 2.5M, suspect a mid-run failure: check the log for `503` errors on the `seds` endpoint and re-run with `python3 ingest_eia_energy_layers.py --category state_energy --endpoint seds`.

**Product/API docs for reference:**

- Product page: https://www.eia.gov/state/seds/
- Technical notes: https://www.eia.gov/state/seds/seds-technical-notes-complete.php
- API documentation: https://www.eia.gov/opendata/documentation.php
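
If the SEDS total comes in short, a per-year breakdown (for example from `select year, count(*) from public.energy_eia_seds_flat group by year`) shows which yearly fetches were lost, since the ingest runs year-by-year. Below is a rough sketch for flagging missing or thin years. The function name `suspicious_years` and the 50% cutoff are assumptions, and the counts in the example are invented. Early SEDS years may legitimately carry fewer series, so treat flagged years as leads to check against the log, not as confirmed failures.

```python
def suspicious_years(counts, first=1960, last=2024, min_fraction=0.5):
    """Return the years in [first, last] whose row count is missing or
    falls below min_fraction of the median non-zero yearly count.

    counts: dict mapping year (int) -> row count (int).
    """
    present = sorted(c for c in counts.values() if c > 0)
    if not present:
        # Nothing was ingested at all: every year is suspect.
        return list(range(first, last + 1))
    median = present[len(present) // 2]
    threshold = median * min_fraction
    return [y for y in range(first, last + 1) if counts.get(y, 0) < threshold]


# Invented example: 1962 is thin and 1963 is absent entirely.
example_counts = {1960: 40000, 1961: 39000, 1962: 12000, 1964: 41000}
flagged = suspicious_years(example_counts, first=1960, last=1964)
print(flagged)  # [1962, 1963]
```

Using the median as the baseline keeps one or two failed years from dragging the threshold down the way a mean would.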