data-centers/.claude/MONDAY_CHECKLIST.md
2026-05-17 18:52:29 -07:00


Monday morning checklist — EIA ingest verification

Written 2026-05-16 (Saturday). The user will return Monday after the scheduled weekly ingest runs at Mon 03:30 via the systemd user timer ingest-eia-energy-layers.timer.

What's new since last session

This was the first weekly run after wiring up a second EIA endpoint: electricity/facility-fuel (Form EIA-923, monthly per-plant generation in MWh). Previously only electricity/operating-generator-capacity was ingested.

Critical unknown: the facility-fuel column mapping in build_flat_tables was written from EIA's API docs without confirming the actual JSON key casing. The EIA-923 endpoints returned 503 or timed out all day Saturday, so we couldn't smoke-test. The raw_properties JSONB column is the safety net: if the typed columns come out NULL, the real keys can still be read from it.

The longitude-sign bug (historical lower-48 longitudes stored as positive values for periods 2008-01 → 2010-11) was fixed in build_flat_tables and also applied in-place to the live table on Saturday. Monday's rebuild should produce identical corrected data.

Step 1 — Did the timer fire? Did it succeed?

systemctl --user status ingest-eia-energy-layers.service
journalctl --user -u ingest-eia-energy-layers.service --since "yesterday" | tail -50
ls -lt output/ingest_*.log | head -3

Look for Active: inactive (dead) with Main PID: ... (code=exited, status=0/SUCCESS). Anything else = job failed; read the log.
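The same check can be scripted. The sketch below is a minimal, non-authoritative example: it parses the KEY=VALUE output of `systemctl --user show` and decides whether the last run exited cleanly. The helper names (parse_show_output, last_run_succeeded, query_unit) are hypothetical and not part of the ingest script. It also assumes a unit that has never run reports an empty or "n/a" ExecMainExitTimestamp, which may vary by systemd version.

```python
import subprocess

UNIT = "ingest-eia-energy-layers.service"
PROPS = "ActiveState,Result,ExecMainStatus,ExecMainExitTimestamp"


def parse_show_output(text: str) -> dict:
    """Parse `systemctl show` KEY=VALUE lines into a dict."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip lines without '='
            props[key.strip()] = value.strip()
    return props


def last_run_succeeded(props: dict) -> bool:
    """True only if the main process ran at least once and exited with status 0."""
    # Assumption: a never-started unit shows an empty or "n/a" exit timestamp.
    has_run = props.get("ExecMainExitTimestamp", "") not in ("", "n/a")
    clean_exit = props.get("Result") == "success" and props.get("ExecMainStatus") == "0"
    return has_run and clean_exit


def query_unit(unit: str = UNIT) -> dict:
    """Ask the user-level systemd instance for the unit's properties."""
    result = subprocess.run(
        ["systemctl", "--user", "show", unit, f"--property={PROPS}"],
        capture_output=True, text=True, check=True,
    )
    return parse_show_output(result.stdout)

# Usage on the ingest host:
#   print(last_run_succeeded(query_unit()))
```

Checking the exit timestamp matters because Result=success and ExecMainStatus=0 are also the values reported before a unit has ever started.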

Step 2 — Check both tables populated

-- Expected: ~4.7M rows, 2008-01 → ~2026-03
select count(*), min(period), max(period)
from public.energy_eia_operating_generator_capacity_flat;

-- Expected (if facility-fuel ingest succeeded): millions of rows, ~2001-01 → ~2026-02
select count(*), min(period), max(period)
from public.energy_eia_facility_fuel_flat;

Step 3 — Verify facility-fuel column mapping

This is the one that needs human eyes. Run:

select plant_id, plant_name, state_id, state_name,
       prime_mover_code, prime_mover_desc,
       energy_source_code, energy_source_desc,
       generation_mwh, gross_generation_mwh,
       raw_properties
from public.energy_eia_facility_fuel_flat
limit 3;

If typed columns are populated: mapping is correct, ship it.

If typed columns are NULL but raw_properties has data: EIA's actual JSON keys differ from my guesses. Inspect raw_properties to find the real keys (probably some combination of camelCase vs lowercase or hyphens), then patch the SELECT in ingest_eia_energy_layers.py at the if "energy_eia_electricity_facility_fuel" in available: block (~line 870). After patching, rebuild just the flat table without re-ingesting:

set -a && . ~/.zsh_secrets && set +a
python3 ingest_eia_energy_layers.py --skip-ingest --endpoint facility-fuel

Caveat: --skip-ingest skips the fetch, but build_flat_tables reads from the intermediate raw table, and keep_only_target_flat_table prunes that table at the end of every run. After a successful weekly run the raw facility-fuel table is gone, so the --skip-ingest command above only works while the raw table still exists. Otherwise, to patch the flat columns you'll need to re-fetch the raw data:

python3 ingest_eia_energy_layers.py --endpoint facility-fuel

That re-ingests only facility-fuel (does not touch OGC), then rebuilds both flat tables with the corrected SELECT.
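To speed up the key inspection in Step 3, the sketch below classifies a sampled row and proposes which raw_properties keys correspond to the guessed column names by comparing them with case, hyphens, and underscores removed. This is an illustrative helper only: normalize, suggest_key_mapping, mapping_status, and the TYPED_COLUMNS subset are hypothetical names, and the matching rule is an assumption about how EIA's keys might differ from the guesses.

```python
import re
from typing import Optional

# Subset of the typed columns selected in the Step 3 query.
TYPED_COLUMNS = ("plant_id", "plant_name", "generation_mwh", "gross_generation_mwh")


def normalize(key: str) -> str:
    """Lowercase and drop '-', '_' and spaces so casing variants compare equal."""
    return re.sub(r"[-_\s]", "", key).lower()


def suggest_key_mapping(raw_properties: dict, guessed_keys: list) -> dict:
    """Map each guessed key to the actual raw key that normalizes the same, or None."""
    by_norm = {normalize(k): k for k in raw_properties}
    mapping: dict = {}
    for guess in guessed_keys:
        actual: Optional[str] = by_norm.get(normalize(guess))
        mapping[guess] = actual
    return mapping


def mapping_status(row: dict) -> str:
    """Classify one sampled row from the flat table."""
    if any(row.get(col) is not None for col in TYPED_COLUMNS):
        return "ok"             # typed columns populated: mapping works
    if row.get("raw_properties"):
        return "keys_mismatch"  # data arrived, but the guessed keys missed it
    return "empty"              # nothing landed at all
```

A guess that maps to None has no near-match by casing alone, so the real key is probably named differently and needs to be read directly from raw_properties.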

Step 4 — Possible failure modes & responses

| Symptom | Diagnosis | Action |
| --- | --- | --- |
| Service status = failed, log shows 503 from facility-fuel | EIA-923 still down | Wait, then manually re-run python3 ingest_eia_energy_layers.py --endpoint facility-fuel once EIA recovers |
| Service status = failed, log shows error in OGC ingest | EIA-860 down or a different bug | Diagnose from the traceback; OGC has run successfully many times, so likely transient |
| Service succeeded, OGC row count looks right, facility-fuel table missing | Endpoint failed silently and the error didn't propagate (should not happen with current code) | Check the log carefully; likely a bug in the new error handling |
| Service succeeded, both tables present, facility-fuel columns NULL | JSON key casing wrong | See Step 3 patch path |

Key paths

What does NOT need doing

  • OGC longitude fix is already deployed (in script + applied in-place Saturday). No re-run needed.
  • systemd unit files: no changes required, the new code path uses the same wrapper.
  • The --endpoint flag was added Saturday for ad-hoc per-dataset runs. Useful for re-running just facility-fuel without disturbing OGC.

Open thread to close out after verification

Update output/facility_fuel_pending_narrative.txt once facility-fuel is actually ingested. Replace the "Pending" framing with the real row count, period range, and any column-mapping notes from Step 3. Mirror the format of operating_generator_capacity_sample.txt.

New endpoint added — SEDS (State Energy Data System)

Wired up 2026-05-17. Endpoint: seds (annual frequency, https://api.eia.gov/v2/seds/data/). Probed live, columns verified, smoke test of 50 rows landed in public.energy_eia_seds_flat with all typed columns populated.

Verified JSON keys (no sector field — sector is encoded in seriesId): period (YYYY), seriesId, seriesDescription, stateId, stateDescription, value, unit.

Total volume: ~2.57M rows across 65 years (1960 → 2024), ~40k rows/year. Ingested year-by-year via the generalized fetch_eia_pages_by_period to stay under EIA's 503 threshold (same pattern as the monthly endpoints).
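The year-by-year pattern can be sketched as follows. This is not the script's fetch_eia_pages_by_period, whose signature is not shown here. It is a hypothetical stand-in (iter_rows_by_year) that takes an injected fetch_page callable, so the paging logic can be exercised without hitting the API. The default page size of 5000 is an assumption about the per-request row cap.

```python
from typing import Callable, Iterator, List

# fetch_page(year, offset, length) -> list of row dicts for that slice.
FetchPage = Callable[[int, int, int], List[dict]]


def iter_rows_by_year(
    fetch_page: FetchPage,
    start_year: int,
    end_year: int,
    page_size: int = 5000,  # assumed per-request cap
) -> Iterator[dict]:
    """Yield rows one year at a time, paging by offset within each year."""
    for year in range(start_year, end_year + 1):
        offset = 0
        while True:
            page = fetch_page(year, offset, page_size)
            yield from page
            if len(page) < page_size:
                break  # short (or empty) page: this year is exhausted
            offset += page_size


# Usage with an in-memory fake in place of the real HTTP call:
if __name__ == "__main__":
    fake_data = {1960: [{"n": i} for i in range(7)], 1961: [{"n": i} for i in range(4)]}
    fake = lambda year, offset, length: fake_data[year][offset:offset + length]
    print(sum(1 for _ in iter_rows_by_year(fake, 1960, 1961, page_size=4)))
```

Restricting each request to a single period keeps individual queries small, and a failure partway through identifies exactly which year needs to be retried.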

What to verify Monday:

-- Expected: ~2.5M+ rows, 1960 → 2024
select count(*), min(year), max(year)
from public.energy_eia_seds_flat;

-- Spot-check that typed columns landed (not all NULL)
select period, year, series_id, state_id, value, unit
from public.energy_eia_seds_flat
order by random()
limit 5;

If row count is way under 2.5M, suspect a mid-run failure — check the log for 503 errors on the seds endpoint and re-run with python3 ingest_eia_energy_layers.py --category state_energy --endpoint seds.
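If the total is low, a per-year breakdown shows where the run stopped. The sketch below takes the result of a per-year count query and lists years that are missing or well under the ~40k rows/year average. find_short_years and its thresholds are hypothetical; the check assumes rows are spread roughly evenly across years, which may not hold for the earliest data, so treat flagged years as leads rather than proof.

```python
def find_short_years(
    counts: dict,
    start_year: int = 1960,
    end_year: int = 2024,
    expected_per_year: int = 40_000,  # the ~40k rows/year average noted above
    min_fraction: float = 0.5,        # assumed tolerance, tune as needed
) -> list:
    """Return years that are absent or have fewer than min_fraction of the expected rows.

    `counts` maps year -> row count, e.g. built from:
        select year, count(*) from public.energy_eia_seds_flat group by year;
    """
    floor = expected_per_year * min_fraction
    return [
        year
        for year in range(start_year, end_year + 1)
        if counts.get(year, 0) < floor
    ]
```

A contiguous run of flagged years at the end of the range suggests the ingest died mid-run; isolated flagged years are more likely transient 503s on individual requests.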

Product/API docs for reference: