# Monday morning checklist — EIA ingest verification

Written 2026-05-16 (Saturday). The user will return Monday after the
scheduled weekly ingest runs at **Mon 03:30** via the systemd user timer
`ingest-eia-energy-layers.timer`.

## What's new since last session

Monday's run is the first weekly run after wiring up a **second EIA endpoint**:
`electricity/facility-fuel` (Form EIA-923, monthly per-plant generation in
MWh). Previously only `electricity/operating-generator-capacity` (OGC below) was ingested.

**Critical unknown:** the facility-fuel column mapping in `build_flat_tables`
was written from EIA's API docs without confirming the actual JSON key
casing. EIA-923 endpoints were returning 503s or timing out all day Saturday, so we couldn't
smoke-test. The `raw_properties` JSONB column is the safety net: it still holds
the data if the typed columns come out NULL.

The longitude-sign bug (historical lower-48 longitudes stored as positive values for
periods 2008-01 → 2010-11) was fixed in `build_flat_tables` and applied in-place to the live
table on Saturday. Monday's rebuild should produce identical corrected data.
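
To make the sign fix concrete, here is a minimal sketch of the kind of correction involved. It is not the code in `build_flat_tables`: the function name `normalize_lower48_longitude` and the `NON_CONTIGUOUS` set are hypothetical, and it assumes affected rows can be identified from the state code alone.

```python
# Hypothetical helper; the real fix lives in build_flat_tables and may differ.
# States and territories outside the contiguous US are left untouched,
# because a positive longitude can be legitimate there (e.g. Guam).
NON_CONTIGUOUS = {"AK", "HI", "PR", "GU", "VI", "AS", "MP"}


def normalize_lower48_longitude(longitude, state_id):
    """Return a western-hemisphere (negative) longitude for lower-48 plants."""
    if longitude is None:
        return None
    if state_id in NON_CONTIGUOUS:
        return longitude
    # Lower-48 longitudes are always west of Greenwich, so force the sign.
    return -abs(longitude)
```

A correction of this shape is idempotent: applying it to rows that are already negative leaves them unchanged, which is why a rebuild on top of the in-place fix should yield the same values.
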
## Step 1 — Did the timer fire? Did it succeed?

```bash
systemctl --user status ingest-eia-energy-layers.service
journalctl --user -u ingest-eia-energy-layers.service --since "yesterday" | tail -50
ls -lt output/ingest_*.log | head -3
```

Look for `Active: inactive (dead)` with `Main PID: ... (code=exited, status=0/SUCCESS)`.
Anything else means the job failed; read the log.
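
If eyeballing the status output gets tedious, the same check can be scripted. The sketch below reads properties from `systemctl show` (which prints `KEY=VALUE` lines) rather than scraping `status`. The helper names are made up for this note, and the handling of an unset `ExecMainExitTimestamp` (empty or `n/a`, depending on systemd version) is an assumption worth confirming on this machine.

```python
import subprocess

UNIT = "ingest-eia-energy-layers.service"
PROPS = "ActiveState,Result,ExecMainStatus,ExecMainExitTimestamp"


def parse_show_output(text):
    """Turn `systemctl show` KEY=VALUE lines into a dict."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def last_run_succeeded(props):
    """True only if the unit has actually run and exited cleanly."""
    # A unit that never ran also reports Result=success, so require an exit time.
    ran = props.get("ExecMainExitTimestamp", "") not in ("", "n/a")
    return (
        ran
        and props.get("ActiveState") == "inactive"
        and props.get("Result") == "success"
        and props.get("ExecMainStatus") == "0"
    )


def query_unit(unit=UNIT):
    """Ask the user-level systemd instance for the unit's properties."""
    out = subprocess.run(
        ["systemctl", "--user", "show", unit, f"--property={PROPS}"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_show_output(out)
```

Calling `last_run_succeeded(query_unit())` on Monday gives a yes/no answer; on `False`, fall back to the journal and the per-run log as above.
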
## Step 2 — Check both tables populated

```sql
-- Expected: ~4.7M rows, 2008-01 → ~2026-03
select count(*), min(period), max(period)
from public.energy_eia_operating_generator_capacity_flat;

-- Expected (if facility-fuel ingest succeeded): millions of rows, ~2001-01 → ~2026-02
select count(*), min(period), max(period)
from public.energy_eia_facility_fuel_flat;
```
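
The expectations in the SQL comments can also be written down as data, so the comparison is not done by eye. This is a sketch only: the `Expectation` class, the `check_stats` function, and the numeric thresholds are illustrative guesses loosely derived from those comments, and it assumes `period` comes back as a `'YYYY-MM'` string (which compares correctly as text).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expectation:
    min_rows: int         # lower bound on count(*)
    earliest: str         # expected min(period), 'YYYY-MM'
    latest_at_least: str  # max(period) should be this or later


# Thresholds are assumptions, chosen with slack below the expected values.
OGC = Expectation(min_rows=4_000_000, earliest="2008-01", latest_at_least="2026-01")
FACILITY_FUEL = Expectation(min_rows=1_000_000, earliest="2001-01", latest_at_least="2026-01")


def check_stats(count, min_period, max_period, exp):
    """Return a list of problems; an empty list means the table looks sane."""
    problems = []
    if count < exp.min_rows:
        problems.append(f"row count {count} is below {exp.min_rows}")
    if min_period is None or max_period is None:
        problems.append("table is empty or period is NULL")
        return problems
    if min_period > exp.earliest:
        problems.append(f"history starts at {min_period}, expected {exp.earliest}")
    if max_period < exp.latest_at_least:
        problems.append(f"latest period {max_period} is older than {exp.latest_at_least}")
    return problems
```

Feed it the three values returned by each query, for example `check_stats(count, min_period, max_period, FACILITY_FUEL)`.
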
## Step 3 — Verify facility-fuel column mapping

This is the one that needs human eyes. Run:

```sql
select plant_id, plant_name, state_id, state_name,
       prime_mover_code, prime_mover_desc,
       energy_source_code, energy_source_desc,
       generation_mwh, gross_generation_mwh,
       raw_properties
from public.energy_eia_facility_fuel_flat
limit 3;
```

**If typed columns are populated:** mapping is correct, ship it.

**If typed columns are NULL but `raw_properties` has data:** EIA's actual JSON
keys differ from my guesses. Inspect `raw_properties` to find the real keys
(the difference is probably camelCase vs. lowercase, or hyphens), then patch
the SELECT in [ingest_eia_energy_layers.py](../ingest_eia_energy_layers.py)
at the `if "energy_eia_electricity_facility_fuel" in available:` block
(~line 870).

After patching, the obvious move is to rebuild just the flat table without re-ingesting:

```bash
set -a && . ~/.zsh_secrets && set +a
python3 ingest_eia_energy_layers.py --skip-ingest --endpoint facility-fuel
```

That will not work after a successful weekly run. `--skip-ingest` bypasses the
ingest, but `build_flat_tables` reads from the *intermediate raw table*, which is
pruned at end-of-run by `keep_only_target_flat_table`. Once the weekly run has
succeeded, the raw facility-fuel table is gone, so patching the flat columns
means re-fetching the raw data:

```bash
python3 ingest_eia_energy_layers.py --endpoint facility-fuel
```

That re-ingests *only* facility-fuel (it does not touch OGC), then rebuilds
both flat tables with the corrected SELECT.
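
When inspecting `raw_properties` for the real keys, a case- and separator-insensitive comparison finds most mismatches quickly. The sketch below is a hypothetical aid, not part of the ingest script: `canonical` and `match_keys` are made-up names, the example keys are invented, and it assumes one `raw_properties` value has already been loaded as a Python dict.

```python
import re


def canonical(key):
    """Lowercase and drop separators so plantCode, plant-code and plant_code compare equal."""
    return re.sub(r"[^a-z0-9]", "", key.lower())


def match_keys(guessed_keys, raw_properties):
    """Map each guessed key to the actual key in raw_properties, or None if absent."""
    by_canonical = {canonical(actual): actual for actual in raw_properties}
    return {guess: by_canonical.get(canonical(guess)) for guess in guessed_keys}


# Invented example: two guesses differ only in casing/hyphens, one is missing.
sample = {"plantcode": "3", "gross-generation": "1250.5", "period": "2026-02"}
mapping = match_keys(["plantCode", "gross_generation", "fuelTypeDescription"], sample)
```

Guesses that map to a differently spelled key need the SELECT patched to the actual spelling; guesses that map to `None` need a closer look at the raw record, since the field may be named something else entirely.
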
## Step 4 — Possible failure modes & responses

| Symptom | Diagnosis | Action |
|---|---|---|
| Service status = failed, log shows `503` from facility-fuel | EIA-923 still down | Wait, then manually re-run `python3 ingest_eia_energy_layers.py --endpoint facility-fuel` when EIA recovers |
| Service status = failed, log shows error in OGC ingest | EIA-860 down or a different bug | Diagnose from the traceback; OGC has run successfully many times, so the failure is likely transient |
| Service succeeded, OGC row count looks right, facility-fuel table missing | Endpoint failed silently and the error didn't propagate (should not happen with current code) | Check the log carefully; this would be a bug in the new error handling |
| Service succeeded, both tables present, facility-fuel columns NULL | JSON key casing wrong | See the Step 3 patch path |

## Key paths

- Script: [`ingest_eia_energy_layers.py`](../ingest_eia_energy_layers.py)
- Wrapper: `~/.local/bin/ingest-eia-energy-layers-weekly`
- Service: `~/.config/systemd/user/ingest-eia-energy-layers.service`
- Timer: `~/.config/systemd/user/ingest-eia-energy-layers.timer`
- Per-run logs: `output/ingest_YYYYMMDD_HHMMSS.log` (kept by wrapper)
- Sample/narrative output: [`output/operating_generator_capacity_sample.txt`](../output/operating_generator_capacity_sample.txt)
- Facility-fuel narrative (pre-ingest): [`output/facility_fuel_pending_narrative.txt`](../output/facility_fuel_pending_narrative.txt)
- Env vars sourced from: `~/.zsh_secrets`

## What does NOT need doing

- OGC longitude fix is already deployed (in the script, and applied in-place Saturday). No re-run needed.
- systemd unit files: no changes required; the new code path uses the same wrapper.
- The `--endpoint` flag was added Saturday for ad-hoc per-dataset runs. It is useful for re-running just facility-fuel without disturbing OGC.

## Open thread to close out after verification

Update [`output/facility_fuel_pending_narrative.txt`](../output/facility_fuel_pending_narrative.txt)
once facility-fuel is actually ingested. Replace the "Pending" framing with
the real row count, period range, and any column-mapping notes from Step 3.
Mirror the format of `operating_generator_capacity_sample.txt`.

## New endpoint added — SEDS (State Energy Data System)

Wired up 2026-05-17. Endpoint: `seds` (annual frequency,
`https://api.eia.gov/v2/seds/data/`). Probed live, columns verified, and a smoke
test of 50 rows landed in `public.energy_eia_seds_flat` with all typed
columns populated.

**Verified JSON keys** (no sector field; sector is encoded in `seriesId`):
`period` (YYYY), `seriesId`, `seriesDescription`, `stateId`,
`stateDescription`, `value`, `unit`.

**Total volume:** ~2.57M rows across 65 years (1960–2024), ~40k rows/year.
Ingested year-by-year via the generalized `fetch_eia_pages_by_period` to
stay under EIA's 503 threshold (same pattern as the monthly endpoints).
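
For readers who have not seen the script, the period-by-period pattern looks roughly like the sketch below. This is not the body of `fetch_eia_pages_by_period`: the function `fetch_by_period` and the injected `fetch_page` callable are hypothetical, and the default page size of 5000 is an assumption about the API's per-request row cap. In real use `fetch_page` would wrap an HTTP GET that pins `start` and `end` to one period and passes `offset` and `length`.

```python
def fetch_by_period(periods, fetch_page, page_size=5000):
    """Yield rows one period at a time, paging within each period.

    fetch_page(period, offset, length) must return a list of at most
    `length` rows for that single period, starting at `offset`.
    """
    for period in periods:
        offset = 0
        while True:
            rows = fetch_page(period, offset, page_size)
            yield from rows
            # A short (or empty) page means this period is exhausted.
            if len(rows) < page_size:
                break
            offset += page_size


# Stand-in for the network call, so the paging logic can be exercised offline.
fake_data = {1960: list(range(7)), 1961: list(range(3))}


def fake_fetch_page(period, offset, length):
    return fake_data[period][offset:offset + length]


rows = list(fetch_by_period([1960, 1961], fake_fetch_page, page_size=3))
```
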

**What to verify Monday:**

```sql
-- Expected: ~2.5M+ rows, 1960 → 2024
select count(*), min(year), max(year)
from public.energy_eia_seds_flat;

-- Spot-check that typed columns landed (not all NULL)
select period, year, series_id, state_id, value, unit
from public.energy_eia_seds_flat
order by random()
limit 5;
```

If the row count is well under 2.5M, suspect a mid-run failure: check the log
for `503` errors on the `seds` endpoint and re-run with
`python3 ingest_eia_energy_layers.py --category state_energy --endpoint seds`.
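
Because SEDS is ingested one year at a time, a mid-run failure should show up as missing or thin years rather than as a uniformly low count. A per-year breakdown (for example from `select year, count(*) from public.energy_eia_seds_flat group by year`) can be screened with a sketch like this one. The function name and the 20,000-row floor are assumptions; the floor is simply half of the ~40k average and may be too strict for early years.

```python
def find_suspect_years(counts_by_year, first=1960, last=2024, min_rows=20_000):
    """Return {year: row_count} for years that are missing or below min_rows."""
    suspects = {}
    for year in range(first, last + 1):
        n = counts_by_year.get(year, 0)  # a year absent from the table counts as 0
        if n < min_rows:
            suspects[year] = n
    return suspects


# Invented example: one thin year and one missing year.
counts = {year: 40_000 for year in range(1960, 2025)}
counts[1987] = 1_200
del counts[2003]
suspects = find_suspect_years(counts)
```

An empty result alongside a low total would point away from a mid-run failure and toward something systematic, such as a filter dropping rows in every year.
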

**Product/API docs for reference:**

- Product page: https://www.eia.gov/state/seds/
- Technical notes: https://www.eia.gov/state/seds/seds-technical-notes-complete.php
- API documentation: https://www.eia.gov/opendata/documentation.php