================================================================================
EIA Facility-Fuel — Pending Dataset Narrative
Drafted 2026-05-16, prior to first successful ingest
================================================================================

STATUS
------
Wired into the weekly ingest pipeline as of 2026-05-16, but not yet
populated. At the time of writing, EIA's facility-fuel endpoint and its
parent EIA-923 service were in a sustained outage (network-level
connection timeouts, also visible on EIA's public dashboard). The table
will be populated by the first systemd run that reaches EIA. The next
scheduled run is Monday 03:30, or a manual run can load it sooner if EIA
recovers first.

Target table when populated: public.energy_eia_facility_fuel_flat

WHAT THIS DATA IS
-----------------
The "facility-fuel" endpoint
(https://api.eia.gov/v2/electricity/facility-fuel/) exposes Form EIA-923:
the monthly survey collected from electric power plants reporting their
fuel consumption and electricity output. Where operating-generator-capacity
tells us WHAT generators exist and WHERE they are, facility-fuel tells us
HOW MUCH electricity each plant actually produced each month.

Each row represents one (plant × energy source × prime mover × month)
combination. A coal-gas hybrid plant with both steam turbines and
combustion turbines, for example, would have multiple rows per month —
one for each fuel/prime-mover combination it ran during that month.
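To check that grain once the table is populated, count rows per composite key. Any key that appears more than once means either the grain assumption or the flattening is wrong. This is a minimal sketch over plain dicts standing in for flat-table rows. duplicate_grain_keys, the plant code and the rows are invented for illustration, and the key uses the planned column names listed below.

```python
from collections import Counter

# Assumed grain of public.energy_eia_facility_fuel_flat: one row per
# (period, plant_id, energy_source_code, prime_mover_code).
GRAIN = ("period", "plant_id", "energy_source_code", "prime_mover_code")


def duplicate_grain_keys(rows):
    """Return grain keys that occur more than once, mapped to their counts."""
    counts = Counter(tuple(row[col] for col in GRAIN) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


# Hypothetical hybrid plant: two fuel/prime-mover combinations in one month.
rows = [
    {"period": "2026-01", "plant_id": "1234",
     "energy_source_code": "NG", "prime_mover_code": "CT"},
    {"period": "2026-01", "plant_id": "1234",
     "energy_source_code": "BIT", "prime_mover_code": "ST"},
]
print(duplicate_grain_keys(rows))  # {} because each combination appears once
```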

WHAT IT TELLS US (PLANNED COLUMNS)
----------------------------------
For each plant, in each reporting month:

  period                 YYYY-MM reporting month
  plant_id               EIA plant code; joins to energy_eia_operating_generator_capacity_flat
  plant_name             Plant name (when present)
  state_id               Two-letter state
  state_name             Full state name (when present)
  prime_mover_code       ST=steam turbine, CT=combustion turbine, HY=hydraulic turbine, etc.
  prime_mover_desc       Human-readable prime mover
  energy_source_code     EIA fuel code (e.g., NG=natural gas, BIT=bituminous coal)
  energy_source_desc     Human-readable fuel
  generation_mwh         NET generation in megawatt-hours (gross minus the plant's own use; at the busbar)
  gross_generation_mwh   GROSS generation in megawatt-hours (at the generator terminals, before plant use)
  raw_properties         Full JSONB of the EIA response row (safety net)

The two MWh fields are the headline numbers — actual electricity output.
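Net generation is gross generation minus the plant's own consumption, so the difference between the two fields estimates station use. This is a minimal sketch with illustrative numbers (not EIA data), and the function names are hypothetical. Net generation can also be negative in a given month (for example, at pumped-storage plants, where pumping energy is netted out). For those plants the difference is not pure station use, and the share can exceed 1.

```python
def station_use_mwh(gross_mwh, net_mwh):
    """Energy consumed on site, implied by gross minus net for one row."""
    return gross_mwh - net_mwh


def station_use_share(gross_mwh, net_mwh):
    """Fraction of gross output consumed on site; None when gross is 0."""
    if gross_mwh == 0:
        return None
    return station_use_mwh(gross_mwh, net_mwh) / gross_mwh


# Illustrative values only.
print(station_use_mwh(105_000.0, 100_000.0))    # 5000.0
print(station_use_share(105_000.0, 100_000.0))  # about 0.048
```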

WHY BOTH TABLES MATTER
----------------------
The capacity table answers "what generators exist and where," but a
generator that exists is not the same as a generator that produces. A
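1,000 MW coal plant held in standby all month produces zero MWh; a
100 MW solar farm may run near nameplate at midday on a clear day, yet
over a whole month it produces far less than 100 MW running around the
clock would. Capacity sets the upper bound; facility-fuel reports the
realized output.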
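Turning that comparison into a number gives a monthly capacity factor. Generation is energy (MWh) and capacity is power (MW), so capacity must be multiplied by the hours in the reporting month before dividing. This is a minimal sketch, assuming capacity_mw is the plant's nameplate total taken from the capacity table for the same month. The function names and example figures are illustrative.

```python
import calendar


def hours_in_month(period):
    """Hours in a 'YYYY-MM' reporting month (leap years handled)."""
    year, month = (int(part) for part in period.split("-"))
    return 24 * calendar.monthrange(year, month)[1]


def capacity_factor(net_generation_mwh, capacity_mw, period):
    """Realized MWh divided by the MWh the plant would have produced
    running at full capacity for every hour of the month."""
    max_mwh = capacity_mw * hours_in_month(period)
    return net_generation_mwh / max_mwh if max_mwh else None


# Illustrative: a 100 MW plant producing 18,600 MWh in January 2026.
print(hours_in_month("2026-01"))                    # 744
print(capacity_factor(18_600.0, 100.0, "2026-01"))  # 0.25
```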

For data-center analyses specifically, this matters because:

  - Siting decisions correlate with available local generation. The
    capacity table shows nearby supply potential. The facility-fuel
    table shows whether that potential is actually being realized
    month-to-month (e.g., a nearby gas plant that runs only as a peaker
    is a very different story from one running baseload).

  - Carbon intensity per data center can be estimated by attributing
    nearby generation MWh to fuel type, weighted by distance or
    balancing-authority membership.

  - Grid stress signals (capacity utilization: monthly generation MWh
    divided by capacity MW × hours in the month) flag regions where
    plants already run close to their limits, leaving little headroom
    for new data-center load.
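The distance-weighted attribution in the carbon-intensity bullet can be sketched as follows. Each nearby plant's monthly MWh is weighted by inverse distance, and caller-supplied per-fuel emission factors are averaged over that weighted generation. The factors, the 100 km radius, the inverse-distance weighting, and the 1 km distance floor are all assumptions for illustration, and no real emission factors are given here. The balancing-authority variant would replace the distance filter with a membership test.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres on a spherical Earth."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(min(1.0, math.sqrt(a)))


def weighted_intensity(site, plants, factors, radius_km=100.0):
    """Inverse-distance-weighted emission intensity around a site.

    site:    (lat, lon) of the data center
    plants:  iterable of (lat, lon, energy_source_code, generation_mwh)
    factors: caller-supplied emissions per MWh keyed by energy_source_code
    Returns emissions per MWh in the units of `factors`, or None.
    """
    weighted_emissions = 0.0
    weighted_mwh = 0.0
    for lat, lon, fuel, mwh in plants:
        d = haversine_km(site[0], site[1], lat, lon)
        # Skip distant plants, non-positive net output, and unknown fuels.
        if d > radius_km or mwh <= 0 or fuel not in factors:
            continue
        w = 1.0 / max(d, 1.0)  # floor so a co-located plant can't dominate
        weighted_emissions += w * mwh * factors[fuel]
        weighted_mwh += w * mwh
    return weighted_emissions / weighted_mwh if weighted_mwh else None


# Two equidistant plants with illustrative (not real) factors.
plants = [(0.0, 0.1, "NG", 100.0), (0.0, -0.1, "SUN", 100.0)]
print(weighted_intensity((0.0, 0.0), plants, {"NG": 1.0, "SUN": 0.0}))  # 0.5
```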

JOIN PATTERN
------------
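The natural join keys are plant_id (text) and period. Typical analyst query:

  with cap as (
      -- one representative generator row per plant-month; plant-level
      -- attributes are assumed identical across a plant's generators
      select distinct on (plant_id, period)
             plant_id, period, plant_name, state_id, entity_name,
             latitude, longitude
      from public.energy_eia_operating_generator_capacity_flat
      where period = '2026-01'
      order by plant_id, period
  )
  select
      cap.plant_name,
      cap.state_id,
      cap.entity_name,
      cap.latitude,
      cap.longitude,
      ff.period,
      ff.energy_source_desc,
      ff.generation_mwh,
      ff.gross_generation_mwh
  from public.energy_eia_facility_fuel_flat ff
  join cap
       on cap.plant_id = ff.plant_id
      and cap.period   = ff.period
  where ff.period = '2026-01';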

Note: capacity rows are per generator; facility-fuel rows are per
plant × fuel × prime mover. Joining on plant_id and period alone pairs
every facility-fuel row with every generator at the plant, multiplying
rows. For most aggregate questions, aggregate one side first (e.g., sum
MWh per plant-month, or pick a representative generator per plant).
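The aggregate-first pattern can be tried end to end with Python's built-in sqlite3 module standing in for Postgres. The table names ff and cap, the generator_id and nameplate_mw columns, and all values are hypothetical, because this note does not list the capacity table's capacity column. In the example, one plant has two generators and two fuel/prime-mover rows. A plain join on plant_id and period returns four rows, and summing MWh over them would double the plant's output. Collapsing each side first returns a single row.

```python
import sqlite3

# In-memory stand-ins for the two flat tables, trimmed to what the query needs.
# nameplate_mw and generator_id are hypothetical column names.
con = sqlite3.connect(":memory:")
con.executescript("""
    create table cap (plant_id text, period text, generator_id text, nameplate_mw real);
    create table ff  (plant_id text, period text, energy_source_code text,
                      prime_mover_code text, generation_mwh real);
    insert into cap values ('1', '2026-01', 'G1', 300), ('1', '2026-01', 'G2', 200);
    insert into ff  values ('1', '2026-01', 'NG', 'CT', 40000),
                           ('1', '2026-01', 'BIT', 'ST', 60000);
""")

# Collapse each side to one row per plant-month BEFORE joining.
rows = con.execute("""
    with gen as (
        select plant_id, period, sum(generation_mwh) as net_mwh
        from ff group by plant_id, period
    ), capacity as (
        select plant_id, period, sum(nameplate_mw) as capacity_mw
        from cap group by plant_id, period
    )
    select g.plant_id, g.period, g.net_mwh, c.capacity_mw
    from gen g
    join capacity c on c.plant_id = g.plant_id and c.period = g.period
""").fetchall()
print(rows)  # [('1', '2026-01', 100000.0, 500.0)]
```

The CTE is plain SQL and should carry over to Postgres unchanged. It is not yet known whether the live endpoint also returns plant-level roll-up rows. If it does, those rows must be excluded before summing.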

EXPECTED SIZE
-------------
Monthly Form EIA-923 data go back to 2001-01. With ~10,000 reporting
plants and multiple fuel/prime-mover combinations per plant per month,
the table is expected in the 5–10 million row range, similar to or
somewhat larger than the capacity table. The ingest uses the same
per-month strategy as the capacity ingest (start=YYYY-MM&end=YYYY-MM,
retry/backoff), reused because it kept that table's wall time near two
hours and recovered cleanly from EIA's transient 503s.
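The per-month strategy can be sketched as two small pieces: a generator of one-month request windows and a retry wrapper with exponential backoff. This is an illustration, not the code in ingest_eia_energy_layers.py. month_windows, with_backoff and TransientError are hypothetical names. Mapping an HTTP 503 to TransientError is left to the caller's fetch function. The schedule (2 s base delay doubling on each retry, five attempts, up to 1 s of random jitter) is an assumption.

```python
import random
import time


def month_windows(start, end):
    """Yield (start, end) pairs of identical 'YYYY-MM' strings, one per
    month from start to end inclusive, for start=YYYY-MM&end=YYYY-MM."""
    year, month = (int(p) for p in start.split("-"))
    end_year, end_month = (int(p) for p in end.split("-"))
    while (year, month) <= (end_year, end_month):
        label = f"{year:04d}-{month:02d}"
        yield label, label
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)


class TransientError(Exception):
    """Stand-in for a retryable failure such as an HTTP 503."""


def with_backoff(fetch, attempts=5, base_delay=2.0, sleep=time.sleep):
    """Call fetch(); on TransientError wait base_delay * 2**n plus jitter
    and retry, re-raising once the final attempt has failed."""
    for n in range(attempts):
        try:
            return fetch()
        except TransientError:
            if n == attempts - 1:
                raise
            sleep(base_delay * 2 ** n + random.uniform(0, 1))


print(len(list(month_windows("2001-01", "2026-01"))))  # 301
```

The window count also checks the size estimate. With an assumed two to three fuel/prime-mover combinations per plant-month, 301 months × 10,000 plants gives roughly 6 to 9 million rows, inside the 5–10 million range.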

UNKNOWNS AT TIME OF DRAFT
-------------------------
The flat-table SELECT was written from EIA's API documentation without
confirming the exact JSON key casing returned by the live endpoint (the
documentation lists the facets as plantCode, fuel2002, primeMover, and
state, and the SELECT uses these names). If the live response differs
(e.g., plantid instead of plantCode), the affected typed columns will be
NULL, while the full original payload remains available in
raw_properties for inspection. The fix in that case is a one-line edit
to the SELECT in build_flat_tables() in ingest_eia_energy_layers.py.
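Once the first ingest lands, this can be checked directly from raw_properties. In Postgres, `select distinct jsonb_object_keys(raw_properties) from public.energy_eia_facility_fuel_flat;` lists every key the endpoint actually returned. The Python sketch below does the comparison for a single payload. It checks the four documented facet keys, flags case-only mismatches, and lists payload keys that match none of them, which is where a renamed key such as plantid would appear. key_report and the sample payload are hypothetical. Real payloads presumably also carry data fields, which will show up in the unrecognized list as well.

```python
# Facet keys the flat-table SELECT reads, per EIA's API documentation.
EXPECTED_KEYS = ("plantCode", "fuel2002", "primeMover", "state")


def key_report(raw_properties):
    """Return (report, unrecognized). The report maps each expected key to
    'ok', 'case differs: <actual>', or 'missing'. unrecognized lists payload
    keys that match no expected key, even case-insensitively."""
    by_lower = {k.lower(): k for k in raw_properties}
    expected_lower = {k.lower() for k in EXPECTED_KEYS}
    report = {}
    for key in EXPECTED_KEYS:
        if key in raw_properties:
            report[key] = "ok"
        elif key.lower() in by_lower:
            report[key] = f"case differs: {by_lower[key.lower()]}"
        else:
            report[key] = "missing"
    unrecognized = sorted(k for k in raw_properties if k.lower() not in expected_lower)
    return report, unrecognized


# Hypothetical payload with one renamed key and one casing difference.
sample = {"plantid": "3", "fuel2002": "NG", "primemover": "CT", "state": "AL"}
print(key_report(sample))  # plantCode reported missing; 'plantid' listed as unrecognized
```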

OPERATIONAL NOTES
-----------------
  - Runs in the same weekly systemd job as operating-generator-capacity,
    sequentially after it (Monday 03:30 via
    ingest-eia-energy-layers.timer).

  - Both tables are rebuilt from scratch each run (TRUNCATE on first
    page), so historical revisions EIA pushes upstream propagate
    automatically. There is no incremental-load mode, and none is
    planned, because the total wall time is acceptable.

  - If EIA-923 is down at run time, the wrapper script exits non-zero
    (it runs under `set -e`) and systemd marks the service as failed.
    The capacity ingest will still have completed, because it runs first.
