================================================================================
EIA Facility-Fuel - Pending Dataset Narrative
Drafted 2026-05-16, prior to first successful ingest
================================================================================

STATUS
------
Wired into the weekly ingest pipeline as of 2026-05-16, but not yet
populated. EIA's facility-fuel endpoint and its parent EIA-923 service
were experiencing a sustained outage at the time of writing (network-level
connection timeouts, also visible on EIA's public dashboard). The
endpoint is queued for the next successful systemd run (Monday 03:30,
or sooner if EIA recovers).

Target table when populated: public.energy_eia_facility_fuel_flat

WHAT THIS DATA IS
-----------------
The "facility-fuel" endpoint
(https://api.eia.gov/v2/electricity/facility-fuel/) exposes Form EIA-923,
the monthly survey in which electric power plants report their fuel
consumption and electricity output. Where operating-generator-capacity
tells us WHAT generators exist and WHERE they are, facility-fuel tells us
HOW MUCH electricity each plant actually produced each month.

Each row represents one (plant × energy source × prime mover × month)
combination. A coal-gas hybrid plant with both steam turbines and
combustion turbines, for example, would have multiple rows per month:
one for each fuel/prime-mover combination it ran during that month.

WHAT IT TELLS US (PLANNED COLUMNS)
----------------------------------
For each plant, in each reporting month:

  period                YYYY-MM reporting month
  plant_id              EIA plant code; joins to
                        energy_eia_operating_generator_capacity_flat
  plant_name            Plant name (when present)
  state_id              Two-letter state
  state_name            Full state name (when present)
  prime_mover_code      ST=steam turbine, CT=combined-cycle combustion
                        turbine, HY=hydro, etc.
  prime_mover_desc      Human-readable prime mover
  energy_source_code    EIA fuel code (e.g., NG=natural gas,
                        BIT=bituminous coal)
  energy_source_desc    Human-readable fuel
  generation_mwh        NET generation in megawatt-hours (gross minus the
                        plant's own use)
  gross_generation_mwh  GROSS generation in megawatt-hours (at the
                        generator terminals, before plant use)
  raw_properties        Full JSONB of the EIA response row (safety net)

The two MWh fields are the headline numbers: actual electricity output.

WHY BOTH TABLES MATTER
----------------------
The capacity table answers "what generators exist and where," but a
generator that exists is not the same as a generator that produces. A
1,000 MW coal plant in standby status produces zero MWh; a 100 MW solar
farm at noon produces near its nameplate. Capacity sets the upper bound;
facility-fuel reports the realized output.

For data-center analyses specifically, this matters because:

- Siting decisions correlate with available local generation. The
  capacity table shows nearby supply potential. The facility-fuel
  table shows whether that potential is actually being realized
  month-to-month (e.g., a nearby gas plant that runs only as a peaker
  is a very different story from one running baseload).

- Carbon intensity per data center can be estimated by attributing
  nearby generation MWh to fuel type, weighted by distance or
  balancing-authority membership.

- Grid stress signals (capacity utilization = generation in MWh /
  (capacity in MW × hours in the month)) flag regions where new
  data-center load may be unwelcome.
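
The capacity-utilization ratio in the last bullet mixes units (MWh of
generation against MW of capacity), so the hours in the reporting month
have to enter the denominator. A minimal sketch of that arithmetic,
assuming a plant-level capacity figure in MW is available from the
capacity table (the function and parameter names here are illustrative,
not part of the ingest script):

```python
import calendar
from typing import Optional


def hours_in_month(period: str) -> int:
    """Return the number of hours in a 'YYYY-MM' reporting month."""
    year, month = (int(part) for part in period.split("-"))
    days = calendar.monthrange(year, month)[1]  # handles leap years
    return days * 24


def capacity_utilization(
    generation_mwh: float, capacity_mw: float, period: str
) -> Optional[float]:
    """Generation divided by the maximum possible output for the month.

    Returns None when capacity is missing or non-positive, since the
    ratio is undefined in that case.
    """
    if capacity_mw is None or capacity_mw <= 0:
        return None
    return generation_mwh / (capacity_mw * hours_in_month(period))
```

Net generation can be negative in months where a plant's own use exceeds
its output, so a negative ratio is not necessarily a data error.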

JOIN PATTERN
------------
The natural join key is plant_id (text). Typical analyst query:

  select
    cap.plant_name,
    cap.state_id,
    cap.entity_name,
    cap.latitude,
    cap.longitude,
    ff.period,
    ff.energy_source_desc,
    ff.generation_mwh,
    ff.gross_generation_mwh
  from public.energy_eia_facility_fuel_flat ff
  join public.energy_eia_operating_generator_capacity_flat cap
    on cap.plant_id = ff.plant_id
   and cap.period = ff.period
  where ff.period = '2026-01';

Note: capacity rows are per generator; facility-fuel rows are per
plant × fuel × prime mover. Joining on plant_id (even with period added,
as above) therefore multiplies rows: each facility-fuel row repeats once
per generator at the plant. For most aggregate questions, aggregate one
side first (e.g., sum MWh per plant-month, or pick a representative
generator per plant).
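
The row-multiplication caveat can be seen on a toy dataset. The sketch
below uses an in-memory SQLite database as a stand-in for Postgres, with
shortened table names (cap, ff), hypothetical columns (generator_id,
capacity_mw), and made-up numbers; it is meant to illustrate the
aggregate-first pattern, not to mirror the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table cap (plant_id text, generator_id text, period text,
                      capacity_mw real);
    create table ff  (plant_id text, period text, energy_source_code text,
                      prime_mover_code text, generation_mwh real);
    -- One plant, two generators, two fuel/prime-mover rows (made-up data).
    insert into cap values ('P1', 'G1', '2026-01', 400),
                           ('P1', 'G2', '2026-01', 200);
    insert into ff  values ('P1', '2026-01', 'BIT', 'ST', 150000),
                           ('P1', '2026-01', 'NG',  'CT',  50000);
""")

# Naive join: every facility-fuel row repeats once per generator,
# so summing MWh over the joined rows double-counts.
naive_total = conn.execute("""
    select sum(ff.generation_mwh)
    from ff
    join cap on cap.plant_id = ff.plant_id and cap.period = ff.period
""").fetchone()[0]

# Aggregate each side to one row per plant-month before joining.
safe_rows = conn.execute("""
    with ff_plant as (
        select plant_id, period, sum(generation_mwh) as generation_mwh
        from ff group by plant_id, period
    ),
    cap_plant as (
        select plant_id, period, sum(capacity_mw) as capacity_mw
        from cap group by plant_id, period
    )
    select f.plant_id, f.period, f.generation_mwh, c.capacity_mw
    from ff_plant f
    join cap_plant c on c.plant_id = f.plant_id and c.period = f.period
""").fetchall()
```

Here the naive total is twice the plant's true output because the plant
has two generators, while the pre-aggregated join returns a single row
per plant-month. The two CTEs use only standard SQL, so the same shape
should carry over to the Postgres tables with the real column names.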

EXPECTED SIZE
-------------
Monthly Form EIA-923 data are published back to 2001-01. With ~10,000
reporting plants and multiple fuel/prime-mover combinations per plant
per month, the table is expected in the 5–10 million row range, similar
to or somewhat larger than the capacity table. The per-month ingest
strategy (start=YYYY-MM&end=YYYY-MM, retry/backoff) is identical to the
capacity ingest and was chosen specifically because it kept that table's
wall time near two hours and recovered cleanly from EIA's transient 503s.
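
The per-month strategy described above might look roughly like the
following. This is a sketch under stated assumptions, not the code in
the ingest script: the function names, retry count, and delay values are
hypothetical, and the actual HTTP call is passed in as a callable so the
retry logic can be exercised without touching the network.

```python
import time
import urllib.error
import urllib.parse
from typing import Any, Callable, Iterator


def month_range(start: str, end: str) -> Iterator[str]:
    """Yield 'YYYY-MM' strings from start to end, inclusive."""
    year, month = (int(p) for p in start.split("-"))
    end_year, end_month = (int(p) for p in end.split("-"))
    while (year, month) <= (end_year, end_month):
        yield f"{year:04d}-{month:02d}"
        month += 1
        if month > 12:
            year, month = year + 1, 1


def month_query(period: str) -> str:
    """Build the start/end query fragment for a single reporting month."""
    return urllib.parse.urlencode({"start": period, "end": period})


def fetch_with_backoff(
    fetch: Callable[[str], Any],
    period: str,
    max_attempts: int = 5,       # assumed value
    base_delay: float = 2.0,     # assumed value, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call fetch(period), retrying with exponential backoff.

    Retries on any URLError (HTTPError, and therefore a 503, is a
    subclass) and on timeouts; re-raises after the final attempt.
    """
    for attempt in range(max_attempts):
        try:
            return fetch(period)
        except (urllib.error.URLError, TimeoutError):
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

Requesting one month at a time keeps each failure small: a 503 costs a
retry of a single month rather than a restart of the whole history.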

UNKNOWNS AT TIME OF DRAFT
-------------------------
The flat-table SELECT was written from EIA's API documentation without
confirming the exact JSON key casing returned by the live endpoint.
The documentation lists the facets as plantCode, fuel2002, primeMover,
and state, and the SELECT uses these names. If the live response differs
(e.g., plantid vs plantCode), the affected typed columns will populate
as NULL, and the full original payload will still be available in
raw_properties for inspection. The fix in that case is a one-line edit
to the SELECT in build_flat_tables() in ingest_eia_energy_layers.py.
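
Once the first rows land, a quick check against one raw_properties
payload would show whether the key names match. The helper below is a
hypothetical sketch (it is not part of ingest_eia_energy_layers.py); the
expected key list is taken from the facet names quoted above, and the
sample row is invented for illustration.

```python
from typing import Any, List, Mapping, Optional

# Keys the draft SELECT expects, per the facet names in the API docs.
EXPECTED_KEYS = ("plantCode", "fuel2002", "primeMover", "state")


def missing_expected_keys(row: Mapping[str, Any]) -> List[str]:
    """Return the expected keys that are absent from a response row."""
    return [key for key in EXPECTED_KEYS if key not in row]


def suggest_key(row: Mapping[str, Any], expected: str) -> Optional[str]:
    """Return a key in row that matches expected ignoring case, if any.

    This only catches casing differences; a genuinely different name
    (e.g., plantid) still needs a manual look at the payload.
    """
    for key in row:
        if key.lower() == expected.lower():
            return key
    return None


# Invented example row with one key in unexpected casing.
sample_row = {"plantcode": "3", "fuel2002": "NG",
              "primeMover": "CT", "state": "AL"}
missing = missing_expected_keys(sample_row)
suggestions = {key: suggest_key(sample_row, key) for key in missing}
```

A non-empty result from missing_expected_keys on a live row would be the
signal to make the one-line SELECT edit described above.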

OPERATIONAL NOTES
-----------------
- Runs in the same weekly systemd job as operating-generator-capacity,
  sequentially after it (Monday 03:30 via
  ingest-eia-energy-layers.timer).

- Both tables are rebuilt from scratch each run (TRUNCATE on first
  page), so historical revisions EIA pushes upstream propagate
  automatically. There is no incremental-load mode and none is
  planned, because total wall time is acceptable.

- If EIA-923 is down at run time, the wrapper's `set -e` will mark
  the systemd service as failed; the capacity ingest will still have
  completed successfully because it runs first.