output/facility_fuel_pending_narrative.txt
================================================================================
EIA Facility-Fuel: Pending Dataset Narrative
Drafted 2026-05-16, prior to first successful ingest
================================================================================

STATUS
------
Wired into the weekly ingest pipeline as of 2026-05-16, but not yet
populated. EIA's facility-fuel endpoint and its parent EIA-923 service
were in a sustained outage at the time of writing (network-level
connection timeouts, also visible on EIA's public dashboard). The
endpoint will be picked up by the next successful run of the weekly
systemd job (Monday 03:30), or sooner if EIA recovers and the job is
triggered manually.

Target table when populated: public.energy_eia_facility_fuel_flat

WHAT THIS DATA IS
-----------------
The "facility-fuel" endpoint
(https://api.eia.gov/v2/electricity/facility-fuel/) exposes Form EIA-923,
the monthly survey in which electric power plants report their fuel
consumption and electricity output. Where operating-generator-capacity
tells us WHAT generators exist and WHERE they are, facility-fuel tells us
HOW MUCH electricity each plant actually produced each month.

Each row represents one (plant × energy source × prime mover × month)
combination. A plant that burns both coal and gas, with both steam
turbines and combustion turbines, for example, has multiple rows per
month: one for each fuel/prime-mover combination it ran that month.

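Because each row is one (plant × energy source × prime mover × month) combination, that 4-tuple should be unique once the table is populated. A minimal Python sketch of a uniqueness check over fetched rows, assuming they are dicts keyed by the planned flat-table column names; `find_duplicate_keys`, `GRAIN`, and the toy rows are hypothetical, not part of ingest_eia_energy_layers.py:

```python
from collections import Counter
from typing import Iterable, Mapping

# Assumed grain of the flat table: one row per (month, plant, fuel, prime mover).
GRAIN = ("period", "plant_id", "energy_source_code", "prime_mover_code")


def find_duplicate_keys(rows: Iterable[Mapping[str, object]]) -> list[tuple]:
    """Return each grain key that occurs more than once, in first-seen order."""
    counts = Counter(tuple(row.get(col) for col in GRAIN) for row in rows)
    return [key for key, n in counts.items() if n > 1]


if __name__ == "__main__":
    # Toy rows: one plant running coal in a steam turbine and gas in a combustion turbine.
    rows = [
        {"period": "2026-01", "plant_id": "1", "energy_source_code": "BIT", "prime_mover_code": "ST"},
        {"period": "2026-01", "plant_id": "1", "energy_source_code": "NG", "prime_mover_code": "GT"},
    ]
    print(find_duplicate_keys(rows))  # [] -> the grain holds
```
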
WHAT IT TELLS US (PLANNED COLUMNS)
----------------------------------
For each plant, in each reporting month:

    period                YYYY-MM reporting month
    plant_id              EIA plant code; joins to operating_generator_capacity_flat
    plant_name            Plant name (when present)
    state_id              Two-letter state postal code
    state_name            Full state name (when present)
    prime_mover_code      EIA prime mover code (e.g., ST=steam turbine,
                          GT=combustion (gas) turbine, CT=combined-cycle
                          combustion-turbine part, HY=hydraulic turbine)
    prime_mover_desc      Human-readable prime mover
    energy_source_code    EIA fuel code (e.g., NG=natural gas, BIT=bituminous coal)
    energy_source_desc    Human-readable fuel
    generation_mwh        NET generation in megawatt-hours (gross minus
                          station service, i.e., the plant's own use)
    gross_generation_mwh  GROSS generation in megawatt-hours (at the
                          generator terminals, before station service)
    raw_properties        Full JSONB of the EIA response row (safety net)

The two MWh fields are the headline numbers: the plant's actual
electricity output.

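Under these definitions the two MWh columns differ by station service: net = gross − station service. A minimal sketch of recovering station service from one row, assuming both columns are populated for that row; `station_service` is a hypothetical helper:

```python
import math


def station_service(gross_mwh: float, net_mwh: float) -> tuple[float, float]:
    """Return (station-service MWh, station service as a share of gross).

    Net generation = gross generation - station service, so the gap between
    gross_generation_mwh and generation_mwh is what the plant used itself.
    """
    used = gross_mwh - net_mwh
    share = used / gross_mwh if gross_mwh else math.nan  # undefined when gross is 0
    return used, share


if __name__ == "__main__":
    print(station_service(1000.0, 950.0))  # (50.0, 0.05)
```
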
WHY BOTH TABLES MATTER
----------------------
The capacity table answers "what generators exist and where," but a
generator that exists is not the same as a generator that produces. A
1,000 MW coal plant in standby status produces zero MWh; a 100 MW solar
farm can run near its nameplate at midday. Capacity sets the upper bound;
facility-fuel reports the realized output.

For data-center analyses specifically, this matters because:

- Siting decisions correlate with available local generation. The
  capacity table shows nearby supply potential; the facility-fuel
  table shows whether that potential is actually realized month to
  month (e.g., a nearby gas plant that runs only as a peaker is a
  very different story from one running baseload).

- Carbon intensity per data center can be estimated by attributing
  nearby generation MWh to fuel type, weighted by distance or
  balancing-authority membership.

- Grid stress signals: capacity utilization (generation MWh divided by
  capacity MW × hours in the month, i.e., the capacity factor) flags
  regions where existing plants already run near their limits and new
  data-center load may be unwelcome.

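Because generation is energy (MWh) and capacity is power (MW), the utilization ratio needs the hours in the period. A minimal Python sketch of that capacity factor, plus one way to weight nearby generation by distance for the carbon-intensity bullet. Assumptions: capacity is nameplate MW summed per plant, coordinates come from the capacity table's latitude/longitude, and inverse-distance weighting is just one choice (balancing-authority membership is the other option named above); all function names are hypothetical. Turning the resulting fuel shares into carbon intensity would still need per-fuel emission factors, which are not among the planned columns.

```python
import calendar
import math
from collections import defaultdict
from typing import Iterable


def hours_in_month(period: str) -> int:
    """Hours in a 'YYYY-MM' period; calendar.monthrange handles leap years."""
    year, month = (int(part) for part in period.split("-"))
    return 24 * calendar.monthrange(year, month)[1]


def capacity_factor(generation_mwh: float, capacity_mw: float, period: str) -> float:
    """Realized MWh as a fraction of the MWh possible at full capacity all month."""
    return generation_mwh / (capacity_mw * hours_in_month(period))


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km, assuming a spherical Earth of radius 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def weighted_fuel_mix(site: tuple[float, float],
                      plants: Iterable[tuple[float, float, str, float]],
                      min_km: float = 1.0) -> dict[str, float]:
    """Inverse-distance-weighted share of nearby net generation by fuel code.

    plants holds (latitude, longitude, energy_source_code, generation_mwh);
    min_km floors the distance so a co-located plant does not get infinite weight.
    """
    weights: dict[str, float] = defaultdict(float)
    for lat, lon, fuel, mwh in plants:
        dist = max(haversine_km(site[0], site[1], lat, lon), min_km)
        weights[fuel] += max(mwh, 0.0) / dist  # net MWh can be negative; clip to 0
    total = sum(weights.values())
    return {fuel: w / total for fuel, w in weights.items()} if total else {}


if __name__ == "__main__":
    print(capacity_factor(372_000.0, 1_000.0, "2026-01"))  # 0.5 (31 days = 744 h)
```
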
JOIN PATTERN
------------
The natural join key is plant_id (text), plus period to line up months.
Typical analyst query:

    select
        cap.plant_name,
        cap.state_id,
        cap.entity_name,
        cap.latitude,
        cap.longitude,
        ff.period,
        ff.energy_source_desc,
        ff.generation_mwh,
        ff.gross_generation_mwh
    from public.energy_eia_facility_fuel_flat ff
    join public.energy_eia_operating_generator_capacity_flat cap
        on cap.plant_id = ff.plant_id
       and cap.period = ff.period
    where ff.period = '2026-01';

Note: capacity rows are per generator; facility-fuel rows are per
plant × fuel × prime mover. Joining on plant_id (even together with
period) therefore multiplies rows: a plant with 3 generators and 2
fuel/prime-mover rows yields 6 joined rows, and summing generation_mwh
over them overcounts. For most aggregate questions, aggregate one side
first (e.g., sum MWh per plant-month, or pick a representative generator
per plant).

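The fan-out, and the aggregate-first fix, can be reproduced on toy data. This sketch uses Python's built-in sqlite3 as a stand-in for Postgres; the table names `cap` and `ff`, the `generator_id` and `capacity_mw` columns, and all values are hypothetical. It aggregates both sides to plant-month before joining, so each plant-month yields one row; against the real flat tables the same CTE shape should carry over, with column names adjusted.

```python
import sqlite3

# Toy stand-ins for the two flat tables (names and values are illustrative only).
SCHEMA = """
create table cap (plant_id text, generator_id text, period text, capacity_mw real);
create table ff  (plant_id text, period text, energy_source_code text,
                  prime_mover_code text, generation_mwh real);
"""

# Direct join: every generator row pairs with every fuel/prime-mover row.
NAIVE = """
select count(*), sum(ff.generation_mwh)
from ff join cap on cap.plant_id = ff.plant_id and cap.period = ff.period
"""

# Aggregate each side to plant-month first, then join one row to one row.
AGGREGATED = """
with ff_pm as (
    select plant_id, period, sum(generation_mwh) as generation_mwh
    from ff group by plant_id, period
), cap_pm as (
    select plant_id, period, sum(capacity_mw) as capacity_mw
    from cap group by plant_id, period
)
select ff_pm.plant_id, ff_pm.period, ff_pm.generation_mwh, cap_pm.capacity_mw
from ff_pm join cap_pm
  on cap_pm.plant_id = ff_pm.plant_id and cap_pm.period = ff_pm.period
"""


def demo() -> tuple[tuple, list[tuple]]:
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    # One plant, two generators, two fuel/prime-mover rows in the same month.
    con.executemany("insert into cap values (?, ?, ?, ?)",
                    [("P1", "G1", "2026-01", 100.0), ("P1", "G2", "2026-01", 50.0)])
    con.executemany("insert into ff values (?, ?, ?, ?, ?)",
                    [("P1", "2026-01", "NG", "GT", 30000.0),
                     ("P1", "2026-01", "BIT", "ST", 20000.0)])
    naive = con.execute(NAIVE).fetchone()
    aggregated = con.execute(AGGREGATED).fetchall()
    con.close()
    return naive, aggregated


if __name__ == "__main__":
    naive, aggregated = demo()
    print(naive)       # (4, 100000.0): 2 generators x 2 fuel rows, MWh double-counted
    print(aggregated)  # [('P1', '2026-01', 50000.0, 150.0)]
```
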
EXPECTED SIZE
-------------
Form EIA-923 monthly data is published back to 2001-01. With ~10,000
reporting plants and multiple fuel/prime-mover combinations per plant per
month, the table is expected to land in the 5–10 million row range,
similar to or somewhat larger than the capacity table. The per-month
ingest strategy (start=YYYY-MM&end=YYYY-MM, with retry/backoff) is
identical to the capacity ingest and was chosen because it kept that
table's wall time near two hours and recovered cleanly from EIA's
transient 503s.

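The per-month strategy amounts to one request window per calendar month, each retried on transient failures. A minimal sketch of that loop, assuming only the documented start/end window; `month_windows`, `fetch_with_retry`, `TransientError`, and the defaults (5 attempts, delay doubling from 2 s) are hypothetical and not taken from ingest_eia_energy_layers.py. The HTTP call is passed in as `fetch`, so the backoff can be exercised without touching EIA.

```python
import time
from typing import Callable, Iterator, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Raised by the fetch callable for retryable failures (e.g. HTTP 503, timeouts)."""


def month_windows(first: str, last: str) -> Iterator[str]:
    """Yield every 'YYYY-MM' period from first to last, inclusive."""
    year, month = map(int, first.split("-"))
    end = tuple(map(int, last.split("-")))
    while (year, month) <= end:
        yield f"{year:04d}-{month:02d}"
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)


def fetch_with_retry(fetch: Callable[[str, str], T], period: str,
                     attempts: int = 5, base_delay: float = 2.0,
                     sleep: Callable[[float], None] = time.sleep) -> T:
    """Fetch one month (start == end == period), doubling the wait after each transient failure."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fetch(period, period)  # i.e. start=YYYY-MM&end=YYYY-MM
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of retries: let the caller (and systemd) see the failure
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")


if __name__ == "__main__":
    # Hypothetical stand-in for the real HTTP call; it only echoes its window.
    def fake_fetch(start: str, end: str) -> list[str]:
        return [f"{start}..{end}"]

    for period in month_windows("2026-01", "2026-03"):
        print(fetch_with_retry(fake_fetch, period))
```

A full backfill from 2001-01 through 2026-04, for example, is 304 monthly windows.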
UNKNOWNS AT TIME OF DRAFT
-------------------------
The flat-table SELECT was written from EIA's API documentation without
confirming the exact JSON key names and casing returned by the live
endpoint. The documentation lists the facets as plantCode, fuel2002,
primeMover, and state, and the SELECT uses those names. If the live
response differs (e.g., plantid instead of plantCode), the affected typed
columns will load as NULL, while the full original payload remains
available in raw_properties for inspection. The fix in that case is a
one-line edit to the SELECT in build_flat_tables() in
ingest_eia_energy_layers.py.

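The key-name risk can be checked against a single live row before trusting the typed columns. A minimal sketch, assuming the documented facet names plus `period` are what the SELECT reads; `key_report`, `EXPECTED_KEYS`, and the toy row are hypothetical. The same comparison could also be run in Postgres after a load, e.g. over `jsonb_object_keys(raw_properties)`.

```python
from typing import Mapping

# Keys the flat-table SELECT is assumed to read (documented facets plus period).
EXPECTED_KEYS = {"period", "plantCode", "fuel2002", "primeMover", "state"}


def key_report(row: Mapping[str, object],
               expected: set[str] = EXPECTED_KEYS) -> dict[str, list[str]]:
    """Compare one live response row against the keys the SELECT expects.

    'missing' keys would load as NULL in the typed columns; 'unexpected' keys
    are candidates for the corrected name (e.g. plantid instead of plantCode).
    """
    present = set(row)
    return {
        "missing": sorted(expected - present),
        "unexpected": sorted(present - expected),
    }


if __name__ == "__main__":
    # Toy row in which the plant key came back under a different name.
    live_row = {"period": "2026-01", "plantid": "3", "fuel2002": "NG",
                "primeMover": "GT", "state": "AL"}
    print(key_report(live_row))  # {'missing': ['plantCode'], 'unexpected': ['plantid']}
```
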
OPERATIONAL NOTES
-----------------
- Runs in the same weekly systemd job as operating-generator-capacity,
  sequentially after it (Monday 03:30 via
  ingest-eia-energy-layers.timer).

- Both tables are rebuilt from scratch on each run (TRUNCATE on the
  first page), so historical revisions EIA pushes upstream propagate
  automatically. There is no incremental-load mode and none is
  planned; total wall time is acceptable.

- If EIA-923 is down at run time, the wrapper's `set -e` will mark
  the systemd service as failed; the capacity ingest will still have
  completed successfully because it runs first.

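Truncating on the first page, rather than at job start, means an outage that fails the very first request (the EIA-923 scenario above) should never empty the table. A minimal sketch of that ordering, with sqlite3 standing in for Postgres (SQLite has no TRUNCATE, so DELETE takes its place); `reload_table` and the single-transaction wrapping are hypothetical, not taken from the pipeline. In Postgres, TRUNCATE is transactional too, though it holds an ACCESS EXCLUSIVE lock until commit, which matters for a multi-hour load.

```python
import sqlite3
from typing import Iterable, Sequence


def reload_table(con: sqlite3.Connection, table: str,
                 pages: Iterable[Sequence[tuple]]) -> int:
    """Replace a table's contents from a paged source, clearing it only once page 1 arrives.

    Assumption of this sketch: the whole reload is one transaction, so a
    failure on any later page rolls back to the previous contents as well.
    `table` is interpolated into SQL and must be a trusted identifier.
    """
    loaded = 0
    with con:  # sqlite3: commit on normal exit, rollback if an exception escapes
        for page_no, rows in enumerate(pages):
            if page_no == 0:
                con.execute(f"delete from {table}")  # SQLite stand-in for TRUNCATE
            if rows:
                marks = ", ".join("?" * len(rows[0]))
                con.executemany(f"insert into {table} values ({marks})", rows)
                loaded += len(rows)
    return loaded
```
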