got the ingest for energy eia data. created txt files of their descriptions

output/facility_fuel_pending_narrative.txt
================================================================================
EIA Facility-Fuel: Pending Dataset Narrative
Drafted 2026-05-16, prior to first successful ingest
================================================================================

STATUS
------
Wired into the weekly ingest pipeline as of 2026-05-16, but not yet
populated. EIA's facility-fuel endpoint and its parent EIA-923 service
were experiencing a sustained outage at the time of writing (network-level
connection timeouts, also visible on EIA's public dashboard). The
endpoint will be populated by the next successful systemd run (the
Monday 03:30 timer, or a manual run sooner if EIA recovers first).

Target table when populated: public.energy_eia_facility_fuel_flat

WHAT THIS DATA IS
-----------------
The "facility-fuel" endpoint
(https://api.eia.gov/v2/electricity/facility-fuel/) exposes Form EIA-923:
the monthly survey in which electric power plants report their fuel
consumption and electricity output. Where operating-generator-capacity
tells us WHAT generators exist and WHERE they are, facility-fuel tells us
HOW MUCH electricity each plant actually produced each month.

Each row represents one (plant × energy source × prime mover × month)
combination. A hybrid coal-and-gas plant with both steam turbines and
combustion turbines, for example, would have multiple rows per month:
one for each fuel/prime-mover combination it ran during that month.

WHAT IT TELLS US (PLANNED COLUMNS)
----------------------------------
For each plant, in each reporting month:

  period                YYYY-MM reporting month
  plant_id              EIA plant code; joins to
                        energy_eia_operating_generator_capacity_flat
  plant_name            Plant name (when present)
  state_id              Two-letter state code
  state_name            Full state name (when present)
  prime_mover_code      ST=steam turbine, GT=combustion (gas) turbine,
                        CT=combined-cycle combustion turbine, HY=hydro, etc.
  prime_mover_desc      Human-readable prime mover
  energy_source_code    EIA fuel code (e.g., NG=natural gas, BIT=bituminous coal)
  energy_source_desc    Human-readable fuel
  generation_mwh        NET generation in megawatt-hours (gross minus plant use)
  gross_generation_mwh  GROSS generation in megawatt-hours (measured at the
                        generator terminals, before plant use)
  raw_properties        Full JSONB of the EIA response row (safety net)

The two MWh fields are the headline numbers: actual electricity output.
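
Since both MWh fields arrive on the same row, gross minus net is the energy the plant used itself for that fuel/prime-mover slice. A minimal Python sketch of rolling that up per plant-month, assuming rows have already been fetched as dicts keyed by the planned column names above; `plant_use_by_plant_month` and the sample numbers are hypothetical:

```python
from collections import defaultdict


def plant_use_by_plant_month(rows):
    """Sum (gross - net) MWh per (plant_id, period).

    Rows missing either MWh field are skipped. Net generation can be
    negative (e.g. pumped storage, or a plant whose own use exceeds its
    output), so the result is deliberately not clamped at zero.
    """
    totals = defaultdict(float)
    for row in rows:
        gross = row.get("gross_generation_mwh")
        net = row.get("generation_mwh")
        if gross is None or net is None:
            continue
        totals[(row["plant_id"], row["period"])] += gross - net
    return dict(totals)


# Illustrative rows for one plant-month; values are made up.
rows = [
    {"plant_id": "3", "period": "2026-01", "generation_mwh": 950.0, "gross_generation_mwh": 1000.0},
    {"plant_id": "3", "period": "2026-01", "generation_mwh": 480.0, "gross_generation_mwh": 500.0},
    {"plant_id": "3", "period": "2026-01", "generation_mwh": None, "gross_generation_mwh": 10.0},
]
print(plant_use_by_plant_month(rows))  # {('3', '2026-01'): 70.0}
```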

WHY BOTH TABLES MATTER
----------------------
The capacity table answers "what generators exist and where," but a
generator that exists is not the same as a generator that produces. A
1,000 MW coal plant in standby status produces zero MWh; a 100 MW solar
farm produces near its nameplate at noon on a clear day but nothing at
night. Capacity sets the upper bound; facility-fuel reports the realized
output.
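
One way to quantify "capacity sets the upper bound" is a capacity factor: net MWh divided by capacity in MW times the hours in that month. A minimal Python sketch; the capacity table's MW column is not described in this narrative, so `capacity_mw` is a hypothetical input (for example, summed per plant-month), and the 18,600 MWh figure is illustrative only:

```python
import calendar


def hours_in_month(period: str) -> int:
    """Hours in a 'YYYY-MM' reporting month (daylight-saving shifts ignored)."""
    year, month = (int(part) for part in period.split("-"))
    days_in_month = calendar.monthrange(year, month)[1]
    return days_in_month * 24


def capacity_factor(generation_mwh: float, capacity_mw: float, period: str) -> float:
    """Realized output as a fraction of the month's theoretical maximum.

    capacity_mw is hypothetical here: e.g. summed MW for the plant's
    generators in the capacity table for the same period.
    """
    if capacity_mw <= 0:
        raise ValueError("capacity_mw must be positive")
    return generation_mwh / (capacity_mw * hours_in_month(period))


# 1,000 MW plant on standby for all of January: no output.
print(capacity_factor(0.0, 1000.0, "2026-01"))  # 0.0
# 100 MW solar farm, illustrative 18,600 MWh over January's 744 hours.
print(capacity_factor(18_600.0, 100.0, "2026-01"))  # 0.25
```

Using calendar hours keeps February and 31-day months comparable; a plain MWh/MW ratio would not be.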

For data-center analyses specifically, this matters because:

- Siting decisions correlate with available local generation. The
  capacity table shows nearby supply potential. The facility-fuel
  table shows whether that potential is actually being realized
  month-to-month (e.g., a nearby gas plant that runs only as a peaker
  is a very different story from one running baseload).

- Carbon intensity per data center can be estimated by attributing
  nearby generation MWh to fuel type, weighted by distance or
  balancing-authority membership.

- Grid stress signals (capacity utilization = monthly generation MWh /
  (capacity MW × hours in the month)) flag regions where new
  data-center load may be unwelcome.
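
The distance-weighted attribution in the second bullet could start from a fuel mix like the one below. This Python sketch only computes each fuel's share of nearby net MWh; turning that into carbon intensity would also need per-fuel emission factors, which this narrative does not specify, and the balancing-authority variant is not shown. Inverse-distance weighting, the 50 km radius, the 1 km floor, and the input tuple shape are all illustrative assumptions:

```python
import math
from collections import defaultdict

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def weighted_fuel_mix(site, plants, radius_km=50.0, min_km=1.0):
    """Inverse-distance-weighted share of nearby net MWh by fuel.

    site:      (lat, lon) of a hypothetical data center
    plants:    iterable of (lat, lon, energy_source_desc, generation_mwh)
               for a single period, e.g. rows from the JOIN PATTERN query
    radius_km: hypothetical search radius
    min_km:    distance floor so a co-located plant's weight stays finite
    """
    weights = defaultdict(float)
    for lat, lon, fuel, mwh in plants:
        if mwh is None or mwh <= 0:
            continue  # nothing positive to attribute
        d = haversine_km(site[0], site[1], lat, lon)
        if d > radius_km:
            continue  # too far away to count as "nearby"
        weights[fuel] += mwh / max(d, min_km)
    total = sum(weights.values())
    return {fuel: w / total for fuel, w in weights.items()} if total else {}


# Illustrative coordinates and MWh values only.
mix = weighted_fuel_mix(
    (31.0, -88.0),
    [
        (31.004167, -88.013889, "Bituminous Coal", 300_000.0),
        (31.1, -88.2, "Natural Gas", 100_000.0),
        (40.0, -100.0, "Wind", 500_000.0),  # far outside the radius, ignored
    ],
)
print(mix)
```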

JOIN PATTERN
------------
The natural join key is plant_id (text). Typical analyst query:

    select
        cap.plant_name,
        cap.state_id,
        cap.entity_name,
        cap.latitude,
        cap.longitude,
        ff.period,
        ff.energy_source_desc,
        ff.generation_mwh,
        ff.gross_generation_mwh
    from public.energy_eia_facility_fuel_flat ff
    join public.energy_eia_operating_generator_capacity_flat cap
      on cap.plant_id = ff.plant_id
     and cap.period = ff.period
    where ff.period = '2026-01';

Note: capacity rows are per generator; facility-fuel rows are per
plant × fuel × prime mover. Even with period in the join, every
facility-fuel row is repeated once per generator at that plant, so
summing generation_mwh over this result overcounts. For most aggregate
questions, aggregate one side first (e.g., sum MWh per plant-month, or
pick a representative generator per plant).
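
The aggregate-first advice is easy to check on toy data. The Python sketch below uses an in-memory SQLite database as a stand-in for the two Postgres tables (only the join columns are created, and every value is made up): the naive join double-counts a two-generator plant, while collapsing the capacity side to one row per plant-month first does not.

```python
import sqlite3

# In-memory SQLite stand-ins for the two Postgres flat tables, reduced to the
# columns the join needs. All values are toy data.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table cap (plant_id text, generator_id text, period text, plant_name text);
    create table ff  (plant_id text, period text, energy_source_desc text,
                      generation_mwh real);

    -- One plant with two generators and two fuel rows in the same month.
    insert into cap values ('P1', '1', '2026-01', 'Example Plant'),
                           ('P1', '2', '2026-01', 'Example Plant');
    insert into ff  values ('P1', '2026-01', 'Natural Gas', 500.0),
                           ('P1', '2026-01', 'Bituminous Coal', 200.0);
    """
)

# Naive join: every ff row repeats once per generator, so MWh double-counts.
naive = conn.execute(
    """
    select sum(ff.generation_mwh)
    from ff
    join cap on cap.plant_id = ff.plant_id and cap.period = ff.period
    """
).fetchone()[0]

# Aggregate the capacity side to one row per plant-month first, then join.
aggregated = conn.execute(
    """
    with plant_month as (
        select plant_id, period, min(plant_name) as plant_name,
               count(*) as n_generators
        from cap
        group by plant_id, period
    )
    select sum(ff.generation_mwh)
    from ff
    join plant_month pm on pm.plant_id = ff.plant_id and pm.period = ff.period
    """
).fetchone()[0]

print(naive, aggregated)  # 1400.0 700.0
```

The same CTE shape should carry over to the Postgres tables, with `n_generators` available if a per-plant generator count is useful downstream.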

EXPECTED SIZE
-------------
Form EIA-923 monthly data is published back to 2001-01. With ~10,000
reporting plants and multiple fuel/prime-mover combinations per plant
per month, the table is expected to land in the 5–10 million row range,
similar to or somewhat larger than the capacity table. The per-month
ingest strategy (start=YYYY-MM&end=YYYY-MM, retry/backoff) is identical
to the capacity ingest and was chosen specifically because it kept that
table's wall time near two hours and recovered cleanly from EIA's
transient 503s.
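
A minimal Python sketch of that per-month loop, assuming a caller-supplied `fetch` function that performs one request and raises on a transient failure such as a 503. `iter_months`, `fetch_with_backoff`, `TransientError`, the retry count, and the delays are hypothetical; the real logic lives in ingest_eia_energy_layers.py and may differ. Pagination within a month, the API key, and EIA's other query parameters are not modelled here.

```python
import time
from typing import Callable, Iterator, List, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for retryable failures (HTTP 503, timeouts)."""


def iter_months(first: str, last: str) -> Iterator[str]:
    """Yield 'YYYY-MM' strings from first to last, inclusive."""
    year, month = (int(p) for p in first.split("-"))
    end = tuple(int(p) for p in last.split("-"))
    while (year, month) <= end:
        yield f"{year:04d}-{month:02d}"
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)


def fetch_with_backoff(
    fetch: Callable[[dict], T],
    params: dict,
    retries: int = 5,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(params); on TransientError wait 2 s, 4 s, 8 s, ... and retry."""
    for attempt in range(retries + 1):
        try:
            return fetch(params)
        except TransientError:
            if attempt == retries:
                raise  # out of retries: surface the failure to the caller
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")


def ingest(fetch: Callable[[dict], T], first: str, last: str) -> List[T]:
    """One request window per month, mirroring start=YYYY-MM&end=YYYY-MM."""
    return [fetch_with_backoff(fetch, {"start": m, "end": m}) for m in iter_months(first, last)]


# 2001-01 is from the text above; the end month is illustrative.
months = list(iter_months("2001-01", "2026-02"))
print(len(months))  # 302
```

Keeping each request to a single month bounds how much work one retry repeats, which is the property the paragraph above credits for the capacity ingest's clean recovery.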

UNKNOWNS AT TIME OF DRAFT
-------------------------
The flat-table SELECT was written from EIA's API documentation without
confirmation of the exact JSON key casing returned by the live endpoint
(the documentation lists the facets as plantCode, fuel2002, primeMover,
and state, and the SELECT uses these names). If the live response
differs (e.g., plantid vs plantCode), the typed columns will populate as
NULL for those rows, because JSON key lookups in Postgres are
case-sensitive and a missing key yields NULL rather than an error. The
full original payload will still be available in raw_properties for
inspection. The fix in that case is a one-line edit to the SELECT in
build_flat_tables() in ingest_eia_energy_layers.py.
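
Once the first page lands, checking one raw_properties payload would confirm or refute the assumed casing. A Python sketch, assuming the payload has been pulled out of raw_properties as a dict; `check_keys` is a hypothetical helper and the sample payload is made up. A rename such as plantid would come back as None, since no change of case turns plantCode into plantid.

```python
# Facet names as listed in EIA's API documentation (per the paragraph above).
EXPECTED_KEYS = ("plantCode", "fuel2002", "primeMover", "state")


def check_keys(payload: dict, expected=EXPECTED_KEYS) -> dict:
    """Map each expected key to the key actually present in one payload.

    - exact match            -> the key itself (SELECT is fine)
    - case-insensitive match -> the live spelling (SELECT needs that casing)
    - no match               -> None (renamed or absent; inspect the payload)
    """
    by_lower = {key.lower(): key for key in payload}
    return {
        key: key if key in payload else by_lower.get(key.lower())
        for key in expected
    }


sample = {"period": "2026-01", "plantcode": "3", "fuel2002": "NG", "primeMover": "ST"}
print(check_keys(sample))
# {'plantCode': 'plantcode', 'fuel2002': 'fuel2002', 'primeMover': 'primeMover', 'state': None}
```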

OPERATIONAL NOTES
-----------------
- Runs in the same weekly systemd job as operating-generator-capacity,
  sequentially after it (Monday 03:30 via
  ingest-eia-energy-layers.timer).

- Both tables are rebuilt from scratch each run (TRUNCATE on the first
  page), so historical revisions that EIA pushes upstream propagate
  automatically. There is no incremental-load mode and none is
  planned; total wall time is acceptable.

- If EIA-923 is down at run time, the wrapper's `set -e` will mark
  the systemd service as failed; the capacity ingest will still have
  completed successfully because it runs first.

output/operating_generator_capacity_sample.txt
================================================================================
EIA Operating Generator Capacity: Sample Rows + Narrative
Generated 2026-05-16 from public.energy_eia_operating_generator_capacity_flat
================================================================================

WHAT THIS DATA IS
-----------------
This table is a flat, queryable view of EIA's "operating-generator-capacity"
endpoint (https://api.eia.gov/v2/electricity/operating-generator-capacity/).
The underlying source is EIA's generator inventory (Form EIA-860 and its
monthly supplement, Form EIA-860M), which covers generators at US plants
with at least 1 MW of combined nameplate capacity that their owners report
as operating (or recently operating).

Each row represents one generator's reported status in one month. A
single power plant typically has multiple generators, so a plant like
Plant Barry in Alabama appears as several rows per month, one for each
generator unit (generator_id 1, 2, 3, ...). The same generator reappears
every month it remains in the inventory, so the table is a time series
of (plant × generator × month) records.

WHAT IT TELLS US
----------------
For each generator, in each reporting month:
- Where it is (state, balancing authority, plant latitude/longitude)
- Who owns or operates it (entity_id, entity_name)
- What fuel/energy source it uses (energy_source_code + descriptive name)
- How it generates electricity (prime_mover_code, e.g. ST=steam turbine,
  HY=hydro, IC=internal combustion, WT=wind turbine)
- Its operating status in that month (status code, see below)
- What sector it serves (utility, IPP, industrial, commercial, etc.)

What it does NOT tell us is how much electricity the generator actually
produced in that month. Output data comes from a separate EIA endpoint
("facility-fuel", Form EIA-923), captured in a sibling table, and is
reported per plant × fuel × prime mover rather than per generator.

STATUS CODES IN THIS TABLE
--------------------------
OP  Operating                                    4,229,083 rows
SB  Standby / backup                               339,057 rows
OS  Out of service, not expected back next year     99,816 rows
OA  Out of service, expected back next year         28,769 rows

SUMMARY STATISTICS
------------------
Total rows: 4,696,725
Distinct generators (by plant_id × generator_id): ~75k
Distinct plants (plant_id): 15,791
Distinct states/territories: 51
Distinct months covered: 218
Period range: 2008-01 → 2026-02
Rows with lat/lon geometry: 4,685,500 (99.76%)
Distinct fuel codes: 38
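
These figures are internally consistent, which makes them a handy regression check after each weekly rebuild. A small Python sketch with the numbers copied from the status and summary tables above; in practice the inputs would come from count(*) queries against the flat table, which this sketch does not show, and `months_inclusive` is a hypothetical helper:

```python
STATUS_COUNTS = {"OP": 4_229_083, "SB": 339_057, "OS": 99_816, "OA": 28_769}
TOTAL_ROWS = 4_696_725
ROWS_WITH_GEOM = 4_685_500


def months_inclusive(first: str, last: str) -> int:
    """Number of 'YYYY-MM' periods from first to last, both ends included."""
    y1, m1 = (int(p) for p in first.split("-"))
    y2, m2 = (int(p) for p in last.split("-"))
    return (y2 - y1) * 12 + (m2 - m1) + 1


# The four status codes account for every row.
assert sum(STATUS_COUNTS.values()) == TOTAL_ROWS
# 218 distinct months equals the full 2008-01..2026-02 span, so no month is missing.
assert months_inclusive("2008-01", "2026-02") == 218
print(f"{100 * ROWS_WITH_GEOM / TOTAL_ROWS:.2f}% of rows have geometry")
# 99.76% of rows have geometry
```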

TOP 10 FUELS BY ROW COUNT
-------------------------
Natural Gas                          1,301,782
Water (hydro)                          908,741
Distillate Fuel Oil*                   767,207
Solar                                  624,113
Landfill Gas                           317,709
Wind                                   245,214
Bituminous Coal                        108,352
Subbituminous Coal                      75,587
Electricity used for energy storage     43,833
Geothermal                              41,066

* EIA stores this as "Disillate Fuel Oil" (sic). The misspelling is in
  EIA's source data, not introduced by ingest, and is preserved verbatim.

FIRST 5 ROWS (earliest period, ordered by plant_id)
---------------------------------------------------
period  | plant_id | plant_name   | state | entity_name      | gen_id | status | fuel            | pm | latitude  | longitude
--------+----------+--------------+-------+------------------+--------+--------+-----------------+----+-----------+-----------
2008-01 | 2        | Bankhead Dam | AL    | Alabama Power Co | 1      | OP     | Water           | HY | 33.218889 | -87.579722
2008-01 | 3        | Barry        | AL    | Alabama Power Co | 1      | OP     | Bituminous Coal | ST | 31.004167 | -88.013889
2008-01 | 3        | Barry        | AL    | Alabama Power Co | 2      | OP     | Bituminous Coal | ST | 31.004167 | -88.013889
2008-01 | 3        | Barry        | AL    | Alabama Power Co | 3      | OP     | Bituminous Coal | ST | 31.004167 | -88.013889
2008-01 | 3        | Barry        | AL    | Alabama Power Co | 4      | OP     | Bituminous Coal | ST | 31.004167 | -88.013889

(Both plants are in Alabama. Bankhead Dam is a hydro facility on the Black
Warrior River; Plant Barry is a coal-fired steam plant near Mobile. Both
were operating in January 2008.)

FIRST 5 ROWS OF THE LATEST PERIOD (ordered by plant_id)
-------------------------------------------------------
period  | plant_id | plant_name | state | entity_name                | gen_id | status | fuel               | pm | latitude  | longitude
--------+----------+------------+-------+----------------------------+--------+--------+--------------------+----+-----------+------------
2026-02 | 1        | Sand Point | AK    | Sand Point Generating, LLC | 1      | SB     | Disillate Fuel Oil | IC | 55.339722 | -160.497222
2026-02 | 1        | Sand Point | AK    | Sand Point Generating, LLC | 2      | OP     | Disillate Fuel Oil | IC | 55.339722 | -160.497222
2026-02 | 1        | Sand Point | AK    | Sand Point Generating, LLC | 3      | OP     | Disillate Fuel Oil | IC | 55.339722 | -160.497222
2026-02 | 1        | Sand Point | AK    | Sand Point Generating, LLC | 5.1    | OP     | Disillate Fuel Oil | IC | 55.339722 | -160.497222
2026-02 | 1        | Sand Point | AK    | Sand Point Generating, LLC | WT1    | OS     | Wind               | WT | 55.339722 | -160.497222

(Sand Point is a small remote-Alaska community station with five generators:
four diesel internal-combustion units and one wind turbine. In 2026-02,
diesel unit 1 was on standby (SB) and the wind turbine was out of service
(OS).)

KNOWN DATA-QUALITY QUIRKS IN EIA'S SOURCE DATA
----------------------------------------------
- Historical longitude sign bug (FIXED at ingest time, 2026-05-16).
  For reporting periods 2008-01 through 2010-11, EIA stored lower-48
  longitudes as positive numbers (Bankhead Dam was +87.579722 instead
  of -87.579722). EIA's data carries correct negative longitudes from
  2010-12 onward, but the earlier periods were never corrected
  upstream. The flat table's build step now applies:

      CASE WHEN longitude > 0 AND state_id <> 'AK'
           THEN -longitude ELSE longitude END

  and rebuilds geom from the corrected coordinates. Alaska is
  excluded because some Aleutian plants (~11k bug-era rows) lie west
  of the 180° meridian, where positive longitudes are correct.
  Affected non-AK rows fixed: 403,558. After the fix, every plant
  in the table is at a geographically plausible US location.
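
The same rule, expressed in Python for spot-checking individual rows outside the database. `fix_longitude` is a hypothetical helper that mirrors the SQL CASE above; it is not part of the ingest code, and the Aleutian value in the usage lines is illustrative:

```python
from typing import Optional


def fix_longitude(longitude: Optional[float], state_id: Optional[str]) -> Optional[float]:
    """Python mirror of the build step's SQL:

        CASE WHEN longitude > 0 AND state_id <> 'AK'
             THEN -longitude ELSE longitude END

    None stands in for SQL NULL: a NULL longitude stays NULL, and a NULL
    state_id makes the WHEN condition unknown, so the ELSE branch keeps the
    value as reported.
    """
    if longitude is None or state_id is None:
        return longitude
    if longitude > 0 and state_id != "AK":
        return -longitude  # bug-era row outside Alaska: flip the sign
    return longitude  # already negative, or an Alaska row left untouched


print(fix_longitude(87.579722, "AL"))   # -87.579722 (Bankhead Dam, 2008-01 as stored)
print(fix_longitude(-87.579722, "AL"))  # -87.579722 (already correct)
print(fix_longitude(173.2, "AK"))       # 173.2 (illustrative Aleutian value)
```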

- Fuel description "Disillate Fuel Oil" (missing 't', should be
  "Distillate"): EIA's spelling, preserved as-is in energy_source_desc.

REFRESH CADENCE
---------------
A systemd user timer rebuilds this table every Monday at 03:30 local time
via ~/.local/bin/ingest-eia-energy-layers-weekly. The ingest fetches the
full dataset one month at a time (Jan 2008 → current) and rebuilds the
flat table from scratch each run.

JOIN KEY FOR DOWNSTREAM ANALYSIS
--------------------------------
plant_id (text) joins to the forthcoming energy_eia_facility_fuel_flat
table (Form EIA-923), which provides monthly net and gross generation in
MWh for the same plants. Together, the two tables answer:

- WHERE energy is generated (this table, with lat/lon)
- WHAT is generated and by whom (this table, with fuel + entity)
- HOW MUCH is generated each month (energy_eia_facility_fuel_flat, in MWh)