Add ingest for EIA energy data; create text files describing the datasets

2026-05-16 17:05:59 -07:00
parent b442998eb5
commit 75d17f8e95
4 changed files with 787 additions and 180 deletions


@@ -0,0 +1,134 @@
================================================================================
EIA Operating Generator Capacity — Sample Rows + Narrative
Generated 2026-05-16 from public.energy_eia_operating_generator_capacity_flat
================================================================================

WHAT THIS DATA IS
-----------------
This table is a flat, queryable view of EIA's "operating-generator-capacity"
endpoint (https://api.eia.gov/v2/electricity/operating-generator-capacity/).
The underlying source is Form EIA-860, which inventories every electric
generator in the United States that is reported as operating (or recently
operating) by its owner.

Each row represents one generator's reported status in one month. A single
power plant typically has multiple generators, so a plant like Plant Barry in
Alabama appears as several rows per month — one for each generator unit
(generator_id 1, 2, 3, ...). The same generator reappears every month it
remains in the inventory, so the table is a time series of (plant × generator
× month) records.
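
As a rough illustration of that grain, the sketch below loads a few toy rows
into an in-memory SQLite database and checks that no (plant, generator, month)
combination repeats. The table name "capacity" and the three column names are
assumptions made for the example, not the real schema.

```python
import sqlite3

# Toy rows shaped like the samples in this file: (period, plant_id, generator_id).
# Table and column names are hypothetical; the real flat table has many more columns.
rows = [
    ("2008-01", "2", "1"),
    ("2008-01", "3", "1"),
    ("2008-01", "3", "2"),
    ("2008-02", "3", "1"),
    ("2008-02", "3", "2"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE capacity (period TEXT, plant_id TEXT, generator_id TEXT)")
conn.executemany("INSERT INTO capacity VALUES (?, ?, ?)", rows)

# Any (plant, generator, month) combination seen more than once breaks the grain.
duplicates = conn.execute(
    """
    SELECT period, plant_id, generator_id, COUNT(*) AS n
    FROM capacity
    GROUP BY period, plant_id, generator_id
    HAVING COUNT(*) > 1
    """
).fetchall()

# Number of generator rows reported per plant per month.
per_plant = conn.execute(
    """
    SELECT period, plant_id, COUNT(*) AS generators
    FROM capacity
    GROUP BY period, plant_id
    ORDER BY period, plant_id
    """
).fetchall()
```

An empty duplicates list means the toy data respects the one-row-per-
generator-per-month grain; the same two queries should carry over to the real
table once the names are adjusted.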

WHAT IT TELLS US
----------------
For each generator, in each reporting month:
- Where it is (state, balancing authority, exact latitude/longitude)
- Who owns or operates it (entity_id, entity_name)
- What fuel/energy source it uses (energy_source_code + descriptive name)
- How it generates electricity (prime_mover_code, e.g. ST=steam turbine,
  HY=hydro, IC=internal combustion, WT=wind turbine)
- Its current operating status (status code, see below)
- What sector it serves (utility, IPP, industrial, commercial, etc.)

What it does NOT tell us is how much electricity the generator actually
produces in that month — that data comes from a separate EIA endpoint
("facility-fuel", Form EIA-923), captured in a sibling table.

STATUS CODES IN THIS TABLE
--------------------------
OP  Operating                                                   4,229,083 rows
SB  Standby / backup                                              339,057 rows
OS  Out of service (not expected to return next calendar year)     99,816 rows
OA  Out of service (expected to return next calendar year)         28,769 rows
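
A quick consistency check on these counts: the four status codes should
account for every row in the table. The sketch below copies the numbers from
this file and compares their sum with the reported total row count; the
variable names are illustrative only.

```python
# Row counts per status code, copied from the listing above.
status_rows = {
    "OP": 4_229_083,
    "SB": 339_057,
    "OS": 99_816,
    "OA": 28_769,
}
reported_total = 4_696_725  # "Total rows" from the summary statistics

# The four codes should cover the whole table with nothing left over.
status_total = sum(status_rows.values())
unaccounted = reported_total - status_total

# Share of all rows per status code, in percent.
shares = {code: round(100 * n / reported_total, 2) for code, n in status_rows.items()}
```

With these figures the sum matches the total exactly, and operating (OP)
rows make up roughly 90% of the table.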

SUMMARY STATISTICS
------------------
Total rows:                                        4,696,725
Distinct generators (by plant_id × generator_id):  ~75k
Distinct plants (plant_id):                        15,791
Distinct states/territories:                       51
Distinct months covered:                           218
Period range:                                      2008-01 → 2026-02
Rows with lat/lon geometry:                        4,685,500 (99.76%)
Distinct fuel codes:                               38
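
Two of these figures can be re-derived from the others. The sketch below,
using a small helper whose name is made up for the example, counts the months
in the period range and recomputes the geometry coverage percentage.

```python
def months_inclusive(start: str, end: str) -> int:
    """Count calendar months from start to end ('YYYY-MM'), both ends included."""
    start_year, start_month = map(int, start.split("-"))
    end_year, end_month = map(int, end.split("-"))
    return (end_year - start_year) * 12 + (end_month - start_month) + 1


total_rows = 4_696_725
rows_with_geometry = 4_685_500

months = months_inclusive("2008-01", "2026-02")
geometry_share = round(100 * rows_with_geometry / total_rows, 2)
rows_without_geometry = total_rows - rows_with_geometry
```

Here months comes out to 218 and geometry_share to 99.76, matching the
listing, which suggests no month in the range is missing. That leaves 11,225
rows without coordinates.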

TOP 10 FUELS BY ROW COUNT
-------------------------
Natural Gas                          1,301,782
Water (hydro)                          908,741
Distillate Fuel Oil*                   767,207
Solar                                  624,113
Landfill Gas                           317,709
Wind                                   245,214
Bituminous Coal                        108,352
Subbituminous Coal                      75,587
Electricity used for energy storage     43,833
Geothermal                              41,066

* EIA stores this as "Disillate Fuel Oil" (sic). The misspelling is in
EIA's source data, not introduced by ingest. Preserved verbatim.

FIRST 5 ROWS (earliest period, ordered by plant_id)
---------------------------------------------------
 period  | plant_id | plant_name   | state | entity_name      | gen_id | status | fuel            | pm | latitude  | longitude
---------+----------+--------------+-------+------------------+--------+--------+-----------------+----+-----------+------------
 2008-01 | 2        | Bankhead Dam | AL    | Alabama Power Co | 1      | OP     | Water           | HY | 33.218889 | -87.579722
 2008-01 | 3        | Barry        | AL    | Alabama Power Co | 1      | OP     | Bituminous Coal | ST | 31.004167 | -88.013889
 2008-01 | 3        | Barry        | AL    | Alabama Power Co | 2      | OP     | Bituminous Coal | ST | 31.004167 | -88.013889
 2008-01 | 3        | Barry        | AL    | Alabama Power Co | 3      | OP     | Bituminous Coal | ST | 31.004167 | -88.013889
 2008-01 | 3        | Barry        | AL    | Alabama Power Co | 4      | OP     | Bituminous Coal | ST | 31.004167 | -88.013889

(Both plants are in Alabama. Bankhead Dam is a hydro facility on the Black
Warrior River, and Plant Barry is a coal-fired steam plant near Mobile. Both
were operating in January 2008.)

LAST 5 ROWS (latest period, ordered by plant_id)
------------------------------------------------
 period  | plant_id | plant_name | state | entity_name                | gen_id | status | fuel               | pm | latitude  | longitude
---------+----------+------------+-------+----------------------------+--------+--------+--------------------+----+-----------+-------------
 2026-02 | 1        | Sand Point | AK    | Sand Point Generating, LLC | 1      | SB     | Disillate Fuel Oil | IC | 55.339722 | -160.497222
 2026-02 | 1        | Sand Point | AK    | Sand Point Generating, LLC | 2      | OP     | Disillate Fuel Oil | IC | 55.339722 | -160.497222
 2026-02 | 1        | Sand Point | AK    | Sand Point Generating, LLC | 3      | OP     | Disillate Fuel Oil | IC | 55.339722 | -160.497222
 2026-02 | 1        | Sand Point | AK    | Sand Point Generating, LLC | 5.1    | OP     | Disillate Fuel Oil | IC | 55.339722 | -160.497222
 2026-02 | 1        | Sand Point | AK    | Sand Point Generating, LLC | WT1    | OS     | Wind               | WT | 55.339722 | -160.497222

(Sand Point is a small remote-Alaska community station. The five generators
shown are four diesel internal-combustion units, one of them on standby, and
one wind turbine, which is currently out of service.)

KNOWN DATA-QUALITY QUIRKS IN EIA'S SOURCE DATA
----------------------------------------------
- Historical longitude sign bug (FIXED at ingest time, 2026-05-16).
  For reporting periods 2008-01 through 2010-11, EIA stored lower-48
  longitudes as positive numbers (Bankhead Dam was +87.579722 instead
  of -87.579722). EIA corrected the sign in its own data starting
  2010-12, but the historical periods still carry the bug. The flat
  table's build step now applies:

      CASE WHEN longitude > 0 AND state_id <> 'AK'
           THEN -longitude ELSE longitude END

  and rebuilds geom from the corrected coordinates. Alaska is
  excluded because some Aleutian plants (~11k bug-era rows) are
  legitimately west of the 180° meridian, where longitudes are
  positive. Affected non-AK rows fixed: 403,558. After the fix, every
  plant in the table is at a geographically plausible US location.

- Fuel description "Disillate Fuel Oil" (missing 't', should be
  "Distillate"). This is EIA's spelling, preserved as-is in
  energy_source_desc.
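
The sign correction above can be mirrored outside SQL, for example to spot-
check rows after a rebuild. The following is a minimal Python sketch of the
same rule, not the ingest's actual code. The function name fix_longitude is
hypothetical, and the NULL handling follows ordinary SQL semantics, where a
NULL comparison falls through to the ELSE branch.

```python
from typing import Optional


def fix_longitude(longitude: Optional[float], state_id: Optional[str]) -> Optional[float]:
    """Flip positive longitudes to negative, except for Alaska.

    Mirrors: CASE WHEN longitude > 0 AND state_id <> 'AK'
             THEN -longitude ELSE longitude END
    """
    if longitude is None or state_id is None:
        # In SQL the WHEN condition would be NULL here, so ELSE applies.
        return longitude
    if longitude > 0 and state_id != "AK":
        return -longitude
    return longitude


# Bankhead Dam's bug-era value from the note above.
fixed = fix_longitude(87.579722, "AL")
# Already-correct values are left alone, so the fix is safe to re-apply.
unchanged = fix_longitude(-87.579722, "AL")
```

Because the rule only ever turns positive values negative, running it twice
gives the same result as running it once, which fits a table that is rebuilt
from scratch on every run.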

REFRESH CADENCE
---------------
A systemd user timer rebuilds this table every Monday at 03:30 local time
via ~/.local/bin/ingest-eia-energy-layers-weekly. The ingest fetches the
full dataset per month (Jan 2008 → current) and rebuilds the flat table
from scratch each run.
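
The ingest script itself is not shown in this file, so the following is only
a sketch of what a per-month fetch plan could look like. The /data/ route and
the query parameter names (api_key, frequency, start, end, offset, length)
are assumptions about EIA's v2 API and should be checked against its
documentation. The helper names and the YOUR_KEY placeholder are
hypothetical. No network request is made.

```python
from urllib.parse import urlencode

# Endpoint from the top of this file, with an assumed "data/" route appended.
BASE_URL = "https://api.eia.gov/v2/electricity/operating-generator-capacity/data/"


def month_range(start: str, end: str):
    """Yield 'YYYY-MM' strings from start to end, both ends included."""
    year, month = map(int, start.split("-"))
    end_year, end_month = map(int, end.split("-"))
    while (year, month) <= (end_year, end_month):
        yield f"{year:04d}-{month:02d}"
        month += 1
        if month > 12:
            year, month = year + 1, 1


def request_url(period: str, api_key: str, offset: int = 0, length: int = 5000) -> str:
    """Build the URL for one page of one month (parameter names are assumed)."""
    params = {
        "api_key": api_key,
        "frequency": "monthly",
        "start": period,
        "end": period,
        "offset": offset,
        "length": length,
    }
    return BASE_URL + "?" + urlencode(params)


periods = list(month_range("2008-01", "2026-02"))
first_url = request_url(periods[0], api_key="YOUR_KEY")
```

A real run would page through each month by increasing offset until a page
comes back short, then move on to the next period.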

JOIN KEY FOR DOWNSTREAM ANALYSIS
--------------------------------
plant_id (text) joins to the forthcoming energy_eia_facility_fuel_flat
table (Form EIA-923), which provides monthly net + gross generation in MWh
for the same plants. Together, the two tables answer:
- WHERE energy is generated (this table, with lat/lon)
- WHAT is generated and by whom (this table, with fuel + entity)
- HOW MUCH is generated each month (facility_fuel_flat, in MWh)
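
Since the facility-fuel table does not exist yet, the join can only be
sketched. The example below uses in-memory SQLite with hypothetical table and
column names (capacity, facility_fuel, net_generation_mwh) and made-up MWh
values. It also assumes the join should match on period as well as plant_id.
Because this table has one row per generator, the sketch first collapses it
to one row per plant and month so that plant-level MWh is not repeated for
every generator.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical, cut-down versions of the two tables.
    CREATE TABLE capacity (period TEXT, plant_id TEXT, generator_id TEXT, state TEXT);
    CREATE TABLE facility_fuel (period TEXT, plant_id TEXT, net_generation_mwh REAL);

    INSERT INTO capacity VALUES
        ('2008-01', '2', '1', 'AL'),
        ('2008-01', '3', '1', 'AL'),
        ('2008-01', '3', '2', 'AL');

    -- Made-up generation figures, for illustration only.
    INSERT INTO facility_fuel VALUES
        ('2008-01', '2', 100.0),
        ('2008-01', '3', 500.0);
    """
)

joined = conn.execute(
    """
    WITH plants AS (
        -- Collapse generator rows to one row per plant and month.
        SELECT period, plant_id, state, COUNT(*) AS generators
        FROM capacity
        GROUP BY period, plant_id, state
    )
    SELECT p.period, p.plant_id, p.state, p.generators, f.net_generation_mwh
    FROM plants AS p
    JOIN facility_fuel AS f
      ON f.plant_id = p.plant_id AND f.period = p.period
    ORDER BY p.plant_id
    """
).fetchall()
```

If the facility-fuel table turns out to hold more than one row per plant and
month, it would need the same kind of pre-aggregation before the join.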