Add LegiScan legislation ingestion and analysis queries

Adds ingest_legiscan.py to pull all US state + federal bills (2016-2026) from the LegiScan API into the legiscan_sessions and legiscan_bills tables. Bills are keyword-tagged across 8 research categories (data_center, ratepayer_protection, large_load, grid_impact, tax_incentive, etc.). Loads ~1.3M bills; ~60K tagged relevant.

Adds query_legiscan_bills.sql with pre-built analysis queries including state/DC joins. Updates database-tables.md, README.md, and research-ideas.md accordingly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@@ -13,11 +13,12 @@

 ## Table Organization

-Tables are organized into four categories:
+Tables are organized into five categories:
 1. **Core Data Center Tables** - Master inventories and source data
 2. **Enrichment Tables** - Data centers joined with contextual data
 3. **Base Layer Tables** - Geographic and demographic reference layers
 4. **Infrastructure Tables** - Energy and connectivity infrastructure
+5. **Legislation Tables** - LegiScan state and federal bill data (2016-2026)

 ---

@@ -499,6 +500,85 @@ Tables are organized into four categories:

 ---

+---
+
+## Legislation Tables
+
+Populated by `ingest_legiscan.py` using the [LegiScan API](https://legiscan.com/legiscan).
+Covers all 50 states + DC + US Congress, sessions from 2016 through 2026.
+Data licensed [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/) — attribute LegiScan LLC.
+
+### `legiscan_sessions`
+**Rows**: 646
+**Purpose**: One row per legislative session dataset downloaded from LegiScan
+
+**Key Columns**:
+- `session_id` (INTEGER) - LegiScan session ID (PRIMARY KEY)
+- `state_abbr` (VARCHAR) - Two-letter state code (`CA`, `TX`, `US`, etc.)
+- `state_id` (INTEGER) - LegiScan numeric state ID
+- `year_start`, `year_end` (INTEGER) - Session year range
+- `session_title` (TEXT) - Full session name
+- `session_tag` (TEXT) - Short tag (e.g., "Regular Session", "1st Special Session")
+- `is_special` (BOOLEAN) - True for special/extraordinary sessions
+- `is_prior` (BOOLEAN) - True for completed/sine-die sessions
+- `dataset_hash` (VARCHAR) - MD5 of dataset ZIP; used to detect updates
+- `dataset_date` (DATE) - Date dataset was last published by LegiScan
+- `dataset_size_mb` (FLOAT) - Compressed ZIP size
+- `bill_count` (INTEGER) - Number of bills loaded from this session
+- `imported_at` (TIMESTAMPTZ) - When this session was last imported
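
The `dataset_hash` column implies a simple update check: compare the hash stored for a session with the hash of the current dataset, and re-download only when they differ. The sketch below illustrates that idea only. The function names `zip_md5` and `needs_refresh` are hypothetical, and the actual logic in `ingest_legiscan.py` is not shown in this diff and may differ (for example, it may use a hash reported by the API rather than one computed locally).

```python
import hashlib
from typing import Optional


def zip_md5(zip_bytes: bytes) -> str:
    """Return the hex MD5 digest of a dataset ZIP's raw bytes."""
    return hashlib.md5(zip_bytes).hexdigest()


def needs_refresh(stored_hash: Optional[str], remote_hash: str) -> bool:
    """Return True when a session should be (re)imported.

    A session is refreshed when it has never been imported (no stored hash)
    or when the stored hash no longer matches the current one.
    Hypothetical helper; comparison is case-insensitive on the hex digest.
    """
    return stored_hash is None or stored_hash.lower() != remote_hash.lower()
```

With a check like this, a weekly run only touches sessions whose dataset has actually changed.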
+
+### `legiscan_bills`
+**Rows**: ~1,313,000
+**Purpose**: All bills from all sessions; tagged for relevance to data center research topics
+
+**Key Columns**:
+- `bill_id` (INTEGER) - LegiScan bill ID (PRIMARY KEY)
+- `session_id` (INTEGER) - FK → `legiscan_sessions`
+- `state` (VARCHAR) - Two-letter state code
+- `bill_number` (VARCHAR) - Bill number (e.g., `SB 1000`, `HB 233`)
+- `bill_type` (VARCHAR) - `B`=Bill, `R`=Resolution, `CR`=Concurrent Resolution, etc.
+- `title` (TEXT) - Short title
+- `description` (TEXT) - Longer description
+- `status` (INTEGER) - Current status code (see below)
+- `status_date` (DATE) - Date of last status change
+- `completed` (INTEGER) - 1 if bill is in a terminal state
+- `body` (VARCHAR) - Originating chamber (`H`=House, `S`=Senate, `C`=Council, etc.)
+- `url` (TEXT) - LegiScan bill page URL
+- `state_link` (TEXT) - Official state legislature URL
+- `change_hash` (VARCHAR) - MD5 used to detect bill-level updates
+- `subjects` (TEXT[]) - LegiScan subject tags (GIN indexed)
+- `sponsor_count` (INTEGER) - Number of sponsors
+- `vote_count` (INTEGER) - Number of recorded votes
+- `text_count` (INTEGER) - Number of bill text versions
+- `is_relevant` (BOOLEAN) - True if any relevance tag matched (GIN indexed)
+- `relevance_tags` (TEXT[]) - Matched topic tags (GIN indexed)
+- `imported_at` (TIMESTAMPTZ) - When this bill was last upserted
+
+**Status codes**: 1=Introduced, 2=Engrossed, 3=Enrolled, 4=Passed, 5=Vetoed, 6=Failed, 7=Override, 8=Chaptered, 9=Referred, 12=Draft
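
Because `status` is stored as an integer, analysis code usually needs a lookup to make results readable. A minimal sketch of such a lookup follows, using only the codes listed above. The names `STATUS_LABELS` and `status_label` are hypothetical and not part of the commit.

```python
# Status codes exactly as listed in the table documentation above.
STATUS_LABELS = {
    1: "Introduced",
    2: "Engrossed",
    3: "Enrolled",
    4: "Passed",
    5: "Vetoed",
    6: "Failed",
    7: "Override",
    8: "Chaptered",
    9: "Referred",
    12: "Draft",
}


def status_label(code: int) -> str:
    """Map a numeric status code to its label; unlisted codes are flagged."""
    return STATUS_LABELS.get(code, f"Unknown ({code})")
```

Codes not documented here fall through to an explicit "Unknown" label rather than raising, so a report does not fail on an unexpected value.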
+
+**Relevance tags** (keyword-matched against title + description + subjects):
+
+| Tag | What it captures |
+|-----|-----------------|
+| `data_center` | Data centers, hyperscale, colocation, AI campuses, HPC facilities |
+| `large_load` | Crypto mining, large industrial loads, extraordinary load |
+| `ratepayer_protection` | Cost shifting, cross-subsidy, rate design, affordability, rate class |
+| `grid_impact` | Grid reliability, transmission, interconnection queue, IRP |
+| `tax_incentive` | Tax exemptions, abatements, credits for facilities |
+| `energy_policy` | Renewable PPAs, green tariffs, clean electricity, decarbonization |
+| `water_use` | Cooling water, evaporative cooling, water footprint |
+| `siting_permitting` | Zoning, conditional use permits, local control, preemption |
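
The tagging described above (keyword matching against title, description, and subjects) can be sketched as follows. This is an illustration under stated assumptions, not the script's actual implementation. The keyword lists are abbreviated, hypothetical samples drawn from the table, and `TAG_KEYWORDS` and `tag_bill` are invented names.

```python
import re
from typing import Dict, Iterable, List, Optional

# Hypothetical, abbreviated keyword lists; the real lists live in
# ingest_legiscan.py and are not shown in this diff.
TAG_KEYWORDS: Dict[str, List[str]] = {
    "data_center": ["data center", "hyperscale", "colocation"],
    "large_load": ["crypto mining", "large load"],
    "water_use": ["cooling water", "evaporative cooling"],
}


def _compile(keywords: List[str]) -> "re.Pattern[str]":
    # Leading word boundary only, so plural forms such as "data centers"
    # still match the keyword "data center".
    alternation = "|".join(re.escape(k) for k in keywords)
    return re.compile(r"\b(?:" + alternation + ")", re.IGNORECASE)


TAG_PATTERNS = {tag: _compile(kws) for tag, kws in TAG_KEYWORDS.items()}


def tag_bill(
    title: Optional[str],
    description: Optional[str],
    subjects: Iterable[str] = (),
) -> List[str]:
    """Return the sorted list of tags whose keywords appear in the bill text."""
    text = " ".join([title or "", description or "", *subjects])
    return sorted(tag for tag, pat in TAG_PATTERNS.items() if pat.search(text))


tags = tag_bill("Water use by data centers", "Limits on evaporative cooling")
is_relevant = bool(tags)  # mirrors the is_relevant column: any tag matched
```

Broad keywords match many bills that have little to do with data centers, so a tag is best read as a candidate filter rather than a classification.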
+
+**Notes**:
+- ~60,000 relevant bills out of 1.3M total (~4.6%)
+- `data_center` tag: ~2,182 bills; `ratepayer_protection`: ~49,000
+- GIN indexes on `subjects`, `relevance_tags`, and full-text (`title || description`)
+- Use `query_legiscan_bills.sql` for pre-built research queries
+- Re-run `python ingest_legiscan.py --fetch --load` weekly to pick up dataset updates
+- Re-run `python ingest_legiscan.py --tag` after editing keyword lists
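
For ad hoc lookups outside `query_legiscan_bills.sql`, a tag filter can be written with PostgreSQL's array overlap operator (`&&`), which a GIN index on `relevance_tags` can serve. The helper below only builds the SQL text and parameter list. It assumes a psycopg-style driver with `%s` placeholders, which this diff does not confirm, and `relevant_bills_query` is a hypothetical name.

```python
from typing import List, Optional, Sequence, Tuple


def relevant_bills_query(
    tags: Sequence[str], state: Optional[str] = None
) -> Tuple[str, List[object]]:
    """Build a parameterized query for bills carrying any of the given tags.

    Returns (sql, params); nothing is executed here. Column and table
    names come from the legiscan_bills documentation above.
    """
    sql = (
        "SELECT bill_id, state, bill_number, title, status, status_date "
        "FROM legiscan_bills "
        "WHERE is_relevant AND relevance_tags && %s::text[]"
    )
    params: List[object] = [list(tags)]
    if state is not None:
        sql += " AND state = %s"
        params.append(state)
    sql += " ORDER BY status_date DESC"
    return sql, params


sql, params = relevant_bills_query(["data_center", "large_load"], state="VA")
```

Passing the tag list as a bound parameter, rather than formatting it into the string, keeps the query safe to reuse with user-supplied tags.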
+
+---
+
 ## Commonly Used Joins

 ### Data Center to Demographics