Ingestion
Two public-domain sources, refreshed daily.
- USASpending.gov - prime award transactions, sub-award (FSRS) records, and federal-account financials, pulled by fiscal year and action date.
- SEC EDGAR - the company_tickers.json issuer↔ticker reference, and the Exhibit-21 “Subsidiaries of the Registrant” exhibit from each public filer’s latest 10-K / 20-F.
EDGAR access respects the SEC fair-access policy (≤10 requests/second, declared User-Agent). No scraping of authenticated or non-public pages; no personal data is ingested.
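A minimal sketch of how the fair-access constraint could be enforced on the client side. The `RateLimiter` class, the `fetch` helper, and the `USER_AGENT` value are illustrative assumptions, not the production client; only the 10 requests/second ceiling and the declared User-Agent come from the policy described above.

```python
import time
import urllib.request

# Placeholder identity; a real client would declare its own organisation and contact.
USER_AGENT = "ExampleOrg data-team@example.com"


class RateLimiter:
    """Spaces successive calls at least 1/max_per_second seconds apart."""

    def __init__(self, max_per_second=10.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self._clock = clock      # injectable so the limiter can be tested without waiting
        self._sleep = sleep
        self._next_allowed = 0.0

    def wait(self):
        """Block until the next request slot is available, then reserve it."""
        now = self._clock()
        delay = self._next_allowed - now
        if delay > 0:
            self._sleep(delay)
            now = self._clock()
        self._next_allowed = max(now, self._next_allowed) + self.min_interval


def fetch(url, limiter, timeout=30):
    """Fetch one public URL, throttled and with a declared User-Agent."""
    limiter.wait()
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Sharing one `RateLimiter` instance across all EDGAR requests keeps the whole process under the ceiling, rather than each caller separately.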
Normalization & crosswalk
Names are cleaned, then mapped to a parent ticker built from filings.
Recipient names are written inconsistently, so every name is first reduced to a canonical form:
lower-case → strip punctuation & symbols → collapse whitespace
strip legal-form tokens only: inc, corp, corporation, company, co, llc, lp, ltd, the, and
keep distinctive tokens: systems, solutions, technologies, services, group…
Stripping only legal-form suffixes is deliberate - removing distinctive words like “Systems” would collapse unrelated firms onto each other (e.g. Paragon Systems vs. an unrelated “Paragon”). We learned this the hard way and corrected for it.
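The canonical form above can be sketched in a few lines. This is an illustration under stated assumptions: the function name `normalize_name` is hypothetical, and the choice to delete periods (so "L.L.C." becomes "llc") while turning other punctuation into spaces is our assumption about how "strip punctuation" is applied. The token list is the one given above.

```python
import re

# Legal-form tokens listed above; distinctive words such as "systems" are kept.
LEGAL_FORM_TOKENS = {
    "inc", "corp", "corporation", "company", "co",
    "llc", "lp", "ltd", "the", "and",
}


def normalize_name(raw):
    """Reduce a recipient name to its canonical form."""
    text = raw.lower()
    text = text.replace(".", "")               # assumption: "L.L.C." -> "llc"
    text = re.sub(r"[^\w\s]|_", " ", text)     # other punctuation and symbols -> space
    tokens = text.split()                      # split() also collapses whitespace
    kept = [tok for tok in tokens if tok not in LEGAL_FORM_TOKENS]
    return " ".join(kept)


print(normalize_name("Paragon Systems, Inc."))   # paragon systems
print(normalize_name("ELECTRIC BOAT CORP"))      # electric boat
```

Because "systems" survives, "Paragon Systems, Inc." and a bare "Paragon LLC" stay distinct after normalization.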
Then, for every public filer, we parse its EX-21 schedule and map each normalized subsidiary name to the parent’s ticker.
This is why ELECTRIC BOAT CORP resolves to GD and SIKORSKY AIRCRAFT to LMT - names a direct issuer-name match never catches. The crosswalk rebuilds from filings, so it updates as companies reorganize.
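A rough sketch of the crosswalk build, assuming the EX-21 exhibits have already been parsed into lists of text rows per ticker. The names `build_crosswalk`, `BOILERPLATE`, and `EXAMPLE_EX21` are hypothetical, the example rows are illustrative rather than quoted from any filing, and dropping names claimed by two different parents is our assumption, not a documented rule.

```python
import re

LEGAL_FORM_TOKENS = {"inc", "corp", "corporation", "company", "co",
                     "llc", "lp", "ltd", "the", "and"}

# Illustrative boilerplate rows (headers, jurisdictions) that must never become keys.
BOILERPLATE = {"name of subsidiary", "jurisdiction of incorporation", "delaware"}


def normalize_name(raw):
    """Compact stand-in for the canonical-form step described earlier."""
    text = re.sub(r"[^\w\s]|_", " ", raw.lower().replace(".", ""))
    return " ".join(t for t in text.split() if t not in LEGAL_FORM_TOKENS)


def build_crosswalk(ex21_by_ticker):
    """Map each normalized subsidiary name to its parent ticker.

    Returns (crosswalk, ambiguous): names listed under more than one
    parent are removed from the crosswalk and reported separately.
    """
    crosswalk, ambiguous = {}, set()
    for ticker, rows in ex21_by_ticker.items():
        for row in rows:
            key = normalize_name(row)
            if not key or key in BOILERPLATE or key in ambiguous:
                continue
            owner = crosswalk.get(key)
            if owner is not None and owner != ticker:
                del crosswalk[key]          # two parents claim this name
                ambiguous.add(key)
            else:
                crosswalk[key] = ticker
    return crosswalk, ambiguous


# Made-up rows for illustration only.
EXAMPLE_EX21 = {
    "GD": ["Name of Subsidiary", "Electric Boat Corporation", "Delaware"],
    "LMT": ["Sikorsky Aircraft Corporation", "Delaware"],
}
crosswalk, ambiguous = build_crosswalk(EXAMPLE_EX21)
print(crosswalk.get(normalize_name("ELECTRIC BOAT CORP")))   # GD
```

Rebuilding this dictionary from each new set of filings is what lets the mapping follow reorganizations without manual upkeep.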
Entity resolution
Each recipient runs through a deterministic-first match cascade.
1. exact normalized match → SEC issuer
2. exact normalized match → EX-21 subsidiary (the parent)
3. fuzzy match → SEC issuer (token-set ratio ≥ 94)
4. fuzzy match → EX-21 subsidiary (token-set ratio ≥ 94)
else → unresolved (treated as private / non-public)
Fuzzy matching uses an order- and length-tolerant token-set ratio, so it aligns variants like “Raytheon Company” with the issuer “RTX Corp”. A high threshold keeps false positives rare.
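The cascade can be sketched as follows. This is an approximation, not the production matcher: `token_set_ratio` here is a standard-library stand-in (built on `difflib`) for the token-set ratio found in fuzzy-matching libraries, so its scores may differ from theirs. The function `resolve`, the method labels, and the choice to grade unresolved names as "Excluded" are our assumptions; the exact-issuer step is taken from the tier table.

```python
from difflib import SequenceMatcher


def _ratio(a, b):
    """Similarity of two strings on a 0-100 scale."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def token_set_ratio(a, b):
    """Order- and length-tolerant similarity over the token sets of a and b."""
    ta, tb = set(a.split()), set(b.split())
    if not ta or not tb:
        return 0.0
    common = " ".join(sorted(ta & tb))
    s_a = (common + " " + " ".join(sorted(ta - tb))).strip()
    s_b = (common + " " + " ".join(sorted(tb - ta))).strip()
    return max(_ratio(common, s_a), _ratio(common, s_b), _ratio(s_a, s_b))


def resolve(name, issuers, subsidiaries, threshold=94.0):
    """Resolve a normalized name to (ticker, method, tier).

    issuers and subsidiaries map normalized names to tickers.
    Exact lookups run first; fuzzy matches are only tried afterwards.
    """
    if name in issuers:
        return issuers[name], "exact_issuer", "High"
    if name in subsidiaries:
        return subsidiaries[name], "exact_ex21", "High"
    for method, table in (("fuzzy_issuer", issuers), ("fuzzy_ex21", subsidiaries)):
        best = max(table, key=lambda k: token_set_ratio(name, k), default=None)
        if best is not None and token_set_ratio(name, best) >= threshold:
            return table[best], method, "Review"    # held for human review
    return None, "unresolved", "Excluded"           # treated as non-public


# Illustrative lookup tables, not real reference data.
issuers = {"general dynamics": "GD", "lockheed martin": "LMT"}
subsidiaries = {"electric boat": "GD", "sikorsky aircraft": "LMT"}
print(resolve("electric boat", issuers, subsidiaries))
```

Because a token-set ratio scores a name whose tokens are a subset of another's very highly, routing fuzzy hits to review rather than publishing them directly is a sensible safeguard.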
Every resolution carries a method and a confidence grade:
| Tier | Method | Handling |
|---|---|---|
| High | exact issuer / exact EX-21 subsidiary | published |
| Review | fuzzy ≥ threshold | held for human review |
| Excluded | universities, JVs, non-profits, private firms, government | flagged non-public |
Point-in-time & lag
We snapshot the data daily, because the source overwrites itself.
We capture a point-in-time snapshot every day: cumulative obligations and outlays per ticker, stamped with the capture date. These vintages cannot be reconstructed after the fact; they exist only because we record them daily. The day-over-day delta yields a spending-flow series the source never publishes directly.
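As an illustration of the delta step, the sketch below turns a set of dated cumulative snapshots into a flow series. The function name `daily_flows`, the input layout, and the figures are made up for the example; skipping tickers with no prior snapshot is our assumption.

```python
from datetime import date


def daily_flows(snapshots):
    """Difference consecutive snapshots of cumulative totals.

    snapshots: {capture_date: {ticker: cumulative_amount}}
    returns:   {(capture_date, ticker): change since the previous capture}
    """
    flows = {}
    dates = sorted(snapshots)
    for prev, curr in zip(dates, dates[1:]):
        for ticker, total in snapshots[curr].items():
            if ticker in snapshots[prev]:          # no baseline -> no delta
                flows[(curr, ticker)] = total - snapshots[prev][ticker]
    return flows


# Made-up cumulative obligations, in dollars.
snapshots = {
    date(2024, 1, 1): {"GD": 100, "LMT": 50},
    date(2024, 1, 2): {"GD": 130, "LMT": 45},
}
flows = daily_flows(snapshots)
print(flows[(date(2024, 1, 2), "GD")])    # 30
print(flows[(date(2024, 1, 2), "LMT")])   # -5
```

A negative delta is possible when a cumulative figure is revised downward between captures, so the series should not be assumed non-negative.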
Data quality
Three controls keep the output clean.
- EX-21 boilerplate removal. Filing exhibits contain headers and jurisdictions (“Delaware”, “Name of Subsidiary”); we filter these so they cannot become spurious matches.
- Sub-award sanity bounds. FSRS amounts are self-reported and frequently mis-keyed - a meaningful share of the largest records exceed any plausible value (we have observed entries in the trillions). Implausible figures are capped or discarded and logged.
- Idempotent capture. Daily snapshots are de-duplicated per (date, ticker), so re-runs never double-count.
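Two of these controls lend themselves to a short sketch. Everything named here is hypothetical: the `MAX_PLAUSIBLE_SUBAWARD` ceiling is an arbitrary illustrative value (no threshold is stated above), the sketch discards rather than caps, and the SQLite table layout is one possible way to make captures idempotent per (date, ticker).

```python
import logging
import sqlite3

log = logging.getLogger("quality")

# Hypothetical ceiling in USD, chosen only for illustration.
MAX_PLAUSIBLE_SUBAWARD = 10_000_000_000


def screen_subaward(record_id, amount, ceiling=MAX_PLAUSIBLE_SUBAWARD):
    """Return the amount if plausible; otherwise log it and return None."""
    if abs(amount) > ceiling:
        log.warning("discarding implausible sub-award %s: %s", record_id, amount)
        return None
    return amount


def open_store(path=":memory:"):
    """Open a snapshot store keyed by (capture_date, ticker)."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS snapshot (
               capture_date TEXT NOT NULL,
               ticker       TEXT NOT NULL,
               obligations  REAL NOT NULL,
               outlays      REAL,
               PRIMARY KEY (capture_date, ticker)
           )"""
    )
    return conn


def record_snapshot(conn, capture_date, ticker, obligations, outlays=None):
    """Write one snapshot row; a re-run for the same key replaces it."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO snapshot VALUES (?, ?, ?, ?)",
            (capture_date, ticker, obligations, outlays),
        )


conn = open_store()
record_snapshot(conn, "2024-01-02", "GD", 130.0)
record_snapshot(conn, "2024-01-02", "GD", 130.0)   # re-run, same key
print(conn.execute("SELECT COUNT(*) FROM snapshot").fetchone()[0])   # 1
```

Putting the uniqueness rule in the primary key means a repeated job cannot double-count even if the calling code forgets to check.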
Coverage & limits
What the dataset covers today - and where it is deliberately conservative.
History extends to FY2016 (extendable to FY2008) across the public companies that receive federal awards - concentrated in defense, government IT/services, and healthcare.
- Outlay coverage is sparse. Many awards report no outlay-to-date; outlay figures are partial.
- Joint ventures are approximate. Multi-parent management JVs are attributed to a lead public parent where one exists, and flagged.
- Figures are revised. All amounts change as agencies update filings; we preserve vintages but do not restate the source.
- Federal share varies. Attribution is most material for government-dependent companies; for diversified issuers, federal dollars may be a small part of total revenue.