Data Pipeline
Over 150 live data sources are ingested, validated, and transformed into a unified macro data estate. This page documents every external feed, quality control measure, and internally derived composite signal.
FRED Economic Data — 107 Series
The backbone of Convex’s data estate is the Federal Reserve Economic Data (FRED) API, maintained by the St. Louis Fed. We ingest 107 time series spanning the full macro spectrum — from overnight rates to housing starts. Every series has documented metadata, update frequency, and seasonal adjustment status.
Quality controls: All incoming data points pass outlier-bounds validation. Missing values are handled via forward-fill (for irregular release schedules) or interpolation (for known gaps). Source-level errors are tracked and surfaced in pipeline observability.
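The quality controls above can be illustrated with a minimal sketch. The function names, the bounds, and the choice to treat out-of-bounds points as missing are assumptions made for illustration, not a description of the production pipeline.

```python
from typing import List, Optional

Series = List[Optional[float]]


def validate_bounds(values: Series, lower: float, upper: float) -> Series:
    """Mark points outside [lower, upper] as missing (None). Bounds are hypothetical."""
    return [v if v is not None and lower <= v <= upper else None for v in values]


def forward_fill(values: Series) -> Series:
    """Carry the last known observation forward (irregular release schedules)."""
    filled: Series = []
    last: Optional[float] = None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def interpolate(values: Series) -> Series:
    """Linearly interpolate interior gaps; leading and trailing gaps stay None."""
    out = list(values)
    known = [i for i, v in enumerate(out) if v is not None]
    for a, b in zip(known, known[1:]):
        step = (out[b] - out[a]) / (b - a)
        for i in range(a + 1, b):
            out[i] = out[a] + step * (i - a)
    return out


raw = [4.0, None, 4.5, 250.0, 5.0]          # 250.0 is an obvious outlier
checked = validate_bounds(raw, 0.0, 20.0)   # outlier becomes a gap
```

Calling `forward_fill(checked)` or `interpolate(checked)` then fills the gaps according to whichever policy suits the series.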
| Category | Series | Examples |
|---|---|---|
| Yield Curve & Rates | 17 | DGS1–DGS30, T10Y2Y, T10Y3M, SOFR, EFFR, term premium (ACM) |
| Inflation Pipeline | 13 | CPI, Core CPI, PCE, Core PCE, PPI, breakeven inflation rates (5Y, 10Y) |
| Credit & Financial Stress | 14 | HY/IG spreads, NFCI, credit delinquency rates, SLOOS lending standards |
| Liquidity Complex | 8 | Fed balance sheet (WALCL), RRP, TGA, M2, reserve balances |
| Labour Market | 13 | Unemployment, claims (weekly), JOLTS, participation rate, Sahm Rule |
| Economic Activity | 19 | GDP, industrial production, retail sales, durable goods, ISM PMIs |
| Housing & Consumer | 12 | Mortgage rates, building permits, existing sales, consumer sentiment |
| Market Indices & Vol | 11 | S&P 500, VIX, put/call ratio, small-cap/large-cap rotation |
GDELT Geopolitical Events
The GDELT Project (Global Database of Events, Language, and Tone) provides real-time event monitoring of global media. Convex ingests GDELT event data through per-pillar search queries across 6 geopolitical categories.
Noise filtering: Raw GDELT output is extremely noisy. We apply a strict whitelist of 80+ trusted domains (Reuters, AP, AFP, BBC, FT, Economist, Bloomberg, government agencies, think tanks, defence outlets) that filters out approximately 95% of GDELT noise while retaining high-quality event signals.
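A domain whitelist of this kind might be applied as in the sketch below. The domain strings are an illustrative subset chosen for the example, not the actual list, and `is_trusted` is a hypothetical helper name.

```python
from urllib.parse import urlparse

# Illustrative subset only; the real whitelist holds 80+ domains.
TRUSTED_DOMAINS = {"reuters.com", "apnews.com", "bbc.co.uk", "ft.com"}


def is_trusted(url: str, trusted=TRUSTED_DOMAINS) -> bool:
    """Accept a URL if its host is a trusted domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in trusted)
```

Matching on the full host suffix (with the leading dot) avoids accepting look-alike hosts such as `notreuters.com`.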
Events are queried in batches (3 parallel queries with inter-batch delays) with rate limiting (max 1 request/second, exponential backoff on 429/5xx responses).
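The retry behaviour described above can be sketched as follows. The base delay, the cap, the attempt limit, and the `send` callable are assumptions for illustration; the source only states that backoff is exponential and triggered by 429/5xx responses.

```python
import time
from typing import Callable, Tuple

RETRYABLE = {429} | set(range(500, 600))  # 429 and all 5xx statuses


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Delay before retry number `attempt` (0-based): base * 2^attempt, capped."""
    return min(cap, base * 2 ** attempt)


def fetch_with_backoff(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() until it returns a non-retryable status or attempts run out."""
    status, body = 0, ""
    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE:
            return status, body
        if attempt < max_attempts - 1:
            sleep(backoff_delay(attempt))
    return status, body
```

With a base delay of one second, retries also stay within the stated limit of one request per second. Passing `sleep` as a parameter keeps the function testable without real waiting.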
RSS News Feeds — 46 Sources
Convex monitors 46 editorially diverse news sources via RSS, categorised into 7 editorial leans. This diversity is intentional: editorial convergence across normally opposed sources is itself a powerful signal. Each source has a known editorial lean, update frequency, and authority profile.
| Category | Examples | Purpose |
|---|---|---|
| Establishment | Bloomberg, Financial Times, Reuters, Wall Street Journal, BBC | Mainstream institutional consensus view |
| Contrarian | Wolf Street, ZeroHedge, independent macro analysts | Counter-consensus perspectives and early warnings |
| Wire Services | AP, AFP, Reuters Wire | Breaking news primacy detection |
| Government | Federal Reserve, US Treasury, ECB, BoE, PBOC communications | Direct policy signals and forward guidance |
| Academic | NBER, Brookings, policy think tanks, research institutes | Structural analysis and long-term framework shifts |
| Crypto-Native | Specialised digital asset outlets | On-chain sentiment and DeFi-specific developments |
| Neutral | AP (general), balanced financial outlets | Baseline narrative without editorial lean |
CFTC Commitments of Traders
The Commodity Futures Trading Commission (CFTC) publishes weekly Commitments of Traders reports showing how commercial hedgers, non-commercial speculators, and other traders are positioned in major futures contracts. Convex ingests this via the CFTC Socrata Open Data API.
For each contract, we track non-commercial long/short positions, net speculative positioning (long minus short), commercial net positioning, and open interest. We then compute 52-week percentile ranks for net speculative positioning — readings above the 90th or below the 10th percentile trigger "peak positioning" events that signal potential contrarian setups.
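As a rough sketch of the percentile logic, assuming one observation per week and a rank defined as the share of the trailing 52 observations at or below the latest reading. The event labels `crowded_long` and `crowded_short` are hypothetical names used for this example.

```python
from typing import List, Optional, Tuple


def net_speculative(nc_long: float, nc_short: float) -> float:
    """Net speculative positioning: non-commercial long minus short."""
    return nc_long - nc_short


def percentile_rank(history: List[float], current: float) -> float:
    """Percentage (0-100) of observations in history at or below current."""
    if not history:
        raise ValueError("history must not be empty")
    at_or_below = sum(1 for v in history if v <= current)
    return 100.0 * at_or_below / len(history)


def peak_positioning(
    net_spec_weekly: List[float],
    window: int = 52,
    upper: float = 90.0,
    lower: float = 10.0,
) -> Tuple[Optional[str], float]:
    """Flag the latest reading if it sits in the extreme tails of the window."""
    recent = net_spec_weekly[-window:]
    rank = percentile_rank(recent, recent[-1])
    if rank > upper:
        return "crowded_long", rank
    if rank < lower:
        return "crowded_short", rank
    return None, rank
```

Other percentile conventions (strictly below, or interpolated) would shift the rank slightly near the thresholds, so the exact definition matters for borderline readings.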
| Contract | Exchange | Tracked Series |
|---|---|---|
| Bitcoin Futures | CME | NC Long, NC Short, Net Spec, Commercial Net, Open Interest |
| Gold Futures | COMEX | NC Long, NC Short, Net Spec, Commercial Net, Open Interest |
| Crude Oil (WTI) | NYMEX | NC Long, NC Short, Net Spec, Commercial Net, Open Interest |
| S&P 500 E-mini | CME | NC Long, NC Short, Net Spec, Commercial Net, Open Interest |
Polymarket Prediction Markets
Convex tracks active prediction markets via the Polymarket Gamma API across 9 tag categories: politics, crypto, economics, geopolitics, Fed policy, inflation, recession, interest rates, and trade. Market prices represent crowd-aggregated implicit probabilities.
Prediction market data serves two purposes: providing an external market-implied probability anchor for comparison against our Bayesian scenario estimates, and surfacing emerging consensus on policy shifts before they appear in traditional data.
Data is cached with 5-minute revalidation. When a tracked scenario has an explicit Polymarket slug mapping, the market-implied probability appears alongside our model estimate.
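One way to picture 5-minute revalidation is a small time-based cache, sketched below. The class name, the `fetch` callable, and the injectable clock are assumptions for illustration; the actual caching layer is not described in this page.

```python
import time
from typing import Any, Callable, Dict, Tuple


class RevalidatingCache:
    """Serve cached values and refetch once an entry is older than ttl_seconds."""

    def __init__(
        self,
        fetch: Callable[[str], Any],
        ttl_seconds: float = 300.0,          # 5 minutes
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Any:
        now = self._clock()
        entry = self._store.get(key)
        if entry is None or now - entry[0] >= self._ttl:
            value = self._fetch(key)         # stale or missing: revalidate
            self._store[key] = (now, value)
            return value
        return entry[1]
```

Here `key` would be the Polymarket slug mapped to a tracked scenario, and `fetch` whatever function retrieves the current market price for that slug.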
Macro Intelligence Pipeline
Raw events from GDELT and RSS feeds pass through a multi-gate pipeline before reaching publication. This ensures only genuinely significant, non-duplicate, quality-validated content is published.
Pre-Filter
Raw events ingested from GDELT and RSS. Text normalisation, pillar keyword matching (must hit at least 1 of 6 macro pillars), CAMEO code whitelist with Goldstein threshold (|score| ≥ 2.0), and deduplication — both exact hash (Tier 1) and simhash near-duplicate detection with Jaccard > 0.75 (Tier 2).
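The two deduplication tiers and the Goldstein threshold can be sketched as below. Note the simplification: Tier 2 here compares word shingles directly with Jaccard similarity rather than computing a simhash fingerprint, and the shingle size, normalisation rules, and class names are assumptions for illustration.

```python
import hashlib
import re
from typing import List, Set


def normalise(text: str) -> str:
    """Lowercase, strip punctuation, collapse whitespace."""
    return re.sub(r"\s+", " ", re.sub(r"[^\w\s]", "", text.lower())).strip()


def shingles(text: str, k: int = 3) -> Set[str]:
    """Set of k-word shingles from the normalised text."""
    words = normalise(text).split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class Deduplicator:
    def __init__(self, threshold: float = 0.75) -> None:
        self.threshold = threshold
        self.seen_hashes: Set[str] = set()
        self.seen_shingles: List[Set[str]] = []

    def is_duplicate(self, text: str) -> bool:
        digest = hashlib.sha256(normalise(text).encode("utf-8")).hexdigest()
        if digest in self.seen_hashes:                      # Tier 1: exact hash
            return True
        sh = shingles(text)
        if any(jaccard(sh, prev) > self.threshold for prev in self.seen_shingles):
            return True                                     # Tier 2: near-duplicate
        self.seen_hashes.add(digest)
        self.seen_shingles.append(sh)
        return False


def passes_goldstein(score: float, threshold: float = 2.0) -> bool:
    """Keep events whose absolute Goldstein score meets the threshold."""
    return abs(score) >= threshold
```

The pairwise comparison is quadratic in the number of stored events; a fingerprint scheme such as simhash exists precisely to avoid that cost at scale.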
Classification
LLM-based classification: is this event relevant to macro trading? Which editorial pillar? Priority assignment: FLASH (urgent, 30-minute generation SLA), STANDARD (broader context aggregation), or WATCHLIST (monitor only). A circuit breaker pauses classification if daily API costs exceed a ceiling.
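A cost-based circuit breaker of this kind could look like the sketch below. The class name, the USD unit, and the reset at the start of each calendar day are assumptions for illustration.

```python
from datetime import date
from typing import Optional


class DailyCostBreaker:
    """Block further classification calls once the day's spend exceeds a ceiling."""

    def __init__(self, ceiling_usd: float) -> None:
        self.ceiling = ceiling_usd
        self._day: Optional[date] = None
        self._spent = 0.0

    def _roll(self, today: date) -> None:
        # Reset the running total when a new day starts.
        if today != self._day:
            self._day = today
            self._spent = 0.0

    def record(self, cost_usd: float, today: date) -> None:
        self._roll(today)
        self._spent += cost_usd

    def allow(self, today: date) -> bool:
        self._roll(today)
        return self._spent <= self.ceiling
```

A caller would check `allow()` before each classification request and call `record()` with the cost of each completed request.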
Brief Generation
FLASH events receive deep enrichment (20+ data points) and generate urgent ~200 word briefs. STANDARD events aggregate broader context into ~400 word macro briefs. Up to 2 regeneration attempts on generation failure.
Validation
Deterministic validation (mechanism relevance, consistency checks) and editorial validation (word count 900–1500, tone appropriateness, deduplication within 48-hour window). Hard gates auto-publish on pass; soft gates route to editor review on failure.
Observability: Every pipeline run logs events processed, classified, deemed relevant, generated, and published. Per-stage event logging and error tracking enable rapid diagnosis when pipeline throughput changes.
Derived Composite Signals
Beyond raw data ingestion, Convex computes 10 derived composite signals that synthesise multiple series, mostly from FRED, into higher-level macro narratives. These are pre-computed and injected as context into every analysis pipeline, providing the AI research desk with structured, quantitative context rather than raw numbers.
Net Liquidity Proxy
Fed Balance Sheet − RRP − TGA — from WALCL, RRPONTSYD, WTREGEN
Actual market liquidity in trillions, with 1-week, 1-month, and 3-month changes computed. Direction classified as expanding, contracting, or stable.
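A minimal sketch of the calculation, assuming all three inputs have already been converted to the same unit (trillions of USD) and that the series is sampled weekly. The horizon offsets (1, 4, and 13 weeks) and the `tolerance` band for "stable" are hypothetical choices for this example.

```python
from typing import Dict, List


def net_liquidity(walcl: float, rrp: float, tga: float) -> float:
    """Fed balance sheet minus reverse repo minus Treasury General Account."""
    return walcl - rrp - tga


def horizon_changes(weekly_levels: List[float]) -> Dict[str, float]:
    """Change in the latest level versus 1 week, ~1 month, and ~3 months ago."""
    offsets = {"1w": 1, "1m": 4, "3m": 13}
    latest = weekly_levels[-1]
    return {
        label: latest - weekly_levels[-1 - n]
        for label, n in offsets.items()
        if len(weekly_levels) > n
    }


def classify_direction(current: float, previous: float, tolerance: float = 0.01) -> str:
    """Label the move; changes within +/- tolerance count as stable."""
    change = current - previous
    if change > tolerance:
        return "expanding"
    if change < -tolerance:
        return "contracting"
    return "stable"
```

The source series are published in different units, so unit conversion has to happen before the subtraction.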
Yield Curve Shape
2s5s, 5s10s, 2s10s, 2s30s spreads + butterfly — from DGS2, DGS5, DGS10, DGS30
Full term structure analysis with z-scored spreads and movement classification: bear steepening, bull flattening, bear flattening, or bull steepening.
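The spreads and the movement labels can be sketched as follows. Two assumptions are made for illustration: the butterfly is taken as the common 2s5s10s form (twice the 5Y yield minus the 2Y and 10Y yields), and the bull/bear label is decided by the sign of the combined change in the 2Y and 10Y yields, which is a simplification.

```python
from typing import Dict


def curve_spreads(dgs2: float, dgs5: float, dgs10: float, dgs30: float) -> Dict[str, float]:
    """Spreads in percentage points from the four Treasury yields."""
    return {
        "2s5s": dgs5 - dgs2,
        "5s10s": dgs10 - dgs5,
        "2s10s": dgs10 - dgs2,
        "2s30s": dgs30 - dgs2,
        "butterfly": 2 * dgs5 - dgs2 - dgs10,  # assumed 2s5s10s definition
    }


def classify_move(d2: float, d10: float) -> str:
    """Classify a curve move from the changes in the 2Y (d2) and 10Y (d10) yields."""
    slope_change = d10 - d2
    if slope_change == 0:
        return "unchanged"
    shape = "steepening" if slope_change > 0 else "flattening"
    # Bear: yields rising on balance (prices falling); bull: yields falling.
    direction = "bear" if (d2 + d10) > 0 else "bull"
    return f"{direction} {shape}"
```

For example, a 10Y yield rising by 10 basis points while the 2Y rises by 2 is labelled a bear steepening, and the same magnitudes with signs reversed give a bull flattening.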
Real Yields
5Y TIPS, 10Y TIPS, slope, term premium — from DFII5, DFII10, THREEFYTP10
Real interest rate complex with slope (10Y minus 5Y TIPS) and ACM term premium. 1-month change direction tracked.
Inflation Expectations
5Y/10Y breakeven, slope, expectations index — from T5YIE, T10YIE
Market-implied inflation expectations from TIPS breakevens. Slope (10Y minus 5Y) reveals whether markets see inflation as transitory or structural.
Labour Market Intensity
Composite of unemployment, claims, participation, emp-pop — from UNRATE, IC4WSA, CIVPART, EMRATIO
Multi-dimensional labour market health assessment. Change direction classified: tightening, loosening, or stable.
Credit Health
HY spread + IG spread + credit conditions z-score — from BAMLH0A0HYM2, BAMLC0A0CM, NFCI
Composite credit stress measurement combining spread levels and the NFCI financial conditions index.
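As a hedged illustration of z-scoring spread levels and combining them, the sketch below averages the HY and IG spread z-scores with the NFCI reading. The equal weighting, the use of population standard deviation, and the function names are assumptions; the source does not specify how the components are combined.

```python
from statistics import mean, pstdev
from typing import List


def z_score(history: List[float], current: float) -> float:
    """Standardise current against history; returns 0.0 if history has no variance."""
    sigma = pstdev(history)
    if sigma == 0:
        return 0.0
    return (current - mean(history)) / sigma


def credit_stress(
    hy_history: List[float],
    hy_now: float,
    ig_history: List[float],
    ig_now: float,
    nfci_now: float,
) -> float:
    """Equal-weight average of HY z-score, IG z-score, and the NFCI level.

    NFCI is used as-is because it is already published in standardised units.
    """
    return mean([z_score(hy_history, hy_now), z_score(ig_history, ig_now), nfci_now])
```

Positive values indicate tighter-than-average credit conditions under this convention.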
Risk Appetite
VIX + put/call + HY fund flows + small/large rotation — from VIXCLS, equity ETF ratios
Broader risk sentiment composite feeding into CRAI and scenario context.
Corporate Earnings
S&P 500 EPS estimates + revision rates + forward P/E — from analyst consensus data
Earnings revision momentum and valuation context vs. historical percentile.
Housing Activity
Mortgage rates, starts, permits, existing sales, supply — from MORTGAGE30US, HOUST, PERMIT, EXHOSLUSM495S
Leading and coincident housing indicators aggregated for shelter inflation and wealth effect assessment.
Consumer Activity
Retail sales, sentiment, vehicle sales, consumer credit — from RSXFS, UMCSENT, TOTALSA, REVOLSL
Consumer spending momentum and confidence — the demand side of the macro equation.
Data Freshness & Latency
| Source | Frequency | Latency | Notes |
|---|---|---|---|
| FRED Economic Data | Daily / Weekly / Monthly | < 24 hours | Depends on series release schedule |
| GDELT Events | Real-time (~2h behind news) | < 2 hours | 80+ trusted domain whitelist |
| CFTC COT | Weekly (Friday release) | ~1 week | Reporting delay is inherent to CFTC |
| Polymarket | Real-time | < 5 minutes | 9 tracked tag categories |
| RSS Feeds | Every 15–30 minutes | 15–30 min | 46 sources across 7 categories |
| DFM Factor Scores | Daily (post macro data) | < 1 day | Computed after FRED ingestion |
| Bilateral Stress | Daily (00:30 UTC) | < 24 hours | Derived from GDELT events |
| Narrative Velocity | 30-minute snapshots | 30 min | Daily composite aggregation |
These 6 independent data source types — economic data, market prices, news clustering, futures positioning, proprietary composite indices, and cross-source divergences — don’t just feed existing scenarios. A daily “radar” scans across all of them looking for emerging macro configurations that don’t match anything currently being tracked, and autonomously promotes new scenarios when evidence accumulates across multiple sources.