CONVEX
Methodology 02

Data Pipeline

Over 150 live data sources are ingested, validated, and transformed into a unified macro data estate. This page documents every external feed, quality control measure, and internally derived composite signal.

FRED Economic Data — 107 Series

The backbone of Convex’s data estate is the Federal Reserve Economic Data (FRED) API, maintained by the St. Louis Fed. We ingest 107 time series spanning the full macro spectrum — from overnight rates to housing starts. Every series has documented metadata, update frequency, and seasonal adjustment status.

Quality controls: All incoming data points pass outlier-bounds validation. Missing values are handled via forward-fill (for irregular release schedules) or interpolation (for known gaps). Source-level errors are tracked and surfaced in pipeline observability.
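The three quality-control steps described above can be sketched in a few lines of plain Python. This is a minimal illustration only, not the production pipeline: the function names are hypothetical, and treating an out-of-bounds point as missing is an assumption (the text does not say what happens to a point that fails validation).

```python
def validate_bounds(values, lower, upper):
    """Replace points outside [lower, upper] with None (assumed handling)."""
    return [v if v is not None and lower <= v <= upper else None for v in values]


def forward_fill(values):
    """Carry the last observed value forward over missing points."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def interpolate(values):
    """Linearly interpolate interior gaps; leading and trailing gaps stay None."""
    out = list(values)
    known = [i for i, v in enumerate(out) if v is not None]
    for a, b in zip(known, known[1:]):
        step = (out[b] - out[a]) / (b - a)
        for i in range(a + 1, b):
            out[i] = out[a] + step * (i - a)
    return out
```

Forward-fill suits series that simply have not been released yet, while interpolation suits a known gap between two observed values.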

Category | Series | Examples
Yield Curve & Rates | 17 | DGS1–DGS30, T10Y2Y, T10Y3M, SOFR, EFFR, term premium (ACM)
Inflation Pipeline | 13 | CPI, Core CPI, PCE, Core PCE, PPI, breakeven inflation rates (5Y, 10Y)
Credit & Financial Stress | 14 | HY/IG spreads, NFCI, credit delinquency rates, SLOOS lending standards
Liquidity Complex | 8 | Fed balance sheet (WALCL), RRP, TGA, M2, reserve balances
Labour Market | 13 | Unemployment, claims (weekly), JOLTS, participation rate, Sahm Rule
Economic Activity | 19 | GDP, industrial production, retail sales, durable goods, ISM PMIs
Housing & Consumer | 12 | Mortgage rates, building permits, existing sales, consumer sentiment
Market Indices & Vol | 11 | S&P 500, VIX, put/call ratio, small-cap/large-cap rotation

GDELT Geopolitical Events

The GDELT Project (Global Database of Events, Language, and Tone) provides real-time event monitoring of global media. Convex ingests GDELT event data through per-pillar search queries across 6 geopolitical categories.

Noise filtering: Raw GDELT output is extremely noisy. We apply a strict whitelist of 80+ trusted domains (Reuters, AP, AFP, BBC, FT, Economist, Bloomberg, government agencies, think tanks, defence outlets), which filters out approximately 95% of GDELT noise while retaining high-quality event signals.
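A domain whitelist of this kind can be sketched as a hostname check. The four domains below are an illustrative subset chosen for the example, not the actual 80+ entry list, and the function names and the `url` event field are hypothetical.

```python
from urllib.parse import urlparse

# Illustrative subset only; the real whitelist is not published here.
TRUSTED_DOMAINS = {"reuters.com", "apnews.com", "bbc.co.uk", "ft.com"}


def is_trusted(url, trusted=TRUSTED_DOMAINS):
    """True if the URL's host is a trusted domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in trusted)


def filter_events(events):
    """Keep only events whose source URL is on the whitelist."""
    return [e for e in events if is_trusted(e["url"])]
```

Matching on the full hostname suffix (with a leading dot) avoids accepting look-alike hosts such as `notreuters.com`.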

Events are queried in batches (3 parallel queries with inter-batch delays) with rate limiting (max 1 request/second, exponential backoff on 429/5xx responses).
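The retry behaviour described above might look like the following sketch. It assumes a caller-supplied `fetch(url)` callable returning a `(status, body)` pair; that callable, the retry count, and the 60-second cap are all hypothetical choices for illustration. The one-second floor is applied to retry waits only, so a separate limiter would still be needed to space out successful requests.

```python
import time


def backoff_delays(max_retries, base=1.0, cap=60.0):
    """Exponential delays: base, 2*base, 4*base, ... capped at `cap` seconds."""
    return [min(cap, base * 2 ** attempt) for attempt in range(max_retries)]


def is_retryable(status):
    """Retry on HTTP 429 and on any 5xx response."""
    return status == 429 or 500 <= status < 600


def fetch_with_backoff(fetch, url, max_retries=4, min_interval=1.0, sleep=time.sleep):
    """Call fetch(url), backing off exponentially on retryable statuses."""
    delays = backoff_delays(max_retries)
    status, body = fetch(url)
    for delay in delays:
        if not is_retryable(status):
            break
        sleep(max(min_interval, delay))  # never wait less than the rate-limit floor
        status, body = fetch(url)
    return status, body
```

Passing `sleep` as a parameter keeps the function testable without real waiting.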

Six Geopolitical Pillars
Sanctions & Trade Restrictions
Supply Chain Disruption
Sovereign Divergence
Energy Security
Armed Conflict
Political Instability

RSS News Feeds — 46 Sources

Convex monitors 46 editorially diverse news sources via RSS, categorised into 7 editorial leans. This diversity is intentional: editorial convergence across normally opposed sources is itself a powerful signal. Each source has a known editorial lean, update frequency, and authority profile.

Category | Examples | Purpose
Establishment | Bloomberg, Financial Times, Reuters, Wall Street Journal, BBC | Mainstream institutional consensus view
Contrarian | Wolf Street, ZeroHedge, independent macro analysts | Counter-consensus perspectives and early warnings
Wire Services | AP, AFP, Reuters Wire | Breaking news primacy detection
Government | Federal Reserve, US Treasury, ECB, BoE, PBOC communications | Direct policy signals and forward guidance
Academic | NBER, Brookings, policy think tanks, research institutes | Structural analysis and long-term framework shifts
Crypto-Native | Specialised digital asset outlets | On-chain sentiment and DeFi-specific developments
Neutral | AP (general), balanced financial outlets | Baseline narrative without editorial lean

CFTC Commitments of Traders

The Commodity Futures Trading Commission (CFTC) publishes weekly Commitments of Traders reports showing how commercial hedgers, non-commercial speculators, and other traders are positioned in major futures contracts. Convex ingests this via the CFTC Socrata Open Data API.

For each contract, we track non-commercial long/short positions, net speculative positioning (long minus short), commercial net positioning, and open interest. We then compute 52-week percentile ranks for net speculative positioning — readings above the 90th or below the 10th percentile trigger "peak positioning" events that signal potential contrarian setups.

Contract | Exchange | Tracked Series
Bitcoin Futures | CME | NC Long, NC Short, Net Spec, Commercial Net, Open Interest
Gold Futures | COMEX | NC Long, NC Short, Net Spec, Commercial Net, Open Interest
Crude Oil (WTI) | NYMEX | NC Long, NC Short, Net Spec, Commercial Net, Open Interest
S&P 500 E-mini | CME | NC Long, NC Short, Net Spec, Commercial Net, Open Interest
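The net speculative positioning and 52-week percentile logic described in this section can be sketched as follows. The percentile definition used here (the share of the trailing window at or below the latest reading) is an assumption, as are the function names and the `crowded_long` / `crowded_short` labels.

```python
def net_speculative(nc_long, nc_short):
    """Net speculative positioning: non-commercial long minus short."""
    return nc_long - nc_short


def percentile_rank(history, current):
    """Percentage of values in `history` that are <= `current`."""
    if not history:
        raise ValueError("history must not be empty")
    return 100.0 * sum(1 for v in history if v <= current) / len(history)


def peak_positioning(weekly_net, window=52, upper=90.0, lower=10.0):
    """Flag the latest weekly reading if it is extreme within the trailing window."""
    recent = weekly_net[-window:]
    rank = percentile_rank(recent, recent[-1])
    if rank > upper:
        return "crowded_long"   # hypothetical label for a >90th percentile reading
    if rank < lower:
        return "crowded_short"  # hypothetical label for a <10th percentile reading
    return None
```

Because the latest reading is included in its own window, the lowest possible rank is 1/52 (about 1.9%) rather than zero.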

Polymarket Prediction Markets

Convex tracks active prediction markets via the Polymarket Gamma API across 9 tag categories: politics, crypto, economics, geopolitics, Fed policy, inflation, recession, interest rates, and trade. Market prices represent crowd-aggregated implicit probabilities.

Prediction market data serves two purposes: providing an external market-implied probability anchor for comparison against our Bayesian scenario estimates, and surfacing emerging consensus on policy shifts before they appear in traditional data.

Data is cached with 5-minute revalidation. When a tracked scenario has an explicit Polymarket slug mapping, the market-implied probability appears alongside our model estimate.
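A cache with 5-minute revalidation can be sketched as a small time-to-live wrapper around a loader. The class name, the `loader` callable (standing in for whatever fetches a market price by slug), and the injectable clock are hypothetical; this is one simple way to get the behaviour, not a description of the actual caching layer.

```python
import time


class RevalidatingCache:
    """Serve cached values until they are older than `ttl_seconds`, then reload."""

    def __init__(self, loader, ttl_seconds=300.0, clock=time.monotonic):
        self._loader = loader      # hypothetical callable: key -> fresh value
        self._ttl = ttl_seconds    # 300 s matches the 5-minute revalidation
        self._clock = clock
        self._entries = {}         # key -> (fetched_at, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]        # still fresh: no upstream call
        value = self._loader(key)  # stale or missing: refetch and store
        self._entries[key] = (now, value)
        return value
```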

Macro Intelligence Pipeline

Raw events from GDELT and RSS feeds pass through a multi-gate pipeline before reaching publication. This ensures only genuinely significant, non-duplicate, quality-validated content is published.

Gate 1: Pre-Filter

Raw events are ingested from GDELT and RSS, then passed through text normalisation, pillar keyword matching (must hit at least 1 of 6 macro pillars), a CAMEO code whitelist with a Goldstein threshold (|score| ≥ 2.0), and two tiers of deduplication: exact hash (Tier 1) and simhash near-duplicate detection with Jaccard > 0.75 (Tier 2).
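The threshold and deduplication checks in this gate can be illustrated with the sketch below. Note the swap: for simplicity it computes Jaccard similarity directly on word 3-shingle sets rather than on simhash fingerprints, so it shows the 0.75 threshold logic, not the simhash step itself. All function names, the shingle size, and the choice of SHA-256 are assumptions.

```python
import hashlib
import re


def normalise(text):
    """Lower-case, strip punctuation, and collapse whitespace."""
    return re.sub(r"\s+", " ", re.sub(r"[^\w\s]", "", text.lower())).strip()


def exact_hash(text):
    """Tier 1: hash of the normalised text, for exact-duplicate detection."""
    return hashlib.sha256(normalise(text).encode("utf-8")).hexdigest()


def shingles(text, k=3):
    """Set of overlapping k-word shingles from the normalised text."""
    words = normalise(text).split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: |intersection| / |union|."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def passes_goldstein(score, threshold=2.0):
    """Keep events whose absolute Goldstein score meets the threshold."""
    return abs(score) >= threshold


def is_duplicate(text, seen_hashes, seen_shingles, threshold=0.75):
    """Tier 1 exact match, then Tier 2 near-duplicate check (Jaccard > threshold)."""
    if exact_hash(text) in seen_hashes:
        return True
    s = shingles(text)
    return any(jaccard(s, other) > threshold for other in seen_shingles)
```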

Gate 2: Classification

LLM-based classification answers two questions: is this event relevant to macro trading, and which editorial pillar does it belong to? Each relevant event is then assigned a priority: FLASH (urgent, 30-minute generation SLA), STANDARD (broader context aggregation), or WATCHLIST (monitor only). A circuit breaker pauses classification if daily API costs exceed the ceiling.
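A daily cost circuit breaker of the kind mentioned above could be sketched like this. The class name, the USD unit, and the reset-on-new-day behaviour are assumptions made for the example.

```python
from datetime import date


class DailyCostBreaker:
    """Pause classification once the day's API spend exceeds a ceiling."""

    def __init__(self, daily_ceiling_usd):
        self.ceiling = daily_ceiling_usd
        self._day = None
        self._spent = 0.0

    def record(self, cost_usd, today):
        """Add a call's cost, resetting the running total on a new day."""
        if today != self._day:
            self._day, self._spent = today, 0.0
        self._spent += cost_usd

    def is_open(self, today):
        """True when classification should be paused for `today`."""
        return today == self._day and self._spent > self.ceiling
```

A caller would check `is_open(date.today())` before each classification request and call `record(...)` after each one.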

Gate 3: Brief Generation

FLASH events receive deep enrichment (20+ data points) and generate urgent ~200-word briefs. STANDARD events aggregate broader context into ~400-word macro briefs. Up to 2 regeneration attempts are made on generation failure.
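The regeneration rule can be read as one initial attempt plus up to two retries. A minimal sketch under that reading follows; the `generate` callable and the choice to treat any raised exception as a generation failure are hypothetical.

```python
# Approximate target lengths taken from the text above.
TARGET_WORDS = {"FLASH": 200, "STANDARD": 400}


def generate_with_retries(generate, event, max_regenerations=2):
    """Try generate(event) once, then up to `max_regenerations` more times."""
    last_error = None
    for _ in range(1 + max_regenerations):
        try:
            return generate(event)
        except Exception as exc:  # assumed: any exception counts as a failure
            last_error = exc
    raise RuntimeError("brief generation failed after retries") from last_error
```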

Quality Gates: Validation

Deterministic validation (mechanism relevance, consistency checks) and editorial validation (word count 900–1500, tone appropriateness, deduplication within 48-hour window). Hard gates auto-publish on pass; soft gates route to editor review on failure.

Observability: Every pipeline run logs events processed, classified, deemed relevant, generated, and published. Per-stage event logging and error tracking enable rapid diagnosis when pipeline throughput changes.
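Per-run stage counts like these lend themselves to a simple funnel summary. The sketch below assumes each event emits one stage name per stage it reaches; the stage identifiers and function names are hypothetical.

```python
from collections import Counter

# Stage names follow the order listed in the text above.
STAGES = ("processed", "classified", "relevant", "generated", "published")


def summarise_run(stage_events):
    """Count how many events reached each stage in one pipeline run."""
    counts = Counter(stage_events)
    unknown = set(counts) - set(STAGES)
    if unknown:
        raise ValueError(f"unknown stages: {sorted(unknown)}")
    return {stage: counts.get(stage, 0) for stage in STAGES}


def pass_through_rates(summary):
    """Fraction of events surviving from each stage to the next."""
    rates = {}
    for prev, cur in zip(STAGES, STAGES[1:]):
        rates[f"{prev}->{cur}"] = summary[cur] / summary[prev] if summary[prev] else 0.0
    return rates
```

A sudden drop in one pass-through rate points at the stage where throughput changed.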

Derived Composite Signals

Beyond raw data ingestion, Convex computes 10 derived composite signals that synthesise multiple underlying series (mostly from FRED) into higher-level macro narratives. These are pre-computed and injected as context into every analysis pipeline, providing the AI research desk with structured, quantitative context rather than raw numbers.

Net Liquidity Proxy

Fed Balance Sheet − RRP − TGA — from WALCL, RRPONTSYD, WTREGEN

Actual market liquidity in trillions, with 1-week, 1-month, and 3-month changes computed. Direction classified as expanding, contracting, or stable.
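The proxy and its change windows can be sketched as below. The caller is assumed to have converted all three inputs to the same unit (for example, trillions of USD). The weekly lags of 1, 4, and 13 observations and the 1% stability tolerance are illustrative assumptions, not documented parameters.

```python
# Assumed lags, in weekly observations, for the 1-week, 1-month and 3-month changes.
LAGS = {"1w": 1, "1m": 4, "3m": 13}


def net_liquidity(fed_balance_sheet, rrp, tga):
    """Net liquidity proxy: Fed balance sheet minus RRP minus TGA (same units)."""
    return fed_balance_sheet - rrp - tga


def changes(series):
    """Change in the latest value versus each lag that the series is long enough for."""
    return {name: series[-1] - series[-1 - lag]
            for name, lag in LAGS.items() if len(series) > lag}


def classify_direction(current, previous, tolerance=0.01):
    """Label the fractional change as expanding, contracting, or stable."""
    change = (current - previous) / abs(previous)
    if change > tolerance:
        return "expanding"
    if change < -tolerance:
        return "contracting"
    return "stable"
```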

Yield Curve Shape

2s5s, 5s10s, 2s10s, 2s30s spreads + butterfly — from DGS2, DGS5, DGS10, DGS30

Full term structure analysis with z-scored spreads and movement classification: bear steepening, bull flattening, bear flattening, or bull steepening.
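A minimal sketch of the spread and movement classification follows. Two simplifications are assumed: the butterfly is taken as the 2s5s10s fly (2 × 5Y − 2Y − 10Y), and a move is labelled "bear" when the short and long yields rise on net and "bull" when they fall on net. Function names are hypothetical, and z-scoring is omitted.

```python
def curve_spreads(dgs2, dgs5, dgs10, dgs30):
    """Spreads in percentage points from the four Treasury yields."""
    return {
        "2s5s": dgs5 - dgs2,
        "5s10s": dgs10 - dgs5,
        "2s10s": dgs10 - dgs2,
        "2s30s": dgs30 - dgs2,
        "butterfly": 2 * dgs5 - dgs2 - dgs10,  # assumed 2s5s10s definition
    }


def classify_move(d_short, d_long):
    """Classify a curve move from the changes in a short and a long yield."""
    spread_change = d_long - d_short
    if spread_change == 0:
        return "parallel"
    shape = "steepening" if spread_change > 0 else "flattening"
    # Simplification: net direction of the two legs decides bear vs bull.
    tone = "bear" if (d_short + d_long) > 0 else "bull"
    return f"{tone} {shape}"
```

For example, a long yield rising by more than the short yield gives `bear steepening`, while a short yield falling by more than the long yield gives `bull steepening`.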

Real Yields

5Y TIPS, 10Y TIPS, slope, term premium — from DFII5, DFII10, THREEFYTP10

Real interest rate complex with slope (10Y minus 5Y TIPS) and ACM term premium. 1-month change direction tracked.

Inflation Expectations

5Y/10Y breakeven, slope, expectations index — from T5YIE, T10YIE

Market-implied inflation expectations from TIPS breakevens. Slope (10Y minus 5Y) reveals whether markets see inflation as transitory or structural.

Labour Market Intensity

Composite of unemployment, claims, participation, emp-pop — from UNRATE, IC4WSA, CIVPART, EMRATIO

Multi-dimensional labour market health assessment. Change direction classified: tightening, loosening, or stable.

Credit Health

HY spread + IG spread + credit conditions z-score — from BAMLH0A0HYM2, BAMLC0A0CM, NFCI

Composite credit stress measurement combining spread levels and the NFCI financial conditions index.
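One simple way to combine spread levels and NFCI into a single score is to average their z-scores against each series' own history. The equal weighting and the use of the population standard deviation are assumptions for illustration; the actual weighting is not documented here.

```python
from statistics import mean, pstdev


def zscore(history, current):
    """Standard score of `current` against `history` (0.0 if history is flat)."""
    sigma = pstdev(history)
    if sigma == 0:
        return 0.0
    return (current - mean(history)) / sigma


def credit_stress(hy, ig, nfci, hy_hist, ig_hist, nfci_hist):
    """Equal-weight average of the three z-scores (assumed weighting)."""
    return mean([
        zscore(hy_hist, hy),
        zscore(ig_hist, ig),
        zscore(nfci_hist, nfci),
    ])
```

Higher values indicate wider spreads and tighter financial conditions relative to history.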

Risk Appetite

VIX + put/call + HY fund flows + small/large rotation — from VIXCLS, equity ETF ratios

Broader risk sentiment composite feeding into CRAI and scenario context.

Corporate Earnings

S&P 500 EPS estimates + revision rates + forward P/E — from analyst consensus data

Earnings revision momentum and valuation context vs. historical percentile.

Housing Activity

Mortgage rates, starts, permits, existing sales, supply — from MORTGAGE30US, HOUST, PERMIT, EXHOSLUSM495S

Leading and coincident housing indicators aggregated for shelter inflation and wealth effect assessment.

Consumer Activity

Retail sales, sentiment, vehicle sales, consumer credit — from RSXFS, UMCSENT, TOTALSA, REVOLSL

Consumer spending momentum and confidence — the demand side of the macro equation.

Data Freshness & Latency

Source | Frequency | Latency | Notes
FRED Economic Data | Daily / Weekly / Monthly | < 24 hours | Depends on series release schedule
GDELT Events | Real-time (~2h behind news) | < 2 hours | 80+ trusted domain whitelist
CFTC COT | Weekly (Friday release) | ~1 week | Reporting delay is inherent to CFTC
Polymarket | Real-time | < 5 minutes | 9 tracked tag categories
RSS Feeds | Every 15–30 minutes | 15–30 min | 46 sources across 7 categories
DFM Factor Scores | Daily (post macro data) | < 1 day | Computed after FRED ingestion
Bilateral Stress | Daily (00:30 UTC) | < 24 hours | Derived from GDELT events
Narrative Velocity | 30-minute snapshots | 30 min | Daily composite aggregation

Where this data goes

These 6 independent data source types — economic data, market prices, news clustering, futures positioning, proprietary composite indices, and cross-source divergences — don’t just feed existing scenarios. A daily “radar” scans across all of them looking for emerging macro configurations that don’t match anything currently being tracked, and autonomously promotes new scenarios when evidence accumulates across multiple sources.
