Getting Started with the World Bank API

First pull, key indicators, and what to expect when building pipelines on the World Bank Open Data API.

Source on EconIndx: World Bank Open Data — free, CC BY 4.0, 1,600+ indicators across 260+ economies.

Access & Pricing

Completely free, no API key required for standard use. Optionally register at data.worldbank.org for higher rate limits. No contracts, no enterprise tier. The CC BY 4.0 license permits commercial use with attribution.

Your First Data Pull

No authentication needed. Pull your first indicator in under a minute:

import requests
import pandas as pd

WB_BASE = "https://api.worldbank.org/v2"

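def fetch_indicator(indicator: str, per_page: int = 5000) -> pd.DataFrame:
    """Fetch an indicator for all countries, most recent 50 years."""
    url = f"{WB_BASE}/country/all/indicator/{indicator}"
    params = {
        "format": "json",
        "per_page": per_page,
        "mrv": 50,   # most recent 50 values per country
        "page": 1,
    }
    rows = []
    while True:
        r = requests.get(url, params=params, timeout=30)
        r.raise_for_status()
        payload = r.json()
        # An error or empty response has no data element to unpack
        if len(payload) < 2 or not payload[1]:
            break
        meta, data = payload
        for d in data:
            if d["value"] is not None:  # null observations are dropped here
                rows.append({
                    "country_code": d["countryiso3code"],
                    "country_name": d["country"]["value"],
                    "indicator": indicator,
                    "year": int(d["date"]),
                    "value": d["value"],
                })
        # A full pull can exceed per_page, so walk every page the metadata reports
        if params["page"] >= int(meta["pages"]):
            break
        params["page"] += 1
    return pd.DataFrame(rows)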

# Pull GDP (current USD) for all countries
gdp = fetch_indicator("NY.GDP.MKTP.CD")
print(f"Rows: {len(gdp)}")
print(f"Countries: {gdp['country_code'].nunique()}")
print(f"Year range: {gdp['year'].min()}-{gdp['year'].max()}")

First Pull: What to Expect

| Indicator | Description | Rows (all countries, 50yr) | Country coverage |
|---|---|---|---|
| NY.GDP.MKTP.CD | GDP, current USD | ~9,000–11,000 | ~215 (sparse early) |
| SP.POP.TOTL | Population | ~12,000 | 260+ |
| FP.CPI.TOTL.ZG | Inflation, CPI % | ~7,500 | ~180 |
| SL.UEM.TOTL.ZS | Unemployment, % | ~6,000 | ~150 |
| NY.GDP.PCAP.CD | GDP per capita, USD | ~10,000 | ~215 |

One full indicator pull (all countries, 50 years) takes 2–5 seconds. If the total in the response metadata exceeds your per_page setting, request the remaining pages with the page=N query parameter.

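Watch for aggregates: The response includes regional groups (WLD, ECS, EAP, etc.) alongside country rows. Aggregate codes are also three characters long, so a code-length check will not exclude them. Instead, look each code up on the country metadata endpoint (/v2/country) and treat entries with region.id == "NA" as aggregates.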
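The aggregate check can be sketched as follows, assuming the country metadata endpoint (/v2/country) returns records whose region.id is "NA" for aggregates. The helper names aggregate_codes and flag_aggregates are hypothetical, and the sample records and values are placeholders shaped like a real response, not actual data.

```python
import pandas as pd

def aggregate_codes(country_records: list[dict]) -> set[str]:
    """Collect the codes of metadata records whose region id marks an aggregate."""
    return {
        rec["id"]
        for rec in country_records
        if rec.get("region", {}).get("id") == "NA"
    }

def flag_aggregates(df: pd.DataFrame, agg_codes: set[str]) -> pd.DataFrame:
    """Return a copy of df with a boolean is_aggregate column."""
    out = df.copy()
    out["is_aggregate"] = out["country_code"].isin(list(agg_codes))
    return out

# Placeholder records shaped like entries from /v2/country?format=json
sample_countries = [
    {"id": "WLD", "name": "World", "region": {"id": "NA", "value": "Aggregates"}},
    {"id": "BRA", "name": "Brazil", "region": {"id": "LCN", "value": "Latin America & Caribbean"}},
]
# Placeholder pull with dummy values, in the shape fetch_indicator returns
sample_pull = pd.DataFrame(
    {"country_code": ["WLD", "BRA"], "year": [2020, 2020], "value": [1.0, 2.0]}
)

flagged = flag_aggregates(sample_pull, aggregate_codes(sample_countries))
countries_only = flagged[~flagged["is_aggregate"]]
```

Flagging rather than dropping keeps the aggregate rows available for benchmarking while still allowing a country-only view.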

Key Indicators to Start With

Output & income:

  • NY.GDP.MKTP.CD — GDP, current USD
  • NY.GDP.MKTP.KD.ZG — GDP growth rate, annual %
  • NY.GDP.PCAP.PP.CD — GDP per capita, PPP (best for cross-country comparison)

People:

  • SP.POP.TOTL — total population
  • SP.DYN.LE00.IN — life expectancy at birth
  • SE.ADT.LITR.ZS — adult literacy rate

Economy:

  • FP.CPI.TOTL.ZG — inflation, CPI %
  • SL.UEM.TOTL.ZS — unemployment, total %
  • BX.KLT.DINV.WD.GD.ZS — FDI net inflows, % of GDP

Trade & finance:

  • NE.EXP.GNFS.ZS — exports of goods & services, % of GDP
  • GC.DOD.TOTL.GD.ZS — central government debt, % of GDP

Data Tolerance & Validation

What’s normal:

  • Null rates are high for low-income countries and early years. For NY.GDP.MKTP.CD, expect ~20–30% nulls across all country-years (many early years are missing). This is not a bug.
  • Regional aggregates (WLD, LIC, MIC, etc.) are present in every response. Store an is_aggregate flag rather than filtering them out; they’re useful for benchmarking.
  • Annual cadence: most indicators update once a year in April/May (World Development Indicators release). Don’t poll more than monthly.
  • Country coverage varies by indicator — some have 260+, others only 100–150 countries.

Validation checks:

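from datetime import date

def validate_wb_pull(df: pd.DataFrame, indicator: str) -> dict:
    # An empty pull has no columns at all, so guard every column access
    countries = df["country_code"].nunique() if "country_code" in df.columns else 0
    null_rate = df["value"].isna().mean() if "value" in df.columns else 1.0
    latest_year = int(df["year"].max()) if len(df) else None
    # Compare against the current year instead of a hard-coded one
    years_stale = date.today().year - latest_year if latest_year is not None else None

    return {
        "indicator": indicator,
        "row_count": len(df),
        "country_count": countries,
        "null_rate": round(float(null_rate), 4),
        "latest_year": latest_year,
        # A missing year counts as stale; a current-year pull (0 years stale) does not
        "stale_alert": years_stale is None or years_stale > 2,
    }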

report = validate_wb_pull(gdp, "NY.GDP.MKTP.CD")
print(report)
# Expected: row_count ~10000, country_count ~250, null_rate 0.0
# (fetch_indicator drops null values before returning, so none remain to count)

Alert thresholds:

  • Country count below 180 for a mainstream indicator: check the API or indicator status
  • latest_year more than 2 years behind the current year: the indicator may no longer be updated
  • Null rate above 50% for a core macro indicator: investigate — may be a parsing issue
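These thresholds could be applied to a validation report along the following lines. This is a minimal sketch: check_alerts and the alert labels are hypothetical, the report dicts are made-up examples in the shape validate_wb_pull returns, and the default limits simply mirror the list above.

```python
def check_alerts(
    report: dict,
    min_countries: int = 180,
    max_null_rate: float = 0.5,
) -> list[str]:
    """Return a label for each threshold that the report breaches."""
    alerts = []
    if report["country_count"] < min_countries:
        alerts.append("low_country_count")
    if report["stale_alert"]:
        alerts.append("stale_data")
    if report["null_rate"] > max_null_rate:
        alerts.append("high_null_rate")
    return alerts

# Made-up reports for illustration only
healthy = {"country_count": 250, "null_rate": 0.2, "stale_alert": False}
suspect = {"country_count": 120, "null_rate": 0.6, "stale_alert": True}

print(check_alerts(healthy))
print(check_alerts(suspect))
```

Returning labels instead of raising lets a pipeline decide per indicator whether a breach should block the load or only be logged.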

Loading Multiple Indicators Efficiently

import time

indicators = {
    "NY.GDP.MKTP.CD": "gdp_current_usd",
    "NY.GDP.MKTP.KD.ZG": "gdp_growth_pct",
    "SP.POP.TOTL": "population",
    "FP.CPI.TOTL.ZG": "inflation_cpi_pct",
    "SL.UEM.TOTL.ZS": "unemployment_pct",
}

all_frames = []
for code, label in indicators.items():
    df = fetch_indicator(code)
    df["indicator_label"] = label
    all_frames.append(df)
    time.sleep(1)  # polite — no published rate limit but respect the API

master = pd.concat(all_frames, ignore_index=True)
print(f"Total rows: {len(master)}")
# Expected: 45,000–60,000 rows for 5 indicators

Schema Stability

The Indicators API schema has been stable for years. Indicator codes occasionally retire — check the indicator metadata endpoint (/v2/indicator/{code}) for "sourceNote" and status. Country codes follow ISO 3166 alpha-3 with World Bank extensions (Kosovo = XKX, Channel Islands = CHI). Map these to your geography dimension on first load; the mapping rarely changes.
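A first-load check for codes missing from a geography dimension might look like the sketch below. The function unmapped_codes and the geo_dimension set are hypothetical stand-ins for whatever dimension table a pipeline maintains.

```python
import pandas as pd

def unmapped_codes(df: pd.DataFrame, known_codes: set[str]) -> list[str]:
    """List country codes present in the pull but absent from the dimension."""
    seen = set(df["country_code"].dropna().unique())
    return sorted(seen - known_codes)

# Hypothetical dimension that already knows Kosovo but not the Channel Islands
geo_dimension = {"BRA", "XKX"}
pull = pd.DataFrame({"country_code": ["BRA", "XKX", "CHI"]})

missing = unmapped_codes(pull, geo_dimension)
print(missing)
```

Running this on every load and alerting on a non-empty result surfaces new or changed codes without manual comparison.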
