Drug Label Enrichment (openFDA SPL)
Indication text, pharmacological class, RxCUI, boxed warnings, and NDC codes from openFDA Structured Product Labels, joined to Orange Book applications by NDA number. Coverage by approval era and ANDA cross-reference pattern.
The drug_labels table joins FDA Structured Product Label (SPL) data to Orange Book NDA numbers. One row per NDA application, containing the most recent approved label: full indication text, pharmacological class, RxCUI cross-reference, UNII identifier, boxed warning text, and NDC code.
Data source: openFDA Drug Labels (SPL export). FDA mandates electronic SPL submission; this data originates from the same labels that appear on drug.fda.gov. Coverage is near-complete for NDAs approved after ~2007 when electronic submission became mandatory.
Coverage by Approval Era #
The drug_labels table covers approximately 4,470 NDAs (current export) — roughly 81% of the ~5,500 NDAs in applications. Coverage is uneven by approval era:
| Era | NDAs | With indication text |
|---|---|---|
| 2020s | Current | ~90%+ |
| 2010s | ~1,400 | ~90%+ |
| 2000s | ~1,100 | ~80% |
| Pre-2000 | ~2,000 | ~30–50% |
Older drugs often lack electronic SPL records because FDA’s electronic label submission requirement was phased in from 2007 onward. For pre-2000 approvals, a missing row in drug_labels does not indicate a problem with the Orange Book record — it simply means no electronic SPL is available.
ANDA (generic) applications have no independent labels. FDA does not require ANDAs to file their own SPL because they are bioequivalent to the brand NDA they reference. See the ANDA Cross-Reference section below for the join pattern.
Enrichment Fields #
| Field | Coverage | Description and analyst value |
|---|---|---|
indications_and_usage | ~81% of NDAs | Full indication text from the approved label. Often several paragraphs with subsections (1.1, 1.2, etc.) for multi-indication drugs. Use LIKE for keyword search. |
label_effective_date | ~81% of NDAs | Date this label version became effective. Not the drug approval date — labels can be updated many times post-approval. |
pharm_class_epc | ~47% of labeled NDAs | FDA Established Pharmacological Class, e.g. "HMG-CoA Reductase Inhibitor [EPC]". Segment a portfolio by mechanism without manual tagging. |
pharm_class_moa | ~40% of labeled NDAs | Mechanism of Action class, more granular than EPC, e.g. "Cyclooxygenase Inhibitors [MoA]". |
rxcui | ~60% of labeled NDAs | RxNorm Concept Unique Identifier(s), comma-separated. Join to claims databases, formulary files, IQVIA, or CMS Part D. |
unii | ~80% of labeled NDAs | FDA Unique Ingredient Identifier. Stable cross-reference to WHO INN, ClinicalTrials.gov, and FAERS. |
boxed_warning | ~15% of labeled NDAs | Full text of the FDA black-box warning when one exists. Filter high-risk classes, map safety profiles across a therapeutic area. |
ndc | ~75% of labeled NDAs | First NDC code in XXXXX-XXXX format. Link to dispensing data sources (IQVIA, Symphony Health, CMS Part D) for volume or market share analysis. |
The ANDA Gap — Retrieving Indication Text for Generics #
ANDA applications have no independent label data in this dataset. To retrieve indication text for a generic, join the ANDA’s products to their Reference Listed Drug NDA using ingredient + dosage_form_route, then join to drug_labels.
Always filter to a specific ingredient or ANDA first. The ingredient string match across the full products table is slow when unfiltered.
-- Indication text for a specific ANDA (by ANDA appl_no)
SELECT
anda.appl_no AS anda_appl_no,
anda.sponsor_name AS anda_sponsor,
anda_p.ingredient,
anda_p.dosage_form_route,
dl.indications_and_usage
FROM applications anda
JOIN products anda_p ON anda_p.appl_no = anda.appl_no
JOIN products nda_p
ON nda_p.ingredient = anda_p.ingredient
AND nda_p.dosage_form_route = anda_p.dosage_form_route
AND nda_p.appl_type = 'NDA'
AND nda_p.reference_listed_drug = 'Yes'
JOIN drug_labels dl ON dl.appl_no = nda_p.appl_no
WHERE anda.appl_type = 'ANDA'
AND anda.appl_no = '076477' -- Sun Pharma atorvastatin ANDA; replace as needed
AND dl.indications_and_usage IS NOT NULL
GROUP BY anda.appl_no, anda_p.ingredient, anda_p.dosage_form_route;
SQL Queries #
Indication keyword search
The drug_labels table has one row per NDA — no GROUP BY needed when joining via appl_no.
SELECT a.appl_no, a.sponsor_name, p.trade_name, p.ingredient,
l.label_effective_date,
SUBSTR(l.indications_and_usage, 1, 400) AS indication_snippet
FROM drug_labels l
JOIN applications a ON a.appl_no = l.appl_no
JOIN products p ON p.appl_no = l.appl_no AND p.reference_listed_drug = 'Yes'
WHERE l.indications_and_usage LIKE '%non-small cell lung%'
AND p.product_type != 'DISCN'
ORDER BY a.sponsor_name, p.trade_name;
Oncology drugs by indication text
SELECT DISTINCT a.appl_no, a.sponsor_name, p.trade_name, p.ingredient, p.approval_date
FROM drug_labels l
JOIN applications a ON a.appl_no = l.appl_no
JOIN products p ON p.appl_no = l.appl_no AND p.reference_listed_drug = 'Yes'
WHERE (l.indications_and_usage LIKE '%cancer%'
OR l.indications_and_usage LIKE '%tumor%'
OR l.indications_and_usage LIKE '%leukemia%'
OR l.indications_and_usage LIKE '%lymphoma%')
AND a.appl_type = 'NDA'
AND p.product_type != 'DISCN'
ORDER BY p.approval_date DESC;
Pharmacological class breakdown with active exclusivity
A useful starting point for therapeutic area competitive analysis: which pharmacological classes still have drugs under exclusivity protection?
SELECT dl.pharm_class_epc, COUNT(DISTINCT dl.appl_no) AS nda_count,
COUNT(DISTINCT CASE WHEN le.appl_no IS NOT NULL THEN dl.appl_no END) AS under_protection
FROM drug_labels dl
LEFT JOIN loe_estimates le ON le.appl_no = dl.appl_no
WHERE dl.pharm_class_epc IS NOT NULL
GROUP BY dl.pharm_class_epc
ORDER BY under_protection DESC, nda_count DESC
LIMIT 30;
Drugs with boxed warnings by pharmacological class
SELECT pharm_class_epc, COUNT(*) AS drug_count
FROM drug_labels
WHERE boxed_warning IS NOT NULL AND pharm_class_epc IS NOT NULL
GROUP BY pharm_class_epc
ORDER BY drug_count DESC
LIMIT 20;
Drug class + exclusivity end date + active protection
SELECT dl.pharm_class_epc, dl.brand_name,
SUBSTR(dl.indications_and_usage, 1, 200) AS indication_snippet,
e.exclusivity_code, e.exclusivity_expiration_date
FROM drug_labels dl
JOIN exclusivity e ON e.appl_no = dl.appl_no
WHERE dl.pharm_class_epc IS NOT NULL
AND e.exclusivity_expiration_date >= date('now')
ORDER BY e.exclusivity_expiration_date DESC
LIMIT 25;
RxCUI cross-reference: find all NDAs with a specific RxNorm concept
-- The rxcui field can contain multiple comma-separated values.
-- This pattern safely matches a specific CUI within the list.
-- 402014 = omeprazole; replace with the CUI of interest.
SELECT appl_no, brand_name, rxcui
FROM drug_labels
WHERE ',' || rxcui || ',' LIKE '%,402014,%';
Full indication text for a specific NDA
SELECT l.appl_no, l.label_effective_date, l.brand_name, l.generic_name,
l.pharm_class_epc, l.indications_and_usage
FROM drug_labels l
WHERE l.appl_no = '209637';
All ANDAs for a specific ingredient — competitive generic landscape
SELECT anda.appl_no AS anda_appl_no, anda.sponsor_name AS anda_sponsor,
anda_p.strength, anda_p.dosage_form_route, anda_p.te_code, anda_p.approval_date
FROM applications anda
JOIN products anda_p ON anda_p.appl_no = anda.appl_no
WHERE anda.appl_type = 'ANDA'
AND LOWER(anda_p.ingredient) = LOWER('ATORVASTATIN CALCIUM')
AND anda_p.product_type != 'DISCN'
ORDER BY anda_p.approval_date;
Source: openFDA Drug Labels · FDA Structured Product Labeling · FDA Electronic Orange Book