Schema Reference
Table and column reference for the RxDataLab Exclusivity dataset including applications, products, patents, exclusivity, drug labels, derived views, and change log tables.
This page documents tables and columns in the orangebook.db export. CSV files will mimic the relational tables in orangebook.db. The dataset is organized into four layers: core FDA snapshot tables (applications, products, patents, exclusivity, drug labels), reference lookup tables (exclusivity codes, patent use codes), derived analytical views (LOE estimates, first generic entry), and the monthly change log tables. For definitive definitions of FDA terms, please see the FDA Orangebook Data Files Reference
applications #
One row per FDA drug application. The Orange Book covers small-molecule drugs only: NDA (brand) and ANDA (generic). Biologics are tracked in the FDA Purple Book and are not present here. The application number (appl_no) is the primary join key across this dataset (make sure you don’t confuse application numbers).
| Column | Type | Description |
|---|---|---|
appl_no | TEXT | FDA drug application number (e.g. "021446"). Primary key across all tables. Not a patent application number. |
appl_type | TEXT | "NDA" (brand drug) or "ANDA" (generic drug) |
sponsor_name | TEXT | Raw FDA applicant name as listed in the Orange Book. Always populated. The authoritative name — use this when rxdl_company is blank. |
rxdl_company | TEXT | RxDataLab harmonized company name. Normalizes subsidiary names and historical name changes to a single canonical entity (e.g., "PFIZER LABORATORIES DIV PFIZER INC" → "Pfizer"). Empty string when unmatched. Not all Orangebook Companies are covered. |
rxdl_ticker | TEXT | Stock ticker symbol for publicly traded companies (e.g., "PFE"). Empty string when private or not yet matched. |
rxdl_cik | TEXT | SEC Central Index Key. Enables joining to SEC filings and EDGAR data. Empty string when not available. |
rxdl_company_id | INTEGER | RxDataLab internal company identifier. Use as a join key to other RxDataLab datasets. NULL when unmatched. |
Company search pattern: Use sponsor_name LIKE '%PFIZER%' for raw FDA name searches (always populated). Where available, you can use rxdl_company LIKE '%PFIZER%' for harmonized searches which will consolidate subsidiaries and name account for name changes. Coverage of rxdl_* fields skews toward commercially significant NDAs from public companies that have active clinical trials. Most ANDA filers are private generic manufacturers and remain unmatched.
products #
One row per product formulation. A single NDA routinely has multiple products — different strengths, routes, and release types. An NDA for a 10 mg, 20 mg, and 40 mg tablet will have three rows here (possibly four if an IV form was also approved). Patents and exclusivity link at the (appl_no, product_no) level.
| Column | Type | Description |
|---|---|---|
appl_no | TEXT | Links to applications.appl_no |
product_no | TEXT | Product number within the application (e.g. "001", "002"). Opaque FDA sequence number — use as a join key, not for display. |
trade_name | TEXT | Brand name |
ingredient | TEXT | Active pharmaceutical ingredient(s). Combination products list multiple ingredients separated by semicolons. |
dosage_form_route | TEXT | Dosage form and route, semicolon-separated (e.g. "TABLET;ORAL", "SOLUTION;INTRAVENOUS", "CAPSULE, EXTENDED RELEASE;ORAL") |
strength | TEXT | Dosage strength as listed in the Orange Book (e.g. "20MG", "200MG/10ML") |
appl_type | TEXT | "NDA" or "ANDA" — redundant with applications.appl_type, included for convenience when querying products alone |
te_code | TEXT | FDA Therapeutic Equivalence code. Indicates generic substitutability (e.g. "AB" = bioequivalent, "BX" = insufficient data). Empty for NDA brand products. Absent for some older ANDA products (pre-1980) where FDA did not assign TE codes. |
approval_date | TEXT | FDA approval date for this product (YYYY-MM-DD format). Approval is at the product level — an NDA can have products with different approval dates. Note: a small number of older products have an empty string '' instead of NULL when date is unknown. Use approval_date > '' to filter out both NULL and empty. |
reference_listed_drug | TEXT | "Yes" if this is the Reference Listed Drug — the specific brand product that ANDA generics must demonstrate bioequivalence against. Most NDAs have exactly one RLD product. Approximately 30% of older (pre-2000) NDAs have no product designated as RLD. |
reference_standard | TEXT | "Yes" if used as a Reference Standard for in vitro testing. Usually overlaps with reference_listed_drug. |
product_type | TEXT | "RX" (prescription), "OTC" (over-the-counter), or "DISCN" (discontinued). Always filter product_type != 'DISCN' to exclude discontinued formulations. |
applicant | TEXT | Abbreviated FDA applicant name from the product record. Format may differ from applications.sponsor_name — use sponsor_name from the applications table for the full name. |
Key join pattern: When you want one row per NDA (not one per strength), filter AND p.reference_listed_drug = 'Yes'. When you need full NDA coverage (including pre-2000 drugs with no RLD), do not make this a required filter.
patents #
Active patent listings in the Orange Book. Each row represents one patent listed against one specific product formulation. The same patent number can appear multiple times if it covers multiple formulations.
| Column | Type | Description |
|---|---|---|
appl_no | TEXT | Links to applications.appl_no |
product_no | TEXT | Links to products.product_no — the patent is listed against this specific formulation |
patent_no | TEXT | USPTO granted patent number. Join to patent_details for title, grant date, and patent_application_number. |
patent_expire_date | TEXT | Patent expiration date (YYYY-MM-DD). This is the base expiry date. When pediatric_excl = 1, effective protection extends six months beyond this date. The loe_estimates view applies this +6-month adjustment automatically. |
use_code | TEXT | FDA use code identifying which claimed method of use this patent covers. Join patent_use_codes for the full definition. Empty string when the patent covers the drug substance or formulation (not a specific use). |
pediatric_excl | INTEGER | 1 = FDA has granted a 6-month pediatric exclusivity extension on this patent (under 21 U.S.C. § 505A, Best Pharmaceuticals for Children Act). The patent_expire_date is the base date; effective protection ends 6 months later. loe_estimates applies this adjustment automatically. 0 = no pediatric extension. |
drug_substance_flag | TEXT | "Y" = patent claims cover the drug substance (active ingredient). The hardest type to design around. |
drug_product_flag | TEXT | "Y" = patent claims cover the drug product (formulation, composition). A patent can have both flags. |
delist_flag | TEXT | "Y" = patent has been delisted from the Orange Book. May still be active at USPTO. Delisted patents no longer support 30-month stay. Filter delist_flag != 'Y' in most analyses. |
ob_submission_date | TEXT | Date the NDA holder submitted this patent to FDA for Orange Book listing (YYYY-MM-DD). Distinct from the USPTO filing date and the drug approval date. |
Primary key note: The natural key is (appl_no, product_no, patent_no, use_code, pediatric_excl) — the same patent number can appear as two distinct rows (with and without pediatric exclusivity), representing different legal IP positions.
patent_details #
USPTO metadata for each listed patent number, sourced from PatentsView. One row per unique patent number.
| Column | Type | Description |
|---|---|---|
patent_no | TEXT | USPTO granted patent number (joins to patents.patent_no) |
title | TEXT | Patent title from USPTO |
grant_date | TEXT | Date USPTO granted the patent (YYYY-MM-DD). Typically 20 years before patent_expire_date, less any Patent Term Extension. Distinct from the Orange Book submission date and the drug approval date. |
patent_application_number | TEXT | USPTO pre-grant application number (e.g. "17/123456"). Distinct from appl_no (FDA drug application). Use for USPTO PAIR lookups to access prosecution history, office actions, and prior art. Has no relationship to FDA approval. |
exclusivity #
FDA statutory exclusivity grants. Multiple exclusivity types can apply to the same product simultaneously.
| Column | Type | Description |
|---|---|---|
appl_no | TEXT | Links to applications.appl_no. Note: exclusivity can appear on both NDA and ANDA applications — the PC (Patent Challenge) code appears on ANDAs only. |
product_no | TEXT | Links to products.product_no |
exclusivity_code | TEXT | FDA exclusivity code (e.g. "NCE", "ODE", "PC"). Join exclusivity_codes for the plain-English definition. |
exclusivity_expiration_date | TEXT | Date this exclusivity period expires (YYYY-MM-DD). NOT the grant date. After this date, FDA may approve applications that rely on the protected drug’s safety/efficacy data. |
PC code represents a Paragraph IV first-filer exclusivity: The PC code (definition: “PATENT CHALLENGE”) is granted to the first ANDA applicant that files a Paragraph IV certification asserting that a listed Orange Book patent is invalid or will not be infringed. If the challenge succeeds, the first filer earns 180 days of exclusive generic marketing before other ANDAs can launch. PC entries appear on ANDA appl_no values, not NDA values, so they do not appear in loe_estimates (which covers NDAs only). See the Exclusivity Types page for full details.
exclusivity_codes #
Reference table. Plain-English definitions for all exclusivity codes used in the exclusivity table.
| Column | Type | Description |
|---|---|---|
exclusivity_code | TEXT | FDA exclusivity code (primary key, joins to exclusivity.exclusivity_code) |
definition | TEXT | FDA plain-English description of the exclusivity type |
-- Look up a code definition
SELECT exclusivity_code, definition FROM exclusivity_codes WHERE exclusivity_code = 'NCE';
-- Returns: NCE | NEW CHEMICAL ENTITY
-- See all codes in the dataset
SELECT exclusivity_code, definition FROM exclusivity_codes ORDER BY exclusivity_code;
patent_use_codes #
Reference table. Definitions for FDA use codes assigned to method-of-use patents in the patents table. Over 4,000 codes are currently defined.
| Column | Type | Description |
|---|---|---|
use_code | TEXT | FDA use code identifier (primary key, joins to patents.use_code) |
definition | TEXT | Full description of the specific method of use this code covers |
Use codes are important for Paragraph IV challenges and section viii statements for ANDA applicants, and are meant to ensure ANDA applicants can understand exactly what the claimed patent is covering to avoid infringement of (or challenge) patent protections. See the the Congressional Research Service publication on this topic for more information.
-- All use codes for a specific NDA (021196 = Xyrem / sodium oxybate, a drug with multiple method-of-use patents)
SELECT DISTINCT pt.patent_no, pt.use_code, puc.definition, pt.patent_expire_date
FROM patents pt
LEFT JOIN patent_use_codes puc ON puc.use_code = pt.use_code
WHERE pt.appl_no = '021196'
ORDER BY pt.patent_expire_date;
drug_labels #
One row per NDA application, containing the most recent FDA-approved label from Structured Product Labeling (SPL). ANDA (generic) applications have no independent label record — they reference the brand NDA.
| Column | Type | Description |
|---|---|---|
appl_no | TEXT | Primary key. Links to applications.appl_no. One row per NDA. |
brand_name | TEXT | Brand name from the SPL label |
generic_name | TEXT | Established (INN) name from the SPL label |
manufacturer_name | TEXT | Manufacturer name from the SPL label |
route | TEXT | Route of administration from the SPL (e.g. "ORAL"). Format may differ from dosage_form_route in products. |
indications_and_usage | TEXT | Full indication text from the approved FDA label. Often multiple paragraphs with subsections (1.1, 1.2, etc.) for multi-indication drugs. Use LIKE '%keyword%' for text search. |
label_effective_date | TEXT | Date this version of the label became effective (YYYY-MM-DD). Not the drug approval date — labels are updated many times post-approval. |
pharm_class_epc | TEXT | FDA Established Pharmacological Class (e.g. "HMG-CoA Reductase Inhibitor [EPC]"). NULL when not classified. Present for ~47% of labeled NDAs. |
pharm_class_moa | TEXT | FDA Mechanism of Action class (e.g. "Cyclooxygenase Inhibitors [MoA]"). More granular than EPC. Present for ~40% of labeled NDAs. |
rxcui | TEXT | RxNorm Concept Unique Identifier(s), comma-separated when multiple. Join to claims databases, formulary files, IQVIA, or CMS Part D. Present for ~60% of labeled NDAs. |
unii | TEXT | FDA Unique Ingredient Identifier. Stable cross-reference to WHO INN, ClinicalTrials.gov, and FAERS. Present for ~80% of labeled NDAs. |
boxed_warning | TEXT | Full text of the FDA black-box warning. NULL when none exists (approximately 85% of labeled drugs have no boxed warning). |
ndc | TEXT | First NDC code in XXXXX-XXXX format. For linking to dispensing data (IQVIA, Symphony Health, CMS Part D). Present for ~75% of labeled NDAs. |
Coverage: Approximately 81% of NDAs have indication text. Coverage is ~90%+ for 2010s and later approvals, ~30–50% for pre-2000 approvals. See Drug Labels for the full coverage table and ANDA join pattern.
loe_estimates (derived view) #
One row per NDA that currently has at least one active patent or exclusivity listing. This is the analytical core of the dataset — the computed loss-of-exclusivity date per NDA. NDAs that have already passed their reported LOE are excluded; use first_generic_entry for those.
| Column | Type | Description |
|---|---|---|
appl_no | TEXT | FDA NDA application number |
sponsor_name | TEXT | Raw FDA applicant name |
rxdl_company | TEXT | Harmonized company name (RxDataLab MDM). Empty string when not yet matched. |
trade_name | TEXT | Brand name from the Reference Listed Drug product. Blank for ~30% of pre-2000 NDAs with no RLD designation. |
ingredient | TEXT | Active ingredient(s) from the Reference Listed Drug product |
max_patent_exp | TEXT | Latest effective patent expiry across all patents for this NDA, with +6 months applied per patent where pediatric_excl = 1. NULL when no patents are listed (protected by exclusivity only). |
any_ped_extension | INTEGER | 1 if any patent for this NDA carries a pediatric exclusivity extension |
max_excl_exp | TEXT | Latest exclusivity expiration date across all active exclusivity grants for this NDA. NULL when no exclusivity (protected by patents only). |
excl_codes | TEXT | Comma-separated list of active exclusivity codes (e.g. "NCE,ODE-488") |
reported_loe_date | TEXT | The reported exclusivity cliff. MAX(max_patent_exp, max_excl_exp). Never NULL — the view only includes NDAs where at least one is present. |
years_to_loe | REAL | Floating-point years from export generation date to reported_loe_date. Recompute from reported_loe_date using julianday for current-relative values. |
Methodology: reported_loe_date = MAX(max_patent_exp, max_excl_exp). Pediatric exclusivity (+6 months) is applied per-patent before taking the MAX — the PED-adjusted expiry for each individual patent is computed first, then MAX is taken across all adjusted patent expiries. See LOE Estimates: Methodology for the full computation and caveats.
first_generic_entry (derived view) #
One row per NDA formulation (by ingredient + dosage_form_route). Identifies the earliest ANDA approval date for each brand drug formulation — the observed date when generic competition entered. Historical complement to loe_estimates.
| Column | Type | Description |
|---|---|---|
nda_appl_no | TEXT | FDA NDA application number of the brand drug |
nda_sponsor | TEXT | Raw FDA applicant name for the brand NDA |
trade_name | TEXT | Brand name from the Reference Listed Drug product |
ingredient | TEXT | Active ingredient(s) used for the ANDA-to-NDA match |
dosage_form_route | TEXT | Dosage form and route used for matching (e.g. "TABLET;ORAL") |
nda_approval_date | TEXT | Approval date of the Reference Listed Drug product for this specific formulation. Use with first_generic_approval_date to compute effective exclusivity period (can be negative — see below). |
first_generic_approval_date | TEXT | Earliest FDA approval date among all ANDAs matching this ingredient + dosage_form_route |
first_anda_appl_no | TEXT | ANDA application number of the first-to-market generic. Join to products, exclusivity, or product_changes for further drill-down. |
generic_sponsor | TEXT | FDA applicant name on the earliest-approved ANDA at the time of that approval. Reflects the original applicant of record; ANDA ownership can change post-approval. |
para4_challenge | INTEGER | 1 if any ANDA for this formulation holds PC (Patent Challenge) exclusivity — indicates a Paragraph IV challenge was filed and the first-filer earned 180-day exclusivity. Generic entry likely preceded the NDA’s reported patent expiry. 0 otherwise. |
generic_count | INTEGER | Number of distinct ANDA applications approved for this ingredient + dosage_form_route |
Multiple rows per NDA: A single NDA can yield multiple rows — one per distinct dosage_form_route. Different formulations lose exclusivity independently. To collapse to one row per NDA, use MIN(first_generic_approval_date) GROUP BY nda_appl_no.
Effective exclusivity period: ROUND((julianday(first_generic_approval_date) - julianday(nda_approval_date)) / 365.25, 1) yields years the brand held the market. Can be negative when the NDA was filed for a molecule that already had generics — filter first_generic_approval_date >= nda_approval_date for genuine LOE analysis.
Change Log Tables #
Four tables track monthly changes derived from the Orange Book snapshot: product_changes, patent_changes, exclusivity_changes, and application_changes. All share this structural pattern:
| Column | Type | Description |
|---|---|---|
id | INTEGER | Auto-increment row identifier. Used as a stable reference key in before/after joins. |
event | TEXT | "beginning_of_history", "added", "updated", or "removed" |
ob_release_date | TEXT | FDA Orange Book publication date when this change was observed (YYYY-MM-DD, always the first of the month). Not a drug approval date or market action date. |
| (all data columns) | All columns from the corresponding current-snapshot table, with the same names |
beginning_of_history is not a genuine addition. Rows with this event were already present when RxDataLab first imported the dataset (October 2023 in the current export). A drug approved in 1995 that was in the October 2023 snapshot gets beginning_of_history — it is not a “new approval” in 2023. For trend analysis, always filter: WHERE event != 'beginning_of_history'.
updated events show new values only. To recover what changed (the before/after), self-join to the most recent prior event for the same record key using the id column. See Change History: updated events.
product_changes additionally includes an appl_type column ("NDA" or "ANDA") so you can filter to a specific application type without joining to applications.
export_metadata #
Single-row table with information about when this export was generated and what history it covers.
| Column | Type | Description |
|---|---|---|
export_date | TEXT | Date this export was generated |
ob_release_date | TEXT | Latest Orange Book release included |
history_start | TEXT | Earliest monthly release in the change log (baseline date, i.e. beginning_of_history event date) |
history_end | TEXT | Latest monthly release in the change log |
SELECT * FROM export_metadata;
Three Application Number Types #
Three distinct number types appear in the dataset, and two of them contain the word “application”. Be careful when querying that you are using appropriate values:
| Identifier | Column | Tables | System | Example |
|---|---|---|---|---|
| FDA drug application number | appl_no | applications, products, patents, exclusivity, drug_labels, all change tables | FDA Center for Drug Evaluation and Research (CDER) | "021446" (Lipitor NDA) |
| USPTO granted patent number | patent_no | patents, patent_details, patent_changes | United States Patent and Trademark Office | "7777050" |
| USPTO pre-grant patent filing number | patent_application_number | patent_details only | USPTO Patent Application Information Retrieval (PAIR) | "17/123456" |
appl_no is the join key across all FDA tables. patent_application_number is a completely separate identifier used only for USPTO prosecution research. When writing joins, always qualify with table aliases (pt.appl_no for the FDA drug application, pd.patent_application_number for the USPTO filing).