Schema Reference

Table and column reference for the RxDataLab Exclusivity dataset including applications, products, patents, exclusivity, drug labels, derived views, and change log tables.

This page documents tables and columns in the orangebook.db export. CSV files will mimic the relational tables in orangebook.db. The dataset is organized into four layers: core FDA snapshot tables (applications, products, patents, exclusivity, drug labels), reference lookup tables (exclusivity codes, patent use codes), derived analytical views (LOE estimates, first generic entry), and the monthly change log tables. For definitive definitions of FDA terms, please see the FDA Orangebook Data Files Reference

applications

One row per FDA drug application. The Orange Book covers small-molecule drugs only: NDA (brand) and ANDA (generic). Biologics are tracked in the FDA Purple Book and are not present here. The application number (appl_no) is the primary join key across this dataset (make sure you don’t confuse application numbers).

ColumnTypeDescription
appl_noTEXTFDA drug application number (e.g. "021446"). Primary key across all tables. Not a patent application number.
appl_typeTEXT"NDA" (brand drug) or "ANDA" (generic drug)
sponsor_nameTEXTRaw FDA applicant name as listed in the Orange Book. Always populated. The authoritative name — use this when rxdl_company is blank.
rxdl_companyTEXTRxDataLab harmonized company name. Normalizes subsidiary names and historical name changes to a single canonical entity (e.g., "PFIZER LABORATORIES DIV PFIZER INC""Pfizer"). Empty string when unmatched. Not all Orangebook Companies are covered.
rxdl_tickerTEXTStock ticker symbol for publicly traded companies (e.g., "PFE"). Empty string when private or not yet matched.
rxdl_cikTEXTSEC Central Index Key. Enables joining to SEC filings and EDGAR data. Empty string when not available.
rxdl_company_idINTEGERRxDataLab internal company identifier. Use as a join key to other RxDataLab datasets. NULL when unmatched.

Company search pattern: Use sponsor_name LIKE '%PFIZER%' for raw FDA name searches (always populated). Where available, you can use rxdl_company LIKE '%PFIZER%' for harmonized searches which will consolidate subsidiaries and name account for name changes. Coverage of rxdl_* fields skews toward commercially significant NDAs from public companies that have active clinical trials. Most ANDA filers are private generic manufacturers and remain unmatched.

products

One row per product formulation. A single NDA routinely has multiple products — different strengths, routes, and release types. An NDA for a 10 mg, 20 mg, and 40 mg tablet will have three rows here (possibly four if an IV form was also approved). Patents and exclusivity link at the (appl_no, product_no) level.

ColumnTypeDescription
appl_noTEXTLinks to applications.appl_no
product_noTEXTProduct number within the application (e.g. "001", "002"). Opaque FDA sequence number — use as a join key, not for display.
trade_nameTEXTBrand name
ingredientTEXTActive pharmaceutical ingredient(s). Combination products list multiple ingredients separated by semicolons.
dosage_form_routeTEXTDosage form and route, semicolon-separated (e.g. "TABLET;ORAL", "SOLUTION;INTRAVENOUS", "CAPSULE, EXTENDED RELEASE;ORAL")
strengthTEXTDosage strength as listed in the Orange Book (e.g. "20MG", "200MG/10ML")
appl_typeTEXT"NDA" or "ANDA" — redundant with applications.appl_type, included for convenience when querying products alone
te_codeTEXTFDA Therapeutic Equivalence code. Indicates generic substitutability (e.g. "AB" = bioequivalent, "BX" = insufficient data). Empty for NDA brand products. Absent for some older ANDA products (pre-1980) where FDA did not assign TE codes.
approval_dateTEXTFDA approval date for this product (YYYY-MM-DD format). Approval is at the product level — an NDA can have products with different approval dates. Note: a small number of older products have an empty string '' instead of NULL when date is unknown. Use approval_date > '' to filter out both NULL and empty.
reference_listed_drugTEXT"Yes" if this is the Reference Listed Drug — the specific brand product that ANDA generics must demonstrate bioequivalence against. Most NDAs have exactly one RLD product. Approximately 30% of older (pre-2000) NDAs have no product designated as RLD.
reference_standardTEXT"Yes" if used as a Reference Standard for in vitro testing. Usually overlaps with reference_listed_drug.
product_typeTEXT"RX" (prescription), "OTC" (over-the-counter), or "DISCN" (discontinued). Always filter product_type != 'DISCN' to exclude discontinued formulations.
applicantTEXTAbbreviated FDA applicant name from the product record. Format may differ from applications.sponsor_name — use sponsor_name from the applications table for the full name.

Key join pattern: When you want one row per NDA (not one per strength), filter AND p.reference_listed_drug = 'Yes'. When you need full NDA coverage (including pre-2000 drugs with no RLD), do not make this a required filter.

patents

Active patent listings in the Orange Book. Each row represents one patent listed against one specific product formulation. The same patent number can appear multiple times if it covers multiple formulations.

ColumnTypeDescription
appl_noTEXTLinks to applications.appl_no
product_noTEXTLinks to products.product_no — the patent is listed against this specific formulation
patent_noTEXTUSPTO granted patent number. Join to patent_details for title, grant date, and patent_application_number.
patent_expire_dateTEXTPatent expiration date (YYYY-MM-DD). This is the base expiry date. When pediatric_excl = 1, effective protection extends six months beyond this date. The loe_estimates view applies this +6-month adjustment automatically.
use_codeTEXTFDA use code identifying which claimed method of use this patent covers. Join patent_use_codes for the full definition. Empty string when the patent covers the drug substance or formulation (not a specific use).
pediatric_exclINTEGER1 = FDA has granted a 6-month pediatric exclusivity extension on this patent (under 21 U.S.C. § 505A, Best Pharmaceuticals for Children Act). The patent_expire_date is the base date; effective protection ends 6 months later. loe_estimates applies this adjustment automatically. 0 = no pediatric extension.
drug_substance_flagTEXT"Y" = patent claims cover the drug substance (active ingredient). The hardest type to design around.
drug_product_flagTEXT"Y" = patent claims cover the drug product (formulation, composition). A patent can have both flags.
delist_flagTEXT"Y" = patent has been delisted from the Orange Book. May still be active at USPTO. Delisted patents no longer support 30-month stay. Filter delist_flag != 'Y' in most analyses.
ob_submission_dateTEXTDate the NDA holder submitted this patent to FDA for Orange Book listing (YYYY-MM-DD). Distinct from the USPTO filing date and the drug approval date.

Primary key note: The natural key is (appl_no, product_no, patent_no, use_code, pediatric_excl) — the same patent number can appear as two distinct rows (with and without pediatric exclusivity), representing different legal IP positions.

patent_details

USPTO metadata for each listed patent number, sourced from PatentsView. One row per unique patent number.

ColumnTypeDescription
patent_noTEXTUSPTO granted patent number (joins to patents.patent_no)
titleTEXTPatent title from USPTO
grant_dateTEXTDate USPTO granted the patent (YYYY-MM-DD). Typically 20 years before patent_expire_date, less any Patent Term Extension. Distinct from the Orange Book submission date and the drug approval date.
patent_application_numberTEXTUSPTO pre-grant application number (e.g. "17/123456"). Distinct from appl_no (FDA drug application). Use for USPTO PAIR lookups to access prosecution history, office actions, and prior art. Has no relationship to FDA approval.

exclusivity

FDA statutory exclusivity grants. Multiple exclusivity types can apply to the same product simultaneously.

ColumnTypeDescription
appl_noTEXTLinks to applications.appl_no. Note: exclusivity can appear on both NDA and ANDA applications — the PC (Patent Challenge) code appears on ANDAs only.
product_noTEXTLinks to products.product_no
exclusivity_codeTEXTFDA exclusivity code (e.g. "NCE", "ODE", "PC"). Join exclusivity_codes for the plain-English definition.
exclusivity_expiration_dateTEXTDate this exclusivity period expires (YYYY-MM-DD). NOT the grant date. After this date, FDA may approve applications that rely on the protected drug’s safety/efficacy data.

PC code represents a Paragraph IV first-filer exclusivity: The PC code (definition: “PATENT CHALLENGE”) is granted to the first ANDA applicant that files a Paragraph IV certification asserting that a listed Orange Book patent is invalid or will not be infringed. If the challenge succeeds, the first filer earns 180 days of exclusive generic marketing before other ANDAs can launch. PC entries appear on ANDA appl_no values, not NDA values, so they do not appear in loe_estimates (which covers NDAs only). See the Exclusivity Types page for full details.

exclusivity_codes

Reference table. Plain-English definitions for all exclusivity codes used in the exclusivity table.

ColumnTypeDescription
exclusivity_codeTEXTFDA exclusivity code (primary key, joins to exclusivity.exclusivity_code)
definitionTEXTFDA plain-English description of the exclusivity type
-- Look up a code definition
SELECT exclusivity_code, definition FROM exclusivity_codes WHERE exclusivity_code = 'NCE';
-- Returns: NCE | NEW CHEMICAL ENTITY

-- See all codes in the dataset
SELECT exclusivity_code, definition FROM exclusivity_codes ORDER BY exclusivity_code;

patent_use_codes

Reference table. Definitions for FDA use codes assigned to method-of-use patents in the patents table. Over 4,000 codes are currently defined.

ColumnTypeDescription
use_codeTEXTFDA use code identifier (primary key, joins to patents.use_code)
definitionTEXTFull description of the specific method of use this code covers

Use codes are important for Paragraph IV challenges and section viii statements for ANDA applicants, and are meant to ensure ANDA applicants can understand exactly what the claimed patent is covering to avoid infringement of (or challenge) patent protections. See the the Congressional Research Service publication on this topic for more information.

-- All use codes for a specific NDA (021196 = Xyrem / sodium oxybate, a drug with multiple method-of-use patents)
SELECT DISTINCT pt.patent_no, pt.use_code, puc.definition, pt.patent_expire_date
FROM patents pt
LEFT JOIN patent_use_codes puc ON puc.use_code = pt.use_code
WHERE pt.appl_no = '021196'
ORDER BY pt.patent_expire_date;

drug_labels

One row per NDA application, containing the most recent FDA-approved label from Structured Product Labeling (SPL). ANDA (generic) applications have no independent label record — they reference the brand NDA.

ColumnTypeDescription
appl_noTEXTPrimary key. Links to applications.appl_no. One row per NDA.
brand_nameTEXTBrand name from the SPL label
generic_nameTEXTEstablished (INN) name from the SPL label
manufacturer_nameTEXTManufacturer name from the SPL label
routeTEXTRoute of administration from the SPL (e.g. "ORAL"). Format may differ from dosage_form_route in products.
indications_and_usageTEXTFull indication text from the approved FDA label. Often multiple paragraphs with subsections (1.1, 1.2, etc.) for multi-indication drugs. Use LIKE '%keyword%' for text search.
label_effective_dateTEXTDate this version of the label became effective (YYYY-MM-DD). Not the drug approval date — labels are updated many times post-approval.
pharm_class_epcTEXTFDA Established Pharmacological Class (e.g. "HMG-CoA Reductase Inhibitor [EPC]"). NULL when not classified. Present for ~47% of labeled NDAs.
pharm_class_moaTEXTFDA Mechanism of Action class (e.g. "Cyclooxygenase Inhibitors [MoA]"). More granular than EPC. Present for ~40% of labeled NDAs.
rxcuiTEXTRxNorm Concept Unique Identifier(s), comma-separated when multiple. Join to claims databases, formulary files, IQVIA, or CMS Part D. Present for ~60% of labeled NDAs.
uniiTEXTFDA Unique Ingredient Identifier. Stable cross-reference to WHO INN, ClinicalTrials.gov, and FAERS. Present for ~80% of labeled NDAs.
boxed_warningTEXTFull text of the FDA black-box warning. NULL when none exists (approximately 85% of labeled drugs have no boxed warning).
ndcTEXTFirst NDC code in XXXXX-XXXX format. For linking to dispensing data (IQVIA, Symphony Health, CMS Part D). Present for ~75% of labeled NDAs.

Coverage: Approximately 81% of NDAs have indication text. Coverage is ~90%+ for 2010s and later approvals, ~30–50% for pre-2000 approvals. See Drug Labels for the full coverage table and ANDA join pattern.

loe_estimates (derived view)

One row per NDA that currently has at least one active patent or exclusivity listing. This is the analytical core of the dataset — the computed loss-of-exclusivity date per NDA. NDAs that have already passed their reported LOE are excluded; use first_generic_entry for those.

ColumnTypeDescription
appl_noTEXTFDA NDA application number
sponsor_nameTEXTRaw FDA applicant name
rxdl_companyTEXTHarmonized company name (RxDataLab MDM). Empty string when not yet matched.
trade_nameTEXTBrand name from the Reference Listed Drug product. Blank for ~30% of pre-2000 NDAs with no RLD designation.
ingredientTEXTActive ingredient(s) from the Reference Listed Drug product
max_patent_expTEXTLatest effective patent expiry across all patents for this NDA, with +6 months applied per patent where pediatric_excl = 1. NULL when no patents are listed (protected by exclusivity only).
any_ped_extensionINTEGER1 if any patent for this NDA carries a pediatric exclusivity extension
max_excl_expTEXTLatest exclusivity expiration date across all active exclusivity grants for this NDA. NULL when no exclusivity (protected by patents only).
excl_codesTEXTComma-separated list of active exclusivity codes (e.g. "NCE,ODE-488")
reported_loe_dateTEXTThe reported exclusivity cliff. MAX(max_patent_exp, max_excl_exp). Never NULL — the view only includes NDAs where at least one is present.
years_to_loeREALFloating-point years from export generation date to reported_loe_date. Recompute from reported_loe_date using julianday for current-relative values.

Methodology: reported_loe_date = MAX(max_patent_exp, max_excl_exp). Pediatric exclusivity (+6 months) is applied per-patent before taking the MAX — the PED-adjusted expiry for each individual patent is computed first, then MAX is taken across all adjusted patent expiries. See LOE Estimates: Methodology for the full computation and caveats.

first_generic_entry (derived view)

One row per NDA formulation (by ingredient + dosage_form_route). Identifies the earliest ANDA approval date for each brand drug formulation — the observed date when generic competition entered. Historical complement to loe_estimates.

ColumnTypeDescription
nda_appl_noTEXTFDA NDA application number of the brand drug
nda_sponsorTEXTRaw FDA applicant name for the brand NDA
trade_nameTEXTBrand name from the Reference Listed Drug product
ingredientTEXTActive ingredient(s) used for the ANDA-to-NDA match
dosage_form_routeTEXTDosage form and route used for matching (e.g. "TABLET;ORAL")
nda_approval_dateTEXTApproval date of the Reference Listed Drug product for this specific formulation. Use with first_generic_approval_date to compute effective exclusivity period (can be negative — see below).
first_generic_approval_dateTEXTEarliest FDA approval date among all ANDAs matching this ingredient + dosage_form_route
first_anda_appl_noTEXTANDA application number of the first-to-market generic. Join to products, exclusivity, or product_changes for further drill-down.
generic_sponsorTEXTFDA applicant name on the earliest-approved ANDA at the time of that approval. Reflects the original applicant of record; ANDA ownership can change post-approval.
para4_challengeINTEGER1 if any ANDA for this formulation holds PC (Patent Challenge) exclusivity — indicates a Paragraph IV challenge was filed and the first-filer earned 180-day exclusivity. Generic entry likely preceded the NDA’s reported patent expiry. 0 otherwise.
generic_countINTEGERNumber of distinct ANDA applications approved for this ingredient + dosage_form_route

Multiple rows per NDA: A single NDA can yield multiple rows — one per distinct dosage_form_route. Different formulations lose exclusivity independently. To collapse to one row per NDA, use MIN(first_generic_approval_date) GROUP BY nda_appl_no.

Effective exclusivity period: ROUND((julianday(first_generic_approval_date) - julianday(nda_approval_date)) / 365.25, 1) yields years the brand held the market. Can be negative when the NDA was filed for a molecule that already had generics — filter first_generic_approval_date >= nda_approval_date for genuine LOE analysis.


Change Log Tables

Four tables track monthly changes derived from the Orange Book snapshot: product_changes, patent_changes, exclusivity_changes, and application_changes. All share this structural pattern:

ColumnTypeDescription
idINTEGERAuto-increment row identifier. Used as a stable reference key in before/after joins.
eventTEXT"beginning_of_history", "added", "updated", or "removed"
ob_release_dateTEXTFDA Orange Book publication date when this change was observed (YYYY-MM-DD, always the first of the month). Not a drug approval date or market action date.
(all data columns)All columns from the corresponding current-snapshot table, with the same names

beginning_of_history is not a genuine addition. Rows with this event were already present when RxDataLab first imported the dataset (October 2023 in the current export). A drug approved in 1995 that was in the October 2023 snapshot gets beginning_of_history — it is not a “new approval” in 2023. For trend analysis, always filter: WHERE event != 'beginning_of_history'.

updated events show new values only. To recover what changed (the before/after), self-join to the most recent prior event for the same record key using the id column. See Change History: updated events.

product_changes additionally includes an appl_type column ("NDA" or "ANDA") so you can filter to a specific application type without joining to applications.


export_metadata

Single-row table with information about when this export was generated and what history it covers.

ColumnTypeDescription
export_dateTEXTDate this export was generated
ob_release_dateTEXTLatest Orange Book release included
history_startTEXTEarliest monthly release in the change log (baseline date, i.e. beginning_of_history event date)
history_endTEXTLatest monthly release in the change log
SELECT * FROM export_metadata;

Three Application Number Types

Three distinct number types appear in the dataset, and two of them contain the word “application”. Be careful when querying that you are using appropriate values:

IdentifierColumnTablesSystemExample
FDA drug application numberappl_noapplications, products, patents, exclusivity, drug_labels, all change tablesFDA Center for Drug Evaluation and Research (CDER)"021446" (Lipitor NDA)
USPTO granted patent numberpatent_nopatents, patent_details, patent_changesUnited States Patent and Trademark Office"7777050"
USPTO pre-grant patent filing numberpatent_application_numberpatent_details onlyUSPTO Patent Application Information Retrieval (PAIR)"17/123456"

appl_no is the join key across all FDA tables. patent_application_number is a completely separate identifier used only for USPTO prosecution research. When writing joins, always qualify with table aliases (pt.appl_no for the FDA drug application, pd.patent_application_number for the USPTO filing).

Last verified: April 2026