RxDataLab's Philosophy

The dirty secret among biotech data vendors is that virtually all of the data we use is publicly available. Behind every polished demo and every platform with a slick UI is the same underlying data:

ClinicalTrials.gov

FDA data files

CMS billing data

USPTO patents

PubMed

Many vendors treat their public sources like a trade secret to be hidden. We take the opposite view: if you can’t see the source, you can’t trust the data. For us, transparency is the product.

The real value that a data vendor provides is making data maximally accessible and useful for your actual workflows. Historically, most companies have tried to deliver that value by building ever more complex dashboards and reports. But dashboards are someone else’s guess about what matters to you, and our view is that the guess is increasingly wrong. There is no dashboard that anticipates every question your analyst will ask next week, and no dashboard you can wire directly into your own pipeline.

So RxDataLab takes a different approach. We do the hard work of cleaning and joining the data, harmonizing entities, mapping ontologies, and tracking provenance. Then we hand you the result, directly, in whatever form fits how you work:

Download and build. Buy a zip file, unzip it, point your agent at it, and build your own view or dashboard. Our Orange Book dataset ships today, including the FDA Orange Book and Purple Book joined with USPTO patent data, structured drug labels, and loss of exclusivity estimates, ready to load.
Integrate via API. Pull only what you need, on your schedule, into your own stack.
Use the platform. If you want the data surfaced and pre-structured, app.rxdatalab.com is there.
RxDataLab Builds. We build a custom pipeline or dataset that you can integrate into your workflow.

Every option ships with the same thing underneath: primary-source data with full traceability back to the regulatory filing, and thorough documentation.

Provenance Matters

AI agents promise to amplify everything, including errors. A hallucinated drug exclusivity date in an analyst memo is an embarrassment at best. The answer to that problem isn’t bigger models; it is auditable, traceable source data and tight context control.

For example, our 8-K signals feed uses deterministic classification models layered with LLM summarization. See something interesting or something that doesn’t quite make sense? Click to view the source. Provenance and auditability are foundational to how we work.

Every entry in the signals feed links directly to the original SEC filing.

Our datasets ship with an AGENTS.md, and our documentation is written for the person or agent that needs to reproduce and explain the answer. Fundamentally, we know your work is important, and we believe you should be able to check ours.

Who We Are

RxDataLab is a small, focused company. We work with this data every day, not as a product exercise but because we use it for our own research. We know the caveats in the sources, we know where ClinicalTrials.gov records break, and we see the patterns in how companies behave by bringing it all together.

Anyone can wrap a public API and call it AI-ready. We work with agentic tools in practice and understand what good context looks like at the data layer.

You probably found us through an AI tool or a direct reference. We don’t have a sales team. We exist because we have earned trust, and we prefer to keep it that way.