Building a Form D Database for Biotech Investment Tracking
Learn how to parse SEC Form D filings into a private capital database for biotech and venture research.

The SEC is best known for regulating public markets, but it also oversees private capital raises. Form D is a critical yet underutilized resource for tracking those transactions.
Form D is a notice filing that companies must submit when they sell securities without registering them under the Securities Act of 1933, relying on an exemption such as Rule 504 or Rule 506 of Regulation D. Offerings of this kind include nearly all venture rounds, private placements, and many types of convertible debt. Companies must file Form D within 15 days of the first sale and amend it as the offering evolves.
Why Form D Matters
Form D filings can reveal:
- Amount of securities offered and sold
- Type of security (equity, debt, convertible notes, etc.)
- First sale date and amendment activity
- Investor participation and eligibility (accredited or non-accredited investors)
- Company revenue range (seldom disclosed in practice)
These details help us infer round size, funding velocity, and investor type for emerging or pre-public biotech companies. We used Form D filings when analyzing the Eikon Therapeutics pipeline vs marketing.
Limitations and Compliance Gaps
The primary limitation of relying on Form D is inconsistent filing compliance. Many companies raise capital without filing, despite clear regulatory obligations. The likely cause is lax enforcement: the common industry understanding was that the SEC would not pursue penalties for Form D violations alone.
That may be changing. In 2024, the SEC brought its first enforcement actions in years for failure to file Form D, against an investment advisor, a fintech company, and a fantasy sports company. The actions signal a new stance that may increase compliance.
In addition to spotty compliance, certain types of fundraising activity fall outside Form D disclosure. Notably, equity compensation (e.g., stock options issued under Rule 701), small offerings under Regulation A, and many SAFEs (Simple Agreements for Future Equity), especially those under $1 million, often go unreported. Convertible notes may be disclosed inconsistently, depending on how issuers classify them. Additionally, Form D is not required for certain foreign issuers or for offerings made solely to foreign investors under Regulation S. These exemptions introduce blind spots when using Form D data to map out a company’s complete financing history.
In short, Form D filings are useful, but they do not give a complete picture of the private markets.
Building Your Own Biotech Funding Database
The SEC conveniently publishes quarterly Form D data sets as structured text starting from Q1 2008. During analysis, we most frequently use:
- SIC Code and company name
- CIK identifier (official SEC company identifier)
- Date of sale and amendment status
- Type of securities offered
- Total offering and amount sold
- Investor counts and type flags
Using the quarterly data sets, design your database schema to mirror the file structure in the zip files. With a few minor updates you can load the data into SQLite and start writing queries right away.
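As a rough illustration of that loading step, here is a minimal sketch that creates one SQLite table from one tab-delimited file. It assumes the first row of each file is a header; the column names and values in the sample are made up for the example and are not taken from the SEC files. Every column is stored as TEXT, so amounts and dates would need casting at query time.

```python
import csv
import io
import sqlite3


def load_tsv(conn: sqlite3.Connection, table: str, handle) -> int:
    """Create `table` from a tab-delimited stream and return the row count."""
    # QUOTE_NONE keeps stray quote characters in company names from merging fields.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    # Lowercase the header names so they are easy to reference in queries.
    columns = [name.strip().lower() for name in next(reader)]
    col_sql = ", ".join(f'"{c}" TEXT' for c in columns)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({col_sql})')
    placeholders = ", ".join("?" for _ in columns)
    # Rows whose field count does not match the header are skipped, not repaired.
    rows = [row for row in reader if len(row) == len(columns)]
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()
    return len(rows)


# Tiny in-memory stand-in for one file from a quarterly zip (made-up values).
sample = io.StringIO(
    "ACCESSIONNUMBER\tTOTALOFFERINGAMOUNT\tTOTALAMOUNTSOLD\n"
    "0000000000-25-000001\t5000000\t3200000\n"
    "0000000000-25-000002\t12000000\t12000000\n"
)
conn = sqlite3.connect(":memory:")
loaded = load_tsv(conn, "offerings", sample)
```

For a real quarterly file, the same function could be pointed at a member of the zip archive by opening it with `zipfile.ZipFile` and wrapping the binary stream in `io.TextIOWrapper`.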
The key that links the six tables is the filing accession number. To return all the filings for a company, the query is simply:
SELECT
    fd.filing_date,
    fd.submission_type,
    o.is_amendment,
    o.total_amount_sold,
    o.total_offering_amount
FROM
    form_d_submissions fd
    JOIN offerings o ON fd.accession_number = o.accession_number
    JOIN issuers i ON i.accession_number = fd.accession_number
WHERE
    i.entity_name = 'SYNCHRONY INC'
ORDER BY
    fd.filing_date;
When analyzing the data, it is important to watch out for the amendment flag. Amendments are updates to previous rounds, so you’ll want to take the latest amendment to determine how much was raised in a round.
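One way to apply that rule is sketched below. It treats filings from the same company with the same first-sale date as one round and keeps only the most recently filed record. That grouping rule is an assumption made for illustration, as are the field names `cik` and `sale_date`; check how amendments are linked in your own tables before relying on it.

```python
def latest_per_round(filings: list[dict]) -> list[dict]:
    """Keep only the most recently filed record for each (cik, sale_date) round."""
    latest: dict[tuple, dict] = {}
    for f in filings:
        key = (f["cik"], f["sale_date"])  # assumed round identifier
        # ISO dates (YYYY-MM-DD) compare correctly as plain strings.
        if key not in latest or f["filing_date"] > latest[key]["filing_date"]:
            latest[key] = f
    return sorted(latest.values(), key=lambda f: f["filing_date"])


# Made-up filings: one round with an original notice and an amendment, then a second round.
filings = [
    {"cik": "1", "sale_date": "2023-01-10", "filing_date": "2023-01-20",
     "is_amendment": False, "total_amount_sold": 10_000_000},
    {"cik": "1", "sale_date": "2023-01-10", "filing_date": "2023-06-01",
     "is_amendment": True, "total_amount_sold": 25_000_000},
    {"cik": "1", "sale_date": "2024-03-05", "filing_date": "2024-03-15",
     "is_amendment": False, "total_amount_sold": 40_000_000},
]
rounds = latest_per_round(filings)
total_raised = sum(r["total_amount_sold"] for r in rounds)
```

Summing the original notice and its amendment would double count the first round, which is why only the latest record per round is kept before totaling.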
All told, the full database from Q1 2008 through Q2 2025 is a SQLite file of only about 1 GB, so a manageable and powerful tool can sit on your desktop!
Manual Parsing for Timeliness
If it fits your needs, the quarterly data files from the SEC are ideal. However, at RxDataLab we track a subset of companies closely and want to know as soon as a Form D (or another SEC notice) is filed.
To track filings in near real-time, we fetch all available submissions for each company via the EDGAR API and parse Form D directly.
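import requests

# The SEC asks automated clients to identify themselves; replace with your own contact details.
HEADERS = {"User-Agent": "Example Research admin@example.com"}

def get_submissions(cik: str) -> dict:
    # Fetch the filing history for one company; the endpoint expects a zero-padded 10-digit CIK.
    url = f"https://data.sec.gov/submissions/CIK{cik.zfill(10)}.json"
    res = requests.get(url, headers=HEADERS)
    res.raise_for_status()
    return res.json()

def build_form_url(cik: str, accession: str, primary_doc: str) -> str:
    # Archive paths use the CIK without leading zeros and the accession number without dashes.
    cik_stripped = cik.lstrip("0")
    accession_nodash = accession.replace("-", "")
    return f"https://www.sec.gov/Archives/edgar/data/{cik_stripped}/{accession_nodash}/{primary_doc}"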
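To pick out only the Form D filings from a submissions payload, something like the following sketch may help. It assumes the response keeps recent filings as parallel arrays under `filings.recent` (with `form`, `accessionNumber`, `primaryDocument`, and `filingDate` keys); the function name `form_d_filings` and the sample identifiers are invented for the example.

```python
def form_d_filings(submissions: dict) -> list[dict]:
    """Return Form D and D/A entries from a submissions JSON payload."""
    recent = submissions["filings"]["recent"]
    cik = str(submissions["cik"]).lstrip("0")
    results = []
    # The arrays are parallel: index i in each list describes the same filing.
    for form, accession, doc, filed in zip(
        recent["form"],
        recent["accessionNumber"],
        recent["primaryDocument"],
        recent["filingDate"],
    ):
        if form not in ("D", "D/A"):
            continue
        folder = accession.replace("-", "")
        results.append({
            "form": form,
            "filing_date": filed,
            "url": f"https://www.sec.gov/Archives/edgar/data/{cik}/{folder}/{doc}",
        })
    return results


# Hand-written payload with made-up identifiers, shaped like the assumed response.
sample = {
    "cik": "0001234567",
    "filings": {"recent": {
        "form": ["D", "10-K", "D/A"],
        "accessionNumber": [
            "0001234567-24-000001",
            "0001234567-24-000002",
            "0001234567-24-000003",
        ],
        "primaryDocument": ["primary_doc.xml", "annual.htm", "primary_doc.xml"],
        "filingDate": ["2024-01-15", "2024-03-01", "2024-06-30"],
    }},
}
found = form_d_filings(sample)
```

Running this on a schedule for each tracked company and comparing the result against accession numbers already stored is one simple way to notice a new notice or amendment.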
Unfortunately, Form D isn’t structured as XBRL, so you’ll need to handle some complex and poorly specified HTML.
If quarterly data is sufficient for your use case, then the SEC data sets are ideal to use for creating a custom database or research tool to follow the private markets.
Enriching the Filing Database
The base database is a great starting point for analysis. At RxDataLab, we combine data from many sources to enrich filings data from the SEC (such as the Form D tables). For example, we combine data from clinical trials, the SEC, patent filings, FDA filings, hiring trends, and more to get a clear view of what companies are actually doing, rather than what they are saying.
You can see an example in our recent analysis of Eikon Therapeutics and other work on our research page.
For a sector-wide view of the industry, we like to use the SEC’s Standard Industrial Classification (SIC) codes to segment and compare biotech fundraising to other industries over time. For example, the plot below shows biotech funding compared to technology and traditional sectors (e.g., oil & gas, banking).
Of course, SIC codes don’t provide a complete picture, especially when everyone wants to be a tech company, and in particular with the “techbio” trend pushed by VCs in biotech. Tempus AI, for example, uses SIC 7370, “Services-Computer Programming, Data Processing, Etc.” That means it wouldn’t be categorized as a Biotech/Pharma company in this plot (you can argue that is accurate, but I think it belongs with biotech).
So while the base database is a great starting point, we have to use industry knowledge and other data sources to supplement and expand the data.
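A simple way to encode that kind of judgment is a SIC-to-sector lookup with manual overrides per company. The sketch below is illustrative only: the sector groupings are one possible choice, the override CIK is a made-up placeholder, and the row fields (`cik`, `sic`, `total_amount_sold`) are assumed names rather than the SEC's column names.

```python
from collections import defaultdict

# Illustrative SIC-to-sector mapping; extend it to match your own definitions.
SIC_SECTORS = {
    "2834": "Biotech/Pharma",  # Pharmaceutical Preparations
    "2836": "Biotech/Pharma",  # Biological Products
    "8731": "Biotech/Pharma",  # Commercial Physical & Biological Research
    "7370": "Technology",      # Computer Programming, Data Processing, Etc.
    "7372": "Technology",      # Prepackaged Software
    "1311": "Oil & Gas",       # Crude Petroleum & Natural Gas
    "6022": "Banking",         # State Commercial Banks
}

# Manual corrections keyed by CIK; the CIK below is a made-up placeholder.
SECTOR_OVERRIDES = {"0000000001": "Biotech/Pharma"}


def sector_for(cik: str, sic: str) -> str:
    """Prefer a manual override, then the SIC mapping, then 'Other'."""
    return SECTOR_OVERRIDES.get(cik, SIC_SECTORS.get(sic, "Other"))


def totals_by_sector(rows: list[dict]) -> dict[str, int]:
    """Sum the amount sold per sector across filing rows."""
    totals: dict[str, int] = defaultdict(int)
    for row in rows:
        totals[sector_for(row["cik"], row["sic"])] += row["total_amount_sold"]
    return dict(totals)


rows = [
    {"cik": "0000000001", "sic": "7370", "total_amount_sold": 200},  # overridden
    {"cik": "0000000002", "sic": "2834", "total_amount_sold": 100},
    {"cik": "0000000003", "sic": "7372", "total_amount_sold": 50},
    {"cik": "0000000004", "sic": "9999", "total_amount_sold": 10},
]
totals = totals_by_sector(rows)
```

Keeping the overrides in a separate table makes the manual decisions visible and easy to revisit when a company's business changes.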
Conclusion
Form D provides a valuable, if imperfect, view into U.S. private capital markets. It’s especially helpful for biotech, where companies raise multiple rounds before going public.
By building a personal Form D database, you can avoid relying entirely on opaque and expensive private datasets. At RxDataLab, we use this data—alongside clinical trials, patents, and job listings—to form a holistic picture of company growth and investor behavior.
Interested in building your own database?
We’re happy to share the schema and code examples to help you get started. Email us at [email protected] or subscribe to our newsletter to get early access to tutorials and downloadable resources.
Important References and Regulations
- Regulation D includes the common exemptions 504 and 506
- Section 4(a)(5) of the Securities Act is an uncommon exemption, see text
- Recent SEC enforcement of Regulation D violations against investment advisor GRID 202 LLC (DBA Re-Envision Wealth), Pipe Technologies Inc. (a fintech company), and Underdog Sports Holdings, Inc. (fantasy sports)