What is OmicIDX?

OmicIDX is a public, read-only index that aggregates biological metadata from a handful of authoritative sources into a single, queryable model. It draws on:

  • NCBI SRA — sequencing studies, samples, experiments, runs.
  • NCBI GEO — gene expression series, samples, and platforms.
  • NCBI BioSample and BioProject — descriptive metadata that ties everything together.
  • NCBI PubMed — citation records, used to link publications back to the experimental data they describe.
  • EBI BioSamples — the European complement to NCBI BioSample, daily-partitioned.

See Data sources for source URLs, refresh cadences, and what each contributes.

OmicIDX is deliberately limited in scope. It is:

  • Not a sequence archive. OmicIDX indexes the metadata around sequencing data (the studies, samples, and experiments), not the raw reads. For sequence retrieval, use the upstream archives directly.
  • Not real-time. The index refreshes daily on cron-driven cascades. See Architecture.
  • Not a write API. All endpoints are read-only.

Querying biological metadata across the major archives means dealing with five different XML/SOFT formats, several FTP layouts, inconsistent identifier conventions, and no usable join keys. OmicIDX normalizes those into a single relational model with stable accessions and cross-source links, then exposes it as a REST API and as a single downloadable DuckDB file.
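To make the idea of "a single relational model with stable accessions and cross-source links" concrete, here is a minimal sketch. Everything in it is an assumption made for illustration: the table names (`sra_study`, `geo_series`, `xref`), the column names, and the placeholder accessions are hypothetical and are not taken from the OmicIDX schema. It uses Python's built-in `sqlite3` only so that it runs without extra packages; the real snapshot is a DuckDB file, and the same kind of join would apply there.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only and
# are NOT taken from the actual OmicIDX data model.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sra_study  (accession TEXT PRIMARY KEY, title TEXT);
    CREATE TABLE geo_series (accession TEXT PRIMARY KEY, title TEXT);
    -- One row per cross-source link between two accessions.
    CREATE TABLE xref (from_accession TEXT, to_accession TEXT);
""")

# Placeholder rows; the accessions are made up and do not refer to real records.
conn.execute("INSERT INTO sra_study VALUES ('SRP_EXAMPLE_1', 'Example sequencing study')")
conn.execute("INSERT INTO geo_series VALUES ('GSE_EXAMPLE_1', 'Example expression series')")
conn.execute("INSERT INTO xref VALUES ('SRP_EXAMPLE_1', 'GSE_EXAMPLE_1')")


def linked_series(connection, study_accession):
    """Return (accession, title) for every GEO series linked to an SRA study."""
    return connection.execute(
        """
        SELECT g.accession, g.title
        FROM xref AS x
        JOIN geo_series AS g ON g.accession = x.to_accession
        WHERE x.from_accession = ?
        ORDER BY g.accession
        """,
        (study_accession,),
    ).fetchall()


print(linked_series(conn, "SRP_EXAMPLE_1"))
```

Once records from different archives share one model, following a link from one source to another becomes an ordinary join rather than a per-archive parsing task.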

OmicIDX is intended for researchers, bioinformaticians, and data engineers who want to:

  • Search across multiple archives without writing per-archive XML parsers.
  • Build pipelines that resolve study/sample/experiment relationships across sources.
  • Get a downloadable analytical snapshot (DuckDB) for offline work.

The API is open and rate-limited; no API key is required. See Rate limits.
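Because the API is rate-limited, a client should back off when it is throttled. The sketch below shows one way to do that with the Python standard library. It assumes that throttling is signalled with HTTP 429 and an optional numeric `Retry-After` header, which is the common convention but is not confirmed on this page; see Rate limits for the actual behaviour. The helper name `fetch_json` and the URL in the usage comment are hypothetical.

```python
import json
import time
import urllib.error
import urllib.request


def fetch_json(url, opener=urllib.request.urlopen, max_retries=3,
               base_delay=1.0, sleep=time.sleep):
    """GET a URL and decode the JSON body, retrying when throttled.

    Assumption: the server answers HTTP 429 when the rate limit is hit.
    """
    for attempt in range(max_retries + 1):
        try:
            with opener(url) as response:
                return json.loads(response.read().decode("utf-8"))
        except urllib.error.HTTPError as err:
            # Re-raise anything that is not throttling, or if retries ran out.
            if err.code != 429 or attempt == max_retries:
                raise
            retry_after = err.headers.get("Retry-After") if err.headers else None
            if retry_after and retry_after.isdigit():
                delay = float(retry_after)           # server-suggested wait
            else:
                delay = base_delay * (2 ** attempt)  # exponential backoff
            sleep(delay)


# Usage (placeholder URL, not a real OmicIDX endpoint):
# data = fetch_json("https://api.example.org/some/endpoint")
```

The `opener` and `sleep` parameters exist only so the retry logic can be exercised without network access or real waiting.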