Pristine Seas Science Database

The Pristine Seas Science Database is a centralized system for ecological data collected across more than a decade of scientific expeditions led by National Geographic Pristine Seas. It enables high-integrity, reproducible research on marine biodiversity and informs global ocean conservation policy.


Why This Matters

Conservation begins with knowledge. The Pristine Seas Science Database is a globally unique resource for answering foundational questions about ocean health, biodiversity, and ecosystem change.

Spanning all major ocean basins and more than 40 expeditions, it brings together standardized data from Arctic fjords to tropical reefs, and from coastal bays to deep-sea trenches and pelagic zones. Its strength lies in breadth and integration: methods ranging from seabird surveys to submersible dives, and from eDNA to benthic cover, are unified by shared spatial and taxonomic frameworks.

The system enables:

  • Fast, reproducible analysis across sites, regions, and years
  • Cross-method synthesis through a modular yet consistent structure
  • Scalable science that informs conservation, policy, and decision-making

Key Features

Modular by Method

Each survey protocol (reef fish, benthic cover, eDNA, baited remote underwater video systems (BRUVS), submersibles, and others) maintains its own standardized schema while sharing the spatial and taxonomic frameworks that tie the database together.

Spatially Anchored

Built on a hierarchical spatial model: expedition → region → subregion → site → station. This structure enables robust spatial integration and filtering across all datasets.
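
As a rough illustration of how the hierarchy supports filtering, the sketch below models one station record and selects records by any combination of levels. The class, field names, and example rows are assumptions made for illustration; they are not the database's actual column names or records.

```python
from dataclasses import dataclass


# Hypothetical record: the fields mirror the hierarchy levels named in the
# text, not the actual column names used in the database.
@dataclass(frozen=True)
class Station:
    expedition: str
    region: str
    subregion: str
    site: str
    station: str


def filter_stations(stations, **levels):
    """Return the stations that match every hierarchy level given as a keyword."""
    unknown = set(levels) - set(Station.__dataclass_fields__)
    if unknown:
        raise ValueError(f"unknown hierarchy level(s): {sorted(unknown)}")
    return [
        s for s in stations
        if all(getattr(s, level) == value for level, value in levels.items())
    ]


# Made-up rows; every identifier here is a placeholder.
stations = [
    Station("EXP_A", "north", "bay_1", "site_01", "site_01_st_1"),
    Station("EXP_A", "north", "bay_1", "site_01", "site_01_st_2"),
    Station("EXP_A", "south", "reef_2", "site_07", "site_07_st_1"),
    Station("EXP_B", "north", "bay_3", "site_02", "site_02_st_1"),
]

# Select every station in one region of one expedition.
north_a = filter_stations(stations, expedition="EXP_A", region="north")
```

Because each station carries all of its parent levels, the same filter works at any depth of the hierarchy without extra lookups.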

Taxonomically Standardized

Centralized taxonomy with harmonized species names, ecological traits, and functional groups. Names are based on the World Register of Marine Species (WoRMS), with expert curation for regional accuracy.
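
To make the idea of name harmonization concrete, here is a minimal sketch that maps reported names to one accepted name and identifier through a lookup table. The lookup table, the taxon names, and the aphia_id values are invented placeholders; the actual workflow, which draws on WoRMS and expert curation, is not shown here.

```python
# Hypothetical lookup: normalized reported name -> (accepted name, aphia_id).
# Names and aphia_id values are placeholders, not real WoRMS records.
TAXON_LOOKUP = {
    "examplus albus": ("Examplus albus", 900001),
    "examplus blancus": ("Examplus albus", 900001),  # treated as a synonym
    "otherus niger": ("Otherus niger", 900002),
}


def normalize(name):
    """Collapse whitespace and case so spelling variants share one key."""
    return " ".join(name.split()).lower()


def harmonize(raw_names, lookup):
    """Split reported names into matched records and unmatched names."""
    matched, unmatched = [], []
    for raw in raw_names:
        hit = lookup.get(normalize(raw))
        if hit is None:
            unmatched.append(raw)  # left for manual review
        else:
            accepted, aphia_id = hit
            matched.append(
                {"reported": raw, "accepted": accepted, "aphia_id": aphia_id}
            )
    return matched, unmatched


matched, unmatched = harmonize(
    ["Examplus  albus", "EXAMPLUS BLANCUS", "Unknownus sp."], TAXON_LOOKUP
)
```

Names that do not match are returned separately rather than dropped, so they can be routed to manual review.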

Analysis-Ready

Tidy-format tables, clear join keys, and native storage in Google BigQuery make the system efficient for large-scale ecological analyses.
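
The sketch below shows how tidy tables with shared join keys can be combined, here to total biomass by region and trophic group. It uses plain Python rather than BigQuery, and the table layouts, the biomass_g, region, and trophic_group columns, and all values are assumptions; only the ps_site_id and aphia_id identifiers come from this documentation.

```python
from collections import defaultdict

# Hypothetical tidy tables, one dict per row. All values are made up.
sites = [
    {"ps_site_id": "S1", "region": "north"},
    {"ps_site_id": "S2", "region": "south"},
]
taxa = [
    {"aphia_id": 900001, "trophic_group": "top predator"},
    {"aphia_id": 900002, "trophic_group": "herbivore"},
]
observations = [
    {"ps_site_id": "S1", "aphia_id": 900001, "biomass_g": 1200.0},
    {"ps_site_id": "S1", "aphia_id": 900002, "biomass_g": 300.0},
    {"ps_site_id": "S2", "aphia_id": 900002, "biomass_g": 450.0},
    {"ps_site_id": "S2", "aphia_id": 900002, "biomass_g": 50.0},
]


def biomass_by_region_and_group(observations, sites, taxa):
    """Join observations to sites and taxa, then sum biomass per group."""
    site_by_id = {row["ps_site_id"]: row for row in sites}
    taxon_by_id = {row["aphia_id"]: row for row in taxa}
    totals = defaultdict(float)
    for obs in observations:
        site = site_by_id[obs["ps_site_id"]]    # join on the spatial key
        taxon = taxon_by_id[obs["aphia_id"]]    # join on the taxonomic key
        totals[(site["region"], taxon["trophic_group"])] += obs["biomass_g"]
    return dict(totals)


totals = biomass_by_region_and_group(observations, sites, taxa)
```

The same two keys would join observations from any survey method, which is what makes cross-method summaries straightforward.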

Built for Collaboration

Transparent, well-documented, and modular. Designed for reuse, extension, and shared scientific workflows across institutions.


Getting Started

  1. Browse the documentation using the sidebar navigation
  2. Start with Architecture to understand the system structure
  3. Explore Method Datasets for specific survey protocols
  4. Review Taxonomy for species standardization
  5. Query data via the BigQuery Console

Common Applications

  • Species richness and diversity analysis
  • Biomass assessments by trophic group
  • Temporal trends in coral cover
  • Multi-method data integration
  • Conservation priority identification
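
As an example of the first application, the sketch below computes species richness and the Shannon diversity index, H' = -Σ p_i ln p_i, from per-species counts at a single station. The function names and sample data are illustrative assumptions, not part of the database.

```python
import math
from collections import Counter


def richness(counts):
    """Number of species with at least one individual recorded."""
    return sum(1 for n in counts.values() if n > 0)


def shannon(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over observed species."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum(
        (n / total) * math.log(n / total)
        for n in counts.values()
        if n > 0
    )


# Made-up sightings at one station; species labels are placeholders.
counts = Counter(["sp_a", "sp_a", "sp_b", "sp_c"])
n_species = richness(counts)
h_prime = shannon(counts)
```

Zero counts are skipped in both functions, so a species listed with no individuals affects neither richness nor diversity.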

FAIR Data Principles

The database adheres to FAIR data principles, ensuring all records are:

Findable
Unique identifiers (ps_site_id, aphia_id) and rich metadata enable discovery and indexing

Accessible
Hosted in Google BigQuery with open protocols and tools for querying and download

Interoperable
Tidy data principles, SI units, ISO 8601 dates, and controlled vocabularies keep tables compatible with other tools and datasets

Reusable
Comprehensive documentation, versioning, and modular design support transparency and replication
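
A few of these conventions can be checked mechanically. The sketch below tests that a record carries both identifiers and an ISO 8601 calendar date; the function, the date field name, and the sample records are hypothetical and do not describe the database's actual validation.

```python
from datetime import date

# Identifiers named in the text; the "date" field name is an assumption.
REQUIRED_IDS = ("ps_site_id", "aphia_id")


def check_record(record, date_field="date"):
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    for key in REQUIRED_IDS:
        if record.get(key) in (None, ""):
            problems.append(f"missing {key}")
    value = record.get(date_field)
    try:
        # Accepts YYYY-MM-DD; raises on other layouts or non-string values.
        date.fromisoformat(value)
    except (TypeError, ValueError):
        problems.append(f"{date_field} is not an ISO 8601 date: {value!r}")
    return problems


# Made-up records with placeholder identifiers.
good = {"ps_site_id": "S1", "aphia_id": 900001, "date": "2021-03-15"}
bad = {"ps_site_id": "S1", "aphia_id": None, "date": "15/03/2021"}

good_problems = check_record(good)
bad_problems = check_record(bad)
```

Returning a list of problems rather than raising on the first one lets a whole table be screened in a single pass.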


Technical Foundation

The database is built on:

  • Google BigQuery for scalable data storage and querying
  • R/RStudio for data processing and analysis
  • Quarto for documentation
  • GitHub for version control
  • WoRMS API for taxonomic standardization

Learn More

For questions about the database, contact the Pristine Seas data team or explore our GitHub repository.