Database Architecture

The Pristine Seas Science Database is a modular, analysis-ready system for organizing ecological data collected across more than 40 expeditions worldwide. Its design balances flexibility — to accommodate diverse survey methods — and consistency — to support clean integration, querying, and long-term analysis.


Core Structure

All data are hosted in Google BigQuery under the project pristine-seas, organized into two major dataset groups:

  • Method Datasets: One for each survey technique (e.g., UVS, BRUVS, eDNA), with standardized tables for sites, stations, observations, and summaries
  • Reference Datasets: Shared taxonomic, spatial, and lookup tables that provide a unified backbone across all methods

graph TB
    A[pristine-seas] --> B[Method Datasets]
    A --> C[Reference Datasets]
    
    B --> D[uvs]
    B --> E[sbruv]
    B --> F[pbruv]
    B --> G[edna]
    B --> H[sub]
    B --> I[...]
    
    C --> J[expeditions]
    C --> K[taxonomy]
    C --> L[lookup]
    
    style A fill:#004165,stroke:#fff,color:#fff
    style C fill:#006C9B,stroke:#fff,color:#fff
    style B fill:#006C9B,stroke:#fff,color:#fff

High-level database organization


Reference Datasets

These datasets provide a shared scaffold for spatial joins, species trait integration, QA/QC, and consistent metadata across all methods:

📂 expeditions/Spatial backbone & metadata for all field campaigns

  • 📄 info — Expedition registry: dates, vessels, teams, partners
  • 📄 sites — Master site registry across all survey methods

📂 taxonomy/Standardized species reference data

  • 📄 fish — Fish taxonomy, traits, and life history
  • 📄 benthos — Benthic organisms: corals, algae, sponges
  • 📄 inverts — Mobile and sessile invertebrates

📂 lookup/Controlled vocabularies & reference tables

  • 📄 habitat_types — Standardized habitat classifications
  • 📄 exposure_types — Wave exposure categories

Method Datasets

Each method has its own dataset (e.g., uvs, pbruv, sub, edna) that stores raw and summarized outputs from a given type of field survey. These follow a common structure, with tables for:

  • sites — Survey events or deployments (e.g., dive, deployment, survey)
  • stations — Sampling units within a site (e.g., depth strata, rigs)
  • observations — Primary ecological or environmental records
  • summaries — Aggregated station-level metrics

📂 uvs/Underwater Visual Survey (SCUBA-based)

  • 📄 sites — Dive site metadata: location, conditions, protocols used

  • 📄 stations — Depth-stratified sampling units within each site

  • Fish Belt Transects

    • 📄 blt_stations — Fish belt transect metadata: divers, effort, conditions
    • 📄 blt_observations — Raw fish counts by species, size, and transect
    • 📄 blt_biomass_by_taxa — Station-level fish biomass summaries
  • Benthic Surveys

    • 📄 lpi_stations — Benthic line-point intercept transect metadata
    • 📄 lpi_counts — Raw benthic point data by morphotaxon
    • 📄 lpi_cover_by_taxa — Station-level benthic percent cover
  • Other Protocols

    • 📄 inverts_stations — Mobile invertebrate transect metadata
    • 📄 inverts_counts — Raw invertebrate observations and sizes
    • 📄 recruits_stations — Coral recruitment quadrat metadata
    • 📄 recruits_counts — Juvenile coral counts by genus

📂 sbruv/Seabed Baited Remote Underwater Video

  • 📄 stations — Camera deployment metadata: location, depth, bait
  • 📄 maxn_by_taxa — Fish MaxN summaries by species

📂 pbruv/Pelagic BRUVS (mid-water camera arrays)

  • 📄 stations — String deployment metadata
  • 📄 rigs — Individual rig positions and drift
  • 📄 maxn_by_taxa — Fish MaxN by species and rig

📂 edna/Environmental DNA sampling

  • 📄 sites — Collection location metadata
  • 📄 stations — Depth-stratified water samples
  • 📄 detections — Species detections and read counts

📂 sub/Submersible surveys

  • 📄 sites — Dive metadata: pilot, observers, conditions
  • 📄 stations — Transect segments within dives
  • 📄 observations — Species observations along transects

📂 dscm/Deep Sea Camera deployments

  • 📄 stations — Camera drop metadata: depth, duration
  • 📄 maxn_by_taxa — Fish MaxN summaries

📂 birds/Seabird transect surveys

  • 📄 stations — Transect metadata: vessel vs land-based
  • 📄 observations — Bird sightings with counts and behavior

Data Model

All data in the Pristine Seas Science Database are structured around a shared, hierarchical spatial model. This enables standardized grouping, filtering, and integration across methods, expeditions, and geographies.

  • Expedition
    A defined field campaign
    e.g., COL_2024

    • Region
      Broad geographic or administrative area
      e.g., Murat, Chocó, Temotu

      • Subregion
        Intermediate unit like an island, atoll, gulf, or reef complex
        e.g., Navigator bank, Mussau, Golfo de Tribugá, Utupua

        • Locality
          Local named feature such as a community, village, islet
          e.g., Lolieng, Tasitel

          • Site
            A specific location within the subregion where a survey method (e.g., UVS, BRUVS, eDNA) was used
            e.g., COL_2024_uvs_001

            • Station
              A discrete sampling unit within a site, corresponding to a specific depth or replicate
              e.g., COL_2024_uvs_001_10m
Standard Identifiers

To ensure consistency, traceability, and joinability across all datasets, Pristine Seas uses structured identifiers for sites and stations.

Expedition (exp_id): ISO3_YEAR

  • Examples: FJI_2025, COL_2022,

Site (ps_site_id): exp_id_method_###

Always use 3-digit padding for site numbers.

  • Examples: FJI_2025_uvs_001, FJI_2025_pbruv_003, FJI_2025_sub_014

Station (ps_station_id): ps_site_id_suffix

Depth and replicate suffixes use fixed formats.

  • Examples:
    • FJI_2025_uvs_001_10m – Underwater visual survey at 10m depth
    • FJI_2025_pbruv_003_r102 – rig 102 of pelagic BRUV deployment 3
    • FJI_2025_dscm_006_300m – deep sea camera at 300m

Style & Conventions

Naming Standards

All database objects follow consistent patterns:

Tables & Fields
snake_case throughout • lowercase except proper nouns • no spaces or special characters
Examples: blt_observations, depth_m, accepted_name

Identifiers
Self-documenting structure • sortable • joinable
Examples: FJI_2025_uvs_001_10m, ps_station_id

Data Standards

Units
Metric (SI) with suffix • depth_mweight_gtemperature_cbiomass_gm2

Formats
Dates: YYYY-MM-DD • Times: HH:MM:SS • Coordinates: decimal degrees, WGS84

Types
Booleans: TRUE/FALSE • Missing: NULL

Structure Principles

Long Format
One observation per row • One variable per column

Keys & References
Primary: ps_site_id, ps_station_id • Foreign keys explicit • Natural hierarchy

Modularity
Method-specific structure • Shared references • Clear boundaries