graph TB A[pristine-seas] --> B[Method Datasets] A --> C[Reference Datasets] B --> D[uvs] B --> E[sbruv] B --> F[pbruv] B --> G[edna] B --> H[sub] B --> I[...] C --> J[expeditions] C --> K[taxonomy] C --> L[lookup] style A fill:#004165,stroke:#fff,color:#fff style C fill:#006C9B,stroke:#fff,color:#fff style B fill:#006C9B,stroke:#fff,color:#fff
Database Architecture
The Pristine Seas Science Database is a modular, analysis-ready system for organizing ecological data collected across more than 40 expeditions worldwide. Its design balances flexibility — to accommodate diverse survey methods — and consistency — to support clean integration, querying, and long-term analysis.
Core Structure
All data are hosted in Google BigQuery under the project pristine-seas
, organized into two major dataset groups:
- Method Datasets: One for each survey technique (e.g., UVS, BRUVS, eDNA), with standardized tables for sites, stations, observations, and summaries
- Reference Datasets: Shared taxonomic, spatial, and lookup tables that provide a unified backbone across all methods
Reference Datasets
These datasets provide a shared scaffold for spatial joins, species trait integration, QA/QC, and consistent metadata across all methods:
📂 expeditions/ — Spatial backbone & metadata for all field campaigns
- 📄
info
— Expedition registry: dates, vessels, teams, partners - 📄
sites
— Master site registry across all survey methods
📂 taxonomy/ — Standardized species reference data
- 📄
fish
— Fish taxonomy, traits, and life history - 📄
benthos
— Benthic organisms: corals, algae, sponges - 📄
inverts
— Mobile and sessile invertebrates
📂 lookup/ — Controlled vocabularies & reference tables
- 📄
habitat_types
— Standardized habitat classifications - 📄
exposure_types
— Wave exposure categories
Method Datasets
Each method has its own dataset (e.g., uvs
, pbruv
, sub
, edna
) that stores raw and summarized outputs from a given type of field survey. These follow a common structure, with tables for:
sites
— Survey events or deployments (e.g., dive, deployment, survey)stations
— Sampling units within a site (e.g., depth strata, rigs)observations
— Primary ecological or environmental recordssummaries
— Aggregated station-level metrics
📂 uvs/ — Underwater Visual Survey (SCUBA-based)
📄
sites
— Dive site metadata: location, conditions, protocols used📄
stations
— Depth-stratified sampling units within each siteFish Belt Transects
- 📄
blt_stations
— Fish belt transect metadata: divers, effort, conditions - 📄
blt_observations
— Raw fish counts by species, size, and transect - 📄
blt_biomass_by_taxa
— Station-level fish biomass summaries
- 📄
Benthic Surveys
- 📄
lpi_stations
— Benthic line-point intercept transect metadata - 📄
lpi_counts
— Raw benthic point data by morphotaxon - 📄
lpi_cover_by_taxa
— Station-level benthic percent cover
- 📄
Other Protocols
- 📄
inverts_stations
— Mobile invertebrate transect metadata - 📄
inverts_counts
— Raw invertebrate observations and sizes - 📄
recruits_stations
— Coral recruitment quadrat metadata - 📄
recruits_counts
— Juvenile coral counts by genus
- 📄
📂 sbruv/ — Seabed Baited Remote Underwater Video
- 📄
stations
— Camera deployment metadata: location, depth, bait - 📄
maxn_by_taxa
— Fish MaxN summaries by species
📂 pbruv/ — Pelagic BRUVS (mid-water camera arrays)
- 📄
stations
— String deployment metadata - 📄
rigs
— Individual rig positions and drift - 📄
maxn_by_taxa
— Fish MaxN by species and rig
📂 edna/ — Environmental DNA sampling
- 📄
sites
— Collection location metadata - 📄
stations
— Depth-stratified water samples - 📄
detections
— Species detections and read counts
📂 sub/ — Submersible surveys
- 📄
sites
— Dive metadata: pilot, observers, conditions - 📄
stations
— Transect segments within dives - 📄
observations
— Species observations along transects
📂 dscm/ — Deep Sea Camera deployments
- 📄
stations
— Camera drop metadata: depth, duration - 📄
maxn_by_taxa
— Fish MaxN summaries
📂 birds/ — Seabird transect surveys
- 📄
stations
— Transect metadata: vessel vs land-based - 📄
observations
— Bird sightings with counts and behavior
Data Model
All data in the Pristine Seas Science Database are structured around a shared, hierarchical spatial model. This enables standardized grouping, filtering, and integration across methods, expeditions, and geographies.
Expedition
A defined field campaign
e.g.,COL_2024
Region
Broad geographic or administrative area
e.g., Murat, Chocó, TemotuSubregion
Intermediate unit like an island, atoll, gulf, or reef complex
e.g., Navigator bank, Mussau, Golfo de Tribugá, UtupuaLocality
Local named feature such as a community, village, islet
e.g., Lolieng, TasitelSite
A specific location within the subregion where a survey method (e.g., UVS, BRUVS, eDNA) was used
e.g.,COL_2024_uvs_001
- Station
A discrete sampling unit within a site, corresponding to a specific depth or replicate
e.g.,COL_2024_uvs_001_10m
- Station
To ensure consistency, traceability, and joinability across all datasets, Pristine Seas uses structured identifiers for sites and stations.
Expedition (exp_id
): ISO3_YEAR
- Examples:
FJI_2025
,COL_2022
,
Site (ps_site_id
): exp_id_method_###
Always use 3-digit padding for site numbers.
- Examples:
FJI_2025_uvs_001
,FJI_2025_pbruv_003
,FJI_2025_sub_014
Station (ps_station_id
): ps_site_id_suffix
Depth and replicate suffixes use fixed formats.
- Examples:
FJI_2025_uvs_001_10m
– Underwater visual survey at 10m depth
FJI_2025_pbruv_003_r102
– rig 102 of pelagic BRUV deployment 3
FJI_2025_dscm_006_300m
– deep sea camera at 300m
Style & Conventions
Naming Standards
All database objects follow consistent patterns:
Tables & Fields
snake_case
throughout • lowercase except proper nouns • no spaces or special characters
Examples: blt_observations
, depth_m
, accepted_name
Identifiers
Self-documenting structure • sortable • joinable
Examples: FJI_2025_uvs_001_10m
, ps_station_id
Data Standards
Units
Metric (SI) with suffix • depth_m
• weight_g
• temperature_c
• biomass_gm2
Formats
Dates: YYYY-MM-DD
• Times: HH:MM:SS
• Coordinates: decimal degrees, WGS84
Types
Booleans: TRUE/FALSE
• Missing: NULL
Structure Principles
Long Format
One observation per row • One variable per column
Keys & References
Primary: ps_site_id
, ps_station_id
• Foreign keys explicit • Natural hierarchy
Modularity
Method-specific structure • Shared references • Clear boundaries