graph TB
A[pristine-seas] --> B[Method Datasets]
A --> C[Reference Datasets]
B --> D[uvs]
B --> E[BRUVS_seabed]
B --> F[BRUVS_pelagic]
B --> G[edna]
B --> H[sub]
B --> I[...]
C --> J[expeditions]
C --> K[taxa_info]
C --> L[lookup]
style A fill:#004165,stroke:#fff,color:#fff
style C fill:#006C9B,stroke:#fff,color:#fff
style B fill:#006C9B,stroke:#fff,color:#fff
Database Architecture
The Pristine Seas Science Database is a modular, analysis-ready system for organizing ecological data collected across more than 40 expeditions worldwide. Its design balances flexibility—to accommodate diverse survey methods—with consistency—to support clean integration, querying, and long-term analysis.
This architecture prioritizes:
- Traceability: Every observation links back to spatial and taxonomic references
- Flexibility: New methods integrate seamlessly without restructuring existing data
- Analysis-Ready: Pre-computed summaries enable fast queries while preserving raw data for reproducibility
Core Structure
All data are hosted in Google BigQuery under the project pristine-seas, organized into two major dataset groups:
- Method Datasets: One for each survey technique (e.g., UVS, BRUVS, eDNA), with standardized tables for sites, stations, observations, and summaries
- Reference Datasets: Shared taxonomic, spatial, and lookup tables that provide a unified backbone across all methods
Reference Datasets
These datasets provide a shared scaffold for spatial joins, species trait integration, QA/QC, and consistent metadata across all methods:
📂 expeditions/ — Spatial backbone & metadata for all field campaigns
- 📄
info— Expedition metadata: dates, regions, vessels, participants, partners - 📄
sites— Master site registry across all methods (lat/lon, habitat, depth range)
📂 taxa_info/ — Standardized species reference data
- 📄
fish— Fish taxonomy (WoRMS), traits (length-weight), trophic groups - 📄
benthos— Benthic organisms: hard/soft corals, algae, sponges, CCA - 📄
inverts— Mobile and sessile invertebrates: echinoderms, mollusks, crustaceans
📂 lookup/ — Controlled vocabularies and reference tables
- 📄
researchers— Team members: roles, affiliations, expedition participation
Method Datasets
Each method has its own dataset (e.g., uvs, pbruv, sub, edna) that stores raw and summarized outputs from a given type of field survey. These follow a common structure, with tables for:
sites— Survey events or deployments (e.g., dive, deployment, survey)stations— Sampling units within a site (e.g., depth strata, rigs)observations— Primary ecological or environmental recordssummaries— Aggregated station-level metrics
Common Method Dataset Structure
While each method has unique requirements, all follow this general pattern:
Core Tables:
{method}.sites— Deployment-level metadata (when/where){method}.stations— Sampling units (replicates, strata, depth){method}.*_observations— Raw ecological records{method}.*_by_station_taxa— Pre-aggregated summaries
Key Principles:
- Observations are immutable: Raw data preserved as collected after QA/QC
- Stations link everything: All analyses join through
ps_station_id - Summaries are derived: Calculated from observations, can be regenerated
- Taxonomy is external: Most species traits live in
taxa_info.*, not method tables
📂 uvs/ — Underwater Visual Survey (SCUBA-based)
- 📄
sites— Dive site metadata: location, conditions, protocols used - 📄
stations_unified— Unified view of all UVS stations across protocols
Fish Belt Transects
- 📄
blt_observations— Raw fish counts by species, size, and transect - 📄
blt_stations— Fish station metadata: divers, effort, summaries - 📄
blt_biomass_by_station_taxa— Station-level fish biomass summaries by taxa
Benthic Surveys (LPI)
- 📄
lpi_observations— Raw benthic point data by morphotaxon - 📄
lpi_stations— Benthic transect metadata: divers, effort, summaries - 📄
lpi_cover_by_station_taxa— Station-level benthic percent cover by taxa
Invertebrate Surveys
- 📄
inverts_observations— Raw invertebrate counts and sizes - 📄
inverts_stations— Invertebrate transect metadata: divers, effort, summaries - 📄
inverts_density_by_station_taxa— Station-level invertebrate density
Coral Recruitment
- 📄
recruits_observations— Juvenile coral counts by genus - 📄
recruits_stations— Coral recruitment quadrat metadata: divers, effort, summaries - 📄
recruits_density_by_station_taxa— Station-level coral recruit density by taxa
📂 BRUVS_seabed/ — Seabed Baited Remote Underwater Video
- 📄
stations— Camera deployment metadata: location, depth, bait - 📄
maxn_by_station_taxa— Fish MaxN summaries by species
📂 BRUVS_pelagic/ — Pelagic BRUVS (mid-water camera arrays)
- 📄
rigs— Individual rig metadata (e.g. positions and drift) - 📄
stations— String deployment metadata - 📄
maxn_by_station_taxa— Fish MaxN by species and station
📂 edna/ — Environmental DNA sampling
- 📄
sites— Collection location metadata - 📄
stations— Depth-stratified water samples - 📄
detections_by_station_taxa— Species detections and read counts
📂 sub/ — Submersible surveys
- 📄
sites— Dive metadata: pilot, observers, conditions - 📄
stations— Transect segments within dives - 📄
observations— Species observations
📂 dscm/ — Deep Sea Camera deployments
- 📄
stations— Camera drop metadata: depth, duration - 📄
maxn_by_station_taxa— MaxN summaries
📂 birds/ — Seabird transect surveys
- 📄
stations— Transect metadata: vessel vs land-based - 📄
observations— Bird sightings with counts and behavior
Data Model
All data in the Pristine Seas Science Database are structured around a shared, hierarchical spatial model. This enables standardized grouping, filtering, and integration across methods, expeditions, and geographies.
Expedition
A defined field campaign
e.g.,COL_2024Region
Broad geographic or administrative area
e.g., Murat, Chocó, TemotuSubregion
Intermediate unit like an island, atoll, gulf, or reef complex
e.g., Navigator Bank, Mussau, Golfo de Tribugá, UtupuaLocality
Local named feature such as a community, village, islet
e.g., Lolieng, TasitelSite
A specific location within the subregion where a survey method (e.g., UVS, BRUVS, eDNA) was deployed
e.g.,COL_2024_uvs_001- Station
A discrete sampling unit within a site, corresponding to a specific depth or replicate
e.g.,COL_2024_uvs_001_10m
- Station
Key Relationships:
- One expedition → many sites (across methods)
- One site → many stations (across depths/replicates)
- Spatial context cascades: stations inherit their site’s region, subregion, and locality
- Method independence: UVS, BRUVS, and eDNA can occur at the same geographic location as separate sites/stations
To ensure consistency, traceability, and joinability across all datasets, Pristine Seas uses structured identifiers for sites and stations.
Expedition (exp_id): ISO3_YEAR
- Examples:
FJI_2025,COL_2022,PNG_2024
Site (ps_site_id): exp_id_method_###
Always use 3-digit padding for site numbers.
- Examples:
FJI_2025_uvs_001,FJI_2025_pbruv_003,FJI_2025_sub_014
Station (ps_station_id): ps_site_id_suffix
Depth and replicate suffixes use fixed formats.
Suffix Formats by Method:
UVS (depth-stratified):
_{depth}m
Examples:FJI_2025_uvs_001_5m,FJI_2025_uvs_001_10m,FJI_2025_uvs_001_20mBRUVS pelagic (rig-based):
_r{rig_number}
Examples:FJI_2025_pbruv_003_r101,FJI_2025_pbruv_003_r102Deep cameras:
_{depth}m
Examples:FJI_2025_dscm_006_300m,FJI_2025_dscm_006_500mSubmersibles (transects):
_t{transect_number}
Examples:FJI_2025_sub_014_t01,FJI_2025_sub_014_t02
Integration & Joins
Spatial Joins
All method datasets link to the spatial backbone:
{method}.sites.ps_site_id→expeditions.sites.ps_site_id- Inherited fields:
exp_id,region,subregion,locality
Taxonomic Joins
Observations link to trait databases via AphiaID:
{method}.*_observations.accepted_aphia_id→taxa_info.fish.accepted_aphia_id{method}.*_observations.accepted_aphia_id→taxa_info.benthos.accepted_aphia_id{method}.*_observations.accepted_aphia_id→taxa_info.inverts.accepted_aphia_id
Effort Standardization
Station-level summaries include survey effort for density calculations:
- Fish: biomass per m² (
biomass_gm2) - Benthos: percent cover (
pct_cover) - Inverts: density per m² (
density_m2) - BRUVS: MaxN (standardized count)
Style & Conventions
Naming Standards
All database objects follow consistent patterns:
Tables & Fields
snake_case throughout • lowercase except proper nouns • no spaces or special characters
Examples: blt_observations, depth_m, accepted_name
Identifiers
Self-documenting structure • sortable • joinable
Examples: FJI_2025_uvs_001_10m, ps_station_id
Special Cases:
- Acronyms: Stay uppercase when widely recognized (e.g.,
CCA,DNA,MPA) - Geographic names: Preserve proper capitalization (e.g.,
Golfo de Tribugá) - Species names: Use lowercase except genus initial (e.g.,
Acropora cervicornis) - Boolean prefixes: Use
is_*,has_*,in_*(e.g.,in_mpa,has_blt)
Data Standards
Units
Metric (SI) with suffix • depth_m • weight_g • temperature_c • biomass_gm2
Formats
Dates: YYYY-MM-DD • Times: HH:MM:SS • Coordinates: decimal degrees, WGS84
Types
Booleans: TRUE/FALSE • Missing: NULL
Structure Principles
Long Format
One observation per row • One variable per column
Keys & References
Primary: ps_site_id, ps_station_id • Foreign keys explicit • Natural hierarchy
Modularity
Method-specific structure • Shared references • Clear boundaries