Introduction to divDyn - Phanerozoic diversity

Ádám T. Kocsis

FAU GeoZentrum Nordbayern

Note!

Note

This is a reveal.js presentation written in Quarto. The presentation is optimized for the resolution of 1920 x 1080 pixels.

Background

What is this?

Short for Diversity Dynamics
Background in island biogeography (MacArthur and Wilson 1963)
Diversity, extinction and origination metrics
Variables time series - to be analyzed further!

Alroy (2008) - PNAS

Diversity over time?

Overall increase?
Saturation?
Equilibrial dynamics?

J. John Sepkoski Jr. (1978)

The divDyn project

Estimating metrics has been a problem - everybody had different scripts
Implementing sampling standardization
Wrote basics for my PhD

The project

R hexsticker

https://github.com/divDyn

Getting started

Resources

Paper: Kocsis et al. (2019)
CRAN: https://cran.r-project.org/package=divDyn
Webpage: https://divdyn.github.io/r-package/
Handout: library(divDyn); vignette("handout")
Evolv-ED blog: https://evolv-ed.net (in-development)
Phanerozoic rates (SOM): https://github.com/divDyn/ddPhanero

Installing and loading

Package (v0.8.2) can be installed from CRAN.

install.packages("divDyn")

Attach the package with:

library(divDyn)

Binning

divDyn operates with discrete-time
Requires temporal/stratigraphic binning
One value/time bin
A timescale
Always the first step of the analysis!

Example time scales

47-bin, 10my time scale

data(tens, package="divDyn")
str(tens)

'data.frame':   49 obs. of  9 variables:
 $ X10    : chr  "Cambrian 1" "Cambrian 2" "Cambrian 3" "Cambrian 4" ...
 $ Ocean  : chr  "ar1" "cc1" "cc1" "cc1" ...
 $ Ocean2 : chr  "ar1" "" "" "" ...
 $ Climate: chr  "w" "w" "w" "w" ...
 $ bottom : num  542 532 513 501 490 ...
 $ mid    : num  537 522 507 495 483 ...
 $ top    : num  532 513 501 490 479 ...
 $ dur    : num  10 19 12 11 11.4 ...
 $ ten    : int  1 2 3 4 5 6 7 8 9 10 ...

Old PBDB logo (Sydney)

Example time scales

95-bin stages

data(stages, package="divDyn")
str(stages)

'data.frame':   95 obs. of  13 variables:
 $ sys      : chr  "E" "E" "E" "Cm" ...
 $ system   : chr  "Ediacaran" "Ediacaran" "Ediacaran" "Cambrian" ...
 $ series   : chr  "Upper Ediacaran" "Upper Ediacaran" "Upper Ediacaran" "Terreneuvian" ...
 $ stage    : chr  "Avalon Assemblage" "White Sea Assemblage" "Nama Assemblage" "Fortunian" ...
 $ short    : chr  "Ava" "Whs" "Nam" "For" ...
 $ bottom   : num  580 560 550 539 529 ...
 $ mid      : num  570 555 544 534 525 ...
 $ top      : num  560 550 539 529 521 ...
 $ dur      : num  20 10 11.2 9.8 8 6.5 5.5 4.5 4 3.5 ...
 $ stg      : int  1 2 3 4 5 6 7 8 9 10 ...
 $ systemCol: chr  "#fddd88" "#fddd88" "#fddd88" "#94ac72" ...
 $ seriesCol: chr  "#fddd88" "#fddd88" "#fddd88" "#9db989" ...
 $ col      : chr  "#fcd589" "#fdd587" "#fed583" "#a9be93" ...

Based on: Gradstein, Ogg, and Schmitz (2020)

Required data - Interface conventions

x: occurrence data.frame to be analyzed
bin: column name of bin numbers (integer)
tax: column name of taxon entries (character)
coll: column name of sampling units (optional, character)
age: column name of numeric ages (optional, numeric)

Phanerozoic Marine Animals

About

The example was used as an application example for the divDyn paper
Was supposed to have a similar scope as the results of (J. John Sepkoski 1997) and (Alroy 2010)
The analyses are done at the genus level!

Stratigraphic assignments
Sampling patterns
Richness estimators
Taxonomic rates
Subsampling

More info: https://github.com/divDyn/ddPhanero

Downloading the data

We need almost the whole PBDB.

API call to produce downlad:

https://paleobiodb.org/data1.2/occs/list.csv?datainfo&rowcount&interval=Ediacaran,Holocene&show=class,classext,genus,subgenus,coll,coords,loc,paleoloc,strat,stratext,lith,env,ref,crmod,timebins,timecompare,refattr,entname,attr,ident,img,plant,abund,ecospace,taphonomy,etbasis,pres,prot,lithext,geo,methods,resgroup,refattr,ent

takes about 30 minutes…
2GB+ .csv file (read.csv())
or…

The chronosphere

API for research data acquisition (analytical code!)
Versioned data entries
PBDB from August 19, 2025

Install from CRAN

install.packages("chronosphere")

Load package

The chronosphere

Library to access static data object
Instantiated object = datafile (remote URL) + import code (dependencies)
Time series of data products

Downloading the PBDB

Actual download:

# Download data (August 18, 2025)
dat <- chronosphere::fetch("pbdb", ver="20250819")

# The most up-to-date version
# dat <- chronosphere::fetch("pbdb")

# to a given directory 
# dat <- chronosphere::fetch("pbdb", ver="20250819", datadir="<yourdir>")

Takes about 1 minute, ~130MB download.

After done, check if structure is ok!

str(dat)

'data.frame':   1657814 obs. of  152 variables:
 $ occurrence_no      : int  1 2 3 4 5 7 8 9 10 11 ...
 $ record_type        : chr  "occ" "occ" "occ" "occ" ...
 $ reid_no            : int  NA NA NA NA NA NA NA NA NA NA ...
 $ flags              : chr  "" "" "" "" ...
 $ collection_no      : int  1 1 1 2 3 4 4 1 5 5 ...
 $ identified_name    : chr  "Australosutura llanoensis" "Carbonocoryphe planucauda" "Thigriffides roundyi" "Pudoproetus chappelensis" ...
 $ identified_rank    : chr  "species" "species" "species" "species" ...
 $ identified_no      : int  349412 349526 349420 349411 349411 349411 349420 349411 349411 349668 ...
 $ difference         : chr  "" "recombined as" "" "" ...
 $ accepted_name      : chr  "Australosutura llanoensis" "Phillibole planucauda" "Thigriffides roundyi" "Pudoproetus chappelensis" ...
 $ accepted_attr      : logi  NA NA NA NA NA NA ...
 $ accepted_rank      : chr  "species" "species" "species" "species" ...
 $ accepted_no        : int  349412 349526 349418 349410 349410 349410 349418 349410 349410 349664 ...
 $ early_interval     : chr  "Ivorian" "Ivorian" "Ivorian" "Ivorian" ...
 $ late_interval      : chr  "" "" "" "" ...
 $ max_ma             : num  354 354 354 354 354 ...
 $ min_ma             : num  347 347 347 347 347 ...
 $ ref_author         : chr  "Brezinski 1998" "Brezinski 1998" "Brezinski 1998" "Brezinski 1998" ...
 $ ref_pubyr          : int  1998 1998 1998 1998 1998 1998 1998 1998 1998 1998 ...
 $ reference_no       : num  1 1 1 1 1 1 1 1 1 1 ...
 $ phylum             : chr  "Arthropoda" "Arthropoda" "Arthropoda" "Arthropoda" ...
 $ phylum_no          : chr  "18891" "18891" "18891" "18891" ...
 $ class              : chr  "Trilobita" "Trilobita" "Trilobita" "Trilobita" ...
 $ class_no           : chr  "19100" "19100" "19100" "19100" ...
 $ order              : chr  "Proetida" "Proetida" "Proetida" "Proetida" ...
 $ order_no           : chr  "21062" "21062" "21062" "21062" ...
 $ family             : chr  "Brachymetopidae" "Phillipsiidae" "Phillipsiidae" "Proetidae" ...
 $ family_no          : chr  "56732" "85866" "85866" "56726" ...
 $ genus              : chr  "Australosutura" "Archegonus (Phillibole)" "Thigriffides" "Pudoproetus" ...
 $ genus_no           : chr  "21084" "21075" "21387" "177081" ...
 $ subgenus_no        : int  NA 21306 NA NA NA NA NA NA NA NA ...
 $ collection_name    : chr  "USNM 9047, Jack Sloan Ranch" "USNM 9047, Jack Sloan Ranch" "USNM 9047, Jack Sloan Ranch" "USNM 9048" ...
 $ collection_subset  : int  NA NA NA NA NA NA NA NA NA NA ...
 $ collection_aka     : chr  "" "" "" "" ...
 $ lng                : num  -99 -99 -99 -98.1 -98.1 ...
 $ lat                : num  31.2 31.2 31.2 31 31 ...
 $ cc                 : chr  "US" "US" "US" "US" ...
 $ state              : chr  "Texas" "Texas" "Texas" "Texas" ...
 $ county             : chr  "San Saba" "San Saba" "San Saba" "" ...
 $ latlng_basis       : chr  "based on nearby landmark" "based on nearby landmark" "based on nearby landmark" "" ...
 $ latlng_precision   : chr  "2" "2" "2" "1" ...
 $ altitude_value     : int  NA NA NA NA NA NA NA NA NA NA ...
 $ altitude_unit      : chr  "" "" "" "" ...
 $ geogscale          : chr  "outcrop" "outcrop" "outcrop" "" ...
 $ geogcomments       : chr  "800 to 900 feet SSE of an abandoned well and just SE of a stock pen on the Jack Sloan ranch. The pen and abando"| __truncated__ "800 to 900 feet SSE of an abandoned well and just SE of a stock pen on the Jack Sloan ranch. The pen and abando"| __truncated__ "800 to 900 feet SSE of an abandoned well and just SE of a stock pen on the Jack Sloan ranch. The pen and abando"| __truncated__ "Near San Saba, TX" ...
 $ paleomodel         : chr  "gplates" "gplates" "gplates" "gplates" ...
 $ geoplate           : chr  "101" "101" "101" "101" ...
 $ paleoage           : chr  "mid" "mid" "mid" "mid" ...
 $ paleolng           : num  -65.7 -65.7 -65.7 -65.6 -65.6 ...
 $ paleolat           : num  -26.7 -26.7 -26.7 -27.5 -27.5 ...
 $ formation          : chr  "Chappel Limestone" "Chappel Limestone" "Chappel Limestone" "Chappel Limestone" ...
 $ geological_group   : chr  "" "" "" "" ...
 $ member             : chr  "" "" "" "" ...
 $ formation.1        : chr  "Chappel Limestone" "Chappel Limestone" "Chappel Limestone" "Chappel Limestone" ...
 $ geological_group.1 : chr  "" "" "" "" ...
 $ member.1           : chr  "" "" "" "" ...
 $ stratscale         : chr  "bed" "bed" "bed" "" ...
 $ zone               : chr  "" "" "" "S. isostichia - crenulata through S. anchoralis - latus" ...
 $ zone_type          : chr  "" "" "" "" ...
 $ localsection       : chr  "" "" "" "" ...
 $ localbed           : chr  "" "" "" "" ...
 $ localbedunit       : chr  "" "" "" "" ...
 $ localorder         : chr  "" "" "" "" ...
 $ regionalsection    : chr  "" "" "" "" ...
 $ regionalbed        : chr  "" "" "" "" ...
 $ regionalbedunit    : chr  "" "" "" "" ...
 $ regionalorder      : chr  "" "" "" "" ...
 $ stratcomments      : chr  "Chappel Limestone is a condensed one-meter thick unit spanning the S. isostichia-crenulata through Gnathodus ty"| __truncated__ "Chappel Limestone is a condensed one-meter thick unit spanning the S. isostichia-crenulata through Gnathodus ty"| __truncated__ "Chappel Limestone is a condensed one-meter thick unit spanning the S. isostichia-crenulata through Gnathodus ty"| __truncated__ "Late Kinderhookian and Osagean" ...
 $ lithdescript       : chr  "" "" "" "" ...
 $ lithology1         : chr  "\"limestone\"" "\"limestone\"" "\"limestone\"" "\"carbonate\"" ...
 $ lithadj1           : chr  "" "" "" "" ...
 $ lithification1     : chr  "lithified" "lithified" "lithified" "" ...
 $ minor_lithology1   : chr  "" "" "" "" ...
 $ fossilsfrom1       : chr  "" "" "" "" ...
 $ lithology2         : chr  "" "" "" "" ...
 $ lithadj2           : chr  "" "" "" "" ...
 $ lithification2     : chr  "" "" "" "" ...
 $ minor_lithology2   : chr  "" "" "" "" ...
 $ fossilsfrom2       : chr  "" "" "" "" ...
 $ primary_reference  : chr  "D. K. Brezinski. 1998. Trilobites from the Lower Mississippian starved basin facies of the southern United Stat"| __truncated__ "D. K. Brezinski. 1998. Trilobites from the Lower Mississippian starved basin facies of the southern United Stat"| __truncated__ "D. K. Brezinski. 1998. Trilobites from the Lower Mississippian starved basin facies of the southern United Stat"| __truncated__ "D. K. Brezinski. 1998. Trilobites from the Lower Mississippian starved basin facies of the southern United Stat"| __truncated__ ...
 $ created            : chr  "1998-11-20 07:59:51" "1998-11-20 07:59:51" "1998-11-20 07:59:51" "2001-03-26 11:06:31" ...
 $ modified           : chr  "2018-02-02 16:22:36" "2018-02-02 16:22:36" "2018-02-02 16:22:36" "2000-11-30 12:09:35" ...
 $ updated            : logi  NA NA NA NA NA NA ...
 $ time_bins          : chr  "Tournaisian" "Tournaisian" "Tournaisian" "Tournaisian" ...
 $ time_contain       : chr  "Tournaisian" "Tournaisian" "Tournaisian" "Tournaisian" ...
 $ time_major         : chr  "Tournaisian" "Tournaisian" "Tournaisian" "Tournaisian" ...
 $ time_buffer        : chr  "Tournaisian" "Tournaisian" "Tournaisian" "Tournaisian" ...
 $ time_overlap       : chr  "Tournaisian" "Tournaisian" "Tournaisian" "Tournaisian" ...
 $ authorizer         : chr  "J. Alroy" "J. Alroy" "J. Alroy" "W. Kiessling" ...
 $ enterer            : chr  "M. Sommers" "M. Sommers" "M. Sommers" "W. Kiessling" ...
 $ modifier           : chr  "M. Clapham" "M. Clapham" "M. Clapham" "" ...
 $ updater            : logi  NA NA NA NA NA NA ...
 $ primary_name       : chr  "Australosutura" "Carbonocoryphe" "Thigriffides" "Pudoproetus" ...
 $ primary_reso       : chr  "" "" "" "" ...
 $ subgenus_name      : chr  "" "" "" "" ...
 $ subgenus_reso      : chr  "" "" "" "" ...
 $ species_name       : chr  "llanoensis" "planucauda" "roundyi" "chappelensis" ...
 $ species_reso       : chr  "" "" "" "" ...
 $ image_no           : int  1746 1746 1746 1746 1746 1746 1746 1746 1746 1746 ...
  [list output truncated]
 - attr(*, "chronosphere")=List of 13
  ..$ src         : chr "pbdb"
  ..$ ser         : chr "occs3"
  ..$ res         : chr "species"
  ..$ ver         : int 20250819
  ..$ datafile    : chr "pbdb_occs.rds"
  ..$ item        : int 2700
  ..$ reference   : chr [1:2] "Please remember to acknowledge the Paleobiology Database (http://paleobiodb.org) in your publication." "Alternatively, you can use: The Paleobiology Database Community. (year). Denormalized occurrence table of the P"| __truncated__
  ..$ bibtex      : chr [1:2] "NA" "NA"
  ..$ downloadDate: POSIXct[1:1], format: "2025-08-20 14:36:24"
  ..$ productDate : chr "2025-08-19"
  ..$ infoURL     : chr "https://chronosphere.info/data/pbdb/occs3/"
  ..$ API         : chr "https://paleobiodb.org/data1.2/occs/list.csv?datainfo&rowcount&interval=Ediacaran,Holocene&show=class,classext,"| __truncated__
  ..$ additional  : list()

Filtering - I. Taxonomy: quality

Get rid of unprecise entries!

# filter records not identified at least to genus
dat <-dat[dat$accepted_rank %in% c("genus", "species", "subgenus", "subspecies"),]

# omit non-informative genus entries
dat <- dat[dat$genus!="", ]

# Monitor data loss:
nrow(dat)

[1] 1486303

Strategy

Goal: to match the original scope of Sepkoski and Alroy
Conservative inclusion at high levels (whole group is to be included)
Include particular lower level entries
Dirty…

Filtering - I. Taxonomy: phyla

marineNoPlant <- c("",
    "Agmata",
    "Annelida",
    "Bilateralomorpha",
    "Brachiopoda",
    "Bryozoa",
    "Calcispongea",
    "Chaetognatha",
    "Cnidaria",
    "Ctenophora",
    "Echinodermata",
    "Entoprocta",
    "Foraminifera",
    "Hemichordata",
    "Hyolitha",
    "Mollusca",
    "Nematoda",
    "Nematomorpha",
    "Nemertina",
    "Onychophora",
    "Petalonamae",
    "Phoronida",
    "Platyhelminthes",
    "Porifera",
    "Problematica",
    "Rhizopodea",
    "Rotifera",
    "Sarcomastigophora",
    "Sipuncula",
    "Uncertain",
    "Vetulicolia",
    ""
)

# which rows?
bByPhyla <- dat$phylum%in% marineNoPlant

Filtering - I. Taxonomy: classes

#B. classes
#   levels(factor(noNeed$class))
needClass <- c(
    "Acanthodii",
    "Actinopteri",
    "Actinopterygii",
    "Agnatha",
    "Cephalaspidomorphi",
    "Chondrichthyes",
    "Cladistia",
    "Coelacanthimorpha",
    "Conodonta",
    "Galeaspida",
    "Myxini",
    "Osteichthyes",
    "Petromyzontida",
    "Plagiostomi",
    "Pteraspidomorphi",
    # here come the Arthropods
    "Artiopoda",
    "Branchiopoda",
    "Cephalocarida",
    "Copepoda",
    "Malacostraca",
    "Maxillopoda",
    "Megacheira",
    "Merostomoidea",
    "Ostracoda",
    "Paratrilobita",
    "Pycnogonida",
    "Remipedia",
    "Thylacocephala",
    "Trilobita",
    "Xiphosura"
)

# which rows?
bNeedClass <- dat$class %in% needClass

Filtering - I. Taxonomy: more

#C.  mammals
#   mammals <- dat[dat$class=="Mammalia",]
#   levels(factor(mammals$order))
needMammalOrd <- c("Cetacea", "Sirenia")

# which rows?
bMammalOrder <- dat$order %in% needMammalOrd

# the carnivores
#   carnivores <- dat[dat$order=="Carnivora",]
#   levels(factor(carnivores$family))
needFam <- c("Otariidae", "Phocidae", "Desmatophocidae")

# which rows?
bNeedMamFam<- dat$family%in%needFam

# D. Reptiles
#   reptiles <- dat[dat$class=="Reptilia",]
#   levels(factor(reptiles$order))
needReptOrd<-c(
    "Eosauropterygia",
    "Hupehsuchia",
    "Ichthyosauria",
    "Placodontia",
    "Sauropterygia",
    "Thalattosauria"
)

# which rows?
bRept <- dat$order%in%needReptOrd

# E. turtles 
#   turtles <- dat[dat$order=="Testudines",]
#   levels(factor(turtles$family))

# Chelonioidea turtles
needTurtleFam <- c(
    "Cheloniidae",
    "Protostegidae",
    "Dermochelyidae",
    "Dermochelyoidae",
    "Toxochelyidae",
    "Pancheloniidae"
)

# which rows?
bTurtle <- dat$family%in%needTurtleFam

Filtering - I. Taxonomy: execute!

Subsetting for the rows we highlighted:

# subset the data
dat <- dat[bByPhyla | bNeedClass | bMammalOrder | bNeedMamFam | bRept | bTurtle , ]

Resolve the potential homonymy problem across taxa:

dat$clgen <- paste(dat$class, dat$genus)

Number of remaining records:

nrow(dat)

[1] 1057872

Filtering - II. Environment

Fossils come from different environments:

levels(factor((dat$environment)))

 [1] ""                                    
 [2] "\"channel\""                         
 [3] "\"floodplain\""                      
 [4] "alluvial fan"                        
 [5] "basin reef"                          
 [6] "basinal (carbonate)"                 
 [7] "basinal (siliceous)"                 
 [8] "basinal (siliciclastic)"             
 [9] "carbonate indet."                    
[10] "cave"                                
[11] "channel lag"                         
[12] "coarse channel fill"                 
[13] "coastal indet."                      
[14] "crater lake"                         
[15] "crevasse splay"                      
[16] "deep subtidal indet."                
[17] "deep subtidal ramp"                  
[18] "deep subtidal shelf"                 
[19] "deep-water indet."                   
[20] "delta front"                         
[21] "delta plain"                         
[22] "deltaic indet."                      
[23] "dry floodplain"                      
[24] "dune"                                
[25] "eolian indet."                       
[26] "estuary/bay"                         
[27] "fine channel fill"                   
[28] "fissure fill"                        
[29] "fluvial indet."                      
[30] "fluvial-deltaic indet."              
[31] "fluvial-lacustrine indet."           
[32] "foreshore"                           
[33] "glacial"                             
[34] "interdistributary bay"               
[35] "interdune"                           
[36] "intrashelf/intraplatform reef"       
[37] "karst indet."                        
[38] "lacustrine - large"                  
[39] "lacustrine - small"                  
[40] "lacustrine delta front"              
[41] "lacustrine delta plain"              
[42] "lacustrine deltaic indet."           
[43] "lacustrine indet."                   
[44] "lacustrine prodelta"                 
[45] "lagoonal"                            
[46] "lagoonal/restricted shallow subtidal"
[47] "levee"                               
[48] "loess"                               
[49] "marginal marine indet."              
[50] "marine indet."                       
[51] "mire/swamp"                          
[52] "offshore"                            
[53] "offshore indet."                     
[54] "offshore ramp"                       
[55] "offshore shelf"                      
[56] "open shallow subtidal"               
[57] "paralic indet."                      
[58] "perireef or subreef"                 
[59] "peritidal"                           
[60] "platform/shelf-margin reef"          
[61] "pond"                                
[62] "prodelta"                            
[63] "reef, buildup or bioherm"            
[64] "sand shoal"                          
[65] "shallow subtidal indet."             
[66] "shoreface"                           
[67] "sinkhole"                            
[68] "slope"                               
[69] "slope/ramp reef"                     
[70] "spring"                              
[71] "submarine fan"                       
[72] "tar"                                 
[73] "terrestrial indet."                  
[74] "transition zone/lower shoreface"     
[75] "wet floodplain"

Filtering - II. Environment

We want go get rid of non-marine ones:

omitEnv <- c(
    "\"floodplain\"",
    "alluvial fan",
    "cave",
    "\"channel\"",
    "channel lag" ,
    "coarse channel fill",
    "crater lake",
    "crevasse splay",
    "dry floodplain",
    "delta plain",
    "dune",
    "eolian indet.",
    "fine channel fill",
    "fissure fill",
    "fluvial indet.",
    "fluvial-lacustrine indet.",
    "fluvial-deltaic indet.",
    "glacial",
    "interdune",
    "karst indet.",
    "lacustrine - large",
    "lacustrine - small",
    "lacustrine delta front",
    "lacustrine delta plain",
    "lacustrine deltaic indet.",
    "lacustrine indet.",
    "lacustrine interdistributary bay",
    "lacustrine prodelta",
    "levee",
    "loess",
    "mire/swamp",
    "pond",
    "sinkhole",
    "spring",
    "tar",
    "terrestrial indet.",
    "wet floodplain"
)

Actual omission:

dat<-dat[!dat$environment%in%omitEnv, ]

# remaining
nrow(dat)

[1] 1040103

Omitting unlithified sediments

Unusually good preservation (usually in more recent intervals) might cause temporal biases in sampling. To evade this problem, it can be a good idea to omit records from unlithified sediments.

dat <- dat[dat$lithification1!="unlithified",]
nrow(dat)

[1] 965815

Stratigraphic binning (stages): two solutions

95-bin stages (`stg`)

data(stages, package="divDyn")
str(stages)

'data.frame':   95 obs. of  13 variables:
 $ sys      : chr  "E" "E" "E" "Cm" ...
 $ system   : chr  "Ediacaran" "Ediacaran" "Ediacaran" "Cambrian" ...
 $ series   : chr  "Upper Ediacaran" "Upper Ediacaran" "Upper Ediacaran" "Terreneuvian" ...
 $ stage    : chr  "Avalon Assemblage" "White Sea Assemblage" "Nama Assemblage" "Fortunian" ...
 $ short    : chr  "Ava" "Whs" "Nam" "For" ...
 $ bottom   : num  580 560 550 539 529 ...
 $ mid      : num  570 555 544 534 525 ...
 $ top      : num  560 550 539 529 521 ...
 $ dur      : num  20 10 11.2 9.8 8 6.5 5.5 4.5 4 3.5 ...
 $ stg      : int  1 2 3 4 5 6 7 8 9 10 ...
 $ systemCol: chr  "#fddd88" "#fddd88" "#fddd88" "#94ac72" ...
 $ seriesCol: chr  "#fddd88" "#fddd88" "#fddd88" "#9db989" ...
 $ col      : chr  "#fcd589" "#fdd587" "#fed583" "#a9be93" ...

From Ediacaran
Some stages are merged

Based on: Gradstein, Ogg, and Schmitz (2020)

Stratigraphic binning (stages): two solutions

PBDB time-bins (`stb`)

Literal stages of the GTS (99 bins)
Internally done by the PBDB
Using time_bins, time_contain, time_overlap, time_major, time_buffer
Sounds awesome but unstable!

Note

We start with solution 1 to better understand the process. Check out the binningCompare.R script for a comparison.

Based on: Gradstein, Ogg, and Schmitz (2020)

Stratigraphic binning

Stratigraphic attributes belong to collections
Assignment is uncertain
Hundreds of names: standard, regional, historic all entries come from:
- early_interval: obligatory, starting name
- late_interval: facultative, if collection range doesn’t fit in one interval

Stage number lookup table

Relevant information is in the keys object: ?keys

data(keys, package="divDyn")
str(keys,1)

List of 8
 $ tenInt:List of 51
 $ stgInt:List of 84
 $ reefs :List of 3
 $ lith  :List of 3
 $ lat   :List of 3
 $ bath  :List of 3
 $ grain :List of 3
 $ depenv:List of 3

This object contains useful information for processing the PBDB: from the old FossilWorks and from scripts of W. Kiessling

The relevant information is in:

keys$stgInt

$`14`
 [1] "Cressagian"   "Datsonian"    "Deming"       "Demingian"    "Gasconadian" 
 [6] "Mansian"      "Migneintian"  "Pakerort"     "Skullrockian" "Stairsian"   
[11] "Tremadoc"     "Tremadocian"  "Varangu"      "Warendian"   

$`15`
 [1] "Bendigonian"   "Billingen"     "Blackhillsian" "Cassinian"    
 [5] "Chewtonian"    "Floian"        "Hunneberg"     "Jefferson"    
 [9] "Jeffersonian"  "Moridunian"    "Tulean"       

$`16`
[1] "Castlemainian" "Dapingian"     "Dawan"         "Fennian"      
[5] "Klabava"       "Langevoja"     "Rangerian"     "Volkhov"      
[9] "Yapeenian"    

$`17`
 [1] "Abereiddian"      "Aluoja"           "Aseri"            "Chazy"           
 [5] "Chazyan"          "Darriwilian"      "Dobrotiva"        "Dobrotivian"     
 [9] "Early Llandeilo"  "Early Llanvirn"   "Hulo"             "Hunderum"        
[13] "Kunda"            "Lasnamagi"        "Late Llandeilo"   "Late Llanvirn"   
[17] "Llandeilian"      "Llandeilo"        "Llanvirn"         "Llanvirnian"     
[21] "Middle Llandeilo" "Oretanian"        "Sarka"            "Uhaku"           
[25] "Valaste"          "Whiterock"        "Wildernessian"   

$`18`
 [1] "Ashbyan"      "Aurelucian"   "Black River"  "Blackriveran" "Burrellian"  
 [6] "Costonian"    "Gisbornian"   "Haljala"      "Harnagian"    "Idavere"     
[11] "Johvi"        "Kirkfieldian" "Kukruse"      "Mohawkian"    "Rocklandian" 
[16] "Sandbian"     "Soudleyan"    "Turinian"    

$`19`
 [1] "Actonian"      "Bohdalec"      "Cautleyan"     "Chatfieldian" 
 [5] "Cheneyan"      "Eastonian"     "Edenian"       "Katian"       
 [9] "Kralodvorian"  "Marshbrookian" "Maysville"     "Maysvillian"  
[13] "Nabala"        "Oandu"         "Onnian"        "Pirgu"        
[17] "Pusgillian"    "Rakvere"       "Rawtheyan"     "Richmond"     
[21] "Richmondian"   "Shermanian"    "Streffordian"  "Vormsi"       
[25] "Woolstonian"   "Zahorany"     

$`20`
[1] "Gamach"     "Gamachian"  "Hirnantian" "Kosovian"   "Porkuni"   

$`21`
[1] "Juuru"      "Rhuddanian"

$`22`
[1] "Aeronian"

$`23`
[1] "Telychian"

$`24`
[1] "Jaani"        "Sheinwoodian"

$`25`
[1] "Gleedonian"  "Homerian"    "Rootsikula"  "Whitwellian"

$`26`
[1] "Bringewood" "Elton"      "Gorstian"  

$`27`
[1] "Kuressaare"   "Leintwardine" "Ludfordian"   "Whitcliffe"  

$`28`
[1] "Kaugetuma" "Ohesaare"  "Pridoli"  

$`29`
[1] "Gedinne"    "Gedinnine"  "Lochkovian"

$`30`
[1] "Pragian" "Siegen" 

$`31`
[1] "Deer Park"    "Early Emsian" "Ems"          "Emsian"       "Late Emsian" 
[6] "Zlichovian"  

$`32`
[1] "Eifel"     "Eifelian"  "Southwood"

$`33`
[1] "Early Givetian"  "Givet"           "Givetian"        "Late Givetian"  
[5] "Middle Givetian" "Taghanic"       

$`34`
[1] "Early Frasnian"  "Frasnian"        "Late Frasnian"   "Middle Frasnian"

$`35`
[1] "Early Famennian"  "Famennian"        "Late Famennian"   "Middle Famennian"

$`36`
[1] "Early Mississippian" "Hastarian"           "Ivorian"            
[4] "Kinderhookian"       "Tournaisian"        

$`37`
[1] "Arundian"   "Asbian"     "Brigantian" "Chadian"    "Holkerian" 
[6] "Jiusian"    "Meramecian" "Visean"    

$`38`
[1] "Arnsbergian"        "Late Mississippian" "Pendleian"         
[4] "Serpukhovian"      

$`39`
 [1] "Alportian"           "Bashkirian"          "Cheremshanian"      
 [4] "Chokierian"          "Early Pennsylvanian" "Kinderscoutian"     
 [7] "Krasnopolyanian"     "Marsdenian"          "Morrowan"           
[10] "Prikarnian"          "Voznesenian"         "Westphalian A"      
[13] "Yeadonian"          

$`40`
 [1] "Desmoinesian"         "Kashirian"            "Middle Pennsylvanian"
 [4] "Moscovian"            "Myachkovian"          "Podolskian"          
 [7] "Vereian"              "Westphalian B"        "Westphalian C"       
[10] "Westphalian D"       

$`41`
[1] "Dorogovilovian" "Kasimovian"     "Khamovnichean"  "Krevyakian"    
[5] "Missourian"     "Stephanian A"   "Stephanian B"  

$`42`
[1] "Gzhelian"        "Kuzel"           "Melekhovian"     "Noginian"       
[5] "Pavlovoposadian" "Rusavkian"       "Stephanian C"   

$`43`
[1] "Asselian"   "Autunian"   "Shikhanian"

$`44`
[1] "Sakmarian"      "Sterlitamakian" "Tastubian"     

$`45`
[1] "Artinskian" "Sarginian" 

$`46`
[1] "Irenian"   "Kungurian" "Saranian"  "Ufimian"  

$`47`
[1] "Early Kazanian" "Kazanian"       "Kubergandian"   "Late Kazanian" 
[5] "Roadian"        "Sokian"        

$`48`
[1] "Murgabian" "Urzhumian" "Word"      "Wordian"  

$`49`
[1] "Capitan"    "Capitanian"

$`50`
[1] "Longtanian"    "Wuchiapingian"

$`51`
[1] "Changhsingian"

$`52`
[1] "Dienerian"    "Griesbachian" "Induan"      

$`53`
[1] "Olenekian" "Smithian"  "Spathian" 

$`54`
[1] "Aegean"    "Anisian"   "Bithynian" "Illyrian"  "Pelsonian"

$`55`
[1] "Fassanian"    "Ladinian"     "Longobardian"

$`56`
[1] "Carnian"  "Julian"   "Oretian"  "Tuvalian"

$`57`
[1] "Alaunian" "Lacian"   "Norian"   "Otamitan" "Sevatian" "Warepan" 

$`58`
[1] "Otapirian" "Rhaetian" 

$`59`
[1] "Angulata"          "Canadensis"        "Early Hettangian" 
[4] "Hettangian"        "Late Hettangian"   "Liasicus"         
[7] "Middle Hettangian" "Planorbis"        

$`60`
[1] "Bucklandi"        "Early Sinemurian" "Late Sinemurian"  "Obtusum"         
[5] "Oxynotum"         "Raricostatum"     "Semicostatum"     "Sinemurian"      
[9] "Turneri"         

$`61`
 [1] "Davoei"              "Early Pliensbachian" "Freboldi"           
 [4] "Ibex"                "Imlayi"              "Jamesoni"           
 [7] "Kunae"               "Late Pliensbachian"  "Margaritatus"       
[10] "Pliensbachian"       "Spinatum"            "Whiteavesi"         

$`62`
 [1] "Bifrons"         "Early Toarcian"  "Falciferum"      "Kanense"        
 [5] "Late Toarcian"   "Levesquei"       "Middle Toarcian" "Planulata"      
 [9] "Tenuicostatum"   "Thouarsense"     "Toarcian"        "Variabilis"     

$`63`
[1] "Aalenian"        "Concavum"        "Early Aalenian"  "Late Aalenian"  
[5] "Middle Aalenian" "Murchisonae"     "Opalinum"        "Scissum"        

$`64`
 [1] "Bajocian"       "Discites"       "Early Bajocian" "Garantiana"    
 [5] "Humphresianum"  "Laeviuscula"    "Late Bajocian"  "Parkinsoni"    
 [9] "Sauzei"         "Subfurcatum"   

$`65`
[1] "Bathonian"        "Early Bathonian"  "Late Bathonian"   "Middle Bathonian"

$`66`
[1] "Callovian"        "Early Callovian"  "Late Callovian"   "Middle Callovian"

$`67`
[1] "Early Oxfordian"  "Late Oxfordian"   "Middle Oxfordian" "Oxfordian"       

$`68`
[1] "Early Kimmeridgian" "Kimmeridgian"       "Late Kimmeridgian" 

$`69`
[1] "Early Tithonian"  "Late Tithonian"   "Middle Tithonian" "Puaroan"         
[5] "Tithonian"       

$`70`
[1] "Berriasian"        "Early Berriasian"  "Late Berriasian"  
[4] "Middle Berriasian"

$`71`
[1] "Early Valanginian" "Late Valanginian"  "Valanginian"      

$`72`
[1] "Early Hauterivian" "Hauterivian"       "Late Hauterivian" 

$`73`
[1] "Barremian"       "Early Barremian" "Late Barremian" 

$`74`
[1] "Aptian"       "Early Aptian" "Late Aptian" 

$`75`
[1] "Albian"        "Early Albian"  "Late Albian"   "Middle Albian"
[5] "Motuan"        "Urutawan"     

$`76`
[1] "Cenomanian"        "Early Cenomanian"  "Late Cenomanian"  
[4] "Middle Cenomanian" "Ngaterian"        

$`77`
[1] "Early Turonian"  "Late Turonian"   "Mangaotanean"    "Middle Turonian"
[5] "Turonian"       

$`78`
[1] "Coniacian"        "Early Coniacian"  "Late Coniacian"   "Middle Coniacian"
[5] "Teratan"         

$`79`
[1] "Early Santonian"  "Late Santonian"   "Middle Santonian" "Piripauan"       
[5] "Santonian"       

$`80`
[1] "Campanian"        "Early Campanian"  "Judithian"        "Late Campanian"  
[5] "Middle Campanian"

$`81`
[1] "Early Maastrichtian" "Edmontonian"         "Lancian"            
[4] "Late Maastrichtian"  "Maastrichtian"      

$`82`
[1] "Danian"          "Early Paleocene" "Pu1"             "Pu3"            
[5] "Puercan"         "To2"             "To3"             "Torrejonian"    

$`83`
 [1] "Cf1"              "Cf2"              "Cf3"              "Clarkforkian"    
 [5] "Gashatan"         "Late Paleocene"   "Middle Paleocene" "Riochican"       
 [9] "Selandian"        "Thanetian"        "Ti1"             

$`84`
 [1] "Bumbanian"    "Early Eocene" "Graybullian"  "Lostcabinian" "Lysitean"    
 [6] "Mangaorapan"  "MP 10"        "MP 7"         "MP 8 + 9"     "Sandcouleean"
[11] "Waipawan"     "Wasatchian"   "Ypresian"    

$`85`
[1] "Early Uintan" "Late Uintan"  "Lutetian"     "MP 11"        "MP 13"       
[6] "MP 14"        "Mustersan"    "Uintan"      

$`86`
[1] "Bartonian"  "Duchesnean"

$`87`
 [1] "Chadronian"        "Early Chadronian"  "Ergilian"         
 [4] "Houldjinian"       "Kaiatan"           "Late Chadronian"  
 [7] "Late Eocene"       "Middle Chadronian" "MP 18"            
[10] "MP 19"             "MP 20"             "Priabonian"       
[13] "Runangan"          "Ulangochuian"     

$`88`
[1] "Early Oligocene" "MP 21"           "MP 22"           "MP 23"          
[5] "MP 24"           "Orellan"         "Rupelian"        "Whitneyan"      

$`89`
[1] "Chattian"       "Duntroonian"    "Late Oligocene" "Monroecreekian"
[5] "MP 26"          "MP 27"          "MP 28"          "MP 29"         
[9] "MP 30"         

$`90`
 [1] "Aquitanian"          "Burdigalian"         "Early Hemingfordian"
 [4] "Early Miocene"       "Hemingfordian"       "Late Hemingfordian" 
 [7] "MN 1"                "MN 3"                "MN 4"               
[10] "Orleanian"           "Santacrucian"       

$`91`
 [1] "Badenian"       "Clifdenian"     "Colloncuran"    "Langhian"      
 [5] "Lillburnian"    "Middle Miocene" "MN 5"           "MN 6"          
 [9] "Sarmatian"      "Serravallian"   "Waiauan"       

$`92`
 [1] "Chasicoan"               "early Early Hemphillian"
 [3] "Kapitean"                "Late Miocene"           
 [5] "Messinian"               "MN 10"                  
 [7] "MN 11"                   "MN 12"                  
 [9] "MN 9"                    "Pannonian"              
[11] "Tortonian"               "Vallesian"              

$`93`
 [1] "Early Pliocene" "Early Pliocene" "Late Pliocene"  "Late Pliocene" 
 [5] "MN 14"          "MN 16"          "Opoitian"       "Piacenzian"    
 [9] "Pliocene"       "Villanyian"     "Waipipian"      "Zanclean"      

$`94`
[1] "Calabrian"          "Early Pleistocene"  "Gelasian"          
[4] "Ionian"             "Late Pleistocene"   "Middle Pleistocene"
[7] "MN 17"              "Pleistocene"       

$`95`
[1] "Holocene"

$`-1`
[1] ""

$default
[1] NA

Combine/categorize entries

Built-in categorize() function allows us to find entries in keys$stgInt:

# B. the stg entries (lookup)
stgMin<-categorize(dat[,"early_interval"],keys$stgInt)
stgMax<-categorize(dat[,"late_interval"],keys$stgInt)

entry	result	meaning
`"Ypresian"`	`84`	name indicates interval confined to a stage
`"Permian"`	`NA`	Name indicates interval not confined to a stage
`""`	`-1`	Inteval entry is not given - `late_interval`

Lower number - earlier stage.

We need a single bin entry

Make sure that numbers are numbers!

stgMin<-as.numeric(stgMin)
stgMax<-as.numeric(stgMax)

We need just one bin for every entry! Add this to the table, we will fill this in the next step.

# empty container
dat$stg <- rep(NA, nrow(dat))

What are the cases?

stgMin	stgMax	Meaning	stg
84	-1	`early_interval` resolves to one stage, no `late_interval`	84
84	84	`early_interval` resolves to same stage as `late_interval`	84
84	86	collection can range from stages `84` to `86`	NA
84	NA	`late_interval` does not resolve to one stage	NA
NA	-1	`early_interval` does not resolve to one stage, the other missing	NA
NA	NA	neither `early_interval` nor `late_interval` resolve to one stage	NA

Deriving a single stage number

Programming the rules above with logical expressions:

# select entries, where
stgCondition <- c(
# the early and late interval fields indicate the same stg
    which(stgMax==stgMin),
# or the late_intervarl field is empty
    which(stgMax==-1))

In these entries, use the stg indicated by the early_interval:

dat$stg[stgCondition] <- stgMin[stgCondition]

Result

integer: record comes from a one stage
NA: record does not come from one stage

Extra cases /1

The Cambrian and the Ordovician are problematic.

The Cambrian

Collections were assigned by Na Lin to the 10 stages (Na and Kiessling 2015) .

# load data
load(url("https://github.com/divDyn/ddPhanero/raw/master/data/Stratigraphy/2018-08-31/cambStrat.RData"))

# correct it with this function
source("https://github.com/divDyn/ddPhanero/raw/master/scripts/strat/2018-08-31/cambProcess.R")

Note

This is only necessary if you do Phanerozoic scale analyses!
You do not need to repeat all these stages, when you focus on a shorter interval of time or a specific taxa. Think about whether these are necessary!

Extra cases /2

The Cambrian and the Ordovician are problematic.

The Ordovician

Assigments based on formations and biozone by Wolfgang Kiessling.

# load data
load(url("https://github.com/divDyn/ddPhanero/raw/master/data/Stratigraphy/2018-08-31/ordStrat.RData"))

# correct it with this function
source("https://github.com/divDyn/ddPhanero/raw/master/scripts/strat/2019-05-31/ordProcess.R")

Note

This is only necessary if you do Phanerozoic scale analyses!
You do not need to repeat all these stages, when you focus on a shorter interval of time or a specific taxa. Think about whether these are necessary!

The PBDB’s internal ‘time bins’ (`stb`)

Collection’s assignment included in the time_contain, time_bins, time_overlap, time_major, time_buffer fields
Strategy:

Make timescale from the entries
Assign bin numbers to collections/records

Example solution

Function to extract relevant information from the PBDB download

#' Function to generate stage-lookup table from the Paleobiology Database
#'
#' Processess time-bin data downloaded from the database
#'
#' @param pbdb data.frame from database.
#' @param stagecolumn Charater string, of the time-bin information column
#' @param min_ma Character string, column name of the minimum ages 
#' @param max_ma Character string, column name of the maximum ages 
#' @param bin Character string, this will be the column name of the bin information
#' @return A data.frame with the stages' name bottom, mid and top age and bin number.
StagesFromPBDB<- function(pbdb, stagecolumn="time_contain", min_ma="min_ma", max_ma="max_ma", bin="stb"){
    # get the appropriate colulmns out of the PBDB download
    timeinfo <- unique(pbdb[,c(stagecolumn, max_ma, min_ma)])

    # omit those entires that are not meaningful
    timeinfo <- timeinfo[timeinfo[,stagecolumn]!="-",]
    
    # search for stage minima and maxima
    ma <- tapply(INDEX=timeinfo[,stagecolumn], X=timeinfo[, max_ma], max)
    min <- tapply(INDEX=timeinfo[,stagecolumn], X=timeinfo[, min_ma], min)

    # create stages object
    stagesDF <- data.frame(stage=names(ma), bottom=ma, mid=(ma+min)/2, top=min)
    stagesDF <-stagesDF[order(stagesDF$bottom, decreasing=TRUE),]
    rownames(stagesDF) <- NULL

    # check consistency - Cambrian is not consistent
    # stages$top[1:(nrow(stages)-1)]==stages$bottom[2:nrow(stages)]

    # create integer bin numbers
    stagesDF<- cbind(stagesDF, 1:nrow(stagesDF))
    colnames(stagesDF)[ncol(stagesDF)] <- bin
    return(stagesDF)
}

The PBDB’s internal ‘time bins’ (`stb`)

Collection’s assignment included in the time_contain, time_bins, time_overlap, time_major, time_buffer fields
Strategy:

Make timescale from the entries
Assign bin numbers to collections/records

Call to the function:

# new stages object
stagebins <- StagesFromPBDB(dat)
stagebins

               stage   bottom       mid      top stb
1          Ediacaran 635.0000 586.90000 538.8000   1
2          Fortunian 538.8000 533.90000 529.0000   2
3            Stage 2 529.0000 525.00000 521.0000   3
4            Stage 3 521.0000 517.75000 514.5000   4
5            Stage 4 514.5000 510.50000 506.5000   5
6            Wuliuan 506.5000 505.50000 504.5000   6
7            Drumian 504.5000 502.50000 500.5000   7
8         Guzhangian 500.5000 498.75000 497.0000   8
9            Paibian 497.0000 495.60000 494.2000   9
10      Jiangshanian 494.2000 492.60000 491.0000  10
11          Stage 10 491.0000 488.92500 486.8500  11
12       Tremadocian 486.8500 481.97500 477.1000  12
13            Floian 477.1000 474.20000 471.3000  13
14         Dapingian 471.3000 470.35000 469.4000  14
15       Darriwilian 469.4000 463.80000 458.2000  15
16          Sandbian 458.2000 455.50000 452.8000  16
17            Katian 452.8000 449.00000 445.2000  17
18        Hirnantian 445.2000 444.15000 443.1000  18
19        Rhuddanian 443.1000 441.80000 440.5000  19
20          Aeronian 440.5000 439.55000 438.6000  20
21         Telychian 438.6000 435.75000 432.9000  21
22      Sheinwoodian 432.9000 431.75000 430.6000  22
23          Homerian 430.6000 428.65000 426.7000  23
24          Gorstian 426.7000 425.85000 425.0000  24
25        Ludfordian 425.0000 422.31000 419.6200  25
26        Lochkovian 419.6200 416.32000 413.0200  26
27           Pragian 413.0200 411.82000 410.6200  27
28            Emsian 410.6200 402.04500 393.4700  28
29          Eifelian 393.4700 390.71000 387.9500  29
30          Givetian 387.9500 385.13000 382.3100  30
31          Frasnian 382.3100 377.23000 372.1500  31
32         Famennian 372.1500 365.50500 358.8600  32
33       Tournaisian 358.8600 352.78000 346.7000  33
34            Visean 346.7000 338.50000 330.3000  34
35      Serpukhovian 330.3000 326.85000 323.4000  35
36        Bashkirian 323.4000 319.30000 315.2000  36
37         Moscovian 315.2000 311.10000 307.0000  37
38        Kasimovian 307.0000 305.35000 303.7000  38
39          Gzhelian 303.7000 301.30000 298.9000  39
40          Asselian 298.9000 296.21000 293.5200  40
41         Sakmarian 293.5200 291.81000 290.1000  41
42        Artinskian 290.1000 286.70000 283.3000  42
43         Kungurian 283.3000 278.85000 274.4000  43
44           Roadian 274.4000 270.65000 266.9000  44
45           Wordian 266.9000 265.59000 264.2800  45
46        Capitanian 264.2800 261.89500 259.5100  46
47     Wuchiapingian 259.5100 256.82500 254.1400  47
48     Changhsingian 254.1400 253.02100 251.9020  48
49            Induan 251.9020 250.90100 249.9000  49
50         Olenekian 249.9000 248.30000 246.7000  50
51           Anisian 246.7000 244.08200 241.4640  51
52          Ladinian 241.4640 239.23200 237.0000  52
53           Carnian 237.0000 232.15000 227.3000  53
54            Norian 227.3000 216.50000 205.7000  54
55          Rhaetian 205.7000 203.55000 201.4000  55
56        Hettangian 201.4000 200.45000 199.5000  56
57        Sinemurian 199.5000 196.20000 192.9000  57
58     Pliensbachian 192.9000 188.55000 184.2000  58
59          Toarcian 184.2000 179.45000 174.7000  59
60          Aalenian 174.7000 172.80000 170.9000  60
61          Bajocian 170.9000 169.55000 168.2000  61
62         Bathonian 168.2000 166.75000 165.3000  62
63         Callovian 165.3000 163.40000 161.5000  63
64         Oxfordian 161.5000 158.15000 154.8000  64
65      Kimmeridgian 154.8000 152.00000 149.2000  65
66         Tithonian 149.2000 146.15000 143.1000  66
67        Berriasian 143.1000 140.07500 137.0500  67
68       Valanginian 137.0500 134.82500 132.6000  68
69       Hauterivian 132.6000 129.18500 125.7700  69
70         Barremian 125.7700 123.58500 121.4000  70
71            Aptian 121.4000 117.30000 113.2000  71
72            Albian 113.2000 106.85000 100.5000  72
73        Cenomanian 100.5000  97.20000  93.9000  73
74          Turonian  93.9000  91.85000  89.8000  74
75         Coniacian  89.8000  87.75000  85.7000  75
76         Santonian  85.7000  84.65000  83.6000  76
77         Campanian  83.6000  77.90000  72.2000  77
78     Maastrichtian  72.2000  69.10000  66.0000  78
79            Danian  66.0000  63.83000  61.6600  79
80         Selandian  61.6600  60.45000  59.2400  80
81         Thanetian  59.2400  57.62000  56.0000  81
82          Ypresian  56.0000  52.03500  48.0700  82
83          Lutetian  48.0700  44.55000  41.0300  83
84         Bartonian  41.0300  39.37000  37.7100  84
85        Priabonian  37.7100  35.80500  33.9000  85
86          Rupelian  33.9000  30.60000  27.3000  86
87          Chattian  27.3000  25.17000  23.0400  87
88        Aquitanian  23.0400  21.74500  20.4500  88
89       Burdigalian  20.4500  18.21500  15.9800  89
90          Langhian  15.9800  14.90000  13.8200  90
91      Serravallian  13.8200  12.72500  11.6300  91
92         Tortonian  11.6300   9.43800   7.2460  92
93         Messinian   7.2460   6.28950   5.3330  93
94          Zanclean   5.3330   4.46650   3.6000  94
95        Piacenzian   3.6000   3.09000   2.5800  95
96          Gelasian   2.5800   2.19000   1.8000  96
97         Calabrian   1.8000   1.28700   0.7740  97
98         Chibanian   0.7740   0.45150   0.1290  98
99  Late Pleistocene   0.1290   0.07035   0.0117  99
100         Holocene   0.0117   0.00585   0.0000 100

Note the issue with the beginning of the Cambrian!

The PBDB’s internal ‘time bins’ (`stb`)

Collection’s assignment included in the time_contain, time_bins, time_overlap, time_major, time_buffer fields
Strategy:

Make timescale from the entries
Assign bin numbers to collections/records

#' Function to bin PBDB data to stages
#'
#' @param pbdb data.frame from database.
#' @param stages data.frame produced by StagesFromPBDB.
#' @param bin Character string, this will be the column name of the bin information
#' @param stagecolumn Character string, of the time-bin information column in the PBDB
#' @return A vector with integer bin numbers
BinPBDB <- function(pbdb,stages,bin="stb",stagecolumn="time_contain"){
    # new vector
    y <- rep(NA, nrow(pbdb))

    # simple for loop to execute the binning
    for(i in 1:nrow(stages)){
        y[which(pbdb[, stagecolumn]==stages$stage[i])] <-stages[i,bin]
    }

    return(y)
}

And finally do the binning:

# bins from the pbdb output
dat$stb <- BinPBDB(dat, stagebins)

We are done with the preparation

For other exercises (CMR): join some bits from the timescale

# stg (95bin) entries
stgFields <- stages[,c("stg", "stage","top", "bottom")]
colnames(stgFields)[2:4] <- paste0("stg_", colnames(stgFields))[2:4]
datMerged <- merge(dat,stgFields, by="stg")

# stb (99bin) entries
stbFields <- stagebins
colnames(stbFields)[1:4] <- paste0("stb_",colnames(stbFields)[1:4])
datMerged <- merge(datMerged,stbFields, by="stb")

This might be a good time to take a snapshot and save an incremental result object (e.g. .rds)

saveRDS(datMerged, file="pbdb_processed_2025-08-19.rds")

Note

The smaller subset you look at, the weirder your data behaves.
This should not be considered final!

Basic sampling patterns

The number of collections and number of records (occurrences) are very much correlated with sampling.

These simple statistics can be calculated with basic functions:

table(dat$stg)


    4     5     6     7     8     9    10    11    12    13    14    15    16 
 1360  1816  5888  2774  4503  1789  3535  1572  3746  2122  6747  5999  4713 
   17    18    19    20    21    22    23    24    25    26    27    28    29 
15418 15814 33987  3830  4698  3359  5952  7309  3751  2596  4779  4213  7608 
   30    31    32    33    34    35    36    37    38    39    40    41    42 
 5446  8045  7476 22295 11066  7510  6350 13442  4682  3999  6679  3338  3113 
   43    44    45    46    47    48    49    50    51    52    53    54    55 
 5441  8141  8105 14462 10908 11210  8927 18529 16256  7736 10342 14431  6105 
   56    57    58    59    60    61    62    63    64    65    66    67    68 
10998  8652  6138  3489  6717 17764 15695  3960  6158  6855  9695 10756  7140 
   69    70    71    72    73    74    75    76    77    78    79    80    81 
 7548  3566  7411  5592  4002  7926 11488 14526  8641  2383  5281 15021 38071 
   82    83    84    85    86    87    88    89    90    91    92    93    94 
 8357  7163  8690  9484 12718 11361  8511  7459 16753 15352 21329 23816 26064 
   95 
 3872

Sampling

To make these even faster and convenient, the binstat() function can calculate them all at once:

sampStg<- binstat(dat, bin="stg", tax="clgen", coll="collection_no", 
        duplicates=FALSE)

Note

duplicates=FALSE makes sure that multiple occurrences of the same genus in the same collection (i.e. when there are multiple species) are counted as one!

The result of `binstat()`

str(sampStg)

'data.frame':   95 obs. of  8 variables:
 $ stg     : num  1 2 3 4 5 6 7 8 9 10 ...
 $ occs    : num  NA NA NA 1079 1471 ...
 $ colls   : num  NA NA NA 217 225 784 640 852 331 558 ...
 $ SIBs    : num  NA NA NA 161 220 690 503 585 349 395 ...
 $ occ1    : num  NA NA NA 75 84 238 216 210 154 139 ...
 $ occ2    : num  NA NA NA 17 35 118 76 82 63 64 ...
 $ u       : num  NA NA NA 0.93 0.943 ...
 $ chao1occ: num  NA NA NA 326 321 ...

By default, there are as many rows as the highest integer in the bin argument (95).
There are missing values in the beginning. This is can help with plotting.
Better: plot the values in the context of the timescale.

Unifying the output

Easiest: merge the results with the time scale object.

sampling <- merge(stages, sampStg, by="stg")
str(sampling)

'data.frame':   95 obs. of  20 variables:
 $ stg      : int  1 2 3 4 5 6 7 8 9 10 ...
 $ sys      : chr  "E" "E" "E" "Cm" ...
 $ system   : chr  "Ediacaran" "Ediacaran" "Ediacaran" "Cambrian" ...
 $ series   : chr  "Upper Ediacaran" "Upper Ediacaran" "Upper Ediacaran" "Terreneuvian" ...
 $ stage    : chr  "Avalon Assemblage" "White Sea Assemblage" "Nama Assemblage" "Fortunian" ...
 $ short    : chr  "Ava" "Whs" "Nam" "For" ...
 $ bottom   : num  580 560 550 539 529 ...
 $ mid      : num  570 555 544 534 525 ...
 $ top      : num  560 550 539 529 521 ...
 $ dur      : num  20 10 11.2 9.8 8 6.5 5.5 4.5 4 3.5 ...
 $ systemCol: chr  "#fddd88" "#fddd88" "#fddd88" "#94ac72" ...
 $ seriesCol: chr  "#fddd88" "#fddd88" "#fddd88" "#9db989" ...
 $ col      : chr  "#fcd589" "#fdd587" "#fed583" "#a9be93" ...
 $ occs     : num  NA NA NA 1079 1471 ...
 $ colls    : num  NA NA NA 217 225 784 640 852 331 558 ...
 $ SIBs     : num  NA NA NA 161 220 690 503 585 349 395 ...
 $ occ1     : num  NA NA NA 75 84 238 216 210 154 139 ...
 $ occ2     : num  NA NA NA 17 35 118 76 82 63 64 ...
 $ u        : num  NA NA NA 0.93 0.943 ...
 $ chao1occ : num  NA NA NA 326 321 ...

Plotting

We create an empty canvas to plot on.

tsplot(stages, boxes="sys", shading="sys", xlim=4:95, ylim=c(0,20000), 
    ylab="Number of Records")

The number of records

# occurrences
lines(sampling$mid, sampling$occs, lwd=2)

The number of collections

# collections
lines(sampling$mid, sampling$colls, lwd=2, col="blue")

Adding a legend

legend("topleft", bg="white", legend=c("occurrences", "collections"), 
    col=c("black", "blue"), lwd=2, inset=c(0.05,0.01), cex=1.3)

Sampling is heterogeneous

Strong correlation between collection and record numbers

cor.test(sampling$occs, sampling$colls, method="spearman")


    Spearman's rank correlation rho

data:  sampling$occs and sampling$colls
S = 15310, p-value < 2.2e-16
alternative hypothesis: true rho is not equal to 0
sample estimates:
      rho 
0.8820218

Main function: `divDyn()`

Calculate basic metrics of diversity dynamics:

dd <- divDyn(dat, bin="stg", tax="clgen")
str(dd)

'data.frame':   95 obs. of  31 variables:
 $ stg      : num  1 2 3 4 5 6 7 8 9 10 ...
 $ t2d      : num  NA NA NA NA 93 121 213 182 173 138 ...
 $ t2u      : num  NA NA NA 93 121 213 182 173 138 94 ...
 $ t3       : num  NA NA NA NA 52 55 72 58 62 31 ...
 $ tPart    : num  NA NA NA NA 15 2 27 13 19 6 ...
 $ tGFd     : num  NA NA NA NA NA 4 4 13 7 6 ...
 $ tGFu     : num  NA NA NA NA 1 10 12 8 10 10 ...
 $ tSing    : num  NA NA NA 51 47 371 166 238 83 129 ...
 $ tOri     : num  NA NA NA 110 80 183 122 134 76 101 ...
 $ tExt     : num  NA NA NA NA 38 67 130 131 102 111 ...
 $ tThrough : num  NA NA NA NA 72 85 138 129 161 126 ...
 $ divSIB   : num  NA NA NA 161 220 690 503 585 349 395 ...
 $ divCSIB  : num  NA NA NA NA 223 ...
 $ divRT    : num  NA NA NA 161 237 706 556 632 422 467 ...
 $ divBC    : num  NA NA NA NA 110 152 268 260 263 237 ...
 $ extProp  : num  NA NA NA 0.317 0.359 ...
 $ oriProp  : num  NA NA NA 1 0.536 ...
 $ extPC    : num  NA NA NA NA 0.424 ...
 $ oriPC    : num  NA NA NA NA 0.747 ...
 $ ext3t    : num  NA NA NA NA 0.581 ...
 $ ori3t    : num  NA NA NA NA 0.845 ...
 $ extC3t   : num  NA NA NA NA 0.546 ...
 $ oriC3t   : num  NA NA NA NA NA ...
 $ extGF    : num  NA NA NA NA 0.463 ...
 $ oriGF    : num  NA NA NA NA NA ...
 $ E2f3     : num  NA NA NA NA 0.269 ...
 $ O2f3     : num  NA NA NA NA NA ...
 $ ext2f3   : num  NA NA NA NA 0.313 ...
 $ ori2f3   : num  NA NA NA NA NA ...
 $ samp3t   : num  NA NA NA NA 0.776 ...
 $ sampRange: num  NA NA NA NA 0.764 ...

The output of `divDyn()`

Returns a data.frame, row of values for every bin. Prefixes:

t: taxon counts
div: diversity counts
ext: extinction metrics
ori: origination metrics
samp: sampling metrics

See: ?divDyn for more…

Range-based metrics of diversity dynamics

Range-through diversity

divRT = tSing(FL) + tExt(bL) + tOri(Ft) + tThrough(bt)

Boundary-crosser diversity

divBC = ~~tSing(FL)~~ + tExt(bL) + tOri(Ft) + tThrough(bt)

Per capita rates (M. Foote 1999)

Time homogeneous process in bin, Decay rate
Originally divided by $\Delta t$
Assumption about pulsedness (Michael Foote 2005)

oriPC = -log( tOri / ( tOri + tThrough ))
extPC = -log( tExt / ( tExt + tThrough ))

Plotting preparation

To make plotting easy:

patterns <- merge(stages, dd, by="stg")
str(patterns)

'data.frame':   95 obs. of  43 variables:
 $ stg      : int  1 2 3 4 5 6 7 8 9 10 ...
 $ sys      : chr  "E" "E" "E" "Cm" ...
 $ system   : chr  "Ediacaran" "Ediacaran" "Ediacaran" "Cambrian" ...
 $ series   : chr  "Upper Ediacaran" "Upper Ediacaran" "Upper Ediacaran" "Terreneuvian" ...
 $ stage    : chr  "Avalon Assemblage" "White Sea Assemblage" "Nama Assemblage" "Fortunian" ...
 $ short    : chr  "Ava" "Whs" "Nam" "For" ...
 $ bottom   : num  580 560 550 539 529 ...
 $ mid      : num  570 555 544 534 525 ...
 $ top      : num  560 550 539 529 521 ...
 $ dur      : num  20 10 11.2 9.8 8 6.5 5.5 4.5 4 3.5 ...
 $ systemCol: chr  "#fddd88" "#fddd88" "#fddd88" "#94ac72" ...
 $ seriesCol: chr  "#fddd88" "#fddd88" "#fddd88" "#9db989" ...
 $ col      : chr  "#fcd589" "#fdd587" "#fed583" "#a9be93" ...
 $ t2d      : num  NA NA NA NA 93 121 213 182 173 138 ...
 $ t2u      : num  NA NA NA 93 121 213 182 173 138 94 ...
 $ t3       : num  NA NA NA NA 52 55 72 58 62 31 ...
 $ tPart    : num  NA NA NA NA 15 2 27 13 19 6 ...
 $ tGFd     : num  NA NA NA NA NA 4 4 13 7 6 ...
 $ tGFu     : num  NA NA NA NA 1 10 12 8 10 10 ...
 $ tSing    : num  NA NA NA 51 47 371 166 238 83 129 ...
 $ tOri     : num  NA NA NA 110 80 183 122 134 76 101 ...
 $ tExt     : num  NA NA NA NA 38 67 130 131 102 111 ...
 $ tThrough : num  NA NA NA NA 72 85 138 129 161 126 ...
 $ divSIB   : num  NA NA NA 161 220 690 503 585 349 395 ...
 $ divCSIB  : num  NA NA NA NA 223 ...
 $ divRT    : num  NA NA NA 161 237 706 556 632 422 467 ...
 $ divBC    : num  NA NA NA NA 110 152 268 260 263 237 ...
 $ extProp  : num  NA NA NA 0.317 0.359 ...
 $ oriProp  : num  NA NA NA 1 0.536 ...
 $ extPC    : num  NA NA NA NA 0.424 ...
 $ oriPC    : num  NA NA NA NA 0.747 ...
 $ ext3t    : num  NA NA NA NA 0.581 ...
 $ ori3t    : num  NA NA NA NA 0.845 ...
 $ extC3t   : num  NA NA NA NA 0.546 ...
 $ oriC3t   : num  NA NA NA NA NA ...
 $ extGF    : num  NA NA NA NA 0.463 ...
 $ oriGF    : num  NA NA NA NA NA ...
 $ E2f3     : num  NA NA NA NA 0.269 ...
 $ O2f3     : num  NA NA NA NA NA ...
 $ ext2f3   : num  NA NA NA NA 0.313 ...
 $ ori2f3   : num  NA NA NA NA NA ...
 $ samp3t   : num  NA NA NA NA 0.776 ...
 $ sampRange: num  NA NA NA NA 0.764 ...

Plotting - empty canvas

tsplot(stages, boxes="sys", shading="sys", xlim=4:95, ylim=c(0,4000), 
    ylab="Richness")

Plotting `divRT`

lines(patterns$mid, patterns$divRT)

Plotting `divBC`

lines(patterns$mid, patterns$divBC, col="blue")

Plotting rates (canvas)

tsplot(stages, boxes="sys", shading="sys", xlim=4:95, ylim=c(0,1.5), 
    ylab="Taxonomic rates")

Plotting rates (lines)

lines(patterns$mid, patterns$oriPC, col="green")
lines(patterns$mid, patterns$extPC, col="red")

Plotting rates (legend)

legend("topleft", bg="white", legend=c("origination", "extinction"), 
    col=c("green", "red"), lwd=2, inset=c(0.05,0.01), cex=1.3)

Can you name the main events and patterns?

Range-based rates - main features of results

Decline of rates
Mass extinctions + (rebounds)
Edge effects

Issues

Smearing!
(Signor III and Lipps 1982) effect
More severe with worse sampling
Pull of the Recent

Occurrence record-based metrics

“occurrence-based”

Sampled-in-Bin diversity

divSIB
The number of sampled taxa in the focal bin.
Very vulnerable to the sampling universe effect

Plotting SIB diversity (previous)

Previous code:

tsplot(stages, boxes="sys", shading="sys", xlim=4:95, ylim=c(0,4000), 
    ylab="Richness")
lines(patterns$mid, patterns$divRT)
lines(patterns$mid, patterns$divBC, col="blue")

Plotting SIB diversity (line)

The SIB diversity:

lines(patterns$mid, patterns$divSIB, col="gray")

Plotting SIB diversity (legend)

The SIB diversity:

legend("topleft", bg="white", legend=c("divRT", "divBC", "divSIB"), 
    col=c("black", "blue", "gray"), lwd=2, inset=c(0.05,0.01), cex=1.3)

Three-timer sampling completeness

samp3T = t3 / (t3 + tPart)
$[0,1]$ - How well-sampled is the interval?

Plotting `samp3t` (canvas)

tsplot(stages, boxes="sys", shading="sys", xlim=4:95, ylim=c(0,1), 
    ylab="Sampling completeness")

Plotting `samp3t` (line)

lines(patterns$mid, patterns$samp3t)

Corrected SIB diversity (`divCSIB`)

Use samp3t to correct divSIB
divCSIV = divSIB / samp3t * samp3tTotal

Plotting CSIB diversity (previous)

tsplot(stages, boxes="sys", shading="sys", xlim=4:95, ylim=c(0,4000), 
    ylab="Richness")
lines(patterns$mid, patterns$divRT)
lines(patterns$mid, patterns$divBC, col="blue")
lines(patterns$mid, patterns$divSIB, col="gray")

Plotting CSIB diversity (line)

lines(patterns$mid, patterns$divCSIB, col="purple")

Plotting CSIB diversity (legend)

legend("topleft", bg="white", legend=c("divRT", "divBC", "divSIB", "divCSIB"), 
    col=c("black", "blue", "gray", "purple"), lwd=2, inset=c(0.05,0.01), cex=1.3)

Diversity comparison

Range-based values are higher and less volatile (smoother)
Face-value results: divRT
Truest method: divCSIB (with subsampling)

Occurrence record-based rates

Purpose: get rid of the smearing effects
Approximate the per capita rates, when sampling is complete

Three-timer rates (Alroy 1996)
Corrected three-timer rates (Alroy 2008)
Gap-filler rates (Alroy 2014)
Second-for-third substitution rates (Alroy 2015)
- Wider moving windows, more complex patterns

Three-timer rates `3t`

Per capita rates in a 3-bin window.

Taxon patterns:

Lower two-timers (t2d)
Upper two-timers (t2u)
Three-timer (t3)

(Alroy 1996)

ext3t = log( t2d / t3)
ori3t = log( t2u / t3)

Corrected Three-timer rates `C3t`

Improvement presented by Alroy (2008)

extC3t = log( t2d / t3 ) + log( samp3t[i+1] )
oriC3t = log( t2u / t3 ) - log( samp3t[i-1] )

Gap-filler `GF` and `2f3` rates

Gap filler equations (Alroy 2014): $extC3t = log((t2d + tPart)/(t3 + tPart + tGFu))$
$oriC3t = log((t2u + tPart)/(t3 + tPart + tGFd))$

Improvements in (Alroy 2015): second-for-third substituion rates. These are algorithmic solutions. These are the state-of-the-art rate metrics of this kind:

ext2f3
ori2f3

Comparison of extinction rates (canvas)

tsplot(stages, boxes="sys", shading="sys", xlim=4:95, ylim=c(0,1.5), 
    ylab="Extinction rates")

Comparison of extinction rates (lines)

lines(patterns$mid, patterns$ext3t, col="black")
lines(patterns$mid, patterns$extC3t, col="orange")
lines(patterns$mid, patterns$extGF, col="cyan")
lines(patterns$mid, patterns$ext2f3, col="darkred")

Comparison of extinction rates (legend)

legend("topleft", bg="white", legend=c("3t", "C3t", "GF", "2f3"), 
    col=c("black", "orange", "cyan", "darkred"), lwd=2, inset=c(0.05,0.01), cex=1.3)

Rate types - comparison

Different metrics converge, but they vary!
No edge or smearing effects with the occurrence-based ones
They omit lots of data, very unprecise: negative values!

Rate General guidelines

Use

PC: General assessment/raw patterns
2f3: When possible, rate estimation at MEs

Fig. from Kocsis et al. (2019)

Subsampling of samples in divDyn

Why?

Comparing samples with different quantity of information
What would our results look like if we had the same quantity of information?

Miller and Foote (1996)

Two main types

Classical Rarefaction (CR)

Analytical and algorithmic approach
target: a particular number of records
By-list subsampling

Shareholder Quorum Subsampling (SQS)

Based on coverage
Target is a certain level of quorum (coverage)

Algorithmic solution: SQS (Alroy 2010)
Analytical solution: Coverage-based rarefaction (Chao and Jost 2012)

Implementation in divDyn

Subsampling for one sample is the first step
Single sample implementation on the Evolv-ED

Subsampling with Phanerozoic Marine Animals

Sampling’s effects (occurrences)

cor.test(sampling$occs, patterns$divCSIB, method="spearman")


    Spearman's rank correlation rho

data:  sampling$occs and patterns$divCSIB
S = 26994, p-value < 2.2e-16
alternative hypothesis: true rho is not equal to 0
sample estimates:
      rho 
0.7777997

Massive correlation between sampling and diversity in a bin!

Sampling’s effects (collections)

cor.test(sampling$colls, patterns$divCSIB, method="spearman")


    Spearman's rank correlation rho

data:  sampling$colls and patterns$divCSIB
S = 52221, p-value = 4.471e-09
alternative hypothesis: true rho is not equal to 0
sample estimates:
      rho 
0.5701468

Massive correlation between sampling and diversity in a bin!

Subsampling in a time-bin context

Kocsis et al. (2019)

Subsample every bin
Calculate a statistic in mind - run a function on the data, e.g. divDyn()
Store the result
Iterate 1-3 a number of times

CR with Phanerozoic-scale data

crStg<-subsample(dat, bin="stg", tax="clgen", coll="collection_no", q=1000, 
    iter=5, duplicates=FALSE, na.rm=TRUE)

Arguments

arg	meaning
`q`	Subsampling quota
`iter`	Number of iterations
`duplicates`	Multiple records of the same genus in a list?
`coll`	What is a list (for `duplicates`)
`na.rm`	Omit `NA` bin assignments automatically

CR with Phanerozoic-scale data (canvas)

Prepartion for plotting

cr <- merge(stages, crStg, by="stg")

Timescale plot

tsplot(stages, boxes="sys", shading="sys", xlim=4:95, ylim=c(0,4000), 
    ylab="Richness")

Raw diversity curves (RT and CSIB)

lines(patterns$mid, patterns$divRT)
lines(patterns$mid, patterns$divCSIB, col="purple")

Adding subsampling results

lines(cr$mid, cr$divRT, lty=2)
lines(cr$mid, cr$divCSIB, col="purple", lty=2)

Did it solve the problem?

Not entirely…

No subsampling

cor.test(sampling$colls, patterns$divCSIB, method="spearman")


    Spearman's rank correlation rho

data:  sampling$colls and patterns$divCSIB
S = 52221, p-value = 4.471e-09
alternative hypothesis: true rho is not equal to 0
sample estimates:
      rho 
0.5701468

CR subsampling

cor.test(sampling$colls, cr$divCSIB, method="spearman")


    Spearman's rank correlation rho

data:  sampling$colls and cr$divCSIB
S = 74214, p-value = 0.0001504
alternative hypothesis: true rho is not equal to 0
sample estimates:
      rho 
0.3891114

… but it is better!

Occurrence weighted By-list subsampling

Another version of rarefaction
Keeps lists (collections) intact
People call it $O^xW$
Largely superseeded by SQS

arg	use
`type`	the type, here `oxw`
`xexp=0`	q: coll No.
`xexp=1`	q: occ No.

oxwStg <- subsample(dat, bin="stg", tax="clgen", coll="collection_no", 
    q=200, type="oxw", xexp=0, iter=10, duplicates=FALSE, na.rm=TRUE)

Issues with plain CR and by-list

sampling of rare genera is disproportionately low
decrease in curve variance as the quota decreases (1000, 800, 600, 400)

Shareholder quorum subsampling

Aims to get a certain quorum of the sample
Algorithmic solution (c.f. Coverage-based rarefaction, which is the analytical solution)

sqsStg<-subsample(dat, bin="stg", tax="clgen", 
    coll="collection_no", q=0.7, iter=10, 
    ref="reference_no",singleton="ref", type="sqs", duplicates=FALSE, 
    excludeDominant=TRUE, largestColl =TRUE, na.rm=TRUE)

arg	meaning
q	target quorum
type	subsamplinng type
ref	reference columns
singleton	What is a singleton based on?
excludeDominant	tiny correction by Alroy (2010)
largestColl	tiny correction by Alroy (2010)

Prepare to plot

Merge with timescale

sqs <- merge(stages, sqsStg, by="stg")

Timescale

tsplot(stages, boxes="sys", shading="sys", xlim=4:95, ylim=c(0,4000), 
    ylab="Richness")

SQS vs CR vs raw

lines(patterns$mid, patterns$divCSIB, col="purple")
lines(cr$mid, cr$divCSIB, col="purple", lty=2)
lines(sqs$mid, sqs$divCSIB, col="purple", lty=3)

Guidelines

Note

Use subsampling to test the robustness of your results!
Use as high iter as necessary to stabilize the results (usually above 100)
Can be used to do rates as well! (use CR!)

Environmental affinities

Does the taxon like env. A. or B?
Raw occurreence counts are misleading
First row: equal sampling
Second row: unequal
sampling (shifts occurrence balance)
divDyn::affinity() -> ?affinity

The coral example

Details

Material available online on Evolv-ED
Go through this at your own pace
Shout if you have any issues!

References

Alroy, John. 1996. “Constant Extinction, Constrained Diversification, and Uncoordinated Stasis in North American Mammals.” Palaeogeography, Palaeoclimatology, Palaeoecology 127: 285–311.

———. 2008. “Dynamics of Origination and Extinction in the Marine Fossil Record.” Proceedings of the National Academy of Science 105: 11536–42.

———. 2010. “The Shifting Balance of Diversity Among Major Marine Animal Groups.” Science 329: 1191–94. https://doi.org/10.1126/science.1189910.

———. 2014. “Accurate and Precise Estimates of Origination and Extinction Rates.” Paleobiology 40 (3): 374–97. https://doi.org/10.1666/13036.

———. 2015. “A More Precise Speciation and Extinction Rate Estimator.” Paleobiology 41 (04): 633–39. https://doi.org/10.1017/pab.2015.26.

Chao, Anne, and Lou Jost. 2012. “Coverage-Based Rarefaction and Extrapolation: Standardizing Samples by Completeness Rather Than Size.” Ecology 93 (12): 2533–47. https://doi.org/10.1890/11-1952.1.

Foote, M. 1999. “Morphological Diversity In The Evolutionary Radiation Of Paleozoic and Post-Paleozoic Crinoids.” Paleobiology 25 (S2): 1–115. https://doi.org/10.1017/S0094837300020236.

Foote, Michael. 2005. “Pulsed Origination and Extinction in the Marine Realm.” Paleobiology 31 (1): 6–20.

Gradstein, Felix M, James George Ogg, and Mark D Schmitz. 2020. The Geologic Time Scale 2020.

Kocsis, Ádám T., Carl J. Reddin, John Alroy, and Wolfgang Kiessling. 2019. “The R Package divDyn for Quantifying Diversity Dynamics Using Fossil Sampling Data.” Methods in Ecology and Evolution 10 (5): 735–43. https://doi.org/10.1111/2041-210X.13161.

MacArthur, Robert, and Edward O. Wilson. 1963. “An Equilibrium Theroy of Insular Zoogeography.” Evolution 17 (4): 373–87.

Miller, Arnold I., and Michael Foote. 1996. “Calibrating the Ordovician Radiation of Marine Life: Implications for Phanerozoic Diversity Trends.” Paleobiology 22 (2): 304–9.

Na, Lin, and Wolfgang Kiessling. 2015. “Diversity Partitioning During the Cambrian Radiation.” Proceedings of the National Academy of Sciences 112 (15): 4702–6. https://doi.org/10.1073/pnas.1424985112.

Sepkoski, J. John. 1997. “Biodiversity: Past, Present, and Future.” Journal of Paleontology 71 (4): 533–39. https://doi.org/10.1017/S0022336000040026.

Sepkoski, J. John, Jr. 1978. “A Kinetic Model of Phanerozoic Taxonomic Diversity I. Analysis of Marine Orders.” Paleobiology 4: 223–51. https://doi.org/10.2307/2400203.

Signor III, P. W., and J. H. Lipps. 1982. “Sampling Bias, Gradual Extinction Patterns and Catastrophesin the Fossil Record.” Geological Society of America Special Paper 190: 291–96.

Introduction to divDyn - Phanerozoic diversity

Note!

Background

What is this?

Diversity over time?

The divDyn project

Getting started

Resources

Installing and loading

Binning

Example time scales

47-bin, 10my time scale

Example time scales

95-bin stages

Required data - Interface conventions

Phanerozoic Marine Animals

About

Downloading the data

The chronosphere

The chronosphere

Downloading the PBDB

Filtering - I. Taxonomy: quality

Strategy

Filtering - I. Taxonomy: phyla

Filtering - I. Taxonomy: classes

Filtering - I. Taxonomy: more

Filtering - I. Taxonomy: execute!

Filtering - II. Environment

Filtering - II. Environment

Omitting unlithified sediments

Stratigraphic binning (stages): two solutions

95-bin stages (stg)

Stratigraphic binning (stages): two solutions

PBDB time-bins (stb)

Stratigraphic binning

Stage number lookup table

Combine/categorize entries

We need a single bin entry

What are the cases?

Deriving a single stage number

Result

Extra cases /1

The Cambrian

Extra cases /2

The Ordovician

The PBDB’s internal ‘time bins’ (stb)

Example solution

The PBDB’s internal ‘time bins’ (stb)

The PBDB’s internal ‘time bins’ (stb)

We are done with the preparation

Basic sampling patterns

Sampling

The result of binstat()

Unifying the output

Plotting

The number of records

The number of collections

Adding a legend

Sampling is heterogeneous

Main function: divDyn()

The output of divDyn()

Range-based metrics of diversity dynamics

Range-through diversity

Boundary-crosser diversity

Per capita rates (M. Foote 1999)

Plotting preparation

Plotting - empty canvas

Plotting divRT

Plotting divBC

Plotting rates (canvas)

Plotting rates (lines)

Plotting rates (legend)

Range-based rates - main features of results

Issues

Occurrence record-based metrics

Sampled-in-Bin diversity

Plotting SIB diversity (previous)

Plotting SIB diversity (line)

Plotting SIB diversity (legend)

Three-timer sampling completeness

95-bin stages (`stg`)

PBDB time-bins (`stb`)

The PBDB’s internal ‘time bins’ (`stb`)

The PBDB’s internal ‘time bins’ (`stb`)

The PBDB’s internal ‘time bins’ (`stb`)

The result of `binstat()`

Main function: `divDyn()`

The output of `divDyn()`

Plotting `divRT`

Plotting `divBC`

Plotting `samp3t` (canvas)

Plotting `samp3t` (line)

Corrected SIB diversity (`divCSIB`)

Three-timer rates `3t`

Corrected Three-timer rates `C3t`

Gap-filler `GF` and `2f3` rates