Introduction to divDyn - Phanerozoic diversity

Ádám T. Kocsis

FAU GeoZentrum Nordbayern

Background

What is this?

Short for Diversity Dynamics
Background in island biogeography (MacArthur and Wilson 1963)
Diversity, extinction and origination metrics
Variables time series - to be analyzed further!

Alroy (2008) - PNAS

Diversity over time?

Overall increase?
Saturation?
Equilibrial dynamics?

J. John Sepkoski Jr. (1978)

The divDyn project

Estimating metrics has been a problem - everybody had different scripts
Implementing sampling standardization
Wrote basics for my PhD

The project

R hexsticker

https://github.com/divDyn

Getting started

Resources

Paper: Kocsis et al. (2019)
CRAN: https://cran.r-project.org/package=divDyn
Webpage: https://divdyn.github.io/r-package/
Handout: library(divDyn); vignette("handout")
Evolv-ED blog: https://evolv-ed.net (in-development)
Phanerozoic rates (SOM): https://github.com/divDyn/ddPhanero

Installing and loading

Package (v0.8.2) can be installed from CRAN.

install.packages("divDyn")

Attach the package with:

library(divDyn)

Binning

divDyn operates with discrete-time
Requires temporal/stratigraphic binning
One value/time bin
A timescale
Always the first step of the analysis!

Example time scales

47-bin, 10my time scale

data(tens, package="divDyn")
str(tens)

'data.frame':   49 obs. of  9 variables:
 $ X10    : chr  "Cambrian 1" "Cambrian 2" "Cambrian 3" "Cambrian 4" ...
 $ Ocean  : chr  "ar1" "cc1" "cc1" "cc1" ...
 $ Ocean2 : chr  "ar1" "" "" "" ...
 $ Climate: chr  "w" "w" "w" "w" ...
 $ bottom : num  542 532 513 501 490 ...
 $ mid    : num  537 522 507 495 483 ...
 $ top    : num  532 513 501 490 479 ...
 $ dur    : num  10 19 12 11 11.4 ...
 $ ten    : int  1 2 3 4 5 6 7 8 9 10 ...

Old PBDB logo (Sydney)

Example time scales

95-bin stages

data(stages, package="divDyn")
str(stages)

'data.frame':   95 obs. of  13 variables:
 $ sys      : chr  "E" "E" "E" "Cm" ...
 $ system   : chr  "Ediacaran" "Ediacaran" "Ediacaran" "Cambrian" ...
 $ series   : chr  "Upper Ediacaran" "Upper Ediacaran" "Upper Ediacaran" "Terreneuvian" ...
 $ stage    : chr  "Avalon Assemblage" "White Sea Assemblage" "Nama Assemblage" "Fortunian" ...
 $ short    : chr  "Ava" "Whs" "Nam" "For" ...
 $ bottom   : num  580 560 550 539 529 ...
 $ mid      : num  570 555 544 534 525 ...
 $ top      : num  560 550 539 529 521 ...
 $ dur      : num  20 10 11.2 9.8 8 6.5 5.5 4.5 4 3.5 ...
 $ stg      : int  1 2 3 4 5 6 7 8 9 10 ...
 $ systemCol: chr  "#fddd88" "#fddd88" "#fddd88" "#94ac72" ...
 $ seriesCol: chr  "#fddd88" "#fddd88" "#fddd88" "#9db989" ...
 $ col      : chr  "#fcd589" "#fdd587" "#fed583" "#a9be93" ...

Based on: Gradstein, Ogg, and Schmitz (2020)

Required data - Interface conventions

x: occurrence data.frame to be analyzed
bin: column name of bin numbers (integer)
tax: column name of taxon entries (character)
coll: column name of sampling units (optional, character)
age: column name of numeric ages (optional, numeric)

Phanerozoic Marine Animals

About

The example was used as an application example for the divDyn paper
Was supposed to have a similar scope as the results of (J. John Sepkoski 1997) and (Alroy 2010)
The analyses are done at the genus level!

Stratigraphic assignments
Sampling patterns
Richness estimators
Taxonomic rates
Subsampling

More info: https://github.com/divDyn/ddPhanero

Downloading the data

We need almost the whole PBDB.

API call to produce downlad:

https://paleobiodb.org/data1.2/occs/list.csv?datainfo&rowcount&interval=Ediacaran,Holocene&show=class,classext,genus,subgenus,coll,coords,loc,paleoloc,strat,stratext,lith,env,ref,crmod,timebins,timecompare,refattr,entname,attr,ident,img,plant,abund,ecospace,taphonomy,etbasis,pres,prot,lithext,geo,methods,resgroup,refattr,ent

takes about 30 minutes…
2GB+ .csv file (read.csv())
or…

The chronosphere

API for research data acquisition (analytical code!)
Versioned data entries
PBDB from July 11, 2024 (time-bins are not working ATM!)

Install from CRAN

install.packages("chronosphere")

Load package

The chronosphere

Library to access static data object
Instantiated object = datafile (remote URL) + import code (dependencies)
Time series of data products

Downloading the PBDB

Actual download:

# Download data (July 11, 2024)
dat <- chronosphere::fetch("pbdb", ver="20240711")

# The most up-to-date version
# dat <- chronosphere::fetch("pbdb")

# to a given directory 
# dat <- chronosphere::fetch("pbdb", ver="20240711", datadir="<yourdir>")

Takes about 1 minute, ~130MB download.

After done, check if structure is ok!

str(dat)

'data.frame':   1611424 obs. of  147 variables:
 $ occurrence_no      : int  1 2 3 4 5 7 8 9 10 11 ...
 $ record_type        : chr  "occ" "occ" "occ" "occ" ...
 $ reid_no            : int  NA NA NA NA NA NA NA NA NA NA ...
 $ flags              : chr  "" "" "" "" ...
 $ collection_no      : int  1 1 1 2 3 4 4 1 5 5 ...
 $ identified_name    : chr  "Australosutura llanoensis" "Carbonocoryphe planucauda" "Thigriffides roundyi" "Pudoproetus chappelensis" ...
 $ identified_rank    : chr  "species" "species" "species" "species" ...
 $ identified_no      : int  349412 349526 349420 349411 349411 349411 349420 349411 349411 349668 ...
 $ difference         : chr  "" "recombined as" "" "" ...
 $ accepted_name      : chr  "Australosutura llanoensis" "Phillibole planucauda" "Thigriffides roundyi" "Pudoproetus chappelensis" ...
 $ accepted_attr      : logi  NA NA NA NA NA NA ...
 $ accepted_rank      : chr  "species" "species" "species" "species" ...
 $ accepted_no        : int  349412 349526 349418 349410 349410 349410 349418 349410 349410 349664 ...
 $ early_interval     : chr  "Ivorian" "Ivorian" "Ivorian" "Ivorian" ...
 $ late_interval      : chr  "" "" "" "" ...
 $ max_ma             : num  354 354 354 354 354 ...
 $ min_ma             : num  347 347 347 347 347 ...
 $ ref_author         : chr  "Brezinski" "Brezinski" "Brezinski" "Brezinski" ...
 $ ref_pubyr          : int  1998 1998 1998 1998 1998 1998 1998 1998 1998 1998 ...
 $ reference_no       : num  1 1 1 1 1 1 1 1 1 1 ...
 $ phylum             : chr  "Arthropoda" "Arthropoda" "Arthropoda" "Arthropoda" ...
 $ phylum_no          : chr  "18891" "18891" "18891" "18891" ...
 $ class              : chr  "Trilobita" "Trilobita" "Trilobita" "Trilobita" ...
 $ class_no           : chr  "19100" "19100" "19100" "19100" ...
 $ order              : chr  "Proetida" "Proetida" "Proetida" "Proetida" ...
 $ order_no           : chr  "21062" "21062" "21062" "21062" ...
 $ family             : chr  "Brachymetopidae" "Phillipsiidae" "Phillipsiidae" "Proetidae" ...
 $ family_no          : chr  "56732" "85866" "85866" "56726" ...
 $ genus              : chr  "Australosutura" "Archegonus (Phillibole)" "Thigriffides" "Pudoproetus" ...
 $ genus_no           : chr  "21084" "21075" "21387" "177081" ...
 $ subgenus_no        : int  NA 21306 NA NA NA NA NA NA NA NA ...
 $ collection_name    : chr  "USNM 9047, Jack Sloan Ranch" "USNM 9047, Jack Sloan Ranch" "USNM 9047, Jack Sloan Ranch" "USNM 9048" ...
 $ collection_subset  : int  NA NA NA NA NA NA NA NA NA NA ...
 $ collection_aka     : chr  "" "" "" "" ...
 $ lng                : num  -99 -99 -99 -98.1 -98.1 ...
 $ lat                : num  31.2 31.2 31.2 31 31 ...
 $ cc                 : chr  "US" "US" "US" "US" ...
 $ state              : chr  "Texas" "Texas" "Texas" "Texas" ...
 $ county             : chr  "San Saba" "San Saba" "San Saba" "" ...
 $ latlng_basis       : chr  "based on nearby landmark" "based on nearby landmark" "based on nearby landmark" "" ...
 $ latlng_precision   : chr  "2" "2" "2" "1" ...
 $ geogscale          : chr  "outcrop" "outcrop" "outcrop" "" ...
 $ geogcomments       : chr  "800 to 900 feet SSE of an abandoned well and just SE of a stock pen on the Jack Sloan ranch. The pen and abando"| __truncated__ "800 to 900 feet SSE of an abandoned well and just SE of a stock pen on the Jack Sloan ranch. The pen and abando"| __truncated__ "800 to 900 feet SSE of an abandoned well and just SE of a stock pen on the Jack Sloan ranch. The pen and abando"| __truncated__ "Near San Saba, TX" ...
 $ paleomodel         : chr  "gplates" "gplates" "gplates" "gplates" ...
 $ geoplate           : chr  "101" "101" "101" "101" ...
 $ paleoage           : chr  "mid" "mid" "mid" "mid" ...
 $ paleolng           : num  -65.7 -65.7 -65.7 -65.6 -65.6 ...
 $ paleolat           : num  -26.7 -26.7 -26.7 -27.5 -27.5 ...
 $ formation          : chr  "Chappel Limestone" "Chappel Limestone" "Chappel Limestone" "Chappel Limestone" ...
 $ stratgroup         : chr  "" "" "" "" ...
 $ member             : chr  "" "" "" "" ...
 $ formation.1        : chr  "Chappel Limestone" "Chappel Limestone" "Chappel Limestone" "Chappel Limestone" ...
 $ stratgroup.1       : chr  "" "" "" "" ...
 $ member.1           : chr  "" "" "" "" ...
 $ stratscale         : chr  "bed" "bed" "bed" "" ...
 $ zone               : chr  "" "" "" "S. isostichia - crenulata through S. anchoralis - latus" ...
 $ localsection       : chr  "" "" "" "" ...
 $ localbed           : chr  "" "" "" "" ...
 $ localbedunit       : chr  "" "" "" "" ...
 $ localorder         : chr  "" "" "" "" ...
 $ regionalsection    : chr  "" "" "" "" ...
 $ regionalbed        : chr  "" "" "" "" ...
 $ regionalbedunit    : chr  "" "" "" "" ...
 $ regionalorder      : chr  "" "" "" "" ...
 $ stratcomments      : chr  "Chappel Limestone is a condensed one-meter thick unit spanning the S. isostichia-crenulata through Gnathodus ty"| __truncated__ "Chappel Limestone is a condensed one-meter thick unit spanning the S. isostichia-crenulata through Gnathodus ty"| __truncated__ "Chappel Limestone is a condensed one-meter thick unit spanning the S. isostichia-crenulata through Gnathodus ty"| __truncated__ "Late Kinderhookian and Osagean" ...
 $ lithdescript       : chr  "" "" "" "" ...
 $ lithology1         : chr  "\"limestone\"" "\"limestone\"" "\"limestone\"" "\"carbonate\"" ...
 $ lithadj1           : chr  "" "" "" "" ...
 $ lithification1     : chr  "lithified" "lithified" "lithified" "" ...
 $ minor_lithology1   : chr  "" "" "" "" ...
 $ fossilsfrom1       : chr  "" "" "" "" ...
 $ lithology2         : chr  "" "" "" "" ...
 $ lithadj2           : chr  "" "" "" "" ...
 $ lithification2     : chr  "" "" "" "" ...
 $ minor_lithology2   : chr  "" "" "" "" ...
 $ fossilsfrom2       : chr  "" "" "" "" ...
 $ primary_reference  : chr  "D. K. Brezinski. 1998. Trilobites from the Lower Mississippian starved basin facies of the southern United Stat"| __truncated__ "D. K. Brezinski. 1998. Trilobites from the Lower Mississippian starved basin facies of the southern United Stat"| __truncated__ "D. K. Brezinski. 1998. Trilobites from the Lower Mississippian starved basin facies of the southern United Stat"| __truncated__ "D. K. Brezinski. 1998. Trilobites from the Lower Mississippian starved basin facies of the southern United Stat"| __truncated__ ...
 $ created            : chr  "1998-11-20 07:59:51" "1998-11-20 07:59:51" "1998-11-20 07:59:51" "2001-03-26 11:06:31" ...
 $ modified           : chr  "2018-02-02 16:22:36" "2018-02-02 16:22:36" "2018-02-02 16:22:36" "2000-11-30 12:09:35" ...
 $ time_bins          : chr  "Tournaisian" "Tournaisian" "Tournaisian" "Tournaisian" ...
 $ time_contain       : chr  "Tournaisian" "Tournaisian" "Tournaisian" "Tournaisian" ...
 $ time_major         : chr  "Tournaisian" "Tournaisian" "Tournaisian" "Tournaisian" ...
 $ time_buffer        : chr  "Tournaisian" "Tournaisian" "Tournaisian" "Tournaisian" ...
 $ time_overlap       : chr  "Tournaisian" "Tournaisian" "Tournaisian" "Tournaisian" ...
 $ authorizer         : chr  "J. Alroy" "J. Alroy" "J. Alroy" "W. Kiessling" ...
 $ enterer            : chr  "M. Sommers" "M. Sommers" "M. Sommers" "W. Kiessling" ...
 $ modifier           : chr  "M. Clapham" "M. Clapham" "M. Clapham" "" ...
 $ primary_name       : chr  "Australosutura" "Carbonocoryphe" "Thigriffides" "Pudoproetus" ...
 $ primary_reso       : chr  "" "" "" "" ...
 $ subgenus_name      : chr  "" "" "" "" ...
 $ subgenus_reso      : chr  "" "" "" "" ...
 $ species_name       : chr  "llanoensis" "planucauda" "roundyi" "chappelensis" ...
 $ species_reso       : chr  "" "" "" "" ...
 $ image_no           : int  1746 1746 1746 1746 1746 1746 1746 1746 1746 1746 ...
 $ plant_organ        : chr  "" "" "" "" ...
 $ plant_organ2       : chr  "" "" "" "" ...
 $ abund_value        : chr  "4" "1" "2" "6" ...
 $ abund_unit         : chr  "specimens" "specimens" "specimens" "specimens" ...
 $ taxon_environment  : chr  "marine" "marine" "marine" "marine" ...
  [list output truncated]
 - attr(*, "chronosphere")=List of 13
  ..$ src         : chr "pbdb"
  ..$ ser         : chr "occs3"
  ..$ res         : chr "species"
  ..$ ver         : int 20240711
  ..$ datafile    : chr "pbdb_occs.rds"
  ..$ item        : int 2503
  ..$ reference   : chr [1:2] "Please remember to acknowledge the Paleobiology Database (http://paleobiodb.org) in your publication." "Alternatively, you can use: The Paleobiology Database Community. (year). Denormalized occurrence table of the P"| __truncated__
  ..$ bibtex      : chr [1:2] "NA" "NA"
  ..$ downloadDate: POSIXct[1:1], format: "2024-08-09 06:19:21"
  ..$ productDate : chr "2024-07-11"
  ..$ infoURL     : chr "https://chronosphere.info/data/pbdb/occs3/"
  ..$ API         : chr "https://paleobiodb.org/data1.2/occs/list.csv?datainfo&rowcount&interval=Ediacaran,Holocene&show=class,classext,"| __truncated__
  ..$ additional  : list()

Filtering - I. Taxonomy: quality

Get rid of unprecise entries!

# filter records not identified at least to genus
dat <-dat[dat$accepted_rank %in% c("genus", "species"),]

# omit non-informative genus entries
dat <- dat[dat$genus!="", ]

# Monitor data loss:
nrow(dat)

[1] 1435105

Strategy

Goal: to match the original scope of Sepkoski and Alroy
Conservative inclusion at high levels (whole group is to be included)
Include particular lower level entries
Dirty…

Filtering - I. Taxonomy: phyla

marineNoPlant <- c("",
    "Agmata",
    "Annelida",
    "Bilateralomorpha",
    "Brachiopoda",
    "Bryozoa",
    "Calcispongea",
    "Chaetognatha",
    "Cnidaria",
    "Ctenophora",
    "Echinodermata",
    "Entoprocta",
    "Foraminifera",
    "Hemichordata",
    "Hyolitha",
    "Mollusca",
    "Nematoda",
    "Nematomorpha",
    "Nemertina",
    "Onychophora",
    "Petalonamae",
    "Phoronida",
    "Platyhelminthes",
    "Porifera",
    "Problematica",
    "Rhizopodea",
    "Rotifera",
    "Sarcomastigophora",
    "Sipuncula",
    "Uncertain",
    "Vetulicolia",
    ""
)

# which rows?
bByPhyla <- dat$phylum%in% marineNoPlant

Filtering - I. Taxonomy: classes

#B. classes
#   levels(factor(noNeed$class))
needClass <- c(
    "Acanthodii",
    "Actinopteri",
    "Actinopterygii",
    "Agnatha",
    "Cephalaspidomorphi",
    "Chondrichthyes",
    "Cladistia",
    "Coelacanthimorpha",
    "Conodonta",
    "Galeaspida",
    "Myxini",
    "Osteichthyes",
    "Petromyzontida",
    "Plagiostomi",
    "Pteraspidomorphi",
    # here come the Arthropods
    "Artiopoda",
    "Branchiopoda",
    "Cephalocarida",
    "Copepoda",
    "Malacostraca",
    "Maxillopoda",
    "Megacheira",
    "Merostomoidea",
    "Ostracoda",
    "Paratrilobita",
    "Pycnogonida",
    "Remipedia",
    "Thylacocephala",
    "Trilobita",
    "Xiphosura"
)

# which rows?
bNeedClass <- dat$class %in% needClass

Filtering - I. Taxonomy: more

#C.  mammals
#   mammals <- dat[dat$class=="Mammalia",]
#   levels(factor(mammals$order))
needMammalOrd <- c("Cetacea", "Sirenia")

# which rows?
bMammalOrder <- dat$order %in% needMammalOrd

# the carnivores
#   carnivores <- dat[dat$order=="Carnivora",]
#   levels(factor(carnivores$family))
needFam <- c("Otariidae", "Phocidae", "Desmatophocidae")

# which rows?
bNeedMamFam<- dat$family%in%needFam

# D. Reptiles
#   reptiles <- dat[dat$class=="Reptilia",]
#   levels(factor(reptiles$order))
needReptOrd<-c(
    "Eosauropterygia",
    "Hupehsuchia",
    "Ichthyosauria",
    "Placodontia",
    "Sauropterygia",
    "Thalattosauria"
)

# which rows?
bRept <- dat$order%in%needReptOrd

# E. turtles 
#   turtles <- dat[dat$order=="Testudines",]
#   levels(factor(turtles$family))

# Chelonioidea turtles
needTurtleFam <- c(
    "Cheloniidae",
    "Protostegidae",
    "Dermochelyidae",
    "Dermochelyoidae",
    "Toxochelyidae",
    "Pancheloniidae"
)

# which rows?
bTurtle <- dat$family%in%needTurtleFam

Filtering - I. Taxonomy: execute!

Subsetting for the rows we highlighted:

# subset the data
dat <- dat[bByPhyla | bNeedClass | bMammalOrder | bNeedMamFam | bRept | bTurtle , ]

Resolve the potential homonymy problem across taxa:

dat$clgen <- paste(dat$class, dat$genus)

Number of remaining records:

nrow(dat)

[1] 1015237

Filtering - II. Environment

Fossils come from different environments:

levels(factor((dat$environment)))

 [1] ""                                    
 [2] "\"channel\""                         
 [3] "\"floodplain\""                      
 [4] "alluvial fan"                        
 [5] "basin reef"                          
 [6] "basinal (carbonate)"                 
 [7] "basinal (siliceous)"                 
 [8] "basinal (siliciclastic)"             
 [9] "carbonate indet."                    
[10] "cave"                                
[11] "channel lag"                         
[12] "coarse channel fill"                 
[13] "coastal indet."                      
[14] "crater lake"                         
[15] "crevasse splay"                      
[16] "deep subtidal indet."                
[17] "deep subtidal ramp"                  
[18] "deep subtidal shelf"                 
[19] "deep-water indet."                   
[20] "delta front"                         
[21] "delta plain"                         
[22] "deltaic indet."                      
[23] "dry floodplain"                      
[24] "dune"                                
[25] "eolian indet."                       
[26] "estuary/bay"                         
[27] "fine channel fill"                   
[28] "fissure fill"                        
[29] "fluvial indet."                      
[30] "fluvial-deltaic indet."              
[31] "fluvial-lacustrine indet."           
[32] "foreshore"                           
[33] "glacial"                             
[34] "interdistributary bay"               
[35] "interdune"                           
[36] "intrashelf/intraplatform reef"       
[37] "karst indet."                        
[38] "lacustrine - large"                  
[39] "lacustrine - small"                  
[40] "lacustrine delta front"              
[41] "lacustrine delta plain"              
[42] "lacustrine deltaic indet."           
[43] "lacustrine indet."                   
[44] "lacustrine prodelta"                 
[45] "lagoonal"                            
[46] "lagoonal/restricted shallow subtidal"
[47] "levee"                               
[48] "loess"                               
[49] "marginal marine indet."              
[50] "marine indet."                       
[51] "mire/swamp"                          
[52] "offshore"                            
[53] "offshore indet."                     
[54] "offshore ramp"                       
[55] "offshore shelf"                      
[56] "open shallow subtidal"               
[57] "paralic indet."                      
[58] "perireef or subreef"                 
[59] "peritidal"                           
[60] "platform/shelf-margin reef"          
[61] "pond"                                
[62] "prodelta"                            
[63] "reef, buildup or bioherm"            
[64] "sand shoal"                          
[65] "shallow subtidal indet."             
[66] "shoreface"                           
[67] "sinkhole"                            
[68] "slope"                               
[69] "slope/ramp reef"                     
[70] "spring"                              
[71] "submarine fan"                       
[72] "tar"                                 
[73] "terrestrial indet."                  
[74] "transition zone/lower shoreface"     
[75] "wet floodplain"

Filtering - II. Environment

We want go get rid of non-marine ones:

omitEnv <- c(
    "\"floodplain\"",
    "alluvial fan",
    "cave",
    "\"channel\"",
    "channel lag" ,
    "coarse channel fill",
    "crater lake",
    "crevasse splay",
    "dry floodplain",
    "delta plain",
    "dune",
    "eolian indet.",
    "fine channel fill",
    "fissure fill",
    "fluvial indet.",
    "fluvial-lacustrine indet.",
    "fluvial-deltaic indet.",
    "glacial",
    "interdune",
    "karst indet.",
    "lacustrine - large",
    "lacustrine - small",
    "lacustrine delta front",
    "lacustrine delta plain",
    "lacustrine deltaic indet.",
    "lacustrine indet.",
    "lacustrine interdistributary bay",
    "lacustrine prodelta",
    "levee",
    "loess",
    "mire/swamp",
    "pond",
    "sinkhole",
    "spring",
    "tar",
    "terrestrial indet.",
    "wet floodplain"
)

Actual omission:

dat<-dat[!dat$environment%in%omitEnv, ]

# remaining
nrow(dat)

[1] 997995

Omitting unlithified sediments

Unusually good preservation (usually in more recent intervals) might cause temporal biases in sampling. To evade this problem, it can be a good idea to omit records from unlithified sediments.

dat <- dat[dat$lithification1!="unlithified",]
nrow(dat)

[1] 928088

Stratigraphic binning (stages): two solutions

95-bin stages (`stg`)

data(stages, package="divDyn")
str(stages)

'data.frame':   95 obs. of  13 variables:
 $ sys      : chr  "E" "E" "E" "Cm" ...
 $ system   : chr  "Ediacaran" "Ediacaran" "Ediacaran" "Cambrian" ...
 $ series   : chr  "Upper Ediacaran" "Upper Ediacaran" "Upper Ediacaran" "Terreneuvian" ...
 $ stage    : chr  "Avalon Assemblage" "White Sea Assemblage" "Nama Assemblage" "Fortunian" ...
 $ short    : chr  "Ava" "Whs" "Nam" "For" ...
 $ bottom   : num  580 560 550 539 529 ...
 $ mid      : num  570 555 544 534 525 ...
 $ top      : num  560 550 539 529 521 ...
 $ dur      : num  20 10 11.2 9.8 8 6.5 5.5 4.5 4 3.5 ...
 $ stg      : int  1 2 3 4 5 6 7 8 9 10 ...
 $ systemCol: chr  "#fddd88" "#fddd88" "#fddd88" "#94ac72" ...
 $ seriesCol: chr  "#fddd88" "#fddd88" "#fddd88" "#9db989" ...
 $ col      : chr  "#fcd589" "#fdd587" "#fed583" "#a9be93" ...

From Ediacaran
Some stages are merged

Based on: Gradstein, Ogg, and Schmitz (2020)

Stratigraphic binning (stages): two solutions

PBDB time-bins (`stb`)

Literal stages of the GTS (99 bins)
Internally done by the PBDB
Using time_bins, time_contain, time_overlap, time_major, time_buffer
Sounds awesome but unstable!

Note

We start with solution 1 to better understand the process.

Based on: Gradstein, Ogg, and Schmitz (2020)

Stratigraphic binning

Stratigraphic attributes belong to collections
Assignment is uncertain
Hundreds of names: standard, regional, historic all entries come from:
- early_interval: obligatory, starting name
- late_interval: facultative, if collection range doesn’t fit in one interval

Stage number lookup table

Relevant information is in the keys object: ?keys

data(keys, package="divDyn")
str(keys,1)

List of 8
 $ tenInt:List of 51
 $ stgInt:List of 84
 $ reefs :List of 3
 $ lith  :List of 3
 $ lat   :List of 3
 $ bath  :List of 3
 $ grain :List of 3
 $ depenv:List of 3

This object contains useful information for processing the PBDB: from the old FossilWorks and from scripts of W. Kiessling

The relevant information is in:

keys$stgInt

$`14`
 [1] "Cressagian"   "Datsonian"    "Deming"       "Demingian"    "Gasconadian" 
 [6] "Mansian"      "Migneintian"  "Pakerort"     "Skullrockian" "Stairsian"   
[11] "Tremadoc"     "Tremadocian"  "Varangu"      "Warendian"   

$`15`
 [1] "Bendigonian"   "Billingen"     "Blackhillsian" "Cassinian"    
 [5] "Chewtonian"    "Floian"        "Hunneberg"     "Jefferson"    
 [9] "Jeffersonian"  "Moridunian"    "Tulean"       

$`16`
[1] "Castlemainian" "Dapingian"     "Dawan"         "Fennian"      
[5] "Klabava"       "Langevoja"     "Rangerian"     "Volkhov"      
[9] "Yapeenian"    

$`17`
 [1] "Abereiddian"      "Aluoja"           "Aseri"            "Chazy"           
 [5] "Chazyan"          "Darriwilian"      "Dobrotiva"        "Dobrotivian"     
 [9] "Early Llandeilo"  "Early Llanvirn"   "Hulo"             "Hunderum"        
[13] "Kunda"            "Lasnamagi"        "Late Llandeilo"   "Late Llanvirn"   
[17] "Llandeilian"      "Llandeilo"        "Llanvirn"         "Llanvirnian"     
[21] "Middle Llandeilo" "Oretanian"        "Sarka"            "Uhaku"           
[25] "Valaste"          "Whiterock"        "Wildernessian"   

$`18`
 [1] "Ashbyan"      "Aurelucian"   "Black River"  "Blackriveran" "Burrellian"  
 [6] "Costonian"    "Gisbornian"   "Haljala"      "Harnagian"    "Idavere"     
[11] "Johvi"        "Kirkfieldian" "Kukruse"      "Mohawkian"    "Rocklandian" 
[16] "Sandbian"     "Soudleyan"    "Turinian"    

$`19`
 [1] "Actonian"      "Bohdalec"      "Cautleyan"     "Chatfieldian" 
 [5] "Cheneyan"      "Eastonian"     "Edenian"       "Katian"       
 [9] "Kralodvorian"  "Marshbrookian" "Maysville"     "Maysvillian"  
[13] "Nabala"        "Oandu"         "Onnian"        "Pirgu"        
[17] "Pusgillian"    "Rakvere"       "Rawtheyan"     "Richmond"     
[21] "Richmondian"   "Shermanian"    "Streffordian"  "Vormsi"       
[25] "Woolstonian"   "Zahorany"     

$`20`
[1] "Gamach"     "Gamachian"  "Hirnantian" "Kosovian"   "Porkuni"   

$`21`
[1] "Juuru"      "Rhuddanian"

$`22`
[1] "Aeronian"

$`23`
[1] "Telychian"

$`24`
[1] "Jaani"        "Sheinwoodian"

$`25`
[1] "Gleedonian"  "Homerian"    "Rootsikula"  "Whitwellian"

$`26`
[1] "Bringewood" "Elton"      "Gorstian"  

$`27`
[1] "Kuressaare"   "Leintwardine" "Ludfordian"   "Whitcliffe"  

$`28`
[1] "Kaugetuma" "Ohesaare"  "Pridoli"  

$`29`
[1] "Gedinne"    "Gedinnine"  "Lochkovian"

$`30`
[1] "Pragian" "Siegen" 

$`31`
[1] "Deer Park"    "Early Emsian" "Ems"          "Emsian"       "Late Emsian" 
[6] "Zlichovian"  

$`32`
[1] "Eifel"     "Eifelian"  "Southwood"

$`33`
[1] "Early Givetian"  "Givet"           "Givetian"        "Late Givetian"  
[5] "Middle Givetian" "Taghanic"       

$`34`
[1] "Early Frasnian"  "Frasnian"        "Late Frasnian"   "Middle Frasnian"

$`35`
[1] "Early Famennian"  "Famennian"        "Late Famennian"   "Middle Famennian"

$`36`
[1] "Early Mississippian" "Hastarian"           "Ivorian"            
[4] "Kinderhookian"       "Tournaisian"        

$`37`
[1] "Arundian"   "Asbian"     "Brigantian" "Chadian"    "Holkerian" 
[6] "Jiusian"    "Meramecian" "Visean"    

$`38`
[1] "Arnsbergian"        "Late Mississippian" "Pendleian"         
[4] "Serpukhovian"      

$`39`
 [1] "Alportian"           "Bashkirian"          "Cheremshanian"      
 [4] "Chokierian"          "Early Pennsylvanian" "Kinderscoutian"     
 [7] "Krasnopolyanian"     "Marsdenian"          "Morrowan"           
[10] "Prikarnian"          "Voznesenian"         "Westphalian A"      
[13] "Yeadonian"          

$`40`
 [1] "Desmoinesian"         "Kashirian"            "Middle Pennsylvanian"
 [4] "Moscovian"            "Myachkovian"          "Podolskian"          
 [7] "Vereian"              "Westphalian B"        "Westphalian C"       
[10] "Westphalian D"       

$`41`
[1] "Dorogovilovian" "Kasimovian"     "Khamovnichean"  "Krevyakian"    
[5] "Missourian"     "Stephanian A"   "Stephanian B"  

$`42`
[1] "Gzhelian"        "Kuzel"           "Melekhovian"     "Noginian"       
[5] "Pavlovoposadian" "Rusavkian"       "Stephanian C"   

$`43`
[1] "Asselian"   "Autunian"   "Shikhanian"

$`44`
[1] "Sakmarian"      "Sterlitamakian" "Tastubian"     

$`45`
[1] "Artinskian" "Sarginian" 

$`46`
[1] "Irenian"   "Kungurian" "Saranian"  "Ufimian"  

$`47`
[1] "Early Kazanian" "Kazanian"       "Kubergandian"   "Late Kazanian" 
[5] "Roadian"        "Sokian"        

$`48`
[1] "Murgabian" "Urzhumian" "Word"      "Wordian"  

$`49`
[1] "Capitan"    "Capitanian"

$`50`
[1] "Longtanian"    "Wuchiapingian"

$`51`
[1] "Changhsingian"

$`52`
[1] "Dienerian"    "Griesbachian" "Induan"      

$`53`
[1] "Olenekian" "Smithian"  "Spathian" 

$`54`
[1] "Aegean"    "Anisian"   "Bithynian" "Illyrian"  "Pelsonian"

$`55`
[1] "Fassanian"    "Ladinian"     "Longobardian"

$`56`
[1] "Carnian"  "Julian"   "Oretian"  "Tuvalian"

$`57`
[1] "Alaunian" "Lacian"   "Norian"   "Otamitan" "Sevatian" "Warepan" 

$`58`
[1] "Otapirian" "Rhaetian" 

$`59`
[1] "Angulata"          "Canadensis"        "Early Hettangian" 
[4] "Hettangian"        "Late Hettangian"   "Liasicus"         
[7] "Middle Hettangian" "Planorbis"        

$`60`
[1] "Bucklandi"        "Early Sinemurian" "Late Sinemurian"  "Obtusum"         
[5] "Oxynotum"         "Raricostatum"     "Semicostatum"     "Sinemurian"      
[9] "Turneri"         

$`61`
 [1] "Davoei"              "Early Pliensbachian" "Freboldi"           
 [4] "Ibex"                "Imlayi"              "Jamesoni"           
 [7] "Kunae"               "Late Pliensbachian"  "Margaritatus"       
[10] "Pliensbachian"       "Spinatum"            "Whiteavesi"         

$`62`
 [1] "Bifrons"         "Early Toarcian"  "Falciferum"      "Kanense"        
 [5] "Late Toarcian"   "Levesquei"       "Middle Toarcian" "Planulata"      
 [9] "Tenuicostatum"   "Thouarsense"     "Toarcian"        "Variabilis"     

$`63`
[1] "Aalenian"        "Concavum"        "Early Aalenian"  "Late Aalenian"  
[5] "Middle Aalenian" "Murchisonae"     "Opalinum"        "Scissum"        

$`64`
 [1] "Bajocian"       "Discites"       "Early Bajocian" "Garantiana"    
 [5] "Humphresianum"  "Laeviuscula"    "Late Bajocian"  "Parkinsoni"    
 [9] "Sauzei"         "Subfurcatum"   

$`65`
[1] "Bathonian"        "Early Bathonian"  "Late Bathonian"   "Middle Bathonian"

$`66`
[1] "Callovian"        "Early Callovian"  "Late Callovian"   "Middle Callovian"

$`67`
[1] "Early Oxfordian"  "Late Oxfordian"   "Middle Oxfordian" "Oxfordian"       

$`68`
[1] "Early Kimmeridgian" "Kimmeridgian"       "Late Kimmeridgian" 

$`69`
[1] "Early Tithonian"  "Late Tithonian"   "Middle Tithonian" "Puaroan"         
[5] "Tithonian"       

$`70`
[1] "Berriasian"        "Early Berriasian"  "Late Berriasian"  
[4] "Middle Berriasian"

$`71`
[1] "Early Valanginian" "Late Valanginian"  "Valanginian"      

$`72`
[1] "Early Hauterivian" "Hauterivian"       "Late Hauterivian" 

$`73`
[1] "Barremian"       "Early Barremian" "Late Barremian" 

$`74`
[1] "Aptian"       "Early Aptian" "Late Aptian" 

$`75`
[1] "Albian"        "Early Albian"  "Late Albian"   "Middle Albian"
[5] "Motuan"        "Urutawan"     

$`76`
[1] "Cenomanian"        "Early Cenomanian"  "Late Cenomanian"  
[4] "Middle Cenomanian" "Ngaterian"        

$`77`
[1] "Early Turonian"  "Late Turonian"   "Mangaotanean"    "Middle Turonian"
[5] "Turonian"       

$`78`
[1] "Coniacian"        "Early Coniacian"  "Late Coniacian"   "Middle Coniacian"
[5] "Teratan"         

$`79`
[1] "Early Santonian"  "Late Santonian"   "Middle Santonian" "Piripauan"       
[5] "Santonian"       

$`80`
[1] "Campanian"        "Early Campanian"  "Judithian"        "Late Campanian"  
[5] "Middle Campanian"

$`81`
[1] "Early Maastrichtian" "Edmontonian"         "Lancian"            
[4] "Late Maastrichtian"  "Maastrichtian"      

$`82`
[1] "Danian"          "Early Paleocene" "Pu1"             "Pu3"            
[5] "Puercan"         "To2"             "To3"             "Torrejonian"    

$`83`
 [1] "Cf1"              "Cf2"              "Cf3"              "Clarkforkian"    
 [5] "Gashatan"         "Late Paleocene"   "Middle Paleocene" "Riochican"       
 [9] "Selandian"        "Thanetian"        "Ti1"             

$`84`
 [1] "Bumbanian"    "Early Eocene" "Graybullian"  "Lostcabinian" "Lysitean"    
 [6] "Mangaorapan"  "MP 10"        "MP 7"         "MP 8 + 9"     "Sandcouleean"
[11] "Waipawan"     "Wasatchian"   "Ypresian"    

$`85`
[1] "Early Uintan" "Late Uintan"  "Lutetian"     "MP 11"        "MP 13"       
[6] "MP 14"        "Mustersan"    "Uintan"      

$`86`
[1] "Bartonian"  "Duchesnean"

$`87`
 [1] "Chadronian"        "Early Chadronian"  "Ergilian"         
 [4] "Houldjinian"       "Kaiatan"           "Late Chadronian"  
 [7] "Late Eocene"       "Middle Chadronian" "MP 18"            
[10] "MP 19"             "MP 20"             "Priabonian"       
[13] "Runangan"          "Ulangochuian"     

$`88`
[1] "Early Oligocene" "MP 21"           "MP 22"           "MP 23"          
[5] "MP 24"           "Orellan"         "Rupelian"        "Whitneyan"      

$`89`
[1] "Chattian"       "Duntroonian"    "Late Oligocene" "Monroecreekian"
[5] "MP 26"          "MP 27"          "MP 28"          "MP 29"         
[9] "MP 30"         

$`90`
 [1] "Aquitanian"          "Burdigalian"         "Early Hemingfordian"
 [4] "Early Miocene"       "Hemingfordian"       "Late Hemingfordian" 
 [7] "MN 1"                "MN 3"                "MN 4"               
[10] "Orleanian"           "Santacrucian"       

$`91`
 [1] "Badenian"       "Clifdenian"     "Colloncuran"    "Langhian"      
 [5] "Lillburnian"    "Middle Miocene" "MN 5"           "MN 6"          
 [9] "Sarmatian"      "Serravallian"   "Waiauan"       

$`92`
 [1] "Chasicoan"               "early Early Hemphillian"
 [3] "Kapitean"                "Late Miocene"           
 [5] "Messinian"               "MN 10"                  
 [7] "MN 11"                   "MN 12"                  
 [9] "MN 9"                    "Pannonian"              
[11] "Tortonian"               "Vallesian"              

$`93`
 [1] "Early Pliocene" "Early Pliocene" "Late Pliocene"  "Late Pliocene" 
 [5] "MN 14"          "MN 16"          "Opoitian"       "Piacenzian"    
 [9] "Pliocene"       "Villanyian"     "Waipipian"      "Zanclean"      

$`94`
[1] "Calabrian"          "Early Pleistocene"  "Gelasian"          
[4] "Ionian"             "Late Pleistocene"   "Middle Pleistocene"
[7] "MN 17"              "Pleistocene"       

$`95`
[1] "Holocene"

$`-1`
[1] ""

$default
[1] NA

Combine/categorize entries

Built-in categorize() function allows us to find entries in keys$stgInt:

# B. the stg entries (lookup)
stgMin<-categorize(dat[,"early_interval"],keys$stgInt)
stgMax<-categorize(dat[,"late_interval"],keys$stgInt)

entry	result	meaning
`"Ypresian"`	`84`	name indicates interval confined to a stage
`"Permian"`	`NA`	Name indicates interval not confined to a stage
`""`	`-1`	Inteval entry is not given - `late_interval`

Lower number - earlier stage.

We need a single bin entry

Make sure that numbers are numbers!

stgMin<-as.numeric(stgMin)
stgMax<-as.numeric(stgMax)

We need just one bin for every entry! Add this to the table, we will fill this in the next step.

# empty container
dat$stg <- rep(NA, nrow(dat))

What are the cases?

stgMin	stgMax	Meaning	stg
84	-1	`early_interval` resolves to one stage, no `late_interval`	84
84	84	`early_interval` resolves to same stage as `late_interval`	84
84	86	collection can range from stages `84` to `86`	NA
84	NA	`late_interval` does not resolve to one stage	NA
NA	-1	`early_interval` does not resolve to one stage, the other missing	NA
NA	NA	neither `early_interval` nor `late_interval` resolve to one stage	NA

Deriving a single stage number

Programming the rules above with logical expressions:

# select entries, where
stgCondition <- c(
# the early and late interval fields indicate the same stg
    which(stgMax==stgMin),
# or the late_intervarl field is empty
    which(stgMax==-1))

In these entries, use the stg indicated by the early_interval:

dat$stg[stgCondition] <- stgMin[stgCondition]

Result

integer: record comes from a one stage
NA: record does not come from one stage

Extra cases /1

The Cambrian and the Ordovician are problematic.

The Cambrian

Collections were assigned by Na Lin to the 10 stages (Na and Kiessling 2015) .

# load data
load(url("https://github.com/divDyn/ddPhanero/raw/master/data/Stratigraphy/2018-08-31/cambStrat.RData"))

# correct it with this function
source("https://github.com/divDyn/ddPhanero/raw/master/scripts/strat/2018-08-31/cambProcess.R")

Note

This is only necessary if you do Phanerozoic scale analyses!
You do not need to repeat all these stages, when you focus on a shorter interval of time or a specific taxa. Think about whether these are necessary!

Extra cases /2

The Cambrian and the Ordovician are problematic.

The Ordovician

Assigments based on formations and biozone by Wolfgang Kiessling.

# load data
load(url("https://github.com/divDyn/ddPhanero/raw/master/data/Stratigraphy/2018-08-31/ordStrat.RData"))

# correct it with this function
source("https://github.com/divDyn/ddPhanero/raw/master/scripts/strat/2019-05-31/ordProcess.R")

Note

This is only necessary if you do Phanerozoic scale analyses!
You do not need to repeat all these stages, when you focus on a shorter interval of time or a specific taxa. Think about whether these are necessary!

The PBDB’s internal ‘time bins’ (`stb`)

Collection’s assignment included in the time_contain, time_bins, time_overlap, time_major, time_buffer fields
Strategy:

Make timescale from the entries
Assign bin numbers to collections/records

Example solution

Function to extract relevant information from the PBDB download

#' Function to generate stage-lookup table from the Paleobiology Database
#'
#' Processess time-bin data downloaded from the database
#'
#' @param pbdb data.frame from database.
#' @param stagecolumn Charater string, of the time-bin information column
#' @param min_ma Character string, column name of the minimum ages 
#' @param max_ma Character string, column name of the maximum ages 
#' @param bin Character string, this will be the column name of the bin information
#' @return A data.frame with the stages' name bottom, mid and top age and bin number.
StagesFromPBDB<- function(pbdb, stagecolumn="time_contain", min_ma="min_ma", max_ma="max_ma", bin="stb"){
    # get the appropriate colulmns out of the PBDB download
    timeinfo <- unique(pbdb[,c(stagecolumn, max_ma, min_ma)])

    # omit those entires that are not meaningful
    timeinfo <- timeinfo[timeinfo[,stagecolumn]!="-",]
    
    # search for stage minima and maxima
    ma <- tapply(INDEX=timeinfo[,stagecolumn], X=timeinfo[, max_ma], max)
    min <- tapply(INDEX=timeinfo[,stagecolumn], X=timeinfo[, min_ma], min)

    # create stages object
    stagesDF <- data.frame(stage=names(ma), bottom=ma, mid=(ma+min)/2, top=min)
    stagesDF <-stagesDF[order(stagesDF$bottom, decreasing=TRUE),]
    rownames(stagesDF) <- NULL

    # check consistency - Cambrian is not consistent
    # stages$top[1:(nrow(stages)-1)]==stages$bottom[2:nrow(stages)]

    # create integer bin numbers
    stagesDF<- cbind(stagesDF, 1:nrow(stagesDF))
    colnames(stagesDF)[ncol(stagesDF)] <- bin
    return(stagesDF)
}

The PBDB’s internal ‘time bins’ (`stb`)

Collection’s assignment included in the time_contain, time_bins, time_overlap, time_major, time_buffer fields
Strategy:

Make timescale from the entries
Assign bin numbers to collections/records

Call to the function:

# new stages object
stagebins <- StagesFromPBDB(dat)
stagebins

              stage   bottom       mid      top stb
1         Fortunian 538.8000 533.90000 529.0000   1
2           Stage 2 525.5000 523.25000 521.0000   2
3           Stage 3 521.0000 517.50000 514.0000   3
4           Stage 4 514.0000 511.50000 509.0000   4
5           Wuliuan 509.0000 506.75000 504.5000   5
6           Drumian 504.5000 502.50000 500.5000   6
7        Guzhangian 500.5000 498.75000 497.0000   7
8           Paibian 497.0000 495.50000 494.0000   8
9      Jiangshanian 494.0000 491.75000 489.5000   9
10         Stage 10 489.5000 487.45000 485.4000  10
11      Tremadocian 485.4000 481.55000 477.7000  11
12           Floian 477.7000 473.85000 470.0000  12
13        Dapingian 470.0000 468.65000 467.3000  13
14      Darriwilian 467.3000 462.85000 458.4000  14
15         Sandbian 458.4000 455.70000 453.0000  15
16           Katian 453.0000 449.10000 445.2000  16
17       Hirnantian 445.2000 444.50000 443.8000  17
18       Rhuddanian 443.8000 442.30000 440.8000  18
19         Aeronian 440.8000 439.65000 438.5000  19
20        Telychian 438.5000 435.95000 433.4000  20
21     Sheinwoodian 433.4000 431.95000 430.5000  21
22         Homerian 430.5000 428.95000 427.4000  22
23         Gorstian 427.4000 426.50000 425.6000  23
24       Ludfordian 425.6000 422.40000 419.2000  24
25       Lochkovian 419.2000 415.00000 410.8000  25
26          Pragian 410.8000 409.20000 407.6000  26
27           Emsian 407.6000 400.45000 393.3000  27
28         Eifelian 393.3000 390.50000 387.7000  28
29         Givetian 387.7000 385.20000 382.7000  29
30         Frasnian 382.7000 377.45000 372.2000  30
31        Famennian 372.2000 365.55000 358.9000  31
32      Tournaisian 358.9000 352.80000 346.7000  32
33           Visean 346.7000 338.80000 330.9000  33
34     Serpukhovian 330.9000 327.05000 323.2000  34
35       Bashkirian 323.2000 319.20000 315.2000  35
36        Moscovian 315.2000 311.10000 307.0000  36
37       Kasimovian 307.0000 305.35000 303.7000  37
38         Gzhelian 303.7000 301.30000 298.9000  38
39         Asselian 298.9000 296.21000 293.5200  39
40        Sakmarian 293.5200 291.81000 290.1000  40
41       Artinskian 290.1000 286.80000 283.5000  41
42        Kungurian 283.5000 278.25500 273.0100  42
43          Roadian 273.0100 269.95500 266.9000  43
44          Wordian 266.9000 265.59000 264.2800  44
45       Capitanian 264.2800 261.89500 259.5100  45
46    Wuchiapingian 259.5100 256.82500 254.1400  46
47    Changhsingian 254.1400 253.02100 251.9020  47
48           Induan 251.9020 251.55100 251.2000  48
49        Olenekian 251.2000 249.20000 247.2000  49
50          Anisian 247.2000 244.60000 242.0000  50
51         Ladinian 242.0000 239.50000 237.0000  51
52          Carnian 237.0000 232.00000 227.0000  52
53           Norian 227.0000 217.75000 208.5000  53
54         Rhaetian 208.5000 204.95000 201.4000  54
55       Hettangian 201.4000 200.45000 199.5000  55
56       Sinemurian 199.5000 196.20000 192.9000  56
57    Pliensbachian 192.9000 188.55000 184.2000  57
58         Toarcian 184.2000 179.45000 174.7000  58
59         Aalenian 174.7000 172.80000 170.9000  59
60         Bajocian 170.9000 169.55000 168.2000  60
61        Bathonian 168.2000 166.75000 165.3000  61
62        Callovian 165.3000 163.40000 161.5000  62
63        Oxfordian 161.5000 158.15000 154.8000  63
64     Kimmeridgian 154.8000 152.00000 149.2000  64
65        Tithonian 149.2000 147.10000 145.0000  65
66       Berriasian 145.0000 142.40000 139.8000  66
67      Valanginian 139.8000 136.20000 132.6000  67
68      Hauterivian 132.6000 129.18500 125.7700  68
69        Barremian 125.7700 123.58500 121.4000  69
70           Aptian 121.4000 117.20000 113.0000  70
71           Albian 113.0000 106.75000 100.5000  71
72       Cenomanian 100.5000  97.20000  93.9000  72
73         Turonian  93.9000  91.85000  89.8000  73
74        Coniacian  89.8000  88.05000  86.3000  74
75        Santonian  86.3000  84.95000  83.6000  75
76        Campanian  83.6000  77.85000  72.1000  76
77    Maastrichtian  72.1000  69.05000  66.0000  77
78           Danian  66.0000  63.80000  61.6000  78
79        Selandian  61.6000  60.40000  59.2000  79
80        Thanetian  59.2000  57.60000  56.0000  80
81         Ypresian  56.0000  51.90000  47.8000  81
82         Lutetian  47.8000  44.50000  41.2000  82
83        Bartonian  41.2000  39.45500  37.7100  83
84       Priabonian  37.7100  35.80500  33.9000  84
85         Rupelian  33.9000  30.86000  27.8200  85
86         Chattian  27.8200  25.42500  23.0300  86
87       Aquitanian  23.0300  21.73500  20.4400  87
88      Burdigalian  20.4400  18.21000  15.9800  88
89         Langhian  15.9800  14.90000  13.8200  89
90     Serravallian  13.8200  12.72500  11.6300  90
91        Tortonian  11.6300   9.43800   7.2460  91
92        Messinian   7.2460   6.28950   5.3330  92
93         Zanclean   5.3330   4.46650   3.6000  93
94       Piacenzian   3.6000   3.09000   2.5800  94
95         Gelasian   2.5800   2.19000   1.8000  95
96        Calabrian   1.8000   1.28700   0.7740  96
97        Chibanian   0.7740   0.45150   0.1290  97
98 Late Pleistocene   0.1290   0.07035   0.0117  98
99         Holocene   0.0117   0.00585   0.0000  99

Note the issue with the beginning of the Cambrian!

The PBDB’s internal ‘time bins’ (`stb`)

Collection’s assignment included in the time_contain, time_bins, time_overlap, time_major, time_buffer fields
Strategy:

Make timescale from the entries
Assign bin numbers to collections/records

#' Function to bin PBDB data to stages
#'
#' @param pbdb data.frame from database.
#' @param stages data.frame produced by StagesFromPBDB.
#' @param bin Character string, this will be the column name of the bin information
#' @param stagecolumn Character string, of the time-bin information column in the PBDB
#' @return A vector with integer bin numbers
BinPBDB <- function(pbdb,stages,bin="stb",stagecolumn="time_contain"){
    # new vector
    y <- rep(NA, nrow(pbdb))

    # simple for loop to execute the binning
    for(i in 1:nrow(stages)){
        y[which(pbdb[, stagecolumn]==stages$stage[i])] <-stages[i,bin]
    }

    return(y)
}

And finally do the binning:

# bins from the pbdb output
dat$stb <- BinPBDB(dat, stagebins)

We are done with the preparation

For other exercises (CMR): join some bits from the timescale

# stg (95bin) entries
stgFields <- stages[,c("stg", "stage","top", "bottom")]
colnames(stgFields)[2:4] <- paste0("stg_", colnames(stgFields))[2:4]
datMerged <- merge(dat,stgFields, by="stg")

# stb (99bin) entries
stbFields <- stagebins
colnames(stbFields)[1:4] <- paste0("stb_",colnames(stbFields)[1:4])
datMerged <- merge(datMerged,stbFields, by="stb")

This might be a good time to take a snapshot and save an incremental result object (e.g. .rds)

Note

The smaller subset you look at, the weirder your data behaves.
This should not be considered final!

Basic sampling patterns

The number of collections and number of records (occurrences) are very much correlated with sampling.

These simple statistics can be calculated with basic functions:

table(dat$stg)


    4     5     6     7     8     9    10    11    12    13    14    15    16 
 1359  1813  5664  2663  3948  1030  2400  1203  2597  1808  6438  5863  4609 
   17    18    19    20    21    22    23    24    25    26    27    28    29 
15147 15589 33271  3766  4662  3224  5797  6980  3918  2584  4827  4094  7735 
   30    31    32    33    34    35    36    37    38    39    40    41    42 
 5270  7888  7489 22065 10775  7303  6320 13284  4576  3806  6477  2859  3106 
   43    44    45    46    47    48    49    50    51    52    53    54    55 
 5519  8105  8132 14374 10898 11201  8941 18660 16266  7765 10146 14441  5832 
   56    57    58    59    60    61    62    63    64    65    66    67    68 
10964  8626  5910  3419  6500 17322 15314  3964  5897  6702  9198 10294  6909 
   69    70    71    72    73    74    75    76    77    78    79    80    81 
 6909  3384  7064  5434  3627  7383 10922 13777  8334  2269  5000 14460 36044 
   82    83    84    85    86    87    88    89    90    91    92    93    94 
 7612  6084  8027  9156 12154 10772  8273  7182 14902 14514 20438 21203 24030 
   95 
 3446

Sampling

To make these even faster and convenient, the binstat() function can calculate them all at once:

sampStg<- binstat(dat, bin="stg", tax="clgen", coll="collection_no", 
        duplicates=FALSE)

Note

duplicates=FALSE makes sure that multiple occurrences of the same genus in the same collection (i.e. when there are multiple species) are counted as one!

The result of `binstat()`

str(sampStg)

'data.frame':   95 obs. of  8 variables:
 $ stg     : num  1 2 3 4 5 6 7 8 9 10 ...
 $ occs    : num  NA NA NA 1074 1466 ...
 $ colls   : num  NA NA NA 217 225 756 636 821 266 463 ...
 $ SIBs    : num  NA NA NA 160 218 655 469 532 294 328 ...
 $ occ1    : num  NA NA NA 74 83 236 194 199 144 124 ...
 $ occ2    : num  NA NA NA 17 34 105 73 75 54 58 ...
 $ u       : num  NA NA NA 0.931 0.943 ...
 $ chao1occ: num  NA NA NA 321 319 ...

By default, there are as many rows as the highest integer in the bin argument (95).
There are missing values in the beginning. This is can help with plotting.
Better: plot the values in the context of the timescale.

Unifying the output

Easiest: merge the results with the time scale object.

sampling <- merge(stages, sampStg, by="stg")
str(sampling)

'data.frame':   95 obs. of  20 variables:
 $ stg      : int  1 2 3 4 5 6 7 8 9 10 ...
 $ sys      : chr  "E" "E" "E" "Cm" ...
 $ system   : chr  "Ediacaran" "Ediacaran" "Ediacaran" "Cambrian" ...
 $ series   : chr  "Upper Ediacaran" "Upper Ediacaran" "Upper Ediacaran" "Terreneuvian" ...
 $ stage    : chr  "Avalon Assemblage" "White Sea Assemblage" "Nama Assemblage" "Fortunian" ...
 $ short    : chr  "Ava" "Whs" "Nam" "For" ...
 $ bottom   : num  580 560 550 539 529 ...
 $ mid      : num  570 555 544 534 525 ...
 $ top      : num  560 550 539 529 521 ...
 $ dur      : num  20 10 11.2 9.8 8 6.5 5.5 4.5 4 3.5 ...
 $ systemCol: chr  "#fddd88" "#fddd88" "#fddd88" "#94ac72" ...
 $ seriesCol: chr  "#fddd88" "#fddd88" "#fddd88" "#9db989" ...
 $ col      : chr  "#fcd589" "#fdd587" "#fed583" "#a9be93" ...
 $ occs     : num  NA NA NA 1074 1466 ...
 $ colls    : num  NA NA NA 217 225 756 636 821 266 463 ...
 $ SIBs     : num  NA NA NA 160 218 655 469 532 294 328 ...
 $ occ1     : num  NA NA NA 74 83 236 194 199 144 124 ...
 $ occ2     : num  NA NA NA 17 34 105 73 75 54 58 ...
 $ u        : num  NA NA NA 0.931 0.943 ...
 $ chao1occ : num  NA NA NA 321 319 ...

Plotting

We create an empty canvas to plot on.

tsplot(stages, boxes="sys", shading="sys", xlim=4:95, ylim=c(0,20000), 
    ylab="Number of Records")

The number of records

# occurrences
lines(sampling$mid, sampling$occs, lwd=2)

The number of collections

# collections
lines(sampling$mid, sampling$colls, lwd=2, col="blue")

Adding a legend

legend("topleft", bg="white", legend=c("occurrences", "collections"), 
    col=c("black", "blue"), lwd=2, inset=c(0.05,0.01), cex=1.3)

Sampling is heterogeneous

Strong correlation between collection and record numbers

cor.test(sampling$occs, sampling$colls, method="spearman")


    Spearman's rank correlation rho

data:  sampling$occs and sampling$colls
S = 14290, p-value < 2.2e-16
alternative hypothesis: true rho is not equal to 0
sample estimates:
      rho 
0.8898787

Main function: `divDyn()`

Calculate basic metrics of diversity dynamics:

dd <- divDyn(dat, bin="stg", tax="clgen")
str(dd)

'data.frame':   95 obs. of  31 variables:
 $ stg      : num  1 2 3 4 5 6 7 8 9 10 ...
 $ t2d      : num  NA NA NA NA 93 119 205 169 147 101 ...
 $ t2u      : num  NA NA NA 93 119 205 169 147 101 79 ...
 $ t3       : num  NA NA NA NA 52 56 67 52 44 24 ...
 $ tPart    : num  NA NA NA NA 14 2 25 13 19 5 ...
 $ tGFd     : num  NA NA NA NA NA 4 4 13 5 6 ...
 $ tGFu     : num  NA NA NA NA 1 10 11 6 9 9 ...
 $ tSing    : num  NA NA NA 51 47 350 149 222 74 116 ...
 $ tOri     : num  NA NA NA 109 78 172 113 112 57 86 ...
 $ tExt     : num  NA NA NA NA 38 64 128 121 95 80 ...
 $ tThrough : num  NA NA NA NA 71 85 129 121 138 115 ...
 $ divSIB   : num  NA NA NA 160 218 655 469 532 294 328 ...
 $ divCSIB  : num  NA NA NA NA 218 ...
 $ divRT    : num  NA NA NA 160 234 671 519 576 364 397 ...
 $ divBC    : num  NA NA NA NA 109 149 257 242 233 195 ...
 $ extProp  : num  NA NA NA 0.319 0.363 ...
 $ oriProp  : num  NA NA NA 1 0.534 ...
 $ extPC    : num  NA NA NA NA 0.429 ...
 $ oriPC    : num  NA NA NA NA 0.741 ...
 $ ext3t    : num  NA NA NA NA 0.581 ...
 $ ori3t    : num  NA NA NA NA 0.828 ...
 $ extC3t   : num  NA NA NA NA 0.546 ...
 $ oriC3t   : num  NA NA NA NA NA ...
 $ extGF    : num  NA NA NA NA 0.468 ...
 $ oriGF    : num  NA NA NA NA NA ...
 $ E2f3     : num  NA NA NA NA 0.28 ...
 $ O2f3     : num  NA NA NA NA NA ...
 $ ext2f3   : num  NA NA NA NA 0.329 ...
 $ ori2f3   : num  NA NA NA NA NA ...
 $ samp3t   : num  NA NA NA NA 0.788 ...
 $ sampRange: num  NA NA NA NA 0.775 ...

The output of `divDyn()`

Returns a data.frame, row of values for every bin. Prefixes:

t: taxon counts
div: diversity counts
ext: extinction metrics
ori: origination metrics
samp: sampling metrics

See: ?divDyn for more…

Range-based metrics of diversity dynamics

Range-through diversity

divRT = tSing(FL) + tExt(bL) + tOri(Ft) + tThrough(bt)

Boundary-crosser diversity

divBC = ~~tSing(FL)~~ + tExt(bL) + tOri(Ft) + tThrough(bt)

Per capita rates (M. Foote 1999)

Time homogeneous process in bin, Decay rate
Originally divided by $\Delta t$
Assumption about pulsedness (Michael Foote 2005)

oriPC = -log( tOri / ( tOri + tThrough ))
extPC = -log( tExt / ( tExt + tThrough ))

Plotting preparation

To make plotting easy:

patterns <- merge(stages, dd, by="stg")
str(patterns)

'data.frame':   95 obs. of  43 variables:
 $ stg      : int  1 2 3 4 5 6 7 8 9 10 ...
 $ sys      : chr  "E" "E" "E" "Cm" ...
 $ system   : chr  "Ediacaran" "Ediacaran" "Ediacaran" "Cambrian" ...
 $ series   : chr  "Upper Ediacaran" "Upper Ediacaran" "Upper Ediacaran" "Terreneuvian" ...
 $ stage    : chr  "Avalon Assemblage" "White Sea Assemblage" "Nama Assemblage" "Fortunian" ...
 $ short    : chr  "Ava" "Whs" "Nam" "For" ...
 $ bottom   : num  580 560 550 539 529 ...
 $ mid      : num  570 555 544 534 525 ...
 $ top      : num  560 550 539 529 521 ...
 $ dur      : num  20 10 11.2 9.8 8 6.5 5.5 4.5 4 3.5 ...
 $ systemCol: chr  "#fddd88" "#fddd88" "#fddd88" "#94ac72" ...
 $ seriesCol: chr  "#fddd88" "#fddd88" "#fddd88" "#9db989" ...
 $ col      : chr  "#fcd589" "#fdd587" "#fed583" "#a9be93" ...
 $ t2d      : num  NA NA NA NA 93 119 205 169 147 101 ...
 $ t2u      : num  NA NA NA 93 119 205 169 147 101 79 ...
 $ t3       : num  NA NA NA NA 52 56 67 52 44 24 ...
 $ tPart    : num  NA NA NA NA 14 2 25 13 19 5 ...
 $ tGFd     : num  NA NA NA NA NA 4 4 13 5 6 ...
 $ tGFu     : num  NA NA NA NA 1 10 11 6 9 9 ...
 $ tSing    : num  NA NA NA 51 47 350 149 222 74 116 ...
 $ tOri     : num  NA NA NA 109 78 172 113 112 57 86 ...
 $ tExt     : num  NA NA NA NA 38 64 128 121 95 80 ...
 $ tThrough : num  NA NA NA NA 71 85 129 121 138 115 ...
 $ divSIB   : num  NA NA NA 160 218 655 469 532 294 328 ...
 $ divCSIB  : num  NA NA NA NA 218 ...
 $ divRT    : num  NA NA NA 160 234 671 519 576 364 397 ...
 $ divBC    : num  NA NA NA NA 109 149 257 242 233 195 ...
 $ extProp  : num  NA NA NA 0.319 0.363 ...
 $ oriProp  : num  NA NA NA 1 0.534 ...
 $ extPC    : num  NA NA NA NA 0.429 ...
 $ oriPC    : num  NA NA NA NA 0.741 ...
 $ ext3t    : num  NA NA NA NA 0.581 ...
 $ ori3t    : num  NA NA NA NA 0.828 ...
 $ extC3t   : num  NA NA NA NA 0.546 ...
 $ oriC3t   : num  NA NA NA NA NA ...
 $ extGF    : num  NA NA NA NA 0.468 ...
 $ oriGF    : num  NA NA NA NA NA ...
 $ E2f3     : num  NA NA NA NA 0.28 ...
 $ O2f3     : num  NA NA NA NA NA ...
 $ ext2f3   : num  NA NA NA NA 0.329 ...
 $ ori2f3   : num  NA NA NA NA NA ...
 $ samp3t   : num  NA NA NA NA 0.788 ...
 $ sampRange: num  NA NA NA NA 0.775 ...

Plotting - empty canvas

tsplot(stages, boxes="sys", shading="sys", xlim=4:95, ylim=c(0,4000), 
    ylab="Richness")

Plotting `divRT`

lines(patterns$mid, patterns$divRT)

Plotting `divBC`

lines(patterns$mid, patterns$divBC, col="blue")

Plotting rates (canvas)

tsplot(stages, boxes="sys", shading="sys", xlim=4:95, ylim=c(0,1.5), 
    ylab="Taxonomic rates")

Plotting rates (lines)

lines(patterns$mid, patterns$oriPC, col="green")
lines(patterns$mid, patterns$extPC, col="red")

Plotting rates (legend)

legend("topleft", bg="white", legend=c("origination", "extinction"), 
    col=c("green", "red"), lwd=2, inset=c(0.05,0.01), cex=1.3)

Can you name the main events and patterns?

Range-based rates - main features of results

Decline of rates
Mass extinctions + (rebounds)
Edge effects

Issues

Smearing!
(Signor III and Lipps 1982) effect
More severe with worse sampling
Pull of the Recent

Occurrence record-based metrics

“occurrence-based”

Sampled-in-Bin diversity

divSIB
The number of sampled taxa in the focal bin.
Very vulnerable to the sampling universe effect

Plotting SIB diversity (previous)

Previous code:

tsplot(stages, boxes="sys", shading="sys", xlim=4:95, ylim=c(0,4000), 
    ylab="Richness")
lines(patterns$mid, patterns$divRT)
lines(patterns$mid, patterns$divBC, col="blue")

Plotting SIB diversity (line)

The SIB diversity:

lines(patterns$mid, patterns$divSIB, col="gray")

Plotting SIB diversity (legend)

The SIB diversity:

legend("topleft", bg="white", legend=c("divRT", "divBC", "divSIB"), 
    col=c("black", "blue", "gray"), lwd=2, inset=c(0.05,0.01), cex=1.3)

Three-timer sampling completeness

samp3T = t3 / (t3 + tPart)
$[0,1]$ - How well-sampled is the interval?

Plotting `samp3t` (canvas)

tsplot(stages, boxes="sys", shading="sys", xlim=4:95, ylim=c(0,1), 
    ylab="Sampling completeness")

Plotting `samp3t` (line)

lines(patterns$mid, patterns$samp3t)

Corrected SIB diversity (`divCSIB`)

Use samp3t to correct divSIB
divCSIV = divSIB / samp3t * samp3tTotal

Plotting CSIB diversity (previous)

tsplot(stages, boxes="sys", shading="sys", xlim=4:95, ylim=c(0,4000), 
    ylab="Richness")
lines(patterns$mid, patterns$divRT)
lines(patterns$mid, patterns$divBC, col="blue")
lines(patterns$mid, patterns$divSIB, col="gray")

Plotting CSIB diversity (line)

lines(patterns$mid, patterns$divCSIB, col="purple")

Plotting CSIB diversity (legend)

legend("topleft", bg="white", legend=c("divRT", "divBC", "divSIB", "divCSIB"), 
    col=c("black", "blue", "gray", "purple"), lwd=2, inset=c(0.05,0.01), cex=1.3)

Diversity comparison

Range-based values are higher and less volatile (smoother)
Face-value results: divRT
Truest method: divCSIB (with subsampling)

Occurrence record-based rates

Purpose: get rid of the smearing effects
Approximate the per capita rates, when sampling is complete

Three-timer rates (Alroy 1996)
Corrected three-timer rates (Alroy 2008)
Gap-filler rates (Alroy 2014)
Second-for-third substitution rates (Alroy 2015)
- Wider moving windows, more complex patterns

Three-timer rates `3t`

Per capita rates in a 3-bin window.

Taxon patterns:

Lower two-timers (t2d)
Upper two-timers (t2u)
Three-timer (t3)

(Alroy 1996)

ext3t = log( t2d / t3)
ori3t = log( t2u / t3)

Corrected Three-timer rates `C3t`

Improvement presented by Alroy (2008)

extC3t = log( t2d / t3 ) + log( samp3t[i+1] )
oriC3t = log( t2u / t3 ) - log( samp3t[i-1] )

Gap-filler `GF` and `2f3` rates

Gap filler equations (Alroy 2014): $extC3t = log((t2d + tPart)/(t3 + tPart + tGFu))$
$oriC3t = log((t2u + tPart)/(t3 + tPart + tGFd))$

Improvements in (Alroy 2015): second-for-third substituion rates. These are algorithmic solutions. These are the state-of-the-art rate metrics of this kind:

ext2f3
ori2f3

Comparison of extinction rates (canvas)

tsplot(stages, boxes="sys", shading="sys", xlim=4:95, ylim=c(0,1.5), 
    ylab="Extinction rates")

Comparison of extinction rates (lines)

lines(patterns$mid, patterns$ext3t, col="black")
lines(patterns$mid, patterns$extC3t, col="orange")
lines(patterns$mid, patterns$extGF, col="cyan")
lines(patterns$mid, patterns$ext2f3, col="darkred")

Comparison of extinction rates (legend)

legend("topleft", bg="white", legend=c("3t", "C3t", "GF", "2f3"), 
    col=c("black", "orange", "cyan", "darkred"), lwd=2, inset=c(0.05,0.01), cex=1.3)

Rate types - comparison

Different metrics converge, but they vary!
No edge or smearing effects with the occurrence-based ones
They omit lots of data, very unprecise: negative values!

Rate General guidelines

Use

PC: General assessment/raw patterns
2f3: When possible, rate estimation at MEs

Fig. from Kocsis et al. (2019)

Subsampling of samples in divDyn

Why

Comparing samples with different quantity of information
What would our results look like if we had the same quantity of information?

Miller and Foote (1996)

Two main types

Classical Rarefaction (CR)

Analytical and algorithmic approach
target: a particular number of records
By-list subsampling

Shareholder Quorum Subsampling (SQS)

Based on coverage
Target is a certain level of quorum (coverage)

Algorithmic solution: SQS (Alroy 2010)
Analytical solution: Coverage-based rarefaction (Chao and Jost 2012)

Implementation in divDyn

Subsampling for one sample is the first step
Example implementation on Evolv-ED

Subsampling with Phanerozoic Marine Animals

Sampling’s effects (occurrences)

cor.test(sampling$occs, patterns$divCSIB, method="spearman")


    Spearman's rank correlation rho

data:  sampling$occs and patterns$divCSIB
S = 28386, p-value < 2.2e-16
alternative hypothesis: true rho is not equal to 0
sample estimates:
      rho 
0.7663415

Massive correlation between sampling and diversity in a bin!

Sampling’s effects (collections)

cor.test(sampling$colls, patterns$divCSIB, method="spearman")


    Spearman's rank correlation rho

data:  sampling$colls and patterns$divCSIB
S = 52450, p-value = 9.109e-09
alternative hypothesis: true rho is not equal to 0
sample estimates:
      rho 
0.5682595

Massive correlation between sampling and diversity in a bin!

Subsampling in a time-bin context

Kocsis et al. (2019)

Subsample every bin
Calculate a statistic in mind - run a function on the data, e.g. divDyn()
Store the result
Iterate 1-3 a number of times

CR with Phanerozoic-scale data

crStg<-subsample(dat, bin="stg", tax="clgen", coll="collection_no", q=1000, 
    iter=5, duplicates=FALSE, na.rm=TRUE)

Arguments

arg	meaning
`q`	Subsampling quota
`iter`	Number of iterations
`duplicates`	Multiple records of the same genus in a list?
`coll`	What is a list (for `duplicates`)
`na.rm`	Omit `NA` bin assignments automatically

CR with Phanerozoic-scale data (canvas)

Prepartion for plotting

cr <- merge(stages, crStg, by="stg")

Timescale plot

tsplot(stages, boxes="sys", shading="sys", xlim=4:95, ylim=c(0,4000), 
    ylab="Richness")

Raw diversity curves (RT and CSIB)

lines(patterns$mid, patterns$divRT)
lines(patterns$mid, patterns$divCSIB, col="purple")

Adding subsampling results

lines(cr$mid, cr$divRT, lty=2)
lines(cr$mid, cr$divCSIB, col="purple", lty=2)

Did it solve the problem?

Not entirely…

No subsampling

cor.test(sampling$colls, patterns$divCSIB, method="spearman")


    Spearman's rank correlation rho

data:  sampling$colls and patterns$divCSIB
S = 52450, p-value = 9.109e-09
alternative hypothesis: true rho is not equal to 0
sample estimates:
      rho 
0.5682595

CR subsampling

cor.test(sampling$colls, cr$divCSIB, method="spearman")


    Spearman's rank correlation rho

data:  sampling$colls and cr$divCSIB
S = 73152, p-value = 0.001691
alternative hypothesis: true rho is not equal to 0
sample estimates:
      rho 
0.3333819

… but it is better!

Occurrence weighted By-list subsampling

Another version of rarefaction
Keeps lists (collections) intact
People call it $O^xW$
Largely superseeded by SQS

arg	use
`type`	the type, here `oxw`
`xexp=0`	q: coll No.
`xexp=1`	q: occ No.

oxwStg <- subsample(dat, bin="stg", tax="clgen", coll="collection_no", 
    q=200, type="oxw", xexp=0, iter=10, duplicates=FALSE, na.rm=TRUE)

Issues with plain CR and by-list

sampling of rare genera is disproportionately low
decrease in curve variance as the quota decreases (1000, 800, 600, 400)

Shareholder quorum subsampling

Aims to get a certain quorum of the sample
Algorithmic solution (c.f. Coverage-based rarefaction, which is the analytical solution)

sqsStg<-subsample(dat, bin="stg", tax="clgen", 
    coll="collection_no", q=0.7, iter=10, 
    ref="reference_no",singleton="ref", type="sqs", duplicates=FALSE, 
    excludeDominant=TRUE, largestColl =TRUE, na.rm=TRUE)

arg	meaning
q	target quorum
type	subsamplinng type
ref	reference columns
singleton	What is a singleton based on?
excludeDominant	tiny correction by Alroy (2010)
largestColl	tiny correction by Alroy (2010)

Prepare to plot

Merge with timescale

sqs <- merge(stages, sqsStg, by="stg")

Timescale

tsplot(stages, boxes="sys", shading="sys", xlim=4:95, ylim=c(0,4000), 
    ylab="Richness")

SQS vs CR vs raw

lines(patterns$mid, patterns$divCSIB, col="purple")
lines(cr$mid, cr$divCSIB, col="purple", lty=2)
lines(sqs$mid, sqs$divCSIB, col="purple", lty=3)

Guidelines

Note

Use subsampling to test the robustness of your results!
Use as high iter as necessary to stabilize the results (usually above 100)
Can be used to do rates as well! (use CR!)

Environmental affinities

Does the taxon like env. A. or B?
Raw occurreence counts are misleading
First row: equal sampling
Second row: unequal
sampling (shifts occurrence balance)
divDyn::affinity() -> ?affinity

The coral example

Details

Material available online on Evolv-ED
Go through this at your own pace
Shout if you have any issues!

References

Alroy, John. 1996. “Constant Extinction, Constrained Diversification, and Uncoordinated Stasis in North American Mammals.” Palaeogeography, Palaeoclimatology, Palaeoecology 127: 285–311.

———. 2008. “Dynamics of Origination and Extinction in the Marine Fossil Record.” Proceedings of the National Academy of Science 105: 11536–42.

———. 2010. “The Shifting Balance of Diversity Among Major Marine Animal Groups.” Science 329: 1191–94. https://doi.org/10.1126/science.1189910.

———. 2014. “Accurate and Precise Estimates of Origination and Extinction Rates.” Paleobiology 40 (3): 374–97. https://doi.org/10.1666/13036.

———. 2015. “A More Precise Speciation and Extinction Rate Estimator.” Paleobiology 41 (04): 633–39. https://doi.org/10.1017/pab.2015.26.

Chao, Anne, and Lou Jost. 2012. “Coverage-Based Rarefaction and Extrapolation: Standardizing Samples by Completeness Rather Than Size.” Ecology 93 (12): 2533–47. https://doi.org/10.1890/11-1952.1.

Foote, M. 1999. “Morphological Diversity In The Evolutionary Radiation Of Paleozoic and Post-Paleozoic Crinoids.” Paleobiology 25 (S2): 1–115. https://doi.org/10.1017/S0094837300020236.

Foote, Michael. 2005. “Pulsed Origination and Extinction in the Marine Realm.” Paleobiology 31 (1): 6–20.

Gradstein, Felix M, James George Ogg, and Mark D Schmitz. 2020. The Geologic Time Scale 2020.

Kocsis, Ádám T., Carl J. Reddin, John Alroy, and Wolfgang Kiessling. 2019. “The R Package divDyn for Quantifying Diversity Dynamics Using Fossil Sampling Data.” Methods in Ecology and Evolution 10 (5): 735–43. https://doi.org/10.1111/2041-210X.13161.

MacArthur, Robert, and Edward O. Wilson. 1963. “An Equilibrium Theroy of Insular Zoogeography.” Evolution 17 (4): 373–87.

Miller, Arnold I., and Michael Foote. 1996. “Calibrating the Ordovician Radiation of Marine Life: Implications for Phanerozoic Diversity Trends.” Paleobiology 22 (2): 304–9.

Na, Lin, and Wolfgang Kiessling. 2015. “Diversity Partitioning During the Cambrian Radiation.” Proceedings of the National Academy of Sciences 112 (15): 4702–6. https://doi.org/10.1073/pnas.1424985112.

Sepkoski, J. John. 1997. “Biodiversity: Past, Present, and Future.” Journal of Paleontology 71 (4): 533–39. https://doi.org/10.1017/S0022336000040026.

Sepkoski, J. John, Jr. 1978. “A Kinetic Model of Phanerozoic Taxonomic Diversity I. Analysis of Marine Orders.” Paleobiology 4: 223–51. https://doi.org/10.2307/2400203.

Signor III, P. W., and J. H. Lipps. 1982. “Sampling Bias, Gradual Extinction Patterns and Catastrophesin the Fossil Record.” Geological Society of America Special Paper 190: 291–96.

Introduction to divDyn - Phanerozoic diversity

Background

What is this?

Diversity over time?

The divDyn project

Getting started

Resources

Installing and loading

Binning

Example time scales

47-bin, 10my time scale

Example time scales

95-bin stages

Required data - Interface conventions

Phanerozoic Marine Animals

About

Downloading the data

The chronosphere

The chronosphere

Downloading the PBDB

Filtering - I. Taxonomy: quality

Strategy

Filtering - I. Taxonomy: phyla

Filtering - I. Taxonomy: classes

Filtering - I. Taxonomy: more

Filtering - I. Taxonomy: execute!

Filtering - II. Environment

Filtering - II. Environment

Omitting unlithified sediments

Stratigraphic binning (stages): two solutions

95-bin stages (stg)

Stratigraphic binning (stages): two solutions

PBDB time-bins (stb)

Stratigraphic binning

Stage number lookup table

Combine/categorize entries

We need a single bin entry

What are the cases?

Deriving a single stage number

Result

Extra cases /1

The Cambrian

Extra cases /2

The Ordovician

The PBDB’s internal ‘time bins’ (stb)

Example solution

The PBDB’s internal ‘time bins’ (stb)

The PBDB’s internal ‘time bins’ (stb)

We are done with the preparation

Basic sampling patterns

Sampling

The result of binstat()

Unifying the output

Plotting

The number of records

The number of collections

Adding a legend

Sampling is heterogeneous

Main function: divDyn()

The output of divDyn()

Range-based metrics of diversity dynamics

Range-through diversity

Boundary-crosser diversity

Per capita rates (M. Foote 1999)

Plotting preparation

Plotting - empty canvas

Plotting divRT

Plotting divBC

Plotting rates (canvas)

Plotting rates (lines)

Plotting rates (legend)

Range-based rates - main features of results

Issues

Occurrence record-based metrics

Sampled-in-Bin diversity

Plotting SIB diversity (previous)

Plotting SIB diversity (line)

Plotting SIB diversity (legend)

Three-timer sampling completeness

Plotting samp3t (canvas)

95-bin stages (`stg`)

PBDB time-bins (`stb`)

The PBDB’s internal ‘time bins’ (`stb`)

The PBDB’s internal ‘time bins’ (`stb`)

The PBDB’s internal ‘time bins’ (`stb`)

The result of `binstat()`

Main function: `divDyn()`

The output of `divDyn()`

Plotting `divRT`

Plotting `divBC`

Plotting `samp3t` (canvas)

Plotting `samp3t` (line)

Corrected SIB diversity (`divCSIB`)

Three-timer rates `3t`

Corrected Three-timer rates `C3t`

Gap-filler `GF` and `2f3` rates