Introduction to divDyn - Phanerozoic diversity

Ádám T. Kocsis

FAU GeoZentrum Nordbayern

Background

What is this?

  • Short for Diversity Dynamics
  • Background in island biogeography (MacArthur and Wilson 1963)
  • Diversity, extinction and origination metrics
  • Variables time series - to be analyzed further!


Alroy (2008) - PNAS

Diversity over time?

  • Overall increase?
  • Saturation?
  • Equilibrial dynamics?


J. John Sepkoski Jr. (1978)

The divDyn project

  • Estimating metrics has been a problem - everybody had different scripts
  • Implementing sampling standardization
  • Wrote basics for my PhD


The project


R hexsticker


https://github.com/divDyn

Getting started

Resources

Installing and loading

Package (v0.8.2) can be installed from CRAN.

install.packages("divDyn")

Attach the package with:

library(divDyn)

Binning

  • divDyn operates with discrete-time
  • Requires temporal/stratigraphic binning
  • One value/time bin
  • A timescale
  • Always the first step of the analysis!

Example time scales

47-bin, 10my time scale

data(tens, package="divDyn")
str(tens)
'data.frame':   49 obs. of  9 variables:
 $ X10    : chr  "Cambrian 1" "Cambrian 2" "Cambrian 3" "Cambrian 4" ...
 $ Ocean  : chr  "ar1" "cc1" "cc1" "cc1" ...
 $ Ocean2 : chr  "ar1" "" "" "" ...
 $ Climate: chr  "w" "w" "w" "w" ...
 $ bottom : num  542 532 513 501 490 ...
 $ mid    : num  537 522 507 495 483 ...
 $ top    : num  532 513 501 490 479 ...
 $ dur    : num  10 19 12 11 11.4 ...
 $ ten    : int  1 2 3 4 5 6 7 8 9 10 ...

Old PBDB logo (Sydney)

Example time scales

95-bin stages

data(stages, package="divDyn")
str(stages)
'data.frame':   95 obs. of  13 variables:
 $ sys      : chr  "E" "E" "E" "Cm" ...
 $ system   : chr  "Ediacaran" "Ediacaran" "Ediacaran" "Cambrian" ...
 $ series   : chr  "Upper Ediacaran" "Upper Ediacaran" "Upper Ediacaran" "Terreneuvian" ...
 $ stage    : chr  "Avalon Assemblage" "White Sea Assemblage" "Nama Assemblage" "Fortunian" ...
 $ short    : chr  "Ava" "Whs" "Nam" "For" ...
 $ bottom   : num  580 560 550 539 529 ...
 $ mid      : num  570 555 544 534 525 ...
 $ top      : num  560 550 539 529 521 ...
 $ dur      : num  20 10 11.2 9.8 8 6.5 5.5 4.5 4 3.5 ...
 $ stg      : int  1 2 3 4 5 6 7 8 9 10 ...
 $ systemCol: chr  "#fddd88" "#fddd88" "#fddd88" "#94ac72" ...
 $ seriesCol: chr  "#fddd88" "#fddd88" "#fddd88" "#9db989" ...
 $ col      : chr  "#fcd589" "#fdd587" "#fed583" "#a9be93" ...

Based on: Gradstein, Ogg, and Schmitz (2020)

Required data - Interface conventions

  • x: occurrence data.frame to be analyzed
  • bin: column name of bin numbers (integer)
  • tax: column name of taxon entries (character)
  • coll: column name of sampling units (optional, character)
  • age: column name of numeric ages (optional, numeric)

Phanerozoic Marine Animals

About

  • The example was used as an application example for the divDyn paper
  • Was supposed to have a similar scope as the results of (J. John Sepkoski 1997) and (Alroy 2010)
  • The analyses are done at the genus level!
  1. Stratigraphic assignments
  2. Sampling patterns
  3. Richness estimators
  4. Taxonomic rates
  5. Subsampling

More info: https://github.com/divDyn/ddPhanero

Downloading the data

We need almost the whole PBDB.

API call to produce downlad:

https://paleobiodb.org/data1.2/occs/list.csv?datainfo&rowcount&interval=Ediacaran,Holocene&show=class,classext,genus,subgenus,coll,coords,loc,paleoloc,strat,stratext,lith,env,ref,crmod,timebins,timecompare,refattr,entname,attr,ident,img,plant,abund,ecospace,taphonomy,etbasis,pres,prot,lithext,geo,methods,resgroup,refattr,ent
  • takes about 30 minutes…
  • 2GB+ .csv file (read.csv())
  • or…

The chronosphere

  • API for research data acquisition (analytical code!)
  • Versioned data entries
  • PBDB from July 11, 2024 (time-bins are not working ATM!)

Install from CRAN

install.packages("chronosphere")

Load package

The chronosphere

  • Library to access static data object
  • Instantiated object = datafile (remote URL) + import code (dependencies)
  • Time series of data products

Downloading the PBDB

Actual download:

# Download data (July 11, 2024)
dat <- chronosphere::fetch("pbdb", ver="20240711")

# The most up-to-date version
# dat <- chronosphere::fetch("pbdb")

# to a given directory 
# dat <- chronosphere::fetch("pbdb", ver="20240711", datadir="<yourdir>")

Takes about 1 minute, ~130MB download.

After done, check if structure is ok!

str(dat)
'data.frame':   1611424 obs. of  147 variables:
 $ occurrence_no      : int  1 2 3 4 5 7 8 9 10 11 ...
 $ record_type        : chr  "occ" "occ" "occ" "occ" ...
 $ reid_no            : int  NA NA NA NA NA NA NA NA NA NA ...
 $ flags              : chr  "" "" "" "" ...
 $ collection_no      : int  1 1 1 2 3 4 4 1 5 5 ...
 $ identified_name    : chr  "Australosutura llanoensis" "Carbonocoryphe planucauda" "Thigriffides roundyi" "Pudoproetus chappelensis" ...
 $ identified_rank    : chr  "species" "species" "species" "species" ...
 $ identified_no      : int  349412 349526 349420 349411 349411 349411 349420 349411 349411 349668 ...
 $ difference         : chr  "" "recombined as" "" "" ...
 $ accepted_name      : chr  "Australosutura llanoensis" "Phillibole planucauda" "Thigriffides roundyi" "Pudoproetus chappelensis" ...
 $ accepted_attr      : logi  NA NA NA NA NA NA ...
 $ accepted_rank      : chr  "species" "species" "species" "species" ...
 $ accepted_no        : int  349412 349526 349418 349410 349410 349410 349418 349410 349410 349664 ...
 $ early_interval     : chr  "Ivorian" "Ivorian" "Ivorian" "Ivorian" ...
 $ late_interval      : chr  "" "" "" "" ...
 $ max_ma             : num  354 354 354 354 354 ...
 $ min_ma             : num  347 347 347 347 347 ...
 $ ref_author         : chr  "Brezinski" "Brezinski" "Brezinski" "Brezinski" ...
 $ ref_pubyr          : int  1998 1998 1998 1998 1998 1998 1998 1998 1998 1998 ...
 $ reference_no       : num  1 1 1 1 1 1 1 1 1 1 ...
 $ phylum             : chr  "Arthropoda" "Arthropoda" "Arthropoda" "Arthropoda" ...
 $ phylum_no          : chr  "18891" "18891" "18891" "18891" ...
 $ class              : chr  "Trilobita" "Trilobita" "Trilobita" "Trilobita" ...
 $ class_no           : chr  "19100" "19100" "19100" "19100" ...
 $ order              : chr  "Proetida" "Proetida" "Proetida" "Proetida" ...
 $ order_no           : chr  "21062" "21062" "21062" "21062" ...
 $ family             : chr  "Brachymetopidae" "Phillipsiidae" "Phillipsiidae" "Proetidae" ...
 $ family_no          : chr  "56732" "85866" "85866" "56726" ...
 $ genus              : chr  "Australosutura" "Archegonus (Phillibole)" "Thigriffides" "Pudoproetus" ...
 $ genus_no           : chr  "21084" "21075" "21387" "177081" ...
 $ subgenus_no        : int  NA 21306 NA NA NA NA NA NA NA NA ...
 $ collection_name    : chr  "USNM 9047, Jack Sloan Ranch" "USNM 9047, Jack Sloan Ranch" "USNM 9047, Jack Sloan Ranch" "USNM 9048" ...
 $ collection_subset  : int  NA NA NA NA NA NA NA NA NA NA ...
 $ collection_aka     : chr  "" "" "" "" ...
 $ lng                : num  -99 -99 -99 -98.1 -98.1 ...
 $ lat                : num  31.2 31.2 31.2 31 31 ...
 $ cc                 : chr  "US" "US" "US" "US" ...
 $ state              : chr  "Texas" "Texas" "Texas" "Texas" ...
 $ county             : chr  "San Saba" "San Saba" "San Saba" "" ...
 $ latlng_basis       : chr  "based on nearby landmark" "based on nearby landmark" "based on nearby landmark" "" ...
 $ latlng_precision   : chr  "2" "2" "2" "1" ...
 $ geogscale          : chr  "outcrop" "outcrop" "outcrop" "" ...
 $ geogcomments       : chr  "800 to 900 feet SSE of an abandoned well and just SE of a stock pen on the Jack Sloan ranch. The pen and abando"| __truncated__ "800 to 900 feet SSE of an abandoned well and just SE of a stock pen on the Jack Sloan ranch. The pen and abando"| __truncated__ "800 to 900 feet SSE of an abandoned well and just SE of a stock pen on the Jack Sloan ranch. The pen and abando"| __truncated__ "Near San Saba, TX" ...
 $ paleomodel         : chr  "gplates" "gplates" "gplates" "gplates" ...
 $ geoplate           : chr  "101" "101" "101" "101" ...
 $ paleoage           : chr  "mid" "mid" "mid" "mid" ...
 $ paleolng           : num  -65.7 -65.7 -65.7 -65.6 -65.6 ...
 $ paleolat           : num  -26.7 -26.7 -26.7 -27.5 -27.5 ...
 $ formation          : chr  "Chappel Limestone" "Chappel Limestone" "Chappel Limestone" "Chappel Limestone" ...
 $ stratgroup         : chr  "" "" "" "" ...
 $ member             : chr  "" "" "" "" ...
 $ formation.1        : chr  "Chappel Limestone" "Chappel Limestone" "Chappel Limestone" "Chappel Limestone" ...
 $ stratgroup.1       : chr  "" "" "" "" ...
 $ member.1           : chr  "" "" "" "" ...
 $ stratscale         : chr  "bed" "bed" "bed" "" ...
 $ zone               : chr  "" "" "" "S. isostichia - crenulata through S. anchoralis - latus" ...
 $ localsection       : chr  "" "" "" "" ...
 $ localbed           : chr  "" "" "" "" ...
 $ localbedunit       : chr  "" "" "" "" ...
 $ localorder         : chr  "" "" "" "" ...
 $ regionalsection    : chr  "" "" "" "" ...
 $ regionalbed        : chr  "" "" "" "" ...
 $ regionalbedunit    : chr  "" "" "" "" ...
 $ regionalorder      : chr  "" "" "" "" ...
 $ stratcomments      : chr  "Chappel Limestone is a condensed one-meter thick unit spanning the S. isostichia-crenulata through Gnathodus ty"| __truncated__ "Chappel Limestone is a condensed one-meter thick unit spanning the S. isostichia-crenulata through Gnathodus ty"| __truncated__ "Chappel Limestone is a condensed one-meter thick unit spanning the S. isostichia-crenulata through Gnathodus ty"| __truncated__ "Late Kinderhookian and Osagean" ...
 $ lithdescript       : chr  "" "" "" "" ...
 $ lithology1         : chr  "\"limestone\"" "\"limestone\"" "\"limestone\"" "\"carbonate\"" ...
 $ lithadj1           : chr  "" "" "" "" ...
 $ lithification1     : chr  "lithified" "lithified" "lithified" "" ...
 $ minor_lithology1   : chr  "" "" "" "" ...
 $ fossilsfrom1       : chr  "" "" "" "" ...
 $ lithology2         : chr  "" "" "" "" ...
 $ lithadj2           : chr  "" "" "" "" ...
 $ lithification2     : chr  "" "" "" "" ...
 $ minor_lithology2   : chr  "" "" "" "" ...
 $ fossilsfrom2       : chr  "" "" "" "" ...
 $ primary_reference  : chr  "D. K. Brezinski. 1998. Trilobites from the Lower Mississippian starved basin facies of the southern United Stat"| __truncated__ "D. K. Brezinski. 1998. Trilobites from the Lower Mississippian starved basin facies of the southern United Stat"| __truncated__ "D. K. Brezinski. 1998. Trilobites from the Lower Mississippian starved basin facies of the southern United Stat"| __truncated__ "D. K. Brezinski. 1998. Trilobites from the Lower Mississippian starved basin facies of the southern United Stat"| __truncated__ ...
 $ created            : chr  "1998-11-20 07:59:51" "1998-11-20 07:59:51" "1998-11-20 07:59:51" "2001-03-26 11:06:31" ...
 $ modified           : chr  "2018-02-02 16:22:36" "2018-02-02 16:22:36" "2018-02-02 16:22:36" "2000-11-30 12:09:35" ...
 $ time_bins          : chr  "Tournaisian" "Tournaisian" "Tournaisian" "Tournaisian" ...
 $ time_contain       : chr  "Tournaisian" "Tournaisian" "Tournaisian" "Tournaisian" ...
 $ time_major         : chr  "Tournaisian" "Tournaisian" "Tournaisian" "Tournaisian" ...
 $ time_buffer        : chr  "Tournaisian" "Tournaisian" "Tournaisian" "Tournaisian" ...
 $ time_overlap       : chr  "Tournaisian" "Tournaisian" "Tournaisian" "Tournaisian" ...
 $ authorizer         : chr  "J. Alroy" "J. Alroy" "J. Alroy" "W. Kiessling" ...
 $ enterer            : chr  "M. Sommers" "M. Sommers" "M. Sommers" "W. Kiessling" ...
 $ modifier           : chr  "M. Clapham" "M. Clapham" "M. Clapham" "" ...
 $ primary_name       : chr  "Australosutura" "Carbonocoryphe" "Thigriffides" "Pudoproetus" ...
 $ primary_reso       : chr  "" "" "" "" ...
 $ subgenus_name      : chr  "" "" "" "" ...
 $ subgenus_reso      : chr  "" "" "" "" ...
 $ species_name       : chr  "llanoensis" "planucauda" "roundyi" "chappelensis" ...
 $ species_reso       : chr  "" "" "" "" ...
 $ image_no           : int  1746 1746 1746 1746 1746 1746 1746 1746 1746 1746 ...
 $ plant_organ        : chr  "" "" "" "" ...
 $ plant_organ2       : chr  "" "" "" "" ...
 $ abund_value        : chr  "4" "1" "2" "6" ...
 $ abund_unit         : chr  "specimens" "specimens" "specimens" "specimens" ...
 $ taxon_environment  : chr  "marine" "marine" "marine" "marine" ...
  [list output truncated]
 - attr(*, "chronosphere")=List of 13
  ..$ src         : chr "pbdb"
  ..$ ser         : chr "occs3"
  ..$ res         : chr "species"
  ..$ ver         : int 20240711
  ..$ datafile    : chr "pbdb_occs.rds"
  ..$ item        : int 2503
  ..$ reference   : chr [1:2] "Please remember to acknowledge the Paleobiology Database (http://paleobiodb.org) in your publication." "Alternatively, you can use: The Paleobiology Database Community. (year). Denormalized occurrence table of the P"| __truncated__
  ..$ bibtex      : chr [1:2] "NA" "NA"
  ..$ downloadDate: POSIXct[1:1], format: "2024-08-09 06:19:21"
  ..$ productDate : chr "2024-07-11"
  ..$ infoURL     : chr "https://chronosphere.info/data/pbdb/occs3/"
  ..$ API         : chr "https://paleobiodb.org/data1.2/occs/list.csv?datainfo&rowcount&interval=Ediacaran,Holocene&show=class,classext,"| __truncated__
  ..$ additional  : list()

Filtering - I. Taxonomy: quality

Get rid of unprecise entries!

# filter records not identified at least to genus
dat <-dat[dat$accepted_rank %in% c("genus", "species"),]

# omit non-informative genus entries
dat <- dat[dat$genus!="", ]

# Monitor data loss:
nrow(dat)
[1] 1435105

Strategy

  • Goal: to match the original scope of Sepkoski and Alroy
  • Conservative inclusion at high levels (whole group is to be included)
  • Include particular lower level entries
  • Dirty…

Filtering - I. Taxonomy: phyla

marineNoPlant <- c("",
    "Agmata",
    "Annelida",
    "Bilateralomorpha",
    "Brachiopoda",
    "Bryozoa",
    "Calcispongea",
    "Chaetognatha",
    "Cnidaria",
    "Ctenophora",
    "Echinodermata",
    "Entoprocta",
    "Foraminifera",
    "Hemichordata",
    "Hyolitha",
    "Mollusca",
    "Nematoda",
    "Nematomorpha",
    "Nemertina",
    "Onychophora",
    "Petalonamae",
    "Phoronida",
    "Platyhelminthes",
    "Porifera",
    "Problematica",
    "Rhizopodea",
    "Rotifera",
    "Sarcomastigophora",
    "Sipuncula",
    "Uncertain",
    "Vetulicolia",
    ""
)

# which rows?
bByPhyla <- dat$phylum%in% marineNoPlant

Filtering - I. Taxonomy: classes

#B. classes
#   levels(factor(noNeed$class))
needClass <- c(
    "Acanthodii",
    "Actinopteri",
    "Actinopterygii",
    "Agnatha",
    "Cephalaspidomorphi",
    "Chondrichthyes",
    "Cladistia",
    "Coelacanthimorpha",
    "Conodonta",
    "Galeaspida",
    "Myxini",
    "Osteichthyes",
    "Petromyzontida",
    "Plagiostomi",
    "Pteraspidomorphi",
    # here come the Arthropods
    "Artiopoda",
    "Branchiopoda",
    "Cephalocarida",
    "Copepoda",
    "Malacostraca",
    "Maxillopoda",
    "Megacheira",
    "Merostomoidea",
    "Ostracoda",
    "Paratrilobita",
    "Pycnogonida",
    "Remipedia",
    "Thylacocephala",
    "Trilobita",
    "Xiphosura"
)

# which rows?
bNeedClass <- dat$class %in% needClass

Filtering - I. Taxonomy: more

#C.  mammals
#   mammals <- dat[dat$class=="Mammalia",]
#   levels(factor(mammals$order))
needMammalOrd <- c("Cetacea", "Sirenia")

# which rows?
bMammalOrder <- dat$order %in% needMammalOrd

# the carnivores
#   carnivores <- dat[dat$order=="Carnivora",]
#   levels(factor(carnivores$family))
needFam <- c("Otariidae", "Phocidae", "Desmatophocidae")

# which rows?
bNeedMamFam<- dat$family%in%needFam

# D. Reptiles
#   reptiles <- dat[dat$class=="Reptilia",]
#   levels(factor(reptiles$order))
needReptOrd<-c(
    "Eosauropterygia",
    "Hupehsuchia",
    "Ichthyosauria",
    "Placodontia",
    "Sauropterygia",
    "Thalattosauria"
)

# which rows?
bRept <- dat$order%in%needReptOrd

# E. turtles 
#   turtles <- dat[dat$order=="Testudines",]
#   levels(factor(turtles$family))

# Chelonioidea turtles
needTurtleFam <- c(
    "Cheloniidae",
    "Protostegidae",
    "Dermochelyidae",
    "Dermochelyoidae",
    "Toxochelyidae",
    "Pancheloniidae"
)

# which rows?
bTurtle <- dat$family%in%needTurtleFam

Filtering - I. Taxonomy: execute!

Subsetting for the rows we highlighted:

# subset the data
dat <- dat[bByPhyla | bNeedClass | bMammalOrder | bNeedMamFam | bRept | bTurtle , ]

Resolve the potential homonymy problem across taxa:

dat$clgen <- paste(dat$class, dat$genus)

Number of remaining records:

nrow(dat)
[1] 1015237

Filtering - II. Environment

Fossils come from different environments:

levels(factor((dat$environment)))
 [1] ""                                    
 [2] "\"channel\""                         
 [3] "\"floodplain\""                      
 [4] "alluvial fan"                        
 [5] "basin reef"                          
 [6] "basinal (carbonate)"                 
 [7] "basinal (siliceous)"                 
 [8] "basinal (siliciclastic)"             
 [9] "carbonate indet."                    
[10] "cave"                                
[11] "channel lag"                         
[12] "coarse channel fill"                 
[13] "coastal indet."                      
[14] "crater lake"                         
[15] "crevasse splay"                      
[16] "deep subtidal indet."                
[17] "deep subtidal ramp"                  
[18] "deep subtidal shelf"                 
[19] "deep-water indet."                   
[20] "delta front"                         
[21] "delta plain"                         
[22] "deltaic indet."                      
[23] "dry floodplain"                      
[24] "dune"                                
[25] "eolian indet."                       
[26] "estuary/bay"                         
[27] "fine channel fill"                   
[28] "fissure fill"                        
[29] "fluvial indet."                      
[30] "fluvial-deltaic indet."              
[31] "fluvial-lacustrine indet."           
[32] "foreshore"                           
[33] "glacial"                             
[34] "interdistributary bay"               
[35] "interdune"                           
[36] "intrashelf/intraplatform reef"       
[37] "karst indet."                        
[38] "lacustrine - large"                  
[39] "lacustrine - small"                  
[40] "lacustrine delta front"              
[41] "lacustrine delta plain"              
[42] "lacustrine deltaic indet."           
[43] "lacustrine indet."                   
[44] "lacustrine prodelta"                 
[45] "lagoonal"                            
[46] "lagoonal/restricted shallow subtidal"
[47] "levee"                               
[48] "loess"                               
[49] "marginal marine indet."              
[50] "marine indet."                       
[51] "mire/swamp"                          
[52] "offshore"                            
[53] "offshore indet."                     
[54] "offshore ramp"                       
[55] "offshore shelf"                      
[56] "open shallow subtidal"               
[57] "paralic indet."                      
[58] "perireef or subreef"                 
[59] "peritidal"                           
[60] "platform/shelf-margin reef"          
[61] "pond"                                
[62] "prodelta"                            
[63] "reef, buildup or bioherm"            
[64] "sand shoal"                          
[65] "shallow subtidal indet."             
[66] "shoreface"                           
[67] "sinkhole"                            
[68] "slope"                               
[69] "slope/ramp reef"                     
[70] "spring"                              
[71] "submarine fan"                       
[72] "tar"                                 
[73] "terrestrial indet."                  
[74] "transition zone/lower shoreface"     
[75] "wet floodplain"                      

Filtering - II. Environment

We want go get rid of non-marine ones:

omitEnv <- c(
    "\"floodplain\"",
    "alluvial fan",
    "cave",
    "\"channel\"",
    "channel lag" ,
    "coarse channel fill",
    "crater lake",
    "crevasse splay",
    "dry floodplain",
    "delta plain",
    "dune",
    "eolian indet.",
    "fine channel fill",
    "fissure fill",
    "fluvial indet.",
    "fluvial-lacustrine indet.",
    "fluvial-deltaic indet.",
    "glacial",
    "interdune",
    "karst indet.",
    "lacustrine - large",
    "lacustrine - small",
    "lacustrine delta front",
    "lacustrine delta plain",
    "lacustrine deltaic indet.",
    "lacustrine indet.",
    "lacustrine interdistributary bay",
    "lacustrine prodelta",
    "levee",
    "loess",
    "mire/swamp",
    "pond",
    "sinkhole",
    "spring",
    "tar",
    "terrestrial indet.",
    "wet floodplain"
)

Actual omission:

dat<-dat[!dat$environment%in%omitEnv, ]

# remaining
nrow(dat)
[1] 997995

Omitting unlithified sediments

Unusually good preservation (usually in more recent intervals) might cause temporal biases in sampling. To evade this problem, it can be a good idea to omit records from unlithified sediments.

dat <- dat[dat$lithification1!="unlithified",]
nrow(dat)
[1] 928088

Stratigraphic binning (stages): two solutions

95-bin stages (stg)

data(stages, package="divDyn")
str(stages)
'data.frame':   95 obs. of  13 variables:
 $ sys      : chr  "E" "E" "E" "Cm" ...
 $ system   : chr  "Ediacaran" "Ediacaran" "Ediacaran" "Cambrian" ...
 $ series   : chr  "Upper Ediacaran" "Upper Ediacaran" "Upper Ediacaran" "Terreneuvian" ...
 $ stage    : chr  "Avalon Assemblage" "White Sea Assemblage" "Nama Assemblage" "Fortunian" ...
 $ short    : chr  "Ava" "Whs" "Nam" "For" ...
 $ bottom   : num  580 560 550 539 529 ...
 $ mid      : num  570 555 544 534 525 ...
 $ top      : num  560 550 539 529 521 ...
 $ dur      : num  20 10 11.2 9.8 8 6.5 5.5 4.5 4 3.5 ...
 $ stg      : int  1 2 3 4 5 6 7 8 9 10 ...
 $ systemCol: chr  "#fddd88" "#fddd88" "#fddd88" "#94ac72" ...
 $ seriesCol: chr  "#fddd88" "#fddd88" "#fddd88" "#9db989" ...
 $ col      : chr  "#fcd589" "#fdd587" "#fed583" "#a9be93" ...
  • From Ediacaran
  • Some stages are merged

Based on: Gradstein, Ogg, and Schmitz (2020)

Stratigraphic binning (stages): two solutions

PBDB time-bins (stb)

  • Literal stages of the GTS (99 bins)
  • Internally done by the PBDB
  • Using time_bins, time_contain, time_overlap, time_major, time_buffer
  • Sounds awesome but unstable!

Note

We start with solution 1 to better understand the process.

Based on: Gradstein, Ogg, and Schmitz (2020)

Stratigraphic binning

  • Stratigraphic attributes belong to collections
  • Assignment is uncertain
  • Hundreds of names: standard, regional, historic all entries come from:
    • early_interval: obligatory, starting name
    • late_interval: facultative, if collection range doesn’t fit in one interval

Stage number lookup table

Relevant information is in the keys object: ?keys

data(keys, package="divDyn")
str(keys,1)
List of 8
 $ tenInt:List of 51
 $ stgInt:List of 84
 $ reefs :List of 3
 $ lith  :List of 3
 $ lat   :List of 3
 $ bath  :List of 3
 $ grain :List of 3
 $ depenv:List of 3

This object contains useful information for processing the PBDB: from the old FossilWorks and from scripts of W. Kiessling

The relevant information is in:

keys$stgInt
$`14`
 [1] "Cressagian"   "Datsonian"    "Deming"       "Demingian"    "Gasconadian" 
 [6] "Mansian"      "Migneintian"  "Pakerort"     "Skullrockian" "Stairsian"   
[11] "Tremadoc"     "Tremadocian"  "Varangu"      "Warendian"   

$`15`
 [1] "Bendigonian"   "Billingen"     "Blackhillsian" "Cassinian"    
 [5] "Chewtonian"    "Floian"        "Hunneberg"     "Jefferson"    
 [9] "Jeffersonian"  "Moridunian"    "Tulean"       

$`16`
[1] "Castlemainian" "Dapingian"     "Dawan"         "Fennian"      
[5] "Klabava"       "Langevoja"     "Rangerian"     "Volkhov"      
[9] "Yapeenian"    

$`17`
 [1] "Abereiddian"      "Aluoja"           "Aseri"            "Chazy"           
 [5] "Chazyan"          "Darriwilian"      "Dobrotiva"        "Dobrotivian"     
 [9] "Early Llandeilo"  "Early Llanvirn"   "Hulo"             "Hunderum"        
[13] "Kunda"            "Lasnamagi"        "Late Llandeilo"   "Late Llanvirn"   
[17] "Llandeilian"      "Llandeilo"        "Llanvirn"         "Llanvirnian"     
[21] "Middle Llandeilo" "Oretanian"        "Sarka"            "Uhaku"           
[25] "Valaste"          "Whiterock"        "Wildernessian"   

$`18`
 [1] "Ashbyan"      "Aurelucian"   "Black River"  "Blackriveran" "Burrellian"  
 [6] "Costonian"    "Gisbornian"   "Haljala"      "Harnagian"    "Idavere"     
[11] "Johvi"        "Kirkfieldian" "Kukruse"      "Mohawkian"    "Rocklandian" 
[16] "Sandbian"     "Soudleyan"    "Turinian"    

$`19`
 [1] "Actonian"      "Bohdalec"      "Cautleyan"     "Chatfieldian" 
 [5] "Cheneyan"      "Eastonian"     "Edenian"       "Katian"       
 [9] "Kralodvorian"  "Marshbrookian" "Maysville"     "Maysvillian"  
[13] "Nabala"        "Oandu"         "Onnian"        "Pirgu"        
[17] "Pusgillian"    "Rakvere"       "Rawtheyan"     "Richmond"     
[21] "Richmondian"   "Shermanian"    "Streffordian"  "Vormsi"       
[25] "Woolstonian"   "Zahorany"     

$`20`
[1] "Gamach"     "Gamachian"  "Hirnantian" "Kosovian"   "Porkuni"   

$`21`
[1] "Juuru"      "Rhuddanian"

$`22`
[1] "Aeronian"

$`23`
[1] "Telychian"

$`24`
[1] "Jaani"        "Sheinwoodian"

$`25`
[1] "Gleedonian"  "Homerian"    "Rootsikula"  "Whitwellian"

$`26`
[1] "Bringewood" "Elton"      "Gorstian"  

$`27`
[1] "Kuressaare"   "Leintwardine" "Ludfordian"   "Whitcliffe"  

$`28`
[1] "Kaugetuma" "Ohesaare"  "Pridoli"  

$`29`
[1] "Gedinne"    "Gedinnine"  "Lochkovian"

$`30`
[1] "Pragian" "Siegen" 

$`31`
[1] "Deer Park"    "Early Emsian" "Ems"          "Emsian"       "Late Emsian" 
[6] "Zlichovian"  

$`32`
[1] "Eifel"     "Eifelian"  "Southwood"

$`33`
[1] "Early Givetian"  "Givet"           "Givetian"        "Late Givetian"  
[5] "Middle Givetian" "Taghanic"       

$`34`
[1] "Early Frasnian"  "Frasnian"        "Late Frasnian"   "Middle Frasnian"

$`35`
[1] "Early Famennian"  "Famennian"        "Late Famennian"   "Middle Famennian"

$`36`
[1] "Early Mississippian" "Hastarian"           "Ivorian"            
[4] "Kinderhookian"       "Tournaisian"        

$`37`
[1] "Arundian"   "Asbian"     "Brigantian" "Chadian"    "Holkerian" 
[6] "Jiusian"    "Meramecian" "Visean"    

$`38`
[1] "Arnsbergian"        "Late Mississippian" "Pendleian"         
[4] "Serpukhovian"      

$`39`
 [1] "Alportian"           "Bashkirian"          "Cheremshanian"      
 [4] "Chokierian"          "Early Pennsylvanian" "Kinderscoutian"     
 [7] "Krasnopolyanian"     "Marsdenian"          "Morrowan"           
[10] "Prikarnian"          "Voznesenian"         "Westphalian A"      
[13] "Yeadonian"          

$`40`
 [1] "Desmoinesian"         "Kashirian"            "Middle Pennsylvanian"
 [4] "Moscovian"            "Myachkovian"          "Podolskian"          
 [7] "Vereian"              "Westphalian B"        "Westphalian C"       
[10] "Westphalian D"       

$`41`
[1] "Dorogovilovian" "Kasimovian"     "Khamovnichean"  "Krevyakian"    
[5] "Missourian"     "Stephanian A"   "Stephanian B"  

$`42`
[1] "Gzhelian"        "Kuzel"           "Melekhovian"     "Noginian"       
[5] "Pavlovoposadian" "Rusavkian"       "Stephanian C"   

$`43`
[1] "Asselian"   "Autunian"   "Shikhanian"

$`44`
[1] "Sakmarian"      "Sterlitamakian" "Tastubian"     

$`45`
[1] "Artinskian" "Sarginian" 

$`46`
[1] "Irenian"   "Kungurian" "Saranian"  "Ufimian"  

$`47`
[1] "Early Kazanian" "Kazanian"       "Kubergandian"   "Late Kazanian" 
[5] "Roadian"        "Sokian"        

$`48`
[1] "Murgabian" "Urzhumian" "Word"      "Wordian"  

$`49`
[1] "Capitan"    "Capitanian"

$`50`
[1] "Longtanian"    "Wuchiapingian"

$`51`
[1] "Changhsingian"

$`52`
[1] "Dienerian"    "Griesbachian" "Induan"      

$`53`
[1] "Olenekian" "Smithian"  "Spathian" 

$`54`
[1] "Aegean"    "Anisian"   "Bithynian" "Illyrian"  "Pelsonian"

$`55`
[1] "Fassanian"    "Ladinian"     "Longobardian"

$`56`
[1] "Carnian"  "Julian"   "Oretian"  "Tuvalian"

$`57`
[1] "Alaunian" "Lacian"   "Norian"   "Otamitan" "Sevatian" "Warepan" 

$`58`
[1] "Otapirian" "Rhaetian" 

$`59`
[1] "Angulata"          "Canadensis"        "Early Hettangian" 
[4] "Hettangian"        "Late Hettangian"   "Liasicus"         
[7] "Middle Hettangian" "Planorbis"        

$`60`
[1] "Bucklandi"        "Early Sinemurian" "Late Sinemurian"  "Obtusum"         
[5] "Oxynotum"         "Raricostatum"     "Semicostatum"     "Sinemurian"      
[9] "Turneri"         

$`61`
 [1] "Davoei"              "Early Pliensbachian" "Freboldi"           
 [4] "Ibex"                "Imlayi"              "Jamesoni"           
 [7] "Kunae"               "Late Pliensbachian"  "Margaritatus"       
[10] "Pliensbachian"       "Spinatum"            "Whiteavesi"         

$`62`
 [1] "Bifrons"         "Early Toarcian"  "Falciferum"      "Kanense"        
 [5] "Late Toarcian"   "Levesquei"       "Middle Toarcian" "Planulata"      
 [9] "Tenuicostatum"   "Thouarsense"     "Toarcian"        "Variabilis"     

$`63`
[1] "Aalenian"        "Concavum"        "Early Aalenian"  "Late Aalenian"  
[5] "Middle Aalenian" "Murchisonae"     "Opalinum"        "Scissum"        

$`64`
 [1] "Bajocian"       "Discites"       "Early Bajocian" "Garantiana"    
 [5] "Humphresianum"  "Laeviuscula"    "Late Bajocian"  "Parkinsoni"    
 [9] "Sauzei"         "Subfurcatum"   

$`65`
[1] "Bathonian"        "Early Bathonian"  "Late Bathonian"   "Middle Bathonian"

$`66`
[1] "Callovian"        "Early Callovian"  "Late Callovian"   "Middle Callovian"

$`67`
[1] "Early Oxfordian"  "Late Oxfordian"   "Middle Oxfordian" "Oxfordian"       

$`68`
[1] "Early Kimmeridgian" "Kimmeridgian"       "Late Kimmeridgian" 

$`69`
[1] "Early Tithonian"  "Late Tithonian"   "Middle Tithonian" "Puaroan"         
[5] "Tithonian"       

$`70`
[1] "Berriasian"        "Early Berriasian"  "Late Berriasian"  
[4] "Middle Berriasian"

$`71`
[1] "Early Valanginian" "Late Valanginian"  "Valanginian"      

$`72`
[1] "Early Hauterivian" "Hauterivian"       "Late Hauterivian" 

$`73`
[1] "Barremian"       "Early Barremian" "Late Barremian" 

$`74`
[1] "Aptian"       "Early Aptian" "Late Aptian" 

$`75`
[1] "Albian"        "Early Albian"  "Late Albian"   "Middle Albian"
[5] "Motuan"        "Urutawan"     

$`76`
[1] "Cenomanian"        "Early Cenomanian"  "Late Cenomanian"  
[4] "Middle Cenomanian" "Ngaterian"        

$`77`
[1] "Early Turonian"  "Late Turonian"   "Mangaotanean"    "Middle Turonian"
[5] "Turonian"       

$`78`
[1] "Coniacian"        "Early Coniacian"  "Late Coniacian"   "Middle Coniacian"
[5] "Teratan"         

$`79`
[1] "Early Santonian"  "Late Santonian"   "Middle Santonian" "Piripauan"       
[5] "Santonian"       

$`80`
[1] "Campanian"        "Early Campanian"  "Judithian"        "Late Campanian"  
[5] "Middle Campanian"

$`81`
[1] "Early Maastrichtian" "Edmontonian"         "Lancian"            
[4] "Late Maastrichtian"  "Maastrichtian"      

$`82`
[1] "Danian"          "Early Paleocene" "Pu1"             "Pu3"            
[5] "Puercan"         "To2"             "To3"             "Torrejonian"    

$`83`
 [1] "Cf1"              "Cf2"              "Cf3"              "Clarkforkian"    
 [5] "Gashatan"         "Late Paleocene"   "Middle Paleocene" "Riochican"       
 [9] "Selandian"        "Thanetian"        "Ti1"             

$`84`
 [1] "Bumbanian"    "Early Eocene" "Graybullian"  "Lostcabinian" "Lysitean"    
 [6] "Mangaorapan"  "MP 10"        "MP 7"         "MP 8 + 9"     "Sandcouleean"
[11] "Waipawan"     "Wasatchian"   "Ypresian"    

$`85`
[1] "Early Uintan" "Late Uintan"  "Lutetian"     "MP 11"        "MP 13"       
[6] "MP 14"        "Mustersan"    "Uintan"      

$`86`
[1] "Bartonian"  "Duchesnean"

$`87`
 [1] "Chadronian"        "Early Chadronian"  "Ergilian"         
 [4] "Houldjinian"       "Kaiatan"           "Late Chadronian"  
 [7] "Late Eocene"       "Middle Chadronian" "MP 18"            
[10] "MP 19"             "MP 20"             "Priabonian"       
[13] "Runangan"          "Ulangochuian"     

$`88`
[1] "Early Oligocene" "MP 21"           "MP 22"           "MP 23"          
[5] "MP 24"           "Orellan"         "Rupelian"        "Whitneyan"      

$`89`
[1] "Chattian"       "Duntroonian"    "Late Oligocene" "Monroecreekian"
[5] "MP 26"          "MP 27"          "MP 28"          "MP 29"         
[9] "MP 30"         

$`90`
 [1] "Aquitanian"          "Burdigalian"         "Early Hemingfordian"
 [4] "Early Miocene"       "Hemingfordian"       "Late Hemingfordian" 
 [7] "MN 1"                "MN 3"                "MN 4"               
[10] "Orleanian"           "Santacrucian"       

$`91`
 [1] "Badenian"       "Clifdenian"     "Colloncuran"    "Langhian"      
 [5] "Lillburnian"    "Middle Miocene" "MN 5"           "MN 6"          
 [9] "Sarmatian"      "Serravallian"   "Waiauan"       

$`92`
 [1] "Chasicoan"               "early Early Hemphillian"
 [3] "Kapitean"                "Late Miocene"           
 [5] "Messinian"               "MN 10"                  
 [7] "MN 11"                   "MN 12"                  
 [9] "MN 9"                    "Pannonian"              
[11] "Tortonian"               "Vallesian"              

$`93`
 [1] "Early Pliocene" "Early Pliocene" "Late Pliocene"  "Late Pliocene" 
 [5] "MN 14"          "MN 16"          "Opoitian"       "Piacenzian"    
 [9] "Pliocene"       "Villanyian"     "Waipipian"      "Zanclean"      

$`94`
[1] "Calabrian"          "Early Pleistocene"  "Gelasian"          
[4] "Ionian"             "Late Pleistocene"   "Middle Pleistocene"
[7] "MN 17"              "Pleistocene"       

$`95`
[1] "Holocene"

$`-1`
[1] ""

$default
[1] NA

Combine/categorize entries

Built-in categorize() function allows us to find entries in keys$stgInt:

# B. the stg entries (lookup)
stgMin<-categorize(dat[,"early_interval"],keys$stgInt)
stgMax<-categorize(dat[,"late_interval"],keys$stgInt)
entry result meaning
"Ypresian" 84 name indicates interval confined to a stage
"Permian" NA Name indicates interval not confined to a stage
"" -1 Inteval entry is not given - late_interval

Lower number - earlier stage.

We need a single bin entry

Make sure that numbers are numbers!

stgMin<-as.numeric(stgMin)
stgMax<-as.numeric(stgMax)

We need just one bin for every entry! Add this to the table, we will fill this in the next step.

# empty container
dat$stg <- rep(NA, nrow(dat))

What are the cases?

stgMin stgMax Meaning stg
84 -1 early_interval resolves to one stage, no late_interval 84
84 84 early_interval resolves to same stage as late_interval 84
84 86 collection can range from stages 84 to 86 NA
84 NA late_interval does not resolve to one stage NA
NA -1 early_interval does not resolve to one stage, the other missing NA
NA NA neither early_interval nor late_interval resolve to one stage NA

Deriving a single stage number

Programming the rules above with logical expressions:

# select entries, where
stgCondition <- c(
# the early and late interval fields indicate the same stg
    which(stgMax==stgMin),
# or the late_intervarl field is empty
    which(stgMax==-1))

In these entries, use the stg indicated by the early_interval:

dat$stg[stgCondition] <- stgMin[stgCondition]

Result

  • integer: record comes from a one stage
  • NA: record does not come from one stage

Extra cases /1

The Cambrian and the Ordovician are problematic.

The Cambrian

Collections were assigned by Na Lin to the 10 stages (Na and Kiessling 2015) .

# load data
load(url("https://github.com/divDyn/ddPhanero/raw/master/data/Stratigraphy/2018-08-31/cambStrat.RData"))

# correct it with this function
source("https://github.com/divDyn/ddPhanero/raw/master/scripts/strat/2018-08-31/cambProcess.R")

Note

  • This is only necessary if you do Phanerozoic scale analyses!
  • You do not need to repeat all these stages, when you focus on a shorter interval of time or a specific taxa. Think about whether these are necessary!

Extra cases /2

The Cambrian and the Ordovician are problematic.

The Ordovician

Assigments based on formations and biozone by Wolfgang Kiessling.

# load data
load(url("https://github.com/divDyn/ddPhanero/raw/master/data/Stratigraphy/2018-08-31/ordStrat.RData"))

# correct it with this function
source("https://github.com/divDyn/ddPhanero/raw/master/scripts/strat/2019-05-31/ordProcess.R")

Note

  • This is only necessary if you do Phanerozoic scale analyses!
  • You do not need to repeat all these stages, when you focus on a shorter interval of time or a specific taxa. Think about whether these are necessary!

The PBDB’s internal ‘time bins’ (stb)

  • Collection’s assignment included in the time_contain, time_bins, time_overlap, time_major, time_buffer fields
  • Strategy:
  1. Make timescale from the entries
  2. Assign bin numbers to collections/records

Example solution

Function to extract relevant information from the PBDB download

#' Function to generate stage-lookup table from the Paleobiology Database
#'
#' Processess time-bin data downloaded from the database
#'
#' @param pbdb data.frame from database.
#' @param stagecolumn Charater string, of the time-bin information column
#' @param min_ma Character string, column name of the minimum ages 
#' @param max_ma Character string, column name of the maximum ages 
#' @param bin Character string, this will be the column name of the bin information
#' @return A data.frame with the stages' name bottom, mid and top age and bin number.
StagesFromPBDB<- function(pbdb, stagecolumn="time_contain", min_ma="min_ma", max_ma="max_ma", bin="stb"){
    # get the appropriate colulmns out of the PBDB download
    timeinfo <- unique(pbdb[,c(stagecolumn, max_ma, min_ma)])

    # omit those entires that are not meaningful
    timeinfo <- timeinfo[timeinfo[,stagecolumn]!="-",]
    
    # search for stage minima and maxima
    ma <- tapply(INDEX=timeinfo[,stagecolumn], X=timeinfo[, max_ma], max)
    min <- tapply(INDEX=timeinfo[,stagecolumn], X=timeinfo[, min_ma], min)

    # create stages object
    stagesDF <- data.frame(stage=names(ma), bottom=ma, mid=(ma+min)/2, top=min)
    stagesDF <-stagesDF[order(stagesDF$bottom, decreasing=TRUE),]
    rownames(stagesDF) <- NULL

    # check consistency - Cambrian is not consistent
    # stages$top[1:(nrow(stages)-1)]==stages$bottom[2:nrow(stages)]

    # create integer bin numbers
    stagesDF<- cbind(stagesDF, 1:nrow(stagesDF))
    colnames(stagesDF)[ncol(stagesDF)] <- bin
    return(stagesDF)
}

The PBDB’s internal ‘time bins’ (stb)

  • Collection’s assignment included in the time_contain, time_bins, time_overlap, time_major, time_buffer fields
  • Strategy:
  1. Make timescale from the entries
  2. Assign bin numbers to collections/records

Call to the function:

# new stages object
stagebins <- StagesFromPBDB(dat)
stagebins
              stage   bottom       mid      top stb
1         Fortunian 538.8000 533.90000 529.0000   1
2           Stage 2 525.5000 523.25000 521.0000   2
3           Stage 3 521.0000 517.50000 514.0000   3
4           Stage 4 514.0000 511.50000 509.0000   4
5           Wuliuan 509.0000 506.75000 504.5000   5
6           Drumian 504.5000 502.50000 500.5000   6
7        Guzhangian 500.5000 498.75000 497.0000   7
8           Paibian 497.0000 495.50000 494.0000   8
9      Jiangshanian 494.0000 491.75000 489.5000   9
10         Stage 10 489.5000 487.45000 485.4000  10
11      Tremadocian 485.4000 481.55000 477.7000  11
12           Floian 477.7000 473.85000 470.0000  12
13        Dapingian 470.0000 468.65000 467.3000  13
14      Darriwilian 467.3000 462.85000 458.4000  14
15         Sandbian 458.4000 455.70000 453.0000  15
16           Katian 453.0000 449.10000 445.2000  16
17       Hirnantian 445.2000 444.50000 443.8000  17
18       Rhuddanian 443.8000 442.30000 440.8000  18
19         Aeronian 440.8000 439.65000 438.5000  19
20        Telychian 438.5000 435.95000 433.4000  20
21     Sheinwoodian 433.4000 431.95000 430.5000  21
22         Homerian 430.5000 428.95000 427.4000  22
23         Gorstian 427.4000 426.50000 425.6000  23
24       Ludfordian 425.6000 422.40000 419.2000  24
25       Lochkovian 419.2000 415.00000 410.8000  25
26          Pragian 410.8000 409.20000 407.6000  26
27           Emsian 407.6000 400.45000 393.3000  27
28         Eifelian 393.3000 390.50000 387.7000  28
29         Givetian 387.7000 385.20000 382.7000  29
30         Frasnian 382.7000 377.45000 372.2000  30
31        Famennian 372.2000 365.55000 358.9000  31
32      Tournaisian 358.9000 352.80000 346.7000  32
33           Visean 346.7000 338.80000 330.9000  33
34     Serpukhovian 330.9000 327.05000 323.2000  34
35       Bashkirian 323.2000 319.20000 315.2000  35
36        Moscovian 315.2000 311.10000 307.0000  36
37       Kasimovian 307.0000 305.35000 303.7000  37
38         Gzhelian 303.7000 301.30000 298.9000  38
39         Asselian 298.9000 296.21000 293.5200  39
40        Sakmarian 293.5200 291.81000 290.1000  40
41       Artinskian 290.1000 286.80000 283.5000  41
42        Kungurian 283.5000 278.25500 273.0100  42
43          Roadian 273.0100 269.95500 266.9000  43
44          Wordian 266.9000 265.59000 264.2800  44
45       Capitanian 264.2800 261.89500 259.5100  45
46    Wuchiapingian 259.5100 256.82500 254.1400  46
47    Changhsingian 254.1400 253.02100 251.9020  47
48           Induan 251.9020 251.55100 251.2000  48
49        Olenekian 251.2000 249.20000 247.2000  49
50          Anisian 247.2000 244.60000 242.0000  50
51         Ladinian 242.0000 239.50000 237.0000  51
52          Carnian 237.0000 232.00000 227.0000  52
53           Norian 227.0000 217.75000 208.5000  53
54         Rhaetian 208.5000 204.95000 201.4000  54
55       Hettangian 201.4000 200.45000 199.5000  55
56       Sinemurian 199.5000 196.20000 192.9000  56
57    Pliensbachian 192.9000 188.55000 184.2000  57
58         Toarcian 184.2000 179.45000 174.7000  58
59         Aalenian 174.7000 172.80000 170.9000  59
60         Bajocian 170.9000 169.55000 168.2000  60
61        Bathonian 168.2000 166.75000 165.3000  61
62        Callovian 165.3000 163.40000 161.5000  62
63        Oxfordian 161.5000 158.15000 154.8000  63
64     Kimmeridgian 154.8000 152.00000 149.2000  64
65        Tithonian 149.2000 147.10000 145.0000  65
66       Berriasian 145.0000 142.40000 139.8000  66
67      Valanginian 139.8000 136.20000 132.6000  67
68      Hauterivian 132.6000 129.18500 125.7700  68
69        Barremian 125.7700 123.58500 121.4000  69
70           Aptian 121.4000 117.20000 113.0000  70
71           Albian 113.0000 106.75000 100.5000  71
72       Cenomanian 100.5000  97.20000  93.9000  72
73         Turonian  93.9000  91.85000  89.8000  73
74        Coniacian  89.8000  88.05000  86.3000  74
75        Santonian  86.3000  84.95000  83.6000  75
76        Campanian  83.6000  77.85000  72.1000  76
77    Maastrichtian  72.1000  69.05000  66.0000  77
78           Danian  66.0000  63.80000  61.6000  78
79        Selandian  61.6000  60.40000  59.2000  79
80        Thanetian  59.2000  57.60000  56.0000  80
81         Ypresian  56.0000  51.90000  47.8000  81
82         Lutetian  47.8000  44.50000  41.2000  82
83        Bartonian  41.2000  39.45500  37.7100  83
84       Priabonian  37.7100  35.80500  33.9000  84
85         Rupelian  33.9000  30.86000  27.8200  85
86         Chattian  27.8200  25.42500  23.0300  86
87       Aquitanian  23.0300  21.73500  20.4400  87
88      Burdigalian  20.4400  18.21000  15.9800  88
89         Langhian  15.9800  14.90000  13.8200  89
90     Serravallian  13.8200  12.72500  11.6300  90
91        Tortonian  11.6300   9.43800   7.2460  91
92        Messinian   7.2460   6.28950   5.3330  92
93         Zanclean   5.3330   4.46650   3.6000  93
94       Piacenzian   3.6000   3.09000   2.5800  94
95         Gelasian   2.5800   2.19000   1.8000  95
96        Calabrian   1.8000   1.28700   0.7740  96
97        Chibanian   0.7740   0.45150   0.1290  97
98 Late Pleistocene   0.1290   0.07035   0.0117  98
99         Holocene   0.0117   0.00585   0.0000  99

Note the issue with the beginning of the Cambrian!

The PBDB’s internal ‘time bins’ (stb)

  • Collection’s assignment included in the time_contain, time_bins, time_overlap, time_major, time_buffer fields
  • Strategy:
  1. Make timescale from the entries
  2. Assign bin numbers to collections/records
#' Function to bin PBDB data to stages
#'
#' @param pbdb data.frame from database.
#' @param stages data.frame produced by StagesFromPBDB.
#' @param bin Character string, this will be the column name of the bin information
#' @param stagecolumn Character string, of the time-bin information column in the PBDB
#' @return A vector with integer bin numbers
BinPBDB <- function(pbdb,stages,bin="stb",stagecolumn="time_contain"){
    # new vector
    y <- rep(NA, nrow(pbdb))

    # simple for loop to execute the binning
    for(i in 1:nrow(stages)){
        y[which(pbdb[, stagecolumn]==stages$stage[i])] <-stages[i,bin]
    }

    return(y)
}

And finally do the binning:

# bins from the pbdb output
dat$stb <- BinPBDB(dat, stagebins)

We are done with the preparation

For other exercises (CMR): join some bits from the timescale

# stg (95bin) entries
stgFields <- stages[,c("stg", "stage","top", "bottom")]
colnames(stgFields)[2:4] <- paste0("stg_", colnames(stgFields))[2:4]
datMerged <- merge(dat,stgFields, by="stg")

# stb (99bin) entries
stbFields <- stagebins
colnames(stbFields)[1:4] <- paste0("stb_",colnames(stbFields)[1:4])
datMerged <- merge(datMerged,stbFields, by="stb")

This might be a good time to take a snapshot and save an incremental result object (e.g. .rds)

Note

  • The smaller subset you look at, the weirder your data behaves.
  • This should not be considered final!

Basic sampling patterns

The number of collections and number of records (occurrences) are very much correlated with sampling.

These simple statistics can be calculated with basic functions:

table(dat$stg)

    4     5     6     7     8     9    10    11    12    13    14    15    16 
 1359  1813  5664  2663  3948  1030  2400  1203  2597  1808  6438  5863  4609 
   17    18    19    20    21    22    23    24    25    26    27    28    29 
15147 15589 33271  3766  4662  3224  5797  6980  3918  2584  4827  4094  7735 
   30    31    32    33    34    35    36    37    38    39    40    41    42 
 5270  7888  7489 22065 10775  7303  6320 13284  4576  3806  6477  2859  3106 
   43    44    45    46    47    48    49    50    51    52    53    54    55 
 5519  8105  8132 14374 10898 11201  8941 18660 16266  7765 10146 14441  5832 
   56    57    58    59    60    61    62    63    64    65    66    67    68 
10964  8626  5910  3419  6500 17322 15314  3964  5897  6702  9198 10294  6909 
   69    70    71    72    73    74    75    76    77    78    79    80    81 
 6909  3384  7064  5434  3627  7383 10922 13777  8334  2269  5000 14460 36044 
   82    83    84    85    86    87    88    89    90    91    92    93    94 
 7612  6084  8027  9156 12154 10772  8273  7182 14902 14514 20438 21203 24030 
   95 
 3446 

Sampling

To make these even faster and convenient, the binstat() function can calculate them all at once:

sampStg<- binstat(dat, bin="stg", tax="clgen", coll="collection_no", 
        duplicates=FALSE)  

Note

duplicates=FALSE makes sure that multiple occurrences of the same genus in the same collection (i.e. when there are multiple species) are counted as one!

The result of binstat()

str(sampStg)
'data.frame':   95 obs. of  8 variables:
 $ stg     : num  1 2 3 4 5 6 7 8 9 10 ...
 $ occs    : num  NA NA NA 1074 1466 ...
 $ colls   : num  NA NA NA 217 225 756 636 821 266 463 ...
 $ SIBs    : num  NA NA NA 160 218 655 469 532 294 328 ...
 $ occ1    : num  NA NA NA 74 83 236 194 199 144 124 ...
 $ occ2    : num  NA NA NA 17 34 105 73 75 54 58 ...
 $ u       : num  NA NA NA 0.931 0.943 ...
 $ chao1occ: num  NA NA NA 321 319 ...
  • By default, there are as many rows as the highest integer in the bin argument (95).
  • There are missing values in the beginning. This is can help with plotting.
  • Better: plot the values in the context of the timescale.

Unifying the output

Easiest: merge the results with the time scale object.

sampling <- merge(stages, sampStg, by="stg")
str(sampling)
'data.frame':   95 obs. of  20 variables:
 $ stg      : int  1 2 3 4 5 6 7 8 9 10 ...
 $ sys      : chr  "E" "E" "E" "Cm" ...
 $ system   : chr  "Ediacaran" "Ediacaran" "Ediacaran" "Cambrian" ...
 $ series   : chr  "Upper Ediacaran" "Upper Ediacaran" "Upper Ediacaran" "Terreneuvian" ...
 $ stage    : chr  "Avalon Assemblage" "White Sea Assemblage" "Nama Assemblage" "Fortunian" ...
 $ short    : chr  "Ava" "Whs" "Nam" "For" ...
 $ bottom   : num  580 560 550 539 529 ...
 $ mid      : num  570 555 544 534 525 ...
 $ top      : num  560 550 539 529 521 ...
 $ dur      : num  20 10 11.2 9.8 8 6.5 5.5 4.5 4 3.5 ...
 $ systemCol: chr  "#fddd88" "#fddd88" "#fddd88" "#94ac72" ...
 $ seriesCol: chr  "#fddd88" "#fddd88" "#fddd88" "#9db989" ...
 $ col      : chr  "#fcd589" "#fdd587" "#fed583" "#a9be93" ...
 $ occs     : num  NA NA NA 1074 1466 ...
 $ colls    : num  NA NA NA 217 225 756 636 821 266 463 ...
 $ SIBs     : num  NA NA NA 160 218 655 469 532 294 328 ...
 $ occ1     : num  NA NA NA 74 83 236 194 199 144 124 ...
 $ occ2     : num  NA NA NA 17 34 105 73 75 54 58 ...
 $ u        : num  NA NA NA 0.931 0.943 ...
 $ chao1occ : num  NA NA NA 321 319 ...

Plotting

  1. We create an empty canvas to plot on.
tsplot(stages, boxes="sys", shading="sys", xlim=4:95, ylim=c(0,20000), 
    ylab="Number of Records")

The number of records

# occurrences
lines(sampling$mid, sampling$occs, lwd=2)

The number of collections

# collections
lines(sampling$mid, sampling$colls, lwd=2, col="blue")

Adding a legend

legend("topleft", bg="white", legend=c("occurrences", "collections"), 
    col=c("black", "blue"), lwd=2, inset=c(0.05,0.01), cex=1.3)

Sampling is heterogeneous

Strong correlation between collection and record numbers

cor.test(sampling$occs, sampling$colls, method="spearman")

    Spearman's rank correlation rho

data:  sampling$occs and sampling$colls
S = 14290, p-value < 2.2e-16
alternative hypothesis: true rho is not equal to 0
sample estimates:
      rho 
0.8898787 

Main function: divDyn()

Calculate basic metrics of diversity dynamics:

dd <- divDyn(dat, bin="stg", tax="clgen")
str(dd)
'data.frame':   95 obs. of  31 variables:
 $ stg      : num  1 2 3 4 5 6 7 8 9 10 ...
 $ t2d      : num  NA NA NA NA 93 119 205 169 147 101 ...
 $ t2u      : num  NA NA NA 93 119 205 169 147 101 79 ...
 $ t3       : num  NA NA NA NA 52 56 67 52 44 24 ...
 $ tPart    : num  NA NA NA NA 14 2 25 13 19 5 ...
 $ tGFd     : num  NA NA NA NA NA 4 4 13 5 6 ...
 $ tGFu     : num  NA NA NA NA 1 10 11 6 9 9 ...
 $ tSing    : num  NA NA NA 51 47 350 149 222 74 116 ...
 $ tOri     : num  NA NA NA 109 78 172 113 112 57 86 ...
 $ tExt     : num  NA NA NA NA 38 64 128 121 95 80 ...
 $ tThrough : num  NA NA NA NA 71 85 129 121 138 115 ...
 $ divSIB   : num  NA NA NA 160 218 655 469 532 294 328 ...
 $ divCSIB  : num  NA NA NA NA 218 ...
 $ divRT    : num  NA NA NA 160 234 671 519 576 364 397 ...
 $ divBC    : num  NA NA NA NA 109 149 257 242 233 195 ...
 $ extProp  : num  NA NA NA 0.319 0.363 ...
 $ oriProp  : num  NA NA NA 1 0.534 ...
 $ extPC    : num  NA NA NA NA 0.429 ...
 $ oriPC    : num  NA NA NA NA 0.741 ...
 $ ext3t    : num  NA NA NA NA 0.581 ...
 $ ori3t    : num  NA NA NA NA 0.828 ...
 $ extC3t   : num  NA NA NA NA 0.546 ...
 $ oriC3t   : num  NA NA NA NA NA ...
 $ extGF    : num  NA NA NA NA 0.468 ...
 $ oriGF    : num  NA NA NA NA NA ...
 $ E2f3     : num  NA NA NA NA 0.28 ...
 $ O2f3     : num  NA NA NA NA NA ...
 $ ext2f3   : num  NA NA NA NA 0.329 ...
 $ ori2f3   : num  NA NA NA NA NA ...
 $ samp3t   : num  NA NA NA NA 0.788 ...
 $ sampRange: num  NA NA NA NA 0.775 ...

The output of divDyn()

Returns a data.frame, row of values for every bin. Prefixes:

  • t: taxon counts
  • div: diversity counts
  • ext: extinction metrics
  • ori: origination metrics
  • samp: sampling metrics

See: ?divDyn for more…

Range-based metrics of diversity dynamics

Range-through diversity

divRT = tSing(FL) + tExt(bL) + tOri(Ft) + tThrough(bt)

Boundary-crosser diversity

divBC = tSing(FL) + tExt(bL) + tOri(Ft) + tThrough(bt)

Per capita rates (M. Foote 1999)

  • Time homogeneous process in bin, Decay rate
  • Originally divided by \(\Delta t\)
  • Assumption about pulsedness (Michael Foote 2005)

oriPC = -log( tOri / ( tOri + tThrough ))
extPC = -log( tExt / ( tExt + tThrough ))

Plotting preparation

To make plotting easy:

patterns <- merge(stages, dd, by="stg")
str(patterns)
'data.frame':   95 obs. of  43 variables:
 $ stg      : int  1 2 3 4 5 6 7 8 9 10 ...
 $ sys      : chr  "E" "E" "E" "Cm" ...
 $ system   : chr  "Ediacaran" "Ediacaran" "Ediacaran" "Cambrian" ...
 $ series   : chr  "Upper Ediacaran" "Upper Ediacaran" "Upper Ediacaran" "Terreneuvian" ...
 $ stage    : chr  "Avalon Assemblage" "White Sea Assemblage" "Nama Assemblage" "Fortunian" ...
 $ short    : chr  "Ava" "Whs" "Nam" "For" ...
 $ bottom   : num  580 560 550 539 529 ...
 $ mid      : num  570 555 544 534 525 ...
 $ top      : num  560 550 539 529 521 ...
 $ dur      : num  20 10 11.2 9.8 8 6.5 5.5 4.5 4 3.5 ...
 $ systemCol: chr  "#fddd88" "#fddd88" "#fddd88" "#94ac72" ...
 $ seriesCol: chr  "#fddd88" "#fddd88" "#fddd88" "#9db989" ...
 $ col      : chr  "#fcd589" "#fdd587" "#fed583" "#a9be93" ...
 $ t2d      : num  NA NA NA NA 93 119 205 169 147 101 ...
 $ t2u      : num  NA NA NA 93 119 205 169 147 101 79 ...
 $ t3       : num  NA NA NA NA 52 56 67 52 44 24 ...
 $ tPart    : num  NA NA NA NA 14 2 25 13 19 5 ...
 $ tGFd     : num  NA NA NA NA NA 4 4 13 5 6 ...
 $ tGFu     : num  NA NA NA NA 1 10 11 6 9 9 ...
 $ tSing    : num  NA NA NA 51 47 350 149 222 74 116 ...
 $ tOri     : num  NA NA NA 109 78 172 113 112 57 86 ...
 $ tExt     : num  NA NA NA NA 38 64 128 121 95 80 ...
 $ tThrough : num  NA NA NA NA 71 85 129 121 138 115 ...
 $ divSIB   : num  NA NA NA 160 218 655 469 532 294 328 ...
 $ divCSIB  : num  NA NA NA NA 218 ...
 $ divRT    : num  NA NA NA 160 234 671 519 576 364 397 ...
 $ divBC    : num  NA NA NA NA 109 149 257 242 233 195 ...
 $ extProp  : num  NA NA NA 0.319 0.363 ...
 $ oriProp  : num  NA NA NA 1 0.534 ...
 $ extPC    : num  NA NA NA NA 0.429 ...
 $ oriPC    : num  NA NA NA NA 0.741 ...
 $ ext3t    : num  NA NA NA NA 0.581 ...
 $ ori3t    : num  NA NA NA NA 0.828 ...
 $ extC3t   : num  NA NA NA NA 0.546 ...
 $ oriC3t   : num  NA NA NA NA NA ...
 $ extGF    : num  NA NA NA NA 0.468 ...
 $ oriGF    : num  NA NA NA NA NA ...
 $ E2f3     : num  NA NA NA NA 0.28 ...
 $ O2f3     : num  NA NA NA NA NA ...
 $ ext2f3   : num  NA NA NA NA 0.329 ...
 $ ori2f3   : num  NA NA NA NA NA ...
 $ samp3t   : num  NA NA NA NA 0.788 ...
 $ sampRange: num  NA NA NA NA 0.775 ...

Plotting - empty canvas

tsplot(stages, boxes="sys", shading="sys", xlim=4:95, ylim=c(0,4000), 
    ylab="Richness")

Plotting divRT

lines(patterns$mid, patterns$divRT)

Plotting divBC

lines(patterns$mid, patterns$divBC, col="blue")

Plotting rates (canvas)

tsplot(stages, boxes="sys", shading="sys", xlim=4:95, ylim=c(0,1.5), 
    ylab="Taxonomic rates")

Plotting rates (lines)

lines(patterns$mid, patterns$oriPC, col="green")
lines(patterns$mid, patterns$extPC, col="red")

Plotting rates (legend)

legend("topleft", bg="white", legend=c("origination", "extinction"), 
    col=c("green", "red"), lwd=2, inset=c(0.05,0.01), cex=1.3)

Can you name the main events and patterns?

Range-based rates - main features of results

  • Decline of rates
  • Mass extinctions + (rebounds)
  • Edge effects

Issues

11000115001200012500130001350014000 050100150200250300350 number of taxa 0.10.20.30.40.50.60.70.80.9 11000115001200012500130001350014000 0.00.51.01.52.0 time steps per capita extinction rates A B

Occurrence record-based metrics

“occurrence-based”

Sampled-in-Bin diversity

  • divSIB
  • The number of sampled taxa in the focal bin.
  • Very vulnerable to the sampling universe effect

Plotting SIB diversity (previous)

Previous code:

tsplot(stages, boxes="sys", shading="sys", xlim=4:95, ylim=c(0,4000), 
    ylab="Richness")
lines(patterns$mid, patterns$divRT)
lines(patterns$mid, patterns$divBC, col="blue")

Plotting SIB diversity (line)

The SIB diversity:

lines(patterns$mid, patterns$divSIB, col="gray")

Plotting SIB diversity (legend)

The SIB diversity:

legend("topleft", bg="white", legend=c("divRT", "divBC", "divSIB"), 
    col=c("black", "blue", "gray"), lwd=2, inset=c(0.05,0.01), cex=1.3)

Three-timer sampling completeness

  • samp3T = t3 / (t3 + tPart)
  • \([0,1]\) - How well-sampled is the interval?

Plotting samp3t (canvas)

tsplot(stages, boxes="sys", shading="sys", xlim=4:95, ylim=c(0,1), 
    ylab="Sampling completeness")

Plotting samp3t (line)

lines(patterns$mid, patterns$samp3t)

Corrected SIB diversity (divCSIB)

  • Use samp3t to correct divSIB
  • divCSIV = divSIB / samp3t * samp3tTotal

Plotting CSIB diversity (previous)

tsplot(stages, boxes="sys", shading="sys", xlim=4:95, ylim=c(0,4000), 
    ylab="Richness")
lines(patterns$mid, patterns$divRT)
lines(patterns$mid, patterns$divBC, col="blue")
lines(patterns$mid, patterns$divSIB, col="gray")

Plotting CSIB diversity (line)

lines(patterns$mid, patterns$divCSIB, col="purple")

Plotting CSIB diversity (legend)

legend("topleft", bg="white", legend=c("divRT", "divBC", "divSIB", "divCSIB"), 
    col=c("black", "blue", "gray", "purple"), lwd=2, inset=c(0.05,0.01), cex=1.3)

Diversity comparison

  • Range-based values are higher and less volatile (smoother)
  • Face-value results: divRT
  • Truest method: divCSIB (with subsampling)

Occurrence record-based rates

  • Purpose: get rid of the smearing effects
  • Approximate the per capita rates, when sampling is complete
  1. Three-timer rates (Alroy 1996)
  2. Corrected three-timer rates (Alroy 2008)
  3. Gap-filler rates (Alroy 2014)
  4. Second-for-third substitution rates (Alroy 2015)
    • Wider moving windows, more complex patterns

Three-timer rates 3t

Per capita rates in a 3-bin window.

Taxon patterns:

  • Lower two-timers (t2d)
  • Upper two-timers (t2u)
  • Three-timer (t3)

(Alroy 1996)

ext3t = log( t2d / t3)
ori3t = log( t2u / t3)

Corrected Three-timer rates C3t

Improvement presented by Alroy (2008)

extC3t = log( t2d / t3 ) + log( samp3t[i+1] )
oriC3t = log( t2u / t3 ) - log( samp3t[i-1] )

Gap-filler GF and 2f3 rates

Gap filler equations (Alroy 2014): \(extC3t = log((t2d + tPart)/(t3 + tPart + tGFu))\)
\(oriC3t = log((t2u + tPart)/(t3 + tPart + tGFd))\)

Improvements in (Alroy 2015): second-for-third substituion rates. These are algorithmic solutions. These are the state-of-the-art rate metrics of this kind:

ext2f3
ori2f3

Comparison of extinction rates (canvas)

tsplot(stages, boxes="sys", shading="sys", xlim=4:95, ylim=c(0,1.5), 
    ylab="Extinction rates")

Comparison of extinction rates (lines)

lines(patterns$mid, patterns$ext3t, col="black")
lines(patterns$mid, patterns$extC3t, col="orange")
lines(patterns$mid, patterns$extGF, col="cyan")
lines(patterns$mid, patterns$ext2f3, col="darkred")

Comparison of extinction rates (legend)

legend("topleft", bg="white", legend=c("3t", "C3t", "GF", "2f3"), 
    col=c("black", "orange", "cyan", "darkred"), lwd=2, inset=c(0.05,0.01), cex=1.3)

Rate types - comparison

  • Different metrics converge, but they vary!
  • No edge or smearing effects with the occurrence-based ones
  • They omit lots of data, very unprecise: negative values!

Rate General guidelines

Use

PC: General assessment/raw patterns
2f3: When possible, rate estimation at MEs

Fig. from Kocsis et al. (2019)

Subsampling of samples in divDyn

Why

  • Comparing samples with different quantity of information
  • What would our results look like if we had the same quantity of information?

Miller and Foote (1996)

Two main types

Classical Rarefaction (CR)

  • Analytical and algorithmic approach
  • target: a particular number of records
  • By-list subsampling

Shareholder Quorum Subsampling (SQS)

  • Based on coverage
  • Target is a certain level of quorum (coverage)

Algorithmic solution: SQS (Alroy 2010)
Analytical solution: Coverage-based rarefaction (Chao and Jost 2012)

Implementation in divDyn

  • Subsampling for one sample is the first step
  • Example implementation on Evolv-ED

Subsampling with Phanerozoic Marine Animals

Sampling’s effects (occurrences)

cor.test(sampling$occs, patterns$divCSIB, method="spearman")

    Spearman's rank correlation rho

data:  sampling$occs and patterns$divCSIB
S = 28386, p-value < 2.2e-16
alternative hypothesis: true rho is not equal to 0
sample estimates:
      rho 
0.7663415 

Massive correlation between sampling and diversity in a bin!

Sampling’s effects (collections)

cor.test(sampling$colls, patterns$divCSIB, method="spearman")

    Spearman's rank correlation rho

data:  sampling$colls and patterns$divCSIB
S = 52450, p-value = 9.109e-09
alternative hypothesis: true rho is not equal to 0
sample estimates:
      rho 
0.5682595 

Massive correlation between sampling and diversity in a bin!

Subsampling in a time-bin context

Kocsis et al. (2019)

  1. Subsample every bin
  2. Calculate a statistic in mind - run a function on the data, e.g. divDyn()
  3. Store the result
  4. Iterate 1-3 a number of times

CR with Phanerozoic-scale data

crStg<-subsample(dat, bin="stg", tax="clgen", coll="collection_no", q=1000, 
    iter=5, duplicates=FALSE, na.rm=TRUE)

Arguments

arg meaning
q Subsampling quota
iter Number of iterations
duplicates Multiple records of the same genus in a list?
coll What is a list (for duplicates)
na.rm Omit NA bin assignments automatically

CR with Phanerozoic-scale data (canvas)

Prepartion for plotting

cr <- merge(stages, crStg, by="stg")

Timescale plot

tsplot(stages, boxes="sys", shading="sys", xlim=4:95, ylim=c(0,4000), 
    ylab="Richness")

Raw diversity curves (RT and CSIB)

lines(patterns$mid, patterns$divRT)
lines(patterns$mid, patterns$divCSIB, col="purple")

Adding subsampling results

lines(cr$mid, cr$divRT, lty=2)
lines(cr$mid, cr$divCSIB, col="purple", lty=2)

Did it solve the problem?

Not entirely…

No subsampling

cor.test(sampling$colls, patterns$divCSIB, method="spearman")

    Spearman's rank correlation rho

data:  sampling$colls and patterns$divCSIB
S = 52450, p-value = 9.109e-09
alternative hypothesis: true rho is not equal to 0
sample estimates:
      rho 
0.5682595 

CR subsampling

cor.test(sampling$colls, cr$divCSIB, method="spearman")

    Spearman's rank correlation rho

data:  sampling$colls and cr$divCSIB
S = 73152, p-value = 0.001691
alternative hypothesis: true rho is not equal to 0
sample estimates:
      rho 
0.3333819 

… but it is better!

Occurrence weighted By-list subsampling

  • Another version of rarefaction
  • Keeps lists (collections) intact
  • People call it \(O^xW\)
  • Largely superseeded by SQS
arg use
type the type, here oxw
xexp=0 q: coll No.
xexp=1 q: occ No.
oxwStg <- subsample(dat, bin="stg", tax="clgen", coll="collection_no", 
    q=200, type="oxw", xexp=0, iter=10, duplicates=FALSE, na.rm=TRUE)

Issues with plain CR and by-list

  • sampling of rare genera is disproportionately low
  • decrease in curve variance as the quota decreases (1000, 800, 600, 400)

Shareholder quorum subsampling

  • Aims to get a certain quorum of the sample
  • Algorithmic solution (c.f. Coverage-based rarefaction, which is the analytical solution)
sqsStg<-subsample(dat, bin="stg", tax="clgen", 
    coll="collection_no", q=0.7, iter=10, 
    ref="reference_no",singleton="ref", type="sqs", duplicates=FALSE, 
    excludeDominant=TRUE, largestColl =TRUE, na.rm=TRUE)
arg meaning
q target quorum
type subsamplinng type
ref reference columns
singleton What is a singleton based on?
excludeDominant tiny correction by Alroy (2010)
largestColl tiny correction by Alroy (2010)

Prepare to plot

Merge with timescale

sqs <- merge(stages, sqsStg, by="stg")

Timescale

tsplot(stages, boxes="sys", shading="sys", xlim=4:95, ylim=c(0,4000), 
    ylab="Richness")

SQS vs CR vs raw

lines(patterns$mid, patterns$divCSIB, col="purple")
lines(cr$mid, cr$divCSIB, col="purple", lty=2)
lines(sqs$mid, sqs$divCSIB, col="purple", lty=3)

Guidelines

Note

  • Use subsampling to test the robustness of your results!
  • Use as high iter as necessary to stabilize the results (usually above 100)
  • Can be used to do rates as well! (use CR!)

Environmental affinities

  • Does the taxon like env. A. or B?
  • Raw occurreence counts are misleading
  • First row: equal sampling
  • Second row: unequal
  • sampling (shifts occurrence balance)
  • divDyn::affinity() -> ?affinity

The coral example

Details

  • Material available online on Evolv-ED
  • Go through this at your own pace
  • Shout if you have any issues!

References

Alroy, John. 1996. “Constant Extinction, Constrained Diversification, and Uncoordinated Stasis in North American Mammals.” Palaeogeography, Palaeoclimatology, Palaeoecology 127: 285–311.
———. 2008. “Dynamics of Origination and Extinction in the Marine Fossil Record.” Proceedings of the National Academy of Science 105: 11536–42.
———. 2010. “The Shifting Balance of Diversity Among Major Marine Animal Groups.” Science 329: 1191–94. https://doi.org/10.1126/science.1189910.
———. 2014. “Accurate and Precise Estimates of Origination and Extinction Rates.” Paleobiology 40 (3): 374–97. https://doi.org/10.1666/13036.
———. 2015. “A More Precise Speciation and Extinction Rate Estimator.” Paleobiology 41 (04): 633–39. https://doi.org/10.1017/pab.2015.26.
Chao, Anne, and Lou Jost. 2012. “Coverage-Based Rarefaction and Extrapolation: Standardizing Samples by Completeness Rather Than Size.” Ecology 93 (12): 2533–47. https://doi.org/10.1890/11-1952.1.
Foote, M. 1999. “Morphological Diversity In The Evolutionary Radiation Of Paleozoic and Post-Paleozoic Crinoids.” Paleobiology 25 (S2): 1–115. https://doi.org/10.1017/S0094837300020236.
Foote, Michael. 2005. “Pulsed Origination and Extinction in the Marine Realm.” Paleobiology 31 (1): 6–20.
Gradstein, Felix M, James George Ogg, and Mark D Schmitz. 2020. The Geologic Time Scale 2020.
Kocsis, Ádám T., Carl J. Reddin, John Alroy, and Wolfgang Kiessling. 2019. “The R Package divDyn for Quantifying Diversity Dynamics Using Fossil Sampling Data.” Methods in Ecology and Evolution 10 (5): 735–43. https://doi.org/10.1111/2041-210X.13161.
MacArthur, Robert, and Edward O. Wilson. 1963. “An Equilibrium Theroy of Insular Zoogeography.” Evolution 17 (4): 373–87.
Miller, Arnold I., and Michael Foote. 1996. “Calibrating the Ordovician Radiation of Marine Life: Implications for Phanerozoic Diversity Trends.” Paleobiology 22 (2): 304–9.
Na, Lin, and Wolfgang Kiessling. 2015. “Diversity Partitioning During the Cambrian Radiation.” Proceedings of the National Academy of Sciences 112 (15): 4702–6. https://doi.org/10.1073/pnas.1424985112.
Sepkoski, J. John. 1997. “Biodiversity: Past, Present, and Future.” Journal of Paleontology 71 (4): 533–39. https://doi.org/10.1017/S0022336000040026.
Sepkoski, J. John, Jr. 1978. “A Kinetic Model of Phanerozoic Taxonomic Diversity I. Analysis of Marine Orders.” Paleobiology 4: 223–51. https://doi.org/10.2307/2400203.
Signor III, P. W., and J. H. Lipps. 1982. “Sampling Bias, Gradual Extinction Patterns and Catastrophesin the Fossil Record.” Geological Society of America Special Paper 190: 291–96.