Measuring and understanding the causes of spatial and temporal variation in population density is one of the most basic and important aspects of ecology and conservation biology. If you were allowed to do only one thing in an ecological or conservation-related study, this would probably be it.
The unmarked package in R allows one to estimate population density using distance sampling models, which account for imperfect detection: each animal is detected with probability p or missed with probability 1 - p. At the most basic level,
\[ \widehat{N} = \frac{C}{ \widehat{p} } \]
The estimate of population size \(\widehat{N}\) is equal to the number counted, divided by the estimated detection probability. For example, if 60 animals are counted and \(\widehat{p} = 0.5\), then \(\widehat{N} = 60/0.5 = 120\). There are close logical parallels with models used to estimate survival rates with marked animals (the name unmarked is a nod to the well-known program MARK).
The data to determine C and estimate \(\widehat{p}\) come from line transect sampling, in which one moves along a transect and records the perpendicular distance from the transect at which each individual (or group, see below) was detected. In practice, one actually records the sighting (radial) distance and the bearing for each detection, and then uses trigonometry to obtain the perpendicular distance.
This is done to avoid errors caused by animals moving (usually fleeing) before the observer reaches the point at which they would be perpendicular to the line.
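For example, a minimal sketch of that trigonometry (the sighting distance r and the angle theta between the transect line and the bearing to the animal are hypothetical values):
r <- 120 # sighting (radial) distance in m, hypothetical
theta <- 35 # sighting angle from the transect line in degrees, hypothetical
perp <- r * sin(theta * pi / 180) # sin() expects radians
perp # about 69 m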
Distance sampling models correct density estimates for undetected animals by using the distribution of detection distances, under the assumptions that:
1. animals directly on the transect line (distance = 0) are always detected, and
2. animals are detected at their original locations, with distances measured accurately.
If assumptions 1 and 2 are valid, then the mean detection probability p is estimated as the area under the detection function (the red line fitted to the frequency distribution of perpendicular distances), relative to the area under a horizontal line at p = 1.
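As a sketch of that calculation, assume a half-normal detection function; both the scale parameter sigma and the truncation distance w below are illustrative values, not estimates from these data:
sigma <- 100 # half-normal scale parameter (m), assumed for illustration
w <- 300 # truncation distance (m)
g <- function(x) exp(-x^2 / (2 * sigma^2)) # half-normal detection function, g(0) = 1
p.hat <- integrate(g, 0, w)$value / w # area under g relative to area under p = 1
p.hat # about 0.42 with these values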
The unmarked package in R provides a function (distsamp) to fit slightly more sophisticated distance sampling models, which allow for estimation of the effects of covariates on both the detection probability and population density. Controlling for effects of covariates on detection is usually not of ecological interest, but is necessary for accurate estimation of density. Understanding the covariates of density is often of primary interest for conservation. Doing the estimation in R allows for comparison of models using AIC scores or some other criterion, and for graphical display of results, as shown below.
Clean up the workspace, set the working directory (you’ll have to adjust the setwd() command to match the directory path on your machine; recall that paths in R use /, not \ as in Windows), load the unmarked package, read in the data, and inspect it.
rm(list = ls())
setwd("C:/Users/screel/Desktop/KENYA/ungulate_transects_in_R")
# you will need to set the working directory appropriately for your machine
# recall that there is a menu option under session in RStudio
library(unmarked)
## Loading required package: reshape
## Loading required package: lattice
## Loading required package: Rcpp
dists <- read.csv("knp.allsp.oct2012.csv", header=TRUE)
dists$trans <- as.factor(dists$trans) #convert transects into a categorical variable
dim(dists)
## [1] 190 13
head(dists)
## date trans type hab ht col burn lagoon seas.water species
## 1 21-Oct-12 KT16.5 T OW 1-Oct MIX P A A BUF
## 2 26-Oct-12 KT1.3 T OB 1-Oct BR N P A BUSHBUCK
## 3 23-Oct-12 KT12B.1 S OG 1-Oct GR C P P BUSHBUCK
## 4 23-Oct-12 KT12B.1 S OG 1-Oct GR C P P BUSHBUCK
## 5 26-Oct-12 KT5.5 O OW 1-Oct GR P A A BUSHBUCK
## 6 23-Oct-12 KT10.1 O OW 1-Oct BR C A A DUIKER
## single.mix herd.size dist
## 1 S 3 154
## 2 S 2 24
## 3 S 1 48
## 4 S 2 32
## 5 S 1 8
## 6 NA 48
tail(dists)
## date trans type hab ht col burn lagoon seas.water species
## 185 26-Oct-12 KT3.3 T OW 1-Oct BR P A A
## 186 26-Oct-12 KT5.4 O OW 1-Oct GR N A A
## 187 25-Oct-12 KT6.3 O OW 1-Oct BR N A A
## 188 24-Oct-12 KT7.1 S OW <10 BR P A A
## 189 24-Oct-12 KT8.2 O OW 1-Oct GR C A A
## 190 24-Oct-12 KT8.3 O OG >1 BR N A A
## single.mix herd.size dist
## 185 NA 0
## 186 NA 0
## 187 NA 0
## 188 NA 0
## 189 NA 0
## 190 NA 0
Subset data by species and inspect it. Here, we’ll select only the data for puku, a small obligate grazer that lives in medium-sized herds, mainly in floodplains close to permanent rivers.
dists.sub <- subset(dists, species == 'PU')
dim(dists.sub)
## [1] 61 13
head(dists.sub)
## date trans type hab ht col burn lagoon seas.water species
## 78 26-Oct-12 KT1.1 T OW 1-Oct BR N P A PU
## 79 26-Oct-12 KT1.1 T OW 1-Oct BR N P A PU
## 80 26-Oct-12 KT1.3 T OB 1-Oct BR N P A PU
## 81 26-Oct-12 KT1.3 T OB 1-Oct BR N P A PU
## 82 22-Oct-12 KT12.1 O OW 1-Oct BR C A A PU
## 83 23-Oct-12 KT12B.1 S OG 1-Oct GR C P P PU
## single.mix herd.size dist
## 78 S 2 0
## 79 S 1 10
## 80 S 6 65
## 81 S 1 69
## 82 S 6 0
## 83 S 1 0
tail(dists.sub)
## date trans type hab ht col burn lagoon seas.water species
## 133 24-Oct-12 KT9.2 T OW 1-Oct BR P P P PU
## 134 24-Oct-12 KT9.3 T OW 1-Oct BR N P P PU
## 135 19-Oct-12 KT17.5 T FG 1-Oct GR C P P PU
## 136 19-Oct-12 KT17.5 T FG 1-Oct GR C P P PU
## 137 19-Oct-12 KT17.5 T FG 1-Oct GR C P P PU
## 138 19-Oct-12 KT17.5 T FG 1-Oct GR C P P PU
## single.mix herd.size dist
## 133 M 2 83
## 134 S 2 29
## 135 S 1 59
## 136 S 9 66
## 137 M 50 71
## 138 S 2 101
Order the data by perpendicular distance and inspect it to choose a truncation distance. Truncation can be useful to optimize the trade-off between maximizing sample size (yielding precise estimates of density) and including sparse, distant observations with low detection probability (which yield imprecise estimates of density).
dists.order <- dists.sub[order(dists.sub$dist), ] #this orders the data from small to large perp distances
dists.order
## date trans type hab ht col burn lagoon seas.water species
## 78 26-Oct-12 KT1.1 T OW 1-Oct BR N P A PU
## 82 22-Oct-12 KT12.1 O OW 1-Oct BR C A A PU
## 83 23-Oct-12 KT12B.1 S OG 1-Oct GR C P P PU
## 84 23-Oct-12 KT12B.1 S OG 1-Oct GR C P P PU
## 97 22-Oct-12 KT13.1 O OG 1-Oct GR C A A PU
## 110 19-Oct-12 KT17.4 T FG 1-Oct GR C P P PU
## 79 26-Oct-12 KT1.1 T OW 1-Oct BR N P A PU
## 109 19-Oct-12 KT17.3 T FG 1-Oct GR P P A PU
## 120 26-Oct-12 KT4.1 S OW 1-Oct BR C A P PU
## 123 26-Oct-12 KT5.2 O CW <10 BR N A A PU
## 124 26-Oct-12 KT5.2 O CW <10 BR N A A PU
## 94 23-Oct-12 KT12B.3 S OW 1-Oct MIX P A A PU
## 85 23-Oct-12 KT12B.1 S OG 1-Oct GR C P P PU
## 112 19-Oct-12 KT18.1 S FG 1-Oct GR N P P PU
## 134 24-Oct-12 KT9.3 T OW 1-Oct BR N P P PU
## 121 26-Oct-12 KT4.5 O OW 1-Oct BR C A A PU
## 122 26-Oct-12 KT5.1 O OW 1-Oct BR N A A PU
## 128 25-Oct-12 KT6.2 O OW 1-Oct BR C A A PU
## 132 24-Oct-12 KT9.1 T OW 1-Oct BR C P A PU
## 135 19-Oct-12 KT17.5 T FG 1-Oct GR C P P PU
## 125 25-Oct-12 KT6.1 O OW 1-Oct MIX P A P PU
## 80 26-Oct-12 KT1.3 T OB 1-Oct BR N P A PU
## 92 23-Oct-12 KT12B.2 S OG 1-Oct GR C A A PU
## 136 19-Oct-12 KT17.5 T FG 1-Oct GR C P P PU
## 81 26-Oct-12 KT1.3 T OB 1-Oct BR N P A PU
## 115 19-Oct-12 KT18.1 S FG 1-Oct GR N P P PU
## 137 19-Oct-12 KT17.5 T FG 1-Oct GR C P P PU
## 89 23-Oct-12 KT12B.1 S OG 1-Oct GR C P P PU
## 131 24-Oct-12 KT9.1 T OW 1-Oct BR C P A PU
## 113 19-Oct-12 KT18.1 S FG 1-Oct GR N P P PU
## 133 24-Oct-12 KT9.2 T OW 1-Oct BR P P P PU
## 106 19-Oct-12 KT17.1 T FG 1-Oct GR P P P PU
## 87 23-Oct-12 KT12B.1 S OG 1-Oct GR C P P PU
## 116 19-Oct-12 KT18.1 S FG 1-Oct GR N P P PU
## 114 19-Oct-12 KT18.1 S FG 1-Oct GR N P P PU
## 90 23-Oct-12 KT12B.1 S OG 1-Oct GR C P P PU
## 138 19-Oct-12 KT17.5 T FG 1-Oct GR C P P PU
## 118 19-Oct-12 KT18.2 S FG 1-Oct GR N P P PU
## 99 22-Oct-12 KT14.1 O OB <10 GR N A A PU
## 117 19-Oct-12 KT18.1 S FG 1-Oct GR N P P PU
## 98 22-Oct-12 KT13.1 O OG 1-Oct GR C A A PU
## 100 22-Oct-12 KT14.2 O OG 1-Oct MIX P P P PU
## 95 23-Oct-12 KT12B.3 S OW 1-Oct MIX P A A PU
## 129 25-Oct-12 KT6.2 O OW 1-Oct BR C A A PU
## 103 21-Oct-12 KT16.10 S FG <10 GR C P P PU
## 104 21-Oct-12 KT16.10 S FG <10 GR C P P PU
## 119 19-Oct-12 KT18.2 S FG 1-Oct GR N P P PU
## 130 25-Oct-12 KT6.2 O OW 1-Oct BR C A A PU
## 88 23-Oct-12 KT12B.1 S OG 1-Oct GR C P P PU
## 126 25-Oct-12 KT6.1 O OW 1-Oct MIX P A P PU
## 108 19-Oct-12 KT17.1 T FG 1-Oct GR P P P PU
## 127 25-Oct-12 KT6.1 O OW 1-Oct MIX P A P PU
## 101 21-Oct-12 KT15.1 O OG 1-Oct GR P A A PU
## 86 23-Oct-12 KT12B.1 S OG 1-Oct GR C P P PU
## 102 21-Oct-12 KT16.10 S FG <10 GR C P P PU
## 111 19-Oct-12 KT17.4 T FG 1-Oct GR C P P PU
## 107 19-Oct-12 KT17.1 T FG 1-Oct GR P P P PU
## 91 23-Oct-12 KT12B.1 S OG 1-Oct GR C P P PU
## 93 23-Oct-12 KT12B.2 S OG 1-Oct GR C A A PU
## 96 23-Oct-12 KT12B.3 S OW 1-Oct MIX P A A PU
## 105 21-Oct-12 KT16.10 S FG <10 GR C P P PU
## single.mix herd.size dist
## 78 S 2 0
## 82 S 6 0
## 83 S 1 0
## 84 S 4 0
## 97 S 1 0
## 110 S 1 0
## 79 S 1 10
## 109 S 1 14
## 120 M 1 15
## 123 S 1 19
## 124 S 1 21
## 94 S 3 24
## 85 S 16 27
## 112 S 5 27
## 134 S 2 29
## 121 M 1 30
## 122 S 1 44
## 128 M 1 45
## 132 S 1 47
## 135 S 1 59
## 125 S 1 63
## 80 S 6 65
## 92 M 7 66
## 136 S 9 66
## 81 S 1 69
## 115 S 14 69
## 137 M 50 71
## 89 S 3 76
## 131 S 1 80
## 113 M 4 82
## 133 M 2 83
## 106 S 19 84
## 87 S 11 85
## 116 S 2 86
## 114 M 3 90
## 90 S 1 92
## 138 S 2 101
## 118 S 5 106
## 99 S 14 117
## 117 S 11 119
## 98 S 13 121
## 100 S 27 125
## 95 S 1 127
## 129 S 2 163
## 103 S 1 175
## 104 S 7 182
## 119 M 1 198
## 130 S 3 204
## 88 M 7 227
## 126 M 1 233
## 108 S 16 244
## 127 M 6 249
## 101 S 7 263
## 86 M 17 275
## 102 M 7 301
## 111 M 2 369
## 107 S 21 412
## 91 S 1 476
## 93 S 9 498
## 96 S 8 555
## 105 S 6 614
dim(dists.order)
## [1] 61 13
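Before settling on a cutoff, a quick quantitative check can help. A common rule of thumb is to truncate roughly the largest 5-10% of perpendicular distances; a sketch:
quantile(dists.sub$dist, probs = c(0.90, 0.95)) # the 90th percentile here is close to 300 m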
300 m looks reasonable as a cutoff; only 7 of the 61 detections fall beyond it. Truncate the data:
dists.trunc <- dists.order[which(dists.order$dist < 300), ]
dim(dists.trunc)
## [1] 54 13
#dists.trunc # commented out data inspection to shorten the output
Aggregate detections into distance intervals; here we use 50 m bins out to the 300 m truncation distance (the breaks could be changed, e.g. to 100 m intervals).
yDat <- formatDistData(dists.trunc, distCol="dist", transectNameCol="trans", dist.breaks=c(0, 50, 100, 150, 200, 250, 300)) #create distance categories (m). Could change these.
yDat
## [0,50] (50,100] (100,150] (150,200] (200,250] (250,300]
## KT1.1 2 0 0 0 0 0
## KT1.2 0 0 0 0 0 0
## KT1.3 0 2 0 0 0 0
## KT1.4 0 0 0 0 0 0
## KT1.5 0 0 0 0 0 0
## KT10.1 0 0 0 0 0 0
## KT10.2 0 0 0 0 0 0
## KT10.3 0 0 0 0 0 0
## KT11.1 0 0 0 0 0 0
## KT11.2 0 0 0 0 0 0
## KT11.3 0 0 0 0 0 0
## KT12.1 1 0 0 0 0 0
## KT12.2 0 0 0 0 0 0
## KT12B.1 3 3 0 0 1 1
## KT12B.2 0 1 0 0 0 0
## KT12B.3 1 0 1 0 0 0
## KT12B.4 0 0 0 0 0 0
## KT13.1 1 0 1 0 0 0
## KT13.2 0 0 0 0 0 0
## KT13.3 0 0 0 0 0 0
## KT14.1 0 0 1 0 0 0
## KT14.2 0 0 1 0 0 0
## KT14.3 0 0 0 0 0 0
## KT14.4 0 0 0 0 0 0
## KT15.1 0 0 0 0 0 1
## KT15.2 0 0 0 0 0 0
## KT15.3 0 0 0 0 0 0
## KT15.4 0 0 0 0 0 0
## KT15.5 0 0 0 0 0 0
## KT16.1 0 0 0 0 0 0
## KT16.10 0 0 0 2 0 0
## KT16.2 0 0 0 0 0 0
## KT16.3 0 0 0 0 0 0
## KT16.4 0 0 0 0 0 0
## KT16.5 0 0 0 0 0 0
## KT16.6 0 0 0 0 0 0
## KT16.7 0 0 0 0 0 0
## KT16.8 0 0 0 0 0 0
## KT16.9 0 0 0 0 0 0
## KT17.1 0 1 0 0 1 0
## KT17.2 0 0 0 0 0 0
## KT17.3 1 0 0 0 0 0
## KT17.4 1 0 0 0 0 0
## KT17.5 0 3 1 0 0 0
## KT18.1 1 4 1 0 0 0
## KT18.2 0 0 1 1 0 0
## KT18.3 0 0 0 0 0 0
## KT3.1 0 0 0 0 0 0
## KT3.2 0 0 0 0 0 0
## KT3.3 0 0 0 0 0 0
## KT3.4 0 0 0 0 0 0
## KT4.1 1 0 0 0 0 0
## KT4.2 0 0 0 0 0 0
## KT4.3 0 0 0 0 0 0
## KT4.4 0 0 0 0 0 0
## KT4.5 1 0 0 0 0 0
## KT5.1 1 0 0 0 0 0
## KT5.2 2 0 0 0 0 0
## KT5.3 0 0 0 0 0 0
## KT5.4 0 0 0 0 0 0
## KT5.5 0 0 0 0 0 0
## KT6.1 0 1 0 0 2 0
## KT6.2 1 0 0 1 1 0
## KT6.3 0 0 0 0 0 0
## KT7.1 0 0 0 0 0 0
## KT7.2 0 0 0 0 0 0
## KT7.3 0 0 0 0 0 0
## KT8.1 0 0 0 0 0 0
## KT8.2 0 0 0 0 0 0
## KT8.3 0 0 0 0 0 0
## KT8.4 0 0 0 0 0 0
## KT8.5 0 0 0 0 0 0
## KT8.6 0 0 0 0 0 0
## KT9.1 1 1 0 0 0 0
## KT9.2 0 1 0 0 0 0
## KT9.3 1 0 0 0 0 0
nrow(yDat) #Should be 76 rows for this subset of KNP data, equal to ~120 km of transects sampled on one occasion.
## [1] 76
Read in and inspect the covariates that will be used to model effects on p and density.
covs <- read.csv("C:/Users/screel/Desktop/KENYA/ungulate_transects_in_R/knp_covs_oct2012.csv", header=TRUE)
head(covs)
## date trans type length hab ht col burn lagoon seas.water
## 1 26-Oct-12 KT1.1 T 2000 OW 10-1 BR N P A
## 2 26-Oct-12 KT1.2 T 300 OW 10-1 BR N P A
## 3 26-Oct-12 KT1.3 T 2000 OB 10-1 BR N P A
## 4 26-Oct-12 KT1.4 T 2000 CW 10-1 BR N A A
## 5 26-Oct-12 KT1.5 T 1700 CW 10-1 BR N P A
## 6 23-Oct-12 KT10.1 O 2000 OW 10-1 BR C A A
tail(covs)
## date trans type length hab ht col burn lagoon seas.water
## 71 25-Oct-12 KT6.1 O 2000 OW 10-1 MIX P A P
## 72 24-Oct-12 KT9.2 T 1200 OW 10-1 BR P P P
## 73 24-Oct-12 KT9.3 T 900 OW 10-1 BR N P P
## 74 26-Oct-12 KT3.2 T 2000 OW 10-1 BR P A A
## 75 26-Oct-12 KT3.3 T 2000 OW 10-1 BR P A A
## 76 25-Oct-12 KT6.3 O 1400 OW 10-1 BR N A A
nrow(covs)
## [1] 76
Create the unmarked dataframe. unmarked requires input in a dataframe with a specific format, and provides a convenience function to create it. You can use context-sensitive help (F1 in RStudio) to better understand the arguments of this function and their default values.
umf <- unmarkedFrameDS(y=as.matrix(yDat), siteCovs=covs, survey="line", dist.breaks=c(0, 50, 100, 150, 200, 250, 300), tlength=covs$length, unitsIn="m")
head(umf)
## Data frame representation of unmarkedFrame object.
## y.1 y.2 y.3 y.4 y.5 y.6 date trans type length hab ht col
## KT1.1 2 0 0 0 0 0 26-Oct-12 KT1.1 T 2000 OW 10-1 BR
## KT1.2 0 0 0 0 0 0 26-Oct-12 KT1.2 T 300 OW 10-1 BR
## KT1.3 0 2 0 0 0 0 26-Oct-12 KT1.3 T 2000 OB 10-1 BR
## KT1.4 0 0 0 0 0 0 26-Oct-12 KT1.4 T 2000 CW 10-1 BR
## KT1.5 0 0 0 0 0 0 26-Oct-12 KT1.5 T 1700 CW 10-1 BR
## KT10.1 0 0 0 0 0 0 23-Oct-12 KT10.1 O 2000 OW 10-1 BR
## KT10.2 0 0 0 0 0 0 23-Oct-12 KT10.2 O 2000 CW 10-1 BR
## KT10.3 0 0 0 0 0 0 23-Oct-12 KT10.3 O 1800 CW 10-1 BR
## KT11.1 0 0 0 0 0 0 23-Oct-12 KT11.1 O 2000 CW 10-1 GR
## KT11.2 0 0 0 0 0 0 23-Oct-12 KT11.2 O 2000 CW 10-1 GR
## burn lagoon seas.water
## KT1.1 N P A
## KT1.2 N P A
## KT1.3 N P A
## KT1.4 N A A
## KT1.5 N P A
## KT10.1 C A A
## KT10.2 C A A
## KT10.3 C A A
## KT11.1 C A A
## KT11.2 P A A
View the frequency distribution of perpendicular distances as a final check:
hist(umf, xlab="distance (m)", main="", cex.lab=0.8, cex.axis=0.8)
The frequency of detections falls off sharply beyond 100 m, which strongly suggests that we would underestimate density if we did not use distance sampling to correct for detection probabilities less than one. This sort of pattern is nearly universal – inferences based on ‘naive’ density estimates that fail to address detection should be treated with caution. Such estimates are surprisingly common… for example, elk counts in Montana and YNP do not account for detection.
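To see the size of the bias at stake, here is a back-of-the-envelope comparison (a sketch: the ~120 km of transect comes from the comment above, and the mean detection probability of 0.5 is assumed purely for illustration):
n.herds <- 54 # puku herds detected within 300 m
area.km2 <- 2 * 0.3 * 120 # strip of 300 m on each side of ~120 km of transect
naive.D <- n.herds / area.km2 # 'naive' herds per km^2, implicitly assuming p = 1
naive.D / 0.5 # correcting by an assumed mean p of 0.5 doubles the estimate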
distsamp(~1 ~1, …) fits a model with just one mean value (an “intercept only” model) for each parameter.
Recall that you can use context-sensitive help (F1) to understand the arguments of the distsamp function (or any function). The first formula is for p, so the first ~1 means ‘fit an intercept-only model for p, i.e. just estimate the overall mean detection probability’. The second formula is for density, so the second ~1 means ‘fit an intercept-only model for density, i.e. just estimate one overall density’. (If you’re reading the code carefully, you’ll have noticed that unmarked estimates population density (N/area) rather than population size N; distances are interpreted in the unitsIn we specified in unmarkedFrameDS, and densities are reported in the unitsOut specified in the distsamp call.)
unmarked can fit a variety of detection functions to the histogram of distances; here we consider the half-normal, hazard-rate, and uniform key functions, fitting the simplest (intercept-only) models first to check that the data are adequate for distance sampling.
m.half <- distsamp(~1 ~1, umf, keyfun="halfnorm", output="density", unitsOut="kmsq")
m.haz <- distsamp(~1 ~1, umf, keyfun="hazard", output="density", unitsOut="kmsq")
m.uni <- distsamp(~1 ~1, umf, keyfun="uniform", output="density", unitsOut="kmsq")
m.half
##
## Call:
## distsamp(formula = ~1 ~ 1, data = umf, keyfun = "halfnorm", output = "density",
## unitsOut = "kmsq")
##
## Density:
## Estimate SE z P(>|z|)
## 0.396 0.175 2.26 0.0235
##
## Detection:
## Estimate SE z P(>|z|)
## 4.81 0.122 39.4 0
##
## AIC: 375.2169
m.haz
##
## Call:
## distsamp(formula = ~1 ~ 1, data = umf, keyfun = "hazard", output = "density",
## unitsOut = "kmsq")
##
## Density:
## Estimate SE z P(>|z|)
## 0.507 0.223 2.28 0.0228
##
## Detection:
## Estimate SE z P(>|z|)
## 4.5 0.299 15.1 2.97e-51
##
## Hazard-rate(scale):
## Estimate SE z P(>|z|)
## 0.629 0.327 1.92 0.0547
##
## AIC: 374.9477
m.uni
##
## Call:
## distsamp(formula = ~1 ~ 1, data = umf, keyfun = "uniform", output = "density",
## unitsOut = "kmsq")
##
## Density:
## Estimate SE z P(>|z|)
## -0.284 0.136 -2.09 0.0367
##
## AIC: 399.4951
All three models fit successfully. AIC scores indicate that the uniform key function is substantially worse than the other two, which makes sense given the shape of the distribution (plotted above).
Proceed to fitting an a priori model set. Note that in this set I do not consider covariates for p; the density estimates still account for the mean value of p while testing for variables that affect density.
#half-norm:
m.half.1.type <- distsamp(~1 ~type, umf, keyfun="halfnorm", output="density", unitsOut="kmsq")
m.half.1.hab <- distsamp(~1 ~hab, umf, keyfun="halfnorm", output="density", unitsOut="kmsq")
m.half.1.ht <- distsamp(~1 ~ht, umf, keyfun="halfnorm", output="density", unitsOut="kmsq")
m.half.1.col <- distsamp(~1 ~col, umf, keyfun="halfnorm", output="density", unitsOut="kmsq")
m.half.1.burn <- distsamp(~1 ~burn, umf, keyfun="halfnorm", output="density", unitsOut="kmsq")
m.half.1.lagoon <- distsamp(~1 ~lagoon, umf, keyfun="halfnorm", output="density", unitsOut="kmsq")
m.half.1.seaswater <- distsamp(~1 ~seas.water, umf, keyfun="halfnorm", output="density", unitsOut="kmsq")
#hazard:
m.haz.1.type <- distsamp(~1 ~type, umf, keyfun="hazard", output="density", unitsOut="kmsq")
m.haz.1.hab <- distsamp(~1 ~hab, umf, keyfun="hazard", output="density", unitsOut="kmsq")
m.haz.1.ht <- distsamp(~1 ~ht, umf, keyfun="hazard", output="density", unitsOut="kmsq")
m.haz.1.col <- distsamp(~1 ~col, umf, keyfun="hazard", output="density", unitsOut="kmsq")
m.haz.1.burn <- distsamp(~1 ~burn, umf, keyfun="hazard", output="density", unitsOut="kmsq")
m.haz.1.lagoon <- distsamp(~1 ~lagoon, umf, keyfun="hazard", output="density", unitsOut="kmsq")
m.haz.1.seaswater <- distsamp(~1 ~seas.water, umf, keyfun="hazard", output="density", unitsOut="kmsq")
Fit list and model selection. This code organizes all of the output from the models and compares them via AIC scores.
fmList <- fitList(m.half=m.half, m.uni=m.uni,
m.half.1.type=m.half.1.type, m.half.1.hab=m.half.1.hab, m.half.1.ht=m.half.1.ht,
m.half.1.col=m.half.1.col, m.half.1.burn=m.half.1.burn, m.half.1.lagoon=m.half.1.lagoon,
m.half.1.seaswater=m.half.1.seaswater, m.haz.1.type=m.haz.1.type, m.haz.1.hab=m.haz.1.hab,
m.haz.1.ht=m.haz.1.ht, m.haz.1.burn=m.haz.1.burn,
m.haz.1.lagoon=m.haz.1.lagoon, m.haz.1.seaswater=m.haz.1.seaswater, m.haz.1.col=m.haz.1.col, m.haz=m.haz)
modSel(fmList)
## nPars AIC delta AICwt cumltvWt
## m.haz.1.hab 7 368.50 0.00 2.7e-01 0.27
## m.half.1.hab 6 368.76 0.27 2.4e-01 0.51
## m.haz.1.ht 5 369.64 1.14 1.5e-01 0.66
## m.half.1.ht 4 369.91 1.41 1.3e-01 0.80
## m.haz.1.burn 5 372.56 4.07 3.6e-02 0.83
## m.haz.1.type 5 372.76 4.27 3.2e-02 0.87
## m.half.1.burn 4 372.83 4.34 3.1e-02 0.90
## m.half.1.type 4 373.03 4.53 2.8e-02 0.93
## m.haz.1.col 5 374.63 6.13 1.3e-02 0.94
## m.half.1.col 4 374.90 6.40 1.1e-02 0.95
## m.haz 3 374.95 6.45 1.1e-02 0.96
## m.haz.1.seaswater 4 375.21 6.71 9.5e-03 0.97
## m.half 2 375.22 6.72 9.5e-03 0.98
## m.half.1.seaswater 3 375.48 6.98 8.3e-03 0.99
## m.haz.1.lagoon 4 375.92 7.43 6.6e-03 0.99
## m.half.1.lagoon 3 376.19 7.70 5.8e-03 1.00
## m.uni 1 399.50 31.00 5.1e-08 1.00
Goodness-of-fit test: fit is adequate if these tests yield p > 0.05. First, fit the model to be checked: use your best-supported model (lowest AIC; here we use the half-normal habitat model, which is within 0.3 AIC units of the top-ranked hazard-rate model). Then check GOF for that model.
(fm <- distsamp(~1 ~hab, keyfun="halfnorm", umf)) # starting values can be supplied via the starts argument if the fit fails to converge
##
## Call:
## distsamp(formula = ~1 ~ hab, data = umf, keyfun = "halfnorm")
##
## Density:
## Estimate SE z P(>|z|)
## (Intercept) -4.839 0.394 -12.293 9.85e-35
## habFG -0.255 0.690 -0.370 7.11e-01
## habOB 0.614 0.802 0.766 4.44e-01
## habOG 1.530 0.476 3.217 1.29e-03
## habOW 0.747 0.420 1.779 7.52e-02
##
## Detection:
## Estimate SE z P(>|z|)
## 4.81 0.122 39.4 0
##
## AIC: 368.7646
# Function returning three fit-statistics.
fitstats <- function(fm) {
observed <- getY(fm@data)
expected <- fitted(fm)
resids <- residuals(fm)
sse <- sum(resids^2)
chisq <- sum((observed - expected)^2 / expected)
freeTuke <- sum((sqrt(observed) - sqrt(expected))^2)
out <- c(SSE=sse, Chisq=chisq, freemanTukey=freeTuke)
return(out)
}
(pb <- parboot(fm, fitstats, nsim=25, report=1)) # nsim=25 keeps this quick; use several hundred simulations for a real analysis
## t0 = 88.13049 892.762 73.80425
## 61.7,492.9,71.8
## 40.6,406.3,51.1
## 64.1,460,62
## 43.8,504.8,56.8
## 48.6,476.4,58.3
## 53.2,383.3,60.3
## 45.4,300,55.7
## 55.6,380,63.7
## 61.1,474.7,70.6
## 48.8,547.1,59.9
## 54.2,409.1,63.5
## 74.2,363.6,77
## 64.6,400.8,63.9
## 57.3,334.4,65.2
## 36.5,582.4,46.6
## 52.7,456.6,63.3
## 39.9,664.5,53.5
## 64.3,381.4,66.3
## 61.1,407.5,68.4
## 59,425.5,63.9
## 50.1,429.4,58.2
## 49.4,521.7,61.8
## 72.8,416.8,79.6
## 56.6,366.8,65
## 45.8,491.2,54.3
##
## Call: parboot(object = fm, statistic = fitstats, nsim = 25, report = 1)
##
## Parametric Bootstrap Statistics:
## t0 mean(t0 - t_B) StdDev(t0 - t_B) Pr(t_B > t0)
## SSE 88.1 33.7 9.80 0.0000
## Chisq 892.8 449.7 81.44 0.0000
## freemanTukey 73.8 11.4 7.58 0.0769
##
## t_B quantiles:
## 0% 2.5% 25% 50% 75% 97.5% 100%
## SSE 37 39 49 54 61 73 74
## Chisq 300 321 383 426 491 615 665
## freemanTukey 47 49 58 63 65 78 80
##
## t0 = Original statistic computed from data
## t_B = Vector of bootstrap samples
There is evidence of lack of fit for this model: the bootstrap p-values for the SSE and chi-square statistics are near zero, and only the Freeman-Tukey statistic is marginal (p = 0.077).
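One common follow-up (a sketch, using the parboot object pb from above) is to estimate an overdispersion factor, c-hat, as the observed chi-square statistic divided by the mean of its bootstrap distribution; values much greater than 1 suggest inflating standard errors or using quasi-AIC for model selection:
(c.hat <- pb@t0["Chisq"] / mean(pb@t.star[, "Chisq"])) # roughly 2 here, suggesting moderate overdispersion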
Below, we are making predictions from univariate models. Alternatively we could use multivariate models to show predictions with other covariates held at their means.
First, we obtain predictions for puku density in each habitat, then for presence vs. absence of seasonal water, burn status, green/brown/mixed grass cover, and local presence vs. absence of dambos (seasonal lagoons). One caution: the factor levels supplied in newdata below (e.g. 'Y'/'N') do not always match the coding in the data (A/P for seasonal water and lagoons, C/N/P for burn status). Levels in newdata should match the levels used in fitting, so read the labels on these predictions with care.
#Predict with habitat
m.hab <- data.frame(hab=factor(c('CW', 'FG', 'OB', 'OG', 'OW')))
hab.pred <- predict(m.half.1.hab, type="state", newdata=m.hab, appendData=TRUE)
hab.pred
## Predicted SE lower upper hab
## 1 0.7917147 0.3116250 0.3660405 1.712412 CW
## 2 0.6133793 0.3604603 0.1938695 1.940657 FG
## 3 1.4623150 1.0465805 0.3596127 5.946300 OB
## 4 3.6569708 1.1295464 1.9962122 6.699406 OG
## 5 1.6706024 0.3559837 1.1002543 2.536606 OW
#Predict with seasonal water
m.seaswater <- data.frame(seas.water=factor(c('Y', 'N')))
seaswater.predict<- predict(m.haz.1.seaswater, type="state", newdata=m.seaswater, appendData=TRUE)
seaswater.predict
## Predicted SE lower upper seas.water
## 1 1.049433 0.4668786 0.4387999 2.509821 Y
## 2 1.789522 0.4077539 1.1449466 2.796975 N
#Predict with burn status
m.burn <- data.frame(burn=factor(c('Y', 'N', 'P')))
burn.pred <- predict(m.half.1.burn, type="state", newdata=m.burn, appendData=TRUE)
burn.pred
## Predicted SE lower upper burn
## 1 0.8935453 0.2759991 0.4877493 1.636954 Y
## 2 1.6058702 0.3851266 1.0036255 2.569503 N
## 3 2.1725212 0.5412600 1.3332034 3.540231 P
#predict with grass color:
m.col <- data.frame(col=factor(c('BR', 'GR', 'MIX')))
col.pred <- predict(m.half.1.col, type="state", newdata=m.col, appendData=TRUE)
col.pred
## Predicted SE lower upper col
## 1 1.8661789 0.3841525 1.2466187 2.793656 BR
## 2 1.2906892 0.3524577 0.7557497 2.204273 GR
## 3 0.7988435 0.3678780 0.3239456 1.969932 MIX
#Predict with lagoon
m.lagoon <- data.frame(lagoon=factor(c('Y', 'N')))
lagoon.predict<- predict(m.half.1.lagoon, type="state", newdata=m.lagoon, appendData=TRUE)
lagoon.predict
## Predicted SE lower upper lagoon
## 1 1.135166 0.3799783 0.5890245 2.187687 Y
## 2 1.598201 0.2981475 1.1087613 2.303693 N
You’ll typically want to show results graphically. Here is an example with mean herd densities for each habitat, with standard errors. The code above used the predict function to obtain the predicted values for each habitat from the distsamp model ‘m.half.1.hab’ (with an intercept only for p, habitat effects on density, and a half-normal function fit to the frequency distribution of perpendicular distances). The output was assigned to a dataframe (hab.pred) with variables named Predicted (the estimated density), SE (the standard error of this estimate), lower (lower 95% confidence limit), upper (upper 95% CL), and hab (habitat type).
hab.pred
## Predicted SE lower upper hab
## 1 0.7917147 0.3116250 0.3660405 1.712412 CW
## 2 0.6133793 0.3604603 0.1938695 1.940657 FG
## 3 1.4623150 1.0465805 0.3596127 5.946300 OB
## 4 3.6569708 1.1295464 1.9962122 6.699406 OG
## 5 1.6706024 0.3559837 1.1002543 2.536606 OW
This code uses the package ggplot2 to produce pretty graphics. A similar plot can be made more simply with base R plotting functions (barplot, for example; a sketch follows the ggplot code below).
library(ggplot2)
ggplot(hab.pred, aes(x=hab, y=Predicted)) +
geom_bar(position=position_dodge(), stat="identity", colour = 'blue', fill = 'light blue') +
geom_errorbar(aes(ymin=Predicted-SE, ymax=Predicted+SE), width=.2, colour = 'blue') +
scale_x_discrete(name = expression(bold('Vegetation Type'))) +
scale_y_continuous(name = expression(bold(paste("Herds / ", km^bold("2"))))) +
theme_bw()
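As noted above, a similar figure can be made with base R (a sketch; barplot() returns the x positions of the bar midpoints, which are reused here to place the error bars):
mids <- barplot(hab.pred$Predicted, names.arg = hab.pred$hab,
                col = "lightblue", border = "blue",
                xlab = "Vegetation Type", ylab = "Herds / km^2",
                ylim = c(0, max(hab.pred$Predicted + hab.pred$SE)))
arrows(mids, hab.pred$Predicted - hab.pred$SE,
       mids, hab.pred$Predicted + hab.pred$SE,
       angle = 90, code = 3, length = 0.05, col = "blue")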
This figure is a nice, simple way to show the effect of vegetation type on puku density, with a fairly strong tendency for density to be higher in open habitats, as expected for an obligate grazer. (Aside: puku density is rather low in flooded grassland because Kafue also holds a large population of red lechwe, a species that specializes on flooded grasslands and competitively excludes puku from those areas.)
Next, check whether herd size affects the probability of detection by examining the relationship between herd size and perpendicular distance. A lack of relationship (no significant slope) indicates that herd size does not detectably affect detection.
#Regress cluster size on distance
plot(dists.trunc$dist, dists.trunc$herd.size)
cluster.det <- lm(dists.trunc$herd.size ~ dists.trunc$dist)
anova(cluster.det)
## Analysis of Variance Table
##
## Response: dists.trunc$herd.size
## Df Sum Sq Mean Sq F value Pr(>F)
## dists.trunc$dist 1 142.6 142.601 2.042 0.159
## Residuals 52 3631.3 69.833
summary(cluster.det) # if p<0.15, then use estimate of herd size at distance 0, include SE to propagate uncertainty
##
## Call:
## lm(formula = dists.trunc$herd.size ~ dists.trunc$dist)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.099 -4.065 -3.070 1.235 44.373
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.10570 1.76631 2.324 0.024 *
## dists.trunc$dist 0.02143 0.01500 1.429 0.159
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.357 on 52 degrees of freedom
## Multiple R-squared: 0.03779, Adjusted R-squared: 0.01928
## F-statistic: 2.042 on 1 and 52 DF, p-value: 0.159
#Regress log(cluster size) on distance
plot(dists.trunc$dist, log(dists.trunc$herd.size))
l.herd <- log(dists.trunc$herd.size)
cluster.det <- lm(l.herd ~ dists.trunc$dist)
summary(cluster.det)
##
## Call:
## lm(formula = l.herd ~ dists.trunc$dist)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.8110 -0.7995 -0.1211 0.8118 2.8721
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.701921 0.227956 3.079 0.00331 **
## dists.trunc$dist 0.004760 0.001935 2.460 0.01727 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.078 on 52 degrees of freedom
## Multiple R-squared: 0.1042, Adjusted R-squared: 0.08699
## F-statistic: 6.049 on 1 and 52 DF, p-value: 0.01727
The models are fit using detections of each ‘cluster’ of one or more individuals; here, puku herds. We usually want to convert the estimate of herd density into an estimate of individual density. If the regression of herd size on distance is strong (p < 0.15 is a good rule of thumb), then we can use the estimated herd size at distance = 0 to convert the density of herds to the density of individuals. Here, the regression of log herd size on distance is significant (p = 0.017), so we take that route.
First we need to back-transform the estimated intercept (log herd size at distance = 0) and its SE from the log scale.
bt.est <- exp(0.701921) #exp of the intercept
bt.est
## [1] 2.017625
bt.se <- 0.227956*bt.est # delta method: SE of the intercept on the log scale times the back-transformed estimate
bt.se
## [1] 0.4599297
If the regression of herd size on distance is not strong (p > 0.15 as a rule of thumb) and there are no important covariates of density in the top model, then the overall mean herd size is appropriate to convert herd density to individual density:
avg.herd <- mean(dists.trunc$herd.size)
avg.herd
## [1] 6.037037
sd.herd <- sd(dists.trunc$herd.size)
se.herd <- sd.herd/(sqrt(nrow(dists.trunc)))
dim(dists.trunc)
## [1] 54 13
If the covariates in the best density model also affect herd size, we must determine mean herd size for each level of the relevant covariates:
library(plyr)
##
## Attaching package: 'plyr'
##
## The following objects are masked from 'package:reshape':
##
## rename, round_any
#Mean and SE of herd size by habitats:
herdsize.hab <- ddply(dists.trunc,~hab,summarise,mean=mean(herd.size),se=(sd(herd.size))/(sqrt(length(herd.size))))
herdsize.hab
## hab mean se
## 1 CW 1.000000 0.0000000
## 2 FG 8.444444 2.7820882
## 3 OB 7.000000 3.7859389
## 4 OG 8.846154 2.1508679
## 5 OW 2.000000 0.3791976
#seasonal water
ddply(dists.trunc,~seas.water,summarise,mean=mean(herd.size),se=(sd(herd.size))/(sqrt(length(herd.size))))
## seas.water mean se
## 1 A 3.409091 0.8287556
## 2 P 7.843750 1.7961144
#burn status
ddply(dists.trunc,~burn,summarise,mean=mean(herd.size),se=(sd(herd.size))/(sqrt(length(herd.size))))
## burn mean se
## 1 C 6.461538 1.987059
## 2 N 4.352941 1.084453
## 3 P 7.636364 2.707809
So we convert from density of herds to density of individuals by multiplication:
1. Take the density estimates for each covariate level from the predict function above (for example, the density of herds in each habitat).
2. Multiply these by the corresponding mean herd sizes from the ddply function to obtain individual density estimates, corrected for the influence of variables found to affect density, herd size, or both.
hab.ind.density <- hab.pred$Predicted*herdsize.hab$mean # rows of both objects are in the same habitat order (CW, FG, OB, OG, OW)
hab.ind.density
## [1] 0.7917147 5.1796478 10.2362051 32.3501260 3.3412047
The values in the vector hab.ind.density are the densities of puku individuals in the five major habitats (or more correctly, vegetation types): CW (closed woodland), FG (flooded grassland), OB (open bushland), OG (open grassland), and OW (open woodland).
Habitat-specific herd densities from hab.pred$Predicted were:
## Predicted SE lower upper hab
## 1 0.7917147 0.3116250 0.3660405 1.712412 CW
## 2 0.6133793 0.3604603 0.1938695 1.940657 FG
## 3 1.4623150 1.0465805 0.3596127 5.946300 OB
## 4 3.6569708 1.1295464 1.9962122 6.699406 OG
## 5 1.6706024 0.3559837 1.1002543 2.536606 OW
And habitat-specific mean herd sizes from herdsize.hab$mean were:
## hab mean se
## 1 CW 1.000000 0.0000000
## 2 FG 8.444444 2.7820882
## 3 OB 7.000000 3.7859389
## 4 OG 8.846154 2.1508679
## 5 OW 2.000000 0.3791976
Make a new figure of individual density by habitat type, to compare to the pattern for herd density. First, use the cbind function to make an appropriate dataframe.
hab.ind.pred <- cbind(hab.pred, hab.ind.density)
hab.ind.pred
## Predicted SE lower upper hab hab.ind.density
## 1 0.7917147 0.3116250 0.3660405 1.712412 CW 0.7917147
## 2 0.6133793 0.3604603 0.1938695 1.940657 FG 5.1796478
## 3 1.4623150 1.0465805 0.3596127 5.946300 OB 10.2362051
## 4 3.6569708 1.1295464 1.9962122 6.699406 OG 32.3501260
## 5 1.6706024 0.3559837 1.1002543 2.536606 OW 3.3412047
Then make minor modifications to the code for the previous plot using the ggplot function.
ggplot(hab.ind.pred, aes(x=hab, y=hab.ind.density)) +
geom_bar(position=position_dodge(), stat="identity", colour = 'dark red', fill = 'red') +
# geom_errorbar(aes(ymin=Predicted-SE, ymax=Predicted+SE), width=.2, colour = 'blue') +
scale_x_discrete(name = expression(bold('Vegetation Type'))) +
scale_y_continuous(name = expression(bold(paste("Puku / ", km^bold("2"))))) +
theme_bw()
A much stronger preference for open grassland is revealed. Note that I ‘commented out’ the error bar code because we haven’t calculated variances for individual density yet… that code is just below. (When executing a script, R ignores comments, which are identified by a #. By putting a # at the start of a line, or commenting it out, you can disable code without deleting it, if you want to keep it for later use.)
When we calculate a parameter (like individual density) that is itself a function of two or more parameters that we’ve already estimated (like herd density and herd size), we can use the delta method to combine the variances of the original parameters to yield the variance of the final parameter. Briefly, the delta method is a way of calculating the variance of a function of one or more variables. For linear functions, there is an exact solution to the problem of determining the variance of a function, using the variances of the variables in the function. For non-linear functions, there is no exact solution, but the delta method usually provides a reasonable approximation. (The delta method relies on Taylor series approximation, which may be familiar to you from calculus but is beyond the scope of this exercise.)
A delta method example, calculating a variance for individual puku density in OW (open woodland) vegetation:
library(msm) #there is a deltamethod function in the msm package; equivalent functions are available in other packages.
N1 = 1.67 #herd density of puku in OW from hab.pred
N2 = 2.00 #mean herd size of puku in OW from herdsize.hab
s.1 = 0.36^2 # variance of the estimated herd density: the square of its SE
s.2 = 0.38^2 # variance of the estimated mean herd size, again SE^2
sigma = matrix(c(s.1,0,0,s.2),2,2) # putting those variances into a 2X2 variance-covariance matrix with covariances = 0
ind.density.se=deltamethod(~x2*x1,c(N1,N2),sigma, ses = TRUE) # applying the delta method with the deltamethod function in the msm package
ind.density.se
## [1] 0.9597485
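As a check, the delta method for a product of two independent estimates reduces to the closed form Var(N1*N2) ≈ N2^2*Var(N1) + N1^2*Var(N2), which reproduces the deltamethod() result:
sqrt(N2^2 * s.1 + N1^2 * s.2) # closed-form first-order SE of the product
## [1] 0.9597485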
The standard error of this habitat-specific estimate of individual density is considerable once the uncertainty in both the estimated herd density and the estimated herd size is propagated forward. That is not surprising when you consider how little data underlie the estimate: one habitat type, one set of surveys, one species… a clear example of the fact that partitioning a small data set often causes uncertainties to become large.
Obtain a 95% confidence interval:
lcl.density = max(0,(N1*N2 - 1.96 * ind.density.se)) # lower confidence limit is bounded at zero
ucl.density = N1*N2 + 1.96 * ind.density.se
CI.density = c(lcl.density, ucl.density)
CI.density
## [1] 1.458893 5.221107