Nature is complex

We are interested in understanding the factors and drivers that affect our phenomenon of interest.

How do scientists deal with such complexity?

We tackle complexity by formulating simple hypotheses that can be expressed as mathematical models.

Imagine at least two hypotheses, drivers, or factors that could influence your phenomenon of interest.



Are these factors mutually exclusive?



Which one do you think is more important?

Model selection

Wait a moment…

  • How do I translate my hypotheses into mathematical models?

  • How can I determine which model provides the best fit to my data?

Skunk abundance example

We already know a tool, right?





Looks like a job for a regression!

Since we have count data, we need to use a generalized linear model (GLM) with a Poisson family.

The formal notation for a Poisson probability distribution is

\[ N \sim \text{Poisson}(\lambda) \]

Using the log link function, we can model \(\lambda\):

\[ \log(\lambda) = \beta_0 + \beta_1 \times x \]

In our case:

  • \(\lambda=\) Expected or mean number of skunks
  • \(\beta_0=\) The intercept
  • \(\beta_1=\) The slope
  • \(x=\) distance to road values
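
A minimal sketch of how this could look in R, assuming a hypothetical data frame skunk_data with a count column skunks and a predictor dist_road (illustrative names, not real data):

# Poisson GLM of skunk counts on distance to road
# (skunk_data, skunks, and dist_road are hypothetical, for illustration only)
skunk_mod <- glm(skunks ~ dist_road,              # log(lambda) = b0 + b1 * dist_road
                 family = poisson(link = "log"),  # count data, log link
                 data = skunk_data)
summary(skunk_mod)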

Now we can do the same for each of the predictions in our hypotheses.

Shelters

  • \[\log(\text{Skunks}) = \beta_0 + \beta_1 \times \text{Number of shelters} \]

Resources and Shelters

  • \[\log(\text{Skunks}) = \beta_0 + \beta_1 \times \text{Number of shelters} + \beta_2 \times \text{Resources}\]



What about a model that tells me skunk abundance increases with the number of shelters but also depends on resources?



\[ \begin{split} \log(\text{Skunks}) = \beta_0 + \beta_1 \times \text{Number of shelters} \\ + \beta_2 \times \text{Resources} + \beta_3 \times \text{Number of shelters} \times \text{Resources} \end{split} \]

Be careful with specification of the model, because each one tells a different story.
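
In R's formula syntax these stories differ by a single operator; a sketch, again assuming the hypothetical skunk_data:

# Shelters and resources as independent (additive) effects
m_additive <- glm(skunks ~ shelters + resources,
                  family = poisson(), data = skunk_data)

# shelters * resources expands to both main effects plus their interaction
m_interaction <- glm(skunks ~ shelters * resources,
                     family = poisson(), data = skunk_data)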

Characteristics of a good model



Explains the data well while leaving out any unnecessary details.

Parsimony

Approaches to estimating model parsimony



Null Hypothesis Testing



Information criteria


Bayesian

One thing in common: Maximum Likelihood Estimation (MLE)

MLE is a method for finding the set of parameters (\(\theta\)) that make our observed data most likely to occur under a given model.

In other words, we try different parameter values and choose the ones that give the highest probability of observing the data we actually have.

It assumes that all observations are independent and come from the same probability distribution.

A model with a higher log-likelihood value fits the data better; it is more consistent with what we observed.
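
A tiny illustration of the idea with made-up counts, using base R's optimize() to find the Poisson mean with the highest likelihood:

# Hypothetical counts, invented purely for illustration
counts <- c(2, 4, 3, 5, 1)

# Negative log-likelihood of a Poisson model as a function of lambda
negloglik <- function(lambda) -sum(dpois(counts, lambda, log = TRUE))

# Search for the lambda that minimizes the negative log-likelihood
fit <- optimize(negloglik, interval = c(0.01, 20))

fit$minimum   # numerical MLE of lambda
mean(counts)  # analytic MLE of a Poisson mean: the sample mean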

Null hypothesis testing

Suppose we have a series of models representing nested hypotheses

To compare nested models, we can use the Likelihood Ratio Test (LRT).

The likelihood ratio is calculated as:

\[ \lambda = \frac{L_{\text{reduced model}}}{L_{\text{full model}}} \]

To make the calculation easier, we often use the test statistic:

\[ D = -2\,(\ln L_{\text{reduced}} - \ln L_{\text{full}}) \]

Under the null hypothesis, \(D\) is approximately chi-squared distributed (\(\chi^2\)) with degrees of freedom equal to the difference in the number of parameters between the models.
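
In R, \(D\) and its p-value can be computed by hand from two nested fitted models; a sketch, assuming placeholder objects m_reduced and m_full fitted with glm():

# Likelihood ratio test "by hand" (m_reduced and m_full are placeholders)
D  <- -2 * (as.numeric(logLik(m_reduced)) - as.numeric(logLik(m_full)))
df <- attr(logLik(m_full), "df") - attr(logLik(m_reduced), "df")
pchisq(D, df = df, lower.tail = FALSE)  # p-value under the chi-squared null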

We will use hare presence–absence data from Murray et al. (2020).

You can download the data directly from here 🐰

hares <- read.csv("hare.orig.csv")

head(hares)
  occupancy00 PatchArea_raw PatchArea.stand PatchCon_raw PatchCon.stand
1           1            29      -0.7113976    0.8920588       0.616758
2           0            32      -0.6646382    0.9429488       1.232089
3           0            27      -0.7425704    0.9295989       1.070670
4           1           249       2.7176196    0.9735905       1.602590
5           0           181       1.6577416    0.9450229       1.257168
6           1            23      -0.8049162    0.9507410       1.326307
  Cover_raw Cover.stand  Food_raw Food.stand
1  18.87500  -2.6675387 0.1428571  0.2415567
2  22.90625  -2.4736539 0.0000000 -0.5636526
3  63.50000  -0.5212791 0.0000000 -0.5636526
4  89.25000   0.7171789 0.2857143  1.0467660
5  77.37500   0.1460454 0.0000000 -0.5636526
6  93.96429   0.9439146 0.0000000 -0.5636526

We will use a Bernoulli distribution, which models a single success/failure outcome for each trial.
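
A quick sanity check that the response really is binary, using the hares data loaded above:

# occupancy00 should contain only 0s (absence) and 1s (presence)
table(hares$occupancy00)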

We can fit a GLM using the glm function:

ht_mod1 <- glm(
  occupancy00 ~ PatchArea.stand + PatchCon.stand + Cover.stand + Food.stand, 
  family = binomial(),  # specify the distribution
  data = hares          # specify the dataset
)

summary(ht_mod1)

Call:
glm(formula = occupancy00 ~ PatchArea.stand + PatchCon.stand + 
    Cover.stand + Food.stand, family = binomial(), data = hares)

Coefficients:
                Estimate Std. Error z value Pr(>|z|)    
(Intercept)       0.9536     0.1613   5.913 3.35e-09 ***
PatchArea.stand   0.1397     0.1344   1.040    0.298    
PatchCon.stand   -0.1009     0.1245  -0.810    0.418    
Cover.stand       0.5688     0.1251   4.548 5.42e-06 ***
Food.stand        1.4500     0.2808   5.163 2.42e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 476.24  on 365  degrees of freedom
Residual deviance: 392.38  on 361  degrees of freedom
AIC: 402.38

Number of Fisher Scoring iterations: 6

Which in our formal notation would be equivalent to

\[ \log\left[ \frac { P( \operatorname{occupancy00} = \operatorname{1} ) }{ 1 - P( \operatorname{occupancy00} = \operatorname{1} ) } \right] = \alpha \\+ \beta_{1}(\operatorname{PatchArea.stand}) \\ + \beta_{2}(\operatorname{PatchCon.stand}) \\+ \beta_{3}(\operatorname{Cover.stand}) + \beta_{4}(\operatorname{Food.stand}) \]

This equation shows the log-odds of hare occupancy as a linear combination of our predictors. Each coefficient represents the change in log-odds associated with a one-unit change in the predictor, holding others constant.
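
To put these effects on a more intuitive scale, the coefficients can be exponentiated into odds ratios; a short sketch using the model fitted above:

# Exponentiated coefficients: multiplicative change in the odds of occupancy
# per one-unit change in each (standardized) predictor
exp(coef(ht_mod1))
exp(confint(ht_mod1))  # profile-likelihood confidence intervals on the odds scale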

First, we need to check some assumptions

Violating these assumptions can invalidate likelihood-based tests (Zuur et al. 2009).

Evaluating assumptions in a GLM depends on the type of distribution family. Fortunately, some packages make this process relatively simple (Hartig 2022; Lüdecke et al. 2021).

Check for overdispersion

Is there more or less variation than the model explains?

library(DHARMa)

testOverdispersion(ht_mod1)


    DHARMa nonparametric dispersion test via sd of residuals fitted vs.
    simulated

data:  simulationOutput
dispersion = 0.9793, p-value = 0.664
alternative hypothesis: two.sided
library(performance)

check_overdispersion(ht_mod1)
# Overdispersion test

 dispersion ratio = 0.979
          p-value = 0.664

Test the distributional assumptions

Binomial or count models rarely have normal residuals — even when the model is good. Fortunately, DHARMa comes to the rescue!

Using DHARMa

testUniformity(ht_mod1)


    Asymptotic one-sample Kolmogorov-Smirnov test

data:  simulationOutput$scaledResiduals
D = 0.031125, p-value = 0.8702
alternative hypothesis: two-sided

Using performance

library(dplyr)
perf_res <- check_residuals(ht_mod1)
perf_res
OK: Simulated residuals appear as uniformly distributed (p = 0.870).
plot(perf_res)

Homogeneity of variance (or deviance)

Important: The model assumes that the deviance (the spread of the residuals) is constant across groups and along the range of the predictors; this test checks that assumption.

# Test for homogeneity of variance (deviance)
# Checks whether the spread of residuals is consistent across groups or levels of predictors
testQuantiles(ht_mod1)

    Test for location of quantiles via qgam

data:  res
p-value = 0.9509
alternative hypothesis: both

One cool thing about performance is that we can check all the assumptions at once with check_model(ht_mod1).
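
For example:

# Visual check of the main model assumptions in a single panel of diagnostic plots
check_model(ht_mod1)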

Now let’s move on to the test

We use the Anova function from the car package to generate an analysis-of-deviance table. When applied to a single model, it tests each predictor by comparing the deviance of the model with and without that term.

H0: There is no difference in explained deviance between the model without independent variable X and the model that includes it.

Ha: The deviance explained by including independent variable X differs from the model without it.

car::Anova(ht_mod1, type= "II")
Analysis of Deviance Table (Type II tests)

Response: occupancy00
                LR Chisq Df Pr(>Chisq)    
PatchArea.stand    1.146  1     0.2844    
PatchCon.stand     0.668  1     0.4138    
Cover.stand       22.716  1  1.878e-06 ***
Food.stand        52.592  1  4.105e-13 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Now let’s fit a new model by removing one of the predictors:

ht_mod2 <- glm(
  occupancy00 ~ PatchArea.stand + Cover.stand + Food.stand, 
  family = binomial(),  # specify the distribution
  data = hares           # specify the dataset
)

# Perform deviance analysis
car::Anova(ht_mod2, type= "II")
Analysis of Deviance Table (Type II tests)

Response: occupancy00
                LR Chisq Df Pr(>Chisq)    
PatchArea.stand    1.053  1     0.3048    
Cover.stand       22.456  1  2.150e-06 ***
Food.stand        52.348  1  4.649e-13 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Now that we have two models — one full model (ht_mod1) including all predictors, and one reduced model (ht_mod2) with one variable removed — we can formally test whether the simpler model performs just as well.

We do this using the Likelihood Ratio Test (LRT):

anova(ht_mod2, # restricted
      ht_mod1, # full
      test = "LRT")
Analysis of Deviance Table

Model 1: occupancy00 ~ PatchArea.stand + Cover.stand + Food.stand
Model 2: occupancy00 ~ PatchArea.stand + PatchCon.stand + Cover.stand + 
    Food.stand
  Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1       362     393.05                     
2       361     392.38  1  0.66784   0.4138

Is this model really better than the one that included all predictors?

If the p-value is small (e.g. < 0.05), → reject H₀ → the full model is significantly better.

If the p-value is large, → fail to reject H₀ → the reduced model is sufficient.

Statistically, both methods perform the same Likelihood Ratio Test. The difference is mainly practical: anova() is the classical base R approach, while test_likelihoodratio() (from the performance package) provides cleaner, more informative output.

test1 <- test_likelihoodratio(ht_mod2, # Restricted model
                              ht_mod1, # Full model
                              estimator = "ML")
test1
# Likelihood-Ratio-Test (LRT) for Model Comparison (ML-estimator)

Name    | Model | df | df_diff | Chi2 |     p
---------------------------------------------
ht_mod2 |   glm |  4 |         |      |      
ht_mod1 |   glm |  5 |       1 | 0.67 | 0.414

If in practical terms they are the same, which one should we choose?

Let’s continue exploring possible models

ht_mod3 <- glm(occupancy00 ~ Cover.stand+ Food.stand, 
               family = binomial(), # Define the distribution
               data = hares) # Define dataset
car::Anova(ht_mod3, type= "II")
Analysis of Deviance Table (Type II tests)

Response: occupancy00
            LR Chisq Df Pr(>Chisq)    
Cover.stand   24.529  1  7.320e-07 ***
Food.stand    53.735  1  2.294e-13 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Here we are formally comparing two nested models using a Likelihood Ratio Test (LRT):

test_likelihoodratio(ht_mod3, 
                     ht_mod2, estimator = "ML")
# Likelihood-Ratio-Test (LRT) for Model Comparison (ML-estimator)

Name    | Model | df | df_diff | Chi2 |     p
---------------------------------------------
ht_mod3 |   glm |  3 |         |      |      
ht_mod2 |   glm |  4 |       1 | 1.05 | 0.305

We’ve now reduced our model to include just one predictor:

ht_mod4 <- glm(occupancy00 ~ Food.stand, 
               family = binomial(), # Define the distribution
               data = hares) # Define dataset
car::Anova(ht_mod4, type= "II")
Analysis of Deviance Table (Type II tests)

Response: occupancy00
           LR Chisq Df Pr(>Chisq)    
Food.stand   57.606  1  3.203e-14 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

What happened here?

test_likelihoodratio(ht_mod4, 
                     ht_mod3, estimator = "ML")
# Likelihood-Ratio-Test (LRT) for Model Comparison (ML-estimator)

Name    | Model | df | df_diff |  Chi2 |      p
-----------------------------------------------
ht_mod4 |   glm |  2 |         |       |       
ht_mod3 |   glm |  3 |       1 | 24.53 | < .001

Interpreting the summary output

Let’s take a look at the selected model

summary(ht_mod3)

Call:
glm(formula = occupancy00 ~ Cover.stand + Food.stand, family = binomial(), 
    data = hares)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)   0.9455     0.1603   5.898 3.68e-09 ***
Cover.stand   0.5806     0.1231   4.718 2.38e-06 ***
Food.stand    1.4462     0.2785   5.194 2.06e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 476.24  on 365  degrees of freedom
Residual deviance: 394.11  on 363  degrees of freedom
AIC: 400.11

Number of Fisher Scoring iterations: 6

The report package can help generate a general report of the model (Makowski et al. 2023) (or at least help you understand a little of what you did 🙃).

library(report)

report(ht_mod3)
We fitted a logistic model (estimated using ML) to predict occupancy00 with
Cover.stand and Food.stand (formula: occupancy00 ~ Cover.stand + Food.stand).
The model's explanatory power is moderate (Tjur's R2 = 0.20). The model's
intercept, corresponding to Cover.stand = 0 and Food.stand = 0, is at 0.95 (95%
CI [0.65, 1.28], p < .001). Within this model:

  - The effect of Cover stand is statistically significant and positive (beta =
0.58, 95% CI [0.35, 0.83], p < .001; Std. beta = 0.58, 95% CI [0.35, 0.83])
  - The effect of Food stand is statistically significant and positive (beta =
1.45, 95% CI [0.95, 2.05], p < .001; Std. beta = 1.45, 95% CI [0.95, 2.05])

Standardized parameters were obtained by fitting the model on a standardized
version of the dataset. 95% Confidence Intervals (CIs) and p-values were
computed using a Wald z-distribution approximation.

Of course, before you interpret or report anything from your model, you should evaluate its assumptions.

# Simulate residuals for the selected model
residuals_ht_mod3 <- simulateResiduals(ht_mod3)

# Plot the residuals to check model assumptions
plot(residuals_ht_mod3)

Backward selection

Backward selection is a model simplification method where we start with a complex model (including all predictors and interactions) and systematically remove variables one by one, checking at each step whether the model fit is significantly worsened.

drop_mod <- glm(occupancy00 ~ (PatchArea.stand+ PatchCon.stand+ Cover.stand+ Food.stand)^2, 
               family = binomial(), 
               data = hares)

Then we use the drop1() function:

drop1(drop_mod, test = "LRT")
Single term deletions

Model:
occupancy00 ~ (PatchArea.stand + PatchCon.stand + Cover.stand + 
    Food.stand)^2
                               Df Deviance    AIC    LRT Pr(>Chi)  
<none>                              373.46 395.46                  
PatchArea.stand:PatchCon.stand  1   375.32 395.32 1.8630  0.17228  
PatchArea.stand:Cover.stand     1   380.00 400.00 6.5426  0.01053 *
PatchArea.stand:Food.stand      1   373.56 393.56 0.1098  0.74041  
PatchCon.stand:Cover.stand      1   374.13 394.13 0.6718  0.41241  
PatchCon.stand:Food.stand       1   376.56 396.56 3.1096  0.07783 .
Cover.stand:Food.stand          1   377.26 397.26 3.8017  0.05120 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We now create a new model by removing the terms that did not significantly contribute to the model fit. We then use drop1() to test the effect of each remaining term on model deviance.

drop_mod1 <- glm(occupancy00 ~ PatchArea.stand+ PatchCon.stand+ Cover.stand+ Food.stand+ PatchArea.stand:PatchCon.stand+PatchArea.stand:Cover.stand+PatchCon.stand:Cover.stand+PatchCon.stand:Food.stand+Cover.stand:Food.stand  , 
               family = binomial(), 
               data = hares)

# Evaluate the contribution of each term using likelihood ratio tests
drop1(drop_mod1, test = "LRT")
Single term deletions

Model:
occupancy00 ~ PatchArea.stand + PatchCon.stand + Cover.stand + 
    Food.stand + PatchArea.stand:PatchCon.stand + PatchArea.stand:Cover.stand + 
    PatchCon.stand:Cover.stand + PatchCon.stand:Food.stand + 
    Cover.stand:Food.stand
                               Df Deviance    AIC    LRT Pr(>Chi)  
<none>                              373.56 393.56                  
PatchArea.stand:PatchCon.stand  1   375.63 393.63 2.0642  0.15079  
PatchArea.stand:Cover.stand     1   380.00 398.00 6.4359  0.01118 *
PatchCon.stand:Cover.stand      1   374.22 392.22 0.6558  0.41804  
PatchCon.stand:Food.stand       1   377.03 395.03 3.4692  0.06252 .
Cover.stand:Food.stand          1   377.35 395.35 3.7837  0.05175 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
drop_mod2 <- glm(occupancy00 ~ PatchArea.stand+ PatchCon.stand+ Cover.stand+ Food.stand+ PatchArea.stand:PatchCon.stand+ PatchArea.stand:Cover.stand+ PatchCon.stand:Food.stand +Cover.stand:Food.stand , 
               family = binomial(), 
               data = hares)
drop1(drop_mod2, test = "LRT")
drop_mod3 <- glm(occupancy00 ~ PatchArea.stand+ PatchCon.stand+ Cover.stand+ Food.stand+ PatchArea.stand:Cover.stand+ PatchCon.stand:Food.stand +Cover.stand:Food.stand , 
               family = binomial(), 
               data = hares)
drop1(drop_mod3, test = "LRT")
drop_mod4 <- glm(occupancy00 ~ PatchArea.stand+ PatchCon.stand+ Cover.stand+ Food.stand+ PatchArea.stand:Cover.stand+ PatchCon.stand:Food.stand, 
               family = binomial(), 
               data = hares)
drop1(drop_mod4, test = "LRT")

After the backward selection process, we ended up with the following final model, which includes only the predictors and interactions that significantly contribute to explaining hare occupancy.

summary(drop_mod4)

Call:
glm(formula = occupancy00 ~ PatchArea.stand + PatchCon.stand + 
    Cover.stand + Food.stand + PatchArea.stand:Cover.stand + 
    PatchCon.stand:Food.stand, family = binomial(), data = hares)

Coefficients:
                            Estimate Std. Error z value Pr(>|z|)    
(Intercept)                   1.1531     0.1986   5.805 6.42e-09 ***
PatchArea.stand               0.3278     0.1546   2.121 0.033930 *  
PatchCon.stand               -0.3935     0.2163  -1.819 0.068890 .  
Cover.stand                   0.4653     0.1405   3.312 0.000927 ***
Food.stand                    1.7084     0.3524   4.848 1.25e-06 ***
PatchArea.stand:Cover.stand  -0.4399     0.1616  -2.722 0.006482 ** 
PatchCon.stand:Food.stand    -0.7361     0.3805  -1.934 0.053075 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 476.24  on 365  degrees of freedom
Residual deviance: 380.03  on 359  degrees of freedom
AIC: 394.03

Number of Fisher Scoring iterations: 7

Forward selection

We can also perform forward selection, i.e., starting with a simple null model and adding terms one by one. However, forward selection is generally not recommended, because it may overlook important interactions or collinear effects and tends to be less stable than backward selection.

add_mod <- glm(occupancy00 ~ 1 , 
               family = binomial(), 
               data = hares)

Then we use the add1() function, in which we have to specify which terms we are going to add:

add1(add_mod , 
     scope = ~ PatchArea.stand+ PatchCon.stand+ Cover.stand+ Food.stand, 
     test = "LRT")
Single term additions

Model:
occupancy00 ~ 1
                Df Deviance    AIC    LRT  Pr(>Chi)    
<none>               476.24 478.24                     
PatchArea.stand  1   470.92 474.92  5.316   0.02113 *  
PatchCon.stand   1   475.99 479.99  0.254   0.61403    
Cover.stand      1   447.84 451.84 28.399 9.869e-08 ***
Food.stand       1   418.63 422.63 57.606 3.203e-14 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Akaike Information Criterion

What is the AIC?

The Akaike Information Criterion (AIC) measures the relative amount of information lost or gained by using a model to represent the data, compared to other models.

\[ AIC = -2\ln(L_{\max}) + 2k \]

Since adding more parameters usually improves the model fit (but can lead to overfitting), the penalty term \(2k\), where \(k\) is the number of estimated parameters in the model, discourages overly complex models.

In practice, models with lower AIC values are preferred, as they are expected to lose the least amount of information relative to the “true” model.
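
As a quick sanity check, AIC can be computed by hand from the maximized log-likelihood of a fitted model, for example ht_mod1 from above:

# AIC "by hand": -2*lnL + 2k, where k is the number of estimated parameters
lnL <- as.numeric(logLik(ht_mod1))
k   <- attr(logLik(ht_mod1), "df")
-2 * lnL + 2 * k  # should match the AIC reported in summary(ht_mod1)
AIC(ht_mod1)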

Create a set of candidate models

#Food
aic_mod1 <- glm(occupancy00 ~ Food.stand, 
               family = binomial(), 
               data = hares)
#Cover

aic_mod2 <- glm(occupancy00 ~ Cover.stand, 
               family = binomial(), 
               data = hares)

# Patch size
aic_mod3 <- glm(occupancy00 ~ PatchArea.stand, 
               family = binomial(), 
               data = hares)

# Patch connectivity
aic_mod4 <- glm(occupancy00 ~ PatchCon.stand, 
               family = binomial(), 
               data = hares)

Try to fit the following models

    • Food, Cover

    • Food, Patch size

    • Food, Patch connectivity

    • Cover, Patch size

    • Food, Cover, Patch size

    • Food, Cover, Patch connectivity

    • Food, Patch size, Patch connectivity

    • Food, Cover interaction

    • Food, Patch size interaction

    • Food, Patch connectivity interaction

    • Cover, Patch connectivity interaction

    • Constant only

Try to fit the following models

# Food, Cover
aic_mod5 <- glm(occupancy00 ~ Food.stand+ Cover.stand, 
               family = binomial(), 
               data = hares)

#Food, Patch size
aic_mod6 <- glm(occupancy00 ~ Food.stand+ PatchArea.stand, 
               family = binomial(), 
               data = hares)

# Food, Patch connectivity
aic_mod7 <- glm(occupancy00 ~ Food.stand+ PatchCon.stand, 
               family = binomial(), 
               data = hares)

# Cover, Patch size
aic_mod8 <- glm(occupancy00 ~ Cover.stand+ PatchArea.stand, 
               family = binomial(), 
               data = hares)

# Food, Cover, Patch size

aic_mod9 <- glm(occupancy00 ~ Food.stand+ Cover.stand+ PatchArea.stand, 
               family = binomial(), 
               data = hares)


# Food, Cover, Patch connectivity
aic_mod10 <- glm(occupancy00 ~ Food.stand+ Cover.stand+ PatchCon.stand, 
               family = binomial(), 
               data = hares)

Try to fit the following models

#Food, Patch size, Patch connectivity
aic_mod11 <- glm(occupancy00 ~ Food.stand+ PatchArea.stand+ PatchCon.stand, 
               family = binomial(), 
               data = hares)

# Food, Cover interaction
aic_mod12 <- glm(occupancy00 ~ Food.stand*Cover.stand, 
               family = binomial(), 
               data = hares)

# Food, Patch size interaction
aic_mod13 <- glm(occupancy00 ~ Food.stand*PatchArea.stand, 
               family = binomial(), 
               data = hares)

# Food, Patch connectivity interaction
aic_mod14 <- glm(occupancy00 ~ Food.stand*PatchCon.stand, 
               family = binomial(), 
               data = hares)

# Cover, Patch connectivity interaction
aic_mod15 <- glm(occupancy00 ~ Cover.stand*PatchCon.stand, 
               family = binomial(), 
               data = hares)

#Constant only
aic_mod16 <- glm(occupancy00 ~ 1, 
               family = binomial(), 
               data = hares)

To perform model selection using AIC, there are several options and packages available. I particularly like AICcmodavg and its function aictab() for its versatility (Mazerolle 2023).

Here, we compare a set of candidate models, including single predictors, additive combinations, and interactions. aictab() calculates AIC for each model, ranks them, and provides differences relative to the best model.

library(AICcmodavg)

AICctab <- aictab(list("Food"= aic_mod1,
                       "Cover"= aic_mod2,
                       "Patch size"= aic_mod3,
                       "Patch conectivity"= aic_mod4,
                       "Food +Cover"= aic_mod5,
                       "Food +Patch size"= aic_mod6,
                       "Food +Patch connectivity"= aic_mod7,
                       "Cover+ Patch size"= aic_mod8,
                       "Food+ Cover+ Patch size"= aic_mod9,
                       "Food+ Cover+ Patch connectivity"=  aic_mod10,
                       "Food+ Patch size+ Patch connectivity"= aic_mod11,
                       "Food* Cover"= aic_mod12,
                       "Food* Patch size"= aic_mod13,
                       "Food* Patch"= aic_mod14,
                       "Cover* Patch connectivity"= aic_mod15,
                       "Constant"= aic_mod16),
                  second.ord = FALSE, # To use AIC
                  sort = TRUE # Rank the models
                  )
View(AICctab)

Another useful option is to use the performance package

AIC_performance <- compare_performance(list("Food"= aic_mod1,
                       "Cover"= aic_mod2,
                       "Patch size"= aic_mod3,
                       "Patch conectivity"= aic_mod4,
                       "Food +Cover"= aic_mod5,
                       "Food +Patch size"= aic_mod6,
                       "Food +Patch connectivity"= aic_mod7,
                       "Cover+ Patch size"= aic_mod8,
                       "Food+ Cover+ Patch size"= aic_mod9,
                       "Food+ Cover+ Patch connectivity"=  aic_mod10,
                       "Food+ Patch size+ Patch connectivity"= aic_mod11,
                       "Food* Cover"= aic_mod12,
                       "Food* Patch size"= aic_mod13,
                       "Food* Patch"= aic_mod14,
                       "Cover* Patch connectivity"= aic_mod15,
                       "Constant"= aic_mod16),
                       metrics = "AIC",
                       rank = FALSE)
View(AIC_performance)

AIC even allows us to compare models built from all possible combinations of predictors.

First, we need to fit a global model that includes all predictors and possible pairwise interactions.
This global model serves as the starting point for automated model selection using dredge().

Note: Setting na.action = "na.fail" is important because dredge() requires that the model contains no missing values.

global_mod <- glm(occupancy00 ~ (PatchArea.stand+ PatchCon.stand+ Cover.stand+ Food.stand)^2, 
               family = binomial(), 
               data = hares,
               na.action = "na.fail") # important to dredge
summary(global_mod )

Call:
glm(formula = occupancy00 ~ (PatchArea.stand + PatchCon.stand + 
    Cover.stand + Food.stand)^2, family = binomial(), data = hares, 
    na.action = "na.fail")

Coefficients:
                               Estimate Std. Error z value Pr(>|z|)    
(Intercept)                      1.2733     0.2253   5.652 1.59e-08 ***
PatchArea.stand                  0.2929     0.2218   1.320   0.1867    
PatchCon.stand                  -0.4187     0.2421  -1.730   0.0837 .  
Cover.stand                      0.6565     0.1678   3.913 9.10e-05 ***
Food.stand                       1.9403     0.4040   4.802 1.57e-06 ***
PatchArea.stand:PatchCon.stand  -0.2425     0.1894  -1.280   0.2004    
PatchArea.stand:Cover.stand     -0.3906     0.1641  -2.381   0.0173 *  
PatchArea.stand:Food.stand      -0.1176     0.3467  -0.339   0.7345    
PatchCon.stand:Cover.stand       0.1229     0.1508   0.815   0.4150    
PatchCon.stand:Food.stand       -0.7218     0.4301  -1.679   0.0932 .  
Cover.stand:Food.stand           0.4875     0.2477   1.968   0.0491 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 476.24  on 365  degrees of freedom
Residual deviance: 373.46  on 355  degrees of freedom
AIC: 395.46

Number of Fisher Scoring iterations: 7

Now we can use the dredge() function from the MuMIn package.
This function fits and ranks all possible combinations of models based on the structure of the global model.

In this example, we set a limit of up to 3 variables per model for simplicity.
Be aware that interpreting models with multiple interactions can quickly become challenging, so dredging is mostly useful as an exploratory tool.

library(MuMIn)

dredge <- dredge(global_mod,       # Global model
                 rank = "AIC",     # Information criterion
                 m.lim = c(NA, 3)) # Limit the number of terms per model
View(dredge)

We can extract the best models from the dredge results, for example those with ΔAIC < 2, which are considered essentially equivalent to the top model.

We can then examine the summary of any selected model.

# Extract models with ΔAIC < 2 from dredge results
best_models <- get.models(dredge, subset = delta< 2)

# View the summary of a specific model (e.g., model #20)
summary(best_models$`20`)

Call:
glm(formula = occupancy00 ~ Cover.stand + Food.stand + Cover.stand:Food.stand + 
    1, family = binomial(), data = hares, na.action = "na.fail")

Coefficients:
                       Estimate Std. Error z value Pr(>|z|)    
(Intercept)              1.0515     0.1837   5.724 1.04e-08 ***
Cover.stand              0.7662     0.1500   5.109 3.24e-07 ***
Food.stand               1.6703     0.3256   5.130 2.90e-07 ***
Cover.stand:Food.stand   0.5108     0.2341   2.182   0.0291 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 476.24  on 365  degrees of freedom
Residual deviance: 389.90  on 362  degrees of freedom
AIC: 397.9

Number of Fisher Scoring iterations: 6

Finally, we check the assumptions of the selected model using simulated residuals.

This helps ensure that the model provides a valid representation of the data before interpreting or reporting the results.

res_best <- simulateResiduals(best_models$`20`)

plot(res_best)

Different approaches give us slightly different models.

Choose the approach based on your modeling objectives (Tredennick et al. 2021).

Finally, we are going to generate prediction graphs using the ggeffects package.

This allows us to visualize how the predictors influence the probability of hare occupancy based on the selected best model.

library(ggeffects)

prediction <- ggpredict(best_models$`20`,
                        terms = c("Food.stand [all]", "Cover.stand"))

prediction
# Predicted probabilities of occupancy00

Cover.stand: -1

Food.stand | Predicted |     95% CI
-----------------------------------
     -0.56 |      0.41 | 0.33, 0.50
      0.24 |      0.64 | 0.53, 0.73
      0.85 |      0.78 | 0.64, 0.88
      1.55 |      0.89 | 0.74, 0.96
      2.25 |      0.95 | 0.82, 0.99
      5.07 |      1.00 | 0.96, 1.00

Cover.stand: 0

Food.stand | Predicted |     95% CI
-----------------------------------
     -0.56 |      0.53 | 0.46, 0.59
      0.24 |      0.81 | 0.72, 0.87
      0.85 |      0.92 | 0.84, 0.96
      1.55 |      0.97 | 0.91, 0.99
      2.25 |      0.99 | 0.96, 1.00
      5.07 |      1.00 | 1.00, 1.00

Cover.stand: 1

Food.stand | Predicted |     95% CI
-----------------------------------
     -0.56 |      0.64 | 0.56, 0.72
      0.24 |      0.91 | 0.83, 0.96
      0.85 |      0.98 | 0.91, 0.99
      1.55 |      0.99 | 0.96, 1.00
      2.25 |      1.00 | 0.98, 1.00
      5.07 |      1.00 | 1.00, 1.00
plot(prediction, show_data = TRUE)

To visualize how occupancy probability varies across two continuous predictors, we first need to create a new data frame containing all possible combinations of their values. This can be done using the expand.grid() function.

Next, we use the predict() function to estimate occupancy probabilities for each combination based on our fitted model.

Finally, we arrange these predictions into a data frame that can be plotted as a heat map, where each axis represents one variable and the color scale represents the predicted probability.

newdata <- expand.grid(
  Food.stand  = seq(min(hares$Food.stand, na.rm = TRUE),
                 max(hares$Food.stand, na.rm = TRUE),
                 length.out = 100),
  Cover.stand = seq(min(hares$Cover.stand, na.rm = TRUE),
                 max(hares$Cover.stand, na.rm = TRUE),
                 length.out = 100)
)


pred2 <- predict(best_models$`20`, 
                 newdata = newdata, 
                 type = "link", se.fit = TRUE)

pre_df <- newdata %>%
  mutate(
    fit_link = pred2$fit,
    se_link  = pred2$se.fit,
    prob     = plogis(fit_link),                          
    prob_low = plogis(fit_link - 1.96 * se_link),         
    prob_high= plogis(fit_link + 1.96 * se_link)          
  )
library(ggplot2)

ggplot(pre_df, aes(x = Food.stand, y = Cover.stand, 
                   fill = prob)) +
  geom_raster() +                                  
  scale_fill_viridis_c(name = "Occupancy\nprobability", 
                       option = "viridis", direction = -1) +
  geom_contour(aes(z = prob), colour = "white", 
               alpha = 0.5, bins = 8) + 
  labs(x = "Food availability", y = "Cover", 
       title = "Predicted occupancy probability") +
  coord_fixed() +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5))

Thanks

Source code for this presentation can be found at : https://github.com/gpandradep/Model-selection

Hartig, Florian. 2022. “DHARMa: Residual Diagnostics for Hierarchical (Multi-Level / Mixed) Regression Models.” https://CRAN.R-project.org/package=DHARMa.
Lüdecke, Daniel, Mattan S. Ben-Shachar, Indrajeet Patil, Philip Waggoner, and Dominique Makowski. 2021. “performance: An R Package for Assessment, Comparison and Testing of Statistical Models.” Journal of Open Source Software 6: 3139. https://doi.org/10.21105/joss.03139.
Makowski, Dominique, Daniel Lüdecke, Indrajeet Patil, Rémi Thériault, Mattan S. Ben-Shachar, and Brenton M. Wiernik. 2023. “Automated Results Reporting as a Practical Tool to Improve Reproducibility and Methodological Best Practices Adoption.” https://easystats.github.io/report/.
Mazerolle, Marc J. 2023. “AICcmodavg: Model Selection and Multimodel Inference Based on (q)AIC(c).” https://cran.r-project.org/package=AICcmodavg.
Murray, Dennis L., Guillaume Bastille-Rousseau, Lynne E. Beaty, Megan L. Hornseth, Jeffrey R. Row, and Daniel Thornton. 2020. “From Research Hypothesis to Model Selection: A Strategy for Robust Inference in Population Ecology.” In Population Ecology in Practice, edited by Dennis L. Murray and Brett Sandercock, 448. John Wiley & Sons.
Tredennick, Andrew T., Giles Hooker, Stephen P. Ellner, and Peter B. Adler. 2021. “A Practical Guide to Selecting Models for Exploration, Inference, and Prediction in Ecology.” Ecology 102 (6): e03336. https://doi.org/10.1002/ecy.3336.
Zuur, Alain F., Elena N. Ieno, Neil J. Walker, Anatoly A. Saveliev, and Graham M. Smith. 2009. Mixed Effects Models and Extensions in Ecology with R. 1st ed. Statistics for Biology and Health. Springer.