~/package glmulti

Rglmulti is a R package for model selection and model-averaging using AIC, BIC or other criteria. Its core is an efficient enumerator that generates all candidate formulas, taking into account marginality and/or other constraints. It also features a genetic algorithm approach to explore candidate models when their total number is just unmanageable. You can get the package from the CRAN, at http://cran.r-project.org/web/packages/glmulti/index.html

2014/06/30. On filtering out undesired interactions in glmulti 1.x

You have been many to observe that when there is only a subset of all possible interactions that you want to include in the candidate set of models (e.g. only interactions involving one particular variable), glmulti 1.x may run into trouble, especially if “marginality” is set on. This is because I had originally designed the exclusion algorithm to work when a few interactions were to be excluded, not when most were undesired.

Fortunately there is an easy work around to specify any pattern of model exclusion, which involves:

  1. Telling glmulti to explore all models;
  2. Using a custom fit function: a wrapper to your fit function that filters out undesired models.

This may remain vague for many of you, so here is a specific example. I have a dataset mydata with a dependent variable y and 7 explanatory variables (c1, c2, c3, f1, f2, f3 and m). The goal is to consider only interactions between factor “m” and the other variables. In other words, I want the full model to be:


All right. I thus prepare a wrapper to glm that first checks whether the formula is “acceptable” or not. If it is, it calls glm as usual. If not, it does not call glm, but rather returns some crappy model. This way glmulti will be tricked to believe that the formula caused a very poor fit, and will not retain it (it does cause a poor fit in a way, but the lack of fit is caused by our extrinsic choice to not consider the model in candidates). Simple isn’t it?

As a crappy model, I usually take the null model that only has an intercept. I fit it once and for all and call it nullos:

nullos <- glm(y~1,data=mydata)

NOTE: This of course assumes the null model is NOT one of your best models… If it is, you probably have an issue in the way your problem is posed, but in any case, you can still find an even crappier model if needed (e.g. no intercept at all, stopped convergence…).

Now, I write the wrapper function that will reject unacceptable model formulas. I call it myglm. Note that in this case (including interactions with one specific variable), filtering out models is fairly straightward and most of the below code is comments:

myglm=function(y, data) {
# we get the terms for the formula
if (missing(data)) data<-environment(y)
termz = terms(y,data=data)
# we get the order of all terms
orderz = attr(termz,"order")
# we get all pairwise interactions, if any. Otherwise the formula is okay
intz = which(orderz==2)
# we locate the row corresponding to the desired variable (as it is not constant)
# HERE I WANT VARIABLE M (change accordingly)
index=which(dimnames(attr(termz,"factors"))[[1]] == "m")
if (length(index)>0) { # the desired effect must be present
 if(length(intz)>0) {
    # we simply test that all interactions include the desired effect
    # otherwise we return the crappy null model
    if (min(attr(termz,"factors")[index,intz])==0) return(nullos)
} else return(nullos);
# if all is good we just call glm as usual
return(glm(formula=y, data=data)) 

That’s it! The filtering out will be handled by myglm, so we do not need to care when calling glmulti. We just run it without specifying exclusions:

glmulti(y~., data=mydata, fitfunc=myglm)

We can check that myglm filters models as intended:

myglm(y~( c2 + c3 + f1 + f2 + f3+ c1)*m,data=mydata)
# fitted
# fitted
# not fitted: the null model is returned instead.

In principle the workaround is suboptimal since model formulas are generated and then discarded, but in practice this is negligible. You can easily modify myglm to specify arbitrary rules for inclusions/exclusions, even very complex ones. Again, version 2 will handle this in a much cleaner way.


2014/06/27. I’m late on my schedule for releasing version 2.0 (yeah, I know…). Now that Master students have finished their theses, I promise to dig into this this summer. MEANWHILE:

Several of you have encountered troubles with model averaging and the latest versions of lme4 (for mixed models). The problem is that they have changed a few things in the package structure (the name of their class, the internal structure of objects…). All you have to do really it update the getfit function so that glmulti knows as to extract parameter values for the new objects.
If you are using the new version of lme4, then simply use this getfit function instead of the one I recommended before:

setMethod('getfit', 'merMod', function(object, ...) {
 if (length(dimnames(summ)[[1]])==1) {
 summ1=matrix(summ1, nr=1, dimnames=list(c("(Intercept)"),c("Estimate","Std. Error")))
 cbind(summ1, df=rep(10000,length(fixef(object))))

This will allow you to obtain model averaged estimates with the latest lme4.

NOTE: I have also answered the related question at stackoverflow.

2013/04/12. Version 2.0 will be released as soon as I finish up the packing, but I’m done with the core code. It will bring improvements that I had wanted to implement from the start but had proven difficult: (i) unlimited number of predictors, (ii) unlimited depth of interaction, (iii) better support for term exclusions/inclusions and use with ‘poly’. Thanks to all users for their help and support!

2013/04/10. Version 1.0.7 has just been released. This is is a small update that fixes some issues with qaic/qaicc. Use it if you have overdispersed data! One change is the way the estimated c value should be specified. It is now done through the option “glmulti-cvalue“, instead of defining the global variable glmultiqaiccvalue.  So just use setOption(“glmulti-value”=1.414) to specify a scale of 1.414.

6 thoughts on “~/package glmulti

  1. Hi, obviously I have been too busy these last months, sorry for not responding. Last but not least: yes, version 2.0 can do that easily. Actually, version 1.x should have included this as well since it is quite simple to do (and you found out), but I never took the time to implement this in a user-transparent and “builtin” way. Best

  2. In myglm, I’ve added a proper a handling of the ‘data’ argument, and added an ‘else’ statement that I had forgotten and let some undesired models slip through. Thanks to Jan Lammertyn for pointing at that!

  3. And one remark: you will note that using ‘min’ (in myglm) is in theory slower while the programmer would prefer using ‘prod’: making the product of all entries is a faster operation than finding the min value, and would do the job here. But I let you change this if you are looking for milliseconds..

    1. I tried it, but this option to set the c value does not work when I want to get the qaicc and use quassipoisson error distribution. please help

      1. When using quasi-families:
        (1) pass on the overdispersion parameter (c) to glmulti using the ‘options’ R command (provided in package utils):
        e.g. to set an overdispersion value of two: > options(“glmulti-cvalue” = 2)
        (2) run glmulti with the REGULAR FAMILY (not the quasi one) as the log likelihood can no longer be extracted from glm objects with a quasi family in latest R releases.
        e.g. for a quasipoission model use family=poisson.
        You just need to use a suitable information criterion (e.g. qaic or qaicc).

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s