*glmulti* is an R package for model selection and model averaging using AIC, BIC, or other criteria. Its core is an efficient enumerator that generates all candidate formulas, taking into account marginality and/or other constraints. It also features a genetic-algorithm approach to explore the candidate models when their total number is unmanageable. You can get the package from CRAN, at http://cran.r-project.org/web/packages/glmulti/index.html

**2014/06/30. On filtering out undesired interactions in glmulti 1.x**

Many of you have observed that when you want to include only a subset of all possible interactions in the candidate set of models (e.g. only interactions involving one particular variable), glmulti 1.x may run into trouble, especially if “marginality” is turned on. This is because I had originally designed the exclusion algorithm for cases where a few interactions are to be excluded, not where most are undesired.

Fortunately there is an easy workaround to specify any pattern of model exclusion, which involves:

- Telling glmulti to explore all models;
- Using a custom fit function: a wrapper to your fit function that filters out undesired models.

This may sound vague, so here is a specific example. I have a dataset *mydata* with a dependent variable *y* and 7 explanatory variables (*c1*, *c2*, *c3*, *f1*, *f2*, *f3* and *m*). The goal is to consider only interactions between factor *m* and the other variables. In other words, I want the full model to be:

y ~ (c1 + c2 + c3 + f1 + f2 + f3) * m

All right. I thus prepare a wrapper to *glm* that first checks whether the formula is “acceptable” or not. If it is, it calls *glm* as usual. If not, it does not call *glm*, but instead returns some crappy model. This way *glmulti* will be tricked into believing that the formula caused a very poor fit, and will not retain it (in a way it does cause a poor fit, but the lack of fit stems from our extrinsic choice not to include the model among the candidates). Simple, isn’t it?

As a crappy model, I usually take the null model that has only an intercept. I fit it once and for all and call it *nullos*:

```r
nullos <- glm(y ~ 1, data = mydata)
```

NOTE: *This of course assumes that the null model is NOT one of your best models… If it is, you probably have an issue with the way your problem is posed, but in any case you can still find an even crappier model if needed (e.g. no intercept at all, stopped convergence…).*
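If you do need something worse than the intercept-only null, a quick sketch along these lines works (the toy data frame here is my own placeholder, not the post’s *mydata*):

```r
# toy data just to make this self-contained (a placeholder, not the real 'mydata')
mydata <- data.frame(y = rnorm(20, mean = 5))

nullos   <- glm(y ~ 1, data = mydata)  # intercept-only null model
crappier <- glm(y ~ 0, data = mydata)  # no intercept at all: fits even worse here

length(coef(nullos))    # 1 coefficient (the intercept)
length(coef(crappier))  # 0 coefficients
```

Since the toy response is centered far from zero, the intercept-free model has a much larger deviance, so it safely ranks below any real candidate.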

Now, I write the wrapper function that will reject unacceptable model formulas. I call it *myglm*. Note that in this case (including interactions with one specific variable only), filtering out models is fairly straightforward, and most of the code below is comments:

```r
myglm = function(y, data) {
  # we get the terms for the formula
  if (missing(data)) data <- environment(y)
  termz = terms(y, data = data)
  # we get the order of all terms
  orderz = attr(termz, "order")
  # we get all pairwise interactions, if any. Otherwise the formula is okay
  intz = which(orderz == 2)
  # we locate the row corresponding to the desired variable (as it is not constant)
  # HERE I WANT VARIABLE M (change accordingly)
  index = which(dimnames(attr(termz, "factors"))[[1]] == "m")
  if (length(index) > 0) {
    # the desired effect must be present
    if (length(intz) > 0) {
      # we simply test that all interactions include the desired effect
      # otherwise we return the crappy null model
      if (min(attr(termz, "factors")[index, intz]) == 0) return(nullos)
    }
  } else return(nullos)
  # if all is good we just call glm as usual
  return(glm(formula = y, data = data))
}
```
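In case the `terms()` bookkeeping that *myglm* relies on is unfamiliar, here is a quick look at the two attributes it uses, on a small formula of my own:

```r
tt <- terms(y ~ (c1 + c2) * m)

# 'order': 1 for main effects, 2 for pairwise interactions
attr(tt, "order")
# → 1 1 1 2 2   (terms c1, c2, m, c1:m, c2:m)

# 'factors': rows are variables, columns are terms; a non-zero entry means
# the variable participates in that term, so the "m" row tells us which
# interaction columns involve m
attr(tt, "factors")["m", ]
```

If the "m" row has a zero in some interaction column, that interaction does not involve *m* — exactly the condition *myglm* tests with `min(...) == 0`.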

That’s it! The filtering is handled by *myglm*, so there is nothing special to do when calling *glmulti*. We just run it without specifying exclusions:

```r
glmulti(y ~ ., data = mydata, fitfunc = myglm)
```

We can check that *myglm* filters models as intended:

```r
myglm(y ~ (c2 + c3 + f1 + f2 + f3 + c1) * m, data = mydata)  # fitted
myglm(y ~ ., data = mydata)                                   # fitted
myglm(y ~ .*., data = mydata)  # not fitted: the null model is returned instead
```

In principle the workaround is suboptimal, since model formulas are generated and then discarded, but in practice the cost is negligible. You can easily modify *myglm* to specify arbitrary inclusion/exclusion rules, even very complex ones. Again, version 2 will handle this in a much cleaner way.
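To give an idea of such a generalization, here is a minimal sketch with names of my own invention (`make_filtered_fitfunc`, `acceptable` — none of this is part of glmulti): it wraps any fit function and returns a pre-fitted “crappy” object whenever a user-supplied predicate rejects the formula.

```r
# Hypothetical generic version of the trick above (not part of glmulti):
# 'acceptable' is any predicate taking a terms object and returning TRUE/FALSE
make_filtered_fitfunc <- function(fitfunc, acceptable, crappy) {
  function(formula, data, ...) {
    tt <- terms(formula, data = data)
    if (!acceptable(tt)) return(crappy)  # rejected: glmulti sees a terrible fit
    fitfunc(formula = formula, data = data, ...)
  }
}

# example rule: no interactions at all
no_interactions <- function(tt) all(attr(tt, "order") < 2)
```

You would then pass something like `make_filtered_fitfunc(glm, no_interactions, nullos)` as `fitfunc =` to glmulti, exactly as with *myglm* above.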

**2014/06/27.** I’m late on my schedule for releasing version 2.0 (yeah, I know…). Now that the Master’s students have finished their theses, I promise to dig into it this summer. *MEANWHILE*:

Several of you have encountered trouble with model averaging and the latest versions of **lme4** (for mixed models). The problem is that a few things have changed in that package (the name of its class, the internal structure of its objects…). All you really have to do is update the *getfit* function so that glmulti knows how to extract parameter values from the new objects.

If you are using the new version of **lme4**, then simply use this *getfit* function instead of the one I recommended before:

```r
setMethod('getfit', 'merMod', function(object, ...) {
  summ = summary(object)$coef
  summ1 = summ[, 1:2]  # keep the Estimate and Std. Error columns
  if (length(dimnames(summ)[[1]]) == 1) {
    # a single fixed effect: subsetting dropped the matrix structure, rebuild it
    summ1 = matrix(summ1, nrow = 1,
                   dimnames = list(c("(Intercept)"), c("Estimate", "Std. Error")))
  }
  # attach an arbitrary large df for each fixed effect
  cbind(summ1, df = rep(10000, length(fixef(object))))
})
```

This will allow you to obtain model-averaged estimates with the latest **lme4**.

NOTE: I have also answered the related question at stackoverflow.

**2013/04/12.** Version **2.0** will be released as soon as I finish the packaging, but I’m done with the core code. It will bring improvements I had wanted to implement from the start but that had proven difficult: (i) an unlimited number of predictors, (ii) unlimited depth of interactions, (iii) better support for term exclusions/inclusions and for use with ‘poly’. Thanks to all users for their help and support!

**2013/04/10**. Version 1.0.7 has just been released. This is a small update that fixes some issues with qaic/qaicc. Use it if you have overdispersed data! One change is the way the estimated *c* value should be specified. It is now done through the option “*glmulti-cvalue*”, instead of defining the global variable *glmultiqaiccvalue*. So just use options("glmulti-cvalue" = 1.414) to specify a scale of 1.414.

Hi Vincent,

I’m using your glmulti package and I am interested in “forcing in” a variable in all of my models. I have seen how to do this in the help doc as well as here: http://r.789695.n4.nabble.com/Package-quot-glmulti-quot-Include-a-variable-in-ALL-models-td4648849.html. Will version 2.0 be able to do this?

Hi, obviously I have been too busy these last months, sorry for not responding. Last but not least: yes, version 2.0 can do that easily. Actually, version 1.x should have included this as well, since it is quite simple to do (as you found out), but I never took the time to implement it in a user-transparent, built-in way. Best

In myglm, I’ve added proper handling of the ‘data’ argument, and added an ‘else’ statement that I had forgotten and that let some undesired models slip through. Thanks to Jan Lammertyn for pointing that out!

And one remark: you will note that using ‘min’ (in myglm) is in theory slower than what a programmer would prefer, namely ‘prod’: taking the product of all entries is a faster operation than finding the minimum value, and would do the job here. I leave it to you to make that change if you are chasing milliseconds…
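On the 0/1 entries of the ‘factors’ matrix, the two tests are indeed interchangeable, as a tiny example shows:

```r
cols <- c(1, 1, 0, 1)  # say, one interaction column missing the "m" row

min(cols) == 0   # TRUE: at least one interaction lacks m
prod(cols) == 0  # TRUE: the product is zero for exactly the same reason
```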

I tried it, but this option to set the c value does not work when I want to get the qaicc and use a quasipoisson error distribution. Please help!

When using quasi-families:

(1) pass the overdispersion parameter (c) to glmulti using the ‘options’ R command (provided in base R):

e.g. to set an overdispersion value of two: > options("glmulti-cvalue" = 2)

(2) run glmulti with the REGULAR FAMILY (not the quasi one), as the log-likelihood can no longer be extracted from glm objects with a quasi family in the latest R releases.

e.g. for a quasipoisson model, use family = poisson.

You just need to use a suitable information criterion (e.g. qaic or qaicc).
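Putting the two steps together, the whole quasipoisson workflow might look like the sketch below (the data frame and the c-value are placeholders of mine, and the call is guarded so it only runs when glmulti is installed):

```r
# toy overdispersed-count data, just to make the sketch self-contained
mydata <- data.frame(y  = rpois(50, 3),
                     x1 = rnorm(50),
                     x2 = rnorm(50))

# (1) pass the estimated overdispersion parameter (c-hat) via options()
options("glmulti-cvalue" = 2)

# (2) run glmulti with the REGULAR family and a quasi criterion
if (requireNamespace("glmulti", quietly = TRUE)) {
  res <- glmulti::glmulti(y ~ ., data = mydata,
                          family = poisson,   # not quasipoisson!
                          crit = "qaicc")     # quasi-AICc uses the c-value above
}
```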