Poisson Regression

Let’s start with the canonical count model, the Poisson. Recall that – as a GLM – the Poisson uses a log link between \(E[y_i|X_i]=\lambda_i\) (the conditional mean) and \(\mu_i=\sum_j \beta_j X_{ij}\) (the linear predictor), i.e. \(\log E[y_i|X_i] = \sum_j \beta_j X_{ij}\):

model{
    ## Likelihood
    for(i in 1:N){
      y[i] ~ dpois(lambda[i])
      log(lambda[i]) <- mu[i]
      mu[i] <- inprod(beta[],X[i,])
      }     
    ## Priors 
    beta ~ dmnorm(mu.beta,tau.beta)  # multivariate Normal prior
}

Note that we can’t compile this model directly from within the .Rmd file; we have to write it to a separate file (pois.bug) and save it in our working directory.
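One way to do this without leaving R is to write the model out as a string (a sketch; the cat() call and file name simply mirror the note above, while the compile step later points to wherever you saved the file):

pois.model <- "
model{
    ## Likelihood
    for(i in 1:N){
      y[i] ~ dpois(lambda[i])
      log(lambda[i]) <- mu[i]
      mu[i] <- inprod(beta[],X[i,])
    }
    ## Priors
    beta ~ dmnorm(mu.beta,tau.beta)  # multivariate Normal prior
}"
cat(pois.model, file="pois.bug")  # save the model in the working directory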

As usual, it is wise to begin with simulated data. All of the housekeeping code required for the .R and .bug files, as well as our basic framework, can be adapted from our previous examples, such as the Wallerstein-Stephens regression.

Let’s construct some toy data, with \(1,000\) observations, \(1\) predictor, and an intercept and slope equal to \(1\):

N <- 1000
beta0 <- 1  # intercept
beta1 <- 1  # slope
x <- rnorm(n=N)  # standard Normal predictor
mu <- beta0*1 + beta1*x  # linear predictor function
lambda <- exp(mu)  # CEF
y <- rpois(n=N, lambda=lambda)  # Poisson DV
dat <- data.frame(x,y)  

Now, we specify the parameters we will feed into JAGS (data and priors). As usual, we’ll use a vague multivariate Normal prior centered around \(0\) for the regression parameters.

forJags <- list(X=cbind(1,dat$x),  # predictors
                y=dat$y,  # DV
                N=N,  # sample size
                mu.beta=rep(0,2),  # priors centered on 0
                tau.beta=diag(.0001,2))  # diffuse priors (precision matrix, so variance = 10,000)

Let’s compile our model, and make sure it works:

library(rjags)
jagsmodel <- jags.model(file="~/Dropbox/209_S16/Code/pois.bug",  # compile model
                        data=forJags)
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 1000
##    Unobserved stochastic nodes: 1
##    Total graph size: 6010
## 
## Initializing model

Time to update our model, and generate posteriors for our two regression parameters:

out <- coda.samples(jagsmodel,
                    variable.names="beta",  # parameter vector (length 2)
                    n.iter=1e5)  # increase chain length from default

Now we can check that JAGS gets the mean of the posteriors of our 2 parameters (\(\beta_0, \beta_1\)) roughly right (close to \(1\)), and compare them to the ML estimates we get from glm:

summary(out)  
## 
## Iterations = 1001:101000
## Thinning interval = 1 
## Number of chains = 1 
## Sample size per chain = 1e+05 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##           Mean      SD  Naive SE Time-series SE
## beta[1] 1.0253 0.02040 6.450e-05      0.0002447
## beta[2] 0.9851 0.01414 4.472e-05      0.0001589
## 
## 2. Quantiles for each variable:
## 
##           2.5%    25%    50%    75% 97.5%
## beta[1] 0.9850 1.0118 1.0255 1.0391 1.065
## beta[2] 0.9575 0.9755 0.9851 0.9945 1.013
summary(glm(dat$y~dat$x, family=poisson))  
## 
## Call:
## glm(formula = dat$y ~ dat$x, family = poisson)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -3.2370  -0.9222  -0.1036   0.5611   2.8208  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  1.02523    0.02090   49.06   <2e-16 ***
## dat$x        0.98530    0.01436   68.61   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 5839.5  on 999  degrees of freedom
## Residual deviance: 1098.8  on 998  degrees of freedom
## AIC: 3784.2
## 
## Number of Fisher Scoring iterations: 5

The MCMC and MLE estimates are very close, and both basically on point.

Let’s see how our posteriors look:

plot(out)  

We see that convergence is not perfect, even after 100,000 iterations: the posterior of beta[2] is not exactly symmetric.

We should also assess the posteriors’ stationarity by looking at the Heidelberger-Welch convergence diagnostic:

heidel.diag(out)  
## [[1]]
##                                       
##         Stationarity start     p-value
##         test         iteration        
## beta[1] passed       1         0.681  
## beta[2] passed       1         0.758  
##                                  
##         Halfwidth Mean  Halfwidth
##         test                     
## beta[1] passed    1.025 0.000480 
## beta[2] passed    0.985 0.000311

Seems ok. Let’s also check that our chain’s length is satisfactory.

raftery.diag(out)  
## [[1]]
## 
## Quantile (q) = 0.025
## Accuracy (r) = +/- 0.005
## Probability (s) = 0.95 
##                                                
##          Burn-in  Total Lower bound  Dependence
##          (M)      (N)   (Nmin)       factor (I)
##  beta[1] 36       39669 3746         10.60     
##  beta[2] 26       27700 3746          7.39

But what about the effective sample size?

effectiveSize(out)  
##  beta[1]  beta[2] 
## 6950.793 7921.019

This corresponds to only \(6.95\%\) and \(7.92\%\) of our draws, respectively, signaling strong dependence within our chain.
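These percentages can be computed directly from the chain (purely illustrative):

round(100*effectiveSize(out)/niter(out), 2)  # effective draws as a share of total draws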

To conclude: even with a correctly specified model, uninformative priors, \(2\) parameters, and a dataset that is not excessively large (\(n=1,000\)), this is not exactly a “walk in the park” for JAGS. Various changes to the sampling process might improve the properties of our posteriors, such as using a longer burn-in and/or a longer chain, thinning more heavily, initiating the sampler at the ML estimates, etc.
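A sketch of how these tweaks might be combined (the glm() starting values, the file path, and the specific settings are illustrative choices, not part of the original workflow):

mle <- coef(glm(y ~ x, data=dat, family=poisson))      # ML estimates as starting values
jagsmodel <- jags.model(file="pois.bug",
                        data=forJags,
                        inits=list(beta=unname(mle)),  # initiate the sampler at the MLEs
                        n.adapt=5e3)                   # longer adaptation
update(jagsmodel, n.iter=1e4)                          # explicit burn-in
out <- coda.samples(jagsmodel,
                    variable.names="beta",
                    n.iter=2e5,                        # longer chain
                    thin=10)                           # heavier thinning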

Negative Binomial Regression

Moving on to the NB distribution, we need a bit more reparameterization to get it into a form appropriate for our regression. Following the notation in the JAGS manual, and in Jackman’s code in the book, we parameterize the NB density for observation \(i\) with \(p_i\) and \(r\). The latter is the (over)dispersion parameter (\(r \geq 0\)): smaller values imply more overdispersion, and as \(r \rightarrow \infty\) the NB collapses to the Poisson, whose variance-to-mean ratio equals \(1\) (no overdispersion). The former is referred to as the success parameter, and for observation \(i\) is defined as \(p_i=\frac{r}{r+\lambda_i}\), where \(\log \lambda_i=\sum_j \beta_j X_{ij}\), as in the Poisson.

To fully specify our JAGS model, we need a prior for \(r\), in addition to priors for our \(\beta\)’s (we use the same as above). We use a uniform prior that puts an upper bound of \(50\) for \(r\). As Jackman notes (p. 280), “this is not at all restrictive: recall that the negative binomial tends to the Poisson as \(r \rightarrow \infty\) […] so the negative binomial is practically indistinguishable from the Poisson once the overdispersion parameter gets anywhere close to being that large”.

model{
    ## Likelihood
    for(i in 1:N){
      y[i] ~ dnegbin(p[i],r)
      p[i] <- r/(r+lambda[i]) 
      log(lambda[i]) <- mu[i]
      mu[i] <- inprod(beta[],X[i,])
    } 
    ## Priors
    beta ~ dmnorm(mu.beta,tau.beta)
    r ~ dunif(0,50)
}

Again, let’s use toy data to verify that our model works. We use the same data and setup as in the Poisson, and set \(r=2\). Note that to simulate data from a negative binomial we use rnegbin, which becomes available once we load the pscl package (rnegbin actually lives in MASS, which pscl attaches). Also, note that rnegbin in R is parameterized by mu and theta, which correspond to \(\lambda\) and \(r\) in our setup, whereas dnegbin in JAGS is parameterized by p and r.

library(pscl)
N <- 1000
beta0 <- 1
beta1 <- 1
x <- rnorm(n=N)
mu <- beta0*1 + beta1*x
lambda <- exp(mu)
r <- 2
y <- rnegbin(n=N, mu=lambda, theta=r)
dat <- data.frame(x,y)
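As a quick sanity check on the correspondence between the two parameterizations (purely illustrative, using the objects just simulated), the JAGS-style success probability implies a mean of \(r(1-p_i)/p_i\), which recovers \(\lambda_i\):

p <- r/(r + lambda)             # success parameter, as in the JAGS model
all.equal(r*(1 - p)/p, lambda)  # implied NB mean equals lambda (mu in rnegbin)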

I feed the same information to JAGS as in the Poisson:

forJags <- list(X=cbind(1,dat$x),
                y=dat$y,
                N=N,
                mu.beta=rep(0,2),
                tau.beta=diag(.0001,2))

To achieve better convergence, let’s lengthen the initial adaptation:

jagsmodel <- jags.model(file="~/Dropbox/209_S16/Code/negbin.bug",
                        data=forJags,
                        n.adapt=5e3)
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 1000
##    Unobserved stochastic nodes: 2
##    Total graph size: 8013
## 
## Initializing model

For the same purpose, let’s keep every fifth draw (thin=5):

out <- coda.samples(jagsmodel,
                    variable.names=c("beta","r"),
                    n.iter=1e5,
                    thin=5)

How do our posterior estimates look compared to the ML ones?

summary(out)
## 
## Iterations = 5005:105000
## Thinning interval = 5 
## Number of chains = 1 
## Sample size per chain = 20000 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##           Mean      SD  Naive SE Time-series SE
## beta[1] 1.0547 0.03217 0.0002275      0.0003123
## beta[2] 0.9795 0.03420 0.0002419      0.0003407
## r       1.8071 0.13588 0.0009608      0.0009608
## 
## 2. Quantiles for each variable:
## 
##           2.5%    25%    50%   75% 97.5%
## beta[1] 0.9912 1.0330 1.0547 1.077 1.118
## beta[2] 0.9137 0.9562 0.9791 1.003 1.047
## r       1.5583 1.7123 1.8006 1.894 2.091
summary(glm.nb(y~x,data=dat))
## 
## Call:
## glm.nb(formula = y ~ x, data = dat, init.theta = 1.79694651, 
##     link = log)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.7450  -1.0722  -0.2950   0.4142   2.9886  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  1.05479    0.03205   32.91   <2e-16 ***
## x            0.97901    0.03364   29.11   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(1.7969) family taken to be 1)
## 
##     Null deviance: 2208.6  on 999  degrees of freedom
## Residual deviance: 1070.5  on 998  degrees of freedom
## AIC: 4379.9
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  1.797 
##           Std. Err.:  0.134 
## 
##  2 x log-likelihood:  -4373.913

Again, the ML estimates and the MCMC posterior means of our parameters (\(\beta_0, \beta_1, r\)) are quite close to each other, though a bit further from the true values this time.

How do the posteriors look?

plot(out) 

The traceplots do not show any trending, but the posteriors for the intercept and slope have hints of bimodality.

Let’s also check some diagnostics.

heidel.diag(out)  # assess stationarity
## [[1]]
##                                       
##         Stationarity start     p-value
##         test         iteration        
## beta[1] passed       1         0.905  
## beta[2] passed       1         0.597  
## r       passed       1         0.763  
##                                 
##         Halfwidth Mean Halfwidth
##         test                    
## beta[1] passed    1.05 0.000612 
## beta[2] passed    0.98 0.000668 
## r       passed    1.81 0.001883
raftery.diag(out)  # assess required run length
## [[1]]
## 
## Quantile (q) = 0.025
## Accuracy (r) = +/- 0.005
## Probability (s) = 0.95 
##                                                
##          Burn-in  Total Lower bound  Dependence
##          (M)      (N)   (Nmin)       factor (I)
##  beta[1] 25       29990 3746         8.01      
##  beta[2] 25       27880 3746         7.44      
##  r       10       18700 3746         4.99
effectiveSize(out)  # assess independence of draws
##  beta[1]  beta[2]        r 
## 10611.42 10078.00 20000.00

The first two tests do not flag anything worrying. As for the independence of our draws, the effective sample size does not match the size of our chain (which would indicate fully independent draws), but it exceeds \(50\%\) of the retained draws for the intercept and slope, and equals the full \(20,000\) retained draws for the dispersion parameter. (Recall that we drew \(100,000\) values, but only kept every fifth.)

Richer NB models can be hard to fit in JAGS. As Jackman says after fitting a slightly more complicated model, “the resulting Markov chain produces woefully slow mixing and several hundred thousand iterations are required to generate a thorough exploration of the joint posterior density” (p. 279).

Interestingly, to improve convergence we can use independent Normal priors for the intercept and slope. However, this comes at the cost of slower sampling (not shown).
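A sketch of what the modified model might look like (only the prior block changes relative to negbin.bug above; mu.beta and tau.beta would then be dropped from the data list):

model{
    ## Likelihood (unchanged)
    for(i in 1:N){
      y[i] ~ dnegbin(p[i],r)
      p[i] <- r/(r+lambda[i])
      log(lambda[i]) <- mu[i]
      mu[i] <- inprod(beta[],X[i,])
    }
    ## Priors: independent, diffuse Normals instead of a multivariate Normal
    beta[1] ~ dnorm(0, .0001)  # intercept
    beta[2] ~ dnorm(0, .0001)  # slope
    r ~ dunif(0,50)
}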

Zero-Inflated Models

Zero-inflated models are not easy to identify. When the two components – binary and count – are poorly separated, ML estimates can be unstable, particularly in the presence of outliers. This is the case with Wilson & Piazza’s data: around 2/3 of the non-zero observations lie below 10, while the remaining counts are substantially larger (see histogram). Several solutions have been proposed, most of which rely on complicated frequentist adjustments to the standard ML algorithm. Yet, given recent advances in computing power, MCMC methods present a viable alternative for obtaining stable estimates.

Another obstacle to identification is the inclusion of the same predictors in both components of the mixture model. For example, Wilson & Piazza – like most terrorism scholars – make this modeling choice because their theory does not differentiate between the forces that determine the likelihood of observing any attacks at all (binary component), and the forces that determine the likelihood of observing – say – \(c\) attacks (count component). Thus, there are no theoretical grounds for including different predictors in each component of the model. Despite its theoretical appeal, though, this choice renders identification laborious, since the estimates of the slope parameters in each component “fight” for the same variation. Moreover, it is unclear whether MCMC methods can aid identification on this front.

In any case, let’s try to fit a zero-inflated model. We begin with the 2-part (case-wise) density of the model:

\[ p(y_i) = \begin{cases} p_0(0) + (1 - p_0(0))\,p_1(0) \quad &\text{if } y_i = 0 \\ (1 - p_0(0))\,p_1(y_i) \quad &\text{if } y_i > 0 \end{cases} \]

Here, \(p_0\) denotes the binary component’s density (the probability of zero-inflation), and \(p_1\) the count component’s density. Equivalently, we can write the zero-inflated density compactly as a mixture:

\[ p(y_i) = p_0(0) \, I_{\{0\}}(y_i) + (1-p_0(0))\,p_1(y_i) \]
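In R, this mixture translates directly into a probability mass function (a sketch, assuming a Poisson count component; dzip is just an illustrative name):

dzip <- function(y, lambda, p0) {              # illustrative ZIP pmf
  p0*(y == 0) + (1 - p0)*dpois(y, lambda)      # point mass at zero mixed with a Poisson
}
dzip(0:3, lambda=exp(1), p0=0.4)               # example evaluation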

If we could specify our own likelihood function in JAGS, we could code the above, after choosing densities for each component. Since we cannot, though, we have to be creative.

Zero-Inflated Poisson Regression

Let’s try to fit a zero-inflated Poisson (ZIP) model in JAGS. I begin by describing the zero-inflation component, which motivates a “hack” in the count component. First, we need a Bernoulli variable, usually with a logit link, to model whether an observation is a structural zero or not. When a structural zero is indicated, the Bernoulli variable zeroes out the count component’s mean and, hence, the mean of the observation. However, JAGS cannot handle a non-zero observation drawn from a mean-zero Poisson, so when the Bernoulli indicates zero-inflation we instead set the Poisson’s mean to an infinitesimally small (but non-zero) number.

model{
    ## Likelihood
    for(i in 1:N){
      y[i] ~ dpois(lambda.hacked[i])
      lambda.hacked[i] <- lambda[i]*(1-zero[i]) + 1e-10*zero[i]
      lambda[i] <- exp(mu.count[i])
      mu.count[i] <- inprod(beta[],X[i,])
      
      ## Zero-Inflation
      zero[i] ~ dbern(pi[i])
      pi[i] <- ilogit(mu.binary[i])
      mu.binary[i] <- inprod(alpha[],X[i,])
    } 
    ## Priors
    beta ~ dmnorm(mu.beta,tau.beta)
    alpha ~ dmnorm(mu.alpha,tau.alpha)
}

Now let’s try our model on the same toy data as before, but after mixing in some zeros from a known distribution. We can adapt the count component from our Poisson example. However, for the binary component we need some more information. First, the variables that enter its CEF. To simplify things, let’s assume that the same two variables that enter the CEF for our count component also enter the CEF for our binary component – that is, the constant and our single predictor, drawn from a standard Normal. (This is a common modeling choice, and the one used by Wilson and Piazza.) Second, we need an intercept and slope. To make the effect of our variables more sensible, let’s flip their sign and set them equal to \(-.5\) (remember, the binary component predicts zeros). Finally, after we draw our \(y\)’s from a Poisson, we must account for zero-inflation. By multiplying the draws from the count component by (1 minus the zero-inflation indicator), we obtain a new outcome variable that equals zero whenever the Bernoulli draws a one (a structural zero) and/or the Poisson itself draws a zero. The rest of the code follows the notation of the JAGS model above.

N <- 1000
alpha0 <- -.5
alpha1 <- -.5
beta0 <- 1
beta1 <- 1
x <- rnorm(n=N)
mu.binary <- alpha0*1 + alpha1*x
pi <- exp(mu.binary)/(1+exp(mu.binary))
zero <- rbinom(n=N, size=1, prob=pi)
mu.count <- beta0*1 + beta1*x
lambda <- exp(mu.count)
y.count <- rpois(n=N, lambda=lambda)
y <- y.count*(1-zero) 
dat <- data.frame(x,y)  

Before proceeding, let’s see how much zero-inflation we have, and how our outcome is distributed now:

table(zero)
## zero
##   0   1 
## 634 366
head(table(y))
## y
##   0   1   2   3   4   5 
## 440  96 106  65  57  44

We see that around 1/3 (366) of our observations will be “structural” zeros. Note that we also get zeros from the Poisson component, which is why we end up with 440 zeros in total. This calls for a zero-inflated model!

We can now pass the same parameters as before to JAGS, while including priors for the binary component’s parameters (mu.alpha, tau.alpha):

forJags <- list(X=cbind(1,dat$x), 
                y=dat$y,  
                N=N,  
                mu.alpha=rep(0,2),  
                tau.alpha=diag(.0001,2),
                mu.beta=rep(0,2),
                tau.beta=diag(.0001,2))

The rest of the process is by now familiar:

jagsmodel <- jags.model(file="~/Dropbox/209_S16/Code/zipoisson.bug",
                        data=forJags)
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 1000
##    Unobserved stochastic nodes: 1002
##    Total graph size: 15019
## 
## Initializing model
out <- coda.samples(jagsmodel,
                    variable.names=c("alpha","beta"),
                    n.iter=1e5)

summary(out)
## 
## Iterations = 1001:101000
## Thinning interval = 1 
## Number of chains = 1 
## Sample size per chain = 1e+05 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##             Mean      SD  Naive SE Time-series SE
## alpha[1] -0.6504 0.08637 2.731e-04      0.0010978
## alpha[2] -0.4867 0.09963 3.150e-04      0.0014123
## beta[1]   0.9812 0.02862 9.051e-05      0.0004252
## beta[2]   1.0327 0.01706 5.394e-05      0.0002353
## 
## 2. Quantiles for each variable:
## 
##             2.5%     25%     50%     75%   97.5%
## alpha[1] -0.8228 -0.7084 -0.6498 -0.5919 -0.4831
## alpha[2] -0.6813 -0.5529 -0.4886 -0.4196 -0.2909
## beta[1]   0.9249  0.9618  0.9813  1.0004  1.0376
## beta[2]   0.9991  1.0212  1.0328  1.0441  1.0657
summary(zeroinfl(y~x, data=dat, dist="poisson"))
## 
## Call:
## zeroinfl(formula = y ~ x, data = dat, dist = "poisson")
## 
## Pearson residuals:
##     Min      1Q  Median      3Q     Max 
## -2.1820 -0.7043 -0.2547  0.6463  6.2692 
## 
## Count model coefficients (poisson with log link):
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  0.98214    0.02855   34.40   <2e-16 ***
## x            1.03243    0.01707   60.48   <2e-16 ***
## 
## Zero-inflation model coefficients (binomial with logit link):
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.64467    0.08575  -7.518 5.55e-14 ***
## x           -0.48867    0.09926  -4.923 8.52e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Number of iterations in BFGS optimization: 10 
## Log-likelihood: -1693 on 4 Df

Once again, the MCMC and MLE estimates are similar, and close to the true values. Note that both approaches do worse – more bias and larger standard errors – when estimating the binary component’s parameters.

plot(out)

effectiveSize(out)
## alpha[1] alpha[2]  beta[1]  beta[2] 
## 6188.891 4976.288 4530.591 5254.628

Clearly, the posteriors reveal that the sampler has not done a great job, and there is strong dependence in our draws.

Zero-Inflated Negative Binomial Regression

We only need to change two lines of code to adapt the ZIP model to the ZINB. First, the obvious one: the density (we can copy this from the NB model). Second, the conditional mean of the distribution. This takes the same kind of “hack” as before, but now through the success parameter: when the Bernoulli indicates a structural zero, \(p_i\) is pushed to (just below) \(1\), which collapses the negative binomial’s mean to essentially zero.

model{
    ## Likelihood
    for(i in 1:N){
      y[i] ~ dnegbin(p[i],r)
      p[i] <- r/(r+(1-zero[i])*lambda.count[i]) - 1e-10*zero[i]
      lambda.count[i] <- exp(mu.count[i])
      mu.count[i] <- inprod(beta[],X[i,])
      
      ## Zero-Inflation
      zero[i] ~ dbern(pi[i])
      pi[i] <- ilogit(mu.binary[i])  
      mu.binary[i] <- inprod(alpha[],X[i,])
    } 
  
    ## Priors
    alpha ~ dmnorm(mu.alpha,tau.alpha)
    beta ~ dmnorm(mu.beta,tau.beta)
    r ~ dunif(0,50)
}

Let’s use the same method as before to simulate data. We only have to change the distribution for the count component to a negative binomial. We can set \(r=2\), as in our initial NB example:

N <- 1000
alpha0 <- -.5
alpha1 <- -.5
beta0 <- 1
beta1 <- 1
r <- 2
x <- rnorm(n=N)
mu.binary <- alpha0*1 + alpha1*x
pi <- exp(mu.binary)/(1+exp(mu.binary))
zero <- rbinom(n=N, size=1, prob=pi)
mu.count <- beta0*1 + beta1*x
lambda <- exp(mu.count)
y.count <- rnegbin(n=N, mu=lambda, theta=r)
y <- y.count*(1-zero) 
dat <- data.frame(x,y)  

We can go ahead and compile our model, then update it:

jagsmodel <- jags.model(file="~/Dropbox/209_S16/Code/zinegbin.bug",
                        data=forJags)
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 1000
##    Unobserved stochastic nodes: 1003
##    Total graph size: 17022
## 
## Initializing model
out <- coda.samples(jagsmodel,
                    variable.names=c("alpha","beta","r"),
                    n.iter=1e4)

Let’s compare it to the MLE estimate:

summary(out)
## 
## Iterations = 1001:11000
## Thinning interval = 1 
## Number of chains = 1 
## Sample size per chain = 10000 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##             Mean      SD  Naive SE Time-series SE
## alpha[1] -0.6724 0.08887 0.0008887      0.0033331
## alpha[2] -0.4725 0.09934 0.0009934      0.0043981
## beta[1]   0.9703 0.03157 0.0003157      0.0013478
## beta[2]   1.0420 0.02214 0.0002214      0.0008931
## r        43.0826 5.47581 0.0547581      0.0972040
## 
## 2. Quantiles for each variable:
## 
##             2.5%     25%     50%     75%   97.5%
## alpha[1] -0.8481 -0.7305 -0.6719 -0.6129 -0.5001
## alpha[2] -0.6642 -0.5407 -0.4699 -0.4039 -0.2849
## beta[1]   0.9094  0.9492  0.9699  0.9906  1.0335
## beta[2]   0.9990  1.0278  1.0420  1.0568  1.0851
## r        29.4594 39.8938 44.3586 47.4797 49.7672
summary(zeroinfl(y~x, data=dat, dist="negbin"))
## 
## Call:
## zeroinfl(formula = y ~ x, data = dat, dist = "negbin")
## 
## Pearson residuals:
##     Min      1Q  Median      3Q     Max 
## -1.0952 -0.6386 -0.4079  0.3979  8.0060 
## 
## Count model coefficients (negbin with log link):
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  0.95799    0.05744  16.678  < 2e-16 ***
## x            0.99142    0.05374  18.449  < 2e-16 ***
## Log(theta)   0.58710    0.13469   4.359 1.31e-05 ***
## 
## Zero-inflation model coefficients (binomial with logit link):
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -0.6120     0.1225  -4.995 5.88e-07 ***
## x            -0.5716     0.1220  -4.685 2.80e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Theta = 1.7988 
## Number of iterations in BFGS optimization: 12 
## Log-likelihood: -1786 on 5 Df

We see that our model did a terrible job of describing the posterior of \(r\), the overdispersion parameter, which piles up near the prior’s upper bound of \(50\). The MLE estimate (\(\hat{r}=\exp(\widehat{\log\theta}) \approx 1.80\)) is not spot on, but it got much closer to the true value of \(2\) than JAGS did. Let’s check the plots:

plot(out)

Indeed, our sampler did not do too well with \(r\). Why? One likely culprit is that forJags was built from the ZIP example’s data and never rebuilt with the newly simulated ZINB outcome, so JAGS may have been fit to data with essentially no overdispersion; in that case, a posterior for \(r\) piling up against the prior’s upper bound of \(50\) is exactly what we would expect (recall that the NB tends to the Poisson as \(r \rightarrow \infty\)).
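If that is the issue, rebuilding the data list with the new outcome before compiling should fix it (a sketch, mirroring the list constructed for the ZIP model):

forJags <- list(X=cbind(1,dat$x),
                y=dat$y,  # the newly simulated ZINB outcome
                N=N,
                mu.alpha=rep(0,2),
                tau.alpha=diag(.0001,2),
                mu.beta=rep(0,2),
                tau.beta=diag(.0001,2))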

Resources

For a review of all of the above count models in R, see the vignette for the pscl package (you can open it by calling vignette("countreg", package="pscl")). For more on Poisson models in a Bayesian context, see Jackman pp. 73–80 and 208–212. For more on NB models in a Bayesian context, see Jackman pp. 78–80, 222–225, and 278–280.