NOTE : Below post is valid for Package version 1.4.0 and Before.
Estimating the shape parameters by Maximizing the Log Likelihood value of Beta-Binomial Distribution.
Introduction
When we need to estimate parameters from a discrete distribution or continuous distribution or a function we can use the below mentioned R functions. We will be using the technique of maximizing the Log Likelihood function or minimizing the Negative Log Likelihood function. Based on this technique we will compare these R functions because it might benefit people who are struggling which one to choose. We have 7 functions in total by my knowledge when I was writing this blog post.
Using the fitODBOD package, I will use the Alcohol Consumption data to try and model it for the Beta-Binomial Distribution, which has two shape parameters to estimate. The process is that we find values for shape parameters a and b (or \(\alpha\) and \(\beta\)) which will maximize the Log Likelihood function of Beta Binomial Distribution or in our case minimize the Negative Log Likelihood function of Beta Binomial Distribution.
Above mentioned Beta-Binomial distribution’s Probability Mass Function(PMF) is denoted as
where \(n=0,1,2,...\), \(x=0,1,2,...,n\) and \(\alpha,\beta > 0\). Further \(x\) is the Binomial Random variable, a,b(or \(\alpha\), \(\beta\)) are shape parameters and \(n\) is the binomial trial value. Also B(\(\alpha\),\(\beta\)) is the Beta function. In this distribution we have to estimate the values for a and b.
Further, using the PMF we can construct the Likelihood function for \(\Omega_{BB}=(\alpha,\beta)^T\) as given below:
In the package fitODBOD we have the function EstMLEBetaBin which is constructed based on the above Negative Log Likelihood function, and we will use it.
We take Log to transform the Likelihood function values, which will simplify the computation process and save time. The optimization functions we need to compare will use specific mathematical methods to find the most appropriate shape parameter values.
Further, I will focus on the attributes of the above functions. Focusing from which package, Number of Inputs, Number of Outputs, Time to complete optimization and Analytical Methods used with the assistance in two tables. Alcohol Consumption data has two sets of frequency values but only values from week 1 will be used. Below is the the Alcohol Consumption data, where number of observations is 399 and the Binomial Random variable is a vector of values from zero to seven.
optim is the first function in concern. Documentation of the optim function is useful and it indicates that this function is only used on one input situations only. This means our EstMLEBetaBin function has to be modified. Reason for this is only the parameters that should be estimated need to be input values but our EstMLEBetaBin function has four parameters which are a,b,x(Binomial Random Variable) and freq(corresponding frequency values).
While using optim function first index refers to shape parameter a and second index refers to shape parameter b. Further, we have to input the observations or in our case the Binomial random variable values and their respective frequencies. I think it is inconvenient to modify the EstMLEBetaBin function, because if we want to estimate parameters for different data-sets it would become tedious. After modification we have a new function foroptim which can be used for demonstration and comparison.
Below is the code to estimation and going through the outputs. It should be noted that we have to provide initial parameter values as an input to the optim function, and it is best to provide values in the domain of shape parameter values which we want to estimate.
Here the shape parameters a and b are in the region of greater than zero but less than positive infinity (\(+\infty >a,b>0\)). So for the initial parameters of a=0.1 and b=0.2 we are finding parameters which would minimize the Negative Log Likelihood function of Beta-Binomial distribution with the Alcohol Consumption data.
# new function to facilitate optim criteria# only one input but has two elementsforoptim<-function(a){EstMLEBetaBin(x=Alcohol_data$Days, freq=Alcohol_data$week1,a=a[1],b=a[2])}# optimizing values for a,b using default mathematical methodoptim_answer<-optim(par=c(0.1,0.2),fn=foroptim)# obtaining class of outputclass(optim_answer)#length of outputlength(optim_answer)# the outputsoptim_answer$par# estimated values for a, boptim_answer$value# minimized function value optim_answer$counts# see the documentation to understandoptim_answer$convergence# indicates successful completionoptim_answer$message# additional information# fitting the Beta-Binomial distribution with estimated shape parameter valuesfitBetaBin(Alcohol_data$Days,Alcohol_data$week1,optim_answer$par[1],optim_answer$par[2])
So the foroptim function can be used as above and parameters are estimated for \(\alpha\) and \(\beta\) (or a, b) for the Alcohol Consumption data week 1. Further we have scrutinized the function as below.
package : stats
No of Inputs: 7
Minimum required Inputs : 2
Class of output : list
No of outputs: 5
No of Analytical Methods : 6
Default Method : Nelder-Mead
nlm Function
nlm function is also similar to optim but only one analytical method will be used, which is a Newton-type Algorithm. Here also there needs to be changes made to our EstMLEBetaBin function as previously. After making those changes we have called the new Negative Log-likelihood function of Beta-Binomial distribution as fornlm. Then we can use the nlm function and estimate a and b for the initial shape parameter values of 0.1 and 0.2 respectively. Documentation of nlm function is very useful so that we can understand how it works.
Below is the code for using nlm function appropriately and fiddling with the results.
#new function to facilitate nlm criteriafornlm<-function(a){EstMLEBetaBin(x=Alcohol_data$Days, freq=Alcohol_data$week1,a=a[1],a[2])}#optimizing values for a,b using the only analytical methodnlm_answer<-nlm(f=fornlm,p=c(0.1,0.2))#obtaining class of outputclass(nlm_answer)#length of outputlength(nlm_answer)# the outputsnlm_answer$estimate# estimated values for a, bnlm_answer$minimum# minimized function value nlm_answer$gradient# gradient at the estimated minimum of given funcitonnlm_answer$code# indicates successful completionnlm_answer$iterations# number of iterations performed# fitting the Beta-Binomial distribution with estimated shape parameter valuesfitBetaBin(Alcohol_data$Days,Alcohol_data$week1,nlm_answer$estimate[1],nlm_answer$estimate[2])
Similarly for the fornlm function we have estimated values for a and b which would fit the Alcohol consumption data of week 1. Below is a point form summary of nlm function.
package : stats
No of Inputs: 12
Minimum required Inputs : 2
Class of output : list
No of outputs: 5
No of Analytical Methods : 1
Default Method : Newton-type Algorithm
nlminb Function
nlminb is also similar and requires the EstMLEBetaBin function to be restructured as similar to previous situations. After this task we now have a function called fornlminb. nlminb function is based on analytical method of unconstrained and box-constrained optimization using PORT routines.
After choosing initial parameter values for a and b, which are respectively 0.1 and 0.2 the estimation was done following the process below
# new function to facilitate nlminb criteriafornlminb<-function(a){EstMLEBetaBin(x=Alcohol_data$Days, freq=Alcohol_data$week1,a=a[1],a[2])}# optimizing values for a,b using default analytical methodnlminb_answer<-nlminb(start=c(0.1,0.2), objective=fornlminb)# obtaining class of outputclass(nlminb_answer)# length of outputlength(nlminb_answer)# the outputsnlminb_answer$par# estimated values for a, bnlminb_answer$objective# minimized function value nlminb_answer$evaluations# see the documentation to understandnlminb_answer$convergence# indicates successful completionnlminb_answer$message# additional informationnlminb_answer$iterations# number of iterations performed# fitting the Beta-Binomial distribution with estimated shape parameter valuesfitBetaBin(Alcohol_data$Days,Alcohol_data$week1,nlminb_answer$par[1],nlminb_answer$par[2])
Documentation includes the information related to the function therefore referring it will be useful. After estimating values for a and b using nlminb function these were noticed regarding the function in concern
package : stats
No of Inputs: 8
Minimum required Inputs : 2
Class of output : list
No of outputs: 6
No of Analytical Methods : 1
Default Method : Unconstrained and box-constrained optimization using PORT routines
ucminf Function
Package ucminf produces the ucminf function, as previously mentioned functions here also we have to change the EstMLEBetaBin function. After making the changes we will be using the forucminf function to the estimation process of shape parameters a and b.
When initial parameter values are set to a=0.1 and b=0.2 we will obtain results from ucminf function, which will minimize the Negative Log-likelihood value of Beta-Binomial distribution. Below is the code for estimation and using the results to understand the function ucminf.
library(ucminf)# new function to facilitate ucminf criteriaforucminf<-function(a){EstMLEBetaBin(x=Alcohol_data$Days,freq =Alcohol_data$week1,a=a[1],a[2])}# optimizing values for a,b using default analytical methoducminf_answer<-ucminf(par=c(0.1,0.2),fn=forucminf)#obtaining classclass(ucminf_answer)# length of outputlength(ucminf_answer)# the outputsucminf_answer$par# estimated values for a, bucminf_answer$value# minimized function value ucminf_answer$invhessian.lt# see the documentation understanducminf_answer$convergence# indicates successful completionucminf_answer$message# additional informationucminf_answer$info# fitting the Beta-Binomial distribution with estimated shape parameter valuesfitBetaBin(Alcohol_data$Days,Alcohol_data$week1,ucminf_answer$par[1],ucminf_answer$par[2])
With the help of R Documentation the estimation was done for shape parameter values a and b using the ucminf function. Below is the initial understanding of the ucminf function
package : ucminf
No of Inputs: 5
Minimum required Inputs : 2
Class of output : list
No of outputs: 6
No of Analytical Methods : 1
Default Method : Quasi-Newton Algorithm type with BFGS updating
maxLik Function
maxLik function is from the maxLik package, which only maximizes the Log Likelihood function. Therefore we have to restructure EstMLEBetaBin as previously mentioned, but as an addition a negative sign is added for the output. This new function will be called as formaxLik.
For the initial parameter values where a=0.1 and b=0.2 the maxLik function will be used and results will be evaluated as below.
library(maxLik)# new function to facilitate maxLik criteriaformaxLik<-function(a){-EstMLEBetaBin(x=Alcohol_data$Days,freq =Alcohol_data$week1,a=a[1],a[2])}# optimizing values for a,b using default analytical methodmaxLik_answer<-maxLik(logLik =formaxLik, start =c(0.1,0.2))# obtaining class of outputclass(maxLik_answer)# length of outputlength(maxLik_answer)# the outputsmaxLik_answer$estimate# estimated values for a, bmaxLik_answer$maximum# minimized function value maxLik_answer$iterations# no of iterations to succeedmaxLik_answer$gradient# last gradient value which was calculatedmaxLik_answer$message# additional informationmaxLik_answer$hessian# hessian matrixmaxLik_answer$code# indicates successful completionmaxLik_answer$fixed# logical vector indicating which parameters are constantsmaxLik_answer$type# type of maximizationmaxLik_answer$last.step# list describing the last unsuccessful stepmaxLik_answer$control# see the documentation understand# fitting the Beta-Binomial distribution with estimated shape parameter valuesfitBetaBin(Alcohol_data$Days,Alcohol_data$week1,maxLik_answer$estimate[1],maxLik_answer$estimate[2])
Using the Documentation of maxLik and the Documentation of maxNR function the above analysis was done for the maxLik function. According to the above code, below are the findings from the maxLik function in point form
package : maxLik
No of Inputs: 6
Minimum required Inputs : 2
Class of output : list or class of maxim or class of maxLik
No of outputs: 11
No of Analytical Methods : 7
Default Method : Automatically chosen
mle Function
stats4 package is installed so that mle function can be operated for the purpose of estimating a and b parameters. Here also as previously we need to make some changes as below and create a new Negative Log Likelihood function called formle.
For the initial parameter values of a=0.1 and b=0.2 the Negative Log Likelihood value of Beta-Binomial distribution has been minimized using mle function. Below is the code for estimation and investigation from the outputs of mle after estimation.
library(stats4)# new function to facilitate mle criteriaformle<-function(a,b){EstMLEBetaBin(x=Alcohol_data$Days,freq =Alcohol_data$week1,a,b)}# optimizing values for a,b using default analytical methodmle_answer<-mle(minuslogl=formle,start =list(a=0.1,b=0.2))# obtaining classclass(mle_answer)# length of outputlength(mle_answer)# the outputsmle_answer@call# inputs i have used mle_answer@coef# estimated values for a,bmle_answer@fullcoef# all values, even the fixed values we did not want to estimatemle_answer@vcov# variance covariance matrix for a,bmle_answer@min# minimized function valuemle_answer@details# details after estimation processmle_answer@nobs# number of observations to be used for computing only if given mle_answer@method# optimization methods used# Methods usedconfint(mle_answer)# confidence intervals for estimated valueslogLik(mle_answer)# Negative loglikelihood value for estimated values profile(mle_answer)# Likelihood profile generation.nobs(mle_answer)# number of observations to be used for computing only if givenshow(mle_answer)# display object brieflysummary(mle_answer)# generate a summary#update() # updating if we have new data and need to estimate new valuesvcov(mle_answer)# variance covariance matrix# fitting the Beta-Binomial distribution with estimated shape parameter valuesfitBetaBin(Alcohol_data$Days,Alcohol_data$week1,mle_answer@coef[1],mle_answer@coef[2])
It should be noted that in all 5 previous functions we had only outputs in the form of lists, but through mle we are seeing a class of mle output.The Documention explains the inputs and outputs of mle. Documentation for mle-class explains further about the methods that can be used. Below is a list of findings based on the outputs of the mle function.
package : stats4
No of Inputs: 5
Minimum required Inputs : 2
Class of output : class of mle
No of outputs: 9
Methods for output: 8
No of Analytical Methods :6
Default Method : Nelder and Mead
mle2 Function
mle2 function is advanced than mle function and it is from the package bbmle. Here, there is no need to modify the EstMLEBetabin function from the fitODBOD package to estimation as all previous situations .
For the initial parameter values of a=0.1 and b=0.2 Negative Log Likelihood function of Beta-Binomial distribution will be minimized where outputs will be investigated and methods related to output of class mle2 will be used.
Below is the code for using mle2 function and scrutinizing the output and methods related to it.
library(bbmle)# optimizing values for a,b using default analytical methodmle2_answer<-mle2(minuslogl=EstMLEBetaBin,start =list(a=0.1,b=0.2), data =list(x=Alcohol_data$Days,freq=Alcohol_data$week1))# obtaining classclass(mle2_answer)# length of outputlength(mle2_answer)# the outputsmle2_answer@call# inputs generally considered mle2_answer@call.orig# inputs i have givenmle2_answer@coef# estimated values for a,bmle2_answer@fullcoef# all values, even the fixed values we did not want to estimatemle2_answer@vcov# variance covariance matrix for a,bmle2_answer@min# minimized function valuemle2_answer@details# details after estimation processmle2_answer@method# optimization methods usedmle2_answer@data# data used for estimation mle2_answer@formula# if a formula was specified in the input mle2_answer@optimizer# function used for optimizing# Methods usedcoef(mle2_answer)# extrat the estimated valuesconfint(mle2_answer)# confidence intervals for estimated valuesshow(mle2_answer)# display object brieflysummary(mle2_answer)# generate a summary#update() #updating if we have new data and need to estimate new valuesvcov(mle2_answer)# variance covariance matrix#formula(mle2_answer) # if a formula was specified in the input #plot(mle2_answer) # plot the profilelogLik(mle2_answer)# Negative loglikelihood value for estimated values profile(mle2_answer)# profile of estimated values# fitting the Beta-Binomial distribution with estimated shape parameter valuesfitBetaBin(Alcohol_data$Days,Alcohol_data$week1,mle2_answer@coef[1],mle2_answer@coef[2])
Documention of mle2 function and Documentation of mle2-class
provide a few findings as mentioned below in point form.
package : bbmle
No of Inputs: 22
Minimum required Inputs : 3
Class of output : class of mle2
No of outputs: 12
Methods for output: 9
No of Analytical Methods : 6
Default Method : Nelder and Mead
Summary of the seven optimization functions in R
After completing a brief fact finding of the R functions optim, nlm, nlminb, ucminf, maxLik, mle and mle2. We can record the facts and results in two tables. Using them we can choose the best suitable function for our needs of estimation from optimization.
Mostly I prefer mle2 function than others because it provides special methods to handle the mle2 outputs, and the output themselves are very thorough than other R function outputs.
It can be seen that estimated a and b shape parameter values are different only from the third decimal point in all outputs. But this does not effect the minimized Negative Log Likelihood value of Beta-Binomial distribution for our Alcohol Consumption data, which is same for all outputs.
Function
optim
nlm
nlminb
ucminf
maxLik
mle
mle2
Package
stats
stats
stats
ucminf
maxLik
stats4
bbmle
No of Inputs
7
12
8
5
6
5
22
Minimum No of Inputs
2
2
2
2
2
2
3
Analytical Methods
6
1
1
1
7
6
6
Output Type
List
List
List
List
List class maxLik class maxim
class mle
class mle2
No of Outputs
5
5
6
6
11
9
12
No of Methods
None
None
None
None
None
8
9
Initial value(a,b)
a=0.1 b=0.2
a=0.1 b=0.2
a=0.1 b=0.2
a=0.1 b=0.2
a=0.1 b=0.2
a=0.1 b=0.2
a=0.1 b=0.2
Estimated value(a,b)
a=0.7230707 b=0.5809894
a=0.7229384 b=0.5808448
a=0.7229404 b=0.5808469
a=0.7229390 b=0.5808458
a=0.7229428 b=0.5808488
a=0.7228930 b=0.5807279
a=0.7228930 b=0.5807279
Negative Log Likelihood value
813.4571
813.4571
813.4571
813.4571
813.4571
813.4571
813.4571
Summary of Time evaluation for the seven optimization R functions
Further at the beginning I have mentioned to evaluate the system process time for the seven functions. In order to do this time comparison it is possible to use the benchmark function of rbenchmark package, and below mentioned code chunk provides the output in a table form. It includes the functions and their respective time values. The estimation process of the parameters where each function has been replicated 1000 times to receive a more accurate table of time values.
The table is in ascending order according to the elapsed time values column. According to this we can see that least time takes to the nlminb function and most time is taken to the mle2 function. These times completely depends on the Negative Log Likelihood function you need to minimize, the data you provided, the number of estimators that needs to be estimated, the complexity of the function and finally your computer’s processing power.
Therefore based on the needs of your output and the function which needs estimation choose the best function out of the above seven. Because as I said related to the earlier table the estimated values are slightly different but does not make any strong influence on the results.
Summary of Results after estimating parameters using the optimization R functions
After using the functions optim, nlm, nlminb, ucminf, maxLik, mle and mle2 to estimate the shape parameters a, b we can use the estimated parameters in the function fitBetaBin. Using this function we can find expected frequencies for each of the estimated parameters a,b and compare p-values and over-dispersion and understand if using different optimization functions had any effect on them.
According to the below table there is no significant changes between the expected frequencies, p-values or the over dispersion values. This is a clear indication that it does not matter what function we use for the optimization. Because the process will occur effectively only efficiency(time) will be affected.
BinomialRandomVariable
Frequency
optim
nlm
nlminb
ucminf
maxLik
mle
mle2
0
47
54.61
54.62
54.62
54.62
54.62
54.62
54.62
1
54
42
42
42
42
42
42
42
2
43
38.91
38.9
38.9
38.9
38.9
38.9
38.9
3
40
38.54
38.54
38.54
38.54
38.54
38.54
38.54
4
40
40.07
40.07
40.07
40.07
40.07
40.07
40.07
5
41
44
44
44
44
44
43.99
43.99
6
39
53.09
53.09
53.09
53.09
53.09
53.09
53.09
7
95
87.77
87.78
87.78
87.78
87.78
87.8
87.8
Total No of Observations
399
398.99
399
399
399.01
399
399.01
399.01
p-value
0.0902
0.0901
0.0901
0.0901
0.0901
0.0903
0.0903
Over Dispersion
0.4340165
0.4340686
0.4340679
0.4340683
0.434067
0.4340992
0.4340992
Final Conclusion
We had 7 functions to compare but choosing one over the other is completely harmless to the final result of estimation as seen by our tables. The only issue is time, therefore I would recommend choose the best function based on your needs of output and research objective.