NOTE : Below post is valid for Package version 1.4.0 and Before.
Estimating the shape parameters of Beta-Binomial Distribution
Introduction
Beginning of this month I wrote a small section regarding maxLik function by comparing it to other optimization functions. Here, we will further study the analytical methods which can be used in this function and compare them to find suitability. maxLik function is from the package maxLik. Further, Documentation clearly indicates all things related to the function in detail.
Focusing on the seven analytical methods is my intention from this blog post. So, we have the Beta-Binomial distribution and Binomial Outcome data, and need to estimate proper shape parameters which would Maximize the Log Likelihood value of the Beta-Binomial distribution for the Alcohol Consumption data. In this case Alcohol Consumption data from the fitODBOD package will be used.
Further we will focus on the process time to optimization, estimated shape parameters, maximized Log Likelihood value, expected frequencies, p-value and Over-dispersion with tables.
Below are the seven analytical methods in concern
NR
BFGS
BFGSR
BHHH
SANN
CG
NM
Alcohol Consumption data has two sets of frequency values but only values from week 1 will be used. Below is the the Alcohol Consumption data, where number of observations is 399 and the Binomial Random variable is a vector of values from zero to seven.
Small section about the maxLik function will be very useful to understand this blog post.
Reference : Henningsen, A. and Toomet, O. (2011): maxLik: A package for maximum likelihood estimation in R Computational Statistics 26, 443–458 Marquardt, D.W., (1963)
An Algorithm for Least-Squares Estimation of Nonlinear Parameters, Journal of the Society for Industrial & Applied Mathematics 11, 2, 431–441
So for the initial parameters of a=0.1 and b=0.2 we will be finding estimated parameters from different analytical methods which would maximize the Log Likelihood value of the Beta-Binomial distribution.
First we are transforming the given EstMLEBetaBin function to satisfy the maxLik function conditions.
# new function to facilitate maxLik criteria# only one input but has two elementsformaxLik<-function(a){-EstMLEBetaBin(x=Alcohol_data$Days, freq=Alcohol_data$week1,a=a[1],b=a[2])}
So the formaxLik function can be used as above and parameters are estimated for \(\alpha\) and \(\beta\) (or a, b) for the Alcohol Consumption data week 1. Further the maxLik function can be scrutinized as below.
package : maxLik
No of Inputs: 6
Minimum required Inputs : 2
Class of output : list or class of maxim or class of maxLik
No of outputs: 11
No of Analytical Methods : 7
Default Method : Automatically chosen
NR method
NR is an abbreviation for Unconstrained and equality-constrained maximization based on the quadratic approximation (Newton) method. The idea of the Newton method is to approximate the function at a given location by a multidimensional quadratic function, and use the estimated maximum as the start value for the next iteration.
library(maxLik)# optimizing values for a,b using NR analytical methodNR_answer<-maxLik(logLik =formaxLik, start =c(0.1,0.2),method ="NR")# obtaining class of outputclass(NR_answer)# length of outputlength(NR_answer)# the outputsNR_answer$estimate# estimated values for a, bNR_answer$maximum# minimized function value NR_answer$iterations# no of iterations to succeedNR_answer$gradient# last gradient value which was calculatedNR_answer$message# additional informationNR_answer$hessian# hessian matrixNR_answer$code# indicates successful completionNR_answer$fixed# logical vector indicating which parameters are constantsNR_answer$type# type of maximizationNR_answer$last.step# list describing the last unsuccessful stepNR_answer$control# see the documentation understand# fitting the Beta-Binomial distribution with estimated shape parameter valuesfitBetaBin(Alcohol_data$Days,Alcohol_data$week1,NR_answer$estimate[1],NR_answer$estimate[2])
BFGS method
BFGS is a Quasi-Newton method (also known as a variable metric algorithm), specifically that published simultaneously in 1970 by Broyden, Fletcher, Goldfarb and Shanno. This uses function values and gradients to build up a picture of the surface to be optimized.
Reference : Broyden, C.G., 1967. Quasi-Newton methods and their application to function minimization. Mathematics of Computation, 21(99), pp.368-381.
# optimizing values for a,b using BFGS analytical methodBFGS_answer<-maxLik(logLik =formaxLik, start =c(0.1,0.2),method ="BFGS")# obtaining class of outputclass(BFGS_answer)# length of outputlength(BFGS_answer)# the outputsBFGS_answer$estimate# estimated values for a, bBFGS_answer$maximum# minimized function value BFGS_answer$iterations# no of iterations to succeedBFGS_answer$gradient# last gradient value which was calculatedBFGS_answer$message# additional informationBFGS_answer$hessian# hessian matrixBFGS_answer$code# indicates successful completionBFGS_answer$fixed# logical vector indicating which parameters are constantsBFGS_answer$type# type of maximizationBFGS_answer$last.step# list describing the last unsuccessful stepBFGS_answer$control# see the documentation understand# fitting the Beta-Binomial distribution with estimated shape parameter valuesfitBetaBin(Alcohol_data$Days,Alcohol_data$week1,BFGS_answer$estimate[1],BFGS_answer$estimate[2])
BFGSR method
Combination of two methods which are Newton-Raphson, BFGS (Broyden 1970, Fletcher 1970, Goldfarb 1970, Shanno 1970).
Reference : Broyden, C.G., 1967. Quasi-Newton methods and their application to function minimization. Mathematics of Computation, 21(99), pp.368-381.
# optimizing values for a,b using BFGSR analytical methodBFGSR_answer<-maxLik(logLik =formaxLik, start =c(0.1,0.2),method ="BFGSR")# obtaining class of outputclass(BFGSR_answer)# length of outputlength(BFGSR_answer)# the outputsBFGSR_answer$estimate# estimated values for a, bBFGSR_answer$maximum# minimized function value BFGSR_answer$iterations# no of iterations to succeedBFGSR_answer$gradient# last gradient value which was calculatedBFGSR_answer$message# additional informationBFGSR_answer$hessian# hessian matrixBFGSR_answer$code# indicates successful completionBFGSR_answer$fixed# logical vector indicating which parameters are constantsBFGSR_answer$type# type of maximizationBFGSR_answer$last.step# list describing the last unsuccessful stepBFGSR_answer$control# see the documentation understand# fitting the Beta-Binomial distribution with estimated shape parameter valuesfitBetaBin(Alcohol_data$Days,Alcohol_data$week1,BFGSR_answer$estimate[1],BFGSR_answer$estimate[2])
BHHH method
BHHH method (Berndt, Hall, Hall, Hausman 1974). The BHHH (information equality) approximation is only valid for log-likelihood functions. It requires the score (gradient) values by individual observations and hence those must be returned by individual observations by grad or fn. With the complexity of BHHH method I choose not to discuss it here, but a reference is mentioned to anyone who has interest in this analytical method.
Reference : Berndt, E., Hall, B., Hall, R. and Hausman, J. (1974): Estimation and Inference in Nonlinear Structural Models, Annals of Social Measurement 3, 653–665.
SANN method
Method SANN is by default a variant of simulated annealing given in Belisle (1992). Simulated-annealing belongs to the class of stochastic global optimization methods. It uses only function values but is relatively slow. It will also work for non-differential functions. This implementation uses the Metropolis function for the acceptance probability.
By default the next candidate point is generated from a Gaussian Markov kernel with scale proportional to the actual temperature. If a function to generate a new candidate point is given, method SANN can also be used to solve combinatorial optimization problems. Temperatures are decreased according to the logarithmic cooling schedule as given in Belisle (1992, p.890); specifically, the temperature is set to \(temp / log(((t-1) %/% tmax)*tmax + exp(1))\), where \(t\) is the current iteration step and temp and tmax are specifiable via control.
Note that the SANN method depends critically on the settings of the control parameters. It is not a general-purpose method but can be very useful in getting to a good value on a very rough surface.
Reference : Belisle, C.J., 1992. Convergence theorems for a class of simulated annealing algorithms on R d. Journal of Applied Probability, 29(4), pp.885-895.
# optimizing values for a,b using SANN analytical methodSANN_answer<-maxLik(logLik =formaxLik, start =c(0.1,0.2),method ="SANN")# obtaining class of outputclass(SANN_answer)# length of outputlength(SANN_answer)# the outputsSANN_answer$estimate# estimated values for a, bSANN_answer$maximum# minimized function value SANN_answer$iterations# no of iterations to succeedSANN_answer$gradient# last gradient value which was calculatedSANN_answer$message# additional informationSANN_answer$hessian# hessian matrixSANN_answer$code# indicates successful completionSANN_answer$fixed# logical vector indicating which parameters are constantsSANN_answer$type# type of maximizationSANN_answer$last.step# list describing the last unsuccessful stepSANN_answer$control# see the documentation understand# fitting the Beta-Binomial distribution with estimated shape parameter valuesfitBetaBin(Alcohol_data$Days,Alcohol_data$week1,SANN_answer$estimate[1],SANN_answer$estimate[2])
CG method
Method CG is a conjugate gradients method based on that by Fletcher and Reeves (1964) (but with the option of Polak-Ribiere or Beale-Sorenson updates). Conjugate gradient methods will generally be more fragile than the BFGS method, but as they do not store a matrix they may be successful in much larger optimization problems.
Reference : Fletcher, R. and Reeves, C.M., 1964. Function minimization by conjugate gradients. The computer journal, 7(2), pp.149-154.
# optimizing values for a,b using CG analytical methodCG_answer<-maxLik(logLik =formaxLik, start =c(0.1,0.2),method ="CG")# obtaining class of outputclass(CG_answer)# length of outputlength(CG_answer)# the outputsCG_answer$estimate# estimated values for a, bCG_answer$maximum# minimized function value CG_answer$iterations# no of iterations to succeedCG_answer$gradient# last gradient value which was calculatedCG_answer$message# additional informationCG_answer$hessian# hessian matrixCG_answer$code# indicates successful completionCG_answer$fixed# logical vector indicating which parameters are constantsCG_answer$type# type of maximizationCG_answer$last.step# list describing the last unsuccessful stepCG_answer$control# see the documentation understand# fitting the Beta-Binomial distribution with estimated shape parameter valuesfitBetaBin(Alcohol_data$Days,Alcohol_data$week1,CG_answer$estimate[1],CG_answer$estimate[2])
NM method
NM is the abbreviation to Nelder and Mead method. According to the documentation it uses only function values and is robust but relatively slow. It will work reasonably well for non-differential functions.
Reference : Nelder, J.A. and Mead, R., 1965. A simplex method for function minimization. The computer journal, 7(4), pp.308-313.
# optimizing values for a,b using NM analytical methodNM_answer<-maxLik(logLik =formaxLik, start =c(0.1,0.2),method ="NM")# obtaining class of outputclass(NM_answer)# length of outputlength(NM_answer)# the outputsNM_answer$estimate# estimated values for a, bNM_answer$maximum# minimized function value NM_answer$iterations# no of iterations to succeedNM_answer$gradient# last gradient value which was calculatedNM_answer$message# additional informationNM_answer$hessian# hessian matrixNM_answer$code# indicates successful completionNM_answer$fixed# logical vector indicating which parameters are constantsNM_answer$type# type of maximizationNM_answer$last.step# list describing the last unsuccessful stepNM_answer$control# see the documentation understand# fitting the Beta-Binomial distribution with estimated shape parameter valuesfitBetaBin(Alcohol_data$Days,Alcohol_data$week1,NM_answer$estimate[1],NM_answer$estimate[2])
Sumary of Time evalutation for different Analytical methods of maxLik function
Below table will compare the system process time for different analytical methods. In order to do this time comparison it is possible to use the benchmark function of rbenchmark package. Below mentioned code chunk provides the output in a table form which includes the analytical methods and their respective time values. The estimation process of the parameters where each method has been replicated 1000 times to receive a more accurate table for time values.
Table is in the ascending order for elapsed time column. It is evidently clear that SANN analytical method has taken the most time. Before that the analytical method BFGSR is in 5th place. While NM or Nelder Mead method has taken the least time. This time is calculated for 1000 replications of the function being repeated under same conditions. These times does not only reflect based on analytical method, rather on the Log Likelihood function that needs to be maximized, the data provided, the number of estimated that needs to be estimated, the complexity of the function and finally the computer’s processing power.
Summary of results after using the maxLik function for different analytical methods
Estimated the shape parameters a,b pair wise from the analytical methods NR, BFGS, BFGSR, SANN, CG and NM. These estimated parameters will be now used in the fitBetaBin function to find the expected frequencies, p-values and over-dispersion. The above measurements can be used to compare each analytical method for any significance difference.
Comparing p-values it is clear that all analytical methods generate the same value up-to third decimal point. This is not the case in Maximum Log Likelihood value where analytical methods NR, BFGS, CG and NM have obtained the value -813.4571, while BFGSR and SANN have shown -813.4576. Further, Over-dispersion values are similar until third decimal point, but after that there is a clear difference among all six methods.
All of the analytical methods have produced distinct values for estimated shape parameters of a and b. For the shape parameter a, similarity is only until second decimal point, and for shape parameter b, similarity of value is only on first decimal point.
BinomialRandomVariable
Frequency
NR
BFGS
BFGSR
SANN
CG
NM
0
47
54.62
54.62
54.72
54.78
54.62
54.61
1
54
42
42
41.98
42.06
42
42
2
43
38.9
38.9
38.85
38.93
38.9
38.91
3
40
38.54
38.54
38.47
38.55
38.54
38.54
4
40
40.07
40.07
40
40.06
40.07
40.07
5
41
44
43.99
43.93
43.97
44
44
6
39
53.09
53.09
53.06
53.03
53.09
53.09
7
95
87.78
87.8
87.98
87.76
87.78
87.77
Total No of Observations
399
399
399.01
398.99
399.14
399
398.99
p-value
0.0901
0.0903
0.0902
0.0903
0.0901
0.0902
Estimated a and b
a=0.7229428 b=0.5808488
a=0.7228919 b=0.5807283
a=0.7209896 b=0.5790360
a=0.7219066 b=0.5813484
a=0.7229403 b=0.5808469
a=0.7230707 b=0.5809894
Maximum Log Likelihood
-813.4571
-813.4571
-813.4576
-813.4576
-813.4571
-813.4571
Over Dispersion
0.434067
0.4340993
0.4347778
0.4341682
0.4340679
0.4340165
Final Conclusion
Now we can conclude the above findings using the tables provided, and it is clear that there is no strong change in expected frequencies, maximum Log Likelihood value or p-value if we use any one of the methods mentioned above. If time is crucial it is best to avoid BFGSR and SANN methods as they take considerable amount of time. I would recommend choose the analytical method from maxLik function based on your needs of output and research objective.