Benchmarking the maxLik function

BetaBinomial

fitODBOD

maxLik

Published

December 20, 2018

NOTE : Below post is valid for Package version 1.4.0 and Before.

Estimating the shape parameters of Beta-Binomial Distribution

Introduction

Beginning of this month I wrote a small section regarding maxLik function by comparing it to other optimization functions. Here, we will further study the analytical methods which can be used in this function and compare them to find suitability. maxLik function is from the package maxLik. Further, Documentation clearly indicates all things related to the function in detail.

Focusing on the seven analytical methods is my intention from this blog post. So, we have the Beta-Binomial distribution and Binomial Outcome data, and need to estimate proper shape parameters which would Maximize the Log Likelihood value of the Beta-Binomial distribution for the Alcohol Consumption data. In this case Alcohol Consumption data from the fitODBOD package will be used.

Further we will focus on the process time to optimization, estimated shape parameters, maximized Log Likelihood value, expected frequencies, p-value and Over-dispersion with tables.

Below are the seven analytical methods in concern

NR
BFGS
BFGSR
BHHH
SANN
CG
NM

Alcohol Consumption data has two sets of frequency values but only values from week 1 will be used. Below is the the Alcohol Consumption data, where number of observations is 399 and the Binomial Random variable is a vector of values from zero to seven.

library(fitODBOD)
kable(Alcohol_data,"html",align=c('c','c','c')) %>%
  kable_styling(bootstrap_options = c("striped"),font_size = 14,full_width = F) %>%
  row_spec(0,color = "blue") %>%
  column_spec(1,color = "red")

Days	week1	week2
0	47	42
1	54	47
2	43	54
3	40	40
4	40	49
5	41	40
6	39	43
7	95	84

Brief of maxLik Function

Small section about the maxLik function will be very useful to understand this blog post.

Reference : Henningsen, A. and Toomet, O. (2011): maxLik: A package for maximum likelihood estimation in R Computational Statistics 26, 443–458 Marquardt, D.W., (1963)

An Algorithm for Least-Squares Estimation of Nonlinear Parameters, Journal of the Society for Industrial & Applied Mathematics 11, 2, 431–441

So for the initial parameters of a=0.1 and b=0.2 we will be finding estimated parameters from different analytical methods which would maximize the Log Likelihood value of the Beta-Binomial distribution.

First we are transforming the given EstMLEBetaBin function to satisfy the maxLik function conditions.

# new function to facilitate maxLik criteria
# only one input but has two elements
formaxLik<-function(a)
  {
  -EstMLEBetaBin(x=Alcohol_data$Days, freq=Alcohol_data$week1,a=a[1],b=a[2])
  }

So the formaxLik function can be used as above and parameters are estimated for \(\alpha\) and \(\beta\) (or a, b) for the Alcohol Consumption data week 1. Further the maxLik function can be scrutinized as below.

package : maxLik
No of Inputs: 6
Minimum required Inputs : 2
Class of output : list or class of maxim or class of maxLik
No of outputs: 11
No of Analytical Methods : 7
Default Method : Automatically chosen

NR method

NR is an abbreviation for Unconstrained and equality-constrained maximization based on the quadratic approximation (Newton) method. The idea of the Newton method is to approximate the function at a given location by a multidimensional quadratic function, and use the estimated maximum as the start value for the next iteration.

library(maxLik)

# optimizing values for a,b using NR analytical method
NR_answer<-maxLik(logLik =formaxLik, start = c(0.1,0.2),method = "NR")

# obtaining class of output
class(NR_answer)

# length of output
length(NR_answer)

# the outputs
NR_answer$estimate # estimated values for a, b
NR_answer$maximum # minimized function value 
NR_answer$iterations  # no of iterations to succeed
NR_answer$gradient # last gradient value which was calculated
NR_answer$message # additional information
NR_answer$hessian # hessian matrix
NR_answer$code # indicates successful completion
NR_answer$fixed # logical vector indicating which parameters are constants
NR_answer$type # type of maximization
NR_answer$last.step # list describing the last unsuccessful step
NR_answer$control # see the documentation understand

# fitting the Beta-Binomial distribution with estimated shape parameter values
fitBetaBin(Alcohol_data$Days,Alcohol_data$week1,
           NR_answer$estimate[1],NR_answer$estimate[2])

BFGS method

BFGS is a Quasi-Newton method (also known as a variable metric algorithm), specifically that published simultaneously in 1970 by Broyden, Fletcher, Goldfarb and Shanno. This uses function values and gradients to build up a picture of the surface to be optimized.

Reference : Broyden, C.G., 1967. Quasi-Newton methods and their application to function minimization. Mathematics of Computation, 21(99), pp.368-381.

# optimizing values for a,b using BFGS analytical method
BFGS_answer<-maxLik(logLik =formaxLik, start = c(0.1,0.2),method = "BFGS")

# obtaining class of output
class(BFGS_answer)

# length of output
length(BFGS_answer)

# the outputs
BFGS_answer$estimate # estimated values for a, b
BFGS_answer$maximum # minimized function value 
BFGS_answer$iterations  # no of iterations to succeed
BFGS_answer$gradient # last gradient value which was calculated
BFGS_answer$message # additional information
BFGS_answer$hessian # hessian matrix
BFGS_answer$code # indicates successful completion
BFGS_answer$fixed # logical vector indicating which parameters are constants
BFGS_answer$type # type of maximization
BFGS_answer$last.step # list describing the last unsuccessful step
BFGS_answer$control # see the documentation understand

# fitting the Beta-Binomial distribution with estimated shape parameter values
fitBetaBin(Alcohol_data$Days,Alcohol_data$week1,
           BFGS_answer$estimate[1],BFGS_answer$estimate[2])

BFGSR method

Combination of two methods which are Newton-Raphson, BFGS (Broyden 1970, Fletcher 1970, Goldfarb 1970, Shanno 1970).

Reference : Broyden, C.G., 1967. Quasi-Newton methods and their application to function minimization. Mathematics of Computation, 21(99), pp.368-381.

# optimizing values for a,b using BFGSR analytical method
BFGSR_answer<-maxLik(logLik =formaxLik, start = c(0.1,0.2),method = "BFGSR")

# obtaining class of output
class(BFGSR_answer)

# length of output
length(BFGSR_answer)

# the outputs
BFGSR_answer$estimate # estimated values for a, b
BFGSR_answer$maximum # minimized function value 
BFGSR_answer$iterations  # no of iterations to succeed
BFGSR_answer$gradient # last gradient value which was calculated
BFGSR_answer$message # additional information
BFGSR_answer$hessian # hessian matrix
BFGSR_answer$code # indicates successful completion
BFGSR_answer$fixed # logical vector indicating which parameters are constants
BFGSR_answer$type # type of maximization
BFGSR_answer$last.step # list describing the last unsuccessful step
BFGSR_answer$control # see the documentation understand

# fitting the Beta-Binomial distribution with estimated shape parameter values
fitBetaBin(Alcohol_data$Days,Alcohol_data$week1,
           BFGSR_answer$estimate[1],BFGSR_answer$estimate[2])

BHHH method

BHHH method (Berndt, Hall, Hall, Hausman 1974). The BHHH (information equality) approximation is only valid for log-likelihood functions. It requires the score (gradient) values by individual observations and hence those must be returned by individual observations by grad or fn. With the complexity of BHHH method I choose not to discuss it here, but a reference is mentioned to anyone who has interest in this analytical method.

Reference : Berndt, E., Hall, B., Hall, R. and Hausman, J. (1974): Estimation and Inference in Nonlinear Structural Models, Annals of Social Measurement 3, 653–665.

SANN method

Method SANN is by default a variant of simulated annealing given in Belisle (1992). Simulated-annealing belongs to the class of stochastic global optimization methods. It uses only function values but is relatively slow. It will also work for non-differential functions. This implementation uses the Metropolis function for the acceptance probability.

By default the next candidate point is generated from a Gaussian Markov kernel with scale proportional to the actual temperature. If a function to generate a new candidate point is given, method SANN can also be used to solve combinatorial optimization problems. Temperatures are decreased according to the logarithmic cooling schedule as given in Belisle (1992, p.890); specifically, the temperature is set to \(temp / log(((t-1) %/% tmax)*tmax + exp(1))\), where \(t\) is the current iteration step and temp and tmax are specifiable via control.

Note that the SANN method depends critically on the settings of the control parameters. It is not a general-purpose method but can be very useful in getting to a good value on a very rough surface.

Reference : Belisle, C.J., 1992. Convergence theorems for a class of simulated annealing algorithms on R d. Journal of Applied Probability, 29(4), pp.885-895.

# optimizing values for a,b using SANN analytical method
SANN_answer<-maxLik(logLik =formaxLik, start = c(0.1,0.2),method = "SANN")

# obtaining class of output
class(SANN_answer)

# length of output
length(SANN_answer)

# the outputs
SANN_answer$estimate # estimated values for a, b
SANN_answer$maximum # minimized function value 
SANN_answer$iterations  # no of iterations to succeed
SANN_answer$gradient # last gradient value which was calculated
SANN_answer$message # additional information
SANN_answer$hessian # hessian matrix
SANN_answer$code # indicates successful completion
SANN_answer$fixed # logical vector indicating which parameters are constants
SANN_answer$type # type of maximization
SANN_answer$last.step # list describing the last unsuccessful step
SANN_answer$control # see the documentation understand

# fitting the Beta-Binomial distribution with estimated shape parameter values
fitBetaBin(Alcohol_data$Days,Alcohol_data$week1,
           SANN_answer$estimate[1],SANN_answer$estimate[2])

CG method

Method CG is a conjugate gradients method based on that by Fletcher and Reeves (1964) (but with the option of Polak-Ribiere or Beale-Sorenson updates). Conjugate gradient methods will generally be more fragile than the BFGS method, but as they do not store a matrix they may be successful in much larger optimization problems.

Reference : Fletcher, R. and Reeves, C.M., 1964. Function minimization by conjugate gradients. The computer journal, 7(2), pp.149-154.

# optimizing values for a,b using CG analytical method
CG_answer<-maxLik(logLik =formaxLik, start = c(0.1,0.2),method = "CG")

# obtaining class of output
class(CG_answer)

# length of output
length(CG_answer)

# the outputs
CG_answer$estimate # estimated values for a, b
CG_answer$maximum # minimized function value 
CG_answer$iterations  # no of iterations to succeed
CG_answer$gradient # last gradient value which was calculated
CG_answer$message # additional information
CG_answer$hessian # hessian matrix
CG_answer$code # indicates successful completion
CG_answer$fixed # logical vector indicating which parameters are constants
CG_answer$type # type of maximization
CG_answer$last.step # list describing the last unsuccessful step
CG_answer$control # see the documentation understand

# fitting the Beta-Binomial distribution with estimated shape parameter values
fitBetaBin(Alcohol_data$Days,Alcohol_data$week1,
           CG_answer$estimate[1],CG_answer$estimate[2])

NM method

NM is the abbreviation to Nelder and Mead method. According to the documentation it uses only function values and is robust but relatively slow. It will work reasonably well for non-differential functions.

Reference : Nelder, J.A. and Mead, R., 1965. A simplex method for function minimization. The computer journal, 7(4), pp.308-313.

# optimizing values for a,b using NM analytical method
NM_answer<-maxLik(logLik =formaxLik, start = c(0.1,0.2),method = "NM")

# obtaining class of output
class(NM_answer)

# length of output
length(NM_answer)

# the outputs
NM_answer$estimate # estimated values for a, b
NM_answer$maximum # minimized function value 
NM_answer$iterations  # no of iterations to succeed
NM_answer$gradient # last gradient value which was calculated
NM_answer$message # additional information
NM_answer$hessian # hessian matrix
NM_answer$code # indicates successful completion
NM_answer$fixed # logical vector indicating which parameters are constants
NM_answer$type # type of maximization
NM_answer$last.step # list describing the last unsuccessful step
NM_answer$control # see the documentation understand

# fitting the Beta-Binomial distribution with estimated shape parameter values
fitBetaBin(Alcohol_data$Days,Alcohol_data$week1,
           NM_answer$estimate[1],NM_answer$estimate[2])

Sumary of Time evalutation for different Analytical methods of maxLik function

Below table will compare the system process time for different analytical methods. In order to do this time comparison it is possible to use the benchmark function of rbenchmark package. Below mentioned code chunk provides the output in a table form which includes the analytical methods and their respective time values. The estimation process of the parameters where each method has been replicated 1000 times to receive a more accurate table for time values.

Table is in the ascending order for elapsed time column. It is evidently clear that SANN analytical method has taken the most time. Before that the analytical method BFGSR is in 5th place. While NM or Nelder Mead method has taken the least time. This time is calculated for 1000 replications of the function being repeated under same conditions. These times does not only reflect based on analytical method, rather on the Log Likelihood function that needs to be maximized, the data provided, the number of estimated that needs to be estimated, the complexity of the function and finally the computer’s processing power.

library(rbenchmark)

Results1<-benchmark(
          "NR"={maxLik(logLik=formaxLik, start = c(0.1,0.2), method = "NR")},
          "BFGS"={maxLik(logLik=formaxLik, start = c(0.1,0.2), method = "BFGS")},
          "BFGSR"={maxLik(logLik=formaxLik, start = c(0.1,0.2), method = "BFGSR")},
          "SANN"={maxLik(logLik=formaxLik, start = c(0.1,0.2), method = "SANN")},
          "CG"={maxLik(logLik=formaxLik, start = c(0.1,0.2), method = "CG")},
          "NM"={maxLik(logLik=formaxLik, start = c(0.1,0.2), method = "NM")},
          replications = 100,
          columns = c("test","replications","elapsed",
                      "relative","user.self","sys.self"),
          order = 'elapsed'
          )

kable(Results1,"html",align = c('c','c','c','c','c','c')) %>%
  kable_styling(full_width=T,bootstrap_options=c("striped"),font_size = 14)%>%
  row_spec(0,color = "blue") %>%
  column_spec(1,color = "red")

	test	replications	elapsed	relative	user.self	sys.self
6	NM	100	1.60	1.000	1.56	0.00
2	BFGS	100	2.70	1.688	2.71	0.00
1	NR	100	3.28	2.050	3.25	0.02
5	CG	100	5.97	3.731	5.92	0.01
3	BFGSR	100	55.99	34.994	55.73	0.09
4	SANN	100	156.20	97.625	155.33	0.04

Summary of results after using the maxLik function for different analytical methods

Estimated the shape parameters a,b pair wise from the analytical methods NR, BFGS, BFGSR, SANN, CG and NM. These estimated parameters will be now used in the fitBetaBin function to find the expected frequencies, p-values and over-dispersion. The above measurements can be used to compare each analytical method for any significance difference.

Comparing p-values it is clear that all analytical methods generate the same value up-to third decimal point. This is not the case in Maximum Log Likelihood value where analytical methods NR, BFGS, CG and NM have obtained the value -813.4571, while BFGSR and SANN have shown -813.4576. Further, Over-dispersion values are similar until third decimal point, but after that there is a clear difference among all six methods.

All of the analytical methods have produced distinct values for estimated shape parameters of a and b. For the shape parameter a, similarity is only until second decimal point, and for shape parameter b, similarity of value is only on first decimal point.

BinomialRandomVariable	Frequency	NR	BFGS	BFGSR	SANN	CG	NM
0	47	54.62	54.62	54.72	54.78	54.62	54.61
1	54	42	42	41.98	42.06	42	42
2	43	38.9	38.9	38.85	38.93	38.9	38.91
3	40	38.54	38.54	38.47	38.55	38.54	38.54
4	40	40.07	40.07	40	40.06	40.07	40.07
5	41	44	43.99	43.93	43.97	44	44
6	39	53.09	53.09	53.06	53.03	53.09	53.09
7	95	87.78	87.8	87.98	87.76	87.78	87.77
Total No of Observations	399	399	399.01	398.99	399.14	399	398.99
p-value		0.0901	0.0903	0.0902	0.0903	0.0901	0.0902
Estimated a and b		a=0.7229428 b=0.5808488	a=0.7228919 b=0.5807283	a=0.7209896 b=0.5790360	a=0.7219066 b=0.5813484	a=0.7229403 b=0.5808469	a=0.7230707 b=0.5809894
Maximum Log Likelihood		-813.4571	-813.4571	-813.4576	-813.4576	-813.4571	-813.4571
Over Dispersion		0.434067	0.4340993	0.4347778	0.4341682	0.4340679	0.4340165

Final Conclusion

Now we can conclude the above findings using the tables provided, and it is clear that there is no strong change in expected frequencies, maximum Log Likelihood value or p-value if we use any one of the methods mentioned above. If time is crucial it is best to avoid BFGSR and SANN methods as they take considerable amount of time. I would recommend choose the analytical method from maxLik function based on your needs of output and research objective.

THANK YOU