Logistic regression using Firth's bias reduction:
a solution to the problem of separation in logistic regression

 

Separation is observed when fitting a logistic regression model if the likelihood converges to a finite value while at least one parameter estimate diverges to plus or minus infinity. It occurs primarily in small or sparse samples with highly predictive covariates. The simplest case is the analysis of a 2x2 table with one zero cell count. Statistical software packages that fit logistic regression by the maximum likelihood method cannot deal with this problem appropriately. Exact solutions exist but require special software and are not applicable if continuous covariates have to be analysed.

Heinze and Schemper (2002) suggested a bias reduction method originally proposed by Firth (1993) as an ideal solution to the separation problem. Unlike the standard maximum likelihood method, this method has been shown to always lead to finite parameter estimates. An extensive simulation study can be found in a Technical Report (Heinze, 1999). A recently published study compares the method with exact logistic regression by analysing small-sample real-life data sets in which separation, or a situation close to separation, is present (Heinze, 2006). The application of Firth's bias reduction to logistic regression was also recently proposed by Bull et al. (2002, 2007) and Heinze and Schemper (2006).
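To make the contrast concrete, here is a minimal Python sketch (not the SAS, SPLUS or R code offered on this page). It assumes the usual formulation of Firth's method for logistic regression, in which the score contribution of observation i is changed from (y_i - p_i) to (y_i - p_i + h_i (1/2 - p_i)), where h_i is the i-th diagonal element of the weighted hat matrix. The function name `fit_logistic` and the example counts are hypothetical and chosen for illustration only.

```python
import numpy as np

def fit_logistic(X, y, firth=True, max_iter=50, tol=1e-8):
    """Fisher scoring for logistic regression, optionally with Firth's penalty."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = p * (1.0 - p)
        info_inv = np.linalg.inv(X.T @ (w[:, None] * X))  # inverse Fisher information
        resid = y - p
        if firth:
            # hat diagonals h_i = w_i * x_i' (X'WX)^{-1} x_i
            h = w * np.einsum("ij,jk,ik->i", X, info_inv, X)
            resid = resid + h * (0.5 - p)  # Firth's modification of the score
        step = info_inv @ (X.T @ resid)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Illustrative 2x2 table with one zero cell (made-up counts):
# x = 0: 5 events, 5 non-events; x = 1: 10 events, 0 non-events
x = np.repeat([0.0, 1.0], 10)
y = np.concatenate([np.ones(5), np.zeros(5), np.ones(10)])
X = np.column_stack([np.ones_like(x), x])

ml_10 = fit_logistic(X, y, firth=False, max_iter=10)
ml_20 = fit_logistic(X, y, firth=False, max_iter=20)
firth_beta = fit_logistic(X, y, firth=True)

print("ML slope after 10 and 20 iterations:", ml_10[1], ml_20[1])
print("Firth slope:", firth_beta[1])
```

In this sketch the unpenalized slope keeps growing with every additional iteration, whereas the penalized iteration settles at a finite value. For a saturated 2x2 model the penalized estimate coincides with the ordinary estimate after adding 0.5 to each cell, which here gives a slope of log(21).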

We developed a SAS macro and an SPLUS library to make this method available from within these widely used statistical software packages (Heinze and Ploner, 2003). Our programs can also perform interval estimation based on the profile penalized log likelihood (PPL) and plot the PPL function, as suggested by Heinze and Schemper (2002). The SAS macro was revised in March 2005 using The SAS System for Windows 9.1, and improved again in March and September 2006. The SPLUS library was written in SPLUS 4.0 and has been updated for SPLUS 6 by Harry Southworth, Cheshire, UK. It was also revised in March 2005. An R package is available at CRAN or can be downloaded here (version 1.06, April 2006). All three programs are documented in a Technical Report (Heinze and Ploner, 2004).
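The idea behind PPL interval estimation can be sketched in Python as follows. This is an illustration under stated assumptions, not the code of the SAS macro, SPLUS library or R package: the penalized log likelihood is taken to be the log likelihood plus half the log determinant of the Fisher information, and the 95% limits for one coefficient are the values at which twice the drop in the profile of that function equals the 0.95 quantile of the chi-square distribution with one degree of freedom. All function names and the example data are hypothetical.

```python
import numpy as np

CHI2_95_1DF = 3.841458820694124  # 0.95 quantile of chi-square with 1 df

def pen_loglik(beta, X, y):
    """Log likelihood plus 0.5 * log det of the Fisher information."""
    eta = X @ beta
    p = 1.0 / (1.0 + np.exp(-eta))
    w = p * (1.0 - p)
    loglik = np.sum(y * eta - np.logaddexp(0.0, eta))
    _, logdet = np.linalg.slogdet(X.T @ (w[:, None] * X))
    return loglik + 0.5 * logdet

def firth_fit(X, y, fixed=None, max_iter=200, tol=1e-10):
    """Firth-penalized fit; `fixed` = (index, value) holds one coefficient constant."""
    k = X.shape[1]
    beta = np.zeros(k)
    free = np.arange(k)
    if fixed is not None:
        beta[fixed[0]] = fixed[1]
        free = free[free != fixed[0]]
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = p * (1.0 - p)
        info = X.T @ (w[:, None] * X)
        # hat diagonals always come from the full model
        h = w * np.einsum("ij,jk,ik->i", X, np.linalg.inv(info), X)
        score = X.T @ (y - p + h * (0.5 - p))
        step = np.linalg.solve(info[np.ix_(free, free)], score[free])
        beta[free] += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def profile_bound(X, y, idx, beta_hat, direction, crit=CHI2_95_1DF):
    """Find one PPL confidence limit for coefficient `idx` by bracketing and bisection."""
    l_max = pen_loglik(beta_hat, X, y)

    def excess(b):
        beta_b = firth_fit(X, y, fixed=(idx, b))
        return 2.0 * (l_max - pen_loglik(beta_b, X, y)) - crit

    inner = beta_hat[idx]
    outer = inner + direction
    while excess(outer) < 0:  # move outward until the limit is bracketed
        outer += direction
    for _ in range(60):  # bisection on the bracket
        mid = 0.5 * (inner + outer)
        if excess(mid) < 0:
            inner = mid
        else:
            outer = mid
    return 0.5 * (inner + outer)

# Illustrative separated data (made-up counts): x = 1 has 10 events and 0 non-events
x = np.repeat([0.0, 1.0], 10)
y = np.concatenate([np.ones(5), np.zeros(5), np.ones(10)])
X = np.column_stack([np.ones_like(x), x])

beta_hat = firth_fit(X, y)
lower = profile_bound(X, y, 1, beta_hat, direction=-1.0)
upper = profile_bound(X, y, 1, beta_hat, direction=+1.0)
print("slope:", beta_hat[1], "95% PPL interval:", (lower, upper))
```

Because the limits are read off the profile of the penalized log likelihood rather than built symmetrically around the estimate, the resulting interval can be asymmetric, which is often the case for separated data.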

Another SAS macro, CFL, applies Firth's correction to conditional logistic regression. It can be used for any sparse-data analysis of clustered data with binary outcomes, such as matched case-control studies or studies including a nuisance random effect. The application of Firth's correction to conditional logistic regression is presented at the 2008 ISCB conference.

Please note that users of SAS version 9.2 can apply Firth's correction by specifying the FIRTH option in the MODEL statement of PROC LOGISTIC. Profile penalized likelihood confidence intervals can be computed by specifying CLODDS=PL in combination with the FIRTH option. However, unlike our fl macro, PROC LOGISTIC does not provide the corresponding p-values from penalized likelihood ratio tests. The FIRTH option cannot be combined with a STRATA statement, so for conditional logistic regression you must use the above-mentioned CFL macro.

 

References (chronological order)

Firth, D. (1993). Bias reduction of maximum likelihood estimates. Biometrika, 80, 27-38.
Heinze, G. (1999). Technical Report 10/1999: The application of Firth's procedure to Cox and logistic regression. Section of Clinical Biometrics, Department of Medical Computer Sciences, University of Vienna, Vienna.
Bull, S., Mak, C., and Greenwood, C.M.T. (2002). A modified score function estimator for multinomial logistic regression in small samples. Computational Statistics and Data Analysis, 39, 57-74.
Heinze, G. and Schemper, M. (2002). A solution to the problem of separation in logistic regression. Statistics in Medicine, 21, 2409-2419.
Heinze, G. and Ploner, M. (2003). Fixing the nonconvergence bug in logistic regression with SPLUS and SAS. Computer Methods and Programs in Biomedicine, 71, 181-187.
Heinze, G. and Ploner, M. (2004). Technical Report 2/2004: A SAS macro, S-PLUS library and R package to perform logistic regression without convergence problems. Section of Clinical Biometrics, Department of Medical Computer Sciences, Medical University of Vienna, Vienna.
Heinze, G. and Schemper, M. (2006). Letter Re: A permutation test for inference in logistic regression with small- and moderate-sized data sets. Statistics in Medicine, 25, 719.
Heinze, G. (2006). A comparative investigation of methods for logistic regression with separated or nearly separated data. Statistics in Medicine, 25, 4216-4226.
Bull, S., Lewinger, J.P., and Lee, S. (2007). Confidence intervals for multinomial logistic regression in sparse data. Statistics in Medicine, 26, 903-918.

 

Download area

All programs offered here are free of charge. However, before downloading a program, please register here. The packages are ZIP archives containing the required program files, user's guides and installation instructions.


Available downloads (registration asks for your name, e-mail address and institution):

SPLUS version
R package
SAS version
SAS macro CFL for conditional logistic regression

Last changes Thursday, August 14, 2008