Biostatistics; Confounding Factors (Epidemiology); Prognosis; Regression Analysis
My primary research focuses on biostatistical regression modeling strategies for prediction and for estimating the effects of exposures on outcomes, particularly when sample sizes are small or outcome events are rare. My secondary research focus is the re-use of health data for medical research, particularly when sample sizes are very large, as in nationwide studies of health insurance claims. I am also interested in providing statistical software for routine application of our methodological developments. I have collaborated as biostatistical partner in several EU-funded projects. I am interested in analysis methods for observational studies and am a member of the steering group of the international STRATOS initiative (Strengthening Analytical Thinking for Observational Studies). In STRATOS, I also chair topic group 2 on Selection of Variables and Functional Forms in Multivariable Analysis.
Techniques, methods & infrastructure
In regression modeling, we focus on penalized likelihood techniques for prediction and effect estimation. In several research projects, we have investigated the Firth penalty (Jeffreys prior) in particular, as a solution to the problem that maximum likelihood estimates of regression coefficients may not exist in various risk models. Other research has focused on algorithmic variable selection methods, on analyses of high-dimensional (omics) data and on the optimization of prediction models.
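To illustrate the idea behind the Firth penalty, the following Python sketch fits a logistic regression by iterating on the Firth-modified score function. It is a minimal sketch under stated assumptions, not our released software: the function name `firth_logistic`, the step cap, and the toy data set are hypothetical and chosen only for illustration. In the toy data, no non-events occur in the group with x = 1, so the maximum likelihood estimate of the slope does not exist, while the penalized estimate is finite.

```python
import numpy as np

def firth_logistic(X, y, max_iter=100, tol=1e-8, max_step=5.0):
    """Illustrative Firth-penalized logistic regression (hypothetical helper).

    Iterates beta <- beta + I(beta)^{-1} U*(beta), where U* is the
    Firth-modified score and I is the Fisher information.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))       # fitted probabilities
        w = p * (1.0 - p)                          # logistic weights
        info = X.T @ (X * w[:, None])              # Fisher information X'WX
        info_inv = np.linalg.inv(info)
        # Diagonal of the hat matrix W^(1/2) X (X'WX)^(-1) X' W^(1/2)
        h = w * np.einsum("ij,jk,ik->i", X, info_inv, X)
        # Modified score: ordinary score plus the Firth correction term
        score = X.T @ (y - p + h * (0.5 - p))
        step = info_inv @ score
        largest = np.max(np.abs(step))
        if largest > max_step:                     # assumed safeguard against huge steps
            step *= max_step / largest
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Toy data (made up for illustration): all four subjects with x = 1 have the event.
x = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y = np.array([0, 0, 0, 1, 1, 1, 1, 1])
X = np.column_stack([np.ones_like(x), x])          # intercept and binary covariate
beta = firth_logistic(X, y)
print(beta)
```

In this saturated two-group example, the penalized estimate coincides with the estimate obtained after adding 0.5 to each cell of the 2x2 table, so the slope should be close to log(21), approximately 3.04.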
In evaluating new methodology, we use simulation studies, from which we can learn how methods perform under various conditions. In our simulation studies, we always try to define scenarios that are likely to be encountered in real-life data analysis. While in single-data set analyses the underlying population is usually unknown, simulation studies have the advantage that the population properties are defined by the experimenter, so that a method's performance can be judged against the known truth and results can be generalized.
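As a simple illustration of such a simulation study, the Python sketch below generates many data sets from a logistic model whose parameters are fixed by the experimenter and records how often a 2x2 table of a binary covariate against the outcome contains an empty cell, in which case the maximum likelihood log odds ratio is not finite. The function name `empty_cell_rate`, the scenario values and the number of replications are assumptions made for this example only.

```python
import numpy as np

def empty_cell_rate(n, intercept, slope, n_sim=2000, seed=1):
    """Share of simulated data sets with an empty cell in the 2x2 table
    of a binary covariate x against a binary outcome y (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    count = 0
    for _ in range(n_sim):
        x = rng.integers(0, 2, size=n)                       # balanced binary covariate
        p = 1.0 / (1.0 + np.exp(-(intercept + slope * x)))   # true event probabilities
        y = rng.random(n) < p                                # simulated outcomes
        cells = [np.sum((x == a) & (y == b))
                 for a in (0, 1) for b in (False, True)]
        if min(cells) == 0:                                  # ML log odds ratio not finite
            count += 1
    return count / n_sim

# Assumed scenario: rare events (intercept -2) and a moderate effect (slope 1).
rate_small = empty_cell_rate(n=20, intercept=-2.0, slope=1.0)
rate_large = empty_cell_rate(n=200, intercept=-2.0, slope=1.0)
print(rate_small, rate_large)
```

Because the true parameters are known, the same loop could be extended to compare the bias or mean squared error of competing estimators across scenarios.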
- SCOUT - Supporting Causal Conclusions from Observational Survival Studies (2018)
Source of Funding: EU, H2020-MSCA-IF-2017
Coordinator of the collaborative project
- CaReSyAn - Combatting the CardioRenal Syndrome: towards an integrative analysis to reduce cardiovascular burden in chronic kidney disease (2017)
Source of Funding: EU, ITN
- Predicting rare events more accurately (PREMA) (2015)
Source of Funding: FWF (Austrian Science Fund), Joint projects
- Heinze, G., Wallisch, C. & Dunkler, D., 2018. Variable selection - A review and recommendations for the practicing statistician. Biometrical Journal. Available at: http://dx.doi.org/10.1002/bimj.201700067.
- Mansournia, M.A. et al., 2017. Separation in Logistic Regression - Causes, Consequences, and Control. American Journal of Epidemiology. Available at: http://dx.doi.org/10.1093/aje/kwx299.
- Eichinger, S. et al., 2010. Risk Assessment of Recurrence in Patients With Unprovoked Deep Vein Thrombosis or Pulmonary Embolism: The Vienna Prediction Model. Circulation, 121(14), pp.1630-1636. Available at: http://dx.doi.org/10.1161/CIRCULATIONAHA.109.925214.
- Heinze, G. et al., 2009. Mortality in renal transplant recipients given erythropoietins to increase haemoglobin concentration: cohort study. BMJ, 339, p.b4018. Available at: http://dx.doi.org/10.1136/bmj.b4018.
- Heinze, G. & Schemper, M., 2002. A solution to the problem of separation in logistic regression. Statistics in Medicine, 21(16), pp.2409-2419. Available at: http://dx.doi.org/10.1002/sim.1047.