European Network of Centres for Pharmacoepidemiology and Pharmacovigilance

Chapter 6: Methods to address bias and confounding

6.1. Bias

6.1.1. Selection bias

Selection biases are distortions that result from the procedures used to select subjects and from factors that determine study participation. The common element of such biases is that the relation between exposure and disease is different for those who participate and for all those who should have been theoretically eligible for the study, including those who do not participate. “Because estimates of effect are conditioned on participation, the associations observed in a study represent a mix of forces that determine participation” (Greenland, Lash. Modern Epidemiology. 3rd edition). Lack of representativeness of the exposure or outcome pattern alone is not sufficient to cause selection bias. Examples of common selection biases are prevalence bias, self-selection bias and referral bias.

Prevalence bias may occur when prevalent drug users are included in an observational study, i.e., patients already taking a therapy for some time before study follow-up began. This can cause two types of bias. Firstly, prevalent users are ‘survivors’ (healthy users) of the early period of pharmacotherapy, which can introduce substantial selection bias if the risk varies with time, as seen in safety studies where persons who discontinue treatment after early adverse reactions are unintentionally excluded from the safety assessment (‘depletion of susceptibles’). An illustrative example is the comparison between users of third and older generations of oral contraceptives regarding the risk of venous thrombosis, where the association for the third generation was initially overestimated due to the healthy user bias in persons taking older generation contraceptives (see The Transnational Study on Oral Contraceptives and the Health of Young Women. Methods, results, new analyses and the healthy user effect, Hum Reprod Update 1999;5(6):707-20). Secondly, covariates for drug use at study entry are often influenced by the previous intake of the drug.

Self-selection bias may arise when participation in a study is voluntary and the decision to participate is related to both the exposure and the outcome; such self-selection can compromise the validity of study results.

Referral bias occurs when patients with an abnormal test result are referred to a medical specialist at a higher rate than are patients with normal test results. In Clinical implications of referral bias in the diagnostic performance of exercise testing for coronary artery disease (J Am Heart Assoc. 2013;2(6):e000505), it was shown that exercise echocardiography and myocardial perfusion imaging are considerably less sensitive and more specific for coronary artery disease after adjustment for referral bias.

The article Collider bias undermines our understanding of COVID-19 disease risk and severity (Nat Commun. 2020;11(1):5749) describes a selection bias where a variable (a collider) is influenced by two other variables, for example when an exposure (being a healthcare worker) and an outcome (severity of COVID-19 infection) both affect the variable determining the likelihood of being sampled (presence of PCR testing or hospitalisation). A bias would arise when the analysis includes only those people who have experienced an event such as hospitalisation with COVID-19, been tested for active infection or who have volunteered their participation. Among hospitalised patients, the relationship between any exposure that relates to hospitalisation and the severity of infection would be distorted compared to the general population. The article proposes methods for detecting and minimising the effects of collider bias. Vaccine side-effects and SARS-CoV-2 infection after vaccination in users of the COVID Symptom Study app in the UK: a prospective observational study (Lancet Infect Dis. 2021;S1473) discusses that collider bias would occur in the study if both vaccination status and COVID-19 positivity influenced the probability of participation in the study. However, the authors considered collider bias unlikely to underlie the reduction in infections following vaccination seen in the data, given that strong reductions in COVID-19 hospitalisations after vaccination were observed in other nationwide studies.
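
The mechanism can be demonstrated with a small simulation. The sketch below is illustrative and not drawn from the cited articles: an exposure and an outcome are generated independently (true association null), sampling is made more likely when either is present, and a spurious inverse association appears among the sampled subjects only.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Hypothetical setup: exposure (e.g., healthcare worker) and outcome
# (e.g., severe infection) are generated independently: true OR = 1.
exposure = rng.binomial(1, 0.2, n)
outcome = rng.binomial(1, 0.1, n)

# The collider (being tested/hospitalised, hence sampled) is more likely
# when either the exposure or the outcome is present.
p_sampled = 0.05 + 0.30 * exposure + 0.40 * outcome
sampled = rng.binomial(1, p_sampled) == 1

def odds_ratio(e, o):
    a = np.sum((e == 1) & (o == 1)); b = np.sum((e == 1) & (o == 0))
    c = np.sum((e == 0) & (o == 1)); d = np.sum((e == 0) & (o == 0))
    return (a * d) / (b * c)

print(f"OR in full population: {odds_ratio(exposure, outcome):.2f}")  # ~1.0
print(f"OR among sampled only: {odds_ratio(exposure[sampled], outcome[sampled]):.2f}")  # <1, spurious
```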

Biases in evaluating the safety and effectiveness of drugs for covid-19: designing real-world evidence studies (Am J Epidemiol. 2021;190(8):1452-6) illustrates selection biases present in several studies evaluating the effects of drugs on SARS-CoV-2 infection and how to address them at the analysis and design stages.

Mitigating selection bias at the analysis stage

Once they have occurred, selection biases cannot be removed at the analysis stage if the factors responsible for the selection are not known or not measured. In some circumstances, it may be possible to restrict the study population by including only groups where the selection did not operate. For example, a prevalence bias may be removed by restricting the analysis to incident drug users, i.e., patients enter the study cohort only at the start of the first course of the treatment of interest (or of different treatment groups) during the study period. Consequences may include reduced precision of estimates due to lower sample size and likely reduction in the number of patients with long-term exposure. In circumstances where the factors influencing the selection are known and have been accurately measured, they can be treated as confounding factors and adjusted for at the analysis stage.

Mitigating selection bias at the design stage

The impact of selection biases is therefore best avoided or minimised with proper consideration at the study design stage. The new user (incident user) design helps mitigate selection bias by alleviating the healthy user bias for preventive treatments in some circumstances (see Healthy User and Related Biases in Observational Studies of Preventive Interventions: A Primer for Physicians. J Gen Intern Med 2011;26(5):546-50). The article Evaluating medication effects outside of clinical trials: new-user designs (Am J Epidemiol. 2003;158(9):915-20) defines new user designs in cohort and case-control settings. The articles The active comparator, new user study design in pharmacoepidemiology: historical foundations and contemporary application (Curr Epidemiol Rep. 2015;2(4):221-28) and New-user designs with conditional propensity scores: a unified complement to the traditional active comparator new-user approach (Pharmacoepidemiol Drug Saf. 2017;26(4):469-7) extend the discussion to studies with active comparators. One should be aware of the difference between a new user (which requires absence of prior use of a given drug/drug class during a prespecified washout period) and a treatment-naïve user (which requires absence of prior treatment for a given indication). A treatment-naïve status may not be ascertainable in left-truncated data.
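
As an illustration, a new-user cohort with a prespecified washout can be assembled from dispensing records along the following lines. This is a minimal pandas sketch with hypothetical data, column names and a one-year washout; it is not a reference implementation.

```python
import pandas as pd

# Hypothetical dispensing records (one row per dispensing of the study drug).
rx = pd.DataFrame({
    "patient_id": [1, 1, 2, 3, 3],
    "dispense_date": pd.to_datetime(["2019-03-01", "2020-02-01", "2020-02-15",
                                     "2018-11-01", "2020-01-10"]),
})

STUDY_START = pd.Timestamp("2020-01-01")
WASHOUT = pd.Timedelta(days=365)  # prespecified washout period

# First dispensing within the study period = candidate cohort entry date.
in_study = rx[rx["dispense_date"] >= STUDY_START]
entry = in_study.groupby("patient_id")["dispense_date"].min().rename("entry_date")

# Exclude patients with any dispensing during the washout before entry
# (prevalent users); those remaining are incident (new) users.
merged = rx.merge(entry, on="patient_id")
prevalent = merged[(merged["dispense_date"] < merged["entry_date"]) &
                   (merged["dispense_date"] >= merged["entry_date"] - WASHOUT)]
new_users = entry.drop(prevalent["patient_id"].unique())

# Patient 3 qualifies as a new user (no use in the washout) although an old
# dispensing exists, i.e., a new user is not necessarily treatment-naïve.
print(new_users)
```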

The active comparator new user design (see Chapter 4.3.2) would ideally compare two treatments that are marketed contemporaneously. However, a more common situation is where a recently marketed drug is compared with an older established alternative. For such situations, the article Prevalent new-user cohort designs for comparative drug effect studies by time-conditional propensity scores (Pharmacoepidemiol Drug Saf. 2017;26(4):459-68) introduces a cohort design allowing identification of matched subjects using the comparator drug at the same point in the course of disease as the (newly marketed) drug of interest. The design utilises time-based and prescription-based exposure sets to compute time-dependent propensity scores of initiating the new drug.

Observational studies of treatment effectiveness: worthwhile or worthless? (Clin Epidemiol. 2018;11:35-42) discusses how researchers can mitigate the risk of bias in the cohort design by presenting a case of the comparative effectiveness of two antidiabetic treatments using data collected during routine clinical practice.

The use of case-only designs can also reduce selection bias if the statistical assumptions of the method are fulfilled (see Chapter 4.2.3).

6.1.2. Information bias

Information bias (misclassification) arises when incorrect information about either exposure, outcome or any covariates is collected in the study, or when variables are incorrectly categorised. Different factors may cause information bias. Chapter 4.3. discusses errors in definition, measurement and classification of variables and how to address them. Errors may also occur in the study design and method for data collection. Examples are the recall bias occurring in case-control studies where cases and controls can have different recall of their past exposures (see Recall bias in epidemiologic studies (J Clin Epidemiol. 1990;43(1):87-9)), as well as the protopathic bias and surveillance or detection bias, which are described below.

Protopathic bias

Protopathic bias arises when the initiation of a drug (exposure) occurs in response to a symptom of the (at this point undiagnosed) disease under study (outcome). For example, use of analgesics in response to pain caused by an undiagnosed tumour might lead to the erroneous conclusion that the analgesic caused the tumour. Protopathic bias, also called reverse causation, thus reflects a reversal of cause and effect (see Bias: Considerations for research practice. Am J Health Syst Pharm 2008;65(22):2159-68). This is particularly a problem in studies of drug-cancer associations and other outcomes with long latencies (see Cancer Incidence after Initiation of Antimuscarinic Medications for Overactive Bladder in the United Kingdom: Evidence for Protopathic Bias, Pharmacotherapy. 2017;37(6):673-83.)

Protopathic bias has also been described as a selection bias and it should not be confused with confounding by indication, i.e., when a variable is a risk factor for a disease among non-exposed subjects and is associated with the exposure of interest in the population from which the cases derive, without being an intermediate step in the causal pathway between the exposure and the disease (see Confounding by Indication: An Example of Variation in the Use of Epidemiologic Terminology, Am J Epidemiol. 1999;149(11):981-3).

Mitigating protopathic bias at the analysis stage

Protopathic bias may be handled by including a time-lag (i.e., by disregarding all exposure during a specified period of time before the index date) or by restricting the analysis to cases in which it is documented that the start of treatment was unrelated to symptoms of the outcome. Both of these methods are used in Long-Term Risk of Skin Cancer and Lymphoma in Users of Topical Tacrolimus and Pimecrolimus: Final Results from the Extension of the Cohort Study Protopic Joint European Longitudinal Lymphoma and Skin Cancer Evaluation (JOELLE) (Clin Epidemiol. 2021;13:1141-53).
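
A time-lag can be implemented by censoring the exposure window at a fixed interval before the index date. The following minimal pandas sketch uses hypothetical data and column names, and an illustrative 90-day lag; dispensings within the lag window are simply not counted as exposure.

```python
import pandas as pd

# Hypothetical data: each row is a dispensing, with the patient's index date
# (date of outcome diagnosis).
rx = pd.DataFrame({
    "patient_id": [1, 1, 2],
    "dispense_date": pd.to_datetime(["2020-01-05", "2020-05-20", "2020-04-01"]),
    "index_date": pd.to_datetime(["2020-06-01", "2020-06-01", "2020-06-01"]),
})

LAG = pd.Timedelta(days=90)  # prespecified time-lag before the index date

# Disregard all exposure during the lag window before the index date.
lagged = rx[rx["dispense_date"] <= rx["index_date"] - LAG]

# Patients whose only dispensings fall inside the lag window (patient 2)
# are classified as unexposed.
exposed = (lagged.groupby("patient_id").size()
           .reindex(rx["patient_id"].unique(), fill_value=0) > 0)
print(exposed)
```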

Surveillance bias (or detection bias)

Surveillance or detection bias arises when patients in one exposure group have a higher probability of having the study outcome detected, due to increased surveillance, screening or testing of the outcome itself, or because of an associated symptom. For example, post-menopausal exposure to oestrogen is associated with an increased risk of bleeding that can trigger screening for endometrial cancers, leading to a higher probability of early stage endometrial cancers being detected. Any association between oestrogen exposure and endometrial cancer potentially overestimates risk, because unexposed patients with sub-clinical cancers would have a lower probability of their cancer being diagnosed or recorded. This is discussed in Alternative analytic methods for case-control studies of estrogens and endometrial cancer (N Engl J Med 1978;299(20):1089-94).

Mitigating surveillance bias at the design stage

This non-random type of misclassification bias can be reduced by selecting an unexposed comparator group with a similar likelihood of screening or testing, selecting outcomes that are likely to be diagnosed equally in both exposure groups, or by adjusting for the surveillance rate in the analysis. These issues and recommendations are outlined in Surveillance Bias in Outcomes Reporting (JAMA 2011;305(23):2462-3).

6.1.3. Time-related bias

Immortal time bias

Immortal time bias refers to a period of cohort follow-up time during which death (or an outcome that determines end of follow-up) cannot occur (K. Rothman, S. Greenland, T. Lash. Modern Epidemiology, 3rd Edition, Lippincott Williams & Wilkins, 2008).

Immortal time bias can arise when the period between cohort entry and date of first exposure to a drug, during which the event of interest has not occurred, is either misclassified or simply excluded and not accounted for in the analysis. Immortal time bias in observational studies of drug effects (Pharmacoepidemiol Drug Saf. 2007;16(3):241-9) demonstrates how several observational studies used a flawed approach to design and data analysis, leading to immortal time bias, which can generate an illusion of treatment effectiveness. This is frequently found in studies that compare groups of ‘users’ against ‘non-users’. Observational studies with surprisingly beneficial drug effects should therefore be re-assessed to account for this type of bias.

Immortal Time Bias in Pharmacoepidemiology (Am J Epidemiol 2008;167(4):492-9) describes various cohort study designs leading to this bias, quantifies its magnitude under different survival distributions and illustrates it with data from a cohort of lung cancer patients. For time-based, event-based and exposure-based cohort definitions, the bias in the rate ratio resulting from misclassified or excluded immortal time increases with the duration of immortal time. It is asserted that immortal time bias arises by conditioning on future exposure and that it can be avoided by analysing the data as if the exposures and outcomes were included as they developed, without ever looking into the future. Biases in evaluating the safety and effectiveness of drugs for covid-19: designing real-world evidence studies (Am J Epidemiol. 2021;190(8):1452-6) illustrates immortal time bias present in several studies evaluating the effects of drugs on SARS-CoV-2 infection.
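
The mechanism can be reproduced with a simple person-time simulation. In the sketch below (illustrative, not taken from the cited articles), the drug has no effect and everyone shares the same constant hazard; counting the pre-prescription ('immortal') time as exposed produces a spurious protective rate ratio, while a time-dependent classification recovers the null.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Hypothetical cohort: constant event hazard, drug has NO effect. 'Users'
# are defined by a first prescription occurring at some time after entry.
event_time = rng.exponential(scale=10.0, size=n)  # years from entry to event
rx_time = rng.exponential(scale=2.0, size=n)      # years to first prescription
user = rx_time < event_time                       # only the still event-free can initiate

# Naive (biased) analysis: users' entire follow-up counted as exposed,
# including the immortal time between entry and first prescription.
rate_user_naive = user.sum() / event_time[user].sum()
rate_nonuser = (~user).sum() / event_time[~user].sum()
print(f"Naive rate ratio: {rate_user_naive / rate_nonuser:.2f}")  # <1, spurious benefit

# Time-dependent analysis: person-time before the prescription is
# classified as unexposed for everyone.
exposed_pt = (event_time - rx_time)[user].sum()
unexposed_pt = event_time[~user].sum() + rx_time[user].sum()
rate_exposed = user.sum() / exposed_pt
rate_unexposed = (~user).sum() / unexposed_pt
print(f"Corrected rate ratio: {rate_exposed / rate_unexposed:.2f}")  # ~1
```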

Survival bias associated with time-to-treatment initiation in drug effectiveness evaluation: a comparison of methods (Am J Epidemiol 2005;162(10):1016-23) describes five different approaches to deal with immortal time bias. The use of a time-dependent approach had several advantages: no subjects are excluded from the analysis and the study allows effect estimation at any point in time after discharge. However, changes of exposure might be predictive of the study endpoint and need adjustment for time-varying confounders using complex methods. Problem of immortal time bias in cohort studies: example using statins for preventing progression of diabetes (BMJ. 2010; 340:b5087) describes how immortal time in observational studies can bias the results in favour of the treatment group and how it can be identified and avoided. It is recommended that all cohort studies should be assessed for the presence of immortal time bias using appropriate validity criteria. However, Re. ‘Immortal time bias in pharmacoepidemiology’ (Am J Epidemiol 2009;170(5):667-8) argues that sound efforts at minimising the influence of more common biases should not be sacrificed to that of avoiding immortal time bias.

Emulating Target Trials to Avoid Immortal Time Bias - An Application to Antibiotic Initiation and Preterm Delivery (Epidemiology. 2023;34(3):430-438) describes how a sequence of target trial emulations (see Chapter 4.2.6) can be used to avoid immortal time bias in situations where only few individuals start treatment at a certain time point.

Other forms of time-related bias

In many database studies, drugs administered during hospitalisations are unknown. Exposure misclassification may then occur, with a direction that depends on whether drugs prescribed before a hospitalisation are continued or discontinued during the hospital stay, and on whether days of hospitalisation are considered as gaps in exposure, especially when several exposure categories are assigned, such as current, recent and past. The differential bias arising from the lack of information on (or lack of consideration of) hospitalisations that occur during the observation period (called ‘immeasurable time bias’ in Immeasurable time bias in observational studies of drug effects on mortality, Am J Epidemiol. 2008;168(3):329-35) can be particularly problematic when studying serious chronic diseases that require extensive medication use and multiple hospitalisations.

In case-control studies assessing chronic diseases with multiple hospitalisations and in-patient treatment (such as the use of inhaled corticosteroids and death in chronic obstructive pulmonary disease patients), no clearly valid approach to data analysis can fully circumvent this bias. However, sensitivity analyses such as restricting the analysis to non-hospitalised patients or providing estimates weighted by exposable time may provide additional information on the potential impact of this bias, as also shown in Immeasurable time bias in observational studies of drug effects on mortality. (Am J Epidemiol. 2008;168(3):329-35).

In cohort studies where a first-line therapy (such as metformin) has been compared with second- or third-line therapies, patients are unlikely to be at the same stage of the disease (e.g., diabetes), which can induce confounding of the association with an outcome (e.g., cancer incidence) by disease duration. An outcome related to the first-line therapy may also be attributed to the second-line therapy if it occurs after a long period of exposure. Such a situation requires matching on disease duration and consideration of latency time windows in the analysis (example drawn from Metformin and the Risk of Cancer. Time-related biases in observational studies. Diabetes Care 2012;35(12):2665-73).

Time-related biases in pharmacoepidemiology (Pharmacoepidemiol Drug Saf. 2020;29(9):1101-10) further discusses several time-related biases and illustrates their impact on the effects of different COPD treatments on lung cancer, acute myocardial infarction and mortality outcomes, in studies using electronic healthcare databases. Protopathic, latency, immortal time, time-window, depletion of susceptibles, and immeasurable time biases were shown to significantly impact the effects of the study drugs on the outcomes.

Mitigating time-related bias at the design stage

Immortal time bias and other time-related biases such as prevalent user bias can be avoided by emulation of a target trial, as this approach aligns assessment of eligibility and baseline information with the start of follow-up (see Chapter 4.2.6).

6.2. Confounding

Confounding occurs when the estimate of a measure of association is distorted by the presence of another risk factor. For a variable to be a confounder, it must be associated with both the exposure and the outcome, without being in the causal pathway.

6.2.1. Confounding by indication

Confounding by indication refers to a determinant of the outcome parameter that is present in people at perceived high risk or poor prognosis and is an indication for intervention. This means that differences in care between the exposed and non-exposed, for example, may partly originate from differences in indication for medical intervention such as the presence of specific risk factors for health problems. Another name for this type of confounding is ‘channeling’. Confounding by severity is a type of confounding by indication, where not only the disease but its severity acts as confounder (see Confounding by Indication: An Example of Variation in the Use of Epidemiologic Terminology, Am J Epidemiol. 1999;149(11):981-3).

This type of confounding has frequently been reported in studies evaluating the efficacy of pharmaceutical interventions and is almost always encountered to varying extents in pharmacoepidemiological studies. A good example can be found in Confounding and indication for treatment in evaluation of drug treatment for hypertension (BMJ. 1997;315:1151-4).

With the more recent application of pharmacoepidemiological methods to assess effectiveness, confounding by indication is a greater challenge, and the article Approaches to combat with confounding by indication in observational studies of intended drug effects (Pharmacoepidemiol Drug Saf. 2003;12(7):551-8) focuses on its possible reduction in studies of intended effects. An extensive review of these and other methodological approaches, including their strengths and limitations, is provided in Methods to assess intended effects of drug treatment in observational studies are reviewed (J Clin Epidemiol. 2004;57(12):1223-31).

An example of how results from a sensitivity analysis can differ from the main analysis and point towards confounding by indication is presented in First-dose ChAdOx1 and BNT162b2 COVID-19 vaccines and thrombocytopenic, thromboembolic and hemorrhagic events in Scotland (Nat Med. 2021; 27(7):1290-7), where the authors highlight the possibility of residual confounding by indication and perform a post-hoc self-controlled case series to adjust for time-invariant confounders.

6.2.2. Unmeasured confounding

Complete adjustment for confounders would require detailed information on clinical parameters, lifestyle or over-the-counter medications, which are often not measured in electronic healthcare records, causing residual confounding bias. Using directed acyclic graphs to detect limitations of traditional regression in longitudinal studies (Int J Public Health 2010;55(6):701-3) reviews confounding and intermediate effects in longitudinal data and introduces causal graphs to understand the relationships between the variables in an epidemiological study.

Unmeasured confounding can be adjusted for only through randomisation. When this is not possible, as most often in pharmacoepidemiological studies, the potential impact of residual confounding on the results should be estimated and considered in the discussion.

Sensitivity analysis and external adjustment for unmeasured confounders in epidemiologic database studies of therapeutics (Pharmacoepidemiol Drug Saf. 2006;15(5):291-303) provides a systematic approach to sensitivity analyses to investigate the impact of residual confounding in pharmacoepidemiological studies that use healthcare utilisation databases. In this article, four basic approaches to sensitivity analysis were identified: (1) sensitivity analyses based on an array of informed assumptions; (2) analyses to identify the strength of residual confounding that would be necessary to explain an observed drug-outcome association; (3) external adjustment of a drug-outcome association given additional information on single binary confounders from survey data using algebraic solutions; (4) external adjustment considering the joint distribution of multiple confounders of any distribution from external sources of information using propensity score calibration. The paper concludes that sensitivity analyses and external adjustments can improve our understanding of the effects of drugs in epidemiological database studies. With the availability of easy-to-apply spreadsheets (e.g., at https://www.drugepi.org/dope/software#Sensitivity), sensitivity analyses should be used more frequently, substituting qualitative discussions of residual confounding.
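
For a single binary confounder (approach 3 above), external adjustment reduces to a simple algebraic bias formula. The sketch below is a minimal implementation; all input numbers are purely illustrative.

```python
def externally_adjusted_rr(rr_observed, p_c_exposed, p_c_unexposed, rr_cd):
    """Adjust an observed relative risk for a single binary unmeasured
    confounder C using the standard algebraic bias formula:
        bias = [P(C|E=1)*(RR_CD - 1) + 1] / [P(C|E=0)*(RR_CD - 1) + 1]
        RR_adjusted = RR_observed / bias
    The confounder prevalences and its association with the outcome (RR_CD)
    would come from external information, e.g. survey data."""
    bias = (p_c_exposed * (rr_cd - 1) + 1) / (p_c_unexposed * (rr_cd - 1) + 1)
    return rr_observed / bias

# Hypothetical inputs: observed RR 1.5; confounder prevalence 40% in the
# exposed vs 20% in the unexposed; confounder-outcome RR of 2.
print(f"{externally_adjusted_rr(1.5, 0.4, 0.2, 2.0):.2f}")  # ~1.29
```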

The impact of residual and unmeasured confounding in epidemiologic studies: a simulation study (Am J Epidemiol. 2007;166(6):646–55) considers the extent and patterns of bias in estimates of exposure-outcome associations that can result from residual or unmeasured confounding, when there is no true association between the exposure and the outcome. Another important finding of this study was that when confounding factors (measured or unmeasured) are interrelated (e.g., in situations of confounding by indication), adjustment for a few factors can almost completely eliminate confounding.

6.2.3. Methods to address confounding

Methods to address confounding include case-only designs (see Chapter 4.2.3) and use of an active comparator (see Chapter 4.3.2). Other methods are detailed hereafter.

6.2.3.1. Disease risk scores

An approach to controlling for a large number of confounding variables is to summarise them in a single multivariable confounder score. Stratification by a multivariate confounder score (Am J Epidemiol. 1976;104(6):609-20) shows how control for confounding may be based on stratification by the score. An example is a disease risk score (DRS) that estimates the probability or rate of disease occurrence conditional on being unexposed. The association between exposure and disease is then estimated with adjustment for the disease risk score in place of the individual covariates.
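
The construction can be sketched in a few lines: fit an outcome model among the unexposed only, predict the score for everyone, and adjust the exposure-outcome model for the score. The simulation below is illustrative only (a null treatment effect and two measured confounders are assumed).

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical data: two confounders affect both treatment and outcome;
# the true treatment effect is null.
x1, x2 = rng.normal(size=n), rng.normal(size=n)
treat = rng.binomial(1, 1 / (1 + np.exp(-(0.5 * x1 + 0.5 * x2))))
outcome = rng.binomial(1, 1 / (1 + np.exp(-(-2 + 0.8 * x1 + 0.8 * x2))))

X = sm.add_constant(np.column_stack([x1, x2]))

# 1. Fit the outcome model among the UNEXPOSED only...
drs_model = sm.Logit(outcome[treat == 0], X[treat == 0]).fit(disp=0)
# 2. ...and predict the disease risk score (here on the log-odds scale)
#    for the whole cohort.
drs = X @ drs_model.params

# 3. Estimate the treatment-outcome association adjusting for the DRS in
#    place of the individual covariates (stratification on DRS quantiles
#    is a common alternative).
adj = sm.Logit(outcome, sm.add_constant(np.column_stack([treat, drs]))).fit(disp=0)
print(f"DRS-adjusted OR for treatment: {np.exp(adj.params[1]):.2f}")  # ~1.0
```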

DRSs are however difficult to estimate if outcomes are rare. Use of disease risk scores in pharmacoepidemiologic studies (Stat Methods Med Res. 2009;18(1):67-80) includes a detailed description of their construction and use, a summary of simulation studies comparing their performance to traditional models, a comparison of their utility with that of propensity scores, and some further topics for future research. Disease risk score as a confounder summary method: systematic review and recommendations (Pharmacoepidemiol Drug Saf. 2013;22(2):122-29) examines trends in the use and application of DRS as a confounder summary method and shows that large variation exists, with differences in terminology and methods used.

In Role of disease risk scores in comparative effectiveness research with emerging therapies (Pharmacoepidemiol Drug Saf. 2012;21 Suppl 2:138–47), it is argued that DRS may have a place when studying drugs that are recently introduced to the market. In such situations, as characteristics of users change rapidly, exposure propensity scores may prove highly unstable. DRSs based mostly on biological associations would be more stable. However, DRS models are still sensitive to misspecification as discussed in Adjusting for Confounding in Early Postlaunch Settings: Going Beyond Logistic Regression Models (Epidemiology 2016;27(1):133-42).

6.2.3.2. Propensity scores

Databases used in pharmacoepidemiological studies often include records of prescribed medications and encounters with medical care providers, from which one can construct surrogate measures for both drug exposure and covariates that are potential confounders. It is often possible to track day-by-day changes in these variables. However, while this information can be critical for study success, its volume can pose challenges for statistical analysis.

A propensity score (PS) is analogous to the disease risk score in that it combines a large number of possible confounders into a single variable (the score). The exposure propensity score (EPS) is the conditional probability of exposure to a treatment given observed covariates. In a cohort study, matching or stratifying treated and comparison subjects on EPS tends to balance all of the observed covariates. However, unlike random assignment of treatments, the propensity score may not balance unobserved covariates. Invited Commentary: Propensity Scores (Am J Epidemiol. 1999;150(4):327-33) reviews the uses and limitations of propensity scores and provides a brief outline of the associated statistical theory. The authors present results of adjustment by matching or stratification on the propensity score.

The estimated EPS summarises all measured confounders in a single variable and thus can be used in the analysis, as any other confounder, for matching, stratification, weighting or as a covariate in a regression model to adjust for the measured confounding. A description of these methods can be found in the following articles: An Introduction to Propensity Score Methods for Reducing the Effects of Confounding in Observational Studies (Multivariate Behav Res. 2011;46(3):399-424), Tutorial and Case Study in Propensity Score Analysis: An Application to Estimating the Effect of In-Hospital Smoking Cessation Counseling on Mortality (Multivariate Behav Res. 2011;46(1):119-51) and Moving towards best practice when using inverse probability of treatment weighting (IPTW) using the propensity score to estimate causal treatment effects in observational studies (Stat Med. 2015;34(28):3661-79).
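
A minimal sketch of EPS estimation followed by inverse probability of treatment weighting is given below. It is illustrative only (simulated data, a single measured confounder, stabilised weights); in practice the PS model would include all measured confounders, and robust variance estimation would be needed for correct confidence intervals.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical data: a single measured confounder x of treatment and outcome.
x = rng.normal(size=n)
treat = rng.binomial(1, 1 / (1 + np.exp(-x)))
y = rng.binomial(1, 1 / (1 + np.exp(-(-1 + 1.0 * x + 0.3 * treat))))

# 1. Estimate the propensity score: P(treatment | covariates).
ps = sm.Logit(treat, sm.add_constant(x)).fit(disp=0).predict(sm.add_constant(x))

# 2. Stabilised inverse probability of treatment weights.
p_treat = treat.mean()
w = np.where(treat == 1, p_treat / ps, (1 - p_treat) / (1 - ps))

# 3. Weighted outcome model: the weights create a pseudo-population in
#    which x is balanced between the treatment groups, so the crude
#    weighted contrast estimates the marginal treatment effect.
msm = sm.GLM(y, sm.add_constant(treat), family=sm.families.Binomial(),
             freq_weights=w).fit()
print(f"IPTW-adjusted OR: {np.exp(msm.params[1]):.2f}")
```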

Propensity score matching in cohort studies is frequently done 1:1, which, while allowing for selection of the best match for each member of the exposed cohort, may lead to severe depletion of the study population and the associated lower precision, especially when coupled with trimming. Increasing the matching ratio may increase precision but also negatively affect confounding control. One-to-many propensity score matching in cohort studies (Pharmacoepidemiol Drug Saf. 2012;21(S2):69-80) tests several methods for 1:n propensity score matching in simulation and empirical studies and recommends using a variable ratio that increases precision at a small cost of bias. Matching by propensity score in cohort studies with three treatment groups (Epidemiology 2013;24(3):401-9) develops and tests a 1:1:1 propensity score matching approach offering a way to compare three treatment options.

Use of the EPS for stratification or weighting overcomes the precision-related limitation of matching-based methods, allowing use of a larger proportion of the study population in the analysis. The fine stratification approach is based on defining a large number (e.g., 50 or 100) of EPS strata, as described in A Propensity-score-based Fine Stratification Approach for Confounding Adjustment When Exposure Is Infrequent (Epidemiology 2017;28(2):249-57).

High-dimensional Propensity Score Adjustment in Studies of Treatment Effects Using Healthcare Claims Data (Epidemiology 2009;20(4):512-22) discusses the high dimensional propensity score (hd-PS) model approach. It attempts to empirically identify large numbers of potential confounders in healthcare databases and, by doing so, to extract more information on confounders and proxies. Covariate selection in high-dimensional propensity score analyses of treatment effects in small samples (Am J Epidemiol. 2011;173(12):1404-13) evaluates the relative performance of hd-PS in smaller samples. Confounding adjustment via a semi-automated high-dimensional propensity score algorithm: an application to electronic medical records (Pharmacoepidemiol Drug Saf. 2012;20(8):849-57) evaluates the use of hd-PS in a primary care electronic medical record database. In addition, the article Using high-dimensional propensity scores to automate confounding control in a distributed medical product safety surveillance system (Pharmacoepidemiol Drug Saf. 2012;21(S1):41-9) summarises the application of this method for automating confounding control in sequential cohort studies as applied to safety monitoring systems using healthcare databases and also discusses the strengths and limitations of hd-PS. High-dimensional propensity scores for empirical covariate selection in secondary database studies: Planning, implementation, and reporting (Pharmacoepidemiol Drug Saf. 2023;32(2):93-106) provides an ISPE-endorsed overview of the hd-PS approach and recommendations on the planning, implementation, and reporting of hd-PS used for causal treatment-effect estimations in longitudinal healthcare databases. It contains a checklist with key considerations as a supportive decision tool to aid investigators in the implementation and transparent reporting of hd-PS techniques, and to aid decision-makers unfamiliar with hd-PS in the understanding and interpretation of studies using this approach.

The use of several measures of balance for developing an optimal propensity score model is described in Measuring balance and model selection in propensity score methods (Pharmacoepidemiol Drug Saf. 2011;20(11):1115-29) and further evaluated in Propensity score balance measures in pharmacoepidemiology: a simulation study (Pharmacoepidemiol Drug Saf. 2014;23(8):802-11). In most situations, the standardised difference performs best and is easy to calculate (see Balance measures for propensity score methods: a clinical example on beta-agonist use and the risk of myocardial infarction (Pharmacoepidemiol Drug Saf. 2011;20(11):1130-7) and Reporting of covariate selection and balance assessment in propensity score analysis is suboptimal: a systematic review (J Clin Epidemiol 2015;68(2):112-21)). Metrics for covariate balance in cohort studies of causal effects (Stat Med 2013;33:1685-99) shows in a simulation study that the c-statistics of the PS model after matching and the general weighted difference perform as well as the standardized difference and are preferred when an overall summary measure of balance is requested. Treatment effects in the presence of unmeasured confounding: dealing with observations in the tails of the propensity score distribution--a simulation study (Am J Epidemiol. 2010;172(7):843-54) demonstrates how ‘trimming’ of the propensity score eliminates subjects who are treated contrary to prediction and their exposed/unexposed counterparts, thereby reducing bias by unmeasured confounders.
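
A weighted standardised difference can be computed directly. The helper below is a small sketch (the 0.1 threshold is the conventional rule of thumb from the balance literature); it can be applied before and after matching or weighting, e.g., to the hypothetical IPTW example above.

```python
import numpy as np

def standardized_difference(x, treat, weights=None):
    """Standardised mean difference of covariate x between treatment groups,
    optionally weighted (e.g., by IPTW); values below ~0.1 are usually
    taken to indicate acceptable balance."""
    if weights is None:
        weights = np.ones_like(x, dtype=float)
    x1, x0 = x[treat == 1], x[treat == 0]
    w1, w0 = weights[treat == 1], weights[treat == 0]
    m1, m0 = np.average(x1, weights=w1), np.average(x0, weights=w0)
    v1 = np.average((x1 - m1) ** 2, weights=w1)
    v0 = np.average((x0 - m0) ** 2, weights=w0)
    return (m1 - m0) / np.sqrt((v1 + v0) / 2)

# Continuing the hypothetical IPTW sketch above:
# print(standardized_difference(x, treat))     # crude imbalance
# print(standardized_difference(x, treat, w))  # ~0 after weighting
```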

Performance of propensity score calibration - a simulation study (Am J Epidemiol. 2007;165(10):1110-8) introduces ‘propensity score calibration’ (PSC). This technique combines propensity score matching methods with measurement error regression models to address confounding by variables unobserved in the main study. This is done by using additional covariate measurements observed in a validation study, which is often a subset of the main study.

Principles of variable selection for inclusion in EPS are described, for example, in Variable selection for propensity score models (Am J Epidemiol. 2006;163(12):1149-56) and in Variable selection for propensity score models when estimating treatment effects on multiple outcomes: a simulation study (Pharmacoepidemiol Drug Saf. 2013;22(1):77-85).

Although in most situations, propensity score models, with the possible exception of hd-PS, do not have any advantages over conventional multivariate modelling in terms of adjustment for identified confounders, several other benefits may be derived. Propensity score methods may help to gain insight into determinants of treatment including age, frailty and comorbidity and to identify individuals treated against expectation. A statistical advantage of PS analyses is that if exposure is not infrequent it is possible to adjust for a large number of covariates even if outcomes are rare, a situation often encountered in drug safety research.

An important limitation of PS is that it is not directly amenable to case-control studies. A critical assessment of propensity scores is provided in Propensity scores: from naive enthusiasm to intuitive understanding (Stat Methods Med Res. 2012;21(3):273-93). Semiautomated and machine-learning based approaches to propensity score methods are currently being developed (see Automated data-adaptive analytics for electronic healthcare data to study causal treatment effects (Clin Epidemiol 2018;10:771-88)).

6.2.3.3. Instrumental variables

An instrumental variable (IV) is defined in Instrumental variable methods in comparative safety and effectiveness research (Pharmacoepidemiol Drug Saf. 2010;19(6):537-54) as a factor that is assumed to be related to treatment but is neither directly nor indirectly related to the study outcome. An IV should fulfil three assumptions: (1) it should affect treatment or be associated with treatment by sharing a common cause; (2) it should be a factor that is as good as randomly assigned, so that it is unrelated to patient characteristics; and (3) it should be related to the outcome only through its association with treatment. This article also presents practical guidance on IV analyses in pharmacoepidemiology. The article Instrumental variable methods for causal inference (Stat Med. 2014;33(13):2297-340) is a tutorial, including statistical code for performing IV analysis.

IV analysis is an approach to address uncontrolled confounding in comparative studies. An introduction to instrumental variables for epidemiologists (Int J Epidemiol. 2000;29(4):722-9) presents these methods, illustrated by an application of IV methods to non-parametric adjustment for non-compliance in randomised trials. The author mentions a number of caveats but concludes that IV corrections can be valuable in many situations. A review of IV analysis for observational comparative effectiveness studies suggested that in the large majority of studies in which IV analysis was applied, one of the assumptions could be violated (Potential bias of instrumental variable analyses for observational comparative effectiveness research, Ann Intern Med. 2014;161(2):131-8).
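
The core IV calculation for a binary instrument is the Wald (ratio) estimator. The simulation below is a minimal illustration, not drawn from the cited articles, and assumes a constant treatment effect: an unmeasured confounder biases the naive contrast, while a valid instrument recovers the true effect.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000

# Hypothetical data: u is an unmeasured confounder of treatment and outcome;
# z is a valid binary instrument (e.g., physician preference) affecting the
# outcome only through treatment.
u = rng.normal(size=n)
z = rng.binomial(1, 0.5, n)
treat = rng.binomial(1, 1 / (1 + np.exp(-(-0.5 + 1.5 * z + 1.0 * u))))
y = 0.3 * treat + 1.0 * u + rng.normal(size=n)  # true treatment effect: 0.3

# Naive comparison is confounded by u.
naive = y[treat == 1].mean() - y[treat == 0].mean()

# Wald (ratio) estimator: effect of the instrument on the outcome divided
# by the effect of the instrument on treatment received.
itt = y[z == 1].mean() - y[z == 0].mean()
first_stage = treat[z == 1].mean() - treat[z == 0].mean()

print(f"Naive estimate:   {naive:.2f}")              # biased upward
print(f"Wald IV estimate: {itt / first_stage:.2f}")  # ~0.3
```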

The complexity of the issues associated with confounding by indication, channeling and selective prescribing is explored in Evaluating short-term drug effects using a physician-specific prescribing preference as an instrumental variable (Epidemiology 2006;17(3):268-75). A conventional, adjusted multivariable analysis showed a higher risk of gastrointestinal toxicity for selective COX-2-inhibitors than for traditional NSAIDs, which was at odds with results from clinical trials. However, a physician-level instrumental variable approach (a time-varying estimate of a physician’s relative preference for a given drug, where at least two therapeutic alternatives exist) yielded evidence of a protective effect due to COX-2 exposure, particularly for shorter term drug exposures. Despite the potential benefits of physician-level IVs, their performance can vary across databases and strongly depends on the definition of IV used, as discussed in Evaluating different physician's prescribing preference based instrumental variables in two primary care databases: a study of inhaled long-acting beta2-agonist use and the risk of myocardial infarction (Pharmacoepidemiol Drug Saf. 2016;25 Suppl 1:132-41).

An important limitation of IV analysis is that weak instruments (small association between IV and exposure) lead to decreased statistical efficiency and biased IV estimates as detailed in Instrumental variables: application and limitations (Epidemiology 2006;17:260-7). For example, in the above mentioned study on non-selective NSAIDs and COX-2-inhibitors, the confidence intervals for IV estimates were in the order of five times wider than with conventional analysis. Performance of instrumental variable methods in cohort and nested case-control studies: a simulation study (Pharmacoepidemiol Drug Saf. 2014;23(2):165-77) demonstrates that a stronger IV-exposure association is needed in nested case-control studies compared to cohort studies in order to achieve the same bias reduction. Increasing the number of controls reduces this bias from IV analysis with relatively weak instruments.

Selecting on treatment: a pervasive form of bias in instrumental variable analyses (Am J Epidemiol. 2015;181(3):191-7) warns against bias in IV analysis by including only a subset of possible treatment options.

6.2.3.4. Prior event rate ratios

Another method proposed to control for unmeasured confounding is the Prior Event Rate Ratio (PERR) adjustment method, in which the effect of exposure is estimated using the ratio of rate ratios (RRs) between the exposed and unexposed from periods before and after initiation of a drug exposure, as discussed in Replicated studies of two randomized trials of angiotensin converting enzyme inhibitors: further empiric validation of the ‘prior event rate ratio’ to adjust for unmeasured confounding by indication (Pharmacoepidemiol Drug Saf. 2008;17(7):671-685). For example, when a new drug is launched, direct estimation of the drug’s effect observed in the period after launch is potentially confounded. Differences in event rates in the period before the launch between future users and future non-users may provide a measure of the amount of confounding present. By dividing the effect estimate from the period after launch by the effect estimate obtained in the period before launch, the confounding in the second period can be adjusted for. This method requires that confounding effects are constant over time, that there is no confounder-by-treatment interaction, and that outcomes are non-lethal events.
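
The calculation itself is straightforward. The sketch below uses purely hypothetical event counts and person-time to show how the prior-period rate ratio is divided out.

```python
# Minimal PERR sketch; all numbers are illustrative.
def rate_ratio(events_exp, pt_exp, events_unexp, pt_unexp):
    return (events_exp / pt_exp) / (events_unexp / pt_unexp)

# Period BEFORE initiation (future users vs future non-users): any
# difference here reflects confounding, since no one is yet exposed.
rr_prior = rate_ratio(events_exp=60, pt_exp=1000, events_unexp=40, pt_unexp=1000)

# Period AFTER initiation: a mix of the drug effect and the same confounding.
rr_post = rate_ratio(events_exp=90, pt_exp=1000, events_unexp=40, pt_unexp=1000)

# PERR-adjusted estimate: divide out the confounding measured pre-exposure.
print(f"RR prior: {rr_prior:.2f}, RR post: {rr_post:.2f}, "
      f"PERR-adjusted RR: {rr_post / rr_prior:.2f}")
```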

Performance of prior event rate ratio adjustment method in pharmacoepidemiology: a simulation study (Pharmacoepidemiol Drug Saf. 2015;24(5):468-77) discusses that the PERR adjustment method can help to reduce bias as a result of unmeasured confounding in certain situations, but that theoretical justification of the assumptions should be provided.

6.2.3.5. Handling time-dependent confounding in the analysis

In longitudinal studies, the value of covariates may change and be measured over time. These covariates are time-dependent confounders if they are affected by prior treatment and predict the future treatment decision and future outcome conditional on the past treatment exposure (see Comparison of Statistical Approaches Dealing with Time-dependent Confounding in Drug Effectiveness Studies, Stat Methods Med Res. 2016). Methods for dealing with time-dependent confounding (Stat Med. 2013;32(9):1584-618) provides an overview of how time-dependent confounding can be handled in the analysis of a study. It provides an in-depth discussion of marginal structural models and g-computation.

G-estimation is a method for estimating the joint effects of time-varying treatments using ideas from instrumental variables methods. G-estimation of Causal Effects: Isolated Systolic Hypertension and Cardiovascular Death in the Framingham Heart Study (Am J Epidemiol. 1998;148(4):390-401) demonstrates how the G-estimation procedure allows for appropriate adjustment of the effect of a time-varying exposure in the presence of time-dependent confounders that are themselves influenced by the exposure.

The use of Marginal Structural Models (MSMs) can be an alternative to G-estimation. Marginal Structural Models and Causal Inference in Epidemiology (Epidemiology 2000;11(5):550-60) introduces a class of causal models that allow for improved adjustment for confounding in situations of time-dependent confounding. MSMs have two major advantages over G-estimation. First, although G-estimation is useful for survival time outcomes, continuously measured outcomes and Poisson count outcomes, logistic G-estimation cannot be conveniently used to estimate the effect of treatment on dichotomous outcomes unless the outcome is rare. Second, MSMs resemble standard models, whereas G-estimation does not (see Marginal Structural Models to Estimate the Causal Effect of Zidovudine on the Survival of HIV-Positive Men. Epidemiology 2000;11(5):561-70).
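
A compact sketch of an MSM fitted with stabilised inverse-probability-of-treatment weights is shown below. The data-generating process, variable names and model specifications are all illustrative; a real analysis would follow the modelling steps described in the cited papers.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n, periods = 5000, 3

# Simulate longitudinal data: the time-varying confounder L is affected by
# past treatment A_prev and predicts both future treatment and the outcome.
rows = []
for i in range(n):
    a_prev, l = 0, rng.normal()
    for t in range(periods):
        l = 0.5 * l + 0.4 * a_prev + rng.normal()
        a = rng.binomial(1, 1 / (1 + np.exp(-(-0.5 + 0.8 * l + 0.5 * a_prev))))
        rows.append((i, t, l, a, a_prev))
        a_prev = a
df = pd.DataFrame(rows, columns=["id", "t", "L", "A", "A_prev"])

# End-of-follow-up outcome depends on cumulative treatment and on L.
cum_a = df.groupby("id")["A"].sum()
last_l = df.groupby("id")["L"].last()
y = rng.binomial(1, 1 / (1 + np.exp(-(-1 + 0.3 * cum_a + 0.5 * last_l))))

# Stabilised weights: numerator models treatment given treatment history
# only; denominator additionally conditions on the time-varying confounder.
num = smf.logit("A ~ A_prev + C(t)", data=df).fit(disp=0).predict(df)
den = smf.logit("A ~ A_prev + L + C(t)", data=df).fit(disp=0).predict(df)
df["ratio"] = np.where(df["A"] == 1, num / den, (1 - num) / (1 - den))
sw = df.groupby("id")["ratio"].prod()

# Weighted ('marginal structural') outcome model for cumulative exposure;
# robust variance estimation would be needed for valid confidence intervals.
msm = sm.GLM(y, sm.add_constant(cum_a), family=sm.families.Binomial(),
             freq_weights=sw).fit()
print(f"OR per treated period: {np.exp(msm.params.iloc[1]):.2f}")
```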

Effect of highly active antiretroviral therapy on time to acquired immunodeficiency syndrome or death using marginal structural models (Am J Epidemiol. 2003;158(7):687-94) provides a clear example in which standard Cox analysis failed to detect a clinically meaningful net benefit of treatment because it does not appropriately adjust for time-dependent covariates that are simultaneously confounders and intermediate variables. This net benefit was shown using a marginal structural survival model. In Time-dependent propensity score and collider-stratification bias: an example of beta2-agonist use and the risk of coronary heart disease (Eur J Epidemiol. 2013;28(4):291-9), various methods to control for time-dependent confounding are compared in an empirical study on the association between inhaled beta-2-agonists and the risk of coronary heart disease. MSMs resulted in slightly reduced associations compared to standard Cox-regression.

6.2.3.6. The trend-in-trend design

The Trend-in-trend Research Design for Causal Inference (Epidemiology 2017;28(4):529-36) presents a semi-ecological design, whereby trends in exposure and in outcome rates are compared in subsets of the population that have different rates of uptake for the drug in question. These subsets are identified through PS modelling. There is a formal framework for transforming the observed trends into an effect estimate. Simulation and empirical studies showed the design to be less statistically efficient than a cohort study, but more resistant to confounding. The trend-in-trend method may be useful in settings where there is a strong time trend in exposure, such as a newly approved drug.

6.3. Missing data

Missing data (or missing values) are defined as data values that are not available for a variable in the data source of interest for a given analysis, and hence are not observed. Missing data may arise from attrition, non-response or poorly designed protocols. Missing data introduce error, as the available data may no longer represent the true values of what was set out to be measured.

6.3.1. Impact of missing data

Missing data are a common issue in both clinical trial and observational data, and can have significant consequences on the conclusions that can be drawn from the results of an analysis for the following reasons: 1) the absence of data reduces statistical power, which refers to the probability that the test will reject the null hypothesis when it is false; 2) the unobservable data can introduce bias and increase uncertainty in the estimation of the model parameters; 3) it can reduce the representativeness of the sample; 4) it may complicate the analyses as it may render the completeness of data different between variables. Each of these elements can lead to invalid conclusions. Whether these issues are applicable to the dataset under study depends on the type of missing data (i.e., missing data mechanism).

6.3.2. Missing data mechanisms

When missing data are present, choosing the right statistical methods and making inferences become more complex, as assumptions about the processes that created the missing data need to be made explicit.

Missing data assumptions are classified into 3 categories, depending on the relationship between the unobserved values and the probability of missingness (a simple simulation contrasting the three mechanisms is shown after this list):

  • Missing completely at random (MCAR): there are no systematic differences between the distribution of the missing values and the observed values. Missingness is unrelated to any variable in the analysis, including the variable with missing data itself. This is the most restrictive mechanism, but rather unrealistic.

  • Missing at random (MAR): any systematic difference between the missing and observed values for a given variable can be explained by differences in other variables of the observed data. Missingness is associated with those variables, but not with the variable with missing data itself. This mechanism may be more realistic in some real-world settings.

  • Missing not at random (MNAR): even after the observed data are taken into account, systematic differences remain between the missing values and the observed values. Missingness depends on the unobserved values of the variable with missing data itself.
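
The three mechanisms can be contrasted with a small simulation (illustrative only). Missingness in y is generated in three ways; a simple complete-case mean is unbiased only under MCAR.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 10_000

# Hypothetical complete data: y depends on the observed covariate x.
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)

# MCAR: missingness in y is unrelated to anything.
mcar = rng.random(n) < 0.3
# MAR: missingness in y depends only on the observed covariate x.
mar = rng.random(n) < 1 / (1 + np.exp(-x))
# MNAR: missingness in y depends on the (unobserved) value of y itself.
mnar = rng.random(n) < 1 / (1 + np.exp(-y))

for name, miss in [("MCAR", mcar), ("MAR", mar), ("MNAR", mnar)]:
    print(f"{name}: mean of observed y = {y[~miss].mean():+.2f} "
          f"(true mean {y.mean():+.2f})")
```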

Assumptions on missing data mechanisms determine the type of analysis that is possible. In general, it is not possible to distinguish between these 3 mechanisms based on the observed data alone. In other words, missing data assumptions in general cannot be tested or verified. The distinction between MCAR and MAR could be made based on the observed data, but subject matter expertise and knowledge about the data collection process are needed to justify the assumption of data being MCAR or MAR. It is, however, not feasible to assess MAR versus MNAR based on the observed data.

6.3.3. Methods for handling missing data

Some simple solutions exist, but they generally lead to misleading inferences if the underlying assumptions on the mechanisms of missingness are not valid, and they should be avoided. Examples include single imputation methods such as carrying forward the last observation in longitudinal analyses or mean substitution. Complete case analysis (CCA), i.e., removing all records with missing data, is only valid in certain circumstances, e.g., if the missing data are MCAR. Even in these circumstances, CCA will result in loss of power and increased uncertainty in the estimated parameters.

Therefore, it is advised to use other statistical methods to handle missing data, such as multiple imputation (Multiple Imputation and its Application, Wiley 2013, ISBN:9780470740521) or inverse probability weighting (Review of inverse probability weighting for dealing with missing data, Statistical Methods in Medical Research 2013;22:278-95). The choice of such statistical methods will depend on the assumed missing data mechanism.

If the missing data can be assumed to be MCAR or MAR, multiple imputation (MI) using the Fully Conditional Specification (FCS), described in Flexible Imputation of Missing Data (Van Buuren S. 2nd ed. Chapman and Hall/CRC 2018, 10.1201/9780429492259), is a commonly used approach. MI utilises the observed data to predict the values of missing data points, generating multiple complete data sets, performing analyses on each imputed data set, and then combining the results.
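
A minimal FCS/MICE example using the statsmodels implementation is sketched below (simulated MAR data; the formula and settings are illustrative). MICEData builds the chained-equation imputations, and MICE fits the analysis model on each completed data set, combining the results with Rubin's rules.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.imputation import mice

rng = np.random.default_rng(7)
n = 2_000

# Hypothetical data with missingness in y depending only on x1 (MAR).
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 0.5 * x1 + 0.5 * x2 + rng.normal(size=n)
y[rng.random(n) < 1 / (1 + np.exp(-x1))] = np.nan

df = pd.DataFrame({"y": y, "x1": x1, "x2": x2})

# Fully conditional specification: each incomplete variable is imputed from
# a model conditional on the others; the analysis model is then fitted on
# each imputed data set and the results are pooled.
imp = mice.MICEData(df)
fit = mice.MICE("y ~ x1 + x2", sm.OLS, imp).fit(n_burnin=10, n_imputations=20)
print(fit.summary())
```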

If the missing data are assumed to be MNAR, most common statistical analysis methods are not appropriate, and would lead to biased results. There are methods to handle MNAR data, which depend on different assumptions or incorporate more specific knowledge about the missingness mechanism. One example is the not-at-random fully conditional specification (NARFCS) as described in On the use of the not-at-random fully conditional specification (NARFCS) procedure in practice (Stat Med. 2018, 37(15): 2338–53, 10.1002/sim.7643).

MI methods such as Pattern Mixture Models can be used to implement any missing data assumption (Multiple Imputation and its Application, Wiley 2013, ISBN:9780470740521).

It is important, as explained in The proportion of missing data should not be used to guide decisions on multiple imputation (J Clin Epidemiol. 2019;110:63-73), that the choice of MI method is not driven by the amount of missing data. In general, it is desirable to understand how sensitive the conclusions drawn from the data are to the missing data assumptions, as well as to the particular method used to handle missing values. To investigate this, it is helpful to perform sensitivity analyses exploring how inferences vary under various mechanism assumptions and under various approaches.

A practice sometimes used is to create a category of the variable, or an indicator, for the missing values; however, this should be avoided. This practice can be invalid even if the data are missing completely at random, see Indicator and Stratification Methods for Missing Explanatory Variables in Multiple Linear Regression (J Am Stat Assoc. 1996;91(433):222-30) and Missing data in epidemiological studies (In Armitage P, Colton T, eds. Encyclopedia of biostatistics. Wiley, 1998: 2641-2654.).

A concise review of methods to handle missing data is provided in the book Statistical analysis with missing data (Little RJA, Rubin DB. 3rd ed., Wiley 2019). The section ‘Handling of missing values’ in Modern Epidemiology, 4th ed. (T. Lash, T. VanderWeele, S. Haneuse, K. Rothman. Wolters Kluwer, 2020) is a summary of the state of the art, focused on practical issues for epidemiologists.

Other useful references on handling missing data include the books Multiple Imputation for Nonresponse in Surveys (Rubin DB, Wiley, 2004) and Analysis of Incomplete Multivariate Data (Schafer JL, Chapman & Hall/CRC, 1997), and the articles A comparison of multiple imputation methods for missing data in longitudinal studies (BMC Med Res Methodol. 2018;18(1):168), Using the outcome for imputation of missing predictor values was preferred (J Clin Epi. 2006;59(10):1092-101), and Evaluation of two-fold fully conditional specification multiple imputation for longitudinal electronic health record data (Stat Med. 2014;33(21):3725-37).

The article Framework for the treatment and reporting of missing data in observational studies: The Treatment and Reporting of Missing data in Observational Studies framework (J Clin Epi. 2021;134:79-88) focuses on missing data in non-interventional studies and provides a framework on both analysis and reporting of study results relying on incomplete data.

6.3.4. Statistical software

Many statistical procedures in standard software automatically eliminate subjects with missing data. However, a wide range of statistical software is currently available to impute missing data, mainly focusing on multiple imputation (MI) methods when missing data are assumed to be MAR, such as the MI procedure in SAS, Multiple imputation of missing values in Stata (Stata J. 2004;4:227-41), and mice: Multivariate Imputation by Chained Equations in R (J Stat Soft. 2011;45(3)). A good overview of available software packages is provided in Missing data: A statistical framework for practice (Biom J. 2021;63(5):915-47). Software tools in SAS and R for multiple imputation of missing data under MAR and MNAR have also been made available by the Drug Information Association Scientific Working Group on Estimands and Missing Data.

6.4. Triangulation

Triangulation is not a separate methodological approach, but rather a research paradigm aiming to enhance the confidence in inferred causal relationships. Triangulation in aetiological epidemiology (Int J Epidemiol. 2016;45(6):1866-86) defines triangulation as “the practice of obtaining more reliable answers to research questions through integrating results from several different approaches, where each approach has different key sources of potential bias that are unrelated to each other.” Triangulation differs from replication by explicitly choosing data sources/data collection approaches, study designs and/or analytical approaches with different bias structures.

In Triangulation of pharmacoepidemiology and laboratory science to tackle otic quinolone safety (Basic Clin Pharmacol Toxicol. 2022;Suppl 1:75-80), laboratory studies using cell culture and rodent models were complemented with real-world data from pharmacoepidemiological studies to translate mechanistic findings and corroborate real-world evidence. In Identifying Antidepressants Less Likely to Cause Hyponatremia: Triangulation of Retrospective Cohort, Disproportionality, and Pharmacodynamic Studies (Clin Pharmacol Ther. 2022; 111(6):1258-67), analyses of three different types of data with their respective analyses are presented: a retrospective cohort study, a disproportionality analysis of patients in the Japanese Adverse Drug Event Report database, and a pharmacodynamic study examining the binding affinity for serotonin transporter.

Triangulation does not require the use of different data sources and can readily be employed in studies using electronic healthcare data, which allow investigators to use a multitude of study designs and analytical approaches. For example, in Prenatal Antidepressant Exposure and the Risk of Attention-deficit/Hyperactivity Disorder in Childhood: A Cohort Study With Triangulation (Epidemiology. 2022;33(4):581-592), a negative control analysis, a sibling analysis, and a former-user analysis were used to triangulate results.

In recent years, the use of genetic tools has become popular for the investigation of drug effects. The complementary application of drug target mendelian randomisation and colocalisation analyses can provide another layer of genetic evidence for causality, as demonstrated by Genetically proxied therapeutic inhibition of antihypertensive drug targets and risk of common cancers: A mendelian randomization analysis (PLoS Med. 2022 Feb 3;19(2):e1003897). It is recommended to formalise sensitivity analyses through a priori specification of the potential biases in the main analysis and their (assumed) directions, and to perform sensitivity/triangulation analyses that explicitly address these biases.