European Network of Centres for Pharmacoepidemiology and Pharmacovigilance

Chapter 3: Development of the study protocol

The study protocol is the core document of a study and should be developed once the research question has been clearly defined. It is strongly recommended to assess the feasibility of answering the research question before developing the protocol (see Chapter 2). The final version must precisely describe all study objectives and design characteristics to ensure reproducibility of the study. The protocol should be amended as needed, and amendments should be justified.

GVP Module VIII - Post-authorisation safety studies (PASS) has been available since 2012. For PASS described in this module, the Commission Implementing Regulation (EU) No 520/2012 provides legal definitions of the start of data collection (the date from which information on the first study subject is first recorded in the study dataset, or, in the case of secondary use of data, the date from which data extraction starts) and the end of data collection (the date from which the analytical dataset is completely available). These dates provide a timeline supporting the planning of the overall study and the submission of the final study report to competent authorities. Module VIII of the GVP also details the required format of protocols, abstracts and final study reports for imposed PASS. Based on these requirements, the EMA published detailed templates for the protocol and final study report, which it recommends using for all PASS, including meta-analyses and systematic reviews. Although these templates were developed to address research questions related to the safety of medicinal products, they can be applied to any type of pharmacoepidemiological study.

Derived from an international consensus, the HARPER protocol template (HARmonized Protocol Template to Enhance Reproducibility of hypothesis evaluating real-world evidence studies on treatment effects: A good practices report of a joint ISPE/ISPOR task force, Pharmacoepidemiol Drug Saf. 2023;32(1):44-55) became available in 2023 to guide the structure and content of real-world evidence (RWE) study protocols. It focuses on the operational study parameters used to create analytical datasets from the data collected to address the study objectives. It can serve as a tool to promote transparency, reproducibility and harmonisation of non-interventional study protocols and can facilitate the design and assessment of high-quality protocols. HARPER is compatible with the legal format and content of GVP Module VIII and can be used in PASS protocols (or the protocol of any pharmacoepidemiological study) without change of structure.

The ISPE Guidelines for Good Pharmacoepidemiology Practices (GPP) provides guidance on the content of a pharmacoepidemiology study protocol and the different elements to be covered. It states that the protocol should include a description of the data quality and integrity, including abstraction of original documents, extent of source data verification, and validation of endpoints. The FDA’s Best Practices for Conducting and Reporting Pharmacoepidemiologic Safety Studies Using Electronic Health Care Data Sets includes a description of the design elements that should be addressed, including the choice of data sources and study populations, the study design and statistical analyses. The ENCePP Checklist for Study Protocols seeks to stimulate researchers to consider important epidemiological concepts when designing a pharmacoepidemiological study and writing a study protocol. The Agency for Healthcare Research and Quality (AHRQ) published Developing a Protocol for Observational Comparative Effectiveness Research: A User’s Guide (2013), which includes best practices, principles and checklists on a wide range of topics that are also applicable to observational studies outside the scope of comparative effectiveness research.

A key component of design transparency, also included in the HARPER template, is visualisation through study design diagrams. Graphical Depiction of Longitudinal Study Designs in Health Care Databases (Ann Intern Med. 2019;170(6):398-406) provides a simple framework to help understand how the design will be implemented, especially in relation to the definition of time periods for data collection. Such graphical frameworks for presenting study designs in the protocol are recommended to foster transparency, enhance understanding of the design, and support the evaluation of protocols and the interpretation of study results, as also illustrated in A Framework for Visualizing Study Designs and Data Observability in Electronic Health Record Data (Clin Epidemiol. 2022;14:601-8) and Visualizations throughout pharmacoepidemiology study planning, implementation, and reporting (Pharmacoepidemiol Drug Saf. 2022;31(11):1140-52).
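
As a rough illustration of the idea behind such diagrams, the sketch below defines assessment windows in days relative to the index date (day 0) and prints them on a shared time axis. It is a minimal Python sketch and not the notation of the cited frameworks: the `Window` class, the `render` function, the window labels and all day ranges are hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """One assessment window, in days relative to the index date (day 0)."""
    label: str
    start: int  # first day of the window, inclusive
    end: int    # last day of the window, inclusive


# Hypothetical windows; a real protocol would justify each boundary.
WINDOWS = [
    Window("Washout (no exposure)", -365, -1),
    Window("Covariate assessment", -180, -1),
    Window("Index date (time zero)", 0, 0),
    Window("Follow-up", 1, 365),
]


def render(windows, lo, hi, scale=30):
    """Return one text line per window on an axis running from day lo to day hi.

    Each character covers `scale` days, so adjacent windows may share a column.
    """
    width = (hi - lo) // scale + 1
    lines = []
    for w in windows:
        first = (max(w.start, lo) - lo) // scale
        last = (min(w.end, hi) - lo) // scale
        bar = " " * first + "#" * (last - first + 1)
        lines.append(f"{w.label:<22}|{bar.ljust(width)}| days [{w.start}, {w.end}]")
    return lines


if __name__ == "__main__":
    print("\n".join(render(WINDOWS, lo=-365, hi=365)))
```

Writing the windows down as explicit day ranges anchored on time zero makes it easier to check that, for example, covariates are assessed before follow-up starts.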

For the consent process and ethical guidelines related to human subject research, see Chapter 15.2. HARPER also provides considerations on the protection of human subjects based on the GDPR and the use of anonymised or pseudonymised data sources.

GVP Module VIII - Post-authorisation safety studies provides a structure for study protocols, which should cover at least the following aspects:

  • The research question that the study is designed to answer, which might be purely descriptive, exploratory or explanatory (hypothesis-driven) (see Chapter 2). The protocol should include a background description that explains the rationale (scientific, regulatory, etc.) and current knowledge on the research question. It will also explain the context of the research question, including what data are currently available and how these data can or cannot contribute to answering the question. The context will also be defined in terms of what information sources can be used to generate appropriate data and how the proposed study methodology will be shaped around these data.

  • The main study objective and possible secondary objectives, which are operational definitions of the research question. In defining secondary objectives, consideration could be given to time and cost, which may impose constraints and choices, for example in terms of feasibility, sample size, duration of follow-up, sensitivity analyses, or data collection (see Chapter 2). 

  • The source and study population to be used to answer the research question. The protocol should describe whether this population is already identified, and whether data are already available (secondary data collection) or whether data need to be generated de novo (primary data collection). The boundaries of the desired population will be defined, including inclusion/exclusion criteria, timelines (such as index dates for inclusion in the study) and any exposure or events defining the population.

  • Exposures of interest that need to be pre-specified and defined, including duration and intensity of exposure, source of data and methods of ascertainment (see Chapter 4.3). 

  • Outcomes of interest that need to be pre-specified and defined, including data sources, operational definitions and methods of ascertainment such as data elements in field studies or appropriate codes in database studies (see Chapter 4.3). 

  • Adverse events/reactions that will or will not be collected and reported and the procedures put in place for this purpose. In the EU, the collection and reporting of adverse events or reactions by companies sponsoring a post-authorisation study should follow the recommendations specified in Module VI of the Guideline on good pharmacovigilance practices (GVP) - Management and reporting of adverse reactions to medicinal products. If the study qualifies as an interventional trial, the reporting criteria laid down in the Clinical Trials Regulation (EU) No 536/2014 and the draft Volume 10 - Guidance documents applying to clinical trials should be followed.

  • The covariates and potential confounders that need to be pre-specified and defined, including how they will be measured (see Chapter 4.3). 

  • The statistical analysis plan, including statistical methods and software used, adjustment strategies, and how the results are going to be presented (see Chapter 11). 

  • The identification of potential biases and the ways in which they will be minimised (see Chapter 5).

  • Major assumptions, critical uncertainties and challenges in the design, conduct and interpretation of the results of the study given the research question and the data used. 

  • Ethical considerations, as described in Chapter 14. 

  • The way in which the results will be interpreted, avoiding misuse of p-values and statistical significance (see Chapter 4.1).
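
To illustrate how pre-specified inclusion/exclusion criteria and index dates translate into an operational definition of the study population, the following minimal Python sketch applies an ordered list of criteria to a toy source population and records the number of patients remaining after each step. Everything in it is an assumption made for the example: the study period, the age and enrolment thresholds, the field names and the four patient records are hypothetical, and the index date is taken to be the first recorded prescription.

```python
from datetime import date, timedelta

# Assumed study parameters (hypothetical values for illustration only).
STUDY_START = date(2016, 1, 1)
STUDY_END = date(2020, 12, 31)
MIN_AGE = 18
MIN_LOOKBACK = timedelta(days=365)

# Toy source population; "first_rx" serves as the index date.
PATIENTS = [
    {"id": "A", "birth": date(1950, 5, 1), "enrolled": date(2014, 1, 1), "first_rx": date(2018, 3, 10)},
    {"id": "B", "birth": date(2005, 7, 15), "enrolled": date(2014, 1, 1), "first_rx": date(2019, 1, 20)},
    {"id": "C", "birth": date(1962, 2, 2), "enrolled": date(2018, 1, 1), "first_rx": date(2018, 6, 1)},
    {"id": "D", "birth": date(1971, 9, 9), "enrolled": date(2012, 1, 1), "first_rx": date(2015, 11, 30)},
]


def age_on(day, birth):
    """Age in completed years on a given day."""
    return day.year - birth.year - ((day.month, day.day) < (birth.month, birth.day))


# Criteria are applied in this order; each entry is (label, keep-function).
CRITERIA = [
    ("index date within study period", lambda p: STUDY_START <= p["first_rx"] <= STUDY_END),
    ("age >= 18 at index date", lambda p: age_on(p["first_rx"], p["birth"]) >= MIN_AGE),
    ("365 days enrolment before index", lambda p: p["first_rx"] - p["enrolled"] >= MIN_LOOKBACK),
]


def build_cohort(patients):
    """Return the final cohort and the attrition count after each criterion."""
    attrition = [("source population", len(patients))]
    remaining = list(patients)
    for label, keep in CRITERIA:
        remaining = [p for p in remaining if keep(p)]
        attrition.append((label, len(remaining)))
    return remaining, attrition


if __name__ == "__main__":
    cohort, attrition = build_cohort(PATIENTS)
    for label, n in attrition:
        print(f"{n:>3}  {label}")
```

With these toy records the attrition counts are 4, 3, 2 and 1, leaving only patient A in the cohort. Keeping the criteria in one ordered list means the same definition drives both the cohort selection and the attrition table reported with the results.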

The HARPER template also recommends including in the protocol:

  • A rationale, context and table for choices relating to the selection of time zero and inclusion/exclusion criteria.

  • Structured tables for exposure, outcome, follow-up and covariates, as well as validation, with a description of algorithms used for data collection.

  • An evaluation of the fitness for purpose of the data source(s) used.

  • A structured table detailing all sensitivity analyses.
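
One way to keep such structured tables consistent is to hold the operational definitions in a machine-readable form and generate the protocol table from it. The Python sketch below is only an illustration of that idea: the column set in `VariableSpec`, the example variables and the placeholder texts are hypothetical and are not the columns prescribed by the HARPER template.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class VariableSpec:
    """One row of a hypothetical operational-definition table."""
    role: str        # exposure, outcome, covariate or follow-up
    name: str
    window: str      # assessment window in days relative to time zero
    algorithm: str   # how the variable is derived from the data
    validation: str  # validation status; empty if not yet documented


# Placeholder entries; real protocols would reference actual code lists.
SPECS = [
    VariableSpec("exposure", "Drug X initiation", "[0, 0]",
                 "first dispensing, code list in annex", "not applicable"),
    VariableSpec("outcome", "Event Y", "[1, 365]",
                 "inpatient primary diagnosis, code list in annex", ""),
    VariableSpec("covariate", "Condition Z", "[-365, -1]",
                 "", "chart review planned"),
]


def as_table(specs):
    """Render the specifications as a plain-text table with aligned columns."""
    cols = [f.name for f in fields(VariableSpec)]
    rows = [cols] + [[getattr(s, c) for c in cols] for s in specs]
    widths = [max(len(r[i]) for r in rows) for i in range(len(cols))]
    return "\n".join(
        " | ".join(cell.ljust(w) for cell, w in zip(r, widths)) for r in rows
    )


def gaps(specs):
    """List (variable name, column) pairs that are still empty."""
    return [
        (s.name, f.name)
        for s in specs
        for f in fields(s)
        if not getattr(s, f.name).strip()
    ]


if __name__ == "__main__":
    print(as_table(SPECS))
    print("Incomplete cells:", gaps(SPECS))
```

A completeness check such as `gaps` can flag cells that remain empty before the protocol is finalised; here it reports the missing validation entry for "Event Y" and the missing algorithm for "Condition Z".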

Various data collection forms, including the Case Report Form (CRF) for primary data collection and lists of disease codes or descriptions of the data elements for secondary data collection, may be appended to the protocol to provide an exact representation of how the data will be collected. The study protocol could include a section specifying how the CRF will be piloted, tested and finalised. Amendments to final CRFs should be justified. For field studies, physician or patient forms could be included depending on the data collection methodology. Other forms may be included as needed, such as patient information, consent forms or patient-oriented summaries.

Registration of the study protocol before the start of data collection provides information to other researchers about the study, improves transparency and, especially for studies based on secondary use of data, provides assurance that the stated hypotheses have not been influenced by the results. The Catalogue of RWD studies is a public register open to any researcher for the registration of non-interventional studies. In addition, study protocols can be registered and posted on other platforms: ClinicalTrials.gov now includes specific guidelines for the posting of non-interventional research, and since 2020 the Open Science Framework has had a specific registration portal for observational studies.