This six-month series of articles from the Joanna Briggs Institute (JBI) has led the reader through the rigorous process of conducting a systematic review. The first article (published in March) summarized the systematic review as a scientific exercise, one affecting health care and health policy. Subsequent articles covered devising a review question and a search strategy and appraising and extracting data from studies found in the search. In this sixth and final article, we will focus on writing the results and discussion sections, where most clinicians turn when seeking guidance from a systematic review.
Readers of systematic reviews may be patients, clinicians, administrators, or policymakers, and they vary widely in their familiarity with research terminology. Reviewers should therefore make every effort to help readers make the best use of the results in myriad cultural contexts and settings. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement has become the international standard for reporting systematic reviews.1 A central principle of the PRISMA statement is that the language be plain and transparent. Indeed, the Cochrane Collaboration requires authors to begin a systematic review with a "plain language summary."2
Recommendations based on a systematic review must be made with care, because they will be used to inform patient care in a variety of ways: through clinical guidelines, clinical pathways, protocols, and policies. The reader must understand both the strengths and the limitations of the available research on the question at hand. When you begin interpreting the results of the systematic review, with or without a meta-analysis, you're attempting to answer three questions3: What is already known to guide practice? What isn't known (that is, what gaps in the knowledge can you identify)? And what future research priorities should be explored?
Reporting on the Results of a Systematic Review
Study selection. In answering the first question (What is already known to guide practice?), you'll discuss the search criteria and the selection of included studies, best presented in a flow diagram. Such diagrams were introduced by the QUOROM (Quality of Reporting of Meta-analyses) statement, the predecessor of PRISMA. Many peer-reviewed journals will not accept a systematic review for publication unless it includes a flow diagram.
The PRISMA statement has established that the flow diagram must include the number of1
* unique records identified by the searches.
* records excluded after preliminary screening (such as screening of titles and abstracts).
* records retrieved in full text.
* records or studies excluded after assessment of the full text (include brief description of reasons).
* studies meeting the eligibility criteria for the review (thus contributing to qualitative synthesis).
* studies contributing to the main outcome.
From the diagram, a reader can gain an impression of the scope of the research found and its relevance to the question at hand. See Figure 1 for an example of a flow diagram.
Figure 1. An example of a flow diagram (numbers for illustration purposes only).
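Because each box in a flow diagram is derived from the one above it, the counts should reconcile. The Python sketch below checks that arithmetic; `FlowCounts` and `check_flow` are hypothetical helpers, not part of PRISMA or any review software. It assumes one record per study (in practice several reports may describe a single study) and that every record selected for full-text review was retrieved. All numbers are invented.

```python
from dataclasses import dataclass


@dataclass
class FlowCounts:
    """Counts for the flow diagram boxes (hypothetical structure)."""
    identified: int                        # unique records identified by the searches
    excluded_at_screening: int             # excluded on title and abstract screening
    full_text_retrieved: int               # records retrieved in full text
    excluded_at_full_text: dict[str, int]  # reason -> records excluded after full-text review
    included: int                          # studies meeting the eligibility criteria
    in_main_outcome: int                   # studies contributing to the main outcome


def check_flow(c: FlowCounts) -> list[str]:
    """Describe every box that does not reconcile; an empty list means consistent."""
    problems = []
    if c.identified - c.excluded_at_screening != c.full_text_retrieved:
        problems.append("identified minus screening exclusions does not equal full texts retrieved")
    if c.full_text_retrieved - sum(c.excluded_at_full_text.values()) != c.included:
        problems.append("full texts minus full-text exclusions does not equal included studies")
    if c.in_main_outcome > c.included:
        problems.append("more studies contribute to the main outcome than were included")
    return problems


# Invented numbers, for illustration only
counts = FlowCounts(
    identified=1200,
    excluded_at_screening=1140,
    full_text_retrieved=60,
    excluded_at_full_text={"wrong population": 20, "not randomized": 25, "no usable outcome data": 5},
    included=10,
    in_main_outcome=8,
)
print(check_flow(counts) or "flow diagram counts reconcile")
```

Recording exclusion reasons as a mapping, rather than a single total, also gives you the "brief description of reasons" that the full-text box calls for.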
The flow diagram also details the process by which each study was included or excluded. It's therefore important that you explain the specific results and details of the selection process in the review.1, 2 This transparency adds to the credibility of the review's outcomes. By stating where you found most of the included studies (whether through electronic searches of bibliographic databases, hand searches of journals and reference lists, or contact with researchers themselves), you're providing important information on both selection bias and publication bias. Selection bias refers to systematic differences between the baseline characteristics of the groups being compared, which random allocation of participants to intervention and control groups is meant to prevent.2 Publication bias refers to the publication or nonpublication of research findings depending on the nature and direction of the results. For example, historically, both authors and editorial teams have preferred to report only the positive outcomes of trials1, 2; non-English-speaking authors have also experienced difficulty in the past in having their manuscripts accepted for publication in leading peer-reviewed journals. Both of these can result in significant publication bias, one form of the broader category of reporting bias.2
If most of the included studies were found outside the main subject-matter databases, alert the reader to the potential for publication bias or citation bias (which, according to the Cochrane handbook, occurs when researchers "advocate their own opinions and use the literature to justify their point of view"2). Keep in mind that even the most comprehensive search of both indexed and gray literature won't include all relevant research, since only a small proportion of research projects ever reach publication.1-3 Inform the reader of any barriers you encountered in the search and their impact on your final results.
In a 2013 systematic review, Wong and colleagues examined whether giving paracetamol (acetaminophen) and ibuprofen together or alternating them is more effective than giving either alone in lowering fever and reducing discomfort in children.4 The review included children with a new fever and excluded those with an injury or recent surgery. Throughout this article, we will excerpt from that review to illustrate how best to report aspects of the review process. For example, in the following excerpt Wong and colleagues reported the results of their search4:
"The search strategy identified 3649 citations from electronic databases…. After screening titles and abstracts, 53 studies were assessed to be potentially relevant. Ten additional studies were identified for further examination after hand-searching abstracts from the Pediatric Academic Society conference proceedings, but none met the inclusion criteria. No additional studies were identified for further examination after contact with experts or hand-searching reference lists from previous systematic reviews and included studies."
Included studies. Once you've described the details of the search, describe the included studies. Ideally, you'll present study details according to the PICO mnemonic: Population, Intervention, Comparison intervention, and Outcome measures. Typically, you'll present this information in tables, which let the reader see at a glance any important between-study differences, such as in methods of treatment administration, outcome measurement, or reporting of missing data (you should never make assumptions about the results of missing data). In addition, you'll include a narrative description of the included studies for each of the PICO categories, as Wong and colleagues did in this excerpt on interventions4:
"In all six studies, antipyretic medication was administered orally. Five studies used a paracetamol dose of 15 mg/kg orally… and one study used a loading dose of paracetamol of 25 mg/kg with subsequent doses of 12.5 mg/kg…. Four studies used an ibuprofen dose of 10 mg/kg… one study used an ibuprofen dose of 5 mg/kg… and one study used an ibuprofen loading dose of 10 mg/kg with subsequent doses of 5 mg/kg."
If you contacted study authors for additional data, you'll also need to include when and how you did this.
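A characteristics-of-included-studies table can also be kept as structured data, which makes between-study differences easy to flag before you write the narrative. The Python sketch below is illustrative only: the `PICORow` layout and the study labels are hypothetical, and the pairing of doses within each row is invented, although the individual doses echo those quoted from Wong and colleagues above.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class PICORow:
    """One row of a characteristics-of-included-studies table (hypothetical layout)."""
    study: str
    population: str
    intervention: str
    comparison: str
    outcome: str


def differing_elements(rows: list[PICORow]) -> list[str]:
    """Return the PICO elements whose values are not identical across all studies."""
    elements = [f.name for f in fields(PICORow) if f.name != "study"]
    return [e for e in elements if len({getattr(r, e) for r in rows}) > 1]


# Hypothetical studies; dose pairings are invented for illustration
rows = [
    PICORow("Study A", "children with new fever",
            "paracetamol 15 mg/kg + ibuprofen 10 mg/kg", "monotherapy", "mean temperature"),
    PICORow("Study B", "children with new fever",
            "paracetamol 25 mg/kg loading, then 12.5 mg/kg + ibuprofen 5 mg/kg",
            "monotherapy", "mean temperature"),
]
print(differing_elements(rows))  # ['intervention']
```

Each flagged element is a prompt to describe that difference in the narrative summary.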
Heterogeneity. No two studies are the same. At any stage of research, from sample selection to treatment administration to data collection, studies may be conducted differently. You'll need to present and discuss such differences, known as heterogeneity, among your included studies. Forest plots show the reader the statistical heterogeneity of the results (variation in effect estimates greater than would be expected by chance), which may reflect methodologic differences (in study design or conduct) or clinical differences (in participants, interventions, or outcomes). (See the fifth article in this series, "Data Extraction and Synthesis," July, for an example of a forest plot and a discussion of heterogeneity.)
If the degree of statistical or clinical heterogeneity is high, pooling the outcome data from the included studies may produce a misleading result; instead, provide a narrative summary of the included studies. If heterogeneity is moderate and you've performed a pooled analysis (combining data from different studies), tell the reader whether you used a fixed-effect or a random-effects model. Use a fixed-effect model if you're confident that the between-study differences are due entirely to chance.2 If you're unsure of the cause of the between-study differences, use a random-effects model.2
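To make the fixed-effect versus random-effects distinction concrete, the Python sketch below pools study estimates by inverse-variance weighting and reports Cochran's Q and I², two common statistics for describing heterogeneity. The random-effects part uses the DerSimonian-Laird estimator of between-study variance (tau²), one common choice among several; this is an illustrative assumption, not the method Wong and colleagues used or one the Cochrane handbook mandates. The input numbers are invented, and in practice you would use established meta-analysis software.

```python
import math


def pool(effects: list[float], ses: list[float]) -> dict:
    """Inverse-variance pooling of per-study effect estimates (e.g. mean
    differences or log risk ratios) given their standard errors."""
    w = [1 / se**2 for se in ses]                      # fixed-effect weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    se_fixed = math.sqrt(1 / sum(w))

    # Cochran's Q and I^2: how much the estimates vary beyond chance
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0

    # DerSimonian-Laird between-study variance (tau^2), truncated at zero
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each study's own variance
    w_re = [1 / (se**2 + tau2) for se in ses]
    random_ = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se_random = math.sqrt(1 / sum(w_re))

    return {"fixed": (fixed, se_fixed), "random": (random_, se_random),
            "Q": q, "I2": i2, "tau2": tau2}


# Invented mean differences in temperature (degrees C) and their standard errors
result = pool([-0.10, -0.50, 0.20], [0.10, 0.20, 0.15])
for key, value in result.items():
    print(key, value)
```

When tau² is estimated as zero, the two models give the same pooled estimate; when studies disagree, the random-effects estimate has a wider standard error, reflecting the extra uncertainty about why the studies differ.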
Wong and colleagues conducted tests to determine heterogeneity and provided the following narrative summary4:
"There was a large amount of variation between the trials in medication dosage, regimens of administration, and frequency and type of assessment. Due to the small number of studies in each comparison, we were unable to assess the impact of these variations. Similarly, there was large variation in patient factors such as age, aetiology (viral or bacteria), severity of illness, and co-morbidities that may affect the effectiveness of interventions."
Excluded studies. You'll also present a description of excluded studies. Here you can present important patterns you noted during data extraction, such as the way interventions were administered or outcomes measured. As Wong and colleagues reported4:
"One study… met the search criteria for a [randomized controlled trial] in the topic of interest. However, relevant data on mean temperature was not reported. The author of the trial was contacted and did not have available access to the desired data."
Risk of bias. Next, describe the methodologic quality of the included studies and its influence on your interpretation of the results. Structure this description according to the tools you used during study appraisal, including those assessing internal and external validity (which requires a review of each study's design). A variety of scales and checklists are available, many of which include items not directly related to internal or external validity.2 Most tools provide a single value or summary of bias for the whole study. It is particularly valuable, however, to interpret the magnitude and likely direction of bias for each of the review outcomes, because a given bias can affect each outcome differently. A single overall score in the assessment of bias is therefore discouraged, because it can mislead the reader.
In 2005, the Cochrane Collaboration devised a "risk of bias tool" that was evaluated in 2011. Its developers identified seven principles on which the tool is based, among them not using quality scales, keeping a focus on internal validity, and basing an assessment of risk of bias on study results rather than on problems with the methods "that are not directly related to risk of bias."5 According to the Cochrane handbook, the credibility of the results of a randomized controlled trial (RCT) depends on the authors' reporting of the following six methodologic criteria, which give the reader an idea of the degree of systematic error in the included studies and hence of their overall credibility2:
* random sequence generation (selection bias): a description of how the researchers generated and applied the random allocation sequence, so that each participant had a known chance of being allocated to the treatment or the control group (a 50-50 chance when allocation is 1:1)
* allocation concealment (selection bias): a description of how the researchers ensured that the random allocation sequence was concealed from any person involved in the trial
* blinding of participants and personnel (performance bias): a description of how all those involved in the trial were unaware of which patient was or was not receiving treatment
* blinding of outcome assessment (detection bias): a description of how all outcome assessors (clinicians, data collectors, patients) were unaware of who was or was not receiving the treatment
* incomplete outcome data (attrition bias): a description of the number of participants randomized into the trial and of those who completed the trial, accounting for all missing data
* selective reporting (reporting bias): a description of whether discrepancies exist between the outcomes measured and those reported in the final analysis; failure to report such discrepancies can lead to misleading conclusions
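The point that bias should be judged per outcome, rather than summed into one score, can be sketched in code. The Python example below records a judgment (low, unclear, or high) for each of the six domains listed above and summarizes them for one outcome in one study, following the summary approach the Cochrane handbook describes for its original risk of bias tool: high if any key domain is at high risk, unclear if any is unclear, and otherwise low. Treating all six domains as key is an assumption of this sketch; reviewers decide which domains are key for each outcome. The trial and its judgments are hypothetical.

```python
from enum import Enum


class Risk(Enum):
    LOW = "low"
    UNCLEAR = "unclear"
    HIGH = "high"


DOMAINS = (
    "random sequence generation",
    "allocation concealment",
    "blinding of participants and personnel",
    "blinding of outcome assessment",
    "incomplete outcome data",
    "selective reporting",
)


def summarize(judgments: dict[str, Risk]) -> Risk:
    """Summary risk of bias for ONE outcome in ONE study; no numeric score is computed.
    Assumes every domain in DOMAINS is a key domain for this outcome."""
    missing = [d for d in DOMAINS if d not in judgments]
    if missing:
        raise ValueError(f"no judgment recorded for: {missing}")
    values = [judgments[d] for d in DOMAINS]
    if Risk.HIGH in values:
        return Risk.HIGH
    if Risk.UNCLEAR in values:
        return Risk.UNCLEAR
    return Risk.LOW


# Hypothetical trial: thermometer readings vs. discomfort rated by unblinded assessors
common = {d: Risk.LOW for d in DOMAINS}
temperature = {**common, "blinding of outcome assessment": Risk.LOW}
discomfort = {**common, "blinding of outcome assessment": Risk.HIGH}
print(summarize(temperature).value, summarize(discomfort).value)  # low high
```

The same trial ends up at low risk of bias for one outcome and high risk for another, which a single whole-study score would hide.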
Assigning levels of evidence to recommendations. You should assign to any recommendation a "level of evidence" grade congruent with the research design that led to it. For reviews addressing questions of cause and effect, a "summary of findings" table based on the GRADE (Grading of Recommendations Assessment, Development and Evaluation) approach has become the preferred way of presenting graded results. GRADE rates the quality of the evidence for each outcome as high, moderate, low, or very low, and it offers two grades of recommendations: strong and weak.6 This grading alerts the reader to how much confidence can be placed in the evidence and, therefore, to its clinical significance.
The GRADE software will guide you in completing a summary of findings table, which gives a balanced summary of the evidence for each of the main outcomes identified in the review protocol.6 In it, you detail for the reader how and why you judged the risk of bias, and therefore the level of confidence that can be placed in the findings, for each outcome, drawing on your appraisal of each included study. If the balance of benefits and harms clearly favors the treatment and the quality of evidence is credible and valid, you will provide a strong recommendation.6 If the treatment's desirable effects do not clearly outweigh its undesired effects, or the evidence is weak, you'll give it a weak recommendation; in that case the table will show where the research is biased or incomplete.
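The bookkeeping behind a GRADE quality rating is simple, even though the judgments feeding it are not. Evidence from RCTs starts as high quality and evidence from observational studies as low; it is rated down by one or two levels per domain for risk of bias, inconsistency, indirectness, imprecision, and publication bias, and observational evidence may be rated up (for example, for a large effect). The Python sketch below encodes only that arithmetic; the function name is hypothetical, restricting upgrades to observational evidence is a simplification of this sketch, and the example inputs are invented, not Wong and colleagues' actual ratings.

```python
LEVELS = ("very low", "low", "moderate", "high")
DOWNGRADE_DOMAINS = {"risk of bias", "inconsistency", "indirectness",
                     "imprecision", "publication bias"}


def grade_quality(randomized: bool, downgrades: dict[str, int], upgrades: int = 0) -> str:
    """Quality of evidence for one outcome, following GRADE's start-then-adjust logic."""
    for domain, steps in downgrades.items():
        if domain not in DOWNGRADE_DOMAINS:
            raise ValueError(f"unknown GRADE domain: {domain}")
        if steps not in (0, 1, 2):
            raise ValueError("each domain lowers quality by 0, 1, or 2 levels")
    if randomized and upgrades:
        raise ValueError("this sketch applies upgrading only to observational evidence")
    start = 3 if randomized else 1          # index into LEVELS: high vs. low
    level = start - sum(downgrades.values()) + upgrades
    return LEVELS[max(0, min(level, 3))]    # clamp to the four levels


# Hypothetical outcome from small RCTs with some risk of bias
print(grade_quality(True, {"risk of bias": 1, "imprecision": 2}))  # very low
```

Listing each downgrade by domain, rather than recording only the final level, is what lets the summary of findings table explain where the research is biased or incomplete.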
Wong and colleagues provided the following analysis of one of the studies included in their review4:
"The quality of evidence for reductions in mean temperature and the proportions remaining febrile is of low quality at best, meaning we can have little confidence in the results. The evidence for a reduction in mean [Non-Communicating Children's Pain Checklist] score is also judged to be of low quality. For combined versus alternating therapy, the evidence was downgraded to "very low" due to the extremely small study size (40 participants)."
Now that you've presented a comprehensive analysis, you can answer the remaining two questions3, 7: What isn't known (that is, what gaps in the knowledge can you identify)? And what future research priorities should be explored?
Implications for Practice
Once you have provided a comprehensive, objective analysis and graded the evidence, you'll explain what it means in terms of current practice. The JBI recommends that this discussion be centered on the following four areas7:
* Evidence of feasibility. "Feasibility is the extent to which an activity is practical and practicable. Clinical feasibility is about whether or not an activity or intervention is physically, culturally or financially practical or possible within a given context."
* Evidence of appropriateness. "Appropriateness is the extent to which an intervention or activity fits with or is apt in a situation. Clinical appropriateness is about how an activity or intervention relates to the context in which care is given."
* Evidence of meaningfulness. "Meaningfulness is the extent to which an intervention or activity is positively experienced by the patient. Meaningfulness relates to the personal experience, opinions, values, thoughts, beliefs and interpretations of patients or clients."
* Evidence of effectiveness. "Effectiveness is the extent to which an intervention, when used appropriately, achieves the intended effect. Clinical effectiveness is about the relationship between an intervention and clinical or health outcomes."
Statistical vs. clinical significance. When determining how to present your results and make recommendations for practice, keep in mind that statistical significance does not always translate into relevance in the clinical or policy arena. Ideally, to help a reader differentiate between the statistical and clinical significance of a treatment graded strong, you'll provide details, as applicable, on the associated harms, on the relative effect (such as the risk ratio, or the hazard ratio for time-to-event outcomes), and on the number needed to treat.1, 6, 7 For example, a reader may be impressed to learn that 12 people needed to be treated for one person to benefit. But the reader's impression might change if the benefit was seen in only one of every 250 people treated. Similarly, a relative risk reduction of 45% may be encouraging until the reader learns that the treatment also caused adverse effects.
Policymakers and administrators value additional information, such as the costs associated with an intervention, so they can determine what resources would be needed. Review authors have found it valuable to draw parallels with similar systematic reviews and to use case studies as examples, grounding their results in the everyday reality of patient care and thereby facilitating the implementation of evidence.6 As Wong and colleagues wrote in their review4:
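The arithmetic behind these figures shows why a relative risk reduction alone can mislead: the same 45% relative reduction corresponds to very different numbers needed to treat (NNT) depending on how common the outcome is without treatment. The Python sketch below assumes a two-group comparison of an undesirable event (for example, remaining febrile) and rounds the NNT up to the next whole person, as is conventional; the function name and all event counts are invented for illustration.

```python
import math
from fractions import Fraction


def effect_measures(events_t: int, n_t: int, events_c: int, n_c: int) -> dict:
    """Effect measures for an undesirable event, treatment vs. control.
    Fractions keep the arithmetic exact so the NNT is rounded up correctly."""
    if events_c == 0:
        raise ValueError("risk ratio is undefined when the control group has no events")
    risk_t = Fraction(events_t, n_t)
    risk_c = Fraction(events_c, n_c)
    rr = risk_t / risk_c                 # risk ratio
    arr = risk_c - risk_t                # absolute risk reduction
    nnt = math.ceil(1 / arr) if arr > 0 else None  # no NNT if treatment does not reduce risk
    return {"RR": float(rr), "RRR": float(1 - rr), "ARR": float(arr), "NNT": nnt}


# Invented trials with the same relative effect but different baseline risk
print(effect_measures(22, 200, 40, 200))    # RRR 0.45, NNT 12
print(effect_measures(11, 2000, 20, 2000))  # RRR 0.45, NNT 223
```

Both invented trials report a 45% relative risk reduction, but in the second, where the event is rare, more than 200 people must be treated for one to benefit.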
"There is some evidence that both alternating and combined antipyretic therapy may be more effective at reducing temperatures than monotherapy alone. However, the evidence for improvements in measures of child discomfort remains inconclusive. There is insufficient evidence to support the use of alternating antipyretic therapy over combined antipyretic therapy…."
"Three systematic reviews looking at combined or alternating ibuprofen and paracetamol therapy exist in the literature…. All three reviews raised similar concerns to those highlighted in this review regarding lack of blinding and reasons for withdrawal from studies, low sample size, and variable drug doses and administration regimens."
Implications for Further Research
While conducting your review, you may have become aware of gaps in the literature, ranging from too little research in a particular population to inconsistent study outcomes. You might also notice methodologic imbalances, such as too many descriptive studies relative to analytic ones. Reporting such gaps is arguably as important as reporting the results of the review. From this information, clinical researchers can determine research priorities, and health care administrators and policymakers can encourage and support research funding. Avoid general statements calling for further research; instead, highlight the specific gaps you've identified so that future research will focus on areas of need. As Wong and colleagues wrote4:
"Future RCTs should focus on child discomfort using standardized and validated assessment tools. More research is needed on the safety of alternating and combined antipyretic regimens."
Limitations and conclusions. As with any research, a systematic review must include a critical reflection on its limitations, both of the individual studies and of the overall review.1 By disclosing any barriers you encountered in determining the quality of the reported data, you help other researchers tailor study design, protocols, and reporting to minimize such limitations. For example, the generalizability and applicability of a review will be affected by the number of available studies that assess the most important outcome of interest. As Wong and colleagues wrote4:
"Current guidelines recommend only monotherapy for febrile children, in order to avoid potential side effects from multiple medication administration. The results from this study do not suggest any serious short term adverse effects from either alternating or combined antipyretic therapy compared with monotherapy. However, none of the included trials was large enough to have the power to detect important differences between treatment arms, nor were they long enough to detect potential adverse events from regular use. From the vast amount of literature on paracetamol and ibuprofen both drugs are regarded as safe with serious side effects being few and infrequent."
This series of six articles from the JBI has provided a step-by-step overview of how to conduct a systematic review, providing what we hope will be a valuable resource for nurses looking to inform their practice with rigorous research. Other comprehensive resources are also available, among them the JBI's reviewers' manual (http://bit.ly/1h2F8RZ)8 and the Cochrane handbook (http://handbook.cochrane.org),2 as well as a textbook by Holly and colleagues.9
The series has moved from the all-important first step in the process, formulating and articulating the review question in a way that facilitates the search for evidence, through the selection of studies, the appraisal of their methodologic quality, and the extraction and synthesis of the data. It culminates with this article on how to develop recommendations for practice. Such recommendations should be derived from the highest level of evidence available, providing a foundation for evidence-based practice in nursing and other health professions.10