Authors

  1. Sabour, Siamak MD, MSc, DSc, PhD
  2. Ghassemi, Fariba MD

Article Content

Dear Ms. Carroll,

 

We were interested to read the article by Czaikowski, Liang, and Stewart published in the April 2014 issue of the Journal of Neuroscience Nursing (Czaikowski, Liang, & Stewart, 2014). The authors' purpose was to provide potentially greater information than the commonly used Glasgow Coma Scale (GCS) when assessing critically ill, neurologically impaired pediatric patients, including those sedated and/or intubated (Czaikowski et al., 2014). They modified the Full Outline of UnResponsiveness (FOUR) Score Scale for this population (Czaikowski et al., 2014). Experienced pediatric intensive care unit nurses were trained as "expert raters." Two different nurses assessed each subject using the Pediatric FOUR Score Scale (PFSS), the GCS, and the Richmond Agitation Sedation Scale at three different time points. Data were compared with the Pediatric Cerebral Performance Category (PCPC) assessed by another nurse. As the authors mentioned, they hypothesized that the PFSS and the PCPC should correlate highly and that the GCS and the PCPC should correlate less strongly.

 

Such an analysis has nothing to do with either the reliability (precision) or the predictive validity of a test. In fact, relying on the correlation coefficient is one of the common mistakes in reliability analysis (Jeckel, Katz, Elmore, & Wild, 2007; Rothman, Sander, & Lash, 2008; Sabour, 2013a, 2013b; Sabour & Dastjerdi, 2012; Sabour, Dastjerdi, & Moezizadeh, 2013; Sabour & Ghassemi, 2012). Why did the authors not use well-known statistical tests for reliability and validity analysis?
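A small numerical illustration (with made-up rater scores, not data from the study) shows why a high correlation coefficient does not demonstrate reliability: two raters can disagree on every single patient and still correlate perfectly.

```python
# Hypothetical scores from two raters on the same five patients;
# rater B is systematically 3 points higher than rater A.
rater_a = [4, 6, 8, 10, 12]
rater_b = [7, 9, 11, 13, 15]

def pearson(x, y):
    """Pearson correlation coefficient, computed from first principles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

r = pearson(rater_a, rater_b)
exact_agreement = sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)
print(r)                # ~1.0 -- perfect linear correlation
print(exact_agreement)  # 0.0 -- the raters never give the same score
```

A measure of agreement such as the intraclass correlation coefficient penalizes this systematic offset, whereas the correlation coefficient is blind to it.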

 

To assess reliability, the intraclass correlation coefficient for quantitative variables and the weighted kappa for qualitative variables should be used, and used with caution, because kappa has its own limitations (Jeckel et al., 2007; Rothman et al., 2008; Sabour, 2013a, 2013b; Sabour & Dastjerdi, 2012; Sabour et al., 2013; Sabour & Ghassemi, 2012). Sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and the positive and negative likelihood ratios (LR+ and LR-), as well as the diagnostic odds ratio (true results/false results), are among the measures that correctly assess the validity (accuracy) of a single test compared with a gold standard (Jeckel et al., 2007; Rothman et al., 2008; Sabour, 2013a, 2013b; Sabour & Dastjerdi, 2012; Sabour et al., 2013; Sabour & Ghassemi, 2012).
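All of these validity measures follow directly from the 2x2 table of a test against the gold standard. A minimal sketch, using made-up counts purely for illustration:

```python
# Hypothetical 2x2 table: test result vs. gold standard.
tp, fp, fn, tn = 80, 10, 20, 90

sensitivity = tp / (tp + fn)           # P(test+ | condition present)
specificity = tn / (tn + fp)           # P(test- | condition absent)
ppv = tp / (tp + fp)                   # P(condition present | test+)
npv = tn / (tn + fn)                   # P(condition absent | test-)
lr_pos = sensitivity / (1 - specificity)   # positive likelihood ratio (LR+)
lr_neg = (1 - sensitivity) / specificity   # negative likelihood ratio (LR-)
diagnostic_or = (tp * tn) / (fp * fn)      # diagnostic odds ratio (true/false cross-product)

print(sensitivity, specificity)   # 0.8 0.9
print(ppv, npv)
print(lr_pos, lr_neg, diagnostic_or)
```

Note that PPV and NPV, unlike sensitivity and specificity, depend on the prevalence of the condition in the study population, which is one reason several complementary measures are reported.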

 

As the authors reported, the PFSS showed excellent interrater reliability for trained nurse-rater pairs and predicted poor outcome and in-hospital mortality under various conditions, but there were no statistically significant differences between the PFSS and the GCS. Statistical significance does not establish the clinical importance of the findings (Jeckel et al., 2007; Rothman et al., 2008; Sabour, 2013a, 2013b; Sabour & Dastjerdi, 2012; Sabour et al., 2013; Sabour & Ghassemi, 2012). In fact, statistical significance and clinical importance are two completely different issues that should not be confused with each other. Statistically significant findings can easily become nonsignificant for several reasons, such as a decrease in sample size. The real question is whether the amount of difference between the PFSS and the GCS is clinically important (Jeckel et al., 2007; Rothman et al., 2008; Sabour, 2013a, 2013b; Sabour & Dastjerdi, 2012; Sabour et al., 2013; Sabour & Ghassemi, 2012).
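The dependence of significance on sample size is easy to demonstrate. In this sketch (with hypothetical proportions unrelated to the study), the same 15-percentage-point difference is significant in a large sample and nonsignificant in a small one, even though its clinical meaning is identical:

```python
import math

def two_prop_p(p1, p2, n1, n2):
    """Two-sided p-value from a pooled two-proportion z-test."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal CDF.
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

print(two_prop_p(0.60, 0.45, 500, 500))  # large samples: p < 0.05
print(two_prop_p(0.60, 0.45, 30, 30))    # small samples: p > 0.05
```

This is why a nonsignificant p-value alone cannot show that two scales perform equivalently; the magnitude of the difference and its confidence interval must be judged against a clinically meaningful threshold.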

 

The authors concluded that "the PFSS does have the potential to provide greater neurological assessment in the intubated and/or sedated patient based on the outcomes of our study." Such a conclusion can be misinterpreted because of the inappropriate use of statistical tests (Jeckel et al., 2007; Rothman et al., 2008; Sabour, 2013a, 2013b; Sabour & Dastjerdi, 2012; Sabour et al., 2013; Sabour & Ghassemi, 2012).

 

It is crucial to know that prediction studies require two different cohort data sets, or at least one cohort data set split in two, so that the prediction model can be developed on one part and then validated on the other. Without validation of our prediction model, we cannot generalize our findings (Jeckel et al., 2007; Rothman et al., 2008; Sabour, 2012, 2013c; Sabour & Ghassemi, 2013).
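The split-sample idea can be sketched in a few lines. Everything here is synthetic and purely illustrative: a fictitious cohort is split in half, a cutoff is chosen on the development half, and only the held-out half gives an honest estimate of performance.

```python
import random

random.seed(0)
# Synthetic cohort of (score, poor_outcome) pairs; lower scores tend
# to accompany poor outcomes (hypothetical means and spread).
cohort = [(random.gauss(6 if poor else 10, 2), poor)
          for poor in [1] * 50 + [0] * 50]
random.shuffle(cohort)

development, validation = cohort[:50], cohort[50:]

def accuracy(data, cutoff):
    """Fraction of subjects where 'score < cutoff' matches the outcome."""
    return sum((score < cutoff) == bool(poor) for score, poor in data) / len(data)

# "Develop": pick the cutoff that maximizes accuracy on the development set.
best_cutoff = max((c / 2 for c in range(0, 40)),
                  key=lambda c: accuracy(development, c))

# "Validate": evaluate on data never used to choose the cutoff.
print(accuracy(development, best_cutoff))  # subject to model-selection optimism
print(accuracy(validation, best_cutoff))   # honest estimate of generalization
```

The development-set figure is optimistically biased because the cutoff was tuned on those very data; only the validation-set figure supports a claim about how the rule would perform in new patients.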

 

As a take-home message, reliability and validity should not be confused with each other, and appropriate tests should be applied for a scientifically correct assessment; otherwise, mismanagement of patients cannot be avoided. Finally, for a specific research question, an appropriate study design should be considered.

 

Sincerely,

 

Siamak Sabour, MD, MSc, DSc, PhD

 

Safety Promotion and Injury Prevention Research Centre, Shahid Beheshti University of Medical Sciences

Department of Clinical Epidemiology, Shahid Beheshti University of Medical Sciences

Tehran, Iran

 

s.sabour@sbmu.ac.ir

 

Fariba Ghassemi, MD

 

Eye Research Centre, Farabi Hospital

 

Tehran University of Medical Sciences

 

Tehran, Iran

 

References

 

Czaikowski B. L., Liang H., Stewart C. T. (2014). A pediatric FOUR Score Coma Scale: Interrater reliability and predictive validity. Journal of Neuroscience Nursing, 46(2), 79-87. [Context Link]

 

Jeckel J. F., Katz D. L., Elmore J. G., Wild D. M. G. (2007). The study of causation in epidemiologic investigation and research. In: Jeckel J. F. (Ed.), Epidemiology, Biostatistics and Preventive Medicine (3rd ed., pp. 64-66). Philadelphia, PA: Saunders, Elsevier. [Context Link]

 

Rothman K. J., Sander G., Lash T. L. (2008). Cohort studies. In: Rothman K. J. (Ed.), Modern Epidemiology (3rd ed., pp. 79-85). Baltimore, MD: Lippincott Williams & Wilkins. [Context Link]

 

Sabour S. (2012). Prediction of spontaneous preterm delivery in women with threatened preterm labour: A prospective cohort study of multiple proteins in maternal serum. British Journal of Obstetrics and Gynecology, 119 (12), 1544. [Context Link]

 

Sabour S. (2013a). Reliability and accuracy of skeletal muscle imaging in limb-girdle muscular dystrophies. Neurology, 80 (24), 2275. doi:10.1212/WNL.0b013e318299ef6b

 

Sabour S. (2013b). A quantitative assessment of the accuracy and reliability of O-arm images for deep brain stimulation surgery. Neurosurgery, 72 (4), E696. doi:10.1227/NEU.0b013e318282d66e [Context Link]

 

Sabour S. (2013c). Obesity predictors in people with chronic spinal cord injury: Common mistake. Journal of Research in Medical Sciences, 18 (12), 1118. [Context Link]

 

Sabour S., Dastjerdi E. V. (2012). Reliability of implant surgical guides based on soft-tissue models: A methodological mistake. Journal of Oral Implantology, 38 (6), 805. doi:10.1563/AAID-JOI-D-12-00176 [Context Link]

 

Sabour S., Dastjerdi E. V., Moezizadeh M. (2013). Accuracy of peri-implant bone thickness and validity of assessing bone augmentation material using cone beam computed tomography-Is this correct? Clinical Oral Investigations, 17 (7), 1785. doi:10.1007/s00784-013-0944-0

 

Sabour S., Ghassemi F. (2012). Accuracy, validity, and reliability of the infrared optical head tracker (IOHT). Investigative Ophthalmology and Visual Science, 53 (8), 4776. doi:10.1167/iovs.12-10324 [Context Link]

 

Sabour S., Ghassemi F. (2013). Predictive value of confocal scanning laser for the onset of visual field loss. Ophthalmology, 120(6), e31-e32. [Context Link]