

Keywords: interrater agreement, pediatric intensive care, sedation assessment, State Behavioral Scale, treatment fidelity, withdrawal assessment, Withdrawal Assessment Tool-Version 1



  1. Lebet, Ruth
  2. Hayakawa, Jennifer
  3. Chamblee, Tracy B.
  4. Tala, Joana A.
  5. Singh, Nakul
  6. Wypij, David
  7. Curley, Martha A. Q.


Background: RESTORE (Randomized Evaluation of Sedation Titration for Respiratory Failure) was a cluster randomized clinical trial evaluating a sedation strategy in children 2 weeks to <18 years of age with acute respiratory failure supported on mechanical ventilation. A total of 31 U.S. pediatric intensive care units (PICUs) participated in the trial. Staff nurse rater agreement on measures used to assess a critical component of treatment fidelity was essential throughout the 4-year data collection period.


Objective: The purpose of this study was to describe the method of establishing and maintaining interrater agreement (IRA) for two core clinical assessment instruments over the course of the clinical trial.


Methods: IRA cycles were carried out at all control and intervention sites and included a minimum of five measurements of the State Behavioral Scale (SBS) and Withdrawal Assessment Tool-Version 1 (WAT-1). Glasgow Coma Scale scores were also obtained. PICUs demonstrating <80% agreement repeated their IRA cycle. Fleiss's kappa coefficient was used to assess IRA.
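The study does not specify the software used to compute Fleiss's kappa; as a point of reference, a minimal sketch of the standard Fleiss's kappa calculation (observed agreement corrected for chance agreement across multiple raters) might look like the following, where the rating table is a hypothetical illustration rather than study data:

```python
import numpy as np

def fleiss_kappa(counts):
    """Fleiss's kappa for m subjects rated by n raters into k categories.

    counts: (m, k) array where counts[i, j] is the number of raters who
    assigned subject i to category j. Every row must sum to the same n.
    """
    counts = np.asarray(counts, dtype=float)
    m, k = counts.shape
    n = counts[0].sum()                         # raters per subject
    p_j = counts.sum(axis=0) / (m * n)          # overall category proportions
    # Per-subject observed agreement: proportion of agreeing rater pairs
    P_i = (np.square(counts).sum(axis=1) - n) / (n * (n - 1))
    P_bar = P_i.mean()                          # mean observed agreement
    P_e = np.square(p_j).sum()                  # expected chance agreement
    return (P_bar - P_e) / (1 - P_e)

# Hypothetical example: 3 raters score 3 patients into 2 categories,
# with perfect agreement on every patient, so kappa = 1.0.
print(fleiss_kappa([[3, 0], [0, 3], [3, 0]]))  # → 1.0
```

Values near 1 indicate near-perfect agreement, consistent with how the coefficients reported below are interpreted.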


Results: Repeated IRA cycles were required for 8% of 226 SBS cycles and 2% of 222 WAT-1 cycles. Fleiss's kappa coefficients from more than 1,350 paired assessments were .86 for the SBS and .92 for the WAT-1, demonstrating strong agreement, comparable to .91 for the Glasgow Coma Scale. Fleiss's kappa did not differ for any of the instruments based on the timing of assessment (earlier vs. later in the study). For SBS scores, however, Fleiss's kappa differed significantly between larger and smaller PICUs (.82 vs. .92, p = .003), although kappa for both groups indicated excellent agreement.


Conclusion: Monitoring measurement reliability is an essential step in ensuring treatment fidelity and, thus, the validity of study results. Standardized use of these core assessment instruments across participating sites was achieved and maintained throughout the trial.