1. Foster, Jan PhD, APRN, CNS


Measurement in intervention research requires deep contemplation and careful, detailed planning. Distinct indicators are needed so that the investigator can recognize whether the desired outcome or result of the intervention has occurred; one must "know it when you see it." A general idea of what a researcher intends to investigate (recovery from neuromuscular blockade/paralyzing medications, for example) may morph into a very specific idea (how long does it take for recovery of optimal muscle strength?), owing to available measurement approaches. What to measure arises from a fine-tuned research question and operational definitions, along with a theory of the phenomena to be investigated (what is the theoretical basis for strength of muscle movement in a given time span after receiving neuromuscular blocking agents?). How to measure comes from the design of the study and elements of measurement science, that is, instrument validity, reliability, sensitivity, and specificity. A variety of measures is available to the researcher, who must choose the type of measure that best suits the intervention and outcomes under study. Sometimes, temporal concepts are introduced into the measurement process, either as an endpoint at which data collection is concluded for economic reasons (for example, 1 month after discharge) or as an approach to quantifying and reporting a response to the intervention (for example, number of seizures per day or episodes of nausea per postoperative hour). Several forms of measurement, along with practical application, will be described in this month's column.



Qualitative measurement in intervention research involves the use of words versus numbers to describe phenomena of interest. Researchers themselves become the measurement instrument through intimate involvement in the study. Researchers query participants for common phrases, descriptions, and responses until terms used are exhausted and no further descriptors emerge.


The design in a qualitative study materializes as the study progresses, as a result of ongoing data analysis. Sample sizes are generally small compared with sample sizes in quantitative studies. Qualitative measures can be useful in the pilot study phase of intervention research to determine the feasibility of a larger study. Feedback from participants can be used to tailor recruitment and retention strategies in a large-scale study, for example. Understanding patients' and families' needs or preferences when designing an intervention can be achieved through qualitative research. For instance, a researcher can gain a better understanding of barriers to adherence in a weight loss intervention before implementation in the study. In addition, process, instrument, or theory refinement for a quantitative study may be achieved with a preliminary qualitative study. Examples of qualitative approaches used by nurse scientists include ethnography, ethnoscience, phenomenology, hermeneutics, ethology, ecological psychology, grounded theory, ethnomethodology, semiotics, discourse analysis, and historical analysis.1 Several resources for designing or appraising a qualitative study are available on the Equator Network: Enhancing Quality and Transparency of Health Research (



Quantitative measurement supplies the numbers used in statistical analyses. In intervention research, the independent variable or intervention is measured against the outcome variable or dependent variable. The numbers are used in simple or complex statistical analyses to describe the phenomena, estimate parameters, and test hypotheses. Characteristics or attributes of the patient, object, entity, behavior, and others, are measured.3 The researcher aims to quantify the effect on the outcome in accordance with the quantity of the intervention.


Scales of measurement are used to appropriately assign numbers and correctly analyze resulting data. First is the nominal-level scale, which is the assignment of numbers as labels to entities or categories of entities; numbers are used as substitutes for names that identify classes of objects.4 Examples include gender, medical diagnoses, dichotomous variables, and other attributes that are assigned to 2 or more categories. For example, a nurse researcher may wish to determine the effect of an intervention on delirium prevention and results are analyzed according to participants who were classified as delirium positive or delirium negative.
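As an illustration of how nominal-level data are handled, the delirium example can be reduced to counts per category and summarized with the chi-squared statistic, which is commonly used for nominal data. The following is a minimal sketch only: the counts, the group labels, and the function name `chi_squared` are hypothetical, no continuity correction is applied, and the critical value 3.841 applies to 1 degree of freedom at the .05 level.

```python
def chi_squared(table):
    """Pearson chi-squared statistic for a table of observed counts (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if group and outcome were unrelated
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical counts (not from any study):
# rows = intervention group, usual-care group
# columns = delirium positive, delirium negative
observed_counts = [[10, 40], [20, 30]]

statistic = chi_squared(observed_counts)
CRITICAL_VALUE_DF1_ALPHA_05 = 3.841  # 2 x 2 table has 1 degree of freedom
exceeds_critical_value = statistic > CRITICAL_VALUE_DF1_ALPHA_05

print(f"chi-squared = {statistic:.3f}, exceeds 3.841: {exceeds_critical_value}")
```

The numbers 1 and 0 (or any other labels) could stand in for "positive" and "negative" without changing the result, because only the counts in each category enter the calculation.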


Second is ordinal-level data, which involves the assignment of numbers to participants or objects to reflect their rank order on the characteristic of interest. The assigned numbers do not indicate how much of an attribute is present but instead represent a relationship of greater than or less than between 2 or more characteristics.4 A researcher may wish to measure the impact of an intervention on pain; a pain scale with levels ranging from 1 to 10 could be used to evaluate the effectiveness of the intervention. However, because the quantity of pain is not being measured, one cannot assume that multiple participants who rate their pain as 5, for example, actually experience equal pain. Nor can one assume that a rating of 10 represents twice as much pain as a rating of 5. Likert scales, usually scored from 1 to 5 or sometimes from 1 to 3 or 1 to 7, are another commonly used example of ordinal-level measurement in nursing research and are used to evaluate attitudes (satisfaction, importance, support).
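Because ordinal ratings carry only rank order, analyses of such data typically work on ranks rather than on the raw numbers. The sketch below computes the Mann-Whitney U statistic for 2 independent groups using midranks for ties. The pain ratings and the function names `midranks` and `mann_whitney_u` are hypothetical, and the sketch stops at the U statistic; it does not compute a P value.

```python
def midranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value is tied with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = average_rank
        pos = end + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """Return (U for group_a, U for group_b) from the rank sum of group_a."""
    combined = list(group_a) + list(group_b)
    ranks = midranks(combined)
    n_a, n_b = len(group_a), len(group_b)
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return u_a, u_b


# Hypothetical pain ratings on a 1-to-10 scale (not from any study)
intervention_ratings = [2, 3, 3, 4, 5]
control_ratings = [4, 5, 6, 7, 7]

u_intervention, u_control = mann_whitney_u(intervention_ratings, control_ratings)
print(u_intervention, u_control)
```

Only the ordering of the ratings matters here: replacing every 7 with a 10 would leave both U values unchanged, which is consistent with treating the scale as ordinal.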


Next is interval-level data, in which the requirements of ordinal level are met, and in addition, the intervals between each number are equal. The numbers reflect constant units of measurement and allow for comparisons between differences. Temperature is the most commonly used illustration of interval-level data. For example, a researcher aims to evaluate the effect of prewarming on postoperative hypothermia in patients undergoing abdominal surgery. For patients whose temperature changes from 37°C to 36°C, the change is the same as for those whose temperature changes from 36°C to 35°C. However, because the 0 point has been arbitrarily assigned (the freezing point of water), it is meaningful to report the differences in the temperature change but not meaningful to treat one temperature as a multiple of another. Using an example in the Fahrenheit scale, one cannot say a person's temperature of 102 degrees is twice as warm as a person's temperature of 51 degrees, again because even though the intervals between degrees are equal, the 0 point has been arbitrarily determined.4
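A short calculation may make the point about the arbitrary 0 concrete. In the sketch below, equal temperature drops remain equal after conversion from Celsius to Fahrenheit, whereas an apparent "twice as warm" ratio does not survive the conversion. The 40 and 20 degree values are illustrative numbers chosen for easy arithmetic, not clinical data.

```python
import math


def celsius_to_fahrenheit(celsius):
    """Standard conversion: multiply by 9/5 and add 32."""
    return celsius * 9 / 5 + 32


# Differences are meaningful: both patients cooled by the same amount.
drop_a_celsius = 37 - 36
drop_b_celsius = 36 - 35
same_drop_celsius = drop_a_celsius == drop_b_celsius

# The equality of the two drops is preserved on the Fahrenheit scale.
drop_a_fahrenheit = celsius_to_fahrenheit(37) - celsius_to_fahrenheit(36)
drop_b_fahrenheit = celsius_to_fahrenheit(36) - celsius_to_fahrenheit(35)
same_drop_fahrenheit = math.isclose(drop_a_fahrenheit, drop_b_fahrenheit)

# Ratios are not meaningful: the same two temperatures give a
# different ratio depending on where the scale places its 0.
ratio_celsius = 40 / 20
ratio_fahrenheit = celsius_to_fahrenheit(40) / celsius_to_fahrenheit(20)

print(same_drop_celsius, same_drop_fahrenheit)
print(ratio_celsius, round(ratio_fahrenheit, 3))
```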


Finally, the ratio-level scale meets the conditions of the interval-level scale and, in addition, has a meaningful or absolute 0. In other words, it is possible to have 0 amount of the attribute being measured. Many physiological variables fall into the ratio-level scale.4 A researcher may wish to compare the effect of a blood pressure (BP) and weight management program on adults and adolescents. Before the intervention, the average adult systolic BP is 250 and the average adolescent systolic BP is 125; the average for adults is twice that for adolescents, a comparison that ratio-level data permit (it is of course possible to have a BP of 0). On the other hand, after the weight loss program, the average adult BP drops to 180 and the adolescents' to 100. Rather than report the average BP, which is still elevated for adults, the researcher may want to conclude that the intervention was more effective in adults because the reduction in BP was greater for the adults than for the adolescents (interval-level data).
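The arithmetic in the BP example can be written out as follows. The values are the ones given in the text; the percentage reductions are an added illustration of a further comparison that a true 0 point permits, not a calculation reported in the text.

```python
# Average systolic BP values taken from the example in the text
adult_before, adult_after = 250, 180
adolescent_before, adolescent_after = 125, 100

# Ratio comparison, meaningful because BP has an absolute 0
ratio_before = adult_before / adolescent_before

# Difference comparison (also available at the interval level)
adult_reduction = adult_before - adult_after
adolescent_reduction = adolescent_before - adolescent_after

# Added illustration: reduction as a percentage of the starting value
adult_percent_reduction = 100 * adult_reduction / adult_before
adolescent_percent_reduction = 100 * adolescent_reduction / adolescent_before

print(ratio_before)
print(adult_reduction, adolescent_reduction)
print(adult_percent_reduction, adolescent_percent_reduction)
```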


Recognizing the level of data in measurement is necessary for appropriate descriptive and inferential statistical analyses. In descriptive statistics, the measures of central tendency (mode, median, and mean) are to be used according to the level of data and according to whether assumptions of normality are met. The mode (most frequently occurring score in the sample) is the only measure appropriate to nominal-level data. The median (an indication of position in a set of scores at which point 50% are higher and 50% are lower) can be used for ordinal-, interval-, and ratio-level data. The mean is the average, is appropriate only for interval- or ratio-level data (the distance between each score must be equal to reflect a mathematical average), and is affected by extreme scores. When the distribution of the scores is skewed by extreme scores, the median provides a more accurate description of the sample than the mean does. When using inferential statistics to extrapolate from the sample and arrive at some conclusions for the population from which the sample was drawn, the researcher must use the statistic appropriate to the measure of central tendency and corresponding level of measurement. Parametric tests are used when the data are interval or ratio level and several assumptions are met (ie, random sampling, normal distribution, equal variance). For example, it is appropriate to report the mean for interval- and ratio-level data that are not skewed, and the t test can be used for comparing 2 groups (heart rate before and after 2 types of exercise) and analysis of variance for more than 2 groups (heart rate before and after 3 types of exercise). The Pearson r can be used for assessing correlation between 2 variables measured at the interval or ratio level (elevation in heart rate and body mass index after exercise). Nonparametric tests are used when assumptions of parametric tests are not met. The median test can be used when the median is reported instead of the mean. When using ordinal-level data, the Mann-Whitney U or Wilcoxon signed ranks tests are appropriate. The χ² test is commonly used for nominal-level data (a comparison of the number of men and women who successfully completed a prescribed exercise program).5 Failure to use the appropriate test or violation of the necessary underlying assumptions in the analysis of data can lead to inaccurate or completely erroneous conclusions about the intervention.
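The pairing of each measure of central tendency with a level of data can be sketched with Python's standard `statistics` module. All of the data below are hypothetical and chosen only to show the behavior described above, in particular how a single extreme score pulls the mean away from the median.

```python
from statistics import mean, median, mode

# Nominal level: only the mode is appropriate (hypothetical diagnoses)
diagnoses = ["sepsis", "pneumonia", "sepsis", "stroke", "sepsis"]
most_common_diagnosis = mode(diagnoses)

# Ordinal level: the median describes the middle rank (hypothetical pain ratings)
pain_ratings = [2, 3, 3, 5, 8]
median_pain = median(pain_ratings)

# Ratio level with one extreme score (hypothetical length of stay in days)
length_of_stay_days = [3, 4, 4, 5, 6, 45]
mean_stay = mean(length_of_stay_days)
median_stay = median(length_of_stay_days)

print(most_common_diagnosis)
print(median_pain)
print(round(mean_stay, 2), median_stay)
```

In the length-of-stay data, 5 of the 6 patients stayed 6 days or fewer, so the median describes the typical patient more closely than the mean, which is inflated by the single 45-day stay.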


Quantitative measurement allows the researcher to maintain an unbiased view because, unlike qualitative research, in which there is greater opportunity for subjective interpretation of responses, the numbers speak for themselves. The design in quantitative research is highly structured, and specific instruments are used to measure the attributes of interest. All data collectors must agree when they are "seeing it," based on a set of predetermined, static criteria. For example, when reporting the effectiveness of an intervention to reduce central line-associated bloodstream infections, the research team must use consistent, standardized criteria to accurately evaluate success. Quantitative measures are also used to establish endpoints. Length of stay, ventilator days, and postoperative days are some common examples. Limited resources are sometimes considered when determining endpoints; for example, in research evaluating quality of life in critical care survivorship, the cutoff point has traditionally been at hospital discharge or 28 days after intensive care unit discharge.6
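One way a research team might check whether data collectors agree when they are "seeing it" is to have 2 collectors classify the same cases and then compare their classifications. The sketch below computes simple percent agreement and Cohen's kappa, which adjusts agreement for chance. Kappa is offered here as an assumption about how agreement could be quantified; it is not a method named in this column, and the classifications and function names are hypothetical.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Proportion of cases on which the 2 raters gave the same classification."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Agreement between 2 raters, corrected for agreement expected by chance."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: product of each rater's marginal proportions, summed over categories
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical classifications of the same 10 cases ("yes" = infection criteria met)
collector_1 = ["yes", "yes", "no", "no", "no", "no", "yes", "no", "no", "no"]
collector_2 = ["yes", "no", "no", "no", "no", "no", "yes", "no", "no", "yes"]

agreement = percent_agreement(collector_1, collector_2)
kappa = cohens_kappa(collector_1, collector_2)
print(agreement, round(kappa, 3))
```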


Subjective Measures

Subjective measurement is a method used in quantitative research that allows study participants much leeway in their responses. Open-ended interviews, fill-in-the-blank questionnaires, and self-scribed reports are examples. Although words are used in the data collection process, this approach should not be confused with qualitative measurement. With a large enough sample size, simple responses may be numerically coded and analyzed quantitatively. Greater depth in responses such as in paragraphs or essays, on the other hand, requires scoring by content experts.3
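The numerical coding of simple written responses can be sketched as follows. The codebook, the responses, and the function name `code_response` are hypothetical; in practice, the coding scheme would be defined by the research team before data collection. Responses that do not match the codebook are set aside rather than forced into a category, in keeping with the point that richer responses require scoring by content experts.

```python
from collections import Counter

# Hypothetical codebook mapping short written answers to numeric codes
CODEBOOK = {"very satisfied": 3, "satisfied": 2, "not satisfied": 1}


def code_response(text):
    """Normalize a short free-text answer; return its numeric code or None if unlisted."""
    return CODEBOOK.get(text.strip().lower())


# Hypothetical fill-in-the-blank responses
responses = ["Satisfied", " very satisfied", "not satisfied", "SATISFIED", "it depends"]

codes = [code_response(response) for response in responses]
frequencies = Counter(code for code in codes if code is not None)
needs_expert_review = [r for r, code in zip(responses, codes) if code is None]

print(codes)
print(dict(frequencies))
print(needs_expert_review)
```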


Objective Measures

Objective measures require forced responses by the participant. Examples include multiple choice questionnaires, physiologic variables, psychometric variables, and direct observation of behaviors. Psychometric variables commonly of interest to researchers include cognition, which can be measured with preachievement and postachievement tests, and properties representative of the affective domain such as values and attitudes; satisfaction surveys are an example. Direct observation of psychomotor skills is common in nursing education research. Instruments used in this type of measurement are of concern; validity and reliability are essential to producing accuracy and reproducibility in results.3 Researchers are encouraged to use instruments with published psychometric properties whenever possible rather than self-designed instruments. Examples of tested instruments, along with methods of evaluating or conducting quantitative studies, including randomized, observational, and quality improvement studies and case reports, are on the Equator Network: Enhancing Quality and Transparency of Health Research Web site (


Comparative Effectiveness Research

Comparative effectiveness research (CER), in a world in which the randomized clinical trial dominates as the gold standard of research, is not by definition intervention research, and as such, purists would not expect it to be included in this manuscript. Comparative effectiveness research is considered observational research in which the benefits and harms of strategies previously tested in clinical trials (to determine efficacy) are evaluated on patients in "real world" clinical settings (to determine effectiveness).7 Instead of randomizing to treatment or control groups, interventions are evaluated as part of usual clinical care. This model is attractive to nurse scientists and consistent with nursing ethics in which there is interest in evaluating the effectiveness of interventions for a broader segment of the population, in patient settings in which full compliance with the research protocol (as in the randomized clinical trial) is not always feasible, and when there may be exceptions to patient inclusion criteria. However, researchers are held to standards that include a well-developed study plan, rigor in measurement, nonbiased analysis, and plausible interpretation7 to advance science and inform decision making about use of the intervention. Various groups have established principles of conducting and evaluating CER; the principles are summarized and explained in The GRACE (Good ReseArch for Comparative Effectiveness) Initiative at



Patient-centered outcomes research assimilates CER, which is quantitative in nature, and qualitative research methods. Patients and families are included in the development and planning of the study so that outcomes that reflect patients' needs and preferences are addressed. Also, other stakeholders such as clinicians, policy makers, community leaders, and payers can make better informed decisions about which interventions are superior.8 A qualitative or subjective approach may be used initially to determine patients' views, followed by quantitative or objective methods during evaluation of the intervention. When patients guide research that results in improved care as they see it, adherence to treatment plans is enhanced. Numerous projects have been funded by the PCOR Institute since 2013; some recent examples include Comparative Effectiveness of Breast Cancer Screening and Diagnostic Evaluation by Extent of Breast Density (Miglioretti), A Comparison: High Intense Periodic vs. Every Week Therapy in Children with Cerebral Palsy (ACHIEVE) (Heathcock), and Assessing Health Outcomes in Rural Areas where Nurse Practitioners Provide Primary Care-Tier II (Stephens). A full listing of funded projects can be found at



So how does one determine what method to use when conducting intervention research? It goes back to what is known (and not known) about the subject of interest and what the researcher seeks to learn. Clear articulation of the research question cannot be overemphasized; the choice of qualitative, quantitative, or mixed methods depends on the research question. Meaningful results start with measurement using the best available instrument and require analysis using rigorous methods, whether qualitative, quantitative, or mixed approaches are used.




1. Equator Network. Enhancing quality and transparency of health research. Reporting guidelines for main study types. Accessed May 24, 2016.


2. Polit DF, Beck CT. Qualitative research design and approaches. In: Nursing Research: Generating and Assessing Evidence for Nursing Practice. 9th ed. Philadelphia, PA: Lippincott Williams & Wilkins; 2012:486-514.


3. Waltz CF, Strickland OL, Lenz ER. Measurement theories and frameworks. In: Measurement in Nursing and Health Research. 4th ed. New York: Springer Publishing Co.; 2010:49-90.


4. Pedhazur EJ, Schmelkin LP. Measurement and scientific theory. In: Measurement, Design, and Analysis: An Integrated Approach. New York: Psychology Press; 2013:15-29.


5. Polit DF, Beck CT. Inferential statistics. In: Nursing Research: Generating and Assessing Evidence for Nursing Practice. 9th ed. Philadelphia, PA: Lippincott Williams & Wilkins; 2012:404-432.


6. Higgins AM, Harris AH. Health economic methods: Cost-minimization, cost-effectiveness, cost-utility, and cost-benefit evaluations. Crit Care Clin. 2012;28:11-24.


7. The GRACE Initiative. GRACE: Good ReseArch for Comparative Effectiveness. Accessed May 24, 2016.


8. Patient-centered Outcomes Research Institute. Research we support. Accessed May 24, 2016.