Stotts, Nancy A., EdD, RN, FAAN
Aldrich, Katherine M., MS, RN, NP


Instruments once used solely for research are now being integrated into clinical practice. Most nurses depend on instruments to assess patient status, processes, and outcomes. They use them to quantify concepts as concrete and objectively measurable as the presence or absence of pressure ulcers, as well as concepts as abstract and subjectively measured as level of anxiety, which can be assessed only by focusing on its attributes. For anxiety, these attributes might include irritability, restlessness, sweating, and worry.


No instrument is perfect, regardless of how many years it has been in development or use. It can be difficult to measure abstract concepts, and errors may be caused by the instrument, its user, or the circumstances of administration. The goal is to get the most accurate measurement possible. When evaluating an instrument, clinicians must know whether it has performed well for others studying a similar population in a similar setting.


How do you determine which instruments you can trust and should use? First, clearly define the construct you want to measure. Without a well-defined construct (for example, delirium in older adults who already have dementia), you can't determine whether the instrument will actually measure it. Check the literature for review papers on the topic; they provide a good foundation for defining the construct and thinking about its attributes. In the How to Try This series, that content is summarized for you.


The instrument's psychometric properties are the next consideration. A tool's performance is generally described in terms of these properties, which include reliability, validity, sensitivity, and specificity. Most of them are evaluated statistically.


The How to Try This series describes instruments that can be used with older patients. The following definitions will help you understand their psychometric attributes.


Reliability refers to the degree to which an instrument produces consistent and stable results over time and under similar conditions. Reliability is indicated by a correlation coefficient, where 0 is no agreement and 1 is total agreement. A correlation coefficient between 0.8 and 0.9 is desirable, but 0.7 is acceptable for new instruments. The following are types of reliability.


Internal consistency is assessed with Cronbach's α (alpha), which indicates how well the items on a scale correlate with one another and with the underlying concept. Internal consistency increases as Cronbach's α nears 1.
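To make the computation concrete, here is a minimal sketch of Cronbach's α in Python. The responses are invented for illustration (four people answering a hypothetical three-item scale); they do not come from any instrument in the series.

```python
# Minimal sketch of Cronbach's alpha (illustrative data only).
# alpha = (k / (k - 1)) * (1 - sum(item variances) / variance(total scores))

def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def cronbach_alpha(responses):
    """responses: one row per respondent, one column per scale item."""
    k = len(responses[0])                  # number of items on the scale
    items = list(zip(*responses))          # transpose: one tuple per item
    item_var_sum = sum(variance(item) for item in items)
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

# Hypothetical responses: 4 people rating 3 items on a 1-to-5 scale.
responses = [
    [3, 3, 3],
    [4, 4, 3],
    [2, 2, 2],
    [5, 4, 5],
]
print(round(cronbach_alpha(responses), 2))  # 0.95, well above the desirable 0.8
```

When items move together across respondents, as they do here, the variance of the total score dwarfs the summed item variances and α approaches 1.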


Stability, sometimes called test-retest reliability, reflects whether the same score is obtained when the scale is administered on more than one occasion to the same person. When stability is reported, the amount of time that elapsed between uses of the instrument should be included.
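Test-retest reliability is typically quantified as the correlation between the two administrations. The sketch below computes a Pearson correlation coefficient on invented scores from five people tested twice; the two-week interval is likewise hypothetical.

```python
import math

# Minimal sketch: test-retest reliability as the Pearson correlation
# between two administrations of the same scale (illustrative scores).

def pearson_r(x, y):
    """Pearson correlation coefficient between two paired score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

first_administration = [10, 12, 8, 15, 11]    # hypothetical scores at time 1
second_administration = [11, 12, 9, 14, 10]   # same people, two weeks later
print(round(pearson_r(first_administration, second_administration), 2))  # 0.94
```

A coefficient of about 0.94 would indicate good stability; scores near 0.7 would be the floor of acceptability for a new instrument.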


Equivalence refers to the level of agreement when two versions of the same instrument (such as the 5- and 30-question forms of the Geriatric Depression Scale) are used or when two testers use the same tool. Instrument equivalence is also known as "split-half reliability" or "alternate-form reliability." The consistency with which two or more people administer the tool is called interrater reliability. It's reported using Cohen's κ (kappa) coefficient; the closer the result is to 1, the better the interrater reliability.
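Cohen's κ corrects the raters' observed agreement for the agreement expected by chance alone. A minimal sketch, using invented yes/no screening judgments from two hypothetical raters:

```python
# Minimal sketch of Cohen's kappa for two raters (illustrative data).
# kappa = (observed agreement - chance agreement) / (1 - chance agreement)

def cohens_kappa(rater_a, rater_b):
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    categories = set(rater_a) | set(rater_b)
    # Chance agreement: for each category, the product of the two raters'
    # marginal proportions, summed over categories.
    chance = sum((rater_a.count(c) / n) * (rater_b.count(c) / n)
                 for c in categories)
    return (observed - chance) / (1 - chance)

# Hypothetical ratings: two nurses judging whether each of 10 patients
# screens positive on the same tool.
rater_a = ["yes", "no", "yes", "yes", "no", "yes", "no", "yes", "no", "yes"]
rater_b = ["yes", "no", "no", "yes", "no", "yes", "yes", "yes", "no", "yes"]
print(round(cohens_kappa(rater_a, rater_b), 2))  # 0.58
```

Here the raters agree on 8 of 10 patients (80%), but because chance alone would produce 52% agreement, κ is a more modest 0.58.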


Validity refers to the extent to which the instrument measures what it is designed to measure. Validity can vary from one sample or setting to another and by how the instrument is used. An instrument that is valid in one situation may not be in another. Several types of validity are elements of overall or construct validity.


Content validity establishes how well the instrument measures what it's supposed to, such as fall risk or sleepiness. It is determined from reports in the literature, by content experts, and by people using the instrument. When experts judge content validity, a content-validity index may be reported; it ranges from 0 to 1, with 0.9 or greater considered acceptable.
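An item-level content-validity index is simply the proportion of experts who rate the item as relevant. The sketch below assumes the common convention of a 4-point relevance scale on which ratings of 3 or 4 count as "relevant"; the expert ratings are invented.

```python
# Minimal sketch: item-level content-validity index (I-CVI), computed as the
# proportion of experts rating the item as relevant (3 or 4 on a 1-4 scale).
# The ratings below are hypothetical.

expert_ratings = [4, 3, 4, 4, 3, 4, 2, 4, 3, 4]   # 10 experts rate one item
relevant = sum(1 for rating in expert_ratings if rating >= 3)
i_cvi = relevant / len(expert_ratings)
print(i_cvi)  # 0.9, right at the acceptable threshold
```

With 9 of 10 experts judging the item relevant, the index is 0.9; one more dissenting expert would drop it below the acceptable range.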


Convergent validity compares the score obtained with one instrument with that obtained with another that measures the same concept; the scores should be related. Divergent validity compares instruments that measure opposite concepts (for example, hope and hopelessness); with divergent validity, the score on one increases as the score on the other decreases.


Validity can also be established by comparing the scores of two contrasting groups that are expected to differ. For example, researchers might compare the mobility of people living in the community with that of hospitalized patients. If the expectation that hospitalized patients have more limited mobility holds, a valid instrument will show them to be less mobile than those living in the community.


The predictive validity of an instrument reflects how accurately it predicts a future outcome. For example, a test of how strongly someone believes she or he can exercise has predictive validity if its scores accurately predict future exercise performance.


Sensitivity, specificity, and predictive values indicate how useful an instrument is in clinical situations in comparison with a well-established instrument or set of diagnostic criteria (often described as a "gold standard").


Sensitivity refers to the ability of a tool to detect a disease or condition when it is actually present; specificity refers to the ability of a tool to exclude a condition when it is not present. The two are typically inversely related: as a tool's cutoff is adjusted to raise its sensitivity, its specificity tends to decrease.


An instrument's positive predictive value is the probability that a person the tool identifies as having the disease or condition actually has it; its negative predictive value is the probability that a person the tool identifies as not having the condition truly does not.
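All four of these measures can be read directly off a two-by-two table comparing the tool's results against the gold standard. A sketch with invented counts for 150 hypothetical patients:

```python
# Minimal sketch: sensitivity, specificity, and predictive values from a
# 2x2 table comparing a tool against a gold standard (invented counts).

true_positives = 45    # tool positive, condition present
false_negatives = 5    # tool negative, condition present
true_negatives = 80    # tool negative, condition absent
false_positives = 20   # tool positive, condition absent

# Sensitivity: of everyone with the condition, how many does the tool catch?
sensitivity = true_positives / (true_positives + false_negatives)   # 45/50 = 0.90
# Specificity: of everyone without it, how many does the tool clear?
specificity = true_negatives / (true_negatives + false_positives)   # 80/100 = 0.80
# PPV: of everyone the tool flags, how many truly have the condition?
ppv = true_positives / (true_positives + false_positives)           # 45/65 = 0.69
# NPV: of everyone the tool clears, how many truly do not?
npv = true_negatives / (true_negatives + false_negatives)           # 80/85 = 0.94

print(sensitivity, specificity, round(ppv, 2), round(npv, 2))
```

Note that sensitivity and specificity condition on the patient's true status, while the predictive values condition on the tool's result, which is why a tool with 90% sensitivity can still have a PPV of only about 0.69 in this example.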



Ideally the instrument you choose will have multiple types of validity and reliability already established, but this isn't always the case when an instrument is new. An instrument should meet at least the following requirements before being used:


* It should have established basic measures of validity and reliability, including content validity, sensitivity, specificity, and internal consistency.


* It should have been tested in a population similar to yours, with such factors as age, sex, and overall health having been taken into account.


* It should be easy to use in clinical practice, impose a minimal burden on the patient and staff, and be easy to interpret.


* It should provide a better basis for care than your existing approach does.


* It should be appropriate for the patient you want to assess. For example, a numeric rating scale is appropriate for measuring the severity of acute pain. However, more data are needed than a one-dimensional scale can provide if you are working with patients with chronic pain; while it would take more time to administer, a more comprehensive scale would provide a better foundation for the future assessment and treatment of chronic pain.



The instrument also needs to be available. Some instruments are free to use; others are copyrighted, and you must pay a royalty to use them.



We encourage you to consider the concepts that the instruments in the series represent. They are core concepts in caring for older patients. Then we ask you to evaluate whether using the various instruments would enhance your ability to care for older adults. At the very least, we challenge you to read the article, watch the video, and do the assessment. Once you've done that, we encourage you to talk about it with your colleagues. Discuss each of the instruments, the construct that was measured, and how the care you currently provide, based on your current assessment approach, would change if you adopted the instrument. Finally, if you do decide to adopt one or more of these instruments, track your outcomes to see if they improve. Translating research into clinical practice isn't easy, but it may improve outcomes.



