Instrument Review Criteria
The Minnesota Interagency Developmental Screening Task Force determines the criteria for review of developmental and social–emotional screening instruments under consideration for recommended/approved status. Information on screening instruments is gathered from several sources, including administration manuals, technical documents, literature reviews, and communication with instrument developers and publishers. Developmental and social–emotional screening instruments that sufficiently meet the criteria outlined under the categories of Instrument Purpose, Developmental Domains, Reliability, Validity, and Sensitivity/Specificity are considered for recommended/approved status.
The Minnesota Interagency Developmental Screening Task Force reserves the right to modify the criteria standards used in the review process of developmental and social–emotional screening instruments. The Task Force will continue to integrate research and evidence–based practice in the review of developmental and social–emotional screening instruments.
- Criteria: The Task Force evaluates the purpose of the instrument to ensure that it is focused on screening rather than assessment or diagnostic evaluation, and that it is designed to screen for developmental and social–emotional health rather than to predict the child's future academic success.
If a purpose is not clearly defined in a brief statement, the Task Force reviews the descriptive materials about the instrument in an attempt to determine the instrument’s purpose.
- Criteria: The following domains must be included in developmental screening: motor, language, cognitive, and social–emotional.
Currently, the social–emotional domains embedded within developmental screening instruments do not demonstrate adequate reliability and validity to determine if a child needs further assessment. Therefore, the Task Force also reviews and recommends separate instruments for the social–emotional domain.
Reliability is an indicator of how consistently identical results can be obtained with the same screening instrument. A reliable instrument is one in which differences in test results are attributable less to chance and more to systematic factors, such as lasting and general characteristics of the child (Meisels & Atkins–Burnett, 2005).
- Criteria: The Task Force expects reliability scores of approximately 0.70 or above. Each instrument is evaluated on the actual reliability scores and the methods used to obtain these scores, such as scores by age, test–retest, inter–rater and intra–rater reliabilities.
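To illustrate the kind of coefficient the criterion above refers to, the sketch below computes a test–retest reliability as the Pearson correlation between two administrations of the same screen. The scores are hypothetical and are not drawn from any reviewed instrument.

```python
# Hypothetical test-retest reliability: the same six children are screened
# twice, and the Pearson correlation between the two sets of scores serves
# as the reliability coefficient. All scores are invented for illustration.

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)

first_administration = [12, 15, 9, 20, 14, 18]    # hypothetical scores, week 1
second_administration = [13, 14, 10, 19, 15, 17]  # same children, week 2

r = pearson_r(first_administration, second_administration)
print(f"test-retest reliability: {r:.2f}")  # meets the ~0.70 benchmark
```

A real technical manual would report such coefficients for much larger samples, broken out by age band and by method (test–retest, inter–rater, intra–rater), as the criterion describes.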
Validity is an indicator of the accuracy of a test. Primarily, two forms of validity are considered:
Concurrent validity: This compares screening results with outcomes derived from a reliable and valid diagnostic assessment usually performed 7–10 days after the screening test. The validity coefficient reports the agreement between the two tests (Meisels & Atkins–Burnett, 2005).
Predictive validity: This compares the screening results with measures of children’s performance obtained 9–12 months later (Meisels & Atkins–Burnett, 2005).
- Criteria: The Task Force expects validity scores of approximately 0.70 or above. Each instrument is evaluated on the actual validity scores and on the methods used to obtain these scores. Measures of validity must be conducted on a significant number of children and using an appropriate standardized developmental or social–emotional assessment instrument(s).
Sensitivity and specificity are the primary means of evaluating a developmental and social-emotional screening instrument’s capacity to correctly identify children as “at risk” or “not at risk.” Sensitivity refers to the proportion of children who are “at risk” and are correctly identified as such by the test. Specificity refers to the proportion of children who are “not at risk” and are correctly excluded from further diagnostic assessment (Meisels & Atkins–Burnett, 2005).
- Criteria: The Task Force expects sensitivity and specificity scores of approximately 0.70 or above.
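As an illustration of how these two proportions are computed from screening results compared against a diagnostic assessment, here is a minimal sketch; all counts are hypothetical, not drawn from any instrument review.

```python
# Hypothetical screening outcomes compared against a diagnostic assessment.
# All counts are invented for illustration.

def sensitivity(true_pos, false_neg):
    """Proportion of at-risk children the screen correctly flags."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg, false_pos):
    """Proportion of not-at-risk children the screen correctly passes."""
    return true_neg / (true_neg + false_pos)

# Suppose 104 children are screened and the diagnostic assessment
# identifies 24 of them as at risk.
true_pos = 18   # at risk, flagged by the screen
false_neg = 6   # at risk, missed by the screen
true_neg = 70   # not at risk, passed by the screen
false_pos = 10  # not at risk, incorrectly flagged

sens = sensitivity(true_pos, false_neg)   # 18 / 24 = 0.75
spec = specificity(true_neg, false_pos)   # 70 / 80 = 0.875
print(f"sensitivity: {sens:.2f}, specificity: {spec:.2f}")
```

In this hypothetical, both values clear the approximately 0.70 benchmark, so the instrument would satisfy this criterion.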
Our understanding of, and expectations for, child development change over time as new research emerges and as changes occur in population demographics, technology, and curriculum. According to published test standards, screening instrument normative data should be updated every 10–15 years to account for these changes (Emmons & Alfonso, 2005; Head Start, 2011; Glascoe, 2014).
- Criteria: The Task Force recommends instruments that have been developed or normed within the last 15 years, unless no other equivalent instrument is available that better meets the screening need for the given population. Additional considerations may include whether the instrument has had recent or ongoing research that demonstrates its effectiveness in identifying children who need further evaluation for developmental or social-emotional concerns.
The following items are important considerations in the selection of an instrument and are reviewed by the Task Force. Issues related to these items will not, on their own, result in failure to achieve recommended/approved status; however, when combined with concerns in the criteria above, they may lead to an instrument being eliminated from consideration.
Practicality refers to the ease of administering the screening instrument and the amount of time needed to administer and score it.
- Criteria: The instrument should typically take 30 minutes or less to administer to English–speaking populations.
Population and age span targeted by the instrument
The Task Force considers two parts: the target group for whom the instrument was designed and standardized, and the ages of children the instrument is designed to screen. This information should be clearly stated by the developer or publisher.
Cultural, ethnic, and linguistic sensitivity
The Task Force considers three items:
- The availability of the instrument in languages other than English.
- The instrument’s ability to accurately screen children from diverse cultures.
- Normative scores (the scores used to establish appropriate cutoff points for referral) should be provided for the population for which the test was developed (Meisels & Atkins–Burnett, 2005).
Minimum expertise of screeners
Screening instruments are designed to be administered by persons with varying levels of expertise, such as assistants, teachers, or psychologists. Some instruments allow a paraprofessional to administer the screening instrument but require a professional to score or evaluate the results to determine whether the child should be referred for further assessment. The Task Force looks at the presence of training materials or the availability of training workshops through which screeners can receive training on proper administration. The Task Force also considers that an instrument requiring administration by a psychologist or similar professional may be an assessment instrument rather than a screening instrument.
The Task Force understands that school districts and organizations responsible for screening programs consider cost when selecting a developmental screening instrument. For this reason, the Task Force provides cost information on each developmental screening instrument, as available from the publisher.
Emmons, M.R. & Alfonso, V.C. (2005). A critical review of the technical characteristics of current preschool screening batteries. Journal of Psychoeducational Assessment, 23(11).
Glascoe, F.P. (2014). Best practices in test construction: Quality standards for reviewers and researchers. Journal of Developmental and Behavioral Pediatrics (submitted).
Meisels, S. J., & Atkins–Burnett, S. (2005). Developmental screening in early childhood: A guide (5th ed.). Washington, DC: National Association for the Education of Young Children.
Nunnally, J.C., & Bernstein, I.H. (1994). Psychometric theory (3rd ed.). New York: McGraw-Hill.
U.S. Department of Health and Human Services Administration for Children & Families (2011). Resources for Measuring Services and Outcomes in Head Start Programs Serving Infants and Toddlers. Retrieved 8/11/2014 from http://www.acf.hhs.gov.