Differential Item Functioning: Beyond validity evidence based on internal structure

  1. Juana Gómez-Benito, Universitat de Barcelona (Barcelona, Spain). ROR: https://ror.org/021018s57
  2. Stephen Sireci, University of Massachusetts Amherst (Amherst Center, United States). ROR: https://ror.org/0072zz521
  3. José-Luis Padilla, Universidad de Granada (Granada, Spain). ROR: https://ror.org/04njjy449
  4. M. Dolores Hidalgo, Universidad de Murcia (Murcia, Spain). ROR: https://ror.org/03p3aeb86
  5. Isabel Benítez, Universidad Loyola (La Paz, Bolivia). ROR: https://ror.org/01wfnf418

Journal: Psicothema

ISSN: 0214-9915

Year of publication: 2018

Volume: 30

Issue: 1

Pages: 104-109

Type: Article


Abstract

Background: In the latest edition of the Standards for Educational and Psychological Testing, Differential Item Functioning (DIF) is considered validity evidence based on internal structure. However, the Standards give no guidance on how to design a DIF study as a validation study. In this paper, we propose relating DIF to all sources of validity evidence and provide a general conceptual framework for transforming “typical” DIF studies into validation studies. Method: We perform a comprehensive review of the literature and make theoretical and practical proposals. Results: The article provides arguments in favour of addressing DIF detection and interpretation as validation studies, as well as suggestions for conducting DIF validation studies. Discussion: Combining quantitative and qualitative data within a mixed methods research perspective, together with planning DIF studies as validation studies, can help improve the validity of test score interpretations.
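Although the article’s argument is conceptual, the DIF detection step it reframes is a concrete statistical procedure. As an illustration of the “typical” quantitative DIF study that the authors propose to embed in a validation design, below is a minimal sketch of the Mantel-Haenszel procedure cited in the references (Holland & Thayer, 1988). The sketch is ours, not the authors’: the function name, the 0/1 response layout, and the use of total test score as the matching variable are assumptions made for the example.

```python
import numpy as np

def mantel_haenszel_dif(responses, group, item):
    """Mantel-Haenszel DIF analysis for one dichotomous item (a sketch).

    responses : (n_persons, n_items) array of 0/1 item scores
    group     : (n_persons,) array, 0 = reference group, 1 = focal group
    item      : column index of the studied item

    Returns the common odds ratio alpha_MH, its value on the ETS delta
    scale (-2.35 * ln(alpha_MH)), and the continuity-corrected MH
    chi-square statistic. Examinees are matched on total test score.
    Assumes non-degenerate data (at least one informative score stratum).
    """
    responses = np.asarray(responses)
    group = np.asarray(group)
    total = responses.sum(axis=1)            # matching variable

    num = den = 0.0                          # numerator/denominator of alpha_MH
    sum_a = sum_ea = sum_var = 0.0           # pieces of the chi-square

    for k in np.unique(total):
        stratum = total == k
        ref = stratum & (group == 0)
        foc = stratum & (group == 1)
        if ref.sum() == 0 or foc.sum() == 0:
            continue                         # stratum carries no information
        a = responses[ref, item].sum()       # reference, correct
        b = ref.sum() - a                    # reference, incorrect
        c = responses[foc, item].sum()       # focal, correct
        d = foc.sum() - c                    # focal, incorrect
        t = a + b + c + d                    # stratum size
        if t < 2:
            continue
        num += a * d / t
        den += b * c / t
        m1, m0 = a + c, b + d                # correct / incorrect column totals
        sum_a += a
        sum_ea += (a + b) * m1 / t           # E(a) under the no-DIF hypothesis
        sum_var += (a + b) * (c + d) * m1 * m0 / (t ** 2 * (t - 1))

    alpha = num / den
    delta = -2.35 * np.log(alpha)            # ETS delta metric
    chi2 = (abs(sum_a - sum_ea) - 0.5) ** 2 / sum_var
    return alpha, delta, chi2
```

On the ETS delta scale, values near 0 indicate negligible DIF, and |delta| ≥ 1.5 together with a significant chi-square is conventionally flagged as large (“C-level”) DIF. The article’s point is that such a flag should be the start of a validation inquiry, for example cognitive interviewing of flagged items (Benítez & Padilla, 2014), rather than its conclusion.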

Funding information

This work was supported by Spain’s Ministry of Economy and Competitiveness [grant number PSI2015-67984-R] and by the Andalusian Regional Government under the Excellence Research Fund [SEJ-6569].

References

  • American Psychological Association, American Educational Research Association & National Council on Measurement in Education (1974). Standards for educational and psychological tests. Washington, DC: American Psychological Association.
  • American Educational Research Association, American Psychological Association & National Council on Measurement in Education (1999). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
  • American Educational Research Association, American Psychological Association & National Council on Measurement in Education (2014). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
  • Angoff, W. H. (1993). Perspectives on Differential Item Functioning methodology. In P. W. Holland & H. Wainer (Eds.), Differential Item Functioning (pp. 3-23). Hillsdale, NJ: Lawrence Erlbaum.
  • Angoff, W. H., & Ford, S. F. (1973). Item-race interaction on a test of scholastic aptitude. Journal of Educational Measurement, 10(2), 95-106. https://doi.org/10.1111/j.1745-3984.1973.tb00787.x
  • Benítez, I., & Padilla, J. L. (2014). Analysis of non-equivalent assessments across different linguistic groups using a Mixed Methods approach: Understanding the causes of Differential Item Functioning by Cognitive Interviewing. Journal of Mixed Methods Research, 8(1), 52-68. https://doi.org/10.1177/1558689813488245
  • Benítez, I., Padilla, J. L., Hidalgo, M. D., & Sireci, S. (2015). Using mixed methods to interpret Differential Item Functioning. Applied Measurement in Education, 29(1), 1-16. https://doi.org/10.1080/08957347.2015.1102915
  • Camilli, G., & Shepard, L. A. (1994). Methods for identifying biased test items. Thousand Oaks, CA: Sage Publications.
  • Cizek, G.J., Rosenberg, S.L., & Koons, H.H. (2008). Sources of validity evidence for educational and psychological tests. Educational and Psychological Measurement, 68(3), 397-412. https://doi.org/10.1177/0013164407310130
  • Cleary, T. A., & Hilton, T. L. (1968). An investigation of item bias. Educational and Psychological Measurement, 28(1), 61-75. https://doi.org/10.1177/001316446802800106
  • Cohen, A.S., & Bolt, D.M. (2002). A mixture model analysis of differential item functioning. Paper presented at the Annual Meeting of the American Educational Research Association. New Orleans.
  • Creswell, J. W. (2015). A concise introduction to mixed methods research. Thousand Oaks, CA: Sage Publications.
  • Crocker, L. (1997). Editorial: The great validity debate. Educational Measurement: Issues and Practice, 16(2), 4. https://doi.org/10.1111/j.1745-3992.1997.tb00584.x
  • Embretson, S. (1983). Construct validity: Construct representation versus nomothetic span. Psychological Bulletin, 93(1), 179-197. https://doi.org/10.1037/0033-2909.93.1.179
  • Evers, A., Muñiz, J., Hagemeister, C., Høstmælingen, A., Lindley, P., Sjöberg, A., & Bartram, D. (2013). Assessing the quality of tests: Revision of the EFPA review model. Psicothema, 25(3), 283-291. https://doi.org/10.7334/psicothema2013.97
  • Gadermann, A. M., Guhn, M., & Zumbo, B. D. (2011). Investigating the substantive aspect of construct validity for the Satisfaction with Life Scale adapted for children: A focus on cognitive processes. Social Indicators Research, 100, 37-60. https://doi.org/10.1007/s11205-010-9603-x
  • Gómez-Benito, J., Balluerka, N., González, A., Widaman, K. F., & Padilla, J. L. (2017). Detecting differential item functioning in behavioral indicators across parallel forms. Psicothema, 29(1), 91-95. https://doi.org/10.7334/psicothema2015.112
  • Hernández, A., Tomás, I., Ferreres, A., & Lloret, S. (2015). Tercera evaluación de test editados en España [Third evaluation of tests published in Spain]. Papeles del Psicólogo, 36(1), 1-8.
  • Hidalgo, M. D., & Gómez-Benito, J. (2010). Education measurement: Differential item functioning. In P. Peterson, E. Baker & B. McGaw (Eds.), International Encyclopedia of Education (3rd ed., Vol. 4, pp. 36-44). USA: Elsevier Science & Technology.
  • Hidalgo, M. D., Benítez, I., Padilla, J. L., & Gómez-Benito, J. (2017). How polytomous item bias can affect total-group survey score comparisons. Sociological Methods and Research, 46(3), 586-603. https://doi.org/10.1177/0049124115605333
  • Hidalgo, M. D., López-Martínez, M. D., Gómez-Benito, J., & Guilera, G. (2016). A comparison of discriminant logistic regression and Item Response Theory Likelihood-Ratio Tests for Differential Item Functioning (IRTLRDIF) in polytomous short tests. Psicothema, 28(1), 83-88. https://doi.org/10.7334/psicothema2015.142
  • Holland, P. W., & Thayer, D. T. (1988). Differential item performance and the Mantel-Haenszel procedure. In H. Wainer & H. I. Braun (Eds.), Test validity (pp. 129-145). Hillsdale, NJ: Erlbaum.
  • Kane, M. T. (2006). Validation. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 17-64). Westport, CT: Praeger.
  • Kane, M.T. (2013). Validating the interpretations and uses of test scores. Journal of Educational Measurement, 50(1), 1-73. https://doi.org/10.1111/jedm.12000
  • Lane, S. (2014). Validity evidence based on testing consequences. Psicothema, 26(1), 127-135. https://doi.org/10.7334/psicothema2013.258
  • Maddox, B., Zumbo, B. D., Tay-Lim, B., & Qu, D. (2015). An anthropologist among the psychometricians: Assessment events, ethnography, and Differential Item Functioning in the Mongolian Gobi. International Journal of Testing, 15(4), 291-309. https://doi.org/10.1080/15305058.2015.1017103
  • Mellenbergh, G. J. (1982). Contingency table models for assessing item bias. Journal of Educational Statistics, 7(2), 105-118. https://doi.org/10.2307/1164960
  • Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (pp. 13-103). New York: Macmillan.
  • Penfield, R. D., & Lam, T. C. M. (2000). Assessing differential item functioning in performance assessment: Review and recommendations. Educational Measurement: Issues and Practice, 19(3), 5-15. https://doi.org/10.1111/j.1745-3992.2000.tb00033.x
  • Prieto, G., & Muñiz, J. (2000). Un modelo para evaluar la calidad de los tests utilizados en España [A model for evaluating the quality of tests used in Spain]. Papeles del Psicólogo, 77, 65-71.
  • Rogers, J., & Swaminathan, H. (2016). Concepts and methods in research on differential item functioning of test items: Past, present, and future. In C. S. Wells & M. Faulkner-Bond (Eds.), Educational measurement (pp. 126-142). New York: Guilford Press.
  • Samuelson, K. (2005). Examining Differential Item Functioning from a latent class perspective (Doctoral dissertation). University of Maryland.
  • Sireci, S. G. (2005a). Using bilinguals to evaluate the comparability of different language versions of a test. In R. K. Hambleton, P. Merenda & C. Spielberger (Eds.), Adapting educational and psychological tests for cross-cultural assessment (pp. 117-138). Hillsdale, NJ: Lawrence Erlbaum.
  • Sireci, S. G. (2005b). Unlabeling the disabled: A perspective on flagging scores from accommodated test administrations. Educational Researcher, 34(1), 3-12. https://doi.org/10.3102/0013189X034001003
  • Sireci, S. G. (2016). On the validity of useless tests. Assessment in Education: Principles, Policy & Practice, 23(2), 226-235. https://doi.org/10.1080/0969594X.2015.1072084
  • Sireci, S. G., & Padilla, J. L. (2014). Validity assessment: Introduction to the special section. Psicothema, 26(1), 97-99. https://doi.org/10.7334/psicothema2013.255
  • Sireci, S. G., & Ríos, J.A. (2013). Decisions that make a difference in detecting differential item functioning. Educational Research and Evaluation, 19(2-3), 170-187. https://doi.org/10.1080/13803611.2013.767621
  • Spearman, C. (1904). The proof and measurement of association between two things. American Journal of Psychology, 15(1), 72-101. https://doi.org/10.2307/1412159
  • Swaminathan, H., & Rogers, H. J. (1990). Detecting differential item functioning using logistic regression procedures. Journal of Educational Measurement, 27(4), 361-370. https://doi.org/10.1111/j.1745-3984.1990.tb00754.x
  • Zumbo, B.D. (2007). Three generations of DIF analyses: Considering where it has been, where it is now, and where it is going. Language Assessment Quarterly, 4(2), 223-233. https://doi.org/10.1080/15434300701375832
  • Zumbo, B. D., Liu, Y., Wu, A. D., Shear, B. R., Olvera Astivia, O. L., & Ark, T. K. (2015). A methodology for Zumbo’s third generation DIF analyses and the ecology of item responding. Language Assessment Quarterly, 12(1), 136-151. https://doi.org/10.1080/15434303.2014.972559