Differential Item Functioning: Beyond validity evidence based on internal structure

  1. Juana Gómez-Benito, Universitat de Barcelona (Barcelona, Spain). ROR: https://ror.org/021018s57
  2. Stephen Sireci, University of Massachusetts Amherst (Amherst Center, United States). ROR: https://ror.org/0072zz521
  3. José-Luis Padilla, Universidad de Granada (Granada, Spain). ROR: https://ror.org/04njjy449
  4. M. Dolores Hidalgo, Universidad de Murcia (Murcia, Spain). ROR: https://ror.org/03p3aeb86
  5. Isabel Benítez, Universidad Loyola (La Paz, Bolivia). ROR: https://ror.org/01wfnf418

Journal: Psicothema

ISSN: 0214-9915

Year of publication: 2018

Volume: 30

Issue: 1

Pages: 104-109

Type: Article


Abstract

Background: In the latest edition of the Standards for Educational and Psychological Testing, Differential Item Functioning (DIF) is considered validity evidence based on internal structure. However, the Standards give no guidance on how to design a DIF study as a validation study. In this paper, we propose relating DIF to all sources of validity evidence and provide a general conceptual framework for transforming “typical” DIF studies into validation studies. Method: We perform a comprehensive review of the literature and make theoretical and practical proposals. Results: The article provides arguments in favour of addressing DIF detection and interpretation as validation studies, as well as suggestions for conducting DIF validation studies. Discussion: Combining quantitative and qualitative data within a mixed methods research perspective, together with planning DIF studies as validation studies, can help improve the validity of test score interpretations.
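Although the article’s argument is conceptual, the DIF detection step it reframes is a concrete statistical procedure. As an illustration of the “typical” quantitative DIF study that the authors propose to embed in a validation design, below is a minimal sketch of the Mantel-Haenszel procedure cited in the references (Holland & Thayer, 1988). The sketch is ours, not the authors’: the function name, the 0/1 response layout, and the use of total test score as the matching variable are assumptions made for the example.

```python
import numpy as np

def mantel_haenszel_dif(responses, group, item):
    """Mantel-Haenszel DIF analysis for one dichotomous item (a sketch).

    responses : (n_persons, n_items) array of 0/1 item scores
    group     : (n_persons,) array, 0 = reference group, 1 = focal group
    item      : column index of the studied item

    Returns the common odds ratio alpha_MH, its value on the ETS delta
    scale (-2.35 * ln(alpha_MH)), and the continuity-corrected MH
    chi-square statistic. Examinees are matched on total test score.
    Assumes non-degenerate data (at least one informative score stratum).
    """
    responses = np.asarray(responses)
    group = np.asarray(group)
    total = responses.sum(axis=1)            # matching variable

    num = den = 0.0                          # numerator/denominator of alpha_MH
    sum_a = sum_ea = sum_var = 0.0           # pieces of the chi-square

    for k in np.unique(total):
        stratum = total == k
        ref = stratum & (group == 0)
        foc = stratum & (group == 1)
        if ref.sum() == 0 or foc.sum() == 0:
            continue                         # stratum carries no information
        a = responses[ref, item].sum()       # reference, correct
        b = ref.sum() - a                    # reference, incorrect
        c = responses[foc, item].sum()       # focal, correct
        d = foc.sum() - c                    # focal, incorrect
        t = a + b + c + d                    # stratum size
        if t < 2:
            continue
        num += a * d / t
        den += b * c / t
        m1, m0 = a + c, b + d                # correct / incorrect column totals
        sum_a += a
        sum_ea += (a + b) * m1 / t           # E(a) under the no-DIF hypothesis
        sum_var += (a + b) * (c + d) * m1 * m0 / (t ** 2 * (t - 1))

    alpha = num / den
    delta = -2.35 * np.log(alpha)            # ETS delta metric
    chi2 = (abs(sum_a - sum_ea) - 0.5) ** 2 / sum_var
    return alpha, delta, chi2
```

On the ETS delta scale, values near 0 indicate negligible DIF, and |delta| ≥ 1.5 together with a significant chi-square is conventionally flagged as large (“C-level”) DIF. The article’s point is that such a flag should be the start of a validation inquiry, for example cognitive interviewing of flagged items (Benítez & Padilla, 2014), rather than its conclusion.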

Funding information

This work was supported by Spain’s Ministry of Economy and Competitiveness [grant number PSI2015-67984-R] and by the Andalusian Regional Government under the Excellence Research Fund [SEJ-6569].

References

  • American Psychological Association, American Educational Research Association & National Council on Measurement in Education (1974). Standards for educational and psychological tests. Washington, DC: American Psychological Association.
  • American Educational Research Association, American Psychological Association & National Council on Measurement in Education (1999). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
  • American Educational Research Association, American Psychological Association & National Council on Measurement in Education (2014). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
  • Angoff, W. H. (1993). Perspectives on Differential Item Functioning methodology. In P. W. Holland & H. Wainer (Eds.), Differential Item Functioning (pp. 3-23). Hillsdale, NJ: Lawrence Erlbaum.
  • Angoff, W. H., & Ford, S. F. (1973). Item-race interaction on a test of scholastic aptitude. Journal of Educational Measurement, 10(2), 95-106. https://doi.org/10.1111/j.1745-3984.1973.tb00787.x
  • Benítez, I., & Padilla, J. L. (2014). Analysis of non-equivalent assessments across different linguistic groups using a Mixed Methods approach: Understanding the causes of Differential Item Functioning by Cognitive Interviewing. Journal of Mixed Methods Research, 8(1), 52-68. https://doi.org/10.1177/1558689813488245
  • Benítez, I., Padilla, J. L., Hidalgo, M. D., & Sireci, S. (2015). Using mixed methods to interpret Differential Item Functioning. Applied Measurement in Education, 29(1), 1-16. https://doi.org/10.1080/08957347.2015.1102915
  • Camilli, G., & Shepard, L. A. (1994). Methods for identifying biased test items. Thousand Oaks, CA: Sage Publications.
  • Cizek, G.J., Rosenberg, S.L., & Koons, H.H. (2008). Sources of validity evidence for educational and psychological tests. Educational and Psychological Measurement, 68(3), 397-412. https://doi.org/10.1177/0013164407310130
  • Cleary, T. A., & Hilton, T. L. (1968). An investigation of item bias. Educational and Psychological Measurement, 28(1), 61-75. https://doi.org/10.1177/001316446802800106
  • Cohen, A.S., & Bolt, D.M. (2002). A mixture model analysis of differential item functioning. Paper presented at the Annual Meeting of the American Educational Research Association. New Orleans.
  • Creswell, J. W. (2015). A concise introduction to mixed methods research. Thousand Oaks, CA: Sage Publications.
  • Crocker, L. (1997). Editorial: The great validity debate. Educational Measurement: Issues and Practice, 16(2), 4. https://doi.org/10.1111/j.1745-3992.1997.tb00584.x
  • Embretson, S. (1983). Construct validity: Construct representation versus nomothetic span. Psychological Bulletin, 93(1), 179-197. https://doi.org/10.1037/0033-2909.93.1.179
  • Evers, A., Muñiz, J., Hagemeister, C., Høstmælingen, A., Lindley, P., Sjöberg, A., & Bartram, D. (2013). Assessing the quality of tests: Revision of the EFPA review model. Psicothema, 25(3), 283-291. https://doi.org/10.7334/psicothema2013.97
  • Gadermann, A. M., Guhn, M., & Zumbo, B. D. (2011). Investigating the substantive aspect of construct validity for the Satisfaction with Life Scale adapted for children: A focus on cognitive processes. Social Indicators Research, 100, 37-60. https://doi.org/10.1007/s11205-010-9603-x
  • Gómez-Benito, J., Balluerka, N., González, A., Widaman, K. F., & Padilla, J. L. (2017). Detecting differential item functioning in behavioral indicators across parallel forms. Psicothema, 29(1), 91-95. https://doi.org/10.7334/psicothema2015.112
  • Hernández, A., Tomás, I., Ferreres, A., & Lloret, S. (2015). Tercera evaluación de test editados en España [Third evaluation of tests published in Spain]. Papeles del Psicólogo, 36(1), 1-8.
  • Hidalgo, M. D., & Gómez-Benito, J. (2010). Education measurement: Differential item functioning. In P. Peterson, E. Baker & B. McGaw (Eds.), International Encyclopedia of Education (3rd ed., Vol. 4, pp. 36-44). USA: Elsevier Science & Technology.
  • Hidalgo, M. D., Benítez, I., Padilla, J. L., & Gómez-Benito, J. (2017). How polytomous item bias can affect total-group survey score comparisons. Sociological Methods and Research, 46(3), 586-603. https://doi.org/10.1177/0049124115605333
  • Hidalgo, M. D., López-Martínez, M. D., Gómez-Benito, J., & Guilera, G. (2016). A comparison of discriminant logistic regression and Item Response Theory Likelihood-Ratio Tests for Differential Item Functioning (IRTLRDIF) in polytomous short tests. Psicothema, 28(1), 83-88. https://doi.org/10.7334/psicothema2015.142
  • Holland, P. W., & Thayer, D. T. (1988). Differential item performance and the Mantel-Haenszel procedure. In H. Wainer & H. I. Braun (Eds.), Test validity (pp. 129-145). Hillsdale, NJ: Erlbaum.
  • Kane, M. T. (2006). Validation. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 17-64). Westport, CT: Praeger.
  • Kane, M.T. (2013). Validating the interpretations and uses of test scores. Journal of Educational Measurement, 50(1), 1-73. https://doi.org/10.1111/jedm.12000
  • Lane, S. (2014). Validity evidence based on testing consequences. Psicothema, 26(1), 127-135. https://doi.org/10.7334/psicothema2013.258
  • Maddox, B., Zumbo, B. D., Tay-Lim, B., & Qu, D. (2015). An anthropologist among the psychometricians: Assessment events, ethnography, and Differential Item Functioning in the Mongolian Gobi. International Journal of Testing, 15(4), 291-309. https://doi.org/10.1080/15305058.2015.1017103
  • Mellenbergh, G. J. (1982). Contingency table models for assessing item bias. Journal of Educational Statistics, 7(2), 105-118. https://doi.org/10.2307/1164960
  • Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (pp. 13-103). New York: Macmillan.
  • Penfield, R. D., & Lam, T. C. M. (2000). Assessing differential item functioning in performance assessment: Review and recommendations. Educational Measurement: Issues and Practice, 19(3), 5-15. https://doi.org/10.1111/j.1745-3992.2000.tb00033.x
  • Prieto, G., & Muñiz, J. (2000). Un modelo para evaluar la calidad de los tests utilizados en España [A model for evaluating the quality of tests used in Spain]. Papeles del Psicólogo, 77, 65-71.
  • Rogers, J., & Swaminathan, H. (2016). Concepts and methods in research on differential item functioning of test items: Past, present, and future. In C. S. Wells & M. Faulkner-Bond (Eds.), Educational measurement (pp. 126-142). New York: Guilford Press.
  • Samuelson, K. (2005). Examining Differential Item Functioning from a latent class perspective (Doctoral dissertation). University of Maryland.
  • Sireci, S. G. (2005a). Using bilinguals to evaluate the comparability of different language versions of a test. In R. K. Hambleton, P. Merenda & C. Spielberger (Eds.), Adapting educational and psychological tests for cross-cultural assessment (pp. 117-138). Hillsdale, NJ: Lawrence Erlbaum.
  • Sireci, S. G. (2005b). Unlabeling the disabled: A perspective on flagging scores from accommodated test administrations. Educational Researcher, 34(1), 3-12. https://doi.org/10.3102/0013189X034001003
  • Sireci, S. G. (2016). On the validity of useless tests. Assessment in Education: Principles, Policy & Practice, 23(2), 226-235. https://doi.org/10.1080/0969594X.2015.1072084
  • Sireci, S. G., & Padilla, J. L. (2014). Validity assessment: Introduction to the special section. Psicothema, 26(1), 97-99. https://doi.org/10.7334/psicothema2013.255
  • Sireci, S. G., & Ríos, J.A. (2013). Decisions that make a difference in detecting differential item functioning. Educational Research and Evaluation, 19(2-3), 170-187. https://doi.org/10.1080/13803611.2013.767621
  • Spearman, C. (1904). The proof and measurement of association between two things. American Journal of Psychology, 15(1), 72-101. https://doi.org/10.2307/1412159
  • Swaminathan, H., & Rogers, H. J. (1990). Detecting differential item functioning using logistic regression procedures. Journal of Educational Measurement, 27(4), 361-370. https://doi.org/10.1111/j.1745-3984.1990.tb00754.x
  • Zumbo, B.D. (2007). Three generations of DIF analyses: Considering where it has been, where it is now, and where it is going. Language Assessment Quarterly, 4(2), 223-233. https://doi.org/10.1080/15434300701375832
  • Zumbo, B. D., Liu, Y., Wu, A. D., Shear, B. R., Olvera Astivia, O. L., & Ark, T. K. (2015). A methodology for Zumbo’s third generation DIF analyses and the ecology of item responding. Language Assessment Quarterly, 12(1), 136-151. https://doi.org/10.1080/15434303.2014.972559