An overview of differential item functioning in multistage computer. How to write syntax for differential item functioning (DIF). People who answered the item correctly at the ability level m, and the proportion of people who answered the item correctly at the ability level m, respectively. A general framework and an R package for the detection of. A handbook on the theory and methods of differential item functioning (DIF). The CHull software program is a MATLAB graphical user interface that performs the CHull procedure. Differential item functioning (DIF) analysis can be used to examine whether items function similarly across different groups and to identify items that appear to be too easy or too difficult after controlling for the ability levels of the compared groups. I'm calibrating an item pool that is specifically meant to measure different ability levels. Differential item functioning (DIF) is an important issue in psychometrics and educational measurement.
This has become an impediment, especially for researchers who are not mathematically oriented. It also describes possible advanced IRT models using R packages, as well as dichotomous and polytomous IRT models; R packages that contain applications such as differential item functioning and equating are also introduced. NAEP technical documentation: number of items by severity of differential item functioning in the mathematics combined national and state assessment. An R package for Raju's differential functioning of items and tests framework. Dorans, Evaluating hypotheses about differential item functioning. Differential item functioning (DIF) refers to differences in item functioning. Software for analyzing differential item functioning. If the factor bringing about such a difference is not part of the construct of focus in the test, then the test would be biased. Several methods have been proposed in recent decades for identifying items that function differently between two or more groups of examinees. Differential item functioning (DIF) occurs when a test item favors or hinders a characteristic exhibited by group members of a test-taking population.
(A) items exhibiting no DIF, (B) items exhibiting a weak indication of DIF, or (C) items exhibiting a strong indication of DIF. A new method for estimating differential item functioning (DIF) for multiple groups and polytomous items. Differential item functioning (DIF), as an assessment tool, has been widely used in quantitative psychology, educational measurement, business management, and the insurance and healthcare industries. Figure 1 displays a scatterplot of the item difficulties for males and females. Analyze DIF with specialized software such as DFIT or PARSCALE.
An introduction to differential item functioning (PDF, ResearchGate). Burton, The effect of item screening on test scores and test characteristics. This is the webpage for the handbook on differential item functioning. Neither the list of the software nor the studies cited are meant to be. Differential item functioning (DIF) has been increasingly applied in fairness studies in psychometric circles. In this investigation, a large sample, representative of a major university on key demographic.
Therefore, a broad range of items is needed, with difficulty levels scattered widely over the scale. Why differential item functioning analyses are an important part of instrument development and. Package difR (May 14, 2018). Type: Package. Title: Collection of Methods to Detect Dichotomous Differential Item Functioning (DIF). Version 5. Starting from a framework for classifying DIF detection methods and from a comparative. You will have output that gives item parameters for your two groups.
Differential item functioning (DIF) is the preferred psychometric term for what is otherwise known as item bias. Table 30 supports the investigation of item bias, differential item functioning (DIF), i. Differential item functioning (DIF) is a statistical characteristic of an item that shows the extent to. Try searching for "differential item functioning Bayesian" to find some resources. Three examples are used to illustrate this approach. A variety of statistical procedures have been developed to assess DIF in tests of dichotomous hills, 1989.
IRT differential item functioning tool assess computerized. An R package for Raju's differential functioning of. Software for the computation of the statistics involved in item response theory likelihood-ratio tests for differential item functioning, 2001, unpublished manuscript, to complete DIF analyses. Doing so requires a careful balancing of the contributions of technology, psychometrics, test design, and the learning sciences. This analysis can be performed by calculating various statistics, one of the most important being the Mantel-Haenszel statistic, which can be carried out with. Ministry of Science and Innovation under the European Regional Development Fund. Perhaps the item is tapping a secondary factor or factors over and above the one of interest. Comparison of three software programs for evaluating DIF. Frontiers: Assessing differential item functioning on the. We analyzed 95 cognitive reading items administered to students in 29 European countries. Software for analyzing differential item functioning using the Mantel-Haenszel and standardization procedures. Logistic regression provides a flexible framework for detecting various types of differential item functioning (DIF).
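The Mantel-Haenszel statistic mentioned above can be sketched in a few lines. This is a minimal illustration, not the routine used by any of the packages named here: the function name, the "ref"/"focal" group labels, and the use of the total score as the matching variable are all assumptions made for the example. It computes the common odds ratio across score levels and rescales it with the commonly used delta transformation, MH D-DIF = -2.35 ln(alpha).

```python
import math
from collections import defaultdict


def mantel_haenszel_dif(responses, groups, scores):
    """Return (alpha_MH, MH D-DIF) for one studied item.

    responses: 0/1 scores on the studied item
    groups:    "ref" or "focal" per examinee (labels are an assumption)
    scores:    matching variable, e.g. the total test score
    """
    # Per score level: [ref correct, ref incorrect, focal correct, focal incorrect]
    strata = defaultdict(lambda: [0, 0, 0, 0])
    for y, g, s in zip(responses, groups, scores):
        cell = (0 if g == "ref" else 2) + (0 if y == 1 else 1)
        strata[s][cell] += 1

    num = den = 0.0
    for a, b, c, d in strata.values():
        n = a + b + c + d
        num += a * d / n  # reference correct x focal incorrect
        den += b * c / n  # reference incorrect x focal correct
    if num == 0 or den == 0:
        raise ValueError("common odds ratio is undefined for these data")

    alpha = num / den
    return alpha, -2.35 * math.log(alpha)


# One score level: 3 of 4 reference and 1 of 4 focal examinees answer correctly.
alpha, delta = mantel_haenszel_dif(
    [1, 1, 1, 0, 1, 0, 0, 0], ["ref"] * 4 + ["focal"] * 4, [10] * 8
)
```

An odds ratio above 1 (a negative delta) means that, among examinees with matching scores, the reference group had better odds on the item. A real analysis would also need a significance test and enough examinees at each score level.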
Differential item functioning analysis with ordinal. Differential item functioning (DIF) has been increasingly applied in. Judicious application of this methodology by researchers, however, requires an understanding of the technical complexities involved. The primary concern in test development and test use, as Bachman 1990. In brief, differential item functioning (DIF) occurs when groups (such as those defined by gender, ethnicity, age, or education) have different probabilities of endorsing a given item on a multi-item scale after controlling for overall scale scores. Tilburg University: Differential item functioning and educational risk. This article introduces these 45 R packages with their descriptions and features. An item displays DIF when test takers possessing the same amount of an ability or trait, but belonging to different subgroups, do not share the same likelihood of correctly answering the item. The TORR is designed for use in school and university settings, and therefore its measurement invariance across diverse groups is critical. Differential item functioning (DIF) is a statistical characteristic of an item that shows the extent to which the item might be measuring different abilities for members of separate subgroups. NAEP TDW: number of items by severity of differential. Thus, differentially functioning items elicit different. Logistic regression modeling as a unitary framework for binary and Likert-type ordinal item scores. A biased item deviates from the item response theory (IRT) models used in NAEP, because the probability of doing well on the item depends not only on what the examinee knows and can do and on the item as reflected in the item parameters, but also on a characteristic of the item that is unrelated to the construct being measured.
The Test of Relational Reasoning (TORR) is designed to assess the ability to identify complex patterns within visuospatial stimuli. DIF occurs when test items function differently for students from two different comparison groups that are matched on the construct. The rows in each group refer to the levels from lower to higher, with the fourth row indicating the sum of each ability level. Average item scores for subgroups having the same overall score on the test are compared to determine whether the item is measuring in essentially the same way for all subgroups. Lewis, A note on the value of including the studied item in the test score when analyzing test items for DIF. DIF analyses are statistical procedures used to determine to what extent the content of an item affects the item endorsement of subgroups of test takers. A new method for estimating differential item functioning (DIF) for. The purpose of the present analysis is to use differential item functioning (DIF) to identify differences in the performance of native and immigrant students in PISA 2009 that can be directly related to their responses to particular items. One approach to the detection of differential item functioning (DIF) is to use logistic regression (LR) based techniques.
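As a sketch of what a logistic regression approach to DIF usually looks like, the model below is the commonly used form for a dichotomous item. The symbols are assumptions introduced for illustration and do not come from any of the sources quoted here: Y is the item response, X the matching score (for example the total test score), and G a 0/1 group indicator.

```latex
\operatorname{logit} P(Y = 1 \mid X, G)
  = \beta_0 + \beta_1 X + \beta_2 G + \beta_3 (X \cdot G)
```

Under this formulation, a nonzero \beta_2 with \beta_3 = 0 points to uniform DIF (one group is favored at every score level), while a nonzero \beta_3 points to nonuniform DIF (the group difference changes with the score level). The terms are typically tested by comparing nested models with likelihood-ratio tests.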
We examined differential item functioning (DIF) indicators for four variables that repeatedly have. From what I found, it is clear that the model would look something like. If DIF is found for many items on the test, the final test scores do. The definitions, methods, and interpretations of differential item functioning are extended to the transpose of the usual person-item matrices.
Thus it can be chosen as a member of the anchor items. As such, software that estimates two-parameter IRT models is required for. Software for analyzing differential item functioning using the. Measuring differential item and test functioning across. Previous efforts extended the framework by using item response theory (IRT) based trait scores, and by employing an iterative process using group-specific item parameters to account for DIF in the trait scores, analogous to purification. Average item scores for subgroups having the same overall score on the test are compared to determine whether the item is measuring in essentially the same way for all. DIF occurs when examinees from different groups show differing probabilities of success on, or of endorsing, the item after matching on the construct that the item is intended to measure; notice that this is exactly the definition of measurement invariance (MI) applied to test items. NAEP analysis and scaling: differential item functioning. This issue, known as test bias, has been the subject of a great deal of recent research, and a technique called differential item functioning (DIF) analysis has become the new standard in psychometric bias analysis.
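The comparison of average item scores for subgroups with the same overall score can be sketched as a standardized difference in proportions correct. This is a minimal illustration under stated assumptions: the function name and the "ref"/"focal" labels are hypothetical, and weighting each score level by the number of focal group examinees is one common choice, not necessarily the one used by the software mentioned here.

```python
from collections import defaultdict


def std_p_dif(responses, groups, scores):
    """Weighted mean of (focal - reference) proportion correct across score levels.

    responses: 0/1 scores on the studied item
    groups:    "ref" or "focal" per examinee (labels are an assumption)
    scores:    matching variable, e.g. the total test score
    """
    # Per score level: [ref count, ref correct, focal count, focal correct]
    levels = defaultdict(lambda: [0, 0, 0, 0])
    for y, g, s in zip(responses, groups, scores):
        base = 0 if g == "ref" else 2
        levels[s][base] += 1
        levels[s][base + 1] += y

    weighted_diff = 0.0
    total_weight = 0
    for n_ref, c_ref, n_foc, c_foc in levels.values():
        if n_ref == 0 or n_foc == 0:
            continue  # a level with only one group cannot be compared
        # Weight each level by its number of focal group examinees.
        weighted_diff += n_foc * (c_foc / n_foc - c_ref / n_ref)
        total_weight += n_foc
    if total_weight == 0:
        raise ValueError("no score level contains both groups")
    return weighted_diff / total_weight


responses = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]
groups = ["ref", "ref", "focal", "focal", "ref", "ref"] + ["focal"] * 4
scores = [1, 1, 1, 1, 2, 2, 2, 2, 2, 2]
index = std_p_dif(responses, groups, scores)
```

A value near zero suggests the item behaves similarly for matched examinees in both groups; a negative value means matched focal group examinees answered correctly less often.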
We provide a tutorial on differential item functioning (DIF) analysis. Measurement invariance and differential item functioning. Differential item functioning magnitude and impact. The purpose of DIF analyses is to detect response differences of items in questionnaires, rating scales, or tests across different. Human development index with uneven distribution of wealth as. Paper 2900-2015: Multiple ways to detect differential item. This paper presents DFIT, an R package that implements the differential functioning of items and tests framework as well as the Monte Carlo item parameter replication approach for producing cutoff points for differential item functioning indices. The primary purpose is to enhance diagnostic assessment in which individual differences in scores between content domains are clarified by conditioning the scores on item difficulty. The analysis of differential item functioning (DIF) examines whether item responses differ according to characteristics such as language and ethnicity, when people with matching ability levels respond differently to the items. For dichotomously scored items, NAEP categorizes each item into one of three categories. This article presents an ordinal logistic model for.