Title: | The Nonparametric Classification Methods for Cognitive Diagnosis |
---|---|
Description: | Statistical tools for analyzing cognitive diagnosis (CD) data collected from small settings using the nonparametric classification (NPCD) framework. The core methods of the NPCD framework include the nonparametric classification (NPC) method developed by Chiu and Douglas (2013) <DOI:10.1007/s00357-013-9132-9> and the general NPC (GNPC) method developed by Chiu, Sun, and Bian (2018) <DOI:10.1007/s11336-017-9595-4> and Chiu and Köhn (2019) <DOI:10.1007/s11336-019-09660-x>. An extension of the NPCD framework included in the package is the nonparametric method for multiple-choice items (MC-NPC) developed by Wang, Chiu, and Koehn (2023) <DOI:10.3102/10769986221133088>. Functions associated with various extensions concerning the evaluation, validation, and feasibility of the CD analysis are also provided. These topics include the completeness of the Q-matrix, the Q-matrix refinement method, and Q-matrix estimation. |
Authors: | Chia-Yi Chiu [aut, cph], Weixuan Xiao [aut, cre], Xiran Wen [aut], Yu Wang [aut] |
Maintainer: | Weixuan Xiao <[email protected]> |
License: | GPL-3 |
Version: | 1.0 |
Built: | 2025-02-22 05:11:15 UTC |
Source: | https://github.com/cran/NPCDTools |
The function is used to compute the attribute-wise agreement rate between two sets of attribute profiles. The two sets must have the same dimensions.
AAR(x, y)
x |
One set of attribute profiles |
y |
The other set of attribute profiles |
The function returns the attribute-wise agreement rate between two sets of attribute profiles.
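The attribute-wise agreement rate is simply the proportion of matching entries across the two profile matrices. A minimal R sketch of the quantity AAR() reports (illustration only, not the package source; the helper name aar_sketch is made up here):

```r
# Attribute-wise agreement rate: proportion of matching entries between
# two attribute-profile matrices of identical dimensions.
aar_sketch <- function(x, y) {
  stopifnot(all(dim(x) == dim(y)))
  mean(x == y)
}

est  <- rbind(c(1, 0, 1), c(0, 1, 1))
true <- rbind(c(1, 0, 0), c(0, 1, 1))
aar_sketch(est, true)  # 5 of 6 entries agree: 0.8333
```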
Function bestQperm
is used to rearrange the columns of the estimated Q so that
the order of the columns best matches that of the true Q-matrix.
bestQperm(estQ, trueQ)
estQ |
The estimated Q-matrix. |
trueQ |
The true Q-matrix. |
The function returns a Q-matrix in which the order of the columns best matches that of the true Q-matrix.
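Because the attribute labels in an estimated Q-matrix are arbitrary, its columns must be aligned with the true Q-matrix before recovery rates are computed. A brute-force sketch of this column-matching idea (illustration only; the package may use a different algorithm, and the helper names are made up here):

```r
# Enumerate all column permutations of estQ and keep the one that
# agrees with trueQ on the largest number of entries.
perms <- function(v) {
  if (length(v) == 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (p in perms(v[-i])) out[[length(out) + 1]] <- c(v[i], p)
  }
  out
}

best_perm_sketch <- function(estQ, trueQ) {
  best <- estQ
  best.match <- -1
  for (p in perms(seq_len(ncol(estQ)))) {
    m <- sum(estQ[, p] == trueQ)
    if (m > best.match) { best.match <- m; best <- estQ[, p] }
  }
  best
}

trueQ <- rbind(c(1, 0), c(0, 1), c(1, 1))
estQ  <- trueQ[, c(2, 1)]          # same matrix with columns swapped
best_perm_sketch(estQ, trueQ)      # recovers the column order of trueQ
```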
This function computes the proportion of corrected q-entries that were originally misspecified in the provisional Q-matrix. This function is used only when the true Q-matrix is known.
correction.rate(ref.Q = ref.Q, mis.Q = mis.Q, true.Q = true.Q)
ref.Q |
The refined Q-matrix obtained by applying a Q-matrix refinement procedure to mis.Q. |
mis.Q |
A misspecified Q-matrix to be refined. |
true.Q |
The true Q-matrix. |
The function returns a value between 0 and 1 indicating the proportion of corrected q-entries in ref.Q that were originally misspecified in mis.Q.
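The correction rate restricts attention to the entries that were wrong before refinement. A small R sketch of what correction.rate() measures (illustration only, not the package source; the helper name is made up here):

```r
# Among the q-entries that are wrong in mis.Q (relative to true.Q),
# the proportion that the refinement corrected in ref.Q.
correction_rate_sketch <- function(ref.Q, mis.Q, true.Q) {
  wrong <- mis.Q != true.Q          # entries misspecified before refinement
  sum(ref.Q[wrong] == true.Q[wrong]) / sum(wrong)
}

true.Q <- rbind(c(1, 0), c(0, 1), c(1, 1))
mis.Q  <- rbind(c(0, 0), c(0, 1), c(1, 0))  # two entries flipped
ref.Q  <- rbind(c(1, 0), c(0, 1), c(1, 0))  # one of them corrected
correction_rate_sketch(ref.Q, mis.Q, true.Q)  # 0.5
```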
Function GNPC
is used to estimate examinees' attribute profiles using
the general nonparametric classification (GNPC) method
(Chiu, Sun, & Bian, 2018; Chiu & Koehn, 2019). It can be
used with data conforming to any CDM.
GNPC( Y, Q, initial.dis = c("hamming", "whamming"), initial.gate = c("AND", "OR", "Mix") )
Y |
A matrix of binary responses (1=correct, 0=incorrect). Rows represent persons and columns represent items. |
Q |
The Q-matrix of the test. Rows represent items and columns represent attributes. |
initial.dis |
The type of distance used in the NPC method to obtain the initial classification: the plain ("hamming") or the weighted ("whamming") Hamming distance. |
initial.gate |
The type of relation between examinees' attribute profiles and the items used for the initial classification. Allowable relations are "AND", "OR", and "Mix". |
The function returns a series of outputs, including
The estimates of examinees' attribute profiles
The estimates of examinees' class memberships
The weighted ideal responses
The weights used to compute the weighted ideal responses
A weighted ideal response \eta^{(w)}_{lj}, defined as the convex combination of the conjunctive ideal response \eta^{(c)}_{lj} and the disjunctive ideal response \eta^{(d)}_{lj}, is proposed. Suppose item j requires K_j^* \le K attributes that, without loss of generality, have been permuted to the first K_j^* positions of the item attribute vector q_j. For each item j and proficiency class C_l, the weighted ideal response \eta^{(w)}_{lj} is defined as the convex combination

\eta^{(w)}_{lj} = w_{lj} \eta^{(c)}_{lj} + (1 - w_{lj}) \eta^{(d)}_{lj}

where 0 \le w_{lj} \le 1. The distance between the observed responses to item j and the weighted ideal response \eta^{(w)}_{lj} of the examinees in C_l is defined as the sum of squared deviations:

d_{lj} = \sum_{i \in C_l} (y_{ij} - \eta^{(w)}_{lj})^2 = \sum_{i \in C_l} (y_{ij} - w_{lj} \eta^{(c)}_{lj} - (1 - w_{lj}) \eta^{(d)}_{lj})^2

Thus, \hat{w}_{lj} can be obtained by minimizing d_{lj}:

\hat{w}_{lj} = \frac{\sum_{i \in C_l} (y_{ij} - \eta^{(d)}_{lj})}{|C_l| (\eta^{(c)}_{lj} - \eta^{(d)}_{lj})}

As a viable alternative to the NPC method for obtaining initial estimates of the proficiency classes, Chiu et al. (2018) suggested using an ideal response with fixed weights defined as

\eta^{(0)}_{lj} = \frac{\sum_{k=1}^{K_j^*} \alpha_{lk}}{K_j^*}
The function is used to estimate examinees' attribute profiles using the nonparametric classification (NPC) method (Chiu & Douglas, 2013). It uses a distance-based algorithm on the observed item responses to classify examinees. This function estimates attribute profiles using nonparametric approaches for both the "AND gate" (conjunctive) and the "OR gate" (disjunctive) cognitive diagnostic models. These algorithms select the attribute profile with the smallest loss function value (plain, weighted, or penalized Hamming distance; see below for details) as the estimate. If more than one attribute profile attains the smallest loss function value, one of them is randomly chosen.
NPC( Y, Q, gate = c("AND", "OR"), method = c("Hamming", "Weighted", "Penalized"), wg = 1, ws = 1 )
Y |
A matrix of binary responses. Rows represent persons and columns represent items. 1=correct, 0=incorrect. |
Q |
The Q-matrix of the test. Rows represent items and columns represent attributes. 1=attribute required by the item, 0=attribute not required by the item. |
gate |
A character string specifying the type of gate. It can be one of the following:
|
method |
The method of nonparametric estimation.
|
wg |
Additional argument for the "penalized" method. It is the weight assigned to guessing in the DINA or DINO models. A large value of weight results in a stronger impact on Hamming distance (larger loss function values) caused by guessing. |
ws |
Additional input for the "penalized" method. It is the weight assigned to slipping in the DINA or DINO models. A large value of weight results in a stronger impact on Hamming distance (larger loss function values) caused by slipping. |
The function returns a series of outputs, including:
Estimated attribute profiles. Rows represent persons and columns represent attributes. 1=examinee masters the attribute, 0=examinee does not master the attribute.
Estimated ideal response to all items by all examinees. Rows represent persons and columns represent items. 1=correct, 0=incorrect.
The class number (row index in pattern) for each person's attribute profile. It can also be used for locating the loss function value in loss.matrix for the estimated attribute profile for each person.
Number of ties in the Hamming distance among the candidate attribute profiles for each person. When we encounter ties, one of the tied attribute profiles is randomly chosen.
All possible attribute profiles in the search space.
The matrix of the values for the loss function (the plain, weighted, or penalized Hamming distance). Rows represent candidate attribute profiles in the same order as the pattern matrix; columns represent different examinees.
Proficiency class membership is determined by comparing an examinee's
observed item response vector with each of the ideal
item response vectors of the realizable
proficiency classes.
The ideal item responses are a function of the Q-matrix and the attribute
vectors characteristic of the different proficiency classes. Hence, an
examinee’s proficiency class is identified by the attribute vector
underlying that ideal item response vector
which is closest—or most similar—to an examinee’s observed item response
vector. The ideal response to item j is the score that would be obtained
by an examinee if no perturbation occurred.
Let \eta_i denote the J-dimensional ideal item response vector of examinee i. The estimate \hat{\alpha}_i of an examinee's attribute vector is defined as the attribute vector underlying the ideal item response vector that, among all ideal item response vectors, minimizes the distance to the examinee's observed item response vector:

\hat{\alpha}_i = \arg\min_{m \in \{1, 2, \ldots, M\}} d(y_i, \eta_m)

A distance measure often used for clustering binary data is the Hamming distance, which simply counts the number of disagreements between two vectors:

d_H(y_i, \eta_m) = \sum_{j=1}^{J} |y_{ij} - \eta_{mj}|

If the different levels of variability in the item responses are to be incorporated, then the Hamming distances can be weighted, for example, by the inverse of the item sample variance, which allows items with smaller variance to have a larger impact on the distance function:

d_{wH}(y_i, \eta_m) = \sum_{j=1}^{J} \frac{1}{\bar{p}_j (1 - \bar{p}_j)} |y_{ij} - \eta_{mj}|

where \bar{p}_j is the proportion-correct on item j. Weighting differently for departures from the ideal response that would result from slips versus guesses is also considered:

d_{gs}(y_i, \eta_m) = \sum_{j=1}^{J} w_g I(y_{ij} = 1, \eta_{mj} = 0) + w_s I(y_{ij} = 0, \eta_{mj} = 1)
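The three loss functions above can be sketched directly in R (illustration only, not the package source; the helper names are made up here):

```r
# Plain Hamming distance between an observed response vector y and a
# candidate ideal response vector eta.
hamming <- function(y, eta) sum(abs(y - eta))

# Weighted: items with smaller sample variance get larger weight;
# pbar is the vector of item proportions-correct.
whamming <- function(y, eta, pbar) {
  sum(abs(y - eta) / (pbar * (1 - pbar)))
}

# Penalized: weight wg for guesses (y = 1, eta = 0) and ws for slips
# (y = 0, eta = 1).
phamming <- function(y, eta, wg = 1, ws = 1) {
  sum(wg * (y == 1 & eta == 0) + ws * (y == 0 & eta == 1))
}

y   <- c(1, 0, 1, 1)
eta <- c(0, 0, 1, 0)
hamming(y, eta)           # 2
phamming(y, eta, wg = 2)  # both disagreements are guesses: 2 * 2 = 4
```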
Chiu, C. (2011). Flexible approaches to cognitive diagnosis: nonparametric methods and small sample techniques. Invited session of cognitive diagnosis and item response theory at 2011 Joint Statistical Meeting.
Chiu, C. Y., & Douglas, J. A. (2013). A nonparametric approach to cognitive diagnosis by proximity to ideal response patterns. Journal of Classification 30(2), 225-250.
# Generate item and examinee profiles
natt <- 3
nitem <- 4
nperson <- 5
Q <- rbind(c(1, 0, 0), c(0, 1, 0), c(0, 0, 1), c(1, 1, 1))
alpha <- rbind(c(0, 0, 0), c(1, 0, 0), c(0, 1, 0), c(0, 0, 1), c(1, 1, 1))

# Generate DINA model-based response data
slip <- c(0.1, 0.15, 0.2, 0.25)
guess <- c(0.1, 0.15, 0.2, 0.25)
my.par <- list(slip = slip, guess = guess)
data <- matrix(NA, nperson, nitem)
eta <- matrix(NA, nperson, nitem)
for (i in 1:nperson) {
  for (j in 1:nitem) {
    eta[i, j] <- prod(alpha[i, ] ^ Q[j, ])
    P <- (1 - slip[j]) ^ eta[i, j] * guess[j] ^ (1 - eta[i, j])
    u <- runif(1)
    data[i, j] <- as.numeric(u < P)
  }
}

# Use the function to estimate examinee attribute profiles
alpha.est.NP.H <- NPC(data, Q, gate = "AND", method = "Hamming")
alpha.est.NP.W <- NPC(data, Q, gate = "AND", method = "Weighted")
alpha.est.NP.P <- NPC(data, Q, gate = "AND", method = "Penalized", wg = 2, ws = 1)

nperson <- 1 # Choose an examinee to investigate
print(alpha.est.NP.H) # Print the estimated examinee attribute profiles
The function is used to compute the pattern-wise agreement rate between two sets of attribute profiles. The two sets must have the same dimensions.
PAR(x, y)
x |
One set of attribute profiles |
y |
The other set of attribute profiles |
The function returns the pattern-wise agreement rate between two sets of attribute profiles.
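Unlike the attribute-wise rate, the pattern-wise rate counts an examinee as a match only when the entire profile agrees. A minimal R sketch of what PAR() reports (illustration only, not the package source; the helper name is made up here):

```r
# Pattern-wise agreement rate: the proportion of examinees whose whole
# attribute profile matches across the two matrices.
par_sketch <- function(x, y) {
  stopifnot(all(dim(x) == dim(y)))
  mean(apply(x == y, 1, all))
}

est  <- rbind(c(1, 0, 1), c(0, 1, 1))
true <- rbind(c(1, 0, 0), c(0, 1, 1))
par_sketch(est, true)  # only the second profile matches completely: 0.5
```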
The function generates a complete Q-matrix based on a pre-specified probability of getting a one.
Q.generate(K, J, p, single.att = TRUE)
K |
The number of attributes |
J |
The number of items |
p |
The probability of getting a one in the Q-matrix |
single.att |
Whether all the single-attribute patterns are included.
If TRUE, the generated Q-matrix is guaranteed to contain an identity submatrix, so each attribute is measured by at least one single-attribute item and the Q-matrix is complete. |
The function returns a complete dichotomous Q-matrix
q = Q.generate(3, 20, 0.5, single.att = TRUE)
q1 = Q.generate(5, 30, 0.6, single.att = FALSE)
The function estimates examinees' class memberships using the nonparametric classification method (with the weighted Hamming distance) and refines the Q-matrix through comparisons of the residual sums of squares computed from the observed and the ideal item responses.
QR(Y, Q, gate = c("AND", "OR"), max.ite = 50)
Y |
A matrix of binary responses (1=correct, 0=incorrect). Rows represent persons and columns represent items. |
Q |
The Q-matrix of the test. Rows represent items and columns represent attributes. |
gate |
A string, "AND" or "OR". "AND": the examinee needs to possess all related attributes to answer an item correctly. "OR": the examinee needs to possess only one of the related attributes to answer an item correctly. |
max.ite |
The maximum number of iterations to run until the RSS of all items are stationary. |
A list containing:
initial.class |
Initial classification |
terminal.class |
Terminal classification |
modified.Q |
The modified Q-matrix |
modified.entries |
The modified q-entries |
This function implements the Q-matrix refinement method developed by Chiu (2013), which is also based on the aforementioned nonparametric classification methods (Chiu & Douglas, 2013). This Q-matrix refinement method corrects potential misspecified entries of the Q-matrix through comparisons of the residual sum of squares computed from the observed and the ideal item responses.
The algorithm operates by minimizing the RSS. Recall that y_{ij} is the observed response and \eta_{ij} is the ideal response. Then the RSS of item j for examinee i is defined as

RSS_{ij} = (y_{ij} - \eta_{ij})^2

The RSS of item j across all examinees is therefore

RSS_j = \sum_{i=1}^{N} (y_{ij} - \eta_{ij})^2 = \sum_{m=1}^{2^K} \sum_{i \in C_m} (y_{ij} - \eta_{mj})^2

where C_m is the latent proficiency class m and N is the number of examinees. Chiu (2013) proved that the expectation of RSS_j is minimized for the correct q-vector among the 2^K - 1 candidates. Please see the paper for the justification.
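Given fixed class memberships, the RSS of an item under any candidate q-vector is straightforward to compute. A small R sketch of this criterion for a conjunctive (DINA-type) ideal response (illustration only, not the package source; the helper name is made up here):

```r
# RSS of item j under candidate q-vector q.j, given estimated
# attribute profiles alpha (rows = examinees).
rss_item <- function(y.j, alpha, q.j) {
  eta <- apply(alpha, 1, function(a) prod(a ^ q.j))  # conjunctive ideal responses
  sum((y.j - eta)^2)
}

alpha <- rbind(c(1, 0), c(0, 1), c(1, 1))  # estimated profiles
y.j   <- c(0, 0, 1)                        # observed responses to item j
# all 2^K - 1 nonzero candidate q-vectors for K = 2 attributes
cands <- rbind(c(1, 0), c(0, 1), c(1, 1))
sapply(seq_len(nrow(cands)), function(m) rss_item(y.j, alpha, cands[m, ]))
# q = c(1, 1) attains the smallest RSS here
```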
Chiu, C. Y. (2013). Statistical Refinement of the Q-matrix in Cognitive Diagnosis. Applied Psychological Measurement, 37(8), 598-618.
This function computes the proportion of correctly specified q-entries in a provisional Q-matrix that remain correctly specified after a Q-matrix refinement procedure is applied. This function is used only when the true Q-matrix is known.
retention.rate(ref.Q = ref.Q, mis.Q = mis.Q, true.Q = true.Q)
ref.Q |
The refined Q-matrix obtained by applying a Q-matrix refinement procedure to mis.Q. |
mis.Q |
A misspecified Q-matrix to be refined. |
true.Q |
The true Q-matrix. |
The function returns a value between 0 and 1 indicating the proportion of
correctly specified q-entries in mis.Q
that remain correctly specified in ref.Q
after a Q-matrix refinement procedure is applied to mis.Q
.
Function RR
is used to compute the recovery rates for an estimated Q-matrix.
In general, it can be used to compute the agreement rate between two matrices with identical dimensions.
RR(Q1, Q2)
Q1 |
The first Q-matrix. |
Q2 |
The second Q-matrix that has the same dimensionality as Q1. |
The function returns
The entry-wise recovery rate
The item-wise recovery rate
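The two recovery rates differ only in the unit of agreement: single q-entries versus whole item q-vectors. A minimal R sketch (illustration only, not the package source; the helper name is made up here):

```r
# Entry-wise rate: proportion of matching q-entries.
# Item-wise rate: proportion of items whose whole q-vector matches.
rr_sketch <- function(Q1, Q2) {
  list(entry.wise = mean(Q1 == Q2),
       item.wise  = mean(apply(Q1 == Q2, 1, all)))
}

Q1 <- rbind(c(1, 0), c(0, 1), c(1, 1))
Q2 <- rbind(c(1, 0), c(1, 1), c(1, 1))
rr_sketch(Q1, Q2)  # entry-wise 5/6, item-wise 2/3
```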
The function is used to estimate the Q-matrix based on the data (responses) using the two-step Q-matrix estimation method.
TSQE( Y, K, input.cor = c("tetrachoric", "Pearson"), ref.method = c("QR", "GDI"), GDI.model = c("DINA", "ACDM", "RRUM", "GDINA"), cutoff = 0.8 )
Y |
A matrix of binary responses (1=correct, 0=incorrect). Rows represent persons and columns represent items. |
K |
The number of attributes in the Q-matrix |
input.cor |
The type of correlation used to compute the input for the exploratory factor analysis. It could be the tetrachoric or Pearson correlation. |
ref.method |
The refinement method use to polish the provisional Q-matrix obtained from the EFA. Currently available methods include the Q-matrix refinement (QR) method and the G-DINA discrimination index (GDI). |
GDI.model |
The CDM used in the GDI algorithm to fit the data. Currently available models include the DINA model, the ACDM, the RRUM, and the G-DINA model |
cutoff |
The cutoff used to dichotomize the entries in the provisional Q-matrix |
The function returns the estimated Q-matrix
The TSQE method merges the Provisional Attribute Extraction (PAE) algorithm with a Q-matrix refinement-and-validation method, either the Q-matrix refinement (QR) method or the G-DINA model discrimination index (GDI). Specifically, the PAE algorithm relies on classic exploratory factor analysis (EFA) combined with a unique stopping rule to identify a provisional Q-matrix, and the resulting provisional Q-matrix is then "polished" with a refinement method to derive the final estimate of the Q-matrix.
The initial step of the algorithm is to aggregate the observed item responses into an inter-item tetrachoric correlation matrix. The reason for using the tetrachoric correlation is that the examinee responses are binary, so it is more appropriate than the Pearson product-moment correlation coefficient. See Chiu et al. (2022) for details. The next step is to apply factor analysis to the item-correlation matrix and treat the extracted factors as proxies for the latent attributes. The third step concerns identifying which specific attributes are required for which item:
1. Initialize the item index as j = 1.

2. Let l_{jk} denote the loading of item j on factor k, where k = 1, 2, \ldots, K.

3. Arrange the loadings in descending order. Define a mapping function f(k) = t, where t is the order index. Hence, l_{jf^{-1}(1)} will indicate the maximum loading, while l_{jf^{-1}(K)} will indicate the minimum loading.

4. Define p_j^{(t)} = \sum_{h=1}^{t} l_{jf^{-1}(h)}^2 / \sum_{k=1}^{K} l_{jk}^2 as the proportion of the communality of item j accounted for by the first t factors.

5. Define K_j^* = \min \{ t \mid p_j^{(t)} \ge \lambda \}, where \lambda is the cut-off value for the desired proportion of item variance-accounted-for. Then, the ordered entries of the provisional q-vector of item j are obtained as \tilde{q}_{jt} = 1 if t \le K_j^*, and \tilde{q}_{jt} = 0 otherwise.

6. Identify \tilde{q}_j by rearranging the ordered entries of the q-vector using the inverse function f^{-1}(t) = k.

7. Set j = j + 1 and repeat (2) to (6) until j = J. Then denote the provisional Q-matrix as \tilde{Q} = (\tilde{q}_1, \tilde{q}_2, \ldots, \tilde{q}_J)'.
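Steps (2) to (6) can be sketched for a single item in a few lines of R (illustration only, not the package source; the helper name is made up, and ordering by squared loadings is an assumption consistent with the communality criterion):

```r
# Order the item's factor loadings, accumulate squared loadings until
# the cutoff proportion of communality is reached, and set the
# corresponding q-entries to 1.
pae_qvector <- function(loadings, cutoff = 0.8) {
  ord   <- order(loadings^2, decreasing = TRUE)       # the mapping f
  prop  <- cumsum(loadings[ord]^2) / sum(loadings^2)  # p_j^(t)
  Kstar <- which(prop >= cutoff)[1]                   # smallest t reaching cutoff
  q <- integer(length(loadings))
  q[ord[seq_len(Kstar)]] <- 1                         # invert the ordering
  q
}

pae_qvector(c(0.7, 0.1, 0.6), cutoff = 0.8)  # attributes 1 and 3 retained
```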
This function implements the Q-matrix refinement method developed by Chiu (2013), which is also based on the aforementioned nonparametric classification methods (Chiu & Douglas, 2013). This Q-matrix refinement method corrects potential misspecified entries of the Q-matrix through comparisons of the residual sum of squares computed from the observed and the ideal item responses.
The algorithm operates by minimizing the RSS. Recall that y_{ij} is the observed response and \eta_{ij} is the ideal response. Then the RSS of item j for examinee i is defined as

RSS_{ij} = (y_{ij} - \eta_{ij})^2

The RSS of item j across all examinees is therefore

RSS_j = \sum_{i=1}^{N} (y_{ij} - \eta_{ij})^2 = \sum_{m=1}^{2^K} \sum_{i \in C_m} (y_{ij} - \eta_{mj})^2

where C_m is the latent proficiency class m and N is the number of examinees. Chiu (2013) proved that the expectation of RSS_j is minimized for the correct q-vector among the 2^K - 1 candidates. Please see the paper for the justification.
The GDI is an extension of de la Torre's (2008) \delta-method, which has the limitation that it cannot be used with CDMs that divide examinees into more than two groups. In response to this limitation, de la Torre and Chiu (2016) proposed to select the item attribute vector that maximizes the weighted variance of the probabilities of a correct response for the different groups, defined as

\varsigma_j^2 = \sum_{c=1}^{2^{K_j^*}} \pi_c \left( P_c(Y_j = 1) - \bar{P}_j \right)^2

where \pi_c is the posterior probability for proficiency class C_c, and \bar{P}_j = \sum_{c=1}^{2^{K_j^*}} \pi_c P_c(Y_j = 1), where \sum_c \pi_c = 1. De la Torre and Chiu (2016) called \varsigma_j^2 the GDI, which can be applied to any CDM that can be reparameterized in terms of the G-DINA model.
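The GDI is just a weighted variance of the per-class success probabilities. A minimal R sketch (illustration only, not the package source; the helper name is made up here):

```r
# GDI for one item: pi.c are posterior class probabilities (summing to 1)
# and P.c are the classes' probabilities of a correct response.
gdi <- function(pi.c, P.c) {
  P.bar <- sum(pi.c * P.c)      # mean success probability
  sum(pi.c * (P.c - P.bar)^2)   # weighted variance = GDI
}

gdi(pi.c = c(0.5, 0.5), P.c = c(0.2, 0.8))  # 0.09
```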
Chiu, C. Y. (2013). Statistical Refinement of the Q-matrix in Cognitive Diagnosis. Applied Psychological Measurement, 37(8), 598-618.
Chiu, C. Y., & Douglas, J. A. (2013). A nonparametric approach to cognitive diagnosis by proximity to ideal response patterns. Journal of Classification 30(2), 225-250.
de la Torre, J., & Chiu, C.-Y. (2016). A general method of empirical Q-matrix validation. Psychometrika, 81, 253-273.
de la Torre, J. (2008). An empirically based method of Q-matrix validation for the DINA model: Development and applications. Journal of Educational Measurement, 45, 343-362.