| Title: | The Nonparametric Classification Methods for Cognitive Diagnosis |
|---|---|
| Description: | Statistical tools for analyzing cognitive diagnosis (CD) data collected from small settings using the nonparametric classification (NPCD) framework. The core methods of the NPCD framework include the nonparametric classification (NPC) method developed by Chiu and Douglas (2013) <DOI:10.1007/s00357-013-9132-9> and the general NPC (GNPC) method developed by Chiu, Sun, and Bian (2018) <DOI:10.1007/s11336-017-9595-4> and Chiu and Köhn (2019) <DOI:10.1007/s11336-019-09660-x>. An extension of the NPCD framework included in the package is the nonparametric method for multiple-choice items (MC-NPC) developed by Wang, Chiu, and Koehn (2023) <DOI:10.3102/10769986221133088>. Functions associated with various extensions concerning the evaluation, validation, and feasibility of the CD analysis are also provided. These topics include Q-matrix completeness, Q-matrix refinement, and Q-matrix estimation. |
| Authors: | Chia-Yi Chiu [aut, cph], Weixuan Xiao [aut, cre], Hans Friedrich Köhn [aut], Yu Wang [aut], Xiran Wen [aut] |
| Maintainer: | Weixuan Xiao <[email protected]> |
| License: | GPL-3 |
| Version: | 1.1.0 |
| Built: | 2026-06-01 09:53:14 UTC |
| Source: | https://github.com/cran/NPCDTools |
AAR computes the attribute-wise agreement rate between two sets of attribute profiles. The two sets must have the same dimensions.
AAR(x, y)
x |
One set of attribute profiles |
y |
The other set of attribute profiles |
The function returns the attribute-wise agreement rate between two sets of attribute profiles.
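As a rough illustration of what an attribute-wise agreement rate measures, the sketch below computes the proportion of matching entries between two 0/1 profile matrices. The helper name `aar_sketch` is hypothetical, and the assumption that the rate is the overall proportion of matching entries is ours; the package's AAR() may be implemented or reported differently.

```r
# Hypothetical helper (not the package's AAR()): share of matching entries,
# assuming the rate is pooled over all examinees and attributes.
aar_sketch <- function(x, y) {
  x <- as.matrix(x)
  y <- as.matrix(y)
  stopifnot(all(dim(x) == dim(y)))  # both sets must have the same dimensions
  mean(x == y)
}

# Toy profiles: 3 examinees, 2 attributes; one of the six entries differs
x <- matrix(c(1, 0,
              1, 1,
              0, 0), ncol = 2, byrow = TRUE)
y <- matrix(c(1, 0,
              1, 0,
              0, 0), ncol = 2, byrow = TRUE)
aar_sketch(x, y)  # 5 of 6 entries agree
```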
# See examples used for GNPC.
Function bestQperm is used to permute the columns of a Q-matrix so that
the order of the columns best matches that of the benchmark Q-matrix. This function
is useful in a Q-matrix estimation process.
bestQperm(Q, bench.Q)
Q |
The targeted Q-matrix. |
bench.Q |
The benchmark Q-matrix. |
The function returns a Q-matrix in which the order of the columns best matches that of the benchmark Q-matrix.
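The idea of aligning columns with a benchmark can be sketched with a brute-force search over column permutations. The helpers `all_perms` and `best_perm_sketch` are hypothetical, and using the entry-wise agreement as the matching criterion is an assumption; bestQperm() may use a different criterion or search strategy.

```r
# All permutations of a vector (fine for small K; the count grows as K!)
all_perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (p in all_perms(v[-i])) out[[length(out) + 1]] <- c(v[i], p)
  }
  out
}

# Hypothetical helper: pick the column order that maximizes entry-wise agreement
best_perm_sketch <- function(Q, bench.Q) {
  Q <- as.matrix(Q)
  bench.Q <- as.matrix(bench.Q)
  stopifnot(all(dim(Q) == dim(bench.Q)))
  perms <- all_perms(seq_len(ncol(Q)))
  agree <- vapply(perms,
                  function(p) mean(Q[, p, drop = FALSE] == bench.Q),
                  numeric(1))
  Q[, perms[[which.max(agree)]], drop = FALSE]
}

bench.Q <- matrix(c(1, 0, 0,
                    0, 1, 0,
                    0, 0, 1,
                    1, 1, 0), ncol = 3, byrow = TRUE)
Q <- bench.Q[, c(3, 1, 2)]       # same Q-matrix with shuffled columns
best_perm_sketch(Q, bench.Q)     # columns restored to the benchmark order
```

An exhaustive search is only practical for a handful of attributes, which is why it is shown here purely as an illustration.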
# See examples used for TSQE.
This function computes the proportion of corrected q-entries that were originally misspecified in the provisional Q-matrix. This function is used only when the true Q-matrix is known.
correction.rate(ref.Q = ref.Q, mis.Q = mis.Q, true.Q = true.Q)
ref.Q |
The |
mis.Q |
A |
true.Q |
The |
The function returns a value between 0 and 1 representing the proportion of corrected q-entries in ref.Q
that were originally misspecified in mis.Q.
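A minimal sketch of this proportion on toy matrices follows. The helper `correction_rate_sketch` is hypothetical and simply follows the verbal definition above; it is not the package's implementation.

```r
# Hypothetical helper: among entries that were wrong in mis.Q,
# the share that match true.Q after refinement.
correction_rate_sketch <- function(ref.Q, mis.Q, true.Q) {
  wrong <- mis.Q != true.Q             # entries misspecified in the provisional Q
  if (!any(wrong)) return(NA_real_)    # nothing to correct
  mean(ref.Q[wrong] == true.Q[wrong])
}

true.Q <- matrix(c(1, 0,
                   0, 1,
                   1, 1), ncol = 2, byrow = TRUE)
mis.Q <- true.Q
mis.Q[1, 2] <- 1                       # first misspecified entry
mis.Q[3, 1] <- 0                       # second misspecified entry
ref.Q <- mis.Q
ref.Q[1, 2] <- 0                       # the refinement repairs only one of them
correction_rate_sketch(ref.Q, mis.Q, true.Q)  # 0.5
```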
# See examples used for the QR function.
Function distractor.check is used to assess whether the distractors of a given Q-matrix for multiple-choice items are plausible and/or proper.
distractor.check(Q = Q, key = NULL)
Q |
The given Q-matrix for multiple-choice items. It has to be organized in the following manner. The Q-matrix should contain |
key |
A vector that indicates the options where the keys are located. |
A list with class "distractor.check" containing:
A matrix indicating items and options that are not plausible, or NULL if all items are plausible.
A vector of item IDs that are not proper, or NULL if all items are proper.
Logical; TRUE if all items are plausible.
Logical; TRUE if all items are proper.
Chiu, C.-Y., Köhn, H. F. & Wang, Y. (Online first). Plausible and proper multiple-choice items for diagnostic classification. Psychometrika. doi:10.1017/psy.2025.10074
## Not run:
library(NPCDTools)

# Q-matrix shipped with the package
Q1 <- Q_Ozaki
distractor.check(Q1)

# Simulated MC-DINA Q-matrix from the GDINA package, with the key positions
Q2 <- GDINA::sim10MCDINA2$simQ
key <- c(1, 2, 4, 1, 1, 3, 2, 4, 1, 4)
distractor.check(Q2, key)
## End(Not run)
Function GNPC is used to estimate examinees' attribute profiles using
the general nonparametric classification (GNPC) method
(Chiu et al., 2018; Chiu & Köhn, 2019). It can be
used with data conforming to any cognitive diagnosis model (CDM).
GNPC(
  Y,
  Q,
  fixed.w = FALSE,
  initial.dis = "hamming",
  initial.gate = "AND",
  max.iter = 1000,
  tol = 0.001,
  track.convergence = TRUE
)
Y |
A |
Q |
A |
fixed.w |
|
initial.dis |
The type of distance used in the |
initial.gate |
The type of relation between examinees' attribute profiles
and the items.
Allowable relations are " |
max.iter |
Maximum number of iterations allowed. Default is 1000. |
tol |
Convergence tolerance. The algorithm stops when the proportion of examinees whose classification changes is less than this value. Default is 0.001. |
track.convergence |
Logical. If |
The function returns a list with the following components:
A matrix of estimated attribute profiles for examinees
A vector of length N (the number of examinees) containing the estimated class memberships
A matrix of weighted ideal responses
A matrix of weights used to compute the weighted ideal responses
(Only if track.convergence = TRUE) A list containing:
iteration: Vector of iteration numbers
prop.change: Proportion of examinees whose classification changed at each iteration
total.distance: Total squared distance between observed and weighted ideal responses
n.iter: Total number of iterations until convergence
converged: Logical indicating whether the algorithm converged within max.iter
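A weighted ideal response $\eta^{(w)}$, defined as the convex combination
of the conjunctive ideal response $\eta^{(c)}$ and the disjunctive ideal response $\eta^{(d)}$, is used in the GNPC method to compute distances.
Suppose item $j$ requires $K_j^* \le K$ attributes that, without loss of
generality, have been moved to the first $K_j^*$ positions of the item
attribute vector $\mathbf{q}_j$. For each item $j$ and latent class $\mathcal{C}_l$,
the weighted ideal response $\eta^{(w)}_{lj}$ is defined as the convex combination

$$\eta^{(w)}_{lj} = w_{lj}\,\eta^{(c)}_{lj} + (1 - w_{lj})\,\eta^{(d)}_{lj},$$

where $0 \le w_{lj} \le 1$. The distance between the observed responses
to item $j$ and the weighted ideal responses $\eta^{(w)}_{lj}$ of examinees
in $\mathcal{C}_l$ is defined as the sum of squared deviations:

$$d_{lj} = \sum_{i \in \mathcal{C}_l} \left(y_{ij} - \eta^{(w)}_{lj}\right)^2 = \sum_{i \in \mathcal{C}_l} \left(y_{ij} - w_{lj}\,\eta^{(c)}_{lj} - (1 - w_{lj})\,\eta^{(d)}_{lj}\right)^2.$$

$\hat{w}_{lj}$ can then be obtained by minimizing $d_{lj}$, which can then be used to compute $\hat{\eta}^{(w)}_{lj}$.
After all the $\hat{\eta}^{(w)}_{lj}$ are obtained, examinees' attribute profiles
can be estimated by minimizing the loss function

$$d\left(\mathbf{y}_i, \hat{\boldsymbol{\eta}}^{(w)}_l\right) = \sum_{j=1}^{J} \left(y_{ij} - \hat{\eta}^{(w)}_{lj}\right)^2$$

over the latent classes $l$.
The algorithm iteratively updates the weighted ideal responses and reclassifies
examinees until convergence is achieved. The stopping criterion is based on the proportion
of examinees whose classification changes between consecutive iterations:

$$\frac{1}{N} \sum_{i=1}^{N} I\left[\hat{\boldsymbol{\alpha}}_i^{(t)} \ne \hat{\boldsymbol{\alpha}}_i^{(t-1)}\right] < \epsilon,$$

where $\epsilon$ is the tolerance level (default = 0.001).
The default initial values of $\boldsymbol{\alpha}$ are obtained by using the NPC method. Chiu et al. (2018)
suggested another viable alternative for obtaining initial estimates of the proficiency classes by
using an ideal response with fixed weights; see that paper for the definition.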
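To make the weight update concrete, the sketch below works through one class and one item. Writing eta_c and eta_d for the conjunctive and disjunctive ideal responses, setting the derivative of the sum of squared deviations to zero gives w = mean(y - eta_d) / (eta_c - eta_d) whenever eta_c differs from eta_d. The helper `weight_sketch` and the toy responses are hypothetical; this is an illustration of the least-squares step, not the package's code.

```r
# Hypothetical helper: least-squares weight for one item within one class.
# y: observed 0/1 responses of the examinees currently assigned to the class
# eta_c, eta_d: conjunctive and disjunctive ideal responses (0 or 1)
weight_sketch <- function(y, eta_c, eta_d) {
  if (eta_c == eta_d) {
    # Both ideal responses coincide, so the weight has no effect
    return(list(w = NA_real_, eta_w = eta_c))
  }
  w <- mean(y - eta_d) / (eta_c - eta_d)  # minimizer of the squared deviations
  w <- min(max(w, 0), 1)                  # keep the combination convex
  list(w = w, eta_w = w * eta_c + (1 - w) * eta_d)
}

# A class that masters some, but not all, attributes required by the item:
# conjunctive ideal response 0, disjunctive ideal response 1
y <- c(1, 0, 0, 1, 0)
res <- weight_sketch(y, eta_c = 0, eta_d = 1)
res$w      # 0.6
res$eta_w  # 0.4, the proportion correct within the class
```

In this case the weighted ideal response reduces to the class mean of the observed responses, which is why it can adapt to data from different models.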
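Chiu, C.-Y., Sun, Y., & Bian, Y. (2018). Cognitive diagnosis for small educational programs: The general nonparametric classification method. Psychometrika, 83(2), 355–375. doi:10.1007/s11336-017-9595-4
Chiu, C.-Y., & Köhn, H.-F. (2019). Consistency theory for the general nonparametric classification method. Psychometrika, 84(3), 830–845. doi:10.1007/s11336-019-09660-x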
## Not run:
# Example 1: Basic usage
library(GDINA)
set.seed(123)
N <- 500
Q <- sim30GDINA$simQ
gs <- data.frame(guess = rep(0.2, nrow(Q)), slip = rep(0.2, nrow(Q)))
sim <- simGDINA(N, Q, gs.parm = gs, model = "DINA")
Y <- extract(sim, what = "dat")
alpha <- extract(sim, what = "attribute")

# Analyze data using GNPC
result <- GNPC(Y, Q, initial.dis = "hamming", initial.gate = "AND")

# View results
head(result$att.est)
table(result$class)

# Plot overall convergence
plot(result)

# Plot individual examinee's convergence
plot(result, type = "individual", examinee.id = 1, true.alpha = alpha[1, ])

# Check attribute agreement rate
PAR(alpha, result$att.est)
AAR(alpha, result$att.est)

# Example 2: Without convergence tracking
# (Convergence tracking is only used for the GNPC plots.)
result2 <- GNPC(Y, Q, track.convergence = FALSE)
## End(Not run)
The function estimates examinees' attribute profiles using the nonparametric classification (NPC) method (Chiu & Douglas, 2013). An examinee's attribute profile is estimated by minimizing the distance between the observed and ideal item responses.
NPC(
  Y,
  Q,
  distance = c("hamming", "whamming", "penalized"),
  gate = c("AND", "OR"),
  wg = 1,
  ws = 1
)
Y |
A |
Q |
A |
distance |
The type of distance used to compute the loss function. The possible options include
(i) " |
gate |
A character string specifying the type of gate. The possible options include " |
wg |
Additional argument for the "penalized" method. It is a weight assigned to guesses in the DINA or DINO models. A large value of weight results in a stronger impact on the distance (i.e., larger loss function values) caused by guessing. |
ws |
Additional input for the "penalized" method. It is the weight assigned to slips in the DINA or DINO models. A large value of weight results in a stronger impact on the distance (i.e., larger loss function values) caused by slipping. |
The function returns a series of outputs, including:
A matrix representing the estimated attribute profiles.
1 = examinee masters the attribute, 0 = examinee does not master the attribute.
A matrix indicating the estimated ideal response to all
items from all examinees. 1 = correct, 0 = incorrect.
An N-dimensional vector showing the class memberships for all N examinees.
The number of ties in the Hamming distance among the candidate attribute profiles for each person. When ties occur, one of the tied attribute profiles is randomly chosen.
All possible attribute profiles in the latent space.
A matrix containing the values of the loss function
(the distances) between each examinee's observed response vector and the ideal response vectors.
The nonparametric classification (NPC) method (Chiu & Douglas, 2013) assigns examinees to the
proficiency classes they belong to by comparing their observed item response patterns with each of the ideal
item response patterns of the proficiency classes. When there is no data perturbation, an
examinee's ideal response pattern corresponding to the examinee's true attribute pattern and his/her
observed item response patterns are identical, and thus the distance between them is 0. When data
perturbations are small, this ideal response pattern remains the one most similar to the observed
response pattern, which is exactly the setup of data conforming to the DINA or DINO model. Hence, based
on this rationale, an examinee's attribute profile is obtained by minimizing the distance between the
observed and the ideal item response patterns. The nonparametric nature of the NPC method furthermore
makes it suitable for data obtained from small-scale settings.
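The classification rule described above can be sketched in a few lines of base R for the plain Hamming distance and the "AND" gate. All helper names (`all_profiles`, `ideal_and`, `npc_sketch`) are hypothetical, ties are resolved by taking the first minimum rather than at random, and the toy data are noise-free; this illustrates the idea and is not the package's NPC() implementation.

```r
# All 2^K attribute profiles, one per row
all_profiles <- function(K) as.matrix(expand.grid(rep(list(0:1), K)))

# Conjunctive ("AND" gate) ideal responses: 1 only if every attribute
# required by the item is mastered. Rows = profiles, columns = items.
ideal_and <- function(profiles, Q) {
  required <- matrix(rowSums(Q), nrow(profiles), nrow(Q), byrow = TRUE)
  (profiles %*% t(Q) == required) * 1
}

# Hypothetical helper: assign each examinee to the closest ideal pattern
npc_sketch <- function(Y, Q) {
  profiles <- all_profiles(ncol(Q))
  eta <- ideal_and(profiles, Q)
  # Hamming distance between every examinee (rows) and every profile (columns)
  dist <- apply(eta, 1, function(e) rowSums(abs(sweep(Y, 2, e))))
  cls <- apply(dist, 1, which.min)   # first minimum; ties are not randomized
  unname(profiles[cls, , drop = FALSE])
}

Q <- rbind(diag(3), c(1, 1, 0), c(0, 1, 1))   # contains all single-attribute items
alpha <- matrix(c(1, 0, 0,
                  0, 1, 1,
                  1, 1, 1,
                  0, 0, 0), ncol = 3, byrow = TRUE)
Y <- ideal_and(alpha, Q)                      # responses without perturbation
npc_sketch(Y, Q)                              # recovers alpha
```

With no perturbation the distance to the true profile's ideal pattern is 0, so the true profiles are recovered exactly, which matches the rationale given above.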
Chiu, C. Y., & Douglas, J. A. (2013). A nonparametric approach to cognitive diagnosis by proximity to ideal response patterns. Journal of Classification, 30(2), 225-250. doi:10.1007/s00357-013-9132-9
## Not run:
library(GDINA)
N <- 500
Q <- sim30GDINA$simQ
gs <- data.frame(guess = rep(0.2, nrow(Q)), slip = rep(0.2, nrow(Q)))
sim <- simGDINA(N, Q, gs.parm = gs, model = "DINA")
Y <- extract(sim, what = "dat")
alpha <- extract(sim, what = "attribute")

# Estimate attribute profiles using NPC
result <- NPC(Y, Q, distance = "hamming", gate = "AND")
print(result)
result$alpha.est

# Check attribute agreement rate
PAR(alpha, result$alpha.est)
AAR(alpha, result$alpha.est)
## End(Not run)
PAR computes the pattern-wise agreement rate between two sets of attribute profiles. The two sets must have the same dimensions.
PAR(x, y)
x |
One set of attribute profiles |
y |
The other set of attribute profiles |
The function returns the pattern-wise agreement rate between two sets of attribute profiles.
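For comparison with the attribute-wise rate, a pattern-wise rate counts an examinee as a match only when the whole profile agrees. The helper `par_sketch` is hypothetical and follows that verbal definition; the package's PAR() may differ in details.

```r
# Hypothetical helper (not the package's PAR()): share of examinees whose
# entire attribute profile is identical in both sets.
par_sketch <- function(x, y) {
  x <- as.matrix(x)
  y <- as.matrix(y)
  stopifnot(all(dim(x) == dim(y)))   # both sets must have the same dimensions
  mean(rowSums(x != y) == 0)
}

# Toy profiles: the second examinee differs on one attribute
x <- matrix(c(1, 0,
              1, 1,
              0, 0), ncol = 2, byrow = TRUE)
y <- matrix(c(1, 0,
              1, 0,
              0, 0), ncol = 2, byrow = TRUE)
par_sketch(x, y)  # 2 of 3 profiles agree
```

On the same toy data, five of six individual entries agree, so the pattern-wise rate is the stricter of the two measures.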
# See examples used for GNPC.
This function produces two types of diagnostic plots for the output of the GNPC algorithm.
The type = "convergence" option gives two graphs: the upper panel displays the trajectory of the proportion of
membership switches, and the lower panel shows the total squared distance across iterations.
Together they show whether, and how quickly, the
algorithm has converged.
The type = "individual" option plots, for a single examinee, the sequence of squared distances
and the estimates of the examinee's attribute profile
across iterations. These plots allow users to investigate how the algorithm
arrives at its final classification. In simulation studies, the true
attribute profile can be provided as a reference.
## S3 method for class 'GNPC'
plot(
  x,
  type = c("convergence", "individual"),
  examinee.id = NULL,
  true.alpha = NULL,
  top.n.pattern = NULL,
  ...
)
x |
An object of class |
type |
|
examinee.id |
An integer indicating which examinee to be plotted. This argument is required
if |
true.alpha |
A numeric vector of length |
top.n.pattern |
An integer specifying the maximum number of patterns to be displayed.
The default is |
... |
Additional arguments passed to |
For "individual" plots, the visual elements are:
Red line: the attribute pattern ultimately selected by GNPC.
Black line: the true attribute pattern (only when
true.alpha is provided). A small vertical jitter is applied
when it overlaps with the red line.
Other colored lines: the most competitive candidate patterns, selected by proximity at the final iteration.
Filled circle at each iteration: indicates which attribute profile GNPC assigned to the examinee at each iteration, drawn in the corresponding line's color. The line for the true attribute profile always has black circles at every iteration as a fixed reference.
## Not run:
library(GDINA)
set.seed(123)
N <- 500
Q <- sim30GDINA$simQ
gs <- data.frame(guess = rep(0.2, nrow(Q)), slip = rep(0.2, nrow(Q)))
sim <- simGDINA(N, Q, gs.parm = gs, model = "DINA")
Y <- extract(sim, what = "dat")
alpha <- extract(sim, what = "attribute")

# Analyze data using GNPC
result <- GNPC(Y, Q, initial.dis = "hamming", initial.gate = "AND")

# Convergence
plot(result)

# Individual with true attribute profile (simulation)
plot(result, type = "individual", examinee.id = 1, true.alpha = alpha[1, ])

# Individual without true attribute profile (real data)
plot(result, type = "individual", examinee.id = 1)
## End(Not run)
Prints a summary of whether the distractors of a given Q-matrix for multiple-choice items are plausible and/or proper.
## S3 method for class 'distractor.check'
print(x, ...)
x |
An object of class "distractor.check" returned by |
... |
Additional arguments (currently not used). |
Prints a summary of the GNPC estimation results.
## S3 method for class 'GNPC'
print(x, ...)
x |
An object of class |
... |
Additional arguments (not used). |
Prints a summary of NPC classification results.
## S3 method for class 'NPC'
print(x, ...)
x |
An object of class |
... |
Additional arguments (not used). |
Print method for objects of class "Qcompleteness".
## S3 method for class 'Qcompleteness'
print(x, ...)
x |
An object of class |
... |
Additional arguments (not used) |
Prints a summary of the Q-matrix refinement process, including which entries were modified and the final refined Q-matrix.
## S3 method for class 'Qrefine'
print(x, ...)
x |
An object of class |
... |
Additional arguments passed to print methods. |
A Q-matrix for 30 multiple-choice items measuring 5 attributes, originally used in Ozaki (2015) to demonstrate structured MC-DINA models. Items 1–10 are single-attribute items with no coded distractors. Items 11–20 have one coded distractor each, and items 21–30 have two or three coded distractors.
Q_Ozaki
A data frame with 66 rows and 7 columns:
Item index (1–30).
Option index. For each item, the first row is the key; subsequent rows are coded distractors.
Attribute 1 indicator (0 or 1).
Attribute 2 indicator (0 or 1).
Attribute 3 indicator (0 or 1).
Attribute 4 indicator (0 or 1).
Attribute 5 indicator (0 or 1).
Ozaki, K. (2015). DINA models for multiple-choice items with few parameters: Considering incorrect answers. Applied Psychological Measurement, 39, 431–447.
Q.completeness is used to examine whether a given Q-matrix is
complete when data conform to a specified CDM. A Q-matrix is said
to be complete if it allows for the unique identification of all possible attribute profiles
among examinees. So far, the function can only be used for a binary Q-matrix with binary responses.
Q.completeness(raw.Q, model = NULL)
raw.Q |
The Q-matrix that is to be checked, where |
model |
Character string specifying the cognitive diagnosis model. Valid options
are " |
The conditions for Q-matrix completeness are model-dependent: a Q-matrix may be complete for one CDM but incomplete for another. This function implements the theoretical results of Chiu et al. (2009) and Köhn and Chiu (2017).
For DINA and DINO models:
A Q-matrix is complete if and only if it contains all single-attribute items
(Chiu et al., 2009).
For More General CDMs:
The function implements a sequential procedure based on Theorems 3-4 and Propositions 1-2 in the work by Köhn and Chiu (2017).
If Q contains all single-attribute items, it is complete (Proposition 1).
If Q has rank less than K, it is incomplete (Theorem 3).
For full-rank Q-matrices without all single-attribute items, the function
examines non-nested attribute pairs using indicator vectors to determine
if distinct expected response patterns can be guaranteed.
The theoretical framework establishes the sufficient conditions for Q completeness,
which means completeness implies distinct expected item response patterns for
all possible attribute profiles.
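The two simplest checks mentioned above, the presence of all single-attribute items and the rank of Q, can be sketched in base R as follows. The helper name `has_all_single_attribute_items` is hypothetical, and the sketch does not cover the pairwise examination used for full-rank Q-matrices under general CDMs.

```r
# Hypothetical helper: does Q contain a single-attribute item for every attribute?
has_all_single_attribute_items <- function(Q) {
  Q <- as.matrix(Q)
  singles <- Q[rowSums(Q) == 1, , drop = FALSE]  # items measuring exactly one attribute
  all(colSums(singles) >= 1)                     # every attribute is covered
}

Q1 <- matrix(c(1, 0, 0,
               0, 1, 0,
               0, 0, 1,
               1, 1, 0,
               1, 0, 1), ncol = 3, byrow = TRUE)
Q2 <- matrix(c(1, 1, 0,
               1, 0, 1,
               0, 1, 1), ncol = 3, byrow = TRUE)

has_all_single_attribute_items(Q1)  # TRUE: complete for DINA and DINO
has_all_single_attribute_items(Q2)  # FALSE: incomplete for DINA and DINO
qr(Q2)$rank                         # 3: full rank, so further checks are needed for general CDMs
```

Q2 illustrates the model dependence: it fails the DINA/DINO condition, yet its full rank means it is not ruled out for more general models by the rank screen alone.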
A list of class "Qcompleteness" containing:
is_complete |
Logical value indicating completeness: |
status |
Character string: " |
message |
Character string with detailed explanation of the result. |
model |
The CDM used for assessment. |
K |
Number of attributes in the Q-matrix. |
J |
Number of items in the Q-matrix. |
The function also prints the status message to the console as a side effect.
Chiu, C.-Y., Douglas, J. A., & Li, X. (2009). Cluster analysis for cognitive diagnosis: Theory and applications. Psychometrika, 74(4), 633-665. doi:10.1007/s11336-009-9125-0
Köhn, H.-F., & Chiu, C.-Y. (2017). A procedure for assessing the completeness of the Q-matrices of cognitively diagnostic tests. Psychometrika, 82(1), 112-132. doi:10.1007/s11336-016-9536-7
Köhn, H.-F., & Chiu, C.-Y. (2018). How to build a complete Q-matrix for a cognitively diagnostic test. Journal of Classification, 35(2), 273-299. doi:10.1007/s00357-018-9255-0
## Not run:
# Example 1: Complete Q-matrix for DINA model
# (contains all 3 single-attribute items)
Q1 <- matrix(c(1, 0, 0,
               0, 1, 0,
               0, 0, 1,
               1, 1, 0,
               1, 0, 1), ncol = 3, byrow = TRUE)
result1 <- Q.completeness(Q1, model = "DINA")
print(result1$is_complete)  # TRUE

# Example 2: Incomplete Q-matrix for DINA model
# (missing single-attribute items)
Q2 <- matrix(c(1, 1, 0,
               1, 0, 1,
               0, 1, 1), ncol = 3, byrow = TRUE)
result2 <- Q.completeness(Q2, model = "DINA")
print(result2$is_complete)  # FALSE

# Example 3: Check completeness for general CDM
Q3 <- matrix(c(1, 0, 0,
               0, 1, 0,
               0, 0, 1,
               1, 1, 0,
               1, 0, 1,
               0, 1, 1), ncol = 3, byrow = TRUE)
result3 <- Q.completeness(Q3, model = "General")
## End(Not run)
The function generates a complete Q-matrix based on a pre-specified probability that each q-entry equals 1.
Q.generate(K, J, p, single.att = TRUE)
K |
The number of attributes. |
J |
The number of items. |
p |
The probability that each q-entry equals 1. |
single.att |
Whether all the single-attribute patterns are included.
If |
The function returns a dichotomous Q-matrix. The Q-matrix is complete by default;
completeness is not guaranteed when single.att = FALSE is specified.
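One way such a generator could work is sketched below: include one single-attribute item per attribute, then draw the remaining q-vectors entry by entry with probability p, redrawing any all-zero row. The helper `generate_q_sketch`, the placement of the single-attribute items in the first K rows, and the redraw rule are all assumptions made for illustration; Q.generate() may order or sample items differently.

```r
# Hypothetical helper, not the package's Q.generate()
generate_q_sketch <- function(K, J, p, single.att = TRUE) {
  stopifnot(p > 0, p <= 1, !single.att || J >= K)
  draw_row <- function() {
    repeat {
      q <- rbinom(K, 1, p)
      if (sum(q) > 0) return(q)   # an item must measure at least one attribute
    }
  }
  n_random <- if (single.att) J - K else J
  random_part <- matrix(0, nrow = n_random, ncol = K)
  for (j in seq_len(n_random)) random_part[j, ] <- draw_row()
  # Assumption: single-attribute items are placed first
  if (single.att) rbind(diag(K), random_part) else random_part
}

set.seed(1)
Q <- generate_q_sketch(K = 3, J = 10, p = 0.5)
dim(Q)
```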
## Not run:
# Example 1: A complete Q-matrix with items requiring fewer attributes.
Q1 <- Q.generate(3, 20, 0.5, single.att = TRUE)

# Example 2: A Q-matrix with items requiring more attributes; completeness is not guaranteed.
Q2 <- Q.generate(5, 30, 0.6, single.att = FALSE)
## End(Not run)
This function turns a proper and plausible Q-matrix into an implausible Q-matrix with the user-specified number of implausible distractors.
Q.implausible(Q, n.implausible)
Q |
A proper and plausible Q-matrix for MC items. |
n.implausible |
The number of items that have implausible distractors. |
The function returns
The generated Q-matrix
The ID of the items that have implausible distractors.
Chiu, C.-Y., Köhn, H. F. & Wang, Y. (Online first). Plausible and proper multiple-choice items for diagnostic classification. Psychometrika. doi:10.1017/psy.2025.10074
This function turns a proper and plausible Q-matrix into an improper Q-matrix with the user-specified number of improper distractors.
Q.improper(Q, n.improper)
Q |
A proper and plausible Q-matrix for MC items. |
n.improper |
The number of improper distractors. |
The function returns
The improper Q-matrix generated from Q
The ID of the items that are improper.
Chiu, C.-Y., Köhn, H. F. & Wang, Y. (Online first). Plausible and proper multiple-choice items for diagnostic classification. Psychometrika. doi:10.1017/psy.2025.10074
The QR function refines a provisional Q-matrix by minimizing the residual sum of squares (RSS)
between the observed and ideal item responses across all possible q-vectors, given the estimates of
examinees' attribute profiles.
QR(Y, Q, gate = c("AND", "OR"), max.ite = 50)
Y |
A |
Q |
A |
gate |
A string, " |
max.ite |
The maximum number of iterations to run while waiting for the RSS of every item to become stationary. |
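This function implements the Q-matrix refinement (QR) method developed by Chiu
(2013). The NPC method (Chiu & Douglas, 2013) is first used to classify examinees, and the best q-vector
for an item is identified by minimizing its RSS. Specifically, the RSS of
item $j$ for examinee $i$ is defined as

$$RSS_{ij} = (Y_{ij} - \eta_{ij})^2,$$

and the RSS of item $j$ across all examinees is

$$RSS_j = \sum_{i=1}^{N} (Y_{ij} - \eta_{ij})^2 = \sum_{m=1}^{2^K} \sum_{i \in C_m} (Y_{ij} - \eta_{jm})^2,$$

where $C_m$ for $m = 1, \ldots, 2^K$ is the $m$th proficiency class, and
$N$ is the number of examinees. Chiu (2013) proved that the expected value of
$RSS_j$ corresponding to the correct q-vector is the minimum among the
candidates.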
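The selection of a q-vector by its RSS can be illustrated for a single item as follows. The helpers `candidate_q`, `eta_and`, and `best_q_sketch` are hypothetical; the sketch uses known attribute profiles and noise-free responses under the "AND" gate, whereas the method itself works with NPC estimates of the profiles and iterates over items.

```r
# All nonzero candidate q-vectors for K attributes (2^K - 1 rows)
candidate_q <- function(K) {
  g <- as.matrix(expand.grid(rep(list(0:1), K)))
  unname(g[rowSums(g) > 0, , drop = FALSE])
}

# Conjunctive ("AND" gate) ideal response of each examinee for one q-vector
eta_and <- function(alpha, q) as.numeric(alpha %*% q == sum(q))

# Hypothetical helper: choose the q-vector with the smallest RSS for one item
best_q_sketch <- function(y, alpha) {
  cand <- candidate_q(ncol(alpha))
  rss <- apply(cand, 1, function(q) sum((y - eta_and(alpha, q))^2))
  list(q = cand[which.min(rss), ], rss = rss)
}

alpha <- as.matrix(expand.grid(rep(list(0:1), 3)))  # one examinee per profile
true.q <- c(1, 1, 0)
y <- eta_and(alpha, true.q)                          # noise-free item responses
best_q_sketch(y, alpha)$q                            # recovers true.q
```

Only the correct q-vector reproduces the responses exactly here, so its RSS of 0 is the unique minimum among the seven candidates.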
A list containing:
initial.class |
Initial classifications of examinees |
terminal.class |
Terminal classification of examinees |
modified.Q |
The modified Q-matrix |
modified.entries |
The modified q-entries |
Chiu, C. Y. (2013). Statistical Refinement of the Q-matrix in Cognitive Diagnosis. Applied Psychological Measurement, 37(8), 598-618. doi:10.1177/0146621613488436
## Not run:
## Generate data
library(GDINA)
N <- 500
Q <- sim30GDINA$simQ
J <- nrow(Q)
K <- ncol(Q)
gs <- data.frame(guess = rep(0.2, J), slip = rep(0.2, J))
sim <- simGDINA(N, Q, gs.parm = gs, model = "DINA")
Y <- extract(sim, what = "dat")

## Randomly generate a misspecified Q with 20% of misspecifications;
## redraw until every item measures at least one attribute
mis.Q <- matrix(0, J, K)
while (any(rowSums(mis.Q) == 0)) {
  mis.q <- sample(J * K, J * K * 0.2)  ## indices of the misspecified q-entries
  ind <- arrayInd(mis.q, dim(Q))
  mis.Q <- Q
  mis.Q[ind] <- 1 - mis.Q[ind]
}

## Refine the misspecified Q-matrix
ref <- QR(Y, mis.Q)
ref.Q <- ref$modified.Q

## Compute the entry-wise and item-wise recovery rates
rr <- RR(ref.Q, Q)
rr$entry.wise
rr$item.wise

## Compute the retention rate
retention.rate(ref.Q, mis.Q, Q)

## Compute the correction rate
correction.rate(ref.Q, mis.Q, Q)
## End(Not run)
This function computes the proportion of correctly specified q-entries in a provisional Q-matrix that remain correctly specified after a Q-matrix refinement procedure is applied. This function is used only when the true Q-matrix is known.
retention.rate(ref.Q = ref.Q, mis.Q = mis.Q, true.Q = true.Q)
ref.Q |
The |
mis.Q |
A |
true.Q |
The |
The function returns a value between 0 and 1 indicating the proportion of
correctly specified q-entries in mis.Q that remain correctly specified in ref.Q
after a Q-matrix refinement procedure is applied to mis.Q.
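A small worked example may help separate retention from correction. The helper `retention_rate_sketch` is hypothetical and follows the verbal definition above; it is not the package's implementation.

```r
# Hypothetical helper: among entries that were already correct in mis.Q,
# the share that are still correct in ref.Q.
retention_rate_sketch <- function(ref.Q, mis.Q, true.Q) {
  right <- mis.Q == true.Q
  if (!any(right)) return(NA_real_)
  mean(ref.Q[right] == true.Q[right])
}

true.Q <- matrix(c(1, 0,
                   0, 1,
                   1, 1), ncol = 2, byrow = TRUE)
mis.Q <- true.Q
mis.Q[1, 2] <- 1      # misspecified entry
mis.Q[3, 1] <- 0      # misspecified entry; the other four entries are correct
ref.Q <- mis.Q
ref.Q[1, 2] <- 0      # the refinement repairs one misspecified entry ...
ref.Q[2, 1] <- 1      # ... but also spoils one entry that was correct
retention_rate_sketch(ref.Q, mis.Q, true.Q)  # 0.75
```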
See examples used for QR.
RR is used to compute the agreement rate between two Q-matrices with
identical dimensions.
RR(Q1, Q2)
Q1 |
The first Q-matrix. |
Q2 |
The second Q-matrix that has the same dimensionality as |
The function returns an object with two components:
entry.wise: The entry-wise agreement rate
item.wise: The item-wise agreement rate
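The two rates can be sketched as follows: the entry-wise rate compares individual q-entries, while the item-wise rate requires a whole q-vector (row) to match. The helper `rr_sketch` is hypothetical; the component names mirror those used in the QR example (`rr$entry.wise`, `rr$item.wise`), but the package's RR() may compute or return them differently.

```r
# Hypothetical helper, not the package's RR()
rr_sketch <- function(Q1, Q2) {
  Q1 <- as.matrix(Q1)
  Q2 <- as.matrix(Q2)
  stopifnot(all(dim(Q1) == dim(Q2)))             # identical dimensions required
  list(entry.wise = mean(Q1 == Q2),              # share of matching q-entries
       item.wise  = mean(rowSums(Q1 != Q2) == 0)) # share of fully matching rows
}

Q1 <- matrix(c(1, 0,
               0, 1,
               1, 1), ncol = 2, byrow = TRUE)
Q2 <- Q1
Q2[3, 1] <- 0          # one entry of the third item differs
rr <- rr_sketch(Q1, Q2)
rr$entry.wise          # 5 of 6 entries agree
rr$item.wise           # 2 of 3 items agree
```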
See the examples for the QR and TSQE functions.
Launches the interactive demo shipped with the package (can be found in
inst/shiny/GNPC_app). The app demonstrates NPC, GNPC, and G-DINA
workflows and the ECPE real-data example.
run_gnpc_app(
  launch.browser = interactive(),
  host = "127.0.0.1",
  port = NULL,
  ...
)
launch.browser |
Logical; open in a web browser? Defaults to |
host |
Host interface passed to |
port |
Optional integer port. If |
... |
Additional arguments forwarded to |
## Not run:
run_gnpc_app()
## End(Not run)
The function estimates the Q-matrix based on the response data using the two-step Q-matrix estimation method.
TSQE(
  Y,
  K,
  input.cor = c("tetrachoric", "pearson"),
  ref.method = c("QR", "GDI"),
  GDI.model = c("GDINA", "DINA", "ACDM", "RRUM"),
  cutoff = 0.8
)
Y |
A |
K |
The number of attributes in the Q-matrix |
input.cor |
The type of correlation used as input for the
provisional attribute extraction (PAE) algorithm. It could be the
|
ref.method |
The refinement method used to polish the provisional
Q-matrix obtained from the PAE. Currently available methods include
the Q-matrix refinement ( |
GDI.model |
The CDM used in the GDI algorithm to fit the data. Currently available models include the DINA model, the ACDM, the RRUM, and the G-DINA model. |
cutoff |
The cutoff used to dichotomize the entries in the provisional Q-matrix. The default is 0.8. |
The function returns the estimated Q-matrix.
The TSQE method estimates a Q-matrix by integrating the provisional attribute extraction (PAE) algorithm with a Q-matrix refinement-and-validation method, such as the Q-matrix refinement (QR) method or the G-DINA model discrimination index (GDI). Specifically, the PAE algorithm relies on classic exploratory factor analysis (EFA) combined with a unique stopping rule to identify a provisional Q-matrix, which is then "polished" by a refinement method to produce the final Q-matrix estimate.
The PAE algorithm starts by computing the inter-item tetrachoric correlation matrix. Tetrachoric correlations are used because examinee responses are binary, which makes them more appropriate than the Pearson product-moment correlation coefficient; see Köhn et al. (2025) for details. The next step applies factor analysis to the item-correlation matrix and treats the extracted factors as proxies for the latent attributes. The third step identifies the specific attributes required by each item. The detailed algorithm is described below:
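(1) Initialize the item index as $j = 1$.
(2) Let $l_{jk}$ denote the loading of item $j$ on factor $k$, where $k = 1, 2, \ldots, K$.
(3) Arrange the loadings in descending order. Define a mapping
function $f(k) = t$, where $t$ is the order index.
Hence, $l_{j(1)}$ will indicate the maximum loading,
while $l_{j(K)}$ will indicate the minimum loading.
(4) Define
$$p_j(t) = \frac{\sum_{h=1}^{t} l_{j(h)}^2}{\sum_{k=1}^{K} l_{jk}^2}$$
as the proportion of the communality of item $j$ accounted for
by the first $t$ factors.
(5) Define
$$K_j = \min\{t \mid p_j(t) \ge \lambda\},$$
where $\lambda$ is the cut-off value for the desired proportion
of item variance-accounted-for. Then, the ordered entries of the
provisional q-vector of item $j$ are obtained as
$$q^*_{j(t)} = \begin{cases} 1 & \text{if } t \le K_j, \\ 0 & \text{otherwise.} \end{cases}$$
(6) Identify $\mathbf{q}^*_j = (q^*_{j1}, \ldots, q^*_{jK})$
by rearranging the ordered entries of the q-vector using the inverse function $f^{-1}(t) = k$.
(7) Set $j = j + 1$ and repeat (2) to (6) until all $J$ items have been processed.
Then denote the provisional Q-matrix as $\mathbf{Q}^*$.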
The provisional Q-matrix is then refined by
using either the QR or GDI method.
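The per-item part of the PAE algorithm (ordering the loadings, accumulating the share of communality, and mapping the result back to the original factor order) can be sketched for a single vector of loadings. The helper `pae_item_sketch` is hypothetical, and ranking the factors by squared loading is an assumption made so that negative loadings are handled sensibly; the package may order the loadings differently.

```r
# Hypothetical helper: provisional q-vector for one item from its factor loadings
pae_item_sketch <- function(loadings, cutoff = 0.8) {
  ord <- order(loadings^2, decreasing = TRUE)        # factors from strongest to weakest
  prop <- cumsum(loadings[ord]^2) / sum(loadings^2)  # communality share of the first t factors
  n_keep <- which(prop >= cutoff)[1]                 # smallest t that reaches the cutoff
  q <- integer(length(loadings))
  q[ord[seq_len(n_keep)]] <- 1L                      # map back to the original factor order
  q
}

# One dominant loading: the first factor alone explains about 86% of the communality
pae_item_sketch(c(0.8, 0.1, 0.3))   # 1 0 0

# Two sizeable loadings: both are needed to reach the 0.8 cutoff
pae_item_sketch(c(0.1, 0.6, 0.5))   # 0 1 1
```

Applying this to every row of the loading matrix would yield a provisional Q-matrix, which the refinement step then revises.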
Chiu, C. Y. (2013). Statistical Refinement of the Q-matrix in Cognitive Diagnosis. Applied Psychological Measurement, 37(8), 598-618. doi:10.1177/0146621613488436
de la Torre, J., & Chiu, C.-Y. (2016). A general method of empirical Q-matrix validation. Psychometrika, 81, 253-273. doi:10.1007/s11336-015-9467-8
Köhn, H. F., Chiu, C.-Y., Oluwalana, O., Kim, H. & Wang, J. (2025). A two-step Q-matrix estimation method. Applied Psychological Measurement, 49(1-2), 3-28. doi:10.1177/01466216241284418
## Not run:
library(GDINA)
N <- 1000
Q <- sim30GDINA$simQ
J <- nrow(Q)
K <- ncol(Q)
gs <- data.frame(guess = rep(0.2, J), slip = rep(0.2, J))
sim <- simGDINA(N, Q, gs.parm = gs, model = "DINA")
Y <- extract(sim, what = "dat")

## Run TSQE method with QR
est.Q <- TSQE(Y, K, input.cor = "tetrachoric", ref.method = "QR", cutoff = 0.8)

## If the recovery rate is to be computed, the columns of the estimated Q-matrix
## should be permuted so that they align with those of the true Q-matrix.
best.est.Q <- bestQperm(est.Q, Q)

## Compute the recovery rate
RR(best.est.Q, Q)
## End(Not run)