Title: | Modified Rand and Wallace Indices |
---|---|
Description: | Provides functions to compute different modifications of the Rand and Wallace indices. The indices measure the stability or similarity of two partitions obtained on two different sets of units with a non-empty intersection. Depending on the selected index, splitting and merging of clusters can affect the value of the index differently. The indices are proposed in Cugmas and Ferligoj (2018) <http://ibmi.mf.uni-lj.si/mz/2018/no-1/Cugmas2018.pdf>. |
Authors: | Marjan Cugmas [aut, cre] |
Maintainer: | Marjan Cugmas <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.0.1 |
Built: | 2025-02-07 04:41:29 UTC |
Source: | https://github.com/cran/mri |
Some examples of misclassifications of units in the clusters of the first and second partitions. The data are the same as those used in Cugmas and Ferligoj (2018) (see Figure 4 in the paper), except for examples D, H, L and P, which are random and can therefore differ slightly from those in the paper.
data("examples")
The data are in a list format. Each element of the list is a contingency table. The newcomers and outgoers are not denoted.
Element of the list (example in Figure 4 and in Table 3 in Cugmas and Ferligoj (2018))
1 (A), 2 (B), 3 (C), 4 (D)
13 (E), 14 (F), 15 (G), 16 (H)
5 (I), 6 (J), 7 (K), 8 (L)
9 (M), 10 (N), 11 (O), 12 (P)
Cugmas, M., & Ferligoj, A. (2018). Comparing two partitions of non-equal sets of units. Advances in Methodology and Statistics, 15(1), 1-21.
Converts a given contingency table to a data frame.
fromTableToVectors(cont.table)
cont.table | a contingency table (a data frame with row names and column names) |
A data frame with n rows and 2 columns. The first column corresponds to the rows of the contingency table while the second column corresponds to the columns of the contingency table.
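To make the shape of the returned value concrete, the following is a minimal base R sketch of how a contingency table can be expanded into one row per unit. The helper `expand_table` is hypothetical and is not part of the package; it only illustrates the idea and may differ from `fromTableToVectors` in column names and types.

```r
# Hypothetical helper (not part of the package): expand a contingency table
# into one row per unit, labelled by its row cluster and its column cluster.
expand_table <- function(cont.table) {
  tab <- as.matrix(cont.table)
  idx <- which(tab > 0, arr.ind = TRUE)   # positions of the non-empty cells
  counts <- tab[idx]                      # number of units in each of those cells
  data.frame(
    row = rep(rownames(tab)[idx[, "row"]], times = counts),
    col = rep(colnames(tab)[idx[, "col"]], times = counts),
    stringsAsFactors = FALSE
  )
}

tab <- rbind(c(0, 10, 0, 0, 0),
             c(0, 10, 0, 0, 0),
             c(0, 0, 10, 0, 0),
             c(0, 0, 0, 5, 5))
rownames(tab) <- 1:4
colnames(tab) <- 1:5

vec <- expand_table(tab)
nrow(vec)                 # one row per unit, i.e. sum(tab)
table(vec$row, vec$col)   # cross-tabulating recovers the non-empty part of tab
```

Cross-tabulating the two columns again is a quick way to check that no units were lost in the conversion.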
Marjan Cugmas
data <- rbind(c(0, 10, 0, 0, 0),
              c(0, 10, 0, 0, 0),
              c(0, 0, 10, 0, 0),
              c(0, 0, 0, 5, 5))
rownames(data) <- 1:4
colnames(data) <- 1:5
fromTableToVectors(cont.table = data)
The functions are used to compute the value of the Modified Rand Index and the Modified Adjusted Rand Index. They consider two partitions which are usually obtained on two sets of units where the intersection is non-empty or where one set of units is a subset of the other.
Because the two vectors U and V (partitions) do not have the same length, the cluster of units that are not present in partition V (outgoers) needs to be added to partition V (denoted as V') and/or the cluster of units that are not present in partition U (newcomers) needs to be added to partition U (denoted as U'). The added values have to be NA.
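The padding step described above can be sketched in base R as follows. The sketch assumes, purely for illustration, that each partition is stored as a vector of cluster labels named by unit id; the package itself only requires two vectors of equal length with NA in the added positions.

```r
# Hypothetical input: cluster labels named by unit id (the names are an assumption).
U <- c(a = 1, b = 1, c = 2, d = 2, e = 3)   # first partition, units a to e
V <- c(c = 1, d = 1, e = 2, f = 2, g = 3)   # second partition, units c to g

units <- union(names(U), names(V))          # every unit seen in either set

# Indexing a named vector by a name it does not contain yields NA, which
# marks outgoers (units only in U) in V' and newcomers (units only in V) in U'.
U_prime <- unname(U[units])
V_prime <- unname(V[units])

data.frame(unit = units, U = U_prime, V = V_prime)
```

Here units a and b are outgoers (NA in V') and units f and g are newcomers (NA in U'); the two padded vectors have the same length and the same unit order.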
MRI(U, V)
MARI(U, V, k)
U | partition U or U' |
V | partition V or V' |
k | the number of iterations used to estimate the expected value of the index in the case of two random and independent partitions |
The functions return the value of the Modified Rand Index or the value of the Modified Adjusted Rand Index. The expected value of the (Modified) Adjusted Rand Index is 0 in the case of two random and independent partitions. The maximum value of the index is 1. A higher value indicates more similar (stable) partitions. Both splitting and merging of clusters lower the value of the indices.
The special cases of the modified indices (when only outgoers or only newcomers are present) are automatically considered within these functions.
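As a rough illustration of the behaviour described above, the sketch below computes the classic (unmodified) Rand index by pair counting and applies a generic permutation-based chance correction. Both helpers, `rand_index` and `adjusted_by_permutation`, are hypothetical; they do not handle NA values and are not claimed to reproduce how MRI or MARI are implemented, only the role that an iteration count such as k can play.

```r
# Classic Rand index: share of unit pairs on which the two partitions agree
# (together in both or apart in both). Not the package's modified index.
rand_index <- function(U, V) {
  same_U <- outer(U, U, "==")
  same_V <- outer(V, V, "==")
  pairs <- upper.tri(same_U)               # count each unordered pair once
  mean(same_U[pairs] == same_V[pairs])
}

# Hypothetical chance correction: estimate the expected index by permuting V
# k times, then rescale so that chance level maps to 0 and the maximum to 1.
adjusted_by_permutation <- function(U, V, index = rand_index, k = 100) {
  observed <- index(U, V)
  expected <- mean(replicate(k, index(U, sample(V))))
  (observed - expected) / (1 - expected)
}

U <- rep(1:3, each = 10)
V_split <- c(rep(1:2, each = 5), rep(3:4, each = 10))  # cluster 1 of U split in two
V_merge <- rep(c(1, 1, 2), each = 10)                  # clusters 1 and 2 of U merged

rand_index(U, U)         # identical partitions give 1
rand_index(U, V_split)   # below 1: splitting lowers the index
rand_index(U, V_merge)   # below 1: merging lowers the index

set.seed(1)
adjusted_by_permutation(U, V_split, k = 100)
```

A larger k gives a more stable estimate of the expected value at the cost of run time.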
Marjan Cugmas
Cugmas, M., & Ferligoj, A. (2018). Comparing two partitions of non-equal sets of units. Advances in Methodology and Statistics, 15(1), 1-21.
# Examples from Cugmas and Ferligoj (2018) paper:
data(examples)
# increase k in real analyses

# EXAMPLES: A, B, C, D
par(mfrow = c(4, 4))
for (i in 1:4){
  U <- fromTableToVectors(examples[[i]])[,1]
  V <- fromTableToVectors(examples[[i]])[,2]
  cat("MARI", MARI(U = U, V = V, k = 100), "\n")
}

# EXAMPLES: E, F, G, H
for (i in 13:16){
  U <- fromTableToVectors(examples[[i]])[,1]
  V <- fromTableToVectors(examples[[i]])[,2]
  U[which(max(as.numeric(as.character(U))) == as.numeric(as.character(U)))] <- NA
  V[which(max(as.numeric(as.character(V))) == as.numeric(as.character(V)))] <- NA
  cat("MARI", MARI(U = U, V = V, k = 100), "\n")
}

# EXAMPLES: I, J, K, L
for (i in 5:8){
  U <- fromTableToVectors(examples[[i]])[,1]
  V <- fromTableToVectors(examples[[i]])[,2]
  V[which(max(as.numeric(as.character(V))) == as.numeric(as.character(V)))] <- NA
  cat("MARI", MARI(U = U, V = V, k = 100), "\n")
}

# EXAMPLES: M, N, O, P
for (i in 9:12){
  U <- fromTableToVectors(examples[[i]])[,1]
  V <- fromTableToVectors(examples[[i]])[,2]
  U[which(max(as.numeric(as.character(U))) == as.numeric(as.character(U)))] <- NA
  cat("MARI", MARI(U = U, V = V, k = 100), "\n")
}
The functions are used to compute the value of the Modified Wallace Index 1 and the Modified Adjusted Wallace Index 1. They consider two partitions which are usually obtained on two sets of units where the intersection is non-empty or where one set of units is a subset of the other.
Because the two vectors U and V (partitions) do not have the same length, the cluster of units that are not present in partition V (outgoers) needs to be added to partition V (denoted as V') and/or the cluster of units that are not present in partition U (newcomers) needs to be added to partition U (denoted as U'). The added values have to be NA.
MW1(U, V)
MAW1(U, V, k)
U | partition U or U' |
V | partition V or V' |
k | the number of iterations used to estimate the expected value of the index in the case of two random and independent partitions |
The functions return the value of the Modified Wallace Index 1 or the value of the Modified Adjusted Wallace Index 1. The expected value of the (Modified) Adjusted Wallace Index 1 is 0 in the case of two random and independent partitions. The maximum value of the index is 1. A higher value indicates more similar (stable) partitions. Splitting of clusters lowers the value of the indices while merging does not.
The special cases of the modified indices (when only outgoers or only newcomers are present) are automatically considered within these functions.
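The asymmetry between splitting and merging can be seen with the classic (unmodified) Wallace index, sketched here by direct pair counting. The helper `wallace1` is hypothetical, does not handle NA values, and is not the package's modified index; it only illustrates why merging leaves this index untouched.

```r
# Classic Wallace index W1 = N11 / (N11 + N10): among the pairs placed together
# by U, the share that V also places together. Not the package's modified index.
wallace1 <- function(U, V) {
  pairs  <- upper.tri(diag(length(U)))     # each unordered pair once
  same_U <- outer(U, U, "==")[pairs]
  same_V <- outer(V, V, "==")[pairs]
  sum(same_U & same_V) / sum(same_U)
}

U <- rep(1:3, each = 10)
V_split <- c(rep(1:2, each = 5), rep(3:4, each = 10))  # cluster 1 of U split in two
V_merge <- rep(c(1, 1, 2), each = 10)                  # clusters 1 and 2 of U merged

wallace1(U, V_split)   # 110/135: the split separates pairs that U keeps together
wallace1(U, V_merge)   # 1: merging keeps every U pair together
```

Only pairs that are together in U enter the denominator, so pairs that V newly joins (merging) cannot lower the value.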
Marjan Cugmas
Cugmas, M., & Ferligoj, A. (2018). Comparing two partitions of non-equal sets of units. Advances in Methodology and Statistics, 15(1), 1-21.
# Examples from Cugmas and Ferligoj (2018) paper:
data(examples)

# EXAMPLES: A, B, C, D
par(mfrow = c(4, 4))
for (i in 1:4){
  U <- fromTableToVectors(examples[[i]])[,1]
  V <- fromTableToVectors(examples[[i]])[,2]
  cat("MAWI1", MAW1(U = U, V = V), "\n")
}

# EXAMPLES: E, F, G, H
for (i in 13:16){
  U <- fromTableToVectors(examples[[i]])[,1]
  V <- fromTableToVectors(examples[[i]])[,2]
  U[which(max(as.numeric(as.character(U))) == as.numeric(as.character(U)))] <- NA
  V[which(max(as.numeric(as.character(V))) == as.numeric(as.character(V)))] <- NA
  cat("MAWI1", MAW1(U = U, V = V), "\n")
}

# EXAMPLES: I, J, K, L
for (i in 5:8){
  U <- fromTableToVectors(examples[[i]])[,1]
  V <- fromTableToVectors(examples[[i]])[,2]
  V[which(max(as.numeric(as.character(V))) == as.numeric(as.character(V)))] <- NA
  cat("MAWI1", MAW1(U = U, V = V), "\n")
}

# EXAMPLES: M, N, O, P
for (i in 9:12){
  U <- fromTableToVectors(examples[[i]])[,1]
  V <- fromTableToVectors(examples[[i]])[,2]
  U[which(max(as.numeric(as.character(U))) == as.numeric(as.character(U)))] <- NA
  cat("MAWI1", MAW1(U = U, V = V), "\n")
}
The functions are used to compute the value of the Modified Wallace Index 2 and the Modified Adjusted Wallace Index 2. They consider two partitions which are usually obtained on two sets of units where the intersection is non-empty or where one set of units is a subset of the other.
Because the two vectors U and V (partitions) do not have the same length, the cluster of units that are not present in partition V (outgoers) needs to be added to partition V (denoted as V') and/or the cluster of units that are not present in partition U (newcomers) needs to be added to partition U (denoted as U'). The added values have to be NA.
MW2(U, V)
MAW2(U, V, k)
U | partition U or U' |
V | partition V or V' |
k | the number of iterations used to estimate the expected value of the index in the case of two random and independent partitions |
The functions return the value of the Modified Wallace Index 2 or the value of the Modified Adjusted Wallace Index 2. The expected value of the (Modified) Adjusted Wallace Index 2 is 0 in the case of two random and independent partitions. The maximum value of the index is 1. A higher value indicates more similar (stable) partitions. Merging of clusters lowers the value of the indices while splitting does not.
The special cases of modified indices (when only outgoers or only newcomers are present) are automatically considered within these functions.
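For the classic (unmodified) Wallace indices, the same quantities can also be read off a contingency table, which makes the mirror-image behaviour of the second index easy to check. The helper `wallace_from_table` is hypothetical, ignores the NA handling of the modified indices, and is offered only as a sketch of the textbook formulas.

```r
# Classic Wallace indices from a contingency table (not the modified versions):
#   W1 = sum(choose(n_ij, 2)) / sum(choose(n_i., 2))   row totals, partition U
#   W2 = sum(choose(n_ij, 2)) / sum(choose(n_.j, 2))   column totals, partition V
wallace_from_table <- function(U, V) {
  tab <- table(U, V)
  together_both <- sum(choose(tab, 2))     # pairs placed together by U and by V
  c(W1 = together_both / sum(choose(rowSums(tab), 2)),
    W2 = together_both / sum(choose(colSums(tab), 2)))
}

U <- rep(1:3, each = 10)
V_split <- c(rep(1:2, each = 5), rep(3:4, each = 10))  # cluster 1 of U split in two
V_merge <- rep(c(1, 1, 2), each = 10)                  # clusters 1 and 2 of U merged

wallace_from_table(U, V_split)["W2"]   # 1: splitting does not lower W2
wallace_from_table(U, V_merge)["W2"]   # 135/235: merging lowers W2
```

In this classic form, swapping the two arguments swaps the two indices, so `wallace_from_table(V, U)["W1"]` equals `wallace_from_table(U, V)["W2"]`.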
Marjan Cugmas
Cugmas, M., & Ferligoj, A. (2018). Comparing two partitions of non-equal sets of units. Advances in Methodology and Statistics, 15(1), 1-21.
# Examples from Cugmas and Ferligoj (2018) paper:
data(examples)

# EXAMPLES: A, B, C, D
par(mfrow = c(4, 4))
for (i in 1:4){
  U <- fromTableToVectors(examples[[i]])[,1]
  V <- fromTableToVectors(examples[[i]])[,2]
  cat("MAWI2", MAW2(U = U, V = V), "\n")
}

# EXAMPLES: E, F, G, H
for (i in 13:16){
  U <- fromTableToVectors(examples[[i]])[,1]
  V <- fromTableToVectors(examples[[i]])[,2]
  U[which(max(as.numeric(as.character(U))) == as.numeric(as.character(U)))] <- NA
  V[which(max(as.numeric(as.character(V))) == as.numeric(as.character(V)))] <- NA
  cat("MAWI2", MAW2(U = U, V = V), "\n")
}

# EXAMPLES: I, J, K, L
for (i in 5:8){
  U <- fromTableToVectors(examples[[i]])[,1]
  V <- fromTableToVectors(examples[[i]])[,2]
  V[which(max(as.numeric(as.character(V))) == as.numeric(as.character(V)))] <- NA
  cat("MAWI2", MAW2(U = U, V = V), "\n")
}

# EXAMPLES: M, N, O, P
for (i in 9:12){
  U <- fromTableToVectors(examples[[i]])[,1]
  V <- fromTableToVectors(examples[[i]])[,2]
  U[which(max(as.numeric(as.character(U))) == as.numeric(as.character(U)))] <- NA
  cat("MAWI2", MAW2(U = U, V = V), "\n")
}