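R clusplot.default: Bivariate Cluster Plot (clusplot) Default Method

The R function clusplot.default is part of the cluster package.

Description

Creates a bivariate plot visualizing a partition (clustering) of the data. All observations are represented by points in the plot, using principal components or multidimensional scaling. An ellipse is drawn around each cluster.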

Usage

## Default S3 method:
clusplot(x, clus, diss = FALSE,
          s.x.2d = mkCheckX(x, diss), stand = FALSE,
          lines = 2, shade = FALSE, color = FALSE,
          labels= 0, plotchar = TRUE,
          col.p = "dark green", col.txt = col.p,
          col.clus = if(color) c(2, 4, 6, 3) else 5, cex = 1, cex.txt = cex,
          span = TRUE,
          add = FALSE,
          xlim = NULL, ylim = NULL,
          main = paste("CLUSPLOT(", deparse1(substitute(x)),")"),
          sub = paste("These two components explain",
             round(100 * var.dec, digits = 2), "% of the point variability."),
          xlab = "Component 1", ylab = "Component 2",
          verbose = getOption("verbose"),
          ...)

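Arguments

`x`	matrix or data frame, or dissimilarity matrix, depending on the value of the `diss` argument. In the case of a matrix (or matrix-like object), each row corresponds to an observation and each column corresponds to a variable. All variables must be numeric. Missing values (`NA`s) are allowed; they are replaced by the median of the corresponding variable. When some variables or some observations contain only missing values, the function stops with a warning message. In the case of a dissimilarity matrix, `x` is the output of `daisy` or `dist`, or a symmetric matrix. A vector of length `n*(n-1)/2` is also allowed (where `n` is the number of observations) and will be interpreted in the same way as the output of the functions mentioned above. Missing values (`NA`s) are not allowed in a dissimilarity input.
`clus`	a vector of length n representing a clustering of `x`. For each observation the vector lists the number or name of the cluster to which it has been assigned. `clus` is often the clustering component of the output of `pam`, `fanny` or `clara`.
`diss`	logical indicating whether `x` will be considered a dissimilarity matrix or a matrix of observations by variables (see the `x` argument above).
`s.x.2d`	a `list` with components named `x` (an n × 2 matrix; typically something like the principal components of the original data), `labs` and `var.dec`.
`stand`	logical flag: if true, the representations of the n observations in the 2-dimensional plot are standardized.
`lines`	integer out of `0, 1, 2`, used to obtain an idea of the distances between ellipses. The distance between two ellipses E1 and E2 is measured along the line connecting the centers `m1` and `m2` of the two ellipses. If E1 and E2 overlap on the line through `m1` and `m2`, no line is drawn. Otherwise, the result depends on the value of `lines`: with `lines = 0`, no distance lines appear on the plot; with `lines = 1`, the line segment between `m1` and `m2` is drawn; with `lines = 2`, a line segment between the boundaries of E1 and E2 is drawn (along the line connecting `m1` and `m2`).
`shade`	logical flag: if TRUE, the ellipses are shaded in relation to their density. The density is the number of points in the cluster divided by the area of the ellipse.
`color`	logical flag: if TRUE, the ellipses are colored with respect to their density. With increasing density, the colors are light blue, light green, red and purple. To see these colors on the graphics device, an appropriate color scheme should be selected (we recommend a white background).
`labels`	integer code, currently one of 0, 1, 2, 3, 4 and 5. With `labels = 0`, no labels are placed in the plot; `labels = 1`, points and ellipses can be identified in the plot (see `identify`); `labels = 2`, all points and ellipses are labelled in the plot; `labels = 3`, only the points are labelled in the plot; `labels = 4`, only the ellipses are labelled in the plot; `labels = 5`, the ellipses are labelled in the plot, and points can be identified. The levels of the vector `clus` are taken as labels for the clusters. If `x` is matrix-like, the labels of the points are the row names of `x`. Otherwise (`diss = TRUE`), `x` is a vector, and point labels can be attached to `x` as a "Labels" attribute (`attr(x,"Labels")`), as is done for the output of `daisy`. A possible `names` attribute of `clus` is not taken into account.
`plotchar`	logical flag: if TRUE, the plotting symbols differ for points belonging to different clusters.
`span`	logical flag: if TRUE, each cluster is represented by the ellipse with the smallest area containing all its points. (This is a special case of the minimum volume ellipsoid.) If FALSE, the ellipse is based on the mean and covariance matrix of the same points. While this is faster to compute, it often yields a much larger ellipse. There are also some special cases: when a cluster consists of only one point, a tiny circle is drawn around it. When the points of a cluster fall on a straight line, `span=FALSE` draws a narrow ellipse around it and `span=TRUE` gives the exact line segment.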
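`add`	logical indicating whether the ellipses (and labels, if `labels` is set) should be added to an already existing plot. When adding to an existing plot, neither a `title` nor a subtitle (see `sub`) is written.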
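`col.p`	color code(s) used for the observation points.
`col.txt`	color code(s) used for the labels (if `labels >= 2`).
`col.clus`	color code for the ellipses (and their labels); only one if `color` is false (the default).
`cex` , `cex.txt`	character expansion (size), for the point symbols and point labels, respectively.
`xlim` , `ylim`	numeric vectors of length 2, giving the x- and y-ranges as in `plot.default`.
`main`	main title for the plot; by default, one is constructed.
`sub`	subtitle for the plot; by default, one is constructed.
`xlab` , `ylab`	x- and y-axis labels for the plot, with defaults.
`verbose`	logical indicating whether there should be extra diagnostic output; mainly for debugging.
`...`	further graphical parameters may also be supplied, see `par`.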
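
The dissimilarity form of `x` described above can be inspected directly. The following is a minimal sketch, assuming the cluster package and its `votes.repub` data set are available; it only checks the vector length and the "Labels" attribute mentioned in the `x` and `labels` entries and does not draw anything.

```r
library(cluster)  # for daisy() and the votes.repub data set

data(votes.repub)
n <- nrow(votes.repub)

# daisy() stores the pairwise dissimilarities as a vector (lower triangle)
d <- daisy(votes.repub)
length(d) == n * (n - 1) / 2   # one entry per pair of observations

# Point labels travel with the object as its "Labels" attribute
head(attr(d, "Labels"))
```

An object like `d` is what `clusplot(d, clus, diss = TRUE)` expects as its first argument.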

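Details

clusplot uses the function calls princomp(*, cor = (ncol(x) > 2)) or cmdscale(*, add=TRUE), respectively, depending on whether diss is false or true. These functions are data reduction techniques used to represent the data in a bivariate plot.

Ellipses are then drawn to indicate the clusters. The further layout of the plot is determined by the optional arguments.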
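
To make the reduction step concrete, here is a rough sketch of the two calls named above, applied by hand to the iris measurements. The variable names (`scores2d`, `var_dec`, `points2d`) are illustrative only, and this is not a copy of what clusplot does internally (for example, missing-value handling and standardization are left out).

```r
# Data matrix input (diss = FALSE): principal components
x <- as.matrix(iris[, 1:4])
pc <- princomp(x, cor = ncol(x) > 2)   # correlation-based because ncol(x) > 2
scores2d <- pc$scores[, 1:2]            # coordinates on the first two components

# Share of the total variance captured by the two plotted components
var_dec <- sum(pc$sdev[1:2]^2) / sum(pc$sdev^2)

# Dissimilarity input (diss = TRUE): classical multidimensional scaling
d <- dist(x)
mds <- cmdscale(d, k = 2, add = TRUE)   # add = TRUE makes cmdscale return a list
points2d <- mds$points                  # n x 2 configuration
```

Either `scores2d` or `points2d` plays the role of the n × 2 matrix that is plotted, and `var_dec` corresponds to the proportion reported in the default subtitle.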

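Value

An invisible list with components:

`Distances`	When `lines` is 1 or 2, a k × k matrix (k is the number of clusters). The element in `[i,j]` is the distance between ellipse i and ellipse j. If `lines = 0`, the value of this component is `NA`.
`Shading`	A vector of length k (where k is the number of clusters), containing the amount of shading per cluster. Let y be a vector where element i is the ratio between the number of points in cluster i and the area of ellipse i. When cluster i is a line segment, y[i] and the density of the cluster are set to `NA`. Let z be the sum of all the elements of y without the NAs. Then shading = y/z * 37 + 3.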
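
The shading formula can be worked through with a few numbers. The densities below are made up for illustration (they do not come from any data set), with one `NA` standing in for a cluster that degenerates to a line segment.

```r
# Hypothetical densities y for four clusters; NA marks a line-segment cluster
y <- c(2.5, 1.0, NA, 0.5)

z <- sum(y, na.rm = TRUE)   # sum over the non-NA densities: 4
shading <- y / z * 37 + 3   # formula from the Shading description

shading
# 26.125 12.250     NA  7.625
```

Because the non-missing ratios y/z sum to 1, the non-missing shading values always sum to 37 plus 3 times the number of such clusters.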

Side Effects

A visual display of the clustering is plotted on the current graphics device.
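
Since the result is returned invisibly while the plot is drawn as a side effect, the list has to be assigned to be inspected. The sketch below assumes the cluster package is installed and draws to a null PDF device so that no window or file is produced; the object names are illustrative.

```r
library(cluster)

x <- as.matrix(iris[, 1:4])
cl <- pam(x, 3)$clustering      # three clusters from pam()

pdf(NULL)                       # off-screen device: the plot is discarded
res <- clusplot(x, cl, lines = 2, shade = TRUE)
invisible(dev.off())

names(res)                      # components of the returned list
dim(res$Distances)              # k x k matrix of distances between ellipses
length(res$Shading)             # one shading value per cluster
```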

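Note

When there are 4 or fewer clusters, color=TRUE gives every cluster a different color. When there are more than 4 clusters, clusplot uses the function pam to cluster the densities into 4 groups, so that ellipses with nearly the same density get the same color. col.clus specifies the colors used.

The col.p and col.txt arguments, added for R, are recycled to have a length equal to the number of observations. If col.p has more than one value, using color = TRUE can be confusing because of the mix of point and ellipse colors.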
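
The grouping of densities for more than 4 clusters can be sketched as follows. The density values are invented for illustration, and the direct lookup `col.clus[grp]` is a simplifying assumption; how clusplot actually orders the groups before assigning colors is not shown in this page.

```r
library(cluster)

# Hypothetical ellipse densities for seven clusters (illustrative values only)
dens <- c(0.10, 0.11, 1.00, 1.05, 5.0, 5.2, 20)

# Partition the densities into 4 groups with pam()
grp <- pam(matrix(dens, ncol = 1), 4)$clustering

col.clus <- c(2, 4, 6, 3)   # default ellipse colors when color = TRUE
ell_col <- col.clus[grp]    # assumed mapping: one color per density group

data.frame(dens, grp, ell_col)
```

Ellipses whose densities are close, such as 0.10 and 0.11, end up in the same group and therefore share a color.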

Examples

library(cluster)  # provides clusplot(), pam(), daisy() and the votes.repub data

## plotting votes.diss(dissimilarity) in a bivariate plot and
## partitioning into 2 clusters
data(votes.repub)
votes.diss <- daisy(votes.repub)
pamv <- pam(votes.diss, 2, diss = TRUE)
clusplot(pamv, shade = TRUE)
## is the same as
votes.clus <- pamv$clustering
clusplot(votes.diss, votes.clus, diss = TRUE, shade = TRUE)
## Now look at components 3 and 2 instead of 1 and 2:
str(cMDS <- cmdscale(votes.diss, k=3, add=TRUE))
clusplot(pamv, s.x.2d = list(x=cMDS$points[, c(3,2)],
                             labs=rownames(votes.repub), var.dec=NA),
         shade = TRUE, col.p = votes.clus,
         sub="", xlab = "Component 3", ylab = "Component 2")

clusplot(pamv, col.p = votes.clus, labels = 4)# color points and label ellipses
# "simple" cheap ellipses: larger than minimum volume:
# here they are *added* to the previous plot:
clusplot(pamv, span = FALSE, add = TRUE, col.clus = "midnightblue")

## Setting a small *label* size:
clusplot(votes.diss, votes.clus, diss = TRUE, labels = 3, cex.txt = 0.6)

if(dev.interactive()) { #  uses identify() *interactively* :
  clusplot(votes.diss, votes.clus, diss = TRUE, shade = TRUE, labels = 1)
  clusplot(votes.diss, votes.clus, diss = TRUE, labels = 5)# ident. only points
}

## plotting iris (data frame) in a 2-dimensional plot and partitioning
## into 3 clusters.
data(iris)
iris.x <- iris[, 1:4]
cl3 <- pam(iris.x, 3)$clustering
op <- par(mfrow= c(2,2))
clusplot(iris.x, cl3, color = TRUE)
U <- par("usr")
## zoom in :
rect(0,-1, 2,1, border = "orange", lwd=2)
clusplot(iris.x, cl3, color = TRUE, xlim = c(0,2), ylim = c(-1,1))
box(col="orange",lwd=2); mtext("sub region", font = 4, cex = 2)
##  or zoom out :
clusplot(iris.x, cl3, color = TRUE, xlim = c(-4,4), ylim = c(-4,4))
mtext("'super' region", font = 4, cex = 2)
rect(U[1],U[3], U[2],U[4], lwd=2, lty = 3)

# reset graphics
par(op)

References

Pison, G., Struyf, A. and Rousseeuw, P.J. (1999) Displaying a Clustering with CLUSPLOT, Computational Statistics and Data Analysis, 30, 381-392.

Kaufman, L. and Rousseeuw, P.J. (1990). Finding Groups in Data: An Introduction to Cluster Analysis. Wiley, New York.

Struyf, A., Hubert, M. and Rousseeuw, P.J. (1997). Integrating Robust Clustering Techniques in S-PLUS, Computational Statistics and Data Analysis, 26, 17-37.

See Also

princomp, cmdscale, pam, clara, daisy, par, identify, cov.mve, clusplot.partition.

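Note: This article was selected and compiled by 纯净天空 from the original English work Bivariate Cluster Plot (clusplot) Default Method by R-devel. Unless otherwise stated, copyright of the original code belongs to the original authors; do not reproduce or copy this translation without permission or authorization.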