R ggplot2 stat_summary: Summarise y values at unique/binned x

stat_summary() operates on unique x or y; stat_summary_bin() operates on binned x or y. They are more flexible versions of stat_bin(): instead of just counting, they can compute any aggregate.

Usage

stat_summary_bin(
  mapping = NULL,
  data = NULL,
  geom = "pointrange",
  position = "identity",
  ...,
  fun.data = NULL,
  fun = NULL,
  fun.max = NULL,
  fun.min = NULL,
  fun.args = list(),
  bins = 30,
  binwidth = NULL,
  breaks = NULL,
  na.rm = FALSE,
  orientation = NA,
  show.legend = NA,
  inherit.aes = TRUE,
  fun.y = deprecated(),
  fun.ymin = deprecated(),
  fun.ymax = deprecated()
)

stat_summary(
  mapping = NULL,
  data = NULL,
  geom = "pointrange",
  position = "identity",
  ...,
  fun.data = NULL,
  fun = NULL,
  fun.max = NULL,
  fun.min = NULL,
  fun.args = list(),
  na.rm = FALSE,
  orientation = NA,
  show.legend = NA,
  inherit.aes = TRUE,
  fun.y = deprecated(),
  fun.ymin = deprecated(),
  fun.ymax = deprecated()
)

Arguments

mapping

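Set of aesthetic mappings created by aes(). If specified and inherit.aes = TRUE (the default), it is combined with the default mapping at the top level of the plot. You must supply mapping if there is no plot mapping.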

data

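The data to be displayed in this layer. There are three options:

If NULL, the default, the data is inherited from the plot data as specified in the call to ggplot().

A data.frame, or other object, will override the plot data. All objects will be fortified to produce a data frame. See fortify() for which variables will be created.

A function will be called with a single argument, the plot data. The return value must be a data.frame, and will be used as the layer data. A function can be created from a formula (e.g. ~ head(.x, 10)).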
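As a minimal sketch of the third option, the layer below receives a function as data, so the summary is computed on a subset derived from the plot data. The choice of mtcars and the six-cylinder filter are illustrative assumptions, not part of the original page.

```r
library(ggplot2)

# The function passed as `data` is called with the plot data (mtcars)
# and must return a data frame; here it keeps only six-cylinder cars.
p <- ggplot(mtcars, aes(cyl, mpg)) +
  geom_point() +
  stat_summary(
    data = function(df) df[df$cyl == 6, ],
    fun = mean, geom = "point", colour = "red", size = 3
  )

# Inspect what the second layer (the summary) actually computed.
summary_layer <- layer_data(p, i = 2)
```

Printing p draws the raw points for every car plus one red point at the mean mpg of the six-cylinder cars.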

geom

The geometric object to use to display the data, either as a ggproto Geom subclass or as a string naming the geom stripped of the geom_ prefix (e.g. "point" rather than "geom_point").

position

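Position adjustment, either as a string naming the adjustment (e.g. "jitter" to use position_jitter), or the result of a call to a position adjustment function. Use the latter if you need to change the settings of the adjustment.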

...

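Other arguments passed on to layer(). These are often aesthetics, used to set an aesthetic to a fixed value, like colour = "red" or size = 3. They may also be parameters to the paired geom/stat.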

fun.data

A function that is given the complete data and should return a data frame with variables ymin, y, and ymax.
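A fun.data function might look like the following sketch. The helper name median_iqr is hypothetical (it is not part of ggplot2); it returns the median with the interquartile range as the interval.

```r
library(ggplot2)

# Hypothetical helper: takes the y values at one x and returns a
# one-row data frame with the columns the pointrange geom expects.
median_iqr <- function(y) {
  data.frame(
    ymin = unname(quantile(y, 0.25)),
    y    = median(y),
    ymax = unname(quantile(y, 0.75))
  )
}

p <- ggplot(mtcars, aes(cyl, mpg)) +
  geom_point() +
  stat_summary(fun.data = median_iqr, colour = "red")
```

Because the default geom is "pointrange", each unique cyl value gets a point at the median and a line spanning the quartiles.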

fun.min, fun, fun.max

Alternatively, supply three individual functions that are each passed a vector of values and should return a single number.

fun.args

Optional additional arguments passed on to the functions.
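For illustration, the sketch below forwards trim = 0.1 to mean() through fun.args, so each point is a 10% trimmed mean. The dataset and the trim value are assumptions chosen for the example.

```r
library(ggplot2)

# fun.args is passed on to the summary function, so this layer
# effectively calls mean(y, trim = 0.1) at each unique x.
p <- ggplot(mtcars, aes(cyl, mpg)) +
  stat_summary(
    fun = mean, fun.args = list(trim = 0.1),
    geom = "point", colour = "red", size = 3
  )

trimmed <- layer_data(p)
```

Note that the same argument list is applied to every function supplied to the layer, so it should only contain arguments that all of them accept.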

bins

Number of bins. Overridden by binwidth. Defaults to 30.

binwidth

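The width of the bins. Can be specified as a numeric value or as a function that calculates width from unscaled x. Here, "unscaled x" refers to the original x values in the data, before application of any scale transformation. When specifying a function along with a grouping structure, the function will be called once per group. The default is to use the number of bins in bins, covering the range of the data. You should always override this value, exploring multiple widths to find the best to illustrate the stories in your data.

The bin width of a date variable is the number of days in each time; the bin width of a time variable is the number of seconds.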

breaks

Alternatively, you can supply a numeric vector giving the bin boundaries. Overrides binwidth, bins, center, and boundary.
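The binning arguments can be sketched as follows. The break points and the bin width are arbitrary values chosen for illustration; they are not recommendations from the original page.

```r
library(ggplot2)

# Explicit bin boundaries: mean price within four carat ranges.
p_breaks <- ggplot(diamonds, aes(carat, price)) +
  stat_summary_bin(fun = mean, geom = "point", breaks = c(0, 1, 2, 3, 6))

# Fixed-width bins instead: mean price in carat bins 0.5 wide.
p_width <- ggplot(diamonds, aes(carat, price)) +
  stat_summary_bin(fun = mean, geom = "point", binwidth = 0.5)

binned <- layer_data(p_breaks)
```

With breaks supplied, one summary row is produced per non-empty bin, so binned has one row for each of the four carat ranges.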

na.rm

If FALSE, the default, missing values are removed with a warning. If TRUE, missing values are silently removed.

orientation

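The orientation of the layer. The default (NA) automatically determines the orientation from the aesthetic mapping. In the rare event that this fails, it can be given explicitly by setting orientation to either "x" or "y". See the Orientation section for more detail.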

show.legend

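logical. Should this layer be included in the legends? NA, the default, includes it if any aesthetics are mapped. FALSE never includes, and TRUE always includes. It can also be a named logical vector to finely select the aesthetics to display.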

inherit.aes

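If FALSE, overrides the default aesthetics, rather than combining with them. This is most useful for helper functions that define both data and aesthetics and shouldn't inherit behaviour from the default plot specification, e.g. borders().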

fun.ymin, fun.y, fun.ymax

Deprecated. Use fun.min, fun, and fun.max, described above, instead.

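Orientation

This geom treats each axis differently and can therefore have two orientations. Often the orientation is easy to deduce from a combination of the given mappings and the types of positional scales in use. Thus, ggplot2 will by default try to guess which orientation the layer should have. Under rare circumstances, the orientation is ambiguous and guessing may fail. In that case the orientation can be specified directly using the orientation parameter, which can be either "x" or "y". The value gives the axis that the geom should run along, "x" being the default orientation you would expect for the geom.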
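A small sketch of an ambiguous case, assuming mtcars with both variables treated as continuous: setting orientation = "y" asks the layer to summarise mpg (on x) at each unique cyl (on y).

```r
library(ggplot2)

# Both aesthetics are continuous, so the orientation is stated
# explicitly: summarise the x values at each unique y value.
p <- ggplot(mtcars, aes(mpg, cyl)) +
  geom_point() +
  stat_summary(
    fun = mean, geom = "point", orientation = "y",
    colour = "red", size = 3
  )

flipped <- layer_data(p, i = 2)
```

The summary layer then contains one row per cyl value, with the mean mpg stored in the x column.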

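Aesthetics

stat_summary() understands the following aesthetics (required aesthetics are marked):

x (required)
y (required)
group

Learn more about setting these aesthetics in vignette("ggplot2-specs").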

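Summary functions

You can either supply summary functions individually (fun, fun.max, fun.min), or as a single function (fun.data):

fun.data: Complete summary function. Should take a numeric vector as input and return a data frame as output
fun.min: min summary function (should take a numeric vector and return a single number)
fun: main summary function (should take a numeric vector and return a single number)
fun.max: max summary function (should take a numeric vector and return a single number)

A simple vector function is easiest to work with, as you can return a single number, but it is somewhat less flexible. If your summary function computes multiple values at once (e.g. min and max), use fun.data.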

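fun.data will receive data as if it was oriented along the x-axis and should return a data.frame that corresponds to that orientation. The layer will take care of flipping the input and output if it is oriented along the y-axis.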
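To illustrate the flipping, the sketch below uses a hypothetical helper, range_summary, written purely in terms of ymin, y and ymax, in a layer oriented along the y-axis. The helper name and the dataset are assumptions made for this example.

```r
library(ggplot2)

# Hypothetical helper written for the x orientation: it only knows
# about ymin / y / ymax, regardless of how the layer is oriented.
range_summary <- function(v) {
  data.frame(ymin = min(v), y = mean(v), ymax = max(v))
}

# The discrete variable is on y, so the layer runs along the y-axis.
p <- ggplot(mtcars, aes(mpg, factor(cyl))) +
  stat_summary(fun.data = range_summary, colour = "red")

horizontal <- layer_data(p)
```

In the computed layer data the interval appears as xmin, x and xmax, which suggests the helper does not need an orientation-specific variant.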

If no aggregation functions are supplied, the layer defaults to mean_se().
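As a quick sketch of that default, mean_se() can be called directly on a vector to see what the layer computes: the mean, plus and minus one standard error. The input vector here is made up for illustration.

```r
library(ggplot2)

# mean_se() returns a one-row data frame with y, ymin and ymax,
# where the interval is the mean plus/minus the standard error.
default_summary <- mean_se(c(1, 2, 3, 4, 5))

# With no summary function supplied, the layer falls back to mean_se().
p <- ggplot(mtcars, aes(cyl, mpg)) +
  stat_summary()
```

The half-width of the interval can be widened with the mult argument of mean_se(), for example via fun.data = mean_se and fun.args = list(mult = 2).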

See also

geom_errorbar(), geom_pointrange(), geom_linerange(), geom_crossbar() for geoms to display summarised data

Examples

d <- ggplot(mtcars, aes(cyl, mpg)) + geom_point()
d + stat_summary(fun.data = "mean_cl_boot", colour = "red", linewidth = 2, size = 3)


# Orientation follows the discrete axis
ggplot(mtcars, aes(mpg, factor(cyl))) +
  geom_point() +
  stat_summary(fun.data = "mean_cl_boot", colour = "red", linewidth = 2, size = 3)


# You can supply individual functions to summarise the value at
# each x:
d + stat_summary(fun = "median", colour = "red", size = 2, geom = "point")

d + stat_summary(fun = "mean", colour = "red", size = 2, geom = "point")

d + aes(colour = factor(vs)) + stat_summary(fun = mean, geom="line")


d + stat_summary(fun = mean, fun.min = min, fun.max = max, colour = "red")


d <- ggplot(diamonds, aes(cut))
d + geom_bar()

d + stat_summary(aes(y = price), fun = "mean", geom = "bar")


# Orientation of stat_summary_bin is ambiguous and must be specified directly
ggplot(diamonds, aes(carat, price)) +
  stat_summary_bin(fun = "mean", geom = "bar", orientation = 'y')


# \donttest{
# Don't use ylim to zoom into a summary plot - this throws the
# data away
p <- ggplot(mtcars, aes(cyl, mpg)) +
  stat_summary(fun = "mean", geom = "point")
p

p + ylim(15, 30)
#> Warning: Removed 9 rows containing non-finite values (`stat_summary()`).

# Instead use coord_cartesian
p + coord_cartesian(ylim = c(15, 30))


# A set of useful summary functions is provided from the Hmisc package:
stat_sum_df <- function(fun, geom="crossbar", ...) {
  stat_summary(fun.data = fun, colour = "red", geom = geom, width = 0.2, ...)
}
d <- ggplot(mtcars, aes(cyl, mpg)) + geom_point()
# The crossbar geom needs grouping to be specified when used with
# a continuous x axis.
d + stat_sum_df("mean_cl_boot", mapping = aes(group = cyl))

d + stat_sum_df("mean_sdl", mapping = aes(group = cyl))

d + stat_sum_df("mean_sdl", fun.args = list(mult = 1), mapping = aes(group = cyl))

d + stat_sum_df("median_hilow", mapping = aes(group = cyl))


# An example with highly skewed distributions:
if (require("ggplot2movies")) {
  set.seed(596)
  mov <- movies[sample(nrow(movies), 1000), ]
  m2 <-
    ggplot(mov, aes(x = factor(round(rating)), y = votes)) +
    geom_point()
  m2 <-
    m2 +
    stat_summary(
      fun.data = "mean_cl_boot",
      geom = "crossbar",
      colour = "red", width = 0.3
    ) +
    xlab("rating")
  m2
  # Notice how the overplotting skews off visual perception of the mean
  # supplementing the raw data with summary statistics is _very_ important

  # Next, we'll look at votes on a log scale.

  # Transforming the scale means the data are transformed
  # first, after which statistics are computed:
  m2 + scale_y_log10()
  # Transforming the coordinate system occurs after the
  # statistic has been computed. This means we're calculating the summary on the raw data
  # and stretching the geoms onto the log scale.  Compare the widths of the
  # standard errors.
  m2 + coord_trans(y = "log10")
}

# }