R readr write_delim: Write a data frame to a delimited file

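The write_*() family of functions improves on analogous functions such as write.csv(): they are approximately twice as fast. Unlike write.csv(), these functions do not include row names as a column in the written file. A generic function, output_column(), is applied to each variable to coerce columns to suitable output.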
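The row-name behaviour can be seen with a minimal sketch on the built-in mtcars data set. The column name `model` and the temporary file are arbitrary choices made for this illustration, not part of the readr API.

```r
library(readr)

# Temporary file so the sketch leaves nothing behind in the working directory.
path <- tempfile(fileext = ".csv")

# write_csv() drops row names, so the car names in mtcars are not written.
write_csv(mtcars, path)
header_plain <- readLines(path, n = 1)

# To keep them, copy the row names into an ordinary column first.
# `model` is an arbitrary column name chosen for this example.
cars <- cbind(model = rownames(mtcars), mtcars)
write_csv(cars, path)
header_named <- readLines(path, n = 1)

header_plain
header_named
```

The first header starts with the first data column, while the second starts with the added `model` column.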

Usage

write_delim(
  x,
  file,
  delim = " ",
  na = "NA",
  append = FALSE,
  col_names = !append,
  quote = c("needed", "all", "none"),
  escape = c("double", "backslash", "none"),
  eol = "\n",
  num_threads = readr_threads(),
  progress = show_progress(),
  path = deprecated(),
  quote_escape = deprecated()
)

write_csv(
  x,
  file,
  na = "NA",
  append = FALSE,
  col_names = !append,
  quote = c("needed", "all", "none"),
  escape = c("double", "backslash", "none"),
  eol = "\n",
  num_threads = readr_threads(),
  progress = show_progress(),
  path = deprecated(),
  quote_escape = deprecated()
)

write_csv2(
  x,
  file,
  na = "NA",
  append = FALSE,
  col_names = !append,
  quote = c("needed", "all", "none"),
  escape = c("double", "backslash", "none"),
  eol = "\n",
  num_threads = readr_threads(),
  progress = show_progress(),
  path = deprecated(),
  quote_escape = deprecated()
)

write_excel_csv(
  x,
  file,
  na = "NA",
  append = FALSE,
  col_names = !append,
  delim = ",",
  quote = "all",
  escape = c("double", "backslash", "none"),
  eol = "\n",
  num_threads = readr_threads(),
  progress = show_progress(),
  path = deprecated(),
  quote_escape = deprecated()
)

write_excel_csv2(
  x,
  file,
  na = "NA",
  append = FALSE,
  col_names = !append,
  delim = ";",
  quote = "all",
  escape = c("double", "backslash", "none"),
  eol = "\n",
  num_threads = readr_threads(),
  progress = show_progress(),
  path = deprecated(),
  quote_escape = deprecated()
)

write_tsv(
  x,
  file,
  na = "NA",
  append = FALSE,
  col_names = !append,
  quote = "none",
  escape = c("double", "backslash", "none"),
  eol = "\n",
  num_threads = readr_threads(),
  progress = show_progress(),
  path = deprecated(),
  quote_escape = deprecated()
)

Arguments

x

A data frame or tibble to write to disk.

file

File or connection to write to.

delim

Delimiter used to separate values. Defaults to " " for write_delim(), "," for write_excel_csv() and ";" for write_excel_csv2(). Must be a single character.
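As a small sketch of a non-default delimiter, the example below uses "|", which is an arbitrary choice. format_delim() is readr's companion function that returns the text as a string instead of writing a file, which makes the result easy to inspect.

```r
library(readr)

df <- data.frame(id = 1:2, label = c("a", "b"))

# format_delim() returns the text that would be written,
# so the effect of `delim` can be inspected directly.
pipe_text <- format_delim(df, delim = "|")
cat(pipe_text)

# The same arguments write to disk instead.
path <- tempfile(fileext = ".txt")
write_delim(df, path, delim = "|")
written <- readLines(path)
```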

na

String used for missing values. Defaults to NA. Missing values are never quoted; strings with the same value as na are always quoted.
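The distinction between a missing value and a literal string that happens to equal `na` can be sketched as follows. The data frame and the "-" marker are made up for illustration.

```r
library(readr)

# One literal string "NA" and one real missing value.
df <- data.frame(x = c("NA", NA, "a"), stringsAsFactors = FALSE)

# With the default na = "NA", the literal string is quoted so that it
# can be told apart from the missing value when the file is read back.
default_text <- format_csv(df)
cat(default_text)

# A different marker for missing values; "-" is an arbitrary choice.
dash_text <- format_csv(df, na = "-")
cat(dash_text)
```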

append

If FALSE, will overwrite the existing file. If TRUE, will append to the existing file. In both cases, if the file does not exist, a new file is created.

col_names

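If FALSE, column names will not be included at the top of the file. If TRUE, column names will be included. If not specified, col_names takes the opposite of the value given to append.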
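How append and col_names interact can be sketched by writing a data set in two chunks. Splitting mtcars in half is only a stand-in for data that arrives in pieces.

```r
library(readr)

path <- tempfile(fileext = ".csv")
first_half  <- mtcars[1:16, ]
second_half <- mtcars[17:32, ]

# The first call creates the file; col_names defaults to TRUE
# because append is FALSE, so a header row is written.
write_csv(first_half, path)

# The second call appends; col_names now defaults to FALSE,
# so no second header row ends up in the middle of the file.
write_csv(second_half, path, append = TRUE)

# One header line plus 32 data rows.
n_lines <- length(readLines(path))
n_lines
```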

quote

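How to handle fields which contain characters that need to be quoted.

needed - Values are only quoted if needed: if they contain a delimiter, quote, or newline.
all - Quote all fields.
none - Never quote fields.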
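The three quoting modes can be compared side by side with format_csv(), which returns the output as a string. The sample data is invented for this sketch.

```r
library(readr)

df <- data.frame(id = 1:2, text = c("plain", "has, comma"))

# Only the field containing the delimiter is quoted.
quoted_needed <- format_csv(df, quote = "needed")

# Text fields are quoted whether or not they need it.
quoted_all <- format_csv(df, quote = "all")

# Nothing is quoted, so the embedded comma makes the second row ambiguous.
quoted_none <- format_csv(df, quote = "none")

cat(quoted_needed)
cat(quoted_all)
cat(quoted_none)
```

With quote = "none" the file may no longer be parseable, so that mode is best reserved for data known to be free of delimiters, quotes, and newlines.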

escape

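The type of escape to use when quotes are in the data.

double - quotes are escaped by doubling them.
backslash - quotes are escaped by a preceding backslash.
none - quotes are not escaped.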
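A brief sketch of the escape styles, using a made-up field that contains quotation marks:

```r
library(readr)

df <- data.frame(text = 'say "hi"')

# Embedded quotes are doubled: ""hi""
doubled <- format_csv(df, escape = "double")

# Embedded quotes are preceded by a backslash: \"hi\"
backslashed <- format_csv(df, escape = "backslash")

# Embedded quotes are written unchanged.
unescaped <- format_csv(df, escape = "none")

cat(doubled)
cat(backslashed)
cat(unescaped)
```

Whichever style is used for writing generally needs to be matched by the reader, so "double" is the safer choice when the consumer is unknown.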

eol

The end of line character to use. Most commonly either "\n" for Unix style newlines, or "\r\n" for Windows style newlines.
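To check which line endings were actually written, the file can be read back as raw bytes. This is a minimal sketch with a tiny made-up data frame.

```r
library(readr)

path <- tempfile(fileext = ".csv")

# Request Windows style line endings.
write_csv(data.frame(a = 1:2), path, eol = "\r\n")

# Read the file as raw bytes so that no newline translation takes place.
bytes <- readBin(path, what = "raw", n = file.size(path))
content <- rawToChar(bytes)

# Each line ends in 0d 0a (carriage return, line feed).
bytes
```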

num_threads

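Number of threads to use when reading and materializing vectors. If your data contains newlines within fields the parser will automatically be forced to use a single thread only.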

progress

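Display a progress bar? By default it will only display in an interactive session and not while knitting a document. The display is updated every 50,000 values and will only display if the estimated reading time is 5 seconds or more. The automatic progress bar can be disabled by setting the option readr.show_progress to FALSE.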

path

Deprecated. Use the file argument instead.

quote_escape

Deprecated. Use the escape argument instead.

Value

write_*() returns the input x invisibly.
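The invisible return value can be observed with base R's withVisible(), as in this short sketch. Returning the input is what allows a write call to sit in the middle of a pipeline.

```r
library(readr)

path <- tempfile(fileext = ".csv")

# withVisible() captures both the value and its visibility flag.
result <- withVisible(write_csv(mtcars, path))

# FALSE: calling write_csv() at the console prints nothing.
result$visible

# The returned value is the data that was passed in.
dim(result$value)
```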

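Output

Factors are coerced to character. Doubles are formatted to a decimal string using the grisu3 algorithm. POSIXct values are formatted as ISO8601 with a UTC timezone. Note: POSIXct objects in local or non-UTC timezones will be converted to UTC time before writing.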
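The conversions described above can be sketched with a small data frame. The column names and the "America/New_York" timezone are arbitrary choices for illustration, and the example assumes that timezone is available in the system's timezone database.

```r
library(readr)

df <- data.frame(
  grade = factor(c("low", "high")),
  when  = as.POSIXct(
    c("2020-01-01 12:00:00", "2020-06-01 12:00:00"),
    tz = "America/New_York"
  )
)

# Factors are written as their labels; date-times are shifted to UTC
# and written in ISO8601 form with a trailing Z.
text <- format_csv(df)
cat(text)
```

Noon in New York is 17:00 UTC in January and 16:00 UTC in June, so the written times differ from the local clock times.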

All columns are encoded as UTF-8. write_excel_csv() and write_excel_csv2() also include a UTF-8 byte order mark, which indicates to Excel that the csv is UTF-8 encoded.
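The byte order mark is invisible in most editors, but it can be confirmed by reading the first three bytes of the file. This is a minimal sketch with a made-up data frame.

```r
library(readr)

df <- data.frame(name = c("Zoë", "José"))

path_plain <- tempfile(fileext = ".csv")
path_excel <- tempfile(fileext = ".csv")

write_csv(df, path_plain)
write_excel_csv(df, path_excel)

# The UTF-8 byte order mark is the byte sequence ef bb bf.
bom <- as.raw(c(0xEF, 0xBB, 0xBF))
plain_start <- readBin(path_plain, what = "raw", n = 3)
excel_start <- readBin(path_excel, what = "raw", n = 3)

identical(excel_start, bom)
identical(plain_start, bom)
```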

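write_excel_csv2() and write_csv2() were created to allow users with different locale settings to save .csv files using their default settings (e.g. ; as the column separator and , as the decimal separator). This is common in some European countries.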
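The difference between the two conventions can be sketched with format_csv() and format_csv2(), the string-returning counterparts of write_csv() and write_csv2(). The data is invented for illustration.

```r
library(readr)

df <- data.frame(price = 1.5, item = "tea")

# Comma as column separator, period as decimal mark.
text_csv <- format_csv(df)

# Semicolon as column separator, comma as decimal mark.
text_csv2 <- format_csv2(df)

cat(text_csv)
cat(text_csv2)
```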

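With the default quote = "needed", values are only quoted if they contain the delimiter (a comma for write_csv()), a quote, or a newline.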

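The write_*() functions will automatically compress outputs if an appropriate extension is given. Three extensions are currently supported: .gz for gzip compression, .bz2 for bzip2 compression and .xz for lzma compression. See the examples for more information.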

References

Florian Loitsch, Printing Floating-Point Numbers Quickly and Accurately with Integers, PLDI '10, http://www.cs.tufts.edu/~nr/cs257/archive/florian-loitsch/printf.pdf

Examples

library(readr)

# Work in a temporary directory so the examples leave no files
# behind in the current project.
.old_wd <- setwd(tempdir())

# If only a file name is specified, write_*() will write
# the file to the current working directory.
write_csv(mtcars, "mtcars.csv")
write_tsv(mtcars, "mtcars.tsv")

# If you add a compression extension to the file name, write_*() will
# automatically compress the output.
write_tsv(mtcars, "mtcars.tsv.gz")
write_tsv(mtcars, "mtcars.tsv.bz2")
write_tsv(mtcars, "mtcars.tsv.xz")

# Restore the original working directory.
setwd(.old_wd)

Source: R/write.R


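Note: This article was selected and compiled by 纯净天空 from the original English work "Write a data frame to a delimited file" by Hadley Wickham and others. Unless otherwise stated, copyright of the original code belongs to the original authors; do not reproduce or copy this translation without permission or authorization.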