R memCompress: In-memory Compression and Decompression

The R function memCompress is in the base package.

Description

In-memory compression or decompression of raw vectors.

Usage

memCompress(from, type = c("gzip", "bzip2", "xz", "none"))

memDecompress(from,
              type = c("unknown", "gzip", "bzip2", "xz", "none"),
              asChar = FALSE)

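Arguments

`from`	A raw vector. For `memCompress`, a character vector is also accepted: it is converted to a raw vector with the strings separated by `"\n"`. All types except `"bzip2"` support long raw vectors.
`type`	character string, the type of compression. May be abbreviated to a single letter; defaults to the first of the alternatives.
`asChar`	logical: should the result be converted to a character string? NB: character strings are limited to `2^31 - 1` bytes, so raw vectors should be used for large inputs.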
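A minimal sketch of the `from` and `type` conventions above: a character vector is joined with `"\n"` before compression, and `type` may be a single letter. The input `x` is illustrative only.

```r
# Illustrative input: a character vector with two elements
x <- c("alpha", "beta")

# "g" is matched to "gzip"; the elements are joined as "alpha\nbeta" first
z <- memCompress(x, "g")
stopifnot(is.raw(z))

# asChar = TRUE returns a single string containing the "\n" separator
s <- memDecompress(z, "gzip", asChar = TRUE)
print(s)

# strsplit() recovers the original elements
parts <- strsplit(s, "\n")[[1]]
stopifnot(identical(parts, x))
```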

Details

type = "none" passes the input through unchanged, but may be useful if type is a variable.
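To illustrate why "none" is handy when `type` is a variable, the sketch below loops over every method, including "none", without a special case. The sample text and the variable names are illustrative assumptions.

```r
# Raw input used for every method below (illustrative)
x <- charToRaw("The quick brown fox jumps over the lazy dog")

# type = "none" returns the input unchanged
stopifnot(identical(memCompress(x, "none"), x))

# Because "none" is a valid choice, a loop over a variable type needs no special case
methods <- c("none", "gzip", "bzip2", "xz")
roundtrip_ok <- vapply(methods, function(m) {
  z <- memCompress(x, type = m)
  identical(memDecompress(z, type = m), x)
}, logical(1))
print(roundtrip_ok)
```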

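type = "unknown" attempts to detect the type of compression applied (if any): this always succeeds for bzip2 compression, and succeeds for the other forms if there is a suitable header. If no type of compression is detected, this is the same as type = "none", but a warning is given.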
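A short sketch of the detection rules: bzip2 output is always recognised, while input with no recognisable header is returned unchanged with a warning. The sample string is an assumption chosen so that its first bytes match none of the known headers.

```r
x <- charToRaw("plain text that was never compressed")

# bzip2 output always starts with "BZh", so type = "unknown" can detect it
z <- memCompress(x, "bzip2")
stopifnot(identical(memDecompress(z, type = "unknown"), x))

# Uncompressed input: treated as type = "none", and a warning is signalled
warned <- FALSE
out <- withCallingHandlers(
  memDecompress(x, type = "unknown"),
  warning = function(w) {
    warned <<- TRUE                 # record that a warning occurred
    invokeRestart("muffleWarning")  # keep going and return the value
  }
)
stopifnot(identical(out, x))
print(warned)
```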

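gzip compression uses whatever is the default compression level of the underlying library (usually 6). It supports the RFC 1950 format, sometimes known as 'zlib' format, for both compression and decompression, and supports the RFC 1952 'gzip' format (which wraps the same DEFLATE-compressed data with a different header and a trailer) for decompression only.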
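The difference between the two formats is visible in the first bytes. This sketch assumes the backend (zlib or libdeflate) uses the usual 32 KB DEFLATE window, in which case an RFC 1950 stream starts with `0x78`; it does not start with the RFC 1952 magic bytes `0x1f 0x8b`.

```r
x <- charToRaw(strrep("zlib versus gzip framing. ", 20))
z <- memCompress(x, "gzip")
print(z[1:2])

# RFC 1950 ('zlib') header: DEFLATE with a 32 KB window gives a first byte of 0x78
stopifnot(z[1] == as.raw(0x78))

# It is not the RFC 1952 'gzip' magic number 0x1f 0x8b
stopifnot(!identical(z[1:2], as.raw(c(0x1f, 0x8b))))

# RFC 1950 check: the first two bytes, read as a big-endian 16-bit number, are a multiple of 31
hdr <- as.integer(z[1]) * 256L + as.integer(z[2])
stopifnot(hdr %% 31L == 0L)
```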

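bzip2 compression always adds a header ("BZh"). The underlying library only supports in-memory (de)compression of up to 2^31 - 1 elements. Compression is equivalent to bzip2 -9 (the default).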
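The "BZh" header can be checked directly. In the bzip2 format the byte after "BZh" is the block-size digit, which is expected to be '9' for bzip2 -9; the sketch only prints it and asserts the three documented bytes.

```r
x <- charToRaw(strrep("bzip2 header check ", 50))
z <- memCompress(x, "bzip2")

# First four bytes: "BZh" followed by the block-size digit
print(rawToChar(z[1:4]))

# The documented part of the header
stopifnot(identical(z[1:3], charToRaw("BZh")))
```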

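Compressing with type = "xz" is equivalent to compressing a file with xz -9e (including adding the 'magic' header): decompression should cope with the contents of any file compressed by xz version 4.999 and later, as well as by some versions of lzma. There are other versions, in particular 'raw' streams, that are not currently handled.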
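The 'magic' header mentioned above is the six-byte .xz stream signature `FD 37 7A 58 5A 00` (from the .xz file format specification). This sketch checks it and confirms that type = "unknown" recognises it.

```r
x <- charToRaw(strrep("xz container ", 100))
z <- memCompress(x, "xz")

# .xz stream header magic bytes
xz_magic <- as.raw(c(0xfd, 0x37, 0x7a, 0x58, 0x5a, 0x00))
stopifnot(identical(z[1:6], xz_magic))

# type = "unknown" detects the xz header and decompresses correctly
stopifnot(identical(memDecompress(z, type = "unknown"), x))
```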

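All types of compression can expand the input: for "gzip" and "bzip2" the maximum expansion is known, so memCompress can always allocate sufficient space. For "xz" it is possible (but extremely unlikely) that compression will fail if the output would have been too large.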
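Expansion is easy to observe on incompressible input. This sketch uses pseudo-random bytes as a stand-in for incompressible data (an illustrative choice); with a fixed seed each method's output is expected to be longer than the input.

```r
set.seed(1)
# 10,000 pseudo-random bytes are essentially incompressible
x <- as.raw(sample(0:255, 10000, replace = TRUE))

# Compressed size for each method
sizes <- vapply(c("gzip", "bzip2", "xz"),
                function(m) length(memCompress(x, m)), integer(1))
print(sizes)
stopifnot(all(sizes > length(x)))
```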

Value

A raw vector, or a character string (if asChar = TRUE).
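A sketch of why the raw return value is the safe default: an R character string cannot contain an embedded NUL byte, so asChar = TRUE fails for such binary payloads while the raw result is exact. The three-byte payload is illustrative.

```r
# Binary payload with an embedded NUL byte (illustrative)
bin <- as.raw(c(0x41, 0x00, 0x42))
z <- memCompress(bin, "gzip")

# Default (asChar = FALSE): a raw vector, safe for any bytes
r <- memDecompress(z, "gzip")
stopifnot(is.raw(r), identical(r, bin))

# asChar = TRUE needs a character string, which cannot hold the embedded NUL
res <- tryCatch(memDecompress(z, "gzip", asChar = TRUE),
                error = function(e) "conversion failed")
print(res)
```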

`libdeflate`

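Support for the libdeflate library was added in R 4.4.0. It uses different code for the RFC 1950 'zlib' format (and for RFC 1952 decompression), and is expected to be substantially faster than the reference (or system) zlib library. It is used for type = "gzip" if available.

The headers and sources can be downloaded from https://github.com/ebiggers/libdeflate, and pre-built versions are available for most Linux distributions.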
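To see which backend a given R build uses, extSoftVersion() reports library versions. This sketch assumes that R 4.4.0 and later include a "libdeflate" entry that is an empty string when libdeflate is not in use; the variable names are illustrative.

```r
v <- extSoftVersion()

# Show only the compression-related entries that this R build reports
print(v[intersect(c("zlib", "libdeflate", "bzlib", "xz"), names(v))])

# Assumption: a non-empty "libdeflate" entry means it is used for type = "gzip"
uses_libdeflate <- "libdeflate" %in% names(v) && nzchar(v[["libdeflate"]])
cat("libdeflate in use for type = \"gzip\":", uses_libdeflate, "\n")
```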

Examples

txt <- readLines(file.path(R.home("doc"), "COPYING"))
sum(nchar(txt))
txt.gz <- memCompress(txt, "g") # "gzip", the default
length(txt.gz)
txt2 <- strsplit(memDecompress(txt.gz, "g", asChar = TRUE), "\n")[[1]]
stopifnot(identical(txt, txt2))
## as from R 4.4.0 this is detected if not specified.
txt2b <- strsplit(memDecompress(txt.gz, asChar = TRUE), "\n")[[1]]
stopifnot(identical(txt2b, txt2))

txt.bz2 <- memCompress(txt, "b")
length(txt.bz2)
## can auto-detect bzip2:
txt3 <- strsplit(memDecompress(txt.bz2, asChar = TRUE), "\n")[[1]]
stopifnot(identical(txt, txt3))

## xz compression is only worthwhile for large objects
txt.xz <- memCompress(txt, "x")
length(txt.xz)
txt3 <- strsplit(memDecompress(txt.xz, asChar = TRUE), "\n")[[1]]
stopifnot(identical(txt, txt3))

## test decompressing a gzip-ed file
tf <- tempfile(fileext = ".gz")
con <- gzfile(tf, "w")
writeLines(txt, con)
close(con)
(nf <- file.size(tf))
# if (nzchar(Sys.which("file"))) system2("file", tf)
foo <- readBin(tf, "raw", n = nf)
unlink(tf)
## will detect the gzip header and choose type = "gzip"
txt3 <- strsplit(memDecompress(foo, asChar = TRUE), "\n")[[1]]
stopifnot(identical(txt, txt3))

See Also

connections.

extSoftVersion for the versions of the zlib or libdeflate, bzip2 and xz libraries in use.

https://en.wikipedia.org/wiki/Data_compression for background on data compression, https://zlib.net/, https://en.wikipedia.org/wiki/Gzip, http://www.bzip.org/, https://en.wikipedia.org/wiki/Bzip2, https://tukaani.org/xz/ and https://en.wikipedia.org/wiki/XZ_Utils for references about the particular schemes used.


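Note: This page was selected and compiled by 纯净天空 from the English original "In-memory Compression and Decompression" by R-devel. Unless otherwise stated, copyright in the original code belongs to the original authors; this translation may not be reproduced or copied without permission.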