R memCompress: In-memory Compression and Decompression

The R function memCompress is in the base package.

Description

In-memory compression or decompression for raw vectors.

Usage

memCompress(from, type = c("gzip", "bzip2", "xz", "none"))

memDecompress(from,
              type = c("unknown", "gzip", "bzip2", "xz", "none"),
              asChar = FALSE)

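Arguments

`from`	A raw vector. For `memCompress`, a character vector will be converted to a raw vector with the character strings separated by `"\n"`. All types except `"bzip2"` support long raw vectors.
`type`	Character string, the type of compression. May be abbreviated to a single letter; defaults to the first of the alternatives.
`asChar`	Logical: should the result be converted to a character string? Note: character strings have a limit of `2^31 - 1` bytes, so raw vectors should be used for large inputs.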
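
The conversion of a character vector described for `from` can be checked directly. The following is a minimal sketch, assuming a small made-up vector `lines_in` (a name chosen only for illustration): it compares compressing the character vector with compressing the manually joined raw vector, and uses the single-letter abbreviation of `type`.

```r
# Hypothetical input, for illustration only
lines_in <- c("alpha", "beta", "gamma")

# Compress the character vector directly ("g" abbreviates "gzip")
z_chr <- memCompress(lines_in, type = "g")

# Compress the equivalent raw vector: elements joined by "\n"
z_raw <- memCompress(charToRaw(paste(lines_in, collapse = "\n")), type = "gzip")

same_bytes <- identical(z_chr, z_raw)

# Decompress to a single string; the "\n" separators are still embedded
joined <- memDecompress(z_chr, type = "gzip", asChar = TRUE)
lines_out <- strsplit(joined, "\n", fixed = TRUE)[[1]]
```

Because the elements are joined rather than stored separately, recovering the original vector requires splitting on `"\n"` afterwards, as the last line does.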

Details

type = "none" passes the input through unchanged, but may be useful if type is a variable.
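
One situation where type = "none" may be convenient is a loop over several types, where it serves as an uncompressed baseline. The sketch below assumes an artificial, highly repetitive `payload` (a hypothetical name) purely to make the size differences visible.

```r
# Artificial, repetitive input (illustration only)
payload <- charToRaw(strrep("in-memory compression ", 200))

# 'type' is a variable here, so "none" acts as an uncompressed baseline
types <- c("none", "gzip", "bzip2", "xz")
sizes <- vapply(types, function(ty) length(memCompress(payload, type = ty)),
                integer(1))
sizes

# With type = "none" the bytes come back unchanged
unchanged <- identical(memCompress(payload, type = "none"), payload)
```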

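type = "unknown" attempts to detect the type of compression applied (if any): this will always succeed for bzip2 compression, and will succeed for other forms if there is a suitable header. If no type of compression is detected this is the same as type = "none", but a warning is given.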
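
The detection behaviour can be illustrated as follows. This is a sketch only: the variable names are made up, and the example sticks to bzip2 and xz, which always carry a header, plus uncompressed input to show the warning path.

```r
plain <- charToRaw("plain text that was never compressed")
bz <- memCompress(plain, type = "bzip2")
xz <- memCompress(plain, type = "xz")

# type defaults to "unknown": the format is detected from the header
from_bz <- memDecompress(bz)
from_xz <- memDecompress(xz)

# Uncompressed input: treated as type = "none", with a warning
warned <- FALSE
passthrough <- withCallingHandlers(
  memDecompress(plain),
  warning = function(w) {
    warned <<- TRUE                 # record that a warning was raised
    invokeRestart("muffleWarning")  # and keep it out of the console
  }
)
```

When the format is known in advance, passing `type` explicitly avoids relying on header detection.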

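gzip compression uses the default compression level of the underlying library (usually 6). It supports the RFC 1950 format, sometimes known as ‘zlib’ format, for both compression and decompression, and the RFC 1952 ‘gzip’ format (which wraps the ‘zlib’ format with a header and footer) for decompression only.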
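
The difference between the two framings can be seen by comparing the output of memCompress with the bytes of a file written through gzfile. This is a hedged sketch with made-up variable names; it relies on the statement above that RFC 1952 input is accepted for decompression.

```r
payload <- charToRaw(strrep("zlib versus gzip framing\n", 50))

# memCompress() emits the RFC 1950 'zlib' format
z <- memCompress(payload, type = "gzip")
zlib_method <- bitwAnd(as.integer(z[1]), 15L)  # low 4 bits of byte 1: 8 = deflate

# gzfile() writes the RFC 1952 'gzip' format (magic bytes 1f 8b)
tf <- tempfile(fileext = ".gz")
con <- gzfile(tf, "wb")
writeBin(payload, con)
close(con)
gz <- readBin(tf, "raw", n = file.size(tf))
unlink(tf)
gz_magic <- gz[1:2]

# Both framings are accepted for decompression with type = "gzip"
out_zlib <- memDecompress(z, type = "gzip")
out_gzip <- memDecompress(gz, type = "gzip")
```

Note that the reverse does not hold: the output of memCompress is not a valid .gz file, since it lacks the gzip header and footer.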

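bzip2 compression always adds a header ("BZh"). The underlying library only supports in-memory (de)compression of up to 2^31 - 1 elements. Compression is equivalent to bzip2 -9 (the default).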
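
The header can be inspected directly. In the bzip2 format the "BZh" magic is followed by a digit giving the block size, so with the -9 equivalent described above the first four bytes would be expected to read "BZh9". A minimal sketch (the input string is arbitrary):

```r
bz <- memCompress(charToRaw("bzip2 header check"), type = "bzip2")

# First four bytes: the "BZh" magic plus the block-size digit
header <- rawToChar(bz[1:4])
header
```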

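Compressing with type = "xz" is equivalent to compressing a file with xz -9e (including adding the ‘magic’ header): decompression should cope with the contents of any file compressed by xz version 4.999 and later, as well as by some versions of lzma. There are other versions, in particular ‘raw’ streams, that are not currently handled.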
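
The file equivalence can be illustrated by writing the compressed bytes to disk and reading them back through an xzfile connection. This is a sketch with made-up names; `text_in` is arbitrary sample input.

```r
text_in <- c("first line", "second line", "third line")
xz <- memCompress(text_in, type = "xz")

# The .xz magic header: FD '7' 'z' 'X' 'Z' 00
magic <- xz[1:6]

# Written to disk, the bytes form an ordinary .xz file
tf <- tempfile(fileext = ".xz")
writeBin(xz, tf)
con <- xzfile(tf, "r")
text_out <- readLines(con, warn = FALSE)  # payload has no trailing newline
close(con)
unlink(tf)
```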

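All the types of compression can expand the input: for "gzip" and "bzip2" the maximum expansion is known, so memCompress can always allocate sufficient space. For "xz" it is possible (but extremely unlikely) that compression will fail if the output would have been too large.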
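
Expansion is easiest to observe on very small inputs, where the framing overhead of each format outweighs any saving. A minimal sketch, assuming a three-byte input chosen only for illustration:

```r
tiny <- charToRaw("abc")  # 3 bytes: framing overhead dominates

types <- c("gzip", "bzip2", "xz")
sizes <- vapply(types, function(ty) length(memCompress(tiny, type = ty)),
                integer(1))
sizes

# Every result is longer than the input, yet still round-trips
expanded <- all(sizes > length(tiny))
round_trip <- all(vapply(types, function(ty) {
  identical(memDecompress(memCompress(tiny, type = ty), type = ty), tiny)
}, logical(1)))
```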

Value

A raw vector or a character string (if asChar = TRUE).
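
The two possible return types of memDecompress can be compared side by side. This is a small sketch with an arbitrary input string:

```r
z <- memCompress("a short string", type = "gzip")   # always a raw vector

as_raw <- memDecompress(z, type = "gzip")                 # raw vector
as_chr <- memDecompress(z, type = "gzip", asChar = TRUE)  # length-1 character

# Converting the raw result by hand gives the same string
same <- identical(rawToChar(as_raw), as_chr)
```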

`libdeflate`

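Support for the libdeflate library was added in R 4.4.0. It uses different code for the RFC 1950 ‘zlib’ format (and for RFC 1952 decompression), and is expected to be substantially faster than using the reference (or system) zlib library. It is used for type = "gzip" if available.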

The headers and sources can be downloaded from https://github.com/ebiggers/libdeflate and pre-built versions are available for most Linux distributions.
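
Which compression libraries a given R build reports can be checked with extSoftVersion. In the sketch below, the entry name "libdeflate" is an assumption and may be absent from some builds, so the names are selected defensively.

```r
v <- extSoftVersion()

# "libdeflate" is an assumed entry name and may be missing in some builds,
# so keep only the names that this build actually reports
wanted <- c("zlib", "libdeflate", "bzlib", "xz")
v[intersect(wanted, names(v))]
```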

Examples

txt <- readLines(file.path(R.home("doc"), "COPYING"))
sum(nchar(txt))
txt.gz <- memCompress(txt, "g") # "gzip", the default
length(txt.gz)
txt2 <- strsplit(memDecompress(txt.gz, "g", asChar = TRUE), "\n")[[1]]
stopifnot(identical(txt, txt2))
## as from R 4.4.0 this is detected if not specified.
txt2b <- strsplit(memDecompress(txt.gz, asChar = TRUE), "\n")[[1]]
stopifnot(identical(txt2b, txt2))

txt.bz2 <- memCompress(txt, "b")
length(txt.bz2)
## can auto-detect bzip2:
txt3 <- strsplit(memDecompress(txt.bz2, asChar = TRUE), "\n")[[1]]
stopifnot(identical(txt, txt3))

## xz compression is only worthwhile for large objects
txt.xz <- memCompress(txt, "x")
length(txt.xz)
txt3 <- strsplit(memDecompress(txt.xz, asChar = TRUE), "\n")[[1]]
stopifnot(identical(txt, txt3))

## test decompressing a gzip-ed file
tf <- tempfile(fileext = ".gz")
con <- gzfile(tf, "w")
writeLines(txt, con)
close(con)
(nf <- file.size(tf))
# if (nzchar(Sys.which("file"))) system2("file", tf)
foo <- readBin(tf, "raw", n = nf)
unlink(tf)
## will detect the gzip header and choose type = "gzip"
txt3 <- strsplit(memDecompress(foo, asChar = TRUE), "\n")[[1]]
stopifnot(identical(txt, txt3))

See Also

connections.

extSoftVersion for the versions of the zlib or libdeflate, bzip2 and xz libraries in use.

https://en.wikipedia.org/wiki/Data_compression for background on data compression, https://zlib.net/, https://en.wikipedia.org/wiki/Gzip, http://www.bzip.org/, https://en.wikipedia.org/wiki/Bzip2, https://tukaani.org/xz/ and https://en.wikipedia.org/wiki/XZ_Utils for references about the particular schemes used.

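Note: this article was selected and compiled by 純淨天空 from the original English work "In-memory Compression and Decompression" by R-devel. Unless otherwise stated, copyright in the original code belongs to the original author; please do not reproduce or copy this translation without permission.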