R charsets: Conversion Tables between Character Sets

In R, charsets is part of the tools package.

Description

charset_to_Unicode is a matrix of Unicode code points with one column for each of the common 8-bit encodings.

Adobe_glyphs is a data frame that gives the Adobe glyph names for Unicode code points. It has two character columns, "adobe" and "unicode" (a 4-digit hexadecimal representation).
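As a quick orientation, the following sketch inspects both objects and looks up the glyph name(s) recorded for one code point. The variable name glyphs_A is chosen only for illustration, and toupper() is used because the letter case of the stored hex strings is not stated above.

```r
library(tools)

# charset_to_Unicode: a matrix with one column per 8-bit encoding.
dim(charset_to_Unicode)
colnames(charset_to_Unicode)

# Adobe_glyphs: a data frame with the columns "adobe" and "unicode".
str(Adobe_glyphs)
head(Adobe_glyphs)

# Glyph name(s) recorded for U+0041. The "unicode" column holds 4-digit
# hex strings; toupper() makes the comparison independent of letter case.
glyphs_A <- Adobe_glyphs$adobe[toupper(Adobe_glyphs$unicode) == "0041"]
glyphs_A
```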

Usage

charset_to_Unicode

Adobe_glyphs

Details

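charset_to_Unicode is an integer matrix of class c("noquote", "hexmode"), so it prints in hexadecimal. The mappings are those used by libiconv: sources differ in how they map quotes and minus/hyphen (and the PostScript encoding files use a different mapping).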
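To illustrate the hexadecimal printing and how to get back to plain integers, here is a minimal sketch. It assumes that row i of the matrix describes byte value i - 1; that indexing convention is an assumption and is not stated above. The names byte and code_point are illustrative only.

```r
library(tools)

# The classes are what make the matrix print in hexadecimal without quotes.
class(charset_to_Unicode)
head(charset_to_Unicode[, "ISOLatin2"])

# Assumption: row i corresponds to byte value i - 1, so byte 0xA1 is
# looked up in row 0xA1 + 1.
byte <- 0xA1
code_point <- as.integer(charset_to_Unicode[byte + 1, "ISOLatin2"])

# as.integer() drops the classes; format the code point and convert it
# to a character.
format(as.hexmode(code_point), width = 4)
intToUtf8(code_point)
```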

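Adobe_glyphs includes all the Adobe glyph names that correspond to single Unicode characters. It is sorted by Unicode code point and, within a code point, alphabetically by glyph name (a Unicode code point can have more than one name). The data are in the file ‘R_HOME/share/encodings/Adobe_glyphlist’.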
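The one-to-many relationship described above can be explored with a short sketch like the following. The names cp, dup and multi are illustrative, and the "U+XXXX" labels are added here for readability; they are not part of the data.

```r
library(tools)

# Parse the 4-digit hex strings into integer code points.
cp <- strtoi(as.character(Adobe_glyphs$unicode), base = 16L)

# Code points that carry more than one glyph name.
dup <- unique(cp[duplicated(cp)])
has_many <- cp %in% dup
multi <- split(as.character(Adobe_glyphs$adobe[has_many]), cp[has_many])
names(multi) <- sprintf("U+%04X", as.integer(names(multi)))
head(multi)

# Check the documented ordering by code point.
!is.unsorted(cp)
```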

Examples

library(tools)  # charset_to_Unicode and Adobe_glyphs are in the tools package
## find Adobe names for ISOLatin2 chars.
latin2 <- charset_to_Unicode[, "ISOLatin2"]
aUnicode <- as.hexmode(paste0("0x", Adobe_glyphs$unicode))
keep <- aUnicode %in% latin2
aUnicode <- aUnicode[keep]
aAdobe <- Adobe_glyphs[keep, 1]
## first match
aLatin2 <- aAdobe[match(latin2, aUnicode)]
## all matches
bLatin2 <- lapply(1:256, function(x) aAdobe[aUnicode == latin2[x]])
format(bLatin2, justify = "none")

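Note: this article was selected and compiled by 纯净天空 from the original English work Conversion Tables between Character Sets by R-devel. Unless otherwise stated, copyright of the original code belongs to the original authors; please do not reproduce or copy this translation without permission.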