1. Problem description

A txt file fails to load when read with the fread function from the data.table package in R:

    > dat = fread("test.txt")
    Error in fread("test.txt") :
      File is encoded in UTF-16, this encoding is not supported by fread(). Please recode the file to UTF-8.

Trying read.table instead also fails:

    > dat = read.table("test.txt")
    Error in type.convert.default(data[[i]], as.is = as.is[i], dec = dec, :
      invalid multibyte string at '<ff><fe><63>'
    In addition: Warning messages:
    1: In read.table("test.txt") : line 1 appears to contain embedded nulls
    2: In read.table("test.txt") : line 2 appears to contain embedded nulls
    3: In read.table("test.txt") : line 3 appears to contain embedded nulls
    4: In read.table("test.txt") : line 4 appears to contain embedded nulls
    5: In read.table("test.txt") : line 5 appears to contain embedded nulls
    6: In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
      embedded nul(s) found in input
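
2. Solution

Opening the file in Notepad++ shows that its encoding is UCS-2. So the encoding has to be passed to read.table through the fileEncoding argument, for example fileEncoding="UCS-2LE" (or "UCS-2", as in the call below, which works here because the file starts with a byte order mark).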
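
Notepad++ is not the only way to check. The '<ff><fe><63>' in the read.table error already hints at the encoding, because the bytes ff fe are the UTF-16 little-endian byte order mark (BOM). Below is a minimal sketch of doing that check from R itself. It assumes the file starts with a BOM, and guess_bom_encoding is a hypothetical helper name, not a function from base R or data.table.

```r
# Hypothetical helper (not from the original post): guess a file's encoding
# from the byte order mark (BOM) in its first bytes.
# Limitation: a UTF-32LE BOM (ff fe 00 00) also starts with ff fe and would
# be reported as UTF-16LE by this simple check.
guess_bom_encoding <- function(path) {
  bom <- readBin(path, what = "raw", n = 3L)
  starts_with <- function(sig) {
    length(bom) >= length(sig) && all(bom[seq_along(sig)] == sig)
  }
  if (starts_with(as.raw(c(0xEF, 0xBB, 0xBF)))) {
    "UTF-8"
  } else if (starts_with(as.raw(c(0xFF, 0xFE)))) {
    "UTF-16LE"
  } else if (starts_with(as.raw(c(0xFE, 0xFF)))) {
    "UTF-16BE"
  } else {
    NA_character_  # no BOM found; the encoding cannot be told this way
  }
}

# Demo file: BOM ff fe followed by "c\n" in UTF-16LE, mirroring the
# '<ff><fe><63>' bytes from the error message.
demo_path <- tempfile(fileext = ".txt")
writeBin(as.raw(c(0xFF, 0xFE, 0x63, 0x00, 0x0A, 0x00)), demo_path)

guess_bom_encoding(demo_path)  # "UTF-16LE"
```

A file without a BOM returns NA here, in which case an editor such as Notepad++ is still the easier way to find the encoding.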

The modified code is therefore:

    > dat = read.table("test.txt",fileEncoding = "UCS-2",header = TRUE)
    > head(dat)
                   chipID   sampleID
    1 202884940082_R02C04 CW63976425
    2 202884940082_R03C01 CW63976831
    3 202884940082_R03C02 CW63976366
    4 202884940082_R03C03 CW63976367
    5 202884940082_R03C04 CW63976433
    6 202884940082_R04C01 CW63976615
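
Done!

3. Summary of the approach

Check the file's encoding (for example in Notepad++), then tell read.table what it is through the fileEncoding argument. I used to think that fread from the data.table package could read anything, but it turns out it reports UTF-16 as unsupported, and in the end it was read.table from base R that solved the problem. Sweet!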
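
If fread is still preferred, its error message points to another route: recode the file to UTF-8 first. The following is only a sketch of that idea using base R connections. recode_to_utf8 is a hypothetical helper name, the default from = "UTF-16LE" is an assumption based on the ff fe bytes seen in the error, and the two data rows are copied from the output above purely as demo input.

```r
# Hypothetical helper (not from the original post): re-encode a text file
# to UTF-8 so that tools which only accept UTF-8 can read it.
recode_to_utf8 <- function(src, dest, from = "UTF-16LE") {
  # Reading through a re-encoding connection converts the text on the fly.
  in_con <- file(src, open = "r", encoding = from)
  on.exit(close(in_con), add = TRUE)
  lines <- readLines(in_con, warn = FALSE)

  # Writing through a UTF-8 connection produces the recoded copy.
  out_con <- file(dest, open = "w", encoding = "UTF-8")
  on.exit(close(out_con), add = TRUE)
  writeLines(lines, out_con)
  invisible(dest)
}

# Build a small UTF-16LE demo file with a BOM, standing in for test.txt.
demo_text <- paste0(
  "chipID sampleID\n",
  "202884940082_R02C04 CW63976425\n",
  "202884940082_R03C01 CW63976831\n"
)
utf16_path <- tempfile(fileext = ".txt")
utf16_bytes <- iconv(demo_text, from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]]
writeBin(c(as.raw(c(0xFF, 0xFE)), utf16_bytes), utf16_path)

# Recode, then read the UTF-8 copy.
utf8_path <- tempfile(fileext = ".txt")
recode_to_utf8(utf16_path, utf8_path)

# Only call fread if data.table is installed.
if (requireNamespace("data.table", quietly = TRUE)) {
  dat <- data.table::fread(utf8_path)
  print(dat)
}
```

One caveat: readLines holds the text in R's native encoding in between, so in a non-UTF-8 locale, characters outside that locale may not survive the round trip. For plain ASCII IDs like the ones in this file that should not matter.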