Java UniversalDetector.reset Method Code Examples

This article collects typical usage examples of the org.mozilla.universalchardet.UniversalDetector.reset method in Java. If you are wondering how UniversalDetector.reset is used in practice, the selected examples below may help. You can also look further into usage examples of the enclosing class, org.mozilla.universalchardet.UniversalDetector.


Fifteen code examples of the UniversalDetector.reset method are shown below, sorted by popularity by default.

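Example 1: getFileCharset

import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

import org.mozilla.universalchardet.UniversalDetector; // import the package/class this method depends on

public static String getFileCharset(File file) throws IOException {
	byte[] buf = new byte[4096];
	final UniversalDetector universalDetector = new UniversalDetector(null);
	// try-with-resources closes the stream even if reading fails
	try (BufferedInputStream bufferedInputStream = new BufferedInputStream(
			new FileInputStream(file))) {
		int numberOfBytesRead;
		// Feed chunks until the detector is confident or the file ends
		while ((numberOfBytesRead = bufferedInputStream.read(buf)) > 0
				&& !universalDetector.isDone()) {
			universalDetector.handleData(buf, 0, numberOfBytesRead);
		}
	}
	// Signal the end of input so the detector can make its final guess
	universalDetector.dataEnd();
	// null if no charset could be determined
	String encoding = universalDetector.getDetectedCharset();
	universalDetector.reset();
	return encoding;
}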
 
Developer ID: simbest, Project: simbest-cores, Lines of code: 17, Source: CharsetUtil.java

Example 2: determineCharset

import org.mozilla.universalchardet.UniversalDetector; // import the package/class this method depends on
@Override
public Charset determineCharset(byte[] bytes) {
    UniversalDetector detector = charsetDetector.get();
    try {
        detector.handleData(bytes, 0, bytes.length);
        detector.dataEnd();

        String encoding = detector.getDetectedCharset();
        if (encoding != null) {
            return Charset.forName(encoding);
        }

        return null;
    } finally {
        detector.reset();
    }
}
 
Developer ID: goldmansachs, Project: obevo, Lines of code: 18, Source: DetectCharsetStrategy.java
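
Example 2 passes the detected name straight to Charset.forName, which throws UnsupportedCharsetException or IllegalCharsetNameException when the running JVM does not recognize that name. The following is a minimal sketch of a defensive lookup, not code from the obevo project: the class name CharsetNames and the method resolveOrDefault are hypothetical, and the sketch uses only java.nio.charset, so it does not depend on juniversalchardet.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;

/** Hypothetical helper: maps a detector result to a Charset without throwing. */
final class CharsetNames {
    private CharsetNames() {
    }

    /**
     * @param detectedName charset name reported by a detector; may be null
     * @param fallback     charset to return when the name is null or unusable
     * @return the named charset if this JVM supports it, otherwise the fallback
     */
    static Charset resolveOrDefault(String detectedName, Charset fallback) {
        if (detectedName == null) {
            // Detection failed, so there is nothing to look up
            return fallback;
        }
        try {
            return Charset.forName(detectedName);
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            // The name is malformed, or this JVM has no such charset
            return fallback;
        }
    }
}
```

A caller could then write CharsetNames.resolveOrDefault(detector.getDetectedCharset(), StandardCharsets.UTF_8) instead of handling null and the two exceptions separately. Whether UTF-8 is the right fallback depends on the data source.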

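Example 3: detectCharset

import org.mozilla.universalchardet.UniversalDetector; // import the package/class this method depends on
public static String detectCharset(InputStream fis) throws IOException {
	byte[] buf = new byte[4096];
	// (1) Construct a detector instance
	UniversalDetector detector = new UniversalDetector(null);
	// (2) Feed data to the detector until it is done or the stream ends
	int nread;
	while ((nread = fis.read(buf)) > 0 && !detector.isDone()) {
		detector.handleData(buf, 0, nread);
	}
	// (3) Notify the detector that the data has ended
	detector.dataEnd();
	// (4) Get the detected encoding name (null if none was detected)
	String encoding = detector.getDetectedCharset();
	// (5) Reset the detector so the instance can be reused
	detector.reset();
	return encoding;
}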
 
Developer ID: NewTranx, Project: newtranx-utils, Lines of code: 18, Source: DetectCharset.java

Example 4: guessEncoding

import org.mozilla.universalchardet.UniversalDetector; // import the package/class this method depends on
/**
 * Detects the charset encoding of a byte array.
 *
 * @param bytes the byte array whose encoding is to be detected
 * @return the name of the detected charset encoding
 */
public static String guessEncoding(byte[] bytes) {
	UniversalDetector detector = new UniversalDetector(null);

	detector.handleData(bytes, 0, bytes.length);
	detector.dataEnd();

	String encoding = detector.getDetectedCharset();
	detector.reset();

	if (encoding == null || "MACCYRILLIC".equals(encoding)) {
		// juniversalchardet incorrectly detects windows-1256 as MACCYRILLIC
		// If encoding is MACCYRILLIC or null, we use ICU4J
		CharsetMatch detected = new CharsetDetector().setText(bytes).detect();
		if (detected != null) {
			encoding = detected.getName();
		} else {
			encoding = "UTF-8";
		}
	}

	return encoding;
}
 
Developer ID: dnbn, Project: submerge, Lines of code: 29, Source: FileUtils.java

Example 5: createFromEventBody

import org.mozilla.universalchardet.UniversalDetector; // import the package/class this method depends on
public static <T> EnrichedEventBodyGeneric createFromEventBody(byte[] payload, boolean isEnriched, Class<T> clazz) throws IOException {

        EnrichedEventBodyGeneric enrichedEventBodyGeneric;

        if (isEnriched) {
            JavaType javaType = JSONStringSerializer.getJavaType(EnrichedEventBodyGeneric.class, clazz);
            enrichedEventBodyGeneric = (EnrichedEventBodyGeneric) JSONStringSerializer.fromBytes(payload, javaType);
        } else {
            // Detecting payload charset
            UniversalDetector detector = new UniversalDetector(null);
            detector.handleData(payload, 0, payload.length);
            detector.dataEnd();
            String charset = detector.getDetectedCharset();
            detector.reset();

            if (charset == null) {
                charset = DEFAULT_CHARSET;
            }
            enrichedEventBodyGeneric = new EnrichedEventBodyGeneric(new String(payload, charset), clazz);
        }

        return enrichedEventBodyGeneric;
    }
 
Developer ID: keedio, Project: flume-enrichment-interceptor-skeleton, Lines of code: 24, Source: EnrichedEventBodyGeneric.java

Example 6: createFromEventBody

import org.mozilla.universalchardet.UniversalDetector; // import the package/class this method depends on
public static EnrichedEventBody createFromEventBody(byte[] payload, boolean isEnriched) throws IOException {

        EnrichedEventBody enrichedBody;

        if (isEnriched) {
            enrichedBody = JSONStringSerializer.fromBytes(payload, EnrichedEventBody.class);
        } else {
            // Detecting payload charset
            UniversalDetector detector = new UniversalDetector(null);
            detector.handleData(payload, 0, payload.length);
            detector.dataEnd();
            String charset = detector.getDetectedCharset();
            detector.reset();

            if (charset == null) {
                charset = DEFAULT_CHARSET;
            }
            enrichedBody = new EnrichedEventBody(new String(payload, charset));
        }

        return enrichedBody;
    }
 
Developer ID: keedio, Project: flume-enrichment-interceptor-skeleton, Lines of code: 23, Source: EnrichedEventBody.java

Example 7: testEventCharset

import org.mozilla.universalchardet.UniversalDetector; // import the package/class this method depends on
@Test
public void testEventCharset() throws IOException {
    String expectedCharset = StandardCharsets.UTF_8.name();

    Path path = Paths.get("src/test/resources/notUTFString.txt");
    byte[] payload = Files.readAllBytes(path);

    EnrichedEventBody message = EnrichedEventBody.createFromEventBody(payload, false);
    byte[] output = message.buildEventBody();

    UniversalDetector detector = new UniversalDetector(null);
    detector.handleData(output, 0, output.length);
    detector.dataEnd();
    String outputCharset = detector.getDetectedCharset();
    detector.reset();

    Assert.assertEquals(outputCharset, expectedCharset, "Invalid charset");
}
 
Developer ID: keedio, Project: flume-enrichment-interceptor-skeleton, Lines of code: 19, Source: EnrichedEventBodyTest.java
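
Example 7 checks the output encoding by running the detector a second time, which yields a statistical guess. When the only question is whether a byte array is well-formed UTF-8, a strict decode gives a deterministic answer. The following is a minimal sketch using only the JDK; the class name Utf8Check and the method isValidUtf8 are hypothetical and do not come from the flume-enrichment-interceptor-skeleton project.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: deterministic well-formedness check for UTF-8. */
final class Utf8Check {
    private Utf8Check() {
    }

    /** Returns true only if every byte sequence in the array is valid UTF-8. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    // REPORT makes the decoder throw instead of substituting U+FFFD
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            // Malformed or truncated byte sequence
            return false;
        }
    }
}
```

Pure ASCII input also passes this check, because ASCII is a subset of UTF-8. The check therefore confirms that bytes can be read as UTF-8, not that they were originally written in that encoding.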

Example 8: detectEncoding

import org.mozilla.universalchardet.UniversalDetector; // import the package/class this method depends on
/**
 * Detect the encoding of the supplied file.
 *
 * @see <a href="https://code.google.com/p/juniversalchardet/">Original</a>
 * @see <a href="https://github.com/amake/juniversalchardet">Fork</a>
 */
public static String detectEncoding(InputStream stream) throws IOException {
    UniversalDetector detector = new UniversalDetector(null);

    byte[] buffer = new byte[4096];
    int read;
    while ((read = stream.read(buffer)) > 0 && !detector.isDone()) {
        detector.handleData(buffer, 0, read);
    }

    detector.dataEnd();

    String encoding = detector.getDetectedCharset();
    detector.reset();

    return encoding;
}
 
Developer ID: miurahr, Project: tmpotter, Lines of code: 23, Source: EncodingDetector.java

Example 9: detect

import org.mozilla.universalchardet.UniversalDetector; // import the package/class this method depends on
public static String detect(InputStream inputStream) throws IOException {
	UniversalDetector detector = Charset.getSingleton()
			.getCharsetDetector();
	String encoding;
	try {
		byte[] buf = new byte[4096];
		int nread;
		while ((nread = inputStream.read(buf)) > 0 && !detector.isDone()) {
			detector.handleData(buf, 0, nread);
		}
		detector.dataEnd();
		encoding = detector.getDetectedCharset();
	} finally {
		// The detector is shared, so always reset it and close the stream,
		// even when reading fails
		detector.reset();
		inputStream.close();
	}
	if (encoding == null) {
		// If no encoding is detected, we assume UTF-8
		encoding = UTF8;
	}
	return encoding;
}
 
Developer ID: bonigarcia, Project: dualsub, Lines of code: 19, Source: Charset.java

Example 10: detectFileCharset

import org.mozilla.universalchardet.UniversalDetector; // import the package/class this method depends on
/**
 * Detects the text encoding of a file.
 */
public static String detectFileCharset(File file, int detectLength) throws IOException {
	String charset = null;
	FileInputStream fis = null;
	try {
		byte[] buf = new byte[detectLength];
		fis = new FileInputStream(file);
		UniversalDetector detector = new UniversalDetector(null);
		int nread;
		while ((nread = fis.read(buf)) > 0 && !detector.isDone()) {
			detector.handleData(buf, 0, nread);
		}
		detector.dataEnd();
		charset = detector.getDetectedCharset();
		detector.reset();
	} finally {
		if (fis != null) {
			fis.close();
		}
	}
	return charset;
}
 
Developer ID: baishui2004, Project: common_gui_tools, Lines of code: 25, Source: JUniversalChardet.java

Example 11: guessCharset

import org.mozilla.universalchardet.UniversalDetector; // import the package/class this method depends on
public static String guessCharset(String fileName) throws IOException {
	byte[] buf = new byte[4096];
	UniversalDetector detector = new UniversalDetector(null);

	// try-with-resources closes the file even if reading fails
	try (java.io.FileInputStream fis = new java.io.FileInputStream(fileName)) {
		int nread;
		while ((nread = fis.read(buf)) > 0 && !detector.isDone()) {
			detector.handleData(buf, 0, nread);
		}
	}
	detector.dataEnd();

	String encoding = detector.getDetectedCharset();
	if (encoding != null) {
		Log.d("ConvertUtil", fileName + " detected encoding = " + encoding);
	} else {
		Log.d("ConvertUtil", fileName + ": no encoding detected");
	}

	detector.reset();
	return encoding;
}
 
Developer ID: misgod, Project: palmbookreader, Lines of code: 25, Source: ConvertUtil.java

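Example 12: extractCharset

import org.mozilla.universalchardet.UniversalDetector; // import the package/class this method depends on
/**
 * Detects the charset of HTML source by sampling the bytes of the stream.
 * If no valid charset is detected, the default charset (UTF-8) is returned.
 *
 * @param is the stream to read the HTML source from
 * @return the detected charset name, or DEFAULT_CHARSET if detection fails
 */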
public static String extractCharset(InputStream is) throws java.io.IOException {
    byte[] buf = new byte[4096];
    UniversalDetector detector = new UniversalDetector(null);
    int nread;
    while ((nread = is.read(buf)) > 0 && !detector.isDone()) {
        detector.handleData(buf, 0, nread);
    }
    detector.dataEnd();

    String encoding = detector.getDetectedCharset();
    if (encoding != null) {
        LOGGER.debug("Detected encoding = " + encoding);
    } else {
        LOGGER.debug("No encoding detected.");
    }

    detector.reset();
    if (encoding != null && CrawlUtils.isValidCharset(encoding)) {
        return encoding;
    } else {
        return DEFAULT_CHARSET;
    }
}
 
Developer ID: Tanaguru, Project: Tanaguru, Lines of code: 30, Source: CrawlUtils.java

Example 13: setLyricFile

import org.mozilla.universalchardet.UniversalDetector; // import the package/class this method depends on
public void setLyricFile(File file) {
    if (file == null || !file.exists()) {
        reset();
        return;
    } else if (file.getPath().equals(mCurrentLyricFilePath)) {
        return;
    } else {
        mCurrentLyricFilePath = file.getPath();
        reset();
    }
    // try-with-resources closes the stream even if reading fails
    try (FileInputStream fis = new FileInputStream(file)) {
        byte[] buf = new byte[1024];
        UniversalDetector detector = new UniversalDetector(null);
        int nread;
        while ((nread = fis.read(buf)) > 0 && !detector.isDone()) {
            detector.handleData(buf, 0, nread);
        }

        detector.dataEnd();
        String encoding = detector.getDetectedCharset();
        detector.reset();
        // Fall back to UTF-8 when no encoding is detected
        setLyricFile(file, encoding != null ? encoding : "UTF-8");
    } catch (IOException e) {
        e.printStackTrace();
    }
}
 
Developer ID: Zackratos, Project: PureMusic, Lines of code: 36, Source: LyricView.java

Example 14: setLyricFile

import org.mozilla.universalchardet.UniversalDetector; // import the package/class this method depends on
public void setLyricFile(File file) {
    if (file == null || !file.exists()) {
        reset();
        mCurrentLyricFilePath = "";
        return;
    } else if (file.getPath().equals(mCurrentLyricFilePath)) {
        return;
    } else {
        mCurrentLyricFilePath = file.getPath();
        reset();
    }
    // try-with-resources closes the stream even if reading fails
    try (FileInputStream fis = new FileInputStream(file)) {
        byte[] buf = new byte[1024];
        UniversalDetector detector = new UniversalDetector(null);
        int nread;
        while ((nread = fis.read(buf)) > 0 && !detector.isDone()) {
            detector.handleData(buf, 0, nread);
        }

        detector.dataEnd();
        String encoding = detector.getDetectedCharset();
        detector.reset();
        // Fall back to UTF-8 when no encoding is detected
        setLyricFile(file, encoding != null ? encoding : "UTF-8");
    } catch (IOException e) {
        e.printStackTrace();
    }
}
 
Developer ID: h4h13, Project: RetroMusicPlayer, Lines of code: 37, Source: LyricView.java

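Example 15: guessEncodingByMozilla

import org.mozilla.universalchardet.UniversalDetector; // import the package/class this method depends on
/**
 * Guesses the likely charset of a byte array; returns UTF-8 if detection fails.
 *
 * @param bytes the byte array to examine
 * @return the likely charset, or UTF-8 if detection fails
 */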
public static String guessEncodingByMozilla(byte[] bytes) {
    String DEFAULT_ENCODING = "UTF-8";
    UniversalDetector detector = new UniversalDetector(null);
    detector.handleData(bytes, 0, bytes.length);
    detector.dataEnd();
    String encoding = detector.getDetectedCharset();
    detector.reset();
    if (encoding == null) {
        encoding = DEFAULT_ENCODING;
    }
    return encoding;
}
 
Developer ID: jtduan, Project: common-spider, Lines of code: 19, Source: CharsetDetector.java


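Note: The org.mozilla.universalchardet.UniversalDetector.reset examples in this article were compiled by 純淨天空 from GitHub, MSDocs, and other open-source code and documentation platforms. The snippets were selected from open-source projects, and copyright in the source code belongs to the original authors. Refer to each project's License before distributing or using the code. Do not reproduce this article without permission.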