Groovy 1.8.5
java.lang.Object
  groovy.util.CharsetToolkit
public class CharsetToolkit extends Object
Utility class to guess the encoding of a given text file.
Unicode files encoded in UTF-16 (little or big endian), or in UTF-8 with a byte order mark (BOM), are correctly detected. For UTF-8 files with no BOM, the charset should also be detected, provided the buffer is large enough.
A 4 KB byte buffer is used to guess the encoding.
Usage:
 CharsetToolkit toolkit = new CharsetToolkit(file);
 // guess the encoding
 Charset guessedCharset = toolkit.getCharset();
 // create a reader with the correct charset
 BufferedReader reader = toolkit.getReader();
 // read the file content
 String line;
 while ((line = reader.readLine()) != null)
 {
     System.out.println(line);
 }
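
 The class description says that UTF-8 files without a BOM can be recognized when the buffer is large enough. One way such a check could work is sketched below using only the JDK: decode the buffer strictly as UTF-8 and treat any malformed sequence as a mismatch. This is an illustration and not CharsetToolkit's actual implementation; the class name Utf8Probe and the method looksLikeUtf8 are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Groovy.
final class Utf8Probe {

    /** Returns true if the whole buffer decodes as well-formed UTF-8. */
    static boolean looksLikeUtf8(byte[] buffer) {
        // REPORT makes the decoder throw instead of silently substituting U+FFFD.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(buffer));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(looksLikeUtf8(utf8));   // valid UTF-8 bytes
        System.out.println(looksLikeUtf8(latin1)); // lone 0xE9 is not valid UTF-8
    }
}
```

 Two caveats for this sketch: pure ASCII input also passes the check, and a fixed-size buffer that cuts a multi-byte character in half would be rejected, so a real detector has to tolerate a truncated final sequence.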
 
| Constructor Summary | |
|---|---|
| CharsetToolkit(File file) | Constructor of the CharsetToolkit utility class. |
| Method Summary | |
|---|---|
| static Charset[] | getAvailableCharsets(): Retrieves all the Charsets available on the platform, including the default charset. |
| Charset | getCharset() |
| Charset | getDefaultCharset(): Retrieves the default Charset. |
| static Charset | getDefaultSystemCharset(): Retrieves the default charset of the system. |
| boolean | getEnforce8Bit(): Gets the enforce8Bit flag, in case we never want to get a US-ASCII encoding. |
| BufferedReader | getReader(): Gets a BufferedReader (actually a LineNumberReader) for the File specified in the CharsetToolkit constructor, using the discovered charset, or the default charset if an 8-bit Charset is encountered. |
| boolean | hasUTF16BEBom(): Has a byte order mark for UTF-16 Big Endian (utf-16 and ucs-2). |
| boolean | hasUTF16LEBom(): Has a byte order mark for UTF-16 Little Endian (ucs-2le, ucs-4le, and ucs-16le). |
| boolean | hasUTF8Bom(): Has a byte order mark for UTF-8 (used by Microsoft's Notepad and other editors). |
| void | setDefaultCharset(Charset defaultCharset): Defines the default Charset used in case the buffer represents an 8-bit Charset. |
| void | setEnforce8Bit(boolean enforce): If US-ASCII is recognized, forces the default encoding to be returned rather than US-ASCII. |

| Methods inherited from class Object | |
|---|---|
| wait, wait, wait, equals, toString, hashCode, getClass, notify, notifyAll | 
| Constructor Detail | 
|---|
public CharsetToolkit(File file)
Constructor of the CharsetToolkit utility class.
     file - the file whose encoding we want to know.
| Method Detail | 
|---|
public static Charset[] getAvailableCharsets()
Retrieves all the Charsets available on the platform,
 including the default charset.
     Returns: the available Charsets.
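
The JDK itself exposes the installed charsets through java.nio.charset.Charset, so an array like the one described above can be built as in the following sketch. Whether CharsetToolkit obtains its list this way is an assumption; the class name AvailableCharsetsSketch is hypothetical.

```java
import java.nio.charset.Charset;
import java.util.Arrays;
import java.util.Collection;

// Hypothetical illustration built on the JDK only; not Groovy's code.
final class AvailableCharsetsSketch {

    /** Collects every charset the running JVM supports into an array. */
    static Charset[] availableCharsets() {
        // Charset.availableCharsets() returns a SortedMap keyed by canonical name.
        Collection<Charset> all = Charset.availableCharsets().values();
        return all.toArray(new Charset[0]);
    }

    public static void main(String[] args) {
        Charset[] charsets = availableCharsets();
        System.out.println("Available charsets: " + charsets.length);
        System.out.println("Default charset listed: "
                + Arrays.asList(charsets).contains(Charset.defaultCharset()));
    }
}
```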
public Charset getCharset()
public Charset getDefaultCharset()
public static Charset getDefaultSystemCharset()
Retrieves the default charset of the system.
     Returns: the default system Charset.
public boolean getEnforce8Bit()
public BufferedReader getReader()
Gets a BufferedReader (actually a LineNumberReader) for the File
 specified in the CharsetToolkit constructor, using the discovered charset, or the default
 charset if an 8-bit Charset is encountered.
     Returns: a BufferedReader.
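
Once a charset has been chosen, a reader of the kind described above can be assembled from standard java.io classes. The sketch below assumes the charset is already known and takes an InputStream rather than a File to stay easy to test; CharsetReaderSketch, openReader, and readLines are hypothetical names, and the code is not taken from Groovy.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.LineNumberReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not CharsetToolkit's implementation.
final class CharsetReaderSketch {

    /** Wraps the stream in a LineNumberReader that decodes with the given charset. */
    static BufferedReader openReader(InputStream in, Charset charset) {
        return new LineNumberReader(new InputStreamReader(in, charset));
    }

    /** Reads every line from the stream, closing it when done. */
    static List<String> readLines(InputStream in, Charset charset) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = openReader(in, charset)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Write a small ISO-8859-1 file, then read it back with the same charset.
        File file = File.createTempFile("charset-sketch", ".txt");
        file.deleteOnExit();
        Files.write(file.toPath(),
                "premi\u00e8re ligne\nseconde ligne\n".getBytes(StandardCharsets.ISO_8859_1));
        for (String line : readLines(new FileInputStream(file), StandardCharsets.ISO_8859_1)) {
            System.out.println(line);
        }
    }
}
```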
public boolean hasUTF16BEBom()
public boolean hasUTF16LEBom()
public boolean hasUTF8Bom()
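
The three BOM checks above correspond to well-known byte signatures: EF BB BF for UTF-8, FE FF for UTF-16 big endian, and FF FE for UTF-16 little endian. A minimal sketch of such checks on a raw byte buffer follows; BomSniffer and its method names are hypothetical, and the code is an illustration, not CharsetToolkit's source.

```java
// Hypothetical illustration of BOM checks on the first bytes of a buffer.
final class BomSniffer {

    /** UTF-8 BOM: EF BB BF. */
    static boolean hasUtf8Bom(byte[] b) {
        return b.length >= 3
                && (b[0] & 0xFF) == 0xEF
                && (b[1] & 0xFF) == 0xBB
                && (b[2] & 0xFF) == 0xBF;
    }

    /** UTF-16 big-endian BOM: FE FF. */
    static boolean hasUtf16BeBom(byte[] b) {
        return b.length >= 2 && (b[0] & 0xFF) == 0xFE && (b[1] & 0xFF) == 0xFF;
    }

    /** UTF-16 little-endian BOM: FF FE. */
    static boolean hasUtf16LeBom(byte[] b) {
        return b.length >= 2 && (b[0] & 0xFF) == 0xFF && (b[1] & 0xFF) == 0xFE;
    }

    public static void main(String[] args) {
        byte[] utf8 = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        byte[] utf16be = {(byte) 0xFE, (byte) 0xFF, 0, 'h'};
        byte[] utf16le = {(byte) 0xFF, (byte) 0xFE, 'h', 0};
        System.out.println(hasUtf8Bom(utf8));
        System.out.println(hasUtf16BeBom(utf16be));
        System.out.println(hasUtf16LeBom(utf16le));
    }
}
```

The `& 0xFF` mask matters because Java bytes are signed; without it, comparing a byte directly against 0xEF would never match.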
public void setDefaultCharset(Charset defaultCharset)
Defines the default Charset used in case the buffer represents
 an 8-bit Charset.
     defaultCharset - the default Charset to be returned
 if an 8-bit Charset is encountered.
public void setEnforce8Bit(boolean enforce)
If US-ASCII is recognized, forces the default charset to be returned rather than US-ASCII.
     enforce - true to return the default charset instead of US-ASCII; false to allow US-ASCII.
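
The interaction between the enforce8Bit flag and the default charset can be pictured with the sketch below. It covers only the decision described above: a buffer with no byte above 127 counts as US-ASCII unless the flag is set, and anything else falls back to the default charset. UTF-8 and BOM detection are deliberately left out, and Enforce8BitSketch and choose are hypothetical names, not Groovy's code.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical, simplified decision logic; not CharsetToolkit's implementation.
final class Enforce8BitSketch {

    static Charset choose(byte[] buffer, boolean enforce8Bit, Charset defaultCharset) {
        boolean allAscii = true;
        for (byte b : buffer) {
            if (b < 0) {          // a negative Java byte means the high bit is set (value > 127)
                allAscii = false;
                break;
            }
        }
        if (allAscii) {
            // With the flag set, never report US-ASCII: the file may later
            // gain characters that only the default charset can represent.
            return enforce8Bit ? defaultCharset : StandardCharsets.US_ASCII;
        }
        // Simplification: treat any high-bit content as the default 8-bit charset.
        return defaultCharset;
    }

    public static void main(String[] args) {
        byte[] ascii = "plain text".getBytes(StandardCharsets.US_ASCII);
        System.out.println(choose(ascii, false, StandardCharsets.ISO_8859_1));
        System.out.println(choose(ascii, true, StandardCharsets.ISO_8859_1));
    }
}
```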
Copyright © 2003-2011 The Codehaus. All rights reserved.