public class CharsetToolkit extends Object
Utility class to guess the encoding of a given text file.
Unicode files encoded in UTF-16 (little or big endian) or UTF-8 files with a byte order mark (BOM) are correctly detected. For UTF-8 files without a BOM, the charset should also be detected, provided the buffer is large enough.
A 4 KB byte buffer is used to guess the encoding.
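The text above does not say how a buffer without a BOM is recognized as UTF-8. As a rough illustration only, the sketch below substitutes the JDK's strict UTF-8 decoder for whatever check the toolkit performs internally. The class `Utf8Probe` and the method `looksLikeUtf8` are hypothetical names and are not part of `CharsetToolkit`.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of CharsetToolkit.
public class Utf8Probe {

    // Returns true if the first `length` bytes of `buffer` contain no byte
    // sequence that is illegal in UTF-8. Pure ASCII input also returns true.
    // A multi-byte sequence cut off at the very end is tolerated, because a
    // fixed-size sample (for example 4 KB) may stop in the middle of a character.
    public static boolean looksLikeUtf8(byte[] buffer, int length) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(buffer, 0, length);
        // UTF-8 never decodes to more chars than it has bytes.
        CharBuffer out = CharBuffer.allocate(Math.max(length, 1));
        // endOfInput = false: a truncated trailing sequence gives UNDERFLOW, not an error.
        CoderResult result = decoder.decode(in, out, false);
        return !result.isError();
    }
}
```

A larger sample makes it more likely that a non-ASCII byte appears in the buffer at all, which is one way to read the "large enough" condition above.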
Usage:
 CharsetToolkit toolkit = new CharsetToolkit(file);
 // guess the encoding
 Charset guessedCharset = toolkit.getCharset();
 // create a reader with the correct charset
 BufferedReader reader = toolkit.getReader();
 // read the file content
 String line;
 while ((line = reader.readLine()) != null)
 {
     System.out.println(line);
 }
 
 
| Constructor and description |
|---|
| CharsetToolkit(File file) Constructor of the CharsetToolkit utility class. |
| Type Params | Return Type | Name and description |
|---|---|---|
|  | public static Charset[] | getAvailableCharsets() Retrieves all the available Charsets on the platform, including the default charset. |
|  | public Charset | getCharset() |
|  | public Charset | getDefaultCharset() Retrieves the default Charset. |
|  | public static Charset | getDefaultSystemCharset() Retrieves the default charset of the system. |
|  | public boolean | getEnforce8Bit() Gets the enforce8Bit flag, in case we never want to get a US-ASCII encoding. |
|  | public BufferedReader | getReader() Gets a BufferedReader (in fact a LineNumberReader) from the File specified in the constructor of CharsetToolkit, using the charset discovered or the default charset if an 8-bit Charset is encountered. |
|  | public boolean | hasUTF16BEBom() Has a byte order mark for UTF-16 Big Endian (utf-16 and ucs-2). |
|  | public boolean | hasUTF16LEBom() Has a byte order mark for UTF-16 Little Endian (ucs-2le, ucs-4le, and ucs-16le). |
|  | public boolean | hasUTF8Bom() Has a byte order mark for UTF-8 (used by Microsoft's Notepad and other editors). |
|  | public void | setDefaultCharset(Charset defaultCharset) Defines the default Charset used in case the buffer represents an 8-bit Charset. |
|  | public void | setEnforce8Bit(boolean enforce) If US-ASCII is recognized, forces the default encoding to be returned rather than US-ASCII. |
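The interaction between `setEnforce8Bit` and `setDefaultCharset` can be pictured with a simplified sketch. It only covers the US-ASCII versus default-charset decision described in the table and leaves out BOM and UTF-8 detection. `AsciiFallback`, `isPureAscii`, and `resolve` are hypothetical names, and the logic is an assumption about the documented behavior rather than the toolkit's actual code.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of CharsetToolkit.
public class AsciiFallback {

    // True when every byte in the sample is below 0x80, i.e. plain US-ASCII.
    public static boolean isPureAscii(byte[] sample) {
        for (byte b : sample) {
            if (b < 0) { // bytes 0x80-0xFF are negative as Java bytes
                return false;
            }
        }
        return true;
    }

    // Assumed decision rule: a pure-ASCII sample yields US-ASCII unless
    // enforce8Bit is set, in which case the default charset is returned.
    // Any sample with bytes in 128-255 is treated as the default 8-bit charset.
    public static Charset resolve(byte[] sample, boolean enforce8Bit, Charset defaultCharset) {
        if (isPureAscii(sample)) {
            return enforce8Bit ? defaultCharset : StandardCharsets.US_ASCII;
        }
        return defaultCharset;
    }
}
```

Enforcing the default charset is useful when a file that is pure ASCII today may later receive accented characters and should keep being read with the same 8-bit encoding.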
CharsetToolkit(File file)
Constructor of the CharsetToolkit utility class.
file - the file whose encoding we want to know.

public static Charset[] getAvailableCharsets()
Retrieves all the available Charsets on the platform, including the default charset.
Returns: an array of Charsets.

public Charset getDefaultCharset()
Retrieves the default Charset.

public static Charset getDefaultSystemCharset()
Retrieves the default charset of the system.
Returns: the system's default Charset.

public boolean getEnforce8Bit()
Gets the enforce8Bit flag, in case we never want to get a US-ASCII encoding.

public BufferedReader getReader()
Gets a BufferedReader (in fact a LineNumberReader) from the File specified in the constructor of CharsetToolkit, using the charset discovered or the default charset if an 8-bit Charset is encountered.
Returns: a BufferedReader.

public boolean hasUTF16BEBom()
Has a byte order mark for UTF-16 Big Endian (utf-16 and ucs-2).

public boolean hasUTF16LEBom()
Has a byte order mark for UTF-16 Little Endian (ucs-2le, ucs-4le, and ucs-16le).

public boolean hasUTF8Bom()
Has a byte order mark for UTF-8 (used by Microsoft's Notepad and other editors).
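For reference, the three byte order marks named above have fixed byte values: EF BB BF for UTF-8, FE FF for UTF-16 big endian, and FF FE for UTF-16 little endian. The following is a minimal sketch of such checks on a raw byte buffer. `BomSniffer` and its methods are hypothetical and only mirror the documented `hasUTF8Bom()`, `hasUTF16BEBom()`, and `hasUTF16LEBom()` by name; they are not the toolkit's implementation.

```java
// Hypothetical helper for illustration; not part of CharsetToolkit.
public class BomSniffer {

    // UTF-8 BOM: EF BB BF
    public static boolean hasUtf8Bom(byte[] b) {
        return b.length >= 3
                && (b[0] & 0xFF) == 0xEF
                && (b[1] & 0xFF) == 0xBB
                && (b[2] & 0xFF) == 0xBF;
    }

    // UTF-16 big-endian BOM: FE FF
    public static boolean hasUtf16BeBom(byte[] b) {
        return b.length >= 2
                && (b[0] & 0xFF) == 0xFE
                && (b[1] & 0xFF) == 0xFF;
    }

    // UTF-16 little-endian BOM: FF FE
    // (a UTF-32 little-endian BOM, FF FE 00 00, starts with the same two bytes)
    public static boolean hasUtf16LeBom(byte[] b) {
        return b.length >= 2
                && (b[0] & 0xFF) == 0xFF
                && (b[1] & 0xFF) == 0xFE;
    }
}
```

Masking with `0xFF` is needed because Java bytes are signed, so a raw comparison such as `b[0] == 0xEF` would never be true.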

public void setDefaultCharset(Charset defaultCharset)
Defines the default Charset used in case the buffer represents an 8-bit Charset.
defaultCharset - the default Charset to be returned if an 8-bit Charset is encountered.

public void setEnforce8Bit(boolean enforce)
If US-ASCII is recognized, forces the default encoding to be returned rather than US-ASCII. The file might contain no special character in the range 128-255 yet be, or later become, a file encoded with the default charset rather than US-ASCII.
enforce - a boolean specifying whether or not US-ASCII is used.

Copyright © 2003-2021 The Apache Software Foundation. All rights reserved.