Groovy Documentation
java.lang.Object groovy.util.CharsetToolkit
public class CharsetToolkit extends java.lang.Object
Utility class to guess the encoding of a given text file.
Unicode files encoded in UTF-16 (little or big endian), or in UTF-8 with a Byte Order Mark (BOM), are correctly discovered. For UTF-8 files with no BOM, the charset should also be discovered if the buffer is wide enough.
A 4 KB byte buffer is used to guess the encoding.
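The BOM check described above can be illustrated with a minimal sketch. It is not the CharsetToolkit implementation: the class name `BomSniffer` and the method `detectBom` are hypothetical, and only the standard BOM byte sequences (EF BB BF for UTF-8, FE FF for UTF-16 big endian, FF FE for UTF-16 little endian) are taken as given.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of groovy.util.CharsetToolkit.
public class BomSniffer {

    /** Returns the charset implied by a leading BOM, or null if no BOM is present. */
    public static Charset detectBom(byte[] b) {
        // UTF-8 BOM: EF BB BF
        if (b.length >= 3
                && (b[0] & 0xFF) == 0xEF
                && (b[1] & 0xFF) == 0xBB
                && (b[2] & 0xFF) == 0xBF) {
            return StandardCharsets.UTF_8;
        }
        // UTF-16 big endian BOM: FE FF
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFE && (b[1] & 0xFF) == 0xFF) {
            return StandardCharsets.UTF_16BE;
        }
        // UTF-16 little endian BOM: FF FE
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFF && (b[1] & 0xFF) == 0xFE) {
            return StandardCharsets.UTF_16LE;
        }
        return null; // no BOM: the content itself would have to be inspected
    }

    public static void main(String[] args) {
        byte[] utf8WithBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        System.out.println(detectBom(utf8WithBom));
    }
}
```

A file without a BOM yields `null` here, which is the case where the toolkit has to fall back on examining the buffered bytes.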
Usage:
    CharsetToolkit toolkit = new CharsetToolkit(file);

    // guess the encoding
    Charset guessedCharset = toolkit.getCharset();

    // create a reader with the correct charset
    BufferedReader reader = toolkit.getReader();

    // read the file content
    String line;
    while ((line = reader.readLine()) != null) {
        System.out.println(line);
    }
Constructor Summary | |
CharsetToolkit(java.io.File file)
Constructor of the CharsetToolkit utility class. |
Method Summary | |
---|---|
static java.nio.charset.Charset[] | getAvailableCharsets() |
java.nio.charset.Charset | getCharset() |
java.nio.charset.Charset | getDefaultCharset(): Retrieves the default Charset. |
static java.nio.charset.Charset | getDefaultSystemCharset() |
boolean | getEnforce8Bit(): Gets the enforce8Bit flag, in case we never want to get a US-ASCII encoding. |
java.io.BufferedReader | getReader(): Gets a BufferedReader (in fact a LineNumberReader) from the File specified in the constructor. |
boolean | hasUTF16BEBom() |
boolean | hasUTF16LEBom() |
boolean | hasUTF8Bom() |
void | setDefaultCharset(java.nio.charset.Charset defaultCharset): Defines the default Charset used in case the buffer represents an 8-bit Charset. |
void | setEnforce8Bit(boolean enforce): If US-ASCII is recognized, forces the default encoding to be returned rather than US-ASCII. |
Methods inherited from class java.lang.Object | |
---|---|
java.lang.Object#wait(long, int), java.lang.Object#wait(long), java.lang.Object#wait(), java.lang.Object#equals(java.lang.Object), java.lang.Object#toString(), java.lang.Object#hashCode(), java.lang.Object#getClass(), java.lang.Object#notify(), java.lang.Object#notifyAll() |
Constructor Detail |
---|
public CharsetToolkit(java.io.File file)
Constructor of the CharsetToolkit utility class.
file - the file whose encoding we want to know.
Method Detail |
---|
public static java.nio.charset.Charset[] getAvailableCharsets()
public java.nio.charset.Charset getCharset()
public java.nio.charset.Charset getDefaultCharset()
public static java.nio.charset.Charset getDefaultSystemCharset()
public boolean getEnforce8Bit()
public boolean hasUTF16BEBom()
public boolean hasUTF16LEBom()
public java.io.BufferedReader getReader()
Gets a BufferedReader (in fact a LineNumberReader) from the File specified in the constructor of CharsetToolkit, using the discovered charset, or the default charset if an 8-bit Charset is encountered.
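As an illustration of the reader described above, the following sketch opens a LineNumberReader over a file with an explicitly chosen charset, using only JDK classes. The names `ReaderSketch`, `openReader` and `demo` are hypothetical, and the charset is passed in by the caller rather than guessed, so this shows only the shape of the result, not how CharsetToolkit builds it.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.LineNumberReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of groovy.util.CharsetToolkit.
public class ReaderSketch {

    /** Opens a LineNumberReader (a BufferedReader subclass) that decodes the file with the given charset. */
    public static BufferedReader openReader(File file, Charset charset) throws IOException {
        return new LineNumberReader(new InputStreamReader(new FileInputStream(file), charset));
    }

    /** Writes a small UTF-8 temp file, reads it back line by line, and returns the lines. */
    public static List<String> demo() {
        try {
            File tmp = File.createTempFile("charset-sketch", ".txt");
            tmp.deleteOnExit();
            Files.write(tmp.toPath(), "h\u00e9llo\nw\u00f6rld\n".getBytes(StandardCharsets.UTF_8));

            List<String> lines = new ArrayList<>();
            try (BufferedReader reader = openReader(tmp, StandardCharsets.UTF_8)) {
                String line;
                while ((line = reader.readLine()) != null) {
                    lines.add(line);
                }
            }
            return lines;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

Returning a LineNumberReader through the BufferedReader type lets callers that care about line numbers cast the result, while everyone else can treat it as an ordinary buffered reader.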
public boolean hasUTF8Bom()
public void setDefaultCharset(java.nio.charset.Charset defaultCharset)
Defines the default Charset used in case the buffer represents an 8-bit Charset.
defaultCharset - the default Charset to be returned if an 8-bit Charset is encountered.
public void setEnforce8Bit(boolean enforce)
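If US-ASCII is recognized, forces the default charset to be returned rather than US-ASCII.
enforce - a boolean specifying whether the default charset should be returned in place of US-ASCII.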
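The effect of the enforce8Bit flag can be sketched as follows. This is an assumption-laden illustration rather than the toolkit's code: `Enforce8BitSketch`, `isSevenBit` and `resolve` are hypothetical names, and the branch for non-ASCII content simply returns the default charset, where a real detector would first test for UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of groovy.util.CharsetToolkit.
public class Enforce8BitSketch {

    /** True if every byte has its high bit clear, i.e. the buffer is plain 7-bit ASCII. */
    public static boolean isSevenBit(byte[] buffer) {
        for (byte b : buffer) {
            if (b < 0) { // a negative Java byte means the high bit is set
                return false;
            }
        }
        return true;
    }

    /** Picks a charset for the buffer, honouring the enforce8Bit flag for pure-ASCII content. */
    public static Charset resolve(byte[] buffer, boolean enforce8Bit, Charset defaultCharset) {
        if (isSevenBit(buffer)) {
            // US-ASCII was recognized: return it only if the caller did not ask for an 8-bit charset.
            return enforce8Bit ? defaultCharset : StandardCharsets.US_ASCII;
        }
        // Assumption of this sketch: non-ASCII content falls straight back to the default charset.
        return defaultCharset;
    }

    public static void main(String[] args) {
        byte[] ascii = "plain text".getBytes(StandardCharsets.US_ASCII);
        System.out.println(resolve(ascii, false, StandardCharsets.ISO_8859_1));
        System.out.println(resolve(ascii, true, StandardCharsets.ISO_8859_1));
    }
}
```

Enforcing an 8-bit charset is useful when the sampled buffer happens to contain only ASCII but later parts of the file may not.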