Constructor and description |
---|
CharsetToolkit(File file) Constructor of the CharsetToolkit utility class. |
Type | Name and description |
---|---|
static Charset[] | getAvailableCharsets() Retrieves all the available Charsets on the platform, among which the default charset. |
Charset | getCharset() |
Charset | getDefaultCharset() Retrieves the default Charset. |
static Charset | getDefaultSystemCharset() Retrieves the default charset of the system. |
boolean | getEnforce8Bit() Gets the enforce8Bit flag, in case we do not want to ever get a US-ASCII encoding. |
BufferedReader | getReader() Gets a BufferedReader (indeed a LineNumberReader) from the File specified in the constructor of CharsetToolkit, using the charset discovered or the default charset if an 8-bit Charset is encountered. |
boolean | hasUTF16BEBom() Has a Byte Order Marker for UTF-16 Big Endian (utf-16 and ucs-2). |
boolean | hasUTF16LEBom() Has a Byte Order Marker for UTF-16 Little Endian (ucs-2le, ucs-4le, and ucs-16le). |
boolean | hasUTF8Bom() Has a Byte Order Marker for UTF-8 (used by Microsoft's Notepad and other editors). |
void | setDefaultCharset(Charset defaultCharset) Defines the default Charset used in case the buffer represents an 8-bit Charset. |
void | setEnforce8Bit(boolean enforce) If US-ASCII is recognized, enforces returning the default encoding rather than US-ASCII. |
CharsetToolkit(File file): Constructor of the CharsetToolkit utility class.
file - the file whose encoding we want to know.
getAvailableCharsets(): Retrieves all the available Charsets on the platform, among which the default charset.
Returns: an array of Charsets.
getDefaultCharset(): Retrieves the default Charset.
getDefaultSystemCharset(): Retrieves the default charset of the system.
Returns: the default Charset.
getEnforce8Bit(): Gets the enforce8Bit flag, in case we do not want to ever get a US-ASCII encoding.
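The effect of the enforce8Bit flag can be illustrated with a minimal sketch that uses only `java.nio.charset`. The class `Enforce8BitSketch` and its `resolve` method are hypothetical names invented for this illustration; they are not part of CharsetToolkit, and the toolkit's actual detection logic is not reproduced here. The sketch only shows the rule stated in the description: when the guess is US-ASCII and the flag is set, the default charset is returned instead.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of CharsetToolkit.
final class Enforce8BitSketch {
    private Enforce8BitSketch() {}

    /**
     * Returns the fallback charset when the guess is US-ASCII and the flag is set;
     * otherwise returns the guess unchanged.
     */
    static Charset resolve(Charset guessed, Charset fallback, boolean enforce8Bit) {
        if (enforce8Bit && StandardCharsets.US_ASCII.equals(guessed)) {
            return fallback;
        }
        return guessed;
    }

    public static void main(String[] args) {
        // Assumption: the JVM default charset stands in for the toolkit's default charset.
        Charset fallback = Charset.defaultCharset();
        System.out.println(resolve(StandardCharsets.US_ASCII, fallback, true));
        System.out.println(resolve(StandardCharsets.US_ASCII, fallback, false));
        System.out.println(resolve(StandardCharsets.UTF_8, fallback, true));
    }
}
```

The printed names depend on the platform default, so no expected output is given.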
getReader(): Gets a BufferedReader (indeed a LineNumberReader) from the File specified in the constructor of CharsetToolkit, using the charset discovered or the default charset if an 8-bit Charset is encountered.
Returns: a BufferedReader.
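As a rough illustration of what "a BufferedReader (indeed a LineNumberReader)" means in practice, the following sketch opens a file with an explicitly chosen charset and wraps it in a `LineNumberReader`. It uses only the JDK; the class `ReaderSketch` and its `open` method are hypothetical and do not reproduce how CharsetToolkit discovers the charset.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.LineNumberReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration only; not part of CharsetToolkit.
final class ReaderSketch {
    private ReaderSketch() {}

    /** Opens the file with the given charset; the result is a LineNumberReader. */
    static BufferedReader open(Path file, Charset charset) throws IOException {
        return new LineNumberReader(
                new InputStreamReader(Files.newInputStream(file), charset));
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("charset-sketch", ".txt");
        try {
            Files.write(tmp, "caf\u00e9\nna\u00efve\n".getBytes(StandardCharsets.UTF_8));
            try (BufferedReader reader = open(tmp, StandardCharsets.UTF_8)) {
                String line;
                while ((line = reader.readLine()) != null) {
                    // The cast is safe because open() always returns a LineNumberReader.
                    int lineNo = ((LineNumberReader) reader).getLineNumber();
                    System.out.println(lineNo + ": " + line);
                }
            }
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Returning the `BufferedReader` supertype keeps callers independent of line counting, while callers that need line numbers can still downcast.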
hasUTF16BEBom(): Has a Byte Order Marker for UTF-16 Big Endian (utf-16 and ucs-2).
hasUTF16LEBom(): Has a Byte Order Marker for UTF-16 Little Endian (ucs-2le, ucs-4le, and ucs-16le).
hasUTF8Bom(): Has a Byte Order Marker for UTF-8 (used by Microsoft's Notepad and other editors).
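The three byte order marks checked above have well-known byte values: EF BB BF for UTF-8, FE FF for UTF-16 Big Endian, and FF FE for UTF-16 Little Endian. A minimal sketch of such checks on a byte buffer might look like the following. The class `BomSketch` and its method names are hypothetical and only mirror the names in this page; this is not CharsetToolkit's implementation.

```java
// Hypothetical helper for illustration only; not part of CharsetToolkit.
final class BomSketch {
    private BomSketch() {}

    /** UTF-8 BOM: EF BB BF. */
    static boolean hasUtf8Bom(byte[] buffer) {
        return startsWith(buffer, 0xEF, 0xBB, 0xBF);
    }

    /** UTF-16 Big Endian BOM: FE FF. */
    static boolean hasUtf16BeBom(byte[] buffer) {
        return startsWith(buffer, 0xFE, 0xFF);
    }

    /** UTF-16 Little Endian BOM: FF FE. */
    static boolean hasUtf16LeBom(byte[] buffer) {
        return startsWith(buffer, 0xFF, 0xFE);
    }

    // Compares the leading bytes as unsigned values, since Java bytes are signed.
    private static boolean startsWith(byte[] buffer, int... prefix) {
        if (buffer.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((buffer[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] utf8 = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 0x68, 0x69};
        byte[] utf16le = {(byte) 0xFF, (byte) 0xFE, 0x68, 0x00};
        System.out.println(hasUtf8Bom(utf8));       // true
        System.out.println(hasUtf16LeBom(utf16le)); // true
        System.out.println(hasUtf16BeBom(utf16le)); // false
    }
}
```

Note that a buffer starting with FF FE 00 00 also passes the little-endian check in this sketch, although that sequence is the UTF-32 Little Endian mark; a stricter detector would test for it first.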
setDefaultCharset(Charset defaultCharset): Defines the default Charset used in case the buffer represents an 8-bit Charset.
defaultCharset - the default Charset to be returned if an 8-bit Charset is encountered.
setEnforce8Bit(boolean enforce): If US-ASCII is recognized, enforces returning the default encoding rather than US-ASCII. The file might contain no special character in the range 128-255, yet be, or later become, a file encoded with the default charset rather than US-ASCII.
enforce - a boolean specifying the use or not of US-ASCII.
Copyright © 2003-2015 The Apache Software Foundation. All rights reserved.