Package groovy.xml
Class XmlParser
java.lang.Object
groovy.xml.XmlParser
- All Implemented Interfaces:
ContentHandler
A helper class for parsing XML into a tree of Node instances for a
simple way of processing XML. This parser does not preserve the XML
InfoSet - if that's what you need try using W3C DOM, dom4j, JDOM, XOM etc.
This parser ignores comments and processing instructions and converts
the XML into a Node for each element in the XML with attributes
and child Nodes and Strings. This simple model is sufficient for
most simple use cases of processing XML.
Parsing is eager: each parse operation consumes the SAX event stream and
builds a complete Node tree before returning.
Example usage:
import groovy.xml.XmlParser
def xml = '<root><one a1="uno!"/><two>Some text!</two></root>'
def rootNode = new XmlParser().parseText(xml)
assert rootNode.name() == 'root'
assert rootNode.one[0].@a1 == 'uno!'
assert rootNode.two.text() == 'Some text!'
rootNode.children().each { assert it.name() in ['one','two'] }
-
Constructor Summary
Constructors
Constructor - Description
XmlParser() - Creates a non-validating and namespace-aware XmlParser which does not allow DOCTYPE declarations in documents.
XmlParser(boolean validating, boolean namespaceAware) - Creates an XmlParser which does not allow DOCTYPE declarations in documents.
XmlParser(boolean validating, boolean namespaceAware, boolean allowDocTypeDeclaration) - Creates an XmlParser.
XmlParser(SAXParser parser) - Creates a parser backed by the supplied SAX parser.
XmlParser(XMLReader reader) - Creates a parser backed by the supplied SAX reader.
-
Method Summary
Modifier and Type, Method - Description
protected void addTextToNode() - Transfers buffered character data into the current node when an element boundary is reached.
void characters(char[] buffer, int start, int length) - Buffers character data until the enclosing element boundary is reached.
protected Node createNode(Node parent, Object name, Map attributes) - Creates a new node with the given parent, name, and attributes.
void endDocument() - Completes the current parse and clears the internal element stack.
void endElement(String namespaceURI, String localName, String qName) - Flushes buffered text and pops the current element when its end tag is seen.
void endPrefixMapping(String prefix) - Receives namespace prefix scope end notifications.
Locator getDocumentLocator() - Returns the document locator last provided by SAX.
DTDHandler getDTDHandler() - Returns the SAX DTD handler configured on the underlying reader.
protected Object getElementName(String namespaceURI, String localName, String qName) - Returns a name given the namespaceURI, localName and qName.
EntityResolver getEntityResolver() - Returns the SAX entity resolver configured on the underlying reader.
ErrorHandler getErrorHandler() - Returns the SAX error handler configured on the underlying reader.
boolean getFeature(String uri) - Looks up a SAX feature on the underlying reader.
Object getProperty(String uri) - Looks up a SAX property on the underlying reader.
protected XMLReader getXMLReader() - Returns the configured XML reader after registering this parser as its content handler.
void ignorableWhitespace(char[] buffer, int start, int len) - Receives ignorable whitespace and optionally preserves it as text content.
boolean isAllowDocTypeDeclaration() - Determines if DOCTYPE declarations are allowed.
boolean isKeepIgnorableWhitespace() - Returns the current keep ignorable whitespace setting.
boolean isNamespaceAware() - Determines if namespace handling is enabled.
boolean isTrimWhitespace() - Returns the current trim whitespace setting.
boolean isValidating() - Determines if the parser validates documents.
Node parse(File file) - Parses the content of the given file as XML, turning it into a tree of Nodes.
Node parse(InputStream input) - Parses the content of the specified input stream into a tree of Nodes.
Node parse(Reader in) - Parses the content of the specified reader into a tree of Nodes.
Node parse(String uri) - Parses the content of the specified URI into a tree of Nodes.
Node parse(Path path) - Parses the content of the file at the given path as XML, turning it into a tree of Nodes.
Node parse(InputSource input) - Parses the content of the specified input source into a tree of Nodes.
<T> T parseAs(Class<T> type, File file) - Parses XML from a file into a typed object.
<T> T parseAs(Class<T> type, InputStream stream) - Parses XML from an input stream into a typed object.
<T> T parseAs(Class<T> type, Reader reader) - Parses XML from a reader into a typed object.
<T> T parseAs(Class<T> type, Path path) - Parses XML from a path into a typed object.
Node parseText(String text) - A helper method to parse the given text as XML.
<T> T parseTextAs(Class<T> type, String text) - Parses the content of the specified XML text into a typed object.
void processingInstruction(String target, String data) - Receives processing instruction callbacks.
void setAllowDocTypeDeclaration(boolean allowDocTypeDeclaration) - Enables or disables DOCTYPE declaration support.
void setDocumentLocator(Locator locator) - Stores the locator supplied by SAX for later diagnostics or subclass use.
void setDTDHandler(DTDHandler dtdHandler) - Sets the SAX DTD handler on the underlying reader.
void setEntityResolver(EntityResolver entityResolver) - Sets the SAX entity resolver on the underlying reader.
void setErrorHandler(ErrorHandler errorHandler) - Sets the SAX error handler on the underlying reader.
void setFeature(String uri, boolean value) - Enables or disables a SAX feature on the underlying reader.
void setKeepIgnorableWhitespace(boolean keepIgnorableWhitespace) - Sets the keep ignorable whitespace setting value.
void setNamespaceAware(boolean namespaceAware) - Enables or disables namespace handling.
void setProperty(String uri, Object value) - Sets a SAX property on the underlying reader.
void setTrimWhitespace(boolean trimWhitespace) - Sets the trim whitespace setting value.
void setValidating(boolean validating) - Enables or disables validation.
void skippedEntity(String name) - Receives skipped entity notifications.
void startDocument() - Resets the current root node before SAX events for a new document begin.
void startElement(String namespaceURI, String localName, String qName, Attributes list) - Creates a new Node for the current element and pushes it onto the parse stack.
void startPrefixMapping(String prefix, String namespaceURI) - Receives namespace prefix mapping notifications.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.xml.sax.ContentHandler
declaration
-
Constructor Details
-
XmlParser
public XmlParser() throws ParserConfigurationException, SAXException
Creates a non-validating and namespace-aware XmlParser which does not allow DOCTYPE declarations in documents.
Parser options can be configured via setters before the first parse call:
// Using Groovy named parameters:
def parser = new XmlParser(namespaceAware: false, trimWhitespace: true)
- Throws:
ParserConfigurationException - if no parser which satisfies the requested configuration can be created.
SAXException - for SAX errors.
-
XmlParser
public XmlParser(boolean validating, boolean namespaceAware) throws ParserConfigurationException, SAXException
Creates an XmlParser which does not allow DOCTYPE declarations in documents.
- Parameters:
validating - true if the parser should validate documents as they are parsed; false otherwise.
namespaceAware - true if the parser should provide support for XML namespaces; false otherwise.
- Throws:
ParserConfigurationException - if no parser which satisfies the requested configuration can be created.
SAXException - for SAX errors.
-
XmlParser
public XmlParser(boolean validating, boolean namespaceAware, boolean allowDocTypeDeclaration) throws ParserConfigurationException, SAXException
Creates an XmlParser.
- Parameters:
validating - true if the parser should validate documents as they are parsed; false otherwise.
namespaceAware - true if the parser should provide support for XML namespaces; false otherwise.
allowDocTypeDeclaration - true if the parser should provide support for DOCTYPE declarations; false otherwise.
- Throws:
ParserConfigurationException - if no parser which satisfies the requested configuration can be created.
SAXException - for SAX errors.
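The effect of the allowDocTypeDeclaration flag can be seen with a document that carries an internal DTD subset. This is a minimal sketch, assuming a standard JDK SAX implementation underneath; the sample document and variable names are illustrative only.

```groovy
import groovy.xml.XmlParser
import org.xml.sax.SAXException

// A small document with an internal DTD subset.
def xml = '<!DOCTYPE root [<!ELEMENT root (#PCDATA)>]><root>hi</root>'

// Third argument true: DOCTYPE declarations are accepted.
def lenient = new XmlParser(false, true, true)
def lenientText = lenient.parseText(xml).text()

// No-argument constructor: DOCTYPE declarations are not allowed,
// so the same document is expected to fail with a SAX exception.
def rejected = false
try {
    new XmlParser().parseText(xml)
} catch (SAXException ignored) {
    rejected = true
}
```

Keeping DOCTYPE support off unless it is needed is the safer choice for untrusted input.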
-
XmlParser
public XmlParser(XMLReader reader)
Creates a parser backed by the supplied SAX reader.
- Parameters:
reader - the XML reader whose features, properties, and handlers will be used
-
XmlParser
public XmlParser(SAXParser parser) throws SAXException
Creates a parser backed by the supplied SAX parser.
- Parameters:
parser - the SAX parser providing the XMLReader used for parsing
- Throws:
SAXException - if the parser cannot provide an XML reader
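When the SAX parser must be configured by the caller, it can be built with the standard JAXP factory and handed to this constructor. The following is a sketch under that assumption; the factory settings and sample XML are illustrative.

```groovy
import groovy.xml.XmlParser
import javax.xml.parsers.SAXParserFactory

// Build and configure the SAX parser with the standard JAXP factory.
def factory = SAXParserFactory.newInstance()
factory.namespaceAware = true
def saxParser = factory.newSAXParser()

// Hand the pre-configured SAX parser to XmlParser.
def parser = new XmlParser(saxParser)
def root = parser.parseText('<root><item id="1"/></root>')
def itemId = root.item[0].@id
```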
-
-
Method Details
-
isTrimWhitespace
public boolean isTrimWhitespace()
Returns the current trim whitespace setting.
- Returns:
- true if whitespace will be trimmed
-
setTrimWhitespace
public void setTrimWhitespace(boolean trimWhitespace)
Sets the trim whitespace setting value.
- Parameters:
trimWhitespace - the desired setting value
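A short sketch of how the trim whitespace setting is expected to affect text content. Both parsers set the flag explicitly so the example does not rely on the default value; the sample XML is illustrative.

```groovy
import groovy.xml.XmlParser

def xml = '<greeting>  hello  </greeting>'

// Leading and trailing whitespace is removed from text content.
def trimming = new XmlParser()
trimming.trimWhitespace = true
def trimmed = trimming.parseText(xml).text()

// Text content is kept exactly as written.
def preserving = new XmlParser()
preserving.trimWhitespace = false
def preserved = preserving.parseText(xml).text()
```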
-
isKeepIgnorableWhitespace
public boolean isKeepIgnorableWhitespace()
Returns the current keep ignorable whitespace setting.
- Returns:
- true if ignorable whitespace will be kept (default false)
-
setKeepIgnorableWhitespace
public void setKeepIgnorableWhitespace(boolean keepIgnorableWhitespace)
Sets the keep ignorable whitespace setting value.
- Parameters:
keepIgnorableWhitespace - the desired new value
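The sketch below illustrates the intended difference between keeping and dropping whitespace-only text between elements. It assumes trimWhitespace is off in both cases (set explicitly here), and it checks only for the presence of whitespace-only String children rather than an exact child count.

```groovy
import groovy.xml.XmlParser

def xml = '<root> <a/> <b/> </root>'

// Whitespace-only text between elements is kept as String children.
def keeping = new XmlParser()
keeping.trimWhitespace = false
keeping.keepIgnorableWhitespace = true
def kept = keeping.parseText(xml).children()

// Whitespace-only text is dropped; only element nodes remain.
def dropping = new XmlParser()
dropping.trimWhitespace = false
dropping.keepIgnorableWhitespace = false
def dropped = dropping.parseText(xml).children()

def keptHasWhitespace = kept.any { it instanceof String && it.trim().isEmpty() }
def droppedOnlyNodes = dropped.every { it instanceof Node }
```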
-
parse
public Node parse(File file) throws IOException, SAXException
Parses the content of the given file as XML, turning it into a tree of Nodes.
- Parameters:
file - the File containing the XML to be parsed
- Returns:
- the root node of the parsed tree of Nodes
- Throws:
SAXException - any SAX exception, possibly wrapping another exception.
IOException - an IO exception from the parser, possibly from a byte stream or character stream supplied by the application.
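A minimal sketch of parsing from a file. A temporary file is created so the example is self-contained; the file name prefix and XML content are illustrative.

```groovy
import groovy.xml.XmlParser

// Write a small XML document to a temporary file.
def file = File.createTempFile('xmlparser-demo', '.xml')
file.deleteOnExit()
file.text = '<library><book title="Dune"/></library>'

// Parse the file and read an attribute from the first book element.
def root = new XmlParser().parse(file)
def title = root.book[0].@title
```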
-
parse
public Node parse(Path path) throws IOException, SAXException
Parses the content of the file at the given path as XML, turning it into a tree of Nodes.
- Parameters:
path - the path of the File containing the XML to be parsed
- Returns:
- the root node of the parsed tree of Nodes
- Throws:
SAXException - any SAX exception, possibly wrapping another exception.
IOException - an IO exception from the parser, possibly from a byte stream or character stream supplied by the application.
-
parse
public Node parse(InputSource input) throws IOException, SAXException
Parses the content of the specified input source into a tree of Nodes.
- Parameters:
input - the InputSource for the XML to parse
- Returns:
- the root node of the parsed tree of Nodes
- Throws:
SAXException - any SAX exception, possibly wrapping another exception.
IOException - an IO exception from the parser, possibly from a byte stream or character stream supplied by the application.
-
parse
public Node parse(InputStream input) throws IOException, SAXException
Parses the content of the specified input stream into a tree of Nodes. Note that this method gives the parser no base URI against which to resolve DTDs and other external resources.
- Parameters:
input - an InputStream containing the XML to be parsed
- Returns:
- the root node of the parsed tree of Nodes
- Throws:
SAXException - any SAX exception, possibly wrapping another exception.
IOException - an IO exception from the parser, possibly from a byte stream or character stream supplied by the application.
-
parse
public Node parse(Reader in) throws IOException, SAXException
Parses the content of the specified reader into a tree of Nodes. Note that this method gives the parser no base URI against which to resolve DTDs and other external resources.
- Parameters:
in - a Reader to read the XML to be parsed
- Returns:
- the root node of the parsed tree of Nodes
- Throws:
SAXException - any SAX exception, possibly wrapping another exception.
IOException - an IO exception from the parser, possibly from a byte stream or character stream supplied by the application.
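The stream and reader overloads can be exercised with in-memory sources, as in this brief sketch. The XML content and variable names are illustrative.

```groovy
import groovy.xml.XmlParser

def xml = '<root><msg>hi</msg></root>'

// Character-based source.
def fromReader = new XmlParser().parse(new StringReader(xml))

// Byte-based source; the encoding is chosen explicitly here.
def fromStream = new XmlParser().parse(new ByteArrayInputStream(xml.getBytes('UTF-8')))

def readerText = fromReader.msg.text()
def streamText = fromStream.msg.text()
```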
-
parse
public Node parse(String uri) throws IOException, SAXException
Parses the content of the specified URI into a tree of Nodes.
- Parameters:
uri - a String containing a URI pointing to the XML to be parsed
- Returns:
- the root node of the parsed tree of Nodes
- Throws:
SAXException - any SAX exception, possibly wrapping another exception.
IOException - an IO exception from the parser, possibly from a byte stream or character stream supplied by the application.
-
parseText
public Node parseText(String text) throws IOException, SAXException
A helper method to parse the given text as XML.
- Parameters:
text - the XML text to parse
- Returns:
- the root node of the parsed tree of Nodes
- Throws:
SAXException - any SAX exception, possibly wrapping another exception.
IOException - an IO exception from the parser, possibly from a byte stream or character stream supplied by the application.
-
parseTextAs
public <T> T parseTextAs(Class<T> type, String text)
Parses the content of the specified XML text into a typed object. Requires jackson-databind on the classpath for type conversion. Supports @JsonProperty and @JsonFormat annotations.
- Type Parameters:
T - the target type
- Parameters:
type - the target type
text - the XML text to parse
- Returns:
- a typed object
- Throws:
XmlRuntimeException - if parsing or conversion fails, or jackson-databind is absent
- Since:
- 6.0.0
-
parseAs
public <T> T parseAs(Class<T> type, Reader reader)
Parses XML from a reader into a typed object. Requires jackson-databind on the classpath for type conversion.
- Type Parameters:
T - the target type
- Parameters:
type - the target type
reader - the reader of XML
- Returns:
- a typed object
- Throws:
XmlRuntimeException - if parsing or conversion fails, or jackson-databind is absent
- Since:
- 6.0.0
-
parseAs
public <T> T parseAs(Class<T> type, InputStream stream)
Parses XML from an input stream into a typed object. Requires jackson-databind on the classpath for type conversion.
- Type Parameters:
T - the target type
- Parameters:
type - the target type
stream - the input stream of XML
- Returns:
- a typed object
- Throws:
XmlRuntimeException - if parsing or conversion fails, or jackson-databind is absent
- Since:
- 6.0.0
-
parseAs
public <T> T parseAs(Class<T> type, File file) throws IOException
Parses XML from a file into a typed object. Requires jackson-databind on the classpath for type conversion.
- Type Parameters:
T - the target type
- Parameters:
type - the target type
file - the XML file
- Returns:
- a typed object
- Throws:
IOException - if the file cannot be read
XmlRuntimeException - if parsing or conversion fails, or jackson-databind is absent
- Since:
- 6.0.0
-
parseAs
public <T> T parseAs(Class<T> type, Path path) throws IOException
Parses XML from a path into a typed object. Requires jackson-databind on the classpath for type conversion.
- Type Parameters:
T - the target type
- Parameters:
type - the target type
path - the path to the XML file
- Returns:
- a typed object
- Throws:
IOException - if the file cannot be read
XmlRuntimeException - if parsing or conversion fails, or jackson-databind is absent
- Since:
- 6.0.0
-
isNamespaceAware
public boolean isNamespaceAware()
Determines if namespace handling is enabled.
- Returns:
- true if namespace handling is enabled
-
setNamespaceAware
public void setNamespaceAware(boolean namespaceAware)
Enables or disables namespace handling. Must be set before the first parse call.
- Parameters:
namespaceAware - the new desired value
- Throws:
IllegalStateException - if called after parsing has started
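The practical effect of namespace handling shows up in the element names of the resulting tree. This sketch uses the two-argument constructor to fix the setting up front and assumes the usual behavior in which namespaced elements get a QName name when the parser is namespace-aware and a plain prefixed String otherwise. The namespace URI is illustrative.

```groovy
import groovy.xml.XmlParser

def xml = '<x:root xmlns:x="urn:example"><x:item/></x:root>'

// Namespace-aware: the element name carries the namespace URI and local part.
def awareName = new XmlParser(false, true).parseText(xml).name()

// Not namespace-aware: the element name is the raw qualified name.
def unawareName = new XmlParser(false, false).parseText(xml).name()
```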
-
isValidating
public boolean isValidating()
Determines if the parser validates documents.
- Returns:
- true if validation is enabled
- Since:
- 6.0.0
-
setValidating
public void setValidating(boolean validating)
Enables or disables validation. Must be set before the first parse call.
- Parameters:
validating - the new desired value
- Throws:
IllegalStateException - if called after parsing has started
- Since:
- 6.0.0
-
isAllowDocTypeDeclaration
public boolean isAllowDocTypeDeclaration()
Determines if DOCTYPE declarations are allowed.
- Returns:
- true if DOCTYPE declarations are allowed
- Since:
- 6.0.0
-
setAllowDocTypeDeclaration
public void setAllowDocTypeDeclaration(boolean allowDocTypeDeclaration)
Enables or disables DOCTYPE declaration support. Must be set before the first parse call.
- Parameters:
allowDocTypeDeclaration - the new desired value
- Throws:
IllegalStateException - if called after parsing has started
- Since:
- 6.0.0
-
getDTDHandler
public DTDHandler getDTDHandler()
Returns the SAX DTD handler configured on the underlying reader.
- Returns:
- the configured DTD handler, or null if none has been set
-
getEntityResolver
public EntityResolver getEntityResolver()
Returns the SAX entity resolver configured on the underlying reader.
- Returns:
- the configured entity resolver, or null if none has been set
-
getErrorHandler
public ErrorHandler getErrorHandler()
Returns the SAX error handler configured on the underlying reader.
- Returns:
- the configured error handler, or null if none has been set
-
getFeature
public boolean getFeature(String uri) throws SAXNotRecognizedException, SAXNotSupportedException
Looks up a SAX feature on the underlying reader.
- Parameters:
uri - the fully qualified SAX feature URI
- Returns:
- true if the feature is enabled
- Throws:
SAXNotRecognizedException - if the feature name is not recognized
SAXNotSupportedException - if the feature is recognized but not supported
-
getProperty
public Object getProperty(String uri) throws SAXNotRecognizedException, SAXNotSupportedException
Looks up a SAX property on the underlying reader.
- Parameters:
uri - the fully qualified SAX property URI
- Returns:
- the current value of the property
- Throws:
SAXNotRecognizedException - if the property name is not recognized
SAXNotSupportedException - if the property is recognized but not supported
-
setDTDHandler
public void setDTDHandler(DTDHandler dtdHandler)
Sets the SAX DTD handler on the underlying reader.
- Parameters:
dtdHandler - the DTD handler to receive notation and unparsed entity callbacks
-
setEntityResolver
public void setEntityResolver(EntityResolver entityResolver)
Sets the SAX entity resolver on the underlying reader.
- Parameters:
entityResolver - the resolver to use for external entities
-
setErrorHandler
public void setErrorHandler(ErrorHandler errorHandler)
Sets the SAX error handler on the underlying reader.
- Parameters:
errorHandler - the handler to receive parser warnings and errors
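A custom error handler can collect diagnostics instead of leaving them to the reader's default reporting. The sketch below assumes a standard JDK SAX implementation, which still aborts the parse after a fatal error; CollectingHandler is a hypothetical helper class written for this example.

```groovy
import groovy.xml.XmlParser
import org.xml.sax.ErrorHandler
import org.xml.sax.SAXException
import org.xml.sax.SAXParseException

// Hypothetical handler that records every reported problem.
class CollectingHandler implements ErrorHandler {
    List<String> messages = []
    void warning(SAXParseException e) { messages << ('warning: ' + e.message) }
    void error(SAXParseException e) { messages << ('error: ' + e.message) }
    void fatalError(SAXParseException e) { messages << ('fatal: ' + e.message) }
}

def handler = new CollectingHandler()
def parser = new XmlParser()
parser.setErrorHandler(handler)

// The <a> element is never closed, so the document is not well-formed.
def failed = false
try {
    parser.parseText('<root><a></root>')
} catch (SAXException ignored) {
    failed = true
}
def sawFatal = handler.messages.any { it.startsWith('fatal') }
```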
-
setFeature
public void setFeature(String uri, boolean value) throws SAXNotRecognizedException, SAXNotSupportedException
Enables or disables a SAX feature on the underlying reader.
- Parameters:
uri - the fully qualified SAX feature URI
value - the value to apply
- Throws:
SAXNotRecognizedException - if the feature name is not recognized
SAXNotSupportedException - if the feature is recognized but not supported
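A brief sketch of toggling a SAX feature and reading it back. It uses the standard SAX2 namespace-prefixes feature URI and assumes the underlying reader supports changing it before parsing, as the JDK's built-in reader does.

```groovy
import groovy.xml.XmlParser

def parser = new XmlParser()

// Standard SAX2 feature: report xmlns declarations as attributes.
def feature = 'http://xml.org/sax/features/namespace-prefixes'
parser.setFeature(feature, true)
def enabled = parser.getFeature(feature)

// The parser remains usable after the feature change.
def root = parser.parseText('<root><child/></root>')
```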
-
setProperty
public void setProperty(String uri, Object value) throws SAXNotRecognizedException, SAXNotSupportedException
Sets a SAX property on the underlying reader.
- Parameters:
uri - the fully qualified SAX property URI
value - the value to apply
- Throws:
SAXNotRecognizedException - if the property name is not recognized
SAXNotSupportedException - if the property is recognized but not supported
-
startDocument
public void startDocument() throws SAXException
Resets the current root node before SAX events for a new document begin.
- Specified by:
startDocument in interface ContentHandler
- Throws:
SAXException - if the SAX pipeline reports an error
-
endDocument
public void endDocument() throws SAXException
Completes the current parse and clears the internal element stack.
- Specified by:
endDocument in interface ContentHandler
- Throws:
SAXException - if the SAX pipeline reports an error
-
startElement
public void startElement(String namespaceURI, String localName, String qName, Attributes list) throws SAXException
Creates a new Node for the current element and pushes it onto the parse stack.
- Specified by:
startElement in interface ContentHandler
- Parameters:
namespaceURI - the namespace URI, or an empty string if namespaces are unavailable
localName - the local element name
qName - the qualified element name as reported by SAX
list - the element attributes
- Throws:
SAXException - if node creation fails
-
endElement
public void endElement(String namespaceURI, String localName, String qName) throws SAXException
Flushes buffered text and pops the current element when its end tag is seen.
- Specified by:
endElement in interface ContentHandler
- Parameters:
namespaceURI - the namespace URI, or an empty string if namespaces are unavailable
localName - the local element name
qName - the qualified element name as reported by SAX
- Throws:
SAXException - if text handling fails
-
characters
public void characters(char[] buffer, int start, int length) throws SAXException
Buffers character data until the enclosing element boundary is reached.
- Specified by:
characters in interface ContentHandler
- Parameters:
buffer - the character buffer supplied by SAX
start - the start offset in the buffer
length - the number of characters to read
- Throws:
SAXException - if the SAX pipeline reports an error
-
startPrefixMapping
public void startPrefixMapping(String prefix, String namespaceURI) throws SAXException
Receives namespace prefix mapping notifications. The default implementation does not retain separate prefix state.
- Specified by:
startPrefixMapping in interface ContentHandler
- Parameters:
prefix - the declared prefix
namespaceURI - the namespace URI bound to the prefix
- Throws:
SAXException - if the SAX pipeline reports an error
-
endPrefixMapping
public void endPrefixMapping(String prefix) throws SAXException
Receives namespace prefix scope end notifications. The default implementation performs no action.
- Specified by:
endPrefixMapping in interface ContentHandler
- Parameters:
prefix - the prefix leaving scope
- Throws:
SAXException - if the SAX pipeline reports an error
-
ignorableWhitespace
public void ignorableWhitespace(char[] buffer, int start, int len) throws SAXException
Receives ignorable whitespace and optionally preserves it as text content.
- Specified by:
ignorableWhitespace in interface ContentHandler
- Parameters:
buffer - the character buffer supplied by SAX
start - the start offset in the buffer
len - the number of characters to read
- Throws:
SAXException - if the SAX pipeline reports an error
-
processingInstruction
public void processingInstruction(String target, String data) throws SAXException
Receives processing instruction callbacks. The default implementation ignores processing instructions.
- Specified by:
processingInstruction in interface ContentHandler
- Parameters:
target - the processing instruction target
data - the processing instruction data
- Throws:
SAXException - if the SAX pipeline reports an error
-
getDocumentLocator
public Locator getDocumentLocator()
Returns the document locator last provided by SAX.
- Returns:
- the current locator, or null if parsing has not started
-
setDocumentLocator
public void setDocumentLocator(Locator locator)
Stores the locator supplied by SAX for later diagnostics or subclass use.
- Specified by:
setDocumentLocator in interface ContentHandler
- Parameters:
locator - the document locator for the current parse
-
skippedEntity
public void skippedEntity(String name) throws SAXException
Receives skipped entity notifications. The default implementation performs no action.
- Specified by:
skippedEntity in interface ContentHandler
- Parameters:
name - the skipped entity name
- Throws:
SAXException - if the SAX pipeline reports an error
-
getXMLReader
protected XMLReader getXMLReader()
Returns the configured XML reader after registering this parser as its content handler. Subclasses may override to customize reader preparation before parsing begins.
- Returns:
- the XML reader used for subsequent parse operations
-
addTextToNode
protected void addTextToNode()
Transfers buffered character data into the current node when an element boundary is reached. Subclasses may override to customize text normalization or whitespace preservation during parsing.
-
createNode
protected Node createNode(Node parent, Object name, Map attributes)
Creates a new node with the given parent, name, and attributes. The default implementation returns an instance of groovy.util.Node.
- Parameters:
parent - the parent node, or null if the node being created is the root node
name - an Object representing the name of the node (typically an instance of QName)
attributes - a Map of attribute names to attribute values
- Returns:
- a new Node instance representing the current node
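Because createNode is protected, a subclass can decorate nodes as they are built. The sketch below records the source line of each element by combining createNode with getDocumentLocator. LineTrackingParser and the synthetic 'line' attribute are hypothetical names for this example, and the line numbers depend on the underlying SAX implementation supplying a locator.

```groovy
import groovy.xml.XmlParser

// Hypothetical subclass that tags each node with the line it was read from.
class LineTrackingParser extends XmlParser {
    @Override
    protected Node createNode(Node parent, Object name, Map attributes) {
        def locator = getDocumentLocator()
        if (locator != null) {
            // Stored as a String, matching ordinary attribute values.
            attributes.put('line', String.valueOf(locator.lineNumber))
        }
        return super.createNode(parent, name, attributes)
    }
}

def root = new LineTrackingParser().parseText('<root>\n  <a/>\n</root>')
def rootLine = root.attribute('line')
def childLine = root.a[0].attribute('line')
```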
-
getElementName
protected Object getElementName(String namespaceURI, String localName, String qName)
Returns a name given the namespaceURI, localName and qName.
- Parameters:
namespaceURI - the namespace URI
localName - the local name
qName - the qualified name
- Returns:
- the newly created representation of the name
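Overriding getElementName is one way to change how names appear in the tree. As an illustrative sketch, the hypothetical LocalNameParser below discards namespace information so that namespaced elements can be navigated by plain local names.

```groovy
import groovy.xml.XmlParser

// Hypothetical subclass that names every node by its local name only.
class LocalNameParser extends XmlParser {
    @Override
    protected Object getElementName(String namespaceURI, String localName, String qName) {
        // Fall back to the qualified name when no local name is reported.
        return localName ? localName : qName
    }
}

def xml = '<x:root xmlns:x="urn:example"><x:item id="1"/></x:root>'
def root = new LocalNameParser().parseText(xml)
def itemId = root.item[0].@id
```

This trades namespace fidelity for convenience, so it only suits documents where local names are unambiguous.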
-