Package groovy.xml

Class XmlSlurper

All Implemented Interfaces:
ContentHandler, DTDHandler, EntityResolver, ErrorHandler

public class XmlSlurper extends DefaultHandler
Parses XML into a document tree that can be traversed with expressions similar to XPath.
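
A minimal sketch of this traversal style, using a small inline document chosen for illustration. Element names act as properties of the returned GPathResult, and attributes are read with the `@` prefix.

```groovy
import groovy.xml.XmlSlurper

// parseText returns a GPathResult positioned at the root element.
def rootNode = new XmlSlurper().parseText(
    '<root><one a1="uno!"/><two>Some text!</two></root>')

assert rootNode.name() == 'root'             // name of the root element
assert rootNode.one[0].@a1 == 'uno!'         // attribute a1 of the first <one>
assert rootNode.two.text() == 'Some text!'   // text content of <two>
rootNode.children().each { assert it.name() in ['one', 'two'] }
```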

Note that in some cases, a 'selector' expression may not resolve to a single node; it may match several nodes at once.
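
The following sketch shows a selector that matches two sibling elements. The document is made up for illustration; the point is that `rootNode.a` stands for both `<a>` elements until an index selects one.

```groovy
import groovy.xml.XmlSlurper

def rootNode = new XmlSlurper().parseText('''<root>
  <a>one!</a>
  <a>two!</a>
</root>''')

// The selector matches both <a> elements.
assert rootNode.a.size() == 2
rootNode.a.each { assert it.text() in ['one!', 'two!'] }

// Indexing picks a single node out of the matched set.
assert rootNode.a[1].text() == 'two!'

// text() on the whole set concatenates the text of every match.
assert rootNode.a.text() == 'one!two!'
```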

Navigation through the returned GPathResult is lazy: selectors are evaluated on demand, and the result is not an eagerly built groovy.util.Node tree.
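
A more realistic case is a book catalog. The XML below, including its element and attribute names, is hypothetical and exists only for this sketch. It shows collecting child text, filtering on an attribute, converting text to a number, and a depth-first search.

```groovy
import groovy.xml.XmlSlurper

// Hypothetical catalog used only for illustration.
def xml = '''<catalog>
  <book id="b1" available="true">
    <title>Groovy Basics</title>
    <author>A. Writer</author>
    <price>29.50</price>
  </book>
  <book id="b2" available="false">
    <title>XML in Depth</title>
    <author>B. Author</author>
    <price>41.00</price>
  </book>
</catalog>'''

def catalog = new XmlSlurper().parseText(xml)

// Collect every title in document order.
def titles = catalog.book.collect { it.title.text() }

// Filter on an attribute, then read a child of each match.
def inStockTitles = catalog.book
    .findAll { it.@available == 'true' }
    .collect { it.title.text() }

// Element text is a String; convert explicitly before doing arithmetic.
def total = catalog.book.collect { it.price.toBigDecimal() }.sum()

// '**' walks the tree depth-first, so nesting depth does not matter.
def authors = catalog.'**'.findAll { it.name() == 'author' }.collect { it.text() }
```

Comparing a node or attribute with `==` compares its text, which is why `it.@available == 'true'` works without calling `text()`.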

  • Constructor Details

    • XmlSlurper

      public XmlSlurper() throws ParserConfigurationException, SAXException
      Creates a non-validating and namespace-aware XmlSlurper which does not allow DOCTYPE declarations in documents.

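      Parser options can be configured before the first parse call, either by calling the setters directly or by passing Groovy named parameters, which call the same setters:

       // Using Groovy named parameters (each entry invokes the matching setter):
       def slurper = new XmlSlurper(namespaceAware: false, keepIgnorableWhitespace: true)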
       
      Throws:
      ParserConfigurationException - if no parser which satisfies the requested configuration can be created.
      SAXException - for SAX errors.
    • XmlSlurper

      public XmlSlurper(boolean validating, boolean namespaceAware) throws ParserConfigurationException, SAXException
      Creates an XmlSlurper which does not allow DOCTYPE declarations in documents.
      Parameters:
      validating - true if the parser should validate documents as they are parsed; false otherwise.
      namespaceAware - true if the parser should provide support for XML namespaces; false otherwise.
      Throws:
      ParserConfigurationException - if no parser which satisfies the requested configuration can be created.
      SAXException - for SAX errors.
    • XmlSlurper

      public XmlSlurper(boolean validating, boolean namespaceAware, boolean allowDocTypeDeclaration) throws ParserConfigurationException, SAXException
      Creates an XmlSlurper.
      Parameters:
      validating - true if the parser should validate documents as they are parsed; false otherwise.
      namespaceAware - true if the parser should provide support for XML namespaces; false otherwise.
      allowDocTypeDeclaration - true if the parser should provide support for DOCTYPE declarations; false otherwise.
      Throws:
      ParserConfigurationException - if no parser which satisfies the requested configuration can be created.
      SAXException - for SAX errors.
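
The effect of allowDocTypeDeclaration can be sketched as follows. The document and its inline DTD are hypothetical. The sketch assumes the default JAXP parser, which reports a disallowed DOCTYPE as a SAXException.

```groovy
import groovy.xml.XmlSlurper
import org.xml.sax.SAXException

// Hypothetical document with an internal DTD subset.
def xml = '''<!DOCTYPE note [
  <!ELEMENT note (#PCDATA)>
]>
<note>hello</note>'''

// The no-argument constructor rejects any DOCTYPE declaration.
def rejected = false
try {
    new XmlSlurper().parseText(xml)
} catch (SAXException e) {
    rejected = true
}

// validating = false, namespaceAware = true, allowDocTypeDeclaration = true
def note = new XmlSlurper(false, true, true).parseText(xml)
```

Allowing DOCTYPE declarations exposes the parser to entity-based attacks, so this setting is best reserved for trusted input.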
    • XmlSlurper

      public XmlSlurper(XMLReader reader)
      Creates a slurper backed by the supplied SAX reader.
      Parameters:
      reader - the XML reader whose features, properties, and handlers will be used
    • XmlSlurper

      public XmlSlurper(SAXParser parser) throws SAXException
      Creates a slurper backed by the supplied SAX parser.
      Parameters:
      parser - the SAX parser providing the XMLReader used for parsing
      Throws:
      SAXException - if the parser cannot provide an XML reader
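
A sketch of supplying a preconfigured SAXParser. The factory settings are illustrative choices, and the feature URI assumes a Xerces-based parser such as the JDK default. When a parser is supplied this way, the slurper uses whatever configuration the factory produced.

```groovy
import groovy.xml.XmlSlurper
import javax.xml.parsers.SAXParserFactory

def factory = SAXParserFactory.newInstance()
factory.setNamespaceAware(true)
factory.setValidating(false)
// Assumption: the parser recognizes this Xerces feature URI.
factory.setFeature('http://apache.org/xml/features/disallow-doctype-decl', true)

def slurper = new XmlSlurper(factory.newSAXParser())
def root = slurper.parseText('<config><entry key="mode">fast</entry></config>')
```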
  • Method Details

    • setKeepWhitespace

      @Deprecated public void setKeepWhitespace(boolean keepWhitespace)
      Deprecated.
      Use setKeepIgnorableWhitespace(boolean) instead.
      Parameters:
      keepWhitespace - If true then whitespace before elements is kept. The default is to discard the whitespace.
    • setKeepIgnorableWhitespace

      public void setKeepIgnorableWhitespace(boolean keepIgnorableWhitespace)
      Parameters:
      keepIgnorableWhitespace - If true then ignorable whitespace (i.e. whitespace before elements) is kept. The default is to discard the whitespace.
    • isKeepIgnorableWhitespace

      public boolean isKeepIgnorableWhitespace()
      Returns:
      true if ignorable whitespace is kept
    • isNamespaceAware

      public boolean isNamespaceAware()
      Determine if namespace handling is enabled.
      Returns:
      true if namespace handling is enabled
      Since:
      6.0.0
    • setNamespaceAware

      public void setNamespaceAware(boolean namespaceAware)
      Enable or disable namespace handling. Must be set before the first parse call.
      Parameters:
      namespaceAware - the new desired value
      Throws:
      IllegalStateException - if called after parsing has started
      Since:
      6.0.0
    • isValidating

      public boolean isValidating()
      Determine if the parser validates documents.
      Returns:
      true if validation is enabled
      Since:
      6.0.0
    • setValidating

      public void setValidating(boolean validating)
      Enable or disable validation. Must be set before the first parse call.
      Parameters:
      validating - the new desired value
      Throws:
      IllegalStateException - if called after parsing has started
      Since:
      6.0.0
    • isAllowDocTypeDeclaration

      public boolean isAllowDocTypeDeclaration()
      Determine if DOCTYPE declarations are allowed.
      Returns:
      true if DOCTYPE declarations are allowed
      Since:
      6.0.0
    • setAllowDocTypeDeclaration

      public void setAllowDocTypeDeclaration(boolean allowDocTypeDeclaration)
      Enable or disable DOCTYPE declaration support. Must be set before the first parse call.
      Parameters:
      allowDocTypeDeclaration - the new desired value
      Throws:
      IllegalStateException - if called after parsing has started
      Since:
      6.0.0
    • getDocument

      public GPathResult getDocument()
      Returns:
      The GPathResult instance created by consuming a stream of SAX events. Note that if one of the parse methods has been called, this returns null. If this method is called more than once, every call after the first returns null.
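
getDocument() is mainly useful when the slurper acts as a plain ContentHandler for a reader that something else drives. A minimal sketch of that arrangement, with the reader set up through standard JAXP calls and a made-up document:

```groovy
import groovy.xml.XmlSlurper
import javax.xml.parsers.SAXParserFactory
import org.xml.sax.InputSource

def factory = SAXParserFactory.newInstance()
factory.setNamespaceAware(true)
def reader = factory.newSAXParser().getXMLReader()

// The slurper only listens here; the external reader drives the parse.
def slurper = new XmlSlurper()
reader.setContentHandler(slurper)
reader.parse(new InputSource(new StringReader('<root><a>1</a><a>2</a></root>')))

// Claim the tree once and keep the reference for later navigation.
def doc = slurper.getDocument()
```

Because the result is handed out only once, store it in a variable instead of calling getDocument() repeatedly.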
    • parse

      public GPathResult parse(InputSource input) throws IOException, SAXException
      Parse the content of the specified input source into a GPathResult object.
      Parameters:
      input - the InputSource to parse
      Returns:
      An object which supports GPath expressions
      Throws:
      SAXException - Any SAX exception, possibly wrapping another exception.
      IOException - An IO exception from the parser, possibly from a byte stream or character stream supplied by the application.
    • parse

      public GPathResult parse(File file) throws IOException, SAXException
      Parses the content of the given file as XML, turning it into a GPathResult object.
      Parameters:
      file - the File to parse
      Returns:
      An object which supports GPath expressions
      Throws:
      SAXException - Any SAX exception, possibly wrapping another exception.
      IOException - An IO exception from the parser, possibly from a byte stream or character stream supplied by the application.
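
A short sketch of parsing from a file. The temporary file and its contents are created here only so that the example is self-contained.

```groovy
import groovy.xml.XmlSlurper

// Write a throwaway document to disk so there is something to parse.
def file = File.createTempFile('slurper-demo', '.xml')
file.deleteOnExit()
file.setText('<project name="demo"><module>core</module><module>web</module></project>', 'UTF-8')

def project = new XmlSlurper().parse(file)
def modules = project.module.collect { it.text() }
```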
    • parse

      public GPathResult parse(InputStream input) throws IOException, SAXException
      Parse the content of the specified input stream into a GPathResult object. Note that this method does not give the parser a base URI from which to locate DTDs and other external resources. It is up to you to close the InputStream after parsing is complete (if required).
      Parameters:
      input - the InputStream to parse
      Returns:
      An object which supports GPath expressions
      Throws:
      SAXException - Any SAX exception, possibly wrapping another exception.
      IOException - An IO exception from the parser, possibly from a byte stream or character stream supplied by the application.
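
Since closing the stream is left to the caller, one option is Groovy's withStream, which closes the stream when the closure returns. A sketch with an in-memory stream standing in for a real source:

```groovy
import groovy.xml.XmlSlurper
import java.nio.charset.StandardCharsets

def bytes = '<items><item>a</item><item>b</item></items>'.getBytes(StandardCharsets.UTF_8)

// withStream closes the stream after the closure finishes, even if parsing throws.
def items = new ByteArrayInputStream(bytes).withStream { stream ->
    new XmlSlurper().parse(stream)
}

// The tree was fully built during parse, so it remains usable after the close.
def values = items.item.collect { it.text() }
```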
    • parse

      public GPathResult parse(Reader in) throws IOException, SAXException
      Parse the content of the specified reader into a GPathResult object. Note that this method does not give the parser a base URI from which to locate DTDs and other external resources. It is up to you to close the Reader after parsing is complete (if required).
      Parameters:
      in - the Reader to parse
      Returns:
      An object which supports GPath expressions
      Throws:
      SAXException - Any SAX exception, possibly wrapping another exception.
      IOException - An IO exception from the parser, possibly from a byte stream or character stream supplied by the application.
    • parse

      public GPathResult parse(String uri) throws IOException, SAXException
      Parse the content of the specified URI into a GPathResult object.
      Parameters:
      uri - a String containing the URI to parse
      Returns:
      An object which supports GPath expressions
      Throws:
      SAXException - Any SAX exception, possibly wrapping another exception.
      IOException - An IO exception from the parser, possibly from a byte stream or character stream supplied by the application.
    • parse

      public GPathResult parse(Path path) throws IOException, SAXException
      Parses the content of the file at the given path as XML, turning it into a GPathResult object.
      Parameters:
      path - the path of the File to parse
      Returns:
      An object which supports GPath expressions
      Throws:
      SAXException - Any SAX exception, possibly wrapping another exception.
      IOException - An IO exception from the parser, possibly from a byte stream or character stream supplied by the application.
    • parseText

      public GPathResult parseText(String text) throws IOException, SAXException
      A helper method to parse the given text as XML.
      Parameters:
      text - a String containing XML to parse
      Returns:
      An object which supports GPath expressions
      Throws:
      SAXException - Any SAX exception, possibly wrapping another exception.
      IOException - An IO exception from the parser, possibly from a byte stream or character stream supplied by the application.
    • getDTDHandler

      public DTDHandler getDTDHandler()
      Returns the SAX DTD handler configured on the underlying reader.
      Returns:
      the configured DTD handler, or null if none has been set
    • getEntityResolver

      public EntityResolver getEntityResolver()
      Returns the SAX entity resolver configured on the underlying reader.
      Returns:
      the configured entity resolver, or null if none has been set
    • getErrorHandler

      public ErrorHandler getErrorHandler()
      Returns the SAX error handler configured on the underlying reader.
      Returns:
      the configured error handler, or null if none has been set
    • getFeature

      public boolean getFeature(String uri) throws SAXNotRecognizedException, SAXNotSupportedException
      Looks up a SAX feature on the underlying reader.
      Parameters:
      uri - the fully qualified SAX feature URI
      Returns:
      true if the feature is enabled
      Throws:
      SAXNotRecognizedException - if the feature name is not recognized
      SAXNotSupportedException - if the feature is recognized but not supported
    • getProperty

      public Object getProperty(String uri) throws SAXNotRecognizedException, SAXNotSupportedException
      Looks up a SAX property on the underlying reader.
      Parameters:
      uri - the fully qualified SAX property URI
      Returns:
      the current value of the property
      Throws:
      SAXNotRecognizedException - if the property name is not recognized
      SAXNotSupportedException - if the property is recognized but not supported
    • setDTDHandler

      public void setDTDHandler(DTDHandler dtdHandler)
      Sets the SAX DTD handler on the underlying reader.
      Parameters:
      dtdHandler - the DTD handler to receive notation and unparsed entity callbacks
    • setEntityResolver

      public void setEntityResolver(EntityResolver entityResolver)
      Sets the SAX entity resolver on the underlying reader.
      Parameters:
      entityResolver - the resolver to use for external entities
    • setEntityBaseUrl

      public void setEntityBaseUrl(URL base)
      Resolves entities using the supplied URL as the base for relative URLs.
      Parameters:
      base - The URL used to resolve relative URLs
    • setErrorHandler

      public void setErrorHandler(ErrorHandler errorHandler)
      Sets the SAX error handler on the underlying reader.
      Parameters:
      errorHandler - the handler to receive parser warnings and errors
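
A sketch of collecting validation problems through a custom ErrorHandler. The DTD and document are hypothetical. It assumes a parser that reports DTD validity violations as recoverable errors, as the JDK default does, so parsing continues and the tree is still returned.

```groovy
import groovy.xml.XmlSlurper
import org.xml.sax.ErrorHandler
import org.xml.sax.SAXParseException

def problems = []
def handler = [
    warning   : { SAXParseException e -> problems << ('warning: ' + e.message) },
    error     : { SAXParseException e -> problems << ('error: ' + e.message) },
    fatalError: { SAXParseException e -> throw e }   // not recoverable, so rethrow
] as ErrorHandler

// validating = true, namespaceAware = true, allowDocTypeDeclaration = true
def slurper = new XmlSlurper(true, true, true)
slurper.setErrorHandler(handler)

// <b> is not declared in the DTD, so the document is well-formed but invalid.
def xml = '''<!DOCTYPE root [
  <!ELEMENT root (a)>
  <!ELEMENT a (#PCDATA)>
]>
<root><b>oops</b></root>'''

def root = slurper.parseText(xml)
```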
    • setFeature

      public void setFeature(String uri, boolean value) throws SAXNotRecognizedException, SAXNotSupportedException
      Enables or disables a SAX feature on the underlying reader.
      Parameters:
      uri - the fully qualified SAX feature URI
      value - the value to apply
      Throws:
      SAXNotRecognizedException - if the feature name is not recognized
      SAXNotSupportedException - if the feature is recognized but not supported
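
setFeature and getFeature pass feature URIs straight to the underlying reader. The sketch below turns off external entity loading using the two standard SAX2 feature URIs, then reads one of them back. Whether a particular reader honors or reports them is an assumption, although the JDK default does.

```groovy
import groovy.xml.XmlSlurper

def slurper = new XmlSlurper()
def externalGeneral = 'http://xml.org/sax/features/external-general-entities'
def externalParameter = 'http://xml.org/sax/features/external-parameter-entities'

// Features must be set before parsing starts.
slurper.setFeature(externalGeneral, false)
slurper.setFeature(externalParameter, false)

// Read the setting back from the reader.
def generalEnabled = slurper.getFeature(externalGeneral)

def root = slurper.parseText('<root><child/></root>')
```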
    • setProperty

      public void setProperty(String uri, Object value) throws SAXNotRecognizedException, SAXNotSupportedException
      Sets a SAX property on the underlying reader.
      Parameters:
      uri - the fully qualified SAX property URI
      value - the value to apply
      Throws:
      SAXNotRecognizedException - if the property name is not recognized
      SAXNotSupportedException - if the property is recognized but not supported
    • startDocument

      public void startDocument() throws SAXException
      Resets the current slurped document before SAX events for a new parse begin.
      Specified by:
      startDocument in interface ContentHandler
      Overrides:
      startDocument in class DefaultHandler
      Throws:
      SAXException - if the SAX pipeline reports an error
    • startPrefixMapping

      public void startPrefixMapping(String tag, String uri) throws SAXException
      Records namespace prefix hints for later GPathResult navigation.
      Specified by:
      startPrefixMapping in interface ContentHandler
      Overrides:
      startPrefixMapping in class DefaultHandler
      Parameters:
      tag - the declared prefix
      uri - the namespace URI bound to the prefix
      Throws:
      SAXException - if the SAX pipeline reports an error
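
On the navigation side, namespaces are usually addressed through GPathResult.declareNamespace. The sketch below uses a hypothetical namespace URI and shows that the prefix used for lookup need not match the prefix in the document.

```groovy
import groovy.xml.XmlSlurper

// Hypothetical document; the namespace URI is made up for illustration.
def xml = '''<order xmlns:inv="urn:example:inventory">
  <inv:item>widget</inv:item>
  <note>fragile</note>
</order>'''

// The prefix 'i' is local to this lookup; only the URI has to match the document.
def order = new XmlSlurper().parseText(xml).declareNamespace(i: 'urn:example:inventory')

def item = order.'i:item'.text()
def note = order.note.text()
```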
    • startElement

      public void startElement(String namespaceURI, String localName, String qName, Attributes atts) throws SAXException
      Creates a slurper node for the current element and pushes it onto the parse stack.
      Specified by:
      startElement in interface ContentHandler
      Overrides:
      startElement in class DefaultHandler
      Parameters:
      namespaceURI - the namespace URI, or an empty string if namespaces are unavailable
      localName - the local element name
      qName - the qualified element name as reported by SAX
      atts - the element attributes
      Throws:
      SAXException - if node creation fails
    • ignorableWhitespace

      public void ignorableWhitespace(char[] buffer, int start, int len) throws SAXException
      Receives ignorable whitespace and optionally preserves it as text content.
      Specified by:
      ignorableWhitespace in interface ContentHandler
      Overrides:
      ignorableWhitespace in class DefaultHandler
      Parameters:
      buffer - the character buffer supplied by SAX
      start - the start offset in the buffer
      len - the number of characters to read
      Throws:
      SAXException - if the SAX pipeline reports an error
    • characters

      public void characters(char[] ch, int start, int length) throws SAXException
      Buffers character data until the surrounding element boundary is reached.
      Specified by:
      characters in interface ContentHandler
      Overrides:
      characters in class DefaultHandler
      Parameters:
      ch - the character buffer supplied by SAX
      start - the start offset in the buffer
      length - the number of characters to read
      Throws:
      SAXException - if the SAX pipeline reports an error
    • endElement

      public void endElement(String namespaceURI, String localName, String qName) throws SAXException
      Flushes buffered text and restores the parent node when an end tag is reached.
      Specified by:
      endElement in interface ContentHandler
      Overrides:
      endElement in class DefaultHandler
      Parameters:
      namespaceURI - the namespace URI, or an empty string if namespaces are unavailable
      localName - the local element name
      qName - the qualified element name as reported by SAX
      Throws:
      SAXException - if text handling fails
    • endDocument

      public void endDocument() throws SAXException
      Receives the end-of-document callback. The built tree remains available through the one-shot getDocument() result.
      Specified by:
      endDocument in interface ContentHandler
      Overrides:
      endDocument in class DefaultHandler
      Throws:
      SAXException - if the SAX pipeline reports an error