Revisions

  1. @carolineartz carolineartz revised this gist Apr 9, 2014. 2 changed files with 0 additions and 0 deletions.
    File renamed without changes.
  2. @carolineartz carolineartz renamed this gist Apr 9, 2014. 1 changed file with 0 additions and 0 deletions.
    File renamed without changes.
  3. @carolineartz carolineartz revised this gist Apr 9, 2014. 1 changed file with 118 additions and 0 deletions.
    118 changes: 118 additions & 0 deletions noko-giri-commandline-ref.txt
    @@ -0,0 +1,118 @@
    require 'nokogiri'
    require 'open-uri'

    # Get a Nokogiri::HTML::Document for the page we're interested in...

    doc = Nokogiri::HTML(URI.open('http://www.google.com/search?q=tenderlove')) # URI.open is provided by open-uri

    # Do funky things with it using Nokogiri::XML::Node methods...

    ####
    # Search for nodes by css
    doc.css('h3.r a.l').each do |link|
    puts link.content
    end

    doc.at_css('h3').content

    ####
    # Search for nodes by xpath
    doc.xpath('//h3/a[@class="l"]').each do |link|
    puts link.content
    end

    ####
    # Or mix and match.
    doc.search('h3.r a.l', '//h3/a[@class="l"]').each do |link|
    puts link.content
    end

    ####
    # Work with attributes
    xml = "<foo wam='bam'>bar</foo>"

    doc = Nokogiri::XML(xml)
    doc.at_css("foo").content # => "bar"
    doc.at_css("foo")["wam"] # => "bam"

    ####
    # Work with elements
    el = doc.at_css("foo")
    el.children # => NodeSet of child nodes (text nodes included)

    ####

    So for example if we wanted to know all the names of the food items in our
    document we simply say:
    > doc.xpath("//name").collect(&:text)
    => ["carrot", "tomato", "corn", "grapes", "orange", "pear", "apple"]

    If we were interested in the entire node we could leave off the
    .collect(&:text). What if we wanted to select all the names of food items that
    were best baked? This requires us to use what’s called an axis – we will
    first need to find the element “baked” but then go back up our XML elements to
    find which food the item is inside.
    > doc.xpath("//tag[text()='baked']/ancestor::node()/name").collect(&:text)
    => ["pear", "apple"]

    What if we were only interested in vegetables that were good for roasting?
    Just add //veggies:
    > doc.xpath("//veggies//tag[text()='roasted']/ancestor::node()/name").collect(&:text)
    => ["carrot", "tomato"]

    What about if we wanted to know all the tags ‘corn’ had? Again this is very
    easy:
    > doc.xpath("//name[text()='corn']/../tags/tag").collect(&:text)
    => ["raw", "boiled", "grilled"]

    We can even do searches matching the first character. Let’s say we wanted to
    know all the food items that started with the letter ‘c’:
    > doc.xpath("//name[starts-with(text(),'c')]").collect(&:text)
    => ["carrot", "corn"]

    You could also use [contains(text(),'rot')] and get back just carrot, which
    is useful when you want to do a partial match.

    ####
    # Traversal

    node.ancestors # Ancestors for <node>
    node.at('xpath') # Returns the first node matching the given XPath
    node.at_css('selector') # Returns the first node matching the given CSS selector

    node.xpath('xpath') # Returns nodes at given XPATH
    node.css('selector') # Returns nodes at given selector

    node.child # Returns the first child node
    node.children # Returns child nodes
    node.parent

    ####
    # Data manipulation

    node.name # Element name
    node.node_type

    node.content # Returns text as string
    # (aka: .inner_text, .text)
    node.content = '...'

    node.inner_html
    node.inner_html = '...'

    node.attribute_nodes # Returns attributes as nodes
    node.attributes # Returns attributes as hash

    ####
    # Tree manipulation

    node.add_next_sibling(other) # Place <other> after <node>
    node.add_previous_sibling(other) # Place <other> before <node>
    node.add_child(other) # Put <other> inside <node>

    node.after(data) # Put a new node after <node>
    node.before(data) # Put a new node before <node>

    node.parent = other # Reparents <node> inside <other>
  4. @carolineartz carolineartz created this gist Apr 9, 2014.
    662 changes: 662 additions & 0 deletions nokogiri-cheat.md
    @@ -0,0 +1,662 @@
    A digest of most of the methods documented at [nokogiri.org](http://nokogiri.org/). Reading [the source](https://github.com/sparklemotion/nokogiri) can help, too.

    Topics not covered: [RelaxNG validation](http://nokogiri.org/Nokogiri/XML/RelaxNG.html) or [Builder](http://nokogiri.org/Nokogiri/XML/Builder.html)
    See also: http://cheat.errtheblog.com/s/nokogiri

    Strings are always stored as UTF-8 internally. Methods that return text
    values will always return UTF-8 encoded strings. Methods that return XML (like
    to_xml, to_html and inner_html) will return a string encoded like the source
    document.

    More Resources
    * [sax-machine](https://github.com/pauldix/sax-machine)
    * [feedzirra](https://github.com/pauldix/feedzirra)
    * [elementor](https://github.com/nakajima/elementor)
    * [mechanize](http://mechanize.rubyforge.org/)
    * [markup_validity](https://github.com/tenderlove/markup_validity)
    * [XPath Reference](http://www.w3.org/TR/xpath/#path-abbrev)
    * [XPath Reference 2](http://msdn.microsoft.com/en-us/library/ms256122.aspx)
    * [CSS Selector Reference](http://msdn.microsoft.com/en-us/library/ie/hh772056(v=vs.85).aspx)
    * [StackOverflow top questions](http://stackoverflow.com/questions/tagged/nokogiri?sort=votes)

    ## Creating and working with Documents
    [Nokogiri::HTML::Document](http://nokogiri.org/Nokogiri/HTML/Document.html)
    [Nokogiri::XML::Document](http://nokogiri.org/Nokogiri/XML/Document.html)
    ``` ruby
    doc = Nokogiri(string_or_io) # Nokogiri will try to guess what type of document you are attempting to parse
    doc = Nokogiri::HTML(string_or_io) # [, url, encoding, options, &block]
    doc = Nokogiri::XML(string_or_io) # [, url, encoding, options, &block]
    # set options with block {|config| config.noblanks.noent.noerror.strict }
    # OR with a bitmask {|config| config.options = Nokogiri::XML::ParseOptions::NOBLANKS | Nokogiri::XML::ParseOptions::NOENT}
    # http://nokogiri.org/Nokogiri/XML/ParseOptions.html
    # doc = Nokogiri.parse(...)
    # doc = Nokogiri::XML.parse(...) #shortcut to Nokogiri::XML::Document.parse
    # doc = Nokogiri::HTML.parse(...) #shortcut to Nokogiri::HTML::Document.parse

    # document namespaces
    doc.collect_namespaces
    doc.remove_namespaces!
    doc.namespaces

    # shortcuts for creating new nodes
    doc.create_cdata(string, &block)
    doc.create_comment(string, &block)
    doc.create_element(name, *args, &block) # Create an element
    doc.create_element "div" # <div></div>
    doc.create_element "div", :class => "container" # <div class='container'></div>
    doc.create_element "div", "contents" # <div>contents</div>
    doc.create_element "div", "contents", :class => "container" # <div class='container'>contents</div>
    doc.create_element("div") { |node| node['class'] = "container" } # <div class='container'></div>
    doc.create_entity
    doc.create_text_node(string, &block)

    doc.root
    doc.root=node

    # A document is a Node, so see "Working with a Nokogiri::XML::Node" below
    ```
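
    The parse options and node factory methods listed above can be tried with a minimal sketch like the one below. The sample XML strings and variable names are made up for illustration; only `Nokogiri::XML`, the `strict` option, `errors`, `create_element`, and `add_child` come from Nokogiri.

    ```ruby
    require 'nokogiri'

    # Illustrative input: the second <item> is never closed.
    malformed = '<root><item>one</item><item>two</root>'

    # Default parsing recovers from errors and records them on the document.
    lenient = Nokogiri::XML(malformed)
    recovered_errors = lenient.errors.size

    # With the strict option set through the config block, parsing raises instead.
    strict_failed = begin
      Nokogiri::XML(malformed) { |config| config.strict }
      false
    rescue Nokogiri::XML::SyntaxError
      true
    end

    # create_element builds a node owned by the document; add_child attaches it.
    doc = Nokogiri::XML('<root><item>one</item></root>')
    extra = doc.create_element('item', 'two', :class => 'added')
    doc.root.add_child(extra)
    item_texts = doc.xpath('//item').map(&:text)
    ```

    Rescuing `Nokogiri::XML::SyntaxError` is one way to tell strict and lenient parsing apart; inspecting `doc.errors` after a lenient parse is the other.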

    ## Working with Fragments
    [Nokogiri::XML::DocumentFragment](http://nokogiri.org/Nokogiri/XML/DocumentFragment.html)
    [Nokogiri::HTML::DocumentFragment](http://nokogiri.org/Nokogiri/HTML/DocumentFragment.html)

    Generally speaking, unless you expect to have a DOCTYPE and a single root node, you don’t have a document, you have a fragment. For HTML, another rule of thumb is that documents have html and body tags, and fragments usually do not.

    A fragment is a [Node](http://nokogiri.org/Nokogiri/XML/Node.html), but is not a [Document](http://nokogiri.org/Nokogiri/XML/Document.html). If you need to call methods that are only available on Document, like `create_element`, call `fragment.document.create_element`.

    ```ruby
    fragment = Nokogiri::XML.fragment(string)
    fragment = Nokogiri::HTML.fragment(string, encoding = nil)
    # Note: Searching a fragment relative to the document root with xpath
    # will probably not return what you expect. You should search relative to
    # the current context instead. e.g.
    fragment.xpath('//*').size #=> 0
    fragment.xpath('.//*').size #=> 229
    ```
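
    As a rough illustration of the fragment notes above, the sketch below parses a small, made-up HTML snippet, searches it relative to the fragment, and reaches a Document-only method through `fragment.document`. The markup and variable names are assumptions chosen for the example.

    ```ruby
    require 'nokogiri'

    fragment = Nokogiri::HTML.fragment('<p>first</p><p>second <b>bold</b></p>')

    # A fragment has no <html>/<body> wrapper; its children are the parsed nodes.
    top_level = fragment.children.map(&:name)

    # Search relative to the fragment by starting the XPath with a dot.
    relative_count = fragment.xpath('.//p').size

    # CSS searches also work on the fragment.
    bold_text = fragment.at_css('b').text

    # Document-only helpers are reachable through fragment.document.
    note = fragment.document.create_element('em', 'note')
    fragment.add_child(note)
    serialized = fragment.to_html
    ```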

    ## Working with a [Nokogiri::XML::Node](http://nokogiri.org/Nokogiri/XML/Node.html)
    ``` ruby
    node = Nokogiri::XML::Node.new('name', document) # initialize a new node
    node = document.create_element('name') # shortcut

    node.document

    node.name # alias of node.node_name
    node.name= # alias of node.node_name=

    node.read_only?
    node.blank?

    # Type of Node
    node.type # alias of node.node_type
    node.cdata? # type == CDATA_SECTION_NODE
    node.comment? # type == COMMENT_NODE
    node.element? # type == ELEMENT_NODE alias node.elem?
    node.fragment? # type == DOCUMENT_FRAG_NODE (Document fragment node)
    node.html? # type == HTML_DOCUMENT_NODE
    node.text? # type == TEXT_NODE
    node.xml? # type == DOCUMENT_NODE (Document node type)
    # other types not covered by a convenience method
    # ATTRIBUTE_DECL: Attribute declaration type
    # ATTRIBUTE_NODE: Attribute node type
    # DOCB_DOCUMENT_NODE: DOCB document node type
    # DOCUMENT_TYPE_NODE: Document type node type
    # DTD_NODE: DTD node type
    # ELEMENT_DECL: Element declaration type
    # ENTITY_DECL: Entity declaration type
    # ENTITY_NODE: Entity node type
    # ENTITY_REF_NODE: Entity reference node type
    # NAMESPACE_DECL: Namespace declaration type
    # NOTATION_NODE: Notation node type
    # PI_NODE: PI node type
    # XINCLUDE_END: XInclude end type
    # XINCLUDE_START: XInclude start type

    # Attributes, like a hash that maps string keys to string values
    node['src'] # aliases: node.get_attribute, node.attr.
    node['src'] = 'value' # alias node.set_attribute
    node.key?('src') # alias node.has_attribute?
    node.keys
    node.values
    node.delete('src') # alias of node.remove_attribute
    node.each { |attr_name, attr_value| }
    # Node includes Enumerable, which works on these attribute names and values

    # Attribute Nodes
    node.attribute('src') # Get the attribute node with name src
    # Returns a Nokogiri::XML::Attr, a subclass of Nokogiri::XML::Node
    # that provides +.content=+ and +.value=+ to modify the attribute value
    node.attribute_nodes # returns an array of this Node's attributes as Attr objects.
    node.attribute_with_ns('src', 'namespace') # Get the attribute node with name and namespace
    node.attributes # Returns a hash containing the node's attributes.
    # The key is the attribute name without any namespace,
    # the value is a Nokogiri::XML::Attr representing the attribute.
    # If you need to distinguish attributes with the same name, but with different namespaces, use #attribute_nodes instead.




    # Traversing / Modifying
    # +node_or_tags+ can be a Node, a DocumentFragment, a NodeSet, or a string containing markup.
    ## Self
    node.traverse {|node| } # yields all children and self to a block, _recursively_.
    node.remove # alias of node.unlink # Unlink this node from its current context.
    node.replace(node_or_tags)
    # Replace this Node with +node_or_tags+.
    # Returns the reparented node (if +node_or_tags+ is a Node),
    # or returns a NodeSet (if +node_or_tags+ is a DocumentFragment, NodeSet, or string).
    node.swap(node_or_tags) # like +replace+, but returns self to support chaining
    ## Siblings
    node.next # alias of node.next_sibling # Returns the next sibling node
    node.next=(node_or_tags) # alias of node.add_next_sibling
    # Inserts node_or_tags after this node (as a sibling).
    # Returns the reparented node (if +node_or_tags+ is a Node)
    # or returns a NodeSet (if +node_or_tags+ is a DocumentFragment, NodeSet, or string.)
    node.after(node_or_tags) # like +next=+, but returns self to support chaining
    node.next_element # Returns the next Nokogiri::XML::Element sibling node.
    node.previous # alias of node.previous_sibling # Returns the previous sibling node
    node.previous=(node_or_tags) # alias of node.add_previous_sibling
    # Inserts node_or_tags before this node (as a sibling).
    # Returns the reparented node (if +node_or_tags+ is a Node)
    # or returns a NodeSet (if +node_or_tags+ is a DocumentFragment, NodeSet, or string.)
    node.before(node_or_tags) # just like +previous=+, but returns self to support chaining
    node.previous_element # Returns the previous Nokogiri::XML::Element sibling node.
    ## Parent
    node.parent
    node.parent=(node)
    ## Children
    node.child # returns a Node
    node.children # Get the list of children of this node as a NodeSet
    node.children=(node_or_tags)
    # Set the inner html for this Node
    # Returns the reparented node (if +node_or_tags+ is a Node),
    # or returns a NodeSet (if +node_or_tags+ is a DocumentFragment, NodeSet, or string).
    node.elements # alias: node.element_children # Get the list of child Elements of this node as a NodeSet.
    node.add_child(node_or_tags)
    # Add +node_or_tags+ as a child of this Node.
    # Returns the reparented node (if +node_or_tags+ is a Node),
    # or returns a NodeSet (if +node_or_tags+ is a DocumentFragment, NodeSet, or string.)
    node << node_or_tags # like above, but returns self to support chaining, e.g. root << child1 << child2
    node.first_element_child # Returns the first child node of this node that is an element.
    node.last_element_child # Returns the last child node of this node that is an element.
    ## Content / Children
    node.content # aliases node.text node.inner_text node.to_str
    node.content=(string) # Set the Node's content to a Text node containing +string+. The string gets XML escaped, and will not be interpreted as markup.
    node.inner_html # (*args) children.map { |x| x.to_html(*args) }.join
    node.inner_html=(node_or_tags)
    # Sets the inner html of this Node to +node_or_tags+
    # Returns self.
    # Also see related method +children=+





    ## Searching below (see Working with a Nodeset below)
    # see docs for namespace bindings, variable bindings, and custom xpath functions via a handler class
    node.search(*paths) # alias: node / path # paths can be XPath or CSS
    node.at(*paths) # alias node % path # Search for the first occurrence of path. Returns nil if nothing is found, otherwise a Node. (like search(path, ns).first)
    node.xpath(*paths) # search for XPath queries
    node.at_xpath(*paths) # like xpath(*paths).first
    node.css(*rules) # search for CSS rules
    node.at_css(*rules) # like css(*rules).first
    node > selector # Search this node's immediate children using a CSS selector


    # Searching above
    node.ancestors # list of ancestor nodes, closest to furthest, as a NodeSet.
    node.ancestors(selector) # ancestors that match the selector


    # Where am I?
    node.path # Returns the path associated with this Node
    node.css_path # Get the path to this node as a CSS expression
    node.matches?(selector) # does this node match this selector?
    node.line # line number from input
    node.pointer_id # internal pointer number

    # Namespaces
    node.add_namespace(prefix, href) # alias of node.add_namespace_definition
    # Adds a namespace definition with prefix using href value. The result is as
    # if parsed XML for this node had included an attribute
    # ‘xmlns:prefix=value'. A default namespace for this node (“xmlns=”) can be
    # added by passing ‘nil' for prefix. Namespaces added this way will not show
    # up in #attributes, but they will be included as an xmlns attribute when
    # the node is serialized to XML.
    node.default_namespace=(url)
    # Adds a default namespace, supplied as a string url href, to self. The
    # consequence is as if an xmlns attribute with the supplied argument were
    # present in the parsed XML. A default namespace set with this method will
    # not show up in #attributes, but when this node is serialized to XML an
    # “xmlns” attribute will appear. See also #namespace and #namespace=
    node.namespace # returns the default namespace set on this node (as with an “xmlns=” attribute), as a Namespace object.
    node.namespace=(ns)
    # Set the default namespace on this node (as would be defined with an
    # “xmlns=” attribute in XML source), as a Namespace object ns . Note that a
    # Namespace added this way will NOT be serialized as an xmlns attribute for
    # this node. You probably want #default_namespace= instead, or perhaps
    # #add_namespace_definition with a nil prefix argument.
    node.namespace_definitions
    # returns namespaces defined on self element directly, as an array of
    # Namespace objects. Includes both a default namespace (as in “xmlns=”), and
    # prefixed namespaces (as in “xmlns:prefix=”).
    node.namespace_scopes
    # returns namespaces in scope for self – those defined on self element
    # directly or any ancestor node – as an array of Namespace objects. Default
    # namespaces (“xmlns=” style) for self are included in this array; Default
    # namespaces for ancestors, however, are not. See also #namespaces
    node.namespaced_key?(attribute, namespace)
    # Returns true if attribute is set with namespace
    node.namespaces # Returns a Hash of {prefix => value} for all namespaces on this node and its ancestors.
    # This method returns the same namespaces as #namespace_scopes.
    #
    # Returns namespaces in scope for self – those defined on self element
    # directly or any ancestor node – as a Hash of attribute-name/value pairs.
    # Note that the keys in this hash are the XML attributes that would be used to
    # define this namespace, such as “xmlns:prefix”, not just the prefix.
    # Default namespace set on self will be included with key “xmlns”. However,
    # default namespaces set on ancestor will NOT be, even if self has no
    # explicit default namespace.
    # see also attribute_with_ns


    # Rubyisms
    node <=> another_node # Compare two Node objects with respect to their Document. Nodes from different documents cannot be compared.
    # uses xmlXPathCmpNodes "Compare two nodes w.r.t document order"
    node == another_node # compares pointer_id
    node.clone # alias node.dup # Copy this node. An optional depth may be passed in, but it defaults to a deep copy. 0 is a shallow copy, 1 is a deep copy.

    # Visitor pattern
    node.accept(visitor)# calls visitor.visit(self)

    # Write it out (sorted from most flexible/hardest to use to least flexible/easiest to use)
    node.write_to(io, *options)
    # Write Node to +io+ with +options+. +options+ modify the output of
    # this method. Valid options are:
    #
    # * +:encoding+ for changing the encoding
    # * +:indent_text+ the indentation text, defaults to one space
    # * +:indent+ the number of +:indent_text+ to use, defaults to 2
    # * +:save_with+ a combination of SaveOptions constants.
    # SaveOptions
    # AS_BUILDER: Save builder created document
    # AS_HTML: Save as HTML
    # AS_XHTML: Save as XHTML
    # AS_XML: Save as XML
    # DEFAULT_HTML: the default for HTML document
    # DEFAULT_XHTML: the default for XHTML document
    # DEFAULT_XML: the default for XML documents
    # FORMAT: Format serialized xml
    # NO_DECLARATION: Do not include declarations
    # NO_EMPTY_TAGS: Do not include empty tags
    # NO_XHTML: Do not save XHTML
    # e.g. node.write_to(io, :encoding => 'UTF-8', :indent => 2)
    node.write_html_to(io, options={}) # uses write_to with :save_with => DEFAULT_HTML option (libxml2.6 does dump_html)
    node.write_xhtml_to(io, options={}) # uses write_to with :save_with => DEFAULT_XHTML option (libxml2.6 does dump_html)
    node.write_xml_to(io, options={}) # uses write_to with :save_with => DEFAULT_XML option
    node.serialize # Serialize Node to a string using +options+, provided as a hash or block. Uses write_to (via StringIO)
    # node.serialize(:encoding => 'UTF-8', :save_with => FORMAT | AS_XML)
    # node.serialize(:encoding => 'UTF-8') do |config|
    # config.format.as_xml
    # end
    node.to_html(options={}) # serializes with :save_with => DEFAULT_HTML option (libxml2.6 does dump_html)
    node.to_xhtml(options={}) # serializes with :save_with => DEFAULT_XHTML option (libxml2.6 does dump_html)
    node.to_xml(options={}) # serializes with :save_with => DEFAULT_XML option
    node.to_s # document.xml? ? to_xml : to_html

    node.inspect
    node.pretty_print(pp) # to enhance pp

    # Utility
    node.encode_special_chars(str) # Encodes special characters :P
    node.fragment(tags) # Create a DocumentFragment containing tags that is relative to this context node.
    node.parse(string_or_io, options={})
    # Parse +string_or_io+ as a document fragment within the context of
    # *this* node. Returns a XML::NodeSet containing the nodes parsed from
    # +string_or_io+.

    # External subsets, like DTD declarations
    node.create_external_subset(name, external_id, system_id)
    node.create_internal_subset(name, external_id, system_id)
    node.external_subset
    node.internal_subset

    # Other:
    node.description # Fetch the Nokogiri::HTML::ElementDescription for this node. Returns nil on XML documents and on unknown tags.
    # e.g. if node is an <img> tag: Nokogiri::HTML::ElementDescription['img'] #=> #<Nokogiri::HTML::ElementDescription: img embedded image >
    node.decorate! # Decorate this node with the decorators set up in this node's Document. Used internally to provide Slop support and Hpricot compatibility via Nokogiri::Hpricot
    node.do_xinclude # options as a block or hash
    # Do xinclude substitution on the subtree below node. If given a block, a
    # Nokogiri::XML::ParseOptions object initialized from +options+, will be
    # passed to it, allowing more convenient modification of the parser options.

    ```
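
    A few of the Node methods above (hash-style attributes, sibling insertion, escaped content, and upward traversal) are combined in the sketch below. The HTML snippet and variable names are invented for the example, and it is a small demonstration, not a complete tour of the Node API.

    ```ruby
    require 'nokogiri'

    doc = Nokogiri::HTML('<html><body><div id="box"><img src="a.png"><p>Hi</p></div></body></html>')
    img = doc.at_css('img')

    # Attributes read and write like hash entries on the node.
    img['alt'] = 'logo'
    attribute_names = img.keys
    had_src = img.key?('src')
    img.delete('src')

    # Insert markup as a sibling; a string argument is parsed into nodes.
    para = doc.at_css('p')
    para.add_next_sibling('<span>after</span>')

    # content= escapes its argument, so it is never interpreted as markup.
    para.content = 'a < b'
    para_html = para.to_html

    # Walk upward from a node, closest ancestor first.
    ancestor_names = para.ancestors.map(&:name)

    # elements returns only element children, skipping text nodes.
    div_children = doc.at_css('#box').elements.map(&:name)
    ```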

    ## Working with a [Nokogiri::XML::NodeSet](http://nokogiri.org/Nokogiri/XML/NodeSet.html)
    ``` ruby
    nodes = Nokogiri::XML::NodeSet.new(document, list=[])

    # Set operations
    nodes | other_nodeset # UNION, i.e. merging the sets, returning a new set
    nodes + other_nodeset # UNION, i.e. merging the sets, returning a new set
    nodes & other_nodeset # INTERSECTION # i.e. return a new NodeSet with the common nodes only
    nodes - other_nodeset # DIFFERENCE Returns a new NodeSet containing the nodes in this NodeSet that aren't in other_nodeset
    nodes.include?(node)
    nodes.empty?
    nodes.length # alias nodes.size
    nodes.delete(node) # Delete node from the Nodeset, if it is a member. Returns the deleted node if found, otherwise returns nil.

    # List operations (includes Enumerable)
    nodes.each {|node| }
    nodes.first
    nodes.last
    nodes.reverse # Returns a new NodeSet containing all the nodes in the NodeSet in reverse order
    nodes.index(node) # returns the numeric index or nil
    nodes[3] # element at index 3
    nodes[3,4] # return a NodeSet of size 4, starting at index 3
    nodes[3..6] # or return a NodeSet using a range of indexes
    # alias nodes.slice
    nodes.pop # Removes the last element from set and returns it, or nil if the set is empty
    nodes.push(node) # alias nodes << node # Append node to the NodeSet.
    nodes.shift # Returns the first element of the NodeSet and removes it. Returns nil if the set is empty.
    nodes.filter(expr) # Filter this list for nodes that match expr. Implemented as the find_all call below, so it returns an Array rather than a NodeSet.
    # find_all { |node| node.matches?(expr) }

    nodes.children # Returns a new NodeSet containing all the children of all the nodes in the NodeSet

    # Content
    nodes.inner_html(*args) # Get the inner html of all contained Node objects
    nodes.inner_text # alias nodes.text

    # Convenience modifiers
    nodes.remove # alias of nodes.unlink # Unlink this NodeSet and all Node objects it contains from their current context.
    nodes.wrap("<div class='container'></div>") # wrap new xml around EACH NODE in a Nodeset
    nodes.before(datum) # Insert datum before the first Node in this NodeSet # e.g. first.before(datum)
    nodes.after(datum) # Insert datum after the last Node in this NodeSet # e.g. last.after(datum)
    nodes.attr(key, value) # set the attribute key to value on all Node objects in the NodeSet
    nodes.attr(key) { |node| 'value' } # set the attribute key to the result of the block on all Node objects in the NodeSet
    # alias nodes.attribute, nodes.set
    nodes.remove_attr(name) # removes the attribute from all nodes in the nodeset
    nodes.add_class(name) # Append the class attribute name to all Node objects in the NodeSet.
    nodes.remove_class(name = nil) # if nil, removes the class attribute from all nodes in the nodeset

    # Searching
    nodes.search(*paths) # alias nodes / path
    nodes.at(*paths) # alias nodes % path
    nodes.xpath(*paths)
    nodes.at_xpath(*paths)
    nodes.css(*rules)
    nodes.at_css(*rules)
    nodes > selector # Search this NodeSet's nodes' immediate children using CSS selector selector

    # Writing out
    nodes.to_a # alias nodes.to_ary # Return this list as an Array
    nodes.to_html(*args)
    nodes.to_s
    nodes.to_xhtml(*args)
    nodes.to_xml(*args)

    # Rubyisms
    nodes == nodes # Two NodeSets are equal if they contain the same number of elements and if each element is equal to the corresponding element in the other NodeSet
    nodes.dup # Duplicate this node set
    nodes.inspect
    ```
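
    The set operations and bulk modifiers above can be seen on a tiny list. This is only a sketch: the markup, class names, and the `data-seen` attribute are made up for the example.

    ```ruby
    require 'nokogiri'

    doc = Nokogiri::HTML('<ul><li class="a">1</li><li class="a b">2</li><li class="b">3</li></ul>')
    a = doc.css('li.a') # items 1 and 2
    b = doc.css('li.b') # items 2 and 3

    # Each operator returns a new NodeSet and leaves a and b untouched.
    union        = (a | b).map(&:text)
    intersection = (a & b).map(&:text)
    difference   = (a - b).map(&:text)

    # Bulk modifiers act on every node in the set.
    doc.css('li').add_class('item')
    doc.css('li').attr('data-seen', 'yes')
    classes = doc.css('li').map { |li| li['class'] }
    seen_count = doc.css('li[data-seen="yes"]').size
    ```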

    ## Miscellany
    ``` ruby
    nc = Nokogiri::HTML::NamedCharacters # a Nokogiri::HTML::EntityLookup
    nc[key] # like nc.get(key).try(:value) # e.g. nc['gt'] (62) or nc['rsquo'] (8217)
    nc.get(key) # returns a Nokogiri::HTML::EntityDescription
    # e.g. nc.get('rsquo') #=> #<struct Nokogiri::HTML::EntityDescription value=8217, name="rsquo", description="right single quotation mark, U+2019 ISOnum">

    # Adding a Processing Instruction (like <?xml-stylesheet?>)
    # Nokogiri::XML::ProcessingInstruction http://nokogiri.org/tutorials/modifying_an_html_xml_document.html
    pi = Nokogiri::XML::ProcessingInstruction.new(doc, "xml-stylesheet",'type="text/xsl" href="foo.xsl"')
    doc.root.add_previous_sibling(pi)
    ```


    ## [Reader](http://nokogiri.org/Nokogiri/XML/Reader.html) parsers
    Reader parsers can be used to parse very large XML documents quickly without the need to load the entire document into memory or write a SAX document parser. The reader makes each node in the XML document available exactly once, only moving forward, like a cursor.
    ``` ruby
    reader = Nokogiri::XML::Reader(string_or_io)
    # attrs
    # .encoding
    # .errors
    # .source

    # Reading
    reader.each {|node| } # node and reader are the same object. shortcut for while(node = self.read) yield(node); end;
    reader.read # Move the Reader forward through the XML document.

    node.name
    node.local_name

    # Attributes
    node.attribute('src')
    node.attribute_at(1)
    node.attribute_count
    node.attribute_nodes
    node.attributes
    node.attributes?

    # Content
    node.empty_element?
    node.self_closing?
    node.value # Get the text value of the node if present as a utf-8 encoded string. Does NOT advance the reader.
    node.value? # Does this node have a text value?
    node.inner_xml # Read the contents of the current node, including child nodes and markup into a utf-8 encoded string. Does NOT advance the reader
    node.outer_xml # Does NOT advance the reader

    node.base_uri # Get the xml:base of the node
    node.default? # Was an attribute generated from the default value in the DTD or schema?
    node.depth

    # Namespaces and the rest
    node.namespace_uri # Get the URI defining the namespace associated with the node
    node.namespaces # Get a hash of namespaces for this Node
    node.prefix # Get the shorthand reference to the namespace associated with the node.
    node.xml_version # Get the XML version of the document being read
    node.lang # Get the xml:lang scope within which the node resides.
    node.node_type
    # one of
    # TYPE_ATTRIBUTE
    # TYPE_CDATA
    # TYPE_COMMENT
    # TYPE_DOCUMENT
    # TYPE_DOCUMENT_FRAGMENT
    # TYPE_DOCUMENT_TYPE
    # TYPE_ELEMENT
    # TYPE_END_ELEMENT
    # TYPE_END_ENTITY
    # TYPE_ENTITY
    # TYPE_ENTITY_REFERENCE
    # TYPE_NONE
    # TYPE_NOTATION
    # TYPE_PROCESSING_INSTRUCTION
    # TYPE_SIGNIFICANT_WHITESPACE
    # TYPE_TEXT
    # TYPE_WHITESPACE
    # TYPE_XML_DECLARATION
    node.state # Get the state of the reader
    ```
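
    To make the cursor behaviour concrete, here is a minimal sketch that walks a small, made-up feed with the Reader and collects one attribute and the inner XML of each `entry` element. The element and attribute names are assumptions for the example.

    ```ruby
    require 'nokogiri'

    xml = '<feed><entry id="1">alpha</entry><entry id="2">beta</entry></feed>'
    reader = Nokogiri::XML::Reader(xml)

    ids = []
    texts = []
    reader.each do |node|
      # The reader visits opening and closing tags separately; keep only openers.
      next unless node.node_type == Nokogiri::XML::Reader::TYPE_ELEMENT
      next unless node.name == 'entry'

      ids << node.attribute('id')
      texts << node.inner_xml # does not advance the reader
    end
    ```

    Filtering on `TYPE_ELEMENT` matters because the same element name shows up again as a `TYPE_END_ELEMENT` when its closing tag is reached.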

    ## XSD Validation
    [XSD](http://nokogiri.org/XSD.html)
    [XSD::XMLParser](http://nokogiri.org/XSD/XMLParser.html)
    [XSD::XMLParser::Nokogiri](http://nokogiri.org/XSD/XMLParser/Nokogiri.html)
    ``` ruby
    xsd = Nokogiri::XML::Schema(string_or_io_to_schema_file)
    doc = Nokogiri::XML(File.read(PO_XML_FILE))

    xsd.valid?(doc) # => true/false

    xsd.validate(doc) # returns an array of SyntaxErrors
    xsd.validate(doc).each do |syntax_error|
    syntax_error.error?
    syntax_error.fatal?
    syntax_error.none?
    syntax_error.to_s
    syntax_error.warning?

    # undocumented read-only attributes
    syntax_error.code
    syntax_error.column
    syntax_error.domain
    syntax_error.file
    syntax_error.int1
    syntax_error.level
    syntax_error.line
    syntax_error.str1
    syntax_error.str2
    syntax_error.str3
    end


    # http://nokogiri.org/Nokogiri/XML/Schema.html
    # http://nokogiri.org/Nokogiri/XML/AttributeDecl.html
    # http://nokogiri.org/Nokogiri/XML/DTD.html
    # http://nokogiri.org/Nokogiri/XML/ElementDecl.html
    # http://nokogiri.org/Nokogiri/XML/ElementContent.html
    # http://nokogiri.org/Nokogiri/XML/EntityDecl.html
    # http://nokogiri.org/Nokogiri/XML/EntityReference.html

    doc.validate # validate it against its DTD, if it has one
    ```
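
    For a self-contained picture of schema validation, the sketch below defines a tiny inline schema (a hypothetical `order` element with a positive-integer `quantity`) and validates one conforming and one non-conforming document against it. Real schemas would normally be loaded from a file, as in the listing above.

    ```ruby
    require 'nokogiri'

    # Hypothetical schema: <order> must contain one positive-integer <quantity>.
    schema = Nokogiri::XML::Schema(<<~XSD)
      <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
        <xs:element name="order">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="quantity" type="xs:positiveInteger"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:schema>
    XSD

    good = Nokogiri::XML('<order><quantity>3</quantity></order>')
    bad  = Nokogiri::XML('<order><quantity>lots</quantity></order>')

    good_valid = schema.valid?(good)
    bad_valid  = schema.valid?(bad)

    # validate returns the individual errors; each one responds to message.
    messages = schema.validate(bad).map(&:message)
    ```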

    ## CSS Parsing
    [Nokogiri::CSS](http://nokogiri.org/Nokogiri/CSS.html)
    [Nokogiri::CSS::Node](http://nokogiri.org/Nokogiri/CSS/Node.html)
    [Nokogiri::CSS::Parser](http://nokogiri.org/Nokogiri/CSS/Parser.html)
    [Nokogiri::CSS::SyntaxError](http://nokogiri.org/Nokogiri/CSS/SyntaxError.html)
    [Nokogiri::CSS::Tokenizer](http://nokogiri.org/Nokogiri/CSS/Tokenizer.html)
    [Nokogiri::CSS::Tokenizer::ScanError](http://nokogiri.org/Nokogiri/CSS/Tokenizer/ScanError.html)
    ``` ruby
    # http://nokogiri.org/Nokogiri/CSS.html
    Nokogiri::CSS.parse('selector') # => returns an AST
    Nokogiri::CSS.xpath_for('selector', options={})

    # http://nokogiri.org/Nokogiri/CSS/Node.html
    # attr: type, value
    #methods
    # accept(visitor)
    # find_by_type
    # new
    # preprocess!
    # to_a
    # to_type
    # to_xpath
    # http://nokogiri.org/Nokogiri/CSS/Parser.html # a Racc generated Parser
    ```
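
    One way to see what `Nokogiri::CSS.xpath_for` does is to run the XPath it produces and compare the result with a plain CSS search. The snippet below is a sketch with made-up markup; the exact XPath string varies between Nokogiri versions, so it is not shown.

    ```ruby
    require 'nokogiri'

    doc = Nokogiri::HTML('<div class="r"><a class="l" href="/x">link</a></div>')

    # xpath_for returns an array with one XPath string per selector group.
    xpaths = Nokogiri::CSS.xpath_for('div.r a.l')

    # The generated XPath selects the same nodes as the CSS selector.
    via_xpath = doc.xpath(xpaths.first).map(&:text)
    via_css   = doc.css('div.r a.l').map(&:text)
    ```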


    ## XSLT Transformation
    [Nokogiri::XSLT](http://nokogiri.org/Nokogiri/XSLT.html)
    [Nokogiri::XSLT::Stylesheet](http://nokogiri.org/Nokogiri/XSLT/Stylesheet.html)
    ``` ruby
    doc = Nokogiri::XML(File.read('some_file.xml'))
    xslt = Nokogiri::XSLT(File.read('some_transformer.xslt'))
    puts xslt.transform(doc) # [, xslt_parameters]
    # xslt.serialize(doc) # to an XML string
    # xslt.apply_to(doc, params=[]) # equivalent to xslt.serialize(xslt.transform(doc, params))
    ```
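
    The same flow can be run without any files by keeping the stylesheet inline. The sketch below uses a made-up `people` document and a stylesheet that emits each `person` as text followed by a semicolon; both are assumptions for the example.

    ```ruby
    require 'nokogiri'

    doc = Nokogiri::XML('<people><person>Ada</person><person>Grace</person></people>')

    xslt = Nokogiri::XSLT(<<~XSL)
      <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
        <xsl:output method="text"/>
        <xsl:template match="/people">
          <xsl:for-each select="person">
            <xsl:value-of select="."/>
            <xsl:text>;</xsl:text>
          </xsl:for-each>
        </xsl:template>
      </xsl:stylesheet>
    XSL

    # apply_to runs the transformation and serializes the result in one call.
    output = xslt.apply_to(doc)
    ```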

    ## [SAX](http://nokogiri.org/Nokogiri/XML/SAX.html) Parsing
    Event-driven XML parsing appropriate for reading very large XML files without reading the entire document into memory. [The best documentation is in this file.](https://github.com/sparklemotion/nokogiri/blob/master/lib/nokogiri/xml/sax/document.rb)
    ``` ruby
    # Document template
    # Define any or all of these methods to get their notifications:
    # Your document doesn't have to subclass Nokogiri::XML::SAX::Document,
    # doing so just saves you from having to define all the sax methods,
    # rather than the few you need.
    class MyDocument < Nokogiri::XML::SAX::Document
    def xmldecl(version, encoding, standalone)
    end
    def start_document
    end
    def end_document
    end
    def start_element(name, attrs = [])
    end
    def end_element(name)
    end
    def start_element_namespace(name, attrs = [], prefix = nil, uri = nil, ns = [])
    end
    def end_element_namespace(name, prefix = nil, uri = nil)
    end
    def characters(string)
    end
    def comment(string)
    end
    def warning(string)
    end
    def error(string)
    end
    def cdata_block(string)
    end
    end

    # Standard Parser
    parser = Nokogiri::XML::SAX::Parser.new(MyDocument.new) # [, encoding = 'UTF-8']
    # A block can be passed to the parse methods to get the ParserContext before parsing, but you probably don't need that
    parser.parse(string_or_io)
    parser.parse_io(io) # [, encoding = "ASCII"]
    parser.parse_file(filename)
    parser.parse_memory(string)

    # If you want HTML correction features, instantiate this parser instead
    parser = Nokogiri::HTML::SAX::Parser.new(MyDocument.new)
    ```
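
    As a concrete, minimal handler built on the template above, the sketch below collects the text of every `title` element. The class name `TitleCollector` and the sample XML are hypothetical; only the callback names and the parser classes come from Nokogiri.

    ```ruby
    require 'nokogiri'

    # Hypothetical handler: gathers the text inside each <title> element.
    class TitleCollector < Nokogiri::XML::SAX::Document
      attr_reader :titles

      def initialize
        super
        @titles = []
        @in_title = false
      end

      def start_element(name, attrs = [])
        return unless name == 'title'

        @in_title = true
        @titles << String.new # start a fresh, mutable buffer
      end

      # characters may fire more than once per text node, so append.
      def characters(string)
        @titles.last << string if @in_title
      end

      def end_element(name)
        @in_title = false if name == 'title'
      end
    end

    handler = TitleCollector.new
    parser = Nokogiri::XML::SAX::Parser.new(handler)
    parser.parse('<books><book><title>Dune</title></book><book><title>Emma</title></book></books>')
    ```

    After parsing, `handler.titles` holds the collected strings; the state lives on the handler because the parser itself returns nothing useful.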

    (If you're a weirdo,) You can stream the XML manually using [Nokogiri::XML::SAX::PushParser](http://nokogiri.org/Nokogiri/XML/SAX/PushParser.html)
    The best documentation is [this file](https://github.com/sparklemotion/nokogiri/blob/master/lib/nokogiri/xml/sax/push_parser.rb).

    ## [Slop](http://nokogiri.org/Nokogiri/Decorators/Slop.html) decorator (Don’t use this)
    The ::Slop decorator implements method_missing such that methods may be used instead of CSS or XPath.
    See the bottom of [this page](http://nokogiri.org/tutorials/searching_a_xml_html_document.html)
    [Nokogiri.Slop](http://nokogiri.org/Nokogiri.html#method-c-Slop)
    [Nokogiri::XML::Document#slop!](http://nokogiri.org/Nokogiri/XML/Document.html#method-i-slop-21)
    [Nokogiri::Decorators::Slop](http://nokogiri.org/Nokogiri/Decorators/Slop.html)

    ``` ruby
    doc = Nokogiri::Slop(string_or_io)
    doc = Nokogiri(string_or_io).slop!
    doc = Nokogiri::HTML(string_or_io).slop!
    doc = Nokogiri::XML(string_or_io).slop!

    doc = Nokogiri::Slop(<<-eohtml)
    <html>
    <body>
    <p>first</p>
    <p>second</p>
    </body>
    </html>
    eohtml
    assert_equal('second', doc.html.body.p[1].text)


    doc = Nokogiri::Slop <<-EOXML
    <employees>
    <employee status="active">
    <fullname>Dean Martin</fullname>
    </employee>
    <employee status="inactive">
    <fullname>Jerry Lewis</fullname>
    </employee>
    </employees>
    EOXML

    # navigate!
    doc.employees.employee.last.fullname.content # => "Jerry Lewis"

    # access node attributes!
    doc.employees.employee.first["status"] # => "active"

    # use some xpath!
    doc.employees.employee("[@status='active']").fullname.content # => "Dean Martin"
    doc.employees.employee(:xpath => "@status='active'").fullname.content # => "Dean Martin"

    # use some css!
    doc.employees.employee("[status='active']").fullname.content # => "Dean Martin"
    doc.employees.employee(:css => "[status='active']").fullname.content # => "Dean Martin"
    ```