@dj9090
Forked from chrisguitarguy/big-xml.php
Last active August 29, 2015 14:20

Revisions

  1. @chrisguitarguy revised this gist on Feb 27, 2013. 1 changed file with 1 addition and 1 deletion.
    big-xml.php: 2 changes (1 addition, 1 deletion)
    @@ -31,7 +31,7 @@
             // over those tags using XMLReader::next().
             while ($tag === $reader->name) {
     
    -            // since XMLReader doesn't really supply use with much of a usable
    +            // since XMLReader doesn't really supply us with much of a usable
                 // API, we can convert the current node to an instace of `SimpleXMLElement`
                 $elem = new \SimpleXMLElement($reader->readOuterXML());

  2. @chrisguitarguy created this gist on Feb 27, 2013.
    big-xml.php: 51 changes (51 additions, 0 deletions)
    @@ -0,0 +1,51 @@
    <?php
    /**
     * An example of how to read huge XML files relatively quickly and efficiently
     * using a few core PHP extensions (XMLReader and SimpleXML).
     *
     */

    // Assume your file is very large, 140MB or something like that.
    $fn = __DIR__ . '/some_file.xml';

    // The tag we want to extract from the file
    $tag = 'item';

    // We'll use XMLReader to "parse" the large XML file directly because it doesn't
    // load the entire tree into memory; it only "tokenizes" the file one node at a time.
    $reader = new \XMLReader();

    // Now open our file.
    if (!$reader->open($fn)) {
        throw new \RuntimeException("Could not open {$fn} with XMLReader");
    }

    // Loop through the file; read() just advances to the next node.
    // XMLReader isn't aware of the document tree as a whole, so nodes are
    // visited in the order they appear in the file. We'll just read until
    // the end of the file.
    while ($reader->read()) {

        // XMLReader::$name holds the current node's name; check whether it
        // matches the tag you're looking for. If it does, we can iterate
        // over the remaining matching tags with XMLReader::next().
        while ($tag === $reader->name) {

            // Since XMLReader doesn't really supply us with much of a usable
            // API, convert the current node to an instance of `SimpleXMLElement`.
            $elem = new \SimpleXMLElement($reader->readOuterXML());

            // Now use SimpleXMLElement as you normally would.
            foreach ($elem->children() as $child) {
                echo $child->getName(), ': ', $child, PHP_EOL;
            }

            // You can even read children in a specific namespace (here, Dublin Core).
            foreach ($elem->children('http://purl.org/dc/elements/1.1/') as $child) {
                echo "{http://purl.org/dc/elements/1.1/}", $child->getName(), ': ', $child, PHP_EOL;
            }

            // Skip past this element's children and move on to the next element named $tag.
            $reader->next($tag);
        }
    }
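The gist reads from a file on disk, which makes it awkward to try out. Below is a minimal, self-contained sketch of the same streaming pattern, assuming the `xmlreader` and `simplexml` extensions are enabled (they are in most PHP builds). Instead of a 140MB file it feeds `XMLReader::XML()` a small in-memory RSS-style document. The sample feed, the `$titles`/`$creators` arrays, and the extra `nodeType` check are illustrative additions, not part of the original gist.

```php
<?php
// Minimal sketch of the big-xml.php streaming pattern, run against an
// in-memory document so it needs no external file. The sample feed below is
// hypothetical illustration data standing in for the large file.

$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Example feed</title>
    <item>
      <title>First post</title>
      <dc:creator>Alice</dc:creator>
    </item>
    <item>
      <title>Second post</title>
      <dc:creator>Bob</dc:creator>
    </item>
  </channel>
</rss>
XML;

$tag = 'item';                              // element to extract, as in the gist
$dcNs = 'http://purl.org/dc/elements/1.1/'; // Dublin Core namespace URI

$reader = new XMLReader();
// XMLReader::XML() reads from a string; the gist uses open() with a file path.
if (!$reader->XML($xml)) {
    throw new RuntimeException('Could not load the XML string with XMLReader');
}

$titles = [];   // hypothetical result containers for this demo
$creators = [];

while ($reader->read()) {
    // Start and end tags share the same name, so also require an opening tag.
    while ($reader->nodeType === XMLReader::ELEMENT && $reader->name === $tag) {
        // Materialize only this one <item> as a small SimpleXMLElement.
        $elem = new SimpleXMLElement($reader->readOuterXml());

        $titles[] = (string) $elem->title;                      // un-namespaced child
        $creators[] = (string) $elem->children($dcNs)->creator; // dc:creator child

        // Skip this element's subtree and advance to the next node named $tag;
        // at the end of the document next() returns false and both loops end.
        $reader->next($tag);
    }
}
$reader->close();

foreach ($titles as $i => $title) {
    echo $title, ' by ', $creators[$i], PHP_EOL;
}
```

Run with the PHP CLI, it should print `First post by Alice` and `Second post by Bob`. The `nodeType` check is a defensive tweak: end tags report the same `name` as start tags, so comparing the name alone could in principle match a closing `</item>`.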