PHP simple large XML parser

September 6th, 2009 by Dragos

I needed a simple PHP script to parse large XML files fast and without huge memory consumption, so I’ve written a small class for this.

This class can be used to parse large XML files (it also works with small ones) quickly and with minimal memory consumption.
It can parse any valid XML document and convert it to an array. What it does not do is read the attributes of the nodes.
If you need that, contact me and I can implement it for you.
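As a rough illustration of the XML-to-array conversion described above, here is a minimal sketch using PHP's built-in SimpleXML extension. It is not the class's own code: `elementToArray` is a hypothetical name, and the array layout (leaf elements become strings, repeated tags become lists, attributes are ignored) is an assumption chosen to match the description. It is meant to be applied to one small element at a time, not to a whole large file.

```php
<?php
// Hypothetical helper (not part of SimpleLargeXMLParser): converts a
// SimpleXMLElement into a nested PHP array. Attributes are ignored.
function elementToArray(SimpleXMLElement $el)
{
    // Leaf element (no child elements): return its text content.
    if ($el->count() === 0) {
        return (string) $el;
    }

    // Count how often each child tag name occurs.
    $occurrences = [];
    foreach ($el->children() as $name => $child) {
        $occurrences[$name] = ($occurrences[$name] ?? 0) + 1;
    }

    $result = [];
    foreach ($el->children() as $name => $child) {
        $value = elementToArray($child);
        if ($occurrences[$name] > 1) {
            $result[$name][] = $value; // repeated tag: collect into a list
        } else {
            $result[$name] = $value;   // unique tag: store directly
        }
    }
    return $result;
}

// Usage on a fragment of example.xml:
$engines = simplexml_load_string(
    '<searchengines>'
    . '<engine><name>Google</name><website>http://www.google.com</website></engine>'
    . '<engine><name>Bing</name><website>http://www.bing.com</website></engine>'
    . '</searchengines>'
);
print_r(elementToArray($engines));
```

Storing repeated tags as lists keeps sibling elements such as the `engine` nodes from overwriting each other under the same array key.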

The class supports XPath, so you can parse just one part of the XML with the same performance as parsing the entire file (actually slightly faster, since there is less data to parse).
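One common way to combine low memory use with XPath in PHP is to stream the file with the built-in `XMLReader`, expand only the matching subtree into a `DOMDocument`, and run `DOMXPath` on that fragment. This is an assumption about the general approach, not a description of this class's internals. In the sketch below, `querySubtree` and its parameters are hypothetical; it returns the text content of every node the XPath expression selects inside the first element named `$tag`.

```php
<?php
// Hypothetical helper (not part of SimpleLargeXMLParser): streams $file with
// XMLReader, loads only the first element named $tag into a DOMDocument, and
// returns the text content of the nodes that $xpath selects in that subtree.
function querySubtree(string $file, string $tag, string $xpath): array
{
    $reader = new XMLReader();
    if (!$reader->open($file)) {
        throw new RuntimeException("Cannot open $file");
    }

    $values = [];
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === $tag) {
            // Copy just this subtree into a small standalone document.
            $doc = new DOMDocument();
            $doc->appendChild($doc->importNode($reader->expand(), true));

            foreach ((new DOMXPath($doc))->query($xpath) as $node) {
                $values[] = $node->textContent;
            }
            break; // stop here instead of streaming through the rest of the file
        }
    }
    $reader->close();
    return $values;
}

// Usage with a trimmed copy of example.xml written to a temporary file.
$file = tempnam(sys_get_temp_dir(), 'xml');
file_put_contents($file, '<myFirstNode><color-palettes>'
    . '<color type="txt">red</color><color type="hex">#FF0000</color>'
    . '<color type="hex">#0000FF</color>'
    . '</color-palettes></myFirstNode>');
print_r(querySubtree($file, 'color-palettes', "/color-palettes/color[@type='hex']"));
unlink($file);
```

Only the selected subtree is ever held as a DOM tree, so memory use depends on the size of that fragment rather than on the size of the whole file.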

Here is an example.

Let’s say we have the following XML file, called example.xml:

<?xml version="1.0" encoding="UTF-8" ?>
<!--
/**********************
 * Example XML (not that large, but it's for demo purposes only)
 **********************/
 -->
<myFirstNode>
	<color-palettes>
		<color type='txt'>red</color>
		<color type='txt'>yellow</color>
		<color type='txt'>lime</color>
		<color type='txt'>cyan</color>
		<color type='txt'>blue</color>
		<color type='txt'>magenta</color>
		<color type='txt'>white</color>
		<color type='txt'>black</color>
		<color type='hex'>#FF0000</color>
		<color type='hex'>#FFFF00</color>
		<color type='hex'>#00FF00</color>
		<color type='hex'>#00FFFF</color>
		<color type='hex'>#0000FF</color>
		<color type='hex'>#FF00FF</color>
		<color type='hex'>#FFFFFF</color>
		<color type='hex'>#000000</color>
	</color-palettes>
	<first-100-numbers>
		<number n='1'>1</number>
		<number n='2'>2</number>
		<number n='3'>3</number>
		...
		<number n='97'>97</number>
		<number n='98'>98</number>
		<number n='99'>99</number>
		<number n='100'>100</number>
	</first-100-numbers>
	<searchengines>
		<engine>
			<name>Google</name>
			<website>http://www.google.com</website>
		</engine>
		<engine>
			<name>Yahoo</name>
			<website>http://www.yahoo.com</website>
		</engine>
		<engine>
			<name>Bing</name>
			<website>http://www.bing.com</website>
		</engine>
	</searchengines>
</myFirstNode>

And here is the PHP code to extract some data from it as an array:

<?php

// include the class
require_once('SimpleLargeXMLParser.class.php');
$xml = "example.xml";

// get all colors in hex format as an array
$array = SimpleLargeXMLParser::parseXML($xml, "//myFirstNode/color-palettes/color[@type='hex']");

// get all numbers greater than 50 as an array
$array = SimpleLargeXMLParser::parseXML($xml, "//myFirstNode/first-100-numbers/number[@n>'50']");

// get all search engines as an array
$array = SimpleLargeXMLParser::parseXML($xml, "//myFirstNode/searchengines");

// get the full XML file as an array
// if you don't specify the first (root) node, the script will search for the root node and use it
// for performance reasons, it is better to specify it if you know it
$array = SimpleLargeXMLParser::parseXML($xml, "//myFirstNode"); 

?>
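The code comments above say that, if you don't specify the first node, the script searches for the root node. How the class does this isn't stated; one cheap approach, sketched below as an assumption, is to stream with `XMLReader` until the first element node, so only the beginning of the file is parsed. `findRootName` is a hypothetical helper used only for illustration.

```php
<?php
// Hypothetical helper (not part of SimpleLargeXMLParser): returns the name of
// the root element of $file, or null if no element is found.
// Only the beginning of the file is read, so this stays cheap for large files.
function findRootName(string $file): ?string
{
    $reader = new XMLReader();
    if (!$reader->open($file)) {
        throw new RuntimeException("Cannot open $file");
    }

    $name = null;
    while ($reader->read()) {
        // Comments and other nodes before the root are skipped;
        // the first element node is the root element.
        if ($reader->nodeType === XMLReader::ELEMENT) {
            $name = $reader->name;
            break;
        }
    }
    $reader->close();
    return $name;
}

// Usage: print the root element name of a small demo file.
$file = tempnam(sys_get_temp_dir(), 'xml');
file_put_contents($file, "<!-- demo -->\n<myFirstNode><searchengines/></myFirstNode>");
echo findRootName($file), "\n";
unlink($file);
```

Passing the root name explicitly, as the example above does with `//myFirstNode`, skips even this small lookup.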

A new version is available. See the follow-up post announcing the new version for more information about what’s new.

Download

Download here (the package includes some examples).


4 comments

  1. Batmunkh says:

    Hello,
    nice class, it saved me time.
    I have one question:

    How do I get node attribute values?

    :)

  2. Dragos says:

    I’ve updated the class; now you can get the attributes too.
    See this post: http://www.protung.ro/2009/10/new-php-simple-large-xml-parser-version/

  3. Vijay says:

    Hi,

    Such a nice class, dude,

    but I have a problem:

    I can’t read this type of XML: http://feeds2.feedburner.com/24thfloor

    Can you please provide some help here?

    I saved it as XML and manually added the tag …, and only then did it produce a result.

    I tried

    $array = SimpleLargeXMLParser::parseXML($xml, '//feed');

    and

    $array = SimpleLargeXMLParser::parseXML($xml, '//entry');

    but it doesn’t work.

    Please help,

    Thanks in advance
