I have added support for namespaces to the parser.
This means you can now register a namespace so that namespaced XML can be parsed.
The usage of the class has also changed:
<?php
$xml = "http://www.protung.ro/feed/atom/"; // parse an Atom feed
// create a new object
$parser = new SimpleLargeXMLParser();
// load the XML
$parser->loadXML($xml);
// register the namespace
$parser->registerNamespace("atom", "http://www.w3.org/2005/Atom");
// this will get an array of entries
$array = $parser->parseXML("//atom:feed/atom:entry");
?>
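To see why the prefix has to be registered before the XPath query can match anything, here is a minimal sketch using PHP's built-in SimpleXML rather than SimpleLargeXMLParser. The inline feed is made up for illustration, and this is not a description of how the class works internally.

```php
<?php
// A tiny made-up Atom document, used only for this illustration.
$atom = <<<'XML'
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example feed</title>
  <entry><title>First post</title></entry>
  <entry><title>Second post</title></entry>
</feed>
XML;

$feed = simplexml_load_string($atom);

// Without a prefix, XPath looks for elements in no namespace, so nothing matches.
$unprefixed = $feed->xpath('//feed/entry');

// Bind the prefix "atom" to the Atom namespace URI, then query with that prefix.
$feed->registerXPathNamespace('atom', 'http://www.w3.org/2005/Atom');
$entries = $feed->xpath('//atom:feed/atom:entry');

$titles = [];
foreach ($entries as $entry) {
    // Ask explicitly for child elements that live in the Atom namespace.
    $titles[] = (string) $entry->children('http://www.w3.org/2005/Atom')->title;
}

print_r($titles);
```

The prefix itself is arbitrary; only the namespace URI has to match the one declared in the document.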
As always, you can download the new version from here.
Really good job!
Tested with a huge file (around 5 MB, 4000 entries): it takes less than 1 second to parse.
I’m having some headaches parsing an XML file. The problem is with hex entities inside the file (e.g. &#xe0; and so on): I cannot understand why, but the parsed output contains strange characters instead of preserving these entities.
I found your classes and tried putting some hex entities in example.xml, but I got the same results as before.
I tried a lot but cannot find a solution. Do you have one?
Make sure you surround the text in CDATA.
So if you have something like:
&#xe0;
make it like:
<![CDATA[&#xe0;]]>
Thank you, it works for text inside XML nodes. What doesn’t work yet is hex/decimal entities inside an attribute (e.g. <color TEXT="an entity">#333333</color>). In this case, surrounding the value with CDATA breaks the parsing.
I think the only solution is to pre-process the XML file and post-process the output. Thank you anyway, Dragos.
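The pre-processing idea could be sketched roughly like this: escape the ampersand of every numeric character reference before parsing, so the parser hands the reference back as literal text. The helper name `protectNumericEntities` and the sample markup are hypothetical, and the example parses with PHP's built-in SimpleXML instead of the class from this post.

```php
<?php
// Hypothetical helper: rewrites "&#xe0;" or "&#224;" as "&amp;#xe0;" / "&amp;#224;".
function protectNumericEntities(string $xml): string
{
    return preg_replace('/&(#(?:x[0-9a-fA-F]+|[0-9]+);)/', '&amp;$1', $xml);
}

// Made-up sample with one hex reference in an attribute and one decimal reference in text.
$raw  = '<color TEXT="&#xe0;">&#224; #333333</color>';
$safe = protectNumericEntities($raw);

$node = simplexml_load_string($safe);
$text = (string) $node;          // the reference survives as literal text
$attr = (string) $node['TEXT'];  // the same holds for the attribute value

echo $safe, "\n", $attr, "\n", $text, "\n";
```

One caveat: the regular expression does not know about CDATA sections, so a reference that already sits inside CDATA would come out as `&amp;#xe0;`. With this approach no post-processing of the output is needed for the references themselves.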
You have to escape the & (as &amp;), for example:
<node type='&amp;#xe0;'>&amp;#xe0;</node>
The same applies to node values if you do not wrap them in CDATA (it’s better with CDATA anyway).
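A small side-by-side check of the three cases, again as a sketch with PHP's built-in SimpleXML and made-up element names: an unescaped reference is resolved by the parser, while the escaped and CDATA forms keep the literal text.

```php
<?php
// Made-up document covering the three variants discussed above.
$xml = <<<'XML'
<root>
  <plain>&#xe0;</plain>
  <escaped type="&amp;#xe0;">&amp;#xe0;</escaped>
  <cdata><![CDATA[&#xe0;]]></cdata>
</root>
XML;

$root = simplexml_load_string($xml);

// Unescaped: the parser resolves the reference to the character itself (UTF-8 bytes C3 A0).
$plain = (string) $root->plain;

// Escaped ampersand: the reference comes back as literal text, in node values and attributes.
$escaped   = (string) $root->escaped;
$attribute = (string) $root->escaped['type'];

// CDATA: the content is not interpreted, so the reference also stays literal.
$cdata = (string) $root->cdata;

var_dump($plain, $escaped, $attribute, $cdata);
```

SimpleXML returns strings as UTF-8, so the resolved character can show up as strange characters if the output page uses a different encoding; that may be what happened in the original question.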
Thank you very much! You saved me!
(and sorry for the proliferation of “wrong” comments)