Issue
I am trying to get the inner HTML of a DOMElement in PHP. Example markup:
<div>...</div>
<div id="target"><p>Here's some &nbsp; <em>funny</em> &nbsp; text</p></div>
<div>...</div>
<div>...</div>
Feeding the above string into the variable $html, I am doing:
$doc = new DOMDocument();
@$doc->loadHTML("<html><body>$html</body></html>");
$node = $doc->getElementById('target');
$markup = '';
foreach ($node->childNodes as $child) {
    $markup .= $child->ownerDocument->saveXML($child);
}
The resulting $markup string looks like this (converted to JSON to reveal the invisible characters):
"<p>Here's some \u00a0 <em>funny<\/em> \u00a0 text<\/p>"
All &nbsp; entities have been converted to Unicode non-breaking spaces, which breaks my application.
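As a small illustration of what is happening, the sketch below decodes a single `&nbsp;` entity by hand. Here `html_entity_decode()` only stands in for the parser's entity handling (an assumption made for demonstration, not a claim about what libxml calls internally); the point is that the decoded character is U+00A0, which is the two bytes `c2 a0` in UTF-8.

```php
<?php
// Decode the named entity to the character it represents (UTF-8 output).
$decoded = html_entity_decode('&nbsp;', ENT_QUOTES | ENT_HTML5, 'UTF-8');

// The raw bytes of the non-breaking space in UTF-8.
echo bin2hex($decoded), PHP_EOL;      // c2a0

// json_encode() escapes non-ASCII characters by default, which makes it visible.
echo json_encode($decoded), PHP_EOL;  // "\u00a0"
```

This is why the `str_replace("\xc2\xa0", ...)` hack further down matches: it targets exactly those two bytes.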
In my ideal world, there would be a way to retrieve the original string of HTML inside the target div as-is, without DOMDocument doing anything to it at all. That doesn't seem to be possible, so the next best thing would be to somehow turn off this character conversion. So far I've tried:
- Setting $doc->substituteEntities = false; with no result. Changing it to true doesn't help either.
- Toggling $doc->preserveWhiteSpace, with no change either way.
- Changing saveXML to saveHTML. Doesn't make a difference.
Finally I resorted to tacking on this hack, which works but doesn't feel like the right solution.
$markup = str_replace("\xc2\xa0", ' ', $markup);
Surely there is a better way?
Solution
You can use mb_convert_encoding() to convert the Unicode characters to their entities without touching your brackets and such:
<?php
$html = '
<div>...</div>
<div id="target"><p>Here\'s some &nbsp; <em>funny</em> &nbsp; text</p></div>
<div>...</div>
<div>...</div>
';
$doc = new DOMDocument();
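// Pass true to actually enable internal error handling; with no argument
// the call only reports the current setting.
libxml_use_internal_errors(true);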
$doc->loadHTML("<html><body>$html</body></html>");
$node = $doc->getElementById('target');
$markup = '';
foreach ($node->childNodes as $child) {
    $markup .= $child->ownerDocument->saveHTML($child);
}
$markup = mb_convert_encoding($markup, 'HTML-ENTITIES', 'UTF-8');
echo $markup;
Output:
<p>Here's some &nbsp; <em>funny</em> &nbsp; text</p>
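One caveat: the 'HTML-ENTITIES' pseudo-encoding used with mb_convert_encoding() is deprecated as of PHP 8.2. The following is a minimal sketch of an alternative, not part of the original answer. It wraps the same loop in a hypothetical helper, innerHtmlWithEntities(), and uses mb_encode_numericentity() instead, so non-ASCII characters come out as numeric entities (for example `&#160;` rather than `&nbsp;`) while ASCII markup is left alone. Whether numeric entities are acceptable for your application is an assumption you would need to check.

```php
<?php
// Hypothetical helper (name is illustrative, not from the original answer):
// returns the inner HTML of $node with non-ASCII characters as numeric entities.
function innerHtmlWithEntities(DOMNode $node): string
{
    $markup = '';
    foreach ($node->childNodes as $child) {
        // Serialize each child of the node, as in the answer above.
        $markup .= $node->ownerDocument->saveHTML($child);
    }

    // Conversion map: [start, end, offset, mask]. Everything from U+0080
    // upward is encoded; ASCII such as < > ' is not touched.
    $map = [0x80, 0x10FFFF, 0, 0x1FFFFF];

    return mb_encode_numericentity($markup, $map, 'UTF-8');
}

$html = '<div id="target"><p>Here\'s some &nbsp; <em>funny</em> &nbsp; text</p></div>';

libxml_use_internal_errors(true); // collect parser warnings instead of emitting them
$doc = new DOMDocument();
$doc->loadHTML("<html><body>$html</body></html>");

$result = innerHtmlWithEntities($doc->getElementById('target'));
echo $result, PHP_EOL;
```

If the named form is required, a targeted `str_replace("\u{00A0}", '&nbsp;', $markup)` on the serialized string is another option, at the cost of handling only that one character.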
Answered By - miken32 Answer Checked By - Katrina (PHPFixing Volunteer)