Issue
A string variable, oldHTMLContent, holds the response to a MailChimp API request: the current content of an email campaign. The string is shown below. It also contains many \r\n sequences that are not visible in this display:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
<style type="text/css">
@media only screen and (max-width: 480px) {
table#canspamBar td {
font-size:14px !important;
}
table#canspamBar td a {
display:block !important;
margin-top:10px !important;
}
}
</style>
</head>
<body>
<p> </p>
<div class="userBot">
<a href="http://dev.mydev.org/why-so-many-people-are-signing-up-for-cynthia-for-new-york-volunteer-events"><img src="http://dev.mydev.org/wp-content/uploads/2018/07/CynthiaNixon.jpg" width="1012" height="592" alt="CynthiaNixon.jpg"></a>
<p>When we ask ourselves why so many people are signing up for Cynthia For New York volunteer events this weekend, this is what ... (click for more)</p>
</div> <center>
<br>
<br>
<br>
<br>
<br>
<br>
<table border="0" cellpadding="0" cellspacing="0" width="100%" id="canspamBarWrapper" style="background-color:#FFFFFF;border-top:1px solid #E5E5E5;">
<tr>
<td align="center" valign="top" style="padding-top:20px;padding-bottom:20px;">
<table border="0" cellpadding="0" cellspacing="0" id="canspamBar">
<tr>
<td align="center" valign="top" style="color:#606060;font-family:Helvetica, Arial, sans-serif;font-size:11px;line-height:150%;padding-right:20px;padding-bottom:5px;padding-left:20px;text-align:center;">
This email was sent to <a href="mailto:*|EMAIL|*" target="_blank" style="color:#404040 !important;">*|EMAIL|*</a>
<br><a href="*|ABOUT_LIST|*" target="_blank" style="color:#404040 !important;"><em>why did I get this?</em></a> <a href="*|UNSUB|*" style="color:#404040 !important;">unsubscribe from this list</a> <a href="*|UPDATE_PROFILE|*" style="color:#404040 !important;">update subscription preferences</a>
<br>*|LIST:ADDRESSLINE|*
<br>
<br>
</td>
</tr>
</table>
</td>
</tr>
</table>
</center>
</body>
</html>
I want to extract just the element with the "userBot" class, but I can't seem to access it with getElementsByClassName.
When the code below executes, the printed length is always zero.
Dim oldHTMLContent As String
Dim oldHtmlDoc As MSHTML.HTMLDocument
Set oldHtmlDoc = New HTMLDocument
oldHtmlDoc.body.innerText = oldHTMLContent
Debug.Print oldHtmlDoc.getElementsByClassName("userBot").Length
How do I define the right object and load it with the HTML string so I can work with the userBot class? I can see I'm loading the whole DOM, including
Solution
Transfer the string as .innerHTML to the new HTMLDocument, then use a CSS class selector, ".", as shown below. Also, your naming seems a little confusing. IMO it would be clearer if you were transferring oldInnerHTML to newHTMLDoc, or something like that.
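Option Explicit
' Requires a reference to "Microsoft HTML Object Library" (Tools > References).
Public Sub test()
    Dim html As New HTMLDocument
    ' innerHTML parses the string as markup; innerText would store it as literal text.
    html.body.innerHTML = [A1] '<= This is your oldHTMLContent. I am reading from a cell.
    ' ".userBot" is a CSS class selector; querySelector returns the first match.
    Debug.Print html.querySelector(".userBot").innerText
End Sub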
This is the same as saying:
Debug.Print html.getElementsByClassName("userBot")(0).innerText
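To make the difference concrete, here is a minimal sketch that loads the same markup both ways and counts the matches, followed by a small helper that returns the markup of the first userBot element. The procedure names CompareInnerTextAndInnerHTML and ExtractUserBotMarkup and the shortened sample string are hypothetical, chosen only for illustration. The sketch assumes a reference to the Microsoft HTML Object Library is set. With innerText the markup is stored as plain text, so no element with the class should be found. With innerHTML it is parsed into elements.

```vba
Option Explicit

' Assumes a reference to "Microsoft HTML Object Library" (MSHTML).
' Procedure names and the sample string are hypothetical.
Public Sub CompareInnerTextAndInnerHTML()
    Dim sample As String
    sample = "<div class=""userBot""><p>Hello</p></div>"

    ' Loaded as text: the tags become literal characters, not elements.
    Dim asText As New MSHTML.HTMLDocument
    asText.body.innerText = sample
    Debug.Print "innerText matches: "; asText.querySelectorAll(".userBot").Length

    ' Loaded as HTML: the string is parsed into a DOM tree.
    Dim asHtml As New MSHTML.HTMLDocument
    asHtml.body.innerHTML = sample
    Debug.Print "innerHTML matches: "; asHtml.querySelectorAll(".userBot").Length
End Sub

' Returns the markup of the first element with class "userBot",
' or an empty string when the class is not present.
Public Function ExtractUserBotMarkup(ByVal htmlText As String) As String
    Dim doc As New MSHTML.HTMLDocument
    doc.body.innerHTML = htmlText

    Dim node As MSHTML.IHTMLElement
    Set node = doc.querySelector(".userBot")

    ' querySelector returns Nothing when there is no match,
    ' so guard before reading a property.
    If Not node Is Nothing Then
        ExtractUserBotMarkup = node.outerHTML
    End If
End Function
```

Calling ExtractUserBotMarkup(oldHTMLContent) would be one way to pull out the whole div, including its tags. Use innerText or innerHTML on the node instead if only the contents are needed.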
Answered By - QHarr Answer Checked By - Pedro (PHPFixing Volunteer)