PHPFixing

Friday, March 4, 2022

[FIXED] How to parse and scrape the content of WordPress

Tags: custom-wordpress-pages, domdocument, php, wordpress, wordpress-theming

Issue

Is it possible to write a custom function that truncates the contents of a specific div on a blog post page, so the result can be used as the summary on the blog index page? Rather than using the_content() or the_excerpt(), I'd like something like the_customContent(): some PHP that reads the blog post and collects the content of the div with class "ThisIsTheContentToUse". The reason is that my blog posts have other content on the page above the part I want to use as the summary. So I either want to tell WP to ignore those blocks of content, or, probably easier, just tell WP where the content to truncate is, e.g. in the "ThisIsTheContentToUse" div. Is that possible?

If so... how? Can't seem to find anything online that defines this custom functionality - surely I can't be the first person to want to do this...?

Would apply_filters make this possible?

https://developer.wordpress.org/reference/hooks/the_content/

So, the blog post is structured as:

<div class="headerArea">
  <h2>The title is here</h2>
</div>
<div class="bullets">
  <ul>
  <li>Bullet 1</li>
  <li>Bullet 2</li>
  <li>Bullet 3</li>
  </ul>
</div>
<div class="ThisIsTheContentToUse">
  <p>The content starts here</p>
</div>

So, currently with the basic get_the_content, the result is:

"The title is here Bullet 1 Bullet 2 Bullet 3 The content starts here"

But what I want is just the content of the "ThisIsTheContentToUse" div.

So it would be:

"The content starts here"


Solution

There are several ways we could set this up. Two common ones are PHP's DOMDocument class and an old favorite of mine, regular expressions!

Using DOMDocument:

  1. First, we get the content using the get_the_content function.
  2. Then, we load that content into a DOMDocument.
  3. Finally, we loop over its div elements and pick the one that carries our class.
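$test = get_the_content();

// loadHTML() cannot parse an empty string, so skip posts without content.
if ($test !== '' && class_exists('DOMDocument'))
{
    $dom = new DOMDocument();

    $class_name = 'ThisIsTheContentToUse'; // This is the class name of your div element

    // Collect parser warnings quietly instead of silencing every error with @.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($test);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $nodes = $dom->getElementsByTagName('div');

    foreach ($nodes as $element)
    {
        // Split the class attribute so that only a whole class name matches.
        $element_classes = preg_split('/\s+/', trim($element->getAttribute('class')));

        if (in_array($class_name, $element_classes, true))
        {
            echo 'Using DOMDocument: ' . trim($element->nodeValue);
        }
    }
}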

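Which will output the label followed by the text content of the div, with the tags stripped. For the markup in the question that text is "The content starts here".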
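The same idea can be wrapped in a reusable function and tried outside WordPress. The following is only a minimal sketch under a few assumptions: extract_div_text() is a hypothetical helper name (not a WordPress function), the sample markup stands in for what get_the_content() would return, the class name is assumed to contain no quote characters, and DOMXPath is swapped in for the manual loop over the div elements.

```php
<?php
declare(strict_types=1);

/**
 * Return the text of the first <div> carrying the given class, or null.
 * extract_div_text() is a hypothetical helper, not part of WordPress.
 */
function extract_div_text(string $html, string $class_name): ?string
{
    // loadHTML() rejects an empty string, so bail out early.
    if (trim($html) === '') {
        return null;
    }

    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    // The XML encoding hint makes libxml read the fragment as UTF-8.
    $dom->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Pad the class attribute with spaces so only a whole class token matches.
    // Assumption: $class_name contains no quote characters.
    $query = sprintf(
        '//div[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $class_name
    );
    $nodes = $xpath->query($query);

    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    return trim($nodes->item(0)->textContent);
}

// Sample markup standing in for the post content.
$sample = '<div class="headerArea"><h2>The title is here</h2></div>'
    . '<div class="intro ThisIsTheContentToUse"><p>The content starts here</p></div>';

echo extract_div_text($sample, 'ThisIsTheContentToUse'), PHP_EOL;
```

In a theme, the call would presumably become extract_div_text(get_the_content(), 'ThisIsTheContentToUse'). Returning null when nothing matches leaves it to the caller to fall back to the regular excerpt.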


Using Regular Expressions:

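  1. We use the preg_match function with the s modifier, so that . also matches line breaks.
  2. This is the pattern: /<div class="ThisIsTheContentToUse">(.*?)<\/div>/s. It expects the class attribute to be written exactly like this and the div to contain no nested div.
$test = get_the_content();

// (.*?) is lazy, so the capture stops at the first closing </div>.
if (preg_match('/<div class="ThisIsTheContentToUse">(.*?)<\/div>/s', $test, $match))
{
    $new_excerpt = trim($match[1]);

    echo 'Using regular expressions: ' . $new_excerpt;
}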

Which will output the label followed by the inner HTML of the div, so unlike the DOMDocument version the <p> tags are kept.
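Since the question also asks for a truncated summary, the regular expression can be combined with a word limit. This is a rough sketch under stated assumptions, not a definitive implementation: summarize_div() is a hypothetical helper name, the default of 20 words is arbitrary, and the sample string stands in for the post content. The pattern is a little looser than the one above so that extra classes on the div are tolerated, but a nested div inside the target would still cut the match short.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: take the inner HTML of the first matching div,
 * strip the tags, and cut the text down to $max_words words.
 */
function summarize_div(string $html, string $class_name, int $max_words = 20): ?string
{
    // preg_quote() escapes the class name; the optional groups allow other
    // classes before or after it inside the same class attribute.
    $pattern = '/<div\s+class="(?:[^"]*\s)?' . preg_quote($class_name, '/')
        . '(?:\s[^"]*)?"[^>]*>(.*?)<\/div>/s';

    if (preg_match($pattern, $html, $match) !== 1) {
        return null;
    }

    // Drop the markup and collapse runs of whitespace into single spaces.
    $text = trim((string) preg_replace('/\s+/', ' ', strip_tags($match[1])));
    if ($text === '') {
        return '';
    }

    $words = explode(' ', $text);
    if (count($words) <= $max_words) {
        return $text;
    }

    // Keep the first $max_words words and mark the cut.
    return implode(' ', array_slice($words, 0, $max_words)) . '...';
}

// Sample markup standing in for the post content.
$sample = "<div class=\"headerArea\"><h2>The title is here</h2></div>\n"
    . "<div class=\"ThisIsTheContentToUse\">\n"
    . "  <p>The content starts here and continues with a few more words</p>\n"
    . "</div>";

echo summarize_div($sample, 'ThisIsTheContentToUse', 4), PHP_EOL;
```

Inside WordPress, wp_trim_words() could probably replace the manual word cut, and the DOMDocument approach is likely the safer choice whenever the target div may contain other div elements.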



Answered By - Ruvee