Find and Extract All Headings From a Web Page in PHP

In this article we will look at how to find and extract all headings from a web page in PHP. The walkthrough goes step by step so that each part is easy to follow.

This PHP tutorial shows how to find and extract all headings from a web page. We will fetch the HTML content of a page by its URL, then extract every heading it contains and list them. To do this, we will use PHP’s DOMDocument class.

Header tags, also known as heading tags, are used to separate headings and subheadings on a webpage. They rank in order of importance, from H1 to H6, with H1s usually being the title. Header tags improve the readability and SEO of a webpage.

HTML heading tags range from H1 to H6, for example:

<h1>...</h1>
<h2>...</h2>
...
<h6>...</h6>

PHP’s DOMDocument class is also referred to as the PHP DOM parser. We will go step by step through finding and extracting all headings from an HTML document using this parser.
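Before working with a live URL, it can help to see the parser on a small inline string. The following is a minimal sketch; the markup in `$html` is made up for illustration and is not taken from any real page.

```php
<?php
// Illustrative markup only (an assumption for this sketch).
$html = '<html><body><h1>Main Title</h1><h2>First Section</h2>'
      . '<p>Text</p><h2>Second Section</h2></body></html>';

$dom = new DOMDocument();

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

// getElementsByTagName() returns a DOMNodeList of matching elements.
$h2Texts = [];
foreach ($dom->getElementsByTagName('h2') as $node) {
    $h2Texts[] = trim($node->textContent);
}

echo implode(', ', $h2Texts), PHP_EOL; // prints "First Section, Second Section"
```

The same two calls, `loadHTML()` and `getElementsByTagName()`, are all the full example needs.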

Let’s get started.


Find & Extract All Headings From a Web Page

In this example we will use a web page URL to fetch the page and extract all of its headings.

Create a file named index.php inside your application.

Open index.php and add the following code to it.

<?php 

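$htmlString = file_get_contents('https://onlinewebtutorblog.com/');

//Stop early if the page could not be fetched or came back empty.
if ($htmlString === false || $htmlString === '') {
    exit('Unable to fetch the web page.');
}

//Create a new DOMDocument object.
$htmlDom = new DOMDocument;

//Load the HTML string into our DOMDocument object.
//Real-world HTML is rarely fully valid, so parser warnings are collected internally instead of being printed.
libxml_use_internal_errors(true);
$htmlDom->loadHTML($htmlString);
libxml_clear_errors();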

//Extract all h1 elements / tags from the HTML.
$h1Tags = $htmlDom->getElementsByTagName('h1');

//Extract all h2 elements / tags from the HTML.
$h2Tags = $htmlDom->getElementsByTagName('h2');

//Extract all h3 elements / tags from the HTML.
$h3Tags = $htmlDom->getElementsByTagName('h3');

//Extract all h4 elements / tags from the HTML.
$h4Tags = $htmlDom->getElementsByTagName('h4');

//Extract all h5 elements / tags from the HTML.
$h5Tags = $htmlDom->getElementsByTagName('h5');

//Extract all h6 elements / tags from the HTML.
$h6Tags = $htmlDom->getElementsByTagName('h6');

// Arrays to store H1 to H6 headings
$extractedH1Tags = [];
$extractedH2Tags = [];
$extractedH3Tags = [];
$extractedH4Tags = [];
$extractedH5Tags = [];
$extractedH6Tags = [];

// Loop for h1
foreach($h1Tags as $h1Tag){

    //Get the node value of h1 tag
    $h1Value = trim($h1Tag->nodeValue);

    $extractedH1Tags[] = $h1Value;
}

// Loop for h2
foreach($h2Tags as $h2Tag){

    //Get the node value of h2 tag
    $h2Value = trim($h2Tag->nodeValue);

    $extractedH2Tags[] = $h2Value;
}

// Loop for h3
foreach($h3Tags as $h3Tag){

    //Get the node value of h3 tag
    $h3Value = trim($h3Tag->nodeValue);

    $extractedH3Tags[] = $h3Value;
}

// Loop for h4
foreach($h4Tags as $h4Tag){

    //Get the node value of h4 tag
    $h4Value = trim($h4Tag->nodeValue);

    $extractedH4Tags[] = $h4Value;
}

// Loop for h5
foreach($h5Tags as $h5Tag){

    //Get the node value of h5 tag
    $h5Value = trim($h5Tag->nodeValue);

    $extractedH5Tags[] = $h5Value;
}

// Loop for h6
foreach($h6Tags as $h6Tag){

    //Get the node value of h6 tag
    $h6Value = trim($h6Tag->nodeValue);

    $extractedH6Tags[] = $h6Value;
}

$headingsArray = [
  "h1" => $extractedH1Tags,
  "h2" => $extractedH2Tags,
  "h3" => $extractedH3Tags,
  "h4" => $extractedH4Tags,
  "h5" => $extractedH5Tags,
  "h6" => $extractedH6Tags
];

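// Escape the printed array so heading text containing < or & displays correctly, and close the <pre> tag.
echo "<pre>" . htmlspecialchars(print_r($headingsArray, true)) . "</pre>";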
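The six near-identical loops can also be folded into one. The sketch below is one possible compact variant, not a replacement for the code above; `extractHeadings()` is a hypothetical helper name, and the sample markup is made up so the snippet runs without a network request.

```php
<?php
/**
 * Hypothetical helper: returns heading texts grouped by tag name (h1 to h6).
 * Assumes $html is a non-empty string; PHP 8 rejects an empty string in loadHTML().
 */
function extractHeadings(string $html): array
{
    $dom = new DOMDocument();

    // Silence warnings about imperfect markup, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $headings = [];
    foreach (['h1', 'h2', 'h3', 'h4', 'h5', 'h6'] as $tag) {
        $headings[$tag] = [];
        foreach ($dom->getElementsByTagName($tag) as $node) {
            $headings[$tag][] = trim($node->textContent);
        }
    }

    return $headings;
}

// Made-up sample markup for illustration.
$sample = '<h1>Title</h1><h2>Intro</h2><h2> Setup </h2><h4>Note</h4>';
$result = extractHeadings($sample);

echo json_encode($result), PHP_EOL;
```

Levels with no headings still appear as empty arrays, which matches the shape of `$headingsArray` in the full example.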

Output

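When we run index.php, it prints the $headingsArray array: one key per heading level, h1 to h6, each holding the text of the headings found at that level.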
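Grouping by level loses the order in which headings appear on the page. If a page outline is wanted instead, one option is an XPath query through PHP’s DOMXPath class. This is a sketch of that alternative technique, not part of the tutorial’s script; `headingOutline()` is a hypothetical helper name and the sample markup is made up.

```php
<?php
/**
 * Hypothetical helper: returns headings in document order as tag/text pairs.
 * Assumes $html is a non-empty string.
 */
function headingOutline(string $html): array
{
    $dom = new DOMDocument();

    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // A single path expression walks the tree once, so matches come back in page order.
    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query(
        '//*[self::h1 or self::h2 or self::h3 or self::h4 or self::h5 or self::h6]'
    );

    $outline = [];
    foreach ($nodes as $node) {
        $outline[] = [
            'tag'  => strtolower($node->nodeName),
            'text' => trim($node->textContent),
        ];
    }

    return $outline;
}

// Made-up sample markup for illustration.
$outline = headingOutline('<h1>Title</h1><h2>Intro</h2><h3>Detail</h3><h2>Setup</h2>');

foreach ($outline as $item) {
    echo $item['tag'], ': ', $item['text'], PHP_EOL;
}
```

Each entry keeps its tag name, so the nesting of the page can be rebuilt from the list if needed.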

We hope this article helped you learn how to find and extract all headings from a web page in PHP.
