How To Create a WebPage Scraper WordPress Plugin


Building a WebPage Scraper WordPress Plugin integrated with ChatGPT opens up possibilities for dynamically generating content using natural language.

In this tutorial, we’ll explore the development process of a WordPress plugin that scrapes web content and utilizes ChatGPT, an advanced language model, to enrich and generate text based on the scraped data.

We’ll cover the basics of WordPress plugin development, adding web scraping functionality with an appropriate library or tool (such as Simple HTML DOM Parser or PHP’s built-in DOMDocument class, which the final code uses), and using ChatGPT prompts to process and generate text based on the scraped content.


Let’s get started.

Plan & Prepare ChatGPT Input for WordPress Plugin

First, you need to plan what exactly you want to do with your plugin.

Once you have decided, open ChatGPT and describe what you want in a prompt so that it returns the result in a usable form.

If the response is not what you are looking for, refine your prompt and ask ChatGPT to update the code. It will do so on the fly.

After several rounds of prompt changes, here is the complete step-by-step code.

Guide to Creating a WordPress Plugin for Website Scraping

Creating a WordPress plugin to scrape an entire website and generate a comprehensive report involves several steps.

Given the complexity and the vast amount of data involved in scraping an entire site, I’ll outline a simplified version that focuses on scraping a single page and generating a basic report containing the title, meta information, headings, images, and scripts.

Set Up the Plugin Folder and Main File

Create a new folder in the wp-content/plugins directory of your WordPress installation. Let’s name it site-scraper-report.

Create a file named site-scraper-report.php in the site-scraper-report folder. This file will contain the main plugin code.

Plugin Main Header Information

Inside site-scraper-report.php, add the opening PHP tag followed by the plugin header comment. WordPress reads the plugin name, description, and version from this comment.

<?php
/*
Plugin Name: Site Scraper Report
Description: Scrapes a website and generates a report.
Version: 1.0
*/

Admin Menu Page

Let’s create an admin menu page where users can input the website URL.

// Add a menu item in the admin panel to input the website URL
function site_scraper_report_menu() {
    add_menu_page(
        'Site Scraper Report',
        'Site Scraper',
        'manage_options',
        'site-scraper',
        'site_scraper_page'
    );
}

add_action('admin_menu', 'site_scraper_report_menu');

// HTML/CSS for the plugin's admin page
function site_scraper_page() {
    ...
}


Plugin HTML Code

The page provides an input field for the website URL. It also contains the layout for the accordion sections: Page Title, Meta Tags, Headings, Images, and Scripts.

<div class="wrap">
    <h1>Site Scraper Report</h1>
    <form id="scrape-form">
        <label for="website_url">Enter Website URL:</label>
        <input type="text" id="website_url" name="website_url">
        <input type="submit" name="submit" value="Generate Report">
    </form>

    <div class="loading" id="loading-indicator">Loading...</div>

    ....

</div>

Accordion and Ajax Javascript Functions

These scripts handle the accordion effect and submit the form via AJAX, sending the URL input value to the server.

The code also binds the server response to the HTML layout.

<script>
    jQuery(document).ready(function($) {
        // Accordion functionality
        $('.accordion').click(function() {
            $(this).toggleClass('active');
            var panel = $(this).next();
            
            ...
        });
    
        // AJAX for generating report
        $('#scrape-form').submit(function(event) {
            event.preventDefault();
    
            // Show loading indicator
            $('#loading-indicator').addClass('active');
    
            ...
        });
    });
    </script>
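
Because the response is bound into the page as HTML, any markup contained in the scraped values would be interpreted by the browser. One way to guard against that is to escape each value on the server before it is sent. The following is a minimal sketch, not part of the plugin: the helper name scraper_sketch_list_items() is hypothetical, and plain htmlspecialchars() is used so the example runs outside WordPress (inside a plugin, esc_html() would be the usual choice).

```php
<?php
// Hypothetical helper (illustration only): wraps each value in an <li>,
// escaping HTML special characters first so the value is shown as text.
function scraper_sketch_list_items(array $values): array
{
    $items = [];
    foreach ($values as $value) {
        $items[] = '<li>' . htmlspecialchars((string) $value, ENT_QUOTES, 'UTF-8') . '</li>';
    }
    return $items;
}

// Sample input: one ordinary value and one value that contains markup.
$items = scraper_sketch_list_items(['/logo.png', '"><script>alert(1)</script>']);
echo implode("\n", $items), "\n";
```

With this in place, the second sample value is displayed literally in the report instead of being executed as a script.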

Scraping Logic and Report Generation

Inside the site-scraper-report.php file, you’ll need to handle the form submission and integrate scraping logic to generate the report based on the provided URL.

This involves using PHP’s built-in DOMDocument class to parse the retrieved content and then organizing it into an HTML report.

// AJAX handler for generating report. Registered for logged-in users only,
// because the form lives on an admin page.
add_action('wp_ajax_generate_report', 'generate_report');

function generate_report() {
    if (isset($_POST['website_url'])) {
        // Clean the submitted URL before using it
        $website_url = esc_url_raw(wp_unslash($_POST['website_url']));

        // Fetch the page, then parse it with DOMDocument to build the report
        $html = file_get_contents($website_url);

        ...
    }
}
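
To see the parsing step in isolation, here is a small standalone sketch that runs without WordPress. It assumes the PHP DOM extension is available; the function name scraper_sketch_extract() and the sample HTML are made up for illustration and are not part of the plugin.

```php
<?php
// Hypothetical helper (illustration only): pulls the report fields
// out of an HTML string using DOMDocument.
function scraper_sketch_extract(string $html): array
{
    // Collect libxml warnings for malformed HTML instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $titleNode = $doc->getElementsByTagName('title')->item(0);
    $title = $titleNode ? trim($titleNode->nodeValue) : 'No Title Found';

    // Keep only meta tags that carry a name attribute.
    $meta = [];
    foreach ($doc->getElementsByTagName('meta') as $tag) {
        $name = $tag->getAttribute('name');
        if ($name !== '') {
            $meta[$name] = $tag->getAttribute('content');
        }
    }

    // Walk every element and keep h1 to h6.
    $headings = [];
    foreach ($doc->getElementsByTagName('*') as $node) {
        if (preg_match('/^h[1-6]$/', $node->nodeName)) {
            $headings[] = trim($node->nodeValue);
        }
    }

    $images = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        $images[] = $img->getAttribute('src');
    }

    // Inline scripts have no src attribute and are skipped here.
    $scripts = [];
    foreach ($doc->getElementsByTagName('script') as $script) {
        $src = $script->getAttribute('src');
        if ($src !== '') {
            $scripts[] = $src;
        }
    }

    return compact('title', 'meta', 'headings', 'images', 'scripts');
}

$sample = '<html><head><title>Demo</title><meta name="description" content="A test page"></head>'
        . '<body><h1>Hello</h1><h2>World</h2><img src="/a.png" alt="">'
        . '<script src="/app.js"></script><script>var x = 1;</script></body></html>';
$report = scraper_sketch_extract($sample);
print_r($report);
```

Keeping the parsing in a function that takes a string makes it easy to try against saved HTML before wiring it to a live request.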

Complete Source Code (Website Scraping WP Plugin)

Open site-scraper-report.php and add the following complete code.

<?php
/*
Plugin Name: Site Scraper Report
Description: Scrapes a website and generates a report.
Version: 1.0
Author: OpenAI
*/

// Enqueue jQuery on admin screens, where the plugin's page lives
function enqueue_jquery() {
    wp_enqueue_script('jquery');
}
add_action('admin_enqueue_scripts', 'enqueue_jquery');

function site_scraper_report_menu() {
    add_menu_page(
        'Site Scraper Report',
        'Site Scraper',
        'manage_options',
        'site-scraper',
        'site_scraper_page'
    );
}
add_action('admin_menu', 'site_scraper_report_menu');

function site_scraper_page() {
    ?>
<style>
/* Form styles */
#scrape-form {
    margin-bottom: 20px;
}

label {
    display: block;
    margin-bottom: 5px;
    font-weight: bold;
}

input[type="text"] {
    width: calc(100% - 10px);
    padding: 8px;
    margin-bottom: 10px;
    border: 1px solid #ccc;
    border-radius: 3px;
}

input[type="submit"] {
    padding: 8px 15px;
    background-color: #4CAF50;
    color: white;
    border: none;
    border-radius: 3px;
    cursor: pointer;
}

input[type="submit"]:hover {
    background-color: #45a049;
}

/* Accordion styles */
.accordion {
    background-color: #f9f9f9;
    color: #444;
    cursor: pointer;
    padding: 18px;
    width: 100%;
    text-align: left;
    border: none;
    outline: none;
    transition: 0.4s;
    margin-bottom: 10px;
}

.active,
.accordion:hover {
    background-color: #ddd;
}

.accordion::after {
    content: '\002B';
    color: #777;
    font-weight: bold;
    float: right;
    margin-left: 5px;
}

.active::after {
    content: "\2212";
}

.panel {
    padding: 0 18px;
    display: none;
    background-color: white;
    overflow: hidden;
    border: 1px solid #ccc;
    border-top: none;
}

.loading {
    display: none;
    font-weight: bold;
    margin-top: 10px;
}

.loading.active {
    display: block;
    /* Additional styles to make it visually appealing */
    color: #555;
    font-style: italic;
    text-align: center;
}
</style>

<div class="wrap">
    <h1>Site Scraper Report</h1>
    <form id="scrape-form">
        <label for="website_url">Enter Website URL:</label>
        <input type="text" id="website_url" name="website_url">
        <input type="submit" name="submit" value="Generate Report">
    </form>

    <div class="loading" id="loading-indicator">Loading...</div>

    <!-- Accordion for scraped content -->
    <button class="accordion">Page Title</button>
    <div class="panel" id="page-title-content">
        <!-- Content for page title -->
    </div>

    <button class="accordion">Meta Tags</button>
    <div class="panel" id="meta-tags-content">
        <!-- Content for meta tags -->
    </div>

    <button class="accordion">Headings</button>
    <div class="panel" id="headings-content">
        <!-- Content for headings -->
    </div>

    <button class="accordion">Images</button>
    <div class="panel" id="images-content">
        <!-- Content for images -->
    </div>

    <button class="accordion">Scripts</button>
    <div class="panel" id="scripts-content">
        <!-- Content for scripts -->
    </div>

</div>

<script>
jQuery(document).ready(function($) {
    // Accordion functionality
    $('.accordion').click(function() {
        $(this).toggleClass('active');
        var panel = $(this).next();
        if (panel.css('display') === 'block') {
            panel.css('display', 'none');
        } else {
            panel.css('display', 'block');
        }
    });

    // AJAX for generating report
    $('#scrape-form').submit(function(event) {
        event.preventDefault();

        // Show loading indicator
        $('#loading-indicator').addClass('active');

        var url = $('#website_url').val();
        $.ajax({
            url: '<?php echo admin_url('admin-ajax.php'); ?>',
            type: 'POST',
            data: {
                action: 'generate_report',
                website_url: url
            },
            success: function(response) {

                // Hide loading indicator on success
                $('#loading-indicator').removeClass('active');

                $('#page-title-content').html('<p>' + response.title + '</p>');
                $('#meta-tags-content').html('<ul>' + response.meta.join('') + '</ul>');

                // Update content for other sections
                $('#headings-content').html(response.headings.join('<br>'));
                $('#images-content').html('<ul>' + response.images.join('') + '</ul>');
                $('#scripts-content').html('<ul>' + response.scripts.join('') + '</ul>');
                // Add similar updates for other sections
            },
            error: function() {
                // Hide loading indicator on error (if needed)
                $('#loading-indicator').removeClass('active');
            }
        });
    });
});
</script>
<?php
}

// AJAX handler for generating report. Registered for logged-in users only,
// because the form lives on an admin page.
add_action('wp_ajax_generate_report', 'generate_report');

function generate_report() {
    // Only users who can open the admin page may trigger a scrape
    if (!current_user_can('manage_options')) {
        wp_send_json_error('Permission denied.', 403);
    }

    // Clean the submitted URL; esc_url_raw() returns '' for unusable input
    $website_url = isset($_POST['website_url'])
        ? esc_url_raw(wp_unslash($_POST['website_url']))
        : '';
    if ($website_url === '') {
        wp_send_json_error('Please enter a valid URL.', 400);
    }

    // Fetch the page; file_get_contents() returns false on failure
    $html = @file_get_contents($website_url);
    if ($html === false || $html === '') {
        wp_send_json_error('Could not fetch the given URL.', 400);
    }

    // Parse with DOMDocument, collecting warnings for malformed HTML internally
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    // Get page title (escaped, because the browser inserts it as HTML)
    $titleNode = $doc->getElementsByTagName('title')->item(0);
    $title = ($titleNode) ? esc_html($titleNode->nodeValue) : 'No Title Found';

    // Get meta tags
    $meta = [];
    foreach ($doc->getElementsByTagName('meta') as $tag) {
        $meta[] = '<li>' . esc_html($tag->getAttribute('name')) . ': ' . esc_html($tag->getAttribute('content')) . '</li>';
    }

    // Get headings (h1 to h6)
    $headings = [];
    foreach ($doc->getElementsByTagName('*') as $node) {
        if (preg_match('/^h[1-6]$/', $node->nodeName)) {
            $headings[] = esc_html($node->nodeValue);
        }
    }

    // Get images
    $images = [];
    foreach ($doc->getElementsByTagName('img') as $image) {
        $images[] = '<li>' . esc_html($image->getAttribute('src')) . '</li>';
    }

    // Get script tags (initialize the array so pages without scripts still return a list)
    $scripts = [];
    foreach ($doc->getElementsByTagName('script') as $tag) {
        $scripts[] = '<li>' . esc_html($tag->getAttribute('src')) . '</li>';
    }

    // Prepare and return the report data
    $report = array(
        'title' => $title,
        'meta' => $meta,
        'headings' => $headings,
        'images' => $images,
        'scripts' => $scripts,
    );

    wp_send_json($report); // Send report data as JSON response
}

?>
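
Since the handler fetches whatever URL is submitted, it can help to check the value before making the request. The sketch below shows one possible check in plain PHP so it runs outside WordPress; the function name scraper_sketch_is_fetchable_url() is hypothetical and not part of the plugin. Inside WordPress, esc_url_raw() and wp_http_validate_url() cover similar ground.

```php
<?php
// Hypothetical helper (illustration only): accepts only absolute
// http or https URLs that include a host name.
function scraper_sketch_is_fetchable_url(string $url): bool
{
    $url = trim($url);

    // Reject anything that is not a syntactically valid absolute URL.
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }

    // Allow web schemes only, so local files or other protocols are refused.
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    $host = (string) parse_url($url, PHP_URL_HOST);

    return in_array($scheme, ['http', 'https'], true) && $host !== '';
}

// Try a few sample inputs.
foreach (['https://example.com/page', 'file:///etc/passwd', 'example.com'] as $candidate) {
    echo $candidate, ' => ', var_export(scraper_sketch_is_fetchable_url($candidate), true), "\n";
}
```

A check like this only looks at the shape of the URL. It does not stop requests to internal addresses, so it is a first filter rather than a complete safeguard.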

Test your WordPress Plugin

Log in to your WordPress dashboard.

  • Go to the Plugins menu and find “Site Scraper Report”.
  • Click “Activate” to activate the plugin.


Once the plugin is activated, a “Site Scraper” item appears in the admin menu.

Click on it, enter your web page URL, and submit the form to generate the report.

That’s it.

We hope this article helped you learn, in detail, how to build a web page scraper WordPress plugin using ChatGPT.

