Querying Whois information with PHP

Ah, the wonderful world of Whois! If you do not know what it is or what it is for, this is probably not the article for you. But if you know what I’m talking about, you will be glad to hear that you can stop relying on third-party services to check this information: with a little effort and love for programming, you can create your own service. (Yes, I’m trying to add some excitement to a lookup that, necessary as it is, is fairly mundane.)

Come on, join me again on the yellow brick road… a path full of wonders!

What is Whois?

Whois is a public service that allows you to check who owns a domain, its status, expiration dates, renewal and other information of interest.

For example, you can use a Whois query to determine whether a domain is available for purchase. (Warning: your mileage may vary. Whois is generally quite reliable but can produce results you don’t expect, so use it with caution.)

Where can you find the Whois servers of a TLD?

IANA (the entity in charge of regulating the allocation of IP addresses and root DNS for address resolution) has a list at the following link:

Root Zone Database: https://www.iana.org/domains/root/db

This list will help us later to scrape and extract the information our PHP Whois query script needs.

Warning: Scraping itself is not bad, but when misused it ruins the party for everyone. Always be responsible and respectful with the information you collect using this technique.

Structure of our solution

Our directory and solution files will be as follows:

/whois           # root directory (your web server must have access to it)
  /cache         # This directory will store json files that contain the whois
                 # server address for a given TLD, these files will be used to cache
                 # the scraping process results.
  - index.html   # This file will be our entry point and graphical interface to query
                 # whois information.
  - getwhois.php # This script will receive the whois request from index.html
                 # and return the result.
  - Whois.php    # Our class definition with all attributes and methods used
                 # to query whois information.

Remember that you need a web development environment with PHP, or at least a way to run PHP scripts from a command console. This time we will use PHP 8.2 on an Apache web server.

We will write our code based on three large blocks as follows:

  • Scraping
  • Whois request/query
  • Interface and result presentation

1. Scraping

Let’s do some scraping to extract the Whois server for each domain type (TLD) listed at https://www.iana.org/domains/root/db. We will do it “on demand”: we only scrape when we do not already have a cache file for the TLD. This reduces traffic, respects the site the information comes from, and shortens response times when querying.
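
The “on demand” idea can be sketched as a small helper. This is only an illustration under a few assumptions: the function name `cachedWhoisServer` and the `$fetchServer` callable (a stand-in for the scraping step) are hypothetical and not part of the final class, and the cache file is assumed to hold a JSON object with a `whois` key.

```php
<?php

/**
 * Returns the whois server for a TLD, consulting a JSON cache file first.
 *
 * $fetchServer is a hypothetical callable standing in for the scraping step;
 * it receives the TLD and returns the server name, or null if none was found.
 */
function cachedWhoisServer(string $tld, string $cacheDir, callable $fetchServer): ?string
{
    $cacheFile = rtrim($cacheDir, "/") . "/" . $tld . ".json";

    // Cache hit: reuse the stored server and skip the network entirely.
    if (is_file($cacheFile)) {
        $cached = json_decode((string) file_get_contents($cacheFile), true);
        if (is_array($cached) && isset($cached["whois"])) {
            return $cached["whois"];
        }
    }

    // Cache miss: run the scraping step once and remember its answer.
    $server = $fetchServer($tld);
    if ($server !== null) {
        file_put_contents($cacheFile, json_encode(["whois" => $server]));
    }

    return $server;
}
```

Calling it twice for the same TLD should trigger the scraping callable only the first time; the second call is answered from the cache file.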

For example, to look up the TLD “.com” we visit https://www.iana.org/domains/root/db/com.html. The page shows general contact information and, at the bottom, the Whois server for this type of domain, like this:
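
As a rough sketch, the page URL for any domain can be derived from its TLD. The function name `ianaUrlForDomain` is made up for this example; the URL pattern is the one used for “.com” above.

```php
<?php

/**
 * Builds the IANA Root Zone Database URL for the TLD of a domain name.
 * The function name is hypothetical; the URL pattern is the one shown above.
 */
function ianaUrlForDomain(string $domain): string
{
    // The TLD is whatever follows the last dot, e.g. "com" in "example.com".
    $labels = explode(".", strtolower(trim($domain)));
    $tld = end($labels);

    return "https://www.iana.org/domains/root/db/" . rawurlencode($tld) . ".html";
}

// Prints https://www.iana.org/domains/root/db/com.html
echo ianaUrlForDomain("mytestdomain.com"), PHP_EOL;
```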

[Image: iana.org whois]

The address next to the text “WHOIS Server” will be the data of interest for our scrape process.

The first step will be to make a request to the website containing the required information and capture the HTML of the response. We can do this with our dear friend cURL like this:

    /** 
     * This function downloads HTML content from a URL 
     * 
     * @param string $url URL to be queried
     * 
     * @return string|bool HTML received in response from the website 
     * if an error occurs it will return false
     */
    function curlDownload(string $url): string|bool
    {

        $curl = curl_init();

        // Keep TLS certificate verification enabled (these are cURL's defaults).
        curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, 2);
        curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, true);
        curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 60);
        curl_setopt($curl, CURLOPT_HEADER, 0);
        curl_setopt($curl, CURLOPT_CUSTOMREQUEST, "GET");
        curl_setopt($curl, CURLOPT_USERAGENT, "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36");
        curl_setopt($curl, CURLOPT_URL, $url);

        $html = curl_exec($curl);
        curl_close($curl);

        return $html;
    }

In the User-Agent header, we set a value that simulates a visit from a desktop browser.

Now that we have the content of the page, we need to extract the address of the whois server from the HTML. This can be done with a very useful tool, one that inspires panic or immense hatred in the brave souls who dare to look it straight in the eye: regular expressions.

No matter how you feel about them, always try to remember “Macho Man” Randy Savage’s wise words:

You may not like it, but accept it! — “Macho Man” Randy Savage wisdom

We are going to review the content and create the regular expression. Keep in mind that we will not cover in detail how regular expressions work; we will cover only what matters for our present objective.

Let’s look at the source code of the page where the name of the whois server is located. The section we are interested in has this structure:

    <p>
    
        <b>URL for registration services:</b> <a href="http://www.verisigninc.com">http://www.verisigninc.com</a><br/>
    
    
        <b>WHOIS Server:</b> whois.verisign-grs.com
    
    </p>

Now let’s create a regex to extract just the text “whois.verisign-grs.com”. It will be something like this:

$this->regexWhoisServer = '#(?s)(?<=\<b\>WHOIS Server\:\<\/b\>)(.+?)(?=\<\/p\>)#';

This expression captures the text between the pattern “<b>WHOIS Server:</b>” and the first following “</p>”. Using PHP, we can capture the matched value and save it in a variable to use later in our whois query.

Our sample code to understand this concept will look like this:

//Array that will be used to store the regex results.
$matchesWhois = array();

//Now we use our HTML download function created previously.
$html = curlDownload("https://www.iana.org/domains/root/db/com.html");
//Use the regex expression to extract the whois server from the HTML.
$resWhois = preg_match("#(?s)(?<=\<b\>WHOIS Server\:\<\/b\>)(.+?)(?=\<\/p\>)#", (string) $html, $matchesWhois, PREG_OFFSET_CAPTURE);

//Stop here if the download failed or the page has no "WHOIS Server" entry.
if ($resWhois !== 1) {
    exit("Whois server not found." . PHP_EOL);
}

//Now we remove the blank spaces from the result text, stored at the [0][0]
//element in the array
$matchesWhois[0][0] = trim($matchesWhois[0][0]);

//Finally assign the whois server name to a variable.
$whoisServer = $matchesWhois[0][0];
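
If you want to try the pattern without hitting iana.org on every run, a quick offline check like the following may help. It is only a sketch: the HTML is the fragment shown earlier pasted into a string, and it assumes the real page keeps that structure.

```php
<?php

// Sample markup copied from the structure shown above.
$html = <<<HTML
<p>

    <b>URL for registration services:</b> <a href="http://www.verisigninc.com">http://www.verisigninc.com</a><br/>


    <b>WHOIS Server:</b> whois.verisign-grs.com

</p>
HTML;

// Same pattern as in the article: lookbehind for the label, lazy match up to </p>.
$pattern = '#(?s)(?<=\<b\>WHOIS Server\:\<\/b\>)(.+?)(?=\<\/p\>)#';

$whoisServer = null;
if (preg_match($pattern, $html, $matches) === 1) {
    // The raw match still carries the surrounding whitespace and newlines.
    $whoisServer = trim($matches[0]);
}

echo $whoisServer ?? "not found", PHP_EOL;
```

Without PREG_OFFSET_CAPTURE the match sits at `$matches[0]` instead of `$matches[0][0]`, which is the only difference from the sample above.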

2. Whois request/query

We have obtained the name of the server from which we will request the whois information. Now we need to implement the code that performs the query: we open a socket connection to the server and send the domain name, then read the response with the whois data, like this:

//Open a connection to the whois server on port 43 with 20 seconds timeout limit.
$whoisSock = @fsockopen("whois.verisign-grs.com", 43, $errno, $errstr, 20);

//Stop if the connection could not be opened.
if (!$whoisSock) {
    exit("{$errstr} ({$errno})" . PHP_EOL);
}

//This variable will be used to store the whois result.
$whoisQueryResult = "";

//Send the domain name ending with new line.
fputs($whoisSock, "mytestdomain.com" . "\r\n");

$content = "";

//Read the server response.
while (!feof($whoisSock)) {
    $content .= fgets($whoisSock);
}

//Close the socket.
fclose($whoisSock);

//Convert the string to an array (one element for each line on the string)
$arrResponseLines = explode("\n", $content);

foreach ($arrResponseLines as $line) {

    $line = trim($line);

    //ignore if the line is empty or if it begins with "#" or "%"
    if (($line != '') && (!str_starts_with($line, '#')) && (!str_starts_with($line, '%'))) {
        //Append the line to the result variable.
        $whoisQueryResult .= $line . PHP_EOL;
    }
}

//Show the result.
echo $whoisQueryResult;
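
The line filtering can also be tried on its own, without opening a socket. The sketch below wraps the same rules in a function; the name `filterWhoisLines` and the sample response text are made up for illustration.

```php
<?php

/**
 * Drops empty lines and comment lines (starting with "#" or "%") from a raw
 * whois response. The function name is hypothetical.
 */
function filterWhoisLines(string $content): string
{
    $kept = [];

    foreach (explode("\n", $content) as $line) {
        // trim() also removes the "\r" left over from "\r\n" line endings.
        $line = trim($line);

        if ($line !== '' && !str_starts_with($line, '#') && !str_starts_with($line, '%')) {
            $kept[] = $line;
        }
    }

    return implode(PHP_EOL, $kept);
}

// Made-up response text, only to exercise the filter.
$sample = "% Terms of use apply\r\n\r\nDomain Name: EXAMPLE.TEST\r\n# comment\r\nRegistrar: Example Registrar\r\n";

echo filterWhoisLines($sample), PHP_EOL;
```

Only the “Domain Name” and “Registrar” lines survive; the comment lines and the blank line are dropped.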

Now that we have the result, we will generate the code to query and display the result on the web.

3. Interface and result presentation

To query and show results, we will create the Whois class that integrates the concepts shown previously, a file to receive query requests, and the web interface to display the results.

Let’s start with our class. We will save it as Whois.php, and it has the following structure:

<?php

class Whois
{
    //Regex matches
    private array $matchesWhois;
    //Cache files path
    private string $_CACHE_PATH;
    //Regex used to detect the whois server while scraping the TLD URL
    private string $regexWhoisServer;
    //Cache files extension (.json)
    private string $_FILE_EXT;
    //Flag, True = using cache file, False = scraped result
    private bool $usingCache;
    //Domain name used to query the whois info.
    private string $domain;
    //Domain TLD
    private string $tld;
    //Cache file name
    private string $cacheFile;
    //URL used to scrape the whois server to be used.
    private string $urlWhoisDB;
    //Array that will contain the answer and errors generated during the whois query
    private array $response;
    //Array, contains the whois server address
    private array $whoisInfo;
    //Tag to be replaced by the domain TLD extracted from the domain name.
    private string $tldUrlTag;
    //Whois port, default 43
    private int $_WHOIS_PORT;
    //Whois query timeout in seconds, default 20
    private int $_WHOIS_TIMEOUT;
    //User Agent to be used to scrape the whois server info.
    private string $_CURL_USER_AGENT;


    /**
     * Class constructor
     */
    public function __construct()
    {
        $this->matchesWhois = array();
        $this->whoisInfo = array();
        $this->_CACHE_PATH = __DIR__ . "/cache/";
        $this->regexWhoisServer = '#(?s)(?<=\<b\>WHOIS Server\:\<\/b\>)(.+?)(?=\<\/p\>)#';
        $this->_FILE_EXT = ".json";
        $this->usingCache = false;
        $this->domain = "";
        $this->tld = "";
        $this->cacheFile = "";
        $this->tldUrlTag = "[TLD]";
        $this->urlWhoisDB = "https://www.iana.org/domains/root/db/{$this->tldUrlTag}.html";
        $this->response = array();
        $this->_WHOIS_PORT = 43;
        $this->_WHOIS_TIMEOUT = 20;
        $this->_CURL_USER_AGENT = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36";
    }


    /**
     * Domain validation
     * 
     * @param string $domain domain name to be validated i.e. "google.com"
     * 
     * @return bool
     */
    public function isDomain(string $domain): bool
    {
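        //filter_var returns the value or false, so compare explicitly to get a bool.
        //FILTER_FLAG_HOSTNAME restricts the value to valid hostname syntax.
        return filter_var($domain, FILTER_VALIDATE_DOMAIN, FILTER_FLAG_HOSTNAME) !== false;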
    }


    /**
     * Extracts the TLD from the domain name
     * 
     * @param mixed $domain domain name
     * 
     * @return string
     */
    private function extractTld($domain): string
    {
        $arrDomain = explode(".", $domain);

        return end($arrDomain);
    }

    /**
     * Sets the cache filename for a given TLD, it also checks if the file exists and loads its content
     * 
     * @param mixed $tld domain (TLD), i.e. "com", "net", "org". 
     * 
     * @return void
     */
    private function setCacheFileName($tld): void
    {
        $this->cacheFile = $this->_CACHE_PATH . $tld . $this->_FILE_EXT;

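        if (file_exists($this->cacheFile)) {

            $tmpCache = file_get_contents($this->cacheFile);
            $cached = json_decode((string) $tmpCache, true);

            //Only trust the cache if it decodes to an array with a whois server.
            if (is_array($cached) && isset($cached["whois"])) {
                $this->whoisInfo = $cached;
                $this->usingCache = true;
            }
        }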
    }

    /**
     * This function can be used to check if there were errors during the process
     * 
     * @return bool true = there are errors, false = no errors
     */
    public function hasErrors(): bool
    {
        return isset($this->response["errors"]);
    }

    /**
     * Returns the response received including errors (if any).
     * @param bool $json Allows to select the response format, false = array, true = json
     * 
     * @return array|string
     */
    public function getResponse(bool $json = false): array|string
    {
        return ($json) ? json_encode($this->response) : $this->response;
    }

    /**
     * This function downloads and returns the HTML returned from a URL
     * 
     * @param string $url URL address
     * 
     * @return string|bool string containing the HTML received, if there is an error return false.
     */
    private function curlDownload(string $url): string|bool
    {

        $curl = curl_init();

        // Keep TLS certificate verification enabled (these are cURL's defaults).
        curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, 2);
        curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, true);
        curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 60);
        curl_setopt($curl, CURLOPT_HEADER, 0);
        curl_setopt($curl, CURLOPT_CUSTOMREQUEST, "GET");
        curl_setopt($curl, CURLOPT_USERAGENT, $this->_CURL_USER_AGENT);
        curl_setopt($curl, CURLOPT_URL, $url);

        $html = curl_exec($curl);
        curl_close($curl);

        return $html;
    }

    /**
     * Whois query entry point.
     * 
     * @param string $domain domain name for which you want to check the whois
     * 
     * @return void
     */
    public function getWhoisServerDetails(string $domain): void
    {

        $this->domain = $domain;
        $this->tld = $this->extractTld($domain);
        $this->setCacheFileName($this->tld);

        if (!$this->usingCache) {

            $urlWhoisDB = str_replace($this->tldUrlTag, $this->tld, $this->urlWhoisDB);
            $html = $this->curlDownload($urlWhoisDB);

            $resWhois = preg_match($this->regexWhoisServer, $html, $this->matchesWhois, PREG_OFFSET_CAPTURE);

            if ($resWhois != 1) {

                $this->response["errors"][] = array(
                    "error" => "TLD '{$this->tld}' not found!",
                    "domain" => $domain
                );

                return;
            }

            $this->matchesWhois[0][0] = trim($this->matchesWhois[0][0]);
            $this->whoisInfo["whois"] = $this->matchesWhois[0][0];

            file_put_contents($this->_CACHE_PATH . $this->tld . $this->_FILE_EXT, json_encode($this->whoisInfo, JSON_UNESCAPED_UNICODE));
        }

        if (!isset($this->whoisInfo["whois"])) {

            $this->response["errors"][] = array(
                "error" => "WhoIs Server for TLD {$this->tld} not found!.",
                "domain" => $domain
            );

            return;
        }

        $whoisSock = @fsockopen($this->whoisInfo["whois"], $this->_WHOIS_PORT, $errno, $errstr, $this->_WHOIS_TIMEOUT);
        $whoisQueryResult = "";

        if (!$whoisSock) {

            $this->response["errors"][] = array(
                "error" => "{$errstr} ({$errno})",
                "domain" => $domain
            );

            return;
        }

        fputs($whoisSock, $this->domain . "\r\n");

        $content = "";

        while (!feof($whoisSock)) {
            $content .= fgets($whoisSock);
        }

        fclose($whoisSock);

        if ((strpos(strtolower($content), "error") === false) && (strpos(strtolower($content), "not allocated") === false)) {

            $arrResponseLines = explode("\n", $content);

            foreach ($arrResponseLines as $line) {

                $line = trim($line);

                if (($line != '') && (!str_starts_with($line, '#')) && (!str_starts_with($line, '%'))) {
                    $whoisQueryResult .= $line . PHP_EOL;
                }
            }
        }

        $this->response["whoisinfo"] = $whoisQueryResult;
    }
}

This class contains all the main functions for scraping, whois server name extraction, domain name processing and the final whois query. Note that error checking and handling routines were added, as well as logic to automate scraping based on the domain type (TLD) queried and a cache strategy to avoid making more requests to iana.org than necessary.

Now let’s create the file that will receive query requests. Call it getwhois.php; it will have the following content:

<?php

//Include the class definition.
require("Whois.php");

//Decode the parameters received.
$paramsFetch = json_decode(
    file_get_contents("php://input"),
    true
);

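//Create our whois object
$whoisObj = new Whois();

//Reject requests without a syntactically valid domain name.
$domain = is_array($paramsFetch) ? trim((string) ($paramsFetch["domain"] ?? "")) : "";

if (!$whoisObj->isDomain($domain)) {
    echo json_encode(["errors" => [["error" => "Invalid domain name.", "domain" => $domain]]]);
    exit;
}

//Query the whois information
$whoisObj->getWhoisServerDetails($domain);

//Return the response as JSON
echo $whoisObj->getResponse(true);
exit;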

It is quite simple: it includes our Whois.php class, reads the JSON payload sent by the JavaScript fetch call in the HTML page (which carries the domain name), creates an instance of our class and runs the whois query, then returns the result in JSON format and ends the execution.

Now let’s go to the index.html file. This will be our graphical interface and the access point for querying and viewing results. I use Bulma CSS for HTML controls and styling; it’s pretty straightforward, unobtrusive, and lets you get results quickly. The file will look like this:

<!DOCTYPE html>
<html>

<head>
    <meta charset="utf-8">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <title>Whois</title>
    <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/bulma@0.9.4/css/bulma.min.css">
    <script type="module">
        window.addEventListener('load', (event) => {
            //Event Handler for the search button.
            document.querySelector(".search").addEventListener('click', (event) => {
                //Show that the query is executing
                event.currentTarget.classList.add('is-loading');
                //disable the button
                event.currentTarget.disabled = true;

                //Hide the result sections.
                document.querySelector(".result").parentElement.classList.add("is-hidden");
                document.querySelector(".error").parentElement.classList.add("is-hidden");

                //Prepare the payload
                const payload = JSON.stringify({
                    "domain": document.querySelector(".domain").value
                });
                
                //Send the request to getwhois.php
                fetch('getwhois.php', {
                    method: 'POST',
                    headers: {
                        'Content-Type': 'application/json',
                    },
                    body: payload,
                })
                    .then(response => response.json())
                    .then(data => {
                        //Process the response.
                        if (data.errors != undefined) {
                            document.querySelector(".error").parentElement.classList.remove("is-hidden");

                            //Show every error on its own line.
                            document.querySelector(".error").innerText = data.errors
                                .map((item) => item.error)
                                .join("\n");

                        } else {

                            document.querySelector(".result").parentElement.classList.remove("is-hidden");
                            document.querySelector(".result").innerText = data.whoisinfo;
                        }

                    })
                    .catch((error) => {
                        document.querySelector(".error").parentElement.classList.remove("is-hidden");
                        document.querySelector(".error").innerText = error;
                        console.error('Error:', error);
                    }).finally(() => {
                        document.querySelector(".search").classList.remove('is-loading');
                        document.querySelector(".search").disabled = false;
                    });
            });

        });
    </script>
</head>

<body>
    <section class="section">
        <div class="container">
            <div class="columns">
                <div class="column">
                    <div class="columns">
                        <div class="column"></div>
                        <div class="column has-text-centered">
                            <h1 class="title">
                                WhoIs Lookup
                            </h1>
                        </div>
                        <div class="column"></div>
                    </div>
                    <div class="columns">
                        <div class="column"></div>
                        <div class="column has-text-centered">
                            <div class="field is-grouped is-grouped-centered">
                                <div class="control">
                                    <input class="input domain" type="text" placeholder="Domain">
                                </div>
                                <div class="control">
                                    <button class="button is-info search">
                                        Search
                                    </button>
                                </div>
                            </div>
                        </div>
                        <div class="column"></div>
                    </div>
                </div>
            </div>
            <div class="columns box is-hidden">
                <div class="column result"></div>
            </div>
            <div class="columns box is-hidden">
                <div class="column notification is-danger error has-text-centered">
                </div>
            </div>
        </div>
    </section>
</body>

</html>

Testing

To test, just point your browser to the path where the scripts are located, in my case http://localhost/whois. The page shows a field for the domain name and the “Search” button to request the Whois information. You can try a popular domain like “google.com”; the result will look like this:

After a successful whois query, you will notice that a file named after the TLD, for example “com.json”, has been created in the /cache directory. It contains the name of the corresponding whois server, which lets us avoid scraping again.
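
For reference, this is roughly what ends up inside “com.json”. The snippet is a sketch that reproduces the `json_encode` call the class uses when it writes the cache, with the server name taken from the iana.org example earlier.

```php
<?php

// What the class stores for the "com" TLD after the first successful scrape.
$whoisInfo = ["whois" => "whois.verisign-grs.com"];
$json = json_encode($whoisInfo, JSON_UNESCAPED_UNICODE);

echo $json, PHP_EOL; // {"whois":"whois.verisign-grs.com"}

// Reading it back yields the same array the class keeps in $whoisInfo.
$restored = json_decode($json, true);
```

Deleting the file simply forces a fresh scrape on the next query for that TLD.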

And that’s all. Never forget that the one who is brave is really free; drink plenty of water, exercise and get enough sleep.


source: https://medium.com/winkhosting/querying-whois-information-with-php-f686baee8c7
