PHP folks,
Q1.
Do you know a regex to extract the domain name from a URL that a user submits through a form field?
<form method="GET" name="submit_link" id="submit_link" action="<?php echo htmlspecialchars($_SERVER['PHP_SELF']); ?>">
<label for="url">Url</label>
<input type="url" name="webpage_address" id="url" placeholder="Type your webpage address here …" required>
</form>
When users submit the URL of a page they want my web crawler to crawl, I need to extract the domain name from the submitted URL. The regex must work for all forms of URLs.
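For reference, here is a minimal sketch of the extraction step. It swaps the regex for PHP's built-in parse_url(), which already handles ports, paths and query strings. The function name extractHost is hypothetical, and two behaviours are assumptions of this sketch: a missing scheme is treated as http://, and a leading "www." is dropped.

```php
<?php
// Hypothetical helper: returns the lowercased host of a submitted URL,
// or null when no host can be found.
function extractHost(string $url): ?string
{
    $url = trim($url);

    // parse_url() only recognises the host when a scheme is present,
    // so assume http:// for inputs such as "example.com/page".
    if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $url)) {
        $url = 'http://' . $url;
    }

    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null; // malformed URL or no host component
    }

    $host = strtolower($host);

    // Assumption: "www.example.com" and "example.com" count as the same site.
    if (strpos($host, 'www.') === 0) {
        $host = substr($host, 4);
    }

    return $host;
}
```

Note that this returns the host, not the registrable domain: "blog.example.co.uk" stays as it is. Reducing a host to its registrable domain would need a Public Suffix List lookup, which neither a single regex nor parse_url() provides.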
Q2.
And in what URL format do download links come?
Imagine a malicious user feeds my crawler the URL of a virus download page. I don't want the crawler working on such a page; if the submitted URL is a download link, the crawler should ignore it.
You know the kind: you sometimes find links in Google search results that, when clicked, do not take the browser to any webpage. Instead, a file is downloaded automatically to your hard drive, or the browser prompts you to choose where to save the file.
As soon as my crawler detects that the user submitted a download link, I want it to abort the crawl. I need to learn the download link format in order to teach the crawler not to crawl links/URLs that appear in that format.
I guess I must achieve this with regex. What do you say?
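One thing worth noting here: a download link has no reliable URL format. A URL ending in .php can serve a file, and a URL ending in .zip can serve a web page. What triggers the download is the server's response, typically a Content-Disposition: attachment header or a non-HTML Content-Type. So instead of a regex on the URL, a crawler can inspect the response headers before fetching the body. Below is a minimal sketch of that check. The function name isLikelyDownload is hypothetical, and the list of crawlable content types is an assumption of this sketch.

```php
<?php
// Hypothetical helper: decides from response headers whether a URL serves a
// file download rather than an HTML page.
// $headers is an associative array of header name => value (or list of values).
function isLikelyDownload(array $headers): bool
{
    // Header names are case-insensitive, so normalise the keys.
    $headers = array_change_key_case($headers, CASE_LOWER);

    // After redirects a header may hold several values; use the last one.
    $last = function ($value): string {
        return is_array($value) ? (string) end($value) : (string) $value;
    };

    // An explicit "attachment" disposition tells the browser to save the file.
    $disposition = $last($headers['content-disposition'] ?? '');
    if (stripos($disposition, 'attachment') !== false) {
        return true;
    }

    // Strip parameters such as "; charset=UTF-8" from the content type.
    $type = strtolower(trim(explode(';', $last($headers['content-type'] ?? ''))[0]));

    // Assumption: the crawler only wants HTML pages.
    $crawlable = ['text/html', 'application/xhtml+xml'];

    // A missing Content-Type is treated as "not a download" in this sketch.
    return $type !== '' && !in_array($type, $crawlable, true);
}
```

In a real crawler the headers could come from get_headers($url, true) or from a cURL HEAD request. Both make a network call, so the submitted URL should be validated first. This check only skips non-HTML responses; it says nothing about whether a page or file is actually malicious.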