
googlebot still sending blank emails

I have a contact page on my website with a form for users to contact me. The form works great, except I get a blank email every night. Looks like it’s from googlebot crawling it (that’s my guess).

When I delete the contact page and php handler, the emails stop.

I copied it back and I’ve added this to my robots.txt page, but I’m still getting the blank emails each night. Any ideas?

Here’s what I’ve added to my robots page.

User-agent: *
Disallow: /contact.html

SEO

10 Comments(s)

@bigjohnnc (author), Mar 06, 2017 - Update:

I tried putting my contact page (and handler.php file) into a separate folder and changing my robots.txt file. Still running into the same issue (another email last night).

User-agent: *

Disallow: /contact/
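
One thing worth checking is which URLs each of the two robots.txt attempts actually covers. The sketch below uses Python's standard `urllib.robotparser` purely as a checking tool; the helper name `is_blocked` and the handler path `/handler.php` are assumptions for illustration, since the thread does not say where the handler originally lived. Keep in mind that robots.txt is only advisory, so a bot that ignores it is unaffected either way.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt: str, path: str, agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text disallows `path` for `agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, path)


# The two rule sets quoted in the thread.
first_try = "User-agent: *\nDisallow: /contact.html\n"
second_try = "User-agent: *\nDisallow: /contact/\n"

if __name__ == "__main__":
    # Hypothetical paths: adjust to wherever the form and handler really live.
    for rules, path in [
        (first_try, "/contact.html"),
        (first_try, "/handler.php"),
        (second_try, "/contact/handler.php"),
    ]:
        print(path, "blocked" if is_blocked(rules, path) else "allowed")
```

If the handler sat outside the disallowed path, as in the first attempt under this assumption, a well-behaved crawler was still free to request it directly.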

@WebAmateur, Mar 06, 2017 - Do you have access to the code of your webpage?

In that case you should consider adding honeypots, bot checks, and a minimum time before a mail can be sent. If you do not add these, I guarantee that you will be flooded by spam one day. Adding these checks will most likely also prevent Googlebot from sending you a mail.

@bigjohnnc (author), Mar 06, 2017 - Yes, I have access. I'll do some research on those things and see what I can find. If anyone has some good links to follow on these, that'd be great. Otherwise I'll start digging.

@WebAmateur, Mar 06, 2017 - I once had an uncontrolled comment section and it was flooded with spam. So I added a little equation to solve (like 5+3). That stopped the bot for a day, and then the spam restarted: the bot simply solved it. I experimented to find better solutions:

  • One solution is to add a user-entered check like the one I just mentioned, but complex enough that a bot cannot solve it. This can be a CAPTCHA (like Google's reCAPTCHA) or some smart question. The downside is that people sometimes enter these wrong, so they can be a nuisance to users.

  • Another possibility is to add a honeypot. As the name suggests, its purpose is to catch entries made by bots. A simple example is placing a text field in your form and hiding it with CSS (display:none). Most bots don't know it is hidden and will fill it in anyway. So you check the field for a value, and if there is one you don't send the mail. Of course this also has a downside: people with auto form-fillers will automatically fill in this field too. In addition, for one reason or another, the field might not be hidden for some visitors, so you need to put text beside it like 'Leave empty'. Bots might in turn recognize that text, making your trap ineffective.

  • Another idea, which I use, is to add a form field with type=hidden. This field contains a number, the timestamp, obfuscated by certain calculations. After the form is submitted, I reverse the calculation and compare the number with the current time. If enough time has passed (say, 10 seconds?), I accept the entry. Bots will often submit the form hundreds of times a minute, while people take a while to fill it in. Again a downside: people with autofillers, or who had to make slight changes and resubmit to be accepted, might wrongly be seen as bots.


    There are other options as well. I prefer the last two together, combined with good error reporting to the user. Always keep in mind how a check prevents bot entries, but also how much it might hinder users.
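
Here is a rough sketch of how the honeypot and timestamp checks described above might fit together on the server. The handler in this thread is PHP; Python is used here only to illustrate the logic. The field names `website` and `token`, the `SECRET_KEY` value, and all function names are hypothetical. Note also that instead of the home-made obfuscation the poster describes, this sketch signs the timestamp with an HMAC so a bot cannot alter it. That is a swapped-in technique, not the poster's actual method.

```python
import hashlib
import hmac
import time
from typing import Optional

SECRET_KEY = b"change-me"    # assumption: a server-side secret, never sent to the browser
HONEYPOT_FIELD = "website"   # hypothetical name of the CSS-hidden text field
MIN_FILL_SECONDS = 10        # the minimum fill-in time suggested in the thread


def make_token(issued_at: int, key: bytes = SECRET_KEY) -> str:
    """Return 'timestamp:signature' to place in a hidden form field."""
    sig = hmac.new(key, str(issued_at).encode(), hashlib.sha256).hexdigest()
    return f"{issued_at}:{sig}"


def token_age(token: str, now: int, key: bytes = SECRET_KEY) -> Optional[int]:
    """Return the token's age in seconds, or None if it is malformed or forged."""
    ts_text, sep, sig = token.partition(":")
    if not sep or not (ts_text.isascii() and ts_text.isdigit()):
        return None
    expected = hmac.new(key, ts_text.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return None
    return now - int(ts_text)


def looks_like_bot(form: dict, now: Optional[int] = None) -> bool:
    """Apply the honeypot check first, then the minimum-time check."""
    now = int(time.time()) if now is None else now
    if form.get(HONEYPOT_FIELD, "").strip():
        return True  # a hidden field was filled in
    age = token_age(form.get("token", ""), now)
    return age is None or age < MIN_FILL_SECONDS
```

The page that renders the form would call `make_token(int(time.time()))` and put the result in the hidden field; the handler would call `looks_like_bot(form)` before sending any mail. Both downsides mentioned above still apply: autofill tools can trip the honeypot, and a quick resubmission can trip the timer, so a clear error message for real users is worth having.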

    @Kevin2, Mar 07, 2017

  • Googlebot obeys robots.txt.

  • I've never known Googlebot to "submit" a form. Crawl it, index it, but never submit.


    So here's a couple of possibilities:

    1) Googlebot has not requested your updated robots.txt file yet. I have no clue how often it does, but it's definitely not on every request.

    2) There are fake Googlebots out there. They claim to be "Googlebot" in their User Agent string but do not come from the Googlebot IP address block.

    The solution to possibility 1 is time. Over the next week or so the updated robots.txt file will be requested, processed, etc.

    The solution to possibility 2 is to ban those fake Googlebots with .htaccess. The real Googlebot comes from the 66.249.xxx.yy IP block. Adding something like this should help:
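    # Block requests that claim to be Googlebot but come from outside 66.249.64.x to 66.249.95.x
    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} GoogleBot [NC]
    RewriteCond %{REMOTE_ADDR} !^66\.249\.(6[4-9]|[7-8][0-9]|9[0-5])\.
    RewriteRule .* - [F]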
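
The rule above depends on a hard-coded address range, which may not stay accurate over time. Google also documents a DNS-based way to verify its crawler: reverse-resolve the requesting IP, confirm the hostname falls under googlebot.com or google.com, then forward-resolve that hostname and confirm it maps back to the same IP. A minimal sketch of that check follows. The function name and the injectable lookup parameters are assumptions, added so the logic can be exercised without network access, and `socket.gethostbyname_ex` only handles IPv4.

```python
import socket

GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_real_googlebot(ip, reverse_lookup=socket.gethostbyaddr,
                      forward_lookup=socket.gethostbyname_ex):
    """Reverse-then-forward DNS check for an IPv4 address claiming to be Googlebot."""
    try:
        host = reverse_lookup(ip)[0]          # PTR lookup: IP -> hostname
    except OSError:
        return False                          # no reverse record at all
    if not host.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
        return False                          # hostname is not under a Google domain
    try:
        addresses = forward_lookup(host)[2]   # A lookup: hostname -> list of IPs
    except OSError:
        return False
    return ip in addresses                    # must round-trip to the same IP
```

Running this against the IPs that hit the form (taken from the access log) would show whether the "Googlebot" in the User Agent string is genuine.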


    Some sleuthing through your raw access logs (assuming you have access to them) may also be of help. You know what time those emails are sent. Find that time in your log file and see what's submitting your form. My guess is that it's not Googlebot.
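
The log search described above can be sketched as a small script. It assumes the server writes the common Apache "combined" log format and that the form handler's path contains a known fragment such as `handler.php`; the function name, the five-minute window, and the sample line in the usage note are made up for illustration. Timestamps are compared as timezone-aware values, so a log written in server-local time and an email time noted in another zone still line up.

```python
import re
from datetime import datetime, timedelta, timezone

# Matches the start of an Apache common/combined log line.
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" \d{3} \S+'
    r'(?: "[^"]*" "(?P<agent>[^"]*)")?'
)


def requests_near(lines, email_time, path_fragment, window=timedelta(minutes=5)):
    """Return (utc_time, ip, method, path, user_agent) for matching requests.

    `email_time` must be a timezone-aware datetime.
    """
    hits = []
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or path_fragment not in m["path"]:
            continue
        when = datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z")
        if abs(when - email_time) <= window:
            hits.append((when.astimezone(timezone.utc), m["ip"],
                         m["method"], m["path"], m["agent"]))
    return hits
```

For example, `requests_near(open("access.log"), datetime(2017, 3, 7, 7, 15, tzinfo=timezone.utc), "handler.php")` would list every request for the handler within five minutes of 07:15 UTC, including plain GET requests, which a search for POST entries alone would miss.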

    @bigjohnnc (author), Mar 07, 2017 - Thanks much Kevin.

    I'm accessing my logs and strangely not seeing anything post at the exact time I'm getting the emails (unless there's a timezone difference messing with me).

    I am seeing some SiteLockSpider stuff in there a lot--- not sure what that is.

    I've added the code you suggested and we'll see if that helps as well.

    Thanks again very much and we'll see what happens tonight.

    @bigjohnnc (author), Mar 09, 2017 - Meh... still getting an email at night. Completely blank except the number "1" for name has been entered.
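
A nearly blank email suggests the handler sends mail for any request it receives, whether or not real form data came with it. That is a guess, since the handler's code is not shown in the thread. Under that assumption, a guard like the one sketched below would stop the nightly message regardless of who is making the request. It is written in Python only to show the logic (the real handler is PHP), and the field names in `REQUIRED_FIELDS` are hypothetical.

```python
REQUIRED_FIELDS = ("name", "email", "message")  # hypothetical form field names


def should_send(method: str, form: dict) -> bool:
    """Send mail only for POST requests in which every required field has content."""
    if method.upper() != "POST":
        return False  # a crawler fetching the handler URL directly uses GET
    return all(form.get(field, "").strip() for field in REQUIRED_FIELDS)
```

With this check, a submission carrying only "1" in the name field and nothing else would be rejected before any mail goes out.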

    @Kevin2, Mar 10, 2017 - Deny from 184.154.
    in .htaccess gets rid of SiteLock.

    @bigjohnnc (author), Mar 10, 2017 - Thanks Kevin2!

    @ForePrime_SEO, Mar 12, 2017 - Here are a couple of possible reasons:

    1. Your contact form plugin may be buggy. Try switching to a different one.

    2. Maybe it's not Googlebot but a spam bot with a name similar to Google's. Check that, and if it is spam, get a tool such as Bot Spanker to block the bot.

    You don't have to disallow the page. Just mark your contact page noindex, nofollow.