Pixabay Blog


Hotlinking protection and watermarking for Google Images

Due to inline linking, the last update of Google Images caused a loss of traffic for most publisher websites. Find out how to get back your visitors by applying a sophisticated hotlinking protection.


On January 25, 2013, Google's image search was updated to a new interface. With the old design, when a visitor clicked a thumbnail in the search results, Google loaded the publisher’s website in the background and included information about the image in an extra frame. A click on the image removed the frame and brought you to the image source. Today, pictures in search results are hotlinked in full resolution within Google. As a result, visitor numbers of most hosting websites have dropped dramatically, by about 50% to 85%.

However, because Google links the images inline, bandwidth consumption on the source servers has remained the same. Additionally, displaying large images directly in search results goes far beyond "fair use", and by doing so Google clearly violates the rights of image authors. This article shows how you can handle and mitigate these issues.

 


Pixabay traffic from Google Image Search between January 08, 2013 and February 10, 2013.

 

1. Getting images removed from Google

If you are worried about copyright violations and don't care about the number of visitors you get from Google Images, this approach may be your best choice. There are two effective ways to keep your images out of Google's index:

a) Add a robots.txt file to the root of your server and enter the following lines:

User-agent: Googlebot-Image
Disallow: /

To remove only a specific file type, e.g. all gif images, you may use an expression like this:

User-agent: Googlebot-Image
Disallow: /*.gif$

With these rules in place, Google will stop crawling images on your site, but it will take quite some time until they are removed from its search index.
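To sanity-check rules like these before deploying them, a minimal sketch using Python's standard `urllib.robotparser` may help. The image path is made up for illustration. As far as we know, this parser only does simple prefix matching, so the sketch covers the blanket `Disallow: /` rule and not the wildcard variant.

```python
from urllib.robotparser import RobotFileParser

# The blanket rule from above: keep Google's image crawler out entirely.
ROBOTS_TXT = """\
User-agent: Googlebot-Image
Disallow: /
"""


def is_allowed(user_agent: str, path: str) -> bool:
    """Return True if the given crawler may fetch the given path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # "/photos/tree.jpg" is a hypothetical path, used only for illustration.
    print(is_allowed("Googlebot-Image", "/photos/tree.jpg"))
    print(is_allowed("Googlebot", "/photos/tree.jpg"))
```

With these rules, only the image crawler is blocked; the regular Googlebot, which has no entry of its own here, may still fetch your pages.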

b) Another option, which takes effect immediately, is enabling hotlink protection on your server. This may also save you a lot of bandwidth by preventing other sites from displaying your images. How this is achieved depends on your type of server. Here's how it's done for Apache, NGINX, and lighttpd.

 

2. Showing watermarks inside Google Images

The following approach can save you bandwidth and may bring back a large portion of your lost traffic: Since images are hotlinked inside Google Images, it's fully in your hands to decide what content to send back to Google. Instead of delivering the actual image, you may include a watermark and/or scale down the image before serving it to Google. Warning: Google may interpret this method as cloaking and ban your website at will!

FansShare.com first came up with the idea of "on-the-fly" watermarking and implemented the technique only days after Google Images was redesigned. As of February 18, 2013, we also use this type of protection for Pixabay, and we've seen a reasonable recovery of our traffic.

 


Watermarked picture of Pixabay inside Google Images 

If your website is driven by WordPress, there are already a few plugins available for this process:

Those plugins work quite well, and WP-PICShield in particular offers sophisticated hotlink protection, even for the new Bing. If you require more granular control or are not using WordPress, you need to develop your own system. For that, you should know about URL rewriting and be familiar with HTTP header fields. If your project is powered by NGINX/Django, we'll gladly share further details in the comments section of this article.

 

a) HTTP referrer-based watermarking

This method is used in the above-mentioned WordPress plugins. It doesn't require any changes to your posts, but it usually fails with Google's SSL search (https), which accounts for about 15% to 40% of all searches, since browsers omit the referrer when an https page requests an http resource. The approach is rather simple: an image request with the string ".google.com/blank.html" in its HTTP referrer is redirected to a watermarked version of the original picture. To save bandwidth, the watermarked image could additionally be downsized; this is untested, though! For better performance, it's best to create all watermarked copies in advance, but they may also be generated on the fly (use a cache!).
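The referrer check itself can be sketched in a few lines of Python. This is an illustration only, not the code of any of the plugins; the function name and the `/watermarked` path prefix are assumptions made for this example.

```python
from typing import Optional

# Marker from the text above: the referrer sent by Google's image overlay.
GOOGLE_FRAME_MARKER = ".google.com/blank.html"


def pick_image_path(original_path: str, referrer: Optional[str]) -> str:
    """Return the path to serve for an image request.

    Requests coming from Google's overlay get the watermarked copy,
    everyone else (including requests without a referrer) gets the original.
    """
    if referrer and GOOGLE_FRAME_MARKER in referrer:
        # Hypothetical layout: watermarked copies live under /watermarked/.
        return "/watermarked" + original_path
    return original_path


if __name__ == "__main__":
    print(pick_image_path("/img/tree.jpg", "https://www.google.com/blank.html"))
    print(pick_image_path("/img/tree.jpg", None))
```

Note that a missing referrer falls through to the original image here, which is exactly why this variant does not cover Google's SSL search.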

An inverse approach would be to allow only selected domains, particularly your own, to hotlink images. Because unknown and/or blank referrers are blocked, the watermark also appears in Google SSL search results. However, the HTTP referrer is not transmitted for users surfing in anonymous mode, and as a result these users will see the watermark on your website as well.
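A minimal Python sketch of this allowlist variant follows. The host names in `ALLOWED_HOSTS` are placeholders and the function name is made up for this example.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical allowlist; replace with your own domain(s).
ALLOWED_HOSTS = {"example.com", "www.example.com"}


def needs_watermark(referrer: Optional[str]) -> bool:
    """Return True unless the request was referred by an allowed host.

    Blank or unparsable referrers are treated like unknown ones,
    so they receive the watermarked image as well.
    """
    if not referrer:
        return True
    try:
        host = urlsplit(referrer).hostname
    except ValueError:
        # Malformed referrer value: treat it as unknown.
        return True
    return host not in ALLOWED_HOSTS


if __name__ == "__main__":
    print(needs_watermark("https://www.example.com/post/1"))
    print(needs_watermark("https://www.google.com/search"))
    print(needs_watermark(None))
```

The last call shows the trade-off described above: a visitor whose browser sends no referrer is indistinguishable from a hotlinked request.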

Allow Googlebot and other search engine crawlers

No matter which version you choose, it's important to grant search engine crawlers like Googlebot access to the non-watermarked images. Otherwise, even the thumbnails in search results will show a watermark, which has severe disadvantages and is not our goal! Thus, make sure to create an exception in your URL rewriting for selected HTTP User-Agents, among others Googlebot, bingbot, Slurp, Baiduspider, Yandex, and Sogou. These are the crawler names of the main search engines.
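Such a crawler exception boils down to a pattern match on the User-Agent header. The following Python sketch uses the crawler names listed above; the function name is an assumption for this example. Keep in mind that the User-Agent header can be set freely by any client, so this check identifies crawlers but does not authenticate them.

```python
import re
from typing import Optional

# Crawler names as listed in the text above (matched case-sensitively).
CRAWLER_PATTERN = re.compile(r"Googlebot|bingbot|Slurp|Baiduspider|Yandex|Sogou")


def is_search_crawler(user_agent: Optional[str]) -> bool:
    """Return True if the User-Agent header names one of the known crawlers."""
    if not user_agent:
        return False
    return CRAWLER_PATTERN.search(user_agent) is not None


if __name__ == "__main__":
    print(is_search_crawler("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"))
    print(is_search_crawler("Mozilla/5.0 (Windows NT 6.1) Firefox/19.0"))
```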

 

b) URL-based watermarking (SSL search proof)

Aside from parsing the optional HTTP referrer, there's another, reliable way of detecting Google image requests that also works in Google SSL search. Simply show search engine crawlers (slightly) different image URLs than your real visitors; let's call them trap URLs. Google hotlinking is easily revealed by matching these URLs on incoming requests. On Pixabay, we've decided to go for this approach, because in combination with the HTTP referrer we're also able to override clicks on the "View original image" button. More details and full examples are given below.
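As a rough Python sketch of the idea: crawlers are shown the plain image URL, which thereby becomes the trap URL, while real visitors get a marked one. The marker used here is the bare `i` query parameter that Pixabay's setup, described further down in this article, relies on. The function names are assumptions for this example.

```python
import re
from typing import Optional

# Crawler names as listed earlier in this article.
CRAWLER_PATTERN = re.compile(r"Googlebot|bingbot|Slurp|Baiduspider|Yandex|Sogou")


def image_url_for(image_path: str, user_agent: Optional[str]) -> str:
    """Build the image URL to embed in a page for the requesting client.

    Crawlers see the plain path (the trap URL); real visitors get "?i" appended.
    """
    if CRAWLER_PATTERN.search(user_agent or ""):
        return image_path
    return image_path + "?i"


def is_trap_url(query_string: str) -> bool:
    """Return True if an incoming image request lacks the visitor marker."""
    return query_string != "i"


if __name__ == "__main__":
    print(image_url_for("/img/tree.jpg", "Googlebot-Image/1.0"))
    print(image_url_for("/img/tree.jpg", "Mozilla/5.0 Firefox/19.0"))
    print(is_trap_url(""))
```

Since Google only ever learns the plain URL, any request for it that does not come from a crawler can be assumed to originate from a search result.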

 

3. Hijacking clicks on "View original image" button in Google Images

Unfortunately, directly linked images are not the only issue with Google's new interface: to make sure users really don't need to see any publisher website anymore, Google additionally included a button called "View original image" next to the search result details.

"View original image" button in Google Images

Clicking this button delivers the full resolution image directly for downloading inside your browser. Naturally, we don't want this and here's what we do about it: If Google has been identified via trap URL as the origin of an image request, we also check the content of the HTTP referrer. If it contains ".google.com" and not ".google.com/blank.html", we redirect Google's request to the article/post the image is placed in. This approach even works for SSL Google search.

Pitfall warning: Served watermarked images must not be cached by the client's browser! Otherwise, clicking "View original image" will only fetch the watermarked image from the browser's cache instead of starting a new request. Cache-prevention is achieved by setting the following HTTP header field:

Cache-Control: no-cache, must-revalidate

This redirect will fail for users surfing in anonymous mode (blocked referrer). However, when using URL-based watermarking, they will only find watermarked copies of your images on Google. Thus, in order to get the original images, such users still have to visit the publisher's website.
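The handling of a request that has already been identified as a trap URL can be sketched in Python as follows. The function name, the returned tuple, and the 302 status code are assumptions made for this example; the two referrer strings and the Cache-Control value are the ones given above.

```python
from typing import Dict, Optional, Tuple

# Header value from the text above; keeps browsers from reusing the watermarked copy.
NO_CACHE = {"Cache-Control": "no-cache, must-revalidate"}


def respond_to_trap_request(
    referrer: Optional[str], post_url: str, watermark_path: str
) -> Tuple[int, Dict[str, str], Optional[str]]:
    """Return (status, headers, file to serve) for a request to a trap URL."""
    ref = referrer or ""
    if ".google.com" in ref and ".google.com/blank.html" not in ref:
        # Click on "View original image": send the visitor to the post instead.
        # The 302 status is an assumption; the article only says "redirect".
        return 302, {"Location": post_url}, None
    # Hotlinked preview, or no referrer at all: watermarked copy, never cached.
    return 200, dict(NO_CACHE), watermark_path


if __name__ == "__main__":
    print(respond_to_trap_request("https://www.google.com/imgres?q=tree", "/en/tree/", "/wm/tree.jpg"))
    print(respond_to_trap_request("https://www.google.com/blank.html", "/en/tree/", "/wm/tree.jpg"))
```

A request without a referrer ends up in the second branch, which matches the behavior for anonymous users described above.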

 

4. Covering Google.de, Google.fr, Bing, and more

Not all of Google's local domains have been updated yet. In particular, Google.de (Germany) and Google.fr (France) are still using the "old" interface. Therefore, we recommend additionally including a so-called framekiller in the body of your HTML documents:

<script>if (top != self) top.location.replace(location);</script>

This short JavaScript snippet prevents your website from being embedded in an iframe. Clicks on images inside Google.de/Google.fr will therefore open the publisher's website directly, instead of showing the image encapsulated inside a preview frame. This client-side redirect also works for most Bing image search results and for other websites that try to embed your content.

 

Applied hotlinking protection on Pixabay

To summarize the methods presented above, the following lines show in detail how we handle all requests and URLs on Pixabay:

  1. Search engine crawlers, like Googlebot and bingbot, find plain image URLs in our posts. For other/real visitors, we simply append a GET parameter "i" to all relevant image URLs. Thus, "tree.jpg" becomes "tree.jpg?i". This difference is enough to detect Google image requests correctly. Crawlers are identified via HTTP User-Agent (see above). In addition, to grant Facebook access to our non-watermarked images, we also append the GET param "i" to all og:image tags.

  2. Watermarks have been created for all existing images in advance, and watermarked copies of new uploads are created automatically. We use Python for this task; if you'd like to know the details, simply ask in the comments of this article.

  3. URL rewriting in our NGINX server config takes place according to the following scheme:
    If the HTTP referrer contains ".google." and not ".google.com/blank.html"
        Then redirect trap URLs to the post of the image ("View original image" button)
    Else if the request comes from a crawler (HTTP User-Agent) or GET param "i" is present
        Then serve the normal image without watermark
    Else
        Show watermarked image (prevent caching!)
    ...
    Some standard hotlink protection rewrite rules follow that are not related to Google Images

    Using conditional URL rewriting in NGINX requires some rather tricky lines of code. Therefore, we show you the relevant and slightly censored part of our NGINX config file:

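    # Defaults: no redirect to the post, serve the watermarked copy.
    set $button_redirect 0;
    set $watermark 1;
    # Any Google referrer is treated as a click on "View original image" ...
    if ($http_referer ~ "\.google\.") { set $button_redirect 1; }
    # ... unless it is the blank.html frame that hotlinks the image inside the results.
    if ($http_referer ~ "\.google\.[^/]+/blank\.html") { set $button_redirect 0; }
    # Search engine crawlers always get the original image.
    if ($http_user_agent ~ "Googlebot|bingbot|Slurp|Baiduspider|Yandex|Sogou") {
        set $button_redirect 0;
        set $watermark 0;
    }
    # IMAGE_URL_REGEX, POST_URL and WATERMARK_URL are placeholders (the censored parts).
    if ($button_redirect = 1) { rewrite "IMAGE_URL_REGEX" POST_URL last; }
    # Requests carrying the bare "i" query string come from our own pages.
    if ($args = "i") { set $watermark 0; }
    if ($watermark = 1) {
        add_header Cache-Control "no-cache, must-revalidate";
        rewrite "IMAGE_URL_REGEX" WATERMARK_URL last;
    }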
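For readers who find the NGINX flags hard to follow, the same order of checks can be restated as a plain Python function. This is a sketch for reasoning about and testing the decision logic, not a part of our server setup; the function name and the three return labels are made up for this example.

```python
import re
from typing import Optional

# Same patterns as in the NGINX config above.
GOOGLE_REFERRER = re.compile(r"\.google\.")
GOOGLE_BLANK_FRAME = re.compile(r"\.google\.[^/]+/blank\.html")
CRAWLERS = re.compile(r"Googlebot|bingbot|Slurp|Baiduspider|Yandex|Sogou")


def decide(referrer: Optional[str], user_agent: Optional[str], args: str) -> str:
    """Return 'redirect', 'original' or 'watermark' for an image request."""
    referrer = referrer or ""
    user_agent = user_agent or ""

    # A Google referrer other than the blank.html frame means a button click.
    button_redirect = bool(GOOGLE_REFERRER.search(referrer))
    if GOOGLE_BLANK_FRAME.search(referrer):
        button_redirect = False

    watermark = True
    if CRAWLERS.search(user_agent):
        # Crawlers are never redirected and never see a watermark.
        button_redirect = False
        watermark = False

    if button_redirect:
        return "redirect"
    if args == "i":
        # Marked URL: the request was issued from one of our own pages.
        watermark = False
    return "watermark" if watermark else "original"


if __name__ == "__main__":
    print(decide("https://www.google.com/imgres?q=tree", "Mozilla/5.0", ""))
    print(decide("https://www.google.de/blank.html", "Mozilla/5.0", ""))
    print(decide(None, "Googlebot-Image/1.0", ""))
```

Writing the rules this way makes it easy to run a handful of referrer and User-Agent combinations through them before touching the live server config.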

We've only been running Pixabay for a few days with this protection enabled and here are our preliminary results:

 


Google Images hotlinking protection results for Pixabay

We began experimenting on February 17, 2013. At first, we only used the HTTP referrer for detecting Google Images requests. Traffic increased significantly, by about 60%. On February 19, 2013, we additionally implemented URL-based watermarking by appending "?i" to relevant image URLs. As you can see, our Google traffic has largely recovered. Compared to the number of visitors with hotlink protection disabled, we observe an increase in traffic of roughly 100%.

These findings are based on answers to my recently posted question on Stack Overflow. And now it's up to you to spread the word! Share the information on Facebook and Twitter, and tell your colleagues and friends in the web business. Good luck!

, 22.02.2013

Comments

geost77  4 days ago
Excellent ! I got WP-PICShield and made it work at my website (my website is not Wordpress). Just have in mind megapixels setting, as I had it set max to 6MPx by default and my photos were not protected at google images, as most of my photos are more than 12MPx. Once I had setting changed to allow WP-PICShield work with photos up to 25 MPx all is fine (you may need to increase memory limit, as GD use lot of RAM for big photos).
byrev  15 days ago
@sebastyan21 " the best courses about pinterest? "

do not spam, do not abuse the pins, do not put pictures with strong xxx context, do not follow to much per day :)

... and add funny pictures of cats and dogs, and you have traffic :D
sebastyan21  15 days ago
Thanks byrev. Indeed my traffic is up by 20 to 30% in last 24 hours. IS AMAZING !!!
Thank you guys!

What are the best courses about pinterest?
I'm really interested in image traffic cuz I have more than 10 images per post in my industry:)

Thanks again.
I bookmarked this site:X:X
byrev  15 days ago
@sebastyan21 So seems to work fine, you just need to watch/monitor the statistics to see if anything changes.
sebastyan21  15 days ago
yes, appear something like this:
to view the entire image click here:)
Simon  15 days ago
@sebastyan: It's best to simply check on Google Images, if your watermark and the redirects are working correctly.
sebastyan21  16 days ago
I downloaded the ByREV WP-PICShield plugin and enabled it on my sites.
Is it enough to do this and I receive the traffic from images again right?

Thanks !
zambapk  04/12/2013
Hi @byrev,
did you check my email?

Sameer
zambapk  04/11/2013
Thanks @byrev! email sent.
byrev  04/11/2013
Hi @zambapk

write'me to byrev @ yahoo dot com , and send the site name ... I'll check and respond today when I have free time.