Pixabay Blog


Hotlinking Protection and Watermarking for Google Images

Due to inline linking, the latest update of Google Images caused a loss of traffic for most publisher websites. Find out how to get your visitors back by applying sophisticated hotlink protection.

On January 25, 2013, Google's image search was updated to a new interface. With the old design, when a visitor clicked a thumbnail in the search results, Google loaded the publisher's website in the background and showed information about the image in an extra frame. A click on the image removed the frame and took the visitor to the image source. Today, pictures in search results are hotlinked in full resolution within Google. As a result, visitor numbers of most hosting websites have dropped dramatically, by about 50% to 85%.

However, due to inline linking on Google's end, bandwidth consumption for source servers remained the same. Additionally, displaying large images directly in search results goes way beyond "fair use", and by doing so, Google clearly violates the rights of image authors. This article shows how you can handle and mitigate these issues.

 


Pixabay traffic from Google Image Search between January 08, 2013 and February 10, 2013.

 

1. Getting images removed from Google

If you are worried about copyright violations and don't care about the number of visitors you get from Google Images, this approach may be your best choice. There are two effective ways to prevent your images from appearing in Google's index:

a) Add a robots.txt file to the root of your server and enter the following lines:

User-agent: Googlebot-Image
Disallow: /

To remove only a specific file type, e.g. all gif images, you may use an expression like this:

User-agent: Googlebot-Image
Disallow: /*.gif$

With these rules, Google will stop crawling images on your site, but it will take quite some time until they are removed from its search index.
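Before deploying such rules, it can help to check how a parser reads them. The following is a minimal sketch using Python's standard library; the path "/photos/tree.jpg" is a made-up example. Note that the standard-library parser only does plain prefix matching, so it can check the first rule set above but not the wildcard pattern "/*.gif$".

```python
from urllib.robotparser import RobotFileParser

# The first rule set from above: block Google's image crawler everywhere.
ROBOTS_TXT = """\
User-agent: Googlebot-Image
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# "/photos/tree.jpg" is a hypothetical image path used for illustration.
image_bot_allowed = parser.can_fetch("Googlebot-Image", "/photos/tree.jpg")
web_bot_allowed = parser.can_fetch("Googlebot", "/photos/tree.jpg")

print("Googlebot-Image allowed:", image_bot_allowed)
print("Googlebot allowed:", web_bot_allowed)
```

In this sketch only the image crawler is blocked; the regular web crawler is still allowed to fetch the path.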

b) Another, immediate option is to enable hotlink protection on your server, which may also save you a lot of bandwidth by preventing other sites from displaying your images. How this is achieved depends on your type of server. Here's how it's done for Apache, NGINX, and lighttpd.

 

2. Showing watermarks inside Google Images

The following approach can save you bandwidth and may bring back a large portion of your lost traffic: Since images are hotlinked inside Google Images, it's fully in your hands to decide what content to send back to Google. Instead of delivering the actual image, you may add a watermark and/or scale down the image before serving it to Google. Warning: Google may interpret this method as cloaking and ban your website at will!

FansShare.com first came up with the idea of "on-the-fly" watermarking and implemented the technique only days after Google Images was redesigned (see it in action). As of February 18, 2013, we also use this type of protection for Pixabay, and we've seen a reasonable recovery of our traffic.

 


Watermarked picture of Pixabay inside Google Images

 

If your website is driven by WordPress, there are already a few plugins available for this process:

Those plugins work quite well, and WP-PICShield in particular offers sophisticated hotlink protection, even for the new Bing. If you require more granular control, or are not using WordPress, you need to develop your own system. For that, you should know about URL rewriting and be familiar with HTTP header fields. In case your project is powered by NGINX/Django, we'll gladly share further details in the comments section of this article.

 

a) HTTP referrer-based watermarking

This method is used in the WordPress plugins mentioned above. It doesn't require any change inside your posts, but it usually fails with Google's SSL search (https), which accounts for about 15% to 40% of all searches. The approach is rather simple: an image request with the string ".google.com/blank.html" in its HTTP referrer is redirected to a watermarked version of the original picture. In order to save bandwidth, the watermarked image could additionally be downsized appropriately; this is untested, though! For better performance, it's best to create all watermarked copies in advance, but they may also be generated on the fly (use a cache!).

An inverse approach would be allowing only selected domains for hotlinking images, particularly your own domain. By blocking unknown and/or blank referrers, the watermark is also included in Google SSL search results. However, transmission of the HTTP referrer is omitted for users surfing in anonymous mode, and as a result, these users get to see the watermark on your website, as well.
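The inverse (allowlist) approach can be sketched in a few lines of Python. This is only an illustration of the decision, not our production code: the host names in ALLOWED_HOSTS and the function name needs_watermark are assumptions made up for this example.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist: only these hosts may embed original images.
ALLOWED_HOSTS = {"example.com", "www.example.com"}


def needs_watermark(referer, allowed_hosts=ALLOWED_HOSTS):
    """Return True unless the referrer clearly points to an allowed host.

    Blank or unparsable referrers are treated like unknown sites, which is
    what makes this variant also cover Google's SSL search.
    """
    if not referer:
        return True
    host = urlsplit(referer).hostname  # None if the value is not a full URL
    return host is None or host not in allowed_hosts


print(needs_watermark("http://example.com/posts/tree/"))     # own page
print(needs_watermark("https://www.google.com/blank.html"))  # Google preview
print(needs_watermark(""))                                    # blocked referrer
```

The trade-off described above is visible in the last call: a visitor on your own site whose browser does not send a referrer is treated exactly like a hotlinking site.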

Allow Googlebot and other search engine crawlers

No matter which version you choose, it's important to grant search engine crawlers like Googlebot access to non-watermarked images. Otherwise, even the thumbnails in search results will show a watermark, which has severe disadvantages and is not our goal! Thus, make sure to create an exception in your URL rewriting for selected HTTP User-Agents, among others Googlebot, bingbot, Slurp, Baiduspider, Yandex, and Sogou. These are the crawler names of the main search engines.
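As a rough illustration, the crawler exception boils down to a substring match on the User-Agent header. The sketch below uses the crawler names listed above; the function name is_search_crawler is made up for this example. Keep in mind that the User-Agent is sent by the client and can be spoofed, so this check identifies crawlers by name only.

```python
import re

# Crawler names taken from the list above; matching is case-sensitive.
CRAWLER_PATTERN = re.compile(r"Googlebot|bingbot|Slurp|Baiduspider|Yandex|Sogou")


def is_search_crawler(user_agent):
    """Return True if the User-Agent header names one of the known crawlers."""
    return bool(user_agent) and CRAWLER_PATTERN.search(user_agent) is not None


googlebot_ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
browser_ua = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.17 Chrome/24.0 Safari/537.17"

print(is_search_crawler(googlebot_ua))
print(is_search_crawler(browser_ua))
```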

 

b) URL-based watermarking (SSL search proof)

Aside from parsing the optional HTTP referrer, there's another, reliable way of detecting Google image requests that also works in Google SSL search. Simply show (slightly) different image URLs to search engine crawlers than to your real visitors. Let's call them trap URLs. Google hotlinking is easily revealed by matching these URLs on incoming requests. On Pixabay, we've decided to go for this approach, because in combination with the HTTP referrer we're also able to override clicks on the "View original image" button. More details and full examples are given below.
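The idea can be sketched as two small helpers, one for rendering image URLs in a page and one for classifying incoming requests. This is a simplified illustration: it uses the "?i" marker that this article describes further down for Pixabay, it assumes image paths carry no other query string, and the function names are made up for this example.

```python
# Marker appended for real visitors; crawlers see the plain ("trap") URL.
TRAP_MARKER = "i"


def image_url_for(path: str, is_crawler: bool) -> str:
    """Return the plain trap URL for crawlers and the marked URL for visitors."""
    return path if is_crawler else f"{path}?{TRAP_MARKER}"


def is_trap_request(query_string: str) -> bool:
    """A request without the marker was most likely built from an indexed URL."""
    return query_string != TRAP_MARKER


print(image_url_for("/static/tree.jpg", is_crawler=True))
print(image_url_for("/static/tree.jpg", is_crawler=False))
print(is_trap_request(""))
print(is_trap_request("i"))
```

Crawlers themselves also request the plain URL, so a real setup still needs the User-Agent exception described above before serving a watermark.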

 

3. Hijacking clicks on "View original image" button in Google Images

Unfortunately, directly linked images are not the only issue with Google's new interface: To make sure users really don't need to see any publisher website anymore, Google additionally included a button called "View original image" next to the search result details.

"View original image" button in Google Images

Clicking this button delivers the full resolution image directly for downloading inside your browser. Naturally, we don't want this and here's what we do about it: If Google has been identified via trap URL as the origin of an image request, we also check the content of the HTTP referrer. If it contains ".google.com" and not ".google.com/blank.html", we redirect Google's request to the article/post the image is placed in. This approach even works for SSL Google search.

Pitfall warning: Served watermarked images must not be cached by the client's browser! Otherwise, clicking "View original image" will only fetch the watermarked image from the browser's cache instead of starting a new request. Cache-prevention is achieved by setting the following HTTP header field:

Cache-Control: no-cache, must-revalidate

This redirect will fail for users surfing in anonymous mode (blocked referrer). However, when using URL-based watermarking, they will only find watermarked copies of your images on Google. Thus, in order to get the original images, such users still have to visit the publisher's website.

 

4. Covering Google.de, Google.fr, Bing, and more

Not all of Google's local domains have been updated yet. In particular, Google.de (Germany) and Google.fr (France) are still using the "old" interface. Therefore, we recommend additionally introducing a so-called framekiller in the body of your HTML documents:

<script>if (top != self) top.location.replace(location);</script>

This short JavaScript snippet prevents your website from being embedded in an iframe. Clicks on images inside Google.de/Google.fr will therefore directly open the publisher's website, instead of showing the image encapsulated inside a preview frame. This client-side redirect also works for most Bing image search results and for other websites that try to embed your content.

 

Applied hotlinking protection on Pixabay

To summarize the methods presented above, here is in detail how we handle all requests and URLs on Pixabay:

  1. Search engine crawlers, like Googlebot and bingbot, find plain image URLs in our posts. For all other (real) visitors, we simply append a GET parameter "i" to all relevant image URLs, so "tree.jpg" becomes "tree.jpg?i". This difference is enough to detect Google image requests correctly. Crawlers are identified via their HTTP User-Agent (see above). In addition, to grant Facebook access to our non-watermarked images, we also append the GET param "i" to all og:image tags.

  2. Watermarks have been created for all existing images in advance, and watermarked copies of new uploads are created automatically. We use Python for this task; if you'd like to know the details, simply ask in the comments of this article.

  3. URL rewriting in our NGINX server config takes place according to the following scheme:
    If the HTTP referrer contains ".google." and not ".google.com/blank.html"
        Then redirect trap URLs to the post of the image ("View original image" button)
    Else if the request comes from a crawler (HTTP User-Agent) or GET param "i" is present
        Then serve the normal image without watermark
    Else
        Show watermarked image (prevent caching!)

    ...
    Some standard hotlink protection rewrite rules follow that are not related to Google Images

    Conditional URL rewriting in NGINX requires some rather tricky lines of code. Therefore, here is the relevant, slightly redacted part of our NGINX config file:

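    # Defaults: no button redirect, serve the watermarked copy
    set $button_redirect 0;
    set $watermark 1;
    # Referrer is a Google page: candidate for the "View original image" redirect
    if ($http_referer ~ "\.google\.") { set $button_redirect 1; }
    # ...unless it is the blank.html frame that loads the inline preview
    if ($http_referer ~ "\.google\.[^/]+/blank\.html") { set $button_redirect 0; }
    # Search engine crawlers always get the original image
    if ($http_user_agent ~ "Googlebot|bingbot|Slurp|Baiduspider|Yandex|Sogou") {
        set $button_redirect 0;
        set $watermark 0;
    }
    # "View original image" button: send the visitor to the post instead
    if ($button_redirect = 1) { rewrite "IMAGE_URL_REGEX" POST_URL last; }
    # Real visitors request image URLs with "?i" and get the original image
    if ($args = "i") { set $watermark 0; }
    # Everything else gets the watermarked copy, which must not be cached
    if ($watermark = 1) {
        add_header Cache-Control "no-cache, must-revalidate";
        rewrite "IMAGE_URL_REGEX" WATERMARK_URL last;
    }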
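For readers who don't use NGINX, the same decision order can be written as a plain function. The following Python sketch mirrors the config excerpt above, including its regular expressions and the order of the checks. The function name decide and the three result labels are made up for this illustration, and the IMAGE_URL_REGEX match is assumed to have already succeeded.

```python
import re

# Patterns copied from the NGINX excerpt above.
GOOGLE = re.compile(r"\.google\.")
GOOGLE_BLANK = re.compile(r"\.google\.[^/]+/blank\.html")
CRAWLERS = re.compile(r"Googlebot|bingbot|Slurp|Baiduspider|Yandex|Sogou")


def decide(referer: str, user_agent: str, query_string: str) -> str:
    """Return "redirect_to_post", "original" or "watermark" for an image request."""
    is_crawler = CRAWLERS.search(user_agent) is not None
    from_google = GOOGLE.search(referer) is not None
    from_preview = GOOGLE_BLANK.search(referer) is not None

    # "View original image" button: Google referrer, but not the preview frame.
    if from_google and not from_preview and not is_crawler:
        return "redirect_to_post"
    # Crawlers and real visitors (marked URL "?i") get the original image.
    if is_crawler or query_string == "i":
        return "original"
    # Anything else is treated as hotlinking (send with no-cache headers).
    return "watermark"


print(decide("https://www.google.com/search?q=tree", "Mozilla/5.0", ""))
print(decide("https://www.google.com/blank.html", "Mozilla/5.0", ""))
print(decide("", "Mozilla/5.0 (compatible; Googlebot/2.1)", ""))
print(decide("http://pixabay.com/", "Mozilla/5.0", "i"))
```

As in the config, the button redirect is checked first, so a request with a Google referrer is redirected even if it carries the "?i" marker.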

We've only been running Pixabay for a few days with this protection enabled and here are our preliminary results:

 

Google Images hotlinking protection results for Pixabay


 

We began experimenting on February 17, 2013. For starters, we only used the HTTP referrer for detecting Google Images requests. Traffic increased significantly, by about 60%. On February 19, 2013, we additionally implemented URL-based watermarking by appending "?i" to relevant image URLs. As you can see, our Google traffic has largely recovered. Compared to the number of visitors with hotlink protection disabled, we observe an increase in traffic of roughly 100%.

These findings are based on answers to my recently posted question on Stack Overflow. And now it's up to you to spread the word! Share the information on Facebook and Twitter, and tell your colleagues and friends in the web business. Good luck!

Update: In the meantime - three months later - we have published a follow-up post, in which we present our findings and conclusions concerning this type of hotlinking protection: An update on watermark-based hotlink protection for Google Images
Update 2: Using "blank.html" for deciding whether to serve the watermarked image stopped working in May 2014. A new NGINX configuration is given in our latest Google Images post.
Feb. 22, 2013

Comments

Simon  05/20/2014
Just published an update post that works as of May 2014: http://pixabay.com/blog/posts/watermarks-and-redirects-in-google-images-reload-49/
Simon  03/28/2014
We don't use Apache / htaccess, but if anyone can provide this code, I'll include it in the post above.
xitclub  03/27/2014
Hello, can you tell please me configuration for htaccess on Apache?
Simon  09/01/2013
I really doubt that this is possibly on blogger. At least I wouldn't have a clue about it, sorry!
finicky321  09/01/2013
Greetings , Its a bit harder for me to understand coding etc. Can you please guide me how to do something similar to this on blogger? Regards
jonathantimar  07/31/2013
Thank you, Simon :)
Simon  07/31/2013
Nice work, Jonathan :-)
jonathantimar  07/31/2013
I wanted to implement this solution, but since I am on the much more common Apache webserver, you instructions weren't useful to me. I finally managed to figure out how to achieve this same protection on Apache, and have detailed the instructions here for anyone who is interested. http://inthelimelight.net/stop-google-image-search-from-hotlinking-your-photos-without-a-plugin-bing-too/
zambapk  07/14/2013
Hi Byrev, i was using wp-picshield plugin on my website and it was working fine. recently i've shifted my website code on GoDaddy VPS. on this new server wp-picshield is not working. when i enable plugin, this error appears: "Service Temporarily Unavailable" is there any special settings on the server, i've to enable? please help me in this regard.
byrev  07/04/2013
from my point of view, the pixabay trafic problem is not from watermak ;) ... I explained more here: http://pixabay.com/en/blog/posts/an-update-on-watermark-based-hotlink-protection-fo-41/#comments
