Gmail deploys image proxy servers

Discussion in 'In The News' started by roundabout, Dec 9, 2013.

  1. roundabout

    roundabout VIP

    Joined:
    Feb 17, 2011
    Messages:
    2,713
    Likes Received:
    154
    Trophy Points:
    63
    Gmail deploys image proxy servers

    This afternoon Justin Foster of LiveClicker posted to the OnlyInfluencers list asking about Gmail rewriting links.

    After some investigation, testing and talking with people at various ESPs, I can confirm that Google is rewriting image links. This rewriting appears to be happening during the delivery process. Older messages that are already in mailboxes aren’t showing the rewritten links.

    Many marketers are concerned about this. The first concern is always about open tracking and how this will affect engagement metrics.

    Normal open tracking happens when a user opens an email and loads images into their mail client. Each email address is given a unique image name so that the sender knows who loaded the image. Every time a user opens the email, the image is reloaded from the image server.
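
    A minimal sketch of that per-recipient naming scheme. The host `img.example.com`, the function names and the hashing choice are all hypothetical and not taken from the post; real ESPs each do this their own way.

```python
import hashlib

# Hypothetical tracking host, used only for illustration.
IMAGE_HOST = "https://img.example.com"


def pixel_url(campaign_id: str, email: str) -> str:
    """Build an image URL that is unique per recipient and campaign."""
    key = f"{campaign_id}:{email.lower()}".encode()
    digest = hashlib.sha256(key).hexdigest()[:16]
    return f"{IMAGE_HOST}/{campaign_id}/{digest}.gif"


def record_open(open_counts: dict, path: str) -> None:
    """Count one 'open' for every request the image server receives."""
    open_counts[path] = open_counts.get(path, 0) + 1
```

    Because every address maps to its own file name, each request that reaches the image server can be attributed to one recipient, and repeat requests show up as repeat opens.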

    In the new Google setup, the first time an image is opened, Google downloads the image from the image server and caches it on a Google managed proxy. This means that the first image load can be tracked by the sender, but any subsequent image loads will not be tracked.

    For senders, this means that only the first open of any individual image will be recorded. When someone opens a mail, Google checks whether that image is in its cache. If it isn’t, Google follows the link, loads the image and puts it in the cache. Any time someone tries to load that same image again, whether the same or a different recipient, Google serves the image from the cache.

    For global images (the ones every recipient shares), this means the images are pulled from the sender’s server only once, when the first user opens the mail. In the case of tracking images, every image file name is unique. Every new open will cause Google to grab the uniquely named image. The result is that senders can track the first open, but no subsequent opens.
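
    The effect can be shown with a toy model. This is only a sketch: it assumes the proxy caches purely by URL, which is a guess about Google's behavior, and the `Origin` and `CachingProxy` classes and the example URLs are made up for illustration.

```python
class Origin:
    """Stand-in for the sender's image server; logs every request it sees."""

    def __init__(self):
        self.hits = []

    def fetch(self, url: str) -> bytes:
        self.hits.append(url)
        return b"image-bytes-for-" + url.encode()


class CachingProxy:
    """Toy proxy that caches by URL (an assumption, not confirmed behavior)."""

    def __init__(self, origin: Origin):
        self.origin = origin
        self.cache = {}

    def get(self, url: str) -> bytes:
        if url not in self.cache:
            # Cache miss: this is the only request the sender ever sees.
            self.cache[url] = self.origin.fetch(url)
        return self.cache[url]


origin = Origin()
proxy = CachingProxy(origin)

# A shared ("global") image opened by three recipients: one origin hit.
for _ in range(3):
    proxy.get("http://img.example.com/logo.png")

# Unique tracking images: alice opens twice, bob once.
alice = "http://img.example.com/t/alice.gif"
bob = "http://img.example.com/t/bob.gif"
for url in (alice, bob, alice):
    proxy.get(url)
```

    After this runs, `origin.hits` holds three entries: the logo once, and each tracking image once. The second open by alice never reaches the sender.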

    We identified the following log entry from an open at Google:

    66.249.84.36 - - [05/Dec/2013:13:00:55 -0800] "GET /zimbabwe.png
    HTTP/1.1" 200 2867 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1;
    de; rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7 (via ggpht.com)"

    which is a Google proxy string.
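
    A rough way to flag such hits in a web server log, assuming the log uses the common "combined" format shown above and that the `(via ggpht.com)` suffix in the User-Agent is a usable marker. The regular expression and the helper names are illustrative only.

```python
import re

# Pattern for one line of an Apache/nginx "combined" access log.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line: str):
    """Return the log fields as a dict, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def is_gmail_proxy(entry: dict) -> bool:
    """Heuristic: the proxy tags its User-Agent with 'via ggpht.com'."""
    return "via ggpht.com" in entry["agent"]


sample = (
    '66.249.84.36 - - [05/Dec/2013:13:00:55 -0800] '
    '"GET /zimbabwe.png HTTP/1.1" 200 2867 "-" '
    '"Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.0.7) '
    'Gecko/2009021910 Firefox/3.0.7 (via ggpht.com)"'
)
entry = parse_line(sample)
```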

    Images aren’t just used for open tracking, however. There are a number of services which provide geo-specific images depending on where the images are opened from. This new proxy is going to break that. I’m also hearing of at least one email service provider that is seeing no opens from Google today, possibly because of how their images are interacting with the proxy server.

    In any case, this is an issue we’ll be keeping a close eye on.

    Source:
    http://blog.wordtothewise.com/
     
  2. nickphx

    nickphx VIP

    Joined:
    Apr 2, 2011
    Messages:
    1,140
    Likes Received:
    363
    Trophy Points:
    83
    Gender:
    Male
    Location:
    guadalajara, chiuhuahua
    Depending on how Google is determining the "uniqueness" of an image it would be trivial to defeat their caching system.
     
  3. DKPMO

    DKPMO VIP

    Joined:
    Mar 31, 2011
    Messages:
    1,452
    Likes Received:
    68
    Trophy Points:
    48
    Location:
    Elaborate Underground Base
    Looks like it is just by the image name/URL so there is not much you can do...

    I looked at Gmail image loads through Fiddler and apparently it is not simple URL rewriting but an HTTP CONNECT request. The tunnel is apparently encrypted, since Fiddler cannot capture or show the images, just some incomprehensible binary data.

    This is the first time I have run into this CONNECT thingie, so it should take some experimentation to figure it out...
    http://en.wikipedia.org/wiki/HTTP_tunnel#HTTP_CONNECT_Tunneling
     
  4. nickphx

    nickphx VIP

    Even if Google is using the URL/filename, you can still force it not to be cached. Instead of a 302 to a static image URL, you could have a script send the matching image header (image/png, image/jpeg, image/gif, whatever) and, while the script is outputting the binary image data, pad it with random data (depending on the image) or alter it with ImageMagick tools, which will change the size and any hash used to decide whether it's unique.
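
    A sketch of the padding idea, under loud assumptions: the pixel is a commonly used 1x1 transparent GIF, image decoders are assumed to ignore bytes after the GIF trailer, and `padded_pixel` and `fingerprint` are made-up names. It only shows that the body and its hash change on every response; whether that defeats a cache keyed on the URL is untested here.

```python
import base64
import hashlib
import os

# A widely used 1x1 transparent GIF, decoded from base64.
PIXEL = base64.b64decode(
    "R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"
)


def padded_pixel(pad_len: int = 16) -> bytes:
    """Append random bytes after the image data so every body is different."""
    return PIXEL + os.urandom(pad_len)


def fingerprint(body: bytes) -> str:
    """Content hash of the kind a cache might use to compare two bodies."""
    return hashlib.sha256(body).hexdigest()
```

    Two consecutive calls to `padded_pixel()` return bodies of the same length with different fingerprints, so a cache that compares size alone would not notice, while one that compares content hashes would.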

    HTTP CONNECT is used for SSL connections. Fiddler will decrypt SSH; you just need to import the Fiddler root certificate as 'trusted', which Fiddler says not to do. If you're paranoid, just run it under a VM. If you're seeing binary crap, make sure to enable the decode gzip option.
     
  5. nickphx

    nickphx VIP

    Err, Fiddler will not decrypt SSH. I meant Fiddler will decrypt SSL. Derp.
     
  6. DKPMO

    DKPMO VIP

    This sounds about right... I just never had to deal with it before...
     
  7. DKPMO

    DKPMO VIP

    This should be worth testing, but if their behavior is what you described, what is really the point?

    If you are serving a different image each time and it won't be cached on the second load because it is different, it sounds like nothing has changed. Am I missing something here?
     
  8. nickphx

    nickphx VIP

    It all depends on how your system is handling open tracking and image delivery. I don't mail Gmail, so I haven't bothered to see what Google is doing.
     
  9. DKPMO

    DKPMO VIP

    I just ran a test and looked at Gmail opens. They do in fact fully mask the origin.

    Here is the IP -

    Here is the User Agent, clearly a bot, as my browser is Chrome -

    Subsequent page reloads get the same images quickly loaded from cache without further calls to servers. I have not tested at what point this caching behavior times out. Overall this blows at every level...
     
  10. nickphx

    nickphx VIP

    What's the structure of your original image URL?
    Is it a link to a static image, or a link with user information that does a 302 to a static image?
     
  11. DKPMO

    DKPMO VIP

    My "image URL" is just a unique, encrypted and well-randomized base64 string that encodes all tracking info without any observable pattern... There is no 302, the images are read and served by my code directly.
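
    For readers who want to picture such a token, here is a simplified stand-in. It is not the scheme described above: it signs the tracking data with HMAC instead of encrypting it (the Python standard library has no cipher), so the payload is tamper-evident but still readable after base64 decoding. The key, field names and function names are hypothetical.

```python
import base64
import hashlib
import hmac
import json

SECRET = b"replace-me"  # hypothetical signing key, for illustration only


def encode_token(info: dict) -> str:
    """Pack tracking info into a URL-safe, HMAC-signed base64 token."""
    payload = json.dumps(info, sort_keys=True, separators=(",", ":")).encode()
    sig = hmac.new(SECRET, payload, hashlib.sha256).digest()  # 32 bytes
    return base64.urlsafe_b64encode(sig + payload).decode().rstrip("=")


def decode_token(token: str):
    """Return the tracking info, or None if the token is malformed or forged."""
    try:
        raw = base64.urlsafe_b64decode(token + "=" * (-len(token) % 4))
    except ValueError:
        return None
    sig, payload = raw[:32], raw[32:]
    expected = hmac.new(SECRET, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(sig, expected):
        return None
    return json.loads(payload)
```

    The image server would decode the token from the request path and serve the image bytes itself, so no 302 is involved.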

    A bunch of people working for various ESPs did their own testing and confirmed that when Gmail makes the request to the actual image servers, it strips out all cache-related headers (which I have confirmed). Here are a few links:
    http://blog.wordtothewise.com/2013/12/more-info-about-gmail-image-caching/
    http://blog.wordtothewise.com/2013/12/faq-about-opens-and-gmail-caching/

    There was a proposed "solution" to breaking the cache by serving with the header "Content-Length: 0", but when I tried it myself it did not break the Gmail cache and instead broke non-Gmail clients (e.g. Outlook). Maybe Gmail patched that already, or maybe I had some bug in implementing this "fix" (unlikely). It would be nice if others here tested it too.
    http://emailexpert.org/gmail-tracking-changes-the-fix-what-you-need-to-know/

    I did further testing, and it looks like the caching does not rely just on local browser storage but really is done by Google, per email and per user agent. When I cleared my Chrome cache and did a reload, my server did not see the hits.

    What is something of a mystery is how their cache is expired or refreshed. I suspect it is done daily at some hour in the early AM, because in late-night experiments the images refreshed relatively quickly, while during daytime reloads I did not observe any refreshes. This is somewhat inexact, but at the very least a refresh is expected to take hours, so you won't see repeat opens after that first one...

    The reason all this is really important is because today this is done by Gmail and tomorrow it could be any other mailbox provider, which would really cut down on the amount of data we can collect and use...
     
