In this blog post I’ll cover how to use Fastly’s CDN to speed up JSON reads from a Google Spreadsheet.

JSON via Google Sheets: Rad + Slow

My unplugged-ness tracker uses a public Google Spreadsheet as a database and retrieves the contents as JSON for webby visualization. The result is a system that can track information for any user that has a Google account without any custom server infrastructure.

While kinda slick, Google Spreadsheets JSON retrieval is notoriously slow, making for a slow web UI.

Without Fastly</div>With Fastly</div>

Downloading, merging, slicing, and displaying Unplug data

The javascript library I’m using to interact with the Google Spreadsheets API (Tabletop.js) has custom support for a Heroku-based caching proxy to work around this issue. Fastly should be faster (and I work there) so I use it for caching instead.

Using Fastly to front a Google Spreadsheet is straightforward, except that you have to configure Fastly to tweak the cache control headers from the origin before it processes a response. Once configured, there is a dramatic speedup. Read on for details.

Speeding it up with Fastly

My goal was to have requests to the spreadsheet behind unplugged.js (hosted on spreadsheets.google.com) from your browser route to a reasonably-recent cached version at one of Fastly’s “Points of Presence” (PoP) around the world. Fastly magic will route users to the best (fastest) PoP, and spreadsheets.google.com is really slow, so I was expecting a decent speedup.

The scheme I devised was to have cdn.fastly.com CNAME to global.prod.fastly.net and to configure Fastly to cache requests to this hostname (cdn.fastly.com) to spreadsheets.google.com. The second part is kind of a hack, and there are some security implications (see below), but it seems to work reasonably well.

Here is how I did it:

1. Make cdn.foote.pub route to Fastly

This step is specific to the domain hosting solution. I’m using Google Domains (and Github Pages) for this site, so the result looks something like this:

Configuration

And the DNS response for foote.pub, post-Fastly:

$ dig cdn.foote.pub

; <<>> DiG 9.8.3-P1 <<>> cdn.foote.pub
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 1631
;; flags: qr rd ra; QUERY: 1, ANSWER: 5, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;cdn.foote.pub.         IN  A

;; ANSWER SECTION:
cdn.foote.pub.      3600    IN  CNAME   global.prod.fastly.net.
global.prod.fastly.net. 2   IN  CNAME   global-ssl.fastly.net.
global-ssl.fastly.net.  2   IN  CNAME   fallback.global-ssl.fastly.net.
fallback.global-ssl.fastly.net. 2 IN    A   199.27.76.249
fallback.global-ssl.fastly.net. 2 IN    A   23.235.46.249

;; Query time: 121 msec
;; SERVER: 192.168.1.1#53(192.168.1.1)
;; WHEN: Tue Jul  7 16:31:45 2015
;; MSG SIZE  rcvd: 147

2. Make Fastly cache from spreadsheets.google.com

Fastly has lots of docs on basic configuration but you might not need them. The UI is reasonably intuitive, with help links in the right places.

Configuration

… so I’ll gloss over this step.

3. Deal with cache control headers

As far as I know there is no way to modify the cache control headers for a Google Spreadsheet, and by default Fastly will obey them. You can view the cache control headers for a spreadsheet via curl:

$ curl -vv -H "Fastly-Debug: 1" http://spreadsheets.google.com/feeds/worksheets/1QOPqFo1RQFrnyFGbvvNkU_JFeoWV8KYCiLR1jVOZ1nI/public/basic?alt=json 1> /dev/null
* Hostname was NOT found in DNS cache
* Connected to spreadsheets.google.com (63.88.73.88) port 80 (#0)
> GET /feeds/worksheets/1QOPqFo1RQFrnyFGbvvNkU_JFeoWV8KYCiLR1jVOZ1nI/public/basic?alt=json HTTP/1.1
> User-Agent: curl/7.37.1
> Host: spreadsheets.google.com
> Accept: */*
> Fastly-Debug: 1
> 
< HTTP/1.1 200 OK
< Content-Type: application/json; charset=UTF-8
< X-Robots-Tag: noindex, nofollow, nosnippet
< Access-Control-Allow-Origin: *
< Expires: Tue, 07 Jul 2015 20:44:03 GMT
< Date: Tue, 07 Jul 2015 20:44:03 GMT
< Cache-Control: private, max-age=0, must-revalidate, no-transform
< Vary: Accept, X-GData-Authorization, GData-Version
< GData-Version: 1.0
< Last-Modified: Tue, 07 Jul 2015 20:43:19 GMT
< Transfer-Encoding: chunked
< P3P: CP="This is not a P3P policy! See http://www.google.com/support/accounts/bin/answer.py?hl=en&answer=151657 for more info."
< P3P: CP="This is not a P3P policy! See http://www.google.com/support/accounts/bin/answer.py?hl=en&answer=151657 for more info."
< X-Content-Type-Options: nosniff
< X-Frame-Options: SAMEORIGIN
< X-XSS-Protection: 1; mode=block
* Server GSE is not blacklisted
< Server: GSE
< Set-Cookie: NID=69=p6UwIbqGdbpRGQ1INtBl_-6-egpYXxPuJxLzC-cy4anLBPPsWfKHFX9S5Znom76uPvzOKwRk6XLKvtPhQ7WWbgqsYiSaJcFLoUyOzWUCEmO447Rg6Z4NJqHySVi_CyBC;Domain=.google.com;Path=/;Expires=Wed, 06-Jan-2016 20:44:03 GMT;HttpOnly
< Set-Cookie: NID=69=ZnqSmbMSyFiWnhdZOazTKK1eCVEZu-whfNmeK7UsTm5XuAX4MJRNJuaskXVqoigva8ZElIqiGBZWJxjq9fhQfSM7fmy8CAWYz97JrS6x3rQDxpgLAvWj9bKNg8uLK1mo;Domain=.google.com;Path=/;Expires=Wed, 06-Jan-2016 20:44:03 GMT;HttpOnly
< Alternate-Protocol: 80:quic,p=0
< 
{ [data not shown]

Here are the relevant headers and what I had to do to fix them:

  • Expires: Tue, 07 Jul 2015 20:44:03 GMT
    • Google tells Fastly to expire the content immediately, which makes sense for a Spreadsheet that might be updated in real-time (but not for JSON used by a web app that is not-quite-real-time)
    • Todo: Need to tell Fastly to delete this
  • Cache-Control: private, max-age=0, must-revalidate, no-transform
    • Google tells Fastly not to cache the content in a lot of ways
    • Todo: Need to tell Fastly to delete this
  • Vary: Accept, X-GData-Authorization, GData-Version
    • This one tells Fastly not to store HTTP responses that might correspond to the spreadsheet not being served – if the browser can’t handle the format, if the user supplies a garbage access key (a bad key results in an error even if the spreadsheet is public), etc.
    • Todo: Nothing; These will work as expected

I’ll omit the detail here, but you can see that I added rules to the Content pane of my Fastly config that correspond to the changes above.

Configuration

Note that Fastly runs a heavily modified version of Varnish cache – each of those rules correspond to statements in the varnish configuration file that Fastly consumes on my behalf.

Since my app updates from my devices roughly once every 5 minutes when they are active, I also added a rule to set the Time-To-Live (beresp.ttl) for cached versions of the spreadsheet to 5 minutes. This way, users will see unplugged-ness data that is no more than 5+5 = 10 minutes in the graphs on my web page.

4. Test

I used curl to see if the spreadsheet is caching in Fastly’s CDN. The command below passes the ID of the public spreadsheet I use for the unplug app along with the Fastly-Debug header to gather some extra debug info.

$ curl -vv -H "Fastly-Debug: 1" http://cdn.foote.pub/feeds/worksheets/1QOPqFo1RQFrnyFGbvvNkU_JFeoWV8KYCiLR1jVOZ1nI/public/basic?alt=json 1> /dev/null
* Hostname was NOT found in DNS cache
> GET /feeds/worksheets/1QOPqFo1RQFrnyFGbvvNkU_JFeoWV8KYCiLR1jVOZ1nI/public/basic?alt=json HTTP/1.1
> User-Agent: curl/7.37.1
> Host: cdn.foote.pub
> Accept: */*
> Fastly-Debug: 1
> 
< HTTP/1.1 200 OK
< Content-Type: application/json; charset=UTF-8
< X-Robots-Tag: noindex, nofollow, nosnippet
< Access-Control-Allow-Origin: *
< GData-Version: 1.0
< Last-Modified: Tue, 07 Jul 2015 20:15:17 GMT
< Content-Encoding: gzip
< P3P: CP="This is not a P3P policy! See http://www.google.com/support/accounts/bin/answer.py?hl=en&answer=151657 for more info."
< P3P: CP="This is not a P3P policy! See http://www.google.com/support/accounts/bin/answer.py?hl=en&answer=151657 for more info."
< X-Content-Type-Options: nosniff
< X-Frame-Options: SAMEORIGIN
< X-XSS-Protection: 1; mode=block
* Server GSE is not blacklisted
< Server: GSE
< Alternate-Protocol: 443:quic,p=1
< Content-Length: 731
< Accept-Ranges: bytes
< Date: Tue, 07 Jul 2015 20:38:49 GMT
< Via: 1.1 varnish
< Age: 1393
< Connection: keep-alive
< Fastly-Debug-Path: (D cache-iad2126-IAD 1436301529) (F cache-iad2144-IAD 1436300136)
< Fastly-Debug-TTL: (H cache-iad2126-IAD - - 1393) 
< X-Served-By: cache-iad2126-IAD
< X-Cache: HIT
< X-Cache-Hits: 1
< X-Timer: S1436301529.573645,VS0,VE1
< Vary: Accept, X-GData-Authorization, GData-Version
< 
{ [data not shown]

There is other useful info in the response headers, but the X-Cache: HIT indicates a win for caching.

5. Speeeeeeed

Here is the output from Chrome’s network graph before and after the switch:

Before

Before

After

After

In addition to the massive speedup, I found that Fastly delivers the content much more consistently that Google Spreadsheets. Before Fastly, I noticed that occassionally Google Sheets would take over a minute to respond to requests – usually I’d just reload the page and try again.

Now the only thing that is holding me back is my crappy javascript.

Security implications

Because anyone can upload content to spreadsheet.google.com, and this solution points cdn.foote.pub at spreadsheets.google.com, there are some security issues with this setup.

For example, if this was a web app and not a static web page, there would be Same Origin Policy issues to consider – anyone with a Google account can serve spreadsheets via cdn.foote.pub. But since this webpage doesn’t do anything technically interesting the worst thing that could happen is defacement – you could make a cdn.foote.pub link to a spreadsheet that defames me!

In addition, since foote.pub is hosted on Github Pages I’m not able to configure TLS for the site dierctly. So for now I’ve essentially configured Fastly to strip SSL from connections to Google Sheets served over cdn.foote.pub. Again, not a big deal for this webpage, but would allow an attacker to eavesdrop on an app that deals with sensitive data.

Conclusion

Configuring Fastly’s CDN to speed up my Google Spreadsheets graphs wasn’t too rough, and the CDN’s super quick purging saved me some pain in the little debugging I had to do.

If you made it this far I hope you enjoyed the post – thanks for reading.