How RubyGems fetches Gems – Part 1

RubyGems and Bundler are often quite slow when I run them. I want to find a way to fix it. It can probably easily be done with a caching proxy, but where’s the fun in that? I want to figure out what exactly happens when I install a gem and why it is slow.

In case you want to follow along I’m using:

The index format

Let’s start by installing a gem. RubyGems has a verbose flag so you can get a bit more info about what it’s doing. This gives us the following:

$ gem install mysql2 --verbose
302 Moved Temporarily
304 Not Modified
200 OK
200 OK

The first request fetches the index of gems held by RubyGems. The index is currently hosted on Amazon S3, so the request is redirected. Let’s take a look at this file:

$ wget
$ gzip -d latest_specs.4.8.gz
$ head -2 latest_specs.4.8
[�gI"_:ETU:Gem::Version[I1.2;TI"	ruby;TI"-;TU;[I"1;T@

Well, that isn't what I expected. I assumed it would be YAML or JSON. To the source!

Searching for "latest_specs" turns up the following in lib/rubygems/source.rb:14:

  FILES = { # :nodoc:
    :released   => 'specs',
    :latest     => 'latest_specs',
    :prerelease => 'prerelease_specs',
  }

And on line 163:

  # Loads +type+ kind of specs fetching from +@uri+ if the on-disk cache is
  # out of date.
  # +type+ is one of the following:
  # :released   => Return the list of all released specs
  # :latest     => Return the list of only the highest version of each gem
  # :prerelease => Return the list of all prerelease only specs

  def load_specs(type)
    file       = FILES[type]
    fetcher    = Gem::RemoteFetcher.fetcher
    file_name  = "#{file}.#{Gem.marshal_version}"
    spec_path  = api_uri + "#{file_name}.gz"
    cache_dir  = cache_dir spec_path
    local_file = File.join(cache_dir, file_name)
    retried    = false

    FileUtils.mkdir_p cache_dir if update_cache?

    spec_dump = fetcher.cache_update_path spec_path, local_file, update_cache?

    # (error handling around this call is left out of the excerpt)
    Gem::NameTuple.from_list Marshal.load(spec_dump)
  end

OK, so it’s marshalled using Ruby’s built-in Marshal module. 4.8 is the version number of the marshalling format, which has been used since Ruby 1.8.7.
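As a quick check, the format version can be read from Ruby itself. This is only a minimal sketch: `marshal_version` and `header` are local names chosen for illustration, and I'm assuming that `Gem.marshal_version` builds its string from the same two constants.

```ruby
# Marshal exposes the version of the format it writes as two constants.
marshal_version = "#{Marshal::MAJOR_VERSION}.#{Marshal::MINOR_VERSION}"
puts marshal_version

# The same two numbers are the first two bytes of every dump,
# which is why the spec file starts with unprintable characters.
header = Marshal.dump(nil).bytes.first(2)
puts header.inspect
```

On a current Ruby this prints 4.8 and [4, 8], matching the suffix of the latest_specs.4.8 file.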

We can find the format of the marshalled data in lib/rubygems/name_tuple.rb:26:

  def self.from_list list
    list.map { |t| new(*t) }
  end

Which says it's an array of arrays. And on line 8:

class Gem::NameTuple
  def initialize(name, version, platform="ruby")
    @name = name
    @version = version

    unless platform.kind_of? Gem::Platform
      platform = "ruby" if !platform or platform.empty?
    end

    @platform = platform
  end

Loading the file in irb shows what it contains:

irb(main):003:0> Marshal.load(File.binread('latest_specs.4.8')).first
=> ["_", #<Gem::Version "1.2">, "ruby"]

So the name of this gem is _, its version is 1.2, and its platform is "ruby", meaning it is generic rather than built for a specific platform. If we look for the mysql2 gem, we find a generic version and two versions for Windows:

irb(main):011:0> Marshal.load(File.binread('latest_specs.4.8')).select { |spec| spec[0] == "mysql2" }
=> [
  ["mysql2", #<Gem::Version "0.3.11">, "x86-mingw32"],
  ["mysql2", #<Gem::Version "0.3.11">, "x86-mswin32-60"],
  ["mysql2", #<Gem::Version "0.3.17">, "ruby"]
]
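To try the whole path without downloading the real index, here is a rough sketch that builds a tiny stand-in for latest_specs.4.8.gz in memory and reads it back the way load_specs does. The `sample_specs` data is made up for illustration (it just mirrors the entries seen above), and `Zlib.gzip` / `Zlib.gunzip` stand in for the download and the gzip -d step.

```ruby
require "rubygems"
require "rubygems/name_tuple"
require "zlib"

# Hypothetical sample data in the same [name, version, platform] shape
# as the real index.
sample_specs = [
  ["_",      Gem::Version.new("1.2"),    "ruby"],
  ["mysql2", Gem::Version.new("0.3.11"), "x86-mingw32"],
  ["mysql2", Gem::Version.new("0.3.17"), "ruby"],
]

# Stand-in for the downloaded file: marshal the list, then gzip it.
gzipped = Zlib.gzip(Marshal.dump(sample_specs))

# Reading it back: gunzip, unmarshal, then wrap each array in a NameTuple.
tuples = Gem::NameTuple.from_list(Marshal.load(Zlib.gunzip(gzipped)))

mysql2 = tuples.select { |t| t.name == "mysql2" }
mysql2.each { |t| puts "#{t.name} #{t.version} #{t.platform}" }
```

The real file is far larger, but the shape is the same: one three-element array per gem, turned into a Gem::NameTuple by from_list.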

Even though the data is basically text, the Gem::Version object is included for backwards compatibility. This seems a bit suboptimal, given that it adds quite a bit of data when an individual spec is marshalled:

irb(main):002:0> Marshal.dump([ "mysql2", Gem::Version.new("0.3.17"), "ruby" ]).length
=> 59
irb(main):003:0> Marshal.dump([ "mysql2", "0.3.17", "ruby" ]).length
=> 42

However, the Ruby marshal format is rather efficient, so it only adds about 3 KB (less than 1%) of overhead to the complete compressed spec file. It also handles duplicates well: when the same object appears more than once, later occurrences are written as short back-references:

irb(main):001:0> Marshal.dump([ "abc", "abc", "abc", "abc" ]).length
=> 45
irb(main):002:0> a = "abc"
=> "abc"
irb(main):003:0> Marshal.dump([ a, a, a, a ]).length
=> 21
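Note that the saving comes from object identity, not from equal contents. The sketch below (variable names are mine, purely for illustration) shows that a shared object is still a single object after a round trip, while two equal but separate strings stay separate and cost more bytes.

```ruby
# One string object referenced twice.
shared = "abc"
shared_dump = Marshal.dump([shared, shared])
copies = Marshal.load(shared_dump)

# Two separate string objects with the same contents.
distinct_dump = Marshal.dump(["abc".dup, "abc".dup])
distinct = Marshal.load(distinct_dump)

shared_size   = shared_dump.length
distinct_size = distinct_dump.length

puts copies[0].equal?(copies[1])     # identity is preserved
puts distinct[0].equal?(distinct[1]) # equal contents, different objects
puts shared_size < distinct_size     # the back-reference is smaller
```

So a writer that wants the smaller encoding has to reuse the same object, for example a single "ruby" platform string, rather than build a fresh string for each entry.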

In the next part I’ll look at how RubyGems figures out dependencies.

WordPress multisite with multiple domains

One of the reasons I decided to migrate to WordPress rather than another blogging product is that WordPress is a platform you can build more complicated sites on. One feature (that is now built in) is multisite – it allows you to set up multiple sites sharing a single WordPress installation. The idea originally was to create a “network” of related sites, but you can use it to create sites that are completely separate from each other – each site can have its own plugins, themes and even users. Setting it up wasn’t quite as simple as I expected, so here is what I did to get it to work.

The first part of setting up multisite is to edit your wp-config.php adding a flag to enable WordPress multisite:

define('WP_ALLOW_MULTISITE', true);

Add that before the “That’s all, stop editing! Happy blogging.” line. Once that’s done, you will find a new option in the admin panel: Tools -> Network Setup. This sets up your database (remember to take a backup first; I use Updraft Plus to put weekly backups on Amazon S3) and alters the config to enable the network.

www vs non-www

This gave me a warning though, because my domains were prefixed with www. I usually set up sites to be hosted on the www subdomain and redirect the root domain to that through Nginx (or whatever web server I’m using). This ensures search engines don’t see the different subdomains as different sites; having duplicated content on different sites hurts SEO. The warning said that removing the www prefix would still leave the content available and not break anything, so I proceeded to remove it, but forgot about the redirect in Nginx, resulting in an inaccessible site. I removed the redirect, but then everything was only accessible under, going to redirected there, but I wanted it the other way around.

I restored the latest backup and took another look. The warning said the “site url” needed to be changed; this is what is in the WordPress config:

[Screenshot of the WordPress URL settings, 2014-07-14]

So which one is the “site url”? I thought the second one, so changed that and tried again… nope. By “site url” it really means WordPress Address (URL), so I changed that to

When setting up the network, WordPress gives you some settings to put in your wp-config.php and .htaccess files. As I’m running Nginx I ignored the latter (it worked out of the box after removing the redirect), and the wp-config.php file had been changed automatically, so there was nothing to do there. After that Stacked Notion was available under, with redirecting there. Weirdly, permalinks still referred to, redirecting to www when going to them – I’m not sure how to change that, so if anyone knows let me know!


By default multisite is designed to be set up for multiple subdomains under a single root domain. As I already have I could add I didn’t want that though, I wanted a new site under a separate domain

The first part is to add a new site in WordPress. I created (My Sites -> Network Admin -> Sites then Sites -> Add New) and verified this worked as expected. At first I got a blank screen: by default the new site uses the twentyfourteen theme, which I had uninstalled, so check the theme the site is using if you get the same. Next I needed to set it up to run from a separate domain.

The way to go about this is through the WordPress MU Domain Mapping plugin. Once I had installed that plugin (you need to download and install it manually rather than through WordPress) and activated it for the network (My Sites -> Network Admin -> Plugins), everything is set up through Settings -> Domain Mapping. Again, this isn’t as simple as it should be. I left the IP address field blank to ensure that the installation remains portable, and entered into the Server CNAME domain field. I left the options as the default – although if you want separate users you should disable Remote Login.

If you haven’t already, set up the DNS for your new domain to point at the server WordPress is hosted on and let it propagate. To then add the domain you go to Settings -> Domains. You don’t need to set up the original domain here ( in my case), so I just added with Site ID 2. You can get this from the wp_blogs table, but confusingly it is the blog_id value, not site_id. After that going to mapped to the new WordPress site / blog.

Tidy up

To assist testing I set up Nginx to redirect any subdomains on and to WordPress. Going to one of these where the site doesn’t exist gave an error saying registration was disabled. To avoid that I just set up a redirect in Nginx so that anything other than www redirected to the root domain.

That seemed to work, until I tried logging into This failed as WordPress still thought it was hosted under, you can’t change the domain through the interface once multisite is enabled, so I did it in the database. The options for this site were stored in wp_2_options, so I replaced with in there:

mysql> select * from wp_2_options where option_value like '%jobs%';
| option_id | option_name | option_value                  | autoload |
|         1 | siteurl     | | yes      |
|         2 | blogname    | Remote Tech Jobs              | yes      |
|        33 | home        | | yes      |
3 rows in set (0.00 sec)

mysql> update wp_2_options set option_value = '' where option_id in (1, 33);

After that everything worked as expected!

Travelling while working is terrible; travelling while working is awesome

For the last couple of months I’ve been living as a digital nomad. I’ve been to 5 cities so far, and I’m currently in Georgetown, Penang, Malaysia. It’s 32°C and sunny outside, yet I’m sitting in Starbucks (more on that later). When I meet people and tell them what I do, they say they would love to be able to do that, yet my usual response is a slightly negative “it’s not too bad”. I haven’t really seen any posts discussing the issues of being a digital nomad, so I wanted to cover that a bit.

Unlike a lot of digital nomads I work pretty much 40 hours a week. I don’t have a blog or a product I sell online; I work as a freelance software engineer. I could work less if I wanted to (I’m planning to reduce my hours over the next few months), but I still have deadlines I need to meet. I still have to be available for meetings, which often happen at 10pm due to time zone differences. I don’t have set hours, so if I just don’t feel like working I can go and sit on a beach for a couple of days, but I try to keep at least some sort of routine. If I don’t, I just end up doing whatever I feel like and then realise I haven’t done any work for the last 5 days. Even before I started travelling, I tried to keep a schedule when working, as it stops that from happening and also helps with motivation. Here, where it’s warm and sunny and the cool people you meet tell you about all the awesome things they are going to do, it’s a lot harder to motivate yourself to work.

Another issue is where to work. As I said before, I’m sitting in Starbucks. An Aussie just asked me “Why are you getting the most expensive coffee in Asia?”*, and I replied explaining what I do and that I need wifi. Unfortunately Starbucks is one of the most reliable places. Here in Georgetown there are actually quite a few decent coffee shops, however for some reason they are closed on Monday (today). In SE Asia in general there aren’t really many coffee shops, at least not what I would consider a coffee shop and wouldn’t mind spending a few hours in. Rather confusingly, what is called a kopi tiam (coffee shop) here is more reminiscent of a restaurant or food court. Wifi in accommodation is a bit hit and miss as well; guest houses usually just have a residential broadband connection, which falls apart under the load of 5 backpackers trying to use Skype. As a backup I have a prepay SIM in my phone and tether over 3G. I’ve currently got 4.5GB for £13, however that will probably only last a few weeks – mainly because the hotel I stayed at in KL had even worse wifi. One thing I’ve learned is to only book accommodation in advance for at most a couple of nights. If you like it you can extend, usually paying a lot less than you would through an online booking site. As well as avoiding terrible wifi situations, it also means you can be a bit flexible about what you do. Don’t like the place? Go somewhere else! It’s coming to the end of the season here now, so you don’t really even need to book in advance.

As a solo traveller it is very easy to meet new people; backpackers are very open, so it really takes no more than saying “hi”. A lot of people who make friends this way while travelling then just tag along with them to wherever they go next. Some people I met are going to Thailand tomorrow, and I wouldn’t mind tagging along, except I’ve got work to do over the next couple of days. So while they are off on boat trips and exploring the place, I’ll be sitting in Koh Lipe’s equivalent of Starbucks (probably a bar on the beach – ok that doesn’t sound too bad). Being a digital nomad, especially working in the way I do, you still have some responsibility and can’t just change everything at the drop of a hat. This means that although it’s easy to meet new people, it’s pretty hard to make long term friendships. Fortunately I have an SO, as developing relationships would be even harder.

A lot of the questions from would-be travellers are about how long they can travel for based on how much they have saved up. Probably the best thing about being a digital nomad is that your budget is pretty much unlimited. I don’t need to worry about choosing between a 15 ringgit (£3) and 20 ringgit (£4) dorm; I can drop 60 ringgit (£12) on a private room without blinking an eye. One of the interesting things about travelling here in SE Asia is actually how cheap it is. I’m eating out for every meal, going to a bar a few nights a week, going to tourist attractions and buying a plane/train/bus ticket every few weeks, yet I’m still spending less than I would on renting a flat back home. I’ve been told that Thailand is even cheaper than Malaysia, so I can’t wait to go there. Living as a digital nomad in SE Asia means that you will save a hell of a lot of money; alternatively (as I plan to do) it means you can just work a lot less.

Although I’ve been quite negative about this, I don’t really think I need to discuss why being a digital nomad is awesome. There are a lot of posts about that, but I wanted to go through some of the downsides. To recap: rather than being stuck at home, where it’s 6°C and raining, doing the same boring thing every day, I get to travel and explore the world. Oh, and what makes travelling while working really awesome? It’s 32°C and sunny outside. I’m off to the beach.

(*A place here sells Kopi Luwak for £9/cup which is more expensive than Starbucks)

This was also posted on Reddit. You can find further insightful discussion there.