Cloud Development Best Practices: Matthew on Package Management
By popular demand, we have just released the new PF command-line tool. With it you can interact with PHP Fog from the command line and continue to use git submodules.
We have heard from many of our users that git submodules are absolutely essential to the way they build their apps. A lot of us here at AppFog have used this technology in the past too, and we have some very spirited discussions in the office about which technologies are best and how to use them.
We have collected a series of hard-fought best practices (we’re ashamed to say we’re not always above throwing things…). We’d like to take this opportunity to share some of the reasons we have moved away from git submodules and toward package management tools, like Composer and Bundler, as essential elements of those best practices.
We’ve no doubt that many of you, including some of us within AppFog itself, will have further thoughts and comments about this, and we welcome them all.
Matthew Lyon, one of our senior engineers, had a lot to share during our discussions about the deprecation of git submodules. Matthew is a huge proponent of “compassionate programming” which he describes as “having compassion for those who will later have to understand your abstractions, be they customers, other developers, or your own fine self on a poorly caffeinated or sleep deprived day.”
Matthew believes very strongly that git submodules are more akin to spending your weekend with a particularly angry badger than to a compassionate afternoon of herbal tea and a nice massage. We asked him to delve deeper into his thinking on the subject, and this is what he had to say:
Matthew: A robust package manager of some kind makes it easier to manage complex projects within a team.
AppFog: Can you give us an example?
Matthew: Personally, whenever I can, I use Git submodules in my dotfiles repo to manage the Vim plugins my Vim setup “depends” on, instead of downloading them from the Vim script archive and stuffing them in there manually.
The nice thing about this approach for the submoduled plugins is that when a new version of Syntastic comes out with support for, say, Potion, I can run “git pull” in that submodule’s directory instead of looking up where to download the new version, downloading it, splatting it over the previous version, and then getting Git’s index in order.
AppFog: It’s probably a good idea to avoid splatting whenever possible.
Matthew: My thoughts exactly. The first downside to this approach is that submodules are a pain to manage. My dotfiles repo is checked out in literally dozens of places, and keeping its submodules in order forced me to learn more about how Git works than I would ask of any of our customers.
AppFog: They’d probably rather be spending time drinking tea and getting massages.
Matthew: I know I would. Plus, there’s *still* no “git submodule remove” command. As many of you know, I’ve written several of my own, and I still haven’t boiled down the convoluted set of commands, which I nearly always have to look up, into a nice abstraction. That’s an interface issue, however; even if our customers ran into it, it wouldn’t get in the way of managing deployments.
AppFog: I sense there’s more you want to say on the subject…
Matthew: Oh, yes, I have several points… The second downside is that submodules I reference will occasionally go away, and then I can’t get *any* of them until I spend some time figuring out which one it was, removing it, and then hopefully pushing those changes back to origin. Unless, of course, I’m logged into a server somewhere that I haven’t turned on SSH agent forwarding for, in which case I have to repeat those steps elsewhere, and… Oh wait, what was I trying to do again?
AppFog: Sounds like a pain.
Matthew: But wait, there’s more… The third downside is that submodules aren’t first-class citizens in a git repository. What if you want to experiment with a new version of your dependency? Submodules are properties of the repository, not the branch, so if you update a submodule in one branch, it’s updated for all your branches. If you go into a submodule’s directory, make changes, and commit, git will happily let you do that. Then one of your collaborators pulls down your repo with a reference to a submodule commit that doesn’t exist at the submodule’s URL, and things break. Because git happily lets you do this, our customers have done so, and their apps have broken through no fault of their own.
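The failure described above (a commit made inside a submodule, recorded by the superproject, but never pushed to the submodule’s own remote) can at least be caught locally before it is recorded. Below is a minimal Python sketch, not an AppFog tool: it parses the text printed by “git submodule status”, whose documented leading “+” means the submodule’s checked-out commit differs from the one recorded in the superproject’s index, and lists those paths as a reminder to push the submodule commit first. The function names are hypothetical, and the parser assumes submodule paths contain no spaces.

```python
from typing import List, NamedTuple


class SubmoduleStatus(NamedTuple):
    flag: str  # ' ', '-', '+', or 'U', as documented for `git submodule status`
    sha: str
    path: str


def parse_submodule_status(output: str) -> List[SubmoduleStatus]:
    """Parse `git submodule status` output, one submodule per line."""
    entries = []
    for line in output.splitlines():
        if not line.strip():
            continue
        flag, rest = line[0], line[1:]
        # After the flag: SHA-1, path, then an optional "(describe)" part.
        # Assumption: paths contain no spaces, so a plain split() is enough.
        fields = rest.split()
        entries.append(SubmoduleStatus(flag, fields[0], fields[1]))
    return entries


def flag_moved_submodules(output: str) -> List[str]:
    """Return paths whose checked-out commit differs from the recorded one.

    A '+' usually means someone committed inside the submodule. That commit
    should be pushed to the submodule's own remote before the superproject
    records it, or collaborators will reference a commit they can't fetch.
    """
    return [e.path for e in parse_submodule_status(output) if e.flag == "+"]


if __name__ == "__main__":
    sample = (
        " 1111111111111111111111111111111111111111 bundle/syntastic (v3.0.0)\n"
        "+2222222222222222222222222222222222222222 bundle/fugitive\n"
    )
    print(flag_moved_submodules(sample))  # ['bundle/fugitive']
```

In practice you would feed it the real output, for example from subprocess.run(["git", "submodule", "status"], capture_output=True, text=True).stdout, in a pre-commit hook.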
AppFog: Nothing ruins a relaxing tea time like your app breaking.
Matthew: Exactly. These latter two issues have been problems for our customers, and they create a dilemma for AppFog: how should we handle these failures? The correct thing might be to halt the deploy and ask the customer to fix the problem, but the customers I’ve heard from said they’d prefer we DIDN’T do that and instead deploy their now possibly broken app anyway.
AppFog: And now we’re back to spending time with the angry badger.
Matthew: In rare cases, it’s not even their submodules that are the problem; it’s a submodule that one of their submodules relies on, perhaps two or three times removed. You tell them they can’t download library baz, but they say they’re only using library foo, and it turns out that foo depends on bar, which depends on bee, which depends on baz. This is a real problem, and it has happened to one of our customers. It wasted a ton of their time and ours, too. You can imagine the customer’s frustration at having to deal with a problem that took so long to troubleshoot.
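Tracking down which package several levels down is the missing one is a graph search. As an illustrative Python sketch, assuming each package’s direct dependencies are already known (the dictionary below is hand-written, not read from any real tool), a depth-first search can report the whole chain from the library the customer uses to the one that can’t be fetched, using the foo/bar/bee/baz names from the example. The function name is hypothetical.

```python
from typing import Dict, List, Optional, Set


def find_missing_path(
    root: str,
    depends_on: Dict[str, List[str]],
    available: Set[str],
) -> Optional[List[str]]:
    """Depth-first search from `root` for the first package that isn't available.

    `depends_on` maps each package to its direct dependencies; `available` is
    the set of packages that can actually be fetched. Returns the chain of
    packages leading to the missing one, or None if everything is reachable.
    """
    visited: Set[str] = set()

    def visit(pkg: str, path: List[str]) -> Optional[List[str]]:
        if pkg not in available:
            return path
        if pkg in visited:
            return None  # already explored; also guards against cycles
        visited.add(pkg)
        for dep in depends_on.get(pkg, []):
            found = visit(dep, path + [dep])
            if found:
                return found
        return None

    return visit(root, [root])


if __name__ == "__main__":
    deps = {"foo": ["bar"], "bar": ["bee"], "bee": ["baz"]}
    chain = find_missing_path("foo", deps, {"foo", "bar", "bee"})
    print(" -> ".join(chain))  # foo -> bar -> bee -> baz
```

Reporting the full chain, rather than just “baz is missing”, is what answers the customer’s “but I only use foo”.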
AppFog: Something like that would probably require us to go back on our rage medication.
Matthew: And all of this is before we even get to the added complexity for our queue workers, network issues, and access control problems: apps that reference submodules in private repositories lead to “but it works for me” issues. I’ve seen any number of problems that wouldn’t exist if git just shipped the dependent files along with the repo.
AppFog: What would you recommend as a solution for this kind of problem?
Matthew: Most languages have large, centrally managed collections of libraries. Perl has CPAN, Python has PyPI, Java has Maven (among others), Ruby has RubyGems, and Node.js has npm (which is becoming useful to client-side JavaScript programmers as well, through the excellent package manager Ender). These library collections are highly available; published libraries are publicly available (no access control issues) and don’t simply go away (a very slim chance of removed-repository issues).
AppFog: What about PHP?
Matthew: Packagist seems to be gaining momentum in the PHP world.
AppFog: So, that’s good news… Problem solved!
Matthew: Well, not entirely… The presence of a central library repository doesn’t magically make all of the problems with submodules go away, but it does mitigate many of them. What we’d like is for customers to ship their dependencies alongside their code, akin to how AppFog works… Let me tell you a story about Ruby’s Bundler and Node’s npm.
AppFog: Please do…
Matthew: I have a Rails app, circa 2006. It runs Rails 1.1.2, so I log into the server, run “sudo gem install rails --version 1.1.2”, and then I can run my app… Oh wait… this is the cloud. One does not simply log into the server and perform ad-hoc systems administration. Those of you with VPSs shouldn’t do that anyway, because that way madness lies.
So at this point I take my Rails gem, stick it in my code’s directory structure (“cd vendor && gem unpack rails”), and ship it with my app. This approach is called “vendoring” because the convention has been to put it in vendor/. Then you’ve got other gems. Let’s say you need a Markdown parser, and perhaps something to act as a front end to ImageMagick. The convention in the Rails world was vendor/gems, and this became the best practice at the time.
Now you set about adding your dependent gems to your repository so that your collaborators don’t have to worry about installing them. Say you decide to go with ‘image_science’ as your image processor, so you check it into vendor/gems and push it up because everything seems to be working. Then you get a 2 AM call from your night-owl collaborator who’s trying to deploy your feature long before the 8 AM press release announcing your new awesome, but can’t, because he doesn’t have ‘hoe’ installed. What? It turns out that ‘image_science’ depends on ‘hoe’, but because you have ‘hoe’ installed locally and your Rails app loads RubyGems, you never ran in a clean environment that would have told you about this. Even more fun is when everyone has ‘hoe’ installed locally and no one notices the problem until it breaks your production site and you have to do it live.
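A static check can catch the ‘hoe’ surprise before the 2 AM call. The Python sketch below is hypothetical (real Ruby tooling would read each gem’s gemspec; here the dependency map is written by hand): it verifies that every dependency of every vendored package is itself vendored, so the vendor directory doesn’t quietly lean on whatever happens to be installed locally. It is not a substitute for actually running the app in a clean environment.

```python
from typing import Dict, List, Set, Tuple


def undeclared_vendored_deps(
    vendored: Set[str],
    depends_on: Dict[str, List[str]],
) -> List[Tuple[str, str]]:
    """List (package, dependency) pairs where a vendored package needs something not vendored.

    An empty result means the vendored set is closed under dependencies:
    nothing in vendor/ relies on a package that only exists on your machine.
    """
    missing = []
    for pkg in sorted(vendored):
        for dep in depends_on.get(pkg, []):
            if dep not in vendored:
                missing.append((pkg, dep))
    return missing


if __name__ == "__main__":
    # Hand-written dependency data mirroring the story above.
    deps = {"image_science": ["hoe"], "rails": []}
    print(undeclared_vendored_deps({"rails", "image_science"}, deps))
    # [('image_science', 'hoe')]
```

Checking every vendored package, not just the ones you required directly, is what makes the check transitive: if each vendored package’s dependencies are vendored, so are theirs.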
AppFog: Never mind that you shouldn’t be using a hoe to manage anything but weeds in your garden.
Matthew: No wonder you left your “career” in comedy to join AppFog.
AppFog: It’s no wonder at all.
Matthew: Well, the scenario I described also says nothing about the problems with gems such as mysql-superfast that need native extensions compiled against system libraries, which raise questions like “where is the linker path?” and “32 or 64 bit what?”, and pretty soon you’re contemplating a career in pig farming, because, well, bacon.
AppFog: I do love some good bacon.
Matthew: For a long time, the accepted practice with managed hosting was to ask your host whenever you needed a library that you couldn’t just ship with your code. Hosts would have the 200 most popular libraries installed, and if you could vendor a library, you did. If it needed native compilation, the host did that for you after you filed a support ticket. It would take them a while to get their ops guy to quit complaining about having to do this *again*, and a week later you would be able to deploy. Agile!
AppFog: Like a gazelle!
Matthew: Relying on “system installed” libraries like this is also problematic for two reasons: one, your host can install new versions with backwards-incompatibilities that break your app (this happened to me once, and exactly once, back when I thought shared hosting was a neat idea), and two, they might not get around to installing security updates as quickly as you’d like. Or, you know, ever.
AppFog: Or it could be a combination of the two…
Matthew: Vendoring manually has its problems too, though. Chief among them is the lack of canonical information about your dependencies. The designer wants something called ‘compass’, which depends on a newer version of ‘sass’; you think you’re using sass 1.2.4, but compass wants 2.0.4 specifically (2.0.5 has a bug that hasn’t been fixed yet, and 2.0.6 is out but compass hasn’t been updated to reflect that), and argh, what a chore.
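Part of the chore is simply noticing the conflict. As a toy Python sketch that handles only exact version pins (real resolvers such as Bundler also handle version ranges), the function below groups who requires which version of each package and reports packages pinned to more than one version. The requirers and versions come from the compass/sass example; the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def conflicting_pins(pins: List[Tuple[str, str, str]]) -> Dict[str, Dict[str, List[str]]]:
    """Report packages that different requirers pin to different exact versions.

    `pins` holds (requirer, package, exact_version) triples. The result maps each
    conflicting package to {version: [requirers]} so you can see who wants what.
    """
    by_package: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for requirer, package, version in pins:
        by_package[package][version].append(requirer)
    # Only packages with more than one distinct pinned version are conflicts.
    return {pkg: dict(versions) for pkg, versions in by_package.items() if len(versions) > 1}


if __name__ == "__main__":
    pins = [
        ("app", "sass", "1.2.4"),
        ("compass", "sass", "2.0.4"),
    ]
    print(conflicting_pins(pins))
    # {'sass': {'1.2.4': ['app'], '2.0.4': ['compass']}}
```

A single manifest is what makes this kind of check possible at all: with hand-vendored code, the pins are scattered across whatever each library happened to document.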
AppFog: Let’s just tell the designer to shove off…
Matthew: That’s what I always do.
AppFog: Wouldn’t it be great if you could list out your dependencies in a single manifest?
Matthew: Rails took an early stab at this with “rake gems:install”, but I don’t know anyone who ever got that to work. Alternatives to Rails were gaining popularity in the Ruby world, and then Bundler came along. For all its warts, it more or less works as advertised: you have a ‘Gemfile’, you run “bundle install”, it finagles your system into having the necessary dependencies, and everyone is happy until you forget to prepend “bundle exec” to a command somewhere and accidentally delete your entire database.
AppFog: Serious bummer.
Matthew: Yeah, that’s putting it mildly. Bundler also lets you point a gem at a git URL, so you can use your custom bug-patched version of some gem whose author seems to have vanished off the internet (yes, I’m guilty of this).
It’s also unfortunate that Bundler doesn’t encourage you to “vendor” your gem dependencies in your app’s source control repository. So any machine that checks out the repository has to go out, fetch the gems, and install them. And for dependencies pulled from git URLs, you’ll run into the same issues we’ve had with git submodules.
The node.js community was emerging while the Ruby world was figuring this out, and the package manager that won them over, npm, learned a lot from these *hard* lessons. A nice facet of npm is that you declare dependencies for your application the same way you do for a library, in the “package.json” file. So from your app’s root directory you type “npm install”, and it does more or less what “bundle install” does, except that it stores everything in a “node_modules” directory under your app’s root. For an app, you then check those dependencies into source control, as with vendoring, and your team and your deployment servers all run the same code. npm also provides a “rebuild” command for getting around host architecture issues.
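When dependencies are checked in this way, a quick pre-deploy sanity check is to confirm that every entry in package.json’s “dependencies” has a matching directory under node_modules. The Python sketch below assumes the usual node_modules/<name>/package.json layout; it checks presence only, not versions or nested dependencies, and the function name is hypothetical.

```python
import json
from pathlib import Path
from typing import Dict, List


def missing_node_modules(app_root: str) -> List[str]:
    """Return dependency names from package.json that have no installed copy in node_modules/.

    Reads the "dependencies" object of the app's package.json and checks that
    node_modules/<name>/package.json exists for each entry. Versions and nested
    dependencies are not checked; this only catches an incomplete checked-in tree.
    """
    root = Path(app_root)
    manifest = json.loads((root / "package.json").read_text())
    deps: Dict[str, str] = manifest.get("dependencies", {})
    return sorted(
        name
        for name in deps
        if not (root / "node_modules" / name / "package.json").is_file()
    )
```

Running it against the app’s root before committing, and failing if the list is non-empty, keeps the checked-in node_modules in step with what package.json declares.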
AppFog: That seems pretty great.
Matthew: There was a fair bit of controversy about this approach at first, and people were running, e.g., “npm install -g connect” to install things ‘globally’ because they had ten apps that depended on the connect library and didn’t want ten copies of it lying around… apparently disk space is really super expensive these days thanks to flooding in Thailand. Or something.
AppFog: Dude. Flooding in Thailand is so not funny. Too soon…
Matthew: Okay, you’re right about that… I haven’t spent much time poking at Composer yet, but Packagist seems to be picking up steam, and if a package isn’t available on Packagist, Composer lets you download packages from git URLs, zip files, or wherever. From what I can tell, the problem with its approach to git URLs is that it makes no attempt to integrate them into an existing project’s source control if that project is also using git.
AppFog: Maybe this is something that can be dealt with in the future?
Matthew: Perhaps we can look into sending them a pull request that would support that, or find a way to manage “dependent” git repositories outside of the main git repository and then copy them into it locally. Either way, we want our customers to send the dependent library code alongside their app code so there are no issues at deploy time with trying to grab third-party code off the internet. This principle is similar to how AppFog works.
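The second idea, keeping dependent git repositories outside the app and copying them in, might look something like this minimal Python sketch. The vendor/<name> layout follows the Rails convention mentioned earlier and is an assumption, as is the function name; it copies the dependency’s working tree without its .git directory, so the app repository commits plain files rather than a submodule reference.

```python
import shutil
from pathlib import Path


def vendor_checkout(src_checkout: str, app_root: str, name: str) -> Path:
    """Copy a dependency's working tree into <app_root>/vendor/<name>, minus any .git entry.

    The dependency's own repository lives outside the app, where you can
    `git pull` it as usual; this copies a plain snapshot of its files into the
    app so they are committed and deployed with the app's code. Any previous
    copy is removed first so files deleted upstream don't linger.
    """
    dest = Path(app_root) / "vendor" / name
    if dest.exists():
        shutil.rmtree(dest)
    # ignore_patterns(".git") skips both .git directories and .git files
    # at every level of the tree; copytree creates missing parent directories.
    shutil.copytree(src_checkout, dest, ignore=shutil.ignore_patterns(".git"))
    return dest
```

Replacing the whole directory on each run is the simple choice here; it trades a bit of copying for never having to reconcile stale files by hand.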
AppFog: Thanks for taking the time to talk with us, Matthew!
Matthew: No problem, but next time can we talk after my massage hour? Plus, my tea is getting cold.
AppFog: Sure thing!
