If you are in IT, you have probably noticed that most of the industry’s technical buzz lately has centered on one of three huge areas – cloud computing, NoSQL and devops. Unlike Web 2.0 or the Social Web, which are about content generation and content consumption models on the Internet, these three are about how software systems are built and operated – it is “engineering” vs “product.”
DevOps is on the rise as a newly redefined standalone discipline, as evidenced by the growing number of good articles about it around the blogosphere. In this post, I am going to take a stab at outlining what DevOps means to me.
I’ve got some devops cred. Before joining my current employer, where my role gradually morphed away from devops, I spent over 2 years at Orbitz.com in a group in charge of monitoring and automating a hugely distributed multi-datacenter custom airfare search application. It ran on many hundreds of machines, with several times as many separate entities and processes that needed to be coordinated, restarted, tweaked and so on (our group was in charge of everything above hardware, OS and basic network services such as connectivity, DNS and DHCP). Before that I held various sysadmin roles, all of which involved a large degree of coding beyond the level of simple shell or perl scripts.
To me, devops is a distinct discipline at the border between software engineering and ops, which focuses on developing software for the infrastructure on top of which end-user-facing software is running. It’s sometimes referred to as development of infrastructure software and includes release deployment. Devops has the following distinguishing characteristics.
1. Ability to write code beyond simple scripts
Obvious necessary condition.
2. Focus on stability and uptime
Stability and uptime in devops almost always trump features.
3. Extra focus on moving between states
In dev land, I have often observed situations when the end result of a particular feature was analyzed on its own merit, without taking into consideration how a system can be moved from its current state to its desired future state. Devops pays extra attention to this problematic area.
4. Different angle on business revenue
While developers usually work on things that are meant to increase or sustain business revenues, devops often work on things that are meant to prevent or reduce loss of business revenues. This is somewhat similar to defense vs offense in team sports. The key word is “balance.”
5. In devops, we are users of our own software
This is one of the most important distinctions. Unlike developers who create software to be used by someone else (internal customers, end users, site visitors, etc), devops is about developing software for internal needs. For example, you can certainly get sloppy in logging that error, but it’s you, not someone else, who’s going to suffer the consequences of having to waste extra time to find necessary information.
6. Architect, developer, tester, product manager, project manager – all in one.
In my personal experience in devops, I or my team get an area of responsibility, and it’s up to us to make it happen. Assigning priorities, figuring out dependencies, reacting to unexpected changes, managing resources – all of these functions are performed in devops by the same group of individuals.
7. Awareness of normal accidents
I have an entire blog post dedicated to this – check it out.
8. QA in production
Some tasks in devops can’t be adequately tested in smaller synthetic environments. Lack of scale, lack of unique hardware, lack of sufficient capacity in vendor’s test environment, lack of sufficient connectivity from test site to vendor’s systems – all could be factors. Phased deployment and other techniques designed to reduce the risk of a complete meltdown are (or should be) used extensively in such scenarios, but the truth is – from time to time in devops I had no other way but to actually run a test system in a live production environment.
9. Manual first, then automate
In my experience, a devops task is more likely to start out as something done manually and to be automated later. In dev land, tasks rarely go through a manual phase before being coded up and shipped in a release.
10. Almost always distributed or hyper-distributed
Conclusion
Devops is on the rise primarily due to the realization that there is a big gap between developing end-user systems and bare-bones systems administration, in large part due to the fast growth of IaaS cloud computing. Devops originated at places where relatively few sysadmins were in charge of many hundreds or even thousands of hosts – where doing their job without automation was impossible. As time goes on, I expect devops will further solidify its role as a first-class citizen and make inroads into non-cloudy companies as well.
Tags: devops · infrastructure development
February 16th, 2010 · 1 Comment
In computer science, according to Wikipedia, abstraction is a “mechanism to reduce and factor out details so that one can focus on a few concepts at a time.”
When you hear about abstraction in the context of virtualization-based IaaS cloud computing, the best-known abstraction is that of the computing resources themselves (encapsulation is at play here as well). You don’t need to know the exact hardware on which your instance is running, or the exact network setup – you only need to be able to treat your compute instances as nearly identical units that respond to a certain set of signals in a predictable way.
With the emergence of multiple IaaS clouds, however, there is a second abstraction that is going to play a big role – the workload.
A workload is an abstraction of the actual work that your instance or set of instances is going to perform. Running a web server or a web server farm, or being a Hadoop data node – these are all valid workloads. I am treating a workload as an abstraction because I intentionally leave out a huge component – exactly how the work in a given workload gets mapped to the resources offered by a given cloud. When speaking in terms of workloads, I want to focus on what needs to be done, as opposed to how it’s going to be done in the context of a particular cloud (remember that from a technical architecture perspective, clouds are far from identical).
For example, “run this blog for 1 year, for up to 100 visitors a day” is a what (workload), while “run this blog on m1.small EC2 instance in us-east-1 for 1 year” or “run this blog on Terremark instance with 1 VPU and 1 GB of RAM for 1 year” are a how (for lack of a better word, I am going to call them deployments).
I think such an abstraction is very helpful. Running this blog may take 1 small instance in one cloud, half of a small instance in another, and a third of an instance plus a dedicated load balancer in a third. As you can see, once you map a workload to a set of compute, storage and network resources offered by one cloud, you can no longer move it to another cloud – your deployments are non-transferable from one cloud to another. Workloads here serve as transferable equivalents of your cloud deployments.
Secondly, workloads by themselves may have properties or attributes that dictate where a workload can or can’t run. This justifies the existence of a workload as a separate entity – it is in theory possible to construct a workload for which no deployment can exist in any of the clouds available today.
There are many examples of attributes a workload may possess. A workload may have a compliance attribute, which says that the workload must run in an environment with a certain certification. Another attribute may be a geolocation requirement, whereby the workload must run within a certain geographic region for legal reasons.
A workload may be time-bound (“runs for 5 hours”) or time-unbound. A workload may have a specific start time or a flexible start time, in which case it may have a hard stop time (for example, it must finish by a certain time in the future). It may be interruptible or may have to run without interruptions.
A workload may have a certain lower limit of resources that it needs, expressed in work-independent form. For example, serving a WordPress blog for 1 visitor a day and serving it for 100 visitors an hour are two very distinct workloads (note that the workloads are different, while the application inside the workload is the same). The latter will certainly end up consuming more resources than the former.
A workload may have a budget associated with it, and it may have redundancy requirements. It may require a certain OS or distribution. It may require a certain feature (for example, persistent disk or non-private IP address directly attached to eth0). It may require a certain minimal access speed to some data source (for example, if my data are in S3 on the East coast, I may want my workload to run somewhere near). Each requirement is a restriction – the more requirements a workload has, the fewer clouds can potentially run it.
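To make the “each requirement is a restriction” point concrete, here is a minimal Python sketch. The cloud names, attribute names and values are all made up for illustration and do not describe any real provider; a real matcher would also have to handle budgets, sizing and timing.

```python
from dataclasses import dataclass, field


@dataclass
class Cloud:
    # Hypothetical description of what one cloud offers.
    name: str
    region: str
    certifications: set = field(default_factory=set)
    features: set = field(default_factory=set)


@dataclass
class Workload:
    # A workload says WHAT is needed, not HOW it is deployed.
    name: str
    allowed_regions: set = field(default_factory=set)  # empty set = anywhere
    required_certifications: set = field(default_factory=set)
    required_features: set = field(default_factory=set)


def eligible_clouds(workload, clouds):
    """Return the names of clouds that satisfy every requirement."""
    result = []
    for cloud in clouds:
        if workload.allowed_regions and cloud.region not in workload.allowed_regions:
            continue
        if not workload.required_certifications <= cloud.certifications:
            continue
        if not workload.required_features <= cloud.features:
            continue
        result.append(cloud.name)
    return result


clouds = [
    Cloud("cloud-a", "us", {"pci"}, {"persistent_disk"}),
    Cloud("cloud-b", "eu", set(), {"persistent_disk", "public_ip_on_eth0"}),
    Cloud("cloud-c", "us", set(), set()),
]

blog = Workload("blog")
strict_blog = Workload("blog", allowed_regions={"us"},
                       required_certifications={"pci"})
impossible = Workload("blog", allowed_regions={"eu"},
                      required_certifications={"pci"})

for w in (blog, strict_blog, impossible):
    print(w.allowed_regions or "anywhere", eligible_clouds(w, clouds))
```

The unconstrained workload can run in all three made-up clouds, the stricter one in only one, and the last one nowhere, which is the case of a workload for which no deployment exists.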
Conclusion
The answer to the question “Where is the best place to run this task?” used to be treated as a binary decision (“on premises” vs “in the cloud”) but not any more – because there are many different and incompatible implementations of the latter. Looking at your tasks through the prism of workloads and deployments may open new horizons for computing mobility. There is a saying “select the right tool for the job.” It can now be extended to “select the right tool and the right location for the job.”
If you like the idea of cloud computing workloads, you may find this post by James Urquhart interesting as well.
P.S. Believe it or not, this is my 100th post on this blog. Not bad. Hope at least some of you enjoy reading my posts as much as I enjoy writing them.
Tags: cloud computing
February 6th, 2010 · 2 Comments
From time to time, I come across a statement that every service on the Internet must have an API, or the people behind the service are doing it wrong. This statement usually refers specifically to publicly available APIs.
As a user who stands to benefit from an increased number of services allowing third-party applications and mashups, I certainly tend to agree. But as a developer, I realize that prematurely making an API public may be a disaster.
Publication of an API represents a long-term commitment. You as a developer are committing to supporting this API for some non-trivial amount of time (at least 12 months I would imagine) and are essentially inviting other developers to create new functionality against this API. No one likes to spend their time developing against a given API just to discover shortly afterwards that the API has changed, or that some functionality that used to be offered is no longer available.
By making your API public you are signaling that this part of your system is very stable, its functionality well established, understood and developed, usage patterns well thought out. Or at least that’s how I as a third-party developer interpret your action.
If you know your audience well enough and are pretty confident that they won’t mind your tweaking things after initial publication, you may take a risk. Twitter famously launched their API very, very early, and in the end it proved a huge success for them. (So if they had listened to my advice in this post, they would have been worse off.)
But not all developer audiences may be as agile and forgiving as Twitter’s. I can imagine a very conservative big user of your API who will very strongly object to your changing the API. What do you do next? Maintain 2 versions? But what if underlying database schema changes make the old API incompatible with what you are trying to do in the future? Fork and host 2 different systems, old and new? I honestly can’t imagine a worse scenario.
My advice – before publishing your new API, make sure you are not going to force yourself into a corner down the road. Only publish an API for those parts of your system that are very stable (both operationally and from the perspective of internal mechanics and functionality) and where usage patterns are well researched and predictable to a certain extent. Don’t rush it.
Tags: Internet · software engineering
January 26th, 2010 · 1 Comment
In December 2009, the Amazon Web Services team introduced yet another innovation – spot pricing for EC2 instances. Several sites were created shortly afterwards to track spot price history with price charts. But price charts are relatively boring – the juicy meat is in the dynamics hidden inside the series of numbers that represent the price history. Let’s do some exploring!
Several notes first.
- All references to times and dates below are GMT for all regions.
- Spot instances went live on December 14, therefore I ignore all data points before that (for simplicity, my cutoff was set at UNIX timestamp 1260777600 – it’s 8am on December 14 GMT, which translates to midnight in Seattle where AWS is headquartered).
- Spot price history was obtained on January 25, 2010 at 10:54pm via API and cached locally for analysis.
- In order to be able to deal with integers instead of floats, all prices below are represented in points where 1,000 points = $1 per compute hour.
- Each product is specified as a [region, instance_type, product_description] tuple.
- I am only going to outline facts below; all interpretation is up to you.
- These results have not been exhaustively verified, and my analysis code may have bugs. Use at your own risk.
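The cutoff and the points convention from the notes above can be sketched in a few lines of Python. This is not the analysis code used for this post; the record format (a UNIX timestamp plus a dollar price as a string with at most three decimal places) and the function names are assumptions made for illustration.

```python
from datetime import datetime, timezone
from decimal import Decimal

CUTOFF = 1260777600  # UNIX timestamp used as the launch cutoff


def to_points(dollars_per_hour):
    """Convert a price string such as "0.049" to integer points (1,000 points = $1)."""
    return int(Decimal(dollars_per_hour) * 1000)


def after_launch(records):
    """Keep (timestamp, price_string) records at or after the cutoff, as points."""
    return [(ts, to_points(price)) for ts, price in records if ts >= CUTOFF]


# Sanity check of the cutoff: 8am GMT on December 14, 2009.
cutoff_utc = datetime.fromtimestamp(CUTOFF, tz=timezone.utc)
print(cutoff_utc.isoformat())

# Made-up sample records: the first one predates the launch and is dropped.
sample = [(1260777599, "0.030"), (1260777600, "0.049")]
print(after_launch(sample))
```

Using `Decimal` instead of `float` avoids rounding surprises when a dollar price is multiplied by 1,000.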
#1 Price averages
Here is a chart of the average spot price for each product relative to the regular price for the same product (averages take into account how long each price was valid). The percentage next to a product identification is the ratio of the average spot price to the regular price.
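Weighting each price by how long it was valid can be sketched as follows. This is an illustration rather than the code behind the chart; the history and the regular price of 120 points are made-up values.

```python
def time_weighted_average(changes, end_time):
    """Average price weighted by how long each price was in effect.

    changes:  list of (timestamp, price) pairs sorted by timestamp; each
              price stays valid until the next change.
    end_time: timestamp at which the observation window closes.
    """
    boundaries = [ts for ts, _ in changes[1:]] + [end_time]
    weighted_sum = 0
    for (start, price), stop in zip(changes, boundaries):
        weighted_sum += price * (stop - start)
    return weighted_sum / (end_time - changes[0][0])


# Hypothetical history: 30 points for 1 hour, then 90 points for 3 hours.
history = [(0, 30), (3600, 90)]
average = time_weighted_average(history, end_time=4 * 3600)
ratio_to_regular = average / 120  # 120 points is an assumed regular price
print(average, ratio_to_regular)
```

A plain average of the two data points would give 60, while the time-weighted average is 75, because the higher price was in effect three times as long.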

#2 Price increases in a row
The maximum number of price increases in a row was 6. It occurred on January 23-24 for [us-west-1, m1.large, Windows], when the price went up from 256 to 273.
5 price increases in a row also happened once, 4 in a row – 16 times, 3 in a row – 95 times, 2 in a row – 643 times, and a single increase immediately followed by a price reduction happened 2,433 times. Of the latter, 684 (28%) were a single price increase followed by the price returning to where it was right before the increase (X -> X+Y -> X).
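Counting runs like these is straightforward once the prices for one product are in a chronological list. The sketch below is an illustration under my own assumptions (an unchanged price ends a run, and a run still open at the end of the data is counted); it is not necessarily how the numbers above were produced.

```python
from collections import Counter


def increase_runs(prices):
    """Map run length -> number of runs of consecutive price increases."""
    runs = Counter()
    current = 0
    for prev, cur in zip(prices, prices[1:]):
        if cur > prev:
            current += 1
        else:
            if current:
                runs[current] += 1
            current = 0
    if current:  # a run still in progress when the data ends
        runs[current] += 1
    return runs


def single_spikes_back_to_start(prices):
    """Count X -> X+Y -> X patterns where the increase was a single one."""
    count = 0
    for i in range(len(prices) - 2):
        a, b, c = prices[i], prices[i + 1], prices[i + 2]
        preceded_by_increase = i > 0 and prices[i] > prices[i - 1]
        if b > a and c == a and not preceded_by_increase:
            count += 1
    return count


# Made-up price series in points.
series = [30, 31, 30, 32, 33, 35, 29]
print(increase_runs(series))
print(single_spikes_back_to_start(series))
```

For the made-up series there is one single increase (30 -> 31), one run of three increases (30 -> 32 -> 33 -> 35), and one spike that returns to its starting price.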
#3 Individual price increases
The maximum single price increase in absolute terms was 928 – it occurred for [us-east-1, m2.2xlarge, Windows] when the price went up from 572 to 1,500. The second biggest was 890 for [us-east-1, m1.large, Linux/UNIX] and the third biggest – 551 for [us-east-1, m1.small, Windows]. Note that all of these occurred in us-east-1.
The biggest price increase as a percentage of the regular price was 460%, when the price for [us-east-1, m1.small, Windows] jumped from 49 to 600 on January 24. The second and third biggest in the category were a 262% increase for [us-east-1, m1.large, Linux/UNIX] (110 -> 1000) and a 64% increase for [us-east-1, m2.2xlarge, Windows] (572 -> 1500).
The same two biggest increases were also the biggest price increases as percentage of current spot price – 1,124% and 809%, respectively. Third place in this category was a 186% increase for [eu-west-1, m1.small, Linux/UNIX] when the price went up from 28 to 80.
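The percentages relative to the current spot price follow directly from the before and after prices quoted above. A short Python check (the function name is mine):

```python
def increase_pct(old, new):
    """Increase from old to new as a whole-number percentage of old."""
    return round(100 * (new - old) / old)


# Before/after prices in points, taken from the text above.
examples = {
    "us-east-1 m1.small Windows": (49, 600),
    "us-east-1 m1.large Linux/UNIX": (110, 1000),
    "eu-west-1 m1.small Linux/UNIX": (28, 80),
}
for product, (old, new) in examples.items():
    print(product, f"{increase_pct(old, new):,}%")
```

This reproduces the 1,124%, 809% and 186% figures.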
Here is a chart showing price increases and reductions day by day.

#4 Number of datapoints per product and/or product family
There were a total of 4,469 spot price revisions for Windows and 3,885 for Linux/UNIX. By region, us-east-1 had the fewest price revisions in total – 2,491, of which 1,254 were for Windows and 1,237 for Linux/UNIX (50.3% vs 49.7%). A total of 2,809 price revisions in eu-west-1 were distributed 1,518 for Windows vs 1,291 for Linux/UNIX (54% vs 46%). A total of 3,054 price revisions in us-west-1 were distributed 1,697 for Windows vs 1,357 for Linux/UNIX (56% vs 44%).
[eu-west-1, m1.small, Windows] had the most price revisions – 287. [us-east-1, m2.4xlarge, Windows] had the fewest – 40.
Across all regions combined, the most price revisions per day happened on January 22, 2010 – 351 price revisions.
#5 Percentiles
Here is a Google Fusion table with percentile estimates for each product. I tried to calculate percentiles from 50th through 95th (step 5) and 99th, but since a price function consists of discrete values, not all percentiles could be estimated. For each percentile, a nominal price is provided along with its percentage of the regular instance price for a given product. Percentiles take into consideration for how long a given price was valid.
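For readers who want to experiment, here is one common way to define a duration-weighted percentile over a step-shaped price history. It is a sketch under my own definition (the smallest price that the spot price did not exceed for at least q percent of the time) and is not necessarily the estimation method used for the table; the sample history is made up.

```python
def time_weighted_percentile(changes, end_time, q):
    """Smallest price p such that the price was <= p for at least q% of the time.

    changes:  list of (timestamp, price) pairs sorted by timestamp.
    end_time: timestamp at which the observation window closes.
    q:        percentile, 0 < q <= 100.
    """
    boundaries = [ts for ts, _ in changes[1:]] + [end_time]
    durations = {}
    for (start, price), stop in zip(changes, boundaries):
        durations[price] = durations.get(price, 0) + (stop - start)
    total = sum(durations.values())
    covered = 0
    for price in sorted(durations):
        covered += durations[price]
        if covered * 100 >= q * total:
            return price
    return max(durations)  # not reached for q <= 100; kept as a safeguard


# Made-up history: 30 points for 50 s, 40 points for 30 s, 100 points for 20 s.
history = [(0, 30), (50, 40), (80, 100)]
for q in (50, 75, 95):
    print(q, time_weighted_percentile(history, end_time=100, q=q))
```

Because the price only takes a handful of discrete values, many neighbouring percentiles map to the same price, which is the effect described above.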
#6 Spot price over regular price
Situations where the spot price equals or exceeds the regular price are especially interesting. Most such situations occurred in us-east-1, and none of them occurred in eu-west-1.
Spot price has reached but not exceeded the regular price for [c1.xlarge, Linux/UNIX] twice, [c1.medium, Windows] twice, [m1.small, Windows] 6 times, [m1.large, Windows] once – all in us-east-1.
In us-west-1, spot price for [m1.large, Linux/UNIX] exceeded the regular price by 20 for under 2 hours on December 29.
Spot price for [us-east-1, m2.2xlarge, Windows] exceeded the regular price by 60 for over 20 hours on January 11-12.
Spot price for [us-east-1, m1.large, Linux/UNIX] exceeded the regular price by 64 on December 17 and by 660 twice on December 18.
And finally, spot price for [us-east-1, m1.small, Windows] exceeded the regular price by 480 once and by 430 once – both on January 24.
Conclusion
There are hardly any surprises in the spot price history so far, but it’s only been less than 2 months since the feature was launched. As the usage ramps up, I expect it will become more interesting. Kudos to AWS team for coming up with this innovative pricing mechanism and being the first to introduce it at such a large scale in a real environment. Only time will tell if it will stick in its current form or if it will morph into something else (I have a couple of ideas), but the first small step towards dynamic pricing of computing resources has been made.
Tags: cloud computing
January 11th, 2010 · 3 Comments
Designing a fully-automated or nearly-fully-automated computer system with many moving parts and dependencies is tricky, whether a system is distributed, hyper distributed or otherwise. Failures happen and must be dealt with. After a while, most folks grow up from “failures are rare and can be ignored” to “failures are not that rare and cannot be ignored” to “failures are common and should be taken into consideration” to “failures are frequent and must be planned for.” The last of these seems to represent the current prevailing point of view.
But here is a kicker – it’s not the end. I saw this tweet, read this post and checked out a book by Charles Perrow titled “Normal Accidents” from the library. Published in 1984, the book is not about IT, but its material fits our field nicely. And boy, was I enlightened!
The book’s main point: no matter how much thought is put into the system design, or how many safeguards are implemented, a sufficiently complex system sooner or later will experience a significant breakdown that was impossible to foresee beforehand, principally due to unexpected interaction between components, tight coupling or bizarre coincidence. For us in IT, it translates to “no matter how much planning you do or how many safeguards you implement, failures will still happen.”
There are at least 3 common themes that are present in multiple illustrations in the book:
- A big failure was usually a result of multiple smaller failures; these smaller failures were often not even related
- Operators (people or systems) were frequently misled by inaccurate monitoring data
- In a lot of cases, human operators were used to a given set of circumstances, and their thinking and analysis were misled by their habits and expectations (“when X happens, we always do Y and it comes back” – except for this one time, when it didn’t)
I have had my share of outages and downtimes, and I can attest that I have seen these 3 factors play a big role in tech ops. Some were bugs in management and monitoring code, some were human error, some were a bizarre set of dependencies, but all were a combination of multiple factors. For example, who would have thought that with a failure of the primary DNS resolution server, the VIP would not fail over to the secondary; that even though hosts had more than one “nameserver” line in /etc/resolv.conf, the application would time out waiting for DNS to respond before getting to ask the second nameserver; and that without name resolution, multiple load balancers would independently conclude that there was no capacity behind them (because management code calculated capacity in near real-time relying on worker hosts’ names) and disable themselves, thus taking down the entire farm – now I know of course…
It turns out we can’t eliminate normal accidents altogether, but here are several techniques that I have been using to speed up detection and response in order to reduce the downtime.
Complexity budget. Described by Benjamin Black, this is a technique to allocate complexity among components beforehand and strictly follow the allocation during implementation phase. It helps avoid unnecessary fanciness and leads to simpler code, which tends to be easier to troubleshoot and recover after a failure.
Control knobs/switches for individual components. As John Allspaw shows on this slide, you need to be able to turn off any component in an emergency, or throttle it up or down. Planning this feature and building it in from the very beginning is very important.
Accuracy of monitoring data. Ensure your alarms are as accurate as possible. No matter how much chaos is going on inside the system during a severe failure, the last thing you can afford is misleading the operators with wrong information. If you tried to ping host A and didn’t get a response, your alarm should not say “host A is down” because that’s not knowledge you obtained – it’s an assumption that you made. It should say “failed to ping host A from host B” – maybe it was the network on host B that was the issue when the ping attempt was made; how do you know?
Availability of monitoring data. There is a reason the first thing the military tries to do when attacking is to disrupt the enemy’s means of communication – it’s that important, and the same applies in our case. You either design your systems to be able to get monitoring data even during the worst outage imaginable (ideally from more than one source), or you should at least be getting an alarm about the lack of such monitoring data (it’s a very weak substitute though).
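A minimal Python sketch of an alarm that reports only what was actually observed. The function name and the injectable `run` parameter are my own choices for illustration, and the `ping -c 1` invocation assumes a Linux or macOS style `ping` on the PATH.

```python
import socket
import subprocess


def ping_alarm(target, observer=None, run=subprocess.run):
    """Ping target once; return None on success, else an alarm message.

    The message states the observation (a failed ping from a specific
    host), not a conclusion such as "target is down".
    """
    observer = observer or socket.gethostname()
    # Assumption: "ping -c 1 <host>" sends a single probe (Linux/macOS syntax).
    result = run(["ping", "-c", "1", target], capture_output=True)
    if result.returncode == 0:
        return None
    return f"failed to ping {target} from {observer}"
```

Including the observing host in the message lets an operator compare alarms from several vantage points before concluding anything about the target itself.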
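The fallback option, an alarm about missing monitoring data, can be sketched as a simple staleness check. The class and source names below are hypothetical, and a real setup would run this check from a location that does not share a failure domain with the collectors it watches.

```python
import time


class StalenessWatch:
    """Report monitoring sources that have been silent for too long."""

    def __init__(self, max_silence_seconds):
        self.max_silence = max_silence_seconds
        self.last_seen = {}

    def record(self, source, now=None):
        """Note that data just arrived from the given source."""
        self.last_seen[source] = time.time() if now is None else now

    def silent_sources(self, now=None):
        """Return sources whose last report is older than the allowed silence."""
        now = time.time() if now is None else now
        return sorted(
            source
            for source, seen in self.last_seen.items()
            if now - seen > self.max_silence
        )


# Explicit timestamps keep the example deterministic.
watch = StalenessWatch(max_silence_seconds=60)
watch.record("collector-a", now=0)
watch.record("collector-b", now=100)
print(watch.silent_sources(now=120))
```

Here only `collector-a` is reported, since it has been silent for 120 seconds while `collector-b` reported 20 seconds ago.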
All in all, to everybody in IT, I highly recommend the Normal Accidents book as well as this whitepaper (linked from John Allspaw’s blog).
Tags: distributed · infrastructure development · software engineering
December 18th, 2009 · 2 Comments
As most of you probably know, I work at CohesiveFT where I focus on the VPN-Cubed product. In short, it’s a solution to build overlay networks in third-party clouds. Overlay networks in this case are based on redundant encrypted point-to-point connections from your regular servers to your VPN-Cubed servers called “managers” (that you run in the cloud); managers then act as virtual switches and routers of this overlay, which essentially sits above your physical network. In other words, an overlay network effectively gives a customer a LAN-like network where the servers can be located pretty much anywhere, including in the cloud.
However, not all people know what an overlay network is or what its benefits and strengths are. This holiday season, as we were putting up our outdoor decorations and holiday lighting, I realized that what my wife and I were doing was essentially building an overlay network. Let’s follow the similarities.
Imagine a regular house with a front yard where for the holidays you want to set up a bunch of lighted Christmas trees, deer and other holiday figures. All of them require electricity – but there is no power installed in the ground (parallel with VPN-Cubed overlay network: you are deploying servers to a third-party cloud and want to continue using your IP addressing schemes and to ensure that all communications are encrypted – but the provider doesn’t offer any of these services out of the box).
You don’t need power out in your front yard all year round – so there is usually no point in investing money in installing it. Cloud computing is all about elasticity. As a complement to clouds, VPN-Cubed is easy to set up and take down if necessary for an experiment, or it can be running for long periods of time.
There are several outdoor outlets on the front wall, so you decide to power your decorations from these outlets (you have VPN devices installed on the edge of your network – you will use them to offer connectivity to your servers from your network using VPN). The first obvious solution is to run a power cord from each piece towards an outlet. While it’s possible in theory, it will turn out ugly in practice. Firstly, a lot of long outdoor power cords are expensive. Secondly, it will create a cabling mess near the outlet. Thirdly, if a cord goes bad, you need to trace where exactly it’s plugged in and replace it. Fourthly, the more stuff you have to power up, the more difficult this octopus made of power cords is going to be. Absolutely the same problems apply in our parallel use case.
So you come up with optimization #1 – you go out and buy several outdoor power strips with several outlets each. By placing these power strips where your lighted trees and deer are, you reduce cabling issues, gain the ability to use shorter power cords and most likely save money on power cords. That’s your VPN-Cubed manager server instance. When you place it next to your cloud-based servers, you reduce latency for your endpoints and cut down on VPN connections from the edge of your network that you need to build and maintain.
If you are well prepared (i.e., have enough of everything), your composition will drive how many power cords and strips you will need and how long your cords need to be, not the other way around. Same with VPN-Cubed – you mold it to fit your use case, your desired topology or application – you don’t adjust your application to be able to work within VPN-Cubed overlay network.
Outdoor power strips have additional protection to let them function outdoors in low temperatures. So do VPN-Cubed manager instances – they run a hardened OS, with a minimal set of enabled services, behind firewall protection. You can grab a regular switch and make it work outdoors – but why waste your time when these things don’t cost that much? Same with VPN-Cubed.
But power strips may fail – and if they do, an entire section of your composition will be turned off. So you get a cold standby sitting in your garage in case a primary goes out. Or better – you install 2 power strips next to each other, connect them and evenly plug in your endpoints. If one goes out, you switch all connections to the other strip and it’s back. VPN-Cubed allows you to deploy a hot spare with automatic failover capability, which can help balance the load as well. Your outdoor lighted Christmas tree is connected to one power strip at any given time, but if one fails it can be reconnected to another within a power cord distance. Same with VPN-Cubed – your servers are connected to a single manager at any given time, but if a manager becomes unavailable, your servers can automatically re-connect to another manager.
And what happens if one of your outlets goes bad? Moving a handful of cables to another outlet is much easier than moving a whole lot. Same with VPN-Cubed – if your network loses one entry point, you just re-connect VPN-Cubed to another.
There are many more parallels between the two. Most of us have been building overlay networks of decorations for quite some time. Building overlay networks for the cloud may be new, but CohesiveFT VPN-Cubed product makes it easy and fun. Don’t be stuck with long power cords – get yourself some nice outdoor power strips. And enjoy the holidays!
Tags: cloud computing · cohesiveft · infrastructure development
December 13th, 2009 · Comments Off
This past Thursday I had a chance to attend CloudCamp Boston, which took place at the Microsoft research center in Cambridge, as a representative of CohesiveFT. The event was very well attended, and I was able to meet a lot of smart, interesting people working in the cloud computing space.
The lightning talks section started with Microsoft representative giving an overview of Azure. Then, John Willis gave a talk titled What color is your cloud in which he talked about various types of IaaS clouds. George Reese gave a talk where he compared a successful cloud deployment to reaching the Emerald City and pointed out that the yellow brick road to this goal is not always an easy one (great analogy!). Iron Mountain representative gave a talk about how one needs to be aware where their data are in the cloud, and emphasized security measures at their datacenter. Intuit representative talked about their PaaS, which allows developers to easily reach millions of small businesses already running Intuit products. PaaS is not my thing, but the idea makes sense and if I understand correctly, is very similar to idea behind Salesforce platform – develop against something which many organizations already use. And finally, Cory Von Wallenstein of Dyn, operators of a well-known DynDNS.org, gave a talk about their enterprise features like anycast DNS, CDN etc. Interestingly, now that I think about it, DynDNS offered a way to update DNS programmatically way back when, which definitely qualifies them as one of the earliest cloud APIs out there.
After a break, I attended a cloud security talk titled Cloudifornication by Chris Hoff. I’ve seen the slides and video of this talk before (for example, see here), but seeing it live was more than worth it. This is a very good and important talk for all cloud practitioners and especially architects and developers, and I highly recommend it. I personally had 3 main takeaways. Firstly, information security is based on C-I-A (confidentiality, integrity, availability). Therefore, any outage or service disruption is classified by a customer as a security issue, not only as an SLA issue. I didn’t know about this until a couple of weeks ago when Chris explained it to me on Twitter, and the talk also emphasized this fact. Secondly, I loved a series of slides about increasing complexity of interconnects as more and more vendors, intermediaries and brokers are added to one’s cloud mix. We at CohesiveFT are very aware of this as an emerging issue, and our VPN-Cubed product is targeted at such cases, among other things.
And thirdly, Chris very skillfully highlighted the brittleness of the foundation on top of which we collectively as an industry are currently building out our cloud offerings. When the Internet, designed as it was, carried only the World Wide Web and static pages, it was all good. When we started doing e-commerce and social media on top of the same infrastructure, the risk increased many-fold but was still somewhat manageable (after all, it’s only buying stuff online). But now with cloud computing we are putting absolutely everything (!) on top of the same brittle foundation, and the risks are truly enormous.
Then, I attended a session on private cloud led by John Willis, where we discussed various private cloud technologies and ideas. My main takeaway was that there is or there will be a huge demand for private externally-operated clouds for mid-sized organizations, and that’s where I think the future of colo and hosting is going to be.
All in all, this was a great event and thanks to all organizers and sponsors for putting it together, and to all participants for interesting discussion.
Tags: cloud computing
December 2nd, 2009 · Comments Off
Since earlier this year when I got my copy of the Erlang book, I’ve wanted to do something unconventional with the RabbitMQ source. I finally came up with an idea, which is somewhat interesting and maybe even useful, and could be done by an Erlang beginner like myself.
Some background first. Each Erlang program in general consists of multiple Erlang processes that send and receive messages to/from each other. These are not your regular processes – these processes are running inside the Erlang VM, and are not mapped to either processes or threads on the system level in any way. They are cheap to create, and communications between them are fast. Each process has a process ID (pid) associated with it. If multiple Erlang VMs share a cookie (a piece of text which is used as a security token), processes on any Erlang VM can freely talk to each other without any code modification.
RabbitMQ implements each queue as an Erlang process. You can think of AMQP exchanges as pieces of routing logic – when a message arrives, Rabbit applies the routing rules to its routing key and sends the message to all eligible queues. Since each queue is a process and has a pid, in a nutshell exchanges send the message to several pids.
In non-clustered mode, Rabbit’s routing function returns a list of local PIDs by selecting values stored in mnesia tables (rabbit_durable_queue and rabbit_queue). But since RabbitMQ can also run in clustered mode, its developers have already implemented a way to send messages to queues on remote nodes. So when I hacked the rabbit_durable_queue and rabbit_queue tables and replaced PIDs pointing to local queues with PIDs pointing to remote queues, I got myself a remote queue forwarder.
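The idea can be illustrated with a toy model. The sketch below is plain Python, not Erlang and not RabbitMQ’s actual code: the `Pid` class, the `route` and `forward_queue` functions, and the queue and node names are all hypothetical stand-ins for Erlang pids and the mnesia queue tables. It only shows that routing stays unchanged while the table entry for one queue is swapped for a pid on another node.

```python
# Toy model only: Python stand-ins for Erlang pids and the mnesia queue table.
# None of these names come from the RabbitMQ source.
from dataclasses import dataclass


@dataclass(frozen=True)
class Pid:
    node: str    # name of the Erlang node (VM) that hosts the process
    serial: int  # hypothetical per-node process number


def route(queue_table, bindings, routing_key):
    """Return the pids of all queues bound to routing_key (direct-exchange style)."""
    return [queue_table[q] for q in bindings.get(routing_key, []) if q in queue_table]


def forward_queue(queue_table, queue_name, remote_pid):
    """The hack in miniature: point a queue's table entry at a pid on another node."""
    patched = dict(queue_table)  # copy, so the original table is left untouched
    patched[queue_name] = remote_pid
    return patched


queue_table = {"q1": Pid("rabbit@local", 101), "q2": Pid("rabbit@local", 102)}
bindings = {"prices": ["q1", "q2"]}

patched = forward_queue(queue_table, "q2", Pid("rabbit@remote", 7))
for pid in route(patched, bindings, "prices"):
    print("deliver to", pid)
```

The routing function never learns that one of the pids is remote; in the real system it is the Erlang runtime that makes sending to a remote pid look the same as sending to a local one.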
How is this useful, you might ask? RabbitMQ supports remote queues only in clustered mode. The way clustering is implemented today (using mnesia in distributed mode), it’s not recommended to run a rabbit cluster over non-LAN links. This is because if 2 rabbit nodes lose and then regain connectivity to each other, the cluster may enter a “partitioned network” state, which effectively means that the rabbit cluster is not functional (in other words, mnesia sacrifices tolerance to partitioning in order to achieve consistency and availability – recall the CAP theorem). With remote queue forwarding, you don’t need to set up clustering and hence the “partitioned network” state won’t affect you by design – and that’s what mattered to me.
On the other hand, remote queue forwarding can potentially break some AMQP guarantees. For example, if a remote node is temporarily unavailable, Rabbit won’t queue the message for later re-delivery (because such a situation is currently impossible in unhacked Rabbit). It means that YOU SHOULD NOT USE THIS HACK UNLESS YOU KNOW WHAT YOU ARE DOING.
The hack works for me, though, in the following scenario. I publish messages with immediate=true (indicates that payload is time sensitive). Messages are sent to N local queues. These N local queues are forwarded to N remote nodes, with a consumer attached to 127.0.0.1 on each node. In this project, I don’t rely on any AMQP guarantees – I discard basic.return commands and can tolerate an occasional message not reaching some or even all consumers.
I doubt anyone will find this hack in its current form useful, but just in case I uploaded it to http://github.com/somic/rabbit_queue_forwarder.
PS. After I started this small project, Tony Garnock-Jones announced that his work on pluggable exchange types has been added to the default branch. I still went ahead and published my post, but please note that pluggable exchange types could probably achieve a similar effect more cleanly.
Tags: erlang · rabbitmq
November 17th, 2009 · Comments Off
I have recently noticed that costs are no longer always touted as the main driver for cloud computing – some have been advocating agility as the primary reason (for example, see here). It’s one thing when this theme gets mentioned in a talk at a technology conference where a company is sharing its experiences. But it’s a bit different when we start seeing it in vendors’ pitches – whenever anybody is trying to sell something new for more, I like to really understand what it is that I am paying a premium for.
There are at least 2 types of agility that cloud computing could potentially enable.
The first is being able to provision resources faster than it would take in a traditional environment. Time to market, how fast you can see that an idea is working or is not working – both are positively affected.
The second is being able to right-size your resources (sometimes this concept is called “elasticity”). It eliminates the need to over-provision – start small, scale up when demand goes up, scale down when resources are no longer needed. In this case, agility refers to the speed with which you can implement the upsize/downsize operation. No doubt cloud computing opens new opportunities in this space.
However, the most important question is – do any of these potential agility benefits apply to your use case (workload)? To me, the answer is clearly not “yes” for every single project. Not every IT project in the world requires agility, and furthermore – not every IT project in the world requires agility at a price (remember who stands to benefit if all of a sudden all IT organizations in the world start putting a premium on agility and become willing to cover it in real money when paying for services).
Again – I am not saying that no IT project could benefit from better agility. I am saying that not all projects could.
If your workload does not benefit from increased agility, your main driver towards cloud computing in theory should be cutting costs. It can take the form of an explicit reduction in expenditures, or it can be paying the same for more, or it can be avoiding some cost which you’d otherwise have to pay. I am very far from accounting, but it’s my understanding that there may also be some benefit in spreading out costs over a longer period of time instead of putting up capital up front.
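A back-of-the-envelope comparison makes the trade-off concrete. Everything in the sketch below is hypothetical: the demand profiles, the unit price and the 50% premium are made-up numbers, and `fixed_cost` and `elastic_cost` are illustrative helpers, not anyone’s real pricing model. It only shows that paying a premium for elasticity pays off for a bursty workload and does not for a steady one.

```python
# Hypothetical numbers throughout; only the comparison logic matters.

def fixed_cost(hourly_demand, unit_price):
    """Provision for peak demand and pay for that capacity every hour."""
    return max(hourly_demand) * unit_price * len(hourly_demand)


def elastic_cost(hourly_demand, unit_price, premium):
    """Pay only for what each hour needs, at a per-unit premium."""
    return sum(hourly_demand) * unit_price * (1 + premium)


spiky = [2, 2, 2, 20, 2, 2]      # servers needed per hour, bursty workload
flat = [10, 10, 10, 10, 10, 10]  # servers needed per hour, steady workload

for name, demand in (("spiky", spiky), ("flat", flat)):
    print(name,
          "fixed:", fixed_cost(demand, 1.0),
          "elastic:", elastic_cost(demand, 1.0, 0.5))
```

With these made-up inputs the bursty workload is far cheaper when right-sized, while the steady workload simply pays the premium for agility it never uses.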
If you end up using cloud computing without taking advantage of increased agility and you didn’t cut costs by migrating to the cloud, I can think of several possibilities.
First, it’s possible that there is some cost that you simply overlooked. My favorite example in this area is Internet bandwidth. If you are in the cloud, you have a very fat pipe to the Internet – not something that every project accounts for.
Secondly, you may be trying to future-proof your staff – let them do a project in the cloud to gain experience, since cloud development and cloud operations may require a slightly different skill set than their traditional equivalents. Similarly, there are many proof-of-concept projects where the main goal is to see what it would take an organization to do a project in the cloud, to see what breaks, what doesn’t work well, etc.
Thirdly, you may anticipate you will need agility or may anticipate potential for cutting costs in future projects, and you may be stepping into the cloud to avoid having to migrate these apps in the future anyway.
And finally, if none of the above applies to you, you must be doing it due to the hype. There is nothing wrong with this, as long as you understand it.
To summarize: cloud computing offers potential for unmatched agility; you may end up paying a premium to take advantage of it; if agility is not on your list, you should get the benefit of reduced costs; if you are not after agility and you don’t cut your costs, you should stop and think why exactly you are using cloud computing.
Tags: Economics · cloud computing
November 2nd, 2009 · 2 Comments
Disclaimer 1: Despite its possibly ominous name, this is NOT a network vulnerability or an attack that could lead to unauthorized access. UDP hole punching requires cooperation between two hosts, and hence can’t be easily used as an attack by itself (in other words, in order to run it, you most likely must already have gained access to the hosts).
Disclaimer 2: Conclusions reached at the end of this post are my educated guesses, and may turn out not to be true. They are based on my observations and not on actual knowledge of how EC2 internals are designed or implemented.
I was once working on a setup in Amazon EC2 and came across an oddity which, when coupled with my interest in the EC2 security groups mechanism, turned into this post.
UDP hole punching, in a nutshell, is a technique which allows two cooperating hosts, potentially located behind NAT and/or firewalls, to establish a peer-to-peer UDP communication channel directly to each other. It’s a technique used by Skype, for example – you can read more about it in a Wikipedia article. If two hosts start sending UDP packets to each other on pre-agreed ports, the bi-directional flow of packets leads NAT devices and firewalls to think that all these packets are a part of an established communication channel.
EC2 allows a lighter form of this technique because EC2 NAT never rewrites the source port of an outgoing packet (recall that in EC2, NAT is always 1-to-1 such that port rewriting isn’t necessary). We know with 100% certainty that a packet we are sending with a given source port X will be seen by the remote instance with the same source port.
I wrote a small Python tool (available at http://gist.github.com/224795) to test UDP hole punching and set out to discover if it could work in EC2. My expectation was that it should work. Unless explicitly noted, I used ports above 45,000 and none of the security groups explicitly allowed UDP traffic on these ports.
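For readers who want to see the shape of the technique, here is a minimal sketch. It is not the tool from the gist above; the `punch` function, its parameters and the probe payload are assumptions made for illustration. Both cooperating hosts would run the same code at roughly the same time, each bound to its pre-agreed port and pointed at the other’s IP address and port. It relies on the peer’s source port arriving unchanged, which is the 1-to-1 NAT property described above.

```python
import socket

PROBE = b"punch"  # hypothetical payload; any small datagram would do


def punch(sock, peer, attempts=20, interval=0.25):
    """Try to punch a UDP hole towards peer; return True once peer is heard.

    sock must already be bound to the pre-agreed local port.
    peer is an (ip_address, port) tuple; use a numeric IP so that it can be
    compared with the sender address reported by recvfrom().
    """
    sock.settimeout(interval)
    for _ in range(attempts):
        # Each outgoing probe is what makes a NAT device or firewall on our
        # side treat later packets from peer as part of an existing flow.
        sock.sendto(PROBE, peer)
        try:
            _, sender = sock.recvfrom(1024)
        except socket.timeout:
            continue  # nothing heard within this interval, probe again
        if sender == peer:
            # Send one more probe in case the peer's firewall dropped our
            # earlier ones before the peer started sending.
            sock.sendto(PROBE, peer)
            return True
    return False


# Hypothetical usage on one of the two hosts (addresses are placeholders):
#   s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
#   s.bind(("0.0.0.0", 45001))
#   print(punch(s, ("203.0.113.10", 45001)))
```

Sending before receiving on every iteration is the important design choice: neither side waits for the other to go first, so the first packets can be lost on both sides and the exchange still converges.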
I was able to easily punch UDP holes between any two instances using each instance’s public IP address – in line with my expectation. But I hit a major snag when using private IP addresses of 2 instances in the same region (I used EC2-US) – I couldn’t get it to work no matter what I tried: same availability zone, different availability zones, same security groups, different security groups, same AWS account, different AWS accounts. I even tried punching a hole over port 53 (all EC2 instances support DNS name resolution which happens over this port without an explicit corresponding rule in security groups) – no luck (EC2 DNS servers are not located on 10.0.0.0/8 where all instances reside).
The only way I could get it to work using private IPs was to allow my UDP port in the security groups of at least one of the instances. When I did this, both hosts reported success.
This observation leads to several thoughts that might help uncover some aspects of EC2 firewall’s internal design (these are all more or less educated guesses):
- You can punch a UDP hole between any 2 instances using their public IPs, even if your security groups do not allow such communication.
- Private IP traffic is treated totally differently than traffic over public IPs.
- You can punch a UDP hole on port X using private IP addresses of 2 instances in the same region only if at least one of the instances allows port X in its security groups (can be used as a test if you don’t have access to query EC2 API endpoint)
- EC2 firewall somehow implements more logic than “all outgoing packets are allowed” when dealing with traffic over private IPs (if it were not the case, hole punching should have worked – see below).
- If we assume that security group rules are applied at an instance’s dom0 (as makes at least some sense and as this research implies), I now suspect that all dom0 hosts have a complete view of all security groups in the region and are getting real time updates when a rule is added or deleted (modification of rules is currently not supported). This in fact was contrary to my expectation – initially I thought each dom0 “subscribes” to updates for only those security groups which correspond to instances running on this dom0, and I thought this was the reason why dynamic group membership changes were not possible (say I want to move an instance from the “db” security group to the “webapp” security group).
To clarify: under the above assumption, in order for hole punching to NOT work, an outgoing packet from instance A must not reach dom0 of instance B – and the only way this is possible under an “all outgoing packets are allowed” policy is if dom0 of instance A knows that dom0 of instance B will block this packet and somehow takes this into consideration – which in the general case can only happen if all dom0 hosts have a complete view of all security groups and permissions in the region.
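The argument can be checked against a toy model. The sketch below is purely hypothetical and says nothing about how EC2 is actually built: `simulate`, its per-instance flow state and the `global_view` switch are assumptions invented to contrast a firewall that lets every outgoing packet through with one whose source dom0 consults the destination’s security group rules first. For simplicity both instances use the same port number.

```python
def simulate(rules, sends, global_view):
    """Replay UDP packets through two hypothetical per-instance firewalls.

    rules: instance -> set of UDP ports its security groups allow inbound.
    sends: sequence of (src, dst, port) packets, in the order they are sent.
    global_view: if True, the source side drops packets that the destination's
                 rules would reject, unless they belong to a known flow.
    Returns the set of instances that received at least one packet.
    """
    state = {name: set() for name in rules}  # per-instance known flows
    received = set()
    for src, dst, port in sends:
        known_at_src = (dst, port) in state[src]
        if global_view and port not in rules[dst] and not known_at_src:
            continue  # dropped at the source side, no flow state is created
        state[src].add((dst, port))  # the outgoing packet opens a flow
        if port in rules[dst] or (src, port) in state[dst]:
            state[dst].add((src, port))
            received.add(dst)
    return received


PORT = 45001  # placeholder port number
burst = [("A", "B", PORT), ("B", "A", PORT), ("A", "B", PORT)]
no_rules = {"A": set(), "B": set()}
b_allows = {"A": set(), "B": {PORT}}

print(simulate(no_rules, burst, global_view=False))  # both sides hear each other
print(simulate(no_rules, burst, global_view=True))   # nobody hears anything
print(simulate(b_allows, burst, global_view=True))   # both sides hear each other
```

Under these assumptions, only the variant with a complete view of the rules reproduces both private-IP observations: no hole without a rule, and success on both hosts once either instance allows the port.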
I would love to hear your thoughts on what could possibly explain this behavior; please let me know in the comments below.
Tags: cloud computing · infrastructure development