February 6th, 2010 · 1 Comment
From time to time, I come across the statement that every service on the Internet must have an API, or the people behind the service are doing it wrong. The statement usually refers specifically to publicly available APIs.
As a user who stands to benefit from an increased number of services allowing third-party applications and mashups, I certainly tend to agree. But as a developer, I realize that making an API public prematurely may be a disaster.
Publication of an API represents a long-term commitment. You as a developer are committing to supporting this API for some non-trivial amount of time (at least 12 months, I would imagine) and are essentially inviting other developers to create new functionality against this API. No one likes to spend time developing against an API only to discover shortly afterwards that the API has changed, or that some functionality it used to offer is no longer available.
By making your API public you are signaling that this part of your system is very stable, its functionality well established, understood and developed, usage patterns well thought out. Or at least that’s how I as a third-party developer interpret your action.
If you know your audience well enough and are pretty confident that they won’t mind your tweaking things after initial publication, you may take the risk. Twitter famously launched their API very, very early, and in the end it proved a huge success for them. (So if they had listened to my advice in this post, they would have been worse off.)
But not all developer audiences may be as agile and forgiving as Twitter’s. I can imagine a very conservative big user of your API that will very strongly object to your changing the API. What do you do next? Maintain 2 versions? But what if underlying database schema changes make old API incompatible with what you are trying to do in the future? Fork and host 2 different systems, old and new? I can’t honestly imagine a worse scenario.
My advice – before publishing your new API, make sure you are not going to force yourself into a corner down the road. Only publish an API for those parts of your system that are very stable (both operationally and from the perspective of internal mechanics and functionality) and where usage patterns are well researched and predictable to a certain extent. Don’t rush it.
Tags: Internet · software engineering
January 26th, 2010 · 1 Comment
In December 2009, the Amazon Web Services team introduced yet another innovation – spot pricing for EC2 instances. Several sites were created shortly afterwards to track spot price history by creating price charts. But price charts are relatively boring – the juicy meat is in the dynamics hidden inside the series of numbers that represent the price history. Let’s do some exploring!
Several notes first.
- All references to times and dates below are GMT for all regions.
- Spot instances went live on December 14, therefore I ignore all data points before that (for simplicity, my cutoff was set at UNIX timestamp 1260777600 – it’s 8am on December 14 GMT, which translates to midnight in Seattle where AWS is headquartered).
- Spot price history was obtained on January 25, 2010 at 10:54pm via API and cached locally for analysis.
- In order to be able to deal with integers instead of floats, all prices below are represented in points where 1,000 points = $1 per compute hour.
- Each product is specified as [region, instance_type, product_description] tuple.
- I am only going to outline facts below; all interpretation is up to you.
- These results have not been exhaustively verified, and my analysis code may have bugs. Use at your own risk.
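The conventions in these notes can be sketched in a few lines of Python. This is a minimal illustration, not the analysis code used for this post: the `SpotPrice` record and the sample rows are hypothetical, and only the cutoff timestamp and the points convention come from the notes above.

```python
from datetime import datetime, timezone
from decimal import Decimal
from typing import NamedTuple

# Cutoff from the notes above, as a UNIX timestamp.
CUTOFF = 1260777600
# The same instant as a timezone-aware datetime: 8am GMT on December 14, 2009.
cutoff_utc = datetime.fromtimestamp(CUTOFF, tz=timezone.utc)


class SpotPrice(NamedTuple):
    """One price revision (hypothetical record layout)."""
    product: tuple   # (region, instance_type, product_description)
    timestamp: int   # UNIX time, GMT
    points: int      # 1,000 points = $1 per compute hour


def to_points(usd_per_hour: str) -> int:
    """Convert a dollar price such as "0.085" to integer points."""
    return int(Decimal(usd_per_hour) * 1000)


def after_cutoff(rows):
    """Drop data points recorded before spot instances went live."""
    return [row for row in rows if row.timestamp >= CUTOFF]


# Made-up sample rows, for illustration only.
product = ("us-east-1", "m1.small", "Linux/UNIX")
rows = [
    SpotPrice(product, CUTOFF - 60, to_points("0.030")),
    SpotPrice(product, CUTOFF + 60, to_points("0.029")),
]
kept = after_cutoff(rows)
```

Parsing prices through `Decimal` rather than `float` keeps the conversion to integer points exact, which is the point of using points in the first place.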
#1 Price averages
Here is a chart of the average spot price for each product relative to the regular price for the same product (averages are weighted by how long each price was in effect). The percentage next to each product represents the ratio between the average spot price and the regular price.
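A duration-weighted average treats the price history as a step function: each price counts in proportion to how long it stayed in effect. Here is a minimal sketch of that calculation; the function name, the sample history and the regular price of 300 points are made up for illustration.

```python
def time_weighted_average(points, end_ts):
    """Average of a step function.

    points: list of (timestamp, price) sorted by timestamp. Each price stays
    in effect until the next timestamp, and the last one until end_ts.
    """
    boundaries = [ts for ts, _ in points[1:]] + [end_ts]
    weighted_sum = 0
    total_time = 0
    for (start, price), end in zip(points, boundaries):
        weighted_sum += price * (end - start)
        total_time += end - start
    return weighted_sum / total_time


# Hypothetical history: 100 points for 10s, 200 for 20s, 100 for 10s.
history = [(0, 100), (10, 200), (30, 100)]
average = time_weighted_average(history, end_ts=40)
ratio_to_regular = average / 300   # assuming a regular price of 300 points
```

A plain mean of the three prices would give about 133, while the weighted version gives 150 because the higher price was in effect for half of the period.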

#2 Price increases in a row
Maximum number of price increases in a row was 6. It occurred on January 23-24 for [us-west-1, m1.large, Windows] and the price went up from 256 to 273.
5 price increases in a row also happened once, 4 in a row – 16 times, 3 in a row – 95 times, 2 in a row – 643 times, and a single increase immediately followed by a price reduction happened 2,433 times. Of the latter, 684 times (28%) were a single price increase followed by the price returning to where it was right before the increase (X -> X+Y -> X).
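Counting runs like these boils down to a single pass over consecutive price pairs. The sketch below is one way to do it, not necessarily how the numbers above were produced. It assumes that a run still open at the end of the history is counted, and that an unchanged price ends a run; the sample prices are made up.

```python
from collections import Counter


def increase_runs(prices):
    """Map run length -> number of runs of consecutive price increases."""
    runs = Counter()
    streak = 0
    for prev, cur in zip(prices, prices[1:]):
        if cur > prev:
            streak += 1
        else:
            if streak:
                runs[streak] += 1
            streak = 0
    if streak:                      # assumption: count a run still open at the end
        runs[streak] += 1
    return runs


def single_spikes_that_revert(prices):
    """Count X -> X+Y -> X patterns where the increase is a run of length 1."""
    count = 0
    for i in range(1, len(prices) - 1):
        rose = prices[i] > prices[i - 1]
        single = i == 1 or prices[i - 1] <= prices[i - 2]
        if rose and single and prices[i + 1] == prices[i - 1]:
            count += 1
    return count


# Made-up price series in points.
prices = [10, 12, 10, 11, 12, 13, 9, 9, 15]
runs = increase_runs(prices)
reverted = single_spikes_that_revert(prices)
```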
#3 Individual price increases
Maximum single price increase in absolute terms was 928 – it occurred for [us-east-1, m2.2xlarge, Windows] when the price went up from 572 to 1,500. Second biggest was 890 for [us-east-1, m1.large, Linux/UNIX] and third biggest – 551 for [us-east-1, m1.small, Windows]. Note that all of these occurred in us-east-1.
The biggest price increase as a percentage of the regular price was 460%, when the price for [us-east-1, m1.small, Windows] jumped from 49 to 600 on January 24. The second and third biggest in this category were a 262% increase for [us-east-1, m1.large, Linux/UNIX] (110 -> 1000) and a 64% increase for [us-east-1, m2.2xlarge, Windows] (572 -> 1500).
The same two biggest increases were also the biggest price increases as a percentage of the current spot price – 1,124% and 809%, respectively. Third place in this category was a 186% increase for [eu-west-1, m1.small, Linux/UNIX] when the price went up from 28 to 80.
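The three measures used in this section differ only in the denominator. A small sketch, using price pairs quoted above; the regular prices of 340 and 1,440 points are my inference from the figures in section #6 (1000 exceeding regular by 660, and 1500 exceeding regular by 60), not values stated directly.

```python
def increase_metrics(old, new, regular=None):
    """Express one price increase in absolute and relative terms (in points)."""
    delta = new - old
    metrics = {
        "absolute": delta,
        "pct_of_spot": round(100 * delta / old),
    }
    if regular is not None:
        metrics["pct_of_regular"] = round(100 * delta / regular)
    return metrics


# [us-east-1, m1.large, Linux/UNIX]: 110 -> 1000, regular price inferred as 340.
m1_large = increase_metrics(110, 1000, regular=340)
# [us-east-1, m2.2xlarge, Windows]: 572 -> 1500, regular price inferred as 1440.
m2_2xlarge = increase_metrics(572, 1500, regular=1440)
# [eu-west-1, m1.small, Linux/UNIX]: 28 -> 80, relative to the spot price only.
eu_small = increase_metrics(28, 80)
```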
Here is a chart showing price increases and reductions day by day.

#4 Number of datapoints per product and/or product family
There were a total of 4,469 spot price revisions for Windows and 3,885 for Linux/UNIX. By region, us-east-1 had the fewest price revisions in total – 2,491, of which 1,254 were for Windows and 1,237 for Linux/UNIX (50.3% vs 49.7%). A total of 2,809 price revisions in eu-west-1 were distributed 1,518 for Windows vs 1,291 for Linux/UNIX (54% vs 46%). A total of 3,054 price revisions in us-west-1 were distributed 1,697 for Windows vs 1,357 for Linux/UNIX (56% vs 44%).
[eu-west-1, m1.small, Windows] had the most price revisions – 287. [us-east-1, m2.4xlarge, Windows] had the fewest – 40.
Across all regions combined, the most price revisions per day happened on January 22, 2010 – 351 price revisions.
#5 Percentiles
Here is a Google Fusion table with percentile estimates for each product. I tried to calculate percentiles from 50th through 95th (step 5) and 99th, but since the price function consists of discrete values, not all percentiles could be estimated. For each percentile, a nominal price is provided along with its percentage of the regular instance price for a given product. Percentiles are weighted by how long a given price was in effect.
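One way to compute a duration-weighted percentile is to sort the distinct prices and accumulate the time spent at each until the requested share of total time is reached. The sketch below uses that definition, which is an assumption on my part and may differ from how the table was produced; the sample durations are made up.

```python
def weighted_percentile(seconds_at_price, pct):
    """Lowest price at or below which the spot price stayed for pct% of the time.

    seconds_at_price: dict mapping price (points) -> total seconds in effect.
    pct: percentile, 0 < pct <= 100.
    """
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    total = sum(seconds_at_price.values())
    elapsed = 0
    for price in sorted(seconds_at_price):
        elapsed += seconds_at_price[price]
        if elapsed * 100 >= pct * total:
            return price
    raise ValueError("empty price history")


# Hypothetical product: mostly cheap, with a short spike to 120 points.
seconds_at_price = {30: 50, 35: 30, 40: 15, 120: 5}
p50 = weighted_percentile(seconds_at_price, 50)
p80 = weighted_percentile(seconds_at_price, 80)
p99 = weighted_percentile(seconds_at_price, 99)
```

Because only four distinct prices exist in this example, many neighboring percentiles collapse onto the same value, which illustrates why a discrete price function makes some percentiles hard to tell apart.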
#6 Spot price over regular price
Situations when a spot price equals or exceeds the regular price are especially interesting. Most such situations occurred in us-east-1, and none of them occurred in eu-west-1.
Spot price has reached but not exceeded the regular price for [c1.xlarge, Linux/UNIX] twice, [c1.medium, Windows] twice, [m1.small, Windows] 6 times, [m1.large, Windows] once – all in us-east-1.
In us-west-1, spot price for [m1.large, Linux/UNIX] exceeded the regular price by 20 for under 2 hours on December 29.
Spot price for [us-east-1, m2.2xlarge, Windows] exceeded the regular price by 60 for over 20 hours on January 11-12.
Spot price for [us-east-1, m1.large, Linux/UNIX] exceeded the regular price by 64 on December 17 and by 660 twice on December 18.
And finally, spot price for [us-east-1, m1.small, Windows] exceeded the regular price by 480 once and by 430 once – both on January 24.
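Finding these situations is a filter over the same step function used for the averages. A minimal sketch follows; the timestamps are made up, while the prices and the regular price of 340 points are chosen to mirror the m1.large case above (regular price inferred from "exceeded by 660" at a spot price of 1000).

```python
def at_or_above_regular(points, regular, end_ts):
    """Return (start, duration, excess) for every revision priced >= regular.

    points: list of (timestamp, price) sorted by timestamp. Each price stays
    in effect until the next timestamp, and the last one until end_ts.
    """
    boundaries = [ts for ts, _ in points[1:]] + [end_ts]
    episodes = []
    for (start, price), end in zip(points, boundaries):
        if price >= regular:
            episodes.append((start, end - start, price - regular))
    return episodes


# Hypothetical history with two separate spikes to 1000 points.
history = [(0, 110), (100, 1000), (400, 115), (700, 1000), (900, 120)]
episodes = at_or_above_regular(history, regular=340, end_ts=1000)
```

Back-to-back revisions that both sit above the regular price are reported as separate episodes here; merging them would be a reasonable refinement.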
Conclusion
There are hardly any surprises in the spot price history so far, but it’s been less than 2 months since the feature was launched. As the usage ramps up, I expect it will become more interesting. Kudos to the AWS team for coming up with this innovative pricing mechanism and being the first to introduce it at such a large scale in a real environment. Only time will tell if it will stick in its current form or if it will morph into something else (I have a couple of ideas), but the first small step towards dynamic pricing of computing resources has been made.
Tags: cloud computing
January 11th, 2010 · 3 Comments
Designing a fully-automated or nearly-fully-automated computer system with many moving parts and dependencies is tricky, whether the system is distributed, hyper distributed or otherwise. Failures happen and must be dealt with. After a while, most folks grow up from “failures are rare and can be ignored” to “failures are not that rare and cannot be ignored” to “failures are common and should be taken into consideration” to “failures are frequent and must be planned for.” The last of these seems to represent the current prevailing point of view.
But here is the kicker – it’s not the end. I saw this tweet, read this post and checked out a book by Charles Perrow titled “Normal Accidents” from the library. Published in 1984, the book is not about IT, but its material fits our field nicely. And boy, was I enlightened!
The book’s main point: no matter how much thought is put into the system design, or how many safeguards are implemented, a sufficiently complex system sooner or later will experience a significant breakdown that was impossible to foresee beforehand, principally due to unexpected interaction between components, tight coupling or bizarre coincidence. For us in IT, it translates to “no matter how much planning you do or how many safeguards you implement, failures will still happen.”
There are at least 3 common themes that are present in multiple illustrations in the book:
- A big failure was usually a result of multiple smaller failures; these smaller failures were often not even related
- Operators (people or systems) were frequently misled by inaccurate monitoring data
- In a lot of cases, human operators were used to a given set of circumstances, and their thinking and analysis were misled by their habits and expectations (“when X happens, we always do Y and it comes back” – except for this one time, when it didn’t)
I have had my share of outages and downtime, and I can attest that I have seen these 3 factors play a big role in tech ops. Some were bugs in management and monitoring code, some were human error, some were a bizarre set of dependencies, but all were a combination of multiple factors. For example, who would have thought that when the primary DNS resolution server failed, the VIP would not fail over to the secondary? And that even though hosts had more than one “nameserver” line in /etc/resolv.conf, the application would time out waiting for DNS to respond before it got to ask the second nameserver? Or that, without name resolution, multiple load balancers would independently decide that there was no capacity behind them (because management code calculated capacity in near real-time relying on worker hosts’ names) and disable themselves, thus taking down the entire farm? Now I know, of course…
It turns out we can’t eliminate normal accidents altogether, but here are several techniques that I have been using to speed up detection and response in order to reduce the downtime.
Complexity budget. Described by Benjamin Black, this is a technique to allocate complexity among components beforehand and strictly follow the allocation during the implementation phase. It helps avoid unnecessary fanciness and leads to simpler code, which tends to be easier to troubleshoot and to recover after a failure.
Control knobs/switches for individual components. As John Allspaw shows on this slide, you need to be able to turn off any component in an emergency, or throttle it up or down. Planning this feature and building it in from the very beginning is very important.
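As a rough illustration of what such a knob might look like in code, here is a minimal sketch with an on/off switch and a throttle per component. The `Knob` class, the `handle` helper and the component names are all hypothetical; a real system would also need a way to change these values at runtime from outside the process.

```python
import random


class Knob:
    """Runtime control for one component: an on/off switch plus a throttle."""

    def __init__(self, enabled=True, rate=1.0):
        self.enabled = enabled
        self.rate = rate   # fraction of work to let through, 0.0 to 1.0

    def allows(self, draw=random.random):
        """Return True if this unit of work should proceed."""
        return self.enabled and draw() < self.rate


# Hypothetical component names.
knobs = {"thumbnailer": Knob(), "recommendations": Knob(rate=0.25)}


def handle(component, work, draw=random.random):
    """Run work() only if the component's knob allows it."""
    if knobs[component].allows(draw):
        return work()
    return None   # skipped: component is switched off or throttled


# In an emergency, an operator flips the switch and the component goes quiet.
knobs["thumbnailer"].enabled = False
skipped = handle("thumbnailer", lambda: "rendered")
```

The `draw` parameter exists only so that the throttle can be exercised deterministically; by default it samples from `random.random`.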
Accuracy of monitoring data. Ensure your alarms are as accurate as possible. No matter how much chaos is going on inside the system during a severe failure, the last thing you can afford is misleading the operators with wrong information. If you tried to ping host A and didn’t get a response, your alarm should not say “host A is down” because that’s not the knowledge you obtained – it’s an assumption that you made. It should say “failed to ping host A from host B” – maybe it was the network on host B that was the issue when the ping attempt was made, how do you know?
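One way to enforce this in code is to make the alarm text a function of what was actually observed, so there is no room for a conclusion to sneak in. A minimal sketch, with a hypothetical `ProbeResult` record:

```python
from dataclasses import dataclass


@dataclass
class ProbeResult:
    probe: str        # what was attempted, e.g. "ping"
    target: str       # host that was probed
    observer: str     # host the probe ran from
    succeeded: bool

    def alarm_text(self):
        """Describe the observation itself, not a conclusion drawn from it."""
        if self.succeeded:
            return None
        return f"failed to {self.probe} {self.target} from {self.observer}"


result = ProbeResult(probe="ping", target="host A", observer="host B", succeeded=False)
message = result.alarm_text()
```

Recording the observer alongside the target is what lets an operator notice when every failing probe originates from the same host.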
Availability of monitoring data. There is a reason the first thing the military tries to do when attacking is to disrupt the enemy’s means of communication – it’s that important, and the same applies to our case. You either design your systems to be able to get monitoring data even during the worst outage imaginable (ideally from more than one source), or you should at least be getting an alarm about the lack of such monitoring data (it’s a very weak substitute though).
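The fallback, an alarm about missing monitoring data, can be as simple as tracking when each source last reported and flagging the ones that have gone quiet. A minimal sketch with hypothetical source names and thresholds:

```python
def silent_sources(last_report, now, max_age):
    """Names of monitoring sources that have not reported within max_age seconds."""
    return sorted(name for name, seen in last_report.items() if now - seen > max_age)


# Hypothetical state: UNIX-style timestamps of the last report from each source.
last_report = {"collector-1": 1000, "collector-2": 700}
silent = silent_sources(last_report, now=1030, max_age=120)
```

Such a check has to run somewhere that does not share a failure domain with the sources it watches, otherwise it goes silent together with them.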
All in all, to everybody in IT, I highly recommend the Normal Accidents book as well as this whitepaper (linked from John Allspaw’s blog).
Tags: distributed · infrastructure development · software engineering
December 18th, 2009 · 2 Comments
As most of you probably know, I work at CohesiveFT where I focus on the VPN-Cubed product. In short, it’s a solution to build overlay networks in third-party clouds. Overlay networks in this case are based on redundant encrypted point-to-point connections from your regular servers to your VPN-Cubed servers called “managers” (that you run in the cloud); managers then act as virtual switches and routers of this overlay, which essentially sits above your physical network. In other words, an overlay network effectively gives a customer a LAN-like network where the servers can be located pretty much anywhere, including in the cloud.
However, not all people know what an overlay network is or what its benefits and strengths are. This holiday season, as we were putting up our outdoor decorations and holiday lighting, I realized that what my wife and I were doing was essentially building an overlay network. Let’s follow the similarities.
Imagine a regular house with a front yard where for the holidays you want to set up a bunch of lighted Christmas trees, deer and other holiday figures. All of them require electricity – but there is no power installed in the ground (parallel with VPN-Cubed overlay network: you are deploying servers to third-party cloud and want to continue using your IP addressing schemes, want to ensure that all communications are encrypted – but provider doesn’t offer any of these services out of the box).
You don’t need power out in your front yard all year round – so there is usually no point in investing money in installing it. Cloud computing is all about elasticity. As a complement to clouds, VPN-Cubed is easy to set up and take down if necessary for an experiment, or it can be running for long periods of time.
There are several outdoor outlets on the front wall, so you decide to power your decorations from these outlets (you have VPN devices installed on the edge of your network – you will use them to offer connectivity to your servers from your network using VPN). The first obvious solution is to run a power cord from each piece towards an outlet. While it’s possible in theory, it will turn out ugly in practice. Firstly, a lot of long outdoor power cords are expensive. Secondly, it will create a cabling mess near the outlet. Thirdly, if a cord goes bad, you need to trace where exactly it’s plugged in and replace it. Fourthly, the more stuff you have to power up, the more difficult this octopus made of power cords is going to be to manage. Exactly the same problems apply in our parallel use case.
So you come up with optimization #1 – you go out and buy several outdoor power strips with several outlets each. By placing these power strips where your lighted trees and deer are, you reduce cabling issues, gain the ability to use shorter power cords and most likely save money on power cords. That’s your VPN-Cubed manager server instance. When you place it next to your cloud-based servers, you reduce latency for your endpoints and cut down on the VPN connections from the edge of your network that you need to build and maintain.
If you are well prepared (i.e., have enough of everything), your composition will drive how many power cords and strips you will need and how long your cords need to be, not the other way around. Same with VPN-Cubed – you mold it to fit your use case, your desired topology or application – you don’t adjust your application to be able to work within VPN-Cubed overlay network.
Outdoor power strips have additional protection to let them function outdoors in low temperatures. The same goes for VPN-Cubed manager instances – they run a hardened OS, with a minimal set of enabled services, behind firewall protection. You can grab a regular switch and make it work outdoors – but why waste your time when these things don’t cost that much? Same with VPN-Cubed.
But power strips may fail – and if they do, an entire section of your composition will be turned off. So you get a cold standby sitting in your garage in case a primary goes out. Or better – you install 2 power strips next to each other, connect them and evenly plug in your endpoints. If one goes out, you switch all connections to the other strip and it’s back. VPN-Cubed allows you to deploy a hot spare with automatic failover capability, which can help balance the load as well. Your outdoor lighted Christmas tree is connected to one power strip at any given time, but if one fails it can be reconnected to another within a power cord’s distance. Same with VPN-Cubed – your servers are connected to a single manager at any given time, but if a manager becomes unavailable, your servers can automatically re-connect to another manager.
And what happens if one of your outlets goes bad? Moving a handful of cables to another outlet is much easier than moving a whole lot. Same with VPN-Cubed – if your network loses one entry point, you just re-connect VPN-Cubed to another.
There are many more parallels between the two. Most of us have been building overlay networks of decorations for quite some time. Building overlay networks for the cloud may be new, but CohesiveFT VPN-Cubed product makes it easy and fun. Don’t be stuck with long power cords – get yourself some nice outdoor power strips. And enjoy the holidays!
Tags: cloud computing · cohesiveft · infrastructure development
This past Thursday I had a chance to attend CloudCamp Boston, which took place at the Microsoft research center in Cambridge, as a representative of CohesiveFT. The event was very well attended, and I was able to meet a lot of smart, interesting people working in the cloud computing space.
The lightning talks section started with a Microsoft representative giving an overview of Azure. Then John Willis gave a talk titled What color is your cloud, in which he talked about various types of IaaS clouds. George Reese gave a talk where he compared a successful cloud deployment to reaching the Emerald City and pointed out that the yellow brick road to this goal is not always an easy one (great analogy!). An Iron Mountain representative gave a talk about how one needs to be aware of where one’s data is in the cloud, and emphasized security measures at their datacenter. An Intuit representative talked about their PaaS, which allows developers to easily reach millions of small businesses already running Intuit products. PaaS is not my thing, but the idea makes sense and, if I understand correctly, is very similar to the idea behind the Salesforce platform – develop against something that many organizations already use. And finally, Cory Von Wallenstein of Dyn, operators of the well-known DynDNS.org, gave a talk about their enterprise features like anycast DNS, CDN etc. Interestingly, now that I think about it, DynDNS offered a way to update DNS programmatically way back when, which definitely qualifies it as one of the earliest cloud APIs out there.
After a break, I attended a cloud security talk titled Cloudifornication by Chris Hoff. I’ve seen the slides and video of this talk before (for example, see here), but seeing it live was more than worth it. This is a very good and important talk for all cloud practitioners and especially architects and developers, and I highly recommend it. I personally had 3 main takeaways. Firstly, information security is based on C-I-A (confidentiality, integrity, availability). Therefore, any outage or service disruption is classified by a customer as a security issue, not only as an SLA issue. I didn’t know about this until a couple of weeks ago when Chris explained it to me on Twitter, and the talk also emphasized this fact. Secondly, I loved a series of slides about increasing complexity of interconnects as more and more vendors, intermediaries and brokers are added to one’s cloud mix. We at CohesiveFT are very aware of this as an emerging issue, and our VPN-Cubed product is targeted at such cases, among other things.
And thirdly, Chris very skillfully highlighted the brittleness of the foundation on top of which we collectively as an industry are currently building out our cloud offerings. When the Internet, designed as it was, carried only the World Wide Web and static pages, it was all good. When we started doing e-commerce and social media on top of the same infrastructure, the risk increased many-fold but was still somewhat manageable (after all, it’s only buying stuff online). But now with cloud computing we are putting absolutely everything (!) on top of the same brittle foundation, and the risks are truly enormous.
Then, I attended a session on private cloud led by John Willis, where we discussed various private cloud technologies and ideas. My main takeaway was that there is or there will be a huge demand for private externally-operated clouds for mid-sized organizations, and that’s where I think the future of colo and hosting is going to be.
All in all, this was a great event and thanks to all organizers and sponsors for putting it together, and to all participants for interesting discussion.
Tags: cloud computing
December 2nd, 2009 · Comments Off
Since earlier this year when I got my copy of Erlang book, I’ve wanted to do something unconventional with RabbitMQ source. I finally came up with an idea, which is somewhat interesting and maybe even useful, and could be done by an Erlang beginner like myself.
Some background first. Each Erlang program in general consists of multiple Erlang processes that send and receive messages to/from each other. These are not your regular processes – these processes are running inside the Erlang VM, and are not mapped to either processes or threads on the system level in any way. They are cheap to create, and communications between them are fast. Each process has a process ID (pid) associated with it. If multiple Erlang VMs share a cookie (a piece of text which is used as a security token), processes on any Erlang VM can freely talk to each other without any code modification.
RabbitMQ implements each queue as an Erlang process. You can think of AMQP exchanges as pieces of routing logic – when a message arrives, Rabbit applies the routing rules to its routing key and sends the message to all eligible queues. Since each queue is a process and has a pid, in a nutshell exchanges send the message to several pids.
In non-clustered mode, Rabbit’s routing function returns a list of local PIDs by selecting values stored in mnesia tables (rabbit_durable_queue and rabbit_queue). But since RabbitMQ can also run in clustered mode, the developers had already implemented a way to send messages to queues on remote nodes. So when I hacked the rabbit_durable_queue and rabbit_queue tables and replaced PIDs pointing to local queues with PIDs pointing to remote queues, I got myself a remote queue forwarder.
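The idea can be shown with a toy model. The Python below is neither Erlang nor RabbitMQ code, and every name in it is hypothetical; it only illustrates that when routing is a table lookup from queue name to process address, changing the address in the table redirects delivery without touching the routing logic.

```python
class Exchange:
    """Toy model of routing: look up destination addresses, then deliver."""

    def __init__(self, queue_table, bindings):
        self.queue_table = queue_table   # queue name -> process address
        self.bindings = bindings         # routing key -> list of queue names

    def route(self, routing_key):
        """Return the addresses of all queues bound to routing_key."""
        return [self.queue_table[q] for q in self.bindings.get(routing_key, [])]

    def publish(self, routing_key, message, send):
        """Deliver message to every routed address using the send callable."""
        for address in self.route(routing_key):
            send(address, message)


# An address stands in for an Erlang pid: (node name, process number).
queue_table = {"orders": ("rabbit@local", 101)}
exchange = Exchange(queue_table, bindings={"order.created": ["orders"]})

delivered = []
exchange.publish("order.created", "msg-1", lambda addr, msg: delivered.append((addr, msg)))

# The "hack": point the table entry at a process on another node.
queue_table["orders"] = ("rabbit@remote", 7)
exchange.publish("order.created", "msg-2", lambda addr, msg: delivered.append((addr, msg)))
```

In this model the `send` callable plays the role of Erlang message passing, which works the same way for local and remote pids.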
How is this useful, you might ask? RabbitMQ supports remote queues only in clustered mode. The way clustering is implemented today (using mnesia in distributed mode), it’s not recommended to run a rabbit cluster over non-LAN links. This is because if 2 rabbit nodes lose and then regain connectivity to each other, the cluster may enter a “partitioned network” state, which effectively means that the rabbit cluster is not functional (in other words, mnesia sacrifices tolerance to partitioning in order to achieve consistency and availability – recall the CAP theorem). With remote queue forwarding, you don’t need to set up clustering and hence the “partitioned network” state won’t affect you by design – and that’s what mattered to me.
On the other hand, remote queue forwarding can potentially break some AMQP guarantees. For example, if a remote node is temporarily unavailable, Rabbit won’t queue the message for later re-delivery (because such a situation is currently impossible in unhacked Rabbit). It means that YOU SHOULD NOT USE THIS HACK UNLESS YOU KNOW WHAT YOU ARE DOING.
The hack works for me, though, in the following scenario. I publish messages with immediate=true (which indicates that the payload is time sensitive). Messages are sent to N local queues. These N local queues are forwarded to N remote nodes, with a consumer attached to 127.0.0.1 on each node. In this project, I don’t rely on any AMQP guarantees – I discard basic.return commands and can tolerate an occasional message not reaching some or even all consumers.
I doubt anyone will find this hack in its current form useful, but just in case I uploaded it to http://github.com/somic/rabbit_queue_forwarder.
PS. After I started this small project, Tony Garnock-Jones announced that his work on pluggable exchange types has been added to default branch. I still went ahead and published my post, but please note that pluggable exchange types could probably achieve similar effect more cleanly.
Tags: erlang · rabbitmq
November 17th, 2009 · Comments Off
I have recently noticed that costs are no longer always touted as the main driver for cloud computing – some have been advocating agility as the primary reason (for example, see here). It’s one thing when this theme gets mentioned in a talk at a technology conference where a company is sharing their experiences. But it’s a bit different when we start seeing it in vendors’ pitches – whenever anybody is trying to sell something new for more, I like to really understand what it is that I am paying a premium for.
There are at least 2 types of agility that cloud computing could potentially enable.
The first is being able to provision resources faster than it would take in a traditional environment. Time to market, how fast you can see that an idea is working or is not working – both are positively affected.
The second is being able to right-size your resources (sometimes this concept is called “elasticity”). It eliminates the need to over-provision – start small, scale up when demand goes up, scale down when resources are no longer needed. In this case, agility refers to speed with which you can implement the upsize/downsize operation. No doubt cloud computing opens new opportunities in this space.
However, the most important question is – do any of these potential agility benefits apply to your use case (workload)? To me, the answer is obviously not “yes” for every single project. Not every IT project in the world requires agility, and furthermore – not every IT project in the world requires agility at a price (remember who stands to benefit if all of a sudden all IT organizations in the world start putting a premium on agility and become willing to cover it in real money when paying for services).
Again – I am not saying that no IT project could benefit from better agility. I am saying that not all projects could.
If your workload does not benefit from increased agility, your main driver towards cloud computing in theory should be cutting costs. It can take a form of explicit reduction in expenditures, or it can be paying the same for more, or it can be avoiding some cost which you’d otherwise have to pay. I am very far from accounting, but it’s my understanding that there may also be some benefit in spreading out costs over a longer period of time instead of putting up capital up front.
If you end up using cloud computing without taking advantage of increased agility and you didn’t cut costs by migrating to the cloud, I can think of several possibilities.
First, it’s possible that there is some cost that you simply overlooked. My favorite example in this area is Internet bandwidth. If you are in the cloud, you have a very fat pipe to the Internet – not something that every project accounts for.
Secondly, you may be trying to future-proof your staff – let them do a project in the cloud to gain experience, since cloud development and cloud operations may require a slightly different skill set than their traditional equivalents. Similarly, there are many proof-of-concept projects where the main goal is to see what it would take an organization to do a project in the cloud, to see what breaks, what doesn’t work well, etc.
Thirdly, you may anticipate you will need agility or may anticipate potential for cutting costs in future projects, and you may be stepping into the cloud to avoid having to migrate these apps in the future anyway.
And finally, if none of the above applies to you, you must be doing it due to the hype. There is nothing wrong with this, as long as you understand it.
To summarize: cloud computing offers potential for unmatched agility; you may end up paying premium to take advantage of it; if agility is not on your list, you should get benefit of reduced costs; if you are not after agility and you don’t cut your costs, you should stop and think why exactly you are using cloud computing.
Tags: Economics · cloud computing
November 2nd, 2009 · 2 Comments
Disclaimer 1: Despite its possibly ominous name, this is NOT a network vulnerability or an attack that could lead to unauthorized access. UDP hole punching requires cooperation between two hosts, and hence can’t be easily used as an attack by itself (in other words, in order to run it, you most likely must already have gained access to the hosts).
Disclaimer 2: Conclusions reached at the end of this post are my educated guesses, and may turn out not to be true. They are based on my observations and not on actual knowledge of how EC2 internals are designed or implemented.
I was once working on a setup in Amazon EC2 and came across an oddity, which when coupled with my interest in EC2 security groups mechanism, turned into this post.
UDP hole punching, in a nutshell, is a technique which allows two cooperating hosts, potentially located behind NAT and/or firewalls, to establish a peer-to-peer UDP communication channel directly to each other. It’s a technique used by Skype, for example – you can read more about it in a Wikipedia article. If two hosts start sending UDP packets to each other on pre-agreed ports, the bi-directional flow of packets leads NAT devices and firewalls to think that all these packets are part of an established communication channel.
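To make the mechanics concrete, here is a minimal sketch of one side of the exchange. Both hosts would run the same function at roughly the same time, each pointing at the other’s address and pre-agreed port. The function name, payload, retry count and interval are arbitrary choices for illustration, and NAT traversal details such as discovering the peer’s public address are left out.

```python
import socket


def punch(sock, peer, attempts=10, interval=0.2):
    """Send probes to peer and report whether a probe from peer arrived.

    sock: a UDP socket already bound to the pre-agreed local port.
    peer: (ip, port) of the other host.
    """
    sock.settimeout(interval)
    for _ in range(attempts):
        # An outgoing packet is what makes a stateful firewall or NAT
        # expect return traffic from the peer.
        sock.sendto(b"punch", peer)
        try:
            data, addr = sock.recvfrom(64)
        except (socket.timeout, ConnectionResetError):
            continue   # nothing yet (or an ICMP error surfaced); try again
        if addr == peer and data == b"punch":
            # Send once more so the peer also hears us now that its side is open.
            sock.sendto(b"punch", peer)
            return True
    return False
```

On each host, usage would look like `sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)`, then `sock.bind(("0.0.0.0", local_port))`, then `punch(sock, (peer_ip, peer_port))`, where `local_port`, `peer_ip` and `peer_port` are whatever the two sides agreed on.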
EC2 allows a lighter form of this technique because EC2 NAT never rewrites the source port of an outgoing packet (recall that in EC2, NAT is always 1-to-1, so port rewriting isn’t necessary). We know with 100% certainty that a packet we send with a given source port X will be seen by the remote instance with the same source port.
I wrote a small Python tool (available at http://gist.github.com/224795) to test UDP hole punching and set out to discover if it could work in EC2. My expectation was that it should work. Unless explicitly noted, I used ports above 45,000 and none of security groups explicitly allowed UDP traffic on these ports.
I was able to easily punch UDP holes between any two instances using each instance’s public IP address, in line with my expectation. But I hit a major snag when using the private IP addresses of two instances in the same region (I used EC2-US): I couldn’t get it to work no matter what I tried, whether same availability zone or different availability zones, same security groups or different security groups, same AWS account or different AWS accounts. I even tried punching a hole over port 53 (all EC2 instances support DNS name resolution, which happens over this port without an explicit corresponding rule in security groups), with no luck (EC2 DNS servers are not located on 10.0.0.0/8, where all instances reside).
The only way I could get it to work using private IPs was to allow my UDP port in the security groups of at least one of the instances. When I did this, both hosts reported success.
This observation leads to several thoughts that might help uncover some aspects of EC2 firewall’s internal design (these are all more or less educated guesses):
- You can punch a UDP hole between any 2 instances using their public IPs, even if your security groups do not allow such communication.
- Private IP traffic is treated totally differently than traffic over public IPs.
- You can punch a UDP hole on port X using the private IP addresses of 2 instances in the same region only if at least one of the instances allows port X in its security groups (this can be used as a test if you don’t have access to query the EC2 API endpoint).
- EC2 firewall somehow implements more logic than “all outgoing packets are allowed” when dealing with traffic over private IPs (if it were not the case, hole punching should have worked – see below).
- If we assume that security group rules are applied at an instance’s dom0 (which makes at least some sense and which this research implies), I now suspect that all dom0 hosts have a complete view of all security groups in the region and receive real-time updates when a rule is added or deleted (modification of rules is currently not supported). This was in fact contrary to my expectation: initially I thought each dom0 “subscribes” to updates for only those security groups which correspond to instances running on this dom0, and I thought this was the reason why dynamic group membership changes were not possible (say I want to move an instance from the “db” security group to the “webapp” security group).
To clarify: under the above assumption, for hole punching NOT to work, an outgoing packet from instance A must not reach the dom0 of instance B. Under an “all outgoing packets are allowed” policy, the only way this is possible is if the dom0 of instance A knows that the dom0 of instance B will block this packet and somehow takes this into consideration, which in the general case can only happen if all dom0 hosts have a complete view of all security groups and permissions in the region.
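The reasoning above can be checked against a toy model. The sketch below is purely illustrative and says nothing about how EC2 is actually built: the `Host` class, the `send` and `punch` functions and the `sender_knows_remote_rules` switch are all hypothetical names. Each host has inbound rules plus a local table of established flows; the switch models my guess that the sender's dom0 consults the destination's rules before letting a packet out.

```python
class Host:
    """Toy instance together with the stateful filter in front of it."""

    def __init__(self, name, allowed_udp_ports=()):
        self.name = name
        self.allowed = set(allowed_udp_ports)  # inbound security group rules
        self.flows = set()                     # (peer name, port) flows seen locally
        self.received = False


def send(src, dst, port, sender_knows_remote_rules):
    """Try to deliver one UDP packet; return True if dst received it."""
    if sender_knows_remote_rules:
        # Guess being modeled: the sender's filter drops a packet that the
        # destination's rules would reject, unless it belongs to a flow the
        # sender already tracks. No flow state is recorded for dropped packets.
        if port not in dst.allowed and (dst.name, port) not in src.flows:
            return False
    # The packet leaves the sender, which records an outgoing flow.
    src.flows.add((dst.name, port))
    # The destination accepts it if a rule allows it or it matches a flow.
    if port in dst.allowed or (src.name, port) in dst.flows:
        dst.flows.add((src.name, port))
        dst.received = True
        return True
    return False


def punch(a, b, port, sender_knows_remote_rules, rounds=3):
    """Both hosts send to each other a few times; True if both received."""
    for _ in range(rounds):
        send(a, b, port, sender_knows_remote_rules)
        send(b, a, port, sender_knows_remote_rules)
    return a.received and b.received
```

In this model, `punch(Host("a"), Host("b"), 45000, False)` succeeds, which corresponds to what I saw over public IPs. With `sender_knows_remote_rules=True` the same call fails, and it succeeds again as soon as one host is created with `allowed_udp_ports=[45000]`, which corresponds to what I saw over private IPs.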
I would love to hear your thoughts on what could possibly explain this behavior. Please let me know in the comments below.
Tags: cloud computing · infrastructure development
October 21st, 2009 · Comments Off
This is a quick note in case anyone is having the same issue.
When building Erlang R13B02-1 on a 64-bit non-SMP machine (not sure if it matters), “make -j 2” somehow resulted in an error which I could not work around. Reverting to plain make (without -j 2) and starting compilation from the very beginning fixed it.
Also, after final make install, I could not start erl – it was complaining about “start.boot not found”. The solution is to symlink boot files like this:
cd /usr/lib/erlang/bin
ln -s /usr/lib/erlang/releases/R13B02/start.boot .
ln -s /usr/lib/erlang/releases/R13B02/start_clean.boot .
ln -s /usr/lib/erlang/releases/R13B02/start_sasl.boot .
I configured it with “./configure --prefix=/usr --disable-x --enable-threads --enable-kernel-poll --disable-hipe”.
Tags: erlang · rabbitmq
October 13th, 2009 · 4 Comments
Most of you have probably heard about a recent outage at BitBucket. In a nutshell, their systems hosted at AWS came under a UDP flood DDoS attack, which led to significantly increased traffic, which led to saturation of their local network interface, which led to their being unable to connect to their data stored on EBS, which led to their application becoming unresponsive.
This outage shed more light on some internal designs of EC2 itself, as described here. It might have also showcased our over-confidence in EC2’s ability to detect and defeat certain types of network attacks. But this post is about something else.
BitBucket was running their web front door and their backend application on the same instance. The front door is the part of the system which faces the Internet, and its task is to accept connections from clients. For obvious reasons, the front door runs on the service’s discoverable IP address: whether they used Elastic IP or not, bitbucket.org resolved to that IP. Note that the front door (usually) doesn’t need EBS.
Backend, however, is what needs EBS for disk persistence. At the same time, backend does not need to be publicly discoverable – as long as front door knows where its backend worker(s) is/are running, the app should be functioning just fine.
With the front door and backend running on different instances, the UDP flood would have saturated only the former’s network interface and would have had no impact on the backend and its EBS.
I know that AWS reportedly fixed the flood issue, but it looks to me like separating the front door and the application backend may still be a good preventive measure. After all, it’s considered good practice for a reason.
Please note that I am not trying to accuse BitBucket of running a bad architecture and causing their own outage. All I am doing is trying to learn a lesson.
Tags: cloud computing · infrastructure development · software engineering