On the Powers and Dangers of Caching
Posted: July 19, 2011 Filed under: Engineering
Operating systems, the internet, video games, social media: none of them would exist without significant hardware and software infrastructure devoted to caching. The same is true for us. We make enormous demands on a host of distributed caching solutions, some open source, some commercial, all carefully optimized for a particular workload.
Our front line of defense is our content distribution system, which replicates statically cacheable content out to network edges for speedy delivery. Automated tasks in our system determine which pieces of content are globally usable and push them out to the edges. Our JavaScript infrastructure does the same in browsers, pulling in data from the correct places at the right times to get optimal performance. Together, these allow us to handle enormous traffic spikes.
A second line of defense is our web-tier content cache, which serves content to authenticated/connected users. It is a large but, all in all, straightforward caching system.
More interesting is our query result cache system, a multi-level distributed caching system that lazily replicates cache data from centralized memcached servers to in-memory caches on our web tiers. This enables us to handle cache system failures gracefully, so that cache restarts do not result in massive load spikes on our database clusters. Indeed, this is one of the main dangers of caching: When caching servers go down, rewarming those caches can drive back-breaking load on back-end systems, and so managing cache hydration and persistence is a key task for systems with 24×7 uptime.
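A minimal sketch of the lazy, two-level read path described above, assuming a hypothetical RemoteCache interface in place of a real memcached client (an in-process map stands in for the server so the example runs on its own). It is an illustration, not our production code: eviction, expiry, and invalidation are left out, and the names TwoLevelCache and InMemoryRemoteCache are made up for this example.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical interface standing in for a memcached client; not a real library's API.
interface RemoteCache {
    Optional<String> get(String key);
    void put(String key, String value);
}

// In-process stand-in for a remote cache server, so the sketch runs on its own.
class InMemoryRemoteCache implements RemoteCache {
    private final Map<String, String> store = new ConcurrentHashMap<>();

    public Optional<String> get(String key) { return Optional.ofNullable(store.get(key)); }
    public void put(String key, String value) { store.put(key, value); }
    void restart() { store.clear(); } // simulates a cache server restart losing its contents
}

// Level 1: in-memory map on the web node. Level 2: shared remote cache.
// Values are copied into level 1 lazily, on first read.
class TwoLevelCache {
    private final Map<String, String> local = new ConcurrentHashMap<>();
    private final RemoteCache remote;
    private final Function<String, String> loader; // e.g. a database query; must not return null
    private final AtomicInteger loaderCalls = new AtomicInteger();

    TwoLevelCache(RemoteCache remote, Function<String, String> loader) {
        this.remote = remote;
        this.loader = loader;
    }

    String get(String key) {
        String value = local.get(key);
        if (value != null) {
            return value;                       // level 1 hit: no network round trip
        }
        Optional<String> shared = remote.get(key);
        if (shared.isPresent()) {
            local.put(key, shared.get());       // lazy replication into level 1
            return shared.get();
        }
        loaderCalls.incrementAndGet();          // miss at both levels: go to the database
        value = loader.apply(key);
        remote.put(key, value);
        local.put(key, value);
        return value;
    }

    int loaderCalls() { return loaderCalls.get(); }
}
```

Because each web node keeps its own copy, restarting the remote tier in this sketch does not send a wave of reads to the database: warm nodes keep answering from memory while the shared tier refills.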
Also interesting is our approach to cache locality. Since modern databases can serve single-row queries on clustered-index data incredibly quickly, object fetch performance is best improved by avoiding network round trips rather than by reducing query overhead. Therefore, we store query results in remote caches, but never objects; objects, if they are cached at all, are always cached on the web tiers.
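One way to picture this split, as a hypothetical sketch rather than our actual code: the shared tier (modeled here as a plain Map) holds only query results, as lists of primary keys, while objects are cached in a web-tier map and otherwise fetched by primary key. The class name QueryResultCache and the runQuery/fetchById functions are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: the remote tier holds query results as ID lists only;
// full objects live, if anywhere, in web-tier memory.
class QueryResultCache {
    private final Map<String, List<Long>> remoteQueryResults; // stand-in for the remote cache
    private final Map<Long, String> localObjects = new HashMap<>();
    private final Function<String, List<Long>> runQuery;     // expensive, multi-row query
    private final Function<Long, String> fetchById;          // cheap single-row, clustered-index lookup

    QueryResultCache(Map<String, List<Long>> remoteQueryResults,
                     Function<String, List<Long>> runQuery,
                     Function<Long, String> fetchById) {
        this.remoteQueryResults = remoteQueryResults;
        this.runQuery = runQuery;
        this.fetchById = fetchById;
    }

    List<String> query(String queryKey) {
        // Query results come from the remote tier; the query runs only on a miss.
        List<Long> ids = remoteQueryResults.computeIfAbsent(queryKey, runQuery);
        List<String> objects = new ArrayList<>(ids.size());
        for (Long id : ids) {
            // Objects are never stored remotely: local hit, or a fast fetch by primary key.
            objects.add(localObjects.computeIfAbsent(id, fetchById));
        }
        return objects;
    }
}
```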
Using these approaches, and many others such as sophisticated ETag exchanges, our infrastructure scales to the tens of thousands of requests per second that our customer base drives at peak hours.
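The post does not describe its ETag exchanges in detail. As a baseline only, here is a framework-free sketch of the basic HTTP conditional GET: the server derives a strong ETag from the response body, and when the client's If-None-Match value matches, it answers 304 Not Modified with no body. Weak validators, lists of ETags, and the "*" form of If-None-Match are deliberately ignored, and the ConditionalGet class is hypothetical.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical handler showing the basic ETag / If-None-Match exchange:
// strong validators only, single-value If-None-Match.
final class ConditionalGet {
    static final class Response {
        final int status;
        final String etag;
        final String body;

        Response(int status, String etag, String body) {
            this.status = status;
            this.etag = etag;
            this.body = body;
        }
    }

    // Derive a strong ETag from the body's bytes, quoted as HTTP requires.
    static String etagFor(String body) {
        try {
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            byte[] digest = sha256.digest(body.getBytes(StandardCharsets.UTF_8));
            return String.format("\"%064x\"", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
    }

    // ifNoneMatch is the client's If-None-Match header value, or null if absent.
    static Response handle(String ifNoneMatch, String body) {
        String etag = etagFor(body);
        if (etag.equals(ifNoneMatch)) {
            return new Response(304, etag, "");   // Not Modified: the client's copy is current
        }
        return new Response(200, etag, body);     // full body plus a validator for next time
    }
}
```

The bandwidth saving comes from the 304 path: the client already holds the bytes, so only headers cross the wire.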
On SOLID Code: D
Posted: July 5, 2011 Filed under: Engineering | Tags: SOLID
To close out the discussion of SOLID: the D stands for “Dependency Inversion”, currently a bit of a darling in the industry, with significant efforts from many communities and companies to provide infrastructure that helps developers apply this practice.
Now, dependency inversion does not require an inversion of control container, nor does it even imply that one must use dependency injection. Those patterns are natural outcomes of the principle, which states that components should depend not upon concrete implementations but upon abstractions. More concretely, and most often, it means that components should depend on interfaces rather than on classes.
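A small illustration of the principle, with hypothetical names throughout: UserPreferences refers only to the KeyValueStore interface and never to the concrete HashMapStore. Passing the store through the constructor is one convenient way to supply it, not a requirement of the principle.

```java
import java.util.HashMap;
import java.util.Map;

// The abstraction: what the component needs, not how it is provided.
interface KeyValueStore {
    String get(String key);             // returns null when the key is absent
    void put(String key, String value);
}

// One concrete implementation; a cloud-backed store could implement the same interface.
class HashMapStore implements KeyValueStore {
    private final Map<String, String> data = new HashMap<>();

    public String get(String key) { return data.get(key); }
    public void put(String key, String value) { data.put(key, value); }
}

// High-level component: depends on KeyValueStore, never on HashMapStore.
class UserPreferences {
    private final KeyValueStore store;

    UserPreferences(KeyValueStore store) { this.store = store; } // one way to supply it, not the only one

    void setTheme(String userId, String theme) { store.put("theme:" + userId, theme); }

    String theme(String userId) {
        String theme = store.get("theme:" + userId);
        return theme != null ? theme : "default";
    }
}
```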
The most powerful example of how this principle can help companies scale is in unit test suites. At a company where I previously worked, which built one of the world’s biggest applications, running regression tests on even a tiny part of the system could take days, and entire labs and teams were responsible for test execution and reporting. Startups simply cannot afford this kind of overhead (arguably, these days, no one can). The requirement that tests run fast is a matter of survival. So consider a startup such as Sociable Labs, which is hosted in the cloud (Amazon, in our case) and makes use of many cloud services: key stores, binary storage systems, databases, distributed queues, Facebook’s Open Graph, the Twitter API, and much more. Now imagine running a full regression suite of thousands of tests against all of these systems. It would take, and does take, hours to days, and new tests are constantly being added.
To regression test our own systems independently of all of these services, we provide mock implementations of each of them: we have mock Facebook implementations, mock S3 implementations, and mocks of the database repositories and of our configuration server. For our code to use these mocks, it must not rely on any concrete implementation of an S3 or Facebook client; it must rely on interfaces, so that we can provide one implementation at unit test time and another during full regression passes or in production.
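A hypothetical sketch of the test-time substitution described above. SocialGraphClient is an invented interface, not Facebook’s SDK; a production implementation would wrap a real client, while the mock returns canned data and records calls so a unit test never touches the network.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical interface over a social network API; not a real SDK.
interface SocialGraphClient {
    List<String> friendsOf(String userId);
    void postToFeed(String userId, String message);
}

// Test double: canned friend lists, records posts, never calls the network.
class MockSocialGraphClient implements SocialGraphClient {
    private final Map<String, List<String>> friends;
    final List<String> posted = new ArrayList<>();

    MockSocialGraphClient(Map<String, List<String>> friends) { this.friends = friends; }

    public List<String> friendsOf(String userId) { return friends.getOrDefault(userId, List.of()); }
    public void postToFeed(String userId, String message) { posted.add(userId + ": " + message); }
}

// Code under test depends only on the interface, so either implementation can be supplied.
class ShareService {
    private final SocialGraphClient client;

    ShareService(SocialGraphClient client) { this.client = client; }

    // Posts to the user's feed only if they have friends; returns the audience size.
    int shareWithFriends(String userId, String item) {
        List<String> friends = client.friendsOf(userId);
        if (!friends.isEmpty()) {
            client.postToFeed(userId, "shared " + item);
        }
        return friends.size();
    }
}
```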
There are many systems that enable this sort of configuration, and most come down to some central authority, such as a service locator or container, which is configured via some mechanism so that it knows what concrete implementation to provide for a particular interface.
As an interesting aside, the classic Singleton pattern can easily become a major source of violations of this principle. We combine the service locator pattern with inversion of control to give our developers a clean, intuitive view of the system that is actually more convenient and simpler than the Singleton pattern, without that pattern’s hard-dependency code smell.
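A minimal service locator, sketched under our own assumptions (it is not the API of any particular container): configuration binds an interface type to a factory, and callers resolve by interface. Unlike a Singleton’s static getInstance(), callers name only the interface, so a test can bind a different implementation. Each resolve calls the factory; for singleton-like sharing, bind a supplier that returns one pre-built instance.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical minimal service locator: maps an interface type to a factory.
final class ServiceLocator {
    private final Map<Class<?>, Supplier<?>> bindings = new HashMap<>();

    <T> void bind(Class<T> type, Supplier<? extends T> factory) {
        bindings.put(type, factory);
    }

    <T> T resolve(Class<T> type) {
        Supplier<?> factory = bindings.get(type);
        if (factory == null) {
            throw new IllegalStateException("No binding for " + type.getName());
        }
        return type.cast(factory.get()); // calls the factory on every resolve
    }
}

// Example abstraction with a production and a test implementation (hypothetical names).
interface Clock { long now(); }

class SystemClock implements Clock {
    public long now() { return System.currentTimeMillis(); }
}

class FixedClock implements Clock {
    private final long time;
    FixedClock(long time) { this.time = time; }
    public long now() { return time; }
}
```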
On SOLID Code: I
Posted: July 5, 2011 Filed under: Engineering | Tags: SOLID
The I in SOLID stands for “Interface Segregation” and is a hallmark of high-quality, disciplined framework design. As system requirements change over time, there is often the temptation to short-circuit codepaths, to add just one more API to some class to make things work. This is one of the ways in which code rot enters systems: once the boundaries and responsibilities of a system become unclear, momentum builds toward tighter coupling and more brittleness.
Interface segregation is something of a special case of single responsibility: an interface tells us that an object supports a certain set of operations, and interface segregation adds that this set of operations must have a maximal degree of cohesion. For example, an interface such as IObjectCache should really have only three methods: add, get, and remove. It is tempting to start growing this interface with methods such as queryKeys, getStatistics, increment, decrement, push, and pop. As we add methods, the interface becomes harder and harder for new implementations to support, which decreases its usefulness. Breaking the interface down into a hierarchy allows classes to implement more or fewer capabilities, and allows clients to declare more clearly which particular capabilities they need. Goodness all around.
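A sketch of the split, keeping IObjectCache to add, get, and remove. The other interface and class names are hypothetical: key enumeration becomes a sub-interface in the hierarchy, and statistics become a separate, orthogonal capability that a client can ask for on its own.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Core capability, kept deliberately small.
interface IObjectCache<K, V> {
    void add(K key, V value);
    Optional<V> get(K key);
    void remove(K key);
}

// Richer capability layered on top; stores that cannot enumerate keys skip it.
interface IQueryableObjectCache<K, V> extends IObjectCache<K, V> {
    Set<K> queryKeys();
}

// Orthogonal capability, usable without any knowledge of keys or values.
interface ICacheStatistics {
    long hits();
    long misses();
}

// A map-backed cache can support everything, so it opts into every interface.
class MapObjectCache<K, V> implements IQueryableObjectCache<K, V>, ICacheStatistics {
    private final Map<K, V> entries = new HashMap<>();
    private long hits;
    private long misses;

    public void add(K key, V value) { entries.put(key, value); }

    public Optional<V> get(K key) {
        V value = entries.get(key);
        if (value == null) {
            misses++;
        } else {
            hits++;
        }
        return Optional.ofNullable(value);
    }

    public void remove(K key) { entries.remove(key); }
    public Set<K> queryKeys() { return new HashSet<>(entries.keySet()); }
    public long hits() { return hits; }
    public long misses() { return misses; }
}

// Clients declare only the capability they actually need.
class CacheReport {
    static String summarize(ICacheStatistics stats) {
        return stats.hits() + " hits, " + stats.misses() + " misses";
    }
}
```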
Applying Minimal Viable Product to Software Design
Posted: June 16, 2011 Filed under: Engineering, General
Process + Judgement
Posted: June 16, 2011 Filed under: Engineering, General
- By “meaningful”, we mean those questions that tell us what our risks are for any action: schedule risk, quality risk, customer risk.
- By “iterative”, we mean that processes are constantly being tuned to better answer questions.
Hiring: Striving for a 1% success rate
Posted: June 16, 2011 Filed under: Engineering, General
On SOLID code: L
Posted: June 15, 2011 Filed under: Engineering | Tags: SOLID
The L in SOLID stands for “Liskov substitution”, and if you have been coding long enough, you have probably violated it and quickly found that the violation made your system fragile and your code neither reusable nor stable.
Consider the case of a class that represents a vehicle: It has methods for getting velocity, checking the fuel level, adding fuel, and so on. Now imagine that the system is expanded to support solar-powered vehicles, and that those are written as subclasses of the vehicle class; SolarPoweredVehicle : Vehicle.
And now we have a problem: SolarPoweredVehicle has inherited the methods for checking fuel level and adding fuel, but these methods are meaningless for solar vehicles.
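The post does not prescribe a fix here. One possible remodeling, sketched under our own assumptions (every name besides Vehicle and SolarPoweredVehicle is hypothetical), moves fuel handling out of the base class into a separate capability, so no subclass inherits methods it cannot honor.

```java
// Base abstraction keeps only what every vehicle supports.
abstract class Vehicle {
    private double velocity;

    double velocity() { return velocity; }
    void accelerateTo(double newVelocity) { velocity = newVelocity; }
}

// Fuel handling becomes a separate capability instead of a base-class obligation.
interface FuelPowered {
    double fuelLevel();
    void addFuel(double liters);
}

class GasolineCar extends Vehicle implements FuelPowered {
    private double fuel;

    public double fuelLevel() { return fuel; }

    public void addFuel(double liters) {
        if (liters < 0) {
            throw new IllegalArgumentException("liters must be non-negative");
        }
        fuel += liters;
    }
}

// No fuel methods to stub out or throw from.
class SolarPoweredVehicle extends Vehicle {
}

class FuelStation {
    // Only fuel-powered vehicles can be passed here, so the compiler rules out
    // the meaningless case instead of a runtime check.
    static void fillUp(FuelPowered vehicle, double liters) { vehicle.addFuel(liters); }
}
```

Code that needs fuel operations asks for FuelPowered, so a SolarPoweredVehicle can never reach it, while code that only needs velocity keeps working with any Vehicle.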
Liskov substitution states that subclasses should not violate the contract that their base class provides. As another example, consider a subclass of a Circle class whose setRadius method interprets the value passed in as a size in inches, whereas the parent class interprets it as pixels. Such a subclass cannot safely be passed to code that expects the base class, because that code would silently compute wrong sizes.
Often, violations of Liskov substitution are a sign of improper modeling, of improperly specified abstractions. We hold collaborative technical design sessions to tease out these issues and to specify, together, the right architectures, built by composing the right sets of abstractions. Then we move fast and refactor continuously, driving the architecture toward implementation while maintaining maximal correctness as the design meets the difficulties presented by the real world.
On SOLID code: O
Posted: June 15, 2011 Filed under: Engineering | Tags: SOLID
The O in SOLID stands for “Open/Closed Principle”, and in practice it encompasses a number of programming practices that enable the construction of useful, stable class hierarchies. When building a class, every part of it should be as closed to modification as possible, and as open to extension as needed. In practice, this means that all methods should be private unless absolutely necessary, and all members should be both private and final, again unless absolutely necessary.
Classes should be closed for modification for a number of reasons. First and foremost, members should be private so that operations on them are handled by the root class, which can then be modified and versioned in ways that are guaranteed to be both backwards compatible and stable. Once encapsulation is broken, base classes become both unstable and difficult to version, since there is no clear contract for how and when their state can be modified. The most stable, versionable class has immutable state and no public members. Such classes are often not particularly useful or interesting, although they do exist (for example, one could imagine a class responsible for running periodic health checks against a cluster of machines and recording the results). When opening a class up to extension, such as by declaring methods protected, there should be a clear, explicit contract for the expected behavior of extensions. When opening methods up for public or package consumption, again, there should be a clear, well-understood behavioral contract, and subclasses should not be able to alter it, since doing so would prevent those subclasses from being reliably substituted for their base class.
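A hypothetical sketch of these rules, loosely borrowing the health-check example: state is private and final (the list reference is final and the list is mutated only by run()), the public method is final so subclasses cannot change its contract, and the single protected extension point documents what it must do.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical base class: closed for modification, open at one documented point.
abstract class HealthCheck {
    private final String name;                                 // private and final
    private final List<Boolean> results = new ArrayList<>();   // mutated only by run()

    protected HealthCheck(String name) { this.name = name; }

    // Public contract, final so subclasses cannot alter it:
    // run the probe once, record the outcome, never propagate runtime exceptions.
    public final boolean run() {
        boolean ok;
        try {
            ok = probe();
        } catch (RuntimeException e) {
            ok = false; // a probe that throws counts as unhealthy
        }
        results.add(ok);
        return ok;
    }

    public final String name() { return name; }

    // Read-only view, so callers cannot rewrite the recorded history.
    public final List<Boolean> history() { return Collections.unmodifiableList(results); }

    // Extension point with an explicit contract: return true if the target is healthy.
    // It may throw; run() treats any RuntimeException as a failure.
    protected abstract boolean probe();
}

class AlwaysHealthy extends HealthCheck {
    AlwaysHealthy() { super("always-healthy"); }
    @Override protected boolean probe() { return true; }
}

class RefusingConnection extends HealthCheck {
    RefusingConnection() { super("refusing-connection"); }
    @Override protected boolean probe() { throw new IllegalStateException("connection refused"); }
}
```

Because run() is final, every subclass keeps the same observable behavior (record, never throw), which is what lets any HealthCheck be substituted for another.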
Using interfaces and abstract base classes where appropriate also helps close classes for modification, since modifications that break the declared signatures of an interface or abstract class are caught by the compiler. The intelligent use of interfaces and abstract classes is incredibly important: successful framework design, which enables high degrees of reuse, often relies on dependencies on abstractions (interfaces, abstract classes) rather than concretions. We put a great deal of thought into which things in the system should be interfaces and where abstract classes can be deployed. Such tools also provide wonderfully useful constraints, ensuring that extensions to a framework written the wrong way are caught as early and quickly as possible.
On SOLID code: S
Posted: June 15, 2011 Filed under: Engineering | Tags: SOLID
Robert Martin, one of the fathers of the Agile movement, introduced the principles behind this acronym about a decade ago, and it is somewhat surprising that so few of the engineers we interview have even heard of it.
We use SOLID as a lens, a way to think about our code and to mentor new engineers, especially during code reviews.
The S in SOLID stands for “single responsibility” and, appropriately, it is often the first thing we look for: does a class clearly declare and then abstract away a single function of the system? Classes that take on more than one responsibility produce confusing APIs and brittle behavior, and are difficult to version and reuse. The notion of “responsibility” can be a bit difficult for people to grasp, so we start with an example of a class that exhibits an extreme code smell called “coincidental cohesion”: the class attempts to cover multiple, unrelated functions, such as providing both logging and XML parsing services. A less extreme smell, which we see more often, is naive code that exhibits only logical cohesion, where all functions that look the same from a code perspective are grouped together. In this case, the engineer has placed all code for, say, querying the database in one class. Such classes grow rapidly and often have circular dependencies on other classes.
We look for classes that explicitly declare their dependencies, and which have a very clear name. Beware classes with names that end in “Service” or “Manager” or “Utils” – such general names can sometimes indicate that the class is taking on more than a class should.
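A hypothetical illustration of what we look for, in place of a catch-all class such as a "UserUtils" that logs, parses, and queries: a class with a specific name and one responsibility, whose collaborators are declared in its constructor. All names here are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Collaborators are narrow interfaces, each with its own responsibility.
interface UserRepository {
    void save(String userId, String email);
}

interface AuditLog {
    void record(String event);
}

// Single responsibility: registering a user. Persistence and auditing are delegated,
// and every dependency is visible in the constructor signature.
final class UserRegistration {
    private final UserRepository users;
    private final AuditLog audit;

    UserRegistration(UserRepository users, AuditLog audit) {
        this.users = users;
        this.audit = audit;
    }

    void register(String userId, String email) {
        if (!email.contains("@")) {
            throw new IllegalArgumentException("invalid email: " + email);
        }
        users.save(userId, email);
        audit.record("registered " + userId);
    }
}
```

Since both collaborators are single-method interfaces, a test can supply them as lambdas, which is itself a small sign that the responsibilities are well separated.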

