When designing an API, one of the considerations you usually have to address is concurrency. In other words, for every class in the API, what is the class’s threading policy? At the very minimum, an API should document how it behaves with regard to concurrent access, and even better is an API designed with concurrent access in mind.
Writing a good API that is concurrency-friendly is hard. More than anything, it requires lots of reasoning about how all of the moving pieces will work together under concurrent access. In this article I’m going to discuss a few simple guidelines you can follow when designing APIs and reasoning about concurrency. The specifics of what I’ll discuss apply to Java, although the general concepts probably apply equally well to .NET and other similar environments.
The goal is, in general, to choose the design that is the easiest to reason about while still offering acceptable concurrent performance and API usability. Here are four guidelines that can be used as a starting point when thinking about what the threading policy of a class should be:
1) For non-collection-oriented classes, default to immutable
2) For collection-oriented classes, default to mutable and thread safe
3) Prefer collections of immutable elements
4) For collections of mutable elements, make copies and treat the contained elements as immutable internally
These guidelines will help you to design classes that make it easier for the consumer of your API to reason about concurrency policy. This reasoning is made possible by clear documentation of each class’s design for concurrent access. Failure to consider the threading policy for a class leads to under-documented APIs, unexpected behavior at runtime, and obscure bugs that are hard to reproduce. Documenting the concurrency behavior of an API is essential for guiding its consumers to use it correctly.
Note that documenting an API as “not thread safe” is a valid design choice. It may not be the best choice (depending on the API). However, it is better to document that a class isn’t thread safe than to make the API consumer guess or rely on undocumented behavior that could change.
In the discussion below, I make a distinction between collection-oriented classes and non-collection-oriented classes. Collection-oriented classes contain multiple similar child objects (elements). They are usually easy to recognize since they often contain methods to add elements, remove elements, find elements, and perform similar operations across the contained elements. Collection-oriented classes shouldn’t be confused with composite classes that are made by composing together dissimilar classes.
Now I’ll go over each guideline. Remember that ultimately, we are looking at a class and trying to determine what a reasonable threading policy for that class might be.
For non-collection-oriented classes, default to immutable
Immutable classes are one of the best design choices you can make when designing an API. For classes that aren’t collection-oriented, default to using an immutable design.
Only design mutable classes if it seems that API consumers would be greatly inconvenienced by the immutable classes. Usually though, immutable non-collection-oriented classes aren’t a big hassle for users.
Immutable classes have the inherent property of being safe for simultaneous access from multiple threads. You don’t have to do any internal or external locking. This is the first big advantage of immutable classes - thread safety for free and no performance hit under concurrent access.
Another big advantage of immutable classes is that when you use them, you can more easily build a mental model of the system you’re designing. Since immutable classes are very easy to reason about, it takes less mental RAM to think about how instances will behave at runtime - they can only ever be in a single state (post construction).
Immutable classes are also a great way to set API consumer expectations. One of the most frustrating things about an API is when some class, say a service of some kind, takes a mutable object as initialization data. What happens after you hand the mutable configuration data off to the service? Are you still allowed to modify the configuration data? Does the service care? Will it break? Using an immutable class design for the configuration data instead solves these problems.
Designing immutable classes is extremely easy in Java. Briefly, here are the requirements you should satisfy when designing a class in order to call it immutable:
1) All fields in the class should be declared final
2) All fields should be either immutable objects or mutable objects that are not mutated outside of a constructor
3) The this reference does not escape a constructor
4) No mutable objects passed to a constructor are retained by the instance or any of its components
5) No mutable objects escape the instance
Satisfying these requirements is not the only way to design a thread safe immutable object, but other techniques are much harder to explain and require deep knowledge of Java’s memory model.
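As a minimal sketch of the five requirements above, here is what an immutable configuration class might look like. The class name ServiceConfig, its fields, and the withPort method are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical example class; the name and fields are illustrative only.
final class ServiceConfig {
    // Requirement 1: every field is final.
    private final String host;           // String is itself immutable
    private final int port;
    private final List<String> aliases;  // mutable type, never mutated after construction

    ServiceConfig(String host, int port, List<String> aliases) {
        this.host = host;
        this.port = port;
        // Requirement 4: copy the caller's list instead of retaining it.
        // Requirement 3: "this" is not handed to anyone inside the constructor.
        this.aliases = Collections.unmodifiableList(new ArrayList<>(aliases));
    }

    String host() { return host; }

    int port() { return port; }

    // Requirement 5: the returned list is an unmodifiable wrapper, so the
    // internal list never escapes in a form that callers can change.
    List<String> aliases() { return aliases; }

    // A "change" produces a new instance; the receiver is left untouched.
    ServiceConfig withPort(int newPort) {
        return new ServiceConfig(host, newPort, aliases);
    }
}
```

A service that accepts a ServiceConfig like this does not need to worry about the caller modifying it afterwards, which is the hand-off problem described earlier.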
For collection-oriented classes, default to mutable and thread safe
When designing a collection-oriented class, the default choice should be to write a mutable container that can be safely accessed by multiple threads concurrently. Most collection-oriented classes should be mutable since that’s what the API consumer will expect. The most common reason to use a collection-oriented class is to modify the collection (add or remove elements) or to modify the contained elements themselves. An immutable collection-oriented object makes consumers jump through hoops to do this.
Often, collection-oriented classes in an API will be used from only a single thread in many scenarios. It can be tempting not to perform the necessary locking to make these classes thread safe, since that locking will be unnecessary for the majority of usage scenarios. In this case, one valid design decision would be to skip the locking and document the collection as not being thread safe. However, doing this penalizes the users of the class who are in concurrent-access scenarios, since the burden of implementing thread safety is now on them.
I recommend going ahead and doing the internal locking to make these types of classes thread safe. On modern Java runtimes, the cost of uncontended synchronization is extremely low (the book Java Concurrency in Practice, or JCIP, talks about this in detail). Even when the common case is single-thread use, it’s better to design a class to be thread safe if there are potential concurrent scenarios. If a profile reveals that the locking is a hotspot when using the class, then you have a good reason to avoid it. Otherwise, assume that locks are essentially free when uncontended.
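One simple way to do the internal locking is to guard all state with the object's own intrinsic lock. The sketch below assumes a hypothetical TagRegistry class; it is only one possible approach, and a real class might choose a private lock object or a java.util.concurrent collection instead.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical collection-oriented class; names are illustrative only.
final class TagRegistry {
    // Guarded by the intrinsic lock on "this".
    private final List<String> tags = new ArrayList<>();

    // The duplicate check and the insert happen under one lock acquisition,
    // so two threads cannot both add the same tag.
    synchronized boolean add(String tag) {
        if (tags.contains(tag)) {
            return false;
        }
        tags.add(tag);
        return true;
    }

    synchronized boolean remove(String tag) {
        return tags.remove(tag);
    }

    synchronized boolean contains(String tag) {
        return tags.contains(tag);
    }

    synchronized int size() {
        return tags.size();
    }

    // Callers get a snapshot copy, so they never iterate the guarded list
    // while another thread is modifying it.
    synchronized List<String> snapshot() {
        return new ArrayList<>(tags);
    }
}
```

Single-threaded callers pay only for uncontended lock acquisitions, while concurrent callers do not need any external locking for individual operations.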
Although it’s not the first choice I’d make, for some APIs immutable collection-oriented classes may not be a bad decision. You get all of the advantages mentioned above for immutable classes. If API consumers will not often need to mutate the collection or the collection’s elements, doing this may make sense. For example, some collection-oriented objects are often just passed around to other parts of the API, and rarely changed. If you decide to design an immutable collection-oriented class, be sure to document this very explicitly. Also, avoid method names that imply that the receiver is being changed as they can be confusing to a casual user of the API. An immutable collection-oriented class should not have a “public void add(Foo f)” method since the signature of that method implies that it alters the receiver.
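To illustrate the naming point, here is a rough sketch of an immutable collection-oriented class. The Route class and its withStop method are hypothetical; the idea is simply that the method returns a new instance, and its name and return type make that visible in the signature.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical immutable collection-oriented class; names are illustrative only.
final class Route {
    private final List<String> stops;

    // Private: callers of this constructor hand over a list they will not touch again.
    private Route(List<String> ownedStops) {
        this.stops = Collections.unmodifiableList(ownedStops);
    }

    static Route empty() {
        return new Route(new ArrayList<String>());
    }

    // Named "withStop" and returning Route, rather than "void addStop",
    // so the signature does not suggest that the receiver is altered.
    Route withStop(String stop) {
        List<String> copy = new ArrayList<>(stops);
        copy.add(stop);
        return new Route(copy);
    }

    List<String> stops() {
        return stops;
    }
}
```

A caller writes something like Route r2 = r1.withStop("Main St"), and r1 is unchanged afterwards.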
Prefer collections of immutable elements
Whenever possible, collection-oriented classes should contain immutable elements. This helps reduce confusion about the API, since it is clear that the state of the elements can’t be changed while the collection contains them. I’ve found that designs that use small, immutable objects as building blocks and contain them in mutable containers tend to be very robust and easy to understand.
The biggest reason this is important is that most non-trivial collection-oriented classes have one or more invariants that they must enforce. For instance, a collection-oriented class may store elements that each have an identifier of some sort, and the collection may guarantee that contained elements will have unique identifiers. This is just an example - often the constraints can be much more complicated. If the contained elements are externally mutable, it will be very hard or even impossible for the collection to enforce those invariants.
Mutable collections of immutable objects are also very fast to make copies of. Since the contained elements are immutable, copying the collection only involves making a shallow copy. Supporting copies of collections is important for many APIs, so anything that makes this easier and faster is a win.
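The following sketch shows both points together: a mutable, thread-safe container that enforces a unique-identifier invariant over immutable elements, and a copy operation that only needs to be shallow. The Item and Catalog classes are hypothetical names chosen for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical immutable element; its id can never change after construction.
final class Item {
    private final String id;
    private final String label;

    Item(String id, String label) {
        this.id = id;
        this.label = label;
    }

    String id() { return id; }

    String label() { return label; }
}

// Hypothetical mutable container that guarantees unique item ids.
final class Catalog {
    // Guarded by the intrinsic lock on "this".
    private final Map<String, Item> itemsById;

    Catalog() {
        this.itemsById = new LinkedHashMap<>();
    }

    private Catalog(Map<String, Item> source) {
        this.itemsById = new LinkedHashMap<>(source);
    }

    // The invariant holds for good once checked here, because no caller
    // can change an Item's id after it has been added.
    synchronized boolean add(Item item) {
        if (itemsById.containsKey(item.id())) {
            return false;
        }
        itemsById.put(item.id(), item);
        return true;
    }

    synchronized Item find(String id) {
        return itemsById.get(id);
    }

    synchronized int size() {
        return itemsById.size();
    }

    // Shallow copy: the new catalog shares the same Item instances, which is
    // safe because they are immutable.
    synchronized Catalog copy() {
        return new Catalog(itemsById);
    }
}
```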
For collections of mutable elements, make copies and treat the contained elements as immutable internally
Unfortunately, it’s not always possible to design using only collections of immutable elements. Often, for one reason or another, the collection must contain mutable elements. One case of this is a collection-oriented class where the elements themselves are collection-oriented. No matter the specifics, there is a design pattern you can follow in this case.
The collection should make copies of the mutable elements as they are added or retrieved, ensuring that no external clients have a reference to the actual contained instances. In other words, every time a mutable element is added to the collection, the collection makes a copy and adds the copy instead. Every time a mutable element would be obtained, a copy is obtained instead. Internally, the collection should treat the contained elements as though they were immutable and should not call any method on the elements that could change them. By doing this, the collection will contain “effectively immutable” elements. The collection is then free to enforce constraints on the contained elements and know that no external client can break the constraints.
I can attest that this kind of design works well, but it does require some careful API documentation. Without proper documentation, clients may expect that they can obtain a contained element, mutate it, and have those changes automatically show up in the collection. Instead, this kind of design facilitates a more transactional usage. Clients obtain an instance, perform some changes to it, and then must add that instance back in to the collection. This usually isn’t a huge burden on the clients as long as expectations are set correctly.
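Here is a minimal sketch of the copy-in, copy-out pattern and the transactional usage it leads to. The Account and AccountBook classes are hypothetical, and the sketch assumes the caller does not mutate an Account from another thread while it is being copied.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical mutable element with a copy constructor.
final class Account {
    private final String id;
    private long balance;

    Account(String id, long balance) {
        this.id = id;
        this.balance = balance;
    }

    // Copy constructor used by the collection for defensive copies.
    Account(Account other) {
        this(other.id, other.balance);
    }

    String id() { return id; }

    long balance() { return balance; }

    void deposit(long amount) { balance += amount; }
}

// Hypothetical collection that never shares its stored instances.
final class AccountBook {
    // Guarded by the intrinsic lock on "this".
    private final Map<String, Account> accounts = new HashMap<>();

    // Copy in: the stored instance is private to the collection.
    synchronized void put(Account account) {
        accounts.put(account.id(), new Account(account));
    }

    // Copy out: callers receive their own instance and may mutate it freely.
    // The collection itself never calls a mutating method on stored elements.
    synchronized Account get(String id) {
        Account stored = accounts.get(id);
        return stored == null ? null : new Account(stored);
    }
}
```

With this design a client calls get, changes the returned Account, and then calls put to store the result. Changes made to the returned copy are not visible in the collection until that final put.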
That covers the four guidelines. Note that these guidelines are really only meant to cover simple cases. However, the simple cases make up the bulk of most APIs. There are certainly complex classes that don’t fall easily into one of the categories above, and they will need to be designed with more thought.