However, what if the object is managed by a remote web service? What if two threads reading and writing the same object is a common occurrence? Locking an object remotely is not easy - one has to deal with the situation where a client goes away and leaves the lock hanging, along with other nasty corner cases.
Consider the following example. Let's say we have a structure that holds a title, a description, and a checksum:
#define MAX_TITLE 128
#define MAX_DESCR 512

struct Record {
    char title[MAX_TITLE];
    char descr[MAX_DESCR];
    unsigned long crc;
};
The record lives inside a blob store. Depending on the implementation, the blob store may or may not know about the internal structure of the record. To make the exercise interesting, let's assume that it does not own the crc computation algorithm - the checksum is computed on the client.
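For illustration, the client-side checksum could be a standard CRC-32 over the two payload fields. A minimal sketch - the actual algorithm is whatever the clients agree on, and Crc32/UpdateCrc are illustrative names, not part of any API above:

#include <stddef.h>

// Standard reflected CRC-32 (polynomial 0xEDB88320); the running
// register is passed through so the checksum can span several buffers.
unsigned long Crc32(const unsigned char *data, size_t len,
                    unsigned long crc = 0xFFFFFFFFul) {
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320ul & (0ul - (crc & 1ul)));
    }
    return crc;
}

// Recompute rec->crc from the current title and descr.
void UpdateCrc(Record *rec) {
    unsigned long crc = Crc32((const unsigned char *)rec->title, MAX_TITLE);
    crc = Crc32((const unsigned char *)rec->descr, MAX_DESCR, crc);
    rec->crc = crc ^ 0xFFFFFFFFul;   // final inversion per CRC-32
}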
Depending on the API, the client can either write parts of the structure or the whole thing at once.
The catch with updating parts of the structure is that one field (crc) depends on all of the rest.
Consider the following scenario: thread A reads the structure, modifies the title, and computes the new crc. Thread B reads the structure, modifies the description, and computes the new crc. Both then write the results of their work as (title, crc) and (description, crc) respectively. The record ends up with the new title and the new description, but the crc is wrong - depending on who wrote last, it was computed from either the new title and the old description, or the old title and the new description.
The consistency problem can be rectified by only allowing the whole record - title, description, and crc - to be written at once. This comes with its own set of problems: a thread that reads early but writes late can wipe out the changes of another thread that wrote in between. In this scenario thread A reads the structure and updates the title and crc. Thread B reads the structure and updates the description and crc. Thread B writes its data first; then thread A writes its changes out, wiping out thread B's modifications.
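Played out sequentially against a hypothetical store API, the interleaving looks like this (store, ReadAll(), WriteAll(), and UpdateCrc() are illustrative stand-ins):

#include <string.h>

Record a = store.ReadAll();              // thread A reads
Record b = store.ReadAll();              // thread B reads
strcpy(a.title, "new title");            // A changes the title...
UpdateCrc(&a);                           // ...and recomputes the crc
strcpy(b.descr, "new description");      // B changes the description...
UpdateCrc(&b);                           // ...and recomputes the crc
store.WriteAll(b);                       // B writes first
store.WriteAll(a);                       // A writes last: B's change is gone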
Short of a distributed lock service (or a DTC), which makes the job very complicated and potentially rather slow, there is no direct solution to this problem. There is, however, a way to redefine the problem that makes it much easier: allowing the store to fail conflicting writes.
The question then becomes much simpler - how do we detect the conflict?
One easy (and quite standard) way of doing it is to maintain a version of the structure:
#define MAX_TITLE 128
#define MAX_DESCR 512

struct Record {
    char title[MAX_TITLE];
    char descr[MAX_DESCR];
    unsigned long crc;
    unsigned long version;
};
The version is read by the client and handed back as part of the update request. It is incremented by the server every time the structure is modified. If the client supplies a version number other than the one currently in the store, the server assumes that the structure has already been modified by someone else and fails the write. The client can then re-read the structure, reapply its changes, and attempt the write again.
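On the server side, the check itself is a single comparison. A minimal sketch, assuming (as with the class shown later) that access to the record is synchronized externally:

// Server side: accept the write only if the client saw the latest version.
bool Write(const Record &rec) {
    if (rec.version != record.version)
        return false;       // stale read - someone else wrote in between
    record = rec;
    record.version++;       // bump the version on every successful write
    return true;
}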
Some systems use a time stamp instead, which makes it somewhat harder for a malicious client to game the system. The disadvantage, of course, is that one needs a counter with a resolution high enough to guarantee that a read and a write cannot complete within one "tick". Ever-growing system performance, coupled with the desire of OS vendors to preserve backwards compatibility (in this case, the resolution of a counter defined by a given API set), makes this approach brittle.
This approach is easy to understand, but it has a disadvantage - it exposes the internals of concurrency management to the client and makes the client responsible for behaving in good faith.
How can we transfer the responsibility for the atomicity and correctness of the updates to the server? It turns out to be rather straightforward: for every read that results in a later update, we will let the CLIENT supply a hard-to-guess identifier (say, a GUID) to serve as a transaction ID. We will accumulate these IDs in the internal implementation of the object (never to be returned to a client). We will only allow writes that are accompanied by a GUID that is known to us, and we will clear the known GUIDs the moment a write succeeds, immediately making all other outstanding updates stale.
#include <set>

// Synchronized externally
class RecordInternal {
private:
    Record record;
    // IDs of reads that may turn into writes
    // (assumes GUID defines operator< so it can live in a std::set).
    std::set<GUID> reads;
public:
    void Read(GUID *id, Record *rec) {
        *rec = record;
        if (id)
            reads.insert(*id);
    }
    bool Write(const GUID &id, const Record &rec) {
        // Reject a write whose read ID we have never seen - or have
        // already invalidated with a successful write.
        if (reads.count(id) == 0)
            return false;
        record = rec;
        reads.clear();   // every other outstanding read is now stale
        return true;
    }
};
A potential improvement would be to keep a map from each GUID to the timestamp of the read request, instead of a set of all GUIDs - this would allow us to expire very old outstanding updates that will probably never complete because of a client failure.
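A minimal sketch of that variant, assuming std::chrono is available for timestamps and that some housekeeping task calls Expire() periodically (Write works as before, erasing from the map instead of the set):

#include <chrono>
#include <map>

// Synchronized externally, as before.
class RecordInternal {
private:
    Record record;
    // Maps each read ID to the time of the read
    // (assumes GUID defines operator<).
    std::map<GUID, std::chrono::steady_clock::time_point> reads;
public:
    void Read(GUID *id, Record *rec) {
        *rec = record;
        if (id)
            reads[*id] = std::chrono::steady_clock::now();
    }
    // Drop reads older than max_age - their updates will likely
    // never arrive because the client has failed.
    void Expire(std::chrono::seconds max_age) {
        auto cutoff = std::chrono::steady_clock::now() - max_age;
        for (auto it = reads.begin(); it != reads.end(); ) {
            if (it->second < cutoff)
                it = reads.erase(it);
            else
                ++it;
        }
    }
};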
On the client side, it would be convenient to wrap the messy business of generating the GUID in an "update" class, making the use of the API look as follows:
Update update;
Record record;

service.Read(update.id(), &record);
// ... update the record ...
service.Write(update.id(), record);
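The wrapper itself can be tiny. A sketch, where NewGuid() is a hypothetical stand-in for whatever GUID-generation facility the platform provides (CoCreateGuid, uuid_generate, and so on), and the service proxy is assumed to take the ID by pointer in both calls:

// Client-side wrapper that generates a fresh transaction ID per update.
class Update {
private:
    GUID guid;
public:
    Update() : guid(NewGuid()) {}   // NewGuid() is a hypothetical platform call
    GUID *id() { return &guid; }
};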