Reactionary Visions

Sun, 19 Oct 2014

Anonymity in public life

In an article published in the Guardian yesterday, author Kathleen Hale recounts how her first book got some negative reviews on a book review website. One reviewer in particular upset her, and Kathleen ends up figuring out that the reviewer is using a false identity, finds out who the reviewer really is and confronts her. The piece doesn't read to me like some sort of triumphant "I outed a fraud" type piece (though there are some passages in there which are questionable in that direction) and equally there are several passages where Kathleen expresses deep embarrassment and regret for the course of action she took. This episode, and that article in particular, have caused substantial reaction: currently 600 comments on the Guardian article plus several other blog posts. There's no shortage of opinion to be found on Twitter either, as you'd expect.

The course of action that Kathleen took seems to be fairly undisputed as far as I can find. There is some dispute from some of the other blog posts as to exactly what was tweeted and said by whom, and there is dispute over Kathleen's claim that there are factual inaccuracies made in a review of her book. It is not disputed that the reviewer was using a false identity and that the reviewer had at least public Twitter, Facebook, and Instagram accounts under the false identity. The false identity was also a real name (Blythe Harris), by which I mean a name which if you introduced yourself by that name, no one would think you're using a false identity. This is distinct from claiming to be Peter Rabbit, or Buzz Lightyear.

Many people have equated Kathleen's actions with stalking. My dictionary defines the verb to stalk as:

  1. to follow or approach (game, prey, etc.) stealthily and quietly
  2. to pursue persistently and, sometimes, attack (a person with whom one is obsessed, often a celebrity)
  3., 4., ... [not relevant]

The second item there certainly fits. The British legal approach, whilst it gives no strict definition, gives examples and guidance:

...following a person, watching or spying on them or forcing contact with the victim through any means, including social media.

The effect of such behaviour is to curtail a victim's freedom, leaving them feeling that they constantly have to be careful. In many cases, the conduct might appear innocent (if it were to be taken in isolation), but when carried out repeatedly so as to amount to a course of conduct, it may then cause significant alarm, harassment or distress to the victim.

I'm glad it includes "social media" there. Some comments have suggested that stalking "in real life" is worse than online. This seems bizarre to me: as if through a computer you are not interacting with other human beings but merely with shiny pixels who have no emotional capacity. "In real life" is everything we know. Whilst we're alive we have no personal experience of anything other than "in real life".

So I'm fairly sold on the whole argument that Kathleen's behaviour towards this reviewer can be considered stalking and as such is reprehensible.

To me, the far more interesting issue is the use of anonymity, false identities and any realistic expectation we have of privacy on the internet. A number of people who claim to write book reviews on such sites have suggested that the behaviour of Kathleen is exactly why they write their reviews under false names. I think there's something of a contradiction going on here.

But let's work backwards. Firstly, Kathleen, through some social engineering (she requested from the book review site the address of the reviewer so that she could post her a copy of the book) got the address of the book reviewer. She then used a telephone directory and census results to identify who really lived there (or likely owned the land). Now the use of the telephone directory seems a bit odd to me: telephone directories map names to numbers (and maybe addresses). Yes, you could use one to map an address to a name but it's very inefficient: you're essentially searching through the whole directory looking for the address whilst the directory is sorted by name, not address. So unless it was a very small telephone directory, I don't really buy that. Using census results is far more credible: they're public documents and when they're online, they do allow you to search by address. In the UK you can only get access to the raw census details 100 years after the census was taken which, to a high probability, rules it out as a means to tie an address to a person who's still alive. You can get statistics and aggregates from more recent census results but you can't get the raw data. I'm assuming that in the US there's no such restriction on access to raw census data. If there is then I don't understand how Kathleen really managed to get a name for the owner of the property.

Instead, in the UK, if you want to find out who owns some land, you can pay the land registry £3 and they'll tell you. Presumably there are means by which you can legally hide this; I'm sure the rich have figured this out - probably some method by which some fake company in a tax haven technically "owns" the land and as they're registered abroad, they don't have to divulge any further details about that company. So yes, you could argue the Land Registry is profiting from facilitating stalkers, but equally there are a bunch of legitimate reasons to need access to such data and I can't think of any sane way to ensure the use of such a service isn't abused. So from that I conclude that unless the owner is a millionaire, the owner of any land is public knowledge.

The use of social engineering to get the address in the first place is more interesting but very obvious. This sort of thing happens a lot and sometimes to horrifying consequences (e.g. the Australian DJs who phoned up a hospital, pretending to be the Queen and Prince of Wales, enquiring as to the health of the Duchess of Cambridge. The nurse fell for the hoax and put the call through. Three days later, the nurse committed suicide). As a species we are not good at taking the time to verify who we're talking to or why. Whilst (hopefully) most of us would hang up if our bank apparently rang us and then asked for our credit card details "for security" this is largely only because it's in the bank's interest (in terms of cost of insurance) to reduce fraud, so they've trained us as such. But in all sorts of other scenarios we implicitly trust people we've no real reason to. A simple example: ticket inspectors on public transport. They may be wearing the uniform, but it could be faked. With their travel-card readers they could be seeing who has the expensive yearly travel cards, scanning the unique numbers from them and then using them to program up fraudulent cards. The crypto on those things is notoriously weak. Has anyone ever requested some means to verify the identity of a ticket inspector? And even if you could, how do you know they're not crooked regardless?

So phoning someone up, impersonating someone else, or pretending to have valid reasons to request the information you're requesting is always likely to work. It might be illegal in some cases, but it's certainly human nature to try to be helpful and if you're given a plausible justification, on what basis could you refuse the request unless it's contrary to some sort of company policy? In this case, if you're concerned about anonymity, wouldn't you be concerned about this possibility, and make use of an anonymous mail box?

Article 8 of the European Convention on Human Rights guarantees an individual's right to respect for privacy and family life, including correspondence. Is privacy the same as anonymity? No, definitely not:

In conflating anonymity and privacy, we have failed to see an important factual difference between them: under the condition of privacy, we have knowledge of a person’s identity, but not of an associated personal fact; whereas under the condition of anonymity, we have knowledge of a personal fact, but not of the associated person’s identity

The vast violations of our lives by state surveillance, as revealed by Snowden over the last year, demonstrate the wholesale collation of everything we do online and off by our governments. This is both being able to observe an action and identify the individual who caused it (thus we have no hope of performing any action anonymously), and being able to observe an individual and know the actions they take (thus no privacy). I can't work out whether the ECHR has anything to say on a right to anonymity; I get the sense that it doesn't try to protect that. So that's basically saying: "the state shouldn't record your every move (as that's an invasion of privacy), but moves that we're interested in, we can know who did them". Of course, we now know they're recording everything anyway.

We also know that computer systems can always be hacked into - there is no real security anywhere. Given a skilled and sufficiently funded adversary, any computer system connected in any way to the internet can be hacked into. Why? Because humans wrote the software that runs on those computers and humans are incapable of writing bug-free software. Look at all the large scale data breaches in recent history. Nothing is secure.

So we have laws that seem to try and protect privacy, but they're violated by our own governments, and in any case, we have countless examples of our inability to store any information securely. So is there really any hope to be able to exist with anonymity on the internet?

As ever, it depends who your adversary is. If your adversary is a government (either your own or some foreign government) then no, you have no hope. If it's a previous partner of yours who has no particular computer training, then yes, you're probably going to have a reasonable chance of being anonymous for a while. But you need to read up on this and think hard: it's not a trivial undertaking. There are some good guides as to how to do this, but:

All writers - whether writing under their own names or not - should be aware of the risks they may incur by hitting 'publish'.

What is the effect of hitting "publish"? It's to put more data points out there which may lead people to be able to identify you. The fewer data points out there, the better. So coming back to our book reviewer, if you want to review books anonymously, and if your justification for acting anonymously is to avoid being stalked by authors who don't like your reviews, then why put so many data points out there? Why have the Facebook page, the Instagram profile with the faked photos, the Twitter account? Why give your real postal address to the book review club knowing they're going to post books to it and might conceivably give your address out to other people?

The social media accounts in particular I find most odd. If you want to review books then review books. Build your following, your reputation and cachet on the quality of your reviews. If I'm looking at a book review I really don't care where you went on holiday, what your tweets are, or how many pets you have. Putting that information out there undermines your entire justification for being anonymous: if you want to be anonymous (i.e. you don't want people to find out who you are) then why are you putting so much unnecessary information out there that may allow people to figure out who you are?

Equally, use a name that clearly communicates to me you're trying to be anonymous: call yourself TheBookReviewer53, DostoyevskyLover or OrwellWasRight. Doing so doesn't lessen the validity of your opinions on your chosen subject and is more honest with people reading your reviews: it's overtly saying "I have reasons to want to exist anonymously on the internet". It reveals nothing more about your real identity either: regardless of the obvious fictitious-ness of your online persona, if you can be found, you can be found.

Researchers show that four data points about a person’s location can identify that person with 95% accuracy. FOUR. You think you can tweet anonymously from your phone? You think apps like Whisper allow you to act anonymously? As with pretty much everything related to the internet and computing, unless you've spent the last 20 years of your life working with computers, studying computers and thinking very hard about threat models and what data you're putting out there, and are utterly paranoid, you basically haven't got a chance. Do you turn off wifi on your phone when you leave the house? You should. You trust that USB pen drive you're transferring documents on? You shouldn't.

Finally and most obviously, any attempt at anonymity clearly doesn't insulate you from the law. As members of various hacking groups such as lulzsec found out, you always can be found out by law enforcement agencies. Yes, you might be able to make it difficult for a poorly funded person to come after you for libel (which is really just an indictment of the disgusting relationship between justice and money) but it's quite a risk to take. If you wouldn't put it in print with your real name attached, you're placing an awful lot of trust on your ability to maintain your anonymity against an adversary you probably don't know as well as you need to.

posted at: 16:00

Sun, 05 Oct 2014

Programming in the real world

One of the things that annoys me about Object Oriented Programming is how it's often suggested that it models the "real world". Frequently tutorials will start with creating an object modelling a chair, and through inheritance you'll be able to build up composable aspects of chairs: different numbers of legs, different colours, different designs. Sometimes they use tables rather than chairs. This is lovely, but it actually has everything to do with data modelling through inheritance, decomposition, abstraction and encapsulation, and almost nothing to do with Object Orientation: the key is that these chairs have no modifying methods on them. If they have any methods at all then they'll be for things like getting the number of legs or the colour, or volume or something - something that is basically fixed once the object is instantiated. At this point in such tutorials I'd probably claim this is not actually programming yet: all that's been achieved so far is that we've assigned some semantics to some numbers held in memory and we can write some numbers in memory. Programming is when we manipulate numbers: that involves reading and writing numbers.

The problem then is that Object Orientation immediately stops being about modelling the "real world" as soon as we can modify memory. If we think about how we actually would go about getting a chair made for us, it could go a bit like this:

  1. Go see your local carpenter,
  2. Have a discussion with them about the style and type of chair you'd like,
  3. They make the chair and give it to you in return for payment,
  4. You take the chair home,
  5. You decide you don't like the colour so you take it to your garage and repaint it yourself.

It should be clear that the inanimate object (the chair) is the odd one out here. Everything else is done by actors that have their own state, mainly act asynchronously, and can communicate with other actors through protocols - protocols that do not involve sharing mutable state (e.g. if I say something to you, that speech is immutable; you can't change what I've said (though you could choose to mishear me!)). At no point is any state of any actor actually exposed to another actor: I may share with you what I'm currently thinking, and you can try to influence me, but we don't exactly need a mutex around memory in my brain because YOU'RE NOT GETTING IN THERE!
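To make the shape of that concrete, here is a minimal sketch in Go of the chair commission as actors exchanging messages. Everything in it (`Order`, `Chair`, `carpenter`, `commission`, the `Serial` field) is a hypothetical name invented for illustration, and Go itself won't stop you sharing pointers, so the isolation here is by convention only: the messages are plain structs passed by value.

```go
package main

import "fmt"

// Order is a message passed by value: the carpenter works from its own
// copy and cannot alter what the customer said.
type Order struct {
	Style  string
	Colour string
	Reply  chan Chair // where the finished chair should be delivered
}

// Chair is plain data. It has no behaviour of its own.
type Chair struct {
	Style  string
	Colour string
	Serial int
}

// carpenter is an actor: it owns its private state (chairsMade) and the
// only way to interact with it is to send an Order to its mailbox.
func carpenter(mailbox <-chan Order) {
	chairsMade := 0 // private state, never shared
	for order := range mailbox {
		chairsMade++
		order.Reply <- Chair{Style: order.Style, Colour: order.Colour, Serial: chairsMade}
	}
}

// commission plays the customer: place the order, wait for the chair,
// take it home and repaint it. Repainting changes only the customer's copy.
func commission(mailbox chan<- Order, style, colour, repaint string) Chair {
	reply := make(chan Chair)
	mailbox <- Order{Style: style, Colour: colour, Reply: reply}
	chair := <-reply
	chair.Colour = repaint
	return chair
}

func main() {
	mailbox := make(chan Order)
	go carpenter(mailbox)
	chair := commission(mailbox, "windsor", "oak", "blue")
	fmt.Printf("%+v\n", chair)
	close(mailbox)
}
```

Note that the customer never sees `chairsMade` and the carpenter never sees the repainted chair: the only things that cross between them are copies of messages.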

If you tried modelling this sort of thing through Object Orientation without actors then you'd end up with your own thread doing all the work: it'd be you, it'd be the carpenter and it'd be the chair, maybe all at once. If your carpenter is in fact a growing business with a receptionist, a design team and a billing department your thread would be playing those roles too and would probably have to use locks to avoid unexpected interactions with other threads doing the same commissioning-receptioning-designing-constructing-delivery-repainting dance. And all the time, whilst you're doing the carpentry yourself, you could easily have your own thoughts, feelings, aspirations and regrets all on the same stack for your carpenter-alias to mess with.

Thus Object Orientation causes multiple personality disorder.

So in my view, the way Object Orientation gets introduced tends to be more like "useful tools for modelling data". But the OO approach to manipulating that data goes wrong as soon as you try to model the animated real world. Firstly it has nothing to say about separating out threads to self-contained actors (but try this in a language or on a platform without green-threads, or without the ability to preempt threads and you can quickly hit pain), and secondly even if you do have actors, OO encourages the sharing of mutable data rather than passing around either immutable data or copies of data. Yes, good programming discipline can result in sane designs and a successful result, but it's not a core aspect of the OOP mantra.

So, OOP has nothing good to say on manipulating data at all - it either says nothing or it encourages silly ideas like using locks. The data modelling bits are fine, but I think they're a broader concept beyond the confines of OOP. What else does OOP get you? An arbitrary restriction on the receiver of any method. That's about it. It's thanks to this restriction that writing combinators like cons on a list library in an OO language is really painful.

This week Erik Meijer wrote an article called The Curse of the Excluded Middle: "Mostly functional" programming does not work. After an introduction, we get onto The Problem, which (paraphrasing) is that languages that are mainly imperative but offer some features from pure functional languages are not as safe as pure functional languages.

The first three examples, from C#, are certainly surprising to me (I barely know any C# at all though). The first two problems come from trying to compose side-effecting stuff with laziness. In the first case it's not clear to me that the problem is with the IO operation (printing things out) or with the laziness; it looks more like odd behaviour of the Where operator (presumably the implementation of Where doesn't know that a Cartesian product isn't necessary, but surely any normal monadic/list-comprehension implementation wouldn't have this problem?). The second case is certainly the terrifying composition of laziness with throwing exceptions, and thus the exception having the potential to pop out anywhere the lazy expression gets forced. However, if you know the Select operator is lazy, it's not really that surprising. It's arguably piss-poor language design that there's nothing there to help you, but C# doesn't have checked exceptions; apparently programmers don't like having to deal with errors so you reap what you sow. The third case is how C# has a nice using feature which binds a resource to a lexical scope. But if you construct a closure capturing the resource and then send it out of that lexical scope then using goes wrong (it will still discard the resource even though there's a reference to it within the closure which remains in-scope). This is certainly piss-poor language design: if the closure captures stuff from your lexical scope and you're not reference counting (or equivalent) your lexical scope then YOU'VE DONE IT WRONG. This is as bad as in C allocating stuff on your stack and then returning pointers to it.
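I can't reproduce the C# here, but the third problem has a close analogue in Go, which is the language I'll use for sketches. The following is my own illustration, not Meijer's example: `defer f.Close()` plays the role of `using`, and `leakyReader` and `readAfterScope` are hypothetical names. The closure escapes the function, the deferred close runs anyway, and the closure later finds the file closed underneath it.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// leakyReader opens the file, schedules Close for when this function
// returns, and hands back a closure that still refers to the file.
// By the time the caller runs the closure, the file is already closed.
func leakyReader(path string) (func() (int, error), error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close() // runs when leakyReader returns, like leaving a using block

	read := func() (int, error) {
		buf := make([]byte, 16)
		return f.Read(buf)
	}
	return read, nil
}

// readAfterScope reports whether the escaped closure failed with os.ErrClosed.
func readAfterScope() bool {
	tmp, err := os.CreateTemp("", "using-demo-*")
	if err != nil {
		return false
	}
	defer os.Remove(tmp.Name())
	tmp.WriteString("hello")
	tmp.Close()

	read, err := leakyReader(tmp.Name())
	if err != nil {
		return false
	}
	_, err = read()
	return errors.Is(err, os.ErrClosed)
}

func main() {
	fmt.Println("closure saw a closed file:", readAfterScope())
}
```

At least here the failure is an ordinary error value returned from `Read` rather than something that pops out at an arbitrary point, but nothing in the language warns you that the closure has outlived the resource.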

Next he moves on somewhat tangentially to the point that if object creation is an observable action then you can't optimise it out. I'm not sure anyone outside a pure functional language specialist would ever want object creation to be optimised out, but the point is that if your constructor has side effects or can in any other way be observed then you can't have your language runtime do memoization of object creation. Doing side effects in object constructors has long been discouraged: I first read that back in the Effective Java book about a decade ago and I'm sure it wasn't exactly a ground-breaking piece of advice then.

So far then we have that side effects which are untracked have the potential to be bad: whether it's printing things out, or throwing exceptions, or discarding resources early, or preventing compiler optimisations. But next I feel the article goes a bit wrong. He first moves onto how channels in Cω can store state so they're not pure either, thus bad. He then goes on to say that in Erlang you have the same problem as you're just modelling mutable state in actors:

Note how this Erlang actor basically encodes an object with dynamic method dispatch using the pattern-matching, message-sending, and recursion primitives of the language, which you may happily leverage to implement mutable references, sabotaging the fact that the Erlang language does not natively expose mutable state.

This is wrong: you cannot implement mutable references in Erlang. Data is immutable in Erlang so if you send some value out of an actor, you are sending that value. Not a reference to a value or variable. Even if you create a closure and send that out of the actor, the closure is capturing those values as they exist at that point in time. If you have received a value sent to you from an actor, you may use it to create other values, but doing so does not affect the "original", and similarly, the actor itself can continue to modify its own state, but it does not affect the values it sent to you. Yes, you can use Erlang actors to model objects. But an actor's own modifications of its state cannot be observed as side effects on values you've previously retrieved from that actor, and vice versa.

The reference you have to an actor is a process identifier (also immutable) which does not present any information itself about the state of the actor. Through that, you can send messages to an actor and test whether or not the actor is still alive, but that is all. And in any case, where has the sudden objection to mutable state come from? State is just a catamorphism on prior inputs. State is not the problem: unconstrained side effects are the problem. Certainly sharing mutable state is a problem (and you could argue that mutating shared state is a side effect and that it should be tracked statically), but Erlang does not allow for that.
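A rough sketch of what I mean by state being a fold over prior inputs, again in Go rather than Erlang, so treat it as an approximation: Go does not enforce immutability, and the sketch only stays honest because it sends plain ints. The names `fold`, `request`, `counter` and `snapshots` are made up for the example.

```go
package main

import "fmt"

// fold is a left fold: the state after n messages is just the step
// function applied over the messages received so far.
func fold(step func(state, msg int) int, initial int, msgs []int) int {
	state := initial
	for _, m := range msgs {
		state = step(state, m)
	}
	return state
}

type request struct {
	add   int      // input to fold into the state
	reply chan int // receives a copy of the state after the update
}

// counter is the actor form of the same fold: it keeps the accumulator
// privately and sends out values, never references to its state.
func counter(mailbox <-chan request) {
	state := 0
	for req := range mailbox {
		state = state + req.add
		req.reply <- state // an int is copied on send
	}
}

// snapshots sends each input to a fresh counter actor and collects the replies.
func snapshots(inputs []int) []int {
	mailbox := make(chan request)
	go counter(mailbox)
	defer close(mailbox)

	var seen []int
	for _, in := range inputs {
		reply := make(chan int)
		mailbox <- request{add: in, reply: reply}
		seen = append(seen, <-reply)
	}
	return seen
}

func main() {
	inputs := []int{1, 2, 3}
	fmt.Println(snapshots(inputs))
	fmt.Println(fold(func(s, m int) int { return s + m }, 0, inputs))
}
```

The values collected in `seen` are snapshots: the actor carrying on and updating its own state does nothing to the numbers you already hold, which is the property the quoted passage overlooks.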

He may have been better off going for an example of opening a file, sending the file handle to another process and then closing the file handle before it's been used (i.e. the same as the third C# example). Except:

  1. All file operations can return an error anyway so handling errors in such code is completely normal;
  2. In Erlang a file handle is normally an actor itself, so what you're doing is passing around a process identifier. Sending messages to a dead process (once the file is closed) is a normal activity and you can detect if the process has died in normal ways;
  3. If you bypass such normal file handling for performance reasons and open the file in "raw" mode then Erlang has a light form of object capabilities in which only the process that opened the file is allowed to use the file handle, so again the use of the file handle would error predictably;
  4. The language doesn't have the same C# feature for discarding resources once you return out of a lexical scope. Consequently closing a file is an explicit operation and given the asynchronous concurrent mindset one develops when working in Erlang, it's very likely you'll realise how odd it is to be closing a file handle whilst there's some closure out there which may not have been run yet.

Beyond this, he introduces the Haskell type system and explains that it captures side effects statically. As a result, by bowing to the demands of the type checker, it offers you a proof that if such effects occur, your program will handle them: exceptions will not go uncaught, IO operations are only permitted where the semantics lead to expected outcomes, resources are not used after they're discarded and the compiler can use all these proofs to do all manner of optimisations to your program.

These proofs can certainly be very valuable (though they are no substitute for disciplined, high quality design and careful implementation). Obviously, they don't capture everything though. Particularly relevant for concurrent and distributed programs, they don't capture sufficient side effects to allow for a proof of the absence of deadlocks. Haskell standard libraries contain channels and semaphores which can easily be used to sporadically end up with a deadlock between processes. A deadlock is definitely a side effect: the effect is the program probably stops working. The cause is an insufficient use of locks to control the scheduler (be it scheduling of OS threads or language runtime scheduling of green threads).
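The same point can be made in any statically typed language with channels. Here is a small sketch in Go (not Haskell, so it says nothing about Haskell's libraries specifically): it type-checks, and it never makes progress. The function name `stuck` and the timeout-based detection are my own scaffolding so the example terminates.

```go
package main

import (
	"fmt"
	"time"
)

// stuck starts two goroutines that each wait to receive from the other
// before sending. The program type-checks, yet neither can ever proceed.
// It reports whether both were still blocked after the given timeout.
func stuck(timeoutMillis int) bool {
	a := make(chan int)
	b := make(chan int)
	done := make(chan struct{}, 2)

	go func() {
		v := <-a // waits for the other goroutine to send first
		b <- v
		done <- struct{}{}
	}()
	go func() {
		v := <-b // waits for the first goroutine to send first
		a <- v
		done <- struct{}{}
	}()

	select {
	case <-done:
		return false
	case <-time.After(time.Duration(timeoutMillis) * time.Millisecond):
		return true // no progress: both goroutines are blocked for good
	}
}

func main() {
	fmt.Println("deadlocked:", stuck(100))
}
```

Nothing in the types of `a` and `b` distinguishes this from a working program where one side sends first.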

More broadly, the proof a type checker offers is that the specification you've provided (type signatures) is not violated by its inferences about your code. Until the type checker allows "and makes progress" as part of a specification, Haskell itself is no safer than any other language that claims to be "mostly functional".

posted at: 10:22

Sat, 27 Sep 2014

Reading highlights this week

I remember when I first went to University, coming across people who were clearly extremely expert in their fields, from whom I wanted to learn, but who were also religious, and how this utterly baffled me. At that point I would cheerfully describe myself as an avid atheist. My ignorance and naivety were somewhat extensive.

Over a decade later I like to think I have a more nuanced view. The most recent war in Gaza led, obviously, to vast amounts of suffering but some excellent articles on the subject (this one by Hadley Freeman in particular) helped me see perspectives more clearly and articulated how crucial it is to be precise with criticism: are you criticising a religion, a people, a government, a policy or something else? Nothing is ever black-and-white and it seems increasingly important to anticipate the consequences of an ill-thought-through comment or reaction. A good example of that is George Galloway's comments this week in the debate about this country once again getting involved in Iraq. On the face of it, and certainly without being remotely well-enough informed to evaluate the accuracy of his claims, if his claims on the size and makeup of ISIS/ISIL are true then there seems little likelihood that the bombing campaigns being discussed will be effective, and they are quite likely to be counter-productive. But all of that got lost due to his description of Iraqis as quiescent. The way in which that description was seized upon by other MPs, and the resultant media storm, overshadowed not just the rest of his contribution to the debate, but also other important aspects of the debate, such as the resignation of Rushanara Ali (Labour's Shadow Minister for Education), citing once again the lack of a credible long-term plan for the region and our involvement.

Addressing the broader and somewhat more abstract issue is this enlightening article by Karen Armstrong. Again, I'm not claiming to be expert in the area, merely that I found the article very educational. It had barely occurred to me that the western world's separation of the secular from the sacred was firstly such a recent occurrence, and secondly that it arose from a specific set of circumstances. There is no implicit reason why separation of state from church is an inevitable or even likely happenstance (this reminds me of the question "if humans evolved from monkeys, then why can't we find monkeys still evolving into humans today?", to which the answer is "the circumstances are not right for that to occur"). The fact that the English word "religion" can't really be translated accurately into other languages (especially not languages that predate English such as Greek or Latin, as historically faith is all-encompassing of life, not merely a private affair as we treat it today in the west) starts to show quite how odd the separation of secular from sacred in the modern west really is.

More interesting still is the observation that in the west, belonging to a Nation has in some ways subsumed the role of belonging to a Religion, only apparently with more positive overtones: we consider it almost reprehensible to die for your religion, but honourable to die for your nation. It would seem the concept of even belonging to a nation and having any sense of greater community outside your immediate surroundings only came about with the increased ability of governments to engage with (or intrude upon) their citizens. Before that point, presumably with church attendance widespread and frequent, one's interaction with "the wider world" was through the representative of the church. This would seem to explain a lot about why governments of the past sought the blessing of their nation's church for particular courses of action: maybe the church was seen as the bridge between the government (or monarchy) and the people. The whole article is worth a read.

posted at: 15:08

Sat, 20 Sep 2014

Concurrency, Actors, Locks and mailboxes

Having been part of the original team that wrote RabbitMQ (indeed I wrote the very first prototype, back in the summer of 2006), and having worked full time with Erlang from 2009 until fairly recently, it's been interesting doing some work in Go lately.

Go's a language I currently have mixed feelings about. In some ways I like it - it's simple, it doesn't take too long to learn, the syntax is clean, the tool chain is superb (compilation is quick), performance is very good, and it's very easy to drop into C whenever you really need to. It also has one very nice high end feature: first class channels - you can pass channels over channels which is pretty powerful. But equally, it's statically typed and has a terrible type system (i.e. no generics. Personally, I don't feel like the proofs offered by type checkers are worth much to me so I'd much rather have no static type checking than one that is this primitive and brain dead), it's not properly extensible (e.g. you can't create your own data structures which work with range) and worst of all, there's no pattern matching. The lack of pattern matching is particularly horrific given Go's "best practice" of returning tuples from functions (except they're not really tuples - another mistake in the language), the rightmost element of which indicates an error or success. But you can't pattern match on the assignment so you end up with endless if-statements that explicitly check the error for nil. There are other irritations which I've found, particularly related to its locks package (i.e. locks are non-re-entrant; you can't upgrade a read lock to a write lock; waiting for a write lock blocks all new attempts to gain read locks. Yes, I know I'm free to implement my own locks package if I want to).
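For anyone who hasn't seen it, this is the sort of thing I mean by endless if-statements. A small sketch: `sum` is a made-up function, `strconv.Atoi` is the real standard library call, and every call that can fail needs its own explicit nil check before the value can be used.

```go
package main

import (
	"fmt"
	"strconv"
)

// sum parses two decimal strings and adds them. Each call returns a
// (value, error) pair, and each error has to be checked by hand.
func sum(a, b string) (int, error) {
	x, err := strconv.Atoi(a)
	if err != nil {
		return 0, err
	}
	y, err := strconv.Atoi(b)
	if err != nil {
		return 0, err
	}
	return x + y, nil
}

func main() {
	total, err := sum("2", "40")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(total)

	// The failing case takes the same shape: another explicit check.
	if _, err := sum("2", "forty"); err != nil {
		fmt.Println("error:", err)
	}
}
```

With pattern matching on the result you would write the success and failure cases once, side by side; here the shape of the check is repeated at every call site.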

Go also doesn't push you to using actors - you have to re-implement all that yourself if you want to. In a recent project, I started off with some locks and within about three days found it utterly impossible to reason about which go-routines can hold which locks and whether or not there's any deadlock potential. Inevitably, there was. So I ripped all the locking code out and wrote my own actor loops.

This was quite interesting as here I could now be more flexible than Erlang. I think most people think that actors mean "only one thread/process/routine can read and write its state" - there is a key concept of owning that state and regulating access to it. However, what I found is that I actually only needed to limit modifying the state to a single go-routine: each actor-receive-loop routine would take a write lock on its state whenever it needs to modify its own state, but it's perfectly reasonable to have anyone read the state, provided they take a read lock before doing so. The fact we can share pointers in Go makes this possible, whereas it's impossible to do this in Erlang (well, not quite - if you use ets then you can do it, which is exactly what we do in RabbitMQ in the rabbit_msg_store - but it's certainly not pretty!). So now we can have concurrent reads and no need to pass read-requests over a channel/mailbox. This seems pretty nice to me.
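A cut-down sketch of the pattern, not the code from the project: `Account`, `Deposit`, `Balance` and `Stop` are hypothetical names. The receive loop is the only goroutine that ever takes the write lock; anyone else may read directly under a read lock without sending a message.

```go
package main

import (
	"fmt"
	"sync"
)

// Account is an actor whose state may be read by anyone holding the read
// lock, but is only ever modified by its own receive loop.
type Account struct {
	mu      sync.RWMutex
	balance int
	mailbox chan int
	done    chan struct{}
}

func NewAccount() *Account {
	a := &Account{mailbox: make(chan int), done: make(chan struct{})}
	go a.loop()
	return a
}

// loop is the single writer: it takes the write lock for each update.
func (a *Account) loop() {
	for delta := range a.mailbox {
		a.mu.Lock()
		a.balance += delta
		a.mu.Unlock()
	}
	close(a.done)
}

// Deposit sends a message; it never touches the state directly.
func (a *Account) Deposit(n int) { a.mailbox <- n }

// Balance reads the state directly under a read lock, with no message
// round trip through the mailbox.
func (a *Account) Balance() int {
	a.mu.RLock()
	defer a.mu.RUnlock()
	return a.balance
}

// Stop closes the mailbox and waits for the loop to finish its last update.
func (a *Account) Stop() {
	close(a.mailbox)
	<-a.done
}

func main() {
	acct := NewAccount()
	for i := 1; i <= 3; i++ {
		acct.Deposit(i)
	}
	acct.Stop()
	fmt.Println(acct.Balance())
}
```

One thing to be careful of: `Deposit` returning only means the loop has received the message, not that the update has been applied, so a `Balance` call immediately afterwards may still see the old value. The sketch waits in `Stop` before reading for exactly that reason.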

Recently I was reading a paper and it suggested that:

In message passing systems, processes interact exclusively by sending and receiving messages and they do not have access to shared memory.

Firstly, on a very technical note, they do have access to shared memory - the mailbox or queue is exactly that. The key reason why it leads to more composable systems is that when you hold the lock to write into a mailbox, you can never do anything other than write into that mailbox - you can never try to acquire multiple locks, so you can't deadlock in this way. And that's even assuming you're using locks for mailboxes - queues make lovely structures for lock-free concurrent access.
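
As a rough illustration of the lock-based case (the mailbox type and the fill helper are made up for this sketch), here is a mailbox that really is shared memory guarded by a lock. The point is that while a writer holds the mailbox lock, the only thing it does is append to the queue; it never goes on to acquire a second lock.

```go
package main

import (
	"fmt"
	"sync"
)

// mailbox is a hypothetical mutex-guarded queue shared by all writers.
type mailbox struct {
	mu    sync.Mutex
	queue []string
}

// Put appends a message. No other lock is ever acquired while mu is
// held, so writers cannot deadlock against one another here.
func (m *mailbox) Put(msg string) {
	m.mu.Lock()
	m.queue = append(m.queue, msg)
	m.mu.Unlock()
}

// TakeAll removes and returns everything currently queued.
func (m *mailbox) TakeAll() []string {
	m.mu.Lock()
	defer m.mu.Unlock()
	msgs := m.queue
	m.queue = nil
	return msgs
}

// fill starts the given number of writers, each sending perWriter
// messages, and waits for them all to finish.
func fill(m *mailbox, writers, perWriter int) {
	var wg sync.WaitGroup
	for w := 0; w < writers; w++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for i := 0; i < perWriter; i++ {
				m.Put(fmt.Sprintf("writer %d, message %d", id, i))
			}
		}(w)
	}
	wg.Wait()
}

func main() {
	m := &mailbox{}
	fill(m, 8, 100)
	fmt.Println(len(m.TakeAll())) // 800: nothing lost, nothing deadlocked
}
```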

Secondly, as I suggest above, it appears to be safe to allow multiple concurrent readers of an actor's state, provided modifications to the state are done atomically by the actor thread - though more care has to be taken now to ensure updates are consistent - you have to make sure you update all the state you need to change in one go under a write lock (the sort of transactional semantics you end up needing to ensure strongly reminds me of STM). Whilst I would probably still call such a system a "message passing system" I can certainly imagine others would disagree and at a minimum it's some sort of hybrid (you could argue that the side effect of releasing the write lock when you've finished modifying the state is to publish an immutable copy of the state to any and all subscribers that want it - except without all that overhead. When viewed in these terms, it makes more intuitive sense that it's safe - provided of course that you don't do anything blocking whilst you're holding a state read-lock). This design also seems to get a fair bit trickier once you get to distributed systems and the need to have proxy objects representing the state and interface of a remote actor. By comparison, in Erlang a reference to an Actor is an immutable process identifier or Pid which is easy to send around and reason about.
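
A small sketch of what "update all the state in one go" means in practice. The ledger type, its invariant and the run helper are all invented for illustration. The writer (standing in for the actor loop) changes two fields under a single write lock, so a reader holding the read lock can never see the state half way through an update.

```go
package main

import (
	"fmt"
	"sync"
)

// ledger is a hypothetical actor state with an invariant: a+b == 100.
type ledger struct {
	mu   sync.RWMutex
	a, b int
}

// move changes both fields under one write lock, so no reader can
// observe the state between the two assignments.
func (l *ledger) move(n int) {
	l.mu.Lock()
	l.a -= n
	l.b += n
	l.mu.Unlock()
}

// sum reads both fields under one read lock and does nothing blocking
// whilst holding it.
func (l *ledger) sum() int {
	l.mu.RLock()
	defer l.mu.RUnlock()
	return l.a + l.b
}

// run starts one writer and several readers, and reports whether every
// read saw the invariant intact.
func run(readers, moves int) bool {
	l := &ledger{a: 100}
	results := make([]bool, readers)
	var wg sync.WaitGroup
	wg.Add(1)
	go func() { // the single writer
		defer wg.Done()
		for i := 0; i < moves; i++ {
			l.move(1)
		}
	}()
	for r := 0; r < readers; r++ {
		wg.Add(1)
		go func(slot int) { // each reader records into its own slot
			defer wg.Done()
			ok := true
			for i := 0; i < moves; i++ {
				if l.sum() != 100 {
					ok = false
				}
			}
			results[slot] = ok
		}(r)
	}
	wg.Wait()
	for _, ok := range results {
		if !ok {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(run(4, 1000)) // true: readers only ever see complete updates
}
```

If move took the lock separately around each assignment, readers could observe a sum of 99, which is exactly the inconsistency the single write lock rules out.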

But mainly I was thinking about the pattern of data flow: a mailbox allows multiple writers to send data to a single reader (a gather operation, maybe). The actor loop allows the exact opposite: a single reader of the mailbox can then affect multiple things (a scatter) - either by sending out messages to many other actors (in essence, a push action), or by (as I suggest above) modifying state which can be concurrently read by many other actors (correspondingly, a pull action). In my mind's eye, I see a sort of concertina effect as all these messages are pushed into a mailbox, and then slowly the effects of each message spread out again. In some ways it seems slightly odd how powerful this is, but in other ways it makes perfect sense: if you consider a finite state machine then your mailbox is just the stream of events coming into it and you have your little automaton updating the state with some operation combining the current state with the current message. It is the very fact that the next state is dependent on the current state and the current message that requires mutual exclusion around modifying the state. And of course it is ensuring that that mutual exclusion lock is (implicitly) held in the absence of any other locks that makes actor systems so much easier to reason about and understand - any deadlocks that occur are at the protocol level and, if you model your protocols between actors properly, can be determined statically (though I'm not aware that anyone actually does this - false positives may abound).
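
The finite state machine view can be sketched in a few lines. Everything here (the door states, the event names, step and run) is a made-up example rather than anything from a real system: the mailbox is the stream of events, and the loop folds each event into the current state.

```go
package main

import "fmt"

// state and event are hypothetical types for a two-state door.
type state int

const (
	closed state = iota
	open
)

type event string

// step is the transition function: the next state depends only on the
// current state and the current message.
func step(s state, e event) state {
	switch {
	case s == closed && e == "open":
		return open
	case s == open && e == "close":
		return closed
	}
	return s // any other combination leaves the state unchanged
}

// run drains the mailbox, folding each event into the state in order.
func run(mailbox <-chan event, initial state) state {
	s := initial
	for e := range mailbox {
		s = step(s, e)
	}
	return s
}

func main() {
	mailbox := make(chan event, 4)
	for _, e := range []event{"open", "open", "close", "open"} {
		mailbox <- e
	}
	close(mailbox)
	fmt.Println(run(mailbox, closed) == open) // true
}
```

Because step needs the current state to compute the next one, two go-routines running it at once on the same state would race, which is why the loop processes one message at a time.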

This then made me think about how, once all actors have done their initialisation and reached the core actor loop, the entire system is purely event driven. When looked at like this, are we really sure actors are enough? Are there not other forms of expression that capture the relation between events as inputs, with state, and an output more cleanly? In particular I'm thinking of things like Join calculus and Functional Reactive Programming. Given that actors are apparently becoming rather more mainstream these days, I wonder if that really means they're only part of the solution: sure I can write large distributed systems that scale, perform well, don't deadlock or livelock and are exceedingly robust. But can I write them with less code and cleaner semantics?

posted at: 15:02 | path: / | permanent link to this entry

Welcome

Yet again, a new blog. This one is much simpler than the last. Whilst one always hopes, don't expect the frequency of posts to be greater than any of the previous blogs here... The styling is currently non-existent - seriously, there is no CSS right now. Sorry.

Gone is Serendipity, and PyBlosxom is in its place. I'm currently trialling using muse to write posts. Not entirely sure it's worth it, but I need to get my emacs foo improved so working with a new major mode is likely a good idea.

posted at: 13:32 | path: / | permanent link to this entry