RPC is surprisingly expensive
Once, long ago, I did a distributed computing project course where I tried to speed up a drawing program by moving its expensive floating point calculations (I believe it needed to calculate square roots for something) to a machine with a much faster CPU and floating point. We used Sun's RPC/XDR for this because it was both a great match for what we were doing and the obvious choice at the time, among other reasons.
Rather to our surprise the drawing program just didn't get much faster, even in the most unfavorable situation we could contrive (running on a Sun 3/50 without a floating point unit and sending the calculations to a much faster Sun 3 with a FPU). At the time the project sort of fizzled out amidst head-scratching, but now I can see that what I should have done (and what would have made an interesting report) was to dive into figuring out why it wasn't going much faster.
(I am pretty sure that if I could rewind time I would find the finger of blame pointed squarely at the overhead of RPC/XDR marshalling and demarshalling, although other interesting things might have also come up.)
However, ever since then I have had it firmly imprinted on the back of head that RPC can be surprisingly slow, and thus can be a source of hidden overhead in systems. Since all sorts of things involve 'RPC' in some form, even as general synchronous query/response message passing, this can very useful to remember.
For example, and what brought this to mind, consider the issue of memcached versus caching in SQL servers. Since normal SQL queries are already pretty time consuming, the wire format used to talk to the SQL server is probably more designed for things like system independence than high-speed, low-overhead marshalling. By contrast, with memcached you can store data blobs in a format that is as close to your memory layout as possible, and thus get demarshalling overhead down very low.