Things that limit the performance of hardware acceleration

December 26, 2009

Suppose that you have an infinitely fast hardware accelerator, one that can compute something of interest in no time at all. What external issues limit the total performance advantage that you can get by putting this hardware accelerator in a system?

I can think of the following limiters:

  • main memory speed: the latency and bandwidth of system RAM limit how fast you can read from and write to system memory.

  • the speed limits of any underlying hardware that you talk to. For example, hardware RAID cannot go faster (over the long term) than the speed of the underlying disks, and anything that talks to a network is limited by the network's latency and bandwidth constraints.

  • the setup and transaction costs for passing commands and data between you and the CPU. For instance, how many PCI reads and writes does it take to tell your hardware accelerator to do something, or to determine its status?

    (When thinking about this, it's important to also consider the speed impacts of any necessary memory barriers.)

  • some sorts of interrupts, and in general any need for CPU involvement and decisions in your actions. Having to wait for CPU involvement is effectively a pipeline stall in your processing, with all of the costs you'd expect from that.

    (Interrupts are not necessarily a performance limit by themselves, since they may just be a notification to the CPU that it can pay attention to you. They generally will incur transaction costs, though.)
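
To get a feel for how these limiters combine, here is a rough back-of-the-envelope model in Python. It is only a sketch: the `Limits` fields, the function names, and the assumption that data streams at the rate of the slowest link are all mine, not measurements of any real hardware.

```python
from dataclasses import dataclass


@dataclass
class Limits:
    """Hypothetical per-operation costs (seconds, and bytes per second)."""
    setup_s: float               # bus reads/writes to issue a command and check status
    mem_latency_s: float         # latency of reaching system RAM
    mem_bandwidth_bps: float     # bandwidth to and from system RAM
    device_bandwidth_bps: float  # bandwidth of the underlying disks or network
    cpu_wait_s: float            # time stalled waiting on CPU involvement


def accelerated_time(nbytes: int, lim: Limits) -> float:
    """Time for one operation when the computation itself takes no time."""
    # Assumption: data is streamed, so the slowest link sets the transfer rate.
    rate = min(lim.mem_bandwidth_bps, lim.device_bandwidth_bps)
    return lim.setup_s + lim.mem_latency_s + nbytes / rate + lim.cpu_wait_s


def best_speedup(cpu_time_s: float, nbytes: int, lim: Limits) -> float:
    """Upper bound on the speedup over doing the work on the CPU."""
    return cpu_time_s / accelerated_time(nbytes, lim)


if __name__ == "__main__":
    # Made-up numbers, purely for illustration.
    lim = Limits(setup_s=2e-6, mem_latency_s=1e-7,
                 mem_bandwidth_bps=10e9, device_bandwidth_bps=1e9,
                 cpu_wait_s=5e-6)
    print(best_speedup(cpu_time_s=1e-4, nbytes=64 * 1024, lim=lim))
```

Even with zero compute time, the speedup in this model is capped by the fixed costs for small operations and by the slowest link for large ones.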

My impression is that a lot of the increasing sophistication of hardware in general has been driven by reducing the transaction costs of operations, starting with DMA and moving upwards from there. There once was a day when the OS poked a bunch of control registers for each operation; these days, the OS writes all of that information to control blocks in memory, then pokes the hardware once to point it at the control blocks.
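
The difference between the two styles can be illustrated with a toy simulation. This is a sketch under stated assumptions, not how any particular device works: `FakeDevice`, the register names, and the choice of four register writes per operation are hypothetical, and the Python list stands in for control blocks that a real device would fetch from memory itself.

```python
class FakeDevice:
    """Hypothetical accelerator that counts CPU-initiated bus writes."""

    def __init__(self):
        self.bus_writes = 0
        self.regs = {}
        self.completed = []

    def write_register(self, name, value):
        self.bus_writes += 1
        self.regs[name] = value
        if name == "go":
            # Old style: the operation is described entirely by registers.
            self.completed.append(
                (self.regs["src"], self.regs["dst"], self.regs["length"]))
        elif name == "doorbell":
            # New style: the value points at control blocks in memory,
            # which the device reads on its own.
            self.completed.extend(value)


def submit_by_registers(dev, ops):
    """Poke a set of control registers for every single operation."""
    for src, dst, length in ops:
        dev.write_register("src", src)
        dev.write_register("dst", dst)
        dev.write_register("length", length)
        dev.write_register("go", 1)


def submit_by_control_blocks(dev, ops):
    """Write control blocks to memory, then poke the hardware once."""
    control_blocks = list(ops)  # ordinary memory writes, no bus transactions
    dev.write_register("doorbell", control_blocks)


if __name__ == "__main__":
    ops = [(0, 100, 16), (16, 116, 16), (32, 132, 16)]
    old, new = FakeDevice(), FakeDevice()
    submit_by_registers(old, ops)
    submit_by_control_blocks(new, ops)
    print(old.bus_writes, new.bus_writes)
```

In this toy model the register style costs four bus writes per operation while the control block style costs one bus write per batch. The real trade is not free, since the device now has to spend memory bandwidth fetching the control blocks.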

Written on 26 December 2009.

Last modified: Sat Dec 26 00:51:58 2009