Things that limit the performance of hardware acceleration
Suppose that you have an infinitely fast hardware accelerator, one that can compute something of interest in no time at all. What external issues limit the total performance advantage that you can get by putting this hardware accelerator in a system?
I can think of the following limiters:
- main memory speed limits, ie the latency and bandwidth of
  system RAM. These bound how fast the accelerator can pull in its
  input and push out its results.
- the speed limits of any underlying hardware that you're talking
  to. For example, hardware RAID cannot go faster (over the
  long term) than the speed of the underlying disks, and anything
  that talks to a network is limited by the network's latency and
  bandwidth constraints.
- the setup and transaction costs of passing commands and data
  between the CPU and the accelerator. For instance, how many PCI
  reads and writes does it take to tell your hardware accelerator
  to do something, or to find out its status?
  (When thinking about this, it's important to also consider the speed impact of any necessary memory barriers.)
- some sorts of interrupts and, in general, any need for CPU
  involvement and decisions in the middle of an operation. Having
  to wait for the CPU is effectively a pipeline stall in your
  processing, with all of what you'd expect from that.
  (Interrupts are not necessarily a performance limit by themselves, since they may just notify the CPU that it can pay attention to you. They generally incur transaction costs, though.)
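To put a rough number on the first limit: even with an infinitely fast accelerator, an operation cannot finish before its input and output have crossed the memory bus. A back-of-envelope sketch (the bandwidth figure here is illustrative, not a measurement of any particular system):

```python
def min_transfer_time_us(bytes_moved: int, bandwidth_gb_s: float) -> float:
    """Lower bound on wall-clock time for an infinitely fast accelerator:
    the data still has to cross the memory bus, at the bus's speed."""
    return bytes_moved / (bandwidth_gb_s * 1e9) * 1e6

# Example: 1 MiB of input plus 1 MiB of output over an assumed
# 25 GB/s memory channel (a plausible single-channel DDR4 figure):
floor_us = min_transfer_time_us(2 * 1024 * 1024, 25.0)  # ~84 microseconds
```

No amount of accelerator speed gets you below that floor; only faster or wider memory does.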
My impression is that a lot of the increasing sophistication of hardware in general has been driven by reducing the transaction costs of operations, starting with DMA and moving upwards from there. There once was a day when the OS poked a bunch of control registers for each operation; these days, the OS writes all of that information to control blocks in memory, then pokes the hardware once to point it at the control blocks.
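The difference the control-block approach makes can be sketched with a toy cost model. All of the per-operation costs below are assumptions picked for illustration (real MMIO and memory costs vary widely by platform), but the shape of the comparison is the point:

```python
# Assumed costs, in nanoseconds, for a hypothetical system:
MMIO_WRITE_NS = 250   # one posted PCI/PCIe write to a device register
MMIO_READ_NS = 600    # one PCI/PCIe read (a full bus round trip)
MEM_WRITE_NS = 5      # one write to ordinary cached system memory

def register_poke_cost_ns(n_fields: int) -> int:
    """Old style: one MMIO write per command field, plus a status read."""
    return n_fields * MMIO_WRITE_NS + MMIO_READ_NS

def control_block_cost_ns(n_fields: int) -> int:
    """New style: the fields go into a control block in ordinary memory,
    and a single MMIO 'doorbell' write points the device at it."""
    return n_fields * MEM_WRITE_NS + MMIO_WRITE_NS

# For a command with 8 fields:
old = register_poke_cost_ns(8)   # 8 bus writes plus a bus read
new = control_block_cost_ns(8)   # 8 cheap memory writes plus one bus write
```

The exact numbers don't matter; what matters is that the per-field costs move off the (expensive) bus and into (cheap) ordinary memory, leaving a single doorbell write per command on the bus.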