No C programmer can call himself (or herself) a real programmer without having seen (and been amazed by) Duff’s Device. It was invented as a more efficient alternative to regular loop unrolling used in copying bytes from memory to a register (each byte went to the same address in memory), by handling the case when there are “leftover” bytes after a full unroll.
The original code looked something like:
do {
*to = *from++;
} while (--count > 0);
In Duff’s application (real time animation), this was too slow. The standard solution would be to unroll the loop, to minimize the number of jumps:
register n = (count / 8)
do {
*to = *from++;
*to = *from++;
*to = *from++;
*to = *from++;
*to = *from++;
*to = *from++;
*to = *from++;
*to = *from++;
} while (--n > 0);
but then one is left with somewhere between [0,8) bytes remaining, which have to be copied manually. By playing with the number of unrolled statements, one can balance minimum loops vs loops at the end, but there is still the need for some special handling.
Duff’s device interleaves a switch / case
statement in with the do / while
loop. This, coupled with C’s default fall through behavior (case
s “fall through” into the next case if not aborted with a break
statement), allows the following code to work:
register n = (count + 7) / 8;
switch (count % 8) {
case 0: do { *to = *from++;
case 7: *to = *from++;
case 6: *to = *from++;
case 5: *to = *from++;
case 4: *to = *from++;
case 3: *to = *from++;
case 2: *to = *from++;
case 1: *to = *from++;
} while (--n > 0);
}
This code starts by handling the special case (the remainder), and then performs as many batches of 8 as are necessary. Note that the switch
is only used to trigger count % 8
copies; after that, the while loop takes over (the beauty is that it knows to jump back to the do {
statement even though it may not have started there!), and copies 8 bytes at a time.
Note that this is NOT designed to replace memcpy
, but rather to provide an efficient way to copy many bytes into one.