Serial number arithmetic
Many protocols and algorithms require the serialization or enumeration of related entities. For example, a communication protocol must know whether some packet comes "before" or "after" some other packet. The IETF (Internet Engineering Task Force) RFC 1982 attempts to define "Serial Number Arithmetic" for the purposes of manipulating and comparing these sequence numbers.
This task is rather more complex than it might first appear, because most algorithms use fixed size (binary) representations for sequence numbers. It is often important for the algorithm not to "break down" when the numbers become so large that they are incremented one last time and "wrap" around their maximum numeric ranges (go instantly from a large positive number to 0, or a large negative number). Unfortunately, some protocols choose to ignore these issues, and simply use very large integers for their counters, in the hope that the program will be replaced (or they will retire), before the problem occurs (see Y2K).
Many communication protocols apply serial number arithmetic to packet sequence numbers in their implementation of a sliding window protocol. Some versions of TCP use protection against wrapped sequence numbers (PAWS). PAWS applies the same serial number arithmetic to packet timestamps, using the timestamp as an extension of the high-order bits of the sequence number.[1]
Operations on sequence numbers
Only addition of a small positive integer to a sequence number, and comparison of two sequence numbers are discussed. Only unsigned binary implementations are discussed, with an arbitrary size in bits noted throughout the RFC (and below) as "SERIAL_BITS".
Addition
Adding an integer to a sequence number is simple unsigned integer addition, followed by an unsigned modulo operation to bring the result back into range (on most architectures this is implicit in the unsigned addition).
s' = (s + n) modulo (2 ^ SERIAL_BITS)
Addition of a value outside the range
[0 .. (2 ^(SERIAL_BITS - 1) - 1)]
is undefined. Basically, adding values beyond this range will cause the resultant sequence number to "wrap", and (often) result in a number that is considered "less than" the original sequence number!
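As an illustration of the addition rule, here is a minimal C sketch. It assumes SERIAL_BITS is 32 so that a serial number fits a uint32_t exactly; the helper name serial_add and the constant SERIAL_MAX_ADD are hypothetical and do not come from the RFC. Rejecting an out-of-range increment with an assertion is one possible policy, since the RFC only says the result is undefined.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 32-bit serial numbers, stored in a uint32_t. */
#define SERIAL_BITS 32
/* Largest increment with a defined result: 2^(SERIAL_BITS - 1) - 1. */
#define SERIAL_MAX_ADD ((UINT32_C(1) << (SERIAL_BITS - 1)) - 1)

/* Hypothetical helper: returns (s + n) modulo 2^SERIAL_BITS. */
static uint32_t serial_add(uint32_t s, uint32_t n)
{
    /* Larger increments are undefined by the RFC; this sketch rejects them. */
    assert(n <= SERIAL_MAX_ADD);
    /* Unsigned arithmetic in C wraps modulo 2^32, so no explicit modulo is needed. */
    return s + n;
}
```

With these definitions, serial_add(0xFFFFFFFF, 1) wraps around to 0.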
Comparison
A means of comparing two sequence numbers i1 and i2 (the unsigned integer representations of sequence numbers s1 and s2) is presented.
Equality is defined as simple numeric equality. The algorithm presented for comparison is very complex, having to take into account whether the first sequence number is close to the "end" of its range of values, and thus a smaller "wrapped" number may actually be considered "greater" than the first sequence number. Thus i1 is considered less than i2, only if:
(i1 < i2 and i2 - i1 < 2^(SERIAL_BITS - 1)) or (i1 > i2 and i1 - i2 > 2^(SERIAL_BITS - 1))
Likewise, i1 is considered greater than i2, only if:
(i1 < i2 and i2 - i1 > 2^(SERIAL_BITS - 1)) or (i1 > i2 and i1 - i2 < 2^(SERIAL_BITS - 1))
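The two conditions above can be transcribed almost literally into C. The following is a sketch under the assumption that SERIAL_BITS is 32; the enum and the function name serial_compare are hypothetical names chosen for illustration. Any pair that satisfies neither condition and is not equal is reported as undefined, which happens only when the two values are exactly 2^(SERIAL_BITS - 1) apart.

```c
#include <stdint.h>

/* Assumption: 32-bit serial numbers. */
#define SERIAL_BITS 32
#define SERIAL_HALF (UINT32_C(1) << (SERIAL_BITS - 1))  /* 2^(SERIAL_BITS - 1) */

/* Hypothetical result type for the comparison. */
enum serial_order { SERIAL_LT, SERIAL_EQ, SERIAL_GT, SERIAL_UNDEFINED };

/* Hypothetical helper: compares i1 with i2 using the two conditions above. */
static enum serial_order serial_compare(uint32_t i1, uint32_t i2)
{
    if (i1 == i2)
        return SERIAL_EQ;
    if ((i1 < i2 && i2 - i1 < SERIAL_HALF) ||
        (i1 > i2 && i1 - i2 > SERIAL_HALF))
        return SERIAL_LT;
    if ((i1 < i2 && i2 - i1 > SERIAL_HALF) ||
        (i1 > i2 && i1 - i2 < SERIAL_HALF))
        return SERIAL_GT;
    /* Neither condition holds: the values are exactly 2^31 apart. */
    return SERIAL_UNDEFINED;
}
```

For example, serial_compare(0xFFFFFFFF, 0) yields SERIAL_LT because 0 is the wrapped successor of 0xFFFFFFFF, while serial_compare(0, 0x80000000) yields SERIAL_UNDEFINED.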
Shortfalls
The algorithms presented by the RFC have at least one significant shortcoming: there are sequence numbers for which comparison is undefined. Since many algorithms are implemented independently by multiple cooperating parties, it is often impossible to prevent all such situations from occurring.
The authors of RFC 1982 simply put:
While it would be possible to define the test in such a way that the inequality would not have this surprising property, while being defined for all pairs of values, such a definition would be unnecessarily burdensome to implement, and difficult to understand, and would still allow cases where s1 < s2 and (s1 + 1) > (s2 + 1) which is just as non-intuitive. Thus the problem case is left undefined, implementations are free to return either result, or to flag an error, and users must take care not to depend on any particular outcome. Usually this will mean avoiding allowing those particular pairs of numbers to co-exist.
Thus, it is often difficult or impossible to avoid all "undefined" comparisons of sequence numbers. However, a relatively simple solution is available. By mapping the unsigned sequence numbers onto signed two's complement arithmetic operations, every comparison of any sequence number is defined, and the comparison operation itself is dramatically simplified. All comparisons specified by the RFC retain their original truth values; only the formerly "undefined" comparisons are affected.
General solution
The RFC 1982 algorithm specifies that, for N-bit sequence numbers, there are 2^(N−1) − 1 values considered "greater than", and 2^(N−1) − 1 considered "less than". Comparison against the remaining value (exactly 2^(N−1) distant) is deemed to be "undefined".
Most modern hardware implements signed two's complement binary arithmetic operations. These operations are fully defined for the entire range of values for any operands they are given. Since any N-bit binary number can contain 2^N distinct values, and since one of them is taken up by the value 0, there is an odd number of spots left for all the non-zero positive and negative numbers. There is simply one more negative number representable than there are positive. For example, a 16-bit 2's complement value may contain numbers ranging from −32768 to +32767.
So, if we simply re-cast sequence numbers as 2's complement integers, and allow there to be one more sequence number considered "less than" than there are sequence numbers considered "greater than", we should be able to use simple signed arithmetic comparisons instead of the logically incomplete formula proposed by the RFC.
Here are some examples (in 16 bits, again), comparing some random sequence numbers, against the sequence number with the value 0.
unsigned    binary    signed
sequence    value     distance
--------    ------    --------
   32767 == 0x7fff ==   32767
       1 == 0x0001 ==       1
       0 == 0x0000 ==       0
   65535 == 0xffff ==      −1
   65534 == 0xfffe ==      −2
   32768 == 0x8000 ==  −32768
It is easy to see that the signed interpretations of the sequence numbers are in the correct order, so long as we "rotate" the sequence number in question so that its 0 matches up with the sequence number we are comparing it against. It turns out that this is simply done, using an unsigned subtraction and interpreting the result as a signed two's complement number. The result is the signed "distance" between the two sequence numbers. Once again, if i1 and i2 are the unsigned binary representations of the sequence numbers s1 and s2, the distance from s1 to s2 is:
distance = (signed)( i1 - i2 )
If distance is 0, the numbers are equal. If it is < 0, then s1 is "less than" or "before" s2; if it is > 0, then s1 is "greater than" or "after" s2. The test is simple, efficient, and defined for every pair of sequence numbers, but it is not without surprises.
All sequence number arithmetic must deal with "wrapping" of sequence numbers; in RFC 1982 terms, a sequence number exactly 2^(N−1) away is equidistant in both directions. In our math, two such numbers are both considered to be "less than" each other:
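The signed-distance test can be sketched in C as follows, assuming 32-bit serial numbers. The function names serial_distance and serial_before are hypothetical. One caveat: converting an out-of-range unsigned value to int32_t is implementation-defined in standard C; the sketch assumes the usual behavior on two's complement machines, where the bit pattern is simply reinterpreted.

```c
#include <stdint.h>

/* Hypothetical helper: signed distance between two 32-bit serial numbers.
 * Negative means i1 is "before" i2; positive means i1 is "after" i2. */
static int32_t serial_distance(uint32_t i1, uint32_t i2)
{
    /* The unsigned subtraction wraps modulo 2^32. The cast then reinterprets
     * the result as a signed value (assumed two's complement behavior). */
    return (int32_t)(i1 - i2);
}

/* Hypothetical helper: nonzero if s1 is "less than" (before) s2. */
static int serial_before(uint32_t s1, uint32_t s2)
{
    return serial_distance(s1, s2) < 0;
}
```

For instance, serial_distance(0xFFFFFFFF, 0) is −1, so the value just before the wrap is correctly treated as preceding 0.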
distance1 = (signed)(0x8000 - 0x0) == (signed)0x8000 == -32768 < 0
distance2 = (signed)(0x0 - 0x8000) == (signed)0x8000 == -32768 < 0
This is obviously true for any two sequence numbers with distance of 0x8000 between them.
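The 16-bit case can be reproduced with a short C sketch. The function name serial16_distance is hypothetical. Note one C-specific assumption that is not part of the discussion above: uint16_t operands are promoted to int before the subtraction, so the result has to be truncated back to 16 bits before it is reinterpreted as signed.

```c
#include <stdint.h>

/* Hypothetical helper: signed distance between two 16-bit serial numbers. */
static int16_t serial16_distance(uint16_t i1, uint16_t i2)
{
    /* i1 - i2 is computed as int because of integer promotion; the cast to
     * uint16_t reduces it modulo 2^16, and the cast to int16_t reinterprets
     * the bits as signed (assumed two's complement behavior). */
    return (int16_t)(uint16_t)(i1 - i2);
}
```

Under these assumptions, both serial16_distance(0x8000, 0) and serial16_distance(0, 0x8000) evaluate to −32768, so each number compares as "less than" the other.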
Furthermore, implementing serial number arithmetic using two's complement arithmetic implies serial numbers of a bit-length matching the machine's integer sizes; usually 16-bit, 32-bit and 64-bit. Implementing 20-bit serial numbers needs shifts (assuming 32-bit ints):
distance = (signed)((i1 << 12) - (i2 << 12))
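A fuller C sketch of the shifted comparison follows, assuming 20-bit serial numbers held in the low bits of a uint32_t. The names SERIAL20_BITS, SERIAL20_SHIFT and serial20_distance are hypothetical. Shifting both operands left by 12 makes the wrap at 2^20 coincide with the natural wrap at 2^32; the raw difference is then 2^12 times the true distance, so the sketch divides it back down. The division is exact because the low 12 bits of the difference are always zero.

```c
#include <stdint.h>

/* Assumption: 20-bit serial numbers stored in the low bits of a uint32_t. */
#define SERIAL20_BITS  20
#define SERIAL20_SHIFT (32 - SERIAL20_BITS)  /* 12 */

/* Hypothetical helper: signed distance between two 20-bit serial numbers. */
static int32_t serial20_distance(uint32_t i1, uint32_t i2)
{
    /* Move both values to the top of the 32-bit word, subtract modulo 2^32,
     * and reinterpret as signed (assumed two's complement behavior). */
    int32_t scaled = (int32_t)((i1 << SERIAL20_SHIFT) - (i2 << SERIAL20_SHIFT));
    /* Undo the scaling; exact because scaled is a multiple of 2^12. */
    return scaled / (INT32_C(1) << SERIAL20_SHIFT);
}
```

If only the ordering is needed, the final division can be skipped, since the sign of the scaled difference is the same as the sign of the distance.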