Some dynamically-typed languages use pointer tagging as a quick way to identify or narrow down the runtime type of the value being represented. A classic way to do this is to convert pointers to a suitably sized integer, and add a tag value over the least significant bits which are assumed to be zero for aligned objects. When the object needs to be accessed, the tag bits are masked away, the integer is converted to a pointer, and the pointer is dereferenced as normal.
This by itself is all in order, except it all hinges on one colossal assumption: that the aligned pointer will convert to an integer guaranteed to have zero bits in the right places.
Is it possible to guarantee this according to the letter of the standard?
Although standard section 6.3.2.3 (references are to the C11 draft) says that the result of a conversion from pointer to integer is implementation-defined, what I'm wondering is whether the pointer arithmetic rules in 6.5.2.1 and 6.5.6 effectively constrain the result of pointer->integer conversion to follow the same predictable arithmetic rules that many programs already assume. (6.3.2.3 note 67 seemingly suggests that this is the intended spirit of the standard anyway, not that that means much.)
I'm specifically thinking of the case where one might allocate a large array to act as a heap for the dynamic language, and therefore the pointers we're talking about are to elements of this array. I'm assuming that the start of the C-allocated array itself can be placed at an aligned position by some secondary means (by all means discuss this too though). Say we have an array of eight-byte "cons cells"; can we guarantee that the pointer to any given cell will convert to an integer with the lowest three bits free for a tag?
For instance:
typedef Cell ...; // such that sizeof(Cell) == 8
Cell heap[1024]; // such that ((uintptr_t)&heap[0]) & 7 == 0
((char *)&heap[11]) - ((char *)&heap[10]); // == 8
(Cell *)(((char *)&heap[10]) + 8); // == &heap[11]
&(&heap[10])[0]; // == &heap[10]
0[heap]; // == heap[0]
// So...
&((char *)0)[(uintptr_t)&heap[10]]; // == &heap[10] ?
&((char *)0)[(uintptr_t)&heap[10] + 8]; // == &heap[11] ?
// ...implies?
(Cell *)((uintptr_t)&heap[10] + 8); // == &heap[11] ?
(If I understand correctly, if an implementation provides uintptr_t
then the undefined behaviour hinted at in 6.3.2.3 paragraph 6 is irrelevant, right?)
If all of these hold, then I would assume that it means that you can in fact rely on the low bits of any converted pointer to an element of an aligned Cell
array to be free for tagging. Do they && does it?
(As far as I'm aware this question is hypothetical since the normal assumption holds for common platforms anyway, and if you found one where it didn't, you probably wouldn't want to look to the C standard for guidance rather than the platform docs; but that's beside the point.)
See Question&Answers more detail:
os 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…