2013-09-13

Type punning, aliasing, unions, strict-aliasing, oh my!

Imagine that you have a struct s which sometimes needs to be volatile (because it maps to a set of hardware registers) but other times you'd rather have it non-volatile (because you are working on a bunch of such structs that have been stored, so they are no longer used to interface to the hardware and you could do without the performance penalty of volatile).

That is easy enough: you can have your variables declared volatile or not depending on the situation.

But what happens when you have a function to deal with those structs? Could it be implemented so it not only works, but does the right thing in both volatile and non-volatile structs?

The trivial way to have such a thing is by just defining 2 versions of the function (with different names of course) with the differently qualified parameters; but another possibility is to define a union type in which one member is the plain type and the other is the volatile-qualified type, and make the function parameters use that union type. Standard-wise, it should work… shouldn't it?

This is one kind of thing that made me think of C++... I guess this would be solved mostly automatically there? (by overloading the function definition, or by using generics maybe...)

However, there is still another angle to it. For example, when looking for a memcpy for volatiles, looks like the common answer is that you have to decide what semantics do you exactly expect - and then implement it yourself. Makes sense in fact, because - what do you expect from a memcpy for volatiles?

So, in the same way, what could the compiler do given such a plain/volatile union…? The safest would be to enforce the strictest semantics, meaning the volatiles. Which would mean that the union is after all a red herring; if one wants to avoid the volatile overhead you need a non-volatile function, and accept that the semantics are, have to be, different.

So... it's looking like the way to go to avoid the volatile overhead in the general case of not needing volatile semantics in the function is simply making a local copy of the volatile before calling the non-volatile-params function, and later copying back the results to the volatile original.

But that only makes sense if the volatile value is "small" enough to be cached. What if it is a big structure shared between an interrupt routine and the main loop?

Indeed, my case is that I have grown a small library to deal with circular buffers and queues. Sometimes the struct which contains the buffer and metadata needs to be volatile (when an interrupt generates data for the main loop to process), and sometimes the struct is non-volatile (when a part of a function generates the data for later processing in another part of the same function). So... in this case the method of "make a local copy of the volatile and work with it" isn't practicable: the struct is big, there is not enough time nor memory, and anyway the very idea is abhorrent.

So it seems like the only solution is to duplicate the queue management functions for both use cases, which means duplicate maintenance. One way of avoiding the duplication is using the union parameters, but as reasoned, that will probably only generate bivalent functions with volatile semantics - it will not avoid the volatile pessimizations.

It can be argued that in an AVR, with no caches nor pipelines nor speculative execution, the pessimization can't be too important. And yet... Why does it have to be there at all?

Furthermore, I can't risk the correctness of the program for the sake of optimization. Even if I plainly checked that the compiler does the right thing, I can only assume that the next version of the compiler (or even another build of the program with this same version of the compiler) will keep doing the same thing if it is reasonable according to the standard and the compiler documentation. If not, any result is anecdotic and subject to change. (Gotta love undefined behaviour!)

And C simply doesn't seem to provide the flexibility I am looking for. So the safest, most correct option seems to duplicate the functions. Which smells like a maintenance problem.
Unless I get creative with the preprocessor or some scripting…

So: let's play programming language lawyers! 


Can we legally use a union of volatile and non-volatile? First stop, the C99 standard. After a lot of bumping around trying to make sense of all the subtleties, this is the summary I came up with. First, some definitions are needed. Any standard snippets referenced in the definitions are copied here too.

ALIASING is the action of referring to the same lvalue through different pointers. The aliasing practices before C99 have been described as "wild west", justifying that compiler writers had to be pretty conservative / defensive (for example, any pointer access meant that any memory position could have changed, so any cached variable had to be invalidated) and justifying then that some standardising was needed to enable more aggressive optimizations. C99 added a restriction: the pointers must have types in accordance with 6.5/7.

TYPE PUNNING is having a value stored in a memory location as type A and retrieving it as type B. Can be done in principle by writing and reading differently the same memory location/s (by aliasing pointers, or by using a union), or by copying a bit pattern from one memory location to another (via memcpy) - though the safety of each of those methods have to be assessed independently.

The difference with a cast is that casting can be explicit, and some rules are followed; with type punning, you are just reinterpreting an existing bit pattern in an implicit way.

STRICT-ALIASING is the compiler option that allows it to assume (and, importantly, not even check!) that 2 pointers with non-aliasable types (as in C99's 6.5/7) WILL NOT alias. So, type punning by via of differently typed pointers is almost forbidden; unions are the canonical / sanctioned method for type punning. ("almost forbidden" because there are some exceptions, like pointers to chars being able to alias anything). The advantage is greater optimization possibilities. This is the mandated ISO C99 behaviour, and default GCC flag in -O2, -O3, -Os.

(and, not directly related to the problem at hand, but connected by the aliasing thread: a RESTRICTED pointer is a pointer which is guaranteed to not be aliased by any other, even if having the same type. The only way to reference its pointed-to memory is by deriving from this pointer. Example: a pointer returned by malloc. Interestingly, there's a paper saying that this is too complicated to get right generally and with negligible effect on optimization anyway. UPDATE: that was written on 2004, but GCC 4.5, released on 2010, rewrote the restrict-related part of the optimization engine, so maybe its effect is more noticeable now)

The important points of the C99 standard (actually, these are taken from ISO/IEC 9899:TC3 Committee Draft - September 7, 2007 WG14/N1256), listed in such an order that each builds upon the previous points:

On type compatibility:

6.2.7/1        (on general type compatibility)
Two types have compatible type if their types are the same. [...]

6.7.3/9:     (on qualified types' compatibility)
For two qualified types to be compatible, both shall have the identically qualified version of a compatible type; the order of type qualifiers within a list of specifiers or qualifiers does not affect the specified type.

On the relationship between type of an object and type of the access to that object:

6.5/6        (on the effective type of an accessed object)
The effective type of an object for an access to its stored value is the declared type of the object, if any. If a value is stored into an object having no declared type through an lvalue having a type that is not a character type, then the type of the lvalue becomes the effective type of the object for that access and for subsequent accesses that do not modify the stored value.[...]

6.5/7:        (on the types permissible for accesses to an object)
An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
  •  type compatible with the effective type of the object,
  •  qualified version of a type compatible with the effective type of the object,
  • [...]
On writing a union member and reading another:

6.5.2.3/3    (on writing to a union member and reading a different member)
A postfix expression followed by the . operator and an identifier designates a member of a structure or union object. The value is that of the named member (footnote 82),  and is an lvalue if the first expression is an lvalue. If the first expression has qualified type, the result has the so-qualified version of the type of the designated member.
(footnote 82):If the member used to access the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.

6.2.6.1/6    (on a union's other members' bytes or padding being undefined when writing to one of the union's members)
When a value is stored in an object of structure or union type, including in a member object, the bytes of the object representation that correspond to any padding bytes take unspecified values (footnote 42). The value of a structure or union object is never a trap representation, even though the value of a member of the structure or union object may be a trap representation.
(footnote 42) Thus, for example, structure assignment need not copy any padding bits.
... /7
When a value is stored in a member of an object of union type, the bytes of the object representation that do not correspond to that member but do correspond to other members take unspecified values.


Other people's interpretations on the subject:

http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg01647.html
Linus hates strict-aliasing; so gcc -fno-strict-aliasing is used for the Linux kernel. His posture seems to be "fuck the specs, what I write is clear enough that the compiler should know what I mean".

http://gcc.gnu.org/onlinedocs/gcc-4.7.2/gcc/Optimize-Options.html#Optimize-Options (look for -fstrict-aliasing)
Type punning via unions is explicitly allowed by GCC, even when using -fstrict-aliasing. But no playing with pointers-to-members-of-unions.

http://gcc.gnu.org/onlinedocs/gcc/Structures-unions-enumerations-and-bit_002dfields-implementation.html#Structures-unions-enumerations-and-bit_002dfields-implementation
Further confirmation that reading a union member of a different type than that of the last write is OK for GCC from C90 at least.

http://dbp-consulting.com/tutorials/StrictAliasing.html
the best explanation I have found. Justifies everything on the standards. Accepts unions.

http://mail-index.netbsd.org/tech-kern/2003/08/11/0001.html
Another explanation of aliasing, which mentions unions as only acceptable because they are a GCC extension. Handwavy, no references, but cited by others.

http://cellperformance.beyond3d.com/articles/2006/06/understanding-strict-aliasing.htm
Good explanation, but relies on compiler's output to check for correctness and barely mentions the standard - and doesn't explain why he says that puning via unions is undefined. So, the reasonings are somewhat unreliable - a new compiler version might change everything according to him (and it could!, but should only happen because of bugs; while using the standard as the north star implies convergence to a definite state of compliance, instead of depending on the present build of the compiler)

In summary…


If using C99 at least, casting pointers (and mostly dereferencing them) is suspicious (code smell / antipattern), and type punning should be done rather via unions. Which means that my idea of using union parameters to be bivalent about volatile/non-volatile should be OK, in that it will not cause undefined behavior. But I can't see anything about the semantics of such a use.

Can't do much more by myself. Time to try Stack Overflow…

5 comments

  1. Let's see what Stack Overflow has to say about all of this.

    My first question there!
    And it's looking like the answers are either to use C++ or to get funny with the preprocessor. Urgh and urgh.

    ReplyDelete
  2. While we're at it, GNU's autoconf's manual has some chilling things to say about volatiles. It pretty much amounts to "abandon all hope". And together with papers on volatile compliance problems on compilers (published only 5 years ago, FFS!) and plain comments and discussions in AVR forums (even Atmel people can't be bothered to make it right?? it's pure embedded development!), makes me wonder how on Earth can we trust C at all for correctness in these things without dropping into ASM all the time.
    (...and then you'd have processor errata, of course...)

    Also, I read some guru arguing for the use of pointers-to-volatile as a way to opt-in to volatile semantics when needed, without having to declare volatile the underlying object itself, which would force the volatile semantics everytime. Well, the Autoconf link destroys that notion.

    ReplyDelete
  3. Well, that's interesting. I tried posting a comment at the guru's (Michael Barr) article mentioning how the Autoconf's manual doesn't agree on the pointer-to-volatile thing, ... and the comment first awaited for moderation, then disappeared. I tried posting again and now my comments don't even wait for moderation, but in further tries WordPress notifies me that I "already posted that".

    Just to make sure, I sent him a mail. No answer.

    So, maybe gurus don't like being disputed? Heh. Is it so bad to be wrong in something as muddy as the C standard?

    Funny I guess, given that the guy has posts on how other websites give wrong programming advice...

    ReplyDelete
  4. More chilling reading: as of GCC 4.9 looks like the semantics of an union of volatile and non-volatile objects are not decided. The tone of the conversation is fascinating. Compiler, standards: it's not only the realization that it's not infallible magic. It's rather the realization that it all a big, living, unfinished ... hack.

    At the beginning I didn't want to look at the compiler's output, thinking that the standard would be necessary and sufficient. What a cute idea.

    And now I remember some comments on how the Algol or Java standards were "so complete" in comparison to the C standards. Now I guess I start to understand. Same thing about comments on how LLVM tried to improve over GCC...

    Alas, there seems to be no LLVM for 8 bit processors.

    Well, everything conspires to drive home the point that it's better to stay on the most fundamentally correct and treaded path. So.... volatiles will it be. And preprocessor eventually.

    ReplyDelete
  5. Turns out that pointer-to-volatile as a way to have volatile semantics on-demand is extensively used in the Linux kernel, courtesy of ACCESS_ONCE and friends.

    STILL, that is in fact murky enough that GCC stopped honoring it for a while ("because the Standard allows it"); and it took 2 weeks of discussions, bringing in the likes of Henry Spencer, to argue it back in.

    In summary: the C99 standard does imply that pointer-to-volatile is a valid strategy to force volatile semantics, and GCC does accept it, to the point that once it was removed but re-added as a regression fix.

    Given that the pointer-to-volatile is used in READ_ONCE and friends in the Linux kernel, we can hope that this is an important enough use-case that it will be kept working in current versions of C standards and GCC. (indeed, it looks like the Linux kernel memory model is being taken into account for standardization...)

    ReplyDelete