Union Weirdness
I spent two days this week taking a deeper look at Endian Unions. The two days were productive, with Tom, Anjani, and me realising that there’s more to Endian Unions than meets the eye.
The 1st Day
I spent the majority of the first day investigating what testing needed to be added for Endian Unions. The main strategy for this was to look at the existing testing for Endian Structures, and deciding which ones applied to Endian Unions.
For both structures and unions, the user can specify the packing and create objects with bit fields. In deciding how to test these properties for Big and Little Endian Unions I first had to understand how they behaved for Unions in general. I did this by writing quick tests and scratch files.
And I found something fairly unintuitive.
Here’s some code to demonstrate. Hopefully it’s fairly self-explanatory so that if you are not familiar with structures and unions you can understand what they do:
from ctypes import *
# Create a structure with two c_uint32 fields.
class Struct(Structure):
_fields_ = [("a", c_uint32), ("b", c_uint32)]
# Initialise a null byte array with 8 bytes.
buff = bytearray(8)
# Create a structure so that it points to the byte array.
simple_struct = Struct.from_buffer(buff)
simple_struct.a = 1
simple_struct.b = 2
print("a is {}".format(simple_struct.a)) # Prints "a is 1"
print("b is {}".format(simple_struct.b)) # Prints "b is 2"
# Take a look at how the structure occupies the memory.
print("Buffer: {}".format(buff)) # Prints "Buffer: b'\x01\x00\x00\x00\x02\x00\x00\x00'"
# Create a union with two c_uint32 fields.
class Union_(Union):
_fields_ = [("a", c_uint32), ("b", c_uint32)]
# Initialise a null byte array with 4 bytes. Note that the union requires half
# the memory of the structure as fields "a" and "b" occupy the same bit of memory.
buff = bytearray(4)
# Create a union so that it points to the byte array.
simple_union = Union_.from_buffer(buff)
simple_union.a = 1
simple_union.b = 2
print("a is {}".format(simple_union.a)) # Prints "a is 2"
print("b is {}".format(simple_union.b)) # Prints "b is 2"
print("Buffer: {}".format(buff)) # Prints "Buffer: b'\x02\x00\x00\x00'"
# Create a structure with bitfields. We set "a" to a bitwidth of 24 bits and "b"
# to a bitwidth of 8 bits.
class BitFieldStruct(Structure):
_fields_ = [("a", c_uint32, 24), ("b", c_uint32, 8)]
# Initialise a null byte array with 4 bytes. Note that the bitfield structure
# requires half the memory of a regular structure.
buff = bytearray(4)
bitfield_struct = BitFieldStruct.from_buffer(buff)
bitfield_struct.a = 0xabcdef
bitfield_struct.b = 1
print("a is {}".format(bitfield_struct.a)) # Prints "a is 11259375" (0xabcdef in decimal).
print("b is {}".format(bitfield_struct.b)) # Prints "b is 1".
print("Buffer: {}".format(buff)) # Prints "Buffer: b'\xef\xcd\xab\x01'".
# It is easy to see why only a buffer of 4 bytes is required.
# Create a union with bitfields, analagously to the bitfield structure.
class BitFieldUnion(Union):
_fields_ = [("a", c_uint32, 24), ("b", c_uint32, 8)]
buff = bytearray(4)
bitfield_union = BitFieldUnion.from_buffer(buff)
bitfield_union.a = 0xabcdef
bitfield_union.b = 1
print("a is {}".format(bitfield_union.a)) # Prints "a is 11259375"
# Wait what? I would expect it to print 1 as bitfield_union.b was set last.
print("b is {}".format(bitfield_union.b)) # Prints "b is 1".
# I would expect this to be the same as bitfield_union.a.
print("Buffer: {}".format(buff)) # Prints "Buffer: b'\xef\xcd\xab\x00'".
# When we set bitfield_union.b it is never written to memory!
# To check the behaviour, have the two fields be the same bitfield lengths.
class SanityUnion(Union):
_fields_ = [("a", c_uint32, 32), ("b", c_uint32, 32)]
buff = bytearray(4)
sanity_union= SanityUnion.from_buffer(buff)
sanity_union.a = 1
sanity_union.b = 2
print("a is {}".format(sanity_union.a)) # Prints "a is 2" as we expect.
print("b is {}".format(sanity_union.b)) # Prints "b is 2" as we expect.
print("Buffer: {}".format(buff)) # Prints "Buffer: b'\x02\x00\x00\x00'" as we expect.
Bitfield unions behave a little strangely when fields are different lengths. The two fields no longer return the same value when you access them, although this doesn’t really matter as you would only ever want to retrieve one at any one time. What we should care about is what the memory looks like. There the weird behaviour is that the 1 doesn’t seem to have overwritten 0xabcdef in the memory, even though it was set after the larger number. There’s no mention of this in the Python docs. If anyone has any explanation for this weirdness contact us!
The 2nd Day
I spent the 2nd day writing test cases to test the Big and Little Endian Union implementation I wrote. Much of the day was spent discussing what exactly constituted a Big/Little Endian Union with Tom and Anjani. Further discussion on this can be read in Tom’s post - the concept of “endianness” is subtle for unions with different sized elements.
We concluded that there is still work to do in implementing Endian Unions as currently we do not account for this subtlety.
Looking forward
Tom, Anjani, and I haven’t figured out whether the remaining work exists in the Python or C code. We also still haven’t fully thought about more complex scenarios, i.e. nested unions with different sizes. This likely means that we will have to continue our work on Endian Unions into the New Year, but we definitely have a better grasp on what is left to do.
I enjoyed these 2 days as it allowed me to ‘get stuck in’ and think more about what is required. Not only have I learnt a fair bit about Python, but also lots about C!