Data bus width: what alignment means, how to align data, and why it affects efficiency

Recently, some colleagues and I discussed how data bus width and alignment affect program efficiency on the ARM platform. When defining structure types, following word-length alignment helps system performance. Here is a short write-up of that discussion.

This post aims to explain what alignment is, how it works, and why alignment can significantly impact performance. Let's dive into this topic with an example.

What alignment means, how to align, and the efficiency difference it makes

1. Let’s look at the following example:

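#include <iostream>

#pragma pack(4)     // align members to at most 4 bytes
struct A
{
    char a;
    int b;
};
#pragma pack()      // restore the default packing

#pragma pack(1)     // no padding between members
struct B
{
    char a;
    int b;
};
#pragma pack()

int main()
{
    A a;
    std::cout << sizeof(a) << std::endl;    // 8 under MS VC
    B b;
    std::cout << sizeof(b) << std::endl;    // 5 under MS VC
}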

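Compiler defaults differ: the default packing for VC on 32-bit x86 is 8 bytes (the /Zp8 setting), which still places a 4-byte int on a 4-byte boundary, while ADS uses 1-byte alignment. The example sets the packing explicitly with #pragma pack so the result does not depend on those defaults. Since we're discussing PC platforms, I'll focus on the x86 architecture here.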

If you'd rather not work it out by hand, just compile and run the program. Under MS VC, 4-byte and 1-byte packing give different sizes: 8 versus 5. Why? That is byte alignment at work on x86.

To speed up memory access, many architectures are designed around aligned accesses, and the alignment boundary usually matches the word length. Structure A contains an int (4 bytes), so b is placed at offset 4 and the whole structure is aligned to 4 bytes, which makes its size 8 instead of 5.

Naively, we might expect the structure to take only 5 bytes (char + int). Because of alignment, it takes more: A spends 3 bytes on padding that B does not.
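The layout described above can be checked at compile time. The following is a minimal sketch, assuming a 4-byte int with 4-byte alignment (as on x86) and a compiler that supports #pragma pack with push/pop (MSVC, GCC, and Clang do). The struct names Natural and Packed are made up for this illustration.

```cpp
#include <cstddef>   // offsetof

struct Natural {            // default (natural) alignment
    char a;                 // offset 0, followed by 3 padding bytes
    int  b;                 // offset 4
};

#pragma pack(push, 1)       // pack members on 1-byte boundaries
struct Packed {
    char a;                 // offset 0
    int  b;                 // offset 1, no padding
};
#pragma pack(pop)           // restore the previous packing

// Assumption of this sketch: int is 4 bytes and 4-byte aligned.
static_assert(sizeof(int) == 4 && alignof(int) == 4,
              "sketch assumes a 4-byte, 4-byte-aligned int");

static_assert(offsetof(Natural, b) == 4, "3 padding bytes after a");
static_assert(sizeof(Natural) == 8,      "1 + 3 (padding) + 4");
static_assert(offsetof(Packed, b) == 1,  "b directly follows a");
static_assert(sizeof(Packed) == 5,       "1 + 4, no padding");
static_assert(alignof(Packed) == 1,      "packed struct has no alignment requirement");
```

If any of these assumptions does not hold on a given platform, the build fails with the corresponding message instead of silently producing a different layout.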

So why use alignment at all?

Because alignment is a trade-off between time and space. It saves time by using more space, which is a design choice made by developers.

Why does alignment improve efficiency? That’s the core question many people have.

Assume a 32-bit data bus, as on a classic 32-bit PC.

1. If the data is aligned:

All read/write operations can transfer data in one go without extra overhead.

|1|2|3|4|5|6|7|8|

Assume 'a' starts at position 1, and 'b' starts at position 5. Accessing 'a' would be a single 8-bit read, while accessing 'b' would be a full 32-bit read. Both are efficient.

2. If the data is not aligned:

|1|2|3|4|5|6|7|8|

Now 'a' starts at 1 and 'b' starts at 2, so 'b' occupies positions 2 to 5 and straddles the 4-byte boundary. On architectures like SPARC or MIPS, such an unaligned access causes an error. x86 supports it, but it has to perform multiple reads and assemble the data.

For example, to get the value of 'b', the CPU may have to read the word at positions 1 to 4 and then the word at positions 5 to 8, and combine bytes 2, 3, and 4 of the first with byte 5 of the second into a 32-bit integer. This process is much slower than a single aligned read.
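One way to picture the extra work is to emulate it in software. The sketch below is only a model, not how any particular CPU is implemented: it assumes memory can be fetched only as whole, aligned 32-bit words, and it numbers the bytes inside each word in little-endian order. The function name load_u32 is hypothetical, and the code needs C++14 or later for the constexpr function body.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not a hardware or library API.
// `words` models memory behind a 32-bit bus: only aligned 4-byte words
// can be fetched. Byte k of memory is taken to be bits 8*(k%4) .. 8*(k%4)+7
// of words[k/4] (little-endian numbering).
// For an unaligned offset the caller must guarantee that words[idx + 1] exists.
constexpr std::uint32_t load_u32(const std::uint32_t* words, std::size_t byte_offset)
{
    const std::size_t idx   = byte_offset / 4;   // word holding the first byte
    const unsigned    shift = static_cast<unsigned>(byte_offset % 4) * 8;

    if (shift == 0) {
        return words[idx];                       // aligned: one bus access
    }

    // Unaligned: two bus accesses, then shift and merge the pieces.
    const std::uint32_t low  = words[idx] >> shift;            // tail of the first word
    const std::uint32_t high = words[idx + 1] << (32 - shift); // head of the second word
    return low | high;
}
```

With the two words 0x44332211 and 0x88776655, a load at byte offset 0 returns the first word unchanged, while a load at byte offset 1 has to touch both words and yields 0x55443322.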

The efficiency loss applies to multi-byte accesses; a single-byte access can never be misaligned. Modern development often prioritizes performance, so there are two common approaches to handle alignment:

1. Explicitly insert padding to align the structure:

struct A {
    char a;
    char reserved[3]; // Use space for time
    int b;
};

2. Leave it to the compiler to handle automatically.

Another approach is to group logically related data together.
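The first two approaches normally produce the same layout, and that can be verified at compile time. This is a minimal sketch, assuming a 4-byte int with 4-byte alignment and default compiler packing. The names ManualPad and CompilerPad are made up for the illustration.

```cpp
#include <cstddef>   // offsetof

struct ManualPad {          // approach 1: padding written out by hand
    char a;
    char reserved[3];       // fills offsets 1..3
    int  b;                 // offset 4
};

struct CompilerPad {        // approach 2: padding inserted by the compiler
    char a;
    int  b;
};

// Assumption of this sketch: int is 4 bytes and 4-byte aligned.
static_assert(sizeof(int) == 4 && alignof(int) == 4,
              "sketch assumes a 4-byte, 4-byte-aligned int");

static_assert(offsetof(ManualPad, b) == offsetof(CompilerPad, b),
              "b sits at the same offset in both structs");
static_assert(sizeof(ManualPad) == sizeof(CompilerPad),
              "both structs occupy the same number of bytes");
```

Writing the padding by hand documents the layout in the source, while relying on the compiler keeps the declaration short. The assertions guard against the two drifting apart when the packing settings change.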

Alignment issues can also appear implicitly, such as when casting pointers. For instance:

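unsigned int i = 0x12345678;
unsigned char *p = NULL;
unsigned short *p1 = NULL;

p = (unsigned char *)&i;          // cast needed: &i is an unsigned int *
*p = 0x00;
p1 = (unsigned short *)(p + 1);   // odd address, misaligned for unsigned short
*p1 = 0x0000;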

Here, the code accesses an unsigned short from an odd address, which violates alignment rules. On x86, this affects performance, but on architectures like MIPS or SPARC, it could cause a runtime error.
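A common way to avoid the misaligned pointer is to copy the bytes with std::memcpy, which has no alignment requirement. The sketch below performs the same two stores as the example above. The helper names store_u16 and clear_first_three_bytes are hypothetical, and the code assumes unsigned int is at least 4 bytes wide.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: store a 16-bit value at any byte address
// without ever forming a misaligned unsigned short pointer.
inline void store_u16(unsigned char* dst, std::uint16_t value)
{
    assert(dst != nullptr);
    std::memcpy(dst, &value, sizeof value);   // byte-wise copy, alignment-safe
}

// Same steps as the pointer-cast example: zero byte 0, then zero the
// 16-bit value that starts at the odd address of byte 1.
inline unsigned int clear_first_three_bytes(unsigned int i)
{
    unsigned char* p = reinterpret_cast<unsigned char*>(&i);
    *p = 0x00;               // byte 0
    store_u16(p + 1, 0);     // bytes 1 and 2
    return i;
}
```

Which bytes count as the "first three" depends on byte order: on a little-endian machine such as x86, passing 0x12345678 leaves only the most significant byte intact.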
