Wednesday, 5 May 2010

Windows CE network fix for x86 devices

Hi all,

Today I want to post a very helpful fix that improves Windows CE network stack performance. The fix is not mine; it comes from Anthony van Woerdekom. Thanks a lot, Anthony!

“Earlier this week, I promised to tell you about a tweak I have that dramatically improves the performance of the WinCE network stack. It's a pity that this is not a generic fix: it only works on Intel-compatible x86 systems (i.e. the ones that could run WinXP) that use DMA-based Ethernet controllers.

NDIS has quite a number of functions to support DMA with network adapters. As you may know, on most systems, using DMA means that the software must keep the processor cache coherent with the memory contents by flushing the cache at the right moments. If the software does not want to do this, it is often possible to access the memory used by the DMA buffers through an uncached memory "window".

On Intel-compatible systems (also known as Wintel), the software does not have to bother about maintaining coherency between the processor cache and the DMA buffers, because the hardware takes care of it. This is really a great feature, because it allows you to use cached memory for DMA buffers without ever reading stale data!

The main function NDIS has for allocating DMA buffers is NdisMAllocateSharedMemory. It allows the driver to allocate buffers in cached or uncached memory. You will often see drivers allocate uncached memory because that is much easier to use.

However, even if the driver tries to allocate cached memory, WinCE 5.0 and 6.0 will ALWAYS return uncached memory! Accessing uncached memory from software usually incurs a very large overhead, and in the case of NDIS drivers on Wintel platforms, it will slow down Ethernet throughput by more than a factor of 2.

There may be alternatives, but what I did was change the memory attributes of the memory returned by NdisMAllocateSharedMemory:

NdisMAllocateSharedMemory(
    hAdapterHandle,
    dwRawSize,
    TRUE,           // Ask for cached memory (WinCE will still
                    // return uncached memory)
    &pvRawVAddr,
    &npaRawPAddr
    );

// Check that the allocation succeeded and lies below 4 GB
if (pvRawVAddr == (PVOID)NULL) return FALSE;
if (NdisGetPhysicalAddressHigh(npaRawPAddr) != 0) return FALSE;

// On WinCE, NDIS always allocates this memory as uncached. That is not
// needed on Intel platforms, so simply change the allocated memory to
// cached by setting a protection without PAGE_NOCACHE.
VirtualProtect(pvRawVAddr, dwRawSize, PAGE_READWRITE, &ulProt);
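
The VirtualProtect call above ignores its return value, so a failed call would silently leave the buffer uncached. A minimal sketch of a checked variant follows. The names `MakeRegionCached` and `is_uncached` are hypothetical and not part of the original post, and the sketch assumes that the old protection value reported by VirtualProtect carries PAGE_NOCACHE for memory allocated this way. The flag constants are repeated in the `#else` branch only so that the bit test can be compiled off Windows.

```c
#include <assert.h>
#include <stddef.h>

#if defined(_WIN32) || defined(UNDER_CE)
#include <windows.h>
#else
/* Stand-ins so the flag test also compiles off Windows;
   the values match winnt.h. */
#include <stdint.h>
typedef uint32_t DWORD;
#define PAGE_READWRITE 0x04
#define PAGE_NOCACHE   0x200
#endif

/* Nonzero if a protection value carries the no-cache modifier. */
static int is_uncached(DWORD prot)
{
    return (prot & PAGE_NOCACHE) != 0;
}

#if defined(_WIN32) || defined(UNDER_CE)
/* Hypothetical wrapper: switch a region to plain read/write (cached)
   and report whether it was marked uncached before the change. */
static BOOL MakeRegionCached(PVOID addr, DWORD size, BOOL *wasUncached)
{
    DWORD oldProt = 0;

    assert(addr != NULL && size != 0 && wasUncached != NULL);
    if (!VirtualProtect(addr, size, PAGE_READWRITE, &oldProt))
        return FALSE;   /* attributes unchanged, buffer stays uncached */
    *wasUncached = is_uncached(oldProt) ? TRUE : FALSE;
    return TRUE;
}
#endif
```

A driver could call `MakeRegionCached(pvRawVAddr, dwRawSize, &wasUncached)` in place of the bare VirtualProtect call and log a warning when it returns FALSE, since the driver still works in that case, only slower.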


For me, the speed improvement was dramatic! I also did the following:
  • Removed all calls to NdisFlushBuffer, just in case this took a long time
  • Removed calls to NdisGetCurrentSystemTime. If this is called on each packet, you will see a large hit count on a 64-bit division function in coredll.dll.
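
The second bullet removes the per-packet time read entirely. If a driver still needs an approximate timestamp, one alternative (not something the original author describes) is to read the clock only every N packets and reuse the cached value in between. The sketch below shows only that bookkeeping; `ts_sampler`, `ts_sampler_init` and `ts_due` are hypothetical names, and the actual clock read is left to the caller.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: decides on which packets the real clock is read. */
typedef struct {
    uint32_t interval;     /* packets between real clock reads */
    uint32_t since_read;   /* packets seen since the last read */
    uint64_t cached_time;  /* last value read, in the clock's own units */
} ts_sampler;

static void ts_sampler_init(ts_sampler *s, uint32_t interval)
{
    assert(interval > 0);
    s->interval = interval;
    s->since_read = interval;   /* forces a read on the first packet */
    s->cached_time = 0;
}

/* Call once per packet. Returns 1 when the caller should read the clock
   and store the result in cached_time, 0 when cached_time can be reused. */
static int ts_due(ts_sampler *s)
{
    if (s->since_read >= s->interval) {
        s->since_read = 1;      /* this packet counts as the first */
        return 1;
    }
    s->since_read++;
    return 0;
}
```

In a receive path this would look like `if (ts_due(&s)) s.cached_time = now;` followed by using `s.cached_time`, where `now` is whatever clock read the driver already has. The trade-off is timestamp resolution: with an interval of N, a packet's timestamp can be up to N-1 packets old.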

With these changes, I can get 500 Mbit/s TCP/IP throughput at 100% CPU load on
a 1.5 GHz VIA C7 processor with a CX700 bridge chip. And that is very similar
to what WinXP can do on that same system.”

Have fun!
