SDCC 4.1.12, a Game changer for C programming?


By aoineko

Paladin (706)


27-11-2021, 02:00

Hi all,

The SDCC team, and especially Philipp Klaus Krause (a Z80 guru), has brought developers a new function calling convention that will significantly increase the performance of C programs.
This new calling convention, which is the default since SDCC 4.1.12, allows some function parameters to be passed through the Z80 registers rather than through the stack, which is much faster.

In short, the first parameters are passed in registers and the rest on the stack. Details at the end of the message.

If you program 100% in C, it's very easy to take advantage of it... just update SDCC, recompile and voila! :)

If you use a mix of C and inline assembler (like me) the transition is less straightforward, but nothing very complex either. And... it's worth it!
In this case, what I advise is to first disable by hand the new calling convention on all functions where you read the input parameters and/or set the return value in assembler. To do this, just add the __sdcccall(0) directive to these functions. And that's it, you can compile and enjoy the optimizations of the new convention on all your pure C functions.
You can then gradually remove the __sdcccall(0) after adapting your assembly code to the parameter access changes.
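To illustrate the first step, here is a minimal sketch of an assembler function pinned to the old convention. The function name add8 is made up for the example, and the stack layout in the comments (return address at SP+0, parameters from SP+2, 8-bit result in L) is my understanding of the old convention, so check it against the SDCC manual before relying on it. The #else branch is only there so the same file also builds with a desktop compiler.

```c
#ifdef __SDCC
/* Hypothetical example: reads its parameters in assembler, so it is kept
   on the old convention with __sdcccall(0). */
unsigned char add8(unsigned char a, unsigned char b) __naked __sdcccall(0)
{
    __asm
        ld   hl, #2        ; skip the return address
        add  hl, sp
        ld   a, (hl)       ; first parameter (assumed at SP+2)
        inc  hl
        add  a, (hl)       ; second parameter (assumed at SP+3)
        ld   l, a          ; 8-bit return value in L (old convention)
        ret
    __endasm;
}
#else
/* Plain C fallback for non-SDCC compilers, same behaviour. */
unsigned char add8(unsigned char a, unsigned char b)
{
    return (unsigned char)(a + b);
}
#endif
```

Callers need no change: the compiler sees the directive on the function and sets up the call accordingly.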

For information, functions using the __z88dk_fastcall directive are not impacted by the changes and keep their own calling convention.

Here is the list of parameter combinations handled natively via the Z80 registers according to the calling conventions:

__sdcccall(0) is the old calling convention (default until SDCC 4.1.11).
__sdcccall(1) is the new calling convention (default since SDCC 4.1.12).
__z88dk_fastcall is an alternative calling convention that works only with 1-parameter functions.

Not only does the new calling convention natively handle more cases than __z88dk_fastcall, but it also works even if there are more parameters in the function. In this case, the parameters not handled by the registers are passed through the stack. For example, if a function takes 3 x 8-bit parameters, the first 2 will be passed through the registers (A and L) and the 3rd through the stack.
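As a rough sketch of that difference, the two functions below are pure C and only differ by their directive. The names twice, sum3, CALL_NEW and CALL_FASTCALL are invented for the example, and the register comments simply repeat what is described above rather than something I checked in the generated code. The macros expand to nothing on compilers other than SDCC.

```c
#ifdef __SDCC
#define CALL_NEW      __sdcccall(1)      /* explicit, although it is the default since 4.1.12 */
#define CALL_FASTCALL __z88dk_fastcall
#else
#define CALL_NEW                          /* not SDCC: directives are dropped */
#define CALL_FASTCALL
#endif

/* One parameter: the only shape __z88dk_fastcall can handle. */
unsigned char twice(unsigned char v) CALL_FASTCALL
{
    return (unsigned char)(v * 2);
}

/* Three 8-bit parameters with the new convention:
   a -> register A, b -> register L, c -> stack (as described above). */
unsigned char sum3(unsigned char a, unsigned char b, unsigned char c) CALL_NEW
{
    return (unsigned char)(a + b + c);
}
```

Since both bodies are plain C, nothing has to be adapted here when switching conventions; only functions that touch their parameters in assembler need attention.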

I haven't done a real benchmark yet, but after converting my game library (https://github.com/aoineko-fr/CMSX) I could see a gain of about 20% on my sprite test program (with 32 sprites moving at once).
The gain should vary a lot from program to program — especially depending on the number of functions called each frame — but in any case it should be significant.

I hope that one day soon SDCC will offer the possibility to choose in which register to put each parameter of a C function to help interfacing assembly libraries and BIOS, but in the meantime, it is already a big step that has been made to make C programming more efficient on our beloved 8-bit computers.

Thanks to the SDCC team!


By jepmsx

Master (195)


27-11-2021, 06:02

Thanks a lot aoineko for the update in the development and the good explanation. I'm in the first steps of learning sdcc using Fusion-C and it's good to know that sdcc is still improving after so many years.

By raymond

Hero (616)


27-11-2021, 06:18

Good to hear that there is a lot of progress at SDCC for our beloved Z80! I am using Fusion-C and wonder how much speed I will gain by using this new version :)

By raymond

Hero (616)


27-11-2021, 06:45

The latest official release is still 4.1.0, where did you get this 4.1.12 version from?

By ericb59

Paragon (1087)


27-11-2021, 09:11

Be careful: FUSION-C is not compatible with this new convention. Since Fusion-C is based on functions written in assembler, it would be necessary to update the assembly code and recompile the FUSION-C library.
What I might do ...
I advise you not to use this new version of SDCC with Fusion-c.

By Bengalack

Hero (660)


27-11-2021, 09:44

raymond wrote:

The latest official release is still 4.1.0, where did you get this 4.1.12 version from?

http://sdcc.sourceforge.net/snap.php

I've been following this for a while, and I'm pretty excited as well, although I won't get any numbers near 20% gain from this -- I have programmed around "this problem" as best as I could in any important inner loop, or any part of the main game loop. But any gain is very much welcome. Always :)

By Bengalack

Hero (660)


27-11-2021, 10:14

To me, it seems like they should have utilized BC as well, extending this to more than just two parameters. I understand that an argument against using BC would be that BC is often used as a counter, which would "generally" result in a lot of pushing and popping to store/restore the values when calling. But looking at the generated code (in 4.1 and earlier), I have the impression that SDCC often does that anyway, as if SDCC doesn't analyse 100% which registers are used in the called routines and just restores values at the caller "to be safe". But maybe all this improves in 4.2? :)

By aoineko

Paladin (706)


27-11-2021, 11:26

raymond wrote:

The latest official release is still 4.1.0, where did you get this 4.1.12 version from?

As Bengalack says, 4.1.12 is a snapshot that you can get from the SDCC website (http://sdcc.sourceforge.net).
I tested all the sample programs of my library and my tennis game: everything works just fine, so this version seems stable to me.
If you are not in a hurry (yes, I am impatient ^^) you can wait for the 4.2.0 version which will include these improvements.

Bengalack wrote:

To me, it seems like they should have utilized BC as well [...]

Don't hesitate to send your feedback to the SDCC team: https://sourceforge.net/p/sdcc/support-requests/
They are totally open to discussion.
I also wondered about some of the register choices (especially the return via DE rather than HL), but they have solid arguments and they don't make their choices lightly.
This is a free collaborative project so any help from us to improve it is welcome.

By konamiman

Paragon (1176)


28-11-2021, 16:23

This is awesome. Once SDCC 4.2 is out it'll be time to revisit all my C applications, including Nextor's FDISK.

By ARTRAG

Enlighted (6891)


18-01-2022, 13:10

Very nice paper from the person who did the work

https://arxiv.org/pdf/2112.01397v1.pdf

PS
SDCC is now at v4.1.14

By Grauw

Ascended (10639)


18-01-2022, 15:55

Interesting research report. Nice to see that this change was based on a thorough investigation.

Though I would’ve liked to have read some more interpretation and analysis of the results: why does a particular arrangement perform better than others? Something to explain the purely empirical data. E.g. is DE better than HL because it reduces the amount of spilling when HL (16-bit accumulator) is needed for another calculation? But then why isn’t E better than A (8-bit accumulator) as well?
