Assembler Optimizer

Page 44/53
37 | 38 | 39 | 40 | 41 | 42 | 43 | | 45 | 46 | 47 | 48 | 49

By Bengalack

Champion (425)

30-04-2021, 08:53

Here are 3 more. As in the other example, the SDCC code is commented out. These are oriented towards speed.

;game.c:1106: if( g_nViewPortY < g_nMinViewportY )
	; 92 cycles
	; ld	hl, #_g_nMinViewportY
	; ld	iy, #_g_nViewPortY
	; ld	a, 0 (iy)
	; sub	a, (hl)
	; ld	a, 1 (iy)
	; inc	hl
	; sbc	a, (hl)

	; 61 cycles
	ld	de, (#_g_nMinViewportY)
	ld	hl, (#_g_nViewPortY)
	or	a		; clear carry: sbc hl,de also subtracts any carry already set
	sbc	hl, de
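To see why the 16-bit version needs the carry cleared, here is a C model of the carry flag each sequence produces. It is a sketch that only models the C flag, i.e. an unsigned comparison. The post doesn't show how `g_nViewPortY` and `g_nMinViewportY` are declared, and a signed `<` would branch on S/V instead. The function names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* C flag after SDCC's byte-wise compare:
     ld a,0 (iy) ; sub a,(hl) ; ld a,1 (iy) ; inc hl ; sbc a,(hl)
   SUB ignores the incoming carry, so the result never depends on it. */
static bool carry_bytewise(uint16_t y, uint16_t min)
{
    int lo_borrow = (y & 0xFF) < (min & 0xFF);              /* C after SUB */
    return (int)(y >> 8) - (int)(min >> 8) - lo_borrow < 0; /* C after SBC */
}

/* C flag after the hand-written version:
     ld de,(min) ; ld hl,(y) ; sbc hl,de
   SBC HL,DE computes HL - DE - C, so a carry left set by earlier code
   leaks into the result. */
static bool carry_sbc_hl_de(uint16_t y, uint16_t min, bool carry_in)
{
    return (int32_t)y - (int32_t)min - (int32_t)carry_in < 0;
}
```

With `carry_in` false the two always agree. With a stale carry, the equal case (`y == min`) wrongly reports `y < min`, and a preceding `or a` (or `and a`) prevents exactly that.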

and this one:

;game.c:1119: g_nViewPortXSprites = g_nViewPortX+SPRITE_WIDTH;
	; 108 cycles
	; ld	hl, #_g_nViewPortXSprites
	; ld	iy, #_g_nViewPortX
	; ld	a, 0 (iy)
	; add	a, #0x10
	; ld	(hl), a
	; ld	a, 1 (iy)
	; adc	a, #0x00
	; inc	hl
	; ld	(hl), a

	; 57 cycles
	ld	de, #0x10	; SPRITE_WIDTH
	ld	hl, (#_g_nViewPortX)
	add	hl, de
	ld	(#_g_nViewPortXSprites), hl

;game.c:1120: g_nViewPortYSprites = g_nViewPortY+SPRITE_HEIGHT;

	; 108 cycles
	; ld	hl, #_g_nViewPortYSprites
	; ld	iy, #_g_nViewPortY
	; ld	a, 0 (iy)
	; add	a, #0x10
	; ld	(hl), a
	; ld	a, 1 (iy)
	; adc	a, #0x00
	; inc	hl
	; ld	(hl), a

	; 46 cycles - cheating a bit here: I know that SPRITE_HEIGHT always == SPRITE_WIDTH,
	; so DE can be reused as is (MDL would have detected this today, I think)
	ld	hl, (#_g_nViewPortY)
	add	hl, de
	ld	(#_g_nViewPortYSprites), hl
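When hand-written asm relies on SPRITE_HEIGHT == SPRITE_WIDTH, a compile-time check next to the C it replaces keeps that assumption from breaking silently. This is a minimal sketch that assumes a compiler accepting C11 `_Static_assert`. The globals are plain `int` here only for illustration, since the real declarations in game.c aren't shown. Both constants appear as 0x10 in the generated code above.

```c
#include <assert.h>

/* Both constants show up as 0x10 in the SDCC output above. */
#define SPRITE_WIDTH  16
#define SPRITE_HEIGHT 16

/* The hand-written asm reuses DE (loaded with SPRITE_WIDTH) for the
   Y addition, so refuse to build if the two constants ever diverge. */
_Static_assert(SPRITE_HEIGHT == SPRITE_WIDTH,
               "asm for g_nViewPortYSprites reuses DE = SPRITE_WIDTH");

/* Illustrative declarations; the real types in game.c are not shown. */
static int g_nViewPortX, g_nViewPortY;
static int g_nViewPortXSprites, g_nViewPortYSprites;

static void update_sprite_viewport(void)
{
    g_nViewPortXSprites = g_nViewPortX + SPRITE_WIDTH;
    g_nViewPortYSprites = g_nViewPortY + SPRITE_HEIGHT;
}
```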

In many cases, SDCC uses hl, bc and de first. It's just that there seems to be no threshold for using iy once those others are in use, as if iy were equivalent to the others. It isn't, of course. Index registers shine in some areas, but not in all.
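The cycle figures in these examples can be reproduced by summing per-instruction timings. The sketch assumes the counts are MSX T-states: the Z80 timing plus one wait state per M1 (opcode fetch) cycle, so DD/FD/ED-prefixed instructions pay two. Under that assumption, the SDCC and hand-written versions of the `game.c:1119` statement come out at 108 and 57 cycles, matching the comments. Each IY-indexed load alone costs 21.

```c
#include <assert.h>

/* One instruction: Z80 T-states plus the number of M1 cycles it has
   (1 for plain opcodes, 2 for DD/FD/ED-prefixed ones). */
typedef struct { const char *insn; int z80; int m1; } Insn;

/* MSX adds one wait state per M1 cycle. */
static int msx_cycles(const Insn *seq, int n)
{
    int total = 0;
    for (int i = 0; i < n; i++)
        total += seq[i].z80 + seq[i].m1;
    return total;
}

/* SDCC output for g_nViewPortXSprites = g_nViewPortX + SPRITE_WIDTH */
static const Insn sdcc_add[] = {
    {"ld hl,#nn",   10, 1},
    {"ld iy,#nn",   14, 2},
    {"ld a,0 (iy)", 19, 2},
    {"add a,#0x10",  7, 1},
    {"ld (hl),a",    7, 1},
    {"ld a,1 (iy)", 19, 2},
    {"adc a,#0x00",  7, 1},
    {"inc hl",       6, 1},
    {"ld (hl),a",    7, 1},
};

/* Hand-written 16-bit version */
static const Insn hand_add[] = {
    {"ld de,#0x10", 10, 1},
    {"ld hl,(nn)",  16, 1},
    {"add hl,de",   11, 1},
    {"ld (nn),hl",  16, 1},
};
```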

Ultimately, this all points to SDCC's code generation, and it should ideally be addressed there, I think. But if some pattern can be deduced from this that would be useful in MDL in general, it might be worth pursuing. That said, how often is this kind of pattern seen in other asm files? What is the cost/benefit of solving this?

Maybe I should drop a line in the SDCC forums and ask for thoughts on the current SDCC code generation. Just speculating, but I would assume that performance isn't their first priority. I'm guessing that correctness, compatibility and compliance with the standard come first, and that we are years away from an even more "delicate" treatment of index registers.

By pgimeno

Champion (302)

30-04-2021, 14:02

@santiontanon Great progress!

santiontanon wrote:

I tried a smaller version (sort 2). I added support for the backslash as you mentioned, but I was actually able to encode the task without requiring "<=" constraints, like this (the "sort 3" is also encodable like this):

Right, I didn't think of the ternary ?: operator. Anyway, you can always encode A <= B as !(B < A), so it's always possible to eliminate <= constraints and even to make the comparison always go in the same direction.
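The rewrite A <= B as !(B < A) holds for any total order, which is why "<" alone is enough. Here is a small C check over every pair of 8-bit values. It uses plain C comparisons, not MDL's constraint syntax.

```c
#include <assert.h>
#include <stdbool.h>

/* A <= B expressed with only the strict "<", operands swapped. */
static bool le_via_lt(unsigned a, unsigned b)
{
    return !(b < a);
}

/* Exhaustively confirm the identity for all 8-bit operand pairs. */
static bool check_all_bytes(void)
{
    for (unsigned a = 0; a < 256; a++)
        for (unsigned b = 0; b < 256; b++)
            if ((a <= b) != le_via_lt(a, b))
                return false;
    return true;
}
```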

santiontanon wrote:

Notice the "& 0xff", since all parameters are 16bit (I need to do something about that haha).

Oops! What about making ?val 8 bits and ??val 16 bits? Like, 8 bits per '?'. It might help with speed too.

santiontanon wrote:

But even that's pretty hard. In fact it should be trivial, but since there are no jump instructions yet, it becomes hard.

Yes, it's not exactly trivial. If 'min' takes 5 instructions, it's to be expected that a complete sort takes somewhere between 9 and 12 instructions, and that may be out of reach. I guess the 3-value sort was too much.

santiontanon wrote:

Anyway, updates in the new version (which should also fix the -ansioff thing): https://github.com/santiontanon/mdlz80optimizer/releases/tag/v2.0c

Thanks, it works! I still have to use the '|cat' trick to get the short help, but that's not a biggie.

By pgimeno

Champion (302)

30-04-2021, 14:28

santiontanon wrote:

Thanks!! this is an interesting example!! I'll look for things like this in SDCC code, and I'm sure there's a few more patterns that can be found!

I recall this post from... *shiver* 36 pages ago.

hit9918 wrote:

it would be cool if it could deal with SDCC code

;app.c:12: while(1) {
00102$:
;app.c:13: p->x += p->dx;
	push	bc
	pop	iy
	inc	iy
	inc	iy
	ld	e, 0 (iy)
	ld	d, 1 (iy)
	ld	l, c
	ld	h, b
	ld	a, (hl)
	inc	hl
	ld	h, (hl)
	ld	l, a
	add	hl, de
	ld	0 (iy), l
	ld	1 (iy), h
	jr	00102$
;app.c:15: }
--
199 cycles

Artrag posted a very optimized version of this specific code, but a more general approach would be to perform the 16-bit additions in 8-bit sections (add + adc). Can't post the optimized example right now, because I'm in a hurry, but hopefully you know what I mean.
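ARTRAG's version isn't reproduced here, but the add + adc idea can be modelled in C. The check confirms that a low-byte ADD followed by a high-byte ADC, which consumes the carry, gives the same result as a 16-bit add, with no register pair needed. This is a sketch: the little-endian byte arrays mirror how the Z80 stores words, and the function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* x += dx on little-endian 16-bit values, one byte at a time. */
static void add16_in_place(uint8_t x[2], const uint8_t dx[2])
{
    unsigned lo = (unsigned)x[0] + dx[0];         /* add a,(dx)           */
    x[0] = (uint8_t)lo;                           /* store low byte       */
    unsigned carry = lo >> 8;                     /* carry flag           */
    unsigned hi = (unsigned)x[1] + dx[1] + carry; /* adc a,(dx+1)         */
    x[1] = (uint8_t)hi;                           /* final carry dropped: */
                                                  /* 16-bit wraparound    */
}

/* Read a little-endian 16-bit value back. */
static uint16_t le16(const uint8_t b[2])
{
    return (uint16_t)(b[0] | (b[1] << 8));
}
```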

It would be important to check the latest version of the compiler, to avoid doing work that will no longer be necessary.

By santiontanon

Paragon (1527)

30-04-2021, 19:43

@pgimeno: great idea about the ?val and ??val!! Right now, you don't even need to add that "?" in front hehe (I just add it for clarity); any variable name that is not a register/flag would do. But I like the idea of "?" and "??" to mark 8-bit vs 16-bit. I pushed a commit last night that autodetects whether you are declaring a parameter with an 8-bit or 16-bit register, but there are corner cases where it would not work well. So, I think I'll leave the autodetection for parameters without "?", and then use "?" and "??" (or similar) to explicitly define 8/16 bits. Thanks for the suggestion!

@pgimeno: oh! good point about the short help. I noticed that when I was doing tests and didn't have time to fix it, but I need to rework the behavior of MDL when only flags like "-ansioff" are passed. So, I'll add a to-do item and will get that fixed in the next version.

@Bengalack: very nice 3 examples! I can totally see patterns in all 3 of them! So, I'm going to look into this over the weekend and post an update here. I'll also check that optimized version from ARTRAG, to see if I can get additional patterns from it too! It would be very nice to have patterns that address recurring issues in SDCC, so that significant gains can be seen. Also, if this results in feedback to the SDCC group so that the compiler generates faster code on its own, that would also be a positive effect.

By Bengalack

Champion (425)

30-04-2021, 20:30

Cool!

The C example is a bit special in that it is an endless loop. If I read this right, ARTRAG's version works fine for one iteration. But it trashes bc, so the next iteration won't be correct. Unless I'm missing something, of course.

By santiontanon

Paragon (1527)

02-05-2021, 21:43

I just uploaded a new version with a couple of new patterns based on the examples you showed, Bengalack! https://github.com/santiontanon/mdlz80optimizer/releases/tag...

It's still a temporary release, since not everything that I want to have for v2.0 is there yet, but at least if anyone wants to test, there it is :)

I also expanded the search-based optimizer with a few of the things pgimeno mentioned (you can use "\" to break lines, you can specify flags as the target, like "z_flag = ?val1 == ?val2", and "-ansioff" should now also work for the short help). But there are still lots of things missing. Maybe one more week of work and I can get it to the point where I'm confident enough to make 2.0 the latest official release :)

But one thing I am very excited about is that, now that I can simulate programs with the Z80 simulator, there are so many cool things that can be done. For example, I created a new test that runs both the "pattern" and the "replacement" of a given optimization pattern in the simulator to verify that the replacement is indeed equivalent. Thanks to this, I already noticed a few missing safety constraints, which I added to the affected patterns.
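Here is a toy C version of that kind of equivalence test, not MDL's actual simulator or test code. The example pair is the iy-to-hl store rewrite `ld iy,#addr ; ld 0 (iy),#n` versus `ld hl,#addr ; ld (hl),#n`. The machine state is a hypothetical, heavily reduced model: just HL, IY and a 256-byte memory with 8-bit addresses. Both snippets run from identical random start states, and the test compares what they leave behind.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, reduced machine state: only what the snippets touch. */
typedef struct {
    uint16_t hl, iy;
    uint8_t mem[256];
} State;

/* Pattern:     ld iy,#addr ; ld 0 (iy),#n */
static void run_pattern(State *s, uint8_t addr, uint8_t n)
{
    s->iy = addr;
    s->mem[(uint8_t)(s->iy + 0)] = n;
}

/* Replacement: ld hl,#addr ; ld (hl),#n */
static void run_replacement(State *s, uint8_t addr, uint8_t n)
{
    s->hl = addr;
    s->mem[(uint8_t)s->hl] = n;
}

/* Small deterministic PRNG so runs are repeatable. */
static uint32_t xorshift32(uint32_t *x)
{
    *x ^= *x << 13;
    *x ^= *x >> 17;
    *x ^= *x << 5;
    return *x;
}

/* Returns false as soon as memory differs; *regs_same reports whether
   HL and IY also always ended up identical. */
static bool same_memory_effect(int trials, bool *regs_same)
{
    uint32_t rng = 12345u;
    *regs_same = true;
    for (int t = 0; t < trials; t++) {
        State a;
        for (size_t i = 0; i < sizeof a.mem; i++)
            a.mem[i] = (uint8_t)xorshift32(&rng);
        a.hl = (uint16_t)xorshift32(&rng);
        a.iy = (uint16_t)xorshift32(&rng);
        State b = a;                      /* identical starting point */
        uint8_t addr = (uint8_t)xorshift32(&rng);
        uint8_t n = (uint8_t)xorshift32(&rng);
        run_pattern(&a, addr, n);
        run_replacement(&b, addr, n);
        if (memcmp(a.mem, b.mem, sizeof a.mem) != 0)
            return false;
        if (a.hl != b.hl || a.iy != b.iy)
            *regs_same = false;
    }
    return true;
}
```

Memory always matches, but the registers don't. The rewrite is therefore only safe if the old HL value is dead afterwards and nothing later reads IY, and this is the kind of safety constraint such a test can expose.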

By Bengalack

Champion (425)

02-05-2021, 23:30

Sweeet! I get this now:

INFO: Pattern-based optimization in Debug\objs\game.asm#16285: Replace ld iy,_g_uPowerup; ld (iy+0),0x08 with ld hl,_g_uPowerup+0; ld (hl),0x08 (3 bytes, 15 t-states saved)

I'll move the hand-fixed asm code from my examples back to C, to test whether MDL catches them too.

Great work!

By santiontanon

Paragon (1527)

03-05-2021, 03:11

Thanks!!! The parts where you could get extra optimizations because you know the range of certain variables, you should still keep, as mdl will not get those, but the others, I hope, are now applied automatically.

By Bengalack

Champion (425)

03-05-2021, 18:56

santiontanon wrote:

The parts where you could get extra optimizations because you know the range of certain variables, you should still keep, as mdl will not get those

What does this mean exactly? Variables? You mean 16-bit/8bit, signed/unsigned, bool, etc?

Another thing, I have found that there is a detection like this:

INFO: Pattern-based optimization in Debug\objs\game.asm#3859: Remove unused ld d,a (1 bytes, 5 t-states saved)

(I placed it there for test)

I also placed "ld a,h" there, which, in a way, was also unused. The issue here was probably that there was a "cp a" instruction later. But cp a does not "work on A", only on F: it clears the C flag and sets the Z flag (and maybe more?). So I guess, ideally, that line should have been removed too.
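A C model of the documented flags set by `cp r` shows why `cp a` doesn't depend on A's value: A - A is always 0, so the documented part of F is always Z and N set, everything else clear. This is a sketch. On real hardware, the undocumented bits 3 and 5 of F are reportedly copied from the operand for CP, so strictly speaking F isn't entirely independent of A. Those bits are deliberately not modelled here.

```c
#include <assert.h>
#include <stdint.h>

enum { FLAG_C = 0x01, FLAG_N = 0x02, FLAG_PV = 0x04, FLAG_H = 0x10,
       FLAG_Z = 0x40, FLAG_S = 0x80 };

/* Documented flags after "cp r" with accumulator a (A is unchanged).
   Undocumented bits 3 and 5 are not modelled. */
static uint8_t cp_flags(uint8_t a, uint8_t r)
{
    uint8_t res = (uint8_t)(a - r);
    uint8_t f = FLAG_N;                           /* CP is a subtraction */
    if (res & 0x80) f |= FLAG_S;
    if (res == 0) f |= FLAG_Z;
    if ((a & 0x0F) < (r & 0x0F)) f |= FLAG_H;     /* borrow from bit 4   */
    if ((a ^ r) & (a ^ res) & 0x80) f |= FLAG_PV; /* signed overflow     */
    if (a < r) f |= FLAG_C;                       /* unsigned borrow     */
    return f;
}
```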

(BTW: These two unused values/commands were present in code from SDCC 4.0. Not sure about 4.1)

By santiontanon

Paragon (1527)

03-05-2021, 19:32

Oh, I misread your earlier comment about the 16-bit signed shift! You can ignore that.

And about the "cp a", oh! good point! I have it lumped with the general "cp" case, which depends on both A and the operand, but that particular case needs to be fixed. Good catch! I'll modify my dependencies definition file to account for this!
