Assembler Optimizer

Page 43/58
36 | 37 | 38 | 39 | 40 | 41 | 42 | | 44 | 45 | 46 | 47 | 48

By pgimeno

Champion (328)

pgimeno's picture

28-04-2021, 13:52

IMO the goal is not flexible enough. I tried to create a spec file for sorting three values. This is what I wrote:

allowed_ops:
    add
    sub
    cp
    adc
    sbc
    and
    or
    xor
    ld
allow_ram_use = false
allowed_registers: a, b, c, d, e
initial_state:
    C = ?val1
    D = ?val2
    E = ?val3
goal_state:
    C = ?val1 && D = ?val2 && E = ?val3 || C = ?val1 && D = ?val3 && E = ?val2\
 || C = ?val2 && D = ?val1 && E = ?val3 || C = ?val2 && D = ?val3 && E = ?val1\
 || C = ?val3 && D = ?val1 && E = ?val2 || C = ?val3 && D = ?val2 && E = ?val1
    C <= D
    D <= E

But it choked in the first &&. I could also formulate it as a single expression instead of separating C <= D and D <= E as separate expressions, if necessary. It'd be nice if it accepted == instead of = for the goal state, and \ for line continuation (I don't know if it does this); also to be able to specify expression constraints in the initial state, so that if you know that in your code not all inputs are valid, you can specify them as constraints in the hope that the superoptimizer can take advantage of that.

I also don't know how to specify memory input; e.g. in the above case, if the input values are in memory, how would I specify that? And in the goal? Also, is there a way to specify initial and goal flags?

Apart from that, is there a way to disable colours? I find them very distracting.

By santiontanon

Paragon (1805)

santiontanon's picture

28-04-2021, 15:26

Thanks pgimeno!! Indeed, there is no way to specify constraints like "C <= D" yet, but I just added it to the to-do list! For now all constraints are of the form " = "

The main issue here is that the "goal_state" check must be very fast as it's called millions of times during execution. So, right now, I compile down whatever expression you write into a (short) list of simple equality comparisons that do not involve any expression (expression evaluation is slow). In the example above, I think it's feasible to still achieve this, as once ?val1, ?val2, and ?val3 have a specific value, then the expected value of C, D and E is fully determined. So, I think this is a great test case for me to try to get to work Smile

alternatively, I could allow more complex conditions, but when present, I should warn the user that search will be slow as a result Smile

As for memory constraints. It's in the to-do list as well. I'll be adding these things bit by bit. Adding memory is not too hard, but the main challenge would be adding it while keeping the init/goal_check code as fast as possible.

As for colors, you can use the "-ansioff" flag Smile

By pgimeno

Champion (328)

pgimeno's picture

28-04-2021, 19:37

I did expect this to be extremely slow; just all combinations of 3 values account for a 65536X factor vs. checking a single value. And it's pretty obvious that it will need more than 6 instructions, making the number of combinations explode. I was ready to let it run for hours, but maybe even that is not realistic.

santiontanon wrote:

As for colors, you can use the "-ansioff" flag Smile

Thanks. It's not working well for me, though.

- java -jar mdl.jar shows the short help in colour. java -jar mdl.jar -ansioff gives an error. A workaround is to use java -jar mdl.jar | cat but that still shows the majority of the short help in bold white.
- java -jar mdl.jar -help -ansioff shows the long help in full colour. java -jar mdl.jar -ansioff -help still shows the majority of the long help in bold white.

It seems to work for things other than requesting help.

By santiontanon

Paragon (1805)

santiontanon's picture

28-04-2021, 21:01

oh! Maybe I missed the "white" color (which I added later)) in the filter that removes colors before printing. I'll make a note to check it when I get home tonight Smile

I'm very curious about what would it need to solve that task too! I'll take it as a challenge to get your specification to work. So, hopefully I can report a success in a few days (I might start with something simpler, like sort just C, D, and then move to C, D, E Smile

By Bengalack

Paladin (747)

Bengalack's picture

28-04-2021, 21:17

Great! I tested pre-release version "2.0b" on an asm-file generated from a rather large C-file using sdcc v4.1. No complaints in the output, and a quick test of the program shows that things seem to work fine.

Results for size:

java -jar mdl.jar Debug\objs\game.asm -dialect sdcc -ro -po size -asm-dialect Debug\objs\game.asm.mdl.asm
INFO: mdl optimization summary: 152 bytes, 640/695 t-states saved. CodeReorganizer moves: 21. Pattern-based optimizer pattern applications: 84.

Results for speed:

java -jar mdl.jar Debug\objs\game.asm -dialect sdcc -ro -po speed -asm-dialect Debug\objs\game.asm.mdl.asm
INFO: mdl optimization summary: 17 bytes, 910 t-states saved. CodeReorganizer moves: 21. Pattern-based optimizer pattern applications: 193.

I have the impression that the code generated is a bit speedier and quite more compact in this new v4.1, compared to v4.0.

The breakdown of 910 cycles and which types of optimizations used on the speed-variant are like:

What	Cycles saved	Share
Replace jr	244	26,81%
move lines	242	26,59%
Remove push	138	15,16%
Replace ld	126	13,85%
Remove duplic.	92	10,11%
Replace push	26	2,86%
Remove unnec.	16	1,76%
Remove unused	15	1,65%
Remove ld	5	0,55%
Replace adc	3	0,33%
Replace add	3	0,33%
detected this	0	0,00%

Pardon the short names... the regexp got a little aggressive, but @santi recognises them I think Smile

The biggest "single offender" MDL finds from SDCC, is non-needed push/pop-pairs. 23 cycles saved in one go.

I also see that that sdcc too easily turns to IX/IY-usage. Replacing that with BC/HL/DE or possibly BC'/DE'/HL' would likely speed up significantly.

By santiontanon

Paragon (1805)

santiontanon's picture

28-04-2021, 23:27

Very cool analysis!! Thanks for sharing Bangalack!!! Something that would be nice is to have a few more SDCC-targetted optimization patterns. For example, for what you mention of IX/IY, if we can find a few instances where they could be replaced by BC/HL/DE or their ghosts, I might be able to create more patterns to have further optimizations.

Also, I deactivated a few patterns when the dialect is SDCC due to limitations of the sdasz80 assembler parser with complex expressions. But I plan to bring a few of those back in now that the code generation in MDL is a bit more robust. So, hopefully we can see a few more gains soon too Smile

By santiontanon

Paragon (1805)

santiontanon's picture

29-04-2021, 05:43

@pgimeno, I tried the "sorting" example you were trying. It's actually quite hard, and MDL i snowhere near being able to generate that yet! But I like it, it's a good challenge Smile

I tried a smaller version (sort 2), I added support for the backslash as you mentioned, but I was actually able to encode the task without requiring "<=" constraints, like this (the "sort 3" is also encodeable like this):

  allowed_ops:
    add
    sub
    cp
    adc
    sbc
    and
    or
    xor
    ld
  allow_ram_use = false
  allowed_registers: a, b, c, d, e
  max_ops = 10
  initial_state:
      C = ?val1
      D = ?val2
  goal_state:
      C = (?val1 & 0xff) < (?val2 & 0xff) ? ?val1 : ?val2
      D = (?val1 & 0xff) < (?val2 & 0xff) ? ?val2 : ?val1

Notice the "& 0xff", since all parameters are 16bit (I need to do something about that haha).

But even that's pretty hard. In fact it should be trivial, but since there are no jump instructions yet, it becomes hard. With a "cp" and a "jr" this should be easy. So, maybe once I add those instructions to the search! But I simplified further, and at least this one works ("min", which is one of the examples used in the original "super-optimizador"):

  allowed_ops:
      add
      sub
      sbc
      and
      ld
  allow_ram_use = false
  allowed_registers: a, b, c, h, l
  max_ops = 5
  initial_state:
      A = ?val1
      H = ?val2
  goal_state:
      A = (?val1 & 0xff) < (?val2 & 0xff) ? ?val1 : ?val2

And this is easy (5 instructions, found in under a second). I will keep trying to get the harder cases working Smile

Anyway, updates in the new version (which should also fix the -ansioff thing): https://github.com/santiontanon/mdlz80optimizer/releases/tag...

By Bengalack

Paladin (747)

Bengalack's picture

29-04-2021, 14:27

santiontanon wrote:

For example, for what you mention of IX/IY, if we can find a few instances where they could be replaced by BC/HL/DE or their ghosts, I might be able to create more patterns to have further optimizations.

The single reason for me using C is that the turnaround time for changes and flexibility is great. My project is in contant change, refactoring all the time, with only the crucial parts done in assembly. Sometimes I'm also working in an "hybrid mode". That is: I'm taking SDCC's generated code, and place it in my own asm-files. That's handy for several reasons. Sometimes there are functions that are large, complicated and involving a lot of 16-bit signed arithmetics. I've then typically run SDCC with full opts (takes 15 minutes on my 8-core AMD Ryzen 7 4800U, sadly SDCC does not use more than one core...) and then take the "optimized" asm-code and get that into the quick debug/normal builds. In addition to this I can also modify the code manually. To optimize best, I should study the generated code more. For example, we know that SDCC put every local variable on the stack and reach them by the index register. This knowledge can possibly be further investigated and exploited.

Anyways, but there are things that we could do off the bat. Below is an example of a quick change I've done in a hybrid-function. As long as HL's value isn't used, and being re-set at next use, HL is free to use. 135 cycles freed up by this quick thingy.

There are several other similar cases to be found.

By Bengalack

Paladin (747)

Bengalack's picture

29-04-2021, 22:30

(ref the picture, it says signed int. It is, but it is never negative, so but shifting like this, to div by 8, works)

By santiontanon

Paragon (1805)

santiontanon's picture

29-04-2021, 23:14

Thanks!! this is an interesting example!! I'll look for things like this in SDCC code, and I'm sure there's a few more patterns that can be found!

Page 43/58
36 | 37 | 38 | 39 | 40 | 41 | 42 | | 44 | 45 | 46 | 47 | 48