Assembly Z80, best way to divide by 16

Page 2/2

By santiontanon

Paragon (1693)

20-09-2022, 23:10

Ah, indeed, I disallowed numeric constants to prune the search space. I can try allowing 8-bit constants, but that'll take forever, haha.

And indeed, I'll try it on the unpacker! If it's already been optimized, I doubt anything will be found, but let me give it a try! :)

By santiontanon

Paragon (1693)

21-09-2022, 04:42

Ah, yes: allowing the constants 0, 1, 2, 3, 4, 7, 8, 15, 16, 31, 32, 63, 64, 127, 128, 240 and 255, I get this in about 10 seconds, which is essentially albs_br's second solution (other variations are found, but they are all equivalent):

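ld a, h   ; A = H
rlca      ; rotate A left four times:
rlca      ; after four rotations the high and
rlca      ; low nibbles of A have swapped places
rlca
and 15    ; keep the low nibble (the old high nibble), so A = H / 16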
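As a sanity check, here is a small Python model (not Z80 code) of the sequence above, treating A as an 8-bit value; the function names are hypothetical. It confirms exhaustively that four RLCA rotations swap the nibbles, so the final AND leaves H / 16 for every possible H.

```python
def rlca(a: int) -> int:
    """Model of Z80 RLCA: rotate the 8-bit A left, bit 7 wraps into bit 0."""
    return ((a << 1) | (a >> 7)) & 0xFF


def div16_rlca(h: int) -> int:
    a = h                # ld a, h
    for _ in range(4):   # rlca x4: the two nibbles of A swap places
        a = rlca(a)
    return a & 15        # and 15: keep the former high nibble


# Exhaustive check over every possible value of H
assert all(div16_rlca(h) == h // 16 for h in range(256))
print(hex(div16_rlca(0xA7)))  # 0xa
```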

Allowing all constants between 0 and 255, I left it running for 5 minutes and it still had not finished, so I stopped it, hehe.
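The search tool itself isn't shown in the thread. As a purely hypothetical toy, the Python sketch below runs a cost-ordered (Dijkstra) search over a tiny, assumed instruction set (RLCA, RRCA, SRL A and AND n), using Z80 T-states as the cost and not counting the initial LD A,H. It illustrates why the allowed constants matter: without any, the cheapest exact result under these assumptions is four SRL A (32 T-states); with the 17 constants listed above, four rotations plus one AND win at 23 T-states. Every extra allowed constant adds another branch at every step, which is one plausible reason the full 0 to 255 search slows down so much. All names (`search`, `evaluate`, etc.) are made up for this sketch.

```python
import heapq

# A state is an 8-tuple: entry i is the input bit that lands in output bit i,
# or None if that bit is forced to 0. This symbolic form is exact here because
# every modelled instruction only moves bits around or clears them.
START = tuple(range(8))
TARGET = tuple(i + 4 if i < 4 else None for i in range(8))  # A = H >> 4


def rlca(s):
    return tuple(s[(i - 1) % 8] for i in range(8))  # bit i <- bit i-1, bit 0 <- bit 7


def rrca(s):
    return tuple(s[(i + 1) % 8] for i in range(8))  # bit i <- bit i+1, bit 7 <- bit 0


def srl(s):
    return tuple(s[i + 1] if i < 7 else None for i in range(8))  # bit 7 <- 0


def and_n(n):
    return lambda s: tuple(s[i] if (n >> i) & 1 else None for i in range(8))


def instruction_set(constants):
    """(mnemonic, T-states, transform) for each assumed instruction."""
    ops = [("rlca", 4, rlca), ("rrca", 4, rrca), ("srl a", 8, srl)]
    ops += [(f"and {n}", 7, and_n(n)) for n in constants]
    return ops


def search(constants, max_cost=40):
    """Dijkstra over reachable states; returns (cost, [mnemonics]) or None."""
    ops = instruction_set(constants)
    best = {START: 0}
    queue = [(0, 0, START, ())]  # unique counter: tuples holding None are never compared
    counter = 0
    while queue:
        cost, _, state, seq = heapq.heappop(queue)
        if state == TARGET:
            return cost, list(seq)
        if cost > best[state]:
            continue  # stale queue entry
        for name, t, op in ops:
            nxt, new_cost = op(state), cost + t
            if new_cost <= max_cost and new_cost < best.get(nxt, max_cost + 1):
                best[nxt] = new_cost
                counter += 1
                heapq.heappush(queue, (new_cost, counter, nxt, seq + (name,)))
    return None


def evaluate(seq, h, constants):
    """Run a found sequence on a concrete byte to double-check it."""
    table = {name: op for name, _, op in instruction_set(constants)}
    state = START
    for name in seq:
        state = table[name](state)
    return sum(((h >> src) & 1) << i for i, src in enumerate(state) if src is not None)


CONSTANTS = [0, 1, 2, 3, 4, 7, 8, 15, 16, 31, 32, 63, 64, 127, 128, 240, 255]
print(search([]))         # (32, ['srl a', 'srl a', 'srl a', 'srl a'])
print(search(CONSTANTS))  # a 23 T-state sequence: four rotations plus one and
```

Tracking only "which input bit ends up where" keeps the state space tiny; a real superoptimizer that also handles flags, carries and arithmetic would need to test candidates on concrete values instead.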

By [WYZ]

Champion (448)

21-09-2022, 09:09

Are RRD or RLD useful here?

Oops, theNestruo and GDX already thought of that.
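RLD and RRD only operate on a byte in memory at (HL), so using them here assumes the value has first been stored there. A small Python model of the two instructions (function names hypothetical) shows what clearing A first with XOR A would give; this is a sketch of how they could apply, not necessarily what theNestruo and GDX proposed.

```python
def rld(a, m):
    """Model of Z80 RLD with (HL) = m; returns (new A, new (HL))."""
    new_a = (a & 0xF0) | (m >> 4)             # high nibble of (HL) -> low nibble of A
    new_m = ((m & 0x0F) << 4) | (a & 0x0F)    # low (HL) -> high (HL), low A -> low (HL)
    return new_a, new_m


def rrd(a, m):
    """Model of Z80 RRD with (HL) = m; returns (new A, new (HL))."""
    new_a = (a & 0xF0) | (m & 0x0F)           # low nibble of (HL) -> low nibble of A
    new_m = ((a & 0x0F) << 4) | (m >> 4)      # low A -> high (HL), high (HL) -> low (HL)
    return new_a, new_m


# xor a / rld: A receives the quotient, the byte in memory is overwritten
a, m = rld(0, 0xA7)
print(hex(a), hex(m))  # 0xa 0x70

# xor a / rrd: the byte in memory becomes the quotient, A gets the remainder
a, m = rrd(0, 0xA7)
print(hex(a), hex(m))  # 0x7 0xa
```

Either way the original byte in memory is destroyed, so this only pays off if that is acceptable.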

By Micha

Expert (70)

21-09-2022, 10:20

The fastest way I can think of would be utterly impractical and bizarre, but it takes only 16 cycles (machine + T1):

ld l,0     ; HL = H * 256
ld a,(hl)  ; A = table entry stored at address H * 256

Disadvantages:
- it needs an absurd "lookup table" scattered across the whole 64K of memory; the table values sit 256 bytes apart from each other, at &0000, &0100, &0200, ..., &FF00
- register l is not preserved

I would personally go for the lookup table as proposed in the first post of this topic...
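The scattered table can be modelled in Python by treating memory as a flat 64 KB bytearray. That is a simplification: on a real machine most of those addresses also hold code, data or ROM. The function names are hypothetical.

```python
def build_scattered_lut() -> bytearray:
    """64 KB memory image with the entry for H stored at address H * 256."""
    mem = bytearray(0x10000)
    for h in range(256):
        mem[h << 8] = h >> 4
    return mem


def div16_scattered(mem: bytearray, h: int) -> int:
    l = 0                       # ld l,0
    return mem[(h << 8) | l]    # ld a,(hl) with HL = H * 256


mem = build_scattered_lut()
assert all(div16_scattered(mem, h) == h // 16 for h in range(256))
```

Only the first byte of each 256-byte page is used, so the other 255 bytes of every page stay available for anything else.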

By theNestruo

Champion (383)

21-09-2022, 10:54

Micha wrote:

ld l,0
ld a, (hl)

I see your utterly impractical and bizarre way and raise the bet (i.e., it's faster, but even more impractical!):

ld l, h    ; HL = H * 257
ld a,(hl)  ; A = table entry stored at that address

The LUT would now be at $0000, $0101, $0202, ..., $fefe, $ffff (oops! you cannot divide $ff)
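The same kind of Python model works for the diagonal layout; again the names are hypothetical. Note that it treats $FFFF as ordinary RAM, which real hardware does not guarantee. That is presumably the catch for $ff, since on MSX $FFFF can act as the secondary slot select register.

```python
def build_diagonal_lut() -> bytearray:
    """Entry for H stored at address H * 257, i.e. $0000, $0101, ..., $FFFF."""
    mem = bytearray(0x10000)
    for h in range(256):
        mem[(h << 8) | h] = h >> 4
    return mem


def div16_diagonal(mem: bytearray, h: int) -> int:
    l = h                       # ld l, h
    return mem[(h << 8) | l]    # ld a,(hl) with HL = H * 257


mem = build_diagonal_lut()
assert all(div16_diagonal(mem, h) == h // 16 for h in range(256))
```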

By bore

Master (147)

21-09-2022, 11:57

While we are looking at impractical solutions, how about changing the way the value is represented?

	ld	a, h	; convert to 4.4 fixed point and divide by 16
	;and	$f0	; drop fractional bits

As a bonus, it can keep the fractional bits.
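The trick is that no shifting happens at all: the same 8 bits are simply read as unsigned 4.4 fixed point (4 integer bits, 4 fraction bits), so their value is already H / 16. A short Python illustration, with a hypothetical helper name:

```python
from fractions import Fraction


def fixed_4_4(byte: int) -> Fraction:
    """Value of an 8-bit byte read as unsigned 4.4 fixed point."""
    return Fraction(byte, 16)


h = 0xA7
a = h                        # ld a, h: same bits, now read as h / 16
print(fixed_4_4(a))          # 167/16
print(fixed_4_4(a & 0xF0))   # and $f0: fraction dropped -> 10
```

After the optional AND $F0 the result is still in 4.4 format: the integer part sits in the high nibble, so reading that byte as a plain integer would give 16 times the quotient.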

By Micha

Expert (70)

21-09-2022, 16:28

theNestruo wrote:

ld l, h
ld a,(hl)

Great! Another 3 cycles shaved off!
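The cycle arithmetic can be tallied with a few lines of Python, assuming standard Z80 T-states plus the usual MSX rule of one extra wait state per M1 (opcode fetch) cycle; that rule reproduces the 16-cycle figure quoted above. The table only covers the unprefixed instructions used in this thread, and the names are hypothetical.

```python
# (Z80 T-states, M1 opcode fetches) for the unprefixed instructions used in this thread
TIMINGS = {
    "ld l,0": (7, 1),
    "ld l,h": (4, 1),
    "ld a,h": (4, 1),
    "ld a,(hl)": (7, 1),
    "rlca": (4, 1),
    "and 15": (7, 1),
}


def cycles(program, msx=True):
    """Sum T-states; on MSX add one wait state per M1 cycle."""
    return sum(t + (m1 if msx else 0) for t, m1 in (TIMINGS[i] for i in program))


micha = ["ld l,0", "ld a,(hl)"]
nestruo = ["ld l,h", "ld a,(hl)"]
rotate = ["ld a,h", "rlca", "rlca", "rlca", "rlca", "and 15"]
print(cycles(micha), cycles(nestruo), cycles(rotate))        # 16 13 33
print(cycles(micha, msx=False), cycles(nestruo, msx=False))  # 14 11
```

The 3-cycle saving comes entirely from LD L,H (4 T-states) replacing LD L,0 (7 T-states), so it is the same with or without the MSX wait states.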

By santiontanon

Paragon (1693)

21-09-2022, 19:02

Haha, awesome solutions! And some are not that crazy, haha. Those spread-out LUTs might even be feasible in some demos with constrained values :)

By Grauw

Ascended (10633)

23-09-2022, 21:13

Haha, those are some great solutions indeed! bore takes the win in my book :). Approaching the problem from a different angle like that can definitely lead to much faster algorithms; I certainly wouldn't dare call it impractical without knowing the precise intended application.

By albs_br

Champion (445)

01-10-2022, 15:38

theNestruo wrote:

If you are using Z80 Assembly meter, there is a z80-asm-meter.platform setting. Set it to msx.

Thanks @theNestruo, it worked like a charm. No standard Z80 times anymore!
