Direct Video Memory Access (DVMA) for V9938

Page 3/6

By flyguille

Prophet (3028)

22-10-2010, 19:11

The good thing about this new hardware is that you don't need to write new software for it.

What I mean is that this is not a new VDP chip; it is the same one, so there is nothing new to learn.

So take your existing video game that had speed problems and let it detect whether the MSX has this modification. If it does, use shortcuts in the gameplay redraw routine to gain some speed.

Existing software needs only small modifications anyway.

--------------------------------------------------------------------------------------

Thinking about it some more...

NO! It is the same...

because

A) What are you using to map VRAM into the CPU address space, a slot/subslot address? OK, then only about one page is selectable, because you also want CODE and DATA (graphics) available to the CPU. Look at the typical video game:

Page 1 and page 2 hold the ROM with the game data, and page 3 is system data. What is left? Page 0, because you will not use the BIOS anyway. So... maybe you can trash page 3... blah blah blah.

Anyway, the hole is about 16K from the CPU's point of view.

So, from a hardware point of view, you can't map all of VRAM into the CPU space in one shot; you can map 16K of it.

That means you need a mapper to page the VRAM. A normal 74LS670 mapper will do, reusing registers FC-FF for the sake of standardization... OK, or any other I/O port.

What happens then? You have to split the address anyway in order to set up the PAGE before reading or writing VRAM!

And that is assuming the SLOT/SUBSLOT is already set up and does not change!
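
To make the paging overhead concrete, here is a minimal Python sketch of the address split described above. It assumes a 16 KB CPU window and 128 KB of VRAM; the function name `split_vram_address` and the `window_base` parameter are made up for illustration and do not correspond to any real hardware register.

```python
PAGE_SIZE = 0x4000   # 16 KB window visible to the CPU (assumption from the post)
VRAM_SIZE = 0x20000  # 128 KB of VRAM, as on a typical MSX2 (assumption)

def split_vram_address(addr, window_base=0x0000):
    """Split a linear VRAM address into (mapper page, CPU address).

    window_base is the hypothetical CPU address where the 16 KB
    VRAM window would be mapped (0x0000 would be page 0).
    """
    if not 0 <= addr < VRAM_SIZE:
        raise ValueError("VRAM address out of range")
    page = addr >> 14                # value to write to the page register
    offset = addr & (PAGE_SIZE - 1)  # position inside the 16 KB window
    return page, window_base + offset

page, cpu_addr = split_vram_address(0x1A123)
print(page, hex(cpu_addr))
```

Every access that may cross a 16 KB boundary has to repeat this shift-and-mask step before touching VRAM, which is the overhead that replaces the port 99 address setup.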

UNLESS we are talking about MSX1 only... but MSX1 doesn't need the speed anyway: all its games are just SCREEN 2 with patterns, scrolling is just 768 bytes to write, and updating a sprite's x, y, pattern and color is nothing.

UNLESS you are thinking of fixing a lot of badly ported Speccy games that use software sprites.

So it really doesn't make up for the VRAM pointer overhead, because on MSX2 you have an overhead anyway (the mapper).

What would it be useful for on MSX2? I can think of only one application: helping the VDP commands, doubling their speed.

Take SymbOS, for example: a text window needs to scroll its whole contents by one text line.

You set up a VDP command to do one half and, at the same time, the CPU does the other half. That kind of gain in scroll speed...

The same applies to large software sprite animations on a bitmap...

CONCLUSIONS: It doesn't reduce the overhead, but it can help with smart programming.

Anyway, has anybody run a VDP command and, at the same time, helped it along with normal I/O VRAM writes? Is it possible?

By PingPong

Prophet (3889)

22-10-2010, 23:15

flyguille wrote: "Anyway, has anybody run a VDP command and, at the same time, helped it along with normal I/O VRAM writes? Is it possible?"

@flyguille, yes. It's possible without data corruption.

I've done a test showing that, while the VDP is executing memory copy commands (both byte and logical moves), you can do I/O on port 98 at the usual speed (the speed you would use with no command in progress). The demo uses OTIR or blocks of 64 OUTI instructions to move RAM to VRAM while an LMMC or HMMC command is in progress.
To make sure there were no problems, I also arranged 4 rows of 8 horizontally aligned 16x16 sprites in screen 5.
The test worked fine in 50/60 Hz modes, with no data corruption at all (on a standard Philips NMS 8245).

This makes me think that the VDP reserves a fixed amount of time for data I/O and another fixed amount of time for commands.
A better design would dynamically give the Z80 I/O time to commands when no data I/O is in progress. Unfortunately that is not the case: if you don't use the Z80 time, it is simply lost.

By flyguille

Prophet (3028)

23-10-2010, 05:24

IN SUMMARY
----------

Thinking about gaming and what can be improved...

a) The overhead is not the answer (you are swapping an I/O overhead for a PAGING overhead). OK?

The fact is that in most cases the game has already calculated the relative address in [HL], thinking in 64K blocks.

So you need to extract the page selector value and the offset from it, which is much like the I/O-based overhead.

Of course, it can be a good improvement in games such as 3D simulations.

If you split the drawing into 16K blocks so that the paging is done just once per block (if the render routine allows that), there is your speed improvement, PROVIDED YOU NEED random accesses inside each 16K block without repaging.

Handling the [HL] register with additions and subtractions is faster if you are redrawing while skipping pixels that will not change anyway.

Of course, one operation that is impossible with I/O is GOING BACKWARD, such as copying a block in reverse; with the Z80 you can use LDDR.

Anyway, with direct mapping you run into the address bus space, which is limited to 16 bits/64K... when you already use 32K for the game and 16K for the system.

But maybe, in a smart move, you can take pages 2 and 3 for code and data and pages 0 + 1 for VRAM, using a low-res mode like screen 5, which is 32KB per viewable page.

Under those circumstances you avoid VRAM paging altogether.

---------------------------------------------------------------------------

Now, is this enough of an improvement in hardware capabilities to justify the cost?

----------------------------------------------------------------------------

I think that if this modification is doable as an upgrade for our MSXs, then something else is doable too:

A FASTER EXTERNAL VDP COPROCESSOR able to do basic things without Z80 intervention.

1) Fill VRAM with a value.

That simple function is sooooo important in 3D rendering because, for example, you lose a lot of time with the slow internal VDP commands: every time you redraw a 3D scene you need a clean area.

Just that simple stuff.

By PingPong

Prophet (3889)

23-10-2010, 11:06

@flyguille: I think the best approach to solving the VRAM issues is this:
1) faster VRAM
2) the ability to page it in and out of the CPU address space for specific operations
3) a real VRAM coprocessor that works very fast on this VRAM

The CPU can, if needed, map VRAM into its address space, but most operations should be executed by (3), very fast, with block move operations oriented to rectangular and/or linear areas. Using LDIR, LDDR and so on is not always desirable.
This way the V9938 or TMS9918 still has access to VRAM, but it serves merely as a display device.

MSX2 VRAM is too big to allow a fast memory mapping scheme. (Or, looking at it the other way, the Z80's addressing capability is too small.)

On MSX1, however, things could be different (and interesting).

By PingPong

Prophet (3889)

23-10-2010, 16:35

The Sega Genesis (?) VDP approach is a little faster. While the VDP is always addressed in the same manner, the I/O ports are memory mapped.
So when the 68000 wants to write a value to VRAM, it does so in a way similar to this (in Z80 asm):

; write the value contained in A to the address pointed to by HL

LD (__VRAMPTR), HL
LD (__VRAMDATA),A

instead of the MSX way:

push af
ld a,l
out (__VRAMPTR),a
ld a,h
out (__VRAMPTR),a
pop af
out (__VRAMDATA),a

The first is a bit faster because it uses 16-bit addressing instead of a lot of 8-bit instructions.
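
As a rough illustration of the difference, the Python sketch below adds up the standard Z80 T-states for the two sequences. It assumes a hypothetical Z80 system with memory-mapped VDP registers for the first sequence, and the usual MSX behaviour of one extra wait state per M1 (opcode fetch) cycle. It counts CPU cycles only and ignores any VDP access timing.

```python
# Each entry: (mnemonic, standard Z80 T-states, number of M1 opcode fetches)
MEMORY_MAPPED = [
    ("ld (nn),hl", 16, 1),
    ("ld (nn),a", 13, 1),
]

PORT_IO = [
    ("push af", 11, 1),
    ("ld a,l", 4, 1),
    ("out (n),a", 11, 1),
    ("ld a,h", 4, 1),
    ("out (n),a", 11, 1),
    ("pop af", 10, 1),
    ("out (n),a", 11, 1),
]

def t_states(sequence, m1_wait=1):
    """Total T-states, adding m1_wait cycles for every M1 fetch (MSX adds 1)."""
    return sum(cycles + fetches * m1_wait for _, cycles, fetches in sequence)

for name, seq in (("memory mapped", MEMORY_MAPPED), ("port I/O", PORT_IO)):
    print(name, t_states(seq, m1_wait=0), t_states(seq, m1_wait=1))
```

Under these assumptions the memory-mapped pair costs less than half the cycles of the port sequence, mostly because the accumulator shuffling and the PUSH/POP disappear.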

By flyguille

Prophet (3028)

24-10-2010, 20:44

Yeah, that would be great... a simple, small PCB pluggable into the existing VRAM sockets, capable of doing a lot of things faster.

So you would boost every existing MSX machine with an amazing external 2D accelerator co-VDP.

Now, there is a problem: not all MSX machines use the same sockets for VRAM, but many of them use 4x 41464 (IIRC) DRAM chips.

And then you need an extra connection straight to the Z80 for direct VRAM access and for controlling the external co-VDP...

I think it can be done with just flat cables with terminators, so the mounting is just soldering 3 flat cables in a line.

What would it have on board? SRAM and an SMD microcontroller, and that is all.

By hit9918

Prophet (2923)

24-10-2010, 22:57


PingPong wrote:

push af
ld a,l
out (__VRAMPTR),a
ld a,h
out (__VRAMPTR),a
pop af
out (__VRAMDATA),a

What about this:

ld c,0x99

out (c),l
out (c),h
out (0x98),a

It uses the C register, and it could work in a loop that adds offsets to HL, with the write bit in H (bit 6) already set for VRAM write.

Somewhere I read that you should also wait between OUT 0x99 and OUT 0x98. And the BIOS does a strange 2x EX (SP),HL even though the port 99 setup has already done EI + RET.

But OUT 0x99 needs only 2 microseconds of "VDP delay", which is 7.14 cycles at 3.57 MHz, rounded up to 8 cycles. And that is exactly the 5-cycle opcode fetch plus the 3-cycle immediate fetch of the Z80's OUT (n),A.
I haven't tested this on a real machine. blueMSX does not complain, but it did complain about OUTI NOP NOP OUTI, which is 1 cycle too fast.
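
The arithmetic behind that claim can be checked with a short Python sketch. It assumes the standard MSX Z80 clock of 3,579,545 Hz (which gives about 7.16 cycles rather than 7.14, the same after rounding up) and the machine-cycle breakdown of OUT (n),A given in the post; the helper names are made up for illustration.

```python
import math

CPU_HZ = 3_579_545  # standard MSX Z80 clock (assumption), about 3.58 MHz

def us_to_cycles(microseconds, cpu_hz=CPU_HZ):
    """Convert a delay in microseconds to (fractional) CPU cycles."""
    return microseconds * cpu_hz / 1_000_000

# Machine cycles of OUT (n),A on MSX, with the extra M1 wait state included.
OUT_N_A = [
    ("opcode fetch (M1 + 1 wait)", 5),
    ("immediate operand fetch", 3),
    ("I/O write", 4),
]

def cycles_before_io(machine_cycles):
    """Cycles that pass before the final I/O machine cycle starts."""
    return sum(length for _, length in machine_cycles[:-1])

needed = math.ceil(us_to_cycles(2))   # cycles required for a 2 us delay
available = cycles_before_io(OUT_N_A) # cycles OUT (n),A spends before its I/O
print(needed, available, available >= needed)
```

The same conversion puts the 8 microsecond figure quoted for read setup at 29 cycles after rounding up.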

By flyguille

Prophet (3028)

25-10-2010, 01:34

The strange EX (SP),HL was there because some MSX test machines ran at 6 MHz.

IIRC the slowest timing is the MSX1 TMS, and it needs just two NOPs (8 CPU cycles), or a NOP + EI, which is the same timing at 3.57 MHz.

By hit9918

Prophet (2923)

25-10-2010, 17:53

flyguille wrote: "The strange EX (SP),HL was there because some MSX test machines ran at 6 MHz."

My mistake, I had been looking at VRAM READ code!
In that case you have to wait 8 microseconds before the IN 98.
But no waiting is needed before OUT 98 when you put it in write mode.

That is because, when setting up for a read, the VDP actually has to read the first byte from VRAM into an internal buffer. It does that one read without any port 98 action! Since the VDP cannot delay the IN instruction, it has to have the byte ready in an internal buffer when you do an IN 98.

On some TI-99 forum I read that "write mode" is actually "prefetch inhibit": it disables that read of the first byte. There isn't really a write mode and a read mode, just this prefetch inhibit at address setup, so you can mix reads and writes without a port 99 setup. BUT you'd better not do this on MSX, which has so many different VDP implementations.

So, in write mode, you can do the OUT 98 right after the OUT 99. The byte ends up in the internal buffer. Now you have to wait 8 microseconds until you can be sure that the VDP has meanwhile had a VRAM access slot, flushed that internal byte, and is ready to take another OUT.
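
The buffer behaviour described above can be modelled with a small Python toy. It is only a sketch of the "prefetch inhibit" idea as reported in the post, not a verified description of any particular VDP: it has 16 KB of VRAM, ignores all timing and register writes, and the `ToyVdp` class and its method names are invented for illustration.

```python
class ToyVdp:
    """Toy model of the port 98/99 read-ahead buffer (no timing, no registers)."""

    def __init__(self, size=0x4000):
        self.vram = bytearray(size)
        self.addr = 0
        self.buffer = 0    # the internal one-byte buffer
        self.latch = None  # first byte of a two-byte port 99 write

    def _advance(self):
        self.addr = (self.addr + 1) % len(self.vram)

    def _prefetch(self):
        # Read one byte into the buffer; on real hardware this takes time.
        self.buffer = self.vram[self.addr]
        self._advance()

    def out99(self, value):
        if self.latch is None:
            self.latch = value
            return
        low, self.latch = self.latch, None
        self.addr = ((value & 0x3F) << 8) | low
        if not value & 0x40:  # bit 6 clear: "read mode", so prefetch
            self._prefetch()
        # bit 6 set: "write mode", modelled here as prefetch inhibit

    def in98(self):
        value = self.buffer  # the byte was already waiting in the buffer
        self._prefetch()
        return value

    def out98(self, value):
        self.buffer = value  # the byte lands in the buffer first...
        self.vram[self.addr] = value  # ...and is flushed to VRAM later
        self._advance()

    def set_address(self, addr, write):
        self.out99(addr & 0xFF)
        self.out99(((addr >> 8) & 0x3F) | (0x40 if write else 0))


vdp = ToyVdp()
vdp.set_address(0x10, write=True)
vdp.out98(0xAA)
vdp.out98(0xBB)
vdp.set_address(0x10, write=False)
first, second = vdp.in98(), vdp.in98()
print(hex(first), hex(second))
```

In this model the only difference between the two setups is whether `_prefetch` runs at address setup, which is where a real chip would need the extra delay before the first IN 98.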

Again, I haven't tested this on a real machine, but I am 99% sure it won't break on any VDP implementation. The BIOS doesn't do the EX (SP),HL delay when writing in LDIRVM either.

Still, it is crazy that an MSX2 BIOS ends up with delays meant for an MSX1 VDP with a 6 MHz Z80 that doesn't exist. It does have faster code, but for modes below screen 4 the debugger shows it ending up in an old, slow MSX1 version, which is slower than even the MSX1 VDP requires.

By hit9918

Prophet (2923)

25-10-2010, 18:08

Oh, blueMSX does not warn about too-fast VRAM access with this one:

ld hl,0x0000
ld c,0x99
out (c),l
out (c),h
in a,(0x98)

It should warn about the missing 8 microsecond delay between the port 99 and port 98 accesses (regardless of whether the port 98 access is an IN or an OUT), but only when the VDP was set up for read mode.
