MSX 2 Direct VRAM access speeds?


By Sandy Brand

Champion (309)

03-04-2022, 14:50

Hey everyone,

I noticed that in one of my programs there was sometimes sprite flicker that I could not explain.
After some investigation I figured out that I was writing data too fast. My assumption has always been that a 17 T-state delay should be enough, but testing on actual hardware shows this assumption to be false. E.g.:

OUT (#98),A    ; 12 T-states
NOP            ;  5 T-states
OUT (#98),A    ; 17 T-states delay is not enough!

My speculation is that the internal destination VRAM address of the VDP is correctly incremented, but sometimes the byte that is actually written into VRAM somehow gets mutilated.

I have been trying to find some 'scientific' information with clear metrics on this, but have not been able to find anything conclusive (e.g.: VRAM_access_speed on www.msx.org, or vdp-timings part 1 and 2 on map.grauw.nl).

I have written a test program that runs a couple of tests (see below) and ran it on some actual hardware. It seems that a 17 T-state delay is only safe inside V-blank; outside of it, the delay needs to be 18 T-states or more (e.g.: just use OUTI).
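
The 17 versus 18 T-state difference can be checked by adding up instruction timings. Below is a minimal Python sketch, assuming the commonly quoted MSX timings (documented Z80 T-states plus one wait state per M1 cycle). `MSX_TSTATES` and `write_gap` are illustrative names, not part of the test program.

```python
# T-states per instruction on a standard 3.58 MHz MSX: documented Z80 timing
# plus the one wait state the MSX inserts in every M1 (opcode fetch) cycle.
# Assumption: these are the commonly quoted figures, not measured here.
MSX_TSTATES = {
    "OUT (n),A": 11 + 1,
    "NOP": 4 + 1,
    "INC HL": 6 + 1,
    "OUTI": 16 + 2,  # ED-prefixed, so two M1 cycles and two wait states
}


def write_gap(out_instr, fillers=()):
    """T-states between two successive VRAM writes when `out_instr` is
    repeated with the `fillers` instructions executed in between."""
    return MSX_TSTATES[out_instr] + sum(MSX_TSTATES[f] for f in fillers)


print(write_gap("OUT (n),A", ["NOP"]))     # 17
print(write_gap("OUTI"))                   # 18
print(write_gap("OUT (n),A", ["INC HL"]))  # 19
```

Under these assumptions, back-to-back OUTI instructions are the shortest sequence that reaches the 18 T-state gap.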

Some results:

NMS 8245 (V9938), artifacts occur with 17 T-states if:

  • 50Hz, Outside V-blank, not executing a VDP command.
  • 50Hz, Outside V-blank, while executing a VDP command.
  • 60Hz, Outside V-blank, while executing a VDP command.

NMS 8255 (but was upgraded to MSX 2+ so it has a V9958), artifacts occur with 17 T-states if:

  • 50Hz, Outside V-blank, not executing a VDP command.
  • 60Hz, Outside V-blank, while executing a VDP command.

These results are a bit puzzling, because only some combinations of 50Hz/60Hz and running an HMMM VDP command or not seem to cause glitches.

It could be that my hardware is just very old? (Especially my NMS 8255 has seen a lot of usage over the years.)

Could other people also try this test and see what results they get? :) (would be interesting to see what happens on actual MSX 2+ machines, for example?)

BTW: tried the test on latest openMSX (17.0) and the glitches are not visible there, nor does it warn of too-fast VRAM access, which could be an emulation inaccuracy?

Below you will find the test itself, which is basically a BASIC program with some assembly code to run the timing critical parts.
(also see the attached screenshot for what to look for).

Direct VRAM access timing test:

10 REM Direct VRAM I/O timing tests. By Sandy Brand (2022)
20 COLOR15,4,0:SCREEN 5,2:SET PAGE 0,0:COLOR=RESTORE:OPEN "GRP:" FOR OUTPUT AS #1
40 RESTORE 810:P$="":FOR I=1 TO 32:READ D:P$=P$+CHR$(D):NEXT I
50 SPRITE$(0)=P$
60 RESTORE 910:P$="":FOR I=1 TO 32:READ D:P$=P$+CHR$(D):NEXT I
70 FOR I=1 TO 63:SPRITE$(I)=P$:NEXT I
100 FOR I=0 TO 7:PUT SPRITE I,(120,120),15-I,0:NEXT I
200 RESTORE 1010
210 READ SZ
220 FOR I=1 TO SZ:READ D:POKE &H9000+I-1,D:NEXT I
230 D=BASE(28):POKE &H9000,D AND 255:POKE &H9001,D/256
240 FR=5*60:POKE &H9005,FR AND 255:POKE &H9006,FR/256
250 DEF USR0=&H9007
300 HZ=50
310 TN$="1A":VB=1:CM=0:DL=17:GOSUB 500
320 TN$="1B":VB=0:CM=0:DL=17:GOSUB 500
330 TN$="1C":VB=1:CM=1:DL=17:GOSUB 500
340 TN$="1D":VB=0:CM=1:DL=17:GOSUB 500
350 TN$="1E":VB=0:CM=1:DL=18:GOSUB 500
360 TN$="1F":VB=0:CM=1:DL=19:GOSUB 500
400 HZ=60
410 TN$="2A":VB=1:CM=0:DL=17:GOSUB 500
420 TN$="2B":VB=0:CM=0:DL=17:GOSUB 500
430 TN$="2C":VB=1:CM=1:DL=17:GOSUB 500
440 TN$="2D":VB=0:CM=1:DL=17:GOSUB 500
450 TN$="2E":VB=0:CM=1:DL=18:GOSUB 500
460 TN$="2F":VB=0:CM=1:DL=19:GOSUB 500
490 COLOR15,0,0:END
500 REM TN$: Test name
510 REM HZ: 50 Hz (PAL) or 60 Hz (NTSC)
520 REM VB: 0 = outside Vblank, 1 = inside VBlank
530 REM CM: 0 = no VDP command, 1 = run VDP command while writing
540 REM DL: 17, 18 or 19 (T-States delay)
550 CLS
560 PSET(0,30),0:PRINT #1,"TEST: "+TN$+"      (run"+STR$(FR)+" frames)"
570 PSET(0,40),0:IF HZ=50 THEN VDP(10)=2:PRINT #1,"50 Hz" ELSE VDP(10)=0:PRINT #1,"60 Hz"
580 POKE &H9002,VB:PSET(0,50),0:IF VB=0 THEN PRINT #1,"Outside V-blank" ELSE PRINT #1,"Inside V-blank"
590 POKE &H9003,CM:PSET(0,60),0:IF CM=0 THEN PRINT #1,"No VDP command" ELSE PRINT #1,"Run VDP command while writing"
600 POKE &H9004,DL:PSET(0,70),0:PRINT #1,"T-States delay:"+STR$(DL)
610 X=0:A$="sprite":GOSUB640
620 X=126:A$="cross":GOSUB640
630 A=USR(0):RETURN
640 PSET(X,150),0:PRINT #1,"^  See":PSET(X,160),0:PRINT #1,"|  "+A$:PSET(X,170),0:PRINT #1,"|  here?":RETURN
800 REM Sprite pattern = Square.
810 DATA 255,128,128,128,128,128,128,128,128,128,128,128,128,128,128,255
820 DATA 255,1,1,1,1,1,1,1,1,1,1,1,1,1,1,255
900 REM Sprite pattern = Cross
910 DATA 128,64,32,16,8,4,2,1,1,2,4,8,16,32,64,128
920 DATA 1,2,4,8,16,32,64,128,128,64,32,16,8,4,2,1
1000 Rem Assembly code
1010 DATA 313
1020 DATA 0,0,0,0,17,1,0,243
1030 DATA 33,159,253,17,26,145,1,5
1040 DATA 0,237,176,33,66,144,17,159
1050 DATA 253,1,3,0,237,176,251,237
1060 DATA 75,5,144,237,91,158,252,42
1070 DATA 158,252,167,237,82,40,248,11
1080 DATA 121,176,32,239,243,33,26,145
1090 DATA 17,159,253,1,5,0,237,176
1100 DATA 251,201,195,75,144,219,153,165
1110 DATA 32,251,201,14,155,62,2,211
1120 DATA 153,62,143,211,153,58,2,144
1130 DATA 167,32,5,46,64,205,69,144
1140 DATA 58,3,144,167,40,49,33,54
1150 DATA 145,126,60,230,15,71,135,135
1160 DATA 135,135,176,119,33,46,145,62
1170 DATA 36,211,153,62,145,211,153,6
1180 DATA 11,237,179,46,1,205,69,144
1190 DATA 33,31,145,62,32,211,153,62
1200 DATA 145,211,153,6,15,237,179,42
1210 DATA 0,144,124,15,15,230,3,211
1220 DATA 153,62,142,211,153,125,211,153
1230 DATA 124,230,63,246,64,211,153,175
1240 DATA 211,153,62,144,211,153,175,211
1250 DATA 154,62,3,211,154,6,8,58
1260 DATA 4,144,254,19,40,43,254,18
1270 DATA 40,17,62,120,211,152,0,211
1280 DATA 152,175,211,152,0,211,152,16
1290 DATA 241,24,38,1,152,32,33,237
1300 DATA 144,237,163,237,163,237,163,237
1310 DATA 163,32,243,24,20,120,120,0
1320 DATA 0,62,120,211,152,35,211,152
1330 DATA 62,0,211,152,43,211,152,16
1340 DATA 240,62,216,211,152,0,175,211
1350 DATA 153,62,143,211,153,175,211,153
1360 DATA 62,144,211,153,175,211,154,175
1370 DATA 211,154,0,0,0,0,0,2
1380 DATA 0,0,0,0,0,0,0,254
1390 DATA 0,8,0,0,0,208,254,0
1400 DATA 0,0,2,0,8,0,0,0
1410 DATA 192

If you run the test, you should look for sprite glitches whereby a sprite is sometimes visible on the left side of the screen (or anywhere really), or sometimes the center sprite shows a cross (although this seems to be quite rare). Either one of these glitches means a sprite's X, Y and/or pattern number has been garbled while writing it to VRAM.


Edit: Hmm, for some reason the image tag doesn't work, try this instead: https://www.msx.pics/image/yH8cD

By mcolom

Champion (320)

03-04-2022, 16:33

That's interesting. I remember some time ago we compared openMSX with a real machine with a V9918. The Karateka game was used in the tests.
Both had wrong graphics due to the too-fast VDP writes, but the emulator showed images way more corrupted compared to the real machine. You seem to observe the inverse effect.

By Metalion

Paragon (1628)

03-04-2022, 18:24

Sandy Brand wrote:

Been trying to find some 'scientific' information with some clear metrics on this but have not been able to find anything conclusive

http://map.grauw.nl/articles/vdp_tut.php#vramtiming

Minimum VRAM access timings in 3.58 MHz Z80 cycles

Screen mode         VDP mode    TMS9918  V9938 / V9958
screen 0, width 40  TEXT 1      12       20
screen 0, width 80  TEXT 2      -        20
screen 1            GRAPHIC 1   29       15
screen 2            GRAPHIC 2   29       15
screen 3            MULTICOLOR  13       15
screen 4            GRAPHIC 3   -        15
screen 5            GRAPHIC 4   -        15
screen 6            GRAPHIC 5   -        15
screen 7            GRAPHIC 6   -        15
screen 8            GRAPHIC 7   -        15
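
To put the table's cycle counts into time units, here is a small Python sketch. It assumes the standard MSX CPU clock of 3,579,545 Hz. `MIN_CYCLES`, `gap_us` and `is_slow_enough` are made-up names for illustration, and only a few rows of the table are copied in.

```python
CPU_HZ = 3_579_545  # assumed standard MSX Z80 clock (the "3.58 MHz" above)

# Minimum cycles between VRAM accesses, copied from a few rows of the table.
# None marks a mode the TMS9918 does not have.
MIN_CYCLES = {
    "TEXT 1": {"TMS9918": 12, "V9938": 20},
    "GRAPHIC 1": {"TMS9918": 29, "V9938": 15},
    "GRAPHIC 4": {"TMS9918": None, "V9938": 15},  # screen 5
}


def gap_us(cycles):
    """Duration of `cycles` CPU clock cycles in microseconds."""
    return cycles * 1_000_000 / CPU_HZ


def is_slow_enough(gap_cycles, mode, vdp="V9938"):
    """True if a gap of `gap_cycles` meets the table's minimum for `mode`."""
    limit = MIN_CYCLES[mode][vdp]
    if limit is None:
        raise ValueError(f"{vdp} has no {mode} mode")
    return gap_cycles >= limit


print(round(gap_us(15), 2))             # 4.19
print(is_slow_enough(17, "GRAPHIC 4"))  # True
```

By the table alone, a 17-cycle gap in screen 5 passes, which is why the glitches reported in this thread are surprising.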

By Sandy Brand

Champion (309)

03-04-2022, 18:36

@Metalion: Thanks for the info :)

I guess 'Z80 cycles' is meant to be interpreted as Z80 T-states?

Still, 15 T-states seems to be valid only in best-case scenarios?

By Bengalack

Paladin (802)

03-04-2022, 20:06

I don’t have the answer, but I’ve found that openMSX allows faster access than my physical computers do. Reports are in this issue: https://github.com/openMSX/openMSX/issues/1402

My case above calls into question this line from the link Metalion provided: “Finally, during vertical blanking or when the screen is disabled, there is no speed limit.”

By Grauw

Ascended (10821)

03-04-2022, 21:30

Sandy Brand wrote:

I guess 'Z80 cycles' is meant to be interpreted as Z80 T-states?

It is meant to be interpreted as Z80 clock cycles.

I have also experienced in the past that if you write to VRAM very quickly while executing a command, a point of contention occurs on assigning the access slot to either the CPU VRAM access or the command engine, which are both requesting one. The exact details of how that contention is resolved, and why and how the timing plays into it, are still unknown though. More information is needed.

The numbers quoted by Metalion above do not consider that contention. They only consider the maximum time to a free access slot.

One theory is that if both the CPU and the command engine request an access slot, then they use up two consecutive slots. In that case the minimum access slot time would be 150 VDP cycles though (25 CPU cycles), and both you and I experienced that you can get away with less. So that theory’s probably not a good one. It seems like CPU VRAM access can stall the command engine, steal slots from it several times in a row, but not reliably when done beyond a certain speed.
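
The conversion between VDP cycles and CPU cycles used above can be sketched in a few lines of Python. The factor of 6 follows from the 150 to 25 figure in the paragraph above; the function names are only illustrative.

```python
VDP_CYCLES_PER_CPU_CYCLE = 6  # implied by "150 VDP cycles (25 CPU cycles)"


def vdp_to_cpu(vdp_cycles):
    """Convert a duration in VDP clock cycles to CPU clock cycles."""
    return vdp_cycles / VDP_CYCLES_PER_CPU_CYCLE


def cpu_to_vdp(cpu_cycles):
    """Convert a duration in CPU clock cycles to VDP clock cycles."""
    return cpu_cycles * VDP_CYCLES_PER_CPU_CYCLE


print(vdp_to_cpu(150))  # 25.0
for gap in (17, 18, 19):
    # The write gaps tested in this thread, expressed in VDP cycles.
    print(gap, cpu_to_vdp(gap))  # 102, 108 and 114 respectively
```

All three tested gaps are well below 150 VDP cycles, which is the point made above: the two-consecutive-slots theory would predict failures at speeds that work in practice.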

By Sandy Brand

Champion (309)

03-04-2022, 21:46

@Grauw: Thanks for the info :)

I was always under the impression that T-states were synchronized to the Z80 clock-cycles, or do I have that wrong? :) (z80-instruction-timing.html)

“T-state” is equivalent with a clock cycle.

The interesting thing about my experiments though is that I also get garbled VRAM data without running a VDP command? That is the part that puzzles me the most :)

By Grauw

Ascended (10821)

03-04-2022, 22:26

Sandy Brand wrote:

I was always under the impression that T-states were synchronized to the Z80 clock-cycles, or do I have that wrong? :) (z80-instruction-timing.html)

“T-state” is equivalent with a clock cycle.

Their duration is the same, but Zilog uses the term T-cycle to refer to specific named cycles in an instruction M-cycle, such as T1, T2, … T6. See the Timing section in the Z80 manual.

Not sure where the term T-state comes from, possibly it’s older Zilog terminology. The current manual mentions the term exactly once. I did learn the term in the past, but nowadays I use the more general term (clock) cycle since it’s commonly used in CPUs, VDPs and sound chips.

Sandy Brand wrote:

The interesting thing about my experiments though is that I also get garbled VRAM data without running a VDP command? That is the part that puzzles me the most :)

Yes, indeed. I haven’t experienced that. In my case the issues appeared once I started doing a VDP command in parallel, before that it was fine. But, I haven’t tested comprehensively, I had the specific situations in my code and did not produce a narrow test case to exhaustively test the details of the behaviour.

I do know that 18 cycles must be safe in combination with command execution, because I’ve used OUTI many times with the V9938 and never had issues.

Re. your varying results between 50 and 60 Hz: in terms of timing, 50 Hz is identical to 60 Hz, just with a longer vertical blanking period. The V9938 and V9958 are also identical in terms of timing. I would be extremely surprised if frequency or 38 / 58 would truly make a difference. Age of the hardware also does not matter.

So my first thought is (a bit lame but worth checking): are you sure the test was correct?

And my second thought is: between 50 and 60 Hz, does the test run at the exact same moment of display, considering that the vertical blanking takes more time? The command will complete sooner during vertical blanking, too.

My third thought is: what could be different between your computers is whether the CPU clock and the VDP clock are sourced from the same crystal (and thus run exactly in tandem), or from different crystals.
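
For reference, the frame timing of the two modes can be worked out with a short Python sketch. It assumes the commonly documented V9938 figures (228 CPU cycles per line; 313 lines per frame at 50 Hz and 262 at 60 Hz, non-interlaced) and the 212 active lines of screen 5. All names are illustrative.

```python
CPU_CYCLES_PER_LINE = 228             # assumed: 1368 VDP cycles per line / 6
LINES_PER_FRAME = {50: 313, 60: 262}  # assumed: non-interlaced PAL / NTSC
ACTIVE_LINES = 212                    # assumed: screen 5 display height


def frame_cycles(hz):
    """CPU cycles in one full frame."""
    return LINES_PER_FRAME[hz] * CPU_CYCLES_PER_LINE


def non_display_lines(hz):
    """Lines per frame outside the active display (borders plus blanking)."""
    return LINES_PER_FRAME[hz] - ACTIVE_LINES


def non_display_cycles(hz):
    """CPU cycles per frame spent outside the active display."""
    return non_display_lines(hz) * CPU_CYCLES_PER_LINE


for hz in (50, 60):
    print(hz, frame_cycles(hz), non_display_lines(hz), non_display_cycles(hz))
# 50 71364 101 23028
# 60 59736 50 11400
```

With these figures, the per-line timing is the same in both modes and only the number of non-display lines differs, so a fixed amount of work started at the same line ends at a different point in the frame.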

By Sandy Brand

Champion (309)

03-04-2022, 23:33

Grauw wrote:

I do know that 18 cycles must be safe in combination with command execution, because I’ve used OUTI many times with the V9938 and never had issues.

Yes, that's what my tests have also shown me.
But the table that Metalion provided mentions 15? So this table might not be entirely correct (maybe it is only valid inside V-blank?)

Grauw wrote:

So my first thought is (a bit lame but worth checking): are you sure the test was correct?

:D
Well, this is why I wrapped it into an easy-to-use BASIC program: I would very much appreciate it if other people could also run it on actual hardware, and (hopefully) prove me wrong. Because it is puzzling to say the least :)

Grauw wrote:

And my second thought is: between 50 and 60 Hz, does the test run at the exact same moment of display, considering that the vertical blanking takes more time? The command will complete sooner during vertical blanking, too.

I run a 254 * 8 HMMM copy command just before I write the sprite data, and I am only writing sprite attributes for 8 sprites, so I am more than confident that the VDP command will not be completed too early, even when run inside the V-blank. Also, the test changes palette color 0 before and after writing sprite data, so you can visually inspect that it does indeed run inside or outside the V-blank.

Grauw wrote:

My third thought is: what could be different between your computers is whether the CPU clock and the VDP clock are sourced from the same crystal (and thus run exactly in tandem), or from different crystals.

Hmmm, that is an interesting one indeed. Is there any way to know this?
The NMS 8255 has been upgraded to an MSX 2+, so it has had some modifications in the past, although I never experienced any issues with it. The NMS 8245 is still the same as far as I know.

By mcolom

Champion (320)

06-04-2022, 21:10

I've tried on a real Sony HB-F1XDmk2:
1A: no cross
1B: flashing cross
1C, 1D, 1E, 1F: no cross
2A, 2B, 2C: no cross
2D: flashing cross
2E, 2F: no cross
I didn't observe the left square in any of the tests.
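
To read these test IDs without going back to the listing, here is a small Python sketch that rebuilds the test matrix from lines 300-460 of the BASIC program in the first post. `TESTS` and `describe` are illustrative names.

```python
# (inside V-blank, VDP command running, T-state delay) for tests A..F,
# taken from the VB, CM and DL values in the BASIC listing.
VARIANTS = [
    (True, False, 17),   # A
    (False, False, 17),  # B
    (True, True, 17),    # C
    (False, True, 17),   # D
    (False, True, 18),   # E
    (False, True, 19),   # F
]

TESTS = {}
for prefix, hz in (("1", 50), ("2", 60)):
    for letter, variant in zip("ABCDEF", VARIANTS):
        TESTS[prefix + letter] = (hz,) + variant


def describe(test_id):
    """Human-readable description of a test ID such as '1B'."""
    hz, vblank, command, delay = TESTS[test_id]
    where = "inside" if vblank else "outside"
    cmd = "VDP command running" if command else "no VDP command"
    return f"{hz} Hz, {where} V-blank, {cmd}, {delay} T-states"


for test_id in ("1B", "2D"):  # the two tests reported as 'flashing cross'
    print(test_id, "->", describe(test_id))
# 1B -> 50 Hz, outside V-blank, no VDP command, 17 T-states
# 2D -> 60 Hz, outside V-blank, VDP command running, 17 T-states
```

Decoded this way, the two flashing-cross cases are the same two combinations listed in the first post for the NMS 8255.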

By snake

Expert (71)

06-04-2022, 21:47

If you change the visible screen content during the refresh, there will always be glitches. Maybe that is the issue?
