Hatched from "Poll: most challenging game that could be ported to msx in the'80": a discussion of the feasibility of Stunt Car Racer on the MSX 1 that I felt had become technical enough to be off topic for 'General discussion' but on topic for 'Development':
"screen 2 also is a tile mode"
1 byte of nametable can fill 8x8 pixels
this could make superfast polygons
if one manages to sort out the corner cases
In the case of the 8-bit Stunt Car Racer it's actually a relatively easy problem as far as I can make out. I really think all it's doing is picking the half-dozen or so floor segments to draw, performing the actual 3d maths on their top edges, then for each one, from back to front, filling from the bottom of the screen up to its upper boundary and drawing the appropriate lines (i.e. if that top segment is front facing, all of them; otherwise only those attached to the highest edge, and possibly some down the side of the track from any highest edge that is also a side). So the sky is one colour or pattern, the floor and track are another, and the top segment boundaries are added for definition.
So you could arrange it so that the filling task at each floor panel is just: between x1 and x2, fill from the bottom of the screen up to the values in the computed table h[x1...x2], taking h[x] as a screen y that grows downwards. Then any tile column from (x1 + 7) >> 3 up to but not including x2 >> 3 is completely enclosed. For each such column, compute (max(h[c1 ... c8]) + 7) >> 3 over its eight entries to get the first tile boundary at or below the lowest entry in that column, and tile fill from the bottom of the screen up to there. Go tile internal only for the pixels above that boundary, and fill pixel by pixel from the bottom of the screen to the top edge for the ragged columns: those from x1 up to ((x1 + 7) >> 3) << 3, and from (x2 >> 3) << 3 up to x2.
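To make the split concrete, here's a rough Python sketch of how the work might be divided. It's only an illustration under my own assumptions: h[x] is a screen y with 0 at the top, the span is half-open (x1 up to but not including x2), and plan_fill along with its return format is a made-up name, not anything from the original game.

```python
SCREEN_H = 192  # pixel rows in screen 2 (24 tile rows of 8 pixels)

def plan_fill(h, x1, x2):
    """Split a fill of pixel columns x1..x2-1 into tile work and pixel work.

    h[x] is the screen y (0 = top) of the highest filled pixel in column x.
    Returns (solid, partial):
      solid   - (tile_col, first_row): tile rows first_row..23 are fully set
      partial - (x, y_top, y_bottom): a pixel run in column x, y_bottom exclusive
    """
    solid, partial = [], []
    first_full = (x1 + 7) >> 3   # first tile column wholly inside the span
    end_full = x2 >> 3           # one past the last such tile column
    if first_full >= end_full:
        # The span never covers a whole tile column: all per-pixel work.
        for x in range(x1, x2):
            partial.append((x, h[x], SCREEN_H))
        return solid, partial
    # Ragged left edge, filled pixel by pixel to the bottom of the screen.
    for x in range(x1, first_full << 3):
        partial.append((x, h[x], SCREEN_H))
    for c in range(first_full, end_full):
        columns = range(c << 3, (c << 3) + 8)
        lowest = max(h[x] for x in columns)   # lowest point of the top edge
        first_row = (lowest + 7) >> 3         # first tile row that is all fill
        solid.append((c, first_row))
        boundary = first_row << 3
        for x in columns:
            if h[x] < boundary:               # pixels above the solid tiles
                partial.append((x, h[x], boundary))
    # Ragged right edge.
    for x in range(end_full << 3, x2):
        partial.append((x, h[x], SCREEN_H))
    return solid, partial
```

On real hardware the solid entries would turn into nametable writes of a shared all-set tile and only the partial runs would touch pattern data, which is where the saving would come from.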
Then add the lines, which will involve some sort of process for each tile touched of checking whether it is currently one of the special completely solid ones and if so then substituting another. Though I don't appear to have a good instinctive sense for whether it'd be better to have a fixed allocation from screen location to tile or whether to allocate them on demand. The latter could reduce upload costs but either bookkeeping is a hassle or you risk exhaustion given the possibility that you might claim a screen tile for new drawing, draw on it, replace it with a solid one later, then claim it again for new drawing, etc.
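For what the on-demand option might look like, here's a minimal Python sketch of the bookkeeping. Everything in it is hypothetical: the TilePool class, the choice of patterns 0 and 1 as the shared solid and empty tiles, and the pool sizes are just assumptions for illustration. The point is that handing a pattern back whenever a cell goes solid is what keeps the claim, replace, claim again cycle from exhausting the pool.

```python
SOLID = 0          # assumed: pattern 0 is the shared all-set tile
EMPTY = 1          # assumed: pattern 1 is the shared all-clear tile
FIRST_DYNAMIC = 2  # patterns from here up are handed out on demand

class TilePool:
    def __init__(self, n_patterns=128, cells=256):
        # Free list ordered so the lowest-numbered pattern is handed out first.
        self.free = list(range(n_patterns - 1, FIRST_DYNAMIC - 1, -1))
        self.nametable = [EMPTY] * cells

    def claim(self, cell):
        """Return a private pattern for this cell, allocating one if needed."""
        current = self.nametable[cell]
        if current >= FIRST_DYNAMIC:
            return current                 # already has its own pattern
        if not self.free:
            raise MemoryError("pattern pool exhausted")
        pattern = self.free.pop()
        self.nametable[cell] = pattern     # caller must seed the pattern data
        return pattern

    def make_solid(self, cell):
        """Point the cell at the shared solid tile, releasing any private one."""
        current = self.nametable[cell]
        if current >= FIRST_DYNAMIC:
            self.free.append(current)
        self.nametable[cell] = SOLID
```

A fixed allocation needs none of this, at the cost of always reserving one pattern per screen cell.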
EDIT: a further observation is that, because you need to know only the top edge of each segment and are assuming the camera to be close to upright, you never need worry about full polygon clipping, only individual edge clipping. Which makes life a lot easier. When last I implemented that on a Z80 I naively went with a full multiply/divide solution but I'm sure binary search would work. Don't worry about the far clip plane; the only things to clip to are z=1, z=+x, z=-x, z=+y, z=-y, doing each line separately. So per line grab a bit mask of disobeyed constraints for each endpoint, then AND the two. If that's non-zero, both ends break the same constraint, so throw away the line. If not then OR them. While that OR is non-zero, shift it right and for each non-zero bit perform the bisection to find the point on the line that matches the constraint. Substitute it for whichever endpoint was in violation and continue. When done, project and draw.
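A rough Python sketch of that clipping loop, for illustration only. The bit names, the function names and the 16 bisection steps are my own choices, and it uses floats where a Z80 version would presumably use fixed point. It also re-tests the AND of the two masks after every substitution, which is a detail I've added so that lines cutting past a corner of the view get rejected.

```python
# One bit per constraint; a set bit means the point disobeys it.
NEAR, RIGHT, LEFT, TOP, BOTTOM = 1, 2, 4, 8, 16

def outcode(p):
    """Bit mask of the clip constraints that point p = (x, y, z) disobeys."""
    x, y, z = p
    code = 0
    if z < 1:
        code |= NEAR
    if x > z:
        code |= RIGHT
    if x < -z:
        code |= LEFT
    if y > z:
        code |= TOP
    if y < -z:
        code |= BOTTOM
    return code

def bisect(inside, outside, bit, steps=16):
    """Move 'inside' towards 'outside' for as long as it still obeys 'bit'."""
    for _ in range(steps):
        mid = tuple((a + b) / 2 for a, b in zip(inside, outside))
        if outcode(mid) & bit:
            outside = mid
        else:
            inside = mid
    return inside

def clip_line(p, q):
    """Return the clipped endpoints (p, q), or None if nothing is visible."""
    for bit in (NEAR, RIGHT, LEFT, TOP, BOTTOM):
        cp, cq = outcode(p), outcode(q)
        if cp & cq:
            return None           # both ends break the same constraint
        if cp & bit:
            p = bisect(q, p, bit) # q obeys this constraint, p does not
        elif cq & bit:
            q = bisect(p, q, bit)
    return p, q
```

Each bisection step is just an add and a shift per coordinate, which is the appeal over a multiply/divide intersection on a Z80.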
The main problem would be squeezing two video buffers into VRAM
one can have two buffers in the charset mode mindset
render in half a charset while showing the other half
doublebuffering via the nametable
If we're embracing monochrome graphics anyway, maybe even the unofficial screen 0 with three segments mode? Definitely documented as 9938 incompatible, but more compact and the big win is that you're never more than 2/171ths of a line away from a VRAM access window. So the CPU can push as fast as it can push at any time. But the addressing turns into a hassle, you still can't have completely unique pixel images unless you restrict yourself to 192 pixels across, and if you think about it then it amounts to you having to upload a third more data than is actually visible, as two bits out of every eight (25% of what you send) aren't visible. So pushing faster also buys you the need to push more. And at any likely frame rate, being constrained to VRAM access during only the non-pixel portion of the display quite likely isn't the bottleneck.
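The arithmetic behind those figures, as a small Python sketch. It assumes the mode really does give three segments of 256 patterns, each pattern being 8 bytes of which only 6 bits per byte are displayed, laid out over 24 text rows; the variable names are just mine.

```python
PATTERN_BYTES = 8            # bytes per pattern, one per pixel row
BITS_STORED = 8              # bits held in each pattern byte
BITS_SHOWN = 6               # bits of each byte that text mode displays
SEGMENTS = 3                 # assumed: three independent pattern segments
PATTERNS_PER_SEGMENT = 256
ROWS = 24                    # text rows on screen

patterns = SEGMENTS * PATTERNS_PER_SEGMENT                   # 768 unique tiles
unique_columns = patterns // ROWS                            # 32 columns
unique_width = unique_columns * BITS_SHOWN                   # 192 pixels
uploaded_bytes = patterns * PATTERN_BYTES                    # 6144 bytes
visible_bytes = uploaded_bytes * BITS_SHOWN // BITS_STORED   # 4608 bytes
wasted_fraction = 1 - BITS_SHOWN / BITS_STORED               # 0.25 of the upload
extra_over_visible = uploaded_bytes / visible_bytes - 1      # about a third

print(unique_width, uploaded_bytes, visible_bytes)
```

So a quarter of every upload is invisible, which works out to sending a third more than what actually ends up on screen.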