TLMBoy: The PPU
1. Introduction
This is the second post of my TLMBoy series. It covers the structure and implementation of the PPU.
PPU
Pixel Processing Unit.
Screen: 160x144 (w x h) pixels; 20x18 tiles
Screen buffer: 256x256 pixels; 32x32 tiles
The tile map contains 32x32 index bytes in a row-wise fashion. Each byte addresses a certain tile, which is stored in the tile data table.
Normal tile (small): 8x8
Big tile: 8x16
An 8x8 tile consists of 16 bytes, where each pair of bytes represents one line. The two bytes have to be interleaved bit by bit in order to obtain the pixel colours (. = colour 0):
01011010 | .123321.
00111100 |
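The interleaving of the two line bytes can be sketched as follows. This is a minimal illustration, not TLMBoy's actual code: the function name DecodeTileLine is hypothetical, and it assumes that the first byte holds the low bit and the second byte the high bit of each colour index, with bit 7 being the leftmost pixel.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper: decodes one 8-pixel tile line from its two bytes.
// Assumption: low_byte carries bit 0 and high_byte carries bit 1 of each
// 2-bit colour index; bit 7 of both bytes belongs to the leftmost pixel.
std::array<std::uint8_t, 8> DecodeTileLine(std::uint8_t low_byte,
                                           std::uint8_t high_byte) {
  std::array<std::uint8_t, 8> pixels{};
  for (int x = 0; x < 8; ++x) {
    const int bit = 7 - x;  // Leftmost pixel sits in the most significant bit.
    const std::uint8_t lo = (low_byte >> bit) & 1u;
    const std::uint8_t hi = (high_byte >> bit) & 1u;
    pixels[x] = static_cast<std::uint8_t>((hi << 1) | lo);
  }
  return pixels;
}
```

Calling DecodeTileLine(0b01011010, 0b00111100) yields the colour indices 0 1 2 3 3 2 1 0, which matches the example line above.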
Two kinds of interrupts: vertical blanks (ISR at 0x40), raised when all lines are drawn, roughly 60 times a second, and horizontal blanks, signalled through the LCDC status interrupt (ISR at 0x48).
Sprites: At most 10 sprites per scanline
From: https://forums.nesdev.com/viewtopic.php?f=20&t=17754&p=225009#p225009 Each scanline is 456 dots (114 CPU cycles). Consists of mode 2 (OAM search), mode 3 (active picture), and mode 0 (horizontal blanking). Mode 2 is 80 dots long (2 for each OAM entry), Mode 3 is about 168 plus about 10 more for each sprite on a given line, Mode 0 is the rest (at most 208, at least 108). After 144 scanlines are drawn are 10 lines of mode 1 (vertical blanking), for a total of 154 lines or 70224 dots per screen. The CPU can’t see VRAM (writes are ignored and reads are $FF) during mode 3, but it can during other modes. The CPU can’t see OAM during modes 2 and 3, but it can during blanking modes (0 and 1).
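The numbers in the quote can be checked with a bit of arithmetic. The following sketch only restates the figures given above; the constant and function names are made up for illustration, and the 10 dots per sprite are the approximation from the quote, not an exact hardware value.

```cpp
// Figures taken from the quoted forum post; names are hypothetical.
constexpr int kDotsPerLine = 456;
constexpr int kOamSearchDots = 80;      // Mode 2: 2 dots for each of 40 OAM entries.
constexpr int kBaseTransferDots = 168;  // Mode 3 without any sprites.
constexpr int kDotsPerSprite = 10;      // Approximation ("about 10 more").
constexpr int kVisibleLines = 144;
constexpr int kVBlankLines = 10;

// Approximate length of mode 3 for a line showing `sprites` sprites.
constexpr int TransferDots(int sprites) {
  return kBaseTransferDots + kDotsPerSprite * sprites;
}

// Mode 0 (H-Blank) takes whatever is left of the scanline.
constexpr int HBlankDots(int sprites) {
  return kDotsPerLine - kOamSearchDots - TransferDots(sprites);
}

constexpr int kDotsPerFrame = (kVisibleLines + kVBlankLines) * kDotsPerLine;

static_assert(HBlankDots(0) == 208, "no sprites: longest H-Blank");
static_assert(HBlankDots(10) == 108, "10 sprites: shortest H-Blank");
static_assert(kDotsPerFrame == 70224, "154 lines of 456 dots each");
static_assert(kDotsPerLine / 4 == 114, "4 dots per CPU cycle");
```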
Sources:
Cool video about the PPU interrupts: https://www.youtube.com/watch?v=zQE1K074v3s
Detailed (especially FF41): https://gbdev.gg8.se/wiki/articles/Video_Display#LCD_Control_Register
Start of rendering bg 32*(num_line+pos_y)+reg_x/32
     0          160
  0  +------------+
     |            |
     |            |
     |            |
144  +------------+
TODO: OAM interrupt?
General Loop
RenderLoop is a CTHREAD. This is where SystemC really shines (compare with other emulators).
Start with the OAM search mode. This mode lasts for 80 cycles and copies data from the object attribute memory (OAM) into the PPU. During this time a program should not access the OAM.
The LCD transfer mode follows for another 168 cycles, in which data is copied to the LCD driver. During this time the CPU should not access VRAM or OAM.
In our simulation we directly work with the data in the OAM and VRAM. Hence, we actually do nothing during OAM search and LCD transfer.
Next follows the H-Blank mode. The LCD driver is rendering one horizontal line of the image. This allows the CPU to access VRAM and OAM. First set the mode, then draw the line, then increase the line counter by 1.
Drawing Sprites
Is sprite drawing active? If not, stop.
num_rendered_sprites: the maximum number of sprites per line is 10, so we keep track of the number of sprites that we drew.
Pointer to the tile data table: sprites always use the lower one.
OAM table: the sprite information. There are 40 sprites, and each has 4 bytes of corresponding data, hence 160 B in total.
pos_x: x position of the lower right corner of the sprite.
pos_y: y position of the lower right corner of the sprite.
Positions already assume a big 8x16 sprite. A value of (0,0) hides the sprite; (8,16) displays it in the upper left corner. Calculations with the top-left corner are a little bit easier, hence pos_x - 8 and pos_y - 16.
We also get a tile index and some sprite flags.
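The steps above can be sketched roughly like this. It is an illustration under assumptions rather than the TLMBoy implementation: the names SpriteEntry, ReadSprite and SpritesOnLine are hypothetical, and the byte order inside an OAM entry (Y, X, tile index, flags) is the usual Game Boy layout, which the text above does not spell out.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// Hypothetical view of one 4-byte OAM entry.
struct SpriteEntry {
  std::uint8_t pos_y;       // Assumed byte 0: y position, offset by 16.
  std::uint8_t pos_x;       // Assumed byte 1: x position, offset by 8.
  std::uint8_t tile_index;  // Assumed byte 2.
  std::uint8_t flags;       // Assumed byte 3.
};

using Oam = std::array<std::uint8_t, 160>;  // 40 sprites * 4 bytes.

SpriteEntry ReadSprite(const Oam& oam, int n) {
  const int base = n * 4;
  return {oam[base], oam[base + 1], oam[base + 2], oam[base + 3]};
}

// Returns the OAM indices of the sprites that cover `line`, at most 10.
// sprite_height is 8 for small and 16 for big sprites.
std::vector<int> SpritesOnLine(const Oam& oam, int line, int sprite_height) {
  std::vector<int> selected;
  for (int n = 0; n < 40 && selected.size() < 10; ++n) {
    const SpriteEntry sprite = ReadSprite(oam, n);
    const int top = static_cast<int>(sprite.pos_y) - 16;  // Top-left corner.
    if (line >= top && line < top + sprite_height) {
      selected.push_back(n);
    }
  }
  return selected;
}
```

Stopping at 10 entries mirrors the num_rendered_sprites counter: sprites further back in the OAM are simply not drawn on that line.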
Drawing BG
The GB can use two different data sets, the so-called tile data tables:
First: 0x8000-0x8FFF (4 KiB), index 0 to 255
Second: 0x8800-0x97FF (4 KiB), index -128 to 127, with index 0 at 0x9000
The two tables overlap by 2 KiB.
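Translating a tile index into an address for either table could look like the following sketch. The function name and the boolean parameter are hypothetical; the only facts used are the two address ranges above and the 16 bytes per 8x8 tile mentioned earlier.

```cpp
#include <cstdint>

constexpr int kBytesPerTile = 16;  // 8 lines * 2 bytes.

// Hypothetical helper: start address of a tile in the tile data table.
// use_first_table == true:  index is unsigned, table starts at 0x8000.
// use_first_table == false: index is signed, index 0 lives at 0x9000.
constexpr std::uint16_t TileDataAddress(std::uint8_t index,
                                        bool use_first_table) {
  return use_first_table
             ? static_cast<std::uint16_t>(0x8000 + index * kBytesPerTile)
             : static_cast<std::uint16_t>(
                   0x9000 + static_cast<std::int8_t>(index) * kBytesPerTile);
}

static_assert(TileDataAddress(0, true) == 0x8000, "first table, first tile");
static_assert(TileDataAddress(255, true) == 0x8FF0, "first table, last tile");
static_assert(TileDataAddress(128, false) == 0x8800, "second table, index -128");
static_assert(TileDataAddress(127, false) == 0x97F0, "second table, index 127");
```

Indices 128 to 255 resolve to the same addresses in both tables, which is the 2 KiB overlap.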
A similar concept applies to the tile map, which contains the pointers to the tile data table:
First: 0x9800-0x9BFF
Second: 0x9C00-0x9FFF
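As a rough sketch of how a background pixel is mapped to an entry of the tile map: the function name and parameters below are hypothetical, and the computation assumes the usual behaviour that the 256x256 background wraps around and that the scroll registers are added to the screen coordinates. TLMBoy's own calculation may be organised differently.

```cpp
#include <cstdint>

// Hypothetical helper: address of the tile map byte for a screen pixel.
// map_base is 0x9800 or 0x9C00; scroll_x / scroll_y are the scroll registers.
constexpr std::uint16_t BgTileMapAddress(std::uint16_t map_base,
                                         std::uint8_t scroll_x,
                                         std::uint8_t scroll_y,
                                         int screen_x, int line) {
  // Position inside the 256x256 screen buffer, wrapping at the edges.
  const int bg_x = (scroll_x + screen_x) % 256;
  const int bg_y = (scroll_y + line) % 256;
  // 32 tiles per row, 8 pixels per tile in both directions.
  return static_cast<std::uint16_t>(map_base + 32 * (bg_y / 8) + bg_x / 8);
}

static_assert(BgTileMapAddress(0x9800, 0, 0, 0, 0) == 0x9800, "first entry");
static_assert(BgTileMapAddress(0x9800, 0, 0, 159, 143) == 0x9A33,
              "bottom right of the visible screen: row 17, column 19");
static_assert(BgTileMapAddress(0x9C00, 255, 255, 1, 1) == 0x9C00,
              "scrolling wraps around");
```

The byte read from that address is the tile index that is then looked up in the tile data table.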
// A complete screen refresh occurs every 70224 cycles (154 lines * 456 cycles).
void Ppu::RenderLoop() {
  while (1) {
    for (uint i = 0; i < 144; i++) {
      // Mode = OAM search (10).
      SetBit(reg_0xFF41, false, 0);
      SetBit(reg_0xFF41, true, 1);
      wait(80);

      // Mode = LCD transfer (11).
      SetBit(reg_0xFF41, true, 0);
      wait(168);
      DrawBgToLine(i);

      // Request the LCDC status interrupt if the H-Blank source (bit 3) is enabled.
      if (*reg_0xFF41 & gb_const::kMaskBit3) {
        *reg_intr_pending_dmi |= kMaskLcdcStatIf;
      }
      // Mode = H-Blank (00). Only clear the two mode bits and keep the other bits.
      *reg_0xFF41 &= 0b11111100;

      // LY coincidence. The interrupt enable is bit 6 of the status register
      // 0xFF41, not of the pending-interrupt register.
      bool ly_coinc_interrupt = *reg_0xFF41 & gb_const::kMaskBit6;
      bool ly_coinc = *reg_ly_comp == i;
      if (ly_coinc_interrupt && ly_coinc) {
        *reg_intr_pending_dmi |= kMaskLcdcStatIf;
      }
      SetBit(reg_0xFF41, ly_coinc, 2);
      wait(208);
      (*reg_lcdc_y)++;
    }

    // Mode = V-Blank (01). The mode bits are 00 at this point.
    SetBit(reg_0xFF41, true, 0);
    SDL_Delay(1);  // TODO(niko) make this correct with realtime things etc.
    DrawToBuffer();
    game_wndw->DrawToScreen(*this);
    window_wndw->DrawToScreen(*this);
    DBG_LOG_PPU(std::endl << PpuStateStr());
    // irq_vblank.write(true);
    *reg_intr_pending_dmi |= kMaskVBlankIE;  // V-Blank interrupt.

    // The V-Blank period is 10 lines, i.e. 4560 cycles.
    for (uint i = 0; i < 10; i++) {
      wait(456);
      (*reg_lcdc_y)++;
    }
    *reg_lcdc_y = 0;
  }
}
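RenderLoop relies on a SetBit helper that is not shown in this post. A minimal stand-in, assuming the signature that the calls above suggest (pointer to the register, new bit value, bit index), could look like this. SetMode and PpuMode are additional hypothetical names that write both mode bits at once without touching the rest of the register.

```cpp
#include <cstdint>

// Hypothetical stand-in for the SetBit helper used in RenderLoop.
inline void SetBit(std::uint8_t* reg, bool value, unsigned bit_index) {
  const std::uint8_t mask = static_cast<std::uint8_t>(1u << bit_index);
  if (value) {
    *reg = static_cast<std::uint8_t>(*reg | mask);
  } else {
    *reg = static_cast<std::uint8_t>(*reg & ~mask);
  }
}

// The four PPU modes as encoded in the two lowest bits of 0xFF41.
enum class PpuMode : std::uint8_t {
  kHBlank = 0,
  kVBlank = 1,
  kOamSearch = 2,
  kLcdTransfer = 3,
};

// Hypothetical helper: replaces only the mode bits of the status register.
inline void SetMode(std::uint8_t* stat, PpuMode mode) {
  *stat = static_cast<std::uint8_t>((*stat & 0xFC) |
                                    static_cast<std::uint8_t>(mode));
}
```

Writing the mode through a single helper avoids accidentally overwriting the interrupt enable bits that share the register.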
Literature:
Cool website with nice graphics: https://www.copetti.org/writings/consoles/game-boy/
Deep dive PPU: https://blog.tigris.fr/2019/09/15/writing-an-emulator-the-first-pixel/
Gearboy’s code: https://github.com/drhelius/Gearboy/blob/master/src/Video.cpp
gbemu’s code: https://github.com/jgilchrist/gbemu/blob/master/src/video/video.cc