DivIDEo: Streaming video for the ZX Spectrum
Matt "Gasman" Westcott (@gasmanic / matt@west.co.tt)
DivIDEo is a video converter and player system for the ZX Spectrum 128K with
DivIDE interface, allowing video to be streamed
directly from an IDE device.
Features
- 25fps full-screen video
- Double-buffering using the 128K Spectrum's second screen, to avoid flicker
- Delta compression; only the screen data that changes between frames is stored,
reducing file size and (more importantly) transfer time
- Variable bitrate AY chip audio; all CPU time remaining after handling the video
data is devoted to sample playback
- No need for crazy disk partitioning schemes - videos will be recognised on any
filesystem that stores files in consecutive sectors with no gaps (including FAT,
as long as files don't become fragmented...)
FAQ
Does it work on a DivMMC? Could it be made to?
No, it doesn't. In theory it could be recreated in some form on the DivMMC, but:
- It would need to be a complete rewrite. The low-level APIs for disk access are completely different - esxDOS does a very good job of hiding the differences, but here we need to bypass esxDOS and do the low-level disk access ourselves, to ensure predictable timings. The DivIDEo routine is built around a highly-tuned implementation of the IDE protocol, and practically none of that can be re-used. A DivMMC version of the player would also almost certainly require a redesigned file format, and be incompatible with existing .dvo files.
- It would probably be lower quality. My initial investigations into the DivMMC suggest that data transfer requires a lot more work on the software side - you have to piece together bytes from bits, while the DivIDE lets you pull bytes pretty much directly from the interface. This would place a limit on how fast we can shift data around, leading to lower video quality.
- It would need to be done by someone with more patience reading or reverse-engineering specifications than me. When I looked into the possibility of doing the rewrite, the hardware documentation was either non-existent, or buried in 1000-page specification documents that spend half the time talking about the exact kind of plastic to use in SD card casings. I had access to code snippets for doing disk access on the DivMMC, but with no clues about how far they could be modified before falling outside of hardware tolerances, turning them into a DivIDEo player was a non-starter.
See it in action
To watch the demonstration video (first presented at the Outline 2010 demo party) on an emulator:
- Get yourself an emulator with DivIDE support (FUSE or SPIN)
- Download the DivIDEo Outline demo HDF package (18.8MB ZIP)
- With DivIDE enabled and a 128K machine selected, open the HDF and TAP files and load the tape (either through Tape Loader or LOAD "").
- The player will begin scanning the disk and outputting the titles of any videos it finds; press Q to quit this process, then Q/A/Space to select the video.
- Enjoy...
On the real hardware:
- Get yourself an IDE disk formatted to work with FATware or ESXDOS
- Download the DivIDEo Outline demo DVO package (17.4MB ZIP)
- Copy the .dvo and .tap files to the disk. If files have previously been deleted from the disk, you may need to run a defrag program at this point to ensure that the video file is not fragmented.
- Fire up FATware or ESXDOS and load the player TAP file.
- The player will begin scanning the disk and outputting the titles of any videos it finds; press Q to quit this process, then Q/A/Space to select the video.
- Enjoy...
The converter
This is a command-line app - after unpacking it somewhere suitable, start converting videos with:
divideo myvideo.avi myvideo.dvo
See divideo --help for more options to tweak the output.
To build from source, you'll need a recent Subversion snapshot of FFmpeg (I'm using the 2010-04-11 one), ImageMagick, and Argtable 2.x. You'll probably also need to hack the Makefile to specify the paths where you've installed things (sorry...)
Greetings
zxbruno, nuggetreggae, natxcross, brownb2, Yerzmyey, Zilog, Factor6, LaesQ, Phoenix, Baze, Garry Lancaster, LCD, Velesoft, Hood and Speccy IDE / video enthusiasts everywhere...
The gory details
Player source code - as ZIP file / as text
The player, and the .dvo file format, are designed around minimising the work that needs to be done on the Spectrum side.
There is no buffering going on (other than what you get for free from the 128K's second screen) - all data is arranged so that
the Spectrum will receive it at exactly the required time, and at exactly the required rate. As a result, the player and file
format are best understood as a single entity; plenty of odd design decisions have been made in the interest of optimising away
a few Z80 cycles. (Rest assured that it was even more confusing for me to design in the first place :-) )
Player pseudocode
let L = number of sectors in first frame
NEXT_FRAME:
    if L = 0, exit
    Issue an IDE request for the next L sectors
    Wait for retrace (i.e. a HALT instruction)
    Flip screens so that old screen is displayed and new screen is paged in at 0xc000
    Go to MAINLOOP
TABLE:
    0) Jump to NEXT_FRAME
    2) Execute INI (i.e. read byte from disk to address HL, advance HL)
    4) Execute INI
    6) Execute INI
    8) Execute INI
    ...
    252) Execute INI
MAINLOOP (and table offset 254):
    Wait for IDE 'ready' state
    Read bytes L, H, A from disk
    Read byte from disk and output to AY register 8 (channel A volume)
    Jump to offset A in TABLE
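The effect of jumping into TABLE can be modelled in a few lines of Python. This is only an illustrative sketch: the function name `dispatch` is hypothetical, and it counts INI instructions rather than emulating the Z80.

```python
def dispatch(offset: int):
    """Model a jump to `offset` in TABLE.

    Returns "next_frame" for offset 0, otherwise the number of INI
    instructions executed before control reaches MAINLOOP at offset 254.
    """
    if offset % 2 or not 0 <= offset <= 254:
        raise ValueError("table offsets are even values from 0 to 254")
    if offset == 0:
        return "next_frame"
    # Entries 2, 4, ..., 252 each hold one INI; execution runs from the
    # entry point straight through to offset 254.
    return (254 - offset) // 2
```

So jumping to offset 254 copies no video bytes, while offset 2 copies the maximum of 126.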
File format
The .dvo file begins with a 512-byte (one sector) header:
Offset | Length | Description |
0x00 | 0x08 | Magic number - the ASCII string "DivIDEog" |
0x08 | 0x01 | Version number - must be 0x01. (These first two fields together are used to identify the start of a .dvo file when scanning a disk sector-by-sector) |
0x09 | 0x01 | Size of first frame, in sectors |
0x0a | 0x01 | Border colour - 0x00-0x07 |
0x0b | 0x15 | Empty (set to 0x00) |
0x20 | 0x20 | Video name - padded with 0x00 if less than 0x20 bytes |
0x40 | 0x1c0 | Empty (set to 0x00) |
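As a rough illustration, the header layout above maps onto a single `struct` format string in Python. The function names `build_header` and `parse_header` are made up for this sketch and are not part of the converter.

```python
import struct

MAGIC = b"DivIDEog"
# magic, version, first frame size, border, 0x15 empty bytes,
# 0x20-byte name, 0x1c0 empty bytes = 512 bytes in total
HEADER_FORMAT = "<8sBBB21x32s448x"


def build_header(first_frame_sectors: int, border: int, name: bytes) -> bytes:
    """Pack a 512-byte .dvo header as described in the table above."""
    if not 0 <= first_frame_sectors <= 0xFF:
        raise ValueError("first frame size must fit in one byte")
    if not 0 <= border <= 7:
        raise ValueError("border colour must be 0x00-0x07")
    if len(name) > 0x20:
        raise ValueError("video name is limited to 0x20 bytes")
    # struct pads the 32s field with 0x00 bytes, as the format requires
    return struct.pack(HEADER_FORMAT, MAGIC, 0x01, first_frame_sectors, border, name)


def parse_header(sector: bytes):
    """Return (first_frame_sectors, border, name) from a header sector."""
    magic, version, first, border, name = struct.unpack(HEADER_FORMAT, sector[:512])
    if magic != MAGIC or version != 0x01:
        raise ValueError("not a .dvo header")
    return first, border, name.rstrip(b"\x00")
```

Checking the magic number and version together is the same test the player applies when it scans a disk sector-by-sector.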
The remainder of the .dvo file is a sequence of frames, each one consisting of an arbitrary number of video packets followed by one end marker packet.
A video packet contains a sequence of zero or more bytes to update screen memory with at a specified address, plus a single audio sample.
Note that as double buffering is in use, the video data in each frame needs to encompass all the changes to screen memory since two frames ago. (For the purposes of the first two frames, the initial screen state is entirely 0x00 bytes.)
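To illustrate the double-buffering rule, the following Python sketch diffs each frame against the buffer it is about to overwrite, i.e. the screen as it stood two frames earlier. The function names and the 6912-byte screen size (the standard Spectrum bitmap plus attributes) are assumptions of this sketch and are not taken from the converter source.

```python
SCREEN_SIZE = 6912  # assumption: standard Spectrum screen, bitmap + attributes


def changed_runs(target: bytes, previous: bytes):
    """Return (offset, data) runs where `target` differs from `previous`."""
    if len(target) != len(previous):
        raise ValueError("screens must be the same size")
    runs, start = [], None
    for i, (new, old) in enumerate(zip(target, previous)):
        if new != old:
            if start is None:
                start = i          # a run of changed bytes begins here
        elif start is not None:
            runs.append((start, target[start:i]))
            start = None
    if start is not None:
        runs.append((start, target[start:]))
    return runs


def frame_deltas(frames, screen_size: int = SCREEN_SIZE):
    """Yield the changed runs for each frame, alternating between two buffers."""
    buffers = [bytes(screen_size), bytes(screen_size)]  # both start as 0x00
    for n, frame in enumerate(frames):
        yield changed_runs(frame, buffers[n % 2])
        buffers[n % 2] = bytes(frame)
```

Each run would still need to be split into packets that satisfy the sector and length rules for the file format.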
- Each frame MUST start on a 512-byte (sector) boundary (this is because we begin each frame by reading a new set of sectors)
- Packets MUST NOT cross sector boundaries (this is because we need to wait for an IDE 'ready' state before reading each new sector, and we only perform this step between packets).
This means that the last video packet in a sector must be exactly large enough to fit into the remaining bytes of the sector - this may involve adding redundant bytes of video data to fill space.
The end marker packet does not need to be placed at the end of a sector, though; it's acceptable to fill the remaining space with padding bytes.
- All packets MUST be an even number of bytes long; again, this may involve adding a redundant byte of video data to the packet.
(This is because reading the IDE 'ready' state after each packet causes a reset of the DivIDE's internal latch for translating 16-bit words into 8-bit port reads, so an odd-numbered packet would cause this to get out of step and skip a byte.)
- Frame data SHOULD be constructed to make the player routine take up as close to two Spectrum frames (or whatever your chosen frame-rate is) in execution time as possible, to avoid gaps in the audio while the HALT instruction executes. This is difficult if not impossible to calculate exactly (due to ULA contention, and variations introduced by waiting for the IDE 'ready' state) but 16.91 cycles per video byte plus 105 cycles overhead per packet appears to be a good estimate. Frames that would otherwise finish early can be padded out by adding as many audio-only packets (i.e. video packets with zero bytes of video data) as needed.
- Video packets SHOULD be kept as short as possible (to minimise the interval between audio samples, since each packet contains only one of them).
- Yes, this is a lot to deal with at once. That's why the converter code is a bit icky :-P
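The timing estimate can be turned into a quick budget calculation. The sketch below is a simplification under stated assumptions: the figure of 70908 T-states per 128K Spectrum frame is the commonly quoted hardware value rather than something given in this document, the function names are hypothetical, and the cost of the end marker, the IDE request and the HALT itself is ignored.

```python
CYCLES_PER_VIDEO_BYTE = 16.91   # estimate quoted above
CYCLES_PER_PACKET = 105         # estimate quoted above
TSTATES_PER_SPECTRUM_FRAME = 70908  # assumption: commonly quoted 128K figure


def frame_cycles(packet_lengths):
    """Estimated cycles for packets carrying the given video byte counts."""
    return sum(CYCLES_PER_PACKET + CYCLES_PER_VIDEO_BYTE * n for n in packet_lengths)


def padding_packets(packet_lengths, spectrum_frames: int = 2):
    """Number of zero-length (audio-only) packets that fit in the spare time."""
    budget = spectrum_frames * TSTATES_PER_SPECTRUM_FRAME
    spare = budget - frame_cycles(packet_lengths)
    return max(0, int(spare // CYCLES_PER_PACKET))
```

Each padding packet also takes up four bytes of sector space, so a real converter has to balance this count against the sector-boundary rules above.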
Format of a video packet:
Offset | Length | Description |
0x00 | 0x02 | Screen address (little-endian) to copy video bytes to, with 0xc000 being the start of screen memory |
0x02 | 0x01 | The value 254-(len * 2), where len is the length of the video data (which can be 0). (This is used as an offset into the table of INI instructions, so that the required number of them are executed) |
0x03 | 0x01 | Sample level (0x00-0x0f) to output to the AY, plus 0x80. (Adding 0x80 allows us to output to the AY with an OUT (0xfd),A instruction, freeing up the C register for DivIDE data reading; incomplete port decoding means that this will still be seen as a write to the AY's data port, and the AY will ignore the high bit. It also ensures that when we come to the next IN instruction, A is in the range 0x80-0xbf which avoids a contended port read. Phew!) |
0x04 | ... | Video bytes to be written to screen |
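A video packet can be assembled directly from this table. The Python sketch below uses a hypothetical function name, `video_packet`, and applies the limits implied by the format: an even number of video bytes, and at most 126 of them so that the table offset never reaches 0.

```python
def video_packet(address: int, sample: int, data: bytes = b"") -> bytes:
    """Encode one video packet: address, table offset, AY sample, video bytes."""
    if not 0xC000 <= address <= 0xFFFF:
        raise ValueError("screen memory is paged in at 0xc000")
    if not 0 <= sample <= 0x0F:
        raise ValueError("sample level must be 0x00-0x0f")
    if len(data) % 2:
        # the 4-byte packet header is even, so the video data must be too
        raise ValueError("packets must be an even number of bytes long")
    if len(data) > 126:
        # offset 0 is reserved for the end marker, so 2 is the lowest usable
        raise ValueError("at most 126 video bytes fit in one packet")
    header = bytes([
        address & 0xFF,        # low byte first (little-endian)
        address >> 8,
        254 - 2 * len(data),   # offset into the INI table
        0x80 | sample,         # sample level plus 0x80
    ])
    return header + bytes(data)
```

For example, `video_packet(0xC000, 5, b"\xaa\xbb")` produces the six bytes 00 c0 fa 85 aa bb.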
Format of an end marker packet:
Offset | Length | Description |
0x00 | 0x01 | Size of next frame, in sectors, or 0x00 to indicate end of video |
0x01 | 0x01 | Unused (set to 0x00) |
0x02 | 0x01 | 0x00 (used as an offset into the INI table, which handily begins with a jump to the 'next frame' routine) |
0x03 | 0x01 | Sample level (0x00-0x0f) to output to the AY, plus 0x80 |
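Putting the header, video packets and end markers together, a .dvo file can be walked frame by frame in much the same way as the player does. This Python sketch is a reading aid rather than a reference decoder: `walk_frames` is a hypothetical name, the end marker is reported as a packet with address `None`, and anything after the end marker within a frame's sectors is treated as padding.

```python
SECTOR = 512


def walk_frames(dvo: bytes):
    """Yield one list of (address, sample, data) tuples per frame."""
    sectors = dvo[0x09]   # size of first frame, from the header
    pos = SECTOR          # frame data starts after the header sector
    while sectors:
        frame_end = pos + sectors * SECTOR
        if frame_end > len(dvo):
            raise ValueError("file ends part-way through a frame")
        packets = []
        while True:
            if pos + 4 > frame_end:
                raise ValueError("frame has no end marker")
            low, high, offset, sample = dvo[pos:pos + 4]
            if offset == 0:
                # End marker: byte 0 is the size of the next frame
                packets.append((None, sample & 0x0F, b""))
                sectors = low
                pos = frame_end   # skip any padding after the marker
                break
            if offset % 2:
                raise ValueError("odd table offset")
            length = (254 - offset) // 2
            if pos % SECTOR + 4 + length > SECTOR:
                raise ValueError("packet crosses a sector boundary")
            data = dvo[pos + 4:pos + 4 + length]
            packets.append((low | high << 8, sample & 0x0F, data))
            pos += 4 + length
        yield packets
```

Note how the first byte of the end marker plays the same role as the header's frame size field: it becomes the sector count for the next IDE request, and 0x00 stops playback.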
- gasman 2010-05-01