Wednesday, September 20, 2023

RSX11D V4 boot problem update

   A few posts back I wrote about finding RSX11D V4 on a deceased DEC engineer's RK05 pack.

  That post presented the disk image, but with a boot problem that required a manual step or two to work around. The boot problem occurred both in SIMH and on real hardware.

  Tony Lawrence read the post and figured out the problem (along with some others) in less than 24 hours. He said I could share his analysis in a shortened form (his work was incredibly detailed). Thanks, Tony! Here's a note from him, summarizing the problem.

"The boot loader abuses a hardware-dependent "feature" of an old RK11 disk controller and reloads its word count while the operation is still in progress, so to keep the disk controller going (with what it was doing, because the only thing to stop it would be an error or the WC reaching 0): until it reaches the end of memory.  More modern controllers (and certainly, simh) can't be used with such a "trick",as like I said previously, they are numb to the register write while the operation is on-going, and in case of simh, the operation just stops when it reaches the word count of 65534 (because the write is not time-simulated, it's a one-shot operation, internally, of basically one read off of the disk image -- so the new WC would not have ever been noticed)."

  He included a disassembly of the code in question, and pointed out that in the process of doing all this, the boot sector loads the system image on top of itself. That isn't a problem, since the code from the system image that overlays the boot sector is identical to it.
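
  As a rough illustration of the two controller behaviours in Tony's note, here is a toy Python model. It is not SIMH's code and it ignores real controller timing. The function name transfer, the memory size, and the idea that the boot loop gets its rewrite in after every word are all assumptions made for the sketch; only the word count of 65534 comes from the note.

```python
def transfer(memory_words, initial_count, cpu_reloads_count):
    """Toy disk-to-memory transfer. Returns (words_stored, stop_reason)."""
    remaining = initial_count
    address = 0
    while True:
        if address >= memory_words:
            return address, "NXM"      # ran off the end of memory
        if remaining == 0:
            return address, "WC=0"     # word count exhausted
        address += 1                   # one word lands in memory
        remaining -= 1
        if cpu_reloads_count:
            # Assumption: the boot loop rewrites WC before it can reach 0,
            # and the controller honours the write mid-transfer.
            remaining = initial_count


MEMORY_WORDS = 100_000   # hypothetical memory size, in words
COUNT = 65534            # word count quoted in Tony's note

# Controller that notices the rewritten WC: runs until memory ends.
print(transfer(MEMORY_WORDS, COUNT, cpu_reloads_count=True))
# One-shot transfer that never sees the rewrite: stops on the word count.
print(transfer(MEMORY_WORDS, COUNT, cpu_reloads_count=False))
```

  With these made-up numbers the first call prints (100000, 'NXM') and the second prints (65534, 'WC=0'), which is the difference between the old RK11 and the one-shot case that Tony describes.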

  In another paragraph, he pointed out...

"I can't know why the developers chose to use this boot hack with sizing the memory using RK11 -- because that's basically what they were doing.  At the boot time, the file structure of the disk is obviously unknown, but the first 245. blocks following the initial Disk Address (DA), do load the operating system.  Since they kept reloading the WC, it'd cause the controller to keep on going, loading up all the following sectors (with basically unstructured garbage) until the bus timeout -- address does not exist -- converted to the NXM error by the controller.  Should they load fewer than 128K because of the NXM condition (e.g. 96K phys mem), the end of the image wouldn't be read in, and the system would crash (as the tail of the image file contains something which looks like the STD -- the System Task Directory).  If the NXM occurred past the 128K, it's all good and everything can actually continue! So it was not considered as an error at all, but an attempt to load as much as possible from that initial DA and to make sure the memory is not tight.  There could be a problem, though, with such a logic, if the OS image was located closer to the end of the drive, then it could have triggered an OVR (overrun) controller error before the end of memory reached, and the boot would fail (would loop to address 0 to start over again).  Anyways, it was a very bad example of how things should NOT be done! LOL"

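  The sizes in that paragraph are easy to check. The sketch below assumes 512-byte blocks (256 words each) and reads "128K" and "96K" as bytes. Both are assumptions, not something stated in Tony's note.

```python
BYTES_PER_WORD = 2
WORDS_PER_BLOCK = 256      # assumption: 512-byte disk blocks

image_blocks = 245         # "245." in DEC notation means decimal 245
image_words = image_blocks * WORDS_PER_BLOCK
image_bytes = image_words * BYTES_PER_WORD

one_shot_words = 65534     # word count quoted for the SIMH one-shot transfer

print(image_words)                    # 62720
print(image_bytes)                    # 125440
print(image_bytes / 1024)             # 122.5 (KB, so just under 128K)
print(one_shot_words >= image_words)  # True: one transfer covers the image
print(96 * 1024 >= image_bytes)       # False: 96K cannot hold the image
```

  Under those assumptions the image needs about 122.5 KB, which fits Tony's point that an NXM before 128K cuts off the tail of the image, while an NXM past it is harmless.
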
  He even included a patch that removes the manual steps I had been using to get the image to boot.

"All in all, if you open the disk image (the .dsk file) with a binary editor, go to locations 146 and 6421146, and enter the following to replace the next 7 words:

xxx146: 105737  ; TSTB @#177404  ;  Complete?
xxx150: 177404  ;
xxx152: 100375  ; BPL 146        ;  Loop if not
xxx154: 005737  ; TST @#177404   ;  Error?
xxx156: 177404  ;
xxx160: 100707  ; BMI 0          ;  Loop back to boot if so
xxx162: 000240  ; NOP

Then your system will boot right away!  (You can actually skip patching the boot sector, locations 146+, and only patch locations 6421146+ -- note this is an offset in octal)."

  Tony also pointed out that if you do a SYSGEN on this system, you'll have to apply the patches again, pointing at the new RSX.SAV file. I'm working on an automated way to take care of that - stay tuned for another update when I get it done.
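
  For anyone who would rather not use a binary editor, the seven words in Tony's listing could be written with a short Python script. This is a minimal sketch, not the automated SYSGEN fix mentioned in this post. It assumes the two locations are byte offsets into the .dsk file and that the image stores PDP-11 words low byte first; the function names patch_image and patch_file are made up for the example.

```python
import struct
from pathlib import Path

# The seven replacement words from the listing, in octal.
PATCH_WORDS = [0o105737, 0o177404, 0o100375,
               0o005737, 0o177404, 0o100707, 0o000240]

# Octal locations from Tony's note (assumed to be byte offsets in the file).
OFFSETS = [0o146, 0o6421146]


def patch_image(data, offsets=OFFSETS, words=PATCH_WORDS):
    """Return a copy of data with the words written at each offset."""
    # "<H" = 16-bit little-endian word (assumed layout of the disk image).
    patch = struct.pack("<%dH" % len(words), *words)
    out = bytearray(data)
    for off in offsets:
        if off + len(patch) > len(out):
            raise ValueError("offset %o is past the end of the image" % off)
        out[off:off + len(patch)] = patch
    return bytes(out)


def patch_file(src, dst):
    """Read src, patch it in memory, and write the result to dst."""
    Path(dst).write_bytes(patch_image(Path(src).read_bytes()))
```

  Calling patch_file("rsx11dv4.dsk", "rsx11dv4pat.dsk") would then write a patched copy; the input file name here is hypothetical. The sketch overwrites the words at each offset without checking what was there before, so work on a copy of the image.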

 For anyone who has an interest in RSX11D V4, I include below the PDP11.INI and RSX11DV4PAT.DSK files, with the patch described above already applied.

rsx11dv4pat.dsk

pdp11.ini

  Using them, here's what the boot looks like now:

C:\simh40\PDP11\rsx11d\rsx11dv4>pdp11

PDP-11 simulator V4.0-0 Current        git commit id: ab3e07a4
Disabling XQ

RSX-004A
MCR>MOU DK:
MOUNT-**VOLUME INFORMATION**
        DEVICE =DK0
        CLASS  =FILE 11
        LABEL  =RSXSYS
        UIC    =[1,1]
        ACCESS =[RWED,RWED,RWED,RWED]
        CHARAC =[]
MCR>

  Of some interest, but not related to the boot problem, Tony pointed out a few curious things in the file structure of the disk. One was that there are files that weren't completely deleted: their checksums weren't deleted as they should have been. There are also a number of files with really crazy-looking update dates. Here are a couple of examples.

RUN.TSK;1           (111,2420)      9./9.         C  01-APR-74 22:38  [1,5]    [RWED,RWED,RWE,R] 
32-SEP-73 00:01(9.)

INI.TSK;1           (112,2421)      17./17.       C  01-APR-74 22:38  [1,5]    [RWED,RWED,RWE,R] 
32-SEP-73 00:00(9.)

 I'm guessing these are results of Files-11 bugs from long ago.