Wednesday, May 22, 2019

VAX/VMS page file fragmentation

  Here's an article and utility I wrote circa 1987 about page file fragmentation.

  It uncovered a problem VMS had that was patched in version 5.2. Although the specific problem that I was looking at was patched long ago, the article and program might be of interest to other VAX/VMS hobbyists out there.

  Here's the program...

pfrag.mar

  To use...

$ mac pfrag
$ link pfrag
$ run pfrag


   Page File Fragmentation Analyzer

  A while back, as Houston started to pull out of an economic
slump, the usage of an 8600 at one of my sites grew dramatically
over the course of a few weeks. As it had been lightly used
before then, the growth was tolerated pretty well. After a
while, though, its performance deteriorated, and response times
became very leisurely. Predictably enough, phones rang, meetings
were held, and harsh words were spoken about system performance.

  SHOW SYSTEM showed many processes were spending a lot of time
in state RWMPW. This state stands for Resource Wait, Modified
Page Writer Busy. A process that needs to fault a page out to
the Modified Page List will be placed in this state if the
Modified Page Writer (a subroutine of the SWAPPER process) is
busy at the time of the fault. It is normal to occasionally see
a process "pop" through this state, but it was happening a lot
on this system, and accounted for the sluggish response. 

  Another clue was found on the console - the message

%SYSTEM-W-PAGECRIT, Page file space critical, system trying to continue

had been printed out there. Sure enough, a SHOW MEM/FILES showed
that one of the page files had almost filled up. I sheepishly
doubled its size, and re-booted, thinking that would be the end
of the problem. 

  Well, I was wrong about that. A few weeks later, the same
sluggish performance occurred. SHOW SYSTEM once again showed a
lot of processes lingering in state RWMPW. This time, however,
SHOW MEM/FILES showed that the page file in question was only
one third full, with plenty of free space available. I rushed
into the computer room (they can't find me in there), and found 

%SYSTEM-W-PAGEFRAG, Page file badly fragmented, system continuing

printed on the console. Since I knew that all of the page files
were contiguous (I always create them that way) this message had
to be referring to some other type of fragmentation - some data
structure or another must have been in a zillion pieces.
Although I was starting to use a LOT of disk space (Hmmm...do
you think DEC designed the VAX as a virtual memory machine in
order to sell more disk drives???) I doubled the page file size
again, just to try and get the system back on the air, and
resolved to get to the bottom of this. 

  The needed information was found in the usual three places - 
VAX/VMS Internals and Data Structures by Kenah and Bate, 
ANALYZE/SYSTEM, and the VMS microfiche. It turns out that a
page/swap file is just a long string of blocks (or pages,
depending on how you think about it) in a normal Files-11 file.
They contain no internal structure or management information.
Management, location, and allocation information is external to
the file. 
  Each page/swap file is described by a data structure in Non
Paged Pool called a Page File Control Block (henceforth to be
referred to as a PFL block). Here is an annotated example, shown
by ANALYZE/SYSTEM... 

80103800   PFL$L_BITMAP      80103824   addr of the allocation bitmap
80103804   PFL$L_STARTBYTE   00000000   offset to 1st byte in bitmap
80103808   PFL$W_SIZE        0028       size of this PFL
8010380A   PFL$B_TYPE        23         DYN$C_PFL type
8010380B   PFL$B_PFC         00         Page Fault Cluster size
8010380C   PFL$L_WINDOW      8023FCA0   addr of file WCB
80103810   PFL$L_VBN         00000000   base vbn ?
80103814   PFL$L_BITMAPSIZ   00000BB8   size of the bitmap in bytes
80103818   PFL$L_FREPAGCNT   00005DC0   free pages in file
8010381C   PFL$L_MAXVBN      003FFFFF   ?
80103820   PFL$W_ERRORCNT    0000       count of errors
80103822   PFL$B_ALLOCSIZ    60         current PFC alloc. size
80103823   PFL$B_FLAGS       01         status byte
80103824   PFL$L_BITMAPLOC   FFFFFFFF   bitmap starts here

  These blocks contain the information needed by the memory
management routines to locate the page/swap files, allocate and
deallocate pages, and read/write data to the files. The PFL
block for each file has an associated allocation bitmap that
records the pages in use in that file. Like a block bitmap for a
Files-11 disk, the PFL bitmap has a bit for every page in the
file, and its state indicates whether the page is free or in
use. A free page is indicated by a bit value of 1, and a page in
use is recorded by a bit value of 0. The address of the bitmap
is at offset PFL$L_BITMAP, and the size is at PFL$L_BITMAPSIZ.
The bitmap always follows the PFL block, at offset
PFL$L_BITMAPLOC. 
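
  As a quick cross-check of the dump above, here is a minimal
Python sketch (not part of PFRAG, which is VAX MACRO). The
function names are hypothetical, and the all-ones bitmap is an
assumption that models a file with every page free.

```python
# Illustrative only: values copied from the ANALYZE/SYSTEM dump above.
BITMAPSIZ = 0x00000BB8   # PFL$L_BITMAPSIZ, size of the bitmap in bytes
FREPAGCNT = 0x00005DC0   # PFL$L_FREPAGCNT, free pages in file
ALLOCSIZ = 0x60          # PFL$B_ALLOCSIZ, current allocation size


def pages_tracked(bitmap_bytes):
    """One bit per page, so a bitmap of N bytes tracks N * 8 pages."""
    return bitmap_bytes * 8


def count_free(bitmap):
    """Count free pages in a bitmap given as bytes (1 = free, 0 = in use)."""
    return sum(bin(byte).count("1") for byte in bitmap)


# Assumption: a bitmap with every bit set, i.e. every page free.
all_free = bytes([0xFF]) * BITMAPSIZ

print(pages_tracked(BITMAPSIZ))   # 24000 pages tracked by 3000 bytes
print(FREPAGCNT)                  # 24000 free pages recorded in the PFL
print(count_free(all_free))       # 24000
print(ALLOCSIZ)                   # 96
```

  In this dump the free page count equals the number of pages the
bitmap can track, which is consistent with a file that has
nothing allocated in it yet.
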
  The addresses of the PFL blocks are stored in an array, and
the array is pointed to by MMG$GL_PAGSWPVC. The size of this
array is determined by two SYSGEN parameters - PAGFILCNT and
SWPFILCNT. The sum of these values plus one (the first entry is
always a pointer to a dummy PFL) determines the total number of
entries in the array. Page file and swap file entries in the
array are separated, swap file entries first. MMG$GW_MINPFIDX
contains the smallest array index that is a page file - all
indices less than this are entries for swap files, and those
greater than or equal are page files. Unused array entries all
point to location MMG$GL_NULLPFL, as does the first entry. 
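
  The layout of that array can be sketched in a few lines of
Python. This is an illustration only: the addresses are made up,
NULL_PFL stands in for the address of MMG$GL_NULLPFL, and
min_pf_idx stands in for the value held in MMG$GW_MINPFIDX.

```python
NULL_PFL = 0x80001000  # hypothetical address standing in for MMG$GL_NULLPFL


def vector_length(swpfilcnt, pagfilcnt):
    """Entries in the array: SWPFILCNT + PAGFILCNT + 1 (the dummy first slot)."""
    return swpfilcnt + pagfilcnt + 1


def classify(vector, min_pf_idx, null_pfl=NULL_PFL):
    """Label each slot of the PFL pointer array as 'null', 'swap' or 'page'."""
    labels = []
    for index, address in enumerate(vector):
        if address == null_pfl:
            labels.append("null")      # dummy first entry or vacant slot
        elif index < min_pf_idx:
            labels.append("swap")      # below the first page file index
        else:
            labels.append("page")
    return labels


# Hypothetical system: SWPFILCNT = 2, PAGFILCNT = 4, one page file slot unused.
vector = [NULL_PFL, 0x80200000, 0x80201000,
          0x80202000, 0x80203000, 0x80204000, NULL_PFL]
print(vector_length(2, 4))             # 7
print(classify(vector, min_pf_idx=3))
# ['null', 'swap', 'swap', 'page', 'page', 'page', 'null']
```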

  Armed with this information, I wrote PFRAG - a utility that
reports on the status of the page and swap files. For each PFL
block, PFRAG first gets the name of the associated page/swap
file, using routine GETPAGNAM. PFRAG then calls routine COUNT to
analyze the bitmap, in terms of fragmentation. The results are
formatted and printed out, and the next PFL block is processed,
until all have been examined. 

  The MMG$GL_PAGSWPVC array is used to locate each PFL block. 
Entries in this array that point to MMG$GL_NULLPFL are ignored, 
since that indicates a vacant slot. SYSGEN parameters SWPFILCNT 
and PAGFILCNT determine the size of the array, so a total of 
SGN$GW_PAGFILCNT+SGN$GW_SWPFILCNT+1 entries are examined.

  Routine GETPAGNAM is called to get the name of the file.
GETPAGNAM looks up the address of the file's Window Control
Block in the PFL, and uses it to find the File Control Block.
The FCB contains the file ID of the file. The file ID is passed
to routine GETFILNAM, which uses it to retrieve the file name by
means of XQP QIOs. This works for files that are added
interactively, by the SYSGEN INSTALL command. For files that
were opened during system startup (before the XQP has been
started), the WCB does not point to an FCB, and the file ID is
not available. The file name for these files is inferred from the
location of the PFL block address in the array. If there is a
swap file (a swap file is, by the way, optional), it will be the
first entry after the null one, and will be named
SYS$SYSROOT:[SYSEXE]SWAPFILE.SYS. There will be a page file at
boot time (or the system won't start), and it will be the
MMG$GW_MINPFIDX'th entry in the array, with name
SYS$SYSROOT:[SYSEXE]PAGEFILE.SYS. 

  Routine COUNT is called to determine how many holes are in the
bitmap, and how big the 16 largest holes are, ordered by size.
COUNT uses the FFS (Find First bit Set) instruction to find the
beginnings of unallocated holes, and the FFC (Find First bit
Clear) instruction to find the ends. The total space free, the
number of holes, and the sizes of the 16 largest holes are
returned. 
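
  COUNT itself is VAX MACRO and is not reproduced here. As a
rough illustration of the scan just described, here is a minimal
Python sketch. The names find_first and count_holes are
hypothetical, and the bitmap is modelled as a list of 0/1 values
rather than packed bytes. The real FFS and FFC instructions work
on bit fields of at most 32 bits at a time, which the sketch
ignores.

```python
def find_first(bits, start, value):
    """Index of the first bit equal to value at or after start,
    or len(bits) if there is none (a stand-in for FFS / FFC)."""
    for i in range(start, len(bits)):
        if bits[i] == value:
            return i
    return len(bits)


def count_holes(bits, keep=16):
    """Return (total_free, hole_count, largest) for a list of 0/1 bits,
    where 1 means free. largest is sorted descending and zero-padded
    to `keep` entries, like the report PFRAG prints."""
    sizes = []
    pos = 0
    while pos < len(bits):
        start = find_first(bits, pos, 1)   # like FFS: beginning of a hole
        if start == len(bits):
            break                          # no more free pages
        end = find_first(bits, start, 0)   # like FFC: end of the hole
        sizes.append(end - start)
        pos = end
    largest = sorted(sizes, reverse=True)[:keep]
    largest += [0] * (keep - len(largest))
    return sum(sizes), len(sizes), largest


total, holes, largest = count_holes([1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0])
print(total, holes, largest[:4])   # 8 3 [4, 3, 1, 0]
```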

  After the data for a file has been collected, the results are
formatted and output, then the next PFL is examined, until none 
are left.


  To assemble and link PFRAG...

$ MAC PFRAG
$ MAC COUNT
$ MAC GETFILNAM
$ LINK PFRAG,COUNT,GETFILNAM,SYS$SYSTEM:SYS.STB/SEL,SCSDEF.STB/SEL

Here is a sample run, on the 8600 in question.
$ RUN PFRAG

File $100$DUA0:[SYS1.SYSEXE]SWAPFILE.SYS
Size: 44000      Total Free: 35264      Allocation Size: 96   Holes: 13
                       Sixteen Largest Holes
  --------  --------  --------  --------  --------  --------  --------  --------
     12768      6336      5600      2304      2304      1536      1440      1344
       768       384       192       192        96         0         0         0

File _$100$DUA0:[SYS1.SYSEXE]SWAPFILE1.SYS;1
Size: 14992      Total Free: 12496      Allocation Size: 96   Holes: 5
                       Sixteen Largest Holes
  --------  --------  --------  --------  --------  --------  --------  --------
      4896      3664      1920      1344       672         0         0         0
         0         0         0         0         0         0         0         0

File $100$DUA0:[SYS1.SYSEXE]PAGEFILE.SYS
Size: 5000       Total Free: 4806       Allocation Size: 96   Holes: 68
                       Sixteen Largest Holes
  --------  --------  --------  --------  --------  --------  --------  --------
      2590       449       102        89        87        86        84        74
        71        66        63        59        57        54        49        48

File _$100$DUA6:[PAGE]VAX1PAGEFILE.SYS;1
Size: 60000      Total Free: 58243      Allocation Size: 96   Holes: 201
                       Sixteen Largest Holes
  --------  --------  --------  --------  --------  --------  --------  --------
     12004      3655      2407      1882      1650      1486      1303      1299
      1009       882       860       831       788       782       718       713

File _$100$DUA7:[PAGE]VAX1PAGEFILE.SYS;1
Size: 64000      Total Free: 50922      Allocation Size: 48   Holes: 4004
                       Sixteen Largest Holes
  --------  --------  --------  --------  --------  --------  --------  --------
        55        55        55        54        53        53        53        53
        53        52        52        51        51        50        50        50


  The PFRAG utility turned up some interesting facts about the
effects of page file fragmentation. As I suspected, page file
fragmentation tends to increase over time, and does not
completely recover until a system re-boot is performed - kind of
reminds me of POOL fragmentation on my old IAS and RSX systems.
In the sample run, above, the last page file is an example of
just such a state. This run was done in the middle of the night,
when very little was happening on the system. This page file was
80% empty, yet the largest hole was only 55 blocks long, and the
allocation factor was down to 48. When the users got to work in
the morning, this fragmentation would have made itself felt. A
system boot was done, and the problem was avoided. 
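
  The figures quoted for that last file can be checked against
the sample run with a little arithmetic. The Python below is
only a sketch: the numbers are copied from the sample run, and
the helper name summarize is hypothetical.

```python
# (label, size, total free, holes) copied from the sample run above
SAMPLE = [
    ("DUA0 SWAPFILE.SYS", 44000, 35264, 13),
    ("DUA0 SWAPFILE1.SYS", 14992, 12496, 5),
    ("DUA0 PAGEFILE.SYS", 5000, 4806, 68),
    ("DUA6 VAX1PAGEFILE.SYS", 60000, 58243, 201),
    ("DUA7 VAX1PAGEFILE.SYS", 64000, 50922, 4004),
]


def summarize(size, free, holes):
    """Return (percent of the file that is free, average hole size in pages)."""
    percent_free = 100.0 * free / size
    average_hole = free / holes if holes else 0.0
    return percent_free, average_hole


for label, size, free, holes in SAMPLE:
    pct, avg = summarize(size, free, holes)
    print(f"{label:24} {pct:5.1f}% free, average hole {avg:8.1f} pages")
```

  For the DUA7 page file this works out to roughly 79.6% free
with an average hole of about 12.7 pages, against an average of
about 290 pages for the DUA6 page file.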

  VMS memory management attempts to minimize the amount of
physical IO to the page file by "clustering" together pages to
be written. The Modified Page Writer maintains a field in each
PFL block that tells it how big to try and make the clusters.
This field is set when the file is created to the value stored
in MPW$GW_MPWPFC (from the SYSGEN parameter MPW_WRTCLUSTER, 96
by default). When the Modified Page Writer scans the bitmap and
does not find a contiguous hole of that size, it reduces the
value in that field by 16, and tries again. If the allocation
size falls to 16, the Modified Page Writer uses a worst case
allocation routine that can allocate chunks of size 1 to 16
blocks in length. It is at this time that the Modified Page
Writer prints the fragmentation warning message on the system
console. However, the Modified Page Writer can be a bottleneck
before the warning message appears - a small allocation factor
causes extra scans of the page file bitmap to be performed, and
additional IOs to be done to write modified pages to disk. 
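
  The step-down described above can be modelled in a few lines
of Python. This is a simplified sketch, not the Modified Page
Writer's actual code: the function name settle_alloc_size is
hypothetical, and the only input is the size of the largest hole
in the bitmap.

```python
def settle_alloc_size(largest_hole, start=96, step=16, floor=16):
    """Lower the cluster size by `step` until a hole that large exists,
    stopping at `floor`, where the worst case allocator takes over."""
    size = start
    while size > floor and largest_hole < size:
        size -= step
    return size


print(settle_alloc_size(12768))   # 96: plenty of room, no reduction
print(settle_alloc_size(55))      # 48: 96 -> 80 -> 64 -> 48
print(settle_alloc_size(10))      # 16: worst case allocation
```

  With the default of 96 and a largest hole of 55 blocks, the
sketch settles on 48, which matches the allocation size reported
for the last page file in the sample run.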

  To avoid this overhead, first make sure that your page and
swap files have a generous amount of free space in them. Then
use PFRAG to determine the amount of fragmentation present. If
the allocation factor for a file falls below 96, try booting the
system to get all the holes in one piece again. If the
allocation factor for a file takes a fast nose dive below 96
after a system boot, then odds are the file could really stand
to be larger - or that you need an(other) alternate page file. 
Try to spread your page and swap file needs across several disks 
and controllers. 
  Try, however, to avoid making these files any larger than is 
strictly necessary. For reasons I have yet to ascertain, large
page/swap files can increase the amount of memory required in
the SYSTEM working set. This can send the number of SYSTEM page
faults through the roof if SYSMWCNT is not large enough to deal
with it. 

  The first time my 8600 slowed down, I really did need a larger 
page file (or files). The second time, though, all the system 
needed was a re-boot. In the future, I won't let that VAX go so 
long between restarts.