Thursday, July 26, 2018

VAX/VMS Logical Name Hash Table Statistics

            Logical Name Hash Table Statistics

  VMS (and RSTS, and RSX, for that matter) provides a logical
name service, by which a string is mapped into one or more
"equivalence" strings. These logical names can be used in place
of a literal file or device specification to provide device
independence. This allows programs and data to be moved from one
device or system to another without requiring extensive
modification. They are also used to present the user with a
shorter, more manageable "abbreviation" of an actual device or
file specification.

  Everyone is, by now, pretty much sold on the value of logical
names in VMS. Many  years ago, at one of my larger sites, the
Technical Support group (the group responsible for production
system standards at this site) set a policy that ALL access to
files and devices in production systems would be by logical
names. This seemed like a reasonable idea to me, since the tape
and disk farm there tended to change configuration every few
months or so, and the users' systems were wont to drift around
from disk to disk, displaying a tropism for free space. A strict
logical name policy allowed them to change the locations of
files that were used by thousands of users without causing any
problems.

  When we started out, we elected to create any logical names
used by more than a few individuals as system wide logicals,
created at system startup. This, we reasoned, would prevent
wasting processor time creating and deleting them as process
logicals each time a user entered an application. It would also
save space, since we would have one copy of each logical,
instead of hundreds. At that time, I made a cursory
investigation of the possible impact of such a tactic. All
logical names are allocated in Paged Pool. No problem there - we
would just make sure we SYSGENed it to be large enough. I also
found that the names were located by hashing, so I reasoned that
a large number of them should not affect the time required to
locate one, since hashing tends to be much less affected by the
number of items to be searched than, say, a linked list.

  But, that was several years ago, and a great many applications
have been written there since then. With thousands of logical
names being maintained, I thought that a more detailed look for
any potential bottlenecks in this process might be in order. An
examination of the data structures involved brought to light
some interesting facts. It turns out that all of the many
logical name tables (both system and user provided) are, in a
sense, an illusion. Typically, we think of a table as a
structure that is used to locate and select something. Logical
names are located independently from the logical name table they
reside in - AFTER being located, the system checks what logical
name table they are in, to see if it has located the correct
logical name.

  Logical names are actually located by means of hash tables (do
not confuse logical name hash tables with logical name tables).
There is a system wide hash table used to locate shared logical
names - those accessible by more than one process. Each process
has its own process private hash table. The residency of a
logical name in a logical name table (eg, LNM$_PROCESS) can be
considered to be "virtual". The logical name is first located in
one of the hash tables, and after being located, its "residence"
in a logical name table is evaluated.

  A hash table is a "semi" random access technique for storing
and retrieving data. The principle behind it is simple - that it
is very efficient to access an entry in an array, given the
index of the desired item. In the real world, some elaboration
on the idea is necessary for two reasons. The first is that most
data do not have a simple numeric quality that can be used as an
array index, so it becomes necessary to generate one. The
second, is that, in most cases, the set of possible items to be
stored is much larger than the set that actually occurs, so
allocating space in an array for every possible item would be
very wasteful.

  The index into the table is generated by applying some
algorithm to the data, to produce an integer. Typically, the
algorithm "hashes" up the data, in an attempt to produce a
random distribution of indices. Since the table is smaller than
the number of possible items, it is inevitable that some items
will generate the same index (will "hash" to the same address,
thus causing a "collision").
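
  As an illustration only - this is not the hashing algorithm the
VMS executive actually uses for logical names - the C fragment
below shows the general idea: arithmetic folds the characters of
a name down to an integer, which is then reduced modulo the
table size to yield an index.

#include <stdio.h>

#define TABLE_SIZE 128            /* number of listheads               */

/* Reduce a name string to an index in the range 0..TABLE_SIZE-1.
 * This is a garden-variety multiplicative hash, invented for this
 * example; it is not the hash used inside VMS.                        */
static unsigned hash_name(const char *name)
{
    unsigned h = 0;
    while (*name != '\0')
        h = h * 31 + (unsigned char)*name++;
    return h % TABLE_SIZE;        /* fold the result into the table    */
}

int main(void)
{
    printf("%u\n", hash_name("SYS$LOGIN"));   /* some index, 0..127    */
    printf("%u\n", hash_name("SYS$SCRATCH")); /* usually a different one */
    return 0;
}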

  A method often used to resolve these collisions (and, indeed,
the very one used in the logical name hash tables) is to have
each entry in the hash table contain the listhead of a linked
list. As items are entered in the table, all entries that hash
to the same index are linked onto the same linked list. When
looking up an item, the hash address will direct you to the
appropriate listhead. The list can then be traversed to find the
item you want. This is referred to as collision resolution by
chaining.

  The efficiency of a hash table using collision resolution by
chaining is measured by how many "probes" are necessary to
locate an item. A probe is considered to be a comparison of an
item in the linked list with the search criteria to determine if
it is the one we are after. Thus, if we locate a linked list and
find the item we want is the first item in it, then we found it
with one "probe". If we hash to a location and discover that
there are no items in this linked list, then that is zero probes
- we know immediately that it is not in the table, with no
compares performed.
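
  Putting the last two ideas together, here is a minimal sketch in
C of a chained hash table whose lookup routine counts probes. The
structure and routine names here are invented for the example and
bear no relation to the actual data structures the VMS executive
uses for logical names.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define TABLE_SIZE 128

struct entry {                        /* one item on a collision chain */
    struct entry *next;
    char         *name;
};

static struct entry *table[TABLE_SIZE];     /* the listheads           */

static unsigned hash_name(const char *name) /* same toy hash as above  */
{
    unsigned h = 0;
    while (*name != '\0')
        h = h * 31 + (unsigned char)*name++;
    return h % TABLE_SIZE;
}

/* Link a new entry onto the front of the chain its name hashes to.    */
static void insert(const char *name)
{
    unsigned i = hash_name(name);
    struct entry *e = malloc(sizeof *e);
    e->name = strdup(name);
    e->next = table[i];
    table[i] = e;
}

/* Look a name up, counting probes: one probe per comparison made.
 * Hashing to an empty chain costs zero probes, as described above.    */
static struct entry *lookup(const char *name, int *probes)
{
    *probes = 0;
    for (struct entry *e = table[hash_name(name)]; e != NULL; e = e->next) {
        (*probes)++;
        if (strcmp(e->name, name) == 0)
            return e;                 /* successful search              */
    }
    return NULL;                      /* unsuccessful search            */
}

int main(void)
{
    int probes;
    insert("SYS$LOGIN");
    insert("SYS$SCRATCH");
    lookup("SYS$SCRATCH", &probes);
    printf("found with %d probe(s)\n", probes);
    return 0;
}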

  My dusty old college Data Structures textbook pointed out that
the efficiency of a chained hash table is a function of two
things. One is the ability of the hashing algorithm to hash
items to a good random distribution of indices. An algorithm
that hashes to the same address for many different items will
cause the linked list for that index to be very long, and the
speed of access will degenerate to that of searching a linked
list. Testing showed that the one used for logical name hash
tables does a reasonably good (but, not perfect) job of
distributing different logical names over the hash table.

  The other factor affecting performance of a hash table is its
fill factor. The fill factor is the ratio of the total number of
entries in a hash table to the size of the table. For an
unsuccessful lookup (that is, the item we want is not in the
table), the average number of probes is simply the fill factor.
For a successful search, the average number of probes is
approximately 1 + 0.5*fill factor.
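
  To put numbers on that: with the default 128 listheads holding
on the order of 2200 shared logical names (the situation shown in
the first listing below), the fill factor is about 17, so an
unsuccessful translation should average about 17 probes and a
successful one about 1 + 0.5*17, or roughly 9 or 10.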

  Armed with this knowledge, and a sinking feeling that the hash
tables on this system were very full, I wrote the LNMFILL
utility. This utility locates the hash tables, and travels down
their entries. For each entry in the hash table, it traverses
the linked list it points to. The utility counts slots used,
slots empty, and maximum and average lengths of the linked
lists. LNMFILL does its work in KERNEL mode, since it needs to
acquire the LNM MUTEX. It is compatible with VMS V5 and up as
well as VMS V4, since the call to SCH$LOCKR will acquire the
needed spinlocks automatically. All calculations are rounded off to
integral values, to make everything easier...
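
  LNMFILL itself is MACRO-32 and does its counting in kernel mode
against the executive's own data structures, which are not
reproduced here. Purely to illustrate the arithmetic, the C
fragment below - which assumes the toy chained table from the
earlier sketch - computes the same sort of statistics the utility
reports; the average chain length is taken over the non-empty
listheads, which is consistent with the listings that follow.

/* Walk the toy chained table from the earlier sketch and report the
 * kind of statistics LNMFILL prints: total listheads, free (empty)
 * listheads, total names, average chain length over the non-empty
 * chains, and the longest chain seen.                                */
static void report_stats(void)
{
    int used = 0, names = 0, maxlen = 0;

    for (int i = 0; i < TABLE_SIZE; i++) {
        int len = 0;
        for (struct entry *e = table[i]; e != NULL; e = e->next)
            len++;                    /* length of this collision chain */
        if (len > 0) {
            used++;
            names += len;
            if (len > maxlen)
                maxlen = len;
        }
    }

    printf("Total Listheads : %d\n", TABLE_SIZE);
    printf("Free Listheads  : %d\n", TABLE_SIZE - used);
    printf("Logical Names   : %d\n", names);
    printf("Ave. Chain Len. : %d\n", used ? names / used : 0);
    printf("Max. Chain Len. : %d\n", maxlen);
}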

  A run of the utility on one of the systems in question
produced the following results...


   *  Shared Tables *

Total Listheads : 128
Free Listheads  : 0
Logical Names   : 2207
Ave. Chain Len. : 17
                   Top 16 Max. Chain Lens.
   157    154    151     68     64     58     54     54
    43     43     42     39     36     26     21     20

   *  Process Table *

Total Listheads : 127
Free Listheads  : 109
Logical Names   : 20
Ave. Chain Len. : 1
                   Top 16 Max. Chain Lens.
     2      2      1      1      1      1      1      1
     1      1      1      1      1      1      1      1


  The process private hash tables were in pretty good shape.
They had a fill factor of less than 1 (logical names/total
listheads), and the average chain length was 1. The maximum
chain length was 2, indicating that there was an occasional
collision, causing two items to hash to the same address. We
decided that the size of this table didn't need to be increased.
If it ever does, it can be adjusted with the SYSGEN parameter
LNMPHASHTBL, which is in units of table entries and defaults
to 128.

  It was pretty clear that the hash table size for the shared
table needed to be increased, to get the fill factor down to
less than 1. This would cause fewer items to be linked onto
linked lists, and speed up both successful and unsuccessful
searches. Its size is controlled by SYSGEN parameter
LNMSHASHTBL, and it also defaulted to 128. This was too small by
over an order of magnitude for this system. Fortunately, this
is not too "expensive", in terms of memory, to increase. We
raised it to 2048 - from .25 pages to 16 pages of paged pool.
After a reboot, the improvement was definitely noticeable.


   *  Shared Tables *

Total Listheads : 2048
Free Listheads  : 1028
Logical Names   : 2203
Ave. Chain Len. : 2

                   Top 16 Max. Chain Lens.
   131    130    130     18     12     11      9      9
     9      9      9      8      8      8      8      8

   *  Process Table *

Total Listheads : 127
Free Listheads  : 109
Logical Names   : 22
Ave. Chain Len. : 1

                   Top 16 Max. Chain Lens.
     3      2      2      1      1      1      1      1
     1      1      1      1      1      1      1      1

  This was much better, with a load factor of about 0.5, but,
the top three maximum chain lengths were surprisingly high,
about 131. A little inspection with SDA showed why there were
so many collisions at these three indices. Each job references
three logicals with the same names - SYS$SCRATCH, SYS$LOGIN, and
SYS$LOGIN_DEVICE. Since the hash address is a function of the
logical name, and since all JOB logicals are in the system
shareable hash table (since they are potentially accessed by
more than one process), all the copies of each of these three
names hash to the same three indices. There will be one entry
per logged-in user on each of these three chains. In any case,
the average chain length
was down to 2, so, for most logical names, access times were
much improved.
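
  As a rough check against the formulas given earlier, the new
fill factor is 2203/2048, or about 1.1; that predicts about one
probe for an unsuccessful translation and roughly 1.5 for a
successful one, so most translations now cost only a probe or two.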

  If you have a small to medium site, then you probably do not
have to worry about changing the table sizes. If you have many
more than 128 logical names (the default hash table size), then
you should consider increasing the size of LNMSHASHTBL and/or
LNMPHASHTBL, as needed, to improve the speed of logical name
translations.
  To use LNMFILL, just assemble, link, and run it. Use of
LNMFILL requires CMKRNL privilege...

$ MAC LNMFILL
$ LINK LNMFILL
$ RUN LNMFILL