Hash Test

Copying Hash

The programs in the src directory allow you to initialize a hash test database and populate it with random data. Several test programs give you different views of the database.

Test Program Description
inithsh Initialize the database in batch
bldhsh Populate the database
delhsh Delete random records in the database
tmphsh Populate a temporary database
hshsize Print the number of records
dbck Check the database pointers
chains Calculate statistics on overflow chains
getprime Calculate a prime number
hreorg Reorganize the database
lsthsh List the database in physical sequence
dmphsh Dump the database in hex
schema Print the schema for a database
holdlock Lock the database while running tstlock
tstlock Test the Database Lock

Test Procedure

Initializing a Hash Database

The inithsh.c program is a batch wrapper program for the hinit.c subroutine in the library.

Read Initialize a Hash Database for initializing a new database.

Database Size

The hshsize program counts the total number of records in the hashing space and the overflow space. Then it counts the total number of empty records and deleted overflow records.

Database Check

The dbck program performs an integrity check on a database. If the database is correct the program ends quietly.

Overflow Chain Statistics

The chains program gathers statistics on the overflow chains in a database.

As a guideline, the total number of records in a hash database should not exceed the totrcds parameter in hinit.c. When this is true, the average length of each overflow chain should generally not exceed 2.0. The standard deviation should generally be around 1.0.

Prime Number Calculation

The getprime program calculates the largest prime number equal or less than an estimated number.

The total number of records in a hash database should be a prime number.

You may have to revise the total number of records in the database as it grows beyond your original estimate. Use hreorg to reorganize your database.

Database Reorganization

Input parameters to the reorg program are:

If the total number of records parameter is zero, the reorg program reorganizes your database based on parameter input and existing record counts.

The reorg program is included as a convenience. However, it is not the best way to reorganize your database. The best way to reorganize is to sort the lsthsh report by column two and prioritize the groups of records that hash to the same address.

Then when you insert the prioritized list into a new database, the chains are optimal for your application.

Lsthsh Report

The lsthsh report may be used for doing mass updates to a large database in physical sequence. It may also be used for analyzing overflow chains.

The first column is the physical sequence number of each record.

The second column is the offset in the database of each record's hashing address.

The third column is the offset in the database of each block. With overflow records, columns two and three differ.

The fourth column is the offset in the database of each record's next pointer in the overflow chain. The last record in the chain always has a next pointer of zero.

The fifth column is the record. The key is embedded in columns 5-12 of the test database record.

Sort the report by column two for doing mass updates to a large database. This allows you to update in the physical sequence of the database. More than one record may sort to the same hashing address in column two. Your mass update program should prioritize each record in the record group sharing the same hashing address. One of the goals for your mass update program should be to make the overflow chains more efficient.

The fourth column contains the next pointer of each record. This column is included to allow you to check for broken chains.

Dmphsh Listing

The dmphsh listing may be used for analyzing the pointers in the database.

The first 256 bytes are schema information about the database. See the structure hashfmt in the bthash.h header file for a description of this schema. The fields from hndl to the end are temporary. They aren't stored on disk.

The first record block starts in byte 256 of the database. See the structure h_nodefmt in the bthash.h header file for a description of the record block. The first 4 bytes are the next pointer in the overflow chain. Pointers are stored in little endian format on Intel machines.

The data portion of the record starts in byte 5 of the each block.

With this information, you can follow the structure of the database and repair the database if it is damaged.

The free stack pointer is at location 0x2c of the header. You may follow the free stack list until you find a next pointer of zero. No record on the free block chain should appear in an overflow chain.

Schema Report

The schema report lets you know what the database characteristics are for a database.

This program prints the information that the insert and update programs use to update the database. This information consists of:

This information is saved in the first 256 bytes of each database. The hashing routine uses these four fields to calculate the hashing address, based on the input key.

Creating and Updating the Database

The bldhsh test program creates a record with random data using hisrt. It then reads it back with hget and places 4 z's at the beginning of the record, then replaces the record in the database using the hupdt routine.

After all records have been created and updated, the next step in the test script prints the database in physical sequence to show the result of the bldhsh program. You will see 4 z's in the beginning of each record.

Create and Populate a Temporary Database

The tmphsh test program creates a temporary database and adds records with random data using hisrt. It then sweeps the database and prints the records.

Delete Program

The delete program delhsh deletes random records from the database.

It stores all the keys in a table, then selects a random key, one at a time, to delete. The size of the key table is limited to 100 thousand keys.

Testing the Database Lock

Testing the database lock requires two logical terminals and four steps.

First Terminal Second Terminal
Run holdlock.sh
Run tstlock.sh
Terminate holdlock
Re-run tstlock.sh