Chapter 17 ■ Direct Access Files
The Key of Reference
When you define an indexed file with ACCESS MODE IS SEQUENTIAL, the file is always processed in ascending primary-key order. But if the file is defined with ACCESS MODE IS DYNAMIC and is processed sequentially, the file system must be able to tell which of the keys to use as the basis for processing the file. Because the format of the sequential READ does not have a key phrase, the file system refers to a special item called the key of reference to discover which key to use for processing the file. Before reading a file defined as ACCESS MODE IS DYNAMIC sequentially, you must establish one of the file's keys as the key of reference. You do so by using the key in a START or a direct READ. When the file is opened, the primary key is by default the key of reference, and the next-record pointer points to the first record.
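To make the idea concrete, here is a minimal sketch of a complete program, assuming a compiler with indexed-file support such as GnuCOBOL. The file name students.dat, the record layout, and data names such as StudentFile, CourseCode, and EndOfStudentFile are hypothetical. The file is opened with the primary key StudentId as the key of reference; a START on the alternate key CourseCode then makes CourseCode the key of reference, so the sequential READ NEXT loop returns the records in CourseCode order rather than StudentId order.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. KeyOfRefDemo.
      * Sketch only: file name, record layout, and data names are
      * hypothetical. START makes CourseCode the key of reference,
      * so READ NEXT returns records in CourseCode order.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT StudentFile ASSIGN TO "students.dat"
               ORGANIZATION IS INDEXED
               ACCESS MODE IS DYNAMIC
               RECORD KEY IS StudentId
               ALTERNATE RECORD KEY IS CourseCode WITH DUPLICATES
               FILE STATUS IS StudentStatus.
       DATA DIVISION.
       FILE SECTION.
       FD  StudentFile.
       01  StudentRec.
           88  EndOfStudentFile  VALUE HIGH-VALUES.
           02  StudentId         PIC 9(7).
           02  StudentName       PIC X(8).
           02  CourseCode        PIC X(4).
       WORKING-STORAGE SECTION.
       01  StudentStatus         PIC XX.
       PROCEDURE DIVISION.
       Begin.
           PERFORM CreateFile
           OPEN INPUT StudentFile
      *    After OPEN the key of reference is the primary key. This
      *    START switches it to CourseCode, positioned at the lowest
      *    CourseCode in the file.
           MOVE LOW-VALUES TO CourseCode
           START StudentFile KEY IS GREATER THAN CourseCode
               INVALID KEY DISPLAY "START failed: " StudentStatus
           END-START
           READ StudentFile NEXT RECORD
               AT END SET EndOfStudentFile TO TRUE
           END-READ
           PERFORM UNTIL EndOfStudentFile
               DISPLAY CourseCode " " StudentId " " StudentName
               READ StudentFile NEXT RECORD
                   AT END SET EndOfStudentFile TO TRUE
               END-READ
           END-PERFORM
           CLOSE StudentFile
           STOP RUN.

       CreateFile.
      *    Load four sample records in ascending StudentId order.
           OPEN OUTPUT StudentFile
           MOVE "1000001Ryan    LM51" TO StudentRec
           WRITE StudentRec
           MOVE "1000002Byrne   LM02" TO StudentRec
           WRITE StudentRec
           MOVE "1000003Walsh   LM51" TO StudentRec
           WRITE StudentRec
           MOVE "1000004Murphy  LM02" TO StudentRec
           WRITE StudentRec
           CLOSE StudentFile.
```

With the sample data, the listing should show the two LM02 students before the two LM51 students, even though the records were written in StudentId order. Without the START, the loop would list them in StudentId order, because the primary key is the key of reference when the file is opened.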
Indexed File Verbs
Indexed files use the same verbs for file manipulation as relative files, but in some cases there are syntactic or semantic differences. This section examines only those verbs that differ in syntax or semantics from those used with relative files.
The READ Verb
When an indexed file is defined with ACCESS MODE IS SEQUENTIAL, the READ format is the same as for sequential files. But when the file is defined with ACCESS MODE IS DYNAMIC, sequential processing of the file is complicated by the presence of a number of indexes. The order in which the data records are read depends on which index is being processed sequentially, and the index used is established by the key of reference.
For indexed files, the format of the READ used to read sequentially is the same as for relative files. But in the case of the direct READ, the format requires a KEY IS phrase to specify the key on which the file is to be read. The metalanguage for this format of READ is given in Figure 17-14.
Figure 17-14. READ format used to read an indexed file directly
To read a record directly from an indexed file, a key value must be placed in the KeyName data item (the KeyName data item is the area of storage identified as the primary key or one of the alternate keys in the SELECT and ASSIGN clause).
When READ executes, the record with a key value equal to the present value of KeyName is read into the file buffer.
After the record has been read, the next-record pointer points to the next logical record in the file. If the key of reference is the primary key, then this record is an actual data record; but if the key of reference is one of the alternate keys, the pointer points to the next alternate index base record.
If duplicates are allowed, only the first record in a group with duplicates can be read directly. The rest of the duplicates must be read sequentially using the READ NEXT RECORD format.
Here are some things to remember:
• If the record does not exist, the INVALID KEY clause activates, and the statement block following the clause is executed.
• If the KEY IS clause is omitted, the key used is the primary key.
• When READ is executed, the key mentioned in the KEY IS phrase is established as the key of reference.
• If there is no KEY IS phrase, the primary key is established as the key of reference.
• The file must have an ACCESS MODE of DYNAMIC or RANDOM and must be opened for I-O or INPUT.
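The following minimal sketch assumes a compiler with indexed-file support such as GnuCOBOL and a hypothetical students.dat file whose alternate key CourseCode allows duplicates; the file name, record layout, and data names (including WantedCourse) are illustrative. It reads the first student on course LM02 directly with READ ... KEY IS CourseCode. Because that READ also makes CourseCode the key of reference, the remaining duplicates are fetched with READ NEXT RECORD until the course code changes.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. DirectReadDemo.
      * Sketch only: names are hypothetical. A direct READ on the
      * alternate key finds the first record of a duplicate group;
      * READ NEXT then fetches the rest of the group.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT StudentFile ASSIGN TO "students.dat"
               ORGANIZATION IS INDEXED
               ACCESS MODE IS DYNAMIC
               RECORD KEY IS StudentId
               ALTERNATE RECORD KEY IS CourseCode WITH DUPLICATES
               FILE STATUS IS StudentStatus.
       DATA DIVISION.
       FILE SECTION.
       FD  StudentFile.
       01  StudentRec.
           88  EndOfStudentFile  VALUE HIGH-VALUES.
           02  StudentId         PIC 9(7).
           02  StudentName       PIC X(8).
           02  CourseCode        PIC X(4).
       WORKING-STORAGE SECTION.
       01  StudentStatus         PIC XX.
       01  WantedCourse          PIC X(4) VALUE "LM02".
       PROCEDURE DIVISION.
       Begin.
           PERFORM CreateFile
           OPEN INPUT StudentFile
           MOVE WantedCourse TO CourseCode
      *    KEY IS makes CourseCode the key of reference.
           READ StudentFile KEY IS CourseCode
               INVALID KEY DISPLAY "No students on " WantedCourse
               NOT INVALID KEY PERFORM ListDuplicates
           END-READ
           CLOSE StudentFile
           STOP RUN.

       ListDuplicates.
      *    Stop at end of file or when the CourseCode changes.
           PERFORM UNTIL EndOfStudentFile
                      OR CourseCode NOT = WantedCourse
               DISPLAY StudentId " " StudentName " " CourseCode
               READ StudentFile NEXT RECORD
                   AT END SET EndOfStudentFile TO TRUE
               END-READ
           END-PERFORM.

       CreateFile.
      *    Load four sample records in ascending StudentId order.
           OPEN OUTPUT StudentFile
           MOVE "1000001Ryan    LM51" TO StudentRec
           WRITE StudentRec
           MOVE "1000002Byrne   LM02" TO StudentRec
           WRITE StudentRec
           MOVE "1000003Walsh   LM51" TO StudentRec
           WRITE StudentRec
           MOVE "1000004Murphy  LM02" TO StudentRec
           WRITE StudentRec
           CLOSE StudentFile.
```

With the sample data, the program should list the two LM02 students, Byrne and Murphy. Setting WantedCourse to a code that is not in the file should activate the INVALID KEY clause instead.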
The WRITE, REWRITE and DELETE Verbs
The syntax and semantics of the WRITE, REWRITE, and DELETE verbs are the same as for relative files, except that
• Direct access for all these verbs is based on the primary key only.
• Although REWRITE may not change the value of the primary key, it may change the value of any of the alternate keys.
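As a hedged illustration, the minimal sketch below assumes a compiler with indexed-file support such as GnuCOBOL; the file name students.dat, the record layout, and the data names are hypothetical. It reads a record by its primary key, changes its alternate key CourseCode, and rewrites it. It then deletes a record by moving its primary-key value to StudentId; in DYNAMIC access mode, DELETE does not need a preceding READ. Finally, it reopens the file and lists it in primary-key order.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. UpdateDemo.
      * Sketch only: names are hypothetical. REWRITE and DELETE find
      * their record through the primary key StudentId. REWRITE may
      * change the alternate key CourseCode but not StudentId.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT StudentFile ASSIGN TO "students.dat"
               ORGANIZATION IS INDEXED
               ACCESS MODE IS DYNAMIC
               RECORD KEY IS StudentId
               ALTERNATE RECORD KEY IS CourseCode WITH DUPLICATES
               FILE STATUS IS StudentStatus.
       DATA DIVISION.
       FILE SECTION.
       FD  StudentFile.
       01  StudentRec.
           88  EndOfStudentFile  VALUE HIGH-VALUES.
           02  StudentId         PIC 9(7).
           02  StudentName       PIC X(8).
           02  CourseCode        PIC X(4).
       WORKING-STORAGE SECTION.
       01  StudentStatus         PIC XX.
       PROCEDURE DIVISION.
       Begin.
           PERFORM CreateFile
           OPEN I-O StudentFile
      *    Move student 1000002 to course LM51 (an alternate key).
           MOVE 1000002 TO StudentId
           READ StudentFile KEY IS StudentId
               INVALID KEY DISPLAY "1000002 not found"
               NOT INVALID KEY
                   MOVE "LM51" TO CourseCode
                   REWRITE StudentRec
                       INVALID KEY DISPLAY "REWRITE: " StudentStatus
                   END-REWRITE
           END-READ
      *    DELETE needs only the primary-key value, not a prior READ.
           MOVE 1000004 TO StudentId
           DELETE StudentFile RECORD
               INVALID KEY DISPLAY "DELETE: " StudentStatus
           END-DELETE
           CLOSE StudentFile
      *    Reopen: the primary key is again the key of reference and
      *    the next-record pointer is at the first record.
           OPEN INPUT StudentFile
           READ StudentFile NEXT RECORD
               AT END SET EndOfStudentFile TO TRUE
           END-READ
           PERFORM UNTIL EndOfStudentFile
               DISPLAY StudentId " " StudentName " " CourseCode
               READ StudentFile NEXT RECORD
                   AT END SET EndOfStudentFile TO TRUE
               END-READ
           END-PERFORM
           CLOSE StudentFile
           STOP RUN.

       CreateFile.
      *    Load four sample records in ascending StudentId order.
           OPEN OUTPUT StudentFile
           MOVE "1000001Ryan    LM51" TO StudentRec
           WRITE StudentRec
           MOVE "1000002Byrne   LM02" TO StudentRec
           WRITE StudentRec
           MOVE "1000003Walsh   LM51" TO StudentRec
           WRITE StudentRec
           MOVE "1000004Murphy  LM02" TO StudentRec
           WRITE StudentRec
           CLOSE StudentFile.
```

If both operations succeed, the final listing should show three students, all on course LM51, with 1000004 gone. Note that nothing is moved to StudentId between the READ and the REWRITE: because REWRITE identifies its record by the primary key, changing StudentId there would make the REWRITE address a different record, or fail with INVALID KEY if no record has that key.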
The START Verb
The syntax for the START verb is the same as for relative files, except that instead of the format START FileName KEY Condition RelKey, the format is as shown in Figure 17-15. The key of comparison is any of the keys specified in the indexed file's SELECT and ASSIGN clause.
Figure 17-15. Metalanguage for the START verb
Just as with relative files, the START verb may be used to control the position of the next-record pointer. In addition, with indexed files, the START verb may be used to establish a particular key as the key of reference.
The primary key or one of the alternate keys is the key of comparison. To establish a particular key as the key of reference and position the next-record pointer at a particular record, you first move the key value to the key-of-comparison data item. Then you execute START ... KEY IS EQUAL TO ... if you want to position the next-record pointer at the record whose key equals the value in the key of comparison, or START ... KEY IS GREATER THAN ... if you want to position it at the first record whose key is greater than that value.
Remember these things:
• The file must be opened for INPUT or I-O when START is executed.
• Execution of the START statement does not change the contents of the record area (that is, START does not read the record; it merely positions the next-record pointer and establishes the key of reference).
• When START is executed, the next-record pointer is set to the first logical record in the file whose key satisfies the condition. If no record satisfies the condition, the INVALID KEY clause is activated.
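A minimal sketch of these rules, assuming a compiler with indexed-file support such as GnuCOBOL; the file name students.dat, the record layout, and the data names are hypothetical. After START ... KEY IS GREATER THAN StudentId, the record area still holds the value that was moved into StudentId; only the following READ NEXT brings in the record with the next higher key. A second START with KEY IS EQUAL TO and a key that is not in the file activates INVALID KEY.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. StartDemo.
      * Sketch only: names are hypothetical. START positions the
      * next-record pointer; it does not read a record.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT StudentFile ASSIGN TO "students.dat"
               ORGANIZATION IS INDEXED
               ACCESS MODE IS DYNAMIC
               RECORD KEY IS StudentId
               ALTERNATE RECORD KEY IS CourseCode WITH DUPLICATES
               FILE STATUS IS StudentStatus.
       DATA DIVISION.
       FILE SECTION.
       FD  StudentFile.
       01  StudentRec.
           02  StudentId         PIC 9(7).
           02  StudentName       PIC X(8).
           02  CourseCode        PIC X(4).
       WORKING-STORAGE SECTION.
       01  StudentStatus         PIC XX.
       PROCEDURE DIVISION.
       Begin.
           PERFORM CreateFile
           OPEN INPUT StudentFile
      *    Position after 1000002. START reads nothing, so the record
      *    area still holds the value moved into StudentId.
           MOVE 1000002 TO StudentId
           START StudentFile KEY IS GREATER THAN StudentId
               INVALID KEY DISPLAY "No record after 1000002"
           END-START
           DISPLAY "After START:     " StudentId
           READ StudentFile NEXT RECORD
               AT END DISPLAY "End of file"
           END-READ
           DISPLAY "After READ NEXT: " StudentId " " StudentName
      *    No key equals 1000009, so INVALID KEY is activated.
           MOVE 1000009 TO StudentId
           START StudentFile KEY IS EQUAL TO StudentId
               INVALID KEY DISPLAY "START status: " StudentStatus
           END-START
           CLOSE StudentFile
           STOP RUN.

       CreateFile.
      *    Load four sample records in ascending StudentId order.
           OPEN OUTPUT StudentFile
           MOVE "1000001Ryan    LM51" TO StudentRec
           WRITE StudentRec
           MOVE "1000002Byrne   LM02" TO StudentRec
           WRITE StudentRec
           MOVE "1000003Walsh   LM51" TO StudentRec
           WRITE StudentRec
           MOVE "1000004Murphy  LM02" TO StudentRec
           WRITE StudentRec
           CLOSE StudentFile.
```

With the sample data, the first DISPLAY should still show 1000002, and the second should show 1000003 (Walsh), the first record whose key is greater than 1000002.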
Comparison of COBOL File Organizations
Now that you have examined all the COBOL file organizations, you may wonder which is the best one to use. The answer is that it depends. This section examines the advantages and disadvantages of each organization; from this information, you should be able to figure out which organization to use in a given situation.
First, some terminology. The hit rate refers to the proportion of records in the file that are affected when you process the file. For instance, if only 100 records in a file of 100,000 records are affected by insert, delete, or update operations, the hit rate is low (0.1%). But if 90,000 records are affected, the hit rate is high (90%).
Sequential File Organization
The records in a sequential file are held serially, one after another, on disk, tape, or other media. This organization has both advantages and disadvantages.
Disadvantages of Sequential File Organization
Sequential files have the following disadvantages:
• They are slow when the hit rate is low. To read a particular record, you have to read all the preceding records. To update records, you have to read all the records in the file and write them to a new file. This is a lot of work if all you are doing is changing a few of the records in the file.
• They are complicated to change. Changes to sequential files are batched together in a transaction file to minimize the low-hit-rate problem, but this makes updating sequential files much more complicated than updating direct access files. The complications arise from having to match the records in the transaction file with those in the master file (that is, the file to be updated).
• They take up double the storage when they are updated. The records in sequential files cannot be updated in situ; instead, a new file must be created that consists of all the records in the old file plus the insertions and minus the deletions. Of course, this storage problem may be transient, because once the new file has been created, you can delete the old file.
Advantages of Sequential File Organization
Sequential file organization also has a number of advantages:
• When the hit rate is high, it is the fastest file organization because the record position does not have to be calculated and no indexes have to be traversed. Because the records are stored contiguously, this organization takes advantage of the fact that the file system doesn't access records on a per-record basis but instead scoops up a block or bucket at a time. When a block contains a number of records, the number of disk accesses required to process the file is greatly reduced.
• It is the most storage efficient of all the file organizations. No indexes are required, the space from deleted records is recovered, and only the storage actually required to hold the records is allocated to the file.
• It is the simplest file organization. Records are held serially, so you read them one after another.
• It allows the space from deleted records to be recovered. To delete records from a sequential file, you create a new file that does not contain the deleted records. Once you delete the old file, all the storage previously used by the deleted records is recovered and can be used for storing something else.
• Sequential files may be stored and processed on serial media such as magnetic tape. These media are cheap, removable, and voluminous.
Relative File Organization
You can think of the records in a relative file as a one-dimensional table stored on disk. The file system can calculate where each record is on the disk because it knows the start location for the file, and it knows the amount of storage required to store each record. The record location is calculated as RecordLocation = BaseLocation + (SizePerRecord * (RelativeRecordNumber - 1)).
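The calculation is simple enough to sketch directly in COBOL. The BaseLocation of 4096 and SizePerRecord of 60 below are made-up values chosen only for illustration, and PrnLocation is a hypothetical display field; a real file system may also add its own per-record overhead, such as the record-deletion indicator.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. RelLocationDemo.
      * Sketch of the relative-file address calculation. The values
      * of BaseLocation and SizePerRecord are illustrative only.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  BaseLocation          PIC 9(9)  VALUE 4096.
       01  SizePerRecord         PIC 9(5)  VALUE 60.
       01  RelativeRecordNumber  PIC 9(7).
       01  RecordLocation        PIC 9(12).
       01  PrnLocation           PIC Z(11)9.
       PROCEDURE DIVISION.
       Begin.
           PERFORM VARYING RelativeRecordNumber FROM 1 BY 1
                   UNTIL RelativeRecordNumber > 3
      *        Record n starts n - 1 record slots after the base.
               COMPUTE RecordLocation = BaseLocation +
                   (SizePerRecord * (RelativeRecordNumber - 1))
               MOVE RecordLocation TO PrnLocation
               DISPLAY "Record " RelativeRecordNumber
                       " starts at " PrnLocation
           END-PERFORM
           STOP RUN.
```

With these values, records 1, 2, and 3 should start at 4096, 4156, and 4216. The same formula shows why a sparsely populated relative file wastes space: a record with relative record number 100,000 would start at 4096 + 60 * 99,999 = 6,004,036, and everything before it is allocated whether or not it holds data.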
Disadvantages of Relative File Organization
Relative file organization has a number of disadvantages:
• It wastes storage if the file is only partially populated with records. The file is allocated enough disk storage to hold records from 1 to the highest relative record number used, even if only a few records have been written to the file. For instance, if the first record written to the file has a relative record number of 100,000, room for that many records is allocated to the file.
• It cannot recover the space from deleted records. When a record is deleted in a relative file, it is marked as deleted, but the space that was occupied by the record is still allocated to the file. This means if a relative file takes up 1.5MB of disk space when full, it still occupies 1.5MB when 99% of the records have been deleted.
• It allows only a single, numeric key. The single key is limiting because often you need to access a file on more than one key. For instance, in a file of student records, you might want to access the records on student ID, student name, course code, or module code. The mention of using student name, course code, or module code highlights another drawback with relative files: you frequently need to access a file using an alphanumeric key.
• The relative file key must map onto the range of the relative record numbers for the file. Because the key must lie between 1 and the highest key value, and because the file system allocates space for every record between 1 and the highest relative record number used, the key is severely constrained. For instance, even though StudentId is numeric, you can't use it as a key, because the file system would allocate space for records between 1 and the highest StudentId written to the file. If the highest StudentId written to the file is 9976683, the file system will allocate space for 9,976,683 records. Universities rarely have this many students, so most of the file will be wasted space.
Sometimes you can get around the limitations of the relative key by using a transformation function to map the actual key onto the range of relative record numbers. There are a number of possible transformation or hashing functions. These transformations include truncation (using only some of the digits in the key as the relative record number), folding (breaking the key into two or more parts and summing the parts), digit manipulation (manipulating some of the digits in the key to produce a relative record number), and modulus division (using the remainder of a division operation as the relative record number). Some sophisticated transformation functions may even allow alphanumeric keys.
• Relative files must be stored and processed on direct access media. Because relative files are direct access files, they must be processed on direct access media such as a hard disk. They cannot be processed on magnetic tape or other cheap serial media; and if stored on tape, they must be loaded onto a hard disk before they can be used.
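The transformation functions described in the list above can be sketched in a few lines of COBOL. This sketch applies truncation, folding, and modulus division to the StudentId 9976683 used in the text. The file sizes (1,000 slots, and the prime 997 as divisor) and the data names IdParts, FoldedSum, and RelKey are illustrative assumptions; adding 1 keeps each result in the range of relative record numbers, which start at 1.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. KeyTransformDemo.
      * Sketch only: maps a 7-digit StudentId onto relative record
      * numbers. Sizes 1000 and 997 and all names are illustrative.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  StudentId             PIC 9(7) VALUE 9976683.
       01  IdParts REDEFINES StudentId.
           02  FirstDigit        PIC 9.
           02  MiddleDigits      PIC 999.
           02  LastDigits        PIC 999.
       01  FoldedSum             PIC 9(4).
       01  RelKey                PIC 9(4).
       PROCEDURE DIVISION.
       Begin.
      *    Truncation: keep the last three digits (range 1-1000).
           COMPUTE RelKey = LastDigits + 1
           DISPLAY "Truncation: " RelKey
      *    Folding: add the parts, then reduce to range 1-1000.
           COMPUTE FoldedSum = FirstDigit + MiddleDigits + LastDigits
           COMPUTE RelKey = FUNCTION MOD(FoldedSum, 1000) + 1
           DISPLAY "Folding:    " RelKey
      *    Modulus division: remainder after dividing by a prime
      *    close to the file size (range 1-997).
           COMPUTE RelKey = FUNCTION MOD(StudentId, 997) + 1
           DISPLAY "Modulus:    " RelKey
           STOP RUN.
```

Run as shown, it should display 0684 (683 + 1), 0669 (9 + 976 + 683 = 1668, remainder 668, plus 1), and 0702 (9976683 divided by 997 leaves 701, plus 1). Each of these functions can map different keys to the same relative record number (1000683 and 2000683 both truncate to 684, for example), so a program that uses them must also decide where to put a record whose computed slot is already occupied.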
Advantages of Relative File Organization
Although relative file organization has many disadvantages, it also has the following advantages:
• It is the fastest direct access organization. Only a few simple calculations have to be done to locate a particular record.
• Records in a relative file have very little storage overhead. Unlike indexed files, which must store the indexes as well as the data, relative files have only a small storage overhead for each record (such as the record-deletion indicator).
• Records in a relative file can be read sequentially. In addition to allowing direct access, relative files allow sequential access to the records in the file.
Indexed File Organization
As shown earlier in Figure 17-10, the records in an indexed file are arranged in ascending primary-key order in a series of chained buckets/blocks. In addition to the actual data records, the primary key has a number of index records. For each alternate key specified for the file, there is a similar arrangement; but instead of data records at the final level, there are records arranged in ascending alternate-key order that consist only of the key and a pointer to where the actual record may be found. As shown earlier in Figure 17-11, in addition to the records at the base level, there are a number of alternate-key index records.
Disadvantages of Indexed File Organization
Indexed file organization has many disadvantages:
• It is the slowest direct access organization, because indexed files achieve direct access by traversing a number of levels of index. Indexed files must have a primary-key index and an index for each alternate key. Each level of index implies an I/O operation on the hard disk. For instance, three I/O operations are required to read the record shown earlier in Figure 17-10: two for the index records and one for the data record.
• It is especially slow when writing or deleting records, because the primary-key index and the alternate-key indexes may then need to be rebuilt.
• It is not very storage efficient, because indexed files must store the primary-key index records, the alternate-key index records, the data records, and the alternate-key base records (which hold only a key and a pointer).
• Space from deleted records is only partially recovered until the indexes are rebuilt (which has to be done periodically).
• Indexed files may only be processed on direct access media, because they are direct access files. They cannot be processed on magnetic tape.
Advantages of Indexed File Organization
As you have seen, indexed files have many disadvantages, but these are far outweighed by their advantages:
• They can use multiple, alphanumeric keys.
• They can have duplicate alternate keys.
• They can be read sequentially on any of their keys.
• They can partially recover space from deleted records.
• They can have multiple alphanumeric keys, and only the primary key must be unique.
Although indexed files have their disadvantages, the versatility afforded by having multiple, alphanumeric keys and being able to process the file both directly and sequentially on any of its keys overrides all their disadvantages.
As a result, indexed files are the most widely used direct access file organization.
Summary
This chapter introduced COBOL’s direct access file organizations: indexed and relative files. You learned about the arrangement of records in each of these file organizations, along with new concepts such as file status, the next-record pointer, and key of reference. You explored the syntactic and semantic changes that allow the existing file-processing verbs to process direct access files, and you were introduced to new COBOL file-processing verbs such as DELETE, REWRITE, and START. In the final section of the chapter, you saw the advantages and disadvantages of each of the COBOL file organizations.
The next chapter discusses the COBOL Report Writer. The Report Writer allows you to write programs that produce reports using declarative rather than procedural/imperative techniques. In imperative programming, you tell the computer how to do what you want done. In declarative programming, you tell the computer what you would like done, and the computer works out how to do it.
The Report Writer also uses a kind of specialized exception handling called declaratives. You can also use declaratives with files. When you specify the declaratives for a file, an exception that would normally activate the AT END or INVALID KEY clause instead executes the code you have written in the DECLARATIVES part of the PROCEDURE DIVISION to deal with the problem.
By way of introduction, the answer to the exercise at the end of this chapter uses the Report Writer to print a small report. Because you don’t know how to use the Report Writer yet, you have to do the exercise the hard way. There is nothing like the pain of coding a report program to make you appreciate the benefits of the Report Writer!
PROGRAMMING EXERCISE
Time for a little exercise. Whip out your 2B pencil and see if you can come up with a solution to this problem.
Introduction
Michael Coughlan Page 57