Michael Coughlan
SORT WorkFile
     ON ASCENDING BookId-WF
                  AuthorName-WF
     USING BookSalesFileUS, BookSalesFileEU
     GIVING SortedBookSales

SORT WorkFile
     ON DESCENDING NCAP-Result-WF
        ASCENDING ManfName-WF, VehicleName-WF
     USING NCAP-TestResultsFile
     GIVING Sorted-NCAP-TestResultsFile
Simple Sorting Notes
Consider the following:
• SDWorkFileName identifies a temporary work file that the sort process uses as a kind of scratch
pad for sorting. The file is defined in the FILE SECTION using a sort description (SD) rather
than a file description (FD) entry. Even though the work file is a temporary file, it must still
have associated SELECT and ASSIGN clauses in the ENVIRONMENT DIVISION. You can give this
file any name you like; I usually call it WorkFile as I did in Example 14-1.
• The SDWorkFileName file is a sequential file with an organization of RECORD SEQUENTIAL. Because
this is the default organization, it is usually omitted (see Listing 14-1).
• Each WorkSortKey#$i identifies a field in the record of the work file. The sorted file will be
ordered on these key fields.
• When more than one WorkSortKey#$i is specified, the keys decrease in significance from left
to right (the leftmost key is the most significant, and the rightmost is the least significant).
• InFileName and OutFileName are the names of the input and output files, respectively.
• If more than one InFileName is specified, the files are combined (OutFileSize = InFile1Size
+ InFile2Size) and then sorted.
• If more than one OutFileName is specified, then each file receives a copy of the sorted records.
Chapter 14 ■ Sorting and Merging
• If the DUPLICATES clause is used, then when the file has been sorted, the final order of records
with duplicate keys (keys with the same value) is the same as that in the unsorted file. If no
DUPLICATES clause is used, the order of records with duplicate keys is undefined.
• AlphabetName is an alphabet name defined in the SPECIAL-NAMES paragraph of the
ENVIRONMENT DIVISION. This clause is used to select the character set the SORT verb uses for
collating the records in the file. The character set may be STANDARD-1 (ASCII), STANDARD-2
(ISO 646), NATIVE (may be defined by the system to be ASCII or EBCDIC; see your
implementer manual), or user defined.
• SORT can be used anywhere in the PROCEDURE DIVISION except in an INPUT PROCEDURE (SORT)
or OUTPUT PROCEDURE (SORT or MERGE) or in the DECLARATIVES SECTION. The purpose of the
INPUT PROCEDURE and OUTPUT PROCEDURE is explained later in this chapter, but an explanation
of the DECLARATIVES SECTION has to wait until Chapter 18.
• The records described for the input file (USING) must be able to fit into the records described
for SDWorkFileName.
• The records described for SDWorkFileName must be able to fit into the records described for
the output file (GIVING).
• The description of WorkSortKey#$i cannot contain an OCCURS clause (it cannot be a table), nor
can it be subordinate to an entry that contains one.
• The InFileName and OutFileName files are automatically opened by the SORT. When the SORT
executes, they must not already be open.
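The notes above can be drawn together in one small program. What follows is only a sketch under stated assumptions: the file names, the record layout, and the alphabet name AsciiOrder are hypothetical and are not taken from the chapter's examples. It combines two input files, keeps records with equal keys in their original order, and collates on the ASCII character set. It is written in the same free-format style as the chapter's listings, so your compiler may need a free-format option.

```cobol
IDENTIFICATION DIVISION.
PROGRAM-ID. SimpleSortSketch.
ENVIRONMENT DIVISION.
CONFIGURATION SECTION.
SPECIAL-NAMES.
    *> Hypothetical alphabet name; STANDARD-1 selects ASCII collation
    ALPHABET AsciiOrder IS STANDARD-1.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
    *> The work file still needs SELECT and ASSIGN clauses
    SELECT WorkFile      ASSIGN TO "WORK.TMP".
    SELECT MembersNorth  ASSIGN TO "north.dat"
           ORGANIZATION IS LINE SEQUENTIAL.
    SELECT MembersSouth  ASSIGN TO "south.dat"
           ORGANIZATION IS LINE SEQUENTIAL.
    SELECT SortedMembers ASSIGN TO "members.srt"
           ORGANIZATION IS LINE SEQUENTIAL.
DATA DIVISION.
FILE SECTION.
*> Input records only need to fit into the work record
FD  MembersNorth.
01  NorthRec             PIC X(30).
FD  MembersSouth.
01  SouthRec             PIC X(30).
*> The work file is described with an SD, not an FD
SD  WorkFile.
01  WorkRec.
    02 MemberName-WF     PIC X(20).
    02 FILLER            PIC X(10).
*> The work record must fit into the output record
FD  SortedMembers.
01  SortedRec            PIC X(30).
PROCEDURE DIVISION.
Begin.
    *> None of the files is opened here; SORT opens and closes them
    SORT WorkFile ON ASCENDING KEY MemberName-WF
         WITH DUPLICATES IN ORDER
         COLLATING SEQUENCE IS AsciiOrder
         USING  MembersNorth, MembersSouth
         GIVING SortedMembers
    STOP RUN.
```

Because two files are named in the USING phrase, SortedMembers should end up holding every record from both inputs, ordered on MemberName-WF.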
How the Simple SORT Works
Figure 14-2 shows how the simple version of SORT works. In this case, the diagram uses the example in Listing 14-1 to illustrate the point. The sort process takes records from the unsorted BillableServicesFile, sorts them using WorkFile (the temporary work area), and, when the records have been sorted, sends them to SortedBillablesFile. After sorting, the records in SortedBillablesFile will be ordered on ascending SubscriberId.
Figure 14-2. Diagram showing how the simple SORT works
Simple Sorting Program
Universal Telecoms has subscribers all over the United States. Each month, the billable activities of these subscribers are gathered into a file. BillableServicesFile is an unordered sequential file. Each record has the following description:
Field          Type   Length   Value
SubscriberId   9      10       –
ServiceType    9      1        1(text)/2(voice)
ServiceCost    9      6        0.10–9999.99
A program is required to produce a report that shows the value of the billable services for each subscriber (see Listing 14-1). In the report, BillableValue is the sum of the ServiceCost fields for each subscriber. The report must be printed on ascending SubscriberId and have the following format:
Universal Telecoms Monthly Report
SubscriberId BillableValue
XXXXXXXXXX XXXXXXXXXXX
XXXXXXXXXX XXXXXXXXXXX
XXXXXXXXXX XXXXXXXXXXX
Listing 14-1. A simple SORT applied to the BillableServicesFile
IDENTIFICATION DIVISION.
PROGRAM-ID. Listing14-1.
AUTHOR. Michael Coughlan.
ENVIRONMENT DIVISION.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
    SELECT WorkFile ASSIGN TO "WORK.TMP".
    SELECT BillableServicesFile ASSIGN TO "Listing14-1.dat"
           ORGANIZATION LINE SEQUENTIAL.
    SELECT SortedBillablesFile ASSIGN TO "Listing14-1.Srt"
           ORGANIZATION LINE SEQUENTIAL.
DATA DIVISION.
FILE SECTION.
FD BillableServicesFile.
01 SubscriberRec-BSF       PIC X(17).
SD WorkFile.
01 WorkRec.
   02 SubscriberId-WF      PIC 9(10).
   02 FILLER               PIC X(7).
FD SortedBillablesFile.
01 SubscriberRec.
   88 EndOfBillablesFile   VALUE HIGH-VALUES.
   02 SubscriberId         PIC 9(10).
   02 ServiceType          PIC 9.
   02 ServiceCost          PIC 9(4)V99.
WORKING-STORAGE SECTION.
01 SubscriberTotal         PIC 9(5)V99.
01 ReportHeader            PIC X(33) VALUE "Universal Telecoms Monthly Report".
01 SubjectHeader           PIC X(31) VALUE "SubscriberId      BillableValue".
01 SubscriberLine.
   02 PrnSubscriberId      PIC 9(10).
   02 FILLER               PIC X(8) VALUE SPACES.
   02 PrnSubscriberTotal   PIC $$$,$$9.99.
01 PrevSubscriberId        PIC 9(10).
PROCEDURE DIVISION.
Begin.
    SORT WorkFile ON ASCENDING KEY SubscriberId-WF
         USING BillableServicesFile
         GIVING SortedBillablesFile
    DISPLAY ReportHeader
    DISPLAY SubjectHeader
    OPEN INPUT SortedBillablesFile
    READ SortedBillablesFile
       AT END SET EndOfBillablesFile TO TRUE
    END-READ
    PERFORM UNTIL EndOfBillablesFile
       MOVE SubscriberId TO PrevSubscriberId, PrnSubscriberId
       MOVE ZEROS TO SubscriberTotal
       PERFORM UNTIL SubscriberId NOT EQUAL TO PrevSubscriberId
          ADD ServiceCost TO SubscriberTotal
          READ SortedBillablesFile
             AT END SET EndOfBillablesFile TO TRUE
          END-READ
       END-PERFORM
       MOVE SubscriberTotal TO PrnSubscriberTotal
       DISPLAY SubscriberLine
    END-PERFORM
    CLOSE SortedBillablesFile
    STOP RUN.
Program Notes
I have kept this program simple for reasons of clarity and space, and because you will meet a more fully worked version of the program when I explore advanced versions of the SORT. Because the SORT uses a disk-based WorkFile, it is slower than purely RAM-bound operations. You should be aware of this whenever you are considering using SORT.
You should probably use SORT only when no practical RAM-based solution is available; and even then, you should ensure that only the data items required in the sorted file are sorted. This may involve leaving out some of the records or changing the record size.
In this instance, sorting the file does seem to be the only viable option. There are millions of telephone subscribers, and, in the course of a month, they make many calls and send hundreds of texts. So BillableServicesFile contains tens of millions, or hundreds of millions, of records. In COBOL, the only possible RAM-based solution (you can't create dynamic structures like trees or linked lists pre–ISO 2002) would be to use a table (one element per subscriber) to sum the subscribers’ ServiceCost fields. That solution has many problems. The array would have to contain millions of elements, you would have to ensure that the elements were in SubscriberId order, and, because new subscribers are constantly joining, the table would have to be redimensioned every time the program ran.
You may wonder why the example uses different record descriptions for the three files when the records are identical. The reason is that although the records are identical, they are used in different ways in the program, and the granularity of each data description reflects the way the record is used.
The input file is used only by the SORT, so although you have to define how much storage a record will occupy, you never need to refer to the individual fields. You could fully define the record as follows:
01 UnsortedSubscriberRec.
   02 SubscriberId PIC 9(10).
   02 ServiceType  PIC 9.
   02 ServiceCost  PIC 9(4)V99.
But then you would either have to use slightly different field names for the sorted file or qualify them using references such as SubscriberId OF SubscriberRec.
In WorkFile, only the data items on which the file is to be sorted (mentioned in the KEY phrase) need to be explicitly defined. In this case, the only item that must be explicitly identified is SubscriberId-WF.
The sorted file is normally the file that the program uses to do whatever work is required. This generally means that all, or nearly all, of the data items are mentioned by name in the program; and, hence, they have to be declared.
Normally, the record description for this file fully defines the record.
Using Multiple Keys
If you examine the SORT metalanguage in Figure 14-1, you will see not only that a file can be sorted on a number of keys but also that one key can be ascending while another is descending. This is illustrated in Table 14-1 and Example 14-2.
The table contains student results that have been sorted into ascending StudentId order within descending GPA order. Notice that GPA is the major key and that StudentId is in ascending sequence only within each GPA value. This is because the first key named in a SORT statement is the major key, and keys become less significant with each successive declaration.
Example 14-2. SORT with One Key Descending and Another Ascending
SORT WorkFile ON DESCENDING GPA
                 ASCENDING StudentId
     USING StudentResultsFile
     GIVING SortedStudentsResultsFile
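For the SORT in Example 14-2 to compile, both key fields must be named in the record description of the work file. The fragment below is a sketch of what that SD entry might look like. The field sizes and the FILLER item are assumptions made for illustration; only the names GPA and StudentId come from the example.

```cobol
*> Hypothetical work-file record for Example 14-2.
*> Only the sort keys need names; the rest of the record can be FILLER.
SD  WorkFile.
01  WorkRec.
    02 StudentId         PIC 9(7).   *> minor key (ASCENDING), assumed size
    02 FILLER            PIC X(20).  *> other student details, assumed size
    02 GPA               PIC 9V99.   *> major key (DESCENDING), assumed size
```

The physical position of a key in the record has no bearing on its significance. GPA is the major key because it is named first in the SORT statement, not because of where it sits in WorkRec.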
Table 14-1. Ascending StudentId within Descending GPA
SORT with Procedures
The simple version of SORT takes the records from InFileName, sorts them, and then outputs them to OutFileName.
Sometimes, however, not all the records in the unsorted file are required in the sorted file, or not all the data items in the unsorted file record are required in the record of the sorted file. For instance, suppose the specification for the Universal Telecoms Monthly Report changes so that you are only required to show the value of the voice calls made by subscribers. In that situation, the text records (ServiceType = 1) are not required in the sorted file. Similarly, if the specification changes so that the number of texts and phone calls is required rather than their value, you do not need the ServiceCost data item in sorted file records. In both cases, processing must be applied, to eliminate unwanted records or alter their format, before the records are submitted to the sort process. This processing is achieved by specifying INPUT PROCEDURE with SORT.
Sometimes, to reduce the number of files that have to be declared, you may find it useful to process the records directly from the sort process instead of creating a sorted file and then processing that. For instance, you could create the Universal Telecoms Monthly Report directly instead of creating a sorted file and then processing the sorted file to create the report. Such processing is accomplished by using OUTPUT PROCEDURE with SORT.
An INPUT PROCEDURE is a block of code that consists of one or more sections or paragraphs that execute, having been passed control by SORT. When the block of code has finished, control reverts to SORT. An OUTPUT PROCEDURE works in a similar way.
Figure 14-3 gives the metalanguage for the full SORT including the INPUT PROCEDURE and the OUTPUT PROCEDURE.
Figure 14-3. Metalanguage for the full version of the SORT verb
INPUT PROCEDURE Notes
You should consider the following when using an INPUT PROCEDURE:
• The block of code specified by the INPUT PROCEDURE allows you to select which records,
and what format of records, are submitted to the sort process. Because an INPUT PROCEDURE
executes before the SORT sorts the records, only the data that is actually required in the sorted
file is sorted.
• When you use an INPUT PROCEDURE, it replaces the USING phrase. The ProcedureName in
the INPUT PROCEDURE phrase identifies a block of code that uses the RELEASE verb to supply
records to the sort process. The INPUT PROCEDURE must contain at least one RELEASE statement
to transfer the records to the work file (identified by SDWorkFileName).
• The INPUT PROCEDURE finishes before the sort process sorts the records supplied to it by the
procedure. That's why the records are RELEASEd to the work file. They are stored there until the
INPUT PROCEDURE finishes, and then they are sorted.
• Neither an INPUT PROCEDURE nor an OUTPUT PROCEDURE can contain a SORT or MERGE
statement.
• The pre–ANS 85 COBOL rules for the SORT verb stated that the INPUT PROCEDURE and OUTPUT
PROCEDURE had to be self-contained sections of code and could not be entered from elsewhere
in the program.
• In the ANS 85 version of COBOL, the INPUT PROCEDURE and OUTPUT PROCEDURE can be
any contiguous group of paragraphs or sections. The only restriction is that the range of
paragraphs or sections used must not overlap.
OUTPUT PROCEDURE Notes
You should consider the following when using an OUTPUT PROCEDURE:
• An OUTPUT PROCEDURE retrieves sorted records from the work file using the RETURN verb. An
OUTPUT PROCEDURE must contain at least one RETURN statement to get the records from the
work file.
• An OUTPUT PROCEDURE only executes after the file has been sorted.
• If you use an OUTPUT PROCEDURE, the GIVING phrase of the SORT cannot be used.
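To see the two kinds of procedure working together, consider the following minimal sketch. It is not one of the chapter's listings: the program name, the sample data, and all the data names are hypothetical. So that it needs no input or output files, the INPUT PROCEDURE releases four records held in a small table, and the OUTPUT PROCEDURE returns the sorted records and displays them. The source is free format, like the chapter's listings.

```cobol
IDENTIFICATION DIVISION.
PROGRAM-ID. SortProcSketch.
ENVIRONMENT DIVISION.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
    SELECT WorkFile ASSIGN TO "WORK.TMP".
DATA DIVISION.
FILE SECTION.
SD  WorkFile.
01  WorkRec.
    02 StudentId-WF      PIC 9(3).
    02 GPA-WF            PIC 9V9.
WORKING-STORAGE SECTION.
*> Hypothetical sample data: 3-digit StudentId followed by a 2-digit GPA
01  SampleValues.
    02 FILLER            PIC X(5) VALUE "10335".
    02 FILLER            PIC X(5) VALUE "10140".
    02 FILLER            PIC X(5) VALUE "10235".
    02 FILLER            PIC X(5) VALUE "10440".
01  SampleTable REDEFINES SampleValues.
    02 SampleRec         PIC X(5) OCCURS 4 TIMES.
01  Idx                  PIC 9.
01  PrnGPA               PIC 9.9.
01  FILLER               PIC X VALUE "N".
    88 EndOfWorkFile     VALUE "Y".
PROCEDURE DIVISION.
Begin.
    *> No USING or GIVING: the two procedures take their place
    SORT WorkFile ON DESCENDING KEY GPA-WF
                     ASCENDING  KEY StudentId-WF
         INPUT PROCEDURE  IS SupplyRecords
         OUTPUT PROCEDURE IS ShowSortedRecords
    STOP RUN.

SupplyRecords.
    *> Runs first: each RELEASE hands one record to the sort process
    PERFORM VARYING Idx FROM 1 BY 1 UNTIL Idx > 4
        RELEASE WorkRec FROM SampleRec(Idx)
    END-PERFORM.

ShowSortedRecords.
    *> Runs after sorting: each RETURN fetches the next sorted record
    RETURN WorkFile
        AT END SET EndOfWorkFile TO TRUE
    END-RETURN
    PERFORM UNTIL EndOfWorkFile
        MOVE GPA-WF TO PrnGPA
        DISPLAY StudentId-WF " " PrnGPA
        RETURN WorkFile
            AT END SET EndOfWorkFile TO TRUE
        END-RETURN
    END-PERFORM.
```

With this sample data, the records should be displayed in the order 101, 104, 102, 103: the two students with a GPA of 4.0 come first, in ascending StudentId order, followed by the two with a GPA of 3.5. Note that WorkFile is never opened or closed by the program; the SORT takes care of that.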
How an INPUT PROCEDURE Works
A simple SORT works by taking records from the USING file, sorting them, and then writing them to the GIVING file.
When an INPUT PROCEDURE is used, there is no USING file, so the sort process has to get its records from the INPUT PROCEDURE. The INPUT PROCEDURE uses the RELEASE verb to supply the records to the work file of the SORT, one at a time.
Although an INPUT PROCEDURE usually gets the records it supplies to the sort process from an input file, the records can originate from anywhere. For instance, if you wanted to sort the elements of a table, you could use INPUT PROCEDURE to send the elements, one at a time, to the sort process (see Listing 14-7, in the section “Sorting Tables Program”). Or, if you wanted to sort the records as they were entered by the user, you could use INPUT PROCEDURE to get the records from the user and supply them to the sort process (see Listing 14-3, later in this section). When an INPUT PROCEDURE gets its records from an input file, it can select which records to send to the sort process and can even alter the structure of the records before they are sent.
Creating an INPUT PROCEDURE
When you use an INPUT PROCEDURE, a RELEASE verb must be used to send records to the work file associated with SORT. The work file is declared in an SD entry in the FILE SECTION. RELEASE is a special verb used only in INPUT PROCEDUREs to send records to the work file. It is the equivalent of a WRITE command and works in a similar way. The metalanguage for the RELEASE verb is given in Figure 14-4.
Figure 14-4. Metalanguage for the RELEASE verb
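Like WRITE, RELEASE accepts an optional FROM phrase. The fragment below sketches the two equivalent forms, borrowing the record names WorkRec and SubscriberRec-BSF from Listing 14-1. It is a fragment intended to sit inside an INPUT PROCEDURE, not a complete program.

```cobol
*> Form 1: move the data into the SD record area, then release it
MOVE SubscriberRec-BSF TO WorkRec
RELEASE WorkRec

*> Form 2: the FROM phrase performs the same move as part of the RELEASE
RELEASE WorkRec FROM SubscriberRec-BSF
```

In both forms, the operand of RELEASE is the record name declared under the SD entry, not the name of the work file itself.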
A template for an INPUT PROCEDURE that gets records from an input file and releases them to the SORT work file is given in Example 14-3. Notice that the work file is not opened in the INPUT PROCEDURE. The work file is automatically opened by the SORT.
Example 14-3. INPUT PROCEDURE File-Processing Template
OPEN INPUT InFileName
READ InFileName RECORD
PERFORM UNTIL TerminatingCondition
   RELEASE SDWorkRec
   READ InFileName RECORD
END-PERFORM
CLOSE InFileName
Using an INPUT PROCEDURE to Select Records
Suppose that the specification for the Universal Telecoms Monthly Report is changed so that only the value of the voice calls made by subscribers is required. Figure 14-5 shows how you can use an INPUT PROCEDURE between the input file and the sort process to filter out the unwanted text (ServiceType = 1) records. Listing 14-2 implements the specification change and also produces a more fully worked version. In this program, the report is written to a print file rather than just displayed on the computer screen.