Michael Coughlan
Although it is not strictly string manipulation, I’d like to explore an alternative solution to Listing 15-3 here, because it allows me to introduce an aspect of condition names that you have not seen before. In the solution in Listing 15-4, TextLine is defined as an array of characters. A PERFORM is used to step through the array and, at each character, test whether it is a vowel or a consonant; whichever it is, the PERFORM then increments the appropriate total. The interesting part is the way you discover whether the character is a vowel or a consonant.
You may not have realized that a condition name can be set to monitor a table element. That is what the program in Listing 15-4 does. Once the condition names for vowels and consonants are set up, all the program needs to do is test which condition name is set to TRUE for the character under consideration and then increment the appropriate count.
Listing 15-4. Using a Table Element Condition to Count Vowels and Consonants
IDENTIFICATION DIVISION.
PROGRAM-ID. Listing15-4.
AUTHOR. Michael Coughlan.
DATA DIVISION.
WORKING-STORAGE SECTION.
01 TextLine.
   02 Letter PIC X OCCURS 80 TIMES.
      88 Vowel     VALUE "A" "E" "I" "O" "U".
      88 Consonant VALUE "B" "C" "D" "F" "G" "H" "J" "K" "L" "M" "N" "P"
                         "Q" "R" "S" "T" "V" "W" "X" "Y" "Z".
01 VowelCount     PIC 99 VALUE ZERO.
01 ConsonantCount PIC 99 VALUE ZERO.
01 idx            PIC 99.
PROCEDURE DIVISION.
Begin.
    DISPLAY "Enter text : " WITH NO ADVANCING
    ACCEPT TextLine
    MOVE FUNCTION UPPER-CASE(TextLine) TO TextLine
    *> Test the condition names attached to each table element in turn.
    PERFORM VARYING idx FROM 1 BY 1 UNTIL idx > 80
        IF Vowel(idx)
            ADD 1 TO VowelCount
        ELSE
            IF Consonant(idx)
                ADD 1 TO ConsonantCount
            END-IF
        END-IF
    END-PERFORM
    DISPLAY "The line contains " VowelCount " vowels and "
            ConsonantCount " consonants."
    STOP RUN.
INSPECT .. REPLACING: Format 2
INSPECT..REPLACING replaces characters in the string with a replacement character. The metalanguage for this version of INSPECT is given in Figure 15-2.
Chapter 15 ■ String Manipulation
Figure 15-2. Metalanguage for INSPECT..REPLACING
This version of INSPECT works by scanning the source string SourceStr$i from left to right and replacing
occurrences of all characters with a replacement character, or replacing specified characters with replacement characters:
• The behavior of the INSPECT is modified by the LEADING, ALL, FIRST, BEFORE, and AFTER
phrases. An ALL, LEADING, FIRST, or CHARACTERS phrase may be followed by at most one BEFORE
phrase and at most one AFTER phrase.
• If Compare$il or Delim$il is a figurative constant, it is one character in size. But when
Replace$il is a figurative constant, its size equals that of Compare$il.
• The sizes of Compare$il and Replace$il must be equal.
• When there is a CHARACTERS phrase, the size of ReplaceChar$il and the delimiter that may
follow it (Delim$il) must be one character.
Modifying Phrases
Like INSPECT..TALLYING, the operation of INSPECT..REPLACING is governed by the modifying phrases used.
The meaning of these phrases is as follows:
BEFORE: Designates the characters to the left of its associated delimiter (Delim$il) as valid.
If the delimiter is not present in SourceStr$i, then using the BEFORE phrase implies that all
the characters are valid.
AFTER: Designates the characters to the right of its associated delimiter (Delim$il) as valid.
If the delimiter is not present in the SourceStr$i, then using the AFTER phrase implies that
there are no valid characters in the string.
ALL: Replaces all Compare$il characters with the Replace$il characters from the first
matching valid character to the first invalid one.
FIRST: Causes only the first valid character(s) to be replaced.
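As a rough illustration of the BEFORE, AFTER, and FIRST rules, the following sketch shows what happens when the delimiter is missing from the string. The program name, the test value "FFAFFQFFFF", and the absent delimiter "X" are made up for this illustration, and free-format source is assumed.

```cobol
IDENTIFICATION DIVISION.
PROGRAM-ID. DelimiterDemo.
DATA DIVISION.
WORKING-STORAGE SECTION.
01 StringData PIC X(10).
PROCEDURE DIVISION.
Begin.
    *> "X" does not occur, so BEFORE treats the whole string as valid.
    MOVE "FFAFFQFFFF" TO StringData
    INSPECT StringData REPLACING ALL "F" BY "G" BEFORE INITIAL "X"
    DISPLAY StringData          *> GGAGGQGGGG

    *> "X" does not occur, so AFTER leaves no valid characters.
    MOVE "FFAFFQFFFF" TO StringData
    INSPECT StringData REPLACING ALL "F" BY "G" AFTER INITIAL "X"
    DISPLAY StringData          *> FFAFFQFFFF (unchanged)

    *> FIRST replaces only the first valid match, here the F after "A".
    MOVE "FFAFFQFFFF" TO StringData
    INSPECT StringData REPLACING FIRST "F" BY "G" AFTER INITIAL "A"
    DISPLAY StringData          *> FFAGFQFFFF
    STOP RUN.
```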
INSPECT .. REPLACING Examples
The INSPECT..REPLACING statements in Example 15-2 work on the data in StringData to produce the results shown in the storage schematics. Assume that before each INSPECT executes, the value "FFFAFFFFFFQFFFZ" (shown in the Before row) is moved to StringData.
Example 15-2. Example INSPECT..REPLACING Statements with Results
1. INSPECT StringData REPLACING ALL "F" BY "G"
       AFTER INITIAL "A" BEFORE INITIAL "Q"
2. INSPECT StringData REPLACING ALL "F" BY "G"
       AFTER INITIAL "A" BEFORE INITIAL "Z"
3. INSPECT StringData REPLACING FIRST "F" BY "G"
       AFTER INITIAL "A" BEFORE INITIAL "Q"
4. INSPECT StringData REPLACING
       ALL "FFFF" BY "DOGS"
       AFTER INITIAL "A" BEFORE INITIAL "Z"
5. INSPECT StringData REPLACING
       CHARACTERS BY "z" BEFORE INITIAL "Q"
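To try these five statements yourself, a small driver such as the following sketch can be used. The program name and the BeforeValue item are assumptions added for the illustration (free-format source is also assumed). The comments give the results worked out by hand from the rules described earlier.

```cobol
IDENTIFICATION DIVISION.
PROGRAM-ID. ReplacingDemo.
DATA DIVISION.
WORKING-STORAGE SECTION.
01 StringData  PIC X(15).
01 BeforeValue PIC X(15) VALUE "FFFAFFFFFFQFFFZ".
PROCEDURE DIVISION.
Begin.
    MOVE BeforeValue TO StringData
    INSPECT StringData REPLACING ALL "F" BY "G"
        AFTER INITIAL "A" BEFORE INITIAL "Q"
    DISPLAY "1. " StringData    *> FFFAGGGGGGQFFFZ

    MOVE BeforeValue TO StringData
    INSPECT StringData REPLACING ALL "F" BY "G"
        AFTER INITIAL "A" BEFORE INITIAL "Z"
    DISPLAY "2. " StringData    *> FFFAGGGGGGQGGGZ

    MOVE BeforeValue TO StringData
    INSPECT StringData REPLACING FIRST "F" BY "G"
        AFTER INITIAL "A" BEFORE INITIAL "Q"
    DISPLAY "3. " StringData    *> FFFAGFFFFFQFFFZ

    *> Only one complete group of four Fs lies between "A" and "Z".
    MOVE BeforeValue TO StringData
    INSPECT StringData REPLACING ALL "FFFF" BY "DOGS"
        AFTER INITIAL "A" BEFORE INITIAL "Z"
    DISPLAY "4. " StringData    *> FFFADOGSFFQFFFZ

    *> CHARACTERS replaces every valid character, whatever it is.
    MOVE BeforeValue TO StringData
    INSPECT StringData REPLACING CHARACTERS BY "z"
        BEFORE INITIAL "Q"
    DISPLAY "5. " StringData    *> zzzzzzzzzzQFFFZ
    STOP RUN.
```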
INSPECT: Format 3
The third format of INSPECT simply allows you to combine the operation of the two previous formats in one statement.
Please see those formats for explanations and examples. The metalanguage for the third INSPECT format is shown in Figure 15-3. This format is executed as though two successive INSPECT statements are applied to SourceStr$i, the first being an INSPECT..TALLYING and the second an INSPECT..REPLACING.
Figure 15-3. Metalanguage for format 3 of INSPECT
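A minimal sketch of the combined format might look like the following. The program name and the FCount item are hypothetical, and free-format source is assumed. The statement first counts the Fs that follow the first "A" and then replaces those same Fs.

```cobol
IDENTIFICATION DIVISION.
PROGRAM-ID. TallyReplaceDemo.
DATA DIVISION.
WORKING-STORAGE SECTION.
01 StringData PIC X(15) VALUE "FFFAFFFFFFQFFFZ".
01 FCount     PIC 99    VALUE ZERO.
PROCEDURE DIVISION.
Begin.
    *> One statement: the TALLYING part is applied first,
    *> then the REPLACING part.
    INSPECT StringData
        TALLYING FCount FOR ALL "F" AFTER INITIAL "A"
        REPLACING ALL "F" BY "G" AFTER INITIAL "A"
    DISPLAY "Count  = " FCount        *> Count  = 09
    DISPLAY "Result = " StringData    *> Result = FFFAGGGGGGQGGGZ
    STOP RUN.
```

Note that the tally item is added to, not reset, so it should be zeroed before the statement runs.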
INSPECT .. CONVERTING: Format 4
INSPECT..CONVERTING seems very similar to INSPECT..REPLACING but actually works quite differently. It is used to convert one list of characters to another list of characters on a character-per-character basis. The metalanguage for this version of INSPECT is given in Figure 15-4.
Figure 15-4. Metalanguage for INSPECT..CONVERTING
Using INSPECT .. CONVERTING
INSPECT..CONVERTING works on individual characters. If any of the Compare$il list of characters are found in SourceStr$i, they are replaced by the characters in Convert$il on a one-for-one basis. For instance, in Figure 15-5, an F found in StringData is converted to z, X is converted to y, T is converted to a, and D is converted to b.
Figure 15-5. INSPECT..CONVERTING showing the conversion strategy
The INSPECT..CONVERTING in Figure 15-5 is the equivalent of the following:
INSPECT StringData REPLACING
    ALL "F" BY "z",
        "X" BY "y",
        "T" BY "a",
        "D" BY "b"
These are some rules for INSPECT..CONVERTING:
• Compare$il and Convert$il must be equal in size.
• When Convert$il is a figurative constant, its size equals that of Compare$il.
• The same character cannot appear more than once in Compare$il, because each character
in the Compare$il string is associated with a replacement character. For instance, INSPECT
StringData CONVERTING "XTX" TO "abc" is not allowed because the system won’t know if X
should be converted to a or c.
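The one-for-one conversion and its REPLACING equivalent can be compared side by side with a sketch like the one below. The program name, the two data items, and the test value "FAX TO DAD" are invented for the illustration. The compare and convert lists "FXTD" and "zyab" are taken from the conversions described for Figure 15-5. Free-format source is assumed.

```cobol
IDENTIFICATION DIVISION.
PROGRAM-ID. ConvertingDemo.
DATA DIVISION.
WORKING-STORAGE SECTION.
01 ConvData PIC X(10) VALUE "FAX TO DAD".
01 ReplData PIC X(10) VALUE "FAX TO DAD".
PROCEDURE DIVISION.
Begin.
    *> Each character in "FXTD" maps to the character in the
    *> same position of "zyab".
    INSPECT ConvData CONVERTING "FXTD" TO "zyab"

    *> The same conversion written as a multi-operand REPLACING.
    INSPECT ReplData REPLACING
        ALL "F" BY "z"
            "X" BY "y"
            "T" BY "a"
            "D" BY "b"

    DISPLAY ConvData            *> zAy aO bAb
    DISPLAY ReplData            *> zAy aO bAb
    IF ConvData = ReplData
        DISPLAY "Both statements give the same result"
    END-IF
    STOP RUN.
```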
INSPECT .. CONVERTING Examples
You saw an example of INSPECT..CONVERTING in Listing 15-1, where it was used to convert text to uppercase. That example is repeated in Example 15-3, but here it demonstrates that Compare$il and Convert$il can be either string literals or data items containing string values.
Example 15-3. Using INSPECT..CONVERTING to Convert Text to Uppercase or Lowercase
DATA DIVISION.
WORKING-STORAGE SECTION.
01 TextLine  PIC X(60).
01 LowerCase PIC X(26) VALUE "abcdefghijklmnopqrstuvwxyz".
01 UpperCase PIC X(26) VALUE "ABCDEFGHIJKLMNOPQRSTUVWXYZ".
PROCEDURE DIVISION.
Begin.
    DISPLAY "Enter text : " WITH NO ADVANCING
    ACCEPT TextLine
    INSPECT TextLine CONVERTING
        "abcdefghijklmnopqrstuvwxyz" TO
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    DISPLAY "Entered text in upper case = " TextLine
    INSPECT TextLine CONVERTING UpperCase TO LowerCase
    DISPLAY "Entered text in lower case = " TextLine.
Sometimes when you want to process the words in a line of text, especially if you want to recognize the words, you may need to get rid of the punctuation marks. Example 15-4 uses INSPECT..CONVERTING to convert punctuation marks in the text to spaces. UNSTRING is then used to unpack the words from the text.
Example 15-4. Using INSPECT..CONVERTING to Convert Punctuation Marks to Spaces
    ACCEPT TextLine
    INSPECT TextLine CONVERTING ",.;:?!-_" TO SPACES
    MOVE 1 TO UnstrPtr
    PERFORM UNTIL EndOfText
        UNSTRING TextLine DELIMITED BY ALL SPACES
            INTO UnpackedWord
            WITH POINTER UnstrPtr
        DISPLAY UnpackedWord
    END-PERFORM
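The fragment above omits its data declarations. One way they might be filled in is sketched below. The field sizes, the fixed test sentence (used in place of ACCEPT), and the definition of EndOfText as the pointer value one past the end of the 60-character TextLine are all assumptions made for this illustration, not the book's own declarations. Free-format source is assumed.

```cobol
IDENTIFICATION DIVISION.
PROGRAM-ID. UnpackWords.
DATA DIVISION.
WORKING-STORAGE SECTION.
01 TextLine     PIC X(60).
01 UnpackedWord PIC X(20).
01 UnstrPtr     PIC 99.
   *> Assumed: the text is finished when the pointer moves past
   *> the last character position of TextLine.
   88 EndOfText VALUE 61.
PROCEDURE DIVISION.
Begin.
    MOVE "Hello, world; how are you?" TO TextLine
    INSPECT TextLine CONVERTING ",.;:?!-_" TO SPACES
    MOVE 1 TO UnstrPtr
    PERFORM UNTIL EndOfText
        *> ALL SPACES treats a run of spaces as one delimiter,
        *> so the trailing spaces are consumed with the last word.
        UNSTRING TextLine DELIMITED BY ALL SPACES
            INTO UnpackedWord
            WITH POINTER UnstrPtr
        END-UNSTRING
        DISPLAY UnpackedWord
    END-PERFORM
    STOP RUN.
```

With this test sentence the loop displays five words: Hello, world, how, are, and you, each padded with spaces to the width of UnpackedWord.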
The final example (Example 15-5) shows how you can use INSPECT..CONVERTING to implement a simple
encoding mechanism. It converts the character 0 to character 5, 1 to 2, 2 to 9, 3 to 8, and so on. Conversion starts when the characters @> are encountered in the string and stops when <@ appears.
Example 15-5. Using INSPECT..CONVERTING to Implement an Encoding Mechanism
WORKING-STORAGE SECTION.
01 TextLine      PIC X(70).
01 UnEncodedText PIC X(10) VALUE "0123456789".
01 EncodedText   PIC X(10) VALUE "5298317046".
PROCEDURE DIVISION.
Begin.
    DISPLAY "Text : " WITH NO ADVANCING
    ACCEPT TextLine
    INSPECT TextLine CONVERTING
        UnEncodedText TO EncodedText
        AFTER INITIAL "@>"
        BEFORE INITIAL "<@"
    DISPLAY "Encoded = " TextLine
    INSPECT TextLine CONVERTING
        EncodedText TO UnEncodedText
        AFTER INITIAL "@>"
        BEFORE INITIAL "<@"
    DISPLAY "UnEncoded = " TextLine
    STOP RUN.
String Concatenation
String concatenation involves joining the contents of two or more source strings or partial source strings to create a single destination string. In COBOL, string concatenation is done using the STRING verb. Before I discuss the STRING verb formally, let’s look at some examples to get a feel for what it can do.
The first example concatenates the entire contents of the identifiers String1 and String2 with the literal "LM051" and puts the resulting string into DestString:
STRING String1, String2, "LM051" DELIMITED BY SIZE
    INTO DestString
END-STRING
The second example concatenates the entire contents of String1, the partial contents of String2 (all the characters up to the first space), and the partial contents of String3 (all the characters up to the word unique) and puts the concatenated string in DestString.
STRING
    String1 DELIMITED BY SIZE
    String2 DELIMITED BY SPACES
    String3 DELIMITED BY "unique"
    INTO DestString
END-STRING
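Both statements can be tried out with a short program such as this sketch. The program name, the field sizes, and the test values are hypothetical, chosen so that each delimiter rule has a visible effect. Free-format source is assumed.

```cobol
IDENTIFICATION DIVISION.
PROGRAM-ID. StringDemo.
DATA DIVISION.
WORKING-STORAGE SECTION.
01 String1    PIC X(4)  VALUE "ABCD".
01 String2    PIC X(8)  VALUE "EFG HIJ".
01 String3    PIC X(20) VALUE "KLMuniqueNOP".
01 DestString PIC X(30).
PROCEDURE DIVISION.
Begin.
    *> STRING does not space-fill, so clear the destination first.
    MOVE SPACES TO DestString
    STRING String1, String2, "LM051" DELIMITED BY SIZE
        INTO DestString
    END-STRING
    *> All 8 characters of String2 are copied, trailing space included.
    DISPLAY DestString          *> ABCDEFG HIJ LM051

    MOVE SPACES TO DestString
    STRING
        String1 DELIMITED BY SIZE
        String2 DELIMITED BY SPACES
        String3 DELIMITED BY "unique"
        INTO DestString
    END-STRING
    *> String2 stops at its first space, String3 at "unique".
    DISPLAY DestString          *> ABCDEFGKLM
    STOP RUN.
```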
The STRING Verb
The metalanguage for the STRING verb is given in Figure 15-6.
Figure 15-6. Metalanguage for the STRING verb
The STRING verb moves characters from the source string (SourceString$il) to the destination string (DestString$i). Data movement is from left to right. The leftmost character of the source string is moved to the leftmost position of the destination string, then the next-leftmost character of the source string is moved to the next-leftmost position of the destination string, and so on. Note that no space filling occurs: unless characters in the destination string are explicitly overwritten, they remain undisturbed.
When a number of source strings are concatenated, characters are moved from the leftmost source string first until either that string is exhausted or the delimiter (Delim$il) is encountered in that string. When transfer from that source string finishes, characters are moved from the next-leftmost source string. This proceeds until either the strings are exhausted or the destination string is full. At that point, the STRING operation finishes.
The following rules apply to the operation of the STRING verb:
• The ON OVERFLOW clause executes if valid characters remain to be transferred in the source
string but the destination string is full.
• When a WITH POINTER phrase is used, its value determines the starting character position for
insertion into the destination string. As each character is inserted into the destination string,
the pointer is incremented. When the pointer points beyond the end of the destination string,
the STRING statement stops.
• When the WITH POINTER phrase is used, then before the STRING statement executes, the
program must set Pointer#i to an initial value greater than zero and no greater than the length of
the destination string.
• If the WITH POINTER phrase is not used, operation on the destination field starts from the
leftmost position.
• Pointer#i must be an integer item, and its description must allow it to contain a value one
greater than the size of the destination string. For instance, a pointer declared as PIC 9 is too
small if the destination string is ten characters long.
• The DELIMITED BY SIZE clause causes the whole of the sending field to be added to the
destination string.
• Where a literal can be used, you can use a figurative constant (such as SPACES) except for the
ALL literal figurative constant.
• When a figurative constant is used, it is one character in size.
• The destination item DestString$i must be an elementary data item without editing
symbols or the JUSTIFIED clause.
• Data movement from a particular source string ends when one of the following occurs:
• The end of the source string is reached.
• The end of the destination string is reached.
• The delimiter is detected.
• The STRING statement ends when one of the following is true:
• All the source strings have been processed.
• The destination string is full.
• The pointer points outside the string.
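The pointer and overflow rules can be seen in action in a sketch like the following. The program name, the eight-character destination, and the literals are made up for the illustration, and free-format source is assumed.

```cobol
IDENTIFICATION DIVISION.
PROGRAM-ID. OverflowDemo.
DATA DIVISION.
WORKING-STORAGE SECTION.
01 DestString PIC X(8) VALUE ALL "-".
*> PIC 99 can hold 9, one more than the size of DestString.
01 StrPtr     PIC 99.
PROCEDURE DIVISION.
Begin.
    *> Insertion starts at position 3; the rest is left undisturbed.
    MOVE 3 TO StrPtr
    STRING "ABC" DELIMITED BY SIZE
        INTO DestString WITH POINTER StrPtr
        ON OVERFLOW DISPLAY "Overflow on first STRING"
    END-STRING
    DISPLAY DestString " pointer = " StrPtr   *> --ABC--- pointer = 06

    *> Only three positions remain, so "GH" cannot be transferred
    *> and the ON OVERFLOW clause executes.
    STRING "DEFGH" DELIMITED BY SIZE
        INTO DestString WITH POINTER StrPtr
        ON OVERFLOW DISPLAY "Overflow on second STRING"
    END-STRING
    DISPLAY DestString " pointer = " StrPtr   *> --ABCDEF pointer = 09
    STOP RUN.
```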
String Concatenation Example
Example 15-6 shows how you can build a destination string a piece at a time by executing several separate STRING
statements. Each time a STRING statement executes, the current value of StrPtr governs where the characters from the source string are inserted into the destination string.
Example 15-6. STRING Examples Showing How to Use the WITH POINTER Phrase
DATA DIVISION.
WORKING-STORAGE SECTION.
01 DayStr   PIC XX    VALUE "5".
01 MonthStr PIC X(9)  VALUE "September".
01 YearStr  PIC X(4)  VALUE "2013".
01 DateStr  PIC X(16) VALUE ALL "@".
01 StrPtr   PIC 99.
PROCEDURE DIVISION.
Begin.
    DISPLAY DateStr
    MOVE 1 TO StrPtr
    STRING DayStr DELIMITED BY SPACES
           ","    DELIMITED BY SIZE
        INTO DateStr WITH POINTER StrPtr
    END-STRING
    DISPLAY DateStr
    STRING MonthStr DELIMITED BY SPACES
           ","      DELIMITED BY SIZE
        INTO DateStr WITH POINTER StrPtr
    END-STRING
    DISPLAY DateStr
    STRING YearStr DELIMITED BY SIZE
        INTO DateStr WITH POINTER StrPtr
    END-STRING
    DISPLAY DateStr.
String Splitting
String splitting involves chopping a string into a number of smaller strings. In COBOL, string splitting is done using the UNSTRING verb. Before I discuss the UNSTRING verb formally, let’s look at some examples to see what UNSTRING
can do.
The first example uses UNSTRING to break a customer name into its three constituent parts: first name, middle name, and surname. For instance, the string “John Joseph Ryan” is broken into the three strings “John”, “Joseph”, and “Ryan”:
UNSTRING CustomerName DELIMITED BY ALL SPACES
    INTO FirstName, SecondName, Surname
END-UNSTRING
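Wrapped in a complete program, the name-splitting statement might look like this sketch. The program name and the field sizes are assumptions added for the illustration, and free-format source is assumed.

```cobol
IDENTIFICATION DIVISION.
PROGRAM-ID. SplitName.
DATA DIVISION.
WORKING-STORAGE SECTION.
01 CustomerName PIC X(30) VALUE "John Joseph Ryan".
01 FirstName    PIC X(10).
01 SecondName   PIC X(10).
01 Surname      PIC X(10).
PROCEDURE DIVISION.
Begin.
    *> Each run of spaces acts as a single delimiter between names.
    UNSTRING CustomerName DELIMITED BY ALL SPACES
        INTO FirstName, SecondName, Surname
    END-UNSTRING
    DISPLAY "First name  : " FirstName    *> John
    DISPLAY "Second name : " SecondName   *> Joseph
    DISPLAY "Surname     : " Surname      *> Ryan
    STOP RUN.
```

Each receiving field is space-filled on the right, as in an ordinary alphanumeric MOVE.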
The second example breaks an address string (where the parts of the address are separated from one another by commas) into separate address lines. The address lines are stored in a six-element table. Not all addresses have six parts exactly, but you can use the TALLYING clause to discover how many parts there are:
UNSTRING CustAddress DELIMITED BY ","
    INTO AdrLine(1), AdrLine(2), AdrLine(3),
         AdrLine(4), AdrLine(5), AdrLine(6)
    TALLYING IN AdrLinesUsed
END-UNSTRING
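The following sketch shows the address example as a runnable program. The program name, the table layout, the field sizes, and the sample address are hypothetical. One detail worth noting is that TALLYING adds to the current value of the counter, so the sketch sets AdrLinesUsed to zero first. Free-format source is assumed.

```cobol
IDENTIFICATION DIVISION.
PROGRAM-ID. SplitAddress.
DATA DIVISION.
WORKING-STORAGE SECTION.
01 CustAddress  PIC X(60)
       VALUE "12 Main St,Castletroy,Limerick,Ireland".
01 AddressTable.
   02 AdrLine   PIC X(20) OCCURS 6 TIMES.
01 AdrLinesUsed PIC 9.
01 idx          PIC 9.
PROCEDURE DIVISION.
Begin.
    MOVE SPACES TO AddressTable
    *> TALLYING increments the counter; it does not reset it.
    MOVE ZERO TO AdrLinesUsed
    UNSTRING CustAddress DELIMITED BY ","
        INTO AdrLine(1), AdrLine(2), AdrLine(3),
             AdrLine(4), AdrLine(5), AdrLine(6)
        TALLYING IN AdrLinesUsed
    END-UNSTRING
    DISPLAY "Lines used = " AdrLinesUsed   *> Lines used = 4
    *> Display only the table elements that received data.
    PERFORM VARYING idx FROM 1 BY 1 UNTIL idx > AdrLinesUsed
        DISPLAY AdrLine(idx)
    END-PERFORM
    STOP RUN.
```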
The final example breaks a simple comma-delimited record into its constituent parts. Because the fields are not fixed length, they need to be validated for length—and that requires finding out how long each field is. The COUNT IN
clause, which counts the number of characters transferred to a particular destination field, is used to determine the actual length of the field: