One of the decisions that needs to be faced early on in any such project is whether to stick with native files or to do some conversion. The problem with sticking with native files is that you have to stick with EBCDIC. Generally it is best to convert to ASCII where possible and get rid of packed decimal fields. There are several CPAN modules that do this, but if you decide to roll your own, perl is up to the task, although if processing time is an issue, C is the only way to go. I might blog about packed decimal at a later date.
You could also consider Convert::IBM390 and Convert::EBCDIC.
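If you would rather not install anything, the Encode module that ships with perl also knows several EBCDIC code pages. The following is only a minimal sketch: it assumes the data is in code page 37 (your mainframe may well use another, such as cp1047 or cp500), and the sample bytes are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# "Hello" in EBCDIC code page 37 (assumed code page, hypothetical sample data).
my $ebcdic = "\xC8\x85\x93\x93\x96";

# decode() turns the EBCDIC bytes into an ordinary perl string.
my $text = decode('cp37', $ebcdic);

print "$text\n";
```

Note that a blanket translation like this is only safe for character data. Packed decimal and binary fields must be left out of it, because their bytes are not characters at all.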
However, if you stick to just signed fields in your COBOL program, you may have come across the overpunch problem. On the original card machines, the sign was punched over the last digit of the field to indicate that the entire field was negative. Non-negative numbers would have just a normal digit. This means that negative signed fields (S9) end up with a strange non-numeric last character, and your carefully crafted scripts will choke on them. Fortunately perl scripts can easily deal with this.
The high nibble of the last byte differs according to which system produced the signed field. Signed fields on Unix differ from Big Iron signed fields. AS400 and VAX machines may produce different high nibbles depending on the COBOL compiler.
For example, the IBM cob2 compiler on AIX appears to output S9 fields as follows:

    if the field is positive {
        Display the last digit as a normal ASCII digit (0x30 - 0x39)
    } else {
        Display the last digit as 0x7P,
        where P is the packed decimal digit (between 0x0 and 0x9).
        The range of values is 'p' (0x70) to 'y' (0x79)
    }
So if the COBOL program opened the files in ASCII mode when it created them (highly recommended for Unix), your perl script should mask off the high nibble so that you can decipher the last digit.
In perl this subroutine will do it:
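    sub leading_sep {
        my $field = $_[0];
        # A trailing 'p'..'y' marks a negative field.
        if ( substr($field, -1) =~ /([p-y])/ ) {
            # The low nibble of the overpunched character is the real digit.
            my $last_digit = ord($1) & 0xF;
            return "-" . substr($field, 0, length($field) - 1) . $last_digit;
        }
        else {
            return ("+$field");
        }
    }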
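To show how the routine might be used, here is a small sketch that pulls a signed field out of a fixed-width record with unpack. The record layout (a 10-character name followed by a PIC S9(5) amount) and the sample records are invented for the example; the subroutine is repeated only so that the snippet runs on its own.

```perl
use strict;
use warnings;

# Same logic as the leading_sep routine above.
sub leading_sep {
    my $field = $_[0];
    if ( substr($field, -1) =~ /([p-y])/ ) {
        my $last_digit = ord($1) & 0xF;
        return "-" . substr($field, 0, length($field) - 1) . $last_digit;
    }
    return "+$field";
}

# Hypothetical layout: 10-character name, then a PIC S9(5) amount.
my @records = ( "WIDGET    0012s", "GADGET    00456" );

for my $record (@records) {
    # "A10 A5" splits the record into two fixed-width text fields.
    my ($name, $amount) = unpack("A10 A5", $record);
    printf "%-10s %s\n", $name, leading_sep($amount);
}
```

The first record ends in 's' (0x73), so it comes out as -00123; the second has a plain last digit and comes out as +00456.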
S9 fields transferred from the mainframe will arrive as the ASCII translations of their EBCDIC equivalents (if you used the translate option during transfer).
On EBCDIC big iron the last byte is \xC0 to \xC9 for positive and \xD0 to \xD9 for negative (the same sign codes as packed decimal, but in the high nibble rather than the low one), so you may have to modify your routine accordingly (depending on the mode of transfer from the mainframe).
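As a rough sketch of what "modify your routine accordingly" could look like, here are two variants. The first assumes a binary transfer, so the field is still raw EBCDIC. The second assumes a translating transfer using a common code page (such as cp037), where \xC0 to \xC9 arrive as '{' and 'A' to 'I', and \xD0 to \xD9 arrive as '}' and 'J' to 'R'. Other translation tables may map the two brace characters differently, so check a sample of your own data first. Both subroutine names are made up for this example.

```perl
use strict;
use warnings;

# Variant 1: field is still raw EBCDIC (binary transfer assumed).
# Digits are 0xF0-0xF9; the last byte carries the sign in its high nibble.
sub leading_sep_ebcdic {
    my $field = $_[0];
    my $last  = ord( substr($field, -1) );
    my $zone  = $last >> 4;     # high nibble: 0xD means negative
    my $digit = $last & 0xF;    # low nibble: the actual last digit
    # The low nibble of every leading byte is its digit as well.
    my $body  = join '', map { ord($_) & 0xF } split //, substr($field, 0, -1);
    # Assumption: anything other than a 0xD zone is treated as positive.
    my $sign  = $zone == 0xD ? '-' : '+';
    return $sign . $body . $digit;
}

# Variant 2: field went through an EBCDIC-to-ASCII translation.
sub leading_sep_translated {
    my $field = $_[0];
    my $last  = substr($field, -1);
    my $body  = substr($field, 0, -1);
    # The position of the character in each string is the digit it stands for.
    if ( (my $d = index('{ABCDEFGHI', $last)) >= 0 ) { return "+$body$d"; }
    if ( (my $d = index('}JKLMNOPQR', $last)) >= 0 ) { return "-$body$d"; }
    return "+$field";    # plain digit, nothing to decode
}

print leading_sep_ebcdic("\xF0\xF1\xF2\xD3"), "\n";
print leading_sep_translated("012L"), "\n";
```

Both calls decode the same value, -0123, from the two representations.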
And also, a convenient one-liner for replacing those pesky low-values (nulls) with spaces and trimming the trailing whitespace that often ends up on records produced by COBOL (especially those that were transferred from the mainframe):

    perl -pe 's/\x00/ /g; s/\s+$/\n/;'
An awk one-liner to turn space separators into tabs:

    awk '{for (i=1;i<NF;i++){printf "%s\t",$i} print $NF}'
Generally speaking low-values cause problems for scripts other than perl scripts (perl does not have a problem because it does not rely on nulls to terminate strings).
Also, if you are using perl one-liners (or perl scripts), remember that printf "%d" will choke if the number is greater than 2147483647 on a perl built with 32-bit integers. To avoid this, use printf "%.0f" or some variant.
For example, here is the console output on a Linux (Ubuntu) workstation:

    cmd> perl -e 'printf "%d\n",2147483648'
    -2147483648
    cmd> perl -e 'printf "%.0f\n",2147483648'
    2147483648
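If you are not sure whether your perl is affected, the standard Config module can tell you how wide its native integers are. This is just a quick sketch; the variable names are arbitrary.

```perl
use strict;
use warnings;
use Config;

# ivsize is the size in bytes of perl's native integer.
# 4 means "%d" cannot represent values above 2147483647;
# 8 means it can go up to 9223372036854775807.
print "native integer size: $Config{ivsize} bytes\n";

my $big = 2147483648;

# "%.0f" formats through a double, which is exact for integers up to 2**53.
my $text = sprintf "%.0f", $big;
print "$text\n";
```

Keep in mind that "%.0f" is only exact up to 2**53, so very long COBOL numeric fields are still better handled as strings.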