|
|
PGTS Humble BlogThread: Perl Programming |
|
![]() |
Gerry Patterson. The world's most humble blogger |
Edited and endorsed by PGTS, Home of the world's most humble blogger | |
| |
Variable Length EBCDIC Records |
|
Chronogical Blog Entries: |
|
| |
Date: Tue, 21 Apr 2009 22:57:21 +1000In June last year, I blogged about translating EBCDIC data files. The routine was a quick subroutine table that can be included in a perl script. There are a lot of packages around which do this, but sometimes it can be just as easy to put a copy of the translation subroutine in a separate file and include it in the perl script with the require verb. |
Sometimes when working with data from a mainframe, you need to deal with variable length records. This is pretty easy if the records were exported with BDW and RDW control words. These are the words which the mainframe uses to control access to variable length records. The format of the BDW and RDW words are as follows:
- Format of the Block Descriptor Word (BDW)
A variable-length block consists of a block descriptor word (BDW) followed by one or more logical records or record segments. The block descriptor word is a 4-byte field that describes the block. It specifies the 4 byte block length for the BDW plus the total length of all records or segments within the block.
There are two types of BDW. If bit 0 is zero, it is a nonextended BDW. Bits 1-15 contain the block length. Bits 16-31 are zeroes. The block length can be from 8 to 32 760 bytes. All access methods and device types support nonextended BDWs.
If bit 0 of the BDW is one, the BDW is an extended BDW and BDW bits 1-31 contain the block length. Extended BDWs are currently supported only on tape.
- Format of the Record Descriptor Word (RDW)
A variable-length logical record consists of a record descriptor word (RDW) followed by the data. The record descriptor word is a 4 byte field describing the record. The first 2 bytes contain the length (LL) of the logical record (including the 4 byte RDW). The length can be from 4 to 32760. All bits of the third and fourth bytes must be 0, because other values are used for spanned records.
A simple perl script that reads variable length records and strips off trailing NULLs, would be as follows:
#!/usr/bin/perl use strict; use Carp; require "/pgts06/src/tmp/quick_ebcdic"; # See earlier blog entry for details (click on above link) # ------------------------------------------------------------------------ # variables required by read_inp_file() my $inp_file_ptr = 0; # pointer to current record position my $inp_file_prev_ptr = 0; # previous ptr (start of record currently in buffer) my $inp_file_nxt_block = 0; # pointer to next inp_file block (from mainframe BDW) my $inp_rec_count = 0; # number of records read sub read_inp_file { my $rec_len; my $actual_len; my $buf; my $nbytes; my $l; my $s = 0; $inp_file_prev_ptr = $inp_file_ptr; # Are we at the start of a new block? if ($inp_file_ptr == $inp_file_nxt_block) { # yes, read the BDW (see comments above) return "" unless (sysread(EBCDIC_IN, $buf, 4) == 4); my $block_length = ord(substr($buf,0,1))*256 + ord(substr($buf,1,1)); # We do not grok extended BDWs ... croak "Cannot grok extended BDW" unless ($block_length < 32768); $inp_file_nxt_block += $block_length; $inp_file_ptr += 4; } elsif ($inp_file_ptr > $inp_file_nxt_block) { croak "Totals in RDWs do not match BDW - File Ptr: $inp_file_ptr"; } # Read the RDW (see comments above) return "" unless (sysread(EBCDIC_IN, $buf, 4) == 4); $actual_len = ord(substr($buf,0,1))*256 + ord(substr($buf,1,1)); # Most likely we have a corrupt file if this error occurs ... croak "Cannot grok spanned records" unless (substr($buf,2,2) =~ /[\x00][\x00]/); $inp_file_ptr += 4; # adjust for the RDW (included in the length) $actual_len -= 4; croak "Actual length: $actual_len" unless ($actual_len > 0); # Makes it easier to debug if there is a problem with the file. sysread(EBCDIC_IN, $buf, $actual_len); $nbytes = length($buf); croak "Corrupt Input file: $nbytes != $actual_len at record count: $inp_rec_count" unless ($nbytes == $actual_len); $inp_file_ptr += $actual_len; $inp_rec_count++; return($buf); } # ------------------------------------------------------------------------ my $buf; while (@ARGV) { open EBCDIC_IN,shift @ARGV || die "$!"; while ($buf = read_inp_file() ) { $buf =~ s/[\x00]+$//; print ebc2asc($buf) . "\n"; } } |
This script will read an input file and print the translated ASCII records to STDOUT. In a real data application, you would need to do some other processing with the data.
One disadvantage of the above script is it is rather slow. The binary read mode in perl is a lot slower then a C program. This is especially true if you are running the script in windows. By contrast a version of this script written in C is blindingly fast when run on a Unix system.