PGTS PGTS Pty. Ltd.   ACN: 007 008 568

point Site Navigation

point Other Blog Threads



  Valid HTML 4.01 Transitional

   Stop Spam! Stop Viruses!
   Secure And Reliable Ubuntu Desktop!

   Ubuntu

   If you own a netbook/laptop~
   Download Ubuntu Netbook!






PGTS Humble Blog

Thread: Perl Programming

Author Image Gerry Patterson. The world's most humble blogger
Edited and endorsed by PGTS, Home of the world's most humble blogger

Variable Length EBCDIC Records


Chronogical Blog Entries:



Date: Tue, 21 Apr 2009 22:57:21 +1000

In June last year, I blogged about translating EBCDIC data files. The routine was a quick subroutine table that can be included in a perl script. There are a lot of packages around which do this, but sometimes it can be just as easy to put a copy of the translation subroutine in a separate file and include it in the perl script with the require verb.

Sometimes when working with data from a mainframe, you need to deal with variable length records. This is pretty easy if the records were exported with BDW and RDW control words. These are the words which the mainframe uses to control access to variable length records. The format of the BDW and RDW words are as follows:

A simple perl script that reads variable length records and strips off trailing NULLs, would be as follows:
#!/usr/bin/perl

use strict;
use Carp;
require "/pgts06/src/tmp/quick_ebcdic";
# See earlier blog entry for details (click on above link)

# ------------------------------------------------------------------------

# variables required by read_inp_file()
my $inp_file_ptr = 0;           # pointer to current record position
my $inp_file_prev_ptr = 0;      # previous ptr (start of record currently in buffer)
my $inp_file_nxt_block = 0;     # pointer to next inp_file block (from mainframe BDW)
my $inp_rec_count = 0;          # number of records read

sub read_inp_file {
        my $rec_len;
        my $actual_len;
        my $buf;
        my $nbytes;
        my $l;
        my $s = 0;

        $inp_file_prev_ptr = $inp_file_ptr;
        # Are we at the start of a new block?
        if ($inp_file_ptr == $inp_file_nxt_block) {
                # yes, read the BDW (see comments above)
                return "" unless (sysread(EBCDIC_IN, $buf, 4) == 4);
                my $block_length = ord(substr($buf,0,1))*256 + ord(substr($buf,1,1));
                # We do not grok extended BDWs ...
                croak "Cannot grok extended BDW" unless ($block_length < 32768);
                $inp_file_nxt_block += $block_length;
                $inp_file_ptr += 4;
        } elsif ($inp_file_ptr > $inp_file_nxt_block) {
                croak "Totals in RDWs do not match BDW - File Ptr: $inp_file_ptr";
        }
        # Read the RDW (see comments above)
        return "" unless (sysread(EBCDIC_IN, $buf, 4) == 4);
        $actual_len = ord(substr($buf,0,1))*256 + ord(substr($buf,1,1));
        # Most likely we have a corrupt file if this error occurs ...
        croak "Cannot grok spanned records" unless (substr($buf,2,2) =~ /[\x00][\x00]/);
        $inp_file_ptr += 4;
        # adjust for the RDW (included in the length)
        $actual_len -= 4;
        croak "Actual length: $actual_len" unless ($actual_len > 0);
        # Makes it easier to debug if there is a problem with the file.
        sysread(EBCDIC_IN, $buf, $actual_len);
        $nbytes = length($buf);
        croak "Corrupt Input file: $nbytes != $actual_len at record count: $inp_rec_count"
                unless ($nbytes == $actual_len);
        $inp_file_ptr += $actual_len;
        $inp_rec_count++;
        return($buf);
}

# ------------------------------------------------------------------------

my $buf;
while (@ARGV) {
        open EBCDIC_IN,shift @ARGV || die "$!";
        while ($buf = read_inp_file() ) {
                $buf =~ s/[\x00]+$//;
                print ebc2asc($buf) . "\n";
        }
}

This script will read an input file and print the translated ASCII records to STDOUT. In a real data application, you would need to do some other processing with the data.

One disadvantage of the above script is it is rather slow. The binary read mode in perl is a lot slower then a C program. This is especially true if you are running the script in windows. By contrast a version of this script written in C is blindingly fast when run on a Unix system.


Other Blog Posts In This Thread:

Copyright     2009, Gerry Patterson. All Rights Reserved.