PGTS Humble Blog



Thread: Tips/Tricks For Programming etc
	Gerry Patterson. The world's most humble blogger
Edited and endorsed by PGTS, Home of the world's most humble blogger

Detecting EBCDIC With Perl

Date: Sun, 30 Nov 2008 10:26:21 +1100

Recently I was writing a Perl script and found that I needed to identify whether certain files were ASCII or EBCDIC. There are probably many ways to do this. The easiest way that I know of is to use the "file" command.

For example, in AIX you might use the following Perl code:

my $ftype = `file $foo`;
if ( $ftype =~ /ascii text\n/) {
        print "ASCII file\n";
} elsif ( $ftype =~ /data or International Language text\n/) {
        print "EBCDIC File\n";
} else {
        print $ftype;
}

However, because this runs an external command for every file, and the file command reads at least 1024 bytes of each one, it can be a little slow. More bothersome was the fact that the EBCDIC files had been transferred from a mainframe using various methods. Some were fixed length binary files and some had an ASCII "\n" terminator at the end of each line of EBCDIC text.

The message returned by the file command also varies with the operating system. For example in (Ubuntu) Linux, the file command might return the following for EBCDIC files:

ISO-8859 text

ISO-8859 text, with very long lines, with no line terminators

or, if the file has newline terminators:

Non-ISO extended-ASCII text, with LF, NEL line terminators

Whereas for ASCII files, it might return:

ASCII text

ASCII English text, with CRLF line terminators
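The sample messages above could be folded into one small lookup, sketched here in Perl. The helper name classify_file_output and its return labels are hypothetical, and it only recognises the messages quoted in this post. It assumes the caller passes the description part of the output without the leading file name (GNU file prints just that with the -b option). A result of 'maybe-ebcdic' is only a hint, since file can report the same messages for other 8-bit data.

```perl
use strict;
use warnings;

# Hypothetical helper: map a `file` description to a rough label.
# Only the messages quoted in this post are recognised.
sub classify_file_output {
    my ($desc) = @_;
    $desc =~ s/\s+\z//;    # drop the trailing newline, if any

    # AIX: "ascii text"; Linux: "ASCII text", "ASCII English text, ..."
    return 'ascii' if $desc =~ /^ascii\b.*\btext\b/i;

    # Messages seen for EBCDIC files on Linux and on AIX
    return 'maybe-ebcdic'
        if $desc =~ /^(?:ISO-8859|Non-ISO extended-ASCII) text\b/
        || $desc =~ /^data or International Language text\b/;

    return 'unknown';
}

print classify_file_output("ISO-8859 text, with very long lines\n"), "\n";
```

Anything that comes back as 'unknown' would still need to be inspected by hand or by some other test.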

Probably there is a way to determine a file's type reliably, easily and quickly. However, I know that these particular files are all mainframe reports which begin with ANSI control codes. Mainframe programmers will realise that such a code is just a digit (like "0" or "1", etc), and the EBCDIC codes for the digits are 0xF0 - 0xF9. In fact EBCDIC text is practically guaranteed to contain a byte in the range 0x80 - 0xFF, because the EBCDIC letters and digits all fall in that range. And of course bytes in that range never occur in a 7-bit ASCII file.
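To make the reasoning concrete, here is a minimal Perl sketch that looks only at the leading control code. The helper name first_byte_encoding is hypothetical, and the sketch assumes the report really does start with a digit as described above; anything else is reported as 'unknown'.

```perl
use strict;
use warnings;

# Hypothetical helper: guess the encoding from the leading control code.
# Assumes the report starts with a digit such as "0" or "1".
sub first_byte_encoding {
    my ($buf) = @_;
    return 'unknown' unless defined $buf && length $buf;
    my $code = ord substr($buf, 0, 1);
    return 'ebcdic' if $code >= 0xF0 && $code <= 0xF9;    # EBCDIC digits 0-9
    return 'ascii'  if $code >= 0x30 && $code <= 0x39;    # ASCII digits 0-9
    return 'unknown';
}

print first_byte_encoding("\xF1\xC1\xC2"), "\n";    # EBCDIC bytes for "1AB"
print first_byte_encoding("1AB"), "\n";             # the same text in ASCII
```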

Furthermore, since in this particular case the files were either ASCII or EBCDIC, I settled on the following rather quick and dirty subroutine.

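sub is_ebcdic_rpt {
        my $buf;
        # use "or", not "||": "||" binds to the string, so die would never run
        open(RPT, "gzip -dc $_[0]|") or die "Cannot open file $!";
        binmode RPT;
        my $n = read RPT, $buf, 8;
        close RPT;
        # too short to judge (or the read failed): treat as not EBCDIC
        return 0 if (!defined $n || $n < 8);
        # low control bytes or high-bit bytes do not occur in these ASCII reports
        return 1 if ($buf =~ m/[\x00-\x07\x80-\xFF]/);
        return 0;
}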

In the above case the files were all gzipped. Essentially this subroutine reads the first 8 bytes of the uncompressed data, and determines if there is a byte in the range 0x80 - 0xFF (or a low control byte in the range 0x00 - 0x07). Such an approach may not be suitable for you. This would be especially true for EBCDIC text that contained no ANSI control codes and mostly spaces (0x40 in EBCDIC, which is ASCII '@').
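The weak spot can be seen by applying the same byte test to strings held in memory. This is only a sketch: the helper name looks_ebcdic and the sample strings are made up for illustration, and the character class is the one used in is_ebcdic_rpt.

```perl
use strict;
use warnings;

# Hypothetical helper: the byte test from is_ebcdic_rpt, applied to a
# string that is already in memory instead of a gzipped file.
sub looks_ebcdic {
    my ($buf) = @_;
    return 0 if length($buf) < 8;
    my $head = substr($buf, 0, 8);    # only the first 8 bytes are examined
    return $head =~ m/[\x00-\x07\x80-\xFF]/ ? 1 : 0;
}

my $ebcdic_report = "\xF1" . ("\xC1" x 7);    # EBCDIC bytes for "1AAAAAAA"
my $ebcdic_spaces = "\x40" x 8;               # eight EBCDIC spaces
my $ascii_report  = "1REPORT 2008";

print looks_ebcdic($ebcdic_report), "\n";     # high-bit bytes present
print looks_ebcdic($ebcdic_spaces), "\n";     # missed: looks like ASCII "@@@@@@@@"
print looks_ebcdic($ascii_report),  "\n";     # plain 7-bit ASCII
```

If a report could start with a run of spaces, reading a larger sample than 8 bytes would reduce the chance of this kind of miss.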
