PGTS Pty. Ltd.   ACN: 007 008 568

Feedback and Hints, February 2003, Published March 2003

If you have a question regarding any of the articles in this journal, or any comments, please send them in. If there are any general questions about Unix or Database Administration, I will attempt to answer them.

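Feedback:
    AltaVista Again ...
    Using awk to decipher a pic 9 comp field.

Spam Diaries:
    You've won the lottery!

Hints for this month:
    Sending fragile data by e-mail.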


AltaVista Again ...

From: Brian Robson
Date: Mon, 10 Feb 2003 12:29:03 +1100 (EST)
Subject: Search Engine Crawlers

Hi there,

(1) I have been trying to find out which search engine belongs to Mercator

(2) A search for Mercator-2.0 leads to your database.

(3) Here is another page of someone analysing visitors:
    www.lodestone.org/people/maria/about/scooter.html.
    (found while searching for "mercator robot")

(4) It appears that Scooter and Mercator are both part of Altavista.

(5) Recent tests on any old thing show All The Web, Teoma and Altavista
    are all pretty much as good as Google and all have got their acts
    together (swim or sink these days, no more dot-com $$$$). However
    Google has stolen the market and created the word "to google", that
    happened after Alta Vista spent three years pretending to be a
    portal and blew their market share.

Cheers,

Brian

There has been a lot of interest in the AltaVista crawlers. They are busy bots (the second busiest on the web), so website owners are naturally curious about them. The version of Mercator that visits my site seems to come from a CIDR block owned by DEC, and I recall that DEC used to own AltaVista. A good summary of the changing fortunes of AltaVista (and DEC) can be found at: searchenginewatch.com/searchday/02/sd1218-altavista.html



Using awk to decipher a pic 9 comp field.

From Jayant Feb 24, 2003 at 04:37:20PM +1100
Dear Webmaster,

I have a requirement where I need to read a COBOL file with binay fields
(I mean COMP data items), I need to convert them to normal display
format for reporting. I certainly do not want to write another COBOL
program to do so and I am sure I can use AWK for that. could you please
tell me how? How a PIC 9(8) COMP (COBOL format) will be defined and
handled in AWK and so on

Thanks,
Jayant

This depends on the platform and the type of file. If the file was produced by a COBOL program, I will assume that it is a fixed-length (flat) file. If so, your first problem will be breaking the data into records. On Unix this is most easily done in C with fread(). However, awk prefers to read records that are terminated with '\n' (0x0A). You can get round this, but you might find perl better suited to your task, and if you are familiar with perl I recommend that you use it. Perl has some powerful built-in functions for reading and converting all manner of fixed-length (flat) files, e.g. pack, unpack, read, syscall, sysread, syswrite etc., and this will save you time.
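The chunking step can be illustrated in a few lines. The following is only a sketch, written in Python rather than perl or awk; the function name read_records, the 8-byte record length and the sample data are all made up for the illustration and are not taken from Jayant's file.

```python
import io

def read_records(stream, record_length):
    """Yield successive fixed-length records from a binary stream."""
    while True:
        record = stream.read(record_length)
        if not record:
            break  # clean end of file
        if len(record) < record_length:
            # A short final read usually means the record length is wrong.
            raise ValueError("trailing partial record of %d bytes" % len(record))
        yield record

# Hypothetical data: three 8-byte records with no newline separators.
sample = io.BytesIO(b"AAAAAAAABBBBBBBBCCCCCCCC")
records = list(read_records(sample, 8))
print(records)
```

For a real file you would open it in binary mode, e.g. open(name, "rb"), and pass the record length given by the COBOL record description.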

As for deciphering pic 9 comp, once again it depends on the platform. Some systems store decimal data as one digit per byte; however, I have also seen it packed two digits per byte. You should be able to figure out which scheme your system employs. In a two-digit-per-byte scheme the number 9421 in a pic 9(4) comp field would look like 0x9421, whereas in a one-digit-per-byte scheme it would be 0x09040201.
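To make the two layouts concrete, here is a small Python sketch that builds both byte strings for 9421. The two helper names are invented for this example, and no sign handling is attempted.

```python
def two_digits_per_byte(digits):
    """Pack a string of decimal digits two per byte (one digit per nibble)."""
    if len(digits) % 2:
        digits = "0" + digits  # pad to a whole number of bytes
    return bytes(
        (int(digits[i]) << 4) | int(digits[i + 1])
        for i in range(0, len(digits), 2)
    )

def one_digit_per_byte(digits):
    """Store each decimal digit in a byte of its own."""
    return bytes(int(d) for d in digits)

print(two_digits_per_byte("9421").hex())  # 9421
print(one_digit_per_byte("9421").hex())   # 09040201
```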

The easiest way to figure this out is to look at a hex or octal dump of the file in question. Look for a known record, and match the fields to see if it meets your expectations.
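On Unix, od or a similar tool will produce such a dump. If one is not to hand, a rough equivalent can be sketched in Python. The hex_dump function and the sample record below are hypothetical; the record is just a text field, a two-byte field holding 9421 at two digits per byte, and another text field.

```python
def hex_dump(data, width=16):
    """Return a simple offset / hex / printable-text dump of some bytes."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join("%02x" % b for b in chunk)
        # Show printable ASCII as-is and everything else as a dot.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append("%04x  %s  %s" % (offset, hex_part.ljust(width * 3 - 1), text_part))
    return "\n".join(lines)

# Hypothetical record, made up for this illustration.
record = b"NAME" + bytes([0x94, 0x21]) + b"END"
print(hex_dump(record))
```

In the dump the two bytes after NAME show up as 94 21, which is enough to tell which storage scheme is in use.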

Jayant sent me some sample data and the message below. Unfortunately the sample he sent was mangled by transmission via the Internet (see this month's hints).

From Jayant Tue Feb 25 14:00:58 2003

I have defined the numeric field as PIC 9(5) COMP. I think as per COBOL
definition it should take three bytes. (5+1)/2 = 3. But I am not sure,
it might have taken 2 bytes. I really dont know. But the fact is that is
is not COMP-5. It is COMP.

Also, I want perl on unix and not on windows. I am using AIX-UNIX
operating system and that is where I have a file generated by COBOL
program. What I sent you was just a two line of COBOL where I moved
34532 into a COMP filed and used a display statement

In Working storage section of the cobol program I defined like below

01 WS-ITEM.
  05 A-1    PIC X(6) VALUE "Jayant".
  05 A-2    PIC 9(5) COMP VALUE 34532.
  05 A-3    PIC X(7) VALUE "Manohar".

and in procedure division I executed

  display ws-item.

and I got the result that I have sent you. When I directed the output to a
file and saw the file using "vi" editor, I saw the value that I have sent you.

So If I use awk as below. Assuming the file name containing the output of the
display statement is "a.a" then:

awk '{print substr($0,1,6)}' a.a will give me "Jayant" - Occupying 6 bytes
awk '{print substr($0,7,3)}' a.a will give me "^ئل"    - Occupying 3 bytes
awk '{print substr($0,10)}' a.a will give me "Manohar" - Occupying 7 bytes

Now I want the second awk statement "awk '{print substr($0,7,3)}' a.a"
to give me 34532.

Thanks,
Jayant

First of all, I recommend that you use gawk rather than awk. Your sysadmin may already have set this up by creating a symbolic link from awk to gawk. Most people use gawk; it is almost a universal standard and has many more features.

Lately I have been using perl for many of the tasks that I used to perform with awk, so I may be a bit rusty with this superb little program, which I used quite often for data conversion. The most common type of data in COBOL programs is COMP-3 (or BCD). The sign nibble is usually the last nibble in the BCD string, so you would need to evaluate it in an awk program. The internal format of COMP (or "USAGE COMPUTATIONAL") varies considerably from platform to platform, so if you are using COMP you will need to determine the actual format. This should be documented somewhere, although it should also be easy to figure out if you can get a dump of the fields in question.

For the latest copy of perl try: www.activestate.com. For a good discussion of COMP-3 fields see: www.discinterchange.com/TechTalk_Packed_fields_.html. Also www.pgts.com.au/cgi-bin/pgtsj?file=pgtsj0205a where I discuss one of my own personal war stories.

From memory, there is no built-in way to deconstruct an ASCII byte with awk, i.e. to turn it into a number (which you need to do). Not to worry: with awk you can easily roll your own.

Here is an example which might do what you want:

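# a.awk example program:
# NB: with gawk, run in the C locale (LC_ALL=C) so that "%c" yields single bytes
BEGIN{
	# load the charmap array with all 256 possible byte values
	for( i = 0; i < 256; i++)
		charmap[sprintf("%c",i)] = i;
}
# decipher a comp-3 field
# (i, t and s are extra parameters so that they are local to the function)
function uncomp3(a,    i, t, s){
	t = "";
	# two hex digits per byte, so bytes below 0x10 keep their leading zero
	for (i = 1; i <= length(a); i++)
		t = t sprintf("%02X",charmap[substr(a,i,1)]);
	# the last nibble is the sign, the rest are the digits
	s = substr(t,length(t));
	t = substr(t,1,length(t)-1);
	if (s ~ /D/)
		t *= -1;
	return(t + 0);
}
# Main ... put your code here ...
{
	print uncomp3(substr($0,7,3));
}
# end example program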

Save this in a file called a.awk. You could run it against the file a.a as follows:

	awk -f a.awk a.a

This program gets its result from the function uncomp3, which converts the packed BCD using a character map (charmap). Recall that all arrays in awk are associative (what perl calls hashes), so charmap is simply a lookup table. It is built by using the sprintf function with "%c" to generate the character for each of the 256 possible byte values; each character is a key in the table and its byte value is the entry. Therefore, if you want the value of the ASCII character zero (48 decimal or 0x30), you could get it from the lookup table as:

	charmap["0"]

The last nibble of the last byte contains the sign of the number (0xD for negative).
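For comparison, the same decoding idea can be sketched in Python, where a byte is already a number and no lookup table is needed. This is an illustration only: the name unpack_comp3 and the sample bytes are made up, and, like the awk program, it treats only a sign nibble of 0xD as negative.

```python
def unpack_comp3(field):
    """Decode packed decimal bytes: digit nibbles followed by one sign nibble."""
    nibbles = []
    for byte in field:
        nibbles.append(byte >> 4)    # high nibble
        nibbles.append(byte & 0x0F)  # low nibble
    sign = nibbles.pop()             # the last nibble carries the sign
    if any(n > 9 for n in nibbles):
        raise ValueError("non-decimal digit nibble in packed field")
    value = int("".join(str(n) for n in nibbles) or "0")
    # Assumption: only 0xD marks a negative number, as in the awk example.
    return -value if sign == 0x0D else value

print(unpack_comp3(bytes([0x34, 0x53, 0x2C])))  # 34532
print(unpack_comp3(bytes([0x00, 0x12, 0x3D])))  # -123
```

The ValueError is a cheap sanity check: if a field is not really packed decimal, a digit nibble above 9 usually turns up quickly.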

It turned out that the data was not packed BCD, but plain binary. Many AIX systems run on RISC hardware, which uses plain old big-endian byte order.

In this case you could modify the program that I sent ... e.g. let's call it b.awk:

# ------------------------------------------------------------------------
# b.awk example program:
# NB: with gawk, run in the C locale (LC_ALL=C) so that "%c" yields single bytes
BEGIN{
	# load the charmap array with all 256 possible byte values
	for( i = 0; i < 256; i++)
		charmap[sprintf("%c",i)] = i;
}
# decipher a comp (binary bigendian) field
# (i, p and t are extra parameters so that they are local to the function)
function uncomp(a,    i, p, t){
	p = 0;
	t = 0;
	# work from the last (least significant) byte towards the first
	for (i = length(a); i; i--){
		t += charmap[substr(a,i,1)] *(256^p);
		p++;
	}
	return(t + 0);
}
# Main ... put your code here ...
{
	print uncomp(substr($0,7,3));
}
# end example program
# ------------------------------------------------------------------------

I think you'll find that something like this works. NB: for a signed field you still need to deal with the sign. In a binary field it is most likely carried in the top bit of the first (most significant) byte.

NB: This awk program will not work on binary data written by Intel hardware (beware the little-endian trap!). If your system uses Intel binary, the bytes are stored least significant first and you must modify this algorithm accordingly.
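The effect of byte order is easy to see in a short Python sketch using the built-in int.from_bytes. The wrapper name decode_binary is invented here, and the three sample bytes are constructed for the illustration (0x0086E4 is 34532 in big-endian order); they are not Jayant's actual data, which did not survive the e-mail. The signed case assumes two's complement.

```python
def decode_binary(field, byteorder="big", signed=False):
    """Interpret raw bytes as an integer in the given byte order."""
    return int.from_bytes(field, byteorder=byteorder, signed=signed)

# Constructed sample: 34532 stored in three bytes, most significant first.
raw = bytes([0x00, 0x86, 0xE4])

print(decode_binary(raw))            # 34532
print(decode_binary(raw, "little"))  # 14976512, the little-endian trap

# Assuming two's complement, a signed field needs signed=True.
print(decode_binary(bytes([0xFF, 0xFF, 0xFE]), signed=True))  # -2
```

Reading big-endian data as little-endian gives a plausible-looking but wrong number rather than an error, which is why it pays to check a known record first.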



You've won the lottery!

There has been a spam drought this month! However Brian was fortunate to win the lottery! Here is the e-mail he sent me. Recipient addresses have been x'ed out.

From: Brian Robson
Subject: What Next?  I've Won the Lottery

Hi there,  Gerry,

My luck seemsd to be increasing by the day!!!

Strewth, now i've won the lottery, but they want me to keep it a secret!!!

I thought i had better tell you, all the same...

Brian

PS: The email address they have used is a combination, some pretty smart
crawler is prowling about.

==============================================================
Return-Path: <megalottery@netscape.net>
Delivered-To: brianr-xxxxxxxxxxxxxxxxxxxxxx
Received: (qmail 25499 invoked from network); 18 Feb 2003 01:12:49 +1100
X-Filtered: qmail-filter $Revision: 1.6 $ $Date: 2001/02/13 23:41:19 $
Received: from node-d-fe88.a2000.nl (HELO mail1.xxx.xxx.xx) (62.195.254.136)
  by nhj.xxx.xxx.xx with SMTP; 18 Feb 2003 01:12:48 +1100
From: "MEGA LOTTERY INTERNATIONAL." <megalottery@netscape.net>
Date: Mon, 17 Feb 2003 15:12:37
To:sharon@xxxxxxxxxxx.xxx
Subject: FINAL WINNING NOTIFICATIONS!
MIME-Version: 1.0
Content-Type: text/plain;charset="iso-8859-1"
Content-Transfer-Encoding: 7bit

MEGA LOTTERY INTERNATIONAL.
FROM: INTERNATIONAL PROMOTION/PRIZE AWARD DEPT.

REF: OGS/2311786008/01
BATCH: 14/011/IPD /NL
RE: WINNING NOTIFICATION / FINAL NOTICE

Sir/Madam

We are pleased to inform you of the result of
the Lottery Winners International programs held on the
17th of Febuary 2003. Your e-mail address attached to
ticket number 20511465886-629 with serial number
3772-99 drew lucky numbers 7-14-17-23-31-22, which
consequently won in the 2nd category, you have
therefore been approved for a lump sum pay out of
1,000,000.00 Euro. (One Million Euro)
CONGRATULATIONS!!!
Due to mix up of some numbers and names, we ask that
you keep your winning information confidential until
your claims has been processed and your money
Remitted to you. This is part of our security protocol
to avoid double claiming and unwarranted abuse of
this program by some participants.
All participants were selected through a computer
ballot system drawn from over 100,000 company and
50,000,000 individual email addresses and names from
all over the world. This promotional program takes
place every three-year. We hope with part of your winning
you will take part in our end of year 50 million Euro
International lottery. To file for your claim, please
contact our fiducially agent Dr. RICHARDS MEYER. Of the,
 WESTHERN   ATLANTIC CONSULTANCY
TEL: +31-630-555-940.
FAX: +31-645 234 989.
Email: dr.richardsmeyer@consultant.com
Remember, all winning must be claimed not later than
31st of March 2003. After this date all unclaimed
funds will be included in the next stake. Please note
in order to avoid unnecessary delays and complications
please remember to quote your reference number and
batch numbers in all correspondence. Furthermore,
should there be any change of address do inform our
agent as soon as possible.

Congratulations once more from our members of staff
and thank you for being part of our promotional program.

Note: Anybody under the age of 18 is automatically
disqualified.

Sincerely yours,

Mrs. Jennifer Van Bosch
Lottery Coordinator.

Gosh Brian! Now that you are a multi-millionaire, does this mean you'll shout me a bottle of Grange Hermitage next time I'm in Bondi?

The netscape.net address was a genuine mailbox, and the scammer intended to use it as one of the clearing houses. Some ISPs get pedantic about the fact that spam did not actually originate from their network. They do not understand (or worse still, pretend that they do not understand) that scammers always include a genuine address somewhere in the e-mail, so that their intended victims can contact them. The genuine e-mail address will have been set up for criminal purposes (i.e. fraud). I got no arguments in this case ... Congratulations to the team at AOL.com, who zapped him immediately. It was one of the fastest responses to an abuse complaint that I have encountered! Well done guys!

However, of equal interest is the e-mail address (in text form) at consultant.com. According to whois this domain is hosted by register.com:

   Domain Name: CONSULTANT.COM
   Registrar: REGISTER.COM, INC.
   Whois Server: whois.register.com
   Referral URL: http://www.register.com
   Name Server: DNS11.REGISTER.COM
   Name Server: DNS12.REGISTER.COM
   Status: ACTIVE



Sending fragile data by e-mail.

Some data is fragile and cannot withstand the rough and tumble of transmission in the body text of an e-mail. This is especially true for binary data, or any data containing bytes with a value greater than 127 (0x7F). Remember that ASCII is a 7-bit code, namely:


| 00 nul| 01 soh| 02 stx| 03 etx| 04 eot| 05 enq| 06 ack| 07 bel|
| 08 bs | 09 ht | 0a nl | 0b vt | 0c np | 0d cr | 0e so | 0f si |
| 10 dle| 11 dc1| 12 dc2| 13 dc3| 14 dc4| 15 nak| 16 syn| 17 etb|
| 18 can| 19 em | 1a sub| 1b esc| 1c fs | 1d gs | 1e rs | 1f us |
| 20 sp | 21  ! | 22  " | 23  # | 24  $ | 25  % | 26  & | 27  ' |
| 28  ( | 29  ) | 2a  * | 2b  + | 2c  , | 2d  - | 2e  . | 2f  / |
| 30  0 | 31  1 | 32  2 | 33  3 | 34  4 | 35  5 | 36  6 | 37  7 |
| 38  8 | 39  9 | 3a  : | 3b  ; | 3c  < | 3d  = | 3e  > | 3f  ? |
| 40  @ | 41  A | 42  B | 43  C | 44  D | 45  E | 46  F | 47  G |
| 48  H | 49  I | 4a  J | 4b  K | 4c  L | 4d  M | 4e  N | 4f  O |
| 50  P | 51  Q | 52  R | 53  S | 54  T | 55  U | 56  V | 57  W |
| 58  X | 59  Y | 5a  Z | 5b  [ | 5c  \ | 5d  ] | 5e  ^ | 5f  _ |
| 60  ` | 61  a | 62  b | 63  c | 64  d | 65  e | 66  f | 67  g |
| 68  h | 69  i | 6a  j | 6b  k | 6c  l | 6d  m | 6e  n | 6f  o |
| 70  p | 71  q | 72  r | 73  s | 74  t | 75  u | 76  v | 77  w |
| 78  x | 79  y | 7a  z | 7b  { | 7c  | | 7d  } | 7e  ~ | 7f del|

So if your data contains bytes that are larger than [del] (0x7F or 127 decimal), it will be mangled as it passes through Internet mail gateways. When in doubt, you should send the data in a separate file as an attachment. Most modern MUAs (and even a few primitive ones like mine) will automatically MIME encode an attachment. MIME (or UU) encoding turns all the data into 7-bit ASCII, which will safely negotiate the Internet.
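To see what the encoding step does, here is a minimal Python sketch using base64, the encoding that MIME normally applies to binary attachments. The sample record is hypothetical: two text fields around three binary bytes, two of which are above 0x7F.

```python
import base64

# Hypothetical fragile record: text, three binary bytes, text.
fragile = b"Jayant" + bytes([0x00, 0x86, 0xE4]) + b"Manohar"

encoded = base64.b64encode(fragile)
# Every byte of the encoded form fits in 7 bits, so it is safe in a mail body.
assert all(byte < 0x80 for byte in encoded)

decoded = base64.b64decode(encoded)
print(encoded.decode("ascii"))
print(decoded == fragile)  # True
```

The price is size: base64 turns every three bytes of input into four characters of output.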
