PGTS Pty. Ltd.   ACN: 007 008 568

PGTS Humble Blog

Thread: Tips/Tricks For Programming etc

Failure is not an option. It comes bundled with Microsoft Windows

CR/LF LF Linefeeds Again And Again


Date: Tue, 25 Aug 2009 00:31:43 +1000

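One task that never seems to go away is the old Windows CR/LF conversion problem: DOS and Windows text files end each line with a carriage return followed by a line feed (CR/LF, "\r\n", 0x0D 0x0A), while Unix files end each line with a line feed alone (LF, "\n", 0x0A). The apocryphal tale is that this arose from a basic misunderstanding of the difference between DCE and DTE back in the days of CP/M ... Whatever the cause, this little annoyance will probably be with us forever. Here are some simple tips that help deal with it.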

There are a number of commands that will do this:

  1. dos2unix/unix2dos: These are available on Linux, FreeBSD, cygwin and many other "nix"s. They are also known by the names fromdos and todos; in the tofrodos package they are all just links to the same program. These commands do more or less what the names imply (use "man" to read about the extra options). They are the easiest and quickest way to convert files:

    unix2dos file.txt                       # convert file.txt from unix to dos format (overwrites the original)
    dos2unix -a /foo/bar/*.txt              # remove all "\r" from the specified files

  2. Other Unix commands: Not all systems have the handy tofrodos commands, but almost all have sed and/or tr, and most have awk and perl:

    awk '{print $0"\r"}' unix.txt > dos.txt
    sed -e "s/$/\r/" unix.txt > dos.txt     # GNU sed; other seds may not recognise "\r"
    perl -pe 's/\n/\r\n/' unix.txt > dos.txt
    tr -d "\r" < dos.txt > unix.txt         # removes every "\r", like "dos2unix -a"
    cat dos.txt | tr -d "\r" > unix.txt     # same as previous
    perl -pe 's/\r//g' dos.txt > unix.txt   # this works like "dos2unix -a"
    sed -e 's/\r//g' dos.txt > unix.txt     # same as previous (GNU sed)
    perl -pe 's/\r//' dos.txt > unix.txt    # removes the first occurrence of "\r" on each line
    sed -e 's/\r//' dos.txt > unix.txt      # same as previous (GNU sed)
    sed -e 's/\r$//' dos.txt > unix.txt     # this works like dos2unix (without -a)

  3. Vi, vim and Gvim: These editors can also eliminate carriage returns (and other control characters) with the substitute command. This is handy if the file is malformed, e.g. if most of the lines end with CR/LF but a few don't. In such cases Gvim (or vim) will not treat the file as DOS format; it loads it as a regular Unix file and displays each carriage return as the control character ^M. Depending on how they are configured, Gvim and vim usually show control characters in a different colour. Stock standard vi has neither colours nor a DOS file format, and will always display carriage returns as ^M. In vi or vim you can get rid of the carriage returns (and other control characters) by using the Ctrl-V quote feature. The command will appear as follows:

    :%s/^M//

    where the ^M character is entered with two key presses: Ctrl-V followed by Ctrl-M (typing a caret followed by an M will not work). For more details about the Ctrl-V quote command, see the note below.

  4. Windows commands: You may be stuck in Windows without cygwin. If so, your best option is to install ActiveState Perl. Failing that, the type command piped through find /V "" will cope with files terminated with a single "\n", since find writes each line out again with a CR/LF terminator:

    perl -pe "" unix.txt > dos.txt
    perl -ne "print" unix.txt > dos.txt
    type unix.txt | find /V "" > dos.txt
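The one-liners above differ in small but important ways: some add a CR to every line, some remove every CR, some remove only the first CR on a line, and some remove only a CR at the end of a line. Here is a minimal Python sketch of the same transformations on raw bytes, so the differences can be tested side by side. The function names are made up for illustration and are not part of any of the tools above. It also counts CR/LF versus bare LF endings, which is a quick way to spot the kind of malformed, mixed file described in item 3.

```python
import re
from pathlib import Path


def unix_to_dos(data: bytes) -> bytes:
    # Insert a CR before every LF that isn't already preceded by one.
    # Unlike the awk/sed/perl one-liners, running this twice is harmless.
    return re.sub(rb"(?<!\r)\n", b"\r\n", data)


def dos_to_unix(data: bytes) -> bytes:
    # Like `sed -e 's/\r$//'`: drop a CR only at the end of a line.
    # In MULTILINE mode "$" matches before each LF and at the end of the data.
    return re.sub(rb"\r$", b"", data, flags=re.MULTILINE)


def strip_all_cr(data: bytes) -> bytes:
    # Like `tr -d "\r"` or `sed -e 's/\r//g'`: remove every CR, wherever it is.
    return data.replace(b"\r", b"")


def strip_first_cr(data: bytes) -> bytes:
    # Like `sed -e 's/\r//'`: remove only the first CR on each line.
    lines = data.split(b"\n")
    return b"\n".join(line.replace(b"\r", b"", 1) for line in lines)


def count_line_endings(data: bytes) -> dict:
    # Non-zero counts for both kinds indicate a malformed, mixed file.
    crlf = data.count(b"\r\n")
    return {"crlf": crlf, "lf": data.count(b"\n") - crlf}


def convert_in_place(path, func) -> None:
    # Binary I/O, so Python itself performs no newline translation.
    p = Path(path)
    p.write_bytes(func(p.read_bytes()))


if __name__ == "__main__":
    mixed = b"one\r\ntwo\nthree\r\n"
    print(count_line_endings(mixed))  # {'crlf': 2, 'lf': 1}
    print(unix_to_dos(mixed))         # b'one\r\ntwo\r\nthree\r\n'
    print(dos_to_unix(mixed))         # b'one\ntwo\nthree\n'
```

For example, convert_in_place("dos.txt", dos_to_unix) behaves much like the last sed command above, except that it overwrites the file rather than writing a new one.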

Perl on Windows (including the standard ActiveState distribution) opens file handles in text mode by default, so each "\n" written to STDOUT is automatically converted to CR/LF. That's why the command with an empty expression (above) works in Windows. NB: to stop this behaviour, include this line in your perl scripts:

binmode(STDOUT);
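Python has the same kind of text-mode translation, controlled by the newline argument of the built-in open(). This minimal sketch (the helper names are made up for illustration) writes the same lines three ways: newline=None translates "\n" to os.linesep, which is CR/LF on Windows; newline="" disables translation, much like binmode(STDOUT) in Perl; and newline="\r\n" forces CR/LF on any platform.

```python
import os
import tempfile


def write_lines(path, lines, newline):
    # newline=None   -> "\n" becomes os.linesep ("\r\n" on Windows, "\n" elsewhere)
    # newline=""     -> no translation at all (compare Perl's binmode)
    # newline="\r\n" -> "\n" becomes CR/LF on every platform
    with open(path, "w", encoding="utf-8", newline=newline) as f:
        for line in lines:
            f.write(line + "\n")


def read_raw(path) -> bytes:
    # Binary mode shows exactly which bytes ended up on disk.
    with open(path, "rb") as f:
        return f.read()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "demo.txt")
    for mode in (None, "", "\r\n"):
        write_lines(path, ["one", "two"], newline=mode)
        print(repr(mode), read_raw(path))
```

The output of the newline=None case depends on the platform the script runs on, which is exactly the behaviour that catches people out.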

Alternatively, Gvim for Windows is very powerful and can load fairly large files (depending on how much memory your workstation has). Files can be converted by setting the file format with the :set fileformat=dos command (or :set fileformat=unix to go the other way) and then saving the file with :w.

Note: As mentioned above, vi (and vim) can replace control characters using the substitute command combined with the Ctrl-V quote technique. This is simply a matter of pressing Ctrl-V followed by the control character combination that you wish to quote. The (ASCII) control character combinations are as follows:

Key  Hex   Code  Description
^A   0x01  SOH   Start Of Heading
^B   0x02  STX   Start Of Text
^C   0x03  ETX   End Of Text
^D   0x04  EOT   End Of Transmission
^E   0x05  ENQ   Enquiry
^F   0x06  ACK   Acknowledge
^G   0x07  BEL   Bell
^H   0x08  BS    Backspace
^I   0x09  HT    Horizontal Tab
^J   0x0A  LF    Line Feed (quoting it in vi inserts NUL, 0x00, instead)
^K   0x0B  VT    Vertical Tab
^L   0x0C  FF    Form Feed
^M   0x0D  CR    Carriage Return
^N   0x0E  SO    Shift Out
^O   0x0F  SI    Shift In
^P   0x10  DLE   Data Link Escape
^R   0x12  DC2   Device Control 2
^T   0x14  DC4   Device Control 4
^U   0x15  NAK   Negative Acknowledge
^V   0x16  SYN   Synchronous Idle
^W   0x17  ETB   End Of Transmission Block
^X   0x18  CAN   Cancel
^Y   0x19  EM    End Of Medium
^Z   0x1A  SUB   Substitute
^[   0x1B  ESC   Escape
^\   0x1C  FS    File Separator
^]   0x1D  GS    Group Separator
^^   0x1E  RS    Record Separator
^_   0x1F  US    Unit Separator

This all sorta works ... The control characters are mapped to the letters of the alphabet, starting with the letter A. Those of you who actually remember using a teleprinter terminal will appreciate that this mapping makes a weird mnemonic type of "sense" (if you know your hex). And the missing characters make sense too: Ctrl-Q (DC1, XON) and Ctrl-S (DC3, XOFF) are traditionally used for flow control, resuming and pausing output to the screen, so they don't appear in the table above. And of course in vi you can't "quote" Ctrl-J, since it maps to the record terminator ("\n"); vi therefore inserts the "null" character instead (displayed as ^@ in vim). BTW, if you press Ctrl-J while using the console, it acts as the line-feed terminator. In fact all the control characters map to the corresponding non-control characters masked with 0x1F, e.g. Ctrl-A is 0x41 & 0x1F = 0x01, Ctrl-B is 0x42 & 0x1F = 0x02, and so on.
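The 0x1F masking rule is easy to check with a few lines of Python. This is a minimal sketch; ctrl() and caret() are hypothetical helper names, not part of any library. It reproduces the Hex column of the table (including ^Q and ^S, which the table leaves out) and also goes the other way: setting bit 0x40 on a control code gives the caret notation, such as ^M, that vi uses to display it.

```python
import string


def ctrl(key: str) -> int:
    # Ctrl+key keeps only the low five bits of the key's ASCII code,
    # so Ctrl-A = 0x41 & 0x1F = 0x01 (and Ctrl-a, 0x61 & 0x1F, is the same).
    return ord(key) & 0x1F


def caret(code: int) -> str:
    # Reverse mapping for codes 0x00-0x1F: OR in 0x40 to get the display letter.
    if not 0 <= code <= 0x1F:
        raise ValueError("not a C0 control code")
    return "^" + chr(code | 0x40)


if __name__ == "__main__":
    for key in string.ascii_uppercase + "[\\]^_":
        code = ctrl(key)
        print(f"^{key}  0x{code:02X}  {caret(code)}")
```

Note that caret(0x00) gives "^@", the same notation vim uses for the null character that appears when you try to quote Ctrl-J.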


Copyright 2009, Gerry Patterson. All Rights Reserved.