From Windows to UNIX: Text File Formats



  • In the "old days" people feared moving between operating systems because they felt that the OSes themselves were not compatible. This was never really the case; rather, filesystems (even those on floppy disks) and applications were not compatible, and the tools for dealing with these issues were not readily available. Today, this is not an issue.

    However, one thing that still causes some consternation is that the default UNIX text file format and the default Windows text file format are not the same! The difference is not tragic and can be remedied pretty easily, but when moving between the two it is very important to realize that creating a file in Notepad on a Windows machine and then transferring it to a UNIX machine will very likely introduce problems. Text editors on UNIX often handle the Windows file transparently, making it hard to realize that something is amiss until a difficult-to-track-down parsing error appears in another system.

    So what is the difference? The basic format of both files is the same: ASCII text. What is different is how line breaks are handled. UNIX systems, which include Linux, BSD, Solaris, AIX, Mac OS X and more, end a line with a single LF (line feed) character. The classic Macintosh OS uses a single CR (carriage return) character instead. The Windows world uses a CR character followed by an LF character, a system inherited from DOS.
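    To make the difference concrete, here is a minimal shell sketch that reports which style a file uses. The helper name line_ending_style and the demo file names are made up for illustration, mktemp -d is assumed to be available (it is on Linux, the BSDs and Mac OS X, though it is not strictly POSIX), and classic Mac CR-only files are deliberately not handled.

```shell
#!/bin/sh
# Sketch: classify a text file by its line endings.
# line_ending_style is a hypothetical helper name, not a standard utility.
# Assumes the file is LF- or CRLF-terminated; CR-only files are not handled.
line_ending_style() {
    cr=$(printf '\r')                 # a literal carriage return character
    if grep -q "${cr}\$" "$1"; then   # does any line end with a CR?
        echo "CRLF"
    else
        echo "LF"
    fi
}

# Demo on two throwaway files (the names are illustrative only).
demo_dir=$(mktemp -d)
printf 'one\r\ntwo\r\n' > "$demo_dir/windows.txt"
printf 'one\ntwo\n'     > "$demo_dir/unix.txt"
line_ending_style "$demo_dir/windows.txt"   # prints CRLF
line_ending_style "$demo_dir/unix.txt"      # prints LF
rm -r "$demo_dir"
```

    Running od -c on a file is another quick way to look: Windows line endings show up as \r \n pairs, UNIX ones as a lone \n.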

    Moving from Windows to UNIX is as simple as stripping out the trailing CR characters, each of which sits just before the LF that ends its line. Going in the opposite direction requires adding them. As with many things that we will encounter, the UNIX world is typically well positioned to handle these translation issues, while on Windows doing so is rather more cumbersome. Of course Windows can do this translation, but tools to do so are not a standard operating system component.
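    As a rough sketch of what that stripping and adding amounts to, the two filters below use nothing but sed with a literal CR spliced into the expression, which avoids relying on any particular sed understanding \r. The names to_unix and to_windows and the demo file names are hypothetical, and mktemp -d is assumed to be available.

```shell
#!/bin/sh
# Sketch: convert line endings in both directions with plain sed.
# to_unix and to_windows are hypothetical helper names; both read
# standard input and write standard output.
cr=$(printf '\r')   # a literal carriage return character

# Windows to UNIX: drop the CR at the end of each line.
to_unix() {
    sed -e "s/${cr}\$//"
}

# UNIX to Windows: drop any CR already present, then append exactly one,
# so running it on a file that is already in Windows format is harmless.
to_windows() {
    sed -e "s/${cr}\$//" -e "s/\$/${cr}/"
}

# Demo: convert a Windows file to UNIX and back, then compare the bytes.
demo_dir=$(mktemp -d)
printf 'one\r\ntwo\r\n' > "$demo_dir/windows.txt"
to_unix    < "$demo_dir/windows.txt" > "$demo_dir/unix.txt"
to_windows < "$demo_dir/unix.txt"    > "$demo_dir/roundtrip.txt"
cmp "$demo_dir/windows.txt" "$demo_dir/roundtrip.txt" && echo "round trip preserved the file"
rm -r "$demo_dir"
```

    Both filters write to standard output rather than editing in place, so the original file is left untouched until you choose to replace it.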

    The standard UNIX tool for converting Windows text files into UNIX text files is aptly named dos2unix; its companion, unix2dos, converts in the opposite direction. This is an extremely common and well-known utility, installed on many servers but, it should be noted, rarely included in "minimal" installations (such as we are using for our lessons). All major UNIX systems offer this application, but you may need to add it from your chosen system's software packages if it is not installed by default.

    Using it is simple: you just give it the name of the file that you want to convert.

    # dos2unix a_text_file_from_windows.txt


  • Another typo
    nearly training CR charaters
    think you mean trailing CR characters



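  • No need for extra utilities which may or may not exist; use stuff built right in universally. With sed (this form relies on GNU sed understanding \r; sed works one line at a time without the newline, so the pattern only has to match the CR at the end of each line):

    sed -e 's/\r$//' file.txt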
    

    Or tr

    tr -d '\15\32' < file.txt > file_new.txt
    

    Or in vi's command mode (type the ^M as Ctrl-V followed by Ctrl-M, not as a caret and an M):

    :1,$s/^M//g
    

    Or, to go the other way (UNIX to Windows), if you have perl (many people have at least the basic interpreter):

    perl -p -e 's/\n/\r\n/' < file.txt > file_new.txt
    


  • Thanks, fixed.