
SW:dos2unix

From TAMU HPRC

dos2unix

dos2unix is a DOS/Mac to UNIX text file format converter. It converts files created or edited in a Windows/DOS or Mac environment into a format suitable for use on UNIX systems.

The most common use on TAMU HPRC clusters is to remove the hidden Windows/DOS-specific characters (the carriage return at the end of each line) from files such as input files and scripts that need to be parsed by a program or interpreter.

Usage

To use dos2unix on a file named "my_job_script.sh":

 dos2unix my_job_script.sh

Note: This tool must be run every time a file is edited within a DOS environment. If you transfer a file to your local Windows computer and edit it, be sure to run dos2unix again after copying it back to the cluster.
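
Because the conversion has to be repeated after every Windows edit, it can help to check a whole directory of scripts at once. The following is a minimal sketch, not an official HPRC procedure: the file names and the scratch directory are made up for the demonstration, and the sed fallback is an assumption added only so the example also runs on a system where dos2unix is not installed.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Demo files (hypothetical names): one with DOS (CRLF) line endings,
# one already in UNIX (LF) format.
printf '#!/bin/bash\r\necho "Hello World!"\r\n' > dos_script.sh
printf '#!/bin/bash\necho "Hello World!"\n' > unix_script.sh

cr=$(printf '\r')   # a literal carriage return character

converted=""
for f in *.sh; do
    # Only touch files with at least one line ending in a carriage return.
    if grep -q "${cr}\$" "$f"; then
        if command -v dos2unix >/dev/null 2>&1; then
            dos2unix "$f"
        else
            # Fallback (assumption): strip one trailing carriage return
            # per line with sed. Unlike dos2unix, this rewrites the file
            # through a temporary copy, so file permissions may change.
            sed "s/${cr}\$//" "$f" > "$f.tmp" && mv "$f.tmp" "$f"
        fi
        converted="$converted $f"
    fi
done

echo "converted:$converted"
```

Only dos_script.sh is reported as converted; unix_script.sh is left alone because it contains no carriage returns.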


Common causes

The two most common ways of introducing hidden characters:

1. Using a Windows editor such as Notepad to edit your job files and then copying them to the HPRC clusters.

2. Copying text from a webpage or PDF file and pasting it into your file editor.

One way to check for hidden characters is to use the file command:

file my_job_script.sh

This may produce the following output, which indicates that you need to run dos2unix:

ASCII English text, with CRLF line terminators


If you see the following output instead, you should check again using a second method.

Bourne-Again shell script text executable

A second way to check your file for hidden characters is to use the -v option of the cat command:

cat -v my_job_script.sh

Example 'cat -v' output showing hidden Ctrl+M characters at the end of each line:

#!/bin/bash^M
^M
echo 'Hello World!'^M
^M
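
For longer files, reading through the 'cat -v' output by eye can be tedious. As a rough sketch, the lines that end in a carriage return can be counted with grep instead. The demo file below is a throwaway copy of the example script, and the variable names are illustrative only.

```shell
# Build a throwaway file with the same four CRLF-terminated lines
# as the example above (printf repeats the format for each argument).
demo=$(mktemp)
printf '%s\r\n' '#!/bin/bash' '' "echo 'Hello World!'" '' > "$demo"

cr=$(printf '\r')   # a literal carriage return character

# Count all lines, then only the lines that end in a carriage return.
total_lines=$(grep -c '' "$demo")
crlf_lines=$(grep -c "${cr}\$" "$demo")

echo "total lines: $total_lines, lines ending in ^M: $crlf_lines"

# Show the hidden characters the same way as in the example above.
cat -v "$demo"

rm -f "$demo"
```

If the second count is greater than zero, the file still contains DOS line endings and should be passed through dos2unix.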

Other types of hidden character

Hidden characters other than DOS line endings can also be corrected.

For example, suppose 'cat my_file.txt' shows -2 but 'cat -v my_file.txt' shows -2M-BM-. The extra M-BM- marks hidden non-ASCII bytes that dos2unix does not remove. To turn -2M-BM- back into -2, first check the file type with 'file my_file.txt'. If the result is

UTF-8 Unicode text

then do the following to convert the file to ASCII, dropping the hidden characters that have no ASCII equivalent:

cat my_file.txt > tmpfile
iconv --from-code UTF-8 --to-code US-ASCII -c tmpfile > my_file.txt
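
The same conversion can be tried on a scratch file first. The sketch below assumes, for illustration, that the hidden character is a UTF-8 non-breaking space (bytes 0xC2 0xA0), which is one character that 'cat -v' displays with an M-BM- prefix; your file may contain something else. It also writes the converted text to a new file and replaces the original only if that file is non-empty, which is a cautious variation on the commands above rather than a required step.

```shell
# Throwaway file containing "-2" followed by a UTF-8 non-breaking space
# (bytes 0xC2 0xA0, written here as octal escapes).
demo=$(mktemp)
printf '%s\302\240\n' '-2' > "$demo"

# List lines where cat -v shows high-bit bytes (displayed with an M- prefix).
# Note: this would also match text that literally contains "M-".
cat -v "$demo" | grep -n 'M-'

# Convert into a separate file. With -c, iconv drops characters that
# have no ASCII equivalent and may then exit with a non-zero status,
# so the status is ignored here and the output file is checked instead.
iconv --from-code UTF-8 --to-code US-ASCII -c "$demo" > "$demo.ascii" || true

# Replace the original only if the conversion produced some output.
if [ -s "$demo.ascii" ]; then
    mv "$demo.ascii" "$demo"
fi

result=$(cat "$demo")
echo "$result"
```

After the conversion, running 'cat -v' on the file should no longer show any M- sequences.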