
Difference between revisions of "HPRC:CommonProblems"

From TAMU HPRC
Revision as of 10:02, 25 July 2016

Common Problems & Quick Solutions

Accounts

Q: When do accounts expire?

A: Accounts expire at the start of the new fiscal year (September 1st). You can see when your account expires by going to our Account Management System (AMS) and checking under the Accounts tab.

Q: How do I get more SUs?

A: Students will need to have their PI transfer SUs to them. PIs can apply for up to two Small accounts, totaling no more than 200,000 SUs. After this Small allocation has run out, PIs will need to apply for a Large allocation. See our Account Allocations page for more information on the allocation policies.

Q: How do I transfer SUs?

A: To transfer SUs, PIs will need a Small or Large account (see our Account Allocations page for more information). Once an account has been granted to the PI, they can transfer SUs to any of their researchers on our Account Management System (AMS). If a PI needs to add a new researcher, the PI must contact the Help Desk.

Batch Processing

Q: Why is my job pending?

A: There can be many reasons why a job would be pending:

  • Your job cannot fit on any of our nodes
    • If your job requests more than 245GB of memory without requesting the xlarge queue, it will be stuck pending.
    • SOLUTION: Kill your job and resubmit with less memory or in the xlarge queue. IMPORTANT NOTE: Your program MUST use Westmere-compatible software to be able to run in the xlarge queue.
    • If your job requests more than 2TB of memory, your job will be stuck pending.
    • SOLUTION: Kill your job and resubmit with less memory.
    • If your job asks for more than the maximum number of cores per node with #BSUB -R "span[ptile=XX]", it will be stuck pending. The maximum is 20 on Ada (40 in the xlarge queue) and 16 on Curie.
    • SOLUTION: Kill your job and resubmit with a ptile value less than or equal to the maximum value for the cluster.
  • There are no job slots available
    • If your job requires the usage of the 256GB, 1TB, or 2TB nodes, your job might be pending for longer than usual.
    • If the cluster usage is particularly high right now, your job might be pending for longer than usual. You can see the System Load Levels on our Home Page.
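
The ptile limits above can be checked before submitting. The following is a minimal sketch, not an HPRC tool: the function names and the cluster labels (ada, ada-xlarge, curie) are invented for this example, and the limits are simply the per-node core counts quoted above.

```shell
#!/bin/bash
# Sketch: compare the ptile value in a job script with the per-node core
# limits quoted above. Function names and cluster labels are made up for
# this example; they are not LSF or HPRC commands.

# Print the maximum cores per node for a cluster label.
max_cores_per_node() {
    case "$1" in
        ada)        echo 20 ;;   # Ada, regular queues
        ada-xlarge) echo 40 ;;   # Ada, xlarge queue
        curie)      echo 16 ;;   # Curie
        *)          return 1 ;;  # unknown label
    esac
}

# Usage: check_ptile <cluster-label> <job-script>
# Returns 0 if the script's ptile fits, 1 if it is too large,
# 2 if the label is unknown or no ptile value is found.
check_ptile() {
    local limit ptile
    limit=$(max_cores_per_node "$1") || return 2
    # Take the first ptile=N that appears in the script.
    ptile=$(grep -o 'ptile=[0-9][0-9]*' "$2" | head -n 1 | cut -d= -f2)
    [ -n "$ptile" ] || return 2
    if [ "$ptile" -le "$limit" ]; then
        echo "ptile=$ptile fits (limit $limit)"
    else
        echo "ptile=$ptile exceeds limit $limit; job would stay pending"
        return 1
    fi
}
```

For example, after sourcing the file, check_ptile ada myjob.lsf (where myjob.lsf is a placeholder for your own job script) reports whether the requested ptile fits on a regular Ada node.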

Q: Why does my job fail?

A: There can be many reasons why your job fails. ALWAYS check your job (LSF) and program output files for information regarding why your job might have failed.

  • Wrong file format
    • If you edited your file on a Windows computer prior to using it on Ada, your file may be in the wrong format.
    • If you see errors in your output file caused by whitespace characters, your file may be in the wrong format.
    • SOLUTION: Try the dos2unix utility on your file and submit again.
  • Your job ran out of time
    • If you see "TERM_RUNLIMIT" in your job output file, your job ran out of time.
    • SOLUTION: Increase your wall time specification #BSUB -W HH:MM and submit again.
  • Your job ran out of memory
    • If you see "TERM_MEMLIMIT" in your job output file, your job ran out of memory.
    • SOLUTION: Increase your memory specifications #BSUB -R "rusage[mem=XX]" and #BSUB -M XX and submit again.
  • You ran out of space
    • If you see "DISK QUOTA EXCEEDED" in your output file, you ran out of disk space.
    • Remember to check your quotas regularly with showquota.
    • SOLUTION: Clear out your directories and submit again.
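
The checks above can be run in one pass over a job output file. This is a rough sketch under stated assumptions: diagnose_job_output and has_crlf are hypothetical helper names, not LSF or HPRC commands, and the script only looks for the marker strings listed above.

```shell
#!/bin/bash
# Sketch: scan a job output file for the failure markers listed above.
# diagnose_job_output and has_crlf are hypothetical helpers.

# Usage: diagnose_job_output <job-output-file>
diagnose_job_output() {
    local file found
    file="$1"
    found=0
    if grep -q 'TERM_RUNLIMIT' "$file"; then
        echo "out of time: raise #BSUB -W HH:MM"
        found=1
    fi
    if grep -q 'TERM_MEMLIMIT' "$file"; then
        echo "out of memory: raise rusage[mem=XX] and -M XX"
        found=1
    fi
    # Case-insensitive, since the message may appear in mixed case.
    if grep -qi 'disk quota exceeded' "$file"; then
        echo "out of disk space: run showquota and clear files"
        found=1
    fi
    if [ "$found" -eq 0 ]; then
        echo "no known marker found"
    fi
}

# Usage: has_crlf <file>
# Succeeds if the file contains carriage returns (Windows line endings),
# in which case dos2unix is worth trying.
has_crlf() {
    grep -q "$(printf '\r')" "$1"
}
```

If has_crlf succeeds on your job script or input file, run dos2unix on it before resubmitting.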

Q: How much memory do I need?

Q: How many cores should I use?

Q: How long is my job going to take?

Q: Why is my program slow?

Q: What is "Disk Quota Exceeded"?

A: This message refers to one or more of your file quotas being reached.

  • Remember to check your quotas regularly with showquota.
  • SOLUTION: Clear out your problem directories of any unnecessary files.
  • For more information on filesystems and quotas, please refer to the Ada:Filesystems_and_Files page.
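
To find which directories are worth clearing, it can help to list the largest entries first. The sketch below uses only the standard du, sort and head utilities; largest_entries is a hypothetical helper name, and it does not replace showquota, which remains the way to see your actual quotas.

```shell
#!/bin/bash
# Sketch: list the largest entries directly under a directory, biggest
# first. largest_entries is a hypothetical helper, not an HPRC command.
# Hidden (dot) entries are not matched by the * pattern and are skipped.

# Usage: largest_entries [directory] [count]
# Prints "<size in KB><TAB><path>" lines, largest first.
largest_entries() {
    local dir count
    dir="${1:-.}"
    count="${2:-10}"
    du -sk "$dir"/* 2>/dev/null | sort -rn | head -n "$count"
}
```

For example, largest_entries "$HOME" 5 shows the five largest visible entries in your home directory, which is usually enough to spot the problem directory.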