SW:Galaxy

From TAMU HPRC

Galaxy

Maroon Galaxy Accounts

Maroon Galaxy is available to students, faculty and staff for research use.

Before you request an account on Maroon Galaxy, you must do the following:

  • Go to usegalaxy.org and get familiar with Galaxy. You can start with a free account and learn about Galaxy tools.
  • Request a Maroon Galaxy account only if you have data to analyze; otherwise, use FishCamp Galaxy for training and practice.
  • If you decide that Galaxy is a good choice for your research project, then do the following:
    • Establish an account on Ada by sending a request. See the NewUser page for details on how to request an account.
    • After your Ada account is approved, request an account on Galaxy.
    • Send us information on what type of data you will be analyzing and which tools you expect to use for your research project.

If you are off campus then you will have to install and run the TAMU VPN to connect to Maroon Galaxy.

Maroon Galaxy can be accessed using your favorite web browser such as Firefox, Chrome or IE.

https://hprcgalaxy.tamu.edu/maroon/

Set Your Default Account

To change your default service unit (SU) account:

  • Go to our portal website at: portal.hprc.tamu.edu
  • After you log in with your NetID and password, click Clusters -> Ada Shell Access
    • Then enter your password when prompted.
  • Then type in the following to see your accounts
    • myproject
  • Then use the following command and replace the ### with your project account from the myproject command output
    • myproject -d ###
  • Then verify that there is a Y in the 'Default' column of your account by typing
    • myproject
  • If the Y is in the Default column, then you are done. Type exit to leave the shell, then close the portal tab in your web browser.
  • Finally, delete your currently queued Galaxy jobs and resubmit them.
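
As a minimal sketch, the check for the Y in the Default column can also be scripted. The Python below assumes that the myproject output has the same layout as the example listings under "Manage your HPRC Accounts". The function names and the sample text are illustrative and are not part of any HPRC tool.

```python
# Sample listing in the layout shown in the SU balance examples (assumed format).
SAMPLE_LISTING = """\
=====================================================================
                List of users's Project Accounts
---------------------------------------------------------------------
|  Account   | Default | Allocation |Used & Pending SUs|   Balance  |
---------------------------------------------------------------------
|000000000001|        Y|     5000.00|              0.00|     5000.00|
|000000000002|        N|     5000.00|              0.00|     5000.00|
---------------------------------------------------------------------
"""


def parse_accounts(listing):
    """Return one dict per account row found in a myproject-style listing."""
    accounts = []
    for line in listing.splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        # Data rows have five cells and a numeric account; skip titles and rules.
        if len(cells) != 5 or not cells[0].isdigit():
            continue
        account, default, allocation, used, balance = cells
        accounts.append({
            "account": account,
            "default": default == "Y",
            "allocation": float(allocation),
            "used": float(used),
            "balance": float(balance),
        })
    return accounts


def default_account(listing):
    """Return the row flagged Y in the Default column, or None if there is none."""
    for row in parse_accounts(listing):
        if row["default"]:
            return row
    return None


print(default_account(SAMPLE_LISTING)["account"])
```

To use it with real output, paste the text printed by myproject into a string and pass it to default_account in place of SAMPLE_LISTING.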

Account Security

Do not share your Galaxy account with anyone. Galaxy uses the TAMU Central Authentication Service, which is linked to your TAMU account.

When you are finished using Galaxy, always log out by selecting User -> Logout, clicking the Logout button on the next screen, and then closing your browser.

Galaxy on Ada Tutorial

slides

Permanently Delete unwanted files

To free disk space, permanently delete files that you have already deleted from your history. Please make this part of your Galaxy work routine.

This is a two-step process.

1. Click the X on the history item you want deleted.

2. At the top of the history panel, you will see a link to deleted files. Click that link, then select "Permanently remove it from disk" for the history item that you want removed.

FishCamp Galaxy Accounts

The FishCamp Galaxy instance is reserved for training purposes such as Galaxy workshops.

When requesting access to FishCamp for a training workshop, please include your Ada NetID in your request.

  • The FishCamp Galaxy is configured for training purposes.
    • Most jobs will run a maximum of 1 hour.
    • This is to enable jobs to be scheduled faster in the cluster queue.
    • Keep your input datasets small so that your jobs will complete within one hour.

FishCamp Galaxy is not intended for research projects, and data on FishCamp Galaxy should be considered accessible only for the short term.

Request a Maroon Galaxy account only if you have data to analyze.

The tools available on FishCamp and Maroon are the same.

If you are off campus then you will have to install and run the TAMU VPN to connect to FishCamp Galaxy.

FishCamp Galaxy can be accessed using your favorite web browser such as Firefox, Chrome or IE.

https://hprcgalaxy.tamu.edu/fishcamp/

Uploading Files > 2GB via FTP to Maroon Galaxy

From a UNIX Computer (Mac or Linux)

Go to the directory on your local computer that contains the file you want to upload (for example, sample_file.fastq).

In Galaxy, click the "Upload File" icon, then click "Choose FTP file" to see which port to use. Port 2121 is for Maroon Galaxy.

Each Galaxy instance has its own port. The port number can be found by clicking the "Choose FTP file" button in the Galaxy upload tool menu.

Connect to the server with sftp using your Maroon Galaxy credentials:

 sftp -P 2121 Your_NetID@ada7.tamu.edu

Depending on your sftp version, or if you are on an Ada fast transfer node, you may have to use the following:

 sftp -oPort=2121 Your_NetID@ada7.tamu.edu

The sftp prompt looks like this:

 sftp>

When you see the sftp prompt, you can upload a file with the put command followed by the file name to upload.

 put sample_file.fastq

Then you can verify that the file was uploaded by using the ls command.

 ls

Then exit the sftp prompt. In Galaxy, open the upload tool and click the 'Choose FTP file' button, which will show you the file you uploaded.

 exit

Using MobaXterm

Click the Session button in the top left corner, then click the SFTP button. Set Remote Host = ada7.tamu.edu and port = 2121 for Maroon Galaxy, and enter your username.

Then, in the Advanced Sftp settings tab, check the box for 2-steps authentication and click the OK button.

Enter your password at the first password prompt.

The second prompt does not display the DUO authentication menu; instead, it asks for your password again. Do not enter your password a second time. Enter the number 1 to receive a DUO push or phone call, whichever is enabled for your account. If you have a YubiKey, you can press it at the second password prompt instead of entering the number 1.

Using FileZilla

FileZilla does not work with two-factor authentication for Galaxy ftp uploads. We recommend MobaXterm as an alternative.

Using WinSCP

WinSCP does not work with two-factor authentication for Galaxy ftp uploads because Galaxy uses non-standard port numbers. We recommend MobaXterm as an alternative.

Reveille Galaxy

   sftp -P 2123 Your_NetID@ada7.tamu.edu

Requesting New Galaxy Tools

  • Go to usegalaxy.org or galaxy_toolshed and find a tool you want installed in Maroon Galaxy and send us the URL for the tool.
  • Send us a list of tools with the URLs that you would like installed so we can install them all at once.
  • Each time a tool is installed, the Galaxy server must be restarted.
    • We would prefer to install many tools at once and then restart the Galaxy server.
    • Restarting the Galaxy server rarely corrupts jobs, but it has happened, and we want to reduce the chances of interrupting submitted jobs.

When a Galaxy tool is Available

  • If you have found an existing Galaxy tool, for example at usegalaxy.org, send us a request with the URL for the tool in the toolshed repository or at usegalaxy.org.
  • We do not install tools directly via the toolshed. Instead, we download them and install them manually in order to verify that the Ada system modules are properly configured in the Galaxy tool.

When a tool has no Galaxy interface

  • Some bioinformatics tools do not have a Galaxy interface.
  • For these tools, we will need to spend more time creating the XML configuration files and testing.
    • This will take more time than adding a tool from the Galaxy toolshed.
    • Some tools are simple and can be done rather quickly, but others will require extensive development time.
    • In rare cases, we may not be able to configure a Galaxy tool for a software application because its installation is too complex to work in the Galaxy environment.

FAQ

1. Q: My job immediately fails with the message:

The cluster DRM system terminated this job

1. A: Check your file quota using the 'showquota' command at the Ada command line.

2. Q: My job fails after running for a long time with the message:

The cluster DRM system terminated this job

2. A: Check the email sent to you regarding your Galaxy job. If you see the following line, your job requires more memory or time to run. Contact us to update the tool configuration as needed.

 TERM_MEMLIMIT: job killed after reaching LSF memory usage limit.
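
For longer emails, a short script can look for the termination reason. This is only a sketch: TERM_MEMLIMIT comes from the message above, TERM_RUNLIMIT is the corresponding LSF reason for a job that reached its run time limit, and the function name and hint wording are assumptions made for illustration.

```python
# LSF termination reasons mapped to a suggested follow-up (hint text is illustrative).
TERMINATION_HINTS = {
    "TERM_MEMLIMIT": "the job needs more memory",
    "TERM_RUNLIMIT": "the job needs more run time",
}


def diagnose(email_text):
    """Return the hints for every known termination reason found in the email."""
    return [hint for reason, hint in TERMINATION_HINTS.items() if reason in email_text]


email = "TERM_MEMLIMIT: job killed after reaching LSF memory usage limit."
for hint in diagnose(email):
    print(hint)
```

An empty result means neither reason was found, in which case the other FAQ entries or the HPRC helpdesk are the next step.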

3. Q: My job has been in the queue status for a long time, what is going on?

3. A: First, check whether your job is pending (PEND) in the Ada queue by using the bjobs command at the Ada command line. If the job is PEND, then the cluster is busy at the moment. You can see how busy the Ada cluster is by looking at the System Load Levels on the HPRC homepage.

3. A: Check your SUs to make sure you have enough to run the Galaxy job by using the "My HPRC SU balance" tool in Galaxy or the 'myproject -l' command at the Ada command line. If you are unsure how many SUs the tool requires, contact the HPRC helpdesk.

3. A: In some cases, you will need to delete the queued job and restart it, but check the above conditions and contact the HPRC helpdesk first.

Tool specific notes

Fastq groomer

Fastq groomer is not always needed. It can take a long time to make a copy of your data, which may already be in fastqsanger format.

See Galaxy tutorial on checking if Illumina reads are already in fastqsanger format.

You can find out the specific fastq format of your file by running the FastQC tool and looking at the RawData result for the Encoding.

If you see something like the following, you do not need to run Fastq groomer. Instead, click the pencil icon on your file in the right panel and change the datatype to fastqsanger, which is more efficient than running Fastq groomer.

 Encoding    Sanger / Illumina 1.9

If your file Encoding is already Sanger / Illumina 1.8+, then running Fastq groomer on your file will only make an exact duplicate of the file, and you will have unnecessarily used up SUs and disk space.
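
The same check can be approximated without FastQC by looking at the quality characters themselves. The sketch below is a common heuristic, not the FastQC or Galaxy implementation: fastqsanger uses an ASCII offset of 33, older Illumina formats use 64, and the thresholds and function names here are assumptions chosen for illustration.

```python
def quality_strings(fastq_text):
    """Return the quality line (every fourth line) of each FASTQ record."""
    return fastq_text.splitlines()[3::4]


def guess_offset(fastq_text):
    """Guess the Phred offset: 33 (fastqsanger), 64 (older Illumina), or None if unclear."""
    codes = [ord(ch) for line in quality_strings(fastq_text) for ch in line.strip()]
    if not codes:
        return None
    if min(codes) < 59:
        # Characters this low cannot occur in an offset-64 encoding.
        return 33
    if min(codes) >= 64 and max(codes) > 74:
        # Characters this high are not expected from typical offset-33 Illumina data.
        return 64
    return None  # Ambiguous; rely on the FastQC Encoding field instead.


sanger_record = "@read1\nACGT\n+\n!''*\n"
print(guess_offset(sanger_record))
```

A result of 33 suggests the file is already fastqsanger. A result of None means the sample was not conclusive, so the FastQC Encoding field remains the more reliable check.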

Trinity

Before you run a Trinity job

Contact the HPRC helpdesk (help@hprc.tamu.edu) so that we can increase your Ada file quota.

All new Ada accounts have a default max file quota of 50,000 files. Trinity will produce ~100,000+ temporary files during a de novo assembly. Very large data sets can create 500,000+ temporary files.

Please monitor your file quota during your job run using the following command and request an increase if your job is creating enough files that your file quota might be reached.

 showquota

Be aware that starting multiple Trinity jobs at the same time may cause you to reach your file quota sooner.
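
To judge ahead of time whether a quota increase is needed, the figures above can be turned into a rough estimate. This is a sketch only: the function name is hypothetical, the 100,000 files per job is the approximate lower figure quoted above, and real counts depend on your data.

```python
DEFAULT_FILE_QUOTA = 50_000      # default max file quota for new Ada accounts
FILES_PER_TRINITY_JOB = 100_000  # rough lower figure quoted above; large data sets can exceed 500,000


def files_over_quota(current_files, trinity_jobs,
                     files_per_job=FILES_PER_TRINITY_JOB, quota=DEFAULT_FILE_QUOTA):
    """Return how many files the projected total exceeds the quota by (0 if it fits)."""
    projected = current_files + trinity_jobs * files_per_job
    return max(projected - quota, 0)


# One Trinity job on an account that already holds 10,000 files.
print(files_over_quota(10_000, 1))
```

Any result above zero suggests contacting the HPRC helpdesk for a quota increase before starting the job.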

If your Trinity job Fails

  • Once you have determined why your Trinity job failed, you will need to delete the failed job files by selecting the 'Permanently remove it from disk' link. This is a two-step process.
    • First, delete the job file in the right panel by clicking the X, then select the 'deleted' link at the top of the right panel, which will show all deleted files.
    • Then click the 'Permanently remove it from disk' link for the failed job that you would like to permanently delete. This will also delete all temporary files.
  • If you do not 'Permanently remove it from disk', the temporary files from the failed job still count towards your quota, and your next job will likely fail because your file quota is reached again.

For more information on Delete vs Delete Permanently, see this link:

 https://wiki.galaxyproject.org/Learn/ManagingDatasets#Delete_vs_Delete_Permanently

RSEM

The Galaxy interface was downloaded from a git repo and was developed to work with RSEM 1.1.17.

Since we have RSEM 1.2.29 installed, we are updating the tool to work with RSEM 1.2.29. This means that some parameters you select other than the defaults may fail.

We have had success with the default RSEM Galaxy tool settings.

Currently, the option "Transcript and genome bam results files" in the RSEM calculate expression tool is not working. You should leave it at "No BAM results files" or "Transcript bam results file" for now.

Let us know if specific RSEM options you require cause the tool to fail and we will update the RSEM Galaxy tool.


BLAST

There are some genomes already available and we can enable other organism genomes upon request.

When requesting additional genomes, send the HPRC helpdesk a link to the genome at NCBI, Ensembl or UCSC or other model organism databases.

bwa, bowtie, bowtie2, hisat2

We can create genome indexes for common model organisms upon request.

When requesting additional genomes, send the HPRC helpdesk a link to the genome at NCBI, Ensembl or UCSC or other model organism databases.

HISAT2

If you will be using cufflinks on HISAT2 output, you will need to select the following option in the HISAT2 Galaxy tool:

  • 'Spliced Alignment Parameters'
    • 'Specify spliced alignment parameters'
      • 'Transcriptome assembly reporting'
        • 'Report alignments tailored specifically for Cufflinks'

Share your History

with user(s) on the same Galaxy instance

User with History to share:

Click 'History options' -> 'Share or Publish'

Select "Share with another user" and enter the Galaxy user's full email address.


User receiving shared history:

Click 'History options' -> 'Histories Shared with Me'

Check the box for the history you want to copy to your list of Histories and click Copy.

Then click the 'View all Histories' icon in the right panel to see the shared history in your list of histories.

Manage your HPRC Accounts

Different Galaxy tools are configured to require different amounts of SUs, depending on their maximum runtime and number of cores.

Here is how SUs are calculated:

Max runtime      Cores       Calculation                    Required SUs
  1 day            1        24 hrs * 1 day  *  1 core     =      24
  1 day           20        24 hrs * 1 day  * 20 cores    =     480
  3 days          20        24 hrs * 3 days * 20 cores    =    1440
  7 days          20        24 hrs * 7 days * 20 cores    =    3360

If a tool does not state how many SUs it requires, then it is most likely configured to use 1 core for 1 day (24 SUs) or 20 cores for 1 day (480 SUs). If you are unsure of how many SUs the tool requires, run the "My HPRC SU balance" tool before you submit a job and then run it again after the job starts running. The difference shows how many SUs were charged to your account.
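
The calculation in the table can be reproduced with a few lines of Python. This is an illustrative sketch: the function name is an assumption, and it only restates the formula shown above (24 hours per day, multiplied by the maximum runtime in days and the number of cores).

```python
HOURS_PER_DAY = 24


def required_sus(max_runtime_days, cores):
    """Return the SUs a job requires: hours per day * days * cores."""
    return HOURS_PER_DAY * max_runtime_days * cores


# The four configurations listed in the table above.
for days, cores in [(1, 1), (1, 20), (3, 20), (7, 20)]:
    print(f"{days} day(s), {cores} core(s): {required_sus(days, cores)} SUs")
```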

Using the following HPRC SU balance as an example, if you have two accounts and one is configured as the default (as indicated by Y in the default column), the SUs will only be taken from the account with Y in the Default column:

Starting balance example:

=====================================================================
                List of users's Project Accounts
---------------------------------------------------------------------
|  Account   | Default | Allocation |Used & Pending SUs|   Balance  |
---------------------------------------------------------------------
|000000000001|        Y|     5000.00|              0.00|     5000.00|
|000000000002|        N|     5000.00|              0.00|     5000.00|
---------------------------------------------------------------------

After submitting a job that requires 1440 SUs and the job starts running, your balance will look like this:

=====================================================================
                List of users's Project Accounts
---------------------------------------------------------------------
|  Account   | Default | Allocation |Used & Pending SUs|   Balance  |
---------------------------------------------------------------------
|000000000001|        Y|     5000.00|          -1440.00|     3560.00|
|000000000002|        N|     5000.00|              0.00|     5000.00|
---------------------------------------------------------------------

Now, if you submit two more jobs that each require 1440 SUs, your balance will look like this once the jobs start running:

=====================================================================
                List of users's Project Accounts
---------------------------------------------------------------------
|  Account   | Default | Allocation |Used & Pending SUs|   Balance  |
---------------------------------------------------------------------
|000000000001|        Y|     5000.00|          -4320.00|      680.00|
|000000000002|        N|     5000.00|              0.00|     5000.00|
---------------------------------------------------------------------

Now you should not submit any more jobs that require 1440 SUs since your default account only has 680 SUs.

If you need to submit more 1440 SU jobs, then you will need to change your default account to your other account that has a balance of 5000 SUs.

If you do not change your default account and submit a 1440 SU job with only 680 SUs in your default account, your job will remain in a queued state and will never run. It will be stopped if Galaxy is restarted; otherwise, you will have to stop the job yourself.
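
The arithmetic behind this example can be sketched in Python. The function names are hypothetical, and treating a balance exactly equal to the job cost as sufficient is an assumption, since the text only covers balances clearly above or below the cost.

```python
def balance_after(allocation, job_costs):
    """Return the remaining balance once all listed jobs have been charged."""
    return allocation - sum(job_costs)


def can_run(balance, job_cost):
    """Return True if the default account holds enough SUs for the job (assumed rule)."""
    return balance >= job_cost


# Three running 1440 SU jobs charged to a 5000 SU default account.
remaining = balance_after(5000, [1440, 1440, 1440])
print(remaining, can_run(remaining, 1440))
```

With 680 SUs left, a fourth 1440 SU job would stay queued, while the same job would fit in the second account with its balance of 5000 SUs.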


If your three 1440 SU jobs complete early, after 1 day for example, then you will be reimbursed for the unused 2 days (960 SUs per job) for each of the three jobs and your HPRC SU balance will look like this:


=====================================================================
                List of users's Project Accounts
---------------------------------------------------------------------
|  Account   | Default | Allocation |Used & Pending SUs|   Balance  |
---------------------------------------------------------------------
|000000000001|        Y|     5000.00|          -1440.00|     3560.00|
|000000000002|        N|     5000.00|              0.00|     5000.00|
---------------------------------------------------------------------

You can now use the default account if you want to run a job that requires 1440 SUs since the balance has increased to greater than 1440 SUs.
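
The reimbursement in this example can be checked with a short calculation. The sketch below assumes that a 1440 SU job is configured for 3 days on 20 cores (as in the table above) and that unused time is counted in whole days; the function name is illustrative.

```python
HOURS_PER_DAY = 24


def refund(reserved_days, actual_days, cores):
    """Return the SUs returned for the unused part of a job's reservation."""
    unused_days = max(reserved_days - actual_days, 0)
    return HOURS_PER_DAY * unused_days * cores


# Three 3-day, 20-core jobs that each finish after 1 day.
per_job = refund(reserved_days=3, actual_days=1, cores=20)
balance = 680 + 3 * per_job
print(per_job, balance)
```

This matches the listing above: 960 SUs are returned per job, which brings the default account back to 3560 SUs.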