SW:Galaxy

From TAMU HPRC
Revision as of 15:38, 7 April 2017 by Cryssb818 (talk | contribs)
Galaxy

Account Security

Do not share your Galaxy account with anyone. Galaxy uses the TAMU Central Authentication Service which is linked to your TAMU account.

Always log out of Galaxy when you are finished: select User -> Logout, click the Logout button on the next screen, and then close your browser.

FishCamp Galaxy Accounts

The FishCamp Galaxy instance is reserved for training purposes such as Galaxy workshops.

When requesting access to FishCamp for a training workshop, please include your Ada NetID in your request.

  • The FishCamp Galaxy is configured for training purposes.
    • Most jobs are limited to a maximum run time of 1 hour.
    • This allows jobs to be scheduled faster in the cluster queue.
    • Keep your input datasets small so that your jobs complete within one hour.

FishCamp Galaxy is not intended for research projects, and data stored on FishCamp Galaxy should be considered available only for a short time.

Request a Maroon Galaxy account only if you have data to analyze.

The tools available on FishCamp and Maroon are the same.

If you are off campus, you must install and run the TAMU VPN to connect to FishCamp Galaxy.

FishCamp Galaxy can be accessed with a web browser such as Firefox, Chrome or IE.

https://hprcgalaxy.tamu.edu/fishcamp/

Maroon Galaxy Accounts

Before you request an account on Maroon Galaxy, you must do the following:

  • Go to usegalaxy.org and get familiar with Galaxy. You can start with a free account and learn about Galaxy tools.
  • Request a Maroon Galaxy account only if you have data to analyze, otherwise use FishCamp Galaxy for training and practice.
  • If you decide that Galaxy is a good choice for your research project, then do the following:
    • Establish an account on Ada by sending a request. See the NewUser page for details on how to request an account.
    • After your Ada account is approved, request an account on Galaxy.
    • Send us information on what type of data you will be analyzing and which tools you expect to use for your research project.

If you are off campus, you must install and run the TAMU VPN to connect to Maroon Galaxy.

Maroon Galaxy can be accessed with a web browser such as Firefox, Chrome or IE.

https://hprcgalaxy.tamu.edu/maroon/

Uploading Files > 2GB via FTP to Maroon Galaxy

From a UNIX Computer (Mac or Linux)

On your local computer, go to the directory containing the file you want to upload (for example, sample_file.fastq).

In Galaxy, click the "Upload File" icon and then click "Choose FTP file" to see which port to use. Port 2121 is for Maroon Galaxy.

Connect to the SFTP server using your Maroon Galaxy credentials:

 sftp -P 2121 Your_NetID@ada7.tamu.edu

The sftp prompt looks like this:

 sftp>

When you see the sftp prompt, you can upload a file with the put command followed by the name of the file to upload.

 put sample_file.fastq

Then you can verify that the file was uploaded by using the ls command.

 ls

Then exit the sftp prompt. In the Galaxy upload tool, click the 'Choose FTP file' button, which will list the file you uploaded.

 exit
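The FTP route above is meant for files larger than 2GB. As a rough sketch, the check below tests a file's size before you decide how to upload it. The function name needs_ftp_upload is hypothetical (it is not part of Galaxy or Ada), and the threshold assumes that 2GB means 2 x 1024^3 bytes.

```shell
# Hypothetical helper: succeeds when a file is larger than a byte threshold.
# The default threshold is an assumption: 2GB taken as 2 * 1024^3 bytes.
needs_ftp_upload() {
    file=$1
    limit=${2:-2147483648}
    # wc -c reports the size in bytes and works on both Mac and Linux.
    size=$(wc -c < "$file" | tr -d ' ')
    [ "$size" -gt "$limit" ]
}

# Print the sftp command from this page only when the file is large enough.
# sample_file.fastq and Your_NetID are placeholders.
if [ -f sample_file.fastq ] && needs_ftp_upload sample_file.fastq; then
    echo "sftp -P 2121 Your_NetID@ada7.tamu.edu"
fi
```

The optional second argument lets you try the check with a smaller threshold on a test file.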

Using FileZilla

Use the following Host in FileZilla for all Galaxy instances:

 sftp://ada7.tamu.edu

Each Galaxy instance has its own port. The port number can be found by clicking the "Choose FTP file" button in the Galaxy upload tool menu.

Reveille Galaxy

   sftp -P 2123 Your_NetID@ada7.tamu.edu
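Since only the port differs between instances, the command can be built from the instance name. The sketch below covers only the two ports listed on this page (2121 for Maroon, 2123 for Reveille). The function name galaxy_sftp_command is hypothetical. For any other instance, look up the port with the "Choose FTP file" button as described above.

```shell
# Hypothetical helper: print the sftp command for a Galaxy instance,
# using only the ports documented on this page.
galaxy_sftp_command() {
    instance=$1
    netid=$2
    case "$instance" in
        maroon)   port=2121 ;;
        reveille) port=2123 ;;
        *)
            echo "Unknown instance '$instance': check 'Choose FTP file' in Galaxy for the port" >&2
            return 1
            ;;
    esac
    echo "sftp -P $port $netid@ada7.tamu.edu"
}

# Your_NetID is a placeholder.
galaxy_sftp_command reveille Your_NetID
```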

Requesting New Galaxy Tools

  • Go to usegalaxy.org or the Galaxy toolshed, find a tool you want installed in Maroon Galaxy, and send us the URL for the tool.
  • Send us a list of tools with the URLs that you would like installed so we can install them all at once.
  • Each time a tool is installed, the Galaxy server must be restarted.
    • We prefer to install many tools at once and then restart the Galaxy server a single time.
    • Restarting the Galaxy server rarely corrupts jobs, but it has happened, and we want to reduce the chances of interrupting submitted jobs.

When a Galaxy tool is available

  • If you have found an existing Galaxy tool, for example at usegalaxy.org, send us a request with the URL for the tool in the toolshed repository or on usegalaxy.org.
  • We do not install tools directly from the toolshed. Instead, we download them and install them manually so that we can verify that the Ada system modules are properly configured in the Galaxy tool.

When a tool has no Galaxy interface

  • Some bioinformatics tools do not yet have a Galaxy interface.
  • For these tools we need to create and test the XML configuration files ourselves.
    • This will take more time than adding a tool from the Galaxy toolshed.
    • Some tools are simple and can be done rather quickly, but others will require extensive development time.
    • In some cases we may not be able to configure a Galaxy tool for a software application because its installation is too complex to work in the Galaxy environment.

FAQ

1. Q: My job immediately fails with the message:

The cluster DRM system terminated this job

1. A: Check your file quota using the 'showquota' command at the Ada command line.

2. Q: My job fails after running for a long time with the message:

The cluster DRM system terminated this job

2. A: Check the email sent to you regarding your Galaxy job. If it contains the following line, your job requires more memory or time to run. Contact us to update the tool configuration as needed.

 TERM_MEMLIMIT: job killed after reaching LSF memory usage limit.
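If you save the text of that email to a file, you can search it for the TERM_MEMLIMIT line instead of reading through it. This is a minimal sketch: the function name job_hit_memlimit and the file name job_report.txt are placeholders, not anything provided by Galaxy or Ada.

```shell
# Hypothetical helper: succeeds when a saved job report mentions TERM_MEMLIMIT.
job_hit_memlimit() {
    grep -q 'TERM_MEMLIMIT' "$1"
}

# job_report.txt is a placeholder for wherever you saved the email text.
if [ -f job_report.txt ] && job_hit_memlimit job_report.txt; then
    echo "Job reached the LSF memory limit; contact us to update the tool configuration."
fi
```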

3. Q: My job has been in the queue status for a long time, what is going on?

3. A: First use the bjobs command at the Ada command line to see if your job is pending (PEND) in the Ada queue. If the job is PEND, the cluster is busy at the moment.

3. A: Check that you have enough SUs to run the Galaxy job by using the 'myproject -l' command at the Ada command line. If you are unsure how many SUs the tool requires, contact the HPRC helpdesk.

3. A: In some cases you will need to delete the queued job and restart it, but check the above conditions and contact the HPRC helpdesk first.
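The first check above can be sketched as a small filter over bjobs output. It assumes the default bjobs layout, where the first line is a header and the third column (STAT) holds the job state. The function name pending_job_ids is hypothetical, and the sample lines are made-up text for illustration, not real output from Ada.

```shell
# Hypothetical helper: read bjobs output on stdin and print the IDs of
# pending jobs. Assumes a header line and the job state in column 3 (STAT).
pending_job_ids() {
    awk 'NR > 1 && $3 == "PEND" { print $1 }'
}

# On Ada this would be run as:  bjobs | pending_job_ids
# The sample below uses made-up job IDs, queue and host names.
pending_job_ids <<'EOF'
JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
1234567 netid   PEND  queue_a    login1                  galaxy_1   Apr  7 15:38
1234568 netid   RUN   queue_a    login1      node1       galaxy_2   Apr  7 15:40
EOF
```

For the sample input only the first job is pending, so the filter prints 1234567.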

Tool specific notes

Trinity

Before you run a Trinity job

Contact the HPRC helpdesk (help@hprc.tamu.edu) so that we can increase your Ada file quota.

All new Ada accounts have a default max file quota of 50,000 files. Trinity will produce ~100,000+ temporary files during a de novo assembly. Very large data sets can create 500,000+ temporary files.

Please monitor your file quota during your job run using the following command and request an increase if your job is creating enough files that your file quota might be reached.

 showquota

Be aware that running multiple Trinity jobs at the same time will use up your file quota faster, so monitor your quota closely while they run.
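The showquota command remains the authoritative check. As a rough supplement, the sketch below counts the files under one directory and compares the count with a limit, which defaults to the 50,000 files quoted above. The function names are hypothetical. The quota applies to your whole account, so a count for a single directory can only underestimate your usage.

```shell
# Hypothetical helper: count regular files under a directory (default: current).
count_files() {
    find "${1:-.}" -type f | wc -l | tr -d ' '
}

# Hypothetical helper: report the count against a limit (default 50000, the
# default Ada file quota quoted on this page). Returns non-zero at the limit.
file_count_warning() {
    dir=$1
    limit=${2:-50000}
    used=$(count_files "$dir")
    if [ "$used" -ge "$limit" ]; then
        echo "WARNING: $used files in $dir (limit $limit)"
        return 1
    fi
    echo "OK: $used files in $dir (limit $limit)"
}

file_count_warning . || echo "Consider requesting a file quota increase."
```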

If your Trinity job fails

  • After you have determined why your Trinity job failed, you will need to permanently delete the failed job's files. This is a two-step process:
    • First delete the job file in the right panel by clicking the X, then select the 'deleted' link at the top of the right panel, which will show all deleted files.
    • Then click the 'Permanently remove it from disk' link for the failed job that you would like to permanently delete. This also deletes all of its temporary files.
  • If you do not 'Permanently remove it from disk', the temporary files from the failed job still count towards your quota, and your next job will likely fail because your file quota is reached again.

For more information on Delete vs Delete Permanently, see this link:

 https://wiki.galaxyproject.org/Learn/ManagingDatasets#Delete_vs_Delete_Permanently

RSEM

The Galaxy interface was downloaded from a git repo and was developed to work with RSEM 1.1.17.

Because we have RSEM 1.2.29 installed, we are updating the tool to work with that version. In the meantime, jobs that use parameters other than the defaults may fail.

We have had success with the default RSEM Galaxy tool settings.

Currently the option "Transcript and genome bam results files" in the RSEM calculate expression tool is not working. Leave it at "No BAM results files" or "Transcript bam results file" for now.

Let us know if specific RSEM options you require cause the tool to fail and we will update the RSEM Galaxy tool.


BLAST

There are some genomes already available and we can enable other organism genomes upon request.

When requesting additional genomes, send the HPRC helpdesk a link to the genome at NCBI, Ensembl or UCSC or other model organism databases.

bwa, bowtie, bowtie2, hisat2

We can create genome indexes for common model organisms upon request.

When requesting additional genomes, send the HPRC helpdesk a link to the genome at NCBI, Ensembl or UCSC or other model organism databases.