HPRC:Systems

From TAMU HPRC
=== FASTER: A Dell x86 HPC Cluster ===
----
[[Image:FASTER.jpeg|right|400px|caption]]
{| class="wikitable" style="text-align: center;"
| System Name:
| FASTER
|-
| Host Name:
| faster.tamu.edu
|-
| Operating System:
| Linux (CentOS 8)
|-
| Total Compute Cores/Nodes:
| 11,520 cores<br>180 nodes
|-
| Compute Nodes:
| 180 64-core compute nodes, each with 64GB RAM
|-
| Composable GPUs:
| 200 T4 16GB GPUs<br>40 A100 40GB GPUs<br>10 A10 24GB GPUs<br>4 A30 24GB GPUs<br>8 A40 48GB GPUs
|-
| Interconnect:
| Mellanox HDR100 InfiniBand (MPI and storage)<br>Liqid PCIe Gen4 (GPU composability)
|-
| Peak Performance:
| 1.2 PFLOPS
|-
| Global Disk:
| 5PB (usable) via DDN Lustre appliances
|-
| File System:
| Lustre
|-
| Batch Facility:
| [http://slurm.schedmd.com/ Slurm by SchedMD]
|-
| Location:
| West Campus Data Center
|-
| Production Date:
| 2021
|}

FASTER is a 184-node Intel cluster from Dell with a Mellanox HDR100 InfiniBand interconnect. A100, A10, A30, A40, and T4 GPUs are distributed and composable via Liqid PCIe fabrics. All login and compute nodes are based on the [https://ark.intel.com/content/www/us/en/ark/products/212284/intel-xeon-platinum-8352y-processor-48m-cache-2-20-ghz.html Intel Xeon 8352Y Ice Lake processor]. See the [[FASTER:Intro | FASTER Intro Page]] for more information.

For a quick introduction to FASTER and Slurm, see the [[FASTER:QuickStart | FASTER Quick Start Guide]].

For details on using this system, see the [[FASTER | User Guide for FASTER]].
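Jobs on FASTER are submitted through Slurm. The script below is a minimal, hypothetical sketch: the resource values are illustrative assumptions, not FASTER's actual limits or defaults, so consult the FASTER Quick Start Guide before using anything like it.

```shell
#!/bin/bash
## Minimal Slurm batch script sketch (illustrative values only; check the
## FASTER Quick Start Guide for the cluster's real partitions and limits).
#SBATCH --job-name=hello          # name shown by squeue
#SBATCH --ntasks=1                # one task
#SBATCH --cpus-per-task=4         # four cores for that task
#SBATCH --mem=8G                  # memory for the whole job
#SBATCH --time=00:10:00           # wall-clock limit, HH:MM:SS
#SBATCH --output=hello.%j.out     # stdout/stderr file; %j = job ID

# Slurm sets SLURM_CPUS_PER_TASK inside a job; default to 1 elsewhere.
echo "Running on $(hostname) with ${SLURM_CPUS_PER_TASK:-1} cores"
```

Such a script would be submitted with <code>sbatch hello.slurm</code> and monitored with <code>squeue -u $USER</code>.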
  
 
  
=== ACES: An innovative composable computational testbed ===
----
{| class="wikitable" style="text-align: center;"
! Component:
! Quantity
! Description
|-
| Graphcore IPU
| 16
| style="text-align: left" | 16 Colossus GC200 IPUs and dual AMD Rome CPU server on a 100 GbE RoCE fabric
|-
| Intel FPGA PAC D5005
| 2
| style="text-align: left" | FPGA SoC with Intel Stratix 10 SX FPGAs, 64-bit quad-core Arm Cortex-A53 processors, and 32GB DDR4
|-
| Intel Optane SSDs
| 8
| style="text-align: left" | 3 TB of Intel Optane SSDs addressable as memory using MemVerge Memory Machine
|}

Available through the FASTER system. See the [[ACES | User Guide for ACES]] for more information.

=== Grace: A Dell x86 HPC Cluster ===
----
[[Image:Grace-racks.jpg|right|400px|caption]]
{| class="wikitable" style="text-align: center;"
| System Name:
| Grace
|-
| Host Name:
| grace.hprc.tamu.edu
|-
| Operating System:
| Linux (CentOS 7)
|-
| Total Compute Cores/Nodes:
| 44,656 cores<br>925 nodes
|-
| Compute Nodes:
| 800 48-core compute nodes, each with 384GB RAM <br> 100 48-core GPU nodes, each with two A100 40GB GPUs and 384GB RAM <br> 9 48-core GPU nodes, each with two RTX 6000 24GB GPUs and 384GB RAM <br> 8 48-core GPU nodes, each with four T4 16GB GPUs <br> 8 80-core large memory nodes, each with 3TB RAM
|-
| Interconnect:
| Mellanox HDR100 InfiniBand
|-
| Peak Performance:
| 6.2 PFLOPS
|-
| Global Disk:
| 5PB (usable) via DDN Lustre appliances for general use <br> 1.4PB (usable) via Lenovo DSS GPFS appliance (purchased by and dedicated for Dr. Junjie Zhang's CryoEM Lab) <br> 1.9PB (usable) via Lenovo DSS GPFS appliance (purchased by and dedicated for Dr. Ping Chang's iHESP Lab)
|-
| File System:
| Lustre and GPFS
|-
| Batch Facility:
| [http://slurm.schedmd.com/ Slurm by SchedMD]
|-
| Location:
| West Campus Data Center
|-
| Production Date:
| Spring 2021
|}

Grace is an Intel x86-64 Linux cluster with 925 compute nodes (44,656 total cores) and 5 login nodes. There are 800 compute nodes with 384 GB of memory and 117 GPU nodes with 384 GB of memory. Among the 117 GPU nodes, 100 have two A100 40GB GPU cards, 9 have two RTX 6000 24GB GPU cards, and 8 have four T4 16GB GPU cards. Each of these compute and GPU nodes is a dual-socket server with two Intel Xeon 6248R 3.0GHz 24-core processors, commonly known as Cascade Lake. There are also 8 large-memory compute nodes, each with 3 TB of memory and four Intel Xeon 6248 2.5GHz 20-core processors. See the [[ Grace:Intro | Grace Intro Page]] for more information.

For a quick introduction to Grace and Slurm, see the [[Grace:QuickStart | Grace Quick Start Guide]].

For details on using this system, see the [[Grace | User Guide for Grace]].
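On Slurm clusters, GPUs such as those listed for Grace are typically requested as generic resources (GRES). The directives below are a hedged sketch only: the partition name and GRES type string are placeholder assumptions for illustration, not Grace's verified configuration, so check the Grace Quick Start Guide for the actual values.

```shell
## Hypothetical GPU job directives -- the partition and gres names here
## are placeholders, not Grace's confirmed configuration.
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=32G
#SBATCH --time=01:00:00
#SBATCH --partition=gpu           # assumed partition name
#SBATCH --gres=gpu:a100:1         # request one A100 (gres type assumed)

nvidia-smi                        # verify the allocated GPU is visible
```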
  
Get details on using this system, see the [[Eos | User Guide for Eos]]. (For detailed technical information, [http://sc.tamu.edu/systems/eos/hardware.php click here].)
 
  
=== Terra: A Lenovo x86 HPC Cluster ===
----
[[Image:Terra-racks.jpg|right|400px]]
{| class="wikitable" style="text-align: center;"
| System Name:
| Terra
|-
| Host Name:
| terra.tamu.edu
|-
| Operating System:
| Linux (CentOS 7)
|-
| Total Compute Cores/Nodes:
| 8,512 cores<br>304 nodes
|-
| Compute Nodes:
| 256 compute nodes, each with 64GB RAM <br> 48 GPU nodes, each with one dual-GPU Tesla K80 accelerator and 128GB RAM
|-
| Interconnect:
| Intel Omni-Path 100 Series switches
|-
| Peak Performance:
| 326 TFLOPS
|-
| Global Disk:
| 2PB (raw) via IBM/Lenovo's GSS26 appliances for general use <br> 1PB (raw) via Lenovo's GSS24 appliance (purchased by and dedicated for GEOSAT)
|-
| File System:
| General Parallel File System (GPFS)
|-
| Batch Facility:
| [http://slurm.schedmd.com/ Slurm by SchedMD]
|-
| Location:
| Teague Data Center
|-
| Production Date:
| Spring 2017
|}

Terra is an 8,512-core Lenovo commodity cluster. Each compute node has two [https://ark.intel.com/products/91754/Intel-Xeon-Processor-E5-2680-v4-35M-Cache-2_40-GHz Intel 64-bit 14-core Broadwell processors]. In addition to the 304 compute nodes, there are 3 login nodes (one with a GPU), each with 128 GB of memory. See the [[ Terra:Intro | Terra Intro Page]] for more information.

For a quick introduction to Terra and Slurm, see the [[Terra:QuickStart | Terra Quick Start Guide]].

For details on using this system, see the [[Terra | User Guide for Terra]].
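Terra's 326 TFLOPS figure is consistent with a back-of-the-envelope peak calculation from the processors above, assuming the E5-2680 v4's 2.4 GHz base clock and Broadwell's 16 double-precision FLOPs per core per cycle (two 256-bit FMA units) — both assumptions, since the page does not state them:

```shell
# Theoretical peak = cores x clock (GHz) x FLOPs/cycle / 1000 -> TFLOPS
# 8,512 Broadwell cores, 2.4 GHz, assumed 16 DP FLOPs/cycle
awk 'BEGIN { printf "%.1f TFLOPS\n", 8512 * 2.4 * 16 / 1000 }'
# prints: 326.9 TFLOPS
```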
=== Ada: An IBM/Lenovo x86 HPC Cluster ===
----
[[Image:Ada_racks.jpg|right|400px|caption]]
{| class="wikitable" style="text-align: center;"
| System Name:
| Ada
|-
| Host Name:
| ada.tamu.edu
|-
| Operating System:
| Linux (CentOS 6)
|-
| Total Compute Cores/Nodes:
| 17,340 cores<br>852 nodes
|-
| Compute Nodes:
| 792 compute nodes, each with 64GB RAM <br> 30 GPU nodes, each with dual GPUs and 64GB or 256GB RAM <br> 9 Phi nodes, each with dual Phi accelerators and 64GB RAM <br> 6 large memory compute nodes, each with 256GB RAM <br> 11 xlarge memory compute nodes, each with 1TB RAM <br> 4 xlarge memory compute nodes, each with 2TB RAM
|-
| Interconnect:
| FDR-10 InfiniBand based on the <br>Mellanox SX6536 (core) and SX6036 (leaf) switches
|-
| Peak Performance:
| ~337 TFLOPS
|-
| Global Disk:
| 4PB (raw) via IBM/Lenovo's GSS26 appliance
|-
| File System:
| General Parallel File System (GPFS)
|-
| Batch Facility:
| Platform LSF
|-
| Location:
| Teague Data Center
|-
| Production Date:
| September 2014
|}

Ada is a 17,340-core IBM/Lenovo commodity cluster. Most of the compute nodes have two [https://ark.intel.com/products/75275/Intel-Xeon-Processor-E5-2670-v2-25M-Cache-2_50-GHz Intel 64-bit 10-core Ivy Bridge processors]. In addition to the 852 compute nodes, there are 8 login nodes, each with 256 GB of memory; several are equipped with GPUs or Phi coprocessors. See the [[ Ada:Intro | Ada Intro Page]] for more information.

For details on using this system, see the [[Ada | User Guide for Ada]].
 
<!--
=== Crick: An IBM POWER7+ BigData Analytics Cluster ===
----
[[Image:Crick5.medium.jpg|right|400px|x300px|upright|Crick/Curie (under construction)]]
{| class="wikitable" style="text-align: center;"
| System Name:
| Crick
|-
| Host Name:
| crick.tamu.edu
|-
| Operating System:
| Linux (Red Hat Enterprise Linux 6)
|-
| Nodes/Cores per node:
| 23/16-core @ 4.2GHz POWER7+
|-
| Memory Sizes:
| 23 nodes with 256GB RAM (DDR3 1066MHz)
|-
| Interconnect Type:
| 10 Gbps Ethernet
|-
| Peak Performance:
| ~13 TFLOPS
|-
| Total Disk:
| ~377TB
|-
| File System:
| GPFS's File Placement Optimizer (FPO) - IBM's HDFS alternative
|-
| Location:
| Wehner Data Center
|-
| Production Date:
| August 2015
|}

Crick is a 368-core IBM POWER7+ BigData cluster. Each compute node has two of IBM's 64-bit 8-core POWER7+ processors. Included in the 23 nodes are 1 BigSQL node with 256GB of memory and 14TB (raw) of storage, and 22 data nodes, each with 14TB (raw) of storage for GPFS-FPO and local caching. Crick is primarily used for big data analytics. In addition to these nodes are 2 login nodes with 128GB of memory per node.

Get details on using this system, see the [[Crick | User Guide for Crick]].

=== Curie: An IBM POWER7+ HPC Cluster ===
----
[[Image:Curie1.medium.jpg|right|400px|x300px|upright|Crick/Curie (under construction)]]
{| class="wikitable" style="text-align: center;"
| System Name:
| Curie
|-
| Host Name:
| curie.tamu.edu
|-
| Operating System:
| Linux (Red Hat Enterprise Linux 6)
|-
| Nodes/Cores per node:
| 48/16-core @ 4.2GHz POWER7+
|-
| Memory Size:
| 48 nodes with 256GB RAM (DDR3 1066MHz)
|-
| Interconnect Type:
| 10Gbps Ethernet
|-
| Peak Performance:
| ~26 TFLOPS
|-
| Total Disk:
| 4PB (raw) via IBM/Lenovo's GSS26 appliance (shared with Ada)
|-
| File System:
| General Parallel File System (GPFS) (shared with Ada)
|-
| Batch Facility:
| Platform LSF
|-
| Location:
| Wehner Data Center
|-
| Production Date:
| May 2015
|}

'''Curie has been retired.''' It was a 768-core IBM POWER7+ cluster. Each compute node had two IBM 64-bit 8-core POWER7+ processors. In addition to the 48 compute nodes were 2 login nodes with 256GB of memory per node. Curie's file system and batch scheduler were shared with the Ada cluster.
-->
=== Lonestar6: A Dell x86 HPC Cluster ===
----
[[Image:Lonestar6.jpg|right|400px|caption]]
 
{| class="wikitable" style="text-align: center;"
| System Name:
| Lonestar 6
|-
| Host Names:
| ls6.tacc.utexas.edu
|-
| Operating System:
| Linux (Rocky 8.4)
|-
| Number of Nodes:
| 560 compute nodes, each with 128 cores<br>32 GPU nodes (same configuration as compute nodes), each with 3 NVIDIA 40GB A100 GPUs
|-
| Memory Sizes:
| 256 GB
|-
| Interconnect Type:
| Mellanox HDR technology with full HDR (200 Gb/s) connectivity
|-
| Peak Performance:
| 2.8 PFLOPS
|-
| Total Disk:
| 15 PB
|-
| File System:
| Lustre
|-
| Batch Facility:
| Slurm Workload Manager
|-
| Location:
| [http://www.tacc.utexas.edu/ TACC]
|-
| Production Date:
| January 2022
|}

Lonestar6 is the latest in a series of [https://www.tacc.utexas.edu/systems/lonestar Lonestar clusters hosted at TACC]. Jointly funded by the University of Texas System, Texas A&M University, the University of North Texas, and Texas Tech University, it provides additional resources to TAMU researchers.

* Sources:
** http://www.hpcwire.com/2015/07/20/cray-comes-back-to-tacc/
** https://portal.tacc.utexas.edu/user-guides/lonestar6

'''Note:''' Effective September 27, 2016, all users must authenticate using Multi-Factor Authentication (MFA) to access TACC resources. More information is available on our [[TACC:MFA | TACC MFA page]].

[[ Category:HPRC ]]

''Latest revision as of 11:00, 9 November 2022.''