Cluster

The Life Sciences cluster consists of the following hardware:

14 nodes: Dual Intel® Xeon™ CPU 3.06GHz, 2GB RAM, RHEL3 U8 x86
36 nodes: Dual Intel® Xeon™ CPU 2.80GHz, 4GB RAM, RHEL3 U8 x86
14 nodes: Dual Intel® Xeon™ CPU 2.80GHz, 2GB RAM, RHEL3 U8 x86
22 nodes: Dual Intel® Xeon™ CPU 2.80GHz, 2.5GB RAM, RHEL3 U8 x86_64
12 nodes: Quad Intel® Xeon™ CPU 2.80GHz, 8GB RAM, RHEL4 U3 x86_64
1 node: Quad Genuine Intel® CPU 3.00GHz, 8GB RAM, RHEL4 U4 x86_64
1 node: Dual Core AMD Opteron™ Processor 275, 4GB RAM, x86_64
1 node: Intel® Xeon™ CPU 3.20GHz, 4GB RAM, RHEL4 U4, x86_64
1 node: Dual Core Intel® Xeon™ CPU 2.40GHz, 1GB RAM, RHEL4 U4, x86
Total: 229 cpus
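
The CPU total can be cross-checked against the node list with a short script. This is only a sketch: it assumes "Dual" and "Dual Core" each count as 2 CPUs, "Quad" as 4, and an unqualified CPU as 1. The names NODE_GROUPS, total_nodes, and total_cpus are made up for this illustration.

```python
# (node_count, cpus_per_node) for each line of the hardware list, in order.
# Assumption: "Dual" / "Dual Core" = 2 CPUs, "Quad" = 4, unqualified = 1.
NODE_GROUPS = [
    (14, 2),  # Dual Xeon 3.06GHz, 2GB RAM
    (36, 2),  # Dual Xeon 2.80GHz, 4GB RAM
    (14, 2),  # Dual Xeon 2.80GHz, 2GB RAM
    (22, 2),  # Dual Xeon 2.80GHz, 2.5GB RAM
    (12, 4),  # Quad Xeon 2.80GHz, 8GB RAM
    (1, 4),   # Quad Genuine Intel 3.00GHz, 8GB RAM
    (1, 2),   # Dual Core AMD Opteron 275, 4GB RAM
    (1, 1),   # Xeon 3.20GHz, 4GB RAM
    (1, 2),   # Dual Core Xeon 2.40GHz, 1GB RAM
]


def total_nodes(groups):
    """Sum the node counts over all hardware groups."""
    return sum(count for count, _ in groups)


def total_cpus(groups):
    """Sum node_count * cpus_per_node over all hardware groups."""
    return sum(count * cpus for count, cpus in groups)


print(total_nodes(NODE_GROUPS), "nodes,", total_cpus(NODE_GROUPS), "cpus")
# prints: 102 nodes, 229 cpus
```

Under these assumptions the list adds up to the stated 229 cpus across 102 compute nodes (the head node and admin host are not counted).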

Cluster head node: portal.cgr.harvard.edu
Dual Intel® Xeon™ CPU 3.06GHz, 2GB RAM, RHEL3 U8 x86

Cluster Network: 1U nodes with 1Gb NICs, an IBM Blades chassis with 4 trunked ports, and DellBlades with 1Gb pass-through, all connected to a Foundry Networks FastIron 1500 switch. 10 of the DellBlades have an InfiniBand network in addition to the 1Gb pass-through network.

Admin host: the cluster is served by an IBM Blade server running DHCP, LSF licenses, and Mathematica licenses. It has Dual Intel® Xeon™ CPU 3.00GHz, 4GB RAM, RHEL4 U4 x86

Disk Storage: All nodes mount a central EMC SAN (through NSXs) on the cluster network via NFS. Each node has local scratch space, which varies from 13GB to 67GB.

Tape Backup Robot: ADIC Scalar i2000 - 600 tape unit.

Backup Infrastructure: EMC Networker v7.2.2 on Linux. The NDMP clients are 6 EMC NSXs, all zoned into one Cisco Fiber switch. Sustained tape backup speeds are 110-140+ MB/s from one NSX to one LTO-3 tape drive.

Queuing System: LSF v6.0 from Platform Computing (www.platform.com)

Current queues defined:

QUEUE_NAME      PRIO STATUS          MAX JL/U JL/P JL/H NJOBS  PEND   RUN  SUSP
interact         86  Open:Active       -    1    -    -     0     0     0     0
dellblades       85  Open:Active       -    -    -    -   460   416    44     0
delllong         84  Open:Active       -    -    -    -     4     0     4     0
rsoft            83  Open:Active       -    2    -    -     0     0     0     0
hunter           82  Open:Active       -    -    -    -     0     0     0     0
giribet          81  Open:Active       -    -    -    -    80     0    80     0
flybase          70  Open:Active       -    -    -    -     0     0     0     0
blades           65  Open:Active       -    -    -    -     0     0     0     0
CGRshort         60  Open:Active       -   50    -    -     0     0     0     0
CGRnormal        50  Open:Active      40    -    -    -   460   420    40     0
CGRlong          40  Open:Active       -   60    -    -     0     0     0     0
short            30  Open:Active       -   20    -    -     0     0     0     0
normal           20  Open:Active       -    -    -    -  5285  5161   123     1
long             15  Open:Active       -   10    -    -    24     0    24     0
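
A listing in this format can be parsed by splitting each row on whitespace. The sketch below assumes the layout shown above, with no spaces inside any field; parse_bqueues is a hypothetical helper, not part of LSF. It checks that NJOBS equals PEND + RUN + SUSP for every queue and reports the queue with the most pending jobs.

```python
# Snapshot of the queue listing above, copied verbatim.
BQUEUES_OUTPUT = """\
QUEUE_NAME      PRIO STATUS          MAX JL/U JL/P JL/H NJOBS  PEND   RUN  SUSP
interact         86  Open:Active       -    1    -    -     0     0     0     0
dellblades       85  Open:Active       -    -    -    -   460   416    44     0
delllong         84  Open:Active       -    -    -    -     4     0     4     0
rsoft            83  Open:Active       -    2    -    -     0     0     0     0
hunter           82  Open:Active       -    -    -    -     0     0     0     0
giribet          81  Open:Active       -    -    -    -    80     0    80     0
flybase          70  Open:Active       -    -    -    -     0     0     0     0
blades           65  Open:Active       -    -    -    -     0     0     0     0
CGRshort         60  Open:Active       -   50    -    -     0     0     0     0
CGRnormal        50  Open:Active      40    -    -    -   460   420    40     0
CGRlong          40  Open:Active       -   60    -    -     0     0     0     0
short            30  Open:Active       -   20    -    -     0     0     0     0
normal           20  Open:Active       -    -    -    -  5285  5161   123     1
long             15  Open:Active       -   10    -    -    24     0    24     0
"""

# Columns that always hold integers; MAX and the JL/* limits may be "-".
INT_COLUMNS = ("PRIO", "NJOBS", "PEND", "RUN", "SUSP")


def parse_bqueues(text):
    """Turn the whitespace-separated listing into a list of dicts."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        row = dict(zip(header, line.split()))
        for key in INT_COLUMNS:
            row[key] = int(row[key])
        rows.append(row)
    return rows


queues = parse_bqueues(BQUEUES_OUTPUT)

# Consistency check: every job is pending, running, or suspended.
for q in queues:
    assert q["NJOBS"] == q["PEND"] + q["RUN"] + q["SUSP"], q["QUEUE_NAME"]

busiest = max(queues, key=lambda q: q["PEND"])
print(busiest["QUEUE_NAME"], busiest["PEND"])
# prints: normal 5161
```

In this snapshot the normal queue holds most of the backlog, with 5161 of its 5285 jobs pending.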
 

interact - for interactive jobs with a MAX runtime of 12hrs, 1 job per user.
dellblades - for jobs to run on Dellblades1-10. The primary group that uses these is the Karplus group.
rsoft - for jobs from the Lieber lab to run rsoft-based jobs. These run only on host cfa15, with a MAX runtime of 12hrs.
hunter - for jobs from the Hunter lab group to run on hunter1. They have priority access to this machine.
giribet - for jobs from the Giribet lab to run on giribet1-2. They have exclusive access to these machines.
flybase - for jobs from the Flybase group to run on hosts fb1-22. They have priority access to fb1-22.
CGRshort - for jobs from CGR members that have a MAX runtime of 1hr.
CGRnormal - for jobs from CGR members that have a MAX runtime of 24hrs. Maximum of 40 jobs total at any one time.
CGRlong - for jobs from CGR members, no time limit. Maximum of 60 jobs per user at any one time.
short - for all external or CGR members that run jobs with a MAX runtime of 1hr. Maximum of 20 jobs per user.
normal - for all external or CGR members that run jobs with a MAX runtime of 24hrs.
long - for all external or CGR members, no time limit. Maximum of 10 jobs per user.
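
The runtime limits above suggest a simple rule for choosing among the general-purpose queues. The sketch below is an illustration only, not a site policy: pick_queue is a hypothetical helper, it ignores the per-user job limits, and it leaves out the lab-specific and interactive queues. It also assumes a CGR member would prefer the matching CGR queue.

```python
def pick_queue(runtime_hours, cgr_member=False):
    """Suggest a queue name from the expected runtime in hours.

    Assumption: up to 1hr maps to the short tier, up to 24hrs to the
    normal tier, and anything longer to the long tier, following the
    MAX runtimes in the queue descriptions.
    """
    if runtime_hours <= 0:
        raise ValueError("runtime_hours must be positive")
    if runtime_hours <= 1:
        tier = "short"
    elif runtime_hours <= 24:
        tier = "normal"
    else:
        tier = "long"
    # CGR members have their own queues with the same three tiers.
    return "CGR" + tier if cgr_member else tier


print(pick_queue(0.5))                   # short
print(pick_queue(12, cgr_member=True))   # CGRnormal
print(pick_queue(48))                    # long
```

The suggested name would then be given to LSF at submission time through the -q option of bsub.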