Cluster
The Life Sciences cluster consists of the following hardware:
14 nodes: Dual Intel® Xeon™ CPU 3.06GHz, 2GB RAM, RHEL3 U8 x86
36 nodes: Dual Intel® Xeon™ CPU 2.80GHz, 4GB RAM, RHEL3 U8 x86
14 nodes: Dual Intel® Xeon™ CPU 2.80GHz, 2GB RAM, RHEL3 U8 x86
22 nodes: Dual Intel® Xeon™ CPU 2.80GHz, 2.5GB RAM, RHEL3 U8 x86_64
12 nodes: Quad Intel® Xeon™ CPU 2.80GHz, 8GB RAM, RHEL4 U3 x86_64
1 node: Quad Genuine Intel® CPU 3.00GHz, 8GB RAM, RHEL4 U4 x86_64
1 node: Dual Core AMD Opteron™ Processor 275, 4GB RAM, x86_64
1 node: Intel® Xeon™ CPU 3.20GHz, 4GB RAM, RHEL4 U4, x86_64
1 node: Dual Core Intel® Xeon™ CPU 2.40GHz, 1GB RAM, RHEL4 U4, x86
Total: 229 CPUs
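The total can be cross-checked against the node list above. The following is a minimal sketch, not part of the cluster tooling: it assumes that "Dual" and "Dual Core" each count as 2 CPUs, "Quad" as 4, and the unqualified Xeon node as 1. The names `NODE_GROUPS` and `total_cpus` are illustrative only.

```python
# Each entry is (number of nodes, CPUs per node).
# The CPUs-per-node values are assumptions read off the "Dual"/"Quad" labels.
NODE_GROUPS = [
    (14, 2),  # Dual Xeon 3.06GHz, 2GB
    (36, 2),  # Dual Xeon 2.80GHz, 4GB
    (14, 2),  # Dual Xeon 2.80GHz, 2GB
    (22, 2),  # Dual Xeon 2.80GHz, 2.5GB
    (12, 4),  # Quad Xeon 2.80GHz, 8GB
    (1, 4),   # Quad Genuine Intel 3.00GHz
    (1, 2),   # Dual Core AMD Opteron 275
    (1, 1),   # Xeon 3.20GHz (no Dual/Quad label, assumed single)
    (1, 2),   # Dual Core Xeon 2.40GHz
]


def total_cpus(groups):
    """Sum nodes * CPUs-per-node over all hardware groups."""
    return sum(nodes * cpus for nodes, cpus in groups)


print(total_cpus(NODE_GROUPS))  # 229
```

Under these assumptions the sum matches the stated total; the head node and admin host are not included in the count.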
Cluster head node: portal.cgr.harvard.edu
Dual Intel® Xeon™ CPU 3.06GHz, 2GB RAM, RHEL3 U8 x86
Cluster Network: 1U nodes with 1GB NICs, an IBM Blades chassis with 4 trunked ports, and DellBlades with 1GB pass-through, all connected to a Foundry Networks FastIron 1500 switch. 10 DellBlades have an Infiniband network in addition to the 1GB pass-through network.
Admin host: an IBM Blade server serves the cluster, running DHCP, LSF licenses, and Mathematica licenses. It contains Dual Intel® Xeon™ CPU 3.00GHz, 4GB RAM, RHEL4 U4 x86
Disk Storage: All nodes mount a central EMC SAN (through NSXs) on the cluster network via NFS. Each node has local scratch space, which varies from 13GB to 67GB.
Tape Backup Robot: ADIC Scalar i2000 - 600 tape unit.
Backup Infrastructure: EMC Networker v7.2.2 on Linux. The NDMP clients are 6 EMC NSXs, all zoned into one Cisco Fiber switch. Sustained tape backup speeds of 110-140+ MB/s from one NSX to one LTO-3 tape drive.
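To put the sustained rates in perspective, the sketch below converts a throughput into the time needed to stream a given volume to one tape drive. It is a rough estimate only: it assumes a single uninterrupted stream, 1 GB = 1000 MB, and no tape-change or catalog overhead. The 1000 GB volume and the function name `backup_hours` are hypothetical.

```python
def backup_hours(data_gb, rate_mb_per_s):
    """Hours needed to stream data_gb gigabytes at rate_mb_per_s (1 GB = 1000 MB)."""
    seconds = data_gb * 1000 / rate_mb_per_s
    return seconds / 3600


# Hypothetical 1000 GB volume at the low and high ends of the quoted range.
for rate in (110, 140):
    print(f"{rate} MB/s: {backup_hours(1000, rate):.1f} h per 1000 GB")
```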
Queuing System: LSF v6.0 from Platform Computing (www.platform.com)
Current queues defined:
QUEUE_NAME  PRIO  STATUS       MAX  JL/U  JL/P  JL/H  NJOBS  PEND  RUN  SUSP
interact      86  Open:Active    -     1     -     -      0     0    0     0
dellblades    85  Open:Active    -     -     -     -    460   416   44     0
delllong      84  Open:Active    -     -     -     -      4     0    4     0
rsoft         83  Open:Active    -     2     -     -      0     0    0     0
hunter        82  Open:Active    -     -     -     -      0     0    0     0
giribet       81  Open:Active    -     -     -     -     80     0   80     0
flybase       70  Open:Active    -     -     -     -      0     0    0     0
blades        65  Open:Active    -     -     -     -      0     0    0     0
CGRshort      60  Open:Active    -    50     -     -      0     0    0     0
CGRnormal     50  Open:Active   40     -     -     -    460   420   40     0
CGRlong       40  Open:Active    -    60     -     -      0     0    0     0
short         30  Open:Active    -    20     -     -      0     0    0     0
normal        20  Open:Active    -     -     -     -   5285  5161  123     1
long          15  Open:Active    -    10     -     -     24     0   24     0
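A listing like the one above is whitespace-delimited, so it is straightforward to summarize. The sketch below parses the snapshot shown here (pasted in as a string) and totals the job columns. The helper names `parse_queues` and `column_total` are illustrative, and the code assumes that no field contains embedded spaces, which holds for this snapshot.

```python
# Snapshot of the queue listing, copied from the table above.
BQUEUES_OUTPUT = """\
QUEUE_NAME PRIO STATUS MAX JL/U JL/P JL/H NJOBS PEND RUN SUSP
interact 86 Open:Active - 1 - - 0 0 0 0
dellblades 85 Open:Active - - - - 460 416 44 0
delllong 84 Open:Active - - - - 4 0 4 0
rsoft 83 Open:Active - 2 - - 0 0 0 0
hunter 82 Open:Active - - - - 0 0 0 0
giribet 81 Open:Active - - - - 80 0 80 0
flybase 70 Open:Active - - - - 0 0 0 0
blades 65 Open:Active - - - - 0 0 0 0
CGRshort 60 Open:Active - 50 - - 0 0 0 0
CGRnormal 50 Open:Active 40 - - - 460 420 40 0
CGRlong 40 Open:Active - 60 - - 0 0 0 0
short 30 Open:Active - 20 - - 0 0 0 0
normal 20 Open:Active - - - - 5285 5161 123 1
long 15 Open:Active - 10 - - 24 0 24 0
"""


def parse_queues(text):
    """Turn the listing into a list of dicts keyed by the header names."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def column_total(rows, column):
    """Sum a numeric column (NJOBS, PEND, RUN, SUSP) over all queues."""
    return sum(int(row[column]) for row in rows)


rows = parse_queues(BQUEUES_OUTPUT)
for column in ("NJOBS", "PEND", "RUN", "SUSP"):
    print(column, column_total(rows, column))
```

For this snapshot the totals are 6313 jobs, of which 5997 are pending, 315 running, and 1 suspended.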
interact - for interactive jobs with a MAX of 12hrs runtime, 1 job per user.
dellblades - for jobs to run on Dellblades1-10; the primary group that uses these is the Karplus group.
rsoft - for rsoft-based jobs from the Lieber lab; these run only on host cfa15, with a MAX runtime of 12hrs.
hunter - for jobs from the Hunter lab group to run on hunter1; they have priority access to this machine.
giribet - for jobs from the Giribet lab to run on giribet1-2. They have exclusive access to these machines.
flybase - for jobs from the Flybase group, to run on hosts fb1-22. They have priority access to fb1-22.
CGRshort - for jobs from CGR members that have a MAX runtime of 1hr.
CGRnormal - for jobs from CGR members that have a MAX runtime of 24hrs. Maximum of 40 jobs total at any one time.
CGRlong - for jobs from CGR members, no time limit. Maximum of 60 jobs per user at any one time.
short - for all external or CGR members that run jobs for a MAX of 1hr runtime. Total of 20 jobs per user.
normal - for all external or CGR members that run jobs for a MAX of 24hrs runtime.
long - for all external or CGR members, no time limit, 10 jobs MAX per user.
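The runtime limits above suggest a simple rule for choosing among the general-purpose queues. The sketch below is one possible reading, not an official policy: it assumes CGR members prefer the CGR queues (which have higher priority in the listing), and it ignores the per-user job limits and the lab-specific queues. The names `QUEUE_LIMITS` and `pick_queue` are hypothetical.

```python
# (queue name, MAX runtime in hours or None for no limit, CGR members only?)
# Ordered so that the shortest suitable queue is tried first.
QUEUE_LIMITS = [
    ("CGRshort", 1, True),
    ("CGRnormal", 24, True),
    ("CGRlong", None, True),
    ("short", 1, False),
    ("normal", 24, False),
    ("long", None, False),
]


def pick_queue(runtime_hours, cgr_member):
    """Return the first queue whose runtime limit covers the job.

    Assumption: CGR members are steered to the CGR queues first.
    """
    for name, max_hours, cgr_only in QUEUE_LIMITS:
        if cgr_only and not cgr_member:
            continue
        if max_hours is None or runtime_hours <= max_hours:
            return name
    raise ValueError("no queue accepts this job")


print(pick_queue(0.5, cgr_member=True))    # CGRshort
print(pick_queue(5, cgr_member=False))     # normal
print(pick_queue(100, cgr_member=False))   # long
```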