ATTENDING
- Jerry Sheehan, Josh Turner, Pol Llovet, Aurelien Mazurie, Jonathan Hilmer, Thomas
Heetderks
ABSENT
- Welcome - Jerry
- Hyalite Expansion (June 2016) - Pol
- 16 new nodes installed in June
- NEW Hyalite overview:
- 60 Nodes (XEON, 36 Sandy Bridge, 24 Haswell)
- 16 cores per node for a total of 960 cores (1920 HT)
- 4 GB RAM per core
- 620 TB of Lustre scratch storage
- 10 GbE fabric w/ RDMA
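As a quick check on the overview figures above, the totals can be recomputed from the per-node numbers. This is only a minimal sketch: the variable names are illustrative, and the aggregate RAM figure is derived here from the "4 GB per core" number rather than stated in the minutes.

```python
# Per-node figures as listed in the overview above.
sandy_bridge_nodes = 36
haswell_nodes = 24
cores_per_node = 16
ram_gb_per_core = 4

nodes = sandy_bridge_nodes + haswell_nodes
cores = nodes * cores_per_node
hyperthreads = cores * 2                  # two hardware threads per core with HT
ram_gb_total = cores * ram_gb_per_core    # derived, not stated in the minutes

print(nodes, cores, hyperthreads, ram_gb_total)
```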
- Hyalite Maintenance (September 2016) - Pol
- Lustre updated to 2.5.42
- 10GbE network drivers updated
- Migrated IPMI (management network) to new network
- RobinHood installed and initialized (for Lustre management)
- RDMA drivers installed
- discussion: will the RDMA work with our existing compilers?
- Hyalite Usage / XDMoD Stats - Jonathan
- Who is using the most CPU time
- Who is running the longest jobs
- Who is waiting the most
- discussion: what happens when jobs go over time?
- Pol: they are killed... so users should err on the side of caution and estimate long
- Who waits the most per job
- Waiting Hours vs Job Length
- History of CPU Hours
- Details on wait time
- Waiting time, everyone, August
- Pol: fair-share algorithm accumulates historical wait time and job data to try to
make things fair over time
- discussion: does the fairness algorithm affect only users, or also groups?
- Pol: I think NO, but I am not sure... Sean says it does. As we accumulate more data,
we can get a better idea of how this works
- Pol: it may make sense to weight "fairness" below the priority queue...? Right now
they are the same weight
- Jerry: initially, we wanted to keep things as simple as possible -- we still want to
keep changes simple, relying heavily on your feedback as we consider adjustments
- Who waits for short jobs and why
- discussion: I've been running jobs elsewhere (XSEDE resources) -- do we encourage
others to do the same -- I see Hyalite as a test platform
- Jerry: we will have more opportunity now to use Pol to work with users, now that we
have Jonathan
- Jerry: we have time in this group, as we move forward, to better define Hyalite's
use case
- Pol: things Hyalite is good at?... long jobs (too long for XSEDE)... other things?
- Jerry: on the other hand, as Ben Poulter once said-- do we users, by our use, encourage
bad user behavior?
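To make the weighting question raised above concrete: Slurm's multifactor priority plugin ranks jobs by a weighted sum of normalized factors, each between 0 and 1. The sketch below is illustrative only. It assumes just two factors, fair-share and partition, and assumes that the "priority queue" corresponds to the partition factor. The factor values and weights are made up and do not reflect Hyalite's actual configuration.

```python
def job_priority(fairshare_factor, partition_factor,
                 weight_fairshare, weight_partition):
    """Weighted sum of two normalized (0.0-1.0) priority factors."""
    return (weight_fairshare * fairshare_factor
            + weight_partition * partition_factor)

# Hypothetical jobs: a heavy user in the priority queue
# versus a light user in the default queue.
heavy_user = {"fairshare_factor": 0.2, "partition_factor": 1.0}
light_user = {"fairshare_factor": 0.9, "partition_factor": 0.5}

# Hypothetical weight settings.
equal = dict(weight_fairshare=1000, weight_partition=1000)
queue_first = dict(weight_fairshare=1000, weight_partition=4000)

for label, weights in (("equal weights", equal),
                       ("queue weighted higher", queue_first)):
    heavy = job_priority(**heavy_user, **weights)
    light = job_priority(**light_user, **weights)
    print(label, heavy, light)
```

With equal weights the light user's job ranks first. With the queue weighted higher, the priority-queue job ranks first even though its owner has used more of the cluster.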
- MATLAB Campus License Update - Pol
- MATLAB Total Academic Headcount License
- Faculty, Staff, or Student
- Any machine (Home or On-Campus)
- see UIT MATLAB Help Page about local Installation
- Hyalite is running MATLAB version R2016a
- Pol: do we need older versions than R2016a on Hyalite?
- MATLAB HPC Mentors Monthly Meeting
- Jerry: we were able to secure the funding for this license, we were NOT able to secure
funding for MATLAB support from CFAC-- so how do we support users?
- discussion: this support should not be on UIT
- Jerry: we could see if our advanced users would mind answering questions from other
users...
- Pol/Mike: maybe next year we'll get the CFAC funding for this support... when we've
had time to demonstrate its need
- Pol: Mathworks provides MATLAB MENTOR support through sysadmins (us), so we could
take questions to them on a monthly basis
- Computational Chemistry Class (CHMY591) - Jonathan
- Professor: Robert Szilagyi
- Students: cap of 15, currently 4
- Hyalite usage plan:
- Students start with jobs on their own systems
- Learn software, shared-user systems
- Gradually move jobs to the cluster
- By end of semester, submitting very long jobs
- Software
- TINKER
- MOPAC
- DFTB+
- Gaussian09
- Tcl shell
- Estimates (rough Hilmer calculations)
- Averaged: 25-50% of a single node’s capacity, 24/7 for a semester
- Heavily imbalanced: weighted towards end of semester
- Very long jobs: up to one week each (single core jobs)
- Jerry: [per Ben Poulter] this is a problem if this kind of use affects existing research
jobs
- Jerry: but this could be a great opportunity-- a unique story for classroom usage
of Hyalite
- discussion: a new CS hire (David Millman), whose specialty is HPC, would like to teach
HPC... does he use Hyalite? or XSEDE? or AWS?
- Jerry: at this point, AWS is not an option because of MSU's legal position on AWS indemnity
- discussion: I would recommend a hybrid approach-- special queue with limited nodes
for classroom use
- discussion: would be great if we could acquire nodes dedicated to classroom use with
CFAC $
- discussion: I think it's a good thing -- teaching appropriate HPC usage is good... a class
must teach correct behavior
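The rough classroom estimate above can be turned into core-hours and compared against the cluster's capacity. This is a minimal sketch, assuming a 16-week semester (the minutes do not give a length) and the 60-node, 16-cores-per-node figures from the Hyalite overview. The names are illustrative.

```python
CORES_PER_NODE = 16
NODES = 60
SEMESTER_WEEKS = 16  # assumption: not stated in the minutes

hours = SEMESTER_WEEKS * 7 * 24

def class_core_hours(node_fraction):
    """Core-hours used by a load equal to node_fraction of one node, 24/7."""
    return node_fraction * CORES_PER_NODE * hours

low, high = class_core_hours(0.25), class_core_hours(0.50)
cluster_core_hours = NODES * CORES_PER_NODE * hours

print(f"class: {low:.0f} to {high:.0f} core-hours")
print(f"share of cluster: {low / cluster_core_hours:.2%} "
      f"to {high / cluster_core_hours:.2%}")
```

Because the load is expected to be heavily weighted toward the end of the semester, peak demand would be higher than this average suggests.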
ACTIONS
- Need to post documentation on RDMA programmatic usage on Hyalite web pages
- Need to post listing of Hyalite installed software/modules (with version numbers)
on Hyalite web pages
- Need to research Slurm job scheduling behavior as it relates to our researchers (for
Slurm configuration adjustments)
FUTURE AGENDA
- Data Challenge (Data Science Competition)
- CyberCANOE / SAGE2
- Hyalite Communication & Publicity
University Information Technology
P.O. Box 173240
Bozeman, MT 59717-3240