Karst at Indiana University
On this page:
- System overview
- System access
- Available software
- Setting up your user environment
- File storage options
- Working with data containing PHI
- Running jobs on Karst
- Queue information
- Requesting single user time
- Acknowledging grant support
Karst at Indiana University is a high-throughput computing cluster designed to deliver large amounts of processing capacity over long periods of time. Karst provides batch processing and node-level co-location services that make it well suited for running high-throughput and data-intensive parallel computing jobs.
Besides being available to IU students, faculty, and staff for standard, cluster-based, high-throughput computing, Karst offers two alternative service models to the IU community:
- Condominium computing: The condominium computing service model provides a way for IU schools, departments, and researchers to fund computational nodes for their own research purposes without shouldering the cost, overhead, and management requirements of purchasing individual systems. Condominium nodes are housed in the IU Bloomington Data Center, and are managed, backed up, and secured by UITS Research Technologies staff. Condominium nodes are available to "members" whenever they are needed, but when they are not in use, idle condominium nodes become available to other researchers and students on Karst. In this way, condominium computing promotes cost-effective expansion of IU's high-performance computing capabilities, enables efficient provisioning of computing resources to the entire IU research community, and helps conserve natural resources and energy.
- Dedicated computing: The dedicated computing service model lets schools and departments host nodes that are dedicated solely to their use within Karst's physical, electrical, and network framework. This provides 24/7 access for school or departmental use, while leveraging the network and physical components of Karst, and the security and energy efficiency benefits provided by location within the IU Data Center.
Karst's system architecture provides the advanced performance needed to accommodate high-end, data-intensive applications critical to scientific discovery and innovation. As configured upon entering production, Karst comprised 228 general-access compute nodes, 28 condominium nodes, and 16 dedicated data nodes (for separate handling of data-intensive operations). Each node is an IBM NeXtScale nx360 M4 server equipped with two Intel Xeon E5-2650 v2 8-core processors. Compute nodes have 32 GB of RAM and 250 GB of local disk storage. Data nodes have 64 GB of RAM and 24 TB of local storage. All nodes are housed in the IU Bloomington Data Center, run Red Hat Enterprise Linux (RHEL) 6, and are connected via 10-gigabit Ethernet to the IU Science DMZ.
Access is available to IU students, faculty, staff, and sponsored affiliates. For details, see the "Research system accounts (all campuses)" section of What computing accounts are available at IU, and for whom?
Once your account is created, you can use your IU username and passphrase to log into Karst (karst.uits.iu.edu) with any SSH2 client. Public key authentication also is permitted; see How do I set up SSH public-key authentication to connect to a remote system?
Alternatively, if you are unaccustomed to working in Linux command-line environments, you can use the Karst Desktop Beta graphical remote desktop application, which lets you work on Karst from a desktop window running on your personal computer; for more, see At IU, what is Karst Desktop Beta, and how do I connect to it from my personal computer?
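For example, you can connect from a terminal as follows (replace the placeholder username with your own IU username):

```shell
# Connect to Karst via SSH; "username" is a placeholder for your IU username
ssh username@karst.uits.iu.edu
```

You will be prompted for your IU passphrase unless you have set up public-key authentication.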
For a list of packages available on Karst, see Karst Modules in the IU Cyberinfrastructure Gateway.
Karst users can request software using the Software Request form.
Setting up your user environment
On the research computing resources at Indiana University, the Modules environment management system provides a convenient method for dynamically customizing your software environment.
For more about the Modules package, see the module manual page. Additionally, see On Big Red II, Karst, and Mason at IU, how do I use Modules to manage my software environment?
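As a sketch of a typical Modules workflow (the package name here is illustrative; run the first command to see what is actually installed on Karst):

```shell
module avail        # list all packages available on the system
module load gcc     # add a package (e.g., a compiler) to your environment
module list         # show the modules currently loaded
module unload gcc   # remove a package from your environment
```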
File storage options
For file storage information, see On IU's research systems, how much allocated and short-term storage capacity is available to me?
Working with data containing PHI
The Health Insurance Portability and Accountability Act of 1996 (HIPAA) established rules protecting the privacy and security of individually identifiable health information. The HIPAA Privacy Rule and Security Rule set national standards requiring organizations and individuals to implement certain administrative, physical, and technical safeguards to maintain the confidentiality, integrity, and availability of protected health information (PHI).
This system meets certain requirements established in the HIPAA Security Rule that enable its use for research involving data that contain protected health information (PHI). You may use this resource for research involving data that contain PHI only if you institute additional administrative, physical, and technical safeguards that complement those UITS already has in place. For more, see When using UITS Research Technologies systems and services, what are my legal responsibilities for protecting the privacy and security of data containing protected health information (PHI)? If you need help or have questions, contact UITS HIPAA Consulting.
Running jobs on Karst
The default wall time for jobs running on Karst compute nodes is 30 minutes; the default virtual memory per job is 8 MB.
User processes on the login nodes are limited to 20 minutes of CPU time. Processes on the login nodes that run longer than 20 minutes are terminated automatically (without warning). If your application requires more than 20 minutes of CPU time, submit a batch job or an interactive session using the TORQUE qsub command.
Because of the default virtual memory limit:
- When running Java programs, add the -Xmx parameter (values must be multiples of 1,024 greater than 2 MB) on the command line to specify the Java Virtual Machine (JVM) maximum heap size. For example, to run a Java program (e.g., Hello_DeathStar) with a maximum heap size of 640 MB, on the command line, enter:
java -Xmx640m Hello_DeathStar
Use the TORQUE qsub command to submit non-interactive or interactive batch jobs for execution on Karst's compute nodes:
- Non-interactive jobs: To run a job in batch mode on Karst, first prepare a TORQUE job script that specifies the application you want to run and the resources required to run it, and then submit it to TORQUE with the qsub command.
Do not specify a destination queue in your job script or your qsub command. TORQUE passes your job to the system's default routing queue (BATCH) for placement, based on its resource requirements, in the SERIAL, NORMAL, or LONG execution queue. From there, your job dispatches whenever the required resources become available.
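A minimal TORQUE job script might look like the following sketch (the job name, working directory variable aside, the executable name is a placeholder for illustration):

```shell
#!/bin/bash
#PBS -N my_job                # job name (placeholder)
#PBS -l nodes=1:ppn=4         # one node, four cores
#PBS -l vmem=10gb             # 10 GB of virtual memory
#PBS -l walltime=24:00:00     # 24 hours of wall time
#PBS -m e                     # mail a report when the job terminates

cd "$PBS_O_WORKDIR"           # start in the directory qsub was run from
./my_program                  # the application to run (placeholder)
```

Submit it with, for example, qsub my_job.script.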
If your job has resource requirements that are different from the defaults (but not exceeding the maximums allowed), specify them either with TORQUE directives in your job script, or with the -l (a lowercase "L"; short for resource_list) option in your qsub command. On the command line, you can specify multiple attributes with one -l switch followed by multiple comma-separated attributes, or with multiple -l switches, one for each attribute. For example, to submit a job (e.g., death_star.script) that requires 24 hours of wall time (instead of the default 30 minutes) and 10 GB of virtual memory (instead of the default 8 MB) to run on four cores on one node, you may enter either of the following commands (they are equivalent):
qsub -l nodes=1:ppn=4,vmem=10gb,walltime=24:00:00 death_star.script
qsub -l nodes=1:ppn=4 -l vmem=10gb -l walltime=24:00:00 death_star.script
For single-core (serial) jobs, specify -l nodes=1:ppn=1 in the script or qsub command for each job.
- Interactive jobs: To run a job interactively on Karst, use qsub with the -I (to specify an interactive job) and -q interactive (to specify submission to the INTERACTIVE batch queue) switches; for example:
qsub -I -q interactive -l nodes=1:ppn=4,vmem=10gb,walltime=4:00:00
Submitting your job to the INTERACTIVE queue directs it to a specific set of nodes that are configured for shared access (versus single-user in the general batch queues). Consequently, your interactive job most likely will dispatch faster in the INTERACTIVE queue than in the general execution queues.
Useful qsub options include:
|-a||Execute the job only after the specified date and time.|
|-I||Run the job interactively. (Interactive jobs are forced to be not re-runnable.)|
|-m e||Mail a job summary report when the job terminates.|
|-q||Specify the destination queue.|
|-r||Declare whether the job is re-runnable; use the argument y or n.|
|-V||Export all environment variables in your current environment to the job.|
For more, see the qsub manual page.
To monitor the status of a queued or running job, use the TORQUE qstat command.
Useful qstat options include:
|-a||Display all jobs.|
|-f||Write a full status display to standard output.|
|-n||List the nodes allocated to a job.|
|-r||Display jobs that are running.|
|-u <user_list>||Display jobs owned by the specified users.|
For more, see the qstat manual page.
To delete a queued or running job, use the TORQUE qdel command. Use qdel -W <delay> to override the delay between the SIGTERM and SIGKILL signals; for <delay>, specify a value in seconds.
For more, see the qdel manual page.
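A typical monitoring session might look like this (the job ID 12345 is a placeholder):

```shell
qstat -u $USER   # list your own queued and running jobs
qstat -f 12345   # write a full status display for one job
qdel 12345       # delete the job if it is no longer needed
```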
Karst employs a default routing queue that funnels jobs, according to their resource requirements, into three execution queues configured to maximize job throughput and minimize wait times (i.e., the amount of time a job remains queued, waiting for required resources to become available). Depending on the resource requirements specified in either your batch job script or your qsub command, the routing queue (BATCH) automatically places your job into the SERIAL, NORMAL, or LONG queue.
You do not have to specify a queue in your job script or in your qsub command to submit your job to one of the three batch execution queues; your job will run in the SERIAL, NORMAL, or LONG queue unless you specifically submit it to the DEBUG, PREEMPT, or INTERACTIVE queue, the properties of which are as follows:
- DEBUG: The DEBUG queue is intended for short, quick-turnaround test jobs requiring less than 1 hour of wall time.
- INTERACTIVE: Interactive jobs submitted to the INTERACTIVE queue should experience less wait time (i.e., start sooner) than interactive jobs submitted to the batch execution queues.
DEBUG queue limits:
|Maximum wall time:||1 hour|
|Maximum nodes per job:||4|
|Maximum cores per job:||64|
|Maximum number of jobs per queue:||None|
|Maximum number of jobs per user:||2|
To submit a batch job to the DEBUG queue, either add the #PBS -q debug directive to your job script, or enter qsub -q debug on the command line.
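For example, a job script header targeting the DEBUG queue might begin as follows (the job name is a placeholder, and the resource requests are chosen to stay within the DEBUG limits):

```shell
#!/bin/bash
#PBS -q debug                             # submit to the DEBUG queue
#PBS -N quick_test                        # job name (placeholder)
#PBS -l nodes=1:ppn=1,walltime=00:30:00   # within the DEBUG queue limits
```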
INTERACTIVE queue limits:
|Maximum wall time:||8 hours|
|Maximum nodes per job:||None|
|Maximum cores per job:||8|
|Maximum number of jobs per queue:||128|
|Maximum number of jobs per user:||16|
To submit an interactive job to the INTERACTIVE queue, on the command line, enter qsub with the -I and -q interactive options added; for example:
qsub -I -q interactive -l nodes=1:ppn=1,walltime=4:00:00
Note: If you omit the -q interactive option, your interactive job will be placed in the routing queue for submission to the SERIAL, NORMAL, or LONG batch execution queue, which most likely will entail a longer wait time for your job.
PREEMPT queue limits:
|Maximum wall time:||14 days|
|Maximum nodes per job:||None|
|Maximum cores per job:||None|
|Maximum number of jobs per queue:||1,800|
|Maximum number of jobs per user:||200|
To submit a job to the PREEMPT queue, add the
#PBS -q preempt directive to your job script, or enter
qsub -q preempt on the command line.
To see current status information for the work queues on Karst, on the command line, enter:
qstat -q
Requesting single user time
Although UITS Research Technologies cannot provide dedicated access to an entire compute system during the course of normal operations, "single user time" is made available by request one day a month during each system's regularly scheduled maintenance window to accommodate IU researchers with tasks requiring dedicated access to an entire compute system. To request such single user time, complete and submit the Research Technologies Ask RT for Help form, requesting to run jobs in single user time on HPS systems. If you have questions, email the HPS team.
Acknowledging grant support
The Indiana University cyberinfrastructure, managed by the Research Technologies division of UITS, is supported by funding from several grants, each of which requires you to acknowledge its support in all presentations and published works stemming from research it has helped to fund. Conscientious acknowledgment of support from past grants also enhances the chances of IU's research community securing funding from grants in the future. For the acknowledgment statement(s) required for scholarly printed works, web pages, talks, online publications, and other presentations that make use of this and/or other grant-funded systems at IU, see If I use IU's research cyberinfrastructure, what sources of funding do I need to acknowledge in my published work?
- If you have a system-specific question about Big Red II, Karst, Mason, or the Research Database Complex (RDC), contact the High Performance Systems (HPS) team.
- If you have questions about the Scholarly Data Archive (SDA), contact the Research Storage team.
- If you have questions about shared scratch or project space on the Data Capacitor II or Data Capacitor Wide Area Network (DC-WAN) file system, contact the High Performance File Systems (HPFS) team.
- If you have questions about the development tools, compilers, scientific or numerical libraries, or debuggers available on the research computing systems, contact the Scientific Applications and Performance Tuning (SciAPT) team.
- If you have questions about the statistical and mathematical applications available on the research computing systems, contact the Research Analytics group.
- If you have questions about the bioinformatics and genome analysis packages available on the research computing systems, email the National Center for Genome Analysis Support (NCGAS).
For general inquiries about UITS Research Technologies systems and services, complete and submit the Research Technologies request for help form.