The Scholarly Data Archive (SDA) at Indiana University
On this page:
- System overview
- System information
- Working with electronic protected health information
- Requesting an account
- Transferring files
- Acknowledging grant support
System overview
The Indiana University Scholarly Data Archive (SDA) provides extensive capacity (approximately 42 PB of tape overall) for storing and accessing research data. The SDA is a distributed storage service co-located at IU data centers in Bloomington and Indianapolis, providing IU researchers with large-scale archival or near-line data storage, arranged in large files, with automatic off-site copies of data for disaster recovery.
The SDA is based on the High Performance Storage System (HPSS), a consortium-developed hierarchical storage management (HSM) package that makes the SDA's hierarchy of storage media transparent to its users. The SDA's system architecture comprises fast, efficient disk cache front-end components (with a capacity of roughly 1,800 TB) that move infrequently accessed data to two high-end tape libraries (with nearly 15 PB of capacity). Using the I-Light high-performance network between IUB and IUPUI, the SDA creates two tape copies of user data simultaneously (one at each data center), adding a degree of disaster tolerance to both sites.
The SDA is well suited for storing large volumes of data (i.e., tens of gigabytes to several terabytes per project) and data that are accessed relatively infrequently (i.e., archival or near-line storage). The SDA backend is not designed for storing a large number of small files. Individual files should be at least 1 MB. If you need to store many small files on the SDA, use a file compression utility to bundle your files into a single, large archive file.
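As one way to follow the bundling advice above, the following is a minimal Python sketch that packs a directory of small files into a single gzip-compressed tar archive before upload. The function names (`bundle_directory`, `small_files`) are hypothetical, and treating 1 MB as 1,000,000 bytes is an assumption; any archive utility you prefer would serve the same purpose.

```python
import tarfile
from pathlib import Path

# Assumption: the 1 MB guideline is interpreted as 1,000,000 bytes.
MIN_FILE_BYTES = 1_000_000


def bundle_directory(source_dir, archive_path):
    """Pack everything under source_dir into one gzip-compressed tar archive."""
    source = Path(source_dir)
    archive = Path(archive_path)
    with tarfile.open(archive, "w:gz") as tar:
        # arcname stores paths relative to the directory name, not the full path.
        tar.add(source, arcname=source.name)
    return archive


def small_files(source_dir, threshold=MIN_FILE_BYTES):
    """Return the files under source_dir that are smaller than the threshold."""
    return sorted(
        p for p in Path(source_dir).rglob("*")
        if p.is_file() and p.stat().st_size < threshold
    )
```

Calling `small_files("my_project")` first shows which files fall under the guideline; `bundle_directory("my_project", "my_project.tar.gz")` then produces one archive to transfer. Write the archive outside the directory being bundled so it is not added to itself.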
The SDA supports high-performance access methods, such as the Hierarchical Storage Interface (HSI); an HPSS API is available for programmers, as well.
Note: At IU, the initials SDA, MDSS, and HPSS are often used interchangeably to describe the same service.
Note: The SDA is offline for regularly scheduled maintenance every Sunday 7am-10am.
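If you schedule automated transfers, you may want to skip the maintenance window noted above. The following is a small Python sketch of such a check. It assumes the timestamp passed in is already expressed in the local time of the IU data centers (the note does not state a time zone) and treats 10am as the first moment the service is back; the function name is hypothetical.

```python
from datetime import datetime, time

MAINT_WEEKDAY = 6           # datetime.weekday(): Monday is 0, Sunday is 6
MAINT_START = time(7, 0)    # 7am
MAINT_END = time(10, 0)     # 10am (treated as exclusive)


def in_maintenance_window(when):
    """Return True if `when` falls within the Sunday 7am-10am window."""
    return (
        when.weekday() == MAINT_WEEKDAY
        and MAINT_START <= when.time() < MAINT_END
    )
```

A transfer script could call `in_maintenance_window(datetime.now())` and postpone its work when the result is `True`.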
System information
| Machine type | Distributed HPSS data archive |
| Operating system | Red Hat Enterprise Linux 6 |
| Network file system protocols | HSI/HTAR, CIFS (Samba) as read-only, SFTP/SCP, HTTPS |
| Usable tape capacity | 15 PB |
| Total disk capacity (cache) | 1,800 TB |
| Quotas | 50 TB (default) per user, 50 TB (default) per project; increases as needed |
| Backup and purge policies | Dual copies of data, but no backups; system is never purged |
| Aggregate I/O | 80 Gbps |
Working with data containing PHI
The Health Insurance Portability and Accountability Act of 1996 (HIPAA) established rules protecting the privacy and security of individually identifiable health information. The HIPAA Privacy Rule and Security Rule set national standards for maintaining the confidentiality, integrity, and availability of protected health information (PHI), requiring organizations and individuals to implement a series of administrative, physical, and technical safeguards when working with PHI.
This system is "HIPAA-capable", meaning it meets certain requirements established in the HIPAA Security Rule. You may use UITS HIPAA-capable resources for research involving data protected under HIPAA, or protected health information (PHI), only if you institute additional physical, administrative, and technical safeguards that complement those UITS already has in place. For details, see When using UITS HIPAA-capable systems at IU, what safeguards must I implement to comply with rules that protect the privacy and security of electronic protected health information? You can also contact the Advanced Biomedical IT Core for help.
Note: UITS HIPAA-capable resources are not recognized by the IU Committee of Data Stewards as appropriate for storing institutional data elements classified as Critical that are not PHI. For help determining which institutional data elements classified as Critical are considered PHI, see Which data elements in the classifications of institutional data are considered protected health information (PHI)?
Requesting an account
- For instructions on requesting an individual SDA or RFS account, see At IU, if I already have some computing accounts, how do I get others?
- For instructions on requesting an SDA or RFS account for an IU group or department, see Requesting IU computing accounts for groups or departments.
Note: In accordance with standards for access control mandated by the HIPAA Security Rule, you are not permitted to access ePHI data using a group (or departmental) account. To ensure accountability and enable only authorized users to access ePHI data, IU researchers must use their personal Network ID credentials for all work involving ePHI data.
After you submit your account request, UITS will notify you via email when your account is ready for use.
For eligibility requirements, see the "Research system accounts (all campuses)" section in What computing accounts are available at IU, and for whom?
Transferring files
Once you have an SDA account, you can access it from any networked host. The method you use depends on your operating system and level of comfort with the command-line interface.
Methods available for transferring data to and from the Indiana University Scholarly Data Archive (SDA) include Hierarchical Storage Interface (HSI), secure FTP (SFTP), secure copy (SCP), and HTTPS (via a web browser). For instructions, see:
- At IU, how do I use HSI to access my SDA account?
- At IU, how do I use SFTP or SCP to access my SDA account?
- At IU, how do I use the Scholarly Data Archive web interface?
Read-only access is available via CIFS/Samba; see At IU, how do I access the SDA via Samba?
HSI, the highest performing non-grid method, provides shell-like facilities for recursive operations and can take input data from standard input. HSI can also migrate files to tape, stage files from tape to disk, and purge files from the disk cache. HSI is available on the UITS research computing systems when you load the hpss module; for more about Modules, see On Big Red II, Karst, and Mason at IU, how do I use Modules to manage my software environment? Accessing the SDA via HSI from a personal workstation requires installing a special client; updated HSI clients for Linux, OS X, and Windows are available for download from the UITS Research Storage HSI page.
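For scripted transfers, HSI can be driven from another program. The following is a rough Python sketch that only assembles (and optionally runs) an HSI command line. It assumes the `hsi` executable is already on your path (for example, after loading the hpss module), that authentication is handled separately, and that the commonly documented `local : remote` argument form applies; the helper names are hypothetical, and paths containing spaces are not handled.

```python
import shlex
import subprocess


def hsi_command(action, local_path, remote_path):
    """Build an argument list for an HSI put or get.

    Assumption: both actions accept the "local : remote" form.
    """
    if action not in ("put", "get"):
        raise ValueError("action must be 'put' or 'get'")
    return ["hsi", f"{action} {local_path} : {remote_path}"]


def run_hsi(args, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    print(shlex.join(args))
    if not dry_run:
        subprocess.run(args, check=True)
```

With the default `dry_run=True`, `run_hsi(hsi_command("put", "results.tar", "archive/results.tar"))` only prints the command, which makes it easy to check before anything is transferred. Consult the HSI instructions linked above for the options that apply to your account.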
For Windows or OS X users who prefer a graphical interface, UITS recommends using a graphical SFTP client. For OS X users, UITS recommends Fetch, especially if you intend to transfer large amounts of data.
Note: For reasons of code compatibility with future versions of HPSS, UITS introduced a redesigned version of the Scholarly Data Archive (SDA) web interface, which entered production January 3, 2015.
The new web interface functions much like the previous version, with a few exceptions:
- Checksums: You can't create and verify checksums in the new web interface. UITS recommends using HSI instead; see How do I use HSI to create and manage checksums?
- Access Control Lists: You can't create or modify Access Control Lists (ACLs) in the new web interface. UITS recommends using the HSI lsacl command to display existing ACLs and the chacl command to create and modify ACLs. For more, see At IU, how do I use ACLs to share my SDA data with other users?
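Regarding the checksum item above: independently of the checksums HSI manages on the SDA, you can record a digest of a file on your own machine before uploading it and compare it again after retrieval. The following Python sketch illustrates that idea; the function name is hypothetical, and the choice of SHA-256 is an assumption (use whichever algorithm matches the checksums you keep elsewhere).

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Return the hex digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" at end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Storing the result of `file_checksum("my_project.tar.gz")` alongside your project notes gives you a value to compare against after you download the file again.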
If you have questions about the new SDA web interface, contact the UITS Research Storage team.
Acknowledging grant support
The Indiana University cyberinfrastructure managed by the Research Technologies division of UITS is supported by funding from several grants, each of which requires you to acknowledge its support in all presentations and published works stemming from research it has helped to fund. Conscientious acknowledgment of support from past grants also enhances the chances of IU's research community securing funding from grants in the future. For the acknowledgment statement(s) required for scholarly printed works, web pages, talks, online publications, and other presentations that make use of this and/or other grant-funded systems at IU, see If I use IU's research cyberinfrastructure, what sources of funding do I need to acknowledge in my published work?