
File systems

The cluster has several file systems that have different purposes. For normal users, this primarily means that your data is located in one of the following directories (and their subdirectories):

  • Your home directory under /home. This directory is used by default for all your data. There is a limit of 100 GB per user. You can find out more in the Home directory section.
  • The workspace directory under /work. Workspaces are used for the short-term storage of large amounts of data. Each time you create a workspace, a subfolder is created for you under this directory. This is your workspace. Workspaces are unlimited in size, but limited in time. A total of 1 petabyte of hard disk space is available. More on this in the Workspaces section.
  • The burst buffer directory /fast. This directory is physically located on a partition with solid state disks (SSDs). It is used for calculations where large amounts of data need to be moved particularly quickly. The entire burst buffer is only 32 terabytes in size. You can find information on how to use this function in the Burst Buffer section.
  • Groupspaces are directories for working groups/chairs that can be found under /group. Groupspaces must be requested by the head of the working group/chair. Further information can be found in the Groupspaces section.
  • Your NAS or XNAS (available on OMNI under /nas or /xnas), if you have requested one from ZIMT. Please note that these directories should only be used for transfers; they are too slow for computations and software installations. More information below in the NAS/XNAS section.

These file systems are accessible from every node in the cluster.

In addition, each node also has local storage where temporary data that is no longer required after the job has finished is stored. Information on this can be found in the Temp directory section.

Home directory

As a user, you automatically have a home directory on the cluster where you can store your data. This is located under /home/. The size of your home directory is limited to 100 GB.

Snapshots of the home directories

Daily snapshots are created of the home directories; these snapshots remain available for 30 days. If you have lost data in your home directory, you can switch to the /home/.snapshot directory, where each daily snapshot is stored in its own subfolder. Attention: these snapshots are not reliable backups. To be on the safe side, we recommend that you also back up your data yourself on another computer.

You can simply copy files that you want to restore back to your normal home directory:

cd /home/.snapshot/daily.<DATE>_0010/<username>/
cp <files to restore> <target directory>

Example:

cd /home/.snapshot/daily.2020-08-04_0010/demo_user/
cp file1 file2 /home/demo_user

To copy folders and their contents (recursively), use the -r option. Please note that files can potentially be overwritten when copying, especially when copying entire folders. The cp command offers options such as -i or -n to control this more precisely. You can use man cp to display the help function.
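The difference can be tried out safely with scratch files. The following is a minimal sketch; the paths and file contents are hypothetical and have nothing to do with the cluster file systems:

```shell
# Demonstrate cp -n (never overwrite). All paths below are example paths.
mkdir -p /tmp/snapdemo/src /tmp/snapdemo/dst
echo "version from snapshot" > /tmp/snapdemo/src/file1
echo "current version"       > /tmp/snapdemo/dst/file1
cp -n /tmp/snapdemo/src/file1 /tmp/snapdemo/dst/   # -n: the existing file is kept
cat /tmp/snapdemo/dst/file1                        # prints "current version"
```

With cp -i instead of -n, you would be asked interactively before the existing file is overwritten.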

Example: copy back a folder exampledir and ask each existing file whether it should be overwritten

cd /home/.snapshot/daily.2020-08-04_0010/demo_user/
cp -i -r exampledir /home/demo_user

Example: copy back all accidentally deleted files in the exampledir folder without overwriting existing files

cd /home/.snapshot/daily.2020-08-04_0010/demo_user/
cp -n -r exampledir /home/demo_user

Attention: On the HoRUS cluster, all users could read the home directories of every other user. This is no longer the case on the OMNI cluster. Of course, you can still share files and subfolders of your home directory with other people yourself using the chmod command (see Linux basics).

Workspaces

For your computing jobs, it is advisable not to use the home directory but to create a so-called workspace. This has two advantages: Firstly, there is no size limit for workspaces, and secondly, they are physically located on hard disks with a faster connection. There is a time limit of 30 days for workspaces: after this period, the workspace is deleted. However, you can extend this time limit up to three times.

Please note: No automatic backups of workspaces are created!

Creating and extending workspaces

You can create a new workspace with the command:

ws_allocate <workspace name> <duration>

where the duration is specified in days. The maximum duration without subsequent extension is 30 days.

Please note: if you omit the duration, the workspace is only created for one day.

The workspace is created as a subfolder of the /work/ws-tmp/ directory; the folder name is made up of your user name and the workspace name you have specified. The workspace can be accessed with cd like any normal folder. In the following example:

$ ws_allocate test1 4
Info: creating workspace.
/work/ws-tmp/demo_user-test1
remaining extensions : 3
remaining time in days: 4

you will see that a workspace named test1 has been created for an initial period of 4 days.

If you want to extend an existing workspace, enter

ws_extend <workspace name> <duration>

with the name of an existing workspace and a duration. You can extend the workspace a total of three times for up to 30 days each time. If you enter the name of a workspace that does not yet exist, it will be created as if you had used ws_allocate.

The ws_allocate command also has other functions that you can display with man ws_allocate.

Select file system manually

In contrast to HoRUS, OMNI allows you to select the file system for the workspace. The file system for the normal workspaces has the name work in the workspace mechanism, the burst buffer described below has the name fast. You can use the command ws_list -l to display the file systems.

With ws_allocate and ws_extend, you can use the -F option to specify where you want to create the workspace. If you do not use -F, the default (work) is used:

ws_allocate -F [work|fast] <workspace name> <duration>
ws_extend -F [work|fast] <workspace name> <duration>

Note: If you extend a workspace, you must specify the same file system as for the original ws_allocate. This means that if, for example, you created a workspace with ws_allocate -F fast, you must also extend it with ws_extend -F fast. If you originally created the workspace without -F, the default (work) was used and you do not need to specify this again.

E-mail notifications

The workspace mechanism can send you an e-mail before the workspace expires.

We recommend that you always use this function to avoid data loss.

The corresponding command is

ws_allocate <workspace name> <duration> -r <days> -m <e-mail address>

Use the -m option to specify the address and the -r option to specify how many days before expiry you would like to be notified. If you do not want to enter the e-mail address manually each time, you can create a file called .ws_user.conf in your home directory. In this file, write a line according to the following pattern:

mail: demo_user@uni-siegen.de

Please note that there must be a space after the colon (YAML syntax).
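The file can be created in one line from the shell; the address below is a placeholder example:

```shell
# Write ~/.ws_user.conf with the required "mail: " key.
# Note the space after the colon (YAML syntax); the address is an example.
printf 'mail: demo_user@uni-siegen.de\n' > "$HOME/.ws_user.conf"
cat "$HOME/.ws_user.conf"
```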

You can also create a calendar appointment using

ws_send_ical <workspace name> <e-mail address>

Display workspaces

You can display your existing workspaces with the command:

ws_list

Release (delete) a workspace

If you no longer need a workspace, you can release it. Attention: the files in this workspace are then no longer accessible!

To do this, use the command:

ws_release <workspace name>

Restore workspace

Expired workspaces are no longer accessible as described, but are not deleted immediately. Expired workspaces or workspaces released using ws_release are kept for up to 10 days before they are permanently deleted. This makes it possible to restore the data in workspaces that have expired by mistake. To do this, proceed as follows:

  1. You can display a list of your expired workspaces using:

    $ ws_restore -l
    <username>-<workspace name>-<ID>
            unavailable since Tue Jun 12 09:30:01 2018
  2. Create a new workspace:

    ws_allocate <new workspace name> <duration>
  3. Restore the expired workspace in the new workspace with the command ws_restore. To do this, you need the complete name of the old workspace (including your user name and an ID number), which you can display using ws_restore -l as already mentioned.

    ws_restore <full name of the expired workspace> <new workspace name>

    The new workspace will contain the old one in a subdirectory.

  4. Type the displayed text. This is so that workspace restores cannot be automated.

Burst Buffer

The OMNI cluster has a so-called burst buffer, i.e. faster data storage. This consists of SSDs and is 32 terabytes in size.

There are two things to bear in mind when using the burst buffer:

  • The Burst Buffer is less stable than the other file systems, so you should move the data to a regular workspace as soon as possible after your calculation.
  • At 32 TB, the burst buffer is not too large and is accessible to all users of the cluster. Please only use it if you really need the higher processing speed.

Creating a workspace in the burst buffer

Functionally, your directories on the burst buffer are also workspaces. Therefore, most of the commands work as described in the previous section. To create a workspace in the burst buffer, use ws_allocate as usual, except that you must also specify that the file system under /fast is to be used.

ws_allocate -F fast <workspace name> <duration>

Please note that you must also specify the -F fast option for a ws_extend.

You can display the list of all file systems on which you can create workspaces with the command ws_list -l:

$ ws_list -l
available filesystems:
fast
work (default)

Temp directory

Many applications create temporary files, which you as a user do not always notice because the operating system provides a temp directory (/tmp) for this purpose by default. Each node has a temp directory on its local storage. Other nodes cannot access it. Temporary files are no longer required after the end of the program and are normally deleted by the applications themselves.

In the past, it occasionally happened that files in the temp directory were not deleted cleanly after a job and the directory filled up over time. We have now implemented a mechanism that automatically cleans up this directory after the job without affecting other running jobs.

Most applications use common environment variables to determine the storage location of temporary data and therefore do not require any customization. If this is not the case with your application or your self-written programs and scripts, please note the following information:

  • When the job is started, a new directory is created on each compute node involved (e.g. hpc-node300) under /tmp, which can be clearly assigned to your job (/tmp/slurm_<username>.<job ID>). After the job ends, this temporary directory is deleted again.
  • The path to this temporary storage location is stored in the environment variables $TMP, $TEMP, $TMPDIR, $TMP_DIR, $TMPSESS.
    For applications where you as the user explicitly specify a temporary directory, e.g. in a config file or via an option when calling the program, you should adapt this accordingly (e.g. -tmp_dir=$TMP / -temp=$TMP).

    You must also adapt scripts that access the /tmp directory directly. Corresponding locations must be replaced by a query of the environment variable TMP.

  • When starting a shell in an interactive job (srun --pty ... /bin/bash), you must set the environment variables manually, e.g. with the following command: export TMP=/tmp/slurm_${SLURM_JOB_USER}.${SLURM_JOB_ID}
  • You can check the use of the temporary folder by connecting to one of the participating compute nodes during job execution and checking where your files are located. Your files should only be located under /tmp/slurm_<username>.<job ID>, but not directly under /tmp.

    Display your jobs, where you will also find a list of the nodes involved in the job:
    squeue -u <username>
    Connect to a node on which your job is running:
    ssh hpc-nodeXXX
    Find your files among all readable data:
    find /tmp ! -readable -prune -o -user <username> -print
    Close the connection to the compute node:
    exit
    All data located directly under /tmp is not automatically cleaned up. In this case, check whether your scripts or applications can be configured so that the data ends up in your temporary directory, or delete this data manually at the end of the job (see next point).

  • If it is not possible to redirect temporary outputs to the corresponding folder, you should delete your data manually at the end of the job. To do this, you can include the following commands at the end of your job script or put them in a separate script that you call at the end of your job:

    job_list=$(/cm/shared/apps/slurm/current/bin/squeue --noheader --format=%i --user=$USER --node=localhost) || exit 0
    if [ -z "$job_list" ] ; then
      rm -rf /tmp/slurm_${USER}.*   # adjust this path if your data is stored elsewhere under /tmp
    fi

    This will only delete your data under /tmp and only if you no longer have another job running on the node.
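As mentioned above, in an interactive job the temp variables must be set by hand. A minimal sketch of setting up and using the per-job temp directory (demo_user and 12345 are placeholder values standing in for the real Slurm variables, which only exist inside a running job):

```shell
# Set up the per-job temp directory manually, e.g. in an interactive job.
# SLURM_JOB_USER and SLURM_JOB_ID are only defined inside a real Slurm job,
# so placeholder values are substituted here for illustration.
SLURM_JOB_USER=${SLURM_JOB_USER:-demo_user}
SLURM_JOB_ID=${SLURM_JOB_ID:-12345}
export TMP=/tmp/slurm_${SLURM_JOB_USER}.${SLURM_JOB_ID}
mkdir -p "$TMP"                           # in batch jobs this directory already exists
scratch=$(mktemp "$TMP/scratch.XXXXXX")   # temporary file inside the job directory
echo "intermediate data" > "$scratch"
ls "$TMP"                                 # files end up under /tmp/slurm_<user>.<job ID>
```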

Groupspaces

For working groups and chairs, we offer the option of setting up a groupspace. A groupspace is a directory on the OMNI cluster to which only members of your group have access. This enables you to provide software installations and data for your group centrally on the OMNI. A groupspace behaves in a similar way to your home directory: it is available on both the login and compute nodes and can be accessed from outside via ssh. This distinguishes it from the group drives (XNAS), which can only be used from the login nodes. The size of the groupspace is limited to 100 GB.

A group directory can be set up on request at support@zimt.uni-siegen.de. The owner and person responsible for the group directory is then the person who heads the working group/chair (professor etc.). This person must be authorized for OMNI access as described here. This person can add or remove group members themselves via the ZIMT self-service portal. Note: only one groupspace can be requested per working group leader.

If your working group/chair has already set up its own group for XNAS access, you can also use this group for your groupspace. If you do not yet have such a group, we will create a new group according to the following naming scheme when you apply: hpc_<group name>.

You can then find your groupspace on OMNI under /group/<group name>.

A groupspace is suitable in the following cases, among others:

  • If your workgroup uses software that only group members should have access to (e.g. for licensing reasons).
  • If your workgroup uses software that the HPC team does not want to install centrally (e.g. because nobody outside the workgroup uses it or because it would mean a disproportionately high maintenance effort for the HPC team).
  • If several people use the same input files or other resources.
  • If a workgroup member maintains a software installation, but the software is also used by other members of the workgroup.

Software installation

The installation of software by you as a workgroup is expressly permitted in the Groupspaces. The HPC team will be happy to advise and support you. However, we cannot take over the complete installation for you and can only provide troubleshooting as part of the usual user support (e.g. during office hours
or in separate consultations).

You can modify the owning group of files using the chgrp command. Note: By default, new files and directories are created with the primary group of the creating user. The primary group for most OMNI users is unix-user. As a rule, you must therefore change the group membership to your workgroup for all newly created files and directories.

You can change permissions for the groupspace and its subfolders and files using the chmod command. Of course, files that are to be used by the entire group also require the corresponding group permissions. You can set permissions for the group using the g option, for example

chmod g+x <file name>

would make the file executable for the group. You can find more details in the chmod man page.
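As a sketch, using hypothetical file names in a scratch directory instead of a real groupspace:

```shell
# Grant group permissions on a file and a directory (hypothetical paths;
# in practice you would operate on files inside /group/<group name>).
mkdir -p /tmp/groupdemo/shared
printf '#!/bin/sh\necho hello\n' > /tmp/groupdemo/run_analysis.sh
chmod g+rx  /tmp/groupdemo/run_analysis.sh  # group members may read and execute
chmod g+rwx /tmp/groupdemo/shared           # group members may enter and write
stat -c '%A %n' /tmp/groupdemo/run_analysis.sh /tmp/groupdemo/shared
```

Remember that, as noted above, you may additionally need chgrp so that the files belong to your workgroup rather than your primary group.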

Connection to the ZIMT storage services (NAS/XNAS)

In order to facilitate the transfer of data for users of the OMNI cluster, it is possible to use the NAS and XNAS storage services provided by ZIMT. As this option is to be used exclusively for the transfer of data to or from the OMNI cluster, the network drives are only accessible via the login nodes. Please also note that we do not create automatic backups of the network drives; users are responsible for this themselves! The NAS service cannot be shared with third parties, but the XNAS can be used within the group to exchange files.

Users who have requested the storage services must execute the kinit command to gain access and then enter the password of their ZIMT account:

$ kinit
Password for <username>@UNI-SIEGEN.DE:

Please note that the password is not displayed as you type it.

The network drives are then accessible under the paths /nas/<username> for NAS or /xnas/<group name> for XNAS.