Affiliation	OMNI cluster
Required	HPC user account
Beginner links	https://hpc-wiki.info/hpc/Introduction_to_Linux_in_HPC (en) https://ryanstutorials.net/linuxtutorial/ (en)

General information

As on practically all high-performance computers, the Linux operating system is installed on the OMNI cluster (in this case Rocky Linux). Here are a few general concepts, as well as common commands, especially those related to working on the cluster.

Most of the information on this page corresponds to that in our introductory Linux course, which is held every semester, and the interactive Linux video tutorial we helped create. Further tutorials on Linux basics are listed under the beginner links at the top of this page.

Linux also has a built-in help mechanism that can be accessed from the console. The man command displays the so-called man page ("man" for "manual") for a specific program, i.e. a help text built into the program (if the program has one). For example, the command:

$ man sbatch

shows the man page written by the SLURM developers for the sbatch command. You can scroll up and down in a man page using the arrow keys and exit it with the q key.

Many commands also offer an internal help function, which can usually be accessed with -h or --help. This is often identical to the man page.

Directory structure

The directory structure in Linux is a tree: at the highest level there is a single directory, the so-called root directory, designated /. All other directories (also known as folders) are subfolders of root or subfolders of subfolders.

The file structure of Linux differs slightly from that of Windows. Whereas Windows assigns a letter to each hard disk, each with its own directory tree, Linux uses so-called mount points: The directory structure is (largely) identical for every Linux system and the various hard disks are mounted at a specific point in this directory structure, their mount point. The advantage is that you as a user do not usually have to worry about the physical hard disks.

On the OMNI cluster in particular, there are two directories in which you will usually work: your home directory (/home/) and your workspaces (/work/ws-tmp/). If you develop software yourself, you may need to include libraries. In this case, the installation directories are important for you. Most software is installed on the cluster under /cm/shared/apps.

Attention: These directories can change at any time! It always makes more sense to use environment variables instead of hard-coded file paths if possible. How to use environment variables is explained below.

Permissions

What you are allowed to do under Linux depends on which user you are logged in as. Usually you have exactly one user name that is identical to your ZIMT ID (g-number). The user with the highest administrator rights is called the superuser or root user (root) under Linux. You will never be given root rights on the cluster; these are only for ZIMT administrators.

Users are combined into groups. Each user has a primary group and can belong to any number of other groups. You can display your groups with the command id.
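
The id command can be tried out as in the following minimal sketch. The options -un, -gn and -Gn are standard options of id that are not mentioned above; the variable names are chosen only for this illustration.

```bash
#!/bin/bash
# Show user and group information for the current account.
id        # user ID, primary group and all groups in one line
id -un    # user name only
id -gn    # name of the primary group only
id -Gn    # names of all groups the user belongs to

# Store the values for later use in a script (hypothetical variable names).
current_user=$(id -un)
primary_group=$(id -gn)
echo "User $current_user has the primary group $primary_group"
```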

Every file and every directory in Linux belongs to a user and only this user (or the root user) decides who can do what with this file. If you create a new file, you are the owner. In addition, each file is automatically assigned to a group. By default, this is the primary group of the creating user.

There are three types of access rights for files and directories: read, write and execute. The rights can be set separately for three types of users, namely for the owner, the group or all others. For example, you can specify that other users may read a file belonging to you, but not change it.

Changing permissions

The chmod command changes the permissions of a directory or file (only if you are allowed to change them, of course). There are several input methods; the simplest is

$ chmod u+x ex1.dat

In this example, the owner of the file (u, for "user") is given (+) the permission to execute (x) the file ex1.dat. Alternatives to u are g (group), o (others) and a (all). The permissions are r for read, w for write and x for execute. To remove a permission, a minus sign is used instead of the plus sign.
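
A minimal sketch of these combinations, run in a temporary directory so that no real files are affected. The octal notation in the last step is one of the other input methods of chmod; the directory and variable names are assumptions made for this illustration.

```bash
#!/bin/bash
# Work in a temporary directory so that no real files are touched.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch ex1.dat

chmod u+x ex1.dat        # the owner may execute the file
chmod g-w,o-r ex1.dat    # the group loses write access, others lose read access
chmod a+r ex1.dat        # everyone (owner, group, others) may read the file
ls -l ex1.dat

# Octal notation: 6 = rw-, 4 = r--, 0 = --- (for owner, group, others)
chmod 640 ex1.dat
ls -l ex1.dat            # the permissions are now -rw-r-----
```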

Auto-completion

An important function that makes working with Linux much easier is the auto-complete function. If you enter part of a command and then press Tab, the command is automatically completed, but only if it is unique. For example, it is not enough to enter sb and press the Tab key to get sbatch because there are three commands on the cluster that start with sb. In this case, you can press the Tab key a second time to get a list of all commands that start with sb:

$ sb
sb sbatch sbcast

You can see that sba, for example, would be unique. If there are a lot of options, you will first be asked whether you really want to see the complete list (you can try this out by entering only one s and pressing the Tab key twice).

The auto-complete function works not only with commands, but also with file paths.

Processes

As with Windows, Linux has a large number of processes running at any given time, including those that you have explicitly started (with a specific command). Sometimes it is necessary to monitor the status of a process or forcibly terminate it. The top command is used for this purpose. top is similar to the Task Manager in Windows.

The top interface looks something like this:

top - 10:08:18 up 53 days, 23:05, 14 users, load average: 0.12, 0.21, 0.48
Tasks: 334 total, 1 running, 333 sleeping, 0 stopped, 0 zombie
%Cpu(s): 1.6 us, 0.3 sy, 0.0 ni, 98.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 14854156+total, 749016 free, 1722716 used, 14606982+buff/cache
KiB Swap: 12582908 total, 12360456 free, 222452 used. 14465592+avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
 2610 otheruser 20   0  180156   3936   1240 S  4.3  0.0  43:08.94 sshd
 2619 otheruser 20   0  180188   3968   1240 S  4.0  0.0  43:01.12 sshd
 2291 otheruser 20   0   67804   2576   1800 S  1.0  0.0   9:34.55 sftp-server
13770 demo_user 20   0  168156   2564   1636 R  0.7  0.0   0:00.07 top
   39 root      20   0       0      0      0 S  0.3  0.0   1:25.02 ksoftirqd/6
 6133 root      20   0       0      0      0 S  0.3  0.0   0:00.84 kworker/u48:1
    1 root      20   0  191612   3340   1672 S  0.0  0.0  14:56.12 systemd
    2 root      20   0       0      0      0 S  0.0  0.0   0:02.23 kthreadd
    3 root      20   0       0      0      0 S  0.0  0.0   0:22.88 ksoftirqd/0
    8 root      rt   0       0      0      0 S  0.0  0.0   0:01.71 migration/0
    9 root      20   0       0      0      0 S  0.0  0.0   0:00.00 rcu_bh
   10 root      20   0       0      0      0 S  0.0  0.0  81:45.44 rcu_sched
   11 root      rt   0       0      0      0 S  0.0  0.0   0:11.99 watchdog/0

This (abbreviated and anonymized) display lists the running processes. The interface of top is operated with single-letter commands; q, for example, terminates top. On the far left you can see the process ID (PID) of each process. This is a unique number that Linux assigns to a process. If you need to terminate a process, you can enter k (for "kill") and then the process ID, and Linux will terminate it (provided you have the permission to do so). Next to the process ID is the user who owns the process. As you can see, there are many system processes that have root as their owner; these do not usually affect you. You can restrict the display to the processes of a specific user by entering u and then the name of the user. The following columns provide information on how much memory and CPU resources the process is using. The name of the process is on the far right.

Tip: The percentage in the %CPU column refers to a single CPU, not to the entire computer. If you are running a parallel program (one with several threads), the number in this column may be greater than 100%.
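
A process can also be terminated directly from the console if its process ID is known. The following minimal sketch uses sleep as a harmless stand-in for a real program and relies on the shell built-ins kill and wait, which are not covered above; the variable names are assumptions made for this illustration.

```bash
#!/bin/bash
# Start a long-running process in the background (sleep stands in for a real program).
sleep 300 &
pid=$!    # $! holds the process ID of the most recent background process
echo "Started background process with PID $pid"

# kill -0 sends no signal; it only checks whether the process exists.
if kill -0 "$pid" 2>/dev/null; then
    echo "Process $pid is running"
fi

# Ask the process to terminate (signal SIGTERM, number 15).
kill "$pid"

# wait returns the exit status of the process: 128 + signal number.
status=0
wait "$pid" || status=$?
echo "Exit status after kill: $status"
```

If a process ignores this request, kill -9 followed by the process ID terminates it forcibly.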

Basic console commands

Linux is mainly operated via a text interface, the so-called console (also shell or terminal), especially on remote systems. Note that this page describes Bash, the shell that is started by default on the cluster. Like most Linux systems, the cluster has several shells installed, which differ slightly in their handling. For example, you can alternatively use the C shell (with the csh command) if you have more experience with it.

All commands specified here can be entered manually in the console or listed one after the other in a text file (a so-called script). The shell can then execute this script and the effect is the same as if the commands had been entered manually. This is why they are also referred to as shell scripts. This is one of the reasons for the popularity of Linux - repetitive processes can be automated very easily.

Special characters

The hash character # begins a comment.
The asterisk character * is a so-called wildcard and stands for any sequence of characters (including none). For example, if you are looking for all PDF files (the search function is explained below), you can search for *.pdf. There are a number of other wildcards.
The pipe character | is used to forward the output of a command as input to another command. This allows several commands to be concatenated.
The semicolon ; is used to separate several independent commands. Entering several commands with semicolons between them is as if these commands were entered one after the other.
The ampersand & at the end of a command executes this command in the background. You can then continue working with the console while it is running by pressing the Enter key again afterwards. This is useful if you are running an application that opens a window - otherwise the console would be blocked while the window is open.
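
The special characters listed above can be combined as in the following minimal sketch. The file names are made up for this illustration, and wc (which counts lines with the -l option) is a standard command that is not introduced on this page.

```bash
#!/bin/bash
# This line is a comment because it begins with the hash character.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# The semicolon separates independent commands on one line.
touch report.pdf; touch slides.pdf; touch notes.txt

# The asterisk matches any sequence of characters: list all PDF files.
ls *.pdf

# The pipe passes the output of ls on to wc, which counts the lines.
pdf_count=$(ls *.pdf | wc -l)
echo "Number of PDF files: $pdf_count"

# The ampersand runs a command in the background;
# wait pauses the script until the background command has finished.
sleep 1 &
wait
```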

Moving in directories

The most common operations in the Linux console are moving from directory to directory and manipulating files in the current directory. Here are the most important commands you should know. Regardless of the command, there are two other special characters that are used with file paths: The dot . denotes the current directory, two dots .. denote the parent directory.

Note: Linux is case-sensitive; a command named test and a command named Test can have different functions. The same applies to files and directories.

Changing to another directory

You can use the cd command (for "Change directory") to change to a different directory. To do this, you can either specify the relative file path (relative to your current position) or the absolute file path. You can recognize the difference by the fact that an absolute path begins with a forward slash / (note: unlike Windows, Linux uses forward slashes in paths). If a subfolder example exists in the current directory, you can reach it with

$ cd example

You can also use .. to reach a directory at a higher tree level. For example, if you are in the directory mydir/sub1 and mydir contains another subfolder called sub2, you can reach the parent directory mydir with

$ cd ..

or the sister directory with

$ cd ../sub2

You can also specify the absolute path, so the previous command is identical to

$ cd /home/demo_user/mydir/sub2

if, as in this example, mydir is located in the home directory of the user demo_user.
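
The navigation steps described above can be reproduced with the following minimal sketch. It creates the layout mydir/sub1 and mydir/sub2 in a temporary directory instead of a real home directory; the variable name base is an assumption made for this illustration.

```bash
#!/bin/bash
# Build the example layout in a temporary directory.
base=$(mktemp -d)
mkdir -p "$base/mydir/sub1" "$base/mydir/sub2"

cd "$base/mydir/sub1" || exit 1   # absolute path
cd ..                             # parent directory: mydir
cd sub2                           # relative path into a subfolder
cd ../sub1                        # sister directory
pwd                               # shows the current position
```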

Show directory

You can use the command pwd (for "Print Working Directory") to display where you are:

$ pwd
/home/demo_user

Display directory contents

The command ls (for "List") displays the files and subfolders in the current directory:

$ ls
ex1.txt ex2 ex3.dat

In this example, the folder contains a subfolder named ex2 and two files. You can also display the contents of another folder:

$ ls ex2
ex4.txt

This displays the contents of the subfolder ex2, which contains another text file (it does not necessarily have to be a subfolder, any path is possible).

You can also display additional details. For example, the -l option displays a table form:

$ ls -l
total 4
-rw-r--r-- 1 demo_user hpc-gpr-hiwis 2 23. Jul 10:17 ex1.txt
drwxr-xr-x 2 demo_user hpc-gpr-hiwis 4096 23. Jul 10:24 ex2
-rw-r--r-- 1 demo_user hpc-gpr-hiwis 2 23. Jul 10:19 ex3.dat

Here you can see the following additional information: On the far left are the access rights, which have already been explained. Note that directories are marked with d. After the link count, you will see the owner of the file or folder, in this case the user demo_user, as well as the owning group, in this case hpc-gpr-hiwis. Next comes the file size in bytes. The two text files contain only one letter each (plus a line break) and are therefore only 2 bytes in size. The size displayed for subfolders is only the amount of metadata about the folder; the size of the files contained in the subfolder is not included in this numerical value. Finally, you will see the date of the last change and the file or folder name.

Show hidden folders (and files)

A hidden file in Linux is a file whose name begins with a dot (.). These files can also be displayed with ls if the -a option is used.

$ ls -la
total 24
drwxr-xr-x 4 demo_user hpc-gpr-hiwis 4096 23. Jul 17:06 .
drwxr-xr-x 56 demo_user hpc-gpr-hiwis 12288 23. Jul 17:06 ..
-rw-r--r-- 1 demo_user hpc-gpr-hiwis 2 23. Jul 10:17 ex1.txt
drwxr-xr-x 2 demo_user hpc-gpr-hiwis 4096 23. Jul 10:24 ex2
-rw-r--r-- 1 demo_user hpc-gpr-hiwis 2 23. Jul 10:19 ex3.dat
-rw-r--r-- 1 demo_user hpc-gpr-hiwis 2 23. Jul 17:06 .hidden_ex.txt
drwxr-xr-x 2 demo_user hpc-gpr-hiwis 4096 23. Jul 10:43 test1

As you can see in this example, it is also possible to combine options. In this case, ls -la is equivalent to ls -l -a. This works with many Linux programs, but is not guaranteed.

Creating a directory

The command mkdir ("Make Directory") creates a directory with the specified name:

$ mkdir test1
$ ls
ex1.txt ex2 ex3.dat test1

Rename or copy directory

The mv ("Move") command moves a directory or a file. It is also the usual method for renaming something.

$ ls # Original content
ex1.txt ex2 ex3.dat test1
$
$ mv ex1.txt renamed.txt
$
$ ls # Changed content.
ex2 ex3.dat renamed.txt test1

The cp ("Copy") command copies a directory or a file. In contrast to mv, the -r ("recursive") option must be used if a directory and its contents are to be copied.
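
A minimal sketch of both cases, using made-up file names in a temporary directory:

```bash
#!/bin/bash
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
mkdir ex2
echo "a" > ex1.txt
echo "b" > ex2/ex4.txt

cp ex1.txt ex1_copy.txt    # copy a single file
cp -r ex2 ex2_copy         # copy a directory including its contents
ls ex2_copy                # the copy contains ex4.txt
```

Without -r, cp skips the directory and prints a message instead of copying it.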

Searching for files and directories

The find command searches a folder and all subfolders for files and folders with a specific name. Parts of the name can also be specified.

$ find . -type f -name "ex*"
./ex2/ex4.txt
./ex3.dat

In this example, -type f restricts the search to files, so folders are not listed. All files whose name begins with ex are listed. The find command has a variety of options to narrow down the search results. With the -exec option, it can also execute a given command for each of the files found.
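
The -exec option can be sketched as follows. The file names mirror the example above but are created in a temporary directory; wc -c (which prints the size of a file in bytes) is used only as an example of a command to run.

```bash
#!/bin/bash
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
mkdir ex2
touch ex1.txt ex3.dat ex2/ex4.txt

# Search for directories only.
find . -type d -name "ex*"

# Run wc -c on every file found. The braces {} stand for the name of
# the file, and \; marks the end of the command passed to -exec.
find . -type f -name "ex*" -exec wc -c {} \;

# Count the files found.
found_files=$(find . -type f -name "ex*" | wc -l)
echo "Files found: $found_files"
```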

Search text in files

The grep command is used to search text files. If you want to search for a specific text within specific text files, enter

$ grep [options] "Text" filename

Strictly speaking, the text only needs to be in quotation marks if it contains spaces or other characters that have a special meaning in the shell. Wildcards can also be used instead of a single file name (e.g. *.txt). Important options are, for example, -r (searches recursively, i.e. also in subfolders) and -i (ignores upper and lower case). A more complete list of options can be displayed with grep --help.
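
A minimal sketch of these options, using two made-up text files in a temporary directory:

```bash
#!/bin/bash
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
mkdir logs
printf 'Job started\nERROR: out of memory\n' > logs/run1.txt
printf 'job finished\n' > run2.txt

grep "ERROR" logs/run1.txt    # search a single file
grep -i "job" *.txt           # ignore case; the wildcard selects the files
grep -r -i "job" .            # also search all subfolders

# Count the matching lines in the whole directory tree.
match_count=$(grep -r -i "job" . | wc -l)
echo "Matching lines: $match_count"
```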

Scripts: create and execute

A script in the Linux context is a file containing a series of commands. As already mentioned, practically all commands that are entered in the console can also simply be written in a script. The execution of the script is then identical to the execution of all listed commands one after the other.

A (shell) script normally begins with a line starting with #!, e.g. #!/bin/bash. What follows #! is the complete path to the program that executes the script, here Bash. This program does not have to be a Linux shell. For example, a script could also have #!/usr/bin/python in the first line and would then be executed as a Python script.

In order for a script to be executed, it must be executable (see above in the section on permissions and chmod). It can then be executed like any other command by specifying the absolute or relative path.

$ ./example.sh
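
The complete sequence of creating a script, making it executable and running it could look like the following minimal sketch. Here the script file is written with a so-called here-document (cat > file << 'EOF'), which is only one convenient way to create it; you can just as well use a text editor.

```bash
#!/bin/bash
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# Write a two-line script. Everything up to the line EOF goes into the file;
# the quotes around 'EOF' prevent the shell from expanding $0 while writing.
cat > example.sh << 'EOF'
#!/bin/bash
echo "Hello from $(basename "$0")"
EOF

chmod u+x example.sh    # make the script executable for its owner
./example.sh            # run it via its relative path

# Capture the output of the script in a variable.
script_output=$(./example.sh)
```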

(Environment) variables

In addition to explicitly written arguments, variables can also be used. Variables in Bash serve the same purpose as in other programming languages and work in a similar way. The main difference to most programming languages is that the value of a variable is read by placing a $ in front of its name. A variable is defined without the $, however, using an equals sign: var=value. Please note that there must not be a space to the left or right of the equals sign. In principle, variables in the Linux console are always text variables (strings).

In the following example, several operations are carried out with the same file:

#!/bin/bash

# Variable definition
file1="/home/demo_user/exampledir/ex1.txt"

# Displays the file content in the console.
# The quotation marks keep paths that contain spaces intact.
cat "$file1"

# Copies the file.
cp "$file1" copy_example.txt

This example demonstrates the usefulness of variables: If you want to perform the same operations with a different file, the file name only needs to be changed in one place in the script instead of everywhere.

Environment variables

A variable can only ever be accessed in the console or script in which it was defined. However, there are also so-called environment variables. These are also accessible in all sub-processes (i.e. in processes that were started by the current process). In this way, for example, a certain setting can be set before a script or program is called. Environment variables are set using export var=value. Which environment variables are set can be displayed with the command printenv.

On every Linux system, a large number of environment variables are set by default, either by the system or by installed software. For example, the variable USER normally contains the user name of the person currently logged in.
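
The difference between an ordinary variable and an environment variable can be made visible with the following minimal sketch. The variable names are made up for this illustration, and bash -c is used here only to start a sub-process.

```bash
#!/bin/bash
local_var="only here"              # ordinary variable: visible in this shell only
export exported_var="inherited"    # environment variable: passed on to sub-processes

# bash -c starts a sub-process. The single quotes ensure that the variables
# are expanded by the sub-process and not by the current shell.
child_output=$(bash -c 'echo "local_var=[$local_var] exported_var=[$exported_var]"')
echo "$child_output"    # prints: local_var=[] exported_var=[inherited]

# printenv shows environment variables only.
printenv exported_var
```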

Command line parameters

In the special case of shell scripts, further variables are also automatically available. For example, $0, $1, $2 and so on can be used to retrieve the arguments with which the script was started (command line parameters). If a script is started with:

$ ./script.sh -f 5.0

then $0=./script.sh, $1=-f, $2=5.0. This means that specific settings can be passed to a script.
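
A minimal sketch of such a script and its call. The file is created in a temporary directory; $# (the number of arguments) is a further automatic variable that is not mentioned above.

```bash
#!/bin/bash
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# Create the script; the quoted 'EOF' keeps $0, $1 etc. unexpanded while writing.
cat > script.sh << 'EOF'
#!/bin/bash
echo "Script name: $0"
echo "First argument: $1"
echo "Second argument: $2"
echo "Number of arguments: $#"
EOF
chmod u+x script.sh

# Call the script with two command line parameters.
params_output=$(./script.sh -f 5.0)
echo "$params_output"
```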

The PATH variable

The PATH environment variable has a special function: If a command is entered in the console, the directories listed in PATH are searched for this command. Conversely, this means that a command will not be found by its name alone if its directory is not listed in PATH. To add a directory to PATH, you can add it at the beginning or end, separated by a colon:

$ export PATH=$PATH:/home/demo_user/exampledir

The order of the individual paths is important because a command with the same name could appear in several directories in the PATH. The first command found with the name searched for is always used.

Caution: Please note that errors when manipulating PATH can sometimes have serious consequences because important commands may not be accessible.
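
The effect of PATH can be observed with the following minimal sketch. The command hello_tool and its directory are hypothetical and exist only for this illustration; command -v is a shell built-in that prints where a command was found.

```bash
#!/bin/bash
# Create a small program in a temporary directory (hypothetical example).
tool_dir=$(mktemp -d)
cat > "$tool_dir/hello_tool" << 'EOF'
#!/bin/bash
echo "hello from hello_tool"
EOF
chmod u+x "$tool_dir/hello_tool"

# The directory is not yet listed in PATH, so the command is not found.
command -v hello_tool || echo "hello_tool not found"

# Append the directory to PATH; the existing entries are kept.
export PATH="$PATH:$tool_dir"

# Now the command is found and can be called by its name alone.
command -v hello_tool
tool_output=$(hello_tool)
echo "$tool_output"
```

Writing "$tool_dir:$PATH" instead would place the new directory at the beginning, so that its commands take priority over commands of the same name elsewhere in PATH.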

Custom settings

Settings such as exporting environment variables or loading modules are only valid as long as the current shell is open. If you log out of the cluster, for example, or the script containing the settings finishes, they are lost. However, there are ways to make settings permanent. The most important is the .bashrc file. It is located in your home directory and is executed every time Bash is started (i.e. also when you log in). Commands that you insert there are then executed. The .bashrc file is well suited for saving environment variables and other settings that you often need.

You can also combine settings in a separate shell script and make these settings available in the current shell with the source command followed by the path to that script. Of course, you can also write this source command in the .bashrc file.

Caution: as the .bashrc file is executed each time you log in, a faulty .bashrc can result in you no longer being able to log in! Make absolutely sure that settings in .bashrc do not contain any typing errors etc. You can test settings by first entering them manually in the console. If you make a typing error, you can then restore the previous settings by logging out and logging in.
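
Using a separate settings file with source could look like the following minimal sketch. The file content and the variable PROJECT_DIR are hypothetical; OMP_NUM_THREADS is the variable with which OpenMP programs are told how many threads to use. The file is created as a temporary file here, so the sketch does not touch your real .bashrc.

```bash
#!/bin/bash
# Create a settings file (hypothetical content).
settings_file=$(mktemp)
cat > "$settings_file" << 'EOF'
export PROJECT_DIR="$HOME/projects/demo"
export OMP_NUM_THREADS=4
EOF

# source executes the commands in the current shell,
# so the variables remain set after the file has been read.
source "$settings_file"
echo "Project directory: $PROJECT_DIR"
echo "Threads: $OMP_NUM_THREADS"

# In .bashrc, a single line such as the following would load the settings
# at every start of Bash (the path is an assumption):
# source ~/my_settings.sh
```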

Various tips

The Linux console does not have an "undo" function. You should always check for typing errors when entering commands and make frequent backups, especially if you are working on a Linux system on which you have root rights. It is quite possible to destroy a complete Linux installation through carelessness.
File extensions are not as important in Linux as they are in Windows; in particular, they are not used to determine the file type. However, it is advisable to use consistent extensions (for example, the extension .sh is common for shell scripts) so that a human can recognize the file type by looking at it.
Virtually every command is actually a program (or script), including the standard Linux commands; the few exceptions are commands built into the shell itself, such as cd. You can see the location of a program with which followed by the command name. This is particularly useful if several versions of the same software are installed and you want to ensure that you are running the correct variant.
You can define commands in the form of an alias, i.e. as an abbreviation for another command. For example, alias myjobs="squeue -u demo_user" creates a command called myjobs that instructs SLURM to display only the jobs of the user demo_user. Like environment variables, aliases must be written in .bashrc in order to be permanently accessible.
In addition to wildcards, there is another way to specify patterns of characters, so-called regular expressions (regex). These allow very complex patterns, but are also very difficult to learn. For reasons of space, we will not go into further detail here.
You can calculate the size of files and directories with the du command (for "Disk Usage"). Important options are in particular -h for "human-readable", which displays the sizes with a unit suffix (for example M for megabytes), and -s for "summarize", which shows only one total per argument (otherwise each subdirectory would be listed individually, which can become confusing with many directories). Example:

$ du -sh *
5.4M abaqus
8.0K abaqus_plugins
4.0K abaqus.rpy
32K all_users_2018-02-07.txt
4.0K bin
4.0K bsp.f90

Linux, like Windows, allows symbolic links (shortcuts). In the output of ls -l, a link is marked with l at the beginning of the line, and its target is shown after an arrow.

$ ls -l
-rw-r--r-- 1 demo_user hpc-gpr-hiwis 56 30. Jul 09:24 ex3.dat
lrwxrwxrwx 1 demo_user hpc-gpr-hiwis 7 1. Aug 09:47 ex3_link.dat -> ex3.dat

You can create a link yourself with the command ln -s, followed by the target and the name of the link.
The ls command is used so often with the -l option that many Linux systems, including the one on the OMNI cluster, provide an ll command that is an alias for ls -l.
The command cd without any argument changes to your home directory, cd - changes to the previous directory (before the last call to cd).
You can use the up and down arrow keys to display the last commands entered in the console. You can use the history command to display all the last commands entered. This is particularly useful if you no longer remember the syntax of a recently entered command. You can then use history | grep followed by a search term to find it again. In the case of Bash, this history is saved in the ~/.bash_history file.
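
Two of the tips above, symbolic links and du, can be tried out with the following minimal sketch. The file names follow the link example above but are created in a temporary directory; readlink, which prints the target of a link, is a standard command not mentioned above.

```bash
#!/bin/bash
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
echo "some data" > ex3.dat

# Create a symbolic link: first the target, then the name of the link.
ln -s ex3.dat ex3_link.dat
ls -l                # the link is shown as ex3_link.dat -> ex3.dat

# Reading the link reads the target file.
cat ex3_link.dat

# readlink prints the target of the link.
link_target=$(readlink ex3_link.dat)
echo "The link points to: $link_target"

# Total size of the current directory in human-readable form.
du -sh .
```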