Things covered here:
- Introduction to CLIMB
- Exercises to recap Bash
Although your MacBooks run a version of Unix, most bioinformaticians prefer to use Linux and so most programs and pipelines work best on Linux. In bioinformatics, most researchers also prefer to use remote virtual servers over their own personal (bare metal) machines for several important reasons:
We will be using a Jupyter notebook server, an easy-to-access web-based platform for creating and interacting with Jupyter Notebooks, which are documents that can contain live code, equations, visualizations, and text. The server runs the computational backend: it provides command-line access to a Linux shell, and it can also process code cells written in languages like Python, R, or Julia, returning the output to the browser interface where users can view and manipulate results.
Before we get stuck into using a Notebook server, let’s set the scene with a short PowerPoint presentation.
The terminal allows us to access a system shell (just like on your MacBooks), so let’s start there.
Select File -> New -> Terminal, or click the Terminal icon on the launcher pane. You’ll get a new terminal tab in the activity bar, and find yourself in a bash shell.
Looking at your bash prompt, you’ll notice that your username is jovyan (derived from “Jupyter”). Why does everyone have the same username in this environment? That’s because your notebook server is running inside a container. Containers are lightweight, self-contained environments that bundle an application (in this case, the Jupyter notebook server) together with its dependencies, ensuring that it runs consistently across different systems.
The container instance is private and linked to your specific user storage on the system, but the actual image (which includes the software, configurations, and libraries) is the same for everyone using the platform. This is why it’s unnecessary to have unique system users for each individual—everyone operates within the same standardized environment, hence the shared username.
A container is a technology that allows you to package up an application, along with all of its dependencies, libraries, and configurations, into a single isolated environment. This guarantees that the application runs consistently, no matter where it’s deployed. Unlike virtual machines, which simulate an entire operating system, containers are more lightweight because they share the host system’s kernel, making them faster and less resource-intensive.
In a Jupyter Notebook server context, containers allow multiple users to have their own private instances, even though they all use the same underlying software image. This ensures consistency across users, simplifies deployment, and provides isolation—what happens in your container doesn’t affect others. It’s a key technology in environments where reproducibility and ease of scaling are important, like in data science and bioinformatics workflows.
TLDR: Don’t worry about it! Inside your notebook server, everyone’s username is set to jovyan because you’re working inside a standardized container environment that provides all the necessary tools and libraries for your Jupyter notebooks.
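If you would like to see this for yourself, the following minimal sketch prints who you are and where you are running. The variable names `user` and `kernel` are made up for this example; on the notebook server the username should be jovyan, as described above, while on your own laptop it will be something else.

```bash
# Ask the system who we are and which kernel we are running on.
user=$(id -un)       # current username (jovyan on the notebook server, per the text above)
kernel=$(uname -r)   # kernel release; a container shares this with its host system

echo "user:   $user"
echo "home:   $HOME"
echo "kernel: $kernel"
```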
By default, you’re in a bash shell running against the base operating system of the climb-jupyterhub container image (which is based on a flavour of Linux called Ubuntu). You’ll see in your bash prompt that you’re in your home directory (represented by the tilde character ~).
jovyan:~$ pwd
/home/jovyan
One Unix command that you cannot use on this Notebook server is sudo. Try it:
sudo touch file.txt
jovyan doesn’t have sudo privileges. This may seem restrictive, but we’ve pre-installed everything you’d likely need sudo for in the climb-jupyter base image. Everything else should be installable via package managers, such as Conda, which we are going to learn more about after coffee.
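A quick way to check whether a tool is already available is to ask the shell where it lives. The sketch below uses the built-in `command -v`; the list of tools (`bash`, `conda`) is only an illustration, so swap in whatever you are looking for.

```bash
# Report whether each tool can be found on the PATH.
for tool in bash conda; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "$tool: found at $(command -v "$tool")"
    else
        echo "$tool: not found on PATH"
    fi
done
```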
Within the Terminal here, you can do almost all the things we did yesterday using Unix on your MacBooks.
In the terminal, create a for loop that outputs the name of any file that is detected as empty in the folder shared-team/2025_training/week1/unix_intro/gene_annotations
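One possible approach, as a sketch rather than the official answer: loop over every entry in the folder and use the `-s` test, which is true when a file has a size greater than zero. So that the example can be run anywhere, it first builds a small scratch folder with made-up file names (`empty.txt`, `full.txt`); for the exercise, set `dir` to the gene_annotations folder instead.

```bash
# Build a throwaway folder with one empty and one non-empty file (names are made up).
dir=$(mktemp -d)
touch "$dir/empty.txt"            # zero-byte file
echo "gene1" > "$dir/full.txt"    # non-empty file

# Print the name of every regular file in $dir whose size is zero.
for f in "$dir"/*; do
    # -f: is a regular file; ! -s: does NOT have a size greater than zero
    if [ -f "$f" ] && [ ! -s "$f" ]; then
        echo "$f"
    fi
done
```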
Can you make it into a script that you can pass a folder of your choice to check for empty files?
Can you make it run like a standard bash command? (i.e. without calling bash first with bash <script>.sh)
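A hedged sketch covering both questions: the loop goes into a file that starts with a "shebang" line (`#!/usr/bin/env bash`), the folder is read from the first argument (`$1`), and `chmod +x` marks the file as executable. The script name `find_empty.sh` is made up for this example, and the block writes the file with a here-document only so that everything can be pasted in one go; you could equally create it in a text editor.

```bash
# Write the script to a file (find_empty.sh is a hypothetical name).
cat > find_empty.sh <<'EOF'
#!/usr/bin/env bash
# Usage: ./find_empty.sh <folder>
dir="${1:-.}"    # first argument; fall back to the current folder if none is given
for f in "$dir"/*; do
    if [ -f "$f" ] && [ ! -s "$f" ]; then
        echo "$f"
    fi
done
EOF

# Make it executable so it can be run without typing "bash" first.
chmod +x find_empty.sh

# Run it on a folder of your choice (here: the current folder).
./find_empty.sh .
```

The `./` prefix is still needed because the current folder is not normally searched for commands; placing the script in a folder that is listed in your `PATH` would let you call it by name alone.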
How do you get bash to evaluate basic arithmetic (5 + 10)? (You may have to ask Google.)
Challenge: Find all the genes with a sequence length greater than 2000 nucleotides in the gene_annotations.csv file. Output the full sequences of these genes to a file called long_genes.fa.
This training course was adapted from material developed by Andrea Telatin and Lisa Marchioretto.
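One common answer, shown as a small sketch: bash evaluates integer arithmetic inside `$(( ... ))`, known as arithmetic expansion. The variable name `sum` is just for illustration.

```bash
# Arithmetic expansion: the expression inside $(( )) is evaluated as integers.
echo $((5 + 10))    # prints 15

# The result can also be stored in a variable and reused.
sum=$((5 + 10))
echo "5 + 10 = $sum"
```

Note that this only handles whole numbers; for decimals you would need another tool.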