Jupyter Notebook on AWS

I've been running an instance of Jupyter Notebook on AWS to gain access to more powerful resources for the NN models in my master's thesis. I used Chris Albon's fantastic guide to do the following.

High Level Summary

In broad strokes, here is what you need to do to get started.

  1. Start an EC2 Instance
  2. Set up security key for EC2 Instance
  3. SSH into your EC2 Instance
  4. Install Python
  5. Set up Jupyter
  6. Access notebook

Detailed Step-by-step

1. Start an EC2 Instance

Go to Amazon AWS and sign up or log in.

Navigate to the EC2 Dashboard and Launch Instance.

Next, you can choose any Linux-based OS (Amazon Linux, Ubuntu, etc.). If you're brand new, you can choose Ubuntu.

On the next page, "Step 2: Choose an Instance Type", you can choose any type of instance. In this tutorial, I'll choose the micro tier (which is free tier eligible). You will probably need more firepower to train complex models, but you can change the instance type later.

Follow through with the review and finally click "Launch."

2. Set up security key for EC2 Instance

After clicking "Launch" you will be prompted with a few security key options.

If you're familiar with this, you can skip down to the next step.

For those who don't know what this is, a PEM file is a key that AWS will check when you try to access your EC2 instance via SSH. Since you don't have one yet, let's create a new pair.

Once you select the option, write a key pair name and download.

Save your PEM file in a secure place. Anyone with access to that file can theoretically get into your EC2 instance.

Congrats! You now have a working EC2 instance (well, it usually takes a minute to get up and running).

3. SSH into your EC2 Instance

Once you're back on your dashboard, take note of what the Public DNS (IPv4) address is for your instance. It should read something like ec2-123-45-678-999.compute-1.amazonaws.com . This is the public hostname of the instance, which resolves to the public IP address or Elastic IP address of the instance.
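To make the link between the hostname and the IP address a bit more concrete: public DNS names of this form carry the instance's public IPv4 address, with dashes in place of dots. Here is a minimal sketch that pulls the address back out of the hostname. The function name ip_from_public_dns is made up for this example, and it assumes the usual "ec2-A-B-C-D." prefix.

```python
import ipaddress
import re
from typing import Optional


def ip_from_public_dns(hostname: str) -> Optional[str]:
    """Return the IPv4 address embedded in an EC2-style public DNS name.

    Assumes the hostname starts with 'ec2-' followed by four dash-separated
    numbers. Returns None if the name does not match that layout or the
    numbers do not form a valid IPv4 address.
    """
    match = re.match(r"^ec2-(\d+)-(\d+)-(\d+)-(\d+)\.", hostname)
    if match is None:
        return None
    candidate = ".".join(match.groups())
    try:
        return str(ipaddress.IPv4Address(candidate))
    except ipaddress.AddressValueError:
        return None


print(ip_from_public_dns("ec2-203-0-113-25.compute-1.amazonaws.com"))
```

Note that the placeholder hostname used in this post (ec2-123-45-678-999...) is deliberately not a real address, so the helper returns None for it.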

Via AWS

By far the easiest and most convenient way for most beginners will be to connect directly through AWS in their browser.

Click on Connect and select A Java SSH Client directly ... option and click Launch. Make sure to include the correct path to your PEM file.

Congrats! You're in! Now skip the For Windows and For Mac sections and head straight to 4. Install Python.

For Windows

If you're on Windows, you might not be familiar with the terminal (like bash or zsh) and might not have access to one natively. If you are familiar and have access to one, refer to the Mac section. If you aren't, it's time to download PuTTY from its official website.

Once installed, you need to open PuTTYgen, which you can find through the Windows key (press your Windows key, search for PuTTYgen, and it should pop up).

Click on Load and load up your PEM file. After the prompt, click Save private key and save the PPK file. All PuTTYgen did was convert your key into a format that PuTTY can use.

Now load up PuTTY.

We first have to load up your brand new PPK file. To do that, navigate in the side panel to Connection > SSH > Auth. Browse and open your PPK file from where you saved it.

Navigate back to Session and paste in your Public DNS for your EC2 instance. Click Open to connect.

Great! Now skip the For Mac section and head straight to 4. Install Python.

For Mac

Open your terminal and type in the following:

$ ssh -i "/PATH/TO/keypair.pem" ec2-user@ec2-xxxxxxxxxx-xx.compute-1.amazonaws.com

replacing /PATH/TO/keypair.pem with the path and name of the key pair that you downloaded earlier, and replacing ec2-xxxxxxxxxx-xx.compute-1.amazonaws.com with your own Public DNS. The user name depends on the image you picked: ec2-user for Amazon Linux, ubuntu for Ubuntu.

Hint: type "pwd" into your terminal if you’re unsure what your present working directory is

If you get an error saying that the permissions on your PEM file are too open (the key must not be publicly viewable), you may need to execute this command:

$ chmod 400 /PATH/TO/keypair.pem
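The mode 400 means "readable by the owner, nothing for anyone else", which is what SSH wants to see on a private key. If you'd rather do this from Python, here is a rough equivalent. The function name restrict_key_permissions is just something I picked for this sketch, and it assumes you're on Linux or macOS.

```python
import os
import stat


def restrict_key_permissions(key_path: str) -> str:
    """Make the key file readable by its owner only (same effect as chmod 400).

    Returns the resulting permission bits as an octal string so you can
    confirm the change.
    """
    os.chmod(key_path, stat.S_IRUSR)  # S_IRUSR is the octal mode 400
    mode = stat.S_IMODE(os.stat(key_path).st_mode)
    return oct(mode)
```

You would call it as restrict_key_permissions("/PATH/TO/keypair.pem"), again replacing the path with wherever you saved your own key.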

4. Install Python

Now that you're in, you should be seeing something like this (on Amazon Linux the user will be ec2-user instead of ubuntu):

ubuntu@ip-172-31-61-252:~$

Now we need to install Anaconda. We can download it by using the following command:

$ wget https://repo.anaconda.com/archive/Anaconda3-2018.12-Linux-x86_64.sh
You can get the latest installer by visiting https://www.anaconda.com/distribution/

Once it finishes downloading, install Anaconda using the following command:

$ bash Anaconda3-2018.12-Linux-x86_64.sh

Next, you'll be prompted through a lot of questions and eventually install Anaconda3. At the end, you'll be asked to include Anaconda3 into your .bashrc PATH. Make sure to type yes.

(If you accidentally pressed enter before typing yes, take a look at the bottom of this post at "Extra" to fix it!)

Now, let's set up Anaconda3 as your default Python environment. Depending on which image you started off with, your instance might be configured to use the system's Python 2.7. To check which Python is currently used and then switch your environment to what we just installed, type out the following two commands:

$ which python
/usr/bin/python
$ source .bashrc
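If you want to double check from inside Python which interpreter you ended up with, something like the following should do. It's only a sketch: the helper uses_anaconda is a made-up name, and it assumes Anaconda was installed into a directory called anaconda3 (the installer's default).

```python
import shutil
import sys

# Path of the interpreter that is running this script.
print("Running interpreter:", sys.executable)

# First "python" command found on PATH (None if there is no such command).
print("python on PATH:", shutil.which("python"))


def uses_anaconda(interpreter_path: str) -> bool:
    """Rough heuristic: is one of the path components named 'anaconda3'?"""
    return "anaconda3" in interpreter_path.replace("\\", "/").split("/")


print("Looks like Anaconda:", uses_anaconda(sys.executable))
```

If the last line prints False after you ran source .bashrc, the PATH change probably didn't take effect, and the "Extra" section at the bottom is the place to look.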

5. Set up Jupyter

In this step, we are looking to do two things.

First, we set up a password for your Jupyter Notebook so you can access it privately via your browser. Then we take the hashed (SHA) version of that password, create a certificate, and put both into Jupyter's configuration so you can access your notebooks from your local computer through your browser.

We first open the IPython console with the following:

$ ipython

This will now show you the prompt In [1]:. We want to create a password, so we import the passwd function:

In [1]: from IPython.lib import passwd

then

In [2]: passwd()

Type in your password and verify it. Remember this password. After that you'll be given the hashed (SHA) version, a string starting with sha1:. Store this hash for later. Now, type exit to leave IPython.
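In case the "hashed version" seems mysterious, here is a minimal sketch of how a salted hash string in a sha1:<salt>:<digest> layout can be produced and checked. This is an illustration under that assumption, not Jupyter's actual implementation, and the names make_hash and check_hash are invented for the example. For your config file, always use the value that passwd() gave you.

```python
import hashlib
import hmac
import secrets


def make_hash(password: str) -> str:
    """Return 'sha1:<salt>:<digest>' for the given password (illustrative only)."""
    salt = secrets.token_hex(6)  # 12 random hex characters
    digest = hashlib.sha1((password + salt).encode("utf-8")).hexdigest()
    return f"sha1:{salt}:{digest}"


def check_hash(stored: str, password: str) -> bool:
    """Recompute the digest with the stored salt and compare it to the stored one."""
    parts = stored.split(":")
    if len(parts) != 3 or parts[0] != "sha1":
        return False
    _, salt, digest = parts
    expected = hashlib.sha1((password + salt).encode("utf-8")).hexdigest()
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected, digest)
```

The point of the salt is that hashing the same password twice gives two different strings, so the stored value doesn't reveal whether two servers share a password.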

Next, we are going to create a default Jupyter config file and then generate an SSL certificate so the connection between your browser and the Jupyter server is encrypted. Start with:

$ jupyter notebook --generate-config

Then the following two commands to make and access your new directory:

$ mkdir certs
$ cd certs

Then we create a new PEM file (different from the PEM file stored locally to access AWS):

$ sudo openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout mycert.pem -out mycert.pem

This certificate is good for 365 days. You'll be asked for a bunch of personal information, but it's okay to just press enter through it and not provide anything.

Now, we have to adjust the Jupyter configuration files with the new certification we created. To get started, let's get back out to the home directory:

$ cd

Now, we'll open Vim and edit the config file created earlier:

$ vim .jupyter/jupyter_notebook_config.py

I'll leave you and Google to figure out how to type/paste stuff into the file using Vim. But before you do, you'll notice the entire file is commented out, so feel free to put the following lines anywhere (making sure you replace the password with the hash you stored earlier):

c = get_config()

# Kernel config
c.IPKernelApp.pylab = 'inline'  # if you want plotting support always in your notebook

# Notebook config
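c.NotebookApp.certfile = u'/home/ec2-user/certs/mycert.pem'  # location of your certificate file (/home/ubuntu/... on Ubuntu)
c.NotebookApp.ip = '*'  # listen on all network interfaces
c.NotebookApp.open_browser = False  # the server has no browser to open
c.NotebookApp.password = u'sha1:262xxxxxxxxxxxxxxx65f'  # the hash you stored earlier
c.NotebookApp.port = 8888  # must match the port you open in the security group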

Finally, from your home directory, create a folder for your new notebooks and start the Jupyter notebook from inside:

$ mkdir Notebooks
$ cd Notebooks
$ jupyter notebook

6. Accessing Jupyter Notebooks

Now we have to set up the proper security rules for your EC2 instance. Go back to your EC2 Dashboard and look at the instances you have running. If you scroll to the far right, you can see the column Security Groups. Click on the security group associated with your instance (it should read something like launch-wizard-1).

Now add another rule in the Inbound tab. Make sure its port range includes 8888 (as we set up earlier).

You should end up with two inbound rules: one for SSH and one for the browser.

Save it. Now open up your browser and enter the Public DNS of your EC2 instance, with https:// at the beginning and :8888 at the end. For instance, https://ec2-123-45-678-999.compute-1.amazonaws.com:8888/. Because the certificate is self-signed, your browser will likely show a security warning that you have to accept before the login page appears.
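If the page doesn't load, it helps to find out whether port 8888 is reachable at all (a missing inbound rule is the usual suspect). Below is a small sketch for that, run from your local computer. The function names notebook_url and port_is_open are my own, and the hostname is the same placeholder used throughout this post.

```python
import socket


def notebook_url(public_dns: str, port: int = 8888) -> str:
    """Build the HTTPS address of the notebook server."""
    return f"https://{public_dns}:{port}/"


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS lookup failures.
        return False


print(notebook_url("ec2-123-45-678-999.compute-1.amazonaws.com"))
```

Calling port_is_open with your own Public DNS and 8888 should return True once the Jupyter server is running and the security group lets the traffic through. If it returns False, check the inbound rule first and then whether jupyter notebook is still running on the instance.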

Extra

If you accidentally hit enter before typing yes, it defaults to no. So now, you’ll have to manually type the PATH into your .bashrc file. To do this type:

$ vim .bashrc

Vim is a text editor (just like Notepad on Windows or Notes on Mac). But, if this is your first time, it might seem confusing. I'll let you do some googling and figure out how to do the following.

Once you open up .bashrc you'll have to add the following line at the bottom of the file (replace ec2-user with ubuntu if you're on an Ubuntu instance):

export PATH="/home/ec2-user/anaconda3/bin:$PATH"

Save and exit!
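Why does putting the Anaconda directory at the front of PATH matter? The shell walks through the PATH directories in order and uses the first match. Here is a minimal sketch that demonstrates this with two throwaway directories standing in for the system and Anaconda bin folders. The helper names are made up, the fake python files are empty placeholders, and it assumes Linux or macOS.

```python
import os
import shutil
import stat
import tempfile


def first_python_on(path_dirs):
    """Return the first executable named 'python' found in the given directories."""
    return shutil.which("python", path=os.pathsep.join(path_dirs))


def make_fake_python(directory):
    """Create a placeholder executable called 'python' (for demonstration only)."""
    target = os.path.join(directory, "python")
    with open(target, "w") as handle:
        handle.write("#!/bin/sh\n")
    os.chmod(target, stat.S_IRWXU)  # owner may read, write, and execute
    return target


with tempfile.TemporaryDirectory() as system_dir, tempfile.TemporaryDirectory() as conda_dir:
    make_fake_python(system_dir)
    conda_python = make_fake_python(conda_dir)
    # Listing conda_dir first mirrors export PATH="<anaconda>/bin:$PATH".
    print(first_python_on([conda_dir, system_dir]) == conda_python)
```

Swap the order of the two directories and the system copy wins instead, which is exactly the situation the export line fixes.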