Enterprise Ready Jupyter Hub on the Linux Data Science VM

Hi everyone, this time we want to try to improve the data science experience of the Azure Linux Data Science VM with a special focus on accounts and authentication.

Data Scientists , as all the employees in major Enterprises, have an active directory account, the one they use everyday to logon on their laptops and to access emails. When instead they have to use Data science exploration tools like Jupyter notebooks they have to remember specific local accounts created for this objective (using notebooks) and of course the IT managing these accounts it’s even loaded of procedures and checks to manage these additional accounts in addition to the usual ones on Active Directory.

We will try to improve this experience explaining how to set up a Linux Data Science VM joined to a managed domain and have also Jupyter Hub authentication working with the very same domain.

The things to set up are the following:

  1. An Azure Active Directory that usually mirrors automatically the on-premise active directory structure and content
  2. Azure Active Directory Domain Services with its own Classic VNET
  3. Another Resource Manager VNET where one or more Linux DS VMs will be deployed
  4. A peering between the two VNETs
  5. The packages needed for the Linux OS to join a managed domain
  6. The authentication module for Jupyter Hub that makes authentication happen against the managed domain

Why all these components and complexity?

Well Azure Active Directory works mainly with oauth protocol while OS authentication works with Kerberos tickets that requires an “old fashion” managed domain and Domain Services it’s a way to have this completely managed by Azure. In addition Domain Services gives us also LDAP protocol support that is exactly what we need for Jupyter Hub.

The two VNETs are needed because Domain Services still needs a “Classic VNET” while the modern Linux DS VMs are made with Resource Manager template. The peering between the two guarantees that they can see each other even if they are separate.

Peering

Now we will demonstrate step by step how to setup all the necessary components.

Step N. 1 Create an Azure Active Directory

Go to https://manage.windowsazure.com/ and here click on +New button on the left hand bottom corner, go and click on App Services > Active Directory > Directory, finally click on Custom Create , here choose name, domain name and Country .

Pay attention to country choice because it will decide on which datacenter your active directory will be.

Once done you should have something like mytestdomain.onmicrosoft.com .

Step N. 2 Create Azure Active Directory Domain Services with its own Classic VNET

Here simply follow this great Microsoft step by step tutorial   completing all the 5 tasks. Do not forget , if you do not import from on premise AD, to add at least one user , to change the password of this user and to add it to AAD DC Administrators group.

Step N.3 Create a Resource Manager based VNET

Here simply go to the new portal.azure.com and create a normal vnet paying attention to choose the addresses in way that are not overlapping with the ones of the previous VNET (so if you have choosen 10.1.0.24 for the classic VNET , pick 10.2.0.24 for the new one).

Step N.4 Define the peering between the two VNETs

Go to portal.azure.com, to the new VNET that you have just created and enable the peering :

peering

Step N.5 Deploy  and Configure Linux DS VM

Again from portal.azure.com , add a new Linux Data Science VM CentOS version and during the configuration pay attention to pick as VNET the latest one you created (the ARM based one).

Once the VM is up install the needed packages with this command:

yum install sssd realmd oddjob oddjob-mkhomedir adcli samba-common samba-common-tools krb5-workstation openldap-clients policycoreutils-python -y

Go now to /etc/resolv.conf and setup the name resolution in the putting the domain name and the ipaddress of the azure domain services (one of the two).

Here an example:

search mytestdomain.onmicrosoft.com
nameserver 10.0.0.4

Now join the domain with this command (change the user to the admin defined at Step 2)

realm join –user=administrator mytestdomain.onmicrosoft.com

Check that everything is ok with the command realm list .

Now modify the /etc/sssd/sssd.conf changing these two lines in the following way:

use_fully_qualified_names = False
fallback_homedir = /home/%u

and restart the sssd demon with this command systemctl restart sssd .

Now try to login/ssh with simple domain username (without @mytestdomain.onmicrosoft.com) and password and everything should work.

Step 6 Configure Jupiter Hub

Add the ldap connector with pip:

pip install jupyterhub-ldapauthenticator

Configure the jupyter hub configuration file in the following way (change Ip Address and other parameters accordingly):

c.JupyterHub.authenticator_class = ‘ldapauthenticator.LDAPAuthenticator’
c.LDAPAuthenticator.server_address = ‘10.0.0.4’
c.LDAPAuthenticator.bind_dn_template = ‘CN={username},OU=AADDC Users,DC=mytestdomain,DC=onmicrosoft,DC=com’
c.LDAPAuthenticator.lookup_dn = True
c.LDAPAuthenticator.user_search_base = ‘DC=mytestdomain,DC=onmicrosoft,DC=com’
c.LDAPAuthenticator.user_attribute = ‘sAMAccountName’
c.LDAPAuthenticator.server_port = 389
c.LDAPAuthenticator.use_ssl = False
c.Spawner.cmd = [‘/anaconda/envs/py35/bin/jupyterhub-singleuser’]

Now to troubleshoot and verify that everything works kill the jupyterhub processes running by default on the linux DS VM and try the following command (sudo is needed to launch jupyter hub in multiuser mode):

sudo /anaconda/envs/py35/bin/jupyterhub -f /path/toconfigfile/jupyterhub_config.py –no-ssl –log-level=DEBUG

Now try to authenticate going to localhost:8000 with domain username (without @mytestdomain.onmicrosoft.com) and password and you should be able to log on on juypter with your AAD credentials.

 

 

Annunci

Rispondi

Inserisci i tuoi dati qui sotto o clicca su un'icona per effettuare l'accesso:

Logo WordPress.com

Stai commentando usando il tuo account WordPress.com. Chiudi sessione / Modifica )

Foto Twitter

Stai commentando usando il tuo account Twitter. Chiudi sessione / Modifica )

Foto di Facebook

Stai commentando usando il tuo account Facebook. Chiudi sessione / Modifica )

Google+ photo

Stai commentando usando il tuo account Google+. Chiudi sessione / Modifica )

Connessione a %s...