Mesosphere DCOS, Azure, Docker, VMware & Everything Between – Part 4

Now that we have the Docker engine up and running and all of our network and security configurations in place, it’s time to get the DC/OS cluster rolling on top of VMware vSphere. This is the first major milestone in our entire platform setup. Let’s get moving…

Since this is not a “DC/OS Deep Dive” series, I will not go into much detail on the DC/OS components, but I will provide relevant info on why things are the way they are.

Before diving into the installation steps, I highly recommend going over the DC/OS Node Types and Network KBs:

https://dcos.io/docs/1.9/overview/architecture/node-types/

https://dcos.io/docs/1.9/overview/concepts/#infrastructure-network

For our vSphere DCOS cluster deployment, I will not deploy public agent nodes. To understand why, we need to go back and review the CI/CD flow.

As you remember, in our flow the “production containers” stop at the DC/OS cluster deployed on vSphere. The reason for not deploying public nodes (think of them as hosts deployed in your DMZ) is a customer requirement to have the production containers available only from the corporate LAN or via VPN. Later on, the plan is to provide internet access via the corporate load balancer, but to keep things nice and simple, we will deploy only the private agents.

The agent nodes are responsible for hosting your docker containers and for this deployment, we will have 3 of those.

Bootstrap Node Installation

Enough with the mumbo jumbo, let’s get down to business. The DC/OS installation requires setting up a couple of files: the config.yaml configuration file and the ip-detect script.

The config.yaml file is responsible for describing the cluster and is used to generate your cluster installation files. Make sure to change the bootstrap URL and add your DNS server under the “resolvers” section. Create the genconf folder and the config file.
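Here is a minimal sketch of what creating the folder and the file could look like. The parameter names come from the DC/OS 1.9 configuration reference, but every value is a placeholder I made up for illustration (the bootstrap FQDN and port, the cluster name, three master IPs and the DNS server), so replace them with your own.

```shell
#!/usr/bin/env bash
# Sketch only: all hostnames, IPs, the port and the cluster name below are
# hypothetical placeholders, not the values used in this series.
mkdir -p genconf

cat > genconf/config.yaml <<'EOF'
---
# URL the nodes will use to reach the bootstrap node (placeholder FQDN and port)
bootstrap_url: http://bootstrap.example.local:8080
cluster_name: dcos-vsphere
# "static" keeps the Exhibitor storage inside the cluster
exhibitor_storage_backend: static
# "static" means agents find the masters via the master_list below
master_discovery: static
master_list:
- 192.168.1.11
- 192.168.1.12
- 192.168.1.13
# Your DNS server(s)
resolvers:
- 192.168.1.2
EOF
```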

Although the file is pretty self-explanatory, it’s worth talking about both the exhibitor_storage_backend and the master_discovery parameters, which have to do with the cluster’s storage and load balancing configuration.

The exhibitor_storage_backend parameter determines which type of storage backend the cluster’s Exhibitor will use. In my case I used static, but in a production environment you would probably want to use external storage instead of the cluster’s internal one.

The master_discovery parameter determines whether the agent nodes communicate with the masters via a static list of IP addresses or via a load balancer virtual IP (VIP). Again, I went the non-recommended way and did not put a load balancer in front of my master nodes for this deployment.

To better understand the role of each component, I highly recommend you read the Components KB followed by the Configuration Parameters KB.

https://docs.mesosphere.com/1.9/overview/architecture/components/

https://docs.mesosphere.com/1.9/installing/custom/configuration-parameters/

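The official installation guide from Mesosphere describes the role of the ip-detect script best. In short, the script prints the IP address of the node it runs on, so that each node in the cluster can report its own address.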

My ip-detect script is written to match my setup and IP range of 192.168.x.x, and it assumes your Linux network interface name is “ens192”. You can read more about it here:

https://dcos.io/docs/1.9/installing/custom/advanced/
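A minimal sketch of such a script, assuming the same interface name (ens192) and the 192.168.x.x range mentioned above. It is modeled on the example in the DC/OS docs rather than copied from my environment, so adjust the interface and the pattern to your own network.

```shell
#!/usr/bin/env bash
# Sketch only: writes an ip-detect script that assumes interface "ens192"
# and a 192.168.x.x address; change both to match your nodes.
mkdir -p genconf

cat > genconf/ip-detect <<'EOF'
#!/usr/bin/env bash
set -o nounset -o errexit
export PATH=/usr/sbin:/usr/bin:$PATH
# Print the first 192.168.x.x IPv4 address found on ens192
ip -4 addr show ens192 | grep -Eo '192\.168\.[0-9]{1,3}\.[0-9]{1,3}' | head -1
EOF
```

Running `bash genconf/ip-detect` on one of your nodes should print a single IP address and nothing else.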

By the end of this section, you should have two files under the genconf folder.

On the bootstrap node, download and run the DC/OS configuration script (this will take a few minutes to download).
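As a rough sketch, the download and run step could look like the following. The URL is the one the DC/OS 1.9 docs pointed to for the open source installer at the time, so treat it as an assumption and verify it against the current docs. The function name is mine, and the commands only execute when you pass `run` to the script.

```shell
#!/usr/bin/env bash
# Assumption: download location of the open source installer as given in the
# DC/OS 1.9 docs; check the docs for the link matching your version.
INSTALLER_URL="https://downloads.dcos.io/dcos/stable/dcos_generate_config.sh"

# Hypothetical helper: run it from the directory that contains genconf/
generate_install_files() {
  curl -fLO "$INSTALLER_URL"          # download the installer
  sudo bash dcos_generate_config.sh   # builds the install files under genconf/serve
}

# Nothing is downloaded unless the script is called with "run"
if [ "${1:-}" = "run" ]; then
  generate_install_files
fi
```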

We now need to deploy an nginx Docker container, which makes the installer files available to the other cluster nodes on your network.
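A sketch of that step, following the docker run command from the DC/OS docs. The port 8080 is a placeholder of mine; it has to match the port in your bootstrap_url. As before, the container only starts when you pass `run`.

```shell
#!/usr/bin/env bash
# Hypothetical port; must match the port used in bootstrap_url in config.yaml
BOOTSTRAP_PORT=8080

# Serve genconf/serve (created by the configuration script) read-only via nginx.
# Run from the directory that contains genconf/.
serve_installer() {
  sudo docker run -d -p "${BOOTSTRAP_PORT}:80" \
    -v "$PWD/genconf/serve:/usr/share/nginx/html:ro" nginx
}

if [ "${1:-}" = "run" ]; then
  serve_installer
fi
```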

Bootstrap is ready for DCOS installation – it’s snapshot time! (just the bootstrap)

Master & Agent Nodes Installation

Now that we have the bootstrap ready to provide the installation parameters, it’s time to install the nodes.

From the bootstrap node, SSH to each master node and run the installation with the master role (since we already configured ssh authorization, you should not be prompted to enter a password). Provide your own bootstrap FQDN and the port you entered in the config.yaml file.

If you followed my instructions, all tests should pass and after a few minutes the node will be promoted to “master”. Next, do the same on all your master nodes (don’t forget to exit the SSH session before logging in to the next node in line).

Next, do the same on the private agent nodes but make sure you change the node role to “slave”.
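The master and agent steps above can be sketched as one small script run from the bootstrap node. The dcos_install.sh commands follow the DC/OS docs; the bootstrap URL, the hostnames, the number of masters and the helper names are placeholders of mine. Nothing is executed unless you pass `run`.

```shell
#!/usr/bin/env bash
# Hypothetical values: replace with your bootstrap FQDN/port and node names
BOOTSTRAP_URL="http://bootstrap.example.local:8080"
MASTERS=(master01 master02 master03)
AGENTS=(agent01 agent02 agent03)

# Build the command line executed on a node; role is "master" or "slave"
install_cmd() {
  local role="$1"
  echo "mkdir -p /tmp/dcos && cd /tmp/dcos && curl -fO ${BOOTSTRAP_URL}/dcos_install.sh && sudo bash dcos_install.sh ${role}"
}

# SSH to each host in turn and run the installer with the given role
install_nodes() {
  local role="$1"; shift
  local host
  for host in "$@"; do
    ssh -t "$host" "$(install_cmd "$role")"
  done
}

if [ "${1:-}" = "run" ]; then
  install_nodes master "${MASTERS[@]}"   # masters first
  install_nodes slave "${AGENTS[@]}"     # then the private agents
fi
```

I would still check that the masters came up healthy before kicking off the agents, so in practice you may want to run the two loops separately.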

To make sure everything is fine and dandy, go to the Exhibitor URL at http://your-master01-fqdn:8181/exhibitor/v1/ui/index.html
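If you prefer the command line, a quick reachability check could look like this sketch. It only confirms that the Exhibitor page answers over HTTP, not that the masters are healthy, so you still want to look at the UI. The helper names are mine and the hostname is the same placeholder as in the URL above.

```shell
#!/usr/bin/env bash
# Build the Exhibitor UI URL for a given master host
exhibitor_url() {
  echo "http://$1:8181/exhibitor/v1/ui/index.html"
}

# Return success if the Exhibitor UI answers without an HTTP error
check_exhibitor() {
  curl -fsS -o /dev/null "$(exhibitor_url "$1")"
}

if [ "${1:-}" = "check" ]; then
  check_exhibitor "your-master01-fqdn" && echo "Exhibitor UI is reachable"
fi
```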

Log in to DC/OS

In case you haven’t realized it by now, you have a working DC/OS cluster on top of vSphere! 🙂

Open the main DC/OS UI web page at http://your-master01-fqdn/ and log in. Set up authentication (I used Google) and check that all your private agent nodes are in place.

In my case, a couple of components showed up as unhealthy: Navstar and Metronome. This is what happens, kids, when you forget to configure NTP 🙂

After fixing it and enabling NTP on my nodes, under the Components tab, everything was shown as healthy.

You now have a working DC/OS Cluster – it’s snapshot time!

Awesome work on getting your DC/OS cluster running on top of VMware vSphere!

In the next part, we will deploy DC/OS via Azure Container Service.
