For the GTC 2019 conference, IAS wanted to put together something to demonstrate the value of NVIDIA's DGX platform for artificial intelligence and machine learning. I had never done anything like this before, so I welcomed the opportunity to get my hands on the DGX Station and learn how to unleash the power of its Tesla V100 cards. I'm far from building Skynet, but it only took a few days to get the hang of everything and get something rolling.

To help others on this journey, I'm going to outline the process I used to get the DGX Station running a DIGITS Docker container, download the MNIST data set, and do a basic test of DIGITS with some image classification. This is not the only way to accomplish it, and my approach might be a little "dirty", but it worked and the results were there. So take this as a guide and build from there.

In our lab, everything is done via SSH up until we get to the DIGITS interface for building the data models and running our training. So to start, SSH into your DGX host. Additionally, you’re going to want to open a web browser and access the NVIDIA GPU Cloud (NGC) at https://ngc.nvidia.com.
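
If you're new to the environment, the SSH command takes the usual form shown below; the username and hostname here are just placeholders, so substitute your own:

ssh your_username@dgx-station.example.com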

The NVIDIA GPU Cloud is the repository for NVIDIA's Docker containers that have already been optimized and configured to leverage GPUs from within the container. You could set this up manually, but the process can be time-consuming, and one of the main values of the NGC is using those containers to skip to the head of the line when starting your AI projects. Why get tied down trying to get GPUs working in a framework like TensorFlow or Caffe when it's already been done for you?
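
One note: pulls from nvcr.io may require authentication. If the pull later in this guide gets rejected, log in to the registry from your SSH session first. The username is the literal string $oauthtoken and the password is the NGC API key you generate from your NGC account page:

docker login nvcr.io

When prompted, enter $oauthtoken as the username and paste your API key as the password.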

Now that we’re logged into both the SSH session on the DGX and the NGC, we’re going to get the DIGITS container onto the DGX. From within the NGC, search for “digits” and you should find a result for NVIDIA’s DIGITS.

You can click the download button in the top right corner to get the pull command copied to your clipboard, or you can click the name to see more details. From that detail page, you can hit the Pull button in the top right corner or the copy button to grab the command as well.

Switch over to your SSH session and paste the pull command to begin the container download. The version tag will change as the container is updated over time. FYI: it changes quite frequently.

docker pull nvcr.io/nvidia/digits:19.01-caffe
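
Once the pull finishes, a quick sanity check confirms the image is available locally (your tag may differ from mine):

docker images | grep digits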

A little about DIGITS while that is downloading. Quite simply, DIGITS is a web-based interface for Caffe and TensorFlow that NVIDIA developed so end users can jump into deep learning faster. By providing a simple web interface, it lets data scientists get right into training models. It's mainly there to speed up the initial process, but it's a great way to test your DGX and handy for demos.

Once the container has been downloaded, we're going to need to set up a location for our MNIST data set. This part is a little wonky. Since these are containers, the content within the container is non-persistent, but we don't want to re-download the MNIST set every time we want to play with DIGITS. The data set is small and fits locally, but you can put it anywhere the container can access through the local Linux system; NFS or local disk works best. I put it locally for demo purposes, so I created the following folder path:

/data/mnist

I also need to make a workspace directory for DIGITS to save my work. I created the following path as well:

/data/digits/jobs
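
Assuming you're using the same paths I did, both directories can be created in one shot from your SSH session (adjust the paths if you're putting things elsewhere, such as on an NFS mount):

sudo mkdir -p /data/mnist /data/digits/jobs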

Additionally, there's another catch. You could just download the MNIST data set and copy it over to that location, but if you want to test multiple images later, you're going to need a text file with the path information for every image you want to run the test on. The container includes a Python script that generates this for you at download time, but since it runs inside the container, we want to route the download to a local or mounted folder outside of the container so we only have to do this once.

With that said, let's get the container started. At the time of this writing, I was using the 19.01-caffe tag of the DIGITS container.

nvidia-docker run --name digits --rm -d -p 8888:5000 -v /data/mnist:/data/mnist -v /data/digits/jobs:/workspace/jobs nvcr.io/nvidia/digits:19.01-caffe

Let’s review the command quickly:

  • run = run the container
  • --name = the name the container will be called (this is not the image name)
  • --rm = remove the container when it's terminated
  • -d = run the container in the background and print the ID
  • -p = create a port map from 8888 (external) to 5000 (internal to the container). So when I go to the webpage, I'm accessing it on port 8888, but internally it's configured by NVIDIA to run on port 5000.
  • -v = create a volume map from the local file system to the container file system. The first half is the local file system path and the second half is the path inside the container.
  • nvcr.io/nvidia/digits:19.01-caffe = this is the name of the image we are launching as a container and its tag. (Note: your version may be different.)
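
As a side note, if your DGX is running a newer Docker release (19.03 or later) with the NVIDIA Container Toolkit, the nvidia-docker wrapper is typically replaced by the --gpus flag, and the equivalent command would look something like this:

docker run --gpus all --name digits --rm -d -p 8888:5000 -v /data/mnist:/data/mnist -v /data/digits/jobs:/workspace/jobs nvcr.io/nvidia/digits:19.01-caffe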

Now it’s time to download those MNIST images. First, we need to get into the container to run a command.

docker exec -it digits /bin/bash

Once inside the container, run the following to download the MNIST images. Change the last part to the folder you want the images to land in if your path is different than mine.

python -m digits.download_data mnist /data/mnist

It will take a little while depending on your internet connection; the MNIST data set contains 70,000 images in total (60,000 for training and 10,000 for testing) and is a relatively small download. When the download is complete, verify that you have images in the /data/mnist directory. You should also see new sub-directories for training and testing.
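
A quick way to verify from your SSH session (outside the container) is to list the directory. The exact layout may vary slightly by DIGITS version, but you should see the training and testing sub-directories along with the generated file lists:

ls /data/mnist
ls /data/mnist/train | head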

Switch over to your web browser and access your DGX system by going to http://IP_Address_Or_URL:8888. You should be presented with the DIGITS webpage.
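
If the page doesn't load, a quick check from the DGX itself can confirm the web server inside the container is responding on the mapped port (this assumes the 8888 mapping used above):

curl -I http://localhost:8888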


Looks like it’s working. Let’s add our dataset. Click the Datasets tab and then click Images on the right followed by Classification.

Under Image Type, select Grayscale and use an image size of 28×28. Leave Squash as the resize transformation. For the Training Images path, enter your MNIST directory path with the "train" folder appended (in my case, /data/mnist/train). Leave all of the remaining fields as they are except for the very last field, "Dataset Name". Put anything you want for the dataset name.

Watch the magic happen! You're now building your first dataset! From my observations, only the first V100 card is used during this process.


When it's complete, which took less than a minute on my DGX Station, your first dataset is ready. You can look at the results on that page and see how quickly the set was built. Now we'll move on to the model to see if we can do some image classification.

Click the DIGITS button at the top of the webpage and select Models. From the blue Images drop-down box, select Classification.

Select your dataset from the top left box. Leave all of the values as they are and scroll down to the list of neural networks. From the Standard Networks, select LeNet. Then select how many GPUs you want to use; you can use just one or as many as you have, and you can also pick specific GPUs if you have multiple. Finally, give the model a name and click Create.

At that point, the training model job will begin using the GPUs that you have specified. On my DGX Station, this takes less than a minute to complete. The result looks something like this:


So let’s test it out. Scroll down on that page to the Trained Models section. There, you can test how well the training did by testing either a single image or set of images.

To test a single image, you can either browse the "test" folder in the MNIST dataset you downloaded and pick one of the images, or click the Browse button to choose an image from your own computer to analyze. If you want to use one of your own, you'll need a 28×28-pixel image of a number with a transparent background; MS Paint or another free paint program can do this. Select your image, then check the box for "Show visualizations and statistics". When ready, click "Classify One".
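
If you'd rather prepare your own image from the command line and happen to have ImageMagick installed, something like the following will convert an arbitrary image to grayscale and resize it down to 28×28 (my_digit.png is just a placeholder filename; substitute your own):

convert my_digit.png -colorspace Gray -resize 28x28 my_digit_28.png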

DIGITS will then attempt to identify the image and show you the results. Neat!


For testing a list of images, you'll need the TXT file that was created as part of the Python download script. Use that TXT file as the source for the list of images. If you need to modify the paths for some reason, any decent text editor can do a quick find and replace to fix them (or use the one-liner below). Click the Browse button on the right to Upload Image List and then click "Classify Many".
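
If you do need to rewrite the paths in bulk, sed handles it from the SSH session just as well as a text editor. The prefixes and the filename here are purely illustrative, so substitute the old and new paths from your own file:

sed -i 's|/old/path/prefix|/data/mnist/test|g' test.txt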

That’s it!

With that, you've deployed NVIDIA DIGITS, downloaded and prepared a dataset, built the training dataset, created the training model, and finally tested it to make sure it works.

I will note that using the canned test images is great, but for fun you should really create your own image and try to identify it. It will give you more realistic results in your testing phase because it won’t be perfect.

Thanks for your time and I hope that you found this information useful. If you have any questions, please use the comment section below and if you want to get your hands on a DGX or learn more about how they can help your business, please let me know.
