Blog

  • Background v1

    Kimolbran is a complex world in the outskirts of the galaxy. There are only a few reasons to come and live here, and all of them are dangerous.

  • Ansible Tower Upgrade – Quick Note on EC2 Sizing

    Last week I started on a personal project to get Ansible Tower upgraded in our lab. Our Tower deployment actually runs in AWS as an EC2 instance with some EBS backed storage. It’s a simple and common setup for folks that are running Tower. There wasn’t much excitement with this upgrade from 3.1 to 3.4 except for one minor hiccup.

    We typically run lean and mean in the lab, using only what we really need resource-wise. The AWS instance for Tower is an m3.medium, which gives a single vCPU and 3.75 GB of RAM. Ansible Tower requires 2 GB RAM minimum but has a recommended size of 4 GB RAM. Thus, the m3.medium seemed to be the ideal EC2 instance type for this kind of testing workload.

    While the original 3.1 deployment had no issues with the m3.medium, the upgrade failed several times and quit, reporting the following error:

    TASK [preflight : Preflight check - Fail if this machine lacks sufficient RAM.] **********************************************************************
    fatal: [localhost]: FAILED! => {"changed": false, "msg": "This machine does not have sufficient RAM to run Ansible Tower."}
    	to retry, use: --limit @/home/ec2-user/ansible-tower-setup-3.4.3-1/install.retry
    

    Doing some quick research, it seems that the requirements had changed and now call for a beefier server for Tower. The new requirements are 2 vCPU and 4 GB RAM minimum for the installation of Tower.

    To address this issue, I shut down the Tower instance and changed the instance type from m3.medium to m3.large to attempt the installation again. This worked successfully and Tower was able to get up and running.

    For a test, I shut down the Tower instance again and changed it back to an m3.medium to see if it would work. It did. It was a little sluggish when launching, but I noticed nothing after that. I would say it’s borderline but sufficient for testing or lab work.

    Looking at other instance types, there are only a few options for a cheap instance type that meets the requirements. I highly recommend the following link for searching the different instance types and their associated costs.

    https://www.ec2instances.info/

    The t3.medium is the best choice right now for this, giving 2 vCPU and 4 GB RAM, with the ability to burst as well. It is the cheapest in the 2 by 4 category, at least at the time of this article. Always check the latest prices before committing to the cloud, folks. 🙂
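
    To make the sizing comparison concrete, here is a minimal Python sketch that checks the three instance types mentioned in this post against the 2 vCPU / 4 GB minimum. The function and class names are made up for illustration, and this is not how the Tower installer performs its own preflight check; it only mirrors the arithmetic.

```python
from dataclasses import dataclass

# Minimums quoted in this post for the Tower 3.4 installer.
MIN_VCPU = 2
MIN_RAM_GB = 4.0


@dataclass(frozen=True)
class InstanceType:
    name: str
    vcpu: int
    ram_gb: float


# Specs for the three instance types discussed in this post.
CANDIDATES = [
    InstanceType("m3.medium", 1, 3.75),
    InstanceType("m3.large", 2, 7.5),
    InstanceType("t3.medium", 2, 4.0),
]


def meets_minimum(instance: InstanceType) -> bool:
    """Return True when the instance satisfies both minimums."""
    return instance.vcpu >= MIN_VCPU and instance.ram_gb >= MIN_RAM_GB


def eligible(candidates):
    """Names of the instance types that pass the check, in input order."""
    return [c.name for c in candidates if meets_minimum(c)]


if __name__ == "__main__":
    for c in CANDIDATES:
        status = "ok" if meets_minimum(c) else "too small"
        print(f"{c.name}: {c.vcpu} vCPU, {c.ram_gb} GB RAM -> {status}")
```

    The m3.medium misses on both counts (1 vCPU, 3.75 GB), which lines up with the installer refusing to run on it.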

    Thanks!

  • Liliana and the Family

    Again, another long delay in getting updates out about the family. I really need to get better at this.

    The big news was the birth of our daughter Liliana. She’s happy and healthy and, as I write this, is now just over four months old. Recently, it’s been about rolling over and trying to get some forward motion going. Overall though, it’s more like barrel rolling. The only downside is that she sometimes hates being on her stomach at night, and that makes for horrible sleep for Mommy and Daddy.

    There’s the little one at three months. Big eyes like her Dad. With some luck, she won’t have my stare down eyes look.

    On the Jameson side of the house, he’ll be starting Little Kickers. We just did our first trial and we think he really enjoyed it. It was quite fun to watch him engage with the class and do some of the exercises. I think it’s still a bit of a challenge to wrangle a bunch of two-year-olds into doing specific activities, but it does seem possible with some effort and a lot of luck. But I couldn’t be prouder of him for trying out something new.

    The last little update was that Melissa and I celebrated our fourth wedding anniversary. This year was silk or linen apparently. I gave her some red silk pajamas and she gave me a hammock! I’ve never owned a hammock but I’m excited to have one. I just need to get a place to hang it.

    That’s it for the family update. Perhaps I’ll get these out more frequently.

  • Getting Started with NVIDIA Containers and DIGITS

    For the GTC 2019 conference, IAS wanted to put together something to demonstrate the value of NVIDIA’s DGX platform for artificial intelligence and machine learning. I’ve never actually done anything like this before, so I welcomed the opportunity to get my hands on the DGX Station and learn how to unleash the power of Tesla V100 cards. I’m far from building SkyNet, but it only took a few days to get the hang of everything and get something rolling.

    To help others on this journey, I’m going to outline the process I used to get the DGX Station running a DIGITS docker container, download the MNIST data set, and do the basic test of DIGITS with some image classification. This is not the only way to accomplish this and this way might be a little “dirty”, but it worked and the results were there. Thus, take this as a guide and move on from there.

    In our lab, everything is done via SSH up until we get to the DIGITS interface for building the data models and running our training. So to start, SSH into your DGX host. Additionally, you’re going to want to open a web browser and access the NVIDIA GPU Cloud (NGC) at https://ngc.nvidia.com.

    The NVIDIA GPU Cloud is the repository for all of the NVIDIA Docker containers that have already been optimized and configured to leverage GPUs from within Docker. You can do this manually, but the process can be time-consuming, and one of the main values of the NGC is using those containers to skip to the head of the line when it comes to starting your AI projects. Why get tied down trying to get GPUs to work in an application like TensorFlow or Caffe when it’s already been done for you?

    Now that we’re logged into both the SSH session on the DGX and the NGC, we’re going to get the DIGITS container onto the DGX. From within the NGC, search for “digits” and you should find a result for NVIDIA’s DIGITS.

    You can click the download button in the top right corner to get the URL copied to your clipboard, or you can click the name to get some more details. From that new window, you can hit the Pull button in the top right corner or the copy button to grab the URL as well.

    Switch over to your SSH session and paste the pull command to begin the container download. The version of Caffe will change as it’s updated over time. FYI – It changes quite frequently.

    docker pull nvcr.io/nvidia/digits:19.01-caffe

    A little about DIGITS while that is downloading. Quite simply, DIGITS is a web based interface for Caffe and TensorFlow that NVIDIA developed for end users to jump into deep learning faster. By creating a simple web interface, data scientists get right into learning. It’s just to help speed up the initial process, but it’s a great way to test your DGX and good for demos.

    Once the container has been downloaded, we’re going to need to set up a location for our MNIST data set. This part is a little wonky. Since these are containers, the content within the container is non-persistent, but we don’t want to keep downloading the MNIST set every time we want to play with DIGITS. The data set is small and can fit locally, but you can put it anywhere that the container has access to via the local Linux system. NFS or local is best. I put it locally for demo purposes, so I created the following folder path:

    /data/mnist

    I also need to make a workspace directory for DIGITS to save my work. I created the following path as well:

    /data/digits/jobs
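
    If you would rather script the folder creation, a minimal Python sketch follows. The helper name make_digits_dirs is made up for illustration; pass Path("/data") as the root to get the two paths above (that will likely need elevated permissions on the host).

```python
import tempfile
from pathlib import Path


def make_digits_dirs(root: Path) -> list:
    """Create the data set and job folders under root and return them."""
    paths = [root / "mnist", root / "digits" / "jobs"]
    for path in paths:
        # parents=True builds intermediate folders; exist_ok makes reruns safe.
        path.mkdir(parents=True, exist_ok=True)
    return paths


if __name__ == "__main__":
    # Demo against a throwaway folder so nothing on the host is touched.
    with tempfile.TemporaryDirectory() as tmp:
        for created in make_digits_dirs(Path(tmp)):
            print(created)
```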

    Additionally, there’s another catch. You can just download the MNIST data set and copy it over to that location, but if you want to test multiple images later, you’re going to need a text file that has the path information for all of the images you want to run the test on. The container has a Python script to do this for you at the time of download, but it’s in the container and therefore we want to route the download to the local or mounted folder outside of the container so we only have to do this once.

    With that said, let’s get the container started. At the time of this writing, I was using Caffe 19.01.

    nvidia-docker run --name digits --rm -d -p 8888:5000 -v /data/mnist:/data/mnist -v /data/digits/jobs:/workspace/jobs nvcr.io/nvidia/digits:19.01-caffe

    Let’s review the command quickly:

    • run = run the container
    • --name = the name the container will be called (this is not the image name)
    • --rm = remove the container when it’s been terminated
    • -d = run container in background and print the ID
    • -p = create a port map from 8888 (external) to 5000 (internal to the container). So when I go to the webpage, I’m accessing it on port 8888, but internally it’s configured by NVIDIA to run on port 5000.
    • -v = create a volume map from the local file system to the container file system. The first half is the local file system path and the second is the path in the container.
    • nvcr.io/nvidia/digits:19.01-caffe = this is the name of the image we are launching as a container and its tag. (note your version may be different)
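
    If you end up launching the container from a script, it can help to build the same command from its parts so the port and volume maps are easy to change. This is a minimal Python sketch under that assumption; digits_run_command is a made-up helper name, and the sketch only prints the command rather than running it.

```python
import shlex


def digits_run_command(
    image,
    host_port=8888,
    data_dir="/data/mnist",
    jobs_dir="/data/digits/jobs",
    name="digits",
):
    """Return the nvidia-docker argument list used in this post."""
    return [
        "nvidia-docker", "run",
        "--name", name,                        # container name, not the image name
        "--rm",                                # remove the container once it stops
        "-d",                                  # run in the background
        "-p", f"{host_port}:5000",             # host port -> DIGITS port in the container
        "-v", f"{data_dir}:/data/mnist",       # local data set path -> container path
        "-v", f"{jobs_dir}:/workspace/jobs",   # local jobs path -> container path
        image,
    ]


if __name__ == "__main__":
    cmd = digits_run_command("nvcr.io/nvidia/digits:19.01-caffe")
    print(shlex.join(cmd))
```

    The resulting list could be handed to subprocess.run on the DGX host; keeping the arguments as a list avoids shell quoting problems if a path ever contains spaces.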

    Now it’s time to download those MNIST images. First, we need to get into the container to run a command.

    docker exec -it digits /bin/bash

    Once inside the container, run the following to download the MNIST images. Change the last part to the folder you want the images to land in if your path is different than mine.

    python -m digits.download_data mnist /data/mnist

    It will take a bit depending on your internet connection. The MNIST data set is small: 70,000 images in total, 60,000 for training and 10,000 for testing. When the download is complete, verify that you have images in the /data/mnist directory. You should also see new sub-directories for training and testing.
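
    A quick way to do that verification is to count the image files under each sub-directory. The Python sketch below assumes the images are stored as PNG or JPEG files somewhere beneath the sub-directories; count_images is a made-up helper name, so adjust the suffix list if your download looks different.

```python
import tempfile
from pathlib import Path

# Assumed image file extensions; extend this set if needed.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def count_images(root: Path) -> dict:
    """Map each immediate sub-directory of root to its image file count."""
    counts = {}
    for sub in sorted(p for p in root.iterdir() if p.is_dir()):
        counts[sub.name] = sum(
            1
            for f in sub.rglob("*")
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts


if __name__ == "__main__":
    # Demo on a throwaway folder; use count_images(Path("/data/mnist")) on the host.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "train" / "7").mkdir(parents=True)
        (root / "train" / "7" / "sample.png").touch()
        (root / "test").mkdir()
        print(count_images(root))
```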

    Switch over to your web browser and access your DGX system by going to http://IP_Address_Or_URL:8888. You should be presented with the DIGITS webpage.


    Looks like it’s working. Let’s add our dataset. Click the Datasets tab and then click Images on the right followed by Classification.

    Under image type, select Greyscale and use an image size of 28×28. Leave Squash as the resize transformation. Additionally, enter your MNIST directory path, adding the “train” folder, for the Training Images path. Leave all of the remaining fields the same except for the very last field for “Dataset Name”. Put anything you want for the dataset name.

    Watch the magic happen! You’re now training for your first data model! (Maybe). From my observations, only the first V100 card is used in this process.


    When it’s complete, which took less than a minute on my DGX Station, you’re done training your first data set. You can look at the results within that page and see how fast your set was trained. Now, we’ll move onto the model to see if we can do some…….

    Click the DIGITS button at the top of the webpage and select Models. From the Images blue drop down box, select Classification.

    Select your dataset from the top left box. Leave all of the values as they are and scroll down to the neural networks to use. From the Standard Networks, select LeNet. Then, select how many GPUs you want to use. You can use only one or as many as you have. You can also select specific GPUs if you have multiple. Finally, give the model a name and click Create.

    At that point, the training model job will begin using the GPUs that you have specified. On my DGX Station, this takes less than a minute to complete. The result looks something like this:


    So let’s test it out. Scroll down on that page to the Trained Models section. There, you can test how well the training did by testing either a single image or set of images.

    To test a single image, you can either browse the “test” folder in the MNIST dataset that you downloaded and just choose one of the images, or you can click the Browse button to choose an image from your computer to analyze. If you want to use one of your own, you’ll need a 28×28 pixel image of a number with a transparent background. You can use MS Paint or some other free paint program to do this. Select your image and then click the checkbox for “Show visualizations and statistics”. When ready, click “Classify One”.

    DIGITS will then attempt to identify the image and show you the results. Neat!


    For testing a list of images, you’ll need the TXT file that was created as part of the python download script. Use that TXT file as the source for the list of images. If you need to modify the path for some reason, any decent TXT editor can help you do a quick find and replace to fix the path. Click the Browse button on the right to Upload Image List and then click “Classify Many”.
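
    If you would rather script that find and replace, here is a minimal Python sketch. It assumes each line of the TXT file starts with the image path (optionally followed by a label), and the file names in the demo are hypothetical; rewrite_prefix is a made-up helper name.

```python
def rewrite_prefix(lines, old_prefix, new_prefix):
    """Swap old_prefix for new_prefix at the start of each line.

    Lines that do not start with old_prefix are returned unchanged.
    """
    out = []
    for line in lines:
        if line.startswith(old_prefix):
            line = new_prefix + line[len(old_prefix):]
        out.append(line)
    return out


if __name__ == "__main__":
    # Hypothetical entries; a real list comes from the download script.
    sample = [
        "/data/mnist/test/7/example_a.png 7",
        "/data/mnist/test/2/example_b.png 2",
    ]
    for line in rewrite_prefix(sample, "/data/mnist", "/mnt/nfs/mnist"):
        print(line)
```

    Matching only at the start of the line keeps the replacement from touching a label or any later part of the path that happens to contain the same text.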

    That’s it!

    With that you’ve deployed NVIDIA DIGITS, got a dataset downloaded and prepared, built the training data set and then created the training model, and finally tested to ensure it works.

    I will note that using the canned test images is great, but for fun you should really create your own image and try to identify it. It will give you more realistic results in your testing phase because it won’t be perfect.

    Thanks for your time and I hope that you found this information useful. If you have any questions, please use the comment section below and if you want to get your hands on a DGX or learn more about how they can help your business, please let me know.

  • Resetting Root for iDRAC inside a VRTX

    It amazes me how confusing Dell’s troubleshooting website is, or perhaps just how difficult it is to find useful information on it. All I wanted to do was reset the Root account on an iDRAC for an M520 blade within a VRTX. I was remote from the system and could access the Chassis Management Controller (CMC), but I must have fat-fingered the password reset the last time I was in one of the blades.

    Well, there is a way to do this, but it was difficult to find the solution on Dell’s site. Thus, once I figured it out, I figured I would note this down for others. Perhaps it will save someone else the headache of finding this info. For the record, these actions were taken on a VRTX running 3.2 of the CMC, so it might be different in newer or older versions.

    First, log into the CMC and go to the Server Overview option in the left menu, then the Setup tab with the iDRAC sub-tab selected. You’ll see an option for the default iDRAC Root Password. Enter the password you want the blade(s) to have. Be sure to click Save QuickDeploy Settings under that section.


    Second, scroll down to the bottom of that same page and locate the blade you wish to reset. Check the box for “Change Root Password” and click the Apply iDRAC Network Settings.

    Done.

    Now you can access the iDRAC as normal. I hope that helps.

    Thanks!

  • Ansible Tower Demo

    I mentioned in my recent NetApp Insight post that we created an Ansible Tower demo to show off at Insight this year. The Tower demo took the playbooks we created for Ansible Engine to spin up EC2 instances in AWS and clone a volume hosted on Cloud Volumes ONTAP, and with some slight tweaking, adjusted them to work in Ansible Tower as a Workflow.

    Like before, I wanted to record this demo so that others could see how the whole process worked. Because Tower is more of a repo-friendly solution for environments, we also created a GitHub repo for both the Ansible Engine playbooks and the Ansible Tower playbooks. They are slightly different, mostly because of how inventories are handled in Tower vs Engine. Additionally, there are some differences in how we handle variables and pass them from one playbook to another.

    I hope this demo is helpful for others to get going on their Ansible journey. If you want to check out the playbooks, you can find them on our IAS GitHub repository at the following location:

    https://github.com/IAS-Lab/tower

    Thanks for your time and if you like, please share and comment.

     

    Matt

     

  • NetApp Insight 2018 – IAS Update and Demos

    Hey everyone. I hope you’re all doing well as always.

    NetApp Insight 2018 was last week and for the first time, I was able to attend. There was some concern that my wife might go into labor while I was gone, but the little one was nice enough to hang out through the conference. With that said, let’s review some of the IAS stuff we did at Insight.

    IAS had a speaking session on Tuesday that had a solid turnout from attendees. The focus was on our updated Ansible demo that shows Ansible working its magic on EC2 instances and NetApp Cloud ONTAP. Shawn Hamby also did a great overview of Cloud Volume Service and Cloud Volumes ONTAP, and the benefits of each. We also showed a little demo on Ansible Tower and how the entire set of playbooks we had already done in Ansible Engine could be combined into a single Tower Workflow. I think attendees were pleased that the session was both informative and technical, and actually showed a live demo.

    Additionally, IAS was in booth #305 at the expo for the first time as well. Conveniently located near the eating area and the NetApp Store, we saw a pretty solid flow of customers, partners and NetApp employees swing by. Attendees were able to check out all of the demos we have on Ansible, Ansible Tower, Red Hat Virtualization, and even a physical NetApp HCI unit that seemed to be one of the few actually at the conference. Most importantly, there was candy swag to hand out and a really cool giveaway.

    Folks that know me can easily tell you that I love my gaming. There’s no doubt about that. So I worked with the IAS Marketing team to come up with something awesome for a daily prize at the conference. Every day at the IAS booth, we had a chance for attendees to win an Oculus Rift VR System!!!

    While the big prize was popular, I think the NetApp HCI was actually the most popular item in the booth. A lot of people had not actually seen the guts of the NetApp HCI unit or heard about the deployment considerations and details that aren’t generally covered in sessions and keynotes.

    Overall, the show was a blast and our IAS event at the Skyfall Lounge in the Delano was definitely a nice reward after several days in the booth. I had a really great time and felt the show was a success from an IAS perspective. I’ll finish up with a few special thanks to Shawn Hamby and Robert Nelson for working the speaking session with me; Drew Mogan for allowing me to attend the conference this year and giving me the time to set up the demos with Robert; and most importantly Ali Vega for a lot of the marketing around the booth, the session and the show (she is what makes these events really successful). And a final quick thanks to NetApp for having us at the expo and giving us the session.

    Thanks everyone and I hope to be there again next year!

     

    Matt

     

  • Ansible Demo v2 – Scaling to AWS EC2 using Red Hat Ansible and Leveraging NetApp Cloud ONTAP

    Hey folks. I hope everyone is doing well.

    A real quick update on the Ansible Demo that we built several months ago for Red Hat Summit in San Francisco. The original demo was built mostly using the RAW module because, at the time the demo was created, NetApp had not yet published modules that would be relevant to it. However, that’s all changed over the past few months and now NetApp has more Ansible modules than any other storage vendor. Hats off to David Blackburn and his team for working on it and keeping the updates rolling.

    With all of the changes, we decided to make a version two of the Ansible demo to showcase more NetApp modules than RAW commands. We still cannot completely move away from RAW commands as things, like collecting ONTAP version information and setting a Junction Path during a clone operation, are just not 100% available yet via a module. But at least we could incorporate a few modules to show the process.

    The version 2 video is posted below, so please check it out. Additionally, a second post is coming with my demo of Ansible Tower, which takes the entire Ansible demo and puts it into a Tower Workflow! So while I could put the Ansible demo in one large playbook, the Tower Workflow is much cleaner and offers a lot more in terms of flexibility and scale. Check that out as well!

    Thanks, as always, for checking out my site and videos, and let me know if you have any questions or comments below.

    Matt

  • NetApp Virtual Storage Console – Migrating to a New Server

    I ran into this with a customer this week and felt it was a good note to point out for those that are migrating customers or themselves to a new version of vCenter. In this particular case, the customer was upgrading from vCenter 5.1 to 6.5 through another 5.5 upgrade, and additionally was going to migrate from the Windows-based vCenter to the VCSA.

    Like many NetApp customers that have been using NetApp as their datastore storage platform, the customer had the Virtual Storage Console (VSC) plugin for vCenter setup and running many backup jobs. The VSC software is installed on a Windows server and typically, customers that were running vCenter on Windows also installed the VSC on the same Windows server. This is fine except when the upgrade path will take the customer from the Windows-based vCenter to the appliance, where the vCenter migration will shut down the existing vCenter to make use of the original IP address and FQDN. In turn, this kills the VSC server at the same time.

    The best practice would be to migrate the VSC to a new server prior to doing the upgrades, or at least prior to the migration from the Windows-based vCenter to the appliance. The NetApp article linked below still works in this scenario. But in short, you’re going to want to take a backup copy of the Repository directory that resides in the Virtual Storage Console directory under Program Files (C:\Program Files\NetApp\Virtual Storage Console\SMVI\server\repository).

    With that backup, you just need to make sure that you stop the two services for NetApp VSC, copy the directory to the new server, and start the services back up. I did not need to restart the vCenter server or services, but your mileage may vary as it is mentioned in the NetApp article.
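
    The copy step itself can be scripted. Below is a minimal Python sketch that copies a repository folder into a destination folder and refuses to overwrite an existing copy. The helper name copy_repository is made up for illustration; stopping and starting the VSC services is left as a manual step here, and the demo runs against throwaway folders rather than the real Program Files path.

```python
import shutil
import tempfile
from pathlib import Path


def copy_repository(source: Path, dest_parent: Path) -> Path:
    """Copy the source folder into dest_parent and return the new path.

    shutil.copytree raises FileExistsError if the target already exists,
    so an earlier copy is never silently overwritten.
    """
    target = dest_parent / source.name
    shutil.copytree(source, target)
    return target


if __name__ == "__main__":
    # Demo with temporary folders standing in for the old and new servers.
    with tempfile.TemporaryDirectory() as tmp:
        old = Path(tmp) / "old" / "repository"
        old.mkdir(parents=True)
        (old / "example.dat").write_text("backup metadata placeholder")
        new_parent = Path(tmp) / "new"
        new_parent.mkdir()
        print(copy_repository(old, new_parent))
```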

    https://kb.netapp.com/app/answers/answer_view/a_id/1030422

    Thanks and good luck!

    Matt

     

  • Family Update

    It’s been a while since I posted something about the family. That’s because it’s been an extremely busy year thus far and I have neglected to post on the blog. But let’s give a quick update.

    Jameson is getting so big and really moving around now. Weighing in at just over 25 pounds, he’s starting to pick up new talents like throwing balls and moving around on his own feet while sitting in his little car or on his construction truck. It’s still a wonder to see how humans advance over time. Just when you think things are moving slowly, it suddenly jumps into fast mode with several new things happening at once.

    He also cut a new tooth and we feel there are probably a few others coming in. You have to feel bad for little ones when they can’t express themselves verbally and there is something going on that’s probably painful. It breaks your heart in a way, but you know they’ll get through it because you did, and with less medical science. Hopefully, this will all happen quickly for him.

    Lastly, he had his first haircut. I like how it looks, but I agree with my wife that I do miss his long hair at times. She really misses the long hair and makes sure to point that out to me frequently because it’s apparently my fault. 🙂 We’ll probably let it grow out and I’ll let her choose the next haircut appointment.

    The last bit of news is great as well. Melissa and I are expecting another baby. This time near Thanksgiving. We apparently like having kids near the holidays. This time around we’re having a girl. We are very excited. Hopefully, I’ll have the energy to handle this. 😀

    That’s it for now. All the best.

    Matt