If you’re thinking of buying or building your own deep learning machine with a nice GPU for training, you may have come across Andrej Karpathy’s tweet about his deep learning rig build, which is a little dated now (it was published in Sep. 2015), or more recently Lukas Biewald’s Build a super fast deep learning machine for under $1,000, published on Feb. 1, 2017. I happened to start training some CNN and RNN models on my own machine in mid-January, and I’m pretty happy with what I ended up with: a very cost-effective Dell PC (Intel i7-6700 3.4GHz CPU, 16GB memory, 2TB SATA hard drive, 4GB Nvidia GTX 745) for $849, plus an 8GB Nvidia GTX 1070 GPU for $388, for a total of $1,237. That’s $237 more than Lukas’s machine, but his build has a 3GB GTX 1060 (about $200 less than the 8GB GTX 1070, and the 1070 is about 50% faster than the 1060), 8GB memory ($129), a 1TB hard drive ($50), and an Intel i5-6600 3.3GHz CPU. So for $237 more you get roughly $400 more in value, you only need to swap the GPU instead of building the whole machine from scratch, and the Dell comes with Windows 10 Professional, which may not sound sexy or geeky, but its remote desktop feature can definitely make your life more enjoyable (I’ll talk more about this later).

Here’s a quick summary of my past month of experience setting up my own deep learning machine:

First, I bought a Dell XPS Tower with an 8GB Nvidia GTX 1070 from Costco for $1,499 (after a $150 discount off its then-current price of $1,649, in early January).

I tried to install Ubuntu alongside Windows 10 for a couple of days, battling with BIOS, UEFI, partitions, etc., but couldn’t get it done. Meanwhile, installing TensorFlow on Windows (both its GPU and CPU versions, using Anaconda, for performance comparisons) was a breeze, as was installing the Nvidia CUDA toolkit and cuDNN (a quick sketch of the commands is below). I had planned to install Torch and Caffe along with TensorFlow on Ubuntu, but I was getting more and more comfortable with TensorFlow and found that some well-known projects such as Andrej’s char-rnn and Justin Johnson’s Neural Style have already been ported to TensorFlow, so I decided to stay with Windows and TensorFlow for now. As for Ubuntu, I do hope Madonna is right: “It takes more strength to cry, admit defeat.”
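
To give a concrete idea, here’s a minimal sketch of the kind of Anaconda Prompt commands involved, assuming Anaconda and the CUDA toolkit plus cuDNN are already installed; the environment names tf-gpu and tf-cpu are just my own illustrative choices, and package versions will of course drift over time:

    REM separate environments for the GPU and CPU builds of TensorFlow (64-bit Python 3.5)
    conda create -n tf-gpu python=3.5
    activate tf-gpu
    pip install tensorflow-gpu

    conda create -n tf-cpu python=3.5
    activate tf-cpu
    pip install tensorflow

    REM sanity check: the GPU build should list the GTX 1070 as a local device
    python -c "from tensorflow.python.client import device_lib; print(device_lib.list_local_devices())"

Keeping the GPU and CPU builds in separate environments makes it easy to run the same training script against either one for performance comparisons.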

Then I found another Dell on Costco for $650 less, at only $849. It has the same 16GB memory and 2TB hard drive, but a slightly slower 3.4GHz CPU and a 4GB Nvidia GTX 745 GPU. I was curious to run some CNN and RNN training and compare the performance of the GTX 1070 and the GTX 745, and found that, when training a couple of neural style models and a fast neural style model, the published GTX 745 vs. 1070 benchmark is pretty accurate: the 1070 is indeed 7-8 times faster. By the way, you can also see from the GPU benchmark site that the 1080 is about 26% faster than the 1070, for about $200 more.

It was obvious that the GTX 745 is way too slow compared with the 1070, so I bought an EVGA GeForce GTX 1070 from Amazon for $388. Replacing the GTX 745 that came with the Dell machine with the 1070 was pretty easy, although I did wonder for a moment whether the 1070 would fit in the Dell case, since it’s several times the size of the 745!
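
One quick way to confirm the swap took, as a small hedged aside: the Nvidia driver ships with the nvidia-smi utility, which lists the detected GPU along with its memory and current utilization.

    REM nvidia-smi comes with the Nvidia driver; on Windows it typically lives under
    REM C:\Program Files\NVIDIA Corporation\NVSMI if it isn't already on the PATH
    nvidia-smi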

Not surprisingly, running the training on the two Dell machines, both now with a GTX 1070 GPU, takes about the same time. But there’s another nice benefit that comes with the cheaper Dell machine: it ships with Windows 10 Professional, while the more expensive one only comes with Windows 10 Home. I really like being able to control my Windows PC remotely from my MacBook Air or Pro when I prefer to lean back and give my neck a rest, and only Windows 10 Professional lets your PC be controlled remotely this way: just go to Apple’s App Store and download the free Microsoft Remote Desktop app. There are some other remote control solutions that may work with Windows 10 Home, but I don’t feel too comfortable with them.

Some final notes on GPU/CPU performance tests:

  • Running python neural_style.py from the TensorFlow Neural Style project takes about 4 minutes on the GTX 1070, compared to 26 minutes on the GTX 745 and 2.5 minutes on a Titan X (Pascal), as reported by the project author. This is consistent with the GPU benchmark of the 1070 vs. the Titan X Pascal, but the Titan X Pascal is about $1,100 more expensive than the 1070. Even the Titan X Maxwell (the previous GPU architecture generation before Pascal), which is slightly slower than the 1070, is $700 more expensive.

  • Running python train.py from the TensorFlow char-rnn project takes 0.039 seconds per batch on the GTX 1070, 0.063 seconds per batch on the GTX 745 (though, as shown above, the gap is much wider for the CNN training), 0.094 seconds on the $849 machine’s Intel i7-6700 3.4GHz CPU, and 0.110 seconds on the latest MacBook Pro with a 2.6GHz quad-core Intel Core i7, which costs over $3,000. A quick sketch of how to run this kind of GPU vs. CPU comparison yourself follows this list.
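
If you want to reproduce this kind of comparison on your own machine, a rough sketch looks like the following; it reuses the illustrative tf-gpu/tf-cpu Anaconda environments from the Windows setup sketch above, and the training scripts themselves print their per-step timings:

    REM GPU run: the char-rnn train.py prints the time per batch as it goes
    activate tf-gpu
    python train.py

    REM CPU run: either switch to the CPU-only environment, or stay in the GPU
    REM environment and hide the GPU from TensorFlow by setting CUDA_VISIBLE_DEVICES=-1
    activate tf-cpu
    python train.py

Comparing the printed per-batch times from the two runs gives roughly the same GPU vs. CPU ratios reported above, give or take differences in batch size and model configuration.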

Another pleasant surprise for me is Windows 10. I had pretty much abandoned Windows after entering the iOS world in 2009, but my recent renewed experience with Windows has been nothing but pleasant. Of course, I still can’t live without Mac and iOS, as deep learning and AI on mobile, anytime and anywhere, is one of my main goals. But I’ve come to realize that sometimes the best indeed comes out of a mix of many different things; this calls for an open mind, one that knows even a closed system, or a system that used to suck, can be a nice integral part of a great, open, complete system.

And fortunately, my wife won’t read this blog of mine or check my home office too carefully (Happy Valentine’s Day, Sweetie!), so I plan to keep both Dell systems, to try distributed training one day and to avoid a single point of failure in my deep learning training effort. But if you just need to start with a decent deep learning machine with a very cost-effective GPU, I’d recommend the nice Dell XPS plus the GTX 1070 for $1,237, actually less than $1,200, since you should be able to sell the GTX 745 for over $50 on eBay. I guess I’m really trying to move towards Lukas’s “Build a super fast deep learning machine for under $1,000”.

[Update Feb. 15, 2017] I just finished my day of watching live streaming of TensorFlow Dev Summit 2017. What an exciting day and an even more exciting journey ahead! Now I’m even more convinced that my Dell machines with GTX 1070 GPU powered by TensorFlow on Windows will keep me happy and happily challenged for a long time.

[Update Feb. 23, 2017] While playing with TensorFlow Magenta and Image Captioning, I found that Windows indeed doesn’t have good support for Bazel, the TensorFlow build tool, which is a real problem for TensorFlow projects that rely on Bazel to run. Besides, I guess I’m a geek deep down after all, and I still miss the good old days of using RedHat and other Linux servers. Hours later, on my Dell XPS Tower (aka XPS 8910) with the 8GB Nvidia GTX 1070, after issuing the magical command sudo apt-get remove dmraid in the trial Ubuntu session launched from my USB stick, I was finally able to see options such as “Install Ubuntu alongside Windows Boot Manager” and “Erase disk and install Ubuntu” when installing Ubuntu 16.04.1 LTS. I didn’t care too much about Windows Home, so I chose the “Erase disk” option. After that, installing the Nvidia CUDA and cuDNN libraries and TensorFlow went pretty smoothly (a rough sketch of those steps is below), and I’m able to use Bazel and run Magenta and im2txt on my Dell Ubuntu now. Still, Windows is a good platform for many TensorFlow projects, and I bet Bazel’s support for Windows will be good enough in the near future; or maybe I should just try to fix the Bazel build errors on Windows myself, if they bother me enough. Right now, I’m pretty happy living in the harmonious world of Mac, Windows and Ubuntu :)
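
For the record, the Ubuntu side followed the standard TensorFlow GPU install recipe of the time; roughly the steps below, where the archive filenames are placeholders for whatever CUDA and cuDNN packages you download from Nvidia:

    # install the CUDA toolkit from Nvidia's .deb repository package
    sudo dpkg -i cuda-repo-ubuntu1604*_amd64.deb
    sudo apt-get update
    sudo apt-get install cuda

    # unpack the cuDNN tarball (downloaded separately from Nvidia) into the CUDA directory
    tar xzf cudnn-*-linux-x64-v*.tgz
    sudo cp -P cuda/include/cudnn.h /usr/local/cuda/include/
    sudo cp -P cuda/lib64/libcudnn* /usr/local/cuda/lib64/
    sudo ldconfig

    # make the CUDA libraries visible, then install the GPU build of TensorFlow
    export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
    pip install tensorflow-gpu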

[Final Update Feb. 26, 2017] To make it perfect, and as a final step, I successfully installed Ubuntu 16.04.1 LTS alongside Windows 10 Pro for dual boot on my $849 Dell XPS 8900 with the GeForce GTX 1070, and then installed the Nvidia driver and toolkit and the TensorFlow 1.0 GPU version. I tried training im2txt, fast neural style, char-rnn, etc., and it all went nicely. There were just two hiccups to overcome on the Dell XPS 8900 (a little different from the $1,499 Dell XPS 8910): I selected “Something else” in the Ubuntu installation type menu so the roughly 1TB volume I had shrunk from Windows could be selected as Ubuntu’s installation target, and I added pci=noaer to the Ubuntu kernel command line in /etc/default/grub to stop the endless stream of PCIe error logging (a quick sketch of that edit is below). Now I can’t be happier with what I have and what it can do, and I’ll leave Madonna’s song for something else, like some ultimate AI challenge, but maybe also just for a little while.
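
For the second hiccup, the edit itself is small; a sketch of what it looks like on a stock Ubuntu install (the quiet splash options are Ubuntu's defaults):

    # append pci=noaer to the default kernel parameters in /etc/default/grub, e.g.:
    #   GRUB_CMDLINE_LINUX_DEFAULT="quiet splash pci=noaer"
    sudo nano /etc/default/grub

    # regenerate the grub configuration and reboot for the change to take effect
    sudo update-grub
    sudo reboot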