[Update June 13, 2017: Based on user feedback, I’ve updated the code and this blog after testing steps 2-6 on both TensorFlow 0.12 (again) and TensorFlow 1.1, and running the iOS app in step 7 with the TensorFlow 1.0 source.]
Believe it or not, yesterday, Feb 18, 2017, was the first time I heard of and tried on my iPhone the Prisma app, an AI-based offline photo app with lots of art filters, and Apple’s App of the Year 2016. Sorry to disappoint you with my seemingly cave-living life (now I know that the Prisma app has been covered by “Hundreds of Publications, With the Number Soaring Daily”), but since you found me, I’ll reward you with a step-by-step guide on how to develop an iOS app just like Prisma, with regards to its deep learning and AI aspect - the iOS UI part I’d assume is easy for you.
So what did it take for me to figure out all the major deep learning related details and how to run it on iOS quickly and effectively like Prisma? It started with my review of Andrej Karpathy’s CS231n Lecture 9 on Neural Style last month, and this time I decided to look into the details of how neural style gets done. So I read the original paper A Neural Algorithm of Artistic Style published in Aug 2015, tried Justin Johnson’s Torch implementation of the algorithm, and a nice TensorFlow implementation. I also played with another simpler implementation, with helpful code comments, of neural style in TensorFlow, called Neural Style Painting.
Maybe indeed I have a deep liking for the cave-living lifestyle, or maybe it’s just a calling of my ancestors hundreds of thousands of years ago - I enjoy reading A Brief History of Humankind every time time I pick it up - I really prefer an offline access to the cool features of an app, if possible at all, so I can use the app anytime, anywhere. Because the original neural style algorithm requires optimization/training when adding a style to a content image, I managed to build a TensorFlow graph file and use it to actually do the training on iOS, after verifying that it is theoretically possible by following the example in this blog Training a TensorFlow graph in C++ API. As it takes a few minutes to do the training on a modern GPU (see the final note in my previous blog for detail), it naturally takes forever on iOS to run a training operation which minimizes the loss using AdamOptimizer.
Guess I have to accept a server based solution for now, like the app from DeepArt.io (somehow I found a couple of server-based neural style iOS apps at the time)? Then I don’t know if it’s good luck or a little bit more perseverance, I came across a Logan Engstrom’s Fast Style Transfer in TensorFlow project and, from there, Justin John’s paper Perceptual Losses for Real-Time Style Transfer and Super-Resolution in March 2016, which states that “Compared to the optimization-based method, our network gives similar qualitative results but is three orders of magnitude faster.” Three orders of magnitude - that’s like 1,000 times faster! After about a week of reading the paper, debugging the code, dreaming and thinking about it last thing before going to sleep and first thing after waking up, I figured out how to build an iOS app that can add amazing styles to photos in about 7-8 seconds, without the need of an Internet connection, and each style, after the process of training, frozen and quantized, only adds about 1.7MB to an iOS app. Here’s how (the TensorFlow version I tested with is 0.12):
- Get the repo here which is a fork of Fast Style Transfer in TensorFlow, added with the iOS offline support summarized as follows:
- Replaced in
preds = tf.nn.tanh(conv_t3) * 150 + 255./2with
preds = tf.add(tf.nn.tanh(conv_t3) * 150, 255. / 2, name="preds")so we can refer to the output result using the name
- Added in
evaluate.pythe following lines after loading the trained checkpoint generated when running
saver = tf.train.Saver() saver.save(sess, "checkpoints_ios/fns.ckpt")
Note that the two lines have to be added after the completion of running
- A Python script to freeze the trained checkpoint with input image placeholder and output image name;
- iOS sample code that sends the input image and processes the output stylized image: the returned stylized image bitmap data gets converted to UIImage in the
- Replaced in
setup.shto download the pre-trained VGG model and the training dataset. Then run the following commands on a Terminal to create the checkpoint files containing both the graph and network parameter values (details are available the github repo’s “Training Style Transfer Networks” step):
mkdir checkpoints mkdir test_dir python style.py --style images/udnie.jpg --test images/ww1.jpg --test-dir test_dir --content-weight 1.5e1 --checkpoint-dir checkpoints --checkpoint-iterations 1000 --batch-size 10
mkdir checkpoints_iosand uncomment the two lines of code in
# saver = tf.train.Saver() # saver.save(sess, "checkpoints_ios/fns.ckpt")
evaluate.pyas follows to generate a new checkpoint with input image placeholder and output image name:
python evaluate.py --checkpoint checkpoints \ --in-path examples/content/dog.jpg \ --out-path examples/content/dog-output.jpg
python freeze.py --model_folder=checkpoints_ios --output_graph fst_frozen.pbto build a .pb file which combines the graph and the parameter values in the checkpoint. This will create a .pb file of about 6.7MB.
fst_frozen.pbfile to /tf_files, then cd to your TensorFlow source directory, run
bazel-bin/tensorflow/tools/quantization/quantize_graph --input=/tf_files/fst_frozen.pb --output_node_names=preds --output=/tf_files/fst_frozen_quantized.pb --mode=weights. This is the same step as step 6 of my other blog What Kind of Dog Is It - Using TensorFlow on Mobile Device and will reduce the size of the .pb file to about 1.7MB, which means that for an app size of about 135MB, the size of the Prisma app, you can put in about 60 styles for offline processing (the TensorFlow iOS library takes more than 20MB).
- Drag and drop the
fst_frozen_quantized.pbfile to the TensorFlow sample iOS Simple project or your own iOS app and refer to this repo’s iOS sample, modified based on the TensorFlow iOS simple project example, to see how to run a session with an image input and get and process the output stylized image. To check out the iOS sample, copy the
ios_simple_fstfolder to your TensorFlow source root’s
tensorflow/contrib/ios_examplesfolder (because our project refers relatively to the TensorFlow source and the library built there) and launch the Xcode project and run the app on iOS simulator or device to see the effect of adding the trained style filter in the
fst_frozen_quantized.pbto input images. Just tap on the “Run Model” button and you’ll see one of the the three images included in the project stylized in about 7 seconds.
Important You need to enter in the iOS code the same width and height of the –in-path image used for the
img_placeholder when running
evaluate.py in step 4 above, otherwise you’ll get an
Conv2DCustomBackpropInput: Size of out_backprop doesn't match computed error when running the iOS app. If you use the example dog.jpg in step 4, then you don’t need to change
RunModelViewController.mm as it already has the right width and height set for you. To change the image width and height used in the iOS project, search in
const int wanted_width and
const int wanted_height and replace the two values with the width and height of the –in-path image used in step 4.
There you go, you can collect many awesome-looking images for your style images and train them one by one, then freeze and quantize them and add them to your iOS app, achieving the same offline AI effect as Apple’s Best iPhone App of 2016. I’m glad that I found Prisma yesterday and that it does pretty much what I planned to do - now I can save a few days of iOS and training work and move on to something new to me, or one day sooner or later, to something new to everyone. That’d be a more exciting time. But the journey to that day is also full of fun.