The View of TensorFlow vs PyTorch from the Deep Learning Window of cursed 2020
Hello there!
Hope you are keeping up well with this new normal and staying safe in this pandemic.
The motivation for this article is to shed some light on the long-running cold war between PyTorch and TensorFlow from an ML engineer's point of view. Most of the articles I could find on this topic dwell on old TensorFlow capabilities and ignore the advancements that started in September 2019 with the birth of TF 2.0.
Where did it start?
TensorFlow is developed by Google Brain and actively used at Google both for research and production needs. Its closed-source predecessor is called DistBelief.
PyTorch is a cousin of the Lua-based Torch framework, which was developed and used at Facebook. However, PyTorch is not a simple set of wrappers to support a popular language; it was rewritten and tailored to be fast and feel native.
1. Setting up the environment, utilizing the GPU...
Well, before we get to the comparisons, we need these libraries installed properly in our system, and in 2020 we had better have them running on a GPU. The RTX 3000 series has just been released to get you a real boost; RTX 2000 series users, don't feel bad, your card has been by your side all this long time.
Let's get started by setting up the TensorFlow environment.
It's always best practice to set up a separate Python venv/conda environment before going ahead with the installation. TensorFlow does not provide backward compatibility across CUDA/cuDNN versions; you need the exact CUDA and cuDNN versions mentioned here for a specific version of TensorFlow.
Issue example:
Create a new environment in conda first and activate it.
conda install -c conda-forge tensorflow-gpu=1.12.0
This will install the older TensorFlow-GPU version 1.12 with CUDA 9.0.0 and cuDNN 7.6.5. Now just do a pip install for the latest TensorFlow as mentioned on the TensorFlow site.
I used: pip install tf-nightly
This will install the latest TensorFlow.
Now your system has the latest TensorFlow alongside the older CUDA and cuDNN, so to verify whether backward compatibility is really there or not, try to import TensorFlow and you will face this error.
P.S. The Windows machine in the screenshot above is used just to demonstrate the issue, which is common to all platforms; production systems, however, are Linux-based.
To fix this, try the conda installation of TensorFlow, which also pulls in the cuDNN and CUDA packages that the TensorFlow version was built with.
conda install -c conda-forge tensorflow
Done with the TensorFlow installation.
Let's get started by setting up the PyTorch environment.
PyTorch releases separate builds for different CUDA versions.
conda install pytorch torchvision cudatoolkit=10.2 -c pytorch
Done with the PyTorch installation.
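To verify that each installation can actually see the GPU, a quick sanity check like the sketch below helps (a minimal sketch; run the relevant lines in the respective conda environments):

# TensorFlow: list the physical GPU devices it detected.
import tensorflow as tf
print("TF GPUs:", tf.config.list_physical_devices("GPU"))

# PyTorch: check CUDA availability and the device name.
import torch
print("Torch CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Torch device:", torch.cuda.get_device_name(0))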
Winner: PyTorch
2. Ramp-up Time:
TensorFlow 1.x uses symbolic programming; PyTorch uses imperative programming.
With TensorFlow 1.x, the graph is compiled first and only then do we get the graph output. A basic addition in this style is sketched below.
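A minimal sketch of graph-mode addition (written against tf.compat.v1 so it also runs on a TF 2.x install):

# TensorFlow 1.x style: build the graph first, then run it in a session.
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

a = tf.placeholder(tf.float32)   # symbolic input, no value yet
b = tf.placeholder(tf.float32)   # symbolic input, no value yet
c = a + b                        # just a node in the graph

with tf.Session() as sess:
    # The graph is executed only here, with real data fed in.
    print(sess.run(c, feed_dict={a: 2.0, b: 3.0}))   # 5.0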
The same addition using PyTorch's imperative programming, sketched below.
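A minimal PyTorch sketch, where each line is evaluated as soon as it runs:

import torch

a = torch.tensor(2.0)
b = torch.tensor(3.0)
c = a + b        # computed immediately, no graph or session needed
print(c)         # tensor(5.)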
With TensorFlow 2.0, a lot has changed: eager execution is now enabled by default, which brings imperative programming to TensorFlow as well. TF 2.0 is all about ease of use and simplicity.
So there is no longer any need to manage placeholders and sessions manually to execute operations; the same addition now looks like the sketch below.
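A minimal TF 2.0 sketch of the same addition under eager execution:

import tensorflow as tf

a = tf.constant(2.0)
b = tf.constant(3.0)
c = a + b        # executed eagerly, no placeholders or sessions
print(c)         # tf.Tensor(5.0, shape=(), dtype=float32)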
And... that's done.
Winner: Tie
3. Graph Creation
The computation graph was a major design difference between the two frameworks to start with.
TensorFlow adopted a static computation graph approach, where one defines the sequence of computations one wants to perform, with placeholders for the data. After that, for training or running the model, you feed in the data. A static computation graph is great for performance and for the ability to run on different devices (CPU/GPU/TPU), but it is a major pain to debug.
PyTorch on the other hand adopted a dynamic computation graph approach, where computations are done line by line as the code is interpreted. This makes it a lot easier to debug the code, and also offers other benefits ā for example supporting variable-length inputs in models like RNN.
Fast forward to 2020: TensorFlow 2.0 builds dynamic computation graphs by default through its major shift from static graphs to eager execution, and PyTorch allows building a static computational graph, so you now have both static and dynamic modes in both frameworks. The sketch below shows the two crossovers.
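As a rough sketch of those crossovers: tf.function traces eager TF 2.0 code back into a static graph, and torch.jit.script compiles a PyTorch function into a static TorchScript graph.

import tensorflow as tf
import torch

# TF 2.0: decorate an eager function so it is traced into a static graph.
@tf.function
def tf_double(x):
    return x * 2

print(tf_double(tf.constant(3.0)))        # runs as a compiled graph

# PyTorch: script a function into a static TorchScript graph.
@torch.jit.script
def torch_double(x: torch.Tensor) -> torch.Tensor:
    return x * 2

print(torch_double(torch.tensor(3.0)))    # runs as a TorchScript graph
print(torch_double.graph)                 # inspect the underlying static graph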
Winner: Tie
4. Serialization
PyTorch offers a simple API that either saves all the weights of a model (its state_dict) or pickles the entire model object.
TensorFlow also offers a significant advantage: the entire graph can be saved as a protocol buffer, including parameters and operations. Other supported languages such as C++ and Java can then load the graph, which is critical for deployment stacks where Python is not an option. It is also useful when the user changes the model source code but still wants to run old models. Both styles are sketched below.
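A minimal sketch of both serialization styles (the file and directory names here are just placeholders):

import torch
import tensorflow as tf

# --- PyTorch: save just the weights, or pickle the whole model object ---
torch_model = torch.nn.Linear(10, 2)                  # placeholder model
torch.save(torch_model.state_dict(), "weights.pt")    # weights only
torch.save(torch_model, "model.pt")                   # pickles the entire object

restored = torch.nn.Linear(10, 2)
restored.load_state_dict(torch.load("weights.pt"))

# --- TensorFlow: SavedModel stores graph, weights, and signatures together ---
tf_model = tf.keras.Sequential([tf.keras.layers.Dense(2, input_shape=(10,))])
tf_model.save("saved_model_dir")                      # protocol-buffer based SavedModel
reloaded = tf.keras.models.load_model("saved_model_dir")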
Winner: TensorFlow
5. Documentation
Both frameworks are well documented. One thing that tips the scales toward TensorFlow is that the PyTorch C library is mostly undocumented. However, this only matters when writing a custom C extension, or perhaps when contributing to the framework itself.
If we can constrain the code to be Python only, then . . .
6. Language Support:
TensorFlow provides stable Python (for version 3.7 across all platforms) and C APIs, and, without an API backward compatibility guarantee, C++, Go, Java, JavaScript, and Swift (early release).
PyTorch, on the other hand, is written specifically for Python; however, the underlying Torch base library provides C/C++ support.
Winner: TensorFlow
7. Device Management
In the case of TensorFlow, users don't need to define anything for different devices, since all the default settings are pre-set. It will automatically assume that the user wants to be on the GPU if one is available. Though this also means that even if only one GPU is in use, TensorFlow will consume the memory of all the available GPUs.
In the case of PyTorch, you have to explicitly move the model and data onto the device you want to run on, as in the sketch below.
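A minimal sketch of the two behaviours (the memory-growth flag is the usual way to stop TensorFlow grabbing all GPU memory up front):

import tensorflow as tf
import torch

# TensorFlow places ops on the GPU automatically but also reserves
# all GPU memory by default; memory growth can be enabled per GPU.
for gpu in tf.config.list_physical_devices("GPU"):
    tf.config.experimental.set_memory_growth(gpu, True)

# PyTorch is explicit: pick a device and move model + data onto it.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = torch.nn.Linear(10, 2).to(device)
x = torch.randn(4, 10).to(device)
y = model(x)   # computation now runs on the chosen device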
Winner: TensorFlow
8. Training
And here comes one of the most important things. . .
After the last five rather theoretical comparisons, the code returns.
We will train a shallow ConvNet classifier on MNIST with a very basic layer structure, as sketched below.
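A minimal Keras sketch of such a classifier (the layer sizes and number of epochs are just illustrative choices):

import tensorflow as tf

# Load and normalise MNIST.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None] / 255.0
x_test = x_test[..., None] / 255.0

# A very shallow ConvNet.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

model.fit(x_train, y_train, epochs=2, validation_data=(x_test, y_test))

The equivalent PyTorch version needs a hand-written training loop (forward pass, loss, backward pass, optimizer step), which is exactly what makes Keras feel simpler here.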
Winner: TensorFlow (simplicity of training due to Keras)
9. Debugging
Well, as the training goes on, so does the debugging.
PyTorch is based on the dynamic computational graph, so debugging is a lot easier by using any Python debugging tools such as pdb.
In TF 2.0, there are no placeholders, no sessions, and no feed dictionaries. Because operations are executed immediately, you can use (and differentiate through) if statements and for loops (no more tf.cond or tf.while_loop). You can also use whatever Python data structures you like, and debug your programs with print statements and pdb.
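A small sketch of what that buys you in TF 2.0: plain Python control flow, print statements, and breakpoints right inside the computation.

import tensorflow as tf

x = tf.Variable([3.0, 4.0, 5.0])

def clipped_square(v):
    # A plain Python if works under eager execution,
    # and gradients still flow through it (no tf.cond needed).
    if tf.reduce_sum(v) > 10:
        v = v / 2
    # You can print intermediates or drop into pdb right here:
    # import pdb; pdb.set_trace()
    print("intermediate:", v)
    return tf.reduce_sum(v ** 2)

with tf.GradientTape() as tape:
    y = clipped_square(x)

print(tape.gradient(y, x))   # ordinary gradient w.r.t. the variable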
TensorBoard can also be used with both of the frameworks.
Winner: Tie
10. Deployment
Now, last but not least, deployment, which really shows the capabilities of TensorFlow when used in a production environment.
With TensorFlow, we have TensorFlow Serving to deploy our model in a seamless way and to manage different versions of it, with the production workload handled by the library itself. You can refer here for the well-guided TensorFlow tutorial on TensorFlow Serving.
However, when we come to PyTorch, we have the newly introduced TorchServe package, released in June 2020 and yet to mature.
Anyway, the cloud comes to the rescue: if you have an AWS subscription, you can leverage the SageMaker (a platform to train and deploy your ML models) Python SDK PyTorch estimators to serve the model. We can leverage AWS SageMaker for TensorFlow as well, via its TensorFlow estimator.
We can deploy TensorFlow Lite models (lighter versions of the original TensorFlow models, with minimal to no reduction in accuracy) on mobile/IoT devices. We can also run TensorFlow models in the browser itself with TensorFlow.js.
Coming to PyTorch, mobile support is available only through PyTorch Mobile.
Winner: TensorFlow
11. Community Support
In terms of community support, both frameworks have a really large number of active users and developers.
From the production window, though, the TensorFlow community is bigger and enriched with very active contributors, whereas PyTorch has a very active research community.
Winner: TensorFlow (production community)
Conclusion:
The main motivation for this article was to demystify the TensorFlow advances that give it an edge over PyTorch. When I checked the comparisons available on the internet myself, most were biased by how TensorFlow 1.x used to work; there is little to no actual comparison against TF 2.0.
To conclude, we can see around 7 topics falling on TensorFlow's side.
Although TensorFlow lost out on installation dependencies, we can still gain the upper hand by using TensorFlow's prebuilt Docker containers.
docker pull tensorflow/tensorflow:latest # Download latest stable image
docker run -it -p 8888:8888 tensorflow/tensorflow:latest-jupyter # Start Jupyter server
So, from the open window of production and with the latest advancements of TensorFlow so far, we can see TensorFlow as the best choice for production work.
However, PyTorch fans, you obviously know that PyTorch is still the way to go for research work.
TensorFlow is extending its capability on the research side as well.
One of the recent beauties I came across is Knowledge Distillation, with both TensorFlow and PyTorch implementations, which uses a larger, well-trained teacher model to teach a lightweight student model. It's like how our teacher teaches us any new concept: a small DL model (student) learns from an experienced DL model (teacher), with the philosophy of disclosing one complexity at a time, to boost the performance of the smaller model. A rough sketch of the core loss follows. This is just the intro; feeling the nerves going insane to know more about this? Check out this beautifully explained concept here.
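As a rough PyTorch sketch of the core idea (the temperature T and weighting alpha are illustrative choices, not the exact recipe from the linked article):

import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    # Soft targets: the student matches the teacher's softened distribution.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: ordinary cross-entropy against the true labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard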
Stay well, stay safe, Happy Deep Learning!
References: