A long-running debate within the Data Science and Machine Learning communities is which programming language should take precedence in your tool kit. Much of the debate centers on the utility of Python versus R, each language’s flexibility across different environments and programming settings, and how easy each language is to learn.
At Untitled, we like and use both languages to solve different problems. However, we believe that most people will gravitate towards one over the other. This is not to say that it wouldn’t be worth becoming well versed in both languages, as each has its advantages.
Before picking a language for a project, we look at four specific categories: the objective of the project, the learning curve of the language given our objectives, the packages that are available and how we will use them, and whether and how we will need to lean on the supporting community of developers and Data Scientists.
Objective – Data Science or Machine Learning
The objective of any project is the most important factor in deciding which language we will use. In our opinion, R is the better descriptive analytics language, as it was built with statisticians in mind. We like to use R for exploratory data analysis projects where we take deep dives into our clients’ data and need to parse through large tables quickly to get a sense of trends. R is a great tool for the quick and dirty work required to get a holistic understanding of a company’s first- and third-party data. On top of this, R is equipped with a wonderful arsenal of data visualization packages, which expedite the process of publishing findings in a presentable way.
However, Python wins out for Machine Learning. A larger set of Machine Learning libraries is readily compatible with Python, and the industry seems to prefer Python over R for ML. Additionally, in our experience Python is the more practical choice for moving code into production or web applications, so if the project goes beyond analysis alone, we will be using Python.
Learning Curve: Python and R
The learning curve for Python is much smoother, and the language is easier to grasp early on. Programmers and Data Scientists who have learned both languages will most likely tell you that Python had a much more linear learning curve than R. However, as stated previously, R will get you to publishable results faster with its impressive suite of data visualization packages, so if speed to results is your aim, R will most likely be better suited. If flexibility across other environments and settings is your goal, and time is on your side, pick Python.
Both programming languages are packed full of functions, commands, and versatility. At the time of this writing, R has around 13,175 packages. Python has an astounding 155,345 packages. The chance that you will use more than 500 of the available packages in either language over your career is very small. With that said, both languages clearly have an impressive armory available to help with whatever problem you encounter and whatever solution you need.
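To put those package counts in perspective, here is a minimal Python sketch of the arithmetic. The counts and the 500-package figure come straight from the paragraph above; the variable names and the `share_used` helper are our own, purely for illustration.

```python
# Package counts quoted above (a snapshot at the time of writing).
R_PACKAGES = 13_175
PYTHON_PACKAGES = 155_345

# The generous upper bound on packages one person might use in a career.
PACKAGES_USED = 500


def share_used(used: int, available: int) -> float:
    """Return `used` as a percentage of `available` (illustrative helper)."""
    return 100 * used / available


r_share = share_used(PACKAGES_USED, R_PACKAGES)
python_share = share_used(PACKAGES_USED, PYTHON_PACKAGES)
ratio = PYTHON_PACKAGES / R_PACKAGES

print(f"R: {r_share:.1f}% of available packages")
print(f"Python: {python_share:.2f}% of available packages")
print(f"Python has roughly {ratio:.1f}x as many packages as R")
```

Even at 500 packages, you would touch only a few percent of what R offers and a fraction of a percent of what Python offers, so raw package counts matter less than whether the handful you need exist and are well maintained.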
Given that both programming languages are open source, Data Scientists and Machine Learning programmers will find extensive support for whatever problems they encounter. Untitled regularly visits R-Bloggers for tips and tricks pertaining to the R language and its environments. For all problems related to Python, we like to start our solution searches on Planet Python. At both of these sites you’ll find vast amounts of knowledge, sample code, and support, all for free.
As we stated at the beginning of this post, we encourage Data Scientists and Machine Learning programmers to learn both languages, as we see each as an advantage to have in your tool kit. The Untitled team uses both Python and R for problem solving, and we encourage you to do the same.