Machine Learning In The Cloud
This is part 9 of a 9-part series on Machine Learning. Our goal is to provide you with a thorough understanding of Machine Learning, the different ways it can be applied to your business, and how to begin implementing Machine Learning within your organization with the assistance of Untitled. This series is by no means limited to readers with a technical background. The objective is to provide a body of content that is informative and practical for a wide array of readers. We hope you enjoy it, and please do not hesitate to reach out with any questions. To start from part 1, please click here.
Welcome to the final post in the Untitled Machine Learning series. In this article, we’ll be discussing Machine Learning in the cloud. We’ll start with some high-level concepts, discuss the current Machine Learning as a Service (MLaaS) landscape, and wrap up with some of the tools we use to assist with workflow and implementation at Untitled.
A Conceptual Framework for Cloud-Based Machine Learning
Unfortunately, most of the young data scientists we’ve met have never worked on ML within a cloud environment. As a result, the amazing projects and models they’ve built rarely reach beyond the analysis and offline phases. In our experience, the cloud provides the best framework for developing models at scale and deploying them rapidly.
Additionally, because training models such as neural networks relies heavily on graphics processing units (GPUs), a cloud environment that scales GPU workers as needed is often much cheaper than assembling a powerful computer, and it will speed up your model deployment process. Training runs that could take days on a home computer can be reduced to hours (or minutes, depending on your budget) with a cloud backbone.
Something to consider when using a cloud application is that many of the existing infrastructure services, such as AWS, Microsoft Azure, IBM Watson Studio and Google BigQuery, have pre-processing built into the platforms they offer. Most data scientists will tell you that pre-processing and table construction consume about 80% of the time that goes into building a model.
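To make the cost argument concrete, here is a minimal back-of-the-envelope sketch. Every number in it (the hourly GPU rate, the workstation price, the scaling efficiency) is a hypothetical placeholder rather than a real vendor quote, and it ignores electricity, storage and data transfer costs.

```python
# Rough comparison of renting cloud GPUs versus buying a workstation.
# All prices and the efficiency factor are hypothetical placeholders.

def cloud_cost(gpu_hours: float, hourly_rate: float) -> float:
    """Pay-as-you-go cost: you are billed only for the GPU hours used."""
    return gpu_hours * hourly_rate


def break_even_hours(workstation_price: float, hourly_rate: float) -> float:
    """GPU hours at which renting has cost as much as buying the hardware."""
    return workstation_price / hourly_rate


def wall_clock_hours(single_gpu_hours: float, workers: int,
                     efficiency: float = 0.75) -> float:
    """Estimated training time when the job is spread across GPU workers.

    efficiency < 1 is an assumed penalty for coordination overhead;
    with a single worker there is no overhead to model.
    """
    if workers <= 1:
        return single_gpu_hours
    return single_gpu_hours / (workers * efficiency)


# A job needing 48 GPU hours (two days on one home GPU), at an assumed
# rate of 2.50 per GPU hour, against an assumed 3,000 workstation.
job_cost = cloud_cost(48, 2.5)
break_even = break_even_hours(3000, 2.5)
elapsed = wall_clock_hours(48, workers=8, efficiency=0.75)

print(f"Cloud cost for the job: {job_cost:.2f}")
print(f"Break-even vs. buying:  {break_even:.0f} GPU hours")
print(f"Elapsed time, 8 workers: {elapsed:.1f} hours")
```

Under these assumed figures, renting stays cheaper until total usage passes the break-even point, which is why occasional or bursty training workloads tend to favor the cloud.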
By taking that element largely out of the equation, data scientists can spend more time thinking about the models they build and deploy and less time doing grunt work to get to a workable scaled data set.
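For readers who have not done this grunt work by hand, the sketch below shows two common pre-processing steps on a single column: filling in missing values and rescaling to a 0 to 1 range. It is a generic illustration in plain Python, not the pre-processing any of the platforms above actually performs, and the sample column is made up.

```python
from statistics import mean


def impute_mean(values):
    """Replace missing entries (None) with the mean of the known entries."""
    known = [v for v in values if v is not None]
    if not known:
        raise ValueError("cannot impute a column with no known values")
    fill = mean(known)
    return [fill if v is None else v for v in values]


def min_max_scale(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical raw column with one missing value.
raw_ages = [20, None, 40, 60]
clean_ages = min_max_scale(impute_mean(raw_ages))
print(clean_ages)
```

Multiply these steps across dozens of columns and several source tables and it becomes clear where the time goes.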
Machine Learning as a Service
MLaaS breaks down into three primary categories: ML infrastructure, automated ML tools, and ML packages for process improvement, auditing and testing. Like most ‘as a Service’ products, these tools are usually paid for through a monthly subscription or a pay-as-you-go model, which makes experimentation fun and budget-friendly. Before diving into the MLaaS landscape, we highly recommend doing your research.
There are many platforms out there, some with extremely niche applications and some that allow for broad-stroke approaches. As with any development problem, pick the solution that solves the problem at hand or, in the case of choosing ML infrastructure, a solution that supports the frameworks and libraries you want to use.
Tools & Packages to Enhance Workflow
Below are some of the tools Untitled utilizes internally for cloud-based machine learning processes. We won’t go too in depth about how we apply each, but we certainly encourage our readers to explore the links.
MLflow is a wonderful tool to use when you have multiple individuals collaborating on a cloud machine learning project. It is great for versioning and testing models, and for keeping track of the overall machine learning life cycle.
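The core idea behind this kind of life cycle tracking is simple: record the parameters and metrics of every training run so collaborators can compare them later. The sketch below illustrates that idea in plain Python. It is not MLflow’s API; the Run and RunLog classes, the run names and the metric values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    """One training run: its settings and the results it produced."""
    name: str
    params: dict = field(default_factory=dict)
    metrics: dict = field(default_factory=dict)


class RunLog:
    """In-memory log of runs (a real tracking tool would persist these)."""

    def __init__(self):
        self.runs = []

    def record(self, name, params, metrics):
        """Store a copy of the run's parameters and metrics."""
        run = Run(name, dict(params), dict(metrics))
        self.runs.append(run)
        return run

    def best(self, metric, higher_is_better=True):
        """Return the run with the best value for the given metric."""
        candidates = [r for r in self.runs if metric in r.metrics]
        if not candidates:
            raise ValueError(f"no run has logged the metric {metric!r}")
        pick = max if higher_is_better else min
        return pick(candidates, key=lambda r: r.metrics[metric])


# Two made-up runs logged by different team members.
log = RunLog()
log.record("baseline", {"learning_rate": 0.1}, {"accuracy": 0.81})
log.record("deeper-trees", {"learning_rate": 0.05}, {"accuracy": 0.86})

best_run = log.best("accuracy")
print(best_run.name, best_run.params, best_run.metrics)
```

A dedicated tracking tool adds what this sketch leaves out: shared storage, a browsable interface, and stored model artifacts alongside each run.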
Interested in a platform that can quickly enrich your data models with off-the-shelf data repositories? Explorium has built a wonderful interface for doing that. Oftentimes, machine learning models become considerably stronger with the addition of just a few key variables, and Explorium is designed to help you add them.
There are plenty of automated machine learning platforms available on the market, but for the time being, most of them have extremely costly barriers to entry. Even though H2O is a paid tool, we’ve found it to be more affordable than comparable products like DataRobot. H2O is a lot of fun for beginners in machine learning to play around with, and it provides an interface for the “more statistician, less software developer” data scientist archetype to build and deploy models.
It is hard to spend any time reading about data science tools on the internet and not come across Jupyter. Every data scientist should have at least some familiarity with it. Jupyter is a tremendous help for anyone working in the field of data science, and it suits beginners and experts alike. If you haven’t already, visit the Jupyter link and launch a cloud notebook to work from. You’ll find the platform a delight to work in.
That’s a wrap on the Untitled Machine Learning series. Whether you read through this series in a linear fashion or dipped into individual posts for the specific solutions you were seeking, we hope you enjoyed the content and took something away from it. This will not be the end of our writing on machine learning.
We plan to use future posts to highlight strategies, specific use cases, new packages, available tools, and more. In the meantime, we hope this series has given readers a foundation in how Untitled thinks about and utilizes machine learning as an advanced analytics firm.
If you are interested in starting a machine learning project today or would like to learn more about how Untitled can assist your company with data analytics strategies, please reach out to us through the contact form. Additionally, if this series has prompted any follow-on thoughts or questions, we’d love to hear them. Please reach out to us through your preferred method to start a conversation about machine learning.