The Risk of Offline Models
This is part 8 of a 9-part series on Machine Learning. Our goal is to provide you with a thorough understanding of Machine Learning, the different ways it can be applied to your business, and how to begin implementing Machine Learning within your organization with Untitled’s assistance. This series is by no means limited to those with a technical pedigree; the objective is to provide content that is informative and practical for a wide array of readers. We hope you enjoy it, and please do not hesitate to reach out with any questions. To start from part 1, please click here.
Definition Of An Offline Model
When Untitled refers to offline models, we mean the models that we build, train, and tune without consideration of their future production environments. As we’ve spent time researching how other advanced analytics consulting firms engage with their clients, we’ve seen one common mistake that is potentially avoidable: building the wrong solution given a company’s current data position.
This can be avoided by listening and asking the right questions up front. We’ll outline three important questions to ask when looking at the risk of offline models, as well as the general mentality to take when entering into an organization for a Machine Learning engagement.
#1. Is There a Foundation for Advanced Analytics?
This is the most important question to ask when evaluating the risk of offline models or before you even consider pitching a predictive model to a company. I’ve watched this play out first hand. It often goes like this:
Client: “We’re having trouble keeping track of inventory, which often leads to over-selling our product or misforecasting demand, which in turn creates cash flow deficiencies. Can you help with that?”
Consultant: “Sure we can. How about we use a deep learning model, say a recurrent neural network in TensorFlow, which will be able to predict the next 6 months of inventory levels?”
Client: “Sounds awesome, what do you need to get started?”
Consultant: “Inventory level counts by day, for the last 3 years.”
Client: “I just told you we don’t have a way to keep track of our current inventory levels, that’s why we brought you in. If we had that data, why would we be engaging with you?”
The consultant will then proceed to try to sell the client on the future of Deep Learning and why they would be insane not to implement a predictive model. In some cases the pitch will even succeed, leading to high implementation costs and, ultimately, an offline model that never makes it to production. Not to mention a build that does not solve the client’s current needs.
For large organizations that have their own internal data science and business intelligence teams, this is extremely frustrating, especially because the decision is pushed down from a non-technical boss and those teams are left to help implement it.
If your goal is one-and-done projects with organizations, sure, this system could work. However, Untitled wants to be a long-term advanced analytics partner for every company we engage with. If that means walking them through the data analytics hierarchy of needs or setting up systems for descriptive analytics (like real-time counts of inventory levels), so be it. We want to assist in every stage of our clients’ data journey.
#2. What Are Teams Working on Internally That Your Model Can Augment, Not Curtail?
The larger the company, the more likely it is to already have an internal data science team. We have seen situations in which individuals very high up in an organization do not know these teams exist, but they do. In fact, some of the most talented data scientists Untitled knows are at large organizations doing amazing work. Often, when you find your way into these companies, the first stopping point is a manager of data science.
The fastest way to be thrown out as a vendor is to treat these folks as barriers rather than collaborators. Data science and data engineering teams more often than not have an extreme backlog of projects they want to be working on (or recognize as important to work on) but simply can’t get to due to limited bandwidth. With every internal data science team we talk to, we start with two critical questions.
First, “What is the most frustrating part of your job that we can help with?” This often surfaces opportunities to help implement new tools, speed up system architecture, or even educate their bosses on why analytics teams need bigger budgets and how those budgets affect the bottom line. Second, “What backlogged projects are on your docket that you won’t have time to get to for at least another quarter?”
Something amazing happens when you ask this question: you usually get to work on something right away. The reason? The data science team will sell you up and through the organization because they view you as a collaborator and not a barrier to their own internal goals.
#3. Have You Factored In Their Production Environment And Compliance Protocols?
Last but not least, an important consideration is the organization’s current production environment and overall IT security protocols. More often than not, the bigger the organization, the more red tape there will be around model implementation.
This also means that if they do in fact have an internal data science or BI team, those individuals, even if they are your collaborators, will be pretty far removed from production code. Big companies are cautious about changes in technology, and rightfully so.
Implementing new technology carries far more risk for a multi-billion dollar company than for a small business. Some of the largest companies we know and collaborate with still run the majority of their code in on-premise data centers.
The thought of cloud-based computing makes IT’s spine shiver, but there are ways around this. Maybe that means converting the Python-based model you built into a language native to their stack, or running your model semi-offline (touching only the cloud components of the business, or simulating the model as if it were in production over a period of time) and proving its efficacy over a 6-month time frame to demonstrate to IT that the implementation is viable. These are all things to consider when engaging companies on a Machine Learning project.
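One way to make the semi-offline approach concrete is to replay historical data through the model one day at a time, so the model only ever sees what it would have seen had it been live. This is a minimal sketch, not a production harness: the moving-average forecaster and the inventory counts below are stand-ins for whatever model and data a real engagement would use.

```python
from statistics import mean

def shadow_run(history, model, warmup=7):
    """Replay historical data day by day, as if the model were in production.

    On each day the model sees only the data available up to that day,
    mimicking a live deployment without touching production systems.
    Returns (prediction, actual) pairs for offline evaluation.
    """
    results = []
    for day in range(warmup, len(history)):
        seen = history[:day]        # data available "so far"
        predicted = model(seen)     # next-day forecast
        actual = history[day]       # what really happened
        results.append((predicted, actual))
    return results

# Stand-in forecaster (an assumption, not a real client model):
# predict tomorrow's inventory as the mean of the last 7 days.
def moving_average(seen, window=7):
    return mean(seen[-window:])

# Hypothetical daily inventory counts.
inventory = [100, 98, 97, 99, 101, 100, 102, 103, 101, 100]

pairs = shadow_run(inventory, moving_average)
mae = mean(abs(p - a) for p, a in pairs)  # mean absolute error
print(f"evaluated {len(pairs)} days, MAE = {mae:.2f}")
```

Running this kind of replay over six months of history produces exactly the evidence described above: a track record of predictions versus actuals that IT and leadership can evaluate before anything touches production.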
We hope you enjoyed this post regarding the risk of offline models, and we hope you will utilize it as a road map for your next advanced analytics project. The more projects Untitled has the opportunity to work on, the more we learn about how to help companies with their long-term data analytics goals.
If you are interested in starting on a Machine Learning project today or would like to learn more about how Untitled can assist your company with data analytics strategies, please reach out to us through the start form.
Check out part nine of this series.