Machine learning has gained wide popularity, and every day you hear about a new application for it. The technology is very promising, but it has its complications. Developers and managers very often misjudge both its capabilities and the complexity of implementing it. To organize an effective development process, it is important not only to understand the customer’s needs but also to set the right customer expectations by explaining ML features, the peculiarities of implementation, and the advantages and disadvantages.
What is a Machine Learning Process?
Machine learning can automatically create and refresh an algorithm for solving a complex problem based on a large volume of data. There is no need to search for business-relevant patterns by hand. Instead, a team of qualified professionals should prepare the right data set for training and automate the entire process of data processing and model application. At the same time, it is very important to ensure the quality of the data and to control the quality of the algorithms continuously.
Example
Task:
Create the recommended goods section in an online store
Possible Solution:
You can recommend bestsellers to everyone. You can also interview sales experts, who might suggest offering a leash and a collar to buyers of dog products. Unfortunately, such a rule set often grows too large, the rules sometimes contradict each other, and many of them are no longer relevant. The development process then turns into chaos.
Machine Learning Solution:
Beyond a certain number of rules, machine learning becomes more efficient. It is easier to take the sales data and process it with machine learning algorithms. These algorithms create a so-called machine learning model: a set of rules, mathematical formulas, and so on, derived from actual user behavior. The model then solves the problem itself, for example by recommending relevant products. One very important point: a lot of data must be processed and prepared in advance, and the algorithms must be suited to this particular task.
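As a rough illustration of deriving recommendations from sales data instead of hand-written rules, here is a minimal sketch that counts which products are bought together. The order history, the function names, and the choice of simple co-occurrence counting are all assumptions made for this example; a production recommender would use far more data and a more capable algorithm.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_cooccurrence(orders):
    """Count how often each pair of products appears in the same order."""
    co_counts = defaultdict(Counter)
    for order in orders:
        # Use a set so a product repeated within one order counts once.
        for a, b in permutations(set(order), 2):
            co_counts[a][b] += 1
    return co_counts


def recommend(co_counts, product, top_n=3):
    """Return up to top_n products most often bought together with `product`."""
    neighbours = co_counts.get(product, Counter())
    # Sort by descending count, then by name so ties are resolved predictably.
    ranked = sorted(neighbours.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_n]]


# Hypothetical order history; real data would come from the store's sales records.
orders = [
    ["dog food", "leash", "collar"],
    ["dog food", "leash"],
    ["cat food", "litter"],
    ["dog food", "collar", "leash"],
    ["leash", "collar"],
]

model = build_cooccurrence(orders)
print(recommend(model, "dog food"))  # ['leash', 'collar']
```

Nobody wrote the "dog food goes with a leash" rule here: it comes from the orders themselves, so new sales data changes the recommendations without anyone editing rules.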
Checklist: How to Organize the Process of Building an ML Model
Machine learning has proved to be a very effective tool for solving routine problems, from text recognition and machine translation to shopping recommendations and customer-support chatbots.
Pitfalls to Avoid while Working on a Machine Learning Component
A machine learning component demands the following, and underestimating any of these is a common pitfall:
- time to clarify requirements and expectations within the team;
- personal interaction with the development team while the data set is being built, unless data is already available for the domain;
- money for the computational resources used to train the model;
- a long period of improving the system’s performance, which still may not guarantee a perfect result.
To keep these issues under control, follow this checklist:
Point 1. Have you collected all the data sets?
When using a machine learning component, keep two important issues in mind:
- You need to collect a large amount of data;
- Big data will not always be of good quality.
Practically useful models are usually complex, since they have to cover many details of the business. Machine learning requires a lot of examples to detect these details, so check the following:
- Are the examples used to train the ML model diverse enough?
- Do they cover all types of customers?
- Are you sure your data isn’t out of date?
After all, both the market situation and customer behavior may have changed since you started collecting the necessary information. All the data should be verified.
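The questions above can be turned into a simple automated check. The sketch below is only an illustration: the record layout (`segment` and `date` fields), the list of expected customer segments, and the one-year staleness threshold are assumptions for the example, not requirements from any particular tool.

```python
from collections import Counter
from datetime import date


def audit_examples(records, expected_segments, today, max_age_days=365):
    """Summarize customer-segment coverage and staleness of training examples."""
    segment_counts = Counter(r["segment"] for r in records)
    # Segments the business cares about that have no examples at all.
    missing = sorted(set(expected_segments) - set(segment_counts))
    # Examples older than the allowed age may describe outdated behavior.
    stale = sum(1 for r in records if (today - r["date"]).days > max_age_days)
    return {
        "segment_counts": dict(segment_counts),
        "missing_segments": missing,
        "stale_share": stale / len(records) if records else 0.0,
    }


# Hypothetical training examples.
records = [
    {"segment": "retail", "date": date(2023, 11, 2)},
    {"segment": "retail", "date": date(2021, 3, 15)},
    {"segment": "wholesale", "date": date(2023, 12, 20)},
    {"segment": "retail", "date": date(2020, 7, 1)},
]

report = audit_examples(
    records,
    expected_segments=["retail", "wholesale", "online"],
    today=date(2024, 1, 1),
)
print(report["missing_segments"])  # ['online']
print(report["stale_share"])       # 0.5
```

A report like this does not prove the data is good, but it flags missing customer types and outdated examples before any model is trained.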
Let’s take a look at just one case. A retail company can collect a huge array of data on individual sales. How useful is this information? You can extract important data on the seasonality of demand and on changes in the market. But at the same time, it becomes clear that this information is very monotonous if the buyer remains anonymous. Increasing the data size tenfold does not yield a tenfold increase in the number of useful facts.
Point 2. What if you do not have all the data?
When the volume of collected data isn’t enough, you have two options:
1) Clarify the task and adjust the goals;
2) Collect the missing data.
Start by updating and expanding the existing data collection system. For example, a retail company may offer personal discount cards and use them to identify a buyer’s subsequent purchases. A tiny loss of revenue may be a reasonable price for a significant improvement in data quality. You can then determine buyers’ typical shopping sequences, even when the purchases are weeks apart. It also becomes possible to evaluate the effectiveness of advertising campaigns and to form individual offers. Unfortunately, collecting such data takes time.
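To show what discount cards make possible, the following sketch groups purchases by card and extracts follow-up purchases made at least a week later. The tuple layout `(card_id, date, product)`, the sample data, and the seven-day threshold are hypothetical choices for this example.

```python
from collections import defaultdict
from datetime import date


def purchase_sequences(purchases):
    """Group purchases by discount card and order each group by date."""
    by_card = defaultdict(list)
    for card_id, day, product in purchases:
        by_card[card_id].append((day, product))
    return {card: sorted(items) for card, items in by_card.items()}


def follow_ups(sequences, min_gap_days=7):
    """List (earlier product, later product, gap in days) for consecutive
    purchases on the same card that are at least min_gap_days apart."""
    pairs = []
    for items in sequences.values():
        for (d1, p1), (d2, p2) in zip(items, items[1:]):
            gap = (d2 - d1).days
            if gap >= min_gap_days:
                pairs.append((p1, p2, gap))
    return pairs


# Hypothetical purchase log; without card IDs these rows could not be linked.
purchases = [
    ("card-1", date(2024, 3, 1), "puppy food"),
    ("card-2", date(2024, 3, 2), "cat food"),
    ("card-1", date(2024, 3, 22), "leash"),
    ("card-1", date(2024, 3, 24), "collar"),
]

sequences = purchase_sequences(purchases)
print(follow_ups(sequences))  # [('puppy food', 'leash', 21)]
```

With anonymous sales, the same four rows would be four unrelated facts; the card ID is what turns them into a shopping sequence.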
Sometimes it is possible to buy data from other companies or to obtain it through crowdsourcing platforms such as Amazon Mechanical Turk. For example, in the case of financial models, you can get anonymized data from a credit bureau. Machine learning itself also offers a possible solution in the form of transfer learning: a model trained to solve one problem can be re-trained to solve similar ones with less data.
Point 3. You have a properly built data set. What is next?
So, you’ve collected enough data. Can you proceed with creating the actual model? From the data scientist’s point of view, everything is ready for work. But at this step there is a risk of losing control over the development, because modeling is quite a complex process. Misunderstandings about what is happening arise often and can even undermine the customer’s trust. Therefore, we always suggest taking two very important steps:
A. Create a baseline model. Developing the first production model can take a considerable amount of time. You can start with a simple model: a few formulas in Excel instead of a deep neural network, or keyword filtering instead of complex natural language processing algorithms. A simple, understandable model is a reliable starting point.
B. Develop indicators of the “goodness” of the model. You can control and improve only what you are able to measure. Therefore, you should spend time developing so-called model quality metrics, for example the sales forecast error, or the difference between the expected and the actual increase in the number of customers. Then you will be able to objectively evaluate the effect of both a new idea and even the most insignificant modification to the developed components.

What do you get as a result? A clear understanding of where you were at the beginning and whether you are moving in the right direction.
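Both steps can be sketched in a few lines. The example below assumes a weekly sales series (made-up numbers), uses a moving average and a "repeat last week" rule as two possible baselines, and takes mean absolute error as the quality metric. These are illustrative choices, not the only reasonable ones.

```python
def moving_average_forecast(history, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    if not history:
        raise ValueError("history must not be empty")
    recent = history[-window:]
    return sum(recent) / len(recent)


def last_value_forecast(history):
    """Forecast the next value as a repeat of the most recent observation."""
    if not history:
        raise ValueError("history must not be empty")
    return history[-1]


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def backtest(series, forecaster, start):
    """Forecast every point from index `start` on, using only earlier points,
    and return the mean absolute error of those forecasts."""
    actual = series[start:]
    predicted = [forecaster(series[:i]) for i in range(start, len(series))]
    return mean_absolute_error(actual, predicted)


# Hypothetical weekly sales figures.
weekly_sales = [120, 135, 128, 140, 150, 145]

moving_average_mae = backtest(weekly_sales, moving_average_forecast, start=3)
last_value_mae = backtest(weekly_sales, last_value_forecast, start=3)
print(round(moving_average_mae, 2))  # 11.22
print(last_value_mae)                # 9.0
```

Any later, more complex model can be passed to the same `backtest` function, so its error is directly comparable with these starting points.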
Point 4. The first version of your model is complete. Is machine learning done?
The first working model does more than demonstrate success. At this point, it is usually possible to get rid of temporary solutions and major technical difficulties. And, paradoxically, this is also when the shortcomings become easier to identify. That’s why the first model turns out to take only 25% of the time and effort. The remaining resources will be spent on enhancing this version, getting more data, and fixing the ML component’s preprocessing. The better the component becomes, the more effort each further improvement takes.
What has changed from the previous step? You now have a new starting point, but you keep the same indicators (metrics) and continue to improve them.
Point 5. How to maintain the ML model?
You should not expect the customer’s behavior to stay the same. The external environment and the market will inevitably change as well. Any model needs to receive new data and be updated regularly. At this stage, you will need a set of metrics to control the model’s quality. Depending on what they show, there are two possible scenarios:
1) Retraining the model on new data is enough to keep the quality of its work at an acceptable level. In this case, continue maintaining the model.
2) Something has changed dramatically, and completely new data has to be added. This means the model has to be refined by going back to the top of this checklist.
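The decision between these two scenarios can be written down as a small rule over a quality metric. In this sketch the metric is an error where lower is better, and the threshold, the function name, and the extra "keep" outcome (for a model that is still good enough without retraining) are assumptions added for illustration.

```python
def maintenance_action(current_error, retrained_error, acceptable_error):
    """Choose a maintenance step from an error metric (lower is better).

    current_error:    error of the deployed model on recent data
    retrained_error:  error of the same model after retraining on new data
    acceptable_error: the highest error the business can tolerate
    """
    if current_error <= acceptable_error:
        # Assumed extra case: the deployed model still performs well enough.
        return "keep"
    if retrained_error <= acceptable_error:
        # Scenario 1: retraining on new data restores acceptable quality.
        return "retrain"
    # Scenario 2: retraining is not enough; new kinds of data are needed.
    return "revisit checklist"


# Hypothetical forecast errors measured on the latest month of data.
print(maintenance_action(current_error=9.0, retrained_error=8.5, acceptable_error=10.0))
print(maintenance_action(current_error=14.0, retrained_error=9.5, acceptable_error=10.0))
print(maintenance_action(current_error=14.0, retrained_error=13.0, acceptable_error=10.0))
```

The three calls print `keep`, `retrain`, and `revisit checklist` respectively. In practice the acceptable error would come from the metrics agreed on at Point 3.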
Advantages & Disadvantages of a Machine Learning Component
It takes a long time to implement. It is expensive. It needs data and computational resources. Why do you still want to use Machine Learning? It is all because of its extensibility and maintainability.
If a handwritten system deals with data that changes over time, the system needs to be rewritten. A machine learning system only requires fresh training logs to retrain and update the model. If a handwritten entity recognition system needs to work with another language, it has to be rewritten. A machine learning component just needs an initial set of data in the other language to be retrained.
Moreover, handwritten systems are generally hard to maintain: every once in a while such a system crashes, or becomes so complicated that it has to be rewritten from scratch because nobody understands the old one.
Machine learning is constantly expanding its capabilities. Every day you can hear about the automation of a new process in another industry that was not possible before.
To sum up, here are the basic requirements for the successful development of a machine learning component:
- Check to see if there is enough data;
- Find the ways to supplement your data set if you need to;
- Consider creating development controls such as a metrics system and baseline model;
- Remember that the process does not end with the first working model; this is just the beginning;
- Even a successful component must be monitored and improved.