Multi-Model Self-Learning Machines: Big Data for Customer Value Analysis
Let us start with an example that we often face in any business vertical.
- Traditionally, we consider basic customer information and models for Customer Comprehensive Value Score, Customer Churn Propensity, Life Cycle Stage Prediction, etc.
- But how can different actions be recommended for segments that are:
  - Dominated by customer score (e.g. high-value customers)
  - Dominated by both score and churn (e.g. high-value customers with churn propensity)
- Subsequently, how can the system refine the segmentation and recommendation models by incorporating additional loop-back information, such as:
  - Date/time when the action reached the customer
  - Customer's location
  - Customer's device
  - Customer's account balance
  - Immediately preceding events (e.g. recharge, browsing, mobility)
In the following, we briefly summarize one basic and three advanced techniques for customer value management by segmentation and profiling.
The first, basic technique is very simple: it is based on tagging customers and applying segmentation rules to those tags. The tags are deduced from descriptive information we already know about the customer.
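A minimal sketch of this basic technique in Python, assuming each customer is a plain dict. The attribute names (`age`, `plan`, `region`), the tags and the segment rules are hypothetical; the point is only the two steps: derive tags from known descriptive data, then map tag combinations to segments.

```python
from typing import Any, Callable

Customer = dict[str, Any]

# Hypothetical tag rules: each tag is deduced from descriptive data we already hold.
TAG_RULES: dict[str, Callable[[Customer], bool]] = {
    "young": lambda c: c["age"] < 30,
    "postpaid": lambda c: c["plan"] == "postpaid",
    "urban": lambda c: c["region"] == "urban",
}

# Hypothetical segment rules, checked in order: a segment applies when all its tags are present.
SEGMENT_RULES: list[tuple[str, frozenset[str]]] = [
    ("young_urban_postpaid", frozenset({"young", "urban", "postpaid"})),
    ("young_urban", frozenset({"young", "urban"})),
    ("postpaid", frozenset({"postpaid"})),
]


def tag(customer: Customer) -> set[str]:
    """Return every tag whose rule matches the customer."""
    return {name for name, rule in TAG_RULES.items() if rule(customer)}


def segment(customer: Customer) -> str:
    """Return the first segment whose required tags are all present, else 'default'."""
    tags = tag(customer)
    for name, required in SEGMENT_RULES:
        if required <= tags:
            return name
    return "default"


print(segment({"age": 25, "plan": "postpaid", "region": "urban"}))  # → young_urban_postpaid
```

Because the rules are fixed and ordered, this level never changes on its own; every new insight requires someone to edit `TAG_RULES` or `SEGMENT_RULES`.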
The next level is a single-model, siloed approach that takes into account customer information from a single set of attributes and a single set of prediction models for computed attributes.
The next higher level of maturity is reached when segmentation depends on multiple models, often competing or collaborating, to develop deeper insights into customer behavior.
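The two segment types from the opening example (dominated by score, and dominated by both score and churn) can only be told apart when two models contribute. A minimal sketch, assuming a value model and a churn model that each emit a score in [0, 1]; the cut-offs, segment names and actions are hypothetical placeholders, not calibrated values.

```python
# Hypothetical cut-offs; both model outputs are assumed to be scores in [0, 1].
VALUE_CUT = 0.7
CHURN_CUT = 0.5

# Hypothetical action catalogue per combined segment.
ACTIONS = {
    "high_value_at_risk": "retention offer",
    "high_value": "premium upsell",
    "low_value_at_risk": "low-cost win-back message",
    "low_value": "no action",
}


def combined_segment(value_score: float, churn_propensity: float) -> str:
    """Segment on the outputs of two models (value and churn) instead of one."""
    high_value = value_score >= VALUE_CUT
    at_risk = churn_propensity >= CHURN_CUT
    if high_value and at_risk:
        return "high_value_at_risk"
    if high_value:
        return "high_value"
    if at_risk:
        return "low_value_at_risk"
    return "low_value"


def recommend(value_score: float, churn_propensity: float) -> str:
    """Look up the action attached to the combined segment."""
    return ACTIONS[combined_segment(value_score, churn_propensity)]


print(recommend(0.9, 0.8))  # → retention offer
print(recommend(0.9, 0.1))  # → premium upsell
```

A single value model would put both example customers in the same "high value" segment; the churn model is what routes the first one to a retention action.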
The ultimate level of maturity is reached when these tagging and segmentation rules are self-learning and self-optimizing, based on incremental changes captured from the customer's environment and his or her interaction with the business, as shown in the following two diagrams:
So how do these methods compare with each other? We try to summarize this below:
This brings us to a set of requirements that any learning platform must fulfill. Below we examine six such key requirements.
1. Automatic Algorithm Selection
It is normal practice to apply different ML algorithms to the same problem, evaluate the results (model validation) and then select the most appropriate one. Typical examples are manually choosing between boosted tree ensembles and decision trees, or between Naive Bayes and Random Forest.
It is also common to apply one algorithm to the current month's data and a different one next month, because the model validation results have changed.
In the 'Self Learning Toolkit', the model should be able to select the best algorithm according to preconfigured 'goals', such as 'best accuracy', 'best sensitivity' or 'best performance'.
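A minimal sketch of goal-driven algorithm selection in plain Python. The candidate "models" are hypothetical stand-ins (threshold rules on a single churn score) so the example stays self-contained; in practice they would be trained classifiers such as decision trees or Naive Bayes. Only two goals are wired up, and the names `GOALS` and `select_algorithm` are illustrative.

```python
from typing import Callable, Sequence

Classifier = Callable[[float], int]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def sensitivity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """True positive rate: share of actual positives that were predicted positive."""
    hits = [p for t, p in zip(y_true, y_pred) if t == 1]
    return sum(p == 1 for p in hits) / len(hits) if hits else 0.0


# Pre-configured goals mapped to validation metrics (hypothetical names).
GOALS: dict[str, Callable[[Sequence[int], Sequence[int]], float]] = {
    "best accuracy": accuracy,
    "best sensitivity": sensitivity,
}


def select_algorithm(candidates: dict[str, Classifier], x_val: Sequence[float],
                     y_val: Sequence[int], goal: str) -> tuple[str, float]:
    """Validate every candidate and return the name and score of the best one for `goal`."""
    metric = GOALS[goal]
    scores = {name: metric(y_val, [model(x) for x in x_val]) for name, model in candidates.items()}
    best = max(scores, key=lambda name: scores[name])
    return best, scores[best]


# Hypothetical stand-in "trained models": threshold rules on one churn score.
CANDIDATES: dict[str, Classifier] = {
    "threshold_0.5": lambda x: int(x >= 0.5),
    "threshold_0.3": lambda x: int(x >= 0.3),
    "always_negative": lambda x: 0,
}
X_VAL = [0.1, 0.35, 0.4, 0.45, 0.6, 0.9]
Y_VAL = [0, 1, 0, 0, 1, 1]

print(select_algorithm(CANDIDATES, X_VAL, Y_VAL, "best accuracy"))
print(select_algorithm(CANDIDATES, X_VAL, Y_VAL, "best sensitivity"))
```

With this validation data, 'best accuracy' picks `threshold_0.5` while 'best sensitivity' picks `threshold_0.3`. Rerunning the same selection on next month's validation set is what turns the manual month-to-month switch into an automatic one.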
2. Feature Learning
There are two scenarios:
Scenario 1: Currently, features are chosen by people, and methods such as PCA, SVD or correlation measures such as rho are typically applied to identify the most dominant features in a data set. We then build a model using those features. As the data set changes over time, the dominant features may also change. The machine should be intelligent enough to readjust to the newly discovered dominant features, or at least alert data scientists about the change (in short, monitor the physical, semantic and stochastic quality of the data).
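A minimal sketch of the alerting half of scenario 1, under a stated simplification: instead of PCA or SVD, it ranks features by absolute Pearson correlation with the target and compares the top-k sets of two periods. The feature names, the monthly data and `k` are hypothetical.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def dominant_features(data: dict[str, list[float]], target: list[float], k: int) -> set[str]:
    """Top-k features ranked by absolute correlation with the target."""
    ranked = sorted(data, key=lambda f: abs(pearson(data[f], target)), reverse=True)
    return set(ranked[:k])


def feature_drift(old_data, old_target, new_data, new_target, k=2):
    """Return (newly dominant, no longer dominant) features; both empty means no alert."""
    old = dominant_features(old_data, old_target, k)
    new = dominant_features(new_data, new_target, k)
    return new - old, old - new


# Hypothetical monthly snapshots of three features and a target (e.g. value score).
last_month = {"recharge_amount": [1, 2, 3, 4], "tenure": [4, 3, 2, 1], "data_usage": [1, 3, 2, 1]}
this_month = {"recharge_amount": [2, 1, 2, 1], "tenure": [4, 3, 2, 1], "data_usage": [1, 2, 3, 5]}
target = [1, 2, 3, 4]

added, dropped = feature_drift(last_month, target, this_month, target)
if added or dropped:
    print(f"ALERT: newly dominant {added}, no longer dominant {dropped}")
```

Here `data_usage` becomes dominant while `recharge_amount` drops out, so either a data scientist is alerted or the model is rebuilt on the new top-k set.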
Scenario 2: A prediction algorithm that works with a certain set of features needs to be refined and made more accurate by feeding back additional features in the form of results. A typical case is the following flow:
Customer Value Score -> Micro Segment -> Action -> Result Feedback
In this flow, attributes of the action taken and of the result feedback may be added to the original feature set of the Customer Value Score model to optimize it further. The goal is to bring the result feedback as close as possible to the intention of the action (e.g. Result = 'Conversion Successful' when Action = 'Offer a product').
Challenge: in this scenario, the model starts with fewer features and adds new features to itself as more data is generated.
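Scenario 2 can be sketched as an online logistic regression whose weight table grows whenever a record carries a feature it has not seen before. This is one possible design, assuming features arrive as name-to-value dicts; the feature names (`value_score`, `action_sms`, etc.) and the learning rate are hypothetical.

```python
from math import exp


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    e = exp(z)
    return e / (1.0 + e)


class GrowingLogisticModel:
    """Online logistic regression whose weight table grows when new features appear."""

    def __init__(self, learning_rate: float = 0.1) -> None:
        self.learning_rate = learning_rate
        self.weights: dict[str, float] = {}
        self.bias = 0.0

    def predict_proba(self, features: dict[str, float]) -> float:
        # Features never seen before contribute nothing until they are learned.
        z = self.bias + sum(self.weights.get(name, 0.0) * value for name, value in features.items())
        return sigmoid(z)

    def update(self, features: dict[str, float], label: int) -> None:
        """One stochastic-gradient step on the log loss; unseen features get a new weight."""
        error = label - self.predict_proba(features)
        self.bias += self.learning_rate * error
        for name, value in features.items():
            self.weights[name] = self.weights.get(name, 0.0) + self.learning_rate * error * value


model = GrowingLogisticModel()
# Early on, only the value-score features exist (hypothetical names).
model.update({"value_score": 0.8, "tenure_years": 2.0}, label=1)
# Later, action attributes are fed back with the result (1 = 'Conversion Successful').
model.update({"value_score": 0.8, "tenure_years": 2.0, "action_sms": 1.0, "action_evening": 1.0}, label=1)
model.update({"value_score": 0.3, "tenure_years": 0.5, "action_email": 1.0}, label=0)
print(sorted(model.weights))
```

Storing weights in a dict rather than a fixed-size vector is what lets the feature set grow without retraining from scratch; older records simply have no value for the newer features.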
3. Hyperparameter Optimization
Hyperparameters are parameters of the model or algorithm that are often left at their default values. Typical examples are the number of bags, the number of trees, the iteration limit, the split criterion (e.g. Gini index) and the number of cross-validation folds (k). These parameters should be tuned according to the desired outcome and the algorithm, and tuning them can improve the performance and accuracy of the algorithm. In a self-learning methodology, the model itself can learn to optimize its hyperparameters for the best results.
Alternatively, the platform may alert data scientists when certain hyperparameters need tuning.
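A minimal grid-search sketch that also covers the alerting alternative. To stay self-contained, it tunes k in a hand-written k-nearest-neighbour classifier rather than the tree-ensemble hyperparameters listed above; the data, the grid and the `margin` threshold are hypothetical.

```python
from collections import Counter
from itertools import product

Sample = tuple[float, int]  # (feature value, label)


def knn_predict(train: list[Sample], x: float, k: int) -> int:
    """Majority label among the k training points closest to x."""
    nearest = sorted(train, key=lambda s: abs(s[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def validation_accuracy(train: list[Sample], val: list[Sample], k: int) -> float:
    return sum(knn_predict(train, x, k) == y for x, y in val) / len(val)


def grid_search(train, val, grid: dict[str, list]) -> tuple[dict, float]:
    """Try every hyperparameter combination in `grid`; keep the best on the validation set."""
    names = list(grid)
    best_params, best_score = {}, -1.0
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = validation_accuracy(train, val, **params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def needs_tuning(train, val, default_params: dict, grid: dict[str, list], margin: float = 0.05) -> bool:
    """Alert if the best setting in `grid` beats the default by more than `margin`."""
    _, best = grid_search(train, val, grid)
    return best - validation_accuracy(train, val, **default_params) > margin


# Hypothetical toy data with two deliberately noisy labels (at 0.2 and 0.8).
TRAIN = [(0.0, 0), (0.1, 0), (0.2, 1), (0.3, 0), (0.4, 0),
         (0.6, 1), (0.7, 1), (0.8, 0), (0.9, 1), (1.0, 1)]
VAL = [(0.22, 0), (0.78, 1), (0.03, 0), (0.97, 1)]
GRID = {"k": [1, 3, 5]}

print(grid_search(TRAIN, VAL, GRID))             # → ({'k': 3}, 1.0)
print(needs_tuning(TRAIN, VAL, {"k": 1}, GRID))  # → True
```

With k = 1 the classifier copies the noisy labels and scores 0.5 on validation; larger k smooths them out, which is the kind of gain a default setting silently leaves on the table.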
4. Training Sample Auto Optimization
A training sample (which is subdivided into a learning set and a validation set) plays an important role in supervised learning models such as classifiers and regressors. Depending on the algorithm selected, the training sample may need to have certain characteristics. For example, Gaussian Naive Bayes assumes that each feature is normally distributed within each class, and some algorithms need training data sets that are minimally sparse. The problem is that the traditional CRISP-DM sequence has us prepare the training data set before we select the model.
In reality, however, we must tune or optimize the training data set according to the model to get the best outcome. The expectation is that the model should be able to sample the training data automatically and intelligently, or at least alert data scientists when it observes a non-optimal situation.
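A minimal sketch of two pieces of this idea: a stratified split that keeps class shares equal in the learning and validation sets, and a check that raises alerts when the sample does not suit the chosen algorithm. The thresholds `max_sparsity` and `min_minority_share` are hypothetical defaults that would, in practice, be set per algorithm.

```python
import random
from collections import Counter, defaultdict


def stratified_split(rows, labels, val_fraction=0.25, seed=0):
    """Split into learning and validation sets while keeping each class's share."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append((row, label))
    rng = random.Random(seed)
    learning, validation = [], []
    for items in by_class.values():
        rng.shuffle(items)
        n_val = max(1, round(len(items) * val_fraction))  # at least one example per class
        validation.extend(items[:n_val])
        learning.extend(items[n_val:])
    return learning, validation


def sample_alerts(rows, labels, max_sparsity=0.9, min_minority_share=0.1):
    """Flag sample characteristics that a given algorithm may not tolerate."""
    alerts = []
    cells = [value for row in rows for value in row]
    sparsity = sum(value == 0 for value in cells) / len(cells)
    if sparsity > max_sparsity:
        alerts.append(f"sparsity {sparsity:.2f} exceeds {max_sparsity}")
    counts = Counter(labels)
    if len(counts) < 2:
        alerts.append("only one class present")
    else:
        share = min(counts.values()) / len(labels)
        if share < min_minority_share:
            alerts.append(f"minority class share {share:.2f} below {min_minority_share}")
    return alerts


# Hypothetical sample: 20 customers, 4 of them churners (label 1).
rows = [[i % 3, i % 2] for i in range(20)]
labels = [1 if i < 4 else 0 for i in range(20)]
learning, validation = stratified_split(rows, labels)
print(Counter(label for _, label in validation), sample_alerts(rows, labels))
```

The split keeps the 4:16 class ratio in both sets, and this sample raises no alerts; a very sparse or heavily imbalanced sample would return human-readable warnings instead.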
5. Reinforcement Learning
Machine learning algorithms have traditionally supported supervised and unsupervised learning. In reinforcement learning, the model takes actions (outputs) so as to maximize the cumulative reward. An automatic feedback loop of results into the model makes it tune its internal rules or policies.
Although RL is a large umbrella topic, in the customer context we can explore modeling the problem as a Markov Decision Process and solving it with an algorithm such as Q-learning, to decide on the best offer or marketing action for a customer who belongs to a certain segment and has a certain customer value score. For example, if offer 1 is not accepted, the system recommends offer 2; if that is not accepted either, offer 3, and so on. The decision on which offer to make is dynamic and learned by the machine rather than fixed by policy.
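A minimal tabular Q-learning sketch of the offer sequence described above, for a single micro segment. Assumptions (all hypothetical): the state is the set of offers the customer has already rejected, the reward is 1 when an offer is accepted, and customer responses are simulated from fixed acceptance rates that the agent never reads directly. The 1/n step size is one standard choice for tabular Q-learning.

```python
import random
from collections import defaultdict

OFFERS = ("offer_1", "offer_2", "offer_3")
# Hypothetical acceptance rates for one micro segment; the agent only observes
# simulated accept/reject outcomes, never these numbers.
ACCEPT_PROB = {"offer_1": 0.05, "offer_2": 0.7, "offer_3": 0.2}


def available(state: frozenset) -> list[str]:
    """Offers not yet made; the state is the set of offers already rejected."""
    return [o for o in OFFERS if o not in state]


def best_action(q, state: frozenset) -> str:
    return max(available(state), key=lambda a: q[(state, a)])


def train_q_learning(episodes=5000, gamma=0.5, epsilon=0.1, seed=7):
    rng = random.Random(seed)
    q = defaultdict(float)     # Q[(state, offer)] -> estimated discounted reward
    visits = defaultdict(int)  # visit counts, used for a 1/n step size
    for _ in range(episodes):
        state = frozenset()
        while True:
            # Epsilon-greedy exploration over the offers still available.
            if rng.random() < epsilon:
                action = rng.choice(available(state))
            else:
                action = best_action(q, state)
            accepted = rng.random() < ACCEPT_PROB[action]  # simulated customer response
            reward = 1.0 if accepted else 0.0
            next_state = state | {action}
            done = accepted or not available(next_state)
            # Q-learning target: reward plus discounted value of the best next offer.
            target = reward if done else reward + gamma * q[(next_state, best_action(q, next_state))]
            visits[(state, action)] += 1
            q[(state, action)] += (target - q[(state, action)]) / visits[(state, action)]
            if done:
                break
            state = next_state
    return q


def recommended_sequence(q) -> list[str]:
    """Greedy offer order: what to offer next each time the previous offer is rejected."""
    state, sequence = frozenset(), []
    while available(state):
        offer = best_action(q, state)
        sequence.append(offer)
        state = state | {offer}
    return sequence


print(recommended_sequence(train_q_learning()))
```

The learned sequence starts with `offer_2`, the offer with the highest simulated acceptance rate; the order of later offers depends on the simulated rates and the discount factor `gamma`. Replacing the simulator with real campaign responses is what would make the policy adapt over time.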
As a simpler alternative, simulation-based learning (such as Monte Carlo simulation) may be developed, and the action policy can then be designed by marketing personnel (with help from BI) based on the simulation results.
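For the Monte Carlo alternative, marketing proposes candidate offer sequences and a simulation estimates how each one performs. A minimal sketch, assuming hypothetical per-offer acceptance rates for one segment (in practice they would be estimated from past campaign results) and a hypothetical list of candidate policies:

```python
import random

# Hypothetical acceptance rates for one micro segment.
ACCEPT_PROB = {"offer_1": 0.05, "offer_2": 0.7, "offer_3": 0.2}


def simulate_policy(sequence, accept_prob, runs=20000, seed=1):
    """Monte Carlo estimate of (conversion rate, average offers sent) for a fixed offer order."""
    rng = random.Random(seed)
    conversions = offers_sent = 0
    for _ in range(runs):
        for offer in sequence:
            offers_sent += 1
            if rng.random() < accept_prob[offer]:
                conversions += 1
                break  # stop the campaign once the customer accepts
    return conversions / runs, offers_sent / runs


# Hypothetical candidate policies a marketing team might want to compare.
CANDIDATE_POLICIES = [
    ("offer_2",),
    ("offer_2", "offer_3"),
    ("offer_2", "offer_3", "offer_1"),
    ("offer_1", "offer_3", "offer_2"),
]

for policy in CANDIDATE_POLICIES:
    rate, cost = simulate_policy(policy, ACCEPT_PROB)
    print(f"{' -> '.join(policy):<30} conversion={rate:.3f} offers/customer={cost:.2f}")
```

Both three-offer orders converge to the same conversion rate (about 0.77), but putting `offer_2` first sends roughly 1.5 offers per customer instead of about 2.7. That trade-off is the kind of result marketing can turn into a fixed policy without running an adaptive learner.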
6. Artificial Neural Networks
This topic is related to the Feature Learning requirement. ANNs can be one of the methods used to implement feature learning (unsupervised feature learning, close to scenario 2 of the Feature Learning requirement). In addition, ANNs can be used to 'reverse engineer' the rules that people apply based on 'gut feeling', for example ad-hoc scoring, perception-based decisions, root cause analysis and fault diagnosis.
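A minimal sketch of the 'reverse engineering' use, with the simplest ANN, a single-layer perceptron, in plain Python. The analyst rule and its three binary inputs are hypothetical; the perceptron only sees past decisions. Because this particular rule is linearly separable, the perceptron learning rule is guaranteed to reproduce it; a rule that is not linearly separable would need a network with hidden layers.

```python
from itertools import product


def analyst_rule(high_balance: int, recent_recharge: int, complaint: int) -> int:
    """Hypothetical 'gut feeling' rule an analyst applies without writing it down."""
    return int(high_balance and recent_recharge and not complaint)


def train_perceptron(samples, epochs=200, lr=1.0):
    """Classic perceptron learning rule; converges on linearly separable data."""
    n = len(samples[0][0])
    weights, bias = [0.0] * n, 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in samples:
            predicted = int(sum(w * xi for w, xi in zip(weights, x)) + bias > 0)
            error = y - predicted
            if error:
                mistakes += 1
                weights = [w + lr * error * xi for w, xi in zip(weights, x)]
                bias += lr * error
        if mistakes == 0:  # every past decision is reproduced
            break
    return weights, bias


def predict(weights, bias, x) -> int:
    return int(sum(w * xi for w, xi in zip(weights, x)) + bias > 0)


# Past decisions: every combination of three binary flags, labelled by the analyst.
history = [(x, analyst_rule(*x)) for x in product((0, 1), repeat=3)]
weights, bias = train_perceptron(history)
print(weights, bias)  # the learned weights expose the implicit rule
```

Reading the result is the 'reverse engineering' step: positive weights on the balance and recharge flags, a negative weight on complaints and a bias that requires both positive flags together.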
In this post, we have summarized the various methods by which customer value segmentation may be done, and how the process can mature with more advanced customer analytics and self-learning machines. It will be interesting to see how businesses adopt these methods in their roadmaps and approach customer analytics in a holistic manner.