Multi-Model Self-Learning Machines: Big Data for Customer Value Analysis
Let us start with an example we often face in any business vertical.
- Traditionally we consider Customer Basic Information and models for Customer Comprehensive Value Score, Customer Churn Propensity, Life-cycle Stage Prediction, etc.
- But how can different actions be recommended for segments that are
- Dominated by customer score (e.g. high-value customers)
- Dominated by both score and churn (e.g. high-value customers with churn propensity)
- Subsequently, how can the system refine the segmentation models and recommendation models by incorporating additional loop-back information, such as
- Date/Time when the action reached the customer
- Customer’s Location
- Customer’s Device
- Customer’s account Balance
- Immediately Preceding Events (e.g. recharge, browsing, mobility etc.)
- Others
In the following, we briefly summarize one basic and three advanced techniques for Customer Value Management by segmentation and profiling.
The first, basic level is very simple: it relies on tagging customers and on segmentation rules built from those tags. The tags are deduced from descriptive information we already know about the customer.
The next level is a uni-model, silo-ed approach that takes into account customer information from a single set of attributes and a single set of prediction models for computed attributes.
The next higher level of maturity is reached when the segmentation depends on multiple models, often competing or collaborating, to develop deeper insights from customer behavior.
The ultimate level of maturity is reached when these tagging and segmentation rules are self-learning and self-optimizing, based on incremental changes captured from the customer's environment and their interaction with the business, as shown in the following two diagrams:
So how do these methods compare with one another? We try to summarize this below:
This brings us to a set of requirements that need to be fulfilled by any learning platform. Below we examine six such key requirements.
1. Automatic Algorithm Selection
It is a normal practice to apply different ML algorithms to the same problem, evaluate the results (model validation) and then select the most appropriate one. Typical examples are manually choosing between boosted tree ensembles and decision trees, or between Naive Bayes and Random Forest.
It is common to apply one algorithm to the current month's data and a different one next month because the model validation results change.
In the 'Self Learning Toolkit', the model should be able to select the best algorithm according to some pre-configured 'goals'. Goals may be 'best accuracy', 'best sensitivity', 'best performance', etc.
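To make this concrete, here is a minimal sketch of goal-driven algorithm selection with scikit-learn. The candidate algorithms, the goal-to-metric mapping and the synthetic data are illustrative assumptions, not a prescription:

```python
# Minimal sketch: pick the "best" algorithm for a pre-configured goal.
# The candidate models and the goal->metric mapping are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

candidates = {
    "decision_tree": DecisionTreeClassifier(),
    "boosted_trees": GradientBoostingClassifier(),
    "naive_bayes": GaussianNB(),
    "random_forest": RandomForestClassifier(),
}

# Pre-configured 'goals' mapped to scikit-learn scoring names.
goals = {"best accuracy": "accuracy", "best sensitivity": "recall"}
goal = "best sensitivity"

scores = {
    name: cross_val_score(model, X, y, cv=5, scoring=goals[goal]).mean()
    for name, model in candidates.items()
}
best = max(scores, key=scores.get)
print(f"Goal '{goal}': selected {best} (cv score {scores[best]:.3f})")
```

In a real toolkit this selection would simply be re-run every scoring cycle, which is exactly why the winning algorithm can change from month to month as the validation results change.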
2. Feature Learning
There are two scenarios:
Scenario 1: Currently, features are decided by people, and methods such as PCA, SVD, rho, etc. are normally applied to identify the most dominant features of a data set. We then build a model using those features. As the data set changes with time, the dominant features may also change. The machine should have the intelligence to re-adjust to the newly discovered dominant features, or at least be able to alert data scientists about the change (in short, the physical, semantic and stochastic quality of the data).
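As a rough illustration of Scenario 1, the sketch below re-computes the dominant features (by their loadings on the first principal component) for each data snapshot and raises an alert when they change; the data, the simulated drift and the 'top 5' cut-off are all made up:

```python
# Sketch: detect when the dominant features (by PCA loading) change between
# two snapshots of the data and alert the data scientists.
import numpy as np
from sklearn.decomposition import PCA

def dominant_features(X, n_top=5):
    """Return the indices of the features with the largest loading on PC1."""
    pca = PCA(n_components=1).fit(X)
    loadings = np.abs(pca.components_[0])
    return {int(i) for i in np.argsort(loadings)[-n_top:]}

rng = np.random.default_rng(0)
X_last_month = rng.normal(size=(1000, 12))
# Simulated drift: some features become much more variable this month.
X_this_month = rng.normal(size=(1000, 12)) * np.linspace(0.5, 3.0, 12)

prev, curr = dominant_features(X_last_month), dominant_features(X_this_month)
if prev != curr:
    print(f"ALERT: dominant features changed {sorted(prev)} -> {sorted(curr)}")
```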
Scenario 2: A prediction algorithm that works with a certain set of features needs to be refined and made more accurate by feeding back additional features in the form of results. A typical case is the following flow:
Customer Value Score -> Micro Segment -> Action -> Result Feedback
In this flow, attributes about the action taken and attributes of the result feedback may be added to the original feature set of the Customer Value Score to optimize it further. The goal is to make the result feedback as close as possible to the intention of the action (e.g. Result = 'Conversion Successful' when the Action is 'offer a product').
Challenge: In this scenario, the model starts with a smaller number of features and, as more data is generated, it adds new features to itself.
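A hypothetical sketch of this feedback loop follows: attributes of the Action and of the Result Feedback are appended to the original feature set before retraining. All column names and values are invented for illustration:

```python
# Sketch: retrain the value-score model with feedback features appended.
# All column names and data here are hypothetical.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

base_features = ["arpu", "tenure_months", "data_usage_gb"]                     # original feature set
feedback_features = ["action_channel", "hours_to_reach", "converted_before"]   # from Action / Result Feedback

df = pd.DataFrame({
    "arpu": [10, 35, 50, 8, 42, 60],
    "tenure_months": [3, 24, 36, 2, 18, 48],
    "data_usage_gb": [1.2, 5.0, 9.5, 0.4, 6.1, 11.0],
    "action_channel": [0, 1, 1, 0, 2, 1],        # encoded channel of the last action
    "hours_to_reach": [48, 2, 1, 72, 5, 1],      # delay before the action reached the customer
    "converted_before": [0, 1, 1, 0, 0, 1],      # result feedback of the previous campaign
    "high_value": [0, 1, 1, 0, 1, 1],            # target: customer value label
})

for cols in (base_features, base_features + feedback_features):
    score = cross_val_score(RandomForestClassifier(random_state=0),
                            df[cols], df["high_value"], cv=2).mean()
    print(f"{len(cols)} features -> cv accuracy {score:.2f}")
```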
3. Hyper Parameter Optimization
Hyper-parameters are parameters of the model or algorithm that are often left as they are (at their default values). Typical examples are the number of bags, the number of trees, the iteration limit, the split criterion (e.g. Gini index) and the number of folds in cross-validation. These parameters need to be tuned according to the desired outcome and the algorithm. Tuning hyper-parameters can improve the performance and accuracy of the algorithm. In a self-learning methodology, the model itself can learn to optimize the algorithm's hyper-parameters for the best results.
Alternatively, the platform may be able to alert data scientists if certain hyper-parameters need tuning.
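For illustration, a minimal grid-search sketch with scikit-learn; the parameter grid below is an assumption, not a recommendation:

```python
# Sketch: let the platform tune hyper-parameters instead of keeping defaults.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

param_grid = {
    "n_estimators": [50, 200],         # number of trees
    "criterion": ["gini", "entropy"],  # split criterion
    "max_depth": [None, 10],
}
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, cv=5, scoring="accuracy")
search.fit(X, y)
print("Best hyper-parameters:", search.best_params_)
print("Best cv accuracy:", round(search.best_score_, 3))
```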
4. Training Sample Auto Optimization
The training sample (which is subdivided into a learning set and a validation set) plays an important role in supervised learning models such as classifiers and regressors. Depending on the algorithm we select, the training sample may need to conform to certain characteristics. For example, Gaussian Naive Bayes learns best if the training data is normally distributed; in some cases training data sets need to be minimally sparse, and so on. The problem is that traditional CRISP-DM suggests we prepare the training data set before we select the model.
But in reality we must tune or optimize the training data set according to the model to get the best outcome. The expectation is that the model should be able to sample the training data automatically and intelligently, or at least alert the data scientists if a non-optimal situation is observed.
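As an illustrative sketch (with assumed thresholds), the platform could test whether the learning set matches the assumptions of the chosen algorithm, e.g. near-normal features for Gaussian Naive Bayes and a class balance close to the population, and alert otherwise:

```python
# Sketch: check whether the training sample suits the chosen algorithm and
# alert if it does not. The thresholds used here are assumptions.
from scipy import stats
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=10, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2,
                                                  stratify=y, random_state=0)

# Gaussian Naive Bayes assumes roughly normal features: flag features that
# fail a normality test on the learning set.
suspect = [i for i in range(X_train.shape[1])
           if stats.normaltest(X_train[:, i]).pvalue < 0.01]
if suspect:
    print(f"ALERT: features {suspect} deviate from normality; "
          "consider transforming them or choosing another algorithm.")

# Also warn if the class balance of the sample drifted from the full data set.
if abs(y_train.mean() - y.mean()) > 0.05:
    print("ALERT: training sample class balance differs from the population.")
```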
5. Reinforcement Learning
Traditionally, machine learning algorithms support supervised and unsupervised learning. In reinforcement learning, the model tries to take actions (outputs) that maximize a cumulative reward. An automatic feedback loop of the results into the model makes the model tune its internal rules or policies.
Although RL is a large umbrella topic, in the scope of a customer we can explore the possibility of implementing a Markov Decision Process (for example, using the Q-learning algorithm) to decide on the best offer or marketing action for a customer (belonging to a certain segment and having a certain customer value score). For example, if offer 1 is not accepted, then issue offer 2 as the recommendation; if that is not accepted, issue offer 3, and so on. The decision to make the offer is dynamic and driven by machine learning rather than by fixed policies.
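A toy sketch of this idea follows, using tabular Q-learning reduced to its one-step (bandit-style) form; the segments, offers and acceptance probabilities are simulated assumptions:

```python
# Toy sketch: tabular Q-learning for next-best-offer. States are customer
# segments, actions are offers; acceptance probabilities are made up.
import numpy as np

rng = np.random.default_rng(0)
n_segments, n_offers = 3, 3
# Hypothetical probability that a segment accepts an offer (unknown to the agent).
true_accept = np.array([[0.05, 0.20, 0.60],
                        [0.40, 0.10, 0.15],
                        [0.70, 0.30, 0.05]])

Q = np.zeros((n_segments, n_offers))
alpha, gamma, epsilon = 0.1, 0.0, 0.1   # gamma=0: single-step (bandit-style) case

for step in range(20000):
    segment = rng.integers(n_segments)
    # epsilon-greedy choice of offer
    offer = rng.integers(n_offers) if rng.random() < epsilon else int(Q[segment].argmax())
    reward = 1.0 if rng.random() < true_accept[segment, offer] else 0.0
    Q[segment, offer] += alpha * (reward + gamma * Q[segment].max() - Q[segment, offer])

for s in range(n_segments):
    print(f"Segment {s}: best offer learned = {int(Q[s].argmax())}")
```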
As a simpler alternative, simulated learning may be developed (such as Monte Carlo simulation), and the action policy can be designed by the marketing personnel (with help from BI) based on the results of the simulation.
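A minimal Monte Carlo sketch of this alternative: simulate a few candidate offer sequences against assumed acceptance rates and hand the estimated conversion rates to marketing:

```python
# Sketch: Monte Carlo simulation of candidate offer sequences so that marketing
# can pick a fixed policy. The acceptance probabilities are made-up inputs.
import random

accept_prob = {"offer_1": 0.10, "offer_2": 0.25, "offer_3": 0.40}   # assumptions
sequences = [("offer_1", "offer_2", "offer_3"), ("offer_3", "offer_2", "offer_1")]

def simulate(sequence, trials=100_000):
    conversions = 0
    for _ in range(trials):
        for offer in sequence:            # stop at the first accepted offer
            if random.random() < accept_prob[offer]:
                conversions += 1
                break
    return conversions / trials

for seq in sequences:
    print(seq, "-> simulated conversion rate", round(simulate(seq), 3))
```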
6. Artificial Neural Networks
This topic is related to the Feature Learning requirement. ANNs can be one of the methods to implement feature learning (unsupervised feature learning, close to Scenario 2 of the Feature Learning requirement). In addition, ANNs can be used to 'reverse engineer' the rules that people apply based on 'gut feeling', for example ad-hoc scoring, perceptual decisions, root cause analysis, fault diagnosis, etc.
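As a rough sketch of the feature-learning angle, the example below trains a tiny autoencoder-style network (an MLPRegressor taught to reconstruct its own input) and uses the hidden-layer activations as learned features; the data and network size are arbitrary assumptions:

```python
# Sketch: a tiny autoencoder-style network; the hidden-layer activations
# serve as unsupervised learned features.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
Xs = StandardScaler().fit_transform(X)

ae = MLPRegressor(hidden_layer_sizes=(5,), activation="relu",
                  max_iter=2000, random_state=0)
ae.fit(Xs, Xs)                      # learn to reproduce the input

# Forward pass to the bottleneck layer: these 5 activations are the learned features.
learned = np.maximum(0, Xs @ ae.coefs_[0] + ae.intercepts_[0])
print("Learned feature matrix shape:", learned.shape)   # (1000, 5)
```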
Summary:
In this post, we have tried to summarize the various methods by which customer value segmentation may be done, and how the process can mature with more advanced customer analytics and self-learning machines. It will be interesting to see how businesses adopt these methods in their road-maps and approach the problem of customer analytics in a holistic manner.