Big Data Is Finally About Small Data!
When we talk about Big Data we often think of the V's. There is plenty of literature available elaborating on these V's, the most popular being Volume, Velocity and Variety. Two other V's that stand out as well are Veracity and Value.
Veracity basically means truthfulness and deals with the 'appropriateness' of data. It is closely linked with 'Value': what exactly you want to derive out of this large Volume of data generated at high Velocity in myriad types (Variety). Unless you have selected your data appropriately, you cannot derive the value you are looking for.
So what kind of value are you looking for when you talk about Big Data? You are basically looking for insights: key information that helps you take informed decisions efficiently, and which otherwise cannot be found by just looking at an ocean of multidimensional data from multiple sources in multiple formats. To give an example, a typical Big Data application is the recommendation use case: when a consumer accesses one service, the service provider recommends other services that he may like.
It could be related to a book he buys, where the seller (say an online retailer) recommends books that are similar, books he may like because others liked them too, and, going further, books that fit the buyer's taste around that point in time. Frankly, the more dimensions you want to include (books that are similar, books that are liked by others, books that match the reader's browsing interest during that day or past week, books bought by his friends with at least two books in common, and so on), the more complex the recommendation algorithm becomes.
What if you now want to correlate parameters like the reader's and his friends' (say Facebook friends') reading preferences over roughly the same period, and match that up with recent activity, Internet browsing history, actual purchases, purchasing power, monthly expenditure on books, language preferences and so on, to make a more precise recommendation? All of this just to increase the hit ratio. You can always make blind recommendations based on some generic parameters; they will deluge the whole population in the same manner, and you can probably expect one or two successful hits (recommendations that materialize into actual purchases). Or you can invest a few bucks more, crunch a lot of data to give personalized recommendations, and expect a hit one time in four (by the way, a hit ratio of 25% is considered GREAT by current standards).
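As a rough illustration of what "more dimensions" means in practice, here is a minimal sketch that combines several such signals into one weighted score per candidate book. The signal names, weights and the `candidate_books` structure are hypothetical placeholders, not any retailer's actual algorithm.

```python
# Hypothetical multi-signal recommendation scoring.
# Each signal is a 0..1 value attached to a candidate book; the weights say
# how much we trust that signal. All names and numbers are illustrative only.

WEIGHTS = {
    "content_similarity": 0.30,   # similar to the book just bought
    "liked_by_others":    0.25,   # popular among readers of that book
    "recent_browsing":    0.20,   # matches this week's browsing interest
    "friends_purchases":  0.15,   # bought by friends with overlapping taste
    "language_match":     0.10,   # matches the reader's language preference
}

def score(book_signals: dict) -> float:
    """Weighted sum of whatever signals are available for a candidate book."""
    return sum(WEIGHTS[name] * book_signals.get(name, 0.0) for name in WEIGHTS)

# candidate_books: {book_id: {signal_name: value}}
candidate_books = {
    "book_a": {"content_similarity": 0.9, "liked_by_others": 0.4},
    "book_b": {"recent_browsing": 0.8, "friends_purchases": 1.0,
               "language_match": 1.0},
}

# Rank candidates by combined score and keep only a handful for the user.
recommendations = sorted(candidate_books,
                         key=lambda b: score(candidate_books[b]),
                         reverse=True)[:10]
print(recommendations)
```

Every extra dimension you add is another signal to gather, clean and weight, which is exactly where the complexity (and the data volume) comes from.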
What is intriguing, however, is that the algorithms for recommendation are nothing really new. Traditional AI/Machine Learning algorithms devised decades back fit perfectly well for making recommendations. What differentiates the old from the new is the way those algorithms are implemented to crunch large volumes of data, within a fairly reasonable time frame, on fairly common computing infrastructure. And in the end, the output is roughly the same in size.
To elaborate, the earlier algorithm would, say, process one million records on one parameter and identify the top ten thousand users who may potentially like a certain product. The new one will process the same one million records and output a different set of ten thousand users, claiming it to be a set with a better hit ratio because it has taken into consideration perhaps ten million additional records (different parameters) to fine-tune its results.
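To make that contrast concrete, here is a small sketch (plain in-memory Python for readability, with invented field names and scoring): both variants reduce the same million records to a top-10,000 list; only the score changes when the additional data is folded in.

```python
import heapq

# records: a list of dicts such as
#   {"user": "u1", "purchases": 7}
# extra_signals: a lookup built from the "ten million additional records",
#   e.g. {"u1": 0.4} for browsing activity. All fields are illustrative.

def top_users_single_param(records, k=10_000):
    """'Old' style: rank the one million records on a single parameter."""
    return heapq.nlargest(k, records, key=lambda r: r["purchases"])

def top_users_multi_param(records, extra_signals, k=10_000):
    """'New' style: same input, same-sized output, but the score also folds
    in additional signals joined in from other data sets."""
    def score(r):
        extra = extra_signals.get(r["user"], 0.0)
        return 0.6 * r["purchases"] + 0.4 * extra
    return heapq.nlargest(k, records, key=score)
```

The point of the sketch is only the shape of the computation: in both cases the deliverable is a small list of ten thousand users, not the raw data.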
So finally, as we see, the whole activity of Big Data processing is to arrive at Small Actionable Data, which can then be fed to traditional services to get better results. Earlier we were generating small data from small data, and the resultant data might not have been exactly what we wanted. Now it simply becomes better, and possible. Of course, when you go about generating this small actionable data that can actually be consumed by a service (traditional services cannot consume large data, and they need not), you do need to invest in additional infrastructure, quite different from the things of the past.
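One way to picture that handoff, purely as a sketch and not a prescribed architecture: the Big Data side boils everything down to a small table of per-user recommendations, and the traditional service only ever reads that small result.

```python
import csv

# The Big Data job (Hadoop, Spark, or similar) writes something this small as
# its final output: one row per user with a handful of recommended item ids.
def write_actionable_output(path, recommendations):
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for user, items in recommendations.items():
            writer.writerow([user] + items)

# The traditional service never touches the raw data; it just loads the table.
def load_recommendations(path):
    with open(path, newline="") as f:
        return {row[0]: row[1:] for row in csv.reader(f)}
```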
So, to summarize:
1. Traditional services need not consume Big Data directly; hence, as such, they need not change themselves.
2. A Big Data infrastructure needs to ensure that it feeds more appropriate, yet small, data to the service.
3. Veracity and Value are linked and should be viewed together.