In the cloud era, we need to treat search as the glue that lets us find our big data and analyze it together, no matter where it lives.
As big data and the cloud took hold, finding things only got harder. The meltdown was foreseeable: Hadoop deployments formed yet another data silo while producing little actual insight.
Why is this the case? Partly because the technology industry is trend-driven rather than problem-driven. Several years ago it revolved around client/server dressed up as distributed computing (think Enterprise JavaBeans), followed by web services and then big data. Now it is all about machine learning. Most of these steps were necessary, and machine learning is an important problem-solving tool.
What happened when big data emerged? We lost indexing and search
Sadly, the most important problem-solving trend got lost in the jumble: indexing and search.
The modern web began with search. Had Yahoo and the search portals of the late 1990s triumphed, we would have a much smaller web. The dot-com bust happened, and Google rose from its ashes. Search also drove the advance of big data and, arguably, the modern machine-learning trend. Google, Facebook, and other companies needed new ways to handle their indexing jobs and to distribute their massive data at internet scale. Meanwhile, they needed better ways to find and organize data once they hit the limits of crowdsourcing and human curation.
Amazon.com beat the retail market partly because it dared to invest in big data search technology. Why do people choose Amazon over other vendors? Because they'll likely find what they're looking for. In fact, Amazon may even recommend what you want before you go searching for it (although Amazon's recommendation systems are now falling behind the curve). Despite this, many retailers still use the built-in search in their commerce suites and wonder why customer conversion and engagement are off (because customers can't find anything to buy).
What about the companies still running old-style enterprise search products? Some of these products are no longer even maintained, owned by dead or acquired companies. Many people still get by with bookmarks. So if you move some of your data to SaaS, PaaS, or IaaS solutions across several vendors' cloud platforms while keeping the rest behind the firewall, then of course no one is going to find anything!
How do we redefine "integration"?
The old interpretation of data integration was to take all the data and dump it into one big, fat, single place: first databases, then data warehouses, then Hadoop. Amusingly, the more we did this, the further we moved away from indexing technology.
Today, integration means indexing and finding the data where it lives, deduplicating it, and collecting the results. We need to capture timestamps and source IDs in order to establish a single source of truth.
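As a minimal sketch of this idea, the snippet below queries several sources in place, then merges the hits, keeping only the newest copy of each record by timestamp while preserving the source ID. All names here (the `Hit` record, the in-memory sources) are hypothetical stand-ins; in practice each source would be a connector to an on-premises or cloud index.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    record_id: str   # stable ID shared by copies of the same record
    source: str      # which system this copy came from
    timestamp: int   # last-modified time of this copy
    body: str

def federated_search(query, sources):
    """Query each source where the data lives, then merge results,
    keeping the newest copy of each record as the source of truth."""
    newest = {}
    for records in sources.values():
        for hit in records:
            if query.lower() not in hit.body.lower():
                continue
            best = newest.get(hit.record_id)
            if best is None or hit.timestamp > best.timestamp:
                newest[hit.record_id] = hit
    return sorted(newest.values(), key=lambda h: h.record_id)

# Two hypothetical sources holding overlapping copies of the same data.
sources = {
    "on_prem_crm": [Hit("cust-1", "on_prem_crm", 100, "Acme invoice draft")],
    "cloud_saas":  [Hit("cust-1", "cloud_saas", 250, "Acme invoice final"),
                    Hit("cust-2", "cloud_saas", 90, "Beta invoice")],
}

for h in federated_search("invoice", sources):
    print(h.record_id, h.source, h.body)
```

The timestamp comparison is what resolves the duplicate: the stale on-premises copy of `cust-1` loses to the fresher cloud copy, and the source ID tells you where the winning record actually lives.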
How can we integrate big data? We need a single search solution that can reach both our on-premises data and our cloud data. The worst thing we can possibly do is deploy a search tool that searches only one data source, serves only one use case, or can't be used behind the firewall.
In the cloud era, we can't just put everything in one place. We need tools that let us get to exactly the right data, wherever it lives.