If you have a data warehouse, you have very likely already asked yourself whether it is time to migrate it to the cloud. This is an important decision: most companies have invested many years of work in their data warehouse, and they need a good reason to change it.
Depending on which vendor you talk to, you may hear different reasons to justify that migration. Some may even suggest a different solution altogether.
Either way, the sensible first step is to rethink the role of the data warehouse in the cloud. In other words, ask yourself: “What should a data warehouse look like in the cloud?”
It is a complex question, but an ideal cloud data warehouse would meet requirements along these lines:
Separate storage and compute
In the cloud, storage is cheap and compute can be provisioned on demand.
A data warehouse in the cloud should radically separate data storage from the engine that performs the computation. This lets you store as much data as you need and run as many different types of compute engines as are necessary to build and process the data warehouse.
This separation significantly changes the economics of the data warehouse, because you no longer have to build a large on-premises system sized for peak storage needs.
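As a rough illustration of this separation, the following minimal Python sketch keeps all data in one shared storage object while several stateless engines read from it. The names `SharedStorage` and `ComputeEngine` are hypothetical and exist only for this example; a real cloud warehouse would use object storage and managed compute clusters instead.

```python
from dataclasses import dataclass, field


@dataclass
class SharedStorage:
    """Hypothetical stand-in for cheap, shared cloud storage."""
    tables: dict = field(default_factory=dict)

    def write(self, table, rows):
        # Append rows to the named table, creating it if needed.
        self.tables.setdefault(table, []).extend(rows)

    def read(self, table):
        # Return a copy so engines cannot mutate stored data.
        return list(self.tables.get(table, []))


@dataclass
class ComputeEngine:
    """Hypothetical engine that holds no data of its own."""
    name: str
    storage: SharedStorage

    def total(self, table, column):
        return sum(row[column] for row in self.storage.read(table))

    def count(self, table):
        return len(self.storage.read(table))


storage = SharedStorage()
storage.write("sales", [{"amount": 10}, {"amount": 25}, {"amount": 5}])

# Two independent engines work against the same stored data.
reporting = ComputeEngine("reporting", storage)
adhoc = ComputeEngine("adhoc", storage)
```

Because the engines own no data, either one could be added or removed without touching what is stored.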
Provide dedicated on-demand engines to support workloads
The cloud's ability to launch as many different compute engines as necessary to handle workloads reduces the complexity of a data warehouse.
Some of these engines are long-lived: they keep running, serving requests on demand or waiting for batch jobs. Others process a single workload and then disappear. The key point is that each engine must run on its own infrastructure, so that it does not compete with the others for resources. This simplifies the implementation.
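The difference between the two kinds of engine can be sketched in Python, using a private thread pool as a stand-in for each engine's dedicated infrastructure. This is only an analogy under that assumption: `run_once`, `LongLivedEngine`, and `square` are hypothetical names, and a real deployment would provision separate clusters rather than thread pools.

```python
from concurrent.futures import ThreadPoolExecutor


def run_once(workload, items, workers=2):
    """Ephemeral engine: create a private pool, run one workload, shut down."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(workload, items))


class LongLivedEngine:
    """Long-lived engine: keeps its own pool and serves requests on demand."""

    def __init__(self, workers=2):
        self._pool = ThreadPoolExecutor(max_workers=workers)

    def submit(self, workload, item):
        return self._pool.submit(workload, item)

    def shutdown(self):
        self._pool.shutdown(wait=True)


def square(x):
    return x * x


# The batch engine exists only for the duration of this call.
batch_result = run_once(square, [1, 2, 3])

# The dashboard engine stays up between requests and uses its own pool.
dashboard = LongLivedEngine()
first = dashboard.submit(square, 4).result()
second = dashboard.submit(square, 5).result()
dashboard.shutdown()
```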
Rebuild the optimizer around the power of the cloud
The performance of a data warehouse is determined by the quality of the optimizer that analyzes SQL queries and decides how they will run. The cloud offers plenty of computing power, memory at multiple speeds and costs, and massive amounts of low-cost storage. The ideal optimizer for a cloud data warehouse must adapt to exploit these capabilities. For example, given the availability of low-cost storage, it should be possible to cache a large number of query results.
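To make the result-caching idea concrete, here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for the warehouse engine. The `CachingExecutor` class is hypothetical, and the cache is a plain in-memory dictionary keyed by the exact query text and parameters; a real optimizer would also have to invalidate entries when the underlying data changes.

```python
import sqlite3


class CachingExecutor:
    """Hypothetical wrapper that reuses results of identical queries."""

    def __init__(self, conn):
        self.conn = conn
        self.cache = {}
        self.hits = 0

    def query(self, sql, params=()):
        key = (sql, tuple(params))
        if key in self.cache:
            # Same text and parameters: serve the stored result.
            self.hits += 1
            return self.cache[key]
        rows = self.conn.execute(sql, params).fetchall()
        self.cache[key] = rows
        return rows

    def invalidate(self):
        # Must be called after writes, or cached results go stale.
        self.cache.clear()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 10), ("south", 25), ("north", 5)],
)

executor = CachingExecutor(conn)
sql = "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
first = executor.query(sql)   # computed by the engine
second = executor.query(sql)  # served from the cache
```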
Handle the volume and variety of big data
It is arguably much easier to extend a data warehouse to handle various types of unstructured data than to achieve the same with Hadoop plus a powerful SQL engine. A data warehouse in the cloud must be able to process large numbers of raw unstructured documents and extract structured data from them.
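As a small illustration of extracting structured data from unstructured text, the sketch below pulls invoice fields out of free-form messages with a regular expression. The document format, the field names, and the `extract_invoice` function are all assumptions made for this example; real pipelines usually need far more robust parsing.

```python
import re

# Assumed pattern: "invoice <number>" followed later by "<amount> USD|EUR".
INVOICE_PATTERN = re.compile(
    r"invoice\s+#?(?P<invoice_id>\d+).*?"
    r"(?P<amount>\d+(?:\.\d{2})?)\s*(?P<currency>USD|EUR)",
    re.IGNORECASE | re.DOTALL,
)


def extract_invoice(text):
    """Return a structured record, or None if the text holds no invoice."""
    match = INVOICE_PATTERN.search(text)
    if match is None:
        return None
    return {
        "invoice_id": int(match.group("invoice_id")),
        "amount": float(match.group("amount")),
        "currency": match.group("currency").upper(),
    }


documents = [
    "Hello, please find invoice #1042 attached. Total due: 199.50 EUR by Friday.",
    "Lunch on Tuesday?",
    "INVOICE 7: we still owe 30 usd.",
]
records = [extract_invoice(doc) for doc in documents]
```

Documents that do not match yield `None`, so they can be filtered out before the structured rows are loaded into a table.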
Run queries across multiple repositories
Having a single data warehouse is an outdated model. Any organization of significant size needs multiple repositories, so a cloud data warehouse should be able to run queries that span them.
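A query that spans repositories can be sketched on a small scale with SQLite's `ATTACH DATABASE` statement, which lets one connection join tables that live in separate databases. The two in-memory databases and the `orders` and `customers` tables are hypothetical; cross-repository queries in a cloud warehouse rely on the same idea but on very different machinery.

```python
import sqlite3

# The main database plays the role of the first repository.
conn = sqlite3.connect(":memory:")
# A second, separate in-memory database is attached under the name "crm".
conn.execute("ATTACH DATABASE ':memory:' AS crm")

conn.execute("CREATE TABLE main.orders (customer_id INTEGER, amount INTEGER)")
conn.execute("CREATE TABLE crm.customers (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO main.orders VALUES (?, ?)", [(1, 10), (2, 25), (1, 5)]
)
conn.executemany(
    "INSERT INTO crm.customers VALUES (?, ?)", [(1, "Ada"), (2, "Linus")]
)

# One query joins data held in both repositories.
rows = conn.execute(
    """
    SELECT c.name, SUM(o.amount)
    FROM main.orders AS o
    JOIN crm.customers AS c ON c.id = o.customer_id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()
```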
Provide scalable data movement and replication
A data warehouse in the cloud needs a strategy for moving data into and out of the warehouse at scale, and for replicating and synchronizing it. This is another key capability for supporting a multi-repository, multicloud world.
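One common way to keep a replica synchronized without recopying everything is incremental replication with a watermark: only rows changed since the last run are moved. The sketch below assumes each row carries a monotonically increasing version number; the `replicate` function and the dictionary-based stores are hypothetical, and deletes are deliberately not handled.

```python
def replicate(source, target, watermark):
    """Copy rows newer than the watermark; return (new_watermark, copied)."""
    new_watermark = watermark
    copied = 0
    for key, (version, value) in source.items():
        if version > watermark:
            target[key] = (version, value)
            new_watermark = max(new_watermark, version)
            copied += 1
    return new_watermark, copied


# Each store maps a key to a (version, value) pair.
source = {"a": (1, "x"), "b": (2, "y")}
target = {}

# First run copies everything.
watermark, copied_first = replicate(source, target, 0)

# Later changes at the source: one update and one new row.
source["a"] = (3, "z")
source["c"] = (4, "w")

# Second run moves only the two changed rows.
watermark, copied_second = replicate(source, target, watermark)
```

The amount of data moved on each run depends on how much changed, not on the total size of the source, which is what makes the approach scale.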
We could go on to discuss further dimensions, such as support for data streaming, security, and disaster recovery. Many more suggestions are possible, but the aim of this article is to start a conversation about what an ideal cloud data warehouse should look like, and we have opened it with these six ideas. Now it is your turn.