The Rise Of Scale Out Storage
Virtualisation helped solve IT problems but created new ones that only scale-out architectures can solve, says Stefan Bernbo
It’s no secret that data is being created at an astonishing and accelerating rate. In fact, a recent IDC study predicts that by 2020, 35 zettabytes of data will be generated annually, a profound increase from the 1.8 zettabytes created in 2011.
One of the driving forces behind the surge in data creation is the popular adoption of virtual machines. With virtualisation, software emulates the functions of the physical hardware, which greatly reduces hardware-related costs and provides greater flexibility than a physical machine. However, virtualisation also makes it possible for organisations to run more applications at any given time, and those applications create huge amounts of data that require massive amounts of storage.
To stay competitive, the entire storage ecosystem must therefore adapt to a virtualised world as fast as it can, and find ways to cut costs while keeping performance high.
The ascent of virtualisation
Virtualisation has caught on quickly for several reasons, chief among them cost savings and flexibility. One of its main benefits is more efficient use of the data centre’s hardware. Typically, a data centre’s physical servers sit idle for the majority of the time they are switched on. By installing virtual servers within their physical servers, organisations can make fuller use of their CPUs and take better advantage of the hardware they already have. This gives companies an incentive to virtualise more of their hardware infrastructure with the goal of saving money.
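To make the consolidation argument concrete, here is a rough back-of-the-envelope sketch. It assumes every server is the same size, that CPU is the only constraint, and that the utilisation figures are illustrative rather than measured; the function name and numbers are hypothetical and not taken from any vendor tool.

```python
import math


def hosts_needed(num_servers: int, avg_utilisation: float,
                 target_utilisation: float) -> int:
    """Estimate how many physical hosts are needed once servers become VMs.

    Assumes identical hardware and counts CPU load only (no memory,
    I/O or failover headroom), so treat the result as a lower bound.
    """
    if not 0 < avg_utilisation <= 1 or not 0 < target_utilisation <= 1:
        raise ValueError("utilisation values must be in the range (0, 1]")
    # Total work being done today, measured in "fully busy servers".
    total_load = num_servers * avg_utilisation
    # Pack that work onto hosts that are each run up to the target level.
    return max(1, math.ceil(total_load / target_utilisation))


# Hypothetical example: 20 servers that are busy 10% of the time,
# consolidated onto hosts that are allowed to run at 70%.
print(hosts_needed(20, 0.10, 0.70))
```

Under these assumptions, a handful of hosts can carry the work of 20 mostly idle servers, which is where the hardware savings come from. A real sizing exercise would also have to account for memory, peak demand and spare capacity for failures.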
The other primary benefit of virtualisation is its high level of flexibility. It is far more convenient to have infrastructure composed of virtual machines than of physical ones. For example, if an organisation wants to upgrade its hardware, the administrator can easily move the virtual server to the newer, better physical system, which will increase performance and reduce costs. Before virtual servers, administrators needed to install the new server, then reinstall the software and migrate all the data stored on the old server. It tends to be simpler to move a virtual machine than a physical one.
The proliferation of virtualisation
Not every data centre has the need or the desire for virtualisation. But data centres with a significant number of servers, somewhere around 20 or more, are beginning to think about converting their servers into virtual machines. These organisations can reap the substantial cost savings and flexibility benefits described earlier. Furthermore, virtualising one’s servers makes them much easier to manage. Maintaining a sizable number of servers can be quite difficult for data centre staff; virtualisation eases that burden by allowing administrators to run the same number of servers on fewer physical machines.
Despite all the benefits of virtualisation, the trend towards increased implementation of virtual servers is placing more and more stress on typical data centre infrastructure and storage devices.
The impact on storage
In a sense, this problem can be attributed directly to the popularity of virtualisation. The first virtual machines used the local storage within the physical server. This arrangement made it impossible to move a virtual machine from one physical server to another, more powerful one. The introduction of shared storage, either a NAS or a SAN, to these VM hosts effectively solved this problem. That success paved the way for stacking on more and more virtual machines, all of which came to reside in the shared storage. Over time, this evolved into today’s virtualisation scenario, where all the physical servers and virtual machines are connected to the same storage.
But a problem still exists: data congestion.
By having only one point of entry, organisations tend to create a single point of failure. With all data flow forced through a single gateway, data gets bogged down quickly during periods of high demand. As the number of virtual machines and data projects grows, it should be clear that the current approach to storage architecture must be improved. The architecture must evolve to keep up with the rate of data growth.
Studying the pioneers
The early adopters of virtual machines, such as major service providers or telcos, have already come across this problem and are currently taking precautions to mitigate its impact. As other organisations start to virtualise their infrastructure, they will inevitably encounter this issue as well.
However, hope is not lost. Organisations looking to maximise the benefits of virtualisation while avoiding the data congestion brought on by traditional scale-out environments can ensure their storage keeps pace by removing the single point of entry.
Typical NAS or SAN storage solutions have just a single gateway that controls the flow of data, which leads to congestion when demand spikes. Organisations should look for solutions that have multiple data entry points and spread the load uniformly across all servers. That way the system retains optimal performance and minimises lag time even when it is accessed by many users at once.
Even though this tactic might be the most direct fix, the next generation of storage architecture suggests another option as well.
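The difference between one gateway and several entry points can be illustrated with a small sketch. It assumes a simple round-robin policy and uses made-up node names; real scale-out systems may spread load in other ways (hashing, for example), so this is an illustration of the principle rather than any particular product’s method.

```python
from collections import Counter
from itertools import cycle
from typing import Sequence


def peak_load(num_requests: int, entry_points: Sequence[str]) -> int:
    """Return the request count seen by the busiest entry point.

    Requests are handed out round-robin, which is an assumption made
    for this sketch; it spreads them as evenly as possible.
    """
    if not entry_points:
        raise ValueError("at least one entry point is required")
    load = Counter()
    targets = cycle(entry_points)
    for _ in range(num_requests):
        load[next(targets)] += 1
    return max(load.values(), default=0)


# Hypothetical comparison: the same burst of 1,000 requests sent through
# a single gateway versus four storage nodes that each accept requests.
single = peak_load(1000, ["gateway"])
spread = peak_load(1000, ["node-a", "node-b", "node-c", "node-d"])
print(single, spread)
```

With one gateway, every request in the burst lands on the same machine. With four entry points, the busiest node handles only a quarter of the burst, and adding nodes lowers that share further.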
Combining compute and storage
Coming forth to meet the storage challenge of scale-out virtual environments is the idea of running the virtual machines inside the storage nodes themselves (or running the storage inside the virtual machine host). This practice, which turns the virtual machine into a compute node, is quickly becoming the next generation in storage architectures.
With this approach, the whole architecture is essentially flattened out. For example, if an organisation is using shared storage in a SAN, the virtual machine hosts usually sit on top of the storage layer, effectively creating a single large storage system with only one point of entry. Organisations are combating the data congestion problems this type of system creates by moving away from the usual two-layer architecture to one that runs both the virtual machines and the storage in the same layer.
The spread of virtualised infrastructure shows no sign of slowing in the near future. In fact, more and more organisations are implementing virtual machines, and just as many will encounter the performance and lag problems described above. However, by learning from the early pioneers who established the best practices, organisations can create a well-functioning virtual environment that maximises their output and minimises their hardware costs.
Stefan Bernbo is the founder and CEO of Compuverde, and a veteran creator of storage systems for huge data sets.