Managing storage systems in a big data world

The pressure on those managing storage systems in a big data world is growing, as demand for capacity, performance, functionality and flexibility rises and new applications are fuelled by big-data analytics, mobility and social platform integration.

Enterprise Strategy Group’s (ESG) whitepaper, Key reasons to use software-defined storage – and how to get started, outlines approaches that can combat these challenges. It points out that the old practice of buying ever more traditional storage to stay ahead of demand is uneconomic and doomed to failure: it cannot scale, and it meets neither the need for flexibility nor budget constraints.

Another challenge is the complexity of today’s IT environment and how to manage storage across a diverse IT estate.

“Manually managing across heterogeneous storage systems, silos, and clouds increases administrative overhead. For example, duplicating data across storage pools or geographic locations can create copy management cost and complexity issues; expanding to the cloud can increase that complexity,” states the report.

Software-defined storage
Although software-defined storage (SDS) is a broad term, its central tenet is the value that software brings to the storage arena and how software-led storage solutions can meet the needs of an organization’s users, workloads and applications.

ESG points out that organizational needs will vary but relate to factors such as performance, quality of service, functionality and budget.

In a big-data world, SDS makes sense by ensuring that storage matches need in organizations that want to exploit big data for business opportunities without breaking the bank.

Organizations seeing strong growth in unstructured data that plan to deploy new applications, perhaps based on data from the internet of things, might want to consider a solution such as IBM Spectrum Scale, according to ESG, because it can manage the range, scale and accessibility of that data.

Bernie Spang, director of strategy and marketing for the IBM software group, said last year that IBM would invest $1bn over five years to build more software-defined storage offerings and cloud services based on existing IBM storage technology.

“This is building on existing technology that is based on over 700 IBM patents. This will enable our customers to take advantage of hybrid cloud computing. They want economies of cloud computing. They don’t always want their data literally in the cloud. They want private cloud and cloud services where it makes sense,” he said.

Organizations should consider SDS that includes cloud as one of the types of storage under management. It is important to ensure that storage functionality can be mapped to specific workloads dynamically, whether on premises or in a public or private cloud.

Consider a supplier with a wide range of SDS capabilities that can meet specific needs and supports a broad range of hardware. The ability to integrate into an existing architecture is a prerequisite, as no organization can afford to jettison its legacy systems: storage must be managed in a hybrid environment.

Evolution not revolution
CIOs will want to implement SDS in an evolutionary rather than a revolutionary manner, so that managing storage is no longer an administrative and financial burden. Centralized control is desirable with a consistent interface to ensure that storage management costs do not soar.

SDS can manage a heterogeneous environment and removes constraints such as data locations, data types, applications or storage types. The ideal is to have a “pool of data” that can be called on whatever the requirement.
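
To make the “pool of data” idea concrete, here is a minimal sketch of a logical pool that fronts several heterogeneous backends and places each dataset according to a stated requirement rather than a specific device. The backend names, cost figures and placement rule are illustrative assumptions, not any vendor’s API.

```python
# Conceptual sketch of a single logical pool over heterogeneous storage backends.
# Backend names, costs and the placement rule are assumptions for illustration.
from dataclasses import dataclass

@dataclass
class Backend:
    name: str           # e.g. "on-prem-flash", "nas-filer", "cloud-object"
    kind: str           # "block", "file" or "object"
    cost_per_tb: float  # relative monthly cost, used when placing data

class LogicalPool:
    """Callers state a requirement ("performance" or "capacity"), not a device."""
    def __init__(self, backends):
        self.backends = backends

    def place(self, dataset, requirement):
        # Performance workloads are limited to block storage; everything else
        # simply goes to the cheapest backend in the pool.
        if requirement == "performance":
            candidates = [b for b in self.backends if b.kind == "block"]
        else:
            candidates = self.backends
        chosen = min(candidates, key=lambda b: b.cost_per_tb)
        print(f"{dataset} -> {chosen.name} ({requirement})")
        return chosen

pool = LogicalPool([
    Backend("on-prem-flash", "block", 90.0),
    Backend("nas-filer", "file", 30.0),
    Backend("cloud-object", "object", 10.0),
])
pool.place("trading-db", "performance")  # lands on the block tier
pool.place("iot-archive", "capacity")    # lands on the cheapest tier
```

The point of the sketch is the separation of concerns: applications ask the pool for what they need, and the software layer decides where the data physically lives.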

Buying traditional storage commits an organization to spending on both the hardware and the associated proprietary management software. SDS ensures savings on capital and operational expenditure as there is just one logical software investment to manage all storage hardware.

This is critical in a big data world as economies of scale are necessary and storage administrators need to be able to manage more data than ever before and utilize resources efficiently.

In a separate paper, The optimal storage platform for big data, ESG highlights that managing storage in a big data world is best achieved when organizations have a data repository that provides both a cost-effective archive and ready access to the most relevant data sets for any given use. While data may accumulate to petabytes, most analytics tasks use less than 25 terabytes, so there is a dual need: cost-effective retention, and flexible access that keeps pace with the changing usage criteria of analytics tasks.

Storage systems also need to handle unstructured data independent of origin, as data from social media and websites offers insight into customer behavior for new business opportunities.

Dynamic tiering is a further requirement, so that the storage platform can allocate data to the appropriate media, as is multi-protocol access to different data sources. “It is more economical and easier to manage if a single, central storage repository can be multi-purpose in nature,” says ESG.
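
As a rough illustration of dynamic tiering, the sketch below applies a simple age-based policy: data left untouched beyond a threshold is demoted one tier towards cheaper media, and data that becomes active again is recalled to the fast tier. The tier names and the 30-day threshold are assumptions made for the example, not ESG’s recommendation or any product’s policy engine.

```python
# Illustrative age-based tiering policy; tier names and thresholds are assumptions.
from datetime import datetime, timedelta

TIERS = ["flash", "disk", "tape"]    # hot -> warm -> cold
DEMOTE_AFTER = timedelta(days=30)    # idle longer than this: move one tier down

def next_tier(current, last_access, now):
    idle = now - last_access
    idx = TIERS.index(current)
    if idle > DEMOTE_AFTER and idx < len(TIERS) - 1:
        return TIERS[idx + 1]        # demote data that has gone cold
    if idle <= DEMOTE_AFTER and idx > 0:
        return TIERS[0]              # recall active data to the fast tier
    return current

now = datetime(2015, 6, 1)
print(next_tier("flash", datetime(2015, 3, 1), now))   # -> "disk": idle for months
print(next_tier("tape", datetime(2015, 5, 30), now))   # -> "flash": active again
```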

Data protection
As companies grow and invest in technologies that make it possible to scale to larger virtual environments, it is critical that the data protection technology keeps pace.

Companies can purchase point solutions for specific data protection needs, but the total cost of ownership and increased complexity can add up very quickly. It is not enough to simply deploy a quick fix for a specific backup issue; the data protection and disaster recovery environments must integrate with the business workflow while becoming more efficient.

ESG points out that IBM Spectrum Protect supports large virtual environments and adds many new features.

“IBM has demonstrated a dedication to improving management efficiency with each major release. And, with each new version, the client-side user interfaces are becoming more capable,” it says.

Tackling the challenges associated with data protection in highly virtualized environments is difficult, says ESG.

“Organizations are looking for easy-to-use tools delivered in a format familiar to their IT staff. They are looking for options that improve recovery time, allow quick validation of backups, and enable IT to restore only the data that matters to the business,” it says.
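
As a loose sketch of what “restore only the data that matters” and “quick validation of backups” can look like, the example below filters a backup catalogue by application and recovery priority, and checks each restored item against a stored checksum. The catalogue layout and field names are invented for illustration; products such as IBM Spectrum Protect expose this kind of capability through their own interfaces.

```python
# Hypothetical backup catalogue; field names are invented for illustration only.
import hashlib

def fingerprint(data):
    return hashlib.sha256(data).hexdigest()

# Each entry records what was backed up, which application owns it and how
# critical it is to the business (1 = restore first, 9 = restore last).
catalog = [
    {"path": "/vm/erp/db.vmdk", "app": "erp", "priority": 1,
     "sha256": fingerprint(b"erp-data")},
    {"path": "/vm/test/tmp.vmdk", "app": "test", "priority": 9,
     "sha256": fingerprint(b"test-data")},
]

def select_for_restore(entries, app, max_priority):
    """Restore only the items that matter: filter by application and priority."""
    return [e for e in entries if e["app"] == app and e["priority"] <= max_priority]

def validate(entry, restored_bytes):
    """Quick validation: compare the restored bytes against the stored checksum."""
    return fingerprint(restored_bytes) == entry["sha256"]

for entry in select_for_restore(catalog, app="erp", max_priority=3):
    print("restoring", entry["path"], "valid:", validate(entry, b"erp-data"))
```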

The priority of data protection and availability is highlighted by the example of CERN, Europe’s nuclear research laboratory. Ian Bird, project leader of the Large Hadron Collider (LHC) Computing Grid, says the IT department at CERN manages a huge data archive of over 100 petabytes.

Scalability is a key requirement, as are data protection and real-time access for the institutes and physicists worldwide who analyze the data from experiments.

IBM Spectrum Protect (formerly IBM Tivoli Storage Manager) provides the safeguards for that data and allows CERN to quickly restore the configuration and data of any machine in the datacenter that suffers a hardware or data failure.

“When you get to the scale of data that we’re talking about, the human effort involved in fixing problems is enormous,” says Bird. “So the more reliability and stability we have, the less manual intervention we need. This is critical for us to be able to afford to run the operation.”

In a big data world, managing storage has many facets that all require consideration.