In-House, In Cloud: Tiering and Prioritizing Big Storage
10/30/12
Unfortunately, while IT budgets have remained relatively stagnant, storage capacity requirements have grown exponentially. Global demand for storage is expected to increase by as much as 48 percent this year, to as much as 2.7 zettabytes. Clearly, both smarter storage and a smarter storage strategy are needed to help businesses stay ahead of demand as the amount of data we store continues to explode.
Two major contributors to this data explosion are the ongoing development of technology that makes it physically possible to store more data than ever before, and the proliferation of data sources, which provides us with more "necessary" data to store. With more companies adding social networking activities, mobile devices at work, automated sensors, Internet transactions and the like, data growth will not slow down any time soon. Regardless of a company's size or the sector it serves, this data has become as important as its capital, raw materials and labor. The ability to analyze that data quickly and accurately can, for example, lead to faster product improvements and greater agility in responding to changes in customer behavior, ultimately resulting in significant savings and profit gains.
However, it has become obvious (at least to IT managers and their staffs) that these big data stores have outgrown the traditional infrastructure, requiring new networking, analytic processes and general computing systems to handle the challenge of petabyte-scale data sets. The question, then, is how to manage the data storage they need on the budget they have.
Big Data in Small Spaces
The complexity associated with large, easily scalable capacity lies in balancing virtually unlimited growth against significant performance degradation. Finding equilibrium on this tightrope is vitally important to applications such as medical imaging, the movie industry and even the business of cloud storage, where large unstructured data sets are the rule. In applications like these, object storage and object-based storage devices (OSDs) may offer a more promising solution.
OSDs group data into related categories or logical units, as defined by the user or designated by an application, and assign each an Object ID (OID) that is used to retrieve the data. Both public cloud service providers and in-house private cloud users can benefit from the innate security functions of OSDs, which provide or block access to data files through an Application Programming Interface (API) or Hypertext Transfer Protocol (HTTP), rather than traditional Internet Small Computer System Interface (iSCSI) or Network-attached Storage (NAS) protocols.
In addition, because each logical unit (containing the OID, metadata and the data itself) is handled as a single self-contained object, the standard limits of hierarchical storage structures that restrict the number of directories and supported levels do not apply. As a result, absolute storage limits are virtually eliminated, and truly unrestricted scalability, enhanced security and functionality become possible. And because OSD architectures lend themselves easily to deduplication, both within individual nodes and in some forms of global deduplication across nodes, data reduction can optimize bandwidth, enabling large infrastructures to be spread across multiple locations.
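As a minimal sketch of the idea (the class and method names here are illustrative, not any vendor's API), an object store can be modeled as a flat map from OIDs to self-contained objects that bundle data with metadata, with no directory hierarchy involved:

```python
import uuid

class ObjectStore:
    """Toy flat-namespace object store: OID -> self-contained object."""

    def __init__(self):
        self._objects = {}  # flat map; no directories or nesting levels

    def put(self, data: bytes, metadata: dict) -> str:
        oid = str(uuid.uuid4())  # assign an Object ID (OID)
        # The stored unit bundles metadata and data as one object
        self._objects[oid] = {"metadata": metadata, "data": data}
        return oid               # caller retrieves by OID, not by path

    def get(self, oid: str) -> bytes:
        return self._objects[oid]["data"]

# Usage: store a medical-image blob with descriptive metadata
store = ObjectStore()
oid = store.put(b"...image bytes...", {"patient": "anon-123", "modality": "X-ray"})
assert store.get(oid) == b"...image bytes..."
```

Because every object is addressed by its OID alone, the namespace stays flat no matter how many objects are stored, which is what removes the directory-count and depth limits of hierarchical file systems.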
Furthermore, storage infrastructures that include such global file systems give multiple users on multiple hosts access to files stored on various back-end storage systems in multiple locations. As a result, even more people can share in the data loop, giving big data even greater business value.
What's on Your Server?
The ability to reduce capacity consumption on the back end, even by a small fraction, can ultimately provide a considerable return on investment as data sets continue to grow over time. Although such a reduction may seem out of reach, multiple studies have shown that at least 80 percent of the data stored on most primary systems is seldom if ever used, yet must often be retained for legal and regulatory compliance.
Another consideration is the application performance requirements of that data. Many businesses mistakenly assume that because a data set is on the "always accessed" or "top priority" list, it must be stored on their first tier. However, in many cases, the applications needed to access and run that data may not need to live on a high-cost, 15K RPM drive to be fully useful.
Beyond usage and performance, consider the organization's data utilization trends. Although most companies would estimate their ideal utilization rate to be around 70 percent, the typical enterprise server maintains utilization rates below 15 percent. So, while 15 percent of data is being accessed and used, 85 percent is waiting on spinning disks, racking up the power and cooling bill. In monetary terms, about $6 of every $10 spent on over-provisioned data storage is a waste of budget.
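The back-of-the-envelope arithmetic behind those figures is straightforward; the budget amount below is a hypothetical example, while the utilization and waste fractions come from the figures above:

```python
# Figures from the discussion above; the budget is a hypothetical example value.
actual_utilization = 0.15        # typical enterprise server utilization
waste_fraction = 0.60            # "$6 of every $10" spent on over-provisioned storage
annual_storage_budget = 500_000  # hypothetical yearly storage spend, in dollars

idle_capacity = 1 - actual_utilization              # share of disks spinning idle
wasted_dollars = waste_fraction * annual_storage_budget

print(f"{idle_capacity:.0%} of capacity idle; "
      f"${wasted_dollars:,.0f} of ${annual_storage_budget:,} wasted")
# → 85% of capacity idle; $300,000 of $500,000 wasted
```

Even small improvements in utilization compound against numbers like these, which is the economic case for the tiering strategies described next.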
All Data is Not Created Equal
Over time, the value of data changes, making multi-tier storage a necessity for today's storage environments. Storage tiering is an optimization strategy built on the concept that by identifying over-provisioned data and moving it to a secondary, high-capacity but moderate-performance storage solution, organizations can significantly reduce the impact on their IT budget without affecting their users.
The over-provisioning of storage performance to these data sets, paired with the under-utilization of resources, results in higher storage costs. These rising costs are not limited to the initial hardware purchase; they include the associated environmental costs of power, cooling, floor space and, eventually, landfill waste. Moving over-provisioned data from the standard 300GB 15K RPM FC drives typically implemented in a mirrored RAID configuration to Blu-ray Disc (BD) drives, for example, can cut overall storage costs (and the impact on the environment) by over 50 percent.
Tiering Storage According to Data Priority
Multi-tier storage management and data policies automatically move data to less expensive storage media, including Blu-ray Disc (BD), resulting in greatly reduced per-gigabyte storage costs. Data management applications let end users quickly and easily find the data they need, while cutting administrative and IT support requirements.
Automated tiering also enables organizations to establish their own schedules and parameters, allowing files to be spread over multiple tiers most of the time while keeping the entire batch quickly recoverable even when stored in different global locations. For example, financial records can be scheduled to move to second- and even third-tier storage after three and six months of inactivity, then move back to the first tier in time to run the quarterly report. Prioritizing data to a multi-tiered storage system that utilizes automated data movement and management technology provides the best solution for today's data explosion, ensuring the highest efficiency at the lowest cost.
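A policy engine like the one just described, with age-based demotion and on-demand recall, can be sketched in a few lines. The thresholds mirror the financial-records example above; the function name and structure are illustrative, not any product's API:

```python
from datetime import datetime, timedelta

# Demotion thresholds from the example: tier 3 after 6 months idle, tier 2 after 3
THRESHOLDS = [(timedelta(days=180), 3), (timedelta(days=90), 2)]

def target_tier(last_accessed: datetime, now: datetime,
                recall_scheduled: bool) -> int:
    """Pick the storage tier for a file from its inactivity and scheduled recalls."""
    if recall_scheduled:              # e.g. quarterly report due: stage back to tier 1
        return 1
    idle = now - last_accessed
    for threshold, tier in THRESHOLDS:  # check longest threshold first
        if idle >= threshold:
            return tier
    return 1                          # recently active data stays on tier 1

now = datetime(2012, 10, 30)
print(target_tier(datetime(2012, 10, 1), now, False))  # active file -> 1
print(target_tier(datetime(2012, 6, 1), now, False))   # ~5 months idle -> 2
print(target_tier(datetime(2012, 3, 1), now, False))   # ~8 months idle -> 3
print(target_tier(datetime(2012, 3, 1), now, True))    # recalled for reporting -> 1
```

In a real system the same decision function would drive a background mover that migrates files between tiers; the point of the sketch is that the policy itself is simple and entirely schedule- and inactivity-driven.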
Tier 1: Best defined as "mission-critical," this is information that is extremely time-sensitive and of high value to the organization. This data must be captured, analyzed and presented at the highest possible speed. An example is transactional data, where even short response delays can result in lost sales. In most cases, high-speed (15K+ RPM) FC/SAS disk drive systems are the expected solution, although PCIe flash and solid-state drives (SSDs) are gaining market share. Even though these solutions have the highest cost per GB, add to power and cooling bills, and require personnel and equipment to monitor and maintain the service level, the expense is justified by their sub-second response speeds.
Tier 2: A lower-cost, midrange storage subsystem, this service level is ideal for "mission-important," active or hot data that requires fast, but not sub-second, response, such as email, less-active financial records, databases and, generally, files more than 30 days old. Traditionally, 7,200 RPM ATA/SATA drives are used.
Tier 3: Important for productivity, but rarely used, this is typically event-driven data. These are files that are generally more than three months old and must be retained for financial, legal or regulatory compliance. Although tape has traditionally been the medium of choice for tier 3 storage, BD optical has been gaining ground because of its lower TCO and its ability to retrieve data in seconds rather than minutes, giving users an "active archive" rather than the offline archive offered by tape.
To Tape or Not to Tape
Long-term electronic data archives of large-scale data are still largely maintained on magnetic tape because of its low media cost, low power cost, extended archive life and removability.
However, tape has its challenges, including accessibility, access performance, data integrity assurance, general management cost, reliability, scalability, and technology compatibility.
The biggest concern with tape reliability is its MTBF (Mean Time Between Failures). The more often tapes are accessed, causing the heads to rub over the tape, the more they wear physically and the sooner they become unreliable. According to Symantec's NetBackup guidelines, a tape that has been mounted 32 times should be considered unreliable.
In addition, unlike disks, tape offers no active processes, such as active data verification and assurance, that enable users to verify media integrity over time. Instead, tapes must be periodically loaded and rewritten to verify data integrity. More recent high-quality tape media carry long life-span ratings, with some tape formats specifying a 30-year lifespan at 20 degrees Celsius and 40 percent non-condensing humidity. However, variances outside these narrow storage conditions shorten the life of the tape.
With tape, users also face backward-compatibility issues as the technology continues to evolve. New LTO-5 tape drives will not read LTO-1 tapes created just a few years ago. As a result, users must either continually migrate their data to the next generation or maintain a collection of obsolete tape drives to read data stored on earlier-generation media. In addition, because data stored on tape must be accessed sequentially, larger tape capacities ultimately mean longer access times.
Big Blu Data
In a comparison of cost per GB for tape cartridges vs. Blu-ray Discs, BD is quickly closing in, with an initial "at register" cost of less than $0.06/GB. BD is easily the more cost-effective solution, however, when the total cost of ownership (TCO) is considered. BD drives can be purchased for under $100 and have a high MTBF of 60,000 power-on hours (POH). In addition, although tape media must be refreshed or replaced every five years, data stored on a BD disc does not need to be migrated for 50 years or more, which also results in less landfill waste. This extended media life provides valuable long-term savings, considering that some records must be retained for more than 50 years.
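A rough media-only TCO comparison over a long retention window can be sketched as follows. The BD price and both media-life figures come from the discussion above; the tape media price is a hypothetical placeholder, and drive, handling and migration labor costs are ignored for simplicity:

```python
import math

def media_tco_per_gb(cost_per_gb: float, refresh_interval_years: float,
                     retention_years: float) -> float:
    """Media-only cost per GB over the retention period, counting each rewrite."""
    rewrites = math.ceil(retention_years / refresh_interval_years)
    return cost_per_gb * rewrites

retention = 50  # years some records must be kept
# BD: ~$0.06/GB, no migration needed for 50 years (figures from the text).
bd = media_tco_per_gb(0.06, 50, retention)
# Tape: refreshed every 5 years (from the text); $/GB here is a hypothetical value.
tape = media_tco_per_gb(0.04, 5, retention)

print(f"BD: ${bd:.2f}/GB, tape: ${tape:.2f}/GB over {retention} years")
# → BD: $0.06/GB, tape: $0.40/GB over 50 years
```

The model makes the structural point explicit: even when tape media is cheaper per GB up front, the five-year refresh cycle multiplies its media cost across a multi-decade retention period, while BD's 50-year media life avoids those rewrites entirely.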
Providing the lowest energy consumption of any digital archive solution, BD solutions require electricity only to read and write data to disc. Unlike tape technologies, which must be kept in a cool, climate-managed environment, BD drives and libraries can lie dormant until data access is required.
Not only is the carbon footprint small, but the physical space requirement of even a 35TB BD library is less than that of an office water cooler.
BD offers a modular design that can expand storage in 25, 50, 100 or 200GB increments. This enables organizations to purchase only the blocks of storage they currently require and gives them the flexibility to add to their archive capacity as needed.
Unlike other media, BD has no head that comes in contact with the disc to cause reliability issues, and a hard top coating safeguards discs from scratches and fingerprints. In addition, because the hardware components and media are standardized, future technologies will always be able to read previous generations of optical media, ensuring backward compatibility.
Big Blu Data in the Cloud
With studies indicating that data storage requirements will grow by a factor of 30 over the next ten years, and that as much as 80 percent of that data will consist of files ranging from terabytes to petabytes in size, we can safely predict that big data is here to stay. Using the object storage systems described previously, these massive data sets can regain their usefulness to a company employing secure, private cloud storage. The potential problem with a public cloud is that many organizations cannot tolerate the latency involved in transferring data to and from a public facility. Another consideration is that laws and regulations may prohibit the public cloud option for many industries.
The private cloud, on the other hand, is proving to be a more flexible and efficient means of storing and accessing big data, although the cost and burden of owning and managing the resources fall back on the organization itself. Pairing the private cloud structure with BD would give organizations the balance they need among file-access security, low-latency accessibility and overall technology reliability.
Yasuhiro Tai is General Manager,
Source: Digital Media Online. All Rights Reserved.