What are the best, simplest, highly available, large-scale NAS file storage options? This is a question that has troubled many of us for many years...
Let me preface this by saying that, in my opinion, it’s relatively easy to do simple HA large-scale block storage (my current favourites being Nimble Storage, and Pure Storage if you have the cash) and it’s also relatively easy to do simple HA large-scale object storage (e.g. Cloudian, Cleversafe, Scality).
But when it comes to file storage, one does not simply do scalable HA NAS.
There are plenty of NAS options, especially in the UNIX/Linux world. Most of them, however, are what I might call geek-only solutions, but let me run through some of the more commercial offerings:
Microsoft Windows Server
Microsoft Windows Server is probably the default choice for NAS, but my sources (who have tried it and have the scars to prove it) tell me that you shouldn’t push it above 100 TB, and that delivering HA using Windows clustering is a world of pain and complexity. So Microsoft does great Windows NAS unless you also want scale, high availability and simplicity. I’m also not too sure about their NFS implementation, which supports only selected NFS versions.
Microsoft Azure File Storage
Microsoft Azure File Storage is fully managed SMB (2.1 and 3.0) storage on the Microsoft Azure cloud. It allows access from within Azure and also directly from your on-premises servers over SMB 3.0. Because it appears almost indistinguishable from a standard Windows file share, file system I/O APIs also work, meaning that apps such as IIS and SQL Server can talk to it as well. This is a fully elastic, highly available NAS offering with options for local and geographic redundancy. It is accessible through the Azure portal as well as a REST API, and of course can be treated just like a standard SMB filer share. Being in the cloud means not having to think much about the hardware and software, so when it comes to ease of use, it is hard to beat. While you can put 500 TB in a subscription, each share is currently capped at 5 TB, so the scalability aspect needs some work.
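To put that share limit in perspective, here is a rough back-of-envelope sketch in Python. It assumes only the two figures quoted above (5 TB per share, 500 TB per subscription); the function name shares_needed and the sample dataset sizes are made up for illustration.

```python
import math

SHARE_LIMIT_TB = 5           # per-share cap quoted above
SUBSCRIPTION_LIMIT_TB = 500  # per-subscription cap quoted above


def shares_needed(dataset_tb):
    """Return how many shares a dataset of dataset_tb terabytes must be split across."""
    if dataset_tb < 0:
        raise ValueError("dataset size cannot be negative")
    if dataset_tb > SUBSCRIPTION_LIMIT_TB:
        raise ValueError("dataset exceeds the per-subscription limit")
    # Round up: a partly filled share still counts as a share to manage.
    return math.ceil(dataset_tb / SHARE_LIMIT_TB)


# Hypothetical dataset sizes, chosen only to show the scaling.
for size_tb in (4, 42, 500):
    print(f"{size_tb} TB -> {shares_needed(size_tb)} share(s)")
```

On these numbers, filling a subscription means carving the data into 100 separate shares, each to be mounted and managed on its own, which is why I say the scalability aspect needs work.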
IBM Spectrum Scale
IBM claims that Spectrum Scale has nailed the performance, HA and reliability issues in delivering enterprise NAS, and maybe it has, but I don’t think simple or easy are words you’re likely to hear from anyone who has dabbled with what used to be called GPFS. The NAS incarnation of Spectrum Scale is essentially the son of the original SONAS product, which was initially defined in part by Andrew Tridgell (primary developer of Samba).
There is a raft of Linux-based products that implement fairly vanilla versions of Samba and NFSv3, but getting them to scale can be problematic. Looking from the outside, Samba seems to have serialized I/O bottlenecks which mean, for example, that disk I/O can block network I/O, or that one user can block another user, so product-builders using Samba need to deal with those issues. Low-cost or free Samba-based implementations tend to struggle when it comes to simplicity, scalability and high availability.
The wise old dogs of NAS
At this point I’m going to risk someone’s ire by lumping two traditional NAS vendors together: EMC and NetApp.
NetApp does a very solid enterprise NAS implementation (they almost invented it), but they also do complexity. Despite what anyone might claim, you do need command line skills, and you need to be very aware of the interdependencies of their many features. NetApp has NFS running in its veins, and there is no question they do a great job there. They have also avoided many Windows integration issues by licensing code directly from Microsoft. But although they inherited scale-out clustering from Spinnaker, the added complexity can make it unattractive, especially considering that for most workloads you can do the job with a simple controller pair. So in summary, NetApp does great NAS unless you want simplicity.
EMC acquired Isilon in 2010 and then acquired Likewise in 2012. Isilon started off using Samba, but gave up and switched to Likewise to provide better Windows integration. Likewise is a competitor to Samba, developed by people close to Microsoft and built for parallel scalability, lower resource consumption, better Windows compatibility and better handling of overload conditions, as well as fewer knobs to tweak (which I see as a strength). The only complaints I hear about Isilon are about I/O performance, which may have as much to do with buying the cheapest, fattest drives as with the architecture itself. Isilon seems to have nailed ease of use, scalability and high availability, and I think the Likewise stack has given them an edge over others that implement Samba. It’s a shame, however, that Isilon removed support for iSCSI, as that was a nice little bonus to have in the mix.
Multi-site file appliances and cloud gateways
Panzura is an example of a global file caching appliance. Panzura delivers distributed file access with multi-site locking, compression and dedup, and can archive into AWS (cost-effectively because of data reduction). Panzura can handle heavy loads at each site and large quantities of data. They are good on scale, ease of use and HA. Typically each site would have a front-end cache of SSD, with a back-end either in AWS or on a central on-premises object store like Cloudian.
CTERA is an example of a global file sync appliance. The sync approach tends to favour many small sites, with local locking only, so there is no centrally stored authoritative copy of a file. It’s really a specialist use case.
Cloudian, the on-premises S3 storage people, have pretty much always offered an NFSv3-to-S3 gateway into their HyperStore object storage back-end. They have also just announced Cloudian HyperStore Connect for Files (HCF), which supports SMB 2.1 and FTP as well, and supports multi-site file locking plus dedup and compression at the back-end. Cloudian also auto-tiers out to AWS (S3 and thence Glacier).
Amazon Web Services
Amazon Elastic File System (EFS) is a fully managed cloud service, but EFS provides NFSv4 only and supports only servers running in Amazon Elastic Compute Cloud (EC2). The issue with NFSv4 is that it is very different from NFSv3, and v3 is what pretty much everyone uses in the NFS world. To my thinking the main driver for v4 is security, but adoption has been very slow because it makes the major change from v3’s stateless protocol to a stateful protocol (more like Windows SMB).
The easy way to think about stateless is that an NFSv3 server does not maintain a connection to a v3 client; it assumes the client is there after initial authentication, and if one goes away the other will only notice when it tries to send or receive some data. It is neither chatty nor intolerant of network interruptions. V4, on the other hand, maintains a connection, just like Windows SMB does. Treating the two as simply v3 and v4 of the same protocol is a bit controversial when such a major architectural change has been introduced. As v4 moves through to 4.1 with support for Panasas’ parallel NFS (pNFS), I am sure that the new tech will become more mainstream.
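For anyone who prefers code to prose, here is a deliberately simplified toy model of the difference in Python. It is not the NFS wire protocol: the class names, the StaleStateError exception and the "fh1" file handle are all invented for illustration. It only shows the general idea that a stateless server can answer any self-contained request, while a stateful server refuses requests once it has lost track of the client.

```python
class StaleStateError(Exception):
    """Raised when a client relies on state the server no longer holds."""


class StatelessServer:
    """Toy v3-style server: remembers nothing about clients between requests."""

    def __init__(self, files):
        self.files = files  # file handle -> file contents

    def read(self, handle, offset, count):
        # Everything needed to serve the request arrives with the request.
        return self.files[handle][offset:offset + count]


class StatefulServer:
    """Toy v4-style server: a client must open a file before reading it."""

    def __init__(self, files):
        self.files = files
        self.open_files = set()  # (client_id, handle) pairs currently open

    def open(self, client_id, handle):
        self.open_files.add((client_id, handle))

    def read(self, client_id, handle, offset, count):
        if (client_id, handle) not in self.open_files:
            raise StaleStateError("client must (re)open the file first")
        return self.files[handle][offset:offset + count]

    def restart(self):
        # A reboot wipes the in-memory state; clients have to re-establish it.
        self.open_files.clear()


files = {"fh1": b"hello world"}

v3 = StatelessServer(files)
print(v3.read("fh1", 0, 5))
v3 = StatelessServer(files)   # simulated reboot: nothing for the client to redo
print(v3.read("fh1", 6, 5))

v4 = StatefulServer(files)
v4.open("client-a", "fh1")
print(v4.read("client-a", "fh1", 0, 5))
v4.restart()                  # simulated reboot: the open state is gone
try:
    v4.read("client-a", "fh1", 6, 5)
except StaleStateError as err:
    print("read refused:", err)
```

The extra bookkeeping on the stateful side is the price of features such as locking and stronger security, and it is a large part of why the two behave so differently when a server or network link goes away.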
HA scalable file storage can be complicated, and I’m sure Amazon is happy to start with a limited use case while they put the technology through its paces and prove its robustness. EFS will likely become a very important option in a 2017 timeframe, but right now, in preview at US$0.30 per GB per month, it’s not so attractive.
So like I said at the start, it’s not as easy as it sounds to achieve highly scalable, highly available NAS with simplicity. Folks have been trying to get this right for many years. Of course there are many options that I didn’t mention above, including some interesting solutions from Data Direct Networks, Avere, and even the new NFS Storageblade from Pure Storage. Everyone will have their favourites, but no one should be under the illusion that simple, scalable, highly available NAS is as easy as block or object storage.
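To see why that price stings at scale, here is a quick sketch of the arithmetic in Python. The only input taken from above is the preview price of US$0.30 per GB per month; the decimal conversion of 1 TB = 1000 GB, the function name monthly_cost_usd and the sample capacities are my own assumptions.

```python
PRICE_CENTS_PER_GB_MONTH = 30  # US$0.30 per GB per month, the preview price quoted above
GB_PER_TB = 1000               # assumption: decimal terabytes


def monthly_cost_usd(capacity_tb):
    """Return the monthly storage bill in US dollars for a whole number of terabytes."""
    # Work in integer cents to avoid floating-point rounding surprises.
    total_cents = capacity_tb * GB_PER_TB * PRICE_CENTS_PER_GB_MONTH
    return total_cents // 100


# Hypothetical capacities, chosen only to show the scaling.
for tb in (1, 10, 100):
    print(f"{tb} TB -> US${monthly_cost_usd(tb):,} per month")
```

At the 100 TB mark that works out to US$30,000 a month on these assumptions, before any data transfer or compute charges.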