Pools

Pools organize storage drives into logical groupings for data management and serve as the containers for your datasets in BrickStor. There are two types of pools in BrickStor:

- Boot Pools
- Hybrid Pools

BrickStor uses the Boot Pool primarily for appliance administration purposes. For the purposes of data management, when this documentation refers to pools, it is referring to hybrid pools.

This walkthrough primarily covers hardware-centric deployments and may not apply to virtual deployments.

Boot Pools

The Boot Pool consists of two mirrored SSDs and contains the BrickStor OS. It is a mirrored pool used to boot the appliance and should remain untouched during normal BrickStor operations. Logs stored on the boot pool are set to auto-rotate and expire to prevent any partition or directory from becoming full.

Hybrid Pools

A typical BrickStor deployment is referred to as a hybrid storage system. A hybrid pool is a collection of drives, optionally with dedicated read-optimized cache devices and write-optimized journal devices. All storage pools are hybrid pools because they combine in-memory read cache with high-capacity persistent storage and, optionally, read and write cache devices.

The high-capacity data drives are organized into virtual devices called vdevs. A vdev, also known as a stripe, is a virtual device that can be a single disk, two or more mirrored disks, or a group of disks with a parity scheme such as RAID-5. A vdev abstracts away some unit of storage, which may or may not have redundancy; vdevs can be viewed as the building blocks of pools. Pools are groups of virtual devices, usually implemented with a data protection scheme such as RAID or mirroring, on top of which filesystems and raw block devices are provisioned. A typical hybrid pool is a mix of mechanical drives and solid-state drives.
In such a pool, data is redundantly stored on large-capacity, slower, typically mechanical devices arranged into a parity scheme that satisfies data protection as well as capacity and IOPS requirements, while high-bandwidth, low-latency solid-state drives are used for caching to accelerate reads and for handling synchronous writes. This enables a much better cost-to-performance ratio than traditional purely mechanical or purely solid-state configurations. BrickStor also configures all-flash pools, which continue to leverage RAM for cache but use solid-state disks instead of mechanical disks to provide consistently lower latency and higher IOPS.

You must configure one or more data pools on a system in order to present storage to consumers via NFS or SMB. While there is no hard limit on the number of pools a system can have, usually fewer than four pools are configured on any given system. Under normal circumstances, the burden of designing and configuring pools is not on the customer, but in instances where a system no longer satisfies previously prescribed requirements, RackTop strongly recommends that customers contact support before making any changes to the configuration of any pool.

From a systems administrator's point of view, a pool is a logical organization of independent drives and contains all information about the devices comprising it, including structure, filesystems, raw volumes, replication targets if any, etc. This information is encoded within the pool's metadata, which makes it possible to easily migrate pools between systems. Critically, this property means that loss of the controller does not in any way compromise data; a replacement controller is all that is necessary to return to normal operations. This feature also enables BrickStor's high availability capabilities, which can move pools, as well as related network configuration, between nodes in the cluster.
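As a rough illustration of how a pool's vdev layout determines its usable capacity, the mirroring and parity concepts above can be sketched in a few lines of Python. This is an illustrative helper, not a BrickStor tool; it assumes the standard raidz parity counts (one to three parity drives per vdev) and ignores metadata overhead, padding, and reservations.

```python
def usable_capacity_tib(pool_type, vdev_count, drives_per_vdev, drive_tib):
    """Rough usable capacity of a pool, in TiB.

    Illustrative sketch only: ignores metadata overhead, padding, and
    reservations. A mirror vdev yields one drive's worth of capacity
    regardless of mirror width; a raidzN vdev loses N drives to parity.
    """
    parity = {"disk": 0, "raidz1": 1, "raidz2": 2, "raidz3": 3}
    if pool_type == "mirror":
        data_drives = 1
    else:
        data_drives = drives_per_vdev - parity[pool_type]
    return vdev_count * data_drives * drive_tib

# Example: 3 raidz1 vdevs of 5 x 10 TiB drives -> 3 * 4 * 10 = 120 TiB usable
# Example: 4 two-way mirror vdevs of 10 TiB drives -> 4 * 1 * 10 = 40 TiB usable
```

The same drives arranged as mirrors give less capacity but, as described later under RAID Performance, better random-IO behavior.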
Adaptive Replacement Cache

Adaptive Replacement Cache (ARC) is a portion of memory in the controller dedicated to caching recently accessed data. The ARC caches both recently written data, with the assumption that this data may be read soon after being written, as well as recently read data, with the assumption that this data is potentially going to be read again. Depending on the popularity of data, it may remain in the cache for a long time or be evicted in favor of other data, based on criteria that both the user and the system can optimize for.

Read Cache

The read cache (L2ARC) is an optional SSD cache device that can be used to extend the amount of data that is cached for read operations. When data is evicted from the ARC, it may move to the L2ARC (based upon user configuration settings). Data read from the L2ARC is moved back into the ARC.

Write Cache

RackTop uses a journal methodology for its write cache, implemented in most systems as a mirrored SSD pair. A journal is both a software concept and a core physical component: a write-ahead log that is used to reduce latency on storage when synchronous writes are issued by clients. RackTop frequently refers to the journal as a ZIL, an intent log, or a log device. For synchronous writes, data is committed to this journal and periodically pushed to primary storage. The journal guarantees that data held in cache is protected from loss on power failure before the cache is flushed to stable storage. A log device is normally only ever written to and never read from. The log device (i.e., the journal) is present to protect the system from unexpected interruptions, such as power loss, a system crash, or loss of storage connectivity.
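The ARC/L2ARC read path described above (evictions from the in-memory ARC spilling into the SSD read cache, and L2ARC hits moving data back into the ARC) can be modeled with a heavily simplified sketch. This is illustrative only: it uses plain LRU ordering, whereas the real ARC policy is adaptive and far more sophisticated.

```python
from collections import OrderedDict

class TwoTierCache:
    """Heavily simplified model of the ARC/L2ARC read path.

    Blocks evicted from the in-memory ARC are demoted to the
    (notionally SSD-backed) L2ARC, and an L2ARC hit moves the block
    back into the ARC. Illustrative only; not BrickStor's actual code.
    """

    def __init__(self, arc_size, l2arc_size):
        self.arc = OrderedDict()     # models the RAM-resident ARC
        self.l2arc = OrderedDict()   # models the SSD read cache
        self.arc_size = arc_size
        self.l2arc_size = l2arc_size

    def _evict_from_arc(self):
        # Least-recently-used block falls out of ARC into L2ARC.
        block, data = self.arc.popitem(last=False)
        self.l2arc[block] = data
        if len(self.l2arc) > self.l2arc_size:
            self.l2arc.popitem(last=False)   # evicted entirely

    def put(self, block, data):
        # Recently written data is cached, since it may be read soon.
        self.arc[block] = data
        self.arc.move_to_end(block)
        if len(self.arc) > self.arc_size:
            self._evict_from_arc()

    def get(self, block):
        if block in self.arc:                # ARC hit (RAM)
            self.arc.move_to_end(block)
            return self.arc[block], "arc"
        if block in self.l2arc:              # L2ARC hit (SSD)
            data = self.l2arc.pop(block)
            self.put(block, data)            # promoted back into ARC
            return data, "l2arc"
        return None, "miss"                  # must read from pool disks
```

In this model, as in the system it sketches, a popular block keeps bouncing back into the fast tier, while cold blocks eventually fall out of both caches and must be read from the pool's data drives.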
In rare instances where recovery is necessary due to power loss or some other catastrophe, the journal is read in order to recreate a consistent state of the pool. This may require rolling back some transactions, but it restores the pool to a consistent state, unlike traditional storage systems where only best effort is promised. RackTop recommends mirroring journal devices as a means of preventing the loss of a journal device, which has performance and potential availability impact. In all pools configured at the factory prior to system shipping, the journal, if present, will be mirrored.

Resilvering

Resilvering is the process of rebuilding a disk within a vdev after a drive has been replaced. BrickStor OS does not have an equivalent of the fsck repair tool common on Unix filesystems. Instead, the filesystem has a repair tool called "scrub", which examines and repairs silent corruption and other problems. Scrub can run while the volume is online; it checks everything, including metadata and data. This process works from the top down and only writes to disk the data that is needed. If a disk was temporarily offline, it only has to rebuild the data that was missed while the device was offline.

RAID Performance

BrickStor uses mirrors and RAID-Z for disk-level redundancy within vdevs.

RAIDZ

RAID-Z vdevs are a variant of RAID-5 and RAID-6: you can choose the number of data disks and the number of parity disks. Today, the number of parity disks is limited to 3 (RAID-Z3). Each data block that is handed over to ZFS is split up into its own stripe of multiple disk blocks at the disk level, across the RAID-Z vdev. This is important to keep in mind: each individual I/O operation at the file system level will be mapped to multiple, parallel, and smaller I/O operations across members of the RAID-Z vdev. When writing to a RAID-Z vdev, ZFS uses a best-fit algorithm while the vdev is less than 90% full.
Write transactions in ZFS are always atomic, even when using RAID-Z: each write operation is only finished once the überblock has been successfully written to disk. This means there is no possibility of suffering from the traditional RAID-5 write hole, in which a power failure can cause a partially written (and therefore broken) RAID-5 set of blocks. Due to the copy-on-write nature of ZFS, there is no read-modify-write cycle for changing blocks on disk: ZFS writes are always full-stripe writes to free blocks. This allows ZFS to choose blocks that are in sequence on the disk, essentially turning random writes into sequential writes and maximizing disk write capabilities. Just like traditional RAID-5 and RAID-6, you can lose up to 1 disk or 2 disks respectively without losing any data using RAID-Z1 and RAID-Z2. And just like ZFS mirroring, for each block at the file system level, ZFS can try to reconstruct data out of partially working disks, as long as it can find a critical number of blocks to reconstruct the original RAID-Z group.

Performance of RAIDZ

When the system writes to a pool, it writes to the vdevs in a stripe. A vdev in a RAID-Z configuration will have the IOPS and performance characteristics of the single slowest disk in that vdev (not a summation of the disks). This is because a read from disk requires a piece of data from every disk in the vdev to complete the read. So a pool with 3 RAID-Z1 vdevs of 5 disks each will have the raw IOPS performance of 3 disks. You may see better performance than this through caching, but this is the maximum raw IOPS the pool can deliver from disk. The more vdevs in the pool, the better the performance.

Performance of Mirrors

When the vdevs are configured as mirrors, the configuration of the pool is equivalent to RAID-10.
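The raw-IOPS rule for RAID-Z described above (each vdev performs like a single disk, so the pool scales with vdev count rather than disk count) can be sketched as follows. The 200 IOPS per-disk figure is an illustrative assumption, not a BrickStor specification.

```python
def raidz_pool_raw_iops(vdev_count, single_disk_iops):
    """Raw read IOPS a pool of RAID-Z vdevs can deliver from disk.

    Each RAID-Z vdev performs like its single slowest disk, because a
    read needs a piece of data from every disk in the vdev, so the
    pool's raw IOPS scale with the number of vdevs, not total disks.
    """
    return vdev_count * single_disk_iops

# Example from the text: 3 RAID-Z1 vdevs of 5 disks each (15 disks total),
# assuming an illustrative 200 IOPS per mechanical disk, yields the raw
# performance of only 3 disks: 3 * 200 = 600 IOPS.
```

Caching can deliver much better observed performance than this, but the figure bounds what the pool can sustain from disk alone.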
A pool with mirrored vdevs will always outperform other configurations. A read from disk only needs data from one disk in the mirror. As with RAID-Z, the more vdevs, the better the performance. Resilver times with mirrored vdevs are faster than with RAID-Z and have less of a performance impact on the overall system during resilvering. RackTop recommends the use of mirrored vdevs in environments with high random IO, such as virtualization, because it provides the highest performance.

Compression

Compression is performed inline and at the block level. It is transparent to all other layers of the storage system. Each block is compressed independently, and all-zero blocks are converted into file holes. To prevent "inflation" of already-compressed or incompressible blocks, BrickStor maintains a 12.5% compression savings threshold, below which blocks are written in uncompressed format. BrickStor supports compression via LZJB, GZIP (levels 1-9), ZLE, and LZ4. RackTop finds that LZ4 works very well, balancing speed and compression performance. It is common to realize a 1.3 to 1.6 compression ratio with highly compressible data, which not only optimizes storage density but also improves write performance due to the reduction in disk IO. RackTop recommends always using compression because any CPU penalty is typically outweighed by the savings in storage and bandwidth to the disk.

Deduplication

Deduplication is performed inline and at the block level. Like compression, deduplication is transparent to all other layers of the storage system. For deduplication to work as expected, the blocks written to the system must be aligned. Turning deduplication off will not reverse the deduplication of blocks already written to the system; this can only be accomplished by copying or moving the data.
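The 12.5% threshold described in the Compression section can be illustrated with a small sketch. The exact comparison semantics (at-least versus strictly-greater) are an assumption here for illustration.

```python
def should_store_compressed(original_bytes, compressed_bytes):
    """Sketch of the anti-inflation rule described above: a block is
    written compressed only if compression saves at least 12.5% of its
    size; otherwise it is written uncompressed. Illustrative only."""
    savings = 1 - compressed_bytes / original_bytes
    return savings >= 0.125

# A 128 KiB block that compresses to 96 KiB saves 25% -> stored compressed.
# One that only shrinks to 120 KiB saves ~6%     -> stored uncompressed.
```

This is why already-compressed data (media files, encrypted data, etc.) simply passes through without wasting space or CPU on a futile second compression.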
Deduplication negatively impacts system performance if data is not significantly duplicative, because an extra operation must be performed on writes to check whether a block is a duplicate, and on deletes to check whether it is the last reference to a block. Additionally, the deduplication table must be stored in RAM, taking up space that could otherwise be used for metadata and caching. Should the deduplication table not fit entirely in RAM, system performance will degrade sharply, because every read and write operation will require the system to reread the dedup table from disk. Deduplication is only supported on all-flash pools.

Clones

ZFS clones create an active version of a snapshot. By creating a snapshot of a base VM and using clones of that same snapshot, you can have an unlimited number of copies of the same base virtual machine without taking up more storage capacity. The only increased storage footprint comes from the deltas, or differences, between clones. Additionally, since each VM references the same set of base data blocks, the system and user benefit from caching, because all VMs utilize the same blocks of data.

Imbalance of vdev Capacity

If you wish to grow the capacity of a pool by adding another vdev, you should do so by adding a vdev of equivalent size to the other vdevs in the pool. If the other vdevs are already past 90% capacity, they will still be slow, because data will not automatically balance or spread across all vdevs after the additional capacity is added. To force a rebalance in a VMware environment, you can perform a vMotion or storage migration; because of the copy-on-write characteristics of ZFS, the migrated data will be rewritten across all vdevs, rebalancing the pool.

Pool Hierarchy and Containers

Pools include special containers that are used for organizing datasets and volumes so that they always reside within the same location within the pool.
- Global – Contains all the datasets and other containers except for the tenant containers on a pool
- Volume Container – Contains all virtual block devices, which are special datasets exposed over iSCSI
- Replication – Top-level container for all incoming replication streams from other pools, within the same BrickStor or from other BrickStors
- Meta – Contains all of the user behavior audit data and the snapshot index data

Pool Types

This in-software implementation allows for various parity schemes as well as mirroring configurations. The following table explains the pool types currently supported by RackTop and available in BrickStor:

Table 1. Pool Types

- mirror – Equivalent to RAID 10 / RAID 1+0, aka a stripe of mirrors, where two or more drives per mirror are possible; offers the highest availability with a capacity trade-off
- raidz3 (triple parity) – Like raidz2, but with even more parity protection, allowing for the loss of three drives in each group (vdev)
- raidz2 (double parity) – Equivalent to RAID 60 / RAID 6+0, which allows for the loss of two drives in each group (vdev)
- raidz1 (single parity) – Equivalent to RAID 50 / RAID 5+0, which allows for the loss of a single drive in each group (vdev)
- disk (no parity) – Fast, but with only minimal protection, and total loss if any single device is lost; useful for scratch-only data

Creating Pools

You can create pools from the details pane or Rack View.

Creating Pools from the Details Pane

To create a pool from the details pane, complete the following steps:

1. In Connections, select the appliance. On a clean install, only the appliance level will display.
2. In the lower portion of the details pane, click Add Pool. You can also select the General tab and then click the add icon next to Pools. The Create Pool dialog box appears.
3. In the Create Pool dialog box, type a name for the pool.
4. Under Type, choose one of the following options:
   - mirror
   - raidz3
   - raidz2
   - raidz1
   - disk
5. Optionally, select Auto choose drives from alternating enclosures if you want BrickStor SP Manager to select the drives where your pools will reside. Clear the check box if you prefer to manually select your disks.
6. Under Drive Type, select from available drive types in your deployment.
7. Select the number of vdevs.
8. Select the number of drives per vdev.
9. Optionally, select the number of spare drives.
10. Click Create.
11. In the Changes pane, click Commit Changes.

Creating Pools from Rack View

When you create a pool from Rack View, you can first view a topology of your storage system and then choose drives based on availability.

1. In Connections, select the appliance. On a clean install, only the appliance level will appear.
2. Right-click and select Open Rack View.
3. In the details pane, select the drives where you want to create a pool. Shift-click to select multiple drives. Optionally, select a drive from the right-hand dropdown of drives listed as Available when sorted by Pool. A selected drive will display a blue border.
4. In the lower portion of the Details pane, click Create Pool. The Create Pool dialog box appears.
5. In the Create Pool dialog box, type a name for the pool.
6. Under Type, choose one of the following options:
   - mirror
   - raidz3
   - raidz2
   - raidz1
   - disk
7. Optionally, select Auto choose drives from alternating enclosures if you want BrickStor SP Manager to select the drives where your pools will reside. Clear the check box if you prefer to manually select your disks.
8. Under Drive Type, select from available drives.
9. Select the number of vdevs.
10. Select the number of drives per vdev.
11. Select the number of spare drives you want the pool to have.
12. Click Create. Rack View will display the queued changes, and any pool that will be affected by changes will have the [changes staged] indicator on it.
13. In the Changes pane, click Commit Changes.
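As a quick sanity check before committing either procedure, the number of drives the Create Pool dialog's selections will consume can be computed up front. This is an illustrative helper, not part of BrickStor SP Manager; the fault-tolerance figures follow the Pool Types table above.

```python
def drives_required(vdev_count, drives_per_vdev, spares=0):
    """Total drives a Create Pool selection will consume:
    vdevs x drives per vdev, plus any spares. Illustrative only."""
    return vdev_count * drives_per_vdev + spares

# Tolerated drive losses per vdev, per the Pool Types table.
# (A mirror vdev tolerates width - 1 losses, so it is not a constant.)
FAULT_TOLERANCE_PER_VDEV = {
    "disk": 0,      # total loss if any single device is lost
    "raidz1": 1,    # survives loss of one drive per vdev
    "raidz2": 2,    # survives loss of two drives per vdev
    "raidz3": 3,    # survives loss of three drives per vdev
}

# Example: a raidz2 pool with 4 vdevs of 8 drives plus 2 spares
# needs 4 * 8 + 2 = 34 drives and tolerates two losses in each vdev.
```

Checking this against the number of Available drives in Rack View avoids staging a pool layout the enclosure cannot satisfy.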
Viewing Pools

Selecting a pool in the Connections pane displays information about the pool's structure and performance.

Managing Pools

BrickStor SP Manager features several ways to modify pools that are currently on the system.

Expanding a Pool

There are multiple ways to expand a pool. The first is to select the pool in Rack View, select More from the bottom bar, and then click any of the available expansion options. The second option is to select the pool from the Connections pane on the left-hand side of BrickStor SP Manager and click the Expand Data, Add Read Cache, Add Write Cache, or Add Spare button under the Pool heading, depending on what you would like to add (these buttons only appear if the correct types of drives are available). This brings up the Expand Pool dialog box, where you can choose to expand the pool by adding more vdevs, read and write caches, or spares. When the desired settings have been configured, click Create to queue the change. All changes in the queue will be indicated in Rack View and must be committed using the Changes tab on the right-hand side of BrickStor SP Manager.

Growing a Pool

While expanding a pool primarily deals with adding additional disks to an existing pool, there is also the concept of growing the pool, which is possible when the capacity of the underlying disks increases. This is typically possible when one of the following events occurs:

- The pool is composed of mechanical or solid-state (SSD) drives, and they are replaced with new ones of higher capacity.
- BrickStor SP is a VM, and the size of the pool's virtual disk is increased.
- The pool disk is an iSCSI or Fibre Channel LUN, and the size is increased on the underlying SAN solution.

Should the option to grow the pool become available following one of these events, do the following:

1. Using BrickStor SP Manager, select the desired pool.
2. Select the General tab.
3. Click the Fix button to grow the pool.
Replacing a Drive

If a drive becomes disabled or faulted, it may be necessary to replace it with another available drive in the system. Select the drive you wish to replace in Rack View, click More, and click Replace Drive. Or, if the drive is offline, you can navigate to the degraded pool in the Connections pane on the left-hand side of the screen and click the Replace Drive button under the Notable Vdevs & Drives heading. Selecting an offline drive from Rack View will also bring up actions that can be performed on it. This brings up the Replace Drive dialog box, where you can select the drive to use as the replacement and then click the Replace button to queue the change. The change will be indicated in Rack View and will not be committed until the Commit Changes button is clicked on the Changes tab.

Removing a Spare Drive

If a pool has a spare drive that it no longer requires, the spare can be removed to free up the drive by selecting it in Rack View, selecting More, and clicking the Remove Drive button. The change will be indicated in Rack View and will not be committed until you click the Commit Changes button in the Changes tab on the right-hand side.

Splitting a Mirrored Pool

A pool consisting of mirror vdevs can be split into two pools with no redundancy that contain the same data. Note that this is only recommended in certain scenarios, as the lack of redundancy increases the risk of data loss. To split a mirrored pool, navigate to the pool from the Connections pane on the left-hand side and click the Split Mirrors into New Pool button under the Pool heading (you will need to click the arrow button to the right of the Pool heading to access this). From the Changes tab on the right-hand side, you can change the name of the new pool that will result from the split and commit the changes with the Commit Changes button (by default, the new pool created this way will be exported).
Attaching a Drive to a Pool

A pool with no redundancy can be converted to a mirrored pool, if there are enough available drives, in order to reduce the risk of data loss. To do this, select the pool in Rack View, select More, and click the Attach & Create Mirror button. Or navigate to the pool from the Connections pane on the left-hand side and click the Convert Disks to Mirrors button under the Pool heading. If done through Rack View, you will need to select the drive to attach yourself; when done through the pool's page, a drive is selected for you automatically. The change will be indicated in Rack View and will not be committed until you click the Commit Changes button in the Changes tab on the right-hand side.

Trimming a Pool

If a pool is going to be retired or is no longer necessary and is to be removed, it can be trimmed to the minimum operational set of drives. This removes all redundancy and additional data protection and should only be done in specific scenarios. To trim a pool, navigate to the pool from the Connections pane on the left-hand side and click the Trim Drives to Minimum Operational Set button under the Pool heading (you will need to click the arrow button to the right of the Pool heading to access this). The steps it will take to trim the pool will be listed in the Changes tab, and no changes will take effect until the Commit Changes button is clicked.

Scanning and Repairing a Pool

A pool can be checked for faults or problems and corrected using the scan pool feature. To scan a pool for potential faults, select the pool in Rack View, click the More button at the bottom of Rack View, and click Start Scan; the button is also available on the Pool tab. The scan will not be started until you click the Commit Changes button in the Changes tab. If the scan detects a faulty drive in the pool, it will mark the drive as degraded and replace it with a spare drive if one is available.
From the pool's screen in the Connections pane, the faulted drive will appear under Notable Vdevs & Drives. You can choose to promote the spare drive and detach the faulted drive from the pool, replace the faulted drive with another available drive on the system and return the spare to being a spare for the pool, or clear the errors on the drive if the problem has been corrected and return the spare. These options can also be found at the bottom of the screen in Rack View. Each of these changes requires you to click the Commit Changes button in the Changes tab to complete the action.

Pool Storage Utilization

Storage Utilization allows you to view information about the physical storage consumed by a pool.

Viewing Pool Storage Utilization Statistics

1. In the Connections pane, select a pool.
2. In the Details pane, select Storage Utilization.

Pool Performance

Clicking the Pool Performance link leads to a page with charts and graphs about the pool's performance history. Admins can zoom in on the graph to look at specific time periods.

Pool Sharing Information

The Sharing tab shows the same information as the Sharing menu at the appliance level, but scoped only to the shares on the selected pool.

Pool Settings

This tab contains settings that apply to the pool, including a pool-level reservation. The pool reservation is set by default to 10% of the pool capacity, up to 100GB. This is in place as a safety measure to prevent the pool from becoming completely full, which would make it difficult to perform the operations necessary to remove data. When the pool becomes full, the admin can release some or all of the pool reservation. There is a hidden checkbox at the top of the page, Show Advanced, that provides more options.

Destroying Pools

To destroy a pool, select the destroy icon while in the pool view. Once committed, this will destroy all descendant datasets and snapshots as well. You must double-click the pool(s) in the dialog to confirm.
To ensure that all data is fully unrecoverable, there is also the option to Cryptographically Erase data on Self Encrypting Drives. This option is presented in the Changes pane during the commit. See Cryptographically Erasing SEDs for more details.
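The default pool-level reservation described under Pool Settings (10% of pool capacity, capped at 100GB) can be expressed as a one-line rule. This is an illustrative sketch of the stated default, not BrickStor code.

```python
def default_pool_reservation_gb(pool_capacity_gb):
    """Default pool-level reservation: 10% of pool capacity, capped at
    100 GB, kept as a safety margin so the pool cannot fill completely
    and block the operations needed to remove data."""
    return min(0.10 * pool_capacity_gb, 100.0)

# Example: a 500 GB pool reserves 50 GB; a 10,000 GB pool hits the 100 GB cap.
```

When a pool does fill up, releasing some or all of this reservation gives the administrator the headroom needed to delete data.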