Dedicated Access to GlusterFS-Based Shared Storage on Oracle Cloud Infrastructure

 Sanjay Basu


Oracle Cloud Infrastructure (OCI) provides bare metal compute instances for both high-frequency CPU and GPU environments, which makes it a natural fit for the many high-performance computing (HPC) applications that need that processing power.

However, many applications also require fast access to shared file systems in order to execute quickly and efficiently. Companies developing machine-learning-based applications want to provision their own shared file systems on Oracle Cloud Infrastructure because they are already using them elsewhere and are familiar with them, their performance characteristics, and other features.

We recommend using GlusterFS for very fast shared file storage for HPC, machine-learning, or deep-learning workloads using GPU nodes. GlusterFS is a distributed, scale-out file system that lets you rapidly provision additional storage based on your storage consumption needs. It incorporates automatic failover as a primary feature, and it accomplishes all of this without a centralized metadata server. There is an excellent blog post about setting up GlusterFS on Oracle Cloud Infrastructure for fast shared access from client instances.

Dedicate a VNIC to File Storage Traffic

The raw compute power of these bare metal compute instances can lead to bandwidth constraints for storage traffic, because the storage traffic shares a physical network interface card (NIC) with other network traffic. Our bare metal instances have two physical NICs, each capable of pushing traffic at 25 Gbps. To address this issue, we’ve enabled a configuration that dedicates the second physical NIC, through a secondary VNIC, to file storage traffic. The following figure shows the high-level architecture.

The configuration consists of the following high-level steps:

  1. A separate regional subnet is created for fast data access.
  2. A second NIC (VNIC) is activated on all HPC bare metal shapes.
  3. The second NIC (VNIC) is placed in the fast data access subnet.
  4. The GlusterFS solution for shared file access is placed in this same subnet.

For the console and API details, see the documentation.

Use the following detailed steps to build out this configuration.

Create a Regional Subnet for Data Access

A regional subnet spans all the availability domains in the region. We recommend using regional subnets because they are more flexible and make it easier to implement failover across availability domains.

  1. Open the navigation menu. Under Core Infrastructure, go to Networking and click Virtual Cloud Networks.

  2. Click the VCN you're interested in.

  3. Click Create Subnet.

  4. In the Create Subnet dialog box, specify the resources to associate with the subnet (for example, a route table, and so on). By default, the subnet is created in the current compartment, and you choose the resources from the same compartment. Click the click here link in the dialog box if you want to enable compartment selection for the subnet and each of those resources.

    Enter the following values:

    • Create in Compartment: If you've enabled compartment selection, specify the compartment where you want to put the subnet.
    • Name: A friendly name for the subnet. It doesn't have to be unique. You can’t change it later in the Console, but you can change it with the API. Avoid entering confidential information.
    • Regional or AD-specific subnet: We recommend creating only regional subnets, which means that the subnet can contain resources in any of the region's availability domains.
    • CIDR Block: A single, contiguous CIDR block for the subnet (for example, 172.16.0.0/24). It must be within the cloud network's CIDR block and can't overlap with any other subnets. You can't change this value later. See Allowed VCN Size and Address Ranges. For reference, use a CIDR calculator.
    • Enable IPv6 Address Assignment: This option is available only if the VCN is in the Government Cloud. For more information, see IPv6 Addresses.
    • Route Table: The route table to associate with the subnet. If you've enabled compartment selection, under Route Table Compartment, you must specify the compartment that contains the route table.
    • Private or public subnet: This controls whether VNICs in the subnet can have public IP addresses. For more information, see Access to the Internet.
    • Use DNS Hostnames in this Subnet: This option is available only if you provided a DNS label for the VCN during creation. If you want this subnet's instances to have DNS hostnames (which can be used with the built-in DNS capability in the VCN), select the check box for Use DNS Hostnames in this Subnet. Then you can specify a DNS label for the subnet, or the Console generates one for you. The dialog box automatically displays the corresponding DNS Domain Name for the subnet (<subnet DNS label>.<VCN DNS label>.oraclevcn.com). For more information, see DNS in Your Virtual Cloud Network.
    • DHCP Options: The set of DHCP options to associate with the subnet. If you've enabled compartment selection, under DHCP Options Compartment, you must specify the compartment that contains the set of DHCP options.
    • Security Lists: One or more security lists to associate with the subnet. If you've enabled compartment selection, you must specify the compartment that contains the security list.
    • Tags: Optionally, you can apply tags. If you have permissions to create a resource, you also have permissions to apply free-form tags to that resource. To apply a defined tag, you must have permissions to use the tag namespace. For more information, see Resource Tags.
  5. Click Create.

    The subnet is created and displayed on the Subnets page in the compartment that you chose.
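The CIDR Block rules above (the subnet must sit inside the VCN's CIDR block and must not overlap any other subnet) can be checked before you open the dialog box. The following is a minimal sketch using Python's standard `ipaddress` module. The function name `check_subnet_cidr` and the example VCN block `172.16.0.0/16` are assumptions made for illustration; only the `172.16.0.0/24` example comes from the text above.

```python
import ipaddress


def check_subnet_cidr(vcn_cidr, new_subnet_cidr, existing_subnet_cidrs):
    """Return a list of problems; an empty list means the CIDR looks usable.

    Hypothetical helper: it only mirrors the two rules stated in the
    CIDR Block field description, not every check OCI performs.
    """
    vcn = ipaddress.ip_network(vcn_cidr)
    new = ipaddress.ip_network(new_subnet_cidr)
    problems = []

    # Rule 1: the subnet must be within the cloud network's CIDR block.
    if not new.subnet_of(vcn):
        problems.append(f"{new} is not inside the VCN block {vcn}")

    # Rule 2: the subnet can't overlap with any other subnet.
    for cidr in existing_subnet_cidrs:
        other = ipaddress.ip_network(cidr)
        if new.overlaps(other):
            problems.append(f"{new} overlaps existing subnet {other}")

    return problems


if __name__ == "__main__":
    # Assumed VCN block and existing subnet, for illustration only.
    print(check_subnet_cidr("172.16.0.0/16", "172.16.0.0/24", ["172.16.1.0/24"]))
```

Because the CIDR block can't be changed after the subnet is created, a quick check like this is cheap insurance before you click Create.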

Create a Secondary NIC (VNIC)

You can add secondary VNICs to an instance after it's launched. Each secondary VNIC can be in a subnet in the same VCN as the primary VNIC, or in a different subnet that is either in the same VCN or a different one. However, all the VNICs must be in the same availability domain as the instance.

  1. Confirm that you're viewing the compartment that contains the instance you're interested in.

  2. Open the navigation menu. Under Core Infrastructure, go to Compute and click Instances.

  3. Click the instance to view its details.

  4. Under Resources, click Attached VNICs.

    The primary VNIC and any secondary VNICs attached to the instance are displayed.

  5. Click Create VNIC.

  6. In the Create VNIC dialog box, specify which VCN and subnet to put the VNIC in. By default, the VNIC is created in the current compartment, and you choose the VCN and subnet from the same compartment. Click the click here link in the dialog box if you want to enable compartment selection and choose a VCN or subnet in a different compartment.

    Enter the following values:

    • Name: A friendly name for the secondary VNIC. The name doesn't have to be unique, and you can change it later. Avoid entering confidential information.
    • Virtual Cloud Network Compartment: The compartment that contains the VCN that in turn contains the subnet of interest.
    • Virtual Cloud Network: The VCN that contains the subnet of interest.
    • Subnet Compartment: The compartment that contains the subnet of interest.
    • Subnet: The subnet of interest. The secondary VNIC must be in the same availability domain as the instance's primary VNIC, so the subnet list includes any regional subnets or AD-specific subnets in the primary VNIC's availability domain.
    • Physical NIC: Only relevant if this is a bare metal instance with two active physical NICs. Select which one you want the secondary VNIC to use. When you later view the instance's details and the list of VNICs attached to the instance, they're grouped by NIC 0 and NIC 1.
    • Use network security groups to control traffic: Select this check box to add the secondary VNIC to at least one network security group (NSG) of your choice. NSGs have security rules that apply only to the VNICs in that NSG.
    • Skip Source/Destination Check: By default, this check box is not selected, which means that the VNIC performs the source/destination check. Select this check box only if you want the VNIC to be able to forward traffic. See Source/Destination Check.
    • Private IP Address: Optional. An available private IP address of your choice from the subnet's CIDR (otherwise the private IP address is automatically assigned).
    • Assign public IP address: Whether to assign an ephemeral public IP address to the VNIC's primary private IP. Available only if the subnet is public. For more information, see Public IP Addresses.
    • Hostname: Optional. A hostname to be used for DNS within the cloud network. Available only if the VCN and subnet both have DNS labels. For more information, see DNS in Your Virtual Cloud Network.
    • Tags: Optionally, you can apply tags. If you have permissions to create a resource, you also have permissions to apply free-form tags to that resource. To apply a defined tag, you must have permissions to use the tag namespace. For more information, see Resource Tags.
  7. Click Create VNIC.

    The secondary VNIC is created and then displayed on the Attached VNICs page for the instance. It can take several seconds for the secondary VNIC to appear on the page.

  8. Configure the OS to use the VNIC. See Linux: Configuring the OS for Secondary VNICs or Windows: Configuring the OS for Secondary VNICs.
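If you prefer to script the Console steps above, the same choices can be expressed as a request body for the API's VNIC attachment operation. The following is a rough sketch that only builds the JSON body; it does not call OCI. The field names follow my understanding of the `AttachVnicDetails` and `CreateVnicDetails` models in the Core Services API and should be verified against the current API reference. The function name, the OCID strings, and the display name are hypothetical placeholders.

```python
import json


def build_attach_vnic_request(instance_id, subnet_id, display_name,
                              nic_index=1, private_ip=None):
    """Build a request body for attaching a secondary VNIC (assumed shape).

    nic_index=1 corresponds to the second physical NIC (the Console groups
    VNICs by NIC 0 and NIC 1).
    """
    create_vnic = {
        "subnetId": subnet_id,          # the fast data access subnet
        "displayName": display_name,
        "assignPublicIp": False,        # storage traffic stays private
        "skipSourceDestCheck": False,   # Console default: check is performed
    }
    if private_ip is not None:
        # Optional: otherwise the private IP is assigned automatically.
        create_vnic["privateIp"] = private_ip

    return {
        "instanceId": instance_id,
        "nicIndex": nic_index,
        "createVnicDetails": create_vnic,
    }


if __name__ == "__main__":
    body = build_attach_vnic_request(
        instance_id="ocid1.instance.oc1..exampleinstance",  # placeholder OCID
        subnet_id="ocid1.subnet.oc1..examplesubnet",        # placeholder OCID
        display_name="storage-vnic",                        # hypothetical name
    )
    print(json.dumps(body, indent=2))
```

Attaching the VNIC this way does not remove the need for step 8: the OS still has to be configured to use the new interface.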

In conclusion, the second NIC in each compute instance, reached through a secondary VNIC, is used for fast access to shared file storage and can use that NIC's whole 25 Gbps bandwidth. This gives the instances a dedicated 25 Gbps channel to the file storage headends. This is particularly beneficial for machine-learning training workloads that share massive amounts of data.
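To put the 25 Gbps figure in perspective, the following back-of-the-envelope sketch converts the line rate to bytes per second and estimates a best-case transfer time. The 1,000 GB dataset size is an assumption chosen for illustration, the function names are hypothetical, and the result is a theoretical upper bound: protocol overhead, GlusterFS itself, and the underlying disks all reduce real throughput.

```python
def gbps_to_gigabytes_per_second(gbps):
    """Convert gigabits per second to gigabytes per second (8 bits per byte)."""
    return gbps / 8


def transfer_seconds(dataset_gigabytes, link_gbps):
    """Best-case time to move a dataset over a link running at line rate."""
    return dataset_gigabytes / gbps_to_gigabytes_per_second(link_gbps)


if __name__ == "__main__":
    rate = gbps_to_gigabytes_per_second(25)
    print(f"25 Gbps is at most {rate} GB/s")
    # Assumed dataset size of 1,000 GB (decimal units), for illustration.
    print(f"1,000 GB takes at least {transfer_seconds(1000, 25)} seconds")
```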

If you haven't yet tried Oracle Cloud Infrastructure, you can do so at https://www.oracle.com/cloud/free.
