PDW’s Architecture: The Control Rack

2010 June 27
by Brian Mitchell

 

Trying to describe the architecture of SQL Server 2008 R2 Parallel Data Warehouse without a picture is difficult.  Trying to draw a good picture might be even harder.  I’m going to start with a modified version of what we typically use to describe the PDW appliance.  This diagram is a mixture of logical and physical design that gives us a good starting point for discussions.

Figure: PDW Architecture

 

The SQL PDW appliance ships with a minimum of two racks: a control rack and a data rack. Additional data racks can be added to an appliance to distribute load and increase capacity. Currently up to 40 compute nodes are supported, which works out to either four or five data racks, depending on the vendor.
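To make the rack arithmetic concrete, here is a minimal sketch in Python. The 40-node maximum comes from the paragraph above; the per-rack densities of 10 and 8 compute nodes are my assumption, chosen only because they reproduce the four-rack and five-rack figures. Actual densities depend on the hardware vendor, and the function name is hypothetical.

```python
import math

MAX_COMPUTE_NODES = 40  # supported maximum stated in the post


def data_racks_needed(compute_nodes: int, nodes_per_rack: int) -> int:
    """Return how many data racks are needed to hold the compute nodes.

    nodes_per_rack is an assumed, vendor-specific density; it is not a
    figure taken from any PDW documentation.
    """
    if not 1 <= compute_nodes <= MAX_COMPUTE_NODES:
        raise ValueError("compute_nodes must be between 1 and 40")
    if nodes_per_rack < 1:
        raise ValueError("nodes_per_rack must be at least 1")
    # A partially filled rack still counts as a whole rack.
    return math.ceil(compute_nodes / nodes_per_rack)


if __name__ == "__main__":
    for density in (10, 8):  # assumed densities for two hypothetical vendors
        racks = data_racks_needed(MAX_COMPUTE_NODES, density)
        print(f"{density} nodes per rack: {racks} data racks")
```

Remember that the control rack is always in addition to the data racks counted here.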

The Control Rack includes six servers: two Management Nodes, two Control Nodes, the Landing Zone, and the Backup Node. Storage Area Networks (SANs) are also included for the Control Node, Landing Zone, and Backup Node. Additionally, the control rack ships with the dual InfiniBand, Ethernet, and Fiber switches needed for the rack.

Because the appliance is designed to work out of the box, it includes its own Active Directory, which is housed on the Management Node. There are several reasons why PDW needs Active Directory, one of which is that we use Microsoft Clustering Services (MCS) within the appliance, and MCS requires domain accounts for certain services to run. Additionally, the Management Node includes High Performance Computing (HPC), which is used during the initial install and to ease management of the nodes within the appliance.

The Control Node is where user requests for data enter and exit the appliance. On the control nodes, queries are parsed and then sent to the compute nodes for processing. Additionally, the metadata of the appliance and of the distributed databases is located here. Essentially, the control node is the brains of the operation. No persisted user data is located here; that all exists on the compute nodes within the data racks. User data can be temporarily aggregated on the control node during query processing and is then dropped after the results are sent back to the client.
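The flow described above (send the work out to the compute nodes, combine the partial results on the control node, return the answer, and keep nothing) can be sketched as a simple scatter-gather in Python. This is only an illustration of the pattern, not how PDW is implemented: the node names, the sample data, and both function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical slices of one distributed table; in the appliance the
# persisted user data lives only on the compute nodes.
COMPUTE_NODES = {
    "compute01": [120, 80, 45],
    "compute02": [60, 15],
    "compute03": [200],
}


def run_on_compute_node(node_name: str) -> int:
    """Stand-in for a compute node summing its own slice of the data."""
    return sum(COMPUTE_NODES[node_name])


def control_node_total() -> int:
    """Stand-in for the control node answering a SUM over the whole table."""
    # Scatter: hand the same step to every compute node in parallel.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(run_on_compute_node, COMPUTE_NODES))
    # Gather: the partial results exist only for the life of this call,
    # mirroring the temporary aggregation on the control node.
    return sum(partials)


if __name__ == "__main__":
    print(control_node_total())
```

The point of the sketch is the division of labor: the compute nodes touch the data, while the control node only coordinates and combines.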

The Landing Zone is essentially a large file server with plenty of SAN storage to provide a staging ground for loading data into the appliance. You will be able to load data either through the command line with DWLoader or through SSIS, which now has a connector for PDW. The Backup Node is another large file server, designed to hold backups of the distributed databases on the appliance. Compute nodes will be able to back up to the Backup Node in parallel via the high-speed InfiniBand connections that connect the nodes. From the Backup Node, organizations will be able to offload their backups through their normal procedures. Backups of a PDW database can only be restored to another PDW appliance with at least as many compute nodes as the database had when it was backed up.
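The restore rule in the last sentence is easy to get backwards, so here is a minimal sketch of it as a Python check. The function and parameter names are hypothetical and purely illustrative; this is not a PDW API.

```python
def can_restore(backup_compute_nodes: int, target_compute_nodes: int) -> bool:
    """Return True if a backup can be restored to the target appliance.

    Encodes the rule from the post: the target appliance needs at least as
    many compute nodes as the database had when it was backed up.
    """
    if backup_compute_nodes < 1 or target_compute_nodes < 1:
        raise ValueError("compute node counts must be at least 1")
    return target_compute_nodes >= backup_compute_nodes


if __name__ == "__main__":
    print(can_restore(10, 20))  # restoring to a larger appliance
    print(can_restore(20, 10))  # restoring to a smaller appliance
```

In other words, a backup can move to an appliance of the same size or larger, but not to a smaller one.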

I will continue to describe the control rack’s architecture in detail in upcoming posts.  Specifically, I will go into more detail on the part each component plays in the appliance.  My next post will discuss the data rack’s architecture at a high level.