I had the pleasure of speaking at Ignite again this year. One thing was clear from the event: Microsoft’s momentum in the data platform, big data, and cognitive intelligence spaces continues to grow. For years I’ve talked about how Microsoft’s superpower is making the complex simple. In the early part of this century, that is how SQL Server went from being considered a departmental-level database to being the most often deployed enterprise database in the world. At first it was adopted because of its simplicity, and eventually it gained the enterprise chops to take on any competitor. In Azure, the same can be true as Data Lake Analytics matures and the idea of using a job service instead of building Hadoop clusters becomes normalized.
In my session, Architecting Robust Big Data Solutions with Azure Data Lake, I focused on putting the pieces together based on what has worked well for customers who have deployed big data solutions in Azure. The response to this session has been great, and I thank everyone who attended in person. Now you too can watch the on-demand version of my session along with all the sessions from Ignite 2016.
A colleague and friend of mine, Brian Walker, formerly a Solution Architect in the Data Insights COE, has published a new Analytics Platform System whitepaper on managing security in APS. BrianW (as I call him) has moved on to a new role in sales, but the gift he left behind in this whitepaper is valuable to anyone setting up security in APS. I know you will find it as informative and useful as I did.
Here is a quick summary of the whitepaper: This paper provides a framework for designing, implementing, and monitoring user roles and permissions in an enterprise data warehouse. The intended audience for this paper includes IT pros who are responsible for architecting, implementing, and managing a security framework to support corporate and regulatory compliance. Microsoft’s Analytics Platform System and other Microsoft technologies open the door to enterprise data warehousing (EDW), which can host multiple datasets with varying levels of data classification and regulatory requirements. This model is designed to address these requirements while minimizing administrative overhead.
The Data Insights Global Practice published two important whitepapers on the Analytics Platform System last month. Getting this kind of guidance on APS has been a long time coming. For anyone working on APS, I highly recommend downloading these papers and digesting their information. You’ll be glad you did. I’m working with other Architects in the global practice on a few other whitepapers for APS and you’ll be seeing those shortly.
Andy Isley, Data Insights Solutions Architect, has published Data Loading with APS, which is now publicly available online. This whitepaper explains many of the concepts important to loading data efficiently into APS using both DWLoader and SSIS.
Michael Hlobil, Data Insights Solutions Architect, and Ryan Mich, ACTO Data Insights COE Consultant, authored an outstanding guide, Optimizing Distributed Database Design for APS, which is now publicly available online. This whitepaper goes beyond the basic description of designing for distributed and replicated tables and delves into many edge use cases that you may run into on APS.
Both of these papers are also included in the menu of choices for download within the APS Documentation and Tools page.
If there is other guidance you are looking for, please let me know and I’ll explore my ability to influence getting it done.
The Analytics Platform System team continues to provide rapid updates to the APS environment. The rate of release from the product team has been consistent for the last several years, with 2-3 releases per year. Each of these releases not only stabilizes the platform and brings performance improvements but also continues down the path of providing T-SQL parity with the SQL Server product line. Appliance Update 3 provides updates to the Parallel Data Warehouse workload along with updates to Polybase.
The updates to the PDW workload include moving the Windows Server version up to Windows Server 2012 R2 and the SQL Server version up to SQL Server 2014. Additionally, many T-SQL error-handling constructs are now available, such as TRY...CATCH and @@ERROR.
In a significant improvement to Polybase, it now supports ORC files in Hadoop. ORC files are Optimized Row Columnar files that offer superior compression and thus improved performance over previously supported file formats. The work on ORC files has been a collaboration between Microsoft and Hortonworks, and now we can reap the benefits when using Polybase.
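To give a feel for why a columnar layout like ORC tends to compress well, here is a minimal Python sketch. It is not how ORC is implemented; it only illustrates the general idea that storing values column by column puts repeated values next to each other, where a simple scheme such as run-length encoding can collapse them. The sample rows and the helper names (to_columns, run_length_encode) are hypothetical and exist only for this illustration.

```python
from itertools import groupby

# Hypothetical sample rows: (order_id, country, status)
rows = [
    (1, "US", "shipped"),
    (2, "US", "shipped"),
    (3, "US", "pending"),
    (4, "DE", "pending"),
    (5, "DE", "pending"),
]


def to_columns(rows):
    """Pivot row tuples into one list per column."""
    return [list(col) for col in zip(*rows)]


def run_length_encode(values):
    """Collapse consecutive repeated values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


order_ids, countries, statuses = to_columns(rows)

# Within a single column, repeats sit side by side and encode compactly.
encoded_countries = run_length_encode(countries)
encoded_statuses = run_length_encode(statuses)

print(encoded_countries)  # [('US', 3), ('DE', 2)]
print(encoded_statuses)   # [('shipped', 2), ('pending', 3)]
```

In a row-oriented file the same values are interleaved with the other fields of each row, so runs like these never form. Real ORC files use more sophisticated encodings plus general-purpose compression on top, so treat this strictly as an intuition aid.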
Other interesting improvements include being able to use a Data Management Gateway so that Power BI can now access your APS appliance. Also, Microsoft simplified some of the infrastructure under the hood, which should improve reliability for the appliance.
Finally, maybe the most exciting news is that the documentation and tools for APS are now publicly available for download. For those of us who consult, being able to provide a public link to customers will definitely make our lives easier than passing USB drives around. You can find the documentation and tools here:
I love TechEd. The energy from the keynotes and expo simply isn’t matched by any of the other conferences I attend on a regular or irregular basis. I also love the diversity of sessions and people I will meet throughout the entire conference. I will be going back this year for the fourth time in five years. I have two sessions to present at TechEd. Like last year, I will be presenting one session on SQL Server Parallel Data Warehouse and another focusing on HDInsight. If you are curious about whether or not you want to attend, check out my session on SQL Server 2012 PDW from last year’s TechEd. If you can’t attend one of my sessions below, please stop by the Microsoft section of the Expo and we can talk Big Data and SQL Server PDW there.
The final reason I love TechEd is the variety of sessions and discussions I can have. From Cloud OS, SQL Server, System Center, Virtualization, Windows Server, .NET, Visual Studio…I can hit them all in a week and really round out my understanding of the entire ecosystem. Anyway, below you can get an idea of what my two sessions will cover, and I look forward to meeting you there.
DBI-B329 The Role of Polybase in the Modern Data Warehouse
Tuesday, May 13 5:00 PM – 6:15 PM
We have all heard about the Polybase feature of the Microsoft SQL Server 2012 Parallel Data Warehouse, but what are the use cases for this technology? In this session we explore and demonstrate specific use cases for implementing Polybase into your Modern Data Warehouse solution. Specifically, we examine how Polybase can help you: streamline your ETL process by using Hadoop as the staging area of the backroom; export to your Hadoop environment your Enterprise Data Warehouse conformed dimensions; use Hadoop as a low cost, online data archive; and enrich your relational data with ambient data resident in Hadoop.
DBI-B328 From Zero to Data Insights Using HDInsight on Windows Azure
Wednesday, May 14 3:15 PM – 4:30 PM
Windows Azure HDInsight enables you to embrace Hadoop with seamless management of any type or size of data. The Microsoft Big Data and BI platforms enable data enrichment through discovery and advanced analytics so all users can easily gain and act on insights from all of their data, structured or unstructured, via familiar tools such as Microsoft Excel and the Power Suite. We walk through an end-to-end story, showcasing an example that includes how to provision a Hadoop cluster on Windows Azure in minutes, load data to the persistent Azure blob store, cleanse data with Pig, add structure with Hive, orchestrate workflows with Oozie, and gain insights via Excel.