The Analytics Platform System team continues to provide rapid updates to the APS environment. The rate of release from the product team has been consistent for the last several years with 2-3 releases per year. Each release not only stabilizes the platform and improves performance but also continues down the path toward T-SQL parity with the SQL Server product line. Appliance Update 3 provides updates to the Parallel Data Warehouse workload along with updates to Polybase.
The updates to the PDW workload include moving the Windows Server version up to Windows Server 2012 R2 and the SQL Server version up to SQL Server 2014. Additionally, many T-SQL error handling constructs are now available, such as TRY...CATCH and @@ERROR.
In a significant improvement to Polybase, it now supports ORC files in Hadoop. ORC files are Optimized Row Columnar files that offer superior compression and thus improved performance over previously supported file formats. The work on ORC files has been a collaboration between Microsoft and Hortonworks, and now we can reap the benefits when using Polybase.
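To make the compression point a little more concrete, here is a toy sketch in Python. It is not ORC's actual encoder: real ORC writers use more elaborate encodings plus a general-purpose compression codec. The table contents and the `run_length_encode` helper are made up for illustration. The idea is simply that storing a column's values next to each other exposes long runs of repeats, which a row-by-row layout breaks up.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive repeated values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(values)]


# Hypothetical table of (order_id, country), already sorted by country.
rows = (
    [(i, "US") for i in range(0, 500)]
    + [(i, "CA") for i in range(500, 800)]
    + [(i, "MX") for i in range(800, 1000)]
)

# Row-oriented layout: the two columns are interleaved row by row,
# so no two neighboring values are ever equal.
row_stream = [value for row in rows for value in row]

# Column-oriented layout: the country column is stored contiguously.
country_column = [country for _, country in rows]

row_runs = run_length_encode(row_stream)
column_runs = run_length_encode(country_column)

print(len(row_runs))   # number of runs in the interleaved stream
print(column_runs)     # runs in the contiguous country column
```

In the interleaved stream every value becomes its own run, while the contiguous country column collapses to three pairs. Columnar formats rely on that same effect, at a much larger scale and with better encoders.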
Other interesting improvements include being able to use a Data Management Gateway so that PowerBI can now access your APS appliance. Also, Microsoft simplified some of the infrastructure under the hood, which should improve reliability for the appliance.
Finally, maybe the most exciting news is that the documentation and tools for APS are now publicly available for download. For those of us who consult, being able to provide a public link to customers will definitely make our lives easier than passing USB drives around. You can find the documentation and tools here:
I love TechEd. The energy from the keynotes and expo simply isn’t matched by any of the other conferences I attend on a regular or irregular basis. I also love the diversity of sessions and the people I meet throughout the conference. I will be going back this year for the fourth time in five years, and I’ll be presenting two sessions. Like last year, I will be presenting one session on SQL Server Parallel Data Warehouse and another focusing on HDInsight. If you are curious about whether or not you want to attend, check out my session on SQL Server 2012 PDW from last year’s TechEd. If you can’t attend one of my sessions below, please stop by the Microsoft section of the Expo and we can talk Big Data and SQL Server PDW there.
The final reason I love TechEd is the variety of sessions and discussions I can have. From Cloud OS, SQL Server, System Center, Virtualization, Windows Server, .NET, Visual Studio…I can hit them all in a week and really round out my understanding of the entire ecosystem. Anyway, below you can get an idea of what my two sessions will cover, and I look forward to meeting you there.
DBI-B329 The Role of Polybase in the Modern Data Warehouse
Tuesday, May 13 5:00 PM – 6:15 PM
We have all heard about the Polybase feature of the Microsoft SQL Server 2012 Parallel Data Warehouse, but what are the use cases for this technology? In this session we explore and demonstrate specific use cases for implementing Polybase into your Modern Data Warehouse solution. Specifically, we examine how Polybase can help you: streamline your ETL process by using Hadoop as the staging area of the backroom; export to your Hadoop environment your Enterprise Data Warehouse conformed dimensions; use Hadoop as a low cost, online data archive; and enrich your relational data with ambient data resident in Hadoop.
DBI-B328 From Zero to Data Insights Using HDInsight on Windows Azure
Wednesday, May 14 3:15 PM – 4:30 PM
Windows Azure HDInsight enables you to embrace Hadoop with seamless management of any type or size of data. The Microsoft Big Data and BI platforms enable data enrichment through discovery and advanced analytics so all users can easily gain and act on insights from all of their data, structured or unstructured, via familiar tools such as Microsoft Excel and the Power Suite. We walk through an end-to-end story, showcasing an example that includes how to provision a Hadoop cluster on Windows Azure in minutes, load data to the persistent Azure blob store, cleanse data with Pig, add structure with Hive, orchestrate workflows with Oozie, and gain insights via Excel.
I was lucky enough to be included as part of the author team for Microsoft Big Data Solutions from Wiley Press. I would like to thank Adam Jorgensen (@AJbigdata) for including me. Other members of the writing team included James Rowland-Jones (@jrowlandjones), John Welch (@john_welch), Dan Clark, and Chris Price (@BluewaterSQL). This was a labor of love for all involved, as it explores Big Data through the Microsoft solution stack, specifically with either HDInsight or the Hadoop distribution from Microsoft’s partner Hortonworks, known as Hortonworks Data Platform (HDP). If you are interested in getting started with Big Data and Hadoop and come from a background of using the Microsoft Data Platform, then this book is a good place to start.
The Book in Parts:
Part I: What is Big Data
Part II: Setting Up Your First Big Data Environment
Part III: Storing and Managing Big Data
Part IV: Working with Your Big Data
Part V: Big Data and SQL Server Together
You can find more details about Microsoft Big Data Solutions from Wiley here: http://www.wiley.com/WileyCDA/WileyTitle/productCd-1118729080.html
If you are addicted to Amazon and want to go straight there and purchase, you can find Microsoft Big Data Solutions here:
Monitoring HDP with System Center
One of the things I am most proud of was getting System Center Operations Manager set up and configured to monitor an HDP cluster. For those companies that use System Center, being able to monitor your Hadoop solution with SCOM is a game changer when it comes to integrating Hadoop into your existing monitoring and alerting solutions. I know I had to be one of the first to get this configured and provide documentation on the process as I literally picked up the bits the week they were available. I’m looking forward to any feedback anyone has on the process.
So once it was complete, my wife asked me whether or not I would ever do another book. Halfway through the process, I would have said no. This was the first time I had ever been part of a team writing a book and I was not very good at it at all. I’m sure my editors would agree. But towards the end, I started getting a feel that writing a book is like eating an elephant. You have to do it one bite at a time: a couple of pages a day, about 5 days a week. If you do that, you can keep up and even get ahead depending on your schedule. So yes, if I’m lucky enough to be asked to join another team, I’ll happily do it and hopefully do it with much less stress.
Last year the Professional Association for SQL Server (PASS) tried something new with the Business Analytics Conference. I was lucky enough to attend and I thought it was a hit. There was a diverse set of sessions ranging from traditional Microsoft BI to where open source solutions such as R can fit in an organization. Also, the keynotes were some of the best I’ve seen in years, with Amir Netz rocking PowerBI presentations and Steven Levitt absolutely killing it with his take on analytics. I’m expecting the PASS BA Conference of 2014 to be even better. If you haven’t registered and would like to spend some time in Northern California in May, register here.
I’m very excited about presenting at the PASS Business Analytics Conference with one of my teammates from the Big Data Center of Expertise, Tammy Richter Jones. Our session will focus on The Role of PDW (AU1) & Polybase in the Modern Data Warehouse. If you are interested in PDW and are wondering what Microsoft’s story is for integrating it into the larger ecosystem of Big Data and a Modern Data Warehouse, I suggest you attend our session. This session is something we’ve been working on for a while, and we know you will come away from it not only informed about the technicalities of how SQL Server PDW works but also better prepared to utilize all of its new features in your environment.
In this session, we’ll introduce and discuss the architecture of SQL Server 2012 Parallel Data Warehouse and the new Appliance Update 1. Specifically, we’ll dig into Transparent Data Encryption, Integrated Authentication, the new HDInsight Region, and functionality for adding capacity to an appliance. We’ll also discuss Polybase in depth. This session will not only discuss the technical details of the new features, but also the use cases for this technology, by examining how Polybase can help you:
• Streamline your ETL process by using Hadoop as the staging area of the backroom
• Export to your Hadoop environment your Enterprise Data Warehouse conformed dimensions
• Use Hadoop as a low cost, online data archive
• Enrich your relational data with ambient data resident in Hadoop
The January 2014 SQL Server Data Tools update includes some PDW-specific changes that make it aware of SQL Server 2012 PDW Appliance Update 1 (AU1). AU1 is coming in the near future and you should update your tools to be ready for it. You can go ahead and update now, as this update makes SSDT PDW version aware, and you will get a different experience depending on whether or not you are on AU1. I’ve updated my SSDT and connected just fine to my previous AU 0.5 appliance, and I’m looking forward to checking out the differences once I have access to an AU1 appliance.