Moving Your Data Warehouse to the Cloud

The Cloud offers many advantages, and many organizations are looking to adopt Cloud solutions.  But how easy is it to move an existing Data Warehouse to Microsoft Azure or Amazon Web Services?  To gain insight and better advise our clients, Data Warehouse Consultants has begun two research projects to identify the benefits and pitfalls of a move to the Cloud.

One project migrates an existing operational data warehouse to Microsoft Azure, using Azure SQL Data Warehouse, SQL Server Integration Services (SSIS) and T-SQL in a cloud-hosted server environment.  The other project migrates the same operational data warehouse to AWS Redshift using Python in the AWS Lambda serverless environment.

Through these two parallel development efforts, Data Warehouse Consultants will be better positioned to help clients evaluate different Cloud-based solutions and to assist organizations as they migrate existing systems or implement new solutions in a Cloud-based environment.

Consultant Advises CMU Students on Cutting Edge Capstone Project

Data Warehouse Consultants is focused on the Western Pennsylvania market and encourages our consultants to support the local institutions here.  Bob Brichacek, a senior consultant for the firm, moonlights as an adjunct professor at the Carnegie Mellon University Heinz Graduate School.  Bob teaches a graduate course in Data Warehousing and is a regular sponsor of the Master of Information Systems Management “capstone” program, a semester-long project course in which student teams solve real-world problems submitted by local businesses.

This year Bob’s team built an incredibly functional cloud-based video conversion tool using Amazon Web Services cloud products.  This project has been featured on Carnegie Mellon’s website.  They write:

Bob Brichacek, the team’s faculty advisor, commended the students on their motivation and creativity, calling them “some of the smartest people I’ve ever worked with.”

“This is the most current technology. [The students] got to play with [tools] that a very small percentage of the population has played with,” said Brichacek. He adds that the pieces of the AWS toolset don’t just plug and play, and that the students’ main accomplishment was successfully customizing each piece to have the functionality Legendary required.

Click here to read the full article.

Excel External Content Blocked

We have been encountering a new issue with Microsoft Excel, and one of our clients ran into it today.  It took a while to figure out, so we want to help others in the same situation.

Issue – Excel External/Web Content is Blocked: In most of our interactive applications, users can export data (as .csv or .xls files). By default, these downloads open in Excel.  A recent update changed something in Excel that prevents these files from opening.  It happened on my PC at home (I just figured I had done something wrong).  But then today, it happened at one of our client sites.  Everything looks fine with the file when you download it, but either the file doesn’t open, or Excel opens with no file in it.  There is no error message; the file just doesn’t open.

Resolution: When the file downloads, instead of opening it, open the folder that contains it.  Right-click the file and click Properties.  At the bottom of the window, there is an “Unblock” button; after you click it, the file opens normally.  This shows that the file content is being blocked because it came from the internet (an external source).  Unblocking the file allows it to open, but you would have to do that every time you download and open a file.  To avoid that, you have to change Excel’s settings.

Steps:
1. Open Excel and go to File -> Options -> Trust Center.
2. Click “Trust Center Settings”.
3. Go to Protected View and uncheck “Enable Protected View for files originating from the internet”.
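When many exported files pile up in a download folder, the Properties “Unblock” step from the Resolution above can also be scripted. Windows records a downloaded file’s internet origin (the “Mark of the Web”) in an NTFS alternate data stream named Zone.Identifier, where ZoneId=3 means the Internet zone, and clicking Unblock deletes that stream; PowerShell’s built-in Unblock-File cmdlet does the same. The Python below is a minimal sketch under those assumptions; the helper names zone_id, unblock and unblock_folder and the extension list are hypothetical, not part of any Microsoft tool.

```python
import os
from typing import List, Optional, Tuple

# Windows keeps the "Mark of the Web" in this NTFS alternate data stream.
ZONE_STREAM = ":Zone.Identifier"
# ZoneId that Windows records for files downloaded from the Internet zone.
INTERNET_ZONE = 3


def zone_id(path: str) -> Optional[int]:
    """Return the ZoneId recorded for a file, or None if it has no mark."""
    try:
        with open(path + ZONE_STREAM, encoding="utf-8", errors="ignore") as stream:
            for line in stream:
                key, _, value = line.strip().partition("=")
                if key == "ZoneId" and value.isdigit():
                    return int(value)
    except FileNotFoundError:
        return None
    return None


def unblock(path: str) -> bool:
    """Delete the Zone.Identifier stream, as the Properties "Unblock" button does."""
    try:
        os.remove(path + ZONE_STREAM)
        return True
    except FileNotFoundError:
        return False


def unblock_folder(folder: str,
                   extensions: Tuple[str, ...] = (".csv", ".xls", ".xlsx")) -> List[str]:
    """Unblock Internet-zone spreadsheets in folder and return their names."""
    unblocked = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if (name.lower().endswith(extensions) and os.path.isfile(path)
                and zone_id(path) == INTERNET_ZONE and unblock(path)):
            unblocked.append(name)
    return unblocked
```

Calling unblock_folder on a downloads folder returns the names of the spreadsheets it unblocked. Files marked with other zones, such as Restricted sites (ZoneId=4), are deliberately left alone.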

After doing some research, it seems we are not the only ones experiencing this issue.  Here is a link to a news article about it:

8/26/2016 Update – It seems like Microsoft has now resolved the issue.

UPMC Wins Davies Award for Custom DWC Solution

Pittsburgh, PA:  This week, The Children’s Hospital of Pittsburgh of UPMC was named the 2015 HIMSS Enterprise Davies Award recipient for two innovative solutions designed and implemented by Data Warehouse Consultants. The Davies Awards program promotes electronic health record-enabled improvements in patient outcomes by sharing case studies and lessons learned across a wide range of efforts, including implementation strategies, workflow design, best practice development and adherence, and patient engagement that have improved care outcomes for patients.

The primary solution that earned UPMC the award was the CHP Pharmacy MedEx Mobile Application, which uses mobile technology to accurately track the delivery of pharmaceuticals within the hospital. A second contributing solution was the CHP Breastmilk-Tracking Mobile Application, which uses mobile technology to accurately match babies with their mothers’ breastmilk in the Neonatal ICU. For more information, please visit the HIMSS Award Page.

About Children’s Hospital of Pittsburgh: Renowned for its outstanding clinical services, research programs and medical education, Children’s Hospital of Pittsburgh of UPMC has helped establish the standards of excellence in pediatric care. From Ambulatory Care to Transplantation and Cardiac Care, talented and committed pediatric experts care for infants, children and adolescents who make more than 1,000,000 visits to Children’s, its many neighborhood locations, and Children’s Community Pediatrics practices each year.   (www.chp.edu)

About Data Warehouse Consultants:  Data Warehouse Consultants is a consulting firm based in Pittsburgh, PA.  Founded in 2004 the firm provides custom development services for web applications and corporate database and data warehouse environments.  (www.dwconsultants.com)

UNIX Quick Start Guides

Unix (all-caps UNIX for the trademark) is a family of multitasking, multiuser computer operating systems that derive from the original AT&T Unix, developed in the 1970s at the Bell Labs research center by Ken Thompson, Dennis Ritchie, and others.
From the power user’s or programmer’s perspective, Unix systems are characterized by a modular design that is sometimes called the “Unix philosophy”, meaning that the operating system provides a set of simple tools that each perform a limited, well-defined function, with a unified filesystem as the main means of communication and a shell scripting and command language to combine the tools to perform complex workflows. Aside from the modular design, Unix also distinguishes itself from its predecessors as the first portable operating system: almost the entire operating system is written in the C programming language, which allowed Unix to reach numerous platforms. Read more.
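To make the idea of simple tools combined into a workflow concrete, the sketch below chains standard Unix utilities from Python, doing the same job as the shell pipeline “sort | uniq -c | sort -rn”, which counts repeated lines and lists the most frequent first. It assumes a Unix-like system with sort, uniq, tr and wc on the PATH; the helper name run_pipeline is hypothetical, and for simplicity each stage runs to completion before the next one starts, rather than streaming concurrently the way a real shell pipe does.

```python
import subprocess
from typing import Sequence


def run_pipeline(text: str, *commands: Sequence[str]) -> str:
    """Feed text through a chain of commands, like `cmd1 | cmd2 | ...` in a shell.

    Each command's standard output becomes the next command's standard input.
    A command that exits with a non-zero status raises CalledProcessError.
    """
    data = text
    for command in commands:
        result = subprocess.run(
            list(command), input=data, capture_output=True, text=True, check=True
        )
        data = result.stdout
    return data


if __name__ == "__main__":
    words = "unix\nlinux\nunix\nbsd\nunix\nlinux\n"
    # Same job as: sort | uniq -c | sort -rn
    # Prints each word with its count, most frequent first: unix (3), linux (2), bsd (1).
    print(run_pipeline(words, ["sort"], ["uniq", "-c"], ["sort", "-rn"]), end="")
```

Each tool does one small thing (sort, collapse duplicates, order by count), and the pipeline alone produces the word-frequency report.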

The following are some resources to learn about UNIX:

Unix Operating System
Unix Commands Tutorial 1
— TutorialPoints Unix_Tutorial
— UNIX CheatSheets Unix_CheatSheet_1 Unix_CheatSheet_2

Getting Started with ASP.NET Web Forms and MS Visual Studio

The .NET Framework (pronounced dot net) is a software framework developed by Microsoft that runs primarily on Microsoft Windows. It includes a large class library known as Framework Class Library (FCL) and provides language interoperability (each language can use code written in other languages) across several programming languages. Programs written for .NET Framework execute in a software environment (as opposed to a hardware environment), known as Common Language Runtime (CLR), an application virtual machine that provides services such as security, memory management, and exception handling. FCL and CLR together constitute .NET Framework. Read more.

The following are some resources to learn about ASP.NET, Visual Studio and other Microsoft products:

W3Schools ASP.NET 4 Tutorial
Tutorial Points ASP.NET Tutorial
MS Introduction to ASP.NET Web Forms
— ASP.NET Hands On Project ASP.NET Project
Free Microsoft Training Delivered by Experts

Using Google Apps

Google Apps for Work (formerly Google Apps for Business) is a suite of cloud computing productivity and collaboration tools offered on a subscription basis by Google.

It includes Google’s popular web applications, such as Gmail, Google Drive, Google Hangouts, Google Calendar, and Google Docs. While these products are available to consumers free of charge, Google Apps for Work adds business-specific features such as custom email addresses at your domain (@yourcompany.com), at least 30 GB of storage for documents and email, and 24/7 phone and email support. As a cloud computing solution, it takes a different approach from off-the-shelf office productivity software by hosting customer information in Google’s network of secure data centers rather than on traditional in-house servers located within companies. Read more.

The following are some resources to learn about Google Apps:

Using Google Apps at Work or School

Introduction to AWS (Amazon Web Services)

Amazon Web Services is a collection of remote computing services, also called web services, that make up a cloud computing platform offered by Amazon.com. These services are based out of 11 geographical regions across the world. To learn more about Cloud Computing, please click here.

The following are some resources to learn about Amazon Web Services:

— Webinar: What Is AWS?

Introduction to Amazon Web Services

AWS in plain English

7 Steps to Select the Right Architecture for Your Web Application

AWS Instructional Videos and Labs

A top 10 Children’s Hospital Implements Radiology Tracking Board

This week, the radiology department of a top 10 Children’s Hospital began using a customized online patient tracking board created for it by Data Warehouse Consultants. The new application significantly reduces the phone calls and coordination that nurses, physicians and staff need in order to move patients efficiently through the radiology department.

The implementation of the tracking board was the result of an extensive development effort led by Data Warehouse Consultants. The firm worked with the hospital’s staff to document the specific processes it uses to maintain a world-class facility.

These processes were then converted into a web-based application using Data Warehouse Consultants’ methodology to create a customized tracking board specifically for the radiology department. The tracking board allows staff in the department to monitor and update the status of X-ray, ultrasound, MRI and other radiology patients as they progress through the department.

Development in Sqoop 1 vs Sqoop 2

I was working on a project last week to load data into an HDFS-based Hive database. This was essentially a periodic load, so Sqoop appeared to be the best tool for the job. My small project had the following goals:

– Connect Sqoop to SQL Server and/or Oracle instances
– Export a set of tables into HDFS files
– Load the data files into Hive tables

Sqoop was a new tool to me, so I started with the latest version, which was 1.99.3 (Sqoop 2). My thinking was that it’s almost always better to start with the newest functionality in case you need it. I struggled through the limited documentation but was eventually able to get Sqoop connected to both Oracle and SQL Server using the Command Line Interface available in Sqoop 2. The most challenging part of this exercise was getting the connection strings right and working through TCP/IP issues, but that’s a topic for another time.

I was able to export the tables into HDFS relatively easily, and then began looking for the run-time option that would automatically create the Hive tables. I couldn’t figure out how to do that right away, but I was able to run a LOAD DATA operation in Hive to load the data files into Hive tables. This was an acceptable workaround, but I had expected Sqoop to do this automatically. I needed to transfer about 500 tables, so loading them all manually was going to be a real pain.
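With hundreds of tables, the manual LOAD DATA route can at least be scripted. The sketch below generates one HiveQL statement per exported table using Hive’s LOAD DATA INPATH 'path' [OVERWRITE] INTO TABLE name syntax. It assumes each table was exported to its own HDFS directory under a common root and that a matching Hive table (same column layout and delimiters) already exists, which is exactly the step Sqoop 1.99 did not automate. The function name, the /user/etl/export root and the table names are hypothetical. Note that LOAD DATA INPATH moves the files from their HDFS location into the table’s directory rather than copying them.

```python
from typing import Iterable, List


def load_data_statements(tables: Iterable[str], hdfs_root: str,
                         overwrite: bool = False) -> List[str]:
    """Build one HiveQL LOAD DATA statement per table exported under hdfs_root.

    Assumes each table's files sit in <hdfs_root>/<table> and that a Hive
    table with the same name and a compatible layout already exists.
    """
    root = hdfs_root.rstrip("/")
    mode = "OVERWRITE " if overwrite else ""
    return [
        f"LOAD DATA INPATH '{root}/{table}' {mode}INTO TABLE {table};"
        for table in tables
    ]


if __name__ == "__main__":
    # Hypothetical export root and table names; save the output as a .hql
    # file and run it with, for example, `hive -f load_all.hql`.
    for statement in load_data_statements(["customers", "orders"], "/user/etl/export/"):
        print(statement)
```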

After researching the issue further, I discovered that Sqoop 1.99 does not yet support the automatic creation of Hive tables that is available in Sqoop 1.4.4. Doh! With so many tables, this was a key requirement for my project, so choosing 1.99 turned out not to be the best decision. Once I knew that, I began researching how to do this task in version 1.4.5 instead. In this version, the simple --hive-import and --create-hive-table options accomplish my goal easily and seamlessly. Luckily for me, most of the work I had already done on 1.99 translated fairly well back to 1.4.5. This allowed me to complete the project relatively quickly after I decided to roll back to an earlier version.
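For comparison, a Sqoop 1 import that lands a table directly in Hive is a single command, so covering 500 tables becomes a loop. According to the Sqoop 1.4.x user guide, --hive-import loads the imported data into Hive, generating a CREATE TABLE for the source columns and a LOAD DATA INPATH behind the scenes, while --create-hive-table makes the job fail if the target Hive table already exists. The Python sketch below only builds and shell-quotes the commands; the JDBC URL, user name, password-file path and table names are hypothetical placeholders. A single mapper is used because Sqoop requires --split-by (or one mapper) for tables without a primary key. Sqoop 1 also ships an import-all-tables tool, which may be simpler when every table in a database is wanted.

```python
import shlex
from typing import Iterable, List


def sqoop_hive_import(jdbc_url: str, username: str, password_file: str,
                      table: str, mappers: int = 1) -> List[str]:
    """Build the argv for a Sqoop 1 import of one table straight into Hive."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        # Read the password from a protected file instead of the command line.
        "--password-file", password_file,
        "--table", table,
        # Load into Hive, creating the table definition from the source columns.
        "--hive-import",
        # Make the job fail if the Hive table already exists.
        "--create-hive-table",
        # One mapper avoids needing --split-by for tables without a primary key.
        "--num-mappers", str(mappers),
    ]


def import_script(tables: Iterable[str], jdbc_url: str, username: str,
                  password_file: str) -> str:
    """Render one shell-quoted sqoop command per table, e.g. to save as a script."""
    return "\n".join(
        shlex.join(sqoop_hive_import(jdbc_url, username, password_file, table))
        for table in tables
    )


if __name__ == "__main__":
    print(import_script(
        ["Customers", "Orders"],  # hypothetical table names
        jdbc_url="jdbc:sqlserver://dbhost:1433;databaseName=Sales",  # hypothetical server
        username="etl_user",
        password_file="/user/etl_user/sqoop.password",
    ))
```

Quoting matters here: the SQL Server JDBC URL contains a semicolon, which shlex.join wraps in single quotes so the shell does not treat it as a command separator.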

The moral of this story is that in the Wild Wild West of big data, newer is not always better. It pays to put the work in up front to be sure the version you select meets your needs. In the open source world, the older version is often the “old reliable” one: more stable and, as in this case, more fully featured.