UNIX Quick Start Guides

Unix (all-caps UNIX for the trademark) is a family of multitasking, multiuser computer operating systems that derive from the original AT&T Unix, developed in the 1970s at the Bell Labs research center by Ken Thompson, Dennis Ritchie, and others.
From the power user’s or programmer’s perspective, Unix systems are characterized by a modular design that is sometimes called the “Unix philosophy”: the operating system provides a set of simple tools, each of which performs a limited, well-defined function, with a unified filesystem as the main means of communication and a shell scripting and command language for combining the tools into complex workflows. Aside from its modular design, Unix also distinguishes itself from its predecessors as the first portable operating system: almost the entire operating system is written in the C programming language, which allowed Unix to reach numerous platforms.
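As a tiny illustration of that philosophy, single-purpose tools can be chained with pipes to build something none of them does alone — here, a word-frequency counter (the input text is just an example):

```shell
# Each tool does one small job: sort groups duplicate lines together,
# uniq -c counts each group, and the final sort orders by frequency.
printf 'apple\nbanana\napple\n' | sort | uniq -c | sort -rn
```

The same pattern scales to real workloads simply by swapping the input, e.g. feeding a log file through the identical pipeline.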


The following are some resources to learn about UNIX:

— Unix Operating System
— Unix Commands Tutorial 1
— TutorialsPoint Unix_Tutorial
— UNIX Cheat Sheets: Unix_CheatSheet_1, Unix_CheatSheet_2

Getting Started with ASP.NET Web Forms and MS Visual Studio

The .NET Framework (pronounced dot net) is a software framework developed by Microsoft that runs primarily on Microsoft Windows. It includes a large class library known as the Framework Class Library (FCL) and provides language interoperability (each language can use code written in other languages) across several programming languages. Programs written for the .NET Framework execute in a software environment (as opposed to a hardware environment) known as the Common Language Runtime (CLR), an application virtual machine that provides services such as security, memory management, and exception handling. The FCL and CLR together constitute the .NET Framework.


The following are some resources to learn about ASP.NET, Visual Studio and other Microsoft products:

— W3Schools ASP.NET 4 Tutorial
— TutorialsPoint ASP.NET Tutorial
— MS Introduction to ASP.NET Web Forms
— ASP.NET Hands-On Project: ASP.NET Project
— Free Microsoft Training Delivered by Experts


Using Google Apps

Google Apps for Work (formerly Google Apps for Business) is a suite of cloud computing productivity and collaboration tools offered on a subscription basis by Google.

It includes Google’s popular web applications, including Gmail, Google Drive, Google Hangouts, Google Calendar, and Google Docs. While these products are available to consumers free of charge, Google Apps for Work adds business-specific features such as custom email addresses at your domain (@yourcompany.com), at least 30 GB of storage for documents and email, and 24/7 phone and email support. As a cloud computing solution, it takes a different approach from off-the-shelf office productivity software by hosting customer information in Google’s network of secure data centers, rather than on traditional in-house servers located within companies.


The following are some resources to learn about Google Apps:

— Using Google Apps at Work or School

Introduction to AWS (Amazon Web Services)

Amazon Web Services (AWS) is a collection of remote computing services, also called web services, that make up a cloud computing platform offered by Amazon.com. These services are based out of 11 geographical regions across the world.


The following are some resources to learn about Amazon Web Services:

— Webinar: What Is AWS?
— Introduction to Amazon Web Services
— AWS in Plain English
— 7 Steps to Select the Right Architecture for Your Web Application
— AWS Instructional Videos and Labs

Children’s Hospital Implements Radiology Tracking Board

Pittsburgh, PA:  The radiology department at Children’s Hospital of Pittsburgh of UPMC this week began using a customized online patient tracking board created for them by Data Warehouse Consultants. This new application significantly reduces the phone calls and coordination that nurses, physicians and staff must do to efficiently move patients through the radiology department.

The implementation of the tracking board was the final result of an extensive development effort led by Data Warehouse Consultants. The firm worked with Children’s Hospital staff to document the specific processes the hospital uses to maintain a world-class facility. These processes were then converted into a web-based application using Data Warehouse Consultants’ methodology to create a customized tracking board specifically for the radiology department. The tracking board allows staff in the department to monitor and update the status of X-ray, ultrasound, MRI and other radiology patients as they progress through the department.

About Children’s Hospital of Pittsburgh: Renowned for its outstanding clinical services, research programs and medical education, Children’s Hospital of Pittsburgh of UPMC has helped establish the standards of excellence in pediatric care. From Ambulatory Care to Transplantation and Cardiac Care, talented and committed pediatric experts care for infants, children and adolescents who make more than 1,000,000 visits to Children’s, its many neighborhood locations, and Children’s Community Pediatrics practices each year.   (www.chp.edu)

About Data Warehouse Consultants:  Data Warehouse Consultants is a consulting firm based in Pittsburgh, PA.  Founded in 2004, the firm provides custom development services for web applications and corporate database and data warehouse environments.  (www.dwconsultants.com)



Development in Sqoop 1 vs Sqoop 2

I was working on a project last week to load data into an HDFS-based Hive database. This was essentially a periodic load, so Sqoop appeared to be the best tool for the job. My small project consisted of the following goals:

– Connect Sqoop to SQL Server and/or Oracle instances
– Export a set of tables into HDFS files
– Load the data files into Hive tables

Sqoop was a new tool to me, so I started with the highest version available, 1.99.3. I was thinking that it’s almost always better to start with the new functionality in case you might need it. I struggled through the limited documentation but was eventually able to get Sqoop connected to both Oracle and SQL Server using the Command Line Interface available in Sqoop 2. The most challenging part of this exercise was working out the connection string and TCP/IP issues, but that’s a topic for another time.

I was able to export the tables into HDFS relatively easily, and I began looking for the run-time option that would allow me to automatically create the Hive table. I couldn’t figure out how to do it right away, but I was able to run a LOAD DATA statement in Hive to load the data files manually. This was an OK solution, but the problem was that I expected Sqoop to do this automatically. I needed to transfer about 500 tables, so loading them all manually was going to be a real pain.
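For reference, the manual Hive step looked something like this (the table name and HDFS path here are hypothetical placeholders):

```sql
-- Assumes the target table already exists in Hive and that Sqoop
-- exported the source table's data files to this HDFS directory.
LOAD DATA INPATH '/user/etl/customers' INTO TABLE customers;
```

Multiply that by 500 tables and the appeal of an automatic option becomes obvious.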

After researching the issue further I discovered that the 1.99 version of Sqoop does not yet support the automatic creation of Hive tables that is available in 1.4.4. Doh! This is a key requirement for my project with so many tables, so it turns out that choosing 1.99 was not the best decision. Once I knew that was the case, I began researching how to do this task in version 1.4.5 instead. In this version there is a simple --create-hive-table option that accomplishes my goal easily and seamlessly. Luckily for me, most of the work I had already done on 1.99 translated fairly well back to 1.4.5. This allowed me to complete the project relatively quickly after I decided to roll back to the earlier version.
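As a sketch of what the 1.4.x invocation looks like (the host, database, credentials, and table names below are placeholders, not the actual project's):

```shell
# Import one table from SQL Server into Hive, creating the Hive table
# automatically; -P prompts for the password interactively.
sqoop import \
  --connect "jdbc:sqlserver://dbhost:1433;databaseName=SALES" \
  --username etl_user -P \
  --table CUSTOMERS \
  --hive-import \
  --create-hive-table \
  --hive-table customers
```

With hundreds of tables, the same invocation can be driven from a shell loop over a table list rather than typed by hand.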

The moral of this story is that in the Wild Wild West of big data, newer is not always better. It pays to put the work in up front to be sure the version you are selecting meets your needs. In the open source world, the older version is often “old reliable”: more stable and, sometimes, more fully featured.

ACDHS Qualifies Data Warehouse Consultants to Build Custom Applications

Pittsburgh, PA:  Allegheny County Department of Human Services (DHS) approved Data Warehouse Consultants to build business intelligence and web applications beginning in September 2014. This provides Data Warehouse Consultants the opportunity to fulfill projects for the Department of Human Services on an as-needed, project-by-project basis.

Of the company’s recent success, Data Warehouse Consultants President John Shantz said “We are very excited to have been qualified to do application work for DHS.  This will allow us to expand our existing relationship with the county to really make a difference in continually improving the valuable services that DHS provides.  We look forward to working closely with DHS to use technology in helping Allegheny County communities and citizens.”

Achieving this qualification designation was the result of a rigorous Request For Qualifications (RFQ) process administered by the county that required Data Warehouse Consultants to compete with other interested firms.  Ultimately DHS decided that the consulting firm’s experience and focus were the best fit for the work that needs to be done.

About Allegheny County Department of Human Services:  DHS is responsible for providing and administering publicly-funded human services to Allegheny County residents. DHS is dedicated to meeting these human services needs, most particularly for the county’s vulnerable populations, through an extensive range of information exchange, prevention, early intervention, case management, crisis intervention and after-care services.  (www.alleghenycounty.us/dhs/)

About Data Warehouse Consultants:  Data Warehouse Consultants is a consulting firm based in Pittsburgh, PA.  Founded in 2004, the firm provides custom development services for web applications and corporate database and data warehouse environments.  (www.dwconsultants.com)



Header/Detail XML Query in Oracle

On a client project today we had the need to create a table from an XML data file sent in a hierarchical format. The XML file contains multiple orders. Each order is for a particular store, which is provided inside a header tag. Each store can order multiple items, which are provided via sub-tags of each header. A simplified sample of the file looks like this:

[box type=”shadow”]

<orders>
    <order_header>
        <account_name>JOES CONVENIENCE</account_name>
        <store_address>123 MAIN STREET</store_address>
        <order_line>
            <product_name>Peg Counter Spinner</product_name>
        </order_line>
    </order_header>
    <order_header>
        <account_name>SMITH LIMITED</account_name>
        <store_address>132 WEST AVENUE</store_address>
        <order_line>
            <product_name>Peg Counter Spinner</product_name>
        </order_line>
    </order_header>
</orders>

[/box]

The challenge was to write an Oracle query that could easily return both the header and detail information on a single row.  After some quick research I found that this can be done by using two separate XMLTables that hit the same file.  The first XMLTable contains store-level information.  The second XMLTable contains item level information.  When used together the second table starts with the XMLType (order_lines) that is defined in the first XMLTable.  The outer query simply selects fields you need.  It’s essentially a Cartesian product but it only intersects where appropriate given the “passing” clause. Here is a sample of the query to read the data.  This query assumes the file name is orders.xml and it is sitting in the previously defined XMLDIR Oracle directory.

[box type=”shadow”]

SELECT stores.order_id
      ,stores.account_name
      ,stores.store_address
      ,items.product_code
      ,items.product_name
      ,items.order_quantity
FROM XMLTable('/orders/order_header'
	     passing xmltype( bfilename('XMLDIR','orders.xml'), nls_charset_id('AL32UTF8') )
         columns  order_id       varchar2(2000) path 'id'
                 ,account_number varchar2(50)  path 'account_number'
                 ,account_name   varchar2(50)  path 'account_name'
                 ,retail_number  varchar2(12)  path 'store_externalid'
                 ,store_name     varchar2(50)  path 'store'
                 ,store_address  varchar2(100) path 'store_address'
                 ,store_state    varchar2(5)   path 'store_address_state'
                 ,store_zip      varchar2(10)  path 'store_address_postcode'
                 ,order_lines    XMLTYPE       path 'order_line'
     ) stores,
     XMLTable('/order_line'
	     passing stores.order_lines
         columns  product_code   varchar2(100) path 'product_code'
                 ,product_name   varchar2(100) path 'product_name'
                 ,order_quantity varchar2(100) path 'order_quantity'
     ) items

[/box]


This methodology can easily be extended to query an XML structure n levels deep to return information in one result set.

ASP.NET Issue: The SqlDataSource control does not have a naming container…

I have received the following .NET error several times in the past when developing nested databound controls, and I have found that the solution is not well covered on the Internet. The curious thing about this error is that it does not appear until the second or third postback on the page.

[box]The SqlDataSource control does not have a naming container. Ensure that the control is added to the page before calling DataBind[/box]

The cause is simple, however. It occurs when you use a ControlParameter on a DataSource that is within another data control. I ran into it again today on a GridView within a DataList, for example. The error occurs because the inner control cannot locate the control containing the parameter.

The solution is simple as well: remove the ControlParameter from the DataSource that is inside the other control. This can be accomplished by using a normal <asp:Parameter> and assigning a DefaultValue to it when the outer control binds.
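A minimal sketch of the pattern (the control IDs, connection string name, and field names below are hypothetical):

```aspx
<asp:DataList ID="CustomersList" runat="server"
    OnItemDataBound="CustomersList_ItemDataBound">
  <ItemTemplate>
    <asp:SqlDataSource ID="OrdersDataSource" runat="server"
        ConnectionString="<%$ ConnectionStrings:MyDb %>"
        SelectCommand="SELECT OrderID, OrderDate FROM Orders
                       WHERE CustomerID = @CustomerID">
      <SelectParameters>
        <%-- A plain Parameter instead of an asp:ControlParameter --%>
        <asp:Parameter Name="CustomerID" />
      </SelectParameters>
    </asp:SqlDataSource>
    <asp:GridView ID="OrdersGrid" runat="server"
        DataSourceID="OrdersDataSource" />
  </ItemTemplate>
</asp:DataList>
```

The code-behind then supplies the value the ControlParameter would have provided, as each outer item binds:

```csharp
// Set the inner DataSource's parameter from the outer control's data item.
protected void CustomersList_ItemDataBound(object sender, DataListItemEventArgs e)
{
    if (e.Item.ItemType == ListItemType.Item ||
        e.Item.ItemType == ListItemType.AlternatingItem)
    {
        var ds = (SqlDataSource)e.Item.FindControl("OrdersDataSource");
        ds.SelectParameters["CustomerID"].DefaultValue =
            DataBinder.Eval(e.Item.DataItem, "CustomerID").ToString();
    }
}
```

Because the DefaultValue is set before the inner GridView binds, the naming-container lookup that triggers the error never happens.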