Loading Flat Files into a Database (Method Overview)

Here we feature a few methods for loading files into an Oracle or SQL Server database. There are many methods for accomplishing this, but over the past few projects, and after implementing most of the methods discussed, I documented some of my experiences.   Here I focus on developing an application or process to load data from files, or on demand, when a more sophisticated tool like SSIS or Informatica is not already in use.

Oracle External  Tables:  When working with Oracle, I have often used external tables to assist with file loading. They are easy to setup, provide good error handling, and allow querying the data prior to loading it for validations. This has always been a favorite method for Oracle. However, on my last few projects this was not a great option.  Without access to the database server it requires coordination to setup the directory and to get access to the files in that directory on the server.  It also requires a secondary process to stage, rename, archive, and log the files on the file server.  Additionally, on a project using Amazon RDS, using external tables was simply not available.

SQL Server Bulk Methods (BULK Insert / OpenRowset): Similar to Oracle external tables, these methods provide a fast and easy way to get the data into the database.  Again, however, these require special permissions and/or access to the database server.  After going down this path on a recent project the method worked fairly well, but without access to the database server it was difficult to coordinate getting the files to the server. When using a file share on a remote server, it was difficult getting the permissions necessary on both the remote server (file share) and the database server.  It is possible, but with sensitive data it requires extra coordination and permissions, some of which are not desirable.  I also find it difficult to do detailed error handling and logging if the file does not arrive in the format that is expected.  This also requires a secondary process to stage, rename, archive, and log the files on the file server.

SSIS and other tools:  If SSIS (or other tools) are available; I find it to be a great method for loading the files quickly into the database.  Most of my experience with the tools is SSIS to SQL Server, but I know others have similar capabilities.  Typically if a client is already working with SSIS and/or Informatica, the client has processes in place for handling the files. If not, there is often a lot of cost and overhead to setup this process from scratch.  For our purposes let’s assume the developer is not already working with one of these tools at the client site.

I have had success with loading data using each method described thus far. Other than SSIS (or other third party tools), they all require a separate application or scripting to handle the actual files. An application that reads the file, does validations, does logging, moves/renames files, and archives the files when complete. This is sometime done through scripting, but often requires a more in depth approach for logging and auditing the actions. Additionally, all of the methods discussed thus far require additional coordination, permissions, and/or access. There are other BULK methods I have seen or used that are not mentioned. Most of the other methods that I have used were for a one time load of large data, an initial migration, a large data transfer, or import/exports.

C# Application: This leads me to an application-based method. My preference in the last couple of projects has been to use a C# application to handle the file moving/renaming/logging, file parsing, and data loading. This method is easily scheduled, configured, and migrated.  It typical requires little to no additional coordination, software, or permissions.  One, fairly simple and self-contained application, can handle the processing.  Additionally, the application is very similar whether running against Oracle or SQL Server (Cloud or on-premises).  The data loading aspects are also easy converted to web applications that require user uploading of data to the database.

Once an application template is designed for logging, and error handling, it is fairly simple to expand the application to handle numerous formats in the same manner.  This method has been efficient, easy to implement, and easy to use between various projects. Within the C# application I have used various options to load the data, and found that some are much more efficient than others.  Click Here to read more about the methods we have used within our C# applications.

Moving Your Data Warehouse to the Cloud

The Cloud offers many advantages and many organizations are looking to utilize Cloud solutions.  But how easy is it to move an existing Data Warehouse to Microsoft Azure or Amazon Web Services?  To gain insight and better advise our clients, Data Warehouse Consultants has begun two research projects to identify the benefits and pitfalls of a move to the Cloud.

One project migrates an existing operational data warehouse to Microsoft Azure utilizing Azure SQL Data Warehouse, SQL Server Integration Services (SSIS) and T-SQL in a cloud hosted server environment.  The other project migrates the same operational data warehouse to AWS Redshift using Python in the AWS Lambda serverless environment.

Through these two parallel development efforts, Data Warehouse Consultants will be better positioned to help clients evaluate the different Cloud based solutions and assist an organization as it migrates existing systems or implements new solutions in a Cloud based environment.

Consultant Advises CMU Students on Cutting Edge Capstone Project

Data Warehouse Consultants is focused on the Western Pennsylvania market and encourages our consultants to support the local institutions here.  Bob Brichacek,  a senior consultant for the firm, moonlights as an adjunct professor at the Carnegie Mellon University Heinz Graduate School.  Bob teaches a graduate course in Data Warehousing and acts as a regular sponsor of the Master’s of Information System Management’s “capstone” program.  This program is a semester-long project course where students work in teams to solve a real-world problem submitted by local businesses.

This year Bob’s team built an incredibly functional cloud-based video conversion tool using the Amazon Web Services cloud-based products.  This project has been featured on Carnegie Mellon’s website.  They write:

Bob Brichacek, the team’s faculty advisor, commended the students on their motivation and creativity, calling them “some of the smartest people I’ve ever worked with.”

“This is the most current technology. [The students] got to play with [tools] that a very small percentage of the population has played with,” said Brichacek. He adds that the pieces of the AWS toolset don’t just plug and play, and that the students’ main accomplishment was successfully customizing each piece to have the functionality Legendary required.

Click here to read the full article.

 

UPMC Wins Davies Award for Custom DWC Solution

CHP LogoPittsburgh, PA:  The Children’s Hospital of Pittsburgh of UPMC this week was named the 2015 HIMSS Enterprise Davies Award recipient for two innovative solutions designed and implemented by Data Warehouse Consultants. The Davies Awards program promotes electronic health record-enabled improvement in patient outcomes through sharing of case studies and lessons learned across a wide range of efforts, including implementation strategies, workflow design, best practice development and adherence, and patient engagement that have improved care outcomes for patients.

This specific solution that earned UPMC the awards was the CHP Pharmacy MedEx Mobile Application, which uses mobile technology to accurate track the delivery of pharmaceuticals within the hospital. A second contributing solution was the CHP Breastmilk-Tracking Mobile Application, which uses mobile technology to accurately match babies with their mother’s breastmilk in the Neonatal ICU. For more information please visit the HIMSS Award Page.

About Children’s Hospital of Pittsburgh: Renowned for its outstanding clinical services, research programs and medical education, Children’s Hospital of Pittsburgh of UPMC has helped establish the standards of excellence in pediatric care. From Ambulatory Care to Transplantation and Cardiac Care, talented and committed pediatric experts care for infants, children and adolescents who make more than 1,000,000 visits to Children’s, its many neighborhood locations, and Children’s Community Pediatrics practices each year.   (www.chp.edu)

About Data Warehouse Consultants:  Data Warehouse Consultants is a consulting firm based in Pittsburgh, PA.  Founded in 2004 the firm provides custom development services for web applications and corporate database and data warehouse environments.  (www.dwconsultants.com)

 

 

Children’s Hospital Implements Radiology Tracking Board

CHP LogoPittsburgh, PA:  The radiology department at Children’s Hospital of Pittsburgh of UPMC this week began using a customized on-line patient tracking board created for them by Data Warehouse Consultants. This new application significantly reduces the phone calls and coordination that nurses, physicians and staff must do to efficiently move patients through the radiology department.

The implementation of the tracking board was the final result of an extensive development effort led by Data Warehouse Consultants. The firm worked with Children’s Hospital staff to document the specific processes the hospital uses to maintain a world-class facility. These processes were then converted into a web-based application using Data Warehouse Consultants’ methodology to create a customized tracking board specifically for the radiology department. The tracking board allows staff in the department to monitor and update the status of X-ray, ultrasound, MRI and other radiology patients as they progress through the department.

About Children’s Hospital of Pittsburgh: Renowned for its outstanding clinical services, research programs and medical education, Children’s Hospital of Pittsburgh of UPMC has helped establish the standards of excellence in pediatric care. From Ambulatory Care to Transplantation and Cardiac Care, talented and committed pediatric experts care for infants, children and adolescents who make more than 1,000,000 visits to Children’s, its many neighborhood locations, and Children’s Community Pediatrics practices each year.   (www.chp.edu)

About Data Warehouse Consultants:  Data Warehouse Consultants is a consulting firm based in Pittsburgh, PA.  Founded in 2004 the firm provides custom development services for web applications and corporate database and data warehouse environments.  (www.dwconsultants.com)

 

 

ACDHS Qualifies Data Warehouse Consultants to Build Custom Applications

Alleghany County Seal LogoPittsburgh, PA:  Allegheny County Department of Human Services (DHS) approved Data Warehouse Consultants to build business intelligence and web applications beginning in September 2014. This provides Data Warehouse Consultants the opportunity to fulfill projects for the Department of Human Services on an as needed, project-by-project basis.

Of the company’s recent success, Data Warehouse Consultants President John Shantz said “We are very excited to have been qualified to do application work for DHS.  This will allow us to expand our existing relationship with the county to really make a difference in continually improving the valuable services that DHS provides.  We look forward to working closely with DHS to use technology in helping Allegheny County communities and citizens.”

Achieving this qualification designation was the result of a rigorous Request For Qualifications (RFQ) process administered by the county that required Data Warehouse Consultants to compete with other interested firms.  Ultimately DHS decided that the consulting firm’s experience and focus are the best fit for the work that needs to be done.

About Allegheny County Department of Human Services:  DHS is responsible for providing and administering publicly-funded human services to Allegheny County residents. DHS is dedicated to meeting these human services needs, most particularly for the county’s vulnerable populations, through an extensive range of information exchange, prevention, early intervention, case management, crisis intervention and after-care services.  (www.alleghenycounty.us/dhs/)

About Data Warehouse Consultants:  Data Warehouse Consultants is a consulting firm based in Pittsburgh, PA.  Founded in 2004 the firm provides custom development services for web applications and corporate database and data warehouse environments.  (www.dwconsultants.com)