Loading Flat Files into a Database (Method Overview)
Here we feature a few methods for loading files into an Oracle or SQL Server database. There are many methods for accomplishing this, but over the past few projects, and after implementing most of the methods discussed, I documented some of my experiences. Here I focus on developing an application or process to load data from files, or on demand, when a more sophisticated tool like SSIS or Informatica is not already in use.
Oracle External Tables: When working with Oracle, I have often used external tables to assist with file loading. They are easy to setup, provide good error handling, and allow querying the data prior to loading it for validations. This has always been a favorite method for Oracle. However, on my last few projects this was not a great option. Without access to the database server it requires coordination to setup the directory and to get access to the files in that directory on the server. It also requires a secondary process to stage, rename, archive, and log the files on the file server. Additionally, on a project using Amazon RDS, using external tables was simply not available.
SQL Server Bulk Methods (BULK Insert / OpenRowset): Similar to Oracle external tables, these methods provide a fast and easy way to get the data into the database. Again, however, these require special permissions and/or access to the database server. After going down this path on a recent project the method worked fairly well, but without access to the database server it was difficult to coordinate getting the files to the server. When using a file share on a remote server, it was difficult getting the permissions necessary on both the remote server (file share) and the database server. It is possible, but with sensitive data it requires extra coordination and permissions, some of which are not desirable. I also find it difficult to do detailed error handling and logging if the file does not arrive in the format that is expected. This also requires a secondary process to stage, rename, archive, and log the files on the file server.
SSIS and other tools: If SSIS (or other tools) are available; I find it to be a great method for loading the files quickly into the database. Most of my experience with the tools is SSIS to SQL Server, but I know others have similar capabilities. Typically if a client is already working with SSIS and/or Informatica, the client has processes in place for handling the files. If not, there is often a lot of cost and overhead to setup this process from scratch. For our purposes let’s assume the developer is not already working with one of these tools at the client site.
I have had success with loading data using each method described thus far. Other than SSIS (or other third party tools), they all require a separate application or scripting to handle the actual files. An application that reads the file, does validations, does logging, moves/renames files, and archives the files when complete. This is sometime done through scripting, but often requires a more in depth approach for logging and auditing the actions. Additionally, all of the methods discussed thus far require additional coordination, permissions, and/or access. There are other BULK methods I have seen or used that are not mentioned. Most of the other methods that I have used were for a one time load of large data, an initial migration, a large data transfer, or import/exports.
C# Application: This leads me to an application-based method. My preference in the last couple of projects has been to use a C# application to handle the file moving/renaming/logging, file parsing, and data loading. This method is easily scheduled, configured, and migrated. It typical requires little to no additional coordination, software, or permissions. One, fairly simple and self-contained application, can handle the processing. Additionally, the application is very similar whether running against Oracle or SQL Server (Cloud or on-premises). The data loading aspects are also easy converted to web applications that require user uploading of data to the database.
Once an application template is designed for logging, and error handling, it is fairly simple to expand the application to handle numerous formats in the same manner. This method has been efficient, easy to implement, and easy to use between various projects. Within the C# application I have used various options to load the data, and found that some are much more efficient than others. Click Here to read more about the methods we have used within our C# applications.