Data Cleansing and Migration Toolset

Helping subject-matter experts analyze more than 100 million rows of legacy data in preparation for cleansing and migration.

Idaho Department of Transportation
New Hampshire Department of Motor Vehicles

The states of Idaho and New Hampshire wanted to move their data and operations off the mainframe and onto modern platforms. Each state's dataset contained more than 100 million individual records. To add to the challenge, the great majority of these records needed to be cleansed.

For these projects, RESPEC recommended the SMETools Analysis Toolset (STAT) to analyze and scope the data issues and, working with the subject-matter experts (SMEs), to identify the business and data rules needed to cleanse the data for migration to the modernized systems.

RESPEC provided STAT to help the SMEs in Idaho and New Hampshire review and analyze the legacy data in preparation for cleansing and migration. STAT provides an interactive user interface that lets SMEs quickly browse entire tables, individual columns, distinct value listings, validation exceptions, and data rule violation reports. Along the way, they can apply custom filtering and sorting, drill down into the underlying source records, and export results to file for further analysis. STAT lets both SMEs and business users see detailed information about the state and cleanliness of the data, and its compliance with technical requirements and business rules, presented in terms and interfaces they can readily understand.

An analytical data store on the back end delivers interactive, real-time results, even for very large datasets. Two sets of tools support two distinct workflows: (1) tools for “undirected data exploration,” used early in the modernization process to get a “first look” into the data and to build up a starting catalog of rules, validations, and issues, and (2) “directed exploration” tools for reviewing the results of applying those rules and validations. In practice, the combination of undirected and directed analyses, real-time query response, and a clean, organized, simplified view of the data has allowed customers to get a good initial grip on their data.
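The two workflows can be illustrated with a minimal Python sketch. This is not STAT's implementation or interface: the sample rows, the field names, and the helpers `distinct_values` and `violations` are all hypothetical and serve only to show the idea. The first function mimics an undirected "first look" by listing the distinct values in a column. The second mimics a directed review by applying a small rule catalog and returning the source rows that break each rule.

```python
from collections import Counter

# Hypothetical legacy rows; field names and values are invented for illustration.
rows = [
    {"id": 1, "state": "ID", "birth_year": "1975"},
    {"id": 2, "state": "id", "birth_year": "1982"},
    {"id": 3, "state": "NH", "birth_year": ""},
    {"id": 4, "state": "N.H.", "birth_year": "19x4"},
    {"id": 5, "state": "NH", "birth_year": "1990"},
]


def distinct_values(rows, column):
    """Undirected exploration: list each distinct value with its frequency."""
    return Counter(row[column] for row in rows).most_common()


# Rules an SME might record after a first look (assumed, not taken from STAT).
rules = {
    "state_is_known_code": lambda r: r["state"] in {"ID", "NH"},
    "birth_year_is_4_digits": lambda r: r["birth_year"].isdigit()
    and len(r["birth_year"]) == 4,
}


def violations(rows, rules):
    """Directed exploration: map each rule name to the ids of rows that break it."""
    return {
        name: [r["id"] for r in rows if not check(r)]
        for name, check in rules.items()
    }


state_profile = distinct_values(rows, "state")
report = violations(rows, rules)
print(state_profile)
print(report)
```

In this toy data, the distinct value listing surfaces inconsistent spellings of the state code, which is the kind of finding that would prompt a new rule. The violation report then points back to the individual records that need cleansing.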

The Idaho and New Hampshire projects were both successfully completed. Both clients expressed great satisfaction with the ease of use of STAT and how quickly they were able to clean up their data.