Test Data Management Maturity Assessment

Test data generation has matured beyond the narrow realm of program inputs described on Wikipedia. Test data practices must consider the overall quality of a business's digital assets. The data are not only persistent; they are often the most important IT artifacts these businesses own.

With the focus on software development in most IT shops, the emphasis on data quality has been lost. Data are more than inputs to a software program.

Data validation, meaning the accuracy of persistence, the relationships, and the overall integrity of data, is paramount to solution quality. To help drive data quality as an initiative, we have enumerated the following disciplines within test data management.

Test Data Infrastructure

The tasks and responsibilities within this discipline include managing the costs of the hardware, software, and other licenses, with a special focus on storage. This also includes ensuring the reliability of the underlying infrastructure elements in order to meet the internal service level agreements made with the test and development teams.

Because this area works with the infrastructure, database administrators, and financial stewards of the business, the challenges are usually more bureaucratic than in most of the other disciplines. Developing clear and open communication with these other groups is one way to manage this challenge. There are also recent tool advancements in database and service virtualization, such as Delphix, that address these problems.

The following table presents a maturity progression for this area.

Practice Test Data Infrastructure
Level 3: Optimizing – Data storage optimized with virtualization and/or cloud.
– Data access managed via Organizational Unit or defined LDAP group.
– Licenses and costs quantified per project/release.

Level 2: Managed – Data storage centralized to reduce downtime and costs.
– Data access managed centrally
– Licenses and costs quantified for QA team.

Level 1: Defined – Data storage tracked.
– Data access checklist exists.
– High level costs for infrastructure are defined at whole IT level.

Level 0: Repeatable – Data are backed up.
– Key owners of test data have access rights.
– License and server costs are not a priority.

Level -1: Initial – Test data storage across file systems, databases, and spreadsheets.
– Access to data not managed.
– License and server costs not quantified.


Test Data Coverage Analysis

In order to find gaps in testing coverage, the tasks and responsibilities within this discipline include examining:

  • Test data inputs
  • The database at the entity-relationship level

This level of analysis helps prevent defects in the design of the databases, edge conditions related to relational pairing permutations, as well as simple gaps in row input complexity. In many application architectures, the database implementation embeds core logic outside of what is commonly considered source code. By analyzing the data models directly from a quality assurance viewpoint, data-related defects can be fixed more efficiently and the test cases become more valuable.
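One way to make this analysis concrete is to enumerate the value pairings a pair of related columns allows and compare them against the rows that actually exist in the test database. A minimal sketch, with illustrative column names and values:

```python
from itertools import product

# Hypothetical example: two columns of an accounts table whose value
# combinations we want covered by test data.
account_types = ["checking", "savings"]
statuses = ["active", "frozen", "closed"]

# Every pairing the schema allows.
required_pairs = set(product(account_types, statuses))

# Rows that currently exist in the test database (assumed sample).
test_rows = [
    ("checking", "active"),
    ("checking", "closed"),
    ("savings", "active"),
]

covered = set(test_rows)
gaps = sorted(required_pairs - covered)
coverage = len(covered & required_pairs) / len(required_pairs)

print(f"pair coverage: {coverage:.0%}")
for pair in gaps:
    print("missing:", pair)
```

Even this simple set arithmetic surfaces pairing permutations that no tester would hit by visual inspection of lists, which is the gap between Level 1 and Level 3 in the table below.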

The following table presents a maturity progression for this area.

Practice Test Data Coverage Analysis
Level 3: Optimizing – Enterprise tools used to quantify coverage.
– Data models drive test cases.
– Root cause analysis drives test data model optimization.
– Data coverage results are near 100% with a continuous improvement plan in place.

Level 2: Managed – Test team collaborates with data analysts and SMEs to define coverage.
– Coverage goals are communicated to teams that perform testing functions.
– Data coverage analyzed as part of overall testing process.

Level 1: Defined – Input coverage is defined by the SME subjectively.
– Test cases are updated on an as needed basis via search or visual inspection of lists.

Level 0: Repeatable – Test databases are production snapshots.
– Individual testers use data to fit test cases.
– Individual testers update test cases proactively.

Level -1: Initial – Test data is a subset chosen to fit within the storage limitations.
– Test cases do not specify test data.
– No review of defects or incidents to improve test cases.


Test Data Security Practices

Within this dimension of Test Data Management, a primary concern is maintaining security and compliance for the data which will be used in the test environment.

IT security practices recommend reducing the attack surface; when a test team uses PII, PCI, or other sensitive data, they increase the attack surface and thus the risk. A better approach for solution testing is to mask or obfuscate the sensitive elements of the data during the process of creating or updating the test systems.
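A common masking technique is to replace sensitive fields with deterministic surrogates, so that relationships (such as joins on the masked value) still hold while the real PII never reaches the test environment. A minimal sketch; the field names and salt are illustrative assumptions, not a reference to any particular masking tool:

```python
import hashlib

# Illustrative per-environment secret and the fields assumed sensitive.
SALT = "per-environment-secret"
SENSITIVE_FIELDS = {"ssn", "email"}

def mask_value(value: str) -> str:
    # Deterministic surrogate: same input always maps to the same mask,
    # so referential integrity across tables is preserved.
    digest = hashlib.sha256((SALT + value).encode()).hexdigest()
    return "MASKED-" + digest[:12]

def mask_record(record: dict) -> dict:
    return {k: mask_value(v) if k in SENSITIVE_FIELDS else v
            for k, v in record.items()}

prod_row = {"id": 42, "email": "jane@example.com", "ssn": "123-45-6789"}
test_row = mask_record(prod_row)
print(test_row)
```

Commercial obfuscation tools add format preservation and cross-database consistency, but the principle is the same: the test copy should join and validate like production data without containing it.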

Another way to reduce exposure is to minimize access to the systems that must contain sensitive data: carefully manage the lists of users with access and track their interactions with those systems.

Furthermore, in order to comply with more stringent security and compliance requirements, the test team will participate in audits. The test team should therefore take a proactive mindset toward audit requirements and costs; for example, the tools used for managing the test data should automate the required reports.

The following table presents a maturity progression for this area.

Practice Test Data Security Practices
Level 3: Optimizing – Audit reports are proactively generated.
– Access list is directly associated to security best practices.
– Obfuscation and masking tools pass security reviews regularly.

Level 2: Managed – Obfuscation and masking tools are used.
– Access list is managed by LDAP/Active Directory.

Level 1: Defined – Production copies are used with some masking performed in test environment.
– Access list is tracked in database or via LDAP.

Level 0: Repeatable – Production copies are used in testing.
– Access list is tracked via email.

Level -1: Initial – Production copies are used during testing.
– Test systems sometimes point to production databases.
– User access to test systems is undefined.
– A security breach has occurred in the test environment.


Test Data – Execution Support

The availability of accurate test data is on the critical path of the test execution process. In fact, missing test data can prevent application testing from beginning at all. Thus, improving the time it takes to provide test data typically reduces the overall test execution timeline. As delivery teams try to shorten release cycles or iterations, the hours spent waiting multiply. To do this most effectively, the test data process should be integrated into the QA automation pipeline.
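Integrated into a pipeline, test data provisioning becomes an automated step that builds a fresh, known dataset on demand, so test execution never waits on manual data preparation. A minimal sketch using an in-memory SQLite database; the schema and seed rows are illustrative:

```python
import sqlite3

def provision_test_db(path: str = ":memory:") -> sqlite3.Connection:
    """Build a disposable test database with a known seed dataset."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE accounts (id INTEGER PRIMARY KEY, type TEXT, status TEXT)"
    )
    seed = [(1, "checking", "active"), (2, "savings", "frozen")]
    conn.executemany("INSERT INTO accounts VALUES (?, ?, ?)", seed)
    conn.commit()
    return conn

# In CI this would run as a pipeline step before the test suite;
# here we simply verify the provisioned state.
conn = provision_test_db()
count = conn.execute("SELECT COUNT(*) FROM accounts").fetchone()[0]
print(f"seeded {count} rows")
```

Because the script is repeatable, a failed provisioning run can simply be rerun, which is what keeps test data readiness off the critical path.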

Also, as the rate of test cycles increases, the impact of an unplanned delay in the test data process can block an entire release. If the test data processes are not reliable, repeatable, and automated in a way that is integrated with the rest of the development processes, then the test leadership team will be left holding sole responsibility for such delays. Robust engineering practices and designs can reduce the mean time to recovery. As teams mature, test data readiness can be measured against a service level agreement, along with the other critical-path elements of the delivery pipeline.

Also, as the solution evolves, so does the test data required to test it. Thus the individuals on the delivery team need an efficient process to request data, one optimized for communication as well as speed. As these requests improve, the quality of the test data for the organization becomes more robust in terms of business coverage as well. The ultimate goal is to create pools of data that maximize test coverage with the fewest possible permutations.

The following table presents a maturity progression for this area.

Practice Test Data – Execution Support
Level 3: Optimizing – Test Data process is integrated into QA automation pipeline.
– Test data pool readiness is not in critical path.
– Test data uptime meets SDLC goals for availability to project teams.
– New test data requests meet delivery team efficiency goals.

Level 2: Managed – Data pools are common practice.
– Test data requests are managed.
– All testing requests are tracked and delivered on time.

Level 1: Defined – Improvement goals for shortening test data generation are defined.
– Separate pools are designed for sharing and coordinating.

Level 0: Repeatable – The execution time spent capturing test data is tracked.
– The estimates for this time are accurate and repeatable.

Level -1: Initial – Testers manage their own test data.
– Testers regularly impede each other by using each other's records.


Data Validation

Validation processes are essential to improving the data quality of the solution. The ways data are implemented within an application span a broad range of implementations, and the validation points span a similar range: duplicated values, missing values, corrupted data, broken references, or truncated information.
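Each of these validation points can be checked automatically. A minimal sketch over in-memory rows; the table shape, reference set, and length limit are assumptions for illustration:

```python
# Sample rows seeded with the fault types listed above.
orders = [
    {"id": 1, "customer_id": 10, "note": "rush"},
    {"id": 2, "customer_id": 99, "note": None},       # broken reference, missing note
    {"id": 2, "customer_id": 10, "note": "x" * 300},  # duplicate id, oversized note
]
customer_ids = {10, 11}  # valid foreign keys (assumed)
NOTE_MAX = 255           # column width in the target schema (assumed)

ids = [o["id"] for o in orders]
duplicates = {i for i in ids if ids.count(i) > 1}
missing = [o["id"] for o in orders if o["note"] is None]
broken_refs = [o["id"] for o in orders if o["customer_id"] not in customer_ids]
truncations = [o["id"] for o in orders if o["note"] and len(o["note"]) > NOTE_MAX]

print("duplicate ids:", duplicates)
print("missing notes:", missing)
print("broken references:", broken_refs)
print("would truncate:", truncations)
```

In practice these checks run against the actual tiers of the application, but the pattern is the same: express each validation point as an assertion that can run on every refresh of the test data.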

The old rule still applies: garbage in, garbage out. Data faults can be generated by poor UI design, application integration faults, or inaccurate transformation and decomposition. These can be exacerbated by broken checks in the transports between solution tiers, such as UI to application tier or between application tiers. In fact, the new wave of distributed solutions increases the risk of inaccurate communication.

The test data management team should examine the various types of validation performed by testing and development. With their central position, they can help coordinate activities for data validation. They can also foster a test-driven mindset for data by generating rich data permutations that force robustness into the inputs.

The following table presents a maturity progression for this area.

Practice Data Validation
Level 3: Optimizing – Data reconciliation tools used during functional test automation.
– Data accuracy is validated at each tier of the application stack.
– Root cause analysis includes TDM org.

Level 2: Managed – Data validation incorporates some business level oversight.
– Data accuracy is validated within one or more tiers of the application.
– Data faults are debugged by coordinated and centralized team.
– Data Defects are reduced considerably.
– Bugs and other defect types are easier to detect.
– Root cause analysis includes TDM org.

Level 1: Defined – Data input validation is automated.
– Data faults are tracked and compiled.
– Data accuracy is validated within one or more tiers of the application.

Level 0: Repeatable – Data input validation is mapped for UI components.
– Time spent on data validation is tracked as a category.

Level -1: Initial – The UI provides data validation feedback.


Test Data Development

The process of creating test data for an organization, or even a large project, gains efficiency and quality by taking a development engineering mindset. Test data processes should leverage common languages and tools. The teams should create automation that is reusable, componentized, and standardized. Issues and progress should be tracked as requirements or action items, like any other task within the SDLC.

The TDM team should leverage data flows and actively collaborate with data architecture to help foster a test driven design. The test data development should be included in project or sprint planning and retrospectives. Additionally, the test data scripts should be part of the release version control system.
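With a development mindset, test data is defined in small, versionable scripts built from reusable builders instead of ad hoc spreadsheets. A minimal sketch; the field names and defaults are illustrative assumptions:

```python
import json

def make_customer(**overrides) -> dict:
    """Reusable builder: sensible defaults, overridden per test case."""
    base = {"name": "Test Customer", "region": "US", "status": "active"}
    return {**base, **overrides}

def build_dataset() -> list:
    # Each entry documents the scenario it supports.
    return [
        make_customer(),                                        # baseline
        make_customer(region="EU"),                             # locale variant
        make_customer(status="closed", name="Churned Customer"),# lifecycle edge
    ]

# Emitting JSON lets the dataset live in version control and diff cleanly.
print(json.dumps(build_dataset(), indent=2))
```

Because the script is plain code, it can be reviewed, branched, and released alongside the application it tests, which is exactly what the Level 3 practices below call for.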

The following table presents a maturity progression for this area.

Practice Test Data Development
Level 3: Optimizing – Test data development is included in project or sprint planning and retrospectives.
– Test data scripts are part of release version control system.
– Test data development is synchronized with data architecture.

Level 2: Managed – Test data scripts exist per each project.
– Test data development is aligned with development and architecture team loosely.

Level 1: Defined – Test data scripts exist per tester.
– Test data scripts are backed up periodically.

Level 0: Repeatable – Test data requirements are defined within the test cases.

Level -1: Initial – Test data are created or found by testers via manual methods.


By Jim Azar

James (Jim) Azar is Orasi Senior VP and Chief Technology Officer. A 29-year veteran of the software and services industry, Azar is charged with oversight of service delivery, technology evaluation, and strategic planning at Orasi. Among his many professional credits, Azar was a co-founder of Technology Builders, Inc. (TBI), where he built the original CaliberRM requirements management tool. Azar earned a B.S. in Computer Science from the University of Alabama, College of Engineering, where he was named to the Deans' Leadership Board. He furthered his education with advanced and continuing studies at Stanford University, Carnegie Mellon, and Auburn University at Montgomery. Azar has been published in both IEEE and ACM.