Solving the Challenge of Data Security in Development and Testing

Data breaches are becoming impossible to escape. They occur in every business sector and in companies of all sizes. Juniper Research estimates that the average cost of a single data breach will exceed $150 million by 2020, with the global exposure exceeding $2 trillion by 2019.

Yet, many organizations still struggle to secure their production data effectively. In many cases, business leaders think they are protected, only to discover after a data breach that the opposite was true. For companies that develop, maintain, and/or customize software, either for sale or for internal use, the risk is even greater.

Software teams are under increasing pressure to deliver releases more quickly, yet security budgets haven’t risen to counter the escalating threat. Accelerated methodologies such as DevOps can add to that pressure, and meeting delivery timetables often takes precedence over security. The stakes are high: the U.S. Department of Homeland Security (DHS) estimates that 90 percent of security incidents result from problems with software.

The time has come for organizations to accept responsibility for comprehensive, end-to-end security across the continuous delivery pipeline. In this article, we’ll cover some of the challenges and best practices specific to data security.

The Development Data Challenge

Delphix, a data virtualization firm, commissioned a report in 2015 demonstrating that copies of production data used in development and testing represent one of the greatest data breach risks in an enterprise. Furthermore, 62% of DevOps leaders say full data access for non-production environments is a requirement for success. Moreover, as many as 80% of firms use (and often store) production data at some point in their development and testing activities.

This approach is extremely dangerous, since development and testing environments are often not within the organization’s strongest layers of security. Instead of the data residing in tightly monitored systems with strong access restrictions and a small group of authorized users, it is accessible to a larger number of people with minimal security clearance.

In some cases, the data is being shared with disparate or even outsourced teams. As a result, sensitive data is crossing networks that may not be adequately encrypted or is leaving the confines of the U.S.

Data Leaks Compound the Issue

Even teams that do not use copies of large production datasets should scrutinize their processes and monitor their data more closely. If development and testing teams have any access to production data, there is potential for accidental leakage and outright theft.

Production data can be exposed in a number of ways, from an employee copying it to a laptop for off-site (e.g., home) testing to a team member sending it to someone in another department or another company. It may seem illogical that anyone would do such a thing in today’s threat-laden environment, but I assure you that it happens every day.

Furthermore, spear phishing, whereby a cybercriminal spoofs an email to look legitimate and then embeds malicious links in it, remains one of the most prevalent sources of data breaches. With software teams often connected to servers that might not be as heavily secured as a firm’s core data servers, the outcome if someone clicks on one of those links could be disastrous.

Disgruntled employees are also a leading source of data theft. Organizations must be vigilant about not allowing unbridled access to copies of production data, and they must immediately revoke access to all corporate resources when any employee leaves. Many organizations neglect to create and follow an off-boarding process to mitigate this threat.

If development teams are geographically separated from production and each other, organizations must exercise even greater caution. With remote operations, it is likely that both infrastructure and processes are optimized for remote access, potentially making it easier for criminals to steal cloned production data.

Production Servers Can Pose Significant Risk

Production servers are another area where data is often endangered by development and testing activities. For web applications in particular, it is quite common for organizations to develop and test directly on production servers.

Exacerbating the problem, internal development and testing applications are often accessible online yet inadequately secured. If development and testing are being done on a production server, such applications can easily be leveraged by a skilled hacker running sophisticated (and sometimes even basic) scripts to compromise and access the production server.

Also, it is not unusual for teams to archive test sites hosting unmasked production data in public directories that even a moderately tech-savvy person can find. Because these applications are under development, they often have vulnerabilities that hackers can exploit to attain access. Web application content and server-side scripting create additional vulnerabilities, especially in unfinished, not-yet-security-tested projects.

The Problem Is Universal

At this point, you may be saying to yourself, “My firm doesn’t take any of these risks. We use best practices to protect data.” Maybe you clone production data for testing but believe your development and testing servers are adequately secured. Or, perhaps you use synthetic data, and only a handful of people can access production data for specific purposes. No matter what safeguards you have taken to protect your data, unless you have adopted end-to-end processes or technologies that mask and secure production data across the entire enterprise, your firm is painting a target on its back.

As an aside, synthetic data presents an additional challenge not related to security. Being a simulation, it may not adequately replicate what exists in production. Since synthetic data is a model, it may not account for scenarios outside the vision of its creator.

The hard truth for software teams—and the companies that develop, maintain, and/or customize software—is that the current state of software development and testing encourages security problems.

Whether due to competitive pressure or demand from users (internal or external), application development cycles are being shortened, yet resources and budgets are not increasing at the same pace. This is especially true for mobile apps, which are becoming requisite in both the consumer and enterprise arenas.

Gartner estimates that market demand for mobile application development services is growing at least five times faster than internal IT organizations’ capacity to deliver them. It is inevitable that these pressures will cause project leaders and their teams to take shortcuts, circumvent security protocols, and adopt “just-this-once” mentalities.

Data Security Best Practices

Now that I have your attention, let’s talk about solving the problem. The following are some recommendations for best practices. This list is intended as a starting point—it is not all-encompassing, and every organization’s needs are different.

Production Data Handling

If production data is used for development and testing, it must be scrubbed or masked.
The data masking/scrubbing solution should be able to deliver data on demand, within minutes, and it should be self-service. This will help team members resist the tendency to circumvent security policies and use production copies.
The organization should also deploy data virtualization in tandem with the data masking/scrubbing solution. Such a solution provides data snapshots that give all team members access to consistent data from the same masked dataset, reducing the overall data footprint and eliminating the temptation to access production data.
Even with the best data masking solution, the organization should maintain a comprehensive set of validation procedures to confirm that no sensitive values survive masking.
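
To make the masking and validation steps above concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a description of how any particular masking product works: the key, the field names, and the helper functions `mask_record` and `find_leaks` are all hypothetical, and a real deployment would keep the key in a secrets manager and check far more patterns than the single U.S. Social Security number format shown here.

```python
import hashlib
import hmac
import re

# Hypothetical key for illustration only; never hard-code a real key.
MASKING_KEY = b"example-masking-key"

# One example pattern to scan for; a real validation suite would cover many more.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def mask_value(value: str, field: str) -> str:
    """Replace a sensitive value with a keyed token that is impractical to reverse without the key."""
    digest = hmac.new(MASKING_KEY, f"{field}:{value}".encode(), hashlib.sha256)
    return f"{field}_{digest.hexdigest()[:12]}"


def mask_record(record: dict, sensitive_fields: set) -> dict:
    """Return a copy of the record with only the sensitive fields masked."""
    return {
        key: mask_value(str(val), key) if key in sensitive_fields else val
        for key, val in record.items()
    }


def find_leaks(masked: list, originals: list, sensitive_fields: set) -> list:
    """Return (row index, field) pairs where sensitive data appears to survive."""
    original_values = {
        str(rec[name]) for rec in originals for name in sensitive_fields if name in rec
    }
    leaks = []
    for index, rec in enumerate(masked):
        for key, val in rec.items():
            text = str(val)
            # Flag exact copies of original values and anything shaped like an SSN.
            if text in original_values or SSN_PATTERN.search(text):
                leaks.append((index, key))
    return leaks
```

A team could run `find_leaks` as a gate before any masked dataset is released to development or testing, and refuse delivery if the returned list is not empty.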

Production Data Storage

Data on production servers should be protected in transit and at rest with the highest possible grade of encryption (preferably at the file level).
Direct data access should be limited to a handful of trusted individuals and granted only via multi-factor authentication. Permissions should be assigned using role-based access control.
Development and testing of web applications should always be done on servers isolated from the Internet and should never use or connect to production data and databases.
Web application or website data, code files, and scripts should always be on a separate partition or drive from that of any database, operating system, system files, or logs.
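
The access rules above (role-based permissions, with multi-factor authentication required for production data) can be sketched in a few lines of Python. The role names, permission strings, and the `User` and `can_access` names are hypothetical and chosen only for illustration; in practice these checks would be delegated to the organization's identity provider rather than written by hand.

```python
from dataclasses import dataclass

# Hypothetical mapping of roles to permissions, for illustration only.
ROLE_PERMISSIONS = {
    "dba": {"read_production", "write_production"},
    "developer": {"read_masked"},
    "tester": {"read_masked"},
}


@dataclass(frozen=True)
class User:
    name: str
    roles: frozenset
    mfa_verified: bool = False


def can_access(user: User, permission: str) -> bool:
    """Allow an action only if a role grants it and, for production data, MFA has passed."""
    granted = any(
        permission in ROLE_PERMISSIONS.get(role, set()) for role in user.roles
    )
    if not granted:
        return False
    # Production permissions additionally require a completed MFA challenge.
    if permission.endswith("_production"):
        return user.mfa_verified
    return True
```

Note that developers and testers in this sketch never hold a production permission at all, so even a verified multi-factor login does not open production data to them.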

Policies and Procedures

Stringent policies and procedures should be developed and enforced, and the ramifications for accessing and/or distributing copied production data should be clear.
All employees (development, quality assurance, project managers, support personnel, third-party contractors, etc.) should review and sign off on the policies and procedures.
A mechanism for creating a testing activities audit trail with a risk assessment report is strongly recommended. This is a requirement for organizations subject to regulatory control and compliance. However, audit trails and reports are valuable for any firm that becomes the victim of a data breach. Not only can the documentation help identify and eliminate security holes, it can also establish proof of a consistent effort to take security precautions.
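
As one possible shape for such a mechanism, the following Python sketch records testing activities and summarizes the risky ones. The class name `AuditTrail`, its fields, and the report layout are assumptions made for illustration. Chaining each entry to the hash of the previous one is my own addition, included because it makes after-the-fact tampering detectable; it is not something the recommendations above prescribe.

```python
import hashlib
import json
from datetime import datetime, timezone


def _digest(body: dict) -> str:
    """Hash an entry's contents in a stable key order."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


class AuditTrail:
    """In-memory log of who touched which dataset, and whether it was masked."""

    def __init__(self):
        self.entries = []

    def record(self, user: str, action: str, dataset: str, masked: bool) -> dict:
        entry = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "action": action,
            "dataset": dataset,
            "masked": masked,
            # Link to the previous entry so edits or deletions break the chain.
            "previous_hash": self.entries[-1]["hash"] if self.entries else "",
        }
        entry["hash"] = _digest(entry)  # computed before the "hash" key exists
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Return True if no entry has been altered, removed, or reordered."""
        previous_hash = ""
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if entry["previous_hash"] != previous_hash or _digest(body) != entry["hash"]:
                return False
            previous_hash = entry["hash"]
        return True

    def risk_report(self) -> dict:
        """Summarize events that involved unmasked data."""
        risky = [e for e in self.entries if not e["masked"]]
        return {
            "total_events": len(self.entries),
            "unmasked_events": len(risky),
            "users_touching_unmasked": sorted({e["user"] for e in risky}),
        }
```

A production version would write to append-only storage outside the reach of the teams being audited; the in-memory list here only keeps the example short.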

What to Look for in a Data Handling Platform

Not all data masking/scrubbing solutions are created equal. Orasi finds Delphix to be an attractive choice for its ability to provide end-to-end data masking and data virtualization in a single, unified platform. Some of the features we like in Delphix, which organizational leadership should look for in their own solutions, include:

The capability to profile the data and automatically identify “at-risk” fields, rather than requiring users to define them.
The sophistication to provide data that, while not real, is realistic. Some data (e.g., Social Security numbers, credit card numbers, and vehicle identification numbers) has very specific format requirements, and the solution must honor them.
The confidence that the masked data is sufficiently anonymized, so it cannot be correlated back to the original data.
The ability to maintain referential integrity so that data keys, as well as connections between lists, data subsets, and other pivot points, are not lost in the masking process.
The assurance of stringent, built-in validation of the resulting masked data.
The opportunity to subset the masked production data in order to minimize the organization’s total data footprint.
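
Two of the features above, realistic data and referential integrity, are easier to picture with a small example. The Python sketch below is not how Delphix or any other product implements masking; it is a simplified illustration in which the key and the function names `mask_card_number` and `mask_key` are hypothetical. It assumes a 16-digit card format and uses the Luhn check digit, the checksum that payment card numbers carry, so that masked numbers still pass format validation in the application under test.

```python
import hashlib
import hmac

# Hypothetical key for illustration only; never hard-code a real key.
MASKING_KEY = b"example-masking-key"


def luhn_check_digit(partial: str) -> str:
    """Compute the Luhn check digit for a number that lacks its final digit."""
    total = 0
    # Moving right to left, double every second digit, starting with the rightmost.
    for position, char in enumerate(reversed(partial)):
        digit = int(char)
        if position % 2 == 0:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return str((10 - total % 10) % 10)


def luhn_valid(number: str) -> bool:
    """Return True if the last digit is the correct Luhn check digit."""
    return luhn_check_digit(number[:-1]) == number[-1]


def mask_card_number(card_number: str) -> str:
    """Map a card number to a fake 16-digit number that still passes the Luhn check."""
    digest = hmac.new(MASKING_KEY, card_number.encode(), hashlib.sha256).hexdigest()
    body = f"{int(digest, 16) % 10**15:015d}"  # 15 digits derived from the keyed hash
    return body + luhn_check_digit(body)


def mask_key(value: str, namespace: str) -> str:
    """Mask a key the same way everywhere, so joins across tables still line up."""
    digest = hmac.new(MASKING_KEY, f"{namespace}:{value}".encode(), hashlib.sha256)
    return digest.hexdigest()[:16]
```

Because both functions are deterministic for a given key, a customer ID masked in the customers table and again in the orders table yields the same token, which is what keeps relationships intact. The trade-off is that anyone holding the key can reproduce the mapping, so the key deserves the same protection as the production data itself.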

Conclusion

With organizations increasingly being subjected to compliance mandates such as the Federal Information Security Management Act of 2002 (FISMA), Sarbanes-Oxley (SOX), the Health Insurance Portability and Accountability Act (HIPAA), and the Payment Card Industry Data Security Standard (PCI-DSS), the need to protect production data at all levels will only increase. New standards such as the EU’s General Data Protection Regulation (GDPR) will add pressure for firms conducting business with major European partners.

Even for firms not subject to any of these mandates, the realities of data breaches can be financially catastrophic. Taking a forward-thinking approach to data security now can minimize the risk of significant disruption—and potential disaster—in the future.