Despite its importance in the software development cycle, controlled lab testing can never identify all the issues that actual users might encounter. For mobile apps in particular, thousands of combinations of devices, operating systems and network conditions make comprehensive testing of every scenario impractical. Furthermore, because testers in the lab know what an app is supposed to do, they cannot experience that app as a first-time user would.
One solution to these and other testing challenges is crowd testing, where anywhere from a few hundred to 10,000 individuals are asked to use an app and report on their experiences, problems and/or overall impressions. Although the concept was pioneered in the early 2000s, crowd testing didn’t really begin to take off until recently, as the number of potential testing scenarios—and the negative impact of user disappointment—has grown over time.
The questions then become: Where and when is crowd-sourced testing most effective? How does it produce reliable results? And what barriers, if any, can hamper its value in the development life cycle?
Where Crowd Testing Makes Sense
There are a number of specific scenarios in which crowd-sourced app testing is not only beneficial but produces results impossible to achieve in a lab under typical conditions (and budgets). The following are a few of the more pertinent examples, but the possibilities are nearly limitless.
Challenging geographic conditions: If a software developer wants to know how an app will perform in areas with limited or unreliable cellular coverage, such as rural or mountainous regions, crowd-sourced testing is the most practical solution.
Network variations: Given the increasing popularity of alternate networks such as Wi-Fi for mobile device connectivity, many companies want to know how their app will respond when a user moves through different networks and environments. One example: a user opens an app on a Wi-Fi network at home, continues using it in an elevator with a weak 3G signal (experiencing the hand-off from Wi-Fi to cellular), then walks across a parking lot, picking up additional bandwidth (4G) thanks to reduced structural interference and more transmitting towers. Another example is testing how an app responds when someone uses it while traveling in a train or car, with hand-offs between multiple primary and partner networks.
Defect identification: Crowd testing is also an excellent way to identify rare and elusive bugs. In the lab, the traditional approach is to define a group of tests and run them hundreds of times in essentially identical conditions, looking for anomalies and “stress fractures” that appear over time. This process identifies the majority of defects, but it cannot pinpoint all of them, especially those specific to certain devices or situations.
With crowd-sourced testing, the developer may specify only a few tests but have thousands of people run them across a wide variety of devices and conditions. The additional variation helps pinpoint errors and bugs that might not crop up on mainstream devices or in lab-defined conditions.
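To see why that variation matters, consider how a developer might sift crowd reports for device-specific failures. The following is a minimal sketch in Python; the report format, device names and 50% flagging threshold are all invented for illustration:

```python
from collections import defaultdict

# Hypothetical crowd-test reports: (device_model, os_version, test_passed)
reports = [
    ("PhoneA", "OS 12", True), ("PhoneA", "OS 12", True),
    ("PhoneB", "OS 11", False), ("PhoneB", "OS 11", False),
    ("PhoneB", "OS 12", True), ("PhoneC", "OS 10", False),
]

# Tally pass/fail counts per device so that failures clustered on one
# model stand out against the rest of the crowd.
tallies = defaultdict(lambda: [0, 0])  # device -> [failures, total runs]
for device, _os, passed in reports:
    tallies[device][1] += 1
    if not passed:
        tallies[device][0] += 1

for device, (failures, runs) in sorted(tallies.items()):
    rate = failures / runs
    flag = "  <-- investigate" if rate > 0.5 else ""
    print(f"{device}: {failures}/{runs} runs failed ({rate:.0%}){flag}")
```

A lab running the same tests on two or three house devices might never see the cluster of failures on one particular model at all.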
Old, new or unusual equipment: When developers want to test their apps on outdated devices that are hard to find, new devices that are still very expensive, fairly unusual devices such as Internet-connected gaming equipment, or less-popular operating systems, crowd-sourced testing is a great solution. In most cases, the testers own their devices, so companies avoid the expense and effort of equipment acquisition.
App server load tests: An app server might perform beautifully when being bombarded by repetitive signals from a simulator, but what happens when hundreds of real users in different locations—and with disparate devices—try to connect? Without replicating those conditions, it’s impossible to know the results.
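It is possible to get partway there before the crowd arrives. The sketch below, in Python, fires overlapping requests with varied device profiles and staggered timing at a hypothetical staging endpoint; the URL, user-agent strings and worker counts are placeholders, not a real service:

```python
import concurrent.futures
import random
import time
import urllib.request

# Hypothetical pre-production endpoint; substitute your own staging URL.
ENDPOINT = "https://staging.example.com/api/health"

# A handful of user-agent strings stands in for the device variety a real
# crowd brings; a simulator would send identical requests instead.
USER_AGENTS = [
    "AppClient/1.0 (PhoneA; OS 12)",
    "AppClient/1.0 (PhoneB; OS 11)",
    "AppClient/1.0 (TabletC; OS 10)",
]

def simulated_user(user_id):
    """Issue one request with a random device profile and jittered timing;
    return latency in seconds, or None if the request failed."""
    time.sleep(random.uniform(0, 2))  # real users do not arrive in lockstep
    req = urllib.request.Request(
        ENDPOINT, headers={"User-Agent": random.choice(USER_AGENTS)}
    )
    start = time.monotonic()
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            resp.read()
    except OSError:
        return None
    return time.monotonic() - start

# Fire 200 overlapping requests and report failures and latency spread.
with concurrent.futures.ThreadPoolExecutor(max_workers=50) as pool:
    results = list(pool.map(simulated_user, range(200)))

latencies = sorted(r for r in results if r is not None)
if latencies:
    print(f"median latency: {latencies[len(latencies) // 2]:.3f}s")
print(f"failed requests: {results.count(None)}/{len(results)}")
```

Even this only approximates a crowd: real users add network hand-offs and device quirks that no script reproduces, which is precisely the point of the exercise.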
User experience issues: In many instances, a feature may function as designed but not be intuitive to the average user. For example, if 90% of apps on a platform use a simple swipe from the right for a common function and a new app uses a swipe from the left, it will confuse users. Crowd-sourced testing is perfect for identifying these types of design defects. A company can have 1,000 “newbie” users attempt to use the app, and if 100 cannot figure out a feature (a 10% failure rate), that’s a problem.
Out of the Lab and Into the Wild
Although some crowd-sourced testing is as basic as asking testers simply to use the app and report back (as in the scenario described at the outset), in some cases it involves specific, designated tests. In the latter scenario, reliable results depend upon finding individuals who can perform assigned tasks accurately and consistently.
So, how can a company assemble such a varied group of people, and how can it vet them to ensure they perform the tests as designed? How can it keep corporate spies or mischievous testers from getting involved and intentionally mucking up the process?
Due to the difficulty of assembling a large, objective crowd of people whose input will be trustworthy, most companies work with third-party vendors for their crowd-sourced testing projects. As a result, despite the implications of the name, crowd test projects aren’t completely organic or random—for a very good reason.
To ensure value to the software developer, crowd-testing vendors build databases of individuals interested in participating in these programs. Some testers are invited to participate based upon their involvement with technology forums, online communities and other likely recruiting spots. Other would-be testers seek out opportunities on their own. Some crowd testers are paid for their time; others receive consideration in the form of gift certificates, equipment, professional accolades or other perks.
No matter how they assemble their pools of testers, reputable vendors build extensive databases of these individuals, tagged with any specialties as well as demographics, level of IT experience and other characteristics. With every project, a tester receives ratings based upon the number of tests he or she completes, the accuracy of the results (what percentage of similar testers identify the same behaviors and/or whether the developer can subsequently replicate the issue) and other factors that indicate reliability and commitment to the effort.
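Each vendor’s scoring formula is proprietary, but the shape of the computation is straightforward. Here is a minimal sketch in Python; the 70/30 weighting, the 100-test cap and the field names are assumptions for illustration, not any vendor’s actual method:

```python
from dataclasses import dataclass

@dataclass
class TesterRecord:
    tests_completed: int
    findings_reported: int
    findings_confirmed: int  # corroborated by similar testers or replicated by the developer

def reliability_score(t: TesterRecord, accuracy_weight: float = 0.7) -> float:
    """Blend result accuracy with completion volume; weights are illustrative."""
    accuracy = (
        t.findings_confirmed / t.findings_reported if t.findings_reported else 0.0
    )
    # Diminishing returns on sheer volume: contribution is capped at 100 tests.
    volume = min(t.tests_completed, 100) / 100
    return accuracy_weight * accuracy + (1 - accuracy_weight) * volume

# A tester with 80 completed tests and 18 of 20 findings confirmed:
print(f"{reliability_score(TesterRecord(80, 20, 18)):.2f}")  # -> 0.87
```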
This approach also enables vendors to provide exactly the type of testers a company seeks. One company might want a crowd of users experienced with mobile eCommerce, for example. Others might want users who are familiar with iPhones but not Android devices. Based on whatever criteria the developing company considers important—age, income, technological sophistication and more—the crowd-testing vendor uses the client’s profile to build a crowd to match it.
Another benefit of this technique is that it helps the client company prioritize test results, where appropriate. For example, if 80% of top-rated testers found a particular bug, the developer might put it ahead of one that only 20% of highly rated testers experienced.
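Continuing the hypothetical scoring above, a first-cut triage might weight each defect by how many top-rated testers reported it; the 0.75 cutoff and the defect log are invented for illustration:

```python
# Hypothetical defect log: defect id -> reliability scores of reporting testers.
defect_reports = {
    "crash-on-resume": [0.91, 0.88, 0.84, 0.79, 0.40],
    "misaligned-icon": [0.41, 0.35],
}

def top_tester_reports(scores, cutoff=0.75):
    """Count how many highly rated testers reported the defect."""
    return sum(1 for s in scores if s >= cutoff)

# Work the queue from most to least corroborated by trusted testers.
for defect in sorted(defect_reports,
                     key=lambda d: top_tester_reports(defect_reports[d]),
                     reverse=True):
    print(defect, top_tester_reports(defect_reports[defect]))
```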
From Wild to Trialed
As I alluded to earlier, crowd-sourced test results don’t stand on their own. After receiving results, the client’s in-house testers vet them (some or all, depending on the design of the test) by attempting to replicate the issues the crowd identified. Once they replicate an issue, they determine whether it is worth addressing. The developing organization also provides feedback to the crowd-testing vendor, which uses it to update tester ratings, adjust the testing profiles if requested and perform other tweaks.
Crowd testing is a valuable contributor not only before the final release of code to production but also in the early stages of testing. The first crowd-sourced test might happen at the beginning of the testing cycle and identify a wide array of defects and concerns. After coders make adjustments, a second crowd-sourced test toward the end of the production cycle can confirm whether the issues have been resolved.
An additional run of crowd-sourced tests, possibly with a different profile, might be used to determine if other categories of users find an issue objectionable. This helps companies determine if allowing an inconsequential or subjective flaw to stand will impact user attitudes.
Early crowd-sourced testing is also beneficial for identifying opportunities for improvement. Well-educated, experienced crowd testers are perfect candidates to pinpoint not what is wrong but what is missing in the user experience, based upon their experience with other apps.
Crowd Concerns and Challenges
Crowd-sourced testing is not without its issues, although most of them can be overcome with prudence and common sense. On the production side, companies must have pre-production servers with sufficient capacity to handle whatever testing loads they request. Spontaneous loads from a variety of OSs and locations can stress a server significantly more than the same number of requests in a controlled environment.
Another issue is security. Pre-production servers and any other aspect of the infrastructure that a crowd may affect must be highly secure. If they are not, both the client’s infrastructure and the testers’ devices could be put at risk.
Issues on the design side also pop up, but here, ingenuity rules the day. One example that many companies experience is the conflict between wanting to work with reputable, experienced testers and the need to collect the impressions of unsophisticated users. Neophyte users are the best candidates for user-experience tests—they will pick up on confusing or overly complicated features that any experienced user would intuitively figure out.
To remedy this problem, crowd-testing vendors can tweak the parameters of the data-collection effort to focus on less-experienced users while still ensuring a pool of testers large enough to produce statistically valid results. When a large group of inexperienced users consistently makes the same mistakes, developers know they are dealing with a genuine design defect.
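“Statistically valid” can be made concrete with a one-sided binomial test: given an assumed acceptable confusion rate, how surprising is the observed number of stumped testers? A minimal sketch using only the Python standard library, with the 5% baseline and the sample counts invented for illustration:

```python
from math import comb

def binomial_p_value(failures: int, n: int, baseline: float) -> float:
    """Probability of at least `failures` of `n` users stumbling
    if the true confusion rate were `baseline`."""
    return sum(
        comb(n, k) * baseline**k * (1 - baseline) ** (n - k)
        for k in range(failures, n + 1)
    )

# Illustrative numbers: 100 of 1,000 novice testers stumble on a feature,
# measured against an assumed acceptable confusion rate of 5%.
p = binomial_p_value(failures=100, n=1000, baseline=0.05)
print(f"p-value: {p:.2e}")  # vanishingly small -> treat as a genuine defect
```

With a result that extreme, the shared mistake pattern is almost certainly a real design flaw rather than noise in the sample.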
That’s the beauty of crowd testing with today’s advanced business intelligence and data analysis platforms. Thanks to the power of computing, virtually any test, no matter how arcane, can be structured to ensure sufficiently trustworthy results from a large, properly targeted sample of testers.
Join the Crowd
Crowd-sourced testing can be a powerful tool, and it’s an emerging science that requires the involvement of both crowd vendors and their clients. Working together, vendors and customers build better databases of crowd testers, refine what really matters in software apps and chart new territory with the goal of making apps better overall.
With that said, for any individual project, crowd-sourced testing isn’t necessarily about making a perfect product. It’s about taking a service-oriented approach to discovering and addressing as many issues as feasible before releasing software to real-world users. With user abandonment rates soaring for problematic apps—and with close to 50% of app users saying that a poor user experience affects their opinion of a company and its brand—it’s software suicide to let your users be the ones to find your biggest mistakes.
Yet, thousands of companies continue to cross their fingers and hope that testing automation, lab simulations and professional evaluations with a handful of devices are sufficient to ensure a good user experience. Often, they are wrong.