Good Tests Begin with Good Samples
Written by Lucian Ghinda
Make your future easier by spending time today to properly set up test data.
The time you spend today setting up test data that matches real and important user cases is an investment.
It will help in your daily work, and it will be invaluable during a priority-zero incident; you will be glad you invested that time early.
To achieve this, understand your users, your product, and your business. Work closely with your product, support, and sales teams, then apply sampling.
Choosing good test data is about sampling well. Borrowing ideas from statistics helps us make smarter, more deliberate choices.
1. Population vs. Sample
Think of your code’s input space as a population: every possible combination of data and state. Your tests are a sample of that population.
The goal isn’t to test everything, but to test a representative subset that helps you discover likely problems early.
2. Random vs. Systematic Sampling
You can:
- Pick random inputs (great for fuzz testing or probabilistic code), or
- Pick systematic samples, one from each meaningful category or boundary.
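A minimal sketch of the random approach, using Python's random module to draw inputs from a wide range. The function under test, classify_age, is a hypothetical validator invented for illustration:

```python
import random

def classify_age(age):
    """Hypothetical function under test: classify an age value."""
    if age < 0 or age > 120:
        return "invalid"
    return "valid"

# Random sampling: draw inputs uniformly from a wide range.
# Useful for fuzz-style testing, but it gives no guarantee
# of hitting the interesting boundaries (0, 120, 121, ...).
random.seed(42)  # fixed seed so the run is reproducible
samples = [random.randint(-10, 130) for _ in range(20)]

for age in samples:
    assert classify_age(age) in ("valid", "invalid")
```

Note the fixed seed: without it, each run tests a different sample, which makes failures hard to reproduce.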
For most business logic, structured sampling works best. Example:
If your API accepts ages 0 to 120, you don’t need 100 random ages. You need a few key samples:
- 0 (lower boundary)
- 1 (typical valid)
- 119 and 120 (upper boundaries)
- 121 (invalid)
This approach is actually the foundation of the boundary value analysis testing technique.
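The boundary samples above can be written as a small table-driven test. This is a sketch: validate_age is a hypothetical validator for the 0 to 120 range, and it adds -1 below the lower boundary, which boundary value analysis typically also includes:

```python
def validate_age(age):
    """Hypothetical validator: accept ages 0 to 120 inclusive."""
    return 0 <= age <= 120

# Boundary value analysis: one sample per boundary and its
# neighbors, instead of many random ages.
cases = [
    (-1, False),   # just below the lower boundary
    (0, True),     # lower boundary
    (1, True),     # typical valid value
    (119, True),   # just below the upper boundary
    (120, True),   # upper boundary
    (121, False),  # just above the upper boundary (invalid)
]

for age, expected in cases:
    assert validate_age(age) == expected, f"age={age}"
```

Six deliberate samples cover what a hundred random ones might still miss.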
3. Stratified Sampling: Covering All Groups
Not all inputs are equal. Divide your input space into logical groups and pick samples from each one.
Example:
- For a login form, you might stratify by “valid credentials,” “invalid password,” “nonexistent user,” “empty input.” Then pick one or two cases from each.
This prevents over-testing one area and missing others. In testing terms, that is part of the equivalence partitioning testing technique.
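The login example can be sketched as one or two cases per partition. Everything here is illustrative: login is a hypothetical stand-in for a real authentication call, with made-up return values:

```python
# Hypothetical in-memory user store for the sketch.
USERS = {"alice": "s3cret"}

def login(username, password):
    """Hypothetical authentication function under test."""
    if not username or not password:
        return "empty_input"
    if username not in USERS:
        return "nonexistent_user"
    if USERS[username] != password:
        return "invalid_password"
    return "ok"

# Equivalence partitioning: one or two samples per logical group,
# rather than many samples from the same partition.
partitions = {
    "valid credentials": [("alice", "s3cret", "ok")],
    "invalid password": [("alice", "wrong", "invalid_password")],
    "nonexistent user": [("bob", "whatever", "nonexistent_user")],
    "empty input": [("", "", "empty_input"), ("alice", "", "empty_input")],
}

for partition, cases in partitions.items():
    for username, password, expected in cases:
        assert login(username, password) == expected, partition
```

Naming the partitions in the test data makes it obvious, at a glance, which group a failing case belongs to.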
4. Sampling Bias: The Trap of “Happy Paths”
Most developers unintentionally sample the same region, sometimes called the happy path. The result is biased tests that miss the tricky edge cases where bugs often hide.
Ask yourself:
Are my test cases representative of real-world usage and rare conditions?
If 90% of your users are on mobile Safari, don’t just test on desktop Chrome. If your code processes files, test both small and large ones, and consider different file extensions. This is stratified sampling again, applied to real usage.
5. How Many Samples Are Enough?
Adding more test data isn’t always better. Beyond a certain point, new samples stop revealing new information, while adding more to an already covered space increases testing costs.
Think of sampling like this:
- Start broad: cover each risk area once.
- Go deeper only where you see instability or recent changes.
This is called risk-based coverage: you focus your effort where it matters most and avoid unnecessary work.
6. Always Include Extremes and Weirdness
Include a few “troublemakers”:
- Empty strings
- Nulls
- Maximum lengths
- Invalid characters
- Very large numbers
These aren’t just edge cases. They’re bug magnets.
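One way to keep these troublemakers from being forgotten is to collect them in a shared list that every new test can reuse. A sketch, where normalize_name is a hypothetical function under test:

```python
# "Troublemaker" inputs collected in one place for reuse.
TROUBLEMAKERS = [
    "",              # empty string
    None,            # null
    "x" * 255,       # maximum length
    "Robert'); --",  # invalid/special characters
    str(10**18),     # very large number, as text
]

def normalize_name(value):
    """Hypothetical: trim whitespace, reject empty or missing names."""
    if value is None or not value.strip():
        raise ValueError("name required")
    if len(value) > 255:
        raise ValueError("name too long")
    return value.strip()

for sample in TROUBLEMAKERS:
    try:
        result = normalize_name(sample)
        assert isinstance(result, str) and result  # accepted values stay usable
    except ValueError:
        pass  # rejecting a troublemaker cleanly is fine; crashing is not
```

The point of the loop is the contract: every troublemaker must either be accepted and produce a sane result, or be rejected with a clear error, never an unhandled crash.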
7. Realistic vs. Synthetic Data
- Synthetic data helps you test logic cleanly and deterministically.
- Realistic data reveals integration issues, performance surprises, and messy real-world inputs.
You need both.
For example, your happy path tests can use synthetic data, but your end-to-end suite should include a realistic snapshot of production-like data.
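The split can look like this in practice. A sketch, assuming a simple factory for synthetic records; the "realistic" record is inlined here, where a real suite would load it from an anonymized production snapshot:

```python
# Synthetic data: minimal, deterministic, built fresh per test.
def build_user(**overrides):
    """Hypothetical factory for a clean, predictable user record."""
    user = {"name": "Test User", "email": "test@example.com", "age": 30}
    user.update(overrides)
    return user

# Realistic data: messier, production-like input (inlined for the sketch).
realistic_user = {"name": "  José  Ruiz ", "email": "JOSE+promo@Example.COM", "age": 30}

def normalize_email(user):
    """Hypothetical logic under test: lowercase the email address."""
    return {**user, "email": user["email"].lower()}

# Synthetic input keeps the logic test clean and deterministic...
assert normalize_email(build_user())["email"] == "test@example.com"
# ...while realistic input surfaces real-world messiness
# (mixed case, plus-addressing, stray whitespace in other fields).
assert normalize_email(realistic_user)["email"] == "jose+promo@example.com"
```

The factory keeps synthetic tests readable, while the realistic record is where integration surprises tend to show up.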
8. Adaptive Sampling
You don’t have to get it right from the start.
Start small.
When a bug appears, expand your sample around that area.
Think of it as zooming in with a magnifying glass to spot where problems start to appear.
9. Things to keep in mind
Good test data isn’t random.
Good test data is chosen carefully, based on structure, boundaries, and risk.
Next time you write a test, ask:
- What’s the input space here?
- Which categories or boundaries matter most?
- Am I sampling fairly across them?
- Where might the system behave differently?
That’s not over-engineering. That’s just good sampling, and it makes testing much easier.