Welcome

RTSCheck is a framework for checking RTS tools.

Overview

Regression test selection (RTS) reduces regression testing costs by re-running only those tests whose behavior may change due to code changes. Researchers and large software organizations have recently developed and adopted several RTS tools to deal with the rapidly growing costs of regression testing. As RTS tools gain adoption, it becomes critical to check that they are correct and efficient. Unfortunately, checking RTS tools currently relies solely on the limited tests that RTS tool developers write manually.

RTSCheck is the first framework for checking RTS tools. RTSCheck feeds evolving programs (i.e., sequences of program revisions) to an RTS tool and checks the output against rules inspired by existing RTS test suites. Violations of these rules are likely due to deviations from expected RTS tool behavior and are thus indicative of bugs in the tool.
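
To make this loop concrete, here is a minimal sketch (in Python) of how such a checking harness could be structured: run every tool, plus RetestAll, on each revision, then hand the collected results to the rules. The RunResult record and all names in it are hypothetical illustrations we introduce here, not RTSCheck's actual API.

    from dataclasses import dataclass

    @dataclass
    class RunResult:
        """Hypothetical record of one tool run on one revision
        (the field names are ours, not RTSCheck's)."""
        selected: set  # tests the tool chose to run
        failed: set    # selected tests that failed

    def check_evolving_program(revisions, tools, retest_all, rules):
        """Run each RTS tool and RetestAll on every revision, then
        check every rule against the collected per-revision results.
        `tools` maps a tool name to a callable that runs that tool on
        a revision and returns a RunResult; `rules` are callables that
        inspect the history and yield violation descriptions."""
        history = {name: [] for name in tools}
        history["RetestAll"] = []
        for rev in revisions:
            history["RetestAll"].append(retest_all(rev))
            for name, run in tools.items():
                history[name].append(run(rev))
        # Each rule inspects the full history and reports violations.
        return [v for rule in rules for v in rule(history)]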

Components

RTSCheck uses three components to obtain evolving programs.

AutoEP

AutoEP automatically generates evolving programs and corresponding tests.

DefectsEP

DefectsEP uses buggy and fixed program revisions from bug databases.

EvoEP

EvoEP uses sequences of program revisions from actual open-source projects' histories.

Rules for Detecting Violations

RTSCheck currently applies seven rules to detect safety, precision, and generality violations; a code sketch follows the table.


Rule  Type        Violation Description
R1    safety      In some revision, the number of newly failed tests when run with the tool is lower than with RetestAll.
R2    safety      In some revision, the tool selects zero tests but all other tools select all tests.
R3    precision   In all revisions, the tool selects all tests.
R4    precision   In some revision, the tool selects all tests but all other tools select zero tests.
R5    precision   The first two revisions are identical, and the tool selects one or more tests in the second revision.
R6    generality  In the first revision, the tool selects a different number of tests than RetestAll.
R7    generality  In some revision, the number of failed tests when run with the tool is greater than with RetestAll.
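
To illustrate how such rules can be encoded, below is a minimal sketch of three of them (R1, R3, and R7) as checks over the per-revision history collected in the Overview sketch above. This is our reading of the rule descriptions, not RTSCheck's implementation.

    def newly_failed(results, i):
        """Tests that fail in revision i but did not fail in i-1."""
        prev = results[i - 1].failed if i > 0 else set()
        return results[i].failed - prev

    def violates_r1(history, tool):
        # R1 (safety): in some revision, the tool sees fewer newly
        # failed tests than RetestAll does.
        return any(
            len(newly_failed(history[tool], i))
            < len(newly_failed(history["RetestAll"], i))
            for i in range(len(history[tool])))

    def violates_r3(history, tool, all_tests):
        # R3 (precision): the tool selects all tests in every revision.
        return all(r.selected == all_tests for r in history[tool])

    def violates_r7(history, tool):
        # R7 (generality): in some revision, more tests fail with the
        # tool than with RetestAll.
        return any(
            len(t.failed) > len(r.failed)
            for t, r in zip(history[tool], history["RetestAll"]))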


Run RTSCheck Demo

As a full run would take several days, we provide steps to observe a subset of the behavior reported in the paper.

  1. Download RTSCheck and unzip it
  2. Run AutoEP on a subset of experiments (~30 min) and print violations (you should see Clover violations, e.g., on rules R1 and R7):
    cd autoep
    ./main
    An example violation is:
         GROUP: R1-clover-lang-GR2-fail--1-less
         On lang-42, clover violates R1
  3. Run DefectsEP on a subset of experiments (~30 min) and print violations:
    cd defectsep
    ./main
  4. Run EvoEP on a subset of experiments (~30 min) and print violations (no violations are expected in this case):
    cd evoep
    ./main

Run Full RTSCheck

To fully run RTSCheck experiments, use the following commands:

  1. Download RTSCheck and unzip it
  2. Run AutoEP on all the experiments:
    cd autoep
    ./main all
  3. Run DefectsEP on all the experiments:
    cd defectsep
    ./main all
  4. Run EvoEP on all the experiments:
    cd evoep
    ./main all

AutoEP-generated Programs

All the evolving programs generated by AutoEP in our experiments are available here.

DefectsEP Violations Grouping

We group rule violations reported by DefectsEP based on which rule is violated, which tool violates the rule, and on which project the violation is triggered. Specifically, we apply the following grouping rules (GRs); a code sketch follows the list:
  1. GR1: Put two violations in the same group if, under both violations, the same tool runs the same number of tests as RetestAll in the new revision on the same project.
  2. GR2: Put two violations in the same group if, under both violations, the same tool fails X fewer tests than RetestAll in the new revision (V1) on the same project (one group per X).
  3. GR3: Put two violations in the same group if, under both violations, the same tool fails X more tests than RetestAll in the new revision (V1) on the same project (one group per X).
  4. GR4: Put two violations in the same group if, under both violations, the same tool fails the same number of tests (different from RetestAll's) in the new revision on the same project.
  5. GR5: Put two violations in the same group if, under both violations, for the same tool, the fraction of all tests that are run in the new revision falls in the same percentage bucket on the same project. The percentage buckets are: 0%-20%, 20%-40%, 40%-60%, 60%-80%, and 80%-100%.
  6. GR6: Put two violations in the same group if, under both violations, the same tool runs the same number of tests in the old revision on the same project.
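
As a sketch of how this grouping could be implemented, the snippet below groups violation records by the key a grouping rule computes, with GR2 and GR5 as examples. The fields on the violation record (tool, project, failed_new, retestall_failed_new, run_new, total_new) are hypothetical names we introduce for illustration, not DefectsEP's actual data model.

    from collections import defaultdict

    def group_violations(violations, key_fn):
        """Group violation records by the key a grouping rule (GR)
        computes for each of them."""
        groups = defaultdict(list)
        for v in violations:
            groups[key_fn(v)].append(v)
        return dict(groups)

    def gr2_key(v):
        # GR2: same tool and project, and the tool fails X fewer tests
        # than RetestAll in the new revision (one group per X).
        return (v.tool, v.project, v.retestall_failed_new - v.failed_new)

    def gr5_key(v):
        # GR5: same tool and project, and the fraction of all tests the
        # tool runs in the new revision falls in the same 20%-wide
        # bucket (0%-20%, ..., 80%-100%).
        bucket = min(int(100 * v.run_new / v.total_new) // 20, 4)
        return (v.tool, v.project, bucket)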

Detected Violations

The violations detected by RTSCheck are available for each component:
AutoEP
DefectsEP
EvoEP

Publication

A Framework for Checking Regression Test Selection Tools.
Chenguang Zhu, Owolabi Legunsen, August Shi, and Milos Gligoric.
In Proceedings of the 41st International Conference on Software Engineering (ICSE 2019), pages 430-441.

Contact

Please send comments or questions to Chenguang Zhu.