How exams work#

Each exam is described by a JSON config file. This file lists the set of tests performed on the file(s) submitted by the student. It can also define global parameters for the test execution.

An exam is always executed with a single file selected as the entry point. This file must have one of the supported extensions, which determines what tests are available to be run on the file.

Test execution is done in a temporary directory, using a user that only has access to that directory. All files uploaded by the student are copied into that directory.

All paths are interpreted locally, so that an exam run locally with the framework can find all of its files. To enable collaboration, it is recommended to define paths relative to the root of the directory where the exams are stored and from which exams are shared with others; an absolute path may be different on someone else's system. When packaging with feedback-builder, all paths are replaced with ones that work in the Autograder.
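
For example, a test entry's tester path would be given relative to the shared exams root rather than as an absolute path such as /home/alice/exams/exam1/run_code.cpp (the directory layout and file names here are purely illustrative):

{
    "type": "functionality",
    "max_score": 60.0,
    "tester_file": "exam1/run_code.cpp"
}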

Tests#

Specify the list of tests that should be run as an array under the tests key, with each test having a type key to identify it and a set of parameters with values. Tests are executed in the order specified in the config file.

Every test requires the "max_score" key, and some tests may require other parameters
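
A minimal sketch of this structure, using two of the test types described below (the values are illustrative; a fuller example config appears later in this section):

{
    "tests": [
        {
            "type": "compile",
            "max_score": 10.0
        },
        {
            "type": "comments",
            "max_score": 5.0
        }
    ]
}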

Currently supported tests and parameters#

Parameters supported by all tests#

  • max_score: float - this parameter is required on every test, unless explicitly stated otherwise. It sets the total maximum score for that test. 0 is allowed, in which case the test doesn't count towards the score but can still provide feedback to the student.
  • (optional) number: string - assign a number to the test. If the test run generates multiple results, these are numbered under this number, separated by a dot. If number is not set, numbers are assigned by the Autograder sequentially
  • (optional) tags: [] - tags are supported by the Autograder and will be passed through directly
  • (optional) visibility: string - visibility control for the Autograder. Will be passed through directly

To learn more about tags and visibility, take a look at the Autograder documentation
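
For illustration, a single test entry combining these common parameters might look like this (the tag value is an arbitrary example):

{
    "type": "compile",
    "max_score": 10.0,
    "number": "1",
    "tags": ["week-3"],
    "visibility": "visible"
}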

Tests supported for all languages and their unique parameters#

  • replace_files - replace files with default versions. Files are matched by name. Commonly used to ensure .h files are not modified. This test doesn't support the max_score config value and doesn't produce a result entry. It can also be used to make sure certain files are present in the same directory as the student code.
    • files: [] - paths to the files to use for replacement.
    • (optional) override: bool - override the file if it is already present, or keep it as is. Default is true
  • comments - evaluate the comment density of the tested file. See how comment scoring works. A config sketch covering both tests follows this list.
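
A sketch of how these two tests might be configured (the header file name is hypothetical):

{
    "tests": [
        {
            "type": "replace_files",
            "files": ["exam1/module.h"],
            "override": true
        },
        {
            "type": "comments",
            "max_score": 10.0
        }
    ]
}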

Tests supported for C++ Exams#

  • compile - compile submitted C++ file
    • (optional) warning_penalty: float [0.0, 1.0] - what proportion of the max score is deducted per compilation warning. Default is 0.2
  • functionality - test the functionality with a compiled-in tester
    • tester_file: string - path to the tester file
    • Supports all execution test parameters
  • functionality_executable - test the functionality with an executable tester
    • tester_file: string - path to the tester file
    • Supports all execution test parameters
  • static - run static analysis on the tested file using cppcheck
    • (optional) error_penalty: float [0.0, 1.0] - what proportion of the max score is deducted per error. Default is 0.2
  • style - run a code style check using clang-format (a combined config sketch follows this list)
    • style: string - the target style. Default is google
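
A sketch of how the C++ specific parameters might be set (the penalty values, style choice and tester file name are illustrative):

{
    "tests": [
        {
            "type": "compile",
            "max_score": 10.0,
            "warning_penalty": 0.1
        },
        {
            "type": "functionality",
            "max_score": 60.0,
            "tester_file": "exam1/run_code.cpp"
        },
        {
            "type": "static",
            "max_score": 10.0,
            "error_penalty": 0.25
        },
        {
            "type": "style",
            "max_score": 10.0,
            "style": "google"
        }
    ]
}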

Tests supported for Python Exams#

  • syntax - try to load the tested Python file and check for syntax issues. If the module loads, full score is awarded; if not, 0 marks are awarded
  • functionality - test the functionality of the tested file. For details see how python execution works
    • tester_file: string - path to the tester file
    • Supports all execution test parameters
  • static - run static analysis on the tested file, using lint
    • (optional) error_penalty: float [0.0, 1.0] - what proportion of the max score is deducted per static error. Default is 1
    • (optional) warning_penalty: float [0.0, 1.0] - what proportion of the max score is deducted per static warning. Default is 0.1
    • (optional) convention_penalty: float [0.0, 1.0] - what proportion of the max score is deducted per convention violation. Default is 0.05
    • (optional) refactor_penalty: float [0.0, 1.0] - what proportion of the max score is deducted per suggested refactoring. Default is 0.05
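
Since the full example config below is for C++, here is a minimal sketch of what a Python exam's tests might look like (the scores, penalties and tester file name are illustrative):

{
    "tests": [
        {
            "type": "syntax",
            "max_score": 10.0
        },
        {
            "type": "functionality",
            "max_score": 70.0,
            "tester_file": "exam1/test_solution.py"
        },
        {
            "type": "static",
            "max_score": 20.0,
            "warning_penalty": 0.1,
            "convention_penalty": 0.05
        }
    ]
}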

Parameters supported by all execution tests#

  • timeout: int, timeout_sec: int, timeout_min: int, timeout_ms: int - maximum duration after which execution is interrupted and terminated. If set to -1, the timeout is disabled (this doesn't disable the default Autograder timeout; disabling the timeout is not recommended). Default is 30 seconds. Default unit is seconds
  • child_process_limit: int - maximum number of child processes the tested code is allowed to start. If more than the allowed number of processes are detected, execution is terminated. Default is 0. Outside of very specific cases it is not recommended to allow the tested code to start child processes. This limit accounts for the additional process when executable testing is used. The child limit can be disabled by setting this value to -1
  • memory_limit: int, memory_limit_kb: int, memory_limit_mb: int - Limit the maximum memory allocated by the tested file. This can be used to check for memory leaks in C++ testing. If more than the allowed memory is allocated, execution is terminated. Default is 526 MB. Default unit is bytes. The memory limit can be disabled by setting it to -1. If multiple memory limits are specified, the smallest is used. This doesn't override the overall memory limit of the Autograder container
  • allow_connections: bool - Allow or forbid the tested file from opening any network connections. If it's not allowed and the tested file attempts to open a connection, execution is terminated. Default is false (see the sketch after this list)
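
As an illustration, an execution test could tighten these limits like so (all values are examples, not recommendations):

{
    "type": "functionality",
    "max_score": 60.0,
    "tester_file": "exam1/run_code.cpp",
    "timeout_sec": 10,
    "memory_limit_mb": 256,
    "child_process_limit": 0,
    "allow_connections": false
}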

Global configuration#

Some parameters must be applied to the entire evaluation. These are defined in a separate "globals" section after the "tests" section.

Supported global parameters#

  • cpp_std: string - Specify the C++ standard used by the compiler. Default is c++11. From the full list of standards, versions above c++98 are supported; gnu++ standards above gnu++98 are also supported. Deprecated naming conventions are not supported.
  • requirements - Specify the Python packages available in the environment where the tested code is run. Accepts an array of package names (optionally with versions) or a single requirements file. Sketches of both forms appear below.
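
A sketch of the globals section for a C++ exam and for a Python exam; these fragments sit alongside the "tests" array as in the full example below (the standard and package choices are illustrative):

"globals": {
    "cpp_std": "c++17"
}

"globals": {
    "requirements": ["numpy", "matplotlib==3.8.0"]
}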

Example (C++) config#

{
    "tests": [
        {
            "type": "compile",
            "max_score": 10.0,
            "number": "1",
            "tags": [],
            "visibility": "visible"
        },
        {
            "type": "functionality",
            "max_score": 60.0,
            "number": "2",
            "tags": [],
            "visibility": "visible",
            "tester_file": "run_code.cpp"
        },
        {
            "type": "static",
            "max_score": 10.0,
            "number": "3",
            "tags": [],
            "visibility": "visible"
        },
        {
            "type": "comments",
            "max_score": 10.0,
            "number": "4",
            "tags": [],
            "visibility": "visible"
        },
        {
            "type": "style",
            "max_score": 10.0,
            "number": "5",
            "tags": [],
            "visibility": "visible",
            "style": "google"
        }
    ],
    "globals": { }
}

Methods of testing (C++)#

The following methods of tested file execution are supported:

Compiled-in#

For module-like C++ files, where the API is defined in a .h file but the tested file has no individual execution component, compiled-in execution should be used: a tester, based on the following template, imports the module and runs the various functions/classes defined in it. The template described here takes care of running the tests.

#include "TESTED_MODULE.h"
#include "cpp_eval_util.h"  // Module containing the evaluator prototype

class NAMEEvaluator : public Evaluator<TYPE>{
  public:

    NAMEEvaluator(int argc, char** argv):Evaluator(argc, argv){}

    // For a given question i return the name of the question
    // This is used to discover the number of tests.
    // If i is greater than or equal to the number of tests, return an empty string "".
    // This indicates that the limit has been reached.
    string GetName(int i){
      return i < no_of_questions ? "Name" : ""; // Example only
    }

    // For a given question i return the value/object evaluated for the given test
    TYPE GetResult(int i){
      return [TODO]
    }

    // For a given question i, which returned result return the score that can be rewarded for the result
    float GetScore(int i, TYPE result){
      return result == expected ? 1.0f : 0.0f; // Example only
    }

    // For a given question i, which returned result and was awarded score, return appropriate feedback
    string GetFeedback(int i, TYPE result, float score){
      return "Appropriate feedback" + (score >= 1 ? " : PASS!" : " : FAIL!");   // Example only
    }
};

int main(int argc, char **argv) {
  NAMEEvaluator evaluator(argc, argv);
  return evaluator.Run();
}

Replace NAME and TYPE with the name of the test and the return type of the tested function/class.

Executable#

For executable-like C++ files, where the file executed by itself prints some output, the tester, using the following template, can run the file, enter inputs, and check outputs. The template described here takes care of running the tests.

from py_eval_util import Evaluator  # Py_eval_util will always be available.

def test_file():    

    # Create evaluator object
    e = Evaluator()

    # For a given question i return the name of the question
    # This is used to discover the number of tests.
    # If i is greater than or equal to the number of tests, return None.
    # This indicates that the limit has been reached.
    e.with_name(lambda i: "Name" if i < no_test_cases else None)  # Example only

    # Specify that you want to treat the tested file as an executable
    e.run_executable()

    # For a given question i pass in the following input to the executable on stdin, supports single value, and iterable.
    e.with_input(lambda i: [TODO])

    # For a given question i which returned result lines, return a score that can be rewarded for it
    # Result is a string list
    e.with_score(lambda i, result: [TODO])

    # For a given question i which returned result lines and was awarded score, return some appropriate feedback
    e.with_feedback(lambda i, result, score: "Appropriate feedback : " + ("PASS" if score >= 1 else "FAIL"))    # Example only

    # Start evaluation
    e.start()

if __name__ == "__main__":
    test_file()

The evaluator object also supports e.with_score_and_feedback(lambda i, result: (0, "feedback")), which must return the score and the feedback as a single tuple.

Using lambda functions is not required; it is also allowed to pass in the name of a function to the builder:


def score_provider(i, result):
    return 0    # Some score

def test_file():
    ...
    e.with_score(score_provider)

Cpp Eval Util#

The cpp_eval_util library, which is available to all C++ testers, contains the Evaluator abstract class and also a number of useful functions:

  • is_within_margin(a, b, margin) - returns a bool indicating whether a and b are within margin of each other. Supports floats and doubles
  • byte_to_binary(x) - convert an integer to its binary representation as a string of 0s and 1s
  • vector_to_string(vector) - convert a vector to its string representation. Supports int, float, double and string

Methods of testing (Python)#

For Python tested files, both module and executable testing are handled in a similar fashion. Use the following templates for the various approaches:

To test an individual function or class:

from py_eval_util import Evaluator

def test_file():

    # Expected signature (the name and arguments will be matched with a function in the tested module)
    def sig_fun(x, y):
        pass

    # Alternatively, a class can be found by its name and init signature
    # Keep in mind, only the init signature is matched, not other methods/properties.
    # Make sure any method/property exists before accessing it using the hasattr function
    class SigClass:
        def __init__(self, x, y):
            pass

    # Create evaluator object
    e = Evaluator()

    # For a given question i return the name of the question
    # This is used to discover the number of tests.
    # If i is greater than or equal to the number of tests, return None.
    # This indicates that the limit has been reached.
    e.with_name(lambda i: "Name" if i < [no_test_cases] else None)  # Example only

    # For a given question i, and a function from the module matching the requested signature, return some value that will be evaluated
    e.run_code(lambda i, f: f(...), signature=sig_fun)

    # For a given question i which returned result lines, return a score that can be rewarded for it
    e.with_score(lambda i, result: [TODO])

    # For a given question i which returned result lines and was awarded score return some appropriate feedback
    e.with_feedback(lambda i, result, score: "Appropriate feedback : " + ("PASS" if score >= 1 else "FAIL"))    # Example only

    # Start evaluation
    e.start()


if __name__ == "__main__":
    test_file()

Evaluator also supports with_score_and_feedback and full functions instead of lambdas (see above)

To test whole modules that are loaded as main and execute some code automatically (this is similar to executable testing with C++):

from py_eval_util import Evaluator


def test_file():

    # Create evaluator object
    e = Evaluator()

    # For a given question i return the name of the question
    # This is used to discover the number of tests.
    # If i is greater than or equal to the number of tests, return None.
    # This indicates that the limit has been reached.
    e.with_name(lambda i: "Name" if i < [no_test_cases] else None)  # Example only

    # Run the tested module as main
    e.run_module()

    # For a given question i pass in the following input to the module on stdin, supports single value, and iterable.
    e.with_input(lambda i: [TODO])

    # For a given question i which printed result lines, return a score that can be rewarded for it
    # Result is a string list
    e.with_score(lambda i, result: [TODO])

    # For a given question i which returned result lines and was awarded score return some appropriate feedback
    e.with_feedback(lambda i, result, score: "Appropriate feedback : " + ("PASS" if score >= 1 else "FAIL"))    # Example only

    # Start evaluation
    e.start()


if __name__ == "__main__":
    test_file()

Evaluator also supports with_score_and_feedback and full functions instead of lambdas (see above)

Py Eval Util#

The py_eval_util module, which is available to all Python testers, has the following useful functions:

  • get_module_functions(module) - returns all functions in a module
  • get_module_classes(module) - returns all classes in a module
  • get_levenshtein_ratio(string1, string2) - determine how similar two strings are to each other
  • result_to_dictionary(question, mark, weight, feedback) - turn a question result into a dictionary, with correct names and rounded marks
  • numbers_close(a, b, margin) - determine whether two numbers (a, b) are within margin of each other
  • InputOverride(list) - object that overrides stdin with the passed-in list, and cleans up after itself when disposed. Use with the with keyword. If stdin is accessed too many times, the too_many_reads flag is set to true
  • Capturing() - object that captures everything from stdout. It must be used with the with keyword. Results are stored in the object as a list
  • CapturingErr() - object that captures everything from stderr. It must be used with the with keyword. Results are stored in the object as a list

Comment scoring#

The score for comment tests is calculated in the following way:

  • The numbers of code lines and comment lines are counted with cloc. Empty lines are ignored
  • Comment density is calculated as: comment lines / (comment lines + code lines). For example, 20 comment lines and 80 code lines give a density of 20 / (20 + 80) = 0.2
  • The base of the score is the comment density
  • The comment density is matched with one of the categories in comment_feedback.json by selecting the largest category that it is smaller than. If that category has a bonus, that bonus is added.