ExecutionAgent 🚀

Automate Building, Testing, and Validation of GitHub Projects in Isolated Containers

ExecutionAgent uses a large language model (LLM) to autonomously clone, build, install, and run the test cases of projects hosted on GitHub, all inside an isolated container. It supports multiple languages and configurations, and aims to streamline development and quality assurance workflows.

📦 Dev Container Setup

To get started in a VSCode Dev Container:

Install the Remote - Containers extension.
Clone this repository.
Open the repository in VSCode, and it will prompt you to reopen in the dev container. Alternatively, use a command to open the current folder in a dev container.

✨ Key Features

Dual Mode Execution: Run ExecutionAgent with a batch file or directly with a GitHub repository URL.
Autonomous Workflow: Clone, build, and test GitHub projects with no human intervention (we will add human-in-the-loop soon).
Language Support: Multiple languages like Python, C, C++, Java, JavaScript, and more.
Dev Container Integration: Preconfigured for VSCode Dev Containers for seamless development.
Metrics (based on an evaluation set of 50 projects):
- Build Success Rate: 80%
- Test Success Rate: 65%

🚀 How It Works

1️⃣ Single Repository Mode

You can directly process a single GitHub repository using the --repo option:

./ExecutionAgent.sh --repo <github_repo_url> -l <num_value>

Example:

./ExecutionAgent.sh --repo https://github.com/pytest-dev/pytest -l 50

When this mode is used, ExecutionAgent will:

Extract the project name from the URL.
Determine the repository's primary programming language by calling get_main_language.py.
Clone the repository and set up metadata.
Launch the main loop of ExecutionAgent to build the project and run its test cases.
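
As an illustration of the first step above, here is a minimal sketch of how a project name could be derived from a repository URL. The function name project_name_from_url is hypothetical and is not taken from the ExecutionAgent code base; the actual script may extract the name differently.

```python
from urllib.parse import urlparse


def project_name_from_url(repo_url: str) -> str:
    """Return the last path component of a repository URL.

    Hypothetical helper for illustration only: it strips surrounding
    slashes and an optional ".git" suffix.
    """
    path = urlparse(repo_url).path.strip("/")
    if not path:
        raise ValueError(f"no repository path in URL: {repo_url!r}")
    name = path.split("/")[-1]
    if name.endswith(".git"):
        name = name[: -len(".git")]
    return name


if __name__ == "__main__":
    print(project_name_from_url("https://github.com/pytest-dev/pytest"))
```

For the example URL used in this README, the sketch yields the name pytest.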

The -l option sets the number of cycles, that is, the number of actions the agent can execute. If -l is not provided, it defaults to 40. To use a different limit, pass the desired value after -l. For example, -l 50 allows 50 cycles instead of the default 40.

2️⃣ Batch File Mode

Prepare a batch file listing projects to process in the format:
<project_name> <github_url> <language>

Example (notice how for now we leave one empty line after each entry):

scipy https://github.com/scipy/scipy Python

pytest https://github.com/pytest-dev/pytest Python

Run ExecutionAgent with the batch file:

./ExecutionAgent.sh /path/to/batch_file.txt

ExecutionAgent will process each project listed in the file, performing the same steps as the single repository mode. The -l option can also be applied here by adding it to the command when running the script.
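
The batch file format described above can be checked before a run with a small parser. This is a sketch under assumptions: the names BatchEntry and parse_batch_text are hypothetical, and the sketch assumes fields are separated by whitespace and that blank lines are only separators. It is not the parsing code used by ExecutionAgent.sh.

```python
from typing import List, NamedTuple


class BatchEntry(NamedTuple):
    project_name: str
    github_url: str
    language: str


def parse_batch_text(text: str) -> List[BatchEntry]:
    """Parse '<project_name> <github_url> <language>' lines, skipping blank lines."""
    entries = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # blank separator line between entries
        if len(fields) != 3:
            raise ValueError(
                f"line {line_number}: expected 3 fields, got {len(fields)}"
            )
        entries.append(BatchEntry(*fields))
    return entries


if __name__ == "__main__":
    example = (
        "scipy https://github.com/scipy/scipy Python\n"
        "\n"
        "pytest https://github.com/pytest-dev/pytest Python\n"
    )
    for entry in parse_batch_text(example):
        print(entry.project_name, entry.github_url, entry.language)
```

Validating the file up front surfaces a malformed entry before any repository is cloned.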

To show the results of the last experiment for a specific project, you can call:

python3.10 show_results.py <project_name>
# Example: python3.10 show_results.py pytest

To clean all the logs and unset the API token, use the following command (WARNING: ALL LOGS AND EXECUTION RESULTS WILL BE DELETED):

./clean.sh

🔧 Configuration

More options for configuring the agent are coming soon.

Control the Number of Iterations:

By default, ExecutionAgent makes 3 attempts. After each attempt, ExecutionAgent learns from the previous one and adjusts its strategy. In each attempt, the agent executes a number of commands (or cycles) set by the -l option described above (default = 40).

To set the number of attempts, change line 17 (local max_retries=2) to any number you want. The total number of attempts is max_retries + 1.
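
The relationship between max_retries, attempts, and cycles can be written out as simple arithmetic. The function names below are hypothetical, and the total is only an upper bound that assumes every attempt uses its full cycle budget.

```python
def total_attempts(max_retries: int) -> int:
    """Total attempts = the first attempt plus max_retries retries."""
    if max_retries < 0:
        raise ValueError("max_retries must be non-negative")
    return max_retries + 1


def max_total_cycles(max_retries: int, cycles_per_attempt: int = 40) -> int:
    """Upper bound on cycles across all attempts (40 is the documented -l default)."""
    if cycles_per_attempt <= 0:
        raise ValueError("cycles_per_attempt must be positive")
    return total_attempts(max_retries) * cycles_per_attempt


if __name__ == "__main__":
    print(total_attempts(2))        # 3 attempts with the default max_retries=2
    print(max_total_cycles(2))      # 120 cycles at most with the default -l 40
    print(max_total_cycles(2, 50))  # 150 cycles at most with -l 50
```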

Keep or Delete a docker container

You can set this option in the file customize.json. The default value is "FALSE" (containers are deleted). The other option is "True", which keeps the containers.

This option is useful for those who want to reuse a container already built by the agent.
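
Because the two documented values differ in letter case ("FALSE" and "True"), a reader scripting around customize.json may want to interpret the value case-insensitively. The sketch below illustrates that idea only. The key name keep_container is an assumption made for this example; check customize.json for the real key before relying on it.

```python
import json

# ASSUMPTION: the real key name in customize.json is not stated in this README.
HYPOTHETICAL_KEY = "keep_container"


def parse_flag(raw: object) -> bool:
    """Interpret "True"/"FALSE"-style values regardless of letter case."""
    if isinstance(raw, bool):
        return raw
    if isinstance(raw, str):
        value = raw.strip().lower()
        if value == "true":
            return True
        if value == "false":
            return False
    raise ValueError(f"unrecognized flag value: {raw!r}")


def should_keep_container(config_text: str) -> bool:
    """Read the (hypothetical) key from JSON text; a missing key means delete."""
    config = json.loads(config_text)
    return parse_flag(config.get(HYPOTHETICAL_KEY, "FALSE"))


if __name__ == "__main__":
    print(should_keep_container('{"keep_container": "True"}'))
    print(should_keep_container('{"keep_container": "FALSE"}'))
```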

📊 Results Summary

Metric	Success Rate
Build Success Rate	80%
Test Execution Rate	65%

Results are logged under experimental_setups/experiment_XX, where XX is an incremented number for each invocation of ExecutionAgent.

📁 Output Folder Structure Explanation

The folder structure under experimental_setups/experiment_XX is organized to keep track of the various outputs and logs generated during the execution of the ExecutionAgent. Below is a breakdown of the key directories and their contents:

files: Contains files generated by the ExecutionAgent, such as Dockerfile, installation scripts, or any configuration files necessary for setting up the container environment.
- Example: Dockerfile, INSTALL.sh
logs: Stores raw logs capturing the input prompts and the corresponding outputs from the model during execution. These logs are essential for troubleshooting and understanding the behavior of the agent.
- Example: cycles_list_marshmallow, prompt_history_marshmallow
responses: Holds the responses generated by the model during the execution process in a structured JSON format. These responses include details about the generated build or test configurations and results.
- Example: model_responses_marshmallow
saved_contexts: Contains the saved states of the agent object at each iteration of the execution process. These snapshots are useful for debugging, tracking changes, and extracting subcomponents of the prompt across different cycles.
- Example: cycle_1, cycle_10, etc.
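
To inspect the most recent run, one option is to pick the highest-numbered experiment folder and check that the four subfolders listed above exist. This is a sketch, not part of the repository: the function names are hypothetical, and it assumes XX is a plain decimal number so that experiment_10 sorts after experiment_2.

```python
import re
from pathlib import Path
from typing import List, Optional

EXPECTED_SUBFOLDERS = ("files", "logs", "responses", "saved_contexts")


def latest_experiment(root: Path) -> Optional[Path]:
    """Return the experiment_XX directory with the largest number, if any."""
    pattern = re.compile(r"experiment_(\d+)")
    numbered = []
    for child in root.iterdir():
        match = pattern.fullmatch(child.name)
        if match and child.is_dir():
            numbered.append((int(match.group(1)), child))
    if not numbered:
        return None
    # Compare numerically, so experiment_10 ranks above experiment_2.
    return max(numbered, key=lambda pair: pair[0])[1]


def missing_subfolders(experiment_dir: Path) -> List[str]:
    """List the expected subfolders that are absent from an experiment directory."""
    return [
        name for name in EXPECTED_SUBFOLDERS
        if not (experiment_dir / name).is_dir()
    ]


if __name__ == "__main__":
    root = Path("experimental_setups")
    if root.is_dir():
        latest = latest_experiment(root)
        if latest is not None:
            print(latest, missing_subfolders(latest))
```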

📜 Research Paper

Dive into the technical details and evaluation in our paper.

📬 Feedback

Have suggestions or found a bug? Open an issue or contact us at my_email.
