Tip
Would you like to participate in the 4th Workshop on 3W?
Register at https://forms.gle/cmLa2u4VaXd1T7qp8. The workshop will be held from October 20 to 23, 2025, always from 09:00 to 12:00 (GMT-3, Brasília time). It will be 100% online, free of charge, and aimed at those interested in exploring, using and/or contributing to the 3W Project. Short courses will be offered, and works developed with the 3W Project resources by different authors around the world will be presented.
This is the first repository published by Petrobras on GitHub. It supports the 3W Project, which aims to promote experimentation and development of Machine Learning-based approaches and algorithms for specific problems related to detection and classification of undesirable events that occur in offshore oil wells.
The 3W Project is based on the 3W Dataset, a database described in this paper, and on the 3W Toolkit, a software package that promotes experimentation with the 3W Dataset for specific problems. The name 3W was chosen because this dataset is composed of instances from 3 different sources, all of which contain undesirable events that occur in oil Wells.
Timely detection of undesirable events in oil wells can help prevent production losses and reduce maintenance costs, environmental accidents, and human casualties. Losses related to this type of event can reach 5% of production in certain scenarios, especially in areas such as Flow Assurance and Artificial Lifting Methods. In terms of maintenance, the cost of a maritime probe, required to perform various types of operations, can exceed US$500,000 per day.
Creating a dataset and making it publicly available for open experimentation can greatly foster the development of tools that can:
- Improve the process of identifying undesirable events in the drilling, completion and production phases of offshore wells;
- Increase the efficiency of monitoring the integrity of wells and subsea systems, whose related problems can cause incalculable losses for people, the environment, and the company's image.
The 3W is the first pilot of a Petrobras program called Conexões para Inovação - Módulo Open Lab. This pilot is an open project composed of two major resources:
- The 3W Dataset, which will be evolved and supplemented with more instances from time to time;
- The 3W Toolkit, which will also be evolved (in many ways) to cover an increasing number of undesirable events during its development.
Therefore, our strategy is to make these resources publicly available so that we can develop the 3W Project with a global community collaboratively.
With this project, Petrobras intends to develop (fix, improve, supplement, etc.):
- The 3W Dataset itself;
- The 3W Toolkit itself;
- Approaches and algorithms that can be incorporated into systems dedicated to monitoring undesirable events in offshore oil wells during their respective drilling, completion and production phases;
- Tools that can be useful for our ambition.
The 3W Project was conceived and publicly launched on May 30, 2022 as a strategic action by Petrobras, led by its department responsible for Flow Assurance and its research center (CENPES). Since then, 3W has become increasingly consolidated at Petrobras in several aspects: more professionals specialized in labeling instances, more projects and teams using the resources made available by 3W, more investment in developing the digital tools needed to label and export instances, more interest in including different types of undesirable events that occur in wells during the drilling, completion and production phases, etc.
Due to this evolution, since May 1, 2024, 3W's governance has been carried out with the participation of the Petrobras department responsible for Well Integrity.
We expect to receive various types of contributions from individuals, research institutions, startups, companies and partner oil operators.
Before you can contribute to this project, you need to read and agree to the following documents:
It is also very important to know about, participate in, and follow the discussions. See the discussions section.
All the code of this project is licensed under the Apache 2.0 License and all 3W Dataset's data files (Parquet files saved in subdirectories of the dataset directory) are licensed under the Creative Commons Attribution 4.0 International License.
In the 3W Project, three types of versions will be managed as follows.
- Version of the 3W Toolkit: specified in the `__init__.py` file;
- Version of the 3W Dataset: specified in the dataset.ini file;
- Version of the 3W Project: specified with tags in the git repository;
- We will exclusively use the semantic versioning defined in https://semver.org;
- Versions will always be updated manually;
- Versioning of the 3W Toolkit and versioning of the 3W Dataset are completely independent of each other;
- The version of the 3W Project will be updated whenever, and only when, there is a new commit in the `main` branch of the repository, regardless of the updated resource: 3W Toolkit, 3W Dataset, 3W Project's documentation, examples of use, etc.;
- We will only use annotated tags, and for each tag there will be a release in the remote repository (GitHub);
- Content for each release will be automatically generated with functionality provided by GitHub.
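As a rough illustration of the versioning rules above, the sketch below parses plain semantic version strings and compares them numerically. The helper name `parse_semver` and the sample version numbers are assumptions made for illustration only; they are not part of the 3W Toolkit, and the pre-release and build metadata defined by the semver specification are deliberately not handled.

```python
from typing import Tuple


def parse_semver(version: str) -> Tuple[int, int, int]:
    """Parse a plain MAJOR.MINOR.PATCH string (hypothetical helper).

    Pre-release and build metadata (e.g. "1.0.0-rc.1+build5") are not handled.
    """
    parts = version.lstrip("v").split(".")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {version!r}")
    major, minor, patch = (int(p) for p in parts)
    return major, minor, patch


# Toolkit and dataset versions are independent, so each is compared on its own.
toolkit_old, toolkit_new = parse_semver("1.2.0"), parse_semver("1.10.0")
assert toolkit_new > toolkit_old  # numeric, not lexicographic, comparison
```

Comparing the parsed tuples rather than the raw strings avoids the common mistake of ranking "1.10.0" below "1.2.0".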
See the discussions section. If you don't find the clarification you need, please open a discussion to ask your questions so we can answer them.
To the best of its authors' knowledge, this is the first realistic and public dataset with rare undesirable real events in oil wells that can be readily used as a benchmark dataset for development of machine learning techniques related to inherent difficulties of actual data. For more information about the theory behind this dataset, refer to the paper A realistic and public dataset with rare undesirable real events in oil wells published in the Journal of Petroleum Science and Engineering (link here).
The 3W Dataset consists of multiple Parquet files saved in subdirectories of the dataset directory and structured as detailed here.
A general presentation of the 3W Dataset, with some quantities and statistics, is available in this Jupyter Notebook.
The 3W Toolkit is a software package written in Python 3 that contains resources that make the following easier:
- 3W Dataset overview generation;
- Experimentation and comparative analysis of Machine Learning-based approaches and algorithms for specific problems related to undesirable events that occur in offshore oil wells during their respective drilling, completion and production phases;
- Standardization of key points of the Machine Learning-based algorithm development pipeline.
It is important to note that there are arbitrary choices in this toolkit, but they have been carefully made to allow adequate comparative analysis without compromising the ability to experiment with different approaches and algorithms.
The 3W Toolkit is implemented in sub-modules as described here.
Specific problems will be incorporated into this project gradually. At this point, we can work on:
All specification is detailed in the CONTRIBUTING GUIDE.
The list below with examples of how to use the 3W Toolkit will be incremented throughout its development.
- 3W Dataset's overviews:
- Binary classifier of Spurious Closure of DHSV:
For a contribution of yours to be listed here, follow the instructions detailed in the CONTRIBUTING GUIDE.
For all results generated by the 3W Toolkit to be consistent, we recommend that you create and use a virtual environment with the package versions specified in environment.yml, which was generated with conda. Our current recommendation is to use the conda distributed by Miniforge. Download and install Miniforge according to the official instructions. Open a prompt on your operating system (Windows, Linux or macOS). Make sure the current directory is the directory where you have the 3W. Run the following commands as needed:
- To create a virtual environment from our environment.yml:
$ conda env create -f environment.yml
- To activate the created virtual environment:
$ conda activate 3W
- To use the 3W Toolkit resources interactively:
$ python
- To initialize a local Jupyter Notebook server:
$ jupyter notebook
The 3W Community is gradually expanding and is made up of independent professionals and representatives of research institutions, startups, companies and oil operators from different countries.
More information about this community can be found here.
- About
- Development Documentation
- Usage Documentation
- Toolkit UML
- Setup
- Contributing
- License
- Contact
- Acknowledgments
The evolution of machine learning has been catalyzed by the rapid advancement in data acquisition systems, scalable storage, high-performance processing, and increasingly efficient model training through matrix-centric hardware (e.g., GPUs). These advances have enabled the deployment of highly parameterized AI models in real-world applications such as health care, finance, and industrial operations.
In the oil & gas sector, the widespread availability of low-cost sensors has driven a paradigm shift from reactive maintenance to condition-based monitoring (CBM), where faults are detected and classified during ongoing operation. This approach minimizes downtime and improves operational safety. The synergy between AI and big data analysis has thus enabled the development of generalizable classifiers that require minimal domain knowledge and can be effectively adapted to a wide range of operational scenarios.
In this context, we present 3WToolkit+, a modular and open-source AI toolkit for time-series processing, aimed at fault detection and classification in oil well operation. Building upon the experience with the original 3WToolkit system and leveraging the Petrobras 3W Dataset, 3WToolkit+ introduces enhanced functionalities, such as advanced data imputation, deep feature extraction, synthetic data augmentation, and high-performance computing capabilities for model training.
The development of the 3WToolkit+ is the result of a collaborative partnership between Petrobras, with a focus on the CENPES research center, and the COPPE/Universidade Federal do Rio de Janeiro (UFRJ). This joint effort brings together complementary strengths: COPPE/UFRJ contributes decades of proven expertise in signal processing and machine learning model development, while CENPES offers access to highly specialized technical knowledge and real-world operational challenges in the oil and gas sector. This synergy ensures that 3WToolkit+ is both scientifically rigorous and practically relevant, addressing complex scenarios with robust and scalable AI-based solutions for time-series analysis and fault detection in oil well operations.
The image above illustrates the high-level architecture of the 3WToolkit+, designed to support the full pipeline of machine learning applications using the 3W dataset—from raw data ingestion to model evaluation and delivery to end users. Each block in the architecture is briefly described below:
This block represents different available versions of the 3W dataset, which include real and simulated data from offshore oil wells. These datasets serve as the foundation for all subsequent stages of data processing, modeling, and evaluation.
The Data Loader module is responsible for importing, validating, and preparing the raw 3W data for use in model training and evaluation. It handles missing data, standardizes variable formats, and performs initial quality checks to ensure compatibility across toolkit components.
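To make "handles missing data" and "initial quality checks" more concrete, here is a minimal sketch of one possible approach: measuring the fraction of missing samples in a sensor series and forward-filling the gaps. This is not the Data Loader's actual API; the function names `missing_fraction` and `forward_fill`, the choice of forward filling, and the sample readings are all assumptions for illustration.

```python
import math
from typing import List, Optional, Sequence


def missing_fraction(values: Sequence[Optional[float]]) -> float:
    """Return the fraction of samples that are None or NaN."""
    if not values:
        return 0.0
    missing = sum(1 for v in values if v is None or math.isnan(v))
    return missing / len(values)


def forward_fill(values: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Replace each missing sample with the last valid one seen so far.

    Leading gaps stay as None because there is nothing to carry forward.
    """
    filled: List[Optional[float]] = []
    last_valid: Optional[float] = None
    for v in values:
        if v is not None and not math.isnan(v):
            last_valid = v
        filled.append(last_valid)
    return filled


# Hypothetical pressure readings with gaps.
pressure = [None, 10.0, float("nan"), None, 12.5]
print(missing_fraction(pressure))  # 0.6
print(forward_fill(pressure))      # [None, 10.0, 10.0, 10.0, 12.5]
```

A quality check could, for example, reject a variable whose missing fraction exceeds a chosen threshold before any filling is attempted; the threshold itself would be a project-specific decision.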
This central module provides the infrastructure for designing, training, and optimizing machine learning models for fault detection and classification. It supports both classical and deep learning models and includes tools for hyperparameter tuning, cross-validation, and model versioning.
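The cross-validation mentioned above can be sketched with a small index splitter. The function `k_fold_indices` is a hypothetical helper written for this illustration, not a function exported by the toolkit. The indices might plausibly refer to whole instances (complete time series) rather than individual samples, but that too is an assumption.

```python
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, validation_indices) for k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds get one additional sample.
        stop = start + base + (1 if fold < extra else 0)
        validation = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, validation
        start = stop


for train, validation in k_fold_indices(n_samples=7, k=3):
    print(validation)
# [0, 1, 2]
# [3, 4]
# [5, 6]
```

Each index appears in exactly one validation fold, so every item is used for validation once and for training in the remaining folds.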
The Assessment module evaluates model performance using both sample-level and event-level metrics. It includes support for traditional indicators (e.g., accuracy, precision, recall) as well as domain-specific metrics such as detection lag and anticipation time, which are critical for condition-based monitoring.
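The difference between sample-level and event-level metrics can be illustrated with a short sketch. It is not the Assessment module's implementation: the function names are hypothetical, and detection lag is defined here, as an assumption, as the number of samples between the onset of the labeled event and the first alarm raised at or after that onset. Anticipation time is not covered.

```python
from typing import Optional, Sequence, Tuple


def precision_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Sample-level precision and recall for binary labels (1 = fault)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def detection_lag(y_true: Sequence[int], y_pred: Sequence[int]) -> Optional[int]:
    """Samples elapsed between event onset and the first alarm at or after it.

    Returns None if there is no event or the event is never detected.
    """
    if 1 not in y_true:
        return None
    onset = list(y_true).index(1)
    for i in range(onset, len(y_pred)):
        if y_pred[i] == 1:
            return i - onset
    return None


y_true = [0, 0, 1, 1, 1, 1]
y_pred = [0, 1, 0, 0, 1, 1]
print(precision_recall(y_true, y_pred))  # (0.6666666666666666, 0.5)
print(detection_lag(y_true, y_pred))     # 2
```

In this example the early alarm at index 1 counts as a false positive at the sample level and is ignored by the lag, which only looks at alarms from the onset onward.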
A curated set of ready-to-use model configurations and scripts that demonstrate how to apply the toolkit to common fault detection tasks using the 3W dataset. These examples accelerate onboarding and reproducibility.
The 3WToolkit examples can be found here
Step-by-step tutorials and demonstration notebooks that guide users through the toolkit’s functionalities, explaining how each module operates and how to configure different experiments.
The 3WToolkit demos can be found here (TO BE DONE!)
This component provides benchmarking tasks and open challenges using real scenarios derived from the 3W dataset. It promotes collaborative development and comparative evaluation of machine learning solutions in fault diagnosis.
The 3WToolkit challenges can be found here (TO BE DONE!)
Instructional videos that explain toolkit concepts, walk through complete modeling pipelines, and offer insights from domain experts. These videos aim to broaden accessibility and support training initiatives.
The 3WToolkit videos can be found here (TO BE DONE!)
Building upon the high-level block diagram architecture, a detailed UML (Unified Modeling Language) diagram was developed to support the software engineering and implementation of the 3WToolkit+. The UML model formalizes the relationships between components, data structures, and workflows described in the block-level architecture, enabling a structured and maintainable development process.
This transition from conceptual blocks to formal UML design ensures that each module—such as the Data Loader, Model Development, and Assessment—has clearly defined interfaces, class responsibilities, and interaction protocols. It also facilitates modular programming, unit testing, and future extensibility of the toolkit by providing developers with a shared, consistent blueprint for implementation.
The UML diagram serves not only as an internal reference for the development team but also as part of the developer-oriented documentation that accompanies the toolkit. It is shown below.
To ensure a consistent, reproducible, and isolated development environment, this project uses Docker as part of its core development workflow. Docker enables the encapsulation of all dependencies, configurations, and system-level requirements needed to run the application, eliminating the "it works on my machine" problem. By containerizing the development environment, we guarantee that all contributors and automated CI/CD pipelines operate under the same conditions, improving reliability and minimizing unexpected behaviors. Additionally, Docker simplifies environment setup, allowing developers to start contributing quickly without manually installing and configuring complex dependencies. This approach also facilitates testing across multiple versions of Python or system libraries when needed, supporting robust and portable software engineering practices.
All dependencies and system requirements for this project have been fully encapsulated within a Docker image to ensure consistency and reproducibility across environments. As such, it is highly recommended that developers use this Docker image during development. You can either build the image locally or pull it directly from Docker Hub, depending on your preference or workflow.
Docker operates by leveraging containerization, which allows applications and their dependencies to run in isolated user-space environments that share the host system's kernel. Unlike traditional virtual machines, which emulate entire hardware stacks and run full guest operating systems, Docker containers are significantly more lightweight and faster to start. This leads to improved resource efficiency, lower overhead, and greater scalability. In development environments where multiple users are working on the same codebase, Docker provides a critical advantage: it ensures that all contributors run the exact same environment, from system libraries to Python packages, without the need for heavy virtual machines or complex configuration. Containers can be spun up instantly, consume fewer resources, and integrate seamlessly with CI/CD pipelines. Moreover, Docker images can be versioned, shared via registries like Docker Hub, and easily rebuilt, enabling collaborative and reproducible workflows across diverse teams and systems.
To build the Docker image locally, navigate to the root directory of the project and run:
$ docker build --tag=<username>/3w_tk_img:latest .
To pull the image from Docker Hub instead, make sure you are logged in and then execute:
$ docker pull mathtzt/3w_tk_img
After building or pulling the image on your computer, just run:
$ docker run mathtzt/3w_tk_img
- VSCode extension: Dev Containers (ID: `ms-vscode-remote.remote-containers`).
- Open your project root folder (`3WToolkit/`) in VSCode.
- Press `F1` or `Ctrl+Shift+P` and select `Dev Containers: Open Folder in Container`.
- VSCode will build the image and open your project inside the container.
- Working inside the container:
  - Once the container is running, you can use the VSCode terminal, which now runs inside the container.

Note: Libraries installed using pip will stay isolated from your host system.
This project uses Poetry as its dependency and packaging manager to ensure a consistent, reliable, and modern Python development workflow. Poetry simplifies the management of project dependencies by providing a single `pyproject.toml` file to declare packages, development tools, and metadata, while automatically resolving compatible versions. Unlike traditional `requirements.txt` workflows, Poetry creates an isolated and deterministic environment using a lock file (`poetry.lock`), ensuring that all contributors and deployment environments use exactly the same package versions. It also streamlines publishing to PyPI, virtual environment creation, and script execution, making it a comprehensive tool for managing the entire lifecycle of a Python project. By adopting Poetry, we reduce the risk of dependency conflicts and improve the reproducibility and maintainability of the codebase.
It is possible to perform the installation in two different ways.
- ThreeWToolkit is on PyPI, so you can use pip to install it:
$ pip install ThreeWToolkit
- Installing directly from the git repository (private):
$ pip install git+https://github.com/Mathtzt/3WToolkit.git
Note: Authentication is required.
Thank you for your interest in contributing to this project! We welcome contributions that help improve and expand the functionality of this repository. To ensure a smooth collaboration process, please follow the guidelines below.
Start by forking this repository to your own GitHub account.
Create a new branch from main for your feature or fix:
$ git checkout -b feature/my-new-feature
Ensure your code is readable, modular, and follows PEP 8 standards.
Every new feature or functionality must be accompanied by unit tests relevant to the code you are contributing. Tests should be placed under the tests/ directory and must cover both typical and edge cases.
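As a sketch of what "typical and edge cases" might look like in practice, the example below pairs a small function with three tests. The function `min_max_scale` and the suggested file name are hypothetical and exist only for this illustration; they are not part of the toolkit.

```python
from typing import List, Sequence


def min_max_scale(values: Sequence[float]) -> List[float]:
    """Hypothetical function under test: scale values to the [0, 1] range."""
    if not values:
        return []
    low, high = min(values), max(values)
    if high == low:
        # A constant series has no spread; map every sample to 0.0.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


# Tests like these would live in a file such as tests/test_scaling.py
# (the file name is an assumption); pytest collects functions named test_*.
def test_typical_case():
    assert min_max_scale([2.0, 4.0, 6.0]) == [0.0, 0.5, 1.0]


def test_empty_input():
    assert min_max_scale([]) == []


def test_constant_series():
    assert min_max_scale([3.0, 3.0]) == [0.0, 0.0]
```

The empty and constant inputs are the edge cases here: without the explicit checks, the first would fail in `min()` and the second would divide by zero.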
Before submitting a pull request:
- Run all existing and new tests, and ensure they pass with no errors.
- Use `coverage` to check test coverage, ensuring that the new functionality is properly covered.
To run tests and check coverage:
$ pytest --cov=your_package_name
💡 Replace `your_package_name` with the appropriate module or package path.
Along with your code, you must include a Python Jupyter Notebook that clearly demonstrates how to use the new functionality. The notebook should:
- Be placed under the `docks/notebooks` folder.
- Provide a step-by-step explanation.
- Include code cells, outputs, and descriptive markdowns for clarity.
Open a pull request to the main branch with a clear title and detailed description of what your contribution does. Link any relevant issues if applicable.
- Code is PEP 8 compliant
- Unit tests are included and passing
- All existing tests pass without errors
- Test coverage checked using `coverage`
- Usage notebook is provided with step-by-step explanation
- Changes are well-documented
- Pull request includes a meaningful description