diff --git a/README.md b/README.md
index d345c6af2d42c9e27e908bec191a5c53d3a273a8..1e8ea4dc80780bb7d1352495c0108bacca6bccfa 100644
--- a/README.md
+++ b/README.md
@@ -17,12 +17,24 @@ This repository contains the software **BioFlow-Insight** written in Python. **B
 - [Description](#description)
 - [Table of Contents](#table-of-contents)
 - [Installation](#installation)
+  - [Using from source](#using-from-source)
+  - [Using the Python package](#using-the-python-package)
 - [Usage](#usage)
-  - [Contributing](#contributing)
+  - [Input](#input)
+  - [Output](#output)
 - [License](#license)
 
 ## Installation
 
+### Using from source
+
+The Python packages required are listed in the `requirements.txt` file.
+
+> Note: on Linux, installing Graphviz may require running `sudo apt install graphviz`.
+
+
+### Using the Python package
+
 **BioFlow-Insight** is easily installable as a [Python package]()<!--TODO : Add LINK-->.
 
 To install it using *pip*, use the following command :
@@ -45,13 +57,92 @@ The 3 different graphs generated by **BioFlow-Insight** are :
 2. The second graph represents operations without any inputs, along with processes and their dependencies. This graph, called the *dependency graph without branch operations*, is obtained by removing the branch operations and linking the remaining elements if a path exists between them in the original specification graph.
 3. The final graph, called the *process dependency graph*, represents only processes and their dependencies. Similar to the latter, this graph is constructed by removing all operations, leaving only processes, and linking them based on their dependencies in the original specification graph.
 
-To examplify **BioFlow-Insight** utilisation, let's use the rnaseq-nf workflow proposed by Nextflow (its source code can be found [here](https://github.com/nextflow-io/rnaseq-nf/tree/8253a586cc5a9679d37544ac54f72167cced324b)).
+> To illustrate how **BioFlow-Insight** is used, let's take the rnaseq-nf workflow provided by Nextflow (its source code can be found [here](https://github.com/nextflow-io/rnaseq-nf/tree/8253a586cc5a9679d37544ac54f72167cced324b)).
+
+### Input
+
+In this example, we use the **BioFlow-Insight** source code. After cloning both repositories (this one and the rnaseq-nf workflow), run the following script to perform the analysis (each step is described below):
+
+```python
+import os
+current_path = os.getcwd()
+os.chdir("bioflow-insight/")
+from src.workflow import Workflow
+os.chdir(current_path)
+
+w = Workflow("./rnaseq-nf/main.nf", duplicate=False, display_info=True)
+w.initialise()
+w.generate_all_graphs(render_graphs=True, processes_2_remove=[])
+```
 
-In this example, we are going to use the **BioFlow-Insight** source code. After cloning both repositories (this one and the rnaseq-nf workflow), the working directory should look like this :
+1. lines 1 to 5: import the `Workflow` class used to run the analysis.
+2. line 6: create the `Workflow` object `w`.
+    1. line 6: the first parameter is the path to the main Nextflow file (required parameter).
+    2. line 6: parameter `duplicate` (default `False`): if some processes or subworkflows are duplicated in the workflow through the `include as` option, enabling this parameter duplicates these elements in the graphs.
+    3. line 6: parameter `display_info` (default `True`): displays the files being analysed.
+3. line 7: `initialise` runs the entire analysis of the Nextflow workflow.
+4. line 8: `generate_all_graphs` generates all the graphs in the Mermaid and DOT formats, along with their associated metadata.
+    1. line 8: parameter `render_graphs` (default `True`): if `True`, the png images of the dot graphs are rendered with Graphviz. For large workflows this can sometimes fail (depending on the hardware).
+    2. line 8: parameter `processes_2_remove` (default `[]`): a list of processes to remove from the graphs. This is useful for `MULTIQC` processes (they do not really serve a functional role but can clutter the structure, since they are connected to the majority of processes).
 
-## Contributing
+### Output
+
+After the workflow has been analysed and the graphs generated, the outputs are saved in the `results` folder.
+
+This folder is organised as follows:
+
+```
+.
+├── debug
+│   ├── calls.nf
+│   ├── operations_in_call.nf
+│   └── operations.nf
+├── graphs
+│   ├── dependency_graph_wo_branch_operations.dot
+│   ├── dependency_graph_wo_branch_operations.json
+│   ├── dependency_graph_wo_branch_operations.mmd
+│   ├── dependency_graph_wo_branch_operations.png
+│   ├── dependency_graph_wo_branch_operations_wo_lables.dot
+│   ├── dependency_graph_wo_branch_operations_wo_lables.mmd
+│   ├── dependency_graph_wo_branch_operations_wo_lables.png
+│   ├── dependency_graph_wo_branch_operations_wo_orphan_operations.dot
+│   ├── dependency_graph_wo_branch_operations_wo_orphan_operations.mmd
+│   ├── dependency_graph_wo_branch_operations_wo_orphan_operations.png
+│   ├── dependency_graph_wo_branch_operations_wo_orphan_operations_wo_lables.dot
+│   ├── dependency_graph_wo_branch_operations_wo_orphan_operations_wo_lables.mmd
+│   ├── dependency_graph_wo_branch_operations_wo_orphan_operations_wo_lables.png
+│   ├── metadata_dependency_graph_wo_branch_operations.json
+│   ├── metadata_process_dependency_graph.json
+│   ├── metadata_specification_graph.json
+│   ├── process_dependency_graph.dot
+│   ├── process_dependency_graph.json
+│   ├── process_dependency_graph.mmd
+│   ├── process_dependency_graph.png
+│   ├── specification_graph.dot
+│   ├── specification_graph.json
+│   ├── specification_graph.mmd
+│   ├── specification_graph.png
+│   ├── specification_graph_wo_labels.dot
+│   ├── specification_graph_wo_labels.mmd
+│   ├── specification_graph_wo_labels.png
+│   ├── specification_wo_orphan_operations.dot
+│   ├── specification_wo_orphan_operations.mmd
+│   ├── specification_wo_orphan_operations.png
+│   ├── specification_wo_orphan_operations_wo_labels.dot
+│   ├── specification_wo_orphan_operations_wo_labels.mmd
+│   └── specification_wo_orphan_operations_wo_labels.png
+└── ro-crate-metadata-rnaseq-nf.json
+```
+
+* The `ro-crate-metadata-rnaseq-nf.json` file describes the workflow following an extended Workflow [RO-Crate](https://www.researchobject.org/ro-crate/) profile which can be found [here]() (TODO).
+* The `debug` folder contains intermediate files that are useful for debugging.
+* The `graphs` folder contains the generated graphs. For each of the 3 graphs described above, **BioFlow-Insight** generates:
+    * A `json` file which describes the graph using a **BioFlow-Insight**-specific format.
+    * A `json` file which describes the metadata extracted from the graph.
+    * Where possible, **BioFlow-Insight** also generates the graphs without labels on the operations and channels. Additionally, there is a variant in which the orphan operations (operations which don't have any inputs or outputs) are not represented.
+
+> **BioFlow-Insight** generates each graph in the `mermaid` and `dot` formats. If the `render_graphs` option is set to `True`, the `png` image is also generated.
 
-Guidelines for contributing to the project, including how to report issues and submit pull requests.
 
 ## License
 
@@ -68,4 +159,4 @@ ___
 
 <br/><br/>
 <br/><br/>
-<br/><br/>
+
diff --git a/src/ro_crate.py b/src/ro_crate.py
index 0e4b85ab2e0f2f4979e8baff2bbe18072f2bfbde..3b180dd288b999320b9e9240785a18497b72e93d 100644
--- a/src/ro_crate.py
+++ b/src/ro_crate.py
@@ -76,8 +76,8 @@ class RO_Crate:
         info = ""
         current_directory = os.getcwd()
         os.chdir("/".join(self.workflow.nextflow_file.get_file_address().split("/")[:-1]))
-        try:
-            os.system(f"git log {'--reverse'*reverse} {file} > temp_{id(self)}.txt")
+        try:
+            os.system(f"git log {'--reverse'*reverse} {file} > temp_{id(self)}.txt 2>/dev/null")
             with open(f'temp_{id(self)}.txt') as f:
                 info = f.read()
             os.system(f"rm temp_{id(self)}.txt")
@@ -116,7 +116,7 @@ class RO_Crate:
         info = self.fill_log_file(file, reverse = True)
         for match in re.finditer(r"Author: ([^>]+)<([^>]+)>",info):
             return [{"@id": match.group(1).strip()}]
-        return None
+        return []
 
 
     def get_types(self, file):
diff --git a/src/workflow.py b/src/workflow.py
index b29ea75bb39edab8f5cd7c0265fc5a97df092b6e..cd29a6a812b733f6cf37b0465420310016a4a042 100644
--- a/src/workflow.py
+++ b/src/workflow.py
@@ -37,7 +37,7 @@ class Workflow:
         current_directory = os.getcwd()
         os.chdir("/".join(self.nextflow_file.get_file_address().split("/")[:-1]))
         try:
-            os.system(f"git log --reverse > temp_{id(self)}.txt")
+            os.system(f"git log --reverse > temp_{id(self)}.txt 2>/dev/null")
             with open(f'temp_{id(self)}.txt') as f:
                 self.log = f.read()
             os.system(f"rm temp_{id(self)}.txt")
@@ -52,7 +52,7 @@ class Workflow:
         current_directory = os.getcwd()
         os.chdir("/".join(self.nextflow_file.get_file_address().split("/")[:-1]))
         try:
-            os.system(f"git ls-remote --get-url origin > temp_address_{id(self)}.txt")
+            os.system(f"git ls-remote --get-url origin > temp_address_{id(self)}.txt 2>/dev/null")
             with open(f'temp_address_{id(self)}.txt') as f:
                 self.address = f.read()
             os.system(f"rm temp_address_{id(self)}.txt")
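
Note on the `os.system` calls touched above: they write git's output to a temporary file, read it back, then delete it. A minimal sketch of an alternative follows, assuming the intent is to read git's standard output while discarding its error messages. The helper name `run_git_quietly` and its `cwd` parameter are hypothetical and not part of the repository.

```python
import subprocess


def run_git_quietly(args, cwd="."):
    """Return git's standard output as text, or "" on any failure.

    Hypothetical helper for illustration only; not part of BioFlow-Insight.
    """
    try:
        result = subprocess.run(
            ["git", *args],
            cwd=cwd,                    # run in this directory, no os.chdir needed
            stdout=subprocess.PIPE,     # capture standard output
            stderr=subprocess.DEVNULL,  # discard error messages
            text=True,                  # decode bytes to str
            check=False,                # do not raise on a non-zero exit code
        )
    except OSError:
        # git is not installed, or cwd is missing or not a directory
        return ""
    return result.stdout if result.returncode == 0 else ""
```

With such a helper, a call like `run_git_quietly(["log", "--reverse"], cwd=workflow_dir)` (where `workflow_dir` is a placeholder for the folder containing the main Nextflow file) would replace the redirect, read, and `rm` sequence, and would not depend on the shell's redirection rules or on changing the process working directory.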