Update README

649d36b0 · George Marchment · a3bd4245
Commit 649d36b0 authored 1 year ago by George Marchment
--- a/README.md
+++ b/README.md
@@ -17,12 +17,24 @@ This repository contains the software **BioFlow-Insight** written in Python. **B
  - [Description](#description)
  - [Table of Contents](#table-of-contents)
  - [Installation](#installation)
+    - [Using from source](#using-from-source)
+    - [Using the Python package](#using-the-python-package)
  - [Usage](#usage)
-  - [Contributing](#contributing)
+    - [Input](#input)
+    - [Output](#output)
  - [License](#license)

 ## Installation

+### Using from source
+
+The required Python packages are listed in the `requirements.txt` file.
+
+> Note : To install Graphviz on Linux, you might need to run `sudo apt install graphviz`.
+
+
+### Using the Python package
+
 **BioFlow-Insight** is easily installable as a [Python package]()<!--TODO : Add LINK-->.

 To install it using *pip*, use the following command :
@@ -45,13 +57,92 @@ The 3 different graphs generated by **BioFlow-Insight** are :
 2. The second graph represents operations without any inputs, along with processes and their dependencies. This graph, called the *dependency graph without branch operations*, is obtained by removing the branch operations and linking the remaining elements if a path exists between them in the original specification graph.
 3. The final graph, called the *process dependency graph*, represents only processes and their dependencies. Similar to the latter, this graph is constructed by removing all operations, leaving only processes, and linking them based on their dependencies in the original specification graph.

-To examplify **BioFlow-Insight** utilisation, let's use the rnaseq-nf workflow proposed by Nextflow (its source code can be found [here](https://github.com/nextflow-io/rnaseq-nf/tree/8253a586cc5a9679d37544ac54f72167cced324b)).
+> To illustrate how **BioFlow-Insight** is used, let's use the rnaseq-nf workflow provided by Nextflow (its source code can be found [here](https://github.com/nextflow-io/rnaseq-nf/tree/8253a586cc5a9679d37544ac54f72167cced324b)).
+
+### Input 
+
+In this example, we are going to use the **BioFlow-Insight** source code. After cloning both repositories (this one and the rnaseq-nf workflow), we can run the following script to perform the analysis (the different steps are described below) :
+
+```python
+import os
+current_path = os.getcwd()
+os.chdir("bioflow-insight/")
+from src.workflow import Workflow
+os.chdir(current_path)
+
+w = Workflow("./rnaseq-nf/main.nf", duplicate=False, display_info=True)
+w.initialise()
+w.generate_all_graphs(render_graphs=True, processes_2_remove=[])
+```

-In this example, we are going to use the **BioFlow-Insight** source code. After cloning both repositories (this one and the rnaseq-nf workflow), the working directory should look like this :
+1. lines 1 to 5 : import the `Workflow` class used to run the analysis
+2. line 6 : create the object `w`, an instance of `Workflow`
+   1. line 6 : the first parameter is the path to the main Nextflow file (required parameter).
+   2. line 6 : parameter `duplicate` (by default `False`) : when some processes and subworkflows are duplicated in the workflow through the `include as` option, enabling this parameter duplicates those elements in the graphs.
+   3. line 6 : parameter `display_info` (by default `True`) : shows the files which are being analysed.
+3. line 7 : `initialise` runs the entire analysis of the Nextflow workflow
+4. line 8 : `generate_all_graphs` generates all the graphs in the mermaid and dot formats, along with the associated metadata for the graphs
+   1. line 8 : parameter `render_graphs` (by default `True`) : if `True`, the png images of the dot graphs are rendered with Graphviz. For large workflows, rendering can sometimes fail (depending on the hardware).
+   2. line 8 : parameter `processes_2_remove` (by default `[]`) : a list of processes to remove from the graphs. This is useful in the case of `MULTIQC` processes (they don't really serve a functional role but can clutter the structure since they are connected to the majority of processes).

-## Contributing
+### Output
+
+After the workflow has been analysed and the graphs generated, the outputs are saved in the `results` folder.
+
+This folder is organised as follows :
+
+```
+.
+├── debug
+│   ├── calls.nf
+│   ├── operations_in_call.nf
+│   └── operations.nf
+├── graphs
+│   ├── dependency_graph_wo_branch_operations.dot
+│   ├── dependency_graph_wo_branch_operations.json
+│   ├── dependency_graph_wo_branch_operations.mmd
+│   ├── dependency_graph_wo_branch_operations.png
+│   ├── dependency_graph_wo_branch_operations_wo_lables.dot
+│   ├── dependency_graph_wo_branch_operations_wo_lables.mmd
+│   ├── dependency_graph_wo_branch_operations_wo_lables.png
+│   ├── dependency_graph_wo_branch_operations_wo_orphan_operations.dot
+│   ├── dependency_graph_wo_branch_operations_wo_orphan_operations.mmd
+│   ├── dependency_graph_wo_branch_operations_wo_orphan_operations.png
+│   ├── dependency_graph_wo_branch_operations_wo_orphan_operations_wo_lables.dot
+│   ├── dependency_graph_wo_branch_operations_wo_orphan_operations_wo_lables.mmd
+│   ├── dependency_graph_wo_branch_operations_wo_orphan_operations_wo_lables.png
+│   ├── metadata_dependency_graph_wo_branch_operations.json
+│   ├── metadata_process_dependency_graph.json
+│   ├── metadata_specification_graph.json
+│   ├── process_dependency_graph.dot
+│   ├── process_dependency_graph.json
+│   ├── process_dependency_graph.mmd
+│   ├── process_dependency_graph.png
+│   ├── specification_graph.dot
+│   ├── specification_graph.json
+│   ├── specification_graph.mmd
+│   ├── specification_graph.png
+│   ├── specification_graph_wo_labels.dot
+│   ├── specification_graph_wo_labels.mmd
+│   ├── specification_graph_wo_labels.png
+│   ├── specification_wo_orphan_operations.dot
+│   ├── specification_wo_orphan_operations.mmd
+│   ├── specification_wo_orphan_operations.png
+│   ├── specification_wo_orphan_operations_wo_labels.dot
+│   ├── specification_wo_orphan_operations_wo_labels.mmd
+│   └── specification_wo_orphan_operations_wo_labels.png
+└── ro-crate-metadata-rnaseq-nf.json
+```
+
+* The `ro-crate-metadata-rnaseq-nf.json` file describes the workflow following an extended Workflow [RO-Crate](https://www.researchobject.org/ro-crate/) profile which can be found [here]() (TODO).
+* The `debug` folder contains intermediary files which are useful for debugging.
+* The `graphs` folder contains the generated graphs. For each of the 3 graphs described above, **BioFlow-Insight** generates :
+  * A `json` file which describes the graph using a **BioFlow-Insight** specific format
+  * A `json` file which describes the metadata extracted from the graph
+  * Where possible, **BioFlow-Insight** also generates the graphs without labels on the operations and channels. Additionally, there is a variant where the orphan operations (operations which don't have any inputs or outputs) are not represented.
+
+> Each graph is generated in both the `mermaid` and `dot` formats. If the `render_graphs` option is set to `True`, the `png` image is also generated.

-Guidelines for contributing to the project, including how to report issues and submit pull requests.

 ## License

@@ -68,4 +159,4 @@ ___

 <br/><br/>
 <br/><br/>
-<br/><br/>
+
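Note on the README's Output section: a quick way to check which formats were actually produced for each graph is to group the files of the `graphs` folder by name. The sketch below assumes the layout shown in the tree above (a `results/graphs` directory relative to the working directory). The helper `formats_per_graph` is a hypothetical name used for illustration and is not part of **BioFlow-Insight**.

```python
from collections import defaultdict
from pathlib import Path


def formats_per_graph(graphs_dir):
    """Map each graph name (file stem) to the sorted list of extensions found for it."""
    formats = defaultdict(list)
    for path in sorted(Path(graphs_dir).iterdir()):
        if path.is_file():
            # "specification_graph.dot" -> stem "specification_graph", extension "dot"
            formats[path.stem].append(path.suffix.lstrip("."))
    return {name: sorted(exts) for name, exts in formats.items()}


if __name__ == "__main__":
    # Assumed location of the output folder; adjust to where the analysis was run.
    graphs_dir = Path("results") / "graphs"
    if graphs_dir.is_dir():
        for name, exts in formats_per_graph(graphs_dir).items():
            print(f"{name}: {', '.join(exts)}")
```

If `render_graphs` was set to `False`, or rendering failed for a large workflow, the `png` entry would simply be missing from the list of a given graph.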
--- a/src/ro_crate.py
+++ b/src/ro_crate.py
@@ -76,8 +76,8 @@ class RO_Crate:
        info = ""
        current_directory = os.getcwd()
        os.chdir("/".join(self.workflow.nextflow_file.get_file_address().split("/")[:-1]))
-        try:
-            os.system(f"git log {'--reverse'*reverse} {file} > temp_{id(self)}.txt")
+        try:           
+            os.system(f"git log {'--reverse'*reverse} {file} > temp_{id(self)}.txt 2>/dev/null")
            with open(f'temp_{id(self)}.txt') as f:
                info = f.read()
            os.system(f"rm temp_{id(self)}.txt")
@@ -116,7 +116,7 @@ class RO_Crate:
        info = self.fill_log_file(file, reverse = True)
        for match in re.finditer(r"Author: ([^>]+)<([^>]+)>",info):
            return [{"@id": match.group(1).strip()}]
-        return None
+        return []


    def get_types(self, file):
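
Note on the `return []` change: returning an empty list instead of `None` lets callers iterate over the result without a prior check. The sketch below isolates that behaviour using the same regular expression as the hunk above. The function name `first_author` and the sample log text are hypothetical and only serve the illustration.

```python
import re

# Same pattern as in the hunk above: "Author: <name> <<email>>" as printed by `git log`.
AUTHOR_PATTERN = re.compile(r"Author: ([^>]+)<([^>]+)>")


def first_author(log_text):
    """Return the first author found in a git log as a one-element list, or [] if none."""
    match = AUTHOR_PATTERN.search(log_text)
    if match is None:
        return []
    return [{"@id": match.group(1).strip()}]


# Hypothetical log excerpt, for illustration only.
sample_log = "commit abc123\nAuthor: Jane Doe <jane@example.org>\nDate:   Mon Jan 1 2024\n"

for author in first_author(sample_log):
    print(author["@id"])

# With an empty log (for example when the folder is not a git repository),
# the loop body is simply never executed instead of raising a TypeError.
for author in first_author(""):
    print(author["@id"])
```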

--- a/src/workflow.py
+++ b/src/workflow.py
@@ -37,7 +37,7 @@ class Workflow:
        current_directory = os.getcwd()
        os.chdir("/".join(self.nextflow_file.get_file_address().split("/")[:-1]))
        try:
-            os.system(f"git log --reverse > temp_{id(self)}.txt")
+            os.system(f"git log --reverse > temp_{id(self)}.txt 2>/dev/null")
            with open(f'temp_{id(self)}.txt') as f:
                self.log = f.read()
            os.system(f"rm temp_{id(self)}.txt")
@@ -52,7 +52,7 @@ class Workflow:
        current_directory = os.getcwd()
        os.chdir("/".join(self.nextflow_file.get_file_address().split("/")[:-1]))
        try:
-            os.system(f"git ls-remote --get-url origin > temp_address_{id(self)}.txt")
+            os.system(f"git ls-remote --get-url origin > temp_address_{id(self)}.txt 2>/dev/null")
            with open(f'temp_address_{id(self)}.txt') as f:
                self.address = f.read()
            os.system(f"rm temp_address_{id(self)}.txt")
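
Note on the `os.system` calls in `src/ro_crate.py` and `src/workflow.py`: they capture the output of git through a temporary file and shell redirection, and the order of redirections is easy to get wrong. One possible alternative, shown here only as a sketch and not as the project's implementation, is to let `subprocess.run` capture standard output directly and discard standard error. The helper name `run_and_capture` is hypothetical. Its `cwd` argument would also remove the need for the `os.chdir` calls around the command.

```python
import subprocess


def run_and_capture(command, cwd=None):
    """Run `command` (a list of arguments) and return its standard output as text.

    Standard error is discarded. An empty string is returned when the command
    cannot be started or exits with a non-zero status.
    """
    try:
        result = subprocess.run(
            command,
            cwd=cwd,                    # run in this directory, no os.chdir needed
            stdout=subprocess.PIPE,     # capture standard output
            stderr=subprocess.DEVNULL,  # silence error messages
            text=True,                  # decode the output to str
            check=False,
        )
    except OSError:
        # For example, the executable is not installed.
        return ""
    return result.stdout if result.returncode == 0 else ""
```

Under these assumptions, the two calls in `src/workflow.py` could be expressed as `run_and_capture(["git", "log", "--reverse"], cwd=workflow_dir)` and `run_and_capture(["git", "ls-remote", "--get-url", "origin"], cwd=workflow_dir)`, where `workflow_dir` stands for the folder containing the main Nextflow file.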