> This branch corresponds to the results of the crawler for the study linked with [BioFlow-Insight](https://gitlab.liris.cnrs.fr/sharefair/bioflow-insight).
The crawler gathered 677 open-license Nextflow workflows. A static version of this corpus is available on Zenodo: [https://zenodo.org/records/10817606](https://zenodo.org/records/10817606).
## Description
...
@@ -22,6 +29,7 @@ While the crawler is running, the data is saved in a JSON file.
## Table of Contents
- [Github-Crawler](#github-crawler)
- [Results of the crawler](#results-of-the-crawler)
- [Description](#description)
- [Table of Contents](#table-of-contents)
- [Installation](#installation)
...
@@ -33,7 +41,9 @@ The python function dependancies are described in the `requirements.txt` file.
## License
This project is licensed under the [MIT License](https://opensource.org/licenses/MIT)
This project is licensed under the [GNU Affero General Public License](https://www.gnu.org/licenses/agpl-3.0.en.html).
"Hence, at least 93.1% of Nextflow workflows found on Github are not integrated into WorkflowHub\n"
"Hence, at most 6.9% of Nextflow workflows found on Github are integrated into WorkflowHub\n"
]
}
],
"source": [
"nb_wfhub = 52\n",
"print(f\"Hence, at least {(len(dict)-nb_wfhub)/len(dict)*100:.1f}% of Nextflow workflows found on Github are not integrated into WorkflowHub\")"
"print(f\"Hence, at most {(nb_wfhub)/len(dict)*100:.1f}% of Nextflow workflows found on Github are integrated into WorkflowHub\")"
]
},
{
...
%% Cell type:markdown id: tags:
# Analysis of results of crawler
%% Cell type:code id: tags:
``` python
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

sns.set(style='darkgrid', palette="Accent")
taille = (9, 5)
```
%% Output
/usr/lib/python3/dist-packages/scipy/__init__.py:146: UserWarning: A NumPy version >=1.17.3 and <1.25.0 is required for this version of SciPy (detected version 1.26.1
warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
%% Cell type:code id: tags:
``` python
import json
import pandas as pd

# Load the crawler results; "last_date" is bookkeeping, not a workflow entry
with open('wf_crawl_nextflow.json') as json_file:
    dict = json.load(json_file)
_ = dict.pop("last_date")
```
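The loading cell above binds the parsed JSON to the name `dict`, which shadows Python's built-in type for the rest of the notebook, and `pop` mutates the loaded object. A minimal sketch of an alternative, assuming the file holds one `last_date` bookkeeping key plus one entry per workflow. The sample keys and fields (`owner_a/workflow_one`, `stars`, `license`) and the helper `split_crawl` are hypothetical and not taken from the repository:

```python
import json

# Hypothetical sample mirroring the ASSUMED layout of wf_crawl_nextflow.json:
# one "last_date" key plus one entry per crawled repository.
raw = json.loads("""
{
  "last_date": "2024-01-01",
  "owner_a/workflow_one": {"stars": 12, "license": "MIT"},
  "owner_b/workflow_two": {"stars": 3, "license": "GPL-3.0"}
}
""")


def split_crawl(data):
    """Return (workflows, last_date) without mutating the input mapping."""
    workflows = {key: value for key, value in data.items() if key != "last_date"}
    return workflows, data.get("last_date")


workflows, last_date = split_crawl(raw)
print(f"{len(workflows)} workflows, last crawl date: {last_date}")
```

Keeping the built-in `dict` name free avoids surprises in later cells that call `dict(...)`, and leaving the loaded object untouched makes the cell safe to re-run.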
%% Cell type:code id: tags:
``` python
print(f"The crawler found {len(dict)} Nextflow workflows with at least one Nextflow file at the root.")
```
%% Output
The crawler found 752 Nextflow workflows with at least one Nextflow file at the root.
%% Cell type:markdown id: tags:
At the time of writing, there are 52 Nextflow workflows integrated into WorkflowHub.
%% Cell type:code id: tags:
``` python
nb_wfhub = 52
print(f"Hence, at least {(len(dict) - nb_wfhub) / len(dict) * 100:.1f}% of Nextflow workflows found on Github are not integrated into WorkflowHub")
print(f"Hence, at most {nb_wfhub / len(dict) * 100:.1f}% of Nextflow workflows found on Github are integrated into WorkflowHub")
```
%% Output
Hence, at least 93.1% of Nextflow workflows found on Github are not integrated into WorkflowHub
Hence, at most 6.9% of Nextflow workflows found on Github are integrated into WorkflowHub
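The two phrasings are complements of the same ratio: with 752 crawled workflows and 52 WorkflowHub entries, 52/752 gives the integrated share and the remainder gives the non-integrated share. One reading of "at most" and "at least" (an assumption, not stated in the notebook) is that the 52 WorkflowHub entries need not all be among the crawled repositories, so 52 is an upper bound on the overlap. A small sketch of the arithmetic, where `integration_shares` is a hypothetical helper name:

```python
def integration_shares(n_found, n_integrated):
    """Return (integrated %, not integrated %) for the given counts."""
    integrated = n_integrated / n_found * 100
    return integrated, 100 - integrated


# Counts taken from the notebook outputs: 752 crawled, 52 on WorkflowHub.
integrated, not_integrated = integration_shares(752, 52)
print(f"at most {integrated:.1f}% integrated, at least {not_integrated:.1f}% not integrated")
# prints: at most 6.9% integrated, at least 93.1% not integrated
```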