Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
18 changes: 17 additions & 1 deletion README.rst
Original file line number Diff line number Diff line change
@@ -1,4 +1,20 @@
Job Output Folder Data Type Plugin
==================================

This is the data type plugin for Qiita jobs that only create a folder and the command is responsible of generating all the files.
The goal of this Qiita type plugin is to validate and summarize any kind of folder output.

Note that `job-output-folder` expects a single folder and this will become an artifact that will live in
`[qiita-base-path]/job-output-folder/[artifact-id]/[output-folder]` and this plugin will generate:

- `summary.html`: a browser friendly file listing that will include all files at `[artifact-id]/[output-folder]` and
any `index.html` files in any subfolder. As a reminder, the Qiita nginx basic configuration allows to display/load any
html/JS available files; thus, able to display properly `index.html` files available
- `MANIFEST.txt`: a comprehensive list of all available files in the folder.
Comment on lines +9 to +12

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from the wording, it is not 100% clear to me if

  • "will include all files at [artifact-id]/[output-folder]" includes recursion through all sub-directories
  • "Qiita nginx basic configuration allows to display/load any html/JS available files" which lines in the example configuration refers to this? Can you point to those?
  • which class of user can make use of the MANIFEST.txt. Can he/she download the whole directory as a ZIP archive?


The two main plugins using this output are:

- https://github.com/qiita-spots/qp-knight-lab-processing: which will generate an `[output-folder]` contaning all the logs,
files and summaries from BCL to clean FASTQ processing. Note that multiqc resoults are part of this and the outputs are

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add which user class can run this command? I figure it is only admins but no normal users, right?

properly displayed in Qiita using this method.
- https://github.com/qiita-spots/qp-pacbio: `PacBio processing`, the output are MAG, LCG and other output, which will be used
for dowstream analyses.
57 changes: 40 additions & 17 deletions qtp_job_output_folder/summary.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,37 +6,60 @@
# The full license is in the file LICENSE, distributed with this software.
# -----------------------------------------------------------------------------

from glob import glob
from json import dumps
from os.path import dirname, exists, isdir, join
from os import sep, walk
from os.path import basename, dirname, exists, isdir, join


def _folder_listing(folder):
results = []
for f in glob(f"{folder}/*"):
if isdir(f):
results.append(("folder", f))
results.extend(_folder_listing(f"{f}/*"))
else:
results.append(("file", f))
return results
index, manifest = [], []
# only adding main files on top directory
# and index.html at any level
separator = "|--"
for dpath, _, files in walk(folder):
# assuring same order, mainly for testing
files.sort()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this inplace sorting?


# if we are at the top, we should add
# all files
if dpath == folder:
for f in files:
index.append(("file", f"{dpath}/{f}"))
# if we are not at the top, we should only add
# the index.html files
elif "index.html" in files:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have to take care for different case spelling, e.g. a file with name Index.html?

index.append(("file", f"{dpath}/index.html"))

depth = dpath.replace(folder, "").count(sep)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is folder relative or absolute? What if the path contains the folder infix twice like job-output/4/job-output/4/subdir?

space = separator * depth
manifest.append(f"{space} {basename(dpath)}/")
for filename in files:
manifest.append(f"{space}{separator} {filename}")

return index, manifest


def _generate_html_summary(jid, folder, out_dir):
summary = f"<h3><b>{folder}</b> does not exist.</h3>"
manifest_fp = join(folder, "MANIFEST.txt")
index_fp = join(out_dir, "summary.html")

if exists(folder) and isdir(folder):
# calculating the "trimming" for the fullpaths, +1 is to remove /
tname = len(dirname(folder)) + 1
tlink = len(dirname(dirname(folder)))
summary = "<br/>\n".join(
[
f'<a href=".{f[tlink:]}" type="{ft}" target="_blank">{f[tname:]}</a>'
for ft, f in _folder_listing(folder)
]
)
link = '<a href=".%s" type="%s" target="_blank">%s</a>'
index, manifest = _folder_listing(folder)

with open(manifest_fp, "w") as of:
of.write("\n".join(manifest))

links = [link % (manifest_fp[tlink:], "file", manifest_fp[tname:])]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would https://www.geeksforgeeks.org/python/python-os-path-relpath-method/ be a safer solution? Instead of performing string operations here?

for ft, f in index:
links.append(link % (f[tlink:], ft, f[tname:]))

summary = "<br/>\n".join(links)

index_fp = join(out_dir, "summary.html")
with open(index_fp, "w") as of:
of.write(summary)

Expand Down
Empty file.
Empty file.
54 changes: 32 additions & 22 deletions qtp_job_output_folder/tests/test_summary.py
Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,7 @@
class SummaryTests(PluginTestCase):
def setUp(self):
self.out_dir = mkdtemp()
self.source_dir = join(mkdtemp(), "test_data")
self.source_dir = join(mkdtemp(), "result")
source = join(dirname(abspath(getfile(currentframe()))), "test_data")
copytree(source, self.source_dir)
self._clean_up_files = [self.out_dir]
Expand Down Expand Up @@ -77,31 +77,41 @@ def test_summary(self):
with open(html_fp) as html_f:
html = html_f.read()

self.assertCountEqual(
sorted(html.replace("<br/>", "").split("\n")),
sorted(EXP_HTML.format(aid=aid).replace("<br/>", "").split("\n")),
)
self.assertEqual(html, EXP_HTML.format(aid=aid))

# verifying the new MANIFEST.txt
mfp = join(res["files"]["directory"][0]["filepath"], "MANIFEST.txt")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0: is guaranteed that this artifact will have only one file object?

self.assertTrue(exists(f"{mfp}"))
with open(mfp, "r") as f:
obs = f.readlines()
self.assertCountEqual(obs, EXP_MANIFEST)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that full file paths are problematic, but how about testing equality of the basename of each entry?



EXP_HTML = (
'<a href="./{aid}/test_data/folder_a" type="folder" target="_blank">'
"test_data/folder_a</a><br/>\n"
'<a href="./{aid}/test_data/folder_a/folder_b/folder_c" type="folder" '
'target="_blank">test_data/folder_a/folder_b/folder_c</a><br/>\n'
'<a href="./{aid}/test_data/file_2" type="file" target="_blank">'
"test_data/file_2</a><br/>\n"
'<a href="./{aid}/test_data/file_1" type="file" target="_blank">'
"test_data/file_1</a><br/>\n"
'<a href="./{aid}/test_data/test_data" type="folder" target="_blank">'
"test_data/test_data</a><br/>\n"
'<a href="./{aid}/test_data/test_data/folder_a/folder_b" type="folder" '
'target="_blank">test_data/test_data/folder_a/folder_b</a><br/>\n'
'<a href="./{aid}/test_data/test_data/folder_a/folder_b/folder_c/file_c" '
'type="file" target="_blank">test_data/test_data/folder_a/folder_b/'
"folder_c/file_c</a><br/>\n"
'<a href="./{aid}/test_data/test_data/folder_a/file_a" type="file" '
'target="_blank">test_data/test_data/folder_a/file_a</a>'
'<a href="./{aid}/result/MANIFEST.txt" type="file" target="_blank">'
"result/MANIFEST.txt</a><br/>\n"
'<a href="./{aid}/result/file_1" type="file" target="_blank">'
"result/file_1</a><br/>\n"
'<a href="./{aid}/result/file_2" type="file" target="_blank">'
"result/file_2</a><br/>\n"
'<a href="./{aid}/result/folder_a/folder_b/index.html" type="file" '
'target="_blank">result/folder_a/folder_b/index.html</a><br/>\n'
'<a href="./{aid}/result/folder_1/index.html" type="file" '
'target="_blank">result/folder_1/index.html</a>'
)
EXP_MANIFEST = [
" result/\n",
"|-- file_1\n",
"|-- file_2\n",
"|-- folder_a/\n",
"|--|-- file_a\n",
"|--|-- folder_b/\n",
"|--|--|-- index.html\n",
"|--|--|-- folder_c/\n",
"|--|--|--|-- file_c\n",
"|-- folder_1/\n",
"|--|-- index.html",

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I am not mistaken, an entry is identified as a sub-directory only by the trailing /? Can we add an empty sub-directory for testing OR would that sub-dir not even listed, even if it would exist?

]


if __name__ == "__main__":
Expand Down
Loading