Common Workflow Language (CWL)

CWL references
https://www.commonwl.org/user_guide
Github: https://github.com/common-workflow-language/user_guide
Available implementations: cwltool, Toil, and SBG
https://toil.readthedocs.io/en/latest/running/cwl.html

Other materials and examples:
https://github.com/common-workflow-library/bio-cwl-tools
https://mmb.irbbarcelona.org/biobb/availability/tutorials/cwl


My Note:
NGS/pipeline: https://github.com/DawnEve/txtBlog/blob/master/data/NGS/pipeline.txt
Local Dir: 
	/home/wangjl/data/test/testCWL
	/home/wangjl/test/cwl_test

Use the current directory instead of /tmp for temporary files:
$ TMPDIR=$PWD cwltool arguments.cwl --src Hello.java

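$ sudo systemctl restart docker  # restarting Docker is not a cure-all
If restarting Docker does not solve the problem, the way Docker was installed may be at fault: a snap-installed Docker can only be used under $HOME, whereas an apt-get install has no such restriction.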
https://github.com/common-workflow-language/common-workflow-language/issues/927

1. Install

What is the difference between the two names, cwltool and cwl-runner? cwl-runner is the generic name for any CWL implementation; cwltool is the reference implementation.

I'm guessing you installed cwlref-runner which installs cwltool under the cwl-runner name.

Dependencies: Node.js and a Java compiler.

$ git clone https://github.com/common-workflow-language/cwltool.git
$ cd cwltool  # Switch to source directory
$ pip3 install . -i https://pypi.douban.com/simple/ # Install `cwltool` from source

$ cwltool --version  # Check if the installation works correctly
/home/wangjl/.local/bin/cwltool 3.1.20210825140344

$ cwl-runner --version
/usr/bin/cwl-runner 1.0.20180302231433 # rather outdated


## try1: upgrade
$ pip3 install cwlref-runner -i https://pypi.douban.com/simple/
$ whereis cwl-runner
cwl-runner: /usr/bin/cwl-runner /home/wangjl/.local/bin/cwl-runner

## try2: upgrade
$ pip3 install --upgrade cwlref-runner -i https://pypi.douban.com/simple/

$ whereis cwl-runner
cwl-runner: /usr/bin/cwl-runner /home/wangjl/.local/bin/cwl-runner

$ /home/wangjl/.local/bin/cwl-runner --version
pkg_resources.ContextualVersionConflict: (decorator 5.0.9 (/home/wangjl/.local/lib/python3.6/site-packages), Requirement.parse('decorator<5,>=4.3'), {'networkx'})




It looks like the decorator package needs to be downgraded:
$ python3 -V
Python 3.6.9

$ pip3 -V
pip 9.0.1 from /usr/lib/python3/dist-packages (python 3.6)

$ pip3 freeze | grep decorator 
decorator==5.0.9

Guess a version at random:
$ pip install decorator==4.5.1 -i https://pypi.douban.com/simple/
ERROR: Could not find a version that satisfies the requirement decorator==4.5.1 (from versions: 3.3.1, 3.3.2, 3.3.3, 3.4.0, 3.4.2, 4.0.0, 4.0.1, 4.0.2, 4.0.3, 4.0.4, 4.0.6, 4.0.8, 4.0.9, 4.0.10, 4.0.11, 4.1.0, 4.1.1, 4.1.2, 4.2.1, 4.3.0, 4.3.1, 4.3.2, 4.4.0, 4.4.1, 4.4.2)
ERROR: No matching distribution found for decorator==4.5.1
Pick the newest version that satisfies the requirement (decorator<5,>=4.3):
$ pip install decorator==4.4.2 --user -i https://pypi.douban.com/simple/
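
The choice of 4.4.2 can be checked mechanically. The following is a minimal Python sketch, not pip's own resolver: it assumes plain dotted numeric versions, copies the version list from the pip error above, and uses the hypothetical helper names parse and latest_matching.

```python
# Versions pip reported as available for `decorator` (from the error message above).
AVAILABLE = (
    "3.3.1, 3.3.2, 3.3.3, 3.4.0, 3.4.2, 4.0.0, 4.0.1, 4.0.2, 4.0.3, 4.0.4, "
    "4.0.6, 4.0.8, 4.0.9, 4.0.10, 4.0.11, 4.1.0, 4.1.1, 4.1.2, 4.2.1, 4.3.0, "
    "4.3.1, 4.3.2, 4.4.0, 4.4.1, 4.4.2"
).split(", ")


def parse(version):
    """Turn '4.0.10' into (4, 0, 10) so that versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def latest_matching(versions, lower, upper):
    """Return the newest version v with lower <= v < upper, or None."""
    candidates = [v for v in versions if parse(lower) <= parse(v) < parse(upper)]
    return max(candidates, key=parse) if candidates else None


# networkx asks for decorator<5,>=4.3
best = latest_matching(AVAILABLE, "4.3", "5")
print(best)
```

Comparing tuples rather than strings matters here: as strings, "4.0.9" would sort after "4.0.10".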

$ pip3 install --upgrade pip -i https://pypi.douban.com/simple/
Successfully installed pip-21.2.4

$ python3 -m pip -V
pip 21.2.4 from /home/wangjl/.local/lib/python3.6/site-packages/pip (python 3.6)

$ python3 -m pip freeze | grep decorator
decorator==5.0.9
The version did not change?

Remove the newer version first:
$ python3 -m pip uninstall decorator
Would remove:
    /home/wangjl/.local/lib/python3.6/site-packages/decorator-5.0.9.dist-info/*
    /home/wangjl/.local/lib/python3.6/site-packages/decorator.py

With the newer version removed, the older one shows up.
$ python3 -m pip freeze | grep decorator
decorator==4.4.2




$ whereis cwl-runner
cwl-runner: /usr/bin/cwl-runner /home/wangjl/.local/bin/cwl-runner

$ /home/wangjl/.local/bin/cwl-runner --version
/home/wangjl/.local/bin/cwl-runner 3.1.20210825140344

The default on PATH is still the old version:
$ cwl-runner --version
/usr/bin/cwl-runner 1.0.20180302231433
$ ls -lth /usr/bin/cwl-runner
lrwxrwxrwx 1 root root 28 Nov 26  2018 /usr/bin/cwl-runner -> /etc/alternatives/cwl-runner

## Repoint the symlink
$ sudo ln -s -f /home/wangjl/.local/bin/cwl-runner /usr/bin/cwl-runner
$ cwl-runner --version
/usr/bin/cwl-runner 3.1.20210825140344


# Upgrade Docker? A quick search suggests this is already the second-newest release.
$ docker --version
Docker version 20.10.8, build 3967b7d


## Previous version: cwl-runner 1.0.20180302231433
## Later version: cwl-runner 3.1.20210825140344

2. First Example: hello world

Usage: cwltool [tool-or-workflow-description] [input-job-settings]

Two files are needed: a .cwl file that describes what to run, and a YAML job file that supplies the input values.

$ cat 1st-tool.cwl
#!/usr/bin/env cwl-runner

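cwlVersion: v1.0  # CWL specification version
class: CommandLineTool  # declares a command-line tool
baseCommand: echo  # the command actually run
inputs:   # input definitions, in YAML
  message:  # input name
    type: string  # input type: string
    inputBinding:  # optional; how the value appears on the command line
      position: 1  # first argument
outputs: []  # no outputs declared; the value is an empty list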


$ cat echo-job.yml
message: Hello world! from cwl


$ cwl-runner 1st-tool.cwl echo-job.yml
/usr/bin/cwl-runner 1.0.20180302231433
Resolved '1st-tool.cwl' to 'file:///data/wangjl/test/testCWL/1st-tool.cwl'  # absolute location of the .cwl file
[job 1st-tool.cwl] /tmp/tmpm2AZXd$ echo \  # the command actually executed
    'Hello world! from cwl'          # first argument
Hello world! from cwl               # output
[job 1st-tool.cwl] completed success
{}
Final process status is success

3. Essential Input Parameters

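type accepts the primitive types string, boolean, int, long, float, double, and null; the complex types array and record; and the special types File, Directory, and Any.

This example shows input parameters of different types and sets their positions on the command line.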

$ cat inp.cwl 
#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: CommandLineTool
baseCommand: echo
inputs:
  example_flag:
    type: boolean  # boolean
    inputBinding:
      position: 1  # optional; sort key that decides the argument order
      prefix: -f  # if the value is true, add -f; otherwise add nothing
  example_string:
    type: string
    inputBinding:
      position: 3
      prefix: --example-string  # prefix is optional; when given, the argument renders as --example-string hello
  example_int:
    type: int
    inputBinding:
      position: 2
      prefix: -i
      separate: false  # false joins prefix and value, rendering as -i42
  example_file:  # a file: the job input must give class: File and a path
    type: File?  # the trailing ? marks the input as optional; omitting it from the job file is not an error
    inputBinding:
      prefix: --file=
      separate: false
      position: 4

outputs: []


$ cat inp-job.yml 
example_flag: true
example_string: hello
example_int: 42
example_file:
  class: File
  path: whale.txt

$ vim whale.txt 
this is whale txt # any content will do

Run it:
$ cwl-runner inp.cwl inp-job.yml
/usr/bin/cwl-runner 1.0.20180302231433
Resolved 'inp.cwl' to 'file:///data/wangjl/test/testCWL/inp.cwl'
[job inp.cwl] /tmp/tmpSlo05_$ echo \
    -f \
    -i42 \
    --example-string \
    hello \
    --file=/tmp/tmpGVfLSg/stg16effb79-831d-43f1-903d-2e0a6548d048/whale.txt
-f -i42 --example-string hello --file=/tmp/tmpGVfLSg/stg16effb79-831d-43f1-903d-2e0a6548d048/whale.txt
[job inp.cwl] completed success
{}
Final process status is success
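
The rendering rules used above (sort by position, prefix, separate, boolean flags, optional inputs) can be modelled in a few lines. This is a simplified Python sketch for building intuition, not cwltool's implementation; the function name render and the dictionary layout are made up for illustration, and the staged file path is replaced by the plain file name.

```python
def render(bindings, job):
    """Build the argument list from simplified inputBinding descriptions."""
    args = []
    # Arguments are emitted in order of their `position` value.
    for name, binding in sorted(bindings.items(), key=lambda kv: kv[1]["position"]):
        value = job.get(name)
        if value is None or value is False:
            continue  # optional input not supplied, or boolean flag switched off
        prefix = binding.get("prefix")
        if value is True:
            args.append(prefix)              # boolean true: emit only the prefix
        elif prefix is None:
            args.append(str(value))          # no prefix: bare value
        elif binding.get("separate", True):
            args += [prefix, str(value)]     # separate: true (default) -> two arguments
        else:
            args.append(prefix + str(value)) # separate: false -> one joined argument
    return args


bindings = {
    "example_flag": {"position": 1, "prefix": "-f"},
    "example_string": {"position": 3, "prefix": "--example-string"},
    "example_int": {"position": 2, "prefix": "-i", "separate": False},
    "example_file": {"position": 4, "prefix": "--file=", "separate": False},
}
job = {
    "example_flag": True,
    "example_string": "hello",
    "example_int": 42,
    "example_file": "whale.txt",
}

argv = ["echo"] + render(bindings, job)
print(" ".join(argv))

# Dropping the optional file and turning the flag off removes both arguments.
short_argv = ["echo"] + render(bindings, {"example_flag": False, "example_string": "hello", "example_int": 42})
```

The first call reproduces the argument order of the transcript above, with the plain name whale.txt standing in for the staged /tmp path.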

4. Returning Output Files

The outputs section describes the output files to return.

This example extracts a file from a tar archive.

$ cat tar.cwl
#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: CommandLineTool
baseCommand: [tar, --extract] # base command
inputs:
  tarfile:
    type: File
    inputBinding:
      prefix: --file
outputs:
  example_out:
    type: File
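    outputBinding: # how to find the value of this output parameter
      glob: hello.txt # file name to look for in the output directory; if the name is not known in advance, a wildcard works, e.g. glob: '*.txt'
# type: File accepts exactly one file; a glob that matches several files raises an error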

$ cat tar-job.yml
tarfile:
  class: File
  path: hello.tar

## Prepare the input file
$ touch hello.txt bar.txt
$ vim hello.txt # any content will do
$ tar -cf hello.tar hello.txt bar.txt
$ rm hello.txt bar.txt


Run it:
$ cwl-runner tar.cwl tar-job.yml
/usr/bin/cwl-runner 1.0.20180302231433
Resolved 'tar.cwl' to 'file:///data/wangjl/test/testCWL/tar.cwl'
[job tar.cwl] /tmp/tmpyT5glu$ tar \
    --extract \
    --file \
    /tmp/tmpe45IvX/stg8640c9f0-3984-4bd3-bb9a-c3ab9bee6557/hello.tar
[job tar.cwl] completed success
{
    "example_out": {
        "checksum": "sha1$adf860ab98c892f8e37318745b782dd1e9494b4f", 
        "basename": "hello.txt", 
        "location": "file:///data/wangjl/test/testCWL/hello.txt", 
        "path": "/data/wangjl/test/testCWL/hello.txt", 
        "class": "File", 
        "size": 21
    }
}
Final process status is success

5. Capturing Standard Output

Set stdout to a file name to capture the standard output stream.

The corresponding output parameter must be declared with type: stdout.

$ cat stdout.cwl 
#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: CommandLineTool
baseCommand: echo
stdout: output.txt # redirect stdout to this file
inputs:
  message:
    type: string
    inputBinding:
      position: 1
outputs:
  example_out:
    type: stdout  # this output is the captured stdout file

$ cat echo-job.yml
message: Hello world! from cwl


Run it:
$ cwl-runner stdout.cwl echo-job.yml
/usr/bin/cwl-runner 1.0.20180302231433
Resolved 'stdout.cwl' to 'file:///data/wangjl/test/testCWL/stdout.cwl'
[job stdout.cwl] /tmp/tmpazbm3f$ echo \
    'Hello world! from cwl' > /tmp/tmpazbm3f/output.txt
[job stdout.cwl] completed success
{
    "example_out": {
        "checksum": "sha1$8f24a97752ec555e86e165f8ad005ed389776dda", 
        "basename": "output.txt", 
        "location": "file:///data/wangjl/test/testCWL/output.txt", 
        "path": "/data/wangjl/test/testCWL/output.txt", 
        "class": "File", 
        "size": 22
    }
}
Final process status is success

Check the output file:
$ cat output.txt 
Hello world! from cwl

6. Parameter References

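How can parameter values be reused? With the $(...) syntax, whose contents follow a subset of JavaScript syntax.

The earlier tar example is quite limited because hello.txt is hard-coded in the CWL script. How can the file name be set more flexibly from the YAML job file?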

$ cat tar-param.cwl
#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: CommandLineTool
baseCommand: [tar, --extract]
inputs:
  tarfile:
    type: File
    inputBinding:
      prefix: --file
  extractfile:
    type: string
    inputBinding:
      position: 1
outputs:
  extracted_file:
    type: File
    outputBinding:
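      glob: $(inputs.extractfile)  # references the value given in the job file

## Job file. What an archive extracts to should not depend on the tool description; it is a property of the archive itself.
$ cat tar-param-job.yml 
tarfile:
  class: File
  path: hello.tar
extractfile: goodbye.txt # changing this to goodbye2.txt fails because the archive has no such member; still not very flexible


Create the input file: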
$ vim goodbye.txt 
$ tar -cvf hello.tar goodbye.txt
$ rm goodbye.txt


$ cwl-runner tar-param.cwl tar-param-job.yml
/usr/bin/cwl-runner 1.0.20180302231433
Resolved 'tar-param.cwl' to 'file:///data/wangjl/test/testCWL/tar-param.cwl'
[job tar-param.cwl] /tmp/tmpY0EG0b$ tar \
    --extract \
    --file \
    /tmp/tmpG7pXtJ/stg9840099e-8e6a-40e9-94a6-59374734209a/hello.tar \
    goodbye.txt
[job tar-param.cwl] completed success
{
    "extracted_file": {
        "checksum": "sha1$260eb2c9cd323ee68f72df1a0f9d1d176634e9c5", 
        "basename": "goodbye.txt", 
        "location": "file:///data/wangjl/test/testCWL/goodbye.txt", 
        "path": "/data/wangjl/test/testCWL/goodbye.txt", 
        "class": "File", 
        "size": 13
    }
}
Final process status is success
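
To make the substitution step concrete, here is a minimal Python sketch of how a $(inputs.name) reference could be resolved against the job values. It is an illustration only, not the resolver a CWL runner uses: it handles just dotted paths, always returns a string, and the function name resolve is hypothetical.

```python
import re

# Matches $(a.b.c) where a, b, c are simple identifiers.
REFERENCE = re.compile(r"\$\(([A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*)\)")


def resolve(text, context):
    """Replace each $(dotted.path) in text with the value found in context."""
    def lookup(match):
        value = context
        for key in match.group(1).split("."):
            value = value[key]  # walk one level down per path segment
        return str(value)
    return REFERENCE.sub(lookup, text)


# Values as they appear in tar-param-job.yml.
context = {
    "inputs": {
        "tarfile": {"class": "File", "path": "hello.tar"},
        "extractfile": "goodbye.txt",
    }
}

glob_pattern = resolve("$(inputs.extractfile)", context)
archive_path = resolve("$(inputs.tarfile.path)", context)
print(glob_pattern, archive_path)
```

With this job file the glob field of tar-param.cwl ends up as goodbye.txt, which is why the runner reports that file as extracted_file.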

Parameter references are only allowed in certain fields:

1. From CommandLineTool
  arguments
    valueFrom
  stdin
  stdout
  stderr
  From CommandInputParameter
    format
    secondaryFiles
    From inputBinding
      valueFrom
  From CommandOutputParameter
    format
    secondaryFiles
    From CommandOutputBinding
      glob
      outputEval

2. From Workflow
  From InputParameter and WorkflowOutputParameter
    format
    secondaryFiles
  From steps
    From WorkflowStepInput
      valueFrom

3. From ExpressionTool
  expression
  From InputParameter and ExpressionToolOutputParameter
    format
    secondaryFiles

4. From ResourceRequirement
  coresMin
  coresMax
  ramMin
  ramMax
  tmpdirMin
  tmpdirMax
  outdirMin
  outdirMax

5. From InitialWorkDirRequirement
  listing
  in Dirent
    entry
    entryname

6. From EnvVarRequirement
  From EnvironmentDef
    envValue

7. Running Tools Inside Docker

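A container is an isolated environment. How do input files become visible inside the container, and how are output files recovered outside it? CWL handles this automatically: one of the runner's jobs is to map input file paths to paths inside the container.

Containers simplify the management of software dependencies. In CWL, the Docker image is specified with DockerRequirement under hints.

This example runs Node.js inside a container to print "Hello World" to standard output.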

$ cat docker.cwl 
#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: CommandLineTool
baseCommand: node
hints:
  DockerRequirement:  # this tool depends on Docker
    dockerPull: node:slim # the image to pull
inputs:
  src:
    type: File # the JavaScript source file
    inputBinding:
      position: 1
outputs:
  example_out:
    type: stdout
stdout: output.txt


$ cat docker-job.yml
src:
  class: File
  path: hello.js


Input file:
$ echo "console.log(\"Hello World, from docker\");" > hello.js
$ cat hello.js
console.log("Hello World, from docker");





############################
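## Next: debugging Docker itself (image pull failures, run-time errors, and so on).
# One bug along the way remains unexplained and unsolved.
$ docker --version # Docker version on the host
Docker version 20.10.8, build 3967b7d

$ node --version # Node.js version on the host, newer than the one in the image
v14.16.1

Pulling from the official docker.com registry failed:
error pulling image configuration: 
#Get https://production.cloudflare.docker.com/registry-v2/docker/...: dial tcp 104.18.124.25:443: i/o timeout

Pull from a mirror in China instead: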
$ docker pull hub.c.163.com/library/node:slim
slim: Pulling from library/node
bc2a558c8dfc: Pull complete 
29cb6f6be636: Pull complete 
9cef66688ce2: Pull complete 
2aca22233faa: Pull complete 
096ff65f16a8: Pull complete 
a4ef5a464551: Pull complete 
Digest: sha256:8395d2c578dc420998a726686f57d0231ad634d05b6c6198e2e02557bd130687
Status: Downloaded newer image for hub.c.163.com/library/node:slim
hub.c.163.com/library/node:slim

Retag the image:
$ docker tag hub.c.163.com/library/node:slim node:slim
$ docker images
REPOSITORY                   TAG         IMAGE ID       CREATED        SIZE
node                         slim        914ef9e2ccb0   4 years ago    227MB

Errors and possible causes:
- The -v option must come before the image name.
- Error: some say the mounted path must not contain a symlink. https://stackoverflow.com/questions/50817985/docker-tries-to-mkdir-the-folder-that-i-mount
docker: Error response from daemon: 
error while creating mount source path '/home/wangjl/data/test/testCWL': mkdir /home/wangjl/data: file exists.
- Error: paths under /home/ work but paths under /data/ do not. Why?
docker: Error response from daemon: 
error while creating mount source path '/data/wangjl/test/testCWL': mkdir /data: read-only file system.


# This path works; it contains no symlink
$ docker run -it -d --name try1 -v /home/wangjl/test/cwl_test:/home/wangjl/ node:slim bash
e8f8105
$ docker ps
CONTAINER ID   IMAGE       COMMAND   CREATED          STATUS          PORTS     NAMES
e8f8105cc79e   node:slim   "bash"    19 seconds ago   Up 18 seconds             try1

Move the file: $ cp hello.js /home/wangjl/test/cwl_test/
Use the absolute path: /home/wangjl/test/cwl_test/hello.js

$ docker exec -it e8f bash
root@e8f8105cc79e:/# node --version # Node.js version in the image
v8.4.0
root@e8f8105cc79e:/# node /home/wangjl/hello.js # run the script through the path mounted into the container
Hello World, from docker
root@e8f8105cc79e:/# exit
exit
$ docker stop e8f

It can also be done in one step: mount the path and run the script.
$ docker run -v /home/wangjl/test/cwl_test:/home/wangjl/ node:slim node /home/wangjl/hello.js
Hello World, from docker
############################





Use the absolute path /home/wangjl/test/cwl_test/hello.js
$ cat docker-job.yml 
src:
  class: File
  path: /home/wangjl/test/cwl_test/hello.js


Run it:
$ cwl-runner docker.cwl docker-job.yml
/usr/bin/cwl-runner 1.0.20180302231433
Resolved 'docker.cwl' to 'file:///data/wangjl/test/testCWL/docker.cwl'
[job docker.cwl] /tmp/tmpYizLGR$ docker \
    run \
    -i \
    --volume=/tmp/tmpYizLGR:/var/spool/cwl:rw \
    --volume=/tmp/tmp3ATtsb:/tmp:rw \
    --volume=/home/wangjl/test/cwl_test/hello.js:/var/lib/cwl/stg5ca094dc-00bb-45fa-9064-3388bec4119a/hello.js:ro \
    --workdir=/var/spool/cwl \
    --read-only=true \
    --log-driver=none \
    --user=1001:1001 \
    --rm \
    --env=TMPDIR=/tmp \
    --env=HOME=/var/spool/cwl \
    node:slim \
    node \
    /var/lib/cwl/stg5ca094dc-00bb-45fa-9064-3388bec4119a/hello.js > /tmp/tmpYizLGR/output.txt
[job docker.cwl] completed success
{
    "example_out": {
        "checksum": "sha1$de3bc1b9891d98a2929ca4fdd2cab229dc775baa", 
        "basename": "output.txt", 
        "location": "file:///data/wangjl/test/testCWL/output.txt", 
        "path": "/data/wangjl/test/testCWL/output.txt", 
        "class": "File", 
        "size": 25
    }
}
Final process status is success

Check the output:
$ cat output.txt 
Hello World, from docker

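$ docker ps -a shows no leftover container: the --rm flag in the generated command removes the container when it exits.

The runner builds a long docker command that contains the path mappings, runs the script inside the container, and captures its output.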
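
The path mapping can be pictured with a small Python sketch that assembles a docker run argument list of the same general shape as the transcript above. It is an illustration under assumptions, not how cwltool builds its command: the function name docker_command, the staging directory /var/lib/cwl/stg-example, and the host output directory /tmp/outdir-example are all hypothetical, and several flags from the real command are left out.

```python
import posixpath


def docker_command(image, tool, host_input, host_outdir,
                   container_outdir="/var/spool/cwl",
                   container_stage="/var/lib/cwl/stg-example"):
    """Return (argv, container_input) for a simplified `docker run` call.

    The input file is mounted read-only under a staging directory, the
    output directory is mounted read-write and used as the working directory.
    """
    container_input = posixpath.join(container_stage, posixpath.basename(host_input))
    argv = [
        "docker", "run", "-i",
        "--rm",                                            # remove the container on exit
        f"--volume={host_outdir}:{container_outdir}:rw",   # writable output directory
        f"--volume={host_input}:{container_input}:ro",     # read-only input file
        f"--workdir={container_outdir}",
        image, tool, container_input,                      # the tool sees the container path
    ]
    return argv, container_input


argv, inside = docker_command(
    image="node:slim",
    tool="node",
    host_input="/home/wangjl/test/cwl_test/hello.js",
    host_outdir="/tmp/outdir-example",
)
print(" \\\n    ".join(argv))
```

The tool never sees the host path: the last argument is the path inside the container, which is why the CWL description itself needs no knowledge of where hello.js lives on the host.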

8. Additional Arguments and Parameters //*

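How do you specify arguments that need no input value and do not depend on one (for example the number of CPU cores)? How do you reference run-time parameters?

This example compiles a Java source file into a class file. By default javac writes the class file next to the source file, but CWL input files are read-only, so a separate output directory must be given.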

/home/wangjl/test/cwl_test/

$ cat arguments.cwl
#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: CommandLineTool
label: Example trivial wrapper for Java 9 compiler
hints:
  DockerRequirement:
    dockerPull: openjdk:9.0.1-11-slim #dawneve/openjdk:latest
baseCommand: javac
arguments: ["-d", $(runtime.outdir)]
inputs:
  src:
    type: File
    inputBinding:
      position: 1
outputs:
  classfile:
    type: File
    outputBinding:
      glob: "*.class"


$ cat arguments-job.yml
src:
  class: File
  path: Hello.java



Create the Java source file:
$ cat Hello.java
public class Hello { 
	public static void main(String args[])  {    
		System.out.println("Hello world, from Java!");  
	}
}
Compile:
$ javac Hello.java # produces Hello.class
$ java Hello
Hello world, from Java!
$ rm Hello.class



#################################
Log in to Docker first; otherwise the pull is likely to fail.
$ docker login -u <username>
(enter the password when prompted)
$ docker pull openjdk:9.0.1-11-slim
$ docker run openjdk:9.0.1-11-slim java -version
openjdk version "9.0.1"
OpenJDK Runtime Environment (build 9.0.1+11-Debian-1)
OpenJDK 64-Bit Server VM (build 9.0.1+11-Debian-1, mixed mode)

Compile inside the container:
$ docker run -v /home/wangjl/test/cwl_test:/home/wangjl/ openjdk:9.0.1-11-slim bash -c 'cd /home/wangjl/ && javac Hello.java'



If the image still cannot be pulled after repeated attempts,
pull from a mirror in China instead:
$ docker pull hub.c.163.com/library/openjdk
Retag:
$ docker tag hub.c.163.com/library/openjdk:latest dawneve/openjdk:latest
$ docker images
dawneve/openjdk   latest      4551430cfe80   4 years ago    738MB

$ docker run dawneve/openjdk java -version

$ docker run -it -d dawneve/openjdk bash
$ docker exec -it 308 bash
root@308b5227d32b:/# java -version
openjdk version "1.8.0_141"  # a very old version
OpenJDK Runtime Environment (build 1.8.0_141-8u141-b15-1~deb9u1-b15)
OpenJDK 64-Bit Server VM (build 25.141-b15, mixed mode)

$ docker run -v /home/wangjl/test/cwl_test:/home/wangjl/ dawneve/openjdk java -version # same output
#################################




Run it:
$ cwl-runner arguments.cwl arguments-job.yml
INFO /home/wangjl/.local/bin/cwl-runner 2.0.20200224214940
INFO Resolved 'arguments.cwl' to 'file:///data/wangjl/test/testCWL/arguments.cwl'
INFO [job arguments.cwl] /tmp/e9ecc9_i$ docker \
    run \
    -i \
    --mount=type=bind,source=/tmp/e9ecc9_i,target=/RWzMpq \
    --mount=type=bind,source=/tmp/o8iw1gdm,target=/tmp \
    --mount=type=bind,source=/home/wangjl/test/cwl_test/Hello.java,target=/var/lib/cwl/stgdc4c49be-ea06-4981-9e9f-08b924938ff4/Hello.java,readonly \
    --workdir=/RWzMpq \
    --read-only=true \
    --user=1001:1001 \
    --rm \
    --env=TMPDIR=/tmp \
    --env=HOME=/RWzMpq \
    --cidfile=/tmp/wtz3qqzz/20210913165405-409026.cid \
    openjdk:9.0.1-11-slim \
    javac \
    -d \
    /RWzMpq \
    /var/lib/cwl/stgdc4c49be-ea06-4981-9e9f-08b924938ff4/Hello.java
INFO [job arguments.cwl] Max memory used: 0MiB
INFO [job arguments.cwl] completed success
{
    "classfile": {
        "location": "file:///data/wangjl/test/testCWL/Hello.class",
        "basename": "Hello.class",
        "class": "File",
        "checksum": "sha1$6f2a091492a911598cbc1f01a5c73993a00abb22",
        "size": 427,
        "path": "/data/wangjl/test/testCWL/Hello.class"
    }
}
INFO Final process status is success

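Docker failed here at first. The cause turned out to be the snap-installed Docker, which only works under $HOME; mounting any other directory fails.

Other run-time variables: $(runtime.tmpdir), $(runtime.ram), $(runtime.cores), $(runtime.outdirSize), and $(runtime.tmpdirSize).

A gcc compilation example: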

$ cat Hello.c
#include <stdio.h>
int main(){
  printf("hello, c!\n");
}

#################
$ docker pull gcc
$ docker run gcc gcc --version
gcc (GCC) 11.2.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ docker run -v /home/wangjl/test/cwl_test:/home/wangjl/ gcc gcc /home/wangjl/Hello.c -o /home/wangjl/Hello.out
$ ./Hello.out 
hello, c!
#################

9. Array Inputs

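There are two ways to declare an array parameter: 1. a nested type: array with an items field; 2. square brackets appended to the type name, as in type: string[].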

$ cat array-inputs.cwl
#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: CommandLineTool
inputs:
  filesA:
    type: string[]
    inputBinding:
      prefix: -A
      position: 1

  filesB:
    type:
      type: array # array
      items: string # element type
      inputBinding: # a binding can also be placed on the array items
        prefix: -B=
        separate: false
    inputBinding:
      position: 2

  filesC:
    type: string[]
    inputBinding:
      prefix: -C=
      itemSeparator: "," # string used to join the elements
      separate: false
      position: 4

outputs:
  example_out:
    type: stdout
stdout: output.txt # capture standard output to this file
baseCommand: echo  # base command


$ cat array-inputs-job.yml
filesA: [one, two, three]
filesB: [four, five, six]
filesC: [seven, eight, nine]



Run it:
$ cwl-runner array-inputs.cwl array-inputs-job.yml
/usr/bin/cwl-runner 1.0.20180302231433
Resolved 'array-inputs.cwl' to 'file:///home/wangjl/test/cwl_test/array-inputs.cwl'
[job array-inputs.cwl] /tmp/tmpOq2vlf$ echo \
    -A \
    one \
    two \
    three \
    -B=four \
    -B=five \
    -B=six \
    -C=seven,eight,nine > /tmp/tmpOq2vlf/output.txt
[job array-inputs.cwl] completed success
{
    "example_out": {
        "checksum": "sha1$91038e29452bc77dcd21edef90a15075f3071540", 
        "basename": "output.txt", 
        "location": "file:///home/wangjl/test/cwl_test/output.txt", 
        "path": "/home/wangjl/test/cwl_test/output.txt", 
        "class": "File", 
        "size": 60
    }
}
Final process status is success

Check the output:
$ cat output.txt 
-A one two three -B=four -B=five -B=six -C=seven,eight,nine

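Array parameters are declared under a type whose own type is array (or with the [] shorthand).

How an array parameter appears on the command line depends on where inputBinding is placed and what it contains.

The itemSeparator field sets the string that joins the array elements into a single argument.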
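
The three array renderings seen in the output above can be summarised in a short Python sketch. It is a simplified model written for these notes, not the runner's code; the function name render_array and its keyword arguments are hypothetical, and only the three cases from array-inputs.cwl are covered.

```python
def render_array(values, prefix, separate=True, item_separator=None, per_item=False):
    """Render one array input as a list of command-line arguments."""
    if per_item:
        # Binding placed on the items (filesB): the prefix repeats for every element.
        rendered = []
        for value in values:
            rendered += [prefix, value] if separate else [prefix + value]
        return rendered
    if item_separator is not None:
        # itemSeparator (filesC): join all elements into a single argument.
        joined = item_separator.join(values)
        return [prefix, joined] if separate else [prefix + joined]
    # Binding on the array as a whole (filesA): prefix once, then each element.
    return [prefix] + list(values)


args_a = render_array(["one", "two", "three"], "-A")
args_b = render_array(["four", "five", "six"], "-B=", separate=False, per_item=True)
args_c = render_array(["seven", "eight", "nine"], "-C=", separate=False, item_separator=",")

line = " ".join(args_a + args_b + args_c)
print(line)
```

The printed line matches the content of output.txt shown above.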

10. Array Outputs

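How are multiple output files returned? How do you choose which ones to keep?

Use glob to capture multiple output files into an array; wildcards are allowed, e.g. glob: "*.txt"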

$ cat array-outputs.cwl
#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: CommandLineTool
baseCommand: touch
inputs:
  touchfiles:
    type:
      type: array
      items: string
    inputBinding:
      position: 1
outputs:
  output:
    type:
      type: array  # for an array type, items gives the element type, here File
      items: File  
    outputBinding:
      glob: "*.txt"  # keep only *.txt files; *.dat files are dropped


$ cat array-outputs-job.yml
touchfiles:
  - foo.txt
  - bar.dat
  - baz.txt

Run it:
$ cwl-runner array-outputs.cwl array-outputs-job.yml
/usr/bin/cwl-runner 1.0.20180302231433
Resolved 'array-outputs.cwl' to 'file:///home/wangjl/test/cwl_test/array-outputs.cwl'
[job array-outputs.cwl] /tmp/tmpEvH1Io$ touch \
    foo.txt \
    bar.dat \
    baz.txt
[job array-outputs.cwl] completed success
{
    "output": [
        {
            "checksum": "sha1$da39a3ee5e6b4b0d3255bfef95601890afd80709", 
            "basename": "baz.txt", 
            "location": "file:///home/wangjl/test/cwl_test/baz.txt", 
            "path": "/home/wangjl/test/cwl_test/baz.txt", 
            "class": "File", 
            "size": 0
        }, 
        {
            "checksum": "sha1$da39a3ee5e6b4b0d3255bfef95601890afd80709", 
            "basename": "foo.txt", 
            "location": "file:///home/wangjl/test/cwl_test/foo.txt", 
            "path": "/home/wangjl/test/cwl_test/foo.txt", 
            "class": "File", 
            "size": 0
        }
    ]
}
Final process status is success

Check the output:
$ ls -lth
-rw-rw-r-- 1 wangjl wangjl   0 Sep 10 09:19 baz.txt
-rw-rw-r-- 1 wangjl wangjl   0 Sep 10 09:19 foo.txt
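
The filtering effect of glob can be reproduced with Python's fnmatch module. This is only a sketch of the pattern-matching idea, assuming shell-style wildcards; the helper name collect is hypothetical, and the sorting is there just to make the result deterministic.

```python
from fnmatch import fnmatchcase


def collect(pattern, produced):
    """Return the produced file names that match a shell-style pattern."""
    return sorted(name for name in produced if fnmatchcase(name, pattern))


# Files created by `touch` in the array-outputs example.
produced = ["foo.txt", "bar.dat", "baz.txt"]
kept = collect("*.txt", produced)
print(kept)

# A literal name behaves like the single-file case in tar.cwl.
single = collect("hello.txt", ["hello.txt", "bar.txt"])
```

bar.dat is created by the tool but never matches the pattern, so it does not appear in the output array and is not copied back.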

11. Advanced Inputs

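How do you express that some parameters must be given together, or must not be combined? By describing relationships between inputs.

Use type: record to group parameters. Multiple type: record entries listed under one parameter's type are treated as mutually exclusive.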

$ cat record.cwl
#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: CommandLineTool
inputs:
  dependent_parameters:
    type:
      type: record
      name: dependent_parameters
      fields:
        itemA:
          type: string
          inputBinding:
            prefix: -A
        itemB:
          type: string
          inputBinding:
            prefix: -B
  exclusive_parameters:
    type:
      - type: record
        name: itemC
        fields:
          itemC:
            type: string
            inputBinding:
              prefix: -C
      - type: record
        name: itemD
        fields:
          itemD:
            type: string
            inputBinding:
              prefix: -D
outputs:
  example_out:
    type: stdout
stdout: output.txt
baseCommand: echo


$ cat record-job1.yml
dependent_parameters:
  itemA: one
exclusive_parameters:
  itemC: three


$ cwl-runner record.cwl record-job1.yml
/usr/bin/cwl-runner 1.0.20180302231433
Resolved 'record.cwl' to 'file:///home/wangjl/test/cwl_test/record.cwl'
Workflow error, try again with --debug for more information:
Invalid job input record:
record-job1.yml:1:1: the `dependent_parameters` field is not valid because
                       missing required field `itemB`
This fails because itemB was not provided.


$ cat record-job2.yml
dependent_parameters:
  itemA: one
  itemB: two
exclusive_parameters:
  itemC: three
  itemD: four

$ cwl-runner record.cwl record-job2.yml
/usr/bin/cwl-runner 1.0.20180302231433
Resolved 'record.cwl' to 'file:///home/wangjl/test/cwl_test/record.cwl'
record-job2.yml:6:3: invalid field `itemD`, expected one of: 'itemC'
[job record.cwl] /tmp/tmpi8apBq$ echo \
    -A \
    one \
    -B \
    two \
    -C \
    three > /tmp/tmpi8apBq/output.txt
[job record.cwl] completed success
{
    "example_out": {
        "checksum": "sha1$329fe3b598fed0dfd40f511522eaf386edb2d077", 
        "basename": "output.txt", 
        "location": "file:///home/wangjl/test/cwl_test/output.txt", 
        "path": "/home/wangjl/test/cwl_test/output.txt", 
        "class": "File", 
        "size": 23
    }
}
Final process status is success

$ cat output.txt 
-A one -B two -C three
Because itemC and itemD are mutually exclusive, only one is used: the runner warns about itemD and keeps itemC.




$ cat record-job3.yml
dependent_parameters:
  itemA: one
  itemB: two
exclusive_parameters:
  itemD: four

$ cwl-runner record.cwl record-job3.yml
/usr/bin/cwl-runner 1.0.20180302231433
Resolved 'record.cwl' to 'file:///home/wangjl/test/cwl_test/record.cwl'
[job record.cwl] /tmp/tmpC8WTZ4$ echo \
    -A \
    one \
    -B \
    two \
    -D \
    four > /tmp/tmpC8WTZ4/output.txt
[job record.cwl] completed success
{
    "example_out": {
        "checksum": "sha1$77f572b28e441240a5e30eb14f1d300bcc13a3b4", 
        "basename": "output.txt", 
        "location": "file:///home/wangjl/test/cwl_test/output.txt", 
        "path": "/home/wangjl/test/cwl_test/output.txt", 
        "class": "File", 
        "size": 22
    }
}
Final process status is success
$ cat output.txt 
-A one -B two -D four



What if neither of the mutually exclusive parameters is given?
$ cat record-job4.yml
dependent_parameters:
  itemA: one
  itemB: two

$ cwl-runner record.cwl record-job4.yml
Workflow error, try again with --debug for more information:
Invalid job input record:
record.cwl:19:3: Missing required input parameter 'exclusive_parameters'
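
The behaviour of the four job files can be summarised in a small Python sketch. It models what the transcripts above show (all fields of a record are required; for a list of records the first matching alternative wins), and is not the validation code of any CWL runner. The names DEPENDENT, EXCLUSIVE, and validate are made up for illustration.

```python
DEPENDENT = ["itemA", "itemB"]          # one record: both fields are required
EXCLUSIVE = [["itemC"], ["itemD"]]      # list of records: first match wins


def validate(job):
    """Return the field names that end up on the command line, or raise ValueError."""
    dependent = job.get("dependent_parameters", {})
    missing = [field for field in DEPENDENT if field not in dependent]
    if missing:
        raise ValueError("missing required field " + missing[0])
    if "exclusive_parameters" not in job:
        raise ValueError("missing required input parameter exclusive_parameters")
    exclusive = job["exclusive_parameters"]
    for fields in EXCLUSIVE:
        if all(field in exclusive for field in fields):
            return DEPENDENT + fields   # extra fields such as itemD are ignored
    raise ValueError("exclusive_parameters matches no record type")


job1 = {"dependent_parameters": {"itemA": "one"},
        "exclusive_parameters": {"itemC": "three"}}
job2 = {"dependent_parameters": {"itemA": "one", "itemB": "two"},
        "exclusive_parameters": {"itemC": "three", "itemD": "four"}}
job3 = {"dependent_parameters": {"itemA": "one", "itemB": "two"},
        "exclusive_parameters": {"itemD": "four"}}
job4 = {"dependent_parameters": {"itemA": "one", "itemB": "two"}}

results = {}
for name, job in [("job1", job1), ("job2", job2), ("job3", job3), ("job4", job4)]:
    try:
        results[name] = validate(job)
    except ValueError as error:
        results[name] = "error: " + str(error)
    print(name, results[name])
```

job1 and job4 are rejected, job2 falls back to itemC, and job3 uses itemD, matching the four runs above.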

12. Environment Variables

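How do you set environment variables for the tool's execution? Tools run in a restricted environment and do not inherit most environment variables from the parent process. Use EnvVarRequirement to set them.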

$ cat env.cwl
#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: CommandLineTool
baseCommand: env
requirements:
  EnvVarRequirement:
    envDef:
      HELLO: $(inputs.message)
inputs:
  message: string
outputs:
  example_out:
    type: stdout
stdout: output.txt

$ cat echo-job.yml
message: Hello world!

$ cwl-runner env.cwl echo-job.yml
[job env.cwl] /tmp/tmpTrhSfY$ env > /tmp/tmpTrhSfY/output.txt
[job env.cwl] completed success
{
    "example_out": {
        "checksum": "sha1$a00671d2ed5b00e0aa51e993dff77108b3fc42e0", 
        "basename": "output.txt", 
        "location": "file:///home/wangjl/test/cwl_test/output.txt", 
        "path": "/home/wangjl/test/cwl_test/output.txt", 
        "class": "File", 
        "size": 1524
    }
}
Final process status is success

$ cat output.txt 
PATH=/home/wangjl/soft/bowtie2-2.3.5.1-linux-x86_64:...:/home/wangjl/soft/homer/.//bin/
HELLO=Hello world!  ## the newly added environment variable
TMPDIR=/tmp/tmpfo6Q9Q
HOME=/tmp/tmpTrhSfY
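
The idea of a restricted environment can be sketched in Python as a pure function that builds the child's variables from scratch. This is a model of what output.txt shows, not the runner's actual logic: which parent variables are inherited (only PATH here) is an assumption, and build_env plus the example directories are hypothetical.

```python
def build_env(parent_env, env_def, home, tmpdir, inherit=("PATH",)):
    """Start from an empty environment, keep only the names in `inherit`,
    set HOME and TMPDIR, then apply the envDef entries."""
    env = {name: parent_env[name] for name in inherit if name in parent_env}
    env["HOME"] = home
    env["TMPDIR"] = tmpdir
    env.update(env_def)  # values from EnvVarRequirement
    return env


parent = {"PATH": "/usr/bin:/bin", "LANG": "en_US.UTF-8", "EDITOR": "vim"}
env = build_env(
    parent,
    env_def={"HELLO": "Hello world!"},
    home="/tmp/outdir-example",
    tmpdir="/tmp/tmpdir-example",
)
for name in sorted(env):
    print(f"{name}={env[name]}")
```

Variables such as EDITOR never reach the tool unless they are listed in envDef, which keeps the tool's behaviour independent of the shell that launched the runner.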

13. JavaScript Expressions

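How do you create values dynamically when CWL provides no built-in way to do so? Embed JavaScript expressions in the CWL description.

Adding requirements: InlineJavascriptRequirement: {} enables JavaScript evaluation. Note: use JavaScript only when necessary, and prefer the built-in File properties such as basename, nameroot, and nameext. More recommended practices: https://www.commonwl.org/user_guide/rec-practices/

Key points: 1. once InlineJavascriptRequirement is declared, the CWL document may contain JavaScript expressions; 2. expressions are only allowed in certain fields; 3. expressions should be used only when CWL offers no built-in solution.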

$ cat expression.cwl
#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: CommandLineTool
baseCommand: echo

requirements:
  InlineJavascriptRequirement: {}

inputs: []
outputs:
  example_out:
    type: stdout
stdout: output.txt
arguments:
  - prefix: -A
    valueFrom: $(1+1)
  - prefix: -B
    valueFrom: $("/foo/bar/baz".split('/').slice(-1)[0])
  - prefix: -C
    valueFrom: |
      ${
        var r = [];
        for (var i = 10; i >= 1; i--) {
          r.push(i);
        }
        return r;
      }
# As this tool does not require any inputs we can run it with an (almost) empty job file:

$ cat empty.yml
{}

$ cwl-runner expression.cwl empty.yml
[job expression.cwl] /tmp/tmpDAhtNa$ echo \
    -A \
    2 \
    -B \
    baz \
    -C \
    10 \
    9 \
    8 \
    7 \
    6 \
    5 \
    4 \
    3 \
    2 \
    1 > /tmp/tmpDAhtNa/output.txt
[job expression.cwl] completed success
{
    "example_out": {
        "checksum": "sha1$a739a6ff72d660d32111265e508ed2fc91f01a7c", 
        "basename": "output.txt", 
        "location": "file:///home/wangjl/test/cwl_test/output.txt", 
        "path": "/home/wangjl/test/cwl_test/output.txt", 
        "class": "File", 
        "size": 36
    }
}
Final process status is success

$ cat output.txt 
-A 2 -B baz -C 10 9 8 7 6 5 4 3 2 1
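
As a cross-check of the values above, the three JavaScript expressions can be rewritten as Python equivalents. This is only a sanity check of the expected results in another language; CWL itself evaluates the expressions as JavaScript.

```python
# -A: $(1+1)
value_a = 1 + 1

# -B: $("/foo/bar/baz".split('/').slice(-1)[0])  -> last path component
value_b = "/foo/bar/baz".split("/")[-1]

# -C: the ${ ... } block counts down from 10 to 1 and returns the array
value_c = list(range(10, 0, -1))

# An array value expands into one argument per element.
args = ["-A", str(value_a), "-B", value_b, "-C"] + [str(item) for item in value_c]
line = " ".join(args)
print(line)
```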

Where are JavaScript expressions allowed?

Like parameter references (https://www.commonwl.org/user_guide/06-params/index.html), JavaScript expressions are only allowed in certain fields.
1. From CommandLineTool
  arguments
    valueFrom
  stdin
  stdout
  stderr
  From CommandInputParameter
    format
    secondaryFiles
    From inputBinding
      valueFrom
  From CommandOutputParameter
    format
    secondaryFiles
    From CommandOutputBinding
      glob
      outputEval

2. From Workflow
  From InputParameter and WorkflowOutputParameter
    format
    secondaryFiles
  From steps
    From WorkflowStepInput
      valueFrom

3. From ExpressionTool
  expression
  From InputParameter and ExpressionToolOutputParameter
    format
    secondaryFiles

4. From ResourceRequirement
  coresMin
  coresMax
  ramMin
  ramMax
  tmpdirMin
  tmpdirMax
  outdirMin
  outdirMax

5. From InitialWorkDirRequirement
  listing
  in Dirent
    entry
    entryname

6. From EnvVarRequirement
  From EnvironmentDef
    envValue

14. Creating Files at Runtime

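How do you create required input files from input parameters? How do you run a script instead of a simple command? Besides inputBinding, how else can arguments be passed to a tool?

InitialWorkDirRequirement creates files at run time (they are removed when the run ends). This helps when a tool reads its settings from a configuration file rather than from command-line arguments, or when a wrapper shell script is needed.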

$ cat createfile.cwl
class: CommandLineTool
cwlVersion: v1.0
baseCommand: ["sh", "example.sh"]

requirements:
  InitialWorkDirRequirement:
    listing:
      - entryname: example.sh
        entry: |-
          PREFIX='Message is:'
          MSG="\${PREFIX} $(inputs.message)"
          echo \${MSG}
inputs:
  message: string
outputs:
  example_out:
    type: stdout
stdout: output.txt


$ cat echo-job.yml
message: Hello world! -v2


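Expressions such as $(inputs.message) are expanded by the CWL runner before the file is created.
Note: CWL expressions are independent of the shell variables used later when the script runs. Any $ that must survive into the script has to be escaped with a backslash (\$).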

$ cwl-runner createfile.cwl echo-job.yml
[job createfile.cwl] /tmp/tmpBTLnvr$ sh \
    example.sh > /tmp/tmpBTLnvr/output.txt
[job createfile.cwl] completed success
{
    "example_out": {
        "checksum": "sha1$0ec41f68473a70f91a09240595318e9edbe3017d", 
        "basename": "output.txt", 
        "location": "file:///home/wangjl/test/cwl_test/output.txt", 
        "path": "/home/wangjl/test/cwl_test/output.txt", 
        "class": "File", 
        "size": 29
    }
}
Final process status is success

$ cat output.txt 
Message is: Hello world! -v2

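How it works: the command to run is sh example.sh, so the file example.sh has to be generated dynamically.

InitialWorkDirRequirement requires a listing. The listing is an array, so in YAML each element begins with a - prefix. Here the array has a single element whose entryname is the name of the file to create, which must match the name used in baseCommand.

Finally, entry: |- is YAML block-scalar syntax: the indented lines that follow form one multi-line string, with the final newline stripped. Without it the content would have to be written on a single line.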
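
The two-stage handling of $ can be illustrated with a short Python sketch that turns the entry text into the final example.sh. It is an approximation written for these notes, not the runner's expansion logic: it only handles $(inputs.NAME) references and the \$ escape, and the name expand_entry is hypothetical.

```python
import re

# The `entry` text exactly as written in createfile.cwl.
ENTRY = r"""PREFIX='Message is:'
MSG="\${PREFIX} $(inputs.message)"
echo \${MSG}"""


def expand_entry(entry, inputs):
    """Expand $(inputs.NAME) references, then unescape \\$ into a literal $."""
    # Step 1: substitute references that are not preceded by a backslash.
    text = re.sub(
        r"(?<!\\)\$\(inputs\.(\w+)\)",
        lambda match: str(inputs[match.group(1)]),
        entry,
    )
    # Step 2: the escaped dollars become plain shell variable references.
    return text.replace("\\$", "$")


script = expand_entry(ENTRY, {"message": "Hello world! -v2"})
print(script)
```

The resulting script contains ${PREFIX} and ${MSG} for the shell to expand at run time, while the message text has already been filled in by the time the file is written.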

15. Staging Input Files //*

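What if a tool writes its output next to its input file, while input files normally live in a separate read-only directory?

InitialWorkDirRequirement stages the input file into the output directory (the working directory). This example uses the expression $(self.basename) to pass only the base name of the input file, that is, the name without the leading directory path.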

$ cat linkfile.cwl
#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: CommandLineTool
hints:
  DockerRequirement:
    dockerPull: openjdk:9.0.1-11-slim
baseCommand: javac

requirements:
  InitialWorkDirRequirement:
    listing:
      - $(inputs.src)
inputs:
  src:
    type: File
    inputBinding:
      position: 1
      valueFrom: $(self.basename)

outputs:
  classfile:
    type: File
    outputBinding:
      glob: "*.class"

$ cat arguments-job.yml
src:
  class: File
  path: Hello.java


With the newer cwl-runner:
$ cwl-runner linkfile.cwl arguments-job.yml
INFO /home/wangjl/.local/bin/cwl-runner 2.0.20200224214940
INFO Resolved 'linkfile.cwl' to 'file:///home/wangjl/test/cwl_test/linkfile.cwl'
INFO [job linkfile.cwl] /tmp/wtawcupq$ docker \
    run \
    -i \
    --mount=type=bind,source=/tmp/wtawcupq,target=/JKaiyz \
    --mount=type=bind,source=/tmp/vy14efwz,target=/tmp \
    --mount=type=bind,source=/home/wangjl/test/cwl_test/Hello.java,target=/JKaiyz/Hello.java,readonly \
    --workdir=/JKaiyz \
    --read-only=true \
    --user=1001:1001 \
    --rm \
    --env=TMPDIR=/tmp \
    --env=HOME=/JKaiyz \
    --cidfile=/tmp/kr8x374i/20210913170837-038313.cid \
    openjdk:9.0.1-11-slim \
    javac \
    Hello.java
INFO [job linkfile.cwl] Max memory used: 0MiB
INFO [job linkfile.cwl] completed success
{
    "classfile": {
        "location": "file:///home/wangjl/test/cwl_test/Hello.class",
        "basename": "Hello.class",
        "class": "File",
        "checksum": "sha1$6f2a091492a911598cbc1f01a5c73993a00abb22",
        "size": 427,
        "path": "/home/wangjl/test/cwl_test/Hello.class"
    }
}
INFO Final process status is success
$ java Hello 
Hello world, from Java!

Mounting the files and directories by hand with -v

$ docker run -it --rm \
    -v /tmp/qi7gv8jg:/AgmkPB \
    -v /tmp/_mfqect7:/tmp \
    -v /home/wangjl/test/cwl_test/Hello.java:/AgmkPB/Hello.java:ro \
    --workdir=/AgmkPB \
    --read-only=true \
    --cidfile=/tmp/lgio1cr4/20210910184532-748371.cid \
    --env=TMPDIR=/tmp \
    --env=HOME=/AgmkPB \
    openjdk:9.0.1-11-slim bash
root@b590f40db9c8:~# javac Hello.java
root@b590f40db9c8:~# ls
Hello.java
root@b590f40db9c8:~# java Hello
Hello world, from Java!

16. File Formats

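How do I declare the file format an input file must have, and the format of an output file? Use type: File and name the format in the format: field. Existing format identifiers come from IANA and from the EDAM ontology.

$namespaces and $schemas are explained in the next section; for now we simply use them. As an added benefit, cwltool does some basic reasoning about file formats and warns about obvious mismatches.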

$ cat metadata_example.cwl
#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: CommandLineTool

label: An example tool demonstrating metadata.

inputs:
  aligned_sequences:
    type: File
    label: Aligned sequences in BAM format
    format: edam:format_2572
    inputBinding:
      position: 1

baseCommand: [ wc, -l ]

stdout: output.txt

outputs:
  report:
    type: stdout
    format: edam:format_1964
    label: A text file that contains a line count

$namespaces:
  edam: http://edamontology.org/
$schemas:
  - http://edamontology.org/EDAM_1.18.owl

## The equivalent command line is: wc -l /path/to/aligned_sequences.ext > output.txt




Sample parameter file
$ cat sample.yml
aligned_sequences:
    class: File
    format: http://edamontology.org/format_2572
    path: file-formats.bam
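
The tool description writes the format as edam:format_2572, while the job file spells out the full IRI. A minimal sketch of how such a prefixed name relates to the full IRI through $namespaces (the helper expand_prefixed_name is hypothetical, and real runners resolve names with more care):

```python
def expand_prefixed_name(value, namespaces):
    """Expand 'prefix:local' to a full IRI when the prefix is a declared namespace."""
    prefix, sep, local = value.partition(":")
    if sep and prefix in namespaces:
        return namespaces[prefix] + local
    return value  # already a full IRI, or an unknown prefix: leave untouched

namespaces = {"edam": "http://edamontology.org/"}
print(expand_prefixed_name("edam:format_2572", namespaces))
```

Both spellings therefore name the same format, which is why the job file and the tool description agree.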

Download the file
$ wget https://github.com/common-workflow-language/user_guide/raw/gh-pages/_includes/cwl/16-file-formats/file-formats.bam
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... failed: Connection timed out.
-rw-rw-r-- 1 wangjl wangjl  45M Sep 10 19:55 file-formats.bam
$ samtools view file-formats.bam |wc -l
[W::bam_hdr_read] EOF marker is absent. The input is probably truncated
[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
[main_samview] truncated file.
288227





Run the tool
$ cwltool metadata_example.cwl sample.yml
INFO /home/wangjl/.local/bin/cwltool 3.1.20210825140344
INFO Resolved 'metadata_example.cwl' to 'file:///home/wangjl/test/cwl_test/metadata_example.cwl'
INFO [job metadata_example.cwl] /tmp/bd3g9ild$ wc \
    -l \
    /tmp/qazmmskk/stg334f6bcf-b994-4032-8a18-70f85336d1fa/file-formats.bam > /tmp/bd3g9ild/output.txt
INFO [job metadata_example.cwl] completed success
{
    "report": {
        "location": "file:///home/wangjl/test/cwl_test/output.txt",
        "basename": "output.txt",
        "class": "File",
        "checksum": "sha1$c2549632a2f5079926c146f3e0e2889a88fe88c0",
        "size": 77,
        "format": "http://edamontology.org/format_1964",
        "path": "/home/wangjl/test/cwl_test/output.txt"
    }
}
INFO Final process status is success

Check the output
$ cat output.txt 
13698 /tmp/qazmmskk/stg334f6bcf-b994-4032-8a18-70f85336d1fa/file-formats.bam

17. Metadata and Authorship //run failed

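How do I record authorship and other metadata, and add citation information?

This is an optional extension. Tool and workflow developers are encouraged to provide at least the minimal metadata shown below. The example also shows how to declare a citation.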

$ cat metadata_example2.cwl
#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: CommandLineTool

label: An example tool demonstrating metadata.
doc: Note that this is an example and the metadata is not necessarily consistent.

inputs:
  aligned_sequences:
    type: File
    label: Aligned sequences in BAM format
    format: edam:format_2572
    inputBinding:
      position: 1

baseCommand: [ wc, -l ]

stdout: output.txt

outputs:
  report:
    type: stdout
    format: edam:format_1964
    label: A text file that contains a line count

s:author:
  - class: s:Person
    s:identifier: https://orcid.org/0000-0002-6130-1021
    s:email: mailto:dyuen@oicr.on.ca
    s:name: Denis Yuen

s:contributor:
  - class: s:Person
    s:identifier: http://orcid.org/0000-0002-7681-6415
    s:email: mailto:briandoconnor@gmail.com
    s:name: Brian O'Connor

s:citation: https://dx.doi.org/10.6084/m9.figshare.3115156.v2
s:codeRepository: https://github.com/common-workflow-language/common-workflow-language
s:dateCreated: "2016-12-13"
s:license: https://spdx.org/licenses/Apache-2.0 

$namespaces:
  s: https://schema.org/
  edam: http://edamontology.org/

$schemas:
 - https://schema.org/version/latest/schemaorg-current-https.rdf
 - http://edamontology.org/EDAM_1.18.owl


# The above is equivalent to: wc -l /path/to/aligned_sequences.ext > output.txt

Run the tool
$ cwl-runner metadata_example2.cwl sample.yml
//run failed

Extended Example

$ cat metadata_example3.cwl
#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: CommandLineTool

label: An example tool demonstrating metadata.
doc: Note that this is an example and the metadata is not necessarily consistent.

hints:
  ResourceRequirement:
    coresMin: 4

inputs:
  aligned_sequences:
    type: File
    label: Aligned sequences in BAM format
    format: edam:format_2572
    inputBinding:
      position: 1

baseCommand: [ wc, -l ]

stdout: output.txt

outputs:
  report:
    type: stdout
    format: edam:format_1964
    label: A text file that contains a line count

s:author:
  - class: s:Person
    s:identifier: https://orcid.org/0000-0002-6130-1021
    s:email: mailto:dyuen@oicr.on.ca
    s:name: Denis Yuen

s:contributor:
  - class: s:Person
    s:identifier: http://orcid.org/0000-0002-7681-6415
    s:email: mailto:briandoconnor@gmail.com
    s:name: Brian O'Connor

s:citation: https://dx.doi.org/10.6084/m9.figshare.3115156.v2
s:codeRepository: https://github.com/common-workflow-language/common-workflow-language
s:dateCreated: "2016-12-13"
s:license: https://spdx.org/licenses/Apache-2.0 

s:keywords: edam:topic_0091 , edam:topic_0622
s:programmingLanguage: C

$namespaces:
 s: https://schema.org/
 edam: http://edamontology.org/

$schemas:
 - https://schema.org/version/latest/schemaorg-current-http.rdf
 - http://edamontology.org/EDAM_1.18.owl


Run the command
$ cwl-runner metadata_example3.cwl sample.yml
// Run failed, most likely because of the firewall (network access blocked).

18. Custom Types //run failed

How do I define a custom type? This example converts a BIOM table to HDF5 format.

$ cat custom-types.cwl
#!/usr/bin/env cwl-runner 
cwlVersion: v1.0
class: CommandLineTool

requirements:
  InlineJavascriptRequirement: {}
  ResourceRequirement:
    coresMax: 1
    ramMin: 100  # just a default, could be lowered
  SchemaDefRequirement:
    types:
      - $import: biom-convert-table.yaml

hints:
  DockerRequirement:
    dockerPull: 'quay.io/biocontainers/biom-format:2.1.6--py27_0'
  SoftwareRequirement:
    packages:
      biom-format:
        specs: [ "https://doi.org/10.1186/2047-217X-1-7" ]
        version: [ "2.1.6" ]

inputs:
  biom:
    type: File
    format: edam:format_3746  # BIOM
    inputBinding:
      prefix: --input-fp
  table_type:  # one of an enumerated list of allowed table types
    type: biom-convert-table.yaml#table_type
    inputBinding:
      prefix: --table-type

  header_key:
    type: string?
    doc: |
      The observation metadata to include from the input BIOM table file when
      creating a tsv table file. By default no observation metadata will be
      included.
    inputBinding:
      prefix: --header-key

baseCommand: [ biom, convert ]

arguments:
  - valueFrom: $(inputs.biom.nameroot).hdf5  
    prefix: --output-fp
  - --to-hdf5

outputs:
  result:
    type: File
    outputBinding: { glob: "$(inputs.biom.nameroot)*" }

$namespaces:
  edam: http://edamontology.org/
  s: https://schema.org/

$schemas:
  - http://edamontology.org/EDAM_1.16.owl
  - https://schema.org/version/latest/schemaorg-current-http.rdf

s:license: https://spdx.org/licenses/Apache-2.0
s:copyrightHolder: "EMBL - European Bioinformatics Institute"



$ cat custom-types.yml
biom:
    class: File
    format: http://edamontology.org/format_3746
    path: rich_sparse_otu_table.biom
table_type: OTU table



Download the file
$ wget https://raw.githubusercontent.com/common-workflow-language/user_guide/gh-pages/_includes/cwl/19-custom-types/rich_sparse_otu_table.biom



$ cat biom-convert-table.yaml
type: enum
name: table_type
label: The type of the table to produce
symbols:
  - OTU table
  - Pathway table
  - Function table
  - Ortholog table
  - Gene table
  - Metabolite table
  - Taxon table
  - Table
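
An enum type restricts an input to one of the listed symbols. The check it implies can be sketched as follows; check_table_type is a hypothetical helper written for illustration, not part of any CWL runner.

```python
# The symbols listed in biom-convert-table.yaml above.
TABLE_TYPES = (
    "OTU table", "Pathway table", "Function table", "Ortholog table",
    "Gene table", "Metabolite table", "Taxon table", "Table",
)

def check_table_type(value):
    """Accept the value only if it is exactly one of the enum symbols."""
    if value not in TABLE_TYPES:
        raise ValueError(f"{value!r} is not one of {TABLE_TYPES}")
    return value

print(check_table_type("OTU table"))
```

The comparison is exact, so a value such as "otu table" would be rejected in this sketch.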

// run failed

19. Specifying Software Requirements //run failed

Use SoftwareRequirement to declare the software a tool depends on.

$ cat 01.cwl 
cwlVersion: v1.0
class: CommandLineTool

label: "InterProScan: protein sequence classifier"

doc: |
      Version 5.21-60 can be downloaded here:
      https://interproscan-docs.readthedocs.io/en/latest/HowToDownload.html

      Documentation on how to run InterProScan 5 can be found here:
      https://interproscan-docs.readthedocs.io/en/latest/HowToRun.html

requirements:
  ResourceRequirement:  # required software: InterProScan version 5.21-60
    ramMin: 10240
    coresMin: 3
  SchemaDefRequirement:
    types:
      - $import: InterProScan-apps.yml

hints:
  SoftwareRequirement: 
    packages:
      interproscan:
        specs: [ "https://identifiers.org/rrid/RRID:SCR_005829" ]
        version: [ "5.21-60" ]

inputs:
  proteinFile:
    type: File
    inputBinding:
      prefix: --input
  applications:
    type: InterProScan-apps.yml#apps[]?
    inputBinding:
      itemSeparator: ','
      prefix: --applications

baseCommand: interproscan.sh

arguments:
 - valueFrom: $(inputs.proteinFile.nameroot).i5_annotations
   prefix: --outfile
 - valueFrom: TSV
   prefix: --formats
 - --disable-precalc
 - --goterms
 - --pathways
 - valueFrom: $(runtime.tmpdir)
   prefix: --tempdir


outputs:
  i5Annotations:
    type: File
    format: iana:text/tab-separated-values
    outputBinding:
      glob: $(inputs.proteinFile.nameroot).i5_annotations

20. Writing Workflows

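This workflow extracts a Java source file from a tar archive and then compiles it.

Every step must have its own CWL description.

The top-level workflow inputs and outputs are described in inputs and outputs.

The individual steps are listed under steps. Execution order is determined by the connections between steps (which step consumes which output), not by the order in which they are written.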

$ cat 1st-workflow.cwl
#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: Workflow  # this is a workflow

inputs:
  tarball: File
  name_of_file_to_extract: string

outputs:
  compiled_class:
    type: File
    outputSource: compile/classfile

steps:
  untar:
    run: tar-param.cwl
    in:
      tarfile: tarball
      extractfile: name_of_file_to_extract
    out: [extracted_file]

  compile:
    run: arguments.cwl
    in:
      src: untar/extracted_file
    out: [classfile]
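
How a runner can derive the execution order from the step connections is sketched below. The steps dictionary mirrors the in: sections of 1st-workflow.cwl but is deliberately listed in the "wrong" order; execution_order is a hypothetical helper, and graphlib needs Python 3.9 or later.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Step inputs copied from the workflow above; "step/output" marks a dependency.
steps = {
    "compile": {"in": {"src": "untar/extracted_file"}},
    "untar": {"in": {"tarfile": "tarball",
                     "extractfile": "name_of_file_to_extract"}},
}

def execution_order(steps):
    """Order steps so that each one runs after the steps whose outputs it consumes."""
    graph = {}
    for name, step in steps.items():
        deps = set()
        for source in step["in"].values():
            if "/" in source:                 # "untar/extracted_file" -> depends on untar
                deps.add(source.split("/", 1)[0])
        graph[name] = deps
    return list(TopologicalSorter(graph).static_order())

print(execution_order(steps))
```

Sources without a slash are workflow-level inputs and create no dependency, so untar can start immediately.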


Input file
$ cat 1st-workflow-job.yml
tarball:
  class: File
  path: hello.tar
name_of_file_to_extract: Hello.java



Create the tar archive
$ tar -cvf hello.tar Hello.java
-rw-rw-r-- 1 wangjl wangjl  10K Sep 10 22:09 hello.tar

Run the workflow
$ cwl-runner 1st-workflow.cwl 1st-workflow-job.yml
INFO /usr/bin/cwl-runner 3.1.20210825140344
INFO Resolved '1st-workflow.cwl' to 'file:///home/wangjl/test/cwl_test/1st-workflow.cwl'
INFO [workflow ] start
INFO [workflow ] starting step untar  #step 1 starts
INFO [step untar] start
INFO [job untar] /tmp/v9t6cuyi$ tar \
    --extract \
    --file \
    /tmp/c7yx22pj/stgeb6f82c2-3eff-4718-90bc-bec036747b07/hello.tar \
    Hello.java
INFO [job untar] completed success
INFO [step untar] completed success
INFO [workflow ] starting step compile  #step 2 starts
INFO [step compile] start
INFO [job compile] /tmp/i034w_up$ docker \
    run \
    -i \
    --mount=type=bind,source=/tmp/i034w_up,target=/cPihUl \
    --mount=type=bind,source=/tmp/yg3_rlid,target=/tmp \
    --mount=type=bind,source=/tmp/v9t6cuyi/Hello.java,target=/var/lib/cwl/stg2ad810f9-d2cc-4d06-b192-2a46cad3aa56/Hello.java,readonly \
    --workdir=/cPihUl \
    --read-only=true \
    --user=1001:1001 \
    --rm \
    --cidfile=/tmp/f6ew8xpx/20210910221641-236483.cid \
    --env=TMPDIR=/tmp \
    --env=HOME=/cPihUl \
    openjdk:9.0.1-11-slim \
    javac \
    -d \
    /cPihUl \
    /var/lib/cwl/stg2ad810f9-d2cc-4d06-b192-2a46cad3aa56/Hello.java
docker: Error response from daemon: invalid mount config for type "bind": bind source path does not exist: /tmp/i034w_up.
See 'docker run --help'.
// Run failed, most likely because of some Docker issue.





Following the author's suggestion, prefix the command with TMPDIR=$PWD
$ TMPDIR=$PWD cwl-runner 1st-workflow.cwl 1st-workflow-job.yml
INFO /home/wangjl/.local/bin/cwl-runner 3.1.20210825140344
INFO Resolved '1st-workflow.cwl' to 'file:///home/wangjl/test/cwl_test/1st-workflow.cwl'
INFO [workflow ] start
INFO [workflow ] starting step untar
INFO [step untar] start
INFO [job untar] /home/wangjl/test/cwl_test/ckyo_6ne$ tar \
    --extract \
    --file \
    /home/wangjl/test/cwl_test/y_sxyxc3/stg8d21ddb0-25e9-4789-bd44-389e096728ca/hello.tar \
    Hello.java
INFO [job untar] completed success
INFO [step untar] completed success
INFO [workflow ] starting step compile
INFO [step compile] start
INFO [job compile] /home/wangjl/test/cwl_test/0ud2404c$ docker \
    run \
    -i \
    --mount=type=bind,source=/home/wangjl/test/cwl_test/0ud2404c,target=/bSkuqg \
    --mount=type=bind,source=/home/wangjl/test/cwl_test/phc7ntw0,target=/tmp \
    --mount=type=bind,source=/home/wangjl/test/cwl_test/ckyo_6ne/Hello.java,target=/var/lib/cwl/stg0031d60f-e925-4e47-8b1f-eb459b1f8e04/Hello.java,readonly \
    --workdir=/bSkuqg \
    --read-only=true \
    --user=1001:1001 \
    --rm \
    --cidfile=/home/wangjl/test/cwl_test/njg38xsg/20210911161824-144392.cid \
    --env=TMPDIR=/tmp \
    --env=HOME=/bSkuqg \
    openjdk:9.0.1-11-slim \
    javac \
    -d \
    /bSkuqg \
    /var/lib/cwl/stg0031d60f-e925-4e47-8b1f-eb459b1f8e04/Hello.java
INFO [job compile] Max memory used: 12MiB
INFO [job compile] completed success
INFO [step compile] completed success
INFO [workflow ] completed success
{
    "compiled_class": {
        "location": "file:///home/wangjl/test/cwl_test/Hello.class",
        "basename": "Hello.class",
        "class": "File",
        "checksum": "sha1$6f2a091492a911598cbc1f01a5c73993a00abb22",
        "size": 427,
        "path": "/home/wangjl/test/cwl_test/Hello.class"
    }
}
INFO Final process status is success
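
The log above shows the scratch directories moving under the current directory once TMPDIR is set. Python's standard tempfile module honours TMPDIR in the same way; whether cwltool goes through exactly this code path is an assumption. The helper default_tempdir_with is hypothetical and restores the previous state afterwards.

```python
import os
import tempfile

def default_tempdir_with(tmpdir):
    """Return the directory tempfile would pick if TMPDIR were set to `tmpdir`."""
    saved_env = os.environ.get("TMPDIR")
    saved_cache = tempfile.tempdir
    try:
        os.environ["TMPDIR"] = tmpdir
        tempfile.tempdir = None        # drop the cached default so TMPDIR is re-read
        return tempfile.gettempdir()
    finally:
        tempfile.tempdir = saved_cache  # restore the previous state
        if saved_env is None:
            os.environ.pop("TMPDIR", None)
        else:
            os.environ["TMPDIR"] = saved_env

print(default_tempdir_with(os.getcwd()))
```

If the requested directory is not writable, tempfile silently falls back to its other candidates such as /tmp.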

Now write a workflow of my own that runs, avoiding Docker for the time being.
# Step 1 extracts the file; step 2 counts its lines.


Step 1: extract
$ cat untar.cwl 
#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: CommandLineTool
baseCommand: [tar, --extract]
inputs:
  tarfile:
    type: File
    inputBinding:
      prefix: --file
  extractfile:
    type: string
    inputBinding:
      position: 1
outputs:
  extracted_file:
    type: File
    outputBinding:
      glob: $(inputs.extractfile)


$ cat untar-job.yml
tarfile:
  class: File 
  path: hello.tar
extractfile: Hello.java

$ cwl-runner untar.cwl untar-job.yml 
INFO /usr/bin/cwl-runner 3.1.20210825140344
INFO Resolved 'untar.cwl' to 'file:///home/wangjl/test/cwl_test/01/untar.cwl'
INFO [job untar.cwl] /tmp/3wihcpb3$ tar \
    --extract \
    --file \
    /tmp/kylyqeap/stgfd0762e0-0c2b-4013-b625-ef4338c57c9e/hello.tar \
    Hello.java
INFO [job untar.cwl] completed success
{
    "extracted_file": {
        "location": "file:///home/wangjl/test/cwl_test/01/Hello.java",
        "basename": "Hello.java",
        "class": "File",
        "checksum": "sha1$0428d5d333af9c0c61c7626a6962e549b5f97394",
        "size": 125,
        "path": "/home/wangjl/test/cwl_test/01/Hello.java"
    }
}
INFO Final process status is success




Step 2: count the lines
$ cat count.cwl 
#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: CommandLineTool
baseCommand: [wc, -l]
inputs:
  textfile:
    type: File
    inputBinding:
      position: 1

stdout: output.txt
outputs:
  report:
    type: stdout

$ cat count-job.yml 
textfile:
  class: File 
  path: Hello.java

$ cwl-runner count.cwl count-job.yml 
INFO /usr/bin/cwl-runner 3.1.20210825140344
INFO Resolved 'count.cwl' to 'file:///home/wangjl/test/cwl_test/01/count.cwl'
INFO [job count.cwl] /tmp/brzjqzum$ wc \
    -l \
    /tmp/9h7cil4b/stg6e9161fa-69ba-4dec-881f-1bef9773a2a3/Hello.java > /tmp/brzjqzum/output.txt
INFO [job count.cwl] completed success
{
    "report": {
        "location": "file:///home/wangjl/test/cwl_test/01/output.txt",
        "basename": "output.txt",
        "class": "File",
        "checksum": "sha1$c922a0dd20c2d9239d01172741049df4295b4080",
        "size": 67,
        "path": "/home/wangjl/test/cwl_test/01/output.txt"
    }
}
INFO Final process status is success





Chain the two together
$ cat 2nd-workflow.cwl 
#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: Workflow  # this is a workflow

inputs:  # workflow-level inputs
  tarball2: File
  ex_file2: string

outputs:  # workflow-level outputs
  report:
    type: File
    outputSource: stat/report # this is the output of the stat step

steps:
  untar:
    run: untar.cwl
    in:  # map step inputs to workflow inputs
      tarfile: tarball2
      extractfile: ex_file2
    out: [extracted_file] # must match an output declared in untar.cwl

  stat:
    run: count.cwl
    in: 
      textfile: untar/extracted_file # step 2 takes the output of step 1 as its input
    out: [report] # must match an output declared in count.cwl


$ cat 2nd-workflow-job.yml
tarball2:
  class: File
  path: hello.tar
ex_file2: Hello.java

$ cwl-runner 2nd-workflow.cwl 2nd-workflow-job.yml
INFO /usr/bin/cwl-runner 3.1.20210825140344
INFO Resolved '2nd-workflow.cwl' to 'file:///home/wangjl/test/cwl_test/01/2nd-workflow.cwl'
INFO [workflow ] start
INFO [workflow ] starting step untar  ##step 1 starts
INFO [step untar] start
INFO [job untar] /tmp/x9gabmla$ tar \
    --extract \
    --file \
    /tmp/znqfkw8a/stgdd0747be-c853-4d04-b4a3-a86ce78202b1/hello.tar \
    Hello.java
INFO [job untar] completed success
INFO [step untar] completed success
INFO [workflow ] starting step stat ##step 2 starts
INFO [step stat] start
INFO [job stat] /tmp/8h7k35uh$ wc \
    -l \
    /tmp/44r4ovbr/stgc032b8d4-4694-45aa-ad02-c2f4b012744b/Hello.java > /tmp/8h7k35uh/output.txt
INFO [job stat] completed success
INFO [step stat] completed success
INFO [workflow ] completed success #the whole workflow is done
{
    "report": {
        "location": "file:///home/wangjl/test/cwl_test/01/output.txt",
        "basename": "output.txt",
        "class": "File",
        "checksum": "sha1$f4a82567a5262fc25f389c376d05058a7e167b55",
        "size": 67,
        "path": "/home/wangjl/test/cwl_test/01/output.txt"
    }
}
INFO Final process status is success

Check the result: only the final output file is written; the intermediate files are not kept.
$ cat output.txt 
5 /tmp/44r4ovbr/stgc032b8d4-4694-45aa-ad02-c2f4b012744b/Hello.java

21. Nested Workflows

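How do I connect several workflows together?

CWL chains individual commands into larger operations. A CWL workflow can itself be used as a tool, i.e. as a step of another workflow, provided the workflow engine supports SubworkflowFeatureRequirement. This example uses 1st-workflow.cwl as one of its steps.

A sub-workflow goes under steps, with the name of its CWL file as the value of run.

Use default to give a field a default value; it can be overridden by a value supplied in the input.

Use > (a YAML folded scalar) to ignore the line breaks in a long command that is split over several lines.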

$ cat nestedworkflows.cwl
#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: Workflow

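inputs: [] # no workflow-level inputs

outputs: # workflow-level outputs
  classout:
    type: File
    outputSource: compile/compiled_class

requirements: # allows nesting other CWL workflows
  SubworkflowFeatureRequirement: {}

steps: # list of steps
  compile: # step 2: extract and compile
    run: 1st-workflow.cwl # use a workflow as the tool for this step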
    in:
      tarball: create-tar/tar_compressed_java_file
      name_of_file_to_extract:
        default: "Hello.java"
    out: [compiled_class]

  create-tar: # step 1: generate the source file and pack it into a tar archive
    in: []
    out: [tar_compressed_java_file]
    run:
      class: CommandLineTool
      requirements:
        InitialWorkDirRequirement: # create a file at run time
          listing:
            - entryname: Hello.java
              entry: |
                public class Hello {
                  public static void main(String[] argv) {
                      System.out.println("Hello from Java -v3");
                  }
                }
      inputs: []
      baseCommand: [tar, --create, --file=hello.tar, Hello.java]
      outputs:
        tar_compressed_java_file:
          type: File
          streamable: true
          outputBinding:
            glob: "hello.tar"

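If the step is another CWL file, run is a single line. If it is a one-line command whose input is plain text, it can be written more compactly:
  run:
    class: CommandLineTool
    requirements:
      ShellCommandRequirement: {}
    arguments:
      - shellQuote: false # note: false keeps the command below from being quoted
        valueFrom: >  # '>' folds line breaks into spaces, whereas '|' keeps them
          tar cf hello.tar Hello.java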


$ cwltool nestedworkflows.cwl 
#Docker fails again: docker: Error response from daemon: invalid mount config for type "bind": bind source path does not exist: /tmp/bek7piq3.

$ TMPDIR=$PWD cwltool nestedworkflows.cwl 
INFO /home/wangjl/.local/bin/cwltool 3.1.20210825140344
INFO Resolved 'nestedworkflows.cwl' to 'file:///home/wangjl/test/cwl_test/nestedworkflows.cwl'
INFO [workflow ] start
INFO [workflow ] starting step create-tar
INFO [step create-tar] start
INFO [job create-tar] /home/wangjl/test/cwl_test/bku7hys8$ tar \
    --create \
    --file=hello.tar \
    Hello.java
INFO [job create-tar] completed success
INFO [step create-tar] completed success
INFO [workflow ] starting step compile
INFO [step compile] start
INFO [workflow compile] start
INFO [workflow compile] starting step untar
INFO [step untar] start
INFO [job untar] /home/wangjl/test/cwl_test/y2sbvr3z$ tar \
    --extract \
    --file \
    /home/wangjl/test/cwl_test/s9yqynk1/stg35844453-cc0d-4b99-9973-0ce4d4988bb1/hello.tar \
    Hello.java
INFO [job untar] completed success
INFO [step untar] completed success
INFO [workflow compile] starting step compile_2
INFO [step compile_2] start
INFO [job compile] /home/wangjl/test/cwl_test/jup893uw$ docker \
    run \
    -i \
    --mount=type=bind,source=/home/wangjl/test/cwl_test/jup893uw,target=/nNpCnJ \
    --mount=type=bind,source=/home/wangjl/test/cwl_test/7ofd3gou,target=/tmp \
    --mount=type=bind,source=/home/wangjl/test/cwl_test/y2sbvr3z/Hello.java,target=/var/lib/cwl/stgca08da2c-6944-4548-8552-cd9849536d09/Hello.java,readonly \
    --workdir=/nNpCnJ \
    --read-only=true \
    --user=1001:1001 \
    --rm \
    --cidfile=/home/wangjl/test/cwl_test/mcobklzg/20210911164230-548358.cid \
    --env=TMPDIR=/tmp \
    --env=HOME=/nNpCnJ \
    openjdk:9.0.1-11-slim \
    javac \
    -d \
    /nNpCnJ \
    /var/lib/cwl/stgca08da2c-6944-4548-8552-cd9849536d09/Hello.java
INFO [job compile] Max memory used: 9MiB
INFO [job compile] completed success
INFO [step compile_2] completed success
INFO [workflow compile] completed success
INFO [step compile] completed success
INFO [workflow ] completed success
{
    "classout": {
        "location": "file:///home/wangjl/test/cwl_test/Hello.class",
        "basename": "Hello.class",
        "class": "File",
        "checksum": "sha1$4666cc2224ca7ee7298c3181291457c6f4e1ab72",
        "size": 423,
        "path": "/home/wangjl/test/cwl_test/Hello.class"
    }
}
INFO Final process status is success


Check the result
$ java Hello 
Hello from Java -v3

22. Scattering Workflows

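How do I run a tool or workflow in parallel?

ScatterFeatureRequirement tells the workflow runner that you want to run a tool or workflow several times over a list of inputs. The workflow takes the input as an array and runs the step on each element as if it were a single input.

This way the same workflow runs over many inputs without generating a separate YAML input file for each.

The most common task for newcomers is running the same analysis on different samples. This example takes several inputs but runs 1st-tool.cwl on every one of them.

Tip: add a scatter field to every step that should be scattered. The scatter field refers to step-level inputs only, not workflow-level inputs, and each step's scatter is independent.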

$ cat scatter-workflow.cwl
#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: Workflow

requirements:
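  ScatterFeatureRequirement: {} # enable scatter support

inputs:
  message_array: string[]  # input: an array of strings

steps:
  echo: # the step
    run: 1st-tool.cwl
    scatter: message  # scatter over this step input: each run receives one element of the workflow input array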
    in:
      message: message_array
    out: []

outputs: []


$ cat scatter-job.yml
message_array: 
  - Hello world!
  - Hola mundo!
  - Bonjour le monde!
  - Hallo welt!

Recall the very first tool: it simply prints a string
$ cat 1st-tool.cwl 
#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: CommandLineTool
baseCommand: echo
inputs:
  message:
    type: string
    inputBinding:
      position: 1
outputs: []


Run the workflow
$ cwl-runner scatter-workflow.cwl scatter-job.yml
INFO /home/wangjl/.local/bin/cwl-runner 3.1.20210825140344
INFO Resolved 'scatter-workflow.cwl' to 'file:///home/wangjl/test/cwl_test/scatter-workflow.cwl'
INFO [workflow ] start
INFO [workflow ] starting step echo
INFO [step echo] start
INFO [job echo] /tmp/db1dk4n2$ echo \
    'Hello world!'
Hello world!
INFO [job echo] completed success
INFO [step echo] start
INFO [job echo_2] /tmp/cwbotnz2$ echo \
    'Hola mundo!'
Hola mundo!
INFO [job echo_2] completed success
INFO [step echo] start
INFO [job echo_3] /tmp/9r0r093_$ echo \
    'Bonjour le monde!'
Bonjour le monde!
INFO [job echo_3] completed success
INFO [step echo] start
INFO [job echo_4] /tmp/rabx6e6g$ echo \
    'Hallo welt!'
Hallo welt!
INFO [job echo_4] completed success
INFO [step echo] completed success
INFO [workflow ] completed success
{}
INFO Final process status is success
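
The scatter semantics of this run can be sketched in Python: one call of the step per array element, with the results gathered back in input order. The function echo is only a stand-in for 1st-tool.cwl, scatter is a hypothetical helper, and a thread pool is just one possible way to run the calls side by side.

```python
from concurrent.futures import ThreadPoolExecutor

def echo(message):
    """Stand-in for 1st-tool.cwl: return what `echo` would print for one string."""
    return message + "\n"

def scatter(step, values, max_workers=4):
    """Run `step` once per element; results come back in the order of `values`."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(step, values))

# The four entries of scatter-job.yml.
message_array = ["Hello world!", "Hola mundo!", "Bonjour le monde!", "Hallo welt!"]
outputs = scatter(echo, message_array)
print(outputs)
```

The step itself still sees a single string each time; only the surrounding workflow knows about the array.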

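The example above calls echo on each element of message_array. What if we want to scatter over two steps of a workflow?

We run echo as before, but capture its stdout as an output instead of declaring outputs: [].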

$ cat 1st-tool-mod.cwl
#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: CommandLineTool
baseCommand: echo
inputs:
  message:
    type: string
    inputBinding:
      position: 1
outputs:
  echo_out:
    type: stdout



As the second step, add wc to count the characters
$ cat wc-tool.cwl
#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: CommandLineTool
baseCommand: wc
arguments: ["-c"]
inputs:
  input_file:
    type: File
    inputBinding:
      position: 1
outputs: []




Remember: the scatter field must appear on every step!

$ cat scatter-two-steps.cwl
#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: Workflow

requirements:
 ScatterFeatureRequirement: {}

inputs:
  message_array: string[] 

steps:
  echo: # step 1
    run: 1st-tool-mod.cwl
    scatter: message
    in:
      message: message_array
    out: [echo_out]
  wc: # step 2
    run: wc-tool.cwl
    scatter: input_file
    in:
      input_file: echo/echo_out
    out: []

outputs: []



Run the workflow
$ cwl-runner scatter-two-steps.cwl scatter-job.yml
INFO /home/wangjl/.local/bin/cwl-runner 3.1.20210825140344
INFO Resolved 'scatter-two-steps.cwl' to 'file:///home/wangjl/test/cwl_test/scatter-two-steps.cwl'
INFO [workflow ] start
INFO [workflow ] starting step echo
INFO [step echo] start
INFO [job echo] /tmp/fubxm4v0$ echo \
    'Hello world!' > /tmp/fubxm4v0/a16a6bd1d4b2cb97573ec80be0a59772521293b4
INFO [job echo] completed success
INFO [step echo] start
INFO [job echo_2] /tmp/ia5wl0mf$ echo \
    'Hola mundo!' > /tmp/ia5wl0mf/a16a6bd1d4b2cb97573ec80be0a59772521293b4
INFO [job echo_2] completed success
INFO [step echo] start
INFO [job echo_3] /tmp/vuhtx5wu$ echo \
    'Bonjour le monde!' > /tmp/vuhtx5wu/a16a6bd1d4b2cb97573ec80be0a59772521293b4
INFO [job echo_3] completed success
INFO [step echo] start
INFO [job echo_4] /tmp/l7189ohn$ echo \
    'Hallo welt!' > /tmp/l7189ohn/a16a6bd1d4b2cb97573ec80be0a59772521293b4
INFO [job echo_4] completed success
INFO [step echo] completed success
INFO [workflow ] starting step wc
INFO [step wc] start
INFO [job wc] /tmp/cxkfjhqn$ wc \
    -c \
    /tmp/ttrq4km3/stgd2cb7c5c-58d1-46de-8b79-ccb2e8e89d95/a16a6bd1d4b2cb97573ec80be0a59772521293b4
13 /tmp/ttrq4km3/stgd2cb7c5c-58d1-46de-8b79-ccb2e8e89d95/a16a6bd1d4b2cb97573ec80be0a59772521293b4
INFO [job wc] completed success
INFO [step wc] start
INFO [job wc_2] /tmp/9bhz6k1h$ wc \
    -c \
    /tmp/0j7ukk63/stg44445479-1ca5-479a-bd29-f8a36b3bf8b6/a16a6bd1d4b2cb97573ec80be0a59772521293b4
12 /tmp/0j7ukk63/stg44445479-1ca5-479a-bd29-f8a36b3bf8b6/a16a6bd1d4b2cb97573ec80be0a59772521293b4
INFO [job wc_2] completed success
INFO [step wc] start
INFO [job wc_3] /tmp/el77sgaj$ wc \
    -c \
    /tmp/a6mzs58s/stg94e8eed4-c74c-4f15-b832-913643e89158/a16a6bd1d4b2cb97573ec80be0a59772521293b4
18 /tmp/a6mzs58s/stg94e8eed4-c74c-4f15-b832-913643e89158/a16a6bd1d4b2cb97573ec80be0a59772521293b4
INFO [job wc_3] completed success
INFO [step wc] start
INFO [job wc_4] /tmp/b5d96x8g$ wc \
    -c \
    /tmp/vcim_k7j/stg5c547cf3-f509-4d5a-ad58-129f29f289c2/a16a6bd1d4b2cb97573ec80be0a59772521293b4
12 /tmp/vcim_k7j/stg5c547cf3-f509-4d5a-ad58-129f29f289c2/a16a6bd1d4b2cb97573ec80be0a59772521293b4
INFO [job wc_4] completed success
INFO [step wc] completed success
INFO [workflow ] completed success
{}
INFO Final process status is success


Check
$ cat scatter-job.yml | sed '1d'| awk -F" - " '{print $2}'|  while read id; do echo "$id" | wc -c; done
13
12
18
12

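Drawback: the second step for a given element does not actually depend on the first step having finished for all elements, so there is no need to wait for the whole first step before starting the second.

How can the samples be processed independently? Recall from section 21 that several steps can be wrapped into a single (sub-workflow) step, which can then be scattered.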

$ cat scatter-nested-workflow.cwl
#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: Workflow

requirements:
 ScatterFeatureRequirement: {}
 SubworkflowFeatureRequirement: {}

inputs:
  message_array: string[] 

steps:
  subworkflow:
    run: 
      class: Workflow
      inputs: 
        message: string
      outputs: []
      steps:
        echo: # step 1
          run: 1st-tool-mod.cwl
          in:
            message: message
          out: [echo_out]
        wc: # step 2
          run: wc-tool.cwl
          in:
            input_file: echo/echo_out
          out: []
    scatter: message
    in: 
      message: message_array
    out: []
outputs: []

Run the workflow
$ cwl-runner scatter-nested-workflow.cwl scatter-job.yml
INFO /home/wangjl/.local/bin/cwl-runner 3.1.20210825140344
INFO Resolved 'scatter-nested-workflow.cwl' to 'file:///home/wangjl/test/cwl_test/scatter-nested-workflow.cwl'
INFO [workflow ] start
INFO [workflow ] starting step subworkflow
INFO [step subworkflow] start
INFO [workflow subworkflow] start
INFO [workflow subworkflow] starting step echo
INFO [step echo] start
INFO [job echo] /tmp/op1makt4$ echo \
    'Hello world!' > /tmp/op1makt4/a16a6bd1d4b2cb97573ec80be0a59772521293b4
INFO [job echo] completed success
INFO [step echo] completed success
INFO [workflow subworkflow] starting step wc
INFO [step wc] start
INFO [job wc] /tmp/hj_sdv7i$ wc \
    -c \
    /tmp/2b_w_ahm/stg52f0d9e9-e1b7-449b-b40a-63ee336bdba5/a16a6bd1d4b2cb97573ec80be0a59772521293b4
13 /tmp/2b_w_ahm/stg52f0d9e9-e1b7-449b-b40a-63ee336bdba5/a16a6bd1d4b2cb97573ec80be0a59772521293b4
INFO [job wc] completed success
INFO [step wc] completed success
INFO [workflow subworkflow] completed success
INFO [step subworkflow] start
INFO [workflow subworkflow_2] start
INFO [workflow subworkflow_2] starting step echo_2
INFO [step echo_2] start
INFO [job echo_2] /tmp/bo4vi77e$ echo \
    'Hola mundo!' > /tmp/bo4vi77e/a16a6bd1d4b2cb97573ec80be0a59772521293b4
INFO [job echo_2] completed success
INFO [step echo_2] completed success
INFO [workflow subworkflow_2] starting step wc_2
INFO [step wc_2] start
INFO [job wc_2] /tmp/glqh4_sn$ wc \
    -c \
    /tmp/gmaixz4h/stgc67c1d38-d65b-4e67-8e74-c9ff5c944b54/a16a6bd1d4b2cb97573ec80be0a59772521293b4
12 /tmp/gmaixz4h/stgc67c1d38-d65b-4e67-8e74-c9ff5c944b54/a16a6bd1d4b2cb97573ec80be0a59772521293b4
INFO [job wc_2] completed success
INFO [step wc_2] completed success
INFO [workflow subworkflow_2] completed success
INFO [step subworkflow] start
INFO [workflow subworkflow_3] start
INFO [workflow subworkflow_3] starting step echo_3
INFO [step echo_3] start
INFO [job echo_3] /tmp/dzddzxif$ echo \
    'Bonjour le monde!' > /tmp/dzddzxif/a16a6bd1d4b2cb97573ec80be0a59772521293b4
INFO [job echo_3] completed success
INFO [step echo_3] completed success
INFO [workflow subworkflow_3] starting step wc_3
INFO [step wc_3] start
INFO [job wc_3] /tmp/ltrye2a0$ wc \
    -c \
    /tmp/v5ihkmuy/stg43268241-e731-401c-b230-568acf1310df/a16a6bd1d4b2cb97573ec80be0a59772521293b4
18 /tmp/v5ihkmuy/stg43268241-e731-401c-b230-568acf1310df/a16a6bd1d4b2cb97573ec80be0a59772521293b4
INFO [job wc_3] completed success
INFO [step wc_3] completed success
INFO [workflow subworkflow_3] completed success
INFO [step subworkflow] start
INFO [workflow subworkflow_4] start
INFO [workflow subworkflow_4] starting step echo_4
INFO [step echo_4] start
INFO [job echo_4] /tmp/xamv8tiq$ echo \
    'Hallo welt!' > /tmp/xamv8tiq/a16a6bd1d4b2cb97573ec80be0a59772521293b4
INFO [job echo_4] completed success
INFO [step echo_4] completed success
INFO [workflow subworkflow_4] starting step wc_4
INFO [step wc_4] start
INFO [job wc_4] /tmp/eymd2uxb$ wc \
    -c \
    /tmp/z6_m6ti3/stge5f9cf7b-cee1-4563-a755-ba1ce002b418/a16a6bd1d4b2cb97573ec80be0a59772521293b4
12 /tmp/z6_m6ti3/stge5f9cf7b-cee1-4563-a755-ba1ce002b418/a16a6bd1d4b2cb97573ec80be0a59772521293b4
INFO [job wc_4] completed success
INFO [step wc_4] completed success
INFO [workflow subworkflow_4] completed success
INFO [step subworkflow] completed success
INFO [workflow ] completed success
{}
INFO Final process status is success

23. Conditional workflows // did not run with my cwltool

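How do you add conditionals to a workflow?

In this example, whether a step runs depends on the input. This makes it possible to skip steps based on the initial inputs or on the results of a previous step.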

$ cat conditional-workflow.cwl
class: Workflow
cwlVersion: v1.2 # conditionals (when, pickValue) need CWL v1.2 or later
inputs:
  val: int

steps:

  step1: # first step
    in:
      in1: val
      a_new_var: val
    run: foo.cwl
    when: $(inputs.in1 < 1) # run foo.cwl only when the input is less than 1
    out: [out1]

  step2: # second step
    in:
      in1: val
      a_new_var: val
    run: foo.cwl
    when: $(inputs.a_new_var > 2) # run foo.cwl only when the input is greater than 2
    out: [out1]

outputs:
  out1:
    type: File # my foo.cwl below captures stdout, so out1 is a File, not a string
    outputSource:
      - step1/out1
      - step2/out1
    pickValue: first_non_null

requirements:
  InlineJavascriptRequirement: {}
  MultipleInputFeatureRequirement: {}
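
The two `when` guards leave a gap: for val 1 or 2 neither step runs, and according to the CWL v1.2 spec `pickValue: first_non_null` is an error when every source is null. A minimal shell model of that logic follows. It is plain shell, not CWL, and `pick_out1` is a hypothetical name used only for illustration.

```shell
# Model of the two `when` guards plus pickValue: first_non_null.
# Plain shell, not CWL; pick_out1 is a made-up helper name.
pick_out1() {
    val="$1"
    out=""
    # step1 runs only when val < 1, step2 only when val > 2.
    # first_non_null keeps the first source that produced a value.
    if [ "$val" -lt 1 ]; then
        out="step1/out1"
    elif [ "$val" -gt 2 ]; then
        out="step2/out1"
    fi
    if [ -z "$out" ]; then
        echo "val=$val: both steps skipped, nothing to pick (error)"
        return 1
    fi
    echo "val=$val: out1 comes from $out"
}

for val in 0 1 2 3; do
    pick_out1 "$val" || true
done
```

Under this model, val 0 and val 3 each yield one output, while val 1 and val 2 hit the error branch.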



The guide does not provide foo.cwl, so this is my own guess at it:
$ cat foo.cwl 
#!/usr/bin/env cwl-runner

class: CommandLineTool
cwlVersion: v1.0

baseCommand: echo
inputs:
  in1:
    type: int
    inputBinding:
      position: 1
  a_new_var:
    type: int
    inputBinding:
      position: 2
outputs:
  out1:
    type: stdout

$ cwltool foo.cwl --in1 1 --a_new_var 3
/usr/lib/python3/dist-packages/cwltool/docker.py:423: SyntaxWarning: "is" with a literal. Did you mean "=="?
  if res_req is not None and ("ramMin" in res_req or "ramMax" is res_req):
INFO /home/wangjl/.local/bin/cwltool 2.0.20200224214940
INFO Resolved 'foo.cwl' to 'file:///home/wangjl/test/cwl_test/foo.cwl'
INFO [job foo.cwl] /tmp/nzpvrdk5$ echo \
    1 \
    3 > /tmp/nzpvrdk5/ddb6cdca8d5bbe72eb7a275e5bc582e97701c7e1
INFO [job foo.cwl] completed success
{
    "out1": {
        "location": "file:///home/wangjl/test/cwl_test/ddb6cdca8d5bbe72eb7a275e5bc582e97701c7e1",
        "basename": "ddb6cdca8d5bbe72eb7a275e5bc582e97701c7e1",
        "class": "File",
        "checksum": "sha1$8e580831b8c3e40d0af8d438c773f994bcd894fd",
        "size": 4,
        "path": "/home/wangjl/test/cwl_test/ddb6cdca8d5bbe72eb7a275e5bc582e97701c7e1"
    }
}
INFO Final process status is success



$ cp conditional-workflow.cwl cond-wf-003.1.cwl 

$ cwltool cond-wf-003.1.cwl --val 0
/usr/lib/python3/dist-packages/cwltool/docker.py:423: SyntaxWarning: "is" with a literal. Did you mean "=="?
  if res_req is not None and ("ramMin" in res_req or "ramMax" is res_req):
INFO /home/wangjl/.local/bin/cwltool 2.0.20200224214940
INFO Resolved 'cond-wf-003.1.cwl' to 'file:///home/wangjl/test/cwl_test/cond-wf-003.1.cwl'
ERROR Tool definition failed validation:
The CWL reference runner no longer supports pre CWL v1.0 documents. Supported versions are: 
v1.0
v1.1
v1.1.0-dev1 (with --enable-dev flag only)
v1.2.0-dev1 (with --enable-dev flag only)


$ cwltool cond-wf-003.1.cwl --val 3
$ cwltool cond-wf-003.1.cwl --val 2 #error

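This cwltool (2.0.20200224214940) only knows v1.0 and v1.1, so the v1.2 workflow cannot run here. CWL v1.2 has since been released; a newer cwltool, such as the 3.1.20210825140344 used in the troubleshooting section below, should accept it.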
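
The error message lists only v1.0 and v1.1 as supported, which points at the age of the runner rather than at the workflow. Below is a minimal sketch for checking a cwltool version string before trying a v1.2 document. It assumes that v1.2 support arrived with the cwltool 3.x series (worth verifying against the cwltool release notes), and `cwltool_accepts_v12` is a hypothetical helper name.

```shell
# Hypothetical helper: guess from a cwltool version string whether the
# runner accepts cwlVersion: v1.2.
# Assumption: v1.2 support arrived with the cwltool 3.x series.
cwltool_accepts_v12() {
    major="${1%%.*}"              # text before the first dot, e.g. "3"
    case "$major" in
        ''|*[!0-9]*) return 2 ;;  # not a number: cannot tell
    esac
    [ "$major" -ge 3 ]
}

# `cwltool --version` prints "<path> <version>", so the last field is
# the version. With a real install one could use:
#   installed=$(cwltool --version | awk '{print $NF}')
for installed in 2.0.20200224214940 3.1.20210825140344; do
    if cwltool_accepts_v12 "$installed"; then
        echo "$installed: should accept v1.2"
    else
        echo "$installed: too old for v1.2"
    fi
done
```

Running `whereis cwltool` as in the Install section also helps here, since an older copy under /usr/bin and a newer one under ~/.local/bin can coexist.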

Troubleshooting

1. How do I mount a large file into the Docker container?

Use InitialWorkDirRequirement and add the input file to the list of files to be staged in the working directory.

# https://www.coder.work/article/7044398

cwlVersion: v1.0
class: CommandLineTool
baseCommand: cat

hints:
  DockerRequirement:
    dockerPull: alpine

inputs:
  in1:
    type: File
    inputBinding:
      position: 1
      valueFrom: $(self.basename)

requirements:
  InitialWorkDirRequirement:
    listing:
      - $(inputs.in1)

outputs:
  out1: stdout

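With the CWL reference runner (cwltool), the input file is mounted directly into the working directory (safely, in read-only mode).
Under CWL v1.0 the behavior is to mount the files rather than copy them.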

2. How do I change the temporary directory?

The CWL authors suggest setting TMPDIR, which makes the current directory the temporary directory instead of /tmp:
$ TMPDIR=$PWD cwltool arguments.cwl --src Hello.java

What I use instead, since it feels closer to how command-line tools are normally used:
$ cwltool --tmpdir-prefix=$PWD/ arguments.cwl --src Hello.java

$ TMPDIR=$PWD cwltool arguments.cwl arguments-job.yml
INFO /home/wangjl/.local/bin/cwltool 3.1.20210825140344
INFO Resolved 'arguments.cwl' to 'file:///home/wangjl/test/cwl_test/arguments.cwl'
INFO [job arguments.cwl] /home/wangjl/test/cwl_test/ki_j2ajd$ docker \
    run \
    -i \
    --mount=type=bind,source=/home/wangjl/test/cwl_test/ki_j2ajd,target=/jSRUIq \
    --mount=type=bind,source=/home/wangjl/test/cwl_test/wq21fiv3,target=/tmp \
    --mount=type=bind,source=/home/wangjl/test/cwl_test/Hello.java,target=/var/lib/cwl/stgf45b5122-4676-4659-8af6-02862ce21df9/Hello.java,readonly \
    --workdir=/jSRUIq \
    --read-only=true \
    --user=1001:1001 \
    --rm \
    --cidfile=/home/wangjl/test/cwl_test/1imr1hve/20210911160940-668311.cid \
    --env=TMPDIR=/tmp \
    --env=HOME=/jSRUIq \
    openjdk:9.0.1-11-slim \
    javac \
    -d \
    /jSRUIq \
    /var/lib/cwl/stgf45b5122-4676-4659-8af6-02862ce21df9/Hello.java
INFO [job arguments.cwl] Max memory used: 28MiB
INFO [job arguments.cwl] completed success
{
    "classfile": {
        "location": "file:///home/wangjl/test/cwl_test/Hello.class",
        "basename": "Hello.class",
        "class": "File",
        "checksum": "sha1$6f2a091492a911598cbc1f01a5c73993a00abb22",
        "size": 427,
        "path": "/home/wangjl/test/cwl_test/Hello.class"
    }
}
INFO Final process status is success
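
In the log above, cwltool created its temporary directories (ki_j2ajd, wq21fiv3, 1imr1hve) directly in the working directory. A minimal sketch that gives it a dedicated scratch directory instead; the `cwl-tmp.XXXXXX` template and the `scratch` variable are my own choices, not something cwltool requires.

```shell
# Sketch: point TMPDIR at a dedicated scratch directory under $PWD
# rather than at $PWD itself. The directory name is arbitrary.
scratch=$(mktemp -d "$PWD/cwl-tmp.XXXXXX")
echo "temporary files go under: $scratch"

# Run only when cwltool and the example tool are actually present.
if command -v cwltool >/dev/null 2>&1 && [ -f arguments.cwl ]; then
    TMPDIR="$scratch" cwltool arguments.cwl --src Hello.java
fi

# Once the outputs have been collected, the scratch directory can go:
#   rm -rf "$scratch"
```

Keeping the scratch directory under $PWD preserves the point of the workaround (staying away from /tmp) while keeping the working directory itself tidy.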