4.1. The junifer Pipeline

The junifer pipeline is the main execution path of junifer. It consists of five steps:

  1. Data Grabber: Interpret the dataset and provide a list of files.

  2. Data Reader: Read the files.

  3. Preprocess: Prepare the files’ data for marker computation.

  4. Marker Computation: Compute the marker(s).

  5. Storage: Store the values of the computed marker(s).

The element that is passed across the pipeline is called the Data Object.
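The five steps can be pictured as functions that each receive the Data Object and hand it on to the next step. The following is only a minimal sketch under stated assumptions, not junifer's actual API: the `DataObject` alias, the function names, the `"BOLD"` key, and the fake file path and data are all hypothetical and chosen for illustration.

```python
from typing import Any, Dict, Tuple

# Hypothetical stand-in for the Data Object: a plain dictionary.
DataObject = Dict[str, Any]


def grab(element: str) -> DataObject:
    """Data Grabber: map a dataset element to the files that belong to it."""
    return {"element": element, "BOLD": {"path": f"/data/{element}_bold.txt"}}


def read(obj: DataObject) -> DataObject:
    """Data Reader: load the file contents (faked here to stay self-contained)."""
    obj["BOLD"]["data"] = [1.0, 2.0, 3.0, 4.0]
    return obj


def preprocess(obj: DataObject) -> DataObject:
    """Preprocess: prepare the data for marker computation (here: demean)."""
    data = obj["BOLD"]["data"]
    mean = sum(data) / len(data)
    obj["BOLD"]["data"] = [x - mean for x in data]
    return obj


def compute_marker(obj: DataObject) -> DataObject:
    """Marker Computation: compute one marker (here: the variance)."""
    data = obj["BOLD"]["data"]
    value = sum(x * x for x in data) / len(data)
    obj["marker"] = {"name": "variance", "value": value}
    return obj


def store(obj: DataObject, storage: Dict[Tuple[str, str], float]) -> None:
    """Storage: persist the marker value (here: into an in-memory dict)."""
    key = (obj["element"], obj["marker"]["name"])
    storage[key] = obj["marker"]["value"]


def run_pipeline(element: str, storage: Dict[Tuple[str, str], float]) -> None:
    """Pass a single Data Object through all five steps in order."""
    obj = grab(element)
    for step in (read, preprocess, compute_marker):
        obj = step(obj)
    store(obj, storage)
```

Calling `run_pipeline("sub-01", storage)` with an empty dictionary as `storage` leaves one entry in it, keyed by the element and the marker name.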

The following is a graphical representation of the pipeline:

flowchart LR
  dg[Data Grabber]
  dr[Data Reader]
  pp[Preprocess]
  mc[Marker Computation]
  st[Storage]
  dg --> dr
  dr --> pp
  pp --> mc
  mc --> st

In practice, several markers are usually computed from the same data, so the Marker Computation step of the pipeline is defined as a list of markers. The following is a graphical representation of the pipeline executed with multiple markers:

flowchart LR
  dg[Data Grabber]
  dr[Data Reader]
  pp[Preprocess]
  mc1[Marker Computation]
  mc2[Marker Computation]
  mc3[Marker Computation]
  mc4[Marker Computation]
  mc5[Marker Computation]
  st1[Storage]
  st2[Storage]
  st3[Storage]
  st4[Storage]
  st5[Storage]
  dg --> dr
  dr --> pp
  pp --> mc1
  pp --> mc2
  pp --> mc3
  pp --> mc4
  pp --> mc5
  mc1 --> st1
  mc2 --> st2
  mc3 --> st3
  mc4 --> st4
  mc5 --> st5

Note

To avoid keeping all of the computed markers in memory, the storage step is called after each marker computation, which releases the memory used to compute that marker.
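The compute-then-store ordering described in this note can be sketched as a loop that stores each marker as soon as it is computed, rather than collecting all results first. This is a hedged illustration only, not junifer's implementation: `run_markers`, the `Marker` alias, and the `store` callback are hypothetical names introduced here.

```python
from typing import Callable, Dict, List

# Hypothetical marker type: a function from preprocessed data to a value.
Marker = Callable[[List[float]], float]


def run_markers(
    data: List[float],
    markers: Dict[str, Marker],
    store: Callable[[str, float], None],
) -> List[str]:
    """Compute each marker and store it immediately, one marker at a time.

    Returns a log of the operations so the ordering can be inspected.
    """
    log: List[str] = []
    for name, marker in markers.items():
        value = marker(data)
        log.append(f"compute:{name}")
        store(name, value)
        log.append(f"store:{name}")
        # Drop the local reference before the next marker is computed.
        # With a storage that writes to disk, this lets the memory used
        # by the marker be reclaimed; at most one result is held at a time.
        del value
    return log
```

For example, with two markers named `mean` and `max`, the returned log alternates `compute:` and `store:` entries, which shows that no marker waits for the others before being stored.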