
Introduction

Simian Wrapper makes reproducible research and operations “too easy not to do”. By securing a collaborative, reusable and transparent workflow, it provides building blocks for data integration and governance, the cornerstones of computational reproducibility.

Use cases for Simian Wrapper are found not only in environments where auditability and case-study tracking are paramount, driven by regulatory demands or the need to cover liabilities, but also in research environments where methodologies for innovative analytics are being explored and developed, which are iterative processes by definition.

Simian Wrapper follows the definitions of The Turing Way, a handbook for reproducible, ethical, and collaborative data science (*): work is reproducible when it can be independently recreated from the same data and the same code originally used in an analytics case study. By design, Simian Wrapper facilitates the four dimensions of reproducible operation described by The Turing Way:

Reproducible: when the same analysis steps, performed on the same dataset, consistently produce the same answer.
Replicable: when the same analysis, performed on different datasets, produces qualitatively similar answers.
Robust: when the same dataset is subjected to different analysis methodologies or workflows (for example, across model versions or programming languages) to perform the same computational operation, and a qualitatively similar or (near-)identical answer is produced.
Generalisable: combining replicable and robust findings helps to derive generalisable results. Although more steps are needed to establish how well consistency holds under various conditions, generalisation is an important step towards results that are qualitatively similar, relatively independent of datasets, model versions or pipelines.
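
The first three dimensions can be made concrete with a small, generic Python sketch. It does not use Simian Wrapper itself: the `analysis` function, the two datasets and the tolerance of 0.25 are assumptions chosen purely for illustration.

```python
import random
import statistics

def analysis(data, seed=0):
    """Bootstrap estimate of the mean; the fixed seed makes resampling deterministic."""
    rng = random.Random(seed)
    means = []
    for _ in range(200):
        sample = [rng.choice(data) for _ in data]
        means.append(statistics.fmean(sample))
    return statistics.fmean(means)

# Two illustrative datasets (made up for this example).
dataset_a = [4.8, 5.1, 5.0, 4.9, 5.2, 5.0]
dataset_b = [5.0, 4.7, 5.3, 5.1, 4.9, 5.0]

# What counts as "qualitatively similar" is an assumption of this sketch.
TOLERANCE = 0.25

# Reproducible: same analysis, same dataset, identical answer.
first_run = analysis(dataset_a)
second_run = analysis(dataset_a)
is_reproducible = first_run == second_run

# Replicable: same analysis, different dataset, similar answer.
is_replicable = abs(first_run - analysis(dataset_b)) < TOLERANCE

# Robust: different method (median instead of bootstrap mean), same dataset.
is_robust = abs(first_run - statistics.median(dataset_a)) < TOLERANCE
```

Generalisability is not a single check of this kind; it follows from observing replicable and robust behaviour across many datasets and methods.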

Simian Wrapper supports both GUI and scripted applications.

(*) https://the-turing-way.netlify.app, The Turing Way project illustration by Scriberia. Used under a CC-BY 4.0 licence. DOI: 10.5281/zenodo.3332807.

 

Benefits

Wrapping your models in Simian Wrapper offers benefits beyond the key objective of computational reproducibility, some of which may be less obvious at first sight:

 

Track case study history

Track software versions and ingested data.

Capture history and narrative of case studies.

Collaborate & review

Achieve a common, in-depth understanding through inherent coding consistency.

Facilitate useful, hands-on contributions and changes.

Avoid misinformation

Prevent software and data agony.

Work with a single source of truth.

Efficient reporting

Maintain easy access to all generated results.

Make the underlying data available to support ever more elaborate reporting.

Audit & reproduction

Be truly auditable through consistent storage of metadata, data, and results.

Be prepared for ever stricter regulatory requirements.

Ensure continuity

Communicate your work to different stakeholders.

Easily share or hand over work, so that others can continue.

Solution

To facilitate reproducible operation, Simian Wrapper features a distinct implementation workflow:

 


Creation

  • Project definition
  • Library of backends
  • Library of frontends

Data integration

  • Data dictionary
  • Data ingestion
  • Preprocessing

Governance

  • DataStore
  • Job files
  • Computational reproducibility

Analysis & publication

  • Results
  • Reporting
  • Data lineage

 

How it all works

Create your analytics apps
At creation, populate libraries for backends and frontends (where applicable):

  • Host one or more versions of a particular model.
  • Host one or more methodical variations of a particular model.
  • Host one or more models belonging to a particular (business) domain.
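
As an illustration of hosting several versions of one model side by side, the following Python sketch keeps backends in a registry keyed by name and version. It is not the Simian Wrapper API: the `register` decorator, the `BACKENDS` dictionary and the `risk_score` model are hypothetical names invented for this example.

```python
from typing import Callable, Dict, Tuple

# Hypothetical registry: (model name, version) -> backend function.
BackendKey = Tuple[str, str]
BACKENDS: Dict[BackendKey, Callable[[int], int]] = {}

def register(name: str, version: str):
    """Decorator that adds a backend function to the library."""
    def decorator(func):
        key = (name, version)
        if key in BACKENDS:
            raise ValueError(f"backend {key} is already registered")
        BACKENDS[key] = func
        return func
    return decorator

@register("risk_score", "1.0")
def risk_score_v1(exposure: int) -> int:
    return 2 * exposure

@register("risk_score", "2.0")
def risk_score_v2(exposure: int) -> int:
    # A methodical variation of the same model.
    return 2 * exposure + 1

def run(name: str, version: str, exposure: int) -> int:
    """Run one explicitly chosen version, so a case study records which one it used."""
    return BACKENDS[(name, version)](exposure)

def versions(name: str):
    """List the hosted versions of a model."""
    return sorted(v for (n, v) in BACKENDS if n == name)
```

Selecting a backend by an explicit version, rather than "the latest", is what lets an earlier case study be rerun against the code it originally used.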
Onboard your assorted data
At ingestion, define input data in a structured manner:

  • Populate data dictionaries to source data from multiple sources.
  • Preprocess and enhance the captured data.
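
The idea of a data dictionary can be sketched in plain Python as a mapping from logical variable names to the source and column they come from. Everything below is an assumption made for illustration (the in-memory CSV sources, the `DATA_DICTIONARY` layout and the `ingest` and `preprocess` helpers); it does not show the data dictionary format Simian Wrapper actually uses.

```python
import csv
import io

# Two illustrative sources, held in memory to keep the sketch self-contained.
SOURCES = {
    "sales_csv": "region,revenue_eur\nnorth,1200\nsouth,950\n",
    "staff_csv": "region,headcount\nnorth,12\nsouth,10\n",
}

# Hypothetical data dictionary: variable -> where it lives and how to parse it.
DATA_DICTIONARY = {
    "revenue": {"source": "sales_csv", "column": "revenue_eur", "type": float},
    "headcount": {"source": "staff_csv", "column": "headcount", "type": int},
}

def ingest(dictionary, sources, key="region"):
    """Collect every variable in the dictionary into one record per key value."""
    records = {}
    for variable, spec in dictionary.items():
        reader = csv.DictReader(io.StringIO(sources[spec["source"]]))
        for row in reader:
            value = spec["type"](row[spec["column"]])
            records.setdefault(row[key], {})[variable] = value
    return records

def preprocess(records):
    """Enhance the captured data with a derived variable."""
    for values in records.values():
        values["revenue_per_head"] = values["revenue"] / values["headcount"]
    return records

data = preprocess(ingest(DATA_DICTIONARY, SOURCES))
```

Because the models only see the logical names (`revenue`, `headcount`), a source can be swapped by editing the dictionary without touching the analysis code.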
Store your work in jobs

During operation, secure governance:

  • Code with confidence, using the DataStore’s consistent coding interface.
  • Capture metadata, input data and results in binary job files.
  • Guarantee computational reproducibility with job files and version-controlled code libraries.
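
How a job file and a pinned code version combine to make a result reproducible can be sketched as follows. This is a generic illustration, not the Simian Wrapper job file format: the use of `pickle`, the `CODE_VERSION` constant, the field names and the toy `model` function are all assumptions.

```python
import hashlib
import json
import pickle
import tempfile
from pathlib import Path

CODE_VERSION = "1.2.0"  # hypothetical version tag of the code library

def model(inputs):
    """Toy model standing in for a real analysis."""
    return {"total": sum(inputs["values"]), "count": len(inputs["values"])}

def fingerprint(obj):
    """Stable SHA-256 hash of JSON-serialisable data."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def save_job(path, inputs):
    """Store metadata, input data and results together in one binary file."""
    job = {
        "metadata": {
            "code_version": CODE_VERSION,
            "input_sha256": fingerprint(inputs),
        },
        "inputs": inputs,
        "results": model(inputs),
    }
    path.write_bytes(pickle.dumps(job))
    return job

def reproduce(path):
    """Reload a job, verify its metadata, rerun the model and compare results."""
    job = pickle.loads(path.read_bytes())
    meta = job["metadata"]
    if meta["code_version"] != CODE_VERSION:
        raise RuntimeError("job was created with a different code version")
    if meta["input_sha256"] != fingerprint(job["inputs"]):
        raise ValueError("stored input data does not match its fingerprint")
    return model(job["inputs"]) == job["results"]

with tempfile.TemporaryDirectory() as tmp:
    job_path = Path(tmp) / "case_study.job"
    save_job(job_path, {"values": [3, 5, 8]})
    reproduced = reproduce(job_path)
```

In this sketch, `pickle` is used only because it is a standard binary format; it should not be used to load files from untrusted sources.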
Report and publish your results
At evaluation, ensure efficient review and reporting:

  • Obtain data usage insights with data lineage analysis.
  • Easily share jobs, enabling collaboration across domains of expertise.
  • Report efficiently and ensure continuity, using the consistent interfaces.
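
Data lineage analysis can be pictured as walking a dependency graph from a reported result back to the raw sources it depends on. The sketch below is a minimal, generic version of that idea: the `LINEAGE` graph and its item names are invented for illustration and do not reflect how Simian Wrapper records lineage.

```python
# Hypothetical lineage graph: each item maps to the items it was derived from.
LINEAGE = {
    "report_table": ["revenue_per_head"],
    "revenue_per_head": ["revenue", "headcount"],
    "revenue": ["sales_csv"],
    "headcount": ["staff_csv"],
}

def upstream(item, lineage):
    """Return every item that the given item directly or indirectly depends on."""
    seen = set()
    stack = [item]
    while stack:
        current = stack.pop()
        for parent in lineage.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def raw_sources(item, lineage):
    """Upstream items with no recorded parents, i.e. the original data sources."""
    return sorted(n for n in upstream(item, lineage) if n not in lineage)
```

For example, `raw_sources("report_table", LINEAGE)` lists the source files that feed the report, which is the kind of data usage insight a reviewer or auditor typically asks for.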

The Simian Wrapper components, taken together, cover the four dimensions of reproducibility. They ultimately offer a pathway towards generalizable results.