Simian Wrapper
Introduction
Simian Wrapper makes reproducible research and operation “too easy not to do”. By securing a collaborative, reusable and transparent workflow, it provides building blocks for data integration and governance, the cornerstones of computational reproducibility.
Use cases for Simian Wrapper are found not only in environments where auditability and case-study tracking are paramount, driven by regulatory demands or the need to cover liabilities, but also in research environments where methodologies for innovative analytics are being explored and developed, which are iterative processes by definition.
Simian Wrapper follows The Turing Way, a handbook to reproducible, ethical, and collaborative data science (*), which defines reproducible work as work that can be independently recreated from the same data and the same code originally used in an analytics case study. By design, Simian Wrapper facilitates the four dimensions of reproducible operation described by The Turing Way: reproducible (same data, same analysis), replicable (different data, same analysis), robust (same data, different analysis), and generalizable (different data, different analysis).
Simian Wrapper supports both GUI and scripted applications.
(*) https://the-turing-way.netlify.app, The Turing Way project illustration by Scriberia. Used under a CC-BY 4.0 licence. DOI: 10.5281/zenodo.3332807.
Benefits
Harnessing your models in Simian Wrapper offers benefits beyond the key objective of computational reproducibility, some of which may appear less obvious at first sight:
- Track software versions and ingested data.
- Capture history and narrative of case studies.
- Achieve common, in-depth understanding through inherent coding consistency.
- Facilitate useful, hands-on contributions and changes.
- Prevent software and data agony.
- Work with a single source of truth.
- Maintain easy access to all generated results.
- Make underlying data available, to facilitate ever more elaborate reporting.
- Be truly auditable, by consistent storage of metadata, data, and results.
- Be prepared for ever stricter regulatory requirements.
- Communicate work with different stakeholders.
- Easily share or hand over work, so that others can continue.
Solution
To facilitate reproducible operation, Simian Wrapper features a distinct implementation workflow:
Creation → Data integration → Governance → Analysis & publication
How it all works
At creation, populate libraries for backends and frontends (where applicable):
- Host one-to-many versions of a particular model.
- Host one-to-many methodical variations of a particular model.
- Host one-to-many models belonging to a particular (business) domain.
At ingestion, define input data in a structured manner:
- Populate data dictionaries to source data from multiple sources.
- Preprocess and enhance the captured data in jobs.
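The ingestion step above can be pictured with a small sketch. It is not the Simian Wrapper API: `DictionaryEntry`, `ingest`, `preprocess`, the source labels and the sample values are all hypothetical, chosen only to show how a data dictionary might map named inputs to several sources before the captured data is enhanced.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class DictionaryEntry:
    """One data-dictionary row (hypothetical): where a named input comes from."""
    source: str              # label of the originating source
    load: Callable[[], Any]  # callable that fetches the raw value
    unit: str = ""           # optional unit, kept as metadata


def ingest(dictionary: Dict[str, DictionaryEntry]) -> Dict[str, Dict[str, Any]]:
    """Resolve every entry and keep the source label next to each value."""
    return {
        name: {"value": entry.load(), "source": entry.source, "unit": entry.unit}
        for name, entry in dictionary.items()
    }


def preprocess(inputs: Dict[str, Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Example enhancement: derive a new field from two captured ones."""
    enriched = dict(inputs)
    enriched["notional_eur"] = {
        "value": inputs["notional_usd"]["value"] * inputs["usd_to_eur"]["value"],
        "source": "derived",
        "unit": "EUR",
    }
    return enriched


# Invented sample data: two inputs drawn from two different sources.
data_dictionary = {
    "notional_usd": DictionaryEntry("ledger_db", lambda: 1000.0, "USD"),
    "usd_to_eur": DictionaryEntry("market_feed", lambda: 0.5),
}
job_inputs = preprocess(ingest(data_dictionary))
```

Keeping the source label beside each value is one possible design choice; it means later steps can still tell which inputs were captured and which were derived.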
At operation, secure the governance:
- Code with confidence, using the DataStore’s consistent coding interface.
- Capture metadata, input data and results in binary job files.
- Guarantee computational reproducibility with job files and version-controlled code libraries.
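As a rough illustration of why job files and versioned code together make a result reproducible, consider the sketch below. It assumes a job file is simply a pickled dictionary and that the code library is a mapping from version tag to callable; `MODEL_LIBRARY`, `run_job` and `reproduce_job` are hypothetical names, and the actual job file format used by Simian Wrapper is not described here.

```python
import pickle
import tempfile
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical version-controlled model library: version tag -> callable.
MODEL_LIBRARY = {
    "1.0.0": lambda x: {"total": sum(x["values"])},
    "1.1.0": lambda x: {"total": sum(x["values"]), "count": len(x["values"])},
}


def run_job(path, model_version, inputs):
    """Run a model and store metadata, inputs and results in one binary file."""
    results = MODEL_LIBRARY[model_version](inputs)
    job = {
        "metadata": {
            "model_version": model_version,
            "created": datetime.now(timezone.utc).isoformat(),
        },
        "inputs": inputs,
        "results": results,
    }
    Path(path).write_bytes(pickle.dumps(job))
    return results


def reproduce_job(path):
    """Re-run the recorded model version on the recorded inputs and compare."""
    job = pickle.loads(Path(path).read_bytes())
    model = MODEL_LIBRARY[job["metadata"]["model_version"]]
    return model(job["inputs"]) == job["results"]


# Demo in a temporary directory, so nothing is left on disk afterwards.
with tempfile.TemporaryDirectory() as tmp:
    job_path = Path(tmp) / "case_study.job"
    first_results = run_job(job_path, "1.1.0", {"values": [1, 2, 3]})
    is_reproducible = reproduce_job(job_path)
```

Because the job records the model version alongside the inputs, the check still holds after newer versions are added to the library. Note that `pickle` should only be used to load files from a trusted origin.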
At evaluation, ensure efficient review and reporting:
- Obtain data usage insights with data lineage analysis.
- Easily share jobs, enabling collaboration across domains of expertise.
- Report efficiently and ensure continuity, using the consistent interfaces.
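Data lineage analysis can be thought of as inverting the records kept per job. The sketch below assumes each job lists the source of every input field; the record layout, the job identifiers and `lineage_by_source` are hypothetical and only illustrate the idea.

```python
from collections import defaultdict

# Hypothetical job records: each maps an input field to the source it came from.
job_records = [
    {"id": "job-001", "sources": {"rates": "market_feed", "positions": "ledger_db"}},
    {"id": "job-002", "sources": {"rates": "market_feed"}},
    {"id": "job-003", "sources": {"positions": "manual_upload"}},
]


def lineage_by_source(records):
    """Return, per data source, the (job id, field) pairs that consumed it."""
    usage = defaultdict(list)
    for job in records:
        for field, source in job["sources"].items():
            usage[source].append((job["id"], field))
    return dict(usage)


lineage = lineage_by_source(job_records)
```

With such a mapping, a reviewer can ask which jobs would be affected if a given source turned out to be faulty.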
The Simian Wrapper components, taken together, cover the four dimensions of reproducibility. They ultimately offer a pathway towards generalizable results.