Introduction
Simian Wrapper makes reproducible research and operation “too easy not to do”. By securing a collaborative, reusable and transparent workflow, it provides building blocks for data integration and governance, the cornerstones of computational reproducibility.
Use cases for Simian Wrapper are found not only in environments where auditability and case-study tracking are paramount, driven by regulatory demands or the need to cover liabilities, but also in research environments where methodologies for innovative analytics are being explored and developed, which are iterative processes by definition.
Simian Wrapper adheres to the paradigms of The Turing Way, a handbook to reproducible, ethical, and collaborative data science (*). The handbook defines reproducible work as work that can be independently recreated from the same data and the same code originally used in an analytics case study. By design, Simian Wrapper facilitates the four dimensions of reproducible operation coined by The Turing Way: reproducible, replicable, robust, and generalisable.
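The Turing Way tells these dimensions apart by whether a re-run uses the same or different data and the same or different analysis. A minimal Python sketch of that classification follows; the function name is hypothetical and is not part of Simian Wrapper.

```python
def turing_way_dimension(same_data: bool, same_analysis: bool) -> str:
    """Classify a re-run using The Turing Way's data/analysis matrix."""
    if same_data and same_analysis:
        return "reproducible"
    if same_analysis:
        # Different data, same analysis.
        return "replicable"
    if same_data:
        # Same data, different analysis.
        return "robust"
    # Different data, different analysis.
    return "generalisable"
```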
Benefits
Harnessing your models in Simian Wrapper offers benefits beyond the key objective of computational reproducibility, some of which may appear less obvious at first sight:
Track software versions and ingested data.
Capture history and narrative of case studies.
Achieve a common, in-depth understanding through inherent coding consistency.
Facilitate useful, hands-on contributions and changes.
Prevent software and data agony.
Work with a single source of truth.
Maintain easy access to all generated results.
Make underlying data available, to facilitate ever more elaborate reporting.
Be truly auditable, by consistent storage of metadata, data, and results.
Be prepared for ever stricter regulatory requirements.
Communicate work with different stakeholders.
Easily share or hand over work, to let others continue.
Solution
To facilitate reproducible operation, Simian Wrapper features a distinct implementation workflow:
- Creation
- Data integration
- Governance
- Analysis & publication
How it all works
At creation, populate libraries for backends and frontends (where applicable):
- Host one-to-many versions of a particular model.
- Host one-to-many methodical variations of a particular model.
- Host one-to-many models belonging to a particular (business) domain.
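To illustrate how one library could host several versions, methodical variations and domains side by side, here is a minimal Python sketch. `ModelKey`, `ModelLibrary` and the example model names are assumptions made for illustration only; they do not represent the Simian Wrapper API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass(frozen=True)
class ModelKey:
    """Hypothetical identifier: domain, model, methodical variant, version."""
    domain: str
    model: str
    variant: str
    version: str


@dataclass
class ModelLibrary:
    """Hypothetical in-memory library of model implementations."""
    _entries: Dict[ModelKey, Callable] = field(default_factory=dict)

    def register(self, key: ModelKey, func: Callable) -> None:
        # Registered versions are immutable: refuse to overwrite an entry.
        if key in self._entries:
            raise ValueError(f"{key} is already registered")
        self._entries[key] = func

    def get(self, key: ModelKey) -> Callable:
        return self._entries[key]

    def versions(self, domain: str, model: str) -> List[str]:
        # Plain string sort; sufficient for this illustration.
        return sorted(
            {k.version for k in self._entries
             if k.domain == domain and k.model == model}
        )


library = ModelLibrary()
library.register(ModelKey("credit", "pd_model", "baseline", "1.0.0"), lambda x: 2 * x)
library.register(ModelKey("credit", "pd_model", "baseline", "1.1.0"), lambda x: 2 * x + 1)
```

Refusing to overwrite a registered key is a design choice in this sketch: an old job can only be re-run faithfully if the version it refers to never changes.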
At ingestion, define input data in a structured manner:
- Populate data dictionaries to source data from multiple sources.
- Preprocess and enhance the captured data in jobs.
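A data dictionary of this kind can be pictured as a mapping from each model input to its source, its field and an optional preprocessing step. The Python sketch below assumes a plain dictionary layout and a hypothetical `ingest` helper; the actual Simian Wrapper format may differ.

```python
# Hypothetical data dictionary: each model input names a source, a field
# within that source, and a preprocessing step applied on ingestion.
DATA_DICTIONARY = {
    "notional": {"source": "trades", "field": "amount", "preprocess": float},
    "currency": {"source": "trades", "field": "ccy", "preprocess": str.upper},
    "fx_rate": {"source": "market", "field": "eurusd", "preprocess": float},
}


def ingest(dictionary, sources):
    """Collect and preprocess the inputs described by the data dictionary."""
    inputs = {}
    for name, entry in dictionary.items():
        raw = sources[entry["source"]][entry["field"]]
        preprocess = entry.get("preprocess") or (lambda value: value)
        inputs[name] = preprocess(raw)
    return inputs


# Two illustrative sources delivering raw string values.
sources = {
    "trades": {"amount": "1000.50", "ccy": "eur"},
    "market": {"eurusd": "1.08"},
}
inputs = ingest(DATA_DICTIONARY, sources)
```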
At operation, secure governance:
- Code with confidence, using the DataStore’s consistent coding interface.
- Capture metadata, input data and results in binary job files.
- Guarantee computational reproducibility with job files and version-controlled code libraries.
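The idea of a job file that bundles metadata, input data and results can be sketched in Python as follows. This is an illustration under stated assumptions: the job layout, the function names and the use of `pickle` as the binary format are all hypothetical and say nothing about how Simian Wrapper stores jobs.

```python
import pickle
import tempfile
from pathlib import Path


def model(inputs):
    """Stand-in for a version-controlled model (hypothetical)."""
    return {"exposure": inputs["notional"] * inputs["fx_rate"]}


def run_job(inputs, code_version):
    """Run the model and bundle metadata, inputs and results in one job."""
    return {
        "metadata": {"code_version": code_version},
        "inputs": inputs,
        "results": model(inputs),
    }


def save_job(job, path):
    path.write_bytes(pickle.dumps(job))


def load_job(path):
    # Only unpickle job files from a trusted origin.
    return pickle.loads(path.read_bytes())


def is_reproducible(job):
    """Re-run the model on the stored inputs and compare with stored results."""
    return model(job["inputs"]) == job["results"]


with tempfile.TemporaryDirectory() as tmp:
    job_path = Path(tmp) / "job-0001.bin"
    save_job(run_job({"notional": 1000.0, "fx_rate": 1.25}, "1.1.0"), job_path)
    restored = load_job(job_path)

print(is_reproducible(restored))
```

In practice the check would first restore the code version recorded in the metadata before re-running the model; the sketch skips that step.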
At evaluation, ensure efficient review and reporting:
- Obtain data usage insights with data lineage analysis.
- Easily share jobs, enabling collaboration across domains of expertise.
- Report efficiently and ensure continuity, using the consistent interfaces.
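Data lineage analysis, in its simplest form, answers the question of which jobs consumed which source data. The Python sketch below assumes that every job records the source of each input; the job structure and the `lineage_by_source` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical job records: each input name maps to the source it came from.
jobs = [
    {"id": "job-1", "inputs": {"notional": "trades.amount", "fx_rate": "market.eurusd"}},
    {"id": "job-2", "inputs": {"fx_rate": "market.eurusd"}},
]


def lineage_by_source(jobs):
    """Map each data source to the sorted list of jobs that consumed it."""
    usage = defaultdict(set)
    for job in jobs:
        for source in job["inputs"].values():
            usage[source].add(job["id"])
    return {source: sorted(ids) for source, ids in usage.items()}


usage = lineage_by_source(jobs)
```

With such a mapping, a correction to one source immediately identifies the jobs that need to be reviewed or re-run.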
The Simian Wrapper components, taken together, cover the four dimensions of reproducibility. They ultimately offer a pathway towards generalizable results.