Simian Wrapper: your shortcut to computational reproducibility.
Introduction
Simian Wrapper facilitates reproducible research and operation in a way that is “too easy not to do”. By securing a collaborative, reusable, and transparent workflow, Simian Wrapper provides building blocks for data integration and governance – cornerstones of computational reproducibility.
Use cases for Simian Wrapper reside not only in environments where auditability and case-study tracking are paramount – driven by regulatory demands or the need to cover liabilities – but also in research environments where methodologies for innovative analytics are explored and developed: iterative processes by definition.
Following The Turing Way, a handbook for reproducible, ethical, and collaborative data science (*), we define reproducibility as work that can be independently recreated from the same data and the same code originally used in an analytics case study. By design, Simian Wrapper facilitates the four dimensions of reproducible operation coined by The Turing Way:
- Reproducible: same data, same analysis.
- Replicable: different data, same analysis.
- Robust: same data, different analysis.
- Generalisable: different data, different analysis.
Benefits
Harnessing your models in Simian Wrapper offers benefits beyond the primary goal of computational reproducibility – benefits that may not be evident at first glance:
- Track software versions and ingested data, and capture the history and narrative of your case studies.
- Create shared understanding through inherent coding consistency, facilitating meaningful collaboration.
- Operate from a single source of truth by design, inherently preventing data turmoil and ambiguity.
- Enjoy guaranteed, seamless access to all generated results and underlying data, enabling ever more elaborate reporting.
- Be truly auditable through consistent storage of metadata, data, and results, ensuring preparedness for ever stricter scrutiny.
- Communicate your work with different stakeholders and enjoy easy handovers, letting others continue seamlessly.
Solution
The Simian Wrapper components collectively address all four dimensions of reproducibility, ultimately providing a pathway to generalized results.
- Creation
- Data integration
- Governance
- Analysis & publication
How it all works
At creation, populate libraries for backends and frontends:
- Host multiple versions of a specific model.
- Host various methodological variations of a specific model.
- Host multiple models within a specific domain.
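As an illustration, a model library hosting multiple versions and variations might look like the sketch below. All class, method, and model names are hypothetical, chosen for this example; Simian Wrapper's actual API may differ.

```python
# Hypothetical sketch of a model library keyed by domain, model name,
# version, and methodological variant. Names are illustrative only.
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class ModelLibrary:
    # (domain, model, version, variant) -> callable backend
    _models: Dict[Tuple[str, str, str, str], Callable] = field(default_factory=dict)

    def register(self, domain: str, model: str, version: str,
                 variant: str, backend: Callable) -> None:
        """Host a backend under an explicit, fully qualified key."""
        self._models[(domain, model, version, variant)] = backend

    def resolve(self, domain: str, model: str, version: str,
                variant: str = "default") -> Callable:
        """Look up exactly one unambiguous backend implementation."""
        return self._models[(domain, model, version, variant)]


library = ModelLibrary()
library.register("pricing", "discount-curve", "1.0", "default",
                 lambda rate: 1.0 / (1.0 + rate))
library.register("pricing", "discount-curve", "1.1", "continuous",
                 lambda rate: 2.718281828459045 ** (-rate))

backend = library.resolve("pricing", "discount-curve", "1.0")
print(round(backend(0.05), 4))  # one-period discount factor: 0.9524
```

Because every lookup is fully qualified, two case studies can run different versions or variants of the same model side by side without ambiguity.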
At ingestion, define input data in a structured manner:
- Define dictionaries to source data from multiple origins.
- Fetch, preprocess and save the captured data in an immutable store.
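The ingestion steps above could be sketched as follows: a dictionary describing multiple data origins, and a store that files each captured dataset under its content hash so it can never silently change. The structure and function names are assumptions for illustration, not Simian Wrapper's actual API.

```python
# Hypothetical sketch: structured source dictionaries plus an
# immutable, content-addressed store. Names are illustrative only.
import hashlib
import json

# Dictionary defining data from multiple origins in a structured manner.
sources = {
    "rates": {"origin": "file", "path": "data/rates.csv"},
    "trades": {"origin": "api", "url": "https://example.com/trades"},
}


def store_immutably(store: dict, name: str, records: list) -> str:
    """Save preprocessed data keyed by its content hash, so the stored
    version underlying a case study can never silently change."""
    payload = json.dumps(records, sort_keys=True).encode()
    digest = hashlib.sha256(payload).hexdigest()
    store.setdefault(digest, {"name": name, "records": records})
    return digest


store: dict = {}
key = store_immutably(store, "rates", [{"tenor": "1Y", "rate": 0.05}])
print(key[:8], store[key]["name"])
```

Content addressing means re-ingesting identical data yields the identical key, so case studies referring to that key are reproducible by construction.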
At operation, intrinsically secure your governance:
- Code confidently with the consistent coding interface.
- Capture metadata, input data and computational results in job files.
- Enjoy reproducibility, with jobs and managed code repos.
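A job file that captures metadata, input data, and computational results might look like the sketch below. All field names and the schema are assumptions for illustration, not Simian Wrapper's actual job-file format.

```python
# Hypothetical sketch: a job file bundling metadata, inputs, and
# results so a run can be reproduced later. Schema is illustrative.
import json


def run_job(model_version: str, data_digest: str, inputs: dict, compute) -> dict:
    """Run a computation and capture everything needed to redo it."""
    results = compute(inputs)
    return {
        "metadata": {"model_version": model_version, "data_digest": data_digest},
        "inputs": inputs,
        "results": results,
    }


job = run_job(
    model_version="discount-curve/1.0",
    data_digest="example-digest",  # in practice, the immutable-store key
    inputs={"rate": 0.05, "periods": 2},
    compute=lambda p: {"factor": round((1 + p["rate"]) ** -p["periods"], 6)},
)
job_file = json.dumps(job, indent=2)  # persisted alongside the managed repo
print(job["results"]["factor"])
```

Pairing such a job file with a tagged commit in a managed code repository is what makes the run independently recreatable.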
At evaluation, ensure efficient review and reporting:
- Obtain data usage insights with data lineage analysis.
- Effortlessly share job files, ensuring smooth teamwork all around.
- Report efficiently, with help of the consistent interfaces.
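Data lineage analysis over a collection of job files can be as simple as the sketch below, answering "which case studies consumed which dataset?". The job structure and field names are assumptions for illustration.

```python
# Hypothetical sketch: data-lineage analysis across job files,
# mapping each ingested dataset to the case studies that used it.
from collections import defaultdict

jobs = [
    {"name": "study-a", "metadata": {"data_digest": "d1"}},
    {"name": "study-b", "metadata": {"data_digest": "d1"}},
    {"name": "study-c", "metadata": {"data_digest": "d2"}},
]


def lineage(jobs: list) -> dict:
    """Group case studies by the dataset digest they consumed."""
    usage = defaultdict(list)
    for job in jobs:
        usage[job["metadata"]["data_digest"]].append(job["name"])
    return dict(usage)


print(lineage(jobs))  # {'d1': ['study-a', 'study-b'], 'd2': ['study-c']}
```

Because the same structured job files drive both lineage analysis and reporting, sharing a job file hands a collaborator the full review context at once.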