Self-Reporting Data Assets (SRDAs): Enabling FAIR, Interoperable Scientific Data

Data Asset Self-Reporting in Pharma: A Publication

Our report, published in Drug Discovery Today, introduces self-reporting data assets, a powerful implementation concept for digital transformation in biopharma.

This paper was co-written by BYU, ZONTAL, Agilent, and Bristol Myers Squibb, with input from several other large pharma companies. It explores why data must be contextualized to create self-reporting data assets, and it compares the technologies available for achieving FAIR data in the laboratory.


What Are Self-Reporting Data Assets (SRDAs)?

Self-Reporting Data Assets (SRDAs) are structured data assets that combine raw scientific data with the full context in which that data was created. This includes experimental conditions, instrument settings, workflows, and associated metadata.

By preserving both data and context together, SRDAs enable complete interpretation without relying on external systems or institutional knowledge.

This approach keeps data findable, accessible, interoperable, and reusable, putting the FAIR data principles into practice across scientific workflows.
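The core idea of an SRDA, raw data and its creation context travelling together in one asset, can be illustrated with a small sketch. The class `SelfReportingAsset`, its fields, and the example instrument and sample values are all hypothetical, chosen only for illustration. JSON is used here purely for brevity; this is not the Allotrope Data Format or its API, and the paper itself concludes that ADF is better suited than JSON for real SRDAs.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class SelfReportingAsset:
    """Hypothetical container pairing raw measurements with their creation context.

    Illustration only: not the Allotrope Data Format (ADF) or any ADF API.
    """
    values: list[float]          # raw measurements, e.g. a detector signal
    unit: str                    # unit of the raw values
    instrument: dict[str, str]   # instrument identity and settings
    method: dict[str, str]       # experimental conditions / workflow step
    provenance: list[str] = field(default_factory=list)  # ordered processing history

    def to_json(self) -> str:
        # Data and context are serialized together, so a single file carries both.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "SelfReportingAsset":
        # Rebuild the asset from its own serialized content.
        return cls(**json.loads(text))

    def describe(self) -> str:
        # Interpretation relies only on what travels inside the asset;
        # no external database or institutional knowledge is consulted.
        return (f"{len(self.values)} values in {self.unit} from "
                f"{self.instrument['model']} ({self.method['technique']})")


# Usage with made-up example values.
asset = SelfReportingAsset(
    values=[0.12, 0.48, 0.95],
    unit="AU",
    instrument={"model": "example-uv-vis", "wavelength_nm": "260"},
    method={"technique": "UV-Vis absorbance", "sample_id": "S-001"},
    provenance=["acquired"],
)
restored = SelfReportingAsset.from_json(asset.to_json())
print(restored.describe())
```

Because the unit, instrument settings, and method are serialized alongside the values, a recipient can interpret the restored asset without looking anything up elsewhere, which is the property the SRDA concept aims for.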

The Challenge of Scientific Data Today

Across pharma and life sciences, organizations continue to struggle with disconnected data systems and silos, proprietary formats that limit interoperability, incomplete metadata, and difficulty sharing data across teams and partners.

As highlighted in the paper, many organizations still rely on non-standardized formats, which significantly limit the ability to reuse and analyze data at scale.

These challenges are not only technical; they also affect collaboration, efficiency, and long-term data value.

From Data Storage to Data Understanding

Traditional approaches focus on storing data. SRDAs shift the focus to preserving meaning.

By embedding contextual metadata directly within the data asset, SRDAs ensure that datasets can be understood independently—whether accessed by a scientist, a data analyst, or a downstream system.

This creates a foundation for consistent data interpretation, improved traceability and auditability, long-term data preservation, and seamless reuse across workflows.

Why the Allotrope Data Format (ADF) Matters

To enable SRDAs, the underlying data format must support both structure and context at scale.

The paper evaluates several formats used in the pharmaceutical industry—including JSON, JCAMP-DX, and AnIML—and concludes that the Allotrope Data Format (ADF) is the most suitable for implementing SRDAs.

ADF captures semantic context through structured metadata, stores raw data and associated files in a unified framework, supports high-performance data storage and access, and enables audit trails, provenance tracking, and data integrity. It also integrates data from diverse instruments and systems, making it well-suited for modern, data-centric laboratory environments.

Enabling Interoperability and Data Exchange

One of the most critical challenges in life sciences is the ability to exchange data across organizations, systems, and workflows.

SRDAs, implemented with ADF, enable seamless collaboration between pharma companies and CROs, standardized data exchange across instruments and platforms, and consistent interpretation of data across teams and partners.

By preserving both data and context, SRDAs eliminate ambiguity and reduce the need for manual translation or rework.

Supporting Advanced Analytics and Machine Learning

Structured, contextualized data is essential for modern analytics.

By organizing data in a consistent and interpretable format, SRDAs provide a strong foundation for advanced analytics, visualization, and machine learning applications.

As noted in the paper, collections of SRDAs can serve as high-quality datasets for downstream analytical use cases.

A Foundation for Data-Centric R&D

The shift toward data-centric laboratories requires more than new tools; it requires a new way of thinking about data itself.

SRDAs represent this shift by ensuring that data retains its meaning over time, can be reused across systems and use cases, supports compliance and governance, and enables scalable integration across the enterprise.

This creates a more connected and future-ready data foundation for scientific organizations.

Explore the Full Paper

This paper provides a comprehensive analysis of Self-Reporting Data Assets, an evaluation of current data formats, and a discussion of the role of the Allotrope Data Format in enabling FAIR scientific data.

Read the full paper to explore how context-rich, standardized data can transform scientific workflows and improve data interoperability across life sciences.
