Sunday, September 25, 2016

A Profile for Define-XML

As the CDISC XML Technologies team finalizes Define-XML v2.1 for internal review, an old debate has resurfaced: how much should the Define-XML specification focus on the regulatory submissions use case, versus providing a more general specification that serves a broader set of use cases? As a standard that provides metadata to describe tabular datasets, Define-XML can describe legacy datasets as well as datasets included in submissions. Define-XML has also been used as a specification for datasets. However, Define-XML became the most widely implemented ODM-XML based standard due to its role as a required element of regulatory submissions. The importance of ensuring that Define-XML files included in a submission are complete and accurate makes a compelling case for adding rules that specifically target this use case, at the risk of reducing the standard's usefulness in other contexts.

Having recently participated in the September HL7 FHIR connectathon in Baltimore, MD, it strikes me that the notion of profiling, as it is described for FHIR resources, would provide a good solution for Define-XML. Profiling adapts base resources to specific use cases. The FHIR specification describes a growing set of base resources that can be used in many different healthcare contexts, making FHIR a “platform specification.” It is expected that the platform specification will require adaptations in the form of profiles to meet the needs of specific use cases. Profiles can be used to both constrain and extend resources. They describe the rules that specify which data fields are required vs. optional and which coding systems are used in the stated context. Validators ensure that a specific resource instance conforms to the profile.
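The constrain-and-validate idea can be sketched in a few lines. This is a minimal illustration of profile-style validation using a hypothetical profile structure, not an actual FHIR StructureDefinition; the field names and the profile format are invented for the example.

```python
# A minimal sketch of profile-style validation. The profile format here is
# hypothetical, not FHIR's StructureDefinition.
def validate_against_profile(resource, profile):
    """Check a resource dict against a profile's constraints."""
    errors = []
    # A profile can mark fields required that the base spec leaves optional.
    for field in profile["required"]:
        if field not in resource:
            errors.append(f"missing required field: {field}")
    # A profile can also bind a field to a specific coding system.
    for field, system in profile.get("bindings", {}).items():
        coding = resource.get(field, {}).get("coding", [{}])[0]
        if coding.get("system") != system:
            errors.append(f"{field} must use coding system {system}")
    return errors

# A submission-style profile tightens the base rules for one use case.
submission_profile = {
    "required": ["status", "code"],
    "bindings": {"code": "http://loinc.org"},
}

observation = {
    "status": "final",
    "code": {"coding": [{"system": "http://loinc.org", "code": "8302-2"}]},
}

print(validate_against_profile(observation, submission_profile))  # → []
```

The point of the sketch: the base resource definition stays permissive, and all the use-case-specific strictness lives in the profile, which is exactly the separation a submissions profile for Define-XML would provide.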

Creating Define-XML as a “platform specification” useful across a broad range of use cases, while also creating a regulatory submission profile and implementation guide, would seemingly be an ideal solution. Defining a regulatory submissions profile would permit the implementation of a comprehensive set of rules, effectively adapting Define-XML for use in submissions without concern that these rules limit the usefulness of the base standard.

Monday, November 23, 2015

SHARE 2015 Q4 Technical Update

The SHARE API Pilot continues to make progress. A draft specification is being reviewed and a test server is running the API as currently specified. The pilot team will finalize the draft specification and spend the rest of the year testing and developing interfaces to the API. The SHARE team will also make use of the API to enhance tools in the SHARE ecosystem by making it easier for them to consume SHARE metadata. The SHARE API will be moved into production during Q2 of 2016.

A few last development details are being cleaned up for the SHARE RDF export, and then we'll work to generate a test export for the community to review. The initial exports will include SDTM, CDASH, and Controlled Terminology. The RDF content will be posted to eSHARE, and will also be available for consumption via the API once it has been moved into production.

The Biomedical Concept (BC) development tools have seen a number of upgrades this year. These tools are currently being used by the Collaborative Curation Initiative (CCI) to develop the BCs for the Prostate Cancer TA project. We're currently working to update the BC model within the SHARE MDR. Some have expressed an interest in gaining early access to this content via eSHARE and the API. Given the usefulness of BCs in supporting process automation and the generation of study artifacts, software vendors are keenly interested in the continued development of this content in 2016.

Over the course of 2015 eSHARE access has been expanded beyond Platinum members to include Gold members and academic researchers. For those interested in learning more about eSHARE, the webinar we delivered for the Gold member rollout is a useful place to start. New content is continually being posted to eSHARE as it becomes available. Earlier this year the first version of ADaM 2.1 was posted, and an updated version that includes new input from the ADaM team is expected soon. New TAUG content continues to be posted as it becomes available. In December we plan to post some new catalogs and listings, including the Domain Catalog. A BRIDG alignment report generated as part of the BC development will be posted for review. Finally, we plan to post an SDTM-to-CodeList spreadsheet that maps Controlled Terminology CodeLists to SDTM variables by the end of the year.

A draft position statement was created by the SHARE team with input from many stakeholders. Please add your comments to the wiki page.

Protocol standards were not available for loading into SHARE during 2015, but we plan to begin loading protocol objectives and endpoints into SHARE during 2016 Q1.  Given all the activity around the development of a protocol standard, the SHARE team plans to work with the CDISC terminology team and the NCI EVS to create a protocol vocabulary that can be loaded into SHARE. Before the end of the year, a proof-of-concept using elements of protocol in an end-to-end standards representation is planned.

Structured validation rules are currently under development by the SDS and ADaM teams. SHARE will use the wiki templates to develop rules in support of the TA standards using this format until a rule formalism is agreed upon. Once the formalism has been selected, the SHARE MDR model will be updated and rules in this format will be loaded. The new rule format will be piloted in 2016.

A wiki-based example repository is currently available. This repository will facilitate example re-use. The ability to re-use examples will be leveraged in support of the TA standards development projects. Before the end of the year, we will run a proof-of-concept that loads TAUG examples into the SHARE MDR.

Early in 2015 the CDASH-to-SDTM mapping was generated for eSHARE as part of the end-to-end standards work. By the end of the year we will post a similar SDTM-to-ADaM carry over variables file for review. BCs will play a key role in the development of end-to-end standards in 2016, and protocol will be added prior to the end of 2016.

Finally, the SHARE team is working as a part of the Prostate Cancer TA standards project to develop new tools in support of developing TA standards. Starting with the Prostate Cancer project, SHARE tools will be used to generate a draft TA Specification document containing the normative standards. This is one of the clearest examples of SHARE supporting an expedited TA standards development process. SHARE tools will also contribute to the TA User Guides which include non-normative content such as examples from the SHARE example repository. The Prostate Cancer TA standard project represents an innovative first step towards using automation to drive a significant portion of the standards development effort, and one that we will build upon in future projects.

New Initiatives Highlighted at the 2015 CDISC Interchange

Lots of new initiatives showed promise at the CDISC 2015 Interchange conference in Chicago on November 9-13. The EHR2CDASH (E2C) project demonstrated the potential of healthcare link technologies used with the CDISC standards. The E2C XPath statements used to extract HL7 C-CDA/CCD document content will be stored in SHARE to support the implementation of CDASH forms that can be pre-populated with EHR content. A group of CFAST TA standards development stakeholders reviewed the new processes and tools that will be used for developing the Prostate Cancer standard. Biomedical Concepts continue to garner attention as a key element of the semantic layer in the CDISC standards model. The growing CDISC standards model, so important to the SHARE work, was on display during the poster session. Use of the ODM standard was highlighted, including an extension to support its use on modern hand-held devices. There was quite a bit of enthusiasm and interest in SHARE activities, as evidenced by the crowd at the SHARE booth (read Anthony Chow's blog for more detail).
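The E2C idea of storing XPath statements and evaluating them against a CCD is easy to sketch. The fragment below is a drastically simplified stand-in for a real C-CDA document, and the CDASH field names and XPath expressions are illustrative assumptions, not the actual E2C mappings.

```python
# A sketch of XPath-based extraction from a simplified C-CDA fragment.
# Real CCD documents are far richer; the mappings here are illustrative.
import xml.etree.ElementTree as ET

ccd = """<ClinicalDocument xmlns="urn:hl7-org:v3">
  <recordTarget>
    <patientRole>
      <patient>
        <administrativeGenderCode code="F"/>
        <birthTime value="19701224"/>
      </patient>
    </patientRole>
  </recordTarget>
</ClinicalDocument>"""

ns = {"hl7": "urn:hl7-org:v3"}
root = ET.fromstring(ccd)

# Path expressions stored as strings, keyed by hypothetical CDASH fields,
# the way a repository like SHARE might hold them.
mappings = {
    "SEX": ".//hl7:patient/hl7:administrativeGenderCode",
    "BRTHDAT": ".//hl7:patient/hl7:birthTime",
}

prepopulated = {
    field: root.find(path, ns).get("code") or root.find(path, ns).get("value")
    for field, path in mappings.items()
}
print(prepopulated)  # → {'SEX': 'F', 'BRTHDAT': '19701224'}
```

Because the path expressions are data rather than code, they can be curated and versioned centrally and applied by any conforming form-building tool.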

Most notably, however, 2016 was being talked about as a transformative year for CDISC. So many of the new initiatives previewed at the 2015 Interchange will begin to show their impact in 2016. Many of the new SHARE tools will begin to be used more broadly next year, especially in support of TA standards development. We plan to generate our first TA specification document from SHARE next year, one that includes new content such as Rules and Examples. The beginnings of a protocol standard will be available in SHARE next year. The SHARE API being piloted in 2015 will roll into production in 2016. In addition to these new developments, significant new foundational standards versions are scheduled for next year, including CDASH v2.0 and ODM v2.0. The list of new CDISC projects and deliverables slated for next year is long. It’s a bit daunting to look at the entire list, so we'll stick with our agile approach and take it one sprint at a time. The results should make for an interesting 2016 Interchange.

* A CDISC wiki account is needed to access content on the SHARE wiki space

Monday, November 17, 2014

Promoting Data Sharing at the CDISC Interchange Conference

The theme of sharing clinical research data permeated a number of presentations at the CDISC International Interchange Conference this year. In fact, the opening plenary keynote presentation by General Peter Chiarelli, CEO of One Mind, highlighted the dire need for data sharing in clinical research and espoused a number of open science principles. One Mind defines open science as a “global movement to make scientific research, results and data available, and accessible to everyone.” The key goal behind this push for open science is to accelerate the research community’s ability to transform basic research into better clinical treatments for patients. You can find One Mind’s open science principles here.

One of One Mind’s open science principles involves adhering to widely accepted data standards. This makes sense because the standards help make the data useful. Sharing the data is not the end game. Using the data to accelerate the development of safe and effective treatments is what we’re after. Sharing data that cannot easily be interpreted or understood limits the value of the data. At the very least, it adds significantly to the cost and time needed to interpret and transform the data to make it useful. Using data standards adds to the value of the data in a measurable way, and promotes the positive effects targeted by data sharing and open science.

Another of One Mind’s open science principles asks that those who re-use the data give proper attribution and credit to the data generators. Earlier this year I attended a workshop focused on creating enhanced citation metadata for data standards to promote attribution for datasets. Enhanced citation metadata means that the citation would include additional citation details, such as role, to create a credit market for data and software tool sharing. Enhanced citation roles could be based on a terminology that includes options such as data provider, data analyzer, or software tool provider. Support for data citations will be added to the ODM-XML backlog for future development. Adding ODM-XML support for data citations should aid in the development of a credit market for data, as well as promote a data sharing culture within the clinical research community.

Wednesday, September 10, 2014

Value Level Metadata and Research Concepts

When people point to flaws in SDTM, those flaws typically appear to me as gaps in the existing standard. In general, CDISC started defining standards by focusing on the basic structural metadata (e.g. domains, variables, code lists). This makes sense because this structural metadata is fundamentally useful, and relatively easy to understand and create. As the industry’s use of the standards has increased, so has the demand for standards that can be implemented more consistently and easily, as well as standards that are more computable. The limitations in the current standards are gaps, and addressing these gaps represents a natural evolution for the CDISC standards.

As noted in my previous post, “What’s in a SHARE Value Level Metadata Library?”, the CDISC standards do not currently include Value Level Metadata (VLM) content, and this content represents a lot of new metadata. VLM is a gap in the existing standards. How do we know which variables are impacted by a specific –TESTCD? Much of that information can be conveyed through VLM, and the CDISC Terminology teams have started to address the VLM content gaps. In SHARE, we’re extending the CDISC model to capture VLM content.

A Concept Layer exists in the SHARE meta-model. Conceptual metadata represents another gap in the CDISC standards, with the noted exception of Controlled Terminology. The SHARE team loaded NCI-created concepts for existing CDASH and SDTM variables as experimental content, since they are not part of the normative standard. Basically, these concepts consist of a natural language definition and a concept code (c-code). For example, the SDTM variable AESDTH implements the concept “Death Related to Adverse Event”, has a concept code of C48275, and a CDISC definition of “The termination of life as a result of an adverse event.” Without these basic concepts, we don’t have consistently rendered definitions for our standards metadata.
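The AESDTH example above amounts to a small, regular record. As a sketch, the same information can be captured in a simple structure; the field names below are illustrative and are not the SHARE meta-model.

```python
# A sketch of the basic concept metadata described above, using the
# AESDTH example from the text. Field names are illustrative only.
from dataclasses import dataclass

@dataclass
class Concept:
    name: str            # natural language concept name
    c_code: str          # NCI concept code
    definition: str      # CDISC natural language definition
    implemented_by: str  # e.g. the SDTM variable implementing the concept

aesdth = Concept(
    name="Death Related to Adverse Event",
    c_code="C48275",
    definition="The termination of life as a result of an adverse event.",
    implemented_by="AESDTH",
)

print(f"{aesdth.implemented_by} implements {aesdth.name} ({aesdth.c_code})")
```

Even this minimal shape is enough to render a consistent definition for every variable and to link it back to NCI terminology via the c-code.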

Why do we need concepts with basic definitions and c-codes for our standards metadata? When a TA standards team is creating a new standard, these definitions help Subject Matter Experts (SMEs), who strangely prefer not to speak in the language of SDTM domains and variables, decide whether their needs are met by the existing standards or whether the development of new standards metadata is warranted. When new standards metadata is developed, creating natural language definitions in terms understood by the SMEs helps to clarify and disambiguate the meaning and use of that metadata.

When developing new VLM, how do we know how each SDTM variable in a domain is related to a specific –TESTCD? The SMEs draw on their clinical / statistical / data management expertise to identify the appropriate set of concepts and their relationships to the specified test. These related sets of concepts are then used to create VLM metadata content in terms of SDTM variables and controlled terminology. The conceptual metadata needed to support this process does not explicitly exist in the CDISC standards today. Developing the conceptual metadata that supports the development of VLM and other standards metadata represents another logical next step in the evolution of the CDISC standards. 

There’s more to the SHARE Concept Layer than basic concepts, including the means to combine basic concepts to represent clinical observations. The SHARE Concept Layer will be covered in more detail in a future post.

Tuesday, August 26, 2014

What’s in a SHARE Value Level Metadata Library?


SHARE has the capability to store and publish Value Level Metadata (VLM) content. Currently, the only CDISC standard describing VLM is Define-XML. Define-XML provides the structure for VLM along with some guidelines on when it’s useful, but it does not provide standard VLM content. The Define-XML v2.0 specification states that VLM should be applied when it provides information useful for interpreting study data, and that it need not be applied in all cases. Precisely what and where VLM should be used is determined by study implementers.

Since there are no hard and fast rules describing when to use VLM, what should be included in a SHARE library of VLM content? It might be useful to ask, “Where is VLM being used today?” Based on input so far, most implementers add VLM where they think the regulatory reviewers might want to see it. Since many organizations are not yet using Define-XML as a machine-readable specification, but are instead creating it to fulfill an FDA submission requirement, implementers often add VLM for only the most basic and obvious cases. How should the SHARE VLM content be published so that it would be useful to implementers? For those that are using VLM as a machine-readable specification, how are you using VLM?

A Proposal for Developing a VLM Library in SHARE

In line with current VLM usage, the SHARE team proposes to publish VLM for the most basic and obvious cases first. The VLM will be published as Define-XML v2.0 files and made available for download in eSHARE. Where available, the VLM will make use of the recent work on “CT Mapping/Alignment Across Codelists”, now posted on the CDISC website. These mapping specifications list sets of valid content for variables based on specific tests or parameters. For example, the “VS Test_Unit Codetable” specification provides the test name, valid units and valid positions for each VSTESTCD. Publishing these valid mapping specifications as VLM should provide the information in a useful format.
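A codetable of this kind boils down to a lookup keyed by test code. The sketch below shows how such a table might be applied as machine-readable VLM; the entries are illustrative stand-ins, not rows from the published “VS Test_Unit Codetable”.

```python
# A sketch of applying a codetable as VLM: valid units keyed by VSTESTCD.
# The entries are illustrative, not the published codetable content.
vs_codetable = {
    "SYSBP":  {"test": "Systolic Blood Pressure", "units": {"mmHg"}},
    "HEIGHT": {"test": "Height", "units": {"cm", "in"}},
    "WEIGHT": {"test": "Weight", "units": {"kg", "LB"}},
}

def check_unit(vstestcd, vsorresu):
    """Return True if the reported unit is valid for the given test code."""
    entry = vs_codetable.get(vstestcd)
    return entry is not None and vsorresu in entry["units"]

print(check_unit("HEIGHT", "cm"))        # → True
print(check_unit("SYSBP", "beats/min"))  # → False
```

This is the kind of check that only becomes possible once the codetable is published as structured VLM rather than as a PDF or spreadsheet meant for human review.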


As we publish the first sets of VLM, we’re interested in your feedback on the VLM content and format. Since VLM has largely been used as a study-level deliverable, what content would make a useful VLM library? Look for an example VLM Define-XML v2.0 file on eSHARE in the next couple of weeks.

When implemented to its fullest extent, VLM will generate a significant amount of additional metadata. Managing this volume of additional detail could not be done adequately without a tool like SHARE. 

Wednesday, August 6, 2014

Dataset-XML: an Expanding Toolbox

The recently introduced CDISC Dataset-XML standard enables the interchange of tabular datasets, like SDTM or ADaM, using ODM-based XML, and provides a convenient alternative to SAS V5 XPORT files. Although Dataset-XML was only released in Q2 of 2014, a number of freely available tools for working with Dataset-XML files already exist. Tools supporting Dataset-XML are listed on the publicly accessible CDISC Dataset-XML Resources page on the CDISC Wiki. Early versions of many of these tools were available before Dataset-XML was released as a final standard. The speedy availability of these tools highlights the CDISC community’s culture of innovation, inspired by the availability of machine-readable standards.

The availability of software tools with the release of Dataset-XML enabled the FDA to begin planning the "Transport Format for the Submission of Regulatory Study Data” pilot prior to the standard’s final release. In order to test Dataset-XML’s suitability as a replacement for SAS V5 XPORT for submission datasets, tools were needed to create Dataset-XML files from existing SAS V5 XPORT files, to validate the Dataset-XML files, to view their content, and to convert them into SAS datasets. These tools made it easy for sponsors to participate in the pilot, and a robust set of software tools is essential to the adoption of any data standard.
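Because Dataset-XML carries records as ODM-based XML, reading it is straightforward with any XML library. The sketch below parses a simplified Dataset-XML-style fragment into rows; a real file carries additional namespaces and attributes, and the OIDs here are invented for illustration.

```python
# A sketch of reading record data from a simplified Dataset-XML-style
# fragment. Real files carry more namespaces and attributes; the OIDs
# and values here are illustrative.
import xml.etree.ElementTree as ET

ODM_NS = "http://www.cdisc.org/ns/odm/v1.3"

odm = f"""<ODM xmlns="{ODM_NS}">
  <ClinicalData StudyOID="ST01" MetaDataVersionOID="MDV1">
    <ItemGroupData ItemGroupOID="IG.DM">
      <ItemData ItemOID="IT.DM.USUBJID" Value="001"/>
      <ItemData ItemOID="IT.DM.SEX" Value="M"/>
    </ItemGroupData>
  </ClinicalData>
</ODM>"""

root = ET.fromstring(odm)
ns = {"odm": ODM_NS}

# Each ItemGroupData becomes one row: a mapping of item OID to value.
rows = [
    {item.get("ItemOID"): item.get("Value")
     for item in group.findall("odm:ItemData", ns)}
    for group in root.iter(f"{{{ODM_NS}}}ItemGroupData")
]
print(rows)  # → [{'IT.DM.USUBJID': '001', 'IT.DM.SEX': 'M'}]
```

The same few lines of traversal logic underlie viewers, validators, and converters alike, which helps explain how so many tools appeared so quickly.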

Tools available that support Dataset-XML include:  
  • XPT2DatasetXML by XML4Pharma to convert SAS V5 XPORT files to Dataset-XML
  • Smart Dataset-XML Viewer by the University of Applied Sciences FH Joanneum Graz – eHealth to enable Dataset-XML viewing, filtering and other basic visual review capabilities
  • EZ Convert by Sally Cassells to convert Dataset-XML to SAS datasets
  • SAS Clinical Standards Toolkit by SAS provides Dataset-XML support as a pre-release package, and the final version will be part of the next Clinical Standards Toolkit release
  • OpenCDISC v1.5 by OpenCDISC to validate Dataset-XML files
  • R4CDISC by Ippei Akiya to read Dataset-XML datasets and return an R dataframe

More details are available on the Dataset-XML Resources CDISC Wiki page. If your Dataset-XML tool is not listed, please let us know by adding a comment to the wiki page. Happy tooling with Dataset-XML!