Thursday, February 23, 2017

Using the SHARE API to Grab the Latest CDISC Terminology Packages

Keeping your repository up to date with the latest CDISC Controlled Terminology (CT) packages is a challenge for many organizations. The SHARE API makes it easier to retrieve the CT packages and initiate the process of updating your repository. In this post I'll walk through the process of finding and retrieving the CT from SHARE using the API. We'll use the default media type of XML. When requesting a full CT package, the XML media type uses the NCI EVS extended version of ODM, which includes the additional metadata needed to represent the CT content.

The following figure provides an overview of the steps taken to retrieve the latest CT packages using the SHARE API.
The first step is to determine what CT Packages are available for loading. This can be done by using the Get Standard Terminology List API. The SHARE API is RESTful so this entails requesting the /terminology-packages resource. The default Lifecycle-Status parameter retrieves Approved Final content. Since the NCI EVS team curates the CDISC CT, CDISC loads this content into SHARE as Approved Final. I'm running this in a test environment and will retrieve older, draft content. The following is an example of a URL that uses the /terminology-packages resource to request the list of CT packages.

The following shows an XML snippet for one of the CT packages retrieved, CDASH 2013-12-20. Each package is returned as an ItemGroupDef inside the ODM MetaDataVersion element.

1:        <ItemGroupDef Name="CDASH Terminology 2013-12-20 ([CDISC Terminologies] MDES 1)" OID="1.0_13909862886802094447920" Purpose="Data Collection" Repeating="No" mdr:DatePublished="2017-01-27" mdr:Status="Draft">  
2:          <Description>  
3:            <TranslatedText xml:lang="en">CDISC Terminology Package</TranslatedText>  
4:          </Description>  
5:          <Alias Context="Terminologies API" Name="/terminology-packages/{asset-id}/terminologies"/>  
6:          <Alias Context="Standard Metadata API" Name="/standards/{asset-id}/metadata"/>  
7:        </ItemGroupDef>  

Your software will need to compare the list of ItemGroupDefs returned from SHARE to the list of CT packages available in your repository and select those that you want to retrieve. We'll simply request the CDASH package listed above. Notice the Alias elements on lines 5 and 6. These elements represent how the package can be retrieved. The Alias element on line 5 has a Context="Terminologies API" with a Name attribute that holds the resource needed to return the list of the code lists in the package. Subsequently, these code lists can be used to retrieve the associated terms. For this example, we'll use the Alias on line 6 with Context="Standard Metadata API" and the resource /standards/{asset-id}/metadata to retrieve the full CT package.
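The comparison step might look something like the following Python sketch. It is only an illustration: the mdr namespace URI is a placeholder (the real one isn't shown in the snippet above), the sample is a trimmed copy of that snippet, and the function names and the idea of tracking loaded packages by OID are my own assumptions rather than anything the SHARE API prescribes.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the package entry shown above, wrapped so it parses on its own.
# "urn:example:mdr" is a placeholder; the real mdr namespace URI is not shown in the post.
SAMPLE = """<MetaDataVersion xmlns:mdr="urn:example:mdr">
  <ItemGroupDef Name="CDASH Terminology 2013-12-20 ([CDISC Terminologies] MDES 1)"
                OID="1.0_13909862886802094447920" Purpose="Data Collection"
                Repeating="No" mdr:DatePublished="2017-01-27" mdr:Status="Draft">
    <Alias Context="Terminologies API" Name="/terminology-packages/{asset-id}/terminologies"/>
    <Alias Context="Standard Metadata API" Name="/standards/{asset-id}/metadata"/>
  </ItemGroupDef>
</MetaDataVersion>"""


def local(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag or attribute name."""
    return tag.rsplit("}", 1)[-1]


def list_packages(xml_text):
    """Return one dict per ItemGroupDef: OID, name, status, and Alias resources keyed by Context."""
    packages = []
    for elem in ET.fromstring(xml_text).iter():
        if local(elem.tag) != "ItemGroupDef":
            continue
        attrs = {local(key): value for key, value in elem.attrib.items()}
        resources = {
            child.get("Context"): child.get("Name")
            for child in elem
            if local(child.tag) == "Alias"
        }
        packages.append({
            "oid": attrs["OID"],
            "name": attrs["Name"],
            "status": attrs.get("Status"),
            "resources": resources,
        })
    return packages


def missing_packages(packages, loaded_oids):
    """Keep only the packages whose OID is not already in the local repository."""
    return [pkg for pkg in packages if pkg["oid"] not in loaded_oids]


if __name__ == "__main__":
    for pkg in missing_packages(list_packages(SAMPLE), loaded_oids=set()):
        print(pkg["oid"], pkg["resources"]["Standard Metadata API"])
```

Matching on local element names keeps the sketch independent of whichever namespace URIs the real response declares.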

The ItemGroupDef OID (identifier) on line 1 in the XML above is used to replace the {asset-id} to create the resource to request the full CDASH 2013-12-20 CT package. The resource now reads /standards/1.0_13909862886802094447920/metadata and an example URL using this resource looks like the following.
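As a small sketch, the substitution can be done with a couple of Python helpers. The function names are mine, and the base URL is a made-up placeholder, not the real SHARE endpoint.

```python
from urllib.parse import quote

PLACEHOLDER = "{asset-id}"


def build_resource(template, asset_id):
    """Substitute the package OID for {asset-id} in an Alias Name template."""
    if PLACEHOLDER not in template:
        raise ValueError(f"template has no {PLACEHOLDER} placeholder: {template!r}")
    # Strip stray whitespace and percent-encode so the OID is safe in a URL path.
    return template.replace(PLACEHOLDER, quote(asset_id.strip(), safe=""))


def build_url(base_url, resource):
    """Join a base URL and a resource path without doubling the slash."""
    return base_url.rstrip("/") + resource


if __name__ == "__main__":
    resource = build_resource("/standards/{asset-id}/metadata", "1.0_13909862886802094447920")
    print(resource)
    # Hypothetical base URL, for illustration only.
    print(build_url("https://share.example.org/api/", resource))
```

Stripping whitespace from the OID guards against the kind of stray space that easily sneaks in when a resource path is assembled by hand.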

Your code can now retrieve the full CT package in ODM XML from SHARE. Here's a snippet of the ODM XML that's returned by the SHARE API that shows one code list.

<ODM xmlns="" xmlns:nciodm="" xmlns:xlink="" xmlns:xsi="" CreationDateTime="2017-02-22T21:20:48" FileOID="Semantics.Manager.SHARE.11.5.ODM_1.3.2.CDASHTerminology2013-12-20.[CDISC_Terminologies]_MDES_1" FileType="Snapshot" ODMVersion="1.3.2" Originator="CDISC" SourceSystem="SHARE" SourceSystemVersion="" xsi:schemaLocation=" controlledterminology1-0-0.xsd">
  <Study OID="STUDY.CDASHTerminology2013-12-20.[CDISC_Terminologies]_MDES_1">
    <GlobalVariables>
      <StudyName>Study - CDASH Terminology 2013-12-20</StudyName>
      <StudyDescription>Export of CDASH Terminology 2013-12-20 with version [CDISC Terminologies] MDES 1</StudyDescription>
      <ProtocolName>STUDY.CDASHTerminology2013-12-20.[CDISC_Terminologies]_MDES_1</ProtocolName>
    </GlobalVariables>
    <MetaDataVersion Name="CDASH Terminology 2013-12-20 MDV" OID="MDV.CDASHTerminology2013-12-20.[CDISC_Terminologies]_MDES_1.2017-02-22_21:20:48">
      <CodeList DataType="text" Name="Vital Signs Position of Subject" OID="CL.VSPOS.VD_1" nciodm:CodeListExtensible="Yes" nciodm:ExtCodeID="C78431">
        <Description>
          <TranslatedText xml:lang="en">A terminology subset of the CDISC SDTM Position codelist created for CDASH Vital Signs Position of Subject codelist. (NCI)</TranslatedText>
        </Description>
        <EnumeratedItem CodedValue="SITTING" nciodm:ExtCodeID="C62122">
          <nciodm:CDISCSynonym>Sitting</nciodm:CDISCSynonym>
          <nciodm:CDISCDefinition>The state or act of one who sits; the posture of one who occupies a seat. (NCI)</nciodm:CDISCDefinition>
          <nciodm:PreferredTerm>Sitting</nciodm:PreferredTerm>
        </EnumeratedItem>
        <EnumeratedItem CodedValue="SUPINE" nciodm:ExtCodeID="C62167">
          <nciodm:CDISCSynonym>Supine</nciodm:CDISCSynonym>
          <nciodm:CDISCDefinition>A posterior recumbent body position whereby the person lies on its back and faces upward. (NCI)</nciodm:CDISCDefinition>
          <nciodm:PreferredTerm>Supine Position</nciodm:PreferredTerm>
        </EnumeratedItem>
        <EnumeratedItem CodedValue="STANDING" nciodm:ExtCodeID="C62166">
          <nciodm:CDISCSynonym>Standing</nciodm:CDISCSynonym>
          <nciodm:CDISCDefinition>The act of assuming or maintaining an erect upright position. (NCI)</nciodm:CDISCDefinition>
          <nciodm:PreferredTerm>Standing</nciodm:PreferredTerm>
        </EnumeratedItem>
        <nciodm:CDISCSubmissionValue>VSPOS</nciodm:CDISCSubmissionValue>
        <nciodm:CDISCSynonym>Vital Signs Position of Subject</nciodm:CDISCSynonym>
        <nciodm:PreferredTerm>CDISC CDASH Vital Signs Position of Subject Terminology</nciodm:PreferredTerm>
      </CodeList>
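Once the ODM XML is in hand, pulling out the code lists and their terms is straightforward. Here is a rough Python sketch that works on a cut-down version of the snippet above. The namespace URIs in the sample are placeholders (the real ones aren't shown in the snippet), so the code matches on local element names only; the function names and the shape of the returned dictionaries are my own choices.

```python
import xml.etree.ElementTree as ET

# Cut-down version of the CodeList above. The namespace URIs are placeholders.
SAMPLE = """<ODM xmlns="urn:example:odm" xmlns:nciodm="urn:example:nciodm">
  <Study OID="S"><MetaDataVersion OID="MDV" Name="MDV">
    <CodeList DataType="text" Name="Vital Signs Position of Subject" OID="CL.VSPOS.VD_1"
              nciodm:CodeListExtensible="Yes" nciodm:ExtCodeID="C78431">
      <EnumeratedItem CodedValue="SITTING" nciodm:ExtCodeID="C62122">
        <nciodm:PreferredTerm>Sitting</nciodm:PreferredTerm>
      </EnumeratedItem>
      <EnumeratedItem CodedValue="SUPINE" nciodm:ExtCodeID="C62167">
        <nciodm:PreferredTerm>Supine Position</nciodm:PreferredTerm>
      </EnumeratedItem>
      <nciodm:CDISCSubmissionValue>VSPOS</nciodm:CDISCSubmissionValue>
    </CodeList>
  </MetaDataVersion></Study>
</ODM>"""


def local(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag or attribute name."""
    return tag.rsplit("}", 1)[-1]


def attrs_of(elem):
    """Return the element's attributes keyed by local name."""
    return {local(key): value for key, value in elem.attrib.items()}


def child_text(elem, name):
    """Text of the first direct child with the given local name, or None."""
    for child in elem:
        if local(child.tag) == name:
            return child.text
    return None


def read_code_lists(xml_text):
    """Return each CodeList with its NCI code, submission value, and enumerated terms."""
    code_lists = []
    for elem in ET.fromstring(xml_text).iter():
        if local(elem.tag) != "CodeList":
            continue
        attrs = attrs_of(elem)
        terms = [
            {
                "coded_value": item.get("CodedValue"),
                "code": attrs_of(item).get("ExtCodeID"),
                "preferred_term": child_text(item, "PreferredTerm"),
            }
            for item in elem
            if local(item.tag) == "EnumeratedItem"
        ]
        code_lists.append({
            "oid": attrs["OID"],
            "name": attrs["Name"],
            "code": attrs.get("ExtCodeID"),
            "extensible": attrs.get("CodeListExtensible") == "Yes",
            "submission_value": child_text(elem, "CDISCSubmissionValue"),
            "terms": terms,
        })
    return code_lists


if __name__ == "__main__":
    for code_list in read_code_lists(SAMPLE):
        print(code_list["submission_value"], [t["coded_value"] for t in code_list["terms"]])
```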

That's how you retrieve CT packages from SHARE using the API. The CT is published quarterly so new content arrives regularly.

You can find more information on working with the SHARE API on the SHARE for Technical Implementers page on the CDISC web site.

Sunday, February 5, 2017

Wanted: SHARE API Early Adopters

Having completed the SHARE API v1.0 pilot in 2016, CDISC recently announced the SHARE API Early Adopter Program for those interested in beginning to use the API right away. SHARE API access provides a RESTful web services interface to SHARE for retrieving new CDISC standards and terminology.

Having worked on the API pilot, I'm happy to see it released for general use. It's a significant step forward and marks CDISC's movement towards new ways of publishing machine-readable standards content. As part of the Early Adopter Program, new users will be asked to provide regular feedback and input into new features. The SHARE API is currently a v1.0 release, meaning the essentials are there in terms of accessing metadata content, but we also have a lot more we'd like to do with it. The API Pilot team developed a list of user stories at the beginning of the project, and while most were implemented during the pilot, many were deferred and will become part of the roadmap for future releases.

Version 1.0 of the API provides access to CDISC standards content in extended forms of ODM-XML or Define-XML. The extension provides access to additional metadata available in SHARE that would not normally be included in ODM-XML or Define-XML. For example, when requesting the CDASH standard in XML, the ODM-XML has been extended to include CDASH metadata such as prompt, completion instructions, and sponsor information. Define-XML is similarly extended for SDTM to include content such as core and CDISC notes. The XML formats also carry the mappings that link the standards across the clinical research data lifecycle (e.g. CDASH to SDTM) where these relationships are available in SHARE. BRIDG mappings are represented where they have been published for a standard.

One of the most useful API features will be the ability to retrieve the quarterly CDISC Controlled Terminology packages. Implementers that use the XML format will get all the CDISC Controlled Terminology content through the NCI EVS extended version of ODM-XML.

For implementers that prefer other formats, full standards content is available in RDF format. JSON is available as an option, but is considered experimental at this time since not all information is available in this format. This year we plan to implement a JSON version of the CDISC XML standards and will look to deploy that in a future version of the API.

I believe I'll be helping to run some SHARE API workshops at the Interchanges again this year. The workshops are a fun way to learn about the API by doing some hands-on exercises.

You can find more details on the SHARE API on the SHARE section of the CDISC web site.

Friday, January 27, 2017

Clinical Research Track at the 14th FHIR Connectathon

Last weekend (Jan. 14 & 15) was the 14th HL7 FHIR Connectathon overall and the 2nd to include a Clinical Research Track. This event was hosted on the Riverwalk in San Antonio and featured over 200 attendees for a weekend's worth of hacking. The primary purpose of the Connectathon events is to provide a forum for participants to develop and test software in an informal way.

During the previous, inaugural Clinical Research Track, only two of us participated. By the end of that weekend we were able to use FHIR resources to demonstrate the pre-population of Medidata Rave-based demographics and concomitant medications CRFs.

The Clinical Research Track at the 14th Connectathon generated quite a bit more interest, with 8 attendees. Representatives from vendors, sponsors, and standards organizations participated. Beyond the pre-population of CRFs using EHR data retrieved using FHIR, several additional use cases were proposed for development and testing. When the weekend arrived, we decided to develop an EHR / FHIR readiness site assessment tool. Essentially, this tool would attempt to pre-populate CDASH-based CRF content using FHIR resources and score the site based on data coverage and quality. Such a service could help identify sites with FHIR-enabled EHR systems and adequate EHR data to support the pre-population of CRFs for clinical research.

This tool leveraged the same basic capabilities as the CRF pre-population application, and we were able to build on some of the work completed during the previous Connectathon. We expanded the FHIR resources used beyond Patient and MedicationStatement to include Observations for vital signs and lab results, and AllergyIntolerance for medical history. Historically, interoperability between HL7 CDA and ODM has been most effective for lab, demographic, medication, and vital signs data. As this tool was a bit ambitious for what amounts to 1 full day of development, we plan to continue to build on our prototype in advance of future Connectathons.
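To give a feel for the data coverage half of the scoring idea, here is a deliberately simple Python sketch. It is not the prototype we built at the Connectathon; the field names and the notion of coverage as "fraction of CRF fields that received a value" are assumptions made for illustration, and data quality scoring is left out entirely.

```python
def coverage_score(crf_fields, prepopulated):
    """Fraction of CRF fields for which a non-empty value was found in the EHR data."""
    if not crf_fields:
        return 0.0
    filled = sum(1 for field in crf_fields if prepopulated.get(field) not in (None, ""))
    return filled / len(crf_fields)


if __name__ == "__main__":
    # Hypothetical CRF fields and the values a pre-population pass managed to fill.
    fields = ["BRTHDTC", "SEX", "CMTRT", "VSPOS"]
    values = {"BRTHDTC": "1970-01-01", "SEX": "F", "CMTRT": "ASPIRIN", "VSPOS": None}
    print(coverage_score(fields, values))
```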

The next Connectathon is in Madrid, and I believe the next one in the US will be in San Diego in the fall. This gives us a bit of time to make some progress before the next US-based Connectathon. I, for one, plan to learn a bit more Python as we work towards this next event.

Sunday, September 25, 2016

A Profile for Define-XML

As the CDISC XML Technologies team finalizes Define-XML v2.1 for internal review, an old debate has re-surfaced: how much should the Define-XML specification focus on the regulatory submissions use case versus providing a more general specification that works for a broader set of use cases? As a standard that provides metadata to describe tabular datasets, Define-XML can be used to describe legacy datasets as well as datasets included for submissions. Define-XML has also been used as a specification for datasets. However, Define-XML became the most widely implemented ODM-XML based standard due to its role as a required element of regulatory submissions. The importance of ensuring that Define-XML files included in a submission are complete and accurate makes a compelling case for adding rules that specifically target this use case, at the risk of reducing its usefulness in other contexts.

Having recently participated in the September HL7 FHIR Connectathon in Baltimore, MD, it strikes me that the notion of profiling, as it is described for FHIR resources, would provide a good solution for Define-XML. Profiling is where the base resources are adapted for specific use cases. The FHIR specification describes a growing set of base resources that can be used in many different healthcare contexts, making FHIR a “platform specification.” It is expected that the platform specification will require adaptations in the form of profiles to meet the needs of specific use cases. Profiles can be used to both constrain and extend resources. They describe the rules that specify which data fields are required vs. optional and which coding systems are used in the stated context. Validators ensure that a specific resource instance conforms to the profile.
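To make the idea concrete, here is a toy Python sketch of a profile and a validator. Real FHIR profiles are expressed as StructureDefinition resources and checked by full validators; the dictionary layout, field names, and function below are simplifications I've invented purely to illustrate "required fields plus allowed coding systems."

```python
# Toy profile: which fields must be present and which coding systems a field may use.
PROFILE = {
    "required": ["status", "code", "subject"],
    "coding_systems": {"code": {"http://loinc.org"}},
}


def validate(resource, profile):
    """Return a list of human-readable problems; an empty list means the resource conforms."""
    errors = []
    for field in profile["required"]:
        if field not in resource:
            errors.append(f"missing required field: {field}")
    for field, allowed in profile["coding_systems"].items():
        for coding in resource.get(field, {}).get("coding", []):
            if coding.get("system") not in allowed:
                errors.append(f"{field}: coding system {coding.get('system')!r} not allowed")
    return errors


if __name__ == "__main__":
    observation = {
        "resourceType": "Observation",
        "status": "final",
        "code": {"coding": [{"system": "http://loinc.org", "code": "8867-4"}]},
    }
    print(validate(observation, PROFILE))
```

The same base resource can be checked against different profiles, which is the property that would let a general-purpose Define-XML coexist with a stricter submission profile.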

Creating Define-XML as a “platform specification” useful across a broad range of use cases, while also creating a regulatory submission profile and implementation guide would seemingly be an ideal solution. As a platform specification Define-XML can be effectively used across a range of use cases. The definition of a regulatory submissions profile would permit the implementation of a comprehensive set of rules effectively adapting Define-XML for use in submissions without concern that these rules are limiting the usefulness of the base standard.

Monday, November 23, 2015

SHARE 2015 Q4 Technical Update

The SHARE API Pilot continues to make progress. A draft specification is being reviewed and a test server is running the API as currently specified. The pilot team will finalize the draft specification and spend the rest of the year testing and developing interfaces to the API. The SHARE team will also make use of the API to enhance tools in the SHARE ecosystem by making it easier for them to consume SHARE metadata. The SHARE API will be moved into production during Q2 of 2016.

A few last development details are being cleaned up for the SHARE RDF export, and then we'll work to generate a test export for the community to review. The initial exports will include SDTM, CDASH, and Controlled Terminology. The RDF content will be posted to eSHARE, and will also be available for consumption via the API once it has been moved into production.

The Biomedical Concept (BC) development tools have seen a number of upgrades this year. These tools are currently being used by the Collaborative Curation Initiative (CCI) to develop the BCs for the Prostate Cancer TA project. We're currently working to update the BC model within the SHARE MDR. Some have expressed an interest in gaining early access to this content via eSHARE and the API. Given the usefulness of BCs in supporting process automation and the generation of study artifacts, software vendors are keenly interested in the continued development of this content in 2016.

Over the course of 2015, eSHARE access has been expanded beyond Platinum members to include Gold members and academic researchers. For those interested in learning more about eSHARE, the webinar we delivered for the Gold member rollout is a useful place to start. New content is continually being posted to eSHARE as it becomes available. Earlier this year the first version of ADaM 2.1 was posted, and an updated version that includes new input from the ADaM team is expected soon. New TAUG content continues to be posted as it becomes available. In December we plan to post some new catalogs and listings, including the Domain Catalog. A BRIDG alignment report generated as part of the BC development will be posted for review. Finally, we plan to post an SDTM to CodeList spreadsheet that maps Controlled Terminology CodeLists to SDTM variables by the end of the year.

A draft position statement was created by the SHARE team with input from many stakeholders. Please add your comments to the wiki page:

Protocol standards were not available for loading into SHARE during 2015, but we plan to begin loading protocol objectives and endpoints into SHARE during 2016 Q1.  Given all the activity around the development of a protocol standard, the SHARE team plans to work with the CDISC terminology team and the NCI EVS to create a protocol vocabulary that can be loaded into SHARE. Before the end of the year, a proof-of-concept using elements of protocol in an end-to-end standards representation is planned.

Structured validation rules are currently under development by the SDS and ADaM teams. SHARE will use the wiki templates to develop rules in support of the TA standards using this format until a rule formalism is agreed upon. Once the formalism has been selected, the SHARE MDR model will be updated and rules in this format will be loaded. The new rule format will be piloted in 2016.

A wiki-based example repository is currently available. This repository will facilitate example re-use. The ability to re-use examples will be leveraged in support of the TA standards development projects. Before the end of the year, we will run a proof-of-concept that loads TAUG examples into the SHARE MDR.

Early in 2015 the CDASH-to-SDTM mapping was generated for eSHARE as part of the end-to-end standards work. By the end of the year we will post a similar SDTM-to-ADaM carry over variables file for review. BCs will play a key role in the development of end-to-end standards in 2016, and protocol will be added prior to the end of 2016.

Finally, the SHARE team is working as a part of the Prostate Cancer TA standards project to develop new tools in support of developing TA standards. Starting with the Prostate Cancer project, SHARE tools will be used to generate a draft TA Specification document containing the normative standards. This is one of the clearest examples of SHARE supporting an expedited TA standards development process. SHARE tools will also contribute to the TA User Guides which include non-normative content such as examples from the SHARE example repository. The Prostate Cancer TA standard project represents an innovative first step towards using automation to drive a significant portion of the standards development effort, and one that we will build upon in future projects.

New Initiatives Highlighted at the 2015 CDISC Interchange

Lots of new initiatives showed promise at the CDISC 2015 Interchange conference in Chicago on November 9-13. The EHR2CDASH (E2C) project demonstrated the potential of healthcare link technologies used with the CDISC standards. The E2C XPath statements used to grab HL7 C-CDA/CCD document content will be stored in SHARE to support the implementation of CDASH forms that can be pre-populated with EHR content. A group of CFAST TA standards development stakeholders reviewed the new processes and tools that will be used for developing the Prostate Cancer standard. Biomedical Concepts continue to garner attention as a key element of the semantic layer in the CDISC standards model. The growing CDISC standards model so important to the SHARE work was on display during the poster session. Use of the ODM standard was highlighted, including an extension to support its use in modern hand-held devices. There was quite a bit of enthusiasm and interest in SHARE activities, as evidenced by the crowd at the SHARE booth (read Anthony Chow's blog for more detail).

Most notably, however, it seemed that 2016 was being talked about as a transformative year for CDISC. So many of the new initiatives previewed at the 2015 Interchange will begin to show their impact in 2016. Many of the new SHARE tools will begin to be used more broadly next year, especially in support of TA standards development. We plan to generate our first TA specification document from SHARE next year that includes new content such as Rules and Examples. The beginnings of a protocol standard will be available in SHARE next year. The SHARE API being piloted in 2015 will roll into production in 2016. In addition to these new developments, significant new foundational standards versions are scheduled for next year, including CDASH v2.0 and ODM v2.0. The list of new CDISC projects and deliverables slated for next year is long. It’s a bit daunting to look at the entire list, so we'll stick with our agile approach and take it one sprint at a time. The results should make for an interesting 2016 Interchange.

* A CDISC wiki account is needed to access content on the SHARE wiki space

Monday, November 17, 2014

Promoting Data Sharing at the CDISC Interchange Conference

The theme of sharing clinical research data permeated a number of presentations at the CDISC International Interchange Conference this year. In fact, the opening plenary keynote presentation by General Peter Chiarelli, CEO of One Mind, highlighted the dire need for data sharing in clinical research and espoused a number of open science principles. One Mind defines open science as a “global movement to make scientific research, results and data available, and accessible to everyone.” The key goal behind this push for open science is to accelerate the research community’s ability to transform basic research into better clinical treatments for patients. You can find One Mind’s open science principles here.

One of One Mind’s open science principles involves adhering to widely accepted data standards. This makes sense because the standards help make the data useful. Sharing the data is not the end game. Using the data to accelerate the development of safe and effective treatments is what we’re after. Sharing data that cannot easily be interpreted or understood limits the value of the data. At the very least, it adds significantly to the cost and time needed to interpret and transform the data to make it useful. Using data standards adds to the value of the data in a measurable way, and promotes the positive effects targeted by data sharing and open science.

Another of One Mind’s open science principles asks that those who re-use the data give proper attribution and credit to the data generators. Earlier this year I attended a workshop focused on creating enhanced citation metadata for data standards to promote attribution for datasets. Enhanced citation metadata means that the citation would include additional citation details, such as role, to create a credit market for data and software tool sharing. Enhanced citation roles could be based on a terminology that includes options such as data provider, data analyzer, or software tool provider. Support for data citations will be added to the ODM-XML backlog for future development. Adding ODM-XML support for data citations should aid in the development of a credit market for data, as well as promote a data sharing culture within the clinical research community.