Published February 22, 2016 | Version v1
Software | Open Access

Processing and similarity scoring WHO ICTRP data

  • 1. Contractor, ClinicalTrials.gov

Description

Source code for "Previously Unidentified Duplicate Registrations of Clinical Trials: an Exploratory Analysis of Registry Data Worldwide" (under review).

This code was used to process the WHO International Clinical Trials Registry Platform (ICTRP) dataset retrieved in April 2015 (see Related works). It imports the XML data into a SQL database and applies a number of standardizations to the imported records. It also includes code that groups records by the primary registry IDs they reference and computes text-based similarity scores between registration fields.
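The workflow described above (XML import into SQL, standardization, grouping by referenced registry IDs, text similarity scoring) can be illustrated with the rough Python sketch below. It is not the released code and uses only the standard library. Several details are assumptions made for illustration: the XML element names (`Trial`, `TrialID`, `Secondary_IDs`, `Public_title`), the ID standardization rule in the hypothetical `normalize_id`, the `;`/`,` separator for secondary IDs, SQLite as the SQL database, and token-set Jaccard similarity as the scoring measure. The archived code may use different fields and a different similarity measure.

```python
import re
import sqlite3
import xml.etree.ElementTree as ET
from itertools import combinations

# Hypothetical ICTRP-style sample: element names and IDs are assumptions.
SAMPLE_XML = """
<Trials>
  <Trial>
    <TrialID>NCT00000001</TrialID>
    <Secondary_IDs>ISRCTN12345678</Secondary_IDs>
    <Public_title>Aspirin for the prevention of stroke</Public_title>
  </Trial>
  <Trial>
    <TrialID>isrctn-12345678</TrialID>
    <Secondary_IDs></Secondary_IDs>
    <Public_title>Aspirin to prevent stroke</Public_title>
  </Trial>
  <Trial>
    <TrialID>ACTRN00000000001</TrialID>
    <Secondary_IDs></Secondary_IDs>
    <Public_title>Exercise therapy for knee pain</Public_title>
  </Trial>
</Trials>
"""


def normalize_id(raw):
    """Hypothetical standardization: drop spaces and hyphens, uppercase."""
    return re.sub(r"[\s-]", "", raw).upper()


def load_xml(conn, xml_text):
    """Import each <Trial> element as one row of a SQLite table."""
    conn.execute(
        "CREATE TABLE trial (trial_id TEXT PRIMARY KEY, secondary_ids TEXT, title TEXT)"
    )
    for t in ET.fromstring(xml_text.strip()).iter("Trial"):
        conn.execute(
            "INSERT INTO trial VALUES (?, ?, ?)",
            (
                normalize_id(t.findtext("TrialID", "")),
                t.findtext("Secondary_IDs", ""),
                t.findtext("Public_title", "").strip(),
            ),
        )
    conn.commit()


def group_by_registry_ids(conn):
    """Group records linked through their own or referenced IDs (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for trial_id, secondary in conn.execute("SELECT trial_id, secondary_ids FROM trial"):
        find(trial_id)
        for sid in re.split(r"[;,]", secondary):  # assumed separators
            if sid.strip():
                union(trial_id, normalize_id(sid))

    groups = {}
    for (trial_id,) in conn.execute("SELECT trial_id FROM trial"):
        groups.setdefault(find(trial_id), []).append(trial_id)
    return sorted(sorted(g) for g in groups.values())


def jaccard(a, b):
    """Token-set Jaccard similarity of two strings, from 0.0 to 1.0."""
    ta = set(re.findall(r"\w+", a.lower()))
    tb = set(re.findall(r"\w+", b.lower()))
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def score_titles(conn):
    """Score every pair of records on title similarity."""
    rows = conn.execute("SELECT trial_id, title FROM trial ORDER BY trial_id").fetchall()
    return {
        (a[0], b[0]): round(jaccard(a[1], b[1]), 3)
        for a, b in combinations(rows, 2)
    }


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_xml(conn, SAMPLE_XML)
    print(group_by_registry_ids(conn))
    for pair, score in score_titles(conn).items():
        print(pair, score)
```

Union-find is used here so that chains of references (A cites B, B cites C) collapse into a single group. Scoring every pair grows quadratically with the number of records, so a registry-scale run would likely need some form of blocking before comparison.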

The README file included with the code lists the dependencies and gives detailed instructions for running the code.

Files (47.3 MB)

ictrp-source-code.zip (47.3 MB, md5:29322bd38e9ed6e8099683fce28f24f8)
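After downloading, the archive can be checked against the MD5 checksum listed for it. The short Python sketch below uses the standard `hashlib` module and reads the file in chunks so the whole 47.3 MB archive is not loaded into memory at once. The default local filename `ictrp-source-code.zip` and the helper names `md5_of_file` and `verify` are assumptions for illustration.

```python
import hashlib
import os
import sys

# MD5 checksum listed for ictrp-source-code.zip on this record.
EXPECTED_MD5 = "29322bd38e9ed6e8099683fce28f24f8"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 matches the expected checksum."""
    return md5_of_file(path) == expected


if __name__ == "__main__":
    # Hypothetical default path; pass the actual download location as an argument.
    path = sys.argv[1] if len(sys.argv) > 1 else "ictrp-source-code.zip"
    if os.path.exists(path):
        print("OK" if verify(path) else "MISMATCH: " + md5_of_file(path))
    else:
        print(path + " not found")
```

An MD5 match confirms that the download was not truncated or corrupted. It is not a cryptographic guarantee of authenticity.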

Additional details

Related works

Is supplemented by
10.5281/zenodo.46392 (DOI)