Community provided software for working with OMOP data

Last updated on 2025-09-17

Overview

Questions

What are some of the R tools that can work on OMOP CDM instances?

Objectives

Brief outline of some R tools that will be useful for new OMOP users.

Introduction

There is a range of community-provided R tools that can help you work with instances of OMOP data.

We are going to show you a brief summary of some that are likely to be of use to new users.

With each package, you will need to weigh the cost of learning some new syntax against the benefit of the extra functionality that the package provides.

Package	What it does
OmopSketch	Summarises an OMOP database.
CodelistGenerator	Generates lists of OMOP concepts.

OmopSketch

OmopSketch summarises key information about an OMOP database. It provides a broad characterisation of the data and allows users to evaluate whether the data are suitable for particular research.

First we can install the package and its dependencies and connect to some mock data.

R

# Without dependencies = TRUE it failed, needing omock and visOmopResults
# With dependencies = TRUE it installed 91 packages in 1.8 minutes
install.packages("OmopSketch", dependencies = TRUE, quiet = TRUE)

library(dplyr)
library(OmopSketch)

# Connect to mock database
cdm <- mockOmopSketch()

summarise* and table* functions

The package has the following types of functions:

Function prefix	What it does
summarise*	Generates results objects.
table*	Converts results objects to tables for display.
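The two-step pattern can be sketched with the snapshot functions used in the next section. This is a minimal illustration that assumes a fresh mock database; the intermediate object name `snapshot` is ours, and `glimpse()` comes from dplyr rather than OmopSketch.

```r
library(OmopSketch)
library(dplyr)

# Connect to mock database (same call as above)
cdm <- mockOmopSketch()

# Step 1: a summarise* function returns a results object (a tibble-like table)
snapshot <- summariseOmopSnapshot(cdm)

# Look at the raw results before any formatting is applied
glimpse(snapshot)

# Step 2: the matching table* function formats those results for display
tableOmopSnapshot(snapshot, type = "gt")
```

Keeping the results object in a variable means you can inspect, filter or save it before deciding how to display it.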

Snapshot

Snapshot creates a broad summary of the database including person count, temporal extent and other metadata.

R

summariseOmopSnapshot(cdm) |>
  tableOmopSnapshot(type = "gt")

Estimate	Database name: mockOmopSketch
General
Snapshot date	2025-09-17
Person count	100
Vocabulary version	v5.0 18-JAN-19
Observation period
N	100
Start date	1958-01-22
End date	2019-12-24
Cdm
Source name	eunomia
Version	5.3
Holder name	-
Release date	-
Description	-
Documentation reference	-
Source type	duckdb

Missing data

Summarise missing data in each column of one or many cdm tables.

R

missingData <- summariseMissingData(cdm, c("drug_exposure"))
tableMissingData(missingData)

Column name	Estimate name	Database name: mockOmopSketch
drug_exposure
drug_exposure_id	N missing data (%)	0 (0.00%)
	N zeros (%)	0 (0.00%)
person_id	N missing data (%)	0 (0.00%)
	N zeros (%)	0 (0.00%)
drug_concept_id	N missing data (%)	0 (0.00%)
	N zeros (%)	0 (0.00%)
drug_exposure_start_date	N missing data (%)	0 (0.00%)
drug_exposure_start_datetime	N missing data (%)	21,600 (100.00%)
drug_exposure_end_date	N missing data (%)	0 (0.00%)
drug_exposure_end_datetime	N missing data (%)	21,600 (100.00%)
verbatim_end_date	N missing data (%)	21,600 (100.00%)
drug_type_concept_id	N missing data (%)	0 (0.00%)
	N zeros (%)	0 (0.00%)
stop_reason	N missing data (%)	21,600 (100.00%)
refills	N missing data (%)	21,600 (100.00%)
quantity	N missing data (%)	21,600 (100.00%)
days_supply	N missing data (%)	21,600 (100.00%)
sig	N missing data (%)	21,600 (100.00%)
route_concept_id	N missing data (%)	21,600 (100.00%)
	N zeros (%)	0 (0.00%)
lot_number	N missing data (%)	21,600 (100.00%)
provider_id	N missing data (%)	21,600 (100.00%)
	N zeros (%)	0 (0.00%)
visit_occurrence_id	N missing data (%)	0 (0.00%)
	N zeros (%)	0 (0.00%)
visit_detail_id	N missing data (%)	21,600 (100.00%)
	N zeros (%)	0 (0.00%)
drug_source_value	N missing data (%)	21,600 (100.00%)
drug_source_concept_id	N missing data (%)	21,600 (100.00%)
	N zeros (%)	0 (0.00%)
route_source_value	N missing data (%)	21,600 (100.00%)
dose_unit_source_value	N missing data (%)	21,600 (100.00%)
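Because the results object is a table, it can also be filtered with dplyr before display, for example to list only the columns that are entirely missing. The sketch below is hedged: the estimate name `"na_percentage"`, and the use of `group_level` for the table name and `variable_name` for the column name, are assumptions about how the results are stored, so the first step prints the estimate names that are actually present.

```r
library(OmopSketch)
library(dplyr)

cdm <- mockOmopSketch()

# Summarise missing data for two tables in one call
missingData <- summariseMissingData(cdm, c("drug_exposure", "condition_occurrence"))

# Check which estimate names the results actually contain
missingData |>
  as_tibble() |>
  distinct(estimate_name)

# ASSUMPTION: the percentage of missing values is stored as "na_percentage",
# the table name in group_level and the column name in variable_name.
# Adjust these names if the step above shows something different.
fullyMissing <- missingData |>
  as_tibble() |>
  filter(estimate_name == "na_percentage",
         as.numeric(estimate_value) == 100) |>
  select(group_level, variable_name, estimate_value)

fullyMissing
```

If the assumed names do not match your installed version, the filter returns zero rows rather than an error.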

Clinical Records

summariseClinicalRecords() summarises OMOP tables from a cdm. By default it reports measures including the number of records per person, how many records use standard concepts, and the source vocabularies.

R

summariseClinicalRecords(cdm, "condition_occurrence") |>
  tableClinicalRecords(type = "gt")

Variable name	Variable level	Estimate name	Database name: mockOmopSketch
condition_occurrence
Number records	-	N	8,400
Number subjects	-	N (%)	100 (100.00%)
Records per person	-	Mean (SD)	84.00 (9.83)
		Median [Q25 - Q75]	84 [77 - 91]
		Range [min to max]	[65 to 107]
In observation	Yes	N (%)	8,400 (100.00%)
Domain	Condition	N (%)	8,400 (100.00%)
Source vocabulary	No matching concept	N (%)	8,400 (100.00%)
Standard concept	S	N (%)	8,400 (100.00%)
Type concept id	Unknown type concept: 1	N (%)	8,400 (100.00%)

You can also:

apply it to more than one table at a time
limit which records-per-person measures are reported
reduce the number of rows by setting options to FALSE
stratify by sex and/or age
set a date range

R

summariseClinicalRecords(cdm, c("drug_exposure", "measurement"),
                         recordsPerPerson = c("mean", "sd"),
                         inObservation = FALSE,
                         standardConcept = FALSE,
                         sourceVocabulary = FALSE,
                         domainId = FALSE,
                         typeConcept = FALSE,
                         sex = TRUE) |>
  tableClinicalRecords(type = "gt")

Variable name	Variable level	Estimate name	Database name: mockOmopSketch
drug_exposure; overall
Number records	-	N	21,600.00
Number subjects	-	N (%)	100 (100.00%)
Records per person	-	Mean (SD)	216.00 (14.47)
drug_exposure; Female
Number records	-	N	13,526.00
Number subjects	-	N (%)	63 (100.00%)
Records per person	-	Mean (SD)	214.70 (15.68)
drug_exposure; Male
Number records	-	N	8,074.00
Number subjects	-	N (%)	37 (100.00%)
Records per person	-	Mean (SD)	218.22 (12.01)
measurement; overall
Number records	-	N	5,900.00
Number subjects	-	N (%)	100 (100.00%)
Records per person	-	Mean (SD)	59.00 (7.85)
measurement; Female
Number records	-	N	3,742.00
Number subjects	-	N (%)	63 (100.00%)
Records per person	-	Mean (SD)	59.40 (8.37)
measurement; Male
Number records	-	N	2,158.00
Number subjects	-	N (%)	37 (100.00%)
Records per person	-	Mean (SD)	58.32 (6.93)
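The example above stratifies by sex only. A sketch of the age and date range options follows. It assumes that summariseClinicalRecords() accepts `ageGroup` and `dateRange` arguments in the same form as summariseRecordCount(); check `?summariseClinicalRecords` for your installed version. The age bands and dates are arbitrary illustrations.

```r
library(OmopSketch)
library(dplyr)

cdm <- mockOmopSketch()

# ASSUMPTION: ageGroup and dateRange take the same form here as in
# summariseRecordCount(): a named list of age bands and a length-two Date vector
byAge <- summariseClinicalRecords(cdm, "condition_occurrence",
                                  recordsPerPerson = c("mean", "sd"),
                                  ageGroup = list("<40" = c(0, 39), ">=40" = c(40, Inf)),
                                  dateRange = as.Date(c("2010-01-01", "2019-12-31")))

tableClinicalRecords(byAge, type = "gt")
```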

Record counts over time

You can plot the number of records over time for any cdm table, optionally stratified by age and sex. Behind the scenes, stratifying means joining the clinical table to the person table.

R

recordCount <- summariseRecordCount(cdm,
                                    omopTableName = "drug_exposure",
                                    interval = "years",
                                    sex = TRUE,
                                    ageGroup = list("<40" = c(0, 39), ">=40" = c(40, Inf)),
                                    dateRange = as.Date(c("2002-01-01", NA)))

plotRecordCount(recordCount, facet = "sex", colour = "age_group")
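To make the join to the person table concrete, here is a rough dplyr equivalent that counts drug exposure records per year and sex. It is only an illustration of the idea, not how OmopSketch implements it. It assumes records are assigned to the year of `drug_exposure_start_date`, and it collects the data into R first, which is fine for the mock data but may not suit a large database. The object name `recordsBySexYear` is ours.

```r
library(OmopSketch)
library(dplyr)

cdm <- mockOmopSketch()

recordsBySexYear <- cdm$drug_exposure |>
  select(person_id, drug_exposure_start_date) |>
  # Join each clinical record to the person it belongs to
  inner_join(cdm$person |> select(person_id, gender_concept_id),
             by = "person_id") |>
  # Bring the joined rows into R so that base R date functions can be used
  collect() |>
  mutate(
    year = as.integer(format(drug_exposure_start_date, "%Y")),
    # 8507 and 8532 are the standard OMOP gender concepts for male and female
    sex = case_when(
      gender_concept_id == 8507 ~ "Male",
      gender_concept_id == 8532 ~ "Female",
      TRUE ~ "None"
    )
  ) |>
  count(year, sex, name = "n_records")

recordsBySexYear
```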

Concept Id counts

You can get a summary of the number of records for each concept ID in a cdm table. Unfortunately the summary doesn't display here, but you can copy and paste the code into your console to see it.

R

result <- summariseConceptIdCounts(cdm = cdm, omopTableName = "condition_occurrence")
tableConceptIdCounts(head(result, 5), display = "standard", type = "datatable")
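If the interactive table is not available, the first few rows of the results object can be printed as a plain tibble instead. This sketch assumes only the standard results columns `variable_name`, `variable_level`, `estimate_name` and `estimate_value`; the object name `preview` is ours.

```r
library(OmopSketch)
library(dplyr)

cdm <- mockOmopSketch()

result <- summariseConceptIdCounts(cdm = cdm, omopTableName = "condition_occurrence")

# Drop the results class and keep a few descriptive columns for printing
preview <- result |>
  as_tibble() |>
  select(variable_name, variable_level, estimate_name, estimate_value) |>
  head(5)

preview
```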

Key Points

Brief outline of some R tools that will be useful for new OMOP users.