Content from What is OMOP?
Last updated on 2025-09-12 | Edit this page
Overview
Questions
- What is OMOP?
- What information would you expect to find in the person table?
- What information would you expect to find in the condition_occurrence table?
- How can you join these tables to aggregate information?
Objectives
- Examine the diagram of the OMOP tables and the data specification
- Interrogate the data in the tables
- Join these tables to find the concept names
Setting up R
Getting started
Since we want to import the files called *.csv
into our
R environment, we need to be able to tell our computer where the file
is. To do this, we will create a “Project” with RStudio that contains
the data we want to work with. The “Projects” interface in RStudio not
only creates a working directory for you, but also remembers its
location (allowing you to quickly navigate to it). The interface also
(optionally) preserves custom settings and open files to make it easier
to resume work after a break.
Introduction
OMOP is a format for recording Electronic Healthcare Records. It allows you to follow a patient journey through a hospital by linking every aspect to a standard vocabulary thus enabling easy sharing of data between hospitals, trusts and even countries.
OMOP CDM Diagram

OMOP CDM stands for the Observational Medical Outcomes Partnership Common Data Model. You don’t really need to remember what OMOP stands for. Remembering that CDM stands for Common Data Model can help you remember that it is a data standard that can be applied to different data sources to create data in a Common (same) format. The table diagram will look confusing to start with but you can use data in the OMOP CDM without needing to understand (or populate) all 37 tables.
Test yourself
Look at the OMOP-CDM figure and answer the following questions:
- Which table is the key to all the other tables?
- Which table allows you to distinguish between different stays in hospital?
The Person table
The Visit_occurrence table
There are a handful of core tables and columns that contain key information about a patient’s journey in the hospital. These are 7 tables to get you started :
- person uniquely identifies each person or patient, and some demographic information. This is the central table that all other tables relate to.
- condition_occurrence records relating to a Person suggesting the presence of a medical condition.
- drug_exposure records about exposure of a patient to a drug.
- procedure_occurrence activities carried out by a healthcare provider on the patient with a diagnostic or therapeutic purpose.
- measurement numerical or categorical values obtained through standardized examination of a Person or Person’s sample.
- observation clinical facts about a Person obtained in the context of examination, questioning or a procedure.
- visit_occurrence records of times where Persons engage with the healthcare system.
Why use OMOP?

Once a database has been converted to the OMOP CDM, evidence can be generated using standardized analytics tools. This means that different tools can also be shared and reused. So using OMOP can help make your research FAIR.
Some simple tables
Loading Data
Now that we are set up with an Rstudio project, we are sure that the
data and scripts we are using are all in our working directory. The data
files should be located in the directory data
, inside the
working directory. Now we can load the data into R, there are three data
files person.csv
, condition_occurrence.csv
and
drug_exposure.csv
. Read each of these into a table with the
same name.
Read the Data
There are three data files person.csv
,
condition_occurrence.csv
and
drug_exposure.csv
. Read each of these into a table with the
same name.
NOTE: The data does have headers.
R
person <- read.csv(file = "data/person.csv")
condition_occurrence <- read.csv(file = "data/condition_occurrence.csv")
drug_exposure <- read.csv(file = "data/drug_exposure.csv")
When you have read in the data, take some time to explore it.
Adding concept names
You will have noticed that content of the tables are not terribly easy to understand. This is because everything in OMOP is viewed as a concept that allows it to be related to one or more standard vocabularies such as SNOMED, ICD-10, etc.
We have developed a package that makes it very easy to add concept names to the tables.
You will need the function omopcept::omop_join_name_all(). This will look up the concept_id in the main table of concepts and add a column for the name of the concept associated with that id.
Who’s who?
By creating tables that also have the name of the concepts answer the following questions
- How old is the black gentleman?
- In which month was an unspecified fever prevalent in the hospital?
- What was the ethnicity of the patient not affected by this fever?
- Give a description of the patient who received Amoxicillin because they were wheezing?
R
library(omopcept)
person_named <- person |> omop_join_name_all()
OUTPUT
Warning: downloading a subset of omop vocab files, pre-processed.
If you want to make sure you have the vocabs you need, download from Athena, save locally & call `omop_vocabs_preprocess()`
OUTPUT
downloading concept file, may take a few minutes, this only needs to be repeated if the package is re-installed
R
condition_occurrence_named <- condition_occurrence |> omop_join_name_all()
drug_exposure_named <- drug_exposure |> omop_join_name_all()
- 25 (or 24 if he hasn’t had his birthday this year)
- July
- Don’t know - it hasn’t been specified
- A 53/54 white female
Joining and interrogating the tables
Using join
We established when looking at the diagram that the person table was the key to accessing all the other tables. In fact it is the person_id column that is the actual key that will allow us to join with other tables.
So we can join two of the tables together to get information about the different conditions suffered by each person.
I am going to use a left join because I want a record of every person and the conditions they may have.
R
library(dplyr)
person_condition <-
person_named |>
left_join(condition_occurrence_named, by = join_by(person_id) )
This produces a new table with all the column names from both tables and six rows.
Using count
Challenge
Count the number of people with each condition
R
person_condition |> count(gender_concept_name, condition_concept_name)
OUTPUT
# A tibble: 5 × 3
gender_concept_name condition_concept_name n
<chr> <chr> <int>
1 FEMALE Fever, unspecified 2
2 FEMALE Nausea and vomiting 1
3 FEMALE Wheezing 1
4 MALE Fever, unspecified 1
5 MALE Nausea and vomiting 1
This produces a table:

For the Confident
Using the “GiBleed” database work out the number of male and female patients with each condition_concept_id
R
cdm$person |>
left_join( cdm$condition_occurrence, by = join_by(person_id) ) |>
group_by(condition_concept_id, gender_concept_id) |>
summarise(num_persons = n_distinct(person_id)) |>
collect() |>
ungroup() |>
omopcept::omop_join_name_all() |>
#remove some columns to make display clearer
select(-condition_concept_id, -gender_concept_id) |>
arrange(condition_concept_name)
OUTPUT
# A tibble: 158 × 3
condition_concept_name gender_concept_name num_persons
<chr> <chr> <dbl>
1 Acute allergic reaction FEMALE 59
2 Acute allergic reaction MALE 57
3 Acute bacterial sinusitis FEMALE 418
4 Acute bacterial sinusitis MALE 368
5 Acute bronchitis MALE 1243
6 Acute bronchitis FEMALE 1300
7 Acute cholecystitis FEMALE 29
8 Acute cholecystitis MALE 6
9 Acute viral pharyngitis MALE 1284
10 Acute viral pharyngitis FEMALE 1322
# ℹ 148 more rows
Solutions for working out Who’s who programmatically.
How old is the black gentleman?
R
library(dplyr)
year_of_birth <- person_named %>%
filter(grepl("black", race_concept_name, ignore.case = TRUE)) %>%
select(year_of_birth)
person_age <- year_of_birth$year_of_birth[1]
age = 2025 - person_age
print(age)
OUTPUT
[1] 25
In which month was an unspecified fever prevalent in the hospital?
The lubridate package has a function
R
library(lubridate)
OUTPUT
Attaching package: 'lubridate'
OUTPUT
The following objects are masked from 'package:base':
date, intersect, setdiff, union
R
month = month(ymd("2025-01-30"), label = TRUE)
print(month)
OUTPUT
[1] Jan
12 Levels: Jan < Feb < Mar < Apr < May < Jun < Jul < Aug < Sep < ... < Dec
which returns month = Jan
The dyplr package has a function that will allow you to add columns to a table
R
new_person_table = mutate(person, age = 2025-year_of_birth)
print(new_person_table)
OUTPUT
person_id year_of_birth gender_concept_id race_concept_id age
1 1 1980 8532 46285833 45
2 2 1971 8532 46286810 54
3 3 2000 8507 46285836 25
4 4 2010 8507 37394011 15
which adds a column called age to the person table
R
library(dplyr)
library(lubridate)
fever_months <- condition_occurrence_named %>%
# Filter for rows where the condition name contains "fever"
filter(grepl("fever", condition_concept_name, ignore.case = TRUE)) %>%
# Extract month from the condition_start_date
mutate(month = month(condition_start_date, label = TRUE)) %>%
# Select just the month column for the results
select(month) %>%
# Count occurrences by month
group_by(month) %>%
summarise(count = n()) %>%
# Sort by count (descending)
arrange(desc(count))
# This gives us a table with the months where people had a fever and how many people had the fever each month. The question tells us that there is only one month so we select that.
fever <- fever_months$month[1]
# View the results
print(fever)
OUTPUT
[1] Jul
12 Levels: Jan < Feb < Mar < Apr < May < Jun < Jul < Aug < Sep < ... < Dec
What was the ethnicity of the patient not affected by this fever?
R
library(dplyr)
people_without_condition <- person_named %>%
# we want to join with the people who do not meet the condition
anti_join(condition_occurrence_named %>%
filter(grepl("fever", condition_concept_name, ignore.case = TRUE)),
by = "person_id")
ethnicity <- people_without_condition$race_concept_name
# View results
print(ethnicity)
OUTPUT
[1] "Ethnicity not stated"
Give a description of the patient who received Amoxicillin because they were wheezing?
R
library(dplyr)
# Query to find persons prescribed amoxicillin for wheezing
amoxicillin_for_wheezing <- person_named %>%
# Join with condition_occurrence table
inner_join(condition_occurrence_named, by = "person_id") %>%
# Filter for wheezing condition
filter(grepl("wheez", condition_concept_name, ignore.case = TRUE)) %>%
# Join with drug_exposure table to find medications
inner_join(drug_exposure_named, by = "person_id") %>%
# Filter for amoxicillin prescriptions
filter(grepl("amoxicillin", drug_concept_name, ignore.case = TRUE))
# again the question implies there is only one person
age <- 2025-amoxicillin_for_wheezing$year_of_birth[1]
gender <- amoxicillin_for_wheezing$gender_concept_name[1]
ethnicity <- amoxicillin_for_wheezing$race_concept_name[1]
# View results
print(age)
OUTPUT
[1] 54
R
print(gender)
OUTPUT
[1] "FEMALE"
R
print(ethnicity)
OUTPUT
[1] "White: English or Welsh or Scottish or Northern Irish or British - England and Wales ethnic category 2011 census"
- Using a standard makes it much easier to share data
- OMOP uses concepts to link dated to standard vocabularies
- R can be used to join and interrogate data
Content from Why OMOP?
Last updated on 2025-09-17 | Edit this page
TODO not sure we need this episode, maybe some objectives could be copied to others
Overview
Questions
- Why use OMOP?
- Why not use spreadsheets?
- What are the advantages of OMOP?
- What are the disadvantages of OMOP?
Objectives
- Examine the diagram of the OMOP tables and the data specification
- Familiarise with the vocab schema
- Join two or more tables together
- Attempt to join data from spreadsheets with different structures
- Describe the pros and cons on using OMOP vs raw data, why this is the way forward
- Use Athena and other OHDSI tools for reference
- Describe the full landscape of OMOP tools and the community
Introduction
TODO decide whether to include this earlier example, doesn’t currently work because we don’t have the data. Maybe not ?
Challenge 1: Compare data from two separate OMOP data sets
Let’s read in an OMOP extract called extract_1 from the local files.
R
omop_dataset_file_location_1 <- here::here("extracts/uclh1")
extract_1 <- read_omop_dataset(omop_dataset_file_location_1)
A colleague at the hospital is familiar with the events that occurred in hospital to one of the patients in the dataset. This patient has been identified by the data team as the anonymised patient with person id 7.
R
extract_1$person |>
filter(person_id==7) |>
select(person_id, race_concept_id, gender_concept_id, year_of_birth) |>
omopcept::omop_join_name_all() |>
collect()
Verify that the patient details match your colleague’s description.
Let’s take a sample of patients in this dataset, selecting those same columns.
R
extract_1_pt_sample <- extract_1$person |>
slice_sample(n = 10) |>
select(person_id, race_concept_id, gender_concept_id, year_of_birth) |>
collect()
We’ve received another OMOP dataset from another site.
R
omop_dataset_file_location_2 <- here::here("extracts/other_site_1")
extract_2 <- read_omop_dataset(omop_dataset_file_location_2)
Let’s take a sample of patients from the second extract and bind them together.
Note that, because the structure of the data (table names, columns and data types) are set as standard by the OMOP specification, we are guaranteed to be able to bind these two datasets together without error. We can also re-apply the same code, only changing the reference to the new extract.
R
extract_2_pt_sample <- extract_2$person |>
slice_sample(n = 10) |>
select(person_id, race_concept_id, gender_concept_id, year_of_birth) |>
collect()
bind_rows(extract_1_pt_sample, extract_2_pt_sample)
Content from concepts and conditions
Last updated on 2025-09-17 | Edit this page
Overview
Questions
- What is an OMOP concept ?
- Where are patient conditions stored in OMOP ?
Objectives
- Understand that nearly everything in a hospital can be represented by an OMOP concept_id.
- Know that OMOP data usually includes the OMOP concept table and other data from the vocabularies
- Be able to look up concepts by their name
- Know that patient conditions are stored in the condition_occurrence table
Introduction
Nearly everything in a hospital can be represented by an OMOP concept_id.
Any column within the OMOP CDM named *concept_id
contains OMOP concept IDs. An OMOP concept_id is a unique integer
identifier. Concept_ids are defined in the OMOP concept table where a
corresponding name and other attributes are stored. OMOP contains
concept_ids for other medical vocabularies such as SNOMED and LOINC,
which OMOP terms as source vocabularies.
concept table columns | Description |
---|---|
concept_id | Unique identifier for the concept. |
concept_name | Name or description of the concept. |
domain_id | The domain to which the concept belongs (e.g. Condition, Drug). |
vocabulary_id | The vocabulary from which the concept originates (e.g. SNOMED, RxNorm). |
concept_class_id | Classification within the vocabulary (e.g. Clinical Finding, Ingredient). |
standard_concept | ‘S’ for standard concepts that can be included in network studies |
concept_code | Code used by the source vocabulary to identify the concept. |
valid_start_date | Date the concept became valid in OMOP. |
valid_end_date | Date the concept ceased to be valid. |
invalid_reason | Reason for invalidation, if applicable |
Looking up OMOP concepts
OMOP concepts can be looked up in Athena an online tool provided by OHDSI.
TODO provide a challenge to look up a concept in Athena TODO add a challenge to look up the same concept in one of the synthea datasets TODO add briefly about concept relationships
TODO ? add about the condition_occurrence table to make the episode longer and give something for concept to refer to
- Understand that nearly everything in a hospital can be represented by an OMOP concept_id.
- Know that OMOP data usually includes the OMOP concept table and other data from the vocabularies
- Be able to look up concepts by their name
- Know that patient conditions are stored in the condition_occurrence table
Content from measurements and observations
Last updated on 2025-09-17 | Edit this page
Overview
Questions
- How to access measurements and observations ?
Objectives
- know that measurements are mainly lab results and other records like pulse rate
- know observations are other facts obtained through questioning or direct observation
- understand concept ids identify the measure or observation, values are stored in value_as_number or value_as_concept_id
- be able to join to the concept table to find a particular measurement or observation concept by name
- understand that different clinical questions can be answered by querying by patient and/or visit, or summing across all records
Introduction
The OMOP measurement and observation tables contain information collected about a person. The difference between them is that measurement contains numerical or categorical values collected by a standardised process, whereas observation contains less standardised clinical facts. Measurements are predominately lab tests with a few exceptions, like blood pressure or function tests.
A person_id
column means that there can be multiple
records per person.
Columns are similar between measurement and observation.
Concepts and values
Data are stored as questions and answers. A question
(e.g. Pulse rate
) is defined by a concept_id and the answer
is stored in a value column.
measurement_concept_id or observation_concept_id columns define what has been recorded here are some examples :
Example Measurement concepts | Example Observation concepts |
---|---|
Respiratory rate | Respiratory function |
Pulse rate | Wound dressing observable |
Hemoglobin saturation with oxygen | Mandatory breath rate |
Body temperature | Body position for blood pressure measurement |
Diastolic blood pressure | Alcohol intake - finding |
Arterial oxygen saturation | Tobacco smoking behavior - finding |
Body weight | Vomit appearance |
Leukocytes [#/volume] in Blood | State of consciousness and awareness |
These value columns store values :
column name | data type | example | concept_name |
---|---|---|---|
value_as_number | numeric value | 1.2 | - |
unit_concept_id | units of the numeric value | 9529 | kilogram |
value_as_concept_id | categorical value | 4328749 | High |
operator_concept_id | optional operators | 4172704 | > |
Where values are a concept_id, the name of that concept can be looked up in the concept table that is part of the OMOP vocabularies and included in most CDM instances. We show this below.
Looking at numeric measurement values
R
install.packages("MeasurementDiagnostics")
library(dplyr)
cdm <- MeasurementDiagnostics::mockMeasurementDiagnostics()
# first we can see that some concepts have a value_as_number
freq_numeric <- cdm$measurement |>
filter(!is.na(value_as_number)) |>
count(measurement_concept_id, unit_concept_id) |>
# join concept names
left_join(select(cdm$concept, concept_id, concept_name), by=join_by(measurement_concept_id==concept_id)) |>
rename(measurement_concept_name=concept_name) |>
left_join(select(cdm$concept, concept_id, concept_name), by=join_by(unit_concept_id==concept_id)) |>
rename(unit_concept_name=concept_name) |>
collect()
freq_numeric |> head(3)
Output of numeric values
TODO these data from MeasurementDiagnostics aren’t great but they are a start
- Good that it is a complete CDM with the concept table for joining names.
- Bad few values and unrealistic units
OUTPUT
measurement_concept_id unit_concept_id n measurement_concept_name unit_concept_name
<int> <dbl> <dbl> <chr> <chr>
1 3002069 9529 50 Alkaline phosphatase.bone/Alkaline phosphatase.… kilogram
2 3012056 9529 50 Uroporphyrin 3 isomer [Moles/volume] in Urine kilogram
3 3026074 9529 50 Uroporphyrin 3 isomer [Moles/volume] in Stool kilogram
Challenge 1: Can you adapt the code above to output the frequency of categorical values ?
R
# other concepts have a value_as_concept_id
freq_categorical <- cdm$measurement |>
filter(!is.na(value_as_concept_id)) |>
count(measurement_concept_id, value_as_concept_id) |>
# join concept names
left_join(select(cdm$concept, concept_id, concept_name), by=join_by(measurement_concept_id==concept_id)) |>
rename(measurement_concept_name=concept_name) |>
left_join(select(cdm$concept, concept_id, concept_name), by=join_by(value_as_concept_id==concept_id)) |>
rename(value_as_concept_name=concept_name) |>
collect()
freq_categorical |> head(3)
OUTPUT
measurement_concept_id value_as_concept_id n measurement_concept_name value_as_concept_name
<int> <dbl> <dbl> <chr> <chr>
1 3001467 4328749 33 Alkaline phosphatase.bone [Enzymatic ac… High
2 3002069 4267416 33 Alkaline phosphatase.bone/Alkaline phos… Low
3 3011539 4267416 33 Uroporphyrin 3 isomer [Moles/volume] in… Low
When a measurement or observation was done
There are date and time columns indicating when a measurement or observation was performed.
A visit_occurrence_id specifies the visit that it occurred in.
TODO do we want to add more here with example data
Source data columns
There are columns that can store data from the source database that hasn’t been standardised to the same level. Usually it is not a good idea to use these in analyses because any code is unlikely to work on different data. The source columns can be used to check how the standardised data were arrived at.
Source columns in the measurement table :
measurement_source_value
measurement_source_concept_id
unit_source_value
value_source_value
Advanced : Linking measurement and observation tables to other tables e.g. specimen
There are measurement_event_id and observation_event_id columns that can link a record to the primary key in another table e.g. specimen_id. If these are used then meas_event_field_concept_id or obs_event_field_concept_id need to contain the concept_id corresponding to the linked table (in this case specimen).
For microbiology results observation_concept_id can store the genus & species of an organism & measurement_concept_id can store its growth.
TODO question in the above do they need to be linked by fact_relationship to deal with multiple organisms from the same specimen ? TODO do we want to add more here with example data
- know that measurements are mainly lab results and other records like pulse rate
- know observations are other facts obtained through questioning or direct observation
- understand concept ids identify the measure or observation, values are stored in value_as_number or value_as_concept_id
- be able to join to the concept table to find a particular measurement or observation concept by name
- understand that different clinical questions can be answered by querying by patient and/or visit, or summing across all records
Content from Visits
Last updated on 2025-09-17 | Edit this page
Overview
Questions
- What are visits and how can they be used ?
Objectives
- Know that a visit is a period of time and patients can have multiple visits
- Understand that multiple measurements, conditions etc. can occur within and between visits
- Understand that for some analyses you will want to look within visits and for other analyses to sum across visits
- Know that visits are recorded in the visit_occurrence table
- Know each visit is unique to a person
- Understand that other tables link to visits
- Understand how visits can be used to find co-occurrence of other events
Introduction
TODO not sure if this wants to be in a separate episode or a more general one about linking tables. It could always be renamed later.
The visit_occurrence
table contains events
where Persons engage with the healthcare system for a duration of
time.
The main clinical tables condition_occurrence
,
measurement
, observation
and
drug_exposure
contain a visit_occurrence_id
that links to this table.
visit_concept_id
specifies the kind of visit that took
place using standardised OMOP concepts. These include
Inpatient visit
, Emergency Room Visit
and
Outpatient Visit
. Inpatient visits can last for longer than
one day.
The visit_detail
table can contain information about
time periods shorter than the visit (for example transfer between wards)
but we will not cover that further here.
Generating some example data
Firstly here is some code (from ChatGPT) to create some interesting example data. TODO this code is lengthy, we may later want to hide it or save the output as .Rdata in the repo
R
#person, visit_occurrence, measurement, drug_exposure, condition_occurrence, and concept tables
#realistic blood pressure trends over time
#conditions and drugs co-occurring within visits
#concept table for joining concept names
# Install and load required packages
#install.packages(c("dplyr", "tibble", "lubridate", "uuid"), dependencies = TRUE)
library(dplyr)
library(tibble)
library(lubridate)
library(uuid)
library(tidyr)
set.seed(123)
# 1. Create 100 synthetic patients
n_patients <- 100
person <- tibble(
person_id = 1:n_patients,
gender_concept_id = sample(c(8507, 8532), n_patients, replace = TRUE), # Male / Female
year_of_birth = sample(1940:2000, n_patients, replace = TRUE)
)
# 2. Create multiple visits per patient
visits_per_person <- sample(2:5, n_patients, replace = TRUE)
visit_occurrence <- tibble(
person_id = rep(person$person_id, times = visits_per_person)
) |>
mutate(
visit_occurrence_id = row_number(),
visit_start_date = as_date("2020-01-01") + sample(0:1000, n(), replace = TRUE),
visit_concept_id = sample(c(9201, 9202, 9203), n(), replace = TRUE), # Inpatient, ER, Outpatient
#ER & outpatient just a day
visit_end_date = if_else( visit_concept_id == 9201,
visit_start_date + sample(1:3, n(), replace = TRUE),
visit_start_date),
)
# Baseline + slope per patient in one tibble
bp_params <- tibble(
person_id = 1:n_patients,
bp_slope = rnorm(n_patients, mean = 0, sd = 0.3), # mmHg per hour
systolic_base = rnorm(n_patients, mean = 120, sd = 10)
)
measurement <- visit_occurrence |>
left_join(bp_params, by = "person_id") |>
rowwise() |>
mutate(
n_hours = as.numeric(interval(visit_start_date, visit_end_date) / hours(1)),
hours_seq = list(visit_start_date + hours(0:n_hours))
) |>
ungroup() |>
select(person_id, visit_occurrence_id, bp_slope,
systolic_base, visit_start_date, hours_seq) |>
unnest(hours_seq) |>
mutate(
hours_since_start = as.numeric(difftime(hours_seq, visit_start_date, units = "hours")),
systolic = systolic_base +
(hours_since_start * bp_slope) + # per-hour slope
rnorm(n(), 0, 5), # noise
measurement_id = UUIDgenerate(n = n()),
measurement_concept_id = 3004249, # systolic
unit_concept_id = 8510, # mmHg
measurement_date = hours_seq
) |>
select(measurement_id, person_id, visit_occurrence_id,
measurement_date, measurement_concept_id,
systolic, unit_concept_id) |>
rename(value_as_number = systolic)
# 6. Define condition-drug co-occurrence map
condition_drug_map <- tribble(
~condition_concept_id, ~condition_name, ~drug_concept_id, ~drug_name,
201826, "Type 2 diabetes mellitus", 1124300, "Metformin",
320128, "Essential hypertension", 1112807, "Lisinopril",
319835, "Hyperlipidemia", 19019073, "Atorvastatin"
)
# 7. Generate conditions per visit
condition_occurrence <- visit_occurrence |>
rowwise() |>
do({
#n_conditions <- sample(1:2, 1)
#change n_conditions to 1 so that co-occurrence example works
n_conditions <- 1
selected_conditions <- condition_drug_map |>
slice_sample(n = n_conditions)
tibble(
condition_occurrence_id = UUIDgenerate(n_conditions),
person_id = .$person_id,
visit_occurrence_id = .$visit_occurrence_id,
condition_concept_id = selected_conditions$condition_concept_id,
condition_start_date = .$visit_start_date
)
}) |>
bind_rows()
# 8. Generate drug exposures that match conditions
drug_exposure <- condition_occurrence |>
left_join(condition_drug_map, by = "condition_concept_id") |>
group_by(person_id, visit_occurrence_id, drug_concept_id) |>
summarise(
drug_exposure_id = UUIDgenerate(1),
drug_exposure_start_date = min(condition_start_date),
.groups = "drop"
) |>
mutate(days_supply = sample(30:90, n(), replace = TRUE))
# 9. Build concept table (with all used concepts)
concept <- tribble(
~concept_id, ~concept_name,
3004249, "Systolic Blood Pressure",
3012888, "Diastolic Blood Pressure",
3027114, "Glucose",
3016502, "Creatinine",
1124300, "Metformin",
1112807, "Lisinopril",
19019073, "Atorvastatin",
201826, "Type 2 diabetes mellitus",
320128, "Essential hypertension",
319835, "Hyperlipidemia",
8510, "mm[Hg]",
8713, "mg/dL",
8840, "mmol/L",
9201, "Inpatient Visit",
9202, "Emergency Room Visit",
9203, "Outpatient Visit",
8507, "Male",
8532, "Female"
)
# 10. Combine into synthetic CDM object
cdm <- list(
person = person,
visit_occurrence = visit_occurrence,
measurement = measurement,
condition_occurrence = condition_occurrence,
drug_exposure = drug_exposure,
concept = concept
)
When do we need to consider visits ?
As we have seen we don’t need to consider visits to answer all questions. For example if we can count the number of patients with a particular condition without considering visits.
Using visits can help us with :
- different types of visits
- selecting data from an indivual or selected visits
- finding co-occurrence of events within a visit
We can query the visit_occurrence
table to see what
kinds of visits there are in the data.
R
library(dplyr)
cdm$visit_occurrence |>
count(visit_concept_id) |>
left_join(cdm$concept |> select(concept_id, concept_name),
by = c("visit_concept_id" = "concept_id"))
OUTPUT
# A tibble: 3 × 3
visit_concept_id n concept_name
<dbl> <int> <chr>
1 9201 105 Inpatient Visit
2 9202 105 Emergency Room Visit
3 9203 125 Outpatient Visit
R
cdm$visit_occurrence
OUTPUT
# A tibble: 335 × 5
person_id visit_occurrence_id visit_start_date visit_concept_id
<int> <int> <date> <dbl>
1 1 1 2020-12-02 9201
2 1 2 2022-03-07 9203
3 1 3 2020-01-26 9203
4 1 4 2021-06-22 9201
5 1 5 2022-09-07 9201
6 2 6 2021-06-02 9203
7 2 7 2022-08-13 9203
8 2 8 2022-01-26 9202
9 2 9 2021-10-27 9202
10 2 10 2021-07-06 9203
# ℹ 325 more rows
# ℹ 1 more variable: visit_end_date <date>
Can you use person_id
in the
visit_occurrence
table to find patients with more than one
visit ?
R
visit_counts <- cdm$visit_occurrence |>
group_by(person_id) |>
summarise(n_visits = n()) |>
filter(n_visits > 1) |>
collect()
visit_counts |> head(4)
OUTPUT
# A tibble: 4 × 2
person_id n_visits
<int> <int>
1 1 5
2 2 5
3 3 3
4 4 4
Now we can choose one of the patients from the previous table and look at all of their visits.
R
example_person_id <- visit_counts$person_id[1]
patient_visits <- cdm$visit_occurrence |>
filter(person_id == example_person_id) |>
left_join(cdm$concept, by = c("visit_concept_id" = "concept_id")) |>
select(
visit_occurrence_id, visit_start_date, visit_end_date, concept_name
) |>
arrange(visit_start_date) |>
collect()
patient_visits
OUTPUT
# A tibble: 5 × 4
visit_occurrence_id visit_start_date visit_end_date concept_name
<int> <date> <date> <chr>
1 3 2020-01-26 2020-01-26 Outpatient Visit
2 1 2020-12-02 2020-12-04 Inpatient Visit
3 4 2021-06-22 2021-06-24 Inpatient Visit
4 2 2022-03-07 2022-03-07 Outpatient Visit
5 5 2022-09-07 2022-09-10 Inpatient Visit
Outpatient and Emergency Room Visits usually end on the same day, where Inpatient visits can last longer.
Here is how we can plot a measurement (in this case blood pressure)
over time for a patient. You may remember from a previous lesson that
numeric measurements are stored in the column
value_as_number
.
First we filter the data.
R
# Define concept IDs
systolic_id <- 3004249
selected_patient <- 1
bp <- cdm$measurement |>
filter(person_id == selected_patient) |>
filter(measurement_concept_id %in% c(systolic_id))
head(bp,3)
OUTPUT
# A tibble: 3 × 7
measurement_id person_id visit_occurrence_id measurement_date
<chr> <int> <int> <dttm>
1 729e5403-5326-44a1-a538-1a0… 1 1 2020-12-02 00:00:00
2 80633537-2883-49d6-9d22-94a… 1 1 2020-12-02 01:00:00
3 7fd089fc-a1f1-44b1-adb0-09e… 1 1 2020-12-02 02:00:00
# ℹ 3 more variables: measurement_concept_id <dbl>, value_as_number <dbl>,
# unit_concept_id <dbl>
Then we can plot using ggplot2.
R
library(ggplot2)
ggplot(bp, aes(x = measurement_date,
y = value_as_number)) +
geom_point() +
theme_minimal()

You should be able to see some dates with single measurements and some with a few measurements very close to each other. Each separate group is a single visit.
Challenge
Can you modify the query and plot to show the type of visit ?
You can join to visit_occurrence to get visit_concept_id, and from
that join to the concept table to get concept_name. You can add
shape =
to the aes() statement in the plot.
R
# Define concept IDs
systolic_id <- 3004249
selected_patient <- 1
bp2 <- cdm$measurement |>
filter(person_id == selected_patient) |>
filter(measurement_concept_id %in% c(systolic_id)) |>
left_join(cdm$visit_occurrence, by = c("visit_occurrence_id", "person_id")) |>
left_join(cdm$concept, by = join_by(visit_concept_id == concept_id)) |>
rename(visit_concept_name = concept_name)
ggplot(bp2, aes(x = measurement_date, y = value_as_number, shape = visit_concept_name)) +
geom_point() +
theme_minimal()

To indicate the type of visit we need to join to visit_occurrence to get the visit_concept_id and then join to the concept table to get a name for that concept.
Note that we join the visit_occurrence table to the measurement table
using both ("visit_occurrence_id", "person_id")
. If we
didn’t include both, one of the columns would get duplicated and renamed
making it difficult to select later. Se with person_id
below.
R
cdm$measurement |>
left_join(cdm$visit_occurrence, by = c("visit_occurrence_id")) |>
names()
OUTPUT
[1] "measurement_id" "person_id.x" "visit_occurrence_id"
[4] "measurement_date" "measurement_concept_id" "value_as_number"
[7] "unit_concept_id" "person_id.y" "visit_start_date"
[10] "visit_concept_id" "visit_end_date"
Challenge
Can you plot the measurements for one of the Inpatient visits ?
You can filter bp created in the previous challenge by one of the visit_occurrence_id from the table earlier.
R
bp3 <- bp2 |> filter( visit_occurrence_id == 1 )
ggplot(bp3, aes(x = measurement_date, y = value_as_number, shape = visit_concept_name)) +
geom_point() +
theme_minimal()

You should be able to see hourly blood pressure measurements within a visit.
Here is an example where the visit can make a difference to an analysis. Imagine we want to look at which drugs are associated with which conditions for patients.
If we were to join tables by person_id
as we have seen
in previous exercises we could get conditions and drugs that were
separated by many years. Instead we can include
visit_occurrence_id
in the join to get co-occurrence of
conditions and drugs in the same visit.
R
co_occurrence_by_visit <- cdm$condition_occurrence |>
# Join with drug_exposure on person and visit
inner_join(cdm$drug_exposure, by = c("person_id", "visit_occurrence_id")) |>
# Count how often each condition–drug pair co-occurs in a visit
group_by(condition_concept_id, drug_concept_id) |>
summarise(co_occurrences = n(), .groups = "drop") |>
# Join to get condition name
left_join(
cdm$concept |> select(concept_id, concept_name),
by = c("condition_concept_id" = "concept_id")
) |>
rename(condition_name = concept_name) |>
# Join to get drug name
left_join(
cdm$concept |> select(concept_id, concept_name),
by = c("drug_concept_id" = "concept_id")
) |>
rename(drug_name = concept_name) |>
select(condition_name, drug_name, co_occurrences)
co_occurrence_by_visit
OUTPUT
# A tibble: 3 × 3
condition_name drug_name co_occurrences
<chr> <chr> <int>
1 Type 2 diabetes mellitus Metformin 127
2 Hyperlipidemia Atorvastatin 95
3 Essential hypertension Lisinopril 113
Here we see a perfect co-occurrence between conditions & expected drugs for treating that condition (because we generated the example data that way). If instead you didn’t use the visit_occurrence_id you would see more unexpected associations occurring across widely separated visits for the same patient.
Challenge
Can you repeat the query without using
visit_occurrence_id
to see what results you get ?
R
co_occurrence <- cdm$condition_occurrence |>
# Join with drug_exposure on person
inner_join(cdm$drug_exposure, by = c("person_id")) |>
# Count how often each condition–drug pair co-occurs
group_by(condition_concept_id, drug_concept_id) |>
summarise(co_occurrences = n(), .groups = "drop") |>
# Join to get condition name
left_join(
cdm$concept |> select(concept_id, concept_name),
by = c("condition_concept_id" = "concept_id")
) |>
rename(condition_name = concept_name) |>
# Join to get drug name
left_join(
cdm$concept |> select(concept_id, concept_name),
by = c("drug_concept_id" = "concept_id")
) |>
rename(drug_name = concept_name) |>
select(condition_name, drug_name, co_occurrences)
co_occurrence
OUTPUT
# A tibble: 9 × 3
condition_name drug_name co_occurrences
<chr> <chr> <int>
1 Type 2 diabetes mellitus Lisinopril 116
2 Type 2 diabetes mellitus Metformin 257
3 Type 2 diabetes mellitus Atorvastatin 103
4 Hyperlipidemia Lisinopril 89
5 Hyperlipidemia Metformin 103
6 Hyperlipidemia Atorvastatin 179
7 Essential hypertension Lisinopril 205
8 Essential hypertension Metformin 116
9 Essential hypertension Atorvastatin 89
Now we see that conditions co-occur with drugs unlikely to be used in their treatment because they came from visits further apart in time.
- Know that a visit is a period of time and patients can have multiple visits
- Understand that multiple measurements, conditions etc. can occur within and between visits
- Understand that for some analyses you will want to look within visits and for other analyses to sum across visits
- Know that visits are recorded in the visit_occurrence table
- Know each visit is unique to a person
- Understand that other tables link to visits
- Understand how visits can be used to find co-occurrence of other events
Content from Medications
Last updated on 2025-09-17 | Edit this page
Overview
Questions
- Where are medications stored ?
Objectives
- Know that exposure of a patient to medications is mainly stored in the drug_exposure table
- Understand that drug concepts can be at different levels of granularity
- Understand that source values are mapped to a standard vocabulary
Introduction
The OMOP drug_exposure table stores exposure of a patient to medications.
The main columns are :
column name | content |
---|---|
drug_exposure_id | unique identifier given that person can get multiple exposures per visit |
person_id | the patient |
drug_concept_id | standard drug identifier, can be at different levels of granularity |
drug_exposure_start_date | may also be an optional start_datetime |
drug_exposure_end_date | may also be an optional end_datetime |
drug_type_concept_id | where the record came from e.g. EHR administration record |
drug_source_value | in UCLH the NHS dm+d Dictionary of Medicines and Devices ID |
drug_source_concept_id | OMOP concept ID for the source value |
Drug data can be very complicated, as can the process of converting from the source data to OMOP. You may not find what you expect depending on this and the quality of the source data.
Drug concepts
The standard OHDSI drug vocabularies are called RxNorm
and RxNormExtension
. RxNorm
contains all drugs
currently on the US market. RxNormExtension
is maintained
by the OHDSI community and contains all other drugs.
A particular concept_id can be at one of a number of different levels in a drug hierarchy. These are the main levels in RxNorm which are stored in concept_class_id in the concept table.
RxNorm concept_class_id | Description |
---|---|
Ingredient | A base active drug ingredient, without strength or dose form (e.g., Ibuprofen). |
Clinical Drug Component | A drug component with strength but no form (e.g., Ibuprofen 200 mg). |
Clinical Drug | A combination of an ingredient, strength, and dose form (e.g., Ibuprofen 200 mg Oral Tablet). |
Drug mapping in the NHS
Drugs in the NHS are standardised to the NHS Dictionary of Medicines
and Devices (dm+d). dm+d is included in OMOP so there are values of OMOP
concept_id for each dm+d. However because dm+d is not a standard
vocabulary in OMOP it is translate once more to get to a standard OMOP
concept id in RxNorm
or RxNormExtension
that
can be used in collaborative studies. If there is a drug_concept_id
value of 0 and there are source codes this can be because that drug
doesn’t map to a standard ID. Reminder that the source values are stored
in these columns.
column name | content |
---|---|
drug_source_value | in UCLH the NHS dm+d Dictionary of Medicines and Devices ID |
drug_source_concept_id | OMOP concept ID for the source value |
TODO do we want to include about classifying medications by type ? It is not straightforward. I did something in omopcept but probably too tricky to include here.
R
# TODO later use the examples below or similar to create exercises
# OR do we want to make our own example data, e.g. including dm+d
# explore some drug_exposure data
library(CDMConnector)
library(DBI)
library(duckdb)
library(dplyr)
dbName <- "GiBleed" #smaller than other datasets quick to download
requireEunomia(datasetName = dbName)
con <- dbConnect(duckdb(), eunomiaDir(datasetName = dbName))
cdm <- cdmFromCon(con = con, cdmSchema = "main", writeSchema = "main", cdmName = dbName)
#67k records
#cdm$drug_exposure |> tally()
drugs_and_concept <- cdm$drug_exposure |>
left_join(cdm$concept, join_by(drug_concept_id==concept_id)) |>
collect()
drugs_and_concept |>
count(concept_class_id, sort=TRUE) |>
head(5)
# concept_class_id n
#1 Clinical Drug 36218
#2 CVX 25710
#3 Ingredient 2764
#4 Branded Pack 1640
#5 Quant Clinical Drug 1019
#note that CVX is an RxNorm id for vaccines
# top 5 clinical drugs
drugs_and_concept |>
filter(concept_class_id=="Clinical Drug") |>
count(drug_concept_id, concept_name, sort=TRUE) |>
head(5)
# drug_concept_id concept_name n
#1 1127433 Acetaminophen 325 MG Oral Tablet 9365
#2 19059056 Aspirin 81 MG Oral Tablet 4380
#3 1713671 Amoxicillin 250 MG / Clavulanate 125 MG Oral Tablet 3851
#4 1127078 Acetaminophen 160 MG Oral Tablet 2158
#5 40229134 Acetaminophen 21.7 MG/ML / Dextromethorphan Hydrobromide … 1993
# top 5 ingredients
# in this case probably drugs that couldn't be converted to a more granular level
drugs_and_concept |>
filter(concept_class_id=="Ingredient") |>
count(drug_concept_id, concept_name, sort=TRUE) |>
head(3)
# drug_concept_id concept_name n
#1 1118084 celecoxib 1844
#2 1124300 Diclofenac 850
#3 1367571 heparin 35
# TODO decide if we want to include drug quantity. Would need different example data.
# no drug quantity information in GiBleed
drugs_and_concept |> count(quantity, sort=TRUE)
# but there are plenty of days_supply
# oh dear, drug_strength has no rows so can't do the query below
# If you do have quantity data it can be converted to standard units using the
# drug_strength table that is part of the vocabularies.
# This will depend on the dose form which can be obtained from the relationship table.
# where the relationship_id is ‘Has dose form’.
# probably don't want to go into further here
# e.g. Tablets can be looked up
#"Acetaminophen 325 MG Oral Tablet" from above
cdm$drug_strength |> filter(drug_concept_id==1127433L) |>
#add name for the ingredient
left_join(select(cdm$concept,concept_id,concept_name), by=join_by(ingredient_concept_id==concept_id)) |>
rename(ingredient_concept_name=concept_name) |>
#add name for the amount unit
left_join(select(cdm$concept,concept_id,concept_name), by=join_by(amount_unit_concept_id==concept_id)) |>
rename(amount_unit_concept_name=concept_name) |>
select(drug_concept_id, ingredient_concept_name, amount_value, amount_unit_concept_name) |>
collect()
# to get all drugs containing a particular ingredient
# you can also use the drug_strength table
- Know that exposure of a patient to medications is mainly stored in the drug_exposure table
- Understand that drug concepts can be at different levels of granularity
- Understand that source values are mapped to a standard vocabulary
Content from Community provided software for working with OMOP data
Last updated on 2025-09-17 | Edit this page
Overview
Questions
- What are some of the R tools that can work on OMOP CDM instances ?
Objectives
- Brief outline of some R tools that will be useful for new OMOP users.
Introduction
There are a range of community provided R tools that can help you work with instances of OMOP data.
We are going to show you a brief summary of some that are likely to be of use to new users.
With each you may need to balance the need to learn some new syntax with the benefits of the extra functionality that the package provides.
TODO maybe add a table with links to packages & brief descriptions.
OmopSketch To summarise an OMOP database.
CodelistGenerator To generate lists of OMOP concepts.
OmopSketch To summarise key information about an OMOP database. To provide a broad characterisation of the data and to allow users to evaluate whether they are suitable for particular research.
First we can install the package and its dependencies and connect to some mock data.
R
#without dependencies=TRUE it failed needing omock & VisOmopResults
#with dependencies it installed 91 packages in 1.8 minutes
install.packages("OmopSketch", dependencies=TRUE, quiet=TRUE)
library(dplyr)
library(OmopSketch)
# Connect to mock database
cdm <- mockOmopSketch()
summarise* and table* functions
The package has the following types of functions :
function start | what it does |
---|---|
summarise* | generate results objects. |
table* | convert results objects to tables for display. |
Snapshot
Snapshot creates a broad summary of the database including person count, temporal extent and other metadata.
R
summariseOmopSnapshot(cdm) |>
tableOmopSnapshot(type = "gt")
Estimate |
Database name
|
---|---|
mockOmopSketch | |
General | |
Snapshot date | 2025-09-17 |
Person count | 100 |
Vocabulary version | v5.0 18-JAN-19 |
Observation period | |
N | 100 |
Start date | 1958-01-22 |
End date | 2019-12-24 |
Cdm | |
Source name | eunomia |
Version | 5.3 |
Holder name | - |
Release date | - |
Description | - |
Documentation reference | - |
Source type | duckdb |
Missing data
Summarise missing data in each column of one or many cdm tables.
R
missingData <- summariseMissingData(cdm, c("drug_exposure"))
tableMissingData(missingData)
Column name | Estimate name |
Database name
|
---|---|---|
mockOmopSketch | ||
drug_exposure | ||
drug_exposure_id | N missing data (%) | 0 (0.00%) |
N zeros (%) | 0 (0.00%) | |
person_id | N missing data (%) | 0 (0.00%) |
N zeros (%) | 0 (0.00%) | |
drug_concept_id | N missing data (%) | 0 (0.00%) |
N zeros (%) | 0 (0.00%) | |
drug_exposure_start_date | N missing data (%) | 0 (0.00%) |
drug_exposure_start_datetime | N missing data (%) | 21,600 (100.00%) |
drug_exposure_end_date | N missing data (%) | 0 (0.00%) |
drug_exposure_end_datetime | N missing data (%) | 21,600 (100.00%) |
verbatim_end_date | N missing data (%) | 21,600 (100.00%) |
drug_type_concept_id | N missing data (%) | 0 (0.00%) |
N zeros (%) | 0 (0.00%) | |
stop_reason | N missing data (%) | 21,600 (100.00%) |
refills | N missing data (%) | 21,600 (100.00%) |
quantity | N missing data (%) | 21,600 (100.00%) |
days_supply | N missing data (%) | 21,600 (100.00%) |
sig | N missing data (%) | 21,600 (100.00%) |
route_concept_id | N missing data (%) | 21,600 (100.00%) |
N zeros (%) | 0 (0.00%) | |
lot_number | N missing data (%) | 21,600 (100.00%) |
provider_id | N missing data (%) | 21,600 (100.00%) |
N zeros (%) | 0 (0.00%) | |
visit_occurrence_id | N missing data (%) | 0 (0.00%) |
N zeros (%) | 0 (0.00%) | |
visit_detail_id | N missing data (%) | 21,600 (100.00%) |
N zeros (%) | 0 (0.00%) | |
drug_source_value | N missing data (%) | 21,600 (100.00%) |
drug_source_concept_id | N missing data (%) | 21,600 (100.00%) |
N zeros (%) | 0 (0.00%) | |
route_source_value | N missing data (%) | 21,600 (100.00%) |
dose_unit_source_value | N missing data (%) | 21,600 (100.00%) |
Clinical Records
Allows you to summarise omop tables from a cdm. By default it gives measures including records per person, how many concepts are standard and source vocabularies.
R
summariseClinicalRecords(cdm, "condition_occurrence") |>
tableClinicalRecords(type = "gt")
Variable name | Variable level | Estimate name |
Database name
|
---|---|---|---|
mockOmopSketch | |||
condition_occurrence | |||
Number records | - | N | 8,400 |
Number subjects | - | N (%) | 100 (100.00%) |
Records per person | - | Mean (SD) | 84.00 (9.83) |
Median [Q25 - Q75] | 84 [77 - 91] | ||
Range [min to max] | [65 to 107] | ||
In observation | Yes | N (%) | 8,400 (100.00%) |
Domain | Condition | N (%) | 8,400 (100.00%) |
Source vocabulary | No matching concept | N (%) | 8,400 (100.00%) |
Standard concept | S | N (%) | 8,400 (100.00%) |
Type concept id | Unknown type concept: 1 | N (%) | 8,400 (100.00%) |
You can also
- apply to more than one table at a time
- reduce the measures of records per person
- reduce the number of rows by setting options to FALSE
- stratify by sex and/or age
- set a date range
R
summariseClinicalRecords(cdm, c("drug_exposure","measurement"),
recordsPerPerson = c("mean", "sd"),
inObservation = FALSE,
standardConcept = FALSE,
sourceVocabulary = FALSE,
domainId = FALSE,
typeConcept = FALSE,
sex = TRUE) |>
tableClinicalRecords(type = "gt")
Variable name | Variable level | Estimate name |
Database name
|
---|---|---|---|
mockOmopSketch | |||
drug_exposure; overall | |||
Number records | - | N | 21,600.00 |
Number subjects | - | N (%) | 100 (100.00%) |
Records per person | - | Mean (SD) | 216.00 (14.47) |
drug_exposure; Female | |||
Number records | - | N | 13,526.00 |
Number subjects | - | N (%) | 63 (100.00%) |
Records per person | - | Mean (SD) | 214.70 (15.68) |
drug_exposure; Male | |||
Number records | - | N | 8,074.00 |
Number subjects | - | N (%) | 37 (100.00%) |
Records per person | - | Mean (SD) | 218.22 (12.01) |
measurement; overall | |||
Number records | - | N | 5,900.00 |
Number subjects | - | N (%) | 100 (100.00%) |
Records per person | - | Mean (SD) | 59.00 (7.85) |
measurement; Female | |||
Number records | - | N | 3,742.00 |
Number subjects | - | N (%) | 63 (100.00%) |
Records per person | - | Mean (SD) | 59.40 (8.37) |
measurement; Male | |||
Number records | - | N | 2,158.00 |
Number subjects | - | N (%) | 37 (100.00%) |
Records per person | - | Mean (SD) | 58.32 (6.93) |
Record counts over time
You can plot the number of records over time for any cdm table also stratified by age and sex (so behind the scenes this will be joining clinical tables to the person table).
R
recordCount <- summariseRecordCount(cdm,
omopTableName = "drug_exposure",
interval = "years",
sex = TRUE,
ageGroup = list("<40" = c(0,39), ">=40" = c(40, Inf)),
dateRange = as.Date(c("2002-01-01", NA)))
plotRecordCount(recordCount, facet = "sex", colour = "age_group")

Concept Id counts
You can get a summary of the numbers of concept ids in a cdm table. Unfortuntaely the summary doesn’t display here but you can copy and paste the code into your console to see it.
R
result <- summariseConceptIdCounts(cdm = cdm, omopTableName = "condition_occurrence")
tableConceptIdCounts(head(result,5), display = "standard", type = "datatable")
- Brief outline of some R tools that will be useful for new OMOP users.
Content from Placeholder Chapter 8
Last updated on 2025-09-12 | Edit this page
Overview
Questions
- What
Objectives
- 1
Introduction
This is a lesson created via The Carpentries Workbench. It is written in Pandoc-flavored Markdown for static files and R Markdown for dynamic files that can render code into output. Please refer to the Introduction to The Carpentries Workbench for full documentation.
What you need to know is that there are three sections required for a valid Carpentries lesson template:
-
questions
are displayed at the beginning of the episode to prime the learner for the content. -
objectives
are the learning objectives for an episode displayed with the questions. -
keypoints
are displayed at the end of the episode to reinforce the objectives.
Challenge 1: Can you do it?
What is the output of this command?
R
paste("This", "new", "lesson", "looks", "good")
OUTPUT
[1] "This new lesson looks good"
Challenge 2: how do you nest solutions within challenge blocks?
You can add a line with at least three colons and a
solution
tag.
Figures
You can also include figures generated from R Markdown:
R
pie(
c(Sky = 78, "Sunny side of pyramid" = 17, "Shady side of pyramid" = 5),
init.angle = 315,
col = c("deepskyblue", "yellow", "yellow3"),
border = FALSE
)

Or you can use standard markdown for static figures with the following syntax:
{alt='alt text for accessibility purposes'}
Callout sections can highlight information.
They are sometimes used to emphasise particularly important points but are also used in some lessons to present “asides”: content that is not central to the narrative of the lesson, e.g. by providing the answer to a commonly-asked question.
Math
One of our episodes contains \(\LaTeX\) equations when describing how to create dynamic reports with {knitr}, so we now use mathjax to describe this:
$\alpha = \dfrac{1}{(1 - \beta)^2}$
becomes: \(\alpha = \dfrac{1}{(1 - \beta)^2}\)
Cool, right?
- Use
.md
files for episodes when you want static content - Use
.Rmd
files for episodes when you need to generate output - Run
sandpaper::check_lesson()
to identify any issues with your lesson - Run
sandpaper::build_lesson()
to preview your lesson locally