Content from What is OMOP?


Last updated on 2025-07-08 | Edit this page

Overview

Questions

  • What is OMOP?
  • What information would you expect to find in the person table?
  • What information would you expect to find in the condition_occurrence table?
  • How can you join these tables to aggregate information?

Objectives

  • Examine the diagram of the OMOP tables and the data specification
  • Interrogate the data in the person table

Setting up R


Getting started

Since we want to import the files called *.csv into our R environment, we need to be able to tell our computer where the file is. To do this, we will create a “Project” with RStudio that contains the data we want to work with. The “Projects” interface in RStudio not only creates a working directory for you, but also remembers its location (allowing you to quickly navigate to it). The interface also (optionally) preserves custom settings and open files to make it easier to resume work after a break.

Create a new project

Install the required packages

You will need the tidyverse package from CRAN (the official package repository). You will also need a package we have developed omopcept.

R

install.packages("tidyverse")

OUTPUT

# Downloading packages -------------------------------------------------------
- Downloading tidyverse from https://packagemanager.posit.co/cran/__linux__/jammy/latest ... OK [415.9 Kb in 0.3s]
- Downloading broom from https://packagemanager.posit.co/cran/__linux__/jammy/latest ... OK [1.8 Mb in 0.27s]
- Downloading conflicted from https://packagemanager.posit.co/cran/__linux__/jammy/latest ... OK [53.8 Kb in 0.2s]
- Downloading dtplyr from https://packagemanager.posit.co/cran/__linux__/jammy/latest ... OK [349.5 Kb in 0.3s]
- Downloading data.table from https://packagemanager.posit.co/cran/__linux__/jammy/latest ... OK [2.6 Mb in 0.24s]
- Downloading forcats from https://packagemanager.posit.co/cran/__linux__/jammy/latest ... OK [411.7 Kb in 0.33s]
- Downloading ggplot2 from https://packagemanager.posit.co/cran/__linux__/jammy/latest ... OK [4.8 Mb in 0.28s]
- Downloading gtable from https://packagemanager.posit.co/cran/__linux__/jammy/latest ... OK [217.4 Kb in 0.26s]
- Downloading isoband from https://packagemanager.posit.co/cran/__linux__/jammy/latest ... OK [1.6 Mb in 0.15s]
- Downloading scales from https://packagemanager.posit.co/cran/__linux__/jammy/latest ... OK [822 Kb in 0.22s]
- Downloading farver from https://packagemanager.posit.co/cran/__linux__/jammy/latest ... OK [1.4 Mb in 0.18s]
- Downloading labeling from https://packagemanager.posit.co/cran/__linux__/jammy/latest ... OK [59.5 Kb in 0.23s]
- Downloading RColorBrewer from https://packagemanager.posit.co/cran/__linux__/jammy/latest ... OK [50.8 Kb in 0.31s]
- Downloading viridisLite from https://packagemanager.posit.co/cran/__linux__/jammy/latest ... OK [1.2 Mb in 0.22s]
- Downloading googledrive from https://packagemanager.posit.co/cran/__linux__/jammy/latest ... OK [1.8 Mb in 0.48s]
- Downloading gargle from https://packagemanager.posit.co/cran/__linux__/jammy/latest ... OK [757.5 Kb in 0.15s]
- Downloading httr from https://packagemanager.posit.co/cran/__linux__/jammy/latest ... OK [475.7 Kb in 0.2s]
- Downloading curl from https://packagemanager.posit.co/cran/__linux__/jammy/latest ... OK [770.7 Kb in 0.21s]
- Downloading openssl from https://packagemanager.posit.co/cran/__linux__/jammy/latest ... OK [1.3 Mb in 0.19s]
- Downloading askpass from https://packagemanager.posit.co/cran/__linux__/jammy/latest ... OK [21.5 Kb in 0.21s]
- Downloading sys from https://packagemanager.posit.co/cran/__linux__/jammy/latest ... OK [39.9 Kb in 0.18s]
- Downloading uuid from https://packagemanager.posit.co/cran/__linux__/jammy/latest ... OK [47.7 Kb in 0.24s]
- Downloading googlesheets4 from https://packagemanager.posit.co/cran/__linux__/jammy/latest ... OK [504.9 Kb in 0.16s]
- Downloading cellranger from https://packagemanager.posit.co/cran/__linux__/jammy/latest ... OK [101.5 Kb in 0.23s]
- Downloading rematch from https://packagemanager.posit.co/cran/__linux__/jammy/latest ... OK [15.9 Kb in 0.14s]
- Downloading ids from https://packagemanager.posit.co/cran/__linux__/jammy/latest ... OK [119.8 Kb in 0.15s]
- Downloading rematch2 from https://packagemanager.posit.co/cran/__linux__/jammy/latest ... OK [45 Kb in 0.24s]
- Downloading haven from https://packagemanager.posit.co/cran/__linux__/jammy/latest ... OK [378.6 Kb in 0.16s]
- Downloading lubridate from https://packagemanager.posit.co/cran/__linux__/jammy/latest ... OK [969.9 Kb in 0.23s]
- Downloading timechange from https://packagemanager.posit.co/cran/__linux__/jammy/latest ... OK [166.5 Kb in 0.17s]
- Downloading modelr from https://packagemanager.posit.co/cran/__linux__/jammy/latest ... OK [196.1 Kb in 0.16s]
- Downloading ragg from https://packagemanager.posit.co/cran/__linux__/jammy/latest ... OK [646.1 Kb in 0.16s]
- Downloading systemfonts from https://packagemanager.posit.co/cran/__linux__/jammy/latest ... OK [334.9 Kb in 0.17s]
- Downloading textshaping from https://packagemanager.posit.co/cran/__linux__/jammy/latest ... OK [179.4 Kb in 0.22s]
- Downloading readxl from https://packagemanager.posit.co/cran/__linux__/jammy/latest ... OK [398.9 Kb in 0.16s]
- Downloading reprex from https://packagemanager.posit.co/cran/__linux__/jammy/latest ... OK [483.4 Kb in 0.29s]
- Downloading callr from https://packagemanager.posit.co/cran/__linux__/jammy/latest ... OK [438.9 Kb in 0.16s]
- Downloading processx from https://packagemanager.posit.co/cran/__linux__/jammy/latest ... OK [329.4 Kb in 0.16s]
- Downloading ps from https://packagemanager.posit.co/cran/__linux__/jammy/latest ... OK [488.8 Kb in 0.18s]
- Downloading rstudioapi from https://packagemanager.posit.co/cran/__linux__/jammy/latest ... OK [310.4 Kb in 0.15s]
- Downloading rvest from https://packagemanager.posit.co/cran/__linux__/jammy/latest ... OK [293.1 Kb in 0.16s]
- Downloading selectr from https://packagemanager.posit.co/cran/__linux__/jammy/latest ... OK [491.3 Kb in 0.16s]
- Downloading xml2 from https://packagemanager.posit.co/cran/__linux__/jammy/latest ... OK [275.2 Kb in 0.14s]
Successfully downloaded 43 packages in 17 seconds.

The following package(s) will be installed:
- askpass       [1.2.1]
- broom         [1.0.8]
- callr         [3.7.6]
- cellranger    [1.1.0]
- conflicted    [1.2.0]
- curl          [6.4.0]
- data.table    [1.17.6]
- dtplyr        [1.3.1]
- farver        [2.1.2]
- forcats       [1.0.0]
- gargle        [1.5.2]
- ggplot2       [3.5.2]
- googledrive   [2.1.1]
- googlesheets4 [1.1.1]
- gtable        [0.3.6]
- haven         [2.5.5]
- httr          [1.4.7]
- ids           [1.0.1]
- isoband       [0.2.7]
- labeling      [0.4.3]
- lubridate     [1.9.4]
- modelr        [0.1.11]
- openssl       [2.3.3]
- processx      [3.8.6]
- ps            [1.9.1]
- ragg          [1.4.0]
- RColorBrewer  [1.1-3]
- readxl        [1.4.5]
- rematch       [2.0.0]
- rematch2      [2.1.2]
- reprex        [2.1.1]
- rstudioapi    [0.17.1]
- rvest         [1.0.4]
- scales        [1.4.0]
- selectr       [0.4-2]
- sys           [3.4.3]
- systemfonts   [1.2.3]
- textshaping   [1.0.1]
- tidyverse     [2.0.0]
- timechange    [0.3.0]
- uuid          [1.2-1]
- viridisLite   [0.4.2]
- xml2          [1.3.8]
These packages will be installed into "~/work/omop-carpentries/omop-carpentries/renv/profiles/lesson-requirements/renv/library/linux-ubuntu-jammy/R-4.5/x86_64-pc-linux-gnu".

The following required system packages are not installed:
- pandoc  [required by reprex]
The R packages depending on these system packages may fail to install.

An administrator can install these packages with:
- sudo apt install pandoc

# Installing packages --------------------------------------------------------
- Installing broom ...                          OK [installed binary and cached in 0.53s]
- Installing conflicted ...                     OK [installed binary and cached in 0.26s]
- Installing data.table ...                     OK [installed binary and cached in 0.28s]
- Installing dtplyr ...                         OK [installed binary and cached in 0.55s]
- Installing forcats ...                        OK [installed binary and cached in 0.29s]
- Installing gtable ...                         OK [installed binary and cached in 0.41s]
- Installing isoband ...                        OK [installed binary and cached in 0.2s]
- Installing farver ...                         OK [installed binary and cached in 0.19s]
- Installing labeling ...                       OK [installed binary and cached in 0.16s]
- Installing RColorBrewer ...                   OK [installed binary and cached in 0.16s]
- Installing viridisLite ...                    OK [installed binary and cached in 0.17s]
- Installing scales ...                         OK [installed binary and cached in 0.39s]
- Installing ggplot2 ...                        OK [installed binary and cached in 0.75s]
- Installing curl ...                           OK [installed binary and cached in 0.19s]
- Installing sys ...                            OK [installed binary and cached in 0.16s]
- Installing askpass ...                        OK [installed binary and cached in 0.16s]
- Installing openssl ...                        OK [installed binary and cached in 0.21s]
- Installing httr ...                           OK [installed binary and cached in 0.17s]
- Installing gargle ...                         OK [installed binary and cached in 0.33s]
- Installing uuid ...                           OK [installed binary and cached in 0.16s]
- Installing googledrive ...                    OK [installed binary and cached in 0.59s]
- Installing rematch ...                        OK [installed binary and cached in 0.16s]
- Installing cellranger ...                     OK [installed binary and cached in 0.16s]
- Installing ids ...                            OK [installed binary and cached in 0.16s]
- Installing rematch2 ...                       OK [installed binary and cached in 0.39s]
- Installing googlesheets4 ...                  OK [installed binary and cached in 0.58s]
- Installing haven ...                          OK [installed binary and cached in 0.42s]
- Installing timechange ...                     OK [installed binary and cached in 0.17s]
- Installing lubridate ...                      OK [installed binary and cached in 0.3s]
- Installing modelr ...                         OK [installed binary and cached in 0.5s]
- Installing systemfonts ...                    OK [installed binary and cached in 0.27s]
- Installing textshaping ...                    OK [installed binary and cached in 0.27s]
- Installing ragg ...                           OK [installed binary and cached in 0.3s]
- Installing readxl ...                         OK [installed binary and cached in 0.18s]
- Installing ps ...                             OK [installed binary and cached in 0.17s]
- Installing processx ...                       OK [installed binary and cached in 0.18s]
- Installing callr ...                          OK [installed binary and cached in 0.19s]
- Installing rstudioapi ...                     OK [installed binary and cached in 0.17s]
- Installing reprex ...                         OK [installed binary and cached in 0.32s]
- Installing selectr ...                        OK [installed binary and cached in 0.31s]
- Installing xml2 ...                           OK [installed binary and cached in 0.26s]
- Installing rvest ...                          OK [installed binary and cached in 0.31s]
- Installing tidyverse ...                      OK [installed binary and cached in 0.17s]
Successfully installed 43 packages in 13 seconds.

R

install.packages("remotes")

OUTPUT

# Downloading packages -------------------------------------------------------
- Downloading remotes from https://packagemanager.posit.co/cran/__linux__/jammy/latest ... OK [425.9 Kb in 0.17s]
Successfully downloaded 1 package in 0.38 seconds.

The following package(s) will be installed:
- remotes [2.5.0]
These packages will be installed into "~/work/omop-carpentries/omop-carpentries/renv/profiles/lesson-requirements/renv/library/linux-ubuntu-jammy/R-4.5/x86_64-pc-linux-gnu".

# Installing packages --------------------------------------------------------
- Installing remotes ...                        OK [installed binary and cached in 0.18s]
Successfully installed 1 package in 0.2 seconds.

Introduction


OMOP is a format for recording Electronic Healthcare Records. It allows you to follow a patient journey through a hospital by linking every aspect to a standard vocabulary thus enabling easy sharing of data between hospitals, trusts and even countries.

OMOP CDM Diagram

The OMOP Common Data Model
The OMOP Common Data Model

Test yourself

Look at the OMOP-CDM figure and answer the following questions:

  1. Which table is the key to all the other tables?
  2. Which table allows you to distinguish between different stays in hospital?
  1. The Person table

  2. The Visit_occurrence table

Why use OMOP?


Why use the OMOP-CDM
Why use the OMOP-CDM

Once a database has been converted to the OMOP CDM, evidence can be generated using standardized analytics tools. This means that different tools can also be shared and reused. So using OMOP can help make your research FAIR.

Some simple tables


Loading Data

Now that we are set up with an Rstudio project, we are sure that the data and scripts we are using are all in our working directory. The data files should be located in the directory data, inside the working directory. Now we can load the data into R, there are three data files person.csv, condition_occurrence.csv and drug_exposure.csv. Read each of these into a table with the same name.

Read the Data

There are three data files person.csv, condition_occurrence.csv and drug_exposure.csv. Read each of these into a table with the same name.

NOTE: The data does have headers.

R

person <- read.csv(file = "data/person.csv", header = TRUE)
condition_occurrence <- read.csv(file = "data/condition_occurrence.csv", header = TRUE)
drug_exposure <- read.csv(file = "data/drug_exposure.csv", header = TRUE)

When you have read in the data, take some time to explore it.

Adding concept names


You will have noticed that content of the tables are not terribly easy to understand. This is because everything in OMOP is viewed as a concept that allows it to be related to one or more standard vocabularies such as SNOMED, ICD-10, etc.

We have developed a package that makes it very easy to add concept names to the tables.

You will need the function omopcept::omop_join_name_all(). This will look up the concept_id in the main table of concepts and add a column for the name of the concept associated with that id.

Who’s who?

By creating tables that also have the name of the concepts answer the following questions

  1. How old is the black gentleman?
  2. In which month was an unspecified fever prevalent in the hospital?
  3. What was the ethnicity of the patient not affected by this fever?
  4. Give a description of the patient who received Amoxicillin because they were wheezing?

R

person_named <- person |> omop_join_name_all()

ERROR

Error in omop_join_name_all(person): could not find function "omop_join_name_all"

R

condition_occurrence_named <- condition_occurrence |> omop_join_name_all()

ERROR

Error in omop_join_name_all(condition_occurrence): could not find function "omop_join_name_all"

R

drug_exposure_named <- person |> omop_join_name_all()

ERROR

Error in omop_join_name_all(person): could not find function "omop_join_name_all"
  1. 25 (or 24 if he hasn’t had his birthday this year)
  2. July
  3. Don’t know - it hasn’t been specified
  4. A 53/54 white female

Joining and interrogating the tables


Using join

We established when looking at the diagram that the person table was the key to accessing all the other tables. In fact it is the person_id column that is the actual key that will allow us to join with other tables.

So we can join two of the tables together to get information about the different conditions suffered by each person.

[R documentation - join][join]

I am going to use a left join because I want a record of every person and the conditions they may have.

R

person_condition <- 
  person_named |> 
  left_join(condition_occurrence_named, by = join_by(person_id) )

ERROR

Error in left_join(person_named, condition_occurrence_named, by = join_by(person_id)): could not find function "left_join"

This produces a new table with all the column names from both tables and six rows.

Using count

Challenge

Count the number of people with each condition

R

person_condition |> count(gender_concept_name, condition_concept_name)

ERROR

Error in count(person_condition, gender_concept_name, condition_concept_name): could not find function "count"

This produces a table:

A table of the condition counts
A table of the condition counts

Challenge

For the Confident Using the “GiBleed” database work out the number of male and female patients with each condition_concept_id

to be done

Challenge

Solutions for working out Who’s who programmatically.

to be done

Key Points

  • Using a standard makes it much easier to share data
  • OMOP uses concepts to link dated to standard vocabularies
  • R can be used to join and interrogate data

join