What is OMOP?

Last updated on 2025-12-03 | Edit this page

Overview

Questions

  • What is OMOP?
  • Why is using a standard important in healthcare data?
  • How do OMOP tables relate to each other?
  • What are concept_ids and how can we get an humanly readable name for them?

Objectives

  • Examine the diagram of the OMOP tables and the data specification
  • Understand OMOP standardization and vocabularies
  • Connect to an OMOP database and explore the concept table
  • Get a humanly readable name for a concept_id

Setting up R


Getting started

The “Projects” interface in RStudio not only creates a working directory for you, but also remembers its location (allowing you to quickly navigate to it). The interface also (optionally) preserves custom settings and open files to make it easier to resume work after a break.

Create a new project

Connect to a database

We will be using the CDMConnector package to connect to an OMOP Common Data Model database. This package also contains synthetic example data that can be used to demonstrate querying the data.

R

# Libraries
library(CDMConnector)
library(DBI)
library(duckdb)
library(dplyr)
library(dbplyr)

# Connect to GiBleed if not already connected
if (!exists("cdm") || !inherits(cdm, "cdm_reference")) {
  db_name <- "GiBleed"
  CDMConnector::requireEunomia(datasetName = db_name)
  con <- DBI::dbConnect(duckdb::duckdb(),
                        dbdir = CDMConnector::eunomiaDir(datasetName = db_name))
  cdm <- CDMConnector::cdmFromCon(con, cdmSchema = "main", writeSchema = "main")
}

OUTPUT


Download completed!

Introduction


OMOP is a format for recording Electronic Healthcare Records. It allows you to follow a patient journey through a hospital by linking every aspect to a standard vocabulary thus enabling easy sharing of data between hospitals, trusts and even countries.

OMOP CDM Diagram

A diagram showing the tables that occur in the OMOP-CDM , how they relate to each other and standard vocabularies.
The OMOP Common Data Model

OMOP CDM stands for the Observational Medical Outcomes Partnership Common Data Model. You don’t really need to remember what OMOP stands for. Remembering that CDM stands for Common Data Model can help you remember that it is a data standard that can be applied to different data sources to create data in a Common (same) format. The table diagram will look confusing to start with but you can use data in the OMOP CDM without needing to understand (or populate) all 37 tables.

Challenge

Challenge

Look at the OMOP-CDM figure and answer the following questions:

  1. Which table is the key to all the other tables?

  2. Which table allows you to distinguish between different stays in hospital?

  1. The Person table

  2. The Visit_occurrence table

Why use OMOP?


A diagram showing that different sources of data, transformed to OMOP, can then be used by multiple analysis tools.
Why use the OMOP-CDM

Once a database has been converted to the OMOP CDM, evidence can be generated using standardized analytics tools. This means that different tools can also be shared and reused. So using OMOP can help make your research FAIR.

Read in the database as above.

The data themselves are not actually read into the created cdm object. Rather it is a reference that allows us to query the data from the database.

Typing names(cdm) will give a summary of the tables in the database and we can look at these individually using the $ operator and the colnames command.

OMOP Tables

R

names(cdm)

OUTPUT

 [1] "person"                "observation_period"    "visit_occurrence"
 [4] "visit_detail"          "condition_occurrence"  "drug_exposure"
 [7] "procedure_occurrence"  "device_exposure"       "measurement"
[10] "observation"           "death"                 "note"
[13] "note_nlp"              "specimen"              "fact_relationship"
[16] "location"              "care_site"             "provider"
[19] "payer_plan_period"     "cost"                  "drug_era"
[22] "dose_era"              "condition_era"         "metadata"
[25] "cdm_source"            "concept"               "vocabulary"
[28] "domain"                "concept_class"         "concept_relationship"
[31] "relationship"          "concept_synonym"       "concept_ancestor"
[34] "source_to_concept_map" "drug_strength"        

Looking at the column names in each table

R

colnames(cdm$person)

OUTPUT

 [1] "person_id"                   "gender_concept_id"
 [3] "year_of_birth"               "month_of_birth"
 [5] "day_of_birth"                "birth_datetime"
 [7] "race_concept_id"             "ethnicity_concept_id"
 [9] "location_id"                 "provider_id"
[11] "care_site_id"                "person_source_value"
[13] "gender_source_value"         "gender_source_concept_id"
[15] "race_source_value"           "race_source_concept_id"
[17] "ethnicity_source_value"      "ethnicity_source_concept_id"
Challenge

Challenge

Question How do you think the visit_occurrence table is used to connect to the person table?

R

colnames(cdm$visit_occurrence)

OUTPUT

 [1] "visit_occurrence_id"           "person_id"
 [3] "visit_concept_id"              "visit_start_date"
 [5] "visit_start_datetime"          "visit_end_date"
 [7] "visit_end_datetime"            "visit_type_concept_id"
 [9] "provider_id"                   "care_site_id"
[11] "visit_source_value"            "visit_source_concept_id"
[13] "admitting_source_concept_id"   "admitting_source_value"
[15] "discharge_to_concept_id"       "discharge_to_source_value"
[17] "preceding_visit_occurrence_id"

Looking at both tables we can see that they both have a column labelled person_id which could be used to link them together.

Notice that the visit_concept_id column in the visit_occurrence table is also a concept_id. This concept_id can be used to find out more information about the type of visit (e.g. inpatient, outpatient etc) by looking it up in the concept table. In this case the visit_concept_id is 9201 which relates to an inpatient visit. We can find this out by filtering the concept table for concept_id 9201 and selecting the concept_name column.

R

cdm$concept |>
  filter(concept_id == 9201) |>
  select(concept_name)

OUTPUT

# Source:   SQL [?? x 1]
# Database: DuckDB 1.4.1 [unknown@Linux 6.8.0-1041-azure:R 4.5.2//tmp/Rtmpir2lFS/file16e64382d7af.duckdb]
  concept_name
  <chr>
1 Inpatient Visit

A useful function

Finding the humanly readable name for a concept_id will be a useful function. We can create a function get_concept_name() that takes a concept_id as input and returns the concept_name.

Challenge

Challenge

Create the function get_concept_name() that takes a concept_id as input and returns the concept_name.

R

get_concept_name <- function(id) {
  cdm$concept |>
    filter(concept_id == !!id) |>
    select(concept_name) |>
    pull()
}

Explanation of function code

  • The function is called get_concept_name and it takes one argument, id.
  • Inside the function, we query the concept table from the cdm object.
  • We use the filter function to select rows where the concept_id matches the input id. The !! operator is used to unquote the variable so that its value is used in the filter.
  • We then use select to choose only the concept_name column from the filtered results.
  • Finally, we use pull() to extract the concept_name as a vector, which is returned by the function.

Other useful tables


There are also other tables which will give you other information about concepts.

R

colnames(cdm$concept)

OUTPUT

 [1] "concept_id"       "concept_name"     "domain_id"        "vocabulary_id"
 [5] "concept_class_id" "standard_concept" "concept_code"     "valid_start_date"
 [9] "valid_end_date"   "invalid_reason"  

R

colnames(cdm$domain)

OUTPUT

[1] "domain_id"         "domain_name"       "domain_concept_id"

R

colnames(cdm$vocabulary)

OUTPUT

[1] "vocabulary_id"         "vocabulary_name"       "vocabulary_reference"
[4] "vocabulary_version"    "vocabulary_concept_id"
Key Points
  • Using a standard makes it much easier to share data
  • OMOP uses concepts to link different tables together
  • The concept table contains humanly readable names for concept_ids