Content from What is OMOP?


Last updated on 2025-12-03

Overview

Questions

  • What is OMOP?
  • Why is using a standard important in healthcare data?
  • How do OMOP tables relate to each other?
  • What are concept_ids and how can we get a human-readable name for them?

Objectives

  • Examine the diagram of the OMOP tables and the data specification
  • Understand OMOP standardization and vocabularies
  • Connect to an OMOP database and explore the concept table
  • Get a human-readable name for a concept_id

Setting up R


Getting started

The “Projects” interface in RStudio not only creates a working directory for you, but also remembers its location (allowing you to quickly navigate to it). The interface also (optionally) preserves custom settings and open files to make it easier to resume work after a break.

Create a new project

Connect to a database

We will be using the CDMConnector package to connect to an OMOP Common Data Model database. This package also contains synthetic example data that can be used to demonstrate querying the data.

R

# Libraries
library(CDMConnector)
library(DBI)
library(duckdb)
library(dplyr)
library(dbplyr)

# Connect to GiBleed if not already connected
if (!exists("cdm") || !inherits(cdm, "cdm_reference")) {
  db_name <- "GiBleed"
  CDMConnector::requireEunomia(datasetName = db_name)
  con <- DBI::dbConnect(duckdb::duckdb(),
                        dbdir = CDMConnector::eunomiaDir(datasetName = db_name))
  cdm <- CDMConnector::cdmFromCon(con, cdmSchema = "main", writeSchema = "main")
}

OUTPUT


Download completed!

Introduction


OMOP is a standard format for storing electronic healthcare records. It lets you follow a patient's journey through a hospital by linking every aspect of that journey to a standard vocabulary, which makes it easier to share data between hospitals, trusts and even countries.

OMOP CDM Diagram

A diagram showing the tables in the OMOP CDM, how they relate to each other, and the standard vocabularies.
The OMOP Common Data Model

OMOP CDM stands for the Observational Medical Outcomes Partnership Common Data Model. You don’t really need to remember what OMOP stands for. Remembering that CDM stands for Common Data Model can help you remember that it is a data standard that can be applied to different data sources to create data in a Common (same) format. The table diagram will look confusing to start with but you can use data in the OMOP CDM without needing to understand (or populate) all 37 tables.

Challenge

Look at the OMOP-CDM figure and answer the following questions:

  1. Which table is the key to all the other tables?

  2. Which table allows you to distinguish between different stays in hospital?

  1. The Person table

  2. The Visit_occurrence table

Why use OMOP?


A diagram showing that different sources of data, transformed to OMOP, can then be used by multiple analysis tools.
Why use the OMOP-CDM

Once a database has been converted to the OMOP CDM, evidence can be generated using standardized analytics tools. Because every OMOP database has the same structure, analysis tools and code can be shared and reused across sites. Using OMOP can therefore help make your research FAIR (Findable, Accessible, Interoperable, Reusable).

Read in the database as above.

The data themselves are not actually read into the created cdm object. Rather it is a reference that allows us to query the data from the database.
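To make the idea of a reference concrete, here is a minimal sketch that uses a small in-memory DuckDB database instead of the GiBleed data. The names toy_concept, concept_ref and lazy_query are invented for this illustration, and the two rows are only a stand-in for a real concept table.

```r
library(DBI)
library(duckdb)
library(dplyr)

# Toy stand-in for an OMOP concept table (illustrative values only)
toy_concept <- data.frame(
  concept_id = c(9201L, 9202L),
  concept_name = c("Inpatient Visit", "Outpatient Visit")
)

con <- dbConnect(duckdb())            # in-memory database, nothing is downloaded
dbWriteTable(con, "concept", toy_concept)

concept_ref <- tbl(con, "concept")    # a reference: no rows are fetched yet
lazy_query <- concept_ref |>
  filter(concept_id == 9201L)         # still lazy: only a query is built

show_query(lazy_query)                # prints the SQL the database would run

result <- collect(lazy_query)         # now the matching rows are brought into R
print(result)

dbDisconnect(con, shutdown = TRUE)
```

The same pattern applies to the tables inside the cdm object: building a pipeline costs nothing until collect() or pull() asks the database for the rows.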

Typing names(cdm) lists the tables in the database. We can look at an individual table using the $ operator, and at its columns using the colnames command.

OMOP Tables

R

names(cdm)

OUTPUT

 [1] "person"                "observation_period"    "visit_occurrence"
 [4] "visit_detail"          "condition_occurrence"  "drug_exposure"
 [7] "procedure_occurrence"  "device_exposure"       "measurement"
[10] "observation"           "death"                 "note"
[13] "note_nlp"              "specimen"              "fact_relationship"
[16] "location"              "care_site"             "provider"
[19] "payer_plan_period"     "cost"                  "drug_era"
[22] "dose_era"              "condition_era"         "metadata"
[25] "cdm_source"            "concept"               "vocabulary"
[28] "domain"                "concept_class"         "concept_relationship"
[31] "relationship"          "concept_synonym"       "concept_ancestor"
[34] "source_to_concept_map" "drug_strength"        

Looking at the column names in each table

R

colnames(cdm$person)

OUTPUT

 [1] "person_id"                   "gender_concept_id"
 [3] "year_of_birth"               "month_of_birth"
 [5] "day_of_birth"                "birth_datetime"
 [7] "race_concept_id"             "ethnicity_concept_id"
 [9] "location_id"                 "provider_id"
[11] "care_site_id"                "person_source_value"
[13] "gender_source_value"         "gender_source_concept_id"
[15] "race_source_value"           "race_source_concept_id"
[17] "ethnicity_source_value"      "ethnicity_source_concept_id"

Challenge

Question How do you think the visit_occurrence table is used to connect to the person table?

R

colnames(cdm$visit_occurrence)

OUTPUT

 [1] "visit_occurrence_id"           "person_id"
 [3] "visit_concept_id"              "visit_start_date"
 [5] "visit_start_datetime"          "visit_end_date"
 [7] "visit_end_datetime"            "visit_type_concept_id"
 [9] "provider_id"                   "care_site_id"
[11] "visit_source_value"            "visit_source_concept_id"
[13] "admitting_source_concept_id"   "admitting_source_value"
[15] "discharge_to_concept_id"       "discharge_to_source_value"
[17] "preceding_visit_occurrence_id"

Looking at both tables we can see that they both have a column labelled person_id which could be used to link them together.
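As a rough sketch of what linking on person_id looks like in practice, the example below joins two small hand-made tables. The tables toy_person and toy_visit and all of their values are hypothetical; they only borrow column names from the real person and visit_occurrence tables.

```r
library(dplyr)

# Hypothetical miniature versions of the person and visit_occurrence tables
toy_person <- tibble(
  person_id = c(1L, 2L),
  year_of_birth = c(1970L, 1985L)
)

toy_visit <- tibble(
  visit_occurrence_id = c(10L, 11L, 12L),
  person_id = c(1L, 1L, 2L),
  visit_concept_id = c(9201L, 9202L, 9201L)
)

# Each visit row picks up the details of the person it belongs to
visits_with_person <- toy_visit |>
  left_join(toy_person, by = "person_id")

# One person can have many visits, so we can count visits per person
visits_per_person <- visits_with_person |>
  count(person_id, name = "n_visits")

print(visits_with_person)
print(visits_per_person)
```

Because person_id appears once per row in toy_person but can repeat in toy_visit, the join keeps one row per visit.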

Notice that the visit_concept_id column in the visit_occurrence table is also a concept_id. This concept_id can be used to find out more about the type of visit (e.g. inpatient or outpatient) by looking it up in the concept table. For example, a visit_concept_id of 9201 represents an inpatient visit. We can find this out by filtering the concept table for concept_id 9201 and selecting the concept_name column.

R

cdm$concept |>
  filter(concept_id == 9201) |>
  select(concept_name)

OUTPUT

# Source:   SQL [?? x 1]
# Database: DuckDB 1.4.1 [unknown@Linux 6.8.0-1041-azure:R 4.5.2//tmp/Rtmpir2lFS/file16e64382d7af.duckdb]
  concept_name
  <chr>
1 Inpatient Visit

A useful function

Finding the human-readable name for a concept_id is something we will do often, so it is worth wrapping in a function. We can create a function get_concept_name() that takes a concept_id as input and returns the concept_name.

Challenge

Create the function get_concept_name() that takes a concept_id as input and returns the concept_name.

R

get_concept_name <- function(id) {
  cdm$concept |>
    filter(concept_id == !!id) |>
    select(concept_name) |>
    pull()
}

Explanation of function code

  • The function is called get_concept_name and it takes one argument, id.
  • Inside the function, we query the concept table from the cdm object.
  • We use the filter function to select rows where the concept_id matches the input id. The !! operator is used to unquote the variable so that its value is used in the filter.
  • We then use select to choose only the concept_name column from the filtered results.
  • Finally, we use pull() to extract the concept_name as a vector, which is returned by the function.
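The role of !! is easiest to see when a function argument has the same name as a column. The sketch below uses an ordinary in-memory tibble rather than the database, and the two helper functions are hypothetical names made up for the comparison.

```r
library(dplyr)

# Small in-memory stand-in for the concept table
toy_concept <- tibble(
  concept_id = c(8507L, 8532L),
  concept_name = c("MALE", "FEMALE")
)

# Without !!, both sides of == refer to the column, so every row matches
lookup_without_unquote <- function(concept_id) {
  toy_concept |>
    filter(concept_id == concept_id) |>
    pull(concept_name)
}

# With !!, the right-hand side is replaced by the value of the argument
lookup_with_unquote <- function(concept_id) {
  toy_concept |>
    filter(concept_id == !!concept_id) |>
    pull(concept_name)
}

print(lookup_without_unquote(8507L))  # returns both names
print(lookup_with_unquote(8507L))     # returns only the matching name
```

In get_concept_name() the argument is called id, which is not a column name, so the clash does not arise there; using !! simply makes the intent explicit.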

Other useful tables


The concept table holds more than names, and the domain and vocabulary tables describe the values used in its domain_id and vocabulary_id columns.

R

colnames(cdm$concept)

OUTPUT

 [1] "concept_id"       "concept_name"     "domain_id"        "vocabulary_id"
 [5] "concept_class_id" "standard_concept" "concept_code"     "valid_start_date"
 [9] "valid_end_date"   "invalid_reason"  

R

colnames(cdm$domain)

OUTPUT

[1] "domain_id"         "domain_name"       "domain_concept_id"

R

colnames(cdm$vocabulary)

OUTPUT

[1] "vocabulary_id"         "vocabulary_name"       "vocabulary_reference"
[4] "vocabulary_version"    "vocabulary_concept_id"

Key Points
  • Using a standard makes it much easier to share data
  • OMOP tables are linked by shared identifiers such as person_id, and concept_ids link them to the standard vocabularies
  • The concept table contains human-readable names for concept_ids

Content from Exploring OMOP concepts with R


Last updated on 2025-12-03

Overview

Questions

  • Find the vocabulary and domain for a given concept_id
  • Establish whether a concept_id is a standard concept
  • Find all concepts within a given domain
  • Find all concepts within a given vocabulary

Objectives

  • Understand that concepts have additional attributes such as vocabulary, domain, and standard concept status
  • Use R to query the concept table for specific attributes of concepts
  • Filter concepts based on domain and vocabulary
  • Identify standard concepts within the OMOP vocabulary

Introduction


Nearly everything in a hospital can be represented by an OMOP concept_id.

Any column in the OMOP CDM whose name ends in concept_id contains OMOP concept IDs. An OMOP concept_id is a unique integer identifier. Concept IDs are defined in the OMOP concept table, which stores a corresponding name and other attributes. OMOP assigns concept IDs to terms from existing medical vocabularies such as SNOMED and LOINC, which OMOP refers to as source vocabularies.

Looking up OMOP concepts


OMOP concepts can be looked up in Athena, an online tool provided by OHDSI.

The CDMConnector package allows connection to an OMOP Common Data Model in a database. It also contains synthetic example data that can be used to demonstrate querying the data.

In the previous lesson we set up the CDMConnector package to connect to an OMOP Common Data Model database and used it to look at the concept table. We also created the function get_concept_name() to get a human-readable name for a concept_id.

Setting up the connection

R

library(CDMConnector)
library(dplyr)  # provides filter(), summarise(), collect(), pull() and %>% used below

db_name <- "GiBleed"
CDMConnector::requireEunomia(datasetName = db_name)

OUTPUT


Download completed!

R

db <- DBI::dbConnect(duckdb::duckdb(),
                     dbdir = CDMConnector::eunomiaDir(datasetName = db_name))

cdm <- CDMConnector::cdmFromCon(con = db, cdmSchema = "main",
                                writeSchema = "main")

Exploring the concept table


R

colnames(cdm$concept)

OUTPUT

 [1] "concept_id"       "concept_name"     "domain_id"        "vocabulary_id"
 [5] "concept_class_id" "standard_concept" "concept_code"     "valid_start_date"
 [9] "valid_end_date"   "invalid_reason"  

concept table column   Description
concept_id             Unique identifier for the concept.
concept_name           Name or description of the concept.
domain_id              The domain to which the concept belongs (e.g. Condition, Drug).
vocabulary_id          The vocabulary from which the concept originates (e.g. SNOMED, RxNorm).
concept_class_id       Classification within the vocabulary (e.g. Clinical Finding, Ingredient).
standard_concept       ‘S’ for standard concepts that can be included in network studies.
concept_code           Code used by the source vocabulary to identify the concept.
valid_start_date       Date the concept became valid in OMOP.
valid_end_date         Date the concept ceased to be valid.
invalid_reason         Reason for invalidation, if applicable.

The concept table is the main table for looking up information about concepts. We can use R to query the concept table for specific attributes of concepts.

Challenge

Answer the following questions using R and the concept table:

  1. How many entries are there in the concept table?

  2. How many distinct vocabularies are there in the concept table?

  3. How many distinct domains other than ‘None’ are there in the concept table?

R

# 1. How many entries are there in the concept table?
cdm$concept %>% summarise(n_concepts = n())

Answer: There are 444 entries in the concept table. This is a tiny fraction of the full concept table, which can be browsed in Athena.

R

# 2. How many distinct vocabularies are there in the concept table?
cdm$concept %>% summarise(n_distinct_vocabularies = n_distinct(vocabulary_id))

Answer: There are 9 distinct vocabularies used in this dataset.

R

# 3. How many distinct domains other than 'None' are there in the concept table?
cdm$concept %>%
  filter(domain_id != "None") %>%
  summarise(n_distinct_domains = n_distinct(domain_id))

Answer: There are 8 distinct domains other than ‘None’ in this dataset.
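To see how n() and n_distinct() behave on a table small enough to check by eye, here is a sketch on a hand-made tibble. The table toy_concept and its five rows are invented for illustration and are not taken from the GiBleed data.

```r
library(dplyr)

# Hypothetical five-row concept table
toy_concept <- tibble(
  concept_id = 1:5,
  domain_id = c("Condition", "Drug", "Drug", "None", "Condition"),
  vocabulary_id = c("SNOMED", "RxNorm", "RxNorm", "None", "ICD10CM")
)

# n() counts rows; n_distinct() counts unique values in a column
toy_summary <- toy_concept |>
  summarise(
    n_concepts = n(),
    n_vocabularies = n_distinct(vocabulary_id)
  )

# Filtering first removes the 'None' rows before the distinct count
n_real_domains <- toy_concept |>
  filter(domain_id != "None") |>
  summarise(n = n_distinct(domain_id)) |>
  pull(n)

print(toy_summary)
print(n_real_domains)
```

The order matters: the filter has to come before summarise() so that ‘None’ is excluded from the count.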

Filtering concepts by domain, vocabulary and standard concept status


Let’s look into filtering concepts based on their domain and vocabulary.

Challenge

List the first ten rows of the concept table, showing the concept_id, domain_id, vocabulary_id and standard_concept columns.

R

cdm$concept |>
  arrange(concept_id) |>
  filter(row_number() <= 10) |>
  select(concept_id, domain_id, vocabulary_id, standard_concept) |>
  collect()

Note that we have to use collect() to pull the data into R's memory to view it.

Look at vocabularies

Vocabulary: The source or system of coding for concepts, such as SNOMED, RxNorm, LOINC, or ICD‑10. OMOP maps many vocabularies into a common, standardised set so different coding systems can be analysed together.

Challenge

List all distinct vocabularies in the concept table.

R

cdm$concept |>
  filter(!is.na(vocabulary_id)) |>
  distinct(vocabulary_id) |>
  arrange(vocabulary_id) |>
  pull(vocabulary_id)

Here pull(vocabulary_id) runs the query and returns that single column as an R vector, so the values are brought into R's memory where we can view them.
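The difference between collect() and pull() can be sketched with a small in-memory DuckDB table. The table contents and the names vocab_tbl and vocab_vec are assumptions made for this example only.

```r
library(DBI)
library(duckdb)
library(dplyr)

con <- dbConnect(duckdb())  # in-memory database
dbWriteTable(con, "concept", data.frame(
  concept_id = 1:3,
  vocabulary_id = c("SNOMED", "RxNorm", "SNOMED")
))

distinct_vocabularies <- tbl(con, "concept") |>
  distinct(vocabulary_id) |>
  arrange(vocabulary_id)

vocab_tbl <- collect(distinct_vocabularies)              # a tibble (data frame)
vocab_vec <- pull(distinct_vocabularies, vocabulary_id)  # a plain character vector

print(vocab_tbl)
print(vocab_vec)

dbDisconnect(con, shutdown = TRUE)
```

Use collect() when you want to keep several columns together, and pull() when a single column of values is all you need.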

Look at domains

Domain: A high‑level category that groups concepts by what they represent in clinical data, such as Condition, Drug, Procedure, Measurement, or Observation. A concept’s domain determines which OMOP table it belongs to and how it’s used analytically.
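The link between a domain and its table can be written down as a simple lookup. The sketch below covers only the five domains named above; the vector domain_to_table and the function table_for_domain are hypothetical helpers, and the full CDM has more domains than are listed here.

```r
# Partial lookup from domain_id to the OMOP table that stores its records
domain_to_table <- c(
  Condition   = "condition_occurrence",
  Drug        = "drug_exposure",
  Procedure   = "procedure_occurrence",
  Measurement = "measurement",
  Observation = "observation"
)

# Return the table name(s) for one or more domains (NA if not in the lookup)
table_for_domain <- function(domain_id) {
  unname(domain_to_table[domain_id])
}

print(table_for_domain("Drug"))
print(table_for_domain(c("Condition", "Measurement")))
```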

Challenge

List all distinct domains in the concept table.

R

cdm$concept |>
  filter(!is.na(domain_id)) |>
  distinct(domain_id) |>
  arrange(domain_id) |>
  pull(domain_id)

Look at non-standard concepts

A standard concept is the preferred, harmonised code in OMOP that represents a clinical idea across vocabularies. Standard concepts (standard_concept = “S”) are the target of mappings from source codes, and they define which domain and table the data belong to for consistent analysis.
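When filtering on standard_concept it helps to remember that the column can be missing (NA) for concepts that carry no flag at all. The sketch below, on an invented four-row tibble, shows why a comparison on its own can silently drop those rows.

```r
library(dplyr)

# Hypothetical concepts: two standard, one unflagged (NA), one flagged "C"
toy_concept <- tibble(
  concept_id = 1:4,
  standard_concept = c("S", NA, "C", "S")
)

# NA != "S" evaluates to NA, and filter() drops rows where the test is NA
naive_non_standard <- toy_concept |>
  filter(standard_concept != "S")

# Testing for NA explicitly keeps the unflagged concept as well
robust_non_standard <- toy_concept |>
  filter(is.na(standard_concept) | standard_concept != "S")

print(naive_non_standard)
print(robust_non_standard)
```

The explicit is.na() check is therefore the safer way to ask for everything that is not flagged ‘S’.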

Challenge

Find any non-standard concepts (i.e. concepts where standard_concept is not ‘S’) by filtering the concept table. List the first 10 concept_ids of non-standard concepts. Then look up their concept_name, domain_id, vocabulary_id and standard_concept status.

R

cdm$concept |>
  filter(is.na(standard_concept) | standard_concept != "S") |>
  slice_min(order_by = concept_id, n = 10, with_ties = FALSE) |>
  pull(concept_id)

R

cdm$concept |>
  filter(concept_id %in% c(1569708, 35208414, 44923712, 45011828)) |>
  select(concept_id, concept_name, domain_id, vocabulary_id, standard_concept) |>
  collect()

Now you should be able to replicate our get_concept_name() function to look up other attributes of concepts such as domain, vocabulary and standard concept status.

Challenge

Questions?

Find the domain and vocabulary for concept_id 35208414

Is this concept a standard concept?

R

library(dplyr)
get_concept_domain <- function(id) {
  cdm$concept |>
    filter(concept_id == !!id) |>
    select(domain_id) |>
    pull()
}

get_concept_vocabulary <- function(id) {
  cdm$concept |>
    filter(concept_id == !!id) |>
    select(vocabulary_id) |>
    pull()
}

get_concept_standard_status <- function(id) {
  cdm$concept |>
    filter(concept_id == !!id) |>
    select(standard_concept) |>
    pull()
}

get_concept_domain(35208414)

OUTPUT

[1] "Condition"

R

get_concept_vocabulary(35208414)

OUTPUT

[1] "ICD10CM"

R

get_concept_standard_status(35208414)

OUTPUT

[1] NA

Answer:

  • The domain for concept_id 35208414 is ‘Condition’.

  • The vocabulary for concept_id 35208414 is ‘ICD10CM’.

  • This concept is not a standard concept: its standard_concept value is NA rather than ‘S’.
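The three helper functions differ only in the column they select, so one possible refinement is a single function that takes the column name as an argument. This is a sketch on an in-memory tibble; get_concept_attribute is a hypothetical name, the concept table is passed in explicitly rather than read from cdm, and the concept_name given for 35208414 is a placeholder, not its real name.

```r
library(dplyr)

# Small stand-in for the concept table (names are placeholders where noted)
toy_concept <- tibble(
  concept_id = c(8507L, 35208414L),
  concept_name = c("MALE", "Placeholder name"),  # second name is made up
  domain_id = c("Gender", "Condition"),
  vocabulary_id = c("Gender", "ICD10CM"),
  standard_concept = c("S", NA)
)

# Look up any single attribute of a concept by column name
get_concept_attribute <- function(concept_table, id, attribute) {
  concept_table |>
    filter(concept_id == !!id) |>
    select(all_of(attribute)) |>
    pull()
}

print(get_concept_attribute(toy_concept, 35208414L, "domain_id"))
print(get_concept_attribute(toy_concept, 35208414L, "vocabulary_id"))
print(get_concept_attribute(toy_concept, 35208414L, "standard_concept"))
```

all_of() tells select() to treat the string in attribute as a column name, and it raises an error if no such column exists.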

Key Points
  • Concepts have additional attributes such as vocabulary, domain, and standard concept status
  • The concept table can be queried using R to retrieve specific attributes of concepts
  • Concepts can be filtered based on their domain and vocabulary
  • Standard concepts are those that are recommended for use in analyses within the OMOP framework

Content from More on concepts


Last updated on 2025-12-03

Overview

Questions

  • Where can we find other concept_ids in OMOP?

  • How do we link OMOP tables?

Objectives

  • Understand that there are many other concept_ids in OMOP tables and that these are usually named with a _concept_id suffix.

  • Learn how to link OMOP tables using common identifiers such as person_id and visit_occurrence_id.

  • Be able to use the concept table to look up human-readable names for various concept_ids.

  • Use joins to combine data from multiple OMOP tables based on common identifiers.

Introduction


In this episode, we will explore more concepts related to the OMOP Common Data Model (CDM). We will focus on understanding how different tables in the OMOP CDM are linked together through common identifiers. This knowledge is crucial for effectively querying and analysing healthcare data stored in the OMOP format.

Callout

Set up the database connection and import the get_concept_name function

R

library(CDMConnector)

db_name <- "GiBleed"
CDMConnector::requireEunomia(datasetName = db_name)

OUTPUT


Download completed!

R

db <- DBI::dbConnect(duckdb::duckdb(),
                     dbdir = CDMConnector::eunomiaDir(datasetName = db_name))

cdm <- CDMConnector::cdmFromCon(con = db, cdmSchema = "main",
                                writeSchema = "main")

R

library(dplyr)
get_concept_name <- function(id) {
  cdm$concept |>
    filter(concept_id == !!id) |>
    select(concept_name) |>
    pull()
}

Other concept_ids in OMOP


In addition to the concept_id column in the concept table, many OMOP tables contain columns whose names end in _concept_id. These columns hold concept IDs that describe the record.

Look at the column names of the person table.

R

colnames(cdm$person)    

OUTPUT

 [1] "person_id"                   "gender_concept_id"
 [3] "year_of_birth"               "month_of_birth"
 [5] "day_of_birth"                "birth_datetime"
 [7] "race_concept_id"             "ethnicity_concept_id"
 [9] "location_id"                 "provider_id"
[11] "care_site_id"                "person_source_value"
[13] "gender_source_value"         "gender_source_concept_id"
[15] "race_source_value"           "race_source_concept_id"
[17] "ethnicity_source_value"      "ethnicity_source_concept_id"

Several of these columns end with _concept_id, such as gender_concept_id, race_concept_id, and ethnicity_concept_id. These columns link to the concept table, which provides human-readable names for the concepts represented by these IDs.
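One quick way to pick out these columns is to match on the suffix. The sketch below works on a plain character vector copied from the person column names shown above, so it runs without a database connection; with a live connection the same grep() call could be applied to colnames(cdm$person).

```r
# Column names of the person table, as listed in the output above
person_columns <- c(
  "person_id", "gender_concept_id", "year_of_birth", "month_of_birth",
  "day_of_birth", "birth_datetime", "race_concept_id", "ethnicity_concept_id",
  "location_id", "provider_id", "care_site_id", "person_source_value",
  "gender_source_value", "gender_source_concept_id", "race_source_value",
  "race_source_concept_id", "ethnicity_source_value",
  "ethnicity_source_concept_id"
)

# Keep only the names that end in "_concept_id"
concept_columns <- grep("_concept_id$", person_columns, value = TRUE)
print(concept_columns)
```

Note that the match also picks up the *_source_concept_id columns, which hold the concept IDs of the original source codes.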

For example, to get the gender name for a person, you can use the gender_concept_id column in the person table and look it up in the concept table.

Unfortunately, the database we are using does not have any concept data relating to the race_concept_id so we have provided a snapshot of the relevant Athena data below.

A snapshot of the Athena concepts table showing some race concept ids.
Snapshot of race concepts

Callout

Use the lists of person and visit IDs in the code below to create some mini tables.

R

library(dplyr)


patients <- c(1, 2, 3, 5, 6, 7, 9, 11, 12, 16)

visits <- c(79, 107, 262, 493, 561, 630, 771, 943, 1017, 1196)

mini_person <-
  cdm$person |>
    filter(person_id %in% patients) |>
    select(
      person_id, 
      year_of_birth, 
      gender_concept_id, 
      race_concept_id
    ) |>
    arrange(person_id) |>
    collect()

mini_condition_occurrence <-
  cdm$condition_occurrence |>
    filter(visit_occurrence_id %in% visits) |>
    select(
      condition_occurrence_id,
      person_id,
      condition_concept_id,
      condition_start_date,
      visit_occurrence_id
    ) |> 
    arrange(person_id) |>
    collect()    

mini_drug_exposure <-
  cdm$drug_exposure |>
    filter(visit_occurrence_id %in% visits) |>
    select(
      drug_exposure_id,
      person_id,
      drug_concept_id,
      visit_occurrence_id
    ) |> 
    arrange(person_id) |>
    collect()    

Using the tables above, try to find the human-readable names for the various concept_ids using the get_concept_name() function we created earlier.

Challenge

Using the mini_person, mini_condition_occurrence, and mini_drug_exposure tables created above, find the human-readable names for the various concept_ids using the get_concept_name() function and answer the following questions:

  1. What is the gender of the Asian patient in the mini_person table?

  2. How many men and women are in the table?

  3. What condition is associated with the drug “Aspirin 81 MG Oral Tablet” ?

  1. From the snapshot above, we can see that the race_concept_id for an Asian person is 8515. There is only one person in the mini_person table with this race concept. They have a gender_concept_id of 8507, which we can look up using the get_concept_name() function:

R

get_concept_name(8507)

OUTPUT

[1] "MALE"

The Asian patient is male.

  2. The table is small enough to count by hand, but we can also use dplyr to count the number of men and women.

R

# Because we are getting our data from a database, let's create a mini version
# of the concepts we want

gender_concept <- cdm$concept |> 
  filter(concept_id %in% c(8507, 8532)) |>
  select(concept_id, concept_name) |> 
  collect()

# Now we can join to get the names
mini_person |> 
  left_join(gender_concept, by = c("gender_concept_id" = "concept_id")) |>
  group_by(concept_name) |>
  summarise(count = n()) 

OUTPUT

# A tibble: 2 × 2
  concept_name count
  <chr>        <int>
1 FEMALE           6
2 MALE             4

  3. First, we need to find the concept_id for “Aspirin 81 MG Oral Tablet”.

R

aspirin_concept_id <- cdm$concept |>
  filter(concept_name == "Aspirin 81 MG Oral Tablet") |>
  select(concept_id) |>
  pull()

Next, we can use this concept_id to find the person who received the drug in the mini_drug_exposure table, join to the mini_condition_occurrence table on person_id to find their condition, and then look up the condition name.

R

condition_concept_id <- mini_drug_exposure |>
  filter(drug_concept_id == aspirin_concept_id) |>
  select(person_id) |>
  left_join(mini_condition_occurrence, by = "person_id") |>
  select(condition_concept_id) |>
  pull()  
get_concept_name(condition_concept_id)

OUTPUT

[1] "Otitis media"
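Joining on person_id alone works here because the mini tables are so small. On larger data a person may have several visits, and a common refinement is to join on visit_occurrence_id as well, so that a drug is only paired with conditions recorded at the same visit. The sketch below uses invented tables and made-up concept IDs (111, 901, 902) purely to show the difference.

```r
library(dplyr)

# Hypothetical data: one drug exposure, two conditions at different visits
toy_drug <- tibble(
  person_id = 1L,
  visit_occurrence_id = 100L,
  drug_concept_id = 111L          # made-up concept ID
)

toy_condition <- tibble(
  person_id = c(1L, 1L),
  visit_occurrence_id = c(100L, 200L),
  condition_concept_id = c(901L, 902L)  # made-up concept IDs
)

# Joining on person only pairs the drug with every condition the person has had
by_person <- toy_drug |>
  inner_join(toy_condition, by = "person_id")

# Joining on person and visit keeps only the condition from the same visit
by_person_and_visit <- toy_drug |>
  inner_join(toy_condition, by = c("person_id", "visit_occurrence_id"))

print(by_person)
print(by_person_and_visit)
```

Which join is appropriate depends on the question being asked; the stricter join is only a sketch of one reasonable choice.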

Key Points
  • OMOP tables contain many concept_ids, usually named with a _concept_id suffix.

  • The concept table can be used to look up human-readable names for various concept_ids.

  • OMOP tables can be linked using common identifiers such as person_id and visit_occurrence_id.