concepts and conditions

Last updated on 2025-11-04 | Edit this page

Overview

Questions

What is an OMOP concept ?
Where are patient conditions stored in OMOP ?

Objectives

Understand that nearly everything in a hospital can be represented by an OMOP concept_id.
Know that OMOP data usually includes the OMOP concept table and other data from the vocabularies
Be able to look up concepts by their name
Know that patient conditions are stored in the condition_occurrence table

Introduction

Nearly everything in a hospital can be represented by an OMOP concept_id.

Any column within the OMOP CDM named *concept_id contains OMOP concept IDs. An OMOP concept_id is a unique integer identifier. Concept_ids are defined in the OMOP concept table where a corresponding name and other attributes are stored. OMOP contains concept_ids for other medical vocabularies such as SNOMED and LOINC, which OMOP terms as source vocabularies.

concept table columns	Description
concept_id	Unique identifier for the concept.
concept_name	Name or description of the concept.
domain_id	The domain to which the concept belongs (e.g. Condition, Drug).
vocabulary_id	The vocabulary from which the concept originates (e.g. SNOMED, RxNorm).
concept_class_id	Classification within the vocabulary (e.g. Clinical Finding, Ingredient).
standard_concept	‘S’ for standard concepts that can be included in network studies
concept_code	Code used by the source vocabulary to identify the concept.
valid_start_date	Date the concept became valid in OMOP.
valid_end_date	Date the concept ceased to be valid.
invalid_reason	Reason for invalidation, if applicable

Looking up OMOP concepts

OMOP concepts can be looked up in Athena an online tool provided by OHDSI.

The CDMConnector package allows connection to an OMOP Common Data Model in a database. It also contains synthetic example data that can be used to demonstrate querying the data.

Discussion

In the previous lesson we set up the CDMConnector package to connect to an OMOP Common Data Model database and used it to look at the concepts table.

R

library(CDMConnector)

dbName <- "GiBleed"
CDMConnector::requireEunomia(datasetName = dbName)

OUTPUT


Download completed!

R

db <- DBI::dbConnect(duckdb::duckdb(), dbdir = CDMConnector::eunomiaDir(datasetName = dbName))

cdm <- CDMConnector::cdmFromCon(con = db, cdmSchema = "main", writeSchema = "main")

The data themselves are not actually read into the created cdm object. Rather it is a reference that allows us to query the data from the database.

Typing cdm will give a summary of the tables in the database and we can look at these individually using the $ operator.

R

cdm

R

cdm$person

OUTPUT

# Source:   table<person> [?? x 18]
# Database: DuckDB 1.4.1 [unknown@Linux 6.8.0-1036-azure:R 4.5.1//tmp/RtmpFoKwkX/file491017c29121.duckdb]
   person_id gender_concept_id year_of_birth month_of_birth day_of_birth
       <int>             <int>         <int>          <int>        <int>
 1         6              8532          1963             12           31
 2       123              8507          1950              4           12
 3       129              8507          1974             10            7
 4        16              8532          1971             10           13
 5        65              8532          1967              3           31
 6        74              8532          1972              1            5
 7        42              8532          1909             11            2
 8       187              8507          1945              7           23
 9        18              8532          1965             11           17
10       111              8532          1975              5            2
# ℹ more rows
# ℹ 13 more variables: birth_datetime <dttm>, race_concept_id <int>,
#   ethnicity_concept_id <int>, location_id <int>, provider_id <int>,
#   care_site_id <int>, person_source_value <chr>, gender_source_value <chr>,
#   gender_source_concept_id <int>, race_source_value <chr>,
#   race_source_concept_id <int>, ethnicity_source_value <chr>,
#   ethnicity_source_concept_id <int>

Many of the columns in the OMOP CDM tables contain concept_ids. For example the condition_occurrence table contains a condition_concept_id column that contains the concept_id for the patient’s condition.

Discussion

Challenge

Using the functions count, filter and select from the dplyr package find

How many concepts are included in this dataset?
What are the top 5 most common conditions in the condition_occurrence table.
How many people were born after 1984?
How many distinct vocabularies are used in the concept table?
(Optional) Look up the concept_id for “Type 2 diabetes mellitus” in Athena. How many people in this dataset have this condition? ::::::::::::::::::::::::::::::::::::::::::::::::::: solution

R

library(dplyr)
# 1. How many concepts are included in this dataset?
cdm$concept %>%
  summarise(n = n_distinct(concept_id))

OUTPUT

# Source:   SQL [?? x 1]
# Database: DuckDB 1.4.1 [unknown@Linux 6.8.0-1036-azure:R 4.5.1//tmp/RtmpFoKwkX/file491017c29121.duckdb]
      n
  <dbl>
1   444

Answer: There are 444 concepts in this dataset. Remember it is not the full set of OMOP concepts.

R

# 2. What are the top 5 most common conditions in the condition_occurrence table.
cdm$condition_occurrence %>%
  dplyr::count(condition_concept_id, sort = TRUE) %>%                 # creates n
  dplyr::mutate(rn = dplyr::row_number(dplyr::desc(n))) %>%          # window fn
  dplyr::filter(rn <= 5) %>%
  dplyr::left_join(cdm$concept, by = c("condition_concept_id" = "concept_id")) %>%
  dplyr::select(condition_concept_id, concept_name, n) %>%
  dplyr::arrange(dplyr::desc(n))

OUTPUT

# Source:     SQL [?? x 3]
# Database:   DuckDB 1.4.1 [unknown@Linux 6.8.0-1036-azure:R 4.5.1//tmp/RtmpFoKwkX/file491017c29121.duckdb]
# Ordered by: dplyr::desc(n)
  condition_concept_id concept_name                n
                 <int> <chr>                   <dbl>
1               260139 Acute bronchitis         8184
2             40481087 Viral sinusitis         17268
3               372328 Otitis media             3605
4              4112343 Acute viral pharyngitis 10217
5                80180 Osteoarthritis           2694

Answer: The top 5 most common conditions are:
| condition_concept_id | concept_name | n | |———————-|—————————–|——-| | 260139 | Acute bronchitis | 8184 | | 40481087 | Viral sinusitis | 17268 | | 372328 | Otitis media | 3605 | | 4112343 | Acute viral pharyngitis |10217 | |80180 | Osteoarthritis | 2694 |

R

# 3. How many people were born after 1984?
cdm$person |>
  dplyr::filter(year_of_birth > 1984) |>
  dplyr::count()

OUTPUT

# Source:   SQL [?? x 1]
# Database: DuckDB 1.4.1 [unknown@Linux 6.8.0-1036-azure:R 4.5.1//tmp/RtmpFoKwkX/file491017c29121.duckdb]
      n
  <dbl>
1     6

Answer: There are 6 people born after 1984 in this dataset.

R

# 4. How many distinct vocabularies are used in the concept table?
cdm$vocabulary %>% summarise(n_distinct_vocabularies = n_distinct(vocabulary_id ))

OUTPUT

# Source:   SQL [?? x 1]
# Database: DuckDB 1.4.1 [unknown@Linux 6.8.0-1036-azure:R 4.5.1//tmp/RtmpFoKwkX/file491017c29121.duckdb]
  n_distinct_vocabularies
                    <dbl>
1                     125

Answer: There are 125 distinct vocabularies used in this dataset.

:::::::::::::::::::::::::::::::::::::::::::::::::::

Key Points

Understand that nearly everything in a hospital can be represented by an OMOP concept_id.
Know that OMOP data usually includes the OMOP concept table and other data from the vocabularies
Be able to look up concepts by their name