Exploring OMOP concepts with R
Last updated on 2026-02-08
Overview
Questions
- Find the vocabulary, domain and concept_class for a given concept_id
- Establish whether a concept_id is a standard concept
- Find all concepts within a given domain
- Find all concepts within a given vocabulary
Objectives
- Understand that concepts have additional attributes such as vocabulary, domain, class and standard concept status
- Use R to query the concept table for specific attributes of concepts
- Filter concepts based on domain, vocabulary and class
- Identify standard concepts within the OMOP vocabulary
Introduction
The primary purpose of the concept table is to provide a
standardised representation of medical Concepts, allowing for consistent
querying and analysis across healthcare databases. Users can join the
concept table with other tables in the CDM to enrich
clinical data with Concept information or use the concept
table as a reference for mapping clinical data from source terminologies
to Standard or other Concepts.
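As a rough illustration of the join described above, the sketch below uses two small in-memory tibbles rather than a real CDM. The concept rows reuse concept_ids and attributes that appear in this episode's concept table output. The table toy_condition_occurrence and its person_id values are invented for the example.

```r
library(dplyr)

# Toy stand-in for the concept table (attributes only, no names).
toy_concept <- tibble(
  concept_id    = c(28060L, 30753L, 8507L),
  domain_id     = c("Condition", "Condition", "Gender"),
  vocabulary_id = c("SNOMED", "SNOMED", "Gender")
)

# Hypothetical clinical records: the person_id values are made up.
toy_condition_occurrence <- tibble(
  person_id            = c(1L, 2L, 2L),
  condition_concept_id = c(28060L, 30753L, 28060L)
)

# Enrich each clinical record with the attributes of its concept.
enriched <- toy_condition_occurrence |>
  left_join(toy_concept, by = c("condition_concept_id" = "concept_id"))

enriched
```

A left join keeps every clinical record even if its concept_id were missing from the concept table, which is usually what you want when enriching data.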
An OMOP concept_id is a unique integer identifier.
Concept_ids are defined in the OMOP concept table, where a
corresponding name and other attributes are stored. OMOP contains
concept_ids for concepts from other medical vocabularies such as SNOMED and LOINC,
which OMOP refers to as source vocabularies.
Nearly everything in a hospital can be represented by an OMOP
concept_id.
Looking up OMOP concepts
OMOP concepts can be looked up in Athena, an online tool provided by OHDSI.
The CDMConnector package allows connection to an OMOP Common Data Model in a database. It also contains synthetic example data that can be used to demonstrate querying the data.
In the previous episode we set up the CDMConnector package to connect
to an OMOP Common Data Model database and used it to look at the
concept table. We also created the function
get_concept_name() to get a human-readable name for a
concept_id. We will use these again in this episode.
Setting up the connection
R
library(CDMConnector)
db_name <- "GiBleed"
CDMConnector::requireEunomia(datasetName = db_name)
OUTPUT
Download completed!
R
db <- DBI::dbConnect(duckdb::duckdb(),
  dbdir = CDMConnector::eunomiaDir(datasetName = db_name))
cdm <- CDMConnector::cdmFromCon(con = db, cdmSchema = "main",
  writeSchema = "main")
Exploring the concept table
R
colnames(cdm$concept)
OUTPUT
[1] "concept_id" "concept_name" "domain_id" "vocabulary_id"
[5] "concept_class_id" "standard_concept" "concept_code" "valid_start_date"
[9] "valid_end_date" "invalid_reason"
| concept table columns | Description |
|---|---|
| concept_id | Unique identifier for the concept. |
| concept_name | Name or description of the concept. |
| domain_id | The domain to which the concept belongs (e.g. Condition, Drug). |
| vocabulary_id | The vocabulary from which the concept originates (e.g. SNOMED, RxNorm). |
| concept_class_id | Classification within the vocabulary (e.g. Clinical Finding, Ingredient). |
| standard_concept | ‘S’ for standard concepts, ‘C’ for classification concepts, and empty (shown as NA in R) for nonstandard concepts. |
| concept_code | Code used by the source vocabulary to identify the concept. |
| valid_start_date | Date the concept became valid in OMOP. |
| valid_end_date | Date the concept ceased to be valid. |
| invalid_reason | Reason for invalidation, if applicable. |
The concept table is the main table for looking up
information about concepts. We can use R to query the
concept table for specific attributes of concepts.
Challenge
Answer the following questions using R and the concept
table:
- How many entries are there in the concept table?
- How many distinct vocabularies are there in the concept table?
- How many distinct domains other than ‘None’ are there in the concept table?
- How many distinct concept_classes are there in the concept table?
- How many entries are there in the concept table?
R
library(dplyr)
cdm$concept |>
  summarise(n_concepts = n())
OUTPUT
# Source: SQL [?? x 1]
# Database: DuckDB 1.4.1 [unknown@Linux 6.8.0-1044-azure:R 4.5.2//tmp/Rtmpd8tQ6t/file17767ba676cb.duckdb]
n_concepts
<dbl>
1 444
Answer: There are 444 entries in the
concept table. This is a tiny fraction of the full concept table,
which can be browsed in Athena.
CODING_NOTE: The function n() counts
the number of rows in the table and summarise() creates a
summary table with that count. These functions are part of the
dplyr package. When you have loaded the library once your
environment will remember it for the rest of the session.
- How many distinct vocabularies are there in the concept table?
R
cdm$concept |>
  summarise(n_distinct_vocabularies = n_distinct(vocabulary_id))
OUTPUT
# Source: SQL [?? x 1]
# Database: DuckDB 1.4.1 [unknown@Linux 6.8.0-1044-azure:R 4.5.2//tmp/Rtmpd8tQ6t/file17767ba676cb.duckdb]
n_distinct_vocabularies
<dbl>
1 9
Answer: There are 9 distinct vocabularies used in this dataset.
CODING_NOTE: The function n_distinct(x)
counts the number of distinct values in the column x.
- How many distinct domains other than ‘None’ are there in the concept table?
R
cdm$concept |>
  filter(domain_id != "None") |>
  summarise(n_distinct_domains = n_distinct(domain_id))
OUTPUT
# Source: SQL [?? x 1]
# Database: DuckDB 1.4.1 [unknown@Linux 6.8.0-1044-azure:R 4.5.2//tmp/Rtmpd8tQ6t/file17767ba676cb.duckdb]
n_distinct_domains
<dbl>
1 8
Answer: There are 8 distinct domains other than ‘None’ in this dataset.
CODING_NOTE: We use the filter()
function to filter out rows where the domain_id is ‘None’ before
counting the distinct domains.
- How many distinct concept_classes are there in the concept table?
R
cdm$concept |>
  summarise(n_distinct_concept_classes = n_distinct(concept_class_id))
OUTPUT
# Source: SQL [?? x 1]
# Database: DuckDB 1.4.1 [unknown@Linux 6.8.0-1044-azure:R 4.5.2//tmp/Rtmpd8tQ6t/file17767ba676cb.duckdb]
n_distinct_concept_classes
<dbl>
1 21
Answer: There are 21 distinct concept_classes used in this dataset.
Filtering concepts by domain, vocabulary, class and standard concept status
Let’s look into filtering concepts based on their domain, vocabulary, concept_class and standard_concept status.
Challenge
List the first ten rows of the concept table, listing
only the concept_id, domain_id,
vocabulary_id, concept_class_id and
standard_concept columns.
R
cdm$concept |>
  arrange(concept_id) |>
  filter(row_number() <= 10) |>
  select(concept_id, domain_id, vocabulary_id, concept_class_id, standard_concept) |>
  collect()
OUTPUT
# A tibble: 10 × 5
concept_id domain_id vocabulary_id concept_class_id standard_concept
<int> <chr> <chr> <chr> <chr>
1 0 Metadata None Undefined <NA>
2 8507 Gender Gender Gender S
3 8532 Gender Gender Gender S
4 9201 Visit Visit Visit S
5 9202 Visit Visit Visit S
6 9203 Visit Visit Visit S
7 28060 Condition SNOMED Clinical Finding S
8 30753 Condition SNOMED Clinical Finding S
9 78272 Condition SNOMED Clinical Finding S
10 80180 Condition SNOMED Clinical Finding S
CODING_NOTE: The arrange() function
orders the rows by concept_id. The
filter(row_number() <= 10) call keeps the first
10 rows. The select() function selects only the specified
columns. We have to use collect() to pull the data into R
memory to view it. This is because cdm$concept is a reference to a table
in the database, not a data frame already held in R memory.
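A quick way to see how concepts are spread across domains is to count rows per group. The sketch below is a minimal example on a local tibble that copies the ten rows listed above. It is not run against the database, although the same count() call should also work on cdm$concept.

```r
library(dplyr)

# Local copy of the ten rows shown above (concept_id and domain_id only).
toy_concept <- tibble(
  concept_id = c(0L, 8507L, 8532L, 9201L, 9202L, 9203L,
                 28060L, 30753L, 78272L, 80180L),
  domain_id  = c("Metadata", "Gender", "Gender", "Visit", "Visit", "Visit",
                 "Condition", "Condition", "Condition", "Condition")
)

# count() groups by domain_id and adds a column n with the group size;
# sort = TRUE puts the largest group first.
domain_counts <- toy_concept |>
  count(domain_id, sort = TRUE)

domain_counts
```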
Look at vocabularies
Vocabulary: The source or system of coding for concepts, such as SNOMED, RxNorm, LOINC, or ICD‑10. OMOP maps many vocabularies into a common, standardised set so different coding systems can be analysed together.
Challenge
List all distinct vocabularies in the concept table.
R
cdm$concept |>
  filter(!is.na(vocabulary_id)) |>
  distinct(vocabulary_id) |>
  arrange(vocabulary_id) |>
  pull(vocabulary_id)
OUTPUT
[1] "ICD10CM" "LOINC" "NDC" "Visit" "Gender" "RxNorm" "CVX"
[8] "SNOMED" "None"
CODING_NOTE: Here we can use pull(x) to
extract the column x as a vector in R memory. Because we only
need one column of data, we can pull that column directly
without needing to use collect() first.
Look at domains
Domain: A high‑level category that groups concepts by what they represent in clinical data, such as Condition, Drug, Procedure, Measurement, or Observation. A concept’s domain determines which OMOP table it belongs to and how it’s used analytically.
Challenge
List all distinct domains in the concept table.
R
cdm$concept |>
  filter(!is.na(domain_id)) |>
  distinct(domain_id) |>
  arrange(domain_id) |>
  pull(domain_id)
OUTPUT
[1] "Drug" "Measurement" "Condition" "Procedure" "Observation"
[6] "Visit" "Metadata" "Gender"
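Once the available domains and vocabularies are known, finding all concepts within one of them is a single filter() step. The sketch below is a minimal example on a local tibble whose rows are copied from the concept table output shown earlier in this episode. The same filter() calls should also work on cdm$concept, with collect() or pull() to bring the result into R.

```r
library(dplyr)

# Local copy of a few rows from the concept table.
toy_concept <- tibble(
  concept_id       = c(0L, 8507L, 8532L, 9201L, 28060L, 30753L),
  domain_id        = c("Metadata", "Gender", "Gender", "Visit",
                       "Condition", "Condition"),
  vocabulary_id    = c("None", "Gender", "Gender", "Visit",
                       "SNOMED", "SNOMED"),
  standard_concept = c(NA, "S", "S", "S", "S", "S")
)

# All concept_ids within a given domain.
condition_ids <- toy_concept |>
  filter(domain_id == "Condition") |>
  arrange(concept_id) |>
  pull(concept_id)

# All concept_ids within a given vocabulary.
gender_ids <- toy_concept |>
  filter(vocabulary_id == "Gender") |>
  arrange(concept_id) |>
  pull(concept_id)

condition_ids
gender_ids
```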
Look at concept classes
Class: A lower-level category that groups concepts within a vocabulary by what they represent in clinical data, such as Clinical Finding or Ingredient.
Challenge
List all distinct concept_classes in the concept
table.
R
cdm$concept |>
  filter(!is.na(concept_class_id)) |>
  distinct(concept_class_id) |>
  arrange(concept_class_id) |>
  pull(concept_class_id)
OUTPUT
[1] "Branded Drug" "3-char nonbill code" "Quant Branded Drug"
[4] "Branded Drug Comp" "Visit" "Context-dependent"
[7] "Undefined" "Morph Abnormality" "4-char billing code"
[10] "Procedure" "Lab Test" "Clinical Drug"
[13] "Clinical Finding" "Clinical Observation" "Quant Clinical Drug"
[16] "CVX" "Ingredient" "11-digit NDC"
[19] "Branded Pack" "Clinical Drug Comp" "Gender"
Look at nonstandard concepts
A standard concept is the preferred, harmonised code in OMOP that
represents a clinical idea across vocabularies. Standard concepts
(standard_concept = “S”) are the target of mappings from source codes,
and they define which domain and table the data belong to for consistent
analysis. However, OMOP also includes nonstandard concepts from sources
that are not globally used but may be useful locally. dm+d,
the NHS Dictionary of Medicines and Devices, is one such vocabulary that
is included in OMOP but is not a standard vocabulary.
Challenge
Find any nonstandard concepts (i.e. concepts where standard_concept
is not ‘S’) by filtering the concept table. List the first 10
concept_ids of nonstandard concepts. Then look up their
concept_name, domain_id, vocabulary_id and standard_concept status.
R
cdm$concept |>
  filter(is.na(standard_concept) | standard_concept != "S") |>
  slice_min(order_by = concept_id, n = 10, with_ties = FALSE) |>
  pull(concept_id)
OUTPUT
[1] 0 1569708 35208414 44923712 45011828
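Answer: The query returns only five concept_ids. The first, 0, is a placeholder (domain Metadata, vocabulary None) rather than a clinical concept, so there are only four nonstandard concepts in this dataset: 1569708, 35208414, 44923712, 45011828.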
CODING_NOTE: We use slice_min() to get
the first 10 rows which match the filter, ordered by
concept_id.
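The filter above tests is.na(standard_concept) as well as standard_concept != "S". The sketch below is a minimal example on a local tibble of why both parts are needed. It assumes, as the outputs in this episode suggest, that nonstandard concepts have a missing (NA) standard_concept.

```r
library(dplyr)

# Local copy of a few concept_ids and their standard_concept status.
toy_concept <- tibble(
  concept_id       = c(0L, 8507L, 35208414L, 44923712L),
  standard_concept = c(NA, "S", NA, NA)
)

# NA != "S" evaluates to NA, and filter() keeps only rows that are TRUE,
# so this comparison on its own returns no rows here.
naive <- toy_concept |>
  filter(standard_concept != "S")

# Adding is.na() keeps the rows whose status is missing.
nonstandard <- toy_concept |>
  filter(is.na(standard_concept) | standard_concept != "S")

nrow(naive)        # 0
nrow(nonstandard)  # 3
```

The same three-valued logic applies when the comparison is translated to SQL, which is why the is.na() test is written out explicitly.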
R
cdm$concept |>
  filter(concept_id %in% c(1569708, 35208414, 44923712, 45011828)) |>
  select(concept_id, concept_name, domain_id, vocabulary_id, standard_concept) |>
  collect()
OUTPUT
# A tibble: 4 × 5
concept_id concept_name domain_id vocabulary_id standard_concept
<int> <chr> <chr> <chr> <chr>
1 35208414 Gastrointestinal hemorrha… Condition ICD10CM <NA>
2 44923712 celecoxib 200 MG Oral Cap… Drug NDC <NA>
3 1569708 Other diseases of digesti… Condition ICD10CM <NA>
4 45011828 Diclofenac Sodium 75 MG D… Drug NDC <NA>
CODING_NOTE: We use %in% to filter for
multiple concept_ids which we can list in a vector.
Now you should be able to replicate our get_concept_name() function to look up other attributes of concepts such as domain, vocabulary and standard concept status.
Challenge
Find the domain, vocabulary and concept class for
concept_id 35208414.
Is this concept a standard concept?
R
library(dplyr)
# Return the domain of a concept_id
get_concept_domain <- function(id) {
  cdm$concept |>
    filter(concept_id == !!id) |>
    select(domain_id) |>
    pull()
}
# Return the vocabulary of a concept_id
get_concept_vocabulary <- function(id) {
  cdm$concept |>
    filter(concept_id == !!id) |>
    select(vocabulary_id) |>
    pull()
}
# Return the concept class of a concept_id
get_concept_concept_class <- function(id) {
  cdm$concept |>
    filter(concept_id == !!id) |>
    select(concept_class_id) |>
    pull()
}
# Return the standard concept status of a concept_id ("S" or NA)
get_concept_standard_status <- function(id) {
  cdm$concept |>
    filter(concept_id == !!id) |>
    select(standard_concept) |>
    pull()
}
get_concept_domain(35208414)
OUTPUT
[1] "Condition"
R
get_concept_vocabulary(35208414)
OUTPUT
[1] "ICD10CM"
R
get_concept_concept_class(35208414)
OUTPUT
[1] "4-char billing code"
R
get_concept_standard_status(35208414)
OUTPUT
[1] NA
Answer:
- The domain for concept_id 35208414 is ‘Condition’.
- The vocabulary for concept_id 35208414 is ‘ICD10CM’.
- The concept class for concept_id 35208414 is ‘4-char billing code’.
- This concept is not a standard concept (standard_concept is NA).
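The four lookup functions differ only in the column they select, so they could be folded into one helper. The sketch below is one possible way to do that, shown on a local tibble so that it runs without a database. The names get_concept_attribute() and is_standard_concept() are hypothetical and not part of CDMConnector. The rows are copied from outputs in this episode.

```r
library(dplyr)

# Local copy of three rows from the concept table.
toy_concept <- tibble(
  concept_id       = c(8507L, 28060L, 35208414L),
  domain_id        = c("Gender", "Condition", "Condition"),
  vocabulary_id    = c("Gender", "SNOMED", "ICD10CM"),
  concept_class_id = c("Gender", "Clinical Finding", "4-char billing code"),
  standard_concept = c("S", "S", NA)
)

# Hypothetical helper: return one attribute (a column name given as a
# string) for one concept_id.
get_concept_attribute <- function(concept_tbl, id, attribute) {
  concept_tbl |>
    filter(concept_id == !!id) |>
    select(all_of(attribute)) |>
    pull()
}

# Hypothetical helper: TRUE only when standard_concept is exactly "S".
# isTRUE() turns a missing status, or an unknown id, into FALSE.
is_standard_concept <- function(concept_tbl, id) {
  status <- get_concept_attribute(concept_tbl, id, "standard_concept")
  isTRUE(status == "S")
}

get_concept_attribute(toy_concept, 35208414, "vocabulary_id")
is_standard_concept(toy_concept, 35208414)
```

Passing the table as an argument means the same helpers could be tried with cdm$concept in place of toy_concept.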
- Concepts have additional attributes such as vocabulary, domain, and standard concept status
- The concept table can be queried using R to retrieve specific attributes of concepts
- Concepts can be filtered based on their domain, vocabulary and class
- Standard concepts are those that are recommended for use in analyses within the OMOP framework