OmopSketch

WARNING: this package is under development and has only been tested using mock data.

The goal of OmopSketch is to characterise and visualise an OMOP CDM instance to assess whether it meets the criteria needed to answer a specific clinical question and conduct a given study.

Installation

You can install the development version of OmopSketch from GitHub with:

# install.packages("remotes")
remotes::install_github("OHDSI/OmopSketch")

Example

Let’s start by creating a cdm object using the Eunomia mock dataset:

library(duckdb)
#> Loading required package: DBI
library(CDMConnector)
library(dplyr, warn.conflicts = FALSE)
library(OmopSketch)
con <- dbConnect(duckdb(), eunomia_dir())
cdm <- cdmFromCon(con = con, cdmSchema = "main", writeSchema = "main")
#> Note: method with signature 'DBIConnection#Id' chosen for function 'dbExistsTable',
#>  target signature 'duckdb_connection#Id'.
#>  "duckdb_connection#ANY" would also be valid
cdm
#> 
#> ── # OMOP CDM reference (duckdb) of Synthea synthetic health database ──────────
#> • omop tables: person, observation_period, visit_occurrence, visit_detail,
#> condition_occurrence, drug_exposure, procedure_occurrence, device_exposure,
#> measurement, observation, death, note, note_nlp, specimen, fact_relationship,
#> location, care_site, provider, payer_plan_period, cost, drug_era, dose_era,
#> condition_era, metadata, cdm_source, concept, vocabulary, domain,
#> concept_class, concept_relationship, relationship, concept_synonym,
#> concept_ancestor, source_to_concept_map, drug_strength
#> • cohort tables: -
#> • achilles tables: -
#> • other tables: -

Snapshot

We first create a snapshot of our database. This will allow us to track when the analysis has been conducted and capture details about the CDM version or the data release.

summariseOmopSnapshot(cdm) |>
  tableOmopSnapshot(type = "flextable")
#> ! Results have not been suppressed.

Characterise the clinical tables

Once we have collected the snapshot information, we can start characterising the clinical tables of the CDM. By using summariseClinicalRecords() and tableClinicalRecords(), we can easily visualise the main characteristics of specific clinical tables.

summariseClinicalRecords(cdm, c("condition_occurrence", "drug_exposure")) |>
  tableClinicalRecords(type = "flextable")
#> ℹ Summarising table counts
#> ℹ The following estimates will be computed:
#> → Start summary of data, at 2024-09-25 12:14:06.676817
#> 
#> ✔ Summary finished, at 2024-09-25 12:14:06.815944
#> ℹ Summarising records per person
#> ℹ The following estimates will be computed:
#> • records_per_person: mean, sd, median, q25, q75, min, max
#> ! Table is collected to memory as not all requested estimates are supported on
#>   the database side
#> → Start summary of data, at 2024-09-25 12:14:07.908258
#> 
#> ✔ Summary finished, at 2024-09-25 12:14:07.955041
#> ℹ Summarising in_observation, standard, domain_id, and type information
#> ℹ Summarising table counts
#> ℹ The following estimates will be computed:
#> → Start summary of data, at 2024-09-25 12:14:11.725276
#> 
#> ✔ Summary finished, at 2024-09-25 12:14:11.877293
#> ℹ Summarising records per person
#> ℹ The following estimates will be computed:
#> • records_per_person: mean, sd, median, q25, q75, min, max
#> ! Table is collected to memory as not all requested estimates are supported on
#>   the database side
#> → Start summary of data, at 2024-09-25 12:14:12.808874
#> 
#> ✔ Summary finished, at 2024-09-25 12:14:12.850686
#> ℹ Summarising in_observation, standard, domain_id, and type information
#> ! Results have not been suppressed.

We can also explore trends in the clinical table records over time.

summariseRecordCount(cdm, c("condition_occurrence", "drug_exposure")) |>
  plotRecordCount(facet = "omop_table")
#> ! The following column type were changed:
#> • variable_level: from double to character

Characterise the observation period

After visualising the main characteristics of our clinical tables, we can explore the observation period details. OmopSketch provides several functions that give an overview of the period covered by the dataset.

Using summariseInObservation() and plotInObservation(), we can gather information on the number of records per year.

summariseInObservation(cdm$observation_period, output = "records") |>
  plotInObservation()
#> ! The following column type were changed:
#> • variable_level: from double to character

You can also visualise and explore the characteristics of the observation periods of each individual in the database using summariseObservationPeriod().

summariseObservationPeriod(cdm$observation_period) |>
  tableObservationPeriod(type = "flextable")
#> ! Results have not been suppressed.

Or, if a visualisation is preferred, you can easily build a histogram to explore how many participants have more than one observation period.

summariseObservationPeriod(cdm$observation_period) |>
  plotObservationPeriod()

Characterise the concepts

OmopSketch also provides functions to explore some of (or all) the concepts in the dataset.

acetaminophen <- c(1125315, 1127433, 1127078)

summariseConceptSetCounts(cdm, conceptSet = list("acetaminophen" = acetaminophen)) |>
  filter(estimate_name == "record_count") |> 
  plotConceptCounts()
#> ℹ Getting use of codes from acetaminophen
#> ! The following column type were changed:
#> • variable_name: from integer to character

Characterise the population

Finally, OmopSketch can also help us to characterise the population at the start and end of the observation period.

summarisePopulationCharacteristics(cdm) |>
  tablePopulationCharacteristics(type = "flextable")
#> Warning: ! 1 casted column in og_015_1727262876 (cohort_set) as do not match expected
#>   column type:
#> • `cohort_definition_id` from numeric to integer
#> Warning: ! 1 column in og_015_1727262876 do not match expected column type:
#> • `cohort_definition_id` is numeric but expected integer
#> ! cohort columns will be reordered to match the expected order:
#>   cohort_definition_id, subject_id, cohort_start_date, and cohort_end_date.
#> ℹ Building new trimmed cohort
#> Warning: ! 1 column in tmp_011_og_017_1727262877 do not match expected column type:
#> • `cohort_definition_id` is numeric but expected integer
#> Creating initial cohort
#> ! cohort columns will be reordered to match the expected order:
#>   cohort_definition_id, subject_id, cohort_start_date, and cohort_end_date.
#> ! cohort columns will be reordered to match the expected order:
#>   cohort_definition_id, subject_id, cohort_start_date, and cohort_end_date.
#> ✔ Cohort trimmed
#> ℹ adding demographics columns
#> 
#> ℹ summarising data
#> 
#> ✔ summariseCharacteristics finished!
#> 
#> ! The following column type were changed:
#> • variable_name: from integer to character
#> ! Results have not been suppressed.

| Variable name | Variable level | Estimate name | Synthea synthetic health database |
|---|---|---|---|
| Number records | - | N | 2,694 |
| Number subjects | - | N | 2,694 |
| Cohort start date | - | Median [Q25 - Q75] | 1961-03-18 [1950-07-13 - 1970-08-29] |
| | | Range | 1908-09-22 to 1986-11-03 |
| Cohort end date | - | Median [Q25 - Q75] | 2018-12-14 [2018-08-02 - 2019-04-06] |
| | | Range | 1945-07-20 to 2019-07-03 |
| Age at start | - | Median [Q25 - Q75] | 0 [0 - 0] |
| | | Mean (SD) | 0.00 (0.00) |
| | | Range | 0 to 0 |
| Age at end | - | Median [Q25 - Q75] | 57 [47 - 67] |
| | | Range | 31 to 110 |
| Sex | Female | N% | 1,373 (50.97) |
| | Male | N% | 1,321 (49.03) |
| Prior observation | - | Median [Q25 - Q75] | 0 [0 - 0] |
| | | Mean (SD) | 0.00 (0.00) |
| | | Range | 0 to 0 |
| Future observation | - | Median [Q25 - Q75] | 20,870 [17,494 - 24,701] |
| | | Mean (SD) | 21,601.60 (5,460.69) |
| | | Range | 11,396 to 40,348 |
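
As a rough consistency check on the figures above, the sketch below converts the future observation quartiles to years. It assumes these estimates are reported in days (the README does not state the unit) and uses 365.25 days per year as an approximation. The variable names are illustrative and are not part of OmopSketch.

```r
# Future observation quartiles (Q25, median, Q75) copied from the output above.
# Assumption: these values are counts of days.
future_observation_days <- c(q25 = 17494, median = 20870, q75 = 24701)

# Approximate conversion to whole years, using 365.25 days per year.
future_observation_years <- floor(future_observation_days / 365.25)
future_observation_years
```

Under that assumption the quartiles come out as 47, 57 and 67 years. Since every individual in this mock dataset is aged 0 at the start of observation, this lines up with the reported age at end of 57 [47 - 67].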

As shown above, OmopSketch offers several functions that give a general overview of a database. It also includes further tools and arguments for deeper exploration, which help to assess whether the database is suitable for a specific research study. For further information, please refer to the vignettes.
