Combine a database query with a cohort.
combine_query.Rd
An S3 generic function that can be used on database queries from Aurum or GOLD extracts.
Combine a database query with a cohort, only retaining observations between time_prev
days prior to indexdt
, and time_post
days after
indexdt
, and for test data with values between lower_bound
and upper_bound
. The most recent numobs
observations will be returned.
cohort
must contain variables patid
and indexdt
. The type
of query must be specified for appropriate data manipulation. Input type = med
if
interested in medical diagnoses from the observation file, and type = test
if interseted in test data from the observation file.
Usage
combine_query(
db_query,
cohort,
query_type = c("med", "drug", "test", "hes_primary", "death"),
time_prev = Inf,
time_post = Inf,
lower_bound = -Inf,
upper_bound = Inf,
numobs = 1,
value_na_rm = TRUE,
earliest_values = FALSE,
reduce_output = TRUE
)
Arguments
- db_query
Output from database query (ideally obtained through
db_query
).- cohort
Cohort to combine with the database query.
- query_type
Type of query
- time_prev
Number of days prior to index date to look for codes.
- time_post
Number of days after index date to look for codes.
- lower_bound
Lower bound for returned values when
query_type = "test"
.- upper_bound
Upper bound for returned values when
query_type = "test"
.- numobs
Number of observations to be returned.
- value_na_rm
If TRUE will remove data with NA in the
value
column of the queried data and remove values outside oflower_bound
andupper_bound
whenquery_type = "test"
.- earliest_values
If TRUE will return the earliest values as opposed to most recent.
- reduce_output
If TRUE will reduce output to just
patid
, event date, medical/product code, and testvalue
.
Details
value_na_rm = FALSE
may be of use when extracting variables like smoking status, where we want test data for number of cigarettes per day,
but do not want to remove all observations with NA in the value
column, because the medcodeid itself may indicate smoking status.
Examples
## Create connection to a temporary database
aurum_extract <- connect_database(file.path(tempdir(), "temp.sqlite"))
## Add observation data from all observation files in specified directory
cprd_extract(db = aurum_extract,
filepath = system.file("aurum_data", package = "rcprd"),
filetype = "observation")
#>
|
| | 0%
#> Adding /home/runner/work/_temp/Library/rcprd/aurum_data/aurum_allpatid_set1_extract_observation_001.txt 2024-11-15 11:32:14.392743
#>
|
|======================= | 33%
#> Adding /home/runner/work/_temp/Library/rcprd/aurum_data/aurum_allpatid_set1_extract_observation_002.txt 2024-11-15 11:32:14.429381
#>
|
|=============================================== | 67%
#> Adding /home/runner/work/_temp/Library/rcprd/aurum_data/aurum_allpatid_set1_extract_observation_003.txt 2024-11-15 11:32:14.444564
#>
|
|======================================================================| 100%
## Query database for a specific medcode
db_query <- db_query(db_open = aurum_extract,
tab ="observation",
codelist_vector = "187341000000114")
## Define cohort
pat<-extract_cohort(filepath = system.file("aurum_data", package = "rcprd"))
### Add an index date to pat
pat$indexdt <- as.Date("01/01/2020", format = "%d/%m/%Y")
## Combine query with cohort retaining most recent three records
combine_query(cohort = pat,
db_query = db_query,
query_type = "med",
numobs = 3)
#> patid medcodeid obsdate
#> <char> <char> <num>
#> 1: 1 187341000000114 -5373
#> 2: 2 187341000000114 -5769
#> 3: 6 187341000000114 -14727
## clean up
RSQLite::dbDisconnect(aurum_extract)
unlink(file.path(tempdir(), "temp.sqlite"))