Export Explorer
- Updated
Once exports have been set up for your project and the first file is generated, you will have access to the Export Explorer. This allows you to query a consolidated database containing data from all of your standard project exports using SQL. Additionally, you have the option to connect an external data analysis application to your project's export database.
In addition to this article, we strongly recommend visiting the MyDataHelps Data Guide, a GitHub repository with example notebooks and queries for making the most of the MyDataHelps export database.
Usage
To utilize the Export Explorer, navigate to your project's settings, then click on Export Explorer from the list on the left hand side.
The Export Explorer supports Amazon Athena SQL version 2, which has many functions available to help you query your data. To get started, write a SQL select statement and click “Run.” This will execute your query and show the results.
There are a few key components to the Export Explorer:
Tabbed Queries
You can create different tabs with different queries and results, quickly toggling between them. Select "+" at the top of query window to create new query tabs. Click "x" to close them.
Format SQL
Click the Format SQL button to quickly auto-format your SQL query.
Save
Click the Save button to save your query to run again. You can provide a Name, Category, and Description.
After saving, the query will be available under the Queries tab in the right sidebar.
If you want to make any edits to your query, you can click Save again to overwrite it or Edit Settings to adjust the Name, Category, and Description.
Query Copilot
Query Copilot uses AI to help you use the Export Explorer—no SQL experience necessary. Ask your project data questions in plain English and it will return SQL queries for you to run.
- Copilot does not have access to participant data or query results.
- Be as specific as possible in your prompts. Specify the exact name of a survey and for wearable data, specify the data source (e.g., Fitbit).
- After copilot generates a query, you need to click “Run”.
- Export Explorer results are only up to date as of your last export.
You can try some of the following example Query Copilot prompts:
- How many participants completed “Baseline_Survey”?
- Give me a list of all participant survey tasks, their delivery dates, and completed dates.
- How many participants have connected a Fitbit, and of those participants, how many have sleep data?
- Show me the number of participants who have any mention of ‘diabetes’ in their EMR data, grouped and sorted by age decade.
Database Structure
This tab shows the database tables available in your project. Any of the tables in this schema can be used in a from
statement, and clicking on a table or its columns will insert it into your editor window. Expanding a table will reveal the columns in the table along with the columns’ data types.
The default example query in the editor is only an example and may not work for all projects. While many projects export the allparticipants
model, the query will fail if the table is not present in your project’s database schema.
In addition to the columns documented on data exports, each table contains a _provenance
column, which contains metadata about every record. This metadata includes information about the origination of the record, including the source export identifier and date range.
Additional query features of Export Explorer include:
Processing History
In addition to the tables containing data from exports, the export database provides a history of all exports processed. This table is named processing_history
and can be found alongside other tables.
This table contains a row representing each time any export model is successfully updated in the database. Its natural key is the tuple of model_key
and modified_time
. A meaningful surrogate key is model_key
, processed_key
, and lock_time
.
It is insufficient to usemodel_key
andprocessed_key
as the natural key of this table because the export database may process an export more than one time.
Field Name | Data Type | Description |
---|---|---|
project |
string | The unique code identifier for your project. |
model_key |
string | The name of the model. |
modified_time |
timestamp (utc) | The time this record was generated. This is approximately the time that the database finished processing a particular export (processed_key ). |
lock_time |
timestamp (utc) | When a database processor works through an export, it obtains a globally-unique lock for each model it is working on. This field represents the time that lock was obtained. The maximum modified_time for a lock_time is approximately the time the lock was released. |
application |
string | A unique identifier for an instance of a database task processor. This can be used to indicate how much work a task was able to accomplish. |
version |
string | The version of the export database code used by the processing task. |
processed_key_time |
timestamp (utc) | The file modification time of the processed export (processed_key ). This is approximately the time that the MyDataHelps export service compiled the export and made it available for download. |
processed_key |
string | A unique identifier for the processed export. |
export_start_date |
timestamp (utc) |
The start date recorded in the manifest of the processed export. |
export_end_date |
timestamp (utc) | The end date recorded in the manifest of the processed export. |
excluded_participants |
list of strings | The list of participant identifiers declared as excluded participants according to the manifest of the processed export. |
configuration |
string | The configuration of this export as indicated in the manifest of the processed export. |
manifest |
map of strings | Remaining export manifest entries not otherwise recorded in this table. The keys of this map are generally the same as values of the model_key column and represent models as declared by the MDH export service. |
export_files |
list of strings | A full list of files within this processed export. Entries are listed relative to the root of the export zip. |
External Connection
It is possible to connect a data analysis application such as PowerBI, Tableau, RStudio, or Jupyter Notebooks directly to your project's export database. You can use ODBC, JDBC, or Amazon Athena libraries. For specific instructions on how to create an external connection, navigate to the External Connection tab at the top of the Export Explorer tab within your project.
If you need to repeatedly generate credentials for external connections, you can use a MyDataHelps Service Account to exchange an access token for export explorer credentials. Details for the exchange are described in the External Connection tab within the Export Explorer.
Examples
Because the export explorer provides access to all of your project’s export files, it can be a great way to explore the data in your project.
The export explorer does not provide a real-time view of data in your project. The available data is only as recent as your project’s most recent export file.
To get a view of enrollments over time, query your project's participant model.
select
date_trunc('day', enrollmentdate) enrollmentday,
count(participantidentifier) participants
from
allparticipants
group by 1
order by 1;
It’s also possible to query complex field types. If your project employs custom fields, you can query them with export explorer. For example, if you use a custom field named “SubscribedToEmail,” you can obtain a list of participants that have subscribed. Custom fields are represented as a map type column, which can be queried using the subscript operator.
select
participantidentifier,
enrollmentdate
from
allparticipants
where
customfields['SubscribedToEmail'] = 'True';
You can also combine multiple export files together to answer more complex questions. Expanding on the previous example, the following query uses Survey Question Results to display how many participants subscribed to email have also answered “en” to the survey question “Language.”
select
count(distinct participantidentifier) participants
from
allparticipants
join surveyresults using (participantidentifier)
join surveystepresults using (participantidentifier, surveyresultkey)
join surveyquestionresults using (participantidentifier, surveystepresultkey)
where
customfields['SubscribedToEmail'] = 'True'
and resultidentifier = 'Language'
and answers[1] = 'en';
Note that the answers
column in the previous query is an array. Because this question only has one possible answer, the predicate checks answers[1]
.
Troubleshooting
Export explorer does not appear on my project settings page
If you do not see the export explorer on your project settings page, you may not have access to view exports for your project. Check with your project coordinator to ensure you are part of a team with export permissions.
Export explorer tells me to use a different site
The export explorer uses the same authentication as MyDataHelps Designer. If you see a login page or an error telling you to use a different URL, you may still be using our older RKStudio site. You should change the links and bookmarks in your normal workflow to reference https://designer.mydatahelps.org instead of the RKStudio site.
Export explorer claims it is not available for my project
The export explorer is only available once a project has set up data exports and at least one export file is available. If your project has at least one export file older than a day and this error continues to appear, please contact us.
My query keeps failing
An error message indicating the incorrect parts of your query should be displayed in the query results area. In particular, double-check that your query follows these requirements:
- Commas separate all columns indicated in the select statement.
- The table you are attempting to query is displayed under “Database Schema” and has been previously exported from your project.
- Column and table names, if wrapped in quotation marks, use double quotes (") and not single quotes (').
- Filters for values and literal strings use single quotes, not double quotes.
- Keys in map fields use single quotes, not double quotes.
My query is canceled
Some projects contain data too large to be analyzed by a single query from the export explorer. If your query status is “canceled” with no other explanation, this is likely the cause. You can work around this in a few ways:
- Ensure that queries contain a
limit
clause when possible, allowing the export explorer to return results faster. - Avoid
select *
from big tables without using awhere
clause to filter data. - Segment your queries into multiple executions of disjoint filters. For example, you might decide to filter by enrollment date for each month your project has been live.
Was this article helpful?