Once exports have been set up for your project and the first file is generated, you will have access to the Export Explorer. This allows you to query a consolidated database containing data from all of your standard project exports using SQL. Additionally, you have the option to connect an external data analysis application to your project's export database.
In addition to this article, we strongly recommend visiting the MyDataHelps Data Guide, a GitHub repository with example notebooks and queries for making the most of the MyDataHelps export database.
Table of Contents
To utilize the Export Explorer, navigate to your project's settings, then click on Export Explorer from the list on the left hand side.
The Export Explorer supports Amazon Athena SQL version 2, which has many functions available to help you query your data. To get started, write a SQL select statement and click “Run.” This will execute your query and show the results.
The export explorer also shows the database tables available in your project. Any of the tables in this schema can be used in a
from statement, and clicking on a table or its columns will insert it into your editor window. Expanding a table will reveal the columns in the table along with the columns’ data types.
The default example query in the editor is only an example and may not work for all projects. While many projects export the
allparticipantsmodel, the query will fail if the table is not present in your project’s database schema.
In addition to the columns documented on data exports, each table contains a
_provenance column, which contains metadata about every record. This metadata includes information about the origination of the record, including the source export identifier and date range.
In addition to the tables containing data from exports, the export database provides a history of all exports processed. This table is named
processing_history and can be found alongside other tables.
This table contains a row representing each time any export model is successfully updated in the database. Its natural key is the tuple of
modified_time. A meaningful surrogate key is
It is insufficient to use
processed_keyas the natural key of this table because the export database may process an export more than one time.
|Field Name||Data Type||Description|
||string||The unique code identifier for your project.|
||string||The name of the model.|
||timestamp (utc)||The time this record was generated. This is approximately the time that the database finished processing a particular export (
||timestamp (utc)||When a database processor works through an export, it obtains a globally-unique lock for each model it is working on. This field represents the time that lock was obtained. The maximum
||string||A unique identifier for an instance of a database task processor. This can be used to indicate how much work a task was able to accomplish.|
||string||The version of the export database code used by the processing task.|
||timestamp (utc)||The file modification time of the processed export (
||string||A unique identifier for the processed export.|
|The start date recorded in the manifest of the processed export.|
||timestamp (utc)||The end date recorded in the manifest of the processed export.|
||list of strings||The list of participant identifiers declared as excluded participants according to the manifest of the processed export.|
||string||The configuration of this export as indicated in the manifest of the processed export.|
||map of strings||Remaining export manifest entries not otherwise recorded in this table. The keys of this map are generally the same as values of the
||list of strings||A full list of files within this processed export. Entries are listed relative to the root of the export zip.|
It is possible to connect a data analysis application such as PowerBI, Tableau, RStudio, or Jupyter Notebooks directly to your project's export database. You can use ODBC, JDBC, or Amazon Athena libraries. For specific instructions on how to create an external connection, navigate to the External Connection tab at the top of the Export Explorer tab within your project.
If you need to repeatedly generate credentials for external connections, you can use a MyDataHelps Service Account to exchange an access token for export explorer credentials. Details for the exchange are described in the External Connection tab within the Export Explorer.
Because the export explorer provides access to all of your project’s export files, it can be a great way to explore the data in your project.
The export explorer does not provide a real-time view of data in your project. The available data is only as recent as your project’s most recent export file.
To get a view of enrollments over time, query your project's participant model.
date_trunc('day', enrollmentdate) enrollmentday,
group by 1
order by 1;
It’s also possible to query complex field types. If your project employs custom fields, you can query them with export explorer. For example, if you use a custom field named “SubscribedToEmail,” you can obtain a list of participants that have subscribed. Custom fields are represented as a map type column, which can be queried using the subscript operator.
customfields['SubscribedToEmail'] = 'True';
You can also combine multiple export files together to answer more complex questions. Expanding on the previous example, the following query uses Survey Question Results to display how many participants subscribed to email have also answered “en” to the survey question “Language.”
count(distinct participantidentifier) participants
join surveyresults using (participantidentifier)
join surveystepresults using (participantidentifier, surveyresultkey)
join surveyquestionresults using (participantidentifier, surveystepresultkey)
customfields['SubscribedToEmail'] = 'True'
and resultidentifier = 'Language'
and answers = 'en';
Note that the
answers column in the previous query is an array. Because this question only has one possible answer, the predicate checks
Export explorer does not appear on my project settings page
If you do not see the export explorer on your project settings page, you may not have access to view exports for your project. Check with your project coordinator to ensure you are part of a team with export permissions.
Export explorer tells me to use a different site
The export explorer uses the same authentication as MyDataHelps Designer. If you see a login page or an error telling you to use a different URL, you may still be using our older RKStudio site. You should change the links and bookmarks in your normal workflow to reference https://designer.mydatahelps.org instead of the RKStudio site.
Export explorer claims it is not available for my project
The export explorer is only available once a project has set up data exports and at least one export file is available. If your project has at least one export file older than a day and this error continues to appear, please contact us.
My query keeps failing
An error message indicating the incorrect parts of your query should be displayed in the query results area. In particular, double-check that your query is free of these common problems:
- Commas separate all columns indicated in the select statement.
- The table you are attempting to query is displayed under “Database Schema” and has been previously exported from your project.
- Column and table names, if wrapped in quotation marks, use double quotes (") and not single quotes (').
- Filters for values and literal strings use single quotes, not double quotes.
- Keys in map fields use single quotes, not double quotes.
My query is canceled
Some projects contain data too large to be analyzed by a single query from the export explorer. If your query status is “canceled” with no other explanation, this is likely the cause. You can work around this in a few ways:
- Ensure that queries contain a
limitclause when possible, allowing the export explorer to return results faster.
select *from big tables without using a
whereclause to filter data.
- Segment your queries into multiple executions of disjoint filters. For example, you might decide to filter by enrollment date for each month your project has been live.