SensorKit Ingestion & Processing
How have you been ingesting and processing SensorKit data? Do you have any advice for people just approaching SensorKit data for the first time?
Please remember our community guidelines:
- All posts and comments must be constructive. We ask that you respect and assist one another and always add your thoughts with the goal of improving the product and/or community.
- All posts and comments must be relevant. Posts that are not related to MyDataHelps or healthcare may be removed.
Comments
4 comments
Hi,
Here's some feedback (regarding the data formatting) from the HITS team, who are taking care of the data intake pipeline:
To ease our implementation and integrate the SensorKit data ingestion with our established data pipeline, we would like to ask for your help with the SensorKit data format:
Thanks,
Intern Health Study team
Hi Yu, thanks for sharing. How has your team been ingesting the SensorKit data to date? Is that still being explored?
Hi - We have been able to explore the available metrics by reading individual JSON files and the help documents from Apple (thanks for the link!).
However, resolving the formatting issue above is essential for batch processing the data in the future. Thanks!
The SensorKit data is packaged in a fairly raw format, and for some of the more "data verbose" types the data is chunked into a series of files. Working with this data manually can be daunting simply because of its volume and arrangement. Here's an example of a Python script that, given the path to an unzipped MyDataHelps export directory (set the data_path variable inside the script), loops through the files, un-gzips them, and combines them by participant identifier into one file per participant. Each output file is a large JSON object with the data nested by sensor type, device type, device identifier, and query interval.
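A minimal sketch of the kind of script described above might look like the following. It is only an illustration under stated assumptions, not the exact script: it assumes every chunk in the export is a gzipped JSON file ending in .gz, and that each file holds a single JSON object. The field names (participantIdentifier, sensor, deviceType, deviceIdentifier, queryInterval, samples) and the "combined" output folder are hypothetical placeholders, so check them against the keys in your own export before running it.

```python
import gzip
import json
from pathlib import Path

# Set this to the unzipped MyDataHelps export directory.
data_path = Path("path/to/unzipped/export")

# Hypothetical field names: adjust these to the keys found in your export files.
PARTICIPANT_KEY = "participantIdentifier"
NESTING_KEYS = ["sensor", "deviceType", "deviceIdentifier", "queryInterval"]
SAMPLES_KEY = "samples"


def combine_by_participant(export_dir):
    """Merge all gzipped JSON chunks under export_dir into one nested dict.

    Result shape:
    {participant: {sensor: {device type: {device id: {query interval: [samples]}}}}}
    """
    combined = {}
    # sorted() so that chunked files are appended in file name order.
    for gz_file in sorted(Path(export_dir).rglob("*.gz")):
        # gzip.open in text mode decompresses on the fly; nothing is written to disk.
        with gzip.open(gz_file, "rt", encoding="utf-8") as fh:
            record = json.load(fh)
        # Walk down (and create as needed) the nested levels for this chunk.
        node = combined.setdefault(str(record[PARTICIPANT_KEY]), {})
        for key in NESTING_KEYS[:-1]:
            node = node.setdefault(str(record[key]), {})
        interval = str(record[NESTING_KEYS[-1]])
        node.setdefault(interval, []).extend(record[SAMPLES_KEY])
    return combined


def write_participant_files(combined, out_dir):
    """Write one JSON file per participant into out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for participant, data in combined.items():
        with open(out_dir / f"{participant}.json", "w", encoding="utf-8") as fh:
            json.dump(data, fh)


# Only runs when executed as a script and data_path points at a real directory.
if __name__ == "__main__" and data_path.is_dir():
    write_participant_files(combine_by_participant(data_path), data_path / "combined")
```

Note that this sketch holds everything in memory before writing, which keeps the code short but may not suit very large exports. In that case, processing one participant's files at a time would be a reasonable variation.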