Overview
Datavant’s DBX tool provides direct access to a robust Redshift database that stores standardized, aggregated healthcare data across a diverse set of customers. While this capability supports rich analytics and insights, it also requires mindful query construction. Queries that pull entire, large tables or omit critical filters (such as `client_id` joins) can drastically degrade performance, impact the experience for all of your users, and may be terminated by our system.
This article covers best practices to ensure that your queries are both efficient and reliable, preventing slowdowns or disruptions in the data environment.
Why Care About Query Efficiency?
DBX connects to approximately 20 key standard tables, including but not limited to:
- Allergy
- Appointment
- Attribution
- Charge
- Demographic
- Diagnosis
- Encounter
- Immunization
- Lab Order
- Lab Result
- Medical / Surgical History
- Medication
- Patient Tokens
- Payer
- Procedure
- Provider
- Quality Data
- Social History
- Transaction
- Vitals
These tables contain large, aggregated datasets that serve multiple customers. Inefficient queries, such as `SELECT * FROM transaction` without filters, can inadvertently return millions of rows. This not only bogs down performance, but may also cause downstream queries for other users to slow to a crawl, or force the system to kill the offending query to maintain platform stability.
Common Pitfalls to Avoid
- Unfiltered Table Scans:
  - Example of a Problematic Query: sketched below this item.
  - Why It’s Problematic: This query attempts to pull every column and row from a potentially massive table. Even if you plan to apply filters later in your analytics tool, this initial step places a huge load on the database.
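  A minimal sketch of the unfiltered scan described above, using the `transaction` table named earlier in this article (the exact schema qualification in your environment may differ):

  ```sql
  -- Problematic: returns every column and every row of a very large shared table
  SELECT *
  FROM transaction;
  ```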
- Missing `client_id` or Critical Joins:
  - Example of a Problematic Query: sketched below this item.
  - Why It’s Problematic: Without joining on `client_id` (or another appropriate partitioning key), you risk pulling in unrelated data for multiple customers. This ballooning of result sets can overwhelm the database and degrade performance across the board.
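  A sketch of the kind of join this item warns about; the `encounter` and `diagnosis` tables come from the standard table list above, while the join column names are illustrative assumptions:

  ```sql
  -- Problematic: no client_id anywhere, so rows for every customer are combined
  SELECT e.encounter_id,        -- column names are assumed for illustration
         d.diagnosis_code
  FROM encounter e
  JOIN diagnosis d
    ON d.encounter_id = e.encounter_id;
  ```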
- Wide Table Joins Without Proper Filters:
  - Example of a Problematic Query: sketched below this item.
  - Why It’s Problematic: Joining multiple large tables without selective filtering (like date ranges, patient subsets, or `client_id`) can create a massive combined result set that’s expensive to process.
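  A sketch of a wide, unfiltered multi-table join; the table names come from the list above, and the join keys are illustrative assumptions:

  ```sql
  -- Problematic: three large tables joined with SELECT * and no WHERE clause
  SELECT *
  FROM encounter e
  JOIN diagnosis d
    ON d.encounter_id = e.encounter_id   -- join columns are assumed for illustration
  JOIN charge c
    ON c.encounter_id = e.encounter_id;
  ```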
Recommended Best Practices
- Use Targeted Filters:
  - Always include `client_id` in your `WHERE` clause or join conditions if applicable. This ensures you’re only querying the data relevant to your organization.
  - Apply date filters, patient segments, or encounter IDs to limit the dataset.
  - Example: sketched below this item.
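  A sketch of a targeted query; `YOUR_CLIENT_ID` is a placeholder and `encounter_date` is an assumed column name, so substitute the identifiers used in your environment:

  ```sql
  -- Targeted: one client, a bounded date range, and only the needed columns
  SELECT e.encounter_id,
         e.encounter_date
  FROM encounter e
  WHERE e.client_id = 'YOUR_CLIENT_ID'                            -- placeholder value
    AND e.encounter_date BETWEEN '2024-01-01' AND '2024-03-31';   -- illustrative column and range
  ```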
- Limit Columns:
  - Select only the columns you need rather than using `SELECT *`. Reducing the number of columns retrieved can cut down on unnecessary data transfers and speed up query execution.
  - Example: sketched below this item.
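  A sketch of selecting only the needed columns; the `diagnosis` table is listed above, and the specific column names are assumptions:

  ```sql
  -- Only the columns the analysis needs, instead of SELECT *
  SELECT d.patient_id,          -- column names are assumed for illustration
         d.diagnosis_code,
         d.diagnosis_date
  FROM diagnosis d
  WHERE d.client_id = 'YOUR_CLIENT_ID';   -- placeholder value
  ```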
- Use Aggregations Judiciously:
  - If you need high-level metrics, aggregate them at the database level rather than exporting raw data and aggregating locally. This can reduce the result set size significantly.
  - Example: sketched below this item.
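  A sketch of pushing the aggregation into Redshift; the date column is an assumption, and `DATE_TRUNC` rolls encounters up by month so only the monthly counts are returned:

  ```sql
  -- Aggregate in the database instead of exporting raw rows
  SELECT DATE_TRUNC('month', e.encounter_date) AS encounter_month,   -- assumed column
         COUNT(*) AS encounter_count
  FROM encounter e
  WHERE e.client_id = 'YOUR_CLIENT_ID'                               -- placeholder value
  GROUP BY 1
  ORDER BY 1;
  ```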
- Test and Refine Queries Before Scaling Up:
  - Start by testing your queries with a known smaller dataset or narrower filters (see the sketch below this item).
  - Gradually broaden your filters, monitoring execution time and performance.
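  One way to start small, assuming the same illustrative columns as above; the narrow date filter does most of the work, and `LIMIT` simply caps the rows returned while you validate the query shape:

  ```sql
  -- Narrow test run: one client, one week of data, capped output
  SELECT e.encounter_id,
         e.encounter_date
  FROM encounter e
  WHERE e.client_id = 'YOUR_CLIENT_ID'                               -- placeholder value
    AND e.encounter_date BETWEEN '2024-01-01' AND '2024-01-07'       -- illustrative week
  LIMIT 100;
  ```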
Impact by Customer Size
Customers with a handful of connections may not immediately feel the full impact of an unoptimized query. Smaller datasets can tolerate less efficient queries without major slowdowns. However, as your data volume grows, so does the need for careful query design. Larger customers with more extensive data connections stand to face severe performance issues if queries are not properly filtered and structured.
Remember: Even if you’re a smaller customer today, following these best practices ensures you’re prepared as your data grows and helps maintain a stable environment for all users.
Further Reading
For detailed information on each table and its fields, please refer to our Standard Data Model documentation:
Datavant Site Connect’s Standard Data Model
By clicking into each table link within the documentation, you’ll find detailed field definitions and data types. Understanding the structure of each table can guide you in writing more efficient queries that return only the data you need.
Need Help?
If you have questions about optimizing your queries or need guidance on data model best practices, please reach out to our support team at support@healthjump.com. We’re here to help ensure your queries run smoothly and efficiently.