top of page
Writer's picturerajendra gupta

Approx_Count_Distinct in SQL Server 2019 and Azure SQL Database

Updated: Dec 31, 2021

The APPROX_COUNT_DISTINCT function gives the approximate number of unique non-null values in a group.


It is available for the following environments:

  • SQL Server 2019

  • Azure SQL Database

  • Azure SQL Managed Instance

  • Azure Synapse Analytics


You can use it to return approximate the number of rows that are not duplicated in a large table or result set. It works similar to the COUNT() functions (when using the DISTINCT clause), but rather than showing a precise number, it returns an approximate number for faster calculations.


APPROX_COUNT_DISTINCT( ) is a function that's incredibly useful for large sets of data. It provides approximate figures returns by calculating the approximate number of unique non-null values in a group.


This function is suitable for the following conditions


  • Data set with millions of rows or higher

  • Aggregation of columns having many distinct values


Note: According to Microsoft documentation, this function returns approximately 97% probability and up to 2% error rate

Let’s calculate the Actual distinct record counts (using COUNT(DISTINCT)) and approximate count (using APPROX_COUNT_DISTINCT) function.


SELECT 
 COUNT(SalesOrderID) 'Actual Row Count',
 COUNT(DISTINCT SalesOrderID) 'Actual Distinct Count',
 APPROX_COUNT_DISTINCT(SalesOrderID) 'Approx Distinct Count'
FROM SalesLT.SalesOrderDetail

As shown below, we got the similar value using the COUNT(DISTINCT) and APPROX_COUNT_DISTINCT functions. In this case, we used SalesOrderID that contains the unique value (due to primary key column) and does not have any duplicates.

Let's run the above query for the column - UnitPrice.

SELECT 
 COUNT(UnitPrice) 'Actual Row Count',
 COUNT(DISTINCT UnitPrice) 'Actual Distinct Count',
 APPROX_COUNT_DISTINCT(UnitPrice) 'Approx Distinct Count'
FROM SalesLT.SalesOrderDetail

Here, we see that there is a slight difference in the value returned by the COUNT(DISTINCT) and APPROX_COUNT_DISTINCT functions.

Approx_Count_Distinct in SQL Server 2019 and Azure SQL Database

You can refer to the article https://www.sqlshack.com/the-new-sql-server-2019-function-approx_count_distinct/ for more details on the Approx_Count_Distinct in SQL Server 2019.




Comments


bottom of page