SQL NTILE

SQL Ntile Function

One of the SQL ranking functions is the NTILE Function. The SQL Server NTILE() function is a window function that divides rows of an ordered partition into a set of nearly equal groups. It assigns a number expression to each group that ranges from 1 to 100. For each row in a group to which the row belongs, the NTILE() method assigns a number expression.

To put it another way, the NTILE() feature can be compared to stuffing 10 letters into three envelopes. It's a ranking function that divides a set of m rows or records into n groups or buckets, each containing m/n records. It assigns more records to the initial buckets when the result set is not exactly divided into an equal number of rows.

  • buckets: This is where we specify the number of groups we desire. It's a positive integer or a positive integer-returning expression. A NULL value for the number of buckets is not allowed.
  • partition_expression: The column that will be used to divide the entire dataset. The entire result set is treated as a single partition by default.
  • order_expression: The column that determines whether the partition set's rows are ordered or sorted in ascending or descending order.

Syntax :

NTILE(number_expression) OVER (
   [PARTITION BY partition_expression ]
   ORDER BY sort_expression [ASC | DESC]
)

Parameters of syntax in detail :

  • number_expression: The number_expression is the integer into which the rows are divided.
  • PARTITION BY clause: The PARTITION BY is optional, it differs the rows of a result set into partitions where the NTILE() function is used.
  • ORDER BY clause: The ORDER BY clause specifies the row order in each partition in which the NTILE() function is utilised.

When the number of rows is not divisible by the number_expression, the NTILE() method returns two groups of one size each. Inside the ORDER BY clause of the OVER() clause, the larger groups always come before the lesser groupings. When all of the rows are divisible by number_expression, the function divides the rows evenly between number_expression.

Example 1: Use NTILE() function to divide above rows into 3 groups:

SELECT ID,
NTILE (3) OVER (
ORDER BY ID
) Group_number
FROM geeks_demo; 

Output:

ID Group_Number
1 1
2 1
3 1
4 1
5 2
6 2
7 3
8 3
9 3
10 3

Example 2: In this instance, we'll show you how to rank the partitioned records in a SQL Server table using ranking functions. The SQL Select Statement Query below will divide the data by occupation and assign a rank number based on annual salary.

SELECT [FirstName]
,[LastName]
,[Education]
,[Occupation]
,[YearlyIncome]
,NTILE(2) OVER (
    PARTITION BY [Occupation] 
    ORDER BY [YearlyIncome] DESC
    ) AS [NTILE NUMBER]
FROM [Customers]

Example 3: SQL query to illustrate use of NTILE() function to divide records in the yearly_sales table into 3 buckets.

SELECT year,
salesperson,sale_amount,
NTILE(3) OVER(
ORDER BY year
) as year_buckets
FROM
yearly_sales
ORDER BY
year,salesperson DESC;

The year-by-year data or rows in the yearly_sales table have been separated into three buckets in the example above: 1, 2, and 3. The first bucket contains (2018,2018,2019), the second bucket has (2019,2019,2019), and the third bucket contains (2019,2019,2019). (2020,2020,2020). Because there were nine records, they were sorted into three buckets.

Example 4: Each row is assigned a number from 1 to the value of the expression using the NTILE() function. The number of rows in each bucket can only differ by one. Each bucket receives the remainder of the number of rows divided by buckets, starting with bucket 1.

For example, if you have 10 rows and 4 buckets. Each bucket will have 2 rows. The remainder of 10/4 is 2. So the first bucket will have 2 + 1 = 3 rows, the second bucket will also have 3 rows.

SELECT Name,
Milliseconds,
NTILE ( 4 ) OVER ( 
ORDER BY Milliseconds ) LengthBucket
FROM
	tracks 
WHERE
	AlbumId = 1;

SQL Ntile Aggregate Function

NTILE(N) is a unique function with no aggregate equivalent. Based on the ORDER BY clause, it separates each grouping set of rows into N subranges and returns the subrange number for each row. It enables you to determine which percentile (or quartile, or other subdivision) a particular row belongs to.

Example 1: If we run a query for NTILE(5) OVER (PARTITION BY grouper ORDER BY id), we'll get 1 for the 20% of rows with the lowest id, 2 for the next 20%, and so on.

It's clear that each grouper requires a COUNT to be used in our computations. Unlike SUM, AVG, and ROW_NUMBER(), this function's value is not solely determined by the results of prior queries.

That's why we'll need an extra join:

SELECT  ao.*, 
FLOOR((@r * @_n) / cnt) + 1 AS `NTILE(5) OVER (PARTITION BY grouper ORDER BY id)`
FROM    (
SELECT  @_n := 5,
@_grouper := 'N'
) vars,
(
SELECT  a.*, cnt
FROM    (
SELECT  grouper, COUNT(*) AS cnt
FROM    t_aggregator
GROUP BY
grouper
) cnt,  t_aggregator a
WHERE   a.grouper = cnt.grouper
ORDER BY
grouper, id
) ao
WHERE   CASE WHEN @_grouper <> grouper THEN @r := -1 ELSE 0 END IS NOT NULL
AND (@r := @r + 1) IS NOT NULL
AND (@_grouper := grouper) IS NOT NULL;

As you can see, we select the COUNT(*) of rows for each grouper and join the results of the query with t_aggregator, so along with each row of t_aggregator we have the total count of rows for the given grouper.

In the SELECT clause, we just count the percentile our current row belongs to, multiply it by N and add a 1 (as NTILE results are 1-based).

Example 2: The sum function to find out the region using ntile.

select v.*,
  sum(CAMENSUEL) ,
  NTILE(2) over (partition by Region order by sum(Region) ) as [ntil]
from ventes v;

Example 3: This implies that if you have 100 rows and wish to divide them into four quartiles depending on a specific value field, you can simply do so and see how many rows are in each quartile.

Let's have a look at an example. We've mentioned that we want to produce four quartiles based on order_amount in the query below. Then we'll look at how many orders fall into each of the four quartiles.

SELECT order_id,order_date, 
customer_name,city, order_amount,
NTILE(4) OVER(ORDER BY order_amount) [row_number]
FROM [dbo].[Orders]

SQL Ntile Group By

The NTILE function divides rows into a number of equal-sized sections.

Starting with 1, the components (or tiles) are numbered.

Within the rows, additional grouping can be given to groups (partitions).

Rows will be distributed unevenly if the row count is not divisible by the number of components.

Note: NTILE is a generic term for tertile, quartile, quintile, and so on. A population divided into three equal portions, for illustration, is separated into tertiles.

Example 1: List the monthly sales for the year 2013.

Partition the list into three equal parts (i.e. tertiles).

SELECT MONTH(OrderDate) AS [Month], 
       SUM(TotalAmount) AS 'Total Sales', 
       NTILE(3) OVER(ORDER BY MONTH(OrderDate)) AS Tertile
FROM [Order]
WHERE YEAR(OrderDate) = 2013
GROUP BY MONTH(OrderDate)

Example 2: You can use NTILE() window function to create the groups and aggregate:

SELECT 
  DATETIME(MIN(DATE), ((STRFTIME('%s', MAX(DATE)) - STRFTIME('%s', MIN(DATE))) / 2) || ' second') date, 
  ROUND(AVG(value), 2) avg_value
FROM (
  SELECT *, NTILE(3) OVER (ORDER BY date) grp
  FROM test
) 
GROUP BY grp;

To change the number of rows in each bucket, you must change the number 3 inside the parentheses of NTILE().

Results:

date avg_value
2020-01-01 11:00:00 2.33
2020-01-01 14:00:00 5
2020-01-01 17:00:00 5.33

Example 3: For histograms with bucket widths of identical height, SQL ntile You can use the ntile window function to discover the bucket widths if you wish to optimise for bucket widths so that each bucket has the same number of salary counts. This is comparable to histogram equalisation in the realm of picture processing.

select ntile, min(salary), max(salary)
from (
    select salary, 
    ntile(4) over (order by salary)
) x
group by ntile 
order by ntile 

Output:

ntile min max
1 75007.8 84379.9
2 84384.7 101228
3 101250 112952
4 112952 139999

Using the min and max columns as bucket widths will give you an equiheight histogram.


SQL Ntile Join

Example 1:

WITH productline_sales AS (
SELECT productline,
  year(orderDate) order_year,
  ROUND(SUM(quantityOrdered * priceEach),0) order_value
FROM orders
  INNER JOIN orderdetails USING (orderNumber)
  INNER JOIN products USING (productCode)
  GROUP BY productline, order_year
)

SELECT productline, 
 order_year, 
 order_value,
NTILE(3) OVER (
 PARTITION BY order_year
 ORDER BY order_value DESC
    ) product_line_group
FROM 
 productline_sales;

First, we obtain the total order value of each product line by year using the productline sales popular table expression.

Then, in each year, we partition the sales by product line into three groups using the NTILE() method.


SQL Ntile Median

Now the Sales Manager, who is never satisfied, comes down and says that the average isn't good enough, and that she needs the maximum and median sale value for each quartile. She doesn't need this broken out by customer type to make things simple; it may be applied to the full collection.

This is an example of when NTILE() can be used twice. To obtain the quartiles, we will first divide all sales into four groups, and then divide each quartile into two groups to obtain the median. This is how the code looks:

Select quartile
  , max(case when bitile=1 then amount else 0 end) as medAmount
  , max(amount) as maxAmount
FROM (
  -- The second pass adds the
  -- 2-tile value we will use to find medians

SELECT quartile
   , amount
   , ntile(2) over (partition by quartile
   order by amount) as bitile
FROM (
   -- The subquery is necessary
   -- to process all rows and add the quartile column
   
SELECT amount
   , ntile(4) over (order by amount) as quartile
FROM ORDERS
    ) x1
    ) x2
GROUP BY quartile
ORDER BY quartile

Because we know we've divided the data evenly into four parts, the median will be the highest value halfway through each group. In other words, when the value of bitile=1 for each quartile, it will be the greatest value.

Example 2: Because the two middle values are not averaged when the row count is even, this approach does not yield the correct median.

SELECT
  MAX(column) as "Median of Column"
FROM
  ( SELECT
  column,
  ntile(2) OVER (ORDER BY column) AS bucket
FROM
  table
  ) as t
WHERE bucket = 1
GROUP BY bucket;

SQL Ntile Partition By

The PARTITION BY clause divides a result set's rows into partitions for the NTILE() function to process.

Syntax:

PARTITION BY expression1 [,expression2,..]

If partition_by_clause is not specified, then all rows shall be treated as a single group.

NTILE(buckets) OVER (
    [PARTITION BY partition_expression, ... ]
    ORDER BY sort_expression [ASC | DESC], ...
)

Example 1: SQL query demonstrating the usage of the NTILE() function to partition records in the yearly_sales table by year and then divide into three buckets.

SELECTyear, 
salesperson,sale_amount,
NTILE(3) OVER(
PARTITION BY year
ORDER BY sale_amount
) as amount_buckets_year_partition
FROM
yearly_sales
ORDER BY year DESC;

Example 2: We can separate the contents of each partition into buckets by including the partitioning clause. We divide the employees in each department into two buckets in the given description.

SELECT empno,
   ename,
   deptno,
   sal,
   NTILE(2) OVER (PARTITION BY deptno ORDER BY sal) AS bucket_no
FROM   emp;

Output:

EMPNO ENAME DEPTNO SAL BUCKET_NO
7934 MILLER 10 1300 1
7782 CLARK 10 2450 1
7839 KING 10 5000 2
7369 SMITH 20 800 1
7876 ADAMS 20 1100 1
7566 JONES 20 2975 1
7788 SCOTT 20 3000 2
7902 FORD 20 3000 2
7900 JAMES 30 950 1
7654 MARTIN 30 1250 1
7521 WARD 30 1250 1
7844 TURNER 30 1500 2
7499 ALLEN 30 1600 2
7698 BLAKE 30 2850 2

Example 3: You can allocate a rank number to each record in a partition using the SQL NTILE Function. In this instance, we'll show you how to rank the partitioned records in a SQL Server table using ranking functions. The SQL Select Statement Query below will divide the data by occupation and assign a rank number based on annual salary.

SELECT [FirstName]
      ,[LastName]
      ,[Education]
      ,[Occupation]
      ,[YearlyIncome]
      ,NTILE(2) OVER (
                     PARTITION BY [Occupation] 
                     ORDER BY [YearlyIncome] DESC
         	   ) AS [NTILE NUMBER]
  FROM [Customers]

The Occupation column is used in the PARTITION BY [Occupation] statement below to partition the selected data. As you can see in the image above, we have four partitions.

This OrdeByr Sort the data using their [annual income] in descending order.

We utilised the NTILE(2) Function with the Partition by clause in the following sentence. The integer value in this case is 2. This means that for each partition, the SQL NTILE Function will allocate two rank numbers (4 portions).


SQL Ntile Percentile

PARTITION BY state_id in the OVER clause will return percentiles per state.

A given row's percentile (or quartile, or any other subdivision)

Because it conducts a percentile lookup over the full dataset for each of the percentiles it produces, this method can be sluggish. If you're calculating 100 percentiles, for example, it'll scan your data 100 times.

In an ideal world, we'd scan the data only once while keeping note of which percentile each number belongs to. ntile is a window function in PostgreSQL that can be used to accomplish this.

Each value in your dataset will be assigned to a bucket, and you can select how many buckets there are:

  • pick things.value, ntile(100) from things.value (sort by things.value)
  • This query returns all of our values, as well as the buckets to which they belong. The next step is to convert percentile values from these buckets. We may look at the maximum value of the 50th bucket to see where the 50th percentile finishes because each value has been evenly distributed among 50 buckets. To find all the percentiles, we can use an aggregate query to calculate the max(value) for each individual bucket number!

Syntax:

NTILE(*# of buckets*)

In this case, ORDER BY determines which column to use to determine the quartiles (or whatever number of 'tiles you specify).

Example 1: GROUP BY looks spurious at least. I think it needs to be removed if you want percentiles. And group by the PK is a no-operation anyway.

SELECT id, population, state_id, 
 ntile(100) OVER (PARTITION BY state_id ORDER BY car20) AS percentile
FROM 
 avi_threshold01;

NTILE cannot accurately calculate a percentile.

Example 2:

SELECT start_terminal,
       duration_seconds,
       NTILE(4) OVER
         (PARTITION BY start_terminal ORDER BY duration_seconds)
          AS quartile,
       NTILE(5) OVER
         (PARTITION BY start_terminal ORDER BY duration_seconds)
         AS quintile,
       NTILE(100) OVER
         (PARTITION BY start_terminal ORDER BY duration_seconds)
         AS percentile
FROM tutorial.dc_bikeshare_q1_2012
WHERE start_time < '2012-01-08'
ORDER BY start_terminal, duration_seconds

When you look at the results of the query above, you'll notice that the percentile column isn't calculated exactly as you'd expect. You'd expect one record to determine the first percentile and the second record to establish the 100th percentile if you just had two records and were calculating percentiles. When you use the NTILE function, you'll get one record in the first percentile and one in the second percentile. The percentile column in the data for start_terminal 31000 only looks like a number ranking. You can see that it compute percentiles correctly for start terminal 31007 if you scroll down because there are more than 100 records for that start terminal. If you're working with very small windows, keep this in mind and consider using quartiles or similarly small bands.

Example 3: Here’s the final query calculating all the percentiles from 1 to 100.

select max(buckets.value), ntile as percentile
from
  (select things.value, ntile(100) over (order by things.value) from things) as buckets
group by 2 order by 2

Because it will only scan the data once instead of 100 times, this should execute around 100 times quicker than the instance using percentile disc with 100 percentiles.

Instead of a fraction, the percentile is returned as an integer. Before consuming the results elsewhere, you may need to alter your query.

Keep in mind that converting the ntile bucket to the percentile will require some math if you want to calculate a number of percentiles other than 100. When computing quartiles, for example, you'd use ntile (4). Buckets 1, 2, 3, and 4 have been assigned. To convert a bucket number to a percentile, use the following formula: ntile / 4.0.

Example 4: SELECT where one of the lines I am using for percentile is:

NTILE(100) OVER(PARTITION BY ItemID ORDER BY SUM(quantity)) as Percentile
ItemID Quantity Price TotalQ
ABC 10 14.50 38
ABC 4 14.25 38
DEF 32 22.41 34
ABC 24 14.10 38
GHI 8 8.50 20
GHI 12 8.60 20
DEF 2 22.30 34

The Percentile field, on the other hand, ends up being nothing more than a ranking of each quantity for each ItemID. Starting at the top and working down, I'd need each row to measure the cumulative % of volume for each transaction price (each row is a distinct transaction).


SQL Ntile Quartile

Quartiles are the three data values (Q1, Q2, Q3) that divide an ordered set of data values into four equal groups (similar to how the Median divides it into two). Q1 is the midway value between Min and Median, Q2 is the Median, and Q3 is the middle value between Median and Max.

Interquartile Range (IQR) is simply Q3-Q1 and is a more robust statistic for measuring spread than range and standard deviation.

Finding the upper and lower limit based on Quartiles and IQR as an example is a straightforward approach of identifying outliers:

Lower Limit = Q1 – 1.5 * IQR

Upper Limit = Q3 + 1.5 * IQR

NOTE: NTILE is a type of window function that can be used to calculate different quantiles directly in a SQL query.

Example 1: The difficulty is that we don't always have useful columns like QUARTILE, but we can construct the QUARTILE column using NTILE throughout the query.

Select quartile
     , avg(amount) as avgAmount
     , max(amount) as maxAmount
  FROM (
        -- The subquery is necessary
        -- to process all rows and add the quartile column
        SELECT amount
             , ntile(4) over (order by amount) as quartile
          FROM ORDERS
       ) x
 GROUP BY quartile
 ORDER BY quartile

This query will give us what the Sales Manager wants.

Example 2: Quartile Coefficient of dispersion is defined as (Q3-Q1)/(Q3+Q1), this is analogous to Coefficient Of Variation.

;WITH Vals AS--Say this is a list of student scores
(
SELECT X FROM (Values (50),(49),(70),(85),(90),(55),(60),(65),(66),(65),(67),(63),(75),(45),(59),(72),(66),(68),(40),(57) ) Val(X)
),
Quartiles AS
(
SELECT TOP 1
Percentile_Cont(0.25) WITHIN GROUP(Order By X) OVER() As Q1,
Percentile_Cont(0.50) WITHIN GROUP(Order By X) OVER() As Median,
Percentile_Cont(0.75) WITHIN GROUP(Order By X) OVER() As Q3
FROM Vals
),
IQR AS
(
SELECT *,(Q3-Q1) as IQR
FROM Quartiles
),
Qstat AS
(
SELECT *,IQR/(0.5*Median) as QCoeffDisp
FROM IQR
)
SELECT 'Summary' as Stat, *,null as OutlierValue from Qstat
UNION ALL
SELECT 'Outlier', null, null, null, null, null, X
FROM Vals V JOIN Qstat IQR ON V.X < Q1-IQR OR V.X > Q3+IQR

Example 3: This brings us nicely onto NTILE. Think of NTILE as 'N'-TILE where N refers to the 'segments' of the whole, in the same sense of the words QUARTile or PERCENTile. This is how to use it:

SELECT  CustomerID, TotalDeposited, 
 NTILE(4) OVER (ORDER BY TotalDeposited) AS [Quartile]
FROM dbo.rankingTable

These window functions divide a query's result set by dimension, then rank it using the NTILE argument, such as Quartile or Decile. The dimension list in the SQL expression might comprise one or more groupings, which greatly improves the performance of this analytic procedure.


SQL Ntile Rank

NTILE is a SQL Server ranking function that divides data into a specific number of groups and assigns a number to each group in increments of One. The NTILE function uses the ORDER BY clause to sort and arrange the rows in the requested order.

Imagine we have a table with 10 rows and are using the NTILE(5) function. Given that each group has an even number of rows (10 divided by NTILE(5)), each group has two rows. As a result, the NTILE function will divide each group into two rows and allocate each one a number ranging from 1 to 5.

When a table has an odd number of rows. Assume 11 rows and the NTILE(5) function is used. Before breaking a table into groups, the NTILE function determines the number of rows. When the number of rows in a table is split by the number of groups supplied with NTILE, an unequal number of rows is created. Groups having a larger number of rows will be listed first, followed by groups with fewer rows.

Example 1: The NTILE() ranking function differs from the other three functions we've studied so far. This function divides a partition's rows into the specified number of divisions. The rows are then assigned to one of these groups based on their rank, which begins with 1 and continues until the number of groups provided is reached. We can see that we gave the number 5 to the NTILE() function in the sample code and outputs below, indicating that we are breaking each division into 5 groups. We can see that that each row is allocated a rank between 1 and 5, with some groups consisting of more more than than one row, which is to be expected given that each partition contains more than 5 cities. In this case, ties can be within the same group or split up as is the case with Calexico and La Jolla.

SELECT *, 
NTILE(5) OVER(PARTITION BY c_state ORDER BY coffee_shop_num DESC) as rank
FROM city;

Example 2: The following example uses the NTILE window function to divide the Q1 sales into five groups and list the sales in ascending order.

select emp_mgr, sales, 
ntile(5) over(order by sales) as ntilerank from q1_sales;

Output:

emp_mgr sales ntilerank
Kari Phelps 8227 1
Rich Hernandez 9308 1
Kari Phelps 9710 2
Rich Hernandez 12369 2
Mike Palomino 13181 3
Rich Harnandez 15427 3