SQL DELETE Statement


The SQL DELETE statement is used to delete existing records from a table.

With a WHERE clause, DELETE removes only the rows that match the specified criteria; without a WHERE clause, it removes every record in the table. This is accomplished by carefully constructing the WHERE clause.




SQL DELETE Syntax

The basic syntax of SQL DELETE statement with SQL WHERE clause is as follows:


DELETE FROM table_name1
WHERE some_column_name = some_value_1;

You can combine any number of conditions using the AND or OR operators.

You do not need to list column names in the SQL DELETE statement since you are removing the entire record from the table.

Note: The SQL WHERE clause in the SQL DELETE statement!

The WHERE clause specifies which row or rows should be deleted. If you omit the WHERE clause, all rows will be removed!
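
For instance, a minimal sketch that combines two conditions with AND and OR, using the sample Books table shown below:

DELETE FROM Books
WHERE DomainName = 'Programming'
  AND (BookPrice > 150 OR RelYear < 2007);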


Sample Database Table - Books

BookId  BookName                          BookPrice  RelYear  DomainName   AuthorName
1       The Complete Guide to SQL Server  155        2009     Performance  Dharan
2       Making Sense Of SQL               199.97     2006     Programming  Padmavathi
3       Understanding the New SQL         110        2007     Programming  Varshini Kutty
4       Sql Server Concurrency            205        2007     Programming  Nirmala

Sample Database Table Structure - Books

Column Name   Data Type
BookID        Number
BookName      Text
BookPrice     Number
RelYear       Number
DomainName    Text
AuthorName    Text

Text Column Vs Numeric Column

SQL requires single quotes around text or string values (most databases also accept double quotes). Numeric values are written without quotes:
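
For example, using the sample Books table above, the text value takes quotes while the numeric value does not:

DELETE FROM Books WHERE AuthorName = 'Nirmala';   -- text value: quotes required

DELETE FROM Books WHERE BookPrice = 205;          -- numeric value: no quotes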


SQL DELETE Example

Example 1

The following SQL statement will delete the existing row with BookID = 3 from the "Books" table:


DELETE FROM Books
WHERE 
BookID = 3;

The result of the above query is:

BookId  BookName                          BookPrice  RelYear  DomainName   AuthorName
1       The Complete Guide to SQL Server  155        2009     Performance  Dharan
2       Making Sense Of SQL               199.97     2006     Programming  Padmavathi
4       Sql Server Concurrency            205        2007     Programming  Nirmala

Example 2

The following SQL statement will delete every row in the "Books" table where BookPrice is less than 200 and DomainName is 'Programming':


DELETE FROM Books
WHERE 
BookPrice < 200 AND DomainName = 'Programming';

The result of the above query is:

BookId  BookName                          BookPrice  RelYear  DomainName   AuthorName
1       The Complete Guide to SQL Server  155        2009     Performance  Dharan
4       Sql Server Concurrency            205        2007     Programming  Nirmala

Example 3

The following SQL statement will delete all existing rows from the "Books" table:


DELETE FROM Books;

The result of the above query is:

BookId  BookName  BookPrice  RelYear  DomainName  AuthorName



SQL Delete All Rows

To delete all rows from a table, use a DELETE statement without a WHERE clause.

In SQL, you can use the AND or OR operators to put several conditions in a DELETE statement. With AND, a record is deleted only when all of the conditions are fulfilled; with OR, a record is deleted when any one of the conditions is met.
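
As a sketch of the OR case, the following statement would delete any row that matches either condition in the products table used below:

DELETE FROM products
WHERE category_id = 75
   OR product_name = 'Kleenex';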

Example 1: If you want to delete all the rows from the STUDENT_NAME table, the query would be:

DELETE FROM STUDENT_NAME; 

Example 2: Let's have a look at an example of using the DELETE statement with two conditions and the AND condition.

In this case, we have a table called products that contains the following information:

product_id	product_name	category_id
1		Pear		50
2		Banana		50
3		Orange		50
4		Apple		50
5		Bread		75
6		Sliced Ham	25
7		Kleenex		NULL

Enter the following DELETE statement:

DELETE FROM products
WHERE category_id = 50
AND product_name <> 'Pear';

There will be 3 records deleted. Select the data from the products table again:

SELECT * FROM products;

Output:

product_id	product_name	category_id
1		Pear		50
5		Bread		75
6		Sliced Ham	25
7		Kleenex		NULL

This example deletes all records from the products table that have a category_id of 50 and a product_name that isn't Pear.

Before conducting the deletion, run the following SELECT statement to determine the number of rows that will be deleted.

SELECT COUNT(*)
FROM products
WHERE category_id = 50
AND product_name <> 'Pear';

This is the number of records that will be deleted when you run the DELETE command.

Output:

COUNT(*)
3

Example 3: Using the IN clause, we can specify multiple row ids to delete.

For example, the following query would delete rows with ids equal to 1, 5 and 7:

DELETE from `tablename` WHERE `id` IN (1, 5 , 7);

SQL Delete Json

The JSON_REMOVE() function in MySQL returns the result of removing data from a JSON document.

The JSON document is the first argument, followed by the path from which data should be removed. If necessary, you can provide additional paths.

Syntax:

JSON_REMOVE(json_doc, path[, path] ...)

Where json_doc is the JSON document and path is the path to remove data from.

The path arguments are evaluated from left to right. The document produced by evaluating one path becomes the new value against which the following path is evaluated.

If the first parameter is not a valid JSON document, an error will occur.

The path argument must also be a valid path expression; it cannot be $ or contain a * or ** wildcard, or an error will occur.
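
For example, passing $ as the path should raise an error, since removing the whole document is not allowed (the exact error message varies by MySQL version):

SELECT JSON_REMOVE('{"a": 1}', '$');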

Example 1 – Basic Usage

SELECT JSON_REMOVE('{"a": 1, "b": 2, "c": 3}', '$.b') AS 'Result';

Result:

+------------------+
| Result           |
+------------------+
| {"a": 1, "c": 3} |
+------------------+

In this example we removed the key/value pair with the key b, because we passed $.b as the second argument.

Here’s another example:

SELECT 
JSON_REMOVE('{"Name": "Homer", "Gender": "Male", "Age": 39}', '$.Age') AS 'Result';

Result:

+-------------------------------------+
| Result                              |
+-------------------------------------+
| {"Name": "Homer", "Gender": "Male"} |
+-------------------------------------+

Example 2 – Non-Existent Path:

If you specify a path that does not exist, nothing is removed; the original JSON document is returned unaltered.

SELECT 
JSON_REMOVE('{"Name": "Homer", "Age": 39}', '$.Gender') AS 'Result';

Result:

+------------------------------+
| Result                       |
+------------------------------+
| {"Age": 39, "Name": "Homer"} |
+------------------------------+

Example 3 – Arrays

SELECT JSON_REMOVE('[1, 2, 3]', '$[0]') AS 'Result';

Result:

+--------+
| Result |
+--------+
| [2, 3] |
+--------+

Because arrays use zero-based numbering, this removes the first element of the array.

Here's another example of an array. We're going to remove a value from a nested array this time.

SELECT JSON_REMOVE('[1, 2, [3, 4, 5]]', '$[2][1]') AS 'Result';

Result:

+----------------+
| Result         |
+----------------+
| [1, 2, [3, 5]] |
+----------------+

Example 4 – Multiple Paths

To remove data from several locations in the JSON document, you can provide multiple paths.

SELECT JSON_REMOVE('[1, 2, 3, 4, 5]', '$[1]', '$[3]') AS 'Result';

Result:

+-----------+
| Result    |
+-----------+
| [1, 3, 4] |
+-----------+

The path parameters are assessed left to right, as previously stated, and the document created by evaluating one path becomes the new value against which the next path is analyzed.

As a result, the 2nd path argument removes a different value than it would if it were the only path parameter in this case. It would have eliminated 4 if it had been the only path argument.

SELECT 
JSON_REMOVE('[1, 2, 3, 4, 5]', '$[3]') AS 'One Path',
JSON_REMOVE('[1, 2, 3, 4, 5]', '$[1]', '$[3]') AS 'Two Paths';

Result:

+--------------+-----------+
| One Path     | Two Paths |
+--------------+-----------+
| [1, 2, 3, 5] | [1, 3, 4] |
+--------------+-----------+

As you can see, using $[3] as the only path removes 4 from the array. However, when we use it as the second path, it removes 5 instead (and 4 is left untouched).

Another example, this time using a nested array and removing values from an array:

SELECT JSON_REMOVE('[1, 2, [3, 4, 5]]', '$[0]', '$[1][1]') AS 'Result';

Result:

+-------------+
| Result      |
+-------------+
| [2, [3, 5]] |
+-------------+

Although the inner array sits at position 2 in the original document, the first path value ($[0]) shortens the outer array and shifts the inner array to position 1.

If this makes your head spin, simply order the path arguments so that the rightmost values are removed first. That way the leftmost values are not affected, and you can define the paths based on the original JSON document.

As a result, we can substitute the following code for the previous code and get the same result:

SELECT JSON_REMOVE('[1, 2, [3, 4, 5]]', '$[2][1]', '$[0]') AS 'Result';

Result:

+-------------+
| Result      |
+-------------+
| [2, [3, 5]] |
+-------------+

Example 5 – A Larger JSON Document

SET @data = '{  
  "Person": {    
  "Name": "Homer", 
  "Age": 39,
  "Hobbies": ["Eating", "Sleeping", "Base Jumping"]  
    }
 }';

SELECT JSON_REMOVE(@data, '$.Person.Age', '$.Person.Hobbies[2]') AS 'Result';

Result:

+------------------------------------------------------------------+
| {"Person": {"Name": "Homer", "Hobbies": ["Eating", "Sleeping"]}} |
+------------------------------------------------------------------+
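
To tie this back to deleting data, JSON_REMOVE() can also be used inside an UPDATE to strip a key from a stored JSON column. A minimal sketch in MySQL, assuming a hypothetical events table with a JSON column named properties:

UPDATE events
SET properties = JSON_REMOVE(properties, '$.obsolete_flag')
WHERE JSON_CONTAINS_PATH(properties, 'one', '$.obsolete_flag');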

In SQL Server, you can delete a property from a JSON value with JSON_MODIFY() by setting the property's value to NULL.

UPDATE NFLTeams
SET Players = JSON_MODIFY(Players, '$.QB1.DraftPosition', NULL)
WHERE TeamName = 'Detroit Lions'

In lax mode, setting a property to NULL deletes the property. By default, all functions are in lax mode (instead of strict mode).

This deleted the DraftPosition property:

{
    "Name": "Matthew Stafford",
    "YearsOfExperience": 12,
    "College": "Georgia"
}

SQL Delete Batch

Delete Rows in Batches

Trying to delete millions of rows without enough transaction log space can result in several issues. Most databases are designed to have adequate disc capacity to handle a wide range of DML operations. However, there may be an ad-hoc need to delete a significant number of rows, or it could be planned as part of an archive process or a data correction.

If you try to delete in one transaction, for whatever reason, the DELETE will fill the transaction log until it is committed. This is true whether you're in Full Recovery or Simple Recovery.

There are distinctions between SIMPLE and FULL RECOVERY, but that is a topic for another discussion.

There are many ways to keep the transaction log from ballooning and overflowing the discs. With a little forethought, you can do things like a) provision more disc or b) create extra transaction log files. However, you may discover that these methods do not resolve the issue.

Another possibility is to perform the DELETE in batches. Because it deletes in batches, the operation uses less log space, commits, and flushes to disc. If the database is in the SIMPLE or BULK_LOGGED recovery model, you keep reusing the same log space.

Example 1: Here's an example of code that deletes rows in batches. You can change the value of the @BatchSize variable; you'll have to experiment to see what works best for the DELETE job in the given context.

USE MyDB
GO
DECLARE @BatchSize INT = 4999
WHILE 1 = 1
BEGIN
    DELETE TOP (@BatchSize)
    FROM MyTable
    WHERE Date < '20160301';
    IF @@ROWCOUNT < @BatchSize BREAK
END
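
If the delete runs alongside a live workload, a short pause between batches can also ease blocking and give log maintenance a chance to run. A sketch of the same loop with a pause (the one-second delay is an arbitrary choice):

DECLARE @BatchSize INT = 4999;
WHILE 1 = 1
BEGIN
    DELETE TOP (@BatchSize)
    FROM MyTable
    WHERE Date < '20160301';

    IF @@ROWCOUNT < @BatchSize BREAK;

    WAITFOR DELAY '00:00:01';  -- brief pause so other sessions can get in
END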

Example 2: In large SQL Server OLTP systems, tables can grow exceedingly large over time. Because such tables were never designed to hold that volume of data, queries against them may run into performance problems when that point arrives. Even with the correct indexes, performance may not be as good as expected, forcing you to spend more time deleting outdated data.

Solution

In this lesson, I'll show you how to use SQL scripts to copy data to a historical table (in case you need historical data in the future but don't have it in the principal table) and set up a loop to delete entries in batches (as specified by you) using a set of start and end dates.

During the procedure, you can use another SQL script (included in this solution) to keep track of the deletion's progress. Perhaps you need to know how long it takes to delete a day's worth of data, or how long it takes to delete X number of records, as well as the overall progress.

Code Explained:

  • The procedure will be handled using a stored procedure, therefore it will work in any situation.
  • Throughout the demonstration, I'll be working with tempdb as my database of choice.
  • I constructed a test table called "big_table" and filled it with 500,000 rows spread out over five days.

Here’s how you are going to call the stored procedure:

EXECUTE [dbo].[Batch_Delete]
   @startDate           = '2015-01-01'
  ,@endDate             = '2015-01-06'
  ,@dbName              = 'tempdb'
  ,@schemaName          = 'dbo'
  ,@tableName           = 'big_table'
  ,@dateFieldName       = 'created_date'
  ,@saveToHistoryTable  = 1
  ,@batch               = 1000

The names of the parameters are pretty much self-explanatory, but here’s their purpose:

  • @startDate: The start date from when to start working with the records.
  • @endDate: A non-inclusive date to limit which records will be taken into account. In my example it means that I'm working with records where date >= '2015-01-01' and date < '2015-01-06'.
  • @dbName: The name of the database that hosts the table that you want to work with. Take into account that in this same database the historic and metrics tables are created.
  • If you'd like to have these 2 tables in separate databases, a few tweaks would have to be made to the stored procedure and an additional parameter would have to be passed to specify the name of the other database.
  • @schemaName: The name of the schema of the table.
  • @tableName: The name of the table to be processed.
  • @dateFieldName: The name of the column that contains the timestamp values to work with. Remember that this logic is written for a date-based approach; if you want a different one, you'd have to adjust the stored procedure to your particular case.
  • @saveToHistoryTable: If 1, then an identical empty table is created based on @tableName, and the name "_historic" is added to it to distinguish it. If 0, then no data movement will be performed during the execution of the script (use it very carefully).

The stored procedure contains a set of initial validations to make sure that you have entered the correct information to proceed.

Here are all the validations considered:

  • Notify the user if @startDate is equal to or greater than @endDate.
  • Notify the user if the @dbName isn't supplied or is left blank.
  • Notify the user if the @schemaName isn't given or is left blank.
  • Notify the user if the @tableName isn't supplied or is left blank.
  • Notify the user if the @dateFieldName isn't given or is left blank.
  • Notify the user if the table specified by @dbName.@schemaName.@tableName does not exist.
  • Notify the user if the table exists but the field @dateFieldName does not.
  • A warning will be displayed to the user if the option @saveToHistoryTable is set to 0 (the default value is 1).
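
As an illustration, the first two validations might look something like this inside the stored procedure (a minimal sketch; the actual procedure's messages may differ):

IF @startDate >= @endDate
BEGIN
    RAISERROR('@startDate must be earlier than @endDate.', 16, 1);
    RETURN;
END

IF @dbName IS NULL OR @dbName = ''
BEGIN
    RAISERROR('@dbName must be supplied.', 16, 1);
    RETURN;
END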

In this example, the table "big_table_X" does not exist, so we get an error.

In this example, the column "create_date2" does not exist, so we get an error.

Check Progress of Delete Processing

This T-SQL code can be used to monitor the deletion of rows in real time.

You can see how much time each day took to remove, as well as the overall time the procedure has taken so far. The last column, "Progress", represents the percentage of deleted entries that have already been copied to the historic table (make sure to put the correct table name for your case).

Because it uses the windowed SUM(...) OVER(...) construct, this query will only run on SQL Server 2012 and later instances.

Here is the query you can use to get information about the delete process:

SELECT StartDate,
   EndDate,
   Records,
   [Total Records] = SUM(Records) OVER (ORDER BY StartDate),
   CompletionTime,
   [Total Time] = SUM(CompletionTime) OVER (ORDER BY StartDate),
   CONVERT(DECIMAL(10,2),(SUM(Records) OVER (ORDER BY StartDate) / 
   (SELECT CONVERT(DECIMAL(10,2),COUNT(*)) FROM big_table_historic) ) * 100) AS 'Progress'
FROM Delete_Me

SQL Delete Duplicate Rows

Using the GROUP BY clause or the ROW_NUMBER() function, find duplicate records. To get rid of duplicate records, use the DELETE statement.
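
For instance, a GROUP BY query of this shape surfaces the duplicate groups before you delete anything (shown against the sales.contacts table created below):

SELECT first_name, 
       last_name, 
       email, 
       COUNT(*) AS duplicate_count
FROM sales.contacts
GROUP BY first_name, last_name, email
HAVING COUNT(*) > 1;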

Example 1: Let’s set up a sample table for the demonstration.

Setting up a sample table

Step 1: If the table exists, drop it: DROP TABLE IF EXISTS sales.contacts;

Step 2: Create a new table named sales.contacts as follows:

CREATE TABLE sales.contacts(
    contact_id INT IDENTITY(1,1) PRIMARY KEY,
    first_name NVARCHAR(100) NOT NULL,
    last_name NVARCHAR(100) NOT NULL,
    email NVARCHAR(255) NOT NULL
);

Step 3: Insert some rows into the sales.contacts table:

INSERT INTO sales.contacts
    (first_name,last_name,email) 
VALUES
    ('Syed','Abbas','syed.abbas@example.com'),
    ('Catherine','Abel','catherine.abel@example.com'),
    ('Kim','Abercrombie','kim.abercrombie@example.com'),
    ('Kim','Abercrombie','kim.abercrombie@example.com'),
    ('Kim','Abercrombie','kim.abercrombie@example.com'),
    ('Hazem','Abolrous','hazem.abolrous@example.com'),
    ('Hazem','Abolrous','hazem.abolrous@example.com'),
    ('Humberto','Acevedo','humberto.acevedo@example.com'),
    ('Humberto','Acevedo','humberto.acevedo@example.com'),
    ('Pilar','Ackerman','pilar.ackerman@example.com');

Step 4: Query data from the sales.contacts table:

SELECT contact_id, 
   first_name, 
   last_name, 
   email
FROM 
   sales.contacts;

Step 5: SQL Server Delete Duplicates:

For contacts with the same first name, last name, and email address, there are many duplicate rows (3,4,5), (6,7), and (8,9).

Delete duplicate rows from a table example

The following statement uses a common table expression (CTE) to delete duplicate rows:

WITH cte AS (
    SELECT contact_id, 
        first_name, 
        last_name, 
        email, 
        ROW_NUMBER() OVER (
            PARTITION BY first_name, last_name, email
            ORDER BY first_name, last_name, email
        ) row_num
    FROM sales.contacts
)
DELETE FROM cte
WHERE row_num > 1;

In this statement:

  • First, the CTE uses the ROW_NUMBER() function to locate duplicate rows based on the values of the first_name, last_name, and email columns.
  • The DELETE statement then deletes all duplicate rows, keeping only one instance of each duplicate group.
  • SQL Server reports the number of rows affected, indicating that the duplicate rows were eliminated.

When you query the sales.contacts table again, you'll notice that all duplicate rows have been removed.

SELECT contact_id, 
       first_name, 
       last_name, 
       email
FROM sales.contacts
ORDER BY first_name, 
         last_name, 
         email;

Example 2: Creating a Database and a Table

Step 1: Create a Database

Open your SQL Server and use the following script to create the “chittadb” Database.

Create database chittadb

To run the above script, select the script query and hit F5 or click the Execute button.

The message "Command(s) completed successfully" should appear. This indicates that a new database has been created for you.

Step 2: Create a table

Open your SQL Server and use the following script to create table “tbl_Mcastudents”.

create table tbl_Mcastudents   
(  
   Id int primary key not null identity(1,1),  
   Name nvarchar(50),  
   Location nvarchar(30),  
   Gender varchar(10)  
)  

Execute the above query to create "tbl_Mcastudents".

You should see a message, “Command(s) completed successfully.”

Step 3: Now, insert data into the table.

Insert into tbl_Mcastudents values ('Chitta', 'Chennai', 'Male')  
Insert into tbl_Mcastudents values ('Chitta', 'Chennai', 'Male')  
Insert into tbl_Mcastudents values ('Chitta', 'Chennai', 'Male')  
Insert into tbl_Mcastudents values ('Rani', 'Puri', 'Female')  
Insert into tbl_Mcastudents values ('Rani', 'Puri', 'Female')  
Insert into tbl_Mcastudents values ('Mitu', 'BBSR', 'Male')  
Insert into tbl_Mcastudents values ('Mitu', 'BBSR', 'Male')  
Insert into tbl_Mcastudents values ('Mitu', 'BBSR', 'Male')

Execute the above query, you should see a message, “Command(s) completed successfully.”

Step 4: Now retrieve all data from “tbl_Mcastudents” table.

select * from tbl_Mcastudents

There are many duplicate rows (10, 11, 12), (13, 14), and (15, 16, 17) for the tbl_Mcastudents that have the same Name, Location, and Gender.
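
You can confirm those duplicate groups with a GROUP BY query before deleting anything; a minimal sketch:

SELECT Name, Location, Gender, COUNT(*) AS duplicate_count
FROM tbl_Mcastudents
GROUP BY Name, Location, Gender
HAVING COUNT(*) > 1;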

Step 5: Delete duplicate rows/records in SQL server using common table expression (CTE)

To delete duplicate rows from a table in SQL Server, we follow these steps:

  • Find duplicate rows using the GROUP BY clause or ROW_NUMBER().
  • Use the DELETE statement to remove the duplicate rows.

SQL Query

WITH cte AS  
( SELECT Id, Name, Location, Gender, ROW_NUMBER() OVER (PARTITION BY Name, Location, Gender  
ORDER BY Name, Location, Gender) row_num FROM tbl_Mcastudents  
)  
DELETE FROM cte WHERE row_num > 1;  

In the above query,

  • To begin, the CTE uses the ROW_NUMBER() function to locate duplicate rows identified by the Name, Location, and Gender values.
  • The DELETE statement then deletes all duplicate rows, keeping only one instance of each duplicate group.
  • If you run the query above, you should see a message confirming that the duplicate rows have been removed from the table.
  • Now retrieve all data from the "tbl_Mcastudents" table after the duplicate rows have been deleted.
select * from tbl_Mcastudents  

Output:

Delete Duplicate Rows In SQL Server From A Table

Approach 2:

Example 1: According to database design best practices, a SQL Server table should not contain duplicate records. Primary keys should be created during the database design phase to prevent duplicates. However, there are situations where we need to work with databases in which these rules are broken or exceptions are permitted (when the rules are bypassed knowingly). For example, when data is loaded from many sources into a staging table, duplicate rows are possible. When the loading procedure is finished, the table should be cleaned, or the clean data loaded to a permanent table, so that duplicates are no longer present. As a result, removing duplicates from the loading table becomes a problem. Let's look at different approaches to data de-duplication in this tutorial.

We will consider two cases in this tip:

  • The first situation is when a SQL Server table has a primary key (or unique index) and one of the columns has duplicate values that need to be eliminated.
  • The second scenario is when a table lacks a primary key or any unique indexes and has duplicate rows that should be eliminated. Let's take a look at each scenario individually.

How to remove duplicate rows in a SQL Server table

In a SQL Server table, duplicate records might be a severe problem. Duplicate data might result in orders being processed several times, erroneous reporting results, and more. In SQL Server, there are a variety of options for dealing with duplicate records in a table, depending on the circumstances, such as:

Table with Unique Index - If your table has a unique index, you can use it to locate duplicate data in an orderly way and then delete the duplicates. The identification can be done with self-joins, sorting the data by the maximum value, the RANK function, or NOT IN logic.

Tables without a Unique Index - Working with tables that don't have a unique index is a little more difficult. The ROW_NUMBER() function can be used in conjunction with a common table expression (CTE) to sort the data and subsequently delete the duplicate rows.

To see real-world examples of how to delete duplicate records from a table, look at the examples below.

Removing duplicate rows from a SQL Server table with a unique index

We'll require a test environment to complete our jobs, which we'll construct using the following statement:

Step 1:

USE master
GO

Step 2:

CREATE DATABASE TestDB
GO

Step 3:

USE TestDB
GO

Step 4:

CREATE TABLE TableA
(
 ID INT NOT NULL IDENTITY(1,1),
 Value INT,
 CONSTRAINT PK_ID PRIMARY KEY(ID)  
)

Step 5: Now let's insert data into our new table - 'TableA' with the following statement:

USE TestDB
GO

INSERT INTO TableA(Value)
VALUES(1),(2),(3),(4),(5),(5),(3),(5)

Step 6:

SELECT * FROM TableA

Step 7:

SELECT Value, COUNT(*) AS DuplicatesCount
FROM TableA
GROUP BY Value

As we can see in the result set below, the values 3 and 5 exist in the 'Value' column more than once:

Example 2: Identify Duplicate Rows in a SQL Server Table

Our job is to ensure that the 'Value' column is unique by deleting the duplicates. Removing duplicate values from a table with a unique index is a little easier than removing rows from a table without one. First and foremost, we must identify the duplicates. There are numerous ways to do so; let's look into and compare a few prevalent approaches. The following queries show six solutions for finding the duplicate values that should be eliminated (leaving just one value):

Finding duplicate values in a table with a unique index

Solution 1

SELECT a.* 
FROM TableA a, (SELECT ID, (SELECT MAX(Value) 
FROM TableA i WHERE o.Value=i.Value GROUP BY Value 
HAVING o.ID < MAX(i.ID)) AS MaxValue FROM TableA o) b
WHERE a.ID=b.ID AND b.MaxValue IS NOT NULL

Solution 2

SELECT a.* 
FROM TableA a, (SELECT ID, (SELECT MAX(Value) 
FROM TableA i WHERE o.Value=i.Value GROUP BY Value 
HAVING o.ID=MAX(i.ID)) AS MaxValue FROM TableA o) b
WHERE a.ID=b.ID AND b.MaxValue IS NULL

Solution 3

SELECT a.*
FROM TableA a
INNER JOIN
(
 SELECT MAX(ID) AS ID, Value 
 FROM TableA
 GROUP BY Value 
 HAVING COUNT(Value) > 1
) b
ON a.ID < b.ID AND a.Value=b.Value

Solution 4

SELECT a.* 
FROM TableA a 
WHERE ID < (SELECT MAX(ID) FROM TableA b 
WHERE a.Value=b.Value GROUP BY Value HAVING COUNT(*) > 1)

Solution 5

SELECT a.*
FROM TableA a
INNER JOIN
(SELECT ID, RANK() OVER(PARTITION BY Value ORDER BY ID DESC) AS rnk FROM TableA ) b 
ON a.ID=b.ID
WHERE b.rnk > 1

Solution 6

SELECT * 
FROM TableA 
WHERE ID NOT IN (SELECT MAX(ID) 
     FROM TableA 
     GROUP BY Value)

As we can see the result for all cases is the same as shown in the screenshot below:

Only the rows with the IDs 3, 5, and 6 should be eliminated. Looking at the execution plan, we can see that the most 'compact' option ('Solution 6') has the highest cost (in our example, the 'ID' column has a primary key, therefore 'NULL' values are not conceivable for that column, so 'NOT IN' will function without issue), while the second has the lowest cost:

Example 3: Deleting Duplicate Rows in a SQL Server Table

Let's now delete duplicate values from the table with the following queries. We'll simply utilise the second, fifth, and sixth queries to keep things simple:

USE TestDB
GO

--Initializing the table

TRUNCATE TABLE TableA

INSERT INTO TableA(Value)
VALUES(1),(2),(3),(4),(5),(5),(3),(5)

--Deleting duplicate values

DELETE t
FROM TableA t
WHERE ID IN ( SELECT a.ID FROM TableA a, (SELECT ID, (SELECT MAX(Value) 
FROM TableA i WHERE o.Value=i.Value GROUP BY Value 
HAVING o.ID=MAX(i.ID)) AS MaxValue FROM TableA o) b
WHERE a.ID=b.ID AND b.MaxValue IS NULL) 

--Initializing the table

TRUNCATE TABLE TableA

INSERT INTO TableA(Value)
VALUES(1),(2),(3),(4),(5),(5),(3),(5)

--Deleting duplicate values

DELETE a
FROM TableA a
INNER JOIN
(SELECT ID, RANK() OVER(PARTITION BY Value ORDER BY ID DESC) AS rnk FROM TableA ) b 
ON a.ID=b.ID
WHERE b.rnk>1

--Initializing the table

TRUNCATE TABLE TableA

INSERT INTO TableA(Value)
VALUES(1),(2),(3),(4),(5),(5),(3),(5)

--Deleting duplicate values
DELETE FROM TableA 
WHERE ID NOT IN (SELECT MAX(ID) 
                 FROM TableA 
                 GROUP BY Value)

When we delete the data and look at the execution plans again, we observe that the first DELETE command is the quickest and the final is the slowest, as expected:

Example 4: Removing duplicates from a SQL Server table without a unique index:

Because SQL Server does not have a ROWID like Oracle, we must do additional work to generate unique row IDs in order to remove duplicates from a table without a unique index:

USE TestDB
GO

CREATE TABLE TableB (Value INT)

INSERT INTO TableB(Value) 
VALUES(1),(2),(3),(4),(5),(5),(3),(5)

SELECT * FROM TableB

; WITH TableBWithRowID AS
(
 SELECT ROW_NUMBER() OVER (ORDER BY Value) AS RowID, Value
 FROM TableB
)

DELETE o
FROM TableBWithRowID o
WHERE RowID < (SELECT MAX(rowID) FROM TableBWithRowID i WHERE i.Value=o.Value GROUP BY Value)

SELECT * FROM TableB

In the code above, we create a table with duplicate rows, generate unique identifiers using the ROW_NUMBER() function, and delete the duplicates using a common table expression (CTE):

Example 5: Removing duplicates from a SQL Server table without unique index:

This code, however, can be replaced with a more compact and optimal one:

USE TestDB
GO

--Initializing the table
TRUNCATE TABLE TableB

INSERT INTO TableB(Value) 
VALUES(1),(2),(3),(4),(5),(5),(3),(5)

--Deleting duplicate values
; WITH TableBWithRowID AS
(
 SELECT ROW_NUMBER() OVER (PARTITION BY Value ORDER BY Value) AS RowID, Value
  FROM TableB
)

DELETE o
FROM TableBWithRowID o
WHERE RowID > 1

SELECT * FROM TableB

However, SQL Server can also be used to determine the physical address of a row. Although official documentation for this capability is nearly impossible to come by, it can be used as a substitute for Oracle's ROWID pseudo column. Available since SQL Server 2008, it is called %%physloc%%, and it is a virtual binary(8) column that exposes the row's physical location. Because the value of %%physloc%% is unique for each row, we can use it as a row identifier when deleting duplicate rows from a table without a unique index. As a result, we can eliminate duplicate entries from a table without a unique index in SQL Server, just as we can in Oracle, as well as from a table with a unique index.
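
To see these physical locations, you can select the pseudo column directly; the (equally undocumented) sys.fn_PhysLocFormatter function renders it in file:page:slot form:

SELECT sys.fn_PhysLocFormatter(%%physloc%%) AS PhysicalLocation,
       Value
FROM TableB;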

The first two queries are Oracle equivalents for removing duplicates; the next two remove duplicates using %%physloc%%, similar to the case of the table with a unique index; and the last query does not use %%physloc%% at all, merely to compare the performance of all of these options:

Option 1

--Initializing the table
TRUNCATE TABLE TableB

INSERT INTO TableB(Value)
VALUES(1),(2),(3),(4),(5),(5),(3),(5)

DELETE o
FROM
(SELECT %%physloc%% as RowID, value FROM TableB) o
WHERE o.RowID < (
    SELECT MAX(%%physloc%%)
    FROM TableB i
    WHERE i.Value=o.Value
    GROUP BY Value
)

Option 2

--Initializing the table
TRUNCATE TABLE TableB

INSERT INTO TableB(Value)
VALUES(1),(2),(3),(4),(5),(5),(3),(5)

DELETE TableB
WHERE %%physloc%% not in (
     SELECT MAX(%%physloc%%)
     FROM TableB
     GROUP BY Value
      )

Option 3

--Initializing the table
TRUNCATE TABLE TableB

INSERT INTO TableB(Value)
VALUES(1),(2),(3),(4),(5),(5),(3),(5)

DELETE b1
FROM 
(SELECT %%physloc%% as RowID, value FROM TableB) b1
INNER JOIN
(SELECT %%physloc%% as RowID, RANK() OVER(PARTITION BY Value 
ORDER BY %%physloc%% DESC) AS rnk FROM TableB ) b2 
ON b1.RowID=b2.RowID
WHERE b2.rnk>1

Option 4

--Initializing the table
TRUNCATE TABLE TableB

INSERT INTO TableB(Value)
VALUES(1),(2),(3),(4),(5),(5),(3),(5)

DELETE b1
FROM Tableb b1 
WHERE %%physloc%% < (SELECT MAX(%%physloc%%) 
        FROM Tableb b2 
        WHERE b1.Value=b2.Value 
        GROUP BY Value 
         HAVING COUNT(*) > 1
         )

Option 5

--Initializing the table
TRUNCATE TABLE TableB

INSERT INTO TableB(Value)
VALUES(1),(2),(3),(4),(5),(5),(3),(5)

; WITH 
TableBWithRowID AS
(
   SELECT ROW_NUMBER() OVER (partition by Value ORDER BY Value) AS RowID, Value
   FROM TableB
)

DELETE o
FROM TableBWithRowID o
WHERE RowID > 1

Analyzing the Execution Plans, we can see that the first and the last queries are the fastest when compared to the overall batch times:


As a result, we may conclude that using %%physloc%% does not generally boost performance. It's vital to remember that this is an undocumented feature of SQL Server, so developers should use extreme caution when relying on it.

Other methods for removing duplicates are not included in this guide. We can, for example, save unique rows in a temporary table, erase all data from our table, and then insert distinct rows from the temporary table into our permanent table. DELETE and INSERT commands should be combined into a single transaction in this scenario.
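
A minimal sketch of that approach for TableB, keeping the DELETE and INSERT in one transaction:

BEGIN TRANSACTION;

    -- save one copy of each distinct row
    SELECT DISTINCT Value
    INTO #keepers
    FROM TableB;

    -- remove everything, then put the distinct rows back
    DELETE FROM TableB;

    INSERT INTO TableB(Value)
    SELECT Value FROM #keepers;

COMMIT TRANSACTION;

DROP TABLE #keepers;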

Conclusion

Duplicate values can appear in a single column that needs de-duplication, or whole duplicate rows can appear in a table. In either case, we must remove the duplicated data to avoid duplication in the database.


SQL Delete Duplicates but Keep One

Using the MAX() function and the GROUP BY clause, you can eliminate duplicate rows while keeping the most recent ones.

Example 1: Observe the below query and output.

DELETE FROM sales_team_emails
  WHERE sales_person_id NOT IN (
    SELECT * FROM (
      SELECT MAX(sales_person_id) FROM sales_team_emails 
        GROUP BY sales_person_email
    )  AS s_alias
  ); 

Select * on sales_team_emails table to view the output.

Action Output Message : DELETE FROM sales_team_emails WHERE sales_person_id NOT IN ( SELECT * FROM ( SELECT MAX(sales_person_id) FROM sales_team_emails GROUP BY sales_person_email ) AS s_alias ) 4 row(s) affected 0.0061 sec.

Explanation: As this output shows, we have successfully eliminated the duplicate rows. The inner query, the MAX() function, and the GROUP BY clause are all used here.

STEP 1: We select the maximum sales_person_id aggregated by sales_person_email in the inner query.

STEP 2: In the outer query, we delete all rows from the sales_team_emails table except those returned by the inner query (STEP 1).

Example 2: Delete the duplicate rows but keep latest : using JOINS

Another way to achieve the goal is to use joins to delete the old entries from the table and preserve the latest entry in the table sales_team_emails comparing the sales_person_id column. Observe the below query :

DELETE s1 FROM sales_team_emails s1,
    sales_team_emails s2 
WHERE
    s1.sales_person_id < s2.sales_person_id
    AND s1.sales_person_email = s2.sales_person_email;

Action Output Message : DELETE s1 FROM sales_team_emails s1, sales_team_emails s2 WHERE s1.sales_person_id < s2.sales_person_id AND s1.sales_person_email = s2.sales_person_email 4 row(s) affected 0.0053 sec

Select * on sales_team_emails table to view the output.

As we can see in this output, we have successfully deleted the duplicate rows, and the ones with higher sales_person_id ( sales_person_id is the primary key) are retained. Here we are doing a self join on the same table sales_team_emails, which is deleting duplicate records by keeping one copy, the one with a higher value of sales_person_id.

Example 3: Delete the duplicate row but keep oldest : using JOINS

JOINS can be used to maintain the earliest duplicate row entry while deleting the ones that arrived later in the table. With a small tweak in the WHERE clause, the answer is similar to the one in the previous section. Examine the following query and its result.

DELETE s1 FROM sales_team_emails s1,
    sales_team_emails s2 
WHERE
    s1.sales_person_id > s2.sales_person_id
    AND s1.sales_person_email = s2.sales_person_email;

Action Output Message: DELETE s1 FROM sales_team_emails s1, sales_team_emails s2 WHERE s1.sales_person_id > s2.sales_person_id AND s1.sales_person_email = s2.sales_person_email 4 row(s) affected 0.0025 sec

Select * on sales_team_emails table to view the output.

Output:

The output shows the oldest rows preserved but the latest duplicate rows deleted. The only change is in the WHERE condition “WHERE s1.sales_person_id > s2.sales_person_id”

Example 4: Delete the duplicate row but keep oldest : using ROW_NUMBER()

Using the ROW_NUMBER() function with the PARTITION BY clause is another way to eliminate duplicate rows while keeping the oldest entries in the table. Examine the query, its output, and its description below. (Note that the partition is ordered by sales_person_id so that the oldest row in each group is the one kept.)

DELETE FROM sales_team_emails 
WHERE 
    sales_person_id IN (
    SELECT 
        sales_person_id 
    FROM (
        SELECT                         
            sales_person_id,
            ROW_NUMBER() OVER (
                PARTITION BY sales_person_email
                ORDER BY sales_person_id) AS row_num
        FROM 
            sales_team_emails
    ) s_alias
    WHERE row_num > 1
);

Action Output Message : DELETE FROM sales_team_emails WHERE sales_person_id IN ( SELECT sales_person_id FROM ( SELECT sales_person_id, ROW_NUMBER() OVER ( PARTITION BY sales_person_email ORDER BY sales_person_id) AS row_num FROM sales_team_emails ) s_alias WHERE row_num > 1 ) 4 row(s) affected 0.0017 sec

As we can see in the output, the latest duplicate rows got deleted. Here we are using inner queries and ROW_NUMBER() function.

STEP 1: In the innermost query, "SELECT sales_person_id, ROW_NUMBER() OVER (PARTITION BY sales_person_email ORDER BY sales_person_id) AS row_num FROM sales_team_emails", we partition the entire table into small sections based on sales_person_email using the PARTITION BY clause. The ROW_NUMBER() function then assigns row numbers to each row in a separate row_num column. If we run this inner query separately, we can see the intermediate result.

STEP 2: In the next inner query, we select the sales_person_id values WHERE row_num > 1 from the results of the innermost query (STEP 1). Running it separately gives the list of ids to delete.

Finally, the outer query deletes all the rows whose sales_person_id equals one of the ids found in the inner query.

Example 5: The following statement uses a common table expression (CTE) to delete duplicate rows:

WITH cte AS (
    SELECT contact_id, 
        first_name, 
        last_name, 
        email, 
        ROW_NUMBER() OVER (
            PARTITION BY 
                first_name, 
                last_name, 
                email
            ORDER BY 
                first_name, 
                last_name, 
                email
        ) row_num
     FROM 
        sales.contacts
)
DELETE FROM cte
WHERE row_num > 1;

In this statement:

  • To begin, the CTE uses the ROW_NUMBER() function to locate duplicate rows based on the first_name, last_name, and email columns' values.
  • The DELETE statement then deletes all duplicate rows, keeping only one instance of each duplicate group.
  • SQL Server reports the number of rows affected, indicating that the duplicate rows were deleted.

When you query the sales.contacts table again, you'll notice that all duplicate rows have been removed.

SELECT contact_id, 
       first_name, 
       last_name, 
       email
FROM sales.contacts
ORDER BY first_name, 
         last_name, 
         email;

Example 6: In this post, I'll show you how to delete duplicate records from a table step by step, with an example and proof. This is a real-life incident that I personally experienced at work today. Let's have a look at an example to keep things easy. You can use the same approach by modifying the table's column names.

I have a table called "Account" with the primary key "Id" and the fields "AccountId," "AccountName," and "CreatedDate" as the other fields. In my table, I have duplicate "AccountId" data. The records were added at distinct time intervals to distinguish them (different dates, for clarity).

Now I wanted to delete duplicate records from the table and maintain only the most recent entry for each "AccountId."

Steps:

  • I've constructed a table and filled it with data based on the aforementioned use case.
  • You have two options to delete duplicates.
  • You have the option of keeping the first record for each "AccountId" or keeping the most recent record for each "AccountId."
  • Depending on your choice, you must order by "Asc" or "Desc" in the ROW_NUMBER() call, as shown in the code below, to delete all duplicate rows except one.

Delete all Duplicate Rows but Keeping One in SQL

The table contains duplicate records, some more than a decade old. The most recent record for each "AccountId" must be retained.

SQL Code

DELETE FROM Account WHERE Id IN(
    SELECT Id FROM 
     (SELECT Id, ROW_NUMBER() OVER (PARTITION BY [AccountId] ORDER BY Id DESC)
     AS [ItemNumber] FROM Account) 
     a WHERE ItemNumber > 1
     )

SELECT * FROM Account

After eliminating the duplicate records, you can now see the results. In SQL, this demonstrates how to delete all duplicate rows while keeping one.


SQL Delete Faster

Truncate to Delete All Rows Quickly

To wipe all the data in a table, the fastest and easiest technique is to use a truncate.

Syntax:

truncate table to_empty_it;

This is a metadata-only operation and completes almost instantly. It also resets the table's high-water mark. By default, it additionally deallocates all the space above the minextents for the table. The storage clause can be used to adjust this behaviour:

  • truncate table ... reuse storage keeps all of the table's space allocated.
  • truncate table ... drop storage (the default) deallocates all space above the minextents for the table.
  • truncate table ... drop all storage frees up all of the table's space.

Important Notes: Before you use truncate, make sure there are no foreign keys pointing to this table.
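
A quick way to check for referencing foreign keys in Oracle is to query the data dictionary; a sketch using this example's table name:

select c.constraint_name,
       c.table_name as child_table
from   user_constraints c
join   user_constraints p
  on   c.r_constraint_name = p.constraint_name
where  c.constraint_type = 'R'
and    p.table_name = 'TO_EMPTY_IT';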

Example 1: If there are - even if the child tables are empty - truncate will raise an error (ORA-02266: unique/primary keys in table referenced by enabled foreign keys):

truncate table to_empty_it;

To get around this, you'll need to disable the table's foreign keys first, then re-enable them afterwards. For example:

alter table to_empty_it_child 
  modify constraint fk disable;

truncate table to_empty_it;

alter table to_empty_it_child 
  modify constraint fk enable;

The child tables, of course, must also be empty. You cannot revalidate foreign keys if there are rows in them!

Oracle Database 12c offers a way to clear both the parent and child tables in one fell swoop: include the cascade clause:

truncate table to_empty_it
  cascade;

For this to work, however, you must define the foreign keys with on delete cascade. This is a very uncommon option, so it's unlikely you'll be able to use this shortcut. Double-check everything before you execute a cascaded truncate!

However, deleting the entire contents of a table is rare. In most cases, you'll want to get rid of only a subset of the rows. There are a few more DDL tricks you can use to speed things up.

The most common method is to create a temporary table that saves the data you want to keep, truncate the original table, and then transfer the data back.
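
A sketch of that pattern, assuming a hypothetical status column marks the rows to keep:

create table rows_to_keep as
  select * from to_empty_it
  where  status = 'KEEP';

truncate table to_empty_it;

insert into to_empty_it
  select * from rows_to_keep;

drop table rows_to_keep;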

Example 2: I wrote a blog post a few years ago about how separating delete operations into chunks might lessen the impact on the transaction log. Rather than deleting 100,000 rows in one single transaction, you can delete 100, 1,000, or any other number of rows in a loop of smaller transactions. You may be able to alleviate long-running blocking in addition to lowering the impact on the log. SSDs were just starting to gain pace at the time, and newer technologies like Clustered Columnstore Indexes, Delayed Durability, and Accelerated Database Recovery weren't yet available. So, I thought it was time for a refresh to give a better picture of how SQL Server 2019 handles this.

Solution

Deleting significant portions of a table isn't always the best option. If you're eliminating 95% of a table and keeping 5%, it may be faster to move the data you wish to keep into a new table, drop the old table, and rename the new one. Alternatively, copy the keeper rows out of the table, truncate it, and then copy them back in. However, because of other constraints on the table, SLAs, and other variables, even when the purge is that much larger than the keep, this isn't always feasible.

If it turns out that you need to delete rows, you'll want to keep the impact on the transaction log and the rest of the workload to a minimum. The chunking technique isn't a new or original concept, but it does work well with some of these newer technologies, so let's put them to the test in a number of ways.

To set up, we have multiple constants that will be true for every test:

  • SQL Server 2019 RC1, with four cores and 32 GB RAM (max server memory = 28 GB)
  • 10 million row table
  • Restart SQL Server after every test (to reset memory, buffers, and plan cache)
  • Restore a backup that had stats already updated and auto-stats disabled (to prevent any triggered stats updates from interfering with delete operations).

We also have many variables that will change per test:

  • Recovery model (simple or full)
  • For simple, checkpoint in loop (yes or no)
  • For full, log backup in loop (yes or no)
  • Accelerated Database Recovery (on or off)
  • Delayed Durability (forced or off)
  • Table structure (rowstore or columnstore)
  • Total number of rows to delete from the table (10% (1MM), 50% (5MM), 90% (9MM))
  • Of that total, number of rows to delete per loop iteration (all (so no loop), 10%, 1%)
  • How often to commit transactions (0 (never), 10 (10% of the time), 100 (once))

This will produce 864 unique tests, and you better believe I'm going to automate all of these permutations.

And the metrics we’ll measure:

  • Overall duration
  • Average/peak CPU usage
  • Average/peak memory usage
  • Transaction log usage/file growth
  • Database file usage, size of version store (when using Accelerated Database Recovery)
  • Delta rowgroup size (when using Columnstore)

Source Table

First, I restored AdventureWorks (AdventureWorks2017.bak, to be specific). I made a clone of Sales.SalesOrderDetail with its own identity field, and added a filler column just to give each row a little more flesh and minimise page density, producing a table with 10 million rows:

Step 1: create table

CREATE TABLE dbo.SalesOrderDetailCopy
(
  SalesOrderDetailID     int IDENTITY,
  SalesOrderID           int,
  CarrierTrackingNumber  nvarchar(25),
  OrderQty               smallint,
  ProductID              int,
  SpecialOfferID         int,
  UnitPrice              money,
  UnitPriceDiscount      money,
  LineTotal              numeric(38,6),
  rowguid                uniqueidentifier,
  ModifiedDate           datetime,
  filler                 char(50) NOT NULL DEFAULT ''
);
GO

Then, to generate the 10,000,000 rows, I inserted 100,000 rows at a time, and ran the insert 100 times:

Step 2: Insert data

INSERT dbo.SalesOrderDetailCopy
(
  SalesOrderID,
  CarrierTrackingNumber,
  OrderQty,
  ProductID,
  SpecialOfferID,
  UnitPrice,
  UnitPriceDiscount,
  LineTotal,
  rowguid,
  ModifiedDate
)

Step 3: Select the source rows (this SELECT completes the INSERT statement above; GO 100 runs it 100 times)

SELECT TOP(100000)
  SalesOrderID,
  CarrierTrackingNumber,
  OrderQty,
  ProductID,
  SpecialOfferID,
  UnitPrice,
  UnitPriceDiscount,
  LineTotal,
  rowguid,
  ModifiedDate
FROM Sales.SalesOrderDetail;
GO 100

I did not construct any indexes on the table; as part of each test, I will establish a new clustered index (columnstore half of the time) after the database is recovered.

Automating Tests

I configured a few parameters, backed up the database, backed up the log twice, and then backed up the database again (such that the log would have the least amount of used space when recovered):

ALTER DATABASE  AdventureWorks SET RECOVERY FULL;
ALTER DATABASE  AdventureWorks SET AUTO_CREATE_STATISTICS OFF;
ALTER DATABASE  AdventureWorks SET AUTO_UPDATE_STATISTICS OFF;
BACKUP DATABASE AdventureWorks TO DISK = 'c:\temp\awtest.bak' WITH INIT, COMPRESSION;
BACKUP LOG      AdventureWorks TO DISK = 'c:\temp\awtest.trn' WITH INIT, COMPRESSION;
BACKUP LOG      AdventureWorks TO DISK = 'c:\temp\awtest.trn' WITH INIT, COMPRESSION;
BACKUP DATABASE AdventureWorks TO DISK = 'c:\temp\awtest.bak' WITH INIT, COMPRESSION;

After that, I constructed a Control database in which I would put the stored procedures that would conduct the tests, as well as the tables that would record the test results (simply the start and end times of each test) and performance metrics gathered across all of the tests.

CREATE DATABASE [Control];
GO
USE [Control];
GO
CREATE TABLE dbo.Results
(
  TestID     int NOT NULL,
  StartTime  datetime2(7) NULL,
  EndTime    datetime2(7) NULL
);
CREATE TABLE dbo.Metrics
(
  dt         datetime2(7) NOT NULL DEFAULT(sysdatetime()),
  cpu        decimal(6,3),
  mem        decimal(18,3),
  db_size    decimal(18,3),
  db_used    decimal(18,3),
  db_perc    decimal(5,2),
  log_size   decimal(18,3),
  log_used   decimal(18,3),
  log_perc   decimal(5,2),
  vstore     decimal(18,3),
  rowgroup   decimal(18,3)
);

CREATE CLUSTERED COLUMNSTORE INDEX x ON dbo.Metrics;

Capturing the permutations of all the 864 tests I wanted to perform took a few tries, but I ended up with this:

;WITH bits(b)    AS (SELECT * FROM (VALUES(0),(1)) AS bits(b)),
  rec(model)     AS (SELECT * FROM (VALUES('FULL'),('SIMPLE')) AS x(model)),
  chk(chk)       AS (SELECT * FROM bits),
  logbk(logbk)   AS (SELECT * FROM bits),
  adr(adr)       AS (SELECT * FROM bits),
  dd(dd)         AS (SELECT * FROM bits),
  struct(struct) AS (SELECT * FROM (VALUES('columnstore'),('rowstore')) AS struct(struct)),
  rowtotal(rt)   AS (SELECT * FROM (VALUES(10),(50),(90)) AS rowtotal(r)),
  rowperloop(rp) AS (SELECT * FROM (VALUES(100.0),(10),(1)) AS rowperloop(r)),
  committing(c)  AS (SELECT * FROM (VALUES(0),(10),(100)) AS committing(r))

SELECT TestID = IDENTITY(int,1,1), * INTO dbo.Tests
FROM rec
LEFT OUTER JOIN chk   ON rec.model = 'SIMPLE'
LEFT OUTER JOIN logbk ON rec.model = 'FULL'
CROSS JOIN adr
CROSS JOIN dd
CROSS JOIN struct
CROSS JOIN rowtotal
CROSS JOIN rowperloop
CROSS JOIN committing;

As expected, this inserted 864 rows with all of those combinations.

After that, I constructed a stored procedure to record the metrics I mentioned previously. I'm also using SentryOne SQL Sentry to watch the instance, so there will undoubtedly be some more interesting data there, but I also wanted to capture the crucial details without using any third-party tools. The process is as follows, which goes to great lengths to produce all of the metrics for every given timestamp in a single row:

CREATE PROCEDURE dbo.CaptureTheMetrics
AS
BEGIN
  WHILE 1 = 1
  BEGIN
    IF EXISTS(SELECT 1 FROM sys.databases WHERE name = N'AdventureWorks' AND state = 0)
    BEGIN
      ;WITH perf_src AS
     (
        SELECT instance_name, counter_name, cntr_value 
        FROM sys.dm_os_performance_counters
        WHERE counter_name LIKE N'%total server memory%'
        OR(
         [object_name] LIKE N'%:Resource Pool Stats%'
         AND counter_name IN(N'CPU usage %', N'CPU usage % base') 
         AND instance_name = N'default') 
        OR(
         counter_name IN(N'Log File(s) Size (KB)', N'Log File(s) Used Size (KB)')
         AND instance_name = N'AdventureWorks')
      ),
      cpu AS 
     (
        SELECT cpu = COALESCE(100*(CONVERT(float,val.cntr_value) / NULLIF(base.cntr_value,0)),0) 
        FROM       perf_src AS val 
        INNER JOIN perf_src AS base 
          ON val.counter_name  = N'CPU usage %' 
         AND base.counter_name = N'CPU usage % base'
      ),
      mem AS
     (
        SELECT mem_usage = cntr_value/1024.0
        FROM perf_src
        WHERE counter_name like '%total server memory%'
      ),
      dbuse AS
     (
        SELECT db_size   = SUM(base.size/128.0),
           used_size = SUM(base.size/128.0) -
                           SUM(val.unallocated_extent_page_count/128.0)
        FROM AdventureWorks.sys.dm_db_file_space_usage AS val
        INNER JOIN AdventureWorks.sys.database_files AS base
        ON val.[file_id] = base.[file_id]
      ),
      vstore AS
     (
        SELECT size = CONVERT(bigint,persistent_version_store_size_kb)/1024.0
        FROM sys.dm_tran_persistent_version_store_stats
        WHERE database_id = DB_ID(N'AdventureWorks')
      ),
      rowgroup AS 
     (
        SELECT size = SUM(size_in_bytes)/1024.0/1024.0
        FROM AdventureWorks.sys.dm_db_column_store_row_group_physical_stats
      ),
      loguse AS 
     (
        SELECT log_size  = base.cntr_value/1024.0,
               used_size =  val.cntr_value/1024.0
        FROM       perf_src as val
        INNER JOIN perf_src as base
                ON val.counter_name  = N'Log File(s) Used Size (KB)'
          AND base.counter_name = N'Log File(s) Size (KB)'
      )
      INSERT Control.dbo.Metrics
      (
       cpu,mem,db_size,db_used,db_perc,log_size,log_used,log_perc,vstore,rowgroup
      )
      SELECT cpu = CONVERT(decimal(6,3),cpu.cpu),
        mem      = CONVERT(decimal(18,3),mem.mem_usage),
        db_size  = CONVERT(decimal(18,3),dbuse.db_size),
        db_used  = CONVERT(decimal(18,3),dbuse.used_size),
        db_perc  = CONVERT(decimal(5,2),COALESCE(100*(CONVERT(float,dbuse.used_size)
                    /NULLIF(dbuse.db_size,0)),0)), 
        log_size = CONVERT(decimal(18,3),loguse.log_size),
        log_used = CONVERT(decimal(18,3),loguse.used_size),
        log_perc = CONVERT(decimal(5,2),COALESCE(100*(CONVERT(float,loguse.used_size)
                    /NULLIF(loguse.log_size,0)),0)),
        vstore   = CONVERT(decimal(18,3), vstore.size),
        rowgroup = CONVERT(decimal(18,3),COALESCE(rowgroup.size, 0))
      FROM cpu
      INNER JOIN      mem      ON 1 = 1
      INNER JOIN      dbuse    ON 1 = 1
      INNER JOIN      loguse   ON 1 = 1
      LEFT OUTER JOIN vstore   ON 1 = 1
      LEFT OUTER JOIN rowgroup ON 1 = 1;
    END  
    -- wait three seconds, then try again:
    WAITFOR DELAY '00:00:03';
  END
END
GO

I created a job step for that stored procedure and started it. You may wish to choose a different delay than three seconds, because there is a trade-off between the expense of gathering the metrics and the completeness of the data; one side may matter more than the other for you.

Finally, I wrote the procedure that contains all of the logic for determining exactly what to do with each test's combination of parameters. This required a few iterations as well, but here's the finished product:

CREATE PROCEDURE dbo.DoTheTest
  @TestID int
AS
BEGIN
  SET ANSI_WARNINGS OFF;
  SET NOCOUNT ON;
  PRINT 'Starting test ' + RTRIM(@TestID) + '.';
  DECLARE @sql nvarchar(max);
  -- pull test-specific data from Control.dbo.Tests
  DECLARE @model varchar(6), @chk bit, @logbk bit, 
          @adr bit, @dd bit, @struct varchar(11), 
          @rt decimal(5,2), @rp decimal(5,2), @c tinyint;
  SELECT @model = model, @chk = chk, @logbk = logbk, @adr = adr, 
         @struct = struct, @rt = rt, @rp = rp, @c = c
    FROM dbo.Tests WHERE TestID = @TestID;
  -- reset memory, cache, etc.
  SET @sql = N'EXEC master.sys.sp_configure
             @configname = N''max server memory (MB)'', @configvalue = 27000;
             EXEC master.sys.sp_configure
             @configname = N''max server memory (MB)'', @configvalue = 28000;
             RECONFIGURE WITH OVERRIDE;
             DBCC FREEPROCCACHE WITH NO_INFOMSGS;
             DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS;';
  EXEC sys.sp_executesql @sql;
 
  -- restore database
  SET @sql = N'ALTER DATABASE AdventureWorks SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
    RESTORE DATABASE AdventureWorks
      FROM DISK = ''c:\temp\awtest.bak'' WITH REPLACE, RECOVERY;';
  EXEC sys.sp_executesql @sql;
  -- set recovery model
  SET @sql = N'ALTER DATABASE AdventureWorks SET RECOVERY ' + @model + N';';
  EXEC sys.sp_executesql @sql;
  -- set accelerated database recovery
  IF @adr = 1
  BEGIN
    SET @sql = N'ALTER DATABASE AdventureWorks SET ACCELERATED_DATABASE_RECOVERY = ON;';
    EXEC sys.sp_executesql @sql;
  END
  -- set forced delayed durability
  IF @dd = 1
  BEGIN
    SET @sql = N'ALTER DATABASE AdventureWorks SET DELAYED_DURABILITY = FORCED;';
    EXEC sys.sp_executesql @sql;
  END
  -- create columnstore index or rowstore PK
  IF @struct = 'columnstore'
  BEGIN
    SET @sql = N'CREATE CLUSTERED COLUMNSTORE INDEX cci ON dbo.SalesOrderDetailCopy;';
  END
  ELSE
  BEGIN
    SET @sql = N'CREATE UNIQUE CLUSTERED INDEX pk
                 ON dbo.SalesOrderDetailCopy(SalesOrderDetailID);';
  END
  EXEC AdventureWorks.sys.sp_executesql @sql;
  -- update stats, to be sure
  SET @sql = N'UPDATE STATISTICS dbo.SalesOrderDetailCopy WITH FULLSCAN;';
  EXEC AdventureWorks.sys.sp_executesql @sql;
  -- log the start
  INSERT dbo.Results(TestID, StartTime) 
    VALUES(@TestID, sysdatetime());
  DECLARE @rc int = 10000000; -- we know there are 10 million rows in the table;
                              -- you would likely use COUNT(*) FROM dbo.table
  DECLARE @RowsToDeleteTotal int = @rc * (@rt/100.0);
  DECLARE @NumberOfLoops int = @RowsToDeleteTotal / (@RowsToDeleteTotal*@rp/100.0);
  DECLARE @Top int = @RowsToDeleteTotal / @NumberOfLoops;
  DECLARE @i int = 1, @commit bit = 0;
  -- set up loop
  WHILE @i <= @NumberOfLoops
  BEGIN
    IF ((@c = 10 AND @i % 10 = 1) OR (@c = 100 AND @i = 1)) AND @@TRANCOUNT = 0
    BEGIN
      BEGIN TRANSACTION;
    END
    SET @sql = N'DELETE TOP (@Top) AdventureWorks.dbo.SalesOrderDetailCopy
      WHERE (@rt = 90.00 AND SalesOrderDetailID % 10 <> 0)
      OR (@rt <> 90.00 AND SalesOrderDetailID % (@rc/@RowsToDeleteTotal) = 0);';
    EXEC AdventureWorks.sys.sp_executesql @sql,
      N'@Top int, @rt decimal(5,2), @rc int, @RowsToDeleteTotal int',
      @Top, @rt, @rc, @RowsToDeleteTotal;
    SET @commit = CASE WHEN ((@c = 10 AND (@i % 10 = 0 OR @i = @NumberOfLoops))
                      OR (@c = 100 AND @i = @NumberOfLoops)) THEN 1 ELSE 0 END;
    IF @commit = 1 AND @@TRANCOUNT = 1
    BEGIN
      COMMIT TRANSACTION;
    END
    IF (@logbk = 1 OR @chk = 1) AND (@c = 0 OR @commit = 1)
    BEGIN
      -- run these twice to make sure the log wraps
      SET @sql = CASE WHEN @chk = 1 THEN N'CHECKPOINT; CHECKPOINT;'
           ELSE N'BACKUP LOG AdventureWorks TO DISK = N''c:\temp\aw.trn''
                WITH INIT, COMPRESSION;
               BACKUP LOG AdventureWorks TO DISK = N''c:\temp\aw.trn''
                WITH INIT, COMPRESSION;' END;
      EXEC AdventureWorks.sys.sp_executesql @sql;
    END
    SET @i += 1;
  END
  -- log the finish
  UPDATE dbo.Results 
    SET EndTime = sysdatetime()
    WHERE TestID = @TestID 
      AND EndTime IS NULL; -- so we can repeat tests
  IF @@TRANCOUNT > 0
  BEGIN
    COMMIT TRANSACTION;
  END
END
GO

There is a lot going on there, but the basic logic is this:

  • Take the test-specific data (TestID and all parameters) from the dbo.Tests table.
  • Reset SQL Server with an sp_configure change, and clear the buffer pool and plan cache.
  • Restore AdventureWorks to its original state, with all 10 million rows intact and no indexes.
  • Change the database's choices based on the parameters of the current test.
  • Create either a clustered columnstore index or a clustered B-tree (rowstore) index, depending on the test.
  • To be safe, manually update the table's statistics.
  • Note that the test has begun.
  • Determine the number of loop iterations required, as well as the number of rows to delete within each iteration.
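To make the loop-sizing arithmetic concrete, here is a minimal standalone sketch (the parameter values are assumptions matching one of the test permutations: delete 90% of the rows, 10% of those rows per loop):

DECLARE @rc int = 10000000,        -- rows in the table
        @rt decimal(5,2) = 90.00,  -- percentage of rows to delete
        @rp decimal(5,2) = 10.00;  -- percentage of those rows per loop
DECLARE @RowsToDeleteTotal int = @rc * (@rt/100.0);                               -- 9,000,000
DECLARE @NumberOfLoops int = @RowsToDeleteTotal / (@RowsToDeleteTotal*@rp/100.0); -- 10
DECLARE @Top int = @RowsToDeleteTotal / @NumberOfLoops;                           -- 900,000 per DELETE
SELECT RowsToDeleteTotal = @RowsToDeleteTotal,
       NumberOfLoops     = @NumberOfLoops,
       RowsPerDelete     = @Top;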

Inside the loop:

  • Determine whether or not this iteration requires the initiation of a transaction.
  • Execute the deletion.
  • Determine if the transaction has to be committed for this iteration.
  • On this iteration, determine if we need to checkpoint or back up the log.
  • We record that this test is complete at the end of the loop, and we commit any uncommitted transactions.

Because of all the result sets, extra traffic, and resource utilisation, I didn't want to run the tests from Management Studio (even on the same VM), so I created a stored procedure for that too and added it to a job:

CREATE PROCEDURE dbo.RunAllTheTests
AS
BEGIN
  DECLARE @j int = 1;
  WHILE @j <= 864
  BEGIN
    EXEC Control.dbo.DoTheTest @TestID = @j;
    SET @j += 1;
  END
END
GO

That took a lot longer than I'd like to admit. Part of this was because I had originally included a 0.1 percent test for rowperloop, which in some cases took many hours. So, after removing those from the table a few days in, I can confidently say that deleting 1,000 rows at a time is highly unlikely to be the best option, regardless of any other variables:

Four of the longest test durations were recorded, including one that lasted more than 11 hours.

(While this appears to be an outlier compared to most of the other tests, I'm sure it wouldn't be much quicker than deleting one or ten rows at a time; in every other case, it was actually faster than deleting half or most of the table.)

Performance Results

After discarding the results from the 0.1% tests, I put the rest into a second metrics table with the durations loaded:

SELECT r.TestID, 
  duration = DATEDIFF(SECOND, r.StartTime, r.EndTime),
  avg_cpu  = AVG(m.cpu),
  max_cpu  = MAX(m.cpu),
  avg_mem  = AVG(m.mem),
  max_mem  = MAX(m.mem),
  max_dbs  = MAX(m.db_size),
  max_dbu  = MAX(m.db_used),
  max_dbp  = MAX(m.db_perc),
  max_logs = MAX(m.log_size),
  max_logu = MAX(m.log_used),
  max_logp = MAX(m.log_perc),
  vstore   = MAX(m.vstore),
  rowgroup = MAX(m.rowgroup)
INTO dbo.RelevantMetrics 
FROM dbo.Results AS r
INNER JOIN dbo.Tests AS t
ON r.TestID = t.TestID
LEFT OUTER JOIN dbo.Metrics AS m
ON m.dt >= r.StartTime AND m.dt <= r.EndTime
WHERE t.rp <> 0.1
GROUP BY r.TestID, r.StartTime, r.EndTime;
CREATE CLUSTERED COLUMNSTORE INDEX cci_rm ON dbo.RelevantMetrics;

Because some tests ran so quickly that I didn't have time to capture any metrics, I had to use an outer join against the metrics table. This means that for some of the quicker tests, no performance details are associated other than how fast they ran.

Then I began looking for patterns and oddities. First, I looked at duration and CPU usage based on whether Delayed Durability (DD) and/or Accelerated Database Recovery (ADR) were turned on:

SELECT delayed_dur = dd, accelerated_dr = adr, 
  avg_duration   = AVG(duration*1.0), 
  [max_duration] = MAX(duration),
  avg_cpu        = AVG(avg_cpu), 
  max_cpu        = MAX(max_cpu)
FROM dbo.RelevantMetrics AS m 
INNER JOIN dbo.Tests AS t ON t.TestID = m.TestID
GROUP BY dd, adr;

Results (with anomalies highlighted):

Duration and CPU results broken down by ADR / DD options

When either option is enabled (or both; when both are active, the peak is reduced), total duration improves by roughly the same amount. For ADR alone, there is a duration outlier that did not affect the average (this specific test deleted 9,000,000 rows, 90,000 rows at a time, in FULL recovery, on a rowstore table). The CPU outlier for DD also did not affect the average; that case deleted 1,000,000 rows on a columnstore table all at once.

What about overall differences comparing rowstore and columnstore?

SELECT struct, 
  avg_duration   = AVG(m.duration*1.0), 
  [max_duration] = MAX(m.duration),
  avg_mem        = AVG(m.avg_mem),
  max_mem        = MAX(m.max_mem)
FROM dbo.RelevantMetrics AS m 
INNER JOIN dbo.Tests AS t ON t.TestID = m.TestID
GROUP BY struct;

Results:

Duration and memory results comparing rowstore and columnstore

On average, columnstore is 20% slower, but it uses 20% less memory. I also wanted to observe how it affected the size and utilisation of the data and log files:

SELECT struct, 
  avg_dbsize  = AVG(r.max_dbs), 
  avg_dbuse   = AVG(r.max_dbp),
  avg_logsize = AVG(r.max_logs),
  avg_loguse  = AVG(r.max_logp) 
FROM dbo.RelevantMetrics AS r
INNER JOIN dbo.Tests AS t
ON r.TestID = t.TestID
GROUP BY t.struct;

Results:

Impact on log and data files based on columnstore vs. rowstore

Finally, deleting in chunks does not appear to provide the same benefits it once did, at least in terms of duration, on today's hardware. The 18 fastest results, as well as 72 of the top 100, came from tests in which all of the rows were deleted in one shot, as revealed by the following query:

;WITH x AS 
(
  SELECT TOP (100) rp, duration, 
    rn = ROW_NUMBER() OVER (ORDER BY duration)
  FROM dbo.RelevantMetrics AS m 
  INNER JOIN dbo.Tests AS t
  ON m.TestID = t.TestID 
  INNER JOIN dbo.Results AS r
  ON m.TestID = r.TestID
  ORDER BY duration
)
SELECT rp, 
  SpotsInTop100     = COUNT(*),
  FirstSpotInTop100 = MIN(rn)
FROM x
GROUP BY rp
ORDER BY rp;

Results:

Breakdown among top 100 tests

And if we look at averages across all of the data, as in this query:

SELECT TotalRows = t.rt, -- % of 10 MM
  RowsPerLoop    = t.rp, 
  avg_duration   = AVG(r.duration*1.0)
FROM dbo.RelevantMetrics AS r
INNER JOIN dbo.Tests AS t
ON r.TestID = t.TestID
GROUP BY t.rt, t.rp
ORDER BY t.rt, t.rp;

We can see that deleting all the rows at once, whether we're deleting 10%, 50%, or 90% of them, is faster than chunking deletes in any fashion (again, on average):

Average duration, broken down by the total number of rows to delete and the number of rows deleted per loop iteration.

(Note that the first column drops from 309 seconds to 162 seconds when the 6,062-second max-duration outlier is removed.)

Even in the best-case scenario, that's still 33, 36, or 83 seconds during which a delete is running and potentially blocking everyone else, and that's without accounting for other factors such as memory, log file, and CPU. Duration isn't the only factor to consider; it's simply that it's often the first (and sometimes the only) thing people look at. This test harness was designed to demonstrate that you can and should collect a variety of other metrics as well, and the results show that outliers can come from almost anywhere.

You can use this harness as a template to create your own tests, more focused on the constraints and capabilities of your environment. I didn't look at the metrics from every possible angle, because there are so many, but I'm going to keep this database around. So, if there are other ways you'd like to see the data sliced, let me know in the comments and I'll see what I can do. Just don't ask me to run all of the tests again.

Caveats

This does not take into account a concurrent workload, the impact of table constraints such as foreign keys, the presence of triggers, or a variety of other scenarios. Another thing to test (perhaps in a future tip) is having other tasks interact with the same table throughout the process, and measuring things like blocking durations and wait types and durations, and determining in which scenarios one set of activity has the more dramatic influence on the other.


SQL Delete First or Last 5 Records

The DELETE TOP statement in SQL Server is used to delete records from a table while limiting the number of records deleted to a specific number or percentage.

In SQL Server, you can select or delete the first, last, 5, 10, or any number of rows from a table. In other words, we can retrieve and remove a certain number of records from the top or the bottom of the table.

To delete a given number of records from a table, we need a unique column, such as a primary key column, by which we can filter the first or last n records using the TOP and ORDER BY clauses.

Syntax:

The syntax for the DELETE TOP statement in SQL is:

DELETE TOP (top_value) [ PERCENT ] 
FROM table
[WHERE conditions];

Parameters or Arguments

  • table: The table that you wish to delete records from.
  • WHERE conditions: Optional. The conditions that must be met for the records to be deleted.
  • TOP (top_value): Deletes the top number of rows in the result set based on top_value. For example, TOP(10) deletes the top 10 rows matching the delete criteria.
  • PERCENT: Optional. If PERCENT is supplied, the top rows are based on a percentage of the whole result set (as specified by the top_value). For example, TOP(10) PERCENT deletes the top 10% of records matching the delete criteria.

Note: Because you're deleting the entire row from the table, you don't need to list fields in the SQL Server DELETE command.
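For instance, here is a minimal sketch of the PERCENT variant against the tbBooks table created in Step 1 below (the price filter is an assumption; note that DELETE TOP does not accept ORDER BY, so exactly which 10 percent gets removed is not guaranteed):

DELETE TOP (10) PERCENT
FROM tbBooks
WHERE BookPrice < 1000;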

Example 1: Implementation: Let’s create a table and delete n number of records from first or last.

Step 1: Create a table using following script:

CREATE TABLE tbBooks
(
    BookId     INT NOT NULL IDENTITY(1,1) PRIMARY KEY,
    BookName   VARCHAR(100),
    Author     VARCHAR(100),
    Publisher  VARCHAR(100),
    BookPrice  DECIMAL(10,2)
)

Step 2: Add some dummy data into the table:

INSERT INTO tbBooks VALUES
('Asp.Net','Ajay','Rozy Publication',1200),
('C#.Net','Sahil','Jai Publication',1000),
('VB.Net','Nancy','Rozy Publication',970),
('MVC','Sahil','Amar Publication',1480),
('JAVA','Supreet','Sam Publication',850),
('PHP','Parvesh','Maya Publication',800),
('ORACLE','Sunny','Amar Publication',1110),
('MYSQL','Shaheed','Sam Publication',400),
('jQuery','Amit','Maya Publication',950)

Step 3: Check inserted data:

SELECT * FROM tbBooks
BookId BookName    Author  Publisher            BookPrice
1       Asp.Net     Ajay    Rozy Publication    1200.00
2       C#.Net      Sahil   Jai Publication     1000.00
3       VB.Net      Nancy   Rozy Publication    970.00
4       MVC         Sahil   Amar Publication    1480.00
5       JAVA        Supreet Sam Publication     850.00
6       PHP         Parvesh Maya Publication    800.00
7       ORACLE      Sunny   Amar Publication    1110.00
8       MYSQL       Shaheed Sam Publication     400.00
9       jQuery      Amit    Maya Publication    950.00

Step 4: Select first 5 records:

SELECT TOP 5 * FROM tbBooks ORDER BY BookId ASC
BookId BookName    Author  Publisher            BookPrice
1       Asp.Net     Ajay    Rozy Publication    1200.00
2       C#.Net      Sahil   Jai Publication     1000.00
3       VB.Net      Nancy   Rozy Publication    970.00
4       MVC         Sahil   Amar Publication    1480.00
5       JAVA        Supreet Sam Publication     850.00

Step 5: Select last 5 records:

SELECT TOP 5 * FROM tbBooks ORDER BY BookId DESC
BookId  BookName    Author  Publisher           BookPrice   
9       jQuery      Amit    Maya Publication    950.00
8       MYSQL       Shaheed Sam Publication     400.00
7       ORACLE      Sunny   Amar Publication    1110.00
6       PHP         Parvesh Maya Publication    800.00
5       JAVA        Supreet Sam Publication     850.00

Step 6: Delete first 2 records:

DELETE FROM tbBooks
WHERE BookId IN (SELECT TOP 2 BookId FROM tbBooks ORDER BY BookId ASC)

--Check data in table
SELECT * FROM tbBooks 
BookId BookName    Author  Publisher            BookPrice
3       VB.Net      Nancy   Rozy Publication    970.00
4       MVC         Sahil   Amar Publication    1480.00
5       JAVA        Supreet Sam Publication     850.00
6       PHP         Parvesh Maya Publication    800.00
7       ORACLE      Sunny   Amar Publication    1110.00
8       MYSQL       Shaheed Sam Publication     400.00
9       jQuery      Amit    Maya Publication    950.00

Step 7: Delete last 2 records:

DELETE FROM tbBooks
WHERE BookId IN (SELECT TOP 2 BookId FROM tbBooks ORDER BY BookId DESC)

--Check data in table
SELECT * FROM tbBooks 
BookId BookName    Author  Publisher            BookPrice
3       VB.Net      Nancy   Rozy Publication    970.00
4       MVC         Sahil   Amar Publication    1480.00
5       JAVA        Supreet Sam Publication     850.00
6       PHP         Parvesh Maya Publication    800.00
7       ORACLE      Sunny   Amar Publication    1110.00

Example 2: You can use this example database schema for this question or otherwise, you may use your database table for testing.

CREATE TABLE `students` (
  `id` int UNSIGNED NOT NULL,
  `stid` varchar(16) NOT NULL,
  `created_at` datetime NOT NULL,
  `updated_at` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP
);

ALTER TABLE `students` ADD PRIMARY KEY (`id`);

Delete Records Using The Primary Key

This SQL query will delete every record which has a primary key id less than or equal to 100.

DELETE FROM `students` WHERE `id` <= 100;

This SQL query will delete every record which has an id greater than 900.

DELETE FROM `students` WHERE `id` > 900;

The aforementioned queries delete records based on the table's id value. But what happens if the table's id isn't incremented exactly one at a time? Some of the records may already have been deleted, or the id may not even be numeric. In those cases, the queries above would not work as planned.

Delete Records Using ORDER BY

This query will sort the entire table by created_at column and delete the first 100 records.

DELETE FROM `students` ORDER BY `created_at` ASC LIMIT 100;

To invert the order of the above query, we can use DESC (descending) instead of ASC (ascending), which will eliminate the last 100 records.

DELETE FROM `students` ORDER BY `created_at` DESC LIMIT 100;

SQL Delete Large Number of Records

APPROACH 1: Some reasons for deleting records are:

  • Corrupted or useless data
  • Private and confidential customer information (data that customers ask you to delete)
  • Historical data (usually more than 5 years old)

This process can be accomplished quickly with the DELETE command, but it becomes more difficult when the data contains millions of records and is linked to other tables.

You can run into the following scenarios:

  • The deletion of the requested records takes many hours.
  • It times out without deleting any records.

This is due to the way the delete statement operates:

  • The delete statement looks for records that are affected by foreign keys.
  • The delete statement loads the transaction log.
  • There may be triggers that execute before or after the delete statement.
  • Many indexes could be associated with the records, or indexes on foreign keys could be missing.
  • There is deadlocking or blocking on the affected rows.

There are some options for dealing with this issue:

1. Demo Data

For this example, we'll create a large dataset using the Sales.CustomerTransactions table in the WideWorldImporters database.

Starting with the initial count of 97,147 entries in the CustomerTransactions table, we'll use a WHILE loop to insert that data 49 more times, totaling 4,857,350 rows.

/*Creating backup table*/
SELECT *
INTO Sales.BK_CustomerTransactions
FROM Sales.CustomerTransactions;

/*Inserting backup into Sales.CustomerTransactions*/
DECLARE @COUNT INT = 0;
WHILE (@COUNT < 49)
BEGIN
INSERT INTO Sales.CustomerTransactions (CustomerID
  ,TransactionTypeID
  ,InvoiceID
  ,PaymentMethodID
  ,TransactionDate
  ,AmountExcludingTax
  ,TaxAmount
  ,TransactionAmount
  ,OutstandingBalance
  ,FinalizationDate
  ,IsFinalized
  ,LastEditedBy
  ,LastEditedWhen)
SELECT CustomerID
  ,TransactionTypeID
  ,InvoiceID
  ,PaymentMethodID
  ,TransactionDate
  ,AmountExcludingTax
  ,TaxAmount
  ,TransactionAmount
  ,OutstandingBalance
  ,FinalizationDate
  ,IsFinalized
  ,LastEditedBy
  ,LastEditedWhen
FROM Sales.BK_CustomerTransactions;
    SET @COUNT = @COUNT + 1;
END

After inserting the data, the table contains 4,857,350 records.

For this demo, we are going to delete records with TransactionTypeID = 1, affecting 3,525,500 records.

2. Delete In Chunks In Sql Server

One approach for coping with large amounts of data is to divide and conquer.

In this example, we're talking about breaking down the data into smaller bits so that SQL Server can handle it without timeout.

For example, if we have a database containing 4.8 million records, we will partition them into 50 pieces and delete one chunk of 100,000 records each time the loop runs.

DECLARE @NRO INT = 0;
/*DEFINING WHILE FOR DELETING CHUNKS OF 100000 RECORDS*/
WHILE (@@ROWCOUNT > 0)
BEGIN
PRINT @@ROWCOUNT
    SET @NRO = @NRO + 1;
    PRINT CONCAT(100000 * @NRO , 'ROWS DELETED.') 
    DELETE TOP (100000)
    FROM SALES.CustomerTransactions
    WHERE TransactionTypeID = 1; 
END

All records that had TransactionTypeID = 1 were deleted.

Delete took 2 minutes and 35 seconds.

3. Truncate And Re-Insert Table Data

When deleting or impacting a large number of records, truncate and re-insert data in a table works like a charm.

When using this strategy, we must take the following steps:

  • Store the table's data in another table (typically a temp table).
  • Truncate the table.
  • Insert the data from the temp table back into the final table, filtering out the rows to be deleted.
  • Drop the temp table.

The benefit of this operation stems from the fact that the truncate and insert statements do not load each row into the transaction log (unless you are in Full Recovery Mode), and from the fact that truncate does not verify referential integrity row by row when removing data.

SELECT *
INTO ##BK_Sales_CustomerTransactions
FROM Sales.CustomerTransactions;
GO
/*TRUNCATE Sales.CustomerTransactions TABLE */
TRUNCATE TABLE Sales.CustomerTransactions;
/*INSERTING DATA FILTERING TRX TYPE 1*/
INSERT INTO Sales.CustomerTransactions (
   CustomerTransactionID
   ,CustomerID
   ,TransactionTypeID
   ,InvoiceID
   ,PaymentMethodID
   ,TransactionDate
   ,AmountExcludingTax
   ,TaxAmount
   ,TransactionAmount
   ,OutstandingBalance
   ,FinalizationDate
   ,IsFinalized
   ,LastEditedBy
   ,LastEditedWhen)
SELECT
   CustomerTransactionID
   ,CustomerID
   ,TransactionTypeID
   ,InvoiceID
   ,PaymentMethodID
   ,TransactionDate
   ,AmountExcludingTax
   ,TaxAmount
   ,TransactionAmount
   ,OutstandingBalance
   ,FinalizationDate
   ,IsFinalized
   ,LastEditedBy
   ,LastEditedWhen
FROM ##BK_Sales_CustomerTransactions
WHERE TransactionTypeID != 1;

Delete took 1 minute and 3 seconds

4. Performance Comparison

We'll start by backing up the Sales.CustomerTransactions table and deleting records with TransactionTypeID = 1 to test performance.

SELECT *
INTO ##bkSales_CustomerTransactions
FROM Sales.CustomerTransactions;

After creating the table backup we are going to start testing the performance for the cases that we have.

DELETE 
FROM Sales.CustomerTransactions 
WHERE TransactionTypeID = 1; 

This query took 7 minutes and 41 seconds

Sometimes, when deleting large amounts of data, queries can time out. As you can see below, we have the following times for each case:

  • Simple delete: 7 minutes and 41 seconds.
  • Delete in chunks: 2 minutes and 35 seconds.
  • Truncate and insert: 1 minute and 3 seconds.

Truncate and insert is about 60% faster than deleting records in chunks.

5. Conclusions

It is more efficient to use the truncate-and-insert method when deleting millions of entries, although there are circumstances where truncate is difficult owing to limitations. In those cases, the delete-in-chunks method can be used.

APPROACH 2:

Deleting a large number of rows from a table can be a pain in the neck: a never-ending command that degrades performance, creates lock contention, and generates a large number of I/O-intensive write operations, including on the archived/transaction logs. In this post, we'll go over some of the most important things to think about if you need to remove millions of entries from a table.

Check if you can use Truncate

The most effective way to remove all the rows from a table is to use the Truncate Table command. Instead of eliminating rows row by row as the Delete command does, this command usually deallocates the table's data pages (in some DBMSs, like MySQL, the Truncate command drops and re-creates the table). Truncate is frequently faster than Delete because of this. Normally, Truncate does not fire any table triggers.

But be careful: in some DBMSs, such as Oracle and MySQL, the truncate command causes an implicit commit, which means you can't undo it. In Oracle, you can't even use a flashback table command to recover the truncated rows.

Syntax:

The basic syntax of the Truncate command is:

Truncate table table_name;

When you need to remove almost all the rows from a huge table (e.g. more than 80%) and the rollback/recovery restriction is not an issue for you, you may also consider using Truncate. In this case, you should:

  • Create a new table holding the rows that should not be truncated, using Create Table As Select or the similar command in your DBMS (e.g. Select * Into for MS SQL Server);
  • Truncate the original table;
  • Write the rows back to the original table, using an Insert as Select command;
  • Drop the table you created in the first step.

In the following image, we want to remove all the rows from Sales where SalesYear <> 2019 (using Oracle syntax).
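As a stand-in for that image, here is a minimal sketch of the pattern in Oracle syntax (the sales table and salesyear column follow the example above; the staging table name is an assumption):

-- keep only the 2019 rows: copy them out, truncate, copy back, drop the staging copy
CREATE TABLE sales_keep AS
  SELECT * FROM sales WHERE salesyear = 2019;

TRUNCATE TABLE sales;

INSERT INTO sales
  SELECT * FROM sales_keep;

DROP TABLE sales_keep;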

Again, because data recovery may be limited, you MUST TEST THIS PROCEDURE CAREFULLY BEFORE USING IT.

Additional constraints may apply to Truncate, such as the table not being able to be referenced by enabled foreign keys. To validate rollback options and execution constraints, consult your DBMS's documentation.

Disable foreign keys that reference the table

You've confirmed that Truncate isn't for you. Then you'll need to use the delete command. When a delete operation is performed on a table, the DBMS examines any foreign keys that refer to it in order to apply the On Delete rules (i.e. restrict, cascade and set null).

When millions of rows are affected by a deletion, it may require some time and computing effort.

Furthermore, in many DBMS, such a deletion would result in lock escalation (i.e., the DBMS would lock more than just the pages with the removed rows) and might have a substantial impact on DBMS users.

As a result, if you can guarantee the referential rules yourself while the delete is being performed, you can benefit from disabling any foreign keys that reference the table. The easiest approach is to generate the disable and enable commands from the DBMS's metadata.
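In SQL Server, for example, a minimal sketch like the following can generate the disable commands (dbo.Sales is an assumed table name; afterwards, re-enable each constraint with ALTER TABLE ... WITH CHECK CHECK CONSTRAINT):

SELECT 'ALTER TABLE ' + QUOTENAME(OBJECT_SCHEMA_NAME(fk.parent_object_id)) + '.'
     + QUOTENAME(OBJECT_NAME(fk.parent_object_id))
     + ' NOCHECK CONSTRAINT ' + QUOTENAME(fk.name) + ';'
FROM sys.foreign_keys AS fk
WHERE fk.referenced_object_id = OBJECT_ID(N'dbo.Sales');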

Disable (almost) all indexes:

During a delete, all indexes on the table must be maintained. To reduce this overhead during a large delete, you should disable the table's indexes beforehand and rebuild them afterward (a sketch follows the list below).

There are just two exceptions to this rule:

  • Clustered indexes – in some DBMSs, a table becomes inaccessible when its clustered index is disabled (see MS SQL Server, for example).
  • Indexes that are handy for the delete itself – you should review the delete's execution plan before running it (using the Explain Plan command, for example). If you are deleting up to about 15% of the table's data, the DBMS may use an index to choose the rows to be deleted; if so, that index should not be disabled.
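Here is a minimal SQL Server sketch of that pattern (the index and table names are assumptions):

-- disable a nonclustered index before the big delete
ALTER INDEX IX_Sales_SaleDate ON dbo.Sales DISABLE;

-- ... run the large DELETE here ...

-- rebuild the index afterward (a rebuild also re-enables it)
ALTER INDEX IX_Sales_SaleDate ON dbo.Sales REBUILD;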

Delete chunks of data – verify partitioning rules

Long-running deletes of millions of rows can have a variety of consequences for the system and end users, including lock issues. If, instead of a single long-running delete, you issue numerous delete operations over "small" chunks of rows, the effect on DBMS users will be smaller (in terms of lock contention and performance loss). Locks are freed after each delete, since you can commit the transaction and free up system resources. So, if you want to delete 200 million rows from a table, you can consider doing ten 20-million-row deletes.

If you combine your chunk generation approach with a table partitioning technique, the command execution will be substantially improved.

Assume that SaleDate and SaleDep are among the columns in your table. You need to eliminate rows with SaleDep = 'Sports' from your database, which is partitioned by Month/Year of SaleDate.

You can run the delete many times, each time with SaleDep = 'Sports' and SaleDate in a specific Month/Year. Below is an example of such a delete for three months in the year 2000. If you have a lot of partitions, you can use a loop to generate your delete commands.
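A minimal sketch in Oracle syntax (the sales table and column names follow the example above; the exact dates are assumptions):

-- one delete per Month/Year partition, committed as it goes
DELETE FROM sales
 WHERE saledep = 'Sports'
   AND saledate >= DATE '2000-01-01' AND saledate < DATE '2000-02-01';
COMMIT;

DELETE FROM sales
 WHERE saledep = 'Sports'
   AND saledate >= DATE '2000-02-01' AND saledate < DATE '2000-03-01';
COMMIT;

DELETE FROM sales
 WHERE saledep = 'Sports'
   AND saledate >= DATE '2000-03-01' AND saledate < DATE '2000-04-01';
COMMIT;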

Each delete would be performed by the DBMS on a single table partition, which might considerably improve the delete's performance.


SQL Delete Last Inserted Row

You must use ORDER BY DESC with LIMIT to delete the last record (on condition) from a table.

Syntax:

DELETE FROM yourTableName 
WHERE yourColumnName1=yourValue ORDER BY yourColumnName2 DESC LIMIT 1;

The above syntax deletes the final record from a table based on a criteria. It chooses the first element to delete after sorting the column in descending order.

Step 1: let us create a table. The query to create a table is as follows:

create table UserLoginTable(Id int NOT NULL AUTO_INCREMENT,
UserId int,UserLoginDateTime datetime,PRIMARY KEY(Id));

Step 2: Insert some records in the table using insert command. The query is as follows:

insert into UserLoginTable(UserId,UserLoginDateTime) values(2,'2019-01-27 13:47:20');
insert into UserLoginTable(UserId,UserLoginDateTime) values(1,'2018-11-28 12:30:12');
insert into UserLoginTable(UserId,UserLoginDateTime) values(2,'2019-01-26 11:30:30');
insert into UserLoginTable(UserId,UserLoginDateTime) values(1,'2015-03-11 15:23:55');
insert into UserLoginTable(UserId,UserLoginDateTime) values(2,'2019-03-21 16:01:56');

Step 3: display all records from the table using select statement. The query is as follows:

select * from UserLoginTable;

Output:

+----+--------+---------------------+
| Id | UserId | UserLoginDateTime   |
+----+--------+---------------------+
|  1 |      2 | 2019-01-27 13:47:20 |
|  2 |      1 | 2018-11-28 12:30:12 |
|  3 |      2 | 2019-01-26 11:30:30 |
|  4 |      1 | 2015-03-11 15:23:55 |
|  5 |      2 | 2019-03-21 16:01:56 |
+----+--------+---------------------+

Step 4: Here is the query to delete last record (on condition) from a table:

delete from UserLoginTable where UserId=2 ORDER BY UserLoginDateTime DESC LIMIT 1;

Check the table record once again using select statement. The query is as follows:

select * from UserLoginTable;

Output:

+----+--------+---------------------+
| Id | UserId | UserLoginDateTime   |
+----+--------+---------------------+
|  1 |      2 | 2019-01-27 13:47:20 |
|  2 |      1 | 2018-11-28 12:30:12 |
|  3 |      2 | 2019-01-26 11:30:30 |
|  4 |      1 | 2015-03-11 15:23:55 |
+----+--------+---------------------+

SQL Delete Multiple Tables Inner Join

In a previous blog post, I described how to use JOIN and multiple tables in an UPDATE statement. In the same way, you can use the DELETE clause with a JOIN in MySQL to delete data from multiple tables that meet the specified condition all at once.

It is totally possible to use JOIN and multiple tables in the DELETE statement.

Example 1:

Step 1: Let's use the same table structure we used previously. We have two tables, Table1 and Table2.

Create table1

CREATE TABLE Table1 (Col1 INT, Col2 INT, Col3 VARCHAR(100))
INSERT INTO Table1 (Col1, Col2, Col3)
SELECT 1, 11, 'First'
UNION ALL
SELECT 11, 12, 'Second'
UNION ALL
SELECT 21, 13, 'Third'
UNION ALL
SELECT 31, 14, 'Fourth'
GO

Create table2

CREATE TABLE Table2 (Col1 INT, Col2 INT, Col3 VARCHAR(100))
INSERT INTO Table2 (Col1, Col2, Col3)
SELECT 1, 21, 'Two-One'
UNION ALL
SELECT 11, 22, 'Two-Two'
UNION ALL
SELECT 21, 23, 'Two-Three'
UNION ALL
SELECT 31, 24, 'Two-Four'
GO

Step 2: Now let us check the content in the table.

SELECT *
FROM Table1
SELECT *
FROM Table2
GO

Now pay close attention to the diagram below. Table 1 and Table 2 are the two tables we have here. Our need is to delete those two records from Table1 whose Table2 Col3 values are "Two-Three" and "Two-Four" and whose Col1 values are the same in both tables.

Developers sometimes write cursors, table variables, local variables, and so on for this. However, the simplest and cleanest way to do the work is to use the JOIN clause in the DELETE statement, naming multiple tables in the DELETE statement.

Step 3: Delete data from Table1

DELETE Table1
FROM Table1 t1
INNER JOIN Table2 t2 ON t1.Col1 = t2.Col1
WHERE t2.Col3 IN ('Two-Three','Two-Four')
GO

Step 4: Now let us select the data from these tables.

Check the content of the table

SELECT *
FROM Table1
SELECT *
FROM Table2
GO

As you can see, using the JOIN clause in the DELETE statement makes deleting data from one table based on another extremely simple. You may also use the MERGE statement to accomplish the same goal, but I prefer this way. Let's clean up by dropping the tables we've made.

DROP TABLE Table1
DROP TABLE Table2
GO

Example 2: Use Case

Let me use an example to illustrate how we can use the DELETE clause and INNER JOIN to remove rows from multiple tables. Consider the queries below:

CREATE SCHEMA society;
USE society;
DROP TABLE IF EXISTS users, contacts;
CREATE TABLE users(
    user_id INT PRIMARY KEY AUTO_INCREMENT,
    first_name VARCHAR(100),
    last_name VARCHAR(100),
    state VARCHAR(50)
);
CREATE TABLE contacts(
    home_id INT PRIMARY KEY AUTO_INCREMENT,
    tel VARCHAR(50),
    address VARCHAR(255)
);

INSERT INTO users(first_name, last_name, state) VALUES 
("John", "Muller", "Colorado"), 
("Mary", "Jane", "California"), 
("Peter", "Quill", "New York");

INSERT INTO contacts(tel, address) VALUES 
("303-555-0156", "281 Denver, Colorado"), 
("661-555-0134", "302 Drive, Bakersfield"), 
("516-555-0148", "626 Est Meadow, NYC");

Once we have such data, we can illustrate how to use DELETE with INNER JOIN as shown in the query below:

DELETE society.users, society.contacts FROM society.users 
INNER JOIN contacts ON user_id=home_id WHERE user_id=3;

The query above will display the result as shown below:

two rows affected in 7 ms, which indicates that two rows have been removed.

Example 3: Delete with LEFT JOIN

The second Delete method we will discuss is to use the LEFT JOIN. The general syntax for this delete type is as shown below:

DELETE tbl1 FROM tbl1 LEFT JOIN tbl2 ON tbl1.col = tbl2.col WHERE tbl2.col IS NULL;

For DELETE with LEFT JOIN, we specify only one table, unlike the INNER JOIN where we specified two tables.

Consider the entire query below:

USE society;

DROP TABLE IF EXISTS users, contacts;

CREATE TABLE users(
    user_id INT PRIMARY KEY AUTO_INCREMENT,
    first_name VARCHAR(100),
    last_name VARCHAR(100),
    state VARCHAR(50)
);

CREATE TABLE contacts(
    home_id INT PRIMARY KEY AUTO_INCREMENT,
    tel VARCHAR(50),
    address VARCHAR(255)
);

INSERT INTO users(first_name, last_name, state) VALUES ("John", "Muller", "Colorado"), 
("Mary", "Jane", "California"), 
("Peter", "Quill", "New York"), 
("Mystic", "Arts", "South Carolina");

INSERT INTO contacts(tel, address) VALUES ("303-555-0156", "281 Denver, Colorado"), 
("661-555-0134", "302 Drive, Bakersfield"), 
("516-555-0148", "626 Est Meadow NYC"), 
("843-555-0105", null);
 
DELETE users FROM users LEFT JOIN contacts ON user_id = home_id WHERE address IS NULL;
SELECT * FROM users;

When the above query is run, the user whose address is NULL after the JOIN is deleted, and the returned result is as shown below.


SQL Delete Multiple Tables

Using the MySQL DELETE JOIN statement, you can delete data from multiple tables in either of two ways:

1. A single DELETE statement on multiple tables.

2. A single DELETE statement on multiple related tables, where the child table has an ON DELETE CASCADE referential action for its foreign key.
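As a quick illustration of the second option, here is a minimal sketch (parent and child are assumed table names):

-- child rows are removed automatically when their parent row is deleted
CREATE TABLE parent (
    id INT PRIMARY KEY AUTO_INCREMENT
);

CREATE TABLE child (
    id INT PRIMARY KEY AUTO_INCREMENT,
    parent_id INT,
    FOREIGN KEY (parent_id) REFERENCES parent(id) ON DELETE CASCADE
);

DELETE FROM parent WHERE id = 1;  -- also deletes the matching rows in child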

Example 1: MySQL DELETE JOIN with INNER JOIN

You can also delete rows from one table and the matching rows from another table using the INNER JOIN clause in the DELETE command in MySQL.

For example, you may use the following statement to delete rows from both the T1 and T2 tables that meet a particular condition:

DELETE T1, T2
FROM T1
INNER JOIN T2 ON T1.key = T2.key
WHERE condition;

It's worth noting that the table names T1 and T2 are sandwiched between the DELETE and FROM keywords. If you omit the T1 table, the DELETE statement deletes only rows in the T2 table; similarly, if you omit the T2 table, it deletes only rows in the T1 table.

The criteria for matching rows between the T1 and T2 tables that will be eliminated is T1.key = T2.key.

The WHERE clause's condition determines which rows in T1 and T2 will be removed.
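For instance, to remove the matching rows from T2 only and leave T1 intact, name just T2 between DELETE and FROM; this is a minimal sketch of the syntax described above:

DELETE T2
FROM T1
INNER JOIN T2 ON T1.key = T2.key
WHERE condition;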

Example 2: MySQL DELETE JOIN with INNER JOIN example:

Suppose, we have two tables t1 and t2 with the following structures and data:

DROP TABLE IF EXISTS t1, t2;

CREATE TABLE t1 (
    id INT PRIMARY KEY AUTO_INCREMENT
);

CREATE TABLE t2 (
    id VARCHAR(20) PRIMARY KEY,
    ref INT NOT NULL
);

INSERT INTO t1 VALUES (1),(2),(3);

INSERT INTO t2(id,ref) VALUES('A',1),('B',2),('C',3);

Example 3: MySQL DELETE JOIN:

Using the DELETE...INNER JOIN statement, the following statement deletes the row with id 1 in the t1 table as well as the record with ref 1 in the t2 table:

DELETE t1,t2 FROM t1
        INNER JOIN
    t2 ON t2.ref = t1.id 
WHERE
    t1.id = 1;

The statement returned the following message:

2 row(s) affected

It indicated that two rows have been deleted.

Example 4: MySQL DELETE JOIN with LEFT JOIN:

To discover rows in the left table that have or don't have matching rows in the right table, we frequently utilise the LEFT JOIN clause in the SELECT statement.

The LEFT JOIN clause in the DELETE statement can also be used to delete rows from a table (left table) that do not match entries in another table (right table).

The following syntax shows how to delete rows from T1 table that do not have matching rows in T2 table using DELETE statement with LEFT JOIN clause:

DELETE T1 
FROM T1
        LEFT JOIN
    T2 ON T1.key = T2.key 
WHERE
    T2.key IS NULL;

Note that, unlike the INNER JOIN clause, we just put T1 table after the DELETE keyword, not both T1 and T2 tables.

Example 5: MySQL DELETE JOIN with LEFT JOIN example:

See the following customers and orders tables in the sample database:

Customers and Orders Tables

There are zero or more orders for each customer. Each order, on the other hand, belongs to a single customer.

To effectively remove our customers' master data, we can utilise the DELETE statement with the LEFT JOIN clause. Customers who have not placed an order are removed using the following statement:

DELETE customers 
FROM customers
LEFT JOIN
    orders ON customers.customerNumber = orders.customerNumber 
WHERE
    orderNumber IS NULL;

We can confirm the deletion by running the following query to see if any customers with no orders exist:

SELECT c.customerNumber, 
    c.customerName, 
    orderNumber
FROM
    customers c
        LEFT JOIN
    orders o ON c.customerNumber = o.customerNumber
WHERE
    orderNumber IS NULL;

The query returned an empty result set which is what we expected.

Example 6: When performing a single-table DELETE or UPDATE, you refer only to the columns of one table, so you don't need to qualify the column names with the table name. For example, this statement deletes all rows in table t with id values greater than 100:

DELETE FROM t WHERE id > 100;

To build a multiple-table DELETE, use a FROM clause to name all the tables and a WHERE clause to define the conditions used to match rows in the tables. The following statement deletes rows from table t1 when there is a matching id value in table t2:

DELETE t1 FROM t1 INNER JOIN t2 ON t1.id = t2.id;

It's important to note that if a column name appears in multiple tables, it's ambiguous and must be qualified with a table name.

The syntax also allows you to delete records from several tables at the same time. Name both tables after the DELETE keyword to delete records with matching id values from both tables:

DELETE t1, t2 FROM t1 INNER JOIN t2 ON t1.id = t2.id;

What if you need to get rid of rows that don't match? A multiple-table DELETE can use any type of join that you can write in a SELECT, so apply the same method you would use for identifying nonmatching data in a SELECT. In other words, use a LEFT JOIN or a RIGHT JOIN. To find rows in t1 that have no match in t2, write a SELECT like this:

SELECT t1.* FROM t1 LEFT JOIN t2 ON t1.id = t2.id WHERE t2.id IS NULL;

The analogous DELETE statement to find and remove those rows from t1 uses a LEFT JOIN as well:

DELETE t1 FROM t1 LEFT JOIN t2 ON t1.id = t2.id WHERE t2.id IS NULL;

A second multiple-table DELETE syntax is supported by MySQL. A FROM clause is used to indicate the tables from which rows should be deleted, and a USING clause is used to join the tables that determine which rows should be deleted. The following syntax can be used to rewrite the preceding multiple-table DELETE statements:

DELETE FROM t1 USING t1 INNER JOIN t2 ON t1.id = t2.id;
DELETE FROM t1, t2 USING t1 INNER JOIN t2 ON t1.id = t2.id;
DELETE FROM t1 USING t1 LEFT JOIN t2 ON t1.id = t2.id WHERE t2.id IS NULL;

The techniques for constructing multiple-table UPDATE statements are nearly identical to those for DELETE: name all the tables involved in the operation, and qualify column references as needed. Suppose, for example, that you want to raise everyone's score by one point for a given grade event. You can achieve this with a multiple-table UPDATE as follows:

UPDATE score, grade_event SET score.score = score.score + 1
WHERE score.event_id = grade_event.event_id
AND grade_event.date = '2012-09-23' AND grade_event.category = 'Q';

In this case, you could accomplish the same objective using a single-table update and a subquery:

UPDATE score SET score = score + 1
WHERE event_id = (SELECT event_id FROM grade_event
WHERE date = '2012-09-23' AND category = 'Q');

Other multiple-table updates, however, cannot be written using subqueries. For example, you might wish to copy column values from one table to another while identifying the rows to update based on the contents of another table. For rows with a matching id column value, the following statement copies t1.a to t2.a:

UPDATE t1, t2 SET t2.a = t1.a WHERE t2.id = t1.id;

To perform multiple-table deletes or updates for InnoDB tables, you need not use the syntax just described. Instead, set up a foreign key relationship between tables that includes an ON DELETE CASCADE or ON UPDATE CASCADE constraint. For details, see Section 2.13, “Foreign Keys and Referential Integrity.”

Example 7: When you wish to delete a row in one table that is linked to other rows in another table, things get a little more tricky.

Each employee, for example, is assigned to one or more regions, each of which contains many employees. The employeeterritories table is used to keep track of employee-territory relationships.

When you delete a record from the employees table, you must also delete the related rows from the employeeterritories table. You must run two DELETE statements to accomplish this:

DELETE FROM employees
WHERE employeeID = 3;

DELETE FROM employeeterritories
WHERE employeeID = 3;

Most database management systems let you implement a foreign key constraint so that when you delete a row in one table, the corresponding rows in the associated table are automatically deleted as well. This ensures the data's integrity. In that scenario, you only need to run the first DELETE statement to delete rows from the two tables.

If your database management system doesn't support foreign key constraints, you'll need to run both DELETE statements in a single transaction to ensure that they run in all-or-nothing mode.
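A minimal sketch of that transaction, in SQL Server syntax (deleting the child rows first so no orphaned references remain):

BEGIN TRANSACTION;

DELETE FROM employeeterritories
WHERE employeeID = 3;

DELETE FROM employees
WHERE employeeID = 3;

COMMIT TRANSACTION;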


SQL Delete Parent Child Rows

When removing records from tables with a parent-child connection specified, using the DELETE CASCADE option in your foreign key constraint declarations means faster performance and less code.

As an example, suppose Table A and Table B are our two tables. If I delete a row from Table A, all related rows in Table B should be deleted as well. In SQL Server, how do we accomplish this?

Instead of Table A and Table B, let's utilise Departments and Employees tables to add some context and clarity.

1. Delete parent child rows in SQL

When a row from the Departments table is deleted, the deletion of all related entries from the Employees table must follow. If we delete the IT department row from the Departments table, we also want all of the IT department's employees to be removed from the Employees table.

The Employees table's DeptId column is a foreign key to the Departments table's Id column.

As a result, when an entry in the Departments table is deleted, we want all of the employees in that department to be removed from the Employees table as well. Because DeptId is a foreign key, the correct approach to accomplish this is to enable cascade deletes on that constraint.

If we try to delete a row from the Departments table, and that department has related rows in the Employees table, by default we get the following REFERENCE CONSTRAINT error:

The DELETE statement conflicted with the REFERENCE constraint "FK__Employees__DeptI__38996AB5". The conflict occurred in database "TestDB", table "dbo.Employees", column 'DeptId'.

SQL Server Foreign Key Constraint Cascade Delete:

1. First, drop the existing foreign key constraint:

Alter table Employees drop constraint Constraint_Name

2. Recreate the foreign key constraint with cascading deletes:

Alter table Employees
add constraint FK_Dept_Employees_Cascade_Delete
foreign key (DeptId) references Departments(Id) on delete cascade

When we delete an item from the Departments table with foreign key constraint cascade deletes enabled, all related rows from the Employees table are also automatically erased.

Example: Same foreign key in multiple tables

What if there are numerous tables with the same foreign key? GenderId is a foreign key referencing the Id column from the Gender table in both the tables (Teachers and Students) in the following example.

When foreign key cascade deletes are enabled, deleting a row from the Gender table automatically deletes all related rows from both tables (i.e. Teachers and Students) as well.

What if we don't have a foreign key constraint, or don't want cascade deletes turned on? In that case, we can delete from the child tables and then the parent table inside a transaction:

Begin Try

       Begin Tran 

       Declare @GenderToDelete int = 2 

       -- Delete first from child tables
       Delete from Teachers where GenderId = @GenderToDelete
       Delete from Students where GenderId = @GenderToDelete 

       -- Finally Delete from parent table
       Delete from Gender where Id = @GenderToDelete 

       Commit Tran
End Try 

Begin Catch

       Rollback Tran

End Catch

Please note: If you delete child records before deleting the parent record, your queries will start to fail if a foreign key constraint is added later.

SQL Script for tables (Departments and Employees)

Create table Departments
(
       Id int primary key identity,
       [Name] nvarchar(50)
)
Go

Create table Employees
(
       Id int primary key identity,
       [Name] nvarchar(50),
       DeptId int foreign key references Departments(Id)
)
Go 

Insert into Departments values ('IT')
Insert into Departments values ('HR')
Go 

Insert into Employees values ('Mark', 1)
Insert into Employees values ('Mary', 1)
Insert into Employees values ('John', 2)
Insert into Employees values ('Sara', 2)
Insert into Employees values ('Steve', 2)

SQL Script for tables (Gender, Teachers and Students)

Create table Gender
(
       Id int primary key identity,
       Gender nvarchar(20)
)
Go 

Create table Teachers
(
       Id int primary key identity,
       [Name] nvarchar(50),
       GenderId int foreign key references Gender(Id) on delete cascade
)
Go

Create table Students
(
       Id int primary key identity,
       [Name] nvarchar(50),
       GenderId int foreign key references Gender(Id) on delete cascade
)
Go

Insert into Gender values ('Male')
Insert into Gender values ('Female')
Go

Insert into Teachers values ('Mark', 1)
Insert into Teachers values ('John', 1)
Insert into Teachers values ('Mary', 2)
Insert into Teachers values ('Sara', 2)
Insert into Teachers values ('Flo', 2)
Go

Insert into Students values ('David', 1)
Insert into Students values ('Ron', 1)
Insert into Students values ('Jess', 2)
Insert into Students values ('Tara', 2)
Insert into Students values ('Innes', 2)
Go

Alter table Teachers
add constraint FK_Gender_Employees
foreign key (GenderId) references Gender(Id)

Alter table Students
add constraint FK_Gender_Students
foreign key (GenderId) references Gender(Id)

Example 2: Assume two tables in sql

TableA (Parent) --> TableB (Child)

Every row in TableA has several child rows related to it in TableB.

I want to delete specific rows in TableA, which means I have to delete the related rows in TableB first.

This deletes the child entries

delete from tableB where last_update_Dtm = sysdate-30;

To delete the parent rows for the rows just deleted in the child table, I could do something like this:

Delete from TableA where not exists (select 1 from tableB where tableA.key=tableB.key);

The above will also delete rows in TableA that have no child rows at all, even where (last_update_Dtm = sysdate-30) was false for their children. TableA does not have a last_update_dtm column, so there is no way of knowing which rows to delete without the entries in the child table.

I could save the keys in the child table prior to deleting but this seems like an expensive approach. What is the correct way of deleting the rows in both tables?

Two possible approaches.

If you have a foreign key, declare it with on-delete-cascade and delete the parent rows older than 30 days; all of the child rows will be deleted automatically.
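A minimal sketch of that first approach in Oracle syntax (the constraint name is an assumption, and updd_tms < sysdate - 30 is used as a plain reading of "older than 30 days"):

-- recreate the child's foreign key with ON DELETE CASCADE
ALTER TABLE child_table ADD CONSTRAINT fk_child_parent
  FOREIGN KEY (parent_id) REFERENCES parent_table (parent_id)
  ON DELETE CASCADE;

-- a single delete on the parent now removes the children too
DELETE FROM parent_table
 WHERE updd_tms < sysdate - 30;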

Based on your description, it appears that you know which parent rows you want to delete and which child rows you need to delete. Have you ever experimented with SQL in this way?

delete from child_table
  where parent_id in (
    select parent_id from parent_table
      where updd_tms != (sysdate-30)
  );

-- now delete the parent table records

delete from parent_table
where updd_tms != (sysdate-30);

Based on your requirement, it looks like you might have to use PL/SQL. I'll see if someone can post a pure SQL solution to this (in which case that would definitely be the way to go).

declare
    foreign_key_violated exception;
    PRAGMA EXCEPTION_INIT(foreign_key_violated, -2292); -- ORA-02292: child record found
begin
    for v_rec in (select parent_id, child_id from child_table
                         where updd_tms != (sysdate-30) ) loop

    -- delete the children
    delete from child_table where child_id = v_rec.child_id;

    -- delete the parent. If we get a foreign key violation,
    -- skip this parent and continue the loop
    begin
       delete from parent_table
          where parent_id = v_rec.parent_id;
    exception
       when foreign_key_violated
         then null;
    end;
 end loop;
end;

Example 3: Let's start by confirming that the DELETE CASCADE option is enabled on the foreign key between the Parent and Child tables in our existing schema. Here is the SQL query that will be used to check this, as well as the outcome.

SELECT name, delete_referential_action_desc
  FROM sys.foreign_keys;

name	                delete_referential_action_desc
FK_Child_Parent	                    CASCADE

Let's delete a record from the Parent table using the following SQL statement now that we've validated we have this option available.

DELETE FROM [dbo].[Parent] where ParentID=82433

When we look at the execution plan for this query, we can see that SQL Server deletes the child entries first, then deletes from the Parent table. As a result, each table only needs to be accessed once.

Let's examine whether there are any differences if we remove the DELETE CASCADE option from our foreign key definition. To do so, we'll have to drop the foreign key and recreate it without the DELETE CASCADE option. The SQL statements to accomplish this change are listed below.

ALTER TABLE [dbo].[Child] DROP CONSTRAINT [FK_Child_Parent]

ALTER TABLE [dbo].[Child]  WITH CHECK 
ADD CONSTRAINT [FK_Child_Parent] FOREIGN KEY([ParentID])
REFERENCES [dbo].[Parent] ([ParentID])

Once the foreign key has been recreated, we can execute a second delete to see how the speed compares. One thing to keep in mind is that without the DELETE CASCADE option, we must first run a separate delete statement to remove the records from the Child table. Here are the SQL statements that will be used to delete the data.

DELETE FROM [dbo].[Child] where ParentID=62433
DELETE FROM [dbo].[Parent] where ParentID=62433

We can observe that the execution plans for both statements are very similar. The only distinction is that, because we are performing separate delete statements, the Child table must be consulted a second time when deleting from the Parent table, in order to check the foreign key constraint.

Using the SQL Profiler output from each query, we can confirm that the extra scan of the Child table does indeed mean the DELETE CASCADE option performs best. The DELETE CASCADE option uses fewer resources in every category and runs around 20% faster, as seen below.

                       CPU	Reads	Writes	Duration
No Delete Cascade	344	28488	0	399
Delete Cascade	        250	14249	0	312

SQL Delete Rollback

If you haven't yet committed the transaction, try rolling it back. If you have already committed the transaction (by explicitly executing COMMIT, by closing the command-line client, or because the option autocommit is 1, which is the default), you must restore the data from your last backup.

Before doing any unsafe work, use the command SET autocommit=0. Your modifications will then be kept in your current transaction until you commit them.
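A minimal MySQL sketch of that safety net (the employee table and Empid column are assumptions, matching the example further below):

SET autocommit = 0;
DELETE FROM employee WHERE empid = 1;  -- not yet permanent
ROLLBACK;                              -- undo the delete
SELECT * FROM employee;                -- the row is back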

As with other statements inside a transaction, truncate can be rolled back; you can't undo it once the transaction has been committed.

My response: Why would you try to undo a transaction that has already been committed? Let's look at another operator, DELETE. Can we undo a delete action after it has been committed? TRUNCATE, I believe, behaves just like DELETE, INSERT, and UPDATE, and it can be rolled back when used within a transaction.

Argument 1: Truncate is not logged and can’t be recovered from the log file.

My response: Truncate is a logged operation; it just logs the page deallocations rather than each record being removed. The DELETE command, on the other hand, logs each and every record it removes from the table, which is why it takes longer to run.

Argument 2: Delete is better than truncate.

My response: Truncate and Delete aren't directly comparable. Truncate restores the table to its original state, including resetting the identity column, while delete only removes the data that meets the WHERE criteria. If the WHERE criteria is not specified, delete will remove all rows from the table but will not reset the identity column. They're quite different, and they're used in different ways.

Example: We now have a table with dummy data. Now let's test whether we can rollback a DELETE inside a TRANSACTION:

BEGIN TRANSACTION
--select * from employee
DELETE from Employee where Empid='1'
SELECT * from Employee
GO

We deleted the record where the Empid equals 1 and now we have only one record:

Single Record Deleted

Let’s try to rollback and see if we can recover the deleted record:

ROLLBACK TRANSACTION

SELECT * from employee

As you can see below, we have the record back.


SQL Delete Vs Truncate

Main Article :- Sql difference between DELETE and TRUNCATE

Delete

  • The Delete command can be used to delete all or a subset of rows from a table defined by a Where clause.
  • It's a DML command (Data Manipulation Language).
  • The SQL Delete statement locks each record in a table that needs to be deleted.
  • The DELETE statement deletes rows one by one and records each deleted row in the transaction log.
  • Rows are removed one by one.
  • The DELETE command is more time consuming than the TRUNCATE command.
  • You must have DELETE permission on the table to use Delete.
  • After performing the DELETE Statement on the table, the identity of the column is preserved.
  • With indexed views, the delete command can be used.
  • The object statistics and allocated space are retained when using the Delete command.
  • A trigger can also be activated by the delete command. Delete deletes the data from individual rows. As a result, a trigger is triggered.
  • The Delete command deletes records that match the where clause. It also leaves columns, indexes, constraints, and schema intact.

Truncate

  • The truncate command deletes all of a table's rows. In this case, we can't utilise a Where clause.
  • It is a DDL command (Data Definition Language).
  • It takes up less space in the transaction log than the Delete command.
  • To eliminate all records, the SQL Truncate statement locks the table and pages.
  • TRUNCATE TABLE deletes data by deallocating the data pages that were used to hold the table data, and only the page deallocations are recorded in the transaction log.
  • The TRUNCATE command is faster than the DELETE command.
  • To use Truncate on a table, you need at least ALTER permission on the table.
  • If the table contains an identity column, the column is reset to its seed value.
  • Truncate isn't compatible with indexed views.
  • A trigger cannot be activated by the truncate command: triggers fire when individual rows are modified, and truncate deallocates whole pages instead of touching rows.
  • The truncate command removes all of a table's rows. The columns, indexes, constraints, and structure are not removed.