How to read large SQL Server ERRORLOG (the 5GB test)
I often come across large ERRORLOGs mainly because they are not regularly recycled. Sometimes, the ERRORLOG can grow to an unmanageable size despite regular recycling. This happened to me during a significant transaction rollback over Synchronous Availability Group. The ERRORLOG log grew to 5GB in size within a few hours due to “Sending page request for Undo” messages.
The primary node failure triggered the rollback during an insert of 800 GB into a partitioned table. The failure caused the primary node to reboot, triggering failover to the secondary replica, and, of course, after the secondary node came back up, the rollback started. At that point, I needed to see what errors and failures were registered in the ERRORLOG during the service recovery, which proved difficult.
Using SSMS / Log File Viewer
Unless you use a log management solution such as Azure Log Analytics, the first course of action for most DBAs is to open SQL Server Log Viewer within the SSMS. In my case, however, it gave up quickly:
The out-of-memory exception is caused by the amount of data returned to the client. We have to be more clever. Let’s stop the loading right after opening the viewer using the “Stop” button on the top bar, apply the filter and then hit “Refresh”. This should return fewer rows and will work…, right?
Not much luck, either. Let’s look into alternative ways.
Using T-SQL
Behind the scenes, SSMS uses the dbo.xp_readerrorlog
and runs the following command to read the ERRORLOG:
EXEC sp_executesql
N'EXEC master.dbo.xp_readerrorlog
@ArchiveID
, 1
, @FilterText
, NULL
, @FirstEntry
, @LastEntry
, N''desc''
, @InstanceName '
,N'@InstanceName nvarchar(4000)
, @ArchiveID int
, @FilterText nvarchar(4000)
, @FirstEntry nvarchar(4000)
, @LastEntry nvarchar(4000)'
,@InstanceName = NULL
,@ArchiveID = 3
,@FilterText = NULL
,@FirstEntry = NULL
,@LastEntry = NULL
The procedure dbo.xp_readerrorlog
Accepts several parameters, including a filter. With this in mind, we can execute it in SSMS as a T-SQL code:
exec sp_executesql
N'EXEC master.dbo.xp_readerrorlog
@ArchiveID
, 1
, @FilterText
, NULL
, @FirstEntry
, @LastEntry
, N''desc''
, @InstanceName '
,N'@InstanceName nvarchar(4000)
, @ArchiveID int
, @FilterText nvarchar(4000)
, @FirstEntry nvarchar(4000)
, @LastEntry nvarchar(4000)'
,@InstanceName=NULL
,@ArchiveID=3
,@FilterText='ERROR' --our filter
,@FirstEntry=NULL
,@LastEntry=NULL
This has taken just under 30 seconds to scan through the 5GB log file and returned 387 rows that contain the word “ERROR”. Great, it gives me something, but on the other hand, the output isn’t particularly useful without the corresponding rows:
So, what better alternatives do we have to get any sensible information out of a large log file?
Using SQL Server Management Object (SMO)
We can use PowerShell to interrogate ERRORLOG via SMO:
Import-Module SQLPS
$SqlServer = new-object Microsoft.SqlServer.Management.Smo.Server "localhost"
Clear-Host
$measure = Measure-Command {$SqlServer.ReadErrorLog(3) | ?{$_.Text -like "*ERROR*"} | Out-GridView}
Using the Measure-Command
function, we can measure the execution time of a particular command or function. In this case, the above takes TotalMilliseconds : 1023529.8419
(over 17 minutes!) to run and eventually returns 387 rows into a GridView which we can filter further:
The slow performance is caused by the fact that it has to iterate through all the rows in the 5GB file to find those matching our ERROR filter. This is bound to be slow, and the result is the same as in the T-SQL example above.
Using text reader
The ERRORLOG is a text file on a local disk in the Log folder. We could open it in notepad, but that’s not a very convenient way to search through many text lines. We can also use PowerShell to read a text file using a highly efficient Select-String
function.
# Here we are using SMO to only get path to the ERRORLOG:
Import-Module SQLPS
$SqlServer = new-object Microsoft.SqlServer.Management.Smo.Server "localhost"
$ErrorLog = $SqlServer.ErrorLogPath
$LogNo = 3 #which ERROLOG do we want
Clear-Host
$measure = Measure-Command {Select-String -Path "$ErrorLogERRORLOG.$LogNo" -Pattern "ERROR" | Out-GridView}
In this case, the GridView opens instantly and loads filtered data in the background. In all earlier examples, we had to wait for the data set to be filtered and returned to us. In this example, we have access to the first filtered rows immediately!
Eventually, the data load completes in the background after TotalMilliseconds : 21949.6703
(22 seconds!)
However, the Select-String
function has a few other essential features. For example, it has a concept of exclusion and “context” line and can return lines before and after our match:
# Here we are using SMO to only get path to the ERRORLOG:
Import-Module SQLPS
$SqlServer = new-object Microsoft.SqlServer.Management.Smo.Server "localhost"
$ErrorLog = $SqlServer.ErrorLogPath
$LogNo = 3 #which ERROLOG do we want
Clear-Host
Select-String -Path "$ErrorLogERRORLOG.$LogNo" -Pattern "Error: 35286" -Context 1, 1
Returns:
> 4115:2018-02-18 16:23:33.38 spid75s Error: 35286, Severity: 16, State: 1.
4116:2018-02-18 16:23:33.38 spid75s Using the recovery LSN (1536:50088:1) stored in the metadata for the database with ID 14. This is an informational message only. No user action is required.
> 4127:2018-02-18 16:23:33.82 spid212s Error: 35286, Severity: 16, State: 1.
4128:2018-02-18 16:23:33.82 spid212s Using the recovery LSN (2702:305880:1) stored in the metadata for the database with ID 35. This is an informational message only. No user action is required.x
> 4860:2018-02-18 16:23:37.96 spid201s Error: 35286, Severity: 16, State: 1.
4861:2018-02-18 16:23:37.96 spid201s Using the recovery LSN (519:448856:1) stored in the metadata for the database with ID 18. This is an informational message only. No user action is required.
> 4945:2018-02-18 16:23:38.07 spid37s Error: 35286, Severity: 16, State: 1.
4946:2018-02-18 16:23:38.07 spid37s Using the recovery LSN (1553:351536:1) stored in the metadata for the database with ID 20. This is an informational message only. No user action is required.
> 4966:2018-02-18 16:23:39.07 spid236s Error: 35286, Severity: 16, State: 1.
4967:2018-02-18 16:23:39.07 spid236s Using the recovery LSN (31538:188792:1) stored in the metadata for the database with ID 21. This is an informational message only. No user action is required.
Conclusion
The winner in the performance and functionality category is the Select-String
function takes over 20 seconds to find matching rows in a 5GB file. The native T-SQL takes under 30 seconds to do the same, and the slowest is the SMO method taking over 17 minutes! Reading ERRORLOG as plain text may not be the easiest way and has its limitations. For example, it will only work on localhost unless we get access to the remote ERRORLOG via UNC path. However, it is the most efficient and flexible way, especially if you have big files that you’d like to run some regular expression.