Daily state aggregations appear to be stored against the wrong date in the data warehouse database when the time zone of the server hosting the SQL Server instance is at an offset to UTC. This is evident in the UK, because for approximately half the year the UK is on UTC, at which point the daily aggregations are stored against the correct date, but during BST (UTC+1) they are stored against the previous date. I suspect this may only affect time zones at a positive offset to UTC, and that the problem is in
dbo.StandardDatasetAggregate, possibly related to the calculation of
@IntervalStartDateTime, but I haven’t been able to pinpoint the bug. I also haven’t checked whether the same situation exists with the other aggregate types, such as performance.
The problem can be best illustrated with a script, which works against a database running in the UK for web application transaction monitor-data generated in 2014:
-- Determine a web application transaction monitor with relevant data, i.e. during BST and where an unhealthy state has been recorded.
DECLARE @ManagedEntityMonitorRowId INT,
SELECT TOP 1
@Date = sdf.[Date],
@ManagedEntityMonitorRowId = sdf.ManagedEntityMonitorRowId
INNER JOIN dbo.vManagedEntity me
ON sdf.ManagedEntityRowId = me.ManagedEntityRowId
AND me.FullName LIKE 'WebApplication[_]%'
AND me.FullName NOT LIKE '%WatcherComputersGroup'
sdf.MonitorRowId = (
SELECT TOP 1
MonitorSystemName = 'System.Health.EntityState'
sdf.InYellowStateMilliSeconds > 10000
OR sdf.InRedStateMilliseconds > 10000
AND sdf.[Date] BETWEEN '31/Mar/2014' AND '25/Oct/2014' -- ~BST.
-- Return the data.
ManagedEntityMonitorRowId = @ManagedEntityMonitorRowId
AND [DateTime] >= @Date
AND [DateTime] < DATEADD(d, 2, @Date)
For a particular web application transaction monitor, my results for the daily aggregation (
AggregationTypeId = 30) for 31/03/2014 show that
206173, but the sum total of
InYellowStateMilliseconds for the hourly aggregations (
AggregationTypeId = 20) is
0 for this date. The figure of
206173 shows against the hourly aggregations associated with the following day, 01/04/2014.
I have logged this bug on Microsoft Connect.
Edit: 13/08/2015: This bug has been fixed in management pack version 126.96.36.199.
I have discovered a bug in the latest SQL Server 2014 SCOM management pack (188.8.131.52).
The problem is that file groups are not discovered for databases containing filestreams or partition schemes, due to a bug in the discovery script. The health service subsequently goes on to discover the files associated with all the file groups, regardless of whether the file group has been discovered, and forwards the data to the management server. Upon receipt of the discovery data for the files, the management server rejects it because some of the files are associated with file groups that haven’t yet been discovered, i.e. the management server is unable to map some of the files to file groups. The result is that all files for the database aren’t discovered and are therefore not monitored.
The symptoms are event ID 10801 on management servers, when the management server processes the discovery data, and missing database file groups and files in the inventory. The cause is a bug in the
DiscoverSQL2014FileGroups.js script in the
Microsoft.SQLServer.2014.Discovery management pack, where it only accepts file group types
FG, but, according to MSDN, there are two more possible values:
I have logged this bug on Microsoft Connect. Please vote for this bug if it’s a problem for you.
I’ve implemented a workaround, until Microsoft resolve the problem, which restricts file and log file discovery to those file group types that have previously been discovered, i.e.
FG. This allows monitoring of these file group types, but does not monitor the missing types. Unfortunately I can’t distribute the updated management pack as it contains Microsoft code.
The cause of an error code of 2147954430 from a web application transaction monitor can be difficult to determine, especially as it’s often intermittent. In my case, investigating the watcher node revealed that too many long-running calls had been included in the monitor and this had caused a backlog on the node, as described by event ID 10503 in the Operations Manager log:
The HTTP URL Monitoring Module detected that a backlog of processing has happened. It might be an indication of too many URL monitors configured for this watcher node.
The solution was to increase the interval between executions of the monitor to one that more realistically reflected the likely execution time. And it might be useful to create a rule to raise alerts from these events on the watcher nodes.