Monitoring

Monitoring of resources is the foundation of system management. All data relevant to operation is recorded and analysed. Typically this covers both the hardware and the software components that the business depends on. Databases, redundant servers and backup systems, internet connections, and protection mechanisms (firewall, spam filter, antivirus software) are just a small selection of the resources that are monitored. The following picture shows an example in which the state of the monitored resource can be read immediately.

Bond Interface OK (Nagios-state monitoring)

Diagnosis

Monitoring the different resources yields a diagnosis, that is, a state analysis of the IT infrastructure. It can be generated for individual parts or for the whole IT infrastructure. The diagnosis distinguishes between different states: failure, overload and varying states. Each of these states is analysed according to different operational criteria, with the help of system management software.

Failure of a component

If a component fails, system management acts as an analysis tool that identifies the failed component and breaks down its dependencies across the whole IT infrastructure. Faulty components or paths in the IT infrastructure can thus be seen at a glance, which helps minimize the time needed for failure analysis.
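
The dependency breakdown described above can be sketched as a walk over a dependency map. This is only an illustration: the component names, the DEPENDS_ON map and the impacted_by function are hypothetical and not taken from any particular system management product.

```python
from collections import deque

# Hypothetical dependency map: each component lists what it depends on.
DEPENDS_ON = {
    "webshop": ["app-server"],
    "app-server": ["database", "core-switch"],
    "database": ["core-switch"],
    "core-switch": [],
}

def impacted_by(failed, depends_on):
    """Return every component that directly or indirectly depends on `failed`."""
    # Invert the map: component -> components that depend on it.
    dependents = {}
    for name, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(name)

    # Breadth-first walk from the failed component through its dependents.
    impacted, queue = set(), deque([failed])
    while queue:
        current = queue.popleft()
        for child in dependents.get(current, []):
            if child not in impacted:
                impacted.add(child)
                queue.append(child)
    return impacted

print(sorted(impacted_by("database", DEPENDS_ON)))
```

With this map, a failed database affects the application server and the webshop, while a failed core switch affects everything that sits behind it.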

Overload of a system component

System components that are working at their performance limit can be detected with system management software. Appropriate countermeasures can then be taken before the system collapses, returning the affected components to normal service. In this way a total hardware failure can be actively avoided.
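
Overload detection of this kind usually comes down to comparing a measured value against thresholds. The following is a minimal sketch; the function name and the 80 % and 95 % limits are assumptions chosen for illustration, and the state names simply follow the OK/WARNING/CRITICAL convention used by Nagios-style checks.

```python
def classify_load(percent_used, warn_at=80.0, crit_at=95.0):
    """Map a utilisation value (0-100) to a monitoring state.

    The thresholds are illustrative defaults, not recommended values.
    """
    if percent_used >= crit_at:
        return "CRITICAL"
    if percent_used >= warn_at:
        return "WARNING"
    return "OK"

# A component at 87 % utilisation is flagged before it reaches its limit.
print(classify_load(87.0))
```

The WARNING state is what leaves time for countermeasures: it is raised while the component is still working, before the critical limit is reached.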

Varying states

This analysis usually concerns a hardware or software defect that appears only intermittently. The time slots in which particular services or hardware are unavailable are documented. Comparing them with the regular operation of the relevant systems or software components helps to localize the source of the failure and to detect correlations between failures.
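
Documenting the time slots of an intermittent fault can be sketched as follows, assuming the check history is available as a time-ordered list of (timestamp, is_up) pairs. Both the data layout and the outage_windows function are hypothetical.

```python
from datetime import datetime

def outage_windows(samples):
    """Collapse a time-ordered check history into (start, end) outage windows.

    `end` is the time of the first successful check after the outage,
    or None if the service was still down at the last sample.
    """
    windows = []
    down_since = None
    for timestamp, is_up in samples:
        if not is_up and down_since is None:
            down_since = timestamp            # outage begins
        elif is_up and down_since is not None:
            windows.append((down_since, timestamp))  # outage ends
            down_since = None
    if down_since is not None:
        windows.append((down_since, None))    # still down at end of history
    return windows

# Illustrative history with one short outage between 02:05 and 02:15.
history = [
    (datetime(2024, 1, 1, 2, 0), True),
    (datetime(2024, 1, 1, 2, 5), False),
    (datetime(2024, 1, 1, 2, 10), False),
    (datetime(2024, 1, 1, 2, 15), True),
]
for start, end in outage_windows(history):
    print(start, "->", end)
```

Windows collected this way for several components can then be compared with each other and with regular operation to see which outages coincide.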

Notification

Every state of a system component can be reported to individual persons, groups of persons or external service providers via the notification program that is part of the system management software. Frequencies and time periods can also be configured in the notification mechanism. Nesting individual escalation steps makes it possible to rate faults and to trigger the appropriate notification action.

For example, a system failure triggers a notification to the administrator working on-site. If this first contact does not acknowledge the failure in the first escalation step, the next contact is notified, and so on up to the notification of an external service provider. How the notification is delivered is defined in a corresponding measure.

Based on this escalation model, faults can be handled in different ways. This covers the kind of notification, its frequency and its duration.
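
The escalation model can be sketched as an ordered list of steps that is consulted until the fault is acknowledged. The contacts, channels and delays below are assumptions made up for this example, as are the EscalationStep class and the current_step function.

```python
from dataclasses import dataclass

@dataclass
class EscalationStep:
    contact: str
    channel: str        # kind of notification, e.g. email or SMS
    after_minutes: int  # minutes since the fault was first detected

# Hypothetical escalation chain; all values are illustrative.
ESCALATION = [
    EscalationStep("on-site administrator", "email", 0),
    EscalationStep("second administrator", "sms", 15),
    EscalationStep("external service provider", "phone", 45),
]

def current_step(minutes_since_fault, acknowledged, steps=ESCALATION):
    """Return the escalation step that is active right now, or None.

    No step is active once the fault has been acknowledged.
    """
    if acknowledged:
        return None
    due = [s for s in steps if s.after_minutes <= minutes_since_fault]
    return max(due, key=lambda s: s.after_minutes) if due else None

step = current_step(20, acknowledged=False)
print(step.contact, "via", step.channel)
```

In this sketch an unacknowledged fault reaches the second administrator after 15 minutes and the external service provider after 45 minutes; an acknowledgement at any point stops the chain.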