Agent Heartbeat down! - Multiple Heartbeats down!
Blake Mengotto
Guest
Posted: Fri Jan 21, 2005 6:48 am    Post subject: Agent Heartbeat down! - Multiple Heartbeats down!

If anyone can assist, you get a virtual cookie:

Situation:

One MOM config group, multiple applications, multiple support groups.

Keeping it simple, let's say these apps and these support groups exist:

Exchange - Exchange Ops
SQL - SQL Ops
Widgets - Widget Ops
WOW Servers - WOW server operations team (currently has open positions www.blizzards.com)

So it's a Friday night, and hot fixes, patches, and upgrades are going on all over the environment! Some people disable the OnePoint service, and on other servers the OnePoint service crashes and doesn't restart (this never happens, now does it? SHAME ON ME!).

All groups are looking at the MOM console, but don't see any alerts on their servers. There are a few alerts raised on the DCAM saying something about the agent heartbeat not working even though a ping is successful. Since they don't support the DCAM, they pay them no attention. The thought of actually looking at the All Agents view never crosses their minds because one more click and a screen refresh would take the last ounce of energy out of them.

Five days later, CEO Johnny Rockets is angry about not getting any email for five days. Refusing to call the help desk, he calls the MOM admin. "WHAT THE HELL IS GOING ON with my EXCHANGE server?!?" (Yes, he has his own SAN-attached Exchange cluster.) He was on vacation and lost his Blackberry during the first day of a diving trip in Hawaii (a company-expensed business trip, if you know what I mean). So now, back in the office, he is livid. He was expecting an email from his mistress, but it never arrived, and now he is concerned that she is no longer interested in him and his Porsche SUV.

The MOM admin looks at the All Agents view, sees that the last date contacted was five days ago, and explains this to the CEO, who then calls EDS to have the ops team outsourced.

Well, to avoid this situation, currently you have a few options:

1) Train the ops team to look at the MOM console at least once an hour (you really shouldn't have to do this, but apparently system monitoring is about as exciting as being a dental hygienist).
2) When these two alerts are raised, email all ops teams and hope they haven't created an inbox rule to just file or delete the message (enable message tracking just to show you did due diligence in notifying the groups).
3) When these two alerts are raised, page all ops teams and wait for the death threats to roll in.
4) Put in your resignation and surf Dice and Monster all day for a server farm job.

I want option 5.

5) Create some intelligence so that this alert, when raised, will look at all the machines listed in the description of the event and page all the appropriate groups.

Is this possible? OR should I start to surf Dice and Monster?
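
For illustration only, here is a rough Python sketch of the routing idea in option 5. It is not a MOM response script. It assumes the alert description ends with a comma-separated list of machine names after a colon, and that servers can be matched to groups by a name prefix. The prefixes, the fallback group, and every function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical naming convention: server-name prefix -> support group.
GROUP_BY_PREFIX = {
    "EXCH": "Exchange Ops",
    "SQL": "SQL Ops",
    "WIDG": "Widget Ops",
    "WOW": "WOW Ops",
}
FALLBACK_GROUP = "MOM Admin"  # assumed owner for unrecognized machines


def computers_in_description(description):
    """Return the machine names listed after the last colon (assumed format)."""
    _, _, tail = description.rpartition(":")
    return [name.strip().upper() for name in tail.split(",") if name.strip()]


def group_for(computer):
    """Pick the support group whose prefix matches the machine name."""
    for prefix, group in GROUP_BY_PREFIX.items():
        if computer.startswith(prefix):
            return group
    return FALLBACK_GROUP


def pages_for_alert(description):
    """Group the machines named in one alert by the team that should be paged."""
    pages = defaultdict(list)
    for computer in computers_in_description(description):
        pages[group_for(computer)].append(computer)
    return dict(pages)


if __name__ == "__main__":
    alert = "Agent heartbeat failed for: exch01, SQL02, wow07, print03"
    for group, machines in pages_for_alert(alert).items():
        print(f"page {group}: {', '.join(machines)}")
```

Each group is paged only for its own machines, and anything that matches no prefix falls back to a single owner instead of being dropped.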

--
Regards,
Blake Mengotto
Email: mengotto@nospam.hotmail.com
"MOM 2000/2005 - The ultimate solution for monitoring/managing your Windows OS and applications."
http://www.momanswers.com - MOM solution center resource
http://www.microsoft.com/mom - MOM Application site
http://www.silect.com - MOM Health Reporter
http://www.excsoftware.com - MOM solution provider
JesseH
Guest
Posted: Fri Jan 21, 2005 11:12 pm    Post subject: Re: Agent Heartbeat down! - Multiple Heartbeats down!

I'm virtually hungry so I'll take a stab at this.

Create a timed event that runs a script.
In your script:
1) query "Select count(*) from Computer Table where LastContacted is more than 30 minutes ago."
2) create a perf dataitem and send this count to MOM
3) create a rule that generates an alert when this count is greater than 5 over 3 samples.

If anybody figures out how to do Step Three let me know...
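
A minimal Python sketch of the logic behind these steps, for illustration only. It does not touch the MOM database or the MOM scripting objects: the last-contacted times are passed in as a plain dictionary, and "greater than 5 over 3 samples" is read here as three consecutive samples above 5, which is an assumption. All names are hypothetical.

```python
from collections import deque
from datetime import datetime, timedelta

STALE_AFTER = timedelta(minutes=30)  # step 1: the "30 minutes" cutoff
THRESHOLD = 5                        # step 3: alert when the count is above this
SAMPLES = 3                          # step 3: ...for this many samples in a row


def count_stale(last_contacted, now):
    """Count agents whose last contact is older than STALE_AFTER.

    last_contacted maps a computer name to the datetime it was last heard from.
    """
    cutoff = now - STALE_AFTER
    return sum(1 for seen in last_contacted.values() if seen < cutoff)


class ConsecutiveThreshold:
    """Fire only when the last `samples` values all exceed `threshold`."""

    def __init__(self, threshold=THRESHOLD, samples=SAMPLES):
        self.threshold = threshold
        self.window = deque(maxlen=samples)

    def add(self, value):
        """Record one sample and report whether the alert should fire."""
        self.window.append(value)
        return (len(self.window) == self.window.maxlen
                and all(v > self.threshold for v in self.window))


if __name__ == "__main__":
    now = datetime(2005, 1, 21, 23, 0)
    # Ten made-up agents, last seen 0, 10, 20, ... 90 minutes ago.
    agents = {f"SRV{i}": now - timedelta(minutes=10 * i) for i in range(10)}
    rule = ConsecutiveThreshold()
    for _ in range(SAMPLES):
        stale = count_stale(agents, now)
        print(stale, rule.add(stale))
```

Requiring several samples in a row keeps one slow polling cycle from paging anyone, at the cost of a delay of a few timed-event intervals before the alert fires.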


Blake Mengotto
Guest
Posted: Sat Jan 22, 2005 6:48 am    Post subject: Re: Agent Heartbeat down! - Multiple Heartbeats down!

I don't think I have the skill set to even understand step 1. LOL!

Sounds interesting though!

I think I better sign up for some dance lessons, pick up a boom box from BestBuy, and go to Home Depot and get some silver paint.


 
All times are GMT