| Author |
Message |
Jims
Guest
|
Posted:
Fri Nov 11, 2005 9:50 pm Post subject:
AD/ADAM Directory corruption DR strategy question |
|
|
We are exploring strategies to reduce downtime in the event of catastrophic
directory corruption. Our ADAM servers are load balanced and redundant, so
the server failure contingency is covered. The concern is that if the
directory itself went south (unlikely as that may be), our users would be
crippled for at least 1-2 hrs (a major SLA no-no at a hospital). An
authoritative restore would most likely get us back, but not quickly enough,
especially at 3am or when the AD admins are unreachable. We would like to
have a directory DR strategy that any of our data center operations people
could carry out w/o any AD know-how, as a simple list of steps.
What we had in mind is an additional ADAM server that connects to the
network for only a few minutes each day to replicate, removes itself from
the replication cycle for some period of time, and then repeats the process.
The intention is that if catastrophic directory corruption occurred, the
Operations Dept. would follow a few steps to remove the corrupted directory
servers from production and re-insert the online/offline directory server.
This would address the immediate critical need for application access to
LDAP, and the corrupt environment could be restored properly offline and
added back at some later time. The technical steps could consist of scripts
to temporarily disable replication, disable a network interface, redirect a
load balancer backend resource, etc.
Is anyone already doing something like this? Are there any risks with
frequently removing/adding a replication partner? Could this strategy also
be applied to regular Active Directory (infrastructure AD)? Any thoughts
appreciated.
Thanks,
Jim |
|
JPolicelli
Guest
|
Posted:
Sat Nov 12, 2005 9:50 am Post subject:
RE: AD/ADAM Directory corruption DR strategy question |
|
|
What you're describing, and likely aiming for, is known as a lag site.
In this scenario, you create an additional site and set its replication
interval to a longer value, e.g. 12 hours, 1 day, or 1 week.
The concept of a lag site is very good in that if you have a major failure,
it will not replicate to DCs/ADAM servers in the lag site, provided you catch
it before the next replication interval.
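To make the "catch it before the next replication interval" condition concrete, here is a rough sketch of the arithmetic. It assumes the lag site pulls changes at a fixed interval counted from its last sync, which is a simplification of how a real site link schedule behaves, and the function name is made up for illustration.

```python
from datetime import datetime, timedelta

def time_left_to_react(last_sync: datetime, interval: timedelta, now: datetime) -> timedelta:
    """Time remaining before the lag site replicates again (zero if already due)."""
    next_sync = last_sync + interval
    return max(next_sync - now, timedelta(0))

# Hypothetical example: lag site last synced at 3:00 am on a 12-hour interval.
last_sync = datetime(2005, 11, 12, 3, 0)
now = datetime(2005, 11, 12, 9, 50)
print(time_left_to_react(last_sync, timedelta(hours=12), now))  # 5:10:00
```

If the result is zero, the lag site may already have pulled the bad data and should not be treated as a clean copy without checking.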
From a setup perspective: create a new site, map a subnet to that site,
configure the replication interval you want, and move/add the DC/ADAM
server(s) to the new site.
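For the "configure the replication interval" step, site link intervals are normally entered in minutes. A minimal sketch of the conversion follows, assuming the commonly cited limits of 15 minutes minimum and 10080 minutes (one week) maximum. Treat those bounds as an assumption to verify in your own environment; the helper name is hypothetical.

```python
MIN_INTERVAL_MIN = 15      # assumed lower bound for a site link interval
MAX_INTERVAL_MIN = 10080   # assumed upper bound (one week)

def interval_minutes(hours: float) -> int:
    """Convert a desired lag in hours to minutes, rejecting out-of-range values."""
    minutes = int(round(hours * 60))
    if not MIN_INTERVAL_MIN <= minutes <= MAX_INTERVAL_MIN:
        raise ValueError(f"{minutes} min is outside {MIN_INTERVAL_MIN}-{MAX_INTERVAL_MIN}")
    return minutes

# The three lag values mentioned above.
for label, hours in [("12 hours", 12), ("1 day", 24), ("1 week", 168)]:
    print(label, "->", interval_minutes(hours), "minutes")
```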
In the event of a failure, use repadmin to disable inbound replication to
this lag site, then push out the copy of AD/ADAM from this lag site to other
sites.
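Since the goal is a step operations staff can run without AD know-how, the repadmin step could be wrapped in a small script. The sketch below is only that: it uses repadmin's /options switch with the DISABLE_INBOUND_REPL flag, the server name lagsrv01 is a placeholder, and it defaults to a dry run that only prints the command. For ADAM the target is usually given as host:port, so adjust accordingly. The "push out the copy" step is not covered here.

```python
import subprocess
from typing import List

def inbound_repl_cmd(server: str, disable: bool = True) -> List[str]:
    """Build the repadmin command that turns inbound replication off (or back on)."""
    flag = "+DISABLE_INBOUND_REPL" if disable else "-DISABLE_INBOUND_REPL"
    return ["repadmin", "/options", server, flag]

def run(cmd: List[str], dry_run: bool = True) -> int:
    """Print the command in dry-run mode; otherwise execute it and return its exit code."""
    if dry_run:
        print(" ".join(cmd))
        return 0
    return subprocess.run(cmd).returncode

# "lagsrv01" is a hypothetical name for the lag-site server.
run(inbound_repl_cmd("lagsrv01"))
```

Passing disable=False builds the command that re-enables inbound replication once the production directory has been repaired.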
|
|
|
Jims
Guest
|
Posted:
Sat Nov 12, 2005 5:50 pm Post subject:
Re: AD/ADAM Directory corruption DR strategy question |
|
|
This is great info. I'd never heard of a lag site; it sounds perfect.
Thanks
|
|
|