Cloning Oracle Databases with EMC SnapView and RecoverPoint

Centroid
Many are familiar with the steps required to clone an Oracle database using "rman duplicate" or "hot backup" cloning.  Many are also familiar with the steps required to create EMC SnapView Clones or SnapView Snapshots, either with the Navisphere web interface or the CLI.  In this post, I'll outline the steps required to build consistent, usable Oracle database "clones" within the framework of the following environment/architecture:
  • Oracle 11.1.0.7, HP-UX 11iV2
  • EMC CLARiiON CX4 storage arrays at production and DR site
  • EMC RecoverPoint appliances at production and DR site
  • Source database (production) uses Oracle ASM for its storage
  • Requirement is to replicate production data from production storage array to remote "DR" array.
  • Requirement is to use this replicated data at the DR site as the source for both SnapView Clones and SnapView Snapshots
  • Requirement is to Clone or Snap from DR Replica LUNs, re-create an Oracle control file to build a new database, recover this target database, open with resetlogs, and use it
Know Your Storage Environment
Before beginning the process, it is vital to know and understand your storage environment.  While it is possible to dynamically configure and report on everything in your storage array using Navisphere CLI, I find it's best to peruse and document your CX4 configuration details using the web interface.  So open a browser and go to:
http://<IP address of one of the CX SP's>/start.htm
... and log in as admin.  Once inside Navisphere on the production storage array, note the following:
  • The LUN numbers and names for the LUNs that comprise or will comprise the RecoverPoint consistency group
  • Ensure (or assume) the LUNs are in a storage group and zoned to the production host
On the DR array, you need to document or do the following:
  • LUN numbers/names of Replica LUNs (i.e., LUNs in the RecoverPoint Consistency Group)
  • LUN numbers/names for all to-be SnapView Clones.  When using SnapView Clones, the number and size of the Clone LUNs must match those of the Replica LUNs for each clone group you'll be creating
  • Sufficient LUNs carved into the Reserve LUN pool to hold snapshot data
Do Some Planning
Before plodding ahead with testing, I feel it's important to plan your deployment and develop some up-front standards that can be used during the configuration.  Questions to ask:
  • How many SnapView Clones will I need? (will govern how many LUNs to build on the DR array, and place in the primary host storage group)
  • How many SnapView Snapshots will I need? (this information, combined with source database size, will help size the reserve LUN pool; see the rough sizing sketch after this list)
  • What will the shelf-life be for my snapshots?
  • How much DML/DDL will occur in my snapshot instances over time?
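As a rough sizing illustration (the change rate is an assumption, not a measurement - use your own workload numbers): if PROD occupies 500 GB on the Replica LUNs and you expect roughly 20% of the blocks to change during a snapshot's shelf life, then two concurrent snapshot sessions would need on the order of 500 GB x 0.20 x 2 = 200 GB of reserve LUN pool capacity, plus some headroom.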
In addition to asking these questions, I believe it's important to decide upon the following:
  • Implement a standard ASM diskgroup naming convention (e.g., PROD_DG1, DEV_DG1, etc.)
  • Implement a strategy for consistent symbolic linking of O/S files to the ASM devices that will be defined in the ASM disk group.  For example, if ASM diskgroup PROD_DG1 is designed to use /dev/rdsk/c57t0d0, which is LUN 1 on the production CX4 storage array, we should symbolically link /asm/disk1 to /dev/rdsk/c57t0d0 and build the ASM diskgroup with the "/asm/disk1" string (see the sketch after this list)
  • Set asm_diskstring in both the production and DR server ASM instances to the same wild-carded value.  For example, "/asm*/disk*"
  • Map Source to Replica LUN numbers
  • Map Replica to Clone Group LUN for each Clone Group, and ensure "target" clone LUNs are added to the right storage group
  • Implement a strategy for snapshot LUN number conventions.  For example, if you will expect to build 3 different snapshots on the Replica LUNs, you can start LUN numbering on the first set at LUN 3000, the second set at LUN 4000, the third set at LUN 5000.
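As a minimal sketch of the linking and asm_diskstring conventions above (device name, redundancy, and sizes are illustrative):
# ln -s /dev/rdsk/c57t0d0 /asm/disk1
SQL> alter system set asm_diskstring='/asm*/disk*';
SQL> create diskgroup PROD_DG1 external redundancy disk '/asm/disk1';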
The more planning done ahead of time, the less risk and the easier it will be to build automated scripts to do things end-to-end.
In This Post ...
For examples below in this post, I'll use the following information:
  • prodhost = production HP-UX host
  • drhost = DR HP-UX host
  • PROD = production database name
  • CLN1 = 1st clone of PROD using SnapView clones
  • CLN2 = 2nd clone of PROD using SnapView clones
  • SNP1 = 1st snapshot of PROD using SnapView snapshots
  • SNP2 = 2nd snapshot of PROD using SnapView snapshots
  • rpa1 = host name of RecoverPoint appliance's admin interface
  • cx4-dr = DNS name of DR CLARiiON CX4, used for NaviSphere
  • EMC PowerPath is installed and  configured on both prodhost and drhost
  • 3 ASM Diskgroups: PROD_DG1, PROD_DG2, and PROD_DG3, all replicating in the RP Consistency group and all used as sources to Clones/Snapshots
Cloning PROD to CLN1 with SnapView Clones and Oracle ASM
  • Ensure RPA is transmitting data from primary storage array to DR array (and ensure the consistency groups are setup and functional)
  • Ensure an ASM instance is running on drhost
  • Obtain LUN numbers to use for the CLN1 clone group from NaviSphere
  • Make sure the LUNs are in the proper storage group, zoned to drhost, and visible via EMC PowerPath.  On HP-UX, rescan for the devices, re-create the device special files, and refresh the PowerPath configuration:
# /sbin/init.d/agent stop
# ioscan -fnCdisk
# insf
# /sbin/init.d/agent start
# powermt check
# powermt config
# powermt save
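If the clone target LUNs still need to be added to the DR host's storage group, that can also be scripted with naviseccli; a sketch (the storage group name and HLU/ALU numbers are illustrative):
# naviseccli -h cx4-dr -Scope 0 -User admin -Password <Nav pwd> storagegroup -addhlu -gname SG_drhost -hlu 11 -alu 11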
  • Run "powermt display dev=all" as root and search contents for the Replica LUN and Clone LUN names/numbers.
  • Consider primary (PROD) ASM device to HP-UX device mappings and ensure you've got it documented.  For sake of example:
PROD_DG1 is on /asm/disk1
PROD_DG2 is on /asm_disk2
PROD_DG3 is on /asm_disk3
  • Create symbolic link from /asm_c1/disk1 to the target Clone LUN that will be synced from the Replica LUN mapped to the primary LUN for PROD_DG1
  • Repeat for /asm_c1/disk2 and /asm_c1/disk3.
NOTE: I am using /asm_cX/diskY convention here, with X = Clone Group number and Y = disk number within the clone group.
NOTE: At this point, there will be no data on the disks, since they haven't been synchronized.
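For example, if the clone target LUNs show up under the illustrative HP-UX devices below, the links would be:
# ln -s /dev/rdsk/c70t0d1 /asm_c1/disk1
# ln -s /dev/rdsk/c70t0d2 /asm_c1/disk2
# ln -s /dev/rdsk/c70t0d3 /asm_c1/disk3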
  • Create SnapView clone groups on the 3 Replica LUNs.  Below, assume the Replica LUNs are 1, 2, and 3

# naviseccli -h cx4-dr  -Scope 0 -User admin -Password <Nav pwd> snapview -createclonegroup -name lun1CloneGrp_1 -luns 1 -o

# naviseccli -h cx4-dr  -Scope 0 -User admin -Password <Nav pwd> snapview -createclonegroup -name lun2CloneGrp_1 -luns 2 -o

# naviseccli -h cx4-dr  -Scope 0 -User admin -Password <Nav pwd> snapview -createclonegroup -name lun3CloneGrp_1 -luns 3 -o

  • Add the target LUNs to the clone groups and begin synchronizing data.  When you created the clone groups (above) with the "-luns" clause, the LUN number following "-luns" became the source LUN for the clone group - in this case, the Replica LUN on the DR storage array.  The commands below add clone LUN 11 to the group for Replica LUN 1, clone LUN 12 to the group for Replica LUN 2, and clone LUN 13 to the group for Replica LUN 3

# naviseccli -h cx4-dr -Scope 0 -User admin -Password <Nav pwd> snapview -addclone -name lun1CloneGrp_1 -luns 11 -syncrate high

# naviseccli -h cx4-dr -Scope 0 -User admin -Password <Nav pwd> snapview -addclone -name lun2CloneGrp_1 -luns 12 -syncrate high

# naviseccli -h cx4-dr -Scope 0 -User admin -Password <Nav pwd> snapview -addclone -name lun3CloneGrp_1 -luns 13 -syncrate high

  • Wait for clone synchronization to complete.  You can use the commands below to monitor progress, based on the clone group configuration above:

# naviseccli -h cx4-dr -Scope 0 -User admin -Password <Nav pwd> snapview -listclone -name lun1CloneGrp_1 | egrep '(^Name|^CloneState|^CloneCon|^Percent)'

# naviseccli -h cx4-dr -Scope 0 -User admin -Password <Nav pwd> snapview -listclone -name lun2CloneGrp_1 | egrep '(^Name|^CloneState|^CloneCon|^Percent)'

# naviseccli -h cx4-dr -Scope 0 -User admin -Password <Nav pwd> snapview -listclone -name lun3CloneGrp_1 | egrep '(^Name|^CloneState|^CloneCon|^Percent)'

  • Put the source database (PROD) in backup mode.  First, though, grab the max(first_change#) from V$ARCHIVED_LOG; this marks the earliest archived redo log we'll need in a later step ...
SQL> select max(first_change#) from v$archived_log;
SQL> alter database begin backup;
  • Enable "Image Access" on the RecoverPoint Appliance (RPA).  This is required to put the source of the clones, which are the RPA Replica LUNs, in a consistent state.  If you omit this step you'll get to the end of this, try to recover your database, and will be left with the only option to recover all the way up through the most current redo log on the primary site - something we don't want to do ...  To enable image access through the RPA CLI:

# ssh admin@rpa1 'enable_image_access group=<your RP consistency group> copy=<name of copy site> image=latest'

  • Fracture your SnapView clones
# naviseccli -h cx4-dr -Scope 0 -User admin -Password <Nav pwd> snapview -consistentfractureclones -CloneGroupNameCloneID lun1CloneGrp_1 0100000000000000 lun2CloneGrp_1 0100000000000000 lun3CloneGrp_1 0100000000000000 -o
  • Disable image access to RPA
# ssh admin@rpa1 'disable_image_access group=<your RP consistency group> copy=<name of copy site>'
  • End backup mode on source
SQL> alter database end backup;
SQL> alter system archive log current;
SQL> select max(first_change#) from v$archived_log;
  • Modify the ASM diskgroup name.  On ASM 11gR2 we could use the "renamedg" utility, but since our test is on 11gR1, we need to use kfed to modify the header block of the ASM devices.  First, do a "kfed read" on each device that comprises the target ASM diskgroups you want to mount.  Direct the output to a text file, edit the file, and search for the string "grpname".  Change the diskgroup name from "PROD_DG1" to "CLN1_DG1" (and likewise for the other diskgroups) and save the file.  Then use "kfed merge" to write the changes back to the disks.
# kfed read dev=/asm_c1/disk1 > disk1.txt
# kfed read dev=/asm_c1/disk2 > disk2.txt
# kfed read dev=/asm_c1/disk3 > disk3.txt
(edit disk1.txt, disk2.txt, and disk3.txt, replacing PROD_DG1 with CLN1_DG1, etc. - only modify the line that has the string "grpname" in it)
# kfed merge /asm_c1/disk1 text=disk1.txt
# kfed merge /asm_c1/disk2 text=disk2.txt
# kfed merge /asm_c1/disk3 text=disk3.txt
Next, re-read the block header using kfed to validate ...
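For example, to check the diskgroup name on the first device:
# kfed read dev=/asm_c1/disk1 | grep grpname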
  • Mount ASM diskgroups
SQL> alter diskgroup CLN1_DG1 mount;
SQL> alter diskgroup CLN1_DG2 mount;
SQL> alter diskgroup CLN1_DG3 mount;
  • Generate a backup controlfile script from the source environment, then edit and save it so you have a "CREATE CONTROLFILE" script to use on your CLN1 database (a sketch of this and the recovery steps follows this list)
  • Build controlfile for CLN1
  • At this point, in order for CLN1 to be recoverable, you need the archive log preceding the "begin backup" and the archive log following the "end backup" in a place where CLN1 can see them.  I use RMAN to copy these archive logs to a location CLN1 can "see"
  • Log in to SQL*Plus with your environment set for CLN1 and set LOG_ARCHIVE_DEST_1 to the location you've copied the source archive logs to.
  • Issue a "recover database using backup controlfile"
  • Specify the archive logs copied from 3 steps ago, and cancel after the last one
  • Open with RESETLOGS
  • Add TEMP files
  • Do whatever other post-cloning needs to be done
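Tying the controlfile and recovery steps together, a sketch might look like the following (the SCN, paths, and sizes are illustrative, and the edited script must list your actual datafile names under the renamed CLN1_DG* diskgroups):
On PROD (prodhost):
SQL> alter database backup controlfile to trace as '/tmp/cln1_ctl.sql';
RMAN> backup as copy archivelog from scn 1234567 format '/stage/arch/%U';
(edit /tmp/cln1_ctl.sql: change REUSE to SET, "PROD" to "CLN1", NORESETLOGS to RESETLOGS, and point the file names at the CLN1_DG* diskgroups; then copy the script and the archive log copies to drhost)
On drhost, with the environment set for CLN1:
SQL> startup nomount
SQL> @/tmp/cln1_ctl.sql
SQL> alter system set log_archive_dest_1='LOCATION=/stage/arch' scope=memory;
SQL> recover database using backup controlfile until cancel;
(supply the copied archive logs when prompted, then enter CANCEL after the last one)
SQL> alter database open resetlogs;
SQL> alter tablespace TEMP add tempfile '+CLN1_DG2' size 10g;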
Cloning PROD to SNP1 with SnapView Snapshots and Oracle ASM
The process for cloning based on SnapView snapshots is similar to SnapView cloning from a high-level standpoint; the Navisphere commands are obviously different and there is some additional work that needs to be done to ensure the resultant snap LUNs are visible and addressable on the host.
  • Put source database in backup mode and note latest archive log
  • Enable image access on RPA Replica LUNs (see previous section)
  • Start a SnapView session.  In the example below, "-lun 1 2 3" creates a snapshot session on Replica LUNs 1, 2, and 3
naviseccli -h cx4-dr -Scope 0 -User admin -Password <Nav pwd> snapview -startsession snap_snp1 -lun 1 2 3 -consistent
  • Create Snapshots (one per Replica LUN; the first argument is the source LUN number and -snapshotname names the snapshot)
naviseccli -h cx4-dr -Scope 0 -User admin -Password <Nav pwd> snapview -createsnapshot 1 -snapshotname snp1_1
naviseccli -h cx4-dr -Scope 0 -User admin -Password <Nav pwd> snapview -createsnapshot 2 -snapshotname snp1_2
naviseccli -h cx4-dr -Scope 0 -User admin -Password <Nav pwd> snapview -createsnapshot 3 -snapshotname snp1_3
  • Activate Snapshots
naviseccli -h cx4-dr -Scope 0 -User admin -Password <Nav pwd> snapview -activatesnapshot snap_snp1 -snapshotname snp1_1
naviseccli -h cx4-dr -Scope 0 -User admin -Password <Nav pwd> snapview -activatesnapshot snap_snp1 -snapshotname snp1_2
naviseccli -h cx4-dr -Scope 0 -User admin -Password <Nav pwd> snapview -activatesnapshot snap_snp1 -snapshotname snp1_3
  • Add the Snapshot LUNs to the EMC storage group.  Assume the storage group is SG_drhost and the snapshot host LUN numbers (the -hlu values) will be 3000, 3001, and 3002, respectively.  It's good to map out which snapshot LUN numbers you want to use ahead of time
naviseccli -h cx4-dr -Scope 0 -User admin -Password <Nav pwd> storagegroup -addsnapshot -gname SG_drhost -hlu 3000 -snapshotname snp1_1
naviseccli -h cx4-dr -Scope 0 -User admin -Password <Nav pwd> storagegroup -addsnapshot -gname SG_drhost -hlu 3001 -snapshotname snp1_2
naviseccli -h cx4-dr -Scope 0 -User admin -Password <Nav pwd> storagegroup -addsnapshot -gname SG_drhost -hlu 3002 -snapshotname snp1_3
  • Find and fix the host (HP-UX) devices so they're usable.  Since we've added a new set of LUNs to our storage group (3000, 3001, and 3002), we need to do the following on HP-UX for them to be visible and mountable:

# /sbin/init.d/agent stop

# ioscan -fnCdisk

# insf

# /sbin/init.d/agent start

# /sbin/powermt check force dev=all

# /sbin/powermt config

# /sbin/powermt save

Then do a "powermt display dev=all" and search for snp1_1, snp1_2, and snp1_3.  Once you find these find the HP-UX device for these and symbolically link /asm_s1/disk1, /asm_s1/disk2, and /asm_s3/disk3 to these

  • Disable image access on RPA (see previous section)
  • End backup mode on source (see previous section)
  • Modify ASM block header on target SNP1 (see previous section).  Use devices /asm_s1/disk1, /asm_s1/disk2, and /asm_s1/disk3 based on previous steps
  • Create ASM diskgroups for SNP1 (see previous section).  Reference above devices
  • Mount ASM diskgroups for SNP1 (see previous section)
  • Generate script to create controlfile (see previous section)
  • Build controlfile for SNP1
  • Find and backup needed archive logs to destination SNP1 can see (see previous section)
  • Recover SNP1 (see previous section)
  • Open SNP1 with RESETLOGS and add temp files
What about HP-UX LVM and Traditional File-Systems?
The overall approach to cloning Oracle using SnapView Clones or Snapshots for traditional HP-UX file-systems is very similar to the steps required for Oracle ASM, with the following exceptions:
  • You obviously won't have to drop/dismount/create/mount ASM disk groups
  • You won't have to modify ASM block headers
  • No need to symbolically link to HP-UX device names
Consider these requirements:
  • PROD is a production database running on prodhost
  • CLN1 is an Oracle copy of production running on drhost and will be a complete SnapView clone of production
  • CLN2 is an Oracle copy of production running on drhost and will be a complete SnapView clone of production
  • SNP1 is an Oracle copy of production running on drhost and will be a SnapView snapshot of production
  • SNP2 is an Oracle copy of production running on drhost and will be a SnapView snapshot of production
Further, PROD is physically stored on two file-systems, /u01 and /u02.  /u01 is the mount point for /dev/vgprod1/lvol1 on prodhost, and /u02 is the mount point for /dev/vgprod2/lvol1 on prodhost.
To mount CLN1's file-systems from the fractured clone LUNs for the first time, we need to do this:
  • Ensure clone LUNs are in the proper CX4 Storage Group
  • # /sbin/init.d/agent stop
  • # ioscan -fnCdisk
  • # insf
  • # /sbin/init.d/agent start
  • # /sbin/powermt display dev=all
  • Examine output of PowerPath command above and note device names.  For sake of example, we'll focus on the first LUN, /u01, which we want to mount as /u01_cln1.  The device for this is /dev/dsk/c80t0d1 (again, for example)
  • # vgchgid /dev/dsk/c80t0d1 -- stamp a new volume group ID on the cloned disk so it can be imported as a new volume group
  • # mkdir /dev/vgcln1
  • # mknod /dev/vgcln1/group c 64 0x100000 -- (this "0x100000" should be unique; check /dev/vg*/group*)
  • # vgimport /dev/vgcln1 /dev/dsk/c80t0d1
  • # vgchange -a y /dev/vgcln1
  • # fsck /dev/vgcln1/lvol1
  • # mkdir /u01_cln1
  • # mount -o delaylog /dev/vgcln1/lvol1 /u01_cln1
Upon subsequent clones, we simply need to unmount file-systems prior to the clone, deactivate the volume group(s) using "vgchange -a n <vg>", allow the SnapView clone to complete, then resume with the "vgchange -a y" step above.
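A sketch of that refresh cycle, using the volume group and mount point from the example above:
# umount /u01_cln1
# vgchange -a n /dev/vgcln1
(re-synchronize and fracture the SnapView clone)
# vgchange -a y /dev/vgcln1
# fsck /dev/vgcln1/lvol1
# mount -o delaylog /dev/vgcln1/lvol1 /u01_cln1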
The steps for snapshots are very similar, but (I think) there's a chance that you'll get different physical devices each time you remove and add snapshot LUNs to the storage group.  That being the case, the steps for cloning (above) need to be done in their entirety each time.