Oracle Blog

March 25, 2009

RMAN: Recovering from a Missing or Corrupt Datafile

Filed under: Oracle, RMAN — Vikram @ 6:07 pm
Tags: , , , ,

While DB Backup management is a very important task for a DBA, it is the Recovery that calls for the crisis management. Almost always at times of DB recovery, the DBA is in pressure to bring the DB back to normal with minimum resource loss. The client expects minimum data loss, minimum time loss. It is advisable for the DBAs to be well-equipped with the most common recovery scenarios in such times. Let’s use RMAN for various recovery scenarios over a few subsequent posts. Let’s also review a few good practices on our way to recovery.

Let’s begin with recovering from a lost or corrupted datafile when the backup is in tact and the DB is open.

RMAN: Recovering from a lost datafile

Any kind of recovery needs a few very basic steps:

1) Checking the availability of valid backup
2) Deciding upon the recovery procedure
3) Actual recovery

The Scenario
Before we begin on the recovery procedure, let’s appraise the scenario. One of the datafile in a DB is lost or corrupt. The database is in OPEN state and, fortunately, the backups are in place. Of course, the first thing we will do is to confirm that the backups are indeed available. The backups were taken using RMAN.

TIP: A good backup plan is always very essential for the Database. Every DBA has to give enough thought to the backup plan and decide what is good for his organization.

In our case the datafile users01.dbf was missing from the database. The tablespace was USERS and this had a single datafile. The queries on the objects in that datafile were erroring out with the below ORA-01110 error. This error was accompanied by ORA-00376.

SQL> select owner, segment_name from dba_segments where tablespace_name='USERS' and segment_name='BONUS';
OWNER                          SEGMENT_NAME
------------------------------ ---------------------------------------------------------------------------------
SCOTT                          BONUS 

SQL> select count(1) from scott.bonus;
select count(1) from scott.bonus
*
ERROR at line 1:
ORA-00376: file 4 cannot be read at this time
ORA-01110: data file 4: 'D:\VIKRAM\ORADATA\TEST2\USERS01.DBF'

Moreover, when I queried the V$DATAFILE_HEADER view the error seen was:

SQL> SELECT FILE#, STATUS, ERROR, RECOVER, TABLESPACE_NAME, NAME
2  FROM V$DATAFILE_HEADER
3  WHERE RECOVER = 'YES' OR (RECOVER IS NULL AND ERROR IS NOT NULL);
FILE#      STATUS  ERROR                     REC TABLESPACE_NAME                NAME
---------- ------- ------------------------- --- ------------------------------ ----------
4          OFFLINE FILE NOT FOUND

The ERROR column shows file not found. Additionally, we may also run dbv utility on all the datafiles to uncover underlying datafile corruption.

So, this was the situation when it was decided to recover this datafile from backup. The DB was, as already mentioned, in OPEN state. The DB can as well be in the MOUNT state to perform this procedure. But, it isn’t necessary. Making the Datafile offline will be sufficient to carry out the recovery procedure.

Checking the Availability of the Backup
The first step in any recovery procedure is to confirm the availability of the complete set of backup. By complete set I mean all the files required for the recovery. This is called the Redundancy Set. The set of files needed to recover an Oracle database from the failure of any of its files – a datafile, a control file, or online redo log – is called the redundancy set.

After connecting to RMAN we can confirm if our backup is good enough to restore the missing datafile. The below command lists out the backup piece required to restore the datafile (in our case datafile number 4 is the missing datafile). We should immediately look for this backup piece’s availability.

RMAN> restore datafile 4 preview;
Starting restore at 24-MAR-09
using channel ORA_DISK_1
List of Backup Sets
===================
BS Key  Type LV Size       Device Type Elapsed Time Completion Time
------- ---- -- ---------- ----------- ------------ ---------------
4       Full    591.91M    DISK        00:01:51     24-MAR-09
BP Key: 4   Status: AVAILABLE  Compressed: NO  Tag: TAG20090324T143340
Piece Name: D:\VIKRAM\FLASH_RECOVERY_AREA\TEST2\BACKUPSET\2009_03_24
               \O1_MF_NNNDF_TAG20090324T143340_4WL9W61M_.BKP
List of Datafiles in backup set 4
File LV Type Ckp SCN    Ckp Time  Name
---- -- ---- ---------- --------- ----
4       Full 578065     24-MAR-09 D:\VIKRAM\ORADATA\TEST2\USERS01.DBF
archive logs generated after SCN 578065 not found in repository
Media recovery start SCN is 578065
Recovery must be done beyond SCN 578065 to clear data files fuzziness
Finished restore at 24-MAR-09

From the above snapshot it is evident that the backup-piece required is O1_MF_NNNDF_TAG20090324T143340_4WL9W61M_.BKP. It also tells us that the archives beyond the SCN 578065 are required to clear the datafile’s fuzziness.

Next thing we got to do is to confirm that the archives are in place until atleast the required SCN reported by the above command. In this case, we need archives until atleast SCN 578065. Let’s check if the required archives are available.

RMAN> list backup of archivelog all;
List of Backup Sets
===================
BS Key  Size       Device Type Elapsed Time Completion Time
------- ---------- ----------- ------------ ---------------
3       2.26M      DISK        00:00:02     24-MAR-09
BP Key: 3   Status: AVAILABLE  Compressed: NO  Tag: TAG20090324T142932
Piece Name: D:\VIKRAM\FLASH_RECOVERY_AREA\TEST2\BACKUPSET\2009_03_24
               \O1_MF_ANNNN_TAG20090324T142932_4WL9NMFX_.BKP
List of Archived Logs in backup set 3
Thrd Seq     Low SCN    Low Time  Next SCN   Next Time
---- ------- ---------- --------- ---------- ---------
1    3       574556     24-MAR-09 577816     24-MAR-09
BS Key  Size       Device Type Elapsed Time Completion Time
------- ---------- ----------- ------------ ---------------
5       2.26M      DISK        00:00:03     24-MAR-09
BP Key: 5   Status: AVAILABLE  Compressed: NO  Tag: TAG20090324T143340
Piece Name: D:\VIKRAM\FLASH_RECOVERY_AREA\TEST2\BACKUPSET\2009_03_24
                \O1_MF_ANNNN_TAG20090324T143340_4WL9ZSRZ_.BKP
List of Archived Logs in backup set 5
Thrd Seq     Low SCN    Low Time  Next SCN   Next Time
---- ------- ---------- --------- ---------- ---------
1    3       574556     24-MAR-09 577816     24-MAR-09

The above snapshot shows two backupsets for archivelogs. Backupset with key 5 has the entries until SCN 577816. Not good. We need archives until SCN 578065. So, the archivelogs we have are not sufficient for the backup. We must locate the archive logs or redo logs beyond the SCN 577816 and upto 578065. Let’s look for redo for sequence numbers beyond 3. (it is 3 because the above Backupset with key 5 has archive logs until Sequence 3).

The below command shows the SCN contained in the redo log files. The Thread 1, Sequence 5 has changes until 584110. The redo log file with sequence 4 has the changes until 574556 which is not sufficient in our case. We will need redo logs until sequence 5 for recovery.

SQL> select * from v$log;
    GROUP#    THREAD#  SEQUENCE#      BYTES    MEMBERS ARC STATUS           FIRST_CHANGE# FIRST_TIM
---------- ---------- ---------- ---------- ---------- --- ---------------- ------------- ---------
         1          1          5   52428800          1 NO  CURRENT                 584110 24-MAR-09
         2          1          3   52428800          1 YES INACTIVE                574556 24-MAR-09
         3          1          4   52428800          1 YES ACTIVE                  577816 24-MAR-09

This confirms the availability of all the required archives for recovery in our case.

Once the required backup is in place, it is very easy to recover from any critical situation in Oracle. Let’s begin the recovery.

The Procedure
Recovery can be complete or incomplete. A complete recovery has no data loss whereas an incomplete recovery has some data loss. An incomplete recovery is accompanied by RESETLOGS. An incomplete recovery is necessary when some or all of the data is unavailable to recover the object from its restore point to the point of dataloss. An object here could be a datafile, tablespace, or database. Restore point here is the point until where the DB has been restored to using the backup, and to make the DB current until the point of object loss we will need recovery to be performed. Recovery requires availability of archived and online redo logs. If the archived logs or redo logs are missing, then incomplete recovery is the only option. We may have to use RECOVER…UNTIL TIME option for point-in-time recovery of the missing object. (more on the point-in-time recovery in future posts).

In our case, since we have all the required archives we will perform a complete recovery. There is no data loss in this recovery procedure. The DB can remain in MOUNT or OPEN state during this procedure. Only the datafile will be made offline during the restore/recovery operation if the DB is in OPEN state.

The Recovery Itself
Since all the backups are available, the actual recovery procedure is quite simple.

Connect to RMAN and issue the below statements:

RUN {
SQL ‘ALTER TABLESPACE users OFFLINE IMMEDIATE’;
RESTORE TABLESPACE users;
RECOVER TABLESPACE users;
SQL ‘ALTER TABLESPACE users ONLINE’;
}

The snapshot of the same, in my DB, is below. In my snapshot, the ALTER TABLESPACE users ONLINE statement is missing. I wanted to confirm that the RECOVER TABLESPACE command runs successfully before bringing the tablespace online. I ran the ONLINE tablespace command later at the end.
Also note that this RMAN command OFFLINE’s and ONLINE’s the entire tablespace. Since in our case this tablespace had only one datafile (which was missing), it wont much matter OFFLINE’ing the tablespace or datafile. If we had more than one datafile in this tablespace and one among those datafiles was missing or corrupt, we would OFFLINE only that datafile instead of the entire tablespace. The command is ‘ALTER DATABASE DATAFILE <file_num> OFFLINE’.

RMAN>  RUN {
2> SQL 'ALTER TABLESPACE users OFFLINE IMMEDIATE';
3> RESTORE TABLESPACE users;
4> RECOVER TABLESPACE users;
5> }
sql statement: ALTER TABLESPACE users OFFLINE IMMEDIATE
Starting restore at 24-MAR-09
using channel ORA_DISK_1
channel ORA_DISK_1: starting datafile backupset restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
restoring datafile 00004 to D:\VIKRAM\ORADATA\TEST2\USERS01.DBF
channel ORA_DISK_1: reading from backup piece D:\VIKRAM\FLASH_RECOVERY_AREA\TEST2
              \BACKUPSET\2009_03_24\O1_MF_NNNDF_TAG20090324T143340_4WL9W61M_.BKP
channel ORA_DISK_1: restored backup piece 1
piece handle=D:\VIKRAM\FLASH_RECOVERY_AREA\TEST2\BACKUPSET\2009_03_24
               \O1_MF_NNNDF_TAG20090324T143340_4WL9W61M_.BKP tag=TAG20090324T143340
channel ORA_DISK_1: restore complete, elapsed time: 00:00:04
Finished restore at 24-MAR-09
Starting recover at 24-MAR-09
using channel ORA_DISK_1
starting media recovery
media recovery complete, elapsed time: 00:00:07
Finished recover at 24-MAR-09

The recovery is complete successfully. We don’t see any errors reported. Now we ONLINE the tablespace as below:

RMAN> SQL 'ALTER TABLESPACE users ONLINE';
sql statement: ALTER TABLESPACE users ONLINE

This completes the recovery of the missing datafile. After this, the command (which errored out earlier) will run successfully:

SQL> select count(1) from scott.bonus;
COUNT(1)
----------
0

Let me know in case you find any issues following this procedure and I will assist you.

Please leave a comment if this post was helpful to you or if you have any suggestions or advice to make these posts more informative.

March 13, 2009

Missing Redo Logs – Scenario

Filed under: Oracle, RMAN, Redo Logs — Vikram @ 11:39 pm
Tags: , ,

Consider a scenario where you have lost your redo logs on a DB instance. It is possible to recreate them. It is also possible to recover the database with no data loss in certain situations. Let’s jump to action immediately. But first we need to know different ways of redo loss and which category we fit in.

A Redo log is a journal of events happening in the DB at any given time. Redo logs are the most crucial part of the database after the controlfile. The documentation defines the redo logs as follows:

The online redo log is a set of operating system files that record all changes made to any database block, including data, index, and rollback segments, whether the changes are committed or uncommitted. All changes to Oracle blocks are recorded in the online log.

A redo log, at the time of being deleted, will be in one of the below scenarios:
1) The redo log is not CURRENT and archived
2) The redo log is not CURRENT but not archived
3) The redo log is CURRENT (DB was shut down cleanly)
4) The redo log is CURRENT (DB was not shut down cleanly)
Identify which scenario fits you. In all the scenarios except (1) we will need to open the database using OPEN RESETLOGS option. You may use the below SQL to find the above details:

SQL> select a.group#, b.member, a.archived, a.status
  2  from v$log a, v$logfile b where a.group#=b.group#;

    GROUP# MEMBER                                   ARC STATUS
---------- ---------------------------------------- --- ----------------
         3 D:\VIKRAM\ORADATA\TEST\TEST\REDO03.LOG   YES INACTIVE
         2 D:\VIKRAM\ORADATA\TEST\TEST\REDO02.LOG   YES UNUSED
         1 D:\VIKRAM\ORADATA\TEST\TEST\REDO01.LOG   NO  CURRENT

Remember: RESETLOGS is compulsory when there is an incomplete recovery.

1) Redo log is not CURRENT and archived

If a redo log file is already archived, its loss can safely be ignored. Since all the changes in the DB are now archived and the online log file is only waiting for its turn to be re-written by LGWR (redo log files are written circularly) the loss of the redo log file doesnt matter much. It may be re-created using the command

SQL> STARTUP MOUNT;
SQL> ALTER DATABASE CLEAR LOGFILE GROUP <group#>;

This will re-create all groups and no transactions are lost. The database can be opened normally after this.

2) Redo log is not CURRENT and not archived

If the redo log is lost before it is archived then there are a few transactions in the redo log that are lost too. Since a crash recovery is no more possible, a media recovery is required. The DB needs to be restored from a previous backup and recovered until the last available archive log is applied. The reason I say crash recovery is not possible is that by definition a crash recovery should need only online redo logs to recover the database to a consistent state. If a recovery needs a log file that is not online (like archive logs) then the recovery is no more a crash recovery. It becomes media recovery. In our current case since we are applying the archives and using the previous backup to restore it is not a crash recovery.
Also since we cannot recover database completely as some data is lost in the redo log files before being archived, this becomes an incomplete recovery too. The DB needs to be opened up in RESETLOGS mode. The steps to recover are below:

-- Restore the DB from previous backup and open the DB in mount state.
SQL> RECOVER DATABASE USING BACKUP CONTROLFILE UNTIL CANCEL;
-- Apply all the archives available
SQL> ALTER DATABASE OPEN RESETLOGS;

In a similar scenario an RMAN restore will be as below:

RMAN> RESTORE CONTROLFILE FROM '<backup tag location>';
RMAN> ALTER DATABASE MOUNT;
RMAN> RESTORE DATABASE;
RMAN> RECOVER DATABASE UNTIL TIME "to_date('MAR 05 2009 19:00:00','MON DD YYYY HH24:MI:SS')";
RMAN> ALTER DATABASE OPEN RESETLOGS;

The above commands can also be used in a normal recovery scenario from Production to Test with a few modifications. The actual plan of steps that I had followed can be found here. There is a metalink note 1044466.6 that discusses this recovery in greater detail.

3) Redo log is CURRENT (DB was shut down cleanly)

If the CURRENT redo log is lost and if the DB is closed consistently, OPEN RESETLOGS can be issued directly without any transaction loss. It is advisable to take a full backup of DB immediately after the STARTUP.

4) Redo log is CURRENT (DB was not shut down cleanly)

When a current redo log is lost, the transactions in the log file are also lost before making to archived logs. Since a DB startup can no more perform a crash recovery (since all the now-available online log files are not sufficient to startup the DB in consistent state), an incomplete media recovery is the only option. We will need to restore the DB from a previous backup and restore to the point just before the lost redo log file. The DB will need to be opened in RESETLOGS mode. There is some transaction loss in this scenario.

RMAN> RESTORE CONTROLFILE FROM '<backup tag location>';
RMAN> ALTER DATABASE MOUNT;
RMAN> RESTORE DATABASE;
RMAN> RECOVER DATABASE UNTIL TIME "to_date('MAR 05 2009 19:00:00','MON DD YYYY HH24:MI:SS')";
RMAN> ALTER DATABASE OPEN RESETLOGS;

Here we come to an end of all the scenarios in which a redo log can be lost.

There is a simpler way to restore the deleted current redo log file on Unix/Linux platforms too. The procedure to do it will be mentioned in my subsequent posts.

Meanwhile, I would love to hear from you any suggestions or additions to this post. Thank you.

March 8, 2009

RMAN Restore from Production to Test – Scenario

Filed under: Clone, RMAN — Vikram @ 6:35 pm
Tags: , , ,

Below is a scenario of RMAN restore from an old backup. This is my very first experience with RMAN. I intend to work more on RMAN over the next few days and I would like to share my knowledge with everyone. Feel free to correct me anywhere I am wrong. Your comments are welcome.

Context:
The client takes a level 0 (full) backup of DB every day. Since the DB is small, a level 0 backup suits us well. Else a proper backup strategy would have been useful. My client tests his tape backup quarterly for its propriety. RMAN backups from disk are taped weekly. Each quarter the tape contents are dumped onto the disk on the Test server and a new Test instance is created. Since this is a quarterly task and had been done many times before I was sure the documentation that was already in place was sufficient. But there was a change in the DB this time which could have caused issues and I wouldn’t know it until I began the restore.

I had all the backup pieces for that day’s RMAN backup. I needed to begin the restore. The restore in itself is simple. Since Prod and Test databases have different file locations, I needed to be careful in letting RMAN know where to restore the files to. If the restore is being done on the same server as where there is source, then this becomes necessary to let RMAN know that the new DB files need to be restored to a different location and the DB should be created using those new db files. Else RMAN may overwrite the current datafiles. Then you may not want to live in your own skin! In our case, however, since the Test and Production servers are different we are least likely to trifle with the Production db files. But since the requirement is to restore the datafiles to a different location than what is in the Production, I use SET NEWNAME FOR DATAFILE ... command.

1. Confirm the tape dump contains all the required backup pieces. You may do this in many ways. The easiest way is to look at the RMAN log file for that day’s backup. That will contain all the Backuppiece names for that backup. Additionally, you may also query the RMAN repository in the below fashion. I was looking at the 24-FEB-09 backup to be restored from and the output below is trimmed to show only that day’s backup.

RMAN> list backup summary;

List of Backups
===============
Key TY LV S Device Type Completion Time #Pieces #Copies Compressed Tag
——- — – – ———– ————— ——- ——- ———- —
<..cropped..>
224 B F A DISK 24-FEB-09 1 1 NO TAG20090224T180002
225 B F A DISK 24-FEB-09 1 1 NO TAG20090224T180002
229 B F A DISK 24-FEB-09 1 1 NO TAG20090224T182832
<..cropped..>

RMAN> list backuppiece tag='TAG20090224T180002';

List of Backup Pieces
BP Key BS Key Pc# Cp# Status Device Type Piece Name
——- ——- — — ———– ———– ———-
224 225 1 1 AVAILABLE DISK G:\RMAN\BACKUPS\B_79K86EES_233_1
223 224 1 1 AVAILABLE DISK G:\RMAN\BACKUPS\B_78K86D13_232_1

RMAN> list backuppiece tag='TAG20090224T182832';

List of Backup Pieces
BP Key BS Key Pc# Cp# Status Device Type Piece Name
——- ——- — — ———– ———– ———-
228 229 1 1 AVAILABLE DISK G:\RMAN\BACKUPS\CNTRL_7DK86EMG_237_1

Ensure the above Pieces are available in the tape dump.

Caveat: This didn’t list all the backuppieces to me. The backuppieces with Archives are not listed here and I am not sure (yet) how to do it. I will find out and update the blog very soon. There definitely should be a way to query all the backuppieces that were generated for a day’s backup.

2. Connect to RMAN and restore the backup controlfile from the backuppiece.

RMAN> RESTORE CONTROLFILE FROM 'G:\RMAN\BACKUPS\CNTRL_7DK86EMG_237_1';
Starting restore at 06-MAR-09
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=212 devtype=DISK
channel ORA_DISK_1: restoring control file
channel ORA_DISK_1: restore complete, elapsed time: 00:00:07
output filename=E:\PTC\WINDCHILL_9.0\OCU\ORADATA\WIND9\CONTROL01_WIND9.CTL
output filename=E:\PTC\WINDCHILL_9.0\OCU\ORADATA\WIND9\CONTROL02_WIND9.CTL
output filename=E:\PTC\WINDCHILL_9.0\OCU\ORADATA\WIND9\CONTROL03_WIND9.CTL
Finished restore at 06-MAR-09

3. Mount the Database using the restored controlfile.

RMAN> ALTER DATABASE MOUNT;
database mounted
released channel: ORA_DISK_1

4. Execute the below command entirely in a single RUN... block.

RUN
{
SET NEWNAME FOR DATAFILE 1 TO 'E:\ptc\Windchill_9.0\ocu\ORADATA\WIND9\SYSTEM01.DBF';
SET NEWNAME FOR DATAFILE 2 TO 'E:\ptc\Windchill_9.0\ocu\ORADATA\WIND9\UNDOTBS01.DBF';
SET NEWNAME FOR DATAFILE 3 TO 'E:\ptc\Windchill_9.0\ocu\ORADATA\WIND9\SYSAUX01.DBF';
SET NEWNAME FOR DATAFILE 4 TO 'E:\ptc\Windchill_9.0\ocu\ORADATA\WIND9\USERS01.DBF';
SET NEWNAME FOR DATAFILE 5 TO 'E:\ptc\Windchill_9.0\ocu\ORADATA\WIND9\TOOLS01.DBF';
SET NEWNAME FOR DATAFILE 6 TO 'E:\ptc\Windchill_9.0\ocu\ORADATA\WIND9\INDX01.DBF';
SET NEWNAME FOR DATAFILE 7 TO '
E:\ptc\Windchill_9.0\ocu\ORADATA\WIND9\INDX02.DBF';
SET NEWNAME FOR DATAFILE 8 TO '
E:\ptc\Windchill_9.0\ocu\ORADATA\WIND9\BLOBS01.DBF';
SET NEWNAME FOR DATAFILE 9 TO '
E:\ptc\Windchill_9.0\ocu\ORADATA\WIND9\UNDOTBS02.DBF';
SET NEWNAME FOR DATAFILE 10 TO '
E:\ptc\Windchill_9.0\ocu\ORADATA\WIND9\SADBAMD01.DBF';
SET NEWNAME FOR DATAFILE 11 TO '
E:\ptc\Windchill_9.0\ocu\ORADATA\WIND9\USERS02.DBF';
SET NEWNAME FOR DATAFILE 12 TO '
E:\ptc\Windchill_9.0\ocu\ORADATA\WIND9\USERS03.DBF';
SET NEWNAME FOR DATAFILE 13 TO '
E:\ptc\Windchill_9.0\ocu\ORADATA\WIND9\BLOBS02.DBF';
SET NEWNAME FOR DATAFILE 14 TO '
E:\ptc\Windchill_9.0\ocu\ORADATA\WIND9\BLOBS03.DBF';
SET NEWNAME FOR DATAFILE 15 TO '
E:\ptc\Windchill_9.0\ocu\ORADATA\WIND9\BLOBS04.DBF';
SET NEWNAME FOR DATAFILE 16 TO '
E:\ptc\Windchill_9.0\ocu\ORADATA\WIND9\WCAUDIT01.DBF';
SQL "ALTER DATABASE RENAME FILE ''E:\WIND9\ORADATA\WIND9\REDO01.LOG'' TO ''
E:\ptc\Windchill_9.0\ocu\ORADATA\WIND9\REDO01.LOG'' ";
SQL "ALTER DATABASE RENAME FILE ''E:\WIND9\ORADATA\WIND9\REDO02.LOG'' TO ''
E:\ptc\Windchill_9.0\ocu\ORADATA\WIND9\REDO02.LOG'' ";
SQL "ALTER DATABASE RENAME FILE ''E:\WIND9\ORADATA\WIND9\REDO03.LOG'' TO ''
E:\ptc\Windchill_9.0\ocu\ORADATA\WIND9\REDO03.LOG'' ";
RESTORE DATABASE;
## SWITCH DATAFILE ALL;
RECOVER DATABASE UNTIL TIME "to_date('MAR 05 2009 19:00:00','MON DD YYYY HH24:MI:SS')";
}

You will change the datafile locations in the above script to suit to your requirements.

The SET NEWNAME ... command is used to redirect the RMAN utility to restore the datafiles to a location pointed to. Since the datafile locations are different in Source and Target instances this is necessary. Else RMAN will restore the files to the same location (or directory path) of Production DB file locations.

The redo log files have to be renamed in the controlfile. Hence the ALTER DATABASE RENAME FILE ... command.
The above command should complete the restore of the DB.

5. You may open the DB manually.

RMAN> ALTER DATABASE OPEN RESETLOGS;

The restore is complete. Oh, I also had to recreate TEMP files.

SQL> ALTER TABLESPACE TEMP ADD TEMPFILE 'E:\ptc\Windchill_9.0\ocu\ORADATA\WIND9\TEMP01.dbf' size 500M;

Below is an additional issue I had seen while doing this refresh and this error should not arise to anyone doing this task under already specified conditions. In my case, I hadn’t noticed that the 10g ORACLE_HOME versions of Production and Test were different. Production is 10.2.0.1 and Test is 10.2.0.3. Ideally, the ORACLE_HOME from Production should have been copied onto Test server too, but in my case I didn’t do it. This was the unforeseen issue I was referring to at the beginning of my post.

ORA-00704: bootstrap process failure
ORA-39700: database must be opened with UPGRADE option
...
Instance terminated by USER, pid = 2360
ORA-1092 signalled during: alter database open resetlogs...

This error was because the restore was happening, in my case, between two different DB versions. I will need to upgrade the DB before I can open the DB.
Here is what I did:

Connect to SQL*Plus
SQL> conn /as sysdba
SQL> SHUT IMMEDIATE
SQL> STARTUP UPGRADE
SQL> SPOOL D:\VIKRAM\RMAN\UPGRADE_VS.LOG
SQL> @?/rdbms/admin/catupgrd.sql
SQL> @?/rdbms/admin/utlrp.sql
SQL> SPOOL OFF

Ensure there are no errors mentioned in the Upgrade log. This is very important.

SQL> SHUT IMMEDIATE
SQL> STARTUP

This fixed the issue. Do let me know your experiences with RMAN too.

February 11, 2009

Some Useful Scripts

Filed under: Script — Vikram @ 4:07 pm
Tags: , ,

Below are a few useful scripts from $FND_TOP that are available on all the instances.

afamicxpar.sql – ICX Configuration
afamsact.sql – Summary SQL Activity
afcmcreq.sql – Prints the Log file names of the managers that can run a given request
afcmrrq.sql – For use by DBA’s for quick look of all running requests
afcmstat.sql – Displays all defined Managers and shows current status of manager processes.
afcrmchk.sql – Displays Concflict Resolution Managers status and last cycle information …
afimchk.sql – Checks to see if the Concurrent Manager monitor is actually running
afimlock.sql – Script to detect if the Internal Concurrent Manager is in a gridlock situation with another oracle process.
afqpmlck.sql – Lists all sessions that are waiting for a lock
afrqscm.sql – Prints the Log file names of the managers that can run a given request
afsvcpup.sql – Script to update GSC Component Parameters in case the UI is not available for some reason.
afrqstat.sql – Summary of concurrent request execution since Date

Cheers!

June 19, 2008

Statspack can hang your database

Filed under: Statspack — Vikram @ 8:13 pm
Tags: ,

It is Thursday evening and am already dreaming of the weekend. I would leave office in another 15 min. I began installing statspack on my production database. Its a straight-forward process. Just run the $ORACLE_HOME/rdbms/admin/spcreate.sql. I had done this over several times on several databases. I was sure this will not take more time and definitely will not throw surprises. I was never this wrong.

The statspack got installed smoothly. What I noticed was that there were eight new invalid objects in the SYS schema. All of them were DBMS_ packages. I quickly started compiling them.

SQL> ALTER PACKAGE '&pack_name' COMPILE BODY;

Wow, i see all of them have compiled without errors. I was happy until I re-checked the invalid objects. There were now eighteen invalid objects in the SYS schema. All were again DBMS_* packages. Most of them were DBMS_AQ* related packages. I missed my heartbeat. This was the production database and almost at the end of the business hours. I didn’t see this issue in the test environment. I didn’t want to end up fixing this at this time.

There is a bug in the metalink that says after installing statspack using spcreate.sql there is a possibility of DBMS_UTILITY package getting invalidated. This bug is for 8.1.7.4 version and on some other platform. But I am on Linux and 9.2.0.4 DB and I still see the similar issue. Thereafter all requests that depend on this package will hang. In my case this package was DBMS_AQ. All Workflow Background Process requests were pending. They were engaging all the Standard Managers and any new concurrent request will join the pending queue. Within 2 hours i saw there were 55 pending requests.I knew the database was hanging.

While all the other invalid packages were validated, the DBMS_AQ validation hangs interminably for almost an hour and then times out. I will have to validate this package and only then will the requests like ‘Workflow Background Process’ will progress. And i can’t validate this package until the requests holding the lock on the object releases it, which is unlikely. Catch-22 situation! I decided I will bounce the application and database.

Convincing the client was going to be a difficult job. Obviously he will ask questions.
Q) Why was the object invalidated?
A) Because we had installed Statspack.
Q) But Statspack is supposed to aid us in diagnosing inherent issues, not to create new issues?
A) Yes…but…you see…there is a bug…

Oh damn all!

The client agreed to bounce the database. But guess what? the trouble is not over yet. The concurrent managers decline to go down smoothly. Because the new ‘Shutdown Concurrent Manager’ request itself joins the Pending queue. What to do? Terminate the requests, and kill the managers’ pids? What if there are any important requests running that will error out due to this? There were a lot of EDI jobs running and the Client was hesitant to let me kill it. Ouph, finally i killed (yes killed!) the managers and terminated all the requests. Brought down all the services, bounced the db, validated the invalid DBMS_AQ package, ran cmclean on the db, brought up the services. The application came back to life with new vigour and zest. The client appeared happy to see his baby playing antics again. I am sure the Client doesn’t remember the issue anymore, but i will.

This brings me to the next question. How can i terminate all the managers when they do not go down smoothly? Especially when there are requests in ‘Running’ status? They will end up with ‘Error’ status upon forceful termination of managers. It should be able to restart the requests from where they ended. Is it possible? Let’s see if it is possible in my future articles.

Moral: If things ought to go wrong, they will.

————————————————

Blog at WordPress.com.