Thursday, 11 December 2014

Database In-Memory is Generally Available

Oracle Database 12c’s newest patch release 12.1.0.2 is generally available today.  In addition to a number of new enhancements for efficiency, performance, availability, and simpler application development, the most exciting aspect of this new release is the general availability of Oracle Database In-Memory.
If you missed Larry Ellison’s introduction to Oracle Database In-Memory, you can watch a short summary of his keynote here. Oracle Database In-Memory delivers breakthrough analytical performance without any compromise to your OLTP environment and without requiring any application changes. And when I say breakthrough performance, we’re talking about queries that once took hours now completing in less than a second!
Let’s set some context here, because sometimes I feel business analytics is thought of as some kind of adjunct, second-tier project. A lot of this comes from our experience with the way data warehousing and analytics solutions have been delivered until now. It was somewhat inevitable that you had to move data around, build indexes, and construct aggregate objects such as OLAP cubes to deliver analytics to the business. That approach is becoming increasingly impractical as business users simply can’t wait that long to get access to information.
And this is the power of Oracle Database In-Memory: business users don’t have to wait for the data to be moved and for intermediate objects to be built. Oracle Database In-Memory processes queries on the same server where the transactions are happening. That means business analytics can literally happen in the moment of the business transaction instead of at some later time after the fact. This enables business users to get instant, up-to-date answers all the time. And this is the promise of the real-time enterprise.
And you don’t have to necessarily move to a mixed workload setup immediately. If you have data warehousing infrastructure today, you can modernize it with Oracle Database In-Memory by accelerating query performance on the data warehousing systems. This eliminates much of the tuning and object creation done today while delivering business intelligence substantially faster.
To learn more about Oracle Database In-Memory, check out our web page on Oracle.com as well as some of the new assets on our technical page on Oracle Technology Network and the In-Memory blog. Oracle Database In-Memory and all the other enhancements that are part of Oracle Database 12c Release 12.1.0.2 can be downloaded right here. Upgrade to Oracle Database 12c today to plug into the cloud and power the real-time enterprise!

Beginning Performance Tuning: Active Session History
By Arup Nanda, Oracle ACE Director



Identify the root cause of performance issues at a precise point in the past—even when the sessions have disconnected.
In my previous two articles on the topic of performance tuning (bit.ly/SEzR6t and bit.ly/U7U5IJ), you learned how to identify the cause of performance bottlenecks occurring in real time and how to view a summary of what happened in the past. Although these processes provide enough information to resolve performance issues in many cases, there are times when you would like to know precisely what happened at a specific point in the past. In this article, you will learn how to identify the precise sequence of events in a historical (not current) session.
To run the examples in this article, execute the setup.sql script. The script creates a user called ARUP and creates three tables: TEST1, TEST2, and TEST3. In the very rare event that a user called ARUP already exists, modify the script to create another user with a different name and replace all occurrences of ARUP with that name. The sample database created by setup.sql will take up less than 200 MB of space in your database.
After running the setup.sql script, open four SQL*Plus sessions, connected as ARUP. In three of these sessions, execute the test1.sql, test2.sql, and upd1.sql scripts, respectively. Here is an example of one session running upd1.sql on a UNIX-based system: 
# sqlplus arup/arup
SQL> @upd1.sql
 
In the fourth session, execute upd1.sql again. This last session will hang, because the third session has already executed upd1.sql, which updates one row of the TEST1 table but does not commit and therefore keeps the row locked. The fourth session executes the same script, which attempts to update the same row. But the row is locked by the third session, so the fourth session waits until the lock is gone and the row is available for locking.
The test1.sql and test2.sql scripts are designed to consume significant amounts of CPU, so the first two sessions will be slow due to CPU contention. These sessions will run for a very long time.
After all the sessions have waited for about 10 minutes, stop the execution in sessions 1 and 2 by pressing Control-C to exit each SQL*Plus session. In session 3, enter commit and press Enter. This will release the lock on the row, and you will see session 4 show “1 row updated.” Exit SQL*Plus sessions 3 and 4. Now all the sessions connected as ARUP are closed—and history.

Past Sessions

In the setup, I showed you how to simulate performance issues in three of the four sessions. If these were typical business applications, the applications would have showed signs of slowdown, inviting the ire of the respective application owners and users and bringing unwanted attention to you, the DBA. Now that you have that attention, what should you do next?
In my previous two performance tuning articles, you learned how to find the clues to the sources of performance issues in sessions. Unfortunately, those techniques will be of no help in this case. The V$SESSION view shows the reason for a performance issue in a session, but the session that caused the issue in this article is no longer present. The other important view, V$SESSION_EVENT, shows all the wait events waited for by the session, but, as with the V$SESSION view, it will show the data only if the session is still active in the database. The view V$SESSTAT shows the resource usage for a session and can offer clues to which sessions have consumed what amount of a resource such as redo or memory. However, because all the sessions that experienced performance issues are historical, looking into these views will not help resolve the issues. You need to determine the specific events that occurred at a point in time in historical sessions. If you had enabled tracing, the trace files would have shown the events, but you did not expect these issues in advance and didn’t enable tracing, because tracing would have degraded performance. So how do you now identify the cause of the performance issues suffered by these old sessions?

Active Session History

Fortunately, looking at performance issues in old sessions is easy with an Oracle Database feature called Active Session History. Note that the use of Active Session History requires Oracle Diagnostics Pack, a licensed option of Oracle Database available since Oracle Database 10g Release 1.
Every second, Active Session History polls the database to identify the active sessions and dumps relevant information about each of them—such as the user ID, state, the machine it is connected from, and the SQL it is executing—into a special area in the system global area (SGA) of the database instance called the ASH buffer. So even though a session is no longer present in the database instance, the ASH buffer has captured its information. In addition, because Active Session History records activities every second, it can show a second-by-second snapshot of the activities in a session. In other words, Active Session History can show a movie of the activities of the session instead of a single picture. (Note that when the ASH buffer is filled, the data is written to disk and the snapshots are taken every 10 seconds rather than every second.)
You can examine the contents of the ASH buffer in a view named V$ACTIVE_SESSION_HISTORY. Here are a few of the important columns of the V$ACTIVE_SESSION_HISTORY view:
SAMPLE_ID. The unique identifier of the Active Session History record.
SAMPLE_TIME. When Active Session History captured this data on all active sessions.
USER_ID. The numerical user ID (not the username) of the database user who created this session.
SESSION_ID. The session ID (SID) of the session.
SESSION_STATE. The state the session was in when Active Session History took the sample. It shows WAITING if the session was waiting for something; otherwise, it shows ON CPU to indicate that the session was doing productive work.
EVENT. If the session was in a WAITING state (in the SESSION_STATE column), this column will show the wait event the session was waiting for.
TIME_WAITED. If the session was in a WAITING state, this column will show how long it had been waiting when Active Session History took the sample.
WAIT_TIME. If the session was doing productive work (not in a WAITING state), this column will show how long the session waited for its last wait event.
SQL_ID. The ID of the SQL statement the session was executing at the time the sample was taken.
SQL_CHILD_NUMBER. The child number of the cursor. If this is the only version of the cursor, the child number will be 0.
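Putting these columns together, a query along the following lines shows the second-by-second activity of a user's sessions; the USER_ID value and the time window here are illustrative:

```sql
-- Sketch: second-by-second activity of the sessions of one user.
-- The USER_ID (92) and the time window are illustrative values.
select session_id, sample_time, session_state,
       event, wait_time, time_waited, sql_id, sql_child_number
from v$active_session_history
where user_id = 92
and sample_time between
    to_date('29-SEP-12 04.55.00 PM','dd-MON-yy hh:mi:ss PM')
    and
    to_date('29-SEP-12 05.05.00 PM','dd-MON-yy hh:mi:ss PM')
order by session_id, sample_time;
```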
Knowing the meaning of the V$ACTIVE_SESSION_HISTORY columns, you can identify what it was that historical sessions were waiting for. To begin the identification, you need to pose two questions to the application owners or users executing the SQL statements that experienced slow performance: 
  • Which username was used to connect to the database?
  • What was the time interval (start and end times) of the period when the performance issues occurred? 
Because you ran the setup script as the user ARUP, the answer to the first question is ARUP. Next, you need to find out the USER_ID of the ARUP user by issuing the following SQL: 
select user_id 
from dba_users
where username = 'ARUP';
 
   USER_ID
—————————————
        92
 
Now suppose the user told you that the performance issues occurred between 4:55 p.m. and 5:05 p.m. on September 29. With this information, you can query the V$ACTIVE_SESSION_HISTORY view to find out the activities of the ARUP sessions (with USER_ID 92) within that period, as shown in Listing 1. (The output has been truncated to fit the space available.) Because Active Session History collects information on all active sessions, you need to order the output by SID, which identifies a session (shown under SESSION_ID), and then by the collection time (shown under SAMPLE_TIME).
Let’s examine the first row of the output. It shows that the session identified by SESSION_ID 39 was waiting for the “enq: TX - row lock contention” event on 29-SEP-12 at 04.55.02.379 PM. Because the session was in a WAITING state, the value of the WAIT_TIME column is irrelevant, so it shows up as 0. Because the session was still waiting when Active Session History captured the sample, the TIME_WAITED column also shows 0. When the session finally got the lock, it could do what it had to do and stopped waiting. At that point, the total time the session had waited was updated in Active Session History, shown in the first boldface line in the Listing 1 output (sample time 29-SEP-12 05.16.52.078): 1,310,761,160 microseconds (shown in the TIME_WAITED column), or about 22 minutes. This is such an important property of Active Session History that I will repeat it: seeing 0 in the WAIT_TIME column does not mean that the session didn’t wait at all. It simply means that the session was still waiting at the moment the sample was taken. To determine what the total wait time really was, look at the last occurrence of that wait event (the EVENT column value) for the session in Active Session History.
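In practice, that total can be read off by taking the last (and therefore largest) TIME_WAITED sample recorded for the event; a sketch, with an illustrative session ID and event:

```sql
-- Sketch: the total wait is the TIME_WAITED of the last sample in which
-- the event appears; intermediate samples show 0. Values illustrative.
select event, max(time_waited)/1000000 total_wait_seconds
from v$active_session_history
where session_id = 39
and event = 'enq: TX - row lock contention'
group by event;
```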
When you explain to your user that the cause of the delay was an unavailable lock during the period 04.55.02.379 PM to 05.16.52.078 PM, that person might ask you what SQL statement the session was executing at that time. That’s extremely easy to get from the Listing 1 output: the session with SESSION_ID 39 was executing the SQL statement with SQL_ID fx60htyzmz6wv and child number (CH#) 0.
You can get that SQL statement text with this query: 
select SQL_TEXT 
from v$sql 
where sql_id = 'fx60htyzmz6wv';
 
SQL_TEXT
————————————————————————————————————————————————
update test1 set status = 'D' where object_id = :b1
 
The SQL statement includes an UPDATE that had to lock the row. Because the row had already been locked by another session, it was not possible for this session (SESSION_ID 39) to succeed, and it therefore had to wait. The next logical questions from the user would perhaps be which specific row of which table the SQL statement was waiting for and which session had locked that row. These are also very easy to find with the query shown in Listing 2. The BLOCKING_SESSION column shows the session that was holding the lock: the session with SESSION_ID 43. The Listing 2 output also shows the object ID (CURRENT_OBJ#) of the table whose row was locked and the other information necessary to get the row information.

With the data in the Listing 2 output, you can get the ROWID of the locked row by using the query shown in Listing 3. The row with ROWID AAAdvSAAHAAABGPAAw had been locked by the session with SESSION_ID 43 and was the row the session with SESSION_ID 39 was trying to update. You now know why the session with SESSION_ID 39 was slow (it waited 22 minutes for a lock), what SQL statement it was executing, and what specific row it was trying to lock. You have successfully uncovered the root cause of the performance issue in that session.
Code Listing 3: Getting specific row information
select
    owner||'.'||object_name||':'||nvl(subobject_name,'-') obj_name,
    dbms_rowid.rowid_create (
        1,
        o.data_object_id,
        row_wait_file#,
        row_wait_block#,
        row_wait_row#
    ) row_id
from v$session s, dba_objects o
where sid = &sid
and o.data_object_id = s.row_wait_obj#;

OBJ_NAME       ROW_ID
—————————————  ———————————————————
ARUP.TEST1:-   AAAdvSAAHAAABGPAAw
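Listing 2 itself is not reproduced in this excerpt. A query along the following lines, using the BLOCKING_SESSION and CURRENT_* columns of V$ACTIVE_SESSION_HISTORY, would return the blocking session and the row-location details discussed above; the session ID and event are illustrative:

```sql
-- Sketch reconstruction (Listing 2 is not shown in this excerpt):
-- BLOCKING_SESSION identifies the lock holder; CURRENT_OBJ#,
-- CURRENT_FILE#, CURRENT_BLOCK#, and CURRENT_ROW# locate the row.
select sample_time, blocking_session,
       current_obj#, current_file#, current_block#, current_row#
from v$active_session_history
where session_id = 39
and event = 'enq: TX - row lock contention'
order by sample_time;
```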
 

Resource Contention

After uncovering the root cause of the slowness in session 39, you now turn your attention to session 44. Revisiting the output in Listing 1, you can see that session 44 switched between waiting (shown under SESSION_STATE as WAITING) and doing productive work (shown as ON CPU). Note the very first line in the output for the session with SESSION_ID 44 and SAMPLE_TIME 29-SEP-12 04.55.34.419 PM. The SESSION_STATE column shows WAITING, which means that the session was waiting at that time. The EVENT and TIME_WAITED columns show “resmgr:cpu quantum” and “109984,” respectively. This means that the session had already waited for 109,984 microseconds, or about 0.11 seconds, at that time for the “resmgr:cpu quantum” event. The next line in Listing 1, sampled about a second later, shows the SESSION_STATE column value as ON CPU, which means that the session was doing productive work at that time—not waiting. You need to know why the session was intermittently waiting for this wait event and therefore slowing down.
The “resmgr:cpu quantum” event comes from Oracle Database Resource Manager. Database Resource Manager acts as a resource governor: it limits the CPU consumption of individual sessions when the total CPU demand from all sessions exceeds 100 percent, and it enables more-important sessions to get the CPU they need. Because the output shows the session waiting, you can conclude that the CPU consumption of the session with SESSION_ID 44 was high enough at that time for Database Resource Manager to limit its CPU usage. Well, the application owner counters, this application is very important and its resource usage should not have been constrained.
In that case, you may suspect that the session was under a consumer group that has a more restrictive CPU allocation than expected. So your next stop is to find out what consumer group was active for the session at that time—not now. Fortunately, the process for finding this information is straightforward. Active Session History records the consumer group that was active for a session at the time of sampling and displays that information in the CONSUMER_GROUP_ID column in the V$ACTIVE_SESSION_HISTORY view. You can see that information by using the query shown in Listing 4.
Code Listing 4: Listing consumer groups 
select sample_time, session_state, event, consumer_group_id
from v$active_session_history
where user_id = 92
and sample_time between
    to_date('29-SEP-12 04.55.02 PM','dd-MON-yy hh:mi:ss PM')
    and
    to_date('29-SEP-12 05.05.02 PM','dd-MON-yy hh:mi:ss PM')
and session_id = 44
order by 1;

                           SESSION
SAMPLE_TIME                _STATE   EVENT               CONSUMER_GROUP_ID
—————————————————————————  ———————  ——————————————————  —————————————————
29-SEP-12 04.55.34.419 PM  WAITING  resmgr:cpu quantum              12166
29-SEP-12 04.55.35.419 PM  ON CPU                                   12166
29-SEP-12 04.55.36.419 PM  WAITING  resmgr:cpu quantum              12166
29-SEP-12 04.55.37.419 PM  ON CPU                                   12166
29-SEP-12 04.55.38.419 PM  WAITING  resmgr:cpu quantum              12166
29-SEP-12 04.55.39.419 PM  WAITING  resmgr:cpu quantum              12166
29-SEP-12 04.55.40.419 PM  ON CPU                                   12166
… output truncated …
29-SEP-12 04.55.37.419 PM  ON CPU                                   12162
29-SEP-12 04.55.38.419 PM  ON CPU                                   12166
29-SEP-12 04.55.39.419 PM  ON CPU                                   12162
29-SEP-12 04.55.40.419 PM  ON CPU                                   12162
 
Because the session may have been under different consumer groups, it’s wise to select the consumer groups for all of the sampled data in Active Session History for that session, rather than just one sample. In this case, the session was under consumer group 12166 until it changed to 12162 on 29-SEP-12 at 04.55.37.419 PM. To find out the names of the consumer groups, use the following query: 
select id, name
from v$rsrc_consumer_group
where id in (12166,12162);

   ID  NAME
—————— ————————————
12166  OTHER_GROUPS
12162  APP_GROUP
 
From the output of Listing 4, you can see that from 04.55.34.419 PM, the session was under CONSUMER_GROUP_ID 12166, which is OTHER_GROUPS (as shown in the query against v$rsrc_consumer_group above). The CONSUMER_GROUP_ID changed from 12166 to 12162 at 04.55.37.419 PM. This could have happened for one of three likely reasons: a DBA manually activated a different resource plan, a different plan was activated automatically by a scheduling mechanism, or a DBA changed the consumer group of the session from OTHER_GROUPS to APP_GROUP (also shown in the query above) online. Whatever the reason, with the APP_GROUP consumer group in effect, the session was doing more productive work (as shown by the SESSION_STATE value ON CPU) rather than waiting on the “resmgr:cpu quantum” event. This observation may lead to many conclusions, but the most obvious is perhaps that the APP_GROUP consumer group is less restrictive in its CPU allocation than OTHER_GROUPS. In that case, you should examine why the OTHER_GROUPS consumer group was active earlier and, perhaps more importantly, whether this restriction was necessary or just a mistake. In any case, you have now found the root cause of the wait.
The next obvious question is why session 44 consumed so much CPU that it had to be constrained by Database Resource Manager. The answer lies in the SQL statement that session 44 was executing at that time (not now). The SQL_ID column in the output of Listing 1 was fngb4y81xr57x. You can get the text of that SQL statement with the following query:
select SQL_TEXT 
from v$sql 
where sql_id = 'fngb4y81xr57x';
 
SQL_TEXT
———————————————————————————————————
SELECT MAX(TEST1.OWNER) FROM TEST1, 
TEST2, TEST2, TEST2, TEST2, TEST2, 
TEST2, TEST2, TEST2, TEST2, TEST2, 
TEST2, TEST2, TEST2, TEST2, TEST2, 
TEST2, TEST2, TEST2, TEST2, TEST2, 
TEST2, TEST2, TEST2, TEST2
 
You can immediately see from the output that the query was performing a Cartesian join (joining all rows of one table with all rows of another, without a join condition) of the TEST2 table with itself several times. Even if the TEST2 table has a relatively small number of rows, the Cartesian joins will produce a huge number of logical I/Os. Because logical I/O consumes CPU cycles, it’s hardly surprising that the session used so much CPU that Database Resource Manager limited it. To resolve this issue, you will need to rewrite the query to eliminate or reduce the Cartesian joins.
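In this contrived example, the repeated TEST2 references contribute nothing to the result (TEST1.OWNER is the only column used), so the rewrite is trivial; in a real application you would instead add the missing join predicates. A sketch of the reduced statement:

```sql
-- Sketch: with the pointless Cartesian self-joins of TEST2 removed,
-- the statement reduces to a single-table aggregate
-- (equivalent here because TEST2 is non-empty).
select max(owner) from test1;
```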

More Uses

In the previous sections, you saw how to find issues that occurred at a specific point in the past in Active Session History. Hopefully, the content gave you an idea of how powerful Active Session History is and how it can be used in many circumstances. Here is another example of the power and usage of Active Session History: suppose a user complains that things seemed to have been slow from a specific client machine—prolaps01—between 4:55 p.m. and 5:05 p.m. on September 29. Because Active Session History also records the machine name, you can use the query shown in Listing 5 to display all the different events experienced by all sessions from the prolaps01 machine and how often each event occurred. You can see from the Listing 5 output that during that time, the sessions connected from the prolaps01 client machine experienced locking and Resource Manager–related waits many times. With this information, you can dig further inside the V$ACTIVE_SESSION_HISTORY view to identify specific sessions and what they were doing earlier to have experienced these waits.
Code Listing 5: Checking all events from a machine 
select event, count(1)
from v$active_session_history
where machine = 'prolaps01'
and sample_time between
    to_date('29-SEP-12 04.55.00 PM','dd-MON-yy hh:mi:ss PM')
    and
    to_date('29-SEP-12 05.05.00 PM','dd-MON-yy hh:mi:ss PM')
group by event
order by 2;

EVENT                             COUNT(1)
————————————————————————————      ————————
… output truncated …
db file scattered read                  93
db file parallel write                 127
log file parallel write                134
db file sequential read                293
control file parallel write            612
control file sequential read           948
enq: TX - row lock contention         1309
resmgr:cpu quantum                    1371
 
Although pulling specific data from Active Session History is great for spot analysis, sometimes you may want to take a look at the entire collection of Active Session History data within a time frame. Active Session History reports for a specific time period are great for that. You can generate an Active Session History report from Oracle Enterprise Manager or from the command line. For the latter, connect to the database as a DBA user and execute the following script at the SQL*Plus prompt: @$ORACLE_HOME/rdbms/admin/ashrpt.sql.

Active Session History Archive

Active Session History collects information on active sessions from the database instance every second. Depending on the database activity, that will lead to a lot of data collected inside the ASH buffer, but because the ASH buffer is a memory-resident structure, it has only a finite amount of space. In addition, when the instance goes down, the instance’s memory vanishes with it. Therefore, Oracle Database archives the information from the ASH buffer to a database table to make it persistent. This archived table data is visible in a view called DBA_HIST_ACTIVE_SESS_HISTORY. If you don’t find the data in the V$ACTIVE_SESSION_HISTORY view, check for it in the DBA_HIST_ACTIVE_SESS_HISTORY view, as shown in Listing 6. The output shows that the session was experiencing row lock waits—you can get the lock and row information from the DBA_HIST_ACTIVE_SESS_HISTORY view by using the query shown in Listing 7.
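Listing 6 is not reproduced in this excerpt; a check along the following lines, with the same USER_ID and time window used earlier (both illustrative), shows whether the archived view holds the samples:

```sql
-- Sketch: look for the samples in the persistent archive view when
-- they are no longer in V$ACTIVE_SESSION_HISTORY. Values illustrative.
select session_id, sample_time, session_state, event, time_waited
from dba_hist_active_sess_history
where user_id = 92
and sample_time between
    to_date('29-SEP-12 04.55.02 PM','dd-MON-yy hh:mi:ss PM')
    and
    to_date('29-SEP-12 05.05.02 PM','dd-MON-yy hh:mi:ss PM')
order by session_id, sample_time;
```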
Code Listing 7: Getting row lock information from the Active Session History archive 
select sample_time, session_state, blocking_session,
owner||'.'||object_name||':'||nvl(subobject_name,'-') obj_name,
    dbms_rowid.rowid_create (
        1,
        o.data_object_id,
        current_file#,
        current_block#,
        current_row#
    ) row_id
from dba_hist_active_sess_history s, dba_objects o
where user_id = 92
and sample_time between
    to_date('29-SEP-12 04.55.02 PM','dd-MON-yy hh:mi:ss PM')
    and
    to_date('29-SEP-12 05.05.02 PM','dd-MON-yy hh:mi:ss PM')
and event = 'enq: TX - row lock contention'
and o.data_object_id = s.current_obj#
order by 1,2;
 

Conclusion

Active Session History is a very powerful facility in Oracle Database. Every second, it records information on all active sessions in the database instance into a buffer and exposes that data in a view named V$ACTIVE_SESSION_HISTORY; the data is subsequently persisted to a table visible through the DBA_HIST_ACTIVE_SESS_HISTORY view. Active Session History offers a second-by-second record of the activities in a session, even after that session has disconnected or ceased the activity, enabling you to go back in time and identify what was ailing a specific session at a certain point in the past.

Setup.sql

REM This script sets up the test user, tables, etc.
REM
REM Create the user. If this user exists, use a different name
REM
create user arup identified by arup
/
grant create session, unlimited tablespace, create table to arup
/
REM test2.sql calls dbms_lock.sleep, so the user also needs this grant
grant execute on dbms_lock to arup
/
connect arup/arup
REM
REM create all test tables
REM
drop table test1
/
drop table test2
/
drop table test3
/
create table test1
as
select * from all_objects
/
create table test2
as
select * from test1
/
create table test3
as
select rownum col1, created col2
from test2
/
  

Script: test1.sql

declare
    l_dummy_1   varchar2(2000);
    l_dummy_2   varchar2(2000);
    l_stmt      varchar2(2000);
begin
    for i in 1..71540 loop
        l_stmt :=
            'select to_char(col2,''mm/dd/yyyy hh24:mi:ss'')'||
            ' from test3'||
            ' where col1 = to_char('||i||')';
        dbms_output.put_line('l_stmt='||l_stmt);
        execute immediate l_stmt into l_dummy_1;
        l_stmt :=
            'select col1 '||
            'from test3 '||
            'where col2 = to_date('''||
            l_dummy_1||
            ''',''mm/dd/yyyy hh24:mi:ss'')'||
            ' and col1 = '''||to_char(i)||'''';
        dbms_output.put_line('l_stmt='||l_stmt);
        execute immediate l_stmt into l_dummy_2;
    end loop;
end;
/


Script: test2.sql

declare
    l_dummy varchar2(200); 
begin 
    select max(test1.owner) 
    into l_dummy
    from test1, test2, test2, test2, test2,
        test2, test2, test2, test2,
        test2, test2, test2, test2, 
        test2, test2, test2, test2,
        test2, test2, test2, test2, 
        test2, test2, test2, test2; 
    dbms_lock.sleep(120);  
end;     
/         


Script: upd1.sql

update test1 set status = 'D' where object_id = 2
/

Monday, 1 September 2014

Creating Controlfile From Scratch when No Backup is Available

You have lost the controlfile, the recovery catalog, and the backup of the controlfile too, so restoring the controlfile from a previous backup is not an option. How can you recover the database? By creating the controlfile from scratch. Interested in learning how? Read on.



Here is the final thread in the blog posts I published over the last three days about interesting situations faced by John the DBA at Acme Bank. In the first post, you saw how John restored a controlfile when the autobackup was not being done. In the second post, you learned how John discovered the DBID when someone forgot to record it. In this final installment, you will see what John does when the controlfile backup simply does not exist, or exists somewhere but can't be found, rendering the previous tips useless.

This time, John had to recreate the controlfile from scratch. Let me reiterate: he had to recreate the controlfile, using SQL, not restore it from somewhere. How did he do it? Following his own "best practices", honed by years and years of managing Oracle databases, wise ol' John always takes a backup of the controlfile to trace using this command: 
 alter database backup controlfile to trace as '/tmp/cont.sql' reuse;
This command produces a text file named cont.sql, which is invaluable in creating the controlfile. John schedules the command as a cron job (in Unix; as a scheduled task on Windows) on his database servers so that it gets executed every day, recreating the text file. The "reuse" option at the end ensures that the command overwrites the existing file, so the text file always contains fresh data from the database. Here is an excerpt from the beginning of the generated file: 
 -- The following are current System-scope REDO Log Archival related
-- parameters and can be included in the database initialization file.
--
 -- LOG_ARCHIVE_DEST=''
 -- LOG_ARCHIVE_DUPLEX_DEST=''
 --
 ... output removed for brevity...
It is a very long file. John scrolls down to the section that shows the following information: 
 -- Below are two sets of SQL statements, each of which creates a new
 -- control file and uses it to open the database. The first set opens
 -- the database with the NORESETLOGS option and should be used only if
 -- the current versions of all online logs are available. The second
 -- set opens the database with the RESETLOGS option and should be used
 -- if online logs are unavailable.

 -- The appropriate set of statements can be copied from the trace into
 -- a script file, edited as necessary, and executed when there is a
 -- need to re-create the control file.
 --
 --     Set #1. NORESETLOGS case
 --
 -- The following commands will create a new control file and use it
 -- to open the database.

-- Data used by Recovery Manager will be lost.
 -- Additional logs may be required for media recovery of offline
 -- Use this only if the current versions of all online logs are
 -- available.

-- After mounting the created controlfile, the following SQL
 -- statement will place the database in the appropriate
 -- protection mode:
 --  ALTER DATABASE SET STANDBY DATABASE TO MAXIMIZE PERFORMANCE

CREATE CONTROLFILE REUSE DATABASE "PROQA3" NORESETLOGS ARCHIVELOG
     MAXLOGFILES 32
     MAXLOGMEMBERS 4
     MAXDATAFILES 800
     MAXINSTANCES 8
     MAXLOGHISTORY 10225
 LOGFILE
   GROUP 3 (
     '+PROQA3REDOA/PROQA3/PROQA3_redo103a.rdo',
     '+PROQA3REDOB/PROQA3/PROQA3_redo103b.rdo'
   ) SIZE 2048M BLOCKSIZE 512,
   GROUP 4 (
     '+PROQA3REDOA/PROQA3/PROQA3_redo104a.rdo',
     '+PROQA3REDOB/PROQA3/PROQA3_redo104b.rdo'
   ) SIZE 2048M BLOCKSIZE 512,
   GROUP 5 (
     '+PROQA3REDOA/PROQA3/PROQA3_redo105a.rdo',
     '+PROQA3REDOB/PROQA3/PROQA3_redo105b.rdo'
   ) SIZE 2048M BLOCKSIZE 512
 -- STANDBY LOGFILE
 DATAFILE
   '+PROQA3DATA1/PROQA3/PROQA1_system_01.dbf',
   '+PROQA3DATA1/PROQA3/PROQA1_sysaux_01.dbf',
   '+PROQA3DATA1/PROQA3/PROQA1_undo1_01.dbf',
   '+PROQA3DATA1/PROQA3/PROQA1_users_data01_01.dbf',
   '+PROQA3DATA1/PROQA3/PROQA1_xdb_tbs_01.dbf',
   '+PROQA3DATA1/PROQA3/PROQA1_abcdefg_small_data1_03.dbf',
   '+PROQA3DATA1/PROQA3/PROQA1_abcdefg_large_data1_01.dbf',

... output removed for brevity ...

   '+PROQA3DATA1/PROQA3/PROQA1_undo1_02.dbf',
   '+PROQA3DATA1/PROQA3/PROQA1_undo1_03.dbf',
   '+PROQA3DATA1/PROQA3/PROQA1_abcdefg_large_data1_09.dbf',
   '+PROQA3DATA1/PROQA3/PROQA1_sysaux_03.dbf'
 CHARACTER SET AL32UTF8
 ;
As you can see, this file contains the complete syntax for creating the controlfile using the CREATE CONTROLFILE command. More importantly, the command lists all the data files and online redo logs of the database, which is invaluable information for re-creating the controlfile. John creates a SQL script file called create_controlfile.sql containing the CREATE CONTROLFILE SQL command. It's one long command spanning several lines. Here is how the file looks (with lines removed in between for brevity). Remember, this is just one command; so, there is just one semicolon at the end for the execution: 
 CREATE CONTROLFILE REUSE DATABASE "PROQA3" NORESETLOGS ARCHIVELOG
     MAXLOGFILES 32

 ... output removed for brevity ...

  '+PROQA3DATA1/PROQA3/PROQA1_sysaux_03.dbf'
 CHARACTER SET AL32UTF8
 ;
Then John extracts the following commands, which immediately follow the CREATE CONTROLFILE command in the above-mentioned file, and puts them in another file named create_temp_tablespaces.sql:
 -- Commands to add tempfiles to temporary tablespaces.
 -- Online tempfiles have complete space information.
 -- Other tempfiles may require adjustment.
 ALTER TABLESPACE TEMP1 ADD TEMPFILE '+PROQA3DATA1/PROQA3/PROQA1_temp1_01.dbf'
      SIZE 31744M REUSE AUTOEXTEND OFF;
 ALTER TABLESPACE TEMP1 ADD TEMPFILE '+PROQA3DATA1/PROQA3/PROQA1_temp1_02.dbf'
      SIZE 30720M REUSE AUTOEXTEND OFF;
 ALTER TABLESPACE TEMP1 ADD TEMPFILE '+PROQA3DATA1/PROQA3/PROQA1_temp1_03.dbf'
      SIZE 30720M REUSE AUTOEXTEND OFF;
 ALTER TABLESPACE TEMP1 ADD TEMPFILE '+PROQA3DATA1/PROQA3/PROQA1_temp1_04.dbf'
      SIZE 30720M REUSE AUTOEXTEND OFF;
 ALTER TABLESPACE TEMP1 ADD TEMPFILE '+PROQA3DATA1/PROQA3/PROQA1_temp1_05.dbf'
      SIZE 30720M REUSE AUTOEXTEND OFF;
 ALTER TABLESPACE TEMP1 ADD TEMPFILE '+PROQA3DATA1/PROQA3/PROQA1_temp1_06.dbf'
      SIZE 31744M REUSE AUTOEXTEND OFF;
 ALTER TABLESPACE TEMP1 ADD TEMPFILE '+PROQA3DATA1/PROQA3/PROQA1_temp1_07.dbf'
      SIZE 31744M REUSE AUTOEXTEND OFF;
 -- End of tempfile additions.
With the preparations completed, John proceeds to the next steps. First, he starts up the instance with the NOMOUNT option. He has to use NOMOUNT anyway, since the controlfile is missing: 
 startup nomount
This command brings up the instance only. Next, John creates the controlfile by executing the file he created earlier--create_controlfile.sql. When the command succeeds, he gets the following message: 
 Control file created.
Voila! The controlfile is now created from scratch, and with that the database is mounted automatically. However, this newly created controlfile is empty; it has no information on the state of the database, such as checkpoint and log sequence numbers. It reads what it can from the datafile headers, but the data files may have been checkpointed at various points in the past. John has to roll them forward as far as possible; that is, he has to perform a recovery on the datafiles. From the SQL*Plus prompt, he issues this statement: 
SQL> recover database using backup controlfile;

ORA-00279: change 7822685456060 generated at 04/25/2014 17:11:38 needed for thread 1
 ORA-00289: suggestion : +PROQA3ARCH1
 ORA-00280: change 7822685456060 for thread 1 is in sequence #3

Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
It's important that John uses the "using backup controlfile" option: this controlfile is not the current one, so the recovery process must know that. John carefully notes the SCN of the archived log being asked for--7,822,685,456,060. He has to provide an archived log that contains changes with this SCN. To find out which one that is, he opens another SQL*Plus window, connects as sysdba, and gathers the archived log information: 
 col first_change# head "First SCN# in Archive" format 999,999,999,999,999
 col name format a80
 select first_change#, name
 from v$archived_log
 order by 1
 /
Here is the output: 
 First SCN# in Archive NAME
 --------------------- -----------------------------------------------------------------------
     7,822,681,948,348 +PROQA3ARCH1/PROQA3/archivelog/2014_04_12/thread_1_seq_1.285.844655135
     7,822,681,949,237 +PROQA3ARCH1/PROQA3/archivelog/2014_04_12/thread_1_seq_2.286.844655143
     7,822,681,950,115 +PROQA3ARCH1/PROQA3/archivelog/2014_04_12/thread_1_seq_3.287.844664447

... output removed for brevity ...

     7,822,685,451,799 +PROQA3ARCH1/PROQA3/archivelog/2014_04_25/thread_1_seq_1.328.845828911
     7,822,685,453,816 +PROQA3ARCH1/PROQA3/archivelog/2014_04_25/thread_1_seq_2.330.845829419
Referring to this output, he sees that the latest archived log has a starting SCN of 7,822,685,453,816, which is less than the SCN being asked for. Therefore this archived log may or may not contain the changes the recovery process needs. He decides to offer that archived log anyway, so he pastes the full path of the archived log at the prompt: 
+PROQA3ARCH1/PROQA3/archivelog/2014_04_25/thread_1_seq_2.330.845829419
Oracle immediately responds with: 
 ORA-00310: archived log contains sequence 2; sequence 3 required
 ORA-00334: archived log: '+PROQA3ARCH1/PROQA3/archivelog/2014_04_25/thread_1_seq_2.330.845829419'
Clearly, the archived log John supplied is not what the recovery process was looking for. But that was the latest archived log; there is nothing after it. Remember, the changes could also be in the online redo logs, which have not been archived yet. John has to make a decision here. If the online redo logs are not available, he needs to end the recovery here by typing: 
 cancel
Oracle responds by: 
 Media Recovery canceled
After that, John opens the database: 
 alter database open resetlogs;
On the other hand, if the online redo logs are intact and available, he just needs to pass one to the recovery process. He gathers the details of the online redo logs from the other SQL*Plus window: 
 select sequence#, member
 from v$log l, v$logfile f
 where f.group# = l.group#
 order by 1;

 SEQUENCE# MEMBER
 --------- -------------------------------------------
         1 +PROQA3REDOA/PROQA3/PROQA3_redo103a.rdo
         1 +PROQA3REDOB/PROQA3/PROQA3_redo103b.rdo
         2 +PROQA3REDOA/PROQA3/PROQA3_redo104a.rdo
         2 +PROQA3REDOB/PROQA3/PROQA3_redo104b.rdo
         3 +PROQA3REDOA/PROQA3/PROQA3_redo105a.rdo
         3 +PROQA3REDOB/PROQA3/PROQA3_redo105b.rdo
From the first SQL*Plus window, John starts the recovery process again (the recovery process ends when it does not get the file it expects) and this time he supplies the name of the online redo log file: 
 SQL> recover database using backup controlfile;

ORA-00279: change 7822685456060 generated at 04/25/2014 17:11:38 needed for thread 1
 ORA-00289: suggestion : +PROQA3ARCH1
 ORA-00280: change 7822685456060 for thread 1 is in sequence #3 

Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
 +PROQA3REDOA/PROQA3/PROQA3_redo105a.rdo
Oracle responds by: 
 Log applied.
 Media recovery complete.
Voila! The database does not need any further recovery. Since the online logfile contains the last known change, Oracle knows that no further recovery is required and stops asking for more changes. John has just recovered all the changes made to the database; nothing was lost. He proceeds to open the database: 
 alter database open resetlogs;
Resetlogs is necessary here because John used a controlfile that he created himself. Remember, this was a complete recovery (nothing was lost), but the database must still be opened with resetlogs, which starts the log sequence at 1 again. From a different window, John opens the alert log of the database and checks the output: 
 ... previous output removed for brevity ...

 alter database open resetlogs
 RESETLOGS after complete recovery through change 7822685456061
 SUCCESS: diskgroup PROQA3REDOB was mounted
 Fri Apr 25 17:47:12 2014
 NOTE: dependency between database PROQA3 and diskgroup resource ora.PROQA3REDOB.dg is established
 Archived Log entry 47 added for thread 1 sequence 1 ID 0xffffffff983ca615 dest 1:
 Archived Log entry 48 added for thread 1 sequence 2 ID 0xffffffff983ca615 dest 1:
 Archived Log entry 49 added for thread 1 sequence 3 ID 0xffffffff983ca615 dest 1:
 Clearing online redo logfile 3 +PROQA3REDOA/PROQA3/PROQA3_redo103a.rdo
  Clearing online log 3 of thread 1 sequence number 1
 Fri Apr 25 17:47:24 2014
 Clearing online redo logfile 3 complete
 Clearing online redo logfile 4 +PROQA3REDOA/PROQA3/PROQA3_redo104a.rdo
 Clearing online log 4 of thread 1 sequence number 2
 Fri Apr 25 17:47:37 2014
 Clearing online redo logfile 4 complete
 Clearing online redo logfile 5 +PROQA3REDOA/PROQA3/PROQA3_redo105a.rdo
 Clearing online log 5 of thread 1 sequence number 3
 Fri Apr 25 17:47:47 2014
 Clearing online redo logfile 5 complete
 Resetting resetlogs activation ID 2554111509 (0x983ca615)
 Online log +PROQA3REDOA/PROQA3/PROQA3_redo103a.rdo: Thread 1 Group 3 was previously cleared
 Online log +PROQA3REDOB/PROQA3/PROQA3_redo103b.rdo: Thread 1 Group 3 was previously cleared
  Online log +PROQA3REDOA/PROQA3/PROQA3_redo104a.rdo: Thread 1 Group 4 was previously cleared
 Online log +PROQA3REDOB/PROQA3/PROQA3_redo104b.rdo: Thread 1 Group 4 was previously cleared
 Online log +PROQA3REDOA/PROQA3/PROQA3_redo105a.rdo: Thread 1 Group 5 was previously cleared
 Online log +PROQA3REDOB/PROQA3/PROQA3_redo105b.rdo: Thread 1 Group 5 was previously cleared
 Fri Apr 25 17:47:47 2014
 Setting recovery target incarnation to 4
 Using SCN growth rate of 16384 per second
  Fri Apr 25 17:47:48 2014
 Assigning activation ID 2554183704 (0x983dc018)
 LGWR: STARTING ARCH PROCESSES

 ... output removed for brevity ...
The output shows that the online redo logs started again at sequence #1. With the recovery now complete, John adds the tempfiles to the temporary tablespaces using the script he had created earlier--create_temp_tablespaces.sql. Then he hands the database back to the users for normal processing.

Creation of Controlfile from a Different Database

What if John had not created the controlfile trace? The recovery would still be possible, but he would have had to remember the names of all the datafiles and redo log files. In that case, he would generate the CREATE CONTROLFILE command from some other database, edit that file to put in the names of the files of the PROQA3 database he was trying to recover, and then create the controlfile. It's possible, but difficult.

Takeaways

What did you learn from this story and John? Here is a summary:
  1. Always use a recovery catalog. This post assumes that you lost that catalog as well; but now you see how difficult it is without one.
  2. Always set the controlfile to autobackup. From the RMAN command prompt, issue CONFIGURE CONTROLFILE AUTOBACKUP ON. The default is off.
  3. Always back up the RMAN logfile to tape or another location where it would be available even after the main server with the database itself is inaccessible.
  4. Always back up the controlfile to trace with a cron job that executes once a day and updates the existing file.
  5. If the controlfile backup is missing, check for the controlfile backup in the following possible locations:
    • snapshot controlfile
    • backup taken in some location
  6. Look for possible controlfile backups from RMAN log files.
  7. If no backup of controlfile is available, create the controlfile from the trace you have presumably created.
  8. While recovering the database after creating the controlfile, always try giving the most recent online redo logs as archived log names to achieve a complete recovery.
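Takeaways 2 and 4 are easy to automate. Here is a minimal configuration sketch of the daily controlfile-to-trace job; the script path, the /backup location, and the cron schedule are assumptions to adapt to your environment (the PROQA3 SID is from this story). It is not runnable without a live database, so treat it as a template:

```shell
#!/bin/sh
# backup_controlfile_trace.sh -- daily controlfile-to-trace dump (takeaway #4).
# Hypothetical paths; schedule it from the oracle user's crontab, e.g.:
#   0 2 * * * /home/oracle/bin/backup_controlfile_trace.sh
export ORACLE_SID=PROQA3
sqlplus -s / as sysdba <<'EOF'
-- REUSE overwrites yesterday's file, so one current copy is always kept
ALTER DATABASE BACKUP CONTROLFILE TO TRACE AS '/backup/PROQA3_controlfile.sql' REUSE;
EXIT
EOF

# Takeaway #2 is a one-time RMAN setting (the default is OFF):
#   RMAN> CONFIGURE CONTROLFILE AUTOBACKUP ON;
```

Because the trace is written as a plain SQL script, it survives in any file backup and can be edited directly, exactly as John did with create_controlfile.sql above.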
Credit to Arup Nanda, my Oracle mentor. @ArupNanda

Hadoop for Oracle Professionals Article on Oracle Scene

Oracle Scene (the publication of the United Kingdom Oracle Users Group) has published my article "Hadoop for Oracle Professionals", where I have attempted, like many others, to demystify terms such as Hadoop, MapReduce and Flume. If you are interested in Big Data and everything that comes with understanding it, you might find it useful.

A PDF version of the article can be downloaded here http://www.proligence.com/art/oracle_scene_summ14_hadoop.pdf

Thursday, 17 July 2014

Oracle Linux and Oracle VM pricing guide

A few days ago someone showed me a pricing guide from a Linux vendor and I was a bit surprised at its complexity, especially when you look at larger servers (4 or 8 sockets) and when adding virtual machine use into the mix.
I think we have a very compelling and simple pricing model for both Oracle Linux and Oracle VM. Let me see if I can explain it in 1 page, not 10 pages. This pricing information is publicly available on the Oracle Store; I am using the current public list prices. Also keep in mind that this is for customers using non-Oracle x86 servers. When a customer purchases an Oracle x86 server, the annual systems support includes full use (all you can eat) of Oracle Linux, Oracle VM and Oracle Solaris, no matter how many VMs you run on that server in case you deploy guests on a hypervisor. This support level is the equivalent of premier support in the list below.
Let's start with Oracle VM (x86) :
Oracle VM support subscriptions are per physical server on which you deploy the Oracle VM Server product.

  • (1) Oracle VM Premier Limited -> 1- or 2 socket server : $599 per server per year
  • (2) Oracle VM Premier -> more than 2 socket server (4, or 8 or whatever more) : $1199 per server per year

  • The above includes the use of Oracle VM Manager and Oracle Enterprise Manager Cloud Control's Virtualization management pack (including the self-service cloud portal, etc.), 24x7 support, and access to bug fixes, updates and new releases. It also includes all options: live migration, dynamic resource scheduling, high availability, dynamic power management, etc.
  • If you want to play with the product, or even use it without access to support services, it is freely downloadable from edelivery.

Next, Oracle Linux:
Oracle Linux support subscriptions are per physical server.
If you plan to run Oracle Linux as a guest on Oracle VM, VMware or Hyper-V, you only have to pay for a single subscription per system; we do not charge per guest or per number of guests. In other words, you can run any number of Oracle Linux guests per physical server and count it as just a single subscription.
  • (1) Oracle Linux Network Support -> any number of sockets per server : $119 per server per year
  • Network support does not offer support services. It provides access to the Unbreakable Linux Network and also offers full indemnification for Oracle Linux.
  • (2) Oracle Linux Basic Limited Support -> 1- or 2 socket servers : $499 per server per year
  • This subscription provides 24x7 support services, access to the Unbreakable Linux Network and the Oracle Support portal, indemnification, use of Oracle Clusterware for Linux HA and use of Oracle Enterprise Manager Cloud control for Linux OS management. It includes ocfs2 as a clustered filesystem.
  • (3) Oracle Linux Basic Support -> more than 2 socket server (4, or 8 or more) : $1199 per server per year
  • This subscription provides 24x7 support services, access to the Unbreakable Linux Network and the Oracle Support portal, indemnification, use of Oracle Clusterware for Linux HA and use of Oracle Enterprise Manager Cloud control for Linux OS management. It includes ocfs2 as a clustered filesystem
  • (4) Oracle Linux Premier Limited Support -> 1- or 2 socket servers : $1399 per server per year
  • This subscription provides 24x7 support services, access to the Unbreakable Linux Network and the Oracle Support portal, indemnification, use of Oracle Clusterware for Linux HA and use of Oracle Enterprise Manager Cloud control for Linux OS management, XFS filesystem support. It also offers Oracle Lifetime support, backporting of patches for critical customers in previous versions of package and ksplice zero-downtime updates.
  • (5) Oracle Linux Premier Support -> more than 2 socket servers : $2299 per server per year
  • This subscription provides 24x7 support services, access to the Unbreakable Linux Network and the Oracle Support portal, indemnification, use of Oracle Clusterware for Linux HA and use of Oracle Enterprise Manager Cloud control for Linux OS management, XFS filesystem support. It also offers Oracle Lifetime support, backporting of patches for critical customers in previous versions of package and ksplice zero-downtime updates.
  • (6) Freely available Oracle Linux -> any number of sockets
  • You can freely download Oracle Linux, install it on any number of servers and use it for any reason, without support, without the right to use extra features like Oracle Clusterware or ksplice, and without indemnification. However, you do have full access to all errata as well. Need support? Then use options (1)-(5).

So that's it. Count the number of 2-socket boxes and more-than-2-socket boxes, decide on a basic or premier support level, and you are done. You don't have to worry about different levels based on how many virtual instances you deploy or want to deploy. A very simple menu of choices. We offer, inclusive: Linux OS clusterware, Linux OS management (provisioning and monitoring), a cluster filesystem (ocfs2), a high-performance filesystem (xfs), dtrace, ksplice, and ofed (the InfiniBand stack for high-performance networking). No separate add-on menus.
NOTE: a socket/CPU can have any number of cores. So whether you have a 4-, 6-, 8-, 10- or 12-core CPU doesn't matter; we count the number of physical CPUs.
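To make the counting rule concrete, here is a small sketch of the arithmetic using the list prices quoted above; the server counts are made-up examples, not a recommendation:

```shell
#!/bin/sh
# Annual Oracle Linux support cost from the per-server list prices above:
#   Basic Limited (1-2 sockets) $499, Basic (>2 sockets) $1199,
#   Premier Limited (1-2 sockets) $1399, Premier (>2 sockets) $2299.
small_basic=3     # three 2-socket servers on Basic Limited
large_basic=2     # two 4-socket servers on Basic
small_premier=1   # one 2-socket server on Premier Limited
total=$(( small_basic*499 + large_basic*1199 + small_premier*1399 ))
echo "Annual support total: \$${total}"   # cores per socket are irrelevant
```

Note that only the socket count per box and the chosen support level enter the calculation; the number of guests never does.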

    Tuesday, 15 July 2014

    Oracle Data Protection: How Do You Measure Up?

    This is the first installment in a blog series, which examines the results of a recent database protection survey conducted by Database Trends and Applications (DBTA) Magazine.
    All Oracle IT professionals know that a sound, well-tested backup and recovery strategy plays a foundational role in protecting their Oracle database investments, which in many cases, represent the lifeblood of business operations. But just how common are the data protection strategies used and the challenges faced across various enterprises? In January 2014, Database Trends and Applications Magazine (DBTA), in partnership with Oracle, released the results of its “Oracle Database Management and Data Protection Survey”. Two hundred Oracle IT professionals were interviewed on various aspects of their database backup and recovery strategies, in order to identify the top organizational and operational challenges for protecting Oracle assets.
    Here are some of the key findings from the survey:

    • The majority of respondents manage backups for tens to hundreds of databases, representing a total data volume of 5 to 50 TB (14% manage 50 to 200 TB, and some up to 5 PB or more).
    • About half of the respondents (48%) use HA technologies such as RAC, Data Guard, or storage mirroring; however, these technologies are deployed on only 25% (or less) of their databases.
    • This indicates that backups are still the predominant method for database protection among enterprises. Weekly full and daily incremental backups to disk were the most popular strategy, used by 27% of respondents, followed by daily full backups, which are used by 17%. Interestingly, over half of the respondents reported that 10% or less of their databases undergo regular backup testing.
     A few key backup and recovery challenges resonated across many of the respondents:
    • Poor performance and impact on productivity (see Figure 1)
      • 38% of respondents indicated that backups are too slow, resulting in prolonged backup windows.
      • In a similar vein, 23% complained that backups degrade the performance of production systems.
    • Lack of continuous protection (see Figure 2)
      • 35% revealed that less than 5% of Oracle data is protected in real-time.
    •  Management complexity
      • 25% stated that recovery operations are too complex. (see Figure 1)
      •  31% reported that backups need constant management. (see Figure 1)
      • 45% changed their backup tools as a result of growing data volumes, while 29% changed tools due to the complexity of the tools themselves.
    Figure 1: Current Challenges with Database Backup and Recovery
    Figure 2: Percentage of Organization’s Data Backed Up in Real-Time or Near Real-Time
    In future blogs, we will discuss each of these challenges in more detail and bring insight into how the backup technology industry has attempted to resolve them.