Exadata Interview Questions - Smart Scan

What is Smart Scan?

The data search and retrieval processing can be offloaded to the Exadata Storage Servers. This feature is called Smart Scan. Using this Smart Scan, Oracle Database can optimize the performance of operations that perform table and index scans by performing the scans inside Exadata Storage Server, rather than transporting all the data to the database server.

Smart Scan capabilities includes:

  • Predicate Filtering
  • Column filtering
  • Join Processing

What is predicate filtering and what all conditional operators are supported by predicate filtering?

Exadata Storage Server enables predicate filtering for table scans. Rather than returning all the rows for the database to evaluate, Exadata Storage Server returns only the rows that match the filter condition.

In addition, many common SQL functions can be evaluated by Exadata Storage Server during predicate filtering. The definitive list of which functions are offloadable for your particular version is contained in V$SQLFN_METADATA.

SQL> select * from v$sqlfn_metadata where offloadable = 'YES';

A list of conditional operators that are supported by predicate filtering include =, !=, <, >, <=, >=, IS [NOT] NULL, LIKE, [NOT] BETWEEN, [NOT] IN, EXISTS, IS OF type, NOT, AND, OR.

What is Bloom Filter?

Offloaded joins are accomplished by creating what is called a bloom filter.

A Bloom filter, conceived by Burton Howard Bloom in 1970, is a space-efficient probabilistic data structure that is used to test whether an element is a member of a set. The properties of a Bloom filter make is a very efficient way of determining which values are not in a set. This is very useful for processing join conditions where a significant proportion of the data does not fulfill the join criteria.

Oracle Database 10g Release 2 first used Bloom filters to optimize parallel join operations. When two tables are joined via a hash join, the first table (typically the smaller table) is scanned and the rows that satisfy the WHERE clause predicates (for that table) are used to create a hash table. During the hash table creation, a Bloom filter bit string is also created based on the join column. The bit string is then sent as an additional predicate to the second table scan. After the WHERE clause predicates relating to the second table are applied, the resulting rows are tested using the Bloom filter. Any rows rejected by the Bloom filter must fail the join criteria and are discarded. Any rows that match using the Bloom filter are sent to the hash join.

With Exadata, the Bloom filter is passed to the storage servers as an additional predicate. Processing the Bloom filter inside Exadata Storage Server can reduce the amount of data transported to the database server to process a join, which in turn can speed up query performance.

Will bloom filter work with all type of joins?

No, bloom filter works only with hash joins.

Does Smart Scan work with encrypted and compressed data?

Yes, smart scan works with encrypted and compressed data.

Exadata Storage Server performs Smart Scans on encrypted tablespaces and encrypted columns. For encrypted tablespaces, Exadata Storage Server can decrypt blocks and return the decrypted blocks to Oracle Database, or it can perform row and column filtering on encrypted data. Significant CPU savings can be made within the database server by offloading the CPU-intensive decryption task to Exadata cells.

Smart Scan works in conjunction with Exadata Hybrid Columnar Compression so that column projection and row filtering can be executed along with decompression at the storage level to save CPU cycles on the database servers.

What are prerequisites for Smart Scan to occur?

There are 4 basic requirements that must be met for Smart Scans to occur:

1. There must be a full scan of an object; that is, full table scans, fast full index scans and fast full bitmap index scans.

2. The scan must use Oracle’s Direct Path Read mechanism. Direct path reads are generally used by Oracle when reading directly into PGA memory (as opposed to into the buffer cache). - Direct-path reads are automatically used for parallel queries - Direct-path reads may be used for serial queries o Not used by default for serial small table scans o Use _serial_direct_read=TRUE to force direct path reads

3. Each segment being scanned must be on a disk group that is completely stored on Exadata cells. The disk group must also have the disk group attribute settings as below:

'compatible.rdbms' = 11.2.0.0.0' (or later)
'compatible.asm' = 11.2.0.0.0' (or later)
'cell.smart_scan_capable' = TRUE

4. The CELL_OFFLOAD_PROCESSING initialization parameter enables or disables Smart Scan. The default value of the parameter is TRUE, meaning that Smart Scan is enabled by default. If it is set to FALSE, Smart Scan is disabled and the database uses Exadata storage to serve data blocks similar to traditional storage.

What are the situations that prevent Smart Scan from happening?

  • Scan on a clustered table
  • Scan on an index-organized table
  • Fast full scan on a compressed index
  • Fast full scan on a reverse key indexes
  • The table has row-level dependency tracking enabled
  • The ORA_ROWSCN pseudocolumn is being fetched
  • The optimizer wants the scan to return rows in ROWID order
  • The command is CREATE INDEX using NOSORT
  • A LOB or LONG column is being selected or queried
  • A SELECT .. VERSIONS flashback query is being executed
  • To evaluate a predicate based on a virtual column
  • More than 255 columns are referenced in the query
  • The data is encrypted and cell-based decryption is disabled
  • If table has CACHE property
  • If _serial_direct_read is turned off (NEVER)
  • If the partitioned object’s size is less than _small_table_threshold.
  • Offloading is not done on serial DMLs.
  • Serial Direct read is not applied for these cases (and thus no smart scan): - sql from a table function - sql from dbms_sql - sql from plsql trigger
  • Smart scan is not enabled for the sql in plsql when plsql package is called by ‘CALL plsql‘
  • Does not work on serial queries issued from shared servers
  • The SQL in question must not be quarantined.
  • High CPU usage on the Storage Cells

What does it mean by sql is quarantined?

When a cellsrv process (on an exadata storage server cell) terminates and, if the thread that caused the death (or rather encountered the death) was performing smart scan, we determine the sql step that the thread was performing and make a note of it in a file on the cell. When the cellsrv restarts, the plan step is considered quarantined. When there are 3 quarantines, smart scan for a database gets quarantined. When smart scan gets quarantined, cellsrv will not perform any predicate evaluation/filtering. It will issue parallel IOs, and return the blocks as is.

How to verify from the explain plan that smart scan is being used?

Unfortunately, the normal execution plan output produced by the DBMS_XPLAN package will not show you whether a Smart Scan is used or not. If you explain plan output shows a “TABLE ACCESS STORAGE FULL” operation and the predicate section shows a “storage()” predicate associated to the plan. Both of these characteristics indicate that a Smart Scan was possible, but neither provides a definitive verification.

How to Verify from 10046 trace that Smart Scan is happening?

One of the most straightforward ways to determine whether a Smart Scan is being used by the query, is to enable a 10046 trace on the statement in question. Tracing is a fairly foolproof way to verify whether a Smart Scan was used or not. If Smart Scan was used, there will be ‘CELL SMART TABLE SCAN’ or ‘CELL SMART INDEX SCAN’ events in the trace file.

Are there any cases where smart scan is considered as an inefficient method for data retrieval?

1. Yes, when the data has lots of chained rows, it is possible for the cell not able to determine the locale of the chained row and then the compute node will have to do additional single block reads to fulfill the request.

2. Frequently updated tables, as in order to do smart scan a checkpoint needs to be done and frequent ones slow down the queries. For example, if Oracle notices that a block is “newer” than the current query, the process of finding an age-appropriate version of the block is left for the database layer to deal with. This effectively pauses the Smart Scan processing while the database does its traditional read consistency processing.

What are Storage Indexes?

Storage Indexes is a very useful Exadata feature which is transparent to database and are maintained automatically. They are not indexes that are stored in the database like Oracle’s traditional B-Tree or bitmapped indexes. They are not capable of identifying a set of records that has a certain value in a given column. Rather, they are a feature of the storage server software that is designed to eliminate disk I/O.

They work by storing minimum and maximum column values for disk storage units, which are 1 Megabyte (MB) by default and are called region indexes. Because SQL predicates are passed to the storage servers when Smart Scans are performed, the storage software can check the predicates against the Storage Index metadata (maximum and minimum values) before doing the requested I/O. Any storage region that cannot possibly have a matching row is skipped.

Since the indexes are in-memory on the cell server, if a cell server is restarted the SIs are lost and must be rebuilt. They are generally created during the first smart scan that references a given column after a storage server has been restarted. They can also be created when a table is created via a CREATE TABLE AS SELECT statement, or during other direct-path loads. Exadata 21.2 supports persistence of Storage Indexes to Storage Server system disks (X7 and later) to reduce impact of storage server reboots on performance.

Under what conditions Storage Index would be used?

In order for a storage index to be used, a query must include or make use of all the following:

1. Storage Indexes can only be used with statements that do Smart Scans. The main requirements are that the optimizer must choose a full scan and that the I/O must be done via the direct path read mechanism. 2. In order for a statement to use a Storage Index, there must be a WHERE clause with at least one predicate. 3. Storage Indexes can be used with the following set of operators: =, <, >, BETWEEN, >=, <=, IN, IS NULL, IS NOT NULL

What conditions prevent the use of Storage Indexes?

1. Storage Indexes are not created on CLOBs. 2. Storage Indexes do not work with predicates that use the != comparison operator. 3. Storage Indexes do not work on predicates that use the % wildcard. 4. Storage Indexes are created and maintained for eight-columns per table. The number of columns increased to 24 with Exadata System Software 12.2.1.1. 5. Customer had set either of the following init.ora parameter “_smu_debug_mode=134217728” or alternatively “_enable_minscn_cr”=false.

There is only one database statistic related to storage indexes. The statistic, ‘Cell Physical IO Bytes. Saved by Storage Index’, keeps track of the accumulated I/O that has been avoided by the use of Storage Indexes. Since the statistic is cumulative, it must be checked before and after a given SQL statement in order to determine whether Storage Indexes were used on that particular statement.

Run the following query to check the statistics for Storage Indexes:

sql> select name, value from v$sysstat where name like '%storage%';