·
What is
dimensional modeling and how it is different from ERD?
·
What are the
benefits and pitfalls of snow flaking?
·
What
methodology did you adopt, Kimball or Inmon and why?
·
What
are the advantages of the Corporate Information Factory 1 2 3(CIF)
architecture vs. the bus architecture with conformed dimensions? What suits
best for your environment and why?
·
What is
subject area?
·
What is
difference between normalization and de-normalization? Can we have a normalized
data warehouse schema?
·
Why do we
need an ODS?
·
How will you
plan to load your data into your data warehouse from an ODS vs. an OLTP
environment?
·
Assume that
you are designing a data warehouse for an insurance company. What type of SCD
you will design for an insurance agent table, where agent keeps moving across
many regions/states.
·
How will go
about designing star schema for your business?
·
In your
organization business requirement/practices have changed dramatically and
warrants for major schema refactoring of your existing data warehouse. How will
you handle this situation? For e.g., you need to add couple of dimension keys
to existing fact tables.
·
You have a
data warehouse setup and your company did a major acquisition/merger and how
will you go about handling the new data flow into your setup.
·
What are Factless fact tables and give a scenario where you will use it?
What are Factless fact tables and give a scenario where you will use it?
·
What type of
optimization techniques applied at the data model level and why did you choose
them?
·
How will you
handle data cleansing, validation?
·
What is
Master Data Management?
·
How will you
handle data rejects in your Data warehouse?
·
Describe
common techniques for loading from the staging area/OLTP to the warehouse when
you only have a small window.
·
How do you
load type 1 dimensions, 2 dimensions?
·
How would
you model unbalanced hierarchies?
·
How would
you model cyclic relations?
·
What major
elements would you include in an audit model?
·
How would
you implement traceability?
·
What steps
would you take to improve reporting performance?
When you make index
‘unusable’ and now issue a truncate command on the table. What will happen to
status of index, Will it usable or unusable?
User A has a table ‘emp’ and he
creates a view, emp_v on emp with a condition where emp_salary <100000
and creates a public synonym as emp on emp_v.
·
What will
user A see when runs select * from emp
·
What will
user B see when he runs select * from emp?
tableB has 100 rows and
when you use in the where clause, create table A as select * from table B
1. 1=1
2. 1=2.
·
How many
rows will be in tableA in each case?
·
You have two
tables A and B and you run ‘insert into tableA select * from tableB’ and then
run ‘create table tableC as select * from tableA’. Now if you run ‘delete from
tableB’ and then execute a rollback. How many rows will be there in each of
these tables, tableA, tableB and tableC.
·
You have a
table named ‘customer’ with 100 million rows, and you need to add a column with
a not-null constraint and a predefined default value to this column. What is
the best approach, what will happen if you run this sql in production, ALTER
TABLE customer ADD (cust_credit_rating number default 1 not null);
·
Your sql
script must exit with an error should anyone else run other the specified user.
It should be a one -line sql? Hint –on sqlerror, how do you generate this
error?
·
Why do we
use dimensional modeling instead of ER modeling for data warehousing
applications?
·
Why can’t we
use a copy of our transactional system to meet our data warehousing needs? If
we are unable to use a copy of transaction system as DW, then why don’t we use
it as a staging environment?
·
What are the
three most important themes in a data warehouse? Give examples? Partial Answer
- Drilling Down, Drilling Across and Handling Time
·
Why can’t we
have just one single fact table instead of more than one fact table?
·
How would
you go about deleting 10 million records from 100 million records table in a
production after an erroneous load? Hint – Try alternate method other than
delete, like create table xyz as select * from or if it is a partitioned table,
try dropping the partition.
Your company XYZ has acquired a
new company. You are asked to load their customer information into the data
warehouse. Your company’s cust_id is numeric, whereas theirs is a string. How
can you handle this situation? How can you handle the customer records that
already exist in your data warehouse? Hint – Use of surrogate key
·
What is
Changed Data Capture and how will handle them in case of OLTP environment like
SAP, Oracle Apps?
·
What is ODS?
Why do you need them? Where would you place them?
What do you mean by Drilling Up
and Drilling around? Hint – subtract and sharing of related fact
·
How would
you go about deleting 10 million records from 100 million records table in a
production after an erroneous load? Hint – Try alternate method other than
delete, like create table xyz as select * from or if it is a partitioned table,
try dropping the partition.
·
When you
extract data from source systems during night, your extract process keep
failing with “snapshot too old” error? How will fix this issue? Hint – your DBA
has provided with alter session privilege
·
What is the
difference between ‘drop index’ and ‘alter index unusable’ and rebuild? Why do
we prefer second method?
·
How do you
handle duplicate records from Relational Database?
·
You are
required create one million records (sequence numbers) into a table using one
sql command? Hint:- Take advantage of Cartesian product and union
·
How do you
perform debugging in Stored Procedure?
·
How can you
transform a sub-query involving the IN clause to a Join? E.g. select emp_no,
name from emp where dept_id in (select dept_id where dept_name=’HR’)
·
How can you
transform a statement involving an OR condition to a UNION ALL? Eg. select
emp_no, name, title from emp where title=’MANAGER’ or ‘DIRECTOR’
·
How to get
count of the different data values in a column?
·
How can you
get count/sum ‘ranges of data’ values in a column?
·
Give me sql
for each of the following
1) TOP N Rows from a table
2) EVERY Nth row from a table
3) Rows X to Y from a table
·
What are key
differences over PowerCenter 7 vs 8? Give one significant change version 8 is
offering.
·
What changes
do we need to make, for to run version 7 ‘pmcmd’ command on version 8.
·
What is
purpose of ‘infacmd’ command and domains.infa file?
·
Can we
create a mapping without using a source qualifier? Hint - Normalizer
·
Can we use a
flat file as a lookup?
·
How can you
transpose rows to columns and vice versa and what transformations would you
choose?
·
In the
update strategy, instead of using DD_INSERT I put the value as 1. Will this
work?
·
If you check
all the ports as group by in the Aggregator transformation, how many rows would
be output?
·
How can you
limit number of running sessions in a workflow?
·
What is the
difference between Union and Joiner Transformation?
·
Your source
data is a flat file and it has empid, deptid and 12 columns for salaries of
each month (jan-dec). Now your requirement is to create one record for each
month and also running total of the salary. What type of transformations you
will use?
·
What is the
use of indirect file list in PowerCenter?
·
In your
environment, there are several mappings in version 7 and these mappings use
external procedures (AEP). You need to move these mappings to version 8. What
steps do you take?
·
Your
customer has a requirement that in a day you should not run workflow more than
once. How can you achieve this by means of workflow control?
·
You have a
requirement to alert you of any long running sessions in your workflow. How can
you create a workflow that will send you email for sessions running more than
30 minutes? You can use any method, shell script, procedure or Informatica
mapping or workflow control.
·
You want to
attach a file as an email attachment from a particular directory using ‘email
task’ in Informatica, how will you do it? Hint –%a
·
You have two
sets of sessions (mappings) in a workflow. You wish to run one set and you want
to selectively ‘turn-off’ the second set of mappings by means of ‘just’ one
parameter so that these sessions/mappings would just run and exit with no data.
How can you manage this using a parameter file? Hint 1=2
·
One of your
source file you load contains a formula column, emp_id, emp_name, dept_id,
salary, commission_formula (formula_column) ,salary_date.
·
123, King,
d01, 4500, , 01/20/2005 453, Scott, d02, 8500, , 01/31/2006 You
need to evaluate the commission formula and calculate the commission with the
salary (i.e., commission_formula*salary) to calculate commission. How can you
accomplish this?
·
Your mapping
is loading the target data in a flat file, but there are many new line
characters that cause your application to fail. How would handle these
characters before they are loaded inside flat file? Hint – Ascii
equivalent of CR- CHR(13), CHR(10)
·
How to
delete duplicate records from source database/Flat Files? Can we use post sql
to delete these records. In case of flat file, how can you delete duplicates
before it starts loading?
·
Join two or
more tables and then pull out two columns from each table into the source
qualifier. Now, pull out one column from the source qualifier into an
Expression transformation and then do a "generate SQL" in the source
qualifier, how many columns will show up in the generated SQL?
·
You are
asked to document source to target dependency of all Informatica mappings and
you know there are some repository views (REP_TBL_MAPPING) provided by
Informatica can do this job. But these views return no rows. How can you
rectify this issue? Hint - MX
Data, what change would do to load OPB_TARG_TBL_EXPR which provides a crucial
link to REP_TBL_MAPPING?
·
You are
required to perform “bulk loading” using Informatica on Oracle, what action
would perform at Informatica + Oracle level for a successful load? Hint –alter session privilege,
Environment SQL (where is it?)
·
Your session
failed and when you try to open a log file, it complains that the session
details are not available. How would do trace the error? What log file would
you seek for? Hint – What
other types of logs are available in that server?
·
Your
business user says revenue report does not tally with SAP (source system)
report though the ETL process did not fail today. How will identify the exact
issue? Hint – reject records,
primary key
·
There are
around 100 sessions in a workflow. You have specified variables in the
sessions, worklets and workflows. These variables are all over the place. You
are supposed to change variable names from variable(x) to myvariable(x+1). What
approach would you take? Hint
– Export XML file and use of regular expressions?
·
When you
export a workflow from Repositor Manager, what does this xml contain? Workflow
only? Hint – Question itself
contains the hint?
·
What is
difference between Mapping parameter and variable?
·
How do you
set a mapping variable during session run? How do you reset them?
·
What are the
pitfalls of using Sequence Generator transformation or in general, why do we
avoid?
·
What
precautions do you need take when you use reusable Sequence generator
transformation for concurrent sessions?
·
Tell me what
are alternative methods of Sequence Generator transformation? How will go about
using the same when you use them in concurrent sessions?
·
Is it
possible negative increment in Sequence Generator? If yes, how would you
accomplish it? Hint –
Expression
·
How can you
handle sequence generation over 2 billion records?
·
Which
directory Informatica looks for parameter file and what happens if it is
missing when start the session? Does session stop after it starts? Hint – Does your source qualifier have
dependency on mapping parameter?
·
What happens
when a particular parameter is missing?
·
Informatica
is complaining about the server could not be reached? What steps would you
take? Hint – Hosts file
·
You observe
lately some sluggishness in getting repository objects, what action would you
perform? Hint –Repository
objects
·
Informatica
server suddenly stops after starting? How can you rectify this issue? Hint – Server variables, path?
·
What is a
parameter file and how would dynamically create them using a mapping and what
transformation would you choose? Hint – Normalizer
·
What is the
Difference between connected & unconnected lookup
·
If you have
more than two sources, how will use the joiner transformation to join these
sources?
·
If you have
more than one pipeline in your mapping how will change the order of load?
·
What
approach would you take so that your source qualifier SQL override is database
independent? Hint – Next
question
·
What is an
ANSI SQL?
·
What is a
mapplet?
·
What are an
active and a passive transformation?
·
Can we use
an active transformation inside a mapplet?
·
What is the
difference between Source Qualifier and Joiner?
·
How do you
override SQL in Lookup?
·
How do you
change order by clause in lookups?
·
How many
types of lookups are there?
·
What is a
dynamic lookup and what is the significance of NewLookupRow? How will use them
for rejecting duplicate records?
·
What are
benefits over connected vs. unconnected or vice versa?
·
In an
unconnected lookup can it only have an output port and will the mapping work if
you don’t check return port and why? Hint – Try un-checking Return port
·
You have
more five mappings use the same lookup. How can you manage the lookup? Hint – Persistence
·
What are the
type of caches are available in lookups?
·
How will you
improve the performance of a lookup?
·
What is a
sorted Input option in Source Qualifier?
·
How will you
increase the performance of an aggregator transformation other than using
sorted input?
·
What is an
Incremental Aggregation?
·
What is an
Incremental Loading?
·
What will
happen if you copy the mapping from one repository to another repository and if
there is no identical source?
·
Difference
between router & filter transformation and where do you need to place
filter transformation to get better performance?
·
What is
joiner?
·
How do you
perform recovery? What happens internally?
·
Which
process writes the information into repository tables?
·
What is
difference between an abort and stop in a session?
·
What is DTM?
·
What is
update strategy? What is data driven?
·
An Aggregate
transformation has 4 ports (l sum (col 1), group by col 2, col3), which port
should be the output?
·
What is
Parameter file and explain scenario when u use?
·
What is the
difference between surrogate key and primary key?
Hey Kareem...Thanks a lot for giving a very clear picture of data warehousing concepts.I really appreciate your efforts.Commendable!!Could you also help me in providing the answers to these questions?
ReplyDeleteThanks
charu